MTA–DE–SZTE Research Group for Theoretical Linguistics


MedCollect corpus

The MedCollect corpus was built manually for linguistic analysis. It contains 2,206 articles (1,259,567 tokens) on health and medicine: 1,448 fake news articles (864,472 tokens) and 758 control articles (395,095 tokens). The articles come from 179 different websites, although around 90% of them come from only 26 websites. The oldest article dates from 2007, but 75% of the articles were published after 2020.
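
The figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, assuming the reported counts are exact; the names `Subset`, `check_totals` and `share` are illustrative only and are not part of any MedCollect tooling. It confirms that the fake news and control subsets add up to the corpus totals and derives each subset's share and average article length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """Article and token counts for one part of the corpus (hypothetical helper)."""
    articles: int
    tokens: int

    @property
    def mean_tokens(self) -> float:
        # Average article length in tokens.
        return self.tokens / self.articles


# Figures as reported for the full MedCollect corpus.
FAKE = Subset(articles=1448, tokens=864_472)
CONTROL = Subset(articles=758, tokens=395_095)
TOTAL = Subset(articles=2206, tokens=1_259_567)


def check_totals(parts, total):
    """Return True if the parts add up to the reported total in both articles and tokens."""
    return (sum(p.articles for p in parts) == total.articles
            and sum(p.tokens for p in parts) == total.tokens)


def share(part, total):
    """Fraction of the total (articles, tokens) that one part accounts for."""
    return part.articles / total.articles, part.tokens / total.tokens


if __name__ == "__main__":
    print("totals consistent:", check_totals([FAKE, CONTROL], TOTAL))
    art_share, tok_share = share(FAKE, TOTAL)
    print(f"fake news share: {art_share:.1%} of articles, {tok_share:.1%} of tokens")
    for name, subset in [("fake", FAKE), ("control", CONTROL), ("all", TOTAL)]:
        print(f"{name}: {subset.mean_tokens:.0f} tokens per article on average")
```

Keeping articles and tokens together in one small record makes it easy to compare the two units side by side: fake news accounts for roughly two thirds of the articles and a slightly larger share of the tokens, since fake news articles are on average somewhat longer than control articles.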

To uncover the structural and hidden manipulative strategies characteristic of fake news, 707 articles (370,300 tokens) were selected for manual annotation: 322 fake news articles (182,626 tokens) and 385 control articles (187,626 tokens).
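
The annotated subset is not a uniform sample of the full corpus. The following minimal Python sketch, assuming the reported counts are exact, computes what fraction of each class was selected for annotation and how the fake-to-control ratio changes; the names `FULL`, `ANNOTATED`, `coverage` and `fake_ratio` are hypothetical and introduced only for this illustration.

```python
# Article and token counts as reported: full corpus vs. manually annotated subset.
# Each entry is (articles, tokens).
FULL = {"fake": (1448, 864_472), "control": (758, 395_095)}
ANNOTATED = {"fake": (322, 182_626), "control": (385, 187_626)}


def coverage(label):
    """Return (article fraction, token fraction) of one class selected for annotation."""
    full_articles, full_tokens = FULL[label]
    ann_articles, ann_tokens = ANNOTATED[label]
    return ann_articles / full_articles, ann_tokens / full_tokens


def fake_ratio(counts):
    """Number of fake news articles per control article."""
    return counts["fake"][0] / counts["control"][0]


if __name__ == "__main__":
    for label in ("fake", "control"):
        arts, toks = coverage(label)
        print(f"{label}: {arts:.1%} of articles, {toks:.1%} of tokens annotated")
    print(f"fake:control ratio, full corpus: {fake_ratio(FULL):.2f}")
    print(f"fake:control ratio, annotated subset: {fake_ratio(ANNOTATED):.2f}")
```

The computation shows that roughly a fifth of the fake news articles but about half of the control articles were annotated, so the annotated subset is close to balanced between the two classes, whereas the full corpus contains almost twice as many fake news articles as control articles.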

6722 Szeged,
Egyetem utca 2.
enyik@szte.hu

Science for the Hungarian Language National Programme of the Hungarian Academy of Sciences (MTA)

Linguistic identification of fake news and pseudoscientific views

University of Szeged

Faculty of Humanities and Social Sciences

Department of General Linguistics