Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 2. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Investigating droplet emission during speech interaction | 90
A semantics-aware approach for multilingual natural language inference | 56
Editorial: LRE updates | 32
Language resources for clinical linguistics: introduction to the special issue | 29
Error annotation: a review and faceted taxonomy | 25
Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes | 23
Constructing understanding: on the constructional information encoded in large language models | 21
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 20
Investigating the role of swear words in abusive language detection tasks | 19
Evaluation of the morphological rules for the Tenyidie language: a low-resource language | 15
Fake news article detection datasets for Hindi language | 15
PinLID: a dataset for Pinglish language identification based on code-mixing sentence on unstructured resources | 13
A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals | 12
Manfred Stede and Jodi Schneider: Argumentation mining. Synthesis lectures on human language technologies, edited by Graeme Hirst | 11
Open source platform for Estonian speech transcription | 11
Studying word meaning evolution through incremental semantic shift detection | 11
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 9
Identifying communicative functions in discourse with content types | 9
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification | 9
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 9
Spelling errors made by people with dyslexia | 9
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 9
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 8
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 7
Data augmentation strategies to improve text classification: a use case in smart cities | 7
Pragmatic evaluations of automated linguistic creativity | 7
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 7
A corpus of Schlieren photography of speech production: potential methodology to study aerodynamics of labial, nasal and vocalic processes | 7
From LIMA to DeepLIMA: following a new path of interoperability | 7
Regionalized models for Spanish language variations based on Twitter | 6
Commonsense based text mining on urban policy | 6
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 5
Map Task Corpus of Heritage BCMS spoken by second-generation speakers in Switzerland | 5
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 5
Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels | 5
The robotic-surgery propositional bank | 4
ParlaMint II: advancing comparable parliamentary corpora across Europe | 4
When MIPVU goes to no man’s land: a new language resource for hybrid, morpheme-based metaphor identification in Hungarian | 4
How different is different? Systematically identifying distribution shifts and their impacts in NER datasets | 4
A survey on geocoding: algorithms and datasets for toponym resolution | 4
Speech acts in the Dutch COVID-19 Press Conferences | 4
The ParlaMint corpora of parliamentary proceedings | 4
Democratizing neural machine translation with OPUS-MT | 4
From greatest simplicity to full power | 4
Hope speech detection in Spanish | 4
Two sepedi-english code-switched speech corpora | 4
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 3
A morphologically annotated longitudinal corpus of spoken Czech child–adult interactions | 3
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 3
Correction to: Investigating droplet emission during speech interaction | 3
Semantic processing for Urdu: corpus creation, parsing, and generation | 3
KurdiSent: a corpus for kurdish sentiment analysis | 3
Normalized dataset for Sanskrit word segmentation and morphological parsing | 3
Evaluation of end-to-end continuous spanish lipreading in different data conditions | 3
Exploratory Analysis of Rinconada Bikol Language-Nabua Text Corpus | 3
Automatic genre identification: a survey | 3
The WASABI song corpus and knowledge graph for music lyrics analysis | 3
VeLeRo: an inflected verbal lexicon of standard Romanian and a quantitative analysis of morphological predictability | 3
Umigon-lexicon: rule-based model for interpretable sentiment analysis and factuality categorization | 3
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 3
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 3
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 2
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 2
Universal Dependencies for Mandarin Chinese | 2
The DELAD initiative for sharing language resources on speech disorders | 2
Between welcome culture and border fence | 2
Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach | 2
The Electronic Corpus of 17th- and 18th-century Polish Texts | 2
Analyzing learner language: the case of the Hebrew Learner Essay Corpus | 2
NewsCom-TOX: a corpus of comments on news articles annotated for toxicity in Spanish | 2
Developing and testing syllabification systems for South African Sesotho | 2
Detecting explicit lyrics: a case study in Italian music | 2
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 2
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 2
Constructing a cross-document event coreference corpus for Dutch | 2
Usage disambiguation of Turkish discourse connectives | 2
TIARA 2.0: an interactive tool for annotating discourse structure and text improvement | 2
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 2
Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach | 2
Automatic language identification: a case study of Pahari languages | 2
Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 2
A semi-supervised method to generate a persian dataset for suggestion classification | 2
SetembroBR: a social media corpus for depression and anxiety disorder prediction | 2
Correction to: Two sepedi-english code-switched speech corpora | 2
Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon | 2
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 2