Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 2. The table below lists the papers at or above that threshold based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-06-01 to 2025-06-01.)
Article | Citations
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 60
Spelling errors made by people with dyslexia | 34
Speech acts in the Dutch COVID-19 Press Conferences | 32
Investigating the role of swear words in abusive language detection tasks | 27
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 23
From LIMA to DeepLIMA: following a new path of interoperability | 22
Commonsense based text mining on urban policy | 22
A survey on geocoding: algorithms and datasets for toponym resolution | 20
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 18
Hope speech detection in Spanish | 18
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 15
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 15
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 13
Spontaneous, controlled acts of reference between friends and strangers | 13
Construction of Amharic information retrieval resources and corpora | 12
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 12
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 11
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 11
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 10
A study on methods for revising dependency treebanks: in search of gold | 10
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 9
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 9
Automatic readability assessment for sentences: neural, hybrid and large language models | 8
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 8
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 7
Human–machine interaction in building an English reference dataset for natural language processing tasks | 7
Sentiment analysis in Portuguese tweets: an evaluation of diverse word representation models | 7
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 7
UHated: hate speech detection in Urdu language using transfer learning | 7
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 6
Uzbek news corpus for named entity recognition | 6
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 6
Perspectivist approaches to natural language processing: a survey | 6
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 6
Conversion of the Spanish WordNet databases into a Prolog-readable format | 5
Slovenian parliamentary corpus siParl | 5
Manfred Stede and Jodi Schneider: Argumentation mining. Synthesis lectures on human language technologies, edited by Graeme Hirst | 5
DoSLex: automatic generation of all domain semantically rich sentiment lexicon | 5
VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 5
Managing, storing, and sharing long-form recordings and their annotations | 5
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 5
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 5
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 5
Open source platform for Estonian speech transcription | 4
Studying word meaning evolution through incremental semantic shift detection | 4
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 4
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 4
Developing and testing syllabification systems for South African Sesotho | 4
Identifying communicative functions in discourse with content types | 4
Multi-task learning for multi-dialect Arabic sentiment classification and sarcasm detection | 4
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 4
Correction to: Two sepedi‑english code‑switched speech corpora | 4
KurdiSent: a corpus for kurdish sentiment analysis | 4
Benchmarking Hindi-to-English direct speech-to-speech translation with synthetic data | 4
Language resources for clinical linguistics: introduction to the special issue | 4
The WASABI song corpus and knowledge graph for music lyrics analysis | 4
The ParlaMint corpora of parliamentary proceedings | 4
Constructing a cross-document event coreference corpus for Dutch | 4
ArgRewrite V.2: an annotated argumentative revisions corpus | 3
Low resource language specific pre-processing and features for sentiment analysis task | 3
Finnish parliament ASR corpus | 3
Creation of a gold standard Dutch corpus of clinical notes for adverse drug event detection: the Dutch ADE corpus | 3
A corpus of English learners with Arabic and Hebrew backgrounds | 3
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 3
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 3
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 3
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 3
The limitations of irony detection in Dutch social media | 3
Design and construction of Guayaquil radio speech corpus (CHARG) | 3
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 2
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach | 2
LexO: an open-source system for managing OntoLex-Lemon resources | 2
Normalized dataset for Sanskrit word segmentation and morphological parsing | 2
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 2
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines | 2
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 2
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 2
OMCD: Offensive Moroccan Comments Dataset | 2
The Hmong Medical Corpus: a biomedical corpus for a minority language | 2
FullStop: punctuation and segmentation prediction for Dutch with transformers | 2
Examining inferred author and textual correlates of harmful language annotation | 2
Automatic genre identification: a survey | 2
SOLD: Sinhala offensive language dataset | 2
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 2
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 2
MulCogBench: a multi-modal cognitive benchmark dataset for evaluating Chinese and English computational language models | 2
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 2
Detecting explicit lyrics: a case study in Italian music | 2
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish | 2
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 2
DILLo: an Italian lexical database for speech-language pathologists | 2
Correction to: Resources for Turkish natural language processing: A critical survey | 2
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 2
COLLIE: a broad-coverage ontology and lexicon of verbs in English | 2
Aspect-based multimodal sentiment analysis via employing visual-to-emotional-caption translation network using visual-caption pairs | 2
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 2
HASTIKA: hate speech and target identification in Kannada-English code-mixed text | 2
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 2
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2
A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases | 2
Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 2
Benchmark of public intent recognition services | 2