Language Resources and Evaluation

Papers
(The median citation count of Language Resources and Evaluation is 0. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Investigating droplet emission during speech interaction | 90
A semantics-aware approach for multilingual natural language inference | 56
Editorial: LRE updates | 32
Language resources for clinical linguistics: introduction to the special issue | 29
Error annotation: a review and faceted taxonomy | 25
Stereohoax: a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes | 23
Constructing understanding: on the constructional information encoded in large language models | 21
PolitePEER: does peer review hurt? A dataset to gauge politeness intensity in the peer reviews | 20
Investigating the role of swear words in abusive language detection tasks | 19
Evaluation of the morphological rules for the Tenyidie language: a low-resource language | 15
Fake news article detection datasets for Hindi language | 15
PinLID: a dataset for Pinglish language identification based on code-mixing sentence on unstructured resources | 13
A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals | 12
Manfred Stede and Jodi Schneider: Argumentation mining. Synthesis lectures on human language technologies, edited by Graeme Hirst | 11
Open source platform for Estonian speech transcription | 11
Studying word meaning evolution through incremental semantic shift detection | 11
Multi-layered semantic annotation and the formalisation of annotation schemas for the investigation of modality in a Latin corpus | 9
Identifying communicative functions in discourse with content types | 9
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification | 9
SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 9
Spelling errors made by people with dyslexia | 9
A Spanish dataset for reproducible benchmarked offline handwriting recognition | 9
Strategies for managing time and costs in speech corpus creation: insights from the Slovenian ARTUR corpus | 8
From LIMA to DeepLIMA: following a new path of interoperability | 7
Sense through time: diachronic word sense annotations for word sense induction and Lexical Semantic Change Detection | 7
Data augmentation strategies to improve text classification: a use case in smart cities | 7
Pragmatic evaluations of automated linguistic creativity | 7
A tale of four parsers: methodological reflections on diagnostic evaluation and in-depth error analysis for meaning representation parsing | 7
A corpus of Schlieren photography of speech production: potential methodology to study aerodynamics of labial, nasal and vocalic processes | 7
Commonsense based text mining on urban policy | 6
Regionalized models for Spanish language variations based on Twitter | 6
Develop corpora and methods for cross-lingual text reuse detection for English Urdu language pair at lexical, syntactical, and phrasal levels | 5
Investigating interoperable event corpora: limitations of reusability of resources and portability of models | 5
Map Task Corpus of Heritage BCMS spoken by second-generation speakers in Switzerland | 5
Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 5
Two sepedi-english code-switched speech corpora | 4
The robotic-surgery propositional bank | 4
ParlaMint II: advancing comparable parliamentary corpora across Europe | 4
When MIPVU goes to no man’s land: a new language resource for hybrid, morpheme-based metaphor identification in Hungarian | 4
How different is different? Systematically identifying distribution shifts and their impacts in NER datasets | 4
A survey on geocoding: algorithms and datasets for toponym resolution | 4
Speech acts in the Dutch COVID-19 Press Conferences | 4
The ParlaMint corpora of parliamentary proceedings | 4
Democratizing neural machine translation with OPUS-MT | 4
From greatest simplicity to full power | 4
Hope speech detection in Spanish | 4
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 3
AC-IQuAD: Automatically Constructed Indonesian Question Answering Dataset by Leveraging Wikidata | 3
A morphologically annotated longitudinal corpus of spoken Czech child–adult interactions | 3
The narratives of war (NoW) corpus of written testimonies of the Russia-Ukraine war | 3
Correction to: Investigating droplet emission during speech interaction | 3
Semantic processing for Urdu: corpus creation, parsing, and generation | 3
KurdiSent: a corpus for kurdish sentiment analysis | 3
Normalized dataset for Sanskrit word segmentation and morphological parsing | 3
Evaluation of end-to-end continuous spanish lipreading in different data conditions | 3
Exploratory Analysis of Rinconada Bikol Language-Nabua Text Corpus | 3
Automatic genre identification: a survey | 3
The WASABI song corpus and knowledge graph for music lyrics analysis | 3
VeLeRo: an inflected verbal lexicon of standard Romanian and a quantitative analysis of morphological predictability | 3
Umigon-lexicon: rule-based model for interpretable sentiment analysis and factuality categorization | 3
Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 3
Correction to: Two sepedi-english code-switched speech corpora | 2
Jira: a Central Kurdish speech recognition system, designing and building speech corpus and pronunciation lexicon | 2
RUN-AS: a novel approach to annotate news reliability for disinformation detection | 2
Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 2
Brazilian Portuguese corpora for teaching and translation: the CoMET project | 2
Universal Dependencies for Mandarin Chinese | 2
The DELAD initiative for sharing language resources on speech disorders | 2
Between welcome culture and border fence | 2
Depression symptoms modelling from social media text: an LLM driven semi-supervised learning approach | 2
The Electronic Corpus of 17th- and 18th-century Polish Texts | 2
Analyzing learner language: the case of the Hebrew Learner Essay Corpus | 2
NewsCom-TOX: a corpus of comments on news articles annotated for toxicity in Spanish | 2
Developing and testing syllabification systems for South African Sesotho | 2
Detecting explicit lyrics: a case study in Italian music | 2
Evaluation of a rule-based approach to automatic factual question generation using syntactic and semantic analysis | 2
Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 2
Constructing a cross-document event coreference corpus for Dutch | 2
Usage disambiguation of Turkish discourse connectives | 2
TIARA 2.0: an interactive tool for annotating discourse structure and text improvement | 2
The Visual Language Research Corpus (VLRC): an annotated corpus of comics from Asia, Europe, and the United States | 2
Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach | 2
Automatic language identification: a case study of Pahari languages | 2
Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 2
A semi-supervised method to generate a persian dataset for suggestion classification | 2
SetembroBR: a social media corpus for depression and anxiety disorder prediction | 2
LexO: an open-source system for managing OntoLex-Lemon resources | 1
Spontaneous, controlled acts of reference between friends and strangers | 1
Building the VisSE Corpus of Spanish SignWriting | 1
NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links | 1
Predicting lexical complexity in English texts: the Complex 2.0 dataset | 1
The language of discrimination: assessing attention discrimination by Hungarian local governments | 1
A flexible tool for a qualia-enriched FrameNet: the FrameNet Brasil WebTool | 1
Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 1
The Najdi Arabic Corpus: a new corpus for an underrepresented Arabic dialect | 1
SCTB-V2: the 2nd version of the Chinese treebank in the scientific domain | 1
Cantonese natural language processing in the transformers era: a survey and current challenges | 1
Which words are important?: an empirical study of Assamese sentiment analysis | 1
Detection of political hate speech in Korean language | 1
Rapidly developing NLP applications for content curation | 1
SOLD: Sinhala offensive language dataset | 1
A corpus of English learners with Arabic and Hebrew backgrounds | 1
Multi-domain adaptation for named entity recognition with multi-aspect relevance learning | 1
Low resource language specific pre-processing and features for sentiment analysis task | 1
CachacaNER: a dataset for named entity recognition in texts about the cachaça beverage | 1
Automatic generation of creative text in Portuguese: an overview | 1
The development of a labelled te reo Māori–English bilingual database for language technology | 1
Automatic construction of direction-aware sentiment lexicon using direction-dependent words | 1
COLLIE: a broad-coverage ontology and lexicon of verbs in English | 1
CsFEVER and CTKFacts: acquiring Czech data for fact verification | 1
Text complexity of open educational resources in Portuguese: mixing written and spoken registers in a multi-task approach | 1
MedicalCare: building and annotating an empathy-rich corpus | 1
Understanding conversational interaction in multiparty conversations: the EVA Corpus | 1
Data-driven dependency parsing of Vedic Sanskrit | 1
Umplc: the first longitudinal learner corpus of Portuguese | 1
An eye-tracking-with-EEG coregistration corpus of narrative sentences | 1
Towards the benchmarking of question generation: introducing the Monserrate corpus | 1
A comparative evaluation and analysis of three generations of Distributional Semantic Models | 1
A new corpus of geolocated ASR transcripts from Germany | 1
Beyond plain toxic: building datasets for detection of flammable topics and inappropriate statements | 1
Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency | 1
Construction of Amharic information retrieval resources and corpora | 1
Sentiment analysis dataset in Moroccan dialect: bridging the gap between Arabic and Latin scripted dialect | 1
Correction to: Universal Dependencies for Mandarin Chinese | 1
Correction: Cross-linguistically consistent semantic and syntactic annotation of child-directed speech | 1
Blackfoot Words: a database of Blackfoot lexical forms | 1
Improving Arabic sentiment analysis across context-aware attention deep model based on natural language processing | 1
The semantically annotated corpus of Polish quantificational expressions | 1
The C-ORAL-ESQ project: a corpus for the study of spontaneous speech of individuals with schizophrenia | 1
ArgRewrite V.2: an annotated argumentative revisions corpus | 1
A sentiment corpus for the cryptocurrency financial domain: the CryptoLin corpus | 1
A study on methods for revising dependency treebanks: in search of gold | 1
Evaluation of the Brazilian Portuguese version of linguistic inquiry and word count 2015 (BP-LIWC2015) | 1
Historical Portuguese corpora: a survey | 1
A corpus of Persian literary text | 1
A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus | 1
A new evaluation method: evaluation data and metrics for Chinese grammatical error correction | 1
Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning | 1
EmoTwiCS: a corpus for modelling emotion trajectories in Dutch customer service dialogues on Twitter | 1
Approaches to sentiment analysis of Hungarian political news at the sentence level | 0
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese | 0
Correction: The DELAD initiative for sharing language resources on speech disorders | 0
Semantic search as extractive paraphrase span detection | 0
NILC-Metrix: assessing the complexity of written and spoken language in Brazilian Portuguese | 0
adaptNMT: an open-source, language-agnostic development environment for neural machine translation | 0
Infectious risk events and their novelty in event-based surveillance: new definitions and annotated corpus | 0
Faux Hate: unravelling the web of fake narratives in spreading hateful stories: a multi-label and multi-class dataset in cross-lingual Hindi-English code-mixed text | 0
Ulysses Tesemõ: a new large corpus for Brazilian legal and governmental domain | 0
Features in extractive supervised single-document summarization: case of Persian news | 0
Manipuri–English comparable corpus for cross-lingual studies | 0
Lexical modeling for the development of Amharic automatic speech recognition systems | 0
Chinese-DiMLex: a lexicon of Chinese discourse connectives | 0
Unparalleled sarcasm: a framework of parallel deep LSTMs with cross activation functions towards detection and generation of sarcastic statements | 0
MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish | 0
The limitations of irony detection in Dutch social media | 0
Benchmark of public intent recognition services | 0
Šolar, the developmental corpus of Slovene | 0
The eHRI database: a multimodal database of engagement in human–robot interactions | 0
CINWA (database of terminology for cultivated plants in indigenous languages of northwestern South America): introducing a resource for research in ethnobiology, anthropology, historical linguistics, … | 0
Data augmentation and transfer learning for cross-lingual Named Entity Recognition in the biomedical domain | 0
Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale | 0
DeepMine-multi-TTS: a Persian speech corpus for multi-speaker text-to-speech | 0
Statistical quality estimation for partially subjective classification tasks through crowdsourcing | 0
A comparative analysis of encoder only and decoder only models in intent classification and sentiment analysis: navigating the trade-offs in model size and performance | 0
Automatic dependency parsing of Estonian: what linguistic features to include? | 0
Sentence boundary detection of various forms of Tunisian Arabic | 0
Introducing a Swahili social media sentiment analysis dataset for the telecom industry | 0
The Italian Roots in Australian Soil (IRIAS) multilingual speech corpus. Speech variation in two generations of Italo-Australians | 0
PARSEME-AR: Arabic reference corpus for multiword expressions using PARSEME annotation guidelines | 0
Determinants of grader agreement: an analysis of multiple short answer corpora | 0
POMET: a corpus for poetic meter classification | 0
Mismatching-aware unsupervised translation quality estimation for low-resource languages | 0
LanguageCrawl: a generic tool for building language models upon common Crawl | 0
Perspectivist approaches to natural language processing: a survey | 0
An integrated framework for emotion and sentiment analysis in Tamil and Malayalam visual content | 0
PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects | 0
UHated: hate speech detection in Urdu language using transfer learning | 0
Broad coverage emotion annotation | 0
Syntactic annotation for Portuguese corpora: standards, parsers, and search interfaces | 0
Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology | 0
Using contrastive language-image pre-training for Thai recipe recommendation | 0
TCMeta: a multilingual dataset of COVID tweets for relation-level metaphor analysis | 0
Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha | 0
Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese | 0
An aligned corpus of Spanish bibles | 0
The Mandarin Chinese speech database: a corpus of 18,820 auditory neutral nonsense sentences | 0
The UAN Colombian co-speech gesture corpus | 0
Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges | 0
UFLA-FORMS: an academic forms dataset for information extraction in the Portuguese language | 0
Design and construction of Guayaquil radio speech corpus (CHARG) | 0
Human-inspired computational models for European Portuguese: a review | 0
Automatic readability assessment for sentences: neural, hybrid and large language models | 0
A benchmark dataset and evaluation methodology for Chinese zero pronoun translation | 0
LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI | 0
FullStop: punctuation and segmentation prediction for Dutch with transformers | 0
Training and evaluation of vector models for Galician | 0
Disfluency annotated corpora for Indian English in technical domains | 0
Semi-automation of gesture annotation by machine learning and human collaboration | 0
Speech emotion recognition for the Urdu language | 0
Sentiment analysis in low-resource contexts: BERT’s impact on Central Kurdish | 0
Correction to: Semi-automation of gesture annotation by machine learning and human collaboration | 0
BRISE-plandok: a German legal corpus of building regulations | 0
VeLeSpa: An inflected verbal lexicon of Peninsular Spanish and a quantitative analysis of paradigmatic predictability | 0
Dataset on sentiment-based cryptocurrency-related news and tweets in English and Malay language | 0
Exploring lexical factors in semantic annotation: insights from the classification of nouns in French | 0
A survey and study impact of tweet sentiment analysis via transfer learning in low resource scenarios | 0
Textflows: an open science NLP evaluation approach | 0
Parlamint-it: an 18-karat UD treebank of Italian parliamentary speeches | 0
OMCD: Offensive Moroccan Comments Dataset | 0
Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool | 0
A rich task-oriented dialogue corpus in Vietnamese | 0
Using BERT models for breast cancer diagnosis from Turkish radiology reports | 0
Resources for Turkish natural language processing: A critical survey | 0
“You’ll be a nurse, my son!” Automatically assessing gender biases in autoregressive language models in French and Italian | 0
JWSAN: Japanese word similarity and association norm | 0
Finnish parliament ASR corpus | 0
Mining culture from professional discourse: a lexicon-based hybrid method | 0
OLID-BR: offensive language identification dataset for Brazilian Portuguese | 0
DILLo: an Italian lexical database for speech-language pathologists | 0
Corpora compilation for prosody-informed speech processing | 0
Fine-tuning language models to recognize semantic relations | 0
Machine translation in society: insights from UK users | 0
The link between translation difficulty and the quality of machine translation: a literature review and empirical investigation | 0
Uzbek news corpus for named entity recognition | 0
Book Review: the Routledge Handbook of Translation and Ethics | 0
A sequence labelling approach for automatic analysis of ello: tagging pronouns, antecedents, and connective phrases | 0
A multilingual, multimodal dataset of aggression and bias: the ComMA dataset | 0
Register identification from the unrestricted open Web using the Corpus of Online Registers of English | 0
Research on translation quality self-evaluation by expert translators: an empirical study | 0
Correction to: Resources for Turkish natural language processing: A critical survey | 0
CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese | 0
Rei Miyata: controlled document authoring in a machine translation age | 0
Making the most of comparable corpora in Neural Machine Translation: a case study | 0
Improving irony speech spreaders profiling on social networks using clustering & transformer based models | 0
RastrOS Project: Natural Language Processing contributions to the development of an eye-tracking corpus with predictability norms for Brazilian Portuguese | 0
Human–robot dialogue annotation for multi-modal common ground | 0
Correction: COLLIE: a broad-coverage ontology and lexicon of verbs in English | 0
DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels | 0
Utilizing phonetic similarity for cross-source and cross-language toponym matching: a benchmark and prototype | 0
Managing, storing, and sharing long-form recordings and their annotations | 0
Conducting sentiment analysis: Lei L. & Liu D. Elements in Corpus Linguistics, CUP | 0
Computational approaches to Portuguese: introduction to the special issue | 0
Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations | 0
Content-free speech activity records: interviews with people with schizophrenia | 0
Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 0
From extended chunking to dependency parsing using traditional Arabic grammar | 0
Czech news dataset for semantic textual similarity | 0
Text augmentation for semantic frame induction and parsing | 0
Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 0
Sanitization of septic news sentences through hybrid approach in English | 0