Language Resources and Evaluation

Papers
(The TQCC of Language Resources and Evaluation is 2. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-05-01 to 2024-05-01.)
| Article | Citations |
| --- | --- |
| Resources and benchmark corpora for hate speech detection: a systematic review | 120 |
| Machine translation systems and quality assessment: a systematic review | 42 |
| DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text | 36 |
| Investigating the effects of gender, dialect, and training size on the performance of Arabic speech recognition | 21 |
| The Natural Stories corpus: a reading-time corpus of English texts containing rare syntactic constructions | 19 |
| A comparative evaluation and analysis of three generations of Distributional Semantic Models | 17 |
| Current limitations in cyberbullying detection: On evaluation criteria, reproducibility, and data scarcity | 16 |
| The ParlaMint corpora of parliamentary proceedings | 15 |
| A large English–Thai parallel corpus from the web and machine-generated text | 14 |
| SENTiVENT: enabling supervised information extraction of company-specific events in economic and financial news | 14 |
| Low resource language specific pre-processing and features for sentiment analysis task | 13 |
| Automatic genre identification: a survey | 12 |
| Roman Urdu toxic comment classification | 12 |
| AI2D-RST: a multimodal corpus of 1000 primary school science diagrams | 11 |
| Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale | 11 |
| Machine translation in society: insights from UK users | 11 |
| C2SI corpus: a database of speech disorder productions to assess intelligibility and quality of life in head and neck cancers | 10 |
| Developing computational infrastructure for the CorCenCC corpus: The National Corpus of Contemporary Welsh | 8 |
| The Electronic Corpus of 17th- and 18th-century Polish Texts | 8 |
| Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks | 7 |
| The impact of preprocessing on word embedding quality: a comparative study | 7 |
| Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool | 6 |
| MEmoFC: introducing the Multilingual Emotional Football Corpus | 6 |
| The KAS corpus of Slovenian academic writing | 6 |
| Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents | 6 |
| A large and evolving cognate database | 6 |
| ViMs: a high-quality Vietnamese dataset for abstractive multi-document summarization | 6 |
| Improvement of sentiment analysis via re-evaluation of objective words in SenticNet for hotel reviews | 6 |
| Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology | 6 |
| SetembroBR: a social media corpus for depression and anxiety disorder prediction | 5 |
| Investigating the role of swear words in abusive language detection tasks | 5 |
| LDC-IL: The Indian repository of resources for language technology | 5 |
| Commonsense based text mining on urban policy | 5 |
| TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese | 5 |
| Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection | 5 |
| Finnish parliament ASR corpus | 4 |
| Abstractive text summarization and new large-scale datasets for agglutinative languages Turkish and Hungarian | 4 |
| Labelling the past: data set creation and multi-label classification of Dutch archaeological excavation reports | 4 |
| Detecting explicit lyrics: a case study in Italian music | 4 |
| Representing variation in a spoken corpus of an endangered dialect: the case of Torlak | 4 |
| DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels | 3 |
| Making the most of comparable corpora in Neural Machine Translation: a case study | 3 |
| Register identification from the unrestricted open Web using the Corpus of Online Registers of English | 3 |
| Annotating affective dimensions in user-generated content | 3 |
| TuLeD (Tupían lexical database): introducing a database of a South American language family | 3 |
| LexO: an open-source system for managing OntoLex-Lemon resources | 3 |
| Sentence boundary detection of various forms of Tunisian Arabic | 3 |
| Resources for Turkish natural language processing: A critical survey | 3 |
| Constructing Arabic Reading Comprehension Datasets: Arabic WikiReading and KaifLematha | 3 |
| Arabic real time entity resolution using inverted indexing | 3 |
| A multi-source entity-level sentiment corpus for the financial domain: the FinLin corpus | 3 |
| Linguistic resources for paraphrase generation in portuguese: a lexicon-grammar approach | 3 |
| LanguageCrawl: a generic tool for building language models upon common Crawl | 3 |
| PRAUTOCAL corpus: a corpus for the study of Down syndrome prosodic aspects | 3 |
| Two languages, one treebank: building a Turkish–German code-switching treebank and its challenges | 3 |
| Towards alignment strategies in human-agent interactions based on measures of lexical repetitions | 3 |
| Predicting lexical complexity in English texts: the Complex 2.0 dataset | 3 |
| The robotic-surgery propositional bank | 3 |
| A Spanish dataset for reproducible benchmarked offline handwriting recognition | 2 |
| Semi-automation of gesture annotation by machine learning and human collaboration | 2 |
| Składnica: a constituency treebank of Polish harmonised with the Walenty valency dictionary | 2 |
| Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations | 2 |
| Unparalleled sarcasm: a framework of parallel deep LSTMs with cross activation functions towards detection and generation of sarcastic statements | 2 |
| ArgRewrite V.2: an annotated argumentative revisions corpus | 2 |
| Corpus tools for parallel corpora of theatre plays: an introduction to TAligner and ACM-theatre | 2 |
| Towards the benchmarking of question generation: introducing the Monserrate corpus | 2 |
| The WASABI song corpus and knowledge graph for music lyrics analysis | 2 |
| A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals | 2 |
| Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system | 2 |
| OLID-BR: offensive language identification dataset for Brazilian Portuguese | 2 |
| Corpora compilation for prosody-informed speech processing | 2 |
| Label modification and bootstrapping for zero-shot cross-lingual hate speech detection | 2 |
| Live blog summarization | 2 |
| Understanding conversational interaction in multiparty conversations: the EVA Corpus | 2 |
| The LRE Map: what does it tell us about the last decade of our field? | 2 |
| FinnSentiment: a Finnish social media corpus for sentiment polarity annotation | 2 |
| Writer’s uncertainty identification in scientific biomedical articles: a tool for automatic if-clause tagging | 2 |
| Assessment of pragmatic abilities and cognitive substrates (APACS) brief remote: a novel tool for the rapid and tele-evaluation of pragmatic skills in Italian | 2 |
| Broad coverage emotion annotation | 2 |
| Redundancy and coverage aware enriched dragonfly-FL single document summarization | 2 |
| Semantics-aware typographical choices via affective associations | 2 |
| Sense representations for Portuguese: experiments with sense embeddings and deep neural language models | 2 |
| Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency | 2 |
| Development and evaluation of an Urdu treebank (CLE-UTB) and a statistical parser | 2 |
| Harnessing Indigenous Tweets: The Reo Māori Twitter corpus | 2 |