OOIR: Observatory of International Research

Papers

(The median citation count of Transactions of the Association for Computational Linguistics is 4. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-01-01 to 2026-01-01.)
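
The selection rule in the note above can be sketched roughly as follows. This is a minimal illustration, not OOIR's actual implementation: the function name `select_papers`, the tuple layout, the half-open date window, and computing the median over the in-window papers are all assumptions, and the sample records are hypothetical placeholders rather than real data.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, limit=250):
    """Return (threshold, kept) for a list of (title, citations, published) tuples.

    Assumptions (not confirmed by the source): the window is start <= published < end,
    the median is taken over the in-window papers, and ties at the median are kept.
    """
    # Keep only papers published inside the date window.
    in_window = [p for p in papers if start <= p[2] < end]
    # The median citation count serves as the inclusion threshold.
    threshold = median(c for _, c, _ in in_window)
    # Keep papers at or above the threshold, most cited first, capped at `limit`.
    kept = [p for p in in_window if p[1] >= threshold]
    kept.sort(key=lambda p: p[1], reverse=True)
    return threshold, kept[:limit]


# Hypothetical placeholder records, for illustration only.
papers = [
    ("Paper A", 10, date(2023, 5, 1)),
    ("Paper B", 4, date(2022, 1, 1)),
    ("Paper C", 4, date(2025, 12, 31)),
    ("Paper D", 1, date(2024, 3, 15)),
    ("Paper E", 0, date(2024, 7, 9)),
    ("Paper F", 99, date(2021, 12, 31)),  # published before the window opens
]

threshold, kept = select_papers(papers, date(2022, 1, 1), date(2026, 1, 1))
for title, citations, _ in kept:
    print(f"{title}\t{citations}")
```

With these placeholder records the threshold is 4, and papers with exactly 4 citations are kept, which matches the table listing entries down to 4 citations.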

Article	Citations
Overcoming Source Object Grounding for Semantic Image Editing	539
From Robustness to Improved Generalization and Calibration in Pre-trained Language Models	258
KEFT: Knowledge-Enhanced Fine-Tuning for Large Language Models in Domain-Specific Question Answering	219
DARE: Diverse Visual Question Answering with Robustness Evaluation	218
Persona-Aware Alignment Framework for Personalized Dialogue Generation	195
Cross-functional Analysis of Generalization in Behavioral Learning	133
How to Select Datapoints for Efficient Human Evaluation of NLG Models?	99
Segmentation-Free Streaming Machine Translation	99
The Ethics of Automating Legal Actors	97
State of What Art? A Call for Multi-Prompt LLM Evaluation	96
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection	93
Transformers for Tabular Data Representation: A Survey of Models and Applications	92
The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation	78
Erasure of Unaligned Attributes from Neural Representations	77
A Survey of Text Games for Reinforcement Learning Informed by Natural Language	75
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval	69
Revisiting Meta-evaluation for Grammatical Error Correction	68
T²-NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates	60
A Survey on Automated Fact-Checking	59
Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing	58
Learning English with Peppa Pig	57
Do Multi-Document Summarization Models Synthesize?	56
Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval	55
DEAR: Disentangled Event-Agnostic Representation Learning for Early Fake News Detection	52
Uncertainty Estimation and Reduction of Pre-trained Models for Text Regression	50
Benchmarking the Generation of Fact Checking Explanations	49
Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation	49
Federated Learning for Exploiting Annotators’ Disagreements in Natural Language Processing	47
Context-Aware Machine Translation with Source Coreference Explanation	45
Learning More from Mixed Emotions: A Label Refinement Method for Emotion Recognition in Conversations	43
Time-Aware Language Models as Temporal Knowledge Bases	42
mtRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems	42
How to Dissect a Muppet: The Structure of Transformer Embedding Spaces	39
Few-Shot Multilingual Open-Domain QA from Five Examples	37
Scientia Potentia Est—On the Role of Knowledge in Computational Argumentation	36
To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation	34
Adversarial Defense without Adversarial Defense: Enhancing Language Model Robustness via Instance-level Principal Component Removal	33
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets	32
Compositional Evaluation on Japanese Textual Entailment and Similarity	32
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art	31
Morphology Without Borders: Clause-Level Morphology	27
Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation	27
Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models	26
Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition	26
Template-based Abstractive Microblog Opinion Summarization	26
An Energy-based Model for Word-level AutoCompletion in Computer-aided Translation	26
Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off	26
Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale	25
Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet?	25
ProoFVer: Natural Logic Theorem Proving for Fact Verification	25
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains	25
Are Triggers Needed for Document-Level Event Extraction?	25
Conformal Prediction for Natural Language Processing: A Survey	25
Questions Are All You Need to Train a Dense Passage Retriever	25
True Few-Shot Learning with Prompts—A Real-World Perspective	22
Prompt Contrastive Transformation: An Enhanced Strategy for Efficient Prompt Transfer in Natural Language Processing	21
Adapting to the Long Tail: A Meta-Analysis of Transfer Learning Research for Language Understanding Tasks	21
Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance in Adaptation	21
Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation	19
Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis	19
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models	18
Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation	17
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation	16
Neuron-level Interpretation of Deep NLP Models: A Survey	16
Navigating the Landscape of Hint Generation Research: From the Past to the Future	16
OpenFact: Factuality Enhanced Open Knowledge Extraction	16
Retrieve What You Need: A Mutual Learning Framework for Open-domain Question Answering	16
InSCIt: Information-Seeking Conversations with Mixed-Initiative Interactions	16
Interactive Machine Teaching by Labeling Rules and Instances	15
Sense-specific Historical Word Usage Generation	15
Efficient Long-Text Understanding with Short-Text Models	15
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models	15
A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction	14
Addressing the Binning Problem in Calibration Assessment through Scalar Annotations	14
Robust Pronoun Fidelity with English LLMs: Are they Reasoning, Repeating, or Just Biased?	13
MENLI: Robust Evaluation Metrics from Natural Language Inference	13
Explainable Abuse Detection as Intent Classification and Slot Filling	13
Objectifying the Subjective: Cognitive Biases in Topic Interpretations	13
Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations	13
ABNIRML: Analyzing the Behavior of Neural IR Models	13
BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context	13
Pre-train, Prompt, and Recommendation: A Comprehensive Survey of Language Modeling Paradigm Adaptations in Recommender Systems	13
Learning Fair Representations via Rate-Distortion Maximization	12
TaxoPro: A Plug-In LoRA-based Cross-Domain Method for Low-Resource Taxonomy Completion	12
Human Choice Prediction in Language-based Persuasion Games: Simulation-based Off-Policy Evaluation	12
Safe Pruning LoRA: Robust Distance-Guided Pruning for Safety Alignment in Adaptation of LLMs	12
Towards More Realistic Extraction Attacks: An Adversarial Perspective	11
PaniniQA: Enhancing Patient Education Through Interactive Question Answering	11
NLP Security and Ethics, in the Wild	11
Self-Rationalization in the Wild: A Large-scale Out-of-Distribution Evaluation on NLI-related tasks	11
TANQ: An Open Domain Dataset of Table Answered Questions	11
Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation	11
Is My Model Using the Right Evidence? Systematic Probes for Examining Evidence-Based Tabular Reasoning	11
Rescue Conversations from Dead-ends: Efficient Exploration for Task-oriented Dialogue Policy Optimization	11
How “Real” is Your Real-Time Simultaneous Speech-to-Text Translation System?	11
Investigating Critical Period Effects in Language Acquisition through Neural Language Models	11
Modeling Emotion Dynamics in Song Lyrics with State Space Models	11
Data-driven Parsing Evaluation for Child-Parent Interactions	10
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection	10
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark	10
Time-and-Space-Efficient Weighted Deduction	10
Data-to-text Generation with Variational Sequential Planning	10
Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation	10
Sub-Character Tokenization for Chinese Pretrained Language Models	10
Patchwise Cooperative Game-based Interpretability Method for Large Vision-language Models	10
FeTaQA: Free-form Table Question Answering	9
Benchmarking Large Language Models for News Summarization	9
Assessing the Capacity of Transformer to Abstract Syntactic Representations: A Contrastive Analysis Based on Long-distance Agreement	9
End-to-end Argument Mining with Cross-corpora Multi-task Learning	9
Large Language Models Enable Few-Shot Clustering	9
Decomposing and Recomposing Event Structure	9
Know Your Limits: A Survey of Abstention in Large Language Models	9
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding	9
Visual Spatial Reasoning	9
Step-by-Step Unmasking for Parameter-Efficient Fine-Tuning of Large Language Models	9
Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages	9
QAmeleon: Multilingual QA with Only 5 Examples	8
Diff-Explainer: Differentiable Convex Optimization for Explainable Multi-hop Inference	8
How Abstract Is Linguistic Generalization in Large Language Models? Experiments with Argument Structure	8
Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation	8
On the Effect of Instruction Tuning Loss on Generalization	8
QE4PE: Word-level Quality Estimation for Human Post-Editing	8
Evaluating Transformer Models and Human Behaviors on Chinese Character Naming	8
Abstractive Meeting Summarization: A Survey	8
Direct Speech Translation for Automatic Subtitling	8
Scope Ambiguities in Large Language Models	7
Visually Grounded Speech Models Have a Mutual Exclusivity Bias	7
Can Authorship Representation Learning Capture Stylistic Features?	7
The Parallelism Tradeoff: Limitations of Log-Precision Transformers	7
CreoleVal: Multilingual Multitask Benchmarks for Creoles	7
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision	7
♫ MuSiQue: Multihop Questions via Single-hop Question Composition	7
A Cross-Linguistic Pressure for Uniform Information Density in Word Order	7
Hallucinations in Large Multilingual Translation Models	7
Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation	7
Conformalizing Machine Translation Evaluation	7
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond	6
A Multi-Level Optimization Framework for End-to-End Text Augmentation	6
Collective Human Opinions in Semantic Textual Similarity	6
A Comparative Approach for Auditing Multilingual Phonetic Transcript Archives	6
Chinese Idiom Paraphrasing	6
The Emergence of Argument Structure in Artificial Languages	6
Robust Dialogue State Tracking with Weak Supervision and Sparse Data	6
Expectations over Unspoken Alternatives Predict Pragmatic Inferences	6
Cultural Adaptation of Recipes	6
Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences	6
How Much Semantic Information is Available in Large Language Model Tokens?	5
Meta-Learning a Cross-lingual Manifold for Semantic Parsing	5
Hierarchical Indexing for Retrieval-Augmented Opinion Summarization	5
Lost in the Middle: How Language Models Use Long Contexts	5
Compositional Generalization in Multilingual Semantic Parsing over Wikidata	5
STPar: A Structure-Aware Triaffine Parser for Screenplay Character Coreference Resolution	5
Document Summarization with Latent Queries	5
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends, and Metrics Analysis	5
ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation	5
mGPT: Few-Shot Learners Go Multilingual	5
Comparing Humans and Large Language Models on an Experimental Protocol Inventory for Theory of Mind Evaluation (EPITOME)	5
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering	5
A Unifying Scheme for Extractive Content Selection Tasks	5
AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR	5
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue	4
Relational Memory-Augmented Language Models	4
Decision-Oriented Dialogue for Human-AI Collaboration	4
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval	4
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns	4
Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inference	4
Saturated Transformers are Constant-Depth Threshold Circuits	4
Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing	4
Sentence Similarity Based on Contexts	4
Hate Speech Classifiers Learn Normative Social Stereotypes	4
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery	4
Investigating Reasons for Disagreement in Natural Language Inference	4
Shared Lexical Items as Triggers of Code Switching	4
Can Authorship Attribution Models Distinguish Speakers in Speech Transcripts?	4
Do LLMs Exhibit Human-like Response Biases? A Case Study in Survey Design	4
Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension	4
Naturalistic Causal Probing for Morpho-Syntax	4
KoBBQ: Korean Bias Benchmark for Question Answering	4
FoVer: First-Order Logic Verification for Natural Language Reasoning	4
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions	4
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing	4