OOIR: Observatory of International Research

Papers

(The TQCC of Empirical Software Engineering is 9. The table below lists the papers that meet or exceed that threshold, based on CrossRef citation counts [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-11-01 to 2025-11-01.)

Article	Citations
Introduction to the special issue on program comprehension	242
Consensus task interaction trace recommender to guide developers’ software navigation	110
Toward effective secure code reviews: an empirical study of security-related coding weaknesses	83
Understanding the characteristics and the role of visual issue reports	77
Underproduction analysis of open source software	70
Effects of variability in models: a family of experiments	70
An empirical study on the effectiveness of large language models for SATD identification and classification	60
Bugs in machine learning-based systems: a faultload benchmark	54
Shaky structures: The wobbly world of causal graphs in software analytics	50
TestEvoViz: visualizing genetically-based test coverage evolution	50
(In)Security of mobile apps in developing countries: a systematic literature review	49
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata	45
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests	40
Does the first response matter for future contributions? A study of first contributions	39
The human experience of comprehending source code in virtual reality	35
Optimal priority assignment for real-time systems: a coevolution-based approach	34
On the adoption and effects of source code reuse on defect proneness and maintenance effort	33
A study of documentation for software architecture	33
Path context augmented statement and network for learning programs	33
Can static analysis tools find more defects?	33
Evaluating few-shot and contrastive learning methods for code clone detection	33
Seeing the invisible: test prioritization for object detection system	32
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration	31
Developers’ perception matters: machine learning to detect developer-sensitive smells	30
Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub	30
Towards cost-benefit evaluation for continuous software engineering activities	29
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow	29
Testing the past: can we still run tests in past snapshots for Java projects?	29
Deep learning based identification of inconsistent method names: How far are we?	27
The impact of the COVID-19 pandemic on women’s contribution to public code	27
Smells in system user interactive tests	27
A fine-grained taxonomy of code review feedback in TypeScript projects	27
The impact of class imbalance techniques on crashing fault residence prediction models	26
Evaluating the impact of flaky simulators on testing autonomous driving systems	26
On the use of commit-relevant mutants	26
Cross-status communication and project outcomes in OSS development	25
On the impact of security vulnerabilities in the npm and RubyGems dependency networks	25
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study	25
Automated test generation for Scratch programs	25
The Influence of Code Comments on the Perceived Helpfulness of Stack Overflow Posts	24
Automatic prediction of rejected edits in Stack Overflow	24
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering	24
BTLink : automatic link recovery between issues and commits based on pre-trained BERT model	23
App review driven collaborative bug finding	23
An empirical evaluation of a novel domain-specific language – modelling vehicle routing problems with Athos	22
AI support for data scientists: An empirical study on workflow and alternative code recommendations	22
An empirical study of untangling patterns of two-class dependency cycles	21
Understanding practitioners’ reasoning and requirements for efficient tool support in technical debt management	21
The effect of stereotypes on perceived competence of indigenous software practitioners: a study of dress style in professional photos	21
Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements	21
JNFuzz-Droid: a lightweight fuzzing and taint analysis framework for native code of Android applications	21
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects	20
Static detection of equivalent mutants in real-time model-based mutation testing	20
A grounded theory of community package maintenance organizations	20
Securing dependencies: A comprehensive study of Dependabot’s impact on vulnerability mitigation	20
Advantages and disadvantages of (dedicated) model transformation languages	20
How far are app secrets from being stolen? a case study on android	20
Code reviews in open source projects : how do gender biases affect participation and outcomes?	20
Why android app testing falls short: empirical insights from open-source projects and a practitioner survey	19
A configurable method for benchmarking scalability of cloud-native applications	19
The well-being of software engineers: a systematic literature review and a theory	19
Visualizing the customization endeavor in product-based-evolving software product lines: a case of action design research	19
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity	19
Experimental comparison of features, analyses, and classifiers for Android malware detection	19
A large-scale empirical study of commit message generation: models, datasets and evaluation	19
An empirical study of the impact of log parsers on the performance of log-based anomaly detection	19
Engineering recommender systems for modelling languages: concept, tool and evaluation	18
How far are we with automated machine learning? characterization and challenges of AutoML toolkits	18
Demystifying regular expression bugs	17
Software product line testing: a systematic literature review	17
A metrics-based approach for selecting among various refactoring candidates	17
Lightweight dynamic build batching algorithms for continuous integration	17
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes	17
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction	17
Patterns of multi-container composition for service orchestration with Docker Compose	16
An empirical study on the potential of word embedding techniques in bug report management tasks	16
Towards a recipe for language decomposition: quality assessment of language product lines	16
Take a deep breath: Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments	16
Semantic matching in GUI test reuse	15
Software testing in the machine learning era	15
When less is more: on the value of “co-training” for semi-supervised software defect predictors	15
Mastering uncertainty in performance estimations of configurable software systems	15
An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction	15
On the Investigation of Empirical Contradictions - Aggregated Results of Local Studies on Readability and Comprehensibility of Source Code	15
LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction	15
Gamification in software engineering: the mediating role of developer engagement and job satisfaction	14
Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study	14
OpTrans: enhancing binary code similarity detection with function inlining re-optimization	14
Präzi: from package-based to call-based dependency networks	14
What kinds of contracts do ML APIs need?	14
Enhanced SQL error messages facilitate faster error fixing	14
Common challenges of deep reinforcement learning applications development: an empirical study	14
RAG-Driven multiple assertions generation with large language models	14
Language usage analysis for EMF metamodels on GitHub	14
Comparing effectiveness and efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools in a large java-based system	14
Applying bayesian data analysis for causal inference about requirements quality: a controlled experiment	13
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts	13
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?	13
Semantically-enhanced topic recommendation systems for software projects	13
Toward granular search-based automatic unit test case generation	13
Which design decisions in AI-enabled mobile applications contribute to greener AI?	13
A zero-shot framework for cross-project vulnerability detection in source code	13
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP	13
Correction to: Examining ownership models in software teams	13
Measuring SES-related traits relating to technology usage: Two validated surveys	13
Prioritizing test cases for deep learning-based video classifiers	13
Towards understanding the challenges of bug localization in deep learning systems	13
Program transformation landscapes for automated program modification using Gin	13
Why secret detection tools are not enough: It’s not just about false positives - An industrial case study	13
Test smells 20 years later: detectability, validity, and reliability	13
Challenges and practices of deep learning model reengineering: A case study on computer vision	13
Defect prediction using deep learning with Network Portrait Divergence for software evolution	13
Exploring the black box: analysing explainable AI challenges and best practices through stack exchange discussions	13
Test schedule generation for acceptance testing of mission-critical satellite systems	12
An empirical study on self-admitted technical debt in Dockerfiles	12
DDImage: an image reduction based approach for automatically explaining black-box classifiers	12
Fixing Dockerfile smells: an empirical study	12
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions	12
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines	12
On the spread and evolution of dead methods in Java desktop applications: an exploratory study	12
Demystifying API misuses in deep learning applications	12
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19	11
Cross-project defect prediction via semantic and syntactic encoding	11
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery	11
A fine-grained data set and analysis of tangling in bug fixing commits	11
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems	11
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension	11
Experimental Evaluation of a Checklist-Based Inspection Technique to Verify the Compliance of Software Systems with the Brazilian General Data Protection Law	11
Styler: learning formatting conventions to repair Checkstyle violations	11
When uncertainty leads to unsafety: Empirical insights into the role of uncertainty in unmanned aerial vehicle safety	11
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop	11
Learning to Predict Code Review Completion Time In Modern Code Review	11
Explainable automated debugging via large language model-driven scientific debugging	11
Static analysis driven enhancements for comprehension in machine learning notebooks	11
Unveiling overlooked performance variance in serverless computing	11
Modeling function-level interactions for file-level bug localization	11
CyberSAGE: The cyber security argument graph evaluation tool	11
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities	10
A qualitative study on refactorings induced by code review	10
Predicting merge conflicts considering social and technical assets	10
Story points changes in agile iterative development	10
Detecting data manipulation errors in android applications using scene-guided exploration	10
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months	10
Transformer-based code model with compressed hierarchy representation	10
Studying differentiated code to support smart contract update	10
A controlled experiment on the impact of microtasking on programming	10
SoftNER: Mining knowledge graphs from cloud incidents	10
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets	10
Model vs system level testing of autonomous driving systems: a replication and extension study	10
Refactoring practices in the context of data-intensive systems	10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models	10
A comprehensive overview of software product management challenges	10
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively	10
Understanding and effectively mitigating code review anxiety	10
Studying the characteristics of AIOps projects on GitHub	10
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness	10
Inter-team communication in large-scale co-located software engineering: a case study	10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues	10
Industrial adoption of machine learning techniques for early identification of invalid bug reports	9
Correction to: Why do companies create and how do they succeed with a vendor-led open source foundation	9
What happens in my code reviews? An investigation on automatically classifying review changes	9
Come for syntax, stay for speed, write secure code: an empirical study of security weaknesses in Julia programs	9
Reuse and maintenance practices among divergent forks in three software ecosystems	9
Agile software development one year into the COVID-19 pandemic	9
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow	9
Assessing the exposure of software changes	9
A comprehensive study of machine learning techniques for log-based anomaly detection	9
Software selection in large-scale software engineering: A model and criteria based on interactive rapid reviews	9
Extracting enhanced artificial intelligence model metadata from software repositories	9
Can search-based testing with pareto optimization effectively cover failure-revealing test inputs?	9
Understanding refactorings in Elixir functional language	9
Toward a theory on programmer’s block inspired by writer’s block	9
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations	9
Automatic bi-modal question title generation for Stack Overflow with prompt learning	9
How programmers find online learning resources	9
An empirical study of the systemic and technical migration towards microservices	9
On the assignment of commits to releases	9
Machine learning-based test smell detection	9
Hyperfuzzing: black-box security hypertesting with a grey-box fuzzer	9
IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction	9
Multi-granular software annotation using file-level weak labelling	9
Navigating fairness: practitioners’ understanding, challenges, and strategies in AI/ML development	9
Deep learning approaches for bad smell detection: a systematic literature review	9
Predicting the objective and priority of issue reports in software repositories	9