Empirical Software Engineering

Papers
(The TQCC of Empirical Software Engineering is 9. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-05-01 to 2026-05-01.)
Article | Citations
Introduction to the special issue on program comprehension | 101
Consensus task interaction trace recommender to guide developers’ software navigation | 84
TestEvoViz: visualizing genetically-based test coverage evolution | 83
Shaky structures: The wobbly world of causal graphs in software analytics | 77
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests | 69
Does the first response matter for future contributions? A study of first contributions | 63
Underproduction analysis of open source software | 55
The human experience of comprehending source code in virtual reality | 52
Security by documentation? characterizing GitHub SECURITY.md policy and their adoption in Python libraries | 48
The design space of lockfiles across package managers | 47
(In)Security of mobile apps in developing countries: a systematic literature review | 44
Seeing the invisible: test prioritization for object detection system | 41
Optimal priority assignment for real-time systems: a coevolution-based approach | 40
Can static analysis tools find more defects? | 39
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata | 39
Evaluating few-shot and contrastive learning methods for code clone detection | 38
Fuzzing-based mutation testing of C/C++ software in cyber-physical systems | 38
Understanding the characteristics and the role of visual issue reports | 37
Toward effective secure code reviews: an empirical study of security-related coding weaknesses | 36
Mitigating omitted variable bias in empirical software engineering | 35
Bugs in machine learning-based systems: a faultload benchmark | 35
On the adoption and effects of source code reuse on defect proneness and maintenance effort | 35
A study of documentation for software architecture | 34
The impact of class imbalance techniques on crashing fault residence prediction models | 32
An empirical study on the effectiveness of large language models for SATD identification and classification | 32
Developers’ perception matters: machine learning to detect developer-sensitive smells | 31
On the use of commit-relevant mutants | 31
Cross-status communication and project outcomes in OSS development | 30
The impact of the COVID-19 pandemic on women’s contribution to public code | 30
On the impact of security vulnerabilities in the npm and RubyGems dependency networks | 29
Automated test generation for Scratch programs | 29
Automatic prediction of rejected edits in Stack Overflow | 28
Smells in system user interactive tests | 28
The Influence of Code Comments on the Perceived Helpfulness of Stack Overflow Posts | 28
Output format biases in the evaluation of large language models for code translation | 27
Maintaining shared understanding of non-functional requirements in small companies using continuous software engineering | 26
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering | 25
App review driven collaborative bug finding | 25
Deep learning based identification of inconsistent method names: How far are we? | 24
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow | 24
Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub | 24
Testing the past: can we still run tests in past snapshots for Java projects? | 24
Towards cost-benefit evaluation for continuous software engineering activities | 24
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study | 23
BTLink : automatic link recovery between issues and commits based on pre-trained BERT model | 23
A fine-grained taxonomy of code review feedback in TypeScript projects | 23
Evaluating the impact of flaky simulators on testing autonomous driving systems | 23
An empirical study of untangling patterns of two-class dependency cycles | 22
The effect of stereotypes on perceived competence of indigenous software practitioners: a study of dress style in professional photos | 22
AI support for data scientists: An empirical study on workflow and alternative code recommendations | 22
A grounded theory of community package maintenance organizations | 21
A Comprehensive Study of the Lifecycle of Dormant npm Packages | 21
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity | 21
Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements | 21
Securing dependencies: A comprehensive study of Dependabot’s impact on vulnerability mitigation | 20
JNFuzz-Droid: a lightweight fuzzing and taint analysis framework for native code of Android applications | 20
How far are app secrets from being stolen? a case study on android | 20
Why android app testing falls short: empirical insights from open-source projects and a practitioner survey | 20
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects | 20
Static detection of equivalent mutants in real-time model-based mutation testing | 20
Understanding practitioners’ reasoning and requirements for efficient tool support in technical debt management | 19
Scalable hierarchical protocol format inference via feature-heuristic message delimiter | 19
The well-being of software engineers: a systematic literature review and a theory | 19
Code reviews in open source projects : how do gender biases affect participation and outcomes? | 19
Advantages and disadvantages of (dedicated) model transformation languages | 19
Experimental comparison of features, analyses, and classifiers for Android malware detection | 18
An empirical study of the impact of log parsers on the performance of log-based anomaly detection | 18
Quantifying adoption: A SEM study of quantum software technology in software development | 18
An empirical evaluation of a novel domain-specific language – modelling vehicle routing problems with Athos | 18
A large-scale empirical study of commit message generation: models, datasets and evaluation | 17
Patterns of multi-container composition for service orchestration with Docker Compose | 17
How far are we with automated machine learning? characterization and challenges of AutoML toolkits | 17
Lightweight dynamic build batching algorithms for continuous integration | 17
LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction | 17
A configurable method for benchmarking scalability of cloud-native applications | 17
A metrics-based approach for selecting among various refactoring candidates | 17
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes | 16
Validation of an analyzability model for quantum software: a family of experiments | 16
Securing LLM-in-the-loop software for empirical study of risks, mitigations, and utility trade-offs in a safety-critical case | 16
Local software buildability across Java versions | 16
Engineering recommender systems for modelling languages: concept, tool and evaluation | 16
Software product line testing: a systematic literature review | 16
An empirical study on the potential of word embedding techniques in bug report management tasks | 16
ContractFull: a rapid and comprehensive static analysis tool for Ethereum smart contracts | 15
An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction | 15
Mastering uncertainty in performance estimations of configurable software systems | 15
Software testing in the machine learning era | 15
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction | 15
Language usage analysis for EMF metamodels on GitHub | 15
Take a deep breath: Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments | 15
Tools and benchmarks evolve: what is their impact on parameter tuning in SBSE experiments? | 15
Preface to the Special Issue on Security Testing for Complex Software Systems Special Issue 1239 Editorial | 14
When less is more: on the value of “co-training” for semi-supervised software defect predictors | 14
What kinds of contracts do ML APIs need? | 14
Präzi: from package-based to call-based dependency networks | 14
OpTrans: enhancing binary code similarity detection with function inlining re-optimization | 14
Common challenges of deep reinforcement learning applications development: an empirical study | 14
Enhanced SQL error messages facilitate faster error fixing | 14
On the Investigation of Empirical Contradictions - Aggregated Results of Local Studies on Readability and Comprehensibility of Source Code | 14
Semantic matching in GUI test reuse | 14
Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study | 13
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? | 13
An exploratory study on fine-tuning large language models for secure code generation | 13
A zero-shot framework for cross-project vulnerability detection in source code | 13
Test schedule generation for acceptance testing of mission-critical satellite systems | 13
Toward granular search-based automatic unit test case generation | 13
Semantically-enhanced topic recommendation systems for software projects | 13
Test smells 20 years later: detectability, validity, and reliability | 13
RAG-Driven multiple assertions generation with large language models | 13
Defect prediction using deep learning with Network Portrait Divergence for software evolution | 13
Classifier or prompt: A case study on legal requirements traceability | 13
Which design decisions in AI-enabled mobile applications contribute to greener AI? | 13
Applying bayesian data analysis for causal inference about requirements quality: a controlled experiment | 13
Comparing effectiveness and efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools in a large java-based system | 13
Prioritizing test cases for deep learning-based video classifiers | 13
Correction to: Examining ownership models in software teams | 13
Exploring the black box: analysing explainable AI challenges and best practices through stack exchange discussions | 13
Challenges and practices of deep learning model reengineering: A case study on computer vision | 13
Measuring SES-related traits relating to technology usage: Two validated surveys | 13
An empirical study of testing practices in open source AI agent frameworks and agentic applications | 12
DDImage: an image reduction based approach for automatically explaining black-box classifiers | 12
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP | 12
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts | 12
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines | 12
Program transformation landscapes for automated program modification using Gin | 12
On the spread and evolution of dead methods in Java desktop applications: an exploratory study | 12
A fine-grained data set and analysis of tangling in bug fixing commits | 12
Towards understanding the challenges of bug localization in deep learning systems | 12
A controlled experiment on the impact of microtasking on programming | 11
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems | 11
How challenging it is to identify real code authors: an empirical study | 11
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop | 11
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions | 11
Learning to Predict Code Review Completion Time In Modern Code Review | 11
Cross-project defect prediction via semantic and syntactic encoding | 11
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities | 11
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 11
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19 | 11
Demystifying API misuses in deep learning applications | 11
KPIRoot+: An efficient integrated framework for anomaly detection and root cause analysis in large-scale cloud systems | 11
Styler: learning formatting conventions to repair Checkstyle violations | 11
Fixing Dockerfile smells: an empirical study | 11
CyberSAGE: The cyber security argument graph evaluation tool | 11
Explainable automated debugging via large language model-driven scientific debugging | 11
Modeling function-level interactions for file-level bug localization | 11
On detection latencies of network intrusion detectors – discussion and application | 11
Automated detection of algorithm debt in deep learning frameworks: an empirical study | 11
Experimental Evaluation of a Checklist-Based Inspection Technique to Verify the Compliance of Software Systems with the Brazilian General Data Protection Law | 11
Static analysis driven enhancements for comprehension in machine learning notebooks | 11
When uncertainty leads to unsafety: Empirical insights into the role of uncertainty in unmanned aerial vehicle safety | 11
A comprehensive overview of software product management challenges | 11
A qualitative study on refactorings induced by code review | 10
Navigating fairness: practitioners’ understanding, challenges, and strategies in AI/ML development | 10
Unveiling overlooked performance variance in serverless computing | 10
Detecting data manipulation errors in android applications using scene-guided exploration | 10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues | 10
Assessing the exposure of software changes | 10
Understanding and effectively mitigating code review anxiety | 10
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness | 10
Predicting merge conflicts considering social and technical assets | 10
Studying differentiated code to support smart contract update | 10
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow | 10
Investigating cross-market android apps: Security, protection, and components | 10
Studying the characteristics of AIOps projects on GitHub | 10
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months | 10
Detecting API compatibility issues of android applications based on screen transition graphs | 10
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension | 10
Transformer-based code model with compressed hierarchy representation | 10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models | 10
Agile software development one year into the COVID-19 pandemic | 10
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations | 9
Story points changes in agile iterative development | 9
Can search-based testing with pareto optimization effectively cover failure-revealing test inputs? | 9
Peer-aided repairer: empowering large language models to repair advanced student assignments | 9
Developers and generative AI: A study of self-admitted usage in open source projects | 9
An efficient model maintenance approach for MLOps | 9
Industrial adoption of machine learning techniques for early identification of invalid bug reports | 9
A comprehensive study of machine learning techniques for log-based anomaly detection | 9
An empirical study of the systemic and technical migration towards microservices | 9
Software selection in large-scale software engineering: A model and criteria based on interactive rapid reviews | 9
Refactoring practices in the context of data-intensive systems | 9
Automatic bi-modal question title generation for Stack Overflow with prompt learning | 9
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews | 9
On the assignment of commits to releases | 9
Correction to: Utilization of pre-trained language models for adapter-based knowledge transfer in software engineering | 9
Deep learning approaches for bad smell detection: a systematic literature review | 9
GenCode: A generic data augmentation framework for boosting deep learning-based code understanding | 9
Extracting enhanced artificial intelligence model metadata from software repositories | 9
Machine learning-based test smell detection | 9
Correction to: Why do companies create and how do they succeed with a vendor-led open source foundation | 9
How programmers find online learning resources | 9
Hyperfuzzing: black-box security hypertesting with a grey-box fuzzer | 9
Model vs system level testing of autonomous driving systems: a replication and extension study | 9
IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction | 9
Multi-granular software annotation using file-level weak labelling | 9
What characteristics make ChatGPT effective for software issue resolution? An empirical study of task, project, and conversational signals in GitHub issues | 9
Come for syntax, stay for speed, write secure code: an empirical study of security weaknesses in Julia programs | 9
Can generative AI bridge the gap? A quasi-experimental study of non-programmers with AI vs. programmers without AI | 9
“What really happened to my models?” Extending co-evolution with cross-layer traceability in metamodel-model histories | 9
The whos, whats, and whys of issues related to personal data and data protection in open-source projects on GitHub | 9
Understanding refactorings in Elixir functional language | 9
Toward a theory on programmer’s block inspired by writer’s block | 9