Empirical Software Engineering

Papers
(The median citation count of Empirical Software Engineering is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Unveiling overlooked performance variance in serverless computing | 173
Assessing the adoption of security policies by developers in terraform across different cloud providers | 103
GitHub Actions: The Impact on the Pull Request Process | 78
A fine-grained data set and analysis of tangling in bug fixing commits | 68
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development | 56
Fixing Dockerfile smells: an empirical study | 54
Calling relationship investigation and application on Ethereum Blockchain System | 54
UPC sentinel: An accurate approach for detecting upgradeability proxy contracts in Ethereum | 49
Reflections on the Empirical Software Engineering journal | 45
Adversarial domain adaptation for cross-project defect prediction | 43
A review of automatic source code summarization | 43
Introduction to the special issue on program comprehension | 42
A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction | 42
Assessing exception handling testing practices in open-source libraries | 38
Transformers and meta-tokenization in sentiment analysis for software engineering | 35
Explainable automated debugging via large language model-driven scientific debugging | 34
Prioritizing code review requests to improve review efficiency: a simulation study | 33
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems | 33
Hold on! is my feedback useful? evaluating the usefulness of code review comments | 32
Computation offloading for ground robotic systems communicating over WiFi – an empirical exploration on performance and energy trade-offs | 32
An empirical study on release notes patterns of popular apps in the Google Play Store | 32
Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications | 31
Detection and evaluation of bias-inducing features in machine learning | 30
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities | 28
Reviewing rounds prediction for code patches | 27
Software development metrics: to VR or not to VR | 27
SAFe transformation in a large financial corporation | 27
Using Screenshot Attachments in Issue Reports for Triaging | 26
Rap4DQ: Learning to recommend relevant API documentation for developer questions | 25
A controlled experiment on the impact of microtasking on programming | 25
An empirical study on self-admitted technical debt in Dockerfiles | 25
A comprehensive overview of software product management challenges | 25
Using code reviews to automatically configure static analysis tools | 24
Development effort estimation in free/open source software from activity in version control systems | 24
Mutation testing in the wild: findings from GitHub | 23
Upstream bug management in Linux distributions | 23
Path context augmented statement and network for learning programs | 22
An empirical study of issue-link algorithms: which issue-link algorithms should we use? | 22
Dependabot and security pull requests: large empirical study | 22
Predicting health indicators for open source projects (using hyperparameter optimization) | 21
Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub | 21
18 million links in commit messages: purpose, evolution, and decay | 21
Supporting single responsibility through automated extract method refactoring | 21
Understanding code smells in Elixir functional language | 21
On the spread and evolution of dead methods in Java desktop applications: an exploratory study | 21
Toward effective secure code reviews: an empirical study of security-related coding weaknesses | 20
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop | 20
Enhancing robustness of AI offensive code generators via data augmentation | 20
A study of documentation for software architecture | 19
The human experience of comprehending source code in virtual reality | 19
Can static analysis tools find more defects? | 19
A study of common bug fix patterns in Rust | 19
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension | 19
Optimal priority assignment for real-time systems: a coevolution-based approach | 19
How Scrum adds value to achieving software quality? | 19
Static test flakiness prediction: How Far Can We Go? | 18
A syntax-guided multi-task learning approach for Turducken-style code generation | 18
Static analysis driven enhancements for comprehension in machine learning notebooks | 18
Pitfalls and guidelines for using time-based Git data | 18
TestEvoViz: visualizing genetically-based test coverage evolution | 18
OpenSCV: an open hierarchical taxonomy for smart contract vulnerabilities | 18
Do Agile scaling approaches make a difference? an empirical comparison of team effectiveness across popular scaling approaches | 18
Sources of software development task friction | 18
Learning to Predict Code Review Completion Time In Modern Code Review | 18
VaryMinions: leveraging RNNs to identify variants in variability-intensive systems’ logs | 17
CyberSAGE: The cyber security argument graph evaluation tool | 17
Towards Trusted Smart Contracts: A Comprehensive Test Suite For Vulnerability Detection | 17
On the adoption and effects of source code reuse on defect proneness and maintenance effort | 17
An empirical study on the effectiveness of large language models for SATD identification and classification | 17
Does the first response matter for future contributions? A study of first contributions | 17
DDImage: an image reduction based approach for automatically explaining black-box classifiers | 16
Augmented testing to support manual GUI-based regression testing: An empirical study | 16
Towards enhancing the reproducibility of deep learning bugs: an empirical study | 16
Cross-project defect prediction via semantic and syntactic encoding | 16
CloneRipples: predicting change propagation between code clone instances by graph-based deep learning | 16
An empirical study on API usages from code search engine and local library | 16
Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study | 15
The software heritage license dataset (2022 edition) | 15
Performance evolution of configurable software systems: an empirical study | 15
An empirical study into the effects of transpilation on quantum circuit smells | 15
Dynamical analysis of diversity in rule-based open source network intrusion detection systems | 15
Measuring model alignment for code clone detection using causal interpretation | 15
An empirical study of business process models and model clones on GitHub | 15
Breaking bad? Semantic versioning and impact of breaking changes in Maven Central | 14
Styler: learning formatting conventions to repair Checkstyle violations | 14
Effects of variability in models: a family of experiments | 14
Revisiting reopened bugs in open source software systems | 14
An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs | 13
Efficient static analysis and verification of featured transition systems | 13
Consensus task interaction trace recommender to guide developers’ software navigation | 13
Will you come back to contribute? Investigating the inactivity of OSS core developers in GitHub | 13
Understanding the characteristics and the role of visual issue reports | 13
HyperPUT: generating synthetic faulty programs to challenge bug-finding tools | 13
Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net | 13
VEER: enhancing the interpretability of model-based optimizations | 13
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions | 13
An extensive replication study of the ABLoTS approach for bug localization | 13
Demystifying API misuses in deep learning applications | 13
E-APR: Mapping the effectiveness of automated program repair techniques | 13
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata | 13
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning | 13
An empirical study of same-day releases of popular packages in the npm ecosystem | 13
Seeing the invisible: test prioritization for object detection system | 12
Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT | 12
An empirical study of IoT topics in IoT developer discussions on Stack Overflow | 12
An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects | 12
The best ends by the best means: ethical concerns in app reviews | 12
Enhancing the defectiveness prediction of methods and classes via JIT | 12
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests | 12
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines | 12
Evaluating few-shot and contrastive learning methods for code clone detection | 12
On the effectiveness of log representation for log-based anomaly detection | 12
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery | 11
TCTracer: Establishing test-to-code traceability links using dynamic and static techniques | 11
Promises and challenges of microservices: an exploratory study | 11
SoftNER: Mining knowledge graphs from cloud incidents | 11
Bugs in machine learning-based systems: a faultload benchmark | 11
Challenges in software model reuse: cross application domain vs. cross modeling paradigm | 11
Learning from what we know: How to perform vulnerability prediction using noisy historical data | 11
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 11
Assessing practitioner beliefs about software engineering | 11
Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges | 11
“More Than Deep Learning”: post-processing for API sequence recommendation | 11
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19 | 11
Modeling function-level interactions for file-level bug localization | 11
Studying the impact of risk assessment analytics on risk awareness and code review performance | 10
The upper bound of information diffusion in code review | 10
Governing the commons: code ownership and code-clones in large-scale software development | 10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models | 10
Mutation analysis for evaluating code translation | 10
Understanding and effectively mitigating code review anxiety | 10
Large scale reuse of microservices using CI/CD and InnerSource practices - a case study | 10
An empirical comparison of ethnic and gender diversity of DevOps and non-DevOps contributions to open-source projects | 10
Can instability variations warn developers when open-source projects boost? | 10
Predicting merge conflicts considering social and technical assets | 10
Adopting automated bug assignment in practice — a longitudinal case study at Ericsson | 10
The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects | 10
A literature review and existing challenges on software logging practices | 10
Analyzing the BizDev interface in an enterprise context: a case of developers acting in business | 10
A qualitative study on refactorings induced by code review | 10
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow | 10
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering | 10
The impact of the COVID-19 pandemic on women’s contribution to public code | 10
Taxonomy of inline code comment smells | 10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues | 10
The broken windows theory applies to technical debt | 10
Code smells detection via modern code review: a study of the OpenStack and Qt communities | 9
On the use of commit-relevant mutants | 9
Comparing ϕ and the F-measure as performance metrics for software-related classifications | 9
Topic recommendation for software repositories using multi-label classification algorithms | 9
Automatic team recommendation for collaborative software development | 9
Weighted software metrics aggregation and its application to defect prediction | 9
Exposed! A case study on the vulnerability-proneness of Google Play Apps | 9
TaintBench: Automatic real-world malware benchmarking of Android taint analyses | 9
GitHub Discussions: An exploratory study of early adoption | 9
Studying differentiated code to support smart contract update | 9
Towards cost-benefit evaluation for continuous software engineering activities | 9
Genetic programming for feature model synthesis: a replication study | 9
A requirements inspection method based on scenarios generated by model mutation and the experimental validation | 9
Contrasting test selection, prioritization, and batch testing at scale | 9
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets | 9
Assessing the opportunity of combining state-of-the-art Android malware detectors | 9
Crowdsmelling: A preliminary study on using collective knowledge in code smells detection | 9
Uniform and scalable sampling of highly configurable systems | 9
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months | 9
Measuring affective states from technical debt | 9
Out of sight, out of mind? How vulnerable dependencies affect open-source projects | 9
MLASP: Machine learning assisted capacity planning | 9
Cross-status communication and project outcomes in OSS development | 9
TraceSim: An Alignment Method for Computing Stack Trace Similarity | 9
On the preferences of quality indicators for multi-objective search algorithms in search-based software engineering | 8
Model vs system level testing of autonomous driving systems: a replication and extension study | 8
Workflow analysis of data science code in public GitHub repositories | 8
Do explicit review strategies improve code review performance? Towards understanding the role of cognitive load | 8
Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers | 8
Leveraging Stack Overflow to detect relevant tutorial fragments of APIs | 8
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt | 8
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study | 8
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations | 8
Free open source communities sustainability: Does it make a difference in software quality? | 8
Agile software development one year into the COVID-19 pandemic | 8
Assessing the exposure of software changes | 8
Refactoring practices in the context of data-intensive systems | 8
Automatic bi-modal question title generation for Stack Overflow with prompt learning | 8
Testing the past: can we still run tests in past snapshots for Java projects? | 8
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration | 8
PTM4Tag+: Tag recommendation of stack overflow posts with pre-trained models | 7
On the adequacy of static analysis warnings with respect to code smell prediction | 7
Understanding developers’ privacy and security mindsets via climate theory | 7
Automatic prediction of rejected edits in Stack Overflow | 7
Lightweight precise automatic extraction of exception preconditions in java methods | 7
Using acceptance tests to predict merge conflict risk | 7
Revisiting the debate: Are code metrics useful for measuring maintenance effort? | 7
Studying the characteristics of AIOps projects on GitHub | 7
Evaluating refactorings for disciplining #ifdef annotations: An eye tracking study with novices | 7
Inter-team communication in large-scale co-located software engineering: a case study | 7
App review driven collaborative bug finding | 7
Conclusion stability for natural language based mining of design discussions | 7
The impact of a continuous integration service on the delivery time of merged pull requests | 7
How do ML practitioners perceive explainability? an interview study of practices and challenges | 7
A Hybrid Distributed EA Approach for Energy Optimisation on Smartphones | 7
We do not understand what it says – studying student perceptions of software modelling | 7
How do developers use type inference: an exploratory study in Kotlin | 7
A fine-grained taxonomy of code review feedback in TypeScript projects | 7
The impact of class imbalance techniques on crashing fault residence prediction models | 7
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness | 7
Beyond the virus: a first look at coronavirus-themed Android malware | 7
Evaluating the impact of flaky simulators on testing autonomous driving systems | 7
Exploring the relationship between performance metrics and cost saving potential of defect prediction models | 7
When conversations turn into work: a taxonomy of converted discussions and issues in GitHub | 7
Story points changes in agile iterative development | 7
Demystifying code snippets in code reviews: a study of the OpenStack and Qt communities and a practitioner survey | 7
Developers’ perception matters: machine learning to detect developer-sensitive smells | 7
Developer reactions to protestware in open source software: the cases of color.js and es5.ext | 7
Advantages and disadvantages of (dedicated) model transformation languages | 6
Guest editorial: special issue on empirical software engineering and measurement | 6
BTLink : automatic link recovery between issues and commits based on pre-trained BERT model | 6
A theory of factors affecting continuous experimentation (FACE) | 6
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow | 6
Analysis of a many-objective optimization approach for identifying microservices from legacy systems | 6
What constitutes debugging? An exploratory study of debugging episodes | 6
Motivating members’ involvement to effectually conduct collaborative software process tailoring | 6
Harnessing pre-trained generalist agents for software engineering tasks | 6
Smells in system user interactive tests | 6
Examining ownership models in software teams | 6
Does agile methodology fit all characteristics of software projects? Review and analysis | 6
On the impact of security vulnerabilities in the npm and RubyGems dependency networks | 6
SPVF: security property assisted vulnerability fixing via attention-based models | 6
The effects of continuous integration on software development: a systematic literature review | 6
Type-migrating C-to-Rust translation using a large language model | 6
Transformer-based code model with compressed hierarchy representation | 6
Investigating the readability of test code | 6
An empirical study on the usage of mocking frameworks in Apache software foundation | 6
Deep learning based identification of inconsistent method names: How far are we? | 6
Predicting sensitive information leakage in IoT applications using flows-aware machine learning approach | 6
Preface to the Special issue on the 36th IEEE International Conference on Software Maintenance and Evolution (ICSME 2020) | 6
On the coordination of vulnerability fixes | 6
A large scale analysis of mHealth app user reviews | 6
Evaluating interactive documentation for programmers | 6
Responding to change over time: A longitudinal case study on changes in coordination mechanisms in large-scale agile | 6
Automated test generation for Scratch programs | 6
Automatic identification of self-admitted technical debt from four different sources | 6
Static detection of equivalent mutants in real-time model-based mutation testing | 5
An empirical study of task infections in Ansible scripts | 5
Energy efficiency of the Visitor Pattern: contrasting Java and C++ implementations | 5
Analysing app reviews for software engineering: a systematic literature review | 5
Visualizing the customization endeavor in product-based-evolving software product lines: a case of action design research | 5
Improving the quality of software issue report descriptions in Turkish: An industrial case study at Softtech | 5
Towards graph-anonymization of software analytics data: empirical study on JIT defect prediction | 5
How far are we with automated machine learning? characterization and challenges of AutoML toolkits | 5