Empirical Software Engineering

Papers
(The TQCC of Empirical Software Engineering is 8. The table below lists those papers that are above that threshold based on CrossRef citation counts [max. 250 papers]. The publications cover those that have been published in the past four years, i.e., from 2021-04-01 to 2025-04-01.)
Article | Citations
Unveiling overlooked performance variance in serverless computing | 173
Assessing the adoption of security policies by developers in terraform across different cloud providers | 103
GitHub Actions: The Impact on the Pull Request Process | 78
A fine-grained data set and analysis of tangling in bug fixing commits | 68
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development | 56
Fixing Dockerfile smells: an empirical study | 54
Calling relationship investigation and application on Ethereum Blockchain System | 54
UPC sentinel: An accurate approach for detecting upgradeability proxy contracts in Ethereum | 49
Reflections on the Empirical Software Engineering journal | 45
A review of automatic source code summarization | 43
Adversarial domain adaptation for cross-project defect prediction | 43
Introduction to the special issue on program comprehension | 42
A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction | 42
Assessing exception handling testing practices in open-source libraries | 38
Transformers and meta-tokenization in sentiment analysis for software engineering | 35
Explainable automated debugging via large language model-driven scientific debugging | 34
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems | 33
Prioritizing code review requests to improve review efficiency: a simulation study | 33
An empirical study on release notes patterns of popular apps in the Google Play Store | 32
Hold on! is my feedback useful? evaluating the usefulness of code review comments | 32
Computation offloading for ground robotic systems communicating over WiFi – an empirical exploration on performance and energy trade-offs | 32
Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications | 31
Detection and evaluation of bias-inducing features in machine learning | 30
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities | 28
SAFe transformation in a large financial corporation | 27
Reviewing rounds prediction for code patches | 27
Software development metrics: to VR or not to VR | 27
Using Screenshot Attachments in Issue Reports for Triaging | 26
A comprehensive overview of software product management challenges | 25
Rap4DQ: Learning to recommend relevant API documentation for developer questions | 25
A controlled experiment on the impact of microtasking on programming | 25
An empirical study on self-admitted technical debt in Dockerfiles | 25
Development effort estimation in free/open source software from activity in version control systems | 24
Using code reviews to automatically configure static analysis tools | 24
Mutation testing in the wild: findings from GitHub | 23
Upstream bug management in Linux distributions | 23
Dependabot and security pull requests: large empirical study | 22
Path context augmented statement and network for learning programs | 22
An empirical study of issue-link algorithms: which issue-link algorithms should we use? | 22
Understanding code smells in Elixir functional language | 21
On the spread and evolution of dead methods in Java desktop applications: an exploratory study | 21
Predicting health indicators for open source projects (using hyperparameter optimization) | 21
Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub | 21
18 million links in commit messages: purpose, evolution, and decay | 21
Supporting single responsibility through automated extract method refactoring | 21
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop | 20
Enhancing robustness of AI offensive code generators via data augmentation | 20
Toward effective secure code reviews: an empirical study of security-related coding weaknesses | 20
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension | 19
Optimal priority assignment for real-time systems: a coevolution-based approach | 19
How Scrum adds value to achieving software quality? | 19
A study of documentation for software architecture | 19
The human experience of comprehending source code in virtual reality | 19
Can static analysis tools find more defects? | 19
A study of common bug fix patterns in Rust | 19
Do Agile scaling approaches make a difference? an empirical comparison of team effectiveness across popular scaling approaches | 18
Sources of software development task friction | 18
Learning to Predict Code Review Completion Time In Modern Code Review | 18
Static test flakiness prediction: How Far Can We Go? | 18
A syntax-guided multi-task learning approach for Turducken-style code generation | 18
Static analysis driven enhancements for comprehension in machine learning notebooks | 18
Pitfalls and guidelines for using time-based Git data | 18
TestEvoViz: visualizing genetically-based test coverage evolution | 18
OpenSCV: an open hierarchical taxonomy for smart contract vulnerabilities | 18
An empirical study on the effectiveness of large language models for SATD identification and classification | 17
Does the first response matter for future contributions? A study of first contributions | 17
VaryMinions: leveraging RNNs to identify variants in variability-intensive systems’ logs | 17
CyberSAGE: The cyber security argument graph evaluation tool | 17
Towards Trusted Smart Contracts: A Comprehensive Test Suite For Vulnerability Detection | 17
On the adoption and effects of source code reuse on defect proneness and maintenance effort | 17
CloneRipples: predicting change propagation between code clone instances by graph-based deep learning | 16
An empirical study on API usages from code search engine and local library | 16
DDImage: an image reduction based approach for automatically explaining black-box classifiers | 16
Augmented testing to support manual GUI-based regression testing: An empirical study | 16
Towards enhancing the reproducibility of deep learning bugs: an empirical study | 16
Cross-project defect prediction via semantic and syntactic encoding | 16
An empirical study into the effects of transpilation on quantum circuit smells | 15
Dynamical analysis of diversity in rule-based open source network intrusion detection systems | 15
Measuring model alignment for code clone detection using causal interpretation | 15
An empirical study of business process models and model clones on GitHub | 15
Using knowledge units of programming languages to recommend reviewers for pull requests: an empirical study | 15
The software heritage license dataset (2022 edition) | 15
Performance evolution of configurable software systems: an empirical study | 15
Revisiting reopened bugs in open source software systems | 14
Breaking bad? Semantic versioning and impact of breaking changes in Maven Central | 14
Styler: learning formatting conventions to repair Checkstyle violations | 14
Effects of variability in models: a family of experiments | 14
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata | 13
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning | 13
An empirical study of same-day releases of popular packages in the npm ecosystem | 13
An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs | 13
Efficient static analysis and verification of featured transition systems | 13
Will you come back to contribute? Investigating the inactivity of OSS core developers in GitHub | 13
Understanding the characteristics and the role of visual issue reports | 13
HyperPUT: generating synthetic faulty programs to challenge bug-finding tools | 13
Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net | 13
Consensus task interaction trace recommender to guide developers’ software navigation | 13
VEER: enhancing the interpretability of model-based optimizations | 13
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions | 13
An extensive replication study of the ABLoTS approach for bug localization | 13
Demystifying API misuses in deep learning applications | 13
E-APR: Mapping the effectiveness of automated program repair techniques | 13
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines | 12
Evaluating few-shot and contrastive learning methods for code clone detection | 12
On the effectiveness of log representation for log-based anomaly detection | 12
Seeing the invisible: test prioritization for object detection system | 12
Analyzing source code vulnerabilities in the D2A dataset with ML ensembles and C-BERT | 12
An empirical study of IoT topics in IoT developer discussions on Stack Overflow | 12
An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects | 12
The best ends by the best means: ethical concerns in app reviews | 12
Enhancing the defectiveness prediction of methods and classes via JIT | 12
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests | 12
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery | 11
TCTracer: Establishing test-to-code traceability links using dynamic and static techniques | 11
Promises and challenges of microservices: an exploratory study | 11
SoftNER: Mining knowledge graphs from cloud incidents | 11
Bugs in machine learning-based systems: a faultload benchmark | 11
Challenges in software model reuse: cross application domain vs. cross modeling paradigm | 11
Learning from what we know: How to perform vulnerability prediction using noisy historical data | 11
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 11
Assessing practitioner beliefs about software engineering | 11
Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges | 11
“More Than Deep Learning”: post-processing for API sequence recommendation | 11
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19 | 11
Modeling function-level interactions for file-level bug localization | 11
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow | 10
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering | 10
The impact of the COVID-19 pandemic on women’s contribution to public code | 10
Taxonomy of inline code comment smells | 10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues | 10
The broken windows theory applies to technical debt | 10
Studying the impact of risk assessment analytics on risk awareness and code review performance | 10
The upper bound of information diffusion in code review | 10
Governing the commons: code ownership and code-clones in large-scale software development | 10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models | 10
Mutation analysis for evaluating code translation | 10
Understanding and effectively mitigating code review anxiety | 10
Large scale reuse of microservices using CI/CD and InnerSource practices - a case study | 10
An empirical comparison of ethnic and gender diversity of DevOps and non-DevOps contributions to open-source projects | 10
Can instability variations warn developers when open-source projects boost? | 10
Predicting merge conflicts considering social and technical assets | 10
Adopting automated bug assignment in practice — a longitudinal case study at Ericsson | 10
The untold impact of learning approaches on software fault-proneness predictions: an analysis of temporal aspects | 10
A literature review and existing challenges on software logging practices | 10
Analyzing the BizDev interface in an enterprise context: a case of developers acting in business | 10
A qualitative study on refactorings induced by code review | 10
Measuring affective states from technical debt | 9
Out of sight, out of mind? How vulnerable dependencies affect open-source projects | 9
MLASP: Machine learning assisted capacity planning | 9
Cross-status communication and project outcomes in OSS development | 9
TraceSim: An Alignment Method for Computing Stack Trace Similarity | 9
Code smells detection via modern code review: a study of the OpenStack and Qt communities | 9
On the use of commit-relevant mutants | 9
Comparing ϕ and the F-measure as performance metrics for software-related classifications | 9
Topic recommendation for software repositories using multi-label classification algorithms | 9
Automatic team recommendation for collaborative software development | 9
Weighted software metrics aggregation and its application to defect prediction | 9
Exposed! A case study on the vulnerability-proneness of Google Play Apps | 9
TaintBench: Automatic real-world malware benchmarking of Android taint analyses | 9
GitHub Discussions: An exploratory study of early adoption | 9
Studying differentiated code to support smart contract update | 9
Towards cost-benefit evaluation for continuous software engineering activities | 9
Genetic programming for feature model synthesis: a replication study | 9
A requirements inspection method based on scenarios generated by model mutation and the experimental validation | 9
Contrasting test selection, prioritization, and batch testing at scale | 9
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets | 9
Assessing the opportunity of combining state-of-the-art Android malware detectors | 9
Crowdsmelling: A preliminary study on using collective knowledge in code smells detection | 9
Uniform and scalable sampling of highly configurable systems | 9
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months | 9
Workflow analysis of data science code in public GitHub repositories | 8
Testing the past: can we still run tests in past snapshots for Java projects? | 8
Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers | 8
Leveraging Stack Overflow to detect relevant tutorial fragments of APIs | 8
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt | 8
Model vs system level testing of autonomous driving systems: a replication and extension study | 8
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations | 8
Do explicit review strategies improve code review performance? Towards understanding the role of cognitive load | 8
Agile software development one year into the COVID-19 pandemic | 8
Assessing the exposure of software changes | 8
Automatic bi-modal question title generation for Stack Overflow with prompt learning | 8
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study | 8
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration | 8
Free open source communities sustainability: Does it make a difference in software quality? | 8
On the preferences of quality indicators for multi-objective search algorithms in search-based software engineering | 8
Refactoring practices in the context of data-intensive systems | 8