Journal of Educational Measurement

Papers
(The median citation count of Journal of Educational Measurement is 0. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-06-01 to 2025-06-01.)
Article | Citations
NCME Presidential Address 2022: Turning the Page to the Next Chapter of Educational Measurement | 19
The Automated Test Assembly and Routing Rule for Multistage Adaptive Testing with Multidimensional Item Response Theory | 14
A Statistical Test for the Detection of Item Compromise Combining Responses and Response Times | 14
Assessing Differential Bundle Functioning Using Meta‐Analysis | 13
(no title listed) | 13
Measuring the Uncertainty of Imputed Scores | 11
Optimal Calibration of Items for Multidimensional Achievement Tests | 10
A Note on the Use of Categorical Subscores | 9
Two IRT Characteristic Curve Linking Methods Weighted by Information | 8
Editorial for JEM issue 58‐3 | 8
Using Item Parameter Predictions for Reducing Calibration Sample Requirements: A Case Study Based on a High‐Stakes Admission Test | 8
Issue Information | 8
A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge | 7
Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing | 6
(no title listed) | 6
Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation | 6
Validity Arguments for AI‐Based Automated Scores: Essay Scoring as an Illustration | 6
Briggs, Derek C. Historical and Conceptual Foundations of Measurement in the Human Sciences: Credos and Controversies | 6
Differential and Functional Response Time Item Analysis: An Application to Understanding Paper versus Digital Reading Processes | 5
An Exponentially Weighted Moving Average Procedure for Detecting Back Random Responding Behavior | 5
Model Selection Posterior Predictive Model Checking via Limited‐Information Indices for Bayesian Diagnostic Classification Modeling | 5
Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests | 4
Information Functions of Rank‐2PL Models for Forced‐Choice Questionnaires | 4
A Computationally Simple Method for Estimating Decision Consistency | 4
On the Positive Correlation between DIF and Difficulty: A New Theory on the Correlation as Methodological Artifact | 4
Likelihood‐Based Estimation of Model‐Derived Oral Reading Fluency | 4
(no title listed) | 3
Using Eye‐Tracking Data as Part of the Validity Argument for Multiple‐Choice Questions: A Demonstration | 3
Controlling the Speededness of Assembled Test Forms: A Generalization to the Three‐Parameter Lognormal Response Time Model | 3
A Generalized Objective Function for Computer Adaptive Item Selection | 3
Using Response Time in Multidimensional Computerized Adaptive Testing | 3
Score Comparability between Online Proctored and In‐Person Credentialing Exams | 3
Sensemaking of Process Data from Evaluation Studies of Educational Games: An Application of Cross‐Classified Item Response Theory Modeling | 3
DIF Detection for Multiple Groups: Comparing Three‐Level GLMMs and Multiple‐Group IRT Models | 3
Editorial for JEM issue 58‐2 | 3
Addressing Bias in Spoken Language Systems Used in the Development and Implementation of Automated Child Language‐Based Assessment | 3
Detecting Group Collaboration Using Multiple Correspondence Analysis | 3
Exploring the Impact of Random Guessing in Distractor Analysis | 2
Issue Information | 2
Comparing and Combining IRTree Models and Anchoring Vignettes in Addressing Response Styles | 2
A Highly Adaptive Testing Design for PISA | 2
Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes | 2
(no title listed) | 2
An Item Response Tree Model for Items with Multiple‐Choice and Constructed‐Response Parts | 2
Utilizing Response Time for Item Selection in On‐the‐Fly Multistage Adaptive Testing for PISA Assessment | 2
Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches | 2
Issue Information | 2
Measuring the Impact of Peer Interaction in Group Oral Assessments with an Extended Many‐Facet Rasch Model | 2
(no title listed) | 2
Variation in Respondent Speed and its Implications: Evidence from an Adaptive Testing Scenario | 2
Explanatory Cognitive Diagnostic Modeling Incorporating Response Times | 2
MSAEM Estimation for Confirmatory Multidimensional Four‐Parameter Normal Ogive Models | 2
Subscores: A Practical Guide to Their Production and Consumption. Shelby Haberman, Sandip Sinharay, Richard Feinberg, and Howard Wainer. Cambridge, Cambridge University Press, 2024, 176 pp. (paperback) | 2
On the Choice of Parameters for the Lognormal Model for Response Times: Commentary on Becker et al. (2013) | 2
The Vulnerability of AI‐Based Scoring Systems to Gaming Strategies: A Case Study | 2
Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover) | 2
(no title listed) | 1
Constructing a Robust Score Scale from IRT Scores with Informed Boundaries | 1
Online Monitoring of Test‐Taking Behavior Based on Item Responses and Response Times | 1
Influence of Intersectional Routing Modules between Dimensions on Measurement Precision in Multidimensional Multistage Testing | 1
Using Keystroke Dynamics to Detect Nonoriginal Text | 1
On Joining a Signal Detection Choice Model with Response Time Models | 1
Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs | 1
Reckase, M. The Psychometrics of Standard Setting: Connecting Policy and Test Scores: First edition published 2023 by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487‐274 | 1
Argument‐Based Approach to Validity: Developing a Living Document and Incorporating Preregistration | 1
The Impact of Cheating on Score Comparability via Pool‐Based IRT Pre‐equating | 1
Issue Information | 1
Issue Information | 1
Issue Information | 1
Detecting Differential Item Functioning Using Posterior Predictive Model Checking: A Comparison of Discrepancy Statistics | 1
Curvilinearity in the Reference Composite and Practical Implications for Measurement | 1
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment: A Discussion and Look Forward | 1
Fully Gibbs Sampling Algorithms for Bayesian Variable Selection in Latent Regression Models | 1
(no title listed) | 1
Using Item Scores and Distractors in Person‐Fit Assessment | 1
(no title listed) | 1
Using Multilabel Neural Network to Score High‐Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment | 1
(no title listed) | 1
Modeling Hierarchical Attribute Structures in Diagnostic Classification Models with Multiple Attempts | 1
A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles | 1
IRT Observed‐Score Equating for Rater‐Mediated Assessments Using a Hierarchical Rater Model | 1
Pretest Item Calibration in Computerized Multistage Adaptive Testing | 0
Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models | 0
Exploring Latent Constructs through Multimodal Data Analysis | 0
Editorial for JEM issue 58‐4 | 0
Modeling Directional Testlet Effects on Multiple Open‐Ended Questions | 0
Issue Information | 0
Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments | 0
Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems | 0
Correction to “Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior” | 0
A One‐Parameter Diagnostic Classification Model with Familiar Measurement Properties | 0
Sociocognitive Processes and Item Response Models: A Didactic Example | 0
Introduction to the Special Issue Maintaining Score Comparability: Recent Challenges and Some Possible Solutions | 0
Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses | 0
(no title listed) | 0
Validation for Personalized Assessments: A Threats‐to‐Validity Approach | 0
A New Bayesian Person‐Fit Analysis Method Using Pivotal Discrepancy Measures | 0
Issue Information | 0
Editorial for JEM issue 59‐4 | 0
A Residual‐Based Differential Item Functioning Detection Framework in Item Response Theory | 0
Modeling Response Styles in Cross‐Classified Data Using a Cross‐Classified Multidimensional Nominal Response Model | 0
Modeling the Intraindividual Relation of Ability and Speed within a Test | 0
Using Automated Procedures to Score Educational Essays Written in Three Languages | 0
An Exploration of an Improved Aggregate Student Growth Measure Using Data from Two States | 0
Random Responders in the TIMSS 2015 Student Questionnaire: A Threat to Validity? | 0
Incorporating Test‐Taking Engagement into Multistage Adaptive Testing Design for Large‐Scale Assessments | 0
(no title listed) | 0
Specifying the Three Ws in Educational Measurement: Who Uses Which Scores for What Purpose? | 0
A Note on Latent Traits Estimates under IRT Models with Missingness | 0
Issue Information | 0
Issue Information | 0
Assessing the Impact of Equating Error on Group Means and Group Mean Differences | 0
(no title listed) | 0
Another Look at Yen's Q3: Is .2 an Appropriate Cut‐Off? | 0
(no title listed) | 0
A Dual‐Purpose Model for Binary Data: Estimating Ability and Misconceptions | 0
Latent Space Model for Process Data | 0
Evaluation of Factors Affecting the Performance of the $S-X^{2}$ Item‐Fit Index | 0
Classification Accuracy and Consistency of Compensatory Composite Test Scores | 0
Issue Information | 0
Score Comparability Issues with At‐Home Testing and How to Address Them | 0
Validating Performance Standards via Latent Class Analysis | 0
An Exploratory Study Using Innovative Graphical Network Analysis to Model Eye Movements in Spatial Reasoning Problem Solving | 0
Does Timed Testing Affect the Interpretation of Efficiency Scores? A GLMM Analysis of Reading Components | 0
Modeling Nonlinear Effects of Person‐by‐Item Covariates in Explanatory Item Response Models: Exploratory Plots and Modeling Using Smooth Functions | 0
A Unified Comparison of IRT‐Based Effect Sizes for DIF Investigations | 0
Several Variations of Simple‐Structure MIRT Equating | 0
Detecting Differential Item Functioning in CAT Using IRT Residual DIF Approach | 0
Issue Information | 0
A Bayesian Moderated Nonlinear Factor Analysis Approach for DIF Detection under Violation of the Equal Variance Assumption | 0
(no title listed) | 0
Issue Information | 0
Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior | 0
Recent Challenges to Maintaining Score Comparability: A Commentary | 0
A Comparison of Anchor Selection Strategies for DIF Analysis | 0
Algorithmic Bias in BERT for Response Accuracy Prediction: A Case Study for Investigating Population Validity | 0
Online Calibration in Multidimensional Computerized Adaptive Testing with Polytomously Scored Items | 0
Linking and Comparability across Conditions of Measurement: Established Frameworks and Proposed Updates | 0
Corrigendum: A Residual‐Based Differential Item Functioning Detection Framework in Item Response Theory | 0
Optimizing Implementation of Artificial‐Intelligence‐Based Automated Scoring: An Evidence Centered Design Approach for Designing Assessments for AI‐based Scoring | 0
Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment | 0
Examining the Impacts of Ignoring Rater Effects in Mixed‐Format Tests | 0
Modeling Missing Response Data in Item Response Theory: Addressing Missing Not at Random Mechanism with Monotone Missing Characteristics | 0
Psychometric Methods to Evaluate Measurement and Algorithmic Bias in Automated Scoring | 0
Using Multiple Maximum Exposure Rates in Computerized Adaptive Testing | 0
Classical Item Analysis from a Signal Detection Perspective | 0
Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model | 0
von Davier, Alina, Mislevy, Robert J., and Hao, Jiangang (Eds.) (2021). Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Education | 0
Editorial for JEM issue 59‐1 | 0
Toward Argument‐Based Fairness with an Application to AI‐Enhanced Educational Assessments | 0
Generating Models for Item Preknowledge | 0
Multiple‐Group Joint Modeling of Item Responses, Response Times, and Action Counts with the Conway‐Maxwell‐Poisson Distribution | 0
Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework | 0
(no title listed) | 0
Theory‐Driven IRT Modeling of Vocabulary Development: Matthew Effects and the Case for Unipolar IRT | 0
Differences in Time Usage as a Competing Hypothesis for Observed Group Differences in Accuracy with an Application to Observed Gender Differences in PISA Data | 0
Issue Information | 0
Anchoring Validity Evidence for Automated Essay Scoring | 0
A Nonparametric Composite Group DIF Index for Focal Groups Stemming from Multicategorical Variables | 0
Issue Information | 0
Issue Information | 0
An Unsupervised‐Learning‐Based Approach to Compromised Items Detection | 0
(no title listed) | 0