Applied Measurement in Education

Papers
(The median citation count of Applied Measurement in Education is 1. The table below lists the papers at or above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-06-01 to 2025-06-01.)
Article | Citations
Impact of Violating Unidimensionality on Rasch Calibration for Mixed-Format Tests | 20
Shifting Educational Measurement from an Agent of Systemic Racism to an Anti-Racist Endeavor | 14
IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests | 8
Detecting Item Parameter Drift in Small Sample Rasch Equating | 8
The Standards Will Never Be Enough: A Racial Justice Extension | 7
Characterizing the Latent Classes in a Mixture IRT Model Using DIF | 6
Computer-Based Listening Test with Full Video, Visual-Limited Video, and Audio: A Comparative Analysis Based on Difficulty, Discrimination Power, and Response Time | 6
Analyzing Student Response Processes to Evaluate Success on a Technology-Based Problem-Solving Task | 5
Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes | 5
Not-reached Items: An Issue of Time and of test-taking Disengagement? the Case of PISA 2015 Reading Data | 5
Comparing Examinee-Based and Response-Based Motivation Filtering Methods in Remote Low-Stakes Testing | 5
New Tests of Rater Drift in Trend Scoring | 4
Response Demands of Reading Comprehension Test Items: A Review of Item Difficulty Modeling Studies | 4
Applying a Culturally Responsive Pedagogical Framework to Design and Evaluate Classroom Performance-Based Assessments in Hawai‘i | 4
The Impact of Non-Effortful Responding on Item and Person Parameters in Item-Pool Scaling Linking | 4
The Consideration of Admissions Testing at Colleges and Universities: A Perspective | 3
Between- versus Within-Examinee Variability in Test-Taking Effort and Test Emotions during a Low-Stakes Test | 3
Measurement Invariance in Relation to First Language: An Evaluation of German Reading and Spelling Tests | 3
Don’t Test After Lunch: The Relationship Between Disengagement and the Time of Day That Low-Stakes Testing Occurs | 3
Dissecting Knowledge, Guessing, and Blunder in Multiple Choice Assessments | 3
Modeling Dimensions Converging at the Upper Anchor in Learning Progressions: An Example of Micro-Evolution | 3
A Method for Displaying Incremental Validity with Expectancy Charts | 3
Automated Scoring of Short-Answer Questions: A Progress Report | 3
Bayesian Estimation and Testing of a Linear Logistic Test Model for Learning during the Test | 2
Are Online and Paper Tests Comparable? Evidence from Statewide K-12 Tests | 2
Identifying Careless Responses in Computer-Adaptive Affective Surveys Using Person Fit Analysis | 2
Gender Differences and Types of Test-Taking Behaviors in PIRLS 2021 | 2
Comparing School Reports and Empirical Estimates of Relative Reliance on Tests Vs Grades in College Admissions | 2
Does the Response Options Placement Provide Clues to the Correct Answers in Multiple-choice Tests? A Systematic Review | 2
Enacting a Process for Developing Culturally Relevant Classroom Assessments | 2
Item-Writing Guidelines on Response Option Placement: A Systematic Review | 2
Efficient Estimation of Mean Ability Growth Using Vertical Scaling | 2
Change in Engagement During Test Events: An Argument for Weighted Scoring? | 2
Efficient Assessment of Students’ Proportional Reasoning | 2
A Method of Empirical Q-Matrix Validation for Multidimensional Item Response Theory | 2
Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST | 2
Performance Decline as an Indicator of Generalized Test-Taking Disengagement | 1
Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing | 1
Can Adaptive Testing Improve Test-Taking Experience? A Case Study on Educational Survey Assessment | 1
Maintaining Score Scales Over Time: A Comparison of Five Scoring Methods | 1
A Critical Review of Fairness from Multiple Perspectives: Implications for Classroom Assessment Theory | 1
Using Bayesian Networks for Cognitive Assessment of Student Understanding of Buoyancy: A Granular Hierarchy Model | 1
Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data | 1
Personalized Online Learning, Test Fairness, and Educational Measurement: Considering Differential Content Exposure Prior to a High Stakes End of Course Exam | 1
An Examination of Individual Ability Estimation and Classification Accuracy Under Rapid Guessing Misidentifications | 1
Reconceptualizing Rapid Responses as a Speededness Indicator in High-Stakes Assessments | 1
Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation | 1