Intro to Item Response Theory using Open Source Solutions

Billy Buchanan, Ph.D.
Director of Data, Research, and Accountability
Fayette County Public Schools

https://wbuchanan.github.io/kaacSlideDeck
  • What is Item Response Theory?
  • What is jMetrik?
  • Why You Need to Care
  • How do you do it?

What is Item Response Theory?



And now for a bit of math...

$$Pr(Y_{ij} = 1 | \alpha_i, \beta_i, c_i, d_i, \theta_j) = c_i + (d_i - c_i)\frac{\exp(\alpha_i(\theta_j-\beta_i))}{1 + \exp(\alpha_i(\theta_j-\beta_i))}$$

  • $\alpha$ is the "discrimination" parameter
  • $\beta$ is the "difficulty" parameter
  • $c$ is the "pseudoguessing" parameter
  • $d$ is the upper asymptote or highest probability of a correct response
  • $\theta$ is the "ability" parameter
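To make the formula concrete, here is a minimal sketch of the 4PL response probability in Python (the function and argument names are mine, not from jMetrik):

```python
import math

def p_4pl(theta, alpha, beta, c=0.0, d=1.0):
    """Probability of a correct response under the 4PL model.

    theta: person ability; alpha: discrimination; beta: difficulty;
    c: pseudoguessing (lower asymptote); d: upper asymptote.
    """
    z = math.exp(alpha * (theta - beta))
    return c + (d - c) * z / (1.0 + z)

# With c = 0 and d = 1 this reduces to the 2PL, and with alpha = 1
# it is the Rasch model: a person whose ability equals the item's
# difficulty has a 50% chance of a correct response.
print(p_4pl(0.0, 1.0, 0.0))  # 0.5
```

Setting `c` above zero raises the floor of the curve (a guesser still gets some items right), while `d` below one caps the ceiling.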


  • Due to time constraints, we'll only talk about one of the ways to estimate 1PL models.
  • More specifically, we'll be talking about fitting a Rasch model using the Joint Maximum Likelihood Estimator (JMLE) to the data.
  • However, for those interested, if you view the slides and click your down arrow, there are some brief explanations of other IRT models that are appropriate for other contexts.


Partial Credit Models (PCM)

$$Pr(Y_{ij} = k | \theta_j) = \frac{\exp\left(\sum_{t=1}^{k}\alpha(\theta_j-\beta_{it})\right)}{1 + \sum_{s=1}^{K}\exp\left(\sum_{t=1}^{s}\alpha(\theta_j-\beta_{it})\right)}$$

  • The $\alpha$ & $\theta$ parameters have the same meaning as they had in the other models.
  • The $\beta$ parameter is the difficulty associated with the $t^{th}$ response option on the $i^{th}$ item
  • The difference is that here we are predicting the probability of the respondent selecting the $k^{th}$ option from the response set if they have an ability of $\theta_j$
  • $\theta$ is also assumed to be $N(0, 1)$
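The category probabilities above can be sketched in Python; this is a minimal illustration of the formula (names are mine), not jMetrik's implementation:

```python
import math

def pcm_probs(theta, betas, alpha=1.0):
    """Category probabilities for one item under the PCM.

    betas[t] is the step difficulty beta_it for moving from category
    t to category t + 1; categories run 0..len(betas).
    """
    # Cumulative sums of alpha * (theta - beta_it); category 0
    # contributes exp(0) = 1, which is the "1 +" in the denominator.
    sums = [0.0]
    for b in betas:
        sums.append(sums[-1] + alpha * (theta - b))
    exps = [math.exp(s) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]

# The probabilities always sum to 1 across the response categories.
probs = pcm_probs(theta=0.5, betas=[-1.0, 0.0, 1.0])
print(round(sum(probs), 10))  # 1.0
```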


Rating Scale Models (RSM)

$$Pr(Y_{ij} = k | \alpha, \beta_i, \theta_j) = \frac{\exp\left(\sum_{t=1}^{k}\alpha(\theta_j-(\beta_i+\tau_t))\right)}{1 + \sum_{s=1}^{K}\exp\left(\sum_{t=1}^{s}\alpha(\theta_j-(\beta_i+\tau_t))\right)}$$

  • There are some subtle but important differences between the Rating Scale and Partial Credit Models
  • Each item's step difficulties decompose into an overall item location $\beta_i$ plus a set of category thresholds $\tau_t$ shared across items, so the distances between the category difficulties are constrained to be equal across items (e.g., the step from scoring a 3 to a 4 is the same distance for every item on the scale)
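A minimal Python sketch of the constraint, assuming the common decomposition of step difficulties into an item location plus shared thresholds (variable names are mine):

```python
import math

def rsm_probs(theta, beta_i, taus, alpha=1.0):
    """Category probabilities under the RSM.

    The RSM is a PCM in which every item's step difficulties
    decompose as beta_i + tau_t; the thresholds taus are shared
    across all items, and only the location beta_i varies by item.
    """
    sums = [0.0]
    for tau in taus:
        sums.append(sums[-1] + alpha * (theta - (beta_i + tau)))
    exps = [math.exp(s) for s in sums]
    total = sum(exps)
    return [e / total for e in exps]
```

Two items on the same rating scale therefore have identically shaped category curves, shifted left or right by the difference in their `beta_i` values.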


Graded Response Models (GRM)

$$Pr(Y_{ij} \geq k | \theta_j) = \frac{\exp(\alpha_i(\theta_j-\beta_{ik}))}{1 + \exp(\alpha_i(\theta_j-\beta_{ik}))}$$

  • Here the interpretation of the $\beta$ parameter changes to indicate the difficulty of endorsing category $k$ or higher for the $i^{th}$ item
  • Additionally, unlike the PCM, the item discrimination parameters (i.e., the $\alpha_i$) are freely estimated
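Because the GRM models cumulative probabilities, the probability of any single category falls out as the difference between adjacent cumulative curves. A minimal Python sketch (names are mine):

```python
import math

def grm_cumulative(theta, alpha_i, betas):
    """P(Y >= k) for k = 1..K under the GRM; betas should be increasing."""
    return [1.0 / (1.0 + math.exp(-alpha_i * (theta - b))) for b in betas]

def grm_category_probs(theta, alpha_i, betas):
    """Category probabilities as differences of adjacent cumulative curves.

    P(Y >= 0) is 1 by definition and P(Y >= K + 1) is 0, which is why
    the list is padded on both ends before differencing.
    """
    cum = [1.0] + grm_cumulative(theta, alpha_i, betas) + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```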


Nominal Response Models (NRM)

$$Pr(Y_{ij} = k | \theta_j) = \frac{\exp(\alpha_{ik}(\theta_j-\beta_{ik}))}{\sum_{h=1}^{K}\exp(\alpha_{ih}(\theta_j-\beta_{ih}))}$$

  • You can think of this as the unordered (nominal) analog to the GRM.
  • These models would be used in cases where the response choice has no inherent value that could be ordered (e.g., what is your favorite ice cream flavor?)
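The NRM probability is a softmax over the response options, each with its own slope and location. A minimal Python sketch (names are mine):

```python
import math

def nrm_probs(theta, alphas, betas):
    """Probability of each unordered response option under the NRM.

    alphas[k] and betas[k] are the slope and location for option k;
    the probabilities are a softmax over the K options.
    """
    exps = [math.exp(a * (theta - b)) for a, b in zip(alphas, betas)]
    total = sum(exps)
    return [e / total for e in exps]
```

Unlike the ordered models above, nothing here assumes that a "higher" option reflects more of the trait; each option simply competes for probability mass.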

What is jMetrik?



Get jMetrik Here

  • Java-based application for Psychometric analysis of data
  • Freely available (i.e., it does not cost you anything to use and you can even modify it if you so desire)
  • Some functionality has been integrated into other software platforms like Stata (the raschjmle program)
  • The source code for all of the math and user interface is publicly available:

Example Data and Source Code

All of the data used in these examples, and the source code used to simulate it, are publicly available.

To get the files, go to https://github.com/wbuchanan/kaacSlideDeck/tree/gh-pages

The file itemResponses.csv contains the simulated item responses used in the examples.

The file simulateItemResponses.R contains the R source code used to generate the simulated data.

Why You Need to Care



HR Issues



The Real Reason

  • Bad Measurement = Bad Decisions = Bad Outcomes
  • We should enable the adults working with children to make the best decisions based on the best possible data.
  • Bad Measurement + a Correct Decision Process = Bad Decisions = Bad Outcomes
  • The quality of your measurement is an empirical question, not a matter of opinion or feeling.
  • Just because you think you are measuring something doesn't mean you are measuring it.

Starting jMetrik



Initial view when starting the application


Menu Item used to launch creation of new Database


Shows dialog used to name the new database


Launches the Dialog to open a DB


Shows dialog where users select which database to open


Menu item used to launch the file import dialog


Shows the dialog used to select the file to import


Shows the change in the GUI after a file is loaded


Shows a preview of the data loaded into jMetrik


Shows the variable view option to view the data

Setting up Answer Keys



Shows where to click to launch the advanced scoring dialog


Setting up answer key for items with keyed response option a


Shows dialog after clicking the submit button


Shows variable view after refreshing the view to confirm columns are registered as item types


Setting up answer key for items with keyed response option b


Shows dialog after clicking the submit button


Shows variable view after refreshing the view to confirm columns are registered as item types


Setting up answer key for items with keyed response option c


Shows dialog after clicking the submit button


Shows variable view after refreshing the view to confirm columns are registered as item types


Setting up answer key for items with keyed response option d


Shows dialog after clicking the submit button


Shows variable view after refreshing the view to confirm columns are registered as item types


Setting up answer key for items with keyed response option e


Shows dialog after clicking the submit button


Shows variable view after refreshing the view to confirm columns are registered as item types

Item Frequencies



Shows menu option to click to launch item frequency dialog


Shows item frequency analysis dialog


Shows output from frequency analysis

Distractor Analysis



Shows menu option to click for distractor analysis


Shows distractor analysis dialog


Shows additional recommended options to select for distractor analyses


Shows button to click on to save results


Shows dialog to enter table name where results will be saved


Shows button to click to execute distractor analyses


Shows annotated output for distractor analyses

The Rasch Model



Warning to click back on the item responses before moving forward


Shows where to click in the menu to launch the JMLE Dialog Box for Rasch Model


Shows dialog box with default settings


Shows how to select all items in bulk


Verify that all items you want included in the analysis are located in the box on the right


Shows some optional configuration settings on the global tab of the dialog box


Shows the default view for the item tab


Shows suggested options to use on the item tab


Shows the default view for the person tab


Shows suggested options to use on the person tab


Shows where to click to fit the Rasch model to the data


Shows annotated output from the start of the text based output that appears after fitting the model to the data


Continuation of previous slide showing annotated text-based output after fitting the Rasch model to the data


Shows annotations for the table where the results are stored related to the item parameter estimates

IRT Plots



Annotated set up and menu location to generate Item/Test characteristic curves


Annotation showing what it will look like when items are selected


Shows suggested options to include when creating item characteristic curves


Shows buttons to click to select location where results will be saved and to execute the ICC graph generation


Annotation explaining beta parameter location and meaning


Annotation explaining item information function


Warning about a reversed ICC; items with reversed (negatively sloped) ICCs should be excluded from the test form


Annotation explaining test characteristic curve, test information function, and the standard error of measurement.


Annotation explaining the pseudoguessing parameter


Annotation explaining the d parameter


Annotation explaining discrimination parameter
  • The discrimination parameter has nothing to do with discrimination in the everyday, human sense of the word
  • You could think of it like how well Kentuckians can detect good from bad basketball players (e.g., discriminating taste in basketball players), particularly those who view basketball through blue and white lenses
  • This is basically about the range over which the item does a good job at identifying high and low skill, mastery, and ability
  • The slope should always be positive when using an item in a test form
  • For a Rasch model, this slope will always be equal to 1
  • For other types of 1PL models, the slope will be equal to the average item discrimination
  • In 2PL, 3PL, and 4PL models, the slope can and will vary by item
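The effect of the slope can be illustrated with a small 2PL sketch in Python (the parameter values are hypothetical, chosen only to show the contrast):

```python
import math

def p_2pl(theta, alpha, beta):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-alpha * (theta - beta)))

# A higher discrimination makes the ICC steeper near the item's
# difficulty, so the item separates nearby ability levels more
# sharply: compare the probability gap between abilities -0.5 and
# 0.5 for a low-slope and a high-slope item at the same difficulty.
low_sep = p_2pl(0.5, 0.5, 0.0) - p_2pl(-0.5, 0.5, 0.0)
high_sep = p_2pl(0.5, 2.0, 0.0) - p_2pl(-0.5, 2.0, 0.0)
print(high_sep > low_sep)  # True
```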


Item Characteristic Curve for example item number 7


Item Characteristic Curve for example item number 8


Item Characteristic Curve for example item number 9


Item Characteristic Curve for example item number 10


Item Characteristic Curve for example item number 11


Item Characteristic Curve for example item number 12


Item Characteristic Curve for example item number 13


Item Characteristic Curve for example item number 14


Item Characteristic Curve for example item number 15


Item Characteristic Curve for example item number 16


Item Characteristic Curve for example item number 17


Item Characteristic Curve for example item number 18


Item Characteristic Curve for example item number 19


Item Characteristic Curve for example item number 20

Differential Item Functioning



Shows Menu item where DIF analysis can be found


Shows annotated set up for DIF analysis


Shows dialog to name table where results will be saved


Shows button to click to execute DIF analysis


Explanation of class A DIF


Explanation of class B and C of DIF


Annotated DIF output with advice on handling items with higher levels of DIF


Differential Item Function Results table annotated

Oh...and by the way...


FCPS is Hiring Data Strategists

If you or someone you know is a data ninja, code wizard, or other type of quant who wants to enjoy life out in Wildcat Country, tell them to

email me