Item Response Theory Functions – catsim.irt

This module contains functions pertaining to the Item Response Theory logistic models, including probability calculations, information functions, likelihood estimation, and other IRT-related utilities.

class catsim.irt.NumParams(*values)[source]

Enumerator that informs how many parameters each logistic model of IRT has.

Created to avoid accessing magic numbers in the code. Each value corresponds to the number of parameters in the respective IRT model.

Attributes:
PL1int

One-parameter logistic model (Rasch model).

PL2int

Two-parameter logistic model.

PL3int

Three-parameter logistic model.

PL4int

Four-parameter logistic model.

catsim.irt.confidence_interval(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]], confidence: float = 0.95) tuple[float, float][source]

Compute the confidence interval for an ability estimate.

The confidence interval is computed using the normal approximation:

\[CI = \hat{\theta} \pm z_{\alpha/2} \times SEE(\hat{\theta})\]

where \(z_{\alpha/2}\) is the critical value from the standard normal distribution corresponding to the desired confidence level, and \(SEE\) is the standard error of estimation.

Parameters:
thetafloat

The estimated ability value.

itemsnpt.NDArray[numpy.floating[Any]]

A matrix containing item parameters for administered items.

confidencefloat, optional

The confidence level, must be between 0 and 1. Default is 0.95 (95% confidence). Common values are 0.90 (90%), 0.95 (95%), and 0.99 (99%).

Returns:
tuple[float, float]

A tuple containing (lower_bound, upper_bound) of the confidence interval.

Raises:
ValueError

If confidence is not between 0 and 1.

Examples

>>> import numpy as np
>>> items = np.array([[1.0, 0.0, 0.0, 1.0], [1.2, -0.5, 0.0, 1.0]])
>>> theta = 0.5
>>> lower, upper = confidence_interval(theta, items, confidence=0.95)
>>> print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
95% CI: [-0.234, 1.234]
catsim.irt.detect_model(items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) int[source]

Detect which logistic model an item matrix fits into.

Examines the item parameters to determine if they represent a 1PL, 2PL, 3PL, or 4PL model based on which parameters vary.

Parameters:
itemsnumpy.ndarray

An item matrix with at least four columns.

Returns:
int

An integer between 1 and 4 denoting the logistic model: - 1: Rasch/1PL (all discrimination equal, no guessing, no upper asymptote) - 2: 2PL (varying discrimination, no guessing, no upper asymptote) - 3: 3PL (varying discrimination, guessing present, no upper asymptote variation) - 4: 4PL (all parameters vary)

catsim.irt.icc(theta: float, a: float, b: float, c: float = 0, d: float = 1) float[source]

Compute the Item Response Theory four-parameter logistic function [Magis13].

The item characteristic curve (ICC) represents the probability that an examinee with ability \(\theta\) will correctly answer an item with the given parameters.

\[P(X_i = 1| \theta) = c_i + \frac{d_i-c_i}{1+ e^{-a_i(\theta-b_i)}}\]
Parameters:
thetafloat

The individual’s ability value. This parameter value has no boundary, but if a distribution of the form \(N(0, 1)\) was used to estimate the parameters, then typically \(-4 \leq \theta \leq 4\).

afloat

The discrimination parameter of the item, usually a positive value in which \(0.8 \leq a \leq 2.5\). Higher values indicate better discrimination between ability levels.

bfloat

The item difficulty parameter. This parameter value has no boundaries, but it must be in the same value space as theta (usually \(-4 \leq b \leq 4\)). Higher values indicate more difficult items.

cfloat, optional

The item pseudo-guessing parameter. Being a probability, \(0\leq c \leq 1\), but items considered good usually have \(c \leq 0.2\). Represents the lower asymptote (probability of guessing correctly). Default is 0.

dfloat, optional

The item upper asymptote. Being a probability, \(0\leq d \leq 1\), but items considered good usually have \(d \approx 1\). Represents the maximum probability of correct response. Default is 1.

Returns:
float

The probability \(P(X_i = 1| \theta)\) of a correct response, a value between c and d.

catsim.irt.icc_hpc(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Compute item characteristic functions for all items in a numpy array at once.

This is a high-performance computing (HPC) version that uses vectorized operations with numpy and numexpr for efficient batch processing of multiple items.

Parameters:
thetafloat

The individual’s ability value.

itemsnumpy.ndarray

Array containing item parameters with at least four columns [a, b, c, d, …].

Returns:
numpy.ndarray

An array of probabilities, one for each item, representing the probability of a correct response given the current theta.

catsim.irt.inf(theta: float, a: float, b: float, c: float = 0, d: float = 1) float[source]

Compute the information value of an item using the Item Response Theory four-parameter logistic model.

Item information quantifies how precisely an item can estimate ability at a given \(\theta\) level. Higher information indicates better precision. References are given in [Ayala2009], [Magis13].

\[I_i(\theta) = \frac{a^2[(P(\theta)-c)]^2[d - P(\theta)]^2}{(d-c)^2[1-P(\theta)]P(\theta)}\]
Parameters:
thetafloat

The individual’s ability value. This parameter value has no boundary, but if a distribution of the form \(N(0, 1)\) was used to estimate the parameters, then typically \(-4 \leq \theta \leq 4\).

afloat

The discrimination parameter of the item, usually a positive value in which \(0.8 \leq a \leq 2.5\).

bfloat

The item difficulty parameter. This parameter value has no boundary, but if a distribution of the form \(N(0, 1)\) was used to estimate the parameters, then typically \(-4 \leq b \leq 4\).

cfloat, optional

The item pseudo-guessing parameter. Being a probability, \(0\leq c \leq 1\), but items considered good usually have \(c \leq 0.2\). Default is 0.

dfloat, optional

The item upper asymptote. Being a probability, \(0\leq d \leq 1\), but items considered good usually have \(d \approx 1\). Default is 1.

Returns:
float

The information value of the item at the designated theta point.

catsim.irt.inf_hpc(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Compute the information values for all items in a numpy array using numpy and numexpr.

This is a high-performance computing (HPC) version that uses vectorized operations for efficient batch processing of item information values.

Parameters:
thetafloat

The individual’s ability value.

itemsnumpy.ndarray

Array containing item parameters with at least four columns [a, b, c, d, …].

Returns:
numpy.ndarray

Array containing the information values for each item at the given theta. Information quantifies how much an item contributes to ability estimation precision at the specified ability level.

catsim.irt.log_likelihood(est_theta: float, response_vector: list[bool], administered_items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) float[source]

Compute the log-likelihood of an ability, given a response vector and the parameters of the answered items.

The likelihood function of a given \(\theta\) value given the answers to \(I\) items is given by [Ayala2009]:

\[L(X_{Ij} | \theta_j, a_I, b_I, c_I, d_I) = \prod_{i=1} ^ I P_{ij}(\theta)^{X_{ij}} Q_{ij}(\theta)^{1-X_{ij}}\]

Finding the maximum of \(L(X_{Ij})\) includes using the product rule of derivations. Since \(L(X_{Ij})\) has \(j\) parts, it can be quite complicated to do so. Also, for computational reasons, the product of probabilities can quickly tend to 0, so it is common to use the log-likelihood in maximization/minimization problems, transforming the product of probabilities into a sum of log probabilities:

\[\log L(X_{Ij} | \theta_j, a_I, b_I, c_I, d_I) = \sum_{i=1} ^ I \left\lbrace x_{ij} \log P_{ij}(\theta)+ (1 - x_{ij}) \log Q_{ij}(\theta) \right\rbrace\]

where \(Q_{ij}(\theta) = 1 - P_{ij}(\theta)\).

Parameters:
est_thetafloat

Estimated ability value.

response_vectorlist[bool]

A boolean list containing the response vector, where True indicates a correct response and False indicates an incorrect response.

administered_itemsnumpy.ndarray

A numpy array containing the parameters of the answered items.

Returns:
float

Log-likelihood of the given ability value, given the responses to the administered items. Higher values indicate better fit.

Raises:
ValueError

If response vector and administered items have different lengths or if the response vector contains non-Boolean elements.

catsim.irt.max_info(a: float = 1, b: float = 0, c: float = 0, d: float = 1) float[source]

Return the \(\theta\) value at which an item provides maximum information.

For the 1-parameter and 2-parameter logistic models, this \(\theta\) equals \(b\). In the 3-parameter and 4-parameter logistic models, however, this value is given by ([Magis13])

\[argmax_{\theta}I(\theta) = b + \frac{1}{a} log \left(\frac{x^* - c}{d - x^*}\right)\]

where

\[x^* = 2 \sqrt{\frac{-u}{3}} cos\left\{\frac{1}{3}acos\left(-\frac{v}{2}\sqrt{\frac{27}{-u^3}}\right)+ \frac{4 \pi}{3}\right\} + 0.5\]
\[u = -\frac{3}{4} + \frac{c + d - 2cd}{2}\]
\[v = \frac{c + d - 1}{4}\]

A few results can be seen in the plots below:

(Source code, png, hires.png, pdf)

_images/irt-1.png

Fig. 9 Item information curves for two distinct items. The point of maximum information denoted by a dot.

Parameters:
afloat, optional

Item discrimination parameter. Default is 1.

bfloat, optional

Item difficulty parameter. Default is 0.

cfloat, optional

Item pseudo-guessing parameter. Default is 0.

dfloat, optional

Item upper asymptote. Default is 1.

Returns:
float

The \(\theta\) value at which the item with the given parameters provides maximum information.

catsim.irt.max_info_hpc(items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Parallelized version of max_info() using numpy and numexpr.

This high-performance computing version efficiently computes the theta values of maximum information for multiple items simultaneously using vectorized operations.

Parameters:
itemsnumpy.ndarray

Array containing item parameters with at least four columns [a, b, c, d, …].

Returns:
numpy.ndarray

An array of theta values, one for each item, indicating where each item’s information function reaches its maximum.

catsim.irt.negative_log_likelihood(est_theta: float, *args: Any) float[source]

Compute the negative log-likelihood of an ability value for a response vector and parameters of administered items.

This function is used by scipy.optimize optimization functions that tend to minimize values, instead of maximizing them. Since we want to maximize the log-likelihood, we minimize the negative log-likelihood. The value of negative_log_likelihood() is simply \(-\) log_likelihood().

Parameters:
est_thetafloat

Estimated ability value.

*argstuple

Variable length argument list. Expected to contain:

  • args[0]: response_vector (list[bool]) - A Boolean list containing the response vector

  • args[1]: administered_items (numpy.ndarray) - A numpy array containing the parameters of the answered items

Returns:
float

Negative log-likelihood of a given ability value, given the responses to the administered items. Lower values indicate better fit.

catsim.irt.normalize_item_bank(items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Normalize an item matrix so that it conforms to the standard used by catsim.

The item matrix must have dimension \(n \times 4\), in which column 0 represents item discrimination, column 1 represents item difficulty, column 2 represents the pseudo-guessing parameter and column 3 represents the item upper asymptote.

This function automatically expands item matrices with fewer than 4 columns:

  • 1 column (1PL): Assumed to be difficulty. Discrimination=1, c=0, d=1 are added.

  • 2 columns (2PL): Assumed to be [discrimination, difficulty]. c=0, d=1 are added.

  • 3 columns (3PL): Assumed to be [discrimination, difficulty, pseudo-guessing]. d=1 is added.

  • 4 columns (4PL): Already complete, returned as-is.

Parameters:
itemsnumpy.ndarray

The item matrix with 1, 2, 3, or 4 columns.

Returns:
numpy.ndarray

An \(n \times 4\) item matrix conforming to the 4-parameter logistic model, with columns [a, b, c, d].

catsim.irt.reliability(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) float[source]

Compute test reliability [Thissen00].

Test reliability is a measure of internal consistency for the test, similar to Cronbach’s \(\alpha\) in Classical Test Theory. Its value is always lower than 1, with values close to 1 indicating good reliability. If \(I(\theta) < 1\), \(Rel < 0\) and in these cases it does not make sense, but usually the application of additional items solves this problem.

\[Rel = 1 - \frac{1}{I(\theta)}\]
Parameters:
thetafloat

An ability value.

itemsnumpy.ndarray

A matrix containing item parameters.

Returns:
float

The test reliability at theta for a test represented by items. Values range from negative infinity to just below 1, with values closer to 1 indicating higher reliability.

catsim.irt.scale_to_theta(score: float | ndarray[tuple[Any, ...], dtype[floating[Any]]], scale_min: float = 0, scale_max: float = 100, theta_min: float = -4, theta_max: float = 4) float | ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Convert scores from a custom scale to theta values using linear transformation.

This function transforms scores from any desired scale (e.g., 0-100, 200-800) back to the standard IRT theta scale (typically centered at 0).

The inverse linear transformation is:

\[\theta = \frac{\text{score} - b}{a}\]

where \(a\) and \(b\) are determined by the scale mapping.

Parameters:
scorefloat or numpy.ndarray

Score value(s) on the custom scale.

scale_minfloat, optional

Minimum value of the score scale. Default is 0.

scale_maxfloat, optional

Maximum value of the score scale. Default is 100.

theta_minfloat, optional

Minimum theta value in the mapping. Default is -4.

theta_maxfloat, optional

Maximum theta value in the mapping. Default is 4.

Returns:
float or numpy.ndarray

The transformed theta value(s). Returns the same type as input.

Notes

  • This is the inverse of theta_to_scale()

  • Useful for converting cutoff scores to theta values for classification

  • Preserves the relative ordering and distances of scores

Examples

>>> # Convert 0-100 score to theta
>>> scale_to_theta(50.0)  # Average score
0.0
>>> scale_to_theta(75.0)  # High score
2.0
>>> scale_to_theta(25.0)  # Low score
-2.0
>>> # Convert from SAT-like scale (200-800)
>>> scale_to_theta(500.0, scale_min=200, scale_max=800)
0.0
>>> # Convert multiple values at once
>>> import numpy as np
>>> scores = np.array([0, 25, 50, 75, 100])
>>> scale_to_theta(scores)
array([-4., -2.,  0.,  2.,  4.])
catsim.irt.see(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) float[source]

Compute the standard error of estimation (\(SEE\)) of a test at a specific \(\theta\) value [Ayala2009].

The standard error of estimation is the square root of variance and represents the typical error in ability estimation. It is in the same units as the ability scale, making it more interpretable than variance.

\[SEE = \sqrt{\frac{1}{I(\theta)}}\]

where \(I(\theta)\) is the test information (see test_info()).

Parameters:
thetafloat

An ability value.

itemsnpt.NDArray[numpy.floating[Any]]

A matrix containing item parameters.

Returns:
float

The standard error of estimation at theta for a test represented by items. Returns infinity if test information is 0.

catsim.irt.test_info(theta: float, items: ndarray[tuple[Any, ...], dtype[floating[Any]]]) float[source]

Compute the test information of a test at a specific \(\theta\) value [Ayala2009].

Test information is the sum of individual item information values and indicates the precision of ability estimation at a given ability level.

\[I(\theta) = \sum_{j \in J} I_j(\theta)\]
Parameters:
thetafloat

An ability value at which to compute test information.

itemsnumpy.ndarray

A matrix containing item parameters with at least four columns [a, b, c, d, …].

Returns:
float

The test information at theta for a test represented by items. Higher values indicate more precise ability estimation.

catsim.irt.theta_to_scale(theta: float | ndarray[tuple[Any, ...], dtype[floating[Any]]], scale_min: float = 0, scale_max: float = 100, theta_min: float = -4, theta_max: float = 4) float | ndarray[tuple[Any, ...], dtype[floating[Any]]][source]

Convert theta values to a custom score scale using linear transformation.

This function transforms ability estimates from the standard IRT theta scale (typically centered at 0) to any desired score scale (e.g., 0-100, 200-800).

The linear transformation is:

\[\text{score} = a \cdot \theta + b\]

where \(a\) and \(b\) are determined by mapping the theta range to the score range.

Parameters:
thetafloat or numpy.ndarray

Ability value(s) on the theta scale.

scale_minfloat, optional

Minimum value of the target score scale. Default is 0.

scale_maxfloat, optional

Maximum value of the target score scale. Default is 100.

theta_minfloat, optional

Minimum theta value to map. Default is -4 (typical lower bound).

theta_maxfloat, optional

Maximum theta value to map. Default is 4 (typical upper bound).

Returns:
float or numpy.ndarray

The transformed score(s) on the target scale. Returns the same type as input.

Notes

  • Values outside [theta_min, theta_max] are extrapolated linearly

  • The transformation preserves relative distances between ability levels

  • Common score scales: 0-100, 200-800 (SAT), 0-500 (TOEFL), etc.

Examples

>>> # Convert theta to 0-100 scale
>>> theta_to_scale(0.0)  # Average ability
50.0
>>> theta_to_scale(2.0)  # High ability
75.0
>>> theta_to_scale(-2.0)  # Low ability
25.0
>>> # Convert to SAT-like scale (200-800)
>>> theta_to_scale(0.0, scale_min=200, scale_max=800)
500.0
>>> # Convert multiple values at once
>>> import numpy as np
>>> thetas = np.array([-4, -2, 0, 2, 4])
>>> theta_to_scale(thetas)
array([  0.,  25.,  50.,  75., 100.])
catsim.irt.validate_item_bank(items: ndarray[tuple[Any, ...], dtype[floating[Any]]], raise_err: bool = False) None[source]

Validate the shape and parameters in the item matrix to ensure it conforms to catsim standards.

The item matrix must have dimension \(n \times 4\), in which column 0 represents item discrimination, column 1 represents item difficulty, column 2 represents the pseudo-guessing parameter and column 3 represents the item upper asymptote.

The item matrix must have at least one line (representing an item) and exactly four columns, representing the discrimination, difficulty, pseudo-guessing and upper asymptote parameters (\(a\), \(b\), \(c\) and \(d\)), respectively. The item matrix is considered valid if, for all items in the matrix, \(a > 0 \wedge 0 \leq c \leq 1 \wedge 0 \leq d \leq 1\).

Parameters:
itemsnumpy.ndarray

The item matrix to validate.

raise_errbool, optional

Whether to raise an error in case the validation fails (True) or just print the error message to standard output (False). Default is False.

Raises:
TypeError

If items is not a numpy.ndarray.

ValueError

If raise_err is True and the item matrix does not meet validation criteria.

catsim.irt.var(theta: float | None = None, items: ndarray[tuple[Any, ...], dtype[floating[Any]]] | None = None, test_inf: float | None = None) float[source]

Compute the variance (\(Var\)) of the ability estimate of a test at a specific \(\theta\) value.

Variance quantifies the uncertainty in the ability estimate. Lower variance indicates more precise estimation.

\[Var = \frac{1}{I(\theta)}\]

where \(I(\theta)\) is the test information (see test_info()).

Parameters:
thetafloat or None, optional

An ability value (required if test_inf is not provided). Default is None.

itemsnumpy.ndarray or None, optional

A matrix containing item parameters (required if test_inf is not provided). Default is None.

test_inffloat or None, optional

The test information value. If provided, theta and items are not required. Default is None.

Returns:
float

The variance of ability estimation. Returns negative infinity if test information is 0.

Raises:
ValueError

If neither test_inf nor both theta and items are provided.