Validity and reliability in assessment

The respective roles of Classical Test Theory (CTT) and Item Response Theory (IRT)

Evaluating the level of competence of an individual within an entry to practice pathway requires measurement. Any measurement is inherently imprecise and includes error, which is the difference between an expected true value and the value obtained by a measurement instrument.

Statistical models are used to determine how reliable and valid exams are, despite the challenges of precise measurement faced by all exams. The psychometric statistics from these models can tell us whether the exam data are reliable, and therefore discriminate between examinees of different competency levels, and whether the results are valid, which demonstrates that the exam is aligned with its intended purpose.

What do reliability and validity mean?

Reliability

A practical definition considers the reliability of the data produced by the assessment; it is inappropriate to think of the reliability of an assessment as if it will never change or as if it can appropriately evaluate all aspects of individual competence (1).

Reliability of the data is expressed as a coefficient between 0 and 1, and reflects a complex relationship between:

A. the examinee’s true ability level (which might be thought of as a holistic level of competence)
B. their ability to achieve that level on any given task (true variations in their performance due to relevant factors, or a context-specific level of competence)
C. any error related to the assessment process itself (such as the difficulty of a question or the subjective ratings of an examiner), which is itself composed of multiple elements (some known and some unknown)

Practically speaking, the reliability of assessment data is an indicator of how well it can distinguish between examinees of different competence levels: if everyone gets the same score, the data have no reliability. Conversely, if everyone gets a vastly different score in no discernible pattern, then the assessment is not tapping into the holistic competency of the examinees, and so the data will also have low reliability.

The standard of administering fair exams demands that decisions about someone’s competence be based on reliable data. Sometimes, we might expect multiple examinees to have very similar competence, and consequently very similar assessment scores.
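To make the link between measurement error and a reliability coefficient concrete, the sketch below uses the standard Classical Test Theory definition, in which an observed score is a true score plus error and reliability is the share of score variance attributable to true differences between examinees. It assumes errors are uncorrelated with true scores. The function name `reliability`, the zero-variance convention, and all of the scores are hypothetical and chosen only for illustration; in practice true scores are not observable and reliability has to be estimated from the exam data.

```python
from statistics import pvariance


def reliability(true_variance: float, error_variance: float) -> float:
    """CTT reliability: true-score variance / (true-score + error variance).

    Assumes errors are uncorrelated with true scores, so the two
    variances add up to the observed-score variance.
    """
    if true_variance < 0 or error_variance < 0:
        raise ValueError("variances must be non-negative")
    total = true_variance + error_variance
    if total == 0:
        # Assumed convention for this sketch: no score variation at all
        # means the data cannot distinguish examinees, so report 0.
        return 0.0
    return true_variance / total


# Hypothetical true ability levels for five examinees.
spread_abilities = [50, 60, 70, 80, 90]
# Hypothetical cohort in which everyone has the same true ability.
identical_abilities = [70, 70, 70, 70, 70]
# Hypothetical measurement errors (question difficulty, examiner ratings).
errors = [-5, 5, -5, 5, 0]

if __name__ == "__main__":
    error_var = pvariance(errors)
    print(reliability(pvariance(spread_abilities), error_var))
    print(reliability(pvariance(identical_abilities), error_var))
```

With these illustrative numbers, the first cohort gives a coefficient of about 0.91, because most of the score variation reflects real differences in ability. The second cohort gives 0: when examinees truly have the same competence, any differences in their scores are error, so the data cannot separate them.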