We ought to think about high-stakes psychometric tests in a wider context than we usually do, namely (a) in the context of human functioning and (b) in the context of human rights and democracy. Then it will become apparent that we have to abandon current psychometric tests and look for alternatives.
(a) All tests based on classical test theory (CTT) and its offshoots (e.g., item response theory, Rasch scaling) are mistaken. The psychology hidden in their statistics is at odds with what we know about the psychological functioning underlying human behavior. These tests are built on a very questionable postulate: each and every human response (Y) to a test is determined by only one disposition (X), namely the competence or personality trait under consideration, apart from some degree of random measurement error (e), which can easily be minimized by repeating measurements (Gulliksen, 1950):
Y = X + e
Modern variations of classical test theory (Rasch scaling, Item Response Theory, etc.) follow the same logic, except that they deal with continuous dispositions (variables) instead of dichotomous variables.
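What the postulate Y = X + e promises can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function name, the true score of 5.0, and the normally distributed error are hypothetical choices, not part of any cited test. If the postulate held, averaging repeated responses would shrink the error roughly with the square root of the number of measurements.

```python
import random
import statistics


def simulate_mean_error(true_score, error_sd, n_repeats, n_trials=2000, seed=0):
    """Average absolute error of the mean of n_repeats responses Y = X + e.

    Assumption: e is independent, normally distributed random error,
    exactly as the classical postulate requires.
    """
    rng = random.Random(seed)
    abs_errors = []
    for _ in range(n_trials):
        responses = [true_score + rng.gauss(0.0, error_sd) for _ in range(n_repeats)]
        abs_errors.append(abs(statistics.fmean(responses) - true_score))
    return statistics.fmean(abs_errors)


# Hypothetical values: true score 5.0, error standard deviation 1.0.
err_1 = simulate_mean_error(true_score=5.0, error_sd=1.0, n_repeats=1)
err_25 = simulate_mean_error(true_score=5.0, error_sd=1.0, n_repeats=25)
print(f"mean absolute error, 1 measurement:   {err_1:.3f}")
print(f"mean absolute error, 25 measurements: {err_25:.3f}")
```

The error shrinks only because the simulation builds in the assumption that a single disposition plus pure noise produces every response.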
Common sense and psychological research agree that a response to a test item is rarely, if ever, determined by a single disposition; mostly it is determined by several. Moreover, the mixture of determining dispositions differs from person to person. Hence a single response to an item is ambiguous and does not allow any valid inference about a particular disposition. If data falsify this belief, they are misclassified as “unreliable.”
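The ambiguity can be illustrated with a second sketch. It assumes, purely for illustration, that a response is a weighted mixture of two dispositions and that the weight differs between persons; the linear mixture, the names, and all numbers are hypothetical. Two persons with the same target disposition then receive clearly different average scores, however often the measurement is repeated.

```python
import random
import statistics


def mean_response(x_target, x_other, weight_other, n_items=400, error_sd=1.0, seed=1):
    """Mean of n_items responses that mix two dispositions plus random error.

    Assumption: each response is (1 - w) * x_target + w * x_other + e,
    where w (weight_other) is specific to the person.
    """
    rng = random.Random(seed)
    responses = [
        (1 - weight_other) * x_target + weight_other * x_other + rng.gauss(0.0, error_sd)
        for _ in range(n_items)
    ]
    return statistics.fmean(responses)


# Two hypothetical persons with the SAME target disposition (4.0),
# but a different share of a second disposition (8.0) in their responses.
person_a = mean_response(x_target=4.0, x_other=8.0, weight_other=0.1)
person_b = mean_response(x_target=4.0, x_other=8.0, weight_other=0.6)
print(f"person A mean score: {person_a:.2f}")
print(f"person B mean score: {person_b:.2f}")
```

Repetition removes the random error but not the contribution of the second disposition, so the averaged score still cannot be read as a measure of the target disposition alone.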
Moreover, repeated measurement is mostly impossible with human subjects. Repeated questions have to be varied, and the more we vary the items or tasks of a test in order to reduce “error” or “unreliability,” the less valid the test becomes.
Better methodologies for psychological measurement exist. Egon Brunswik’s (1955) “diacritical method” gave a hint at how to solve the problem, but he did not develop a workable methodology. In the 1970s I used Brunswik’s hint to develop a new, multivariate, experimentally designed measurement methodology called the Experimental Questionnaire (EQ). With EQs we can single out the disposition(s) determining an individual’s responses. EQs produce patterns of responses to orthogonally arranged patterns of tasks, which let us analyze the degree to which the hypothesized internal factors are at work.
The Moral Competence Test (MCT), designed as an EQ, has been in use in many countries for over 40 years. It has produced a great wealth of new findings that would not have been found with classical tests (Lind, 2016). Of course, such tests require much expertise and also much money, probably more money than the private test industry is able to provide. This seems to be a task for public research institutions such as our universities.
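The idea of analyzing one person's response pattern across orthogonally arranged tasks can be sketched as follows. This is not the scoring procedure of any published test; it is a minimal illustration that assumes a hypothetical 2 x 3 design with two task factors, A and B, and one response per cell. For each individual it computes the share of response variation that goes with each factor.

```python
import statistics


def variance_shares(grid):
    """Share of one person's response variation attributable to each task factor.

    grid[i][j] is the response to the task combining level i of factor A
    with level j of factor B (hypothetical orthogonal design, one task per cell).
    """
    n_a, n_b = len(grid), len(grid[0])
    all_values = [v for row in grid for v in row]
    grand = statistics.fmean(all_values)
    row_means = [statistics.fmean(row) for row in grid]
    col_means = [statistics.fmean([grid[i][j] for i in range(n_a)]) for j in range(n_b)]

    ss_total = sum((v - grand) ** 2 for v in all_values)
    ss_a = n_b * sum((m - grand) ** 2 for m in row_means)
    ss_b = n_a * sum((m - grand) ** 2 for m in col_means)
    if ss_total == 0:
        # A flat response pattern carries no information about either factor.
        return {"A": 0.0, "B": 0.0, "residual": 0.0}
    return {
        "A": ss_a / ss_total,
        "B": ss_b / ss_total,
        "residual": (ss_total - ss_a - ss_b) / ss_total,
    }


# Two hypothetical persons answering the same six tasks.
person_1 = [[1, 2, 3], [1, 2, 3]]        # responses follow factor B only
person_2 = [[-2, -2, -2], [2, 2, 2]]     # responses follow factor A only
shares_1 = variance_shares(person_1)
shares_2 = variance_shares(person_2)
print(shares_1)
print(shares_2)
```

Because the factors are crossed orthogonally, the variation of a single person's responses can be divided between them, so the analysis works on individual response patterns and does not need to pool across people.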
(b) High-stakes testing violates human rights and undermines democracy. The frequent evaluation of students, year by year, month by month, day by day, and sometimes even hour by hour, violates their basic rights and, indirectly, also the rights of their teachers and parents. This inhumane practice has nothing to do with the well-reasoned and well-designed assessments required before someone takes over a responsible position in our society. Nor does it have anything to do with test-based studies of the efficacy of teaching methods and educational policy-making. Such uses of tests are too rare. We would get a much better education if we selected teaching methods and school policies on the basis of efficacy studies, instead of focusing exclusively on selecting students and teachers.
Frequent high-stakes testing is also a threat to democracy. It restricts students’ opportunities for practicing thinking and reflection. It leaves too little opportunity for the development of moral competence. It produces “subjects,” not citizens of a democracy. As many decades of research into the development of moral competence show, the extreme proportion of time absorbed by preparing for evaluations and by other activities required by authorities prevents students from developing the ability to solve problems and conflicts through thinking and discussion instead of through violence, deceit and power.
Later, as adults, they will be unable to solve problems and conflicts through thinking and discussion and will have to rely on violence and deceit. Citizens with low moral competence require, as Thomas Hobbes pointed out, a “strong state” and a dictator to keep violence, deception and power within bounds. In contrast, morally competent citizens do not need a “Leviathan” (Hobbes).
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Lind, G. (2016). How to teach morality: Promoting deliberation and discussion, reducing violence and deceit. Berlin: Logos.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Erlbaum.
Konstanz, Oct. 25, 2016
Dr. Georg Lind, Prof. em. of Psychology
78462 Konstanz, Germany