Copyright © 2020. All Rights Reserved.
Explanations
Introduction and References
This page provides explanations and support for the item analysis of multiple choice questions, as performed in the Calculations panel.
Example
Item analysis is a tool kit used during the development of multiple choice questions. It analyzes each question within the context of the whole test, and estimates two characteristics: how difficult the question is, and how well it discriminates (separates) those who score low and high in the test.

Nomenclature and clarifications

Users should be aware of the following nomenclature and data presentations used on this page
References

The creation and evaluation of multiple choice questions is a very large subject, and this page cannot hope to cover the references adequately. However, the following are easily accessible references on which this page and the accompanying program are based:

https://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-analysis/ A short but excellent tutorial on item analysis
http://www.ericae.net/ft/tamu/Espy.htm A paper presented at a conference, containing the algorithms presented on this page. Please note that the paper contains a few typos, and some of the tables are misaligned. However, the contents are clear enough to follow easily, and detailed enough to be the basis of the algorithm on this page. To guard against loss in the future, as this article is in the public domain and fully referenced here, a copy of this article can be viewed here
https://en.wikipedia.org/wiki/Phi_coefficient The phi (φ) coefficient, by Wikipedia
https://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient Point-Biserial Correlation, by Wikipedia
This panel takes the user through the steps of calculation and explains the results produced, using the default example data in the Calculations panel.
Default Example Data

The default example data are artificially created to demonstrate the algorithm, and do not reflect reality. They purport to be the results of a multiple choice test, using 10 questions (items) and 50 students (responses). Three inputs are required.
Table 1. Count for each Answer from each question is a descriptive table, as shown to the right.
Users should note that the data contain only those options that have been chosen at least once. For example, in a question with options A, B, C, and D, if C has never been chosen then only A, B, and D appear in the data and are included in the calculations. Table 1 is therefore important for users to review how often each option in each question has been chosen.

Table 2. Score for each Response
Response 1 correctly answered 9 out of the 10 questions, so p = 9/10 = 0.9, and is designated a high scorer (H).
Table 2 is descriptive, allowing users to check the score of each response and their distribution in the data.
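The scoring step behind Table 2 can be sketched in a few lines of Python. The answer rows and the answer key below are invented for illustration and are not the page's example data; the L/H/M labels come later, from the percentile cut-offs discussed in the Calculations section.

```python
# Score each response (row of answers) against the answer key.
# All data here are hypothetical.
answers = [
    list("ABCDABCDAB"),  # response 1
    list("ABCDABCDBA"),  # response 2 (last two answers wrong)
]
key = list("ABCDABCDAB")

scores = []
for row in answers:
    n_correct = sum(a == k for a, k in zip(row, key))
    scores.append(n_correct / len(key))  # proportion correct, p

print(scores)  # [1.0, 0.8]
```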
Table 3. Proportion Correct

Table 3 displays the proportion of correct answers for each question, and compares this with the proportion if the answer were randomly chosen, and with what it ideally should be. This is shown in the table to the right.
Proportions correct (pcorrect) and when chosen at random (prandom) have already been discussed with Table 1. The ideal proportion (pideal) is calculated as the midpoint between 1 and random. Using question 3 as an example:
pideal = (1 - prandom) / 2 + prandom = (1 - 0.33) / 2 + 0.33 = 0.67

Table 3 provides an initial estimate of the level of difficulty for each question, as the rate of success (pcorrect) can be compared with that from a random guess (prandom) and with the theoretically ideal proportion, midway between a perfect score and a random guess (pideal).

Table 4. Characteristics of Questions
Table 4 displays the formal indices that are used to define the characteristics of each question.
Item analysis is a generic term that applies to many tests that use multiple measurements. On this page, item analysis refers to the evaluation of questions (items) in multiple choice tests intended for the evaluation of students in the educational setting.
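The Table 3 proportions follow directly from the option counts of Table 1. A minimal sketch for a single question, using invented counts (three options were chosen at least once, and B is assumed to be the correct answer):

```python
# Hypothetical option counts for one question (option -> times chosen).
counts = {"A": 10, "B": 25, "C": 15}
correct_option = "B"

n = sum(counts.values())                 # number of responses
p_correct = counts[correct_option] / n   # observed proportion correct
p_random = 1 / len(counts)               # guessing among the offered options
p_ideal = (1 - p_random) / 2 + p_random  # midpoint between 1 and random

print(round(p_correct, 2), round(p_random, 2), round(p_ideal, 2))
```

With these counts, p_random is 0.33 and p_ideal works out to 0.67, matching the question 3 example above.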
Calculations
The development of multiple choice questions, and the selection from a bank of such questions for a particular examination, involve complex methodologies that are beyond the scope of this page, which covers only the statistical aspects concerning difficulty and discrimination. Users are reminded that evaluations of items are context sensitive: the results from the same questions will differ if the responders are from different age groups, educational experience, language, or other parameters that affect the ability to choose the correct answer.

The low (L) and high (H) scoring groups

Some of the indices in item analysis are calculated on a subset of the total population being tested: those with very high and very low total scores from all questions. This results in indices that reflect the importance of extreme values, and are useful to identify outliers for awards or failures. The general recommendation (see references) is to select responses whose total scores are in the top and bottom 27th percentile. The idea is to make the two groups as different as possible. In small data sets, with fewer responses and questions, a larger percentile may be necessary to obtain a sufficient sample size for stable results. In large data sets, a smaller percentile can be used to enhance the difference between those with low and high scores.

Measurement of Difficulty

Difficulty measures how likely the question is to be answered correctly. Two measurements of difficulty are provided
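The percentile grouping described above can be sketched as follows. The total scores are invented, and the handling of ties at the cut-offs is one reasonable choice, not necessarily the one used by the program on this page:

```python
# Label each response L, H, or M using the recommended 27th percentile.
# The total scores below are hypothetical.
scores = [3, 9, 5, 7, 8, 2, 6, 10, 4, 7]

ranked = sorted(scores)
k = max(1, round(0.27 * len(scores)))  # target group size at 27%
low_cut = ranked[k - 1]    # highest total score still in L
high_cut = ranked[-k]      # lowest total score still in H

# Ties at the cut-offs are kept, so groups can slightly exceed 27%.
labels = ["L" if s <= low_cut else "H" if s >= high_cut else "M"
          for s in scores]
print(labels)
```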
Measurement of Discrimination

Discrimination measures how well the question separates the responders with low and high total scores over all questions. Three measures of discrimination are provided
General Comments

The creation and evaluation of multiple choice questions, as well as the selection from a bank of questions for a test, are complex and sophisticated tasks that require the combined efforts of subject experts, educational psychologists, and statisticians. Most of the expertise required is beyond the scope of this page, and inexperienced users are strongly advised to seek guidance from those with appropriate expertise. What follows are simple and elementary comments on how the difficulty and discrimination indices are used.

If an examination is criteria based, evaluating whether the responses have achieved a level of competence represented by correct answers, then selecting questions based on the difficulty index is appropriate.

If an examination is normative based, evaluating the responses against each other, then selecting questions based on discrimination is appropriate. If, in addition, the objective is to rank the responses, then a discrimination index based on the total population, such as the Biserial Correlation ρ, is more appropriate. On the other hand, if the objective is to identify the outliers, to select the high achievers for awards or the low achievers for exclusion, then an index based on the low and high scoring populations, such as Idxdiscrimination or φ, is appropriate.
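For concreteness, the extreme-group discrimination index mentioned above (proportion correct in the high group minus proportion correct in the low group) can be sketched as follows. The response data and group labels are invented for illustration:

```python
# Discrimination index for one question, using the extreme groups only.
# correct[i] is 1 if response i answered this question correctly;
# group[i] is its L/M/H label from the total score. All hypothetical.
correct = [1, 0, 0, 1, 1, 1, 1, 1, 0, 0]
group   = ["H", "H", "L", "M", "H", "L", "M", "M", "L", "M"]

p_h = sum(c for c, g in zip(correct, group) if g == "H") / group.count("H")
p_l = sum(c for c, g in zip(correct, group) if g == "L") / group.count("L")
d = p_h - p_l  # proportion correct in H minus proportion correct in L

# Whole-population indices such as the point-biserial correlation
# instead use every response's total score, not only the extreme groups.
```

Here d comes out to 1/3: two of the three high scorers answered correctly, against one of the three low scorers.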
Hints on Data Entry
This panel provides support for data entry and interpretation of results only. Detailed discussions on Item Analysis for Multiple Choice Questions are provided in the Introduction panel.
Javascript Program
Data Entry

Three (3) sets of data are required.

Multiple Choice Responses is a table of test data for item analysis.
Percentile is the percentile used to identify the high and low scorers for the purpose of estimating the difficulty and discrimination indices. The recommended value is 27% for low scorers (and its corresponding 100 - 27 = 73% for high scorers). This default value should be used unless the user has reasons to change it. More discussions on this issue are provided in the Introduction section.

Default Example Data

The default example data are artificially created to demonstrate the algorithm, and do not reflect reality. They purport to be the results of a multiple choice test, using 10 questions (items) on 50 students (responses).
The percentile setting to identify high and low scorers is set at 27 (and its corresponding 100 - 27 = 73 for high scorers).

Please note: the data were generated using random numbers, so that the proportions of correct answers were set at around 0.5. This results in the calculated indices being lower than expected from a real set of multiple choice questions. Users should not be confused into thinking the levels of difficulty and discrimination presented on this page are at the expected level.

Results

Detailed explanations of results are presented in the Explanation panel. The following are summary descriptions:

Table 1 displays the number (count) and proportion (p) for each answer in each question, matched against the proportion if the answers were randomly chosen.
Table 2 displays the score (n) and proportion of correct answers from each response (row), and the label as low (L), high (H), or medium (M) scorer.
Table 3 displays the number (Ncorrect) and proportion (pcorrect) of correct answers for each question, matching these against the proportion if the question were answered at random (prandom), and the theoretically ideal proportion (pideal).
Table 4 displays the results of item analysis for each question. Detailed descriptions are presented in the Explanation panel.

References

Calculations are based on the algorithm described in the web page http://www.ericae.net/ft/tamu/Espy.htm. To guard against loss in the future, as this article is in the public domain and fully referenced here, a copy of the article can be viewed here
Multiple Choice Responses
The data is a table of answers:
Each row is from a response.
Each column is from a question, separated by spaces or tabs.
Each cell is the answer provided for that question by that response.

Correct Answers

Percentile to Mark High and Low Scores
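The response table format described above (one row per response, answers separated by spaces or tabs) can be read with a short sketch; the answer letters below are invented:

```python
# Parse a whitespace-separated table of answers into rows of cells.
# The raw text here is hypothetical example input.
raw = """A B C D
A C C D
B B C A"""

table = [line.split() for line in raw.splitlines()]
# table[row][col] is the answer given by response `row` to question `col`
print(table[1][1])  # C
```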