Question-Score Identity Detection (Q-SID)

Version 2.3 (July 2024)

Q-SID identifies groups of students whose question scores on exams are more similar to each other than is usual. Using this information, the course instructor can then compare the written answers of students within each group to determine if this similarity is due to collusion.

Instructors upload an Excel file that contains each student's score on every question of the exam as well as each student's total score. Q-SID then:

gives each student a Collusion Score that defines the similarity of their exam to that of the partner in the class with the largest number of question scores identical to the student's;
clusters any student/partner pairs that share unusually high Collusion Scores into small Collusion Groups of typically two to five students;
assigns each Collusion Group an Empirical false positive rate (FPR) based on the frequency with which Collusion Groups are found in strictly proctored examinations and, in many instances, a Synthetic FPR calculated from in silico data that mimics the distribution of question scores expected in the instructor's exam if no collusion occurred.
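
As a rough illustration of the partner search described in the first step, the sketch below finds, for each student, the classmate with the largest number of identical question scores. It is not Q-SID's implementation: the function names and sample data are hypothetical, and it returns only the raw count of identical scores, not the Collusion Score itself, which is defined in the Q-SID Guide and the arXiv preprint.

```python
from typing import Dict, List, Optional, Tuple, Union

Score = Union[float, str]  # numeric points or a multiple choice answer


def count_identical(a: List[Score], b: List[Score]) -> int:
    """Number of questions on which two students have the same score."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_partner(
    scores: Dict[str, List[Score]]
) -> Dict[str, Tuple[Optional[str], int]]:
    """Map each student ID to (partner ID, number of identical scores).

    Ties go to the partner that appears first in `scores` (an assumption
    made for this sketch only).
    """
    result = {}
    for sid, row in scores.items():
        best_id, best_n = None, -1
        for other, other_row in scores.items():
            if other == sid:
                continue
            n = count_identical(row, other_row)
            if n > best_n:
                best_id, best_n = other, n
        result[sid] = (best_id, best_n)
    return result


# Hypothetical data: three students, five question scores each.
example = {
    "s1": [2, 0, "b", 1, 3],
    "s2": [2, 0, "b", 1, 0],
    "s3": [1, 0, "a", 0, 3],
}
partners = best_partner(example)
```

Here s1 and s2 share four of five question scores, so each is the other's partner, while s3 shares only two scores with its closest classmate.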

To use Q-SID, read the Q-SID Guide and go to the Analyze Data page. Students should read the explanation on the For Students page.

Q-SID was developed by researchers in the Department of Statistics at UCLA and at Lawrence Berkeley National Laboratory using exam data and instructor feedback from UC Berkeley and UCLA. Q-SID controls the FPR. Collusion Groups have Empirical FPRs of 0.04%, 0.20%, or 0.56% and Synthetic FPRs no higher than 0.8%; see the Q-SID Guide.

Q-SID works well for class sizes of 25 or more. It can identify up to 80% of the students who have colluded on a given exam, depending on the number of question scores provided. It requires at least 20 question scores and performs best with at least 50 scores. Q-SID can combine data from two or more exams from the same class to improve the analysis; see the Q-SID Guide. A detailed description of the Q-SID algorithm is available in this arXiv preprint.

Q-SID's goals are fourfold: one, to identify those students most likely to have colluded; two, to protect the majority of students who obey their school's honor code by ensuring that they receive the grade that they deserve, not a lower one; three, to deter cheating by informing students in advance that their exams will be screened by Q-SID; four, to allow departments and institutions to measure collusion system-wide and thereby establish more informed policies. For discussion of the importance of using quantitative methods such as Q-SID to address collusion, read this Forbes article and listen to this podcast.

Q-SID does not store users' data or require registration. Nor does it record IP address, location or other user information. The Q-SID team would, however, welcome feedback on its performance. Feel free to email us about your experience with Q-SID at qsid@stat.ucla.edu.

Teachers employed by nonprofit institutions, as well as academic or nonprofit researchers, are permitted to use the Q-SID Program posted on this site subject to the Academic License.

Q-SID Analysis

Use this page to upload a file that contains the question scores for all students from one exam. Q-SID will process these data to produce two output files as described in the Q-SID Guide.

Q-SID takes as input an Excel file in .xls, .xlsx, or .csv format.

When uploading a single exam for analysis, the data should be placed on Sheet 1. When uploading multiple exams for the same class for a combined analysis, each exam should be placed on a separate sheet in the file, starting with Sheet 1 and with subsequent exams placed on consecutive sheets. Q-SID expects the number of exams specified by the user to correspond to the number of sheets with data. When more than one exam is included, Q-SID combines and uses data only for those students whose IDs are present in all exams.
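
The combination rule in the paragraph above can be sketched as follows. This is a minimal illustration, not Q-SID's code: the function name `combine_exams` and the sample data are hypothetical, and joining each student's scores end to end is an assumption about how the combined data might be represented.

```python
from typing import Dict, List

Exam = Dict[str, List[object]]  # student ID -> question scores for one exam


def combine_exams(exams: List[Exam]) -> Exam:
    """Keep only students present in every exam and join their scores."""
    if not exams:
        return {}
    common = set(exams[0])
    for exam in exams[1:]:
        common &= set(exam)
    # Preserve the student order of the first exam.
    return {
        sid: [score for exam in exams for score in exam[sid]]
        for sid in exams[0]
        if sid in common
    }


# Hypothetical data: s3 took the midterm but not the final.
midterm = {"s1": [1, 2], "s2": [0, 2], "s3": [1, 1]}
final = {"s2": ["a", 3], "s1": ["c", 3]}
combined = combine_exams([midterm, final])
```

In this example s3 is dropped because that ID is missing from the second exam, and the remaining students each end up with four question scores.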

On each sheet, the top row of the spreadsheet should describe the information in each column for all rows below it. The user determines the titles used in these column headers, but the titles must correspond to the information specified below.

The columns in the second and successive rows must contain, in order from left to right: the student's ID, which can be any text; the student's total score on the exam, which must be a number; and multiple columns, each of which gives the score for one question. The score may be either a numeric value (the graded number of points that the student obtained) or any one letter or word of text representing the student's choice of answer to a multiple choice question, for example a, b, c, d or e, or true or false. A mix of numeric and text scores in a single exam is allowed. Each row must contain information for one student. Data from any rows that share the same Student ID, as well as data in any row lacking an ID, will not be used by Q-SID. Q-SID will list the IDs of any data it ignores.

The columns and rows must be in the order specified. Ensure that no additional information is present in the file below the rows containing student data. Download this template file as an example.
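
Instructors who want to check a file before uploading it could use something like the sketch below, which reads a .csv export in the layout described above and sets aside rows with duplicated or missing IDs. The function names and sample data are hypothetical, and Q-SID's own parsing may differ in its details.

```python
import csv
import io
from collections import Counter


def parse_score(cell):
    """Numeric cells become floats; anything else is kept as text."""
    cell = cell.strip()
    try:
        return float(cell)
    except ValueError:
        return cell


def load_scores(csv_text):
    """Parse one exam: header row, then ID, total score, question scores.

    Returns (rows, ignored_ids). `rows` maps student ID -> (total, scores).
    `ignored_ids` lists IDs that occur on more than one row. Rows with no
    ID are skipped as well.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # the first row holds the column headers
    records = [r for r in reader if r]
    id_counts = Counter(r[0].strip() for r in records if r[0].strip())
    rows = {}
    for r in records:
        sid = r[0].strip()
        if not sid or id_counts[sid] > 1:
            continue
        total = float(r[1])  # the total score must be a number
        rows[sid] = (total, [parse_score(c) for c in r[2:]])
    ignored = sorted(sid for sid, n in id_counts.items() if n > 1)
    return rows, ignored


# Hypothetical file contents: s2 is duplicated and one row has no ID.
sample = """ID,Total,Q1,Q2,Q3
s1,7,3,4,b
s2,5,2,3,a
s2,6,3,3,a
,4,1,3,c
s3,9,5,4,b
"""
rows, ignored = load_scores(sample)
```

With this sample, only s1 and s3 are kept, and s2 is reported as an ignored ID.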

Frequently asked questions

Q. How is a question score defined?

A. A question score can be either a numeric value (the graded number of points that the student obtained) or, for multiple choice questions, any one letter or word of text representing the student's choice of answer, for example a, b, c, d, or e, or true or false. A mix of numeric and text scores in a single exam is allowed.

Q. Is membership of a Collusion Group sufficient proof that a student cheated?

A. No. Membership of a Collusion Group is not intended as sole proof of collusion. Collusion Groups identify students who may have colluded. It is essential that the written answers of the students be carefully compared to determine whether they did in fact collude; see the Q-SID Guide for a discussion of this process.

Q. How accurate are FPRs?

A. Q-SID calculates Empirical FPRs and Synthetic FPRs using entirely different data and assumptions; see the Q-SID Guide. Despite this, the two FPRs broadly agree, and both are robust to changes in the number of question scores and in their variation. The maximum FPRs for Collusion Groups are an Empirical FPR of 0.56% and a Synthetic FPR of 0.8%. 95% confidence limits for each estimate are also provided.

Q. How is exam Complexity defined?

A. Complexity is a measure of how much information is present in the question scores for a given exam or combination of exams from the same class. The more question scores are recorded for an exam, the higher the Complexity. The more variation there is in the scores that students receive for a typical question, the higher the Complexity. The larger the Complexity, the greater Q-SID's ability to detect collusion should it exist. Complexity is calculated for each exam by Q-SID and reported to the instructor. See the Q-SID Guide for the expected performance of Q-SID in detecting students who colluded on exams of given Complexities.

Q. How can I best design an exam to allow Q-SID to efficiently detect collusion?

A. Q-SID is more effective the higher the Complexity of the exam; see the Q-SID Guide. For technical reasons, Q-SID is not effective for exams with Complexities <8 and will not calculate Collusion Scores for exams below this threshold. We recommend that exams ideally have a Complexity of ≥15. To estimate the Complexity of an exam before giving it to the class, Q-SID can be run on the results of similar exams from prior years.

To maximize exam Complexity, for questions that carry many points, instructors should break down the question and record the scores for its parts separately to generate more independent scores. If a subset of questions on the test are multiple choice, record the student's choice of answer as a question score. Knowing which wrong answer students give provides additional statistical power in detecting collusion.
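
As an illustration of this advice, the sketch below turns one student's graded exam into a flat list of question-score columns, giving each part of a multi-part question its own column and keeping the multiple choice answer as text. The function name, the column labels, and the data are hypothetical. They are not part of Q-SID.

```python
def expand_scores(graded):
    """Flatten per-part scores so that each part is its own question score.

    `graded` maps a question label to either a single score or a list of
    part scores. The result is a list of (column header, score) pairs.
    """
    columns = []
    for label, value in graded.items():
        if isinstance(value, list):
            for i, part in enumerate(value, start=1):
                columns.append((f"{label}.{i}", part))
        else:
            columns.append((label, value))
    return columns


# Hypothetical student: Q1 is multiple choice, Q2 has three graded parts.
student = {"Q1": "c", "Q2": [4, 0, 3], "Q3": 5}
flat = expand_scores(student)
```

Recording Q2 as three part scores instead of a single total of 7 gives five question scores for this student rather than three.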

Q. What if the Complexity of the exam I gave is too low?

A. To increase Complexity and thus the power to detect collusion, Q-SID can combine data from two or more exams for the same class. The Complexity of the combined exam is calculated as the sum of the Complexities of the individual exams. To have Q-SID combine exams, instructors simply upload a single Excel file with each exam on a separate sheet and specify the number of exams on the Analyze Data page.
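
The arithmetic is simple enough to sketch. The code below adds per-exam Complexities and compares the total with the two thresholds quoted in this FAQ (below 8 is not analyzed, 15 or more is recommended). The Complexity values are made up for illustration, the function names and status labels are hypothetical, and the sketch does not show how Q-SID computes Complexity itself.

```python
MIN_COMPLEXITY = 8           # below this, Collusion Scores are not calculated
RECOMMENDED_COMPLEXITY = 15  # recommended minimum for an exam


def combined_complexity(complexities):
    """Complexity of a combined exam: the sum of the individual values."""
    return sum(complexities)


def complexity_status(complexity):
    """Classify a Complexity value against the two thresholds."""
    if complexity < MIN_COMPLEXITY:
        return "too low"
    if complexity < RECOMMENDED_COMPLEXITY:
        return "below recommended"
    return "recommended"


# Hypothetical Complexities reported for a midterm and a final.
midterm_c, final_c = 6.5, 9.0
total_c = combined_complexity([midterm_c, final_c])
status = complexity_status(total_c)
```

In this made-up case the midterm alone falls below the threshold of 8, but the combined exam reaches 15.5 and clears the recommended level.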

Q. Why is it necessary to specify on the Analyze Data page if the question score data in the input file is only numeric?

A. User specification allows Q-SID to test that the data in the file meets the user's expectation. In addition, the current implementation of Q-SID will not calculate a Synthetic FPR in cases where multiple choice answers are provided instead of numeric scores for one or more questions.

Q. Does Q-SID perform differently depending on the number of students in the class?

A. Q-SID is remarkably robust to changes in class size. False positive rates are not affected by the number of students in a class for all classes of 25 or more students. The percentage of colluding students that Q-SID identifies (i.e. the true positive rate) is similar for classes of 100 or more, but somewhat lower for classes smaller than 100; see the Q-SID Guide. Q-SID does not analyze exams with fewer than 25 students.

Q. Does Q-SID store the data that users upload?

A. No. Users' data is not stored after the results of the analysis have been provided.

Q. The file I uploaded had students listed in a certain order, but the Q-SID output Excel file has ranked the students differently. Why?

A. Q-SID ranks students by Collusion Score and Collusion Group, with the student/partner pair with the highest score at the top.

Q. What relevant expertise do the Q-SID developers have?

A. The senior developers have many years’ experience establishing analysis methods for large scale quantitative datasets in biology, specifically in genomics and proteomics. They have also taught many undergraduate and graduate classes in Molecular Biology or Statistics at Yale, UC Berkeley and UCLA.

Other questions? Please contact us at qsid@stat.ucla.edu

Explanation for students

Dear Students,

Our chief motivation for developing Q-SID and making it freely available is to protect the large majority of you who follow your school's honor code and play by the rules. When, unfortunately, some students collude on their exams, it disadvantages the majority of you who do not. Because it is challenging to grade each student's performance objectively on an absolute scale, instructors tend to judge your performance relative to that of the other students in the class. As a result, if collusion is allowed to go unchecked, those of you who do not cheat may receive a grade lower than you deserve.

Q-SID protects students who play by the rules by helping identify others who have colluded. Students who exchange information sufficiently to affect their grade will be detected by Q-SID. In our experience, students who cheat generally come to regret their actions and suffer significant remorse. It would be far healthier to avoid colluding in the first place.

Those of you who play by the rules will not be falsely accused as a result of Q-SID being employed. This web tool's only role is to identify small groups with similar question scores so that their written answers can be examined in detail by course instructors. Compelling evidence for collusion requires that group members' written answers should be substantially more similar to each other's than those of the rest of the class. Shared errors that are not present either in the course materials or in study guides are often indicative of collusion. The written exams of students who have not colluded are remarkably distinct from each other compared to those who have.

We hope that all students will benefit from Q-SID.

The Q-SID team

Dr. Mark D. Biggin, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley CA: Website.

Prof. Jingyi Jessica Li, Department of Statistics, UCLA, Los Angeles CA: Website

Guan'ao Yan, Department of Statistics, UCLA, Los Angeles CA: Website

We can be contacted at qsid@stat.ucla.edu.

We gratefully acknowledge help from a number of instructors at UC Berkeley who provided question score data from their classes and feedback on the results of Q-SID's analysis.

Academic License

Academic License: © 2022 Mark Biggin, Jingyi Li, and Guan’ao Yan (“Authors”). Teachers employed by nonprofit institutions, as well as academic or nonprofit researchers, are permitted to use the Q-SID Program posted on this site subject to Paragraphs 1-3:

1. Authors hereby grant to you free of charge, so long as you are a teacher employed by a nonprofit institution or an academic or nonprofit researcher, a nonexclusive license under Authors’ ownership interest in this Q-SID Program (the “Program”) to use the Program solely for non-commercial educational or academic research purposes subject to the terms of this Academic License. Except as granted herein, all rights are reserved by Authors, including the right to pursue patent protection of any and all features of the Program. UCLA and the University of California system have no ownership interest in the Program. Failure by you to adhere to the requirements in Paragraphs 1 and 2 will result in immediate termination of the license granted to you pursuant to this Academic License effective as of the date you first used the Program.

2. By uploading data to this website, you represent that you have rights and/or permission to use such data and to analyze it using this website. Due to limited server capacity, any use of scripting or automation for uploading data to this website is prohibited and considered in violation of this license. Any data that is output by the Program should also be used only for non-commercial educational or academic research purposes.

3. In no event shall authors be liable to any entity or person for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this program, even if authors have been advised of the possibility of such damage. The program and any other services of this website are provided “as is.” Authors specifically disclaim any and all warranties, express or implied, including, but not limited to, any implied warranties of merchantability and fitness for a particular purpose. Authors have no obligation to provide maintenance, support, updates, enhancements, or modifications of this program.

Commercial entities: please contact qsid@stat.ucla.edu for licensing opportunities. The Q-SID method is covered by US Patent No. 11,915,615 B2 granted to Authors.