Question-Score Identity Detection (Q-SID)

Version 2.0 (Sept. 2021)


Q-SID identifies groups of students whose scores for many exam questions are more similar to each other than is usual. Using this information, the course instructor can then compare the written answers of students within each identified group to determine if this similarity is due to collusion.

Instructors upload an Excel file containing the score on each question for every student who took the exam. Q-SID:

  • gives each student a Collusion Score that measures the similarity of their exam to that of their partner, the classmate with the largest number of question scores identical to the student's;
  • clusters student/partner pairs that share unusually high Collusion Scores into small Collusion Groups, typically of two to five students; and
  • estimates the false positive rate for each Collusion Group based on how frequently Q-SID finds Collusion Groups in strictly proctored examinations taken in person.
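The comparison behind the first bullet can be sketched in Python. This is a minimal illustration, assuming each student's question scores are held in lists of equal length; it only finds, for each student, the classmate with the most identical question scores. The Collusion Score itself is a statistic described in the Q-SID Guide and is not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def count_identical(a, b):
    """Number of questions on which two students received identical scores."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_partners(scores):
    """Map each student ID to (partner_id, identical_count).

    scores: dict of student ID -> list of question scores (numbers or
    answer letters). The partner is the classmate sharing the most
    identical scores; ties keep the first partner encountered.
    """
    best = {sid: (None, -1) for sid in scores}
    for s1, s2 in combinations(scores, 2):
        n = count_identical(scores[s1], scores[s2])
        if n > best[s1][1]:
            best[s1] = (s2, n)
        if n > best[s2][1]:
            best[s2] = (s1, n)
    return best


exam = {
    "s1": [3, 2, "b", 5],
    "s2": [3, 2, "b", 4],
    "s3": [1, 0, "a", 5],
}
print(best_partners(exam))
```

Here s1 and s2 share three of four question scores, so each is the other's partner, while s3's closest match is s1 with a single identical score.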

To use Q-SID, read the Q-SID Guide, view Example Results, and go to the Analyze Data page. Students should read the explanation on the For Students page.


Q-SID was developed to counter the increase in cheating that has occurred since the COVID-19 pandemic prevented strict proctoring of exams. Q-SID was established and optimized using data from in-person, proctored, pre-pandemic exams and from unproctored, online, pandemic-era exams taken at UC Berkeley. Q-SID assigns each Collusion Group one of three false positive rates (FPRs): 0.5%, 0.2%, or 0.05%. Approximately half of the students known to have colluded, based on analysis of their written answers and on student confessions, are found at an FPR of 0.05%, as shown here. Q-SID identifies 40% to 90% of the students who have colluded on a given exam, depending on the number of question scores available.


Q-SID was developed by researchers in the Department of Statistics at UCLA for the wider academic community. Its goals are fourfold: first, to identify the students most likely to have colluded; second, to protect the majority of students who obey their school's honor code by ensuring that they receive the grade they deserve, not a lower one; third, to deter cheating by informing students in advance that their exams will be screened by Q-SID; and fourth, to allow departments and institutions to measure collusion system-wide and thereby establish better-informed policies. Read this Article for evidence that Q-SID deters cheating.


Q-SID works well for all class sizes of 25 or more. It requires at least 20 question scores per exam and performs best with at least 40, ideally more than 60. Q-SID can combine data from two or more exams from the same class to improve the analysis. Please read the Q-SID Guide for full details.
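A pre-upload check against these size guidelines might look like the following sketch. The function name and messages are hypothetical; only the numeric limits come from the paragraph above.

```python
MIN_STUDENTS = 25           # "works well for all class sizes of 25 or more"
MIN_QUESTION_SCORES = 20    # required minimum per exam
GOOD_QUESTION_SCORES = 40   # performs best at 40 or more
IDEAL_QUESTION_SCORES = 60  # ideally more than 60


def size_warnings(n_students, n_question_scores):
    """Return warnings for an exam that falls short of the size guidelines."""
    warnings = []
    if n_students < MIN_STUDENTS:
        warnings.append(
            f"{n_students} students: below the {MIN_STUDENTS} recommended")
    if n_question_scores < MIN_QUESTION_SCORES:
        warnings.append(
            f"{n_question_scores} question scores: below the required "
            f"{MIN_QUESTION_SCORES}")
    elif n_question_scores < GOOD_QUESTION_SCORES:
        warnings.append(
            f"{n_question_scores} question scores: usable, but "
            f"{GOOD_QUESTION_SCORES}+ performs best")
    elif n_question_scores <= IDEAL_QUESTION_SCORES:
        warnings.append(
            f"{n_question_scores} question scores: good; more than "
            f"{IDEAL_QUESTION_SCORES} is ideal")
    return warnings


print(size_warnings(120, 30))
```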


Q-SID does not store users' data or require registration. The Q-SID team would, however, welcome feedback on its performance. Feel free to email us about your experience with Q-SID at qsid@stat.ucla.edu.


We do not record IP addresses, locations, or other user information. For users whose browsers allow tracking, the site anonymously records each occasion on which a Q-SID analysis is run. Users who do not want their use of Q-SID to be counted can prevent this by using Firefox with Privacy set to 'Strict' and Do Not Track set to 'Always'.


Academic or nonprofit researchers, as well as teachers employed by nonprofit institutions, are permitted to use the Q-SID Program posted on this site subject to the Academic License.

Q-SID Analysis


Use this page to upload a file that contains the question scores for all students from one exam. Q-SID will process these data to produce two output files as described in the Q-SID Guide.


Q-SID takes as input a spreadsheet file in .xls, .xlsx, or .csv format.


When uploading a single exam for analysis, the data should be placed on Sheet 1. When uploading multiple exams for the same class for a combined analysis, each exam should be placed on a separate sheet in the file, starting with Sheet 1 and with subsequent exams placed on consecutive sheets. Q-SID expects the number of exams specified by the user to correspond to the number of sheets with data. When more than one exam is included, Q-SID combines and uses data only for those students whose IDs are present in all exams.
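The combination rule above can be sketched in Python as follows. This is an illustration only, assuming each sheet has already been read into a dictionary from student ID to that student's list of question scores. The function name is hypothetical, and concatenating the scores in sheet order is an assumption about how the exams are merged, not a description of Q-SID's code.

```python
def combine_exams(exams):
    """Combine several exams from one class into a single score table.

    exams: list of dicts, one per sheet, each mapping student ID ->
    list of question scores. Only students present in every exam are
    kept, with their scores concatenated in sheet order (an assumption).
    Returns (combined, dropped_ids).
    """
    if not exams:
        return {}, set()
    shared = set(exams[0]).intersection(*exams[1:])
    everyone = set().union(*exams)
    combined = {
        sid: [q for exam in exams for q in exam[sid]]
        for sid in sorted(shared)
    }
    return combined, everyone - shared


sheet1 = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
sheet2 = {"a": [7], "c": [8], "d": [9]}
print(combine_exams([sheet1, sheet2]))
```

Students b and d each appear on only one sheet, so they are left out of the combined analysis.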


On each sheet, the top row should contain a header describing the information in each column below it. The user chooses the header titles, but the columns must contain the information specified below.


From the second row down, the columns must contain, in order from left to right: the student's ID, which can be any text; the student's total score on the exam, which must be a number; and multiple columns, each of which gives the score for one question. A question score may be either a numeric value (the graded number of points the student obtained) or, for multiple-choice tests, a single letter or word representing the student's choice of answer, for example a, b, c, d, or e, or true or false. A mix of numeric and text scores within a single exam is allowed. Each row must contain the information for one student. Q-SID will not use data from rows that share the same student ID or from rows that lack an ID, and it will list the IDs of any data it ignores.
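A minimal parsing sketch for one sheet, following the layout rules above, is shown below. It assumes the rows have already been read from the file into Python lists, with the header in the first row. The helper names, and the choice to lowercase text answers so that "A" and "a" match, are hypothetical.

```python
from collections import Counter


def normalize(cell):
    """Keep numeric scores as floats; keep text answers as lowercase strings."""
    if isinstance(cell, (int, float)):
        return float(cell)
    text = str(cell).strip()
    try:
        return float(text)
    except ValueError:
        return text.lower()  # e.g. "A" -> "a", "True" -> "true"


def parse_sheet(rows):
    """Parse one sheet: ID, total score, then one cell per question.

    rows: list of rows (lists of cells); rows[0] is the header row.
    Returns (students, ignored) where students maps ID -> (total, scores).
    Rows with an empty ID or a duplicated ID are skipped and reported.
    """
    data = rows[1:]
    ids = [str(r[0]).strip() if r and r[0] is not None else "" for r in data]
    counts = Counter(ids)
    students, ignored = {}, []
    for sid, row in zip(ids, data):
        if not sid:
            ignored.append("(missing ID)")
            continue
        if counts[sid] > 1:
            if sid not in ignored:
                ignored.append(sid)
            continue
        total = float(row[1])  # raises ValueError if the total is not a number
        students[sid] = (total, [normalize(c) for c in row[2:]])
    return students, ignored


sheet = [
    ["ID", "Total", "Q1", "Q2"],
    ["s1", 10, 5, "A"],
    ["s2", "9", 4, "b"],
    ["s2", 8, 3, "c"],
    ["", 7, 2, "d"],
]
print(parse_sheet(sheet))
```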


The columns and rows must be in the order specified. Ensure that no additional information is present in the file below the rows containing student data. Download this template file as an example.

Frequently asked questions


Q. How is a question score defined?

A. A question score can be either a numeric value (the graded number of points the student obtained) or, for multiple-choice tests, a single letter or word representing the student's choice of answer, for example a, b, c, d, or e, or true or false. A mix of numeric and text scores within a single exam is allowed.


Q. How is exam Complexity defined?

A. Complexity is a measure of how much information is present in the question scores for a given exam or combination of exams from the same class. The more question scores are recorded for an exam, the higher the Complexity. The more variation there is in the scores that students receive for a typical question, the higher the Complexity. The larger the Complexity, the greater the ability of Q-SID to detect collusion should it exist. The Q-SID guide describes how exam Complexity is calculated and indicates the likely performance of Q-SID for different Complexities.


Q. Is a high Collusion Score sufficient to prove that a student cheated?

A. Collusion Scores are not intended as sole proof of collusion; they identify pairs of students who may have colluded. It is essential that the written answers of these students be carefully compared to determine whether they in fact colluded. See the Q-SID Guide for a discussion of this process.


Q. What criteria are used to place students in a Collusion Group?

A. Each Collusion Group must include at least one student whose Collusion Score is greater than or equal to a first threshold, and all of the group's other members must have Collusion Scores greater than or equal to a second, lower threshold. The thresholds are chosen so that, on average, one pair of students who did not collude is identified per class of three hundred students; see the Q-SID Guide. Membership in a Collusion Group is thus not, in itself, evidence of collusion. Instead, Collusion Groups identify students whose answers should be compared to determine whether they exchanged information.
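One way to apply the two-threshold rule is sketched below. It assumes each student's partner and Collusion Score are already known, links a student to their partner only when both scores clear the lower threshold, and keeps clusters that contain at least one student at or above the higher threshold. Both threshold values and the union-find clustering are hypothetical illustrations; the actual thresholds are described in the Q-SID Guide.

```python
FIRST_THRESHOLD = 3.0   # hypothetical higher threshold
SECOND_THRESHOLD = 2.0  # hypothetical lower threshold


def collusion_groups(pairs, first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """Group students into candidate Collusion Groups.

    pairs: dict mapping student ID -> (partner ID, Collusion Score).
    Returns a list of sets of student IDs.
    """
    score = {sid: s for sid, (_, s) in pairs.items()}
    parent = {sid: sid for sid in pairs}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link a student to their partner only if both clear the lower threshold.
    for sid, (partner, s) in pairs.items():
        if s >= second and score.get(partner, float("-inf")) >= second:
            parent[find(sid)] = find(partner)

    clusters = {}
    for sid in pairs:
        clusters.setdefault(find(sid), set()).add(sid)

    # Keep clusters of two or more that include a student at or above `first`.
    return [g for g in clusters.values()
            if len(g) > 1 and max(score[m] for m in g) >= first]


pairs = {
    "a": ("b", 3.5), "b": ("a", 3.5), "c": ("b", 2.4),
    "d": ("e", 2.1), "e": ("d", 2.2), "f": ("a", 1.0),
}
print(collusion_groups(pairs))
```

In this toy input, d and e clear only the lower threshold, so their cluster is not reported; a, b, and c form one group because a and b clear the higher threshold.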


Q. How are false positive rates (FPRs) estimated?

A. FPRs are the percentages of students from strictly proctored examinations whom Q-SID places into each of three Collusion Group bins. Students are assumed not to have colluded during strictly proctored exams. The three FPR bins are defined by the highest Collusion Score in each group. The FPR associated with each bin was calculated from 31 proctored examinations taken by more than 9,000 students. The FPRs for Collusion Groups in all other examinations are inferred from this proctored dataset.
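The bin assignment and the FPR calculation described above can be sketched as follows. The cutoff values are hypothetical placeholders. The sketch assumes that a higher top Collusion Score places a group in a lower-FPR bin and that each group is counted only in its own bin; the real cutoffs and their class-size adjustment come from the proctored reference data, and this page does not state whether the published rates are per-bin or cumulative.

```python
# Hypothetical lower bounds on a group's highest Collusion Score,
# listed from the strictest bin (lowest FPR) to the loosest.
BIN_CUTOFFS = [(4.0, "0.05%"), (3.5, "0.2%"), (3.0, "0.5%")]


def fpr_bin(group_scores, cutoffs=BIN_CUTOFFS):
    """Assign a group to a bin using its highest Collusion Score."""
    top = max(group_scores)
    for cutoff, label in cutoffs:
        if top >= cutoff:
            return label
    return None  # below every bin: not reported as a Collusion Group


def empirical_fpr(groups, n_students, cutoffs=BIN_CUTOFFS):
    """Percent of all proctored students placed into groups in each bin.

    groups: list of groups, each a list of its members' Collusion Scores.
    n_students: total number of students across the proctored exams.
    """
    counts = {label: 0 for _, label in cutoffs}
    for scores in groups:
        label = fpr_bin(scores, cutoffs)
        if label is not None:
            counts[label] += len(scores)
    return {label: 100.0 * c / n_students for label, c in counts.items()}


proctored_groups = [[4.2, 2.5], [3.1, 2.0], [1.0]]
print(empirical_fpr(proctored_groups, n_students=1000))
```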


Q. How accurate are FPRs?

A. The Collusion Score ranges that define the three FPR bins are adjusted for class size to maintain the FPRs. FPRs are also robust to changes in the number and rigor of questions (i.e., exam Complexity). We cannot, however, rule out that other differences in examination type or format could affect their accuracy.


Q. In addition to FPR, what other information from Q-SID can suggest which Collusion Groups are most likely to include students who colluded?

A. None of the false positive Collusion Groups identified in the set of nineteen strictly proctored exams had more than two members. Further, in only one case was the same false positive pair identified in more than one exam. In contrast, students shown by independent evidence to have colluded are often placed in the same Collusion Group in multiple exams for the same class and can be members of groups larger than two.


Q. How can I best design an exam to allow Q-SID to efficiently detect collusion?

A. Q-SID is more effective the higher the Complexity of the exam; see the Q-SID Guide. For technical reasons, Q-SID is not effective for exams with a Complexity below 10 and will not calculate Collusion Scores for such exams. We recommend that exams ideally have a Complexity of 15 or more. To estimate the Complexity of an exam before giving it to the class, instructors can run Q-SID on the results of similar exams from prior years.

To maximize exam Complexity, instructors should break down questions that carry many points and record the scores for each part separately, generating more independent scores. If some questions on the test are multiple choice, record each student's choice of answer as the question score. Knowing which wrong answer a student gave provides additional statistical power for detecting collusion.
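The benefit of recording the chosen answer can be illustrated with a toy example. The answer key and the three students' responses below are invented for illustration: a pair of students who chose the same wrong answers looks no more alike than an independent student when scores are recorded only as correct or incorrect, but the pair stands out when the chosen letters are compared.

```python
def identical_count(a, b):
    """Number of questions on which two score lists agree."""
    return sum(x == y for x, y in zip(a, b))


key = ["a", "c", "b", "d", "a"]    # invented answer key
alice = ["a", "b", "b", "e", "c"]
bob = ["a", "b", "b", "e", "c"]    # the same wrong choices as alice
carol = ["a", "d", "b", "c", "b"]  # wrong on the same questions, different choices


def as_correct(answers):
    """Reduce chosen letters to 1 (correct) or 0 (incorrect)."""
    return [int(x == k) for x, k in zip(answers, key)]


print("letters:", identical_count(alice, bob), identical_count(alice, carol))
print("correct/incorrect:",
      identical_count(as_correct(alice), as_correct(bob)),
      identical_count(as_correct(alice), as_correct(carol)))
```

With letters recorded, alice matches bob on 5 questions but carol on only 2; reduced to correct or incorrect, both comparisons give 5 matches.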


Q. What if the Complexity of the exam I gave is too low?

A. To increase Complexity and thus the power to detect collusion, Q-SID can combine data from two or more exams for the same class. The Complexity of the combined exam is calculated as the sum of the Complexities of the individual exams. To have Q-SID combine exams, instructors simply upload a single Excel file with each exam on a separate sheet and specify the number of exams on the Analyze Data page.
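The combination rule and the Complexity thresholds from the previous answer can be expressed as a short check. This sketch assumes the Complexity of each individual exam has already been obtained, for example from Q-SID's output for that exam; how Complexity is calculated is described in the Q-SID Guide and is not reproduced here, and the function names are hypothetical.

```python
MIN_COMPLEXITY = 10          # below this, Collusion Scores are not calculated
RECOMMENDED_COMPLEXITY = 15  # recommended minimum


def combined_complexity(complexities):
    """Complexity of a combined analysis: the sum of the exams' Complexities."""
    return sum(complexities)


def complexity_status(c):
    """Classify a Complexity against the thresholds stated in the FAQ."""
    if c < MIN_COMPLEXITY:
        return "too low: Collusion Scores will not be calculated"
    if c < RECOMMENDED_COMPLEXITY:
        return "usable, but below the recommended 15"
    return "meets the recommendation"


# Two hypothetical midterms that are each too low on their own.
midterms = [8, 9]
total = combined_complexity(midterms)
print(total, complexity_status(total))
```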


Q. Does Q-SID perform differently depending on the number of students in the class?

A. Q-SID is remarkably robust to changes in class size, see the Q-SID Guide. It works well for classes of 15 or more students.


Q. Does Q-SID store the data that users upload?

A. No. Users' data are not stored after the results of the analysis have been provided.


Q. The file I uploaded had students listed in a certain order, but the Q-SID output excel file has ranked the students differently. Why?

A. Q-SID ranks students by Collusion Score and Collusion Group, with the student/partner pair with the highest score at the top of the list.


Q. What relevant expertise do the Q-SID developers have?

A. The senior developers have many years' experience establishing analysis methods for large-scale quantitative datasets in biology, specifically in genomics and proteomics. They have also taught many undergraduate and graduate classes in Molecular Biology or Statistics at Yale, UC Berkeley and UCLA.


Q. How is Q-SID funded?

A. It is not. This is a volunteer effort in response to a widespread problem affecting universities and colleges.


Other questions? Please contact us at qsid@stat.ucla.edu

Explanation for students


Dear Students,


Our chief motivation for developing Q-SID and making it freely available is to protect the large majority of you who follow your school's honor code and play by the rules. When, unfortunately, some students collude on their exams, it disadvantages the majority of you who do not. Because it is challenging to grade each student's performance objectively on an absolute scale, instructors tend to judge your performance relative to that of the other students in the class. As a result, if collusion is allowed to go unchecked, those of you who do not cheat may receive a grade lower than you deserve.


Q-SID protects students who play by the rules by helping identify others who have colluded. In addition, we hope that students who contemplate cheating will be dissuaded by reading about Q-SID in advance. Students who exchange information sufficiently to affect their grade will be detected by Q-SID. In our experience, students who cheat generally come to regret their actions and suffer significant remorse. It would be far healthier to avoid colluding in the first place. Read this Article for evidence that Q-SID deters cheating.


Those of you who play by the rules will not be falsely accused as a result of Q-SID being used. This web tool's only role is to identify small groups with similar question scores so that course instructors can examine their written answers in detail. Compelling evidence of collusion requires that group members' written answers be substantially more similar to each other than to those of the rest of the class. Shared errors that appear neither in the course materials nor in study guides often indicate collusion. The written exams of students who have not colluded are remarkably distinct from each other compared with those of students who have.


We hope that all students will benefit from Q-SID.

The Q-SID team


Dr. Mark D. Biggin, Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley CA: Website


Prof. Jingyi Jessica Li, Department of Statistics, UCLA, Los Angeles CA: Website


Guan'ao Yan, Department of Statistics, UCLA, Los Angeles CA: Website


We can be contacted at qsid@stat.ucla.edu.


We gratefully acknowledge help from a number of instructors at UC Berkeley who provided question score data from their classes and feedback on the results of Q-SID's analysis.

Academic License


Academic License: © 2020 UCLA (“Institution”). Academic or nonprofit researchers, as well as teachers employed by nonprofit institutions, are permitted to use the Q-SID Program posted on this site subject to Paragraphs 1-2:


1. Institution hereby grants to you free of charge, so long as you are an academic or nonprofit researcher, a nonexclusive license under Institution’s ownership interest in this Q-SID Program (the “Program”) to use the Program solely for educational or academic research purposes subject to the terms of this Academic License. Except as granted herein, all rights are reserved by Institution, including the right to pursue patent protection of the Program. Failure by you to adhere to the requirements in Paragraphs 1 and 2 will result in immediate termination of the license granted to you pursuant to this Academic License effective as of the date you first used the Program.


2. In no event shall Institution be liable to any entity or person for direct, indirect, special, incidental, or consequential damages, including lost profits, arising out of the use of this program, even if Institution has been advised of the possibility of such damage. Institution specifically disclaims any and all warranties, express and implied, including, but not limited to, any implied warranties of merchantability and fitness for a particular purpose. The software is provided “as is.” Institution has no obligation to provide maintenance, support, updates, enhancements, or modifications of this program.


Commercial entities: please contact qsid@stat.ucla.edu for licensing opportunities.