Given the high-stakes nature of the independent school admission process, regular audits of the tools used to assess applicants (tests, interviews, recommendations, writing samples, etc.) should be conducted to ensure their appropriate, responsible, and standardized use. Are the tools being used for the purpose for which they were designed? Are they measuring what they are intended to measure? Are all professionals consistently applying standard assessment rubrics year over year? How do you know? To address questions like these, The Enrollment Management Association’s testing and research team, led by professional statisticians, regularly assesses the SSAT. Recently, the team performed a large-scale study to confirm the SSAT’s validity and reliability in meeting its goal of predicting student success.
There are multiple ways in which The Enrollment Management Association’s psychometricians regularly audit the performance of the SSAT. They look for bias in each item; they make sure the items are accurate and that each question response choice is clear; they look to see if items are functioning differently between and among groups; they ensure that scores between and among test administrations are equivalent (through an advanced statistical method called equating); and they perform real-time analyses before scores are released to students.
Yet, the primary way these experts ensure the integrity of the test is through research into its reliability and validity. Statistically speaking, reliable tests are not necessarily valid, and there are various sources of validity evidence that can be examined, such as the content tested (e.g., subject area and types of items), the internal structure of the test (e.g., reliability and other psychometric properties), and relationships between the test scores and other variables (e.g., correlations with the outcomes the test is expected to predict).
Dr. Jinghua Liu, chief testing and research officer for the SSAT program, reminds us that “Test validity refers to the degree to which evidence exists to support the interpretation of test scores for particular purposes. It is important to note that we validate a test score for a particular use (e.g., admission, placement), and that validity is not the property of a test in and of itself. This means that as opposed to talking about a test as simply valid or not valid, one should instead state, for example, ‘There is a great deal of validity evidence to support the use of SSAT scores for independent school admission decisions.’” In essence, validity is a matter of degree and not absolute.
It is therefore imperative that testing programs regularly gather validity evidence over time to enhance, confirm, or contradict previous findings. In spring 2015, Liu and her colleagues conducted a large-scale predictive validity study to confirm prior validity data. Measuring predictive validity quantifies the extent to which the SSAT predicts what it was designed to predict: a student’s first-year GPA (grade point average).
A representative sample of 59 schools provided performance data for over 16,000 students. After analyzing the data, Liu found that the SSAT predicts first-year GPA about as well as, or even slightly better than, the SAT does. (See table, page 32.) This study therefore confirms the predictive power of the SSAT.
1 This table appears in SSAT Predictive Validity Study – Preliminary Results (Liu and Low), June 2016. The complete research report is currently under peer review and will be published on enrollment.org later this year. (It is important to remember that correlations (R) in the .3-.5 range are typically considered strong for admission testing.)
Understanding Range Restriction
While these results are impressive and speak directly to Dr. Liu’s oversight and the value of the SSAT as an admission tool, the statement we sometimes hear is: “That may be true, but it’s not true for my school.” And here is where the most important parameter of prediction comes into play—the issue of range restriction.
Simply put, range restriction refers to the fact that if a sample has a restricted range of scores, the correlation will be reduced. This makes sense when one thinks of the most extreme example. As Rice University Associate Professor David Lane explains on his website, “Consider what the correlation between high-school GPA and college GPA would be in a sample where every student had the same high-school GPA. The correlation would necessarily be 0.0.” (davidmlane.com)
Often schools that use an admission test as a selection tool will admit students within a fairly narrow range of scores. For these schools, when they study the relationship between incoming test scores and first-year GPA, they will sometimes find correlations below or even well below .3 due to restriction of range. Even for a large-scale predictive validity study like the one Dr. Liu undertook with over 16,000 students in the sample, range restriction must be addressed in the statistical model.
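The effect described above can be illustrated with a small simulation. The sketch below is not the method used in the SSAT study, which the article does not specify. It assumes standardized test scores and GPAs with a hypothetical true correlation of .5, a hypothetical school that admits only applicants scoring at least one standard deviation above the mean, and Thorndike's Case II formula as one commonly cited correction for range restriction. All function names, parameters, and data are invented for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def std_dev(xs):
    """Sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def simulate(n=20000, true_r=0.5, cutoff=1.0, seed=0):
    """Return (r_full, r_restricted, r_corrected) for simulated applicants.

    Assumptions (all hypothetical): scores and GPAs are standard normal,
    and the school admits only applicants with score >= cutoff.
    """
    rng = random.Random(seed)
    scores, gpas = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Build a GPA whose population correlation with score is true_r.
        gpa = true_r * score + math.sqrt(1 - true_r ** 2) * noise
        scores.append(score)
        gpas.append(gpa)

    # Keep only "admitted" students: a narrow, high range of scores.
    admitted = [(s, g) for s, g in zip(scores, gpas) if s >= cutoff]
    adm_scores = [s for s, _ in admitted]
    adm_gpas = [g for _, g in admitted]

    r_full = pearson_r(scores, gpas)
    r_restricted = pearson_r(adm_scores, adm_gpas)

    # Thorndike Case II correction: u is the ratio of the applicant-pool
    # standard deviation to the admitted-group standard deviation.
    u = std_dev(scores) / std_dev(adm_scores)
    r_corrected = (r_restricted * u) / math.sqrt(
        1 + r_restricted ** 2 * (u ** 2 - 1)
    )
    return r_full, r_restricted, r_corrected


if __name__ == "__main__":
    r_full, r_restricted, r_corrected = simulate()
    print(f"all applicants:        r = {r_full:.2f}")
    print(f"admitted students only: r = {r_restricted:.2f}")
    print(f"after correction:      r = {r_corrected:.2f}")
```

In a run of this sketch, the correlation among admitted students should come out well below the correlation in the full applicant pool, even though the underlying relationship is identical. The correction needs the standard deviation of the full applicant pool, which a single school working only from its enrolled-student data may not have.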
Therefore, schools should investigate and audit their admission programs from all possible angles, while understanding the limitations of their enrolled-student data. Most importantly, they should also rely on the predictive validity results of regular, large-scale studies to gauge the overall quality and effectiveness of the assessment tool they are using.
Ask questions and seek answers. Be an educated consumer. To paraphrase Dr. Shaun Harper of the Center for the Study of Race & Equity in Education at the University of Pennsylvania: We need to be IRS-like in auditing our institutions’ commitment to excellence in admission.