In this age of RtI, where testing is claimed to be a "thing of the past" (a position that is hogwash), it is even more important to fully understand the language and process of testing. This post is part 1 of a two-part series on testing.
Pay Attention to Sub-Scores on Tests
The most common issue that I see regarding sub-scores is on IQ or ability testing. For instance, a student receives an average or even above-average score in Verbal IQ (VIQ) but a lower score in Performance IQ (PIQ). Many psychologists have agreed with me that, in terms of school-based performance, the VIQ is more telling of expected aptitude. Another example was a student who had severe dysgraphia. The school gave him a VMI (test of Visual Motor Integration). The test has three components: a visual score, a motor score, and an integration or composite score. His visual score was very high but his motor score was very low, resulting in an average composite, from which the evaluator concluded that there was no issue. I was able to show the fallacy of this conclusion when the composite was unpacked into its sub-scores and the hearing officer looked at his work product, which was also poor. The subtests are there for a reason and should not be ignored or glossed over. There is as much art as science in the process of student evaluations.
Look at the Task Being Required on the Test
One of the most common achievement tests used is the WIAT. It produces scores in a variety of areas of educational achievement. The WIAT, as an example, has some troubling flaws regarding the writing component. Even in the upper grades, a student who wrote a poorly developed paragraph, which could be as short as one sentence, could still receive a low-average score, totally missing the boat that the student had severe issues in writing. I insist that the school apply a district or state rubric that would normally be used for high-stakes/NCLB testing, which generally will be more sensitive and more aligned with the work that is expected in class.
The ultimate fail-safe is to ask to see the work produced on this test or any other test. The technical term is to ask to see the “test protocol.” Getting copies can sometimes be problematic (even though there is legal support), but a review of the answer book should not be controversial. This review will let you see exactly what, or how little, was required to get the score. Sometimes the format of the question or other aspects of the test can account for an artificially low score. Reviewing the test protocol can provide real insight into the basis for the score.
Insist on Standard Scores, Not Percentiles or Grade-Equivalent Scores
Most norm-referenced testing will produce a variety of scoring options: standard scores, percentiles, and grade equivalents. The least useful and often the most misleading score is the grade equivalent. This score should be ignored or significantly discounted. The scores that carry the most statistical meaning are standard scores. These scores are usually normed to a mean of 100 (with a standard deviation of 15) for composite scores, or a mean of 10 (with a standard deviation of 3) for subtests: scores from 85 to 115 are in the average range on tests with a mean of 100, and 7 to 13 on tests/subtests with a mean of 10. I find it best to print out a copy of the bell curve from the web and plot your scores to see where things stand relative to the mean.
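For readers who want to see the arithmetic behind that bell-curve plot, here is a minimal sketch in Python (assuming the usual mean of 100 with a standard deviation of 15 for composite scores, and a mean of 10 with a standard deviation of 3 for subtests) that converts a standard score into a percentile:

```python
import math

def percentile(score, mean=100.0, sd=15.0):
    """Convert a standard score to a percentile using the normal (bell) curve."""
    z = (score - mean) / sd                                   # standard deviations from the mean
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2)))   # normal CDF, expressed as a percent

# A composite score of 85 (one SD below the mean of 100):
print(round(percentile(85), 1))                  # ~15.9th percentile
# A subtest score of 7 on the mean-10, SD-3 scale:
print(round(percentile(7, mean=10, sd=3), 1))    # ~15.9th percentile
```

Both examples land at the same percentile, which is exactly why 85 to 115 on a composite and 7 to 13 on a subtest describe the same "average" band around the mean.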
Issues come up for students with very high IQs. They may receive average scores in writing, math, and reading, but the issue is whether that is a fair comparison, since they could be achieving several standard deviations below their ability. OSEP’s Letter to Lillie Felton would support the position that it makes more sense to compare the student’s scores to himself, as opposed to the average norm, which may not reveal this student’s issues. The answer is often, “Well, we are not required to address giftedness and he/she is performing at an average level, so we (the school district) are just fine.” I argue based upon Lillie Felton and seek to take a more comprehensive view of the student’s work product to get at the real needs. The standard scores for students with high IQs can be misleading.
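To make the gap between ability and achievement concrete, here is a small illustrative calculation (the numbers are hypothetical, not from any actual case, and both scores are assumed to be on the mean-100, SD-15 standard-score scale):

```python
# Hypothetical numbers, both on a mean-100, SD-15 standard-score scale.
ability = 130       # full-scale IQ, two SDs above the mean
achievement = 92    # achievement score, squarely in the "average" 85-115 band

gap_points = ability - achievement
gap_sds = gap_points / 15.0
print(f"Gap: {gap_points} points, or {gap_sds:.1f} standard deviations")
# Gap: 38 points, or 2.5 standard deviations
```

A 92 looks fine against the population norm, but measured against this particular student's own ability it sits roughly two and a half standard deviations low, which is the point of the Lillie Felton comparison.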
Similarly, where students do not test well because of anxiety or behaviors, these standard scores should be discounted and other data need to be taken into account. Scores are useful only to the extent that they reveal the student’s needs, not merely that he or she does not test well.
I have had some districts refuse to report any scores at all and resort to narrative descriptions like “average” or, even worse, “doing well.” Get the scores reported and, even better, insist on an appendix of all scores so you can see them all in one place.
Some evaluations, like the Conners or BASC, are criterion-based, using data from questionnaires. There are no standard scores, so the above discussion does not apply. You must insist on being told what the different ranges of scores mean.
Beware of Non-Standard Test Administration
Test publishers have very strict rules about how, and how often, tests can be given. Tests of written expression are normed with students using pencil or pen and paper, not laptops or other keyboards. Using a keyboard invalidates the standardization of the testing. There are time limits on many subtests that must be observed if the scores are to be standardized. You can contact the test publishers (e.g., Pro-Ed, Psychological Corp.) for information about standard test procedures. If the standard test conditions are not observed, then the score should not be reported as a standard score, and the deviation from standard testing procedures must be noted in the report.
I have had several excellent evaluators who apply some subtests under both standard and nonstandard conditions, for instance, allowing the student extra time to see what effect a little more time can have on the results. The insights gained, if not the score, can be invaluable. The evaluator has to be willing to be creative and nonlinear in his or her thinking, which can be a challenge.
Comprehensive Means Comprehensive!
Schools will often limit their testing to those tests that they have on hand, or those that are familiar to the evaluator. There are many tests available to meet a variety of needs. I find that, under closer questioning, schools tend to pick only the tests they readily have on hand, not those that may fit the student’s particular profile. For instance, the NEPSY is a test that evaluates visual-spatial and executive function issues, two areas that can be ignored or not addressed in depth by other test instruments. The BRIEF is another test that I do not see applied very frequently. It is not as comprehensive as the NEPSY, but it can provide useful information about executive functioning through rating scales (questionnaire format).
The CTOPP (Comprehensive Test of Phonological Processing) reveals important, in-depth information regarding phonological processing but is not applied very frequently. It provides much more detailed information about the aspects of a student’s needs when he or she is having reading issues.
Beware of tests that are “summary” in nature, or that are meant to be a “screening instrument” only, but are being used to provide all the information at the IEP meeting. Case study evaluations are meant to evaluate the full scope of issues in a manner that goes beyond a screening instrument.
[Here is a link, local to Illinois, for Dr. Tonya Gall, an evaluator who does a great job with testing and is presenting at an IBIDA event.]