Sexual intercourse began
In nineteen sixty-three
(which was rather late for me) -
Between the end of the Chatterley ban
And the Beatles’ first LP.
(from Annus Mirabilis by Philip Larkin)
Evidence-based practice has become
all but a cliché in educational discourse. Perhaps finally tiring of talking
about ‘learnings’, ‘privileging’ and verbing any other noun they can get their
hands on, educationists have decided to “sing from the same songsheet” of
evidence-based practice. That’s got to be a good thing, right? Well, yes, it
would be if they were singing the same tune and the same words. Unfortunately,
evidence-based practice means different things to different people. This is why
I personally prefer the term scientific evidence-based practice. But how are we
to know what constitutes (scientific) evidence-based practice?
The Education Minister for New
South Wales has recently (August 2012) launched the Centre for Education
Statistics and Evaluation which “undertakes in depth analysis of education
programs and outcomes across early childhood, school, training and higher
education to inform whole-of-government, evidence based decision making.” (See http://tinyurl.com/d53f2y2 and http://tinyurl.com/c6uh3y4). Moreover, we are told,
“The Centre turns data into knowledge, providing
information about the effectiveness of different programs and strategies in
different contexts – put simply, it seeks to find out what works”.
Ah, ‘what works’, that rings a bell. It is too
early to tell whether this new centre will deliver on its promises but what
about the original ‘What Works Clearinghouse’ (WWC), the US-based repository of
reports on educational program efficacy that originally promised so much?
As Daniel Willingham
has pointed out:
“The U.S. Department of Education
has, in the past, tried to bring some scientific rigor to teaching. The What
Works Clearinghouse, created in 2002 by the DOE's Institute of Education
Sciences, evaluates classroom curricula, programs and materials, but its standards
of evidence are overly stringent, and teachers play no role in the vetting
process.” (See http://tinyurl.com/bn8mvdt)
My colleagues and I have also been critical of
WWC. And not just for being too stringent. Far from being too rigorous, the WWC
boffins frequently make what seem to us egregious mistakes; mistakes that, far too
often for comfort, appear to support a particular approach to teaching and
learning.
I first
became a little wary of WWC when I found that our own truly experimental study
on the efficacy of Reading Recovery (RR) had been omitted from their analyses
underlying their report on RR. Too bad, you might think, that’s just sour
grapes. But, according to Google Scholar, the article has been cited 160 times
since publication in 1995 and was described by eminent American reading
researchers Shanahan and Barr as one of the “more sophisticated studies”.
Interestingly enough, it is frequently cited by proponents of RR (we did find it to be effective) as well as by its
critics (but effective only for one
in three children who received it). So why was it not included by WWC? It was considered for inclusion but was
rejected on the following grounds:
“Incomparable groups: this study was a quasi-experimental
design that used achievement pre-tests but it did not establish that the
comparison group was comparable to the treatment group prior to the start of
the intervention.”
You can read the details of why this is just
plain wrong, as well as other criticisms of WWC, in Carter and Wheldall (2008)
(http://tinyurl.com/c6jcknl). Suffice it to say that participants were randomly allocated to treatment
groups and that we did establish that the control group (as well as the
comparison group) was comparable to the (experimental) treatment group who
received RR prior to the start of the intervention. This example also
highlights another problem with WWC’s approach. Because they are supposedly so
‘rigorous’, they discard the vast majority of studies from the research
literature on any given topic as not meeting their criteria for inclusion or
‘evidence standards’. In the case of RR, 78 studies were considered and
all but five were excluded from further consideration. Our many other criticisms
of what we regard as a seriously flawed WWC evaluation report on RR are
detailed in Reynolds, Wheldall, and Madelaine (2009) (http://tinyurl.com/cuj8sqm).
Advocates of Direct Instruction (DI) seem to have been particularly ill-served by
the methodological ‘rigour’ of WWC. Not only are most of the more recent studies
of the efficacy of DI programs excluded because they do not meet the WWC evidence
standards, but WWC also imposes a blanket ban on any study (regardless of
technical adequacy) published before 1985; an interesting, if
somewhat idiosyncratic, approach to science. Philip Larkin told us that sex only
began in 1963 but who would have thought that there was no educational research
worth considering before 1985? (Insert
your own favourite examples here of important scientific research in other
areas that would fall foul of this criterion. Relativity anyone? Gravity?) Zig
Engelmann, the godfather of DI, has written scathingly about the practices of
WWC (http://tinyurl.com/c5pjm9d and http://tinyurl.com/85t2vpt), concluding:
“I consider WWC a very dangerous organization. It is
not fulfilling its role of providing the field with honest information about
what works, but rather seems bent on finding evidence for programs it would
like to believe are effective (like Reading Recovery and Everyday
Mathematics).”
Engelmann
can be forgiven for having his doubts given that for the 2008 WWC evaluation
report on the DI program Reading Mastery (RM) (http://tinyurl.com/d8kawf7), WWC could not find a single study that met their evidence standards out of the 61
studies they were able to retrieve. (Engelmann claims that there were over 90
such studies, mostly peer-reviewed.)
The most recent WWC report on RM in 2012 (http://tinyurl.com/7bdobxv), specifically concerned with its efficacy for
students with learning disabilities, determined that only two of the 17 studies
it identified as relevant met evidence standards and concluded:
“Reading Mastery was found to have no discernible
effects on reading comprehension and potentially negative
effects on alphabetics, reading fluency, and writing for students with
learning disabilities.”
In response to this judgement, the Institute
for Direct Instruction pointed out, not unreasonably, that, of the two studies
considered:
“One actually showed that students studying with RM
had significantly greater gains than students in national and state norming
populations. Because the gains were equal to students in Horizons (another DI
program), the WWC concluded that RM had no effect. The other study involved
giving an extra 45 minutes of phonics related instruction to students studying
RM. The WWC interpreted the better results of the students with the extra time
as indicating potentially negative effects of RM.” (http://tinyurl.com/9oewdlo)
In other words, in each case Reading Mastery was compared with another very
similar DI program (or with Reading Mastery plus additional instruction), and
because the results for the comparison condition were no different from, or
slightly better than, those for the standard Reading Mastery program, it was
concluded that Reading Mastery was ineffective for students with learning
disabilities and possibly even detrimental to their progress. It is conclusions such as these that have led
some experts in the field to wonder whether this is the result of incompetence
or bias: cock-up or conspiracy.
If we needed any further
proof of the unreliability of WWC reports, we now have their August 2012 report
on whether Open Court Reading©
improves adolescent literacy (http://tinyurl.com/9nzv5wj). True to form, they
discarded 57 out of 58 studies as not meeting evidence standards. On the basis
of this one study they concluded that Open Court “was found to have potentially
positive effects on comprehension for adolescent readers”. There are at least
three problems with this conclusion. First, this is a bold claim based on the
results for just one study, the large sample size and their ‘potentially
positive’ caveat notwithstanding. Second, the effect size was trivial at 0.16,
not even ‘small’, and well below WWC’s own usual threshold of 0.25. Third, and
most important of all, this study was not even carried out with adolescents!
The study sample comprised “more than 900 first-grade through fifth-grade
[students] who attended five schools across the United States”. As Private Eye
magazine would have it, “shorely shome mishtake” …
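To put that effect size in perspective, here is a minimal back-of-the-envelope sketch (my own illustration, not WWC’s calculation), assuming the reported figure is a standardized mean difference (Cohen’s d) and that scores are roughly normally distributed: under those assumptions, a d of 0.16 moves the average student only about six percentile ranks.

```python
# Rough illustration only: assumes the effect size is a standardized mean
# difference (Cohen's d) and roughly normal score distributions.
from scipy.stats import norm

d = 0.16             # effect size reported for Open Court (per the WWC report)
wwc_threshold = 0.25 # WWC's usual cut-off, as noted above

percentile_gain = (norm.cdf(d) - 0.5) * 100  # shift of the average student
print(f"Average student moves from the 50th to about the "
      f"{50 + percentile_gain:.0f}th percentile "
      f"(a gain of roughly {percentile_gain:.0f} percentile ranks).")
print(f"Below WWC's own 0.25 threshold? {d < wwc_threshold}")
```

On these assumptions the “potentially positive” effect amounts to moving the average child from the 50th to roughly the 56th percentile.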
There is, then, good reason for serious concern
regarding the reliability of the judgments offered by WWC. Quite apart from the
egregious errors noted above, there is the more pressing problem that truly experimental
trials are still relatively rare in educational research and those that have
been carried out may often be methodologically flawed. In its early years, What
Works was renamed ‘Nothing Works’ by some because there was little or no
acceptable evidence available on many programs. Clearly, teachers cannot just
stop using almost all programs and interventions until there are sufficient
randomized controlled trials (RCTs) testifying to their efficacy to warrant adopting them. Hattie, for
example, in his seminal 2009 work ‘Visible Learning’, has synthesized over 800
meta-analyses relating to achievement in order to offer
evidence-based advice to teachers (http://tinyurl.com/3h9jssl). (Very few of the studies
on which the meta-analyses were based were RCTs, however,
as Hattie makes clear.)
Until we have a large evidence base of
methodologically sound randomized controlled trials on a wide variety of
educational programs, methods and procedures, we need a more sophisticated and
pragmatic analysis of the evidence we currently have available. It is not a
question of accepting any evidence in the absence of good evidence, but rather of
assessing the existing research findings and carefully explaining the
limitations and caveats.
As I have attempted to show, the spurious rigour of
WWC, whereby the vast majority of studies on any topic are simply discarded as
being too old or too weak methodologically, coupled with their unfortunate
habit of making alarming mistakes, makes it hard to trust their judgments. If
the suggestions of bias regarding their pedagogical preferences have any
substance, we have even more cause for concern. As it stands, What Works simply
won’t wash.
Postscript November 15, 2013
Further to my original blog post ‘What’s wrong with
What Works?’, WWC have released new reports on reading interventions that
confirm that at WWC it is business as usual. My colleague, Mark Carter, alerted
me to problems associated with the WWC evaluation (March 2013) of the efficacy
of FastForWord® (FFW):
“I just can't
believe WWC. They found "positive" effects of FFW for alphabetics but
the ES was 0.15 – trivial, below their own 0.25 (low) standard for educational
significance and moving the average child a whole 6 percentile ranks. They got
exactly the same results for fluency and comprehension and reached different
conclusions for each. Three identical effect sizes - three different
conclusions.
They say in the text
that the effects are below 0.25 and are "indeterminate" but then give
it a positive rating. They seem to be vote counting the number of significant
outcomes in a given area. This is conceptually antithetical to the whole idea
of meta-analysis. The problem with examining significance is that it is
substantially a function of sample size - that is why we use effect sizes to
aggregate findings across studies. And, to boot, they are ignoring their own
criteria for educational significance. I really can't believe it.”
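Carter’s point about vote counting versus effect sizes is easy to demonstrate. The sketch below is my own illustration with made-up sample sizes (not the actual FFW studies): the same trivial effect size of 0.15 can come out “statistically significant” or not depending purely on how many students were tested, which is exactly why meta-analysis aggregates the effect sizes themselves rather than tallying significant results.

```python
# Illustration only: hypothetical sample sizes, not the actual FFW studies.
# Shows why counting significant outcomes ('vote counting') can reach
# different conclusions about identical effect sizes, while meta-analysis
# aggregates the effect sizes themselves.
import numpy as np
from scipy.stats import norm

d = 0.15  # the same (trivial) effect size in every outcome domain

for n_per_group in (60, 400, 2000):  # hypothetical study sizes
    # Approximate z-test for a standardized mean difference with two
    # equal groups of size n: z ~= d * sqrt(n / 2).
    z = d * np.sqrt(n_per_group / 2)
    p = 2 * norm.sf(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"d = {d}, n = {n_per_group:4d} per group -> p = {p:.3f} ({verdict})")

# Meta-analytic aggregation instead weights each study's effect size,
# here by sample size as a simple proxy for inverse variance:
effects = np.array([0.15, 0.15, 0.15])
weights = np.array([60, 400, 2000])
print("Weighted mean effect size:", np.average(effects, weights=weights))
```

However the votes are counted, the aggregated effect size stays at 0.15: trivial, and below WWC’s own 0.25 criterion.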
It is also worth noting that WWC
based their efficacy evaluation on just nine of the 342 studies they originally
identified as examining the effects of FFW on early reading skills: seven studies
that met their evidence standards without reservations and two that met their
standards with reservations.