Reader Kate sent me a link to the HuffPo article by Srinivasan Pillay, “The Science of Distant Healing”, that everyone’s talking about this week. Apparently a study showed remote “intention” could act as a therapeutic intervention. I originally wasn’t going to bother with this, as the article was in my view confused and poorly written, and several skeptics in the comments seemed to be doing a pretty good job of taking it apart already. Then on Friday both Orac and Steven Novella wrote posts critical of the article. But when I got hold of the full study, a light went on in my head telling me I had something to add even to what those two luminaries had written.
First, I’ll do what Pillay didn’t do, and link to the abstract. Through the Elsevier link in the abstract I managed to obtain a temporary log-in ID to read the full study. It’s entitled “Compassionate Intention As a Therapeutic Intervention by Partners of Cancer Patients: Effects of Distant Intention on the Patients’ Autonomic Nervous System”. An odd title, since the authors clearly state in the study, “we did not test for distant healing” (more on that below).
The study supposedly measured the effects of intention on the autonomic nervous systems of a human “sender” and a distant “receiver”. Well, not really. What they actually measured were changes in skin conductance level – or, as Pillay wrote, “a measure of the ability of sweat to conduct electricity”. PAL called that “measurements from glorified Scientology E-meters”. Ouch! No illnesses being cured, then (as they admitted – see above). The paired senders and receivers were divided into three groups:
- Group 1: trained in directing intention; one person in each pair had cancer
- Group 2: untrained in directing intention; one person in each pair had cancer
- Group 3: untrained in directing intention; neither person in the pair had cancer
In groups 1 and 2, the healthy person directed intention at the sick person. In group 3, a healthy person directed intention at another healthy person. Members of group 3 were not randomly selected – they were (obviously) non-randomly allocated to the group with no cancer. And yet, group 3 was claimed to be the “control group”. However, all three groups were instructed to direct intention – i.e., even the “control group” directed intention. This is important when you consider the hypothesis being tested, which was:
The principal hypothesis was that the sender's DHI [distant healing intention] directed toward the distant, isolated receiver would cause the receiver's autonomic nervous system to become activated. A secondary analysis explored whether the factors of motivation and training modulated the postulated effect.
To test the principal hypothesis you obviously need a control group that is not sending intention, to compare with the intention groups. Otherwise, how do you know whether the intention had any effect? But there was no group without directed intention, which means there was no control group to test the actual principal hypothesis the authors of the study specifically said they were testing. So what were the results? Did the receivers’ autonomic nervous systems become activated, and did training and motivation make a difference? Take a look at Figure 6 from the study, and the note under it, and see what you think:
Figure 6 Comparison of sender and receiver effect sizes (per epoch) measured at stimulus offset (with ±2 standard error confidence intervals) for all sessions, motivated sessions (trained group and wait group combined), and trained, wait, and control groups separately. EDA, electrodermal activity.
You’ll note there is no significant difference between the receivers in the different groups. (The senders differ, but then they knew they were sending.) The receivers all register an effect, but since there is no control group to compare these results with, these data tell you nothing about the principal hypothesis. Again I say: you need a control group to test this hypothesis, and they didn’t have one. There was no significant difference between the trained and untrained groups, or between the motivated (i.e., including sick people) and unmotivated groups. So the secondary hypothesis failed.
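To make that concrete, here’s a toy simulation (mine, not the study’s – every number in it is invented) of why a design in which every group sends intention can never tell you whether intention does anything:

```python
# A toy sketch (hypothetical numbers throughout) of the study's design flaw:
# if ALL three groups send intention, the postulated effect is common to all
# of them and cancels out of every between-group comparison.
import numpy as np

rng = np.random.default_rng(0)
n = 12  # receivers per group (hypothetical)

for intention_effect in (0.0, 0.3):  # "intention does nothing" vs "it works"
    print(f"\ntrue intention effect = {intention_effect}")
    for name in ("trained", "wait", "control"):
        # Normalized EDA deviation: ordinary physiological drift, plus the
        # supposed intention effect (present in ALL groups), plus noise.
        eda = 0.2 + intention_effect + rng.normal(0, 0.1, n)
        print(f"  {name:8s} mean deviation: {eda.mean():+.2f}")
```

Both scenarios produce the same pattern of group differences (none). All three groups just shift up or down together, and without a group sending no intention at all, you cannot tell “drift plus effect” from plain drift.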
So, end of story. Study failed, yes? Write it off, study something new next time? Well, no, of course not. Not with woo. The study authors weren’t satisfied with that. Here, Steven Novella noticed something I initially missed. It was this, from the abstract:
Planned differences in skin conductance among the three groups were not significant, but a post hoc analysis showed that peak deviations were largest and most sustained in the trained group, followed by more moderate effects in the wait group, and still smaller effects in the control group.
Translation: the study didn’t show what we wanted it to show (clearly – see Figure 6), so we data mined it to find something we could say was an effect. So what did they find with this bit of post hoc activity? They produced several other graphs, of which (to keep it simple) I will reproduce just Figure 7:
Figure 7 Normalized comparison of receiver skin conductance levels in the three groups. EDA, electrodermal activity.
What they want you to look at is the difference between the three groups during the “intention” period. Group 1 (the “trained” group) showed the largest increase during the ten-second burst of “intention”. (See the timescale on the bottom – seconds 0 to 10 are when the intention is being directed.) OK, but what I want you to notice is the five seconds before the intention (-5 to 0 on the bottom axis). The normalized EDA is actually higher for one group (group 2 – the “wait” group) when no intention is being directed at all! So for those not trained, it appears distant healing effects are higher when the sender does nothing. Even in group 1 (“trained”), the “doing nothing” period registers higher than roughly half of the “sending intention” period. Then it struck me what we are missing: readings from all the other “doing nothing” periods. Since the intention sessions were all 10 seconds long, and the non-intention sessions ranged from 5 to 40 seconds, those unexamined periods probably make up 60-75% of the total time. Was the EDA measurement higher or lower during those periods? Were there other peaks in EDA during those periods? They don’t say.
And I’m pretty sure they didn’t even look. Tucked away just before the “results” section of the report, they state this:
To avoid multiple testing problems, the preplanned hypothesis examined the normalized deviation only at stimulus offset.
I’ve read that section about ten times now, and the only sensible interpretation of that sentence is that they only looked at EDA changes during the intention-sending sessions – they didn’t look at them during the non-intention periods (unintentional periods?). This sounds like the Texas sharpshooter fallacy: firing a load of bullets at the side of a barn and then painting a target where some of the bullets landed, while ignoring the larger clusters of bullets elsewhere on the wall.
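You can see how easy it is to fool yourself this way with another quick simulation (again mine, entirely made up – a random walk standing in for drifting skin conductance, with no “intention” signal anywhere in it):

```python
# A toy sharpshooter demonstration (hypothetical data): pure noise, scanned
# for 10-second windows. If you only report the window you aimed at, you
# ignore all the equally large (or larger) "peaks" elsewhere in the record.
import numpy as np

rng = np.random.default_rng(1)
seconds = 300
eda = np.cumsum(rng.normal(0, 1, seconds))  # pure noise, no signal anywhere

window = 10  # the study's intention epochs were 10 seconds long
means = np.array([eda[t:t + window].mean() for t in range(seconds - window)])

aimed = 120  # the hypothetical window the experimenter "aimed at"
print(f"aimed window mean   : {means[aimed]:+.2f}")
print(f"biggest window mean : {means.max():+.2f} (at t = {means.argmax()}s)")
print(f"windows beating the aimed one: {(means > means[aimed]).sum()}")
```

Run it and you’ll typically find dozens of ten-second windows that beat the one you “aimed” at. Unless you examine the whole record – all the “doing nothing” periods included – a bump at stimulus offset tells you nothing.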
Why This Is Significant
The study’s lead author is Dean Radin. Radin has a history of fitting statistical anomalies to temporal events, while ignoring the same anomalies that occur at other times that he doesn’t want you to know about. An example would be Radin’s interpretation of the now defunct (correction - it's still going) Global Consciousness Project’s (GCP) output from a series of random number generators – data that supposedly showed global consciousness spiked at certain major global events. If you want to see how credulous Radin can be, and/or how determined he is to find a correlation whether one exists or not (you decide), read this account by Claus Larsen, who attended a talk by Dean Radin in 2002:
Radin gave several examples of how GCP had detected "global consciousness". One was the day O.J. Simpson was acquitted of double-murder. We were shown a graph where - no doubt about that - the data formed a nice ascending curve in the minutes after the pre-show started, with cameras basically waiting for the verdict to be read. And yes, there was a nice, ascending curve in the minutes after the verdict was read.
However, about half an hour before the verdict, there was a similar curve ascending for no apparent reason. Radin's quick explanation before moving on to the next slide?
"I don't know what happened there."
It was not to be the last time we heard that answer.
Does that remind you a little of Figure 7 above, and does it make you ask what happened during the “no intention” periods? It should.
And then there was 9/11:
Another serious problem with the September 11 result was that during the days before the attacks, there were several instances of the [random number generators] picking up data that showed the same fluctuation as on September 11th. When I asked Radin what had happened on those days, the answer was:
"I don't know."
I then asked him - and I'll admit that I was a bit flabbergasted - why on earth he hadn't gone back to see if similar "global events" had happened there since he got the same fluctuations. He answered that it would be "shoe-horning" - fitting the data to the result.
Checking your hypothesis against seemingly contradictory data is "shoe-horning"?
For once, I was speechless.
Did Radin check to see if there were similar fluctuations in the data in the “down” periods of this recent study? I don’t know, but we know for a fact from the above that Radin has selected data to fit his hypothesis in the past, and so I’m not going to trust him not to have done it this time. We know he performed some additional manipulation on the data, as Orac also noticed from the study:
To reduce the potential biasing effects of movement artifacts, all data were visually inspected, and SCL epochs with artifacts were eliminated from further consideration (artifacts were identified by [Dean Radin], who was not blind to each epoch's underlying condition).
So Radin admits he un-blinded the study and eliminated data he didn’t like.
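Why does that matter so much? Because non-blind exclusion is one of the easiest ways to conjure an effect out of nothing. Here’s a toy demonstration (mine, with made-up numbers – I’m not claiming this is what Radin did, only what un-blinded “artifact” removal makes possible):

```python
# A toy demonstration (hypothetical data; not a claim about what was actually
# done in the study): start with pure noise, let a non-blind analyst drop the
# epochs that go against the hypothesis, and a spurious effect appears.
import numpy as np

rng = np.random.default_rng(2)
epochs = rng.normal(0, 1, 200)  # 200 null epochs: the true effect is zero

print(f"before exclusion: mean = {epochs.mean():+.3f}")

# The analyst knows each epoch's condition and labels the most negative
# epochs "movement artifacts" (here, the bottom 10%).
kept = epochs[epochs > np.quantile(epochs, 0.10)]
print(f"after exclusion : mean = {kept.mean():+.3f}  (from noise alone)")
```

A true effect of zero, and a positive “mean deviation” appears purely from trimming the inconvenient tail. That is why artifact rejection has to be done blind.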
Throughout this post I have avoided personal attacks on Radin’s (or Pillay’s) credibility, and concentrated instead on the actual study. However, when considering a study that claims a statistical effect like this (and the study authors admit the size of the observed effects was very small), on such frankly dubious grounds, it is relevant to consider whether the author has ignored contradictory data in the past when forming his conclusions. Clearly he has, and he may well have done so here. The most generous conclusion I can draw is that this study would need to be replicated by independent experimenters before I would even consider that there might be some basis to what it is claiming. (Randi’s $1 million test, anyone?) A more realistic interpretation is that Radin has been known to select data that fits his hypothesis and ignore data that doesn’t, and there’s no reason to think that hasn’t happened here. Radin even admits he un-blinded the study to eliminate some data he didn’t like. Add the fact that there was no control group, that the null hypotheses were not even rejected, and that the only interesting thing they found required some (admitted by the authors) post hoc rationalization, and there really isn’t much left worth looking at.
The study ends with the words “This study is dedicated to Elisabeth Targ.” That would be the Elisabeth Targ whose study of intercessory prayer was also fraudulently un-blinded so it could report a success when in reality it had failed. And this study is dedicated to her? I couldn’t have put it better myself.