Saturday, May 07, 2016

Facts and causes

Just a little thought: when reading about scientific studies, one will often see things like this (made-up example):

42% of people who spent less than half an hour weekly using a blender called themselves 'very happy', compared to 34% amongst those who spent more.

This implies, or says outright, that using a blender less causes more happiness.

But I think that in many cases, maybe the majority, it's at least as likely that the two facts are coincidental, or that both are caused by the same thing: for example, blue-eyed people might be less likely to use a blender and also just happen to be happier for some reason. Or heck, maybe they are using blenders less because they are happier, the opposite causality. The perception of causality is probably based much more on expectations than on the numbers.
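To make the "same cause" scenario concrete, here is a minimal sketch in Python. Every probability in it is invented purely for illustration (as is the blender example itself): eye color influences both blender use and happiness, while blender use has no effect on happiness at all.

```python
from fractions import Fraction

# All numbers below are made up for illustration only.
P_BLUE = Fraction(1, 2)                                      # share of blue-eyed people
P_LOW_USE = {"blue": Fraction(7, 10), "other": Fraction(3, 10)}  # P(low blender use | eye color)
P_HAPPY = {"blue": Fraction(1, 2), "other": Fraction(3, 10)}     # P(very happy | eye color)


def p_eye(eye):
    """Share of the population with the given eye color."""
    return P_BLUE if eye == "blue" else 1 - P_BLUE


def happy_rate(low_use):
    """P(very happy | blender use), averaged over eye color.

    Happiness depends only on eye color here, never on blender use.
    """
    num = Fraction(0)
    den = Fraction(0)
    for eye in ("blue", "other"):
        p_use = P_LOW_USE[eye] if low_use else 1 - P_LOW_USE[eye]
        weight = p_eye(eye) * p_use
        num += weight * P_HAPPY[eye]
        den += weight
    return num / den


print("very happy among low users: ", float(happy_rate(True)))
print("very happy among high users:", float(happy_rate(False)))
```

Under these assumed numbers, low users come out as 44% very happy versus 36% for high users, even though within each eye-color group the happiness rate is identical for both kinds of blender user.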

Update: Ken said...
I teach statistics and one of the things that we try to convince students is that correlation doesn't imply causation. Sometimes it can be possible to have a reasonable conclusion but often not.


Ken said...

I teach statistics and one of the things that we try to convince students is that correlation doesn't imply causation. Sometimes it can be possible to have a reasonable conclusion but often not. It usually works when the exposure comes before the outcome, but not always. If people with a genetic marker are more likely to have a cancer then that is fairly reliable, and similarly for exposure to a chemical.

What doesn't work is a lot of the cross-sectional studies.

Eolake Stobblehouse said...

Thank you, Ken.

(What does "cross-sectional" mean, please?)

Ken said...

Cross-sectional is when we look at data at a single time. So a subject does two tests on the same day, which is close enough to the same time, and we see if they are correlated.

The alternative is longitudinal where we do a test at one time and then look at the future results.

For showing causality what we really need to do is to intervene in some way. Obviously that is difficult for a lot of things. We can't randomly give children doses of lead to see if they turn into violent adults, we just have to rely on observed lead levels or exposures.

Andreas Weber said...

Let's say a test for a disease has 99% certainty to show a positive result when the tested person has the disease and shows 1% "false positives" for those who don't actually have it. If among those tested 1 in 1000 really has the disease (e.g. due to pre-screening) and your test shows a positive result - how likely is it that you have it?

Ol'Ben said...

I figure it's one chance in ten, Andreas. I am in the middle of a book by Jordan Ellenberg, How Not to Be Wrong: The Power of Mathematical Thinking, which spends several chapters on these and similar subjects. One particularly unsettling example he uses (starting on p. 151) is testing the possibility of a link between green jelly beans and acne:

Suppose you tested twenty [times] ... and you found just one result that
achieved p < .05 significance. Being a mathematical sophisticate, you'd
recognize that one success in twenty is exactly what you'd expect if none
of the [jellybeans] had any effect...

But what if the green jelly bean were tested twenty times by twenty
different research groups in twenty different labs? Nineteen of the labs
find no statistically significant effect. They don't write up their results--
Who's going to publish the bombshell "green jelly beans irrelevant to your
complexion" paper? The scientists in the twentieth lab, the lucky ones, find
a statistically significant effect, because they got lucky--
but they don't know they got lucky. For all they can tell,
their green-jelly-beans-cause-acne theory has been tested only once, and
it passed.

So a "properly conducted scientific study" can be wrong a lot more often than you might think!
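The arithmetic behind the jelly bean example can be sketched in a few lines of Python. This assumes the twenty tests are independent, that there is no real effect, and that each uses the p < .05 threshold from the quote; the function names are just illustrative.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests comes out
    'significant' at level alpha when there is no real effect."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_alarms(n_tests, alpha=0.05):
    """Average number of 'significant' results with no real effect."""
    return n_tests * alpha


print(expected_false_alarms(20))   # about one lucky lab on average
print(chance_of_false_alarm(20))   # roughly 0.64
```

So with twenty labs, it is more likely than not (about 64%) that at least one of them finds a "significant" link by luck alone.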

Andreas Weber said...

Almost exactly one in eleven actually, unless my math is off (a distinct possibility):
Let's say we test 100000 people. 1 in 1000 has the disease, that's 100 in our sample. 99 of those will get a positive result, one gets missed (let's hope it's nothing too contagious). 99900 don't have it - but 999 (1 in 100) of those still get a positive test result, the rest correctly test negative.
So of 100000 tests 1098 gave a positive result, of which 99 are correct and more than 10 times as many are false positives.

Of course normally your doctor is supposed to interpret that kind of data for you. But a question of that type was given to a significant number of them by a scientific newspaper in Germany and the results were ... alarming.
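The same counting argument can be written as Bayes' rule. The following is a small Python sketch using the numbers from the question (1 in 1000 has the disease, 99% of the sick test positive, 1% of the healthy test positive); the function and parameter names are just illustrative.

```python
from fractions import Fraction


def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test), by Bayes' rule."""
    true_pos = prevalence * sensitivity                  # sick and tested positive
    false_pos = (1 - prevalence) * false_positive_rate   # healthy but tested positive
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(
    prevalence=Fraction(1, 1000),
    sensitivity=Fraction(99, 100),
    false_positive_rate=Fraction(1, 100),
)
print(ppv, float(ppv))  # 99/1098 reduced to 11/122, about 0.09
```

Using exact fractions avoids rounding noise: the result is 99 true positives out of 99 + 999 = 1098 positive tests, a bit better than one in eleven.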