“There are lies, damned lies, and statistics” – Benjamin Disraeli

It’s hard to get anyone excited about statistics. Oxford statistician Peter Donnelly once said that statisticians are people who like figures but lack the people skills to become accountants. But statistics matter. They are often used to influence policy, make claims, manipulate people… and so they fully deserve a blog entry on youarebeingmanipulated.com.

The reason statistics can be used to manipulate people is that they can be fairly complex, and mathematics doesn’t always behave in simple, intuitive ways. In the right hands, this can be very useful if you have an agenda to push.

For example, in June 2011, two polls of New Yorkers were conducted to gauge attitudes toward same-sex marriage. One found that 58% of NY voters supported same-sex marriage. The other found the exact opposite: 57% of NYers supported marriage as a man-woman only proposition. How is that possible?

The quick answer is that one of the polls, the NOM poll (57% against same-sex marriage), used some fairly misleading tricks. It polled a much older group – 70% of the sample was 60 years or older (compared to 37% for the other poll). Since older folks tend to be more conservative, a sample skewed toward older people translates into a fairly obvious bias. The NOM poll also used strange wording in its questions: “Do you believe that the issue should be decided by legislators in Albany, or directly by the voters of NY?”, for example. It’s a cute example of question loading: using words like “directly” and “legislators in Albany” [i.e. far away], and contrasting “voters” with “legislators”, means the question is almost designed to make people respond “voters in NY, of course!”. The NOM poll also had a sample size half that of the other. All in all, which one should you trust more?

"I SAID, WILL YOU FILL OUT A QUICK QUESTIONNAIRE FOR ME?"

Well, it sort of doesn’t matter. This is the important thing to realize about polls and questionnaires: small changes in wording and sample can shift the answer quite a bit, and when polls are reported, no one tends to report the underlying methodology, just the topline result. So even if one poll is heavily ‘tweaked’ to push an answer, it will be ‘weighed’ by the media and policy-makers just as heavily as a better-designed, more neutral poll. Remember that polling agencies and companies conduct polls for clients. Someone pays them a lot of money to make all of those calls. Some of those clients actually care about understanding the audience’s position on something and design good polls. Many more want a poll that supports their position, and design the polls accordingly.
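Here’s a rough sketch of just how much the sample alone can move the result. The support rates by age group below are made up purely for illustration – only the 37% / 70% age skews come from the two polls above:

```python
# A minimal sketch: the same underlying opinions, run through two different
# sample compositions, produce very different topline numbers.
# Support rates by age group are hypothetical, NOT taken from either poll.

def topline(support_by_age, share_by_age):
    """Weighted average of support across age groups."""
    return sum(support_by_age[g] * share_by_age[g] for g in support_by_age)

# Hypothetical support for same-sex marriage by age group.
support = {"18-39": 0.70, "40-59": 0.58, "60+": 0.40}

# Age mix of respondents: one poll had 37% aged 60+, the NOM poll had 70%.
poll_a = {"18-39": 0.30, "40-59": 0.33, "60+": 0.37}
poll_b = {"18-39": 0.12, "40-59": 0.18, "60+": 0.70}

print(f"Poll A topline: {topline(support, poll_a):.0%}")  # ~55% in favor
print(f"Poll B topline: {topline(support, poll_b):.0%}")  # ~47% in favor
```

Same voters, same opinions – and yet simply choosing who gets called swings the headline number by several points.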

And, of course, you can take the concept one step further. Ever heard of “push-polling”?

Basically, push-polling is a technique where random people are called and asked to participate in a poll (which many people tend to accept), but are then asked questions that are massively, massively biased. “Do you support John, the child-murdering demon bastard, or Fred, who was described as ‘angelic’ by a majority of Americans, in the coming election?”

I’m not exaggerating, by the way. Listen to some amusing push-polls here. The goal of these polls is less to sample opinion and more to influence potential voters (and as a bonus, you get a poll result you can report where 98% of voters support Fred and want to exile John, which can always be useful).

But let’s get back to statistics. It’s important to understand that statistics obey the Distribution Rule: the statistic that gets distributed more widely wins, whether it’s the ‘right’ one or not.

"I SAID IT'S 15%!!"

Consider, for example, a common statistic: what percentage of the population is homosexual? If you answered 10%, you’re in the majority – that’s the figure most often quoted in the media. The actual answer is closer to 3-4%. So where did the 10% figure come from? In 1948, Dr. Alfred Kinsey published the first of the Kinsey Reports, a seminal study of human sexuality that basically shattered the myth that everyone had sex in their marriage and nowhere else. It changed how sex was perceived in society, and it is where the 10% figure was first mentioned. Dr. Kinsey was interested in cataloging the sexual mores of his time, so he spent more time with college students and young, sexually active interviewees, which is why he found a much higher proportion of homosexual behavior. But since his was the first widely reported number, it stuck around… for over 50 years! (It doesn’t hurt that various gay and lesbian advocacy groups like to push the higher number, as it makes their cause seem that much more important.) This is a good example of one poor statistic that has outlasted thousands of better ones and has influenced policies and society for six decades – and it is still going strong.

Also, statistics are generally used by someone to justify something. You can often get away with quite a bit of ‘creative’ manipulation of statistics if no one has an agenda to oppose them. For example, let’s take child prostitution, which is a Bad Thing. Do you remember the “Real Men” campaign?

It was mostly prompted by an alarming figure: 300,000 children lost to prostitution each year in the US. That figure became a major headline last year. As the Village Voice reported, it was quoted in dozens of national newspapers, was the basis of a number of Hollywood PSAs, and influenced policy in Washington (it also helped shut down the Craigslist erotic services listings, among other impacts). In some ways, that’s not surprising – 300,000 kids lost to prostitution each year is a major problem. But where did the statistic come from? It came from an academic study by two University of Pennsylvania professors, Richard Estes and Neil Weiner. And if you look at the study, as the Village Voice did, you find that the statistic is mostly garbage: Estes and Weiner counted not the kids who actually end up as prostitutes, but kids “who are at risk”. Who is at risk? Runaways (even those who run away for a day). Transgender kids. Kids who live near the Canadian or Mexican border. Add them all up, and you get 300,000 kids ‘at risk’. So what is the real number of kids forced into prostitution? There is no authoritative figure, but most police forces estimate a few hundred kids a year. Even the study authors – after significant academic and media challenges – admit that the “actual number is very small”.

So once the Village Voice published its criticism, everything went back to normal, right? People put the problem in context, had a laugh about the study, and moved on? Well, no. Ashton Kutcher suggested a boycott of the Village Voice, and many Hollywood “philanthropists” argued that the number is immaterial. Since child prostitution is a Bad Thing, who cares what the actual size of the problem is?

The answer, of course, is that many people in Hollywood had a vested interest in making the problem look as large as possible, for their own benefit. Large problems get media attention. They get government dollars. They create momentum for groups like the Global Philanthropy Group, which penned a response to the Village Voice article. Statistics matter because they have the sheen of truth, and groups will use them – even very questionable ones – to justify their existence and to draw attention and resources to themselves. Would it shock you to learn that the Global Philanthropy Group, the same group that was so cavalier about the actual size of the problem, was hired by Ashton Kutcher and Demi Moore to polish their PR image?

Another problem with statistics is that numbers, on their own, tell us very little. You often hear, for example, that the US ranks fairly low in mathematics and science education compared to places like Estonia and Singapore. In actuality, we’re not doing all that badly – by the figures of the National Center for Education Statistics, we’re slightly better than average in math and science. But still, why are we not doing better? One intriguing answer is that, essentially, we’re poorer. This analysis argues, for example, that the US has a high proportion of poor students, and poor students tend to do worse on standardized tests. Once you factor in poverty, US students rank much better on the international tables. Basically, the argument is that the US tries to educate nearly everyone through high school, including the very poor, whereas most other countries divert some students elsewhere – to vocational schools or alternative education systems, for example. So we’re comparing all our kids – including the very poor – to what tends to be the best of many other countries. In that context, it’s not that hard to see why we don’t do better. But that is a fairly complex explanation, and the simple ‘we suck’ statistic is a lot easier to explain (and to use to argue for more education money).
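Here’s a rough sketch of that composition effect, with made-up scores and shares (nothing below comes from the NCES data): a country can score higher within every income group and still post a lower raw average, simply because it tests a larger share of poor students.

```python
# Hypothetical scores and population shares - NOT real education statistics.

def national_average(scores, shares):
    """Raw national average: group scores weighted by each group's share of test-takers."""
    return sum(scores[g] * shares[g] for g in scores)

# Country A scores higher within BOTH income groups...
scores_a = {"low-income": 460, "high-income": 560}
scores_b = {"low-income": 450, "high-income": 550}

# ...but tests a much larger share of low-income students.
shares_a = {"low-income": 0.50, "high-income": 0.50}
shares_b = {"low-income": 0.20, "high-income": 0.80}

print(f"Country A raw average: {national_average(scores_a, shares_a):.0f}")  # 510
print(f"Country B raw average: {national_average(scores_b, shares_b):.0f}")  # 530
```

Country B “wins” the ranking even though Country A does a better job with every single group of students it teaches.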

Another example of this is the male-female wage gap. It’s often reported that women make 75% of what men make, with the implication that this is for the same jobs. The statistic is repeated often, calculated often, and used by almost every administration to justify new laws to eliminate the gap. The statistic is calculated properly – but what does it mean? Either all business owners (including women) agreed at the Great Conspiracy Meeting of 1908 to pay women less, or there is a great business opportunity in starting fully women-staffed businesses (and lowering your costs by 25% compared to the competition), or the statistic is meaningless. Why is it meaningless? There are many reasons, but it boils down quite quickly to the fact that men and women choose different jobs. Men pick more jobs in science and technology; women pick more flexible jobs (a gross overgeneralization, but actually not far from the mark). In other words, the raw male-female wage gap statistic is largely meaningless on its own. But it is used, constantly, to argue for new laws and policies.
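A quick back-of-the-envelope sketch of how that works – the salaries and occupational mixes below are invented for illustration, not real labor statistics. Even if men and women are paid identically within every occupation, a raw average across occupations can still show women earning much less:

```python
# Invented numbers: every occupation pays men and women exactly the same,
# but the two groups are distributed differently across occupations.

occupations = {
    # occupation: (average salary, share of employed men, share of employed women)
    "engineering": (100_000, 0.40, 0.10),
    "teaching":    ( 55_000, 0.20, 0.45),
    "nursing":     ( 65_000, 0.10, 0.30),
    "sales":       ( 80_000, 0.30, 0.15),
}

avg_male   = sum(pay * men   for pay, men, women in occupations.values())
avg_female = sum(pay * women for pay, men, women in occupations.values())

print(f"Average male earnings:   ${avg_male:,.0f}")    # $81,500
print(f"Average female earnings: ${avg_female:,.0f}")  # $66,250
print(f"Raw gap: women earn {avg_female / avg_male:.0%} of what men earn")  # ~81%
```

A large raw gap, with zero pay discrimination anywhere in the toy data – which is exactly why the raw number, quoted on its own, tells you very little about its cause.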

Mastering statistics is important. Even a simple chart, with the proper caption, can make people assume that correlation = causation. Check this example from Bloomberg:

All the numbers are accurate, of course. Now, whether they have anything to do with each other…
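If you want to see how easy it is to manufacture that kind of chart, here’s a toy example – the two series below are invented and have nothing to do with each other or with the Bloomberg chart; they just both happen to trend upward over time:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2011)

# Two unrelated, invented series that merely share an upward trend.
series_a = 29 + 0.4 * (years - 2000) + rng.normal(0, 0.2, len(years))
series_b = 40_000 + 1_500 * (years - 2000) + rng.normal(0, 500, len(years))

r = np.corrcoef(series_a, series_b)[0, 1]
print(f"Correlation: {r:.2f}")  # typically well above 0.95 - with zero causal link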

Now, do you still believe that statistics are as boring as you thought?