How to Lie with Statistics: Book Review



Recently I read a book, “How to Lie with Statistics”. I stumbled on this jewel in Bill Gates's recommended reading for TED 2015. Though the book was published in the 1950s, I found it even more relevant now, given the amount of statistical data we see every day in life and at work. It is a small book with very simple explanations of powerful concepts.
“Statistics tell us nothing until we understand what is being counted in the first place.” - Tim Harford
I will try to summarize the crux of the book in a few points. The book is divided into the broad areas below, and we can use them as a lens for viewing any statistical data we see in the media, the news, a company's annual report and so on.
Sampling Bias: In the real-world problems we handle, it is really hard to draw a random sample that is representative of the population. Always question whether the sample represents the actual population or is biased towards some part of it. When a magazine shares its findings, first go and look at how the sample was drawn.
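To make the point concrete, here is a minimal sketch with made-up numbers. The population, the incomes and the "magazine subscribers" group are all assumptions for illustration, not data from the book. It compares the average from a biased sample with the average from a random sample of the same size.

```python
import random
from statistics import mean

# Made-up population (incomes in lakhs INR): 900 people earn 5, 100 earn 50.
population = [5] * 900 + [50] * 100

# Biased sample: a hypothetical magazine survey that only reaches its
# subscribers, assumed here to be all 100 high earners plus 100 others.
subscribers = [50] * 100 + [5] * 100

# Random sample of the same size, drawn from the whole population.
rng = random.Random(42)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, 200)

population_mean = mean(population)
biased_mean = mean(subscribers)
random_mean = mean(random_sample)

print(f"Population mean:    {population_mean}")
print(f"Biased sample mean: {biased_mean}")
print(f"Random sample mean: {random_mean}")
```

The biased sample reports an average close to three times the true one, simply because of who was asked. The random sample will not match the population exactly either, but it should land much closer.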
Averages - Mean, Median & Mode: The salaries reported by business schools in India are a good example. (I have been on both sides: as an outsider I was astonished by the figures, and later I saw them as an insider when I was part of a business school placement process.) When a report says the average salary of the placement offers is, say, 20 Lakhs INR, it mostly refers to the mean, which hides what the salaries were for the majority of the population. If 4 people out of 60 get an international offer at 40 Lakhs INR, you can see how the data would skew towards a high number. The median and mode are extremely useful statistics that reveal more information. If the report says the median salary was 10 Lakhs INR, it implies that half of the population is getting a salary at or below 10 Lakhs. Pretty interesting to know, right :).
There are other statistics as well, like the standard deviation, which was not covered in the book but is important to know if you invest in stock markets. Suppose the statement made is that the average annual return of the stock market for the last 5 years is 12%. Don't assume the returns were 12% every year. They could have been 75%, -10%, 10%, -20% and 5%. The mean here is indeed 12%, but how does that help? If you have one leg on a hot stove and the other in a freezer, with the mean at room temperature, does that help you in any way? You would end up with a burn and a frostbite, yet the mean looks perfectly normal. The median in the stock market example above is 5%, and the sample standard deviation is about 37%, which shows high volatility. This was one of my favorite chapters and fun reading.
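The two examples above can be checked with a few lines of Python using the standard `statistics` module. This is only a sketch. The stock returns are the ones quoted in the text, but the salary list is hypothetical: only "4 out of 60 at 40 Lakhs" comes from the example, and the remaining 56 offers are assumed to be 10 Lakhs each, so it does not try to reproduce the 20 Lakhs figure.

```python
from statistics import mean, median, stdev

# Hypothetical placement data (lakhs INR): 4 international offers at 40,
# plus an ASSUMED 56 domestic offers at 10 each.
salaries = [40] * 4 + [10] * 56

salary_mean = mean(salaries)      # pulled upwards by the 4 large offers
salary_median = median(salaries)  # what the middle graduate actually gets

# Annual returns from the text, in percent.
returns = [75, -10, 10, -20, 5]

returns_mean = mean(returns)
returns_median = median(returns)
returns_stdev = stdev(returns)    # sample standard deviation (divides by n - 1)

print(f"Salary mean:    {salary_mean} lakhs")
print(f"Salary median:  {salary_median} lakhs")
print(f"Returns mean:   {returns_mean}%")
print(f"Returns median: {returns_median}%")
print(f"Returns stdev:  {returns_stdev:.1f}%")
```

Note that `stdev` is the sample standard deviation, which is where the figure of about 37% comes from. The population version (`pstdev`) would give a somewhat smaller number for the same five returns.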
Statistics with missing / manipulated pieces of information: When statistical data is shown, key information is often missing, such as the statistical significance or the margin of error. A reported figure is really "this value, +/- some range", and if that range is left out you cannot tell whether a difference is real or just noise. Missing information or manipulation in graphs can be scary too. By changing the scale of the abscissa or ordinate, or by not mentioning what the axes represent or their units, you can lead the reader to infer the exact opposite of what the data says. By rescaling the graph and cutting off part of the axis, I can make a marginal 5% increase in a company's revenue look like a dramatic jump. So watch out for those.
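Both tricks above come down to simple arithmetic. The sketch below is my own illustration, not something from the book: `margin_of_error` uses the usual normal approximation for a proportion from a simple random sample, and `drawn_height_ratio` is a hypothetical helper that shows how much taller one bar looks than another once the axis no longer starts at zero. The revenue numbers are made up.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p
    measured on a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)


def drawn_height_ratio(old, new, axis_start=0.0):
    """How many times taller the 'new' bar is drawn than the 'old' bar
    when the value axis starts at axis_start instead of zero."""
    if axis_start >= min(old, new):
        raise ValueError("axis_start must be below both values")
    return (new - axis_start) / (old - axis_start)


# A "50% of people prefer X" claim from 100 respondents vs 1000 respondents.
moe_small = margin_of_error(0.5, 100)
moe_large = margin_of_error(0.5, 1000)

# Made-up revenue: 100 last year, 105 this year (a 5% increase).
honest = drawn_height_ratio(100.0, 105.0)           # axis starts at 0
truncated = drawn_height_ratio(100.0, 105.0, 95.0)  # axis starts at 95

print(f"Margin of error, n=100:  +/- {moe_small:.1%}")
print(f"Margin of error, n=1000: +/- {moe_large:.1%}")
print(f"Bar height ratio, axis from 0:  {honest:.2f}x")
print(f"Bar height ratio, axis from 95: {truncated:.2f}x")
```

With the axis starting at 95, the same 5% increase is drawn as a bar twice as tall, which is exactly the kind of picture to be suspicious of.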
Measuring one thing and reporting something close to it: Doctors are asked for their preferred brand of cigarettes, and the result is reported to the public as "even doctors smoke this brand, so it is the healthier option". All that was actually measured was which brand the doctors preferred. The health claim rests on the assumption that doctors only choose healthy options.
Correlation does not mean causation: Not much needs to be said here. Suppose I collect data points showing that whenever I eat vegetarian food, it rains in the city. Wow, so eating vegetarian food leads to rain? Applying context and common sense is very important, because correlation does not imply causation.
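Here is a small sketch of how such a correlation can appear with no causal link at all. All the numbers are invented for illustration, and the "monsoon" explanation is my own assumption: suppose I simply eat more vegetarian meals during the monsoon months, which is also when it rains. The season drives both numbers, so they move together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented monthly data: vegetarian meals eaten and rainy days.
dry_veg = [10, 12, 10, 12, 10, 12, 10, 12]   # 8 dry months
dry_rain = [1, 1, 3, 3, 1, 1, 3, 3]
monsoon_veg = [20, 22, 21, 19]               # 4 monsoon months
monsoon_rain = [19, 17, 13, 15]

# Across the whole year the two series look strongly related...
r_all = pearson(dry_veg + monsoon_veg, dry_rain + monsoon_rain)

# ...but within each season the relationship disappears.
r_dry = pearson(dry_veg, dry_rain)
r_monsoon = pearson(monsoon_veg, monsoon_rain)

print(f"Whole year:   r = {r_all:.2f}")
print(f"Dry months:   r = {r_dry:.2f}")
print(f"Monsoon only: r = {r_monsoon:.2f}")
```

The whole-year correlation is high, yet once the season is held fixed, eating more vegetarian meals tells you nothing about the rain. The correlation number alone cannot reveal this; context and common sense have to.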
The key takeaways are these questions, which we need to ask whenever we see statistical information.
  1. Who says so? Who measured this statistic, is that person credible, and what are their incentives? Keep in mind that a credible name can also be used to deceive.
  2. How do they know, and what is missing? Look for information that is hidden or left out.
  3. Did somebody change the subject? Check for any semi-attachment, where what is reported is something quite different from what was measured.
  4. Does it make sense? Apply the common sense test. Remember the "if I eat vegetarian food, it rains" example :)
A short, quick read at 140 pages, and a must-read for all.



After reading the post, please leave your thoughts, good or bad, to help me improve.