Do Statistics Lie?

Dennis Pilkey believes that statistics don’t lie, but people can deceive or even lie using statistics. This can occur by taking a statistic out of context or by being selective about which statistics are used. Recent NAFTA talks and the issue of trade deficits between Canada and the United States are a good example of this. See Graphically Speaking for examples of graphs that exaggerate or distort changes in a time series. Misunderstanding the graphical presentation of data can also lead to inadvertent, rather than deliberate, misrepresentation of statistics.

Most statistics are sample based and are often cited as being representative of the sampled population 19 times out of 20, that is, 95% of the time. Wouldn’t it be nice if we were always right that often? Of course, many factors affect the accuracy of the results of a given survey, including the size of the sample relative to the population, how the sample was chosen and whether there are any sub-groups of interest within the sample. Many surveys have smaller samples and are therefore less reliable. There can also be response bias, which increases if a sample is not truly random, as with an internet-based poll.
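To make the "19 times out of 20" idea concrete, here is a minimal Python sketch of the textbook margin-of-error calculation for a proportion. It assumes a simple random sample and the usual normal approximation, which real surveys only approximate; the function name, its parameters and the sample sizes shown are illustrative and do not come from any particular survey.

```python
import math


def margin_of_error(sample_size, proportion=0.5, population=None, z=1.96):
    """Approximate margin of error for a proportion, 19 times out of 20.

    Assumes a simple random sample. z=1.96 is the normal-distribution
    multiplier for a 95% confidence level. proportion=0.5 is the most
    cautious choice because it gives the widest margin.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population is not None:
        # Finite population correction: the margin shrinks when the sample
        # is a large share of the population being surveyed.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error


# Hypothetical sample sizes, chosen only to show the trend.
for n in (250, 1000, 4000):
    print(n, round(100 * margin_of_error(n), 1), "percentage points")
```

Under these assumptions the margin falls from roughly 6 percentage points at 250 respondents to about 3 at 1,000 and about 1.5 at 4,000. Quadrupling the sample only halves the margin, and none of this accounts for response bias or a non-random sample.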

The statistics used on this site are from the Census of Population. The short form is intended to cover 100% of the population; it is therefore not a survey and is not subject to sampling error. There are, however, non-response errors: some people are counted twice and some are missed. The overall net accuracy, or coverage, for Nova Scotia in the 2016 Census was about 98%. The short form includes such things as age, gender and household living arrangements. The long form, which includes such things as education, income and home ownership details, was based on a 25% sample of households, i.e. one out of every four households is asked to complete the long form. Dennis, in his previous work, determined that the non-response rate for the long form was likely higher than for the short form. For 2016, the Census was again made mandatory. Probably as a result of the issues caused by replacing the 2011 long form census with the voluntary National Household Survey (NHS), the results of the 2016 Census seem to have reached a better level of completeness and accuracy.

The raw Census numbers are close but not 100% accurate. For example, the 2016 Census of Population, conducted in May 2016, counted Nova Scotia’s population at 923,948. This compares to the official population estimate of 948,618 as of July 1, 2016. See Nova Scotia Statistics for more details about this. While the actual number of people counted in the Census is only “very close”, the share or percentage of various characteristics, such as people living alone, can be considered accurate, i.e. representative of the population.
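The gap between the two figures quoted above can be worked out directly. The short Python sketch below uses only those two numbers. Note that they refer to different dates (May versus July 1, 2016), so the ratio is a rough indication of how close the count is and should not be read as Statistics Canada's official coverage rate.

```python
# Figures quoted in the text above.
census_count = 923_948        # 2016 Census count for Nova Scotia (May 2016)
official_estimate = 948_618   # official population estimate, July 1, 2016

difference = official_estimate - census_count
ratio = census_count / official_estimate

print(f"Difference: {difference:,} people")
print(f"Census count as a share of the estimate: {100 * ratio:.1f}%")
```

The difference is 24,670 people, which puts the raw count at about 97.4% of the official estimate.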

For 2011, the results of the voluntary NHS have been called into question by a number of researchers. DWPilkey Consulting, in a project completed for United Way Halifax, demonstrated and concluded that the NHS data could not be used for neighbourhood work. Comparison of NHS data with taxfiler information at the Census Tract level was a key part of this analysis. In a special Globe and Mail feature, Dr. David Hulchanski and his colleagues explain why they believe that the 2011 voluntary census is worthless. The sample size for the NHS was one in three, but non-response rates were very high. Statistics Canada suppressed data for any geographic area that had over 50% non-response. Dennis believes that people at both ends of the economic spectrum were more likely to be among those not completing the NHS. This belief was confirmed by the Hulchanski work.

Assessing the NHS against taxfiler data also revealed issues with the latter. Taxfiler information is based on the postal codes entered on Canada Revenue Agency’s tax forms. Over the last twenty years, Dennis Pilkey has carried out a number of analytical projects using postal code based data. For example, the Postal Code Conversion File, a Census byproduct, had residences in Digby County with Halifax postal codes. The 2010 taxfiler data used to assess the NHS showed that the data attributed to Preston bore no resemblance to the corresponding 2011 Census data (short form, which was good information in contrast to the NHS). A recent local report that compared Low Income Measures, After Tax using taxfiler data made front page news because of the alarming rate it showed for Preston. The corresponding 2016 Census, which is based on 2015 income, had dramatically different numbers for this area. The advantage of the taxfiler data is that it is available annually, whereas the Census is conducted every five years. The disadvantage is the set of challenges that postal code vagaries create in assigning the data to a physical area, especially in rural areas.

Statistics do not lie. Some sources are better than others. A given survey result could be the one in twenty that comes from a non-representative sample. There are many ways to choose which statistics to use and how to present them. Some people may choose to take statistics out of context or interpret the numbers in a different way. Some may choose to present them in a way that exaggerates the changes they are trying to demonstrate. Graphically Speaking and Thematic Mapping 101 on this site speak to the many ways that data can be visually displayed.