The Fallibility Of Data

The phrase “science and data” is widely used to imply indisputable truth. Science has the weight of repeatable experiments with consistent results behind it. Data, by contrast, are simply individual facts, and they are of limited value unless they can be generalized to a population and are correctly interpreted and contextualized. If the data are wrong, or if they are partially or completely missing, it is easy to reach a misleading or false conclusion.

Data that are accessible, complete, and accurate are the starting point of a defensible analysis. But obtaining data that meet these criteria is harder than it appears. For example, figures from US attorneys’ offices seem like a good way to examine domestic terrorism trends. But a study of Transactional Records Access Clearinghouse (TRAC) records, conducted at Syracuse University, found:

“U.S. Attorneys’ offices vary greatly in their numbers of domestic terrorism prosecutions. The largest during 2020, a total of 78 prosecutions, were brought in Oregon federal courts…

“At the other extreme, many U.S. Attorneys’ offices across the country brought no domestic terrorism suits, or just a single suit in all of FY 2020. This includes the U.S. Attorney in the Western District of Washington (Seattle) who was recorded as bringing only a single domestic terrorism suit, although protests there[,] similar to those in nearby Portland, Oregon, had figured prominently in the news.”

This raises a question: which data are “correct”? You’ll reach completely different conclusions about the severity of domestic terrorism depending on whether you use the Oregon data set or the Western District of Washington data set. The gap in prosecutions illustrates how error-prone data can be when they rely on human judgment and decision-making, and how much politics can influence them.

Here are a few tips to help strengthen your data collection.

  1. Be aware of outliers. In the example of the US attorneys’ offices in Oregon and the Western District of Washington, did one track with national trends while the other did not? If so, you may want to incorporate the former into a broader data set. At the same time, don’t dismiss outliers. They can be interesting in themselves, broadening the perspective of your findings or prompting new lines of inquiry.
  2. Most nonprofits and NGOs focus on specific causes, and some of them collect data. Keep in mind that any data they collect are likely intended to demonstrate the gravity of their cause. Use such data judiciously, and make sure they fit your data collection methodology.
  3. Unsubstantiated tips to law enforcement agencies can be interesting and might have collateral value. However, your data set will be stronger if you focus on cases that have gone through the rigor of the investigative process.
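One common way to spot outliers like the ones described in the first tip is the modified z-score, which measures how far each value sits from the median in units of the median absolute deviation (MAD). The sketch below is an illustration, not the author’s method. It assumes you have one count per district, uses a hypothetical function name (`flag_outliers`), and applies the commonly used cutoff of 3.5. The district names and counts in the usage line are made up.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Return the keys whose modified z-score exceeds `threshold`.

    `counts` maps a label (e.g. a district name) to a numeric value.
    The median and MAD are used instead of the mean and standard
    deviation because a single extreme value distorts them far less.
    """
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the MAD is zero;
        # fall back to flagging any value that differs from the median.
        return {k for k, v in counts.items() if v != med}
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return {k for k, v in counts.items() if 0.6745 * abs(v - med) / mad > threshold}


# Hypothetical prosecution counts for six unnamed districts.
hypothetical = {"A": 3, "B": 5, "C": 2, "D": 4, "E": 78, "F": 1}
print(sorted(flag_outliers(hypothetical)))  # prints ['E']
```

A flagged district is a prompt for a closer look, not an automatic exclusion: in keeping with the tip above, it may reveal a reporting difference or a story worth investigating on its own.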
