Compiling A Defensible Data Set

Compiling a data set to support an intelligence project seems like a straightforward process: define attributes, select entities, and add them to a database for later sorting and analysis. In reality, the process can be challenging to get right.

One of the first obstacles is finding data. There are three choices: assemble your own data set, work from an established database, or combine the two methods. Unfortunately, the IC does not have a searchable, comprehensive, and current database of all reported crimes. Some private companies maintain subject-specific databases, although they may have limited date ranges. Here are some examples:

  1. The Global Terrorism Database is an excellent resource for incidents of domestic and international terrorism, but its data stop at 31 December 2019.
  2. Mother Jones has a good, accessible database of mass shootings in the United States from 1982 to 2021.
  3. The Washington Post offers a database of police-involved shootings. Their data range from 2015 to present.
  4. The Transactional Records Access Clearinghouse maintained by Syracuse University has quite a bit of good quantitative data on federal law enforcement and immigration matters.

If you choose to compile your own data set, you first need to choose parameters. Let’s say you want to address gaps concerning domestic violent extremism in your area of responsibility (AOR). What do you include in your data set? Only federally charged crimes? Only state-charged crimes? Subjects whose actions caused harm to persons (killed or maimed, and is there a threshold for the number of victims)? Property damage (above what dollar amount)? Disrupted plots (no harm to persons and no damage to property)? A subject or subjects who may have initially been charged with a domestic violent extremist-related charge, but who pled down to a lesser crime? A subject who was killed in action? Financial crimes perpetrated in furtherance of domestic violent extremist activity? Stings? Threats?
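One way to keep those choices honest is to write them down as explicit inclusion rules before you start collecting. The sketch below is only an illustration: the field names (`charge_level`, `kind`, `damage_usd`) and the thresholds are hypothetical and would need to match whatever parameters you actually settle on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InclusionCriteria:
    """Hypothetical inclusion parameters, decided before data collection."""
    charge_levels: frozenset = frozenset({"federal", "state"})
    include_disrupted_plots: bool = True
    include_threats: bool = False
    min_property_damage_usd: int = 0


def is_included(record: dict, criteria: InclusionCriteria) -> bool:
    """Return True if a candidate record meets every stated parameter."""
    if record["charge_level"] not in criteria.charge_levels:
        return False
    if record["kind"] == "disrupted_plot" and not criteria.include_disrupted_plots:
        return False
    if record["kind"] == "threat" and not criteria.include_threats:
        return False
    if (record["kind"] == "property_damage"
            and record.get("damage_usd", 0) < criteria.min_property_damage_usd):
        return False
    return True


# Example: federal cases only, no threats, property damage of $10,000 or more.
criteria = InclusionCriteria(
    charge_levels=frozenset({"federal"}),
    include_threats=False,
    min_property_damage_usd=10_000,
)
```

Because every record passes through the same function, a reviewer can see exactly why a case is in or out of the set.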

You’ll also likely have to create categories or use existing categories to sort your data in order to discuss and describe the results of your analysis. The FBI uses these categories to sort domestic violent extremists: racially or ethnically motivated; anti-government or anti-authority; animal rights/environmental; abortion-related; all other domestic terrorism threats. You can use these classifications or define your own. If a subject fits into two categories, you’ll probably have to choose one so that no subject is counted twice and the totals stay accurate, and it’s a good idea to be prepared to defend your reasoning.
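A minimal sketch of the one-category-per-subject rule follows. It uses the five categories listed above; the `Subject` fields and the `count_by_primary` helper are hypothetical names chosen for illustration. Each subject gets exactly one primary category, and any other category that applies is kept as a note alongside the reasoning for the choice.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    RACIALLY_OR_ETHNICALLY_MOTIVATED = "racially or ethnically motivated"
    ANTI_GOVERNMENT_OR_ANTI_AUTHORITY = "anti-government or anti-authority"
    ANIMAL_RIGHTS_ENVIRONMENTAL = "animal rights/environmental"
    ABORTION_RELATED = "abortion-related"
    ALL_OTHER = "all other domestic terrorism threats"


@dataclass(frozen=True)
class Subject:
    subject_id: str
    primary: Category      # the single category used for counting
    secondary: tuple = ()  # other categories that also apply; never counted
    rationale: str = ""    # why the primary category was chosen


def count_by_primary(subjects):
    """Tally subjects by primary category so each subject is counted once."""
    return Counter(subject.primary for subject in subjects)


# Made-up subjects, for illustration only.
subjects = [
    Subject("S-001", Category.ANTI_GOVERNMENT_OR_ANTI_AUTHORITY),
    Subject(
        "S-002",
        Category.RACIALLY_OR_ETHNICALLY_MOTIVATED,
        secondary=(Category.ANTI_GOVERNMENT_OR_ANTI_AUTHORITY,),
        rationale="Stated motive in charging documents was primarily racial.",
    ),
]
totals = count_by_primary(subjects)
```

The category totals always sum to the number of subjects, and the `rationale` field gives you something concrete to point to when a classification is questioned.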

There is also the decision about geographic range. You may start with data from your AOR, but you can add greater qualitative and quantitative context if you compare your results to a broader data set.

And there is the time frame to consider. The more data you have and the longer the time span, the better positioned you are to see trends. You may have noted an uptick in the past year, but if you widen the time span, is it still significant?
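To illustrate the point about widening the time span, here is a small sketch that compares the most recent year against the earlier years in a window. The incident years are made up, the function names are hypothetical, and comparing against the earlier mean and maximum is just one simple way to frame the question, not a formal significance test.

```python
from collections import Counter
from statistics import mean


def incidents_per_year(incident_years, start, end):
    """Count incidents per year, keeping zero-incident years in the series."""
    counts = Counter(incident_years)
    return {year: counts.get(year, 0) for year in range(start, end + 1)}


def latest_vs_baseline(per_year):
    """Compare the most recent year with all earlier years in the window."""
    years = sorted(per_year)
    if len(years) < 2:
        raise ValueError("need at least one earlier year to compare against")
    earlier = [per_year[year] for year in years[:-1]]
    return {
        "latest": per_year[years[-1]],
        "earlier_mean": mean(earlier),
        "earlier_max": max(earlier),
    }


# Made-up incident years, for illustration only.
sample_years = [2015, 2015, 2017, 2018, 2018, 2018, 2019, 2020, 2020, 2020, 2020]

narrow = latest_vs_baseline(incidents_per_year(sample_years, 2019, 2020))
wide = latest_vs_baseline(incidents_per_year(sample_years, 2015, 2020))
```

With this sample, the two-year window shows four incidents against one the year before. The six-year window still shows four as the highest count, but an earlier year had three, so the uptick looks less dramatic in context.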

If you choose to use an existing database, review the organizer’s methodology for gathering data and the parameters for entry to be sure you agree with the method. If you add to it, use the same methodology. If you compile your own unique data set, define your attributes up front and stick with them. A strong data set is consistent and comprehensive.
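Defining attributes up front can be as simple as fixing a list of fields and checking every new entry against it. The attribute names below are hypothetical placeholders, not a recommended schema; the sketch only shows the idea of flagging records that are missing an attribute or that carry one you never defined.

```python
# Hypothetical attribute list, fixed before any records are entered.
REQUIRED_ATTRIBUTES = (
    "incident_id",
    "date",
    "jurisdiction",
    "charge_level",
    "category",
    "outcome",
)


def check_record(record: dict):
    """Return (missing, unexpected) attribute names for one record."""
    missing = [
        name for name in REQUIRED_ATTRIBUTES
        if record.get(name) in (None, "")
    ]
    unexpected = sorted(set(record) - set(REQUIRED_ATTRIBUTES))
    return missing, unexpected


# Made-up record, for illustration only: no outcome, plus an undefined field.
draft = {
    "incident_id": "I-0001",
    "date": "2020-06-01",
    "jurisdiction": "state",
    "charge_level": "state",
    "category": "abortion-related",
    "notes": "added ad hoc",
}
missing, unexpected = check_record(draft)
```

Running the check on every entry, including ones you add to an existing database, is one way to keep the set consistent as it grows.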

