How To Create A Defensible Data Set For Intelligence Production

Compiling a data set to support an intelligence product seems like a straightforward process: define attributes, select entities, and add them to a database for later sorting and analysis. In reality, the process can be challenging to get right.

One of the first obstacles is finding data. There are three choices: assemble your own data set, work from an established database, or combine the two methods. Unfortunately, the Intelligence Community (IC) does not have a searchable, comprehensive, and current database of all reported crimes. Some private companies maintain subject-specific databases, although they may have limited date ranges. Here are some examples:

  1. The Global Terrorism Database is a strong resource for incidents of domestic and international terrorism, but its data stop at 31 December 2019. (You are required to fill out an application to download and use the GTD.)
  2. Mother Jones has a comprehensive and accessible database of mass shootings in the United States from 1982 to 2025.
  3. The Washington Post offers a database of police-involved shootings, with data ranging from 2015 to 2024.
  4. The Transactional Records Access Clearinghouse, maintained by Syracuse University, has quite a bit of quantitative data on federal law enforcement and immigration matters.

If you choose to compile your own data set, first you need to choose parameters. Let’s say you want to address an issue related to domestic violent extremism in your area of responsibility (AOR). What do you include in your data set? Only federally charged crimes? Only state-charged crimes? Subjects whose actions caused harm to persons (killed or maimed, and is there a threshold for the number of victims)? Property damage (above what dollar amount)? Disrupted plots (no harm to persons or damage to property)? Subjects who were initially charged with a domestic violent extremist-related crime, but who pled down to a lesser one? Financial crimes perpetrated in furtherance of domestic violent extremist activity? Stings? Threats?
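One way to keep those choices defensible is to write them down as explicit rules and apply the same rules to every candidate entry. The following is a minimal sketch, not a prescribed method: the field names, the `InclusionRules` structure, and the threshold values are all hypothetical and stand in for whatever parameters you settle on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One candidate entry. Field names are illustrative assumptions."""
    incident_id: str
    charge_level: str            # e.g. "federal" or "state"
    persons_harmed: int
    property_damage_usd: float
    disrupted_plot: bool


@dataclass(frozen=True)
class InclusionRules:
    """The analyst's stated parameters, recorded in one place."""
    charge_levels: frozenset
    min_persons_harmed: int
    min_property_damage_usd: float
    include_disrupted_plots: bool


def is_included(incident, rules):
    """Return True if the incident meets the stated parameters."""
    if incident.charge_level not in rules.charge_levels:
        return False
    if incident.disrupted_plot:
        return rules.include_disrupted_plots
    return (
        incident.persons_harmed >= rules.min_persons_harmed
        or incident.property_damage_usd >= rules.min_property_damage_usd
    )


# Hypothetical parameters: federal or state charges, at least one victim
# or at least $1,000 in damage, disrupted plots counted.
rules = InclusionRules(frozenset({"federal", "state"}), 1, 1000.0, True)

candidates = [
    Incident("A", "federal", 0, 5000.0, False),
    Incident("B", "state", 0, 200.0, False),
    Incident("C", "federal", 0, 0.0, True),
    Incident("D", "local", 3, 0.0, False),
]
included_ids = [c.incident_id for c in candidates if is_included(c, rules)]
```

Because the rules live in a single object, they can be copied almost verbatim into your scope note, and anyone reviewing the product can see exactly why an incident was kept or dropped.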

You’re also likely to have to create categories or use existing ones to sort your data in order to discuss and describe the results of your analysis. The FBI uses these categories to sort domestic violent extremists: racially or ethnically motivated; anti-government or anti-authority; animal rights/environmental; abortion-related; all other domestic terrorism threats. You can use these same classifications or define your own. If a subject fits into two categories, you may have to choose one in order to maintain the integrity of the numbers, and it’s a good idea to be prepared to defend your reasoning.
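If a subject fits two categories, a documented tie-break rule keeps the totals from double counting. Here is a rough sketch using the category names above. The precedence order is purely an assumption made for illustration, not an FBI rule, and you would need to choose and defend your own.

```python
from collections import Counter

# Hypothetical tie-break order: the first matching category wins.
CATEGORY_PRECEDENCE = [
    "racially or ethnically motivated",
    "anti-government or anti-authority",
    "animal rights/environmental",
    "abortion-related",
    "all other domestic terrorism threats",
]


def primary_category(candidate_categories):
    """Pick exactly one category for a subject using the stated precedence."""
    for category in CATEGORY_PRECEDENCE:
        if category in candidate_categories:
            return category
    raise ValueError(f"No recognized category in {sorted(candidate_categories)}")


def count_by_primary_category(subjects):
    """Count each subject once, under its primary category."""
    return Counter(primary_category(cats) for cats in subjects.values())


# Illustrative subjects only; S1 fits two categories.
subjects = {
    "S1": {"anti-government or anti-authority", "racially or ethnically motivated"},
    "S2": {"abortion-related"},
    "S3": {"anti-government or anti-authority"},
}
counts = count_by_primary_category(subjects)
```

The category totals add up to the number of subjects, which is the property you want when you report percentages.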

There is also the decision about geographic range. You may start with data from your AOR, but you can add greater qualitative and quantitative context if you compare your results to a broader data set.

There is also the time frame to consider. The more data you have and the longer the time span, the better positioned you are to see trends. You may have noticed an uptick in the past year, but if you widen the time span, is it still significant?
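To check whether a recent uptick holds up over a longer span, you can compare the latest year with the years before it. This is a rough screen under simple assumptions (yearly counts, a one-standard-deviation cutoff chosen only for illustration), not a formal significance test, and the sample years are made up.

```python
from collections import Counter
from statistics import mean, pstdev


def yearly_counts(incident_years, through_year=None):
    """Count incidents per year, filling in years that had no incidents."""
    counts = Counter(incident_years)
    first = min(counts)
    last = through_year if through_year is not None else max(counts)
    return {year: counts.get(year, 0) for year in range(first, last + 1)}


def latest_vs_baseline(counts):
    """Compare the most recent year against all earlier years."""
    years = sorted(counts)
    if len(years) < 3:
        raise ValueError("Need at least two baseline years plus the latest year")
    latest = counts[years[-1]]
    baseline = [counts[year] for year in years[:-1]]
    avg = mean(baseline)
    spread = pstdev(baseline)
    return {
        "latest": latest,
        "baseline_mean": avg,
        "baseline_stdev": spread,
        # Illustrative cutoff only: more than one standard deviation above the mean.
        "exceeds_one_stdev": latest > avg + spread,
    }


# Made-up incident years for illustration.
incident_years = [2019, 2019, 2020, 2021, 2021, 2021, 2023, 2023, 2023, 2023, 2023]
counts = yearly_counts(incident_years)
summary = latest_vs_baseline(counts)
```

Filling in zero-incident years matters: leaving 2022 out of the baseline here would overstate the typical year and understate how unusual the latest one is.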

Lastly, a word about data sources. Law enforcement agencies and court systems are generally authoritative sources from which to gather data, but if you’re compiling bits from multiple agencies, be sure they have similar parameters. For example, smaller jurisdictions may have a lower financial threshold for crimes on which they will take reports and conduct investigations, whereas larger jurisdictions may have a much higher threshold. These thresholds can differ even within the same agency, such as among the 56 field offices of the FBI. Court systems, too, have differing criteria. A case may be accepted for prosecution in one area, but rejected in another. If you’re gathering data from a broad variety of sources, be mindful of these potential differences. Keep your readers informed by including descriptions or explanations of your gathering technique in your source statement (or scope note).
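When sources use different financial thresholds, one conservative option is to keep only the records that every source would have reported, which means filtering to the highest threshold. The sketch below assumes you know each source's threshold. The source names, field names, and dollar values are hypothetical.

```python
def harmonize_to_common_threshold(records, source_thresholds):
    """Keep only records that meet the strictest source's reporting threshold."""
    missing = {r["source"] for r in records} - set(source_thresholds)
    if missing:
        raise ValueError(f"No reporting threshold recorded for: {sorted(missing)}")
    common = max(source_thresholds.values())
    kept = [r for r in records if r["loss_usd"] >= common]
    return common, kept


# Hypothetical sources and thresholds for illustration.
source_thresholds = {"small_pd": 500.0, "large_pd": 5000.0}
records = [
    {"source": "small_pd", "loss_usd": 800.0},
    {"source": "small_pd", "loss_usd": 6000.0},
    {"source": "large_pd", "loss_usd": 7500.0},
    {"source": "large_pd", "loss_usd": 5000.0},
]
common_threshold, comparable = harmonize_to_common_threshold(records, source_thresholds)
```

The trade-off is a smaller sample, so the common threshold and the number of records dropped both belong in your scope note.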


Bottom line:

  • If you choose to use an existing database, review the organizer’s methodology for gathering data and the parameters for entry to be sure you agree with both. If you add to it, follow the same guidelines.
  • If you compile your own data set, define your attributes, collect from authoritative sites, and compile a sample that is comprehensive enough to address your intelligence question and give you a credible and defensible result.
