It is happening again. Lawyers who don’t know anything about data analysis make a request (often an official open records request) for information from agencies. Then, because I have gotten a reputation for analyzing public data and making it reveal previously-unseen patterns, they dump it on me and ask me to analyze it. Some years ago the request was for information on the racial breakouts of juvenile arrests for misdemeanors and citations — the person making the request assumed without asking that information about felonies already existed. That request yielded a pile of printouts of incomparable information from a dozen different agencies, including hundreds of pages of printouts listing all juvenile arrests in the central city. (I dealt with that by giving the data entry job to freshmen in a “research experience” program; it was a good experience for them, even though the data was of limited value.)
This time the request vaguely asked for information on the racial breakouts of arrests and traffic stops and was sent to the two dozen law enforcement agencies in the county. The lawyers sent me the responses yesterday. Six agencies responded with spreadsheets that are all in the same format, which includes the breakouts by race for 64 offense groups and five citation/stop categories. The two biggest agencies responded with dumps of all charges: to turn them into the appropriate counts, you first need to collapse the charges down to incidents (as the same person can have multiple charges in an arrest or traffic stop) using the date and time of the contact and the date of birth of the offender, following some sort of protocol for which offense to treat as the “most serious” offense, and then collapse the zillions of specific offenses into a smaller number of categories. Of course there is no crosswalk file for linking the specific offenses to the 64 standard categories used by the six same-format agencies, to the Uniform Crime Reports categories, or to a severity code, nor did the agencies include these fields in their data dumps, even though they must have that information for their own needs. The most passive-aggressive agency did not even include the offense description field, just the statute number. The rest of the agencies responded in a wide variety of ways, including PDFs of pie charts of the racial breakouts of traffic stops, counts by race that summed across arrests and traffic stops, or emails that said “everyone we arrested was white” (no counts given).
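For anyone wondering what "collapse the charges down to incidents" would involve, here is a rough Python sketch of that step under some loud assumptions. The field names (`contact_time`, `dob`, `statute`, `race`), the statute codes, the category labels, and the severity ranks are all invented for illustration; no agency supplied a crosswalk, so the one below is a placeholder that someone would have to build by hand. The "most serious" rule used here (lowest rank wins) is just one possible protocol.

```python
from collections import Counter

# HYPOTHETICAL crosswalk: statute code -> (category, severity rank).
# Lower rank = more serious. These codes and ranks are made up.
CROSSWALK = {
    "STAT-A": ("Assault", 1),
    "STAT-B": ("Theft", 2),
    "STAT-C": ("Disorderly conduct", 3),
    "STAT-D": ("Speeding", 4),
}
UNMAPPED = ("Unmapped", 99)  # statutes missing from the crosswalk


def collapse_to_incidents(charges):
    """Keep one charge per incident: the one with the lowest severity rank.

    An "incident" is approximated as a unique (contact time, date of birth)
    pair, which is the only linkage the charge dumps allow.
    """
    incidents = {}
    for charge in charges:
        key = (charge["contact_time"], charge["dob"])
        category, rank = CROSSWALK.get(charge["statute"], UNMAPPED)
        best = incidents.get(key)
        if best is None or rank < best["rank"]:
            incidents[key] = {
                "category": category,
                "rank": rank,
                "race": charge["race"],
            }
    return incidents


def count_by_category_and_race(incidents):
    """Tally incidents by (category, race)."""
    return Counter((i["category"], i["race"]) for i in incidents.values())


# Made-up rows standing in for a charge dump.
charges = [
    {"contact_time": "2019-07-01 22:15", "dob": "1990-03-02",
     "statute": "STAT-C", "race": "B"},
    {"contact_time": "2019-07-01 22:15", "dob": "1990-03-02",
     "statute": "STAT-A", "race": "B"},
    {"contact_time": "2019-07-02 09:40", "dob": "1985-11-20",
     "statute": "STAT-D", "race": "W"},
    {"contact_time": "2019-07-03 14:05", "dob": "2001-01-15",
     "statute": "STAT-Z", "race": "W"},
]

incidents = collapse_to_incidents(charges)
counts = count_by_category_and_race(incidents)
for (category, race), n in sorted(counts.items()):
    print(category, race, n)
```

Even this toy version shows where the time goes. Matching on contact time plus date of birth will merge two people who share a birthday and were stopped together, and it will split one incident whose charges carry different timestamps. Every statute missing from the crosswalk lands in "Unmapped" until somebody classifies it.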
UPDATE 7/29: One agency actually sent a VIDEO with slides of their report that rotate in 3-D space and a voice-over describing what was on each page. I am not kidding!
The lawyers don’t understand why I’m saying it isn’t worth my time to try to “analyze” this mess. They say, “We asked for it, it will look bad if we don’t include the results in the report.” And they can’t understand why I can’t get it done by next week, when the draft report is due.