In terms of police reporting, Orlando is a large metropolis that likes to pretend it’s still a small town. By this I mean that the Orlando Police Department files and stores a dispatch record for everything officers are called to (except minor traffic stops). As a result, Orlando often ranks disproportionately high on crime lists based on the number of reports per capita. It also means that we have plenty of data to look through.
Rather than choosing a default set, I asked if anyone in Orlando had a public dataset that they wanted analyzed. Someone in our Code for Orlando brigade sent me a CSV of around 1.5 million Orlando Police Dept. dispatches.
Before importing the dataset into R, I wanted to split the datetime column into its elements and add the header line to it. I ended up using this Python code in a terminal.
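```python
# Split the quoted datetime field into year, month, day, hour, minute and
# second columns, then add a header line so R can read named columns.
# Assumes no field contains a comma (each row is split on every ',').
header = 'datetime,year,month,day,hour,minute,second,lat,lon,reason,agency\n'

newlines = []
with open('opddata.csv', 'r') as fin:
    for line in fin:
        fields = line.split(',')
        # "2009-05-09 12:37:00" -> 2009,05,09,12,37,00
        parts = fields[0].strip('"')
        for sep in ['-', ' ', ':']:
            parts = parts.replace(sep, ',')
        # original datetime, its split parts, then the remaining fields
        # (the last field still carries the line's trailing newline)
        newlines.append(fields[0] + ',' + parts + ',' + ','.join(fields[1:]))

with open('opddatasplit.csv', 'w') as fout:
    fout.write(header + ''.join(newlines))
```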
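One caveat with the script above: `split(',')` assumes no field ever contains a comma. As a hedged alternative, here’s a sketch of the same transformation using Python’s standard `csv` and `datetime` modules, which respects quoted fields. It assumes each raw row is `datetime, lat, lon, reason, agency` (the order implied by the header above) with timestamps formatted like `2009-05-09 12:37:00`; the function names are my own.

```python
import csv
from datetime import datetime

HEADER = ['datetime', 'year', 'month', 'day', 'hour', 'minute', 'second',
          'lat', 'lon', 'reason', 'agency']


def split_datetime_rows(lines):
    """Yield the header, then each row with its datetime expanded into parts.

    Assumes rows look like: datetime, lat, lon, reason, agency, where the
    datetime is formatted like 2009-05-09 12:37:00 (quoted or not).
    """
    yield HEADER
    # csv.reader removes surrounding quotes and keeps quoted commas intact
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        stamp = datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        yield [row[0], stamp.year, stamp.month, stamp.day,
               stamp.hour, stamp.minute, stamp.second, *row[1:]]


def convert(src='opddata.csv', dst='opddatasplit.csv'):
    """Read the raw CSV and write the expanded version with a header line."""
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        csv.writer(fout, lineterminator='\n').writerows(split_datetime_rows(fin))
```

Calling `convert()` would produce the same columns as the original script. One difference: the split parts are written as plain integers (`5` rather than `05`), which R reads as the same numbers.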
Now we can set up our workspace and load the data into R.
| | year | month | day | hour | minute | second | lat | lon |
|---|---|---|---|---|---|---|---|---|
| Min. | 2009 | 1.000 | 1.00 | 0.00 | 0.00 | 0.0000 | -34.53 | -88.0324 |
| 1st Qu. | 2011 | 4.000 | 8.00 | 8.00 | 15.00 | 0.0000 | 28.50 | -81.4359 |
| Median | 2013 | 7.000 | 16.00 | 14.00 | 30.00 | 0.0000 | 28.53 | -81.3878 |
| Mean | 2012 | 6.522 | 15.81 | 12.96 | 29.52 | 0.6054 | 28.52 | -81.3889 |
| 3rd Qu. | 2014 | 9.000 | 23.00 | 18.00 | 45.00 | 0.0000 | 28.55 | -81.3481 |
| Max. | 2015 | 12.000 | 31.00 | 23.00 | 59.00 | 59.0000 | 50.83 | -0.2423 |

| datetime | count |
|---|---|
| 2010-09-15 13:01:00 | 14 |
| 2011-09-17 00:23:00 | 12 |
| 2011-09-21 00:21:00 | 12 |
| 2014-10-22 11:02:00 | 12 |
| 2014-11-25 11:54:00 | 12 |
| 2010-03-22 21:40:00 | 10 |
| (Other) | 1448486 |

| reason | count |
|---|---|
| general disturbance | 135448 |
| accident | 120913 |
| suspicious person | 109178 |
| battery | 69508 |
| unknown trouble | 69184 |
| commercial alarm | 67645 |
| (Other) | 876682 |

| agency | count |
|---|---|
| ocso | 29624 |
| opd | 1418934 |
| row | datetime | year | month | day | hour | minute | second | lat | lon | reason | agency |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2009-05-09 12:37:00 | 2009 | 5 | 9 | 12 | 37 | 0 | 28.54386 | -81.39834 | battery | opd |
| 1448558 | 2015-08-21 20:46:25 | 2015 | 8 | 21 | 20 | 46 | 25 | 28.53131 | -81.14496 | house/bus./area/check | ocso |
This dataset has about 1.45 million rows of police dispatches, dating from 2009-05-09 to 2015-08-21. Looking at the datetime items, we can make some initial observations and conjectures.
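The summary above comes from R. For anyone following along in Python, here’s a minimal sketch of the same sanity check (row count plus earliest and latest timestamps). It assumes rows are dicts with a `datetime` key, as `csv.DictReader` produces from `opddatasplit.csv`, and that timestamps are zero-padded like `2009-05-09 12:37:00`, so they sort chronologically as plain strings. The function names are hypothetical.

```python
import csv


def load_rows(path='opddatasplit.csv'):
    """Read the split CSV into a list of dicts keyed by the header names."""
    with open(path, newline='') as fin:
        return list(csv.DictReader(fin))


def describe(rows):
    """Return (row count, earliest datetime, latest datetime).

    Zero-padded timestamps like '2009-05-09 12:37:00' sort chronologically
    as strings, so no date parsing is needed.
    """
    stamps = [row['datetime'] for row in rows]
    return len(stamps), min(stamps), max(stamps)
```

On the full file, I’d expect `describe(load_rows())` to match the row count and date range shown above.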
Let’s start by looking at the times when the incidents are reported. We’ll look at year, month, day, and hour; there’s nothing valuable we can gain from minute and second.
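The charts in this section are counts per year, month, day, and hour. The plotting code isn’t shown, but as a rough Python sketch, the underlying bin counts come from a `Counter` over one column. It assumes `rows` is a list of dicts read from `opddatasplit.csv` (for example with `csv.DictReader`), so values arrive as strings; `bin_counts` is a hypothetical name.

```python
from collections import Counter


def bin_counts(rows, column):
    """Count dispatches per value of an integer column such as 'year' or 'hour'.

    Values are converted to int so that, e.g., hour 5 sorts before hour 18.
    """
    counts = Counter(int(row[column]) for row in rows)
    return dict(sorted(counts.items()))
```

For example, `bin_counts(rows, 'hour')` gives the 24 values behind an hourly bar chart.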
Most years, the bin count is pretty stable just over 200K. We have incomplete data for 2009 and 2015. However, there’s a sizable spike in dispatches in 2014. That’s something to investigate later.
We can see that there is, in fact, an increase in dispatches during the summer months, followed by a drop back to normal in September. This is likely due to having no data before May 2009 or after August 2015. Even so, there’s a noticeable drop in December, matched only by February, which is usually three days shorter, and we only have four years of data for each. I’d like to see that separated out by year.
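Separating the monthly counts by year amounts to a year-by-month cross-tabulation. Here’s a hedged Python sketch (the function name is hypothetical) that returns twelve monthly counts per year, with zeros for months that have no data, such as January through April 2009. It assumes row dicts with string `year` and `month` values.

```python
from collections import Counter


def month_by_year(rows):
    """Return {year: [Jan count, ..., Dec count]} from 'year'/'month' columns.

    Months with no dispatches get 0 (Counter returns 0 for missing keys).
    """
    counts = Counter((int(r['year']), int(r['month'])) for r in rows)
    years = sorted({year for year, _ in counts})
    return {year: [counts[(year, month)] for month in range(1, 13)]
            for year in years}
```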
It seems we’ve found why there was an uptick in the summer and in 2014: from April to November 2014, there were about twice as many dispatches as normal. There’s also a spike in August 2015, a month for which we have only about two-thirds of the data. Was crime rampant during these months? I think it’s more likely that a new ‘reason’ caused the spike, or that a police policy led officers to respond to more incidents.
It turns out the number of daily dispatches is fairly steady, with the median staying around 625 and the interquartile range (the spread of the middle 50% of days) staying around 125. The outliers also seem to form somewhat distinct bands, which I want to revisit later.
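The daily figures come from counting dispatches per calendar date. Here’s a minimal sketch of the median and interquartile range using only the standard library (`statistics.quantiles` needs Python 3.8 or later). It pools all days together, whereas a box plot per year or month would give one median and IQR per group; `daily_spread` is a hypothetical name.

```python
from collections import Counter
from statistics import median, quantiles


def daily_spread(rows):
    """Return (median, interquartile range) of dispatches per calendar day.

    Days with no dispatches never appear in `rows`, so they are not counted.
    """
    per_day = Counter((r['year'], r['month'], r['day']) for r in rows)
    counts = sorted(per_day.values())
    q1, _, q3 = quantiles(counts, n=4)  # quartile cut points
    return median(counts), q3 - q1
```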
Here’s a clear view of the hourly dispatches. The graph mostly follows a parabolic arc, starting at 5 AM and peaking in the early evening. The spike at 6 PM is likely due to rush-hour accidents being reported. I’m curious why there’s a drop just before it, though.
Now I want to look more closely at the ‘reason’ column. There are 153 distinct reasons, and I’d like to group them into a few larger categories.
A quick note: these are the reasons police officers were dispatched to the scene. While I will make some assumptions about actual outcomes from this data, the dispatches likely don’t all match one-to-one with the events that actually occurred.
## [1] "911 emergency"
## [2] "911 hang up"
## [3] "911 non-emergency"
## [4] "abandoned boat"
## [5] "abandoned vehicle"
## [6] "accident"
## [7] "aggravated assault"
## [8] "aggravated battery"
## [9] "airplane accident"
## [10] "ambulance escort"
## [11] "animal calls"
## [12] "armed robbery"
## [13] "arson fire"
## [14] "assist fire dept."
## [15] "attempted rape"
## [16] "attempted suicide"
## [17] "bad check passed"
## [18] "bank alarm"
## [19] "bank robbery"
## [20] "battery"
## [21] "batt. on law enf. off."
## [22] "bike patrol"
## [23] "bomb explosion"
## [24] "bomb threat"
## [25] "bribery"
## [26] "burglary business"
## [27] "burglary hotel"
## [28] "burglary residence"
## [29] "burglary vehicle"
## [30] "carjacking"
## [31] "check well being"
## [32] "child abuse"
## [33] "child neglect"
## [34] "citizen assist"
## [35] "commercial alarm"
## [36] "commercial b&e"
## [37] "commercial robbery"
## [38] "community orientated policing detail"
## [39] "county ord. viol."
## [40] "criminal mischief"
## [41] "dead animal"
## [42] "dead person"
## [43] "designated patrol area"
## [44] "deviant sexual activities"
## [45] "direct traffic"
## [46] "disabled occupied vehicle"
## [47] "discharge weapon"
## [48] "domestic disturbance"
## [49] "door alarm"
## [50] "d.p.a. available"
## [51] "drowning"
## [52] "drug violation"
## [53] "drunk driver"
## [54] "drunk pedestrian"
## [55] "drunk person"
## [56] "escaped prisoner"
## [57] "false imprisonment"
## [58] "felony"
## [59] "felony drugs"
## [60] "fire"
## [61] "fishing violation"
## [62] "forgery"
## [63] "found property"
## [64] "fraud/counterfeit"
## [65] "fugitive from justice"
## [66] "gambling"
## [67] "general disturbance"
## [68] "general investigation"
## [69] "grand theft"
## [70] "hit and run"
## [71] "hitchhiker"
## [72] "hold-up alarm"
## [73] "home invasion"
## [74] "house/bus./area/check"
## [75] "house/business check"
## [76] "illegal fishing"
## [77] "illegally parked cars"
## [78] "impersonating police officer"
## [79] "industrial accident"
## [80] "k-9 requested"
## [81] "kidnapping"
## [82] "law enforcement officer escort"
## [83] "leo escort"
## [84] "liquor law violation"
## [85] "lost/found property"
## [86] "man down"
## [87] "mentally-ill person"
## [88] "misd. drugs"
## [89] "misdemeanor"
## [90] "missing person"
## [91] "missing person recovered"
## [92] "murder"
## [93] "mutual aid"
## [94] "near drowning"
## [95] "noise ordinance violation"
## [96] "non-emergency assistance"
## [97] "non-so warrant"
## [98] "nuisance animal"
## [99] "obscene/harassing phone calls"
## [100] "obstruction on highway"
## [101] "obstruct on hwy"
## [102] "officer with prisoner"
## [103] "open door/window"
## [104] "other sex crimes"
## [105] "parking violation"
## [106] "person robbery"
## [107] "petit theft"
## [108] "physical fight"
## [109] "prostitution"
## [110] "prowler"
## [111] "rape"
## [112] "reckless boat"
## [113] "reckless driver"
## [114] "reckless vehicle"
## [115] "rescue-medical only"
## [116] "residential alarm"
## [117] "residential b&e"
## [118] "resist w/o violence"
## [119] "school zone crossing"
## [120] "security checkpoint alarm"
## [121] "shoplifting"
## [122] "sick or injured person"
## [123] "signal out"
## [124] "solicitor"
## [125] "stalking"
## [126] "standby"
## [127] "stolen/lost tag"
## [128] "stolen/lost tag recovered"
## [129] "stolen vehicle"
## [130] "stolen vehicle recovered"
## [131] "strong arm robbery"
## [132] "suicide"
## [133] "suspicious boat"
## [134] "suspicious car/occupant armed"
## [135] "suspicious hazard"
## [136] "suspicious incident"
## [137] "suspicious luggage"
## [138] "suspicious person"
## [139] "suspicious vehicle"
## [140] "suspicious video"
## [141] "theft"
## [142] "threatening animal"
## [143] "threats/assaults"
## [144] "traffic light"
## [145] "traffic (misc)"
## [146] "trash dumping"
## [147] "trespasser"
## [148] "unknown trouble"
## [149] "vandalism/criminal mischief"
## [150] "vehicle accident"
## [151] "vehicle alarm"
## [152] "verbal disturbance"
## [153] "weapons/armed"
Given these levels, I think the best categories will be:
The items put into each category in the code below are at my discretion. However, I used the definition of violent crime from the Bureau of Justice Statistics as my guide for the first two lists.
Violent crime involves intentional or intended physical harm to another human including murder, rape and sexual assault, robbery, and assault.
Many police departments also count attempted violent crimes, as well as crimes like arson where bodily harm is possible, as violent crime. This is why robbery (victim present) is a violent crime while burglary (victim not present) is not. Also, for the purpose of these lists, ‘crime’ means breaking federal or state laws, not county ordinances, so reasons that include ‘violation’, which mostly apply to local ordinances, go in the ‘oncall’ list.
violent_list = c('aggravated assault','aggravated battery','armed robbery','arson fire','attempted rape','bank robbery','battery','batt. on law enf. off.','bomb explosion','bomb threat','carjacking','child abuse','child neglect','commercial robbery','drunk driver','false imprisonment','hit and run','hold-up alarm','home invasion','kidnapping','murder','other sex crimes','person robbery','rape','strong arm robbery','threats/assaults','weapons/armed')
nonviolent_list = c('bad check passed','bribery','burglary business','burglary hotel','burglary residence','commercial b&e','criminal mischief','drug violation','drunk pedestrian','drunk person','escaped prisoner','felony','felony drugs','forgery','fraud/counterfeit','fugitive from justice','gambling','grand theft','illegal fishing','impersonating police officer','misd. drugs','misdemeanor','petit theft','prostitution','residential b&e','resist w/o violence','shoplifting','theft','vandalism/criminal mischief')
transport_list = c('abandoned boat','abandoned vehicle','accident','airplane accident','burglary vehicle','disabled occupied vehicle','illegally parked cars','obstruction on highway','obstruct on hwy','parking violation','reckless boat','reckless driver','reckless vehicle','signal out','stolen/lost tag','stolen/lost tag recovered','stolen vehicle','stolen vehicle recovered','suspicious boat','suspicious car/occupant armed','suspicious vehicle','traffic light','traffic (misc)','vehicle accident','vehicle alarm')
oncall_list = c('911 emergency','911 hang up','animal calls','attempted suicide','bank alarm','check well being','commercial alarm','county ord. viol.','dead animal','dead person','deviant sexual activities','discharge weapon','domestic disturbance','door alarm','drowning','fire','fishing violation','found property','general disturbance','general investigation','hitchhiker','house/bus./area/check','house/business check','industrial accident','liquor law violation','lost/found property','mentally-ill person','missing person','missing person recovered','near drowning','noise ordinance violation','non-emergency assistance','non-so warrant','nuisance animal','obscene/harassing phone calls','open door/window','physical fight','prowler','rescue-medical only','residential alarm','security checkpoint alarm','sick or injured person','solicitor','stalking','suicide','suspicious hazard','suspicious incident','suspicious luggage','suspicious person','suspicious video','threatening animal','trash dumping','trespasser','unknown trouble','verbal disturbance')
Now that we have our lists, let’s make a new column called ‘reason_cat’ that records which category each dispatch belongs to, and take a quick look at the distribution of the categories.
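The R code for this step isn’t shown, so here is one way to do it, sketched in Python: invert the category lists into a reason-to-category lookup, then label each row. The function names and the `'other'` fallback label are my own assumptions. Some fallback is needed, because 17 of the 153 reasons (for example ‘911 non-emergency’, ‘bike patrol’, and ‘standby’) don’t appear in any of the four lists above.

```python
from collections import Counter


def build_lookup(category_lists):
    """Invert {category: [reasons]} into {reason: category}.

    Raises ValueError if a reason appears in more than one list.
    """
    lookup = {}
    for category, reasons in category_lists.items():
        for reason in reasons:
            if reason in lookup:
                raise ValueError(
                    f'{reason!r} is listed under both {lookup[reason]!r} and {category!r}')
            lookup[reason] = category
    return lookup


def add_reason_cat(rows, lookup, default='other'):
    """Add a 'reason_cat' key to each row dict (rows must be a list).

    Reasons missing from `lookup` get the placeholder label `default`.
    Returns the number of rows per category.
    """
    for row in rows:
        row['reason_cat'] = lookup.get(row['reason'], default)
    return Counter(row['reason_cat'] for row in rows)
```

With the four R vectors copied into Python lists, `build_lookup({'violent': violent_list, 'nonviolent': nonviolent_list, 'transport': transport_list, 'oncall': oncall_list})` would give the lookup. The duplicate check is there so that a reason accidentally listed twice raises an error instead of silently landing in one category.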
Over half of the dispatches fall into the ‘oncall’ category, which makes sense: police are often called on to make an official report of an incident or to act as a government liaison at certain events. That category also has the most individual reasons. I’d like to see the most frequent reasons in each category.
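The counts that follow list the ten most frequent reasons in each category, and the 97% figures are the top-ten total divided by the category total (for example, the ten violent reasons listed sum to 135,044 dispatches). A hedged Python sketch of both, assuming each row dict already carries a `reason_cat` label; `top_share` is a hypothetical name.

```python
from collections import Counter


def top_share(rows, category, n=10):
    """Return the n most common reasons in a category and their share of it."""
    counts = Counter(r['reason'] for r in rows if r['reason_cat'] == category)
    top = counts.most_common(n)
    total = sum(counts.values())
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share
```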
| reason | dispatches |
|---|---|
| battery | 69508 |
| threats/assaults | 30264 |
| hit and run | 12272 |
| person robbery | 6365 |
| hold-up alarm | 4690 |
| other sex crimes | 3874 |
| child neglect | 3273 |
| rape | 1955 |
| child abuse | 1721 |
| drunk driver | 1122 |
Of our violent dispatches, half are for battery. In this category, the top 10 of the 27 reasons account for 97% of dispatches. Also, there are only 12 murder dispatches, which seems uncharacteristically low for a span of six years. It’s possible that officers are usually dispatched under some other reason for incidents that end up as murders, rather than being dispatched after a murder has already happened.
| reason | dispatches |
|---|---|
| theft | 40975 |
| residential b&e | 37225 |
| shoplifting | 25822 |
| vandalism/criminal mischief | 17320 |
| fugitive from justice | 15763 |
| drug violation | 13931 |
| commercial b&e | 7474 |
| fraud/counterfeit | 6704 |
| drunk pedestrian | 3802 |
| burglary residence | 680 |
Similarly, the top 10 of the 29 non-violent reasons account for 97% of non-violent dispatches.
| reason | dispatches |
|---|---|
| accident | 120913 |
| suspicious vehicle | 27244 |
| burglary vehicle | 26980 |
| stolen vehicle | 17872 |
| disabled occupied vehicle | 13159 |
| obstruction on highway | 11506 |
| illegally parked cars | 9387 |
| abandoned vehicle | 4812 |
| signal out | 3445 |
| stolen vehicle recovered | 3428 |
Accidents make up half of our transport dispatches and are the second most common reason overall, at 8.3% of the dataset. Again, the top 10 of the 25 reasons account for 97% of this category.
| reason | dispatches |
|---|---|
| general disturbance | 135448 |
| suspicious person | 109178 |
| unknown trouble | 69184 |
| commercial alarm | 67645 |
| trespasser | 54349 |
| suspicious incident | 40917 |
| residential alarm | 40280 |
| house/business check | 39522 |
| domestic disturbance | 35888 |
| noise ordinance violation | 26418 |
Now to our largest group. General disturbance is the most common reason overall, making up 17.2% of this category and 9.4% of the dataset. ‘Unknown trouble’ accounts for another 4.8% of the dataset. This category is more spread out: the top 10 of its 55 reasons account for only 78.6% of its dispatches.
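As a quick check of the share-of-dataset figures quoted in this section, each one is just a count divided by the 1,448,558 total rows (the sum of the agency counts in the summary). A small sketch; `pct` is a hypothetical helper.

```python
TOTAL_DISPATCHES = 1_448_558  # ocso (29,624) + opd (1,418,934)


def pct(count, total=TOTAL_DISPATCHES):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(pct(135_448))  # general disturbance -> 9.4
print(pct(120_913))  # accident -> 8.3
print(pct(69_184))   # unknown trouble -> 4.8
```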
Armed with this new column, let’s take another look at our hourly graph. This time, we’ll divide each bar by category.
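A stacked bar chart like this needs one count per hour and category. Here’s a minimal Python sketch of that table, assuming each row dict has both an `hour` column and a `reason_cat` label; `hourly_by_category` is a hypothetical name.

```python
from collections import Counter, defaultdict


def hourly_by_category(rows):
    """Return {hour: {category: count}}, ordered by hour, for a stacked bar chart."""
    table = defaultdict(Counter)
    for row in rows:
        table[int(row['hour'])][row['reason_cat']] += 1
    return {hour: dict(table[hour]) for hour in sorted(table)}
```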