Abstract

In terms of police reporting, Orlando is a large metropolis that likes to pretend it’s still a small town. What I mean by this is that the Orlando Police Department files and stores police dispatches on everything that officers are called on for (except minor traffic stops). This means that Orlando often ranks disproportionately high in crime lists that are based on the number of reports per capita. It also means that we have plenty of data to look through.

Preparation

Rather than choosing a default set, I asked if anyone in Orlando had a public dataset that they wanted analyzed. Someone in our Code for Orlando brigade sent me a CSV of around 1.5 million Orlando Police Dept. dispatches.

Before importing the dataset into R, I wanted to split the datetime column into its elements and add the header line to it. I ended up using this Python code in a terminal.

# Read every dispatch row; the first field is the datetime string
lines = []
newlines = []
with open('opddata.csv', 'r') as fin:
    lines = fin.readlines()
# Split datetime into columns, keeping the original datetime string first
for line in lines:
    fields = line.split(',')
    # Drop any surrounding quotes, then turn '-', ' ' and ':' into commas
    parts = fields[0].strip('"')
    for sep in ['-', ' ', ':']:
        parts = parts.replace(sep, ',')
    newlines.append(fields[0] + ',' + parts + ',' + ','.join(fields[1:]))
# Add header line
header = 'datetime,year,month,day,hour,minute,second,lat,lon,reason,agency\n'
with open('opddatasplit.csv', 'w') as fout:
    fout.write(header + ''.join(newlines))
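
To make the transformation concrete, here is a minimal sketch of the same split applied to a single row. The sample line is assembled from the first record shown in the R output (assuming it is unquoted in the CSV), and split_datetime is a hypothetical helper name, not part of the original script.

```python
def split_datetime(line):
    """Insert year..second columns after the datetime field of one CSV row."""
    fields = line.split(',')
    parts = fields[0].strip('"')
    for sep in ('-', ' ', ':'):
        parts = parts.replace(sep, ',')
    return fields[0] + ',' + parts + ',' + ','.join(fields[1:])

# Sample row built from the first record in the dataset (assumed unquoted).
sample = '2009-05-09 12:37:00,28.54386,-81.39834,battery,opd\n'
result = split_datetime(sample)
print(result, end='')
```

The pieces keep their leading zeros as text (for example '05' for May); the summary output shows them as numbers once the file is loaded.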

Now we can set up our workspace and load it into R.

##                 datetime            year          month       
##  2010-09-15 13:01:00:     14   Min.   :2009   Min.   : 1.000  
##  2011-09-17 00:23:00:     12   1st Qu.:2011   1st Qu.: 4.000  
##  2011-09-21 00:21:00:     12   Median :2013   Median : 7.000  
##  2014-10-22 11:02:00:     12   Mean   :2012   Mean   : 6.522  
##  2014-11-25 11:54:00:     12   3rd Qu.:2014   3rd Qu.: 9.000  
##  2010-03-22 21:40:00:     10   Max.   :2015   Max.   :12.000  
##  (Other)            :1448486                                  
##       day             hour           minute          second       
##  Min.   : 1.00   Min.   : 0.00   Min.   : 0.00   Min.   : 0.0000  
##  1st Qu.: 8.00   1st Qu.: 8.00   1st Qu.:15.00   1st Qu.: 0.0000  
##  Median :16.00   Median :14.00   Median :30.00   Median : 0.0000  
##  Mean   :15.81   Mean   :12.96   Mean   :29.52   Mean   : 0.6054  
##  3rd Qu.:23.00   3rd Qu.:18.00   3rd Qu.:45.00   3rd Qu.: 0.0000  
##  Max.   :31.00   Max.   :23.00   Max.   :59.00   Max.   :59.0000  
##                                                                   
##       lat              lon                           reason      
##  Min.   :-34.53   Min.   :-88.0324   general disturbance:135448  
##  1st Qu.: 28.50   1st Qu.:-81.4359   accident           :120913  
##  Median : 28.53   Median :-81.3878   suspicious person  :109178  
##  Mean   : 28.52   Mean   :-81.3889   battery            : 69508  
##  3rd Qu.: 28.55   3rd Qu.:-81.3481   unknown trouble    : 69184  
##  Max.   : 50.83   Max.   : -0.2423   commercial alarm   : 67645  
##                                      (Other)            :876682  
##   agency       
##  ocso:  29624  
##  opd :1418934  
##                
##                
##                
##                
## 
##              datetime year month day hour minute second      lat       lon
## 1 2009-05-09 12:37:00 2009     5   9   12     37      0 28.54386 -81.39834
##    reason agency
## 1 battery    opd
##                    datetime year month day hour minute second      lat
## 1448558 2015-08-21 20:46:25 2015     8  21   20     46     25 28.53131
##               lon                reason agency
## 1448558 -81.14496 house/bus./area/check   ocso

Yes, this dataset has 1.45 million rows of police dispatches dating from 2009-05-09 to 2015-08-21. Looking at the datetime items, we can make some initial observations and conjectures.

Analysis

Datetime

Let’s start by looking at the times when the incidents are reported. We’ll look at year, month, day, and hour; there’s nothing valuable we can gain from minute and second.

In most years, the bin count is pretty stable at just over 200K. We have incomplete data for 2009 and 2015. However, there’s a sizable spike in dispatches in 2014. That’s something to investigate later.

We can see there is, in fact, an increase in dispatches during the summer months, which drops back down to normal in September. This is likely due to having no data before April 2009 and after August 2015. Even still, there’s a noticeable drop during December matched only by February, which is usually three days shorter, and we only have four years of data for each. I’d like to see that separated out by year.

It seems we’ve found why there was an up-tick in the summer and in 2014: there were about twice as many dispatches as normal in 2014 from April to November. There’s also a spike in August 2015, a month for which we only have about two-thirds of the expected data. Was crime rampant during these months? What I think is more likely is that there is a new ‘reason’ that caused the spike or a police policy that led to officers responding to more incidents.
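
One way to separate the months out by year is to count dispatches per (year, month) pair. The sketch below is an illustration in Python, not the R code behind the plots; monthly_counts is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter

def monthly_counts(rows):
    """Count dispatches per (year, month) from rows carrying those fields."""
    return Counter((row['year'], row['month']) for row in rows)

# Tiny made-up sample; the real rows would come from opddatasplit.csv.
rows = [
    {'year': 2014, 'month': 4},
    {'year': 2014, 'month': 4},
    {'year': 2013, 'month': 4},
]
counts = monthly_counts(rows)
for (year, month), n in sorted(counts.items()):
    print(year, month, n)
```

Comparing the same month across years in a table like this is what makes a jump such as April to November 2014 stand out.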

Turns out the number of daily dispatches is fairly steady with the median staying around 625 and the range of the middle 50% of values staying around 125. The outliers also seem to form somewhat distinct bands. I want to look at this again later.
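
The median and the middle-50% range quoted here can be sanity-checked with a few lines of Python. This is a sketch under assumptions, not the code behind the boxplot: daily_summary is a hypothetical helper, and statistics.quantiles (default 'exclusive' method) uses a slightly different quartile rule than the hinges of an R boxplot, so the values may differ a little.

```python
import statistics
from collections import Counter

def daily_summary(dates):
    """Return (median, IQR) of dispatches per day, given one date per dispatch."""
    per_day = sorted(Counter(dates).values())
    q1, _, q3 = statistics.quantiles(per_day, n=4)
    return statistics.median(per_day), q3 - q1

# Made-up sample: five days with 1, 2, 3, 4 and 5 dispatches.
dates = ['d1'] * 1 + ['d2'] * 2 + ['d3'] * 3 + ['d4'] * 4 + ['d5'] * 5
median, iqr = daily_summary(dates)
print(median, iqr)
```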

Here’s a clear view of the hourly dispatches. We can see that the graph mostly follows a parabolic arc starting at 5 AM and peaking in the early evening. The spike at 6 PM is likely due to rush hour accidents getting reported. I’m curious why there’s a drop just before it, though.

Categorizing Reason

Now I want to look more at the ‘reason’ column. We have 153 of them, and I’d like to classify them into a few larger categories.

Also, a quick note. These are the reasons the police officer was called to the scene. While I will look at this data and make assumptions about the actual outcome, these dispatches likely do not all match one-to-one with the actual events.

##   [1] "911 emergency"                       
##   [2] "911 hang up"                         
##   [3] "911 non-emergency"                   
##   [4] "abandoned boat"                      
##   [5] "abandoned vehicle"                   
##   [6] "accident"                            
##   [7] "aggravated assault"                  
##   [8] "aggravated battery"                  
##   [9] "airplane accident"                   
##  [10] "ambulance escort"                    
##  [11] "animal calls"                        
##  [12] "armed robbery"                       
##  [13] "arson fire"                          
##  [14] "assist fire dept."                   
##  [15] "attempted rape"                      
##  [16] "attempted suicide"                   
##  [17] "bad check passed"                    
##  [18] "bank alarm"                          
##  [19] "bank robbery"                        
##  [20] "battery"                             
##  [21] "batt. on law enf. off."              
##  [22] "bike patrol"                         
##  [23] "bomb explosion"                      
##  [24] "bomb threat"                         
##  [25] "bribery"                             
##  [26] "burglary business"                   
##  [27] "burglary hotel"                      
##  [28] "burglary residence"                  
##  [29] "burglary vehicle"                    
##  [30] "carjacking"                          
##  [31] "check well being"                    
##  [32] "child abuse"                         
##  [33] "child neglect"                       
##  [34] "citizen assist"                      
##  [35] "commercial alarm"                    
##  [36] "commercial b&e"                      
##  [37] "commercial robbery"                  
##  [38] "community orientated policing detail"
##  [39] "county ord. viol."                   
##  [40] "criminal mischief"                   
##  [41] "dead animal"                         
##  [42] "dead person"                         
##  [43] "designated patrol area"              
##  [44] "deviant sexual activities"           
##  [45] "direct traffic"                      
##  [46] "disabled occupied vehicle"           
##  [47] "discharge weapon"                    
##  [48] "domestic disturbance"                
##  [49] "door alarm"                          
##  [50] "d.p.a. available"                    
##  [51] "drowning"                            
##  [52] "drug violation"                      
##  [53] "drunk driver"                        
##  [54] "drunk pedestrian"                    
##  [55] "drunk person"                        
##  [56] "escaped prisoner"                    
##  [57] "false imprisonment"                  
##  [58] "felony"                              
##  [59] "felony drugs"                        
##  [60] "fire"                                
##  [61] "fishing violation"                   
##  [62] "forgery"                             
##  [63] "found property"                      
##  [64] "fraud/counterfeit"                   
##  [65] "fugitive from justice"               
##  [66] "gambling"                            
##  [67] "general disturbance"                 
##  [68] "general investigation"               
##  [69] "grand theft"                         
##  [70] "hit and run"                         
##  [71] "hitchhiker"                          
##  [72] "hold-up alarm"                       
##  [73] "home invasion"                       
##  [74] "house/bus./area/check"               
##  [75] "house/business check"                
##  [76] "illegal fishing"                     
##  [77] "illegally parked cars"               
##  [78] "impersonating police officer"        
##  [79] "industrial accident"                 
##  [80] "k-9 requested"                       
##  [81] "kidnapping"                          
##  [82] "law enforcement officer escort"      
##  [83] "leo escort"                          
##  [84] "liquor law violation"                
##  [85] "lost/found property"                 
##  [86] "man down"                            
##  [87] "mentally-ill person"                 
##  [88] "misd. drugs"                         
##  [89] "misdemeanor"                         
##  [90] "missing person"                      
##  [91] "missing person recovered"            
##  [92] "murder"                              
##  [93] "mutual aid"                          
##  [94] "near drowning"                       
##  [95] "noise ordinance violation"           
##  [96] "non-emergency assistance"            
##  [97] "non-so warrant"                      
##  [98] "nuisance animal"                     
##  [99] "obscene/harassing phone calls"       
## [100] "obstruction on highway"              
## [101] "obstruct on hwy"                     
## [102] "officer with prisoner"               
## [103] "open door/window"                    
## [104] "other sex crimes"                    
## [105] "parking violation"                   
## [106] "person robbery"                      
## [107] "petit theft"                         
## [108] "physical fight"                      
## [109] "prostitution"                        
## [110] "prowler"                             
## [111] "rape"                                
## [112] "reckless boat"                       
## [113] "reckless driver"                     
## [114] "reckless vehicle"                    
## [115] "rescue-medical only"                 
## [116] "residential alarm"                   
## [117] "residential b&e"                     
## [118] "resist w/o violence"                 
## [119] "school zone crossing"                
## [120] "security checkpoint alarm"           
## [121] "shoplifting"                         
## [122] "sick or injured person"              
## [123] "signal out"                          
## [124] "solicitor"                           
## [125] "stalking"                            
## [126] "standby"                             
## [127] "stolen/lost tag"                     
## [128] "stolen/lost tag recovered"           
## [129] "stolen vehicle"                      
## [130] "stolen vehicle recovered"            
## [131] "strong arm robbery"                  
## [132] "suicide"                             
## [133] "suspicious boat"                     
## [134] "suspicious car/occupant armed"       
## [135] "suspicious hazard"                   
## [136] "suspicious incident"                 
## [137] "suspicious luggage"                  
## [138] "suspicious person"                   
## [139] "suspicious vehicle"                  
## [140] "suspicious video"                    
## [141] "theft"                               
## [142] "threatening animal"                  
## [143] "threats/assaults"                    
## [144] "traffic light"                       
## [145] "traffic (misc)"                      
## [146] "trash dumping"                       
## [147] "trespasser"                          
## [148] "unknown trouble"                     
## [149] "vandalism/criminal mischief"         
## [150] "vehicle accident"                    
## [151] "vehicle alarm"                       
## [152] "verbal disturbance"                  
## [153] "weapons/armed"

Given these levels, I think the best categories will be:

  • violent: violent crimes such as murder, kidnapping, and battery
  • nonviolent: non-violent crimes such as criminal mischief, drug violations, and breaking & entering
  • transport: involving vehicles, roads, and waterways (that are not considered violent crimes like DUI) including accidents and reckless boating
  • oncall: where police responded to a non-criminal call like a 911 hang up, dead animal, or suspicious person
  • other: anything else that doesn’t fall into these categories like school zone crossings, bike patrols, and escorts

The items put into each category in the code below are at my discretion. However, I used the definition of violent crime from the Bureau of Justice Statistics as my guide for the first two lists.

Violent crime involves intentional or intended physical harm to another human including murder, rape and sexual assault, robbery, and assault.

Many police departments also include attempted violent crime as violent crime as well as crimes like arson where bodily harm is possible. This is why robbery (victims present) is a violent crime while burglary (victims not present) is not. I’ll also state that, for the purpose of these lists, ‘crime’ is breaking federal or state laws, not county ordinances, so reasons that include ‘violation’, which mostly apply to local ordinances, will be put in the ‘oncall’ list.

violent_list = c('aggravated assault','aggravated battery','armed robbery','arson fire','attempted rape','bank robbery','battery','batt. on law enf. off.','bomb explosion','bomb threat','carjacking','child abuse','child neglect','commercial robbery','drunk driver','false imprisonment','hit and run','hold-up alarm','home invasion','kidnapping','murder','other sex crimes','person robbery','rape','strong arm robbery','threats/assaults','weapons/armed')
nonviolent_list = c('bad check passed','bribery','burglary business','burglary hotel','burglary residence','commercial b&e','criminal mischief','drug violation','drunk pedestrian','drunk person','escaped prisoner','felony','felony drugs','forgery','fraud/counterfeit','fugitive from justice','gambling','grand theft','illegal fishing','impersonating police officer','misd. drugs','misdemeanor','petit theft','prostitution','residential b&e','resist w/o violence','shoplifting','theft','vandalism/criminal mischief')
transport_list = c('abandoned boat','abandoned vehicle','accident','airplane accident','burglary vehicle','disabled occupied vehicle','illegally parked cars','obstruction on highway','obstruct on hwy','parking violation','reckless boat','reckless driver','reckless vehicle','signal out','stolen/lost tag','stolen/lost tag recovered','stolen vehicle','stolen vehicle recovered','suspicious boat','suspicious car/occupant armed','suspicious vehicle','traffic light','traffic (misc)','vehicle accident','vehicle alarm')
oncall_list = c('911 emergency','911 hang up','animal calls','attempted suicide','bank alarm','check well being','commercial alarm','county ord. viol.','dead animal','dead person','deviant sexual activities','discharge weapon','domestic disturbance','door alarm','drowning','fire','fishing violation','found property','general disturbance','general investigation','hitchhiker','house/bus./area/check','house/business check','industrial accident','liquor law violation','lost/found property','mentally-ill person','missing person','missing person recovered','near drowning','noise ordinance violation','non-emergency assistance','non-so warrant','nuisance animal','obscene/harassing phone calls','open door/window','physical fight','prowler','rescue-medical only','residential alarm','security checkpoint alarm','sick or injured person','solicitor','stalking','suicide','suspicious hazard','suspicious incident','suspicious luggage','suspicious person','suspicious video','threatening animal','trash dumping','trespasser','unknown trouble','verbal disturbance')

Now that we have our lists, let’s make a new column called ‘reason_cat’ that tells us which category each dispatch belongs to and take a quick look at the distribution of our reason categories.
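
The lookup behind such a column can be sketched as follows. This is a Python illustration rather than the R code used in the analysis: the sets are abbreviated subsets of the four lists above, reason_category is a hypothetical name, and treating every unlisted reason as ‘other’ is an assumption based on the category descriptions.

```python
# Abbreviated, illustrative subsets of the four lists above.
CATEGORIES = {
    'violent': {'battery', 'murder', 'kidnapping'},
    'nonviolent': {'theft', 'shoplifting', 'criminal mischief'},
    'transport': {'accident', 'stolen vehicle', 'reckless boat'},
    'oncall': {'911 hang up', 'dead animal', 'suspicious person'},
}

def reason_category(reason):
    """Return the category a dispatch reason belongs to, else 'other'."""
    for name, reasons in CATEGORIES.items():
        if reason in reasons:
            return name
    return 'other'  # assumption: anything unlisted counts as 'other'

for reason in ('battery', 'accident', 'bike patrol'):
    print(reason, '->', reason_category(reason))
```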

Over half of the dispatches fall into the ‘oncall’ category, which makes sense. Police are often called upon to make official reports of an incident or act as a government liaison for certain events. That category also has the most individual reasons. I’d like to see the most frequent items in these categories.

##          battery threats/assaults      hit and run   person robbery 
##            69508            30264            12272             6365 
##    hold-up alarm other sex crimes    child neglect             rape 
##             4690             3874             3273             1955 
##      child abuse     drunk driver 
##             1721             1122

Of our violent crimes, half of them are for battery. In this category, 97% of our dispatches fall into the top 10 of the 27 reasons. Also, there are only 12 murder dispatches. This seems uncharacteristically low for a span of six years. It’s possible that police respond to certain calls that end up as a murder incident rather than responding after the murder has already happened.

##                       theft             residential b&e 
##                       40975                       37225 
##                 shoplifting vandalism/criminal mischief 
##                       25822                       17320 
##       fugitive from justice              drug violation 
##                       15763                       13931 
##              commercial b&e           fraud/counterfeit 
##                        7474                        6704 
##            drunk pedestrian          burglary residence 
##                        3802                         680

Similarly, the top 10 of the 29 reasons make up 97% of non-violent dispatches.

##                  accident        suspicious vehicle 
##                    120913                     27244 
##          burglary vehicle            stolen vehicle 
##                     26980                     17872 
## disabled occupied vehicle    obstruction on highway 
##                     13159                     11506 
##     illegally parked cars         abandoned vehicle 
##                      9387                      4812 
##                signal out  stolen vehicle recovered 
##                      3445                      3428

Accidents make up half of our transport dispatches and are the second most common reason making up 8.3% of our dataset. Again, 97% of this category is made up of the top 10 of 25.

##       general disturbance         suspicious person 
##                    135448                    109178 
##           unknown trouble          commercial alarm 
##                     69184                     67645 
##                trespasser       suspicious incident 
##                     54349                     40917 
##         residential alarm      house/business check 
##                     40280                     39522 
##      domestic disturbance noise ordinance violation 
##                     35888                     26418

Now to our largest group. General disturbances are the most numerous reason, making up 17.2% of this category and 9.4% of our dataset. We also have ‘unknown trouble’ for 4.8% of our dataset. This category is a little more spread out, with the top 10 of the 55 reasons making up only 78.6% of its dispatches.
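
The dataset-wide percentages follow directly from the counts in the summary output. A small worked check, assuming the total is the sum of the two agency counts (29624 + 1418934 = 1448558); the helper name share is hypothetical.

```python
TOTAL = 29624 + 1418934  # ocso + opd rows from the summary output

def share(count, total=TOTAL):
    """Percentage of the whole dataset, rounded to one decimal place."""
    return round(100 * count / total, 1)

print('general disturbance', share(135448))
print('accident', share(120913))
print('unknown trouble', share(69184))
```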

Armed with this new column, let’s take another look at our hourly graph. This time, we’ll divide each bar by category.

For the most part, each category rises and falls with the overall arc of the day as we saw before. I have an idea that might explain what we see at 5 PM and 6 PM.

The time associated with a police report is not when the incident actually happened; it’s when the report is filed, i.e., when the officer arrives at that location. The heaviest rush hour traffic starts around 5 PM when most people leave work. I believe that many of the 6 PM dispatches happened in the 5 PM block, but the traffic kept enough officers from getting to the site promptly. If you average both bars in the graph, they fit the arc we would expect to see.

Also, there are increases in the height of ‘other’ at 7 AM and 2 PM. Because of the timing and because ‘other’ mostly consists of non-incident police activities, I believe these increases are due to the public school day beginning and ending at those times.

Let’s see if that spike in August 2015 is related to the categories.

There it is. There was a dramatic increase in the ‘oncall’ category. However, there are also smaller increases in every other category. Given that our data for August is only about two-thirds complete, there was definitely either an increase in overall police activity or a policy change that led to more police dispatches.

What about that increase in 2014?

These columns look very similar to the one in August 2015. It could be that they share the same cause. However, I believe there is something else going on here. The increase isn’t strictly during the summer; it starts in April and goes through November. Rather than either/or, I believe there was both a policy change and an increase in law enforcement presence. Why? The increase in reporting matches up with the election season. While the president wasn’t on the ballot, the state governor was. However, we don’t see a similar increase in 2012.

Day of the Week

Because each row comes with a datetime string, we can use R to determine on which day of the week it was filed.
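
For readers following along with the earlier Python script, the same lookup can be sketched with the standard library. This is an illustration, not the R call used in the analysis; day_of_week is a hypothetical helper, and the datetime format is assumed from the rows shown in the output.

```python
from datetime import datetime

DAY_NAMES = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

def day_of_week(stamp):
    """Map a 'YYYY-MM-DD HH:MM:SS' string to its weekday name."""
    parsed = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0

# First and last rows of the dataset, as shown in the R output.
print(day_of_week('2009-05-09 12:37:00'))
print(day_of_week('2015-08-21 20:46:25'))
```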

Let’s take another look at those categories by day of the week.

That’s flatter than I thought it would be. There are slightly fewer on Sundays, but not by much. Maybe we’ll see something if it’s faceted (we’ll exclude ‘oncall’ from this).

Violent crime stays steady throughout the week, while three of the five categories see drops over the weekend. This likely has to do with officer prioritization. A department only has so many officers to send places, especially on weekends when some officers have a day off. Violent crimes take precedence, so they see relatively little fluctuation. The other categories are responded to based on the officers who are left. However, ‘oncall’ actually increased on Friday and Saturday. It’s likely that some of the reasons in the category are not as time-dependent, so they are pushed to the weekend.

Let’s revisit that day boxplot, but we’ll use points and color by day of the week this time.

There doesn’t seem to be a connection between day of the week and the number of dispatches per day, but we can see why the bands of outliers exist in our boxplot. The number of dispatches is mostly consistent within each year except for 2014. When included with the other years, almost all of 2009 is considered an outlier. In 2014, there are two distinct bands, which likely has to do with the jumps in numbers from April to November. What I’m shocked to see is just how abrupt the changes are per year. For example, there are only a couple of days in 2010 that even fall into the range of 2009. This makes me think that a new police policy took effect at the very start of 2010 that had an immediate impact on the number of daily dispatches.

Heat Mapping

Let’s try to make some heat maps from our geo data. We know there are some outliers in the coordinates, so let’s figure out a better bounding box. I know from an older project the approximate bounds of Orange County, FL. Let’s start there with a decent buffer zone.

Now refining those values, we’ll use (28.34,-81.6),(28.64,-81.2) as our bounding box. Let’s create a subset of our data so we can round our coordinates. We’ll also create a function that will turn the dataframe into a frequency table we can use in the visualizations. However, there’s a caveat in the data. There’s a high number of reports located at or in the immediate vicinity of the police station and the county courthouse, which causes the rest of the data points to be washed out. For the purpose of making these plots, we will also omit these two locations. We’ll do this by supplying a frequency cap to our function.
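
The idea of a capped frequency table can be sketched in Python as below. This is an illustration under assumptions, not the R function used for the plots: frequency_table is a hypothetical name, rounding to three decimals is a guess based on the coordinates shown later in the accident table, and dropping (rather than clamping) cells above the cap is one reading of "omit these two locations".

```python
from collections import Counter

# Bounding box from the text: (28.34, -81.6) to (28.64, -81.2).
LAT_MIN, LAT_MAX = 28.34, 28.64
LON_MIN, LON_MAX = -81.6, -81.2

def frequency_table(points, digits=3, cap=None):
    """Count dispatches per rounded (lat, lon) cell inside the bounding box.

    Cells with more than `cap` dispatches are dropped entirely.
    """
    counts = Counter(
        (round(lat, digits), round(lon, digits))
        for lat, lon in points
        if LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
    )
    if cap is not None:
        counts = Counter({cell: n for cell, n in counts.items() if n <= cap})
    return counts

# Two nearby made-up points plus one far outside the box.
points = [(28.54386, -81.39834), (28.54391, -81.39829), (50.83, -0.2423)]
table = frequency_table(points)
print(table)
```

Rounding merges nearby points into one cell, so the choice of digits controls the tile size of the heat map.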

Now for our visualization. We’ll be using the ggmap package to overlay our frequency table on a map of Orlando (sourced from Google Maps).

We can see that the darkest areas are downtown, along E & W Colonial Dr, and around shopping areas like the Millenia Mall. All of these areas are either highly populated or highly trafficked during the day. There are also a couple of hot spots around intersections, which are likely due to accidents.

I’d like to break it down into just violent and non-violent.

First, the locations with the greatest frequency of violent crime are:

  • Wall St (downtown bar/club scene)
  • The intersection of E Colonial and N Westmoreland Dr
  • Park/empty lot between Parramore and Callahan
  • Florida Hospital at Loch Haven
  • The neighborhood of Haralson Estates
  • Around the Heart of Mercy Church on Mercy Dr
  • Around the Apostolic Overcoming Church on Raleigh St

These areas are centered around nightlife or are in low income neighborhoods. The outlier here is the Florida Hospital. I believe this is a similar situation to the police station where reports are filed at the hospital because the victim has already been rushed away from the scene for care.

Now, the locations with the greatest frequency of non-violent crime are:

  • Shopping centers like Millenia Mall, West Oaks Mall, Florida Outlet Mall, Parkwood Plaza, and the Walmart south of Valencia
  • The intersection of E Colonial and N Westmoreland Dr
  • Park/empty lot between Parramore and Callahan
  • The Universal Studios employee parking lot

With a few repeats, most of these areas are commercial shopping plazas. This makes sense because our non-violent crime category is dominated by types of theft including shoplifting. I do find it interesting that the Universal Studios employee parking lot was so red. It’s likely that the guest parking lots have more incidents, but they’re handled by park security.

Accidents

I’d like to just look at our accident dispatches. I want to see what the least safe datetime is and the locations with the most accidents. Let’s start with accidents by hour.

We see that accidents also follow the 5 AM arc we saw earlier and have the 5 PM response time dip. We can also see that the number of accidents decreases just after morning rush hour.

The number of accidents is only about 75% as high on the weekends. Most accidents happen on Friday. I bet this is because of people going out or traveling on Friday evening, and we’ll see that with a faceted graph.

As I thought, we see an overall increase in accidents on Friday as people leave work. We get a second large spike in accidents at 4 PM as some people try to leave work early. As for the safest time, that would be at 4 AM on Thursday. We actually see the most early morning accidents on the weekend.
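
Finding the busiest and quietest slots amounts to counting accidents per (weekday, hour) pair. A minimal Python sketch, assuming the datetime format shown in the output; accidents_by_slot is a hypothetical helper and the three timestamps are made up.

```python
from collections import Counter
from datetime import datetime

def accidents_by_slot(stamps):
    """Count accident dispatches per (weekday, hour); Monday is weekday 0."""
    slots = Counter()
    for stamp in stamps:
        parsed = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
        slots[(parsed.weekday(), parsed.hour)] += 1
    return slots

# Made-up sample: two Friday 5 PM accidents and one Thursday 4 AM accident.
stamps = ['2015-08-21 17:05:00', '2015-08-21 17:40:00', '2015-08-20 04:10:00']
slots = accidents_by_slot(stamps)
print(slots.most_common(1))
```

Slots with no accidents at all never appear in the counter, so a search for the safest time would need to fill in zeros for all 168 weekday-hour pairs first.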

Now let’s see where the most accidents are.

##          lat     lon Freq
## 12826 28.494 -81.459 1823
## 18784 28.495 -81.436 1016
## 12815 28.483 -81.459  978
## 34342 28.513 -81.376  967
## 51404 28.481 -81.310  913
## 51447 28.524 -81.310  907
## 51420 28.497 -81.310  878
## 12794 28.462 -81.459  858
## 12308 28.494 -81.461  827
## 12846 28.514 -81.459  811
## 20333 28.490 -81.430  806

Looking at the heatmap, we can see that these coordinates match up with its darkest areas, and they’re all on top of road intersections.

Final Plots and Summary

Over the course of this analysis, I believe these plots best represent the information and findings in the dataset.

This first graph shows the number of dispatches divided by category through the course of the day. It generally shows how the level of police (and civilian) activity changes based on people’s sleep habits, work schedule (and the resulting rush hours), and public school day. This graph also allowed me to figure out that “police in traffic” is likely the cause of the value changes we see at 5 PM and 6 PM.

This heatmap shows the geospatial distribution of crime in the city. This is the kind of plot that best tells where additional patrols would be most useful. There are hotspots of crime around the expected places like downtown, shopping centers, and some low-income neighborhoods. However, there are other places of concern.

This faceted graph can best inform people when the safest time to drive is. On Fridays, for example, it might be safer to wait an extra hour at work than try to leave an hour early.

Using the same subset as the graph above, here are the most dangerous intersections in Orlando ranked by the number of accidents from April 2009 to August 2015:

Reflection

Issues

As far as usable data goes, this dataset started as dispatch items with a datetime string, a reason string, and lat/lon coordinates. In other words, mostly categorical data. Attempting to use the data “as is” was not going to lead to any useful conclusions. I had to figure out ways to augment this database using the data available. Some of the new columns were created programmatically, like splitting the datetime, while some required a more “hands on” approach, like categorizing the dispatch reasons individually.

By far, I had the most trouble figuring out how to create the heatmaps. I started looking at ggplot2 with geom_map, but the smallest I could get was a blank polygon map of Florida’s counties. Then I looked at RgoogleMaps, but decided it wasn’t what I wanted. I tried (successfully) creating a heatmap by exporting the dataframe to Google Fusion Tables, but they don’t offer the color gradient overlay. It only put a solid dot at the geo-location of each, which didn’t adequately convey the data behind 1.45 million items.

Finally I found a library called ggmap which I could use with the ggplot additive layers. Specifically, using geom_tile with variable alpha levels, I was able to subset and round a dataframe to color the heatmap by location. The nice thing about ggmap is that it automatically restricts the bounds of the data displayed based on the bounds of the map. This meant I could change the zoom level without having to recreate the initial opdgeo dataframe.

Insights

I originally divided the datetime column into sections in Python but didn’t keep the original datetime string. I decided to rerun the script to add it back in after I realized that I wanted to use R to determine the day of the week, which required a POSIX-style datetime string, not the separate values.

The most interesting part was redrawing some of the graphs after creating the reason categories. Seeing a jump of ‘oncall’ dispatches between 5 PM and 6 PM is what made me realize “the cops get stuck in traffic too” could be a valid explanation for the difference in the original plot.

Improvements

One thing this analysis lacks is any sort of modeling. It could be possible to merge coordinate-based demographic data to model the number of dispatches for an area over a given period of time.

Conclusions

A look into the dataset shows that the Orlando police spend comparatively little time responding to actual crimes and making arrests. At least half of their time is spent either as a third party reporting an event or as a figure of authority de-escalating a tense, non-criminal situation. While supposedly limited to Orlando, they often assist county and local police in smaller towns outside of the city’s official, twisted limits.

References

http://stackoverflow.com/questions/5234117/how-to-drop-columns-by-name-in-a-data-frame

http://stackoverflow.com/questions/11985799/converting-date-to-a-day-of-week-in-r

http://www.bjs.gov/index.cfm?ty=tp&tid=31

https://learnr.wordpress.com/2010/03/23/ggplot2-changing-the-default-order-of-legend-labels-and-stacking-of-data/

http://rstudio-pubs-static.s3.amazonaws.com/7433_4537ea5073dc4162950abb715f513469.html

http://www.r-bloggers.com/visualising-thefts-using-heatmaps-in-ggplot2/

https://gist.github.com/jmarhee/8530768

http://stats.stackexchange.com/questions/5007/how-can-i-change-the-title-of-a-legend-in-ggplot2