knitr::opts_chunk$set(echo = TRUE,warning=FALSE,message=FALSE)

Introduction:

Loading the dataset and libraries

library(ggplot2)
library(dplyr)
library(ggthemes)
library(formattable)
library(plotly)
library(ggrepel)
library(tidyr)
library(cowplot)
library(knitr)
library(forcats)
shoot=read.csv("Mass Shootings Dataset.csv",header=TRUE,stringsAsFactors = FALSE)

Getting a glimpse of the structure of the data:

glimpse(shoot)
## Observations: 398
## Variables: 13
## $ S.                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ Title                <chr> "Las Vegas Strip mass shooting", "San Fra...
## $ Location             <chr> "Las Vegas, NV", "San Francisco, CA", "Tu...
## $ Date                 <chr> "10/1/2017", "6/14/2017", "6/7/2017", "6/...
## $ Summary              <chr> "", "Jimmy Lam, 38, fatally shot three co...
## $ Fatalities           <int> 58, 3, 3, 5, 3, 3, 5, 5, 3, 5, 49, 0, 1, ...
## $ Injured              <int> 515, 2, 0, 0, 0, 0, 6, 0, 3, 11, 53, 4, 4...
## $ Total.victims        <int> 573, 5, 3, 5, 3, 3, 11, 5, 6, 16, 102, 4,...
## $ Mental.Health.Issues <chr> "Unclear", "Yes", "Unclear", "Unclear", "...
## $ Race                 <chr> "", "Asian", "White", "", "White", "Black...
## $ Gender               <chr> "", "M", "M", "M", "M", "M", "M", "M", "M...
## $ Latitude             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Longitude            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
# Custom theme reused by several plots below
themegg=function(){
  theme(axis.title.x = element_text(color="red",hjust=1),
        axis.text.x = element_text(angle=90,vjust=0.5),
        axis.title.y = element_text(color="red",hjust=1),
        plot.title=element_text(size=15,color="blue",hjust=0.5),
        plot.subtitle=element_text(face="italic"),
        legend.background = element_blank(),
        legend.title=element_text(color="black",face="bold"),
        legend.text=element_text(color="black",face="bold"))
}

Answering some questions to uncover hidden patterns

Which event has claimed the most fatalities?

length(unique(shoot$Title))
## [1] 397
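  • There are 397 unique event titles among the 398 rows. Because the same event can also appear under different titles (see the victims table below), this is not an exact count of distinct shooting events.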
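Since the data frame has 398 rows but only 397 distinct titles, at least one title is repeated. A minimal base-R sketch for listing the repeated title(s), shown on a hypothetical toy data frame `toy` (on the real data, `shoot` would take its place):

```r
# Hypothetical toy data standing in for `shoot`; only Title matters here
toy <- data.frame(
  Title = c("Event A", "Event B", "Event C", "Event B"),
  stringsAsFactors = FALSE
)

# Row count versus number of distinct titles
n_rows   <- nrow(toy)
n_titles <- length(unique(toy$Title))

# duplicated() flags every occurrence after the first, so this lists
# the title(s) responsible for the gap between the two counts
repeated_titles <- unique(toy$Title[duplicated(toy$Title)])
print(repeated_titles)
```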
temp=shoot %>% select(Title,Location,Fatalities) %>% arrange(desc(Fatalities)) %>% head(20) 
q=ggplot(temp,aes(factor(Title,levels=Title),Fatalities,fill=Title))+
  geom_bar(stat="identity")+
  themegg()+
  labs(x="Event",y="Fatalities",title="Event and Fatalities",caption="Source:Wiki,USAToday,Web")+
  theme(legend.position="none")+
  scale_y_continuous(limits=c(0,60),breaks=seq(0,60,5))
ggplotly(q)
  • The Las Vegas Strip mass shooting has the most fatalities in this dataset, followed by the Orlando nightclub massacre.

Which events have the most victims?

temp=shoot %>% select(Title,Location,Total.victims) %>% arrange(desc(Total.victims)) %>% head(20)
ggplot(temp,aes(factor(Title,levels=Title),Total.victims,fill=Title))+
  geom_bar(stat="identity")+
  theme_fivethirtyeight()+
  labs(x="Event",y="Total Victims",title="Events with the Most Victims")+
  theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+
  geom_label_repel(aes(label=Total.victims,fill=factor(Location)),color="white",size=2.1)

  • Comparing this with the earlier chart, the first two incidents (Las Vegas and Orlando) top both the fatalities ranking and the total-victims ranking.

Which months and years have seen the most incidents?

The Date column is stored as character in M/D/YYYY format (for example "10/1/2017"). We split it into month, day and year to look for trends.

I then plotted the monthly and yearly counts separately and combined them into one figure with the cowplot package, which I was trying out here. For more information, see the cowplot package documentation.
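As an alternative to splitting the string, the column can first be parsed into a real Date. This is a hedged base-R sketch on a hypothetical toy vector `date_chr`: unparseable entries become NA, and the year comes out as a number rather than as character.

```r
# Hypothetical toy vector in the same M/D/YYYY layout as shoot$Date
date_chr <- c("10/1/2017", "6/14/2017", "6/7/2017", "12/2/2015")

# Parse to Date; months and days without leading zeros are accepted
dates <- as.Date(date_chr, format = "%m/%d/%Y")

# Month as a factor ordered Jan..Dec, year as an integer
month <- factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
year  <- as.integer(format(dates, "%Y"))

table(month)
```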

shoot=separate(shoot,"Date",c("month","day","Year"),sep="/")
shoot$month=month.abb[as.numeric(shoot$month)]
shoot$month=factor(shoot$month,levels=month.abb)
temp =shoot %>% select(month) %>% group_by(month) %>% summarise(count=n())
q=ggplot(temp,aes(month,count,fill=month))+geom_bar(stat="identity")+theme_gdocs()+
  theme(legend.position="none")+
  labs(x="Month",y="Aggregate number of incidents",title="Number of Incidents by Month",caption="Source:Wiki,USA Today,Web")
temp=shoot %>% select(Year) %>% group_by(Year) %>% summarise(count=n()) %>% arrange(desc(count))
p=ggplot(temp,aes(Year,count,group=1))+geom_line()+theme_classic()+
  theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+
  labs(x="Year",y="Aggregate number of incidents",title="Number of Incidents by Year",caption="Source:Wiki,USA Today,Web")
# Stack the two plots in one column (ncol replaces the deprecated cols argument)
plot_grid(p,q,labels=NULL,ncol=1)

The number of incidents peaks in February, March and April.

The number of incidents per year seems to peak after 2010.
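Because Year is a character column, the yearly line chart treats each year as a category and simply omits years with no recorded incidents, which compresses the x-axis. A hedged base-R sketch on hypothetical toy years that fills in zero counts so that every calendar year in the range appears:

```r
# Hypothetical toy years, one entry per incident
years <- c(1966L, 1984L, 1984L, 1991L, 2017L, 2017L)

# Make every year in the range a factor level, so years without incidents get 0
all_years <- seq(min(years), max(years))
counts <- table(factor(years, levels = all_years))

per_year <- data.frame(Year = all_years, count = as.vector(counts))
head(per_year)
```

With a numeric Year column like this, `ggplot(per_year, aes(Year, count)) + geom_line()` would draw the years on a continuous axis.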

Let us see which incidents led to the largest numbers of victims.

temp=shoot %>% select(Title,Location,Total.victims) %>% arrange(desc(Total.victims)) %>% head(20)
temp=inner_join(temp,shoot,by=c("Location","Title")) 
kable(head(temp))
| Title | Location | Total.victims.x | S. | month | day | Year | Summary | Fatalities | Injured | Total.victims.y | Mental.Health.Issues | Race | Gender | Latitude | Longitude |
|:---|:---|---:|---:|:---|:---|:---|:---|---:|---:|---:|:---|:---|:---|---:|---:|
| Las Vegas Strip mass shooting | Las Vegas, NV | 573 | 1 | Oct | 1 | 2017 | | 58 | 515 | 573 | Unclear | | | NA | NA |
| Orlando nightclub massacre | Orlando, Florida | 102 | 11 | Jun | 12 | 2016 | Omar Mateen, 29, attacked the Pulse nighclub in Orlando in the early morning hours of June 12. He was killed by law enforcement who raided the club after a prolonged standoff. | 49 | 53 | 102 | Unclear | Other | M | NA | NA |
| Aurora theater shooting | Aurora, Colorado | 82 | 203 | Jul | 20 | 2012 | James Holmes, 24, opened fire in a movie theater during the opening night of “The Dark Night Rises” and was later arrested outside. | 12 | 70 | 82 | Yes | white | Male | 39.70928 | -104.82349 |
| Movie Theater in Aurora | Denver, Colorado | 70 | 204 | Jul | 20 | 2012 | On July 20, 2012, a 24-year old student set off several gas or smoke canisters at a movie theater in Aurora, and then opened fire on the theater audience, killing twelve people and wounding fifty-eight. Moments after the shooting, police arrested the shooter next to his car behind the theater. Once apprehended, the shooter told the police that he had booby-trapped his apartment with explosive devices before heading to the theater. | 12 | 58 | 70 | Yes | White American or European American | Male | 39.72942 | -104.98252 |
| Virginia Tech massacre | Blacksburg, Virginia | 55 | 255 | Apr | 16 | 2007 | Virginia Tech student Seung-Hui Cho, 23, opened fire on his school’s campus before committing suicide. | 32 | 23 | 55 | Yes | Asian | Male | 37.22957 | -80.41394 |
| Virginia Tech Campus | Blacksburg, Virginia | 49 | 256 | Apr | 16 | 2007 | On April 16, 2007, a 23-year-old mentally ill student of Virginia Tech in Blacksburg, Virginia arrived on campus and began methodically shooting at students and faculty in classrooms and hallways. He killed five faculty members and twenty seven students before shooting himself in the head. | 33 | 17 | 49 | Yes | Asian American | Male | 37.22995 | -80.42769 |
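The inner_join() above is only needed because select() dropped the other columns, and it duplicates Total.victims as Total.victims.x and Total.victims.y. A hedged base-R sketch on a hypothetical toy data frame that sorts the full data frame directly instead, so every column is kept without a join (in dplyr, `shoot %>% arrange(desc(Total.victims)) %>% head(20)` expresses the same idea):

```r
# Hypothetical toy stand-in for `shoot`
toy <- data.frame(
  Title = c("Event A", "Event B", "Event C"),
  Location = c("City X", "City Y", "City Z"),
  Total.victims = c(5L, 573L, 102L),
  Fatalities = c(3L, 58L, 49L),
  stringsAsFactors = FALSE
)

# Sort all rows by victims (largest first) and keep the top ones;
# all original columns survive, so no .x/.y suffixes appear
top <- head(toy[order(toy$Total.victims, decreasing = TRUE), ], 2)
top
```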
# Remove the unnecessary columns
temp=temp %>% select(Title,Location,Total.victims.x,month,day,Year,Fatalities,Injured)
formattable(temp,align="l",list(Total.victims.x=color_tile("orange","red"),Fatalities=color_bar("darkgrey"),Injured=color_bar("violet")))
| Title | Location | Total.victims.x | month | day | Year | Fatalities | Injured |
|:---|:---|:---|:---|:---|:---|:---|:---|
| Las Vegas Strip mass shooting | Las Vegas, NV | 573 | Oct | 1 | 2017 | 58 | 515 |
| Orlando nightclub massacre | Orlando, Florida | 102 | Jun | 12 | 2016 | 49 | 53 |
| Aurora theater shooting | Aurora, Colorado | 82 | Jul | 20 | 2012 | 12 | 70 |
| Movie Theater in Aurora | Denver, Colorado | 70 | Jul | 20 | 2012 | 12 | 58 |
| Virginia Tech massacre | Blacksburg, Virginia | 55 | Apr | 16 | 2007 | 32 | 23 |
| Virginia Tech Campus | Blacksburg, Virginia | 49 | Apr | 16 | 2007 | 33 | 17 |
| University of Texas at Austin | Austin, Texas | 48 | Aug | 1 | 1966 | 17 | 32 |
| Fort Hood Army Base | Fort Hood, Texas | 45 | Nov | 5 | 2009 | 13 | 32 |
| Luby’s massacre | Killeen, Texas | 44 | Oct | 16 | 1991 | 24 | 20 |
| Fort Hood massacre | Fort Hood, Texas | 43 | Nov | 5 | 2009 | 13 | 30 |
| Luby’s Cafeteria in Killeen, Texas | Killeen, Texas | 43 | Oct | 16 | 1991 | 24 | 20 |
| San Ysidro McDonald’s massacre | San Ysidro, California | 41 | Jul | 18 | 1984 | 22 | 19 |
| McDonald’s restaurant in San Ysidro | San Ysidro, California | 40 | Jul | 18 | 1984 | 22 | 19 |
| Columbine High School massacre | Littleton, Colorado | 37 | Apr | 20 | 1999 | 13 | 24 |
| Columbine High School | Littleton, Colorado | 37 | Apr | 20 | 1999 | 15 | 24 |
| San Bernardino mass shooting | San Bernardino, California | 35 | Dec | 2 | 2015 | 14 | 21 |
| San Bernardino, California | San Bernardino, California | 35 | Dec | 2 | 2015 | 16 | 21 |
| Stockton schoolyard shooting | Stockton, California | 35 | Jan | 17 | 1989 | 6 | 29 |
| Cleveland Elementary School | Stockton, California | 35 | Jan | 17 | 1989 | 6 | 30 |
| Newtown school shooting | Newtown, Connecticut | 30 | Dec | 14 | 2012 | 28 | 2 |

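The Las Vegas incident of October 2017 has by far the most victims (573), more than five times as many as the next event, the Orlando nightclub massacre (102). The table also shows several events listed twice under different titles, for example "Aurora theater shooting" and "Movie Theater in Aurora", or the two Virginia Tech entries, often with slightly different victim counts. The dataset therefore contains duplicate records of the same event.

By Gender, Race and Mental Health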
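The table above contains what look like duplicate records of the same event under different titles, and these pairs share a date (for example the two Aurora rows and the two Virginia Tech rows). A hedged base-R sketch that flags rows sharing a date as duplicate candidates, on a toy subset mirroring rows of the table; it assumes the original Date strings, so on the real data it would run before the separate() step. Flagged rows still need a manual check, since two distinct shootings can happen on the same day.

```r
# Toy subset mirroring rows in the table above (dates in the original M/D/YYYY form)
events <- data.frame(
  Title = c("Aurora theater shooting", "Movie Theater in Aurora",
            "Virginia Tech massacre", "Virginia Tech Campus",
            "Las Vegas Strip mass shooting"),
  Date = c("7/20/2012", "7/20/2012", "4/16/2007", "4/16/2007", "10/1/2017"),
  stringsAsFactors = FALSE
)

# For each row, the number of rows that share its date
n_same_date <- ave(seq_along(events$Date), events$Date, FUN = length)

# Rows whose date appears more than once are duplicate candidates
candidates <- events[n_same_date > 1, ]
candidates[order(candidates$Date), ]
```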

By Gender

We first clean this column: "M" and "Male" describe the same category but appear as separate entries, and likewise "M/F" and "Male/Female". We merge them with fct_collapse() from the forcats package.

str(shoot$Gender)
##  chr [1:398] "" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "Unknown" ...

The data type is character, so we convert the column to a factor and collapse equivalent levels into single groups for clearer visualisation.
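For readers without forcats, base R can do the same collapsing by assigning a named list to levels(). A minimal sketch on a hypothetical toy vector `gender`; note one difference from fct_collapse(), which leaves unlisted levels unchanged: here any level missing from the list becomes NA, so every variant (including the blank "" label in the real data) would have to be mapped explicitly.

```r
# Hypothetical toy vector with the label variants described above
gender <- factor(c("M", "Male", "M", "Female", "M/F", "Male/Female", "Unknown"))

# Each list name is a new level; its elements are the old levels merged into it
levels(gender) <- list(
  "Male"        = c("M", "Male"),
  "Male/Female" = c("M/F", "Male/Female"),
  "Female"      = "Female",
  "Unknown"     = "Unknown"
)

counts <- table(gender)
counts
```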

shoot$Gender=factor(shoot$Gender)
shoot=shoot %>% mutate(Gender=fct_collapse(Gender,"Male"=c("M","Male"),"Male/Female"=c("M/F","Male/Female"),"Unknown"="Unknown","Female"="Female"))
temp=shoot %>% group_by(Gender)%>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
ggplot(temp[temp$Gender!="",],aes(Gender,count,fill=Gender))+geom_bar(stat="identity",na.rm=TRUE)+
  theme_fivethirtyeight()+theme(legend.position="none")+
  labs(x="Gender",y="Count",title="Perpetrator of the Incident")

The chart shows that the incidents were perpetrated predominantly by males.

By Race

str(shoot$Race)
##  chr [1:398] "" "Asian" "White" "" "White" "Black" "Latino" "" "Black" ...

The data type is character. We convert it into a factor and visualize it.

shoot$Race=factor(shoot$Race)
temp=shoot %>% group_by(Race) %>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
formattable(temp,align=c("l","r","r"),list(count=color_bar("red"),perc=color_tile("white","pink")))
| Race | count | perc |
|:---|---:|---:|
| White American or European American | 137 | 34 |
| Black American or African American | 78 | 20 |
| Unknown | 44 | 11 |
| white | 41 | 10 |
| Some other race | 23 | 6 |
| Asian American | 16 | 4 |
| black | 9 | 2 |
| White | 8 | 2 |
| Asian | 7 | 2 |
| Latino | 7 | 2 |
| Black | 6 | 2 |
| Other | 5 | 1 |
| | 3 | 1 |
| Native American | 3 | 1 |
| Native American or Alaska Native | 3 | 1 |
| Two or more races | 3 | 1 |
| Asian American/Some other race | 1 | 0 |
| Black American or African American/Unknown | 1 | 0 |
| unclear | 1 | 0 |
| White | 1 | 0 |
| White American or European American/Some other Race | 1 | 0 |

As with gender, the same race is recorded under several different labels. We combine the repeated categories so that a clearer picture emerges.
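The race table lists "White", "white" and a second "White" row, which most likely differs only by invisible whitespace; "Black"/"black" show the same pattern. A hedged sketch, assuming the variants differ only in letter case and surrounding whitespace: normalising the strings before collapsing merges these automatically, so fewer levels have to be listed by hand. The toy vector `race` is hypothetical.

```r
# Hypothetical toy labels with case and whitespace variants
race <- c("White", "white", "White ", "Black", "black", "Asian", "unclear", "Unknown")

# Strip leading/trailing whitespace and lower-case everything
race_clean <- tolower(trimws(race))

# Only genuinely different spellings still need an explicit mapping
race_clean[race_clean %in% c("unclear", "unknown")] <- "unknown"
table(race_clean)
```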

shoot=shoot %>% mutate(Race=fct_collapse(Race,
  "UnknownRace"=c("Unknown","Some other race","Other","unclear"),
  "Two or more Race"=c("Two or more races","Asian American/Some other race","Black American or African American/Unknown","White American or European American/Some other Race"),
  "WhiteRace"=c("White","white"),
  "BlackRace"=c("black","Black")))
temp=shoot %>% group_by(Race) %>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
# Drop the blank race label, then plot the six most frequent groups
ggplot(head(temp[temp$Race!="",]),aes(Race,count,fill=Race))+geom_bar(stat="identity")+theme_hc()+
  theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+
  labs(x="Race",y="Count",title="Perpetrator's Race",caption="Source:FBI,USA Today,Web")

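In 34% of the incidents, the perpetrator is recorded as "White American or European American". Because the "White" and "white" labels are counted separately, the overall share of white perpetrators is higher: adding up all the white labels in the table above gives 187 of 398 rows, or about 47%.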

Mental Health

Data type:

str(shoot$Mental.Health.Issues)
##  chr [1:398] "Unclear" "Yes" "Unclear" "Unclear" "Yes" "Unclear" "Yes" ...

The data type is character. As with the last two variables, we convert it to a factor and group equivalent labels.

shoot$Mental.Health.Issues=factor(shoot$Mental.Health.Issues)
temp=shoot %>% group_by(Mental.Health.Issues) %>% summarise(count=n())
head(temp)
## # A tibble: 6 x 2
##   Mental.Health.Issues count
##                 <fctr> <int>
## 1                   No   110
## 2              Unclear    21
## 3             Unclear      1
## 4              unknown     1
## 5              Unknown   120
## 6                  Yes   145
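The tibble above shows two separate "Unclear" rows, which suggests that one label carries a trailing space. An exact string comparison such as `=="Unclear"` misses that variant, and in a nested ifelse() it would fall through to the final "No" branch. A hedged base-R sketch of a named lookup vector applied after normalising the labels; the toy vector `mh` is hypothetical and assumes the trailing-space variant.

```r
# Hypothetical toy labels matching the variants in the tibble above,
# assuming the second "Unclear" carries a trailing space
mh <- c("Yes", "No", "Unclear", "Unclear ", "unknown", "Unknown")

# Named lookup vector: normalised label -> cleaned category
lookup <- c(yes = "Yes", no = "No",
            unclear = "HealthUnknown", unknown = "HealthUnknown")

# Normalise, then index the lookup by name; any unmatched label would give NA
mh_clean <- factor(unname(lookup[tolower(trimws(mh))]))
table(mh_clean)
```

Returning NA for unmatched labels, instead of silently assigning them to "No", makes unexpected spellings easy to spot.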
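# Normalise case and surrounding whitespace first, so that "unknown"/"Unknown" and
# "Unclear" plus its apparent trailing-space variant land in the same group
shoot = shoot %>% mutate(Mental.Health.Issues=tolower(trimws(as.character(Mental.Health.Issues))),
                         Mental.Health.Issues=factor(case_when(Mental.Health.Issues %in% c("unknown","unclear") ~ "HealthUnknown",
                                                               Mental.Health.Issues=="yes" ~ "Yes",
                                                               Mental.Health.Issues=="no" ~ "No")))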
temp=shoot %>% group_by(Mental.Health.Issues) %>% summarise(count=n())
ggplot(temp,aes(Mental.Health.Issues,count,fill=Mental.Health.Issues))+geom_bar(stat="identity")+themegg()+labs(x="Mental Health",title="Mental Stability")+theme(legend.position = "none")

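In the raw counts above, 145 records say "Yes", 110 say "No" and 143 are unknown or unclear. With more than a third of the cases undetermined, we cannot conclude whether the incidents were predominantly committed by people with mental health issues.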

Conclusion:

This kernel tries to answer some basic questions about the data, such as when incidents occur, how many victims each incident claimed and the background of the perpetrators.

I used popular R packages such as dplyr, ggplot2 and formattable, and tried out R Markdown features like tabsets as well as packages like forcats.