Loading the dataset and libraries
library(ggplot2)
library(dplyr)
library(ggthemes)
library(formattable)
library(plotly)
library(ggrepel)
library(tidyr)
library(cowplot)
library(knitr)
library(forcats)
shoot=read.csv("Mass Shootings Dataset.csv",header=TRUE,stringsAsFactors = FALSE)
Getting a glimpse of the structure of the data:
glimpse(shoot)
## Observations: 398
## Variables: 13
## $ S. <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13...
## $ Title <chr> "Las Vegas Strip mass shooting", "San Fra...
## $ Location <chr> "Las Vegas, NV", "San Francisco, CA", "Tu...
## $ Date <chr> "10/1/2017", "6/14/2017", "6/7/2017", "6/...
## $ Summary <chr> "", "Jimmy Lam, 38, fatally shot three co...
## $ Fatalities <int> 58, 3, 3, 5, 3, 3, 5, 5, 3, 5, 49, 0, 1, ...
## $ Injured <int> 515, 2, 0, 0, 0, 0, 6, 0, 3, 11, 53, 4, 4...
## $ Total.victims <int> 573, 5, 3, 5, 3, 3, 11, 5, 6, 16, 102, 4,...
## $ Mental.Health.Issues <chr> "Unclear", "Yes", "Unclear", "Unclear", "...
## $ Race <chr> "", "Asian", "White", "", "White", "Black...
## $ Gender <chr> "", "M", "M", "M", "M", "M", "M", "M", "M...
## $ Latitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ Longitude <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
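The glimpse() output shows that missing information is stored in two ways. Summary, Race and Gender use empty strings (""), while Latitude and Longitude use NA. The sketch below counts both kinds per column. It runs on a small hypothetical excerpt built from the values printed above; on the real data you would pass `shoot` instead of `toy`.

```r
# Hypothetical five-row excerpt built from the glimpse() output above
toy = data.frame(
  Race       = c("", "Asian", "White", "", "White"),
  Gender     = c("", "M", "M", "M", "M"),
  Fatalities = c(58L, 3L, 3L, 5L, 3L),
  Latitude   = rep(NA_real_, 5),
  stringsAsFactors = FALSE
)

# For every column, count the entries that are either NA or an empty string
missing_per_col = sapply(toy, function(col) sum(is.na(col) | col == ""))
print(missing_per_col)
```

Treating "" and NA together matters later, when the Gender and Race charts have to filter out the empty category explicitly.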
themegg=function(){theme(axis.title.x = element_text(color="red",hjust=1),axis.text.x = element_text(angle=90,vjust=0.5),axis.title.y = element_text(color="red",hjust=1),plot.title=element_text(size=15,color="blue",hjust=0.5),plot.subtitle=element_text(face="italic"),legend.background = element_blank(),legend.title=element_text(color="black",face="bold"),legend.text=element_text(color="black",face="bold"))}
Answering some questions to uncover hidden patterns
Which event has claimed the most fatalities?
length(unique(shoot$Title))
## [1] 397
- There are 397 unique event titles across the 398 rows, so at least one title appears more than once.
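Since the row count and the unique-title count differ by one, base R's duplicated() can show which title repeats. This is a minimal sketch on hypothetical titles; in the kernel the vector would be `shoot$Title`.

```r
# Hypothetical titles; in the kernel this would be shoot$Title
titles = c("Event A", "Event B", "Event A", "Event C")

n_rows   = length(titles)
n_unique = length(unique(titles))
# duplicated() is TRUE for the second and later occurrences of a value
repeated = unique(titles[duplicated(titles)])

cat(n_rows, "rows,", n_unique, "unique titles\n")
print(repeated)
```

This only catches exact repeats. The same incident recorded under two different titles would not be flagged.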
temp=shoot %>% select(Title,Location,Fatalities) %>% arrange(desc(Fatalities)) %>% head(20)
q=ggplot(temp,aes(factor(Title,levels=Title),Fatalities,fill=Title))+geom_bar(stat="identity")+themegg()+labs(x="Event",y="Fatalities",title="Event and Fatalities",caption="Source:Wiki,USAToday,Web")+theme(legend.position="none")+scale_y_continuous(limits=c(0,60),breaks=seq(0,60,5))
ggplotly(q)
- The Las Vegas Strip mass shooting has the most fatalities in this dataset (58), followed by the Orlando nightclub massacre (49).
Which event has the most victims?
temp=shoot %>% select(Title,Location,Total.victims) %>% arrange(desc(Total.victims)) %>% head(20)
ggplot(temp,aes(factor(Title,levels=Title),Total.victims,fill=Title))+geom_bar(stat="identity")+theme_fivethirtyeight()+labs(x="Event",y="Total Victims",title="Maximum victims for an Event")+theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+geom_label_repel(aes(label=Total.victims,fill=factor(Location)),color="white",size=2.1)

- Comparing this with the earlier chart, the same two incidents (Las Vegas and Orlando) top both the fatalities and the total-victims rankings.
Which months and years have seen the most incidents?
The dataset stores the date as a character string in MM/DD/YYYY format. We split it into month, day and year to look for trends.
I then plotted the incident counts by month and by year separately and combined the two plots with the cowplot package. See the cowplot documentation for more information on the package.
shoot=separate(shoot,"Date",c("month","day","Year"),sep="/")
shoot$month=month.abb[as.numeric(shoot$month)]
shoot$month=factor(shoot$month,levels=month.abb)
temp =shoot %>% select(month) %>% group_by(month) %>% summarise(count=n())
q=ggplot(temp,aes(month,count,fill=month))+geom_bar(stat="identity")+theme_gdocs()+theme(legend.position="none")+labs(x="Month",y="Aggregate number of incidents",title="Number of Incidents by Month",caption="Source:Wiki,USA Today,Web")
temp=shoot %>% select(Year) %>% group_by(Year) %>% summarise(count=n()) %>% arrange(desc(count))
p=ggplot(temp,aes(Year,count,group=1))+geom_line()+theme_classic()+theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+labs(x="Year",y="Aggregate number of incidents",title="Number of Incidents by Year",caption="Source:Wiki,USA Today,Web")
plot_grid(p,q,labels=NULL,ncol=1)

The number of incidents peaks in February, March and April.
The number of incidents per year appears to peak after 2010.
Let us see which incidents account for the most victims.
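The separate() call above splits the date string on "/" and leaves Year as character. Below is a sketch of an alternative that parses the column with base R's as.Date(), shown on the first three dates printed by glimpse(). This approach has three advantages:

- Malformed dates become NA and are easy to spot.
- Indexing month.abb avoids the locale-dependent month names that format(dates, "%b") would return.
- A numeric year spaces years correctly on a continuous axis, instead of listing only the years present on a discrete one.

```r
# Sample values taken from the Date column shown by glimpse()
dates_chr = c("10/1/2017", "6/14/2017", "6/7/2017")

# Parse MM/DD/YYYY; strings that do not match the format become NA
dates = as.Date(dates_chr, format = "%m/%d/%Y")

# English month abbreviations via month.abb, independent of the system locale
month_lab = factor(month.abb[as.integer(format(dates, "%m"))], levels = month.abb)
# Numeric year, suitable for a continuous x axis
year_num  = as.integer(format(dates, "%Y"))

print(table(month_lab))
print(table(year_num))
```

On the full data you would apply the same three steps to shoot$Date. Checking sum(is.na(dates)) afterwards is a quick sanity check on the input format.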
temp=shoot %>% select(Title,Location,Total.victims) %>% arrange(desc(Total.victims)) %>% head(20)
temp=inner_join(temp,shoot,by=c("Location","Title"))
kable(head(temp))
| Title | Location | Total.victims.x | S. | month | day | Year | Summary | Fatalities | Injured | Total.victims.y | Mental.Health.Issues | Race | Gender | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Las Vegas Strip mass shooting | Las Vegas, NV | 573 | 1 | Oct | 1 | 2017 |  | 58 | 515 | 573 | Unclear |  |  | NA | NA |
| Orlando nightclub massacre | Orlando, Florida | 102 | 11 | Jun | 12 | 2016 | Omar Mateen, 29, attacked the Pulse nighclub in Orlando in the early morning hours of June 12. He was killed by law enforcement who raided the club after a prolonged standoff. | 49 | 53 | 102 | Unclear | Other | M | NA | NA |
| Aurora theater shooting | Aurora, Colorado | 82 | 203 | Jul | 20 | 2012 | James Holmes, 24, opened fire in a movie theater during the opening night of “The Dark Night Rises” and was later arrested outside. | 12 | 70 | 82 | Yes | white | Male | 39.70928 | -104.82349 |
| Movie Theater in Aurora | Denver, Colorado | 70 | 204 | Jul | 20 | 2012 | On July 20, 2012, a 24-year old student set off several gas or smoke canisters at a movie theater in Aurora, and then opened fire on the theater audience, killing twelve people and wounding fifty-eight. Moments after the shooting, police arrested the shooter next to his car behind the theater. Once apprehended, the shooter told the police that he had booby-trapped his apartment with explosive devices before heading to the theater. | 12 | 58 | 70 | Yes | White American or European American | Male | 39.72942 | -104.98252 |
| Virginia Tech massacre | Blacksburg, Virginia | 55 | 255 | Apr | 16 | 2007 | Virginia Tech student Seung-Hui Cho, 23, opened fire on his school’s campus before committing suicide. | 32 | 23 | 55 | Yes | Asian | Male | 37.22957 | -80.41394 |
| Virginia Tech Campus | Blacksburg, Virginia | 49 | 256 | Apr | 16 | 2007 | On April 16, 2007, a 23-year-old mentally ill student of Virginia Tech in Blacksburg, Virginia arrived on campus and began methodically shooting at students and faculty in classrooms and hallways. He killed five faculty members and twenty seven students before shooting himself in the head. | 33 | 17 | 49 | Yes | Asian American | Male | 37.22995 | -80.42769 |
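The join in the previous step matches the top-20 rows back to the full table on Title and Location. Both tables contain Total.victims, so dplyr keeps both copies and suffixes them .x and .y. That is where the Total.victims.x column in the next step comes from. The miniature below, with hypothetical titles and locations, shows the suffixing. It also shows that sorting the full table directly returns the same rows without the duplicated column.

```r
library(dplyr)

# Hypothetical miniature of `shoot`
df = data.frame(Title = c("A", "B", "C"), Location = c("X", "Y", "Z"),
                Total.victims = c(5, 10, 3), Fatalities = c(1, 2, 0),
                stringsAsFactors = FALSE)

# The kernel's approach: take the top rows, then join back for the other columns
top    = df %>% select(Title, Location, Total.victims) %>% arrange(desc(Total.victims)) %>% head(2)
joined = inner_join(top, df, by = c("Location", "Title"))
print(names(joined))  # Total.victims appears twice, as Total.victims.x and Total.victims.y

# Sorting the full table directly gives the same rows without the duplicated column
direct = df %>% arrange(desc(Total.victims)) %>% head(2)
print(direct)
```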
# Remove the unnecessary columns
temp=temp %>% select(Title,Location,Total.victims.x,month,day,Year,Fatalities,Injured)
formattable(temp,align="l",list(Total.victims.x=color_tile("orange","red"),Fatalities=color_bar("darkgrey"),Injured=color_bar("violet")))
| Title | Location | Total.victims.x | month | day | Year | Fatalities | Injured |
|---|---|---|---|---|---|---|---|
| Las Vegas Strip mass shooting | Las Vegas, NV | 573 | Oct | 1 | 2017 | 58 | 515 |
| Orlando nightclub massacre | Orlando, Florida | 102 | Jun | 12 | 2016 | 49 | 53 |
| Aurora theater shooting | Aurora, Colorado | 82 | Jul | 20 | 2012 | 12 | 70 |
| Movie Theater in Aurora | Denver, Colorado | 70 | Jul | 20 | 2012 | 12 | 58 |
| Virginia Tech massacre | Blacksburg, Virginia | 55 | Apr | 16 | 2007 | 32 | 23 |
| Virginia Tech Campus | Blacksburg, Virginia | 49 | Apr | 16 | 2007 | 33 | 17 |
| University of Texas at Austin | Austin, Texas | 48 | Aug | 1 | 1966 | 17 | 32 |
| Fort Hood Army Base | Fort Hood, Texas | 45 | Nov | 5 | 2009 | 13 | 32 |
| Luby’s massacre | Killeen, Texas | 44 | Oct | 16 | 1991 | 24 | 20 |
| Fort Hood massacre | Fort Hood, Texas | 43 | Nov | 5 | 2009 | 13 | 30 |
| Luby’s Cafeteria in Killeen, Texas | Killeen, Texas | 43 | Oct | 16 | 1991 | 24 | 20 |
| San Ysidro McDonald’s massacre | San Ysidro, California | 41 | Jul | 18 | 1984 | 22 | 19 |
| McDonald’s restaurant in San Ysidro | San Ysidro, California | 40 | Jul | 18 | 1984 | 22 | 19 |
| Columbine High School massacre | Littleton, Colorado | 37 | Apr | 20 | 1999 | 13 | 24 |
| Columbine High School | Littleton, Colorado | 37 | Apr | 20 | 1999 | 15 | 24 |
| San Bernardino mass shooting | San Bernardino, California | 35 | Dec | 2 | 2015 | 14 | 21 |
| San Bernardino, California | San Bernardino, California | 35 | Dec | 2 | 2015 | 16 | 21 |
| Stockton schoolyard shooting | Stockton, California | 35 | Jan | 17 | 1989 | 6 | 29 |
| Cleveland Elementary School | Stockton, California | 35 | Jan | 17 | 1989 | 6 | 30 |
| Newtown school shooting | Newtown, Connecticut | 30 | Dec | 14 | 2012 | 28 | 2 |
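The Las Vegas incident of October 2017 has by far the most victims (573). The table also shows several incidents recorded twice under different titles, for example "Aurora theater shooting" and "Movie Theater in Aurora" (both on 20 July 2012). Counts taken from this dataset may therefore include duplicates.
By Gender, Race and Mental Health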
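A simple heuristic for finding such duplicates is to flag rows that share the same date. The sketch below applies it to five rows copied from the table above. The assumption is a rough one: two distinct shootings on the same day would also be flagged, so every flagged group needs a manual check. On the full data the same pipeline works on the month, day and Year columns produced by separate().

```r
library(dplyr)

# Five rows copied from the top-victims table above
top = data.frame(
  Title      = c("Aurora theater shooting", "Movie Theater in Aurora",
                 "Virginia Tech massacre", "Virginia Tech Campus", "Newtown school shooting"),
  month      = c("Jul", "Jul", "Apr", "Apr", "Dec"),
  day        = c(20, 20, 16, 16, 14),
  Year       = c(2012, 2012, 2007, 2007, 2012),
  Fatalities = c(12, 12, 32, 33, 28),
  stringsAsFactors = FALSE
)

# Keep only the rows whose date is shared with at least one other row
same_day = top %>%
  group_by(Year, month, day) %>%
  filter(n() > 1) %>%
  ungroup()

print(same_day)
```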
By Gender
We first clean this column: "M/F" and "Male/Female" mean the same thing but are recorded as separate entries, and likewise "M" and "Male". We merge them with fct_collapse() from the forcats package.
str(shoot$Gender)
## chr [1:398] "" "M" "M" "M" "M" "M" "M" "M" "M" "M" "M" "Unknown" ...
The column is of type character, so we convert it to a factor and collapse equivalent levels into single groups for a cleaner visualisation.
shoot$Gender=factor(shoot$Gender)
shoot=shoot %>% mutate(Gender=fct_collapse(Gender,"Male"=c("M","Male"),"Male/Female"=c("M/F","Male/Female"),"Unknown"="Unknown","Female"="Female"))
temp=shoot %>% group_by(Gender)%>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
ggplot(temp[temp$Gender!="",],aes(Gender,count,fill=Gender))+geom_bar(stat="identity",na.rm=TRUE)+theme_fivethirtyeight()+theme(legend.position="none")+labs(x="Gender",y="Count",title="Perpetrator of the Incident")

As the chart shows, the perpetrators are predominantly male.
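The collapse step can be checked on a small hypothetical factor that has the same inconsistencies as the Gender column. fct_collapse() leaves any level it is not told about, such as the empty string "", unchanged. That is why the chart above still has to filter out Gender == "".

```r
library(forcats)

# Hypothetical raw Gender values with the same inconsistencies as the real column
g = factor(c("M", "Male", "M/F", "Male/Female", "Female", "Unknown", "M", ""))

g2 = fct_collapse(g,
                  Male = c("M", "Male"),
                  "Male/Female" = c("M/F", "Male/Female"),
                  Female = "Female",
                  Unknown = "Unknown")

# The empty-string level is not mentioned above, so it survives as its own level
print(table(g2))
```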
By Race
str(shoot$Race)
## chr [1:398] "" "Asian" "White" "" "White" "Black" "Latino" "" "Black" ...
This column is also of type character, so we convert it to a factor and tabulate it.
shoot$Race=factor(shoot$Race)
temp=shoot %>% group_by(Race) %>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
formattable(temp,align=c("l","r","r"),list(count=color_bar("red"),perc=color_tile("white","pink")))
| Race | count | perc |
|---|---|---|
| White American or European American | 137 | 34 |
| Black American or African American | 78 | 20 |
| Unknown | 44 | 11 |
| white | 41 | 10 |
| Some other race | 23 | 6 |
| Asian American | 16 | 4 |
| black | 9 | 2 |
| White | 8 | 2 |
| Asian | 7 | 2 |
| Latino | 7 | 2 |
| Black | 6 | 2 |
| Other | 5 | 1 |
|  | 3 | 1 |
| Native American | 3 | 1 |
| Native American or Alaska Native | 3 | 1 |
| Two or more races | 3 | 1 |
| Asian American/Some other race | 1 | 0 |
| Black American or African American/Unknown | 1 | 0 |
| unclear | 1 | 0 |
| White | 1 | 0 |
| White American or European American/Some other Race | 1 | 0 |
As with gender, the same race is recorded under several spellings. We combine the repeated categories so that a clearer picture emerges.
shoot=shoot %>% mutate(Race=fct_collapse(Race,"UnknownRace"=c("Unknown","Some other race","Other","unclear"),"Two or more Race"=c("Two or more races","Asian American/Some other race","Black American or African American/Unknown","White American or European American/Some other Race"),"WhiteRace"=c("White","white"),"BlackRace"=c("black","Black")))
temp=shoot %>% group_by(Race) %>% summarise(count=n()) %>% mutate(perc=round((count/sum(count))*100)) %>% arrange(desc(count))
ggplot(head(temp[temp$Race!="",]),aes(Race,count,fill=Race))+geom_bar(stat="identity")+theme_hc()+theme(legend.position="none",axis.text.x=element_text(angle=90,hjust=0.5))+labs(x="Race",y="Count",title="Perpetrator's Race",caption="Source:FBI,USA Today,Web")

34% of the records list the perpetrator as "White American or European American". A further 50 records are coded as "white" or "White" (about 13%), so the overall share of white perpetrators is likely higher.
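The collapse above only merges exact spellings, which leaves three gaps:

- "White American or European American" and "WhiteRace" still end up as separate bars.
- The second "White" row in the table (count 1) is not merged into WhiteRace. It presumably differs by invisible whitespace.
- The same split applies to the Black categories.

Below is a sketch of an alternative. It assumes that lower-casing, trimming whitespace and merging long and short labels is acceptable for this analysis. That is an analysis choice, not something the dataset documents, and the input values are hypothetical.

```r
library(forcats)

# Hypothetical raw values mimicking the inconsistencies in the Race table above;
# "White " (trailing space) is an assumed explanation for the duplicated "White" row
raw = c("White", "white", "White ", "White American or European American",
        "Black", "black", "Black American or African American",
        "unclear", "Unknown")

# Step 1: normalise case and surrounding whitespace
norm = factor(tolower(trimws(raw)))

# Step 2: merge short and long labels (an assumption about how to group them)
race = fct_collapse(norm,
                    White   = c("white", "white american or european american"),
                    Black   = c("black", "black american or african american"),
                    Unknown = c("unknown", "unclear"))

print(table(race))
```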
Mental Health
Data type:
str(shoot$Mental.Health.Issues)
## chr [1:398] "Unclear" "Yes" "Unclear" "Unclear" "Yes" "Unclear" "Yes" ...
The column is of type character. As with the previous two variables, we convert it to a factor and inspect its levels.
shoot$Mental.Health.Issues=factor(shoot$Mental.Health.Issues)
temp=shoot %>% group_by(Mental.Health.Issues) %>% summarise(count=n())
head(temp)
## # A tibble: 6 x 2
## Mental.Health.Issues count
## <fctr> <int>
## 1 No 110
## 2 Unclear 21
## 3 Unclear 1
## 4 unknown 1
## 5 Unknown 120
## 6 Yes 145
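# Normalise case and whitespace first: the two "Unclear" levels printed above presumably differ only by whitespace, and an exact match would code one of them as "No"
shoot = shoot %>% mutate(mh=tolower(trimws(as.character(Mental.Health.Issues))),Mental.Health.Issues=factor(ifelse(mh %in% c("unknown","unclear"),"HealthUnknown",ifelse(mh=="yes","Yes","No")))) %>% select(-mh)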
temp=shoot %>% group_by(Mental.Health.Issues) %>% summarise(count=n())
ggplot(temp,aes(Mental.Health.Issues,count,fill=Mental.Health.Issues))+geom_bar(stat="identity")+themegg()+labs(x="Mental Health",title="Mental Stability")+theme(legend.position = "none")

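The split is too even to support a firm conclusion. 145 records report mental health issues and 110 do not, but about as many (143) are unknown or unclear. We therefore cannot say definitively whether the incidents were committed by people with mental health issues.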
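Expressing the counts from the head(temp) output as percentages of all 398 records makes the split easier to read. This is a small worked sketch: the unknown group adds the two "Unclear" rows and the "unknown" and "Unknown" rows.

```r
# Level counts copied from the head(temp) output above
mh = c(Yes = 145, No = 110, HealthUnknown = 21 + 1 + 1 + 120)

# Percentage of all records, rounded to one decimal place
share = round(100 * mh / sum(mh), 1)
print(share)
```

Yes comes to about 36.4%, No to 27.6% and unknown or unclear to 35.9%. An unknown share this large means the Yes/No comparison could shift substantially depending on how the missing cases actually break down.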