Introduction

Loading the dataset:

## # A tibble: 5 x 228
##                                        GenderSelect       Country   Age
##                                               <chr>         <chr> <int>
## 1 Non-binary, genderqueer, or gender non-conforming          <NA>    NA
## 2                                            Female United States    30
## 3                                              Male        Canada    28
## 4                                              Male United States    56
## 5                                              Male        Taiwan    38
## # ... with 225 more variables: EmploymentStatus <chr>,
## #   StudentStatus <chr>, LearningDataScience <chr>, CodeWriter <chr>,
## #   CareerSwitcher <chr>, CurrentJobTitleSelect <chr>, TitleFit <chr>,
## #   CurrentEmployerType <chr>, MLToolNextYearSelect <chr>,
## #   MLMethodNextYearSelect <chr>, LanguageRecommendationSelect <chr>,
## #   PublicDatasetsSelect <chr>, LearningPlatformSelect <chr>,
## #   LearningPlatformUsefulnessArxiv <chr>,
## #   LearningPlatformUsefulnessBlogs <chr>,
## #   LearningPlatformUsefulnessCollege <chr>,
## #   LearningPlatformUsefulnessCompany <chr>,
## #   LearningPlatformUsefulnessConferences <chr>,
## #   LearningPlatformUsefulnessFriends <chr>,
## #   LearningPlatformUsefulnessKaggle <chr>,
## #   LearningPlatformUsefulnessNewsletters <chr>,
## #   LearningPlatformUsefulnessCommunities <chr>,
## #   LearningPlatformUsefulnessDocumentation <chr>,
## #   LearningPlatformUsefulnessCourses <chr>,
## #   LearningPlatformUsefulnessProjects <chr>,
## #   LearningPlatformUsefulnessPodcasts <chr>,
## #   LearningPlatformUsefulnessSO <chr>,
## #   LearningPlatformUsefulnessTextbook <chr>,
## #   LearningPlatformUsefulnessTradeBook <chr>,
## #   LearningPlatformUsefulnessTutoring <chr>,
## #   LearningPlatformUsefulnessYouTube <chr>,
## #   BlogsPodcastsNewslettersSelect <chr>, LearningDataScienceTime <chr>,
## #   JobSkillImportanceBigData <chr>, JobSkillImportanceDegree <chr>,
## #   JobSkillImportanceStats <chr>,
## #   JobSkillImportanceEnterpriseTools <chr>,
## #   JobSkillImportancePython <chr>, JobSkillImportanceR <chr>,
## #   JobSkillImportanceSQL <chr>, JobSkillImportanceKaggleRanking <chr>,
## #   JobSkillImportanceMOOC <chr>, JobSkillImportanceVisualizations <chr>,
## #   JobSkillImportanceOtherSelect1 <chr>,
## #   JobSkillImportanceOtherSelect2 <chr>,
## #   JobSkillImportanceOtherSelect3 <chr>, CoursePlatformSelect <chr>,
## #   HardwarePersonalProjectsSelect <chr>, TimeSpentStudying <chr>,
## #   ProveKnowledgeSelect <chr>, DataScienceIdentitySelect <chr>,
## #   FormalEducation <chr>, MajorSelect <chr>, Tenure <chr>,
## #   PastJobTitlesSelect <chr>, FirstTrainingSelect <chr>,
## #   LearningCategorySelftTaught <int>,
## #   LearningCategoryOnlineCourses <int>, LearningCategoryWork <int>,
## #   LearningCategoryUniversity <dbl>, LearningCategoryKaggle <dbl>,
## #   LearningCategoryOther <int>, MLSkillsSelect <chr>,
## #   MLTechniquesSelect <chr>, ParentsEducation <chr>,
## #   EmployerIndustry <chr>, EmployerSize <chr>, EmployerSizeChange <chr>,
## #   EmployerMLTime <chr>, EmployerSearchMethod <chr>,
## #   UniversityImportance <chr>, JobFunctionSelect <chr>,
## #   WorkHardwareSelect <chr>, WorkDataTypeSelect <chr>,
## #   WorkProductionFrequency <chr>, WorkDatasetSize <chr>,
## #   WorkAlgorithmsSelect <chr>, WorkToolsSelect <chr>,
## #   WorkToolsFrequencyAmazonML <chr>, WorkToolsFrequencyAWS <chr>,
## #   WorkToolsFrequencyAngoss <chr>, WorkToolsFrequencyC <chr>,
## #   WorkToolsFrequencyCloudera <chr>, WorkToolsFrequencyDataRobot <chr>,
## #   WorkToolsFrequencyFlume <chr>, WorkToolsFrequencyGCP <chr>,
## #   WorkToolsFrequencyHadoop <chr>, WorkToolsFrequencyIBMCognos <chr>,
## #   WorkToolsFrequencyIBMSPSSModeler <chr>,
## #   WorkToolsFrequencyIBMSPSSStatistics <chr>,
## #   WorkToolsFrequencyIBMWatson <chr>, WorkToolsFrequencyImpala <chr>,
## #   WorkToolsFrequencyJava <chr>, WorkToolsFrequencyJulia <chr>,
## #   WorkToolsFrequencyJupyter <chr>,
## #   WorkToolsFrequencyKNIMECommercial <chr>,
## #   WorkToolsFrequencyKNIMEFree <chr>,
## #   WorkToolsFrequencyMathematica <chr>, WorkToolsFrequencyMATLAB <chr>,
## #   WorkToolsFrequencyAzure <chr>, ...

Survey Diversity

Let us look at the respondents' background in detail: gender, employment, country, age, and so on.

Gender

## [1] "character"

82% of the respondents are male.

Age Distribution

## There are 331 NA values in Age

  • The distribution is skewed to the right: the median appears to be around 25, and the long right tail pulls the mean above the median. Let us plot the summary statistics of Age with respect to gender.
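To make the point about skew concrete, here is a minimal sketch with made-up ages (the vector below is hypothetical and not taken from the survey). It only illustrates that a few large values raise the mean while leaving the median almost unchanged.

```r
# Hypothetical ages, NOT survey data: most values sit in the twenties,
# with a few much older respondents forming a long right tail.
ages <- c(21, 22, 23, 24, 25, 25, 26, 27, 30, 45, 60)

age_median <- median(ages)  # middle value, barely affected by the tail
age_mean   <- mean(ages)    # pulled upwards by the large values

# For a right-skewed sample like this one, the mean exceeds the median.
cat("median:", age_median, " mean:", round(age_mean, 2), "\n")
```

On the real data the same two calls would need `na.rm = TRUE`, since Age contains NA values.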

Age Distribution by Gender

There seems to be a moderate difference in median age between the genders, as is evident from the boxplot. Another point to note is that there are outliers, with reported ages of 100 and 0.
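One possible way to deal with such outliers before comparing medians is sketched below. The data frame is a small hypothetical stand-in for the survey (only the column names GenderSelect and Age come from the output shown earlier), and the age bounds of 16 and 80 are an assumption made purely for illustration.

```r
# Hypothetical stand-in for the survey data; only the column names
# GenderSelect and Age are taken from the tibble printed earlier.
respondents <- data.frame(
  GenderSelect = c("Male", "Female", "Male", "Female", "Male", "Male"),
  Age          = c(28, 30, 0, 41, 100, NA),
  stringsAsFactors = FALSE
)

# Assumed plausible age range (an illustrative choice, not the author's).
min_age <- 16
max_age <- 80

# Keep rows whose Age is present and inside the assumed range.
plausible <- !is.na(respondents$Age) &
  respondents$Age >= min_age & respondents$Age <= max_age
cleaned <- respondents[plausible, ]

# Median age per gender after dropping missing and implausible values.
median_by_gender <- tapply(cleaned$Age, cleaned$GenderSelect, median)
print(median_by_gender)
```

Whether to drop, cap, or keep such values is a judgment call; the median is fairly robust to a handful of extreme ages in any case.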

Location of Respondents

## [1] 121

Most of the survey respondents are from India and the United States.

Formal Education

Let us look at the diversity of the survey in terms of education status.

FormalEducation                                                    Count   Perc
Master’s degree                                                     6273  41.78
Bachelor’s degree                                                   4811  32.04
Doctoral degree                                                     2347  15.63
Some college/university study without earning a bachelor’s degree    786   5.23
Professional degree                                                  451   3.00
I did not complete any formal education past high school             257   1.71
I prefer not to answer                                                90   0.60

41.8% of our respondents have a master’s degree, while 32% hold a bachelor’s degree.
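A count-and-percentage table of this kind can be built roughly as follows. This is a sketch on made-up answers, not the code used for this report. The percentages in the table above are consistent with dividing by the number of respondents who answered (the counts sum to 15,015), so the sketch drops NA answers as well.

```r
# Hypothetical answers; the labels follow the table above,
# but the counts are invented for illustration.
education <- c(rep("Master's degree", 5), rep("Bachelor's degree", 3),
               rep("Doctoral degree", 2), NA)

# table() ignores NA by default, so only answered questions are counted.
counts <- sort(table(education), decreasing = TRUE)

summary_table <- data.frame(
  FormalEducation = names(counts),
  Count           = as.vector(counts),
  # Share of each category among respondents who answered, in percent.
  Perc            = round(100 * as.vector(counts) / sum(counts), 2)
)
print(summary_table)
```

The same pattern would apply to MajorSelect or any other single-choice column.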

Let us look at their majors.

MajorSelect                                                   Count   Perc
Computer Science                                               4397  33.11
Mathematics or statistics                                      2220  16.72
Engineering (non-computer focused)                             1339  10.08
Electrical Engineering                                         1303   9.81
Other                                                           848   6.39
Physics                                                         830   6.25
Information technology, networking, or system administration    693   5.22
A social science                                                531   4.00
Biology                                                         274   2.06
Management information systems                                  237   1.78
A humanities discipline                                         198   1.49
A health science                                                152   1.14
Psychology                                                      137   1.03
I never declared a major                                         65   0.49
Fine arts or performing arts                                     57   0.43

33% of them majored in Computer Science, while 30% of the respondents have majors in one of the areas highlighted in red in the table.

Years of Study

Time Taken to Learn Data Science

How long does it take to learn data science?

We compare the respondents' current job title with their learning time.

## [1] 16
## [1] 6

From the graph, we see that a majority of people across job titles responded that they learned data science in less than 1 year. Does this correlate with study hours?

Study Hours vs Data Science Learning Time

## [1] 4

People who study 2-10 hours per week or less report having learned data science in less than 1 year, whereas people who put in 40+ hours report taking 1-2 years to grasp the subject. I may have interpreted this wrongly, or this may be the actual picture. There are two possible explanations. Either people who were already in the field, or in a related one, found it easier and were able to master the skills with less effort and practice, or the survey responses are not a representative sample and the pattern is not reliable.
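One way to check a pattern like this is to cross-tabulate the two answers and look at the share of each learning time within every study-hours group. The sketch below uses a tiny made-up data frame. The column names TimeSpentStudying and LearningDataScienceTime appear in the dataset, but the category labels and rows here are assumptions for illustration only.

```r
# Hypothetical rows; category labels are assumed, not copied from the survey.
study <- data.frame(
  TimeSpentStudying = c("0 - 1 hour", "2 - 10 hours", "2 - 10 hours",
                        "2 - 10 hours", "40+", "40+"),
  LearningDataScienceTime = c("< 1 year", "< 1 year", "< 1 year",
                              "1-2 years", "1-2 years", "1-2 years"),
  stringsAsFactors = FALSE
)

# Rows: study hours, columns: reported learning time.
counts <- table(study$TimeSpentStudying, study$LearningDataScienceTime)

# margin = 1 normalises each row, so every study-hours group sums to 1.
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 2))
```

Comparing row shares rather than raw counts avoids being misled by the fact that some study-hours groups contain far more respondents than others.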

Conclusion: