For the first time, Kaggle conducted an industry-wide survey to establish a comprehensive view of the state of data science and machine learning. The survey received over 16,000 responses and it gave a ton of information about who is working with data, what’s happening at the cutting edge of machine learning across industries, and how new data scientists can best break into the field.
I explore this dataset to find interesting trends among the respondents and summarise the results. In some cases, I also intend to do some modelling where there is scope for classification or differentiation between groups.
```
## # A tibble: 5 x 228
##   GenderSelect                                      Country         Age
##   <chr>                                             <chr>         <int>
## 1 Non-binary, genderqueer, or gender non-conforming <NA>             NA
## 2 Female                                            United States    30
## 3 Male                                              Canada           28
## 4 Male                                              United States    56
## 5 Male                                              Taiwan           38
## # ... with 225 more variables: EmploymentStatus <chr>,
## #   StudentStatus <chr>, LearningDataScience <chr>, CodeWriter <chr>,
## #   CareerSwitcher <chr>, CurrentJobTitleSelect <chr>, TitleFit <chr>,
## #   CurrentEmployerType <chr>, MLToolNextYearSelect <chr>,
## #   MLMethodNextYearSelect <chr>, LanguageRecommendationSelect <chr>,
## #   PublicDatasetsSelect <chr>, LearningPlatformSelect <chr>,
## #   LearningPlatformUsefulnessArxiv <chr>,
## #   LearningPlatformUsefulnessBlogs <chr>,
## #   LearningPlatformUsefulnessCollege <chr>,
## #   LearningPlatformUsefulnessCompany <chr>,
## #   LearningPlatformUsefulnessConferences <chr>,
## #   LearningPlatformUsefulnessFriends <chr>,
## #   LearningPlatformUsefulnessKaggle <chr>,
## #   LearningPlatformUsefulnessNewsletters <chr>,
## #   LearningPlatformUsefulnessCommunities <chr>,
## #   LearningPlatformUsefulnessDocumentation <chr>,
## #   LearningPlatformUsefulnessCourses <chr>,
## #   LearningPlatformUsefulnessProjects <chr>,
## #   LearningPlatformUsefulnessPodcasts <chr>,
## #   LearningPlatformUsefulnessSO <chr>,
## #   LearningPlatformUsefulnessTextbook <chr>,
## #   LearningPlatformUsefulnessTradeBook <chr>,
## #   LearningPlatformUsefulnessTutoring <chr>,
## #   LearningPlatformUsefulnessYouTube <chr>,
## #   BlogsPodcastsNewslettersSelect <chr>, LearningDataScienceTime <chr>,
## #   JobSkillImportanceBigData <chr>, JobSkillImportanceDegree <chr>,
## #   JobSkillImportanceStats <chr>,
## #   JobSkillImportanceEnterpriseTools <chr>,
## #   JobSkillImportancePython <chr>, JobSkillImportanceR <chr>,
## #   JobSkillImportanceSQL <chr>, JobSkillImportanceKaggleRanking <chr>,
## #   JobSkillImportanceMOOC <chr>, JobSkillImportanceVisualizations <chr>,
## #   JobSkillImportanceOtherSelect1 <chr>,
## #   JobSkillImportanceOtherSelect2 <chr>,
## #   JobSkillImportanceOtherSelect3 <chr>, CoursePlatformSelect <chr>,
## #   HardwarePersonalProjectsSelect <chr>, TimeSpentStudying <chr>,
## #   ProveKnowledgeSelect <chr>, DataScienceIdentitySelect <chr>,
## #   FormalEducation <chr>, MajorSelect <chr>, Tenure <chr>,
## #   PastJobTitlesSelect <chr>, FirstTrainingSelect <chr>,
## #   LearningCategorySelftTaught <int>,
## #   LearningCategoryOnlineCourses <int>, LearningCategoryWork <int>,
## #   LearningCategoryUniversity <dbl>, LearningCategoryKaggle <dbl>,
## #   LearningCategoryOther <int>, MLSkillsSelect <chr>,
## #   MLTechniquesSelect <chr>, ParentsEducation <chr>,
## #   EmployerIndustry <chr>, EmployerSize <chr>, EmployerSizeChange <chr>,
## #   EmployerMLTime <chr>, EmployerSearchMethod <chr>,
## #   UniversityImportance <chr>, JobFunctionSelect <chr>,
## #   WorkHardwareSelect <chr>, WorkDataTypeSelect <chr>,
## #   WorkProductionFrequency <chr>, WorkDatasetSize <chr>,
## #   WorkAlgorithmsSelect <chr>, WorkToolsSelect <chr>,
## #   WorkToolsFrequencyAmazonML <chr>, WorkToolsFrequencyAWS <chr>,
## #   WorkToolsFrequencyAngoss <chr>, WorkToolsFrequencyC <chr>,
## #   WorkToolsFrequencyCloudera <chr>, WorkToolsFrequencyDataRobot <chr>,
## #   WorkToolsFrequencyFlume <chr>, WorkToolsFrequencyGCP <chr>,
## #   WorkToolsFrequencyHadoop <chr>, WorkToolsFrequencyIBMCognos <chr>,
## #   WorkToolsFrequencyIBMSPSSModeler <chr>,
## #   WorkToolsFrequencyIBMSPSSStatistics <chr>,
## #   WorkToolsFrequencyIBMWatson <chr>, WorkToolsFrequencyImpala <chr>,
## #   WorkToolsFrequencyJava <chr>, WorkToolsFrequencyJulia <chr>,
## #   WorkToolsFrequencyJupyter <chr>,
## #   WorkToolsFrequencyKNIMECommercial <chr>,
## #   WorkToolsFrequencyKNIMEFree <chr>,
## #   WorkToolsFrequencyMathematica <chr>, WorkToolsFrequencyMATLAB <chr>,
## #   WorkToolsFrequencyAzure <chr>, ...
```
Let us look at the respondents' background (gender, employment, country, age, etc.) in detail.
82% of the respondents are male.

There are 331 NA values in Age.
The boxplot suggests a moderate difference in median age between the genders. Note also that there are outliers, with ages reported as 100 and 0.
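The comparison behind the boxplot can be sketched as a grouped median that skips missing ages and implausible values. This is a minimal Python sketch, not the code used for this analysis: the function name `median_age_by_gender` and the age cutoffs of 16 and 80 are assumptions. The first five rows are taken from the data preview; the last two are hypothetical rows added only to show the outlier filter.

```python
from collections import defaultdict
from statistics import median


def median_age_by_gender(rows, min_age=16, max_age=80):
    """Return the median age per gender, ignoring NA and out-of-range ages.

    min_age and max_age are assumed cutoffs, chosen here only for illustration.
    """
    ages = defaultdict(list)
    for gender, age in rows:
        if age is None:  # NA in the survey data
            continue
        if not (min_age <= age <= max_age):  # drops values such as 0 or 100
            continue
        ages[gender].append(age)
    return {gender: median(values) for gender, values in ages.items()}


rows = [
    ("Non-binary, genderqueer, or gender non-conforming", None),
    ("Female", 30),
    ("Male", 28),
    ("Male", 56),
    ("Male", 38),
    ("Male", 0),      # hypothetical outlier, not from the survey
    ("Female", 100),  # hypothetical outlier, not from the survey
]

result = median_age_by_gender(rows)
print(result)
```

Filtering before taking the median matters less than it would for a mean, since the median is robust to a few extreme values, but it keeps the boxplot whiskers readable.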
Respondents from India and the United States make up the largest share of the survey population.
Let us look at the survey's diversity in terms of education status.
FormalEducation | Count | Perc (%) |
---|---|---|
Master’s degree | 6273 | 41.78 |
Bachelor’s degree | 4811 | 32.04 |
Doctoral degree | 2347 | 15.63 |
Some college/university study without earning a bachelor’s degree | 786 | 5.23 |
Professional degree | 451 | 3.00 |
I did not complete any formal education past high school | 257 | 1.71 |
I prefer not to answer | 90 | 0.60 |
41.8% of our respondents have a master's degree, while 32% hold a bachelor's degree.
Let us look at their majors.
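The percentage column in the education table can be reproduced directly from the counts. A minimal Python sketch, assuming the counts are copied by hand from the table (the helper name `percent_share` is hypothetical):

```python
# Counts copied from the FormalEducation table above.
education_counts = {
    "Master's degree": 6273,
    "Bachelor's degree": 4811,
    "Doctoral degree": 2347,
    "Some college/university study without earning a bachelor's degree": 786,
    "Professional degree": 451,
    "I did not complete any formal education past high school": 257,
    "I prefer not to answer": 90,
}


def percent_share(counts):
    """Return each category's share of the total in percent, rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


shares = percent_share(education_counts)
for label, pct in shares.items():
    print(f"{label}: {pct}")
```

The denominator is the number of respondents who answered this question (15,015), not the full number of survey responses, which is why the shares add up to 100.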
MajorSelect | Count | Perc (%) |
---|---|---|
Computer Science | 4397 | 33.11 |
Mathematics or statistics | 2220 | 16.72 |
Engineering (non-computer focused) | 1339 | 10.08 |
Electrical Engineering | 1303 | 9.81 |
Other | 848 | 6.39 |
Physics | 830 | 6.25 |
Information technology, networking, or system administration | 693 | 5.22 |
A social science | 531 | 4.00 |
Biology | 274 | 2.06 |
Management information systems | 237 | 1.78 |
A humanities discipline | 198 | 1.49 |
A health science | 152 | 1.14 |
Psychology | 137 | 1.03 |
I never declared a major | 65 | 0.49 |
Fine arts or performing arts | 57 | 0.43 |
33% of them majored in Computer Science, while 30% of the respondents majored in one of the areas highlighted in red in the table.
How long does it take to learn data science?
We compare respondents' current job titles with their reported learning time.
From the graph, we see that the majority of people across job titles responded that they learned data science in under a year. Does this correlate with study hours?
People who study for 2-10 hours per week or less reported learning data science in under a year, whereas people who put in 40+ hours actually took 1-2 years to grasp the subject. I may have interpreted this wrongly, or this may be the real picture. One explanation is that people who are already in the field, or in a related one, found it easier and could master the skills with less effort and practice. Another is that the survey sample is not representative, in which case this result is not accurate.
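One way to check a pattern like this is to cross-tabulate study time against learning time and compare row percentages rather than raw counts, since the study-time groups differ in size. The sketch below is a minimal Python illustration under stated assumptions: the helper name `row_shares` is hypothetical, and the sample pairs are made-up toy values standing in for the `TimeSpentStudying` and `LearningDataScienceTime` columns, not actual survey rows.

```python
from collections import Counter, defaultdict


def row_shares(pairs):
    """Cross-tabulate (study_time, learning_time) pairs.

    Returns {study_time: {learning_time: percent of that study_time group}}.
    """
    counts = Counter(pairs)
    row_totals = Counter()
    for (study, _learn), n in counts.items():
        row_totals[study] += n

    table = defaultdict(dict)
    for (study, learn), n in counts.items():
        table[study][learn] = round(100 * n / row_totals[study], 1)
    return dict(table)


# Toy data for illustration only; category labels are assumed.
pairs = [
    ("2 - 10 hours", "< 1 year"),
    ("2 - 10 hours", "< 1 year"),
    ("2 - 10 hours", "1-2 years"),
    ("40+", "1-2 years"),
    ("40+", "1-2 years"),
    ("40+", "< 1 year"),
    ("40+", "1-2 years"),
]

table = row_shares(pairs)
for study, shares in table.items():
    print(study, shares)
```

If the 40+ group is small, its row percentages will swing widely on a handful of answers, so the group sizes are worth reporting alongside the shares.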
On the other hand, I have analysed only one part of the survey data. Due to time constraints I couldn't analyse the other data files, which is a downside. As I find time, I will try to add those files and analyse them too.
Finally, all the best to all aspiring and growing data scientists and analysts. Thanks for reading.