What skills does a data scientist need?
Posted: 14 January 2013
From what I’ve learned, data science is a highly interdisciplinary field, requiring a diverse skill set.
Here are some of the most cited skills:
- General-purpose programming languages like Python and R (perhaps Matlab/Octave), which are good for analyzing data
- (??) Backend programming languages that might include Java, Python, Scala, or Ruby
- (??) Big data tools like Hadoop and MapReduce
- Databases (SQL?)
- Data munging/wrangling skills (UNIX, Perl?) for parsing and cleaning data
- General knowledge of good software engineering practices
- Machine learning and its various algorithms
- The more “artistic” part, which is understanding models, knowing what questions to ask, and being rigorous rather than just cranking data through black-box algorithms
- Data visualization – apparently many of the New York Times visualizations are made in R!
- Storytelling – can you construct a narrative around the data? Can you explain your analysis in human-relatable terms, to convince non-data scientists of your conclusions? Michael Driscoll says: “A data scientist is someone who takes your data and transforms it into actionable intelligence.”
- Technically not a skill, but a data scientist should be someone who’s excited about playing around with data, curious about the world, and enjoys doing their own data analysis projects for fun
(??) marks skills that are not as important, but potentially useful depending on what you’re doing
I drew up this list partially from the opinions of a few practicing data scientists, including Pete Skomoroch (Principal Data Scientist at LinkedIn), Hilary Mason (Chief Scientist at bitly), and Michael Driscoll (Chairman of Dataspora).
Another way to find out what skills a data scientist needs: go to LinkedIn’s skills trend page on data science.