Original article appeared on Quora as “What are the best skills for aspiring data scientists to have at the beginning of their careers?” Answer by Dan Wulin, Head of Data Science & Machine Learning at Wayfair:
This is a good question, but it is also a difficult one to answer with specifics. Data science is defined in many different ways across industry, and the relevant technical skills vary correspondingly with the role being discussed. I’ll start by laying out broad intellectual skills that I believe are important and then touch on technical skills that I have also seen to be generally useful.
There are common intellectual skills that we recruit for on the Data Science & Machine Learning team at Wayfair and I imagine that most teams screen for a similar set:
Excitement to learn & engage with a variety of quantitative & engineering problems:
Any meaningful data science problem will necessarily have a variety of known & unknown challenges at the outset that will need to be addressed in the course of solving it. One needs a willingness & excitement to learn the technical skills necessary to cope with them. The challenges that might arise are strongly context dependent, but can range anywhere from needing to leverage newer, less polished machine learning algorithms from recent research literature, to having to engineer a robust data pipeline to ensure your model will function appropriately in production settings, and more. You need to be flexible and effective when you change gears.
Strong bias towards transparency with outcomes & findings:
This is a generally useful trait, doubly so in technical fields. No one will know the details of what you are working on better than you, so it is important that your manager, business partners & data science peers get a full picture of how your work is going so that they are able to work effectively with you. To give an example, if you are encountering a roadblock in your work, it is incredibly valuable to be able to directly communicate the roadblock, the problem you are seeing and its potential root causes to the team around you so you can brainstorm solutions. This level of transparency does not come naturally to many people because it takes a certain amount of humility to say that you ran into a roadblock, but it is a big part of the “scientific” thinking that one needs to be effective.
Knowing when (not) to be a perfectionist:
Again, this is a generally useful trait, and lacking it is especially dangerous in technical fields. There is an almost infinite depth to which one can take any problem. I often use the example that many of the problems we work on at Wayfair could each be the focus of a Ph.D. thesis (the point is that there are a ton of interesting & unanswered questions that arise in any project). Strong data scientists will have a sense of “what is good enough” to create a minimum viable product, and there will often be a mix of areas where the data scientist needs to pursue perfection and areas where less-than-perfect is perfectly acceptable.
On a technical skill level:
It is hard to go wrong with learning Python and standard machine learning packages like NumPy, scikit-learn, etc. I would also recommend that aspiring data scientists learn as many best practices from the software engineering world as possible. Many members of the Wayfair Data Science team join very comfortable scripting in Python but less comfortable writing production-level code, so this is a great dimension along which to differentiate oneself. Outside of this, I would recommend picking one or two more “advanced” topics (deep learning, NLP, learning to rank) to dig deeper on.
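To make the gap between scripting and production-level code a little more concrete, here is a minimal sketch of a small NumPy helper written with a few common software engineering habits: type hints, a docstring, explicit input validation, and handling of an edge case. The function name `standardize` and the `eps` parameter are hypothetical and chosen for illustration only; this is not code from the Wayfair team, just one way such a helper might look.

```python
from __future__ import annotations

import numpy as np


def standardize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each column to zero mean and unit (population) variance.

    Columns whose standard deviation is below `eps` are treated as constant
    and mapped to zeros rather than triggering a division by zero.
    `standardize` and `eps` are illustrative names, not a library API.
    """
    features = np.asarray(features, dtype=float)

    # Fail early with a clear message instead of producing silent garbage.
    if features.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {features.ndim} dimension(s)")
    if features.shape[0] == 0:
        raise ValueError("expected at least one row")
    if not np.isfinite(features).all():
        raise ValueError("features contain NaN or infinite values")

    mean = features.mean(axis=0)
    std = features.std(axis=0)

    # Replace near-zero standard deviations with 1.0 so constant columns
    # come out as zeros after centering.
    safe_std = np.where(std < eps, 1.0, std)
    return (features - mean) / safe_std
```

A quick script would often stop at `(x - x.mean(0)) / x.std(0)`; the extra lines here are what make the function predictable when it meets unexpected input, and each validation branch is easy to cover with a unit test.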
More than all of this, one of the best things that you can do is to work through as many realistic problems as possible.