Aug 28, 2020
Why Every Data Scientist Wants a Data Engineer
Are you asking your data scientist to perform data engineering tasks? If so, she may soon quit.
A data scientist, the new-age unicorn, is hard to find and harder to keep. The massive success of titans like Google, Facebook, Amazon, and LinkedIn has left companies scrambling to hire for this role. But their challenge doesn't end with a great hire.
According to a recent KDnuggets poll on how long analytics and data science professionals stay at their jobs:
50% of respondents said they had stayed with their previous organization for less than two years.
In the same poll, respondents also said they would like to stay with an organization for three to four and a half years.
So what is making data scientists leave when they intend to stay? Let's dig into their roles to understand why.
What Do Data Scientists Do?
Data scientists clean, construct, and analyze data. They work with tools such as R, SAS, Python, MATLAB, SQL, Hive, Spark, and Pig to extract insights.
In most organizations, the task data scientists dislike most is also the one they spend the most time on.
Consider the pie chart of data scientists' daily activities. The largest slice of the pie goes to cleaning and constructing data sets. The part they like most, finding patterns in data and building training data sets, is a much slimmer slice.
What Do Data Engineers Do?
A data engineer develops, builds, tests, and maintains data architectures. She uses languages and tools like SQL, Hive, Pig, R, MATLAB, SAS, SPSS, Python, Java, Ruby, Perl, and C++ to do so. With a strong technical background and a command of creating and integrating APIs, data engineers are champions of data pipelining and performance optimization.
In other words, data engineers get the data ready for data scientists to use.
Their roles can change as per the requirement. “In smaller companies — where no data infrastructure team has yet been formalized — the data engineering role may also cover the workload around setting up and operating the organization’s data infrastructure” - Maxime Beauchemin, data engineer extraordinaire at Airbnb
How Data Engineers Help Data Scientists
Both facilitate what the other needs. Data engineers lay the groundwork on which data scientists build the structure. Without the groundwork, the structure will fall and without the structure, the groundwork will be useless.
A data scientist cannot do her job without access to a sufficient volume of clean data, which ideally comes from data engineers.
In many organizations, data scientists and data engineers work hand in hand, often collaborating on tasks and sharing responsibilities. While data scientists don't mind occasionally pitching in on data engineering work, they certainly wouldn't be pleased about handling the entire gamut of activities.
In organizations where the two don't get along so well, by contrast, projects turn into bumpy roads with endless bottlenecks. According to VentureBeat, 87% of machine learning projects don't make it into production because of data concerns and a lack of collaboration between data engineers and data scientists.
Fortunately, this situation can be greatly improved by following these tips:
Don’t look for a magical hybrid of a data scientist and a data engineer. It doesn’t exist.
Look for an engineer who wants to take on a larger set of responsibilities and doesn't want to stick only to the backend. A data engineer who is interested in building tools for the team can greatly ease the data scientist's workload.
Understand the role you are hiring for and ask relevant questions while hiring.
Most importantly, ask yourself this question: do you have the right number and kind of engineers to support your data scientist?