Navigating the titles in the Data market used to be simple. Analysts were hired to explore Business Intelligence, with niche and relatively narrow skill-sets. With the onslaught of big data, and the ever increasing complexity of data analysis, for an ever increasing variety of use-cases, the market has became much more confusing.

This transition is similar to that of the general software world. When computer usage was limited, companies could hire an “IT guy”. Now they navigate roles such as Software Engineers of many types (front, back, full), DevOps Engineers, System Administrators, SCRUM Masters etc.

With the Data / ML market being young and immature, many of the titles are used fuzzily, or just completely incorrectly. This prevents organisations from attracting, hiring and retaining the appropriate talent to deliver their goals. It is common for titles to completely mismatch intention, leading to project failure.

Below I have outlined the most commonly used titles in this space, with the scope of their encompassing skill-sets, and how they inter-relate to each other.

Software Engineer

A software engineer builds solutions to automate, semi-automate or enrich business processes. Software engineers are responsible for collaborating with domain experts and product management to understand challenges a user faces in typical user workflows. They explore user personas and workflows and collaborate to propose solutions technology can provide.

Software engineers will have various forms of expertise to solve different parts of problems in different ways. Different software engineers may have skill-sets useful in building:

Web apps
- On-premise
- In the cloud
Phone apps
Desktop apps
Embedded systems

Within these apps / systems different software engineers may have skill-sets useful for:

User interface / experience design
Implementing the user interface and linking to back-end services
Designing cloud systems
Translating business logic into robust, consistent logic
Data storage / streaming architecture

The title of software engineer is the most generic in this space. In the market there is a large supply of software engineers, but an even larger demand. The software engineer title is the most frequently found, due to its longer history and broader context.

Data Scientist

Just like a software engineer, a Data Scientist builds solutions to automate, semi-automate or enrich business processes, but they do so via a different methodology. A Data Scientist investigates, researches and applies knowledge of statistics and probability to solve problems.

The Data Scientist title is likely the second oldest title in this field, and can be used broadly in its application. To try to target more specific subsets Airbnb have three types of Data Scientist roles.

In practice Data Scientists have become increasingly associated with the application of Machine Learning tooling, as ML has become the go to solution for an increasing number of problems. Aka the role AirBnB calls an Algorithms Data Scientist. What Airbnb calls an Analytics Data Scientist may be more commonly found now as a Data Analyst, and an Inference Data Scientist perhaps a statistician.

Different Data Scientists have skill-sets useful for working on different types of problems with different types of data, such as:

Tabular data analysis
Image analysis
Natural Language Processing
Recommendation systems
Data in range of: Megabytes, Gigabytes, Terabytes or Petabytes
Specific domains such as BioTech, FinTech or BI

Within these contexts different Data Scientists may have skill-sets useful for:

Data Collection / Sampling / Validation
Data Engineering / MLOps
Data Visualisation
Applying Classic ML Algorithms
Applying Deep Learning
Problem framing / Domain exploration
Technical communication
- Internal
- External: Blogs, papers, patents etc.

The power of the Data Science methodology was recognised decades ago, but the tooling and processes for wide adoption have only recently been developed. Due to this, Data Scientists often feel closer to academia than industry.

In many organisations Data Scientists are within a research department, publishing papers and creating patents rather than working within product teams. Often these roles may go under the alternative title “Research Scientist”. They often differ from academia in terms of research scope, incentive structures and career progression, while having much overlap in day-to-day activities.

As the tooling and processes have matured, Data Scientists are being increasingly hired in a product specific context. In a product context, ML tends to be applied in a very specific part of a user workflow. In this context it is important for Data Scientists to understand the user, their domain and the full workflow to decide on an appropriate problem framing for their sub-solution.

Data Engineer

A Data Engineer typically builds internal solutions to solve problems faced in the training and application of Machine Learning models. A Data Engineer can typically be viewed as a form of specialised Software Engineer, focused on Data storage, networking and the Data Science domain, given their “customer” is internal employees applying Data Science. They also tend to take on skill-sets from a DevOps Engineer role since both these roles are focused on automating internal engineering processes. They may also take on responsibilities typically covered by a Cloud Architect role in certain organisations where the data scale is large enough to require such solutions.

Different Data Engineers have experience with different technologies that are more suitable for different types of data / problems that need solved, such as:

Hadoop with HDFS for big data problems that can be divided into independent units
Apache Spark for more flexible big data problems
Docker, Docker Swarm, Kubernetes for repeatable, scalable application of programs in any language using any framework
DVC / MLFlow / Pachyderm / KubeFlow / Weights & Biases as underlying infrastructure for creating ML Pipelines
GitHub Actions / Bitbucket pipelines / GoCD for CI/CD pipelines, possibly combined with ML Pipelines
Luigi for complex, long-running batch job pipelines
Relational, Document, Key-Value and / or Graph databases for different data structures

Machine Learning Engineer

Given the broad scope of the Data Scientist title, and its historical linkage with academia; the new title Machine Learning Engineer has emerged to be more focused and targeted on a business need.

An ML Engineer typically has many traditional Software Engineering skills, combined with Data Science knowledge. Where a Data Scientist may be more focused on theory and research, an ML Engineer will be more focused on tools and application.

An ML Engineer’s role and responsibilities will overlap with those of Software Engineers, Data Scientists, Data Engineers and DevOps Engineers. ML Engineers will be focused on functional and non-functional requirements of technological solutions. Some Data Science solutions may functionally solve a problem, but not meet the desired non-functional requirements, such as cost to run, execution time, throughput, maintenance overhead etc. A good ML Engineer will take all of these into account when exploring problem and solution framing. They will typically use a combination of Software Engineering and Data Science to meet functional requirements and utilise Software Engineering / DevOps / Data Engineering tooling to meet non-functional requirements.

Different ML Engineers will have different skills that map to a subset including any of the skills of a Software Engineer, Data Scientist and Data Engineer, typically sampling from across all three.

There have been past titles used for this role that haven’t caught on across the market such as:

Algorithm Engineer
Research Engineer