
Data Engineering roadmap

In the previous chapter, we learned that data engineers work on delivering good data services. Building such services takes a lot of learning, because the job requires knowledge from many disciplines.

To master Data Engineering (or DE for short), you need to learn many areas of Computer Science, from programming and hardware to specific domain knowledge. But fear not: you don't have to learn all the concepts of the profession at once.

I like a lean approach to learning Data Engineering. In this roadmap, I'll show you a way to level up gradually so you don't feel overwhelmed.

Let's talk about areas of expertise. A good Data Engineer should be proficient in the following:

  • Software Development – you need to know how to write good software
  • Tech Stack – to understand and develop a good infrastructure
  • Analytics – to speak the same language as analysts and the business
  • Domain Knowledge – best practices of data engineering

My roadmap is based on these four areas. You should start learning all of them gradually, advancing in each one step by step.

In the end, I came up with three paths of Data Engineering.

Three paths of Data Engineering

I like to split the roadmap into three paths (or tiers): the Beginner path, the Big Data path, and the Data Architect path.

High-level overview of the roadmap

This split is not a golden rule but rather an observation of the market: the tiers are based on the job descriptions I've analyzed. For example, some DE positions require knowledge of databases but not of the Big Data stack, so such skills can be separated into different "buckets".

Also, think of these tiers as a layered pie: every new level sits on top of the previous one. That means if you are an absolute beginner in Data Engineering and start with the Big Data path, you will fail. Make sure you understand the concepts from the lower tiers before starting the next one.

Let's look closely at each of the paths.

Beginner path

Basics of data engineering diagram

To start a DE career, you need to work on some fundamental skill sets.

The Beginner path is all about data engineering basics. Here you need to gain a good understanding of SQL and relational databases, some programming experience (I'd recommend Python), the basics of analytics, and core data engineering concepts.

Beginner-level knowledge should qualify you for Junior or Middle data engineering positions. After completing this level, you will be able to work with databases, build data pipelines, model data for data warehouses, and even visualize data in BI tools.
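To make the Beginner-path skills more concrete, here is a minimal sketch of a tiny pipeline in Python: it loads a few raw records into a small star schema (one dimension table, one fact table) and runs the kind of SQL aggregate a BI tool would issue. It assumes an in-memory SQLite database as a stand-in for a real warehouse; the table names (`dim_customer`, `fact_orders`), the sample records, and the helper functions are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": 1, "customer": "alice", "amount": "19.99"},
    {"order_id": 2, "customer": "bob", "amount": "5.00"},
    {"order_id": 3, "customer": "alice", "amount": "7.50"},
]


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one dimension table and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE fact_orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
            amount REAL NOT NULL
        );
        """
    )


def load_orders(conn: sqlite3.Connection, raw_orders: list[dict]) -> None:
    """Transform raw records (cast amounts, resolve customer keys) and load them."""
    for row in raw_orders:
        # Add the customer once; the UNIQUE constraint makes repeats a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (row["customer"],)
        )
        (customer_id,) = conn.execute(
            "SELECT customer_id FROM dim_customer WHERE name = ?", (row["customer"],)
        ).fetchone()
        # The fact row references the dimension by its surrogate key.
        conn.execute(
            "INSERT INTO fact_orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
            (row["order_id"], customer_id, float(row["amount"])),
        )
    conn.commit()


def revenue_by_customer(conn: sqlite3.Connection) -> list[tuple]:
    """Aggregate facts per dimension member, as a BI dashboard might."""
    return conn.execute(
        """
        SELECT c.name, ROUND(SUM(f.amount), 2) AS revenue
        FROM fact_orders AS f
        JOIN dim_customer AS c USING (customer_id)
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    load_orders(conn, RAW_ORDERS)
    print(revenue_by_customer(conn))
```

Running it prints one row per customer (alice and bob) with their summed order amounts. Splitting customers into a dimension table keeps the fact table narrow and lets reports group by customer attributes without duplicating them in every order row.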

Big Data path

Big Data engineering diagram

The Big Data path is a continuation of the Beginner path with a strong shift towards Big Data technologies. It will probably take two to three times longer to master than the Beginner path.

You should learn about the Big Data stack, distributed processing, advanced programming, containerization, clouds, and statistics, and continue with the most important data engineering concepts, such as data quality and data security.
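Data quality is easiest to grasp through the checks a pipeline runs before publishing a batch. Below is a minimal sketch in plain Python of three common checks (completeness, validity, uniqueness). The function name `check_quality`, its parameters, the field names, and the sample batch are hypothetical; in this sketch, any failure rejects the whole batch, which is only one possible policy.

```python
from typing import Any


def check_quality(
    records: list[dict[str, Any]], key_field: str, value_field: str
) -> dict[str, list[int]]:
    """Run three basic data quality checks; return the indices of failing records.

    completeness: key_field and value_field are present and not None
    validity:     value_field holds a non-negative number
    uniqueness:   key_field is not repeated across records
    """
    failures: dict[str, list[int]] = {"completeness": [], "validity": [], "uniqueness": []}
    seen_keys: set[Any] = set()
    for i, record in enumerate(records):
        key, value = record.get(key_field), record.get(value_field)
        if key is None or value is None:
            failures["completeness"].append(i)
            continue  # the other checks need both fields, so skip this record
        if not isinstance(value, (int, float)) or value < 0:
            failures["validity"].append(i)
        if key in seen_keys:
            failures["uniqueness"].append(i)
        seen_keys.add(key)
    return failures


if __name__ == "__main__":
    # Hypothetical batch of order records with one problem of each kind.
    batch = [
        {"order_id": 1, "amount": 19.99},
        {"order_id": 2, "amount": -5.0},  # invalid: negative amount
        {"order_id": 1, "amount": 7.5},   # duplicate order_id
        {"order_id": 3, "amount": None},  # incomplete: missing amount
    ]
    failures = check_quality(batch, key_field="order_id", value_field="amount")
    print(failures)
    # In this sketch, a single failing check rejects the whole batch.
    print("batch passed" if not any(failures.values()) else "batch rejected")
```

Returning record indices instead of a single pass/fail flag makes it possible to quarantine only the bad rows or to report exactly what went wrong.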

With this level of knowledge, you should be able to cover most Senior Data Engineer requirements.

Data Architect path

Data Architects diagram

The Data Architect path is the top league. People at this level have a broad understanding of all Data Engineering concepts.

At this level, you are not only able to create reliable data pipelines, but also to make architecture decisions and pick the most suitable solution for each use case in the data world. You will need to learn advanced technologies such as streaming and DevOps, learn more programming languages (like Java, Scala, and Go), gain hands-on experience with ML engineering, and so on.

Table of skills

For convenience, I combined all skills and paths into one big table.

Come back to this table whenever you need inspiration for "what to learn next?".

Columns: Beginner path, Big Data path, Data Architect path (within each area below, skills are listed roughly in that order)
Software engineering
  • SQL
  • Version control
  • Python
  • Web
  • Advanced programming
  • Advanced Python
  • High performance languages
Tech Stack
  • Databases
  • Relational and non-relational databases
  • Linux
  • Docker
  • Clouds
  • Distributed processing
  • Hadoop stack
  • Streaming
  • Operations (DevOps, DataOps)
Analytics
  • BI tools
  • Business Metrics
  • Statistics
  • A/B testing
  • Machine learning
  • Time series analysis
Domain knowledge
  • Data modeling
  • Data pipelines
  • Data quality
  • Security and privacy
  • Data architectures
  • DE tools landscape

Summary

Data Engineering is a fast-developing field, and new technologies emerge every day. That is why my roadmap focuses on core skills rather than particular technologies.

Good luck with your journey!
