I love this book so much, probably because it was the first book that got me started on data engineering in code, or maybe the book is just worth loving.
- Brings you up to speed on DE with Python as promised; if you have at least a month of Python experience, you can follow everything in detail.
- Introduces the following topics with 100% clarity for beginners:
  - reading and writing files,
  - working with databases (Postgres),
  - using Faker to generate fake data to work with,
  - easy-to-follow installation and use of Airflow, NiFi, Elasticsearch, and Postgres,
  - DE methods like data cleaning, transformation, and validation, all in simple terms with Python code,
  - some production-level data pipeline material for intermediates,
  - the big guns of big data, Spark and Kafka, with easy-to-understand code.
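To give a flavor of the file-handling and cleaning topics above, here is a minimal sketch of my own, not code from the book. It swaps Faker for Python's standard library (`csv`, `io`, `random`) so it runs without extra installs; the field names and the validation rules are hypothetical.

```python
import csv
import io
import random

# Hypothetical stand-in for Faker: small pools to draw fake values from.
FIRST_NAMES = ["Ada", "Linus", "Grace", "Ken"]
CITIES = ["Lagos", "Berlin", "Austin", ""]  # "" simulates a missing value
FIELDS = ["id", "name", "age", "city"]


def make_fake_rows(n, seed=42):
    """Build n fake records; seeded so the output is repeatable."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "name": rng.choice(FIRST_NAMES),
            "age": rng.randint(-5, 90),  # negative ages simulate bad data
            "city": rng.choice(CITIES),
        }
        for i in range(n)
    ]


def write_csv(rows, fh):
    """Write records to any file-like object as CSV with a header row."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_and_clean(fh):
    """Read CSV rows back, cast types, and drop rows failing simple checks."""
    clean = []
    for row in csv.DictReader(fh):
        age = int(row["age"])  # CSV values come back as strings
        # Hypothetical validation rule: reject negative ages and blank cities.
        if age < 0 or not row["city"]:
            continue
        clean.append({"id": int(row["id"]), "name": row["name"], "age": age, "city": row["city"]})
    return clean


if __name__ == "__main__":
    # In-memory file; use open("people.csv", "w", newline="") to write to disk instead.
    buf = io.StringIO()
    write_csv(make_fake_rows(20), buf)
    buf.seek(0)
    cleaned = read_and_clean(buf)
    print(f"kept {len(cleaned)} of 20 rows")
```

The book does this kind of thing with real files and real Faker data; the in-memory buffer here just keeps the sketch self-contained.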
It really gets you coding with Python as promised, so yes, with this book you will be scripting like a Data Engineer.
Now, here are my dislikes about this great book:
- The book introduces both Apache Airflow and NiFi for orchestration but drops Airflow after the first section and continues with NiFi. I guess Paul really wanted to make it easy, but NiFi is not widely used (compared to Airflow) and people don't talk much about it. It is GUI-based, so yes, Paul was right, it is easy to learn and use, but... it just doesn't feel right.
- Paul Crickard also teaches how to use Elasticsearch, a NoSQL database, with Kibana for visualization, but I was lost with Elasticsearch, and Apache Cassandra gets more hype as an open-source NoSQL database. I also didn't dig into Kibana much.
See, the book is that good: I had only two problems with it, period.
The book is in three sections. Here is how I would approach them:
- Beginners:
  - if your Python is good enough, finish section one.
  - skip section two for later; it is filled with NiFi and production-level pipeline stuff.
  - only do section three if you are ready for Spark and Kafka, but it will not be enough on its own; try adding YouTube videos or books focused specifically on these two data-tool beasts.
  - skip anything NiFi, Elasticsearch, and Kibana if your heart feels so (mine did).
- Not a beginner:
  - do sections one and three; if you like NiFi (I don't), go ahead with section two.
  - skip anything NiFi, Elasticsearch, and Kibana if your heart feels so (mine did).
There is one thing I can tell you about this book: I've no 👎 with it at all. Data pipelines are arguably the crux of data engineering, and this book delivers 100% in introducing them, both in theory and in code.
- Explains everything you need to know about data pipelines and modern data infrastructure (data warehouses, cloud services, tools).
- Data extraction from relational databases (Postgres and MySQL), NoSQL (MongoDB), and REST APIs, all with Python code. Extra peek: it includes working with CDC (change data capture) using Python, which was the first time I had heard of it, and it is very important. If you don't know CDC either, do check the book.
- Transforming and validating data with Python and Airflow, simply explained.
- It also introduces ETL with cloud services (AWS and the Python Boto3 library; I love it so much for that, you've got to see it too).
- Also contains chapters for intermediates to play with, such as data validation, pipeline best practices, and measuring and monitoring pipeline performance.
Like I said, I stand by the book completely; I read all 10 chapters A-Z myself.
Just read and practice alongside the book, period.
Data engineering with/on the cloud is data engineering now. This book is not optional; you just have to read and follow through with it to round out your skill set as a data engineer. This one is for AWS (Amazon); I should mention there are similar books for GCP (Google) and Azure (Microsoft).
This book is better done alongside YouTube videos and by actually creating an account to practice what it teaches. I recommend YouTube because the book can be a little hard to follow through (visuals count here, I'll tell you that). I should also add, about practicing on the cloud: watch out not to run up debts like I did.
About this book: it is not necessary at all, but I'll say it is quite informative about what data engineers say about their careers and job roles. If you are also learning like me, having 97 insights and 97 lessons from 97 individuals already in the industry is valuable too.