See what the course offers

Watch every data engineering concept explained

Over 30 hours of video lessons, filled with concepts and live examples covering all major areas of data processing and engineering.

Apply every lesson with challenging exercises

Tons of exercises and projects that replicate the real-life data engineering process. You will learn how to assess big data technology based on data requirements.

Get feedback from instructors and other students

Ask questions day or night. Post your projects to get feedback. Share resources and learn from your fellow students.

Download the entire curriculum for offline access

All the course videos and assignments are available for offline use. You also have lifetime access to the curriculum, which is regularly updated.

A typical data team

The data engineer collects the data, stores it, runs batch or real-time processing on it, and serves it through a database that a data scientist can easily query.

The data scientist interprets and extracts meaning from data using exploratory data analysis and machine learning, and communicates these findings to other teams.

The machine learning engineer takes trained models and prepares them for the production environment by creating APIs, scheduling jobs, and setting up logging and monitoring.

Embark on a learning journey

A modern Hadoop-based MapReduce pipeline.

Introductory Concepts

(50 hours)

We look at the problems with existing infrastructure, the different types of data engineering systems, and the most common requirements and concerns.

Topics include: big data systems, scaling, parallel processing, Hadoop, MapReduce, Parquet and HDFS.

High-level perspective of Lambda Architecture.

Lambda Architecture

(200 hours)

Nathan Marz coined the term Lambda Architecture for a generic, scalable and fault-tolerant data-processing architecture. It is designed to handle massive quantities of data by taking advantage of both batch and stream processing methods.

Topics include: lambda architecture, data collection & ingestion, messaging, batch processing and views, distributed data persistence, stream processing, distributed index, search and visualization.
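
To give a feel for the idea, here is a minimal sketch in plain Python of how batch and stream processing can be combined. It is not taken from the course material: the page-view events, the function names and the in-memory counters are all hypothetical, and a real system would use distributed storage and processing instead.

```python
from collections import Counter

# Hypothetical page-view events as (user, page) pairs. The master dataset
# is treated as append-only; the batch layer recomputes from all of it.
master_dataset = [("alice", "/home"), ("bob", "/home"), ("alice", "/docs")]

# Hypothetical events that arrived after the last batch run.
recent_events = [("carol", "/home"), ("bob", "/docs")]


def build_batch_view(events):
    """Batch layer: recompute page-view counts from the full dataset."""
    return Counter(page for _user, page in events)


def update_realtime_view(view, event):
    """Speed layer: incrementally update counts for one new event."""
    _user, page = event
    view[page] += 1
    return view


def query(page, batch_view, realtime_view):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view[page] + realtime_view[page]


batch_view = build_batch_view(master_dataset)
realtime_view = Counter()
for event in recent_events:
    update_realtime_view(realtime_view, event)

print(query("/home", batch_view, realtime_view))  # 3
print(query("/docs", batch_view, realtime_view))  # 2
```

The batch view is complete but lags behind, while the real-time view only covers events since the last batch run. Adding the two at query time gives an up-to-date answer.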

Enhanced Hadoop pipeline with Storm, Cassandra and Voldemort.

Capstone Project

(100 hours)

Towards the end of the course, you will build a final capstone project, either with your employer or as a side project.

Topics include: independent planning, system design and end-to-end pipeline building.

Get your questions answered here

How long is the course?

30 hours of video lessons, 200 hours of assignments and 100 hours for the capstone project.

What are the prerequisites?

You should have a strong command of Python, SQL, and the Command Line. You can learn these subjects for free via Codecademy.

Can I preview the course material?

We have converted a handful of early lectures to articles and posted them via Medium. Here is a sample coding lab from our first unit as well.

Can you help with employer sponsorship?

If you need a custom invoice, syllabus, letter of acceptance or certificate of completion, simply email me.

Do I get immediate access to all videos?

Yes, as soon as you sign up, you will get immediate access to the entire video series and an invitation to the student community.

Do I have lifetime access to the course?

Yes, all students have lifetime access to the course. You can start the course the minute you buy it, or a year later – it's up to you.

What is a comparable program?

As there are not really any structured data engineering bootcamps, the most comparable programs would be graduate certificates teaching big data technologies. We like the one offered by the University of Washington. Ours will be more affordable and flexible, and will include newer technologies. We do understand that some people and employers prefer going with name-brand institutions.

Meet the team behind the course

This course was built by a lean team of industry professionals. We all enjoy teaching and learning new things, so we thought we could make something amazing to help fill a major gap in the technology space.

Felix Raimundo
Curriculum Contributor
Data Engineer @ Tweag

Ashish Nagdev
Curriculum Contributor
Senior Data Engineer @ Citi

Accelerate your learning with mentorship

If you need to learn data engineering fast, there’s no better way to do it than with personal advice and one-on-one technical feedback.

The Mentorship Plan is for students who need to up their skills as quickly and efficiently as possible. While all students will have access to feedback and reviews in the Student Community, Mentorship students will also get six one-on-one sessions over Skype, where we’ll screen-share the code you’re working on and zoom in on the topics that will personally help you the most.

These are example questions you might ask:

"I'm a bit fuzzy on concept X, can you share practical use cases for it?"

"Here's a problem I want to solve. How would you approach it?"

"Let's talk more about messaging queues / key-value stores / etc."

"Here's a data engineering project I'm working on. Can you review it and give me pointers on what to change or improve?"

Anything that helps you become a better data engineer is fair game.