Python Tutorial: What is data engineering?


---
Hi. My name is Vincent. I'm a Data and Software Engineer at DataCamp. If you've ever heard of data science, there's a good chance you've heard of data engineering as well. This course will help you take your first steps in the world of data engineering. All very exciting, so let's get started!

In the first chapter, we'll start off by introducing the concept of data engineering. In the second chapter, you'll learn more about the tools data engineers use. The third chapter is all about Extracting, Transforming and Loading data, or ETL. Finally, you'll get to have a peek behind the curtain in the case study on data engineering at DataCamp. But first, let's understand what data engineers do!

Imagine this: you've been hired as a data scientist at a young startup. Tasked with predicting customer churn, you want to use a fancy machine learning technique that you have been honing for years. However, after a bit of digging around, you realize that your data is scattered across many databases. Additionally, the data resides in tables that are optimized for running applications, not for analysis. To make matters worse, some legacy code has corrupted a lot of the data. At your previous company, you never really had this problem, because all the data was available to you in an orderly fashion. You're getting desperate. Enter the data engineer, to the rescue.

It is the data engineer's task to make your life as a data scientist easier. Do you need data that currently comes from several different sources? No problem: the data engineer extracts data from these sources and loads it into a single, ready-to-use database. At the same time, they optimize the database schema so it is faster to query, and they remove corrupted data. In this sense, the data engineer is one of the most valuable people in a data-driven company that wants to scale up.
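To make this extract, clean, and load flow concrete, here is a minimal Python sketch using only the standard-library `sqlite3` and `datetime` modules. Everything specific in it is hypothetical and chosen for illustration: the two "sources" are hard-coded lists standing in for separate application databases, and the table names, column names, and validation rules are assumptions, not anything the course prescribes.

```python
import sqlite3
from datetime import date

# Hypothetical sources: in practice these rows would live in separate
# application databases; they are hard-coded here for illustration.
CRM_CUSTOMERS = [
    (1, "alice@example.com", "2021-03-01"),
    (2, "bob@example.com", "not-a-date"),  # corrupt: unparseable signup date
    (3, None, "2021-05-12"),               # corrupt: missing email
    (4, "dana@example.com", "2021-06-30"),
]
BILLING_PAYMENTS = [
    (1, 29.99),
    (1, 29.99),
    (4, 9.99),
    (4, -5.00),  # corrupt: negative amount
    (2, 19.99),  # belongs to a customer that gets dropped below
]


def extract():
    """Extract: read the raw rows from each source."""
    return list(CRM_CUSTOMERS), list(BILLING_PAYMENTS)


def is_iso_date(text):
    """Return True if text parses as an ISO 8601 date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(text)
        return True
    except (TypeError, ValueError):
        return False


def transform(customers, payments):
    """Transform: drop corrupt rows so only consistent data remains."""
    clean_customers = [
        row for row in customers if row[1] is not None and is_iso_date(row[2])
    ]
    valid_ids = {row[0] for row in clean_customers}
    # Keep only non-negative payments that belong to a surviving customer.
    clean_payments = [
        row for row in payments if row[0] in valid_ids and row[1] >= 0
    ]
    return clean_customers, clean_payments


def load(customers, payments):
    """Load: write the cleaned rows into one analysis database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, signup TEXT)"
    )
    conn.execute("CREATE TABLE payments (customer_id INTEGER, amount REAL)")
    # An index on the join column is one simple way to make queries faster.
    conn.execute("CREATE INDEX idx_payments_customer ON payments (customer_id)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO payments VALUES (?, ?)", payments)
    conn.commit()
    return conn


conn = load(*transform(*extract()))
for row in conn.execute(
    "SELECT c.email, ROUND(SUM(p.amount), 2) FROM customers c "
    "JOIN payments p ON p.customer_id = c.id GROUP BY c.id ORDER BY c.id"
):
    print(row)
```

The data scientist now queries one tidy database instead of reconciling several messy ones; in a real company the same three steps would typically run on a schedule against live systems.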

Back in 2015, DataCamp published an infographic on precisely this: who does what in the data science industry. In this infographic, we described a data engineer as "an engineer that develops, constructs, tests, and maintains architectures such as databases and large-scale processing systems." A lot has changed since then, but the definition still holds up. The data engineer is focused on processing and handling massive amounts of data, and setting up clusters of machines to do the computing.

The tasks of a data engineer typically consist of developing a scalable data architecture, streamlining data acquisition, setting up processes that bring together data from several sources, and safeguarding data quality by cleaning up corrupted data. Data engineers also tend to have a deep understanding of cloud technology and are generally experienced with cloud service providers such as AWS, Azure, or Google Cloud.

Compare this with the tasks of a data scientist, who spends their time mining for patterns in data, applying statistical models to large datasets, building predictive models using machine learning, developing tools to monitor essential business processes, or cleaning data by removing statistical outliers. Data scientists typically have a deep understanding of the business itself.
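"Corrupt data" and "statistical outliers" are easy to conflate. As an illustrative sketch, the snippet below removes a value that is perfectly well-formed but statistically unusual. The spending figures are made up, and Tukey's 1.5 x IQR rule is just one common outlier criterion, assumed here rather than taken from the course; the only library call is the standard-library `statistics.quantiles` (Python 3.8+).

```python
from statistics import quantiles

# Hypothetical monthly spend per customer. Every value is well-formed,
# so nothing here is "corrupt"; 400 is simply unusual.
monthly_spend = [20, 22, 23, 25, 24, 21, 400]


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


print(remove_outliers_iqr(monthly_spend))  # [20, 22, 23, 25, 24, 21]
```

The value 400 would pass any corrupt-data check, so deciding whether to drop it is a statistical and business judgment, one reason this step tends to sit with the data scientist rather than the data engineer.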

Let's see if you can recognize the qualities of a data engineer in the exercises.