Apache Sqoop - Import Specific Columns from MySQL database into Hadoop
In this course, you will learn step by step everything you need to know about Apache Sqoop and how to integrate it into the Hadoop ecosystem. With every concept explained through real-world examples, you will learn how to create data pipelines to move data into and out of Hadoop.
This comprehensive course focuses on building real-world data pipelines to move data from RDBMS systems (such as Oracle and MySQL) to Hadoop systems and vice versa. This knowledge is critical for any big data engineer today.
Why Apache Sqoop
Apache Sqoop is designed to import data from relational databases such as Oracle and MySQL into Hadoop. Hadoop is ideal for batch processing of huge amounts of data and is an industry standard today. In real-world scenarios, you can use Sqoop to transfer data from relational tables into Hadoop, leverage Hadoop's parallel processing capabilities to process that data and generate meaningful insights, and then store the results back in relational tables using Sqoop's export functionality.
A Note For Data Engineers
This course will help you prepare for the CCA Spark and Hadoop Developer (CCA175) and Hortonworks Data Platform Certified Developer (HDPCD) certifications.
What will you achieve after completing this course
After completing this course, you will be one step closer to the CCA175 and HDPCD certifications. You will need to take other lessons, which we will be launching soon, to fully prepare for the exam. Even if you are not planning to pursue a certification (although we highly recommend you get one, as it improves your chances of getting into big companies), you will still need the knowledge from this course to work as a data engineer.
What you will get in this course
3.5 hours of On-Demand Videos | Working Code | Full Lifetime Access | Access on Mobile & TV | Certificate of Completion
You will learn
Section 1 - APACHE SQOOP - IMPORT TOPICS (MySQL to Hadoop/Hive)
In this section of the course, you will learn how to move data from a MySQL database into Hadoop/Hive systems. We will cover a lot of key areas in this section, and it is critical for any data engineer to complete it. Here are a few of the key areas we will cover (a sample import command follows this list):
+ warehouse directory on Hadoop storage
+ specific target directory on Hadoop storage
+ controlling parallelism
+ overwriting existing data
+ appending data
+ loading specific columns from a MySQL table
+ controlling data-splitting logic
+ defaulting to a single mapper when needed
+ Sqoop options files
+ debugging Sqoop operations
+ importing data in various file formats - TEXT, SEQUENCE, AVRO, PARQUET & ORC
+ data compression while importing
+ custom query execution
+ handling null strings and non-string values
+ setting delimiters for imported data files
+ setting escape characters
+ incremental loading of data
+ writing directly to a Hive table
+ using HCatalog parameters
+ importing all tables from a MySQL database
+ importing an entire MySQL database into a Hive database
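To make these import topics concrete, here is a minimal sketch of pulling only selected columns from a MySQL table into HDFS by invoking the Sqoop CLI from Python. The JDBC URL, credentials, table name, column list, and target directory are hypothetical placeholders, not values from the course, and the snippet assumes sqoop is on the PATH of a machine that can reach both MySQL and the Hadoop cluster.
import subprocess
# Hypothetical connection details and table/column names - replace with your own.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost:3306/retail_db",
    "--username", "retail_user",
    "--password", "retail_pass",   # in real use, prefer -P or --password-file
    "--table", "customers",
    "--columns", "customer_id,customer_fname,customer_city",  # import only these columns
    "--target-dir", "/user/hadoop/customers_subset",          # specific target directory on HDFS
    "--num-mappers", "2",          # control parallelism (number of map tasks)
]
subprocess.run(sqoop_import, check=True)  # raises CalledProcessError if the import fails
Swapping --target-dir for --warehouse-dir writes the data under a parent warehouse directory, in a subdirectory named after the table, which is the warehouse-directory option listed above.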
Section 2 - APACHE SQOOP - EXPORT TOPICS (Hadoop/Hive to MySQL)
In this section of the course, you will learn the opposite of the Sqoop import process: how to move data from a Hadoop or Hive system into a MySQL (RDBMS) database. This is an important lesson for data engineers and data analysts, who often need to store the aggregated results of their data processing in relational databases. A sample export command follows the topic list below.
+ Move data from Hadoop to MySQL table
+ Move specific columns from Hadoop to MySQL table
+ Avoid partial export issues
+ Update Operation while exporting
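As a rough sketch of these export topics, the snippet below pushes an HDFS directory into a MySQL table through the Sqoop CLI, using a staging table to avoid partially exported data if a map task fails. All database, table, and directory names are made up for illustration.
import subprocess
# Hypothetical names; the staging table must already exist with the same schema as the target table.
sqoop_export = [
    "sqoop", "export",
    "--connect", "jdbc:mysql://localhost:3306/retail_db",
    "--username", "retail_user",
    "--password", "retail_pass",
    "--table", "daily_revenue",
    "--export-dir", "/user/hadoop/daily_revenue",  # HDFS directory produced by your processing job
    "--staging-table", "daily_revenue_stage",      # rows land here first, then move to the target in one transaction
    "--clear-staging-table",                       # empty the staging table before the export starts
]
subprocess.run(sqoop_export, check=True)
For the update operation covered in this section, Sqoop takes --update-key with the key column and --update-mode allowinsert for upsert behaviour; note that staging tables are not supported for update-style exports.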
Section 3 - APACHE SQOOP - JOBS TOPICS (Automation)
In this section, you will learn how to automate the Sqoop import and export processes using the Sqoop jobs feature. This is how a real process is run in production, so this lesson is critical for your success on the job. A sample set of job commands follows the topic list below.
+ create a Sqoop job
+ list existing Sqoop jobs
+ check metadata about Sqoop jobs
+ execute a Sqoop job
+ delete a Sqoop job
+ enable password storage for easy execution in production
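The sketch below runs the job commands listed above through the Sqoop CLI from Python. The job name, connection details, and check column are hypothetical; a saved job stores its definition (and the incremental --last-value between runs) in the Sqoop metastore.
import subprocess
def sqoop_job(*args):
    """Run a 'sqoop job' subcommand; assumes sqoop is on the PATH."""
    subprocess.run(["sqoop", "job", *args], check=True)
# Create a saved incremental-import job; everything after the bare "--" is an ordinary import command.
sqoop_job(
    "--create", "daily_orders_import", "--", "import",
    "--connect", "jdbc:mysql://localhost:3306/retail_db",
    "--username", "retail_user", "-P",          # prompt for the password interactively
    "--table", "orders",
    "--target-dir", "/user/hadoop/orders",
    "--incremental", "append",
    "--check-column", "order_id",               # new rows are detected by this column
    "--last-value", "0",
)
sqoop_job("--list")                             # list existing Sqoop jobs
sqoop_job("--show", "daily_orders_import")      # check a job's saved metadata
sqoop_job("--exec", "daily_orders_import")      # execute the job; --last-value is updated afterwards
sqoop_job("--delete", "daily_orders_import")    # delete the job
For the password-storage topic, setting sqoop.metastore.client.record.password to true in sqoop-site.xml lets the metastore keep the saved password so that --exec does not prompt; a --password-file stored on HDFS is a common alternative for unattended production runs.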
Windows users will need to install a virtual machine on their device to set up a single-node Hadoop cluster, while macOS or Linux users can install the Hadoop and Sqoop components directly on their machines. The step-by-step process is illustrated in the course.
Find us on: