Python Libraries You Should Know As A Data Engineer - Python For Beginners

What Python libraries should data engineers know?

Here is a list from beginner to advanced!

Beginner
- Requests
- Paramiko
- Psycopg2 or SQLAlchemy
- Datetime

Mid
- BeautifulSoup
- Airflow
- All the cloud libraries (AWS, GCP, Azure)

Advanced
- PySpark
- PyKafka
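
As a rough illustration of how the beginner libraries fit together, here is a minimal extract-and-load sketch. It is not taken from the video and rests on several assumptions: a hard-coded JSON string stands in for a Requests response body, and the standard-library sqlite3 module stands in for Psycopg2 (both follow the Python DB-API, though Psycopg2 uses %s placeholders rather than ?). The payload, table name, and function names are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical payload; in practice this might come from requests.get(url).text
SAMPLE_RESPONSE = (
    '[{"id": 1, "city": "Seattle", "temp_c": 11.5},'
    ' {"id": 2, "city": "Denver", "temp_c": 4.0}]'
)


def extract(raw_json):
    """Parse the raw API response into a list of dicts."""
    return json.loads(raw_json)


def load(conn, records, loaded_at):
    """Insert records into a table, stamping each row with the load time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weather "
        "(id INTEGER PRIMARY KEY, city TEXT, temp_c REAL, loaded_at TEXT)"
    )
    rows = [(r["id"], r["city"], r["temp_c"], loaded_at.isoformat()) for r in records]
    conn.executemany("INSERT INTO weather VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# sqlite3 in-memory database used here as a stand-in for a Postgres connection
conn = sqlite3.connect(":memory:")
loaded_at = datetime.now(timezone.utc)
inserted = load(conn, extract(SAMPLE_RESPONSE), loaded_at)
cities = [row[0] for row in conn.execute("SELECT city FROM weather ORDER BY id")]
print(inserted, cities)
```

Swapping in a real API call and a Postgres connection would mostly change the two stand-ins; the parse, timestamp, and insert steps would keep the same shape.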

0:00 Intro
2:10 Requests
2:44 Paramiko
3:02 Psycopg2
4:00 Basic Data Engineering Project Idea
4:42 BeautifulSoup
5:02 Datetime
6:00 Airflow
6:33 All the cloud libraries (AWS, GCP, Azure)
8:30 PySpark and PyKafka

If you enjoyed this video, check out some of my other top videos.

Top Courses To Become A Data Engineer In 2022

What Is The Modern Data Stack - Intro To Data Infrastructure Part 1

If you would like to learn more about data engineering, then check out Google's GCP certificate

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

And if you want to support the channel, then you can become a paid member of my newsletter

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems, both solo and with a company called Acheron Analytics. I have experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Comments
Author

Beginner -
1. Requests (and sftp)
2. Psycopg2 and similar database libraries
3. Beautifulsoup and scrapy
4. Datetime
5. Virtualenv
Intermediate -
6. Airflow
7. Boto3 and similar libraries to interact with cloud
8. Flask/Django
Advanced (based on need to know) -
9. Pyspark
10. Pyarrow

shravanshenoy
Author

Some other cool libraries from my side:
- Pandas - you've mentioned it, but I don't think you framed it as something one should know (see the case from your Facebook interviews) - I think it's essential for any sort of data wrangling with Python.
- NumPy - essential stuff for any sort of algebra if you want to dive deeper into ML
- MyPy/Pydantic - for data validation & static typing
- Pytest - for testing
- matplotlib & seaborn - for data visualization in Python
- any sort of file libraries for specific file formats like json, csv, avro-python etc.
- ML libraries like scikit-learn
- FastAPI as an alternative to Django/Flask
- Selenium
- argparse for scripting

Although I haven't used most of these in my job on a regular basis - I think it doesn't hurt to know them :)

RSKriegs
Author

Requests
Psycopg
Bigquery
Beautifulsoup & scrapy
Datetime
Boto 3
Flask
Virtualenv
Spark
Pyarrow
Pykafka
Snowflake

hdr-tech
Author

Psycho pg2 is how I've heard folks say it too!

matthewwiese
Author

Great content as usual! I'd add the json library to that

luizhenriquecudo
Author

Watching the premiere... expecting to hear about the tenacity library here xD

lkellermann
Author

I've gone through possibly all the Python courses on Udemy but have never seen a course focused on Data Engineering and the good-to-know libraries. Sometimes there is one short chapter about one of them but nothing complete. Does anyone have any tips?

pcargolo
Author

I have to use a shell script to execute MySQL queries, then pass the result as an argument to my Python scripts >_< Wish I could just use MySQL Connector

redrum
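
For the workflow redrum describes, the Python side of the hand-off could look roughly like the sketch below. It assumes the shell wrapper passes the query output as a single tab-separated argument with a header row first; that format, the argument name, and the function names are all assumptions for illustration.

```python
import argparse
import csv
import io


def parse_query_result(raw):
    """Turn tab-separated text (header row first) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
    return list(reader)


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Consume a query result passed as a command-line argument"
    )
    parser.add_argument("result", help="tab-separated query output, header row first")
    args = parser.parse_args(argv)
    return parse_query_result(args.result)


# Simulated invocation; a shell wrapper would pass the real query output instead.
rows = main(["id\tname\n1\talice\n2\tbob"])
print(rows)
```

Every value arrives as a string, so numeric columns would still need converting. Large result sets may also exceed the operating system's argument-length limit, in which case piping through standard input is the safer route.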
Author

I'm stuck in a "data engineer" position where all my boss will let me do is debug SQL scripts, and it's killing me

EH-itpj
Author

How can you know pandas every which direction, but not understand a dictionary? You wouldn't know how to construct a dataframe from a dictionary of lists (often my approach when webscraping) or know how to use the map function to change categorical names. Wes McKinney (who created pandas) even says that a pandas series data structure is similar to an ordered dictionary.

data-dylan
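
The two dictionary uses data-dylan mentions can be sketched in a few lines. The data and column names below are hypothetical; the sketch only shows building a DataFrame from a dictionary of lists and renaming categorical values by passing a dictionary to `Series.map`.

```python
import pandas as pd

# Hypothetical scraped data collected as a dictionary of lists
scraped = {
    "product": ["widget", "gadget", "gizmo"],
    "category_code": ["A", "B", "A"],
}
df = pd.DataFrame(scraped)

# A plain dict drives Series.map to rename categorical values
category_names = {"A": "hardware", "B": "software"}
df["category"] = df["category_code"].map(category_names)

print(df)
```

Any code missing from the mapping dictionary becomes NaN after `map`, which is worth checking for when the scraped categories are not known in advance.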
Author

good list, but most of your psycopg2 stuff prob would have been easier with sqlalchemy

EbeneezerGumb
Author

Regarding APIs, I always thought we should learn how to pull from them, not actually create them. So where does Flask fit into all that?

gabrielkolletalves