Data Engineering For Data Scientists by Pete Fein | Data Science Global Summit 22.2

Talk and Workshop by Pete Fein, Consultant at Snakedev

A high-level introduction to data engineering for data scientists. In this fast-paced talk, you’ll learn how adopting data engineering best practices and tools can improve your data science projects and empower you to deliver better, more reliable results in record time. We’ll discuss data architecture and design principles, and explore open source tools you can use today, including:
- Running Jupyter notebooks in production using Papermill and nbdev
- Writing unit tests for your pandas and Spark dataframes with pandera
- Reusable SQL with dbt, an exciting new tool for data transformation that’s transforming data teams
- Workflow orchestration with Apache Airflow, a better approach than fragile and frustrating cron jobs or Lambdas
- Version controlling your data alongside your code with DVC
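
To illustrate why orchestration is described above as sturdier than cron jobs, here is a minimal sketch of dependency-ordered execution. It uses only Python's standard library `graphlib`, not Airflow's API; the task names and the `run_pipeline` helper are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "report": {"transform"},
}


def run_pipeline(deps, actions):
    """Run each task once, only after all of its upstream tasks have run."""
    completed = []
    for task in TopologicalSorter(deps).static_order():
        actions[task]()  # an exception here stops everything downstream
        completed.append(task)
    return completed


log = []
# Stand-in actions that just record that they ran.
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}
order = run_pipeline(dependencies, actions)
print(order)
```

A cron schedule, by contrast, would start each step at a fixed time whether or not the step before it had succeeded.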

Weak supervision uses weak signals to generate noisy labelled data from unlabelled data. Let me guide you through the process of using skweak, a weak supervision library for Natural Language Processing, to make your dataset creation process scale.

I will show you how to generate some noisy data using skweak, and then we will train an NLP model.
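
The idea of weak supervision can be sketched in plain Python. This example does not use skweak's API: the labelling functions, the word lists, and the simple majority-vote aggregation are all assumptions made for illustration, standing in for whatever the library provides.

```python
from collections import Counter

ABSTAIN = None  # a labelling function returns this when it has no opinion


def lf_capitalised(token):
    # Weak signal: capitalised tokens are often entity names.
    return "ENT" if token[:1].isupper() else ABSTAIN


def lf_tool_list(token):
    # Weak signal: a small, hypothetical list of known tool names.
    return "ENT" if token.lower() in {"airflow", "pandera", "papermill"} else ABSTAIN


def lf_stopword(token):
    # Weak signal: common function words are not entities.
    return "O" if token.lower() in {"we", "run", "with", "and", "the"} else ABSTAIN


def aggregate(token, labelling_functions):
    """Majority vote over the non-abstaining functions; default to "O"."""
    votes = [lf(token) for lf in labelling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return "O"
    return Counter(votes).most_common(1)[0][0]


lfs = [lf_capitalised, lf_tool_list, lf_stopword]
tokens = ["we", "run", "Airflow", "with", "pandera", "daily"]
noisy_labels = [aggregate(t, lfs) for t in tokens]
print(list(zip(tokens, noisy_labels)))
```

The resulting labels are noisy by construction, since each rule is only a heuristic, but they can be produced for any amount of unlabelled text and then used to train a model.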

#geekle #dataengineering #datascience
