Building a Reliable & Reusable Feature Repository to Accelerate Model Development
Speaker:
Lan Yao - Data Scientist, Loblaw Digital
Abstract:
At Loblaw Digital, we have abundant data resources, yet the data needs a lot of processing before we can build predictive models or run analyses on it. Data engineers, data analysts, and data scientists repeat the same time-consuming work to understand the business logic within and across data components before they can get the features and datasets they need.
Data discovery and data generation became the most challenging steps before putting ML solutions in production. To overcome these difficulties, we enrich millions of transactions from a variety of sources using dbt (data build tool), with data quality checks applied along the way.
The pipelines are scheduled using Airflow DAGs and write their output to a single, scalable, consolidated repository. These features give our teams a quicker turnaround when developing solutions.