The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Production

The ‘feature store’ is an emerging concept in data architecture, motivated by the challenge of productionizing ML applications. Rapid iteration in experimental, data-driven research applications creates new challenges for data management and application deployment. These challenges are compounded in production ML pipelines, where modeling and featurization stages depend on one another. Large tech companies have published popular reference architectures for ‘feature stores’ that address some of these challenges, and an active open source ecosystem provides a full workbench of power tools. Still, the abstract role of the feature store can be a barrier to implementation. We demonstrate an implementation of a feature store as an orchestration engine for a mesh of ML pipeline stages using Spark and MLflow. This is a broader role than that of a metadata repository for feature discovery. The metadata in a feature store lets us reduce the unit of deployment to the individual ML pipeline stage, which breaks the anti-pattern of ‘clone and own’ ML pipelines. We isolate the concerns of pipeline orchestration and provide tooling for deployment management, A/B testing, discovery, telemetry and governance. We provide novel algorithms for pipeline stage orchestration, data models for feature stage metadata, and concrete system designs you can use to create a similar feature store using open source tools.
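The description does not spell out the orchestration algorithms or metadata models, so the following is only a minimal sketch of the general idea under stated assumptions, not the speakers' implementation. It assumes each pipeline stage is registered with the features it reads and writes, and it resolves a requested feature into an ordered list of stages using a plain topological sort from the Python standard library. The names `StageMeta` and `plan` and all stage and feature names are hypothetical; a real system would map each stage name to a Spark ML stage and a model version tracked in MLflow.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class StageMeta:
    """Hypothetical metadata record for one independently deployable stage."""
    name: str
    inputs: set   # feature (column) names the stage reads
    outputs: set  # feature (column) names the stage writes


def plan(stages, requested, raw_columns):
    """Return the names of the stages needed to produce `requested`, in run order."""
    # Map every produced feature to the stage that writes it.
    producer = {f: s for s in stages for f in s.outputs}
    needed = {}  # stage name -> names of the stages it depends on
    pending = list(requested)
    while pending:
        feature = pending.pop()
        if feature in raw_columns:
            continue  # already present in the source data
        if feature not in producer:
            raise KeyError(f"no registered stage produces {feature!r}")
        stage = producer[feature]
        if stage.name in needed:
            continue  # stage already scheduled
        needed[stage.name] = {
            producer[f].name
            for f in stage.inputs
            if f in producer and f not in raw_columns
        }
        pending.extend(stage.inputs)
    # static_order() yields dependencies before dependents; it raises CycleError on cycles.
    return list(TopologicalSorter(needed).static_order())


# Illustrative registry (all names are made up for this sketch).
stages = [
    StageMeta("tokenize", {"text"}, {"tokens"}),
    StageMeta("tfidf", {"tokens"}, {"tfidf_vec"}),
    StageMeta("length", {"text"}, {"text_len"}),
    StageMeta("churn_model", {"tfidf_vec", "text_len"}, {"churn_score"}),
    StageMeta("geo", {"ip"}, {"country"}),
]

order = plan(stages, requested={"churn_score"}, raw_columns={"text", "ip"})
print(order)
```

Because the plan is derived from metadata rather than hard-coded, two models that share the hypothetical `tokenize` stage would reuse one deployed copy of it instead of each cloning its own pipeline. The unrelated `geo` stage is left out of the plan because nothing requested depends on it.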
Key Takeaways:
- Understand the state of the feature store in industry with a survey of published reference architectures, open source repositories, and anecdotal client experiences.
- Take away concrete system designs and novel algorithms to inspire the design of your feature store.
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Connect with us: