True ML Talks #19 | Deep Diving into the Machine Learning Pipeline in the Ads Team at Pinterest

In this #TrueMLTalks edition, we're excited to host Aayush, a senior ML engineer at Pinterest. He has been with the company for the last 5 years, working primarily in the conversion optimization product department. Because the product was still in its development stage, he had the chance to build it from the ground up into what it is today. His work has focused mostly on ads personalization, which plays a major role in shaping buying behaviour.

Here he gave us great insights into the ML infrastructure at Pinterest and the various models the team uses. We also discussed access control, the application of large language models at Pinterest, and much more.

00:00: Start
00:15: Introduction of the guest
07:00: Pinterest's active production ML models count and their request volume
08:30: Challenges and ML's impact on the ad funnel process
15:40: Overview of the ML platform architecture driving diverse model types
24:54: Incorporation of reserved instances and the team's operational approach
26:22: Using the feature store to cut storage costs
32:14: What's the access control setup with Airflow and MLflow?
35:15: Flow of access control and its hierarchy
40:12: Model server selection, latency optimization, and system benefits
42:12: Pinterest's CPU-to-GPU switch: optimizing large language models
44:13: Migrations Pinterest had to undergo in its ML systems
49:40: Pinterest's move from C++ serving to native serving: advantages
53:43: Guidance for migrating self-built ML systems
55:50: Has Pinterest always used Kubernetes, and why prefer it over other open-source systems?

We engaged in a captivating dialogue with Aayush, covering the following topics:
✅ How many ML models does Pinterest have in production, and how many requests do these models serve?
✅ Challenges faced and how ML impacts the entire ad funnel process
✅ Overview of the overall architecture of the ML platform layer that powers different types of models
✅ How do reserved instances come into the picture, and how does the team function around them?
✅ Has using a feature store reduced Pinterest's infrastructure storage costs?
✅ How has access control been built with Airflow and MLflow in the picture?
✅ Flow of access control and its hierarchy
✅ Choice of model servers, their advantages, and optimization from a latency perspective
✅ How did Pinterest optimize large language models while switching from CPU to GPU?
✅ How many major migrations did Pinterest have to undergo in its ML systems?
✅ Why did Pinterest shift from C++ serving to native serving, and what advantages were observed?
✅ Advice on migrating when building your own infrastructure for an ML system
✅ Has Pinterest always used Kubernetes, and why is it preferred over other open-source systems?



ABOUT OUR GUEST

Aayush started his career at Pinterest 5 years ago and has since worked in the conversion optimization product department. During the product's development phase, he had the opportunity to contribute from its inception, playing an instrumental role in shaping it into its current state. His primary focus has been ads personalization, a crucial factor influencing consumer purchasing patterns.



ABOUT OUR CHANNEL

In our video series TrueML Talks, we speak with machine learning experts from organizations including Gong, Intuit, Salesforce, Facebook, DoorDash, and others. The series offers an informed view of their experiences managing complex ML pipelines and building successful best practices, making it a useful resource for professionals who want to stay up to date on the latest developments in the field.
_____________________________________
ABOUT TRUEFOUNDRY

TrueFoundry is a PaaS for cross-cloud machine learning deployment that enables businesses to speed up model testing and deployment while preserving full security and control for the Infra/DevOps team. We give machine learning teams the ability to deploy and monitor models with 100% reliability and scalability in just 15 minutes, saving money and getting models into production faster, which generates real business value. To address data privacy and other security concerns, we deploy on the customer's own infrastructure.




