How We Scaled Bert To Serve 1+ Billion Daily Requests on CPU

Roblox is a global online platform that brings millions of people together through play, with over 37 million daily active users and millions of games. Machine learning is a key part of our ability to scale important services to our massive community. In this talk, we share our journey of scaling our deep learning text classifiers to process 50k+ requests per second at latencies under 20ms. We show how we made BERT not only fast enough for our users, but also economical enough to run in production at a manageable cost on CPU.
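The abstract does not spell out the specific optimizations, but a common recipe for CPU BERT serving combines a smaller distilled model, dynamic int8 quantization, and short input sequences. Below is a minimal PyTorch sketch of the quantization step; the checkpoint name and sequence length are illustrative assumptions, not the configuration described in the talk.

```python
# Minimal sketch (not necessarily Roblox's exact pipeline): dynamic int8
# quantization is one common way to speed up BERT-style classifiers on CPU.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # placeholder checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Quantize the Linear layers to int8: weights are stored quantized and
# activations are quantized dynamically at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Truncating to short sequences also cuts latency, since self-attention
# cost grows with sequence length.
inputs = tokenizer(
    "is this text safe?", truncation=True, max_length=64, return_tensors="pt"
)
with torch.no_grad():
    logits = quantized_model(**inputs).logits
print(logits.softmax(dim=-1))
```

On typical server CPUs, quantizing the fully connected layers this way reduces memory traffic and lets inference use int8 kernels, which is why it is a frequent first step before heavier changes such as distillation or custom runtimes.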

Comments

Great walkthrough, thanks! Do you use the NVIDIA Triton Inference Server? It works with CPU applications as well and may add some further optimizations.

kjkszpjab