7 steps to build your own data pipeline

Kumar Pushpesh, CTO and co-founder of Moonfrog Labs, explains why the company built its data infrastructure in parallel with its games and products. Topics covered:

1. Having a scalable system for data ingestion
2. Data design
3. Querying interface - why stick to SQL?
4. Query interface users
5. Data ingestion
6. High-throughput stats service
7. Thin client: Badger
8. High-throughput ingestion backend
9. Hot loading to Redshift
10. Data warehousing
11. Data design in Redshift and the data lake
12. Tuning for scale
13. Handling the querying patterns of product managers and data scientists
14. S3 as a data lake
15. On-demand data loading and querying: OnDemand Table(s)
16. Flexibility for complicated analysis: ad hoc Redshift cluster(s)
17. Scaling up
18. Typical bottlenecks and solutions we tried

Comments

Surely one of the best talks that I have seen in a long while. Thanks Puspesh Kr., for an insightful talk in a lucid and no BS style.

KarthikSirasanagandla

Thanks for this great talk, with an abundance of knowledge on data engineering.

venkateshkalyan

I think this kingdom, phylum... schema is genius.

christopherdismuke