Amazon EMR Deep Dive and Best Practices - AWS Online Tech Talks

Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. In this tech talk, we introduce you to Amazon EMR design patterns and architectural best practices. We show how EMR can help you run petabyte-scale analysis at less than half the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. You'll learn how to spin up and spin down clusters as needed for short jobs and how to create highly available clusters that automatically scale to meet demand. Discover how Apache Hudi simplifies building data pipelines. You will also learn how to run EMR clusters on AWS Outposts for on-premises or hybrid deployments.
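As a rough illustration of the short-job versus long-running choice described above, here is a minimal sketch that builds a request for the EMR `RunJobFlow` API as exposed by boto3's `run_job_flow`. The helpers `build_cluster_request` and `launch` are hypothetical, and the release label, instance types, capacity limits, default IAM role names, and S3 script path are assumptions, not values taken from the talk.

```python
from typing import Any, Dict


def build_cluster_request(
    name: str,
    script_s3_uri: str,
    transient: bool = True,
    release_label: str = "emr-6.15.0",  # assumed release label
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow (hypothetical helper)."""
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                # Instance types and counts are illustrative assumptions.
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False means the cluster shuts itself down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": not transient,
        },
        "Steps": [{
            "Name": "spark-job",
            # A transient cluster should not linger after a failed step.
            "ActionOnFailure": "TERMINATE_CLUSTER" if transient else "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
            },
        }],
        # Assumed default role names; accounts may use different roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if not transient:
        # Long-running cluster: let EMR managed scaling resize it between limits.
        request["ManagedScalingPolicy"] = {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": 3,   # assumed lower bound
                "MaximumCapacityUnits": 10,  # assumed upper bound
            }
        }
    return request


def launch(request: Dict[str, Any]) -> str:
    """Submit the request; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so building a request needs no AWS SDK

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]
```

With `transient=True`, setting `KeepJobFlowAliveWhenNoSteps` to false is what lets a short job's cluster spin down on its own after its steps complete. The long-running variant stays up and relies on managed scaling to grow and shrink within the assumed limits.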

Learning Objectives:
* Learn how to design a big data environment using best practices
* Find out the best way to support Apache Spark, Hive, HBase and other open source applications
* See how to choose the right approach for both short- and long-running jobs



☁️ AWS Online Tech Talks cover a wide range of topics and expertise levels through technical deep dives, demos, customer examples, and live Q&A with AWS experts. Builders can choose from bite-sized 15-minute sessions, insightful fireside chats, immersive virtual workshops, interactive office hours, or watch on-demand tech talks at your own pace. Join us to fuel your learning journey with AWS.

#AWS
Comments

Thanks for the demo and presentation; it's rare to find such professionally structured material!

oldoctopus

This is an excellent presentation providing concise depth on a number of critical features

thekaders

In the scaling comparison, the time scales of the two graphs differ (12 hours vs. 14 days); I believe the graph would look much smoother over longer time frames.

mohitpandey

This is a marvel. I read a book with similar content, and it was a marvel to behold. "Mastering AWS: A Software Engineers Guide" by Nathan Vale

Larry

Great video. Can we get that Step Functions code?

bugzia

I'm quite intrigued by the small file problem. Say there is a table with 10 records, and we convert them into a Parquet file. Where would that cause problems? During external table creation, while running a query, or during an analytics job performing complex calculations?

vigneshvit

Very informative! Since the EMR File System (EMRFS) provides access to S3, can I use EMRFS as the storage layer and run Spark on top of it to handle big data? Is it possible to replace Hadoop with S3?

avitabayansarma

Great Video. To the point, clear and very helpful. THANK YOU!

friendslives

Fantastic demo, and the presentation is rich and cool; it's very helpful for designing production-ready solutions.

saleembasha

Great presentation, covering key features, well explained, and with demos as well. Thanks very much ;)

leandromana

Is the step function code open sourced?

mwont

Ditto on the Step Functions code. Great video though!

justindebo

A lot of information, for sure, but not explained well enough to actually understand it.

manjuchoudhary