How to use WordCount in Apache Beam

Welcome back to Getting Started with Apache Beam! In this episode, Debi Cabrera demonstrates how to process and transform data using Apache Beam with Python and Google Cloud Dataflow as the runner. Watch to see how you can use Apache Beam to count the words from Shakespeare’s King Lear as a batch data job and then try it out for yourself!

Chapters:
0:00 - Intro
0:40 - In this episode
1:06 - The pipeline
1:31 - The input file
1:46 - Direct runner
2:17 - Dataflow runner
2:57 - The pipeline code
4:07 - Dataflow in the Cloud Console
4:45 - The output file
5:15 - Wrap up
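
For readers who want to try the steps named in the chapters, here is a minimal sketch of a WordCount pipeline using the Apache Beam Python SDK. It is not the code shown in the video: the helper names (extract_words, format_result, run), the word-matching regex, and the output format are assumptions made for illustration. Beam is imported inside run() so the two helpers can be tried without Beam installed.

```python
import re


def extract_words(line):
    """Split one line of text into words (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", line)


def format_result(word, count):
    """Render one (word, count) pair as a line of output text."""
    return f"{word}: {count}"


def run(input_path, output_prefix, pipeline_args=None):
    """Build and run a batch word-count pipeline (hypothetical helper)."""
    # Imported here so the helpers above work without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # pipeline_args carries runner settings, e.g. ["--runner=DirectRunner"].
    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Split" >> beam.FlatMap(extract_words)
            | "Count" >> beam.combiners.Count.PerElement()
            | "Format" >> beam.MapTuple(format_result)
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

Calling run() with ["--runner=DirectRunner"] and local paths should execute the job on your own machine. Running on Dataflow would typically mean passing --runner=DataflowRunner together with --project, --region, and --temp_location values for your own Google Cloud project, plus input and output paths in Cloud Storage.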

#ApacheBeam

product: Cloud - General; fullname: Mark Mirchandani, Debi Cabrera;

Comments:

Good pace & excellent content, clearly communicated. Nice to see the walkthrough of a Python example using the Direct Runner too. Would be great to see some more in this series, perhaps with a focus on a best-practice pipeline development workflow, from local dev of pipeline code up to deployment into staging & production on GCP.

timantrobus

This is so so so so helpful. It is easy to understand and well produced.

therealjohnshelburne

Thank you for this video, Debi :) Can you please make one on how to create and deploy a custom pipeline on Dataflow using the CLI?

eyagrissia