Intro to Big Data AppHub: Kinesis to S3 App Template using AWS & HDFS to S3 Sync App Template

Abstract:

To make critical business decisions in real time, many businesses today rely on a variety of data, which arrives in large volumes. Variety and volume together make big data applications complex operations. Big data applications require businesses to combine transactional data with structured, semi-structured, and unstructured data for deep and holistic insights.

And time is of the essence: to derive the most valuable insights and drive key decisions, large amounts of data have to be continuously ingested into Hadoop data lakes as well as other destinations. As a result, data ingestion is the first challenge businesses must overcome before embarking on data analysis.

With its Application Templates for ingestion, DataTorrent allows users to:

- Ingest vast amounts of data with enterprise-grade operability and performance guarantees provided by the underlying Apache Apex framework. Those guarantees include fault tolerance, linear scalability, high throughput, low latency, and end-to-end exactly-once processing.
- Quickly launch template applications to ingest raw data, with an easy, iterative way to add business logic and processing steps such as parse, dedupe, filter, transform, and enrich to ingestion pipelines.
- Visualize metrics on throughput, latency, and app data in real time throughout execution.
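To make the idea of adding processing steps to an ingestion pipeline concrete, here is a minimal plain-Python sketch of parse, dedupe, filter, and enrich stages chained together. It is an illustration only and does not use the Apache Apex or DataTorrent APIs; the record fields (`id`, `country`, `amount`), the `region` lookup, and all function names are assumptions made up for this example.

```python
import json


def parse(lines):
    """Parse each raw line as a JSON object; skip anything that is not one."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def dedupe(records, key="id"):
    """Drop records whose key value has already been seen."""
    seen = set()
    for record in records:
        value = record.get(key)
        if value in seen:
            continue
        seen.add(value)
        yield record


def keep(records, predicate):
    """Filter: pass through only records for which predicate is true."""
    for record in records:
        if predicate(record):
            yield record


def enrich(records, lookup):
    """Add a 'region' field derived from the (hypothetical) 'country' field."""
    for record in records:
        yield {**record, "region": lookup.get(record.get("country"), "unknown")}


def run_pipeline(raw_lines, lookup):
    """Chain the stages; each stage consumes the previous stage lazily."""
    records = parse(raw_lines)
    records = dedupe(records)
    records = keep(records, lambda r: r.get("amount", 0) > 0)
    records = enrich(records, lookup)
    return list(records)
```

Because each stage is a generator, stages can be added, removed, or reordered one at a time, which loosely mirrors the iterative approach described above.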

In this webinar, we will also show you how seamless it is to download and run the app template on your AWS account with the AWS integration scripts.

Template descriptions:

Kinesis to S3 App: The Kinesis S3 Application Template continuously ingests messages from Kinesis and uploads them to Amazon S3.

HDFS to S3: The HDFS S3 Sync Application Template continuously ingests files as blocks and backs up Hadoop HDFS data to Amazon S3.
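As a rough illustration of what "ingesting files as blocks" can mean, the sketch below reads a byte stream in fixed-size blocks and hands each block, with its offset, to an uploader. This is not the template's implementation: `iter_blocks`, `copy_in_blocks`, and the `upload_block` callback are hypothetical names, and the callback merely stands in for whatever would actually write a block to Amazon S3.

```python
import io


def iter_blocks(stream, block_size):
    """Yield (offset, data) pairs by reading the stream block_size bytes at a time."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    offset = 0
    while True:
        data = stream.read(block_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


def copy_in_blocks(stream, upload_block, block_size=4):
    """Send every block to upload_block(offset, data); return the number of blocks."""
    count = 0
    for offset, data in iter_blocks(stream, block_size):
        upload_block(offset, data)  # hypothetical stand-in for an S3 write
        count += 1
    return count


if __name__ == "__main__":
    uploaded = {}
    source = io.BytesIO(b"0123456789")
    blocks = copy_in_blocks(source, lambda off, data: uploaded.update({off: data}))
    print(blocks, uploaded)
```

Tracking the offset of each block is what would let blocks be uploaded independently and reassembled in order at the destination.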

Presenters:

Ashwin Putta, Product Manager at DataTorrent, Committer for Apache Apex

Sanjay Pujare, Engineer at DataTorrent