Continuous Data Ingestion pipeline for the Enterprise

A continuous data ingestion platform built on NiFi and Spark that integrates a variety of data sources, including real-time events, data from external sources, and structured and unstructured data, with in-flight governance, providing a real-time pipeline that moves data from source to consumption in minutes. The next-gen data pipeline has helped eliminate legacy batch latency and improve data quality and governance through custom NiFi processors and embedded Spark code. To meet stringent regulatory requirements, the data pipeline is being augmented with in-flight ETL and data quality (DQ) checks that enable a continuous workflow, enhancing raw, unclassified data into enriched, classified data available for consumption by users and production processes.

Speaker:
Santosh Bardwaj
Vice President, Advanced Analytics & Decision Platforms
Discover Financial Services

Comments

Detailed but concise presentation, a great combination and indeed extremely helpful. Thank you for sharing.

kennethcarvalho

I don't recommend using NiFi for data ingestion. Flow provisioning is difficult to work with, IT testing is almost impossible, and using existing components and building flows becomes a nightmare as a flow grows. There is no easy way to add ingestion as a pipeline into an existing system (like Oozie).
Flow changes are difficult to track (no integration with git out of the box).

olegzastavnyi