Cloud-Native Stream Processing | Agari

Recorded at DataEngConf SF '17
Big Data companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. While this affords greater control over performance and cost, it often creates a division between operations and development that reduces agility and velocity. Operations sees its clients (developers) as internal and therefore rarely invests in self-service tooling, preferring archaic human-backed ticketing systems with human-scale turnaround SLAs (e.g. hours, days, or even months to get new machines). The public cloud is a game changer because all users of infrastructure are treated as external: every resource, from Kinesis or Pub/Sub streams to S3 or Cloud Storage buckets to DynamoDB or Bigtable tables, can be requested and provisioned on the fly!
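To make that self-service model concrete, here is a minimal sketch of provisioning a stream and a table directly from code with boto3. The resource names and capacity settings are illustrative placeholders, not anything from the talk.

```python
# A minimal sketch of self-service provisioning with boto3.
# The stream and table names are illustrative placeholders.
import boto3

kinesis = boto3.client("kinesis")
dynamodb = boto3.client("dynamodb")

# Request a new Kinesis stream on the fly: no ticket, no ops queue.
kinesis.create_stream(StreamName="events-ingest", ShardCount=2)

# Provision a DynamoDB table in the same self-service fashion.
dynamodb.create_table(
    TableName="events",
    KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```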
Thanks to recent cloud improvements, data infrastructure (e.g. databases, data pipelines, search engines, blob stores) can now not only be provisioned on the fly but also autoscaled and auto-healed without the developer being aware. EC2 autoscaling, introduced roughly eight years ago, is increasingly giving way to serverless (e.g. AWS Lambda) and fully-hosted (e.g. AWS ElastiCache, Elasticsearch, DynamoDB) approaches. Provisioning automation such as Chef, Puppet, and Ansible is likewise being supplanted by Terraform. Developers of data pipelines, whether predictive or ETL, can focus more on the differentiated aspects of their work, leaving the management of data infrastructure largely to AWS.
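As a rough illustration of the serverless model, here is a minimal sketch of a Lambda handler wired to a Kinesis event source. The event shape is the standard one Lambda delivers for Kinesis; the JSON payload format and the process() step are illustrative assumptions.

```python
# A minimal sketch of the serverless model: a Lambda handler attached to
# a Kinesis event source. AWS provisions, scales, and heals the runtime;
# the developer writes only the per-record logic. The JSON payload and
# the process() step are illustrative assumptions.
import base64
import json

def handler(event, context):
    # Lambda delivers Kinesis records base64-encoded under event["Records"].
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        process(payload)

def process(message):
    # Placeholder for the differentiated work (e.g. scoring a message).
    print(message)
```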
Agari, a leading email security company, is applying big & fast data best practices to both the security industry and the cloud in order to secure the world against email-borne threats. We do this by building near-real-time, predictive stream-processing data pipelines and control systems in the AWS cloud that are elastically scalable, highly available, low latency, and easy to manage.
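For flavor, here is a minimal sketch of the polling loop at the heart of such a stream consumer, assuming a boto3 Kinesis client. The stream name, shard id, and score() step are illustrative; a production pipeline would more likely use the Kinesis Client Library or Lambda.

```python
# A minimal sketch of a near-real-time consumer loop over a Kinesis
# stream. Stream name, shard id, and score() are illustrative; a real
# pipeline would typically use the Kinesis Client Library or Lambda.
import time
import boto3

kinesis = boto3.client("kinesis")

def score(data: bytes) -> None:
    # Placeholder for the predictive step of the pipeline.
    print(data)

# Start reading at the tip of a single shard.
shard_iterator = kinesis.get_shard_iterator(
    StreamName="events-ingest",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
    for record in out["Records"]:
        score(record["Data"])
    shard_iterator = out["NextShardIterator"]
    time.sleep(1)  # stay under per-shard read throughput limits
```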
Speaker: Sid Anand, Agari