Cloud-Native Stream Processing | Agari

Recorded at DataEngConf SF '17
Big Data companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. While this affords greater control over performance and cost, it often creates a division between operations and development that reduces agility and velocity. Operations sees its clients (developers) as internal and therefore rarely invests in self-service tooling, preferring archaic human-backed ticketing systems with human-scale turnaround SLAs (e.g. hours, days, or even months to get new machines). The public cloud is a game changer because all users of infrastructure are treated as external: every resource, from Kinesis or Pub/Sub streams to S3 or Cloud Storage buckets to DynamoDB or Bigtable tables, can be requested and provisioned on the fly!
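To make that self-service model concrete, here is a minimal sketch of provisioning a stream and a table directly from code with boto3. The resource names and capacity settings are illustrative placeholders, not anything from the talk.

```python
# A minimal sketch of self-service provisioning with boto3.
# The stream and table names are illustrative placeholders.
import boto3

kinesis = boto3.client("kinesis")
dynamodb = boto3.client("dynamodb")

# Request a new Kinesis stream on the fly: no ticket, no ops queue.
kinesis.create_stream(StreamName="events-ingest", ShardCount=2)

# Provision a DynamoDB table in the same self-service fashion.
dynamodb.create_table(
    TableName="events",
    KeySchema=[{"AttributeName": "event_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "event_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```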
Thanks to recent cloud improvements, data infrastructure (e.g. databases, data pipelines, search engines, blob stores) can now not only be provisioned on the fly but also autoscaled and auto-healed without the developer being aware. EC2 autoscaling, introduced roughly eight years ago, is increasingly giving way to serverless (e.g. AWS Lambda) and fully-hosted (e.g. AWS ElastiCache, Elasticsearch, DynamoDB) approaches. Provisioning automation such as Chef, Puppet, and Ansible is likewise being supplanted by Terraform. Developers of data pipelines, whether predictive or ETL, can focus more on the differentiated aspects of their work, leaving the management of data infrastructure largely to AWS.
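As a rough illustration of the serverless model, here is a minimal sketch of a Lambda handler wired to a Kinesis event source. The event shape is the standard one Lambda delivers for Kinesis; the JSON payload format and the process() step are illustrative assumptions.

```python
# A minimal sketch of the serverless model: a Lambda handler attached to
# a Kinesis event source. AWS provisions, scales, and heals the runtime;
# the developer writes only the per-record logic. The JSON payload and
# the process() step are illustrative assumptions.
import base64
import json

def handler(event, context):
    # Lambda delivers Kinesis records base64-encoded under event["Records"].
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        process(payload)

def process(message):
    # Placeholder for the differentiated work (e.g. scoring a message).
    print(message)
```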
Agari, a leading email security company, is applying big & fast data best practices to both the security industry and the cloud in order to secure the world against email-borne threats. We do this by building near-real-time, predictive stream-processing data pipelines and control systems in the AWS cloud that are elastically scalable, highly available, low latency, and easy to manage.
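For flavor, here is a minimal sketch of the polling loop at the heart of such a stream consumer, assuming a boto3 Kinesis client. The stream name, shard id, and score() step are illustrative; a production pipeline would more likely use the Kinesis Client Library or Lambda.

```python
# A minimal sketch of a near-real-time consumer loop over a Kinesis
# stream. Stream name, shard id, and score() are illustrative; a real
# pipeline would typically use the Kinesis Client Library or Lambda.
import time
import boto3

kinesis = boto3.client("kinesis")

def score(data: bytes) -> None:
    # Placeholder for the predictive step of the pipeline.
    print(data)

# Start reading at the tip of a single shard.
shard_iterator = kinesis.get_shard_iterator(
    StreamName="events-ingest",
    ShardId="shardId-000000000000",
    ShardIteratorType="LATEST",
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=shard_iterator, Limit=100)
    for record in out["Records"]:
        score(record["Data"])
    shard_iterator = out["NextShardIterator"]
    time.sleep(1)  # stay under per-shard read throughput limits
```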
Speaker: Sid Anand, Agari