On the path to building an event-monitoring data pipeline for storage microservices

Watch Narendra Narang (Principal Specialist Solutions Architect for Cloud Storage, Red Hat) and Daniel Smith (Operations and Infrastructure, Kaloom) speak in this breakout session at Red Hat Summit 2017.

Over the past few years, telecommunications companies have invested significantly in large, multi-tenant cloud deployments and aspire to operate with high efficiency by running and orchestrating a suite of infrastructure microservices at scale. These cloud deployments and services are backed by systems that are typically shared by tenants. As these environments scale, it becomes imperative to streamline operational control by managing these systems with a higher degree of predictability.

In this talk, we will present a production, multi-tenant (containerized) compute and software-defined storage (Red Hat Ceph and Gluster) microservices architecture, discuss the typical workloads imposed on the system, and examine key events generated by the discrete components that make up the overall system. We will then propose a set of open-source technologies used to build a pipeline that covers collection of these events, appropriate storage of these events, and a Kappa (analytics) architecture that uses an in-memory analytics engine, Apache Spark, to correlate these events and extract operational insight with predictability.

We argue that event data is voluminous, has numerous properties and, in some cases, may require grouping multiple events to create a more complex event. Furthermore, characterizing and evaluating complex events in order to predictively recognize and avert potential issues depends heavily on the temporal relevance of the event data concerned. We hope to demonstrate how simple machine learning algorithms may be applied to extract typical (affinity) and atypical (interference) patterns of resource behavior within the shared, distributed system, and how building a parametric model with suitable weightings for different features may help make these predictions more accurately and trigger the appropriate remedial action.
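To make the idea of grouping several simple events into one complex event more concrete, here is a minimal sketch in plain Python. It is not the pipeline presented in the talk: the `Event` fields, the event kinds (`slow_request`, `high_latency`), the source names and the 30-second window are all hypothetical, chosen only to illustrate why the temporal relevance of events matters when correlating them.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point (assumed unit)
    source: str       # hypothetical component name, e.g. "osd.3"
    kind: str         # hypothetical event type, e.g. "slow_request"


def detect_complex_events(events, required_kinds, window_seconds):
    """Return (source, start, end) tuples for every source that emitted
    all of `required_kinds` within `window_seconds` of each other."""
    recent = defaultdict(deque)  # per-source events still inside the window
    found = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        window = recent[ev.source]
        window.append(ev)
        # Drop events that are too old to be temporally relevant.
        while window and ev.timestamp - window[0].timestamp > window_seconds:
            window.popleft()
        # A complex event fires once every required kind is in the window.
        if required_kinds <= {e.kind for e in window}:
            found.append((ev.source, window[0].timestamp, ev.timestamp))
            window.clear()  # avoid reporting the same group twice
    return found


if __name__ == "__main__":
    sample = [
        Event(0, "osd.3", "slow_request"),
        Event(5, "osd.3", "high_latency"),
        Event(100, "osd.7", "slow_request"),
        Event(200, "osd.7", "high_latency"),  # too late to be correlated
    ]
    for hit in detect_complex_events(sample, {"slow_request", "high_latency"}, 30):
        print(hit)
```

In this sample, only the first source produces a complex event, because its two events fall within the window; the second source's events are too far apart in time. In a streaming engine such as Apache Spark the same grouping would typically be expressed with windowed aggregations rather than an in-memory loop.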
