Combining Transactional And Analytical Workloads On MemSQL with Nikita Shamgunov - Episode 51

Summary

One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. What if you didn’t have to do that at all? MemSQL is a distributed database built to run transactional, application-oriented workloads and high-volume analytical workloads concurrently on the same hardware. In this episode, the CEO of MemSQL describes how the company and database got started, how the database is architected for scale and speed, and how it is being used in production. This was a deep dive on how to build a successful company around a powerful platform, and how that platform simplifies operations for enterprise-grade data management.

Preamble



• Hello and welcome to the Data Engineering Podcast, the show about modern data management












• Your host is Tobias Macey and today I’m interviewing Nikita Shamgunov about MemSQL, a NewSQL database built for simultaneous transactional and analytic workloads



Interview



• Introduction


• How did you get involved in the area of data management?


• Can you start by describing what MemSQL is and how the product and business first got started?


• What are the typical use cases for customers running MemSQL?


• What are the benefits of integrating the ingestion pipeline with the database engine?



• What are some typical ways that the ingest capability is leveraged by customers?










• How is MemSQL architected and how has the internal design evolved from when you first started working on it?







• Where does it fall on the axes of the CAP theorem?









• How much processing overhead is involved in the conversion from the column oriented data stored on disk to the row oriented data stored in memory?




• Can you describe the lifecycle of a write transaction?

















• Can you discuss the techniques that are used in MemSQL to optimize for speed and overall system performance?





• How do you mitigate the impact of network latency throughout the cluster during query planning and execution?

...