What Is A Data Lakehouse? A Super Simple Explanation For Anyone

First, there was the data warehouse: an information storage architecture that allowed structured data to be archived for specific business intelligence and reporting purposes. The concept dates back to the 1980s and served businesses well for several decades, until the dawn of the Big Data era.
This was when businesses began to unlock the value of working with unstructured data: messy, raw information that might come in the form of pictures, videos, or sound recordings. This type of data typically makes up 80 to 90% of the information available to organizations and often holds a phenomenal amount of value. Think of the insights contained in years' worth of customer email communications or hours of production line video footage. Unfortunately, it doesn't fit well with the structured and ordered way information is stored in the data warehouse model.
This led to the development of a different type of architecture known as the data lake, where unstructured information is stored in its raw format, ready for whatever uses we may be able to find for it, now or in the future. The data lake is undoubtedly a hugely powerful and flexible architecture. However, it does have some issues. For a start, as you can imagine, it can get very messy. In fact, I've heard it said that if they aren't careful, businesses can end up with something that more closely resembles a data swamp! This can create governance and privacy issues, as well as the technical complexity of building systems that can ingest data in a myriad of schemas and formats.
So today, businesses and other organizations that work with datasets that could be considered Big Data have yet another option when it comes to storage architecture. Just as with cloud platforms in general, in data storage we are increasingly hearing about a hybrid architecture, known as the "data lakehouse" approach.
There are no prizes for guessing that the fundamental idea behind this approach is to take the best concepts from both the data warehouse and data lake models and put them together, while trying to eliminate the worst aspects of each! Just like a data lake, a data lakehouse is built to house both structured and unstructured data. This means that businesses that can benefit from working with unstructured data (which is pretty much any business) need only one data repository rather than both warehouse and lake infrastructure.
Where organizations do use both, data in the warehouse generally feeds BI analytics, while data in the lake is used for data science (which could include artificial intelligence (AI) such as machine learning) and as storage for future, as-yet-undefined use cases. Data lakehouses allow structure and schema like those used in a data warehouse to be applied to the kind of unstructured data that would typically be stored in a data lake.
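
To make the idea of applying a warehouse-style schema to raw lake data a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sample records, the `SCHEMA` mapping, and the `apply_schema` helper are all hypothetical, and it does not use or represent the API of any real lakehouse product.

```python
import json

# Hypothetical raw "lake" contents: JSON lines kept exactly as they arrived,
# with inconsistent types and fields.
RAW_LINES = [
    '{"customer_id": "17", "channel": "email", "body": "Where is my order?"}',
    '{"customer_id": 42, "channel": "chat", "body": "Thanks!", "extra": true}',
    '{"channel": "email", "body": "No customer id here"}',
]

# Hypothetical schema: column name -> Python type each value is coerced to.
SCHEMA = {"customer_id": int, "channel": str, "body": str}


def apply_schema(raw_lines, schema):
    """Split raw records into schema-conforming rows and rejected records.

    The raw lines are never modified; the schema is applied when reading.
    """
    rows, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            # Keep only the schema's columns, coercing each to its type.
            rows.append({name: cast(record[name]) for name, cast in schema.items()})
        except (KeyError, TypeError, ValueError):
            # A missing column or an uncoercible value fails the schema check.
            rejected.append(record)
    return rows, rejected


rows, rejected = apply_schema(RAW_LINES, SCHEMA)
print(len(rows), "rows fit the schema;", len(rejected), "rejected")
```

The raw records stay untouched, as they would in a data lake, while the rows that pass the schema check are uniform enough to query the way you would query a warehouse table.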
