006 Common Data Models: to use or not?

Episode 006 discusses the broader question of whether or not to use a common data model such as the Open Cybersecurity Schema Framework (OCSF). It outlines the options, including schema-on-write and schema-on-read approaches, and explains the trade-offs, especially in the context of the Databricks Lakehouse for cybersecurity. The video also demonstrates how you can query Zeek logs (extracted from PCAP using Zeek) directly in cloud storage, without ingesting that data into Delta Lake format, if you really need to trade performance for cost savings.
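To make the schema-on-read idea concrete, here is a minimal sketch in plain Python. It is not the approach shown in the video, where the Lakehouse engine would do this work; it is a standard-library stand-in, and the sample log lines, the `read_zeek_tsv` helper, and the `dns_sources` query are all made up for illustration. The point is that the raw Zeek TSV text is never converted or stored in another format: the `#fields` header is applied to each line only when a query runs.

```python
import io

# Hypothetical, trimmed conn.log sample in Zeek's tab-separated format.
# Real logs carry more header lines and many more columns.
SAMPLE_CONN_LOG = (
    "#path\tconn\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\tservice\n"
    "1700000000.000000\tCabc1\t10.0.0.5\t51234\t93.184.216.34\t443\ttcp\tssl\n"
    "1700000001.500000\tCabc2\t10.0.0.7\t40000\t10.0.0.1\t53\tudp\tdns\n"
    "1700000002.250000\tCabc3\t10.0.0.5\t51240\t10.0.0.9\t22\ttcp\t-\n"
)


def read_zeek_tsv(stream):
    """Yield one dict per record, applying the schema at read time."""
    fields = None
    for line in stream:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # The header names the columns; the first token is the tag itself.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            values = line.split("\t")
            # Zeek writes "-" for an unset field by default.
            yield {k: (None if v == "-" else v) for k, v in zip(fields, values)}


def dns_sources(stream):
    """Ad hoc query: which source hosts made DNS connections?"""
    return sorted(
        {r["id.orig_h"] for r in read_zeek_tsv(stream) if r["service"] == "dns"}
    )


result = dns_sources(io.StringIO(SAMPLE_CONN_LOG))
```

Every query pays the parsing cost again, which is the performance side of the trade-off; the saving is that no second copy of the data is written.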

All opinions expressed in the video are my own and do not necessarily reflect the views and opinions of my employers past or present.
Comments

This is mostly semantics, but I would still consider direct queries against the raw files to be part of the medallion model's bronze layer. I would register them as views over the raw files rather than materializing them as bronze Delta tables. Those bronze views could still serve as a source for further ELT into silver tables later if needed, and in the meantime they would stay available for ad hoc analysis.
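A rough way to picture the difference between a bronze view and a materialized table, sketched in plain Python rather than in any Lakehouse API. Everything here is hypothetical: the `bronze_view` and `build_silver` helpers, the file names, the sample records, and the silver column names. The view re-reads the raw JSON log files each time it is called, while the silver table is a stored snapshot produced by a later ELT step.

```python
import json
import tempfile
from pathlib import Path


def bronze_view(raw_dir):
    """Bronze 'view': re-parses the raw JSON lines on every call, stores nothing."""
    for path in sorted(Path(raw_dir).glob("*.log")):
        with path.open() as fh:
            for line in fh:
                if line.strip():
                    yield json.loads(line)


def build_silver(rows):
    """Later ELT step: materialize a cleaned subset with renamed columns."""
    return [
        {"time": r["ts"], "src_ip": r["id.orig_h"], "dst_ip": r["id.resp_h"]}
        for r in rows
    ]


with tempfile.TemporaryDirectory() as raw_dir:
    # First raw file lands in (simulated) cloud storage.
    (Path(raw_dir) / "conn.0.log").write_text(
        '{"ts": 1700000000.0, "id.orig_h": "10.0.0.5", "id.resp_h": "10.0.0.1"}\n'
    )
    # Silver is materialized once, from the view.
    silver = build_silver(bronze_view(raw_dir))

    # A second raw file arrives afterwards.
    (Path(raw_dir) / "conn.1.log").write_text(
        '{"ts": 1700000060.0, "id.orig_h": "10.0.0.7", "id.resp_h": "10.0.0.1"}\n'
    )
    view_count = sum(1 for _ in bronze_view(raw_dir))  # sees both files
    silver_count = len(silver)  # still the earlier snapshot
```

The view always reflects whatever raw files exist at query time, at the cost of parsing them again; the silver snapshot is cheap to read but has to be refreshed by rerunning the ELT step.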

Also, with different teams, use cases, and industry data model preferences evolving over time, there may be situations where the raw data needs to be mapped to multiple data models simultaneously, e.g. OCSF (relatively new) + CIM (established). In these cases, each data model would get its own discrete schema-on-read or schema-on-write mapping decision, based on how that model is intended to be used and how long it is expected to be maintained.
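One way to sketch that per-model decision, again in plain Python. This assumes CIM here means Splunk's Common Information Model, which the comment does not state. The target field names in both mappings are illustrative and written as flat dotted keys; they should be checked against the published schemas before use. The `apply_mapping` helper and the sample record are hypothetical.

```python
# Illustrative mappings from Zeek conn field names to two target models.
# Assumption: target names approximate OCSF and Splunk CIM; verify before use.
OCSF_MAP = {
    "ts": "time",
    "id.orig_h": "src_endpoint.ip",
    "id.resp_h": "dst_endpoint.ip",
    "id.resp_p": "dst_endpoint.port",
}
CIM_MAP = {
    "ts": "_time",
    "id.orig_h": "src",
    "id.resp_h": "dest",
    "id.resp_p": "dest_port",
}

# Hypothetical raw record standing in for the bronze layer.
RAW_ROWS = [
    {"ts": 1700000000.0, "id.orig_h": "10.0.0.5",
     "id.resp_h": "93.184.216.34", "id.resp_p": 443},
]


def apply_mapping(record, mapping):
    """Rename the source fields that the mapping covers; drop the rest."""
    return {target: record[source]
            for source, target in mapping.items() if source in record}


# Decision for model 1: schema-on-write. Mapped rows are stored once.
ocsf_rows = [apply_mapping(r, OCSF_MAP) for r in RAW_ROWS]


# Decision for model 2: schema-on-read. Mapping is applied per query.
def cim_view():
    return (apply_mapping(r, CIM_MAP) for r in RAW_ROWS)


cim_rows = list(cim_view())
```

Because each model has its own mapping and its own read/write choice, a model that is only needed temporarily can stay as a view and be dropped later without touching the stored tables of the other.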
