Data Modeling for BigQuery (Google Cloud Next '17)

BigQuery is a different data warehouse, permitting new approaches to data modeling. To get the most out of this system, Dan McClary and Daniel Mintz examine where old assumptions of schema design come from, as well as how BigQuery allows them to challenge those assumptions to produce data models which are easier to query and more performant. Additionally, they examine the parallel evolution of assumptions in Business Intelligence, and how modern tools such as Looker can take full advantage of BigQuery's flexible data models.

Comments

The real BigQuery modeling part starts at 39:40. Other than that, it's a history lesson and Looker doing some promotion.

rendybjunior

Yes, at least the first part is like a history lesson. Unfortunately, several parts of it are inaccurate. To mention a few:
(0) We are not modeling data -- the whole point is to build a representation of things, relationships among them, and constraints or business rules in some user domain. We are modeling some aspect of the business (presuming that we will implement that in some database).
(1) Codd's 1970 paper said nothing about normalization as we understand it today. That all came later.
(2) In 1970 we actually had a richer data modeling scheme in CODASYL network structures (formally defined relationships on COBOL files, which were a hierarchical structure). Codd's relational scheme retreated from the richer structure by requiring that all data attributes be single valued (now called 1NF). One consequence of that is the need to resolve all M:N relationships using an intersection entity -- an artificial construct which is only needed for the convenience of the system, an RDBMS. Users can understand M:N relationships even if our systems can't! That is where nesting comes in (or returns).
The focus here is on the efficient processing of large volumes of data, rather than modeling some users' world as accurately and completely as possible to gain understanding. Before we can make sense of all the data we gather, we must first understand the data, or rather the world of which it is a model.

GordonEverest
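
As a rough illustration of point (2) in the comment above, here is a minimal sketch of the same M:N relationship expressed two ways: flat, with an intersection entity, and nested, with the related records embedded in their parent. The tables (`orders`, `products`, `order_items`), their fields, and the helper `nest_orders` are hypothetical names invented for this example; they are not taken from the talk or from BigQuery itself.

```python
# Hypothetical sample data: orders and products are in an M:N relationship.
orders = [{"order_id": 1}, {"order_id": 2}]
products = [
    {"product_id": "A", "name": "widget"},
    {"product_id": "B", "name": "gadget"},
]

# Flat (1NF) form: the M:N relationship is resolved through an
# intersection entity holding one row per (order, product) pair.
order_items = [
    {"order_id": 1, "product_id": "A"},
    {"order_id": 1, "product_id": "B"},
    {"order_id": 2, "product_id": "A"},
]


def nest_orders(orders, products, order_items):
    """Fold the intersection rows into a repeated 'items' field per order."""
    names = {p["product_id"]: p["name"] for p in products}
    grouped = {}
    for row in order_items:
        grouped.setdefault(row["order_id"], []).append(
            {"product_id": row["product_id"], "name": names[row["product_id"]]}
        )
    # Orders without any items get an empty list rather than a missing key.
    return [
        {"order_id": o["order_id"], "items": grouped.get(o["order_id"], [])}
        for o in orders
    ]


nested = nest_orders(orders, products, order_items)
```

In the nested form each order carries its own list of products, so a reader sees the relationship directly, at the cost of repeating product details wherever a product appears.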

I just went through this today, due to a closer interest in BigQuery.
To me the speakers come through as inadequately rounded in the data modelling discipline.
For example, there are repeated assertions that data modelling is done to gain performance.
As seasoned data modellers know, logical data models depict the reality and the semantics.
I felt that their demeanor is condescending towards data modelling (i.e., "we are the new world. whatever we do is right/better").
All said and done, it appears that BigQuery is like an XML store or JSON store due to its nesting.
Totally agree that the economics of storage and the accessibility to distributed compute capacity warrants due diligence in designing the physical storage of the data or using the undesigned - as-is - stored data.

madanibasha

Very insightful. At 51:38, I would name the table "products", instead of "items".

anandakumarsanthinathan

So LookML is the language for the Looker tool, and I can't write ETL in LookML unless we opt for Looker.

AshishGujalwar

I am sorry, I really like the Google Cloud stack, but a lot of the things said here in the first 20 minutes are just plain false...

bashyroger

Bottom line: no database gives you what you want until you ask for it properly. If you don't know how to retrieve data and rely on some kind of LM, then good luck.

santhoshchanna

feels like a long, boring history lesson. sorry.

mlugggy