Master Databricks and Apache Spark Step by Step: Lesson 9 - Creating the SQL Tables on Databricks

In this video, you will learn about the project use case, its data, and how to create and load the Spark SQL tables you'll need from the provided CSV files. This video lays the foundation for the ones that follow, so make sure you watch it and create your own database.
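
As a preview of the pattern the lesson walks through, here is a minimal sketch of registering one of the provided CSV files as a Spark SQL table from a Databricks notebook. The database name awproject is an assumption for illustration; the spark session object is predefined in Databricks notebooks.

    # Hypothetical database name; create it once so the tables have a home.
    spark.sql("CREATE DATABASE IF NOT EXISTS awproject")

    # Register a CSV already uploaded to DBFS as a schema-on-read table.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS awproject.dimdate
        USING CSV
        OPTIONS (
            path '/FileStore/tables/DimDate.csv',
            header 'true',
            inferSchema 'true'
        )
    """)

Once registered, the table can be queried by name from SQL or Python in any notebook attached to the cluster.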

Note: Video was re-edited to improve sound and uploaded again.

Join my Patreon Community and Watch this Video without Ads!

Example Slides & Notebook at:

You need to unzip the file and import the notebook into Databricks to run the code.

Video on The Data Science Process

Video on Dimensional Modeling

Databricks Spark SQL Data Types
Comments

This is a re-edited upload. I cleaned up the sound, removing some annoying background noise, and I cut out some parts that seemed unnecessary.

BryanCafferky

I've been watching these videos for a couple of days, and they are great. I have a Udemy account through my employer but the videos available there are lacking. They don't necessarily give a rhyme or reason why you want/need to do something, and they completely ignore the background of Databricks and Spark, just jumping straight into how to use notebooks. Bryan spends a lot more time explaining how and why you do something, which means you are more likely to figure out how to do what you want to do rather than simply memorizing commands.

TL;DR: This video series is much more valuable than the paid-for content on Udemy and, possibly, similar sites.

codyjackson

Hey @BryanCafferky, when I am adding the factinternetsalesreason table, it is also adding the header as the first data row even though I have set header = "false".

itsshehri
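
For anyone hitting the same issue: with header 'false', Spark treats the CSV's first line as ordinary data, so the column-name row shows up as a record. A hedged sketch of the fix in PySpark, assuming the file really does start with a header line (the path is illustrative):

    df = (spark.read
          .option("header", "true")      # consume the first line as column names
          .option("inferSchema", "true")
          .csv("/FileStore/tables/FactInternetSalesReason.csv"))  # illustrative path
    df.show(5)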

It's better to use a Python script with a for loop to create all the tables in one go, rather than writing multiple SQL statements that do more or less the same job, right?

phemasundar
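
That approach can work well; here is a minimal sketch of the loop idea, assuming each CSV uploaded to /FileStore/tables maps to a table of the same name (the file list below is illustrative, not the full set from the lesson):

    for name in ["DimDate", "DimProduct", "FactInternetSales"]:
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {name.lower()}
            USING CSV
            OPTIONS (
                path '/FileStore/tables/{name}.csv',
                header 'true',
                inferSchema 'true'
            )
        """)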

Hi Bryan, thanks for the content. I have a question:
Thinking of a real-life scenario, what would be the advantage of creating tables within Databricks/Spark rather than reading files from blob storage as dataframes and then writing the output to blob storage or an OLAP database?

Wouldn't having tables within Databricks add complexity to the storage layer?

I guess I am missing the use cases here.

Thanks a lot!

santicodaro
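
One way to see the trade-off in code: a registered table stores its schema and location in the metastore, so consumers query it by name, while the DataFrame-only approach repeats the path and options in every notebook that reads the data. A hedged sketch (path and table name illustrative):

    # Ad hoc: the path and options live in each notebook that reads the file.
    df = spark.read.option("header", "true").csv("/FileStore/tables/DimDate.csv")

    # Registered: the metadata lives in the metastore, so any notebook or
    # SQL user can query the data by name without knowing the file location.
    df.write.mode("overwrite").saveAsTable("dimdate")
    spark.sql("SELECT COUNT(*) FROM dimdate").show()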

These schema-on-read "tables" are persisted by Hive; what does this mean for user visibility? I.e., who can view and/or use this stored info?

DM-pypj
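
A quick way to explore what the metastore exposes is a sketch like the one below; actual visibility depends on workspace access controls, so treat the who-can-see-it question as something to verify with your admin:

    # List the tables the metastore knows about in the current database.
    spark.sql("SHOW TABLES").show()
    for t in spark.catalog.listTables():
        print(t.name, t.tableType)   # MANAGED vs EXTERNAL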

@Bryan Cafferky The updated UI does not let you load more than 10 files at a time (your practice above has 19). Any workarounds or suggestions?

Leez
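
One workaround is simply to upload the files in batches of ten (or use the Databricks CLI instead of the UI). A small sketch to confirm that all 19 files landed in DBFS afterward; dbutils is predefined in Databricks notebooks:

    # List everything uploaded to the default upload location.
    files = dbutils.fs.ls("/FileStore/tables/")
    for f in files:
        print(f.name, f.size)
    print(len(files), "files found")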

How do you avoid uploading duplicate CSV files? And what will be the cost impact on the Azure cloud from uploading duplicate files?

ashukol
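
On the duplicate question: the upload UI typically renames a colliding file (e.g. DimDate_1.csv) rather than overwriting it, and every stored copy is billed like any other blob until it is deleted. A hedged sketch for spotting stray copies; the _1/_2 suffix check is an assumption about the renaming pattern, so verify it against your own file names:

    names = [f.name for f in dbutils.fs.ls("/FileStore/tables/")]
    print("possible duplicates:", [n for n in names if "_1." in n or "_2." in n])
    # dbutils.fs.rm("/FileStore/tables/DimDate_1.csv")   # remove one; path illustrative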

I am getting this error: UnityCatalogServiceException: uri /FileStore/tables/DimDate.csv is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.

I added the dbfs scheme and the statement still won't run: "Creating table in Unity Catalog with file scheme dbfs is not supported. Instead, please create a federated data source connection using the CREATE CONNECTION command for the same table provider, then create a catalog based on the connection with a CREATE FOREIGN CATALOG command to reference the tables therein. SQLSTATE: 0AKUC"

hemalpbhatt
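
A common workaround when Unity Catalog refuses an external table over dbfs:/FileStore is to read the CSV into a DataFrame and save it as a managed table instead. A sketch, assuming your cluster can still read DBFS paths; the main.default catalog.schema name is illustrative:

    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/FileStore/tables/DimDate.csv"))
    # Writes a managed table governed by Unity Catalog rather than an
    # external table pointing at the DBFS file.
    df.write.mode("overwrite").saveAsTable("main.default.dimdate")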

I am getting this error, @BryanCafferky: Error in SQL statement: AnalysisException: Unable to infer schema for CSV. It must be specified manually.

muskangupta
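
That error usually means the reader found no rows to sample, most often because the path is wrong or the file is empty, so check the path first. Failing that, supply the schema explicitly as the message suggests; a sketch with two illustrative columns from the AdventureWorks DimDate file:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # Explicit schema replaces inference; extend with the remaining columns.
    schema = StructType([
        StructField("DateKey", IntegerType(), True),
        StructField("FullDateAlternateKey", StringType(), True),
    ])

    df = (spark.read
          .schema(schema)
          .option("header", "true")
          .csv("/FileStore/tables/DimDate.csv"))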