Master Databricks and Apache Spark Step by Step: Lesson 9 - Creating the SQL Tables on Databricks

In this video, you will learn about the project use case, its data, and how to create and load the Spark SQL tables you'll need from the provided CSV files. This video lays the foundation for the ones that follow, so make sure you watch it and create your own database.
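
As a preview of the pattern the lesson walks through, here is a minimal sketch of registering one of the provided CSV files as a Spark SQL table from a Databricks notebook. The database name awproject is an assumption for illustration; the spark session object is predefined in Databricks notebooks.

    # Hypothetical database name; create it once so the tables have a home.
    spark.sql("CREATE DATABASE IF NOT EXISTS awproject")

    # Register a CSV already uploaded to DBFS as a schema-on-read table.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS awproject.dimdate
        USING CSV
        OPTIONS (
            path '/FileStore/tables/DimDate.csv',
            header 'true',
            inferSchema 'true'
        )
    """)

Once registered, the table can be queried by name from SQL or Python in any notebook attached to the cluster.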

Note: Video was re-edited to improve sound and uploaded again.

Join my Patreon Community and Watch this Video without Ads!

Example Slides & Notebook at:

You need to unzip the file and import the notebook into Databricks to run the code.

Video on The Data Science Process

Video on Dimensional Modeling

Databricks Spark SQL Data Types
Comments

This is a re-edited upload. I cleaned up the sound, removing some annoying background noise, and I cut out some parts that seemed unnecessary.

BryanCafferky

I've been watching these videos for a couple of days, and they are great. I have a Udemy account through my employer but the videos available there are lacking. They don't necessarily give a rhyme or reason why you want/need to do something, and they completely ignore the background of Databricks and Spark, just jumping straight into how to use notebooks. Bryan spends a lot more time explaining how and why you do something, which means you are more likely to figure out how to do what you want to do rather than simply memorizing commands.

TL;DR: This video series is much more valuable than the paid-for content on Udemy and, possibly, similar sites.

codyjackson

Hey @BryanCafferky, when I am adding the factinternetsalesreason table, it is also adding the header as the first data row even though I have set header = "false".

itsshehri
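
For anyone hitting the same issue: with header 'false', Spark treats the CSV's first line as ordinary data, so the column-name row shows up as a record. A hedged sketch of the fix in PySpark, assuming the file really does start with a header line (the path is illustrative):

    df = (spark.read
          .option("header", "true")      # consume the first line as column names
          .option("inferSchema", "true")
          .csv("/FileStore/tables/FactInternetSalesReason.csv"))  # illustrative path
    df.show(5)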

It's better to use a Python script with a for loop to create all the tables in one go, rather than writing multiple SQL statements that do more or less the same job, right?

phemasundar
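
That approach can work well; here is a minimal sketch of the loop idea, assuming each CSV uploaded to /FileStore/tables maps to a table of the same name (the file list below is illustrative, not the full set from the lesson):

    for name in ["DimDate", "DimProduct", "FactInternetSales"]:
        spark.sql(f"""
            CREATE TABLE IF NOT EXISTS {name.lower()}
            USING CSV
            OPTIONS (
                path '/FileStore/tables/{name}.csv',
                header 'true',
                inferSchema 'true'
            )
        """)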

Hi Bryan, thanks for the content. I have a question:
Thinking of a real-life scenario, what would be the advantage of creating tables within Databricks/Spark rather than reading files from blob storage as dataframes and then writing the output to blob storage or an OLAP database?

Wouldn't having tables within Databricks add complexity to the storage layer?

I guess I am missing the use cases here.

Thanks a lot!

santicodaro
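
One way to see the trade-off in code: a registered table stores its schema and location in the metastore, so consumers query it by name, while the DataFrame-only approach repeats the path and options in every notebook that reads the data. A hedged sketch (path and table name illustrative):

    # Ad hoc: the path and options live in each notebook that reads the file.
    df = spark.read.option("header", "true").csv("/FileStore/tables/DimDate.csv")

    # Registered: the metadata lives in the metastore, so any notebook or
    # SQL user can query the data by name without knowing the file location.
    df.write.mode("overwrite").saveAsTable("dimdate")
    spark.sql("SELECT COUNT(*) FROM dimdate").show()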

These schema-on-read "tables" are persisted by Hive; what does this mean for user visibility? I.e., who can view and/or use this stored info?

DM-pypj
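
A quick way to explore what the metastore exposes is a sketch like the one below; actual visibility depends on workspace access controls, so treat the who-can-see-it question as something to verify with your admin:

    # List the tables the metastore knows about in the current database.
    spark.sql("SHOW TABLES").show()
    for t in spark.catalog.listTables():
        print(t.name, t.tableType)   # MANAGED vs EXTERNAL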

@Bryan Cafferky The updated UI does not let you load more than 10 files at a time (your practice above has 19). Any workarounds or suggestions?

Leez
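
One workaround is simply to upload the files in batches of ten (or use the Databricks CLI instead of the UI). A small sketch to confirm that all 19 files landed in DBFS afterward; dbutils is predefined in Databricks notebooks:

    # List everything uploaded to the default upload location.
    files = dbutils.fs.ls("/FileStore/tables/")
    for f in files:
        print(f.name, f.size)
    print(len(files), "files found")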

How do you avoid uploading duplicate CSV files? And what will be the cost impact on the Azure cloud from uploading duplicate files?

ashukol
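
On the duplicate question: the upload UI typically renames a colliding file (e.g. DimDate_1.csv) rather than overwriting it, and every stored copy is billed like any other blob until it is deleted. A hedged sketch for spotting stray copies; the _1/_2 suffix check is an assumption about the renaming pattern, so verify it against your own file names:

    names = [f.name for f in dbutils.fs.ls("/FileStore/tables/")]
    print("possible duplicates:", [n for n in names if "_1." in n or "_2." in n])
    # dbutils.fs.rm("/FileStore/tables/DimDate_1.csv")   # remove one; path illustrative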

I am getting this error: UnityCatalogServiceException: uri /FileStore/tables/DimDate.csv is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.

I added the dbfs scheme and the statement still won't run: "Creating table in Unity Catalog with file scheme dbfs is not supported. Instead, please create a federated data source connection using the CREATE CONNECTION command for the same table provider, then create a catalog based on the connection with a CREATE FOREIGN CATALOG command to reference the tables therein. SQLSTATE: 0AKUC"

hemalpbhatt
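
A common workaround when Unity Catalog refuses an external table over dbfs:/FileStore is to read the CSV into a DataFrame and save it as a managed table instead. A sketch, assuming your cluster can still read DBFS paths; the main.default catalog.schema name is illustrative:

    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("/FileStore/tables/DimDate.csv"))
    # Writes a managed table governed by Unity Catalog rather than an
    # external table pointing at the DBFS file.
    df.write.mode("overwrite").saveAsTable("main.default.dimdate")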

I am getting this error, @BryanCafferky: Error in SQL statement: AnalysisException: Unable to infer schema for CSV. It must be specified manually.

muskangupta
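
That error usually means the reader found no rows to sample, most often because the path is wrong or the file is empty, so check the path first. Failing that, supply the schema explicitly as the message suggests; a sketch with two illustrative columns from the AdventureWorks DimDate file:

    from pyspark.sql.types import StructType, StructField, IntegerType, StringType

    # Explicit schema replaces inference; extend with the remaining columns.
    schema = StructType([
        StructField("DateKey", IntegerType(), True),
        StructField("FullDateAlternateKey", StringType(), True),
    ])

    df = (spark.read
          .schema(schema)
          .option("header", "true")
          .csv("/FileStore/tables/DimDate.csv"))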