Advancing Spark - Managing Files with Unity Catalog Volumes

In the time before Unity Catalog, we mounted our lakes to a workspace and had nice aliased folder paths to refer to incoming data files, sandbox data, experiments and any other types of lake file. Unity Catalog brings a huge amount of governance, security and management functionality, but we felt a huge gap when it came to accessing actual files! Unity Catalog Volumes fills this gap, providing a slick, easy way of bringing your file-based data into the catalog.

In this video, Simon walks through setting up a Unity Catalog volume, before showing how it can then be viewed, queried and even hooked up to Autoloader for efficient ETL loading.
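The steps mentioned here (create a volume, refer to its files, point Autoloader at it) can be sketched roughly as follows. This is a minimal illustration and not the code shown in the video: the catalog, schema and volume names are hypothetical, the helper functions are invented for this example, and `autoloader_stream` assumes it is given an active `spark` session on a Unity Catalog enabled Databricks cluster.

```python
from typing import Optional


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "/".join(["/Volumes", catalog, schema, volume, *cleaned])


def create_volume_sql(catalog: str, schema: str, volume: str,
                      location: Optional[str] = None) -> str:
    """Return the SQL for a managed volume, or an external one if a location is given."""
    name = f"{catalog}.{schema}.{volume}"
    if location is None:
        return f"CREATE VOLUME IF NOT EXISTS {name}"
    return f"CREATE EXTERNAL VOLUME IF NOT EXISTS {name} LOCATION '{location}'"


def autoloader_stream(spark, source_dir: str, schema_dir: str, file_format: str = "json"):
    """Start an Autoloader (cloudFiles) read over a directory inside a volume.

    Only meaningful on Databricks; `spark` is the session the notebook provides.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", file_format)
        .option("cloudFiles.schemaLocation", schema_dir)
        .load(source_dir)
    )


# Hypothetical names, used purely for illustration.
landing = volume_path("dev", "landing", "raw", "/sales/", "2023")
statement = create_volume_sql("dev", "landing", "raw")
```

In a notebook, the statement would typically be run with `spark.sql(statement)`, and `landing` could then be passed to `autoloader_stream` as the source directory.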

If you need help rolling out Unity Catalog and revamping your lakehouse to take full advantage of it, get in touch with Advancing Analytics.

Advancing Analytics

Related videos

Comments

Thank you, sir!
I'll try it out right away and probably include it in our ways of working.
I feel it can reduce the burden and avoid creating external locations for each data analyst's project.

vincentdelbaen

Great video! As always, the best place to learn new Databricks features :)

datawithabe

Great video. One unrelated question: how do you guys manage deployments with Databricks? I come from an Airflow + Jenkins background as an engineer. Would you recommend Jenkins for Databricks deployments?

coleb

Love your work, Simon. Do you know if it is possible to have a credential that is not associated with the same cloud provider as the Unity Catalog instance? I have a Databricks environment deployed on Azure, but one of the ingestions is via an S3 bucket. I would love to be able to set this up as an external volume.

AshleyBetts-ht

How can I get access to a Databricks environment for learning? I know there is a Community Edition available, but somehow I am not able to load my raw files into it.

atulbansal

So with mounts we can have the dev workspace mount the dev containers and the prod workspace mount the prod containers, and both get mounted to the same path. So the notebooks don't have to 'know' whether they are running in dev or prod. How will that work in this new world? I noticed that the path contains "dev". Does each notebook have to figure out what environment it is in, and then read/write from the right paths and catalogs based on some string manipulation?

ErikParmann
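
One possible way to handle the dev/prod question above is to resolve the catalog name once from an environment setting and build every volume path from it, so notebooks never hard-code "dev". This is only a sketch under assumptions: the `DEPLOY_ENV` variable, the catalog names and the helper functions are hypothetical, and the environment value could equally come from a job parameter.

```python
import os
from typing import Optional

# Hypothetical mapping from deployment environment to Unity Catalog catalog name.
ENV_CATALOGS = {"dev": "dev", "test": "test", "prod": "prod"}


def resolve_catalog(env: Optional[str] = None) -> str:
    """Pick the catalog for the current environment (DEPLOY_ENV is an assumed variable)."""
    env = (env or os.environ.get("DEPLOY_ENV", "dev")).lower()
    if env not in ENV_CATALOGS:
        raise ValueError(f"Unknown environment: {env!r}")
    return ENV_CATALOGS[env]


def landing_path(env: Optional[str] = None, schema: str = "landing", volume: str = "raw") -> str:
    """Build the volume path for the resolved catalog; schema and volume names are illustrative."""
    return f"/Volumes/{resolve_catalog(env)}/{schema}/{volume}"
```

With this approach the same notebook code reads from `landing_path()` in every workspace, and only the environment setting differs between deployments.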

Does this also replace DBFS access in general?

MariusS-hp

Advancing Spark - Autoloader Resource Management

Advancing Spark - Rethinking ETL with Databricks Autoloader

Advancing Spark - Reflecting on a Year of Unity Catalog

Advancing Spark - Setting up Databricks Unity Catalog Environments

Advancing Spark - The Photon Whitepaper

Advancing Spark - Azure Databricks News April 2023

Advancing Spark - Engineering behind Featurestore

Advancing Spark - Understanding the Unity Catalog Permission Model

Spark Executor Core & Memory Explained

Advancing Spark - Understanding the Spark UI

Advancing Spark - Row-Level Security and Dynamic Masking with Unity Catalog

Advancing Spark - Delta Lake VLDB Paper Walkthrough

Advancing Spark - Azure Databricks News March 2023

An Advanced S3 Connector for Spark to Hunt for Cyber Attacks

Advancing Spark - Azure Databricks News Oct 2022

Advancing Spark - Databricks Runtime 9

Advancing Spark - Data Lakehouse Star Schemas with Dynamic Partition Pruning!

Dynamic Databricks Workflows - Advancing Spark

Advancing Spark - Getting hands-on with Delta Cloning

Advancing Spark - External Tables with Unity Catalog

Advancing Spark - Understanding Terraform

Advancing Spark - Delta Merging with Structured Streaming Data

Advancing Spark - Databricks Delta Change Feed