Enabling Unity Catalog on Azure Databricks: A Step-by-Step Guide

Welcome to this straightforward and practical guide on enabling Unity Catalog in your Azure environment. This video is tailored for users who are looking to activate this powerful feature but are unsure where to start.

Unity Catalog brings a new layer of data management and security to your Databricks environment, and with this short demo, you’ll learn how to unlock these capabilities in a few easy steps. I’ll walk you through a step-by-step demonstration on how to enable it in your Azure Databricks workspace.

⌚Timestamps:
00:00 Intro
00:30 Requirements to follow along
01:40 High-Level Summary of the Enablement Process and Services Required
02:50 Creating a Resource Group
03:30 Creating a premium Databricks workspace
04:16 Creating an ADLS Gen2 Storage Account and Container
05:44 Creating the Access Connector for Azure Databricks
06:55 Assigning Storage Blob Data Contributor permissions to the Access Connector
08:18 Enabling Unity Catalog for the Databricks workspace
08:30 Accessing the Admin Console
09:30 Creating a Metastore
11:27 Assigning the workspace to the Metastore
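
If you would rather script the Unity Catalog part instead of clicking through the account console, here is a minimal, hypothetical sketch of the last two timestamps (creating the metastore and assigning the workspace) using the Databricks SDK for Python. It assumes the Azure resources from the earlier steps already exist (premium workspace, ADLS Gen2 container, access connector with Storage Blob Data Contributor), that you are a Databricks account admin, and that every name, ID, region, and path below is a placeholder rather than a value from the video.

```python
# Hypothetical sketch, not the exact flow shown in the video.
# pip install databricks-sdk; authenticate e.g. with `az login` or DATABRICKS_* env vars.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host="https://adb-1234567890123456.7.azuredatabricks.net")  # placeholder workspace URL

# 09:30 - create the metastore, pointing its root storage at the ADLS Gen2 container.
metastore = w.metastores.create(
    name="uc-metastore-demo",                                                    # placeholder name
    region="westeurope",                                                         # placeholder region
    storage_root="abfss://uc-container@ucstorageacct.dfs.core.windows.net/",     # placeholder path
)

# In the video the access connector is attached as the metastore's root credential
# through the account console UI; that wiring is not shown here.

# 11:27 - assign the workspace to the new metastore.
w.metastores.assign(
    workspace_id=1234567890123456,            # placeholder numeric workspace ID
    metastore_id=metastore.metastore_id,
    default_catalog_name="main",
)
```

The Databricks Terraform provider has equivalent resources (databricks_metastore, databricks_metastore_assignment) if you also want the Azure pieces scripted end to end.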

🔗 Links and Documentation:

💻 Check out my Databricks courses on Udemy:
Comments

TYSM for putting this out. I was struggling to stitch it all together from the documentation, but so many details were still unclear. The documentation is still written as if it were 1995, prioritizing exhaustive option descriptions over clarity. Step-by-step tutorials like these are priceless and should be included by the doc editors, because it's a real pain in the butt to go through Russian-doll documentation pages, get lost, and waste 6 hours of trial and error just to set up an option correctly.

HuxleyCrimson

Amazing, man. Your explanation and demo are clear. Keep going, don't stop.

ibdallah

Great Explanation, clear and informative. Thank You.

suniguha

Thanks a lot for sharing your knowledge.

oscarestorach

Great, thanks a lot! Very clear. If possible, could you make a guide on how to do all the deployments and enable Unity Catalog using Terraform?

madessen

Great video, thanks! I found various documents for the different tasks mentioned here, but it was a pain to figure out what I was missing. Thanks to this video, I was able to find it. Is there a single Azure Databricks document that explains the whole flow?

mikenike

How would this implementation play out when you need the underlying data to be stored across different environment locations? Like a dev/uat/prd? Creating the single metastore means that all underlying managed files are created in that one location. Would you skip providing the ADLS Gen 2 path (@ 9:57) and then provide each location when creating the catalog itself? That part is not clear to me...

Alex-hwoj
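
A hedged sketch of the per-catalog option raised in the question above, using the Databricks SDK for Python: keep the single metastore, and give each environment's catalog its own managed location by setting storage_root when the catalog is created. Catalog names and abfss paths are made-up placeholders, and each path has to be covered by an external location backed by a storage credential you control.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # assumes credentials come from the environment / az login

# One metastore, but per-environment managed storage: storage_root on the
# catalog overrides the metastore-level root for managed tables inside it.
for env in ("dev", "uat", "prd"):
    w.catalogs.create(
        name=f"{env}_catalog",
        storage_root=f"abfss://{env}@ucstorageacct.dfs.core.windows.net/managed",
    )
```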

Well explained!! Very useful. Kindly try to make the video a bit clearer visually. Thanks :)

ranjansrivastava

Thanks for a really clear and straightforward tutorial. The one question that nagged me, though (you may have touched on it, but I missed it), is why we create an additional storage account and access connector when one already appears to exist in the managed resource group generated when the initial premium workspace was created. When I re-watched your tutorial, I even saw that the drop-down menus referenced the storage account and access connector that were already available, in addition to the ones you created. Is there anything wrong with using the storage account and access connector that were already created on your behalf, as opposed to creating your own, in order to enable UC? Or is it a best practice to create separate resources for UC enablement rather than using what already exists?

Polyglot

Since Nov 2023, Azure enables Unity Catalog by default for new workspaces; I saw it in the Microsoft documentation.

abhishekm

Unity Catalog can't be enabled for the hive_metastore that's there by default, right?

TheDataArchitect
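
On the hive_metastore question: Unity Catalog does not convert or take over the built-in Hive metastore; once the workspace is attached to a metastore, the legacy store simply shows up as a catalog named hive_metastore next to the Unity Catalog ones. A small illustrative snippet (run in a Databricks notebook, where spark is predefined; the table names are made up):

```python
# Legacy tables stay reachable through the hive_metastore catalog; new Unity
# Catalog tables live in UC catalogs such as main. Table names are placeholders.
spark.sql("SHOW CATALOGS").show()
spark.sql("SELECT * FROM hive_metastore.default.my_legacy_table").show()
spark.sql("SELECT * FROM main.default.my_uc_table").show()
```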