Tutorial: Building a Data Lakehouse with Apache Iceberg, Spark, Dremio, Nessie & Minio

Blog Tutorial:

Similar Tutorial Using Flink and Nessie:
Comments

This is so nice. Now I don't have to pay for Databricks in order to learn Spark!

recs

Hi Alex, this looks really great and I can imagine so many use cases for it. I just wonder if there is a way to tell what has changed between the branches at a given moment? And please correct me if I'm wrong, but is this only for use with Parquet or other structured data files? I have a project where we use other data formats, like FASTQ and FASTA, that are widely used in bioinformatics to store genetic information. They are nothing like Parquet, and I don't think any engine can query anything from them. We keep them in a "data warehouse" (an S3 bucket) and we would need to version them. Would this be a good use case for Nessie? Thanks!

hieuthaingoc

Amazing stuff. Thank you so much for that.
I was wondering if Spark is a must, or can we just use Dremio to do the data ingestion too?

SharonLavie_

Awesome. Just what I was looking for to get rid of AWS. How can I create tables from a CSV file uploaded to MinIO?

OswaldoSaumet

Why is the Spark configuration with all of the lakehouse services hardcoded in a notebook? Shouldn’t these configurations be incorporated into the Docker image you’re using for Spark?

recs

Great article, Alex. Slight issue creating a view in Dremio: I get the following exception, "Validation of view sql failed. Version context for table nessie.names must be specified using AT SQL syntax". Nothing obvious in the console output. Any ideas?

joeingle

Is there a new link for the article? The Flink+Nessie article is still available, but the "Blog Tutorial" link is dead.

gfinleyg

Great video.
How do you orchestrate all of that?

rafaelg

We are not able to read files directly from a MinIO bucket into Apache Spark.
How can we read a file from a MinIO bucket and process it in Spark?

aesthetic_mard

I got the error Failed to load class "org.slf4j.impl.StaticLoggerBinder" when running the script for Spark.

joshuajames

Awesome tutorial. Just a question: while trying to create the table, I'm getting this error (can you help?)....
{
"name": "Py4JJavaError",
"message": "An error occurred while calling o64.sql.
:
\tat

marceloacarrasco