Reading Parquet Files in Python

This video is a step-by-step guide to reading Parquet files in Python. Using the pandas library, we can read the data into Python without needing PySpark or a Hadoop cluster. The walkthrough also covers how to install the prerequisites you will need.

#python
Comments

Bang on. Thanks for even including the error portion for installing pyarrow. Helpful.

industryrule-

You save my life!! Thx for the tutorial!!!

jozzalex

Any tips on the error "binary file expected, got text file"? This is the second dataset in a row with this error.

lockwood

Very helpful tutorial. Newbie question: I am able to load my Parquet file in the notebook. It has 130 columns, but only 20 of them are shown. How can I see all the columns? Even seeing them for just 1 or 2 rows would be fine.

itsevennow
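
One possible way to do this, as a sketch: pandas caps how many columns it prints through the `display.max_columns` option, and setting it to `None` lifts the cap. The helper name `preview_all_columns` and the 130-column stand-in frame are made up for illustration:

```python
import pandas as pd


def preview_all_columns(df: pd.DataFrame, n_rows: int = 2) -> str:
    """Return the first n_rows rows as text without hiding any columns."""
    # max_columns=None lifts the column cap only inside this with-block.
    with pd.option_context("display.max_columns", None):
        return repr(df.head(n_rows))


# Stand-in for a wide frame loaded from Parquet (130 hypothetical columns).
wide_df = pd.DataFrame({f"col_{i}": [i, i + 1, i + 2] for i in range(130)})
print(preview_all_columns(wide_df))
```

In a notebook, `pd.set_option("display.max_columns", None)` applies the same setting for the whole session, and `df.head(2).T` is another way to look at every column by turning them into rows.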

What do you do when the Parquet file sits somewhere on someone else's cloud? Is it possible to feed it to pandas without saving it locally? I am not seeing a way to save it locally. It's a coding challenge that simply gives you a link to the cloud location of the Parquet data.

mihaelacostea

Thanks a lot. My Jupyter kernel died, and after restarting the kernel and trying again I got the same problem. I even tried putting the code in a .py file and running it from the terminal, but `print(df.head())` printed nothing.

KhalilYasser

How can I read a list of Parquet files into a single DataFrame?

gauravanand

Thanks a lot! This video helps a lot! Could you also let us know how to convert the parquet file to .csv file in Python?

hiyoungsun

@DataEng Uncomplicated I'm getting a NameError for parquet_file, but it has been defined as shown in the video. Please help, thanks

MetallicSiren

Is there a way to load a Parquet file into an Oracle DB directly using Python scripts?

akshatahabbu

I get the error OSError: Passed non-file path:. Have you had this?

neelrama

FileNotFoundError: [Errno 2] No such file or directory .... my directory path and file name are correct

diy__diy
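
A possible way to narrow this down, as a sketch: a path that looks correct can still be resolved against a different working directory than expected, so it may help to print what Python actually sees. The helper name `describe_path` is made up for this example and uses only the standard library:

```python
from pathlib import Path


def describe_path(path_str: str) -> str:
    """Report what the given path points to from Python's point of view."""
    resolved = Path(path_str).expanduser().resolve()
    if resolved.is_file():
        return f"found file: {resolved}"
    if resolved.is_dir():
        return f"this is a directory, not a file: {resolved}"
    # Relative paths are resolved against the current working directory.
    return f"nothing at {resolved} (working directory is {Path.cwd()})"
```

Calling `print(describe_path("my_file.parquet"))` with the same string passed to `pd.read_parquet` shows the absolute path being tried; `my_file.parquet` here is a placeholder.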

I get the error: name 'pd' is not defined. Any advice?

enzopablofranciscocarratal

Can you let me know the commands to edit Parquet metadata information?

nishaddhamne

How do I read a Parquet file from Azure Blob Storage?

shubhammural

man why can't they just use a zip file...

zmtnx