Install Apache Spark and PySpark on Ubuntu 20.04 Linux Debian, Python 3.7 - Part 1a

0:00 - check if Java is already installed then install JRE and JDK
2:26 - download the Spark library from Apache website
4:22 - uncompress and install the Spark tgz file to a target directory
8:10 - update .bashrc file to include environment path variables for both Spark and PySpark
12:26 - install PySpark, and install pip3 if required
13:24 - validate Spark is working via the Spark shell (Scala prompt)
15:00 - validate PySpark is working via the PySpark shell (Python prompt)
15:36 - access Spark UI via the browser (command sketches for each step follow below)
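
For the Java step (0:00), a minimal sketch (the exact packages are an assumption; Spark 2.4.x officially targets Java 8, so OpenJDK 8 is used here):

# Check whether Java is already installed
java -version
# Install a JRE and JDK; OpenJDK 8 matches Spark 2.4.x's supported Java version
sudo apt update
sudo apt install openjdk-8-jre openjdk-8-jdk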
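
For the download and install steps (2:26, 4:22), a sketch assuming the spark-2.4.5-bin-hadoop2.7 build (the version mentioned in the comments below) and /opt/spark as the target directory; both are assumptions, so substitute your own release and path:

# Download the release from the Apache archive
wget https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
# Uncompress the tgz file and move it to the target directory
tar -xvzf spark-2.4.5-bin-hadoop2.7.tgz
sudo mv spark-2.4.5-bin-hadoop2.7 /opt/spark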
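
For the .bashrc step (8:10), a sketch assuming the /opt/spark location from above:

# Append Spark environment variables to ~/.bashrc, then reload it
echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin' >> ~/.bashrc
echo 'export PYSPARK_PYTHON=python3' >> ~/.bashrc
source ~/.bashrc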
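
For the PySpark step (12:26), a sketch that pins PySpark to the same version as Spark (see the compatibility note below):

# Install pip3 if it is not already present
sudo apt install python3-pip
# Install PySpark, pinned to match the Spark version installed above
pip3 install pyspark==2.4.5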
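
For the validation steps (13:24, 15:00, 15:36):

# Should print the Spark banner and drop to a scala> prompt (exit with :q)
spark-shell
# Should drop to a Python >>> prompt with a SparkSession available (exit with exit())
pyspark
# While either shell is running, the Spark UI is served on port 4040:
# open http://localhost:4040 in the browser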

Apache Spark - Install Spark3, PySpark3 on Ubuntu 20.04, Debian, Python 3.8 - Part 1b
Follow that tutorial instead if you're using Python 3.8 or higher;
you won't get the Python 3 version-incompatibility errors.

If you get further errors, you might be installing incompatible versions of Spark and PySpark:
the Spark and PySpark versions need to match.
If you do need to downgrade to Python 3.7 or switch between Python versions, one approach is sketched below.
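
A minimal sketch of pointing PySpark at Python 3.7, assuming the deadsnakes PPA (the PPA and the PYSPARK_PYTHON approach are not from the video; this avoids replacing the system python3):

# Install Python 3.7 alongside the system Python
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.7
# Point PySpark at the 3.7 interpreter (update any earlier PYSPARK_PYTHON line in ~/.bashrc)
echo 'export PYSPARK_PYTHON=/usr/bin/python3.7' >> ~/.bashrc
source ~/.bashrc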

Comments

After hours of searching I finally found this working manual. Thanks a lot for the video!

peterwissel

Thank you for such a detailed demonstration. It finally worked for me!

kawaicool

Like many others, I messed up my PATH initially, but I fixed it and it is working now. Thanks for making this video!

tnmyk_

Thanks mate.. I really searched a lot, but finally found the solution here..

kollurusrinivasarao

My PATH variable had been messed up, dude.

panyampraneeth

I followed all the steps as you did, and now my PC does not start.
I enter the password and it asks for it again. I think it is something to do with the PATH env variable :/

danielbaena

I get a "command not found" error when I run the following command:
sudo spark-2.4.5-bin-hadoop2.7

srinuj

Hi mate, I followed your video on a VM built with the Ubuntu ISO image "ubuntu-20.04-desktop-amd64". I did everything you showed, but when I checked the version of Python it was 3.8. The result was that my PySpark could not load and gave a TypeError. I did a bit more research and found that it is apparently an error caused by incompatibility with Python 3.8. Some people say it can be fixed by downgrading to Python 3.7.5, but I don't know how to do that. What amazes me is that my versions, etc. were exactly matching yours, but my Python turned out to be 3.8.2. Can you give us any ideas please?
Thanks!

pgafshar