01 Install and Setup Apache Spark 2.2.0 Python in Windows - PySpark

Apache Spark for Big Data Analytics and Machine Learning is available now (link below).

** Support by following this channel:) **

New Windows environment variables:
1. HADOOP_HOME = C:\spark\hadoop
2. JAVA_HOME = C:\Program Files\Java\jdk1.8.0_151
3. SCALA_HOME = C:\spark\scala\bin
4. SPARK_HOME = C:\spark\spark\bin
5. PYSPARK_DRIVER_PYTHON = (path to your python.exe)
6. PYSPARK_DRIVER_PYTHON_OPTS = notebook

Nos. 2, 4, and 5 can be changed to match your own install locations (a quick way to check them from Python is sketched below).
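As a quick sanity check, here is a minimal Python sketch (variable names as listed above; note that variables set via System Properties or setx only appear in a newly opened command prompt):

import os

# Print each Spark-related variable as Python sees it; "<not set>" means it
# is missing (or was added after this command prompt was opened).
for name in ("HADOOP_HOME", "JAVA_HOME", "SCALA_HOME", "SPARK_HOME",
             "PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS"):
    print(name, "=", os.environ.get(name, "<not set>"))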

Best,
Ardian
Comments

The Apache Spark tutorial series is now complete. Click the link below for the playlist.

ArdianUmam

After a lot of "The system cannot find the path specified" errors, the winning combination for me was:

HADOOP_HOME = C:\spark\hadoop
JAVA_HOME = C:\Program Files\Java\jdk1.8.0_151
SCALA_HOME = C:\spark\scala
SPARK_HOME = C:\spark\spark
PYSPARK_PYTHON =

PATH:
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
%SCALA_HOME%\bin
%JAVA_HOME%\bin

Thanks for the video!

StevenVanDorpe
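A minimal sketch (assuming the variable names from the comment above) to check from Python that each directory actually exists, since a wrong directory is the usual cause of "The system cannot find the path specified":

import os

# Verify each HOME directory and its bin subfolder; "False" points at the
# variable that is causing the "path specified" error.
for name in ("HADOOP_HOME", "JAVA_HOME", "SCALA_HOME", "SPARK_HOME"):
    home = os.environ.get(name)
    bin_dir = os.path.join(home, "bin") if home else None
    print(name, "=", home, "| bin exists:", bool(bin_dir and os.path.isdir(bin_dir)))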

These settings worked for me :)
Change:
SCALA_HOME = C:\spark\scala\bin => SCALA_HOME = C:\spark\scala
SPARK_HOME = C:\spark\spark\bin => SPARK_HOME = C:\spark\spark
PYSPARK_DRIVER_PYTHON = => PYSPARK_DRIVER_PYTHON = jupyter

Add in Path variable:
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
%SCALA_HOME%\bin

Remove this environment variable:
PYSPARK_PYTHON =

ANANTBARA
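If you'd rather not touch the global variables, here is a hedged sketch of launching pyspark with the Jupyter driver settings for a single run only (assumes pyspark is already on PATH):

import os
import subprocess

# Override the driver settings for this one invocation; the global Windows
# environment variables stay unchanged.
env = dict(os.environ,
           PYSPARK_DRIVER_PYTHON="jupyter",
           PYSPARK_DRIVER_PYTHON_OPTS="notebook")
subprocess.run("pyspark", env=env, shell=True)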

If someone else has the same problem as me:
"The system cannot find the path specified"
TRY: (without \bin)
3. SCALA_HOME = C:\spark\scala
4. SPARK_HOME = C:\spark\spark

:D

PetaZire

Hey, to avoid the error "The system cannot find the path specified", please DO NOT set these variables:
PYSPARK_PYTHON =
5. PYSPARK_DRIVER_PYTHON =
6. PYSPARK_DRIVER_PYTHON_OPTS = notebook
They are not necessary and can make a mess.

Also remember: the *_HOME system variables go without "\bin" at the end, while the %PATH% entries go with "\bin".

karolinaswiergaa

Wonderful resource, brother; it helped me resolve the path issue.

sandeepg

Hi all,
I have the same problem, "The system cannot find the path specified". I have already adjusted the environment variables both with "\bin" and without "\bin", but it still does not work. Help me, please!

xuanhuongdinh
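If the environment-variable route keeps failing, one possible fallback is the findspark package (a sketch, assuming pip install findspark and the C:\spark\spark location used in the video; adjust the path to yours):

import findspark
findspark.init("C:\\spark\\spark")  # SPARK_HOME itself, without \bin

from pyspark import SparkContext

# Smoke test: sum the numbers 0..99 on a local context; should print 4950.
sc = SparkContext(master="local[*]", appName="path-check")
print(sc.parallelize(range(100)).sum())
sc.stop()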

** My Book in Indonesian **
A book on data mining, data science, big data analytics, and machine learning, written in Indonesian together with Prof. Budi Santosa (ITS), is now available. The 2nd edition adds new material on Deep Learning, in particular CNNs (Convolutional Neural Networks). Get it here now :)

ArdianUmam

For those facing errors:
Check your file locations carefully; they might not be the same as in the video.
Also look through the comments section; you will most likely find a solution there.

faisal.fs

This video helped me a lot, thank you so much Mr. Umam.

ozgebars-tuzemen

Hey man, can you please make a tutorial on setting up Spark in the PyCharm IDE, and show some ways to work with PySpark there? A nice data analysis with core Spark transformations and actions, plus any other important Spark-related material, would work well as a project walkthrough for interviews.

imohitr

This was really helpful; it worked without any error. Thanks a ton, brother.

flamboyantperson

Hi, I am getting an error stating that 'pyspark' is not recognized as an internal or external command, operable program or batch file. How can I resolve this?

nithinreddy
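That error usually means %SPARK_HOME%\bin is missing from PATH. A minimal sketch (assuming the setup above) that lists every PATH entry mentioning Spark:

import os

# pyspark is located via PATH; if nothing printed below points at your
# Spark bin folder, add %SPARK_HOME%\bin to PATH and open a new prompt.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    if "spark" in entry.lower():
        print(entry)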

Thank you for this video, changing my environment variables worked for me.

samskyverareddy

Thanks for the tutorial, but I'd point out that if PYSPARK_DRIVER_PYTHON points to a python.exe in Anaconda and PYSPARK_DRIVER_PYTHON_OPTS is set to notebook, typing pyspark in the command prompt will open a Jupyter notebook, not a command-prompt Python environment. Your video has these variable values but opens a command-prompt Python environment, when it should open a Jupyter notebook (right?). It threw me for a huge loop.

spacedustpi

Can you show us how to install nltk for PySpark? Apparently you need to install it separately on the nodes.

saleem
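On the single-machine, local-mode setup from this video, the driver and the executors share one Python, so a plain pip install nltk should be enough. A sketch to confirm the executors can import it (assumes the sc provided by the pyspark shell):

# Run inside the pyspark shell, where `sc` already exists. Each task
# imports nltk and reports its version; an ImportError here means the
# executors' Python interpreter cannot see the package.
print(sc.parallelize(range(2)).map(lambda _: __import__("nltk").__version__).collect())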

Hi. I see you have installed Scala, but I didn't find how you integrated Scala with Jupyter. Do you have such a configuration? Thanks.

ioanvapi

I have followed the instructions, and it shows me "The system cannot find the path specified". Can you help me?

ziqi

Getting this error: jupyter: error: one of the arguments --version subcommand --config-dir --data-dir --runtime-dir --paths is required

techgeeks
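That argparse message typically appears when jupyter is launched without a subcommand, i.e. PYSPARK_DRIVER_PYTHON is set to jupyter but PYSPARK_DRIVER_PYTHON_OPTS is empty. A quick sketch to inspect both values (names as in the list at the top):

import os

# If the second value is None or empty, pyspark runs a bare `jupyter`
# command, which produces exactly that "one of the arguments ... is
# required" error; setting PYSPARK_DRIVER_PYTHON_OPTS = notebook fixes it.
print(repr(os.environ.get("PYSPARK_DRIVER_PYTHON")))
print(repr(os.environ.get("PYSPARK_DRIVER_PYTHON_OPTS")))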

This helped me a lot, thank you Ardian

hemanthvarma