Spark Installation | PySpark Installation | Windows 10 / 11 | Step by Step |#spark #interview

In this video we set up #python, #java, #spark, #pycharm and #pyspark on a local system.

Steps :
======

1) #Java Download:

2) #Python Download: (3.11.4) :

3) #Spark Download: (3.4.2) :

5) #Pycharm community download:

Check Python, Java, PySpark and Spark Versions :
========================================
python --version
java --version
spark-shell
pyspark --version
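
If one of these commands is reported as not recognized, it usually means the tool is not on PATH. The sketch below is one way to check that from Python before running the commands. The helper name `find_tools` is hypothetical and not part of the video; it only uses the standard-library `shutil.which`.

```python
import shutil

# Command names taken from the version checks above.
TOOLS = ("python", "java", "spark-shell", "pyspark")


def find_tools(names=TOOLS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# Print one line per tool so that missing PATH entries are easy to spot.
for name, path in find_tools().items():
    print(f"{name:12} {path or 'NOT FOUND on PATH'}")
```

A tool reported as NOT FOUND most likely needs its install directory (for Spark, its bin folder) added to the PATH environment variable.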

Solution :
========
Write the two lines below before creating the Spark object.
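
A minimal sketch of what such lines typically look like, assuming they point PySpark at the interpreter that runs the script. The `PYSPARK_PYTHON` assignment appears in code posted in the comments; the `PYSPARK_DRIVER_PYTHON` assignment and the helper names `use_current_interpreter` and `create_session` are assumptions made for illustration.

```python
import os
import sys


def use_current_interpreter():
    """Tell PySpark to use the Python that is running this script."""
    os.environ["PYSPARK_PYTHON"] = sys.executable
    # Assumption: the second line sets the driver interpreter the same way.
    os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable


def create_session():
    """Create the Spark object only after the variables are set (needs pyspark installed)."""
    from pyspark.sql import SparkSession

    use_current_interpreter()
    return SparkSession.builder.getOrCreate()
```

Calling `create_session()` in place of building the session directly keeps the ordering correct: the environment variables are set first, and the session is created afterwards.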

-------------------------------------------------------------------------------------------------------------------------------------------------------

If you don't want to use the virtual environment's Python, add the environment variable below.

Variable Name : PYSPARK_PYTHON

If you add the PYSPARK_PYTHON variable, you will not need to set the os.environ variables in the code.
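
To confirm that the system-level variable is actually visible to Python, a quick check like the following may help. This is only a sketch: the function name `check_pyspark_python` and its status strings are hypothetical, and it assumes the variable holds either a full path to the interpreter or a command name that can be found on PATH.

```python
import os
import shutil


def check_pyspark_python(env=None):
    """Report whether PYSPARK_PYTHON is set and resolves to something runnable."""
    env = os.environ if env is None else env
    value = env.get("PYSPARK_PYTHON")
    if not value:
        return "not set"
    # Accept a full path to an existing file or a command name found on PATH.
    if os.path.isfile(value) or shutil.which(value):
        return "ok"
    return "set, but not found"


print("PYSPARK_PYTHON:", check_pyspark_python())
```

Environment variables added through the Windows settings dialog are only seen by programs started afterwards, so reopen the terminal or PyCharm before running the check.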
-----------------------------------------------------------------------------------------------------------------------------------------------------
Sample Code :
============
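from datetime import datetime, date
import os
import sys

from pyspark.sql import SparkSession, Row

# Assumed setup (missing from the listing): use the current interpreter, then create the session.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(a=1, b=2., c='string1', d=date(2000, 1, 1), e=datetime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3., c='string2', d=date(2000, 2, 1), e=datetime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5., c='string3', d=date(2000, 3, 1), e=datetime(2000, 1, 3, 12, 0))
])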

#python #leetcode #dsa #interview #sql #dataengineers #dataanalytics #datascience #StrataScratch #Facebook #data #dataengineeringinterview #codechallenge #datascientist #pyspark #CodingInterview
#dsafordataguy
Comments
Author

If you don't want to use the virtual environment's Python, add the environment variable below.

Variable Name : PYSPARK_PYTHON
Variable Value :

If you add the "PYSPARK_PYTHON" variable, you will not need to set the os.environ variables in the code.

DEwithDhairy
Author

The best video about this topic I found on YT

BiswajitSibun-nb
Author

I really appreciate you, brother. I was encountering many issues that I could not even figure out, but this video resolves all the errors. Thank you.

ithisrinu
Author

Really, you are a great tutor.
I literally struggled googling for errors when running PySpark files. Finally your video helped me. Many thanks.

pramilaj
Author

Thanks Dhairy. I am trying to do this via a notebook, and when I execute the code I get a Py4JJavaError. Also, how can I see the PySpark kernel in the notebook? Do you have any idea about it?

nsreeabburi
Author

When I try to install Spark on Windows Home, I get an error.

ChandanDeveloper
Author

Hi, nice explanation. Thanks for making the video. I request you to make a video on how to write a df to a CSV file.

g.suresh
Author

Hi. When I run pyspark in the command prompt, it shows an error. And when I initialize a variable like x=sc.textFile("Readme")

it gives the error that sc is not defined. Please help.

Rayudu_Alapati
Author

Hello @Dhairy Gupta, I followed the same steps you described, but I'm getting an error for Spark and pyspark: "is not recognized as an internal or external command,
operable program or batch file." Could you please tell me what I have to do?

SandhyaRani-eutn
Author

Hello, please help me, I am getting a crash error.

yashrammalhotra
Author

Is Java 17 version incompatible with Hadoop 3.3.5?

ashokraj-go
Author

I am getting an error. Please help.



from pyspark.sql import SparkSession
from datetime import datetime, date
from pyspark.sql import Row

import os
import sys

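os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable  # assumed: the left-hand side is missing in the posted code

print(sys.executable)

spark = SparkSession.builder.getOrCreate()  # assumed: the right-hand side is missing in the posted code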

data = [(1, 'A'), (2, 'B')]
schema = ['id', 'name']

df = spark.createDataFrame(data, schema)
# df.show()

df.write.csv(path='D:/Practice/PySpark/Files', header=True, mode='overwrite')

g.suresh
Author

While running this code, this error occurred:
24/03/12 11:52:23 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
Python worker exited unexpectedly (crashed)

debajyotijana