Walmart PySpark Interview Question | Data Engineering |


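from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, FloatType

# Initialize Spark session
spark = SparkSession.builder \
    .appName("Create Datasets") \
    .getOrCreate()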

# Define the schema for transactions
transaction_schema = StructType([
    StructField("customer_id", IntegerType(), True),
    StructField("transaction_type", StringType(), True),
    StructField("transaction_amount", FloatType(), True)
])

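# Create the transactions DataFrame
transactions_data = [
    (1, "credit", 30.0),
    (1, "debit", 90.0),
    (2, "credit", 50.0),
    (3, "debit", 57.0),
    (2, "debit", 90.0)
]
transactions_df = spark.createDataFrame(transactions_data, schema=transaction_schema)

# Show the transactions DataFrame
transactions_df.show()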

# Define the schema for amounts
amount_schema = StructType([
    StructField("customer_id", IntegerType(), True),
    StructField("current_amount", FloatType(), True)
])

# Create the amounts DataFrame
amounts_data = [
    (1, 1000.0),
    (2, 2000.0),
    (3, 3000.0),
    (4, 4000.0)
]
amounts_df = spark.createDataFrame(amounts_data, schema=amount_schema)

# Show the amounts DataFrame
amounts_df.show()

This series is for beginner and intermediate-level candidates who want to crack PySpark interviews.

#data #walmart #dataengineering #kafka
#python #sql #azuredatabrickswithpyspark #llm

#pyspark #interviewquestions #interview #pysparkinterview #dataengineer #aws #databricks #python
Comments
Author

This was a very tough question, but overall amazing.

prajju
Author

Sagar, your code will fail if the total credit is more than the total debit. You have to reverse the subtraction and then add when generating the final result. I hope someone else has also caught and raised this 🙂

SanjaySuryavanshi-rk
Author

I am not a PySpark developer, so correct me if this logic won't work: separate the debits and credits, multiply the debit amounts by -1, then union all three datasets (credits, negated debits, and current amounts). Selecting one record per customer with a sum will then add and subtract those values.

Let me know if you think this logic won't work.

sumit
Author

Bro, tell me honestly: if I watch your videos, can I become a data engineer? Your content is next level.

praveenbhandari