Data validation between source and target table | PySpark Interview Question


Hello Everyone,

source_data = [(1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E')]
source_schema = ['id','name']

target_data = [(1,'A'),(2,'B'),(3,'X'),(4,'F'),(6,'G')]
target_schema = ['id','name']

This series is for beginner- and intermediate-level candidates who want to crack PySpark interviews.

#pyspark #interviewquestions #interview #pysparkinterview #dataengineer #aws #databricks #python


Comments

At 6:04, instead of copying the same statement again, you can use .otherwise("not matching").

beingnagur

I follow these steps to compare a source table against a target table:
1) The row count should match between the source and target tables.
2) The schema should match between the source and target tables.
3) Use except (subtract or exceptAll in the DataFrame API) to check whether any records are present in the source but not in the target, or vice versa.
4) Use a left anti join to find the records that do not match.
5) Debug why the records mismatch.

rishabhkesarwani-brrx

exceptAll can be useful too, or an anti join.

gudiatoka

The main problem I found while learning PySpark is the brackets; every time, they give me some error.

jhonsen

Please create a playlist on PySpark unit testing.

nishirajnikku

Please make a video on PySpark unit testing.

shivamchandan
