Realtime Socket Streaming with Apache Spark | End to End Data Engineering Project

In this video, you'll build a real-time data streaming pipeline over a dataset of 7 million records, using a stack that includes a TCP/IP socket, Apache Spark, an OpenAI large language model (LLM), Kafka, and Elasticsearch.

📚 What You'll Learn:
👉 Setting up and configuring a TCP/IP socket for data transmission.
👉 Streaming data from the socket with Apache Spark.
👉 Real-time sentiment analysis with an OpenAI LLM (ChatGPT).
👉 Prompt engineering.
👉 Setting up Kafka for real-time data ingestion and distribution.
👉 Using Elasticsearch for efficient data indexing and search.
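
To make the first two bullets concrete: the "socket source" in this project is simply a TCP server that writes one record per line for Spark to read. Here is a minimal Python sketch of that idea — the port, the field names, and the function name are illustrative assumptions, not the video's exact code:

```python
import json
import socket
import time

def stream_records(records, host="127.0.0.1", port=9999, delay=0.0):
    """Serve newline-delimited JSON records over a TCP socket,
    mimicking the socket source the Spark stream reads from.
    (Port and record shape are placeholder assumptions.)"""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    conn, _ = server.accept()      # block until one client (e.g. Spark) connects
    with conn:
        for record in records:
            # one JSON document per line, newline-terminated
            conn.sendall((json.dumps(record) + "\n").encode("utf-8"))
            time.sleep(delay)      # optional throttle to simulate a live feed
    server.close()
```

On the Spark side, `spark.readStream.format("socket").option("host", "127.0.0.1").option("port", 9999).load()` would consume these lines, and `from_json` would parse each one against a schema.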

✨ Timestamps: ✨
00:00 Introduction
01:10 Creating a Spark master-worker architecture with Docker
10:40 Setting up the TCP/IP socket source stream
23:25 Setting up the Apache Spark stream
42:56 Setting up a Kafka cluster on Confluent Cloud
47:12 Getting keys for the Kafka cluster and Schema Registry
1:12:53 Real-time sentiment analysis with OpenAI LLM (ChatGPT)
1:24:10 Setting up an Elasticsearch deployment on Elastic Cloud
1:30:50 Real-time data indexing on Elasticsearch
1:36:05 Testing and results
1:41:50 Outro

🌟 Please LIKE ❤️ and SUBSCRIBE for more AMAZING content! 🌟

🔗 Useful Links and Resources:

✨ Tags ✨
Data Engineering, Apache Airflow, Kafka, Apache Spark, Cassandra, PostgreSQL, Zookeeper, Docker, Docker Compose, ETL Pipeline, Data Pipeline, Big Data, Streaming Data, Real-time Analytics, Kafka Connect, Spark Master, Spark Worker, Schema Registry, Control Center, Data Streaming, Real-time Data Streaming, OpenAI LLM, Elasticsearch, Data Processing, Data Analytics, TCP/IP, Streaming Solutions, Data Ingestion, Real-time Analysis, Spark Configuration, OpenAI Integration, Kafka Topics, Elasticsearch Indexing, Data Storage, Stream Processing, Machine Learning Integration

✨ Hashtags ✨
#confluent #DataEngineering #TCP #TCPIP #sockets #socketstreaming #Kafka #ApacheSpark #Docker #ETLPipeline #DataPipeline #DataStreaming #OpenAI #Elasticsearch #RealTimeData #BigData #TechTutorial #StreamingAnalytics #MachineLearning #DataFlow #SparkStreaming #DataScience #AIIntegration #RealTimeAnalytics #StreamingData #realtimestreaming #realtime
Comments

Thanks for watching! Hit the LIKE button, SUBSCRIBE and comment for wider reach 🥺🙏

CodeWithYu

Your channel is very helpful and effective. I am learning a lot from you.

judyramphele

Your content is so helpful. I hope your channel grows to a million subscribers.

pratiknarendraraut

I really enjoyed going through this video. Very informative. Thank you very much

ataimebenson

Another awesome piece from you, and a great contribution to the data engineering community.

travelwithshayan

Well done @CodeWithYu. This is elaborate and I love it. I was following along at the beginning but got lost along the way, so I've resolved to watch it several times to understand it better.

adebisiabioduntedvideo

You are definitely one of the best professionals at sharing knowledge properly. You will help all of us boost our data engineering skills.

I've just watched this tutorial twice in order to understand the architecture and workflow correctly before starting to code. It will surely help me bring my portfolio to the next level, and I will mention you, of course.

Regarding this project, where is the Spark DataFrame stored? Is it kept in cache or in the Docker image volume?

About visualization, I have good knowledge of Power BI but not of connecting it with Elasticsearch. I would appreciate any suggestions, although I will explore Elasticsearch-to-Power-BI connectors by myself.

Thanks for your unselfish teaching.
Regards

RafaVeraDataEng

This is awesome. Thanks for the great content.

____prajwal____

Thanks for your videos. Can you make a project using Spark, Kafka, and Jenkins for CI/CD and test automation?

jmagames

Quick question:

Why do you submit the Spark job separately? You initially ran the socket streaming as an independent process, and later you mentioned submitting it to the spark-worker, but eventually you submitted it to the spark-master itself. I just wanted to understand the motive behind that.

Great project though. Keep up the good work.

saikirannukala

Thank you so much, this is so informative

RecaAtoz

Perfect! But I have a question: since you have all of the .py files and other files, why not just run Spark locally?

Sakasiton

Some useful solutions:

For the Schema Registry: scroll down => CLI & Tools => Kafka Connect => create Schema Registry API key => 4 => Generate config => scroll down to the bottom of the code to copy the Schema Registry URL.

ataimebenson

Can I ask how you run the streaming PySpark job? I saw you spinning up Spark with Docker Compose, but how do we submit the PySpark streaming job to the spun-up containers?

HaiDo-bd
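
One common way to do what the question above asks is to exec into the master container and spark-submit against the cluster URL. This is only a sketch, assuming the Compose service is named `spark-master`, the master listens on port 7077, and the job lives at `jobs/spark_streaming.py` (all assumptions; check your own docker-compose file):

```shell
# Submit the streaming job from inside the spark-master container.
# Service name, port, and script path are placeholders for your own setup.
docker exec -it spark-master \
  spark-submit \
    --master spark://spark-master:7077 \
    jobs/spark_streaming.py
```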

Great video, mate! Do you have a video on how to set up Kafka on bare metal?

sclem

Hi, I am a beginner. I love your channel and the knowledge you share. Love from India!
I have a doubt: in the docker-compose file, I'm unsure about the network. Can I use the same network given, code-with-yu, or should I use a different network name for this project?
If I have to use a different one, how do I do that?

DivineSam-wm
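
On the network question above: Docker Compose network names are arbitrary; any name works as long as every service joins the same network. A minimal sketch (the service and network names here are placeholders, not the video's exact file):

```yaml
# Any network name works as long as all services share it.
services:
  spark-master:
    networks:
      - datastream    # placeholder name; rename freely
  spark-worker:
    networks:
      - datastream
networks:
  datastream:
    driver: bridge
```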

Hey Yusuf, amazing video! But I'm facing some errors.

jaysinhpadhiyar

Thanks for your awesome content. About visualization: I want to use Kibana to draw a line chart plotting the ratio of positive to negative reviews in real time, where the x-axis is timestamps and the y-axis is a percentage. How can I do that? Please help me.

anhminh-jtql

In the "Getting Keys for Kafka Cluster and Schema Registry" part, at exactly 50:28, I couldn't find the "create Schema Registry API key" option below "create Kafka Cluster API key" in the second step under the Clients section. What should I do? Many thanks.

yasminemasmoudi

Can you please make a video on how to set up the environment?

ShubhamKumar-zwoq