Using Glue Schema Registry for Apache Kafka with Python

This video explains a Python streaming data pipeline that leverages schemas for data validation, using Kafka with Avro and the AWS Glue Schema Registry.

Documentation Links:
-----------------------------------

Prerequisites:
---------------------------
Introduction to Schema Registry in Kafka | Part 1
Introduction to Schema Registry in Kafka | Part 2

Avro Schema Used:
------------------------------------
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "Age", "type": "int"}
  ]
}
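
Local validation sketch (optional):
------------------------------------
The point of the registry is schema-based validation; as a minimal illustration, a record can also be checked against this schema locally with fastavro. This snippet is an assumption on top of the video's code (fastavro is an extra dependency and is not shown in the video):

# pip3 install fastavro
from fastavro import parse_schema
from fastavro.validation import validate

# The same User schema as above, expressed as a Python dict
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "Age", "type": "int"}
    ]
}
parsed = parse_schema(user_schema)

# A record that matches the schema validates cleanly
validate({"name": "Hello", "Age": 45}, parsed)

# A record that does not match (e.g. {"Partiiton_no": 2}) raises ValidationError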

Python Code:
--------------------------
# pip3 install boto3 -t .
# pip3 install aws-glue-schema-registry --upgrade --use-pep517 -t .
# pip3 install kafka-python -t .
import boto3
from kafka import KafkaProducer
from aws_schema_registry import DataAndSchema, SchemaRegistryClient
from aws_schema_registry.avro import AvroSchema
from aws_schema_registry.adapter.kafka import KafkaSerializer

# Create a boto3 session and a Glue client (fill in your own credentials; region is an example)
session = boto3.Session(aws_access_key_id='{}', aws_secret_access_key='{}')
glue_client = session.client('glue', region_name='us-east-1')

# Create the schema registry client, which is a façade around the boto3 Glue client
client = SchemaRegistryClient(glue_client, registry_name='my-registry')

# Create the serializer
serializer = KafkaSerializer(client)

# Create the producer
producer = KafkaProducer(bootstrap_servers=['127.0.0.1:9092'],
                         value_serializer=serializer)

# Our producer needs a schema to send along with the data.
# In this example we're using Avro, so we load the .avsc file containing the schema above.
with open('user.avsc', 'r') as schema_file:  # filename assumed; use the path to your .avsc file
    schema = AvroSchema(schema_file.read())

# Send message data along with its schema; the serializer registers the schema
# in the registry (if needed) and encodes the record.
data = {
    'name': 'Hello',
    'Age': 45
}
# data = {'Partiiton_no': 2}  # a record that does not match the schema
producer.send('my-topic', value=(data, schema))  # topic name is an example
producer.flush()
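
Consumer side (sketch):
--------------------------
A minimal consumer-side sketch based on the aws-glue-schema-registry package's documented API; the topic name, bootstrap server, region, and registry name are the same assumptions as in the producer code above, and this exact snippet is not shown in the video:

import boto3
from kafka import KafkaConsumer
from aws_schema_registry import DataAndSchema, SchemaRegistryClient
from aws_schema_registry.adapter.kafka import KafkaDeserializer

# Build the registry client the same way as on the producer side (example region/registry)
glue_client = boto3.Session().client('glue', region_name='us-east-1')
client = SchemaRegistryClient(glue_client, registry_name='my-registry')

# The deserializer looks up the writer schema from the registry for each message
deserializer = KafkaDeserializer(client)

consumer = KafkaConsumer('my-topic',
                         bootstrap_servers=['127.0.0.1:9092'],
                         value_deserializer=deserializer)

for message in consumer:
    value: DataAndSchema = message.value  # decoded data plus the schema it was written with
    print(value.data, value.schema)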

Check these playlists for more Data Engineering-related videos:

Apache Kafka from scratch

Snowflake Complete Course from scratch, with an end-to-end project and in-depth explanation

🙏🙏🙏🙏🙏🙏🙏🙏
YOU JUST NEED TO DO
3 THINGS to support my channel
LIKE
SHARE
&
SUBSCRIBE
TO MY YOUTUBE CHANNEL
Comments

Great explanation through the code walkthrough as well as the console demo. The best part is that the explanation is detailed enough for anyone to understand.

harvestingdata

Thanks a lot. Are you using Conduktor on your local system?

SpiritOfIndiaaa

Thanks a lot sir, because of you I learned the Glue Schema Registry... keep going!

mranaljadhav

How can I set environment variables in Ubuntu for AWS credentials when consuming messages in Conduktor?

SreshthBhatt

Weird, I get an error: *SchemaRegistryException: Exception occured while fetching or registering schema definition*

MrMadmaggot

Can we just validate the keys of the message, regardless of their values?

luckyratnawat

Can you explain how the schema ID is generated and how it is used by the producer and consumer?

dibyangsumajumdar

Does the schema registry introduce any performance issues, given that the producer will always perform schema validation before sending data to the Kafka broker?

Also, thanks for your videos. These are really helpful!

roshankumargupta

How can we use Confluent Kafka for the same?

ayushmandloi

Hi, I am a Korean student. First of all, thank you for providing a great-quality video!!

One thing I'm curious about: what is the blue-icon UI application shown in the video during the consuming step?
Your reply will be of great help to my work. :)

kkw_on

Thanks, but I have a problem when installing aws-glue-schema-registry. The error message is below:

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for orjson
Failed to build orjson
ERROR: Could not build wheels for orjson, which is required to install pyproject.toml-based projects

EmersonSousa-sjwy