How to Upload On-Premise Database Data to AWS S3 | Build a Data Lake | Python

A data lake is centralized cloud storage in which you can store all of your data, both structured and unstructured, at any scale. This platform is fast becoming the standard for users looking to store and process big data. In this video we cover how to build an AWS S3 data lake from an on-premise SQL Server database. S3 is an easy-to-use data store, and we use it to load large amounts of data for later analysis.
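
The workflow shown in the video boils down to reading a table from SQL Server into pandas and writing it to an S3 bucket with boto3. The exact code in the video may differ; the sketch below uses placeholder server, database, table, and bucket names:

import io

import boto3
import pandas as pd
import pyodbc

# Connect to the on-premise SQL Server (placeholder server/database names).
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorksDW;Trusted_Connection=yes;"
)

# Read a table into a DataFrame.
df = pd.read_sql("SELECT * FROM DimProduct", conn)

# Serialize to CSV in memory and upload to S3. boto3 picks up the access keys
# of the programmatic-access user created earlier (e.g. via aws configure).
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)

s3 = boto3.client("s3")
s3.put_object(Bucket="my-data-lake-bucket", Key="DimProduct.csv", Body=csv_buffer.getvalue())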

Subscribe to our channel:

---------------------------------------------
Follow me on social media!

---------------------------------------------

#AWS #S3 #DataLake

Topics covered in this video:
0:00 - Intro: data lake from on-premise to AWS S3
1:03 - Create S3 user with programmatic access
2:37 - Create S3 bucket
3:04 - Python setup
3:56 - Read data from SQL Server
5:04 - Load Data to S3 Bucket
6:59 - Code Demo
7:36 - Review S3 Data Lake

Comments

Hey, how can I automate this process, so that after a certain time the code runs automatically and uploads the SQL data to S3?

kshitijbansal
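
One way to automate a run like this is a scheduler around the existing script; the sketch below uses the third-party schedule package, with run_load standing in for the extract-and-upload function from the video (cron or Windows Task Scheduler pointed at main.py would work just as well):

import time

import schedule  # pip install schedule

def run_load():
    # Placeholder: call the existing SQL Server -> S3 load here.
    print("running SQL Server -> S3 load")

# Run the load every day at 02:00 local time.
schedule.every().day.at("02:00").do(run_load)

while True:
    schedule.run_pending()
    time.sleep(60)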

Thanks for the informative content! So how would you deal with large tables, especially for the initial loads? Assuming a table is 200-300 GB, selecting all the data and keeping it in data frames / in-memory objects doesn't look practical, so I believe defining a batch key/partitions on the source side and iterating over them in the code could be a way.

ahmetaslan
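
The batching idea described in the comment above can be sketched with pandas' chunksize option, which streams the query result in fixed-size row batches so only one batch is in memory at a time; each batch is uploaded as its own object (connection details, table, and bucket names are placeholders):

import io

import boto3
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorksDW;Trusted_Connection=yes;"
)
s3 = boto3.client("s3")

# Stream the table in 100k-row batches instead of loading it all at once.
for i, chunk in enumerate(pd.read_sql("SELECT * FROM FactSales", conn, chunksize=100_000)):
    buf = io.StringIO()
    chunk.to_csv(buf, index=False)
    s3.put_object(
        Bucket="my-data-lake-bucket",
        Key=f"FactSales/part_{i:05d}.csv",
        Body=buf.getvalue(),
    )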

Thanks for the great video! Instead of CSV, would you recommend uploading the same structured data as Parquet?

uatcloud
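
Parquet is usually a good choice for structured data like this (smaller objects, column pruning, and types preserved). A minimal sketch of uploading a DataFrame as Parquet instead of CSV, assuming pyarrow is installed and using a placeholder bucket name:

import io

import boto3
import pandas as pd

df = pd.DataFrame({"product_id": [1, 2], "name": ["bike", "helmet"]})  # stand-in data

# Write Parquet to an in-memory buffer (pandas delegates to pyarrow).
buf = io.BytesIO()
df.to_parquet(buf, index=False)

s3 = boto3.client("s3")
s3.put_object(Bucket="my-data-lake-bucket", Key="DimProduct.parquet", Body=buf.getvalue())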

Where would you run main.py if you don't want to run it locally?

AnshuJoshi-ohio

I am working with MS SQL; the database is on AWS RDS and contains billions of rows. I want to extract and load the data to S3 via Glue. How can I do it?

mukeshgupta
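
At that scale a Glue job can read the RDS table and write straight to S3 without pulling everything through one machine. A rough sketch of a Glue PySpark job, assuming the table has already been crawled into the Glue Data Catalog; the database, table, and S3 path names are placeholders:

# Sketch of an AWS Glue PySpark job (runs inside Glue, not locally).
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the RDS SQL Server table via the Glue Data Catalog (placeholder names).
frame = glue_context.create_dynamic_frame.from_catalog(
    database="my_catalog_db",
    table_name="dbo_fact_sales",
)

# Write the data to S3 as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=frame,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake-bucket/fact_sales/"},
    format="parquet",
)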

I ran into a problem like this:
importing rows 0 to 606 for table DimProduct
Data load error: name 'upload_file_bucket' is not defined
How do I solve this? Thanks.

ihab
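
That NameError usually just means the variable holding the bucket name is referenced before it is assigned, or is spelled differently where it is defined. Without seeing the full script this is only a guess, but the fix is typically along these lines (bucket name is a placeholder):

import boto3

upload_file_bucket = "my-data-lake-bucket"  # define (and spell) this before it is referenced

s3 = boto3.client("s3")
s3.put_object(Bucket=upload_file_bucket, Key="DimProduct.csv", Body="col1,col2\n1,2\n")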

What if the tables are so big that you cannot load all the data locally in Pandas? It would be great to show how to batch the table to S3 using SQL and Pandas or SQL and PySpark. I am currently using a Docker container, but my source is so big that even with all of my Mac's computing power allocated to the container, the transfer still fails with an OOM 137 from Docker.

jasonp
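
One way to keep memory flat regardless of table size is to page through the table on the SQL Server side with ORDER BY ... OFFSET/FETCH and push each page to S3 before reading the next (pandas' chunksize argument achieves much the same thing). A sketch with placeholder names, key column, and page size:

import io

import boto3
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorksDW;Trusted_Connection=yes;"
)
s3 = boto3.client("s3")

page_size = 500_000
offset = 0
page = 0
while True:
    # Page through the table server-side; only one page is ever held in memory.
    query = (
        "SELECT * FROM FactSales ORDER BY SalesKey "
        f"OFFSET {offset} ROWS FETCH NEXT {page_size} ROWS ONLY"
    )
    df = pd.read_sql(query, conn)
    if df.empty:
        break
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    s3.put_object(
        Bucket="my-data-lake-bucket",
        Key=f"FactSales/page_{page:05d}.csv",
        Body=buf.getvalue(),
    )
    offset += page_size
    page += 1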

I have 4 different CSV files in S3 and I need to load them into four different tables in Redshift. Can you tell me how this is possible using a Lambda function?

muppallavenkatadri
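
A common pattern is a Lambda that issues one Redshift COPY per file, for example through the Redshift Data API. A rough sketch with placeholder cluster, database, table, bucket, and IAM role names (an S3 event trigger or a schedule would invoke it):

import boto3

redshift = boto3.client("redshift-data")

# Map each S3 file to its target table (all names are placeholders).
FILES_TO_TABLES = {
    "s3://my-data-lake-bucket/customers.csv": "staging.customers",
    "s3://my-data-lake-bucket/orders.csv": "staging.orders",
    "s3://my-data-lake-bucket/products.csv": "staging.products",
    "s3://my-data-lake-bucket/sales.csv": "staging.sales",
}

def lambda_handler(event, context):
    for s3_path, table in FILES_TO_TABLES.items():
        redshift.execute_statement(
            ClusterIdentifier="my-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=(
                f"COPY {table} FROM '{s3_path}' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role' "
                "FORMAT AS CSV IGNOREHEADER 1;"
            ),
        )
    return {"status": "copies submitted"}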

Can I move any type of file using this?

AnanthuS-miqm

I have these two databases from Amazon AWS S3 in my on-premise SQL Server.

Is there a way I can migrate this entire DB on Amazon S3 to Snowflake?

socialawareness
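
Snowflake can load directly from S3 with COPY INTO, so the data does not need to pass through the on-premise server again. A rough sketch using the snowflake-connector-python package; the account, credential, table, and bucket values are all placeholders:

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="public",
)

# Load CSV files that already sit in S3 straight into a Snowflake table.
conn.cursor().execute(
    """
    COPY INTO my_table
    FROM 's3://my-data-lake-bucket/exports/'
    CREDENTIALS = (AWS_KEY_ID='<key id>' AWS_SECRET_KEY='<secret key>')
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """
)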

Hi, could you show us in detail how to connect to MSSQL?

montassarbenkraiem
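
The connection itself is usually just a pyodbc connection string. A minimal sketch showing Windows (trusted) authentication, with the server, instance, and database names as placeholders; SQL Server authentication would use UID=...;PWD=... instead of Trusted_Connection:

import pyodbc

# Windows (trusted) authentication -- typical for an on-premise server.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost\\SQLEXPRESS;"      # server\instance (placeholder)
    "DATABASE=AdventureWorksDW;"
    "Trusted_Connection=yes;"
)

print(conn.cursor().execute("SELECT @@VERSION").fetchone()[0])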

Is it possible to update an existing file on S3 line by line?

KeshavChoudhary-dxxd
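
S3 objects cannot be edited in place; an "update" means reading the object, changing it, and writing the whole object back (or writing a new key, possibly with versioning enabled). A minimal sketch of that read-modify-write pattern with placeholder bucket and key names:

import boto3

s3 = boto3.client("s3")
bucket, key = "my-data-lake-bucket", "DimProduct.csv"

# Read the existing object, append a line, and overwrite it -- S3 has no in-place edits.
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
body += "999,New Product\n"
s3.put_object(Bucket=bucket, Key=key, Body=body)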

What's your email? I have some questions for you.

kwabenau

Excellent content, just what I need to know! Looking forward to your Airflow videos next

richardhoppe

Hey Haq, how do we do incremental loads in this pipeline? Do we need to rewrite all the new data into S3 again, or is there a way to make field-level changes/inserts/deletes on existing buckets? I am assuming this is where data lake file formats such as Apache Hudi come into the picture; correct me if I am wrong, and please walk me through the workaround process.

jaswanth
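
A common approach without a table format is a watermark-based incremental extract: remember the last ModifiedDate (or key) already loaded, pull only newer rows, and write them as a new timestamped object, leaving existing objects untouched; updates and deletes are then resolved downstream. Formats such as Apache Hudi, Iceberg, or Delta Lake do support record-level upserts and deletes on S3, but they sit on top of Spark rather than plain boto3. A sketch of the watermark approach with placeholder table, column, and bucket names:

import io
from datetime import datetime, timezone

import boto3
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorksDW;Trusted_Connection=yes;"
)
s3 = boto3.client("s3")
bucket = "my-data-lake-bucket"

# The watermark is the last ModifiedDate already loaded; here it is kept as a
# small object in the same bucket and created on the first run.
watermark_key = "watermarks/FactSales.txt"
try:
    last_loaded = s3.get_object(Bucket=bucket, Key=watermark_key)["Body"].read().decode()
except s3.exceptions.NoSuchKey:
    last_loaded = "1900-01-01 00:00:00"

# Pull only rows changed since the last load.
df = pd.read_sql(
    "SELECT * FROM FactSales WHERE ModifiedDate > ?", conn, params=[last_loaded]
)

if not df.empty:
    buf = io.StringIO()
    df.to_csv(buf, index=False)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    s3.put_object(Bucket=bucket, Key=f"FactSales/incremental_{stamp}.csv", Body=buf.getvalue())
    s3.put_object(Bucket=bucket, Key=watermark_key, Body=str(df["ModifiedDate"].max()))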