Project Name : Build a SQL-based data analysis toolkit

This project gives hands-on experience in designing, querying, optimizing, and visualizing data in a relational database, using the finance domain as the working example. It covers the following topics:

Schema Design :

Structuring tables, columns, and relationships to logically represent domain entities and enforce data integrity, enabling efficient storage and reliable joins.

Modeled a finance domain with tables for customers, accounts, instruments, transactions, market_data, portfolios, and portfolio_holdings.

Enforced data integrity via primary and foreign keys.

Chose data types appropriate for each field (e.g., DECIMAL(18,6) for prices, TIMESTAMP for date-time values).
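The schema choices above can be sketched in a few lines. This is a minimal illustration, not the project's actual DDL: the column names are assumptions, only four of the seven tables are shown, and SQLite (from the Python standard library) stands in for MySQL so the sketch runs without a server. SQLite accepts DECIMAL(18,6) and TIMESTAMP as declared types but stores them with its own type affinities.

```python
import sqlite3

# Hypothetical, trimmed-down schema: column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE accounts (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
);
CREATE TABLE instruments (
    instrument_id INTEGER PRIMARY KEY,
    symbol        TEXT NOT NULL UNIQUE
);
CREATE TABLE transactions (
    txn_id        INTEGER PRIMARY KEY,
    account_id    INTEGER NOT NULL REFERENCES accounts(account_id),
    instrument_id INTEGER NOT NULL REFERENCES instruments(instrument_id),
    side          TEXT NOT NULL CHECK (side IN ('BUY', 'SELL')),
    quantity      DECIMAL(18,6) NOT NULL,
    price         DECIMAL(18,6) NOT NULL,
    txn_time      TIMESTAMP NOT NULL
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_schema()
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO accounts VALUES (10, 1)")

# An account pointing at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO accounts VALUES (11, 999)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print("orphan account rejected:", fk_rejected)
```

The rejected insert shows what the foreign key buys: an account cannot exist without its customer, so later joins between the two tables never hit dangling rows.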

Complex SQL Queries :

Crafting advanced SELECT statements with aggregations, conditional logic, and date functions to extract and analyze data for key business insights.

These queries teach reusable patterns for filtering, aggregation, and ranking.

Developed five key queries to answer business questions:

Monthly trading summary per account

Top 10 instruments by traded volume

Customer lifetime net traded value

Unrealized P/L per portfolio

7-day moving average of close price

Used aggregates (SUM), conditional logic (CASE WHEN), date functions (DATE_FORMAT, YEAR, MONTH), and correlated subqueries.
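Two of the query patterns listed above, the monthly summary (SUM with CASE WHEN) and the 7-day moving average (correlated subquery), might look roughly like this. It is a sketch under assumptions: the column names and sample rows are invented for illustration, and it runs on SQLite, so strftime and date(...) take the place of MySQL's DATE_FORMAT and date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    account_id INTEGER, side TEXT, quantity REAL, price REAL, txn_time TEXT
);
CREATE TABLE market_data (instrument_id INTEGER, date TEXT, close REAL);
""")
# Invented sample trades for two hypothetical accounts.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "BUY", 10, 100.0, "2024-01-05 10:00:00"),
        (1, "SELL", 4, 110.0, "2024-01-20 15:30:00"),
        (1, "BUY", 5, 120.0, "2024-02-02 09:15:00"),
        (2, "SELL", 3, 50.0, "2024-01-11 12:00:00"),
    ],
)
# Eight consecutive daily closes (1.0 .. 8.0) for hypothetical instrument 7.
conn.executemany(
    "INSERT INTO market_data VALUES (7, ?, ?)",
    [(f"2024-03-{day:02d}", float(day)) for day in range(1, 9)],
)

# Monthly trading summary per account: conditional sums split buys from sells.
MONTHLY_SUMMARY = """
SELECT account_id,
       strftime('%Y-%m', txn_time) AS month,
       SUM(CASE WHEN side = 'BUY'  THEN quantity * price ELSE 0 END) AS bought,
       SUM(CASE WHEN side = 'SELL' THEN quantity * price ELSE 0 END) AS sold
FROM transactions
GROUP BY account_id, month
ORDER BY account_id, month
"""

# 7-day moving average: for each row, a correlated subquery averages the
# closes of the same instrument over that day and the six days before it.
MOVING_AVERAGE = """
SELECT m.date,
       m.close,
       (SELECT AVG(p.close)
        FROM market_data p
        WHERE p.instrument_id = m.instrument_id
          AND p.date BETWEEN date(m.date, '-6 days') AND m.date) AS ma7
FROM market_data m
WHERE m.instrument_id = ?
ORDER BY m.date
"""

summary_rows = conn.execute(MONTHLY_SUMMARY).fetchall()
ma_rows = conn.execute(MOVING_AVERAGE, (7,)).fetchall()

for row in summary_rows:
    print(row)
for row in ma_rows:
    print(row)
```

On an engine with window functions, the moving average could also be written with AVG(...) OVER (...); the correlated subquery is shown here because it is the technique the project names.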

Stored Procedures :

Packaging reusable SQL logic into database-level routines with parameters to simplify client code, standardize operations, and centralize maintenance.

Encapsulated recurring logic into five procedures:

get_monthly_summary(year, month)

get_top_instruments_by_volume(start, end, limit)

get_customer_lifetime_net(customer_id)

get_unrealized_pl(portfolio_id)

purge_old_transactions(cutoff_datetime)

Used IN parameters, BEGIN...END blocks, and CALL syntax.

Snippet :
DELIMITER //
CREATE PROCEDURE sp_get_monthly_summary(
    IN p_year INT,
    IN p_month INT
)
BEGIN
    -- aggregation logic here
END //
DELIMITER ;
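From the Python side, a procedure like this is typically invoked through the cursor's callproc method, which is the client-side counterpart of CALL. The helper below is a hypothetical sketch, not part of the project: it assumes conn is a PyMySQL-style connection whose cursors support the with statement, and it uses the procedure name from the snippet (sp_get_monthly_summary).

```python
from typing import Any, Sequence


def call_monthly_summary(conn: Any, year: int, month: int) -> Sequence[Any]:
    """Invoke sp_get_monthly_summary and return all rows it produces.

    `conn` is assumed to be a PyMySQL-style DB-API connection; nothing here
    opens the connection or defines the procedure itself.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    with conn.cursor() as cur:
        # Equivalent to running: CALL sp_get_monthly_summary(year, month);
        cur.callproc("sp_get_monthly_summary", (year, month))
        return cur.fetchall()
```

Validating the month before the call keeps an obviously bad argument from costing a database round trip; the procedure remains the single place where the aggregation logic lives.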

Performance Optimization :

Tuning indexes, refreshing statistics, and examining EXPLAIN plans to reduce scan costs and accelerate query execution on growing datasets.

Added indexes, mostly composite, on high-cardinality columns:

transactions(instrument_id, txn_time)

transactions(account_id, txn_time)

market_data(instrument_id, date)

accounts(customer_id)

Used ANALYZE TABLE to refresh optimizer statistics.

Employed EXPLAIN to compare query plans before and after indexing.

Measured end-to-end runtimes in Python to quantify speedups (e.g., from full table scans to index seeks).
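The before-and-after comparison described above can be reproduced in miniature. This sketch is an approximation of the workflow, not the project's benchmark script: the data is synthetic, the index name is made up, and SQLite's EXPLAIN QUERY PLAN and ANALYZE stand in for MySQL's EXPLAIN and ANALYZE TABLE, whose output looks different.

```python
import sqlite3
import time


def plan(conn, sql, params):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


def timed(conn, sql, params, repeats=20):
    """Average wall-clock seconds per execution over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeats


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " txn_id INTEGER PRIMARY KEY, account_id INTEGER,"
    " quantity REAL, txn_time TEXT)"
)
# Synthetic data: 50,000 trades spread evenly over 500 hypothetical accounts.
conn.executemany(
    "INSERT INTO transactions (account_id, quantity, txn_time) VALUES (?, ?, ?)",
    ((i % 500, 1.0, f"2024-01-{i % 28 + 1:02d} 00:00:00") for i in range(50_000)),
)

QUERY = (
    "SELECT SUM(quantity) FROM transactions "
    "WHERE account_id = ? AND txn_time >= ?"
)
ARGS = (42, "2024-01-15 00:00:00")

# Baseline: no secondary index, so the engine has to scan the whole table.
plan_before = plan(conn, QUERY, ARGS)
result_before = conn.execute(QUERY, ARGS).fetchone()[0]
time_before = timed(conn, QUERY, ARGS)

# Composite index matching the filter columns, then refresh statistics.
conn.execute(
    "CREATE INDEX idx_txn_account_time ON transactions (account_id, txn_time)"
)
conn.execute("ANALYZE")

plan_after = plan(conn, QUERY, ARGS)
result_after = conn.execute(QUERY, ARGS).fetchone()[0]
time_after = timed(conn, QUERY, ARGS)

print("before:", plan_before, f"{time_before * 1e3:.3f} ms")
print("after: ", plan_after, f"{time_after * 1e3:.3f} ms")
```

The column order in the index follows the query: the equality filter on account_id comes first and the range filter on txn_time second, which lets the engine jump to one account and then read a contiguous time range. Absolute timings depend on the machine, so the plan text is the more reliable evidence that the index is being used.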

Python Interface & Visualization :

Bridging SQL with Python via connectors and Pandas to fetch query results and render them in Matplotlib charts for intuitive data storytelling.

Built a Python module using SQLAlchemy, PyMySQL, Pandas, and Matplotlib.

Provided a utility function run_query(sql, params) to fetch results into a DataFrame.

Created a prototype chart showing the monthly net traded value trend.

Demonstrated performance benchmarking and visualization in one script.
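A run_query helper of the kind described above can be quite small. The version below is a sketch under assumptions rather than the project's module: it takes an extra conn argument that the original signature does not have, it reads from an in-memory SQLite connection instead of a SQLAlchemy engine backed by PyMySQL, and the sample rows and the sign convention (buys positive, sells negative) are invented for illustration. It requires pandas to be installed.

```python
import sqlite3
from typing import Any, Mapping, Optional

import pandas as pd

# Stand-in database with invented trades; the real toolkit connects to MySQL.
CONN = sqlite3.connect(":memory:")
CONN.execute(
    "CREATE TABLE transactions ("
    " account_id INTEGER, side TEXT, quantity REAL, price REAL, txn_time TEXT)"
)
CONN.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?)",
    [
        (1, "BUY", 10, 100.0, "2024-01-05 10:00:00"),
        (1, "SELL", 4, 110.0, "2024-01-20 15:30:00"),
        (1, "BUY", 5, 120.0, "2024-02-02 09:15:00"),
    ],
)


def run_query(
    sql: str,
    params: Optional[Mapping[str, Any]] = None,
    conn: sqlite3.Connection = CONN,
) -> pd.DataFrame:
    """Run a parameterized query and return the result set as a DataFrame."""
    return pd.read_sql_query(sql, conn, params=params)


# Monthly net traded value for one account (assumed convention: buys count
# as positive, sells as negative).
NET_BY_MONTH = """
SELECT strftime('%Y-%m', txn_time) AS month,
       SUM(CASE WHEN side = 'BUY' THEN quantity * price
                ELSE -quantity * price END) AS net_traded_value
FROM transactions
WHERE account_id = :account_id
GROUP BY month
ORDER BY month
"""

df = run_query(NET_BY_MONTH, {"account_id": 1})
print(df)
```

Passing values through params rather than formatting them into the SQL string keeps the query safe from injection. With Matplotlib installed, the trend chart could then be drawn from the returned frame, for example with df.plot(x="month", y="net_traded_value").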

This toolkit demonstrates the end-to-end workflow for a SQL-based data analysis project:

From careful schema design to complex queries,

From encapsulating logic in stored procedures to tuning performance,

And finally exposing insights via a Python client.

By following these steps, students learn best practices in database modeling, query writing, optimization, and visualization, which are foundational skills for any data-driven role.