Tutorials - Simon Willison: Data analysis with SQLite and Python

preview_player
Показать описание
SQLite is the world's most widely used database and has been a part of the Python standard library since 2006. It continues to evolve and offer more capabilities every year.

This tutorial will transform you into a SQLite power-user. You'll learn to use SQLite with Python, level up your SQL skills and take advantage of libraries such as sqlite-utils and tools such as Datasette to explore and analyze data in all kinds of shapes and sizes.

This hands-on tutorial will cover:

- The sqlite3 module in the Python standard library
- A review of SQL basics, plus advanced SQL features available in SQLite
- Using sqlite-utils for advanced manipulation of SQLite databases
- Datasette as a tool for exploring, analyzing and publishing data
- Applying the Baked Data architectural pattern to build a data application using Datasette and deploy it to the cloud
This tutorial is aimed at beginner-to-intermediate Python users with some previous exposure to basic SQL.

Attendees will leave this workshop with a powerful new set of tools for productively exploring, analyzing and publishing data.
Рекомендации по теме
Комментарии
Автор

*Data Analysis with SQLite and Python: A Comprehensive Tutorial*

* *0:07** Introduction:* Simon Willison welcomes attendees to PyCon and introduces the focus of the tutorial: data analysis using SQLite and Python.
* *0:19** Extensive Handout:* Willison directs attendees to an online handout that mirrors the tutorial's content.
* *1:01** Power of SQLite:* Willison highlights the power and versatility of SQLite, emphasizing its embedded nature, speed, extensive language bindings, backwards compatibility, single-file format, and surprising power.
* *7:00** Working with SQLite in Python:* Willison demonstrates basic SQLite operations in Python, including:
* *7:10** Connecting to a database:* `import sqlite3; db =
* *11:34** Listing tables:* `SELECT name FROM sqlite_master WHERE type='table'`
* *11:48** Fetching query results:* `cursor.fetchall()`
* *12:28** Iterating through results:* Using the cursor object as an iterator.
* *12:52** Customizing row factories:* `db.row_factory = sqlite3.Row` to return rows as dictionary-like objects.
* *14:10** Creating tables:* Using SQL `CREATE TABLE` statements, emphasizing the simplicity of SQLite's data types (text, integer, real, blob).
* *16:02** Parsing PEPs:* Willison imports PEP data from a URL, parses it with a GPT-4 generated function, and shows how to insert the data into the database.
* *18:07** Parameterizing queries:* Emphasizes the importance of using placeholders (question marks or colon syntax) to prevent SQL injection vulnerabilities.
* *18:37** Transactions:* Demonstrates using the `with db:` context manager to wrap database changes within a transaction for data integrity.
* *21:48** Updates and Deletes:* Shows SQL `UPDATE` and `DELETE` statements to modify and remove data from tables.
* *24:52** Introducing Datasette:* Willison introduces Datasette, his open-source tool for exploring, analyzing, and publishing SQLite databases. He demonstrates how to:
* *25:54** Start a Datasette server:* `dataset content.db`
* *26:51** Explore database tables:* Datasette provides a web interface for viewing table data, including his personal website's database.
* *28:41** Foreign Key Hyperlinks:* Datasette automatically creates hyperlinks for foreign key relationships, allowing for easy data navigation.
* *35:45** Faceting:* Demonstrates how to use facets to group and count data based on values in specific columns, enabling quick insights into large datasets.
* *37:12** Query Building:* Shows the built-in query builder for filtering data.
* *37:58** Exporting Data:* Datasette allows exporting data in CSV, JSON, and other formats.
* *39:24** Datasette as an API:* Emphasizes that the Datasette interface acts as an API, allowing programmatic access to the data.
* *41:31** Exploring SQL with Datasette:* The "View and edit SQL" button reveals the underlying SQL queries generated by Datasette, making it a valuable tool for learning SQL.
* *46:12** SQL as an API:* Datasette exposes an API endpoint for executing SQL queries, arguing that this is safe and flexible for read-only published databases.
* *48:48** Datasette Plugins:* Willison highlights the extensive plugin ecosystem of Datasette:
* *49:23** Installing plugins:* `dataset install dataset-cluster-map`
* *49:53** The Value of Plugins:* Plugins enable community-driven feature development.
* *50:24** `dataset-cluster-map`:* Demonstrates a plugin that automatically creates a map for tables with latitude and longitude data.
* *55:18** SQLite Utils:* Willison introduces SQLite Utils, his command-line tool and Python library for manipulating SQLite databases, demonstrating how to:
* *55:32** Insert data from CSV:* `sqlite-utils insert manatees.db locations ... -csv -d`
* *57:01** View database schema:* `sqlite-utils schema manatees.db`
* *57:12** Exploring Data in Datasette:* Willison loads the newly created database into Datasette to visualize and explore the Manatee carcass data.
* *59:55** Transform command:* `sqlite-utils transform ... ` to refactor tables, rename columns, drop columns, and reassign primary keys.
* *59:55** Normalizing Data:* Uses SQLite Utils to extract unique mortality codes into a separate table and create a foreign key relationship, illustrating data normalization techniques.
* *59:55** Convert command:* `sqlite-utils convert ...` to modify data within columns using Python expressions, such as converting date formats.
* *1:07:06** SQLite Utils as a Python Library:* Willison transitions to a Jupyter notebook to demonstrate using SQLite Utils as a Python library, showing how to:
* *1:07:11** Create a database:* `db =
* *1:07:54** Parse PEP data:* Imports a function to parse PEP files into dictionaries.
* *1:08:01** Insert data with `insert_all`:* `db['peps']-dot-insert_all(peps, pk='pep', alter=True)`
* *1:08:52** Exploring PEPs in Datasette:* Loads the newly created PEP database into Datasette for browsing and faceting.
* *1:09:49** Integrating with Pandas:* `pd.read_sql('SELECT * FROM peps', db.conn)` to create a Pandas DataFrame from the SQLite data.
* *1:10:55** Full-Text Search (FTS):* `db['peps'].enable_fts(['title', 'body'])` to enable SQLite's built-in FTS5 extension.
* *1:10:55** Datasette Search Integration:* Datasette automatically adds a search box for tables with FTS enabled.
* *1:17:07** Publishing to the Web:* Willison demonstrates how to quickly publish a Datasette instance:
* *1:17:43** Using Vercel:* He uses the `dataset publish vercel` plugin to deploy the PEP database to Vercel, a serverless platform.
* *1:18:08** Alternative Hosting Options:* He mentions other plugins for publishing to Google Cloud Run, Heroku, and Fly-dot-io.
* *1:25:50** Datasette Light:* Willison introduces Datasette Light, a version of Datasette that runs entirely in the browser using WebAssembly.
* *1:27:18** Loading Datasette Light:* Visiting `light.dataset-dot-io` triggers the browser to download and run the Python interpreter and Datasette using Pyodide.
* *1:59:55** Loading Data from URLs:* Datasette Light can load data directly from CSV, JSON, and SQLite URLs.
* *2:06:09** Plugin Support:* Datasette Light supports installing plugins using the `?install=` URL parameter.
* *2:08:44** Advanced SQL Techniques:* Willison covers several advanced SQL concepts:
* *2:08:50** Aggregations:* Demonstrates grouping data with `GROUP BY` and using aggregate functions like `COUNT`, `MAX`, `MIN`, and `SUM`.
* *2:11:24** Subqueries:* Shows how to use nested `SELECT` statements within queries for more complex data retrieval.
* *2:16:59** Common Table Expressions (CTEs):* Explains how CTEs (`WITH ... AS`) can be used to break down complex queries into more manageable chunks.
* *2:21:12** JSON Functions:* Highlights SQLite's support for working with JSON data, including the `JSON_GROUP_ARRAY` function for aggregating related data into JSON arrays.
* *2:25:59** Window Functions:* Demonstrates using window functions, such as `RANK()`, for analytical tasks like calculating rankings within partitioned data.
* *2:34:03** Real-World Datasette Examples:* Willison showcases personal projects that utilize Datasette, including:
* *2:34:16** Niche Museums Website:* His website listing tiny museums, demonstrating the use of custom Datasette templates.
* *2:37:44** simonwillison-dot-net Blog:* A Datasette-powered version of his blog, updated every two hours via a GitHub Actions script that pulls data from his primary PostgreSQL database.
* *2:39:48** Blog to Substack Newsletter:* An Observable notebook that uses the Datasette API to generate HTML content for his Substack newsletter.

*Key Takeaways:*

* SQLite and Datasette offer a powerful and flexible toolkit for data analysis and publishing.
* SQLite Utils streamlines data manipulation and database refactoring.
* Datasette Light provides a truly serverless way to explore and share data.
* Advanced SQL techniques can significantly enhance data exploration and analysis capabilities.
* Datasette's plugin ecosystem extends functionality and enables customization.
* Baking data into applications can leverage serverless hosting platforms for cost-effective data publishing.


I used gemini-1.5-pro-exp-0827 on rocketrecap dot com to summarize the transcript.
Cost (if I didn't use the free tier): $0.09
Input tokens: 60309
Output tokens: 2132

wolpumba
Автор

Definitely a converted to SQLite now :) I had forgotten how powerful it is, and planning to use datasette to share with my team mates.
The presenter has amazing teaching style and obviously so knowledgeable and experienced that can bat any demo demons or free-styling based on audience questions without change of tone of voice or change of pace… seamlessly weaving it into the tutorial. I reckon I can get my very python/sql newbies colleagues to seat through this almost 3h presentation to get an all in one intro to using data and leave excited about the possibilities … all looks so easy :)
Many thanks

jespasan
Автор

This is gold. Thanks for the tutorial!

senpos
Автор

Excellent presentation! Thank you Sir.

facundolopez