Pandas vs SQL - What's The Difference?

SQL, or Structured Query Language, is a powerful language that's specifically designed for working with data stored in relational databases. With SQL, you can write queries that allow you to select, filter, and transform data in order to extract the information you need. SQL is widely used in industries such as finance, healthcare, and retail, and it's a must-know skill for anyone who works with data in these fields.
On the other hand, Pandas is a popular library for Python that provides similar functionality to SQL for working with data in a tabular format. With Pandas, you can use Python to perform complex data manipulation tasks, including filtering, aggregating, and transforming data. Pandas is often used in conjunction with other scientific computing libraries in Python, such as NumPy and SciPy, to perform advanced data analysis tasks.

So, how do SQL and Pandas compare? Well, they both have their strengths and weaknesses. Let's find out.
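
As a rough illustration of the overlap described above, the sketch below runs a filter, an aggregation, and a transformation in pandas, with an approximately equivalent SQL statement noted in a comment above each step. The sample data, the column names (`name`, `city`, `age`), and the table name `customers` are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data; column names are assumptions for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "city": ["Oslo", "Oslo", "Rome", "Rome"],
    "age": [25, 41, 33, 52],
})

# Filter. Roughly: SELECT * FROM customers WHERE age > 30;
over_30 = df[df["age"] > 30]

# Aggregate. Roughly: SELECT city, AVG(age) AS avg_age FROM customers GROUP BY city;
avg_age = (
    df.groupby("city", as_index=False)["age"]
    .mean()
    .rename(columns={"age": "avg_age"})
)

# Transform. Roughly: SELECT *, age + 1 AS age_next_year FROM customers;
df_next = df.assign(age_next_year=df["age"] + 1)

print(over_30)
print(avg_age)
print(df_next)
```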
Comments
Author

Pandas is a monster for data wrangling. One of its key advantages is that data manipulation is two-dimensional, whereas SQL is one-dimensional, i.e., record-based rather than field-based. Here is just one example: computing a rolling mean over 10 periods and inserting it as a new column. Pandas can do this in one short line of code. In SQL you have to ALTER the table, ADD a column, and derive the values through a much more complex mechanism spanning multiple statements.

Emotekofficial
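
The rolling-mean example from the comment above might look like the following in pandas. This is a minimal sketch: the DataFrame contents and the column names `value` and `rolling_mean_10` are hypothetical.

```python
import pandas as pd

# Hypothetical data: the integers 1..20 in a single column.
df = pd.DataFrame({"value": range(1, 21)})

# 10-period rolling mean stored in a new column.
# The first 9 rows are NaN because a full window is not yet available.
df["rolling_mean_10"] = df["value"].rolling(window=10).mean()

print(df.tail())
```

For comparison, databases that support window functions can express a similar calculation inside a SELECT, along the lines of `AVG(value) OVER (ORDER BY ... ROWS BETWEEN 9 PRECEDING AND CURRENT ROW)`, without altering the table. Unlike the pandas call, that form averages over fewer rows at the start instead of returning nulls.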
Author

Interesting video, but I feel that the comparison examples between SQL and pandas are unfair. For example, in the "age>30" filtering comparison, you said that SQL is better than pandas because in pandas you first need to import the library and then add the line "df=pd.read_csv('customers.csv')" before filtering by age>30. But I'm pretty sure that in SQL you also need to import the table into a database first. From a first look at the SQL documentation, I understand that the complete SQL code would be something like:

CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(255),
age INT
);

LOAD DATA LOCAL INFILE 'customers.csv'
INTO TABLE customers
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES; -- skip the header row, which the pandas version reads as column names

and then, finally

SELECT * FROM customers
WHERE age > 30;

that vs:

import pandas as pd
df = pd.read_csv('customers.csv')  # load the CSV into a DataFrame
df[df['age'] > 30]                 # keep rows where age is over 30

operonfun
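
To make the end-to-end comparison in the comment above concrete, here is a self-contained sketch that loads the same data and applies the age filter both ways from Python. It assumes an in-memory SQLite database in place of the MySQL-style `LOAD DATA` statement, and a small inline string standing in for `customers.csv`; the sample rows are made up.

```python
import csv
import io
import sqlite3

import pandas as pd

# Hypothetical stand-in for customers.csv (header row plus three records).
CSV_TEXT = "id,name,age\n1,Ann,25\n2,Bob,41\n3,Cid,33\n"

# pandas route: load the CSV, then filter.
df = pd.read_csv(io.StringIO(CSV_TEXT))
pandas_result = df[df["age"] > 30]

# SQL route: create the table, load the rows, then filter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
)
rows = [
    (int(r["id"]), r["name"], int(r["age"]))
    for r in csv.DictReader(io.StringIO(CSV_TEXT))
]
conn.executemany("INSERT INTO customers (id, name, age) VALUES (?, ?, ?)", rows)
sql_result = conn.execute(
    "SELECT * FROM customers WHERE age > 30 ORDER BY id"
).fetchall()
conn.close()

print(pandas_result)
print(sql_result)
```

Both routes need a setup step before the one-line filter, which is the point the comment makes.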