Python Interview Questions: PandasAI, AWS Data Wrangler, Siuba, PyGraphistry & pandas-on-Spark! 🚀

1️⃣ PandasAI for Conversational DataFrames
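PandasAI adds generative AI to pandas, letting you query, transform, and visualize a DataFrame using plain-English prompts.
Example:
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI
import pandas as pd
# Illustrative data; the column names are placeholders
df = pd.DataFrame({"product": ["A", "B", "C"], "sales": [100, 250, 175]})
# Assumes the early (0.x) PandasAI API with an OpenAI backend; later releases changed this interface
ai = PandasAI(OpenAI(api_token="YOUR_API_KEY"))
# Ask for the top-selling product
result = ai.run(df, prompt="Which product has the highest sales?")
print(result)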
2️⃣ AWS Data Wrangler for AWS-Native ETL
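AWS Data Wrangler (awswrangler, now published as AWS SDK for pandas) extends pandas with functions to read from and write to AWS services (Athena, Glue, S3, Redshift, and more) using familiar DataFrame commands.
Example:
import awswrangler as wr
import pandas as pd
df = pd.DataFrame({"id": [1, 2], "value": [10, 20]})
# Write to S3 as a partitioned Parquet dataset
# (bucket, database, and table names are placeholders)
wr.s3.to_parquet(df=df, path="s3://my-bucket/my-dataset/", dataset=True, partition_cols=["id"], database="my_database", table="my_table")
# Query with Athena into a DataFrame
result = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_database")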
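To make "partitioned dataset" concrete: partitioned Parquet datasets are conventionally laid out in Hive style, with one key=value folder per partition value. The sketch below uses plain Python only, touches no AWS service, and is not awswrangler's implementation. The function name partition_rows, the bucket path, and the choice of id as the partition column are all assumptions made for illustration.

```python
from collections import defaultdict

def partition_rows(rows, partition_cols, prefix):
    """Group rows under Hive-style `col=value/` prefixes (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # Build the partition part of the path, e.g. "id=1"
        parts = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # Partition values live in the path, so drop them from the stored row
        data = {k: v for k, v in row.items() if k not in partition_cols}
        groups[f"{prefix.rstrip('/')}/{parts}/"].append(data)
    return dict(groups)

rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
layout = partition_rows(rows, ["id"], "s3://my-bucket/my-dataset/")
for path, data in layout.items():
    print(path, data)
```

A query engine that filters on the partition column can then skip whole folders instead of scanning every file.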
3️⃣ Siuba for Tidy-Style Data Wrangling
Siuba is a port of R’s dplyr to Python, offering verbs like select(), filter(), mutate(), and a pipe operator (>>) to streamline scrappy analyses on pandas or SQL backends.
Example:
from siuba import _, select, filter, head
from siuba.data import mtcars
# Filter and select columns with dplyr syntax
(mtcars
  >> filter(_.mpg > 20)
  >> select(_.mpg, _.hp)
  >> head(5)
)
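If you are asked how a pipe like the one above (written >> in siuba) can work in Python at all, one common technique is the reflected right-shift method, __rrshift__. This is a minimal plain-Python sketch of that technique, not siuba's actual implementation. The names Verb, keep_if, pick, and first, and the sample rows, are made up for illustration.

```python
class Verb:
    """Wraps a function so that `data >> Verb(fn)` evaluates to `fn(data)`."""

    def __init__(self, fn):
        self.fn = fn

    def __rrshift__(self, data):
        # Python falls back to the right operand's __rrshift__ when the
        # left operand (here a list) does not implement `>>` itself.
        return self.fn(data)

def keep_if(predicate):
    # Keep only the rows for which the predicate is true
    return Verb(lambda rows: [r for r in rows if predicate(r)])

def pick(*keys):
    # Keep only the named keys of each row
    return Verb(lambda rows: [{k: r[k] for k in keys} for r in rows])

def first(n):
    # Keep the first n rows
    return Verb(lambda rows: rows[:n])

# Hypothetical rows standing in for a DataFrame
cars = [
    {"name": "car_a", "mpg": 21.0, "hp": 110},
    {"name": "car_b", "mpg": 18.7, "hp": 175},
    {"name": "car_c", "mpg": 22.8, "hp": 93},
]

result = cars >> keep_if(lambda r: r["mpg"] > 20) >> pick("mpg", "hp") >> first(2)
print(result)
```

Because >> is left-associative, each verb receives the output of the previous one, which is what makes the chain read top to bottom.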
4️⃣ PyGraphistry for GPU-Accelerated Graph Visualization
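PyGraphistry loads, binds, and plots big graphs with GPU acceleration, enabling interactive, web-native visual analytics for millions of nodes/edges.
Example:
import graphistry
import pandas as pd
# Register your Graphistry API key/endpoint if needed (credentials are placeholders)
graphistry.register(api=3, username="YOUR_USERNAME", password="YOUR_PASSWORD")
edges = pd.DataFrame([{"src": 1, "dst": 2}, {"src": 2, "dst": 3}])
# Bind the source/destination columns and open the interactive plot
graphistry.edges(edges, "src", "dst").plot()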
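"Binding" here means telling the tool which fields of each edge record are the source and the destination. As a rough plain-Python illustration of that idea (not PyGraphistry's API; bind_edges is a hypothetical helper), the sketch below maps the two fields to node pairs and counts each node's degree.

```python
from collections import Counter

def bind_edges(edges, source, destination):
    """Map edge records to (source, destination) pairs and count node degrees."""
    pairs = [(e[source], e[destination]) for e in edges]
    degree = Counter()
    for s, d in pairs:
        # Each edge contributes one to the degree of both endpoints
        degree[s] += 1
        degree[d] += 1
    return pairs, dict(degree)

edges = [{"src": 1, "dst": 2}, {"src": 2, "dst": 3}]
pairs, degree = bind_edges(edges, "src", "dst")
print(pairs)
print(degree)
```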
5️⃣ pandas-on-Spark for Scalable DataFrames
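pandas-on-Spark (pyspark.pandas, formerly Koalas) implements the pandas DataFrame API on top of Apache Spark, so pandas-style code can run distributed across a cluster.
Example:
import pyspark.pandas as ps
# Read a large CSV in parallel (the path and column names are placeholders)
psdf = ps.read_csv("path/to/large.csv")
# Compute group-by and mean as you would in pandas
print(psdf.groupby("category")["value"].mean())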
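A follow-up interviewers sometimes ask is how a group-by mean can be computed when the data is split across workers. One standard approach is to keep a (sum, count) pair per key in each partition and merge those pairs at the end. The sketch below shows that idea in plain Python; it is an assumption-laden illustration, not Spark's actual execution plan, and the function names and sample partitions are hypothetical.

```python
from collections import defaultdict

def partial_aggregate(rows):
    """Per-partition step: accumulate [sum, count] for each key."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        acc[key][0] += value
        acc[key][1] += 1
    return acc

def combine(partials):
    """Merge the per-partition [sum, count] pairs, then divide to get means."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            total[key][0] += s
            total[key][1] += c
    return {key: s / c for key, (s, c) in total.items()}

# Two hypothetical partitions of (category, value) rows
partitions = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0), ("b", 6.0), ("b", 8.0)],
]
means = combine(partial_aggregate(p) for p in partitions)
print(means)
```

Averaging the per-partition means directly would give the wrong answer when partitions hold different numbers of rows per key, which is why the counts are carried along.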