What is Spark SQL?

Spark SQL is Spark's DataFrame-based interface for working with structured and semi-structured data. Structured data is any data that has a schema, that is, a known set of fields. The advantage of this interface is that it makes it easy to load and query such data.

Spark SQL provides three main capabilities:

1. It provides the DataFrame abstraction, which extends the RDD. Think of a DataFrame as a table with rows and columns; conceptually, it is equivalent to a table in a relational database.
2. Spark SQL can read and write data in different formats, such as JSON, Parquet, CSV, and plain text files.
3. Whether you use Scala, Python, or Java, and whether you work in spark-shell, in pyspark, or submit a job with spark-submit, you can write SQL queries that interact with the underlying data.