filmov
tv
Spark SQL An Example-scala

Показать описание
Spark SQL - An Example
Work through an example that demonstrates executing queries using Spark SQL.
We'll cover the following
Simple query example
Complex query example
Simple query example
#scala When we execute Spark SQL queries in the spark-shell, we don’t need to explicitly create a SparkSession object. We would if we were writing a Spark application that was to be run using spark-submit. The SparkSession object is provided implicitly by the shell. Read the listing below, which is similar to what we have done in the previous lessons, except for creating the temporary view, which we discuss later.
"SELECT title FROM tempView WHERE releaseYear 2010 ORDER BY title desc"
The query returns all the movie titles that were released post-2010 and also sorts them in descending order by title. We then print three rows from the result. The sql method returns the results as a DataFrame. Spark SQL offers an ANSI:2003-compliant SQL interface and the query we executed should look and feel similar to a query against a relational database table such as MySQL or PostgreSQL. We can also express the same query using DataFrames API as follows:
Complex query example#
Let’s try a slightly more complex query. Imagine that we want to list all the movies released post 2010 and at the same time label each movie below average, average, and above average if the hitFlop column is less than 5, equal to 5, and higher than 5, respectively. The query and its output are presented below:
Work through an example that demonstrates executing queries using Spark SQL.
We'll cover the following
Simple query example
Complex query example
Simple query example
#scala When we execute Spark SQL queries in the spark-shell, we don’t need to explicitly create a SparkSession object. We would if we were writing a Spark application that was to be run using spark-submit. The SparkSession object is provided implicitly by the shell. Read the listing below, which is similar to what we have done in the previous lessons, except for creating the temporary view, which we discuss later.
"SELECT title FROM tempView WHERE releaseYear 2010 ORDER BY title desc"
The query returns all the movie titles that were released post-2010 and also sorts them in descending order by title. We then print three rows from the result. The sql method returns the results as a DataFrame. Spark SQL offers an ANSI:2003-compliant SQL interface and the query we executed should look and feel similar to a query against a relational database table such as MySQL or PostgreSQL. We can also express the same query using DataFrames API as follows:
Complex query example#
Let’s try a slightly more complex query. Imagine that we want to list all the movies released post 2010 and at the same time label each movie below average, average, and above average if the hitFlop column is less than 5, equal to 5, and higher than 5, respectively. The query and its output are presented below: