Python Tutorial: Dot notation and SQL

----

Previously we learned how to use SQL tables.
We now learn how to query dataframes using either SQL queries or dataframe dot notation, which lets us compare and contrast the two notations.

For example, suppose we have a dataframe containing three columns, and we want to select only two columns.

We could do this.

Shown again here is how we obtained the result in the previous slide. Note that the column train_id is given as a string in quotes. We can also do the following:

We can also import this column function, which allows us to do the following:

This time, the column is given as an argument to this new operator. It may seem more verbose in this case, but it is useful in other cases, such as the following.

To rename a column we can use the withColumnRenamed function. But we could also use the column operator, like so:

Don’t do this!

Pro tip: try not to use all three conventions at the same time without good reason.

Most Spark SQL queries can be done in either dot notation or SQL notation. Here’s an example. Notice that the limit operation is done at query time instead of at show time. We can get the same result using dot notation, using the column operator to select the train_id column and rename it in place.

Window functions can also be done in either SQL or dot notation. This query adds a number to each stop on a train line, in a new column called id.

Note how the id column starts over for train_id 324.

Here’s the same result using dot notation. There is typically a dot notation equivalent of every SQL clause including window functions.

The ROW_NUMBER function in SQL has an equivalent dot notation function, row_number.

The inside of an OVER clause is handled by a Window object.
The Window class provides methods for partitioning and for ordering. Some people prefer the SQL version; others prefer the dot notation.

A "WindowSpec" is defined using the "Window" class and then used subsequently as an argument to the over() function in a window function query. Here is an example.

Let's practice!