Python Tutorial: Building DataFrames from scratch


---
We've seen how to work with DataFrames in memory.

But how do we get them in memory?

In the Intermediate Python for Data Science course, we used read_csv to load a DataFrame from a comma-separated-values file.

For instance, here we use a file users.csv to create a DataFrame called users.

The file records visitors to a band's blog and how many of them signed up for the newsletter. Tracking where visitors come from can help the band plan future tours.
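The transcript doesn't show the contents of users.csv, so the sketch below is a stand-in. It feeds pd.read_csv an in-memory string through io.StringIO instead of a real file. The column names and all the counts are illustrative assumptions, not the course's actual data.

```python
import io

import pandas as pd

# Hypothetical stand-in for users.csv; the column names and counts are made up.
csv_text = """weekday,city,visitors,signups
Sun,Austin,139,7
Sun,Dallas,237,12
Mon,Austin,326,3
Mon,Dallas,456,5
"""

# read_csv accepts a file path or any file-like object; StringIO lets us skip the file.
users = pd.read_csv(io.StringIO(csv_text))
print(users)
```

With a real file on disk, the call would simply be pd.read_csv("users.csv").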

DataFrames can also be built by hand using dictionaries.

Remember, dictionaries (or associative arrays) are a core data structure in Python.

Here, we construct a dictionary of lists with the same users data.

The keys of the dictionary data are used as column labels.

Notice, with no index specified, the row labels are the integers zero to three by default.
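A minimal sketch of building a DataFrame from a dictionary of lists. The values are illustrative assumptions. It shows the keys turning into column labels and the default integer row labels appearing when no index is given.

```python
import pandas as pd

# Each key becomes a column label; each list holds that column's values.
# The numbers are illustrative, not real blog data.
data = {
    "weekday": ["Sun", "Sun", "Mon", "Mon"],
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
    "signups": [7, 12, 3, 5],
}

users = pd.DataFrame(data)
print(users)
print(users.index.tolist())  # [0, 1, 2, 3]: the default row labels
```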

Let's build the users DataFrame a different way, using the conforming lists (lists of equal length) cities, signups, visitors, and weekdays for the column data.

It is useful to be able to build DataFrames from lists because lists are a common Python data structure; it's natural that we might receive data accumulated in lists.

We can then define two other lists: list_labels (containing the column labels) and list_cols (containing the column entries for each column).

Notice list_cols is a list of lists.

Using Python's list and zip functions, we construct a list called zipped of tuples, each pairing a column name with its column, to feed to the dict constructor.

Calling dict(zipped) creates a dictionary called data, which is then passed to pd.DataFrame to build the DataFrame.
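The list-and-zip steps just described can be sketched as follows. The list names (cities, signups, visitors, weekdays, list_labels, list_cols, zipped, data) follow the transcript. The values inside the lists are illustrative assumptions.

```python
import pandas as pd

# Conforming lists: each holds one entry per row (values are illustrative).
cities = ["Austin", "Dallas", "Austin", "Dallas"]
signups = [7, 12, 3, 5]
visitors = [139, 237, 326, 456]
weekdays = ["Sun", "Sun", "Mon", "Mon"]

list_labels = ["city", "signups", "visitors", "weekday"]
list_cols = [cities, signups, visitors, weekdays]  # a list of lists

# zip pairs each label with its column; list() materializes the pairs as tuples.
zipped = list(zip(list_labels, list_cols))
print(zipped[0])  # ('city', ['Austin', 'Dallas', 'Austin', 'Dallas'])

# dict() turns the (label, column) tuples into {label: column}.
data = dict(zipped)
users = pd.DataFrame(data)
print(users)
```

Because dictionaries preserve insertion order, the columns appear in the same order as list_labels.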

Let's look again at broadcasting, a convenient technique in NumPy and pandas.

With users in memory, a new column, say fees, can be created on the fly.

By assigning the scalar value zero to the new column label fees, we broadcast that value to the entire column.

Broadcasting saves us from building long lists, arrays, or columns of repeated values by hand.
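Here is a short sketch of the fees example, using a small users DataFrame with illustrative values. Assigning a scalar to a new column label fills every row with that value.

```python
import pandas as pd

users = pd.DataFrame({
    "city": ["Austin", "Dallas", "Austin", "Dallas"],
    "visitors": [139, 237, 326, 456],
})

# Assigning a scalar to a new column label broadcasts it to every row.
users["fees"] = 0
print(users)
```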

Broadcasting is not restricted to numbers.

Here, we create a dictionary data with column labels height and sex as keys and a list and a single-character string 'M' as values.

When the dictionary data is used to create the DataFrame results, the value 'M' is broadcast to the entire column.
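A sketch of the height/sex example, with made-up height values. The list fixes the number of rows. pd.DataFrame then repeats the scalar string 'M' to match.

```python
import pandas as pd

heights = [59.0, 65.2, 62.9, 65.4, 63.7]  # illustrative values

# 'height' maps to a list; 'sex' maps to a single string that gets broadcast.
data = {"height": heights, "sex": "M"}
results = pd.DataFrame(data)
print(results)
```

Broadcasting a scalar this way needs at least one list-like value in the dictionary; with only scalars, pandas has no row count to broadcast to and asks for an explicit index.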

Remember, we can change the column and index labels using the columns and index attributes of a pandas DataFrame.

We can assign lists of strings to the columns and index attributes as long as they have a suitable length (that is, the number of columns and rows, respectively).
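A minimal sketch of relabelling through the columns and index attributes. The new labels are arbitrary examples. It also shows that a list of the wrong length is rejected with a ValueError, leaving the old labels in place.

```python
import pandas as pd

results = pd.DataFrame({"height": [59.0, 65.2, 62.9], "sex": "M"})

# New labels must match the number of columns and rows, respectively.
results.columns = ["HEIGHT", "SEX"]
results.index = ["A", "B", "C"]
print(results)

# A list of the wrong length raises ValueError and the labels stay unchanged.
try:
    results.columns = ["only_one_label"]
except ValueError as err:
    print("ValueError:", err)
```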

It's time for you to practice using other DataFrame construction techniques, broadcasting, and relabelling.

#Python #PythonTutorial #DataCamp #pandas #Foundations #DataFrames