R Tutorial: Data Manipulation with data.table in R | Intro

---
Welcome to the new and renewed introductory course on R's data.table package. I am Arun Srinivasan, a data scientist in the Finance industry.
We hope that you are already familiar with data frames at this point.
A data table is also a data frame, but it does so much more. Like a data frame, it is a columnar data structure, and all columns must be of equal length.
So why do you need data tables?
A data table is a 2-D data structure, the two dimensions being rows and columns. However, most data analysis tasks require performing operations by groups. It is quite common to consider grouping as a virtual third dimension.
The data table syntax is quite powerful because it provides quick access to these dimensions in the form of placeholders for operations on rows, columns, and groups. The first argument, 'i' allows for filtering of required rows by accepting an expression or simply the required row numbers. If the 'i' argument is empty, then no rows are filtered. The second argument, 'j' operates on columns. In addition to just selecting columns as in a data frame, it also allows for directly computing on the columns as you will see in the next chapters. The last argument 'by' allows you to operate on columns by groups.
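The three placeholders described above can be sketched as follows. This is a minimal illustration, not the course's own example: the table DT and its columns id, group, and value are hypothetical.

```r
library(data.table)

# Hypothetical example data (not from the course)
DT <- data.table(
  id    = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  value = c(10, 20, 30, 40, 50, 60)
)

# i: filter rows with an expression, or with row numbers
rows_by_expr <- DT[value > 25]
rows_by_num  <- DT[1:2]

# j: compute directly on columns (i is left empty, so no rows are filtered)
total <- DT[, sum(value)]

# by: evaluate j once per group
by_group <- DT[, .(total = sum(value)), by = group]

# i, j, and by together: filter rows, then compute per group
filtered_by_group <- DT[value > 15, .(mean_value = mean(value)), by = group]
```

Leaving i empty, as in DT[, sum(value)], keeps every row, which matches the description of the 'i' argument above.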
data.table is also very fast: many operations are parallelized, including filtering, ordering, grouping, and file reading and writing. Check out this link for up-to-date benchmarks on data.table's performance against other common packages.
Finally, data.table has many additional powerful features, including rolling, overlapping, and non-equi joins, updating tables by reference, fast reshaping, parallel file reading and writing, primary-key-based joins, and automatic creation of secondary keys for faster filtering and joins. We will not cover joins in this course, but they are covered in great detail in the Joining Data in R with data.table course here on DataCamp.
There are at least three ways in which you can create data tables. In this video, we will cover the first two: the data.table() and as.data.table() functions. We will cover fread() in the final chapter of this course.
You can use the data.table() function to create a data table from scratch, the same way you would use the data.frame() function. All you need to do is pass vectors of the same length to data.table().
To convert an existing R object to a data table, you can use the as.data.table() function. As you can see here, we converted the list y to a data table.
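As a minimal sketch of creating a data table from scratch, the two calls below take the same arguments. The column names id and name are hypothetical and not from the video.

```r
library(data.table)

# The same vectors passed to data.frame() and data.table()
x_df <- data.frame(id = 1:2, name = c("a", "b"))
x_dt <- data.table(id = 1:2, name = c("a", "b"))
```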
As mentioned earlier, a data table is also a data frame and you can confirm that here from the output of class().
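The slide shown in the video is not reproduced here. As a rough stand-in, the sketch below assumes a small list y (its contents are hypothetical), converts it, and checks the class.

```r
library(data.table)

# Hypothetical list; the actual y from the video is not shown in the transcript
y <- list(id = 1:2, name = c("a", "b"))

# Convert the existing object to a data table
y_dt <- as.data.table(y)

# A data table carries both classes
class(y_dt)
# "data.table" "data.frame"
```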
And thus you can use all the functions you would use on a data frame on a data table. nrow(), ncol(), and dim(), for example, return the number of rows, the number of columns, and the dimensions of a data table, just as they would for a data frame.
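For instance, on a small hypothetical table with three rows and two columns, these base R functions behave exactly as they do on a data frame:

```r
library(data.table)

# Hypothetical table: 3 rows, 2 columns
DT <- data.table(id = 1:3, name = c("a", "b", "c"))

nrow(DT)  # 3
ncol(DT)  # 2
dim(DT)   # 3 2
```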
However, there are a few enhancements.
Unlike data.frame() in versions of R before 4.0.0, data.table() never converts character columns to factors automatically, which prevents bugs caused by unexpected behavior.
Also, a data table never sets or uses the row names.
Finally, a minor but useful feature is that when you print a data table, a colon (:) is added after the row number to visually separate it from the first column.
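These enhancements can be checked with a short sketch on hypothetical data. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly here to reproduce the older data frame behavior for comparison.

```r
library(data.table)

# Characters stay characters in a data table
DF <- data.frame(name = c("a", "b"), stringsAsFactors = TRUE)  # old default made explicit
DT <- data.table(name = c("a", "b"))
class(DF$name)  # "factor"
class(DT$name)  # "character"

# Row names are not kept when converting a data frame
DF2 <- data.frame(x = 1:2, row.names = c("r1", "r2"))
DT2 <- as.data.table(DF2)
rownames(DT2)   # "1" "2"

# If the row names matter, keep.rownames = TRUE stores them in a column named "rn"
DT3 <- as.data.table(DF2, keep.rownames = TRUE)
names(DT3)      # "rn" "x"

# Printing shows the row number followed by a colon
print(DT)
```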
Now it's your turn to create data tables!
#R #RTutorial #DataCamp #Data #Manipulation #datatable