R Tutorial: The ExpressionSet class


---
Now you'll learn how to manage gene expression data using Bioconductor classes.

So far you have been dealing with three separate objects for a given experiment: the expression matrix, the feature data, and the phenotype data. Managing them separately becomes tedious and error-prone when you want to subset the features and/or samples in your data. For example, to keep only the 1000th gene and the first 10 samples, you would need to write three separate lines of code, making sure to subset the correct dimension of each object. A single misplaced comma is easy to overlook, yet it subsets the wrong dimension.
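A minimal sketch of the three-object approach, assuming small simulated data in place of the real experiment. The names `x` (expression matrix), `f` (feature data), `p` (phenotype data) and the columns `symbol`, `chrom`, `er`, and `age` are hypothetical. Note how each line must index a different dimension.

```r
# Hypothetical toy data: 1000 genes x 20 samples, so that
# "the 1000th gene and the first 10 samples" exist
set.seed(42)
x <- matrix(rnorm(1000 * 20), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:20)))
# Feature data: one row per gene
f <- data.frame(symbol = paste0("SYM", 1:1000), chrom = "chr1",
                row.names = rownames(x))
# Phenotype data: one row per sample
p <- data.frame(er = rep(c("positive", "negative"), 10), age = 40:59,
                row.names = colnames(x))

# Three separate subsets; each must target the correct dimension
x_sub <- x[1000, 1:10]   # matrix: genes are rows, samples are columns
f_sub <- f[1000, ]       # feature data: subset rows (genes)
p_sub <- p[1:10, ]       # phenotype data: subset rows (samples)
# Writing p[, 1:10] instead would try to select columns and fail here,
# because p has only two columns
```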

To make analysis easier and less error-prone, Bioconductor provides classes to store the data for complex biological experiments. This is an approach known as object-oriented programming. A class defines a structure to hold complex data, and a variable of a given class is referred to as an object of that class. Classes have methods: functions that behave in a special way for objects of that class. Two common types of methods are getters, which retrieve the data in an object, and setters, which modify the data. As you'll see, a single method for a Bioconductor class can act as both a getter and a setter.
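To make the getter/setter idea concrete, here is a sketch using R's S4 system (the `methods` package, which Bioconductor classes are built on). The class `Experiment` and its slot `measurements` are hypothetical and exist only for illustration; the point is that one name, `measurements`, works as a getter and, on the left of `<-`, as a setter.

```r
library(methods)

# Hypothetical toy class holding a numeric vector
setClass("Experiment", slots = c(measurements = "numeric"))

# Getter: a generic function plus a method for the Experiment class
setGeneric("measurements", function(object) standardGeneric("measurements"))
setMethod("measurements", "Experiment", function(object) object@measurements)

# Setter: a replacement method, so `measurements(obj) <- value` works
setGeneric("measurements<-",
           function(object, value) standardGeneric("measurements<-"))
setReplaceMethod("measurements", "Experiment", function(object, value) {
  object@measurements <- value
  validObject(object)  # check that the modified object is still valid
  object
})

e <- new("Experiment", measurements = c(1.5, 2.3))
measurements(e)                 # used as a getter
measurements(e) <- c(4, 5, 6)   # same name used as a setter
measurements(e)
```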

The core Bioconductor classes are in the package Biobase. If you've used any Bioconductor packages, you likely already have it installed. To install it explicitly, follow the standard Bioconductor installation process, which takes two lines of code.
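A sketch of the installation, assuming the BiocManager-based route that current Bioconductor documentation describes (older material may use a different installer). The final line simply loads the package.

```r
# Install the BiocManager helper from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install Biobase through Bioconductor's installer
BiocManager::install("Biobase")

# Load the package that defines ExpressionSet and its accessors
library(Biobase)
```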

You create an ExpressionSet object with the function of the same name, passing it three objects: the expression matrix as assayData, the phenotype data frame as phenoData, and the feature data frame as featureData. Note that you first need to convert the phenotype and feature data frames into annotated data frames. AnnotatedDataFrame is a Bioconductor class that supports descriptions of the columns of the data frame, a feature this code does not use.
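A minimal sketch of the construction, assuming a small simulated dataset in place of the breast cancer data; `x`, `f`, `p` and the columns `symbol` and `er` are hypothetical names. The row names of the feature data must match the row names of the matrix, and the row names of the phenotype data must match its column names.

```r
library(Biobase)

# Hypothetical toy data: 5 features (genes) x 4 samples
set.seed(1)
x <- matrix(rnorm(20), nrow = 5, ncol = 4,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:4)))
# Feature data: row names match rownames(x)
f <- data.frame(symbol = c("A", "B", "C", "D", "E"),
                row.names = rownames(x))
# Phenotype data: row names match colnames(x)
p <- data.frame(er = c("positive", "negative", "positive", "negative"),
                row.names = colnames(x))

# Convert the data frames to AnnotatedDataFrame and build the object
eset <- ExpressionSet(assayData = x,
                      phenoData = AnnotatedDataFrame(p),
                      featureData = AnnotatedDataFrame(f))
dim(eset)  # number of features and samples
```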

For the breast cancer data, this creates an ExpressionSet object with 22283 features and 344 samples.

These three components are generally the bare minimum to include in an ExpressionSet object. You can also include information on the experimental procedures and the scientists who performed them. See the manual page (?ExpressionSet) if you're curious to learn more.
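One way to attach that extra information is the `experimentData` argument, which takes a MIAME object from Biobase. The sketch below assumes a tiny simulated matrix, and every MIAME value (name, lab, title) is a placeholder rather than real metadata.

```r
library(Biobase)

# Hypothetical toy expression matrix: 3 genes x 2 samples
x <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("sample", 1:2)))

# Experiment-level metadata; all values are placeholders
info <- MIAME(name = "Jane Doe", lab = "Example Lab",
              title = "Hypothetical experiment")

# phenoData and featureData are omitted here, so ExpressionSet()
# builds default annotated data frames from the matrix dimensions
eset <- ExpressionSet(assayData = x, experimentData = info)
experimentData(eset)
```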

If your data are stored in an ExpressionSet object, you will want to access specific parts of it to perform different analyses, such as visualization. To do this, you can retrieve the expression matrix with the function exprs, the feature data with fData, and the phenotype data with pData.
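A sketch of the three accessors on a small simulated ExpressionSet (all names and values are hypothetical). The last line shows the same `pData` function used as a setter to add a hypothetical `batch` column.

```r
library(Biobase)

# Hypothetical toy ExpressionSet: 3 genes x 4 samples
set.seed(3)
x <- matrix(rnorm(12), nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), paste0("s", 1:4)))
f <- data.frame(symbol = c("A", "B", "C"), row.names = rownames(x))
p <- data.frame(er = c("positive", "negative", "positive", "negative"),
                row.names = colnames(x))
eset <- ExpressionSet(assayData = x,
                      phenoData = AnnotatedDataFrame(p),
                      featureData = AnnotatedDataFrame(f))

# Getters
m  <- exprs(eset)  # expression matrix (features x samples)
fd <- fData(eset)  # feature data as a data.frame
pd <- pData(eset)  # phenotype data as a data.frame

# The same accessor acts as a setter: add a hypothetical column
pData(eset)$batch <- c(1, 1, 2, 2)
```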

Recall the code for subsetting when the data are managed as three separate objects. With an ExpressionSet object, the same subset takes a single line of code, and you don't have to remember how to subset the feature and phenotype data frames, because they are subset automatically.
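A sketch of the one-line subset, assuming a simulated 1000-gene by 20-sample ExpressionSet (names hypothetical). Rows index features and columns index samples, and the feature and phenotype data follow along.

```r
library(Biobase)

# Hypothetical toy ExpressionSet: 1000 genes x 20 samples
set.seed(42)
x <- matrix(rnorm(1000 * 20), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:20)))
f <- data.frame(symbol = paste0("SYM", 1:1000), row.names = rownames(x))
p <- data.frame(er = rep(c("positive", "negative"), 10),
                row.names = colnames(x))
eset <- ExpressionSet(assayData = x,
                      phenoData = AnnotatedDataFrame(p),
                      featureData = AnnotatedDataFrame(f))

# One line: the 1000th gene and the first 10 samples
eset_sub <- eset[1000, 1:10]
dim(eset_sub)
fData(eset_sub)  # feature data subset to the same gene
pData(eset_sub)  # phenotype data subset to the same samples
```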

Recall the code for creating a boxplot. With the breast cancer data stored in an ExpressionSet object, you can plot the first gene and get the same result by using the accessor methods `exprs`, `pData`, and `fData`. This becomes even more powerful when combined with the convenient subsetting described above.
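A sketch of the boxplot built entirely from accessors, assuming a simulated ExpressionSet whose phenotype data has a hypothetical grouping column `er` and whose feature data has a hypothetical `symbol` column used as the title.

```r
library(Biobase)

# Hypothetical toy ExpressionSet: 5 genes x 6 samples
set.seed(7)
x <- matrix(rnorm(5 * 6), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))
f <- data.frame(symbol = paste0("SYM", 1:5), row.names = rownames(x))
p <- data.frame(er = rep(c("positive", "negative"), 3),
                row.names = colnames(x))
eset <- ExpressionSet(assayData = x,
                      phenoData = AnnotatedDataFrame(p),
                      featureData = AnnotatedDataFrame(f))

# Expression of the first gene, grouped by the "er" column,
# titled with that gene's symbol
b <- boxplot(exprs(eset)[1, ] ~ pData(eset)[, "er"],
             main = fData(eset)[1, "symbol"])
```

Because `boxplot` returns its summary statistics invisibly, `b$n` gives the number of samples in each group.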

Now it's your turn to work with ExpressionSet objects.
