R Tutorial: Typical workflow

---
In this video, we are going to go over the typical steps to analyze single-cell RNA-seq data.

The development of new methods and protocols for scRNA-Seq is currently a very active area of research, and several protocols have been published over the last few years. The image here, taken from the Svensson et al. paper, shows on the y-axis that the number of cells per dataset has increased from 1 cell for the very first dataset in 2009 to up to 1 million cells for datasets generated today.

The methods can be categorized in different ways, but the two most important aspects are quantification and capture.

For quantification, there are two types of technologies: full-length and tag-based. The full-length protocols try to achieve uniform coverage of each RNA sequence. By contrast, tag-based technologies only capture one end of each RNA, either the 5' end or the 3' end. The choice of quantification method has important implications for what types of analyses the data can be used for.

Then, the strategy used for capture mostly determines throughput. The three most widely used options are microwell-, microfluidic- and droplet-based.

For more details about the different technologies, you can go to the Hemberg lab website. It's reference number 2 at the bottom of this slide, and a great reference for analyzing scRNA-Seq data.

After this brief overview of the different technologies, let's now get an overview of the different steps of a typical workflow to analyze single-cell RNA-seq. Each of these steps is actually a chapter of the course, so we won't go into details here, but just look at the big picture.

The very first step when working with scRNA-Seq data is to filter out low-quality cells to ensure that technical effects do not distort downstream analysis results. Two common measures of cell quality are the library size and the cell coverage. The library size is defined as the total sum of counts across all genes for a given cell, where here the word "library" refers to a cell. And the cell coverage is defined as the number of genes with non-zero counts for that cell.
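To make these two quality measures concrete, here is a minimal sketch in base R. The count matrix is made-up toy data, and the names `min_library_size` and `min_coverage` are hypothetical thresholds chosen only for illustration, not values recommended by the course.

```r
# Toy count matrix (made-up values): rows are genes, columns are cells.
counts <- matrix(
  c(5, 0, 3,
    0, 0, 2,
    1, 0, 0,
    4, 1, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Library size: total sum of counts across all genes, per cell.
library_size <- colSums(counts)

# Cell coverage: number of genes with a non-zero count, per cell.
coverage <- colSums(counts > 0)

# Hypothetical thresholds (assumptions for this toy example only).
min_library_size <- 5
min_coverage <- 2

# Keep only the cells that pass both thresholds.
keep <- library_size >= min_library_size & coverage >= min_coverage
filtered <- counts[, keep, drop = FALSE]
```

In practice, thresholds are usually chosen by looking at the distribution of both measures across all cells rather than fixed in advance.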

Once the problematic cells have been removed, a typical workflow to analyze scRNA-Seq data includes several steps.

The first step is the normalization of cell- and gene-specific biases. It is a critical step in the analysis pipeline that adjusts for unwanted biological and technical effects that can mask the biological signal of interest.
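As a rough illustration of what adjusting for a cell-specific bias can look like, the sketch below scales each cell by its library size and then log-transforms. This is only one simple choice, assumed here for illustration, and not necessarily the normalization method used in the course. The data are made up.

```r
# Toy count matrix (made-up values): 4 genes in rows, 3 cells in columns.
# matrix() fills column by column, so each group of four values is one cell.
counts <- matrix(
  c(10, 0, 30, 60,   # cell1
     1, 0,  3,  6,   # cell2: same profile as cell1, sequenced 10x less deeply
     5, 5,  5,  5),  # cell3
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Size factor per cell: library size relative to the mean library size.
library_size <- colSums(counts)
size_factors <- library_size / mean(library_size)

# Divide each column (cell) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

# Log-transform with a pseudocount of 1 so that zeros stay defined.
log_normalized <- log2(normalized + 1)
```

After scaling, cell1 and cell2 have identical normalized profiles, which is the point: the difference between them was sequencing depth only.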

Then, the large majority of scRNA-Seq analyses include a dimensionality reduction step, where the number of dimensions goes from J (that is, the number of genes) to K, which is smaller than J. This step achieves a twofold objective: first, the data become more tractable, and second, noise can be reduced while preserving the signal of interest.
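One common way to go from J to K dimensions is principal component analysis. The sketch below uses `prcomp` from base R on simulated data. PCA is an assumption made for this example, since the transcript does not name a specific method, and the values of `J`, `N` and `K` are arbitrary.

```r
set.seed(1)

J <- 50  # number of genes
N <- 20  # number of cells
K <- 2   # number of dimensions to keep (K < J)

# Simulated log-expression matrix: genes in rows, cells in columns.
log_expr <- matrix(rnorm(J * N), nrow = J, ncol = N)

# prcomp() expects observations (cells) in rows, so transpose first.
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

# Keep the first K principal components and transpose back,
# which gives a K by N matrix with one column per cell.
low_dim <- t(pca$x[, 1:K])

# Proportion of the total variance captured by each kept component.
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:K]
```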

The third step is to group the cells according to the low-dimensional K-by-N matrix computed in the previous step, where N is the total number of cells in the dataset. From this step, you get a cluster label for each cell.
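To show what "a cluster label for each cell" looks like, here is a minimal sketch that runs k-means from base R on a simulated K by N matrix. k-means and the choice of two clusters are assumptions made for illustration. The transcript does not say which clustering method the course uses.

```r
set.seed(42)

K <- 2   # number of low-dimensional components
N <- 20  # number of cells

# Simulated K by N matrix with two well-separated groups of 10 cells each.
low_dim <- cbind(
  matrix(rnorm(K * 10, mean = -5), nrow = K),
  matrix(rnorm(K * 10, mean = 5), nrow = K)
)

# kmeans() expects observations (cells) in rows, so transpose.
# nstart = 10 reruns the algorithm from 10 random starts and keeps the best.
km <- kmeans(t(low_dim), centers = 2, nstart = 10)

# One cluster label per cell.
cluster_label <- km$cluster
```

The label values themselves (1 or 2) are arbitrary. What matters is which cells share a label.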

Finally, the last step is to find biomarkers between identified groups of cells, that is, find genes that are differentially expressed between groups of cells. This gives you an overview of the typical workflow used to analyze scRNA-Seq data.
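As a simple sketch of this last step, the code below tests each gene for a difference between two groups of cells with a Wilcoxon rank-sum test and corrects for multiple testing. The test, the Benjamini-Hochberg correction and the simulated data are all assumptions made for illustration. Dedicated differential expression methods exist and may be what the course uses.

```r
set.seed(7)

n_genes <- 5
n_cells <- 20

# Cluster labels, as a previous clustering step might produce them.
cluster_label <- rep(c(1, 2), each = 10)

# Simulated log-expression matrix: genes in rows, cells in columns.
log_expr <- matrix(
  rnorm(n_genes * n_cells), nrow = n_genes,
  dimnames = list(paste0("gene", 1:n_genes), NULL)
)

# Make gene1 clearly higher in cluster 2 so there is something to find.
in_cluster_2 <- cluster_label == 2
log_expr["gene1", in_cluster_2] <- log_expr["gene1", in_cluster_2] + 10

# One Wilcoxon rank-sum test per gene, comparing the two clusters.
p_values <- apply(log_expr, 1, function(g) {
  wilcox.test(g[cluster_label == 1], g[cluster_label == 2])$p.value
})

# Benjamini-Hochberg adjustment for testing several genes at once.
adjusted <- p.adjust(p_values, method = "BH")

# Genes called differentially expressed at a 5% false discovery rate.
markers <- names(adjusted)[adjusted < 0.05]
```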

Let's now go over some exercises.
