Unsupervised feature learning with matrix decomposition - Aedin Culhane, PhD | ODSC East 2018

Supervised learning is among the most powerful tools in data science, but it requires a training dataset in which the class of each example is known a priori. For example, a classification algorithm learns to identify animals by training on a dataset of images, each labeled with the species of the animal it shows. Unsupervised learning is applied when the data are unlabeled, when the classes are unknown, or when one seeks to discover new groups or features that best characterize the data.

I will provide an overview of unsupervised learning algorithms, including dimension reduction and matrix factorization approaches that learn low-dimensional mathematical representations of high-dimensional data. There are numerous computational techniques within the class of matrix factorization, each of which provides a distinct interpretation of the processes underlying high-dimensional data. I will describe, and do my best to demystify, matrix factorization approaches including principal component analysis, correspondence analysis and non-negative matrix factorization, as well as newer approaches such as t-SNE and autoencoders.

Extensions of these approaches can be applied to learn the structure and features shared across multiple datasets simultaneously. Methods such as canonical correlation analysis and multiple factor analysis extract the linear relationships that best explain the correlated structure across datasets. I will describe how we apply these approaches to tens of thousands of tumors to advance precision medicine in oncology.
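The abstract names principal component analysis (PCA) and non-negative matrix factorization (NMF) as matrix factorization approaches but gives no formulas. As a minimal illustrative sketch (not the speaker's code), both can be written in a few lines of NumPy: PCA via the singular value decomposition of the column-centered data matrix, and NMF via the Lee and Seung multiplicative updates for the Frobenius-norm loss. The function names `pca` and `nmf`, the toy data, and the iteration count are assumptions chosen for illustration only.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                    # sample coordinates, equal to Xc @ loadings
    loadings = Vt[:k].T                          # feature weights for each component
    explained = s**2 / np.sum(s**2)              # fraction of variance per component
    return scores, loadings, explained[:k]

def nmf(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Factor non-negative X (n x p) as W @ H with W, H >= 0.

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss;
    a positive initialization keeps both factors non-negative throughout.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, p)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update W with the new H
    return W, H

# Hypothetical toy data: 100 samples x 20 features generated
# from 2 non-negative latent factors plus a little noise.
rng = np.random.default_rng(42)
W_true = rng.random((100, 2))
H_true = rng.random((2, 20))
X = W_true @ H_true + 0.01 * rng.random((100, 20))

scores, loadings, explained = pca(X, k=2)
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

print(scores.shape, loadings.shape, explained.round(3))
print(f"NMF relative reconstruction error: {rel_err:.3f}")
```

The two factorizations trade off differently: PCA components are orthogonal and ordered by variance explained, but their loadings can mix positive and negative signs, whereas NMF constrains both factors to be non-negative, which tends to yield additive, parts-based features. Which is easier to interpret depends on the data.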

#FeatureLearning #DataScience #DeepLearning #ODSC