Improving the LSA with a TFIDF (4/5)

This video follows up with a third full LSA Pipeline using Databricks Runtime for Machine Learning and the open-source libraries Scikit-Learn and Pandas.
The series introduces the core concepts of Natural Language Processing and the unsupervised learning technique Latent Semantic Analysis. It covers the purposes and benefits of the technique, highlighting in particular how LSA can reveal latent, or hidden, aspects of a body of documents while also reducing the dimensionality of the original dataset.
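
As a rough illustration of the technique (a minimal sketch in scikit-learn with toy sentences standing in for the books used in the series, not the notebook from the videos), an LSA pipeline can be as short as a Document-Term Matrix followed by a truncated SVD:

    # Minimal LSA sketch: build a Document-Term Matrix from raw term
    # counts, then reduce it to two latent topics with a truncated SVD.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "green eggs and ham",
        "would you eat green eggs",
    ]  # placeholder sentences, not the strings used in the series

    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(docs)    # shape: (n_documents, n_terms)

    lsa = TruncatedSVD(n_components=2, random_state=0)
    doc_topics = lsa.fit_transform(dtm)     # shape: (n_documents, 2)

    print(doc_topics)  # each document encoded by two latent topics

Each document starts as a row of term counts and ends up as a point in a two-dimensional topic space, which is the dimensionality reduction referred to above.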

This is Part 4 of our Introduction to Latent Semantic Analysis Series:

This video uses the same body of documents: strings of text from two popular children’s books. Here, we iterate on the previous LSA pipeline by using an alternate method, Term Frequency-Inverse Document Frequency (TF-IDF), to prepare the Document-Term Matrix. After completing the process, we examine two byproducts of the LSA (the dictionary and the encoding matrix) in order to understand how the documents are encoded in topic space. Finally, we plot the resulting documents in their topic-space encoding using the open-source library Matplotlib and compare the plot to the one prepared in the previous video.
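
A sketch of that TF-IDF variant, again with placeholder sentences and illustrative parameter choices rather than the exact notebook from the video, might look like this: TfidfVectorizer replaces the raw counts, the fitted vocabulary plays the role of the dictionary, the SVD components play the role of the encoding matrix, and Matplotlib plots the documents in topic space.

    # TF-IDF variant of the pipeline: weight the Document-Term Matrix by
    # TF-IDF instead of raw counts, inspect the two LSA byproducts, and
    # plot the documents in their two-dimensional topic-space encoding.
    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = [
        "the cat sat on the mat",
        "the cat chased the mouse",
        "green eggs and ham",
        "would you eat green eggs",
    ]  # placeholder sentences, not the strings used in the series

    tfidf = TfidfVectorizer(stop_words="english")
    dtm = tfidf.fit_transform(docs)         # TF-IDF weighted Document-Term Matrix

    lsa = TruncatedSVD(n_components=2, random_state=0)
    doc_topics = lsa.fit_transform(dtm)     # documents encoded in topic space

    # Byproduct 1: the dictionary, mapping each term to its column index
    print(tfidf.vocabulary_)

    # Byproduct 2: the encoding matrix (topics x terms), showing how much
    # each term contributes to each latent topic
    print(lsa.components_)

    # Plot the documents in topic space
    plt.scatter(doc_topics[:, 0], doc_topics[:, 1])
    for i, (x, y) in enumerate(doc_topics):
        plt.annotate(f"doc {i}", (x, y))
    plt.xlabel("Topic 1")
    plt.ylabel("Topic 2")
    plt.show()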
Comments

These vids are great man. Love how they are bite sized and in a series rather than one long lecture.

WeMet

In order to properly use the LSA, shouldn't you perform a Varimax Rotation on the term and document loadings before interpreting the results? If so, is it possible to do that in scikit-learn as well? Thanks in advance

BlackAirtrack