Improving the LSA with a TFIDF (4/5)

This video follows up with a third full LSA Pipeline using Databricks Runtime for Machine Learning and the open-source libraries Scikit-Learn and Pandas.
This video introduces the core concepts in Natural Language Processing and the Unsupervised Learning technique, Latent Semantic Analysis. The purposes and benefits of the technique are discussed. In particular, the video highlights how the technique can aid understanding of latent, or hidden, aspects of a body of documents, in addition to reducing the dimensionality of the original dataset.
This is Part 4 of our Introduction to Latent Semantic Analysis Series.
This video uses the same body of documents: strings of text from two popular children’s books. Here, we iterate on the previous LSA Pipeline by using an alternate method, Term Frequency-Inverse Document Frequency (TF-IDF), to prepare the Document-Term Matrix. After completing the process, we examine two byproducts of the LSA, the dictionary and the encoding matrix, in order to understand how the documents are encoded in topic space. Finally, we plot the resulting documents in their topic-space encoding using the open-source library Matplotlib and compare the plot to the one prepared in the previous video.