Text Summarizer Using Python | NLTK Library in Python | Auto Text Summary Generator Using Python

The amount of data available today is enormous, and no user can extract insights from such volumes by hand. Furthermore, a large portion of this data is either redundant or contains little useful information. The most efficient way to reach the most important parts of the data, without sifting through redundant and insignificant material, is to summarize it so that only the non-redundant, useful information remains. The data can be in any form, such as audio, video, images, or text.

When the data is text, this task is called text summarization, and it is done using natural language processing (NLP).

In this video, we will see how to apply automatic text summarization techniques to text data using the Python library nltk. NLP projects are in demand nowadays.

To keep it simple, I will use an unsupervised approach to compute the similarity between sentences and rank them. One benefit is that you don't need to train and build a model before you start using it in your project.
______________________________________
What is Natural Language Processing?

Natural language processing is a sub-field of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data.
______________________________________
Natural Language Processing Tools:

NLTK: It stands for Natural Language Toolkit and is an essential Python library supporting tasks such as classification, stemming, tagging, parsing, semantic reasoning, and tokenization. It is one of the most widely used tools for natural language processing in Python. It represents all data in the form of strings, which is fine for simple constructs but makes it hard to use some advanced functionality. Today it serves as an educational foundation for Python developers who are new to NLP and machine learning.

TextBlob: It is helpful for developers who are starting out with NLP in Python and want to make the most of their first encounter with NLTK. It provides beginners with an easy interface for learning the most basic NLP tasks, such as sentiment analysis, noun phrase extraction, text classification, part-of-speech tagging, and more. TextBlob also includes functionality from the Pattern library. It can be used for rapid prototyping of various NLP models and can easily grow into full-scale projects.

gensim: It is a highly specialized Python library that largely deals with topic modeling tasks using algorithms like Latent Dirichlet Allocation (LDA). It is also excellent at statistical semantics, recognizing text similarities, indexing texts, and navigating different documents. gensim has also been designed to be extended with other vector space algorithms. It is licensed under the OSI-approved GNU LGPLv2.1 license and is free for both personal and commercial use.

spaCy: It is a relatively young library that was designed for production usage. It is more accessible than other Python NLP libraries like NLTK, and it offers one of the fastest syntactic parsers available. Because the toolkit is written in Cython, it is speedy and efficient, which makes spaCy a compelling choice for NLP. Additionally, it integrates well with other data science tools and frameworks.
There are many cool projects we can develop using Python's nltk library. This project is one of them.
______________________________________
It's good to understand cosine similarity to make the best use of the code you are going to see. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space: it is the cosine of the angle between them. Since we will represent our sentences as vectors, we can use it to measure the similarity between sentences. The angle is 0 (a cosine similarity of 1) when two sentences have the same word distribution, and the similarity falls toward 0 as they share fewer words.
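As a minimal sketch of the idea above, the following standard-library snippet turns two tokenized sentences into word-count vectors and computes the cosine of the angle between them. The function name `cosine_similarity` and the example sentences are assumptions made for illustration; the video itself may compute this differently (nltk, for instance, ships a `cosine_distance` helper in `nltk.cluster.util`, which returns one minus this value).

```python
import math
from collections import Counter


def cosine_similarity(sent1, sent2):
    """Cosine of the angle between the word-count vectors of two token lists."""
    counts1 = Counter(w.lower() for w in sent1)
    counts2 = Counter(w.lower() for w in sent2)
    # Dot product: only words present in both sentences contribute.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # cosine similarity is undefined for a zero vector
    return dot / (norm1 * norm2)


# Hypothetical example sentences, already split into words.
same = cosine_similarity("the cat sat".split(), "the cat sat".split())
partial = cosine_similarity("the cat sat".split(), "the dog sat".split())
disjoint = cosine_similarity("the cat sat".split(), "hello world".split())
print(same, partial, disjoint)
```

Identical sentences score approximately 1, sentences sharing two of three words score about 0.67, and sentences with no words in common score 0.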
______________________________________
What You'll Need:
1. Python
2. A Python code editor such as Visual Studio Code (which I am using); JupyterLab will also work
3. The nltk library
4. The networkx library
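Before starting, it can help to confirm that the two libraries from the list above are importable. This is a small sketch using only the standard library; the `required` and `missing` names are assumptions for illustration, and the suggested `pip install` line is the usual way to install both packages.

```python
import importlib.util

# Packages named in the requirements list above.
required = ["nltk", "networkx"]

# find_spec returns None when a top-level package cannot be located.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

If the code relies on NLTK's stopword list, that corpus also has to be downloaded once with `nltk.download("stopwords")`.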
______________________________________
Video Timeline:
00:00 Introduction
00:25 Introduction to "Why Text Summarization?"
01:00 Introduction to nltk library.
02:14 Cosine Distance Method to Measure the Similarity.
03:25 Installing nltk library on python terminal.
05:05 Actual Coding Starts here.
05:30 Read Article Function.
08:00 Sentence Similarity Function.
11:00 Generate Similarity Matrix Function.
14:00 Generate Summary Function.
18:34 Output Shown.
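The four functions in the timeline can be sketched roughly as follows. This is not the code from the video: to stay runnable without any downloads, it replaces nltk's stopword list with a tiny hand-written set and networkx's PageRank with a simplified power iteration, and `read_article` takes a string instead of a file name. The names `build_similarity_matrix`, `rank_sentences`, and `STOPWORDS`, as well as the sample text, are assumptions made for illustration.

```python
import math
from collections import Counter

# Tiny hand-written stopword set (assumption); nltk's stopword corpus
# would normally be used here.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "for", "on"}


def read_article(text):
    """Split raw text into sentences, each a list of lowercase words."""
    sentences = []
    for raw in text.split(". "):
        words = [w.strip(".,!?").lower() for w in raw.split()]
        words = [w for w in words if w]
        if words:  # skip empty fragments
            sentences.append(words)
    return sentences


def sentence_similarity(sent1, sent2, stopwords=None):
    """Cosine similarity of word-count vectors, ignoring stopwords."""
    stopwords = stopwords or set()
    c1 = Counter(w for w in sent1 if w not in stopwords)
    c2 = Counter(w for w in sent2 if w not in stopwords)
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def build_similarity_matrix(sentences, stopwords=None):
    """Pairwise sentence similarities; the diagonal stays at zero."""
    n = len(sentences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = sentence_similarity(sentences[i], sentences[j], stopwords)
    return matrix


def rank_sentences(matrix, damping=0.85, iterations=50):
    """Simplified PageRank-style power iteration over the similarity graph."""
    n = len(matrix)
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in matrix]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                scores[j] * matrix[j][i] / out_weight[j]
                for j in range(n) if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def generate_summary(text, top_n=2):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = read_article(text)
    if not sentences:
        return ""
    scores = rank_sentences(build_similarity_matrix(sentences, STOPWORDS))
    top_n = min(top_n, len(sentences))  # never ask for more sentences than exist
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return ". ".join(" ".join(sentences[i]) for i in sorted(ranked)) + "."


# Hypothetical sample text: three related sentences and one unrelated one.
text = (
    "Python is a popular programming language. "
    "Python is used for natural language processing. "
    "Natural language processing is used for text summarization. "
    "The weather was cold yesterday."
)
summary = generate_summary(text, 2)
print(summary)
```

The unrelated weather sentence shares no words with the others, so it receives the lowest score and is left out of the two-sentence summary. Because this is an extractive approach, the summary consists of sentences taken from the input rather than newly generated text.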
______________________________________

Comments

I'm having an error saying "list index out of range". Could you please help?
swaminisontakke

Thanks, it helped me a lot today. If I hadn't been able to do it, I would have been disqualified from my competition. Thanks a lot ;) ;) :) :)
monikagupta

Very informative video, thank you for sharing valuable information. I need to ask you something: does NLTK support other languages (Japanese, Chinese, Korean, etc.) for the auto text summary generator?
kangandhi

I tried your method, but the summarize_text output doesn't show up when I use a larger dataset. For example, when I use a list of only 8 sentences, summarize_text shows up, but when there are around 6000 sentences in my list, it doesn't. Can you tell me what's wrong? By the way, I love your tutorials; they are really well explained.
yovisstar

How can I do text summarization using the KNIME tool?
smtsdlkr

At 6:18 it should be "for sentence in article", not "sentences". Please verify once. Thanks!
palakmantry

sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))
AttributeError: 'str' object has no attribute 'append'
gayathrigarine

Ma'am, one doubt: for how many lines, or what amount of text, will this code give a summary? Is there a limit, and if so, how much text can be entered? Please tell.
adityabapat

I am having an error which says "list index out of range". Please help, it's my semester project.
hussnainkhizar

What is the flow diagram of text summarization?
RahulRoy-hbnj

Can someone please provide the code of this program? I am getting many errors in the file handling section. Trying it on Google Colab.
LightningJake

Hi, I am having an error on line 21 (syntax error). Please help.
nikhilnagar

Hey, can you please share the training data on which you trained the model?
AmanSharma-bjyc

Hi, is this an extractive or abstractive method?
captainng

Ma'am can u please provide the whole Bcoz I m getting some So plz
ssabarish

Can anybody tell me why we put "stopwords=None" in the sentence_similarity function as a function argument instead of only stopwords?
calkestis

Ma'am, can you please share the code?! I have mailed you asking for it.
shaikabdulraqeeb

Traceback (most recent call last):
File "first.py", line 56, in <module>
generate_summary("msft.txt", 2)
File "first.py", line 47, in generate_summary

File "first.py", line 9, in read_article
filedata=file.readlines()
I am getting these errors. Can you help solve this?
neetusonkar

Can you please share the link for the code?
alamnomaan

Thank you for sharing this video. Hello Ma'am, I got a FileNotFoundError. What can I do to remove this error?
priyarakate