Analyzing Text Data with R on Windows

Provides an introduction to text mining with R on a Windows computer. Text analytics topics covered include:
- reading a txt or csv file
- cleaning text data
- creating a term-document matrix
- making word clouds and bar plots.
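
The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code from the video: it assumes the tm and wordcloud packages are installed, and the sample sentences and the variable names (docs, corpus, tdm, freq) are made up for the example.

```r
# Minimal sketch; assumes install.packages(c("tm", "wordcloud")) was run.
library(tm)
library(wordcloud)

# Hypothetical sample text. In practice this vector could come from
# readLines() on a txt file or from a text column returned by read.csv().
docs <- c("R makes text mining easy.",
          "Text mining with R: cleaning text, counting words.",
          "Word clouds and bar plots summarize word counts in text.")

# Build a corpus and clean it step by step
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))  # lower-case
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

# Term-document matrix: rows are terms, columns are documents
tdm <- TermDocumentMatrix(corpus)

# Total frequency of each term across all documents, most frequent first
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

# Bar plot of terms that occur at least twice; las = 2 rotates the labels
barplot(freq[freq >= 2], las = 2, col = rainbow(20))

# Word cloud; min.freq is the frequency threshold argument of wordcloud()
set.seed(42)
wordcloud(words = names(freq), freq = freq, min.freq = 1,
          random.order = FALSE)
```

Wrapping base functions such as tolower in content_transformer() keeps the documents in the form tm expects, which matters when the term-document matrix is built afterwards.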

R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R works on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a popular, user-friendly environment for R.
Comments
Author

You're such an excellent and clear teacher! Thank you.
Question: how do you deal with names? First names and last names are split into two separate words. How can I merge them into one so that the bar plot does not show them separately?

ericrichard
Author

Hello sir, I followed the same process and want to make a Sankey and node diagram, but I am getting an error. Can you help me make the plot?

shraykumar
Author

Thanks for this great and simple video.
I have 200k rows in my dataset. The 1st and 2nd columns consist of sentences, and I have to predict the cosine similarity between columns 1 and 2 and put it in the 3rd column.
Example: 1st column: "who is ramesh", 2nd column: "I'm not a ramesh singh", 3rd column: 0.70 (which is their cosine value).
How should I approach this problem?

devawratvidhate
Author

Thank you so much for your prompt response. Do you have any other videos about data analysis using R?

matharbarghi
Author

Please, sir, make a video on building a Convolutional Recurrent Neural Network model for text recognition.

sk
Author

In the bar plot some words fall outside the box. I tried cex.axis but it doesn't fix it. I also tried axis(1, cex.axis=0.5), but it still cuts off some letters of the words. So is it an RStudio problem, or is there a way to fix this?

NiteshKumar-tjbc
Author

At the last step, when I ran the last piece of code, I received the following error:

Error in if (grepl(tails, words[i])) ht <- ht + ht * 0.2 :
argument is of length zero
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
"min.words" is not a graphical parameter
2: In doTryCatch(return(expr), name, parentenv, handler) :
"min.words" is not a graphical parameter
3: In doTryCatch(return(expr), name, parentenv, handler) :
"min.words" is not a graphical parameter
4: In doTryCatch(return(expr), name, parentenv, handler) :
"min.words" is not a graphical parameter

Do you have any suggestions for that, please?

zahedarman
Author

Thanks for your great channel. I am wondering, could you please teach us about the regex library (i.e., how to search for questions in a text file and save them in other formats like CSV)?

Didanihaaaa
Author

Can you make a video on tokenization in the R language?

DnyaneshwarPanchaldsp
Author

Any idea how to remove emoticons and smileys from the reviews using the tm_map() function?

tanmaygawade
Author

Hi,
I am a little lost. My question is how to find the data that you mention you downloaded from the codes website. Can you please help me by explaining how to obtain it?
Thanks

mdabusayeed
Author

Dear Sir,
I am getting the following error. Could you please check? Thanks

> cleanset <- tm_map(cleanset, PlainTextDocument)
> dtm <- TermDocumentMatrix(cleanset,
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus),   :
  'i, j' invalid

akkimalhotra
Author

Dear Dr. Rai,

Thank you for another excellent tutorial. I have gained many skills from your tutorials.
Running the code dtm <- TermDocumentMatrix(cleanset, control=list(minWordLength=c(1, Inf)))
resulted in the error given below. I request you to take a look at this issue and help me move further.
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
'i, j' invalid


--Arjun

ArjunCPArjun
Author

Sir, I am getting: Error in barplot(termFrequency, las = 2, col = rainbow(20)) :
object 'termFrequency' not found

randomslugger
Author

Sir, please make a video on tweet polarity, ggplot, and maps of where the tweets originate.

KnowledgeADDA-nc
Author

Could you please share the code files and data file?

rajatbathla
Author

While executing the code
dtm<-TermDocumentMatrix(cleanset, control=list(minWordLength=c(1, Inf)))

it shows this error:
dtm<-TermDocumentMatrix(cleanset, control=list(minWordLength=c(1, Inf)))
Error in simple_triplet_matrix(i, j, v, nrow = length(terms), ncol = length(corpus), :
'i, j' invalid

MukeshKumar-mpkc
Author

Hi Sir, are the terms "creating corpus" and "tokens (tokenization)" one and the same?

vishnukowndinya
Author

Hi Sir,
Thank you very much. It's a great tutorial.
I have two questions:
1) How can I fix a spelling mistake in a word in the corpus and replace it with the correct word?
2) Is R able to handle it if I have 5 lakh (500,000) comments to analyse?

kumarmithun
Author

Sir, may I ask: what if the text file contains special characters like ("" \ /)? I tried the suggested commands, but they don't seem to work properly.

bbbaaa