Text analysis in R. Demo 1: Corpus statistics

This demo is part of a short series of videos on text analysis in R, developed mainly for R introduction workshops.

A more detailed tutorial for the code discussed here can be found on our R course material GitHub page:
Comments

I'm literally crying because of how awesomely you compile and explain text analysis. Thank you!

zbear

Thanks for making such a great video. I have been reading about text analysis and your video is the first I found easy to practise along with on YouTube.
The explanation of what every code chunk means and does was the magic for me. Thanks again

data_kom

Thank you for this!!

davidrogerdat

Hi! Thank you so much for your great help! This is exactly what I was looking for. Many of the lines of code are deprecated, though :( Can someone teach me how to update them for the latest R version? Thanks so much!!

doctortito

Thank you for the informative text analysis videos. I am just a beginner in text analysis and R, and I started with your videos. I have a question about 12:13: kwic() needs tokens(), so I applied
toks <- tokens(corp)
k = kwic(toks, 'freedom', window = 5)
Is that correct?

learning.data.science
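
Tokenizing first does appear to be the right pattern: as far as I know, from quanteda 3 onward kwic() expects a tokens object rather than a corpus. Below is a minimal sketch of that workflow. It assumes the inaugural-speech corpus that ships with quanteda (data_corpus_inaugural) as a stand-in, since the corpus used in the video may be a different one.

```r
library(quanteda)

# Assumption: quanteda's built-in inaugural corpus stands in for the video's `corp`
corp <- data_corpus_inaugural

# Tokenize first; kwic() is then called on the tokens object
toks <- tokens(corp)

# Keyword in context: 5 tokens on either side of each match
k <- kwic(toks, pattern = "freedom", window = 5)
head(k)
```

The result behaves like a data frame, so nrow(k) gives the number of matches.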

The dfm function is defunct unfortunately :(

marcosechevarria

Awesome video, Kasper! Thx a lot!
I have some questions about comparing PDF files. I want to compare German party manifestos over time, which I have already gathered (for EP and for regional elections). I want to compare their Europe salience: how often the term europ* occurs, and whether it occurs more often when elections are combined (EP 2014 and Regional 2015 vs. EP+Regional 2019). I cannot run the code as in the is_obama example. I need something like
"is_CDU = docvars(dtm)$CDU == 'europ*'", if you know what I mean. I think my approach to comparing these manifestos should not take that much time or code, but I am somewhat hopeless and I hope you can help me.

U got me as a sub for life! :D

toddcoode

Thanks for the video! How do you define the documents for the corresponding president, such as Obama? Does R do it automatically? How? Thanks in advance.

zolzayaenkhtur
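
R does not work out which president gave a speech on its own; that information has to be stored with the corpus as document variables (docvars), one value per document. A minimal sketch, assuming quanteda's built-in inaugural corpus and its President docvar rather than the exact data used in the video:

```r
library(quanteda)

# Assumption: data_corpus_inaugural stands in for the video's corpus.
# Its docvars include Year, President, FirstName and Party.
head(docvars(data_corpus_inaugural))

# Docvars are carried over from the corpus to the tokens and the dfm
dtm <- dfm(tokens(data_corpus_inaugural))

# Logical vector marking the documents whose President docvar is "Obama"
is_obama <- docvars(dtm)$President == "Obama"
sum(is_obama)
```

With your own data, the same metadata would typically come from columns of the data frame passed to corpus().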

At 4:51 it's showing this message: "Error in textplot_wordcloud(dtm, max_words = 50) :
could not find function "textplot_wordcloud""
I have all the relevant packages but I'm still getting this. Do you know why, and how to solve it?

sakifzaman
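
A likely cause, assuming a quanteda version of 3.0 or later: the textplot_*() functions were moved out of quanteda into the separate quanteda.textplots package, so loading quanteda alone no longer provides textplot_wordcloud(). A minimal sketch, using quanteda's built-in inaugural corpus as a stand-in for the video's data:

```r
# install.packages("quanteda.textplots")  # one-time install if missing
library(quanteda)
library(quanteda.textplots)

# Assumption: the built-in inaugural corpus stands in for the video's corpus
toks <- tokens(data_corpus_inaugural, remove_punct = TRUE)
toks <- tokens_remove(toks, pattern = stopwords("en"))
dtm <- dfm(toks)

# Same call as in the error message, now resolved via quanteda.textplots
textplot_wordcloud(dtm, max_words = 50)
```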

Hi, thank you for the video. It's really helpful. Just one doubt: what if we have to use a text file consisting of stories or something descriptive? The file is not in CSV format; it has only paragraphs, with no columns or rows. How do I handle such a data file?

abhipsatripathy
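
One possible route for plain text files is the readtext package, which reads each file into a data frame that corpus() accepts directly. The sketch below writes a small hypothetical file (story.txt, invented for illustration) so that it runs as is; in practice the path would point at your own file.

```r
library(quanteda)
library(readtext)

# Hypothetical input: a small plain-text file with two paragraphs
path <- file.path(tempdir(), "story.txt")
writeLines(c("Once upon a time there was a fox.",
             "",
             "The fox lived in a forest."), path)

# readtext() returns a data frame with doc_id and text columns
rt <- readtext(path)
corp <- corpus(rt)

# Optionally treat each paragraph as its own document
paras <- corpus_reshape(corp, to = "paragraphs")
ndoc(paras)
```

readtext() also accepts glob patterns and other formats such as .pdf and .docx, so a folder of files can be read in a single call.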

Please prepare a video on fastRText, the text classification library by Facebook... the documentation is poor.

redietgebretsion

Hello, I can't find the moment where you talk about Word documents. I have Word documents that I want to create a corpus from.

lobe

@Kasper Welbers How could we use quanteda for bibliographic data obtained from Scopus in .csv format, or from the Web of Science database in plain-text format? Please help me with that process. I would be very thankful.

syedabdullah

Hi, thank you for your video. I have a question. When creating the dictionary, what if I have a long keyword list? Should I type the keywords in manually? That's hard. Do you have any ideas? Thank you.

yimeilong
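
A long keyword list does not have to be typed into the script: one option is to keep it in a text file, one term per line, and read it in before building the dictionary. A minimal sketch follows; the file name keywords.txt, the key freedom_terms, and the terms themselves are all hypothetical, and the built-in inaugural corpus stands in for your own data.

```r
library(quanteda)

# Hypothetical keyword file, one term per line (written here so the sketch runs)
kw_file <- file.path(tempdir(), "keywords.txt")
writeLines(c("freedom", "liberty", "democra*"), kw_file)

# Read the terms and wrap them in a one-key dictionary
keywords <- readLines(kw_file)
dict <- dictionary(list(freedom_terms = keywords))

# Count dictionary matches per document (glob patterns such as democra* are supported)
toks <- tokens(data_corpus_inaugural)
dtm_dict <- dfm(tokens_lookup(toks, dict))
head(dtm_dict)
```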

I cannot use dfmat_inaug <- dfm(toks_inaug, remove = stopwords("en")); it is outdated. What can I do instead?

roxyioana
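
In recent quanteda versions the usual replacement, as far as I know, is to remove stopwords at the tokens stage with tokens_remove() and then call dfm() without the remove argument. A minimal sketch, assuming toks_inaug was built from the built-in inaugural corpus:

```r
library(quanteda)

# Assumption: toks_inaug comes from quanteda's built-in inaugural corpus
toks_inaug <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# Remove stopwords from the tokens, then build the document-feature matrix
toks_nostop <- tokens_remove(toks_inaug, pattern = stopwords("en"))
dfmat_inaug <- dfm(toks_nostop)

topfeatures(dfmat_inaug, 10)
```

An alternative with the same effect is to build the dfm first and then call dfm_remove(dfmat_inaug, stopwords("en")).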

I am a bit confused about my problem. My text also contains digits, in a weight format like 120g, 130g, etc. I need to remove them, and I have to categorize the column into three names: potato_chips, not_potato_chips and not_chips. Could you please help me, or give me a hint? :)

Havana

What about importing text from multiple PDF/DOCX files?

DeborahNicoletti