Zipf's Law

Do most words in a corpus occur with average frequency? Absolutely not! This video discusses a surprising regularity in the word frequencies of corpora. And at the end, we'll take a trip to Hogwarts and see whether Zipf's Law also applies in the world of wizards.

If you want to follow along, here are the word list files:

Request the Potter corpus:

Vsauce on Zipf:

I LOVE that you've linked to Michael Stevens' video. I'm playing around with predictive language models and I'm really happy you're talking about WORD TOKENS in this video!


Hello :) I watched your Abralin talk live on Wednesday. I study generative syntax, and I was very inspired by your discussion of negative evidence in the Q&A session! Thank you for all the wonderful videos!


I've never seen the normal distribution explained so clearly and in such an easy-to-understand way.


Thanks for that extensive video! It added great value to my master's thesis. Even though I'm dealing with distributions in geographical data, it was a great and easy way to understand Zipf's law.


Martin, Zipf's law makes me wonder about the value of MI scores. Not that they aren't meaningful, but when you review collocation results for a word, you find that MI seems to have nothing to do with absolute frequency; it is just mutual attraction continuing to exert its pull regardless of frequency. Collocation is a function of context, and it's the frequency of contexts that varies, analogous to the way certain climatic circumstances can promote the health of, say, vegetation and insects. Plug "miserable" into COCA and you get "creature" at rank 15 with an MI of 7.38 after a long line of MIs in the 3.0 range, because "miserable creature" is a construction that occurs on certain rhetorical occasions. Am I overthinking this?
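To make the point about MI and absolute frequency concrete, here is a minimal sketch of the common pointwise form of the score, MI = log2(observed / expected). It is an assumption that this matches what a given corpus tool reports; COCA's own calculation may differ, for example by adjusting for the collocation span. The function name and all counts below are hypothetical, chosen only to show a rare pair outscoring a frequent one.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Pointwise MI: log2 of observed co-occurrences over the count expected by chance.

    Assumes the simple formula without any span adjustment.
    """
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


# Hypothetical counts in a hypothetical corpus of 2**20 tokens.
CORPUS_SIZE = 2**20

# A rare pair: only 8 co-occurrences, but both words are rare too.
rare_pair_mi = mutual_information(8, 64, 128, CORPUS_SIZE)

# A frequent pair: 4096 co-occurrences, but both words are very common.
frequent_pair_mi = mutual_information(4096, 16384, 32768, CORPUS_SIZE)

print(f"rare pair MI:     {rare_pair_mi:.2f}")
print(f"frequent pair MI: {frequent_pair_mi:.2f}")
```

With these made-up counts the rare pair scores higher than the frequent one, because the score depends on the ratio of observed to expected co-occurrence and not on how often the pair occurs in absolute terms.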


I'm doing a project on this same thing. Would there be any chance for me to get in contact with you for a possible interview? Awesome video, by the way.


Excellent video. You teach excellently, your students must be happy with you.


Hi Martin, thank you for the wonderful and very helpful video. I am applying Zipf's law to my task of creating a dictionary of words that are specific to a particular category. However, I wonder whether I could use the curve to determine a threshold for the most significant words for the dictionary. For instance, could I use the intercept to determine this?


Have you ever tried plotting the product "position × n"? It would be interesting to see how much it varies. (If it was in the video, I missed it.)
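The product of position and frequency can be checked with a few lines of code. The sketch below is only an illustration: the function name, the simple regular-expression tokenizer, and the toy sentence are all assumptions, and a real check would read one of the word list files instead. Under Zipf's law the last column should stay roughly constant in a large corpus; a toy text this small is not expected to show that.

```python
import re
from collections import Counter


def rank_frequency_products(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically so ranks are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ordered, start=1)
    ]


# Toy input for illustration only.
sample = "the cat and the dog and the bird saw the cat"
rows = rank_frequency_products(sample)
for rank, word, freq, product in rows:
    print(f"{rank:>4}  {word:<6} {freq:>4} {product:>6}")
```

Plotting the fourth column against the first (for example with a log-scaled x-axis) would show how far the product drifts from a constant across ranks.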


Hi, thank you for your wonderful videos.
Does this law hold true for words uttered or written by non-native speakers of a language, or by children who have not yet mastered the language?


But what if you make a language with "aaa" before every word? Does Zipf's law apply then?


Thank you for the video :D I'm trying to install the newest version of AntConc on a Mac, but it can't be opened because "Apple cannot check it for malicious software." Also, when I forced it to open, there was no way to open files in it. Are there any ways to fix these problems?


Hi, thank you. I will follow all of your videos.
