Indexing 7: v-byte encoding (compression)

V-byte (variable-byte) encoding lets us spend fewer bits on small numbers, such as the gaps produced by delta encoding, while still allowing arbitrarily large numbers in the index. Each byte carries 7 bits of the number itself, and the remaining bit flags whether that byte is the last one of the number, so a decoder knows where each number ends.
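As a rough illustration of the scheme described above, here is a minimal Python sketch. It assumes the convention that a high bit of 1 marks the final byte of a number and that the most significant 7-bit chunk comes first; other formats flip the flag or the byte order. The function names (`vbyte_encode`, `vbyte_decode`, `encode_postings`, `decode_postings`) are hypothetical and chosen only for this example.

```python
def vbyte_encode(n: int) -> bytes:
    """Encode one non-negative integer using 7 payload bits per byte."""
    if n < 0:
        raise ValueError("v-byte encodes non-negative integers only")
    chunks = [n & 0x7F]           # lowest 7 bits
    n >>= 7
    while n:
        chunks.append(n & 0x7F)   # next 7 bits
        n >>= 7
    chunks.reverse()              # most significant chunk first
    chunks[-1] |= 0x80            # assumed convention: high bit 1 = final byte
    return bytes(chunks)


def vbyte_decode(data: bytes) -> list[int]:
    """Decode a stream of concatenated v-byte numbers."""
    numbers, current = [], 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80:           # final byte of this number
            numbers.append(current)
            current = 0
    return numbers


def encode_postings(doc_ids: list[int]) -> bytes:
    """Delta-encode a sorted list of document IDs, then v-byte each gap."""
    out, prev = bytearray(), 0
    for doc_id in doc_ids:
        out += vbyte_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> list[int]:
    """Reverse encode_postings: decode the gaps and take a running sum."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids
```

With these definitions, `encode_postings([824, 829, 215406])` stores the gaps 824, 5, and 214577 in 6 bytes in total, and `decode_postings` recovers the original list.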
Comments
Author

Very useful! I came here after reading the chapter in the Information Retrieval book and was a little confused. Glad you cleared it up!

ihorprotsenko
Author

At 3:08, you said that a byte starting with 0 is the first byte of a multi-byte number. I don't think that is accurate. It should be: a byte starting with 0 is NOT the last byte of a multi-byte number, so we should continue on to the next byte.
Please correct me if I am wrong. Thanks!

zjuzhanxf
Author

Is this also how variable-length encodings like UTF-8 work? 🤔

hansisbrucker
Author

Thanks, this was an awesome explanation 😁😀

RAJATTHEPAGAL