Berlin Buzzwords 2015: Adrien Grand – Algorithms & data-structures that power Lucene & ElasticSearch

When you want to make search fast, 80% of the job involves organizing your data so that it can be accessed with as little work as possible. This is the exact reason why Lucene is based on an inverted index. But there are some very interesting algorithms and data structures involved in that last 20% of the job.

In this talk, you will gain insights into some internals of Lucene and ElasticSearch, and see how priority queues, finite state machines, bit twiddling hacks and several other algorithms and data structures help make them fast.
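To make the inverted-index idea concrete, here is a minimal sketch in Python. It is not Lucene's or ElasticSearch's implementation: the function names `build_inverted_index` and `intersect` and the sample documents are hypothetical, chosen only for illustration. It maps each term to a sorted list of document ids, then answers a two-term conjunction by walking both lists in step.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # set() ensures each doc id is recorded once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return index


def intersect(postings_a, postings_b):
    """Return the doc ids present in both sorted postings lists."""
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result


# Hypothetical sample documents, for illustration only.
docs = [
    "lucene is a search library",
    "elasticsearch is built on lucene",
    "priority queues help merge sorted lists",
]
index = build_inverted_index(docs)
matches = intersect(index["lucene"], index["is"])
print(matches)
```

Because both lists are sorted, the intersection takes a single pass over each. Production engines typically add refinements on top of this, such as skipping ahead within long postings lists, which this sketch leaves out.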

Comments

Agenda:
conjunctions: 1:30
regexp queries: 9:15
numeric doc values compression: 14:47
cardinality aggregation: 24:46

davids

Succinctly explained. Thanks a lot for sharing.

AmanGarg

Which values are being hashed in HyperLogLog?

tarunjain

Berlin Buzzwords 2017: Adrien Grand - Running slow queries with Lucene #bbuzz

Berlin Buzzwords 2014: Adrien Grand - ElasticSearch - aggregations #bbuzz

Berlin Buzzwords 2015: Michael Busch – A complete Tweet index on Apache Lucene #bbuzz

Berlin Buzzwords 2015: Omer Trajman – Predictive Insights for IT Operations #bbuzz

Berlin Buzzwords 2015: Shikhar Bhushan - Diving into ElasticSearch Discovery #bbuzz

Berlin Buzzwords 2015: Mikhail Khludnev - Approaching Join Index for Lucene #bbuzz

Berlin Buzzwords 2015: Ryan Ernst - Compression in Lucene #bbuzz

Berlin Buzzwords 2012: Stefan Pohl - Efficient Scoring in Lucene #bbuzz

Berlin Buzzwords 2015: Nick Burch - What's with the 1s and 0s? Making sense of binary data at s...

Berlin Buzzwords 2015: Ted Dunning -What and Why and How: Apache Drill 1.0 #bbuzz

Berlin Buzzwords 2015: Ivan Mamontov - Fast Decompression Lucene Codec #bbuzz

Berlin Buzzwords 2019: Nick Burch – Building an AI/ML powered text search system #bbuzz

Berlin Buzzwords 2015: Stephan Ewen - Apache Flink deep-dive #bbuzz

Berlin Buzzwords 2017: Grant Ingersoll - BM25 is so Yesterday: Modern Techniques for Better Search..

Berlin Buzzwords 2012: Martijn van Groningen - Joining in Lucene #bbuzz

Berlin Buzzwords 2015: Patrick Peschlow –The Dos & Don'ts of ElasticSearch Scalability &...

Berlin Buzzwords 2011: Simon Wilnauer - Heavy Committing Lucene DocValues aka. Column Stride Fields

Berlin Buzzwords 2015: Uwe Schindler - Apache Lucene 5 - New Features & Improvements for Apache ...

Berlin Buzzwords 2013: Martijn van Groningen - Document relations with ElasticSearch #bbuzz

Berlin Buzzwords 2023: The Debate Returns (with more vectors): Which Search Engine?

Berlin Buzzwords 2015: Ted Dunning – Practical t-digest Applications #bbuzz

Berlin Buzzwords 2019: Erik Hatcher – Chatting with Solr #bbuzz

Berlin Buzzwords 2012: Sanne Grinovero - What You Get by Replicating Lucene Indexes on the Data Grid