Evaluation Measures for Search and Recommender Systems


Evaluation of information retrieval (IR) systems is critical to making well-informed design decisions. From search to recommendations, evaluation measures are paramount to understanding what does and does not work in retrieval.

Many big tech companies attribute much of their success to well-built IR systems. One of Amazon's earliest iterations of the technology reportedly drove more than 35% of the company's sales, and Google attributes 70% of YouTube views to its IR recommender systems.

IR systems power some of the greatest companies in the world, and behind every successful IR system is a set of evaluation measures.

🌲 Pinecone article:

🔗 Code notebooks:

🤖 70% Discount on the NLP With Transformers in Python course:

🎉 Subscribe for Article and Video Updates!

👾 Discord:

00:00 Intro
00:51 Offline Metrics
02:38 Dataset and Retrieval 101
10:21 MRR
13:32 MRR in Python
29:48 Final Thoughts
Comments

Hi James, I have a question on NDCG and other ranking-aware metrics. How do these metrics work when you have millions of products/items? If we have millions of items, it seems we first have to manually label all of them for relevance/rank, and only then can we use NDCG on the model's predictions. Isn't that a big drawback of NDCG? Can you please suggest a better approach to ranking when we don't have relevance-labelled data? Thanks in advance.
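On the labelling question above: in practice NDCG is usually computed only over the results that were actually judged for a sample of queries, not over the whole catalogue. Below is a minimal sketch of NDCG@K from graded 0-4 labels; the code is illustrative and not taken from the video's notebooks, and the example labels are made up.

```python
import numpy as np

def dcg_at_k(relevance: list, k: int) -> float:
    # discounted cumulative gain over the top-k graded relevance labels
    rel = np.asarray(relevance, dtype=float)[:k]
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum((2 ** rel - 1) / np.log2(ranks + 1)))

def ndcg_at_k(relevance: list, k: int) -> float:
    # normalise by the DCG of the ideal (descending-relevance) ordering
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0

# graded 0-4 labels for the results one query returned, in ranked order
print(ndcg_at_k([4, 2, 0, 3, 1], k=5))  # ≈0.95
```

Where no graded labels exist, implicit signals such as clicks or purchases are often substituted as relevance judgements, at the cost of noisier labels.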

goelnikhils

Amazing Explanation. So clear. Very helpful

goelnikhils

Your videos are impressive and very informative mate. 👌

shrar

Hi, I have a query. I am working on a song recommendation project using a Spotify API dataset, and I have used approaches like cosine similarity, matrix factorization, kNN, Latent Semantic Analysis (LSA), and the correlation distance method. Now I am confused about how to approach the evaluation metrics for this system.

preetimehta

21:23 Statistically there is probably a cat in the box on image 3

morannechushtan

1. I got confused at 18:29 because the predicted list is a nicely increasing sequence, which made me wonder whether those values are ranks or item IDs. I was also thinking the length of the intersection act_set & pred_set could simply be len(act_set), but then I realised this example is a special case where act_set is a subset of pred_set. If act_set contained the value 9, we couldn't use len(act_set) alone and the formula from the video would be required (see the recall@K sketch after this list).

2. Similar to the question nikhil goel asked in the comments two weeks before this: where does the actual_relevant data at 13:46 come from? It looks manually labelled, and since the labelling happens per query, doesn't that make it very hard to scale?

3. Assuming we accept manual labelling, how is the 0-4 range determined? Drift feels like a problem: when today's 4 becomes tomorrow's 3 as value judgements change, does that mean relabelling all the results again?

4. I noticed some metrics aggregate across queries and K, while others apply only within one query across K. In what scenarios do we use each?
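On point 1 above: here is a minimal recall@K sketch (illustrative, not the notebook code) where one relevant item, 9, is never retrieved, so the intersection is strictly smaller than act_set and the full formula is needed. The example values are made up.

```python
def recall_at_k(actual: list, predicted: list, k: int) -> float:
    # recall@K = relevant items found in the top-K predictions / all relevant items
    act_set = set(actual)
    pred_set = set(predicted[:k])
    return len(act_set & pred_set) / len(act_set)

# item 9 is relevant but never predicted, so recall@8 is 2/3, not 1.0
print(recall_at_k(actual=[3, 5, 9], predicted=[1, 2, 3, 4, 5, 6, 7, 8], k=8))
```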

Han-veuh

The biggest problem is labelling whether each product is relevant or not. It is not possible to label every search, and the metrics are meaningless if you can't handle that.

tarikkarakas

Hi James! Can you make some videos on updating models when we keep getting new data (e.g. biweekly)?

joyeetamallik

In MRR, when the search results don't include the result we want (in your example, if we search for cats and find only dogs), how do we calculate MRR? Can we assign a large rank, for example rank 20 for all missing results, i.e. 1/20?
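On the question above: a common convention (an assumption here, not something stated in the video) is to score a query whose relevant result never appears as reciprocal rank 0, rather than inventing a fixed rank such as 20. A minimal sketch:

```python
def mean_reciprocal_rank(relevant_items: list, ranked_results: list) -> float:
    # average of 1/rank of the first relevant result per query;
    # queries where no relevant result is retrieved contribute 0
    reciprocal_ranks = []
    for relevant, ranked in zip(relevant_items, ranked_results):
        rr = 0.0
        for rank, item in enumerate(ranked, start=1):
            if item == relevant:
                rr = 1.0 / rank
                break
        reciprocal_ranks.append(rr)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# three queries: relevant item at rank 2, at rank 1, and never retrieved
relevant_items = ["cat_1", "cat_2", "cat_3"]
ranked_results = [
    ["dog_7", "cat_1", "dog_3"],
    ["cat_2", "dog_9"],
    ["dog_1", "dog_2", "dog_3"],
]
print(mean_reciprocal_rank(relevant_items, ranked_results))  # (0.5 + 1.0 + 0.0) / 3 = 0.5
```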

Data_scientist_trmi

Love your videos, but why do you always seem so sad?

mattygrows