[Monday evening short video] Summary of two new amazing LLM benchmarking papers: GAIA and GPQA

Sharing a summary of two amazing LLM benchmarking papers published just last week:
- GAIA: the General AI Assistant benchmark (disclaimer: I'm a co-author with amazing co-authors)
- GPQA: the Graduate-Level Google-Proof Q&A benchmark (disclaimer: the authors are also awesome)

When two teams (covering a diverse range of actors like Anthropic, Cohere, New York University, Hugging Face, Meta AI) independently come up with benchmarks that share so many aspects (while being really different in goals and approaches), you know the future of LLM benchmarking is changing right before your eyes.

Both are super difficult (~30% GPT-4 success rate), both are small (450 questions) and carefully hand-crafted question by question, with a single gold answer and a strong focus on the reasoning itself rather than on memorization. Both are very challenging test beds for the capabilities of coming models.

And above all: I'm super excited that these are open-source benchmarks, giving us common ground for comparing the coming frontier models. On to the future (of open evaluation)!
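
Side note for anyone who wants to poke at the data: both benchmarks are distributed on the Hugging Face Hub, and a minimal sketch of loading them with the datasets library could look like the snippet below. The repo IDs, config names and splits are assumptions from memory (and GAIA access may be gated, requiring you to accept its terms and authenticate first), so double-check the exact identifiers on each paper's Hub page.

from datasets import load_dataset

# Hypothetical repo IDs / config names -- verify against the official Hub pages.
gaia = load_dataset("gaia-benchmark/GAIA", "2023_all", split="validation")
gpqa = load_dataset("Idavidrein/gpqa", "gpqa_main", split="train")

print(len(gaia), "GAIA validation questions")
print(len(gpqa), "GPQA questions")
print(gaia[0])  # question text, difficulty level, gold answer, attached files, ...
print(gpqa[0])  # question text, correct answer, incorrect options, ...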

Papers and more information:
Comments

Vegas in the building. Could use better videos explaining how to use Hugging Face to autotrain, or how to use Docker. Can't find a full video that explains things in great detail for learning purposes. Everyone sends you to a website to read pages of work. Help plz 🙏

bumlifeBomblifeManagement

After reading the GPQA, I'm pretty sure I can't do a single question (BSc Biology)

zaf