Autonomous Open Source LLM Evaluator (Ollama) - Full Guide

Today I take a look at my Autonomous Open Source LLM Evaluator, built with Ollama and GPT-4. This is a neat tool for testing open-source LLMs on different tasks, such as problem solving and coding.

00:00 Ollama LLM Eval Intro
00:21 Ollama LLM Eval Flowchart
01:28 LLM Evaluator Code 1
06:24 Test 1
08:30 LLM Evaluator Code 2
09:13 Test 2
10:53 Conclusion

Comments

I've built a similar system, but I noticed that the judge model sometimes hallucinates and gives high marks to obviously wrong solutions. I tried a jury of multiple judges (different big models). This improved judging quality but made it 8x slower. Also, with multiple judges you need to fuse their judgments into some consensus. It's just pretty slow, and all models do hallucinate.

ArseniyPotapov
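
The "fuse their judgments into some consensus" step from the comment above could look something like the sketch below. It is only an illustration, not the commenter's or the video author's code: the function name `fuse_scores`, the judge names, the 0-10 score scale, and the `max_spread` threshold are all assumptions. It takes the median so that one hallucinating judge cannot drag the result, and flags cases where the judges disagree widely.

```python
from statistics import median


def fuse_scores(scores, max_spread=3.0):
    """Fuse per-judge scores into a single consensus value.

    `scores` maps a (hypothetical) judge name to a numeric score.
    The median is used because it is robust to a single outlier judge.
    `max_spread` is an assumed threshold for flagging disagreement.
    """
    if not scores:
        raise ValueError("need at least one judge score")
    values = list(scores.values())
    spread = max(values) - min(values)
    return {
        "consensus": median(values),
        "spread": spread,
        "disputed": spread > max_spread,  # worth a manual look
    }


# Hypothetical scores from three judge models for one answer.
result = fuse_scores({"judge_a": 9, "judge_b": 8, "judge_c": 2})
print(result)
```

This does nothing about the speed problem the comment mentions, since every judge still has to be called; it only covers combining their outputs.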

aya:35b blows everything out of the water. Not ten times better than ChatGPT, but one hundred times better. It's slow, as it's a 35B model run locally, but I love it. Besides that, I use llama3 for most everyday tasks.

ProfessorCrumbs

In the bubble sort evaluation, all the models that were evaluated as wrong (Mistral, Codestral, etc.) had a syntax error on line 1 because the output text was included as a line of code. The code itself was sound for all of them. So it is not a proper eval: you need to check why your code worked for a couple of models but not the others, since a simple syntax error that came from your code, not the LLM's, does not make for a proper eval. Other than that, it's a cool idea.

thenarrowgate
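
One way to avoid the problem described in the comment above is to strip the model's surrounding prose before running anything. The sketch below is a guess at such a fix, not the video author's code: the helper names `extract_code` and `is_valid_python`, and the sample response in `raw`, are made up for illustration. It assumes the model wraps its code in a Markdown fence, and falls back to the whole response if it does not.

```python
import ast
import re

FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly
CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(response: str) -> str:
    """Return the first fenced code block, or the whole response if none."""
    match = CODE_BLOCK.search(response)
    return (match.group(1) if match else response).strip()


def is_valid_python(source: str) -> bool:
    """Check that the source parses, without executing it."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Hypothetical model response: a sentence of prose, then fenced code.
raw = (
    "Here is a bubble sort implementation:\n"
    + FENCE + "python\n"
    "def bubble_sort(items):\n"
    "    items = list(items)\n"
    "    for i in range(len(items)):\n"
    "        for j in range(len(items) - 1 - i):\n"
    "            if items[j] > items[j + 1]:\n"
    "                items[j], items[j + 1] = items[j + 1], items[j]\n"
    "    return items\n"
    + FENCE
)

print(is_valid_python(raw))                # the prose line breaks parsing
print(is_valid_python(extract_code(raw)))  # the extracted code parses
```

Checking with `ast.parse` first also lets the evaluator tell "the model's code is wrong" apart from "the harness fed it non-code text", which is the distinction the comment is asking for.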

May I ask what your roadmap for this channel is?

tonywhite

What is the point of evaluating many models with a more powerful model if this is required for each problem? It would be much faster to just ask GPT-4 for the answer to the problem.

JohnDoe-zxbu
