New AI cascade of LLMs - FrugalGPT (Stanford)

ChatGPT and GPT-4 can be expensive for a small enterprise: around $21K per month to support customer service. How can you reduce the cost of ChatGPT and GPT-4 services with prompt adaptation, LLM approximation (caching solutions and fine-tuning smaller Transformer sub-systems), and/or LLM cascades?

Data-adaptive LLM selection!

Stanford University published new insights into how to save costs and increase AI system performance, for business owners and AI developers: the optimal way to use LLM systems like ChatGPT or GPT-4 (and likely future models such as GPT-5) to reduce the price you have to pay (to OpenAI, Microsoft, ...).

The video highlights the challenges faced by small businesses in the United States that use GPT-4 for customer service, as the monthly costs can be around $21,000. Additionally, the energy and environmental impacts of running LLMs in cloud compute centers are significant. To address these issues, the authors compare the costs of 12 different commercial LLMs and introduce the concept of FrugalGPT.

FrugalGPT aims to reduce the resources needed to run LLMs by focusing on prompt adaptation, LLM approximation, and LLM cascading. Prompt adaptation decreases the length of the input prompt, while LLM approximation uses an external cache to store previous query responses. By checking the cache before submitting a new query, the system can avoid unnecessary calls to the LLM. LLM cascading uses different LLMs based on cost and performance, starting with cheaper options like GPT-3 and escalating to more expensive models like GPT-4 only when necessary.
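The three strategies can be sketched together in a few lines. Everything below is a hypothetical stand-in, not the paper's implementation: `call_llm` is a stub instead of a real API client, prompt adaptation is crude word truncation, and the model names are illustrative.

```python
CACHE = {}  # LLM approximation: remember past (prompt -> answer) pairs

def call_llm(model, prompt):
    # Placeholder for a real API call; returns a canned answer for the sketch.
    return f"[{model}] answer to: {prompt}"

def shorten_prompt(prompt, max_words=50):
    # Prompt adaptation: naive truncation; real systems drop redundant
    # few-shot examples or compress context instead.
    return " ".join(prompt.split()[:max_words])

def answer_is_good_enough(answer):
    # Stand-in for a learned scoring function; accept any non-empty answer.
    return bool(answer.strip())

def query(prompt, models=("gpt-3.5-turbo", "gpt-4")):
    prompt = shorten_prompt(prompt)
    if prompt in CACHE:              # check the cache before paying for a call
        return CACHE[prompt]
    for model in models:             # cascade: cheapest model first
        answer = call_llm(model, prompt)
        if answer_is_good_enough(answer) or model == models[-1]:
            CACHE[prompt] = answer
            return answer
```

With this structure, a repeated question is served from the cache for free, and the expensive model is only reached when the cheap one's answer is rejected.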

The authors conducted experiments with different LLM APIs and prompting strategies and found that FrugalGPT can achieve substantial efficiency gains. In one case using a news dataset, they reduced the inference cost of running queries by 98% while still exceeding the performance of GPT-4. The implementation also involved a scoring function, which could be a regression model trained on a dataset of headlines using a distilled version of BERT (e.g. DistilBERT).
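The scoring step can be sketched as follows. In the paper, a small trained model (such as a DistilBERT regressor) outputs a reliability score for a (query, answer) pair; the keyword/length heuristic below is an invented stand-in so the example stays self-contained, and the per-tier thresholds are illustrative, not the paper's tuned values.

```python
def reliability_score(query, answer):
    # Hypothetical stand-in for a trained regressor: penalize hedging
    # phrases, reward longer answers, clamp the score to [0, 1].
    hedges = ("i don't know", "not sure", "cannot answer")
    if any(h in answer.lower() for h in hedges):
        return 0.0
    return min(len(answer.split()) / 20.0, 1.0)

# Each cascade tier pairs a model with an acceptance threshold; in the
# paper these thresholds are learned on a validation set.
TIERS = [("gpt-3.5-turbo", 0.8), ("gpt-4", 0.0)]  # last tier always accepts

def accept(query, answer, threshold):
    # If the score clears the tier's threshold, stop; otherwise escalate.
    return reliability_score(query, answer) >= threshold
```

The key design point is that the scorer is far cheaper to run than the next model in the cascade, so evaluating an answer costs much less than escalating unnecessarily.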

The video emphasizes the potential cost savings and improved performance achieved by using a multi-AI system with various price points and performance levels. By using FrugalGPT and selecting among LLM options based on cost and quality, significant cost reductions and even accuracy improvements can be achieved compared to relying on a single expensive LLM like GPT-4.

The video also briefly mentions the idea of reducing costs even further by using lower-quality LLMs and providing additional semantic and linguistic information to compensate for their limitations. However, this hypothetical scenario would require longer prompts and could contradict the prompt adaptation strategy discussed earlier.

Overall, the publication presents an approach to reduce the cost of using LLMs in cloud computing environments while maintaining or improving performance. The FrugalGPT concept and the use of multiple LLMs with varying costs and performance levels offer potential benefits for businesses seeking cost-effective solutions for their AI applications.

All rights (data, model, charts and tables) are with the authors of the scientific preprint:
FrugalGPT: How to Use Large Language Models
While Reducing Cost and Improving Performance
by Lingjiao Chen, Matei Zaharia, James Zou

Article by MarkTechPost:

#performance
#price
#ai
#gpt4
#chatgpt
#explained
#insights
Comments

Yep. I've been using Chroma and creating vector indexes locally, using free models, training them on our data, and relying on GPT-3 for only some of the needs, GPT-4 very rarely. Cost last month: $8.

krisvq

Very funny ending... or maybe it was just me... Excellent video as usual...

joser

I'm currently designing a customer service app, and when I started doing some maths with the prices of the GPT-4 API, it kind of threw me off.
So this is very interesting, thanks!

Alain.Robert

This is a great study, and a great explanation. I've been looking at a similar idea for generating fine-tuning datasets, similar to the WizardLM/Evol-Instruct approach. The main model (e.g. starting with WizardLM itself) would generate the questions and answers with access to tools, with separate models to score the questions, the answers, and both together. The original/base model would need to be good enough to use tools with some basic reasoning, which is a limitation.

Then the data would be filtered, categorized and packed into training sets to fine-tune the main model, the performance evaluated, and the process repeated with the fine-tuned model. My hypothesis is that the main model's abilities will be limited by the scoring ability of the smaller models before they're limited by parameter size. I'm still setting up this system and learning as I go; this study would likely indicate I'll be wrong (happy to be!), but I'll be interested to see where it goes. Likely smarter people than I are already further along building such a system, or such a system already exists and I have yet to find the relevant papers. Either way, I really appreciate your videos; they have provided a lot of great guidance!

DarrenReidAu

Speaking of coincidences, have you noticed the increasing number of them in your world, some orchestrated (hey Google, hey Alexa) and some leading us to find amazingly pertinent free information tools that enable anyone to build literally anything? This particular subject begs for collaboration and sharing, and is getting it. Great community in AI, how everyone nearly gives it away. Ironically, it definitely makes it hard to get anything done if you don't make a living doing AI (i.e. don't code), but you gotta wonder what cool stuff those deeply in it are working on. Doc, are you doing anything super cool? I'm working on a bootstrap agent.

josephshawa

Brilliant.🎉
But all in all, I feel a well-tuned, free and small LLM run on our own specific data will give the best cost and performance.

CharlesOkwuagwu

FYI, in GPT-4 you can literally copy-paste and send the guidance template; just replace variables like {{query}} with a valid string, and the end effect will be the same.

wojciechzielinski

Nice video. Does the scoring function performed by the BERT intermediary require BERT to be fine-tuned on data specific to the task and knowledge domain the cascade will be used for? The example said to train DistilBERT on headlines, for example, but is that only appropriate if the cascade will be used for similar queries?

theshrubberer

This video is very informative. Many thanks.

jayhu

I think they used this in Bing too, except for the cache part.

xiaojinyusaudiobookswebnov

I wasn't sure how to solve harder questions and was thinking of this approach as cheating (calling GPT-4 for help), but here we go, I guess it's just normal.

LaFragas