Meta Llama 3 Fine tuning, RAG, and Prompt Engineering for Drug Discovery

Large language models such as Meta's newly released Llama 3 have demonstrated state-of-the-art performance on standard benchmarks and in real-world scenarios. (1) To further improve domain-specific generative AI answers, three techniques can be applied: fine-tuning on additional datasets, prompt engineering, and retrieval-augmented generation (RAG).

For enhanced usability, Llama 3 text generations may need additional modification, additional context, or a specialized vocabulary. Fine-tuning is the process of further training the original pre-trained Llama 3 on domain-specific dataset(s). Prompt engineering does not involve re-training Llama 3; it is the process of "designing and refining the input given to a model to guide and influence the kind of output you want." RAG "combines prompt engineering with context retrieval from external data sources to improve the performance and relevance of LLMs." (2)
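The difference between prompt engineering and RAG can be sketched in a few lines of plain Python. The corpus, prompt wording, and word-overlap retriever below are illustrative stand-ins; a production system would use an embedding model and a vector database instead:

```python
# Minimal sketch contrasting prompt engineering with RAG-style context
# injection. All names and the scoring heuristic are illustrative.

def engineer_prompt(question: str) -> str:
    """Prompt engineering: shape the instruction; no retrieval involved."""
    return (
        "You are a medicinal chemistry assistant. Answer concisely and "
        "cite mechanisms where known.\n\nQuestion: " + question
    )

def retrieve(question: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def rag_prompt(question: str, corpus: list[str]) -> str:
    """RAG: prepend retrieved passages so the model answers from them."""
    context = "\n".join(retrieve(question, corpus))
    return "Context:\n" + context + "\n\n" + engineer_prompt(question)

corpus = [
    "Tamoxifen is a selective estrogen receptor modulator.",
    "Crizotinib inhibits ALK and ROS1 kinases.",
]
print(rag_prompt("What does crizotinib inhibit?", corpus))
```

Fine-tuning, by contrast, changes the model weights themselves rather than the input, so it is not shown here.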

The seminar will detail how to use drug discovery related datasets with the three LLM techniques mentioned above. The cover image depicts cancer drug candidate RTx-152 and its interactions with protein and DNA, reported in separate research: Fried, W., et al. Nature Communications. April 05, 2024. (A)

-CEO Kevin Kawchak
Comments

Please keep posting these technical videos. So many people are tired of beginner level knowledge. Whoever serves the community with relevant, advanced knowledge will prosper.

madmen

Thank you for sharing the video. As you mentioned, it would be helpful to have the links to the associated Jupyter notebooks. Could you please provide those in the video description?

meelanc

So what Kevin Kawchak is saying...(smile)... is that it doesn't matter much which LLM you use. All the LLM does is provide control of the conversation. What counts is the phase in which a bias is developed towards text-based current information that the developer or user provides to the LLM interface.

You cannot be assured that your data, or even the domain of interest, has been used to train the LLM's responses. Effectively, retrieval-augmented generation (RAG) patches small, medium, or large amounts of structured or unstructured data into the AI environment, and the AI provides answers accordingly using a vector database.

This raises the question of how do we test accuracy? And that depends on whether the output is rigorously re-evaluated after the RAG process. I feel the need for a workflow...

simonmasters
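One bare-bones form of the re-evaluation workflow raised in the comment above is a grounding gate: accept an answer only if each of its sentences overlaps the retrieved passages. The 0.5 threshold and the word-overlap heuristic are illustrative choices, not a validated method:

```python
# Sketch of a post-RAG accuracy gate: reject answers whose sentences
# share too few words with the retrieved source passages.

def grounded(answer: str, passages: list[str], threshold: float = 0.5) -> bool:
    """Return False if any answer sentence is poorly supported by sources."""
    source_words = set(" ".join(passages).lower().split())
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        words = set(sentence.lower().split())
        if len(words & source_words) / len(words) < threshold:
            return False
    return True

passages = ["RTx-152 binds DNA and disrupts protein-DNA interactions."]
print(grounded("RTx-152 binds DNA.", passages))          # → True
print(grounded("RTx-152 cures all cancers.", passages))  # → False
```

A rigorous pipeline would replace the word-overlap check with an entailment model or human review, but the gating structure stays the same.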

Some great drug discoveries are based on small experiments, e.g., Tamoxifen, Gleevec, Crizotinib, and Vemurafenib. Is massive "fine-tuning" of an LLM necessary, or counter-productive versus specific/narrow training?

bamhre

This guy can make abc and 123 complex.

generationgap