Evaluating Large Language Models for Cybersecurity Tasks: Challenges and Best Practices

How can we effectively use large language models (LLMs) for cybersecurity tasks? In this podcast from the Carnegie Mellon University Software Engineering Institute, Jeff Gennari and Sam Perl discuss applications of LLMs in cybersecurity, potential challenges, and recommendations for evaluating LLMs.
#LLMs, #AI, #cybersecurity, @TheSEICMU
How to evaluate and choose a Large Language Model (LLM)
How Large Language Models Work
Evaluating the Output of Your LLM (Large Language Models): Insights from Microsoft & LangChain
Master LLMs: Top Strategies to Evaluate LLM Performance
Petr Polezhaev – Advancements in Evaluating Large Language Model Applications
Evaluation for Large Language Models and Generative AI - A Deep Dive
Evaluating LLM-based Applications
LLM Module 4: Fine-tuning and Evaluating LLMs | 4.9 Evaluating LLMs
There are no standardized evaluation criteria to measure the responsible behavior of LLMs.
Evaluating Large Language Models: Simple and Easy Techniques for Ensuring Generative AI Reliability
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Evaluating Large Language Models on Clinical & Biomedical NLP Benchmarks
Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study
Evaluating Large Language Models Trained on Code - OpenAI Codex Paper
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models: 30 Common Metrics
Yann Dubois: Scalable Evaluation of Large Language Models
Read TWO papers: How to evaluate LLM performance
Evaluating large language models with Ray in hybrid cloud
Can AI Really Plan? Evaluating Large Language Models and Reasoning Models
How to evaluate large language models using Prompt Engineering | Testing and Improving with PyTorch
Evaluation Approaches for Your LLM (Large Language Model): Insights from Microsoft & LangChain