Test and Evaluation Framework for AI Enabled Systems

Presenter: Brian Woolley (Joint Artificial Intelligence Center)

Autonomous and artificial intelligence (AI) systems are emerging at a dizzying pace. Such systems promise to expand the capacity and capability of individuals by delegating increasing levels of decision making down to the agent level: operators can set high-level objectives for multiple vehicles or agents and need only intervene when alerted to anomalous conditions. Test and evaluation efforts at the Joint AI Center are focused on exercising a prescribed test strategy for AI-enabled systems. This new AI T&E Framework recognizes the inherent complexity that follows from incorporating dynamic decision makers into a system (or into a system-of-systems).

The AI T&E Framework comprises four high-level types of testing that examine an AI-enabled system from different angles to provide as complete a picture as possible of the system's capabilities and limitations: algorithmic, system integration, human-system integration, and operational tests. These testing categories provide stakeholders with appropriate qualitative and quantitative assessments that bound the system's use cases in a meaningful way. Algorithmic tests characterize the AI models themselves against metrics for effectiveness, security, robustness, and responsible AI principles. System integration tests exercise the system itself to ensure that it operates reliably, functions correctly, and is compatible with other components. Human-machine tests ask what human operators think of the system, whether they understand what the system is telling them, and whether they trust it under appropriate conditions. All of this culminates in an operational test that evaluates how the system performs in a realistic environment with realistic scenarios and adversaries.

Interestingly, counter to traditional approaches, this framework is best applied during and throughout the development of an AI-enabled system. Our experience is that programs that conduct independent T&E alongside development do not suffer delays; instead, they benefit from the feedback and insights gained from incremental and iterative testing, which leads to the delivery of a better overall capability.
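Of the four categories, algorithmic testing is the most directly automatable. The sketch below is a minimal illustration, not part of the framework itself: it assumes a generic scikit-learn-style classifier and contrasts an effectiveness metric (accuracy on clean held-out data) with a simple robustness metric (accuracy under Gaussian input noise). The model, data, and noise level are hypothetical stand-ins chosen only to make the example self-contained.

```python
"""Minimal sketch of an algorithmic-level test: effectiveness vs. robustness.
All names here (model, noise level, data) are illustrative assumptions."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in model and data; a real algorithmic test would use the
# system's actual model and operationally relevant data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def accuracy(m, X, y):
    """Fraction of correct predictions."""
    return float((m.predict(X) == y).mean())

# Effectiveness: performance on clean, held-out data.
clean_acc = accuracy(model, X_test, y_test)

# Robustness: performance under a simple stressor (additive Gaussian
# noise, a hypothetical perturbation chosen for illustration).
rng = np.random.default_rng(0)
noisy_acc = accuracy(model, X_test + rng.normal(0, 0.5, X_test.shape), y_test)

print(f"effectiveness (clean accuracy): {clean_acc:.3f}")
print(f"robustness (noisy accuracy):    {noisy_acc:.3f}")
print(f"degradation under noise:        {clean_acc - noisy_acc:.3f}")
```

In practice, the degradation figure would be tracked across development increments, so that each iteration's feedback feeds back into the program in the incremental, test-alongside-development manner the framework recommends.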