LFI Conf 23 | Marisa Bigelow | OREOS minus the milk: Deploying to the DoD in Staged Worlds Studies
Marisa Bigelow, Cognitive Systems Engineering Manager, Mile Two LLC
Digital outages are inevitable in complex production software systems and can have dangerous effects on mission-critical IT infrastructure such as that found in Department of Defense (DoD) environments. Our Resilience Engineering Ops Simulation (OREOS) revealed challenges to our digital services such as multi-adversarial attacks, coordinated team responses, and highly sensitive information concerns. This talk describes the staged world process, the difficulty of capturing the real world, and the specific obstacles discovered when dealing with the DoD as a customer, which are similar to other security incidents in the larger business-critical digital services industry.
Teams scramble to troubleshoot and respond to outages, which continually challenge their current mental models of the system. Diverse perspectives are needed in adapting to disruptions, particularly in coping with the challenges of deploying to distributed DoD environments. While these multiple perspectives are necessary, they can also create conflicts in responsibility, authority, and goals within and across echelons. We brought together diverse software teams to elicit their experiences in working with DoD deployments through low-fidelity staged world simulations. Staged world problems are similar to game days in that they are high-fidelity example problems that prompt a wide range of stories that can broaden and narrow the range of responses, including hard problems like overcoming opaque tools and limited access to distributed systems. Because the true operational setting is difficult to simulate, the staged world serves as a means to uncover these challenges. Running teams through these experiences gave the organization a deep understanding of the various problems our teams have faced and opened new pathways to explore in anticipating and supporting their adaptive capacity to maneuver within the DoD ecosystem.
Learning from Incidents (LFI) is a community challenging conventional views and reshaping how the software industry thinks about incidents, software reliability, and the critical role people play in keeping their systems running. In today’s economy, software organizations can’t afford not to learn from incidents.
LFI Conference is made possible by the financial and planning support of the Jeli team. Nora Jones, Founder and CEO of Jeli, founded the LFI community and website as a way to show organizations how to get more ROI out of their most powerful investments: their incidents.