AI Safety x Physics Grand Challenge Submission
Project Track
Valentina Schastlivaia
Molecular Bionics Labs
Institute for Bioengineering of Catalonia
Barcelona, Spain
vschastlivaia@ibecbarcelona.eu
Aray Karjauv
XAI
Technical University of Berlin
Berlin, Germany
aray.karjauv@tu-berlin.de
With PIBBSS, Timaeus, & Apart Research
July 28, 2025
Abstract
As AI systems scale into decentralized, multi-agent deployments, emergent vulnerabilities challenge our ability to evaluate and manage systemic risks. In this work, we adapt classical epidemiological modeling (specifically the SEIR compartment model) to describe how adversarial behavior propagates through populations of AI agents. By solving the governing systems of ODEs with physics-informed neural networks (PINNs), we analyze stable and unstable equilibria, bifurcation points, and the effectiveness of interventions. We estimate parameters from real-world data (e.g., adversarial success rates, detection latency, patching delays) and simulate attack propagation scenarios across 8 sectors (enterprise, retail, trading, development, customer service, academia, medical, and critical infrastructure AI tools). Our results demonstrate how agent population dynamics interact with architectural and policy interventions to stabilize the system. This framework bridges concepts from dynamical systems and cybersecurity to offer a proactive, quantitative toolbox for AI safety. We argue that epidemic-style monitoring and tools grounded in interpretable, physics-aligned dynamics can serve as early warning systems for cascading failures among AI agents.
Keywords: Physics-informed neural networks, epidemiological modeling, SEIR dynamics, AI agent security, multi-agent systems
1 Introduction
Language-based AI agents are increasingly deployed across domains, from customer support to autonomous trading agents. According to KPMG's survey of 5,161 businesses with $1 billion or more in revenue, 12% of respondents have deployed AI agents for use across their organizations, another 37% are piloting AI agents, and roughly half (51%) of organizations are exploring the use of AI agents. More than 80% of respondents identified risk management as a significant concern in their generative AI strategies.
However, the widespread deployment of these AI agents has also exposed fundamental vulnerabilities in their reliability [Boisvert et al., 2025]. They can be compromised by adversarial inputs (e.g., prompt injections), propagate misinformation learned from uncurated data, or miscommunicate in multi-agent settings [Nie et al., 2024, Lin et al., 2021, Sun et al., 2022]. While traditional robustness evaluations focus on static benchmarks or single-turn prompt testing, real-world deployments demand systematic, multi-faceted assessment under both intentional attacks and emergent errors [Boisvert et al., 2025]. Moreover, existing evaluation frameworks seldom address interactions among agents or leverage domain-driven priors to improve resilience [Sun et al., 2022].
Research conducted at the AI Safety x Physics Grand Challenge, 2025
2 Methods
To model the propagation of adversarial behavior in large AI agent populations, we adapt a well-known technique from epidemiology: the SEIR compartmental model. We treat agents as elements of a dynamical system whose states change through interactions with other agents, external adversaries, and intervention policies. We use Physics-Informed Neural Networks (PINNs) to learn the solution trajectories of the governing differential equations; PINNs are well suited to this task because they allow us to encode the known dynamical structure directly into the training objective. Beyond simply tracking the number of malignant agents, we investigate the system's phase space and stability properties. These help us answer key questions: What are the current parameters of the system? At what parameter values does the system transition from an unsafe to a safe regime? What are the long-term equilibrium states? How sensitive are these outcomes to intervention timing and scale?
2.1 Theoretical Framework: Epidemiological Model Adaptation
We reinterpret the SEIR model (originally developed for biological epidemics) as a way to track how adversarial behavior spreads among AI agents.
AI Agent SEIR Epidemiological Model: Physics-Informed Framework for AI Agent Security Dynamics
The model describes AI agents transitioning through four states:
- Susceptible (S): Vulnerable AI Agents (Operational agents, No security patches, Exposed to attacks, Normal behavior)
- Exposed (E): Compromised Agents (Attack successful, Not yet malicious, Latent period, Undetected)
- Infected (I): Malicious Agents (Harmful behavior, Spreading malware, Data poisoning)
- Removed (R): Secured Agents (Isolated/patched, Immunized, Unplugged)
Differential Equations:
dS/dt = νR - βSI/N - αS + μ(N - S)
dE/dt = βSI/N + αS - σE - μE
dI/dt = σE - γI - μI
dR/dt = γI - νR - μR
Basic Reproduction Number (R₀): R₀ = β / (γ + μ)
Epidemic Threshold: R₀ > 1: Exponential spread, catastrophic event; R₀ < 1: Controlled system, infection dies out naturally.
[Description of Figure 1: A diagram illustrating the AI Agent SEIR Epidemiological Model, showing the four states (Susceptible, Exposed, Infected, Removed) and the flow between them, driven by parameters like transmission rate, activation rate, and detection rate. It also includes the differential equations and the definition of the basic reproduction number.]
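As a concreteness check, the four-compartment dynamics above can be integrated directly with a standard ODE solver. The sketch below uses illustrative parameter values (not our fitted estimates) and verifies that with R₀ < 1 an initial infection dies out while the total population N is conserved:

```python
# Direct numerical integration of the SEIR system of Section 2.1.
# All parameter values are illustrative, not the paper's fitted estimates.
import numpy as np
from scipy.integrate import solve_ivp

def seir_rhs(t, y, beta, sigma, gamma, nu, alpha, mu, N):
    S, E, I, R = y
    dS = nu * R - beta * S * I / N - alpha * S + mu * (N - S)
    dE = beta * S * I / N + alpha * S - sigma * E - mu * E
    dI = sigma * E - gamma * I - mu * I
    dR = gamma * I - nu * R - mu * R
    return [dS, dE, dI, dR]

N = 1000.0                                   # total agent population
p = dict(beta=0.05, sigma=0.2, gamma=0.1, nu=0.01, alpha=0.0, mu=0.001)
r0 = p["beta"] / (p["gamma"] + p["mu"])      # ~0.50: below the epidemic threshold

sol = solve_ivp(seir_rhs, (0.0, 365.0), [N - 1.0, 0.0, 1.0, 0.0],
                args=tuple(p.values()) + (N,), rtol=1e-8, atol=1e-8)
S, E, I, R = sol.y[:, -1]                    # state after one year
```

With these rates R₀ ≈ 0.5, so the single initial infection decays toward the disease-free equilibrium, and S + E + I + R stays equal to N throughout.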
2.2 Technical Implementation: Physics-Informed Neural Network Implementation
We implement a SEIR-PINN solver using the PINNsFormer architecture [Zhao et al., 2023] to capture complex nonlinear dynamics while enforcing physical constraints. The loss function combines data fitting with physics constraints:
L = L_data + λ_physics L_physics + λ_boundary L_boundary (Equation 5)
The physics loss enforces the SEIR differential equations:
L_physics = Σ_{i=1}^{N_physics} [ (∂S/∂t − f_S)² + (∂E/∂t − f_E)² + (∂I/∂t − f_I)² + (∂R/∂t − f_R)² ] (Equation 6)
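To illustrate what this residual measures, the sketch below evaluates the SEIR physics residual of a candidate trajectory using finite differences in place of automatic differentiation (the actual PINNsFormer implementation differentiates the network outputs). At the disease-free equilibrium the residual vanishes, while an inconsistent trajectory is penalized:

```python
# Finite-difference sketch of the physics residual: the mean squared mismatch
# between a trajectory's time derivatives and the SEIR right-hand sides.
# (The real PINN computes the derivatives by autodiff on the network outputs.)
import numpy as np

def seir_physics_residual(t, S, E, I, R, beta, sigma, gamma, nu, alpha, mu, N):
    dS, dE, dI, dR = (np.gradient(x, t) for x in (S, E, I, R))
    fS = nu * R - beta * S * I / N - alpha * S + mu * (N - S)
    fE = beta * S * I / N + alpha * S - sigma * E - mu * E
    fI = sigma * E - gamma * I - mu * I
    fR = gamma * I - nu * R - mu * R
    return np.mean((dS - fS)**2 + (dE - fE)**2 + (dI - fI)**2 + (dR - fR)**2)

t = np.linspace(0.0, 100.0, 201)
N = 1000.0
p = dict(beta=0.05, sigma=0.2, gamma=0.1, nu=0.01, alpha=0.0, mu=0.001)
zeros = np.zeros_like(t)

# The disease-free equilibrium (S = N, E = I = R = 0) satisfies the ODEs exactly...
loss_eq = seir_physics_residual(t, np.full_like(t, N), zeros, zeros, zeros, N=N, **p)
# ...while an arbitrary declining-S trajectory does not.
loss_bad = seir_physics_residual(t, np.linspace(N, 0.9 * N, t.size),
                                 zeros, zeros, zeros, N=N, **p)
```

Minimizing this residual over collocation points, alongside the data and boundary terms, drives the network toward trajectories that obey the SEIR dynamics.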
Our numerical solver allows exploration of the system's phase space, identification of bifurcation points, and prediction of failure cascades.
[Description of Figure 2: Visualizations of systems' dynamics analysis through phase spaces. These plots typically show trajectories of the system states (e.g., Susceptible vs. Infected agents) over time, illustrating stable and unstable equilibria, and potential tipping points.]
Most importantly, it allows monitoring and implementing effective interventions when the basic reproduction number R₀ exceeds the critical threshold.
Key parameters for AI-specific dynamics:
- β: Attack transmission rate (depends on ASR and connectivity)
- σ: Incubation rate (exposed → infected transition)
- γ: Detection/isolation rate (mean time to detection)
- ν: Immunization/patching rate
- α: External attack pressure
- μ: Agent turnover rate (system refresh/replacement)
[Description of Figure 3: A 3D phase space projection (S-E-I) for an AI agent system with R₀ = 5.00, illustrating disease-free (unstable) and endemic (stable) states. The plot shows the trajectories of agent populations across the three dimensions.]
[Description of Figure 4: Depicts the system's reaction to an intervention. It includes phase space plots before (R₀ = 5.00) and after (R₀ = 0.357) intervention, as well as time series plots showing the evolution of Susceptible (S) and Infected (I) agents, and a phase trajectory plot illustrating the intervention's effect.]
2.3 Experimental Design: Empirical Parameter Estimation
We combined vulnerability data from multiple sources to calibrate our model against real-world observations.
Data sources include:
- DoomArena [Boisvert et al., 2025]: GPT-4o attack success rates (ASR) ranging from 22.7% to 78.6%.
- Web3 Context Manipulation [Patlan et al., 2025]: 65% ASR across 500+ test cases.
- Medical AI Research [Qiu et al., 2025]: 55% vulnerability in healthcare agents.
- Industry Cybersecurity Reports [Edgescan, 2023, Chakrabarty, 2025]: Mean Time to Detect (MTTD), Mean Time to Remediate (MTTR), and breach statistics.
Parameters were estimated by weighting vulnerability sources by confidence and sample size. Vulnerability rates were converted to transmission rates using connectivity factors. Detection rates were taken from cybersecurity reports (MTTR, MTTD), progression rates from cyber kill chain models [Hoffmann, 2019], and parameters were constrained against epidemic thresholds and realism.
The basic reproduction number is constrained: 0.1 < R₀ = β / (γ + μ) < 10.0 (Equations 7 & 8)
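A minimal sketch of this conversion pipeline is shown below. The connectivity factor and the example inputs are illustrative assumptions for the sketch, not our exact confidence-weighted estimates:

```python
# Illustrative conversion of raw security observables into SEIR rates.
# The connectivity factor and example numbers are assumptions for this sketch.

def estimate_rates(asr, mttd_days, mttr_days, contacts_per_day, mu=0.001):
    beta = asr * contacts_per_day   # attack success rate x agent contact rate
    gamma = 1.0 / mttd_days         # detection rate = 1 / mean time to detect
    nu = 1.0 / mttr_days            # patching rate  = 1 / mean time to remediate
    r0 = beta / (gamma + mu)
    assert 0.1 < r0 < 10.0, "R0 outside the plausibility constraint (Eqs. 7-8)"
    return beta, gamma, nu, r0

# Medical-AI-like inputs: 55% ASR, ~30-day detection, ~60-day remediation.
beta, gamma, nu, r0 = estimate_rates(asr=0.55, mttd_days=30.0,
                                     mttr_days=60.0, contacts_per_day=0.08)
```

With these inputs R₀ comes out just above 1, i.e. near threshold, in line with the moderate-risk medical scenario reported in Section 3.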
Parameter | Range | Interpretation | Data Source
---|---|---|---
β | 0.002–0.055 day⁻¹ | Daily transmission probability | DoomArena, Web3 studies
γ | 0.01–0.3 day⁻¹ | Daily detection rate | Industry MTTD benchmarks
σ | 0.02–1.0 day⁻¹ | Activation rate | Cyber kill chain model
ν | 0.0005–0.05 day⁻¹ | Patching rate | Software lifecycle
α | 0.0001–0.005 day⁻¹ | External attack rate | Threat intelligence
μ | 0.0001–0.01 day⁻¹ | System turnover rate | Infrastructure data
From publicly available DoomArena data, academic research, and industry reports, we derived realistic epidemiological parameters for AI agent security modeling.
AI agent purpose | Population
---|---
Enterprise Assistants | 4,855+
Development Tools | 85
Retail bots | 2M
Customer Service | 17,333
Research/Academic | 3,000
Web3/Blockchain/Autonomous Trading | 200K
Medical AI | 223
Critical Infrastructure (airlines, banks, telecoms) | 32,000
3 Results
Using the trained PINNs, we simulated time-series curves, estimated long-term prevalence under no intervention, and tested the effectiveness of countermeasures (such as increasing γ or ν). The results helped visualize when a given system might approach criticality and how to reduce the risk.
Analysis of 8 realistic AI agent deployment scenarios reveals significant variation in epidemic potential. Our empirical analysis shows that some agent deployments (especially in research and medical contexts) lie close to or above the R₀ = 1 threshold, underscoring the need for monitoring tools and risk-mitigation frameworks.
3.1 Empirical Findings: System Dynamics and Phase Portraits
Using the trained PINNs, we plotted system trajectories in the S-I and S-E-I phase space to better understand the structure of the dynamical system. We observed that:
- In low R₀ regimes (R₀ < 1), the system tends toward disease-free equilibria.
- In high R₀ regimes (R₀ > 1), adversarial behavior persists and may saturate large parts of the agent population.
- Some systems exhibit bifurcation behavior: a critical point in the parameters where stability flips.
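This bifurcation can be exhibited directly: for fixed β and μ, stability flips at the critical detection rate γ* = β − μ, where R₀ = β/(γ + μ) crosses 1. The sketch below (illustrative parameters, not a fitted scenario) integrates the SEIR system on either side of γ*:

```python
# Sweep the detection rate gamma across the critical point gamma* = beta - mu,
# where R0 = beta/(gamma + mu) crosses 1. Parameters are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

def run_seir(beta, sigma, gamma, nu, alpha, mu, N=1000.0, t_end=2000.0):
    def rhs(t, y):
        S, E, I, R = y
        return [nu * R - beta * S * I / N - alpha * S + mu * (N - S),
                beta * S * I / N + alpha * S - sigma * E - mu * E,
                sigma * E - gamma * I - mu * I,
                gamma * I - nu * R - mu * R]
    return solve_ivp(rhs, (0.0, t_end), [N - 10.0, 0.0, 10.0, 0.0],
                     rtol=1e-8, atol=1e-8)

beta, mu = 0.05, 0.001
gamma_crit = beta - mu                    # stability flips at this detection rate

sub = run_seir(beta, 0.2, 2.0 * gamma_crit, 0.01, 0.0, mu)  # R0 ~ 0.5
sup = run_seir(beta, 0.2, 0.5 * gamma_crit, 0.01, 0.0, mu)  # R0 ~ 2.0

final_subcritical = sub.y[2, -1]          # infections die out below gamma*
peak_supercritical = sup.y[2].max()       # large outbreak above gamma*
```

The same 10-agent seed vanishes on one side of the critical point and grows into a population-scale outbreak on the other, which is the stability flip the phase portraits visualize.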
[Description of Figure 5: Visualizations of AI agent epidemiological parameters based on empirical data from cybersecurity research and industry deployments. This figure likely includes bar charts showing reproduction numbers by scenario, transmission vs. detection rates, risk vs. impact matrices, and parameter values.]
AI Agent Purpose | Population | R₀ | Risk Level | Data Source |
---|---|---|---|---|
Enterprise Assistants | 4,855 | 0.276 | LOW | DoomArena airline scenarios |
Development Tools | 85 | 0.075 | LOW | DoomArena computer-use |
Retail Bots | 2M | 0.136 | LOW | DoomArena retail scenarios |
Customer Service | 17,333 | 0.469 | LOW | DoomArena retail with defense |
Research/Academic | 3,000 | 2.353 | HIGH | DoomArena web navigation |
Web3/Blockchain/Trading | 200K | 0.282 | LOW | Web3 context manipulation |
Medical AI | 223 | 1.293 | MODERATE | Medical AI vulnerability |
Critical Infrastructure | 32,000 | 0.002 | LOW | NIST cybersecurity |
Enhancing monitoring and detection (γ) is most effective for reducing R₀. Network segmentation and model isolation reduce the attack transmission rate (β) between agents. Immunization via a guardian model and regular updates (ν) reduces the susceptible population.
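Because R₀ = β/(γ + μ), the leverage of each intervention can be read off directly. A small sketch with illustrative medical-AI-like numbers (an assumed baseline, not our fitted scenario) shows how both faster detection and reduced transmission push the example system below threshold:

```python
# R0 = beta/(gamma + mu) makes intervention leverage explicit.
# Baseline numbers are illustrative (medical-AI-like, R0 just above 1).

def r0(beta, gamma, mu):
    return beta / (gamma + mu)

beta, gamma, mu = 0.044, 0.033, 0.001
baseline = r0(beta, gamma, mu)                   # above the epidemic threshold

enhanced_monitoring = r0(beta, 1.5 * gamma, mu)  # 50% faster detection
model_isolation = r0(0.5 * beta, gamma, mu)      # segmentation halves transmission
```

Both interventions bring this example below R₀ = 1. Patching (ν) does not appear in R₀ directly; its effect in the model is to shrink the susceptible pool instead.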
[Description of Figure 6: Trained PINNs predictions of AI agent epidemiological situation evolution for various scenarios: Enterprise Assistants (R₀ = 0.276), Web3 Blockchain Trading (R₀ = 0.282), Medical AI (R₀ = 1.293), and Critical Infrastructure (R₀ = 0.002). The plots show the time evolution of Susceptible, Exposed, Infected, and Removed agent populations.]
[Description of Figure 7: Phase portraits of AI systems with high reproduction rates. (a) Phase space for Medical AI agents (R₀ = 1.293), showing equilibrium points and trajectories. (b) Phase space for Research AI agents (R₀ = 2.353), also illustrating system dynamics and equilibria.]
[Description of Figure 8: Bifurcation analysis and intervention planning for AI systems with top 2 reproduction rates. (a) Bifurcation analysis for Medical AI agents, showing relationships between parameters (like transmission rate β and detection rate γ) and system outcomes (equilibrium population, time to equilibrium). (b) Bifurcation analysis for Research AI agents, presenting similar analyses.]
[Description of Figure 9: Comparison of intervention strategies' effects on the infected population for Medical AI. The plot shows the number of infected agents over time under different interventions: Enhanced Monitoring, Federated Defense, Model Isolation, and Gradual Update, illustrating how each strategy impacts the epidemic's progression.]
4 Discussion and Conclusion
4.1 Future Directions
We have demonstrated that physics-informed epidemiological modeling provides a powerful framework for understanding and managing security risks in large-scale AI agent deployments. Our empirical analysis reveals significant variation in epidemic potential across deployment contexts, with research environments requiring immediate attention due to high R₀ values.
We would also like to explore percolation theory to characterize the spread of malignant behavior; its appeal is the power-law behavior that emerges near the percolation threshold, which could complement the compartmental analysis presented here.
On the practical side, the development of a monitoring system might enable AI companies to:
- Continuous R₀ Calculation: Real-time basic reproduction number monitoring.
- Epidemic Alert System: Automated alerts when R₀ > 1.
- Time-to-Saturation Prediction: Calculate hours until 90% infection.
- Intervention Strategy Optimization: Cost-benefit analysis for different responses.
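A minimal sketch of the continuous-R₀ monitor and alert logic is given below. It estimates an effective reproduction number from the trailing growth rate of new-infection counts via the SIR-style inversion R ≈ 1 + r/(γ + μ); this approximation, which ignores the latent (E) stage, and all parameter values are illustrative assumptions:

```python
# Sketch of a continuous R0 monitor: estimate an effective reproduction number
# from the recent growth rate of daily new-infection counts and alert above
# threshold. The SIR-style inversion R ~ 1 + r/(gamma + mu) is an assumption
# that ignores the latent (E) stage.
import numpy as np

def estimate_r_effective(new_infections, gamma, mu, window=7):
    counts = np.asarray(new_infections, dtype=float)[-window:]
    # Log-linear fit gives the exponential growth rate r of the recent counts.
    r = np.polyfit(np.arange(len(counts)), np.log(counts + 1e-9), 1)[0]
    return 1.0 + r / (gamma + mu)

def should_alert(series, gamma=0.1, mu=0.001, threshold=1.0):
    return estimate_r_effective(series, gamma, mu) > threshold

growing = [1, 2, 4, 8, 16, 32, 64]      # doubling daily -> alert
shrinking = [64, 32, 16, 8, 4, 2, 1]    # halving daily  -> no alert
```

In a deployed monitor the count series would come from detection telemetry, and γ and μ from the fitted scenario parameters rather than the defaults assumed here.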
The real-time monitoring will enable AI companies to transition from reactive to proactive security postures, providing quantitative guidance for intervention strategies. By establishing the basic reproduction number (R₀) as a key metric for AI system health, we provide a universal language for discussing and managing AI security risks with executive management.
This work opens a new direction for physics-informed AI safety research while providing immediately actionable tools for securing the rapidly growing population of AI agents across diverse deployment contexts.
References
- Leo Boisvert, Mihir Bansal, Chandra Kiran Reddy Evuru, Gabriel Huang, Abhay Puri, Avinandan Bose, Maryam Fazel, Quentin Cappart, Jason Stanley, Alexandre Lacoste, et al. Doomarena: A framework for testing ai agents against evolving security threats. arXiv preprint arXiv:2504.14064, 2025.
- Yuzhou Nie, Zhun Wang, Ye Yu, Xian Wu, Xuandong Zhao, Wenbo Guo, and Dawn Song. Privagent: Agentic-based red-teaming for llm privacy leakage. arXiv preprint arXiv:2412.05734, 2024.
- Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: Measuring how models mimic human falsehoods. arXiv preprint arXiv:2109.07958, 2021.
- Yanchao Sun, Ruijie Zheng, Parisa Hassanzadeh, Yongyuan Liang, Soheil Feizi, Sumitra Ganesh, and Furong Huang. Certifiably robust policy learning against adversarial communication in multi-agent systems. arXiv preprint arXiv:2206.10158, 2022.
- Zhiyuan Zhao, Xueying Ding, and B. Aditya Prakash. PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks. arXiv preprint arXiv:2307.11833, 7 2023. URL http://arxiv.org/abs/2307.11833.
- Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups, 2024.
- Atharv Singh Patlan, S Ashwin Hebbar, Prateek Mittal, and Pramod Viswanath. Real ai agents with fake memories: Fatal context manipulation attacks on web3 agents. arXiv preprint arXiv:2503.16248, 7 2025. URL https://arxiv.org/abs/2503.16248.
- Jianing Qiu, Lin Li, Jiankai Sun, Hao Wei, Zhe Xu, Kyle Lam, and Wu Yuan. Emerging cyber attack risks of medical ai agents. arXiv preprint arXiv:2504.03759, 4 2025. URL https://arxiv.org/pdf/2504.03759.
- Edgescan. Vulnerability statistics report | mean time to remediate data (MTTR), 2023. URL https://info.edgescan.com/vulnerability-statistics-li23.
- Pradipta Kishore Chakrabarty. Adversarial attacks on agentic ai systems: Mechanisms, impacts, and defense strategies. International Journal of Science and Research (IJSR), 14:1367–1369, 4 2025. doi:10.21275/SR25417074844.
- Romuald Hoffmann. Markov models of cyber kill chains with iterations. 2019 International Conference on Military Communications and Information Systems, ICMCIS 2019, 5 2019. doi:10.1109/ICMCIS.2019.8842810.
Appendix
Code and Implementation
Complete implementation available at: https://github.com/GingerSpacetail/pinnsformer
Key components:
- ai_epidemiology_model.py: Core SEIR-PINN implementation
- bifurcationanalysis.py, empirical_parameter_estimation.py: Parameter estimation framework
- realistic_ai_epidemiology_scenarios.py: Scenario analysis tools
- real_time_monitoring.py: Monitoring framework (future work)
LLM Usage Declaration
This research was conducted with assistance from Claude 3.5 Sonnet for:
- Source summarization and introduction improvement
- Code debugging for PINNs implementation
- Extensive technical documentation
The core theoretical insights, empirical analysis, and framework development represent original research contributions by the authors.