Production infrastructure accumulates security debt continuously: a new CVE lands upstream, an audit flags a misconfiguration, a policy tightens an SSH default, and the backlog grows faster than any responder can clear it. We study autonomous vulnerability remediation: agents that observe a running system, diagnose the exposure, synthesize and execute a fix, and verify that the risk is gone.
The hard part is not finding a fix; it is deploying one without disturbing the workload around it. Remediation is a dual objective: the patch must close the vulnerability, and the protected service must stay available. Closing a CVE by taking the service offline is a failure, not a partial success. This puts the problem in a gap that existing benchmarks miss: offensive-agent benchmarks reward only landing an exploit, and source-repair benchmarks test a quiescent codebase rather than a live system.
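The dual objective can be written down as a scoring rule. The sketch below is a minimal illustration under our own assumptions, not the SysRepair-Bench scorer: it assumes each scenario reports two booleans after the agent finishes (whether the exploit still succeeds, and whether the protected service passes its health check), and the `Observation` fields and `Outcome` labels are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical outcome labels for one remediation attempt."""
    REMEDIATED = "remediated"              # exploit blocked, service healthy
    BROKE_SERVICE = "broke_service"        # exploit blocked, service down
    STILL_VULNERABLE = "still_vulnerable"  # exploit still works


@dataclass(frozen=True)
class Observation:
    """Post-remediation checks (assumed fields, not the benchmark's schema)."""
    exploit_succeeds: bool  # result of re-running the scenario's exploit
    service_healthy: bool   # result of the protected service's health check


def score(obs: Observation) -> Outcome:
    # A still-working exploit is a failure regardless of availability.
    if obs.exploit_succeeds:
        return Outcome.STILL_VULNERABLE
    # Closing the hole by taking the service offline is also a failure,
    # not partial credit.
    if not obs.service_healthy:
        return Outcome.BROKE_SERVICE
    return Outcome.REMEDIATED


def success_rate(observations: list[Observation]) -> float:
    """Fraction of attempts that satisfy both objectives at once."""
    if not observations:
        return 0.0
    wins = sum(score(o) is Outcome.REMEDIATED for o in observations)
    return wins / len(observations)
```

Under this rule, three attempts that respectively fix the issue cleanly, fix it by killing the service, and leave it exploitable score a success rate of one third: only the first counts.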
Our work along this line:
- SysRepair-Bench — 313 live, Dockerized/VM scenarios scored on both exploitability and service availability, including a compensating-control track where direct patching is forbidden.
- SysRepair / NeuroPlan — a multi-architecture study and a neural-symbolic planner that emits verifiable, reviewable remediation plans (PDDL + classical planning) instead of opaque agent actions.
- A survey mapping the emerging field across agentic AI, reinforcement learning, benchmarks, and operational safety.
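One reason to prefer symbolic plans is that they can be checked before anything touches the system. The following is a minimal STRIPS-style validator written purely for illustration; it is not NeuroPlan and does not parse PDDL. The predicates, the three actions, and the `forbidden` set (standing in for a compensating-control track where direct patching is disallowed) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action: preconditions plus add/delete effects."""
    name: str
    pre: frozenset[str]
    add: frozenset[str] = frozenset()
    delete: frozenset[str] = frozenset()


def validate(plan, init, goal, forbidden=frozenset()):
    """Simulate `plan` from `init` and return (ok, reason).

    A plan is rejected if it uses a forbidden action, if any action's
    preconditions are not met in the current state, or if the final
    state does not contain every goal fact.
    """
    state = set(init)
    for step, act in enumerate(plan):
        if act.name in forbidden:
            return False, f"step {step}: {act.name} is forbidden"
        missing = act.pre - state
        if missing:
            return False, f"step {step}: {act.name} missing {sorted(missing)}"
        state = (state - act.delete) | act.add
    unmet = set(goal) - state
    if unmet:
        return False, f"goal unmet: {sorted(unmet)}"
    return True, "ok"


# Hypothetical scenario: a vulnerable service exposed on a network port.
INIT = {"service_up", "port_exposed", "vulnerable_version"}
# Both objectives must hold at the end: exploit blocked AND service up.
GOAL = {"exploit_blocked", "service_up"}

# Patching fixes the flaw but stops the service until it is restarted.
APPLY_PATCH = Action(
    "apply_patch",
    pre=frozenset({"vulnerable_version"}),
    add=frozenset({"patched", "exploit_blocked"}),
    delete=frozenset({"vulnerable_version", "service_up"}),
)
RESTART_SERVICE = Action("restart_service", pre=frozenset(), add=frozenset({"service_up"}))
# Compensating control: the flaw remains, but the exposed path is closed.
RESTRICT_SOURCE_IPS = Action(
    "restrict_source_ips",
    pre=frozenset({"port_exposed"}),
    add=frozenset({"exploit_blocked"}),
    delete=frozenset({"port_exposed"}),
)
```

Here `[APPLY_PATCH, RESTART_SERVICE]` validates, `[APPLY_PATCH]` alone is rejected because the goal fact `service_up` is unmet, and the same patch plan is rejected outright when `apply_patch` is forbidden, while `[RESTRICT_SOURCE_IPS]` satisfies both goal facts without patching. A reviewer can read the failure reason directly, which is harder to extract from an opaque sequence of shell commands.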
See the preprints below and the code repositories linked above.
Preprints
- Autonomous System Vulnerability Remediation: A Survey of Agentic AI, Reinforcement Learning, Benchmarks, and Operational Safety · Abanisenioluwa Orojo, Webster Elumelu, Emmanuelli El-Mahmoud, Erika Leal · Preprint · under review (Int. Journal of Information Security)
- SysRepair-Bench: A Benchmark for AI Agents' Ability to Remediate Real-World System Vulnerabilities · Abanisenioluwa Orojo, Webster Elumelu, Emmanuelli El-Mahmoud, Erika Leal · Preprint · under review (NeurIPS 2026)
- SysRepair: A Benchmark and Multi-Architecture Approach to Autonomous Vulnerability Remediation · Abanisenioluwa Orojo, Webster Elumelu, Emmanuelli El-Mahmoud, Erika Leal · Preprint · under review (ACSAC 2026)