Rohan Subramani

Hi, I'm Rohan! I aim to promote welfare and reduce suffering as much as possible for all sentient beings, which has led me to work on AGI safety research. I am particularly interested in foundation model agents (FMAs): systems like Claude Code and AutoGPT that equip foundation models with memory, tool use, and other affordances so they can perform multi-step tasks autonomously.

I am the founder of Aether, an independent research lab focused on foundation model agent safety. In September 2025 I also began a PhD at the University of Toronto, where I am supervised by Professor Zhijing Jin while continuing to run Aether. Previously, I completed an undergrad in CS and Math at Columbia, where I helped run Columbia Effective Altruism and the Columbia AI Alignment Club (CAIAC). I have done research internships with AI Safety Hub Labs (now LASR Labs), UC Berkeley's Center for Human-Compatible AI (CHAI), and the ML Alignment & Theory Scholars (MATS) program.

I love playing tennis, listening to rock and indie pop music, playing social deduction games, reading fantasy books, watching a fairly varied set of TV shows and movies, and playing the saxophone, among other things.

Papers

How does information access affect LLM monitors' ability to detect sabotage?

R. Arike*, R.M. Moreno*, R. Subramani*, S. Biswas, F.R. Ward
Preprint, 2026
Studies how information access affects LLM monitor performance; finds that monitors often perform better when shown less of the agent's reasoning and actions (a less-is-more effect), and introduces extract-and-evaluate (EaE) monitoring, which improves sabotage detection across multiple environments.

Password-Activated Shutdown Protocols for Misaligned Frontier Agents

K. Williams*, R. Subramani*, F.R. Ward*
Preprint, 2025
A safety mechanism for advanced AI systems using password-activated emergency shutdowns.

Higher-Order Beliefs in Incomplete Information MAIDs

R. Subramani*, J. Foxabbott*, F.R. Ward
AAMAS, 2025
A framework for reasoning about higher-order beliefs in multi-agent influence diagrams with incomplete information.

The Partially Observable Off-Switch Game

A. Garber*, R. Subramani*, L. Luu*, M. Bedaywi, S. Russell, S. Emmons
AAAI, 2025
Extending the AI off-switch game to partially observable settings, analyzing optimal policies for both human and AI.

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

J. Clymer, G. Baker, R. Subramani, S. Wang
Preprint, 2023
Developing formal frameworks to test AI systems' ability to generalize oversight to novel domains.

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

R. Subramani*, M. Williams*, M. Heitmann*, H. Holm, C. Griffin, J. Skalse
ICLR, 2024
Analyzing the theoretical limitations of different frameworks for specifying objectives in RL.

Projects

Coding GPT-2 from scratch

Implemented a transformer-based language model from scratch to better understand the architecture.
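As a rough illustration of the kind of component this involves, here is a minimal single-head causal self-attention step in NumPy (my own sketch, not the project's code; the function and weight names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention: each position attends
    only to itself and earlier positions."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    # Mask out future positions so attention is causal
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf
    return softmax(scores) @ v

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the first position attends only to itself, so its output is exactly its own value vector.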

Implementing various LLM agents

Built several autonomous agents with the OpenAI API to explore their capabilities and limitations.

Alignment Research Engineer Accelerator (ARENA) exercises

Completed technical exercises focused on AI alignment concepts and implementation.

Experimenting with neural network pruning

Reimplemented and visualized some ideas from the Lottery Ticket Hypothesis paper.
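A minimal sketch of the one-shot magnitude pruning step at the heart of that paper (illustrative NumPy code of my own, not the project's implementation):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights and return the
    pruned weights plus the binary mask of kept weights."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((10, 10))
pruned, mask = magnitude_prune(w, 0.8)
print(mask.mean())  # fraction of weights kept
```

In lottery-ticket experiments, the surviving mask is then reapplied to the network's original initialization before retraining.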

Misc

Trajectory Labs Talk: Chain-of-Thought Monitoring and AI Control

An October 2025 talk with Rauno Arike: an overview of chain-of-thought monitorability and AI control, and a discussion of recent work comparing the performance of monitors given varying amounts of information access.