Rohan Subramani

Hi, I'm Rohan! I aim to promote welfare and reduce suffering as much as possible for all sentient beings, which has led me to work on AGI safety research. I am particularly interested in foundation model agents (FMAs): systems like AutoGPT and Operator that equip foundation models with memory, tool use, and other affordances so they can perform multi-step tasks autonomously.

I am the founder of Aether, an independent research lab focused on foundation model agent safety. I'm also an incoming PhD student at the University of Toronto, where I will be supervised by Professor Zhijing Jin and continue to run Aether. Previously, I completed an undergrad in CS and Math at Columbia, where I helped run Columbia Effective Altruism and Columbia AI Alignment Club (CAIAC). I have done research internships with AI Safety Hub Labs (now LASR Labs), UC Berkeley's Center for Human-Compatible AI (CHAI), and the ML Alignment & Theory Scholars (MATS) program.

I love playing tennis, listening to rock and indie pop music, playing social deduction games, reading fantasy books, watching a fairly varied set of TV shows and movies, and playing the saxophone, among other things.


Papers

Higher-Order Beliefs in Incomplete Information MAIDs

R. Subramani*, J. Foxabbott*, F.R. Ward
AAMAS, 2025
A framework for reasoning about higher-order beliefs in multi-agent influence diagrams with incomplete information.

The Partially Observable Off-Switch Game

A. Garber*, R. Subramani*, L. Luu*, M. Bedaywi, S. Russell, S. Emmons
AAAI, 2025
Extending the AI off-switch game to partially observable settings, analyzing optimal policies for both human and AI.

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains

J. Clymer, G. Baker, R. Subramani, S. Wang
Preprint
A testbed for evaluating whether AI oversight techniques generalize from easy-to-measure to hard-to-measure domains.

On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning

R. Subramani*, M. Williams*, M. Heitmann*, H. Holm, C. Griffin, J. Skalse
ICLR, 2024
Analyzing the theoretical limitations of different frameworks for specifying objectives in RL.

Projects

Coding GPT-2 from scratch

Implemented a transformer-based language model from scratch to better understand the architecture.
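The core of a GPT-2-style implementation is causal self-attention. As a minimal sketch (not the actual project code, and single-head only for brevity), the attention step in NumPy looks roughly like this; the weight matrices `Wq`, `Wk`, `Wv` are placeholders for learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention.

    x: (seq_len, d_model) token representations.
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ v
```

Because of the causal mask, the first token can only attend to itself, so its output equals its own value vector.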

Implementing basic (and not-so-basic) LLM agents

Built various autonomous agents powered by language models to explore their capabilities and limitations.
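A basic LLM agent is an observe-act loop: the model proposes an action, a tool executes it, and the observation is appended to the context. The sketch below is a generic illustration of that loop, not the project's code; the `model` callable and the `"tool: argument"` action format are assumptions for the example:

```python
def run_agent(model, tools, task, max_steps=5):
    """Minimal agent loop.

    model: callable mapping the prompt string to an action string,
           e.g. "calc: 2+3" to invoke a tool or "final: 5" to finish.
    tools: dict mapping tool names to callables on a string argument.
    """
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model("\n".join(history))
        if action.startswith("final:"):
            return action[len("final:"):].strip()
        name, _, arg = action.partition(":")
        tool = tools.get(name.strip())
        result = tool(arg.strip()) if tool else f"unknown tool: {name.strip()}"
        history.append(f"Action: {action}")
        history.append(f"Observation: {result}")
    return None  # step budget exhausted
```

Swapping `model` for a real API call and expanding `tools` (search, code execution, file access) recovers the usual foundation-model-agent pattern.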

Alignment Research Engineer Accelerator (ARENA) exercises

Completed technical exercises focused on AI alignment concepts and implementation.

Experimenting with neural network pruning

Investigated different approaches to reducing neural network size while maintaining performance.
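One of the simplest pruning baselines is global magnitude pruning: zero out the fraction of weights with the smallest absolute values. As a hedged sketch of that baseline (not necessarily the approach used in the project):

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    w: weight array; sparsity: fraction in [0, 1) to prune.
    Returns a pruned copy (ties at the threshold may prune slightly more).
    """
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return w * (np.abs(w) > thresh)
```

Structured variants instead remove whole neurons or channels, trading some accuracy for hardware-friendly speedups.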