
Professor of Computer Science
I develop intelligent autonomous agents that interact with and adapt to complex, real-world environments. My research focuses on creating new paradigms, benchmarks, and frameworks to advance the capabilities and applications of machine learning and AI. Some recent highlights include:
- GPT: demonstrated an autoregressive transformer for language modeling and introduced the idea of solving NLP tasks through token prediction.
- ReAct and Tree of Thoughts (ToT): combined reasoning and acting with language models into one paradigm and helped kickstart LM-based AI agents.
- SWE-bench, SWE-agent (++): introduced a comprehensive benchmark for software engineering AI agents (much more than just writing code) and helped turbocharge progress in AI coding agents.
- WebShop: introduced the idea of web-based AI agents that can perform tasks on realistic websites (e.g., Amazon/eBay shopping).
- GEO: introduced the paradigm of content optimization in the age of generative engines like ChatGPT.
- TAU-bench: introduced a dynamic dual-control environment for testing AI agents at user-facing tasks like customer support.
I received my PhD from MIT, advised by Prof. Regina Barzilay. I have also spent time as a research scientist at OpenAI (2017-18) and as head of research at Sierra (2023-25).