Yan Zhou (Terry)

Hello! I am a Master’s student in Data Science at Harvard University. I am currently part of the CRISP group, where I am working on my master’s thesis under the supervision of Prof. Demba Ba.

I am primarily interested in mechanistic interpretability. In particular, my research focuses on understanding models’ internal computations by analyzing weight-space structures.

I have also been working on knowledge distillation with Prof. David Alvarez-Melis in the ML Foundations group, and on AI safety and interpretability in the AI4LIFE group with Prof. Hima Lakkaraju. Previously, I completed my Bachelor’s degree in Politics and Data Science at the London School of Economics.

I am always happy to discuss research ideas! Please feel free to reach out to me at terryzhou [at] fas [dot] harvard [dot] edu.

News

Jun 27, 2026	I’ll be presenting our work User Persona Subspaces Modulate Refusal Behavior in Language Models at the Mech Interp Workshop at ICML 2026!

Selected Publications

ICML — Workshop

User Persona Subspaces Modulate Refusal Behavior in Language Models

Yan Zhou, Shichang Zhang, Zidi Xiong, and Himabindu Lakkaraju

ICML Workshop on Mechanistic Interpretability, 2026


@inproceedings{zhou2026user,
  title = {User Persona Subspaces Modulate Refusal Behavior in Language Models},
  author = {Zhou, Yan and Zhang, Shichang and Xiong, Zidi and Lakkaraju, Himabindu},
  booktitle = {ICML Workshop on Mechanistic Interpretability},
  year = {2026},
}