Publications

Publications in reverse chronological order. For a complete list, see Google Scholar.

2026

  1. ICML — Workshop
    User Persona Subspaces Modulate Refusal Behavior in Language Models
    Yan Zhou, Shichang Zhang, Zidi Xiong, and Himabindu Lakkaraju
    ICML Workshop on Mechanistic Interpretability, 2026