Analyzing (In)Abilities of SAEs via Formal Languages

Published in MINT@NeurIPS, 2024

Authors: Abhinav Menon, Manish Shrivastava, Ekdeep Singh Lubana, David Krueger