These two sentences:
If linear directions are interpretable, it’s natural to think there’s some “basic set” of meaningful directions which more complex directions can be created from. We call these directions features.
are from the paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning.
My question is: Are ‘these directions’ the “basic set” or the ‘more complex directions’?
#question #llm #transformer #sparse #autoencoder #feature