CVPR 2026

From Weights to Concepts
Data-Free Interpretability of CLIP
via Singular Vector Decomposition

SITH — Semantic Inspection of Transformer Heads

Francesco Gentile 1* · Nicola Dall’Asen 1,2 · Francesco Tonini 1,3 · Massimiliano Mancini 1 · Lorenzo Vaquero 3 · Elisa Ricci 1,3
1 University of Trento · 2 University of Pisa · 3 Fondazione Bruno Kessler
Abstract

As vision-language models are deployed at scale, understanding their internal mechanisms becomes increasingly critical. Existing interpretability methods predominantly rely on activations, making them dataset-dependent, vulnerable to data bias, and often restricted to coarse head-level explanations. We introduce SITH (Semantic Inspection of Transformer Heads), a fully data-free, training-free framework that directly analyzes CLIP’s vision transformer in weight space. For each attention head, we decompose its value-output matrix into singular vectors and interpret each one via COMP (Coherent Orthogonal Matching Pursuit), a new algorithm that explains it as a sparse, semantically coherent combination of human-interpretable concepts. We show that SITH yields monosemantic, faithful intra-head explanations, validated through reconstruction fidelity and interpretability experiments. This enables precise, interpretable weight-space model edits that amplify or suppress specific concepts, improving downstream performance without retraining. Furthermore, SITH reveals how fine-tuning primarily reweights a stable semantic basis rather than learning entirely new features.

Method

How SITH Works

1
Decompose WVO via SVD

For each attention head, SITH isolates its value-output weight matrix WVO and factorizes it into r singular vectors { vj } via SVD. These vectors capture the head’s dominant computational directions — directly from weights, requiring no images, activations, or labels.
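The per-head factorization can be sketched as follows. The dimensions and the per-head slicing of the value/output projections follow the standard ViT convention; the weights here are random stand-ins for a real checkpoint, whose state-dict keys are model-specific:

```python
import numpy as np

# Toy stand-in for one CLIP ViT layer's projection weights.
rng = np.random.default_rng(0)
d_model, n_heads = 768, 12
d_head = d_model // n_heads
w_v = rng.standard_normal((d_model, d_model))   # value projection
w_o = rng.standard_normal((d_model, d_model))   # output projection

def head_singular_vectors(h):
    """SVD of one head's value-output product W_VO."""
    w_v_h = w_v[h * d_head:(h + 1) * d_head, :]   # (d_head, d_model)
    w_o_h = w_o[:, h * d_head:(h + 1) * d_head]   # (d_model, d_head)
    w_vo = w_o_h @ w_v_h                          # (d_model, d_model), rank <= d_head
    u, s, vT = np.linalg.svd(w_vo)
    # Only the first d_head singular triplets are non-trivial.
    return u[:, :d_head], s[:d_head], vT[:d_head]

u, s, vT = head_singular_vectors(0)
```

Because W_VO is the product of a d_model×d_head and a d_head×d_model matrix, each head contributes at most d_head meaningful singular directions.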

2
Explain each vector with COMP

Each singular vector is decomposed by COMP (Coherent Orthogonal Matching Pursuit) into a sparse, non-negative combination of concept embeddings drawn from a textual dictionary Γ (e.g., ConceptNet). Unlike standard matching pursuit, COMP adds a coherence term that favours concepts semantically consistent with those already selected — producing explanations that read as meaningful groups, not unrelated word lists.
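A minimal sketch of this selection loop is below. The coherence weight `lam`, the mean-embedding coherence score, and the clipping used to keep coefficients non-negative are our simplifying assumptions; the paper's exact formulation may differ:

```python
import numpy as np

def comp(v, dictionary, k=5, lam=0.2):
    """COMP-style sparse explanation of one singular vector v.

    dictionary: (n_concepts, d) array of unit-norm concept embeddings.
    lam weights the coherence bonus (an illustrative choice).
    """
    residual = v.astype(float).copy()
    selected = []
    for _ in range(k):
        match = dictionary @ residual                # similarity to residual
        if selected:                                 # coherence with chosen concepts
            coherence = dictionary @ dictionary[selected].mean(axis=0)
        else:
            coherence = np.zeros(len(dictionary))
        score = match + lam * coherence
        score[selected] = -np.inf                    # never reselect a concept
        selected.append(int(np.argmax(score)))
        # OMP-style refit of coefficients over all chosen concepts.
        A = dictionary[selected].T                   # (d, |selected|)
        coef, *_ = np.linalg.lstsq(A, v, rcond=None)
        coef = np.clip(coef, 0.0, None)              # enforce non-negativity
        residual = v - A @ coef
    return selected, coef

# With an orthonormal toy dictionary, COMP recovers the true sparse code.
D = np.eye(8)
sel, coef = comp(0.8 * D[0] + 0.6 * D[3], D, k=2)
```

In practice the dictionary would hold text-encoder embeddings of the concepts in Γ rather than an identity matrix.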

Overview of SITH and COMP
Figure. (left) SITH isolates WVO and applies SVD, yielding singular vectors {v_j}. Given concept pool Γ, COMP explains each vector as a sparse coherent combination of concept embeddings. (right) COMP iteratively builds the concept set by selecting, at each step, the concept most similar to the current residual and semantically consistent with the concepts already chosen.
Qualitative Results

What Do Individual Heads Encode?

By considering the first few singular vectors of each value-output matrix, which represent the dominant directions of information flow through each head, we can identify the head's primary functional role. Exploring these singular vectors reveals striking semantic structure: many individual heads naturally specialize in distinct, coherent themes, such as locations, colors, or materials. We observe the same phenomenon across all examined models, suggesting that this semantic specialization is a universal property of CLIP training. As shown below, the same functionally specialized heads emerge across vastly different model capacities and architectures, from ViT-B/32 to MobileCLIP.

Layer 22, Head 2 · ViT-L/14 — encodes locations.

← Left direction: house rooms
  gogglebox                  0.173
  cleaning living room       0.171
  hotel suite                0.161
  kitchen bedroom bathroom   0.147
  room of house              0.107

Right direction →: outdoor locations
  outdoor concert            0.194
  park near mall entrance    0.192
  outside budget             0.189
  having sex outdoors        0.188
  outdoor stalls             0.164

Applications

Interpretable Model Editing with SITH

Because SITH operates directly on weights, its decompositions are immediately actionable. By scaling the singular values associated with identified concept directions, we can surgically modify CLIP’s behavior — suppressing unwanted features or amplifying task-relevant ones — with no training, no labeled data, and no gradient computation.
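Such an edit amounts to rescaling entries of the singular spectrum before reconstructing the weight matrix. A minimal sketch, on a random stand-in matrix (in practice `w_vo` is a head's value-output product and `indices` come from the COMP explanations):

```python
import numpy as np

def edit_head(w_vo, indices, scale):
    """Rescale selected singular values of a head's W_VO matrix.

    scale < 1 suppresses a concept direction, scale > 1 amplifies it.
    """
    u, s, vT = np.linalg.svd(w_vo, full_matrices=False)
    s = s.copy()
    s[indices] *= scale
    return (u * s) @ vT          # reconstruct with the edited spectrum

# Example: fully suppress the dominant direction of a random stand-in head.
rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16))
w_edited = edit_head(w, indices=[0], scale=0.0)
```

The edited matrix simply replaces the original in the checkpoint; no forward passes or gradients are involved.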

1
Spurious Correlations
Background-Invariant Bird Classification

On Waterbirds, CLIP can exploit background habitat cues (water vs. land) rather than bird appearance to distinguish the two classes (waterbird vs. landbird). We therefore use SITH to identify and suppress the singular vectors encoding background information, without any additional training or labels. This simple edit improves worst-group accuracy by a wide margin (+22.7pp), outperforming TextSpan, a recent method that removes spurious features by ablating entire heads, and showing that finer-grained editing can be more effective.

+22.7pp
worst-group accuracy
47.9% → 70.6%
2
NSFW Removal
Safety-Aware Retrieval Without Retraining

CLIP can encode and retrieve unsafe visual content, which typically requires costly retraining to meet safety standards. Instead, we use SITH to identify the singular vectors associated with nudity and/or violence and suppress them, creating a safer model without any training or labeled data. On the ViSU benchmark, the edited model improves retrieval recall, in particular for unsafe visual (I*→T) queries.

+1.5pp
for I*→T queries
38.9% → 40.4%
3
Classification Performance
Task-Aware Singular Value Amplification

For a target classification task, we use SITH to measure the alignment between the semantic content of each singular vector and task-relevant concepts. We then edit the model by proportionally rescaling the singular values, amplifying relevant directions and downweighting irrelevant ones. This simple edit yields consistent gains across Flowers 102, FGVC-Aircraft, and DTD, without any training, labeled data, or gradient computation.

+1.0pp
Flowers 102
76.5% → 77.5%
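The rescaling step can be sketched as below. The alignment score (max cosine similarity between each right singular vector and the task concept embeddings), its normalization, and the strength parameter `alpha` are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def task_rescale(w_vo, task_concepts, alpha=0.5):
    """Sketch of task-aware rescaling of a head's singular values.

    task_concepts: (n, d) unit-norm text embeddings of task-relevant
    concepts; alpha controls edit strength (an illustrative parameter).
    """
    u, s, vT = np.linalg.svd(w_vo, full_matrices=False)
    align = np.abs(task_concepts @ vT.T).max(axis=0)   # (r,) alignment per direction
    align = align / align.max()                        # normalize to [0, 1]
    # Amplify well-aligned directions, downweight the rest.
    return (u * (s * (1.0 + alpha * (align - align.mean())))) @ vT

# Example on a random stand-in head, with the top right singular vector
# itself playing the role of a task concept embedding.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
_, s, vT = np.linalg.svd(w, full_matrices=False)
w_new = task_rescale(w, task_concepts=vT[:1])
```

Centering the alignment scores before scaling means directions above the mean are amplified while those below it are suppressed, keeping the overall spectrum magnitude roughly stable.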
Citation

BibTeX

@inproceedings{gentile2026sith,
	title        = {From Weights to Concepts: Data-Free Interpretability of {CLIP} via Singular Vector Decomposition},
	author       = {Gentile, Francesco and Dall'Asen, Nicola and Tonini, Francesco and Mancini, Massimiliano and Vaquero, Lorenzo and Ricci, Elisa},
	year         = 2026,
	booktitle    = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}
}