Hello,
The reformer-pytorch implementation of the Reformer model allows the full attention matrix to be reconstructed (https://github.com/lucidrains/reformer-pytorch#research): its Recorder class can expand the attention matrix back to its original form.
How can one get this full attention matrix for the Routing Transformer? The Recorder class is only compatible with the Reformer.
The full attention matrix is needed for transformer interpretability/explainability methods, such as the one described here: https://github.com/hila-chefer/Transformer-Explainability
I believe it would involve these lines: `routing-transformer/routing_transformer/routing_transformer.py`, lines 407 to 417 at commit `3f6c461`.
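For anyone attempting this, below is a minimal sketch of the scatter step such a reconstruction would need. It is not part of the routing-transformer API: it assumes you have patched the k-means attention module yourself so that it records, per head and per cluster, the cluster-local attention weights (`attn`) and the gathered token positions (`indices`); both names are hypothetical.

```python
import torch

def expand_to_full_attention(attn, indices, seq_len):
    # attn:    (b, h, w, c, c) -- attention within each of w clusters of c tokens
    # indices: (b, h, w, c)    -- original sequence position of each clustered
    #                             token (must be a LongTensor for scatter_)
    b, h, w, c, _ = attn.shape
    full = torch.zeros(b, h, seq_len, seq_len, dtype=attn.dtype, device=attn.device)
    rows = indices.unsqueeze(-1).expand(b, h, w, c, c)  # query positions i
    cols = indices.unsqueeze(-2).expand(b, h, w, c, c)  # key positions j
    flat = (rows * seq_len + cols).reshape(b, h, -1)    # linearize (i, j) pairs
    full.view(b, h, -1).scatter_(-1, flat, attn.reshape(b, h, -1))
    return full
```

Pairs the model never attends to stay zero, and if a (query, key) pair occurs in more than one cluster the last write wins, so treat the result as a sparse view of what was actually computed, not a dense softmax over the whole sequence.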
@KatarinaYuan Hi, unfortunately not; I don't think it's trivial. I decided to use the full attention matrix instead, but with more efficient implementations such as those in PyTorch 2.0 and DeepSpeed. Hope it helps!
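As a rough illustration of the workaround described in that comment: PyTorch 2.0's `torch.nn.functional.scaled_dot_product_attention` computes full attention with fused kernels (FlashAttention or memory-efficient backends where available), while the explicit formulation can be kept around for the cases where an explainability method needs the actual matrix. The shapes below are illustrative only.

```python
import torch
import torch.nn.functional as F

b, h, n, d = 2, 8, 1024, 64                      # illustrative sizes
q, k, v = (torch.randn(b, h, n, d) for _ in range(3))

# Fast path: fused full attention; the n x n matrix is never materialized.
out = F.scaled_dot_product_attention(q, k, v)

# Inspectable path: materialize the weights explicitly, e.g. when an
# explainability method needs the full attention matrix.
attn = torch.softmax((q @ k.transpose(-2, -1)) * d ** -0.5, dim=-1)
out_explicit = attn @ v

print(torch.allclose(out, out_explicit, atol=1e-4))  # True, up to fp tolerance
```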