This is an extension for stable-diffusion-webui which visualizes the embedding vector generated by CLIP.
- Enter any prompt you like.
- Click the `Run` button.
- Wait a second.
This figure shows the embedding vector for each token. Each vector has 768 (for SDv1) or 1024 (for SDv2) dimensions.
This figure shows correlations between tokens. The calculation is carried out as follows:
1. Compute an embedding vector `v` from the given prompt. `v` typically has shape (77, 768). For xattn, `v` is converted by the `to_k` linear layer.
2. For each token `t`, create a new prompt with `t` replaced by the padding token. Then compute its embedding vector `v_{t}`.
3. Let `d_{t} = v - v_{t}`.
4. Let `d_{t,n}` be the nth row vector of `d_{t}`. `d_{t,n}` is a 768 (or 1024)-dimensional vector representing `t`'s effect on the nth token. Then compute `|d_{t,n}|`, where `|x|` is the norm of a vector `x`.
5. Repeat steps 2 to 4 for all `t` in the given prompt.
By default, the padding token is `_</w>` (ID=318).
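
The procedure above can be sketched in a few lines of NumPy. This is only an illustration, not the extension's actual code: `token_effects` and `toy_encode` are hypothetical names, and `toy_encode` is a stand-in for the CLIP text encoder (a causal running mean over random per-token vectors) so that the example runs without loading a model. It also assumes the Euclidean norm, and it skips both the padding to 77 tokens and the `to_k` projection used for xattn.

```python
import numpy as np


def toy_encode(token_ids, dim=8):
    """Hypothetical stand-in for CLIP: returns an array of shape (len(token_ids), dim).

    Each row is the running mean of fixed random per-token vectors, so a token
    only influences its own position and later ones (a causal context).
    """
    table = np.random.default_rng(0).normal(size=(1000, dim))
    x = table[np.asarray(token_ids)]
    counts = np.arange(1, len(token_ids) + 1)[:, None]
    return np.cumsum(x, axis=0) / counts


def token_effects(encode, token_ids, pad_id):
    """Return a matrix E with E[t, n] = |d_{t,n}|."""
    v = encode(token_ids)                      # embedding of the full prompt
    effects = np.zeros((len(token_ids), v.shape[0]))
    for t in range(len(token_ids)):
        replaced = list(token_ids)
        replaced[t] = pad_id                   # replace token t with the padding token
        v_t = encode(replaced)                 # embedding without token t
        d_t = v - v_t                          # d_t = v - v_t
        effects[t] = np.linalg.norm(d_t, axis=1)  # row-wise norm gives |d_{t,n}| for each n
    return effects


# Arbitrary token IDs for illustration; 318 is the default padding token ID.
effects = token_effects(toy_encode, [5, 17, 42], pad_id=318)
print(effects.shape)  # (3, 3)
```

Row `t` of the result describes how strongly replacing token `t` changes the embedding at each position `n`. With a real encoder, `encode` would wrap the tokenizer and text model, and the matrix would be 77 columns wide.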