
Tokens to be integers #38

Open
Tommy-Hsu opened this issue Jun 1, 2024 · 4 comments

Comments

@Tommy-Hsu

Hello, I would like to ask about the meaning of tokens being integers. I noticed that the final forward pass of the tokenizer uses the cls_logits_softmax tensor, which is directly matrix-multiplied with the codebook. However, these operations are all in floating point. So what does it mean for the tokens to be integers in the classifier stage?
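For reference, here is a minimal sketch of the soft lookup I am describing (tensor shapes are assumptions for illustration, not the repo's exact code): the classifier logits are softmaxed into a distribution over codebook entries and matrix-multiplied with the codebook, so no integer index ever appears.

```python
import torch

# Hedged sketch of the classifier-stage lookup; the sizes
# (num_tokens, num_classes, dim) are illustrative, not taken from the repo.
num_tokens, num_classes, dim = 34, 2048, 64
cls_logits = torch.randn(1, num_tokens, num_classes)  # classification head output
codebook = torch.randn(num_classes, dim)              # learned codebook entries

cls_logits_softmax = cls_logits.softmax(dim=-1)       # float distribution over entries
soft_tokens = cls_logits_softmax @ codebook           # float mixture, no integer index
print(soft_tokens.dtype)                              # torch.float32
```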

@ndsl555

ndsl555 commented Jun 24, 2024

I'd like to know too.

@Kun-Ming

It seems the integer tokens only appear during Stage I training. I think it's the variable encoding_indices at this line: https://github.com/Gengzigang/PCT/blob/main/models/pct_tokenizer.py#L142
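For comparison, a minimal sketch of what that Stage I quantization step does (names and shapes follow the standard vector-quantization recipe and are assumptions, not the repo's exact code):

```python
import torch

# Stage I style vector quantization: snap each feature to its nearest
# codebook entry. `encoding_indices` holds integer ids; this is the only
# point in the pipeline where the tokens are literally integers.
features = torch.randn(17, 64)      # assumed per-joint encoder features
codebook = torch.randn(2048, 64)    # assumed codebook

distances = torch.cdist(features, codebook)   # (17, 2048) pairwise L2 distances
encoding_indices = distances.argmin(dim=-1)   # integer token ids, shape (17,)
quantized = codebook[encoding_indices]        # back to float vectors for the decoder
print(encoding_indices.dtype)                 # torch.int64
```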

@Tommy-Hsu
Author

> It seems the integer tokens only appear during Stage I training. I think it's the variable encoding_indices at this line: https://github.com/Gengzigang/PCT/blob/main/models/pct_tokenizer.py#L142

That's true. In Stage I, encoding_indices are integers, but they are not in Stage II.

@Tommy-Hsu
Author

[Figure 1 from the PCT paper]

Figure 1 is quite confusing to me. In the inference stage, the classification head output should be logits, and the codebook is composed of floating-point entries. However, the figure shows both as integers, which is what I find puzzling.
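One possible reading (my interpretation, not confirmed by the authors): the integers in the figure are just the discrete ids of the codebook entries, i.e. what you would get by taking the argmax of the predicted distribution, while the actual forward pass stays in floating point:

```python
import torch

# Hedged sketch: the integer ids shown in Figure 1 can be recovered as the
# argmax of the class logits, even though the computation itself uses the
# float softmax distribution rather than these ids.
cls_logits = torch.randn(1, 34, 2048)   # assumed shape of the head output
token_ids = cls_logits.argmax(dim=-1)   # integer ids like those in the figure
print(token_ids.dtype)                  # torch.int64
```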
