
Update JetStream grpc proto to support I/O with text and token ids #78

Merged: 10 commits into main, May 14, 2024

Conversation

@JoeZijunZhou (Collaborator) commented May 7, 2024

  • Updated gRPC proto to support:
    • Request: token ids or text (one of).
    • Response: token ids, text, or both.
    • When the input is token ids, return only token ids (client-side tokenization / MLPerf mode).
    • When the input is text, return both text and token ids.
  • Refactored detokenization handling and the Tokenizer API to decouple the logic.
  • Added complete support for SentencePiece tokenizer streaming decoding (ensured output text correctness).
  • Added and updated unit tests for token utils, the orchestrator, and the server.
  • Enabled client-side tokenization in the benchmark script.
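The resulting request/response contract can be sketched in Python. The message and field names below (`DecodeRequest`, `text`, `token_ids`, `response_mode`) are illustrative stand-ins, not the actual proto definitions:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mirror of the proto contract: the request carries exactly one
# of `text` or `token_ids` (proto `oneof`), and the response mode follows
# from which one was set.

@dataclass
class DecodeRequest:
    text: Optional[str] = None             # server-side tokenization
    token_ids: Optional[List[int]] = None  # client-side tokenization (MLPerf)

def response_mode(req: DecodeRequest) -> List[str]:
    """Return the fields the server populates for this request."""
    if (req.text is None) == (req.token_ids is None):
        raise ValueError("exactly one of text / token_ids must be set")
    if req.token_ids is not None:
        return ["token_ids"]          # client already tokenized: ids only
    return ["token_ids", "text"]      # server tokenizes: return both
```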

@FanhaiLu1 (Collaborator) left a comment

Any reason to add text back? In #40 I suggested we keep both str and id in the response, and the answer was "don't want to decode it to str (or piece) in jetstream". Would you like to share the reasons for making a completely different decision just two weeks later?

Below are the comments from PR #40:

@FanhaiLu1 [2 weeks ago]
Can we still keep a str as an option? Internally, keep both text and token id.

@JoeZijunZhou [2 weeks ago]
I guess we don't want to decode it to str (or piece) in jetstream, since the final result would be somewhat off.

@JoeZijunZhou (Collaborator, Author)

> Would you like to share the reasons for making a completely different decision just two weeks later?

Mainly for our customer's request: they would like server-side detokenization support, and this PR includes the complete support. The earlier "don't want to decode it to str (or piece) in jetstream" position was due to the previously incorrect streaming detokenization.

// The client can pass the input either as a string, in which case the server
// will tokenize it, or as token ids, in which case it is the client's
// responsibility to tokenize its input strings with the correct tokenizer.
oneof content {
@FanhaiLu1 (Collaborator)

It could be both: the client could have three choices, token ids, text, or both. If we only support one of them, it is quite possible we will need to refactor the code one more time.

@JoeZijunZhou (Collaborator, Author)

This is the request input; the user should only input text or tokens, not both. The response can return token ids, text, or both. The control logic will be implemented in the orchestrator.

@FanhaiLu1 (Collaborator)

> Mainly for our customer's request, they would like to have server-side detokenization support.

Thanks. Can you create an issue and add the customer's request details? It will be good context for readers.

@JoeZijunZhou (Collaborator, Author)

Addressing #79 .

@FanhaiLu1 (Collaborator)

> Mainly for our customer's request, they would like to have server-side detokenization support.

Soon, another customer could ask for both token id and text support. I feel we should anticipate customers' requirements and do it once: supporting token ids, text, and both is one solution for all customers.

@JoeZijunZhou (Collaborator, Author) commented May 8, 2024

> Supporting token ids, text, and both is one solution for all customers.

This already supports token ids, text, and both in the response. A user will not input both text and tokens in the same request, right? Why would a user send both text and tokens to the JetStream API?

I was thinking of simplifying it to 2 modes:

  1. For MLPerf, return token ids only.
  2. For other cases, always return text + token ids + score (to be added to Sample).
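The two proposed modes can be sketched as a single response builder. `Sample`, `build_sample`, and the `score` field here are hypothetical names ("score" is only proposed, not yet in the proto), not the actual JetStream types:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Illustrative only: shows the two proposed response modes, not the actual
# JetStream Sample definition. `score` is a hypothetical future field.

@dataclass
class Sample:
    token_ids: List[int]
    text: str = ""               # empty in MLPerf (client-tokenized) mode
    score: Optional[float] = None

def build_sample(token_ids: List[int],
                 detokenize: Callable[[List[int]], str],
                 mlperf_mode: bool,
                 score: Optional[float] = None) -> Sample:
    """Mode 1 (MLPerf): token ids only. Mode 2: text + token ids (+ score)."""
    if mlperf_mode:
        return Sample(token_ids=token_ids)
    return Sample(token_ids=token_ids, text=detokenize(token_ids), score=score)
```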

@FanhaiLu1 (Collaborator)

Looks good to me with these settings:

Request: token ids or text (one of).
Response: token ids, text, or both.

@FanhaiLu1 (Collaborator) left a comment

rollback approve

@JoeZijunZhou JoeZijunZhou marked this pull request as ready for review May 10, 2024 22:30
@FanhaiLu1 (Collaborator)

> When the input is text, return both text and token ids.

Is it still a streaming mode?

@JoeZijunZhou (Collaborator, Author) commented May 14, 2024

> Is it still a streaming mode?

Yes. Each streamed chunk contains a token_ids list and its corresponding text piece, using the greedy decode strategy. There is no performance regression.
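Streaming detokenization of this kind is typically made correct by decoding the accumulated ids and emitting only the text delta, rather than decoding each token in isolation (which breaks SentencePiece's whitespace and multi-byte handling). A minimal sketch of the delta technique; the toy vocabulary and `decode` function stand in for a real SentencePiece processor and are not JetStream's actual implementation:

```python
# Toy SentencePiece-style vocabulary; "▁" marks a word boundary.
TOY_VOCAB = {1: "▁Hello", 2: ",", 3: "▁wor", 4: "ld"}

def decode(ids):
    """Decode a full id sequence into text (stand-in for a real tokenizer)."""
    return "".join(TOY_VOCAB[i] for i in ids).replace("▁", " ").lstrip()

def stream_decode(id_stream):
    """Yield (ids_chunk, text_piece): decode the running prefix, emit the delta."""
    seen, emitted = [], ""
    for chunk in id_stream:
        seen.extend(chunk)
        full = decode(seen)          # re-decode the whole prefix
        piece = full[len(emitted):]  # only the newly produced text
        emitted = full
        yield chunk, piece
```

Decoding chunk-by-chunk in isolation would lose the leading spaces between words; the delta approach guarantees the concatenated pieces equal the full decode.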

@JoeZijunZhou JoeZijunZhou merged commit 01c5a03 into main May 14, 2024
3 checks passed
@JoeZijunZhou JoeZijunZhou deleted the zijun/proto-update branch May 14, 2024 23:10