Update JetStream grpc proto to support I/O with text and token ids #78
Conversation
JoeZijunZhou commented May 7, 2024 (edited)
- Updated the gRPC proto to support:
  - Request: token ids or text (one of).
  - Response: token ids, text, or both.
  - When the input is token ids, return only token ids (client-side tokenization (MLPerf) mode).
  - When the input is text, return both text and token ids.
- Refactored detokenization handling and the Tokenizer API to decouple the logic.
- Added complete support for SentencePiece tokenizer streaming decoding (ensured output text correctness).
- Added and updated unit tests for token utils, orchestrator, and server.
- Enabled client-side tokenization in the benchmark script.
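The two request modes above can be sketched roughly as follows. This is a minimal illustration, not the actual JetStream proto or orchestrator code: all names (`DecodeRequest`, `handle_request`, the toy tokenizer, and the stub `generate`) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DecodeRequest:
    # Models the proto "oneof content": exactly one of the two is set.
    text: Optional[str] = None
    token_ids: Optional[List[int]] = None

@dataclass
class DecodeResponse:
    token_ids: List[int]
    text: Optional[str] = None  # absent in client-tokenization mode

class ToyTokenizer:
    """Stand-in tokenizer: one token id per whitespace-separated word."""
    def __init__(self):
        self.vocab = {}
        self.inv = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for w in text.split():
            if w not in self.vocab:
                self.vocab[w] = len(self.vocab)
                self.inv[self.vocab[w]] = w
            ids.append(self.vocab[w])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.inv[i] for i in ids)

def generate(prompt_ids: List[int]) -> List[int]:
    """Stub model decode step: just echoes the prompt ids."""
    return list(prompt_ids)

def handle_request(req: DecodeRequest, tok: ToyTokenizer) -> DecodeResponse:
    if req.token_ids is not None:
        # Client-side tokenization (MLPerf) mode: return token ids only;
        # the client is responsible for detokenizing.
        return DecodeResponse(token_ids=generate(req.token_ids))
    # Server-side tokenization mode: tokenize, decode, return both.
    ids = generate(tok.encode(req.text))
    return DecodeResponse(token_ids=ids, text=tok.decode(ids))
```

The point of the branch is that the response shape follows the request shape: token-ids-in means token-ids-out only, while text-in means both text and token ids come back.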
Any reason to add text back? I suggested we keep both str and id in the response in #40, and the answer was "don't want to decode it to str (or piece) in jetstream". Would you like to share the reasons for making a totally different decision in just two weeks?
Below are the comments from PR #40:

@FanhaiLu1 [2 weeks ago]: Can we still keep a str as an option? The internal keeps both text and token id.

@JoeZijunZhou [2 weeks ago]: I guess we don't want to decode it to str (or piece) in jetstream, since it would have some off in the final result.
Mainly it's for our customer's request: they would like to have server-side detokenization support, and this PR includes the complete support. "don't want to decode it to str (or piece) in jetstream" was due to the previously incorrect streaming detokenization.

// The client can pass the inputs either as a string, in which case the server will
// tokenize it, or as tokens, in which case it's the client's responsibility to
// ensure they tokenize its input strings with the correct tokenizer.
oneof content {
It could be both. The client could have 3 choices: token_id, token_text, or both of them. If we only support one of them, it's highly possible that we will need to refactor the code one more time.
This is the request input; the user should only input text or tokens. The response can return token_id, token_text, or both of them. The control logic will be implemented in the orchestrator.
Thanks. Can you create an issue and add the customer's request details? It will be good context for readers.
Addressing #79.
Soon, there could be another customer asking for both token id and text support. I feel we should predict what the customers' requirements are and do it all at once. Supporting token id, text, and both is one solution for all the customers.
This already supports token id, text, and both as the response. The user will not input both text and tokens at the same time in one request, right? Why would a user input both text and tokens in a request to the JetStream API? I was thinking of simplifying it to 2 modes:
Looks good to me for this setting: Request: token id or text (one of).
rollback approve
Is it still a streaming mode?
Yes, each streaming chunk contains a token_ids list and its text piece with the greedy decode strategy. Performance has no regression.
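The per-chunk (token ids + text piece) shape can be sketched as below. This is an illustrative outline, not the actual JetStream detokenization code: `stream_decode` and `toy_decode` are hypothetical names, and the prefix-diffing trick shown here is one common way to keep streamed text identical to a one-shot decode, which tokenizers like SentencePiece need for correct spacing and piece boundaries.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

def toy_decode(ids: List[int]) -> str:
    # Hypothetical word-level decoder standing in for tokenizer.decode.
    words = ["hello", "world", "!"]
    return " ".join(words[i] for i in ids)

def stream_decode(
    token_stream: Iterable[int],
    decode: Callable[[List[int]], str],
) -> Iterator[Tuple[List[int], str]]:
    """Yield (token_ids, text_piece) per streamed token.

    Decode the full prefix each step and emit only the suffix that is
    new relative to the previously emitted text, so concatenating all
    pieces reproduces exactly decode(all_ids)."""
    all_ids: List[int] = []
    emitted = ""
    for tid in token_stream:
        all_ids.append(tid)
        full = decode(all_ids)
        piece = full[len(emitted):]
        emitted = full
        yield [tid], piece
```

Re-decoding the whole prefix is the simple correctness-first version; a real implementation would typically bound the window it re-decodes, but the invariant is the same: the concatenated pieces must match a one-shot decode of all token ids.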