Update JetStream grpc proto to support I/O with text and token ids #78
Conversation
JoeZijunZhou commented May 7, 2024 (edited)
- Updated the gRPC proto to support:
  - Request: token ids or text (one of).
  - Response: token ids, text, or both.
  - When the input is token ids, return only token ids (client-side tokenization (MLPerf) mode).
  - When the input is text, return both text and token ids.
- Refactored detokenization handling and the Tokenizer API to decouple the logic.
- Added complete support for SentencePiece tokenizer streaming decoding (ensured output text correctness).
- Added and updated unit tests for token utils, orchestrator, and server.
- Enabled client-side tokenization in the benchmark script.
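The two request modes above can be sketched roughly as follows. This is a minimal illustration, not the actual JetStream proto or orchestrator code: all names (`DecodeRequest`, `handle_request`, the toy tokenizer, and the stub `generate`) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DecodeRequest:
    # Models the proto "oneof content": exactly one of the two is set.
    text: Optional[str] = None
    token_ids: Optional[List[int]] = None

@dataclass
class DecodeResponse:
    token_ids: List[int]
    text: Optional[str] = None  # absent in client-tokenization mode

class ToyTokenizer:
    """Stand-in tokenizer: one token id per whitespace-separated word."""
    def __init__(self):
        self.vocab = {}
        self.inv = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for w in text.split():
            if w not in self.vocab:
                self.vocab[w] = len(self.vocab)
                self.inv[self.vocab[w]] = w
            ids.append(self.vocab[w])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.inv[i] for i in ids)

def generate(prompt_ids: List[int]) -> List[int]:
    """Stub model decode step: just echoes the prompt ids."""
    return list(prompt_ids)

def handle_request(req: DecodeRequest, tok: ToyTokenizer) -> DecodeResponse:
    if req.token_ids is not None:
        # Client-side tokenization (MLPerf) mode: return token ids only;
        # the client is responsible for detokenizing.
        return DecodeResponse(token_ids=generate(req.token_ids))
    # Server-side tokenization mode: tokenize, decode, return both.
    ids = generate(tok.encode(req.text))
    return DecodeResponse(token_ids=ids, text=tok.decode(ids))
```

The point of the branch is that the response shape follows the request shape: token-ids-in means token-ids-out only, while text-in means both text and token ids come back.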
Any reason to add text back? I suggested we keep both str and id in the response in #40, and the answer was "don't want to decode it to str (or piece) in jetstream". Would you like to share the reasons for making a totally different decision in just two weeks?
Below are the comments from PR #40:

@FanhaiLu1 [2 weeks ago]: Can we still keep a str as an option? The internal keeps both text and token id.

@JoeZijunZhou [2 weeks ago]: I guess we don't want to decode it to str (or piece) in jetstream, since it would have some off in the final result.
Mainly it's for our customer's request: they would like to have server-side detokenization support, and this PR includes the complete support. "don't want to decode it to str (or piece) in jetstream" was due to the previously incorrect streaming detokenization.

// The client can pass the inputs either as a string, in which case the server will
// tokenize it, or as tokens, in which case it's the client's responsibility to
// ensure they tokenize its input strings with the correct tokenizer.
oneof content {
It could be both. The client could have 3 choices: token_id, token_text, or both of them. If we only support one of them, it's highly possible that we will need to refactor the code one more time.
This is the request input; the user should only input text or tokens. The response can return token_id, token_text, or both of them. The control logic will be implemented in the orchestrator.
Thanks. Can you create an issue and add the customer's request details? It will be good context for readers.
Addressing #79.
Soon, there could be another customer asking for both token id and text support. I feel we should predict what the customers' requirements are and do it all at once. Supporting token id, text, and both is one solution for all the customers.
This already supports token id, text, and both as the response. The user will not input both text and tokens at the same time in one request, right? Why would a user input both text and tokens in a request to the JetStream API? I was thinking of simplifying it to 2 modes:
Looks good to me for this setting: Request: token id or text (one of).
rollback approve
Is it still a streaming mode?
Yes, each streaming chunk contains a token_ids list and its text piece with the greedy decode strategy. Performance has no regression.
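The per-chunk (token ids + text piece) shape can be sketched as below. This is an illustrative outline, not the actual JetStream detokenization code: `stream_decode` and `toy_decode` are hypothetical names, and the prefix-diffing trick shown here is one common way to keep streamed text identical to a one-shot decode, which tokenizers like SentencePiece need for correct spacing and piece boundaries.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

def toy_decode(ids: List[int]) -> str:
    # Hypothetical word-level decoder standing in for tokenizer.decode.
    words = ["hello", "world", "!"]
    return " ".join(words[i] for i in ids)

def stream_decode(
    token_stream: Iterable[int],
    decode: Callable[[List[int]], str],
) -> Iterator[Tuple[List[int], str]]:
    """Yield (token_ids, text_piece) per streamed token.

    Decode the full prefix each step and emit only the suffix that is
    new relative to the previously emitted text, so concatenating all
    pieces reproduces exactly decode(all_ids)."""
    all_ids: List[int] = []
    emitted = ""
    for tid in token_stream:
        all_ids.append(tid)
        full = decode(all_ids)
        piece = full[len(emitted):]
        emitted = full
        yield [tid], piece
```

Re-decoding the whole prefix is the simple correctness-first version; a real implementation would typically bound the window it re-decodes, but the invariant is the same: the concatenated pieces must match a one-shot decode of all token ids.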