
Add new service support: DBFS API 2.0 #2550

Closed · morristai opened this issue Jun 27, 2023 · 5 comments · Fixed by #3334

@morristai (Member)
Description

Hi @Xuanwo, is this still in demand? I would like to take a look at it.

@Xuanwo (Member) commented Jun 27, 2023

Thanks! Take your time and have fun.

@morristai (Member, Author)

Hi @Xuanwo,
When I implemented the read function for DBFS, I ran into what looks like a design issue: DBFS does not return content_length in the response headers (instead, it sets transfer-encoding to chunked). This causes a panic, because complete_reader requires content_length to not be None. A workaround might be to set content_length manually.
However, I'm struggling to find a way to return an IncomingAsyncBody while deserializing the response body, since calling bytes() takes ownership of the IncomingAsyncBody.
I'm curious to hear your opinion: what do you think would be the best approach here?
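For illustration, here is a minimal sketch of the manual-content-length workaround, assuming the response shape of the DBFS read endpoint (a JSON body with `bytes_read` plus base64-encoded `data`); the helper and error handling are simplified stand-ins, not OpenDAL's actual types:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde::Deserialize;

// Mirrors the JSON shape of a DBFS read response.
#[derive(Deserialize)]
struct ReadResponse {
    bytes_read: u64,
    data: String, // base64-encoded file content
}

/// Decode one DBFS read response into a known length plus its raw bytes.
fn decode_read_response(body: &[u8]) -> Result<(u64, Vec<u8>), Box<dyn std::error::Error>> {
    let resp: ReadResponse = serde_json::from_slice(body)?;
    let content = STANDARD.decode(resp.data.as_bytes())?;
    // The decoded length is the content_length we can report upward,
    // since the HTTP layer only gave us transfer-encoding: chunked.
    debug_assert_eq!(resp.bytes_read as usize, content.len());
    Ok((content.len() as u64, content))
}
```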


@Xuanwo (Member) commented Aug 14, 2023

It seems DBFS can't reuse our existing code the way s3 does. Instead, we need to implement a new Reader for it and handle the content length internally:

  • Each read request is limited to 1 MiB at most.
  • The read response contains the base64 encoding of the real content.

So in our reader, we need to read 512 KiB or 1 MiB of data into a buffer, decode the base64 content, and implement oio::Read on top of that (see the sketch below).
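As a rough sketch of that design — a simplified, synchronous analogue built on std::io::Read rather than OpenDAL's actual async oio::Read signature, with the HTTP transport elided behind a hypothetical fetch_chunk helper:

```rust
use std::io::{self, Read};

const MAX_READ: u64 = 1024 * 1024; // DBFS caps each read request at 1 MiB

/// Hypothetical transport hook standing in for the HTTP call: fetch one
/// range of the file and return its already-base64-decoded bytes.
fn fetch_chunk(path: &str, offset: u64, length: u64) -> io::Result<Vec<u8>> {
    // ... issue the read request, parse the JSON, decode `data` ...
    unimplemented!("transport elided in this sketch: {path} {offset} {length}")
}

struct DbfsReader {
    path: String,
    offset: u64,  // next byte to fetch from the remote file
    buf: Vec<u8>, // decoded bytes not yet handed to the caller
    pos: usize,   // read position within `buf`
    eof: bool,
}

impl Read for DbfsReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill the buffer with the next (at most 1 MiB) decoded chunk.
        if self.pos == self.buf.len() && !self.eof {
            self.buf = fetch_chunk(&self.path, self.offset, MAX_READ)?;
            self.pos = 0;
            self.offset += self.buf.len() as u64;
            // A short (or empty) chunk means we've reached end of file.
            self.eof = (self.buf.len() as u64) < MAX_READ;
        }
        // Serve as much as possible from the buffered, decoded data.
        let n = out.len().min(self.buf.len() - self.pos);
        out[..n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```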

@morristai (Member, Author)

Hi @Xuanwo,
In DBFS, calling the "append data block" API overwrites the original content and writes from the start. This behavior doesn't align with the defined behavior of AppendObjectWrite; in short, the API is essentially a larger-content version of their "Upload a file" API. Should I implement it with overwrite=true always, but warn the user in advance? Or should I create a new oio write trait to match the desired behavior?

Current implementation:

  • oio::multipart_upload_write
  • oio::one_shot_write

@Xuanwo (Member) commented Oct 10, 2023

It seems we need a new oio trait for this case. How about implementing it as one_shot_write first? We can polish this part in the future. A sketch of that direction follows.
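A minimal sketch of that one-shot approach, assuming DBFS's put endpoint (`/api/2.0/dbfs/put`, which takes a path, base64 `contents`, and an `overwrite` flag) and a hypothetical `http_post_json` transport helper; authentication and the endpoint's payload size limit are left out:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde_json::json;

/// Hypothetical transport helper standing in for the signed HTTP call.
fn http_post_json(endpoint: &str, body: serde_json::Value) -> Result<(), Box<dyn std::error::Error>> {
    // ... attach the workspace token and send the request ...
    unimplemented!("transport elided in this sketch: {endpoint} {body}")
}

/// Write the entire object in a single request, always overwriting,
/// since DBFS's block APIs don't append in the usual sense.
fn one_shot_write(path: &str, data: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "path": path,
        "contents": STANDARD.encode(data), // DBFS expects base64 content
        "overwrite": true,                 // append semantics are unsupported
    });
    http_post_json("/api/2.0/dbfs/put", body)
}
```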
