
Add new service support: DBFS API 2.0 #2550

Closed · morristai opened this issue Jun 27, 2023 · 5 comments · Fixed by #3334

@morristai (Member)
Description

Hi @Xuanwo, is this still in demand? I would like to take a look at it.

@Xuanwo (Member) commented Jun 27, 2023

Thanks! Take your time and have fun.

@morristai (Member, Author)

Hi @Xuanwo,
When I implemented the read function for DBFS, I ran into what looks like a design issue: DBFS does not return content_length in the response headers (instead, it sets transfer-encoding to chunked). This causes a panic, because complete_reader requires content_length to not be None. A workaround might be to set content_length manually.
However, I'm struggling to find a way to return an IncomingAsyncBody while deserializing the response body, since calling bytes() takes ownership of the IncomingAsyncBody.
I'm curious to hear your opinion: what do you think would be the best approach here?
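For illustration, here is a minimal sketch of the manual-content-length workaround, assuming the response shape of the DBFS read endpoint (a JSON body with `bytes_read` plus base64-encoded `data`); the helper and error handling are simplified stand-ins, not OpenDAL's actual types:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde::Deserialize;

// Mirrors the JSON shape of a DBFS read response.
#[derive(Deserialize)]
struct ReadResponse {
    bytes_read: u64,
    data: String, // base64-encoded file content
}

/// Decode one DBFS read response into a known length plus its raw bytes.
fn decode_read_response(body: &[u8]) -> Result<(u64, Vec<u8>), Box<dyn std::error::Error>> {
    let resp: ReadResponse = serde_json::from_slice(body)?;
    let content = STANDARD.decode(resp.data.as_bytes())?;
    // The decoded length is the content_length we can report upward,
    // since the HTTP layer only gave us transfer-encoding: chunked.
    debug_assert_eq!(resp.bytes_read as usize, content.len());
    Ok((content.len() as u64, content))
}
```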


@Xuanwo (Member) commented Aug 14, 2023

It seems DBFS can't reuse our existing code the way s3 does. Instead, we need to implement a new Reader for it and handle the content length internally:

  • Each read request is limited to 1 MiB at most.
  • The read response contains the base64 encoding of the real content.

So in our reader, we need to read 512 KiB or 1 MiB of data into a buffer, decode the base64 content, and implement oio::Read on top of that (see the sketch below).
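As a rough sketch of that design — a simplified, synchronous analogue built on std::io::Read rather than OpenDAL's actual async oio::Read signature, with the HTTP transport elided behind a hypothetical fetch_chunk helper:

```rust
use std::io::{self, Read};

const MAX_READ: u64 = 1024 * 1024; // DBFS caps each read request at 1 MiB

/// Hypothetical transport hook standing in for the HTTP call: fetch one
/// range of the file and return its already-base64-decoded bytes.
fn fetch_chunk(path: &str, offset: u64, length: u64) -> io::Result<Vec<u8>> {
    // ... issue the read request, parse the JSON, decode `data` ...
    unimplemented!("transport elided in this sketch: {path} {offset} {length}")
}

struct DbfsReader {
    path: String,
    offset: u64,  // next byte to fetch from the remote file
    buf: Vec<u8>, // decoded bytes not yet handed to the caller
    pos: usize,   // read position within `buf`
    eof: bool,
}

impl Read for DbfsReader {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        // Refill the buffer with the next (at most 1 MiB) decoded chunk.
        if self.pos == self.buf.len() && !self.eof {
            self.buf = fetch_chunk(&self.path, self.offset, MAX_READ)?;
            self.pos = 0;
            self.offset += self.buf.len() as u64;
            // A short (or empty) chunk means we've reached end of file.
            self.eof = (self.buf.len() as u64) < MAX_READ;
        }
        // Serve as much as possible from the buffered, decoded data.
        let n = out.len().min(self.buf.len() - self.pos);
        out[..n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}
```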

@morristai (Member, Author)

Hi @Xuanwo,
In DBFS, calling the "append data block" API overwrites the original content and writes from the start. This behavior doesn't align with the defined behavior of AppendObjectWrite; in short, the API is essentially a larger-content version of their "Upload a file" API. Should I implement it with overwrite=true always, but warn the user in advance? Or should I create a new oio write trait to match the desired behavior?

Current implementation:

  • oio::multipart_upload_write
  • oio::one_shot_write

@Xuanwo (Member) commented Oct 10, 2023

It seems we need a new oio trait for this case. How about implementing it as one_shot_write first? We can polish this part in the future. A sketch of that direction follows.
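A minimal sketch of that one-shot approach, assuming DBFS's put endpoint (`/api/2.0/dbfs/put`, which takes a path, base64 `contents`, and an `overwrite` flag) and a hypothetical `http_post_json` transport helper; authentication and the endpoint's payload size limit are left out:

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde_json::json;

/// Hypothetical transport helper standing in for the signed HTTP call.
fn http_post_json(endpoint: &str, body: serde_json::Value) -> Result<(), Box<dyn std::error::Error>> {
    // ... attach the workspace token and send the request ...
    unimplemented!("transport elided in this sketch: {endpoint} {body}")
}

/// Write the entire object in a single request, always overwriting,
/// since DBFS's block APIs don't append in the usual sense.
fn one_shot_write(path: &str, data: &[u8]) -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "path": path,
        "contents": STANDARD.encode(data), // DBFS expects base64 content
        "overwrite": true,                 // append semantics are unsupported
    });
    http_post_json("/api/2.0/dbfs/put", body)
}
```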
