feat: Add webhdfs support #7844

Merged
merged 10 commits into from
Feb 27, 2023

Conversation

Xuanwo
Contributor

@Xuanwo Xuanwo commented Feb 12, 2023

Signed-off-by: Xuanwo [email protected]

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR adds webhdfs support.

Also, this PR bumps opendal to 0.27 to resolve the duplicated backon dependency.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

Types of user-facing changes

  • Installation and deployment

Release note

  • Added webhdfs support

@wcy-fdu
Contributor

wcy-fdu commented Feb 12, 2023

Thanks for your contribution! 🥰

Contributor

@wcy-fdu wcy-fdu left a comment


LGTM for your changes!
I'll run some tests for a while and merge this PR later.

@Xuanwo
Contributor Author

Xuanwo commented Feb 12, 2023

I'll run some tests for a while and merge this PR later.

To run this test on an existing HDFS deployment, please make sure dfs.webhdfs.enabled in hdfs-site.xml has been set to true.
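
For readers who want to verify that setting before starting RisingWave, here is a minimal sketch in Rust (std only). The helper names `hadoop_property` and `webhdfs_explicitly_enabled` are hypothetical and not part of RisingWave or OpenDAL. The code uses a naive string scan rather than a real XML parser, and it assumes each `<name>` is followed by its `<value>` inside the same `<property>` element.

```rust
/// Returns the value of `name` from a Hadoop-style `<property>` list, if present.
/// Naive string scan (not a real XML parser): assumes `<name>` precedes `<value>`
/// within the same `<property>` element.
fn hadoop_property<'a>(xml: &'a str, name: &str) -> Option<&'a str> {
    let name_tag = format!("<name>{}</name>", name);
    let after_name = &xml[xml.find(&name_tag)? + name_tag.len()..];
    // Restrict the search to the current <property> element.
    let end = after_name.find("</property>").unwrap_or(after_name.len());
    let scope = &after_name[..end];
    let start = scope.find("<value>")? + "<value>".len();
    let stop = scope[start..].find("</value>")? + start;
    Some(scope[start..stop].trim())
}

/// True only when the file sets dfs.webhdfs.enabled to "true" explicitly.
/// An absent property is reported as false here; this sketch does not model
/// Hadoop's built-in defaults.
fn webhdfs_explicitly_enabled(xml: &str) -> bool {
    hadoop_property(xml, "dfs.webhdfs.enabled") == Some("true")
}

fn main() {
    let xml = r#"<configuration>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>"#;
    println!("webhdfs explicitly enabled: {}", webhdfs_explicitly_enabled(xml));
}
```

In practice the file content would be read from the deployment's hdfs-site.xml instead of an inline string.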

@wcy-fdu
Contributor

wcy-fdu commented Feb 17, 2023

I can not start RisingWave via webhdfs backend, is there any configuration problem?

Error: while parsing a block mapping, did not find expected key at line 145 column 17

@Xuanwo
Contributor Author

Xuanwo commented Feb 17, 2023

I can not start RisingWave via webhdfs backend, is there any configuration problem?

Error: while parsing a block mapping, did not find expected key at line 145 column 17

Doesn't seem to be an error from OpenDAL or hdfs.

@wcy-fdu
Contributor

wcy-fdu commented Feb 17, 2023

Can I use webhdfs just by adding dfs.webhdfs.enabled == true to the original hdfs configuration?

@Xuanwo
Contributor Author

Xuanwo commented Feb 17, 2023

Can I use webhdfs just by adding dfs.webhdfs.enabled == true to the original hdfs configuration?

It should work. OpenDAL tests webhdfs in the same way: https://github.com/beyondstorage/setup-hdfs/blob/master/src/setup-hdfs.ts#L31-L48

@wcy-fdu
Contributor

wcy-fdu commented Feb 17, 2023

Good news: RisingWave started successfully via webhdfs.
Bad news: I got an error in the compute node:

2023-02-17T06:04:57.351493Z  WARN risingwave_storage::hummock::compactor::shared_buffer_compact: Shared Buffer Compaction failed with error: ObjectStore failed with IO error Unexpected (permanent) at write, context: { service: webhdfs, path: hummock_001/2.data } => building request, source: invalid format.
  backtrace of `HummockError`:
   0: <risingwave_storage::hummock::error::HummockError as core::convert::From<risingwave_storage::hummock::error::HummockErrorInner>>::from
             at ./src/storage/src/hummock/error.rs:68:10
   1: <T as core::convert::Into<U>>::into
             at /rustc/3984bc5833db8bfb0acc522c9775383e4171f3de/library/core/src/convert/mod.rs:726:9
   2: risingwave_storage::hummock::error::HummockError::object_io_error
             at ./src/storage/src/hummock/error.rs:78:9
   3: core::ops::function::FnOnce::call_once
             at /rustc/3984bc5833db8bfb0acc522c9775383e4171f3de/library/core/src/ops/function.rs:250:5
   4: core::result::Result<T,E>::map_err
             at /rustc/3984bc5833db8bfb0acc522c9775383e4171f3de/library/core/src/result.rs:860:27
   5: risingwave_storage::hummock::sstable_store::SstableStore::put_sst_data::{{closure}}
             at ./src/storage/src/hummock/sstable_store.rs:205:9
   6: <risingwave_storage::hummock::sstable_store::BatchUploadWriter as risingwave_storage::hummock::sstable::writer::SstableWriter>::finish::{{closure}}::{{closure}}
             at ./src/storage/src/hummock/sstable_store.rs:573:17

@Xuanwo
Contributor Author

Xuanwo commented Feb 17, 2023

Oh, this error happened while building the HTTP request. What config are you using?

@Xuanwo
Contributor Author

Xuanwo commented Feb 20, 2023

cc @wcy-fdu, Hi opendal v0.27.2 has been released, please try again~

@wcy-fdu
Contributor

wcy-fdu commented Feb 20, 2023

cc @wcy-fdu, Hi opendal v0.27.2 has been released, please try again~

Thanks, let me try it.

@wcy-fdu
Contributor

wcy-fdu commented Feb 23, 2023

cc @wcy-fdu, Hi opendal v0.27.2 has been released, please try again~

Sorry for the wait. I just found that the same error occurs again after updating to v0.27.2, and I guess it's because my environment is incorrect. Let me rebuild a machine for verification.

@wcy-fdu
Contributor

wcy-fdu commented Feb 27, 2023

I still can't write successfully with webhdfs. Here is the hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>   
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/ubuntu/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/ubuntu/hadoop/tmp/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>localhost:9870</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>localhost:9100</value>
    </property>
</configuration>

error log:

Caused by:
    statement failed: db error: ERROR: QueryError: internal error: Rpc error: gRPC error (Internal error): Storage error: Hummock error: Other error sync task failed for ObjectStore failed with IO error Unexpected (temporary) at write, context: { called: http_util::Client::send_async, service: webhdfs, path: hummock_001/2.data } => send async request, source: error sending request for url (http://127.0.0.1:9870/webhdfs/v1/risingwave/webhdfs/hummock_001/2.data?op=CREATE&overwrite=true): error trying to connect: tcp connect error: Connection refused (os error 111).
      backtrace of `HummockError`:
       0: <risingwave_storage::hummock::error::HummockError as core::convert::From<risingwave_storage::hummock::error::HummockErrorInner>>::from
                 at ./src/storage/src/hummock/error.rs:68:10
       1: <T as core::convert::Into<U>>::into
                 at /rustc/3984bc5833db8bfb0acc522c9775383e4171f3de/library/core/src/convert/mod.rs:726:9
       2: risingwave_storage::hummock::error::HummockError::object_io_error
                 at ./src/storage/src/hummock/error.rs:78:9

Could you please take a look? cc @Xuanwo
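
To make the failing request easier to read, here is a rough sketch of how the URL in the error log appears to be composed: the NameNode HTTP endpoint, the `/webhdfs/v1` prefix, a root directory, the object path, and the `op=CREATE&overwrite=true` query. The function name `webhdfs_create_url` is hypothetical, the root `/risingwave/webhdfs` is inferred from the logged URL rather than from any config shown in this thread, and this is not OpenDAL's actual implementation (for example, it does no percent-encoding).

```rust
/// Builds a WebHDFS CREATE URL from an endpoint, a root directory, and an object path.
/// Illustrative only: no percent-encoding, no authentication parameters.
fn webhdfs_create_url(endpoint: &str, root: &str, path: &str) -> String {
    let endpoint = endpoint.trim_end_matches('/');
    // Join root and path, skipping empty segments to avoid a double slash.
    let full_path = [root.trim_matches('/'), path.trim_start_matches('/')]
        .iter()
        .filter(|s| !s.is_empty())
        .copied()
        .collect::<Vec<_>>()
        .join("/");
    format!("{endpoint}/webhdfs/v1/{full_path}?op=CREATE&overwrite=true")
}

fn main() {
    // Values taken from the error log; the root is an inference from the logged URL.
    let url = webhdfs_create_url(
        "http://127.0.0.1:9870",
        "/risingwave/webhdfs",
        "hummock_001/2.data",
    );
    println!("{url}");
}
```

Since the host and port in that URL come from the endpoint, a "Connection refused" at this step points at the endpoint address rather than at the root or the object path.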

@ClSlaid

ClSlaid commented Feb 27, 2023

Looks like the request from OpenDAL was refused. Is 'localhost' open to HTTP requests from other machines?

@Xuanwo
Contributor Author

Xuanwo commented Feb 27, 2023

"Connection refused" indicates that OpenDAL is unable to establish a connection with this endpoint. If you are running HDFS in Docker or Docker Compose, please ensure that the "dfs.namenode.http-address" is set correctly to Docker's network bridge.

@wcy-fdu
Contributor

wcy-fdu commented Feb 27, 2023

"Connection refused" indicates that OpenDAL is unable to establish a connection with this endpoint. If you are running HDFS in Docker or Docker Compose, please ensure that the "dfs.namenode.http-address" is set correctly to Docker's network bridge.

I'm running webhdfs in aws ec2, so dfs.namenode.http-address should be set to the public ip of this machine?

@Xuanwo
Contributor Author

Xuanwo commented Feb 27, 2023

I'm running webhdfs in aws ec2, so dfs.namenode.http-address should be set to the public ip of this machine?

The core question is are you running hdfs and risingwave on the same machine? If not, you need to change it to ec2's public ip.

@wcy-fdu
Contributor

wcy-fdu commented Feb 27, 2023

I'm running webhdfs in aws ec2, so dfs.namenode.http-address should be set to the public ip of this machine?

The core question is are you running hdfs and risingwave on the same machine? If not, you need to change it to ec2's public ip.

It is the same machine.

@Xuanwo
Contributor Author

Xuanwo commented Feb 27, 2023

Based on the current information, I don't think it's related to OpenDAL. Please make sure curl http://127.0.0.1:9870 returns a valid WebHDFS response.
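
As a complement to the curl check, the sketch below (Rust, std only) tests whether anything is accepting TCP connections on the NameNode HTTP address. The function name `check_reachable` is hypothetical. It only checks that the port accepts a connection, which is the step that failed with "Connection refused"; it does not verify that the response is valid WebHDFS.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns Ok(()) if a TCP connection to `addr` (e.g. "127.0.0.1:9870")
/// succeeds within `timeout`; otherwise returns the underlying I/O error.
fn check_reachable(addr: &str, timeout: Duration) -> io::Result<()> {
    let addr: SocketAddr = addr
        .parse()
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    TcpStream::connect_timeout(&addr, timeout).map(|_| ())
}

fn main() {
    // Address taken from the failing URL in the error log above.
    match check_reachable("127.0.0.1:9870", Duration::from_secs(2)) {
        Ok(()) => println!("NameNode HTTP port is accepting connections"),
        Err(e) => println!("cannot connect: {e}"),
    }
}
```

If this reports a refused connection from the machine that runs RisingWave, the NameNode is probably not listening on that address and port, which would be consistent with the problem being outside OpenDAL.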

Contributor

@wcy-fdu wcy-fdu left a comment


Finally I successfully ran RisingWave on webhdfs, thanks for your contribution!

@risingwavelabs risingwavelabs deleted a comment from Xuanwo Feb 27, 2023
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
Signed-off-by: Xuanwo <[email protected]>
auto-merge was automatically disabled February 27, 2023 09:11

Head branch was pushed to by a user without write access

Signed-off-by: Xuanwo <[email protected]>
@wcy-fdu wcy-fdu added this pull request to the merge queue Feb 27, 2023
Merged via the queue into risingwavelabs:main with commit cb7e029 Feb 27, 2023
@xxchan xxchan added the user-facing-changes Contains changes that are visible to users label Feb 27, 2023
@Xuanwo Xuanwo deleted the webhdfs-support branch February 27, 2023 10:22