Skip to content

Commit

Permalink
apacheGH-43967: [C++] Enhance error message for URI parsing (apache#4…
Browse files Browse the repository at this point in the history
…3938)

### Rationale for this change

We want to enhance error message for URI parsing error to provide more information for the syntax error scenario.

When error message is generated from `uriParseSingleUriExA`, the return value might indicate a `URI_ERROR_SYNTAX` error, and `error_pos` would be set to the position causing syntax error. ([uriparser/Uri.h](https://github.com/apache/arrow/blob/c455d6b8c4ae2cb22baceb4c27e1325b973d39e1/cpp/src/arrow/vendored/uriparser/Uri.h#L288))

In the new error message, it includes the character causing syntax error and its position, so users can have a better idea why the error happens.

### What changes are included in this PR?

- Error message change in URI parsing function.

### Are these changes tested?

PR includes unit tests.

### Are there any user-facing changes?

Yes, but only for error message.

* GitHub Issue: apache#41365
* GitHub Issue: apache#43967

Authored-by: Crystal Zhou <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
  • Loading branch information
CrystalZhou0529 authored and khwilson committed Sep 14, 2024
1 parent caa368e commit 790f892
Show file tree
Hide file tree
Showing 2 changed files with 18 additions and 3 deletions.
8 changes: 8 additions & 0 deletions cpp/src/arrow/filesystem/filesystem_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include <utility>
#include <vector>

#include <gmock/gmock.h>
#include <gtest/gtest.h>

#include "arrow/filesystem/filesystem.h"
Expand Down Expand Up @@ -632,6 +633,13 @@ TEST_F(TestMockFS, FileSystemFromUri) {
ASSERT_OK_AND_ASSIGN(fs_, FileSystemFromUri("mock:///foo/bar?q=zzz", &path));
ASSERT_EQ(path, "foo/bar");
CheckDirs({});
ASSERT_OK_AND_ASSIGN(fs_, FileSystemFromUri("mock:/folder+name/bar?q=zzz", &path));
ASSERT_EQ(path, "folder+name/bar");
CheckDirs({});
EXPECT_RAISES_WITH_MESSAGE_THAT(
Invalid, ::testing::HasSubstr("syntax error at character ' ' (position 12)"),
FileSystemFromUri("mock:/folder name/bar", &path));
CheckDirs({});
}

////////////////////////////////////////////////////////////////////////////
Expand Down
13 changes: 10 additions & 3 deletions cpp/src/arrow/util/uri.cc
Original file line number Diff line number Diff line change
Expand Up @@ -250,9 +250,16 @@ Status Uri::Parse(const std::string& uri_string) {
const auto& s = impl_->KeepString(uri_string);
impl_->string_rep_ = s;
const char* error_pos;
if (uriParseSingleUriExA(&impl_->uri_, s.data(), s.data() + s.size(), &error_pos) !=
URI_SUCCESS) {
return Status::Invalid("Cannot parse URI: '", uri_string, "'");
int retval =
uriParseSingleUriExA(&impl_->uri_, s.data(), s.data() + s.size(), &error_pos);
if (retval != URI_SUCCESS) {
if (retval == URI_ERROR_SYNTAX) {
return Status::Invalid("Cannot parse URI: '", uri_string,
"' due to syntax error at character '", *error_pos,
"' (position ", error_pos - s.data(), ")");
} else {
return Status::Invalid("Cannot parse URI: '", uri_string, "'");
}
}

const auto scheme = TextRangeToView(impl_->uri_.scheme);
Expand Down

0 comments on commit 790f892

Please sign in to comment.