Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #2724: Broken handling of AWS S3 urls #2725

Merged
merged 5 commits into from
May 3, 2022

Conversation

hindog
Copy link
Contributor

@hindog hindog commented Apr 8, 2022

Fixes #2724

  • Fixes issue where secretKey/sessionToken are not url-decoded when extracted from S3 url (by converting the URL to java.net.URI instead of java.net.URL, which returns the values decoded)
  • Fixes issue when parsing the bucket and key. PLEASE REVIEW and let me know if I'm missing something here.

Proposed Changes (Mandatory)

  • Modify S3ParamExtractor to construct a java.net.URI from the S3 url so when we call getUserInfo, we get a decoded value (java.net.URL#getUserInfo returns the original encoded value).
  • Change the bucket+key extraction logic to split on the first / (after the leading /) rather than last

@hindog hindog changed the title Fixes #2724: AWS S3 url handling Fixes #2724: Broken handling of AWS S3 urls Apr 8, 2022
@hindog
Copy link
Contributor Author

hindog commented Apr 8, 2022

After testing some more, there is another "double url-encoding" issue that I believe is caused by a call to apoc.util.FileUtils#changeFileUrlIfImportDirectoryConstrained, which may be the same issue reported in #2714

@leonpierre
Copy link

@hindog I'm not sure if this will fix our issue (#2714) with AWS Blob Url's, because S3 is working for us, and as far as I understand the fix, you only changed S3 code?!

@vga91
Copy link
Collaborator

vga91 commented Apr 12, 2022

After testing some more, there is another "double url-encoding" issue that I believe is caused by a call to apoc.util.FileUtils#changeFileUrlIfImportDirectoryConstrained, which may be the same issue reported in #2714

I think the same, this issue is fixed in 4.3 but not yet in 4.4.
Can you try to build your project by adding this changes #2740 ?

@hindog
Copy link
Contributor Author

hindog commented Apr 12, 2022

@vga91 thanks! I tested locally and confirmed the double-encoding issue is fixed after cherry-picking that commit.

@leonpierre that's correct. After fixing a couple of S3-specific issues, I then hit the double-encoding issue as well, and started to investigate. As @vga91 mentioned, there is already a fix for that issue in 4.3 that needs to be pulled into 4.4. I pulled that change into this PR and confirmed the double-encoding issue is fixed for me now.

@hindog
Copy link
Contributor Author

hindog commented Apr 12, 2022

  • I refactored the fix to be less intrusive, so now the changes are isolated to S3ParamsExtractor
  • Unit tests have been added for S3ParamsExtractor
  • I tested these changes locally and I'm able to read from S3 URL's using accessKeyId:secretKey:sessionToken in the url


Integer slashIndex = url.getPath().lastIndexOf("/");
Integer slashIndex = uri.getPath().indexOf("/", 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 48 to 56
String endpoint = url.getHost();
String endpoint = uri.getHost();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with this (and that's why the LoadS3Test tests are failing). The example I give is with http addresses, but it would be similar for S3 ones:

import java.net.{URL, URI}

new URL("http://us-east-1.127.0.0.1:55220/test-bucket/test.csv?accessKey=accesskey&secretKey=secretkey").getHost 
// us-east-1.127.0.0.1

new URI("http://us-east-1.127.0.0.1:55220/test-bucket/test.csv?accessKey=accesskey&secretKey=secretkey").getHost
// null

Copy link
Contributor

@ncordon ncordon Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with a fix for that in 80e800d. Feel free to discard it if you have a better suggestion

@ncordon ncordon merged commit d043d9f into neo4j-contrib:4.4 May 3, 2022
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this pull request May 17, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this pull request May 17, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
conker84 pushed a commit that referenced this pull request May 18, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this pull request May 24, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this pull request May 25, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this pull request May 25, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit that referenced this pull request May 27, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit that referenced this pull request May 27, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken handling of AWS S3 urls
4 participants