Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken handling of AWS S3 urls #2724

Closed
hindog opened this issue Apr 8, 2022 · 0 comments · Fixed by #2725
Closed

Broken handling of AWS S3 urls #2724

hindog opened this issue Apr 8, 2022 · 0 comments · Fixed by #2725

Comments

@hindog
Copy link
Contributor

hindog commented Apr 8, 2022

Guidelines

Please note that GitHub issues are only meant for bug reports/feature requests. If you have questions on how to use APOC, please ask on the Neo4j Discussion Forum instead of creating an issue here.

Expected Behavior (Mandatory)

When using AWS S3 url with credentials embedded in the url (eg: s3://accessKey:secretKey[:sessionToken]@endpoint:port/bucket/key), procedures such as apoc.load.json should be able to read the data if secret key contains url-escaped characters.

Actual Behavior (Mandatory)

With a secret key contains a / (which they sometimes do), I tried two approaches:

  1. Passing the secret UNESCAPED in the url string (eg [not a real secret]: ABCDefGHikbWUyzkbsIKapTBr/fk4VkGsGWDOS/2XYZ). This fails with:
Caused by: java.lang.NumberFormatException: Error at index 1 in: "ABCDefGHikbWUyzkbsIKapTBr"
	at java.lang.NumberFormatException.forCharSequence(NumberFormatException.java:81) ~[?:?]
	at java.lang.Integer.parseInt(Integer.java:735) ~[?:?]
	at java.net.URLStreamHandler.parseURL(URLStreamHandler.java:223) ~[?:?]
	at java.net.URL.<init>(URL.java:674) ~[?:?]
	at java.net.URL.<init>(URL.java:541) ~[?:?]
	at java.net.URL.<init>(URL.java:488) ~[?:?]
	at apoc.util.FileUtils$SupportedProtocols.from(FileUtils.java:119) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.util.FileUtils.isFile(FileUtils.java:243) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.ApocConfig.checkReadAllowed(ApocConfig.java:263) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.util.FileUtils.inputStreamFor(FileUtils.java:163) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.util.JsonUtil.loadJson(JsonUtil.java:95) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.load.LoadJson.loadJsonStream(LoadJson.java:71) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.load.LoadJson.jsonParams(LoadJson.java:60) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at apoc.load.LoadJson.json(LoadJson.java:49) ~[apoc-4.4.0.2.jar:4.4.0.2]
	at org.neo4j.kernel.impl.proc.GeneratedProcedure_json21076469570321.apply(Unknown Source) ~[?:?]
	at org.neo4j.procedure.impl.ProcedureRegistry.callProcedure(ProcedureRegistry.java:235) ~[neo4j-procedure-4.4.3.jar:4.4.3]
	at org.neo4j.procedure.impl.GlobalProceduresRegistry.callProcedure(GlobalProceduresRegistry.java:352) ~[neo4j-procedure-4.4.3.jar:4.4.3]
	at org.neo4j.kernel.impl.newapi.AllStoreHolder.callProcedure(AllStoreHolder.java:1092) ~[neo4j-kernel-4.4.3.jar:4.4.3]
	at org.neo4j.kernel.impl.newapi.AllStoreHolder.procedureCallRead(AllStoreHolder.java:1004) ~[neo4j-kernel-4.4.3.jar:4.4.3]
	at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.$anonfun$callReadOnlyProcedure$1(CallSupport.scala:47) ~[neo4j-cypher-interpreted-runtime-4.4.3.jar:4.4.3]
	at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callProcedure(CallSupport.scala:70) ~[neo4j-cypher-interpreted-runtime-4.4.3.jar:4.4.3]
	at org.neo4j.cypher.internal.runtime.interpreted.CallSupport$.callReadOnlyProcedure(CallSupport.scala:47) ~[neo4j-cypher-interpreted-runtime-4.4.3.jar:4.4.3]
	at org.neo4j.cypher.internal.runtime.interpreted.TransactionBoundReadQueryContext.callReadOnlyProcedure(TransactionBoundQueryContext.scala:1134) ~[neo4j-cypher-interpreted-runtime-4.4.3.jar:4.4.3]
	... 87 more
  1. Passing the secret UNESCAPED in the url string (eg [this not a real secret]): ABCDefGHikbWUyzkbsIKapTBr%2Ffk4VkGsGWDOS%2F2XYZ). This fails with:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The AWS Access Key Id you provided does not exist in our records.

So S3ParamsExtractor doesn't url-decode the secret key and apoc.util.FileUtils$SupportedProtocols.from expects the url to be encoded, so they are in conflict with each other.

How to Reproduce the Problem

  1. Construct an S3 URL with a / in the secret key and call a procedure such as apoc.load.csv.
  2. Construct an S3 URL with a %2F in the secret key and call a procedure such as apoc.load.csv.

Simple Dataset (where it's possible)

//Insert here a set of Cypher statements that helps us to reproduce the problem

Steps (Mandatory)

  1. Obtain an AWS secret key with a / in the value
  2. Try to load any S3 url (doesn't have to exist) while passing the credentials as part of the URL (either url-encoded or not)

Screenshots (where it's possibile)

Specifications (Mandatory)

  • The values extracted from S3 URL should be url-decoded

Currently used versions

Versions

  • OS: macOS 11.2.3
  • Neo4j: 4.4.4
  • Neo4j-Apoc: 4.4.0.2
hindog added a commit to hindog/neo4j-apoc-procedures that referenced this issue Apr 8, 2022
@hindog hindog changed the title AWS S3 urls broken if secretKey in url contains '/' Broken handling of AWS S3 urls Apr 8, 2022
hindog added a commit to hindog/neo4j-apoc-procedures that referenced this issue Apr 12, 2022
ncordon added a commit that referenced this issue May 3, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 17, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 17, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
conker84 pushed a commit that referenced this issue May 18, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 24, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 25, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit to vga91/neo4j-apoc-procedures that referenced this issue May 25, 2022
…ib#2725)

* Fixes neo4j-contrib#2724, AWS S3 url handling

* Fixes neo4j-contrib#2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit that referenced this issue May 27, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
vga91 added a commit that referenced this issue May 27, 2022
* Fixes #2724, AWS S3 url handling

* Fixes #2269: apoc.load procedures don't work anymore with urls containing %

* Code formatting

* Adds fix for the getHost problem

Co-authored-by: Giuseppe Villani <[email protected]>
Co-authored-by: Nacho Cordón <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant