Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some API providers cannot interpret encoded colons in the path #1567

Closed
ykrods opened this issue Jun 13, 2024 · 6 comments · Fixed by #1569
Closed

Some API providers cannot interpret encoded colons in the path #1567

ykrods opened this issue Jun 13, 2024 · 6 comments · Fixed by #1569

Comments

@ykrods
Copy link
Contributor

ykrods commented Jun 13, 2024

Hi and thank you very much for this project at first.

Basic Info

  • Faraday Version: 2.9.1
  • Ruby Version: 3.3.3

Issue description

By the #1237 improvements, colons in path are now encoded as %3A. However, it is up to the server implementation to accept encoded colons, and in fact Firebase's fcm v1 api responds 404 status for encoded messages%3Asend .

https://firebase.google.com/docs/reference/fcm/rest/v1/projects.messages/send

Currently, encoding occurs only in limited cases like a below reproduction code.

And no error occurs in the following situations.

  1. Give the full url to Faraday.initialize() or post() method
  2. Give the full path to post() (like: client.post("/v1/projects/#{project_id}/messages:send") )

Steps to reproduce

(using ENV for secret values)

require "faraday"


project_id = ENV["PROJECT_ID"]
iid_token = ENV["IID_TOKEN"]

def get_access_token
  ENV["ACCESS_TOKEN"]
end

base_url = "https://fcm.googleapis.com/v1/projects/#{project_id}"
path = "messages:send"


client = Faraday.new(url: base_url) do |builder|
  builder.request :authorization, 'Bearer', get_access_token
  builder.request :json
  builder.response :json,
                   content_type: /application\/json/,
                   parser_options: { symbolize_names: true }
  builder.response :logger,
                   Logger.new(STDOUT),
                   { headers: true, bodies: true, errors: true }
end

res = client.post(path) do |req|
  req.body = {
    validate_only: true,
    message: {
      token: iid_token,
      notification: {
        title: "test",
        body: "test",
      }
    }
  }
end

Execution log

I, [2024-06-13T08:31:07.928238 #1]  INFO -- request: POST https://fcm.googleapis.com/v1/projects/***/messages%3Asend
I, [2024-06-13T08:31:07.928371 #1]  INFO -- request: User-Agent: "Faraday v2.9.1"
Authorization: "Bearer ***"
Content-Type: "application/json"
I, [2024-06-13T08:31:07.928417 #1]  INFO -- request: {"validate_only":true,"message":{"token":"***","notification":{"title":"test","body":"test"}}}
I, [2024-06-13T08:31:08.084930 #1]  INFO -- response: Status 404
I, [2024-06-13T08:31:08.085055 #1]  INFO -- response: content-type: "text/html"
date: "Thu, 13 Jun 2024 08:31:08 GMT"
server: "scaffolding on HTTPServer2"
content-length: "0"
x-xss-protection: "0"
x-frame-options: "SAMEORIGIN"
x-content-type-options: "nosniff"
alt-svc: "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000"
I, [2024-06-13T08:31:08.085115 #1]  INFO -- response:

My thoughts

def build_exclusive_url(url = nil, params = nil, params_encoder = nil)
url = nil if url.respond_to?(:empty?) && url.empty?
base = url_prefix.dup
if url && !base.path.end_with?('/')
base.path = "#{base.path}/" # ensure trailing slash
end
url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
uri = url ? base + url : base
if params
uri.query = params.to_query(params_encoder || options.params_encoder)
end
uri.query = nil if uri.query && uri.query.empty?
uri
end

How about replace URL + string merge to string + string merge

- url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
- uri = url ? base + url : base
+ uri = url ? URI.parse(base.to_s + url) : base

This has the following advantages and disadvantages.

Advantages.

  1. No error if colons are included in the path url.
  2. User can decide whether to encode colons or not.
  3. The path contained in base url is not ignored if the path url starts with slash.

Disadvantages.

  1. Inefficient because base is already parsed to URL.
  2. Error might occur if query or hash is contained in the base url.
  3. Needs to remove first slash because base url is assumed to end with slash.
@iMacTia
Copy link
Member

iMacTia commented Jun 13, 2024

Hi @ykrods and thank you for the thorough investigation!
The escaping actually came out during the review of #1237, although at the time we didn’t find any case where this could cause a problem. Now that you’ve brought one, I really think we should get this fixed.

The underlying issue was described in this comment, so I want to make sure we don’t fall back into that. To check your proposed solution, I’ve run the following:

base = URI.parse('https://service.com/')
url = 'service:search?limit=50&offset=400'
uri = url ? URI.parse(base.to_s + url) : base # this is your proposed solution, which works!
=> #<URI::HTTPS https://service.com/service:search?limit=50&offset=400>

Thank you also for pointing out pros/cons of the proposed solution.
Looking at the disadvantages:

  1. I really don’t think this is gonna be a problem. Running the following benchmark actually shows the proposed solutions to be much faster, actually!
Benchmark.bm do |x|
  x.report do
    10_000.times do
      url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
      uri = url ? base + url : base
    end
  end
  x.report { 10_000.times { URI.parse(base.to_s + url) } }
end
       user     system      total        real
   0.184706   0.021542   0.206248 (  0.206335)
   0.040563   0.001414   0.041977 (  0.041988)
  1. If the base already contains query params or hash, then there should be no URL?
  2. Could you articulate more on this one? I’m not sure I understand which case you’re pointing out here

@ykrods
Copy link
Contributor Author

ykrods commented Jun 13, 2024

@iMacTia

  1. I really don’t think this is gonna be a problem. Running the following benchmark actually shows the proposed solutions to be much faster, actually!

Thanks for benchmarking and very good news!

  1. If the base already contains query params or hash, then there should be no URL?

I know this might not be a common use case, but I concerned about like a below case.

base = URI.parse("https://example.com/search?limit=100")
url = "additional_path"
p URI.parse(base.to_s + url) # ==> <URI::HTTPS https://example.com/search?limit=100additional_path>

However, I found query is removed in url_prefix= method, so it's disadvantage has been resolved.

def url_prefix=(url, encoder = nil)
uri = @url_prefix = Utils.URI(url)
self.path_prefix = uri.path
params.merge_query(uri.query, encoder)
uri.query = nil

  1. Could you articulate more on this one? I’m not sure I understand which case you’re pointing out here

For example, the following code, if patched as is, would result in an extra slash.

con = Faraday.new(url: "http://example.com/v1")
p con.build_exclusive_url("/search")
# ==> <URI::HTTP http://example.com/v1//search>

It seems to me that we could simply remove the leading slash from the non-base url, but I'm concerned that there may be use cases that I'm not recognizing.

@ykrods
Copy link
Contributor Author

ykrods commented Jun 16, 2024

Sorry for changing my mind, but I came up with another, better idea.

As mentioned in #1237, the point of problem is caused by string before the colon (like service below) being parsed as a uri scheme when a url is converted to a URI in base + url.

p URI.parse("service:search").scheme  # ==> "service"

URI.parse is converting uri according to RFC3986 (RFC2396) so it is normal behaviour.

Here, quoting from rfc3986 4.2, is

A path segment that contains a colon character (e.g., "this:that")
cannot be used as the first segment of a relative-path reference, as
it would be mistaken for a scheme name. Such a segment must be
preceded by a dot-segment (e.g., "./this:that") to make a relative-
path reference.

and url can be parsed as a relative path reference by prepending a dot segment as noted above.

base = URI.parse("http://example.com/v1/")
p base + "service:search"    # ==> #<URI::Generic service:search>
p base + "./service:search"  # ==> #<URI::HTTP http://example.com/v1/service:search>

Applying this to the current build_exclusive_url would result in the following code.

-     url = url.to_s.gsub(':', '%3A') if URI.parse(url.to_s).opaque
+     url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
      uri = url ? base + url : base

This seems to be a smaller change than string joins.

references

@iMacTia
Copy link
Member

iMacTia commented Jun 17, 2024

Thank you for pointing that out!
You're right, url could be a full address starting with https://, which would make the proposed solution break.

I agree with your take that it's probably best to keep base as a URI to preserve as much behaviour as possible, so your new proposed solution makes sense 👍

Could you re-run the benchmark I shared above just to make sure this is also in line (or faster) than the current implementation?
And if so, would you be interested in giving it a stub on a new PR 🙂?

@ykrods
Copy link
Contributor Author

ykrods commented Jun 18, 2024

@iMacTia

Thanks for your response!

Below are the benchmark results:

require "uri"
require "benchmark"


base = URI.parse("https://example.com/")

def t1(base, url)
  url = url.to_s.gsub(":", "%3A") if URI.parse(url.to_s).opaque
  uri = url ? base + url : base
end

def t2(base, url)
  url = "./#{url}" if url.respond_to?(:start_with?) && !url.start_with?("http://", "https://", "/", "./")
  uri = url ? base + url : base
end


Benchmark.bm do |x|
  p "case 1: with colon"
  x.report { 10_000.times { t1 base, "service:search" } }
  x.report { 10_000.times { t2 base, "service:search" } }

  p "case 2: without colon"
  x.report { 10_000.times { t1 base, "service/search" } }
  x.report { 10_000.times { t2 base, "service/search" } }
end

output:

       user     system      total        real
"case 1: with colon"
   0.125153   0.000436   0.125589 (  0.125913)
   0.071536   0.000000   0.071536 (  0.071653)
"case 2: without colon"
   0.100534   0.000157   0.100691 (  0.100862)
   0.073407   0.000057   0.073464 (  0.073600)

And I create a new PR: #1569

@iMacTia
Copy link
Member

iMacTia commented Jun 18, 2024

The new solution is faster in both cases, this is great 🎉 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants