Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Header passing fails with header that starts with HTTP_ #590

Closed
ben519 opened this issue May 7, 2019 · 5 comments
Closed

Header passing fails with header that starts with HTTP_ #590

ben519 opened this issue May 7, 2019 · 5 comments
Labels
bug an unexpected problem or unintended behavior help wanted ❤️ we'd love your help!

Comments

@ben519
Copy link

ben519 commented May 7, 2019

I'm using httr to query some data from the Shopify API. Every call I make generates the warning message "In parse_http_status(lines[[1]]) : NAs introduced by coercion". For example,

response <- GET(
  url = "https://test-store.myshopify.com/admin/orders/count.json",
  httr::add_headers(Accept = "application/json"),
  httr::authenticate(user = "abc", password = "def")
)

Warning message:
In parse_http_status(lines[[1]]) : NAs introduced by coercion

It seems that the culprit is this function

parse_http_headers <- function(raw) {
  lines <- strsplit(rawToChar(raw), "\r?\n")[[1]]

  new_response <- grepl("^HTTP", lines)
  grps <- cumsum(new_response)

  lapply(unname(split(lines, grps)), parse_single_header)
}

specifically this bit: new_response <- grepl("^HTTP", lines) that attempts to pick out HTTP status lines from the vector of lines. Shopify's API always returns a header line like "HTTP_X_SHOPIFY_SHOP_API_CALL_LIMIT: 1/40" and this is falsely getting identified as an HTTP status line. Further down, it's getting parsed and an as.integer(status$status) bit it causing the coercion warning.

Can the regex used to identify HTTP status lines be tightened to prevent this false positive? Apologies - I don't know enough about possible HTTP status formats to make a good suggestion.

Thanks!

@ben519 ben519 changed the title "In parse_http_status(lines[[1]]) : NAs introduced by coercion" warning caused by false identification of HTTP response "In parse_http_status(lines[[1]]) : NAs introduced by coercion" warning caused by false identification of HTTP status line May 7, 2019
@hadley hadley added bug an unexpected problem or unintended behavior help wanted ❤️ we'd love your help! labels Apr 3, 2020
@hadley hadley changed the title "In parse_http_status(lines[[1]]) : NAs introduced by coercion" warning caused by false identification of HTTP status line Header passing fails with header that starts with HTTP_ Apr 3, 2020
@sharlagelfand
Copy link

I'm interested in getting this fixed for an API I'm working with (the offending header in my case is HTTP_API_VERSION: v0.19.2) and happy to make a PR, but I also don't know enough about HTTP status formats!

Do status codes always start with HTTP/? Seems like all the ones being tested do (https://github.com/r-lib/httr/blob/master/tests/testthat/test-header.r#L24) so I'm happy to update the offending line to check for that if that's the case - or maybe what's needed is that the line starts with HTTP but not HTTP_, as the updated title indicates?

ty!

@deschen1
Copy link

deschen1 commented Aug 23, 2022

Not sure if of any help. But today I got the exact same warning for another API (Caplena - non-public, so can't share any reprex) that is new, i.e. it never occured before and I didn't change any code.

The specifc API request is (where caplena_key is my API key):

library(httr)
response <- GET(url = paste0("https://api.caplena.com/api/projects/"),
                add_headers("Authorization" = paste0("apikey ", caplena_key)))

Warning message:
In parse_http_status(lines[[1]]) : NAs introduced by coercion

The problem is, if I now want to turn the response object into something readable, I was able to do so previously with:

projects <- content(response, encoding = "UTF-8")

This returned a list obbject I could then process further.

Now, I have to change this to

projects <- content(response, encoding = "UTF-8", type = "application/json")

Any helpful information I should add here from the response objects (e.g. the headers or sth.)?

@deschen1
Copy link

deschen1 commented Aug 23, 2022

One small update. I tested it a bit further and the warning only occurs on Windows machines, NOT on MACs.

My IT suggests that it might indeed be related to this part here:

parse_http_headers <- function(raw) {
  lines <- strsplit(rawToChar(raw), "\r?\n")[[1]]

  new_response <- grepl("^HTTP", lines)
  grps <- cumsum(new_response)

  lapply(unname(split(lines, grps)), parse_single_header)
}

especially the second line, because there is no carriage return on Windows. Or the order of line break and carraige return is different than on Mac, (I didn't fully understand this part).

OK, did some more debugging, and the issue is in httr:::parse_single_header.

Before getting into that function we are in the above function where the results of the lines, new_response and grps is:


lines <- c("HTTP/1.1 200 OK", "Server: nginx/1.18.0", "Date: Tue, 23 Aug 2022 15:50:57 GMT", 
           "Content-Type: application/json", "Vary: Accept, Accept-Language, Origin, Cookie", 
           "Allow: POST, GET", "HTTP_X_CORRELATION_ID: 44398-1", "Content-Language: en", 
           "X-Frame-Options: DENY", "X-Content-Type-Options: nosniff", "Referrer-Policy: same-origin", 
           "X-Protected-By: Sqreen", "Strict-Transport-Security: max-age=31536000; preload", 
           "Content-Encoding: gzip", "Via: 1.1 google", "Alt-Svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", 
           "Transfer-Encoding: chunked", "")

new_response <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

gprs <- c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L)

And the problem is in the second group. In the lapply, the thing that gets fed to the httr:::parse_single_header is:

lines <- c("HTTP_X_CORRELATION_ID: 44398-1", "Content-Language: en", "X-Frame-Options: DENY", 
           "X-Content-Type-Options: nosniff", "Referrer-Policy: same-origin", 
           "X-Protected-By: Sqreen", "Strict-Transport-Security: max-age=31536000; preload", 
           "Content-Encoding: gzip", "Via: 1.1 google", "Alt-Svc: h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", 
           "Transfer-Encoding: chunked", "")

And if we feed it to the httr:::parse_single_header, then the warning occurs directly in the first line where parse_http_status is called:

parse_single_header <- function (lines) 
{
    status <- parse_http_status(lines[[1]])
    ...
}
parse_http_status <- function (x) 
{
  status <- as.list(strsplit(x, "\\s+")[[1]])
  names(status) <- c("version", "status", "message")[seq_along(status)]
  status$status <- as.integer(status$status)
  status
}

where x <- "HTTP_X_CORRELATION_ID: 44398-1"

And then the first line returns a list with two elements, but the names(status) part expects three elements and then wants to turn the status into an integer, which doesn't work here.

So that's what I can contribute and I hope there is something we/you can do about it.

FWIW, here's also the content of the raw object that is required to create the lines in the parse_http_headers function.

raw <- as.raw(c(0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x20, 
0x32, 0x30, 0x30, 0x20, 0x4f, 0x4b, 0x0d, 0x0a, 0x53, 0x65, 0x72, 
0x76, 0x65, 0x72, 0x3a, 0x20, 0x6e, 0x67, 0x69, 0x6e, 0x78, 0x2f, 
0x31, 0x2e, 0x31, 0x38, 0x2e, 0x30, 0x0d, 0x0a, 0x44, 0x61, 0x74, 
0x65, 0x3a, 0x20, 0x54, 0x75, 0x65, 0x2c, 0x20, 0x32, 0x33, 0x20, 
0x41, 0x75, 0x67, 0x20, 0x32, 0x30, 0x32, 0x32, 0x20, 0x31, 0x35, 
0x3a, 0x35, 0x30, 0x3a, 0x35, 0x37, 0x20, 0x47, 0x4d, 0x54, 0x0d, 
0x0a, 0x43, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54, 0x79, 
0x70, 0x65, 0x3a, 0x20, 0x61, 0x70, 0x70, 0x6c, 0x69, 0x63, 0x61, 
0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x6a, 0x73, 0x6f, 0x6e, 0x0d, 0x0a, 
0x56, 0x61, 0x72, 0x79, 0x3a, 0x20, 0x41, 0x63, 0x63, 0x65, 0x70, 
0x74, 0x2c, 0x20, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x4c, 
0x61, 0x6e, 0x67, 0x75, 0x61, 0x67, 0x65, 0x2c, 0x20, 0x4f, 0x72, 
0x69, 0x67, 0x69, 0x6e, 0x2c, 0x20, 0x43, 0x6f, 0x6f, 0x6b, 0x69, 
0x65, 0x0d, 0x0a, 0x41, 0x6c, 0x6c, 0x6f, 0x77, 0x3a, 0x20, 0x50, 
0x4f, 0x53, 0x54, 0x2c, 0x20, 0x47, 0x45, 0x54, 0x0d, 0x0a, 0x48, 
0x54, 0x54, 0x50, 0x5f, 0x58, 0x5f, 0x43, 0x4f, 0x52, 0x52, 0x45, 
0x4c, 0x41, 0x54, 0x49, 0x4f, 0x4e, 0x5f, 0x49, 0x44, 0x3a, 0x20, 
0x34, 0x34, 0x33, 0x39, 0x38, 0x2d, 0x31, 0x0d, 0x0a, 0x43, 0x6f, 
0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x4c, 0x61, 0x6e, 0x67, 0x75, 
0x61, 0x67, 0x65, 0x3a, 0x20, 0x65, 0x6e, 0x0d, 0x0a, 0x58, 0x2d, 
0x46, 0x72, 0x61, 0x6d, 0x65, 0x2d, 0x4f, 0x70, 0x74, 0x69, 0x6f, 
0x6e, 0x73, 0x3a, 0x20, 0x44, 0x45, 0x4e, 0x59, 0x0d, 0x0a, 0x58, 
0x2d, 0x43, 0x6f, 0x6e, 0x74, 0x65, 0x6e, 0x74, 0x2d, 0x54, 0x79, 
0x70, 0x65, 0x2d, 0x4f, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x73, 0x3a, 
0x20, 0x6e, 0x6f, 0x73, 0x6e, 0x69, 0x66, 0x66, 0x0d, 0x0a, 0x52, 
0x65, 0x66, 0x65, 0x72, 0x72, 0x65, 0x72, 0x2d, 0x50, 0x6f, 0x6c, 
0x69, 0x63, 0x79, 0x3a, 0x20, 0x73, 0x61, 0x6d, 0x65, 0x2d, 0x6f, 
0x72, 0x69, 0x67, 0x69, 0x6e, 0x0d, 0x0a, 0x58, 0x2d, 0x50, 0x72, 
0x6f, 0x74, 0x65, 0x63, 0x74, 0x65, 0x64, 0x2d, 0x42, 0x79, 0x3a, 
0x20, 0x53, 0x71, 0x72, 0x65, 0x65, 0x6e, 0x0d, 0x0a, 0x53, 0x74, 
0x72, 0x69, 0x63, 0x74, 0x2d, 0x54, 0x72, 0x61, 0x6e, 0x73, 0x70, 
0x6f, 0x72, 0x74, 0x2d, 0x53, 0x65, 0x63, 0x75, 0x72, 0x69, 0x74, 
0x79, 0x3a, 0x20, 0x6d, 0x61, 0x78, 0x2d, 0x61, 0x67, 0x65, 0x3d, 
0x33, 0x31, 0x35, 0x33, 0x36, 0x30, 0x30, 0x30, 0x3b, 0x20, 0x70, 
0x72, 0x65, 0x6c, 0x6f, 0x61, 0x64, 0x0d, 0x0a, 0x43, 0x6f, 0x6e, 
0x74, 0x65, 0x6e, 0x74, 0x2d, 0x45, 0x6e, 0x63, 0x6f, 0x64, 0x69, 
0x6e, 0x67, 0x3a, 0x20, 0x67, 0x7a, 0x69, 0x70, 0x0d, 0x0a, 0x56, 
0x69, 0x61, 0x3a, 0x20, 0x31, 0x2e, 0x31, 0x20, 0x67, 0x6f, 0x6f, 
0x67, 0x6c, 0x65, 0x0d, 0x0a, 0x41, 0x6c, 0x74, 0x2d, 0x53, 0x76, 
0x63, 0x3a, 0x20, 0x68, 0x33, 0x3d, 0x22, 0x3a, 0x34, 0x34, 0x33, 
0x22, 0x3b, 0x20, 0x6d, 0x61, 0x3d, 0x32, 0x35, 0x39, 0x32, 0x30, 
0x30, 0x30, 0x2c, 0x68, 0x33, 0x2d, 0x32, 0x39, 0x3d, 0x22, 0x3a, 
0x34, 0x34, 0x33, 0x22, 0x3b, 0x20, 0x6d, 0x61, 0x3d, 0x32, 0x35, 
0x39, 0x32, 0x30, 0x30, 0x30, 0x0d, 0x0a, 0x54, 0x72, 0x61, 0x6e, 
0x73, 0x66, 0x65, 0x72, 0x2d, 0x45, 0x6e, 0x63, 0x6f, 0x64, 0x69, 
0x6e, 0x67, 0x3a, 0x20, 0x63, 0x68, 0x75, 0x6e, 0x6b, 0x65, 0x64, 
0x0d, 0x0a, 0x0d, 0x0a))

One last update:

it seems the initial problem why it (now) doesn't work on Windows, but still works on Mac, is that this second http occurence "HTTP_X_CORRELATION_ID: 44398-1" is uppercase on Windows, but lowercase on Mac. And that's why the function gets confused on Windows because it checks for uppercase HTTP in the parse_http_headers function.

@deschen1
Copy link

Investigated a bit further here and first thing is that HTTP headers are case insensitive, so not sure what httr would return if a website uses lowercase (because the grepl in parse_http_status seems to check for uppercase HTTP): https://stackoverflow.com/questions/5258977/are-http-headers-case-sensitive

And second thing is that the parsing of the status purely by the header starting with "http" is problematic. Maybe the header should be split and then checked if any element contains a number that would be a valid status response (e.g. a 200 or 404 or sth.).

@hadley
Copy link
Member

hadley commented Oct 31, 2023

httr has been superseded in favour of httr2, so is no longer under active development. If this problem is still important to you in httr2, I'd suggest filing an issue offer there 😄. Thanks for using httr!

@hadley hadley closed this as completed Oct 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior help wanted ❤️ we'd love your help!
Projects
None yet
Development

No branches or pull requests

4 participants