Long query URL gives error in oa_request() but works in browser #216

Open
rkrug opened this issue Mar 8, 2024 · 7 comments
Labels
bug Something isn't working

@rkrug

rkrug commented Mar 8, 2024

I have an extremely long search query which works in the browser.

But when running

library(devtools)
#> Loading required package: usethis
library(openalexR)
#> Thank you for using openalexR!
#> To acknowledge our work, please cite the package by calling `citation("openalexR")`.
#> To suppress this message, add `openalexR.message = suppressed` to your .Renviron file.
oa_request(
    query_url = "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22+OR+%22energy+reform%22+OR+%22decarbonization%22+OR+%22Eco-friendly+Subsidies%22+OR+%22Green+Subsidies%22+OR+%22Polluter+Pays+Principle%22+OR+%22Environmental+Externalities%22+OR+%22Biodiversity+Offsetting%22+OR+%22Conservation+Finance%22+OR+%22Payment+for+Ecosystem+Services%22+OR+%22Agri-environmental+Schemes%22+OR+%22Cross-compliance%22+OR+%22Eco-taxes%22+OR+%22Sustainable+Agriculture+Incentives%22+OR+%22Carbon+Pricing%22+OR+%22Biodiversity+Credits%22+OR+%22Habitat+Banking%22+OR+%22Rewilding+Incentives%22+OR+%22Green+Bonds%22+OR+%22Ecological+Fiscal+Transfers%22+OR+%22Renewable+Energy+Subsidies%22+OR+%22Water+Quality+Trading%22+OR+%22Sustainable+Fisheries+Subsidies%22+OR+%22Green+Certification+Schemes%22+OR+%22Conservation+Easements%22+OR+%22Environmental+Impact+Bonds%22+OR+%22Climate+Smart+Agriculture%22+OR+%22Natural+Capital+Financing%22+OR+%22Bioenergy%22+OR+%22Forest+Carbon+Credits%22+OR+%22Blue+Carbon+Initiatives%22+OR+%22Green+Public+Procurement%22+OR+%22Integrated+Pest+Management+Incentives%22+%22Wildlife+Corridors+Funding%22+OR+%22Biodiversity+Banking%22+OR+%22Climate+Adaptation+Finance%22+OR+%22Deforestation+Reduction+Programs%22+OR+%22Environmental+Risk+Assessment%22+OR+%22Green+Infrastructure+Investments%22+OR+%22High+Conservation+Value+Incentives%22+OR+%22Landscape+Restoration+Funds%22+OR+%22Marine+Protected+Areas+Support%22+OR+%22Natural+Resource+Management%22+OR+%22Organic+Farming+Subsidies%22+OR+%22Permaculture+Design+Grants%22+OR+%22Pollination+Services+Payments%22+OR+%22Protected+Area+Financing%22+OR+%22Regenerative+Agriculture+Support%22+OR+%22Sustainability+Linked+Loans%22+OR+%22Urban+Greening+Grants%22+OR+%22Wetlands+Restoration+Funding%22+OR+%22Zero+Emission+Vehicle+Incentives%22+OR+%22Adaptive+Management+Practices%22+OR+%22Biodiversity+Informatics%22+OR+%22Climate+Bonds%22+OR+%22Debt-for-Nature+Swap%22+OR+%22Ec
osystem-Based+Adaptation%22+OR+%22Forest+Stewardship+Council+Certification%22+OR+%22Greenhouse+Gas+Inventory%22+%22Habitat+Restoration+Grants%22+OR+%22Invasive+Species+Control+Funding%22+OR+%22Land+Degradation+Neutrality+Fund%22+OR+%22Mitigation+Banking%22+OR+%22Non-Timber+Forest+Product+Incentives%22+%22Ocean+Acidification+Research+Grants%22+OR+%22Pollinator+Habitat+Enhancement%22+OR+%22Renewable+Energy+Certificates%22+OR+%22Soil+Health+Improvement+Programs%22+OR+%22Tree+Planting+Campaigns%22+OR+%22Wildlife+Management+Areas%22+OR+%22Biodiversity+Strategy+and+Action+Plans%22+OR+%22Circular+Economy+Initiatives%22+OR+%22Disaster+Risk+Reduction+Funding%22+OR+%22DRR+Funding%22+OR+%22Ecosystem+Valuation%22+OR+%22Fisheries+Improvement+Projects%22+OR+%22Green+Job+Training+Programs%22+OR+%22Holistic+Management+Funding%22+OR+%22Indigenous+Peoples%27+Biodiversity+Conservation%22+OR+%22Landscape+Connectivity+Projects%22+OR+%22Mangrove+Restoration+Initiatives%22+OR+%22Nature-based+Solutions%22+OR+%22Organic+Certification+Cost+Share%22+OR+%22Peatland+Restoration+and+Management%22+OR+%22Quantitative+Easing+for+the+Planet%22+OR+%22Riparian+Buffer+Zones+Support%22+OR+%22Sustainable+Land+Management%22+OR+%22Threatened+Species+Recovery+Plans%22+OR+%22Urban+Biodiversity+Enhancement%22+OR+%22Vertical+Farming+Incentives%22+OR+%22Water+Efficiency+Programs%22+OR+%22Xeriscaping+Rebates%22+OR+%22Youth+Engagement+in+Conservation%22+OR+%22Zero-waste+Strategies%22+OR+%22Agrobiodiversity+Conservation+Subsidies%22+OR+%22Biochar+Production+Incentives%22+OR+%22Climate+Resilience+Building%22+OR+%22Drought+Management+Assistance%22+OR+%22Eco-labeling+Programs%22+OR+%22Functional+Biodiversity+Promotion%22+OR+%22Green+Supply+Chain+Financing%22+OR+%22Hedgerow+Restoration+Support%22+OR+%22Integrated+Water+Resources+Management+Funding%22+OR+%22Jungle+Restoration+Projects%22",
    verbose = TRUE
)
#> Error: lexical error: invalid char in json text.
#>                                        <html>   <head>     <title>Bad 
#>                      (right here) ------^

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2024-03-08
#>  pandoc   3.1.12.2 @ /opt/homebrew/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cachem        1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  curl          5.2.1   2024-03-01 [1] CRAN (R 4.3.1)
#>  devtools    * 2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
#>  httpuv        1.6.14  2024-01-26 [1] CRAN (R 4.3.1)
#>  httr          1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
#>  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  later         1.3.2   2023-12-06 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
#>  mime          0.12    2021-09-28 [1] CRAN (R 4.3.0)
#>  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
#>  openalexR   * 1.2.3   2023-11-16 [1] CRAN (R 4.3.1)
#>  pkgbuild      1.4.3   2023-12-10 [1] CRAN (R 4.3.1)
#>  pkgload       1.3.4   2024-01-16 [1] CRAN (R 4.3.1)
#>  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
#>  promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [1] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [1] CRAN (R 4.3.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp          1.0.12  2024-01-09 [1] CRAN (R 4.3.1)
#>  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.0)
#>  reprex        2.1.0   2024-01-11 [1] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.3.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  shiny         1.8.0   2023-11-17 [1] CRAN (R 4.3.1)
#>  stringi       1.8.3   2023-12-11 [1] CRAN (R 4.3.1)
#>  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.0)
#>  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
#>  usethis     * 2.2.3   2024-02-19 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.0)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#> 
#>  [1] /Users/rainerkrug/R/library/aarch64-apple-darwin20/4.3
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2024-03-08 with reprex v2.1.0

@yjunechoe
Collaborator

yjunechoe commented Mar 8, 2024

Wow, this one is really, really weird. The problem isn't even about the length of the query string. Minimal reprex:

query_substr <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22"
oa_request(query_substr)
#> Warning in oa_request(query_substr): No records found!
#> list()
httr::GET(query_substr)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22]
#>   Date: 2024-03-08 18:53
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 332 kB
#> {"meta":{"count":1717,"db_response_time_ms":222,"page":1,"per_page":25,"groups_count":null},"results":[{"id...

This happens because httr::GET(), for some reason, mangles the URL when we specify query = ...: in the response URL, the : and + come back re-encoded as %3A and %2B. So with our per-page=1 default:

httr::GET(query_substr, query = list(`per-page` = 1))
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search%3A%22Agriculture%2Breform%22%2BOR%2B%22ocean%2Breform%22&per-page=1]
#>   Date: 2024-03-08 18:57
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 115 B
#> {"meta":{"count":0,"db_response_time_ms":68,"page":1,"per_page":1,"groups_count":null},"results":[],"group_...
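
As an illustration, the two URLs above can be compared as plain strings in base R, without sending a request. This is only a sketch: it shows that, apart from the appended per-page parameter, the only change is that ":" became %3A and "+" became %2B. In form-style query strings "+" conventionally stands for a space while %2B is a literal plus sign, so this may be why the re-encoded search matches nothing. That last point is an assumption about how the API parses the query and is not confirmed in this thread.

```r
# URL as passed in, and the URL httr reports after adding query = list(`per-page` = 1).
# Both strings are copied from the outputs above.
original <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22"
mangled  <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search%3A%22Agriculture%2Breform%22%2BOR%2B%22ocean%2Breform%22&per-page=1"

# Undo only the two extra escapes: %3A back to ":" and %2B back to "+".
restored <- gsub("%2B", "+", gsub("%3A", ":", mangled, fixed = TRUE), fixed = TRUE)

# TRUE means the re-encoding of ":" and "+" is the only difference
# besides the appended per-page parameter.
same_otherwise <- identical(restored, paste0(original, "&per-page=1"))
print(same_otherwise)
```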

Essentially, GET() sees the " already encoded as %22, so it does not escape it with a backslash.

So instead of this url from above:

bad_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search%3A%22Agriculture%2Breform%22%2BOR%2B%22ocean%2Breform%22&per-page=1"

GET() should instead be sending something like this:

good_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%5C%22Agriculture+reform%5C%22+OR+%5C%22ocean+reform%5C%22&per-page=1"
httr::GET(good_url)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%5C%22Agriculture+reform%5C%22+OR+%5C%22ocean+reform%5C%22&per-page=1]
#>   Date: 2024-03-08 19:33
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 9.69 kB
#> {"meta":{"count":35789,"db_response_time_ms":338,"page":1,"per_page":1,"groups_count":null},"results":[{"id...

One hacky way around that is to add the backslash character (%5C) and ensure that it decodes before GET() sees it:

httr::GET(
  URLdecode(gsub("%22", "%5C%22", bad_url))
)
#> Response [https://api.openalex.org/works?page=1&filter=title_and_abstract.search:\"Agriculture+reform\"+OR+\"ocean+reform\"&per-page=1]
#>   Date: 2024-03-08 19:30
#>   Status: 200
#>   Content-Type: application/json
#>   Size: 9.69 kB
#> {"meta":{"count":35789,"db_response_time_ms":338,"page":1,"per_page":1,"groups_count":null},"results":[{"id...
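
Another possible route, sketched here in base R, is to append the extra parameter to the URL string by hand so that nothing goes through query = at all. This relies on the earlier observation that httr::GET(query_substr) without query = leaves the URL untouched. append_query_param() is a hypothetical helper, not part of httr or openalexR, and whether the API then returns the expected records has not been verified here.

```r
# Hypothetical helper (not part of httr or openalexR): append one parameter
# to an already-encoded URL without touching the existing query string.
append_query_param <- function(url, name, value) {
  # Use "&" if the URL already has a query string, "?" otherwise.
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, name, "=", utils::URLencode(as.character(value), reserved = TRUE))
}

query_substr <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22"
full_url <- append_query_param(query_substr, "per-page", 1)
print(full_url)

# The request would then be sent without `query =`, for example:
# httr::GET(full_url)
```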

So for your reprex, you can reformat your URL by escaping the quotes:

query_url <- "https://api.openalex.org/works?page=1&filter=title_and_abstract.search:%22Agriculture+reform%22+OR+%22ocean+reform%22+OR+%22energy+reform%22+OR+%22decarbonization%22+OR+%22Eco-friendly+Subsidies%22+OR+%22Green+Subsidies%22+OR+%22Polluter+Pays+Principle%22+OR+%22Environmental+Externalities%22+OR+%22Biodiversity+Offsetting%22+OR+%22Conservation+Finance%22+OR+%22Payment+for+Ecosystem+Services%22+OR+%22Agri-environmental+Schemes%22+OR+%22Cross-compliance%22+OR+%22Eco-taxes%22+OR+%22Sustainable+Agriculture+Incentives%22+OR+%22Carbon+Pricing%22+OR+%22Biodiversity+Credits%22+OR+%22Habitat+Banking%22+OR+%22Rewilding+Incentives%22+OR+%22Green+Bonds%22+OR+%22Ecological+Fiscal+Transfers%22+OR+%22Renewable+Energy+Subsidies%22+OR+%22Water+Quality+Trading%22+OR+%22Sustainable+Fisheries+Subsidies%22+OR+%22Green+Certification+Schemes%22+OR+%22Conservation+Easements%22+OR+%22Environmental+Impact+Bonds%22+OR+%22Climate+Smart+Agriculture%22+OR+%22Natural+Capital+Financing%22+OR+%22Bioenergy%22+OR+%22Forest+Carbon+Credits%22+OR+%22Blue+Carbon+Initiatives%22+OR+%22Green+Public+Procurement%22+OR+%22Integrated+Pest+Management+Incentives%22+%22Wildlife+Corridors+Funding%22+OR+%22Biodiversity+Banking%22+OR+%22Climate+Adaptation+Finance%22+OR+%22Deforestation+Reduction+Programs%22+OR+%22Environmental+Risk+Assessment%22+OR+%22Green+Infrastructure+Investments%22+OR+%22High+Conservation+Value+Incentives%22+OR+%22Landscape+Restoration+Funds%22+OR+%22Marine+Protected+Areas+Support%22+OR+%22Natural+Resource+Management%22+OR+%22Organic+Farming+Subsidies%22+OR+%22Permaculture+Design+Grants%22+OR+%22Pollination+Services+Payments%22+OR+%22Protected+Area+Financing%22+OR+%22Regenerative+Agriculture+Support%22+OR+%22Sustainability+Linked+Loans%22+OR+%22Urban+Greening+Grants%22+OR+%22Wetlands+Restoration+Funding%22+OR+%22Zero+Emission+Vehicle+Incentives%22+OR+%22Adaptive+Management+Practices%22+OR+%22Biodiversity+Informatics%22+OR+%22Climate+Bonds%22+OR+%22Debt-for-Nature+Swap%22+OR+%22Ecosy
stem-Based+Adaptation%22+OR+%22Forest+Stewardship+Council+Certification%22+OR+%22Greenhouse+Gas+Inventory%22+%22Habitat+Restoration+Grants%22+OR+%22Invasive+Species+Control+Funding%22+OR+%22Land+Degradation+Neutrality+Fund%22+OR+%22Mitigation+Banking%22+OR+%22Non-Timber+Forest+Product+Incentives%22+%22Ocean+Acidification+Research+Grants%22+OR+%22Pollinator+Habitat+Enhancement%22+OR+%22Renewable+Energy+Certificates%22+OR+%22Soil+Health+Improvement+Programs%22+OR+%22Tree+Planting+Campaigns%22+OR+%22Wildlife+Management+Areas%22+OR+%22Biodiversity+Strategy+and+Action+Plans%22+OR+%22Circular+Economy+Initiatives%22+OR+%22Disaster+Risk+Reduction+Funding%22+OR+%22DRR+Funding%22+OR+%22Ecosystem+Valuation%22+OR+%22Fisheries+Improvement+Projects%22+OR+%22Green+Job+Training+Programs%22+OR+%22Holistic+Management+Funding%22+OR+%22Indigenous+Peoples%27+Biodiversity+Conservation%22+OR+%22Landscape+Connectivity+Projects%22+OR+%22Mangrove+Restoration+Initiatives%22+OR+%22Nature-based+Solutions%22+OR+%22Organic+Certification+Cost+Share%22+OR+%22Peatland+Restoration+and+Management%22+OR+%22Quantitative+Easing+for+the+Planet%22+OR+%22Riparian+Buffer+Zones+Support%22+OR+%22Sustainable+Land+Management%22+OR+%22Threatened+Species+Recovery+Plans%22+OR+%22Urban+Biodiversity+Enhancement%22+OR+%22Vertical+Farming+Incentives%22+OR+%22Water+Efficiency+Programs%22+OR+%22Xeriscaping+Rebates%22+OR+%22Youth+Engagement+in+Conservation%22+OR+%22Zero-waste+Strategies%22+OR+%22Agrobiodiversity+Conservation+Subsidies%22+OR+%22Biochar+Production+Incentives%22+OR+%22Climate+Resilience+Building%22+OR+%22Drought+Management+Assistance%22+OR+%22Eco-labeling+Programs%22+OR+%22Functional+Biodiversity+Promotion%22+OR+%22Green+Supply+Chain+Financing%22+OR+%22Hedgerow+Restoration+Support%22+OR+%22Integrated+Water+Resources+Management+Funding%22+OR+%22Jungle+Restoration+Projects%22"
query_url2 <- gsub("%22", "%5C%22", query_url)

This still errors, though now for a different reason. The URL is just genuinely too long:

cat(rawToChar(
  httr::GET(query_url2)$content
))
#> <html>
#>   <head>
#>     <title>Bad Request</title>
#>   </head>
#>   <body>
#>     <h1><p>Bad Request</p></h1>
#>     Request Line is too large (4468 &gt; 4094)
#>   </body>
#> </html>
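
A rough pre-flight check along these lines could flag such URLs before they are sent. This is a minimal sketch in base R: request_line_length() and warn_if_too_long() are hypothetical helpers, the 4094 limit is taken from the error message above, and the way the server counts the request line is assumed to be "GET <path and query> HTTP/1.1".

```r
# Limit taken from the error message above; other servers may use a different value.
request_line_limit <- 4094L

# Hypothetical helper: approximate the HTTP request line "GET <path?query> HTTP/1.1".
request_line_length <- function(url, method = "GET") {
  # Drop scheme and host, since only the path and query appear in the request line.
  target <- sub("^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]+", "", url)
  if (!nzchar(target)) target <- "/"
  nchar(paste(method, target, "HTTP/1.1"), type = "bytes")
}

# Hypothetical helper: warn when the estimated request line exceeds the limit.
warn_if_too_long <- function(url, limit = request_line_limit) {
  len <- request_line_length(url)
  if (len > limit) {
    warning(
      sprintf("Request line is about %d bytes, above the limit of %d.", len, limit),
      call. = FALSE
    )
  }
  invisible(len)
}

short_url <- "https://api.openalex.org/works?page=1"
warn_if_too_long(short_url)  # well under the limit, so no warning is raised
```

Calling warn_if_too_long() on the full query URL before the request would show whether it is the length, rather than the encoding, that triggers the Bad Request page.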

Overall I'm completely stumped though. I have no idea why this is an issue and whether this is on our end, OA's end, httr's end, etc.

@rkrug
Author

rkrug commented Mar 9, 2024

Hm. What about using the opportunity to move to httr2? That would exclude one possible culprit.

Also, I could try to get somebody from OA to look at it. Maybe their log files would help?

@yjunechoe
Collaborator

Switching over to httr2 would indeed be nice but it'll require more than just rewriting code, and I currently don't have the bandwidth for this. I'll keep the issue in mind, but for now the workaround above should do.

@yjunechoe
Collaborator

Sorry, just for completeness: what function call generated the long query URL you originally posted? Was it spit out by oa_query() (if so, what were the inputs)?

@rkrug
Author

rkrug commented Mar 11, 2024

I got the URL from the OpenAlex web interface. If I remember correctly, the original search term did not work via openalexR (same symptoms as a URL that is too long, but probably something different; by the way, it would be nice to give a warning if the URL might be too long), so I tried the API to find out by how much. But there it worked. So I copied the API call back into the openalexR call, which is where it did not work.

@rkrug
Author

rkrug commented Mar 11, 2024

> Switching over to httr2 would indeed be nice but it'll require more than just rewriting code

Could you elaborate? Why do you say that? I agree that a switch to httr2 opens the possibility to make some breaking changes (openalexR2), but why do you say that is necessary?

@yjunechoe
Collaborator

> Could you elaborate? Why do you say that? I agree that a switch to httr2 opens the possibility to make some breaking changes (openalexR2), but why do you say that is necessary?

Oh - it's not necessary to switch over at all! I just meant that if we were to, it would require quite a bit of work.

trangdata added the bug (Something isn't working) label Jul 3, 2024