Karaf seems to send derivative requests to the wrong place. #1070

Open
alistairjmcintyre opened this issue Mar 27, 2019 · 12 comments
Labels
Type: feature request a proposal for a new feature in the software (should be justified by a ‘use case’)

@alistairjmcintyre

We currently have Islandora 8 running behind an nginx reverse proxy server that takes care of TLS termination.

We enforce HTTPS everywhere which results in 301 or 302 redirects to the https entry, but I'm getting the error below when generating a derivative:

2019-03-28 11:04:52,340 | DEBUG | nnector-houdini] | DefaultErrorHandler | 86 - org.apache.camel.camel-core - 2.19.2 | Failed delivery for (MessageId: ID-staging-idora1-41349-1553724259858-1-11 on ExchangeId: ID--staging-idora1-41349-1553724259858-1-1). On delivery attempt: 2 caught: org.apache.camel.http.common.HttpOperationFailedException: HTTP operation failed invoking http://staging.example.nz/node/5/media/image/18 with statusCode: 302, redirectLocation: https://staging.example.nz/node/5/media/image/18

I'm wondering a couple of things regarding this:

  1. It seems odd that Karaf would blindly assume I want to go back to the front-end proxy. Is there something I'm missing here?
  2. Is there a way to make Karaf/Camel accept 302 redirects?

I'm not particularly adept with Karaf and/or Camel, so it's plausible there's something I'm missing here.

I do have a work-around for this at the moment, but it involves some truly truly evil nginx config and an /etc/hosts entry and that seems like I'm doing it the wrong way.

@whikloj
Member

whikloj commented Mar 28, 2019

That is odd because the response URL (ie http://staging.example.nz/node/5/media/image/18) is provided by your Drupal instance here.

So does your Drupal instance respond to non-SSL requests?

@alistairjmcintyre
Author

Thanks for the quick (and extremely useful) response!

We made some changes yesterday to Drupal 8's reverse proxy settings, namely $settings['reverse_proxy'] = TRUE; and $settings['reverse_proxy_addresses'], which seem to have stopped Karaf from running into the 302 redirect error, so that's a bonus.
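A rough illustration of why those two settings matter: an application behind a TLS-terminating proxy usually honours the X-Forwarded-Proto header only when the request comes from an address it has been told to trust. The sketch below models that decision in Python. It is not Drupal's code, and the function name and arguments are hypothetical.

```python
import ipaddress


def effective_scheme(remote_addr, headers, trusted_proxies):
    """Return the scheme an application would use when building absolute URLs.

    X-Forwarded-Proto is honoured only when the request arrives from an
    address listed in trusted_proxies (single IPs or CIDR ranges).
    """
    peer = ipaddress.ip_address(remote_addr)
    trusted = any(peer in ipaddress.ip_network(net) for net in trusted_proxies)
    if trusted:
        forwarded = headers.get("X-Forwarded-Proto", "").strip().lower()
        if forwarded in ("http", "https"):
            return forwarded
    # TLS is terminated upstream, so the local connection itself is plain HTTP.
    return "http"
```

With the proxy's address trusted, generated URLs would use https and no redirect would be triggered. Without it, the application sees plain HTTP and builds http:// URLs.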

Is there any way to override this setting? It seems redundant to go Karaf -> Proxy -> Drupal when it could go Karaf -> Drupal.

@whikloj
Member

whikloj commented Mar 28, 2019

Except that I think we are doing that (Karaf -> Drupal), but Drupal is telling us to go to the Proxy.

Which indicates that Drupal is setting the post back address to the non-SSL Drupal address when we generate the event for Alpaca, but then when we try to post to that URL Drupal says "Oops, you want the Proxy" (302).

This could be a Drupal 8 problem or perhaps we need to look at the use of Url::fromRoute() and reverse proxies. This might require more investigating.

@whikloj
Member

whikloj commented Mar 28, 2019

When you work against Drupal 8 are you accessing the SSL or non-SSL site?

@alistairjmcintyre
Author

I am accessing Drupal via an SSL site.

So, to try and clarify, these are the exact steps that (I think) are happening here:

  1. I access Drupal via the Reverse Proxy. As far as my browser is concerned I am accessing via HTTPS.
    However, SSL/TLS is terminated by nginx on the Reverse Proxy, meaning all traffic that reaches Drupal is HTTP.
  2. I add a Repository Item with type Image to Drupal
  3. I add an Image with type Original File to that Repository Item
  4. Drupal sees an Original File under a Repository Item of type Image and goes to generate a derivative
  5. The derivative is generated by houdini at http://localhost:8000/houdini on the webserver.
  6. Karaf(?) gets the response from houdini containing the derivative file and makes a PUT request to the Drupal endpoint it was given.
  7. The request goes to https://example.nz/the/put/route/here (as an example), which goes through the Reverse Proxy and back to Drupal.

I am more than likely missing some parts here, but this is my understanding of it. It's the 'goes through the reverse proxy' part of Step 7 that feels redundant to me.

@kayakr
Contributor

kayakr commented Mar 28, 2019

@alistairjmcintyre islandora.media_source_put_to_node is specified in web/modules/contrib/islandora/islandora.routing.yml
path: '/node/{node}/media/{media_type}/{taxonomy_term}'
That should be a Drupal URL unless there's some rewriting going on?
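To make the point concrete: the route only defines a path, so the scheme and host of the final PUT URL come from whatever base URL the site believes it has. A minimal Python sketch follows; the helper name and the way the base URL is passed in are assumptions for illustration, not how Drupal builds URLs internally.

```python
ROUTE_PATH = "/node/{node}/media/{media_type}/{taxonomy_term}"


def build_put_url(base_url, node, media_type, taxonomy_term):
    """Join a base URL with the route path from islandora.routing.yml."""
    path = ROUTE_PATH.format(
        node=node, media_type=media_type, taxonomy_term=taxonomy_term
    )
    return base_url.rstrip("/") + path
```

For example, build_put_url("http://staging.example.nz", 5, "image", 18) gives the URL seen in the Karaf log, while an https base URL gives the proxied address.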

@whikloj
Member

whikloj commented Mar 28, 2019

@alistairjmcintyre What I think is happening is:

  1. You access Drupal via the Reverse Proxy at https://localhost (for example)
    However, SSL/TLS is terminated by nginx on the Reverse Proxy, meaning all traffic that reaches Drupal is HTTP.
  2. You add a Repository Item with type Image to Drupal (say node/3)
  3. You add an Image with type Original File to that Repository Item
    Drupal sees an Original File under a Repository Item of type Image and goes to generate a derivative
  4. The Event is generated with a post back URL of http://localhost/node/3/media/image/18 and sent to Alpaca (on Karaf).
  5. The derivative is generated by houdini at http://localhost/houdini and sent back to Alpaca.
  6. Alpaca (Karaf) makes a PUT request to http://localhost/node/3/media/image/18
  7. Drupal says "Sorry all requests should go through our reverse proxy available at httpS://localhost/node/3/media/image/18"
  8. Alpaca dies.

So what we probably need to do is handle the 302 better and have Alpaca just try again at the redirected URL.
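A minimal sketch of that retry idea, in Python rather than Camel so it stays self-contained. The `send` callable is a hypothetical stand-in for whatever HTTP client is used; it takes (method, url, body) and returns (status, headers). Re-sending a PUT on a 301/302 is a deliberate choice here, since many HTTP clients do not do this automatically for non-GET requests.

```python
from urllib.parse import urljoin

# 303 is left out on purpose: it asks the client to switch to GET.
REDIRECT_CODES = {301, 302, 307, 308}


def put_following_redirects(send, url, body, max_redirects=3):
    """PUT body to url, retrying at the Location header on a redirect.

    Returns (status, final_url). Raises RuntimeError if the redirect
    chain is longer than max_redirects.
    """
    for _ in range(max_redirects + 1):
        status, headers = send("PUT", url, body)
        location = headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return status, url
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("too many redirects")
```

The redirect cap guards against a loop in which the proxy and Drupal keep bouncing the request between them.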

@alistairjmcintyre
Author

I'm with you up until about Step 7, except derivatives are definitely being generated and Karaf logs ( /opt/karaf/data/logs/camel.log ) are not showing any errors.

Drupal knows it's behind a Reverse Proxy ( https://medium.com/@lmakarov/drupal-8-and-reverse-proxies-the-base-url-drama-c5553cbc9a3e proved to be an invaluable resource for this ), so it knows the base URL of the website to be 'https://staging.example.nz', which I guess is the one that Karaf is using.

It seems like it would be more efficient to go directly back to Drupal, without having to jump through the proxy.

@whikloj
Member

whikloj commented Mar 29, 2019 via email

@alistairjmcintyre
Author

My apologies for the confusion.

Initially (when I made this ticket) I had an awful, evil nginx config that only allowed HTTP traffic via the webserver to the proxy. Without that, I would get the error about a 302 redirect in Karaf.

We've since learned a few things about Reverse Proxying and Drupal 8 that we didn't know, tweaked relevant settings both in nginx and Drupal and the 302s aren't an issue at all now (although something to be aware of for anyone who's going to put Islandora behind a reverse proxy).

The real problem here is that Karaf is routing traffic to the wrong place. It shouldn't be routing traffic from Karaf to the Proxy and then on to Drupal when Karaf and Drupal are on the same machine but the Proxy is on another.

Currently it's functional, but I don't think it's expected behavior that it would require 2 network hops rather than talking to another service on the same machine.

@dannylamb
Contributor

Alpaca is pretty naive when it comes to this. It really doesn't know anything at all about the urls it uses. It's straight up told where to fetch and put everything with info in the message it reads from the queue. We generate that message using a Drupal action, so in theory it's totally possible to monkey with that PUT url to get things right. We'd just have to figure out how best to do it without interfering with non-TLS-terminating setups. Either some hardcoded special case logic or maybe let modules alter the message before it goes to the queue?
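As a sketch of the "let modules alter the message" option, in Python for brevity: the message is built as usual, then passed through any registered alter callbacks before it would go to the queue. The flat message shape and the `destination_uri` key are hypothetical placeholders, not the real event format.

```python
def build_message(put_url, alterers=()):
    """Build the queue message, then let each alterer adjust it in turn."""
    message = {"destination_uri": put_url}  # hypothetical message shape
    for alter in alterers:
        message = alter(dict(message))  # each alterer gets its own copy
    return message


def use_internal_base(public_base, internal_base):
    """Return an alterer that swaps the public base URL for an internal one."""
    def alter(message):
        uri = message["destination_uri"]
        if uri.startswith(public_base):
            message["destination_uri"] = internal_base + uri[len(public_base):]
        return message
    return alter
```

With no alterers registered the message is unchanged, which is what would keep non-TLS-terminating setups unaffected.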

BTW thanks for linking that article @alistairjmcintyre, it was super informative. Learn something new every day...

@whikloj
Member

whikloj commented Apr 11, 2019

This seems like a feature which (if the services are on the same machine) would allow you to replace the hostname of the machine with localhost to avoid leaving the machine. I think this is a worthwhile feature to investigate.
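A minimal sketch of what such a hostname replacement could look like, written in Python for illustration. The function name, its parameters, and the idea of a configured set of public hostnames are all assumptions. Whether Drupal would accept requests addressed to localhost also depends on its virtual-host and trusted_host_patterns configuration.

```python
from urllib.parse import urlsplit, urlunsplit


def localize_url(url, public_hosts, local_netloc="localhost", local_scheme="http"):
    """Rewrite url to a local address when its host is one of public_hosts.

    Path, query and fragment are preserved. URLs for other hosts are
    returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in public_hosts:
        return url
    return urlunsplit(
        (local_scheme, local_netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, with public_hosts set to {"staging.example.nz"}, the PUT URL https://staging.example.nz/node/5/media/image/18 would become http://localhost/node/5/media/image/18.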

@whikloj whikloj added this to the 1.x milestone Apr 11, 2019
@kstapelfeldt kstapelfeldt added Type: feature request a proposal for a new feature in the software (should be justified by a ‘use case’) and removed architecture labels Sep 25, 2021
5 participants