Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't use absolute path in redirect for visualizer #2785

Merged
merged 4 commits into from
Sep 24, 2020
Merged

don't use absolute path in redirect for visualizer #2785

merged 4 commits into from
Sep 24, 2020

Conversation

timkpaine
Copy link
Contributor

In reverse proxy setups, you sometimes want to setup sites behind a sub path e.g. my.site.com/luigi/
This PR makes sure the initial redirect is relative to the base_url, not absolute.

@honnix
Copy link
Member

honnix commented Sep 20, 2019

Thanks for the PR. Have you tested this will work for both behind reverse proxy and direct access?

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this isn't testable, then maybe you can add a code comment that explains why there's no "/"?

@timkpaine
Copy link
Contributor Author

will get back to this this week, got distracted

@timkpaine
Copy link
Contributor Author

@honnix works fine for both, @Tarrasch it might be possible to modify this test to test explicitly

def test_visualiser(self):

@honnix
Copy link
Member

honnix commented Oct 1, 2019

@stale
Copy link

stale bot commented Jan 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If closed, you may revisit when your time allows and reopen! Thank you for your contributions.

@stale stale bot added wontfix and removed wontfix labels Jan 30, 2020
@timkpaine
Copy link
Contributor Author

timkpaine commented Jun 4, 2020

@honnix @Tarrasch
finally got around to this and added a test that specifically checks there is no leading slash:

response = self.fetch("/", follow_redirects=False)
self.assertEqual(response.code, 302)
self.assertEqual(response.headers['Location'], 'static/visualiser/index.html')  # assert that doesnt beging with leading slash !

Copy link
Member

@honnix honnix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @Tarrasch Would like you to take another look at this as well?

Copy link
Collaborator

@dlstadther dlstadther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@honnix honnix merged commit d4eb8a6 into spotify:master Sep 24, 2020
@timkpaine timkpaine deleted the reverse-proxy branch September 24, 2020 22:55
@timkpaine
Copy link
Contributor Author

thanks!

@tashrifbillah
Copy link
Contributor

This work missed my eyes. I have reported the need in many places before including issue #2981. A similar issue is #2670. Thanks for the modification. I shall test it and let you know.

@tashrifbillah
Copy link
Contributor

tashrifbillah commented Nov 14, 2020

Unfortunately, the PR didn't work for me. See below a comparison between plotly server and luigi server:

Launching luigid

luigid --port 8083

Confirming luigi is functional w/o proxy

curl -L http://localhost:8083

Configuring nginx proxy

    server {

        root /var/www/html/;


        location /luigi/ {
            proxy_pass http://localhost:8083;
        }

        location /dash/ {
            proxy_pass http://localhost:8084;
        }


    }

Issue

# works gracefully
curl -L http://myIP/dash/

# fails :(
curl -L http://myIP/luigi/

Failure logs:

console log
<html><title>404: Not Found</title><body>404: Not Found</body></html>
nginx/access.log
170.223.221.217 - [13/Nov/2020:18:54:43 -0500] "GET /luigi/ HTTP/1.1" 404 69 "-" "curl/7.29.0" "-"
nginx/error.log
2020/11/13 19:05:09 [notice] 106027#0: signal process started
Tornado log
2020-11-13 19:25:05,597 luigi.scheduler[145038] INFO: Attempting to load state from /home/tb571/luigi-state.pickle
2020-11-13 19:25:05,602 luigi.server[145038] INFO: Scheduler starting up
2020-11-13 19:25:39,761 tornado.access[145038] WARNING: 404 GET /luigi/ (::1) 1.48ms

@timkpaine @honnix

@timkpaine
Copy link
Contributor Author

timkpaine commented Nov 14, 2020

The logs you posted are nginx telling you that the route is missing. Nothing to do with Luigi, are there Luigi logs indicating the process was hit with the request?

@timkpaine
Copy link
Contributor Author

also you installed from master right?

@timkpaine
Copy link
Contributor Author

timkpaine commented Nov 14, 2020

don't you need a trailing slash on proxy_pass http://localhost:8083;? otherwise you're redirecting to localhost:8083/luigi/, and whereas this is a valid dash route, its not a valid luigi route

@tashrifbillah
Copy link
Contributor

also you installed from master right?

Upon release 3.0.2, I applied your change, which is just one / removal.

Can you see my edited comment above for Tornado log?

@timkpaine
Copy link
Contributor Author

timkpaine commented Nov 14, 2020

don't you need a trailing slash on proxy_pass http://localhost:8083;? otherwise you're redirecting to localhost:8083/luigi/, and whereas this is a valid dash route, its not a valid luigi route

https://serverfault.com/questions/562756/how-to-remove-the-path-with-an-nginx-proxy-pass

@tashrifbillah
Copy link
Contributor

Thanks Tim, it looks like the trailing slash is required. So is it the trailing slash that makes sure the redirection is towards http://localhost:8083?

That said, I am going to do a few more tests tonight with other tabs such as /history.

@timkpaine
Copy link
Contributor Author

yeah all of this has nothing to do with luigi, its all nginx configuration. The trailing slash ensures that myip/luigi/a/path -> localhost:8083/a/path. You just happened to get unlucky with the url as dash directs its traffic through /dash anyway.

@tashrifbillah
Copy link
Contributor

tashrifbillah commented Nov 14, 2020

Hello again @timkpaine , unfortunately, we are not set yet. Have you tried to run a job through your modification? See the issue I figured:

nginx.conf

    server {
        root /var/www/html/;
        location /luigi/ {
            proxy_pass http://localhost:8083/;
        }
    }

luigi.cfg:

[core]
default-scheduler-url = http://myIP/luigi/

[scheduler]
record_task_history = True
state_path = ${HOME}/luigi-state.pickle

[task_history]
db_connection = sqlite:///${HOME}/luigi-task-hist.db

Error

Then if you try my simple task, it should run into the following error:

WARNING: Failed connecting to remote scheduler 'http://myIP/luigi'
Traceback (most recent call last):
  File "/tmp/new_luigi/miniconda3/lib/python3.7/site-packages/luigi/rpc.py", line 163, in _fetch
    response = self._fetcher.fetch(full_url, body, self._connect_timeout)
  File "/tmp/new_luigi/miniconda3/lib/python3.7/site-packages/luigi/rpc.py", line 117, in fetch
    resp.raise_for_status()
  File "/tmp/new_luigi/miniconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://myIP/api/add_task

As apparent from http://myIP/api/add_task, I think my nginx URL rewriting isn't right. http://myIP/api/add_task isn't even hitting Luigi. It should be http://localhost:8083/api/add_task.

Any suggestion about the URL rewriting for @proxy_pass in the above nginx.conf?

@timkpaine
Copy link
Contributor Author

I don't use nginx for this but I think you'll have a lot easier time if you set it up as a sub domain rather than a path, e.g. luigi.myip rather than myip/luigi. When you reverse proxy under paths, often times you need a lot more library modifications.

@tashrifbillah
Copy link
Contributor

Did you test the job scheduling part other than visualizer?

@timkpaine
Copy link
Contributor Author

I run jobs from the same machine so my scheduler url is local host. But it's reasonable to assume that the client would still use absolute paths. This would still work fine with a sub domain instead of a path

@tashrifbillah
Copy link
Contributor

tashrifbillah commented Nov 14, 2020

I shall try what you suggested. More info for you--the /history tab in the seemingly functional visualizer is missing style sheets. Did you notice that? Because of reverse proxy, it can't find them.

@tashrifbillah
Copy link
Contributor

Hi @timkpaine , could you at least check if style sheets load for you in the /history tab?

@tashrifbillah
Copy link
Contributor

tashrifbillah commented Dec 10, 2020

I solved the style sheet loading issue by another location block:

    server {
        root /var/www/html/;
        location /luigi/ {
            proxy_pass http://localhost:8083/;
        }

        location /static/ {
            proxy_pass http://localhost:8083/static/;
        }

    }

It is a hack that does not make me so happy but it works.


Edit:

I needed another block to run a job:

         location /api/ {
             proxy_pass http://127.0.0.1:8082/api/;
         }

The fix is just not complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants