
Remove symlinks and serve static files with Python #4550

Closed
ericholscher opened this issue Aug 21, 2018 · 4 comments
Labels
Feature New feature Needed: design decision A core team decision is required Operations Operations or server issue
ericholscher commented Aug 21, 2018

Currently all of our documentation pages are served by Nginx. We maintain this with a web of symlinks that is pretty error prone and hard to maintain. We also throw a large number of errors in these code paths, trying to manipulate the filesystem in wonky ways.

We already have a pattern that solves this problem: Sendfile. It allows us to handle processing of the request in Python, but still have Nginx serve the file.

This logic would be a combination of our current redirect logic, which doesn't hit the DB, and existing Sendfile support. These nginx docs cover the usage: https://www.nginx.com/resources/wiki/start/topics/examples/x-accel/ -- we already do this in a couple of places.

The primary difference is that we need to be able to do it without hitting the database.
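The X-Accel pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual Read the Docs code: the `serve_docs` function, the URL layout, and the `/proxied/` internal location name are all assumptions.

```python
# Minimal sketch of the X-Accel-Redirect ("Sendfile") pattern: the Python
# application decides WHICH file to serve, Nginx does the actual serving.
# The "/proxied/" prefix is a hypothetical internal Nginx location, e.g.:
#
#     location /proxied/ {
#         internal;
#         alias /home/docs/user_builds/;
#     }

def serve_docs(request_path):
    """Return response headers telling Nginx which file to serve internally."""
    return {
        # Nginx intercepts this header and serves the file itself.
        "X-Accel-Redirect": "/proxied" + request_path,
        # Empty Content-Type lets Nginx pick the MIME type from the extension.
        "Content-Type": "",
    }

headers = serve_docs("/pip/en/latest/index.html")
print(headers["X-Accel-Redirect"])  # /proxied/pip/en/latest/index.html
```

The key property for the requirement below is that nothing in this path needs a database query: the mapping from URL to header is pure string manipulation.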

Benefits

  • We remove the symlink code which is quite complex and not valuable
  • We are able to do "real" redirects, not just on 404 pages
  • We move logic from nginx into Python, and give ourselves a lot more flexibility.

Considerations

  • Should we just be moving all our static file serving to cloud files/S3, instead of managing them on disk?
  • Is it worth all the work if we don't get additional user benefits beyond redirects?

Requirements

  • All static files must continue being served without hitting the database

Implementation

  • Write more data into the metadata.json for each project, allowing us to make more decisions without hitting the database (existing code: https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/projects/tasks.py#L1125)
  • Write a small Python proxy that reads metadata.json and then serves the correct file off disk. We would only need to keep the user_builds directory around, and the Python app would be in charge of translating the URL to the filepath to serve, accounting for subprojects, translations, etc.
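The URL-to-filepath translation step could look something like this. The metadata.json schema shown here is hypothetical (the real file is written by the build task linked above and may differ), and `resolve_filepath` only handles the plain and translated cases; subprojects would follow the same idea.

```python
import json
import posixpath

# Hypothetical metadata.json contents -- the real schema may differ.
METADATA = json.loads("""
{
  "slug": "pip",
  "default_version": "latest",
  "subprojects": {},
  "translations": {"es": "pip-es"}
}
""")

def resolve_filepath(metadata, url_path, builds_root="/home/docs/user_builds"):
    """Translate a docs URL into a path under user_builds, with no DB query.

    Sketch only: handles "/<lang>/<version>/<page>" URLs, mapping translated
    languages to their own project slug.
    """
    lang, version, *page = url_path.strip("/").split("/")
    # A translation maps to a different project slug on disk.
    project = metadata["translations"].get(lang, metadata["slug"])
    page_path = "/".join(page) or "index.html"
    return posixpath.join(builds_root, project, "rtd-builds", version, page_path)

print(resolve_filepath(METADATA, "/en/latest/install.html"))
# /home/docs/user_builds/pip/rtd-builds/latest/install.html
print(resolve_filepath(METADATA, "/es/latest/"))
# /home/docs/user_builds/pip-es/rtd-builds/latest/index.html
```

The resolved path would then go into the X-Accel-Redirect header for Nginx to serve.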
agjohnson (Contributor) commented:
So I'm:

  • +0 on dropping symlinks. I'm 👍 on the idea, but weighing the work required to remove symlinks and develop features for our redirect application, I feel like this will be another distraction from building product features.
  • +0 on moving files to s3/azure. We can sendfile to an external azure storage URL, but perhaps we should first explore serving docs directly from blob storage. Serving from storage blobs is a difficult problem, and it might not even be possible with the amount of additional logic we need (application redirects, etc). We don't get all the benefits if we sendfile to blob storage. But if this doesn't work, sendfile to Azure blob could be a great option to reduce storage duplication.

Instead of reimplementing serve_docs, could we serve docs through Django, but add operations pieces like caching or a CDN in front? This would only be acceptable if we can ensure cache serving is seamless when the database goes down or latency increases. The benefit here is that we don't have additional work on our application.

humitos commented Jan 18, 2019

My position here is to make this move in two phases:

  1. Remove all symlinks and serve files from our disks using the NGINX header: this is probably still a good amount of work, but we will clean up the code a lot by removing hacky decisions, and enable other features such as better redirects.

  2. Serve files from blob storage: once phase 1 is completed, we could work on all the infrastructure needed to upload the files to blob storage and start exploring that path (without breaking our existing serving), and when we have something testable, we can just switch where the NGINX header points to.
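The "switch where the header points" idea from phase 2 could be sketched as two variants of the same internal Nginx location. This is illustrative config only; the location name, paths, and storage hostname are made up.

```nginx
# Phase 1: the application's X-Accel-Redirect header resolves to local disk.
location /proxied/ {
    internal;
    alias /home/docs/user_builds/;
}

# Phase 2: same header from the application, but the internal location now
# proxies to blob storage instead of aliasing local disk (hostname is
# illustrative):
#
# location /proxied/ {
#     internal;
#     proxy_pass https://example.blob.core.windows.net/builds/;
# }
```

The application code would be unchanged between the two phases; only the Nginx side moves.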

ericholscher (Member, Author) commented:
we can just switch where the NGINX header points to.

This only works on internal files. We could proxy to an external file host, but that would add a decent bit of latency. Probably <30ms, but worth thinking about. This is what packages.python.org is doing currently w/ S3, so might be worth asking them how it's working.

humitos commented Apr 28, 2019

We could proxy to an external file host, but that would add a decent bit of latency. Probably <30ms, but worth thinking about

I think we decided to go in this direction, all together with "El Proxito" (an app that will receive all the requests, translate a URL into the path of that file in blob storage, and proxy that file).

I'm closing this issue here. We can revisit it if we need it when implementing El Proxito.

@humitos humitos closed this as completed Apr 28, 2019