
Static site generator #1825

Open · veganstraightedge opened this issue Oct 24, 2020 · 6 comments
@veganstraightedge (Contributor)

We're still going to run a Rails app for the .com and the CMS.

A static snapshot of the site could serve as a read-only mirror.

This issue is about creating a way to generate a static version of the site, which could then be hosted just about anywhere.

  • One version would continue to hotlink images, etc. from wherever the .com serves them.
  • Another (optional) version would also host all of the images, etc. alongside the static site (and rewrite the image tags? see the sketch below).
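
A minimal sketch of what that image-tag rewriting could look like, assuming Nokogiri and a hypothetical local assets/ layout (none of this exists yet):

```ruby
# Hypothetical post-processing step: rewrite remote image URLs in a
# crawled HTML file to point at locally mirrored copies.
# File paths and the assets/ layout are assumptions for illustration.
require "nokogiri"
require "uri"

html = File.read("mirror/article.html")
doc  = Nokogiri::HTML(html)

doc.css("img").each do |img|
  src = img["src"]
  next unless src&.start_with?("http")

  # e.g. https://cdn.example/images/foo.jpg -> assets/foo.jpg
  img["src"] = File.join("assets", File.basename(URI(src).path))
end

File.write("mirror/article.html", doc.to_html)
```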
@goncalopereira commented Nov 4, 2020

As a proof of concept, I ran a crawler (wget with mirror settings), followed by a script to sanitise the data.

I was able to get a partial static read-only copy of the production website.
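
Roughly speaking, the crawl was along these lines (the exact flags aren't recorded here, so treat this as an approximation of "wget with mirror settings", rate limited and restricted to the crimethinc domain):

```sh
# Approximate mirror crawl (flags are an assumption, not the exact POC command):
# rate limited, restricted to crimethinc.com, links rewritten for static serving.
wget --mirror \
     --convert-links \
     --adjust-extension \
     --page-requisites \
     --wait=1 --random-wait \
     --domains=crimethinc.com \
     https://crimethinc.com/
```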

For brevity:

  • Not all pages are included (I ran it rate limited so as not to impact the site, and did not let it complete).

  • Restricted it to the crimethinc domain so it is not ingesting other domains, e.g. podcast episodes (lite., cloudfront., etc.); this could be rethought.

  • No changes to the website app.

  • Can be scheduled and maintained as a separate tool.

  • We can keep historical copies.

  • We can keep copies with and without large external media.

  • We can run it closer to the production server for faster backups.

Pain points for CDN/caching/mirror:

  • Last-Modified header is not present consistently on every file
  • ETag header is missing
  • Mixed content found
  • Font links with ? in the URL (Netlify is not a fan)

POC AWS http://ctmirror.s3-website.eu-west-2.amazonaws.com/
POC Netlify https://eloquent-swirles-d1e8f8.netlify.app/

@veganstraightedge (Contributor, issue author)

This is awesome! @goncalopereira

Great start! What're the next steps? What are the open questions to consider?

@goncalopereira

I think a second opinion would be great. I can create a PR with the ongoing scripts; I need to figure out the project structure for it.

I think the questions are:

  • How to run it against prod without affecting it or being blocked (or can we get a prod DB?)

  • What mirrors are we supporting?

  • What subdomains or external websites need caching?

  • Fixing headers in prod would make it more efficient (as would prod caching in itself); see the sketch below.

  • Fix mixed content on prod if possible.
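
On the headers point, a minimal sketch of what emitting ETag/Last-Modified could look like with Rails' built-in conditional GET helpers (the controller and model names are assumptions, not the app's actual code):

```ruby
# Hypothetical controller sketch: send Cache-Control, ETag, and Last-Modified
# so the crawler, CDNs, and mirrors can revalidate cheaply with 304s.
# ArticlesController/Article are illustrative names, not the app's real ones.
class ArticlesController < ApplicationController
  def show
    @article = Article.find(params[:id])

    # Publicly cacheable for an hour (tune for the CDN/mirror schedule).
    expires_in 1.hour, public: true

    # Sets ETag and Last-Modified; returns 304 Not Modified when they match.
    fresh_when etag: @article, last_modified: @article.updated_at, public: true
  end
end
```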

@anarcat commented Feb 24, 2022

I don't think crawling is the best way to think about this, because then you have to recrawl everything (or parts? or what? it's hard to decide!) whenever content changes.

What some dynamic sites do is internalize the "crawler", or more accurately, the static generation: each page rendering is stored on disk, which doubles as a fast cache and helps under denial-of-service conditions. I worked on Drupal sites in the past that used boost.module to do this, but it didn't work well for creating a static site copy. I think there's something better for Drupal now, but that's irrelevant since you don't use Drupal. :p (Django can similarly drive static sites.)

So I guess the question, IMHO, is how to do this caching thing but with Rails as a backend. I frankly have close to zero experience coding in Rails, but a few searches gave me this documentation, where "page cache" certainly looks interesting.
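
For illustration, page caching now lives in the actionpack-page_caching gem; a minimal sketch of wiring it up (ArticlesController and the public/cached directory are assumptions, not existing app code):

```ruby
# Gemfile
gem "actionpack-page_caching"

# config/application.rb: directory where rendered HTML files are written
config.action_controller.page_cache_directory = "#{Rails.root}/public/cached"

# Hypothetical controller: the first request to /articles/:id writes
# public/cached/articles/<id>.html; later requests can be served straight
# from disk by the web server, or the directory rsynced off as a static copy.
class ArticlesController < ApplicationController
  caches_page :show

  def show
    @article = Article.find(params[:id])
  end
end
```

The web server still has to be told to check that directory before hitting Rails, and something has to expire or rebuild pages when content changes, so it doesn't remove the "crawl everything" question entirely.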

Note that you'd still have to have something that crawls the entire site (maybe? or maybe Rails is magic and will do that on its own?), but the difference is that you then have a server-side archive that you can more easily distribute, and that's a trusted copy you don't necessarily need to refresh all the time. Whenever you post something new, as soon as someone reads it, it gets cached and added to the pile.

This beats recrawling everything all the time...

@veganstraightedge (Contributor, issue author)

Thanks @anarcat.

I agree that an internal static site generator is also a good idea.
We already do a fair bit of caching in Rails land, but that depends on having a big Redis server running, and it isn't easy to hand off to another person/place hosting a copy of the site.

IMO, a happy medium would be if the Rails CMS generated static files/folders of the site, then shipped them off-site somewhere, both as files/folders ready to serve as a static site and as a gzipped tarball for others to download and mirror if needed (see the rough sketch at the end of this comment).

Being able to easily spin up a new Rails/Postgres/Redis stack would be nice to have too, but that's not as easy for many people in many situations as running a classic static web server.
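
To make that concrete, a rough sketch of such an export task, assuming a hypothetical Article model and a lib/tasks/static_export.rake file (none of this reflects actual app code):

```ruby
# lib/tasks/static_export.rake (hypothetical)
# Renders each article to an HTML file under tmp/static_export, then packs
# the result into a tarball ready to ship off-site or hand to a mirror.
require "fileutils"

namespace :static do
  desc "Export a static copy of the site and tar it up"
  task export: :environment do
    out = Rails.root.join("tmp", "static_export")
    FileUtils.mkdir_p(out)

    # Article and the articles/show template are assumptions about the app.
    Article.find_each do |article|
      html = ApplicationController.render(
        template: "articles/show",
        assigns: { article: article },
        layout: "application"
      )
      path = out.join("articles", "#{article.id}.html")
      FileUtils.mkdir_p(path.dirname)
      File.write(path, html)
    end

    # Archive for download/mirroring alongside the plain files.
    tarball = Rails.root.join("tmp", "static_export.tar.gz").to_s
    system("tar", "-czf", tarball, "-C", out.to_s, ".") or abort("tar failed")
  end
end
```

Either artifact (the folder or the tarball) could then be pushed wherever the mirror lives, e.g. S3 or Netlify as in the POC above.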

@anarcat commented Feb 25, 2022 via email
