Mirroring #1081

Closed
wants to merge 2 commits into from

Member

@mjpost mjpost commented Nov 18, 2020

This PR is for discussion of mirroring (#295).

The minor modifications here are in place at http://anthology.aclweb.org/, which was pretty easy to set up. However, a few minor problems remain:

  • The /anthology subdirectory appears to be hard-coded in some places (I had to add a symlink from anthology. to get things working)
  • Setting the prefix in bin/anthology/data.py may be a bit too hidden. Also, if it's empty, PDFs (strangely) link to domains, e.g., http://2020.emnlp-main.1/
  • We should be sure to mark all links with the Anthology canonical URLs, so that indexing sites will correctly manage the redundancy. Alternatively, we could turn off crawling.
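
To illustrate the canonical-URL point, here is a minimal sketch of emitting a `<link rel="canonical">` tag for a paper page. The function name `canonical_link_tag` and the `CANONICAL_BASE` value are assumptions made for this example; they are not taken from the Anthology code base, and the real templates may build this differently.

```python
from html import escape

# Assumed canonical location of the Anthology (hypothetical constant for this sketch).
CANONICAL_BASE = "https://www.aclweb.org/anthology"


def canonical_link_tag(paper_id: str, canonical_base: str = CANONICAL_BASE) -> str:
    """Build a <link rel="canonical"> tag for a paper page.

    A mirror would emit this tag in every page head so that crawlers
    attribute the page to the canonical site rather than to the mirror.
    """
    if not paper_id:
        raise ValueError("paper_id must not be empty")
    # Normalise slashes so the result has exactly one separator on each side.
    href = canonical_base.rstrip("/") + "/" + paper_id.strip("/") + "/"
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("2020.emnlp-main.1"))
```

The alternative mentioned in the last bullet, turning off crawling, would typically be handled with a `robots.txt` on the mirror instead of per-page tags.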

@mjpost mjpost requested review from akoehn and a team November 18, 2020 05:21
Member

akoehn commented Nov 18, 2020

I had a look at the work needed for mirroring yesterday and think that I have a pretty good idea of what to change.

We essentially need to keep track of two URLs: the canonical one and the mirror. I'll push a PR with changes later today, or I can push into this PR.
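
A rough sketch of what tracking the two URLs could look like, assuming a small configuration object. The class name `SiteUrls`, its fields, and the example base URLs are hypothetical and only illustrate the idea; the actual changes may be structured quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteUrls:
    """Holds the two base URLs a mirror has to know about (hypothetical helper)."""

    canonical_base: str  # where the Anthology officially lives
    mirror_base: str     # where this particular copy is served from

    def __post_init__(self) -> None:
        # Reject empty or relative bases early, so a missing prefix cannot
        # silently produce malformed links.
        for name in ("canonical_base", "mirror_base"):
            value = getattr(self, name)
            if not value.startswith(("http://", "https://")):
                raise ValueError(f"{name} must be an absolute URL, got {value!r}")

    @staticmethod
    def _join(base: str, path: str) -> str:
        return base.rstrip("/") + "/" + path.lstrip("/")

    def local_url(self, path: str) -> str:
        """URL for links that should stay on this mirror (pages, PDFs)."""
        return self._join(self.mirror_base, path)

    def canonical_url(self, path: str) -> str:
        """URL to advertise to indexing sites and in citations."""
        return self._join(self.canonical_base, path)


if __name__ == "__main__":
    urls = SiteUrls(
        canonical_base="https://www.aclweb.org/anthology",  # assumed value
        mirror_base="http://anthology.aclweb.org",
    )
    print(urls.local_url("2020.emnlp-main.1.pdf"))
    print(urls.canonical_url("2020.emnlp-main.1/"))
```

Keeping both bases in one place also makes the site independent of directory depth, since no path component such as `/anthology` is assumed by the joining logic.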

Member Author

mjpost commented Nov 18, 2020

Sounds great. Directory-depth invariance will be important to have working, since it seems we are actually going to be moving hosting to anthology.aclweb.org. Moving off aclweb will give us separation, a better hosting service, and also a free backup.

Member

akoehn commented Nov 18, 2020

> it seems we are actually going to be moving hosting to anthology.aclweb.org

While I like subdomains in principle, do we really want to move away from the canonical URLs that we have? This seems to be a prime source of confusion, even if we create redirects everywhere.

The Anthology URLs are nearly the only URLs on the aclweb.org site that have a very good reason to stay the way they are; why not migrate future CPU-heavy dynamic stuff somewhere else instead?

This is not a complete argument; please just let us discuss this before creating facts on the ground!

Member Author

mjpost commented Nov 19, 2020

We can defer discussion about permanent hosting. In the meantime, I think it'd be good to:

  • Continue hosting on aclweb.org
  • Get at least one permanent mirror in place (perhaps hosted at aclanthology.org, which we own)

Note that we continue to have problems with bluehost; for example, the most recent build failed to deploy with this:

Run rsync -aze "ssh -o StrictHostKeyChecking=accept-new" --delete build/anthology/ $PUBLISH_TARGET
ssh: connect to host aclweb.org port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(235) [sender=3.1.2]
Error: Process completed with exit code 12.

Member

akoehn commented Nov 19, 2020

Yeah, I saw that. Maybe let's have a chat about infrastructure stuff sometime in the near future (I'm available at UTC+1 times).

I think I have finished all the necessary changes to host a mirror (currently testing) and now only need the code that actually mirrors the PDFs. That should not be too much work as nearly everything is already in place for that.

Member

akoehn commented Nov 20, 2020

Short update: the URL logic for mirroring works as intended; I will push some stuff next week.

Member Author

mjpost commented Dec 9, 2020

Closed in favor of #1124.

@mjpost mjpost closed this Dec 9, 2020
@mjpost mjpost deleted the mirror branch June 4, 2021 19:54