Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerate timestamps #317

Closed
tunetheweb opened this issue Nov 4, 2019 · 14 comments · Fixed by #363 or #783
Closed

Autogenerate timestamps #317

tunetheweb opened this issue Nov 4, 2019 · 14 comments · Fixed by #363 or #783
Assignees
Labels
development Building the Almanac tech stack enhancement New feature or request
Milestone

Comments

@tunetheweb
Copy link
Member

tunetheweb commented Nov 4, 2019

As part of #302 and #308 we added Article Structured Data which depends on having a Published Date (required or it fails the Google Structured Data Testing Tool) and a Last Updated Date (recommended by the Google Structured Data Testing Tool). These have been implemented in the chapter markdown files as meta data. For example:

---
part_number: IV
chapter_number: 20
title: HTTP/2
description: HTTP/2 chapter of the 2019 Web Almanac covering adoption and impact of HTTP/2, HTTP/2 Push, HTTP/2 Issues, and HTTP/3
authors: [bazzadp]
reviewers: [bagder, rmarx, dotjs]
published: 2019-11-04T12:00:00+00:00:00
last_updated: 2019-11-04T12:00:00+00:00:00 
---

This allows for different timestamps for different publication dates and also for different timestamps for the translations which might be published at different dates - hence why I put it in the markdown.

At present these dates must be manually maintained which @rviscomi and I have been discussing.

I think that makes sense for the Published Date as this should be set once and forgotten about, but for the last_updated this is more error prone and people may forget to update them when changing the contents.

It should not be generated as part npm run generate as the contents my not have changed for every chapter (or even any chapter!) when it is run, and also generic formatting changes, which are not specific to this chapter (e.g. when base_chapter.html or chapter.html is updated) probably don't count as a "real edit".

Requirements:

  • Auto update last_updated meta data in chapter md files when a merge is accepted into that chapter’s markdown. Is it possible to do this as part of a GitHub action?

The format currently has to be as per above example, however this will be converted as part of npm run generate as per #345 so less of a concern (and we should use an easier format for authors!).

  • Bonus points 1 - have a global "earliest publish date" per year and auto flag any date (published or last_updated) before then (or when a date is not present?) to prevent errors creeping in and prevent the merge. Set this global constant to 2019-11-04T12:00:00+00:00:00 for now and then we can update it later once we have an actual publish date. This has been implemented as part of Auto generate sitemap #345
  • Bonus points 2 - also do the non-chapter pages (Home, methodology, ToC, Contributors) though these may be trickier as just embedded in HTML at present.

@mikegeyser any thoughts on this?

@rviscomi rviscomi added development Building the Almanac tech stack enhancement New feature or request labels Nov 4, 2019
@rviscomi rviscomi added this to the SHIP IT! milestone Nov 4, 2019
This was referenced Nov 4, 2019
@mikegeyser
Copy link
Contributor

How about adding a pre-commit hook that updates the last_updated in the markdown files? Another option (perhaps simpler) is to use git itself as a part of npm run generate, something like this:

>> git log -1 --pretty="format:%ci" content/en/2019/performance.md | cat
2019-11-05 22:33:38 -0800%

We can do something dodgy, like call out to the git cli from node?

@tunetheweb
Copy link
Member Author

I'm happy either way.

I'm not that familiar with git, pre-commit hooks, the cli...etc. so think it would be good if you could take this on @mikegeyser ?

Good thing is that as part of #345 we will no longer just dump this date into the files as is, and we use JavaScript to read and format the date, so that gives us more options of converting the date to the format we need from that format that git spits out.

Is that date specific to the repo? Or will the value (and format) change depending on the local timezone of whoever runs npm run generate? Hopefully it's repo-specific so it'll be the same for everyone.

@mikegeyser
Copy link
Contributor

The git commit has timezone information as a part of it.

@mikegeyser
Copy link
Contributor

Are we ok with the standard toISOString() timestamps? So 2019-11-04T18:46:53.000Z instead of 2019-11-04T18:46:53+00:00:00.

@tunetheweb
Copy link
Member Author

tunetheweb commented Nov 7, 2019

Are we ok with the standard toISOString() timestamps? So 2019-11-04T18:46:53.000Z instead of 2019-11-04T18:46:53+00:00:00.

I think so. https://developers.google.com/search/docs/data-types/article#non-amp-sd has this to say:

The date and time the article was first published, in ISO 8601 format.

And the wikipedia link says both are fine.

For the ultimate test, I also tested it in the Structured Data Testing Tool (you can load a URL like my http2 chapter and then edit it) and it accepted both formats!

@rviscomi
Copy link
Member

rviscomi commented Nov 7, 2019

Reopening to add one feature request. I just ran npm run generate for https://github.com/HTTPArchive/almanac.httparchive.org/pull/364/files and was surprised to see unrelated chapters getting their last modified updated. Would it be possible to only update the timestamp on chapters that have been changed? A few reasons:

  • it makes reviewing PRs more difficult because there are ~40 diffs
  • the files haven't actually changed so semantically it's not accurate
  • it might confuse crawlers to say something has changed when it hasn't

@tunetheweb
Copy link
Member Author

Was just looking at this. Suspect it's in part from other branches and other forks. As we update the .md files with each generation, they become part of the commit. Need a better way of doing this.

@tunetheweb
Copy link
Member Author

I think we should revert this commit for now.

Or at the very least comment out this line for now:

let { generate_last_updated } = require('./generate_last_updated');

@mikegeyser
Copy link
Contributor

I'm so sorry, I could have sworn it was working correctly. :(

@mikegeyser
Copy link
Contributor

Oh, I see where the flawed logic is now. :( Updating the file, even just to update the timestamp, will mean that becomes the latest commit - updating the timestamp. This approach could never work if we change source files, but could possibly work if we only use it to generate the timestamps in the html.

Again, I'm really sorry for the mistake.

@tunetheweb
Copy link
Member Author

I really think the way to do this is with a Git Action (that I'm not familiar with).

On merge to master:

  • fire up a node terminal
  • git checkout
  • git update timestamp
  • npm run generate
  • git commit
  • auto deploy to website if no issues

The CI/CD dream perfect for automating future edits/corrections.

But not one for launch.

@rviscomi rviscomi removed this from the SHIP IT! milestone Nov 8, 2019
@rviscomi rviscomi added this to the Après Ski milestone Nov 8, 2019
@tunetheweb
Copy link
Member Author

tunetheweb commented Nov 10, 2019

BTW if you change this line:

const command = `git log -1 --date=iso-strict-local ${path} | cat`;

From this:

const command = `git log -1 --date=iso-strict-local ${path} | cat`;

To this:

const command = `git log -1 --date=iso-strict-local master ${path} | cat`;

Then at least it gets the master commit time rather than local branch commit time.

Three issues with this:

  1. It won't work from forks with a merge request - you must run npm run generate from this repo.
  2. Need to ensure your local master is up to date.
  3. You need to run npm run generate after all commits.

So not a great solution either but thought I'd throw it out there in case want to do this short term in next week or two.

@tunetheweb
Copy link
Member Author

@ethomson as you've been helpful on that other issue, have you any thoughts on how best to tackle this one (it was mentioned in that other issue you responded on, but it has much bigger scope that this little one).

To summarise above, we basically want to automatically update a timestamp in a file when a pull request is accepted for that file. The timestamp should be set to the date that the pull request was merged (or perhaps when submitted if that's easier) - but it should only be changed on files changed in that PR, and btw not all files have that timestamp meta data. Let me know if that doesn't make sense and can provide more details with examples.

@tunetheweb
Copy link
Member Author

Ignore this @ethomson. Thanks for you help on the other thread! will have a play.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Building the Almanac tech stack enhancement New feature or request
Projects
None yet
3 participants