
Migrate bots to GitHub Actions #11851

Open

timvandermeij opened this issue Apr 27, 2020 · 26 comments

@timvandermeij
Contributor

timvandermeij commented Apr 27, 2020

PDF.js has two dedicated bots for testing. However, this is all custom and maintained manually. This is already a problem today since the bots run quite old OS versions that lack a (modern enough) Python 3 version, which for example blocks the ttx upgrade (see #11802).

We could unblock this work by upgrading the OS versions, but in the longer term it would be better for the project to move towards a managed solution such as GitHub Actions. The latter integrates well with GitHub and our existing jobs, and a quick read of the docs suggests that we could create a botio-like flow where whitelisted users comment on PRs to trigger actions (builds).
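
As a very rough sketch (the file name, trigger phrase, and user list below are illustrative assumptions, not a concrete proposal), such a comment-triggered workflow could look like this:

```yaml
# .github/workflows/bot-test.yml -- hypothetical file name
name: Bot test

on:
  issue_comment:
    types: [created]

jobs:
  test:
    # Only react to a "/botio test" comment on a pull request from an allowlisted user.
    if: >
      github.event.issue.pull_request &&
      contains(github.event.comment.body, '/botio test') &&
      contains(fromJSON('["timvandermeij", "Snuffleupagus", "brendandahl"]'), github.event.comment.user.login)
    runs-on: ubuntu-latest
    steps:
      # Note: issue_comment workflows run against the default branch; checking out
      # the pull request's head commit requires an extra API lookup.
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx gulp test # placeholder for the actual bot command
```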

In terms of our builds, I mainly see a challenge for the browser tests since they are stateful, so we need to be able to fetch/store the reference images somewhere. I'm thinking this could be done with an S3 bucket, letting GitHub manage the credentials for it.

Let's investigate if these solutions could work for us and how much logic we have to maintain ourselves. The objective is to use existing solutions as much as possible, so we don't have to build/maintain them ourselves, and to improve reliability.

@brendandahl brendandahl changed the title Investigate if moving away from the bots is beneficial Migrate bots to GitHub Actions Jul 28, 2021
@brendandahl
Contributor

For the long-term health of the project I think it would be good to migrate all testing to GitHub Actions. As Tim mentioned, we can store refs in S3; alternatively, Marco mentioned that GitHub Actions artifacts can be stored for 90 days, so we could set up a cron job to maintain them.

@timvandermeij
Contributor Author

I think using GitHub artifacts might also be a good, and perhaps even easier, solution since everything then stays within the GitHub ecosystem and we don't have to manage another AWS account for S3. The cron job should be doable.

@timvandermeij
Contributor Author

Time permitting, I'll try to set up a basic workflow to run the bot commands using GitHub Actions, without artifact retention for now, so we can see how/if it works and go from there.

@timvandermeij
Contributor Author

timvandermeij commented Jul 31, 2021

I have been working on migrating the preview task from the bot to GitHub Actions in timvandermeij@fe60c41, and while it mostly works, I did run into the limitation that build artifact files can't seem to be accessed individually, at least through the UI, since they are all combined into one ZIP file. There are actions to post a comment on a PR with a link to the artifact, but if we can't get a direct link that opens the viewer in the browser, that'll be inconvenient. Something similar will be needed for the reftest analyzer when migrating the reference testing task.
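
For reference, posting such a notification comment from a workflow step could look roughly like this (a sketch using actions/github-script; the URL is a placeholder, since obtaining a real direct link is exactly the unsolved part):

```yaml
      - name: Post a preview link on the pull request
        uses: actions/github-script@v7
        with:
          script: |
            // Placeholder URL -- this is the part that still needs S3 (or similar) hosting.
            const previewUrl = `https://example.com/preview/${context.issue.number}/web/viewer.html`;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `Preview of this pull request: ${previewUrl}`,
            });
```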

This limitation is already reported in actions/upload-artifact#14, so given that GitHub doesn't really "host" the artifacts separately, we might still need to use an S3 bucket to make the preview/reftest analyzer available in the browser from a direct link, similar to how Botio does it. We can then also store the reference images there, avoiding the need for a cronjob.

@brendandahl Would it be possible for you to set up an S3 bucket and configure the required secrets for accessing it in GitHub's secret manager? Once that's done, we can test uploading our artifacts and reference images there. See https://dev.to/johnkevinlosito/deploy-static-website-to-s3-using-github-actions-4a0e for how to do this.

@timvandermeij
Contributor Author

timvandermeij commented Aug 8, 2021

I now have a working implementation of Botio's lint command in a GitHub Actions workflow in commit timvandermeij@d37aa7d. It works the same as Botio: if a specific comment is placed on a pull request by an authorized user, it will trigger the command and put notification comments on the pull request. timvandermeij#5 (comment) is an example of how that looks (first a commit that failed linting, then one that made it pass).

I chose the lint command since it's the easiest one to set up the issue comment infrastructure with, because we don't have to bother with running browsers. However, while Botio has a linting command, I'm not sure if we should also introduce it in the new setup, given that linting already runs in the CI workflow for every push. I don't think there is ever a point where we want to run linting on-demand since it already happens (and if it fails, we can simply retrigger the CI build). Therefore, once I get the browsers working, we can start introducing workflows for all Botio tasks that don't require assets, since asset storage is pending on the S3 point above (basically everything except for the reference tests).

I'm glad that we now know that we can have the same behavior as Botio with the workflows. Overall this migration will make the build infrastructure nicely managed and easier to work with than the current bots. Issues like #11802 will be easier to address too because the build infrastructure is managed centrally and a simple pull request will be enough to update it, so we don't need direct machine access anymore.

@Snuffleupagus
Collaborator

One thing that has always annoyed me a little bit with the botio setup is the need to have separate bot-specific tasks for all the tests we want to run (look at the gulp-tasks whose names start with "bot...").

In the new "GitHub Actions" setup, would it be possible to use command line flags together with the "normal" gulp-tasks instead (to reduce duplication)?
For example, instead of having to run gulp bottest, could we run gulp test --bot and have createTestSource directly check for that flag?

@timvandermeij
Contributor Author

I don't see why not. In fact, I'm not really sure why the separate tasks are even needed in the current Botio setup, given that I don't see anything stopping us from using gulp test --bot at https://github.com/mozilla/botio-files-pdfjs/blob/master/on_cmd_test.js#L42. Botio or GitHub Actions will simply run the command we provide, so if the Gulpfile can handle that argument we're good.

@brendandahl
Contributor

@brendandahl Would it be possible for you to set up an S3 bucket and configure the required secrets for accessing it in GitHub's secret manager? Once that's done, we can test uploading our artifacts and reference images there. See https://dev.to/johnkevinlosito/deploy-static-website-to-s3-using-github-actions-4a0e for how to do this.

Yeah, we should be able to do that. Let me look into getting it set up.

@brendandahl
Contributor

@timvandermeij I've added an S3 bucket. I've also added the following repository secrets:

AWS_ACCESS_KEY_ID
AWS_REGION
AWS_S3_BUCKET
AWS_SECRET_ACCESS_KEY
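
For what it's worth, consuming those secrets from a workflow could look roughly like this (a sketch; the build directory and key prefix are assumptions):

```yaml
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Upload the preview build to S3
        # The AWS CLI is preinstalled on the hosted Ubuntu runners.
        run: >
          aws s3 sync build/generic/
          "s3://${{ secrets.AWS_S3_BUCKET }}/preview/${{ github.event.issue.number }}/"
          --delete
```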

@timvandermeij
Contributor Author

Nice; thank you! It will require some more tweaking of my initial GitHub Actions workflows, but I think we now have all the pieces of the puzzle to implement all features that our current bots have.

@calixteman
Contributor

Will we have access (using SSH or whatever) to the machines where the actions are run?
I ask because a few times I had to hack directly on them.

@timvandermeij
Contributor Author

Not that I know of. GitHub Actions' workers get destroyed after the job is done, just like at Travis CI and other CI services. We can save artifacts in the S3 bucket if need be, though.

@marco-c
Contributor

marco-c commented May 26, 2022

There are things like https://github.com/marketplace/actions/debugging-with-ssh that would allow us to access the GH Actions machines in an interactive fashion through ssh.
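
For example, a failing job could be kept alive for interactive debugging with a step along these lines (a sketch; mxschmitt/action-tmate is one such action, not necessarily the exact one linked above):

```yaml
      - name: Open an interactive debugging session on failure
        # Pauses the job and prints the SSH/tmate connection details in the job log.
        if: failure()
        uses: mxschmitt/action-tmate@v3
```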

@marco-c
Contributor

marco-c commented May 26, 2022

@timvandermeij do you think you will have time to continue this?

@timvandermeij
Contributor Author

Sadly I won't have much time in the short term. If you want to have it done faster, feel free to take the proof-of-concept work in #11851 (comment) and go from there. It mainly needed some more thought, given that GitHub did not have a way of inheriting parts of jobs, so for example the list of allowed users would have to be duplicated in every job file, which seems suboptimal. Moreover, we need to test whether storing the reference images in S3 works for the browser tests.

@Snuffleupagus
Collaborator

Snuffleupagus commented Apr 2, 2023

Re-reading the discussion above, it seems that most of the "problems" are around testing in particular, so could we perhaps make easier progress here by ignoring that part for now?
Basically, could we start by only moving tasks like building the gh-pages branch and the pdfjs-dist library from the bots and into GitHub Actions instead?

@timvandermeij
Contributor Author

timvandermeij commented Apr 29, 2023

The source code for the existing commands of the bots can be found at https://github.com/mozilla/botio-files-pdfjs. Each of those commands should get an equivalent GitHub Actions workflow before we can shut down the bots.

@Snuffleupagus
Collaborator

Snuffleupagus commented Apr 29, 2023

  • on_release.js: publishes a new release to the pdfjs-dist repository and NPM.

That seems like the single most important one to me, given all of the problems we've had with recent releases. (Also, we really "need" to do a new release soon given some of the standard-font data stuff being worked on now.)

  • on_cmd_lint.js: runs the linting command.

#11851 (comment) talks about possibly just removing that one, which is something that I'd fully support.

@marco-c
Contributor

marco-c commented Oct 25, 2023

Thinking about this again, it might be easier to store reference images in a GitHub repository. This way we can easily access images from any run of makeref, and we don't need to set up anything in S3 or GCS.

timvandermeij added a commit to timvandermeij/botio-files-pdfjs that referenced this issue Nov 11, 2023
Nowadays GitHub Actions runs linting on every push, so there is no need
for this separate command anymore as discussed in
mozilla/pdf.js#11851.
@timvandermeij
Contributor Author

timvandermeij commented Nov 12, 2023

Thinking about this again, it might be easier to store reference images in a GitHub repository. This way we can easily access images from any run of makeref, and we don't need to set up anything in S3 or GCS.

I think that could work, but then we still need a location for storing the cached PDF files, and especially the linked PDFs are difficult. We don't want to download them again for every run, but we also can't put them in a Git repository (for the same reason we currently don't, which is why we have the concept of linked PDFs in the first place). Currently we cache them on the filesystem of the bots, and we will need some equivalent option for GitHub Actions.

If we don't want to rely on S3, then GitHub Actions can also store artifacts (see https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts), but problems are the 90-day retention limit and the fact that the artifacts are most likely public. Or do you perhaps have other suggestions here?

(It has been a while since I worked on it so I'm not sure if the secrets listed in #11851 (comment) are still valid, but because we have recently been making more progress in moving bot actions to GitHub Actions workflows I figured this becomes relevant again.)

@marco-c
Contributor

marco-c commented Nov 12, 2023

What if we used a private github repository? It would be essentially the same as a local filesystem.

@timvandermeij
Contributor Author

timvandermeij commented Nov 18, 2023

Yes, I think that can work. We basically have two requirements for storage:

  • Private storage: cache of the PDF files used in reference tests (but also in unit/integration tests if linked test cases are used);
  • Public storage: preview builds of the viewer and reference images viewer, both publicly accessible via URLs like Botio currently does.

We'd have to think about how the interaction between GitHub Actions and such a repository should work (probably a custom SSH key, or API key, or special credentials that we store in the GitHub secrets manager). It might be possible to use the same repository for both requirements, or split them off if there are better options per requirement (for example, perhaps GitHub Pages can be used for the public viewers).
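
As a sketch of that interaction, actions/checkout can already check out a second, private repository when it is given a credential with access to it; the repository name and secret below are hypothetical:

```yaml
      - name: Fetch the private test-data repository
        uses: actions/checkout@v4
        with:
          repository: mozilla/pdf.js-test-data   # hypothetical repository name
          token: ${{ secrets.TEST_DATA_PAT }}    # hypothetical secret holding a personal access token
          path: test-data
```

A deploy (SSH) key stored as a secret would work similarly via the action's ssh-key input.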

@Snuffleupagus
Collaborator

Snuffleupagus commented Dec 2, 2023

It seems that any further progress on this issue is now completely blocked by having somewhere to store test PDFs and reference images.

Most likely we'll need to do some "real world" experimentation/testing to see what will/won't work, and converting e.g. botio unittest might be a good place to start; however based on the recent discussion we'll need:

  • A private GitHub repository setup. (It'd be nice if Tim and myself had access to it, to make things easier.)
  • The private repository configured such that the PDF.js GitHub Actions can access it, as mentioned in Migrate bots to GitHub Actions #11851 (comment), for both reading and writing.

@calixteman, @marco-c Can either of you please help out with the above?

With that in place we'd hopefully be able to replicate the reading/writing that botio unittest currently does, but in a GitHub Actions workflow, which again would hopefully be enough to migrate the unit tests to GitHub Actions. Once this works, the other test-suites should be fairly simple to convert as well.

@marco-c
Contributor

marco-c commented Dec 2, 2023

I'm going to create a pdf.js-references repository and give all of us access.

@marco-c
Contributor

marco-c commented Dec 2, 2023

Done; it is currently under my own account, and I'll see if I can move it to the mozilla org next week.

@marco-c
Contributor

marco-c commented Dec 2, 2023

Regarding accessing the repo through GitHub Actions, we can do it through the @moz-tools-bot bot. We can create a secret that stores the API key and push to the repo by doing something like "https://{git_user}:{git_password}@github.com/{repo}".
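
A rough sketch of such a push step (the secret name, repository owner, and branch are assumptions, and the URL follows the form quoted above):

```yaml
      - name: Push updated reference images
        env:
          GIT_USER: moz-tools-bot
          GIT_TOKEN: ${{ secrets.MOZ_TOOLS_BOT_TOKEN }} # hypothetical secret holding the bot's API key
        run: |
          # Assumes the references repository was checked out into ./pdf.js-references earlier in the job.
          cd pdf.js-references
          git config user.name "moz-tools-bot"
          git config user.email "moz-tools-bot@users.noreply.github.com" # assumed noreply address
          git add .
          git commit -m "Update reference images"
          git push "https://${GIT_USER}:${GIT_TOKEN}@github.com/marco-c/pdf.js-references" HEAD:main
```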
