AIX CI machines cannot access raw.githubusercontent.com #2330

Closed
targos opened this issue May 26, 2020 · 15 comments

@targos
Member

targos commented May 26, 2020

This is looking like some sort of infrastructure issue where the AIX CI machine doesn't appear to be able to download https://raw.githubusercontent.com/ URLs.

e.g.

Running AIX CI on v12.x to check if that works: https://ci.nodejs.org/job/node-test-commit-aix/30710/

(Note this build passed)

...
+ curl https://raw.githubusercontent.com/nodejs/build/master/jenkins/scripts/node-test-commit-pre.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0+ bash -xe

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
...
  0     0    0     0    0     0      0      0 --:--:--  0:01:14 --:--:--     0
curl: (28) Failed to connect to raw.githubusercontent.com port 443: A remote host did not respond within the timeout period.
...
if [ -x /home/iojs/build/workspace/node-test-commit-aix/nodes/aix71-ppc64/./node ] && [ -e /home/iojs/build/workspace/node-test-commit-aix/nodes/aix71-ppc64/./node ]; then /home/iojs/build/workspace/node-test-commit-aix/nodes/aix71-ppc64/./node  tools/doc/versions.js out/previous-doc-versions.json; elif [ -x `which node` ] && [ -e `which node` ] && [ `which node` ]; then `which node`  tools/doc/versions.js out/previous-doc-versions.json; else echo "No available node, cannot run \"node  tools/doc/versions.js out/previous-doc-versions.json\""; exit 1; fi;
Unable to retrieve https://raw.githubusercontent.com/nodejs/node/master/CHANGELOG.md. Falling back to /home/iojs/build/workspace/node-test-commit-aix/nodes/aix71-ppc64/CHANGELOG.md.
...

The tools/doc/versions.js tool falls back to the local copy of the CHANGELOG.md for non-release commits (which is why only the release commit build failed) to accommodate users with poor/firewalled Internet connections.

cc @nodejs/platform-aix

Originally posted by @richardlau in nodejs/node#33197 (comment)

@sam-github
Contributor

Update: we have reached out internally to the infra providers to ask what's going on. It looks like some kind of off-machine firewall is blocking the traffic. Still waiting for a response.

@sam-github
Contributor

We got a response, and unfortunately, it's deliberate.

raw.githubusercontent.com is apparently a commonly blacklisted site (I assume because it's sometimes used by bad actors as a CDN for malware). Avoiding it in our CI would be quite onerous for us. The network specialists are in contact with GH to ask their opinion of this.

@mhdawson
Member

@sam-github I believe this is now resolved, right?

@sam-github
Contributor

Builds are green now: https://ci.nodejs.org/job/node-test-commit-aix/30771/

The provider sec team resolved something with GH, and unblocked it.

@targos
Member Author

targos commented Jun 2, 2020

This is still happening with today's security releases: https://ci.nodejs.org/job/node-test-commit-aix/30869/nodes=aix71-ppc64/console

@richardlau richardlau reopened this Jun 2, 2020
@sam-github
Contributor

I confirmed this with a manual curl of https://raw.githubusercontent.com/nodejs/build/master/jenkins/scripts/node-test-commit-pre.sh and sent an email to the infra provider (CCing @mhdawson @AshCripps @richardlau).

@sam-github
Contributor

At this point, I think we have no choice but to go forward with the releases, and then release AIX afterwards. Agreed, @mhdawson?

@BethGriggs
Member

Looks like the Release builds worked (assuming they don't depend on that domain) - https://ci-release.nodejs.org/job/iojs+release/nodes=aix71-ppc64/

@targos
Member Author

targos commented Jun 2, 2020

@sam-github this doesn't impact the release build. It only affects node-test-commit jobs run on release commits.

@richardlau
Member

Yes, the release builds aren't affected because they don't access that domain.

Just to clarify, being unable to contact raw.githubusercontent.com causes the following two issues for the test CI:

  1. We're unable to fetch the latest version of the jenkins/scripts/node-test-commit-pre.sh script, which handles things like rebasing. A failed download doesn't appear to fail the build, but it does mean the script isn't executed (so if you were trying to rebase, you may not be building and testing what you thought you requested).
  2. The docs build fails to pull down the latest CHANGELOG.md from master in tools/doc/versions.js. For non-release commits this results in a warning and then fallback to the local changelog file and does not fail the build. For release commits the doc tool hard fails if it cannot grab the latest changelog (because the local changelog for older release lines has no references to the later release lines).

One of the problems with this issue is that for non-release commits the builds do not fail, so we don't know it has happened between releases unless we scan the job output of passing builds (i.e. it was fixed, then appears to have come back, but we didn't know until we tried to test a release).
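
The silent pass described above is consistent with how a shell pipeline reports its status: by default only the last command's exit status counts, so when the download stage fails, bash reads an empty script and exits 0. A minimal sketch of that effect, using `false` as a stand-in for a curl that cannot connect (an assumption for illustration; this is not the actual job configuration):

```shell
# `false` stands in for a curl that fails to connect (illustrative
# assumption; no network access is needed to run this).

# Default behaviour: the pipeline's status is that of the last command.
# The inner bash receives an empty script on stdin and exits 0.
default_status=0
bash -c 'false | bash -xe' || default_status=$?

# With pipefail enabled, a non-zero status from any stage is propagated.
pipefail_status=0
bash -o pipefail -c 'false | bash -xe' || pipefail_status=$?

echo "without pipefail: $default_status"
echo "with pipefail:    $pipefail_status"
```

Enabling pipefail would turn a blocked download into a visible failure, at the cost of failing builds whenever the host is unreachable.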

@sam-github
Contributor

OK, great that the release is not impacted, that is a relief.

Though I do wonder if the curls should be mandatory, so that we do notice when they break!

Of course, that would then maybe block the release.... sometimes available infrastructure is hard to deal with. I'm not sure what the right thing to do here is, longer term.

@richardlau
Member

I'm not going to start messing around with the job config while the security release is being prepped, but for the first issue with the node-test-commit-pre.sh script the current flow is:

  1. Use curl to get the latest version of the script and pipe it to bash to execute.
  2. The post-build-status-update job is run
  3. The job later uses git to clone down https://github.com/nodejs/build.git (so it can run the select-compiler.sh script).

So we could move the git clone of the build repo to before running the post-build-status-update job and then execute node-test-commit-pre.sh from the locally cloned files.
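
A rough sketch of what running the script from the local clone could look like. The helper name `run_from_checkout` and the `build` directory name are hypothetical, and the real job configuration may differ:

```shell
# Hypothetical helper: run a script from an already-cloned checkout of the
# build repo instead of piping it from curl.
#   $1 = checkout directory, $2 = script path relative to it,
#   remaining arguments are passed through to the script.
run_from_checkout() {
  checkout_dir=$1
  script_rel=$2
  shift 2
  script_path="$checkout_dir/$script_rel"
  if [ ! -f "$script_path" ]; then
    # Fail loudly rather than silently skipping the script.
    echo "missing script: $script_path" >&2
    return 1
  fi
  # Same bash flags as the piped form, but reading a local file.
  bash -xe "$script_path" "$@"
}

# Intended use, assuming the clone happens earlier in the job
# (not executed here):
#   git clone https://github.com/nodejs/build.git build
#   run_from_checkout build jenkins/scripts/node-test-commit-pre.sh
```

Because the script comes from a git clone of github.com rather than from raw.githubusercontent.com, a blocked raw host no longer causes it to be skipped.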

For the second issue, options include:

  • Instead of parsing the CHANGELOG.md from GitHub (via the raw.githubusercontent.com URL) we could use one of the index files from nodejs.org (e.g. index.json) on the basis that (hopefully) nodejs.org is less likely to be blacklisted than raw.githubusercontent.com.
  • Allow the NODE_TEST_NO_INTERNET environment variable (note that, despite the name of the env var, it's not the tests that are failing here, it's the actual docs build) to affect release commit builds and set it in the AIX test job.
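
To make the second option concrete, here is a sketch of the decision it implies, written as a shell function purely for illustration. The real logic lives in tools/doc/versions.js (JavaScript); the function name, its arguments, and the printed labels are assumptions, and the branch for NODE_TEST_NO_INTERNET reflects the proposed behaviour rather than confirmed current behaviour:

```shell
# Illustrative only: prints which changelog source the docs build would
# use if NODE_TEST_NO_INTERNET were honoured for release commits too.
#   $1 = "release" or "non-release"
#   $2 = "reachable" or "unreachable" (state of the remote CHANGELOG.md)
changelog_source() {
  commit_kind=$1
  remote_state=$2
  if [ -n "${NODE_TEST_NO_INTERNET:-}" ]; then
    echo local    # proposed: skip the network even for release commits
  elif [ "$remote_state" = reachable ]; then
    echo remote   # normal case: use the latest CHANGELOG.md from master
  elif [ "$commit_kind" = release ]; then
    echo fail     # release commit without the remote file: hard failure
  else
    echo local    # non-release commit: warn and fall back to the local copy
  fi
}
```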

@richardlau
Member

I've made modifications to the node-test-commit-aix job so that the git clone of the build repo happens earlier and we use the checked out copy of jenkins/scripts/node-test-commit-pre.sh instead of attempting to use curl on the raw.githubusercontent.com URL. I've also added the following to set the NODE_TEST_NO_INTERNET environment variable if we cannot reach raw.githubusercontent.com:

# Test to see if we can reach raw.githubusercontent.com
curl -s https://raw.githubusercontent.com/nodejs/node/master/CHANGELOG.md > /dev/null || export NODE_TEST_NO_INTERNET=1
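
A possible variant of that probe, sketched under assumptions: the logs in this issue show curl waiting over a minute before giving up, so bounding the connection attempt with curl's `--connect-timeout` and treating HTTP errors as failures with `-f` might be worthwhile. The wrapper name `set_no_internet_unless` and the 10 second value are hypothetical and are not what was added to the job:

```shell
# Hypothetical wrapper: run the given probe command and export
# NODE_TEST_NO_INTERNET=1 if it exits non-zero.
set_no_internet_unless() {
  if ! "$@" > /dev/null 2>&1; then
    export NODE_TEST_NO_INTERNET=1
  fi
}

# Intended use in the job (not executed here; the timeout is arbitrary):
#   set_no_internet_unless curl -sf --connect-timeout 10 \
#     https://raw.githubusercontent.com/nodejs/node/master/CHANGELOG.md
```

Taking the probe as arguments keeps the fallback logic separate from the choice of URL and curl flags.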

A test build on the v10.21.0 release commit passed, but revealed that the jenkins/scripts/node-test-commit-pre.sh script does another curl:
https://ci.nodejs.org/job/node-test-commit-aix/30880/nodes=aix71-ppc64/consoleFull

++ curl https://raw.githubusercontent.com/nodejs/build/master/jenkins/scripts/node-test-commit-diagnostics.sh
++ bash -ex -s before
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
 ...
  0     0    0     0    0     0      0      0 --:--:--  0:01:14 --:--:--     0
curl: (28) Failed to connect to raw.githubusercontent.com port 443: A remote host did not respond within the timeout period.

There's a TODO comment in the script saying to run locally, so I'll look into addressing that.

# TODO(gib): Run locally once we're cloning the whole git repo.
curl https://raw.githubusercontent.com/nodejs/build/master/jenkins/scripts/node-test-commit-diagnostics.sh | bash -ex -s before

@richardlau
Member

I've drafted a change in #2342, but I'll need to clone some of the node-test-commit-* jobs and edit the copies to use that PR, to test that it hasn't regressed other platforms and does address this issue on AIX.

@richardlau
Member

Landed #2342. With that and the job configuration changes, we no longer attempt to use raw.githubusercontent.com URLs, except in one place where we test whether we can access it (and, if we cannot, set the NODE_TEST_NO_INTERNET environment variable to short-circuit the part of the docs build that uses it).
