chore(web): CI builds are currently extremely unreliable #10494

Closed
mcdurdin opened this issue Jan 24, 2024 · 11 comments
Labels: chore · ci (Issues relating to build infrastructure) · web/

@mcdurdin (Member)

Over the last month, we have had over 50 failed builds due to timeouts or similar, which seem to be related to BrowserStack calls.

This is adding a lot of noise to otherwise good pull requests. Recommendations:

  1. Move /common/web modules that are KeymanWeb-exclusive into /web
  2. Disable BrowserStack tests entirely unless /web files are touched (a sketch follows below).
  3. Try to find out what is going wrong -- add retries, robustness, etc.
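As a minimal sketch of recommendation 2 (not part of the existing build scripts), the suite could be gated on whether the PR actually touches /web; TARGET_BRANCH and the path check are illustrative names only:

TARGET_BRANCH="${TARGET_BRANCH:-origin/master}"
# Only enable the BrowserStack suite when the change set includes web/ files.
if git diff --name-only "$TARGET_BRANCH...HEAD" | grep -q '^web/'; then
  DO_BROWSER_TEST_SUITE=true
else
  DO_BROWSER_TEST_SUITE=false
fi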
@mcdurdin added this to the 18.0 milestone Jan 24, 2024
@mcdurdin added the ci label Jan 24, 2024
@mcdurdin (Member, Author)

Notes:

  • it seems that some BrowserStack sessions are getting a plain major.minor.patch version number and others are getting a full major.minor.patch-x-y-pr version number. Could this be related to the agent triggering it? Could multiple agents running web builds at the same time be related?
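Purely as an illustration of the two shapes being reported (not the actual Keyman versioning logic), the discrepancy could arise if one agent exports a PR-related variable and another does not; VERSION, VERSION_WITH_TAG, and PR_NUMBER are hypothetical names here:

VERSION="17.0.123"                          # plain major.minor.patch
if [[ -n "${PR_NUMBER:-}" ]]; then
  # full major.minor.patch-x-y-pr form, as seen in some sessions
  VERSION_WITH_TAG="$VERSION-alpha-test-$PR_NUMBER"
else
  VERSION_WITH_TAG="$VERSION"
fi
echo "Build label reported to BrowserStack: $VERSION_WITH_TAG"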

@mcdurdin (Member, Author) commented Jan 24, 2024

See also:

keyman/web/test.sh, lines 49 to 65 at commit c9fac1b:

if [[ $VERSION_ENVIRONMENT == test ]]; then
  # Implied: CONFIG=CI.conf.js because `-CI` parameter is passed.
  #
  # If we are running a TeamCity test build, for now, only run BrowserStack
  # tests when on a PR branch with a title including "(web)" or with the label
  # test-browserstack. This is because the BrowserStack tests are currently
  # unreliable, and the false positive failures are masking actual failures.
  #
  # We do not run BrowserStack tests on master, beta, or stable-x.y test
  # builds.
  DO_BROWSER_TEST_SUITE=false
  if builder_pull_get_details; then
    if [[ $builder_pull_title =~ \(web\) ]] || builder_pull_has_label test-browserstack; then
      DO_BROWSER_TEST_SUITE=true
    fi
  fi
fi

Why is this not working?
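One hedged way to narrow this down would be temporary tracing around the guard above; builder_pull_get_details and builder_pull_has_label are the functions already used in test.sh, and only the echo lines are new:

DO_BROWSER_TEST_SUITE=false
if builder_pull_get_details; then
  echo "PR title: '$builder_pull_title'"
  if [[ $builder_pull_title =~ \(web\) ]] || builder_pull_has_label test-browserstack; then
    echo "Enabling BrowserStack test suite"
    DO_BROWSER_TEST_SUITE=true
  fi
else
  echo "builder_pull_get_details failed: no PR metadata available on this agent?"
fi
echo "DO_BROWSER_TEST_SUITE=$DO_BROWSER_TEST_SUITE"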

@mcdurdin (Member, Author)

The BrowserStack tests with full version information are under @jahorton's username. The ones with only major.minor.patch are under the Keyman Server username (i.e. running from CI).

So ... something is different in how they are being triggered.

@jahorton (Contributor)

> The BrowserStack tests with full version information are under @jahorton's username. The ones with only major.minor.patch are under the Keyman Server username (i.e. running from CI).
>
> So ... something is different in how they are being triggered.

In case this is something we have to triage and get back to later... I haven't directly run BrowserStack tests from my machine in months.

I remember that early on, I think my log-on details were used to set up BrowserStack CI... and then we changed the details to use Keyman Server stuff.

@mcdurdin (Member, Author)

> The BrowserStack tests with full version information are under @jahorton's username

JH noticed that the LMLayer test configuration has a different username.
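A hedged way to confirm which test configurations carry their own credentials rather than reading them from the environment (paths and file patterns are illustrative; the real config file names may differ):

# List any BrowserStack username/accessKey values hard-coded in test configs.
grep -rn --include='*.conf.js' -iE 'username|accessKey' web/ common/web/ \
  | grep -vi 'process\.env'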

@jahorton (Contributor)

Toward the bottom of the logs BrowserStack had available, we were able to determine that in stable-16.0, both sources of unit tests - KMW and the LMLayer - passed full version strings through. (And my name, not keyman-server.)

We have not yet determined... where it's even getting the version string from.

@mcdurdin modified the milestones: 18.0, A18S1 on Apr 19, 2024
@jahorton (Contributor) commented Apr 26, 2024

Thought I'd put the comment here, but I can't find it, wherever it did land.

Useful reference: actions/runner-images#3737

A specific comment of note: actions/runner-images#3737 (comment)

There's a recent npm fix that may be notably related. Updating to NPM 10.5.2 or higher would give us that fix.
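A hedged sketch of a CI guard that ensures the agent is on npm 10.5.2 or later before installing (where such a step would live in the Keyman build scripts is an assumption):

required="10.5.2"
current="$(npm --version)"
# sort -V puts the smaller version first; if that isn't $required, npm is too old.
if [[ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" != "$required" ]]; then
  npm install -g "npm@>=$required"
fi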

@mcdurdin (Member, Author) commented May 3, 2024

> Thought I'd put the comment here, but I can't find it, wherever it did land.

#10350.

@darcywong00 modified the milestones: A18S1, A18S2 on May 11, 2024
@mcdurdin modified the milestones: A18S2, A18S3, A18S4 on May 24, 2024
@jahorton (Contributor) commented Jun 10, 2024

With #11456 and #11451 in place, this issue is probably resolved for now. It's also believed that #8616 was related; @mcdurdin mentioned finding something about use of git:// entries in package.json sometimes contributing to npm instability. (Sadly, I can't find where he posted a reference about it, but it was certainly mentioned in our discussions in other team channels.)

I'll leave this up for a bit to make sure that everything is fully resolved, as the changes made by #8616 and #11451 may trigger new behavior that we haven't yet been able to test or validate fully.

Granted, there may be a minor flare-up when we get to #11501... but we should be able to limit the scope of those tests and keep things more stable than before, even then.

@jahorton (Contributor) commented Jun 17, 2024

At this point, the only instabilities I'm seeing on master-based PRs are from the test builds for our other products, rather than from testing Keyman Engine for Web itself. This doesn't fix the instability issues on the stable branches, but I still consider it enough of a win to close this issue.

@mcdurdin (Member, Author) commented Jul 2, 2024

> @mcdurdin mentioned finding something about use of git:// entries in package.json sometimes contributing to npm instability. (Sadly, I can't find where he posted a reference about it, but it was certainly mentioned in our discussions in other team channels.)

I wrote on Slack:

> So removing the two git-based dependencies from package.json reduces the npm i time from 4 minutes to 13 seconds on my machine. Seems like that is a very poorly supported mechanism.

Certainly I concur that the issues appear to be pretty much resolved, hurrah (except on stable, as you note, which would be a bit hard to patch).
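For reference, a hedged illustration of the kind of change described above: swapping a git-based dependency for a registry-published release and re-timing a clean install (the package name and version are hypothetical):

# Before: package.json pointed at a git:// URL for this dependency.
npm uninstall example-git-dependency
# After: pull the same package from the npm registry instead.
npm install "example-git-dependency@^2.1.0"
# Re-check the clean-install time.
rm -rf node_modules
time npm install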
