Set up CI with Azure Pipelines #6495

Merged
merged 38 commits into from
Nov 14, 2018

Conversation

arcanis
Member

@arcanis arcanis commented Oct 4, 2018

No description provided.

@arcanis
Member Author

arcanis commented Oct 4, 2018

@btholt Is it possible for the pipeline to be configured to run on multiple systems? It seems to be currently running on Ubuntu, but could I run it on OSX and Windows as well? I found how to make matrices work for the node executable, but not yet the OS.

@arcanis
Member Author

arcanis commented Oct 5, 2018

Well, that was easy. Let's see if we can factor the build steps a bit ...
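
For readers following along, here is a minimal sketch of what a multi-OS matrix can look like in `azure-pipelines.yml`. The matrix entry names, the `imageName` variable, and the hosted image names are assumptions chosen for illustration; this is not the configuration this PR ended up with, and the commit notes later in the thread suggest the final file went through several iterations on this point.

```yaml
# Sketch only: one set of steps fanned out over three hosted images.
# Entry names and image names below are illustrative assumptions.
strategy:
  matrix:
    linux:
      imageName: 'ubuntu-latest'
    mac:
      imageName: 'macOS-latest'
    windows:
      imageName: 'windows-latest'

pool:
  vmImage: $(imageName)

steps:
- script: node --version
  displayName: 'show the Node version'
```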

@buildsize

buildsize bot commented Oct 5, 2018

File name Previous Size New Size Change
yarn-[version].noarch.rpm 1.1 MB 1.1 MB 19 bytes (0%)
yarn-[version].js 4.46 MB 4.46 MB 298 bytes (0%)
yarn-legacy-[version].js 4.65 MB 4.65 MB 298 bytes (0%)
yarn-v[version].tar.gz 1.12 MB 1.11 MB -2.92 KB (0%)
yarn_[version]all.deb 813.67 KB 813.69 KB 22 bytes (0%)

@arcanis
Member Author

arcanis commented Oct 5, 2018

It also seems like the pipeline is configured as a public project, but the organization's "Retention & parallel jobs" tab reports it as private (already 70 minutes used out of 1800!)

@btholt

btholt commented Oct 5, 2018

@arcanis I'm asking about it showing up as private.

@arcanis
Member Author

arcanis commented Oct 5, 2018

@btholt Something weird: it seems that my yarn build command isn't executed on Windows:

https://dev.azure.com/yarnpkg/yarn/_build/results?buildId=7&view=logs

I've checked the logs for the "install and build" step, and as you can see the yarn install is correctly executed, but there are no logs for yarn build:

2018-10-05T09:57:05.9881068Z ##[section]Starting: install and build
2018-10-05T09:57:05.9885186Z ==============================================================================
2018-10-05T09:57:05.9885272Z Task         : Command Line
2018-10-05T09:57:05.9885345Z Description  : Run a command line script using cmd.exe on Windows and bash on macOS and Linux.
2018-10-05T09:57:05.9885401Z Version      : 2.136.0
2018-10-05T09:57:05.9885467Z Author       : Microsoft Corporation
2018-10-05T09:57:05.9885524Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkID=613735)
2018-10-05T09:57:05.9885601Z ==============================================================================
2018-10-05T09:57:08.0611987Z Generating script.
2018-10-05T09:57:08.1266833Z ##[command]"C:\Windows\system32\cmd.exe" /D /E:ON /V:OFF /S /C "CALL "D:\a\_temp\eaa09d62-ec63-48f7-b305-69cfcaf2c36d.cmd""
2018-10-05T09:57:09.3095863Z yarn install v1.9.4
2018-10-05T09:57:09.5041870Z [1/5] Validating package.json...
2018-10-05T09:57:09.5065327Z [2/5] Resolving packages...
2018-10-05T09:57:10.0549255Z [3/5] Fetching packages...
2018-10-05T09:58:54.4028700Z info [email protected]: The platform "win32" is incompatible with this module.
2018-10-05T09:58:54.4030249Z info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
2018-10-05T09:58:54.4030537Z [4/5] Linking dependencies...
2018-10-05T09:58:54.4065184Z warning " > [email protected]" has incorrect peer dependency "eslint-plugin-babel@^4.1.1".
2018-10-05T09:58:54.4072112Z warning "eslint-config-fb-strict > [email protected]" has incorrect peer dependency "eslint-plugin-babel@^4.1.1".
2018-10-05T09:59:04.6016375Z [5/5] Building fresh packages...
2018-10-05T09:59:05.2854300Z Done in 115.99s.
2018-10-05T09:59:05.3512375Z ##[section]Finishing: install and build

Contrast that with the Linux job, which clearly shows that the yarn build command is correctly executed after the install (cf. the yarn run v1.9.4 line):

2018-10-05T00:23:33.9871864Z ##[section]Starting: install and build
2018-10-05T00:23:33.9874577Z ==============================================================================
2018-10-05T00:23:33.9874685Z Task         : Command Line
2018-10-05T00:23:33.9874744Z Description  : Run a command line script using cmd.exe on Windows and bash on macOS and Linux.
2018-10-05T00:23:33.9874804Z Version      : 2.136.0
2018-10-05T00:23:33.9874913Z Author       : Microsoft Corporation
2018-10-05T00:23:33.9874970Z Help         : [More Information](https://go.microsoft.com/fwlink/?LinkID=613735)
2018-10-05T00:23:33.9875070Z ==============================================================================
2018-10-05T00:23:34.1198224Z Generating script.
2018-10-05T00:23:34.1250923Z [command]/bin/bash --noprofile --norc /home/vsts/work/_temp/e2378931-9f83-4cfd-b32d-e99bad57fdd4.sh
2018-10-05T00:23:34.7121696Z yarn install v1.9.4
2018-10-05T00:23:34.8923594Z [1/5] Validating package.json...
2018-10-05T00:23:34.8944808Z [2/5] Resolving packages...
2018-10-05T00:23:35.5205509Z [3/5] Fetching packages...
2018-10-05T00:23:47.6694416Z info [email protected]: The platform "linux" is incompatible with this module.
2018-10-05T00:23:47.6696142Z info "[email protected]" is an optional dependency and failed compatibility check. Excluding it from installation.
2018-10-05T00:23:47.6790237Z [4/5] Linking dependencies...
2018-10-05T00:23:47.6870499Z warning " > [email protected]" has incorrect peer dependency "eslint-plugin-babel@^4.1.1".
2018-10-05T00:23:47.6874818Z warning "eslint-config-fb-strict > [email protected]" has incorrect peer dependency "eslint-plugin-babel@^4.1.1".
2018-10-05T00:23:57.1925459Z [5/5] Building fresh packages...
2018-10-05T00:23:57.6616831Z Done in 22.96s.
2018-10-05T00:23:58.0674000Z yarn run v1.9.4
2018-10-05T00:23:58.1539930Z $ gulp build
2018-10-05T00:23:58.9273760Z [00:23:58] Using gulpfile ~/work/1/s/gulpfile.js
2018-10-05T00:23:58.9285855Z [00:23:58] Starting 'build'...
2018-10-05T00:24:06.9641047Z [00:24:06] Finished 'build' after 8.04 s
2018-10-05T00:24:06.9717244Z Done in 8.91s.
2018-10-05T00:24:07.0034845Z ##[section]Finishing: install and build

Any idea what I'm doing wrong?

@arcanis
Member Author

arcanis commented Oct 5, 2018

Interesting - using bash instead of script seems to have fixed this blocker (not sure why, since yarn jest worked in a later step). Maybe that's what the tutorial should use in its examples?
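
A possible explanation, offered as an assumption rather than something confirmed in this thread: on Windows the `script` step runs under cmd.exe (as the log above shows), where `yarn` resolves to a batch file, and a batch file invoked from another batch script without `call` does not return control to the caller. That would also fit `yarn jest` working in a later step, if it was the last command there. A sketch of the two workarounds, reusing the step name from the logs:

```yaml
steps:
# Option 1: run the step under bash on every platform.
- bash: |
    yarn install
    yarn build
  displayName: 'install and build'

# Option 2 (alternative for Windows, pick one of the two): stay on cmd.exe
# but use `call` so control returns to the script after each batch file.
- script: |
    call yarn install
    call yarn build
  displayName: 'install and build (cmd.exe)'
```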

@arcanis
Member Author

arcanis commented Oct 5, 2018

I think there's still something wrong - we have 122 synchronous tests that have a timeout of 5 seconds each, which should give at most 10 minutes of execution. But the tests have been running for 45 minutes now with no output 😢

I guess we'll have to ask someone with Windows to investigate and find what could block the execution.

@arcanis
Member Author

arcanis commented Oct 5, 2018

Seems like:

  • Lots of tests are timing out, but some pass. Some of those that pass seem to execute in ~4s though (much slower than on Linux, and close to the per-test timeout), which could explain this. I'll try to bump the timeout to 15s (max execution time: 30m).

  • The tests finished executing in 10m (as anticipated), but something prevented the process from exiting until the Azure infra canceled the job. Maybe it's related to the timeouts, but I'll add --detectOpenHandles to see if I can extract information if it happens again.

https://dev.azure.com/yarnpkg/yarn/_build/results?buildId=11&view=logs

2018-10-05T10:27:06.4982973Z Test Suites: 1 failed, 1 total
2018-10-05T10:27:06.4983035Z Tests:       108 failed, 14 passed, 122 total
2018-10-05T10:27:06.4983102Z Snapshots:   0 total
2018-10-05T10:27:06.4983165Z Time:        556.525s
2018-10-05T10:27:06.4983224Z Ran all test suites matching /yarn/i.
2018-10-05T10:27:07.4447450Z Jest did not exit one second after the test run has completed.
2018-10-05T10:27:07.4684363Z 
2018-10-05T10:27:07.6951431Z This usually means that there are asynchronous operations that weren't stopped in your tests. Consider running Jest with `--detectOpenHandles` to troubleshoot this issue.
2018-10-05T11:14:40.9439292Z Terminate batch job (Y/N)? 
2018-10-05T11:14:41.1987238Z Terminate batch job (Y/N)? 
2018-10-05T11:14:41.4322601Z ^C
2018-10-05T11:14:41.4937321Z ##[error]The operation was canceled.
2018-10-05T11:14:41.4984506Z ##[section]Finishing: run the acceptance tests

@arcanis
Member Author

arcanis commented Oct 5, 2018

A few more passed (20/122), but it seems like something is blocking the processes, which end up stuck in most cases. This is annoying, especially since we don't even access the real network (the tests boot a local mock server and communicate with it; they never need to reach the npm registry) 🙁

@kaylangan

kaylangan commented Oct 5, 2018

I'm a Program Manager on Azure Pipelines. Were you able to get in contact with someone about why your builds are getting marked as private? If not, I can follow up.

Can you try setting timeoutInMinutes to say 120 (max is 360)? The default is 60 minutes and I see that the Windows build timed out and got cancelled before everything completed. That may help diagnose what's going on.
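
As a sketch of where that setting goes: `timeoutInMinutes` is a job-level property. The job name and image name below are hypothetical, and the test command is the one quoted elsewhere in this thread.

```yaml
jobs:
- job: Windows                  # hypothetical job name
  timeoutInMinutes: 120         # default is 60, per the comment above
  pool:
    vmImage: 'windows-latest'   # assumed image name
  steps:
  - bash: |
      cd packages/pkg-tests
      yarn jest yarn --detectOpenHandles
    displayName: 'run the acceptance tests'
```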

@btholt

btholt commented Oct 5, 2018

@kaylangan I was going to send an email this morning, but since you're here could you follow up on it?

@ericsciple

@arcanis regarding the hang in the step run acceptance tests, can you try adding an additional line to the end of your script. Something like:

  - script: | 
      cd packages/pkg-tests 
      yarn jest yarn --detectOpenHandles
      echo done
    displayName: 'run the acceptance tests'

I'm curious whether yarn is hanging, or whether the script task is hanging. If the script task is hanging, then the bug is on me :)

@arcanis
Member Author

arcanis commented Oct 6, 2018

Hey @kaylangan! 🙂

Regarding private/public, no, not yet. For the record, this is what I see in my interface. It's a bit strange since the settings also say that the project has a public visibility.

Hey @ericsciple, thanks for the support! From my investigation, it appears that it's Jest that doesn't exit, not the script task itself. The exact reason why it doesn't exit isn't clear (likely an open handle somewhere), but I think it's related to how most of the individual tests are timing out - it's possible they leak subprocesses somewhere. So what I'm trying to figure out first is why all the tests are hanging on Windows, even though they appear to work fine on Linux / OSX 🤔

@ericsciple

@arcanis if all processes within the tree are still intact (can build tree from pid/ppid), then a tool like Process Monitor may help to show the tree. Of course you would need to be logged in to a Windows machine since it is a graphical tool.

At the end of the job, the agent kills all orphaned processes that it can detect. We add a new guid environment variable per-job, and child processes typically inherit the env vars. At the end of the job, we search for all processes with that env var, and kill them. All killed processes are logged to the worker diagnostic log. If you set a variable agent.diagnostic=true, the agent/worker diagnostic logs are uploaded and you should be able to download them from the build summary page.

Otherwise, if the process is already gone at that point, I have another idea that might work. You could start a background process during your step that scans for any processes with that specific environment variable and logs them to a file, and set a timeout on your step. Then, in a subsequent always-run step, you could upload that log. If that approach sounds like it would help, email ersciple (at microsoft com) and we can iterate to get a troubleshooting script working. If it helps, I can publish it on our troubleshooting doc too (may help others).
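
As a sketch, the `agent.diagnostic` variable mentioned above could be declared at the top of the pipeline file. The variable name comes from the comment; declaring it in YAML (rather than as a build variable in the web UI) is an assumption made for illustration.

```yaml
variables:
  agent.diagnostic: true   # name taken from the comment above
```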

@kaylangan

Do you have any other private projects in the Azure DevOps account or was this project private at any point?

@arcanis
Member Author

arcanis commented Oct 8, 2018

Never tried Azure before. The weird thing is that I did select private project at the beginning, but I don't remember ever switching it to public (but I can't switch it "back" to private, so maybe I'm mistaken?)

@hross

hross commented Oct 9, 2018

I ran one of the tests with yarn jest -t 'it should correctly install a single dependency that contains no sub-dependencies' in my local Windows environment and ran into the same issue as above. It seems like a Windows-specific issue.

When I look at the process spawned by the test it is just hanging there. I added some logging and it looks like the reason is yarn install is hanging in the tests without failing. I'm wondering if there is an issue resolving/fetching packages that is causing the process to hang (I can repro that by changing the NPM_CONFIG_REGISTRY value but I'm not yet sure if that's the issue).

@arcanis
Member Author

arcanis commented Oct 9, 2018

Yes I've spent some time debugging it yesterday, and it seems the problem is there (or around here):

https://github.com/yarnpkg/yarn/blob/master/src/fetchers/tarball-fetcher.js#L151

It seems that during the Fetch step (where we fetch the tgz) we might reach a state where the untarStream is still waiting for data even after the request has returned (so finish is never called).

I have no idea why it would only apply on Windows, or why we don't see it on the regular testsuite. Maybe it could be related to the download size (bigger archives don't trigger the bug, but small ones would)?

@kaylangan

So long as the source is public and the project in Azure Pipelines is public, the minutes shouldn't count against the private quota. Are you still seeing that number go up after you turned the project public? We're taking a look on our side.

Also, I see your builds are timing out at 60 minutes. You can set the timeout to be as high as 6 hours.

@arcanis
Member Author

arcanis commented Oct 9, 2018

Yup, the number went up after each build, but as mentioned I don't remember even turning it public in the first place, so that would make sense (but then it doesn't make sense that it's listed as public in another part of the dashboard and that the links can be accessed by anyone 😅 ).

Re: the timeout, the problem appears to be software - our tests shouldn't take more than 10 minutes tops.

@kaylangan

@arcanis we've tracked down the issue to a bug on our side. The bug won't affect the concurrency of your jobs; we're just posting the minutes back as the wrong type. We've got a fix ready and it'll be deployed soon.

@arcanis
Member Author

arcanis commented Nov 7, 2018

Btw @pablonete, regarding 6830dd3: the test is to make sure that when you run yarn run foo, the foo script will always use the exact same Node binary as the one used to run Yarn itself. So the "fake binary" that is set up in the test is actually a trap: the test should fail if this binary ends up executed.

@pablonete
Contributor

Thank you for clarifying, @arcanis, I guess I didn't dig deep enough, I'll take another look at that test and see why fake binaries were being executed then.

Btw, do you think it's the right time to enable Linux and macOS builds on Azure Pipelines? It would help us detect regressions on other OSes while fixing Windows tests. I see it's commented out, did you find any issue with running on the 3 platforms?

@arcanis
Member Author

arcanis commented Nov 7, 2018

Note that this same testsuite is being executed on Linux via the test-pkg-tests-linux-node10 and test-pkg-tests-linux-node8 testsuites, so if they break it's that something is off.

> I see it's commented out, did you find any issue with running on the 3 platforms?

Nope, they were working perfectly fine, I just wanted to decrease the load on Azure to ensure quick feedback for the Windows builds (especially to avoid problems caused by queuing and such) - I'll reenable them and deprecate their CircleCI counterparts as soon as we get Windows working 😃

@arcanis
Member Author

arcanis commented Nov 8, 2018

Almost there! Only two remaining issues:

  • The test runner creates a temporary directory for each test (on C:\). Unfortunately, the repository is checked out on D:\. This causes issues with link:-type dependencies, because they are stored as relative paths - which cannot work across drives. The fix would be to add support for that by automatically storing the absolute path if the two paths happen to be on separate drives.

  • For some reason the scripts don't seem to be using the forced binaries, which is odd. This is very likely caused by the makePortableScript calls, which aren't actually portable (I didn't take the time to write the Windows version 🤕). They should be replaced by cmd-shim, which we use in other places.

@arcanis
Member Author

arcanis commented Nov 12, 2018

Everything works! Many thanks to you, @pablonete! 🥇

Interestingly, CircleCI seems generally faster than Azure except for the OSX builds which are always queued for a very long amount of time. I think it's fine in our case, but I wonder if maybe there's a cache somewhere I need to enable.

@kaylangan I received the following email from Azure - is there a way to keep using the free plan for the Yarn organization? I admit I'm not too sure what the various options are. At the moment CircleCI has the advantage of being free (which matters since we're an independent org with no financing).

@martinwoodward

Checked your Azure DevOps organization and it's all properly set up for free open source builds so it's all good there. The email from Azure looks to be related to an Azure Free Trial subscription which is probably related to something set up on that same email address? If you are just using hosted Linux/windows/mac build jobs in Azure DevOps then you are all good. If you have some of your own Azure compute resources such as your own virtual machines, web apps, DNS etc then that might be what the email is about. Feel free to message me if you want me to dig in more or if the yarn project needs some additional Azure hosted services. Sorry about the confusing email message though.

* Increase timeout in Windows, we're seeing tests failing randomly and others close to default 5 sec.

* Distinguish tests published from each job.

* Pass name as vmImage is not available

* Remove unnecessary detect unfinished tests.

* Using strategy var instead of parameter

* Use variables instead of strategy

@arcanis
Member Author

arcanis commented Nov 14, 2018

Awesome 😃

Ok, I think this PR is ready to be merged. I'll just temporarily revert my commit that disables CircleCI until @Daniel15 can take a look at how we could integrate the Azure testsuite inside our release process (if I remember correctly what he explained about Appveyor, there's a webhook that needs to be configured somewhere).

Thanks a lot for all your help!

@arcanis arcanis merged commit 7f41910 into master Nov 14, 2018
@Daniel15
Member

@arcanis The webhook is used to archive the 'nightly' builds (https://yarnpkg.com/en/docs/nightly) and to publish them when a release is tagged. The webhook doesn't actually run any of the tests. Having said that, it's on my todo list to try and get some time to see if we can move those webhooks to use the Azure DevOps stuff. Likely towards the end of the month. For now we should keep AppVeyor and CircleCI running as-is 😃

It would simplify a lot if we could use one system, as currently we need to grab build artifacts from both CircleCI and AppVeyor, and only publish the release once we receive the webhook calls from both.

@Daniel15
Member

Does this use GitHub permissions, or is permissioning separate? Wondering if we need to add all the core Yarn team as admins for the project.

@kaylangan

@Daniel15 for now, permissioning is separate. We're working on integrating with GitHub permissions.

@arcanis arcanis deleted the azure-pipelines branch September 18, 2019 11:42
9 participants