RISC-V build plan #3591

Open
5 of 7 tasks
luhenry opened this issue Dec 20, 2023 · 9 comments
Labels
aarch Issues that affect or relate to the aarch ARCHITECTURE enhancement Issues that enhance the code or documentation of the repo in any way epic Issues that are large and likely multi-layered features or refactors jenkins Issues that enhance or fix our jenkins server testing Issues that enhance or fix our test suites

Comments

@luhenry
Contributor

luhenry commented Dec 20, 2023

In our current building and testing of Temurin on RISC-V, the main limiting factor to get to GA is the limited access to RISC-V boards. We are hoping to have access to more in the future, but even the ones we have access to today are slow or not available in enough quantity.

To alleviate the pressure on the pool of RISC-V boards we have, I am exploring building Temurin on QEMU on an aarch64/x86 host. The first successful run can be found at https://ci.adoptium.net/job/build-scripts/job/jobs/job/evaluation/job/jobs/job/jdk17u/job/jdk17u-evaluation-linux-riscv64-temurin/34/.

This work relies on the following PRs:

@luhenry luhenry added the enhancement Issues that enhance the code or documentation of the repo in any way label Dec 20, 2023
@github-actions github-actions bot added aarch Issues that affect or relate to the aarch ARCHITECTURE jenkins Issues that enhance or fix our jenkins server testing Issues that enhance or fix our test suites labels Dec 20, 2023
@sxa
Member

sxa commented Dec 20, 2023

the main limiting factor to get to GA is the limited access to RISC-V boards

Hmmm, just to give my view on this: I would somewhat dispute that assertion as it's written. We do have over a dozen boards of various types in the CI, and while they may not be highly performant, they are generally capable of producing useful test output, so I don't think that's directly blocking progress. To my mind, the real limiting factors which I've seen while working on this are:

  • Whether anyone is actively looking at any test results and working to get them to green, since the gating factor for a Temurin GA is having passing AQA and TCK runs. (I'd suggest that the most important priorities would be sanity.openjdk 17 21 22 and extended.openjdk 17 21 22)
  • Making sure the build and test runs are scheduled somewhere on a regular basis (Generally that will be weekly for the ones we're interested in - this PR should re-enable it for JDK21)
  • Making sure the build image has the appropriate boot JDKs available in it, or that they are downloadable during the build (I believe build issue 3378 is currently preventing the boot JDK download from succeeding when the JDK is not already on the machines)
  • Understanding which openjdk releases are the priority (I'd have said 21 as the latest LTS, although I know you've been working a lot with 17 recently Ludovic)
  • Try to ensure that we do not end up blocked with the jobs trying to fire up a dynamic agent that our CI cannot currently do (as per what you've seen earlier in this thread). I've got a few of the unmatched boards started up again so there is an ok capacity there, but I need to clear up some of the obsolete 2104 agent definitions under ci.role.tess&&hw.arch.riscv. We could certainly see if we can set up a way to use your new machine to spin them up if we're at capacity, as per your aqa-tests PR, but we need to make sure it doesn't break the existing users of that outside Adoptium's CI
  • Some of the unmatched boards in the CI still have slow network transfers, which causes timeouts. We can look at increasing the timeout, but I've also alerted PLCTlab in the last week since it's inconsistent, and it's hopefully something they can resolve on their side. They do seem to proceed though, so I'm tempted to increase the test timeout for the copyArtifacts stage for this platform
  • I think we've also still got an issue with the core counts not being correctly detected, so the riscv64 test jobs typically use concurrency:1, which makes them slower than they should be (this means the sanity.openjdk jobs typically hit the default 10 hours timeout). I'd been experimenting with this branch of aqa-tests which has given us some results for sanity.openjdk; that's currently failing with being unable to find the correct jtreg version on our dependencies job
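To illustrate the core-count point in the last bullet, here is a minimal Python sketch of a defensive way to turn a possibly missing or wrong core count into a test concurrency value. It is not how aqa-tests or the Adoptium jobs compute concurrency; `pick_concurrency` and `detected_cores` are hypothetical helper names introduced only for this example.

```python
import os


def detected_cores():
    """Best-effort core count for the current process (may return None)."""
    # On Linux, sched_getaffinity reports the CPUs this process may run on,
    # which can differ from the machine total inside containers or cgroups.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count()


def pick_concurrency(detected, cap=None):
    """Map a detected core count to a worker count of at least 1.

    `cap` is an optional upper bound (hypothetical; e.g. to leave headroom
    on a slow board).
    """
    workers = detected if detected and detected > 0 else 1
    if cap is not None:
        workers = max(1, min(workers, cap))
    return workers


if __name__ == "__main__":
    print("concurrency:", pick_concurrency(detected_cores()))
```

When detection fails, the sketch falls back to 1, which matches the slow concurrency:1 behaviour described above; fixing the detection itself is what would let the jobs use more workers.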

On the first point, TRSS can help with the test analysis, although that has a prereq on the builds being scheduled regularly via jobs such as https://ci.adoptium.net/job/build-scripts/job/evaluation-openjdk21-pipeline/ (this should be fixed as per the PR in the second bullet; note that it's not currently publicly visible, but we should fix that). If it's useful, we could potentially also have a tab on the ci.adoptium.net page for RISC-V which shows just the build and test jobs for that platform, to make it easy to find the important ones to look at.

Obviously, proving it can pass in an RVV1.0 environment is highly desirable too (and a reasonable goal which could be solved with static docker containers or the dynamic ones from the second last bullet point) but if it doesn't fully pass anywhere we've got a bigger problem to solve 🙂

@luhenry
Contributor Author

luhenry commented Dec 21, 2023

AIs from offline discussion:

Notes:

  • Priorities for Rivos are 17, 21, tip. 11 when merged upstream
  • We want regular (weekly at first, ideally daily) build+tests on public CI, and weekly build+tests on private/TCK CI

@sxa
Member

sxa commented Dec 21, 2023

Provide bootjdk for jdk21u from TBD to docker image

For this, the best one to use is the one described in https://fosstodon.org/@sxa/111449356957539294, which was built in a way that will run in a container (it still needs --security-opt seccomp=unconfined) and on Ubuntu 20.04: https://api.adoptium.net/v3/binary/version/jdk-21.0.1+12.1-ea-beta/linux/riscv64/jdk/hotspot/normal/adoptium - that should be good as a bootstrap for JDK22 as well (the first build of that should happen on the next new tag in there, which is likely to be later today).
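As a rough illustration of the two details above, the following Python sketch assembles the quoted download URL from its path segments and builds a `docker run` argument list that passes `--security-opt seccomp=unconfined`. The helper names and the image name `example/riscv64-build:latest` are hypothetical and only for illustration; the release name and URL layout are copied from the link above.

```python
import shlex

# Values copied from the URL quoted in the comment above.
API_BASE = "https://api.adoptium.net/v3/binary/version"
RELEASE = "jdk-21.0.1+12.1-ea-beta"


def boot_jdk_url(release=RELEASE):
    """Join the path segments of the boot JDK download URL."""
    # Segment order as it appears in the quoted URL:
    # release / os / arch / image type / JVM / heap size / vendor
    segments = [release, "linux", "riscv64", "jdk", "hotspot", "normal", "adoptium"]
    return "/".join([API_BASE] + segments)


def docker_run_argv(image, command):
    """Build a `docker run` argv with seccomp filtering disabled."""
    return [
        "docker", "run", "--rm",
        "--security-opt", "seccomp=unconfined",
        image,
    ] + list(command)


if __name__ == "__main__":
    print(boot_jdk_url())
    # Hypothetical image name; substitute the real build image.
    argv = docker_run_argv("example/riscv64-build:latest", ["java", "-version"])
    print(shlex.join(argv))
```

The sketch only prints the URL and command line rather than running anything, so it can be checked without a RISC-V container available.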

The other thing, which we didn't explicitly talk about, was that we'll need #3378 fixed to be able to build the main jdk (jdk23 now) repository unless we also put a JDK22 into the image.

@sxa
Member

sxa commented Dec 21, 2023

I've created a RISC-V view at https://ci.adoptium.net/view/RISC-V/ as a convenient way of viewing the jobs we're interested in for the purposes of this so we can see how many of them are having problems

@sxa
Member

sxa commented Dec 22, 2023

Verified that 22 and 23 are now being triggered along with the other platforms but 23 is failing (as expected) due to the dirmngr error (third bullet point in the big list above)

@sxa
Member

sxa commented Dec 22, 2023

New docker build image with the updated JDKs is being pushed as I write this :-)

@luhenry
Contributor Author

luhenry commented Jan 16, 2024

I'm doing a full run on jdk21u at https://ci.adoptium.net/job/build-scripts/job/jobs/job/evaluation/job/jobs/job/jdk21u/job/jdk21u-evaluation-linux-riscv64-temurin/112/console and collecting the test failures into adoptium/aqa-tests#4976.

I'll do a full run on jdk17u next and collect the test failures into adoptium/aqa-tests#4976 as well.

@sxa
Member

sxa commented May 10, 2024

jdk17u pipeline:

@sxa sxa added the epic Issues that are large and likely multi-layered features or refactors label Aug 14, 2024
@sxa
Member

sxa commented Aug 14, 2024

Noting that jdk11u is currently failing: the regular pipelines build from tags which are not valid for that repository, because the tags do not include the changes from the riscv-port branch.
Covered in #3911

Projects
Status: Todo
Development

No branches or pull requests

2 participants