Please create new Helix queue ubuntu.2004.ppc64le.experimental.open #9567

Closed
directhex opened this issue Jun 3, 2022 · 28 comments
Labels: Detected By - Customer (issue was reported by a customer); Operations (used by FR to track issues related to operations work); Ops - First Responder

Comments

@directhex
Contributor

directhex commented Jun 3, 2022

As part of IBM's engagement with us, we'd like to add some PPC64 little-endian VMs that IBM has provided to a Helix queue, so they can get early and easy visibility into failures. This mirrors their engagement on s390x and the corresponding ubuntu.2004.s390x.experimental.open queue.

Right now the only authorized SSH key on the provided VMs is my own. I can add anyone else's if there's a pubkey documented somewhere that I should add.

@ilyas1974 added the Ops - First Responder and Detected By - Customer labels Jun 3, 2022
@ilyas1974 assigned lkts and unassigned lkts Jun 6, 2022
@tkapin
Member

tkapin commented Jun 7, 2022

Assigning to @oleksandr-didyk, @premun - please give Alex the context here. Thanks!

@ilyas1974 added the Operations label Jun 7, 2022
@directhex
Contributor Author

directhex commented Jun 7, 2022 via email

@premun
Member

premun commented Jun 8, 2022

@directhex just to make sure: do I understand correctly that there is one specific VM somewhere (in Azure?) and we will want to install the Helix agent there and connect it to the new queue?

@directhex
Contributor Author

It's four VMs in a cloud operated by an IBM partner, running on PPC64 hardware. We just want to install the Helix agent and connect it to the new queue, for test runs via runtime-community.yml.

@oleksandr-didyk
Contributor

@directhex will you install the agent on the VMs, or should we provide a pubkey and install it ourselves?

@directhex
Contributor Author

I'm happy to do it myself, but I'll need access to the keyvault secrets demanded by the Helix setup script. If y'all would prefer to do it, I'm happy with that too.

@oleksandr-didyk
Copy link
Contributor

As for your question, @directhex, I think @premun has more context to make the best decision here.

@directhex changed the title from "Please create new Helix queue ubuntu.2004.ppc64le.experimental" to "Please create new Helix queue ubuntu.2004.ppc64le.experimental.open" Jun 8, 2022
@oleksandr-didyk
Contributor

After a quick chat with @premun, we decided that it would be better if we installed the agent ourselves. I'll reach out to you with the pubkey that needs to be added. Also, just in case: would it be possible to roll back the VM to a snapshot in the unlikely event that we mess something up during the installation?

@directhex
Contributor Author

I don't have direct access to the VM management (I didn't request it), but I can file a ticket for that if needed.

@premun
Member

premun commented Jun 8, 2022

@oleksandr-didyk looks like we have 4 attempts for now then 😅 should be doable

@oleksandr-didyk
Contributor

Just a quick update on task status: the PR was approved and merged, and we are waiting on the resolution of an issue with the build (related to the move to the new datacenter). Once that stops blocking us, we will continue with installing the agent on the VMs and will let you know once they are ready to accept tasks from the queue.

@oleksandr-didyk
Contributor

@directhex we successfully installed the agent on one of the VMs. Could you please submit a real job to the queue so we can test it end to end before we proceed with installing it on the other VMs? Thanks

@directhex
Contributor Author

The PR for that is dotnet/runtime#70734. Let's see how it goes.

@directhex
Contributor Author

Hm. What did the queue name end up as? I don't see it on helix.dot.net, and none of the variations I've tried work.

##[error].packages/microsoft.dotnet.helix.sdk/7.0.0-beta.22310.1/tools/Microsoft.DotNet.Helix.Sdk.MonoQueue.targets(46,5): error : (NETCORE_ENGINEERING_TELEMETRY=Build) Helix API does not contain an entry for Ubuntu.2004.PPC64EL.Experimental.Open

(Above value from the dotnet-helix-machines.git PR)

@directhex
Contributor Author

And was using "ppc64el" (the Debian/Ubuntu architecture name) rather than "ppc64le" (IBM's preferred name) intentional?
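The lookup failure and the ppc64el/ppc64le question above suggest checking a queue name against a list of known queues before submitting. Below is a minimal sketch of such a check, assuming the list of queue names has already been obtained somehow; the `find_queue` helper, the hard-coded list, and the case-insensitive comparison are all illustrative assumptions, not part of the Helix SDK.

```python
import difflib


def find_queue(requested, known_queues):
    """Return (exact_match, suggestions) for a requested queue name.

    Names are compared case-insensitively, which is an assumption of this
    sketch rather than documented Helix behaviour.
    """
    by_lower = {name.lower(): name for name in known_queues}
    exact = by_lower.get(requested.lower())
    if exact is not None:
        return exact, []
    # Near misses such as "ppc64el" vs "ppc64le" score very high here.
    close = difflib.get_close_matches(
        requested.lower(), list(by_lower), n=3, cutoff=0.9
    )
    return None, [by_lower[c] for c in close]


# Hypothetical queue list, hard-coded for illustration only.
known = [
    "ubuntu.2004.s390x.experimental.open",
    "ubuntu.2004.ppc64le.experimental.open",
]

match, suggestions = find_queue("Ubuntu.2004.PPC64EL.Experimental.Open", known)
if match is None:
    print("No such queue; did you mean:", suggestions)
```

Note that a check like this only helps against naming mix-ups. It would not catch a queue that exists under the expected name but on a different Helix instance.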

@premun
Member

premun commented Jun 15, 2022

@directhex the queue hasn't been released yet and is in staging only (https://helix.int-dot.net/). You will have to point the Helix SDK at staging:

<HelixBaseUri>https://helix.int-dot.net</HelixBaseUri>

However, please don't run all of the tests on staging: either change this property only for that specific RID, or comment out the other legs if possible.

We will rename the queue; that was a mistake (@oleksandr-didyk ^^)
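The idea of sending only one leg to staging can be sketched as a simple lookup from queue name to Helix base URI. This is an illustration of the selection logic only, not how the Helix SDK or the runtime build resolves `HelixBaseUri`; the `helix_base_uri` helper and the `STAGING_ONLY_QUEUES` set are hypothetical, and the queue name assumes the requested ppc64le spelling.

```python
PROD_BASE_URI = "https://helix.dot.net"
STAGING_BASE_URI = "https://helix.int-dot.net"

# Assumption: queues that so far exist only on the staging instance.
STAGING_ONLY_QUEUES = {"ubuntu.2004.ppc64le.experimental.open"}


def helix_base_uri(queue):
    """Pick the staging instance only for queues not yet released to prod."""
    if queue.lower() in STAGING_ONLY_QUEUES:
        return STAGING_BASE_URI
    return PROD_BASE_URI


for q in ("Ubuntu.2004.PPC64LE.Experimental.Open",
          "ubuntu.2004.s390x.experimental.open"):
    print(q, "->", helix_base_uri(q))
```

Keeping the staging override scoped to a named set of queues means every other leg keeps submitting to the production instance by default.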

@oleksandr-didyk
Contributor

Yeah, my bad, I forgot to mention that it was still on staging.

Sure, I will rename it to ppc64le. I wanted to keep it consistent with the naming of the underlying OS, but that's not what you requested.

@directhex
Contributor Author

I'm not going to change HelixBaseUri in the PR, in case I mess it up and submit everything there, but I've submitted that one job leg locally from my dev machine. I'm just waiting for a job to start running. Thanks for renaming the queue.

@premun
Member

premun commented Jun 15, 2022

@directhex it is possible that @oleksandr-didyk hasn't moved the VM he bootstrapped yesterday into the renamed queue yet

@directhex
Contributor Author

@oleksandr-didyk
Contributor

Yeah, the queue is up and the agent seems to be processing the messages without any issues. I apologize for the delay and will continue with adding the agents to the other VMs. Once they are ready, I'll let you know.

@directhex
Contributor Author

Thanks @oleksandr-didyk

What's the process going to be for getting this into the prod instance? The initial smoke test submitted from my dev machine looks promising; jobs are running and tests are passing: https://helix.int-dot.net/api/jobs/019947f6-a77e-4a08-ae41-e2c395e82d0d/workitems?api-version=2019-06-17
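For anyone repeating this kind of smoke test, the work items URL above can be built from a base URI and a job ID. Here is a minimal sketch using only the standard library; the `workitems_url` and `summarize_states` helpers are hypothetical, and the assumption that each returned work item carries a `State` field is mine, so check the actual response before relying on it.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

API_VERSION = "2019-06-17"  # taken from the URL quoted in this thread


def workitems_url(base_uri, job_id, api_version=API_VERSION):
    """Build the work items listing URL for a Helix job."""
    query = urlencode({"api-version": api_version})
    return f"{base_uri.rstrip('/')}/api/jobs/{job_id}/workitems?{query}"


def summarize_states(workitems):
    """Count work items per state.

    Assumes each item is a dict with a "State" key; that shape is an
    assumption of this sketch, not confirmed by the thread.
    """
    return Counter(item.get("State", "Unknown") for item in workitems)


def fetch_workitems(base_uri, job_id):
    """Fetch and decode the work items list (performs a network request)."""
    with urlopen(workitems_url(base_uri, job_id)) as response:
        return json.load(response)


url = workitems_url("https://helix.int-dot.net",
                    "019947f6-a77e-4a08-ae41-e2c395e82d0d")
print(url)
```

Calling `summarize_states(fetch_workitems(...))` would then give a quick pass/fail overview, provided the instance is reachable and the response has the assumed shape.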

@premun
Member

premun commented Jun 16, 2022

We just give the agent a different configuration file and it will connect to the PROD queue. So machines that are already set up and working can be moved once we roll out, most probably next Wednesday.

@oleksandr-didyk
Contributor

I just installed the agent on the rest of the VMs, and everything seems to be A-OK. As @premun mentioned, once the PROD queue is up after the next rollout, I'll switch the configuration files and post an update here.

@premun
Member

premun commented Jun 23, 2022

The new queue is up and we should move the machines into PROD. @oleksandr-didyk is this something you could do in between the bootcamp? I believe it's only you who has access to these VMs?

@oleksandr-didyk
Contributor

@directhex the VMs were updated to listen to the PROD queue and seem to be working and heartbeating just fine. Could you please verify on your end that it's A-OK so that the ticket can be marked Done? Thanks

@directhex
Contributor Author

Yep, the queue in prod is working.

6 participants