Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Launch several virtual machines simultaneously #291

Open
Ezhvsalate opened this issue Feb 18, 2020 · 9 comments
Open

Launch several virtual machines simultaneously #291

Ezhvsalate opened this issue Feb 18, 2020 · 9 comments

Comments

@Ezhvsalate
Copy link

Hi, @olivergondza

We encountered the following problem while using your plugin: new virtual machines are launched one by one, even when build queue is full.

Steps to reproduce:

  • Two templates of slaves: CENTOS & REDHAT
  • Settings are set in "Default slave options"
  • Max. No. of Instances = 50
  • Min. No. of Instances = 3
  • Launch 15 long-running jobs

Observed behaviour:

  • 6 job start on existing agents
  • 9 jobs are hanging in the queue
  • new agents are launched one-by-one, new vm is created only after a previous one is complete

What we want:

  • some parameter, which allows us to set up maximum number of agents to be launched simultaneously
    or
  • the way to configure such behaviour by some combination of existing parameters

Below are the logs showing that creation of 3 instances is planned, but they are launched one-by-one.

2020-02-18 08:26:51.501+0000 [id=11476] INFO j.p.o.c.JCloudsPreCreationThread#execute: Pre-creating 3 instance(s) for template rhel7 in cloud cloud.xxx.com

2020-02-18 08:26:51.513+0000 [id=11476] INFO j.p.o.c.JCloudsSlaveTemplate#provisionServer: Provisioning new openstack server rhel7-162 with options jenkins.plugins.openstack.compute.SlaveOptions@517f51dd[...]

2020-02-18 08:27:21.872+0000 [id=11476] INFO j.p.o.compute.JCloudsSlave#createComputer: Creating a new computer for rhel7-162

2020-02-18 08:27:21.886+0000 [id=11476] INFO j.p.o.c.JCloudsSlaveTemplate#provisionServer: Provisioning new openstack server rhel7-163 with options jenkins.plugins.openstack.compute.SlaveOptions@5723a58f[...]

2020-02-18 08:27:54.541+0000 [id=11476] INFO j.p.o.compute.JCloudsSlave#createComputer: Creating a new computer for rhel7-163

2020-02-18 08:27:54.561+0000 [id=11476] INFO j.p.o.c.JCloudsSlaveTemplate#provisionServer: Provisioning new openstack server rhel7-164 with options jenkins.plugins.openstack.compute.SlaveOptions@457210bd[...]

Thanks in advance.

@olivergondza
Copy link
Member

It is strange you cannot get to anything faster than one machine per ~30 seconds. The heavy lifting is supposed to be done in parallel so there either is something that causes that to be dispatched in quasi-serial way (which is fixable in plugin) or Jenkins heuristics for node planning are not requesting the capacity any faster than once per ~30 seconds (which is hard to fix).

Can you attach "Remaining excess workload" logs from NodeProvisioner and "Provisioning slave for" from JCloudsCloud (Fine logging needs to be turned on)? If NodeProvisioner is requesting capacity one by one, we know it is the core.

[1]

LOGGER.fine("Provisioning slave for " + label + " from template " + template.getName());

@Ezhvsalate
Copy link
Author

@olivergondza, thank you for quick answer!
We turned on 'FINE' logging level but unfortunately can't see the lines you mentioned in our logs.
Please check attached file.

provisioning.log

@olivergondza
Copy link
Member

Hmm, what version of the plugin are you using? The line is there since 2.46...

@Ezhvsalate
Copy link
Author

We currently have version 2.51

@Ezhvsalate
Copy link
Author

@olivergondza, today we installed version 2.52-SNAPSHOT (private-4acc8024-jenkins).
After that we repeated experiment and finally caught that message, but only once and it looked so:

Provisioning slave for null from template rhel7

You can see it in attached XML: log.xml.txt

When we tried once more - there didn't see this message anymore.
Maybe we have configured something wrong? If it is important, we used web interface, not jcasc.

@olivergondza
Copy link
Member

@Ezhvsalate, I see no logs from the NodeProvisioner there. In order to do a reasonable measurement, it is important those machines are scheduled based on queue demand. Then we have to evaluate whether the plugin invoked one node at a time or it is slow provisioning the bulk.

@Ezhvsalate
Copy link
Author

Ezhvsalate commented Feb 20, 2020

@olivergondza, ok, attached log with NodeProvisioner below.
provisioning_with_node_prov.log

@olivergondza
Copy link
Member

olivergondza commented Feb 21, 2020

@Ezhvsalate, I am sorry, but the log only contain one slave provisioning attempt (we need a lot more to assess why they are not dispatched fast when in bulk). I have retested in my env, and I have witnessed sub-second delays between "Provisioning new openstack server..." messages (meaning the servers are requested lot fasted than for you) and the "Provisioned: NovaServer..." interleaved with the previous ones confirming machines are provisioned in parallel. So in order to investigate in your environment, it is needed to:

  • Eliminate available Jenkins nodes not to interfere with the measurement.
  • Configure the UI logging like this[1].
  • Rapidly schedule at least 10 items in the queue that run on openstack provided nodes.
  • Wait until all nodes are running the builds.
  • Attach the entire log produced (this cannot be postponed as the log will be rotated eventualy)

Then we shall see.

[1]
screenshot

@Ezhvsalate
Copy link
Author

@olivergondza, when I carefully followed your instructions above it was ok - agents were starting simultaneously. Here is the log:
openstack_fast_creating_agents.log

But after that I returned back "Min. No. of Instances" setting and the issue was back :)

So, the following steps should be completed to to reproduce issue:

  1. Set the "Min./Max. No. of Instances" settings in "Default slave options" section to "3 / 50":
    image
  2. Wait until 3 agents (in our case -6, because there are two different types) are created one-by-one.
  3. Launch 15 builds. 6 will be started, 9 will fall in queue.
  4. Observe one-by-one creation of new agents:
    Openstack-one-by-one-creation.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants