Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LiveImage support #754

Closed
wants to merge 5 commits into from
Closed

Add LiveImage support #754

wants to merge 5 commits into from

Conversation

hardys
Copy link
Member

@hardys hardys commented Dec 16, 2020

First-pass of implementation related to metal3-io/metal3-docs#150

@metal3-io-bot metal3-io-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 16, 2020
@metal3-io-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hardys

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 16, 2020
@hardys
Copy link
Member Author

hardys commented Dec 16, 2020

This is still WIP as currently the BMH controller isn't triggering provisioning when liveImage is added, and I think we need to handle changing the DeployInterface after inspection as currently it only gets set on nodes.Create, feedback welcome though :)

/cc @sacharya

Also note this depends on #750

@metal3-io-bot
Copy link
Contributor

@hardys: GitHub didn't allow me to request PR reviews from the following users: sacharya.

Note that only metal3-io members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

This is still WIP as currently the BMH controller isn't triggering provisioning when liveImage is added, and I think we need to handle changing the DeployInterface after inspection as currently it only gets set on nodes.Create, feedback welcome though :)

/cc @sacharya

Also note this depends on #750

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@metal3-io-bot metal3-io-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 16, 2020
pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
}

// image_source
func (p *ironicProvisioner) getImageUpdateOptsForNode(ironicNode *nodes.Node, imageData *metal3v1alpha1.Image, liveImageData *metal3v1alpha1.LiveImage) (updates nodes.UpdateOpts, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside this method we have the information we need to derive imageData and liveImageData, so we shouldn't need to pass them as arguments.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did this is because since 2e4e00b from @zaneb the imageData is potentially different depending on where it's called - in ValidateManagementAccess it could use the Status image on adopt, but when called via setUpForProvisioning->getUpdateOptsForNode we'll only ever want to use the Spec?

I wasn't sure if unconditionally doing that check against the status was safe, so I tried to replicate the existing logic, I was worried about introducing a new corner case in addition to the one @zaneb was fixing..

pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
@metal3-io-bot metal3-io-bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 17, 2020
@hardys
Copy link
Member Author

hardys commented Dec 18, 2020

@dhellmann you asked for details of how I'm testing this:

metal3-io/metal3-dev-env#579 applied and export IMAGE_OS="FCOS-ISO" in config_$USER.sh

This should result in a locally cached FCOS iso:

$ curl -I http://172.22.0.1/images/fedora-coreos-33.20201201.2.1-live.x86_64.iso
HTTP/1.1 200 OK
Date: Fri, 18 Dec 2020 10:17:24 GMT
Server: Apache
Last-Modified: Fri, 18 Dec 2020 10:11:00 GMT
ETag: "2ee00000-5b6ba534537d3"
Accept-Ranges: bytes
Content-Length: 786432000
Content-Type: application/octet-stream

Then I'm running make and modifying the /opt/metal3-dev-env/bmhosts_crs.yaml to add liveImage to one of the BMH resources, e.g:

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: node-1
spec:
  online: true
  bootMACAddress: 00:8e:b1:1f:8e:4f
  bootMode: legacy
  bmc:
    address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/b85dd1da-439a-4da0-ba9d-686cc0177447
    credentialsName: node-1-bmc-secret
  liveImage:
    url: http://172.22.0.1/images/fedora-coreos-33.20201201.2.1-live.x86_64.iso

Then applying the change e.g:

$ kubectl apply -f bmhosts_crs.yaml -n metal3
secret/node-0-bmc-secret unchanged
baremetalhost.metal3.io/node-0 unchanged
secret/node-1-bmc-secret unchanged
baremetalhost.metal3.io/node-1 configured

Currently that doesn't result in any state change, the BMH remains in ready provisioning status, so I must be missing some Image reference somewhere, will keep digging :)

@@ -293,16 +293,25 @@ func (hsm *hostStateMachine) provisioningCancelled() bool {
if hsm.Host.HasError() {
return true
}
if hsm.Host.Spec.Image == nil {
if hsm.Host.Spec.Image == nil && hsm.Host.Spec.LiveImage == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is very similar to the BMH NeedsProvisioning() - I wonder if we could use the inverse of that here instead of duplicating nearly the same conditionals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be the NeedsDeprovisioning() method; we moved it here because having methods on the BMH object is actually bad. They're not inverses of each other.

pkg/provisioner/ironic/ironic.go Outdated Show resolved Hide resolved
if host.Spec.Image.URL == "" {
// We have an Image struct but it is empty
return false
if host.Spec.LiveImage != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to make the logic in this function a bit more complicated to deal with switching a host from live image to provisioned image, because in that case we may have a non-nil pointer to a struct with an empty string for the live image and we would want to treat that as though the image was set. We can probably deal with that case in another PR, after we have the basic case working.

@hardys
Copy link
Member Author

hardys commented Dec 23, 2020

/retitle Add LiveImage support

@metal3-io-bot metal3-io-bot changed the title WIP add LiveImage support Add LiveImage support Dec 23, 2020
@metal3-io-bot metal3-io-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 23, 2020
@hardys
Copy link
Member Author

hardys commented Dec 23, 2020

Ok testing locally this seems to be working now - so I removed the WIP but we need to decide if the case of switching from Image->LiveImage or LiveImage->Image needs to be handled here or in a followup (I've not tested that yet, just adding LiveImage where there is no existing Image

@hardys
Copy link
Member Author

hardys commented Jan 4, 2021

/cc @zaneb ptal when you get a moment, thanks!

@dhellmann
Copy link
Member

Ok testing locally this seems to be working now - so I removed the WIP but we need to decide if the case of switching from Image->LiveImage or LiveImage->Image needs to be handled here or in a followup (I've not tested that yet, just adding LiveImage where there is no existing Image

I'd be happy to have that work in a follow-up PR, to make things easier to review.

@dhellmann
Copy link
Member

/test-integration

@@ -293,16 +293,25 @@ func (hsm *hostStateMachine) provisioningCancelled() bool {
if hsm.Host.HasError() {
return true
}
if hsm.Host.Spec.Image == nil {
if hsm.Host.Spec.Image == nil && hsm.Host.Spec.LiveImage == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This used to be the NeedsDeprovisioning() method; we moved it here because having methods on the BMH object is actually bad. They're not inverses of each other.

return true
}
if hsm.Host.Status.Provisioning.Image.URL == "" {
if hsm.Host.Spec.LiveImage != nil && hsm.Host.Status.Provisioning.LiveImage.URL == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle the case where Spec.LiveImage is set and Status.Provisioning.Image.URL is also set, in which case we want to return true.

Similarly, below we don't handle the case where Spec.Image is set and Status.Provisioning.LiveImage.URL is also set.

I think the correct logic is something like:

switch {
case hsm.Host.Spec.LiveImage != nil:
    if hsm.Host.Spec.LiveImage.URL == "" {
        return true
    }
    if hsm.Host.Status.Provisioning.Image.URL != "" {
        return true
    }
    if hsm.Hosts.Status.Provisioning.LiveImage.URL == "" {
        return false
    }
    if hsm.Host.Spec.LiveImage != nil && hsm.Host.Spec.LiveImage.URL != hsm.Host.Status.Provisioning.LiveImage.URL {
        return true
    }
}
case hsm.Host.Spec.Image != nil:
    if hsm.Host.Spec.Image.URL == "" {
        return true
    }
    if hsm.Host.Status.Provisioning.LiveImage.URL != "" {
        return true
    }
    if hsm.Host.Status.Provisioning.Image.URL == "" {
        return false
    }
    if hsm.Host.Spec.Image.URL != hsm.Host.Status.Provisioning.Image.URL {
        return true
    }
default:
    return true
}
return false

Assuming that hsm.Host.Spec.LiveImage and hsm.Host.Spec.Image cannot both be non-nil. If they both can be non-nil it quickly devolves even further into a nightmare. Seeing all this duplicated/permuted logic makes me wonder whether we should just have an ISO flag in the regular Image struct instead. At least in the Status maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @zaneb - In the docstring I do mention that only Image or LiveImage should be set, but I agree long term they should be validated as mutually exclusive.

Re the ISO flag, that was my original proposal but this approach was preferred since it's possible we might support a live kernel/ramdisk in future, and the checksum argument to the current Image section currently isn't applicable to the LiveImage case: metal3-io/metal3-docs#150 (comment)

} else {
op = nodes.ReplaceOp
p.log.Info("updating boot_iso")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also need to remove any options corresponding to the other type of image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think there are several corner cases in the Image->LiveImage and LiveImage->Image case which I've not yet tested or accounted for in the code.

I can attempt to do that in this PR, or as previously mentioned by @dhellmann we might want to land this first iteration then take care of that as a followup, so it's easier to review?

@hardys
Copy link
Member Author

hardys commented Jan 8, 2021

/hold after discussion I'm working on an alternative version which uses the Image DiskFormat option, then we can compare and decide which interface to go with (the DiskFormat approach should be significantly simpler than that in this PR and avoid some of the corner cases discussed in the review comments)

@metal3-io-bot metal3-io-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 8, 2021
@hardys hardys mentioned this pull request Jan 11, 2021
@hardys
Copy link
Member Author

hardys commented Jan 12, 2021

Closing in favor of #759

@hardys hardys closed this Jan 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants