Syncs upstream stable/train into jg-ironic-rebalance #7

Open
wants to merge 93 commits into
base: jg-ironic-rebalance
from

Commits on Jul 8, 2020

  1. add [libvirt]/max_queues config option

    This change adds a max_queues config option to allow
    operators to set the maximum number of virtio queue
    pairs that can be allocated to a virtio network
    interface.
    
    Change-Id: I9abe783a9a9443c799e7c74a57cc30835f679a01
    Closes-Bug: #1847367
    (cherry picked from commit 0e6aac3)
    SeanMooney committed Jul 8, 2020
    286d7cf
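
As a rough illustration of the cap described in commit 286d7cf, the sketch below picks a queue-pair count for one interface. The function name and the "one queue pair per vCPU" default are assumptions made for this example and are not taken from the nova source.

```python
from typing import Optional


def effective_queue_pairs(vcpus: int, max_queues: Optional[int] = None) -> int:
    """Return the number of virtio queue pairs for one network interface.

    Assumption: the uncapped default is one queue pair per guest vCPU.
    `max_queues` models an operator-supplied upper bound (None = no cap).
    """
    if max_queues is not None and max_queues < 1:
        raise ValueError("max_queues must be a positive integer")
    if max_queues is None:
        return vcpus
    # The operator cap wins whenever the guest has more vCPUs than the cap.
    return min(vcpus, max_queues)
```

For example, a 16-vCPU guest with a cap of 8 would get 8 queue pairs, while a 4-vCPU guest would keep 4.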

Commits on Aug 26, 2020

  1. hardware: Reject requests for no hyperthreads on hosts with HT

    Attempting to boot an instance with 'hw:cpu_policy=dedicated' will
    result in a request from nova-scheduler to placement for allocation
    candidates with $flavor.vcpu 'PCPU' inventory. Similarly, booting an
    instance with 'hw:cpu_thread_policy=isolate' will result in a request
    for allocation candidates with 'HW_CPU_HYPERTHREADING=forbidden', i.e.
    hosts without hyperthreading. This has been the case since the
    cpu-resources feature was implemented in Train. However, as part of that
    work and to enable upgrades from hosts that predated Train, we also make
    a second request for candidates with $flavor.vcpu 'VCPU' inventory. The
    idea behind this is that old compute nodes would only report 'VCPU' and
    should be useable, and any new compute nodes that got caught up in this
    second request could never actually be scheduled to since there wouldn't
    be enough cores from 'ComputeNode.numa_topology.cells.[*].pcpuset'
    available to schedule to, resulting in rejection by the
    'NUMATopologyFilter'. However, if a host was rejected in the first
    query because it reported the 'HW_CPU_HYPERTHREADING' trait, it could
    get picked up by the second query and would happily be scheduled to,
    resulting in an instance consuming 'VCPU' inventory from a host that
    properly supported 'PCPU' inventory.
    
    The solution is simple, though also a huge hack. If we detect that the
    host is using new style configuration and should be able to report
    'PCPU', check if the instance asked for no hyperthreading and whether
    the host has it. If all are True, reject the request.
    
    Change-Id: Id39aaaac09585ca1a754b669351c86e234b89dd9
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1889633
    (cherry picked from commit 9c27033)
    (cherry picked from commit 7ddab32)
    stephenfin committed Aug 26, 2020
    44676dd
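
The three-way check described in commit 44676dd can be sketched as a simple predicate. The function and parameter names are hypothetical; the real check lives in nova's hardware code and works on host and instance objects rather than booleans.

```python
def should_reject_host(host_reports_pcpu: bool,
                       wants_no_hyperthreads: bool,
                       host_has_hyperthreads: bool) -> bool:
    """Reject only when all three conditions from the commit message hold.

    host_reports_pcpu:     host uses new-style config and can report 'PCPU'
    wants_no_hyperthreads: instance asked for 'hw:cpu_thread_policy=isolate'
    host_has_hyperthreads: host has hyperthreading enabled
    """
    return (host_reports_pcpu
            and wants_no_hyperthreads
            and host_has_hyperthreads)
```

A pre-Train host (which cannot report 'PCPU') still passes this check, which keeps the upgrade path described in the commit message intact.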

Commits on Sep 3, 2020

  1. compute: Validate a BDMs disk_bus when provided

    Previously disk_bus values were never validated and could easily end up
    being ignored by the underlying virt driver and hypervisor.
    
    For example, a common mistake made by users is to request a virtio-scsi
    disk_bus when using the libvirt virt driver. This however isn't a valid
    bus and is ignored, defaulting back to the virtio (virtio-blk) bus.
    
    This change adds a simple validation in the compute API using the
    potential disk_bus values provided by the DiskBus field class as used
    when validating the hw_*_bus image properties.
    
    Conflicts:
        nova/tests/unit/compute/test_compute_api.py
    
    NOTE(lyarwood): Conflict as If9c459a9a0aa752c478949e4240286cbdb146494 is
    not present in stable/train. test_validate_bdm_disk_bus is also updated
    as Ib31ba2cbff0ebb22503172d8801b6e0c3d2aa68a is not present in
    stable/train.
    
    Closes-Bug: #1876301
    Change-Id: I77b28b9cc8f99b159f628f4655d85ff305a71db8
    (cherry picked from commit 5913bd8)
    (cherry picked from commit fb31ae4)
    lyarwood committed Sep 3, 2020
    bbc562c
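
A minimal sketch of the validation idea in commit bbc562c follows. The set of bus names and the exception class are assumptions for illustration; the actual change relies on nova's DiskBus field class for the allowed values.

```python
# Illustrative set of bus names; the authoritative list is nova's DiskBus field.
VALID_DISK_BUSES = frozenset(
    {"fdc", "ide", "sata", "scsi", "usb", "virtio", "xen", "lxc", "uml"}
)


class InvalidDiskBus(ValueError):
    """Hypothetical error raised for an unknown disk_bus value."""


def validate_bdm_disk_bus(bdms):
    """Raise InvalidDiskBus if any block device mapping names an unknown bus."""
    for bdm in bdms:
        bus = bdm.get("disk_bus")
        if bus is None:
            continue  # disk_bus is optional; only validate when provided
        if bus not in VALID_DISK_BUSES:
            raise InvalidDiskBus("invalid disk_bus: %r" % bus)
```

With this check, the "virtio-scsi" mistake mentioned above is rejected up front instead of silently falling back to virtio-blk.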

Commits on Sep 8, 2020

  1. Add note and daxio version to the vPMEM document

    Make the spec of virtual persistent memory consistent with
    the contents of the admin manual, update the dependency of virtual
    persistent memory about daxio, and add NOTE for the tested kernel
    version.
    
    Closes-Bug: #1894022
    
    Change-Id: I30539bb47c98a588b95c066a394949d60af9c520
    (cherry picked from commit a8b0c6b)
    (cherry picked from commit eae463c)
    1049965823 committed Sep 8, 2020
    ed9eacf
  2. libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available

    Because of a libvirt issue [1], there is a bug [2]: if we set a cache mode
    whose write semantics are not O_DIRECT (i.e. unsafe, writeback or
    writethrough), volume drivers that explicitly request native I/O
    (e.g. nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,
    nova.virt.libvirt.volume.LibvirtNFSVolumeDriver and so on) run into
    problems.
    
    In that case, nova generates a libvirt XML definition for the instance
    that contains
    
    ```
    ...
    <disk ... >
      <driver ... cache='unsafe/writeback/writethrough' io='native' />
    </disk>
    ...
    ```
    As a result, the instance fails to start or the disk fails to attach.
    
    > When qemu is configured with a block device that has aio=native set, but
    > the cache mode doesn't use O_DIRECT (i.e. isn't cache=none/directsync or any
    > unnamed mode with explicit cache.direct=on), then the raw-posix block driver
    > for local files and block devices will silently fall back to aio=threads.
    > The blockdev-add interface rejects such combinations, but qemu can't
    > change the existing legacy interfaces that libvirt uses today.
    
    [1]: libvirt/libvirt@0583840
    [2]: https://bugzilla.redhat.com/show_bug.cgi?id=1086704
    
    Closes-Bug: #1841363
    Change-Id: If9acc054100a6733f3659a15dd9fc2d462e84d64
    (cherry picked from commit af2405e)
    (cherry picked from commit 0bd5892)
    Arthur Dayne authored and Elod Illes committed Sep 8, 2020
    d92fe4f
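
One plausible way to express the rule from commit d92fe4f is shown below. It assumes that the fix falls back to io='threads', mirroring the QEMU fallback quoted in the commit message; the function name is hypothetical.

```python
# Cache modes whose write semantics use O_DIRECT, per the quoted QEMU note.
O_DIRECT_CACHE_MODES = frozenset({"none", "directsync"})


def choose_io_mode(cache_mode: str, requested_io: str) -> str:
    """Return the I/O mode to emit in the <driver> element.

    io='native' is only kept when the cache mode uses O_DIRECT;
    otherwise fall back to 'threads' (an assumption of this sketch).
    """
    if requested_io == "native" and cache_mode not in O_DIRECT_CACHE_MODES:
        return "threads"
    return requested_io
```

Under this rule, the failing combination from the XML snippet above (for example cache='writeback' with io='native') is never generated.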

Commits on Sep 9, 2020

  1. post live migration: don't call Neutron needlessly

    In bug 1879787, the call to network_api.get_instance_nw_info() in
    _post_live_migration() on the source compute manager eventually calls
    out to the Neutron REST API. If this fails, the exception is
    unhandled, and the migrating instance - which is fully running on the
    destination at this point - will never be updated in the database.
    This update normally happens later in
    post_live_migration_at_destination().
    
    The network_info variable obtained from get_instance_nw_info() is used
    for two things: notifications - which aren't critical - and unplugging
    the instance's vifs on the source - which is very important!
    
    It turns out that at the time of the get_instance_nw_info() call, the
    network info in the instance info cache is still valid for unplugging
    the source vifs. The port bindings on the destination are only
    activated by the network_api.migrate_instance_start() [1] call that
    happens shortly *after* the problematic get_instance_nw_info() call.
    In other words, get_instance_nw_info() will always return the source
    ports. Because of that, we can replace it with a call to
    instance.get_network_info().
    
    NOTE(artom) The functional test has been excised, as in stable/train
    the NeutronFixture does not properly support live migration with
    ports, making the test worthless. The work to support this was done as
    part of bp/support-move-ops-with-qos-ports-ussuri, and starts at
    commit b2734b5.
    
    NOTE(artom) The
    test_post_live_migration_no_shared_storage_working_correctly and
    test_post_live_migration_cinder_v3_api unit tests had to be adjusted
    as part of the backport to pass with the new code.
    
    [1] https://opendev.org/openstack/nova/src/commit/d9e04c4ff0b1a9c3383f1848dc846e93030d83cb/nova/network/neutronv2/api.py#L2493-L2522
    
    Change-Id: If0fbae33ce2af198188c91638afef939256c2556
    Closes-bug: 1879787
    (cherry picked from commit 6488a5d)
    (cherry picked from commit 2c949cb)
    notartom committed Sep 9, 2020
    7ace26e

Commits on Sep 11, 2020

  1. Removes the delta file once image is extracted

    When creating a live snapshot of an instance, nova creates a
    copy of the instance disk using a QEMU shallow rebase. This
    copy - the delta file - is then extracted and uploaded. The
    delta file is eventually deleted when the temporary working
    directory nova uses for the live snapshot is discarded.
    Until that happens, however, we use 3x the size of the image
    in host disk space: the original disk, the delta file, and
    the extracted file. This can be problematic when snapshots of
    multiple instances are requested concurrently.
    
    The solution is simple: delete the delta file after it has
    been extracted and is no longer necessary.
    
    Change-Id: I15e9975fa516d81e7d34206e5a4069db5431caa9
    Closes-Bug: #1881727
    (cherry picked from commit d2af7ca)
    (cherry picked from commit e51555b)
    esubramanian-godaddy authored and tsecheran committed Sep 11, 2020
    06df7ca
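
The ordering change in commit 06df7ca (delete the delta file as soon as extraction is done) can be sketched as follows. The helper name is hypothetical, and `shutil.copyfile` merely stands in for the real image extraction step.

```python
import os
import shutil


def extract_then_discard_delta(delta_path, extracted_path,
                               extract=shutil.copyfile):
    """Extract the snapshot from the delta file, then delete the delta file.

    `extract` is a stand-in for the real conversion step (an assumption).
    """
    extract(delta_path, extracted_path)
    # Once extraction has succeeded the delta file is no longer needed,
    # so peak disk usage drops from roughly 3x to 2x the image size.
    os.remove(delta_path)
    return extracted_path
```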

Commits on Sep 12, 2020

  1. Merge "hardware: Reject requests for no hyperthreads on hosts with HT" into stable/train

    Zuul authored and openstack-gerrit committed Sep 12, 2020
    90c1b6a

Commits on Sep 13, 2020

  1. Correctly disable greendns

    Previously, we were setting the environment variable to disable
    greendns in eventlet *after* import eventlet. This has no effect, as
    eventlet processes environment variables at import time. This patch
    moves the setting of EVENTLET_NO_GREENDNS before importing eventlet in
    order to correctly disable greendns.
    
    Closes-bug: 1895322
    Change-Id: I4deed815c8984df095019a7f61d089f233f1fc66
    (cherry picked from commit 7c1d964)
    (cherry picked from commit 79e6b7f)
    notartom committed Sep 13, 2020
    4984b3b
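
The ordering issue fixed in commit 4984b3b comes down to setting the variable before the first import of eventlet. A minimal sketch is shown below; the try/except is only there so that the snippet runs where eventlet is not installed.

```python
import os

# eventlet reads EVENTLET_NO_GREENDNS at import time, so this assignment
# must happen before the first `import eventlet` anywhere in the process.
os.environ["EVENTLET_NO_GREENDNS"] = "yes"

try:
    import eventlet  # noqa: E402  (deliberately imported after the env var)
except ImportError:
    eventlet = None  # eventlet is optional for this sketch
```

Setting the variable after the import, as the code did before this fix, leaves greendns enabled.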

Commits on Sep 15, 2020

  1. cd9bd06

Commits on Sep 16, 2020

  1. 1ee93b9
  2. Sanity check instance mapping during scheduling

    mnaser reported a weird case where an instance was found
    in both cell0 (deleted there) and in cell1 (not deleted
    there but in error state from a failed build). It's unclear
    how this could happen besides some weird clustered rabbitmq
    issue where maybe the schedule and build request to conductor
    happens twice for the same instance and one picks a host and
    tries to build and the other fails during scheduling and is
    buried in cell0.
    
    To avoid a split brain situation like this, we add a sanity
    check in _bury_in_cell0 to make sure the instance mapping is
    not pointing at a cell when we go to update it to cell0.
    Similarly a check is added in the schedule_and_build_instances
    flow (the code is moved to a private method to make it easier
    to test).
    
    Worst case is this is unnecessary but doesn't hurt anything,
    best case is this helps avoid split brain clustered rabbit
    issues.
    
    Closes-Bug: #1775934
    
    Change-Id: I335113f0ec59516cb337d34b6fc9078ea202130f
    (cherry picked from commit 5b55251)
    mriedem authored and melwitt committed Sep 16, 2020
    efc35b1

Commits on Sep 17, 2020

  1. Merge "libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available" into stable/train

    Zuul authored and openstack-gerrit committed Sep 17, 2020
    d7a7db6

Commits on Sep 18, 2020

  1. tests: Add regression test for bug 1894966

    You must specify the 'policies' field. Currently, not doing so will
    result in an HTTP 500 error code. This should be a 4xx error. Add a test
    to demonstrate the bug before we provide a fix.
    
    Changes:
      nova/tests/functional/regressions/test_bug_1894966.py
    
    NOTE(stephenfin): Need to update 'super' call to Python 2-compatible
    variant.
    
    Change-Id: I72e85855f621d3a51cd58d14247abd302dcd958b
    Signed-off-by: Stephen Finucane <[email protected]>
    Related-Bug: #1894966
    (cherry picked from commit 2c66962)
    (cherry picked from commit 94d24e3)
    stephenfin committed Sep 18, 2020
    cf6db29
  2. api: Set min, maxItems for server_group.policies field

    As noted inline, the 'policies' field may be a list but it expects one
    of two items.
    
    Change-Id: I34c68df1e6330dab1524aa0abec733610211a407
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1894966
    (cherry picked from commit 32c43fc)
    (cherry picked from commit 781210b)
    stephenfin committed Sep 18, 2020
    1634d3f
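
To illustrate the min/maxItems constraint from commit 1634d3f, here is a sketch of a JSON-Schema-style fragment together with a tiny hand-written check. The schema layout and the helper are assumptions for this example; nova validates requests with its own schema machinery.

```python
# Hypothetical schema fragment: a list that must hold exactly one policy.
POLICIES_SCHEMA = {
    "type": "array",
    "items": {"type": "string", "enum": ["anti-affinity", "affinity"]},
    "minItems": 1,
    "maxItems": 1,
}


def policies_valid(policies, schema=POLICIES_SCHEMA):
    """Return True if `policies` satisfies the length and enum constraints."""
    if not isinstance(policies, list):
        return False
    if not schema["minItems"] <= len(policies) <= schema["maxItems"]:
        return False
    return all(p in schema["items"]["enum"] for p in policies)
```

Rejecting a bad list at validation time is what turns the HTTP 500 from the regression test into a 4xx response.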

Commits on Sep 21, 2020

  1. 4cf72ea

Commits on Sep 22, 2020

  1. Set different VirtualDevice.key

    In vSphere 7.0, devices can no longer share the same VirtualDevice.key,
    so set a different VirtualDevice.key value for each device.
    
    Change-Id: I574ed88729d2f0760ea4065cc0e542eea8d20cc2
    Closes-Bug: #1892961
    (cherry picked from commit a5d153a)
    (cherry picked from commit 0ea5bcc)
    yingjisun committed Sep 22, 2020
    75c0327

Commits on Oct 6, 2020

  1. 8f0c3e9
  2. ec01dad

Commits on Oct 12, 2020

  1. libvirt: 'video.vram' property must be an integer

    The 'vram' property of the 'video' device must be an integer else
    libvirt will spit the dummy out, e.g.
    
      libvirt.libvirtError: XML error: cannot parse video vram '8192.0'
    
    The division operator ('/') in Python 3 results in a float, not an integer
    like in Python 2. Use the floor division operator ('//') instead.
    
    Change-Id: Iebf678c229da4f455459d068cafeee5f241aea1f
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1896496
    (cherry picked from commit f2ca089)
    (cherry picked from commit fd7c66f)
    (cherry picked from commit 121e481)
    stephenfin committed Oct 12, 2020
    06b8f14
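
The Python 3 behaviour behind commit 06b8f14 is easy to reproduce. The variable names and the MiB-to-KiB conversion below are illustrative assumptions, not the exact expression used in the libvirt driver.

```python
KIB = 1024
MIB = 1024 * KIB

max_vram_mb = 8  # hypothetical flavor/image value, in MiB

# True division always yields a float in Python 3 ...
vram_float = max_vram_mb * MIB / KIB
# ... while floor division of two ints yields an int.
vram_int = max_vram_mb * MIB // KIB

print(vram_float)  # 8192.0
print(vram_int)    # 8192
```

Serialising the float into the XML is what produces the "cannot parse video vram '8192.0'" error quoted above.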

Commits on Oct 13, 2020

  1. Allow tap interface with multiqueue

    When vif_type="tap" (such as when using calico),
    attempting to create an instance using an image that has
    the property hw_vif_multiqueue_enabled=True fails, because
    the interface is always being created without multiqueue
    flags.
    
    This change checks if the property is defined and passes
    the multiqueue parameter to create the tap interface
    accordingly.
    
    In case the multiqueue parameter is passed but the
    vif_model is not virtio (or unspecified), the old
    behavior is maintained.
    
    Change-Id: I0307c43dcd0cace1620d2ac75925651d4ee2e96c
    Closes-bug: #1893263
    (cherry picked from commit 84cfc8e)
    (cherry picked from commit a69845f)
    rodrigogansobarbieri committed Oct 13, 2020
    750655c

Commits on Oct 14, 2020

  1. 8e23d72

Commits on Oct 15, 2020

  1. Merge "api: Set min, maxItems for server_group.policies field" into stable/train

    Zuul authored and openstack-gerrit committed Oct 15, 2020
    c718cf4

Commits on Oct 16, 2020

  1. e325040

Commits on Oct 17, 2020

  1. 17a233c

Commits on Oct 21, 2020

  1. Add a workaround config toggle to refuse ceph image upload

    If a compute node is backed by ceph, and the image is not clone-able
    in that same ceph, nova will try to download the image from glance
    and upload it to ceph itself. This is nice in that it "just works",
    but it also means we store that image in ceph in an extremely
    inefficient way. In a glance multi-store case with multiple ceph
    clusters, the user is currently required to make sure that the image
    they are going to use is stored in a backend local to the compute
    node they land on, and if they do not (or can not), then nova will
    do this non-COW inefficient copy of the image, which is likely not
    what the operator expects.
    
    Per the discussion at the Denver PTG, this adds a workaround flag
    which allows the operators to direct nova to *not* do this behavior
    and instead refuse to boot the instance entirely.
    
    Conflicts:
        nova/conf/workarounds.py
    
    NOTE(melwitt): The conflict is because this patch originally landed on
    ussuri and change If874f018ea996587e178219569c2903c2ee923cf (Reserve
    DISK_GB resource for the image cache) landed afterward and was
    backported to stable/train.
    
    Related-Bug: #1858877
    Change-Id: I069b6b1d28eaf1eee5c7fb8d0fdef9c0c229a1bf
    (cherry picked from commit 80191e6)
    kk7ds authored and lyarwood committed Oct 21, 2020
    794bedf
  2. Follow up for cherry-pick check for merge patch

    This is a follow up to change
    I8e4e5afc773d53dee9c1c24951bb07a45ddc2f1a which fixed an issue with
    validation when the topmost patch after a Zuul rebase is a merge
    patch.
    
    We need to also use the $commit_hash variable for the check for
    stable-only patches, else it will incorrectly fail because it is
    checking the merge patch's commit message.
    
    Change-Id: Ia725346b65dd5e2f16aa049c74b45d99e22b3524
    (cherry picked from commit 1e10461)
    (cherry picked from commit f1e4f6b)
    (cherry picked from commit e676a48)
    melwitt committed Oct 21, 2020
    115b43e

Commits on Oct 23, 2020

  1. 2a26f63

Commits on Oct 30, 2020

  1. 5016a36

Commits on Nov 2, 2020

  1. Prevent archiving of pci_devices records because of 'instance_uuid'

    Currently in the archive_deleted_rows code, we will attempt to clean up
    "residue" of deleted instance records by assuming any table with an
    'instance_uuid' column represents data tied to an instance's lifecycle
    and delete such records.
    
    This behavior poses a problem in the case where an instance has a PCI
    device allocated and someone deletes the instance. The 'instance_uuid'
    column in the pci_devices table is used to track the allocation
    association of a PCI with an instance. There is a small time window
    during which the instance record has been deleted but the PCI device
    has not yet been freed from a database record perspective as PCI
    devices are freed during the _complete_deletion method in the compute
    manager as part of the resource tracker update call.
    
    Records in the pci_devices table are anyway not related to the
    lifecycle of instances so they should not be considered residue to
    clean up if an instance is deleted. This adds a condition to avoid
    archiving pci_devices on the basis of an instance association.
    
    Closes-Bug: #1899541
    
    Change-Id: Ie62d3566230aa3e2786d129adbb2e3570b06e4c6
    (cherry picked from commit 1c256cf)
    (cherry picked from commit 09784db)
    (cherry picked from commit 79df36f)
    melwitt committed Nov 2, 2020
    e3bb611
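
The extra condition added in commit e3bb611 can be sketched as a filter over table metadata. The function name and the dict-of-columns input are assumptions for this example; the real code inspects the database schema.

```python
def tables_with_instance_residue(table_columns):
    """Return tables whose rows may be archived by instance_uuid.

    `table_columns` maps table name -> iterable of column names.
    pci_devices is skipped: its instance_uuid column tracks allocation,
    not data tied to the instance lifecycle.
    """
    return sorted(
        name
        for name, columns in table_columns.items()
        if "instance_uuid" in columns and name != "pci_devices"
    )
```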

Commits on Nov 3, 2020

  1. libvirt: Only ask tpool.Proxy to autowrap vir* classes

    I668643c836d46a25df46d4c99a973af5e50a39db attempted to fix service wide
    pauses by providing a more complete list of classes to tpool.Proxy.
    
    While this excluded libvirtError it can include internal libvirt-python
    classes pointed to by private globals that have been introduced with the
    use of type checking within the module.
    
    Any attempt to wrap these internal classes will result in the failure
    seen in bug #1901383. As a result this change simply ignores any class
    found during inspection that doesn't start with the `vir` string, used
    by libvirt to denote public methods and classes.
    
    Closes-Bug: #1901383
    Co-Authored-By: Daniel Berrange <[email protected]>
    Change-Id: I568b0c4fd6069b9118ff116532f14abb46cc42ab
    (cherry picked from commit 0d2ca53)
    (cherry picked from commit 048a333)
    (cherry picked from commit 36cb57d)
    lyarwood committed Nov 3, 2020
    cd83da5
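
The name-based filter described in commit cd83da5 can be sketched with the standard `inspect` module. The helper name is hypothetical; in nova, the resulting tuple would be what gets handed to tpool.Proxy for autowrapping.

```python
import inspect


def public_vir_classes(module):
    """Return the classes in `module` whose names start with 'vir'.

    Everything else (for example libvirtError, or private helper classes
    introduced for type checking) is ignored.
    """
    return tuple(
        obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if name.startswith("vir")
    )
```

Filtering on the public prefix avoids having to keep an explicit exclusion list in sync with libvirt-python internals.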

Commits on Nov 13, 2020

  1. Change default num_retries for glance to 3

    Previously, the default value of num_retries for glance was 0,
    which means that a request to glance is sent only once.
    The neutron and cinder clients, on the other hand, default
    to 3.
    To align the retry default with the other components, change
    the default value to 3.
    
    Closes-Bug: #1888168
    Change-Id: Ibbd4bd26408328b9e1a1128b3794721405631193
    (cherry picked from commit 662af9f)
    (cherry picked from commit 1f9dd69)
    knoha-rh committed Nov 13, 2020
    ca2fd80
  2. 60071a2
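
For the "Change default num_retries for glance to 3" commit (ca2fd80), the sketch below shows what a retry count means in practice. It assumes that num_retries counts retries on top of the initial request, so 0 means a single attempt; the helper and the choice of ConnectionError are illustrative only.

```python
def call_with_retries(func, num_retries=3):
    """Call `func`, retrying on ConnectionError up to `num_retries` times.

    Total attempts = 1 initial request + num_retries retries.
    """
    attempts = 1 + num_retries
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except ConnectionError as exc:
            last_error = exc  # remember the failure and try again
    raise last_error
```

With the old default of 0, a single transient failure was surfaced straight to the caller.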

Commits on Nov 19, 2020

  1. Test for disabling greendns

    In commit 7c1d964 we fixed how we disable greendns. This patch adds
    a test for this. It also lays down the groundwork for future tests
    of how we manage eventlet's monkeypatching.
    
    How and what eventlet monkeypatches can be controlled by environment
    variables that are processed by eventlet at import-time (for example,
    EVENTLET_NO_GREENDNS). Nova manages all of this in nova.monkey_patch.
    Therefore, nova.monkey_patch must be the first thing to import
    eventlet. As nova.tests.functional.__init__ imports nova.monkey_patch,
    our new test can go in the functional tree.
    
    Related-bug: 1895322
    Change-Id: I5b6c45b7b9a9eca3c13ecfaa5f50942922b69270
    (cherry picked from commit 6f35e4f)
    (cherry picked from commit 9ac794b)
    notartom authored and Elod Illes committed Nov 19, 2020
    5c3b4b6
  2. Add missing exception

    Change Idd49b0c70caedfcd42420ffa2ac926a6087d406e added support for
    discovery of PMEM devices by the libvirt driver. Some error handling
    code in this was expected to raise a 'GetPMEMNamespacesFailed'
    exception, however, a typo meant the exception was actually called
    'GetPMEMNamespaceFailed' (singular). This exception was later removed in
    change I6fd027fb51823b8a8a24ed7b864a2191c4e8e8c0 because it had no
    references.
    
    Re-introduce the exception, this time with the correct name, and add
    some unit tests to prevent us regressing.
    
    Conflicts:
    	nova/exception.py
    
    NOTE(stephenfin): Conflicts are because change
    I6fd027fb51823b8a8a24ed7b864a2191c4e8e8c0 doesn't exist on this branch,
    meaning the misnamed exception still exists and simply needs to be
    renamed.
    
    Change-Id: I3b597a46314a1b29a952fc0f7a9c4537341e37b8
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1904446
    (cherry picked from commit 160ed6f)
    (cherry picked from commit 82d415d)
    (cherry picked from commit 8f65de9)
    stephenfin committed Nov 19, 2020
    eaecd1c

Commits on Nov 26, 2020

  1. docs: Rework the PCI passthrough guides

    Rewrite the document, making the following changes:
    
    - Remove use of bullet points in favour of more descriptive steps
    - Cross-reference various configuration options
    - Emphasise that ``[pci] alias`` must be set on both controller and
      compute node
    - Style nits, such as fixing the header style
    
    Change-Id: I2ac7df7d235f0af25f5a99bc8f6abddbae2cb3af
    Signed-off-by: Stephen Finucane <[email protected]>
    Related-Bug: #1852727
    (cherry picked from commit d5259ab)
    stephenfin committed Nov 26, 2020
    74b2af4
  2. docs: Change order of PCI configuration steps

    It doesn't really make sense to describe the "higher level"
    configuration steps necessary for PCI passthrough before describing
    things like BIOS configuration. Simply switch the ordering.
    
    Change-Id: I4ea1d9a332d6585ce2c0d5a531fa3c4ad9c89482
    Signed-off-by: Stephen Finucane <[email protected]>
    Related-Bug: #1852727
    (cherry picked from commit 557728a)
    stephenfin committed Nov 26, 2020
    9223613
  3. docs: Clarify configuration steps for PF devices

    Devices that report SR-IOV capabilities cannot be used without special
    configuration - namely, the addition of "'device_type': 'type-PF'" or
    "'device_type': 'type-VF'" to the '[pci] alias' configuration option.
    Spell this out in the docs.
    
    Change-Id: I4abbe30505a5e4ccba16027addd6d5f45066e31b
    Signed-off-by: Stephen Finucane <[email protected]>
    Closes-Bug: #1852727
    (cherry picked from commit 810aafc)
    stephenfin committed Nov 26, 2020
    0c0c5b1

Commits on Nov 27, 2020

  1. Validate id as integer for os-aggregates

    According to the api-ref, the id passed to calls in os-aggregates is
    supposed to be an integer. No function validated this, so any value
    passed to these functions would directly reach the DB. While this is
    fine for SQLite, making a query with a string for an integer column on
    other databases like PostgreSQL results in a DBError exception and thus
    an HTTP 500 instead of 400 or 404.
    
    This commit adds validation for the id parameter the same way it's
    already done for other endpoints.
    
    Conflicts:
      nova/api/openstack/compute/aggregates.py
    
    Changes:
      nova/tests/unit/api/openstack/compute/test_aggregates.py
    
    NOTE(stephenfin): Conflicts are due to absence of change
    I4ab96095106b38737ed355fcad07e758f8b5a9b0 ("Add image caching API for
    aggregates") which we don't want to backport. A test related to this
    feature must also be removed.
    
    Change-Id: I83817f7301680801beaee375825f02eda526eda1
    Closes-Bug: 1865040
    (cherry picked from commit 2e70a17)
    joker-at-work authored and stephenfin committed Nov 27, 2020
    4653245
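
The kind of up-front check added in commit 4653245 can be sketched as below. The exception class and function name are hypothetical stand-ins; nova uses its own validation helpers and HTTP exception types.

```python
class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 response."""
    status = 400


def validate_aggregate_id(raw_id):
    """Convert the id from the URL to an int, or raise BadRequest.

    Rejecting non-integer ids here keeps them from ever reaching the
    database layer, where they would otherwise surface as a DBError.
    """
    try:
        return int(raw_id)
    except (TypeError, ValueError):
        raise BadRequest("aggregate id must be an integer: %r" % (raw_id,))
```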

Commits on Nov 28, 2020

  1. b96645e

Commits on Dec 1, 2020

  1. 81a3f4b

Commits on Dec 23, 2020

  1. [stable-only] Cap bandit to 1.6.2 and raise hacking, flake8 and stestr

    The 1.6.3 [1] release has dropped support for py2 [2] so cap to 1.6.2
    when using py2.
    
    This change also raises hacking to 1.1.0 in lower-constraints.txt after
    it was bumped by I35c654bd39f343417e0a1124263ff31dcd0b05c9. This also
    means that flake8 is bumped to 2.6.0. stestr is also bumped to 2.0.0 as
    required by oslotest 3.8.0.
    
    All of these changes are squashed into a single change to pass the gate.
    
    [1] https://github.com/PyCQA/bandit/releases/tag/1.6.3
    [2] PyCQA/bandit#615
    
    Depends-On: https://review.opendev.org/c/openstack/devstack/+/768256
    Depends-On: https://review.opendev.org/c/openstack/swift/+/766214
    
    Closes-Bug: #1907438
    Closes-Bug: #1907756
    Change-Id: Ie5221bf37c6ed9268a4aa0737ffcdd811e39360a
    lyarwood committed Dec 23, 2020
    b2037fc

Commits on Jan 9, 2021

  1. Fix a hacking test

    In test_useless_assertion,
    the useless_assertion method should be checked instead of
    nonexistent_assertion_methods_and_attributes.
    
    Change-Id: Ifd19f636f58ae353d912bde57cba2cd0a29a9baa
    Signed-off-by: Takashi Natsume <[email protected]>
    (cherry picked from commit 1175081)
    (cherry picked from commit f4d62e1)
    (cherry picked from commit 7562e64)
    takanattie committed Jan 9, 2021
    b6cc7e9

Commits on Jan 11, 2021

  1. Update pci stat pools based on PCI device changes

    At start up of nova-compute service, the PCI stat pools are
    populated based on information in pci_devices table in Nova
    database. The pools are updated only when a device is added or
    removed, not when an existing device changes (e.g. its device type).
    
    If an existing device is configured as SRIOV and nova-compute
    is restarted, the pci_devices table gets updated but the device
    is still listed under the old pool in pci_tracker.stats.pool
    (in-memory object).
    
    This patch looks for device type updates in existing devices
    and updates the pools accordingly.
    
    Conflicts:
          nova/tests/functional/libvirt/test_pci_sriov_servers.py
          nova/tests/unit/virt/libvirt/fakelibvirt.py
          nova/tests/functional/libvirt/base.py
    
    To avoid the conflicts and make the new functional test run, the
    following changes were made:
    - Modified the test case to use the flavor extra spec
      pci_passthrough:alias to create a server with an SR-IOV port instead
      of creating an SR-IOV port and passing the port information during
      server creation.
    - Removed changes in nova/tests/functional/libvirt/base.py as they
      are required only if neutron sriov port is created in the test
      case.
    
    Change-Id: Id4ebb06e634a612c8be4be6c678d8265e0b99730
    Closes-Bug: #1892361
    (cherry picked from commit b8695de)
    (cherry picked from commit d8b8a81)
    (cherry picked from commit f58399c)
    hemanthnakkina committed Jan 11, 2021
    8378785

Commits on Jan 19, 2021

  1. Use cell targeted context to query BDMs for metadata

    The metadata service supports a multicell deployment in a configuration
    where the nova-api service implements the metadata API. In this case the
    metadata query needs to be cell targeted. This was partly implemented
    already. The instance itself is queried from the cell DB properly.
    However the BDM data used the non targeted context resulting in an empty
    BDM returned by the metadata service.
    
    A functional reproduction test is not added as I did not find a way to
    have a cell setup in the functional test that reproduces the problem. I
    reproduced the bug and tested the fix in a devstack.
    
    Change-Id: I48f57082edaef3ec4722bd31ce29a90b94d32523
    Closes-Bug: #1881944
    (cherry picked from commit 1390eec)
    (cherry picked from commit 9a5b624)
    (cherry picked from commit c727cfc)
    Balazs Gibizer committed Jan 19, 2021
    4839d41

Commits on Jan 27, 2021

  1. Use subqueryload() instead of joinedload() for (system_)metadata

    Currently, when we "get" a single instance from the database and we
    load metadata and system_metadata, we do so using a joinedload() which
    does JOINs with the respective tables. Because of the one-to-many
    relationship between an instance and (system_)metadata records, doing
    the database query this way can result in a large number of additional
    rows being returned unnecessarily and cause a large data transfer.
    
    This is similar to the problem addressed by change
    I0610fb16ccce2ee95c318589c8abcc30613a3fe9 which added separate queries
    for (system_)metadata when we "get" multiple instances. We don't,
    however, reuse the same code for this change because
    _instances_fill_metadata converts the instance database object to a
    dict, and some callers of _instance_get_by_uuid need to be able to
    access an instance database object attached to the session (example:
    instance_update_and_get_original).
    
    By using subqueryload() [1], we can perform the additional queries for
    (system_)metadata to solve the problem with a similar approach.
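
    To see why the JOIN approach transfers more rows, here is a rough
    counting sketch in Python. It is not SQLAlchemy code and is not taken
    from the patch; the function names and the example counts are made up
    for illustration. It assumes both collections are fetched with LEFT
    OUTER JOINs in a single query, so the two one-to-many relationships
    multiply.

```python
def rows_with_joined_load(n_metadata, n_system_metadata):
    """Rows returned when both collections are JOINed to one instance row.

    Two independent one-to-many JOINs form a cross product. With an outer
    join, an empty collection still yields one row (with NULL columns).
    """
    return max(n_metadata, 1) * max(n_system_metadata, 1)


def rows_with_separate_queries(n_metadata, n_system_metadata):
    """Rows returned when each collection is loaded by its own query.

    One row for the instance, plus one row per metadata record and one
    row per system_metadata record.
    """
    return 1 + n_metadata + n_system_metadata


# Hypothetical instance with 50 metadata and 40 system_metadata records.
joined = rows_with_joined_load(50, 40)          # 2000 rows in one query
separate = rows_with_separate_queries(50, 40)   # 91 rows over three queries
```

    Under these assumptions the separate-query approach trades two extra
    round trips for far fewer rows on the wire.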
    
    Closes-Bug: #1799298
    
    [1] https://docs.sqlalchemy.org/en/13/orm/loading_relationships.html#subquery-eager-loading
    
    Change-Id: I5c071f70f669966e9807b38e99077c1cae5b4606
    (cherry picked from commit e728fe6)
    (cherry picked from commit 63d2e62)
    (cherry picked from commit e7a45e0)
    melwitt authored and s10 committed Jan 27, 2021
    4350074

Commits on Jan 29, 2021

  1. Disallow CONF.compute.max_disk_devices_to_attach = 0

    The CONF.compute.max_disk_devices_to_attach option controls the maximum
    number of disk devices allowed to attach to an instance. If it is set
    to 0, it will literally allow no disk device for instances, preventing
    them from being able to boot.
    
    This adds a note to the config option help to call this out and changes
    nova-compute to raise InvalidConfiguration during init_host if
    [compute]max_disk_devices_to_attach has been set to 0. The nova-compute
    service will fail to start if the option is set to 0.
    
    Note: there doesn't appear to be any way to disallow particular values
    in a oslo.config IntOpt other than the min/max values. Here we need the
    min value to be -1 to represent unlimited. There is a 'choices' kwarg
    available but that is only for enumerating valid values and we need to
    allow any integer >= 1 as well.
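
    The rule described above (-1 means unlimited, 0 is rejected, any
    integer >= 1 is a limit) can be sketched as a standalone check. This
    is only an illustration: InvalidConfiguration here is a local stand-in
    class and check_max_disk_devices is a hypothetical helper, not the
    code nova-compute runs in init_host.

```python
class InvalidConfiguration(Exception):
    """Local stand-in for the exception named in the commit message."""


def check_max_disk_devices(value):
    """Validate a max_disk_devices_to_attach-style integer.

    A min bound alone cannot exclude 0 while still allowing -1, so the
    zero case needs an explicit check at startup.
    """
    if value < -1:
        raise InvalidConfiguration("value must be -1 or an integer >= 1")
    if value == 0:
        raise InvalidConfiguration(
            "0 would allow no disk devices, so instances could not boot")
    return value
```

    Calling check_max_disk_devices(0) raises, while -1 and any positive
    integer pass through unchanged.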
    
    Change-Id: I6e30468bc28f661ddc17937ab1de04a706f05063
    Closes-Bug: #1897950
    (cherry picked from commit 25a632a)
    (cherry picked from commit 8e12b81)
    (cherry picked from commit 4ad7e5e)
    melwitt authored and stephenfin committed Jan 29, 2021
    62bf0ef

Commits on Feb 2, 2021

  1. Handle disabled CPU features to fix live migration failures

    When performing a live migration between hypervisors running
    libvirt, where one or more CPU features are disabled, nova does
    not take account of these. This results in migration failures
    as none of the available hypervisor targets appear compatible.
    
    This patch ensures that the libvirt 'disable' policy is taken
    into account, at least in a basic sense, by explicitly ignoring
    items flagged in this way when enumerating CPU features.
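
    A minimal sketch of the idea, assuming a simplified CPU XML fragment
    modeled on libvirt's <feature policy='...' name='...'/> elements. The
    helper name and the sample XML are illustrative and are not taken from
    the patch.

```python
import xml.etree.ElementTree as ET


def enabled_cpu_features(cpu_xml):
    """Return feature names, skipping any flagged policy='disable'."""
    root = ET.fromstring(cpu_xml)
    return {
        feature.get("name")
        for feature in root.iter("feature")
        if feature.get("policy") != "disable"
    }


# Hypothetical sample: 'mpx' is disabled and should not be reported.
SAMPLE_CPU_XML = """
<cpu>
  <model>Haswell</model>
  <feature policy='require' name='vmx'/>
  <feature policy='disable' name='mpx'/>
  <feature name='aes'/>
</cpu>
"""

features = enabled_cpu_features(SAMPLE_CPU_XML)
```

    With the sample above, features contains 'vmx' and 'aes' but not 'mpx'.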
    
    Closes-Bug: #1898715
    Change-Id: Iaf14ca97cfac99dd280d1114123f2d4bb6292b63
    (cherry picked from commit eeeca4c)
    (cherry picked from commit 45a4110)
    (cherry picked from commit b6c4731)
    andrewbonney authored and lyarwood committed Feb 2, 2021
    e9e0998

Commits on Feb 4, 2021

  1. Merge "Fix a hacking test" into stable/train

    Zuul authored and openstack-gerrit committed Feb 4, 2021
    1829580

Commits on Feb 5, 2021

  1. 42f8679

Commits on Feb 17, 2021

  1. tools: Allow check-cherry-picks.sh to be disabled by an env var

    The checks performed by this script aren't always useful to downstream
    consumers of the repo so allow them to disable the script without having
    to make changes to tox.ini.
    
    NOTE(lyarwood): This backport has
    Ie8a672fd21184c810bfe9c0e3a49582189bf2111 squashed into it to ensure the
    introduced env var is passed into the pep8 tox env.
    
    tox: Add passenv DISABLE_CHERRY_PICK_CHECK to pep8
    
    I4f551dc4b57905cab8aa005c5680223ad1b57639 introduced the environment
    variable to disable the check-cherry-picks.sh script but forgot to allow
    it to be passed into the pep8 tox env.
    
    Change-Id: I4f551dc4b57905cab8aa005c5680223ad1b57639
    (cherry picked from commit 610396f)
    (cherry picked from commit 3d86df0)
    (cherry picked from commit cb96119)
    lyarwood committed Feb 17, 2021
    3c88415

Commits on Feb 20, 2021

  1. Merge "libvirt: Only ask tpool.Proxy to autowrap vir* classes" into stable/train

    Zuul authored and openstack-gerrit committed Feb 20, 2021
    09d228d

Commits on Feb 23, 2021

  1. Merge "Disallow CONF.compute.max_disk_devices_to_attach = 0" into stable/train

    Zuul authored and openstack-gerrit committed Feb 23, 2021
    e39e622
  2. Raise InstanceMappingNotFound if StaleDataError is encountered

    We have a race where if a user issues a delete request while an
    instance is in the middle of booting, we could fail to update the
    'queued_for_delete' field on the instance mapping with:
    
      sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table
      'instance_mappings' expected to update 1 row(s); 0 were matched.
    
    This happens if we've retrieved the instance mapping record from the
    database and then it gets deleted by nova-conductor before we attempt
    to save() it.
    
    This handles the situation by adding try-except around the update call
    to catch StaleDataError and raise InstanceMappingNotFound instead,
    which the caller does know how to handle.
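
    The try-except described above might look roughly like the sketch
    below. Both exception classes are local stand-ins (the real
    StaleDataError lives in sqlalchemy.orm.exc), and mark_queued_for_delete
    and its save callable are hypothetical names used only for
    illustration.

```python
class StaleDataError(Exception):
    """Stand-in for sqlalchemy.orm.exc.StaleDataError."""


class InstanceMappingNotFound(Exception):
    """Stand-in for the exception the caller already handles."""


def mark_queued_for_delete(mapping, save):
    """Set queued_for_delete and persist it via the given save callable.

    If the row vanished between load and save, translate the low-level
    error into the not-found error that the caller knows how to handle.
    """
    mapping["queued_for_delete"] = True
    try:
        save(mapping)
    except StaleDataError:
        raise InstanceMappingNotFound(mapping["instance_uuid"])
```

    The caller then treats a mapping deleted mid-request the same way as
    one that never existed.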
    
    Closes-Bug: #1882608
    
    Change-Id: I2cdcad7226312ed81f4242c8d9ac919715524b48
    (cherry picked from commit 16df22d)
    (cherry picked from commit 812ce63)
    melwitt committed Feb 23, 2021
    98048ee

Commits on Mar 1, 2021

  1. replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id"

    When the flavor extra spec "hide_hypervisor_id" is used with the
    AggregateInstanceExtraSpecsFilter, the filter returns False.
    So we need to correct the extra spec.
    
    NOTE: The first two files do not exist in stable/train because the
    extra specs validators (patch Ib64a1348cce1dca995746214616c4f33d9d664bd)
    were introduced in Ussuri.
    The last file has a conflict due to the same
    bp/flavor-extra-spec-validators feature
    (patch: I8da84b48e4d630eeb91d92346aa2323e25e28e3b)
    added in Ussuri.
    
    Conflicts:
    	nova/api/validation/extra_specs/hw.py
    	nova/api/validation/extra_specs/null.py
    	nova/tests/unit/api/openstack/compute/test_flavors_extra_specs.py
    
    Change-Id: I9d8d8c3a30cf6da7e8fb48374347e069ab075df2
    Closes-Bug: 1841932
    (cherry picked from commit bf488a8)
    (cherry picked from commit 9d28d7e)
    ramboman authored and angeiv committed Mar 1, 2021
    f602536

Commits on Mar 11, 2021

  1. [stable-only] gate: Pin CEPH_RELEASE to nautilus in LM hook

    I1edd5a50079f325fa143a7e0d51b3aa3bb5ed45d moved the branchless
    devstack-plugin-ceph project to the Octopus release of Ceph that drops
    support for py2. As this was still the default on stable/train this
    breaks the nova-live-migration and nova-grenade jobs.
    
    This change works around this by pinning the CEPH_RELEASE to nautilus
    within the LM hook as was previously used prior to the above landing.
    
    Note that the devstack-plugin-ceph-tempest job from the plugin repo
    continues to pass as it is correctly pinned to the Luminous release that
    supports py2.
    
    If anything the above enforces the need to move away from these hook
    scripts and instead inherit our base ceph jobs from this repo in the
    future to avoid the Ceph release jumping around like this.
    
    Change-Id: I1d029ebe78b16ed2d4345201b515baf3701533d5
    lyarwood committed Mar 11, 2021
    ff570d1
  2. compute: Lock by instance.uuid lock during swap_volume

    The libvirt driver is currently the only virt driver implementing swap
    volume within Nova. While libvirt itself does support moving between
    multiple volumes attached to the same instance at the same time the
    current logic within the libvirt driver makes a call to
    virDomainGetXMLDesc that fails if there are active block jobs against
    any disk attached to the domain.
    
    This change simply uses an instance.uuid based lock in the compute layer
    to serialise requests to swap_volume, preventing this from happening.
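
    A per-instance lock of this kind can be sketched with the Python
    standard library. This is an illustration of the serialisation idea
    only: the names lock_for and swap_volume_serialised are hypothetical,
    and the sketch does not show the locking helper Nova itself uses.

```python
import threading
from collections import defaultdict

# One lock per instance UUID, created on first use.
_instance_locks = defaultdict(threading.Lock)
# Guards the dictionary itself so two threads cannot race on creation.
_registry_guard = threading.Lock()


def lock_for(instance_uuid):
    """Return the lock dedicated to the given instance UUID."""
    with _registry_guard:
        return _instance_locks[instance_uuid]


def swap_volume_serialised(instance_uuid, do_swap):
    """Run do_swap() while holding the instance's lock.

    Requests for the same instance run one at a time; requests for
    different instances do not block each other.
    """
    with lock_for(instance_uuid):
        return do_swap()
```

    Keying the lock on the instance rather than the volume is what stops
    two swaps against different disks of the same domain from overlapping.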
    
    Closes-Bug: #1896621
    Change-Id: Ic5ce2580e7638a47f1ffddb4edbb503bf490504c
    (cherry picked from commit 6cf449b)
    (cherry picked from commit eebf94b)
    (cherry picked from commit f7ba1aa)
    lyarwood committed Mar 11, 2021
    fb81b16

Commits on Mar 12, 2021

  1. Merge "replace the "hide_hypervisor_id" to "hw:hide_hypervisor_id"" into stable/train

    Zuul authored and openstack-gerrit committed Mar 12, 2021
    1d6d4e1

Commits on Mar 13, 2021

  1. Merge "Use subqueryload() instead of joinedload() for (system_)metadata" into stable/train

    Zuul authored and openstack-gerrit committed Mar 13, 2021
    dff026a

Commits on Mar 19, 2021

  1. Merge "compute: Lock by instance.uuid lock during swap_volume" into stable/train

    Zuul authored and openstack-gerrit committed Mar 19, 2021
    7139634

Commits on Mar 22, 2021

  1. Merge "Prevent archiving of pci_devices records because of 'instance_uuid'" into stable/train

    Zuul authored and openstack-gerrit committed Mar 22, 2021
    bf5f696

Commits on Mar 24, 2021

  1. Add config parameter 'live_migration_scheme' to live migration with tls guide
    
    This patch adds the config option 'live_migration_scheme = tls' to the
    secure live migration guide.
    
    To let the live migration use the qemu native tls, some configuration of
    the compute nodes is needed. The guide describes this but misses the
    'live_migration_scheme' config option.
    
    It is necessary to set 'live_migration_scheme' to tls to use the
    connection uri for encrypted traffic. Without this parameter everything
    seems to work, but the unencrypted tcp-connection is still used for the
    live migration.
    
    Closes-Bug: #1919357
    Change-Id: Ia5130d411706bf7e1c983156158011a3bc6d5cd6
    (cherry picked from commit 5d5ff82)
    (cherry picked from commit 276b8db)
    (cherry picked from commit a968289)
    josephineSei committed Mar 24, 2021
    8559cee

Commits on Mar 25, 2021

  1. Improve error log when snapshot fails

    If snapshot creation via glance fails due to lack of space or over
    quota, we want to have a clearer error message.
    
    Change-Id: Ic9133f6bc14d4fe766d37a438bf52c33e89da768
    Closes-Bug: #1613770
    (cherry picked from commit 024bf10)
    Vu Tran authored and lyarwood committed Mar 25, 2021
    9e9c022

Commits on Mar 31, 2021

  1. 04298cf

Commits on Apr 2, 2021

  1. add functional regression test for bug #1888395

    This change adds a functional regression test that
    asserts the broken behavior when trying to live migrate
    with a neutron backend that does not support multiple port
    bindings.
    
    Conflicts/Changes:
      nova/tests/functional/regressions/test_bug_1888395.py:
        - specify api major version to allow block_migration 'auto'
        - use TempDir fixture for instances path
        - worked around lack of create_server and start_computes in integrated
          helpers in train by inlining the behavior in setUp and test_live_migrate
        - reverted to python2 compatible super() syntax
      nova/tests/unit/virt/libvirt/fake_imagebackend.py:
        - include portion of change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848
          which stubs out the is_file_in_instance_path method. That was
          included in a feature patch set so just pulling the necessary
          bit.
    
    Change-Id: I470a016d35afe69809321bd67359f466c3feb90a
    Partial-Bug: #1888395
    (cherry picked from commit 71bc6fc)
    (cherry picked from commit bea55a7)
    SeanMooney committed Apr 2, 2021
    a4e2a6a

Commits on Apr 7, 2021

  1. Set migrate_data.vifs only when using multiple port bindings

    In the rocky cycle nova was enhanced to support the multiple
    port binding live migration workflow when neutron supports
    the binding-extended API extension.
    When the migration_data object was extended to support
    multiple port bindings, populating the vifs field was used
    as a sentinel to indicate that the new workflow should
    be used.
    
    In the train release
    I734cc01dce13f9e75a16639faf890ddb1661b7eb
    (SR-IOV Live migration indirect port support)
    broke the semantics of the migrate_data object by
    unconditionally populating the vifs field.
    
    This change restores the rocky semantics, which are depended
    on by several parts of the code base, by only conditionally
    populating vifs if neutron supports multiple port bindings.
    
    Changes to patch:
      - unit/virt/libvirt/fakelibvirt.py: Include partial pick from
        change Ia3d7351c1805d98bcb799ab0375673c7f1cb8848 to add the
        jobStats, complete_job and fail_job to fakelibvirt. The full
        change was not cherry-picked as it was part of the numa aware
        live migration feature in Victoria.
      - renamed import of nova.network.neutron to
        nova.network.neutronv2.api
      - mocked nova.virt.libvirt.guest.Guest.get_job_info to return
        fakelibvirt.VIR_DOMAIN_JOB_COMPLETED
      - replaced from urllib import parse as urlparse with
        import six.moves.urllib.parse as urlparse for py2.7
    
    Conflicts:
        nova/tests/functional/regressions/test_bug_1888395.py
        nova/tests/unit/compute/test_compute.py
        nova/tests/unit/compute/test_compute_mgr.py
        nova/tests/unit/virt/test_virt_drivers.py
    
    Co-Authored-By: Sean Mooney <[email protected]>
    Change-Id: Ia00277ac8a68a635db85f9e0ce2c6d8df396e0d8
    Closes-Bug: #1888395
    (cherry picked from commit b8f3be6)
    (cherry picked from commit afa843c)
    Comrade88 authored and SeanMooney committed Apr 7, 2021
    5a6fd88

Commits on Apr 13, 2021

  1. libvirt: Increase incremental and max sleep time during device detach

    Bug #1894804 outlines how DEVICE_DELETED events were often missing from
    QEMU on Focal based OpenStack CI hosts as originally seen in bug
    #1882521. This has eventually been tracked down to some undefined QEMU
    behaviour when a new device_del QMP command is received while another is
    still being processed, causing the original attempt to be aborted.
    
    We hit this race in slower OpenStack CI envs as n-cpu rather crudely
    retries attempts to detach devices using the RetryDecorator from
    oslo.service. The default incremental sleep time is currently tight
    enough that QEMU is still processing the first device_del request
    on these slower CI hosts when n-cpu asks libvirt to retry the detach,
    sending another device_del to QEMU and hitting the above behaviour.
    
    Additionally we have also seen the following check being hit when
    testing with QEMU >= v5.0.0. This check now rejects overlapping
    device_del requests in QEMU rather than aborting the original:
    
    qemu/qemu@cce8944
    
    This change aims to avoid this situation entirely by raising the default
    incremental sleep time between detach requests from 2 seconds to 10,
    leaving enough time for the first attempt to complete. The overall
    maximum sleep time is also increased from 30 to 60 seconds.
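
    The effect of the new values can be illustrated with a small sketch.
    It assumes the delay grows by the increment on each retry and is
    capped at the maximum; that is a simplification for illustration, not
    a copy of how RetryDecorator is implemented, and sleep_schedule is a
    hypothetical helper.

```python
def sleep_schedule(inc_sleep_time, max_sleep_time, attempts):
    """Yield the delay before each retry under a capped linear backoff."""
    delay = 0
    for _ in range(attempts):
        # Grow by the increment, but never exceed the configured maximum.
        delay = min(delay + inc_sleep_time, max_sleep_time)
        yield delay


# Old defaults from the text: increment 2s, maximum 30s.
old_delays = list(sleep_schedule(2, 30, 8))
# -> [2, 4, 6, 8, 10, 12, 14, 16]

# New defaults from the text: increment 10s, maximum 60s.
new_delays = list(sleep_schedule(10, 60, 8))
# -> [10, 20, 30, 40, 50, 60, 60, 60]
```

    Under this model the first retry moves from 2 seconds to 10 seconds
    after the initial detach request, which is the gap that matters for
    the overlapping device_del race.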
    
    Future work will aim to entirely remove this retry logic with a libvirt
    event driven approach, polling for the
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.
    
    Finally, the cleanup of unused arguments in detach_device_with_retry is
    left for a follow up change in order to keep this initial change small
    enough to quickly backport.
    
    Closes-Bug: #1882521
    Related-Bug: #1894804
    Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
    (cherry picked from commit dd1e6d4)
    (cherry picked from commit 4819f69)
    (cherry picked from commit f32286c)
    lyarwood authored and Elod Illes committed Apr 13, 2021
    618103d

Commits on Apr 17, 2021

  1. 366f938

Commits on Apr 24, 2021

  1. Merge "libvirt: Increase incremental and max sleep time during device detach" into stable/train

    Zuul authored and openstack-gerrit committed Apr 24, 2021
    779596b

Commits on Apr 27, 2021

  1. Make _rebase_with_qemu_img() generic

    Move volume_delete related logic away from this method, in order to make
    it generic and usable elsewhere.
    
    NOTE(lyarwood): Conflict caused by I52fbbcac9dc386f24ee81b3321dd0d8355e01976
    landing in stable/ussuri.
    
    Conflicts:
      nova/tests/unit/virt/libvirt/test_driver.py
    
    Change-Id: I17357d85f845d4160cb7c7784772530a1e92af76
    Related-Bug: #1732428
    (cherry picked from commit ce22034)
    (cherry picked from commit 2e89699)
    Alexandre Arents authored and lyarwood committed Apr 27, 2021
    c61ceac
  2. Rebase qcow2 images when unshelving an instance

    During unshelve, the instance is spawned from the image created by
    shelve, which is deleted just afterwards, while instance.image_ref
    still points to the original image the instance was built from.
    
    In a qcow2 environment this is an issue because the instance backing
    file no longer matches instance.image_ref, and during live-migration/resize
    the target host will fetch the image corresponding to instance.image_ref,
    resulting in instance corruption.
    
    This change fetches the original image and rebases the instance disk on it.
    This avoids the image_ref mismatch and restores the storage benefit of
    keeping a common image in cache.
    
    If the original image is no longer available in glance, the backing file is
    merged into the disk (flatten), ensuring instance integrity during the next
    live-migration/resize operation.
    
    NOTE(lyarwood): Test conflicts caused by If56842da51688 not being
    present in stable/train.
    
    Conflicts:
      nova/tests/unit/virt/libvirt/test_driver.py
    
    Change-Id: I1a33fadf0b7439cf06c06cba2bc06df6cef0945b
    Closes-Bug: #1732428
    (cherry picked from commit 8953a68)
    (cherry picked from commit 7003618)
    Alexandre Arents authored and lyarwood committed Apr 27, 2021
    3bc1502
  3. Update image_base_image_ref during rebuild.

    In several places we assume system_metadata.image_base_image_ref
    exists, because it is set during instance creation in the method
    _populate_instance_for_create.
    
    But once an instance is rebuilt, all system_metadata image properties are
    dropped and replaced by the new image properties, without setting
    image_base_image_ref again.
    
    This change proposes to set image_base_image_ref during rebuild.
    
    In the specific case of shelve/unshelve with a qcow2 backend,
    image_base_image_ref is used to rebase the disk image, so we ensure this
    property is set, as the instance may have been rebuilt before the fix was
    applied.
    
    Related-Bug: #1732428
    Closes-Bug: #1893618
    Change-Id: Ia3031ea1f7db8b398f02d2080ca603ded8970200
    (cherry picked from commit fe52b6c)
    (cherry picked from commit 5604140)
    Alexandre Arents authored and lyarwood committed Apr 27, 2021
    8a01a58

Commits on Apr 28, 2021

  1. 10df176

Commits on May 22, 2021

  1. 3d39a54

Commits on May 23, 2021

  1. a030ddd
  2. c5619c7

Commits on Jun 4, 2021

  1. c5c9713
  2. Merge "Set migrate_data.vifs only when using multiple port bindings" into stable/train

    Zuul authored and openstack-gerrit committed Jun 4, 2021
    b095ed1

Commits on Jun 18, 2021

  1. Move 'check-cherry-picks' test to gate, n-v check

    This currently runs in the 'check' pipeline, as part of the pep8 job,
    which causes otherwise perfectly valid backports to report as failing
    CI. There's no reason a stable core shouldn't be encouraged to review
    these patches: we simply want to prevent them *merging* before their
    parent(s). Resolve this conflict by moving the check to separate voting
    job in the 'gate' pipeline as well as a non-voting job in the 'check'
    pipeline to catch more obvious issues.
    
    Change-Id: Id3e4452883f6a3cf44ff58b39ded82e882e28c23
    Signed-off-by: Stephen Finucane <[email protected]>
    (cherry picked from commit 98b01c9)
    (cherry picked from commit fef0305)
    (cherry picked from commit b7677ae)
    (cherry picked from commit 91314f7)
    stephenfin authored and lyarwood committed Jun 18, 2021
    de94f42

Commits on Jun 25, 2021

  1. Merge "Add missing exception" into stable/train

    Zuul authored and openstack-gerrit committed Jun 25, 2021
    7acb9fc
  2. Use absolute path during qemu img rebase

    During an assisted volume snapshot delete request from Cinder, nova
    removes the snapshot from the backing file chain. While doing so, nova
    checks the existence of that file. However in some cases (see the bug
    report) the path is relative and therefore os.path.exists fails.
    
    This patch makes sure that nova uses the volume absolute path to make
    the backing file path absolute as well.
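
    A minimal sketch of the path handling, assuming a relative backing
    file name is meant relative to the directory that holds the volume
    file. The helper name and the example paths are hypothetical and are
    not taken from the patch.

```python
import os


def absolute_backing_path(volume_path, backing_file):
    """Resolve a backing file path against the volume's directory.

    An already absolute backing path is returned unchanged; a relative
    one is joined onto the directory of the (absolute) volume path so
    that checks such as os.path.exists no longer depend on the current
    working directory.
    """
    if os.path.isabs(backing_file):
        return backing_file
    volume_dir = os.path.dirname(os.path.abspath(volume_path))
    return os.path.join(volume_dir, backing_file)


# Hypothetical example paths.
resolved = absolute_backing_path("/mnt/vols/volume-1", "volume-1.snap")
```

    Here resolved points at volume-1.snap inside /mnt/vols, regardless of
    where the process happens to be running.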
    
    Closes-Bug: #1885528
    
    Change-Id: I58dca95251b607eaff602783fee2fc38e2421944
    (cherry picked from commit b933312)
    (cherry picked from commit 831abc9)
    (cherry picked from commit c2044d4)
    Balazs Gibizer authored and Elod Illes committed Jun 25, 2021
    e926ec7
  3. Merge "Add a workaround config toggle to refuse ceph image upload" into stable/train

    Zuul authored and openstack-gerrit committed Jun 25, 2021
    e6d6284
  4. db1660d

Commits on Jun 29, 2021

  1. Error anti-affinity violation on migrations

    Error-out the migrations (cold and live) whenever the
    anti-affinity policy is violated. This addresses
    violations when multiple concurrent migrations are
    requested.
    
    Added detection on:
    - prep_resize
    - check_can_live_migration_destination
    - pre_live_migration
    
    The improved method of detection now locks based on group_id
    and considers other migrations in-progress as well.
    
    Closes-bug: #1821755
    Change-Id: I32e6214568bb57f7613ddeba2c2c46da0320fabc
    (cherry picked from commit 33c8af1)
    (cherry picked from commit 8b62a4e)
    (cherry picked from commit 6ede6df)
    (cherry picked from commit bf90a1e)
    rodrigogansobarbieri committed Jun 29, 2021
    a22d1b0

Commits on Jul 9, 2021

  1. [neutron] Get only ID and name of the SGs from Neutron

    During the VM booting process Nova asks Neutron for the security groups
    of the project. If no fields are specified, Neutron will prepare the
    list of security groups with all fields, including rules.
    If the project has many SGs, this may take a long time as the rules need
    to be loaded separately for each SG on Neutron's side.
    
    During booting of the VM, Nova really needs only "id" and "name" of the
    security groups so this patch limits request to only those 2 fields.
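
    As an illustration of restricting the response to two fields, the
    sketch below builds a list request that repeats the fields query
    parameter. It only constructs a URL path with the standard library;
    the helper name and the example project ID are hypothetical, and this
    is not the client code the patch changes.

```python
from urllib.parse import urlencode


def security_groups_query(project_id):
    """Build a security-group list path that asks only for id and name."""
    params = {
        "tenant_id": project_id,
        # doseq=True below expands the list into repeated fields= pairs.
        "fields": ["id", "name"],
    }
    return "/v2.0/security-groups?" + urlencode(params, doseq=True)


path = security_groups_query("abc123")
# -> "/v2.0/security-groups?tenant_id=abc123&fields=id&fields=name"
```

    Because rules are not among the requested fields, the server side has
    no reason to load them for each group.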
    
    This lazy loading of the SG rules was introduced in Neutron in [1] and
    [2].
    
    [1] https://review.opendev.org/#/c/630401/
    [2] https://review.opendev.org/#/c/637407/
    
    Related-Bug: #1865223
    Change-Id: I15c3119857970c38541f4de270bd561166334548
    (cherry picked from commit 388498a)
    (cherry picked from commit 4f49545)
    (cherry picked from commit f7d84db)
    (cherry picked from commit be4a514)
    slawqo authored and Elod Illes committed Jul 9, 2021
    1aa5711

Commits on Aug 12, 2021

  1. only wait for plugtime events in pre-live-migration

    This change modifies _get_neutron_events_for_live_migration
    to filter the events to just the subset that will be sent
    at plug-time.
    
    Currently neutron has a bug whereby the dhcp agent
    sends a network-vif-plugged event during live migration after
    we update the port profile with "migrating-to:".
    This causes a network-vif-plugged event to be sent for
    configurations where vif_plugging in nova/os-vif is a noop.
    
    When that is corrected, the current logic in nova causes the migration
    to time out as it's waiting for an event that will never arrive.
    
    This change filters the set of events we wait for to just the plug
    time events.
    
    Conflicts:
        nova/compute/manager.py
        nova/tests/unit/compute/test_compute_mgr.py
    
    Related-Bug: #1815989
    Closes-Bug: #1901707
    Change-Id: Id2d8d72d30075200d2b07b847c4e5568599b0d3b
    (cherry picked from commit 8b33ac0)
    (cherry picked from commit ef348c4)
    (cherry picked from commit d9c833d)
    SeanMooney authored and lyarwood committed Aug 12, 2021
    c0a36d9
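The filtering idea in the commit above can be sketched roughly as below. The `Vif` class, its `sends_plug_time_event` flag, and `events_to_wait_for` are hypothetical names chosen for illustration; only the `network-vif-plugged` event name comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Vif:
    """Hypothetical, simplified view of a port attached to an instance."""
    port_id: str
    # True when the network backend emits network-vif-plugged at plug time.
    sends_plug_time_event: bool


def events_to_wait_for(vifs):
    """Return (event name, port id) pairs for plug-time events only.

    VIFs whose backend does not send an event at plug time are skipped,
    so the caller never waits for an event that will not arrive.
    """
    return [
        ("network-vif-plugged", vif.port_id)
        for vif in vifs
        if vif.sends_plug_time_event
    ]
```

Waiting on the unfiltered list would block until the timeout for any VIF whose plugging is a no-op, which is the failure the commit describes.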

Commits on Aug 15, 2021

  1. b8174ec

Commits on Aug 26, 2021

  1. Merge "Raise InstanceMappingNotFound if StaleDataError is encountered" into stable/train

    Zuul authored and openstack-gerrit committed Aug 26, 2021
    48ad6b4

Commits on Oct 5, 2021

  1. [stable-only] Pin virtualenv and setuptools

    Setuptools 58.0 (bundled in virtualenv 20.8) breaks the installation of
    decorator 3.4.0. So this patch pins virtualenv to avoid the break.
    
    As the 'requires' option used here was introduced in tox 3.2 [1],
    the required minversion has to be bumped, too.
    
    [1] https://tox.readthedocs.io/en/latest/config.html#conf-requires
    
    Conflicts:
        tox.ini
    
    NOTE(melwitt): The conflict is because change
    Ie1a0cbd82a617dbcc15729647218ac3e9cd0e5a9 (Stop testing Python 2) is
    not in Train.
    
    Change-Id: I26b2a14e0b91c0ab77299c3e4fbed5f7916fe8cf
    (cherry picked from commit b27f8e9)
    Balazs Gibizer authored and melwitt committed Oct 5, 2021
    f1be212

Commits on Oct 8, 2021

  1. Reject open redirection in the console proxy

    NOTE(melwitt): This is the combination of two commits, the bug fix and
    a followup change to the unit test to enable it to also run on
    Python < 3.6.
    
    Our console proxies (novnc, serial, spice) run in a websockify server
    whose request handler inherits from the python standard
    SimpleHTTPRequestHandler. There is a known issue [1] in the
    SimpleHTTPRequestHandler which allows open redirects by way of URLs
    in the following format:
    
      http://vncproxy.my.domain.com//example.com/%2F..
    
    which if visited, will redirect a user to example.com.
    
    We can intercept a request and reject requests that pass a redirection
    URL beginning with "//" by overriding the
    SimpleHTTPRequestHandler.send_head() method, which contains the
    vulnerability, to reject such requests with a 400 Bad Request.
    
    This code is copied from a patch suggested in one of the issue comments
    [2].
    
    Closes-Bug: #1927677
    
    [1] https://bugs.python.org/issue32084
    [2] https://bugs.python.org/issue32084#msg306545
    
    Conflicts:
        nova/tests/unit/console/test_websocketproxy.py
    
    NOTE(melwitt): The conflict is because change
    I23ac1cc79482d0fabb359486a4b934463854cae5 (Allow TLS ciphers/protocols
    to be configurable for console proxies) is not in Train.
    
    NOTE(melwitt): The difference from the cherry picked change:
    HTTPStatus.BAD_REQUEST => 400 is due to the fact that HTTPStatus does
    not exist in Python 2.7.
    
    Reduce mocking in test_reject_open_redirect for compat
    
    This is a followup for change Ie36401c782f023d1d5f2623732619105dc2cfa24
    to reduce mocking in the unit test coverage for it.
    
    While backporting the bug fix, it was found to be incompatible with
    Python versions < 3.6 due to a difference in internal
    implementation [1].
    
    This reduces the mocking in the unit test to be more agnostic to the
    internals of the StreamRequestHandler (ancestor of
    SimpleHTTPRequestHandler) and work across Python versions >= 2.7.
    
    Related-Bug: #1927677
    
    [1] python/cpython@34eeed4
    
    Change-Id: I546d376869a992601b443fb95acf1034da2a8f36
    (cherry picked from commit 214cabe)
    (cherry picked from commit 9c2f297)
    (cherry picked from commit 94e265f)
    (cherry picked from commit d43b88a)
    
    Change-Id: Ie36401c782f023d1d5f2623732619105dc2cfa24
    (cherry picked from commit 781612b)
    (cherry picked from commit 4709256)
    (cherry picked from commit 6b70350)
    (cherry picked from commit 719e651)
    melwitt authored and Elod Illes committed Oct 8, 2021
    04d4852
  2. address open redirect with 3 forward slashes

    Ie36401c782f023d1d5f2623732619105dc2cfa24 was intended
    to address OSSA-2021-002 (CVE-2021-3654); however, after its
    release it was discovered that the fix only worked
    for URLs with 2 leading slashes or more than 4.
    
    This change addresses the missing edge case for 3 leading slashes
    and also maintains support for rejecting 2+.
    
    Conflicts:
      nova/console/websocketproxy.py
      nova/tests/unit/console/test_websocketproxy.py
    
    NOTE(melwitt): The conflict and difference in websocketproxy.py from
    the cherry picked change: HTTPStatus.BAD_REQUEST => 400 is due to the
    fact that HTTPStatus does not exist in Python 2.7. The conflict in
    test_websocketproxy.py is because change
    I23ac1cc79482d0fabb359486a4b934463854cae5 (Allow TLS ciphers/protocols
    to be configurable for console proxies) is not in Train. The difference
    in test_websocketproxy.py from the cherry picked change is due to a
    difference in internal implementation [1] in Python < 3.6. See change
    I546d376869a992601b443fb95acf1034da2a8f36 for reference.
    
    [1] python/cpython@34eeed4
    
    Change-Id: I95f68be76330ff09e5eabb5ef8dd9a18f5547866
    co-authored-by: Matteo Pozza
    Closes-Bug: #1927677
    (cherry picked from commit 6fbd0b7)
    (cherry picked from commit 47dad48)
    (cherry picked from commit 9588cdb)
    (cherry picked from commit 0997043)
    SeanMooney authored and Elod Illes committed Oct 8, 2021
    8906552
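The rejection logic discussed in the two commits above can be sketched as follows. This is a simplified, Python 3 only illustration and not the patch itself: `is_open_redirect_attempt` and `RedirectSafeRequestHandler` are hypothetical names, and the sketch rejects every request path starting with "//", which is stricter than a fix that only checks paths resolving to a directory.

```python
from http.server import SimpleHTTPRequestHandler


def is_open_redirect_attempt(request_path):
    """Return True when the raw request path starts with two or more slashes.

    Browsers interpret "Location: //host" as an absolute URI, so a path
    such as "//example.com/%2F.." can be turned into a redirect to another
    site. Checking the raw path covers 2, 3, or any larger number of
    leading slashes with one test.
    """
    return request_path.startswith("//")


class RedirectSafeRequestHandler(SimpleHTTPRequestHandler):
    """Handler that answers 400 Bad Request instead of redirecting."""

    def send_head(self):
        if is_open_redirect_attempt(self.path):
            self.send_error(400, "URI must not start with //")
            return None
        return super().send_head()
```

Checking the raw request path, rather than a URL rebuilt from parsed components, avoids depending on how URL parsing treats runs of leading slashes.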

Commits on Jan 21, 2022

  1. Ensure MAC addresses characters are in the same case

    Currently neutron can report ports as having MAC addresses
    in upper case when they were created that way. Meanwhile, the
    libvirt configuration file always stores the MAC in lower case,
    which leads to a KeyError while trying to retrieve migrate_vif.
    
    Closes-Bug: #1945646
    Change-Id: Ie3129ee395427337e9abcef2f938012608f643e1
    (cherry picked from commit 6a15169)
    (cherry picked from commit 63a6388)
    (cherry picked from commit 6c3d5de)
    (cherry picked from commit 28d0059)
    (cherry picked from commit 184a3c9)
    Dmitriy Rabotyagov authored and MrStupnikov committed Jan 21, 2022
    a5da31e
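The case mismatch described in the commit above can be illustrated with a small sketch. The function name `find_migrate_vif` and the dictionary layout are assumptions made for this example; the point is only that both sides of the lookup are lower-cased before comparison.

```python
def find_migrate_vif(migrate_vifs, libvirt_mac):
    """Look up a VIF by MAC address, ignoring letter case.

    migrate_vifs is assumed to be a list of dicts with an "address" key,
    as reported by the network service (possibly in upper case).
    libvirt_mac is the MAC as stored in the guest configuration.
    """
    by_mac = {vif["address"].lower(): vif for vif in migrate_vifs}
    return by_mac[libvirt_mac.lower()]
```

Without the normalization, a port reported as "FA:16:3E:AA:BB:CC" would not match the key "fa:16:3e:aa:bb:cc" and the dictionary lookup would raise KeyError.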

Commits on May 12, 2022

  1. b18ee51