openstackgerrit | Merged openstack/nova master: Remove Stein compute compat checks for volume type support https://review.opendev.org/687428 | 00:08 |
---|---|---|
*** d34dh0r53 has quit IRC | 00:51 | |
*** nweinber has joined #openstack-nova | 01:02 | |
*** nweinber has quit IRC | 01:04 | |
*** nweinber has joined #openstack-nova | 01:05 | |
*** nanzha has joined #openstack-nova | 01:08 | |
*** ociuhandu has joined #openstack-nova | 01:08 | |
*** ociuhandu has quit IRC | 01:13 | |
*** Liang__ has joined #openstack-nova | 01:13 | |
*** nweinber has quit IRC | 01:13 | |
*** hamzy_ has quit IRC | 01:33 | |
*** nanzha has quit IRC | 01:37 | |
*** nanzha has joined #openstack-nova | 01:38 | |
*** ganso has quit IRC | 02:18 | |
*** ganso has joined #openstack-nova | 02:18 | |
*** ricolin has joined #openstack-nova | 02:29 | |
*** larainema has joined #openstack-nova | 02:31 | |
*** gbarros has joined #openstack-nova | 02:31 | |
*** spsurya has joined #openstack-nova | 02:32 | |
*** gbarros has quit IRC | 02:43 | |
*** xek has quit IRC | 02:49 | |
*** nanzha has quit IRC | 02:54 | |
*** nanzha has joined #openstack-nova | 02:54 | |
*** mkrai_ has joined #openstack-nova | 03:48 | |
*** ociuhandu has joined #openstack-nova | 04:31 | |
*** factor has quit IRC | 04:32 | |
*** ociuhandu has quit IRC | 04:36 | |
*** gbarros has joined #openstack-nova | 05:02 | |
*** Luzi has joined #openstack-nova | 05:03 | |
*** gbarros has quit IRC | 05:06 | |
*** dave-mccowan has quit IRC | 05:08 | |
*** jhesketh has quit IRC | 05:11 | |
*** markvoelker has joined #openstack-nova | 05:21 | |
*** markvoelker has quit IRC | 05:26 | |
*** gbarros has joined #openstack-nova | 05:31 | |
*** gbarros has quit IRC | 05:35 | |
*** hoonetorg has quit IRC | 05:37 | |
*** Liang__ has quit IRC | 05:41 | |
*** gbarros has joined #openstack-nova | 05:42 | |
*** Liang__ has joined #openstack-nova | 05:46 | |
*** hoonetorg has joined #openstack-nova | 05:48 | |
*** gbarros has quit IRC | 05:49 | |
*** ttsiouts has joined #openstack-nova | 06:23 | |
*** slaweq has joined #openstack-nova | 06:31 | |
*** ccamacho has joined #openstack-nova | 06:39 | |
*** dpawlik has joined #openstack-nova | 06:51 | |
*** gbarros has joined #openstack-nova | 06:53 | |
bauzas | good morning Nova | 06:58 |
*** gbarros has quit IRC | 06:58 | |
gibi | bauzas: good morning! | 07:11 |
bauzas | :) | 07:11 |
*** pcaruana has joined #openstack-nova | 07:16 | |
*** maciejjozefczyk has joined #openstack-nova | 07:22 | |
*** tesseract has joined #openstack-nova | 07:22 | |
*** damien_r has joined #openstack-nova | 07:31 | |
*** rcernin has quit IRC | 07:32 | |
*** damien_r has quit IRC | 07:34 | |
*** damien_r has joined #openstack-nova | 07:36 | |
*** gbarros has joined #openstack-nova | 07:44 | |
*** avolkov has joined #openstack-nova | 07:46 | |
*** gbarros has quit IRC | 07:50 | |
*** rpittau|afk is now known as rpittau | 07:51 | |
*** ralonsoh has joined #openstack-nova | 08:07 | |
lyarwood | remote: amqps://messaging-devops-broker02.web.prod.ext.phx2.redhat.com:5671: proton:io: recv: Connection refused | 08:18 |
lyarwood | argh sorry | 08:18 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Fix ItemMatcher to avoid false positives https://review.opendev.org/689690 | 08:23 |
*** nanzha has quit IRC | 08:23 | |
*** nanzha has joined #openstack-nova | 08:25 | |
mdbooth | sean-k-mooney: You around, yet? Mind taking a look at Stephen's unplug vif patch and the functional test I wrote for it? Head: https://review.opendev.org/#/c/663382/12 Base: https://review.opendev.org/#/c/689186/2 | 08:33 |
*** dpawlik has quit IRC | 08:34 | |
mdbooth | 4 patches in total | 08:34 |
mdbooth | 2 bugs, unfortunately, as the functional test uncovered another one | 08:34 |
*** derekh has joined #openstack-nova | 08:36 | |
*** brinzhang has joined #openstack-nova | 08:40 | |
*** vesper has joined #openstack-nova | 08:40 | |
*** vesper11 has quit IRC | 08:40 | |
*** hamzy_ has joined #openstack-nova | 08:41 | |
*** dtantsur|afk is now known as dtantsur | 08:41 | |
*** jangutter has joined #openstack-nova | 08:42 | |
*** dpawlik has joined #openstack-nova | 08:44 | |
*** nanzha has quit IRC | 08:53 | |
*** mdbooth has quit IRC | 08:55 | |
*** ociuhandu has joined #openstack-nova | 08:56 | |
*** mdbooth has joined #openstack-nova | 08:57 | |
*** brinzhang_ has joined #openstack-nova | 08:57 | |
*** ociuhandu has quit IRC | 09:00 | |
*** brinzhang has quit IRC | 09:00 | |
*** jaosorior has joined #openstack-nova | 09:06 | |
*** nanzha has joined #openstack-nova | 09:06 | |
*** ociuhandu has joined #openstack-nova | 09:10 | |
*** ociuhandu has quit IRC | 09:11 | |
*** ociuhandu has joined #openstack-nova | 09:12 | |
*** xek has joined #openstack-nova | 09:18 | |
*** shilpasd has joined #openstack-nova | 09:24 | |
*** openstackgerrit has quit IRC | 09:37 | |
*** Luzi has quit IRC | 09:43 | |
*** ociuhandu has quit IRC | 09:47 | |
*** ociuhandu has joined #openstack-nova | 09:48 | |
*** Liang__ has quit IRC | 09:49 | |
*** ociuhandu has quit IRC | 09:53 | |
*** yaawang_ has quit IRC | 09:53 | |
*** yaawang_ has joined #openstack-nova | 09:54 | |
*** ociuhandu has joined #openstack-nova | 09:54 | |
*** ociuhandu has quit IRC | 09:55 | |
*** ociuhandu has joined #openstack-nova | 09:56 | |
*** ociuhandu has quit IRC | 09:56 | |
*** ociuhandu has joined #openstack-nova | 09:56 | |
*** slaweq has quit IRC | 10:04 | |
*** ttsiouts has quit IRC | 10:20 | |
*** ttsiouts has joined #openstack-nova | 10:20 | |
*** ttsiouts has quit IRC | 10:24 | |
*** rcernin has joined #openstack-nova | 10:42 | |
*** mkrai_ has quit IRC | 10:46 | |
*** mkrai__ has joined #openstack-nova | 10:46 | |
*** tbachman has quit IRC | 10:47 | |
*** ttsiouts has joined #openstack-nova | 10:50 | |
*** mkrai__ has quit IRC | 10:50 | |
*** mkrai_ has joined #openstack-nova | 10:50 | |
*** tssurya has joined #openstack-nova | 10:51 | |
*** rcernin has quit IRC | 10:52 | |
*** ttsiouts has quit IRC | 10:55 | |
*** brinzhang_ has quit IRC | 10:56 | |
*** mkrai_ has quit IRC | 10:56 | |
*** brinzhang_ has joined #openstack-nova | 10:56 | |
*** brinzhang has joined #openstack-nova | 10:57 | |
*** brinzhang_ has quit IRC | 11:01 | |
*** markvoelker has joined #openstack-nova | 11:10 | |
*** trident has quit IRC | 11:11 | |
*** ttsiouts has joined #openstack-nova | 11:13 | |
*** ociuhandu has quit IRC | 11:14 | |
*** ociuhandu has joined #openstack-nova | 11:15 | |
*** trident has joined #openstack-nova | 11:15 | |
*** markvoelker has quit IRC | 11:16 | |
*** openstackgerrit has joined #openstack-nova | 11:18 | |
openstackgerrit | Merged openstack/nova master: Add PrepResizeAtDestTask https://review.opendev.org/627890 | 11:18 |
openstackgerrit | Merged openstack/nova master: Add prep_snapshot_based_resize_at_source compute method https://review.opendev.org/634832 | 11:19 |
*** ociuhandu has quit IRC | 11:19 | |
*** ociuhandu has joined #openstack-nova | 11:20 | |
*** ricolin_ has joined #openstack-nova | 11:23 | |
*** rcernin has joined #openstack-nova | 11:24 | |
*** ociuhandu has quit IRC | 11:24 | |
*** ricolin has quit IRC | 11:25 | |
*** ttsiouts has quit IRC | 11:28 | |
*** ttsiouts has joined #openstack-nova | 11:29 | |
openstackgerrit | Merged openstack/nova master: Add PrepResizeAtSourceTask https://review.opendev.org/627891 | 11:31 |
*** ttsiouts has quit IRC | 11:33 | |
*** rcernin has quit IRC | 11:34 | |
*** brinzhang_ has joined #openstack-nova | 11:39 | |
*** brinzhang has quit IRC | 11:43 | |
*** jistr is now known as jistr|mtgs | 11:47 | |
*** nanzha has quit IRC | 11:48 | |
*** jangutter has quit IRC | 11:49 | |
*** dviroel has joined #openstack-nova | 11:50 | |
*** mgariepy has joined #openstack-nova | 11:51 | |
*** nanzha has joined #openstack-nova | 11:54 | |
*** ttsiouts has joined #openstack-nova | 11:55 | |
*** shilpasd has quit IRC | 11:58 | |
*** markvoelker has joined #openstack-nova | 11:59 | |
*** jangutter has joined #openstack-nova | 12:02 | |
*** tbachman has joined #openstack-nova | 12:03 | |
*** tbachman_ has joined #openstack-nova | 12:08 | |
*** tbachman has quit IRC | 12:08 | |
*** tbachman_ is now known as tbachman | 12:08 | |
*** ociuhandu has joined #openstack-nova | 12:16 | |
*** larainema has quit IRC | 12:19 | |
*** another_larsks is now known as larsks | 12:29 | |
*** jaosorior has quit IRC | 12:32 | |
*** obre has quit IRC | 12:39 | |
*** ociuhandu has quit IRC | 12:44 | |
*** ociuhandu has joined #openstack-nova | 12:45 | |
*** ociuhandu has quit IRC | 12:50 | |
*** ociuhandu has joined #openstack-nova | 12:55 | |
*** Luzi has joined #openstack-nova | 12:55 | |
*** obre has joined #openstack-nova | 13:00 | |
*** davee__ has joined #openstack-nova | 13:00 | |
*** mmethot has joined #openstack-nova | 13:17 | |
*** nweinber has joined #openstack-nova | 13:18 | |
*** damien_r has quit IRC | 13:22 | |
*** Luzi has quit IRC | 13:27 | |
*** dave-mccowan has joined #openstack-nova | 13:27 | |
*** bnemec has joined #openstack-nova | 13:29 | |
*** shilpasd has joined #openstack-nova | 13:31 | |
*** damien_r has joined #openstack-nova | 13:33 | |
*** brinzhang has joined #openstack-nova | 13:34 | |
*** brinzhang has quit IRC | 13:35 | |
*** maciejjozefczyk is now known as mjozefcz|lunch | 13:35 | |
*** brinzhang has joined #openstack-nova | 13:35 | |
*** eharney has joined #openstack-nova | 13:36 | |
*** brinzhang_ has quit IRC | 13:36 | |
*** mjozefcz|lunch has quit IRC | 13:40 | |
*** gbarros has joined #openstack-nova | 13:41 | |
*** jaosorior has joined #openstack-nova | 13:46 | |
*** xek has quit IRC | 13:49 | |
*** mjozefcz|lunch has joined #openstack-nova | 13:51 | |
*** xek has joined #openstack-nova | 13:51 | |
dansmith | stephenfin: so, I got a novaclient change merged, what is the magic I need to be able to use that api from osc? a release and requirements bump? | 14:08 |
efried | dansmith: I think stephenfin is on vacay until the ptg. | 14:09 |
dansmith | oh right | 14:09 |
efried | dtroyer: --^ | 14:09 |
dansmith | I could go bug other people (like dtroyer) I just was going to keep it in the family since we have experienced people here | 14:10 |
dansmith | I'm sure mriedem knows too, were he around | 14:10 |
*** ociuhandu has quit IRC | 14:12 | |
*** mriedem has joined #openstack-nova | 14:20 | |
mriedem | happy monday everyone! | 14:20 |
dansmith | um. | 14:21 |
* mriedem is delirious from cold pills | 14:22 | |
dansmith | mriedem: so, I got a novaclient change merged, what is the magic I need to be able to use that api from osc? a release and requirements bump? | 14:22 |
mriedem | dansmith: yeah, brinzhang has a novaclient release patch i was holding for you | 14:22 |
mriedem | just update the git hash | 14:22 |
mriedem | https://review.opendev.org/#/c/688638/ | 14:22 |
mriedem | looks like he already updated it to include your stuff | 14:23 |
*** nanzha has quit IRC | 14:23 | |
dansmith | yeah | 14:23 |
dansmith | just found that | 14:23 |
bauzas | mriedem: happy end of monday for me ! | 14:24 |
*** nanzha has joined #openstack-nova | 14:28 | |
*** spatel has joined #openstack-nova | 14:28 | |
*** ttsiouts has quit IRC | 14:35 | |
*** ttsiouts has joined #openstack-nova | 14:35 | |
*** ttsiouts has quit IRC | 14:40 | |
*** jmlowe has quit IRC | 14:43 | |
*** jmlowe has joined #openstack-nova | 14:45 | |
*** dpawlik has quit IRC | 14:45 | |
*** mkrai_ has joined #openstack-nova | 14:47 | |
*** dklyle has joined #openstack-nova | 14:50 | |
*** brinzhang has quit IRC | 14:52 | |
*** brinzhang has joined #openstack-nova | 14:53 | |
*** macz has joined #openstack-nova | 14:54 | |
lyarwood | mriedem: https://review.opendev.org/#/q/topic:bug/1835400+status:open - would you mind hitting these? | 15:00 |
*** mjozefcz|lunch has quit IRC | 15:00 | |
mriedem | sure | 15:02 |
*** TxGirlGeek has joined #openstack-nova | 15:02 | |
lyarwood | thanks | 15:03 |
*** ttsiouts has joined #openstack-nova | 15:04 | |
mriedem | lyarwood: btw i'm assuming you've seen the emails about getting queens released sometime this week before EM yeah? | 15:05 |
lyarwood | mriedem: no sorry, just catching up after yet more time out at the tail end of last week, I'll take a look after this meeting. | 15:06 |
mriedem | ok, tl;dr unfortunately a lot of what's sitting in queens is waiting to merge on newer branches, so kind of need to flush from there but you know how that goes. would be helpful to start with these in train: https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:stable/train | 15:08 |
*** mlavalle has joined #openstack-nova | 15:10 | |
lyarwood | mriedem: ack I'll make a start tonight | 15:16 |
*** priteau has joined #openstack-nova | 15:18 | |
*** artom has joined #openstack-nova | 15:19 | |
*** igordc has joined #openstack-nova | 15:27 | |
mriedem | gibi: are you ready for https://blueprints.launchpad.net/nova/+spec/support-move-ops-with-qos-ports-ussuri to go into an open runway slot? | 15:28 |
gibi | mriedem: yes, I am | 15:28 |
*** trident has quit IRC | 15:29 | |
gibi | mriedem: zuul hates https://review.opendev.org/#/c/688387/ as it always fails with a different reason | 15:29 |
*** trident has joined #openstack-nova | 15:32 | |
*** igordc has quit IRC | 15:34 | |
*** damien_r has quit IRC | 15:38 | |
*** mkrai_ has quit IRC | 15:38 | |
*** mkrai_ has joined #openstack-nova | 15:39 | |
*** priteau has quit IRC | 15:41 | |
*** markvoelker has quit IRC | 15:46 | |
*** ttsiouts has quit IRC | 15:48 | |
*** tbachman has quit IRC | 15:48 | |
dansmith | mriedem: can probably remove mine from a slot now.. just client and docs stuff remaining, and then some notification stuff with gibi's help | 15:48 |
*** ttsiouts has joined #openstack-nova | 15:48 | |
gibi | dansmith: do you need my help writing the notification patch or just my eyes to review it? | 15:49 |
dansmith | gibi: yes :D | 15:49 |
gibi | :D | 15:49 |
*** markvoelker has joined #openstack-nova | 15:49 | |
mriedem | gibi: i'd be wary of nova-live-migration job failures on that patch since it's the only job that also runs evacuate | 15:50 |
mriedem | sean-k-mooney: mdbooth: i don't see why we need to tie this regression test https://review.opendev.org/#/c/689278/ to the libvirt functional test base or the refactors you're doing in that libvirt func test tree | 15:51 |
gibi | mriedem: I will double check that but it does not fail constantly | 15:51 |
mriedem | that means unnecessary backports | 15:51 |
*** mkrai_ has quit IRC | 15:51 | |
mriedem | and the plug/unplug thing is driver agnostic | 15:51 |
mriedem | gibi: https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/job-output.txt#8630 | 15:52 |
*** tssurya has quit IRC | 15:52 | |
mriedem | gibi: that might be http://status.openstack.org/elastic-recheck/#1813789 | 15:52 |
mriedem | but would have to check | 15:52 |
gibi | mriedem: will do | 15:52 |
*** ttsiouts has quit IRC | 15:53 | |
gmann | mriedem: melwitt replied on host_status policy patch. that is only a doc bug for 2.75. let me know if it's ok to fix in the same patch, otherwise i can push a new one - https://review.opendev.org/#/c/679181/1/nova/policies/servers.py@120 | 15:55 |
mriedem | dansmith: done, removed from queue | 15:55 |
dansmith | cool | 15:55 |
mriedem | gmann: maybe push a separate fix and melwitt can rebase on top of it | 15:56 |
gmann | mriedem: ok, the same applies to extended-server-attributes. https://github.com/openstack/nova/blob/964d7dc87989b5765fcc60d34f734963ab8e03e7/nova/policies/extended_server_attributes.py | 15:57 |
*** shilpasd has quit IRC | 16:00 | |
*** slaweq has joined #openstack-nova | 16:00 | |
melwitt | gmann: yeah if you upload a fix I'll rebase my patch on top of it | 16:00 |
gmann | melwitt: ok, doing.. | 16:00 |
mriedem | gibi: something occurs to me - what if _update_pci_request_spec_with_allocated_interface_name runs and updates the instance pci_requests, then we fail later in whatever operation (resize, cold migrate, evacuate), we don't roll that back right? so instance.pci_requests would be wrong after that and might cause issues, but i'm not sure what kind - maybe hard reboot would fail? | 16:00 |
*** tesseract has quit IRC | 16:03 | |
*** jaosorior has quit IRC | 16:03 | |
gibi | mriedem: good point. The InstancePciRequest change is not rolled back. | 16:04 |
gibi | mriedem: it could cause the failure of the pci_claim in the rt | 16:04 |
gibi | mriedem: do we re-claim resources at hard reboot? I don't think so | 16:04 |
mriedem | no | 16:05 |
gibi | mriedem: if the instance is migrated forward then the migration will update the pci request again | 16:05 |
gibi | so the pci_claim at the second migration will see consistent data | 16:06 |
*** mgariepy has quit IRC | 16:06 | |
gibi | mriedem: now I think not rolling this back does not cause problems, but I agree that it is not nice to have wrong data in the PCI request after rollback | 16:07 |
*** brinzhang_ has joined #openstack-nova | 16:08 | |
*** jistr|mtgs is now known as jistr | 16:08 | |
mriedem | so could we have a case like, | 16:08 |
mriedem | 1. create server using port request and it gets parent_ifname foo, | 16:09 |
mriedem | 2. try to migrate the server and we update the instance pci_request to point at parent_ifname bar, | 16:09 |
mriedem | 3. migrate fails and we don't rollback the instance pci_request | 16:09 |
mriedem | 4. another server is created - it can't use parent_ifname bar even though nothing is using it right? | 16:09 |
*** mgariepy has joined #openstack-nova | 16:10 | |
mriedem | iow it's "claimed" by the failed first server | 16:10 |
gibi | this is not part of the claim | 16:10 |
gibi | it drives the claim to select the right PF | 16:10 |
gibi | but the whole PF is never claimed | 16:10 |
*** brinzhang has quit IRC | 16:10 | |
*** mjozefcz|lunch has joined #openstack-nova | 16:11 | |
mriedem | ok, then how about if we hard reboot the failed server, will it try to use parent_ifname bar and fail if another guest on the same host is already using that? | 16:11 |
gibi | mriedem: if we hard reboot then nobody will check the parent_ifname. It is only the VF pci address that matters | 16:11 |
mriedem | ok i don't know how any of this stuff is actually used in the driver | 16:12 |
mriedem | just thinking about fallout scenarios | 16:12 |
gibi | parent_ifname is not a consumable, it is like a trait | 16:12 |
gibi | it helps selecting the proper pool of VFs during the pci_claim | 16:12 |
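gibi's point that parent_ifname steers the claim but is never itself consumed can be modelled with a toy pool selector. This is a minimal sketch under assumptions: `VFPool` and `claim_vf` are hypothetical names, not nova's PCI tracker API, and each pool is simplified to one list of free VF addresses per PF.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VFPool:
    # Hypothetical, simplified PCI pool: the free VFs of one PF.
    parent_ifname: str
    free_vfs: List[str] = field(default_factory=list)


def claim_vf(pools: List[VFPool], parent_ifname: str) -> Optional[str]:
    """Claim one VF from the pool whose PF matches parent_ifname.

    parent_ifname acts like a trait: it only filters which pool is
    eligible. What gets consumed is a single VF address; the PF itself
    is never claimed, so a stale parent_ifname in a request reserves nothing.
    """
    for pool in pools:
        if pool.parent_ifname == parent_ifname and pool.free_vfs:
            return pool.free_vfs.pop(0)
    return None
```

In this model a request left pointing at "ens2" after a failed migration does not stop another server from claiming a VF on "ens2"; only actual claims shrink a pool.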
*** nanzha has quit IRC | 16:13 | |
mriedem | efried: alex_xu: looks like we have a fun TypeError for the resources stuff added late in Train https://zuul.opendev.org/t/openstack/build/2b10b4a240b84245bcee3366db93951d/log/logs/screen-n-cpu.txt.gz?severity=4#2675 | 16:13 |
*** dtantsur is now known as dtantsur|afk | 16:14 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Fix policy doc for host_status and extended servers attribute https://review.opendev.org/689833 | 16:15 |
gibi | mriedem: made a TODO to look into how hard it is to roll back the parent_ifname during rollback or revert | 16:15 |
* gibi leaves for today | 16:17 | |
mriedem | gibi: ack, not a huge deal | 16:17 |
mriedem | i know you like to pull these types of threads in your spare time :) | 16:17 |
gibi | :) | 16:17 |
mriedem | efried: alex_xu: https://bugs.launchpad.net/nova/+bug/1849165 seems there is a race in that resources code in the RT during migrations | 16:18 |
openstack | Launchpad bug 1849165 in OpenStack Compute (nova) "_populate_assigned_resources raises TypeError: argument of type 'NoneType' is not iterable" [High,New] | 16:18 |
mriedem | this is where i say even things that touch the RT which aren't used still have side effects... | 16:22 |
*** mjozefcz|lunch has quit IRC | 16:22 | |
*** markvoelker has quit IRC | 16:23 | |
efried | mriedem: is that happening because mig.instance.migration_context isn't set at that point? | 16:30 |
*** macz has quit IRC | 16:31 | |
*** macz has joined #openstack-nova | 16:31 | |
mriedem | yeah | 16:33 |
mriedem | i haven't traced everything here, but in this case i think the RT is running the periodic on the dest host before the instance gets there | 16:33 |
mriedem | so the Migration record exists pointing at the source and dest host, but the instance hasn't moved yet | 16:33 |
mriedem | and given the migration record is usually created in the control plane but the migration context doesn't exist until we do a claim in the compute, there is a window | 16:34 |
mriedem | dansmith: remember https://review.opendev.org/#/c/274870/ ? i'm seeing where we hit the KeyError in _pair_instances_to_migrations and end up still lazy-loading the migration.instance.migration_context/flavor - what do you think about changing Migration.instance to load the Instance with migration_context and flavor fields always? | 16:34 |
openstackgerrit | Eric Fried proposed openstack/nova master: Always trait the compute node RP with COMPUTE_NODE https://review.opendev.org/688979 | 16:35 |
efried | bauzas: added reno ^ | 16:35 |
*** dpawlik has joined #openstack-nova | 16:35 | |
efried | mriedem: so `if not mig_ctx: continue` ? | 16:37 |
mriedem | efried: yeah i think so | 16:39 |
efried | mriedem: I'll throw that out | 16:39 |
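efried's proposed `if not mig_ctx: continue` guard can be sketched against a dict-based stand-in for the resource tracker helper. Assumption: the helper does a membership test on the instance's migration context (something like `'new_resources' in mig_ctx`), which is what the `argument of type 'NoneType' is not iterable` TypeError in bug 1849165 suggests. `populate_assigned_resources` and the dict keys are hypothetical, not nova's actual code.

```python
from typing import Any, Dict, List


def populate_assigned_resources(host: str,
                                tracked_migrations: List[Dict[str, Any]]) -> List[str]:
    """Collect resources assigned to migrations involving `host` (sketch)."""
    resources: List[str] = []
    for mig in tracked_migrations:
        mig_ctx = mig["instance"]["migration_context"]
        if not mig_ctx:
            # Incoming migration tracked before the dest claim created the
            # migration context; without this guard the membership tests
            # below would raise TypeError on None.
            continue
        if mig["source_compute"] == host and "old_resources" in mig_ctx:
            resources.extend(mig_ctx["old_resources"] or [])
        if mig["dest_compute"] == host and "new_resources" in mig_ctx:
            resources.extend(mig_ctx["new_resources"] or [])
    return resources
```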
*** spsurya has quit IRC | 16:40 | |
mriedem | _update_usage_from_migration is what populates self.tracked_migrations and you can see the "elif incoming and not tracked:" logic | 16:40 |
mriedem | that's the case we're hitting here i think | 16:40 |
*** slaweq has quit IRC | 16:40 | |
mriedem | [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] [instance: cd4148a2-4550-4e83-b6f7-c91752eaf779] Starting to track incoming migration 407fd025-e8ba-4012-ade7-d0255d2a1837 | 16:40 |
mriedem | [None req-dd5ddbad-4234-4288-bbab-2c3d20b7f4ad None None] Error updating resources for node ubuntu-bionic-rax-iad-0012404623. | 16:41 |
efried | mriedem: but it looks like we should be under COMPUTE_RESOURCE_SEMAPHORE in all those code paths | 16:43 |
gmann | mriedem: melwitt https://review.opendev.org/#/c/689833/ | 16:43 |
mriedem | doesn't matter | 16:43 |
mriedem | efried: that's per-worker, not global | 16:43 |
mriedem | efried: so: | 16:44 |
mriedem | 1. conductor creates migration record with source and dest host set, | 16:44 |
mriedem | 2. update_available_resource runs on dest host to start tracking incoming migration | 16:44 |
mriedem | kaboom | 16:44 |
mriedem | 3. claim happens on dest host to create migration_context (on a different request thread) | 16:44 |
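The ordering mriedem lists can be replayed deterministically to show the window: the periodic on the dest runs after the Migration record exists but before the move claim has set `migration_context`. This is a toy model; the class and function names are hypothetical, and in nova the three steps run in different services and threads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    migration_context: Optional[dict] = None


@dataclass
class Migration:
    source_compute: str
    dest_compute: str
    instance: Instance


def conductor_creates_migration(instance: Instance) -> Migration:
    # Step 1: record exists with source and dest set; no context yet.
    return Migration("src", "dst", instance)


def periodic_on_dest(host: str, migrations: List[Migration]) -> List[Optional[dict]]:
    # Step 2: an update_available_resource-style scan of incoming migrations.
    return [m.instance.migration_context for m in migrations
            if m.dest_compute == host]


def claim_on_dest(migration: Migration) -> None:
    # Step 3: the move claim on the dest creates the migration context.
    migration.instance.migration_context = {"new_resources": []}
```

Run in the order 1, 2, 3, step 2 observes `None`; run in the order 1, 3, 2, it observes the populated context.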
dansmith | mriedem: yeah, makes sense.. if we're looking for an instance involved with a migration, it's likely we care about the migration context | 16:45 |
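The benefit of loading `migration_context` and `flavor` up front, rather than lazy-loading them later, can be shown by counting lazy loads. This is a toy model of the idea behind the `expected_attrs` style of loading nova's objects use; `FakeDB` and `LazyInstance` are hypothetical and do not reproduce nova's object layer.

```python
from typing import Any, Dict, Iterable


class FakeDB:
    """Hypothetical store that counts on-demand (lazy) field loads."""

    def __init__(self, rows: Dict[str, Dict[str, Any]]):
        self.rows = rows
        self.lazy_loads = 0

    def fetch(self, uuid: str, attr: str, lazy: bool) -> Any:
        if lazy:
            self.lazy_loads += 1
        return self.rows[uuid][attr]


class LazyInstance:
    """Fields in expected_attrs load with the object; others load on access."""

    def __init__(self, db: FakeDB, uuid: str, expected_attrs: Iterable[str] = ()):
        self._db = db
        self.uuid = uuid
        self._loaded: Dict[str, Any] = {}
        for attr in expected_attrs:
            # Loaded as part of the initial fetch ("joined").
            self._loaded[attr] = db.fetch(uuid, attr, lazy=False)

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for not-yet-set fields.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._loaded:
            self._loaded[name] = self._db.fetch(self.uuid, name, lazy=True)
        return self._loaded[name]
```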
mriedem | dansmith: ok will push something | 16:47 |
efried | mriedem: are you working up a func test? | 16:47 |
mriedem | nope | 16:47 |
efried | is a func test possible? | 16:48 |
mriedem | yeah probably | 16:48 |
mriedem | i mean, i *could* write one up | 16:48 |
mriedem | if $$$properly$$$ motivated | 16:49 |
openstackgerrit | Dan Smith proposed openstack/nova master: Add image precaching docs for aggregates https://review.opendev.org/687348 | 16:52 |
efried | I just can't seem to stop seeing that ^ as "preaching" | 16:52 |
dansmith | mriedem: ^ includes some generic image caching doc stuff in addition to precaching | 16:52 |
dansmith | as we discussed | 16:52 |
*** ociuhandu has joined #openstack-nova | 16:52 | |
mriedem | is it worth marking "Partial-Bug: #1847302" or at least related? | 16:53 |
openstack | bug 1847302 in OpenStack Compute (nova) "doc: need admin guide for the image cache" [Undecided,New] https://launchpad.net/bugs/1847302 | 16:53 |
dansmith | if you want | 16:54 |
mriedem | i think that would be good (Related-Bug) | 16:54 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Don't populate resources for not-yet-migrated instance https://review.opendev.org/689842 | 16:54 |
efried | mriedem: ^ | 16:54 |
openstackgerrit | Dan Smith proposed openstack/nova master: Add image precaching docs for aggregates https://review.opendev.org/687348 | 16:54 |
dansmith | mriedem: already did partial-, I can change to related- if you want, as I'm sure this will need some respins | 16:54 |
mriedem | up to you, only difference is with partial the bug is assigned to you | 16:55 |
mriedem | which might imply signing up for more work than you want | 16:55 |
dansmith | drat! :) | 16:56 |
*** ociuhandu has quit IRC | 16:58 | |
*** derekh has quit IRC | 17:00 | |
efried | still trying to understand the race. | 17:01 |
efried | - create instance | 17:01 |
efried | - migrate instance, which has the following steps: | 17:01 |
efried | - x create migration context -- this creates the migration record and populates the instance in it, but the instance doesn't have a migration_context yet | 17:01 |
efried | - y schedule | 17:01 |
efried | - z claim on destination (_move_claim) -- this is the thing that sets the instance's migration_context | 17:01 |
efried | So we need update_available_resource on the dest compute to run after x and before z, yah? | 17:01 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Join migration_context and flavor in Migration.instance https://review.opendev.org/689846 | 17:03 |
mriedem | your steps are wrong | 17:03 |
*** markvoelker has joined #openstack-nova | 17:04 | |
mriedem | you don't control when update_available_resource runs, it's on a timer in a thread pool | 17:04 |
efried | well, we can trigger it manually in a test | 17:04 |
*** KeithMnemonic has joined #openstack-nova | 17:05 | |
mriedem | 1. create instance, | 17:05 |
mriedem | 2. wrap _prep_resize on the dest compute service to trigger the periodic on the dest when calling _prep_resize | 17:06 |
mriedem | 3. initiate a migration | 17:06 |
mriedem | check logs for TypeError | 17:06 |
dansmith | this is happening in CI yeah? | 17:06 |
dansmith | so more soak time and we would have found it I guess | 17:07 |
mriedem | yeah, multinode migration tests | 17:07 |
mriedem | efried: i can whip up a func recreate test quick if you want | 17:09 |
efried | can we please merge https://review.opendev.org/#/c/686207/ which will be useful for this? | 17:09 |
mriedem | no | 17:09 |
efried | mriedem: I'm working on it, wouldn't mind the experience. | 17:09 |
mriedem | because we don't want to backport that spy stuff | 17:10 |
mriedem | and this has to go to train | 17:10 |
efried | ight | 17:10 |
mriedem | i mean, i'd rather not backport the spy stuff | 17:10 |
mriedem | if enough stuff builds on it i guess it will eventually happen | 17:10 |
efried | slight catch-22 there | 17:11 |
mriedem | well, it would be nice if the spy stuff does land, that it soaks a bit before we have to backport it for something that otherwise wouldn't need it | 17:12 |
mriedem | cuz if there is a bug in spy then you have to fix and backport that later as well | 17:12 |
efried | okay. Its successor has soakage. Let's merge both | 17:12 |
mriedem | i don't think you can claim "soakage" on anything that isn't merged | 17:13 |
dansmith | to me, confidence in soakage comes from being in the firehose, not a few rechecks on a single patch | 17:13 |
dansmith | yeah, that ^ | 17:13 |
*** mjozefcz|lunch has joined #openstack-nova | 17:14 | |
efried | no, my point is, if we merge both of those things in master now, they're in the firehose because the second patch puts them in the way of I think three functional tests. Then we can soak them in master until you're comfortable before doing anything further. | 17:16 |
*** ralonsoh has quit IRC | 17:17 | |
*** dpawlik has quit IRC | 17:23 | |
* mriedem puts food stuffs into my ingress hole | 17:25 | |
openstackgerrit | Merged openstack/nova stable/rocky: Stop sending bad values from libosinfo to libvirt https://review.opendev.org/688068 | 17:26 |
openstackgerrit | Merged openstack/nova stable/stein: Fix unit of hw_rng:rate_period https://review.opendev.org/689153 | 17:26 |
*** jangutter has quit IRC | 17:26 | |
*** nweinber_ has joined #openstack-nova | 17:26 | |
*** nanzha has joined #openstack-nova | 17:27 | |
*** dpawlik has joined #openstack-nova | 17:27 | |
*** tbachman has joined #openstack-nova | 17:29 | |
*** nweinber has quit IRC | 17:29 | |
*** dpawlik has quit IRC | 17:33 | |
*** nanzha has quit IRC | 17:33 | |
*** jmlowe has quit IRC | 17:35 | |
*** dpawlik has joined #openstack-nova | 17:40 | |
*** ccamacho has quit IRC | 17:40 | |
artom | sean-k-mooney, btw, do we care about https://review.opendev.org/#/c/465783/ not being in OSP10 Neutron? | 17:42 |
artom | (Sorry to make you context-switch - it's a dependency of https://code.engineering.redhat.com/gerrit/#/c/183568/1) | 17:42 |
artom | Doh, was meant for the internal channel | 17:43 |
*** mjozefcz|lunch has quit IRC | 17:43 | |
*** dpawlik has quit IRC | 17:47 | |
efried | mriedem: I'm close-ish. | 17:48 |
efried | Doing the mock at _prep_resize wasn't working because both the migration and the migration context are happening underneath that. | 17:48 |
efried | AFAICT I need to get inside _move_claim itself, between _claim_existing_migration (at the top) and the init of mig_context (at the bottom). | 17:48 |
efried | So I tried mocking _claim_existing_migration to do its thing first, followed by running periodics. | 17:48 |
efried | But that deadlocks because *in test* they're both running in the same worker, so they're under the same COMPUTE_RESOURCE_SEMAPHORE. | 17:48 |
efried | Where would you go from here? | 17:48 |
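The deadlock efried hits can be reproduced with any non-reentrant lock: if a test hook runs the periodic while the claim already holds the lock, the periodic waits on itself. Assumption: COMPUTE_RESOURCE_SEMAPHORE behaves like a non-reentrant `threading.Lock` here, which is what the observed deadlock suggests; the names below are hypothetical, and a timeout replaces the real blocking wait so the sketch terminates.

```python
import threading

# Hypothetical stand-in for the per-process COMPUTE_RESOURCE_SEMAPHORE.
RESOURCE_LOCK = threading.Lock()


def run_periodic(timeout: float) -> bool:
    """Try to take the resource lock like update_available_resource would."""
    acquired = RESOURCE_LOCK.acquire(timeout=timeout)
    if acquired:
        RESOURCE_LOCK.release()
    return acquired


def claim_with_periodic_hook(timeout: float = 0.1) -> bool:
    with RESOURCE_LOCK:
        # Test hook inside the claim: a blocking acquire here would hang
        # forever; with a timeout it simply reports failure.
        return run_periodic(timeout)
```

This is one reason a recreate hook has to fire outside the locked section rather than from inside `_move_claim`.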
* efried also feeds face, bbiab | 17:49 | |
*** mjozefcz|lunch has joined #openstack-nova | 18:01 | |
*** eharney has quit IRC | 18:02 | |
*** nweinber__ has joined #openstack-nova | 18:03 | |
*** nweinber_ has quit IRC | 18:06 | |
*** N3l1x has joined #openstack-nova | 18:11 | |
mriedem | efried: you don't need to replace _prep_resize entirely, just wrap it, | 18:11 |
mriedem | i.e. stub it out and inside the stub run the periodics and then call the original _prep_resize | 18:12 |
mriedem | like this https://review.opendev.org/#/c/689013/1/nova/tests/functional/regressions/test_bug_1848343.py | 18:12 |
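The difference between replacing a method and wrapping it, as mriedem describes, shows up in whether the original still runs. A minimal sketch with `unittest.mock.patch.object`; `Manager` is a hypothetical class standing in for the compute manager, and `before` stands in for running the periodics.

```python
from typing import Any, Callable
from unittest import mock


class Manager:
    """Hypothetical stand-in for the dest compute manager."""

    def __init__(self) -> None:
        self.calls: list = []

    def _prep_resize(self, name: str) -> str:
        self.calls.append(f"real:{name}")
        return "prepped"


def replace_entirely(mgr: Manager) -> Any:
    # A bare MagicMock swallows the call: the real _prep_resize never runs.
    with mock.patch.object(mgr, "_prep_resize") as stub:
        return stub, mgr._prep_resize("a")


def wrap(mgr: Manager, before: Callable[[], None]) -> str:
    original = mgr._prep_resize  # capture the real bound method first

    def wrapper(*args: Any, **kwargs: Any) -> str:
        before()  # e.g. run the periodics
        return original(*args, **kwargs)

    with mock.patch.object(mgr, "_prep_resize", side_effect=wrapper):
        return mgr._prep_resize("b")
```

With the wrapper, the hook runs first and then the real method, so everything underneath `_prep_resize` still executes.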
*** eharney has joined #openstack-nova | 18:15 | |
mriedem | looks like the ceph job is not happy again | 18:19 |
mriedem | http://grafana.openstack.org/d/-iKINcImz/ceph-failure-rate?orgId=1 | 18:19 |
mriedem | and the graphs are outdated on master since we don't run tempest-full-py3 in master (we run tempest-integrated-compute) | 18:19 |
melwitt | dangit, I'll look and also get the grafana page issue fixed | 18:20 |
mriedem | i looked at a ceph failure last week and it looked like some setup issue in devstack | 18:20 |
melwitt | ok | 18:21 |
openstackgerrit | sean mooney proposed openstack/nova master: block rebuild when numa topology changed https://review.opendev.org/687957 | 18:23 |
openstackgerrit | sean mooney proposed openstack/nova master: Disable NUMATopologyFilter on rebuild https://review.opendev.org/689861 | 18:23 |
openstackgerrit | Merged openstack/nova stable/train: Update compute rpc version alias for train https://review.opendev.org/689164 | 18:26 |
openstackgerrit | Merged openstack/nova stable/train: Error out interrupted builds https://review.opendev.org/687216 | 18:27 |
*** mjozefcz|lunch has quit IRC | 18:27 | |
melwitt | mriedem: can you link me a failure? I don't see any recent failure when I open a few sample patches | 18:28 |
*** nanzha has joined #openstack-nova | 18:32 | |
dansmith | melwitt: could probably convince kibana to show you just that job with failed status | 18:33 |
melwitt | oh yeah, kibana. my old friend | 18:34 |
*** factor has joined #openstack-nova | 18:35 | |
dansmith | something like build_status:FAILURE build_name:"ceph" | 18:35 |
openstackgerrit | Merged openstack/nova stable/train: Handle get_host_availability_zone error during reschedule https://review.opendev.org/686226 | 18:35 |
* dansmith runs off | 18:35 | |
melwitt | yeah, thanks | 18:36 |
melwitt | can do that | 18:36 |
*** nanzha has quit IRC | 18:37 | |
*** amodi has quit IRC | 18:38 | |
openstackgerrit | Merged openstack/nova stable/train: Fix exception translation when creating volume https://review.opendev.org/688072 | 18:50 |
*** jmlowe has joined #openstack-nova | 18:50 | |
*** dpawlik has joined #openstack-nova | 18:51 | |
*** gbarros has quit IRC | 19:04 | |
*** spatel has quit IRC | 19:04 | |
*** slaweq has joined #openstack-nova | 19:10 | |
efried | mriedem: yes, that's what I did, but it didn't trigger the problem. | 19:12 |
jmlowe | Is anybody aware, off the top of their head, of a mechanism to quota vgpus? | 19:14 |
*** CeeMac has quit IRC | 19:15 | |
efried | melwitt: ^ did we get anywhere with the placement-based quota thing? | 19:17 |
melwitt | efried: not yet. that's johnthetubaguy's unified limits spec | 19:17 |
melwitt | I had a chat with him about it and he'd like to get started on it this cycle and will re-propose the spec | 19:18 |
efried | https://review.opendev.org/#/c/602201/ | 19:18 |
*** jmlowe has quit IRC | 19:18 | |
melwitt | (and I will help with the work) | 19:18 |
efried | that one ^ ? | 19:18 |
melwitt | yes that's the one | 19:18 |
efried | oh, jmlowe left, nm | 19:18 |
*** jmlowe has joined #openstack-nova | 19:21 | |
*** dpawlik has quit IRC | 19:21 | |
efried | mriedem: is the theory that MigrationTask._execute (n-api) creates the Migration record, which should thenceforth be able to produce an instance via the .instance @property; and then casts to prep_resize on the compute? So by hijacking _prep_resize and doing update_available_resource first, that should hit the window. | 19:21 |
efried | jmlowe: you're back! | 19:23 |
jmlowe | sometimes you eat the wifi sometimes the wifi eats you | 19:23 |
efried | see https://review.opendev.org/#/c/602201/ which melwitt says she and johnthetubaguy will be working on this cycle. | 19:24 |
efried | that would allow you to quota placement-based resources like vgpu | 19:24 |
efried | btw, I'm assuming you were talking about post-vgpu-in-placement. Otherwise I think it's just a PCI device and can be quotaed like any other PCI device. | 19:24 |
jmlowe | exactly what I was hoping for | 19:24 |
jmlowe | slapping an arbitrary quota on something tracked by placement | 19:25 |
* melwitt nods | 19:25 | |
jmlowe | now I just need to hope placement can understand different flavors of vgpus | 19:25 |
efried | jmlowe: what do you mean by "flavors"? | 19:26 |
jmlowe | an 8GB slice of a 16GB framebuffer != a 4GB slice of a 16GB framebuffer | 19:27 |
efried | Those are distinctions for which traits would be appropriate | 19:27 |
jmlowe | the nvidia vgpu stuff slices up the frame buffer in powers of 2 | 19:27 |
efried | which of them are done by nova and which you would have to do manually, I couldn't say off the top. | 19:27 |
jmlowe | 1/2, 1/4, 1/8, etc | 19:27 |
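Following the suggestion that traits would distinguish the slice sizes, here is a minimal sketch. It assumes an operator tags each VGPU resource provider with one custom trait per framebuffer slice size. The `CUSTOM_VGPU_FB_<n>GB` names are hypothetical; placement only requires that custom traits start with `CUSTOM_`. It also assumes a placement microversion that supports the `required` parameter on `GET /allocation_candidates`.

```python
from urllib.parse import urlencode


def slice_sizes_gb(total_gb=16, max_divisor=8):
    """Framebuffer slice sizes for 1/2, 1/4, ... 1/max_divisor splits (powers of 2)."""
    sizes = []
    divisor = 2
    while divisor <= max_divisor:
        sizes.append(total_gb // divisor)
        divisor *= 2
    return sizes


def vgpu_trait(slice_gb):
    # Hypothetical naming scheme for a custom trait describing the slice size.
    return f"CUSTOM_VGPU_FB_{slice_gb}GB"


def allocation_candidates_query(slice_gb):
    # Query string asking for one VGPU on a provider carrying the matching trait.
    return urlencode({"resources": "VGPU:1", "required": vgpu_trait(slice_gb)})
```

For a 16GB card this yields slices of 8, 4 and 2 GB. A request for a 4GB slice becomes `resources=VGPU%3A1&required=CUSTOM_VGPU_FB_4GB`.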
efried | If you want to hang out during euro business hours, bauzas might be able to answer those off the top. | 19:28 |
*** gyee has joined #openstack-nova | 19:29 | |
efried | ...I can't see where we're assigning any traits to VGPU providers at the moment. | 19:30 |
jmlowe | I'm currently on rocky, so we are looking at the U release for the real stuff. I've got 24 GPUs now so I can manage for the time being by hand (by hand I mean sharp whacks on the back of the user's hands with a ruler) | 19:30 |
jmlowe | a year from now there's a chance I'll have a few hundred GPUs to manage | 19:31 |
*** eharney has quit IRC | 19:33 | |
*** gbarros has joined #openstack-nova | 19:33 | |
efried | is there any workaround for this f'in subunit parser bs, hitting it locally and it's making things tough to debug :( | 19:40 |
mriedem | efried: MigrationTask.execute runs in conductor, not the api, but otherwise yes same idea | 19:40 |
efried | mriedem: so okay, I was doing all those things, and not seeing the error in the logs. | 19:41 |
mriedem | are you running the periodic on the source or dest compute service? | 19:41 |
efried | both | 19:41 |
efried | because self._run_periodics() is easier | 19:41 |
efried | than digging up just the dest. | 19:41 |
efried | but that shouldn't matter, should it? | 19:41 |
mriedem | it shouldn't no | 19:42 |
efried | let me push it and maybe something jumps out. | 19:42 |
mriedem | "is there any workaround for this f'in subunit parser bs, hitting it locally and it's making things tough to debug" - i was hitting that a week or two ago on my "mega boot from volume request" test and had to change some stuff, i thought about pushing that up but never did | 19:42 |
mriedem | iow to make OS_DEBUG=True work | 19:42 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Don't populate resources for not-yet-migrated instance https://review.opendev.org/689842 | 19:43 |
openstackgerrit | Eric Fried proposed openstack/nova master: WIP: Func: bug 1849165: mig race with _populate_assigned_resources https://review.opendev.org/689866 | 19:43 |
openstack | bug 1849165 in OpenStack Compute (nova) "_populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration" [High,In progress] https://launchpad.net/bugs/1849165 - Assigned to Eric Fried (efried) | 19:43 |
efried | mriedem: ^ | 19:43 |
*** nweinber__ has quit IRC | 19:46 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Don't populate resources for not-yet-migrated inst https://review.opendev.org/689842 | 19:48 |
efried | that'll be copacetic once the test works at least ^ | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Filter duplicates from compute API get_migrations_sorted() https://review.opendev.org/636224 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Start functional testing for cross-cell resize https://review.opendev.org/636253 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Handle target host cross-cell cold migration in conductor https://review.opendev.org/642591 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Validate image/create during cross-cell resize functional testing https://review.opendev.org/642592 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add zones wrinkle to TestMultiCellMigrate https://review.opendev.org/643450 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add negative test for cross-cell finish_resize failing https://review.opendev.org/643451 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Refresh instance in MigrationTask.execute Exception handler https://review.opendev.org/669012 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add negative test for prep_snapshot_based_resize_at_source failing https://review.opendev.org/669013 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add confirm_snapshot_based_resize_at_source compute method https://review.opendev.org/637058 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add ConfirmResizeTask https://review.opendev.org/637070 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add confirm_snapshot_based_resize conductor RPC method https://review.opendev.org/637075 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize from the API https://review.opendev.org/637316 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add revert_snapshot_based_resize_at_dest compute method https://review.opendev.org/637630 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deal with cross-cell resize in _remove_deleted_instances_allocations https://review.opendev.org/639453 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add finish_revert_snapshot_based_resize_at_source compute method https://review.opendev.org/637647 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: WIP: Add RevertResizeTask https://review.opendev.org/638046 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.opendev.org/638047 | 19:49 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.opendev.org/638048 | 19:49 |
efried | (It'll work fine now since the repro... doesn't.) | 19:50 |
*** jmlowe has quit IRC | 19:50 | |
mriedem | efried: ok i think i see the problem, | 19:52 |
mriedem | RT._claim_existing_migration is what sets Migration.dest_compute, not conductor, | 19:52 |
mriedem | so you need to mock that, run the original and then run periodics before returning from _claim_existing_migration | 19:52 |
mriedem | the live migration task in conductor sets Migration.dest_compute so that's where i got confused | 19:53 |
mriedem | i guess we do that for the live migration task because we used to never do claims in the compute for live migration | 19:53 |
mriedem | so we'd never call RT._claim_existing_migration for live migration before train | 19:54 |
*** dpawlik has joined #openstack-nova | 19:54 | |
*** artom has quit IRC | 19:54 | |
mriedem | well, and we still don't if the instance doesn't have a numa topology | 19:54 |
efried | mriedem: I tried doing as you suggest and ran into the deadlock. | 19:54 |
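Below is a toy reproduction of why wrapping `_claim_existing_migration` deadlocks when it runs the original and then the periodic. It assumes, as the deadlock suggests, that the claim and the periodic serialize on the same non-reentrant lock. `FakeResourceTracker` is hypothetical and is not nova's `ResourceTracker`. The periodic uses a timeout so the sketch reports failure instead of hanging.

```python
import threading
from unittest import mock


class FakeResourceTracker:
    """Hypothetical stand-in: the claim and the periodic share one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.dest_compute = None

    def resize_claim(self, host):
        with self.lock:  # held for the whole claim
            self._claim_existing_migration(host)

    def _claim_existing_migration(self, host):
        self.dest_compute = host

    def update_available_resource(self, timeout=0.1):
        # Returns False instead of blocking forever if the lock is already held.
        acquired = self.lock.acquire(timeout=timeout)
        if acquired:
            self.lock.release()
        return acquired


def run_with_periodic_after_claim(rt, host):
    """Stub the claim: call the original, then run the periodic before returning."""
    results = []
    original = FakeResourceTracker._claim_existing_migration

    def wrapper(self, *args, **kwargs):
        original(self, *args, **kwargs)
        results.append(self.update_available_resource())

    with mock.patch.object(FakeResourceTracker, "_claim_existing_migration",
                           autospec=True, side_effect=wrapper):
        rt.resize_claim(host)
    return results
```

The `timeout` on `acquire` only keeps the sketch from hanging. Without it, the periodic would block forever inside the claim, which is the deadlock.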
efried | but | 19:55 |
efried | why does Migration.dest_compute make the difference? | 19:55 |
mriedem | because RT._update_available_resource calls MigrationList.get_in_progress_by_host_and_node | 19:55 |
efried | okay | 19:56 |
mriedem | without Migration.dest_compute set, the migration won't be returned for that dest host during the periodic | 19:56 |
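Here is a simplified model of that lookup. It assumes the periodic only considers unfinished migrations whose source or destination host/node matches the compute it runs on. It is not nova's actual `MigrationList.get_in_progress_by_host_and_node` query, and the `FINISHED` status set is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Migration:
    source_compute: str
    source_node: str
    dest_compute: Optional[str]
    dest_node: Optional[str]
    status: str


# Hypothetical set of statuses treated as "no longer in progress".
FINISHED = {"confirmed", "reverted", "error", "failed", "completed"}


def in_progress_for(migrations, host, node):
    """Toy 'in progress by host and node' filter."""
    return [
        m for m in migrations
        if m.status not in FINISHED
        and ((m.source_compute == host and m.source_node == node)
             or (m.dest_compute == host and m.dest_node == node))
    ]
```

Until something fills in `dest_compute`/`dest_node`, the destination's periodic simply never sees the migration, so the race window cannot be hit from that side.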
efried | so I was barking up the right tree | 19:56 |
efried | to get around the deadlock, should I ... stub out the lock? | 19:56 |
efried | that seems pretty dangerous. | 19:56 |
mriedem | hmm | 19:57 |
efried | Perhaps I can stub out the lock just from within my stub | 19:59 |
mriedem | well, if we have to monkey with locks we're likely going about this the wrong way | 20:00 |
mriedem | so maybe you can't reproduce with a cold migration resize_claim and instead need to do a live migration in the test, | 20:00 |
mriedem | because in that case Migration.dest_compute is set in the conductor | 20:00 |
efried | okay. | 20:00 |
efried | still _prep_resize tho? | 20:01 |
mriedem | no, that's not called for live migration | 20:01 |
mriedem | you could probably stub pre_live_migration | 20:01 |
mriedem | that runs on the dest | 20:01 |
mriedem | unless the instance has a numa topology i don't think it'll ever have a migration context so it doesn't matter too much | 20:01 |
*** dpawlik has quit IRC | 20:02 | |
efried | woot, got repro | 20:04 |
mriedem | yass | 20:05 |
mriedem | now i have to go pick up some contacts | 20:05 |
*** mriedem is now known as mriedem_afk | 20:05 | |
*** jmlowe has joined #openstack-nova | 20:06 | |
*** dpawlik has joined #openstack-nova | 20:07 | |
*** spatel has joined #openstack-nova | 20:08 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Func: bug 1849165: mig race with _populate_assigned_resources https://review.opendev.org/689866 | 20:08 |
openstack | bug 1849165 in OpenStack Compute (nova) "_populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration" [High,In progress] https://launchpad.net/bugs/1849165 - Assigned to Eric Fried (efried) | 20:08 |
openstackgerrit | Eric Fried proposed openstack/nova master: Don't populate resources for not-yet-migrated inst https://review.opendev.org/689842 | 20:08 |
efried | mriedem_afk, luyao: ^ | 20:09 |
*** markvoelker has quit IRC | 20:11 | |
*** markvoelker has joined #openstack-nova | 20:12 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Func: bug 1849165: mig race with _populate_assigned_resources https://review.opendev.org/689866 | 20:13 |
openstack | bug 1849165 in OpenStack Compute (nova) "_populate_assigned_resources raises "TypeError: argument of type 'NoneType' is not iterable" during active migration" [High,In progress] https://launchpad.net/bugs/1849165 - Assigned to Eric Fried (efried) | 20:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: Don't populate resources for not-yet-migrated inst https://review.opendev.org/689842 | 20:13 |
efried | now with pep8 fixed! | 20:13 |
*** pcaruana has quit IRC | 20:16 | |
*** slaweq has quit IRC | 20:16 | |
*** bbowen has quit IRC | 20:25 | |
*** slaweq has joined #openstack-nova | 20:28 | |
*** slaweq has quit IRC | 20:33 | |
*** slaweq has joined #openstack-nova | 20:43 | |
*** dave-mccowan has quit IRC | 20:47 | |
*** slaweq has quit IRC | 20:48 | |
*** spatel has quit IRC | 20:51 | |
*** mriedem_afk is now known as mriedem | 20:54 | |
openstackgerrit | Florian Haas proposed openstack/nova stable/queens: Explain nested guest support https://review.opendev.org/609790 | 20:56 |
*** tbachman has quit IRC | 20:57 | |
mriedem | efried: +2 on both | 20:58 |
*** dpawlik has quit IRC | 20:58 | |
efried | thanks mriedem. I think gibi is to blame for the formatting -- I copy/pasted test_bug_1845291.py | 20:58 |
mriedem | tarnations | 20:58 |
efried | We should get someone like wangfaxin to propose a patch to reformat it. | 20:59 |
efried | https://review.opendev.org/#/q/owner:wangfaxin%2540inspur.com+status:open | 20:59 |
*** gbarros has quit IRC | 21:05 | |
*** factor has quit IRC | 21:08 | |
*** dave-mccowan has joined #openstack-nova | 21:10 | |
*** dpawlik has joined #openstack-nova | 21:11 | |
mriedem | whatever helps the new contributor | 21:12 |
mriedem | efried: i replied in https://review.opendev.org/#/c/689049/ - i'm not sure how much we're on the same page, but i think i'm thinking more we need to do your "eventually" case | 21:13 |
*** amodi has joined #openstack-nova | 21:13 | |
mriedem | you said, "TL;DR the only thing that should change right now is: don't retry on consumer 409." but that's not what we do today anyway in move_allocations | 21:13 |
efried | oh | 21:14 |
efried | yeah | 21:14 |
mriedem | unless i don't understand how that code works | 21:14 |
efried | no, you're right, I forgot we were doing the text scraping thing | 21:14 |
efried | in that case, we shouldn't "fix" this at all. | 21:14 |
mriedem | ok | 21:14 |
*** mloza has joined #openstack-nova | 21:15 | |
efried | I'll need to swap it back in to be more thoroughly convinced of that ^ but at least last week when I was thinking this through, I convinced myself that making these narrow/local changes was not going to help anything (given that we're not retrying on consumer alloc 409) | 21:16 |
mriedem | i agree that we don't have the context for figuring out what to do within move_allocations, | 21:17 |
mriedem | which is why i think we need a separate method that wraps move_allocations and has the necessary logic | 21:17 |
mriedem | which only gets used in the revert case | 21:17 |
mriedem | i've got about 45 min to kill so maybe i can hack something together for what i'm thinking | 21:18 |
lifeless | mriedem: constraints was parallel to solver work FWIW | 21:19 |
lifeless | mriedem: even with a solver we want stability and precise controls :) | 21:19 |
lifeless | mriedem: I did have a solver branch, but yeah, job changes at the wrong time stalled that work, then someone else popped up so ... it's in a branch on github somewhere, but the other volunteer's thing will eventuate eventually, I hope | 21:20 |
*** markvoelker has quit IRC | 21:25 | |
*** mgoddard has quit IRC | 21:31 | |
*** mgoddard has joined #openstack-nova | 21:32 | |
*** bbowen has joined #openstack-nova | 21:34 | |
mordred | lifeless: you're a solver branch | 21:35 |
*** eharney has joined #openstack-nova | 21:36 | |
*** gbarros has joined #openstack-nova | 21:40 | |
*** slaweq has joined #openstack-nova | 21:44 | |
*** mgoddard has quit IRC | 21:47 | |
*** mdbooth has quit IRC | 21:48 | |
*** mgoddard has joined #openstack-nova | 21:49 | |
*** slaweq has quit IRC | 21:49 | |
*** dpawlik has quit IRC | 21:50 | |
*** mdbooth has joined #openstack-nova | 21:50 | |
mriedem | lifeless: ack | 22:02 |
mriedem | efried: ok i've got a poc which passes functional tests, i'll push that up for you and gibi to ponder before trying to cover new unit tests | 22:02 |
efried | ight | 22:03 |
*** dave-mccowan has quit IRC | 22:03 | |
mriedem | i think you're right in that there is still a race with the instance being deleted no matter how tight i make things unless we assert the existence of the instance both before and after calling move_allocations, which kind of sucks | 22:04 |
*** openstackgerrit has quit IRC | 22:07 | |
*** jhesketh has joined #openstack-nova | 22:12 | |
*** dave-mccowan has joined #openstack-nova | 22:14 | |
lifeless | mordred: weak sauce | 22:15 |
*** mlavalle has quit IRC | 22:15 | |
*** nanzha has joined #openstack-nova | 22:18 | |
efried | mriedem: do you agree the proper thing would be to pull the instance allocation at the very beginning of the operation? | 22:20 |
mriedem | like before we move the instance allocations to the migration record before calling the scheduler? | 22:22 |
mriedem | or just before calling move_allocations on revert? | 22:22 |
efried | the former | 22:22 |
*** nanzha has quit IRC | 22:23 | |
efried | or, does that operation return a payload? in which case keep hold of the generation from there. | 22:23 |
mriedem | does what operation return a payload? scheduling? | 22:24 |
efried | Moving allocs to the mig record. | 22:25 |
mriedem | POST /allocations returns a 204 | 22:26 |
mriedem | so no payload | 22:26 |
efried | but | 22:26 |
efried | we must have pushed a gen to do it | 22:26 |
efried | so, save that gen. | 22:26 |
mriedem | the target consumer generation is optional in move_allocations, which is the problem in my bug; the source consumer generation is required | 22:27 |
mriedem | but i see what you're saying | 22:27 |
*** mgoddard has quit IRC | 22:27 | |
efried | In English: "Make sure the instance didn't change between when we started the migration and... the bug place." | 22:27 |
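This is a toy in-memory model of the "save the generation" idea. It assumes allocation writes are guarded by a per-consumer generation and that a stale generation is rejected; placement answers a stale consumer generation with a 409. `FakePlacement`, its methods, and the generation numbering are hypothetical. They are not the placement API or nova's report client.

```python
class GenerationConflict(Exception):
    """Stand-in for placement's 409 on a stale consumer generation."""


class FakePlacement:
    def __init__(self):
        self._consumers = {}  # consumer uuid -> (generation, allocations)

    def generation(self, consumer):
        entry = self._consumers.get(consumer)
        return entry[0] if entry else None

    def write(self, consumer, allocations, expected_generation):
        """Replace a consumer's allocations only if the caller's generation is current."""
        if expected_generation != self.generation(consumer):
            raise GenerationConflict(consumer)
        new_generation = 0 if expected_generation is None else expected_generation + 1
        self._consumers[consumer] = (new_generation, allocations)
        return new_generation

    def move(self, source, target, source_generation, target_generation):
        """Move all allocations from source to target, checking both generations first."""
        # Validate both up front so a conflict leaves nothing half-moved.
        if source_generation != self.generation(source):
            raise GenerationConflict(source)
        if target_generation != self.generation(target):
            raise GenerationConflict(target)
        allocations = self._consumers[source][1]
        new_source = self.write(source, {}, source_generation)
        new_target = self.write(target, allocations, target_generation)
        return new_source, new_target
```

Usage: when the migration starts, keep the instance generation returned by the first `move`. On revert, pass it as `target_generation`. If anything touched the instance consumer in between, the revert fails loudly instead of silently overwriting it.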
mriedem | yeah idk what could go wonky with that, seems we'd only use that in an edge case, e.g. the target consumer doesn't have allocations for us to revert with a gen | 22:28 |
mriedem | for now i've just got a big comment so i can push this up | 22:29 |
*** mgoddard has joined #openstack-nova | 22:33 | |
*** avolkov has quit IRC | 22:36 | |
*** dklyle has quit IRC | 22:42 | |
*** david-lyle has joined #openstack-nova | 22:42 | |
*** slaweq has joined #openstack-nova | 22:45 | |
*** mgoddard has quit IRC | 22:47 | |
*** Liang__ has joined #openstack-nova | 22:48 | |
*** slaweq has quit IRC | 22:50 | |
*** macz has quit IRC | 22:54 | |
*** macz has joined #openstack-nova | 22:55 | |
*** dviroel has quit IRC | 22:58 | |
melwitt | argh how do I get kibana to not show me the same change a million times for a build_status:FAILURE | 23:05 |
*** artom has joined #openstack-nova | 23:08 | |
mriedem | you need one line from the console | 23:10 |
melwitt | ok | 23:11 |
mriedem | "ERROR: all: commands failed"? | 23:12 |
mriedem | though that's only if tempest failed | 23:12 |
mriedem | "failed: 1" would be if any play failed | 23:12 |
melwitt | aha much better. thanks | 23:13 |
mriedem | http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22failed%3A%201%5C%22%20AND%20tags%3A%5C%22console%5C%22%20AND%20build_name%3A%5C%22devstack-plugin-ceph-tempest%5C%22&from=7d | 23:13 |
mriedem | ? | 23:13 |
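The query in that link can be rebuilt programmatically, for example to swap in a different console message or job name. This is a minimal sketch that assumes the dashboard URL layout and the backslash-escaped phrase quoting seen in the link above. `logstash_url` and `phrase` are hypothetical helpers.

```python
from urllib.parse import quote

LOGSTASH_DASHBOARD = "http://logstash.openstack.org/#dashboard/file/logstash.json"


def phrase(field, value):
    # Exact-phrase match; quotes are backslash-escaped as in the dashboard link.
    return f'{field}:\\"{value}\\"'


def logstash_url(message, build_name, window="7d"):
    """Search console logs of one job for a message over a time window."""
    query = " AND ".join([
        phrase("message", message),
        phrase("tags", "console"),
        phrase("build_name", build_name),
    ])
    return f"{LOGSTASH_DASHBOARD}?query={quote(query, safe='')}&from={window}"
```

Calling `logstash_url("failed: 1", "devstack-plugin-ceph-tempest")` reproduces the link above. Passing `"ERROR: all: commands failed"` instead narrows it to tempest failures only.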
melwitt | a lot of these look like they are the result of rebases | 23:14 |
melwitt | like, I go to the change that says it failed and there are no logs for that PS and it's not done running yet? I dunno, still looking | 23:15 |
mriedem | looking at https://d494348350733031166c-4e71828f84900af50a9a26357b84a827.ssl.cf1.rackcdn.com/689842/5/check/devstack-plugin-ceph-tempest/962455b/ | 23:15 |
mriedem | it has a few failures, | 23:15 |
mriedem | some look like http://status.openstack.org/elastic-recheck/#1763070 | 23:16 |
mriedem | and one test failed because n-api returned a 500 due to: | 23:16 |
mriedem | "NovaException: Cell 901092b8-6de2-4aad-b21a-e1c21691eb30 is not responding and hence instance info is not available." | 23:16 |
melwitt | yeah I was just looking at that one | 23:16 |
mriedem | which is likely the same thing as http://status.openstack.org/elastic-recheck/#1844929 | 23:16 |
mriedem | though ^ has only been showing up in grenade jobs (in the scheduler logs that is) | 23:16 |
melwitt | ah right | 23:17 |
*** rcernin has joined #openstack-nova | 23:19 | |
melwitt | this one's an ssh timeout https://e02da289fa5cd71d2848-a802bb880ba142924be00bfc16ee185a.ssl.cf5.rackcdn.com/637647/45/check/devstack-plugin-ceph-tempest/97e930a/testr_results.html.gz | 23:19 |
*** rcernin has quit IRC | 23:19 | |
melwitt | so far not seeing anything outside of known gate issues | 23:19 |
*** rcernin has joined #openstack-nova | 23:20 | |
mriedem | hmm i know we disabled run_validation (ssh) in tempest api tests for that job but the ssh stuff in the scenario tests is unconditional i guess | 23:21 |
mriedem | and in that ssh fail it's due to the guest being out of space, "GROWROOT: NOCHANGE: partition 1 is size 2078687. it cannot be grown" | 23:21 |
mriedem | which we've been tracking with http://status.openstack.org/elastic-recheck/#1808010 | 23:21 |
mriedem | so maybe just get the fail dashboard fixed and we go from there | 23:21 |
melwitt | ok, yup will do | 23:22 |
mriedem | otherwise i'm seeing things like creating volumes from snapshots or images failing, which might be known races with the rbd backend in cinder | 23:22 |
melwitt | yeah, could also be races in tempest test code. I have fixed one like that in the distant past before | 23:23 |
*** tbachman has joined #openstack-nova | 23:26 | |
*** markvoelker has joined #openstack-nova | 23:26 | |
*** markvoelker has quit IRC | 23:30 | |
*** slaweq has joined #openstack-nova | 23:46 | |
*** slaweq has quit IRC | 23:50 | |
*** openstackgerrit has joined #openstack-nova | 23:53 | |
openstackgerrit | Brin Zhang proposed openstack/python-novaclient master: Add minor version [21] to the test_versions https://review.opendev.org/688599 | 23:53 |
*** Liang__ has quit IRC | 23:58 |