Wednesday, 2019-06-26

01:31 <openstackgerrit> Merged openstack/nova master: hacking: Resolve W503 (line break occurred before a binary operator)  https://review.opendev.org/651555
01:31 <openstackgerrit> Merged openstack/nova master: hacking: Resolve E741 (ambiguous variable name)  https://review.opendev.org/652103
01:54 <openstackgerrit> Yongli He proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec  https://review.opendev.org/667487
01:57 <openstackgerrit> Yongli He proposed openstack/nova master: Clean up orphan instances virt driver  https://review.opendev.org/648912
01:57 <openstackgerrit> Yongli He proposed openstack/nova master: clean up orphan instances  https://review.opendev.org/627765
02:27 <openstackgerrit> Bhagyashri Shewale proposed openstack/nova master: Ignore root_gb for BFV in simple tenant usage API  https://review.opendev.org/612626
02:44 <openstackgerrit> Alex Xu proposed openstack/nova master: Correct the comment of RequestSpec's network_metadata  https://review.opendev.org/667061
05:22 <openstackgerrit> Yongli He proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec  https://review.opendev.org/667487
06:43 <openstackgerrit> Merged openstack/nova master: Remove comments about mirroring changes to nova/cells/messaging.py  https://review.opendev.org/667107
06:43 <openstackgerrit> Merged openstack/nova master: Drop source node allocations if finish_resize fails  https://review.opendev.org/654067
06:52 <openstackgerrit> Brin Zhang proposed openstack/python-novaclient master: Microversion 2.74: Support Specifying AZ to unshelve  https://review.opendev.org/665136
07:29 <brinzhang> efried: Are you around?
08:25 <openstackgerrit> Balazs Gibizer proposed openstack/nova master: pull out functions from _heal_allocations_for_instance  https://review.opendev.org/655457
08:25 <openstackgerrit> Balazs Gibizer proposed openstack/nova master: reorder conditions in _heal_allocations_for_instance  https://review.opendev.org/655458
08:25 <openstackgerrit> Balazs Gibizer proposed openstack/nova master: Prepare _heal_allocations_for_instance for nested allocations  https://review.opendev.org/637954
08:25 <openstackgerrit> Balazs Gibizer proposed openstack/nova master: pull out put_allocation call from _heal_*  https://review.opendev.org/655459
08:25 <openstackgerrit> Balazs Gibizer proposed openstack/nova master: nova-manage: heal port allocations  https://review.opendev.org/637955
09:04 <openstackgerrit> Surya Seetharaman proposed openstack/nova master: Grab fresh power state info from the driver  https://review.opendev.org/665975
09:29 <openstackgerrit> Boxiang Zhu proposed openstack/nova master: Update AZ admin doc to mention the new way to specify hosts  https://review.opendev.org/666767
09:32 <kashyap> Does anyone here know of an existing bug in the Gate, where the "tempest-slow-py3" job is failing with:
09:32 <kashyap> tempest.exceptions.BuildErrorException: Server 008c5c50-ff54-49f4-adb0-23775e8af5f1 failed to build and is in ERROR status
09:32 <kashyap> Details: {'code': 500, 'created': '2019-06-25T20:55:49Z', 'message': 'Unexpected vif_type=binding_failed'}
09:32 <kashyap> http://logs.openstack.org/89/667389/1/check/tempest-slow-py3/2606bcc/testr_results.html.gz
09:32 <kashyap> [That is a stable/stein backport]
09:37 <kashyap> Okay, I see timeouts (also in the stable/rocky backport) in the 'testr_results'.  /me goes to 'recheck'
11:06 <NewBruce> Hey kashyap
11:07 <NewBruce> so good news, i didn't try to mess around with xml, instead just used SELinux ;) but not sure if you can help out on this one - migrations are failing with
11:07 <NewBruce> Live Migration failure: Library function returned error but did not set virError: libvirtError: Library function returned error but did not set vir
11:08 <NewBruce> digging into the libvirt logs -
11:08 <NewBruce> 2019-06-26 09:46:22.816+0000: 30621: error : virNetClientStreamRaiseError:200 : stream had I/O failure
11:08 <NewBruce> 2019-06-26 09:46:23.190+0000: 19228: error : virNetClientProgramDispatchError:177 : internal error: qemu unexpectedly closed the monitor: 2019-06-26T09:46:22.815029Z qemu-kvm: Failed to load PCIDevice:config
11:08 <NewBruce> 2019-06-26T09:46:22.815033Z qemu-kvm: Failed to load virtio-net:virtio
11:08 <NewBruce> 2019-06-26T09:46:22.815036Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net'
11:09 <NewBruce> thing is, from the control side of things, the migration completed - no errors or anything are returned… also, i could migrate fine between these hosts before i added SELinux, and it (rarely) works to migrate a machine… i'm lost whether it's a libvirt or nova issue at this point - thoughts?
11:10 <NewBruce> the osc reports life is peachy: openstack server migrate --live cc-compute28-sto2 aadfe56a-88b8-49c0-9dac-41a4c494c1b5 --wait
11:10 <NewBruce> Progress: 97Complete
11:11 <NewBruce> but nova never gets to post-migration, and i don't think it's actually doing the migration itself - on the source:
11:11 <NewBruce> Took 2.35 seconds for pre_live_migration on destination host cc-compute26-sto2.
11:11 <NewBruce> Migration running for 0 secs, memory 100% remaining; (bytes processed=0, remaining=0, total=0)
11:11 <NewBruce> Migration operation has aborted
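A quick way to cross-check what NewBruce describes is to ask libvirt on the source host for the stats of the in-flight migration job. A minimal sketch using libvirt-python, assuming the qemu:///system URI and reusing the instance UUID from above:

    import libvirt

    # read-only connection to the libvirt daemon on the source host
    conn = libvirt.openReadOnly('qemu:///system')
    dom = conn.lookupByUUIDString('aadfe56a-88b8-49c0-9dac-41a4c494c1b5')
    # jobStats() reports the progress of the current job (e.g. a migration);
    # all-zero memory/data counters would match the "0 secs, 100% remaining"
    # symptom nova is logging above
    print(dom.jobStats())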
12:27 <mriedem> lyarwood: is your plan for https://bugs.launchpad.net/nova/+bug/1832248 to get https://review.opendev.org/#/c/664418/ released, then bump nova's lower constraint on os-brick, and then consider the nova bug fixed?
12:27 <openstack> Launchpad bug 1832248 in OpenStack Compute (nova) "tempest.api.volume.test_volumes_extend.VolumesExtendAttachedTest.test_extend_attached_volume failing when using the Q35 machine type" [Undecided,New]
12:32 <alex_xu> mriedem: hope we answered all your questions on https://review.opendev.org/601596, looking for one more +2 :)
12:33 <alex_xu> johnthetubaguy: ^ hope you're around, the vpmem spec is in good shape
12:34 <lyarwood> mriedem: no, the nova bug is separate, the os-brick change just works around the underlying QEMU issue Nova is hitting with q35
12:36 <mriedem> lyarwood: oh ok
12:36 <mriedem> alex_xu: ack, i still need to read all of the replies...
12:37 <alex_xu> mriedem: hah, i see, a lot
12:39 <alex_xu> mriedem: also, there is the code for reference https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/virtual-persistent-memory; although it's in merge conflict, it's still good for seeing what we'll probably change in the code
12:42 <sean-k-mooney> mriedem: can you take a look at https://review.opendev.org/#/c/667264/? it's an osc change for force down. you sent a mail to the list about dropping ComputeNode host/service id compat code and i'm wondering if that is related or not
12:43 <mriedem> i think my biggest hangups were on (1) the flavor extra spec definition, which was a bit hard to parse from a user perspective in my opinion, and (2) the questions about the new data model and versioned object, which were very similar to a BDM, but i realize we don't want to re-use BDMs for this
12:44 <mriedem> sean-k-mooney: different issue
12:44 <mriedem> sean-k-mooney: before 2.53 you had to call a force-down route, with 2.53 you just call the normal PUT route
12:44 <openstackgerrit> Ghanshyam Mann proposed openstack/nova master: Add mising tests for flavor extra_specs mv 2.61  https://review.opendev.org/667600
12:44 <mriedem> https://developer.openstack.org/api-ref/compute/#update-forced-down for <2.53
12:44 <sean-k-mooney> right, i saw that
12:44 <mriedem> https://developer.openstack.org/api-ref/compute/#update-compute-service for >=2.53
12:45 <sean-k-mooney> what i was concerned about is that the new form uses service id
12:45 <mriedem> with 2.53 the service_id in the API is a uuid
12:45 <openstackgerrit> Ghanshyam Mann proposed openstack/nova master: Add missing tests for flavor extra_specs mv 2.61  https://review.opendev.org/667600
12:45 <sean-k-mooney> the old one used host and binary name
12:45 <mriedem> service_id is the uuid of the service, it's fine
12:45 <sean-k-mooney> and i was not clear on what you were proposing to drop in the mail
12:45 <mriedem> it's unrelated to the relationship between compute nodes and services
12:45 <sean-k-mooney> ok
12:45 <mriedem> see all of the notes/todos around ComputeNode.service_id in the code
12:46 <mriedem> and ComputeNode.host
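For reference, the two API forms being contrasted here, as a minimal requests sketch; the token, endpoint and service uuid are placeholders:

    import requests

    TOKEN = '...'  # placeholder keystone token
    COMPUTE_URL = 'http://controller:8774/v2.1'  # placeholder compute endpoint
    headers = {'X-Auth-Token': TOKEN}

    # <2.53: force down by host/binary via the dedicated route
    requests.put(COMPUTE_URL + '/os-services/force-down', headers=headers,
                 json={'host': 'compute1', 'binary': 'nova-compute',
                       'forced_down': True})

    # >=2.53: the normal PUT route, keyed by the service uuid
    headers['X-OpenStack-Nova-API-Version'] = '2.53'
    requests.put(COMPUTE_URL + '/os-services/e81d66a4-ddd3-4aba-8a84-171d1cb4d339',
                 headers=headers, json={'forced_down': True})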
12:46 <lyarwood> mriedem: https://review.opendev.org/#/c/457886/ - btw, would you mind taking a look at this if you have time this week?
12:46 <sean-k-mooney> mriedem: ok, i'm reading them now, thanks
12:47 <mriedem> lyarwood: sure
12:47 <lyarwood> thanks
13:00 <openstackgerrit> Ghanshyam Mann proposed openstack/nova master: Add missing tests for flavor extra_specs mv 2.61  https://review.opendev.org/667600
13:08 <bauzas> mriedem: FWIW, I need to reload a shit ton of context from Kilo before replying to you but I saw your email
13:09 <bauzas> mriedem: because I wonder if we need a major version bump for the ComputeNode object
13:15 <mriedem> bauzas: i wondered about that as well but figured it wasn't required
13:15 <openstackgerrit> Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown.  https://review.opendev.org/666245
13:16 <mriedem> i think we've only ever bumped the major version on an object once, and that was when dansmith did Instance v2.0
13:16 <mriedem> i don't remember the details of how complicated it was but i'm pretty sure i'd screw it up if i tried to do it myself
13:24 <bauzas> mriedem: yeah I need to remember why I was thinking about that back in the Kilo days
13:28 <mdbooth> stephenfin or sean-k-mooney: https://review.opendev.org/#/c/663382/4/nova/compute/manager.py Not my area of expertise, but would the prior call to _deallocate_network not mean that neutron would no longer return this stuff?
13:28 <mriedem> sean-k-mooney: i've replied on https://review.opendev.org/#/c/667264/2 with what i think they should do in the 2.53 case,
13:28 <mriedem> whether or not novaclient has all of the plumbing they need i haven't checked
13:29 <sean-k-mooney> mriedem: thanks, osc is not what i normally review but since they asked me to take a look i said i would
13:31 <mriedem> sean-k-mooney: mdbooth: also commented in https://review.opendev.org/#/c/663382/4
13:31 <sean-k-mooney> maybe, i'm looking. we could probably use _try_deallocate_network there too
13:31 <mdbooth> mriedem: Ooh, I'd forgotten that gem.
13:33 <mriedem> mdbooth: what? force_refresh?
13:33 <dansmith> mriedem: correct, and yes, it's complicated
13:33 <mriedem> you don't want to use that in this case
13:33 <mriedem> mdbooth: because force_refresh only goes back to stein and i'm guessing you want to backport this further than that
13:33 <mdbooth> mriedem: Ack.
13:43 <kashyap> Anyone else seeing stable/stein failures with the 'tempest-slow-py3' job?
13:44 <kashyap> http://logs.openstack.org/89/667389/1/check/tempest-slow-py3/2606bcc/testr_results.html.gz
13:44 <mriedem> kashyap: yes, known issue
13:44 <mriedem> https://review.opendev.org/#/c/667216
13:45 <kashyap> Ah, thanks.  I didn't want to mindlessly do 'recheck'
13:46 <sean-k-mooney> mdbooth: deallocate_network deletes the neutron ports that were auto-allocated by nova, so yes we probably should move that to the end of the function since it clears the network info cache https://opendev.org/openstack/nova/src/branch/master/nova/network/neutronv2/api.py#L1603-L1604
13:56 <openstackgerrit> Lee Yarwood proposed openstack/nova master: libvirt: Add a rbd_connect_timeout configurable  https://review.opendev.org/667421
14:00 <mriedem> sean-k-mooney: i left some more comments/questions in that one and added some vmware and zvm driver devs
14:01 <sean-k-mooney> mriedem: it looks like https://review.opendev.org/#/c/660761/8 is trying to fix the same or a similar bug
14:03 <sean-k-mooney> mriedem: if we delete while building there is a second race which causes us to not clean up the vif
14:04 <mriedem> that's amorin's fix, yes
14:04 <mriedem> which is different from stephenfin's, which is handling a failure while building
14:04 <sean-k-mooney> e.g. if the vm has spawned but we get the delete before we update the instance state in the db, we raise an exception which is what causes us to not clean them up
14:04 <mriedem> and amorin was just in here the other day saying he had a similar issue there
14:05 <sean-k-mooney> mriedem: no, stephen's issue was a failure caused when you delete while building
14:05 <sean-k-mooney> specifically for the customer it was caused by one of the instances in their heat stack failing to build, which caused all of the instances to be deleted
14:07 <sean-k-mooney> mriedem: i think amorin's bug is a duplicate of stephen's, but i'm not sure it would fix it in all cases, as in stephen's edge case we never call destroy
14:08 <sean-k-mooney> well, maybe they are both bugs; i did not fully review their bug in detail
14:12 <mriedem> as i said, amorin said he still has an issue which stephenfin's patch might resolve
14:12 <mriedem> amorin said he was going to try and recreate and use stephen's patch to test it
14:12 <sean-k-mooney> ya, i think on reading their bug both would be needed
14:13 <sean-k-mooney> mriedem: amorin is fixing the fact that we might be using an outdated network_info object from the instance, and stephenfin is fixing the fact that if we fail due to the db update we never even try to clean up the vifs
14:14 <sean-k-mooney> so to fix the downstream bug we will need to backport both.
14:14 <sean-k-mooney> ok, this makes more sense to me now.
14:20 <amorin> hey all
14:22 <amorin> the bug I faced 2 days ago was not fixed by stephenfin's patch
14:22 <amorin> I found that it was something else in our code
14:23 <amorin> cc mriedem sean-k-mooney
14:23 <mriedem> mnaser: i think you just hit something like this nw info cache lost thing, so you might have input here http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007363.html
14:23 <amorin> by the way, I faced another one, related to the patch I did:
14:23 <amorin> https://review.opendev.org/#/c/667294/
14:23 <mriedem> maciejjozefczyk: sean-k-mooney: ^
14:24 <mriedem> amorin: one step forward, two steps back :(
14:25 <amorin> yup
14:25 <mriedem> i remember a similar check was added here https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/network/base_api.py#L35
14:25 <amorin> exactly
14:26 <sean-k-mooney> ok so we might need all 3 patches
14:27 <sean-k-mooney> amorin: stephenfin's patch is a generalised fix to a very specific edge case
14:28 <sean-k-mooney> amorin: what you originally tried to fix was more subtle, as we were passing stale data in some cases
14:29 <maciejjozefczyk> ehh, instance_info_cache :)
14:29 <openstackgerrit> Martin Midolesov proposed openstack/nova master: Implementing graceful shutdown.  https://review.opendev.org/666245
14:30 <sean-k-mooney> maciejjozefczyk: yep, it's awesome...
14:30 <sean-k-mooney> mriedem: out of interest, why do we store the instance info cache in the db?
14:31 <sean-k-mooney> i feel like we would have fewer bugs related to it if we actually just made it an in-process dict cache
14:31 <mriedem> sean-k-mooney: i'll direct your question to the people that worked on nova back in 2011 or something
14:32 <sean-k-mooney> well my next question was going to be "i assume this is because of nova-network legacy choices"
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Remove no longer required "inner" methods.  https://review.opendev.org/655282
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Privsepify ipv4 forwarding enablement.  https://review.opendev.org/635431
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Remove unused FP device creation and deletion methods.  https://review.opendev.org/635433
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Privsep the ebtables modification code.  https://review.opendev.org/635435
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move adding vlans to interfaces to privsep.  https://review.opendev.org/635436
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move iptables rule fetching and setting to privsep.  https://review.opendev.org/636508
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move dnsmasq restarts to privsep.  https://review.opendev.org/639280
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move router advertisement daemon restarts to privsep.  https://review.opendev.org/639281
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move calls to ovs-vsctl to privsep.  https://review.opendev.org/639282
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move setting of device trust to privsep.  https://review.opendev.org/639283
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Move final bridge commands to privsep.  https://review.opendev.org/639580
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Cleanup the _execute shim in nova/network.  https://review.opendev.org/639581
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: We no longer need rootwrap.  https://review.opendev.org/554438
14:32 <openstackgerrit> Stephen Finucane proposed openstack/nova master: Cleanup no longer required filters and add a release note.  https://review.opendev.org/639826
14:33 <mriedem> sean-k-mooney: idk, you'd have to do some digging to find out when the network info cache was introduced, i don't know if it was before quantum or not
14:33 <mriedem> but we also store bdms in the db which are essentially the same thing - a cache of volume information for the server
14:33 <mriedem> which was probably before cinder existed
14:33 <sean-k-mooney> i'm seeing a pattern there
14:34 <sean-k-mooney> ok, well let's fix the current issue first, but i think i might look into whether we could stop storing it in the db
14:34 <amorin> I would love that
14:34 <amorin> :p
14:34 <sean-k-mooney> caching in memory in the compute agent would likely be enough
14:35 <sean-k-mooney> we would have to rebuild it every time the compute agent restarts but i think that is fine
14:37 <sean-k-mooney> actually we could use memcache to cache it too, which would mean all the services would have access to it; anyway it's now on my todo list
14:38 <sean-k-mooney> messing up the neutron policy and corrupting the network info cache is what caused our ci cloud production outage at the weekend
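Purely illustrative of the in-process dict cache sean-k-mooney is floating (not nova code; the fetch callable is a hypothetical stand-in for a neutron query):

    # hypothetical per-process cache keyed by instance uuid
    _nw_info_cache = {}

    def get_nw_info(instance_uuid, fetch):
        # 'fetch' is a hypothetical callable that queries neutron; on a miss we
        # refresh, and a compute agent restart simply empties the cache, which
        # is the self-healing property being argued for above
        if instance_uuid not in _nw_info_cache:
            _nw_info_cache[instance_uuid] = fetch(instance_uuid)
        return _nw_info_cache[instance_uuid]

    def invalidate_nw_info(instance_uuid):
        # would be called on e.g. a network-changed external event
        _nw_info_cache.pop(instance_uuid, None)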
14:45 <mriedem> TheJulia: is this a known busted job? http://logs.openstack.org/17/667417/1/check/ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa/db33ba3/controller/logs/devstacklog.txt.gz#_2019-06-26_05_47_14_168
14:46 <mriedem> sean-k-mooney: redoing how the nw info cache works is hopefully wayyyyyyy down on your todo list
14:47 <shilpasd> efried: mriedem: can you tell me how to trigger live migration in a sync and an async way, any CLI commands?
14:48 <mriedem> shilpasd: i don't know what you mean by sync and async way
14:48 <shilpasd> mriedem: i mean nova live-migration <instance_id> triggers live migration, but is there any other way to live migrate, any periodic call or something
14:49 <mriedem> no, nova doesn't auto-live-migrate things for you
14:49 <shilpasd> mriedem: i am in the process of verifying all move operations against the NFS changes done in https://review.opendev.org/#/c/650188/
14:50 <shilpasd> so i want to take care of all move operations
14:50 <shilpasd> so just want to know about it
14:51 <mriedem> all move operations are user-initiated
14:51 <mriedem> as far as i know anyway
14:52 <shilpasd> ok, as of now verifying SHELVE + SHELVE with offload + UNSHELVE + REBUILD + RESIZE + RESIZE REVERT + EVACUATION + COLD MIGRATION + COLD MIGRATION REVERT + LIVE MIGRATION
14:52 <shilpasd> just say if i missed anything
14:52 <mriedem> by rebuild i assume you mean evacuate
14:52 <mriedem> rebuild (the server action in the api) isn't a move,
14:52 <mriedem> but evacuate is
14:52 <efried> brinzhang: I'm here now, what's up?
14:52 <mriedem> evacuate = rebuild on another host
14:52 <shilpasd> rebuild using another image
14:53 <mriedem> rebuild + a new image is not a move
14:53 <mriedem> it's rebuilding the server's root disk image on the same host
14:53 <bauzas> mriedem: not sure I understood your point in https://bugs.launchpad.net/nova/+bug/1793569/comments/5
14:53 <openstack> Launchpad bug 1793569 in OpenStack Compute (nova) "Add placement audit commands" [Wishlist,Confirmed] - Assigned to Sylvain Bauza (sylvain-bauza)
14:53 <mriedem> also, shelve w/o offload and then unshelve is also not a move operation,
14:53 <bauzas> mriedem: do you want heal_allocations to support this, or the 'placement audit' rather?
14:53 <mriedem> if the instance is shelved but not offloaded, and then the user unshelves, it's just unshelved on the same host
14:53 <shilpasd> mriedem: ok, noted
14:54 <shilpasd> mriedem: what about resize
14:55 <shilpasd> it's a move operation, right, since it resizes onto another host
14:55 <mriedem> shilpasd: maybe :)
14:55 <mriedem> unless nova is configured with allow_resize_to_same_host and the scheduler picks the same host the instance is already on,
14:55 <mriedem> which is possible in a small edge site or if the server is in a strict affinity group and can't be moved
14:56 <shilpasd> got it
14:56 <mriedem> https://bugs.launchpad.net/nova/+bug/1790204 is all about that problem
14:56 <openstack> Launchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [High,Triaged]
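For context, the config knob mriedem mentions above; a minimal nova.conf excerpt for the compute host:

    [DEFAULT]
    # let the scheduler pick the instance's current host as the resize target
    allow_resize_to_same_host = true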
14:58 <mriedem> bauzas: i think i meant to say "nova-manage placement audit" there,
14:58 <mriedem> since heal_allocations doesn't report on things really, nor does it delete allocations, it only adds allocations for instances (not migrations) that are missing
14:58 <bauzas> mriedem: ack, will add this there then
15:00 <mriedem> i went on to continue talking about heal_allocations but idk, it's a blur
15:00 <shilpasd> mriedem: one more query, i have an NFS configuration, and when performing resize to another host it goes and creates an instance data file on the dest system via SSH
15:00 <shilpasd> see the code at https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L8861
15:01 <shilpasd> mriedem: during the shared storage check, why is this check necessary?
15:02 <shilpasd> _is_storage_shared_with()
15:03 <mriedem> shilpasd: it may be ssh or rsync, it depends on config https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.remote_filesystem_transport
15:03 <mriedem> the default is ssh
15:04 <mriedem> i'm less familiar with this code, but for one we don't have shared storage provider support in the libvirt driver anyway,
15:04 <mriedem> but this is presumably one of the things we could replace if we had compute nodes modeled in a shared storage aggregate, and we could avoid the "temp file create" tests and such for shared storage
15:05 <mriedem> as i'm sure lyarwood and mdbooth could probably attest, shared storage support in the libvirt driver can be very confusing because there are the instance files like console logs and such, and there is the image backend, and that can all be different and be a mix of shared storage and non-shared storage, e.g. the root disk images might be in the rbd image backend but the instance files, like console logs, could be on local disk and get ssh'ed/rsync'ed around
15:06 <mriedem> e.g.
15:06 <mriedem> https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.ensure_libvirt_rbd_instance_dir_cleanup
15:07 <sean-k-mooney> mriedem: yes it is, but if we keep getting bugs with it i might have to raise it. but ya, not before m2, likely not before m3, if in train at all.
15:07 <sean-k-mooney> ^ network info cache rework
15:07 <mriedem> bauzas: i think what i was thinking of was an audit command that could detect that you have orphaned allocations tied to a not-in-progress migration, e.g. a migration that failed but where we failed to clean up the allocations,
15:08 <mriedem> bauzas: and then that information could be provided to the admin to then determine what to do, e.g. delete the allocations for the migration record consumer and potentially the related instance,
15:08 <bauzas> mriedem: yeah ok
15:08 <mriedem> and if they delete the allocations for the instance, then they could run heal_allocations on the instance to fix things up
15:08 <mriedem> we could also eventually build on that to make it automatic with options
15:08 <mriedem> e.g. nova-manage placement audit --heal
15:08 <mriedem> something like that
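Sketching the hypothetical interface being brainstormed here (none of these commands or flags existed at the time):

    # report allocations whose consumer is neither an instance nor a migration
    nova-manage placement audit
    # additionally remove the orphaned allocations that were found
    nova-manage placement audit --heal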
15:09 <shilpasd> mriedem: thanks for clearing up my doubts, i'll go through what you shared and get back to you with anything further
15:09 <mriedem> sean-k-mooney: redoing nova's nw info cache at this point in the game is going to be a big undertaking, and i would not be surprised if trying to use a global cache like memcache or etcd or something just generates more or different kinds of bugs than what we've already been patching lo these many years, as i'm sure dansmith can agree
15:10 * mriedem feels the need to phone a friend
15:10 <dansmith> oh mahgod
15:10 <efried> yonglihe: I'm going to fix your pep8 error on https://review.opendev.org/#/c/627765/ real quick, k?
15:10 <dansmith> why do we need a memcache? it's in the database
15:10 <efried> it's due to a new rule that recently merged.
15:11 <sean-k-mooney> dansmith: i was suggesting not keeping it in the database and only having a dict cache, or maybe using memcache
15:11 <dansmith> sean-k-mooney: ...why?
15:11 <sean-k-mooney> mriedem: and ya, it would be a blueprint or spec, not a bug fix
15:11 <mriedem> sean-k-mooney: we can just as easily f that up
15:12 <sean-k-mooney> well if it's in-process as a dict cache, then if we f it up it's fixed by restarting the compute agent
15:12 <sean-k-mooney> memcache is probably not going to help with anything
15:12 <dansmith> we store some stuff in nwinfo that isn't anywhere else, IIRC, like which ports we created vs. the user, so that has to be persisted somewhere if we were going to use memcache
15:12 <dansmith> ...yeah ;)
15:13 <dansmith> what problem is being solved here?
15:13 <mriedem> i don't think that overhauling to use an external cache service and restarting the compute is the giant hammer we really need for what we're trying to solve
15:13 <sean-k-mooney> nothing at the moment; reworking it is unrelated to what we are trying to fix
15:13 <openstackgerrit> Eric Fried proposed openstack/nova master: Clean up orphan instances virt driver  https://review.opendev.org/648912
15:13 <openstackgerrit> Eric Fried proposed openstack/nova master: clean up orphan instances  https://review.opendev.org/627765
15:13 <mriedem> so this is a....thought exercise?
15:13 <efried> sean-k-mooney, gibi: Would y'all please have another look at these --^
15:13 <sean-k-mooney> yes
15:14 <sean-k-mooney> it's on my todo list to figure out if it even makes sense to do
15:14 <gibi> efried: I have it open
15:15 <efried> thanks gibi
15:15 <efried> thanks sean-k-mooney
15:15 <efried> sean-k-mooney: fyi it's apparently a thing stx cares about
15:15 <efried> thus presumably it "makes sense" in some capacity :)
15:17 <mriedem> efried: hyperv ci is happy with the update_provider_tree patch https://review.opendev.org/#/c/667417/
15:17 <efried> mriedem: thanks for the reminder
15:17 <mriedem> efried: fwiw that cleanup orphan instances thing is also something that the public cloud SIG (and huawei public cloud ops) care about as well, which is why i was initially reviewing it awhile back
15:18 <mriedem> the concern at the last ptg was how much duplication there was with the existing periodic to clean up running deleted (but not orphaned) instances
15:20 <efried> okay, thanks for that background.
15:21 <mriedem> something something live migration fails and you've got untracked guests on the host consuming resources (which aren't tracked obviously) so then trying to schedule things to those hosts fails b/c you're out of resources
15:21 <efried> sounds like we need a patch to clean up those orphaned instances
15:22 * mriedem shrugs
15:22 <mriedem> i'm sure lots of operators have already just written scripts to detect and clean those types of things up
15:22 <mriedem> but yeah it's better to have it native probably
15:42 <efried> mriedem: We don't have a way to prove the xen one is being hit, do we? (update_provider_tree)
15:42 <efried> since their CI is dead?
15:43 <efried> mriedem: also, if you haven't already, there should be a note to the ML warning of this (and another before we remove the code path, obviously)
15:44 <efried> ...for oot folk
15:44 <mriedem> sorry, was just doing tech support with my mom
15:44 <efried> (I know nova_powervm is copacetic fwiw)
15:45 <mriedem> i was waiting to send the oot ML email until we were more sure about what i've proposed
15:45 <mriedem> and idk about the xen one if their CI is dead, though it's pretty damn basic
15:45 <mriedem> just a port of get_inventory
15:46 <bauzas> efried: mriedem: heh, the reportclient doesn't of course support all placement API queries, so I wonder whether I should add something like a "get_resource_providers()" method to the reportclient just for the nova-manage caller, or call the Placement API directly
15:46 <bauzas> thoughts on that?
15:47 <efried> bauzas: If it's something simple like GET /resource_providers (you really want all of them?) then yeah, just call SchedulerReportClient.get()
15:47 <bauzas> zactly
15:47 <efried> sfine
15:48 <bauzas> efried: but then I don't have a safe_connect connection
15:48 <mriedem> if you're not going to page, you could be listing 14K providers in the case of cern...
15:48 <efried> bauzas: We don't want @safe_connect
15:48 <efried> ever, anywhere
15:48 <efried> Handle ksa.ClientException at the caller instead.
15:48 <efried> And if you see @safe_connect anywhere in your travels and want to kill it and do that ^, I will buy your drinks.
15:48 * mriedem notes that GET /resource_providers doesn't support paging
15:48 <efried> true story
15:49 <bauzas> it's 40°C here, I'm all for a drink
15:49 <efried> bauzas: what are you trying to do with the master list?
15:49 <bauzas> efried: looking up all allocations to see whether they're orphaned
15:49 <bauzas> mriedem: ah shit, excellent point
15:50 <mriedem> you could instead page the compute nodes in the cells and hit this api https://developer.openstack.org/api-ref/placement/?expanded=#list-resource-provider-allocations
15:50 <bauzas> we could possibly need to look at all allocations per resource provider, which would be given by a list of compute services (which is paginated AFAIK)
15:50 <bauzas> heh, jinxed
15:50 <mriedem> compute service != compute node == resource provider
15:50 <bauzas> shit, typo, nodes indeed
15:50 <bauzas> tell me about my Kilo bp
15:51 <mriedem> so once you get the allocations for a given provider, what are you going to do?
15:51 <mriedem> check if an instance (or migration) exists with the given consumer uuid?
15:51 <mriedem> and if not, consider the allocation orphaned?
15:52 <mriedem> iff the allocation has resources that nova "owns" like VCPU
15:52 <mriedem> without consumer types in the allocations response we have to rely on the resource class
15:52 <bauzas> exactly this, I was about to say which resource classes were nova-related
15:53 <efried> ugh, relying on resource class...
15:54 <efried> this is where the concept of provider owner would be handy.
15:54 <bauzas> yeah I know
15:54 <efried> hopefully we're not allowing allocations from different owners against the same provider anywhere
15:54 <bauzas> we could also add an argument asking for the resource class we wanna check
15:54 <efried> no, we shouldn't do it by resource class
15:55 <efried> because the same resource class may be managed by different owners in different providers
15:55 <efried> think VF (nova-PCI vs cyborg vs neutron)
15:55 <efried> but we (need to make sure we) have a rule that a provider as a whole is only managed by a single owner.
15:56 <bauzas> hmmm
15:57 <bauzas> actually, I'm checking consumer_id
15:57 <bauzas> so I guess all resource providers corresponding to compute nodes (and associated children) should have allocations against a consumer_id that *is* either a migration object or a nova instance
15:58 <bauzas> even cyborg, right?
15:58 <openstackgerrit> Nate Johnston proposed openstack/nova stable/stein: [DNM] Test change to check for port/instance project mismatch  https://review.opendev.org/667663
15:59 <bauzas> efried: ^?
16:00 <efried> bauzas: If what you're looking to do is clean up allocations against orphaned instances, I think it's legit to remove all the allocations associated with that consumer, even if they're on providers you don't own. That's symmetrical with what we do when we schedule (we claim all of those atomically from nova).
16:00 <efried> and
16:01 <efried> if there's an allocation against a compute node RP, you can legitimately assume it's in that category
16:01 <efried> but
16:01 <efried> that will break eventually if we ever have resourceless roots
16:01 <efried> because
16:01 <efried> you can *not* assume that all children of the compute node RP *also* belong to nova.
16:01 <bauzas> baby steps here :)
16:02 <efried> yeah, just leave a note/todo I guess.
16:02 <bauzas> at least if I can support nested rps, it would be cool
16:02 <bauzas> because e.g. VGPU allocations are still made *against* a consumer which is an instance, yeepee
16:03 <bauzas> but, that would mean I would look at all resource providers, not only the ones Nova owns
16:03 <efried> yeah, it would be 1) compute node => 2) compute node RP => 3) allocations against that RP => 4) consumer for that allocation => 5) filter down to orphan consumers => 6) allocations for those consumers => 7) delete all of those
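A rough Python sketch of those seven steps, assuming a placement client with SchedulerReportClient-style get()/delete() methods; the two callables are hypothetical stand-ins for the nova-side lookups:

    def audit_orphan_allocations(placement, list_compute_node_rp_uuids,
                                 consumer_is_known):
        orphans = set()
        # steps 1-2: page through the compute nodes and map them to root RPs
        for rp_uuid in list_compute_node_rp_uuids():
            # step 3: allocations against that provider
            resp = placement.get('/resource_providers/%s/allocations' % rp_uuid)
            # steps 4-5: keep consumers with no matching instance or migration
            for consumer_uuid in resp.json()['allocations']:
                if not consumer_is_known(consumer_uuid):
                    orphans.add(consumer_uuid)
        # steps 6-7: deleting by consumer drops that consumer's allocations on
        # every provider in its allocation set, nested RPs included
        for consumer_uuid in orphans:
            placement.delete('/allocations/%s' % consumer_uuid)
        return orphans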
16:03 <efried> no
16:04 <bauzas> and here comes pagination...
16:04 <efried> with the limitation noted above (stops working for resourceless roots, which we're a long way off of), the above process will get you there.
16:04 <efried> Step 1 is done by paginating from the nova API.
16:04 <bauzas> cool then
16:05 <efried> this is in a nova-manage type utility?
16:05 <efried> So we don't care that it'll take FOREVER to run at cern?
16:05 <bauzas> a nova-manage placement audit thing
16:05 <efried> mm
16:05 <bauzas> so a cron job basically
16:05 <bauzas> marker and the likes
16:06 <efried> mm
16:06 <bauzas> zactly like heal_allocations
16:06 <efried> sure would be nice to find a way to make it more efficient then.
16:06 <mriedem> heal_allocations doesn't have a marker
16:06 <efried> but: make it work, make it right, make it fast
16:06 <mriedem> it has a limit of things to process
16:06 <mriedem> nor does heal_allocations deal with nested allocations
16:08 <mriedem> the audit command could also take a --consumer option to just investigate what the operator thinks is a problem instance/migration
16:09 <mriedem> note that i added --instance to heal_allocations later for that reason
16:09 <bauzas> yup I saw
16:09 <mriedem> and --dry-run
16:09 <mriedem> depends on what the command will do though, if it's just reporting then you don't need a --dry-run
16:11 <bauzas> I was thinking of just reporting the orphaned ones, but then later adding a --remove option
16:11 <bauzas> *later*
16:12 <bauzas> anyway, I need to go off and enjoy the hot summer nights
16:13 <bauzas> I think I have everything I needed, thanks folks
16:14 * mriedem assumes "hot summer nights" is a french adult store that bauzas frequents
16:14 <dansmith> hoo boy
16:15 <mriedem> strictly adult cheese, wine and things of that nature
16:16 <melwitt> now, for another fun topic
16:17 <melwitt> mriedem, dansmith: I was reading these comments on an old [unmerged] patch: https://review.opendev.org/#/c/462521/12/releasenotes/notes/resize-auto-revert-6e1648828aba16b2.yaml@5, and it made me think of the [recently merged] patch https://review.opendev.org/633227 again, and how it changed ERROR state to ACTIVE (or STOPPED) state. now I'm worried that wasn't an ok thing to do (API change?)
16:18 <melwitt> for a failed cold migration to self
16:18 <mriedem> not the same
16:18 <mriedem> in my change,
16:18 <mriedem> we failed in prep_resize before we actually did anything to the guest
16:18 <mriedem> in that case, putting the instance in ERROR status makes no sense imo
16:19 <mriedem> as i said, the only way you can get it out of error then is to do something like rebuild, hard reboot and/or reset status to ACTIVE,
16:19 <dansmith> and was yours also resetting to ACTIVE if it was actually shutoff?
16:19 <dansmith> I forget
16:19 <mriedem> and if i started a resize or cold migration of a STOPPED instance, then resetting it to ACTIVE isn't what i want, nor is rebuild or hard reboot really
16:19 <mriedem> dansmith: that was the point of my fix
16:19 <mriedem> to reset to STOPPED if it was STOPPED
16:19 <dansmith> mriedem: right
16:19 <mriedem> well, in part,
16:20 <mriedem> the main point was don't put it in ERROR status
16:21 <melwitt> ok, I think I see. this is ok because the instance is actually ok (other than cosmetics), whereas in the first example the instance was not ok and was proposed to auto-correct to an ok/healthy state
16:22 <dansmith> the auto-revert actually moved stuff back, IIRC
16:22 <dansmith> not just correcting state, but an actual revert
16:22 <melwitt> yeah it did
16:23 <melwitt> I was zooming in on the vm_state part of it, how it appears to an external script like in your example in the comment
16:24 <melwitt> and then I was thinking, is that a problem, if we imagine an external script executing a cold migrate and it fails and the instance stays ACTIVE so the script doesn't know it didn't work. that sort of thing
16:25 <melwitt> I was wondering about that after I read the comments on the old auto-revert patch
16:26 <dansmith> but the difference is,
16:26 <mriedem> the external thing should be waiting for task_state to be None to know the operation is done (or the instance action is finished/error, or the migration status is 'finished' or whatever in this case)
16:26 <dansmith> the merged patch corrected state before it changed from $orig to MIGRATING or whatever, right?
16:26 <mriedem> polling the vm_state in the API is not sufficient
16:26 <dansmith> the auto-revert one has it go into all the migrating states and then pop back
16:27 <dansmith> specifically, potentially pop back to ACTIVE and not have moved, IIRC
16:27 <melwitt> yes, I believe it did prevent an ERROR state that occurred before going to MIGRATING
16:28 <mriedem> i'm getting lost in the "it" references here when talking about separate changes
16:28 <melwitt> heh, sorry. the merged change
16:28 <mriedem> mdbooth's change was,
16:28 <mriedem> resize/cold migrate failed somewhere and somehow, and the instance was set to ERROR status, right?
16:29 <mriedem> and if you tried doing a revertResize API call on that ERROR instance, it would do the revert resize flow to go back from the dest to the source host
16:29 <dansmith> no, it did a full revert I think
16:29 <mriedem> even though what we could have failed on was maybe something in prep_resize or resize_instance before the guest / volumes / networking ever actually *got* to the dest host
16:30 <dansmith> so we get to the dest host, fail, auto-revert back to source, and go back to ACTIVE
16:30 <dansmith> you wait for ACTIVE to mean "success" but really it failed and the instance hasn't resized or moved
16:30 <melwitt> yeah, I think it was a full revert in the mdbooth change. i.e. do automatically what a user would have to do, initiate a revert
16:30 <mriedem> oh i see https://review.opendev.org/#/c/462521/12/nova/compute/manager.py@4449
16:30 <dansmith> granted it's been 18 months since I last looked at this
16:30 <dansmith> it's really the opposite of what mriedem's change was doing,
16:31 <dansmith> which was keep it active if we don't start
16:31 <mriedem> or stopped rather than active...
16:31 <dansmith> well, and that's an important piece yeah
16:31 <mriedem> i.e. start resize with a stopped server, prep_resize fails, don't reset to active *because it's stopped*
16:31 <dansmith> right
16:31 <mriedem> eventually the power sync task would stop the instance i think but still
16:32 <dansmith> or restart it when it shouldn't, right?
16:32 <melwitt> yeah, makes sense
16:32 <dansmith> if vm_state is active, it was stopped, power state sync says "hmm, this should be running"
16:32 <mriedem> i don't think that task ever starts anything
16:32 <mriedem> even though people have asked for that in the past
16:32 <dansmith> no? I thought it would for things like post-host-failure recovery
16:32 <mriedem> i believe the reasoning was always, we don't want to turn things on by guessing and then bill the user
16:33 <dansmith> well, billing is unrelated to started or stopped, but okay :)
16:33 <dansmith> it's a complex enough not-really-a-state-machine that I'm sure I'm getting it wrong
16:33 <mriedem> depends on how you do your billing
16:33 <dansmith> regardless, ACTIVE but not running is about as bad
16:33 <mriedem> same - it's been a long time since i looked
16:34 <mriedem> anyway, i agree that if i'm doing a resize (and i'm sure tempest would do this), you're waiting for the instance to go to VERIFY_RESIZE with task_state=None,
16:34 <mriedem> if the instance goes back to ACTIVE with task_state=None, i'd wait indefinitely
16:34 <mriedem> unless i've got a timeout,
16:34 <dansmith> especially if you went into RESIZING in between
16:34 <mriedem> or i'm also checking instance actions or migration status (which might be admin-only info)
16:35 <mriedem> i personally wouldn't try to track the task_state transitions since that's probably a losing game
16:35 <mriedem> i would just wait for terminal states but yeah
16:35 <dansmith> the thing is, ACTIVE is a terminal state for auto-confirm
16:35 <mriedem> true yeah
16:35 <dansmith> so if it went ACTIVE -> RESIZING -> ACTIVE, you should assume it actually resized and was auto-confirmed
16:35 <dansmith> but with auto-revert,
16:35 <mriedem> i know powervc set auto-confirm to 1 second
16:35 <dansmith> that breaks that behavior
16:36 <mriedem> lbragstad had to fix a few race bugs as a result :)
16:36 <dansmith> with auto-revert, ACTIVE->RESIZING->ACTIVE could mean "it worked" or "it didn't"
16:36 <mriedem> dansmith: yeah, and you wouldn't know unless you checked the migration or instance actions, and as a non-admin you might not have access to those details
16:36 <melwitt> yeah, I see
16:36 <dansmith> it turns waiting for a terminal state into a much more complex affair for sure
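What "waiting for a terminal state" looks like from the client side; a minimal polling sketch with requests (token/endpoint/server id are placeholders). Note the ambiguity discussed above: with auto-revert, landing back in ACTIVE could no longer be distinguished from auto-confirm without extra checks:

    import time
    import requests

    TOKEN = '...'  # placeholder keystone token
    COMPUTE_URL = 'http://controller:8774/v2.1'  # placeholder endpoint
    SERVER = '008c5c50-ff54-49f4-adb0-23775e8af5f1'  # placeholder server id

    def wait_for_resize(timeout=600):
        deadline = time.time() + timeout
        while time.time() < deadline:
            server = requests.get(
                '%s/servers/%s' % (COMPUTE_URL, SERVER),
                headers={'X-Auth-Token': TOKEN}).json()['server']
            # only trust the status once the task_state has cleared
            if server['OS-EXT-STS:task_state'] is None:
                if server['status'] == 'VERIFY_RESIZE':
                    return True   # resize finished, awaiting confirm/revert
                if server['status'] == 'ERROR':
                    return False  # resize failed
            time.sleep(5)
        raise TimeoutError('server never reached a terminal state')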
16:37 <openstackgerrit> Merged openstack/nova master: Replace deprecated with_lockmode with with_for_update  https://review.opendev.org/666221
16:37 <melwitt> that's a helpful way to think about it, imagining what a tempest (or func test) would need to do to be able to automate it
16:40 <mriedem> maybe we should link this conversation into the abandoned change so we have that when this comes up again in 2 years :)
16:40 <melwitt> yeah, that's a good idea. let me do that now
16:44 <sean-k-mooney> ... i started reading the scrollback and i think on second thought i'm not going to do that
16:47 <sean-k-mooney> melwitt: the only way for a non-admin to determine if a cold migrate succeeded would be to check the hashed host id before and after
16:48 <sean-k-mooney> for resize they could check if the flavor is the one they expected
16:48 <dansmith> sean-k-mooney: not really
16:48 <dansmith> oh, for a strict migration, yeah
16:48 <dansmith> was going to say, resize to same host breaks that assumption
16:49 <melwitt> sean-k-mooney: could also observe ACTIVE -> RESIZING -> ACTIVE as dansmith described, right? as non-admin
16:49 <dansmith> melwitt: yes
16:49 <sean-k-mooney> you could observe it if you poll but you would not know if it succeeded or failed
16:49 <sean-k-mooney> without also checking if the flavor is the old or new one
16:49 <dansmith> sean-k-mooney: you won't go back to active from resizing currently
16:50 <sean-k-mooney> oh ok, so that was the change ye were talking about
16:50 <melwitt> sean-k-mooney: if it failed [after going to RESIZING] it would go to ERROR. are you talking hypothetically with the abandoned patch?
16:51 <sean-k-mooney> melwitt: there are cases where i thought it would auto-revert and go back to active
16:51 <melwitt> sean-k-mooney: no, that was the proposal in the abandoned patch
16:52 <sean-k-mooney> ok, i might be thinking about live migrate then
16:52 <sean-k-mooney> for live migrate we can fail to migrate but still be in active
16:58 <sean-k-mooney> so ya, looking at a code search, revert_resize is only ever called from the api, which simplifies some things but not others
16:59 <sean-k-mooney> melwitt: do we currently allow you to revert a resize for an instance that is in error because the resize failed
16:59 <sean-k-mooney> so you can go active->resizing->error->active?
17:00 <mriedem> fwiw, as a non-admin i think you can tell if your resize failed if the instance action "message" is not null /servers/{server_id}/os-instance-actions/{request_id}
17:00 <mriedem> er, GET /servers/{server_id}/os-instance-actions/{request_id}
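A minimal sketch of that check (placeholders throughout; the request id is the one returned for the resize call):

    import requests

    TOKEN = '...'  # placeholder keystone token
    COMPUTE_URL = 'http://controller:8774/v2.1'  # placeholder endpoint
    SERVER = '008c5c50-ff54-49f4-adb0-23775e8af5f1'  # placeholder server id
    REQUEST_ID = 'req-...'  # placeholder request id of the resize action

    action = requests.get(
        '%s/servers/%s/os-instance-actions/%s' % (COMPUTE_URL, SERVER, REQUEST_ID),
        headers={'X-Auth-Token': TOKEN}).json()['instanceAction']
    # a non-null message (it is 'Error' on failure) flags a failed action --
    # but see the caveat below about reschedules
    print(action['message'])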
17:00 <melwitt> sean-k-mooney: I think so, based on the abandoned patch. it was proposing to do that automatically (from error)
17:01 <sean-k-mooney> melwitt: ok, if we did not you would have to do reset state (which is admin only?) + a hard reboot
17:02 <melwitt> mriedem: you mean failed before resize started, right
17:02 <mriedem> no, if the operation failed
17:02 <mriedem> if any event in an action (operation) fails, the overall action 'message' is always 'Error': https://github.com/openstack/nova/blob/707deb158996d540111c23afd8c916ea1c18906a/nova/db/sqlalchemy/api.py#L5227
17:02 <melwitt> sean-k-mooney: if we did not allow revert from error? I don't think reset state + reboot would put everything back properly
17:02 <mriedem> which is actually a bug...
17:03 <mriedem> https://bugs.launchpad.net/nova/+bug/1824420
17:03 <openstack> Launchpad bug 1824420 in OpenStack Compute (nova) "Live migration succeeds but instance-action-list still has unexpected Error status" [Undecided,Triaged]
17:03 <melwitt> oh
17:04 <mriedem> so before we go down the road of "well the user can track the operation to see if it was auto-reverted on error because of instance actions" let me point out that relying on instance actions that way isn't foolproof because of that bug
17:04 <mriedem> and especially b/c it's a result of failures on hosts and then doing reschedules to other hosts
17:04 <mriedem> which resize can do
17:04 <sean-k-mooney> the instance should become active on the source host but it might not fix the allocation in placement properly
17:05 <mriedem> auto-reverting a failed resize could be all sorts of f'ed up
17:05 <mriedem> because rollbacks are near impossible
17:05 <mriedem> hard to test
17:05 <mriedem> i'm fairly certain our live migration rollback code is also quite janky in several ways
17:06 <mriedem> because we don't test it in the gate
17:08 <sean-k-mooney> just looking at that bug, the live migration failed, right?
17:09 <sean-k-mooney> so we would expect there to be an error in the instance action log?
17:10 <mriedem> no
17:10 <mriedem> read my comments on the bug
17:10 <sean-k-mooney> maybe i'm misreading it, as it's kind of hard to read the initial bug
17:10 <mriedem> a pre-check on one of the candidate dest hosts failed
17:10 <mriedem> which triggers a reschedule to another dest host in the conductor live migration task
17:10 <mriedem> the 2nd host works
17:11 <mriedem> but b/c the pre-check failed on the first dest host, the instance action event for that one is error, which sets the overall action message to 'Error'
17:11 <sean-k-mooney> ah ok
17:11 <mriedem> iow, actions aren't reschedule-aware
17:11 <mriedem> or aware of what is a non-fatal error
17:12 <sean-k-mooney> right
17:12 <sean-k-mooney> should we be logging the prechecks as events?
17:12 <sean-k-mooney> i was not expecting to see compute_check_can_live_migrate_destination events
17:13 <mriedem> hard to say
17:13 <mriedem> if you configure nova to not have alternate hosts for reschedules
17:13 <mriedem> then you'd likely want to know it's that dest pre-check that failed, right?
17:14 <sean-k-mooney> maybe, or just that you had no valid hosts?
17:14 <sean-k-mooney> / exhausted the list of alternates
17:14 <sean-k-mooney> i think we would still log the failure, right
17:15 <openstackgerrit> Merged openstack/nova master: Remove orphaned comment from _get_group_details  https://review.opendev.org/667135
mriedemsure, if you set https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.migrate_max_retries to 0 for now retries, or you don't have any alternate hosts17:15
mriedemidk, anyway, it's tangential to the auto-revert failed resize thing mel was asking about17:15
sean-k-mooneyfor me this feels like we are leaking an implemenation detail as an event17:15
sean-k-mooneyya it is17:16
mriedeminstance action events are basically entirely leaked implementation details :)17:16
mriedemthe event names come from the methods they decorate17:16
mriedemthere is no guarantee on api stability for those things17:16
sean-k-mooneyok personally i would prefer not to decorate that function17:17
sean-k-mooneybut as you said its tangential to melwitt's topic17:17
*** zbr|ruck is now known as zbr17:18
*** luksky has joined #openstack-nova17:29
*** udesale has quit IRC17:29
*** panda has quit IRC17:31
*** panda has joined #openstack-nova17:35
openstackgerritLee Yarwood proposed openstack/nova master: libvirt: Add a rbd_connect_timeout configurable  https://review.opendev.org/66742117:36
openstackgerritEric Fried proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec  https://review.opendev.org/66748717:36
Nick_Aany idea why metadata would send all /24 routes in a region to each instance? http://paste.openstack.org/show/y0lE42EA59yhnu7G1KnY/17:38
openstackgerritMatt Riedemann proposed openstack/nova master: Fix AttributeError in RT._update_usage_from_migration  https://review.opendev.org/66768717:38
openstackgerritMatt Riedemann proposed openstack/nova master: Fix RT init arg order in test_unsupported_move_type  https://review.opendev.org/66768817:38
openstackgerritGhanshyam Mann proposed openstack/nova master: Multiple API cleanup changes  https://review.opendev.org/66688917:48
*** NewBruce9 has joined #openstack-nova17:48
*** jamesdenton has quit IRC17:49
*** jistr_ has joined #openstack-nova17:52
*** dtantsur has joined #openstack-nova17:52
*** klindgren_ has joined #openstack-nova17:52
*** NewBruce has quit IRC17:53
*** dtantsur|mtg has quit IRC17:53
*** klindgren has quit IRC17:53
*** jistr has quit IRC17:53
*** maciejjozefczyk has joined #openstack-nova17:53
*** altlogbot_1 has quit IRC17:55
*** jamesdenton has joined #openstack-nova17:56
*** altlogbot_2 has joined #openstack-nova17:59
sean-k-mooneyyonglihe: efried just reviewing https://review.opendev.org/#/c/648912/14 but why are we looking up instances by name?17:59
*** altlogbot_2 has quit IRC18:01
efriedsean-k-mooney: I haven't the foggiest. I'm involved here in an administrative capacity :)18:01
sean-k-mooneythe domain xml has had the instance uuid stored in the uuid field for several releases now, so im wondering why we would use the instance domain name instead18:02
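A short sketch of the uuid-based lookup being suggested; lookupByUUIDString and lookupByName are real libvirt-python calls, while the connection URI and the example uuid/name are placeholders.

    import libvirt

    conn = libvirt.open('qemu:///system')  # placeholder URI
    # preferred: the domain's uuid field holds the nova instance uuid
    dom = conn.lookupByUUIDString('11111111-2222-3333-4444-555555555555')
    # the name-based lookup being questioned above:
    # dom = conn.lookupByName('instance-0000002a')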
*** altlogbot_1 has joined #openstack-nova18:03
melwittsean-k-mooney: been meaning to get to that review. after what we briefly discussed at the ptg, that change would be best to piggyback onto the existing periodic for handling deleted instances. I don't know why it would be looking up instances by name18:03
efriedk, hopefully yonglihe will be able to answer on the review. Thanks sean-k-mooney.18:03
sean-k-mooneymelwitt: it is piggybacking on that task18:04
melwittbut I see a lot of new methods18:04
sean-k-mooneyefried: ok im trying to find where it gets the list of suspected instances18:04
sean-k-mooneymelwitt: ya there are. im not sure if they are all needed.18:05
melwittyeah, I would think no new ones should be needed18:05
sean-k-mooneywell we need a new method to query the driver for the instances that are running on the host but not in the db18:06
sean-k-mooneyand then we can call the old methods that implement the policy, e.g. reap or log or do nothing, whatever you have set in the config18:07
melwittwhy? there's a self._get_instances_on_driver method already18:07
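A minimal sketch of the comparison under discussion, assuming the driver can report its guests and the database its instances; the function and attribute names here are illustrative, not nova's actual API.

    def find_orphan_guests(driver_guests, db_instances):
        # guests the hypervisor reports that the database has no record of
        known_uuids = {inst.uuid for inst in db_instances}
        return [g for g in driver_guests if g.uuid not in known_uuids]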
sean-k-mooneythat is a good question :)18:08
sean-k-mooneyi have only properly looked at https://review.opendev.org/#/c/648912/14 which is doing the driver change, but i need to look at how that is being used in https://review.opendev.org/#/c/627765/2818:08
melwittanyway, just saying skimming those patches I don't see why they're so large18:08
melwittor rather, I expected not such a large patch for this18:09
sean-k-mooneymelwitt: this seems overly complex https://review.opendev.org/#/c/627765/28/nova/compute/manager.py@888418:13
sean-k-mooneyalso _destroy_orphan_instances is not a great name for that, since it might not destroy anything.18:14
sean-k-mooneymelwitt: i was wrong, however: it's adding a new periodic task, not extending the existing one18:15
*** hongbin has joined #openstack-nova18:33
*** ociuhandu has quit IRC18:36
melwittsean-k-mooney: yeah, that's what I had thought. it should get much simpler if it's changed to extend the existing periodic. and I think the suggestion at the ptg from dansmith was to add another enumerated choice to the existing conf option that is something like "reap-unknown" which will reap both known deleted and unknown guests. and otherwise just log unknowns18:41
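A hedged sketch of that PTG suggestion: running_deleted_instance_action is nova's existing enumerated option (noop, log, shutdown, reap are today's choices), and 'reap-unknown' is the hypothetical new choice being proposed here.

    from oslo_config import cfg

    running_deleted_opt = cfg.StrOpt(
        'running_deleted_instance_action',
        default='reap',
        # 'reap-unknown' is the proposed addition; the rest exist today
        choices=['noop', 'log', 'shutdown', 'reap', 'reap-unknown'],
        help='What to do with guests the driver reports but the database '
             'considers deleted or does not know about.')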
openstackgerritMerged openstack/nova master: Fix update_provider_tree signature in reference docs  https://review.opendev.org/66740818:43
openstackgerritEric Fried proposed openstack/nova-specs master: grammar fix for show-server-numa-topology spec  https://review.opendev.org/66748718:43
mriedemsean-k-mooney: it's named _destroy* because it's similar to the existing _destroy_running_instances18:44
mriedemwhich might also not destroy anything18:44
sean-k-mooneymriedem: ah ok18:44
sean-k-mooneymy main issue with this is it will only really work for the libvirt driver18:45
sean-k-mooneyand the fake driver18:45
*** ivve has joined #openstack-nova18:45
mriedemyup18:53
mriedemi believe i noted something along those lines last time i did a deep review on it18:54
openstackgerritMiguel Ángel Herranz Trillo proposed openstack/nova master: Fix type error on call to mount device  https://review.opendev.org/65978018:54
mriedemi seem to remember why they needed to look up by name at the time, and it was libvirt-specific18:54
sean-k-mooneyya im looking at it again18:54
sean-k-mooneyits because we can have libvirt domains for instances that nova created but that have since been deleted from the db18:55
sean-k-mooneyso to destroy the running guest they need to use the libvirt domain name18:55
mriedemhaving said all that, i'm not opposed to starting with something that could eventually be implemented by other drivers18:55
mriedemas long as it's graceful about other drivers not implementing the necessary interface18:56
sean-k-mooneyya, it handles the not-implemented exception correctly in the manager18:56
sean-k-mooneyim thinking of asking them to use the uuid instead of the name, however18:56
sean-k-mooneythe instance uuid is set in the libvirt domains uuid field18:57
mriedemyeah uuid is ideal if we can use it18:57
sean-k-mooneybut i think libvirt allows us to delete by uuid18:58
sean-k-mooneywe might just be pushing the translation into the driver18:58
sean-k-mooneyim also wondering how to deal with the fact we could be leaking plugged interfaces18:59
*** maciejjozefczyk has quit IRC19:02
mriedemheh,19:03
mriedemwell we could be leaking all sorts of things19:04
mriedemstorage connections19:04
sean-k-mooneywill undefining the domain delete the root disk?19:04
sean-k-mooneyor other disks19:04
mriedemi doubt it19:05
sean-k-mooneyif we are reaping the orphan vms that were created by nova but are deleted in the db, we really should be cleaning up all their local resources; the current patch does not attempt to do that19:05
mriedemotherwise we wouldn't need separate calls during driver.destroy to cleanup the volumes via brick and unplug vifs via os-vif19:06
mriedemat some point this could also be cyborg devices and such couldn't it?19:06
sean-k-mooneyya that is a good point19:06
sean-k-mooneyya19:06
sean-k-mooneyso if we are going to reap these we need to try and clean up as many of the resources as we can, or just support powering down the instance but not reaping it?19:07
openstackgerritMerged openstack/nova-specs master: grammar fix for show-server-numa-topology spec  https://review.opendev.org/66748719:07
sean-k-mooneyif we just delete the domain, the operator has nothing to go on to figure out what they need to clean up manually19:07
sean-k-mooneyits tricky, however, as from the domain we dont know what nova created or not, but i think we should at least try to do some cleanup19:09
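A rough sketch of the cleanup ordering this part of the discussion argues for when reaping an orphan guest; dom.destroy() and dom.undefine() are real libvirt domain calls, while disconnect_volume and unplug_vif are hypothetical stand-ins for the os-brick and os-vif teardown steps.

    def reap_orphan(dom, block_devices, vifs):
        dom.destroy()                  # stop the running guest
        for bdm in block_devices:
            disconnect_volume(bdm)     # hypothetical: os-brick storage teardown
        for vif in vifs:
            unplug_vif(vif)            # hypothetical: os-vif interface unplug
        dom.undefine()                 # only now drop the libvirt domain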
*** dklyle has quit IRC19:09
sean-k-mooneyanyway im going to get something to eat19:12
sean-k-mooneyo/19:13
*** ralonsoh has quit IRC19:17
*** phughk has quit IRC19:18
mriedemthere is meta in the domain that tells us if nova created it19:19
sean-k-mooneyyes there is19:19
mriedemfor libvirt anyway19:19
sean-k-mooneywhich i asked yonglihe to check when deleting the domain19:20
sean-k-mooneyin the libvirt case that is19:20
efriedmriedem: +1 on the ML note, thanks for that19:20
mriedemnp19:21
sean-k-mooneythat is what https://review.opendev.org/#/c/648912/14/nova/virt/libvirt/driver.py@9695 is doing, and its used to filter the domains here https://review.opendev.org/#/c/627765/28/nova/compute/manager.py@888619:22
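A short sketch of that metadata check; the namespace URI matches what nova's libvirt driver writes into the domain XML, though the XML handling here is stripped down.

    import xml.etree.ElementTree as ET

    NOVA_NS = 'http://openstack.org/xmlns/libvirt/nova/1.0'

    def created_by_nova(dom):
        # a domain nova created carries a <nova:instance> metadata element
        tree = ET.fromstring(dom.XMLDesc(0))
        return tree.find('.//{%s}instance' % NOVA_NS) is not None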
*** whoami-rajat has quit IRC19:22
*** maciejjozefczyk has joined #openstack-nova19:31
*** dklyle has joined #openstack-nova19:36
*** panda has quit IRC19:39
*** panda has joined #openstack-nova19:40
*** altlogbot_1 has quit IRC19:45
*** altlogbot_2 has joined #openstack-nova19:47
*** maciejjozefczyk has quit IRC19:50
*** tbachman has quit IRC19:55
efriedmriedem: took me three days to go through all the specs, but http://lists.openstack.org/pipermail/openstack-discuss/2019-June/007381.html19:59
*** tbachman has joined #openstack-nova19:59
*** mriedem has quit IRC20:04
*** mriedem has joined #openstack-nova20:07
mriedemefried: huh https://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewedby:self seems to not work properly, it's only showing me https://review.opendev.org/#/c/631154/ but i've clearly commented on https://review.opendev.org/#/c/648686/ as well20:10
mriedemmaybe reviewedby is only on the latest PS?20:10
mriedemaha20:10
mriedemhttps://review.opendev.org/#/q/project:openstack/nova-specs+status:open+path:%255Especs/train/approved/.*+reviewer:self20:10
mriedemreviewer:self, not reviewedby:self20:11
efriedwhoops, thanks20:14
*** altlogbot_2 has quit IRC20:15
*** eharney has quit IRC20:16
*** altlogbot_2 has joined #openstack-nova20:19
*** altlogbot_2 has quit IRC20:43
*** altlogbot_2 has joined #openstack-nova20:45
*** Bidwe_jay has quit IRC20:59
*** altlogbot_2 has quit IRC21:00
*** altlogbot_3 has joined #openstack-nova21:05
*** pcaruana has quit IRC21:05
*** ivve has quit IRC21:07
*** cfriesen has quit IRC21:24
melwittmnaser, imacdonn: hi, as responders to the ML thread awhile back, I have a spec up for showing server status as UNKNOWN if host status is UNKNOWN that has been receiving some reviews. your reviews would be helpful for deciding whether it goes forward https://review.opendev.org/66618121:31
*** cfriesen has joined #openstack-nova21:31
mriedemdansmith: you may want to drop your +2 to a +1 or 0 https://review.opendev.org/#/c/457886/ until i get the ceph job results on it21:33
openstackgerritMiguel Ángel Herranz Trillo proposed openstack/nova master: Fix type error on call to mount device  https://review.opendev.org/65978021:41
*** panda has quit IRC21:42
mriedemdansmith: nvm, lyarwood already had a patch up to test that21:44
*** panda has joined #openstack-nova21:45
*** takashin has joined #openstack-nova21:50
*** rcernin has joined #openstack-nova22:00
*** mlavalle has quit IRC22:09
*** xek__ has quit IRC22:10
*** shilpasd has quit IRC22:11
*** eharney has joined #openstack-nova22:12
mriedemefried: are you ok with me just pushing up this test change and +2ing? https://review.opendev.org/#/c/659780/3/nova/tests/unit/virt/disk/mount/test_nbd.py22:14
efried...22:14
efriedmriedem: yes, and I'll +A22:15
*** luksky has quit IRC22:16
openstackgerritMatt Riedemann proposed openstack/nova master: Fix type error on call to mount device  https://review.opendev.org/65978022:16
mriedemdone22:17
efried+A22:18
*** rcernin has quit IRC22:19
*** rcernin has joined #openstack-nova22:20
mnasermelwitt: left a comment thanks :D22:25
melwittthanks22:43
*** tkajinam has joined #openstack-nova23:05
*** threestrands has joined #openstack-nova23:15
*** igordc has quit IRC23:25
*** threestrands has quit IRC23:29
*** mriedem has quit IRC23:42
*** hongbin has quit IRC23:43
*** slaweq has quit IRC23:50
*** icarusfactor has quit IRC23:51
