Wednesday, 2018-12-19

*** wolverineav has joined #openstack-nova00:01
*** brault has joined #openstack-nova00:03
*** wolverineav has quit IRC00:05
*** jmlowe has quit IRC00:07
*** brault has quit IRC00:07
*** itlinux has joined #openstack-nova00:08
openstackgerritHongbin Lu proposed openstack/nova-specs master: [WIP] Support scheduling VM's NICs to different PFs  https://review.openstack.org/62605500:19
*** hongbin has quit IRC00:21
openstackgerritsean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/60511600:24
*** mriedem has quit IRC00:27
*** itlinux has quit IRC00:28
*** _alastor_ has joined #openstack-nova00:28
*** itlinux has joined #openstack-nova00:28
*** itlinux has quit IRC00:29
*** macza has quit IRC00:32
*** mlavalle has quit IRC00:32
*** fragatina has joined #openstack-nova00:40
*** sapd1 has joined #openstack-nova00:45
*** ileixe has joined #openstack-nova00:50
*** ileixe has quit IRC00:51
*** ileixe has joined #openstack-nova00:53
openstackgerritMatt Riedemann proposed openstack/nova master: Remove "API Service Version" upgrade check  https://review.openstack.org/61534800:54
openstackgerritMatt Riedemann proposed openstack/nova master: Drop old service version check compat from _delete_while_booting  https://review.openstack.org/62358900:54
*** gyee has quit IRC00:55
*** itlinux has joined #openstack-nova01:12
*** wolverineav has joined #openstack-nova01:18
*** sapd1 has quit IRC01:20
*** wolverineav has quit IRC01:23
*** igordc has quit IRC01:29
*** _alastor_ has quit IRC01:34
*** tiendc has joined #openstack-nova01:34
*** igordc has joined #openstack-nova01:38
*** igordc has quit IRC01:42
*** colby_ has quit IRC01:45
openstackgerritzhaodan7597 proposed openstack/nova master: fix a bug, when creating a vmware instance from a volume,and it goes to error state, the volume still in  "in use" state.  https://review.openstack.org/57111201:47
*** tetsuro has quit IRC01:47
openstackgerritzhaodan7597 proposed openstack/nova master: Bug description: when creating a vmware instance from a volume is failed, vm goes to error state, the volume is in "in use" state,  and after deleting the vm, the state of the volume is still in  "in use" and can't be deleted.  https://review.openstack.org/57111201:54
openstackgerritMerged openstack/nova master: Make [cinder]/catalog_info no longer require a service_name  https://review.openstack.org/62073801:59
*** sapd1 has joined #openstack-nova02:04
*** Dinesh_Bhor has joined #openstack-nova02:08
*** sapd1 has quit IRC02:09
*** cfriesen has quit IRC02:12
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when power off an bfv wmware instance failed  https://review.openstack.org/57111202:15
*** colby_ has joined #openstack-nova02:17
*** igordc has joined #openstack-nova02:22
*** mhen has quit IRC02:22
*** mhen has joined #openstack-nova02:23
*** Dinesh_Bhor has quit IRC02:23
*** wolverineav has joined #openstack-nova02:33
*** wolverineav has quit IRC02:34
*** wolverin_ has joined #openstack-nova02:34
*** Dinesh_Bhor has joined #openstack-nova02:39
*** itlinux has quit IRC02:40
*** igordc has quit IRC02:43
*** psachin has joined #openstack-nova02:58
*** mrsoul has quit IRC03:00
*** brinzhang has joined #openstack-nova03:04
*** igordc has joined #openstack-nova03:19
*** Bhujay has joined #openstack-nova03:20
*** tbachman has quit IRC03:20
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when powering off a wmware instance  https://review.openstack.org/57111203:30
*** tbachman has joined #openstack-nova03:30
*** Bhujay has quit IRC03:38
*** erlon has quit IRC03:44
*** Dinesh_Bhor has quit IRC03:46
openstackgerritBrin Zhang proposed openstack/nova-specs master: Support admin to specify project to create snapshot  https://review.openstack.org/61684303:46
*** hongbin has joined #openstack-nova03:51
*** igordc has quit IRC03:52
*** dklyle has quit IRC04:02
*** david-lyle has joined #openstack-nova04:02
*** brault has joined #openstack-nova04:05
*** wolverin_ has quit IRC04:06
*** sapd1 has joined #openstack-nova04:07
*** takashin has joined #openstack-nova04:09
*** brault has quit IRC04:09
*** wolverineav has joined #openstack-nova04:13
*** wolverineav has quit IRC04:13
gmanngibi: i replied on this, please check if that makes sense - https://review.openstack.org/#/c/625002/04:13
*** slaweq has quit IRC04:13
*** udesale has joined #openstack-nova04:14
*** wolverineav has joined #openstack-nova04:14
*** macza has joined #openstack-nova04:17
*** wolverineav has quit IRC04:18
*** Bhujay has joined #openstack-nova04:33
openstackgerritGhanshyam Mann proposed openstack/nova master: Use renamed template 'integrated-gate-py3'  https://review.openstack.org/62608804:34
*** janki has joined #openstack-nova04:48
*** vabada has quit IRC05:05
*** evrardjp_ has joined #openstack-nova05:05
*** evrardjp has quit IRC05:07
*** Dinesh_Bhor has joined #openstack-nova05:18
*** ileixe has quit IRC05:25
*** ileixe has joined #openstack-nova05:27
*** hongbin has quit IRC05:40
*** udesale has quit IRC05:45
*** brinzhang has quit IRC05:45
*** udesale has joined #openstack-nova05:49
*** macza has quit IRC05:58
*** evrardjp has joined #openstack-nova06:01
*** evrardjp_ has quit IRC06:03
*** ratailor has joined #openstack-nova06:04
*** ratailor has quit IRC06:04
*** licanwei has joined #openstack-nova06:04
*** ratailor has joined #openstack-nova06:05
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when powering off a wmware instance  https://review.openstack.org/62609506:05
*** ratailor has quit IRC06:06
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when powering off a wmware instance  https://review.openstack.org/62609506:08
*** sridharg has joined #openstack-nova06:09
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when powering off a wmware instance  https://review.openstack.org/57111206:10
*** wolverineav has joined #openstack-nova06:12
openstackgerritTakashi NATSUME proposed openstack/nova master: Adds view builders for keypairs controller  https://review.openstack.org/34728906:18
openstackgerritTakashi NATSUME proposed openstack/nova master: Fix 500 error while passing 4-byte unicode data  https://review.openstack.org/40751406:19
*** wolverineav has quit IRC06:21
*** dims has quit IRC06:27
*** fragatina has quit IRC06:27
*** fragatina has joined #openstack-nova06:28
*** dims has joined #openstack-nova06:28
*** dims has quit IRC06:33
*** dims has joined #openstack-nova06:34
*** mgagne has quit IRC06:35
*** fragatina has quit IRC06:35
*** fragatina has joined #openstack-nova06:36
*** mgagne has joined #openstack-nova06:39
*** ianw is now known as ianw_pto06:43
*** Dinesh_Bhor has quit IRC06:56
*** fragatina has quit IRC07:01
*** fragatina has joined #openstack-nova07:01
*** liuyulong has joined #openstack-nova07:07
*** Dinesh_Bhor has joined #openstack-nova07:10
*** brault has joined #openstack-nova07:12
*** brault has quit IRC07:14
*** wolverineav has joined #openstack-nova07:20
*** wolverineav has quit IRC07:20
*** wolverineav has joined #openstack-nova07:20
*** ratailor has joined #openstack-nova07:21
*** dpawlik has joined #openstack-nova07:21
*** dpawlik has quit IRC07:25
*** vabada has joined #openstack-nova07:27
*** ileixe has quit IRC07:29
*** imacdonn has quit IRC07:29
*** imacdonn has joined #openstack-nova07:29
openstackgerritzhaodan7597 proposed openstack/nova master: Catch the not found exception when powering off a vmware instance  https://review.openstack.org/57111207:31
*** dpawlik has joined #openstack-nova07:36
*** dpawlik has quit IRC07:36
*** dpawlik_ has joined #openstack-nova07:36
*** ccamacho has joined #openstack-nova07:40
*** moshele has joined #openstack-nova07:42
*** wolverineav has quit IRC07:46
*** sapd1 has quit IRC07:49
*** slaweq has joined #openstack-nova08:00
*** takashin has left #openstack-nova08:00
*** Dinesh_Bhor has quit IRC08:03
*** pcaruana has joined #openstack-nova08:03
openstackgerritYongli He proposed openstack/nova-specs master: add 'show-server-group' spec  https://review.openstack.org/61225508:05
*** sapd1 has joined #openstack-nova08:06
*** markvoelker has joined #openstack-nova08:10
openstackgerritMartin Midolesov proposed openstack/nova master: vmware:add support for the hw_video_ram image property  https://review.openstack.org/56419308:12
openstackgerritwingwj proposed openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611008:12
*** liuyulong has quit IRC08:15
*** Bhujay has quit IRC08:19
openstackgerritwingwj proposed openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611308:22
*** sahid has joined #openstack-nova08:23
*** helenafm has joined #openstack-nova08:24
*** pcaruana has quit IRC08:24
openstackgerritwingwj proposed openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611008:26
*** pcaruana has joined #openstack-nova08:33
*** bhagyashris has joined #openstack-nova08:34
*** brault has joined #openstack-nova08:40
*** brault has quit IRC08:41
*** pcaruana has quit IRC08:41
openstackgerritMerged openstack/nova master: Remove legacy request spec compat code from API  https://review.openstack.org/61430908:43
gibigmann: hi! regarding https://review.openstack.org/#/c/625002/ do we assume that every dependency-related code change is fully covered by unit and functional tests?08:46
*** jogo has joined #openstack-nova08:46
*** rcernin has quit IRC08:51
*** alexchadin has joined #openstack-nova08:55
*** Bhujay has joined #openstack-nova09:03
*** Bhujay has quit IRC09:04
*** Bhujay has joined #openstack-nova09:05
*** Bhujay has quit IRC09:06
*** Dinesh_Bhor has joined #openstack-nova09:06
*** Bhujay has joined #openstack-nova09:06
*** Bhujay has quit IRC09:07
*** Bhujay has joined #openstack-nova09:08
*** Bhujay has quit IRC09:09
*** Bhujay has joined #openstack-nova09:09
*** rodolof has joined #openstack-nova09:10
*** wolverineav has joined #openstack-nova09:10
*** Bhujay has quit IRC09:10
*** Bhujay has joined #openstack-nova09:11
*** moshele has quit IRC09:13
*** wolverineav has quit IRC09:15
gibigmann: left a reply in https://review.openstack.org/#/c/625002/09:16
*** priteau has joined #openstack-nova09:23
*** Bhujay has quit IRC09:24
*** sapd1 has quit IRC09:34
*** sapd1 has joined #openstack-nova09:35
*** licanwei has quit IRC09:35
*** derekh has joined #openstack-nova09:37
*** rodolof has quit IRC09:39
*** rodolof has joined #openstack-nova09:40
*** brault has joined #openstack-nova09:45
*** Bhujay has joined #openstack-nova09:48
*** Bhujay has quit IRC09:51
*** ttsiouts has joined #openstack-nova09:58
*** bhagyashris has quit IRC10:00
*** maciejjozefczyk has quit IRC10:04
*** maciejjozefczyk has joined #openstack-nova10:05
*** maciejjozefczyk has quit IRC10:06
*** erlon has joined #openstack-nova10:06
*** maciejjozefczyk has joined #openstack-nova10:08
*** maciejjozefczyk has quit IRC10:08
*** Dinesh_Bhor has quit IRC10:09
*** Bhujay has joined #openstack-nova10:11
*** Bhujay has quit IRC10:12
*** Bhujay has joined #openstack-nova10:12
*** Bhujay has quit IRC10:13
*** Bhujay has joined #openstack-nova10:14
*** ttsiouts has quit IRC10:16
*** ttsiouts has joined #openstack-nova10:16
*** erlon_ has joined #openstack-nova10:17
*** erlon has quit IRC10:20
*** ttsiouts has quit IRC10:21
*** Bhujay has quit IRC10:24
*** ttsiouts has joined #openstack-nova10:25
*** psachin has quit IRC10:29
*** yan0s has joined #openstack-nova10:34
*** maciejjozefczyk has joined #openstack-nova10:49
*** maciejjozefczyk has quit IRC10:51
*** rodolof has quit IRC10:51
*** rodolof has joined #openstack-nova10:51
*** maciejjozefczyk has joined #openstack-nova10:54
*** maciejjozefczyk has quit IRC10:57
*** avolkov has joined #openstack-nova11:03
*** rodolof has quit IRC11:03
*** ccamacho has quit IRC11:05
*** maciejjozefczyk has joined #openstack-nova11:05
*** maciejjozefczyk has quit IRC11:07
*** dpawlik_ has quit IRC11:09
*** dpawlik has joined #openstack-nova11:09
*** sapd1 has quit IRC11:10
*** dpawlik has quit IRC11:13
*** udesale has quit IRC11:13
*** dpawlik has joined #openstack-nova11:13
*** ccamacho has joined #openstack-nova11:14
*** dpawlik has quit IRC11:14
*** dpawlik has joined #openstack-nova11:14
*** maciejjozefczyk has joined #openstack-nova11:15
*** dpawlik has quit IRC11:16
*** dpawlik has joined #openstack-nova11:17
*** dpawlik has quit IRC11:17
*** sapd1 has joined #openstack-nova11:17
*** dpawlik has joined #openstack-nova11:17
*** rodolof has joined #openstack-nova11:20
*** Bhujay has joined #openstack-nova11:21
*** ralonsoh has joined #openstack-nova11:27
*** ttsiouts has quit IRC11:27
*** rodolof has quit IRC11:52
*** rodolof has joined #openstack-nova11:53
*** erlon_ has quit IRC11:59
*** tbachman_ has joined #openstack-nova12:02
*** jonher_ has joined #openstack-nova12:05
*** rodolof has quit IRC12:05
*** tbachman has quit IRC12:05
*** tbachman_ is now known as tbachman12:05
*** dpawlik has quit IRC12:06
*** jonher has quit IRC12:08
*** jonher_ is now known as jonher12:08
*** tbachman_ has joined #openstack-nova12:08
*** tbachman has quit IRC12:10
*** tbachman_ is now known as tbachman12:10
*** erlon_ has joined #openstack-nova12:15
*** ttsiouts has joined #openstack-nova12:20
*** ratailor has quit IRC12:23
*** tiendc has quit IRC12:27
*** hogepodge has quit IRC12:29
*** hogepodge has joined #openstack-nova12:30
*** janki has quit IRC12:32
*** wolverineav has joined #openstack-nova12:47
*** dpawlik has joined #openstack-nova12:50
*** wolverineav has quit IRC12:51
*** ttsiouts has quit IRC12:54
*** ttsiouts has joined #openstack-nova12:57
*** sapd1 has quit IRC13:03
*** alex_xu has quit IRC13:03
*** mriedem has joined #openstack-nova13:06
*** dave-mccowan has joined #openstack-nova13:08
mriedemsean-k-mooney: you might be able to triage this https://bugs.launchpad.net/nova/+bug/180909513:10
openstackLaunchpad bug 1809095 in OpenStack Compute (nova) "Wrong representor port was unplugged from OVS during cold migration" [Undecided,New]13:10
*** maciejjozefczyk has quit IRC13:11
sean-k-mooneyi'll take a look. from the title it sounds like it's related to os-vif and mellanox hardware offload13:11
*** maciejjozefczyk has joined #openstack-nova13:12
gibimriedem: I left a hint about the serialization problem in https://review.openstack.org/#/c/582417/6/nova/compute/rpcapi.py@73613:22
mriedemmelwitt: this sounds very similar to a thing you fixed about caching HostState values globally per scheduler worker https://bugs.launchpad.net/nova/+bug/180906113:24
openstackLaunchpad bug 1809061 in OpenStack Compute (nova) "KeyError when booting multi-stagger-instances" [Undecided,New]13:24
*** markvoelker has quit IRC13:24
*** janki has joined #openstack-nova13:26
mriedemgibi: replied, thanks i'll mess with that13:30
*** helenafm has quit IRC13:34
mriedemsean-k-mooney: stephenfin: you may also enjoy https://bugs.launchpad.net/nova/+bug/1809040 but i'm not really sure what to do about it,13:35
openstackLaunchpad bug 1809040 in OpenStack Compute (nova) "pci device lost when error in the configuration file " [Undecided,New]13:35
mriedembasically, they goofed their pci passthrough whitelist config during an upgrade,13:35
mriedemand lost all the pci device inventory that was previously discovered and assigned to a given vm once they rebooted the vm13:35
mriedemmaybe they can cold migrate their way out of it13:36
*** yan0s has quit IRC13:37
mriedempci devices are only assigned during claims right?13:37
*** maciejjozefczyk has quit IRC13:38
mriedemfrickler: you should have your queens/pike releases now13:39
stephenfinmriedem: Just finishing up a rather lengthy email. I'll take a look soon as that's done13:39
stephenfinmriedem: But yeah, only during claims to the best of my recollection13:39
fricklermriedem: yes, I saw the notification on the bug report, thank you13:41
*** helenafm has joined #openstack-nova13:44
mgariepyha, hello :)13:44
*** mmethot has quit IRC13:45
mgariepythe cold migrate might work i guess, i don't tend to migrate vms with pci passthrough tho.. and resize isn't really an option since it only allows you to select a new flavor and not the same old one.13:45
mriedemcold migrate is just resize without a new flavor13:46
mriedemthe pci passthrough whitelist / framework is a fickle beast13:47
*** mmethot has joined #openstack-nova13:47
mriedemgiven the "inventory" in nova is the intersection of what's in the config and what's on the host13:47
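A minimal sketch of the "intersection" mriedem describes, assuming a toy whitelist that matches only on vendor and product id. The names `HostDevice` and `inventory` are hypothetical and this is not nova's actual whitelist code; it only illustrates why a mistyped whitelist entry makes a physically present device vanish from the inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostDevice:
    """A PCI device discovered on the host (hypothetical, simplified)."""
    address: str      # PCI address, e.g. "0000:81:00.0"
    vendor_id: str
    product_id: str


def inventory(whitelist, host_devices):
    """Return the host devices that also match a whitelist entry."""
    allowed = {(w["vendor_id"], w["product_id"]) for w in whitelist}
    return [d for d in host_devices
            if (d.vendor_id, d.product_id) in allowed]


host = [
    HostDevice("0000:81:00.0", "10de", "1b80"),
    HostDevice("0000:82:00.0", "8086", "10fb"),
]
good_whitelist = [{"vendor_id": "10de", "product_id": "1b80"}]
typo_whitelist = [{"vendor_id": "10de", "product_id": "1b81"}]

# With the correct entry the GPU is in the inventory; with the typo the
# inventory is empty even though the hardware never changed.
matched = [d.address for d in inventory(good_whitelist, host)]
lost = inventory(typo_whitelist, host)
```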
*** moshele has joined #openstack-nova13:48
mgariepymy use case is more like: 1 vm / host with all the resources (pci passthrough, ram and cpu), resize doesn't really work in that case.13:50
mgariepyi noticed that the pci devices are re-created in the db, how is the link made to the computes ?13:51
mgariepypci passthrough is fun. it gave me quite a few issues lately.13:51
mriedemso you can't cold migrate the vm because ther are no other available hosts with the same pci device?13:52
mriedem*there13:52
mgariepyno there isn't13:52
*** yan0s has joined #openstack-nova13:52
mriedemhmm, i wonder if you could trick the resize to same host though by creating a private duplicate flavor with some bogus extra spec like foo=bar13:54
mgariepyin nova.pci_devices why isn't it linked to the compute node id?13:54
mriedemyou'd have to of course enable this option https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.allow_resize_to_same_host13:54
mgariepywhen recreating the device13:54
mriedemi'm really not the person that can answer low level questions about how the pci device code works within nova,13:55
mriedemhopefully sean-k-mooney and/or stephenfin could help there13:55
mriedemiow i have to look all of the code up every time i need to investigate it13:56
mriedemthe PciDevice object does have a compute_node_id field13:56
sean-k-mooneymriedem: sorry was reading bug. what is the context13:56
mgariepyas far as i'm concerned, if i have a compute node, and the device is the same at the same pci address, it should be "undeleted" instead of created again.13:56
mgariepysean-k-mooney, https://bugs.launchpad.net/nova/+bug/180904013:57
openstackLaunchpad bug 1809040 in OpenStack Compute (nova) "pci device lost when error in the configuration file " [Undecided,New]13:57
mgariepysean-k-mooney, in nova.pci_devices why isn't it linked to the compute node id?13:58
mgariepyhttps://paste.ubuntu.com/p/Pn76QVmwqr/13:58
mriedemthey are linked to compute node 23 there13:59
sean-k-mooneymgariepy: they are13:59
mgariepyyep, but the issue is that if i remove the passthrough config from nova.conf, it gets deleted_at set, but if I re-add it, it creates a new one.14:00
mgariepyin the same host, same address. etc..14:00
sean-k-mooneyyes14:00
sean-k-mooneythat is expected14:00
sean-k-mooneythe id field is an auto-incrementing field and the uuid is randomly generated if the device is not found in the database14:01
mgariepywouldn't it be better to re-use the old entry if all the info matches?14:01
sean-k-mooneymgariepy: if we have already deleted it no14:02
stephenfinmgariepy: For what it's worth, that also confused me but it is expected14:02
mgariepythe first 2 are the "original" ones, then the other 2 are the new ones created.14:02
*** Bhujay has quit IRC14:02
mriedemit seems that things break down on reboot here https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L535014:02
*** Bhujay has joined #openstack-nova14:02
mriedempci_manager.get_instance_pci_devs(instance) must be returning []14:02
sean-k-mooneymgariepy: we do it this way because you could have pulled the card and installed a different one in the same slot14:02
mgariepyyes.14:02
mgariepythen the product id would have changed14:03
sean-k-mooneythat could break the guest if we just blindly reused it14:03
mriedemand i think that's probably [] because i think instance.pci_devices is set during a resource claim, which doesn't happen on reboot14:03
sean-k-mooneyno the product ids could be the same but if they were nics14:03
*** Bhujay has quit IRC14:03
sean-k-mooneythat were used for pf passthrough the mac would have changed14:03
openstackgerritMerged openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611014:03
*** Bhujay has joined #openstack-nova14:04
mgariepyyes for a nic, that's true.14:04
mgariepythat's my testbed , the other system i have uses graphic card :D14:04
mgariepyhaha14:04
sean-k-mooneyeven for gpus only the vendor id and product id are recorded14:04
sean-k-mooneynot the subvendor id14:04
*** Bhujay has quit IRC14:05
sean-k-mooneyso all GTX 1080s have the same vendor id and product id but an EVGA or asus one has different subvendor ids14:05
*** Bhujay has joined #openstack-nova14:05
sean-k-mooneynow they should be identical but not all such products are.14:05
sean-k-mooneywell actually the clock speed/ram could change14:06
mgariepyyeah but the driver would manage that for you.14:06
*** Bhujay has quit IRC14:06
*** Bhujay has joined #openstack-nova14:07
sean-k-mooneyit may, but the point is if we detect the device was removed we cannot trust that it is the same and can't reuse the entry14:07
sean-k-mooneywe do not recreate them on every reboot14:07
sean-k-mooneywe only do it if the agent does a pci scan and did not find them14:08
mgariepyyeah, i never had issue with reboot before :D haha14:08
mgariepyanyway at least now i know, and i'll be more careful next time.14:09
sean-k-mooneyso out of interest, the device was allocated when it was removed14:10
sean-k-mooneywhat was the state of the vm14:10
stephenfinmriedem: Yeah, I'm not actually sure how else we could resolve that besides a cold migrate/rebuild. It's a mismatch between two sources of truth: what the libvirt driver is finding on the host (based on the whitelist) and what the instance is saying it's using14:10
mgariepyas long as you don't restart it's ok14:11
sean-k-mooneymgariepy: i would have assumed you would have migrated the vm off the host before upgrade, but if you didn't, have you tried doing an openstack server reboot --hard to try and fix the issue14:11
mgariepywhen you restart, the libvirt config generated doesn't have the pci devices but boots anyway14:11
*** Bhujay has quit IRC14:12
sean-k-mooneyright ok then the only way to fix that is likely to shelve and unshelve14:12
sean-k-mooneye.g. to free up the node and its pci device and then recreate the vm with its data on the same node14:13
mgariepythe reboot --hard doesn't work, since i guess the data is pulled from the DB and my "new" device is not allocated14:13
mgariepyyeah.14:13
mgariepyi'll have my client rebuild his cluster.14:13
mgariepysean-k-mooney, i do inplace upgrades, it's not a big deal ;P14:14
sean-k-mooneymgariepy: well you could also manually fix this in the db without too much hassle but i guess14:14
sean-k-mooneyon a larger cluster it may be more complicated however14:14
sean-k-mooneymgariepy: i have done that but usually i create a test env first.14:15
mgariepyi don't like updating the db. sometimes it comes back to bite me ..14:15
*** eharney has joined #openstack-nova14:15
*** munimeha1 has joined #openstack-nova14:15
sean-k-mooneymgariepy: yes it can. it can be less painful than redeploying the cluster however14:15
sean-k-mooneyunless you just meant the application in the cluster.14:16
mgariepyit's a "special" contributed cluster part of another one14:16
sean-k-mooneydeleting the vm and recreating it would have the same effect but i assume this happened on all nodes so you would have to delete all the vms with passthrough and recreate them14:16
mgariepyhe runs some kind of hpc cluster on kubernetes on the vms14:16
sean-k-mooneyso to be clear. you use openstack to spawn 1 vm per host that uses all the host resources then they use kubernetes to run a distributed hpc application on the vms14:17
mgariepyanyway, not a big deal, i'll be more careful next time. i just messed up the nova.conf pci config on the upgrade14:18
mgariepyyep.14:18
mgariepyhaha :D14:18
sean-k-mooneythat seems overly complicated but if it works it works i guess :)14:18
*** maciejjozefczyk has joined #openstack-nova14:18
mgariepyit's shared, and this way the client doesn't really have to deal with the hardware...14:18
mgariepyand has some other benefits like access to some storage and so on.14:19
sean-k-mooneyyep i totally get why you would do it, it's just you have at least 3 layers of orchestration there14:19
mgariepyyep, not all the same person do all 3.14:20
sean-k-mooneyopenstack orchestrating the vms, kubernetes orchestrating the hpc cluster and spark or whatever orchestrating the hpc jobs on the cluster14:20
mgariepyprobably slurm, but i'm not 100% sure.14:21
mgariepyanyway, thanks a lot for your time and help.14:21
*** moshele has quit IRC14:21
sean-k-mooneyif you deploy your openstack with kubernetes you can make it nice and inception14:21
sean-k-mooneyno worries. are you ok with me closing https://bugs.launchpad.net/nova/+bug/180904014:21
openstackLaunchpad bug 1809040 in OpenStack Compute (nova) "pci device lost when error in the configuration file " [Undecided,New]14:21
mgariepythe question is, will I be able to remove the physical server at some point.14:22
mgariepyyep14:22
stephenfinsean-k-mooney: Perhaps we could add a nova-compute start up check to see if there are unrecognized PCI devices attached to running instances and fail to start if so?14:25
stephenfinsean-k-mooney: Though I guess by then the old PCI devs in the manager would have been marked deleted and new ones created14:25
stephenfinUnless we did it realllly early, but that would involve duplicating a lot of logic14:25
stephenfinreally early = before the PCI manager stuff kicks off14:26
sean-k-mooneystephenfin: well i was going to suggest in the bug that if someone wanted to retarget it to allow a hard reboot to fix it then it would be fine14:26
jaypipesjackding: will try my best14:26
mgariepystephenfin, there is already something like that: https://paste.ubuntu.com/p/GVJQqMSTrM/14:26
sean-k-mooneystephenfin: e.g. if you had a vm with a passthrough device in the flavor alias we would revalidate that we have claimed it and fix it on hard reboot14:28
sean-k-mooneyi was going to triage it as wontfix and low priority14:29
mgariepyhave any of you used passthrough and seen: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/180841214:30
openstackLaunchpad bug 1808412 in linux (Ubuntu) "4.15.0 memory allocation issue" [Undecided,Confirmed]14:30
mgariepywhen i start the vm with pci_passthrough, it pre-allocates the ram and gets stuck at some point.14:30
mriedemstephenfin: rebuild won't fix it b/c we don't do a resource claim on rebuild14:30
*** liuyulong has joined #openstack-nova14:31
mgariepyshelve/unshelve works14:32
openstackgerritMatt Riedemann proposed openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611314:35
*** ralonsoh has quit IRC14:38
mriedemyeah that would work, didn't think about that, but that will do a new instance resource claim on unshelve14:38
mriedemhow are you pinning it back to the same host though?14:38
mriedemor is that the only option for that instance?14:38
*** yan0s has quit IRC14:38
mriedems/instance/flavor/14:38
mgariepyi don't have another spot for it haha14:38
mgariepyotherwise migrate should be better14:39
mgariepyas shelve is not shelving ephemeral drive.14:39
*** maciejjozefczyk has quit IRC14:41
mriedemmgariepy: are you ok with https://bugs.launchpad.net/nova/+bug/1809040/comments/4 ?14:42
openstackLaunchpad bug 1809040 in OpenStack Compute (nova) "pci device lost when error in the configuration file " [Undecided,New]14:42
sean-k-mooneysorry had to pop away for a bit. i have changed my mind. i think i will set it to triaged and low instead of wontfix and low, and state that the logic is working as designed but that we should be able to correct the issue with a hard reboot and we should fix that14:42
*** mlavalle has joined #openstack-nova14:42
sean-k-mooneymriedem: mgariepy stephenfin does ^ sound reasonable14:43
*** maciejjozefczyk has joined #openstack-nova14:43
openstackgerritLee Yarwood proposed openstack/nova master: compute: Reject migration requests when source is down  https://review.openstack.org/62348914:43
stephenfinsean-k-mooney: Sounds fair14:44
mriedemavolkov was working a fix at one point https://review.openstack.org/#/c/426243/14:44
stephenfinmriedem: Could you send this docs followup patch on its way? https://review.openstack.org/#/c/614322/14:44
mriedemdone14:45
sean-k-mooneymriedem: that occurred to me as well but i was not sure how involved that would be. e.g. updating the db from the deleted value. also not sure how safe it would be in some cases such as if the device was changed14:46
sean-k-mooneyill link it in the bug for context too14:46
mriedemthere is also https://bugs.launchpad.net/nova/+bug/163312014:46
openstackLaunchpad bug 1633120 in OpenStack Compute (nova) "Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance" [Undecided,Confirmed]14:46
mriedemall the same issue,14:46
mriedemchange pci whitelist and restart nova compute blows away allocated pci devices14:47
sean-k-mooneyyou know when we start tracking these devices in placement we are not going to be able to just blow away the resource provider anymore if there are allocations against it14:48
mriedemthat was my reply on mgariepy's bug,14:48
mriedemlong-term when pci device inventory and allocation is in placement, this shouldn't be a problem,14:48
mriedembecause even if nova-compute tried to remove device inventory, if it's in-use by a consumer (instance) the inventory delete request will fail with a 40914:49
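A toy model of the behaviour mriedem expects once PCI inventory lives in placement: deleting inventory that a consumer still holds is refused with a conflict. The classes `Conflict` and `ToyInventoryStore` are hypothetical illustrations and not the placement API; only the 409-on-in-use idea comes from the discussion above.

```python
class Conflict(Exception):
    """Stands in for an HTTP 409 response (hypothetical)."""
    status = 409


class ToyInventoryStore:
    """In-memory stand-in for inventory plus allocations (not placement)."""

    def __init__(self):
        self.inventory = {}    # resource name -> total units
        self.allocations = {}  # resource name -> set of consumer ids

    def set_inventory(self, resource, total):
        self.inventory[resource] = total

    def allocate(self, resource, consumer):
        self.allocations.setdefault(resource, set()).add(consumer)

    def delete_inventory(self, resource):
        # Refuse to drop inventory that a consumer (instance) still uses.
        if self.allocations.get(resource):
            raise Conflict(f"{resource} is in use")
        self.inventory.pop(resource, None)


store = ToyInventoryStore()
store.set_inventory("pci_0000_81_00_0", 1)
store.allocate("pci_0000_81_00_0", "instance-1")

try:
    store.delete_inventory("pci_0000_81_00_0")
    status = 204
except Conflict as exc:
    status = exc.status
```

In this model a bad whitelist edit followed by a restart could not silently discard a device that an instance still holds, because the delete fails instead.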
mriedemwhere is the code that removes / deletes devices on restart of nova-compute? shouldn't that be smarter and not delete those devices which are assigned to an instance?14:49
*** hongbin has joined #openstack-nova14:50
mriedemas we can see in https://paste.ubuntu.com/p/Pn76QVmwqr/ pci device 8 is allocated to an instance but was deleted anyway14:50
sean-k-mooneymriedem: well we will still need to resource track to handle the pci device assignment aspect and we will need to be able to correlate it back to the rp14:50
*** udesale has joined #openstack-nova14:50
mriedemso in 4 years when pci devices are handled with placement, this will not suck as bad anymore yeah?14:50
mriedemcan't we just not delete allocated pci devices?14:51
sean-k-mooneymriedem: actually i think it will suck more14:51
mriedemobviously that is some kind of referential constraint14:51
sean-k-mooneywe could delay deleting the entry until it is deallocated14:52
sean-k-mooneyif the admin changes the whitelist to prevent a device from being used we would still want to stop new instances from using it14:52
sean-k-mooneybut it would be reasonable to assume that an existing instance could continue to use it14:52
mriedemi don't know how you delay that14:53
mriedemwithout new logic / data modeling on the pci device record to say, "pending delete" or something once it's no longer allocated14:54
mriedembut by then you'd have to see if it's back in the pci whitelist14:54
sean-k-mooneythis is where we figure out the available devices https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L11914:54
mriedemi would think a simple solution is fail nova-compute restart if you tried scrubbing an entry from the whitelist for an allocated device14:54
mriedemjaypipes: have you ever heard of this craziness? ^14:54
*** maciejjozefczyk has quit IRC14:55
* jaypipes reads back14:55
mriedemjaypipes: if you change the pci whitelist config to remove an already allocated device, and restart nova-compute, compute deletes the pci device record that is still allocated to an instance14:55
mriedemif you then add it back to the whitelist and restart, nova-compute creates a new 'available' pci device record and the scheduler can try to use it14:56
sean-k-mooneyso we could just change the if from self.dev_filter.device_assignable(dev) to self.dev_filter.device_assignable(dev) or is_allocated(dev), where is_allocated is a new function14:56
*** idlemind has joined #openstack-nova14:56
mriedemwhich the hypervisor will reject14:56
mriedemyou can also lose the old device assigned to the old instance if you reboot the instance14:56
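Editorial sketch of the failure mode mriedem describes above, as a toy model. PciRecord, sync and the PCI address are hypothetical names chosen for illustration and are not nova's real classes or data; the only behaviour taken from the chat is that a device dropped from the whitelist loses its record even while allocated, and gets a fresh 'available' record once it is whitelisted again.

```python
# Toy model only: these names are hypothetical and are not nova's real classes.
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PciRecord:
    address: str
    status: str = "available"
    instance_uuid: Optional[str] = None


def sync(records: Dict[str, PciRecord], reported: List[str],
         whitelist: Set[str]) -> None:
    """Mimic the behaviour described in the chat: keep only whitelisted devices."""
    assignable = {addr for addr in reported if addr in whitelist}
    # Records for devices that are no longer assignable are dropped,
    # even when they are still allocated to an instance (the bug).
    for addr in list(records):
        if addr not in assignable:
            del records[addr]
    # Newly assignable devices get a fresh 'available' record.
    for addr in assignable:
        records.setdefault(addr, PciRecord(address=addr))


records = {"0000:81:00.1": PciRecord("0000:81:00.1", "allocated", "inst-1")}
hypervisor = ["0000:81:00.1"]

sync(records, hypervisor, whitelist=set())             # removed from whitelist
after_removal = dict(records)
sync(records, hypervisor, whitelist={"0000:81:00.1"})  # added back
print(records["0000:81:00.1"].status)
```

In this model the second restart leaves a record that looks free to the scheduler although the guest still holds the device, which is the state the hypervisor then rejects.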
*** jonher_ has joined #openstack-nova14:56
mriedemsean-k-mooney: maybe? i don't know what devices_json is14:57
mriedemis that from the db or the config?14:57
*** ralonsoh has joined #openstack-nova14:57
*** awaugama has joined #openstack-nova14:57
mriedemgod even finding the object code to see where pci device records are deleted is hard14:58
*** jonher- has joined #openstack-nova14:59
mriedemoh it's in save() i should have known!14:59
*** jonher has quit IRC14:59
*** jonher- is now known as jonher14:59
mriedemhttps://github.com/openstack/nova/blob/master/nova/objects/pci_device.py#L24414:59
sean-k-mooneyso this is suspicious https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L166-L18114:59
mriedemidk, seems like a very easy solution to avoid screwing up the db state would just be raise an exception here https://github.com/openstack/nova/blob/master/nova/objects/pci_device.py#L244 if self.instance_uuid is not None15:01
mriedemi.e. you can't remove/delete an allocated pci device15:01
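Editorial sketch of the guard mriedem suggests: refuse to delete a record whose instance_uuid is set. This is a minimal stand-in under stated assumptions, not nova's PciDevice object; PciDeviceRecord, PciDeviceInUse and the dict used as a database are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class PciDeviceInUse(Exception):
    """Hypothetical exception; not part of nova."""


@dataclass
class PciDeviceRecord:
    address: str
    status: str = "available"
    instance_uuid: Optional[str] = None

    def save(self, db: Dict[str, "PciDeviceRecord"]) -> None:
        # Assumption: saving a record marked 'removed' deletes it, which is
        # how the chat describes the delete path being reached via save().
        if self.status == "removed":
            if self.instance_uuid is not None:
                raise PciDeviceInUse(
                    f"{self.address} is allocated to {self.instance_uuid}")
            db.pop(self.address, None)
            return
        db[self.address] = self


db: Dict[str, PciDeviceRecord] = {}
dev = PciDeviceRecord("0000:81:00.1", status="allocated", instance_uuid="inst-1")
dev.save(db)

dev.status = "removed"
try:
    dev.save(db)
    refused = False
except PciDeviceInUse:
    refused = True
```

Raising here keeps the database consistent but turns a whitelist edit into a hard failure for the caller, which is the trade-off against the softer filtering approach discussed later in the log.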
*** jaypipes has quit IRC15:01
*** erlon_ has quit IRC15:01
*** erlon has joined #openstack-nova15:01
*** jonher_ has quit IRC15:02
sean-k-mooney yes but it looks like   https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L166-L181 was maybe working around that15:02
mriedemthat was added 6 years ago https://github.com/openstack/nova/commit/4855239497050c9ee03fed627c5f41d6b59eddc615:04
mriedemand clearly it sucks15:04
gibimriedem: FYI my wall of text as a review guide for the bandwidth series http://lists.openstack.org/pipermail/openstack-discuss/2018-December/001129.html15:04
sean-k-mooneyyes im reading the original commit currently15:04
*** dpawlik has quit IRC15:04
*** dpawlik has joined #openstack-nova15:05
mriedemgibi: ack thanks15:05
*** dpawlik has quit IRC15:06
*** dpawlik has joined #openstack-nova15:06
sean-k-mooneymriedem: if we simply replace existed.status = 'removed' with "continue" in the except clause that may be enough.15:06
*** jonher_ has joined #openstack-nova15:07
sean-k-mooneymriedem: i lean more and more to it's working how it was intended but what was intended is dumb and we should fix it15:07
mriedemmgariepy: did you see this warning in the nova-compute logs? https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L16915:07
mriedemsean-k-mooney: yeah i don't understand that logic at all15:08
mriedemi guess it's saying the hypervisor is no longer reporting that device so we need to forcefully remove it/15:08
mriedem?15:08
sean-k-mooneymriedem: the original code comment was lost at some point from the set_hvdevs function15:09
sean-k-mooney"Devices should not be hot-plugged when assigned to a guest,15:09
sean-k-mooney        but possibly the hypervisor has no such guarantee. The best15:09
sean-k-mooney        we can do is to give a warning if a device is changed15:09
sean-k-mooney        or removed while assigned.15:09
sean-k-mooney"15:09
mriedemoh but the new_devs are filtered through the whitelist15:09
mriedemin device_assignable15:09
mriedemhttps://github.com/openstack/nova/blob/master/nova/pci/manager.py#L12015:09
mriedemso it's not that the hypervisor is no longer reporting the devices, it's that the whitelist changed15:10
mriedemwhich is exactly the bug15:10
sean-k-mooneyyes15:10
*** jonher has quit IRC15:10
*** jonher_ is now known as jonher15:10
sean-k-mooneyso if we change the filtering to allow allocated devices in addition to the whitelist it will be fine15:10
*** dpawlik has quit IRC15:11
*** alexchadin has quit IRC15:11
*** cfriesen has joined #openstack-nova15:11
sean-k-mooneywhen the device is deallocated from the guest it will be removed from the available devices on the next periodic sync15:11
sean-k-mooneyas it is no longer in the whitelist or allocated15:11
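Editorial sketch of the filtering change sean-k-mooney proposes: keep a device if it is whitelisted or still allocated, so it only disappears on a sync after the guest releases it. The is_allocated helper is the "new function" from the chat, but its signature here, the allocations dict and the addresses are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def is_allocated(addr: str, allocations: Dict[str, Optional[str]]) -> bool:
    """True if the device is currently assigned to an instance (hypothetical helper)."""
    return allocations.get(addr) is not None


def assignable(reported: List[str], whitelist: Set[str],
               allocations: Dict[str, Optional[str]]) -> List[str]:
    # Keep a device if the whitelist allows it OR it is still in use.
    return [a for a in reported
            if a in whitelist or is_allocated(a, allocations)]


reported = ["0000:81:00.1", "0000:81:00.2"]
whitelist = {"0000:81:00.2"}               # .1 was removed by the admin
allocations = {"0000:81:00.1": "inst-1"}   # but .1 is still attached to a guest

while_allocated = assignable(reported, whitelist, allocations)

allocations["0000:81:00.1"] = None         # guest releases the device
after_release = assignable(reported, whitelist, allocations)
```

Because an allocated device cannot be claimed again, new instances are still kept off it while the existing guest carries on using it.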
stephenfinmriedem: Not to distract you now, but did you make a mistake on https://github.com/openstack/nova/commit/cdf8ba5acb ? You've said it fixes https://bugs.launchpad.net/nova/+bug/1784579 but that bug is for live migration, not compute service restart which is what your commit addresses15:12
openstackLaunchpad bug 1784579 in OpenStack Compute (nova) queens "unable to live migrate instance after update to queens" [Medium,Confirmed]15:12
stephenfinmriedem: I ask because I found a similar bug which does deal with the compute service restart https://bugs.launchpad.net/nova/+bug/173837315:13
openstackLaunchpad bug 1738373 in OpenStack Compute (nova) "nova-compute cannot restart if _init_host failed" [Undecided,In progress] - Assigned to Xiao Gong (gongxiao)15:13
mgariepymriedem,  https://paste.ubuntu.com/p/GVJQqMSTrM/15:13
mgariepyyes i did see the warning.15:13
mriedemmgariepy: yup and that matches the instance uuid in https://paste.ubuntu.com/p/Pn76QVmwqr/15:14
mriedemfor the deleted pci device15:14
*** jaypipes has joined #openstack-nova15:15
sean-k-mooneymriedem: mgariepy ill write this up in the bug and i might go fix it. that said im going on vacation today so i might not get to it until january15:15
sean-k-mooneyif others want to work on it in the interim feel free.15:15
mriedemsean-k-mooney: i just did https://bugs.launchpad.net/nova/+bug/163312015:16
openstackLaunchpad bug 1633120 in OpenStack Compute (nova) "Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance" [Undecided,Confirmed]15:16
*** cfriesen has quit IRC15:17
sean-k-mooneymriedem: ok in that case ill close https://bugs.launchpad.net/nova/+bug/1809040 as a duplicate of https://bugs.launchpad.net/nova/+bug/163312015:17
openstackLaunchpad bug 1633120 in OpenStack Compute (nova) "duplicate for #1809040 Nova scheduler tries to assign an already-in-use SRIOV QAT VF to a new instance" [Undecided,Confirmed]15:17
mriedemsean-k-mooney: already did15:17
mriedemstephenfin: yes bug 1784579 is about os-vif port binding failed errors right?15:18
openstackbug 1784579 in OpenStack Compute (nova) queens "unable to live migrate instance after update to queens" [Medium,Confirmed] https://launchpad.net/bugs/178457915:18
jaypipesmriedem: apologies. internet down for last 15 minutes in Sarasota... last thing I got from you was "is that from the db or the config"15:18
sean-k-mooneymriedem: in that case ill pay sean-k-mooney 1 million dollars :)15:18
mriedemjaypipes: oh just this terrible code https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L17715:19
stephenfinmriedem: Yup, but it's to do with live migration and the fix is only for the service startup code path15:19
mriedemsean-k-mooney: if you want, throw up a patch that changes ^ to a continue, fix whatever test hits that and i'll +215:19
stephenfinAt least, assuming I'm reading it right. I'll do some digging but just wanted to sanity check it before I dived down the rabbit hole :)15:19
mriedemstephenfin: the live migration fails because of the port binding failures15:19
jaypipesmriedem: I think you meant this code: https://github.com/openstack/nova/blob/a1dba961f0018a4995d208a290f4a859ce295840/nova/pci/manager.py#L1-L35715:20
mriedemi see what you did there15:20
jaypipesu like that?15:20
sean-k-mooneyjaypipes: your not wronge15:21
mriedemstephenfin: comment15:21
mriedem215:21
mriedem"To summarize, it looks like the pre_live_migration method on the  destination host fails to plug vifs and you end up with the  "binding_failed" error, which is raised and makes the source  live_migration method fail as expected. The failure is on the dest host.  As a result, the info cache is updated with "binding_failed" which  causes the source compute restart to fail here:"15:21
jaypipessean-k-mooney: but you are.15:21
jaypipeswronge that is.15:21
jaypipessean-k-mooney: :P15:21
sean-k-mooneyjaypipes: it is currently more functional than cyborg's pci passthrough support at least ...15:22
mriedemstephenfin: so no i didn't fix the original reason for the port binding failure in pre_live_migration, because that could have been for any number of reasons (neutron agent was down on the dest host?)15:22
mriedemi fixed a symptom of that failure, which was nova-compute failed to restart after that failure15:22
mriedemas the commit message says, "Admittedly this isn't the smartest thing and doesn't attempt15:22
mriedem    to recover / fix the instance networking info"15:22
stephenfinmriedem: I'm missing something. Why make changes to 'ComputeManager.init_host' (via '_init_instance') in that commit? The exception was being seen in the live migration flow15:22
stephenfinahhhhh15:23
mriedem1. live migration fails, port binding failed - that gets saved in the info cache15:23
*** dpawlik has joined #openstack-nova15:23
mriedem2. restart source compute - that blows up because it wasn't handling binding_failed vif types in the os-vif conversion code15:23
mriedemi handle #215:23
mriedem#1 is sort of out of my control15:23
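Editorial sketch of step 2 above: on service start, an instance whose cached vif has type 'binding_failed' is logged and skipped instead of aborting startup. This is not the actual nova patch; convert_vif, init_instances, UnsupportedVifType and the dict shapes are hypothetical, and, as the commit message quoted in the chat admits, nothing here tries to repair the networking info.

```python
import logging
from typing import Dict, List

LOG = logging.getLogger(__name__)


class UnsupportedVifType(Exception):
    """Hypothetical exception for vif types that cannot be converted."""


def convert_vif(vif: Dict[str, str]) -> Dict[str, str]:
    # Assumption: a failed binding is recorded as vif type 'binding_failed'.
    if vif["type"] == "binding_failed":
        raise UnsupportedVifType(vif["type"])
    return {"id": vif["id"], "plugin": vif["type"]}


def init_instances(info_caches: Dict[str, List[Dict[str, str]]]):
    """Convert vifs per instance; skip (do not crash on) broken bindings."""
    plugged = {}
    for instance_uuid, vifs in info_caches.items():
        try:
            plugged[instance_uuid] = [convert_vif(v) for v in vifs]
        except UnsupportedVifType as exc:
            LOG.warning("Skipping vif plug for %s: vif type %s",
                        instance_uuid, exc)
            continue
    return plugged


caches = {
    "inst-ok": [{"id": "p1", "type": "ovs"}],
    "inst-bad": [{"id": "p2", "type": "binding_failed"}],
}
started = init_instances(caches)
```

The same pattern could be applied to other unconvertible vif types, but which types to tolerate is a judgment call for the real code.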
stephenfinYour fix would inadvertently resolve https://bugs.launchpad.net/nova/+bug/1738373 so15:23
openstackLaunchpad bug 1738373 in OpenStack Compute (nova) "nova-compute cannot restart if _init_host failed" [Undecided,In progress] - Assigned to Xiao Gong (gongxiao)15:23
mriedemi mean, we probably shouldn't be saving off busted port binding information when pre_live_migration fails,15:24
mriedemsince that overwrites the previously good port binding information from the source host15:24
mriedemi would have to dig into where we save off the bad port binding information15:25
stephenfinYup, there's a related fix (also for live migration) that you worked on which looks more involved https://bugs.launchpad.net/nova/+bug/178391715:25
openstackLaunchpad bug 1783917 in OpenStack Compute (nova) "live migration fails with NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [High,Fix released] - Assigned to Matt Riedemann (mriedem)15:25
stephenfinWait, wrong link?15:25
* stephenfin checks (ETOOMUCHTABS)15:25
*** yan0s has joined #openstack-nova15:25
mriedem^ was a regression in rocky15:26
*** tbachman has quit IRC15:26
mriedemso i suppose my fix should have been related to bug 178457915:26
openstackbug 1784579 in OpenStack Compute (nova) queens "unable to live migrate instance after update to queens" [Medium,Confirmed] https://launchpad.net/bugs/178457915:26
mriedemnot closes it15:26
stephenfinProbably, yeah15:27
stephenfinOK, there's just a lot of overlap on these and I'm just trying to unravel it. We also have https://bugzilla.redhat.com/show_bug.cgi?id=1578028 (which I'm filing upstream now) which seems related too15:27
openstackbugzilla.redhat.com bug 1578028 in openstack-nova "ovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [Urgent,Assigned] - Assigned to nova-maint15:27
*** dpawlik has quit IRC15:27
stephenfinThat os_vif conversion code could probably do with some tweaks. I'll see what I can do15:28
mriedemlooks like i need to backport https://review.openstack.org/#/c/595317/ as well - i had marked the bug for queens but forgot to keep going i guess15:29
sean-k-mooney stephenfin mriedem fixed that in rocky rc phase15:29
mriedemstephenfin: sorting out where we save off bogus port binding failed information during pre_live_migration is probably worthwhile15:30
mriedemi'm sure there is probably some update db decorator involved that automatically does it15:30
sean-k-mooneymriedem: i think its related to the network update events we get from neutron15:31
mriedemoh yeah that might have been it15:31
stephenfinmriedem: Ack. I just need to get a reproducer. Easier said than done15:31
mriedemsean-k-mooney: b/c we'll get a network-changed event for that i believe15:32
stephenfinsean-k-mooney: I assume you're referring to https://bugs.launchpad.net/nova/+bug/1783917 which I think is different to https://bugzilla.redhat.com/show_bug.cgi?id=157802815:32
openstackLaunchpad bug 1783917 in OpenStack Compute (nova) "live migration fails with NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [High,Fix released] - Assigned to Matt Riedemann (mriedem)15:32
openstackbugzilla.redhat.com bug 1578028 in openstack-nova "ovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [Urgent,Assigned] - Assigned to nova-maint15:32
stephenfinThe latter is an issue all the way back to newton, assuming that BZ information is correct. Couldn't possibly be a Rocky regression15:33
mriedemstephenfin: https://bugzilla.redhat.com/show_bug.cgi?id=1578028 is newton, so yes15:33
mriedemhttps://review.openstack.org/#/c/595317/1/nova/network/os_vif_util.py wouldn't fix that since it's specifically handling vif_type of 'binding_failed'15:33
mriedemnot 'unbound'15:33
sean-k-mooneyunbound is the vif type before we set the host in the binding profile15:34
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Handle binding_failed vif plug errors on compute restart  https://review.openstack.org/62621815:34
stephenfinmriedem: Aye, the dumb fix would be to handle unbound. Reading that bug, sahid suggested refreshing the info_cache but I'm thinking we can't do that on service startup due to the cost?15:34
sean-k-mooneystephenfin: well https://review.openstack.org/#/c/591607/ will help with that15:35
*** ccamacho has quit IRC15:35
mriedemstephenfin: there is a lot of conversation about that in https://review.openstack.org/#/c/587498/15:35
stephenfinmriedem++15:35
mriedemthis https://review.openstack.org/#/c/603844/ would also be related15:35
mriedemas a forced recovery action15:36
*** erlon_ has joined #openstack-nova15:36
mriedemsean correctly pointed out i should have used related/partial bug https://review.openstack.org/#/c/587498/3//COMMIT_MSG@3515:37
mriedemi must have missed that15:37
sean-k-mooneyhttps://bugzilla.redhat.com/show_bug.cgi?id=1645316 needs https://review.openstack.org/#/c/591607/15:37
openstackbugzilla.redhat.com bug 1645316 in openstack-nova "Nova fails to attach both interfaces to VM after hypervisor reboot" [High,On_dev] - Assigned to smooney15:37
mriedemstephenfin: i think the heal conversation is here https://review.openstack.org/#/c/587498/1/nova/compute/manager.py@95615:38
mriedemi have to come back on https://review.openstack.org/#/c/591607/15:38
mriedemtoo much gd stuff going on15:38
mriedemsean-k-mooney: are you working on that pci device bogus removal patch?15:38
mriedemor should i?15:38
stephenfinmriedem: Yeah, sorry. Let me draft a bug report and a patch and we can discuss later. As you were15:39
sean-k-mooneyam i just booted up my sriov env so ill make the change now and see if that works15:39
mriedemstephenfin: bug report for the unbound thing? yeah i guess just refer to these other 20 bugs for binding_failed and it's the same issue15:39
*** erlon has quit IRC15:39
stephenfinYeah, maybe I should just make the existing bug more generic and re-open it15:40
mriedemstephenfin: btw, PS1 of my patch would have handled unbound https://review.openstack.org/#/c/587498/1/nova/network/os_vif_util.py15:40
mriedemsince it raised a new UnsupportedVifTypeConversion exception for anything we can't convert15:40
*** fragatina has quit IRC15:41
mriedemand handled in init_host https://review.openstack.org/#/c/587498/1/nova/compute/manager.py15:41
mriedemeric convinced me to do something more targeted15:41
stephenfinmriedem: Any idea why you changed?15:41
stephenfinah15:41
mriedemhttps://review.openstack.org/#/c/587498/1/nova/network/os_vif_util.py15:42
mriedemso your fix is just do the same pattern but for vif_type=unbound15:42
openstackgerritMartin Midolesov proposed openstack/nova master: vmware:add support for the hw_video_ram image property  https://review.openstack.org/56419315:42
stephenfinyup15:42
*** cfriesen has joined #openstack-nova15:42
canori01mriedem: Is it possible to force the resize of an instance to the same host?  I have allow_resize_to_same_host, but it seems like it still tries to pick different hosts15:43
mriedemstephenfin: btw it goes back to newton https://review.openstack.org/#/c/350595/15:43
mriedemcanori01: i know you can force a cold migrate to a specific host with a newer microversion but i'm not sure if that applies to resize as well15:44
stephenfinbackporting fun \o/15:44
mriedemhttps://developer.openstack.org/api-ref/compute/?expanded=migrate-server-migrate-action-detail#migrate-server-migrate-action15:44
mriedemcanori01: ^ host param was added in 2.5615:44
* stephenfin has no idea how mriedem keeps all that context/these conversation threads in his head15:45
mriedemcanori01: ah looks like that won't work for cold migrate at least https://github.com/openstack/nova/blob/master/nova/compute/api.py#L354215:45
mriedemand you can't specify a host for resize https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L80615:46
mriedemhttps://developer.openstack.org/api-ref/compute/?expanded=migrate-server-migrate-action-detail,resize-server-resize-action-detail#resize-server-resize-action15:46
*** jmlowe has joined #openstack-nova15:47
mriedemcanori01: so the answer is no, and the scheduler is likely picking another host because one is available even though you could use the original host, but maybe weighers or something is picking the other host15:47
mriedemcanori01: or the source host is filtered out because of bug https://bugs.launchpad.net/nova/+bug/179020415:48
openstackLaunchpad bug 1790204 in OpenStack Compute (nova) "Allocations are "doubled up" on same host resize even though there is only 1 server on the host" [Medium,Triaged]15:48
*** brault has quit IRC15:48
openstackgerritBalazs Gibizer proposed openstack/nova master: Create RequestGroup from neutron port  https://review.openstack.org/62594115:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Include requested_resources to allocation candidate query  https://review.openstack.org/62594215:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Transfer port.resource_request to the scheduler  https://review.openstack.org/56726815:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Extend RequestGroup object for mapping  https://review.openstack.org/61952715:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Calculate RequestGroup resource provider mapping  https://review.openstack.org/61623915:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Fill the RequestGroup mapping during schedule  https://review.openstack.org/61952815:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Pass resource provider mapping to neutronv2 api  https://review.openstack.org/61624015:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Recalculate request group - RP mapping during re-schedule  https://review.openstack.org/61952915:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Send RP uuid in the port binding  https://review.openstack.org/56945915:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Test boot with more ports with bandwidth request  https://review.openstack.org/57331715:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Reject interface attach with QoS aware port  https://review.openstack.org/57007815:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Reject networks with QoS policy  https://review.openstack.org/57007915:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Remove port allocation during detach  https://review.openstack.org/62242115:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Refactor PortResourceRequestBasedSchedulingTestBase  https://review.openstack.org/62408015:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Record requester in the InstancePCIRequest  https://review.openstack.org/62531015:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Add pf_interface_name tag to passthrough_whitelist  https://review.openstack.org/62531115:49
openstackgerritBalazs Gibizer proposed openstack/nova master: Ensure that bandwidth and VF are from the same PF  https://review.openstack.org/62354315:49
sean-k-mooneydamn that's a long patch chain15:49
gibisean-k-mooney: sorry, it is "just" 17 patches long :)15:50
sean-k-mooneygibi: no complaint from me. its much better than one big one.15:51
gibisean-k-mooney: and I have to do a rebase soon as the second half is in merge conflict :/15:51
sean-k-mooneyim also looking forward to this feature too15:51
gibisean-k-mooney: yeah, I tried to make small steps15:51
gibisean-k-mooney: I've just posted a mail to the ML about the status and some summary about the implementation if you like long mails :)15:51
ShilpaSDgibi: Hi15:51
gibiShilpaSD: hi15:52
mriedemgibi: are you going to kill me? https://review.openstack.org/#/c/625942/215:52
ShilpaSDgibi: one question, for notification, can we have multi-valued for driver in configuration file?15:52
gibimriedem: you are right, so I'm not going to kill anybody :)15:52
gibiShilpaSD: that is coming from oslo, let me dig the doc for it15:53
ShilpaSDgibi: Are you talking about https://docs.openstack.org/oslo.messaging/latest/reference/notifier.html15:54
gibiShilpaSD: https://docs.openstack.org/oslo.messaging/ocata/opts.html#oslo_messaging_notifications.driver15:54
gibiyes15:54
mriedemfunny the config options are not in the oslo.messaging docs15:55
gibiit says multi-valued15:55
mriedembnemec: ^15:55
mriedemooo but they're in our docs https://docs.openstack.org/nova/latest/configuration/config.html#oslo-messaging-notifications15:55
*** Bhujay has joined #openstack-nova15:57
*** macza has joined #openstack-nova15:57
gibimriedem: I think it is in the official too https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_notifications.driver15:58
*** Bhujay has quit IRC15:58
*** Bhujay has joined #openstack-nova15:58
*** Bhujay has quit IRC15:59
*** Bhujay has joined #openstack-nova16:00
*** cdent has joined #openstack-nova16:01
*** ccamacho has joined #openstack-nova16:01
*** Bhujay has quit IRC16:01
*** fragatina has joined #openstack-nova16:01
*** macza has quit IRC16:01
melwittmriedem: thanks, taking a look16:01
*** Bhujay has joined #openstack-nova16:01
mriedemgibi: ah yes good call16:02
ShilpaSDgibi: thanks, actually i want this config option to be set for masakari for notifications as Default:'', so looking at how to declare this in conf16:02
*** Bhujay has quit IRC16:02
*** Bhujay has joined #openstack-nova16:03
mriedem[oslo_messaging_notifications]/driver=x16:03
mriedemsee http://logs.openstack.org/44/603844/11/check/tempest-full/ee8b609/controller/logs/etc/nova/nova_conf.txt.gz16:04
*** Bhujay has quit IRC16:04
mriedemif you want multi-valued, you just specify additional driver entries16:04
*** Bhujay has joined #openstack-nova16:04
*** Bhujay has quit IRC16:05
*** Bhujay has joined #openstack-nova16:06
ShilpaSDmriedem: means as comma separated?16:06
*** Bhujay has quit IRC16:07
*** Bhujay has joined #openstack-nova16:07
*** Bhujay has quit IRC16:08
nicolasbockHi, good morning. I have a quick (hopefully) question: I am trying to find all key pairs. `openstack keypair list` only shows me keypairs associated with the current user. So I started digging through the Nova DB and stumbled upon a `key_pairs` table, which mysteriously is empty though. Where are those keypairs stored?16:09
*** Bhujay has joined #openstack-nova16:09
nicolasbockThanks already!16:09
*** _alastor_ has joined #openstack-nova16:10
*** Bhujay has quit IRC16:10
*** Bhujay has joined #openstack-nova16:10
*** Bhujay has quit IRC16:11
*** Bhujay has joined #openstack-nova16:12
sahidnicolasbock: are you sure to use the right database?16:13
mriedemShilpaSD: no i think separate lines16:13
mriedemListOpt is comma-separated16:13
mriedemMultiOpt is multiple entries for the same key16:13
nicolasbocksahid: Yes, that thought occurred to me as well16:13
mriedemthey are similar16:13
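Editorial sketch of the difference mriedem describes: a multi-valued option repeats the same key on separate lines, while a list option puts comma-separated values on one line. The parser below is a toy written for this note and is not oslo.config; the [example] section and its names key are hypothetical, and the driver values are only sample strings.

```python
from collections import defaultdict


def parse(text):
    """Collect every value seen for each 'section/key' (toy parser, not oslo.config)."""
    values = defaultdict(list)
    section = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, _, value = line.partition("=")
        values[f"{section}/{key.strip()}"].append(value.strip())
    return values


CONF_TEXT = """
[oslo_messaging_notifications]
# multi-valued style: one entry per line, same key repeated
driver = messagingv2
driver = log

[example]
# hypothetical list-style option, shown only for contrast
names = a,b,c
"""

parsed = parse(CONF_TEXT)
multi = parsed["oslo_messaging_notifications/driver"]
as_list = parsed["example/names"][-1].split(",")
```

Both forms end up as a list of strings; only the way they are written in the file differs.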
*** Bhujay has quit IRC16:13
nicolasbockI'll double check16:13
ShilpaSDmriedem: ok, thanks16:13
mriedemnicolasbock: key pairs are in the api db16:13
*** Bhujay has joined #openstack-nova16:13
mriedemkey pair info per instance is in the instance_extra table in the cell db16:14
nicolasbockThanks mriedem !16:14
mriedemstephenfin: sean-k-mooney; btw, i tried to summarize some stuff on that port binding failed bug https://bugs.launchpad.net/nova/+bug/1784579/comments/1316:14
openstackLaunchpad bug 1784579 in OpenStack Compute (nova) queens "unable to live migrate instance after update to queens" [Medium,In progress] - Assigned to Matt Riedemann (mriedem)16:14
*** Bhujay has quit IRC16:14
sean-k-mooneymriedem: so i made the continue change and there were no unit test failures with tox -e py27 -- "pci|PCI|hvdevs|update_devices_from_hypervisor_resources" so im going to write a new one16:15
*** moshele has joined #openstack-nova16:15
stephenfinmriedem: :D https://bugs.launchpad.net/nova/+bug/1784579/comments/1416:15
*** Bhujay has joined #openstack-nova16:15
sean-k-mooneyill run the full set to be sure but im guessing there was no test code16:15
*** itlinux has joined #openstack-nova16:15
stephenfinProbably should have agreed on who was doing that. Oh well16:15
mriedemsean-k-mooney: i'm not at all surprised there was missing test coverage for that code16:16
*** Bhujay has quit IRC16:16
openstackgerritStephen Finucane proposed openstack/nova master: Handle unbound vif plug errors on compute restart  https://review.openstack.org/62622816:16
*** mdbooth has quit IRC16:16
*** Bhujay has joined #openstack-nova16:16
stephenfinmriedem: There's the fix for the latest issue16:16
sean-k-mooneymriedem: looks like there are some tests but none that assert that behavior16:17
*** Bhujay has quit IRC16:17
sahidnicolasbock: i just checked on my devstack i can list the keypairs, i used "nova_api" database16:18
*** Bhujay has joined #openstack-nova16:18
*** moshele has quit IRC16:19
*** Bhujay has quit IRC16:19
*** Bhujay has joined #openstack-nova16:19
melwittjackding: review runways are only for approved blueprint implementations, not spec reviews unfortunately (please see instructions on the etherpad), so I'm removing the specs from the queue FYI16:20
mriedemstephenfin: comments inline16:20
*** Bhujay has quit IRC16:20
nicolasbocksahid: Thanks, I found them!16:21
*** Bhujay has joined #openstack-nova16:21
nicolasbockI was looking in the `nova` database before16:21
nicolasbockThanks for the help sahid and mriedem16:21
sean-k-mooneymelwitt: for blueprints/specs we are meant to list them for open discussion in the nova team meeting instead, right, to highlight them16:21
mriedemmelwitt: before adding https://blueprints.launchpad.net/nova/+spec/handling-down-cell back into the runway, we should probably know if tssurya is even around16:22
mriedembecause i don't think she is and there are -1s on the api change16:22
mriedemso there isn't much point in it being in a runway slot16:22
melwittmriedem: oh, right16:22
*** Bhujay has quit IRC16:22
*** Bhujay has joined #openstack-nova16:22
*** wolverineav has joined #openstack-nova16:23
*** Bhujay has quit IRC16:23
canori01mriedem: so pinning my flavors to the az's like you suggested yesterday worked fine. So instances don't leave their hypervisor's az if I give them a flavor that associates them to a host aggregate. My situation for the boot volumes is that they are ceph rbd backed. However, each az is backed  by a different ceph cluster (because we didn't want a ceph problem in one az to affect the others).16:24
canori01Problem I had is that on resize operations, the scheduler sometimes picked a host in another az and the resize would subsequently fail because that host can't access the rbd volume if it's in a different az16:24
*** Bhujay has joined #openstack-nova16:24
melwittsean-k-mooney: yeah, that's a way to get visibility by putting them on open discussion agenda16:24
canori01So while the flavor pinning works, I'm wondering how come it doesn't honor the OS-EXT-AZ:availability_zone of the instance when resizing16:24
*** eharney has quit IRC16:24
*** Bhujay has quit IRC16:25
*** Bhujay has joined #openstack-nova16:25
*** dpawlik has joined #openstack-nova16:26
*** moshele has joined #openstack-nova16:26
*** Bhujay has quit IRC16:26
*** Bhujay has joined #openstack-nova16:27
*** wolverineav has quit IRC16:27
sean-k-mooneyCardoe: does the instance have an az set16:27
mriedemcanori01: as i said before, if the instance is not created with a specific az, the scheduler does not restrict it to that az16:27
sean-k-mooneyCardoe: sorry that was for canori0116:28
openstackgerritStephen Finucane proposed openstack/nova master: Handle unbound vif plug errors on compute restart  https://review.openstack.org/62622816:28
mriedemyou could set https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_schedule_zone to force a default az, but you might not want that16:28
stephenfinmriedem: Thanks. Addressed16:28
*** Bhujay has quit IRC16:28
*** Bhujay has joined #openstack-nova16:28
mriedemcanori01: alternatively, if each volume is in a specific az, you could set https://docs.openstack.org/nova/latest/configuration/config.html#cinder.cross_az_attach to false and then would need https://review.openstack.org/#/c/469675/ to proxy the volume az to the instance during server create16:29
mriedemcross_az_attach=false means the server and root volume have to be in the same az16:29
*** Bhujay has quit IRC16:29
*** Bhujay has joined #openstack-nova16:30
mriedemhaving said that, https://review.openstack.org/#/c/469675/ is kind of fugly and i would like to work on an alternative fix that is less tightly coupled down that stack of code, but haven't found the time16:30
sean-k-mooneymriedem: does that rely on the cinder az matching the nova az16:30
mriedemsean-k-mooney: yes16:30
*** dpawlik has quit IRC16:30
mriedemas can be seen here, it's extremely easy to break cross_az_attach=false today https://review.openstack.org/#/c/467674/16:31
*** Bhujay has quit IRC16:31
*** janki has quit IRC16:31
*** gyee has joined #openstack-nova16:31
mriedemsorrison at nectar is the only deployer i know personally (target uses it also) that uses cross_az_attach and he said their users are just required to always specify an az when creating a server16:31
mriedemb/c of bug 169484416:31
openstackbug 1694844 in OpenStack Compute (nova) "Boot from volume fails when cross_az_attach=False and volume is provided to nova without an AZ for the instance" [Medium,In progress] https://launchpad.net/bugs/1694844 - Assigned to Matt Riedemann (mriedem)16:31
*** Bhujay has joined #openstack-nova16:31
sean-k-mooneyright16:32
sean-k-mooneyi assume we dont have an api policy/config option to enforce that16:32
canori01mriedem: is the  OS-EXT-AZ:availability_zone attribute on the instance different than what the scheduler looks at? When, instantiate from horizon and choose any availability zone, that gets set with that of the host. So if I were to actually choose an az at that time, then it would honor the az when moving/resizing?16:32
*** Bhujay has quit IRC16:32
*** Bhujay has joined #openstack-nova16:33
sean-k-mooneyi think that is just the az it is currently scheduled to but i think the scheduler looks at the request spec for the az when picking hosts16:33
melwittmriedem: I agree with your assessment on that bug you linked. the trace shows "(self.host_state_map[host] for host in seen_nodes)" noting self.host_state_map which is before my fix landed, but also noting that's different than what the code was _before_ my fix landed as well. the previous code was "(self.host_state_map[host] for host in seen_nodes if host in self.host_state_map)" which would also avoid the KeyError, which reminds me,16:33
melwittof another fix that they also don't have, judging from that trace https://github.com/openstack/nova/commit/d72b33b986525a9b2c7aa08b609ae386de1d0e8916:34
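The difference between the two generator expressions quoted above can be sketched as follows. This is a minimal illustration, not the actual nova host manager code: the dictionary contents, node names, and the function names `unguarded` and `guarded` are assumptions made up for the example.

```python
# Hypothetical stand-ins for the scheduler's cached state; not real nova data.
host_state_map = {"node1": "state-1", "node2": "state-2"}
seen_nodes = ["node1", "node2", "node3"]  # node3 has no cached state


def unguarded(state_map, nodes):
    # Mirrors the expression seen in the trace: a node missing from the
    # map raises KeyError when the generator is consumed.
    return (state_map[host] for host in nodes)


def guarded(state_map, nodes):
    # Mirrors the older expression with the membership check: nodes that
    # are missing from the map are skipped instead of raising.
    return (state_map[host] for host in nodes if host in state_map)
```

Consuming `guarded(host_state_map, seen_nodes)` yields only the two known states, while consuming `unguarded(...)` fails on the missing node.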
*** Bhujay has quit IRC16:34
mriedemah yeah16:34
*** macza has joined #openstack-nova16:34
*** Bhujay has joined #openstack-nova16:34
mriedemok maybe they just reported their version incorrectly16:34
mriedemsean-k-mooney: api policy/config option for what?16:35
mriedem[cinder]/cross_az_attach is read in the api16:35
*** udesale has quit IRC16:35
mriedemcanori01: "When I instantiate from horizon and choose any availability zone, that gets set to that of the host. So if I were to actually choose an az at that time, then it would honor the az when moving/resizing?" correct16:35
*** Bhujay has quit IRC16:35
mriedemcanori01: "is the  OS-EXT-AZ:availability_zone attribute on the instance different than what the scheduler looks at? " also correct16:36
*** Bhujay has joined #openstack-nova16:36
mriedemcanori01: the instance.availability_zone field in the db is set to whatever az its compute host is in16:36
mriedemand instance.availability_zone gets changed as the instance moves around16:36
mriedemthe scheduler looks at RequestSpec.availability_zone, which is the thing the user requested when they created the server16:37
*** yan0s has quit IRC16:37
mriedemso request_spec.az is immutable, but instance.az is not16:37
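The distinction described above (the request spec keeps what the user asked for, while the instance field tracks the current host) might be modelled roughly like this. It is a sketch under stated assumptions: `RequestSpecSketch`, `InstanceSketch`, `move_instance`, and `host_passes_az_filter` are hypothetical names, not nova objects or filters.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RequestSpecSketch:
    # What the user requested at server create time; None means no AZ given.
    availability_zone: Optional[str]


@dataclass
class InstanceSketch:
    host: str
    # Reflects the AZ of whatever host the instance is on right now.
    availability_zone: str


def move_instance(instance: InstanceSketch, dest_host: str,
                  host_to_az: Dict[str, str]) -> None:
    # Moving the instance changes its host and, with it, the recorded AZ.
    instance.host = dest_host
    instance.availability_zone = host_to_az[dest_host]


def host_passes_az_filter(spec: RequestSpecSketch, host: str,
                          host_to_az: Dict[str, str]) -> bool:
    # The scheduler consults the request spec, not the instance field.
    if spec.availability_zone is None:
        return True
    return host_to_az[host] == spec.availability_zone
```

With no AZ in the request spec, any host passes and the instance's AZ simply follows its host; with an AZ in the request spec, moves stay inside that AZ.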
*** Bhujay has quit IRC16:37
mriedemhow we decide which az to use if the instance.host is in multiple azs....idk16:37
*** Bhujay has joined #openstack-nova16:37
mriedem^ is a jaypipes feature16:38
mriedemoh wait the compute host can be in multiple aggregates but only 1 az right?16:38
mriedemi always forget the rules16:38
*** macza_ has joined #openstack-nova16:38
*** macza has quit IRC16:38
*** Bhujay has quit IRC16:38
*** Bhujay has joined #openstack-nova16:39
mriedemyeah https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#host-aggregates-and-availability-zones "Availability zones are different from host aggregates in that they are explicitly exposed to the user, and hosts can only be in a single availability zone. Administrators can configure a default availability zone where instances will be scheduled when the user fails to specify one."16:39
mriedemcanori01: so, you could setup a default ceph pool with a default az for users that don't specify a specific az and flavor that isn't tied to a given az16:39
mriedemgets kind of hard to manage after awhile probably16:40
*** Bhujay has quit IRC16:40
openstackgerritMerged openstack/nova master: Address nits on I08991796aaced2abc824f608108c0c786181eb65  https://review.openstack.org/61432216:40
canori01mriedem: Perfect, thanks!16:40
*** Bhujay has joined #openstack-nova16:40
canori01sean-k-mooney: thank you as well16:40
*** jistr has quit IRC16:41
*** Bhujay has quit IRC16:41
*** jistr has joined #openstack-nova16:42
*** Bhujay has joined #openstack-nova16:42
*** dpawlik has joined #openstack-nova16:42
*** Bhujay has quit IRC16:43
*** Bhujay has joined #openstack-nova16:43
mriedemstephenfin: +216:43
stephenfin\o/16:44
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Handle binding_failed vif plug errors on compute restart  https://review.openstack.org/62636116:44
*** Bhujay has quit IRC16:44
*** Bhujay has joined #openstack-nova16:45
jaypipesmriedem: ask bauzas.16:45
mriedemi think bauzas is on permanent PTO16:45
*** Bhujay has quit IRC16:46
*** jistr_ has joined #openstack-nova16:46
*** dpawlik has quit IRC16:46
*** Bhujay has joined #openstack-nova16:46
*** jistr has quit IRC16:47
*** Bhujay has quit IRC16:47
*** Bhujay has joined #openstack-nova16:48
*** helenafm has quit IRC16:49
*** Bhujay has quit IRC16:49
*** Bhujay has joined #openstack-nova16:49
mriedemseriously though if anyone knows if bauzas is out the rest of the year, or what, it would be nice to know since i thought he was done with downstream fires for awhile16:50
*** Bhujay has quit IRC16:50
bauzasmriedem: I literally have 2 days left :(16:50
mriedemoh so you have been around16:51
bauzasmriedem: but I'll commit myself on upstream reviews16:51
*** Bhujay has joined #openstack-nova16:51
*** moshele has quit IRC16:51
bauzasand upstream revision of the placement spec I have16:51
mriedemwell i can give you a bunch of specs to just blindly approve then16:51
*** jistr_ has quit IRC16:51
bauzasmriedem: that's reasonable, I'm just discussing with internal folks about begging time for upstream before I leave16:51
*** jistr has joined #openstack-nova16:52
bauzasI'll just throw my downstream firehose for the next 2 days16:52
mriedembauzas: in berlin you told me you were good to go for upstream again?16:52
bauzasmriedem: I was *thinking* to16:52
*** Bhujay has quit IRC16:52
*** Bhujay has joined #openstack-nova16:52
bauzasmriedem: but then someone left us, and more customers are using our OSP12/OSP13 codebase that runs placement :)16:52
bauzaswhich makes me dragged16:53
mriedemalright well here is a list: https://review.openstack.org/#/c/393930/ https://review.openstack.org/#/c/612531/ https://review.openstack.org/#/c/616037/ https://review.openstack.org/#/c/609779/ https://review.openstack.org/#/c/603352/16:53
*** Bhujay has quit IRC16:53
mriedemmelwitt: weren't you also working on a list of specs that looked like they could use some attention before the freeze?16:54
*** Bhujay has joined #openstack-nova16:54
melwittmriedem: yes, sent it out like a minute ago16:54
mriedemin general, i need reviews on the cross cell resize spec from people not named dan since he's been the only one16:54
mriedemand it's a hairy gd monster and if others aren't going to review it it's DOA for stein16:54
mriedembauzas: thoughts on my email yesterday about per-instance live migration timeouts would also be nice16:55
melwittyeah, I know :( I'm reviewing it today16:55
*** Bhujay has quit IRC16:55
bauzasmriedem: ack16:55
mriedemcfriesen: you might chime in on http://lists.openstack.org/pipermail/openstack-discuss/2018-December/001112.html as well16:55
*** Bhujay has joined #openstack-nova16:55
melwittmriedem: feel free to add notes and specs that are on your radar that I missed https://etherpad.openstack.org/p/nova-stein-blueprint-spec-freeze16:56
mriedemwill do16:56
*** Bhujay has quit IRC16:56
*** Bhujay has joined #openstack-nova16:57
*** dpawlik has joined #openstack-nova16:58
*** Bhujay has quit IRC16:58
*** Bhujay has joined #openstack-nova16:58
*** Bhujay has quit IRC16:59
*** Bhujay has joined #openstack-nova17:00
*** Bhujay has quit IRC17:01
*** Bhujay has joined #openstack-nova17:01
*** Bhujay has quit IRC17:02
*** dpawlik has quit IRC17:03
*** Bhujay has joined #openstack-nova17:03
bauzasmelwitt: thanks for the etherpad17:03
*** Bhujay has quit IRC17:04
*** Bhujay has joined #openstack-nova17:04
*** Bhujay has quit IRC17:05
*** Bhujay has joined #openstack-nova17:06
*** Bhujay has quit IRC17:07
*** Bhujay has joined #openstack-nova17:07
*** Bhujay has quit IRC17:08
*** Bhujay has joined #openstack-nova17:09
*** Bhujay has quit IRC17:10
*** Bhujay has joined #openstack-nova17:10
cfriesenmriedem: will take a look17:11
*** Bhujay has quit IRC17:11
*** Bhujay has joined #openstack-nova17:12
*** Bhujay has quit IRC17:13
*** Bhujay has joined #openstack-nova17:13
*** moshele has joined #openstack-nova17:14
*** ttsiouts has quit IRC17:14
*** Bhujay has quit IRC17:14
*** Bhujay has joined #openstack-nova17:15
*** Bhujay has quit IRC17:16
*** Bhujay has joined #openstack-nova17:16
*** Bhujay has quit IRC17:17
*** Bhujay has joined #openstack-nova17:18
*** Bhujay has quit IRC17:19
*** Bhujay has joined #openstack-nova17:19
mriedemtl;dr are the compromises worthwhile to move forward17:20
*** Bhujay has quit IRC17:20
*** Bhujay has joined #openstack-nova17:21
*** Bhujay has quit IRC17:22
*** Bhujay has joined #openstack-nova17:22
*** Bhujay has quit IRC17:23
*** Bhujay has joined #openstack-nova17:24
*** Bhujay has quit IRC17:25
*** Bhujay has joined #openstack-nova17:25
stephenfinBhujay: What is going on with your IRC connection?17:26
*** Bhujay has quit IRC17:26
*** Bhujay has joined #openstack-nova17:27
openstackgerritMerged openstack/nova master: Address nits on I1f1fa1d0f79bec5a4101e03bc2d43ba581dd35a0  https://review.openstack.org/61432317:27
openstackgerritMerged openstack/nova master: Fix a broken-link in nova doc  https://review.openstack.org/62611317:27
*** Bhujay has quit IRC17:28
*** Bhujay has joined #openstack-nova17:28
*** Bhujay has quit IRC17:29
*** Bhujay has joined #openstack-nova17:30
*** Bhujay has quit IRC17:31
*** Bhujay has joined #openstack-nova17:31
openstackgerritMatt Riedemann proposed openstack/nova stable/ocata: Handle binding_failed vif plug errors on compute restart  https://review.openstack.org/62636917:32
mriedemhooray for ocata em ^17:32
*** Bhujay has quit IRC17:32
*** Bhujay has joined #openstack-nova17:33
*** Bhujay has quit IRC17:34
stephenfinmelwitt: Seeing as you looked at the earlier change, fancy taking a look at https://review.openstack.org/#/c/626228 ?17:34
*** Bhujay has joined #openstack-nova17:34
melwittstephenfin: sure, always up for being pinged for reviews17:35
stephenfinmriedem: Thanks for reviewing that nit patch (y)17:35
*** Bhujay has quit IRC17:35
mriedemthe docs one? i didn't really, just saw gibi was +2 and it was a rebase17:36
mriedembut yw :)17:36
*** Bhujay has joined #openstack-nova17:36
*** Bhujay has quit IRC17:37
*** Bhujay has joined #openstack-nova17:37
*** Bhujay has quit IRC17:38
*** Bhujay has joined #openstack-nova17:39
openstackgerritJack Ding proposed openstack/nova-specs master: Select cpu model from a list of cpu models  https://review.openstack.org/62095917:39
*** fragatina has quit IRC17:39
*** Bhujay has quit IRC17:40
*** Bhujay has joined #openstack-nova17:40
*** Bhujay has quit IRC17:41
*** Bhujay has joined #openstack-nova17:42
*** Bhujay has quit IRC17:43
*** Bhujay has joined #openstack-nova17:43
*** derekh has quit IRC17:44
*** Bhujay has quit IRC17:44
*** Bhujay has joined #openstack-nova17:45
*** Bhujay has quit IRC17:46
*** Bhujay has joined #openstack-nova17:46
*** Bhujay has quit IRC17:47
*** dpawlik has joined #openstack-nova17:48
*** Bhujay has joined #openstack-nova17:48
*** Bhujay has quit IRC17:49
*** Bhujay has joined #openstack-nova17:49
*** Bhujay has quit IRC17:50
*** Bhujay has joined #openstack-nova17:51
*** Bhujay has quit IRC17:52
*** Bhujay has joined #openstack-nova17:52
*** dpawlik has quit IRC17:53
*** Bhujay has quit IRC17:53
*** Bhujay has joined #openstack-nova17:54
*** Bhujay has quit IRC17:55
*** Bhujay has joined #openstack-nova17:55
*** Bhujay has quit IRC17:56
openstackgerritChris Dent proposed openstack/nova master: Redirect user/placement to placement docs  https://review.openstack.org/62633317:57
*** Bhujay has joined #openstack-nova17:57
*** Bhujay has quit IRC17:58
*** Bhujay has joined #openstack-nova17:58
*** Bhujay has quit IRC17:59
*** Bhujay has joined #openstack-nova18:00
openstackgerritKrzysztof Opasiak proposed openstack/nova master: Fix server IPs with non-unique network names  https://review.openstack.org/62537118:02
cfriesenstephenfin: any chance you could take a look at the cpu models spec proposed by Jack ^ ?  Basically instead of setting one model in nova.conf the operator could specify a list, and the virt driver would use the first one that provides the requested cpu features.18:02
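The selection rule summarized above (use the first configured model that provides the requested features) could look something like the sketch below. The function name `pick_cpu_model`, the model names, and the feature names are placeholders invented for illustration; the spec under review may define the behavior differently.

```python
from typing import Dict, Iterable, Optional, Sequence


def pick_cpu_model(configured_models: Sequence[str],
                   model_features: Dict[str, Iterable[str]],
                   requested_features: Iterable[str]) -> Optional[str]:
    """Return the first configured model that provides every requested feature.

    The order of configured_models is the operator's preference order.
    Returns None when no configured model satisfies the request.
    """
    wanted = set(requested_features)
    for model in configured_models:
        if wanted <= set(model_features.get(model, ())):
            return model
    return None


# Placeholder data: model and feature names are not real CPU models or flags.
MODELS = ["model-a", "model-b"]
FEATURES = {"model-a": {"feat-x"}, "model-b": {"feat-x", "feat-y"}}
```

Here a request for `feat-x` alone picks `model-a` because it comes first in the list, while a request that also needs `feat-y` falls through to `model-b`.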
openstackgerritKrzysztof Opasiak proposed openstack/nova master: Fix server IPs with non-unique network names  https://review.openstack.org/62537118:02
*** dpawlik has joined #openstack-nova18:04
*** Bhujay has quit IRC18:05
*** dpawlik has quit IRC18:09
*** cezary_zukowski has quit IRC18:11
*** wolverineav has joined #openstack-nova18:12
*** wolverineav has quit IRC18:12
*** wolverineav has joined #openstack-nova18:12
*** sahid has quit IRC18:13
*** wolverineav has quit IRC18:13
*** cdent has quit IRC18:15
melwittmriedem: I wanted to bring this review to your attention, bug about returning build requests when a marker is specified (I know you love paginating stuff). I'm +2 on it https://review.openstack.org/62487018:16
*** wolverineav has joined #openstack-nova18:19
openstackgerritMerged openstack/nova master: Remove legacy RequestSpec compat code from live migrate task  https://review.openstack.org/62570518:22
*** amodi has quit IRC18:25
*** amodi has joined #openstack-nova18:26
*** sridharg has quit IRC18:32
openstackgerritTim Rozet proposed openstack/nova master: Fixes race condition with privsep utime  https://review.openstack.org/62574118:39
*** moshele has quit IRC18:46
*** avolkov has quit IRC18:53
mriedemi saw it before, asked andrey to flesh out the commit message, haven't been back18:57
*** moshele has joined #openstack-nova19:06
*** moshele has quit IRC19:12
melwittah, ok19:16
*** tbachman has joined #openstack-nova19:24
*** dpawlik has joined #openstack-nova19:25
*** tbachman has quit IRC19:26
*** moshele has joined #openstack-nova19:27
mnaserfriendly bump on this - https://review.openstack.org/#/c/619352/19:28
mnasersimple backport, the changes in the newer branches have merged too19:28
mriedemfrickler: fyi redo of the queens release https://review.openstack.org/62637719:29
*** dpawlik has quit IRC19:30
mriedemduplicate bug of https://review.openstack.org/#/c/567701/ just came through triage, the fix is straight-forward, the patch is mostly a functional test19:31
*** gouthamr_ is now known as gouthamr19:38
*** dpawlik has joined #openstack-nova19:41
*** dpawlik has quit IRC19:46
*** brault has joined #openstack-nova19:47
*** brault has quit IRC19:51
*** erlon_ has quit IRC19:55
*** wolverineav has quit IRC19:58
openstackgerritsean mooney proposed openstack/nova master: PCI: do not force remove allcoated devices  https://review.openstack.org/62638119:58
sean-k-mooneymriedem: i have no idea why my unit test is not working in ^19:59
sean-k-mooneyim going to grab dinner but if you have any insight let me know.20:00
mriedemack thanks20:01
*** david-lyle has quit IRC20:04
*** moshele has quit IRC20:07
*** markvoelker has joined #openstack-nova20:14
*** dklyle has joined #openstack-nova20:15
*** markvoelker has quit IRC20:19
*** wolverineav has joined #openstack-nova20:30
*** jmlowe has quit IRC20:31
openstackgerritMatt Riedemann proposed openstack/nova master: Document using service user tokens for long running operations  https://review.openstack.org/62638820:33
*** wolverineav has quit IRC20:34
openstackgerritJack Ding proposed openstack/nova-specs master: Select cpu model from a list of cpu models  https://review.openstack.org/62095920:35
melwittmriedem: re: the ML thread about that, I thought the oslo.messaging heart beat would take care of the long running live migration problem?20:39
mriedemthe problem isn't rpc20:40
*** priteau has quit IRC20:40
melwittoh, the token auth expiring20:40
mriedemnova tries to make a rest api request using the users token to cinder,20:40
mriedemthe token has timed out20:40
melwittI see, ok20:40
melwittyeah, have to have both then. I got the two confused together but they are two different issues20:41
mriedemi want to say i heard anecdotes at one point that rax public cloud had 24 hour token timeouts because of stuff like this way back when20:41
mriedemthe service user token stuff was added by osic, which was rax+intel20:41
melwittyeah, sounds familiar. I feel like we had something similar at yahoo too20:43
melwitt*something similar to service user auth20:44
*** wolverineav has joined #openstack-nova20:45
*** wolverineav has quit IRC20:50
*** wolverineav has joined #openstack-nova20:57
mriedemlong_rpc_timeout probably also deserves some mention somewhere in troubleshooting admin docs, but not sure right now,20:58
mriedemin general i've had random thoughts about things that would be good to put into a 'scaling issues' page in the docs20:58
openstackgerritKrzysztof Opasiak proposed openstack/nova master: Fix server IPs with non-unique network names  https://review.openstack.org/62537120:58
mriedembut haven't started anything20:58
melwitt++20:59
*** wolverineav has quit IRC21:09
*** wolverineav has joined #openstack-nova21:09
*** wolverineav has quit IRC21:14
mriedemmelwitt: i'm going to fix that unnecessary for loop in https://review.openstack.org/#/c/624870/ that Kevin pointed out, then approve21:15
melwittok, sounds good21:15
*** brault has joined #openstack-nova21:19
*** brault has quit IRC21:23
openstackgerritMatt Riedemann proposed openstack/nova master: Exclude build request marker from server listing  https://review.openstack.org/62487021:30
melwittmriedem: so the func test in this change doesn't fail without the change https://review.openstack.org/567701 is that expected based on the commit message? if so, is there no way to demonstrate the bug in the test?21:41
melwittI wasn't sure based on the wording "that is not a regression"21:42
*** dpawlik has joined #openstack-nova21:42
mriedembeen awhile, but the commit message is saying I8d426f2635232ffc4b510548a905794ca88d7f99 didn't introduce a regression21:43
melwittok, so unrelated to what I'm seeing I think. basically, without the change, somehow AZ is being updated on the instance. I don't yet know how21:43
mriedemi'll have to poke at it, i wrote that in may21:44
*** takashin has joined #openstack-nova21:45
* melwitt nods21:45
*** wolverineav has joined #openstack-nova21:46
*** dpawlik has quit IRC21:46
*** awaugama has quit IRC21:50
*** wolverineav has quit IRC21:51
*** wolverineav has joined #openstack-nova21:55
*** wolverineav has quit IRC21:55
*** wolverineav has joined #openstack-nova21:56
*** dpawlik has joined #openstack-nova21:58
melwittlooking at it myself for curiosity, I'm not finding how AZ could be updated without the fix. weird22:00
*** dpawlik has quit IRC22:02
openstackgerritMerged openstack/nova master: Move a generic bridge helper to a linux_net privsep file.  https://review.openstack.org/62001022:05
*** takashin has left #openstack-nova22:06
*** takashin has joined #openstack-nova22:07
*** slaweq has quit IRC22:13
* melwitt prints22:14
mriedemoh yay, for a looong time we passed potentially the wrong image to move claim during a resize https://github.com/openstack/nova/blob/1249617bdfaa8f4c586159374a4a0b244bbb298a/nova/conductor/tasks/migrate.py#L7722:15
mriedembased on the original image used to create the server, but potentially not the last image used to rebuild the server22:15
mriedemgd req spec22:15
*** markvoelker has joined #openstack-nova22:15
*** igordc has joined #openstack-nova22:17
mriedemwhich wasn't fixed until https://github.com/openstack/nova/commit/984dd8ad6add4523d93c7ce5a666a32233e02e34 inadvertently22:17
melwitthoo boy22:17
* mriedem hates request spec22:18
melwittme too :(22:18
*** rcernin has joined #openstack-nova22:19
mriedemoh this also likely means that if you shelve, unshelve, resize, we're passing the original image used to create the server, not the current image meta (in case that changed)22:19
*** ralonsoh has quit IRC22:22
melwittso the AZ is still the original all the way to the end of the live migration. so how is the servers.get API returning the new AZ... the search continues22:23
mriedemi think i know22:25
mriedemthe api code looks up the az from the instance.host22:26
mriedemi think22:26
melwittpulled from the DB, the AZ is still the original in instance.availability_zone22:26
melwittahhhh22:26
melwittmust be, it can't be looking at instance.availability_zone22:26
mriedemthere are other bugs for that behavior22:26
melwittTHANKS API22:26
mriedemhttps://github.com/openstack/nova/blob/master/nova/api/openstack/compute/views/servers.py#L18522:26
melwittle sigh22:27
mriedemhttps://github.com/openstack/nova/blob/master/nova/availability_zones.py#L16522:27
mriedemyeah if instance.host is not None, that code gets the az from the host aggregate, not the instance.az22:27
mriedemala https://review.openstack.org/#/c/582342/22:27
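The lookup behavior described above, where the API reports the AZ of the host's aggregate whenever the instance has a host, can be sketched like this. The function `az_for_api_response`, its arguments, and the `"nova"` fallback value are assumptions for illustration, not the code at the linked lines.

```python
from typing import Dict, Optional


def az_for_api_response(instance_host: Optional[str],
                        instance_az: Optional[str],
                        host_to_aggregate_az: Dict[str, str],
                        default_az: str = "nova") -> Optional[str]:
    # If the instance is on a host, the host's aggregate AZ wins and the
    # value stored on the instance is ignored.
    if instance_host is not None:
        return host_to_aggregate_az.get(instance_host, default_az)
    # Only an instance without a host falls back to its own stored AZ.
    return instance_az
```

This is why a stale AZ on the instance record would not show up in the API response after a live migration: the response follows the destination host.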
melwittwell, that explains it22:28
mriedemi have a f'ing patch for everything22:28
melwittit's true22:28
mriedemhttps://bugs.launchpad.net/nova/+bug/178253922:28
openstackLaunchpad bug 1782539 in OpenStack Compute (nova) "Fail to filter the list of instances by the available zone" [Medium,In progress] - Assigned to huanhongda (hongda)22:28
openstackgerritMatt Riedemann proposed openstack/nova master: Pass request_spec from compute to cell conductor on reschedule  https://review.openstack.org/58241722:33
mriedemspeak-o-the-turd22:33
mriedemmelwitt: so i'll adjust that test to assert based on the db rather than the api22:34
melwittok, makes sense22:34
*** rcernin has quit IRC22:36
melwittnow, to finish reviewing cross-cell-resize22:37
*** rcernin has joined #openstack-nova22:37
*** erlon_ has joined #openstack-nova22:43
*** munimeha1 has quit IRC22:45
*** rcernin has quit IRC22:57
openstackgerritHongbin Lu proposed openstack/nova-specs master: [WIP] Support scheduling VM's NICs to different PFs  https://review.openstack.org/62605522:58
*** rcernin has joined #openstack-nova22:58
*** rcernin has quit IRC22:59
tonybI'm seeing something 'strange' where placement is always selecting the same host out of the $n available. I *suspect* there is cruft in the DB after a bunch of failed boots but I don't really know23:03
tonybaside from the scheduler and placement logs where should I look?23:03
melwitttonyb: in the past, default scheduler behavior was to pack instances, so as long as a host has capacity, it would be returned again. I'm not sure if that's still the case now though23:04
melwittis that what you're seeing or no?23:04
mriedemtonyb: there was a thread in the ops list about this recently, there is an option you can set to randomize the results from placement23:05
mriedemhttps://docs.openstack.org/nova/latest/configuration/config.html#placement.randomize_allocation_candidates23:05
tonybmelwitt: Ahh perhaps that's it23:05
mriedem^ goes in the placement config btw, not nova-scheduler23:05
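A rough sketch of why a limited, deterministically ordered candidate list can keep returning the same hosts, and how randomizing before the limit changes that. The function `pick_candidates` and its parameters are hypothetical; this models the idea behind the option linked above, not placement's implementation.

```python
import random
from typing import List, Optional, Sequence


def pick_candidates(candidates: Sequence[str], limit: int,
                    randomize: bool = False,
                    rng: Optional[random.Random] = None) -> List[str]:
    pool = list(candidates)
    if randomize:
        # Shuffle before truncating so every candidate has a chance of
        # surviving the limit.
        (rng or random).shuffle(pool)
    # Without shuffling, the same leading candidates are returned each time.
    return pool[:limit]
```

With `randomize=False`, repeated calls return the same first `limit` hosts; with `randomize=True`, the returned subset varies between calls.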
tonybmriedem: Okay.  I should have said this is queens but I'll look for packing and/or randomising23:06
tonybThanks23:07
*** rcernin has joined #openstack-nova23:07
mriedemlooky here23:07
mriedemhttps://docs.openstack.org/nova/queens/configuration/config.html#placement.randomize_allocation_candidates23:07
tonybmriedem: Will do.23:08
openstackgerritMatt Riedemann proposed openstack/nova master: Update instance.availability_zone during live migration  https://review.openstack.org/56770123:09
melwittin the olden days, IIRC, lots of operators preferred to maximize cloud utilization and packing is the way to do that. but maybe nowadays people prefer to spread out servers and risk not being able to land larger flavors. that's the history on the default as I understand it23:09
mriedemhuawei public cloud wants to pack like you've never packed before23:09
melwitt:)23:09
mriedemthey want nova to pack even more,23:09
mriedemdoing shit like the solver scheduler and what watcher does,23:09
melwittmoar efficiency. not surprising23:10
mriedemdynamically load balancing for ultimate packitude23:10
*** mlavalle has quit IRC23:13
*** jmlowe has joined #openstack-nova23:16
*** mchlumsky has quit IRC23:18
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Update port device_owner when unshelving  https://review.openstack.org/62640723:20
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Update port device_owner when unshelving  https://review.openstack.org/62640823:22
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Update port device_owner when unshelving  https://review.openstack.org/62640923:29
*** alex_xu has joined #openstack-nova23:32
*** rcernin has quit IRC23:38
*** fragatina has joined #openstack-nova23:38
openstackgerritHongbin Lu proposed openstack/nova-specs master: [WIP] Support scheduling VM's NICs to different PFs  https://review.openstack.org/62605523:40
*** rcernin has joined #openstack-nova23:41
openstackgerritMerged openstack/nova master: Handle unbound vif plug errors on compute restart  https://review.openstack.org/62622823:43
*** NostawRm has quit IRC23:44
mriedemjackding: looked at https://review.openstack.org/#/c/603844/ again, i still don't like it23:45
*** wolverineav has quit IRC23:45
*** wolverineav has joined #openstack-nova23:46
*** erlon_ has quit IRC23:46
openstackgerritMerged openstack/nova master: Update port device_owner when unshelving  https://review.openstack.org/55982823:49
openstackgerritmelanie witt proposed openstack/nova stable/rocky: Handle unbound vif plug errors on compute restart  https://review.openstack.org/62641023:49
*** wolverineav has quit IRC23:50
mriedemtl;dr i don't want to randomly have to hit the neutron API every time we rebuild/reboot to list ports for a server just to check if something is broken23:52
*** wolverineav has joined #openstack-nova23:55
*** slaweq has joined #openstack-nova23:58
*** dpawlik has joined #openstack-nova23:59
