*** wolverineav has joined #openstack-nova | 00:00 | |
*** tetsuro has joined #openstack-nova | 00:04 | |
*** liuyulong is now known as liuyulong|away | 00:05 | |
*** wolverineav has quit IRC | 00:06 | |
openstackgerrit | Merged openstack/nova master: Fix bug preventing forbidden traits from working https://review.openstack.org/648653 | 00:16 |
---|---|---|
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Add a live migration regression test https://review.openstack.org/641200 | 00:17 |
mriedem | oh wow this is pretty bad https://review.openstack.org/#/c/648653/ | 00:18 |
*** takashin has joined #openstack-nova | 00:19 | |
mriedem | efried: so forbidden traits just never worked in nova? ^ | 00:19 |
*** brinzhang has joined #openstack-nova | 00:26 | |
*** hamzy has joined #openstack-nova | 00:28 | |
*** wolverineav has joined #openstack-nova | 00:28 | |
*** wolverineav has quit IRC | 00:58 | |
*** wolverineav has joined #openstack-nova | 01:02 | |
*** wolverineav has quit IRC | 01:06 | |
*** tetsuro_ has joined #openstack-nova | 01:10 | |
*** alex_xu has quit IRC | 01:10 | |
*** lbragstad_ has joined #openstack-nova | 01:10 | |
*** tetsuro has quit IRC | 01:12 | |
*** cfriesen has quit IRC | 01:12 | |
*** lbragstad has quit IRC | 01:12 | |
*** phasespace has quit IRC | 01:12 | |
*** nicholas has quit IRC | 01:12 | |
*** mtreinish has quit IRC | 01:12 | |
*** toabctl has quit IRC | 01:12 | |
*** spotz has quit IRC | 01:12 | |
*** mtreinish has joined #openstack-nova | 01:13 | |
openstackgerrit | Yongli He proposed openstack/nova master: Clean up orphan instances virt driver https://review.openstack.org/648912 | 01:13 |
openstackgerrit | Yongli He proposed openstack/nova master: Clean up orphan instances https://review.openstack.org/627765 | 01:13 |
*** bbowen__ has quit IRC | 01:16 | |
*** wolverineav has joined #openstack-nova | 01:18 | |
*** mmethot has quit IRC | 01:19 | |
*** wolverineav has quit IRC | 01:22 | |
*** dtantsur|afk has quit IRC | 01:24 | |
*** dtantsur has joined #openstack-nova | 01:25 | |
*** igordc has quit IRC | 01:29 | |
*** wolverineav has joined #openstack-nova | 01:39 | |
*** hongbin has joined #openstack-nova | 01:41 | |
*** bbowen__ has joined #openstack-nova | 01:41 | |
*** wolverineav has quit IRC | 01:43 | |
*** tetsuro_ has quit IRC | 01:49 | |
*** whoami-rajat has joined #openstack-nova | 01:57 | |
*** mriedem has quit IRC | 02:07 | |
*** spsurya has joined #openstack-nova | 02:12 | |
*** BjoernT has joined #openstack-nova | 02:27 | |
*** alex_xu has joined #openstack-nova | 02:32 | |
*** BjoernT has quit IRC | 02:58 | |
*** hongbin has quit IRC | 03:00 | |
*** Sundar has quit IRC | 03:02 | |
openstackgerrit | Yongli He proposed openstack/nova master: Clean up orphan instances virt driver https://review.openstack.org/648912 | 03:04 |
openstackgerrit | Yongli He proposed openstack/nova master: Clean up orphan instances https://review.openstack.org/627765 | 03:04 |
*** BjoernT has joined #openstack-nova | 03:05 | |
*** BjoernT has quit IRC | 03:06 | |
*** BjoernT has joined #openstack-nova | 03:09 | |
*** lbragstad_ is now known as lbragstad | 03:14 | |
*** psachin has joined #openstack-nova | 03:16 | |
*** zhubx has joined #openstack-nova | 03:17 | |
*** zhubx has quit IRC | 03:30 | |
*** zhubx has joined #openstack-nova | 03:31 | |
*** nicolasbock has quit IRC | 03:33 | |
*** wolverineav has joined #openstack-nova | 03:40 | |
*** wolverineav has quit IRC | 03:44 | |
*** udesale has joined #openstack-nova | 04:00 | |
*** brinzhang has quit IRC | 04:11 | |
*** brinzhang has joined #openstack-nova | 04:11 | |
*** igordc has joined #openstack-nova | 04:18 | |
*** whoami-rajat has quit IRC | 04:57 | |
*** gbarros has joined #openstack-nova | 05:02 | |
*** BjoernT has quit IRC | 05:11 | |
*** ratailor has joined #openstack-nova | 05:18 | |
*** toabctl has joined #openstack-nova | 05:21 | |
*** abhishekk has joined #openstack-nova | 05:26 | |
*** gbarros has quit IRC | 05:27 | |
*** lbragstad has quit IRC | 05:28 | |
*** krypto has joined #openstack-nova | 05:51 | |
*** aarora06 has joined #openstack-nova | 06:07 | |
*** janki has joined #openstack-nova | 06:17 | |
*** sridharg has joined #openstack-nova | 06:18 | |
*** shilpasd has joined #openstack-nova | 06:20 | |
*** slaweq has joined #openstack-nova | 06:26 | |
*** pcaruana has joined #openstack-nova | 06:36 | |
*** pcaruana has quit IRC | 06:38 | |
*** pcaruana has joined #openstack-nova | 06:38 | |
openstackgerrit | Merged openstack/nova master: Remove flavor id and name validation code https://review.openstack.org/638150 | 06:39 |
*** mdbooth_ has joined #openstack-nova | 06:40 | |
*** mdbooth has quit IRC | 06:43 | |
*** Cardoe has quit IRC | 06:47 | |
*** Cardoe has joined #openstack-nova | 06:47 | |
*** dpawlik has joined #openstack-nova | 06:47 | |
*** krypto has quit IRC | 06:49 | |
*** rpittau|afk is now known as rpittau | 06:50 | |
*** krypto has joined #openstack-nova | 06:50 | |
*** luksky has joined #openstack-nova | 06:50 | |
*** phasespace has joined #openstack-nova | 06:52 | |
*** tkajinam has quit IRC | 06:58 | |
*** tkajinam has joined #openstack-nova | 07:03 | |
*** tosky has joined #openstack-nova | 07:09 | |
*** whoami-rajat has joined #openstack-nova | 07:13 | |
*** awalende has joined #openstack-nova | 07:13 | |
*** tesseract has joined #openstack-nova | 07:16 | |
*** bbobrov has quit IRC | 07:17 | |
*** bbobrov has joined #openstack-nova | 07:17 | |
*** ralonsoh has joined #openstack-nova | 07:18 | |
*** xek has joined #openstack-nova | 07:19 | |
*** jistr is now known as jistr|afk | 07:22 | |
*** tssurya has joined #openstack-nova | 07:32 | |
*** helenafm has joined #openstack-nova | 07:42 | |
awalende | I want to set the numatune memory mode to "preferred" in nova, how do I do that in Queens? | 07:44 |
*** igordc has quit IRC | 08:04 | |
openstackgerrit | Yongli He proposed openstack/nova master: Clean up orphan instances https://review.openstack.org/627765 | 08:06 |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova-specs master: Re-propose the spec to allow specifying a list of CPU models https://review.openstack.org/642030 | 08:11 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add node_uuid field to Destination object https://review.openstack.org/649532 | 08:16 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Pass node uuid to new Destination.node_uuid https://review.openstack.org/649533 | 08:16 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add in_tree field to RequestGroup object https://review.openstack.org/649534 | 08:16 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: node_uuid from RequestSpec to ResourceRequest https://review.openstack.org/649535 | 08:16 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Block swap volume on volumes with >1 rw attachment https://review.openstack.org/572790 | 08:16 |
*** ttsiouts has joined #openstack-nova | 08:18 | |
*** ccamacho has joined #openstack-nova | 08:25 | |
*** priteau has joined #openstack-nova | 08:29 | |
*** takashin has left #openstack-nova | 08:29 | |
*** abhishekk has quit IRC | 08:33 | |
*** owalsh_ is now known as owalsh | 08:43 | |
*** tkajinam has quit IRC | 08:45 | |
*** wolverineav has joined #openstack-nova | 08:45 | |
*** derekh has joined #openstack-nova | 08:46 | |
*** luksky has quit IRC | 08:46 | |
*** wolverineav has quit IRC | 08:50 | |
*** luksky has joined #openstack-nova | 09:18 | |
*** xek has quit IRC | 09:20 | |
*** xek has joined #openstack-nova | 09:21 | |
*** jangutter has quit IRC | 09:21 | |
*** jangutter has joined #openstack-nova | 09:22 | |
*** xek has quit IRC | 09:22 | |
*** shilpasd has quit IRC | 09:22 | |
openstackgerrit | Merged openstack/nova master: De-cruft compute manager live migration https://review.openstack.org/641449 | 09:36 |
*** whoami-rajat has quit IRC | 09:37 | |
*** markvoelker has joined #openstack-nova | 09:55 | |
*** sidx64_ has joined #openstack-nova | 10:02 | |
*** sidx64_ has quit IRC | 10:18 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unreachable codepaths https://review.openstack.org/649559 | 10:19 |
*** sidx64 has joined #openstack-nova | 10:19 | |
*** ttsiouts has quit IRC | 10:22 | |
*** ttsiouts has joined #openstack-nova | 10:23 | |
*** markvoelker has quit IRC | 10:27 | |
*** ttsiouts has quit IRC | 10:27 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Group API versions by release https://review.openstack.org/649560 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Trivial fixes to API version history https://review.openstack.org/649561 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead code https://review.openstack.org/649562 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: hacking: Fix dodgy check https://review.openstack.org/649563 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: zvm: Remove dead code https://review.openstack.org/649564 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead 'ALIAS' constant https://review.openstack.org/649565 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead placement API functions https://review.openstack.org/649566 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove unused constants, functions https://review.openstack.org/649567 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Use '_' for unused variables https://review.openstack.org/649568 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead resource tracker code https://review.openstack.org/649569 | 10:27 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 10:28 |
stephenfin | awalende: You can't. We don't expose that | 10:28 |
stephenfin | awalende: It was planned but never implemented | 10:28 |
*** nicolasbock has joined #openstack-nova | 10:33 | |
*** Dinesh_Bhor has quit IRC | 10:34 | |
*** erlon has joined #openstack-nova | 10:38 | |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova-specs master: Add Secure Boot support for KVM- and QEMU-based guests https://review.openstack.org/506720 | 10:39 |
* kashyap hates the dangling hyphen. But is there a better way to put it? | 10:39 | |
sean-k-mooney | yes just remove the hyphens | 10:40 |
sean-k-mooney | for kvm and qemu based guests | 10:41 |
sean-k-mooney | you can also remove based | 10:41 |
NewBruce | howdy sean-k-mooney | 10:41 |
sean-k-mooney | just looking at your proposal i dont see any reference to the use of the flavor extra specs for controlling secure boot | 10:42 |
sean-k-mooney | NewBruce: hi | 10:42 |
NewBruce | was a bit like christmas last night, waiting to see how the tests would run :) | 10:42 |
NewBruce | shame it didn’t blow up, huh?! ;) | 10:42 |
sean-k-mooney | ya i had the tab open for a while but needed to go sleep | 10:42 |
sean-k-mooney | there are some other things we could try, like forcing a different compute level on either node | 10:43 |
NewBruce | yep, same - we’re gonna roll out rocky across the full compute today and see what that does | 10:43 |
NewBruce | …. well, I was going to, until a net node decided to fall over due to excessive CPU usage in netlink | 10:44 |
kashyap | sean-k-mooney: But the sentence is syntactically correct. (I read about dangling hyphens enough that the above is correct, but ugly :D) | 10:44 |
sean-k-mooney | the grenade job should have been testing queens to rocky migration i think | 10:44 |
NewBruce | ever seen this? | 10:44 |
NewBruce | 2019-04-03T10:30:28.283Z|2158410|poll_loop(handler44)|INFO|Dropped 3426304 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate | 10:44 |
NewBruce | 2019-04-03T10:30:28.283Z|2158411|poll_loop(handler44)|INFO|wakeup due to [POLLIN] on fd 33 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage) | 10:44 |
sean-k-mooney | actually no it would have been rocky to master | 10:44 |
sean-k-mooney | is that from ovs? | 10:44 |
NewBruce | yup | 10:45 |
NewBruce | Absolutely nothing on the machine; freshly rebooted | 10:45 |
NewBruce | PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND | 10:45 |
NewBruce | 1979 root 10 -10 1000172 135296 10912 S 799.3 0.2 1136:19 ovs-vswitchd | 10:45 |
kashyap | sean-k-mooney: Hmm, I mentioned the metadata property (`os_secure_boot`), but didn't write anything about flavor extra_specs | 10:45 |
NewBruce | … ouch; seems like a FD leak perhaps | 10:45 |
kashyap | sean-k-mooney: I'm ill a bit; will be taking the noon off. Will respond to feedback tomm | 10:45 |
sean-k-mooney | kashyap: take it easy. ill see if i can find the relevant place to add it and do a full review | 10:46 |
sean-k-mooney | openstack flavor set FLAVOR-NAME \ | 10:46 |
sean-k-mooney | --property os:secure_boot=SECURE_BOOT_OPTION | 10:46 |
*** wolverineav has joined #openstack-nova | 10:46 | |
sean-k-mooney | values are required|optional|disabled | 10:47 |
sean-k-mooney | NewBruce: perhaps it could be just ovs trying frantically to reconnect to all the tap devices for the vms or something | 10:48 |
sean-k-mooney | i have seen that message or one like it when ovs has been under heavy load but i dont recall it on startup | 10:49 |
kashyap | sean-k-mooney: Yeah, makes sense. Please add your comment on the review. And thanks for the quick feedback | 10:49 |
kashyap | Appreciate your time | 10:49 |
sean-k-mooney | kashyap: its just a mirror of the image property, but it is used so that operators can use the aggregate filter to force only instances with secure boot enabled to be scheduled to a host aggregate for increased security | 10:50 |
sean-k-mooney | at least in hyperv land where this works today | 10:50 |
NewBruce | looks like its got itself into a loop waiting for netlink and just stuck there | 10:51 |
*** wolverineav has quit IRC | 10:51 | |
*** tbachman has quit IRC | 10:51 | |
kashyap | sean-k-mooney: I see. Noted. | 10:52 |
kashyap | sean-k-mooney: The current problem for KVM/QEMU guests, upstream is that it gives a bogus sense of "secure boot" :-( | 10:52 |
kashyap | Because currently in upstream Nova there is (a) no way to configure "SMM"; (b) no way to specify an NVRAM file with enrolled keys; (c) no way to auto-select the _right_ OVMF binary | 10:53 |
kashyap | Anyway, more discussion later :-) | 10:53 |
*** mvkr has quit IRC | 11:01 | |
*** sidx64 has quit IRC | 11:01 | |
*** jistr|afk is now known as jistr | 11:02 | |
*** sidx64_ has joined #openstack-nova | 11:02 | |
*** helenafm has quit IRC | 11:04 | |
*** ttsiouts has joined #openstack-nova | 11:08 | |
*** hoonetorg has joined #openstack-nova | 11:16 | |
*** markvoelker has joined #openstack-nova | 11:24 | |
*** brinzhang has quit IRC | 11:24 | |
*** udesale has quit IRC | 11:26 | |
*** dikonoor has joined #openstack-nova | 11:30 | |
openstackgerrit | sean mooney proposed openstack/nova stable/ocata: PCI: do not force remove allocated devices https://review.openstack.org/635075 | 11:30 |
*** liuyulong|away is now known as liuyulong | 11:30 | |
*** sidx64_ has quit IRC | 11:32 | |
*** ratailor has quit IRC | 11:36 | |
*** rcernin has quit IRC | 11:38 | |
*** aarora06 has quit IRC | 11:45 | |
openstackgerrit | Merged openstack/nova master: Docs: emulator threads: clarify expected behavior https://review.openstack.org/649416 | 11:47 |
*** yan0s has joined #openstack-nova | 11:52 | |
*** mvkr has joined #openstack-nova | 11:54 | |
*** artom has quit IRC | 11:56 | |
*** markvoelker has quit IRC | 11:57 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add node_uuid field to Destination object https://review.openstack.org/649532 | 12:05 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Pass node uuid to new Destination.node_uuid https://review.openstack.org/649533 | 12:05 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: Add in_tree field to RequestGroup object https://review.openstack.org/649534 | 12:05 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: node_uuid from RequestSpec to ResourceRequest https://review.openstack.org/649535 | 12:05 |
*** shilpasd has joined #openstack-nova | 12:06 | |
shilpasd | hi: observed that in a multinode setup, instance data files are created on all nodes (controller + compute in my case) at 'instances_path', why is this so? | 12:08 |
openstackgerrit | Merged openstack/nova master: Style corrections for privsep usage. https://review.openstack.org/648615 | 12:09 |
openstackgerrit | Merged openstack/nova master: Remove mox in unit/network/test_neutronv2.py (4) https://review.openstack.org/574106 | 12:09 |
openstackgerrit | Merged openstack/nova master: Remove mox in unit/network/test_neutronv2.py (5) https://review.openstack.org/574110 | 12:10 |
*** tbachman has joined #openstack-nova | 12:12 | |
*** odyssey4me has quit IRC | 12:12 | |
sean-k-mooney | o/ | 12:18 |
sean-k-mooney | are there any known gate issues related to installing the hacking package? | 12:19 |
sean-k-mooney | i think the failure im seeing was just an intermittent network issue but | 12:19 |
sean-k-mooney | before i recheck https://review.openstack.org/#/c/649409/ i just said i would ask | 12:19 |
*** whoami-rajat has joined #openstack-nova | 12:19 | |
sean-k-mooney | ill just recheck as everything else passed | 12:21 |
sean-k-mooney | melwitt: when https://review.openstack.org/#/c/649409/ merges ill cherry pick it back to stable/stein | 12:22 |
*** artom has joined #openstack-nova | 12:25 | |
sean-k-mooney | tonyb: speaking of stable backport i fixed the indentation in https://review.openstack.org/#/c/635075/2..3 | 12:25 |
*** odyssey4me has joined #openstack-nova | 12:26 | |
*** eharney has quit IRC | 12:33 | |
*** dtantsur is now known as dtantsur|brb | 12:36 | |
*** priteau has quit IRC | 12:36 | |
*** dtantsur|brb is now known as dtantsur | 12:36 | |
*** mmethot has joined #openstack-nova | 12:36 | |
*** janki has quit IRC | 12:38 | |
*** priteau has joined #openstack-nova | 12:38 | |
*** sridharg has quit IRC | 12:38 | |
efried | mriedem: Right (forbidden traits never worked) | 12:38 |
*** helenafm has joined #openstack-nova | 12:39 | |
alex_xu | eandersson: yea, tags are better for just labeling instances, and for searching http://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/tag-instances.html | 12:39 |
sean-k-mooney | efried: it worked in a flavor right just not in an image? | 12:43 |
sean-k-mooney | efried: or is it just totally borked for everything on the nova sise | 12:43 |
sean-k-mooney | side | 12:43 |
efried | sean-k-mooney: it didn't work in flavor, but it was supposed to. | 12:43 |
efried | sean-k-mooney: it doesn't work in image, but we never implemented that, so that part's not surprising. | 12:44 |
sean-k-mooney | oh ok so it only works at the placement api level | 12:44 |
efried | right | 12:44 |
efried | well, now it works from flavor too. | 12:44 |
sean-k-mooney | ah ok | 12:44 |
efried | because fixed | 12:44 |
sean-k-mooney | :) | 12:44 |
efried | probably stein backport-worthy. | 12:44 |
*** priteau has quit IRC | 12:44 | |
efried | but probably not RC-worthy | 12:44 |
efried | not sure | 12:45 |
sean-k-mooney | it was "introduced" in rocky right? | 12:45 |
* efried looks... | 12:45 | |
sean-k-mooney | yep https://github.com/openstack/nova-specs/blob/master/specs/rocky/implemented/placement-forbidden-traits.rst | 12:45 |
efried | sean-k-mooney: The placement side was a year ago. | 12:45 |
efried | trying to find the nova side | 12:46 |
sean-k-mooney | i think it was all covered by the same spec for rocky | 12:47 |
shilpasd | efried: observed that in multinode setup, instance data files created at all nodes (controller + compute in my case) at 'instances_path', is it correct behavior? | 12:47 |
*** wolverineav has joined #openstack-nova | 12:47 | |
efried | sean-k-mooney: https://blueprints.launchpad.net/nova/+spec/forbidden-traits-in-nova different spec | 12:49 |
efried | shilpasd: sounds like a libvirt question. sean-k-mooney or stephenfin, y'all have that answer off the top? | 12:49 |
sean-k-mooney | efried: assuming this was added in rocky then its not an rc candidate since the regression was not in stein, but i would argue the fix should be backported to both stable/stein and stable/rocky as it was just a bug in the original implementation and not a new feature | 12:50 |
*** gbarros has joined #openstack-nova | 12:50 | |
sean-k-mooney | efried: shilpasd i did not answer because i did not know. | 12:50 |
sean-k-mooney | if by instance data path you mean under /var/lib/libvirt* then yes | 12:50 |
efried | sean-k-mooney: https://review.openstack.org/#/c/561677/ <== rocky. So yeah. | 12:51 |
sean-k-mooney | i think that is correct | 12:51 |
shilpasd | efried: sean-k-mooney: okay, will check with stephenfin, 'instances_path = /opt/stack/data/nova/instances' | 12:51 |
openstackgerrit | Matthew Booth proposed openstack/nova master: systemd detection result caching nit fixes https://review.openstack.org/649229 | 12:51 |
sean-k-mooney | efried: so ya not an RC candidate but assuming the fix is non invasive it should be a backport candidate | 12:51 |
sean-k-mooney | shilpasd: is this a devstack setup | 12:52 |
*** wolverineav has quit IRC | 12:52 | |
shilpasd | sean-k-mooney: yes | 12:53 |
sean-k-mooney | if so the /opt/stack/data directory is where we store images and cinder volumes and other persistent data from the openstack services | 12:53 |
sean-k-mooney | so i would not be surprised if the instance root disk is also stored there | 12:53 |
sean-k-mooney | i have never really checked however. | 12:54 |
shilpasd | sean-k-mooney: but in multinode case, it was stored at both node in my case, at controller side and at compute side | 12:54 |
sean-k-mooney | shilpasd: it wont be stored in both locations, it will be copied if you do a migration | 12:55 |
sean-k-mooney | e.g. we wont create the instance root disk on all nodes in the multinode devstack cloud | 12:55 |
openstackgerrit | Eric Fried proposed openstack/nova stable/stein: Adding tests to demonstrate bug #1821824 https://review.openstack.org/649600 | 12:56 |
openstack | bug 1821824 in OpenStack Compute (nova) stein "Forbidden traits in flavor properties don't work" [High,Confirmed] https://launchpad.net/bugs/1821824 | 12:56 |
shilpasd | sean-k-mooney: yes, but before migration i am verifying 'instances_path' and observed that instance files are already there | 12:56 |
openstackgerrit | Eric Fried proposed openstack/nova stable/stein: Fix bug preventing forbidden traits from working https://review.openstack.org/649601 | 12:56 |
*** lbragstad has joined #openstack-nova | 12:56 | |
shilpasd | sean-k-mooney: IMO it should not be the case, as you said during migration it should copy | 12:56 |
sean-k-mooney | if you are doing a block migration it will copy them yes | 12:56 |
*** jmlowe has quit IRC | 12:57 | |
sean-k-mooney | you can also mount /opt/stack/data/nova on nfs to avoid this but you would have to do that manually | 12:57 |
openstackgerrit | Eric Fried proposed openstack/nova stable/rocky: Adding tests to demonstrate bug #1821824 https://review.openstack.org/649602 | 12:57 |
openstack | bug 1821824 in OpenStack Compute (nova) stein "Forbidden traits in flavor properties don't work" [High,In progress] https://launchpad.net/bugs/1821824 - Assigned to Eric Fried (efried) | 12:57 |
openstackgerrit | Eric Fried proposed openstack/nova stable/rocky: Fix bug preventing forbidden traits from working https://review.openstack.org/649603 | 12:57 |
sean-k-mooney | shilpasd: are you trying to debug something locally or in the gate | 12:58 |
sean-k-mooney | or just understand how it works | 12:58 |
shilpasd | sean-k-mooney: tried with NFS + multinode, there I saw instance data files created at all 3 places in my case: at the NFS shared location, at the controller and at the compute node. | 12:58 |
sean-k-mooney | ah its not created at all 3 locations | 12:58 |
sean-k-mooney | the data is stored on the nfs share and its just mounted on the controller and compute | 12:59 |
sean-k-mooney | there is one copy of the data but its accessible via nfs from multiple locations | 12:59 |
shilpasd | sean-k-mooney: yes, I wondered about that too, and here during evacuation I am getting the errors 'libvirtError: internal error: process exited while connecting to monitor.', 'ERROR oslo_messaging.rpc.server Is another process using the image?' | 12:59 |
sean-k-mooney | shilpasd: what version of openstack are you deploying | 13:00 |
*** gaoyan has joined #openstack-nova | 13:00 | |
shilpasd | sean-k-mooney: stein | 13:00 |
sean-k-mooney | when we do an evacuate we assume the guest on the down node is stopped | 13:01 |
sean-k-mooney | if its not its not safe to evacuate | 13:01 |
shilpasd | openstack 3.18.0 | 13:01 |
sean-k-mooney | did you stop the vm? | 13:01 |
shilpasd | purposefully stopping n-cpu by service-force-down | 13:01 |
sean-k-mooney | that will not stop the vm | 13:02 |
shilpasd | yes but evacuation happens in this case also | 13:02 |
sean-k-mooney | yes and if you use force down you are required to check that the vms are stopped on the host | 13:02 |
sean-k-mooney | force down was added specifically for operators that had external monitoring that could determine the host had failed and would stop all running guests before setting it | 13:03 |
shilpasd | ok, will check that, but IMO error i am getting is because of same data files already available and during evacuation the said error ''ERROR oslo_messaging.rpc.server Is another process using the image?'' | 13:04 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/mark-host-down.html#use-cases | 13:04 |
shilpasd | ok, will check after VM down, but the original question remains, why are instance data files created at all 3 places: NFS, controller and compute? | 13:05 |
shilpasd | is it correct behaviour? | 13:05 |
sean-k-mooney | yes | 13:05 |
sean-k-mooney | assuming that the instance directory is on nfs | 13:05 |
sean-k-mooney | what does mount show on the controller or compute node | 13:05 |
sean-k-mooney | we have special logic that detects if the instance directory is on nfs and skips copying the disk in that case | 13:06 |
shilpasd | same shared location | 13:06 |
shilpasd | can you please help me find where exactly this logic is | 13:06 |
shilpasd | i have checked def _create_image() of libvirt | 13:07 |
openstackgerrit | Jared Winborne proposed openstack/nova master: Leave brackets on Ceph IP addresses for libguestfs https://review.openstack.org/649405 | 13:08 |
*** Alon_KS has joined #openstack-nova | 13:09 | |
*** cdent has joined #openstack-nova | 13:09 | |
sean-k-mooney | shilpasd: https://github.com/openstack/nova/blob/1554d35834a474514f827449bd7d4f1d2f0af1d6/nova/virt/libvirt/driver.py#L6611-L6641 | 13:10 |
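The shared-storage detection mentioned above can be pictured roughly as follows. This is a standalone sketch of the general idea only (drop a marker file through one view of the directory and see whether it shows up through the other), not the code at the linked lines. `is_same_storage` is a hypothetical helper, and in a real deployment the two paths would be the instances directory as seen from two different hosts rather than two local paths.

```python
import os
import tempfile


def is_same_storage(path_a, path_b):
    """Return True if a file created under path_a is visible under path_b.

    Hypothetical helper for illustration: with an NFS share mounted at
    both paths there is one copy of the data, so the marker is visible
    through either mount point.
    """
    # Create a uniquely named marker file through the first path.
    fd, marker = tempfile.mkstemp(dir=path_a, prefix="shared-check-")
    os.close(fd)
    try:
        # Look for the same file name through the second path.
        return os.path.exists(os.path.join(path_b, os.path.basename(marker)))
    finally:
        # Always clean up the marker file.
        os.unlink(marker)
```

When this kind of check succeeds, seeing the same files under the same path on the controller and the compute node is expected: they are two views of a single copy.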
*** mriedem has joined #openstack-nova | 13:10 | |
sean-k-mooney | shilpasd: the issue here however is not related to this check. | 13:11 |
KH-Jared | I'm just going to blame my long lines on trusting my IDE too much. Its pep8 warning was at 120 characters instead of 79, whoops | 13:11 |
sean-k-mooney | if you use force host down, you are required to ensure the vm is not running before you call evacuate. that is the cause of the error you hit | 13:12 |
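The operator-side precondition described above could be expressed as a small guard like the one below. It is a sketch under stated assumptions: `guests_blocking_evacuate` is a hypothetical helper, the `"shutoff"` state string is only an illustrative convention, and how the guest states are collected (external monitoring, fencing, checking the hypervisor directly) is left out entirely.

```python
def guests_blocking_evacuate(guest_states):
    """Return the names of guests that are not confirmed stopped.

    guest_states maps a guest name to a state string as reported by
    whatever external monitoring the operator uses (hypothetical input).
    An empty result means it is reasonable to proceed to evacuate after
    forcing the compute service down.
    """
    return sorted(
        name for name, state in guest_states.items() if state != "shutoff"
    )
```

For example, `guests_blocking_evacuate({"vm1": "shutoff", "vm2": "running"})` reports `vm2`, which is the situation that leads to two processes trying to use the same disk image on shared storage.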
*** amodi has joined #openstack-nova | 13:12 | |
shilpasd | sean-k-mooney: yes, thanks for this input, will check with the VM down, and will debug the referenced code further, and get back to you on the same later | 13:13 |
sean-k-mooney | KH-Jared: ya the 79 column limit is particularly annoying because no ide defaults to 79. some default to 80 but even pycharm does not default to 79 and its a python focused ide | 13:13 |
KH-Jared | I can get behind it though, the code has always been extremely nice to read, I assume it adds to that. I also have it fixed in my ide now, so hopefully won't see a pep8 failure again in the future | 13:15 |
sean-k-mooney | i have been working on openstack for the better part of 6 years and i still dont write pep8 compliant code by default | 13:16 |
sean-k-mooney | but i run fast8 on it most of the time before i push | 13:16 |
sean-k-mooney | KH-Jared: you are aware of the fast8 env | 13:16 |
sean-k-mooney | tox -e fast8 | 13:16 |
KH-Jared | i am now | 13:16 |
sean-k-mooney | it runs pep8 but just on your patched files | 13:17 |
sean-k-mooney | not on all of nova | 13:17 |
sean-k-mooney | is way fater | 13:17 |
artom | sean-k-mooney, whoa, that exists? | 13:17 |
sean-k-mooney | *faster | 13:17 |
sean-k-mooney | artom: yes... | 13:17 |
artom | I've been hacking something like it with $(tox -e pep8 `git show --name-only | grep ^nova`) | 13:17 |
sean-k-mooney | artom: stephenfin added it in pike | 13:18 |
artom | ... | 13:18 |
KH-Jared | this was my first change on a large project, not even just openstack, all testing I've done has been much smaller by comparison. I'm going to be using flake8 more often at least | 13:18 |
artom | \o/ | 13:18 |
sean-k-mooney | its also in a few other repos at this point | 13:18 |
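As a rough illustration of the idea discussed above (lint only the files a patch touches, not all of nova), here is a minimal Python sketch. It is not how nova's `fast8` tox env is implemented. The helper names `changed_python_files` and `lint_changed_files` are hypothetical, and the sketch assumes `git` and `flake8` are on the PATH and that it is run from the repository root.

```python
import subprocess


def changed_python_files(name_only_output, prefix="nova/"):
    """Pick the .py files under `prefix` from `git ... --name-only` output."""
    files = []
    for line in name_only_output.splitlines():
        path = line.strip()
        if path.startswith(prefix) and path.endswith(".py"):
            files.append(path)
    return files


def lint_changed_files(rev="HEAD"):
    """Run flake8 on the files touched by `rev`; return flake8's exit code."""
    # --format= suppresses the commit header so only file names are printed;
    # --diff-filter=d leaves out files the commit deleted.
    out = subprocess.run(
        ["git", "show", "--name-only", "--diff-filter=d", "--format=", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    files = changed_python_files(out)
    if not files:
        return 0  # nothing to lint
    return subprocess.run(["flake8", *files]).returncode
```

The filtering step is kept separate from the subprocess calls so it can be checked without a git checkout. In practice `tox -e fast8` is the simpler route where the repo provides it.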
efried | jaypipes: Do we still have one RT per ironic node? | 13:18 |
*** wolverineav has joined #openstack-nova | 13:19 | |
mriedem | gibi_off: do you know why this was made into a warning? https://github.com/openstack/nova/blob/b33fa1c054ba4b7d4e789aa51250ad5c8325da2d/nova/scheduler/client/report.py#L1880 we hit that a lot in normal resizes: https://bugs.launchpad.net/nova/+bug/1822917 | 13:19 |
openstack | Launchpad bug 1822917 in OpenStack Compute (nova) ""Overwriting current allocation" warnings in logs during move operations although there are no failures" [Undecided,In progress] - Assigned to Takashi NATSUME (natsume-takashi) | 13:19 |
mriedem | efried: there is one RT per nova-compute service, | 13:20 |
mriedem | and the RT has a dict of compute nodes | 13:20 |
*** erlon has quit IRC | 13:20 | |
mriedem | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L544 | 13:21 |
mriedem | https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L139 | 13:21 |
efried | mriedem: I'm looking at stephenfin's change https://review.openstack.org/#/c/649559/ which seems sane in itself, but https://review.openstack.org/#/c/649559/1/nova/compute/resource_tracker.py@605 is only being run through once, for the "first" node. | 13:21 |
efried | it's probably moot by luck because we don't track PCI devices on ironic nodes (right?) | 13:22 |
mriedem | probably | 13:23 |
*** wolverineav has quit IRC | 13:23 | |
jaypipes | efried: no, not for years. | 13:25 |
efried | swhat I thought, see above | 13:25 |
jaypipes | k, will look. | 13:26 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead resource tracker code https://review.openstack.org/649569 | 13:29 |
*** ricolin has joined #openstack-nova | 13:30 | |
*** dklyle has quit IRC | 13:30 | |
sean-k-mooney | stephenfin: regarding ^ this conflicts with some other changes that might be starting to use some of that dead code | 13:30 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 13:31 |
sean-k-mooney | im thinking about https://review.openstack.org/#/q/topic:bug/1809095+(status:open+OR+status:merged) | 13:31 |
*** dklyle has joined #openstack-nova | 13:31 | |
*** elod_off has quit IRC | 13:31 | |
*** elod_off has joined #openstack-nova | 13:32 | |
stephenfin | sean-k-mooney: Possibly. If so, let me know. I was just using vulture (https://pypi.org/project/vulture/) to figure that stuff out so there's a lot of missing context | 13:32 |
*** gbarros has quit IRC | 13:32 | |
sean-k-mooney | stephenfin: actually no never mind | 13:32 |
sean-k-mooney | they are not | 13:32 |
sean-k-mooney | its just a conflit in the tests | 13:32 |
sean-k-mooney | stephenfin: we might have some function that were added for the sriov migration code too | 13:33 |
sean-k-mooney | we merged all the resouce tracker code but have not merged the 2 patches that use it | 13:33 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/stein: Error out migration when confirm_resize fails https://review.openstack.org/649421 | 13:33 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 13:36 |
mriedem | sean-k-mooney: are you going to cherry pick https://review.openstack.org/#/c/649409/ to stable/stein now? | 13:36 |
sean-k-mooney | speaking of https://review.openstack.org/#/q/topic:bp/libvirt-neutron-sriov-livemigration+(status:open) if people have time to review that again it would be nice to see that merged now that train is open | 13:36 |
mriedem | efried: we're doing an rc2 for https://review.openstack.org/#/c/649409/ right? | 13:36 |
gibi_off | mriedem: I didn't find the specific reason for the warning in move_allocation so I guess I added it because I thought that it should not really happen that we overwrite an existing allocation during a move | 13:36 |
sean-k-mooney | im waiting for zuul to merge it | 13:36 |
sean-k-mooney | but yes ill cherry pick it once that is done | 13:37 |
mriedem | gibi_off: it's there because of https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html | 13:37 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove unused constants, functions https://review.openstack.org/649567 | 13:37 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead resource tracker code https://review.openstack.org/649569 | 13:37 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Remove dead nova.db functions https://review.openstack.org/649570 | 13:37 |
sean-k-mooney | ... it's going to fail again because of a failure in lower constraints | 13:38 |
stephenfin | mriedem, gibi_off: Have reshuffled those around to remove the placement changes and fix an issue with the last patch, FYI | 13:38 |
sean-k-mooney | http://logs.openstack.org/09/649409/3/check/openstack-tox-lower-constraints/1a8274d/testr_results.html.gz | 13:38 |
mriedem | i assume that's related to http://status.openstack.org/elastic-recheck/#1793364 but the signature on that query is old | 13:40 |
gibi_off | mriedem: so during revert resize/migrate the target allocation is non-empty, as we keep allocations on both the source and the target host of the migration | 13:40 |
*** awaugama has joined #openstack-nova | 13:40 | |
mriedem | during revert we should drop the target node allocations, held by the instance consumer, and move the source node allocations, held by the migration consumer, to the instance consumer | 13:41 |
sean-k-mooney | mriedem: well the patch to fix our lower constraints job just landed so maybe this is a bug that was fixed in later versions of sqlalchemy/pymysql | 13:41 |
mriedem | so after revert there are no allocations for the instance on the target node and the instance has the source node allocations again | 13:41 |
gibi_off | mriedem: yeah I mixed it up, it is not moving allocations between hosts, it moves them between consumers | 13:42 |
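The revert behaviour mriedem describes above (drop the target-node allocations held by the instance consumer, then hand the source-node allocations held by the migration consumer back to the instance) can be sketched roughly as follows. This is a toy model over plain dicts, not nova's or placement's actual code; the function name `revert_resize_allocations`, the consumer IDs, and the provider names are all hypothetical.

```python
def revert_resize_allocations(allocations, instance_uuid, migration_uuid):
    """Return a new consumer -> {provider: resources} mapping after a revert.

    Hypothetical helper: models allocations as a dict keyed by consumer.
    """
    # Copy the per-consumer mappings so the caller's dict is left untouched.
    result = {consumer: dict(providers)
              for consumer, providers in allocations.items()}
    # Drop the target node allocations, held by the instance consumer.
    result.pop(instance_uuid, None)
    # Move the source node allocations, held by the migration consumer,
    # over to the instance consumer.
    source = result.pop(migration_uuid, {})
    if source:
        result[instance_uuid] = source
    return result


allocs = {
    "instance-1": {"target-node": {"VCPU": 2, "MEMORY_MB": 2048}},
    "migration-1": {"source-node": {"VCPU": 2, "MEMORY_MB": 2048}},
}
after = revert_resize_allocations(allocs, "instance-1", "migration-1")
```

After the call, the instance consumer holds only the source-node allocations and the migration consumer holds nothing, which matches "it moves between consumers, not between hosts".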
mriedem | dansmith: sounds like we might be doing an RC2 for stein, and bug 1715374 is latent, but do you think it would be worth putting out a known issue release note for stein anyway? | 13:42 |
openstack | bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] https://launchpad.net/bugs/1715374 - Assigned to Ralf Haferkamp (rhafer) | 13:42 |
dansmith | mriedem: I dunno, it's been this way for a long time apparently | 13:42 |
dansmith | so doesn't really seem like it | 13:43 |
gibi_off | mriedem: anyhow I think the warning can be deleted | 13:43 |
mriedem | gibi_off: or at least dropped to debug | 13:43 |
*** ttsiouts has quit IRC | 13:43 | |
gibi_off | yeah | 13:44 |
mriedem | dansmith: just thinking about our upgrade docs and such that mention to use sighup during an upgrade https://docs.openstack.org/nova/stein/user/upgrade.html?highlight=sighup#concepts | 13:45 |
*** ttsiouts has joined #openstack-nova | 13:45 | |
dansmith | well, I know | 13:45 |
mriedem | maybe that should have a note instead | 13:45 |
dansmith | yeah, that would make more sense I think | 13:46 |
mriedem | and https://docs.openstack.org/nova/stein/configuration/config.html?highlight=sighup#compute.resource_provider_association_refresh for efried | 13:46 |
mriedem | ^ might work for that option, but then kills the event listening stuff so you can't boot a server right? | 13:46 |
mriedem | neutron events i mean | 13:46 |
mriedem | blech we have stuff in here too https://docs.openstack.org/nova/stein/admin/configuration/schedulers.html?highlight=sighup#compute-capabilities-as-traits | 13:48 |
*** jmlowe has joined #openstack-nova | 13:49 | |
sean-k-mooney | so apparently rechecking a running job does not work so i can either rebase to head of master if people feel like +w again or i can recheck in 50mins when zuul finishes its current run | 13:49 |
mriedem | in other fun news, i learned last night that with cross-cell resize, we have to do a hard delete of the instance in a cell db rather than a soft delete | 13:49 |
mriedem | sean-k-mooney: just rebase | 13:50 |
openstackgerrit | sean mooney proposed openstack/nova master: Libvirt: gracefully handle non-nic VFs https://review.openstack.org/649409 | 13:50 |
sean-k-mooney | done, that will retrigger the jobs and it just needs +w. im going to grab lunch before a meeting so brb | 13:51 |
mriedem | so i wonder if we could recreate this sighup issue in one of our post-test hook scripts, we'd just sighup the local compute service and then create a server which should timeout waiting for the network-vif-plugged callback right? | 13:54 |
dansmith | no, | 13:54 |
dansmith | you have to have a server in the middle of creating when you sighup I think | 13:55 |
*** oanson has quit IRC | 13:55 | |
*** eharney has joined #openstack-nova | 13:55 | |
dansmith | before the event comes in | 13:55 |
dansmith | the other option is probably just to pick another signal as a stop-gap, register for it ourselves, and wire it up to our existing handlers | 13:55 |
dansmith | but that's even more icky for rc2 | 13:55 |
mriedem | on a new server create after the sighup, won't the _events dict be None and when registering the callback we'd hit this? https://review.openstack.org/#/c/420026/7/nova/compute/manager.py@316 | 13:56 |
*** BjoernT has joined #openstack-nova | 13:56 | |
*** udesale has joined #openstack-nova | 13:56 | |
dansmith | I don't think so because we'll be re-created at the point before we get there | 13:56 |
*** BjoernT has quit IRC | 13:56 | |
dansmith | I think it only happens if we end up with a server in the middle of that while the event comes in, but once the sighup has finished the full restart, the new manager is hooked to new rpc connections, etc | 13:57 |
dansmith | it doesn't completely bork the server forever, AFAIK | 13:57 |
dansmith | else it would be really obvious that it's totally broken | 13:57 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove unreachable codepaths https://review.openstack.org/649559 | 13:58 |
mriedem | ok i thought the service was fubar after the sighup | 13:59 |
mriedem | so this is less severe than i thought | 13:59 |
bauzas | any urgent reviews I should do before RC2 tagging ? | 14:01 |
*** mlavalle has joined #openstack-nova | 14:01 | |
dansmith | not that I know of | 14:01 |
trident | mriedem: Regarding https://review.openstack.org/#/c/648653/ | 14:01 |
bauzas | https://etherpad.openstack.org/p/nova-stein-rc-potential is pretty done | 14:01 |
*** igordc has joined #openstack-nova | 14:01 | |
trident | mriedem: You are correct, forbidden traits never worked unless there was also a required trait in the same flavor. If there was, both would be used; if not, the forbidden trait was lost. | 14:02 |
*** awalende has quit IRC | 14:03 | |
bauzas | mriedem: https://review.openstack.org/#/c/649454/ wanting it for RC2 ? I can vote on my own change given melwitt proposed it | 14:03 |
bauzas | I think it should clarify our Stein docs | 14:03 |
*** awalende has joined #openstack-nova | 14:03 | |
bauzas | even if we don't honestly branch them | 14:03 |
mriedem | bauzas: i'm waiting to hear that yes we're doing an rc2 | 14:06 |
mriedem | and then i'd like to step through with the ptl which stein backports are going to go into it | 14:06 |
*** BjoernT has joined #openstack-nova | 14:06 | |
bauzas | soooooo... efried? | 14:06 |
mriedem | there is a 50% chance efried is the PTL coordinating the stein RC2 :) | 14:07 |
*** awalende_ has joined #openstack-nova | 14:07 | |
*** awalende has quit IRC | 14:08 | |
bauzas | oh, right, it's still melwitt's point :p | 14:08 |
efried | hi, sorry, catching up. | 14:09 |
mriedem | bauzas: you could start by backporting https://review.openstack.org/#/c/649409/ to stable/stein | 14:10 |
bauzas | mriedem: ok, I can do it | 14:10 |
efried | I think we need an RC2 for sure for https://review.openstack.org/#/c/649409/ at least | 14:10 |
efried | and if we're doing it, we might as well put the docs in, no risk there. | 14:10 |
mriedem | https://review.openstack.org/#/c/649454/ specifically yeah? | 14:11 |
*** awalende_ has quit IRC | 14:11 | |
efried | yes | 14:12 |
efried | mriedem: wanna +A that one please? | 14:13 |
* bauzas would love the Gerrit cherry-pick button to amend the commit msg like -x | 14:13 | |
mriedem | done | 14:14 |
efried | mriedem: Has anyone done docs on the SIGHUP thing? | 14:14 |
efried | L56 | 14:14 |
mriedem | no | 14:14 |
mriedem | i was just creating a devstack env to try and recreate the bug | 14:14 |
efried | mriedem: I can go through and add a quick | 14:15 |
efried | .. note:: SIGHUP is broken, see `bug 1715374`_ | 14:15 |
openstack | bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] https://launchpad.net/bugs/1715374 - Assigned to Ralf Haferkamp (rhafer) | 14:15 |
efried | to all the docs where it's mentioned if you like. | 14:15 |
*** dklyle has quit IRC | 14:15 | |
mriedem | well, if the nuance is it's broken for a window while servers are being created and waiting for an event when the sighup runs, that gets a bit hard to communicate in all the places we have sighup mentioned in the docs | 14:16 |
efried | .. note:: SIGHUP behavior is questionable, see `bug 1715374`_ if you try it and things go wobbly. | 14:17 |
openstack | bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] https://launchpad.net/bugs/1715374 - Assigned to Ralf Haferkamp (rhafer) | 14:17 |
efried | ? | 14:17 |
mriedem | idk, could give a more detailed explanation of the bug in the reset() portion of https://docs.openstack.org/nova/latest/reference/services.html#the-nova-manager-module for just the compute service | 14:18 |
mriedem | i wouldn't want to mention a bug in 5 places just to have to remember after the bug is fixed to go back and remove all of those 5 places | 14:19 |
efried | as you wish | 14:20 |
openstackgerrit | Sylvain Bauza proposed openstack/nova stable/stein: Libvirt: gracefully handle non-nic VFs https://review.openstack.org/649630 | 14:21 |
bauzas | mriedem: ^ | 14:21 |
*** cdent has quit IRC | 14:24 | |
*** cdent has joined #openstack-nova | 14:26 | |
mriedem | is there anything else we want to consider? https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:stable/stein | 14:27 |
bauzas | mriedem: https://review.openstack.org/#/c/647310/ is controversial for RC2 | 14:29 |
dansmith | I thought we said no to that? | 14:31 |
dansmith | and sounds like lyarwood is on board with the original plan too | 14:31 |
openstackgerrit | Helena proposed openstack/nova-specs master: Spec for a new nova virt driver to manage an RSD, composable infrastructure deployment https://review.openstack.org/648665 | 14:32 |
lyarwood | yarp, missed that it had already been discussed in the change once I got back online yesterday. gertty-- | 14:32 |
lyarwood | Does anyone have any idea why nova-stable-maint already has +2 on stable/stein btw? | 14:34 |
mriedem | i'm not sure if the release team distinguishes anymore | 14:34 |
mriedem | smcginnis: ^? | 14:34 |
*** awalende has joined #openstack-nova | 14:35 | |
jaypipes | mriedem, dansmith: for online data migrations, did we envision those migration routines living forever or are we envisioning being able to delete some over time? | 14:36 |
mriedem | jaypipes: we already have deleted some over time | 14:36 |
dansmith | jaypipes: we have deleted many | 14:36 |
jaypipes | oh, ok. | 14:36 |
mriedem | most recent memory is i removed the request spec and flavor migration ones | 14:36 |
dansmith | we're not super good at setting that timer and then doing it, like any other cleanup | 14:37 |
jaypipes | mriedem, dansmith: after looking into mnaser's troubles, I think the removing the keypairs migration entirely might be the best solution. | 14:37 |
dansmith | but most of them should be relatively no-op-ish if done properly | 14:37 |
jaypipes | it's Newton-era | 14:37 |
jaypipes | dansmith: unfortunately that one does a full table scan across instance_extra each time it's run. | 14:37 |
mriedem | jaypipes: yeah i started looking at that the other day, but i don't think we have a blocker migration or anything in place to make sure the migration has completed before we rip that out | 14:37 |
jaypipes | dansmith: looking for WHERE keypairs IS NULL | 14:37 |
mriedem | like we did for flavors | 14:37 |
dansmith | mriedem: we might not be able to for that one | 14:38 |
jaypipes | dansmith: and for large DBs like mnaser's 3M+ instance_extra records, it really is a resource hog. | 14:38 |
dansmith | mriedem: but since it's so old, if it hadn't finished, we probably know that other things would be broken | 14:38 |
mriedem | if there are keypairs in the cell db wouldn't that be an indication? | 14:38 |
dansmith | jaypipes: I hear you | 14:38 |
mriedem | for the request spec migration, we didn't have a blocker migration, just a nova-status upgrade check: https://github.com/openstack/nova/commit/ed4fe3ead62c09ec7de7b6a11072295a99997b4f#diff-91e852cb498abb50ca653ef6418bd65a | 14:38 |
mriedem | oh i added the upgrade check in rocky, dropped the reqspec migration code in stein | 14:39 |
dansmith | jaypipes: I don't agree that going back to putting this inside the schema migration would make people happy, as his cloud would still be down while he waits for that to happen, but.. you know :) | 14:39 |
*** awalende has quit IRC | 14:39 | |
dansmith | jaypipes: hopefully you just meant some sort of version-like sentinel instead of scanning to determine if things needed to be done :) | 14:40 |
jaypipes | dansmith: why would his cloud be down? | 14:40 |
dansmith | jaypipes: if we did it in the middle of a schema migration? | 14:40 |
jaypipes | yeah. | 14:40 |
dansmith | because he'd be halfway between two schema versions with two versions of code that can't use it? | 14:40 |
dansmith | i.e. the reason we decoupled those tasks in the first place | 14:41 |
jaypipes | sorry, you misunderstand... | 14:41 |
*** dklyle has joined #openstack-nova | 14:41 | |
mriedem | wouldn't dropping the keypairs migration be similar to dropping the instance groups migration? https://github.com/openstack/nova/commit/1160921c2d053ce33279ca4ec1f00572271e7c95#diff-91e852cb498abb50ca653ef6418bd65a | 14:41 |
mriedem | jaypipes: fyi https://review.openstack.org/#/q/topic:remove-newton-online-compat-code+(status:open+OR+status:merged) | 14:42 |
mriedem | for a blueprint | 14:42 |
jaypipes | dansmith: if we did the data migration as an Alembic/sqlalchemy-migrate migration, then we wouldn't have to ever run the "check to see if this pre-condition exists" (like the SELECT against the instance_extra table...) more than once. | 14:42 |
jaypipes | mriedem: ack | 14:42 |
mriedem | i seem to remember we've fucked up the schema migration scripts before as well | 14:42 |
jaypipes | dansmith: I'm not saying the data migration would be the *same* migration as the schema migration. just that we could do it *as* a migration. | 14:43 |
dansmith | jaypipes: that's what I said above about the sentinel approach for scanning, but definitely do not agree they should be in the same migration | 14:43 |
jaypipes | instead of the whole separate nova-manage thing. | 14:43 |
mriedem | and since you can't downgrade the schema to get back, you'd have to reset the version in the migrations table and re-run those if we had a bug | 14:43 |
jaypipes | but that train has passed... | 14:43 |
dansmith | jaypipes: okay, well, you can see why people might be confused about such a statement :) | 14:43 |
jaypipes | mriedem: not once, ever, has anyone ever done a downgrade of a schema migration. | 14:43 |
mriedem | that's not true | 14:43 |
mriedem | hell we used to gate on being able to upgrade and downgrade the schema | 14:44 |
jaypipes | gating != ever being done in production, ever. | 14:44 |
dansmith | it didn't actually work for real data | 14:44 |
dansmith | yeah | 14:44 |
dansmith | heh | 14:44 |
*** dpawlik has quit IRC | 14:44 | |
jaypipes | it just doesn't happen, sorry. | 14:44 |
dansmith | which is why I don't want to couple the two processes | 14:44 |
jaypipes | the procedure is a) backup your stuff, b) run migrations, c) test and if problem, d) restore from backup. | 14:45 |
dansmith | but jaypipes, we could of course just set up our own version counter to be a sentinel to avoid re-running migrations, | 14:45 |
jaypipes | nobody ever ran a schema downgrade in production. I can almost guarantee that. | 14:45 |
mriedem | i should introduce you to some chinese people i know then | 14:45 |
mriedem | anyway, i'm not saying it's something that should be done | 14:45 |
jaypipes | mriedem: then, clearly, SOMEONE is WRONG on the Internet. | 14:46 |
jaypipes | :P | 14:46 |
dansmith | jaypipes: we certainly have people that think they want it, but I definitely agree they're misguided :) | 14:46 |
mriedem | i have also told them ^ | 14:46 |
jaypipes | I'm a bit slap-happy today, guys, you'll have to forgive me. | 14:46 |
* jaypipes goes back to shooting himself in the head with Ironic rebuild issues. | 14:46 | |
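The "version counter as a sentinel" idea dansmith floats above could look something like the sketch below: record that a data migration has completed, and check that cheap marker instead of re-scanning the big table on every run. This is only an illustration using an in-memory SQLite database; the `data_migrations` table, the `MIGRATION_NAME` constant, and the `'[]'` backfill value are assumptions made for the example, not nova's schema or its real keypairs migration.

```python
import sqlite3

# Hypothetical name for the sentinel row.
MIGRATION_NAME = "keypairs_backfill"


def migrate_keypairs_once(conn):
    """Run the toy backfill only if no sentinel row exists.

    Returns the number of rows touched by this call.
    """
    done = conn.execute(
        "SELECT 1 FROM data_migrations WHERE name = ?", (MIGRATION_NAME,)
    ).fetchone()
    if done:
        # Sentinel present: skip the full scan of instance_extra entirely.
        return 0
    cur = conn.execute(
        "UPDATE instance_extra SET keypairs = '[]' WHERE keypairs IS NULL"
    )
    # Record completion so later runs take the cheap path above.
    conn.execute(
        "INSERT INTO data_migrations (name) VALUES (?)", (MIGRATION_NAME,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instance_extra (id INTEGER PRIMARY KEY, keypairs TEXT)")
conn.execute("CREATE TABLE data_migrations (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO instance_extra (keypairs) VALUES (?)",
    [(None,), ("[]",), (None,)],
)
first_run = migrate_keypairs_once(conn)
second_run = migrate_keypairs_once(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM instance_extra WHERE keypairs IS NULL"
).fetchone()[0]
```

The trade-off is that the sentinel only stays valid if nothing can write un-migrated rows after it is set, which is presumably part of why the thread treats the completion check as the hard part.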
*** luksky has quit IRC | 14:47 | |
mriedem | i'd be happy to drop the newton-era keypair migrations, but i think it's more complicated than just deleting that code, since we normally have something that checks to see if you have completed your homework before we just drop the migration routine and all of the compat code | 14:48 |
mriedem | which requires some thinking | 14:48 |
dansmith | well, | 14:48 |
dansmith | if the migration is acutely painful we could drop it without dropping the compat code and then take our time with the latter I think | 14:49 |
dansmith | or fix the migration to be more efficient if that's really possible | 14:49 |
dansmith | ISTR that since they're one-to-many and across DBs it's harder than that | 14:49 |
dansmith | but it's been a long time | 14:49 |
jaypipes | mriedem: yeah, I was pondering what that check would be other than SELECT COUNT(*) FROM instance_extra WHERE keypairs IS NULL, though. | 14:49 |
mriedem | there is also https://review.openstack.org/#/c/517158/ which might mean some of the migration routine is dead now | 14:50 |
mriedem | i.e. https://github.com/openstack/nova/blob/master/nova/objects/keypair.py#L242 | 14:50 |
mriedem | oh nvm that just means we wouldn't hit this https://github.com/openstack/nova/blob/master/nova/objects/keypair.py#L260 | 14:51 |
mriedem | https://github.com/openstack/nova/commit/be8242cb5a0f8396f6b8c042813847db0571df14#diff-f5877540177ee26b63552ec5f56d74fb | 14:53 |
mriedem | so as of ocata, the keypairs table in the cell dbs should be empty | 14:53 |
mriedem | and the keypair information per instance should be in the instance_extra table yeah | 14:53 |
mriedem | ? | 14:53 |
mriedem | i'm confused, if you can't upgrade to ocata while there are keypairs in the cell db https://github.com/openstack/nova/commit/be8242cb5a0f8396f6b8c042813847db0571df14#diff-f5877540177ee26b63552ec5f56d74fb then how would the 'migrate to api db' still hit anything here? https://github.com/openstack/nova/blob/master/nova/objects/keypair.py#L265 | 14:56 |
dansmith | mriedem: I think it's not hitting anything, it's just scanning the whole table which is the problem | 14:57 |
mriedem | "select([func.count()]).select_from(keypairs).where( keypairs.c.deleted == 0).scalar()" is the query jaypipes just said | 14:57 |
mriedem | oh nvm it's not | 14:58 |
mriedem | based on the commit message in https://review.openstack.org/#/c/517158/ i think we probably have reasonable justification to just kill that 'migrate to api db' data migration from newton that is still getting run | 14:59 |
jaypipes | mriedem: yeah, the problematic one is the scan on instance_extra | 14:59 |
*** artom has quit IRC | 14:59 | |
*** artom has joined #openstack-nova | 15:00 | |
mriedem | i'm not sure why i didn't go further on https://review.openstack.org/#/c/517158/2/nova/cmd/manage.py and remove keypair_obj.migrate_keypairs_to_api_db - i probably just ran out of time and was doing them in order, and wanted to handle the request spec one first, which i did in rocky/stein | 15:00 |
*** artom has quit IRC | 15:00 | |
mriedem | so.....i think we can probably remove migrate_keypairs_to_api_db now? | 15:00 |
jaypipes | mriedem: maybe you were performing a schema downgrade in production. | 15:01 |
mriedem | i will cut you | 15:01 |
* jaypipes dons armor | 15:01 | |
mriedem | if we think we can drop this, and we're doing an rc2 for stein we might want to get off this pot and include it in rc2 if it's really causing pain for upgrades | 15:01 |
mriedem | but i feel like i'm trying to convince myself this is ok | 15:02 |
mriedem | like everything i do in nova, which comes back to bite my ass | 15:02 |
dansmith | seems risky for an rc2 | 15:04 |
dansmith | not from any real data I have | 15:04 |
dansmith | but rc2 should be "we can't release without this" and this doesn't seem to fit that, IMHO | 15:04 |
*** artom has joined #openstack-nova | 15:04 | |
dansmith | we probably need to nuke all those "added in newton" migrations | 15:05 |
mriedem | i've got a local change, sec | 15:07 |
dansmith | only one more now I guess | 15:07 |
dansmith | looking at your patch from earlier there were a bunch | 15:07 |
mriedem | yeah https://review.openstack.org/#/q/topic:remove-newton-online-compat-code+(status:open+OR+status:merged) | 15:07 |
mriedem | i tried | 15:07 |
mriedem | got tired | 15:07 |
*** phasespace has quit IRC | 15:12 | |
*** zhubx has quit IRC | 15:16 | |
*** zhubx has joined #openstack-nova | 15:16 | |
cdent | Is there a concise description on the rules about migrations between AZs somewhere? cold and live, force and not force? | 15:22 |
mriedem | likely not, at least off the top of my head, but i could probably explain it quick | 15:23 |
mriedem | then we could report a docs bug to fill that in later | 15:23 |
cdent | I'll take that | 15:24 |
mriedem | tl;dr if the user creates the server with a specific AZ or https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_schedule_zone is not None (meaning it goes into some default AZ), then the instance is restricted to that AZ for all move operations, | 15:24 |
mriedem | UNLESS it is forced like with live migrate or evacuate | 15:24 |
mriedem | which, by default, with OSC you get a forced live migration every time | 15:25 |
mriedem | so very easy to shoot yourself in the foot there as an admin | 15:25 |
mriedem | which is why there are at least 5 patches to osc to fix that, listed at L18 here https://etherpad.openstack.org/p/DEN-osc-compute-api-gaps | 15:25 |
*** gaoyan has quit IRC | 15:25 | |
cdent | \o/ | 15:26 |
cdent | are there any restrictions that distinguish the default AZ as special with regard to other AZs? | 15:26 |
mriedem | docs on migrate caveats regarding AZs could probably live here https://docs.openstack.org/nova/latest/user/aggregates.html#availability-zones-azs if you want to report a bug and i can wordsmith it later | 15:27 |
mriedem | no | 15:27 |
mriedem | if DEFAULT.default_schedule_zone is not None and the user doesn't request an AZ explicitly, it's treated as if the user did request the default AZ | 15:27 |
cdent | I guess if you specify that a server is in foo-AZ (non-default) then you can't cold migrate from there to any other one? | 15:27 |
mriedem | that happens here https://github.com/openstack/nova/blob/357da989c194a8b59842629cb64b2809143a4eae/nova/api/openstack/compute/servers.py#L641 | 15:28 |
mriedem | cdent: correct | 15:28 |
*** mvkr has quit IRC | 15:28 | |
cdent | okay. thanks. I'll make a bug, summarize this stuff there. | 15:28 |
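As a rough summary of the AZ rules mriedem gives above, the sketch below models them as two small functions: which AZ an instance ends up pinned to, and whether a move to another AZ is allowed. It is a toy model of the stated rules only, not nova's scheduler or API code; the function names `pinned_az` and `move_allowed` and the AZ names are hypothetical.

```python
from typing import Optional


def pinned_az(requested_az: Optional[str],
              default_schedule_zone: Optional[str]) -> Optional[str]:
    """AZ the instance is restricted to, or None if it is unrestricted.

    If the user did not request an AZ, a non-None default_schedule_zone is
    treated as if the user had requested that AZ.
    """
    return requested_az if requested_az is not None else default_schedule_zone


def move_allowed(instance_az: Optional[str], target_az: str,
                 forced: bool = False) -> bool:
    """Whether a move operation to a host in target_az is allowed."""
    if forced:
        # Forced live migrate or evacuate bypasses the AZ restriction.
        return True
    if instance_az is None:
        # Instance was never pinned to an AZ.
        return True
    return instance_az == target_az


az = pinned_az(None, "az-default")
cold_migrate_ok = move_allowed("foo-AZ", "bar-AZ")
forced_ok = move_allowed("foo-AZ", "bar-AZ", forced=True)
```

Here `cold_migrate_ok` is false and `forced_ok` is true, which is the foot-gun mentioned above when a client forces live migrations by default.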
mriedem | note there is a spec proposed to allow passing a new AZ on unshelve, which i think is reasonable | 15:28 |
* cdent nods | 15:30 | |
mriedem | [cinder]/cross_az_attach=False also makes this all more complicated.... | 15:30 |
mriedem | ala https://review.openstack.org/#/c/469675/ wee | 15:30 |
* cdent burns everything down | 15:31 | |
mriedem | tl;dr if the cloud is configured for [cinder]/cross_az_attach=False and you boot from volume where the volume is in a non-default zone, server create explodes immediately | 15:31 |
mriedem | *and you don't create the server in the same zone | 15:31 |
mriedem | sorrison suffers from ^ | 15:31 |
* bauzas just seeing the backlog | 15:32 | |
mriedem | side effects include chronic vegimitis | 15:32 |
dansmith | gdi mriedem, save your depressing stuff for mondays will you? wednesday is supposed to be cresting the hill of depression and heading down to friday happiness | 15:32 |
mriedem | being constantly depressed about bugs from essex still being in our code is all i have to live for anymore | 15:32 |
* cdent prefers his depressed on tuesday | 15:32 | |
mriedem | not to mention all of my software skills are stuck in 6 years ago and i'm not learning anything new as a developer...wahwah | 15:34 |
artom | You're learning new soft skills | 15:35 |
artom | Like... how to deal with depression ^_^ | 15:36 |
dansmith | nice | 15:37 |
*** burt has quit IRC | 15:38 | |
mriedem | that'll be good on my next interview | 15:40 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Drop migrate_keypairs_to_api_db data migration https://review.openstack.org/649648 | 15:40 |
mriedem | dansmith: jaypipes: mnaser: ^ there you go | 15:40 |
*** burt has joined #openstack-nova | 15:42 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Drop migrate_keypairs_to_api_db data migration https://review.openstack.org/649648 | 15:43 |
mriedem | cdent: heh sound familiar? https://bugs.launchpad.net/nova/+bug/1822986 | 15:45 |
openstack | Launchpad bug 1822986 in OpenStack Compute (nova) "Not clear if www_authenticate_uri is really needed" [Undecided,New] | 15:45 |
cdent | oh hai | 15:45 |
cdent | hmm, yes, rather familiar | 15:46 |
*** dpawlik has joined #openstack-nova | 15:48 | |
*** BjoernT has quit IRC | 15:48 | |
*** tssurya has quit IRC | 15:49 | |
melwitt | mriedem, efried: I thought I was coordinating RC2, but if someone else wants to, that's fine with me. I put the things I thought should go into it on https://etherpad.openstack.org/p/nova-stein-rc-potential last night | 15:51 |
mriedem | melwitt: i think those are the same two changes that are being put in, so it's aligned | 15:52 |
*** dpawlik has quit IRC | 15:52 | |
melwitt | ok, cool | 15:53 |
*** manjeets has joined #openstack-nova | 15:54 | |
*** rpittau is now known as rpittau|afk | 15:57 | |
*** phasespace has joined #openstack-nova | 15:58 | |
jaypipes | mriedem: ++ | 16:00 |
*** nicolasbock has quit IRC | 16:02 | |
*** erlon has joined #openstack-nova | 16:02 | |
*** dpawlik has joined #openstack-nova | 16:02 | |
*** dikonoor has quit IRC | 16:05 | |
melwitt | mriedem, efried: fyi I've proposed RC2 at https://review.openstack.org/649656 | 16:06 |
*** yan0s has quit IRC | 16:09 | |
*** BjoernT has joined #openstack-nova | 16:11 | |
*** artom has quit IRC | 16:12 | |
*** imacdonn has quit IRC | 16:12 | |
*** imacdonn has joined #openstack-nova | 16:13 | |
*** gbarros has joined #openstack-nova | 16:17 | |
*** gbarros_ has joined #openstack-nova | 16:17 | |
*** gbarros has quit IRC | 16:18 | |
efried | mriedem: was away for a bit there. Did you decide to do something about SIGHUP for RC2? | 16:19 |
*** udesale has quit IRC | 16:19 | |
mriedem | efried: no | 16:19 |
mriedem | i went on a online data migrations tangent | 16:20 |
efried | ight | 16:20 |
mriedem | kashyap: MIN_LIBVIRT_VERSION = (3, 0, 0) and MIN_LIBVIRT_POSTCOPY_VERSION = (1, 3, 3), do you have a change to remove MIN_LIBVIRT_POSTCOPY_VERSION? | 16:21 |
*** gbarros_ has quit IRC | 16:22 | |
mriedem | doesn't look like it | 16:24 |
*** dtantsur is now known as dtantsur|afk | 16:24 | |
*** ttsiouts has quit IRC | 16:32 | |
*** ttsiouts has joined #openstack-nova | 16:33 | |
*** psachin has quit IRC | 16:34 | |
mriedem | NewBruce: what do you have set for live_migration_permit_post_copy in nova.conf on your compute services? | 16:34 |
*** cfriesen has joined #openstack-nova | 16:36 | |
*** bbowen__ has quit IRC | 16:37 | |
*** bbowen__ has joined #openstack-nova | 16:37 | |
*** ttsiouts has quit IRC | 16:37 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: drop MIN_LIBVIRT_POSTCOPY_VERSION https://review.openstack.org/649671 | 16:38 |
*** READ10 has joined #openstack-nova | 16:43 | |
*** spsurya has quit IRC | 16:46 | |
*** tesseract has quit IRC | 16:49 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: remove conditional on VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY https://review.openstack.org/649674 | 16:50 |
mriedem | sean-k-mooney: so on https://review.openstack.org/#/c/649464/ we're never getting a post-copy event during live migration so we don't activate the dest host port binding prior to post_live_migration_at_destination, so that kind of throws that theory out the window as being the issue NewBruce is hitting | 16:52 |
openstackgerrit | Eric Fried proposed openstack/nova stable/rocky: Fix bug preventing forbidden traits from working https://review.openstack.org/649603 | 16:52 |
*** cdent has quit IRC | 16:53 | |
*** helenafm has quit IRC | 16:54 | |
*** luksky has joined #openstack-nova | 16:55 | |
*** derekh has quit IRC | 16:56 | |
*** wolverineav has joined #openstack-nova | 16:57 | |
sean-k-mooney | mriedem: ya i noticed that this morning that it passed | 16:58 |
sean-k-mooney | i was wondering if we should maybe try and force the compute level to be different on the controller vs the compute and see if that triggers the issue | 17:00 |
sean-k-mooney | mriedem: were you going to resubmit with post copy enabled? | 17:00 |
*** BjoernT has quit IRC | 17:01 | |
mriedem | sean-k-mooney: yeah i was going to try enabling post-copy but these cirros guests are so tiny i'm not sure it will mean anything | 17:02 |
mriedem | efried: dansmith: on that SIGHUP n-cpu issue, i tried that in a train devstack created this morning and i don't even get to the wait for network event part | 17:02 |
mriedem | http://paste.openstack.org/show/748820/ | 17:02 |
mriedem | libvirt blows up in privsep | 17:02 |
sean-k-mooney | mriedem: i might try and create a neutron fullstack test to reproduce the neutron db error | 17:03 |
mriedem | this is how i hup'ed: sudo systemctl kill -s HUP devstack@n-cpu.service | 17:03 |
dansmith | mriedem: what happens if you try it again? | 17:03 |
mriedem | the server create? or the hup? | 17:03 |
dansmith | mriedem: the create | 17:03 |
mriedem | same thing | 17:03 |
dansmith | okay | 17:03 |
dansmith | I have not seen it manifest that way | 17:04 |
dansmith | obviously that seems like a much bigger deal | 17:04 |
mriedem | unplug on cleaning up from the failure also then blows up b/c privsep http://paste.openstack.org/show/748821/ | 17:04 |
dansmith | which should be more evidence that the oslo behavior is completely wrong | 17:04 |
mriedem | did a full systemctl restart and created a server and it was fine as expected | 17:06 |
dansmith | I forget, but can people still choose to run privsep things via rootwrap so we can restart the daemon? | 17:06 |
dansmith | because if so, maybe that's why you see that and it's devstack default config or whatever | 17:06 |
dansmith | maybe I could get mikal to yell at us over twitter to answer that | 17:06 |
mriedem | idk, privsep config et al is something i haven't had to look at in years | 17:07 |
mriedem | but was always confusing to me | 17:07 |
dansmith | yeah | 17:07 |
dansmith | or maybe it's different with some of the privsep things we merged recently, | 17:07 |
dansmith | but people reporting it are missing more of the privsepification | 17:08 |
dansmith | did you create a server before the hup too? | 17:08 |
mriedem | yup | 17:12 |
mriedem | created test1, fine, sighup'ed, create test2 fails, create test3 fails, restart n-cpu, create test4 ok | 17:12 |
mriedem | commented 34 and 35 fwiw in bug https://bugs.launchpad.net/nova/+bug/1715374 | 17:14 |
openstack | Launchpad bug 1715374 in OpenStack Compute (nova) "Reloading compute with SIGHUP prevents instances from booting" [High,In progress] - Assigned to Ralf Haferkamp (rhafer) | 17:14 |
*** artom has joined #openstack-nova | 17:14 | |
dansmith | okay just trying to think of reasons you see it differently than reported, but probably because of recent changes I guess | 17:14 |
mriedem | the privsep-helper child processes are definitely gone after the SIGHUP http://paste.openstack.org/show/748822/ | 17:18 |
dansmith | so we probably just didn't notice before or something as fewer things used it | 17:19 |
*** BjoernT has joined #openstack-nova | 17:19 | |
dansmith | but I don't think there's any reason that they're being killed now and not before | 17:19 |
*** erlon has quit IRC | 17:20 | |
mriedem | yeah, this is what i see in the n-cpu logs on the HUP http://paste.openstack.org/show/748823/ | 17:20 |
mriedem | note the | 17:21 |
mriedem | Apr 03 17:15:28 train nova-compute[19990]: DEBUG oslo_privsep.comm [-] EOF on privsep read channel {{(pid=19990) _reader_main /usr/local/lib/python2.7/dist-packages/oslo | 17:21 |
dansmith | yep | 17:21 |
dansmith | and that we're calling the sighup handler, but then restarting anyway | 17:21 |
dansmith | soft restart | 17:21 |
dansmith | so you might be able to do something to the global privsep state to cause it to be respawned again after the restart, but I dunno how that works | 17:21 |
dansmith | it's just all pretty wrong | 17:22 |
*** BjoernT has quit IRC | 17:23 | |
mriedem | not much interesting in the unit either http://paste.openstack.org/show/748824/ | 17:24 |
*** spsurya has joined #openstack-nova | 17:27 | |
*** BjoernT has joined #openstack-nova | 17:28 | |
mriedem | yeah i'm not sure what does this http://logs.openstack.org/64/649464/1/check/tempest-full-py3/f849a52/controller/logs/screen-n-cpu.txt.gz#_Apr_02_23_01_32_822059 | 17:29 |
mriedem | oh probably this https://github.com/openstack/nova/blob/master/nova/cmd/compute.py#L45 | 17:30 |
mriedem | i'll throw that into ComputeManager.reset() and see what happens | 17:33 |
dansmith | I think that won't help, because it's before the restart | 17:33 |
dansmith | but worth a try I guess | 17:34 |
mriedem | yeah it didn't do anything | 17:36 |
mriedem | well i guess we can say SIGHUP of n-cpu is f'ed | 17:37 |
mnaser | very very f'd | 17:37 |
mnaser | I mean I caught this through the openstack-ansible CI which was relying on it to refresh RPC versions after upgrades | 17:38 |
mnaser | but for some reason we had vif_plugging_is_fatal=false so it never broke | 17:38 |
mnaser | but once i got rid of that option, we resorted to restarting all agents which is not ideal really | 17:38 |
mriedem | i'm not sure if i should report a new bug for the privsep wrinkle here, or if that is a regression in stein - to find out i'd have to spin up a stable/rocky devstack | 17:38 |
dansmith | but this is more broken than what you were seeing | 17:38 |
mnaser | :< | 17:39 |
mriedem | right, i could probably just recreate in our post-test hook in the nova-next job | 17:39 |
mriedem | but first i need some lunch | 17:39 |
eandersson | alex_xu, for sure, but nothing supports it yet, not even the openstackclient | 17:40 |
*** dpawlik has quit IRC | 17:45 | |
KH-Jared | I'm learning that simple changes aren't always simple. Apparently I'm making the tearDown of a test fail | 17:52 |
*** jmlowe has quit IRC | 17:52 | |
*** jmlowe has joined #openstack-nova | 17:55 | |
*** wolverineav has quit IRC | 17:56 | |
artom | KH-Jared, yep. Actual fix: 1 day's work. Tests: Methuselah. | 17:56 |
*** wolverineav has joined #openstack-nova | 17:57 | |
*** amodi has quit IRC | 18:03 | |
*** wolverineav has quit IRC | 18:09 | |
*** wolverineav has joined #openstack-nova | 18:10 | |
*** lbragstad has quit IRC | 18:11 | |
*** lbragstad has joined #openstack-nova | 18:12 | |
*** igordc has quit IRC | 18:14 | |
*** samueldmq has joined #openstack-nova | 18:25 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: DNM: Test theory about bug 1822884 https://review.openstack.org/649464 | 18:26 |
openstack | bug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Undecided,New] https://launchpad.net/bugs/1822884 | 18:26 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: drop MIN_LIBVIRT_POSTCOPY_VERSION https://review.openstack.org/649671 | 18:27 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: libvirt: remove conditional on VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY https://review.openstack.org/649674 | 18:27 |
mriedem | hmm wtf http://logs.openstack.org/53/641453/1/check/nova-grenade-live-migration/88cd6f4/logs/screen-n-cpu.txt.gz?level=TRACE#_Apr_03_15_47_39_469617 | 18:29 |
*** READ10 has quit IRC | 18:32 | |
*** ricolin has quit IRC | 18:38 | |
KH-Jared | I feel like I'm missing something, because what failed for 'IBM zVM CI' was a tempest test, but under Zuul, the same tempest test was successful | 18:41 |
mriedem | KH-Jared: you don't need to worry about the zvm ci, your rbd change is only for the libvirt driver which is not what zvm ci is using | 18:42 |
mriedem | the zuul jobs and specifically the non-voting ceph plugin job are what you'd care about | 18:42 |
KH-Jared | good to know. I figured, but I was still going to dig to make sure it wasn't my changes somehow | 18:43 |
KH-Jared | so that was some time spent that I probably should've just asked sooner | 18:43 |
efried | mriedem: If privsep is part of what doesn't start back up properly, and there's privsep stuff in the critical path that's merged since that bug was opened (https://review.openstack.org/#/q/status:merged+project:openstack/nova+branch:master+topic:my-own-personal-alternative-universe) then that'd do it. | 18:45 |
sean-k-mooney | KH-Jared: well it's good that you are paying attention to the third party ci jobs but yes it's always good to ask yourself is it reasonable that my change could have broken it | 18:45 |
sean-k-mooney | and if you dont know always feel free to ask | 18:45 |
efried | mriedem: We don't know if privsep was borked before and just not being hit because those paths were still using rootwrap | 18:45 |
mriedem | efried: i'm going to try on rocky devstack to see if it is a very obvious regression in stein because if so we should have a known issue reno | 18:46 |
efried | ack | 18:46 |
*** igordc has joined #openstack-nova | 18:50 | |
*** nicolasbock has joined #openstack-nova | 18:51 | |
*** jmlowe has quit IRC | 18:57 | |
*** tbachman has quit IRC | 19:08 | |
*** tbachman has joined #openstack-nova | 19:10 | |
*** wolverineav has quit IRC | 19:10 | |
*** wolverineav has joined #openstack-nova | 19:10 | |
*** wolverineav has quit IRC | 19:15 | |
*** sidx64 has joined #openstack-nova | 19:19 | |
*** erlon has joined #openstack-nova | 19:32 | |
*** eharney has quit IRC | 19:37 | |
*** tosky has quit IRC | 19:44 | |
*** jmlowe has joined #openstack-nova | 19:48 | |
kashyap | mriedem: Was AFK as I was a bit "under the weather". To your question, no, I don't yet have a patch to remove MIN_LIBVIRT_POSTCOPY_VERSION (and a couple of other constants that I noted in the main commit message) | 19:53 |
kashyap | I'll get to it this week. As I noted I will fix them | 19:53 |
kashyap | mriedem: Curious, what made you notice it? Or just regular code audit caught your eye? | 19:53 |
*** wolverineav has joined #openstack-nova | 19:58 | |
*** READ10 has joined #openstack-nova | 19:58 | |
* kashyap will catch up tomm early (CEST) | 19:58 | |
openstackgerrit | Merged openstack/nova stable/stein: Add doc on VGPU allocs and inventories for nrp https://review.openstack.org/649454 | 19:59 |
mriedem | kashyap: i already pushed up a change, and i just noticed b/c i was looking at that post-copy code | 19:59 |
*** mriedem has quit IRC | 20:01 | |
*** sidx64 has quit IRC | 20:01 | |
*** mriedem has joined #openstack-nova | 20:06 | |
*** erlon has quit IRC | 20:07 | |
*** krypto has quit IRC | 20:08 | |
*** BjoernT has quit IRC | 20:10 | |
*** mrhillsman is now known as mrhillsman_bbiab | 20:11 | |
mriedem | efried: dansmith: i recreated that sighup privsep issue on rocky devstack so it's not a stein regression | 20:14 |
dansmith | "nice" | 20:16 |
*** igordc has quit IRC | 20:18 | |
*** awalende has joined #openstack-nova | 20:20 | |
*** whoami-rajat has quit IRC | 20:29 | |
*** READ10 has quit IRC | 20:34 | |
*** BjoernT has joined #openstack-nova | 20:44 | |
*** rcernin has joined #openstack-nova | 20:45 | |
*** BjoernT has quit IRC | 20:45 | |
*** artom has quit IRC | 20:47 | |
mriedem | heh, if an instance action fails, the message is always "Error" | 20:48 |
mriedem | that's it | 20:48 |
mriedem | the api says, "The related error message for when an action fails." - but that's really only 'Error' | 20:48 |
mriedem | very helpful | 20:48 |
*** wolverineav has quit IRC | 20:49 | |
*** BjoernT has joined #openstack-nova | 20:50 | |
openstackgerrit | Merged openstack/nova master: Libvirt: gracefully handle non-nic VFs https://review.openstack.org/649409 | 20:51 |
*** BjoernT has quit IRC | 20:54 | |
*** ralonsoh has quit IRC | 20:54 | |
*** BjoernT has joined #openstack-nova | 20:57 | |
*** spsurya has quit IRC | 20:59 | |
*** igordc has joined #openstack-nova | 20:59 | |
*** mmethot has quit IRC | 21:00 | |
*** BjoernT has quit IRC | 21:05 | |
mriedem | so this is probably going to be filed under a big pile of don't care, but when you resize a server, you get at least 2 events on the action in conductor, one here (conductor_migrate_server): https://github.com/openstack/nova/blob/e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed/nova/conductor/manager.py#L266 | 21:05 |
mriedem | and one here which is just called 'cold_migrate': https://github.com/openstack/nova/blob/e7ae6c65cd24fb3e0776fac80fbab2ab16e9d9ed/nova/conductor/manager.py#L288 | 21:05 |
mriedem | the latter is confusing if you're doing a resize and not a cold migration | 21:05 |
mriedem | and both together is redundant | 21:05 |
mriedem | the latter was here first though | 21:05 |
mriedem | former was added in newton | 21:06 |
mriedem | any validity in dropping the confusing 'cold_migrate' one? | 21:06 |
*** pcaruana has quit IRC | 21:06 | |
mriedem | note for a resize the action name is still 'resize' rather than (cold) 'migrate' | 21:06 |
*** BjoernT has joined #openstack-nova | 21:06 | |
*** owalsh has quit IRC | 21:07 | |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/stein: Libvirt: gracefully handle non-nic VFs https://review.openstack.org/649630 | 21:08 |
mriedem | efried: melwitt: ^ approved | 21:09 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix ProviderUsageBaseTestCase._run_periodics for multi-cell https://review.openstack.org/641179 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Improve CinderFixtureNewAttachFlow https://review.openstack.org/639382 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1818914 https://review.openstack.org/641521 | 21:12 |
openstack | bug 1818914 in OpenStack Compute (nova) "Hypervisor resource usage on source still shows old flavor usage after resize confirm until update_available_resource periodic runs" [Low,In progress] https://launchpad.net/bugs/1818914 - Assigned to Matt Riedemann (mriedem) | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove unused context parameter from RT._get_instance_type https://review.openstack.org/641792 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update usage in RT.drop_move_claim during confirm resize https://review.openstack.org/641806 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add Migration.cross_cell_move and get_by_uuid https://review.openstack.org/614012 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add InstanceAction/Event create() method https://review.openstack.org/614036 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add Instance.hidden field https://review.openstack.org/631123 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add TargetDBSetupTask https://review.openstack.org/627892 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellMigrationTask https://review.openstack.org/631581 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Execute TargetDBSetupTask https://review.openstack.org/633853 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add can_connect_volume() compute driver method https://review.openstack.org/621313 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_dest compute method https://review.openstack.org/633293 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtDestTask https://review.openstack.org/627890 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add prep_snapshot_based_resize_at_source compute method https://review.openstack.org/634832 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add nova.compute.utils.delete_image https://review.openstack.org/637605 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add PrepResizeAtSourceTask https://review.openstack.org/627891 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Refactor ComputeManager.remove_volume_connection https://review.openstack.org/642183 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add revert_snapshot_based_resize conductor RPC method https://review.openstack.org/638047 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Revert cross-cell resize from the API https://review.openstack.org/638048 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Confirm cross-cell resize while deleting a server https://review.openstack.org/638268 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add CrossCellWeigher https://review.openstack.org/614353 | 21:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add cross-cell resize policy rule and enable in API https://review.openstack.org/638269 | 21:12 |
*** awaugama has quit IRC | 21:13 | |
mriedem | dansmith: i added a test for this cross-cell resize issue i ran into yesterday: https://review.openstack.org/#/c/643451/5/nova/tests/functional/test_cross_cell_migrate.py@499 - tl;dr is i'm going to need to hard destroy the instance in the db, instance.destroy() won't cut it | 21:14 |
mriedem | for example, resize to target cell and then revert back to source, then try to resize again to target will fail (that's that test) because there is a (soft) deleted instance record in the target cell db still | 21:14 |
mriedem | and the uuid unique constraint will prevent us from creating the instance in the target cell db on the 2nd attempt | 21:15 |
mriedem | same for rollbacks on failure (hard destroy in target cell db) and confirm resize (hard destroy from source cell db) | 21:15 |
mriedem | i think it's probably ok, although it sounds kind of scary - but you'll always have a copy of the instance in one of the dbs when the operation is over, so we don't lose anything | 21:15 |
*** owalsh has joined #openstack-nova | 21:16 | |
mriedem | a real shitty alternative limitation/workaround is that it's just a no-go for cross-cell resizing that instance until the problem db is purged of the deleted instance | 21:16 |
*** eharney has joined #openstack-nova | 21:21 | |
melwitt | mriedem: thx | 21:30 |
efried | mriedem: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Deadlock%20found%20when%20trying%20to%20get%20lock%3B%20try%20restarting%20transaction%5C%22%20AND%20message%3A%5C%22UPDATE%20migrations%20SET%20updated_at%3D%25(updated_at)s%2C%20status%3D%25(status)s%20WHERE%20migrations.id%20%3D%20%25(migrations_id)s%5C%22 | 21:33 |
efried | looks... new? | 21:33 |
mriedem | efried: nope | 21:34 |
mriedem | https://bugs.launchpad.net/nova/+bug/1642537 | 21:35 |
openstack | Launchpad bug 1642537 in OpenStack Compute (nova) stein "finish_resize fails with DBDeadlock on migrations table" [Medium,In progress] - Assigned to Matt Riedemann (mriedem) | 21:35 |
dansmith | mriedem: you can't undelete it if you need to restore? | 21:36 |
efried | mriedem: not in e-r I guess? | 21:36 |
tonyb | Is there a configurable 'timeout' for server build operations? | 21:37 |
*** slaweq has quit IRC | 21:38 | |
mriedem | efried: used to be i think | 21:39 |
mriedem | dansmith: no....but restore how/where? | 21:39 |
* tonyb is trying to build a baremetal and the instance is hitting exception.MaxRetriesExceeded but the build doesn't actually fail or complete the node just gets powered down too soon | 21:39 | |
mriedem | like i said, you'd have a copy of the instance and related records in the other cell db | 21:39 |
dansmith | mriedem: I mean undelete if you need to revert | 21:39 |
dansmith | mriedem: look it up with deleted=yes, set deleted=0, save | 21:39 |
dansmith | you'll have to fix up things we don't soft-delete, but that seems easier | 21:40 |
dansmith | I think you probably do want to hard-delete it when you confirm though, so you don't have to worry about the was-it-in-this-cell-before case each time, | 21:40 |
mriedem | that's not really the issue, unless i'm misunderstanding | 21:40 |
dansmith | but I dunno, while we're waiting for confirm... | 21:40 |
dansmith | okay | 21:40 |
dansmith | oh, I see, | 21:41 |
mriedem | when you revert, the instance is in both dbs, | 21:41 |
melwitt | tonyb: I found this https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.instance_build_timeout | 21:41 |
dansmith | you're talking about the instance in the target cell that got reverted back to source, a week later when you try to migrate again? | 21:41 |
mriedem | yeah | 21:41 |
mriedem | we literally can't have the instance.deleted!=0 in the cell db we're moving to | 21:41 |
tonyb | melwitt: Thanks | 21:42 |
tonyb | (undercloud) [root@director config-data]# grep -Erin instance_build_timeout nova | 21:42 |
tonyb | nova/etc/nova/nova.conf:983:#instance_build_timeout=0 | 21:42 |
mriedem | heh i didn't know instance_build_timeout existed | 21:42 |
*** rcernin has quit IRC | 21:42 | |
tonyb | so I guess that isn't it | 21:42 |
mriedem | it's a periodic | 21:42 |
mriedem | looks like a hack for just setting instance status to ERROR for things that were hung or we didn't properly set to ERROR state on failure | 21:43 |
*** takashin has joined #openstack-nova | 21:43 | |
dansmith | mriedem: okay, I guess that seems obvious to me so I'm not sure why you're bringing it up as if it's a problem or something | 21:43 |
mriedem | dansmith: we just haven't had hard destroy on stuff in the cell db records, except tags i guess | 21:43 |
mriedem | so it seems new and scary | 21:43 |
melwitt | tonyb: oh yeah, hm | 21:43 |
dansmith | mmkay | 21:44 |
* tonyb wanders off and looks harder | 21:44 | |
mriedem | luckily theodoros already has a patch to add that support for his rebuild from cell0 series https://review.openstack.org/#/c/570202/ | 21:44 |
mriedem | which it turned out he later didn't need, but i will | 21:44 |
mriedem | anyway, i'll update the re-proposed spec to note it | 21:47 |
*** bbowen__ has quit IRC | 21:51 | |
*** slaweq has joined #openstack-nova | 21:55 | |
*** awalende has quit IRC | 21:59 | |
*** slaweq has quit IRC | 22:00 | |
*** bbowen__ has joined #openstack-nova | 22:00 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: document ordering for instance actions and events https://review.openstack.org/649748 | 22:06 |
*** wolverineav has joined #openstack-nova | 22:06 | |
*** weshay is now known as weshay|ruck | 22:08 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: fix description of os-server-external-events 'events' param https://review.openstack.org/649750 | 22:09 |
*** BjoernT has quit IRC | 22:12 | |
*** wolverineav has quit IRC | 22:12 | |
*** mlavalle has quit IRC | 22:15 | |
*** artom has joined #openstack-nova | 22:27 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Remove MIN_COMPUTE_MULTIATTACH conditions in API https://review.openstack.org/649757 | 22:32 |
*** bbowen__ has quit IRC | 22:33 | |
*** bbowen__ has joined #openstack-nova | 22:33 | |
*** nicolasbock has quit IRC | 22:34 | |
*** lbragstad has quit IRC | 22:49 | |
*** lbragstad has joined #openstack-nova | 22:50 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add nova-status upgrade check for minimum required cinder API version https://review.openstack.org/649759 | 22:51 |
efried | mriedem: where does grenade source live? | 23:01 |
*** wolverineav has joined #openstack-nova | 23:02 | |
mriedem | http://git.openstack.org/cgit/openstack-dev/grenade/ | 23:03 |
*** tkajinam has joined #openstack-nova | 23:04 | |
*** slaweq has joined #openstack-nova | 23:06 | |
*** wolverineav has quit IRC | 23:06 | |
*** slaweq has quit IRC | 23:10 | |
*** wolverineav has joined #openstack-nova | 23:56 | |
*** brinzhang has joined #openstack-nova | 23:58 | |
*** tbachman has quit IRC | 23:58 |