*** frankwang has quit IRC | 00:06 | |
*** markvoelker has quit IRC | 00:07 | |
*** markvoelker has joined #openstack-nova | 00:07 | |
*** _hemna has quit IRC | 00:08 | |
*** frankwang has joined #openstack-nova | 00:10 | |
*** markvoelker has quit IRC | 00:12 | |
*** jaypipes has quit IRC | 00:15 | |
*** spatel has joined #openstack-nova | 00:21 | |
*** tetsuro has joined #openstack-nova | 00:43 | |
*** rcernin has joined #openstack-nova | 00:45 | |
*** rcernin has quit IRC | 00:45 | |
*** rcernin has joined #openstack-nova | 00:45 | |
*** lbragstad has quit IRC | 00:58 | |
*** spsurya has joined #openstack-nova | 01:01 | |
*** spatel has quit IRC | 01:01 | |
*** slaweq has quit IRC | 01:07 | |
*** rcernin has quit IRC | 01:21 | |
*** rcernin has joined #openstack-nova | 01:21 | |
*** itlinux has joined #openstack-nova | 01:25 | |
*** guozijn has joined #openstack-nova | 01:27 | |
*** guozijn has quit IRC | 01:28 | |
*** dave-mccowan has joined #openstack-nova | 01:31 | |
*** guozijn has joined #openstack-nova | 01:33 | |
*** mriedem has quit IRC | 01:38 | |
*** hongbin has joined #openstack-nova | 01:55 | |
*** boxiang_ has quit IRC | 01:59 | |
*** boxiang_ has joined #openstack-nova | 01:59 | |
*** _hemna has joined #openstack-nova | 02:04 | |
*** Sundar has joined #openstack-nova | 02:07 | |
*** whoami-rajat has joined #openstack-nova | 02:07 | |
*** markvoelker has joined #openstack-nova | 02:08 | |
*** guozijn has quit IRC | 02:10 | |
*** takashin has joined #openstack-nova | 02:12 | |
alex_xu | sean-k-mooney: thanks | 02:17 |
yonglihe | sean-k-mooney: thanks | 02:18 |
*** _hemna has quit IRC | 02:38 | |
*** markvoelker has quit IRC | 02:42 | |
*** Sundar has quit IRC | 02:44 | |
*** JamesBenson has joined #openstack-nova | 02:44 | |
*** _hemna has joined #openstack-nova | 02:59 | |
*** Sundar has joined #openstack-nova | 03:03 | |
*** frankwang has quit IRC | 03:07 | |
*** frankwang has joined #openstack-nova | 03:07 | |
openstackgerrit | XiaojueGuan proposed openstack/nova master: Fix code intendent of file wsgi.py https://review.opendev.org/663487 | 03:09 |
*** brinzhang has quit IRC | 03:09 | |
*** _hemna has quit IRC | 03:14 | |
*** tetsuro has quit IRC | 03:22 | |
*** BjoernT has quit IRC | 03:31 | |
*** dikonoor has joined #openstack-nova | 03:41 | |
*** _hemna has joined #openstack-nova | 03:41 | |
*** dave-mccowan has quit IRC | 03:50 | |
*** frankwang has quit IRC | 03:53 | |
*** tetsuro has joined #openstack-nova | 03:56 | |
*** udesale has joined #openstack-nova | 04:00 | |
*** brinzhang has joined #openstack-nova | 04:06 | |
*** hongbin has quit IRC | 04:11 | |
*** _hemna has quit IRC | 04:15 | |
*** Sundar has quit IRC | 04:27 | |
*** pcaruana has joined #openstack-nova | 04:50 | |
*** dikonoor has quit IRC | 04:59 | |
*** tkajinam has quit IRC | 05:00 | |
*** tetsuro has quit IRC | 05:08 | |
*** dikonoor has joined #openstack-nova | 05:12 | |
*** ratailor has joined #openstack-nova | 05:12 | |
*** boxiang_ has quit IRC | 05:13 | |
*** boxiang_ has joined #openstack-nova | 05:14 | |
*** _hemna has joined #openstack-nova | 05:29 | |
*** janki has joined #openstack-nova | 05:30 | |
*** luksky has joined #openstack-nova | 05:34 | |
openstackgerrit | Merged openstack/nova master: Hide hypervisor id on windows guests https://review.opendev.org/579897 | 05:35 |
*** frankwang has joined #openstack-nova | 05:48 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove an unused method https://review.opendev.org/663502 | 05:48 |
*** janki has quit IRC | 05:49 | |
*** dtantsur|afk is now known as dtantsur | 05:49 | |
*** janki has joined #openstack-nova | 05:49 | |
*** janki has quit IRC | 05:50 | |
*** janki has joined #openstack-nova | 05:51 | |
*** luksky has quit IRC | 05:53 | |
*** damien_r has joined #openstack-nova | 05:56 | |
*** damien_r has quit IRC | 06:00 | |
*** tkajinam has joined #openstack-nova | 06:00 | |
*** _hemna has quit IRC | 06:03 | |
*** lpetrut has joined #openstack-nova | 06:03 | |
*** liuyulong_ has joined #openstack-nova | 06:07 | |
*** slaweq has joined #openstack-nova | 06:14 | |
*** xek has joined #openstack-nova | 06:16 | |
*** Spencer_Yu has joined #openstack-nova | 06:18 | |
*** udesale has quit IRC | 06:21 | |
*** udesale has joined #openstack-nova | 06:21 | |
*** dpawlik has joined #openstack-nova | 06:22 | |
*** maciejjozefczyk_ has joined #openstack-nova | 06:30 | |
*** dklyle has quit IRC | 06:38 | |
*** dklyle has joined #openstack-nova | 06:38 | |
*** maciejjozefczyk_ is now known as maciejjozefczyk | 06:48 | |
*** udesale has quit IRC | 06:49 | |
*** udesale has joined #openstack-nova | 06:50 | |
*** tetsuro has joined #openstack-nova | 06:55 | |
*** tetsuro has quit IRC | 07:00 | |
*** gyee has quit IRC | 07:04 | |
*** Spencer_Yu has quit IRC | 07:06 | |
*** frankwang has quit IRC | 07:07 | |
*** damien_r has joined #openstack-nova | 07:09 | |
*** damien_r has quit IRC | 07:10 | |
*** _hemna has joined #openstack-nova | 07:12 | |
*** tesseract has joined #openstack-nova | 07:12 | |
*** damien_r has joined #openstack-nova | 07:13 | |
*** rcernin has quit IRC | 07:14 | |
*** frankwang has joined #openstack-nova | 07:15 | |
*** _hemna has quit IRC | 07:16 | |
*** brault has joined #openstack-nova | 07:17 | |
*** tssurya has joined #openstack-nova | 07:17 | |
*** ratailor has quit IRC | 07:18 | |
*** ratailor has joined #openstack-nova | 07:18 | |
*** rpittau|afk is now known as rpittau | 07:19 | |
*** rnoriega has quit IRC | 07:22 | |
*** weshay has quit IRC | 07:22 | |
*** weshay has joined #openstack-nova | 07:23 | |
*** rnoriega has joined #openstack-nova | 07:23 | |
*** tetsuro has joined #openstack-nova | 07:25 | |
*** janki has quit IRC | 07:27 | |
*** janki has joined #openstack-nova | 07:27 | |
*** janki has quit IRC | 07:28 | |
openstackgerrit | Sharat Sharma proposed openstack/nova master: Modifying install-guide to include public endpoint for identity service https://review.opendev.org/663530 | 07:32 |
*** ttsiouts has joined #openstack-nova | 07:43 | |
*** brault has quit IRC | 07:56 | |
*** ratailor_ has joined #openstack-nova | 07:56 | |
*** brault has joined #openstack-nova | 07:57 | |
*** helenafm has joined #openstack-nova | 07:58 | |
*** ralonsoh has joined #openstack-nova | 07:58 | |
*** ttsiouts has quit IRC | 07:58 | |
*** markvoelker has joined #openstack-nova | 07:58 | |
*** ratailor has quit IRC | 07:59 | |
*** ttsiouts has joined #openstack-nova | 07:59 | |
*** takashin has left #openstack-nova | 08:00 | |
*** udesale has quit IRC | 08:01 | |
*** ratailor__ has joined #openstack-nova | 08:01 | |
*** brault has quit IRC | 08:01 | |
*** udesale has joined #openstack-nova | 08:03 | |
*** ttsiouts has quit IRC | 08:03 | |
*** ratailor_ has quit IRC | 08:03 | |
*** ttsiouts has joined #openstack-nova | 08:07 | |
*** ociuhandu has joined #openstack-nova | 08:07 | |
*** ociuhandu has quit IRC | 08:07 | |
*** ociuhandu has joined #openstack-nova | 08:08 | |
*** kashyap has joined #openstack-nova | 08:14 | |
*** tetsuro has quit IRC | 08:15 | |
kashyap | alex_xu: Want to put this through? I already had your +2 on it (and it now has +2 from Eric): https://review.opendev.org/#/c/661574/ | 08:19 |
*** luksky has joined #openstack-nova | 08:20 | |
alex_xu | kashyap: checking now | 08:22 |
lyarwood | mdbooth: morning, I think I've worked out the test_show_update_rebuild_list_server RDO failure. It's a simple race: when the test passes, it isn't actually verifying the image, because other tests have already downloaded it. | 08:22 |
*** ttsiouts has quit IRC | 08:22 | |
mdbooth | lyarwood: Ouch! | 08:23 |
lyarwood | mdbooth: I can reproduce it easily: running tempest.api.compute.servers.test_servers, everything passes; running only tempest.api.compute.servers.test_servers.ServerShowV263Test, the test fails | 08:23 |
mdbooth | lyarwood: Also, that's awesome sleuthing | 08:23 |
*** ttsiouts has joined #openstack-nova | 08:23 | |
lyarwood | mdbooth: ta, the DNM change helped by not actually printing anything in the logs | 08:23 |
lyarwood | a true W T F moment again this morning | 08:23 |
* lyarwood writes this up in a bug | 08:24 | |
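lyarwood's diagnosis comes down to ordering: if validation only happens when an image is downloaded, a test that reuses an image another test already fetched never exercises the validation path. So it passes as part of the whole module and fails when run alone. A minimal sketch, assuming validation runs only on a cache miss; `ImageCache`, `fetch` and `_validate` are hypothetical names, not nova's image cache API:

```python
# Hypothetical model of the race described above; ImageCache and its
# methods are invented for illustration and are not nova's image cache.


class ImageCache:
    """Caches images by id and validates them only when downloading."""

    def __init__(self):
        self._cached = set()
        self.validations = 0  # number of times validation actually ran

    def _validate(self, image_id):
        # Stand-in for trusted-certificate signature verification.
        self.validations += 1

    def fetch(self, image_id):
        if image_id in self._cached:
            # Cache hit: the image is reused and never re-validated.
            return "cache-hit"
        # Cache miss: download, validate, then remember the image.
        self._validate(image_id)
        self._cached.add(image_id)
        return "downloaded"


if __name__ == "__main__":
    cache = ImageCache()
    cache.fetch("default-image")  # an earlier test in the same run
    # The trusted-certs test now hits the cache and skips validation.
    print(cache.fetch("default-image"), cache.validations)
```

Running the trusted-certs test in isolation corresponds to the first `fetch` call, which is the only one that actually validates.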
*** udesale has quit IRC | 08:24 | |
*** ttsiouts has quit IRC | 08:27 | |
*** tetsuro has joined #openstack-nova | 08:28 | |
*** ratailor__ has quit IRC | 08:30 | |
*** ratailor has joined #openstack-nova | 08:31 | |
*** markvoelker has quit IRC | 08:32 | |
*** brault has joined #openstack-nova | 08:34 | |
*** ratailor_ has joined #openstack-nova | 08:36 | |
*** ratailor has quit IRC | 08:38 | |
*** derekh has joined #openstack-nova | 08:41 | |
*** ttsiouts has joined #openstack-nova | 08:42 | |
*** ociuhandu has quit IRC | 08:44 | |
openstackgerrit | Liang Fang proposed openstack/nova master: [WIP] Leverage OCF cache framework for VM disks https://review.opendev.org/663542 | 08:46 |
*** brault has quit IRC | 08:47 | |
*** udesale has joined #openstack-nova | 08:51 | |
*** davidsha has joined #openstack-nova | 08:58 | |
*** liuyulong_ has quit IRC | 09:00 | |
*** tetsuro has quit IRC | 09:01 | |
*** tkajinam has quit IRC | 09:01 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: allow getting resource request of every bound ports of an instance https://review.opendev.org/655110 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Pass network API to the conducor's MigrationTask https://review.opendev.org/655111 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add request_spec to server move RPC calls https://review.opendev.org/655721 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: re-calculate provider mapping during migration https://review.opendev.org/655112 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: update allocation in binding profile during migrate https://review.opendev.org/656422 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Extend NeutronFixture to handle migrations https://review.opendev.org/655114 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: prepare func test env for moving servers with bandwidth https://review.opendev.org/655109 | 09:04 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: func test for migrate server with ports having resource request https://review.opendev.org/655113 | 09:04 |
openstackgerrit | Alvaro Lopez Garcia proposed openstack/nova master: Ensure that periodic reclaim cleans DB deleted instances https://review.opendev.org/323250 | 09:07 |
bauzas | lyarwood: FWIW, I'm categorizing https://bugs.launchpad.net/nova/+bug/1831538 as High since 'q35' isn't the gate default machine type | 09:09 |
openstack | Launchpad bug 1831538 in OpenStack Compute (nova) "IDE config drive CDROM doesn't work with q35 machine type" [High,In progress] - Assigned to Lee Yarwood (lyarwood) | 09:09 |
lyarwood | bauzas: ack that's fair | 09:10 |
bauzas | lyarwood: but if we consider 'q35' to be more useful to our users than 'pc', I think https://review.opendev.org/#/c/662887/ is much appreciated | 09:10 |
kashyap | bauzas: It's not necessarily "more useful"; in some cases the 'pc' chipset may be precisely what the user wants | 09:11 |
lyarwood | bauzas: yeah I spoke to artom yesterday about potentially dropping DNM from that | 09:11 |
*** tetsuro has joined #openstack-nova | 09:11 | |
bauzas | lyarwood: https://review.opendev.org/#/c/662887/ is still getting -1 from Zuul | 09:11 |
lyarwood | bauzas: it's an unrelated failure | 09:11 |
kashyap | bauzas: But in general, for new guests, we recommend 'q35'. | 09:11 |
lyarwood | bauzas: I rechecked this morning | 09:11 |
bauzas | oh sorry | 09:12 |
*** _hemna has joined #openstack-nova | 09:12 | |
bauzas | kashyap: yeah i don't disagree, that's why we leave machine types to be configurable | 09:12 |
*** priteau has joined #openstack-nova | 09:13 | |
bauzas | kashyap: here I'm talking of choosing a default value for most users, and it looks to me 'q35' gives more benefits than 'pc' | 09:13 |
bauzas | so, technically, we should have it in the gate | 09:13 |
kashyap | Yep; also one reason I avoided changing the default in Nova (https://bugs.launchpad.net/nova/+bug/1780138), because we've delegated that decision to the orchestrator for now. | 09:13 |
openstack | Launchpad bug 1780138 in OpenStack Compute (nova) "Don't assume the guest machine type to be of 'pc'" [Medium,Confirmed] - Assigned to Kashyap Chamarthy (kashyapc) | 09:13 |
kashyap | [We may revisit it later] | 09:13 |
kashyap | bauzas: Yeah, agreed -- on testing in Gate. | 09:14 |
openstackgerrit | Merged openstack/nova master: Document mitigation for Intel MDS security flaws https://review.opendev.org/661574 | 09:16 |
*** tetsuro has quit IRC | 09:20 | |
*** jistr is now known as jistr|lnl | 09:28 | |
*** luksky has quit IRC | 09:28 | |
*** markvoelker has joined #openstack-nova | 09:29 | |
*** janki has joined #openstack-nova | 09:29 | |
*** abhishekk has joined #openstack-nova | 09:31 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Add flavor metadata or metadata group https://review.opendev.org/663563 | 09:35 |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Add flavor metadata or metadata group https://review.opendev.org/663563 | 09:37 |
*** boxiang_ has quit IRC | 09:39 | |
*** boxiang_ has joined #openstack-nova | 09:40 | |
*** tetsuro has joined #openstack-nova | 09:43 | |
*** _hemna has quit IRC | 09:46 | |
openstackgerrit | Dongcan Ye proposed openstack/nova master: Raise BuildAbortException while updating instance task_state conflict https://review.opendev.org/633160 | 09:47 |
*** luksky has joined #openstack-nova | 10:00 | |
*** ociuhandu has joined #openstack-nova | 10:01 | |
*** markvoelker has quit IRC | 10:03 | |
*** brinzhang has quit IRC | 10:06 | |
*** ttsiouts has quit IRC | 10:14 | |
*** ttsiouts has joined #openstack-nova | 10:14 | |
*** dikonoor has quit IRC | 10:17 | |
*** ttsiouts has quit IRC | 10:19 | |
*** sapd1_x has joined #openstack-nova | 10:22 | |
*** abhishekk has quit IRC | 10:26 | |
*** ivve has joined #openstack-nova | 10:27 | |
*** frankwang has quit IRC | 10:33 | |
*** frankwang has joined #openstack-nova | 10:33 | |
openstackgerrit | Sharat Sharma proposed openstack/nova master: "SUSPENDED" description changed in server_concepts guide and API REF https://review.opendev.org/663590 | 10:34 |
*** priteau has quit IRC | 10:35 | |
*** bbowen has quit IRC | 10:42 | |
*** tetsuro has quit IRC | 10:45 | |
*** tbachman has quit IRC | 10:47 | |
mdbooth | lyarwood: Re: https://review.opendev.org/#/c/663596/ has that test ever worked? | 10:53 |
mdbooth | I mean, it sorta worked, but only because we weren't doing image validation, right? | 10:53 |
lyarwood | mdbooth: correct | 10:55 |
*** brault has joined #openstack-nova | 10:56 | |
lyarwood | mdbooth: so that change disables it by default and leaves some configurables we can wire up over in https://review.opendev.org/#/c/515210/ if we wanted to | 10:56 |
lyarwood | mdbooth: just respinning to make the requirements more clear and link https://review.opendev.org/#/c/515210/ in the commit message | 10:58 |
*** frankwang has quit IRC | 10:58 | |
mdbooth | lyarwood: Even with that change though, the test is still broken right? | 10:58 |
mdbooth | Because if the image has been previously cached by another test it won't actually run. | 10:59 |
lyarwood | mdbooth: if a valid image and trusted certs are provided then no | 10:59 |
mdbooth | s/run/validate/ | 10:59 |
*** markvoelker has joined #openstack-nova | 10:59 | |
lyarwood | mdbooth: it's no longer using the default image that wouldn't work | 10:59 |
mdbooth | It will pass, but it won't have tested image validation. | 10:59 |
mdbooth | Ah... you also updated the ref | 10:59 |
lyarwood | mdbooth: it should | 10:59 |
lyarwood | yeah | 11:00 |
lyarwood | it should cause a download etc | 11:00 |
mdbooth | lyarwood: Sorry, I was being sloppy and didn't read that far. | 11:00 |
lyarwood | the test isn't specifically looking at that btw | 11:00 |
lyarwood | that's more to just avoid failures | 11:00 |
lyarwood | the barbican plugin is really testing all of that | 11:00 |
openstackgerrit | Sharat Sharma proposed openstack/nova master: [Docs] Update the confusing console output https://review.opendev.org/589004 | 11:00 |
*** brault has quit IRC | 11:01 | |
*** takamatsu has quit IRC | 11:01 | |
*** takamatsu has joined #openstack-nova | 11:02 | |
*** rafaelweingartne has joined #openstack-nova | 11:04 | |
*** jistr|lnl is now known as jistr | 11:18 | |
kashyap | sean-k-mooney: So, for the PCIe root ports, default to 32 / max 32 -- yeah? Based on DanPB's tests? | 11:22 |
sean-k-mooney | yes | 11:23 |
*** udesale has quit IRC | 11:23 | |
kashyap | Thx | 11:27 |
openstackgerrit | Edward Hope-Morley proposed openstack/nova master: Fix python3 compatibility of rbd get_fsid https://review.opendev.org/663607 | 11:28 |
*** markvoelker has quit IRC | 11:32 | |
*** _hemna has joined #openstack-nova | 11:42 | |
*** ttsiouts has joined #openstack-nova | 11:47 | |
kashyap | git fetch gerrit | 11:49 |
kashyap | Oops | 11:50 |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: [WIP] libvirt: Update the default number of PCIe root ports to 32 https://review.opendev.org/663614 | 11:50 |
kashyap | sean-k-mooney: ^ Still need to reword the commit, and tweak the test, perhaps | 11:50 |
kashyap | (And also to see if any other code path is affected) | 11:50 |
*** ratailor_ has quit IRC | 11:51 | |
sean-k-mooney | kashyap: ya, just commented on the test: you should leave it at 8, as you are testing non-default config values | 11:54 |
kashyap | sean-k-mooney: Yeah, you're right. (Aside: Instead of "Update", I'd use the phrase "Preallocate" -- as that captures the intention more correctly?) | 11:55 |
kashyap | sean-k-mooney: I stole the preallocate word from you | 11:55 |
sean-k-mooney | I didn't look into the test fully, but I suspect we want to test the default, the case where it's set to 0, and the case where it's set to a non-default, non-zero value | 11:56 |
kashyap | Right, let me twiddle | 11:57 |
openstackgerrit | Vlad Gusev proposed openstack/nova stable/stein: Hide hypervisor id on windows guests https://review.opendev.org/663616 | 11:57 |
sean-k-mooney | well, we are updating the config value to preallocate the PCIe ports so they are available for hotplug when needed | 11:57 |
kashyap | (Nod) | 11:58 |
*** dikonoor has joined #openstack-nova | 11:59 | |
*** lpetrut has quit IRC | 12:00 | |
*** lpetrut has joined #openstack-nova | 12:00 | |
*** jaypipes has joined #openstack-nova | 12:01 | |
*** tetsuro has joined #openstack-nova | 12:02 | |
*** tetsuro has quit IRC | 12:03 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: Fix python3 compatibility of rbd get_fsid https://review.opendev.org/635220 | 12:06 |
lyarwood | mdbooth: ^ FYI just stumbled across this | 12:07 |
mdbooth | lyarwood: Interesting. I wonder why we haven't hit that, yet. | 12:09 |
artom | lyarwood, yeah, a ML post needs to happen about that | 12:10 |
artom | (that = q35) | 12:10 |
*** bbowen has joined #openstack-nova | 12:12 | |
*** tbachman has joined #openstack-nova | 12:13 | |
*** eharney has quit IRC | 12:14 | |
mdbooth | lyarwood: I wonder if we've already pulled in the ceph fix downstream? | 12:14 |
kashyap | artom: I have a half draft sitting; will send something "soon" | 12:14 |
artom | kashyap, appreciated :) | 12:14 |
kashyap | artom: Based on the description notes here: https://bugs.launchpad.net/nova/+bug/1780138 | 12:14 |
openstack | Launchpad bug 1780138 in OpenStack Compute (nova) "Don't assume the guest machine type to be of 'pc'" [Medium,Confirmed] - Assigned to Kashyap Chamarthy (kashyapc) | 12:14 |
lyarwood | mdbooth: it would be pretty transparent if we hadn't | 12:15 |
mdbooth | Presumably snapshot would be failing on ceph | 12:15 |
mdbooth | Why isn't the upstream gate broken? | 12:15 |
lyarwood | upstream gate is py2 | 12:15 |
* mdbooth guesses no py3/ceph testing | 12:15 | |
lyarwood | right, actually ceph is still nv on py2 | 12:16 |
*** lbragstad has joined #openstack-nova | 12:16 | |
lyarwood | ceph py3 is in the experimental queue | 12:16 |
mdbooth | lyarwood: Ack. So this looks like a fix we approve of, but probably not a downstream blocker. | 12:16 |
mdbooth | Should be high priority upstream, though. | 12:17 |
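The log does not spell out what the incompatibility in "Fix python3 compatibility of rbd get_fsid" is. A common Python 3 pitfall that fits the patch title, and an assumption here, is librados returning the fsid as bytes: comparing it against a text fsid then silently evaluates to False on Python 3 while working fine on the py2-only gate discussed above. A minimal sketch of normalizing before comparing; `normalize_fsid` and `same_cluster` are hypothetical helpers, not nova's rbd_utils API:

```python
# Hypothetical sketch; normalize_fsid and same_cluster are illustrative
# helpers, not nova's rbd_utils API.


def normalize_fsid(fsid):
    """Return a Ceph cluster fsid as text, whether given bytes or str."""
    if isinstance(fsid, bytes):
        return fsid.decode("utf-8")
    return fsid


def same_cluster(reported_fsid, location_fsid):
    # On Python 3, b"abc" == "abc" is False, so compare normalized text.
    return normalize_fsid(reported_fsid) == normalize_fsid(location_fsid)
```

On Python 2 `bytes` is `str`, so both sides are decoded to unicode and the comparison still behaves.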
*** _hemna has quit IRC | 12:17 | |
*** claudiub has joined #openstack-nova | 12:20 | |
*** derekh has quit IRC | 12:21 | |
*** trident has quit IRC | 12:21 | |
*** trident has joined #openstack-nova | 12:26 | |
*** dave-mccowan has joined #openstack-nova | 12:28 | |
*** markvoelker has joined #openstack-nova | 12:29 | |
*** jchhatbar has joined #openstack-nova | 12:36 | |
*** janki has quit IRC | 12:38 | |
*** jchhatbar has quit IRC | 12:49 | |
kashyap | sean-k-mooney: So I dug through the upstream logs of the #virt channel, and the "max 28" thing for PCIe ports came from this: | 12:51 |
*** pcaruana has quit IRC | 12:52 | |
kashyap | [For aarch64, *apparently*:] | 12:52 |
kashyap | <paste> | 12:53 |
kashyap | 13:23 < hrw> ok. 28 pcie-root-port entries are maximum | 12:53 |
kashyap | 13:23 < hrw> 29 == uefi dumps to shell instead of booting | 12:53 |
kashyap | 13:25 < abologna> hrw: that might be a bug rather than an actual limit, but there's a limit of 256 on... something? possibly devices, each pcie-root-port has 8 functions so that would be 31 to get to 256 devices | 12:53 |
kashyap | 13:25 < abologna> hrw: except pcie-root-ports are of course devices themselves | 12:53 |
kashyap | 13:25 < abologna> hrw: and even after plugging in 28 pcie-root-ports some of pcie-root's slots and functions will be empty | 12:53 |
kashyap | 13:26 < hrw> will discuss that with our uefi developers ;D | 12:53 |
kashyap | </end-paste-spam> | 12:53 |
kashyap | [That snippet is from Feb 2018, BTW] | 12:53 |
sean-k-mooney | ok, so we should leave the max at 28 and just set the default to 28 | 12:53 |
kashyap | sean-k-mooney: I'm wondering if I should go down the rabbit hole and test with an AArch64 guest, to see what the current limitation is? | 12:54 |
kashyap | sean-k-mooney: As abologna, the author of https://libvirt.org/pci-hotplug.html, seems to imply the 28 is not even a valid limit. | 12:55 |
sean-k-mooney | kashyap: probably not, as we can't assume they are using a newer version of QEMU, or of whatever imposes the 28 limit on aarch64 | 12:55 |
kashyap | sean-k-mooney: The thing is, we're not sure whether 28 is an _actual_ limit or not. Nobody seems to have confirmed it. | 12:56 |
sean-k-mooney | if you want to grab an aarch64 guest image and test, then sure | 12:57 |
sean-k-mooney | but we should play it safe with the default | 12:57 |
kashyap | Yeah, let me do the test. I want to be sure. | 12:57 |
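Pulling the review thread together: sean-k-mooney wants the unit test to cover the default, 0, and a non-default non-zero value (the existing 8), and the safe cap settled on here is 28. A minimal sketch of that validation and the three test cases, assuming 0 means "add no root ports explicitly and let libvirt decide" (my reading of the `[libvirt] num_pcie_ports` option, not confirmed in this log); `MAX_PCIE_PORTS` and `pcie_root_port_controllers` are hypothetical names, not nova's driver code:

```python
# Hypothetical sketch; MAX_PCIE_PORTS and pcie_root_port_controllers are
# illustrative names, not nova's libvirt driver code.

MAX_PCIE_PORTS = 28  # the aarch64/UEFI limit quoted in this log
ROOT_PORT_XML = "<controller type='pci' model='pcie-root-port'/>"


def pcie_root_port_controllers(num_pcie_ports):
    """Return libvirt <controller> elements to preallocate for hotplug.

    A value of 0 adds nothing explicitly and leaves the decision to libvirt.
    """
    if not 0 <= num_pcie_ports <= MAX_PCIE_PORTS:
        raise ValueError(
            "num_pcie_ports must be between 0 and %d" % MAX_PCIE_PORTS)
    return [ROOT_PORT_XML] * num_pcie_ports
```

Keeping the cap as a single constant means the unit tests for the default, 0, and 8 do not need to change if the agreed maximum moves between 28 and 32.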
*** brault has joined #openstack-nova | 12:59 | |
*** dikonoor has quit IRC | 13:01 | |
*** markvoelker has quit IRC | 13:03 | |
*** brault has quit IRC | 13:03 | |
*** derekh has joined #openstack-nova | 13:06 | |
stephenfin | gibi: Any chance you could take a look at https://review.opendev.org/#/c/660774/ again today and hit me up if you need more info? | 13:07 |
gibi | stephenfin: give me an hour | 13:07 |
stephenfin | ta | 13:08 |
gibi | stephenfin: I'm on a meeting now | 13:08 |
stephenfin | All good. Thanks :) | 13:08 |
*** bnemec has joined #openstack-nova | 13:11 | |
kashyap | Does anyone here (or know someone who) uses OpenStack on AArch64? | 13:22 |
* kashyap will expect crickets to chirp | 13:22 | |
sean-k-mooney | there are some folks from Linaro who hang out in the #openstack-kolla IRC channel from time to time | 13:23 |
*** mriedem has joined #openstack-nova | 13:23 | |
stephenfin | kashyap: Might want to talk to tonyb | 13:25 |
kashyap | stephenfin: Yep, noted | 13:26 |
kashyap | sean-k-mooney: Yeah, I was looking for 'hrw' | 13:27 |
kashyap | sean-k-mooney: What a beast this whole PCIe saga is | 13:29 |
*** mloza has joined #openstack-nova | 13:29 | |
*** udesale has joined #openstack-nova | 13:29 | |
*** tbachman has quit IRC | 13:29 | |
*** BjoernT has joined #openstack-nova | 13:30 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Remove 'Debug' middleware https://review.opendev.org/662506 | 13:30 |
sean-k-mooney | kashyap: well, it's in good hands. We could just do nothing and leave it to the operator to set the config for their needs, which is what nova originally opted to do | 13:32 |
sean-k-mooney | but if there is a sane default then that is also a good outcome too | 13:33 |
sean-k-mooney | brb | 13:33 |
kashyap | sean-k-mooney: Yeah, indeed. But was just exclaiming about the general subtlety involved here... | 13:33 |
*** Sundar has joined #openstack-nova | 13:34 | |
*** ttsiouts has quit IRC | 13:35 | |
*** ttsiouts has joined #openstack-nova | 13:36 | |
*** ttsiouts has quit IRC | 13:40 | |
lyarwood | mriedem / efried: https://review.opendev.org/#/c/663011/ - morning, if you have time today, I'm looking for some non-RH core review on this libvirt-specific bugfix. There's a change on top of this that's testing the q35 machine type in the gate. I'm looking into the extend volume failure at the moment. | 13:41 |
efried | lyarwood: looking | 13:41 |
*** ttsiouts has joined #openstack-nova | 13:47 | |
*** whoami-rajat has quit IRC | 13:47 | |
*** jaosorior has joined #openstack-nova | 13:47 | |
*** jaosorior has quit IRC | 13:51 | |
efried | lyarwood: Are you looking to backport this? | 13:52 |
lyarwood | efried: only to stable/stein | 13:53 |
lyarwood | and the last I checked it was still clean | 13:53 |
lyarwood | just | 13:53 |
efried | orly? | 13:54 |
lyarwood | yeah I know right | 13:54 |
*** frankwang has joined #openstack-nova | 13:54 | |
efried | I thought I'd seen a bunch of twiddling of test_driver that seems unavoidable considering how many places you hit | 13:54 |
openstackgerrit | Lee Yarwood proposed openstack/nova stable/stein: libvirt: Use SATA bus for cdrom devices when using Q35 machine type https://review.opendev.org/663677 | 13:55 |
lyarwood | ^ just to show I'm not making it up | 13:55 |
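Bug 1831538 ("IDE config drive CDROM doesn't work with q35 machine type") and the fix title ("Use SATA bus for cdrom devices when using Q35 machine type") describe a simple rule: q35 guests have no IDE controller (the chipset provides AHCI/SATA instead), so the CD-ROM has to go on SATA. A minimal sketch of that rule; `cdrom_bus_for` is a hypothetical helper, not nova's blockinfo code, and matching "q35" as a substring of versioned names such as `pc-q35-3.1` is an assumption:

```python
# Hypothetical helper; not nova's blockinfo code.


def cdrom_bus_for(machine_type, default_bus="ide"):
    """Pick a disk bus for a CD-ROM device given the guest machine type."""
    # q35 guests have no IDE controller (they get an AHCI/SATA controller
    # instead), so an IDE CD-ROM cannot be attached there.
    if machine_type and "q35" in machine_type:
        return "sata"
    # i440fx-based 'pc' guests (and unknown types) keep the old default.
    return default_bus
```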
stephenfin | lyarwood: comments incoming on that, btw | 13:57 |
stephenfin | efried: too | 13:57 |
lyarwood | stephenfin: thanks | 13:57 |
*** whoami-rajat has joined #openstack-nova | 13:57 | |
*** ricolin has joined #openstack-nova | 13:57 | |
*** frankwang has quit IRC | 13:58 | |
stephenfin | lyarwood: done | 13:59 |
*** markvoelker has joined #openstack-nova | 13:59 | |
stephenfin | Apologies if you explored that idea already but I couldn't see any comments suggesting it | 13:59 |
openstackgerrit | Arnaud Morin proposed openstack/nova master: Refresh instance network info on deletion https://review.opendev.org/660761 | 14:01 |
*** tbachman has joined #openstack-nova | 14:01 | |
efried | lyarwood: It's early, but I think you changed the logic of the extracted method. | 14:02 |
*** ttsiouts has quit IRC | 14:05 | |
*** brault has joined #openstack-nova | 14:06 | |
*** ttsiouts has joined #openstack-nova | 14:06 | |
gibi | exit | 14:07 |
dansmith | no | 14:08 |
gibi | :) | 14:09 |
efried | you can check out any time you like, | 14:09 |
efried | but you can never leeeeaaave | 14:09 |
gibi | sorry | 14:09 |
dansmith | efried: kinda infringing on mriedem's turf there buddy | 14:10 |
gibi | stephenfin: which test case was too brittle to change in https://review.opendev.org/#/c/660774 ? | 14:10 |
*** ttsiouts has quit IRC | 14:11 | |
stephenfin | gibi: | 14:11 |
stephenfin | * tbachman (~tbachman@128.107.241.188) has joined #openstack-nova | 14:11 |
stephenfin | <efried> lyarwood: It's early, but I think you changed the logic of the extracted method. | 14:11 |
stephenfin | * ttsiouts has quit (Remote host closed the connection) | 14:11 |
stephenfin | * brault (~brault@lfbn-1-9197-156.w86-238.abo.wanadoo.fr) has joined #openstack-nova | 14:11 |
stephenfin | * ttsiouts (~ttsiouts@2001:1458:204:1::101:9145) has joined #openstack-nova | 14:11 |
stephenfin | damn you HexChat | 14:11 |
*** brault has quit IRC | 14:11 | |
stephenfin | gibi: https://review.opendev.org/#/c/660774/3/nova/tests/unit/compute/test_compute.py@12875 | 14:11 |
tbachman | stephenfin: accidental beep? | 14:11 |
stephenfin | tbachman: Yup. Sorry for the noise | 14:12 |
lyarwood | efried: yeah you're right but I'm not sure the original logic was there tbh | 14:12 |
tbachman | no worries! | 14:12 |
lyarwood | efried: that seems to suggest that the config would overwrite the image metadata | 14:12 |
lyarwood | efried: shouldn't it be (mach_type or libvirt_utils.get_default_machine_type(caps.host.cpu.arch))? | 14:12 |
efried | lyarwood: No, the image meta took first priority, then the config, then the caps (IIUC) | 14:13 |
*** _hemna has joined #openstack-nova | 14:13 | |
lyarwood | ah that diff was all messed up on my screen | 14:13 |
efried | I don't know what it *should* be. I'm just parsing what it *was* vs what it is in your patch. | 14:13 |
lyarwood | ack thanks | 14:13 |
*** dikonoor has joined #openstack-nova | 14:14 | |
efried | if image_meta... is not None: | 14:15 |
efried | mach_type = image_meta... | 14:15 |
efried | ==> and the rest of the logic was in the `else`, so would be skipped and we go right to the return | 14:15 |
*** cmart has joined #openstack-nova | 14:15 | |
lyarwood | yup sorry the indentation was all messed up in gerrit so I missed that before | 14:16 |
*** lpetrut has quit IRC | 14:16 | |
efried | in the `else` we did: | 14:16 |
efried | if caps.onething: | 14:16 |
efried | mach_type = onething | 14:16 |
efried | if caps.anotherthing | 14:16 |
efried | mach_type = anotherthing | 14:16 |
efried | and then the weirdness of: | 14:16 |
efried | mach_type = get_default_machine_type... or mach_type <== i.e. get_default_machine_type gets priority here | 14:16 |
efried | This has struck me as pretty tough to follow every time I've looked at this method, so any refactor to make it more explicit would be welcomed :) | 14:17 |
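efried's walk-through amounts to a three-level priority: an explicit machine type from the image wins outright; otherwise a per-architecture value from config wins over a built-in per-architecture default. A minimal sketch of that control flow, assuming simplified inputs (a plain image property value, a dict of config mappings, and an arch string); `get_machine_type`, `ARCH_DEFAULTS` and the argument names are hypothetical, and the default values are the ones lyarwood lists in this log:

```python
# Hypothetical reconstruction of the control flow efried describes;
# names and arguments are illustrative, not nova's real signature.

ARCH_DEFAULTS = {
    "armv7l": "virt",
    "aarch64": "virt",
    "s390": "s390-ccw-virtio",
    "s390x": "s390-ccw-virtio",
}


def get_machine_type(image_hw_machine_type, config_mappings, arch):
    # 1. An explicit image property wins; nothing else is consulted.
    if image_hw_machine_type is not None:
        return image_hw_machine_type
    # 2. Otherwise start from the built-in per-architecture default ...
    mach_type = ARCH_DEFAULTS.get(arch)
    # 3. ... but a per-arch value from config takes priority over it.
    return config_mappings.get(arch) or mach_type
```

The last line is the `get_default_machine_type(...) or mach_type` idiom efried flags: the config value wins whenever it is set, and the built-in default is only a fallback.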
gibi | stephenfin: OK. I don't think I should block your patch. We can revisit the removal of _resize() at a later point | 14:18 |
gibi | stephenfin: I will play a bit with that test case but I put a +2 on your patch | 14:19 |
*** tssurya has quit IRC | 14:20 | |
*** ttsiouts has joined #openstack-nova | 14:23 | |
*** Sundar has quit IRC | 14:28 | |
lyarwood | stephenfin: okay, another way of doing this with less crazy passing of _host | 14:32 |
lyarwood | stephenfin: actually defining the two arch:machine_type configs we have in code here in nova.conf | 14:32 |
*** markvoelker has quit IRC | 14:33 | |
lyarwood | stephenfin: they are the only reason we need _host to fetch things anyway | 14:33 |
stephenfin | lyarwood: Left more comments there but it's probably easier to discuss here | 14:33 |
stephenfin | Also, thanks gibi :) | 14:33 |
stephenfin | lyarwood: To which two configs do you refer? | 14:33 |
lyarwood | stephenfin: virt for fields.Architecture.ARMV7 & fields.Architecture.AARCH64 | 14:35 |
lyarwood | stephenfin: and s390-ccw-virtio for fields.Architecture.S390 & fields.Architecture.S390X | 14:35 |
stephenfin | Gotcha. Yeah, it's weird that they're there | 14:35 |
lyarwood | hmm we still pass caps.host.cpu.arch to get_default_machine_type so nvm | 14:36 |
stephenfin | At the very least, that should probably be done in the 'libvirt_utils.machine_type_mappings' function instead | 14:36 |
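Folding the hard-coded ARM and s390 defaults into the mapping helper would let callers do a single lookup. A minimal sketch, assuming config entries in the `arch=machine_type` form used by nova's `[libvirt] hw_machine_type` option, with config entries overriding the built-in defaults; `BUILTIN_DEFAULTS`, `machine_type_mappings` and `get_default_machine_type` here are hypothetical stand-ins, not nova's `libvirt_utils` functions:

```python
# Hypothetical sketch of folding the hard-coded arch defaults into the
# mapping helper. Names are illustrative, not nova's libvirt_utils code.

BUILTIN_DEFAULTS = {
    "armv7l": "virt",
    "aarch64": "virt",
    "s390": "s390-ccw-virtio",
    "s390x": "s390-ccw-virtio",
}


def machine_type_mappings(config_entries):
    """Merge built-in defaults with 'arch=machine_type' config entries.

    Entries from the config override the built-in defaults.
    """
    mappings = dict(BUILTIN_DEFAULTS)
    for entry in config_entries:
        arch, sep, mach_type = entry.partition("=")
        if not sep:
            continue  # skip malformed entries in this sketch
        mappings[arch.strip()] = mach_type.strip()
    return mappings


def get_default_machine_type(arch, config_entries):
    """Return the machine type for arch, or None if nothing is mapped."""
    return machine_type_mappings(config_entries).get(arch)
```

With the defaults living in one place, the caller no longer needs the host capabilities just to reach these four hard-coded values.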
*** priteau has joined #openstack-nova | 14:36 | |
stephenfin | lyarwood: We pass an arch | 14:36 |
stephenfin | which we're getting from caps.host.cpu.arch but I don't think that's necessary | 14:36 |
stephenfin | lyarwood: Any reason we couldn't pass the arch from 'libvirt_utils.get_arch' instead? | 14:37 |
*** jaosorior has joined #openstack-nova | 14:37 | |
stephenfin | That would probably be more correct since we surely want to retrieve the machine type for the _guest_ architecture | 14:38 |
* lyarwood *slams head into desk* | 14:39 | |
kashyap | efried: Actually _very_ good observation on the priority of 'image_meta'... | 14:39 |
lyarwood | stephenfin: yeah that works | 14:39 |
lyarwood | stephenfin: totally missed that as an option | 14:39 |
*** jaosorior has quit IRC | 14:39 | |
efried | kashyap: Does it actually wind up mattering? I'll feel less nitpicky if it does. | 14:39 |
stephenfin | lyarwood: That's got to be a bug too, right? | 14:39 |
kashyap | efried: I'm not 100% sure; but from my reading, it doesn't. (Sorry for the weasel words.) | 14:40 |
lyarwood | stephenfin: hmm it's inefficient but I don't think it was a bug | 14:41 |
stephenfin | kashyap, lyarwood: So from https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4327-L4355 we'll try to retrieve the machine type from image metadata | 14:41 |
stephenfin | If we can't do that, we'll fall back to using something based on the host architecture | 14:41 |
*** mlavalle has joined #openstack-nova | 14:41 | |
kashyap | stephenfin: Correct | 14:41 |
*** jaosorior has joined #openstack-nova | 14:41 | |
stephenfin | But if the guest is e.g. x86 running on an ARMV7 host, we'll return a machine type of 'virt' | 14:42 |
kashyap | Am I wrong in insisting to do this extraction thingie in a separate change? | 14:42 |
*** xek_ has joined #openstack-nova | 14:42 | |
kashyap | stephenfin: You mean, an emulated x86 guest running on an ARMv7 host? | 14:42 |
stephenfin | So the guest would have an x86 architecture but an ARM'y machine type | 14:42 |
stephenfin | kashyap: Yeah | 14:42 |
lyarwood | that would be a bug | 14:42 |
kashyap | Yes, that's a bug. | 14:43 |
*** xek has quit IRC | 14:43 | |
stephenfin | Sweet :) | 14:43 |
kashyap | HOWEVER | 14:43 |
stephenfin | I imagine no one would ever see this because what sane operator would run fully-emulated guests? | 14:43 |
kashyap | stephenfin: Who in their right mind would do that for any production workload? | 14:43 |
stephenfin | kashyap: Jinx | 14:44 |
kashyap | _Exactly_, that was my "HOWEVER" | 14:44 |
kashyap | So, we can't have 100% guards for people willingly sticking knives in their necks. | 14:44 |
kashyap | If that's an analogy at all :D | 14:44 |
*** maciejjozefczyk has quit IRC | 14:45 | |
*** rpittau is now known as rpittau|brb | 14:45 | |
mriedem | stephenfin: i know of an operator in the ML asking for that, to run powervm guests on an x86 host | 14:45 |
kashyap | lyarwood: Sorry, your good deed is getting punished, is it? | 14:45 |
stephenfin | Ultimately though, this seems to suggest we can remove the 'caps' argument to '_get_machine_type' and retrieve the *guest* architecture via the 'libvirt_utils.get_arch' function instead | 14:45 |
stephenfin | and lyarwood gets to fix two bugs in one | 14:45 |
stephenfin | for the win | 14:45 |
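A minimal sketch of the lookup order stephenfin describes, keyed on the *guest* arch rather than `caps.host.cpu.arch`. Everything here is hypothetical: `default_machine_type` and `ARCH_MACHINE_DEFAULTS` are made-up names, arches are plain strings instead of nova's `fields.Architecture` constants, and `configured` stands in for the operator's arch=machine_type mappings (presumably `[libvirt] hw_machine_type`). It is not nova's actual `_get_machine_type`.

```python
from typing import Mapping, Optional

# Hypothetical per-arch defaults mirroring the two hard-coded cases discussed
# above (ARMV7/AARCH64 -> virt, S390/S390X -> s390-ccw-virtio).
ARCH_MACHINE_DEFAULTS = {
    "armv7l": "virt",
    "aarch64": "virt",
    "s390": "s390-ccw-virtio",
    "s390x": "s390-ccw-virtio",
}


def default_machine_type(
    guest_arch: str,
    image_machine_type: Optional[str] = None,
    configured: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a machine type for the guest arch (hypothetical helper)."""
    # 1. An explicit machine type from image metadata always wins.
    if image_machine_type:
        return image_machine_type
    # 2. Then an operator-configured arch=machine_type mapping, if any.
    if configured and guest_arch in configured:
        return configured[guest_arch]
    # 3. Finally the built-in per-arch default; None means "let libvirt decide".
    return ARCH_MACHINE_DEFAULTS.get(guest_arch)
```

With the lookup keyed on the guest arch, an emulated x86_64 guest on an ARMv7 host gets no ARM-specific default, instead of the 'virt' machine type the host-arch lookup would return.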
sean-k-mooney | stephenfin: operators that want to support cross-arch development | 14:45 |
mriedem | i'm pretty sure danpb was also ok (in the ML) with the fully emulated thing | 14:45 |
kashyap | But in two separate changes, obviously. | 14:45 |
kashyap | sean-k-mooney: Yeah, for development, yes... | 14:46 |
stephenfin | mriedem: In production?? | 14:46 |
sean-k-mooney | yes, so it's also a valid use case for build farms | 14:46 |
mriedem | let me get the thread | 14:46 |
stephenfin | If so, can they share some of the cash they're burning with me? | 14:46 |
* stephenfin wants a new bike | 14:46 | |
sean-k-mooney | e.g. if you are a software company using the cloud to build your product for multiple target architectures, full emulation is totally valid | 14:47 |
mriedem | this guy is funded by the US military so they have infinite funds | 14:47 |
kashyap | sean-k-mooney: I contend that anyone who *REALLY* cares about multiple archs, they will get a devel box for that arch. | 14:47 |
*** _hemna has quit IRC | 14:47 | |
stephenfin | Called it :) | 14:47 |
sean-k-mooney | kashyap: that is not what i think the majority of people do | 14:47 |
mriedem | http://lists.openstack.org/pipermail/openstack-operators/2018-August/thread.html#15617 | 14:48 |
mriedem | ^ is the thread | 14:48 |
sean-k-mooney | most use qemu to develop and test locally and only get real hardware if they are writing low-level software | 14:48 |
mriedem | danpb's reply is in this one http://lists.openstack.org/pipermail/openstack-operators/2018-August/015653.html | 14:49 |
kashyap | Yeah, I'm not denying there's a devel/test use case | 14:49 |
mriedem | "> Yes, it should do exactly that IMHO !" | 14:49 |
stephenfin | mriedem: How'd you manage to link to a specific email on the list page? Manually adding the anchor (or whatever the #foo part of a URL is called)? | 14:49 |
kashyap | I totally missed that thread :-( | 14:50 |
kashyap | Because of e-mail filtering snafu | 14:50 |
mriedem | stephenfin: "thread" under " Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] "" | 14:50 |
stephenfin | ooh, nifty | 14:51 |
stephenfin | mriedem++ | 14:51 |
sean-k-mooney | mriedem: was a blueprint ever filed for that | 14:52 |
kashyap | Yeah, the "thread" view is easy to miss if you're not often parsing the archives :-) | 14:52 |
kashyap | mriedem: Some elephant-like memory you've got there! | 14:53 |
*** luksky has quit IRC | 14:53 | |
kashyap | What is it that you munch on for breakfast? | 14:53 |
kashyap | sean-k-mooney: No, it wasn't filed, near as I see. | 14:54 |
*** tbachman has quit IRC | 14:54 | |
* kashyap adds it to the KM-long TODO list this month; will get to it | 14:54 | |
*** luksky has joined #openstack-nova | 14:55 | |
*** jaosorior has quit IRC | 14:55 | |
*** brault has joined #openstack-nova | 14:56 | |
sean-k-mooney | that's a shame; it would be nice to be able to handle cross-arch emulation properly in openstack | 14:56 |
mriedem | sean-k-mooney: not that i'm aware of | 14:57 |
kashyap | sean-k-mooney: Filing it right now... | 14:57 |
*** jaosorior has joined #openstack-nova | 14:57 | |
mriedem | heh so you guys went from "this is a bug burn it burn it!" to "hey it's a feature let's support it!" | 14:58 |
mriedem | kashyap: i remember the thread because i was the only one engaging chris on it | 14:58 |
mriedem | and it took me a while to understand what he was trying to do | 14:59 |
kashyap | mriedem: I completely missed it due to filtering :-( Normally anything with 'qemu' or 'libvirt' in the thread, I make it a point to engage | 14:59 |
*** cmart has quit IRC | 14:59 | |
kashyap | The case is valid for *test* / *devel*: because as of a couple of hours ago, I was running an AArch64 guest on x86_64 -- to test some PCIe stuff | 15:00 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP libvirt: Use SATA bus for cdrom devices when using Q35 machine type https://review.opendev.org/663011 | 15:00 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: DNM: Run tempest-full-py3 with q35 machine type https://review.opendev.org/662887 | 15:00 |
lyarwood | stephenfin: ^ can you take a look at that during the team call and I'll sort tests out in the background. | 15:00 |
kashyap | But even for that test to be reliable, I had to ask a person with actual AArch64 hardware | 15:00 |
mriedem | couldn't the same be said for nested virt? | 15:00 |
sean-k-mooney | kashyap: test and dev are two of the larger use cases for openstack | 15:01 |
sean-k-mooney | not everything is a long lived NFV app :) | 15:01 |
kashyap | mriedem: Somewhat; some people use nested for real workloads | 15:01 |
mriedem | that's my point | 15:02 |
kashyap | As that still is using hardware extensions | 15:02 |
kashyap | s/that/that's/ | 15:02 |
*** sapd1_x has quit IRC | 15:02 | |
kashyap | For pure emulation (or "TCG") -- no, every instruction is emulated | 15:02 |
sean-k-mooney | some people use emulation for real workloads too | 15:02 |
* kashyap back in a bit; meeting | 15:02 | |
*** _hemna has joined #openstack-nova | 15:15 | |
*** gyee has joined #openstack-nova | 15:16 | |
*** tbachman has joined #openstack-nova | 15:19 | |
stephenfin | lyarwood: Done. Think there's _another_ bug here. Might be helpful to get aspiers input on it | 15:22 |
stephenfin | aspiers: Referring to https://review.opendev.org/#/c/663011/9/nova/virt/libvirt/utils.py@563 | 15:23 |
*** rpittau|brb is now known as rpittau | 15:26 | |
lyarwood | stephenfin: yup that's true | 15:27 |
lyarwood | stephenfin: and when you say dedent? | 15:27 |
stephenfin | the opposite of indent? | 15:28 |
kashyap | I think he means to unindent | 15:28 |
stephenfin | oh, I thought dedent was a word | 15:28 |
stephenfin | TIL | 15:28 |
kashyap | I got used to "stephenfin speak" on that :D | 15:28 |
lyarwood | stephenfin: right it is, but I didn't think that was valid tbh | 15:28 |
stephenfin | the word or what I'm suggesting? | 15:28 |
lyarwood | stephenfin: what you're suggesting | 15:29 |
lyarwood | stephenfin: if pep8 is happy then I'm happy, I just didn't think it would be ;) | 15:29 |
stephenfin | Think I suggested two dedents. Which one are you asking about? | 15:29 |
*** markvoelker has joined #openstack-nova | 15:29 | |
lyarwood | stephenfin: the comment is the one I'm looking at | 15:30 |
stephenfin | the docstring for get_disk_bus_for_device_type? It's indented by 8 but it should only be indented by 4 | 15:31 |
lyarwood | kk | 15:31 |
kashyap | sean-k-mooney: On "emulation for real workloads": it is completely and utterly baloney because TCG is completely insecure and upstream provides no guarantee whatsoever. | 15:32 |
kashyap | [It is to be limited to test/dev/CI; that's it.] | 15:32 |
*** aloga has quit IRC | 15:32 | |
sean-k-mooney | kashyap: ci is a real workload | 15:33 |
kashyap | On the "security" bit, of course, some people will come back with: "but, we're in a 'trusted network' [whatever that means]" | 15:33 |
kashyap | sean-k-mooney: Right. I didn't define the word "real", though :-) | 15:33 |
*** aloga has joined #openstack-nova | 15:34 | |
kashyap | sean-k-mooney: Fresh out of the oven: | 15:39 |
kashyap | < abologna> danpb, kashyap: I tried starting an aarch64/virt guest with 32 pcie-root-ports with ,io-reserve=0 (hacked libvirt) and it boots fine | 15:39 |
aspiers | stephenfin, lyarwood: dedent is definitely a word ;-) https://docs.python.org/3/library/textwrap.html#textwrap.dedent | 15:43 |
aspiers | even though maybe not in the official dictionaries ... | 15:44 |
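For reference, the `textwrap.dedent` that aspiers links strips the whitespace prefix shared by every non-blank line of a string at runtime. The review comment itself is about the docstring's source indentation (8 spaces where 4 is expected), which is a manual edit; this made-up string just illustrates what "dedent" means.

```python
import textwrap

# A made-up docstring-like block indented by 8 spaces.
over_indented = (
    "        Determine the disk bus for a device type.\n"
    "\n"
    "        :param device_type: e.g. 'disk' or 'cdrom'\n"
)

# dedent() removes the longest common leading whitespace (8 spaces here).
dedented = textwrap.dedent(over_indented)
print(dedented)
```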
*** damien_r has quit IRC | 15:44 | |
*** rpittau is now known as rpittau|afk | 15:46 | |
*** jaosorior has quit IRC | 15:50 | |
*** _hemna has quit IRC | 15:54 | |
*** wwriverrat has joined #openstack-nova | 15:59 | |
*** luksky has quit IRC | 16:00 | |
*** helenafm has quit IRC | 16:01 | |
*** markvoelker has quit IRC | 16:02 | |
*** tesseract has quit IRC | 16:03 | |
*** whoami-rajat has quit IRC | 16:07 | |
*** ttsiouts has quit IRC | 16:07 | |
*** ttsiouts has joined #openstack-nova | 16:08 | |
*** ttsiouts has quit IRC | 16:12 | |
*** whoami-rajat has joined #openstack-nova | 16:12 | |
openstackgerrit | John Garbutt proposed openstack/nova master: Admin password check for project or system scope https://review.opendev.org/663715 | 16:13 |
openstackgerrit | John Garbutt proposed openstack/nova master: WIP: Add new style rule for admin_password https://review.opendev.org/663716 | 16:13 |
openstackgerrit | John Garbutt proposed openstack/nova master: WIP: test admin_password with opt out https://review.opendev.org/663717 | 16:13 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update quota known issues docs https://review.opendev.org/662570 | 16:22 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Cleanup quota user docs https://review.opendev.org/662573 | 16:22 |
*** tjgresha has joined #openstack-nova | 16:25 | |
*** dikonoor has quit IRC | 16:27 | |
mriedem | artom: i dumped some thoughts into your bottom change that renames the nova-live-migration job | 16:43 |
mriedem | kind of scatterbrained though | 16:43 |
*** xek_ has quit IRC | 16:45 | |
*** mgoddard has quit IRC | 16:48 | |
openstackgerrit | sean mooney proposed openstack/nova master: extend libvirt video model support https://review.opendev.org/647733 | 16:49 |
openstackgerrit | John Garbutt proposed openstack/nova master: WIP: stop admin_password information leakage https://review.opendev.org/663721 | 16:50 |
sean-k-mooney | johnthetubaguy: sorry for the delay but i have added the extra testing to ^ | 16:50 |
*** mgoddard has joined #openstack-nova | 16:50 | |
artom | mriedem, thank you for your dump (should TYFYD become a thing?) | 16:50 |
johnthetubaguy | sean-k-mooney: I have to run now, tuba rehearsal later, putting it on the pile | 16:50 |
sean-k-mooney | no worries | 16:51 |
sean-k-mooney | um, i think it's more or less ready to go, but we can probably drop it out of the runway at this point | 16:51 |
* artom is confused about cold migration and resize | 16:53 | |
artom | I thought they were different operations? | 16:53 |
*** ricolin has quit IRC | 16:54 | |
sean-k-mooney | they are but they share a lot of the same code | 16:54 |
sean-k-mooney | the only real difference is a cold migrate does not change the flavor | 16:54 |
sean-k-mooney | everything else is the same | 16:54 |
artom | Aha... | 16:55 |
sean-k-mooney | i think there is some special casing for resize to the same host but again it's minimal | 16:55 |
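A toy model of that relationship, purely illustrative: `Instance`, `resize` and `cold_migrate` here are hypothetical names, not nova code. It mirrors how nova's compute API (at least in this era) treats a resize call with no new flavor as a cold migration; same-host resizes are governed separately by the `allow_resize_to_same_host` option, which this sketch ignores.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Instance:
    host: str
    flavor: str


def resize(instance: Instance, dest_host: str,
           new_flavor: Optional[str] = None) -> Instance:
    """Toy model: move to dest_host, optionally switching flavor."""
    # Same move flow either way; only the flavor may differ.
    flavor = new_flavor if new_flavor is not None else instance.flavor
    return replace(instance, host=dest_host, flavor=flavor)


def cold_migrate(instance: Instance, dest_host: str) -> Instance:
    """A cold migration is just a resize that keeps the current flavor."""
    return resize(instance, dest_host)
```

This is also why, for artom's revert fix, it does not matter whether a cold migrate or a resize test exercises the path.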
artom | mriedem, so, my first thought is, if we already have it in tempest-slow, then keep it there | 16:56 |
artom | So I'd just abandon https://review.opendev.org/#/c/653498/ entirely | 16:56 |
artom | But since I don't want to block on https://review.opendev.org/#/c/663405/1, I'd put the fix patch below it | 16:57 |
artom | So we'd get the revert test coverage with the fix | 16:57 |
*** davidsha has quit IRC | 16:58 | |
mriedem | artom: cold migrate and resize are the same except for the flavor | 16:58 |
mriedem | for the intent of your bug fix, it doesn't matter which one runs | 16:58 |
artom | Conversely, skip the revert tests in tempest-slow entirely, and keep my multinode patch. Because of the -slow part | 16:58 |
mriedem | artom: i was also thinking that if i can get that cold migrate resize revert test unskipped then yeah we don't need to add new tests to the nova-live-migration job | 16:59 |
kashyap | stephenfin: mriedem: sean-k-mooney: As promised, I bring you https://blueprints.launchpad.net/nova/+spec/pick-guest-arch-based-on-host-arch-in-libvirt-driver | 16:59 |
kashyap | [Corrections, snide remarks, rotten tomatoes welcome.] | 16:59 |
*** markvoelker has joined #openstack-nova | 16:59 | |
mriedem | artom: i'm not sure how you'd depend on https://review.opendev.org/#/c/663405/ if you drop your job rename patch | 16:59 |
* kashyap AFK; back later | 16:59 | |
*** derekh has quit IRC | 17:00 | |
artom | mriedem, I'd "rebase" my fix below it in order to, in the top patch (aka 663405), still have a revert test with my fix | 17:00 |
sean-k-mooney | kashyap: there is a way to select hosts that have hardware acceleration already | 17:01 |
*** mrhillsman is now known as openlab | 17:02 | |
mriedem | artom: you can't rebase a nova change "below" https://review.opendev.org/#/c/663405/ since it's in a different repo... | 17:02 |
sean-k-mooney | kashyap: you can use the vm_mode image property and specify hvm https://github.com/openstack/glance/blob/master/etc/metadefs/compute-hypervisor.json#L30-L41 | 17:03 |
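Roughly how that kind of selection works, as a simplified sketch: nova's `ImagePropertiesFilter` compares the image's architecture, hypervisor type and vm_mode against the `(arch, hv_type, vm_mode)` tuples a host reports as supported instances. The function below is a hypothetical reduction of that idea, not the filter's actual code, and (as noted next) whether `vm_mode=hvm` really singles out accelerated hosts is untested.

```python
from typing import Iterable, Optional, Tuple

SupportedInstance = Tuple[str, str, str]  # (arch, hypervisor_type, vm_mode)


def host_accepts_image(supported: Iterable[SupportedInstance],
                       arch: Optional[str] = None,
                       hv_type: Optional[str] = None,
                       vm_mode: Optional[str] = None) -> bool:
    """True if any supported tuple matches every property the image sets."""
    wanted = (arch, hv_type, vm_mode)
    for candidate in supported:
        # An image property left unset (None) matches anything.
        if all(w is None or w == c for w, c in zip(wanted, candidate)):
            return True
    return False
```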
artom | *facepalm* | 17:03 |
*** damien_r has joined #openstack-nova | 17:03 | |
artom | mriedem, make yours depend on mine then, same effect, right? | 17:03 |
*** dpawlik has quit IRC | 17:04 | |
sean-k-mooney | kashyap: that said we never test that in the ci as far as i'm aware so no idea if it really works | 17:04 |
*** dpawlik has joined #openstack-nova | 17:04 | |
*** openlab is now known as codebauss | 17:05 | |
mriedem | artom: that's one option yeah. i'm not sure it's the right option...since unskipping that test should be possible without your change - although i guess one could argue that maybe the test isn't safe if you're using ovs hybrid plugging w/o your fix | 17:05 |
*** dtantsur is now known as dtantsur|afk | 17:06 | |
artom | mriedem, unskipping that change *is* possible without my fix (upstream gate doesn't use hybrid plug) | 17:06 |
artom | My idea was more to have integration test coverage of my code, at least in the non-hybrid-plug case | 17:06 |
mriedem | artom: i think you missed my point, | 17:08 |
mriedem | tempest can be run anywhere on any config, | 17:08 |
artom | It happens a lot | 17:08 |
mriedem | since it's just hitting APIs, | 17:08 |
artom | OOOIC | 17:08 |
mriedem | so while unskipping it could be ok in the gate b/c of how most of the devstack jobs are setup (ovs w/o hybrid plug or lb), | 17:08 |
artom | Upstream gate would be fine, but we'd break someone else who runs it with hybrid plug | 17:08 |
mriedem | that doesn't mean someone, like your downstream ci, couldn't break on it | 17:08 |
mriedem | correct | 17:09 |
mriedem | i'm basically convincing myself on your behalf | 17:09 |
artom | Our downstream CI is permanently on fire anyways | 17:09 |
mriedem | b/c of tripleo | 17:09 |
mriedem | sure | 17:09 |
mriedem | :) | 17:09 |
artom | (Did I say that out loud? It's not true, obviously) | 17:09 |
sean-k-mooney | sure... | 17:09 |
*** dpawlik has quit IRC | 17:10 | |
artom | mriedem, what I selfishly *don't* want to do is have my fix depend on your tempest patch | 17:10 |
artom | Because your tempest patch might be stuck in recheck hell for a while, and I'm in a *massive* hurry ;) | 17:11 |
artom | But... I think you made the case for me that it's better the other way around anyways | 17:11 |
efried | kashyap, mriedem: https://blueprints.launchpad.net/nova/+spec/pick-guest-arch-based-on-host-arch-in-libvirt-driver for train? | 17:11 |
sean-k-mooney | efried: probably not | 17:12 |
mriedem | efried: no | 17:12 |
efried | ight, thx | 17:12 |
mriedem | imo | 17:12 |
efried | just wanted to tag series goal if so, so it shows up on dashboard things. | 17:12 |
efried | NOT tagging. | 17:12 |
sean-k-mooney | do we have a NEXT tag or something we can use | 17:13 |
mriedem | artom: i have a hard time knowing what your team considers a priority ever really... | 17:13 |
sean-k-mooney | e.g. after train | 17:13 |
mriedem | until the boss (mdbooth) shows up asking for reviews | 17:13 |
*** codebauss is now known as openlab | 17:13 | |
sean-k-mooney | mriedem: our release folks consider it a blocker for the osp 15 beta | 17:13 |
artom | mriedem, speaking for myself I try not to push that angle too much, because what's important for upstream isn't necessarily what's important for us | 17:13 |
artom | Or vice versa | 17:14 |
mriedem | sean-k-mooney: did those people never realize this has always been broken? | 17:14 |
sean-k-mooney | but it's a beta so it's not going to break production deployments | 17:14 |
artom | I just feel weird saying "well this thing is super important for *us*, so dump everything else and work on my thing" | 17:14 |
mriedem | unless you guys just changed to hybrid plug | 17:14 |
*** openlab is now known as codebauss | 17:14 | |
artom | But if you'd rather I just be up front about it, I can do that too | 17:14 |
mriedem | btw, whatever happened with russell ovn'ing the world at rh? | 17:15 |
sean-k-mooney | mriedem: it's because it's showing up in the downstream ci and that makes them unhappy | 17:15 |
mriedem | sean-k-mooney: oh easy, just stop doing testing :) | 17:15 |
*** codebauss is now known as openlab | 17:15 | |
mriedem | then you're in the same state as before | 17:15 |
mriedem | heads in the sand and whatnot | 17:15 |
melwitt | I think we switched our default network backend to ovn recently. sean-k-mooney correct me | 17:16 |
*** openlab is now known as codebauss | 17:16 | |
mriedem | artom: if you're cool with dropping the nova-live-migration rename + resize revert stuff (i still think the rename is good at some point, just not while hurrying for a fix) i'm ok with adding a depends-on from my unskip to your fix | 17:16 |
sean-k-mooney | hehe well for 15 we were planning to test with ovn but it has other issues | 17:16 |
mriedem | ooo the plot thickens | 17:16 |
artom | mriedem, 🤝 | 17:16 |
artom | (That's a handshake shaped like a yellow heart) | 17:17 |
mriedem | ok - yeah pidgin doesn't render those | 17:17 |
melwitt | I can see it. looks cool | 17:17 |
artom | sean-k-mooney, so what's geneve? another name for OVN? | 17:17 |
sean-k-mooney | melwitt: yes but ovn only works when we disable waiting for the network events on live migration, so i think they wanted to test ml2/ovn with iptables as a fallback plan | 17:17 |
artom | 'cuz that's the default networking thingee in our ospd custom job | 17:17 |
sean-k-mooney | geneve is a l3 tunneling protocol like vxlan | 17:17 |
sean-k-mooney | and it's the protocol ovn uses for its networking by default | 17:18 |
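To make the vxlan comparison concrete: per RFC 8926, Geneve wraps a payload (usually an Ethernet frame, protocol type 0x6558) in UDP to destination port 6081, behind an 8-byte base header carrying a 24-bit VNI, optionally followed by TLV options. A minimal sketch that packs the base header with no options; the helper names are made up for illustration.

```python
import struct

GENEVE_UDP_PORT = 6081          # IANA-assigned UDP destination port for Geneve
TRANS_ETHER_BRIDGING = 0x6558   # Protocol Type: payload is an Ethernet frame


def geneve_base_header(vni: int,
                       protocol_type: int = TRANS_ETHER_BRIDGING) -> bytes:
    """Pack the 8-byte Geneve base header with no options (RFC 8926)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    ver_optlen = 0  # Ver (2 bits) = 0, Opt Len (6 bits) = 0 four-byte words
    flags = 0       # O (control) and C (critical) bits clear, reserved zero
    # The VNI fills the top 24 bits of the last 32-bit word; low byte reserved.
    return struct.pack("!BBHI", ver_optlen, flags, protocol_type, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from a Geneve base header."""
    (word,) = struct.unpack("!I", header[4:8])
    return word >> 8
```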
sean-k-mooney | artom: you are referring to the neutron type driver which for ovn would be geneve | 17:19 |
*** _hemna has joined #openstack-nova | 17:19 | |
artom | sean-k-mooney, "Choose the network variant (default for OSP15 is 'geneve') | 17:19 |
artom | For the geneve option set the NETWORK_OVN paremeter to yes." | 17:19 |
artom | (From the Jenkins job build page) | 17:19 |
artom | So yeah, looks like OSP15 default is OVN | 17:19 |
sean-k-mooney | mriedem: but to your point yes this has always been broken in one way or another | 17:20 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: Revert resize: wait for events according to hybrid plug https://review.opendev.org/644881 | 17:21 |
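Going by the patch title, the point is that whether nova should wait for `network-vif-plugged` at a given step depends on how each VIF is plugged. A hypothetical sketch, assuming (not verified against the patch) that with OVS hybrid plug neutron emits the event when the port binding moves back to the source host rather than when the VIF is plugged on it. The `details`/`ovs_hybrid_plug` keys follow neutron's port binding vif_details; the helper itself is made up.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (event name, port id)


def split_expected_events(vifs: List[Dict]) -> Tuple[List[Event], List[Event]]:
    """Split network-vif-plugged events into bind-time and plug-time groups.

    Hypothetical helper: hybrid-plugged VIFs are assumed to get their event
    at port-binding time; all other VIFs when they are plugged on the host.
    """
    bind_time: List[Event] = []
    plug_time: List[Event] = []
    for vif in vifs:
        event = ("network-vif-plugged", vif["id"])
        if vif.get("details", {}).get("ovs_hybrid_plug", False):
            bind_time.append(event)
        else:
            plug_time.append(event)
    return bind_time, plug_time
```

Under that assumption, waiting for a bind-time event only after plugging would time out, because the event has already arrived; this matches why the gate (no hybrid plug) passes while hybrid-plug setups break.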
sean-k-mooney | but i think we only started waiting in stein, so it's only showing up now as they are testing Stein/osp15 | 17:21 |
artom | sean-k-mooney, I guess you can abandon https://review.opendev.org/#/c/660782/ now, since we're settling on https://review.opendev.org/#/c/663405/1 | 17:22 |
sean-k-mooney | um, sure, i never intended it to merge anyway, so if you don't need it for testing then sure, i'll abandon | 17:23 |
artom | We could replace it with another DNM on top of mriedem's tempest patch to run tempest-slow with hybrid plug | 17:24 |
artom | For the extra confidence | 17:24 |
artom | But we've seen it pass already, not sure how necessary a new patch would be | 17:24 |
*** codebauss is now known as mrhillsman | 17:24 | |
sean-k-mooney | well i can add a depends-on to matt's change too, but ok, i'll just abandon it | 17:24 |
artom | Well no you can't, because your change was in the multinode job | 17:25 |
artom | Whereas mriedem's change unskips a test in tempest-slow | 17:25 |
sean-k-mooney | yes but my change didn't disable tempest-slow | 17:25 |
sean-k-mooney | but anyway it's abandoned | 17:26 |
sean-k-mooney | you know how to reproduce this if you need to, so i don't need to keep a patch open | 17:26 |
*** dpawlik has joined #openstack-nova | 17:26 | |
artom | I need lunch | 17:26 |
* artom -> pheeding | 17:26 | |
*** dpawlik has quit IRC | 17:31 | |
*** markvoelker has quit IRC | 17:32 | |
*** damien_r has quit IRC | 17:33 | |
*** ociuhandu_ has joined #openstack-nova | 17:39 | |
*** _hemna has quit IRC | 17:39 | |
*** spsurya has quit IRC | 17:40 | |
*** udesale has quit IRC | 17:42 | |
*** dpawlik has joined #openstack-nova | 17:42 | |
*** ociuhandu has quit IRC | 17:42 | |
*** ociuhandu_ has quit IRC | 17:43 | |
*** dpawlik has quit IRC | 17:46 | |
*** jdillaman has joined #openstack-nova | 17:49 | |
*** _hemna has joined #openstack-nova | 17:56 | |
*** priteau has quit IRC | 17:56 | |
*** panda has quit IRC | 17:59 | |
*** lennyb has quit IRC | 18:00 | |
*** panda has joined #openstack-nova | 18:01 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional recreate test for bug 1829479 and bug 1817833 https://review.opendev.org/663737 | 18:02 |
openstack | bug 1829479 in OpenStack Compute (nova) "The allocation table has residual records when instance is evacuated and the source physical node is removed" [Medium,Triaged] https://launchpad.net/bugs/1829479 | 18:02 |
mriedem | cfriesen: sean-k-mooney: ^ | 18:02 |
openstack | bug 1817833 in OpenStack Compute (nova) "Check compute_id existence when nova-compute reports info to placement" [Undecided,In progress] https://launchpad.net/bugs/1817833 - Assigned to xulei (605423512-j) | 18:02 |
mriedem | bugtastic | 18:02 |
*** mriedem is now known as mriedem_lunch | 18:02 | |
*** mgoddard has quit IRC | 18:03 | |
*** mgoddard has joined #openstack-nova | 18:03 | |
*** ralonsoh has quit IRC | 18:04 | |
sean-k-mooney | mriedem_lunch: the comment helps but ya that looks correct | 18:05 |
*** bbowen_ has joined #openstack-nova | 18:07 | |
*** bbowen has quit IRC | 18:09 | |
*** bbowen__ has joined #openstack-nova | 18:09 | |
*** bbowen_ has quit IRC | 18:12 | |
*** _hemna has quit IRC | 18:16 | |
*** damien_r has joined #openstack-nova | 18:22 | |
*** mvkr has quit IRC | 18:28 | |
*** markvoelker has joined #openstack-nova | 18:29 | |
efried | I wonder, if we made a bug whose title referenced another bug, and that bug's title referenced the first, could we make patchbot spin forever? | 18:32 |
*** dpawlik has joined #openstack-nova | 18:37 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Introduces the openstacksdk to nova https://review.opendev.org/643664 | 18:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: Use OpenStack SDK for placement https://review.opendev.org/656023 | 18:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: Introduces SDK to IronicDriver and uses for node.get https://review.opendev.org/642899 | 18:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: Use SDK instead of ironicclient for node.list https://review.opendev.org/656027 | 18:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: Use SDK instead of ironicclient for validating instance and node https://review.opendev.org/656028 | 18:41 |
openstackgerrit | Eric Fried proposed openstack/nova master: Use SDK instead of ironicclient for setting instance id https://review.opendev.org/659690 | 18:42 |
*** dpawlik has quit IRC | 18:42 | |
openstackgerrit | Merged openstack/nova master: Add testing guide for down cells https://review.opendev.org/650167 | 18:50 |
*** dpawlik has joined #openstack-nova | 18:53 | |
*** dpawlik has quit IRC | 18:58 | |
*** markvoelker has quit IRC | 19:02 | |
*** mriedem_lunch is now known as mriedem | 19:13 | |
*** owalsh has quit IRC | 19:18 | |
*** bnemec has quit IRC | 19:23 | |
*** d34dh0r53 has quit IRC | 19:25 | |
*** zbr has quit IRC | 19:25 | |
*** bnemec has joined #openstack-nova | 19:25 | |
*** bnemec has quit IRC | 19:31 | |
*** d34dh0r53 has joined #openstack-nova | 19:31 | |
*** owalsh has joined #openstack-nova | 19:34 | |
*** bnemec has joined #openstack-nova | 19:34 | |
*** bnemec has quit IRC | 19:41 | |
*** bnemec has joined #openstack-nova | 19:42 | |
*** luksky has joined #openstack-nova | 19:49 | |
*** bbowen__ has quit IRC | 19:53 | |
*** imacdonn has quit IRC | 19:53 | |
*** imacdonn has joined #openstack-nova | 19:54 | |
*** markvoelker has joined #openstack-nova | 19:59 | |
*** panda has quit IRC | 20:04 | |
mriedem | artom: i guess you missed these comments https://review.opendev.org/#/c/644881/19//COMMIT_MSG@11 | 20:04 |
*** panda has joined #openstack-nova | 20:05 | |
*** ccamacho has quit IRC | 20:08 | |
*** hongbin has joined #openstack-nova | 20:14 | |
*** bnemec has quit IRC | 20:16 | |
*** bnemec has joined #openstack-nova | 20:17 | |
*** tjgresha has quit IRC | 20:21 | |
mriedem | dansmith et al, I'm +2 on the nova/cyborg spec now https://review.opendev.org/#/c/603955/ | 20:27 |
*** markvoelker has quit IRC | 20:32 | |
*** mdbooth_ has joined #openstack-nova | 20:40 | |
*** mdbooth has quit IRC | 20:41 | |
*** damien_r has quit IRC | 20:46 | |
*** takashin has joined #openstack-nova | 20:48 | |
*** damien_r has joined #openstack-nova | 20:49 | |
mriedem | melwitt: see what you think about my proposed wording in this "specify az on unshelve" spec and if you agree i'll update it https://review.opendev.org/#/c/624689/10/specs/train/approved/support-specifying-az-when-restore-shelved-server.rst@62 | 20:49 |
*** dpawlik has joined #openstack-nova | 20:54 | |
*** damien_r has quit IRC | 20:55 | |
*** BjoernT has quit IRC | 20:56 | |
efried | nova meeting in 3 minutes in #openstack-meeting | 20:58 |
*** dpawlik has quit IRC | 20:59 | |
melwitt | mriedem: ack | 20:59 |
efried | nova meeting now | 21:02 |
mriedem | no meeting?! | 21:07 |
*** tbachman has quit IRC | 21:07 | |
mriedem | was reviewing something... | 21:07 |
*** tbachman has joined #openstack-nova | 21:08 | |
*** dpawlik has joined #openstack-nova | 21:10 | |
*** dpawlik has quit IRC | 21:14 | |
*** mriedem is now known as mriedem_afk | 21:26 | |
*** whoami-rajat has quit IRC | 21:27 | |
melwitt | mriedem_afk: +1 to proposed wording | 21:28 |
*** markvoelker has joined #openstack-nova | 21:29 | |
*** JamesBenson has quit IRC | 21:35 | |
openstackgerrit | Dustin Cowles proposed openstack/nova master: WIP: Use SDK instead of ironicclient for add/remove instance info from node https://review.opendev.org/659691 | 21:43 |
tonyb | kashyap: I don't but it's on my list of things to get working this year | 21:53 |
tonyb | kashyap: 'sup? | 21:53 |
*** bbowen__ has joined #openstack-nova | 21:54 | |
*** d34dh0r53 has quit IRC | 22:01 | |
*** markvoelker has quit IRC | 22:03 | |
*** JamesBenson has joined #openstack-nova | 22:07 | |
*** JamesBenson has quit IRC | 22:12 | |
openstackgerrit | Merged openstack/nova master: [Docs] Update the confusing console output https://review.opendev.org/589004 | 22:14 |
*** rcernin has joined #openstack-nova | 22:23 | |
*** rcernin has quit IRC | 22:23 | |
*** rcernin has joined #openstack-nova | 22:24 | |
*** rcernin has quit IRC | 22:26 | |
*** slaweq has quit IRC | 22:27 | |
*** rcernin has joined #openstack-nova | 22:28 | |
*** d34dh0r53 has joined #openstack-nova | 22:28 | |
*** JamesBenson has joined #openstack-nova | 22:30 | |
*** JamesBenson has quit IRC | 22:35 | |
*** mlavalle has quit IRC | 22:41 | |
mnaser | alright. has anyone ever run into this? i'm convinced this feels like an oslo bug or something. | 22:47 |
mnaser | i've noticed that if i have a 3 node rabbitmq cluster, and the cluster loses a node and regains it, for some reason, its like nothing gets routed across rabbitmq | 22:47 |
mnaser | in this case, an instance gets booted and goes into SCHEDULING state and gets stuck forever | 22:47 |
mnaser | this is with rabbitmq 3.6.16 -- the only way i've solved it often is by deleting all the queues and restarting nova | 22:48 |
mnaser | the one 'odd' thing is with osa, we deploy a cluster with ssl for rabbitmq | 22:48 |
mnaser | i have a (non-prod) cluster stuck at this state right now and i really want to get to the bottom of what/why | 22:49 |
mnaser | all queues listing 0 messages.. except for notifications | 22:51 |
*** luksky has quit IRC | 22:51 | |
*** tkajinam has joined #openstack-nova | 22:53 | |
*** slaweq has joined #openstack-nova | 22:55 | |
melwitt | mnaser: feel like you're going to need to get some other operator input about that one.. I haven't personally heard of it | 22:59 |
*** markvoelker has joined #openstack-nova | 22:59 | |
melwitt | imacdonn ^ | 23:00 |
mnaser | melwitt: i've run into it a bunch of times and had others who ran into it, and the "drop all queues" thing fixed it.. | 23:00 |
*** _hemna has joined #openstack-nova | 23:00 | |
melwitt | I also asked penick to look at your messages here and let us know if he's seen it | 23:00 |
mnaser | melwitt: yay cool thanks, i'm going to continue digging, this is a non prod environment so i have the time to dig into it | 23:01 |
mnaser | usually things are under fire when i need to do this | 23:01 |
melwitt | yeah, that's awesome you got a repro in non-prod | 23:01 |
mnaser | feels like it's gonna be a really annoying one though | 23:02 |
*** penick has joined #openstack-nova | 23:02 | |
melwitt | yeah. another person we could ask is kgiusti, he knows oslo.messaging | 23:03 |
mnaser | so, instance in scheduling state, 7 messages in a bunch of scheduler_fanout_* queues | 23:04 |
mnaser | but nothing happening, so i guess the conductor is doing what it's supposed to do | 23:04 |
penick | @mnaser we've had weird issues with rabbitmq and ssl. Mostly with performance and the queue faceplanting though. I don't think we've ever had to wipe and recreate all queues. | 23:04 |
mnaser | :( sads, yeah, it's pretty destructive but not sure.. | 23:05 |
penick | Though the way we worked around the ssl issue is we switched from native SSL to using stunnel. Which is awful | 23:05 |
penick | something about native rabbit ssl is borked | 23:05 |
mnaser | oh wow, is the native rabbit ssl that bad | 23:05 |
mnaser | can i ask what release were you running then (or now)? | 23:05 |
penick | Ocata | 23:05 |
penick | oh, of rabbit | 23:05 |
penick | uh, one sec | 23:05 |
mnaser | thanks :> | 23:06 |
mnaser | there are exactly 48 scheduler_fanout_* queues | 23:06 |
mnaser | each controller has 16 nova-scheduler processes, which adds up to 48 | 23:06 |
mnaser | so the theory is nova-conductor is doing its job but the scheduler just isn't picking up work off the queue.. | 23:07 |
penick | 3.6.15 | 23:07 |
melwitt | penick: I wondered whether it might be related to queue mirroring or non, clustered or non, how it behaves when a node leaves and rejoins | 23:08 |
mnaser | yeah we're at 3.6.16 so we're not far away | 23:08 |
*** slaweq has quit IRC | 23:08 | |
mnaser | melwitt: yeah so actually i wondered if that was maybe the issue.. | 23:08 |
imacdonn | FWIW, I don't run clustered rabbitmq - just a single instance | 23:08 |
melwitt | I had thought there might have been some configuration where that doesn't work well, but I can't remember anything more about it | 23:09 |
mnaser | imacdonn: you're living life | 23:09 |
* mnaser would like to just buy a big box with a zillion cores and memory and run a single instance of rabbit | 23:09 | |
melwitt | :D | 23:09 |
imacdonn | I think so :) My career experience with clustering amounts to "it usually causes more problems than it solves" | 23:09 |
mnaser | thanks penick for the insights, i guess we might be hitting the weird ssl things you might have run into | 23:09 |
mnaser | i guess i can bring the log levels up on the mq stuff and see how/why it's getting bound and why it's not picking up any messages | 23:10 |
imacdonn | yeah, there's stuff you can turn on with default_log_levels | 23:10 |
mnaser | i mean i think the api is doing its job pushing things out to the scheduler fanout, but then things faceplant there | 23:11 |
*** slaweq has joined #openstack-nova | 23:11 | |
* mnaser always hates dealing with default_log_levels | 23:11 | |
*** dpawlik has joined #openstack-nova | 23:11 | |
melwitt | log levels in python are totally easy to control (not) | 23:12 |
mnaser | i'll at least shut down the other 2 schedulers so i can debug on one host | 23:12 |
*** dpawlik has quit IRC | 23:15 | |
mnaser | oh that's interesting | 23:16 |
mnaser | it actually schedules it | 23:16 |
mnaser | it even puts allocations in placement | 23:16 |
mnaser | sends notifications.. | 23:16 |
melwitt | so, it doesn't get past the scheduler? doesn't move on back to conductor and then to compute? | 23:18 |
mnaser | it gets to the scheduler, then the scheduler sends something, gets a reply, and it seems to stall out there | 23:20 |
*** brinzhang has joined #openstack-nova | 23:22 | |
mnaser | i guess i need to find out what happens after an instance is scheduled, it sends something back to the conductor (or it casts to that cell?) | 23:22 |
penick | @mnaser if it helps we don't use durable queues | 23:22 |
melwitt | mnaser: this is old but sounds kind of the same? https://github.com/rabbitmq/rabbitmq-server/issues/224 | 23:22 |
mnaser | hmm, wonder what the defaults are | 23:23 |
mnaser | oslo_messaging_rabbit.amqp_durable_queues = False yeah same here | 23:23 |
mnaser | oh interesting | 23:24 |
*** slaweq has quit IRC | 23:24 | |
melwitt | are you using mirrored queues? | 23:25 |
melwitt | penick doesn't use mirrored and I think that's the thing that had issues that I'm vaguely remembering. and that issue I linked from rabbitmq also seems to have to do with mirrored queues | 23:26 |
mnaser | i think we mirror a subset of queues in OSA | 23:26 |
*** dpawlik has joined #openstack-nova | 23:27 | |
melwitt | that issue was fixed eons ago but was about "some kind of race with mirrored, auto_delete queues" | 23:29 |
mnaser | https://github.com/openstack/openstack-ansible/blob/dc5729ad6f10b7e6083d60c2270f75090cf3d5f4/inventory/group_vars/all/infra.yml#L27-L31 | 23:29 |
melwitt | I wonder if a new race could have cropped up recently | 23:29 |
mnaser | so question, after an instance is scheduled, does it then contact the rabbitmq cluster @ the cell? | 23:31 |
*** markvoelker has quit IRC | 23:31 | |
*** dpawlik has quit IRC | 23:31 | |
*** jaypipes has quit IRC | 23:33 | |
melwitt | we do a sync call over rpc from conductor to scheduler, get the result in conductor, then conductor async casts to compute over rpc | 23:33 |
melwitt | it sounds like what you're saying is scheduler does fine, creates allocs in placement etc, but then the reply never goes back to conductor over rpc | 23:34 |
mnaser | it looks like the reply to the conductor does get sent out.. so i need to maybe go back and check the conductor this time i guess | 23:35 |
melwitt | if that is the case, then it's never getting to the cell mq part | 23:35 |
melwitt | if the super conductor receives the reply, the next step is super conductor casts to the cell mq | 23:35 |
mnaser | ok, ill move away from scheduler and start looking at conductor now | 23:36 |
mnaser | conductor, the worlds quietest service | 23:37 |
melwitt | :) | 23:37 |
melwitt | you know.... we do cache mq connection info. so if you find it's hanging up at the step where super conductor is trying to drop the "build instance" message onto the cell mq and that cached info is no longer valid, that could be the culprit | 23:38 |
melwitt | but no, you said things return to normal if you kill the rabbit queues and recreate them without restarting any nova services yeah? | 23:38 |
*** claudiub has quit IRC | 23:38 | |
melwitt | that would mean the nova cache thing doesn't matter | 23:38 |
mnaser | ok so schedule_and_build_instances gets to the point where it gets an assigned host | 23:40 |
mnaser | and it creates a block device mapping in the db for the cell | 23:42 |
melwitt | ok, so it's getting the reply from the scheduler. so it must be that the message going onto the cell mq is going into the ether | 23:43 |
melwitt | which makes me suspicious about the mq connection caching | 23:43 |
melwitt | mnaser: when you do the queues recreate thing, are you restarting nova services after? | 23:44 |
mnaser | yes i have to restart them too after (but i do that just to keep things clean rather than make it work) | 23:45 |
melwitt | *do you restart nova services after? | 23:45 |
melwitt | ok, then it might be a problem with that cache (it never refreshes) | 23:45 |
melwitt | have you ever tried restarting services without recreating the queues? | 23:46 |
mnaser | yep, no bueno | 23:46 |
melwitt | restarting would only help new server creates though, not the stuck ones | 23:47 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Fix cleaning up console tokens https://review.opendev.org/637716 | 23:47 |
mnaser | so i know i at least get to here: https://github.com/openstack/nova/blob/3370f0f03ce17aaf3a7ebaa95d497f62bef238c0/nova/conductor/manager.py#L1400-L1401 | 23:47 |
melwitt | mnaser: does recreating the queues and restarting services make the stuck servers finish building? | 23:48 |
mnaser | melwitt: yeah but ive restarted them all in this case and no bueno | 23:48 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Add a live migration regression test https://review.opendev.org/641200 | 23:48 |
mnaser | nope, all the stuck ones stay stuck | 23:48 |
melwitt | ok, so restarting services only will NOT make new server creates work? | 23:48 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (13) https://review.opendev.org/576020 | 23:48 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (14) https://review.opendev.org/576027 | 23:48 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: Remove mox in unit/network/test_neutronv2.py (15) https://review.opendev.org/576031 | 23:49 |
mnaser | melwitt: correct | 23:49 |
melwitt | if restarting services only doesn't make new server creates work, then I'd think the issue has to be with rabbit or oslo.messaging | 23:49 |
mnaser | yeah my feeling is this is oslo.messaging | 23:50 |
mnaser | https://github.com/openstack/nova/blob/3370f0f03ce17aaf3a7ebaa95d497f62bef238c0/nova/conductor/manager.py#L1462-L1473 -- i get here, i see it connect to rabbit, i see it cast on compute.<hostname> under exchange nova | 23:50 |
mnaser | and then nothing | 23:50 |
melwitt | could also be a mirrored queues thing in rabbit, there's been that similar bug in the past | 23:51 |
melwitt | that I linked earlier | 23:51 |
mnaser | yeah.. it also connects to rabbitmq successfully... well the weird thing is i think maybe what is broken is casts only? | 23:51 |
mnaser | because the normal rpc stuff works | 23:51 |
mnaser | its the compute cast that fails | 23:51 |
melwitt | you could try something that is a call to compute to see if that works. that would be an interesting test | 23:52 |
melwitt | like getting vnc console url | 23:52 |
mnaser | the service does report as up too inside nova | 23:52 |
mnaser | don't have any vms unfortunately :P | 23:52 |
melwitt | lemme see if there's something else | 23:52 |
mnaser | hypervisor-stats maybe | 23:52 |
melwitt | get_diagnostics | 23:53 |
mnaser | err, -uptime | 23:53 |
mnaser | isn't diag per vm | 23:53 |
mnaser | nova hypervisor-uptime <hypervisor-uuid> hangs | 23:54 |
melwitt | oh yeah, sorry | 23:54 |
openstackgerrit | Matt Riedemann proposed openstack/nova-specs master: Specifying az when restore shelved server https://review.opendev.org/624689 | 23:54 |
*** mriedem_afk has quit IRC | 23:54 | |
melwitt | ok, so it doesn't have to do with cast vs call, it's an issue with trying to get to the cell message queue in general | 23:54 |
*** rcernin has quit IRC | 23:55 | |
*** rcernin has joined #openstack-nova | 23:56 | |
mnaser | i mean the configs match.. in terms of cell_mapping | 23:56 |
mnaser | and it successfully connects.. | 23:56 |
mnaser | i guess now that i know its just reproducible with something far more basic.. | 23:56 |
melwitt | yeah... true | 23:56 |
sean-k-mooney | melwitt: are you suspecting this is similar to the wsgi issue? | 23:58 |
mnaser | CALL msg_id: 7dc9fa9615ed4ef8bf083ac84e40beef exchange 'nova' topic 'compute.<hostname>' _send | 23:59 |
melwitt | sean-k-mooney: no, this is something different. I wondered if it's something like this old bug that was fixed in the past, if another race like this has re-emerged https://github.com/rabbitmq/rabbitmq-server/issues/224 | 23:59 |
mnaser | that's how far it'll get and then hang, same way it hung for the new compute | 23:59 |
mnaser | i mean let me restart the nova-compute... but still feels like it should be recovering on its own.. | 23:59 |
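On the 48 `scheduler_fanout_*` queues counted at 23:06: mnaser's arithmetic is 3 controllers × 16 nova-scheduler workers = 48, which suggests one fanout queue per scheduler worker process. The minimal Python sketch below re-checks that count, and the messages sitting in those queues, from `rabbitmqctl list_queues name messages` output. It assumes one tab-separated `name<TAB>messages` pair per line and skips banner and header lines. The sample input is made up for illustration.

```python
from typing import Tuple


def summarize_queues(listing: str, prefix: str) -> Tuple[int, int]:
    """Count queues whose name starts with `prefix` and sum their message counts.

    `listing` is assumed to look like `rabbitmqctl list_queues name messages`
    output: one "<name>\t<messages>" pair per line.
    """
    count = 0
    total = 0
    for line in listing.splitlines():
        parts = line.split("\t")
        # Skip banners ("Listing queues ...") and headers whose second column isn't a number.
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        if parts[0].strip().startswith(prefix):
            count += 1
            total += int(parts[1])
    return count, total


controllers, workers_per_controller = 3, 16
expected_fanout_queues = controllers * workers_per_controller  # 48

# Made-up sample, not output from the cluster discussed above.
sample = (
    "Listing queues ...\n"
    "scheduler_fanout_0a1b\t7\n"
    "scheduler_fanout_2c3d\t0\n"
    "compute.host1\t0\n"
)
print(expected_fanout_queues, summarize_queues(sample, "scheduler_fanout_"))  # 48 (2, 7)
```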
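On turning up messaging logs with `default_log_levels` (23:10 to 23:12): the option takes a list of `logger=LEVEL` strings. Setting it in nova.conf replaces the whole default list rather than merging into it, which is part of why it is awkward to use. The sketch below only illustrates what such pairs mean, using the standard library `logging` module. It is not oslo.log's code, and the logger names chosen are an assumption about which ones are useful for this kind of debugging.

```python
import logging
from typing import Dict, Iterable


def apply_log_levels(pairs: Iterable[str]) -> Dict[str, int]:
    """Set stdlib logger levels from "logger=LEVEL" strings."""
    applied = {}
    for pair in pairs:
        name, _sep, level_name = pair.partition("=")
        level = logging.getLevelName(level_name.strip().upper())
        if not isinstance(level, int):
            # getLevelName returns a "Level X" string for unknown names.
            raise ValueError(f"unknown log level in {pair!r}")
        logging.getLogger(name.strip()).setLevel(level)
        applied[name.strip()] = level
    return applied


# Hypothetical override: only the messaging-related loggers are raised to DEBUG.
levels = apply_log_levels(["oslo.messaging=DEBUG", "oslo_messaging=DEBUG", "amqp=DEBUG"])
print(levels)  # {'oslo.messaging': 10, 'oslo_messaging': 10, 'amqp': 10}
```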
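On which queues OSA mirrors (23:26, the linked infra.yml): RabbitMQ applies a policy to every queue whose name matches the policy's regex pattern. The pattern below is an assumption, modeled on a "mirror everything except `amq.*`, fanout and reply queues" rule. Check the linked file for the value actually deployed. Under that assumption, the `scheduler_fanout_*` and `reply_*` queues seen in this session would not be mirrored, while topic queues such as `compute.<hostname>` would be.

```python
import re

# ASSUMED HA policy pattern; confirm against the linked openstack-ansible infra.yml.
HA_POLICY_PATTERN = r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*"


def is_mirrored(queue_name: str, pattern: str = HA_POLICY_PATTERN) -> bool:
    """True if a policy with `pattern` would apply to `queue_name` (regex search)."""
    return re.search(pattern, queue_name) is not None


for name in ["compute.host1", "scheduler_fanout_0a1b", "reply_9f8e", "notifications.info"]:
    print(f"{name}: {'mirrored' if is_mirrored(name) else 'not mirrored'}")
```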
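On the build flow melwitt describes at 23:33: the super conductor makes a synchronous call to the scheduler over the API-level message queue and waits for the reply. It then makes an asynchronous cast to the chosen compute host over the cell's message queue. The toy model below builds two in-process "brokers" out of `queue.Queue` objects to show why the symptoms look the way they do. A cast that nobody consumes produces no error on the sender's side, while a call to the same destination blocks until it times out, much like `nova hypervisor-uptime` hanging at 23:54. Everything here is hypothetical and is not oslo.messaging's API. The only real names are `select_destinations` and `build_and_run_instance`, which are nova's scheduler and compute RPC method names.

```python
import queue
import threading
import uuid


class ToyBroker:
    """In-process stand-in for one message queue: named FIFO queues only."""

    def __init__(self) -> None:
        self.queues = {}

    def declare(self, name: str) -> "queue.Queue":
        return self.queues.setdefault(name, queue.Queue())

    def publish(self, name: str, message: dict) -> None:
        self.declare(name).put(message)


def cast(broker: ToyBroker, topic: str, message: dict) -> None:
    # Fire-and-forget: returns as soon as the message is queued.
    broker.publish(topic, message)


def call(broker: ToyBroker, topic: str, message: dict, timeout: float) -> dict:
    # Request/response: attach a private reply queue, then block for the answer.
    reply_to = "reply_" + uuid.uuid4().hex
    reply_queue = broker.declare(reply_to)
    broker.publish(topic, dict(message, reply_to=reply_to))
    return reply_queue.get(timeout=timeout)  # raises queue.Empty on timeout


def toy_scheduler(broker: ToyBroker, stop: threading.Event) -> None:
    requests = broker.declare("scheduler")
    while not stop.is_set():
        try:
            request = requests.get(timeout=0.05)
        except queue.Empty:
            continue
        broker.publish(request["reply_to"], {"host": "host1"})


api_mq, cell_mq = ToyBroker(), ToyBroker()
stop = threading.Event()
worker = threading.Thread(target=toy_scheduler, args=(api_mq, stop), daemon=True)
worker.start()

# Super conductor -> scheduler: synchronous call on the API-level queue.
selection = call(api_mq, "scheduler", {"method": "select_destinations"}, timeout=2.0)
stop.set()
worker.join()

# Super conductor -> compute: asynchronous cast on the cell queue. No toy compute
# consumer exists, so the message just sits there and the sender sees no error.
cast(cell_mq, "compute." + selection["host"], {"method": "build_and_run_instance"})
print(selection, cell_mq.queues["compute.host1"].qsize())  # {'host': 'host1'} 1
```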
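On the cached cell MQ connection theory (23:38 and 23:45): the concern is a per-cell cache that is filled once and never refreshed, so a stale entry would survive until the process restarts. The class below is a hypothetical, minimal version of that shape, not nova's actual caching code. It also shows why the restart-only experiment at 23:46 separates the two theories. A fresh process starts with an empty cache, so if restarting services alone does not fix new builds, a stale cache entry is unlikely to be the cause.

```python
from typing import Callable, Dict


class CellTransportCache:
    """Hypothetical per-cell cache: build a transport once per cell, reuse it forever."""

    def __init__(self, factory: Callable[[str], dict]) -> None:
        self._factory = factory
        self._cache: Dict[str, dict] = {}

    def get(self, cell_uuid: str, transport_url: str) -> dict:
        # Only the first transport_url seen for a cell is ever used; later
        # changes are ignored because nothing invalidates the entry.
        if cell_uuid not in self._cache:
            self._cache[cell_uuid] = self._factory(transport_url)
        return self._cache[cell_uuid]


def make_transport(url: str) -> dict:
    return {"url": url}  # stand-in for a real messaging transport


long_running = CellTransportCache(make_transport)
long_running.get("cell1", "rabbit://old-host:5671/")
print(long_running.get("cell1", "rabbit://new-host:5671/")["url"])  # rabbit://old-host:5671/

restarted = CellTransportCache(make_transport)  # a restart starts with an empty cache
print(restarted.get("cell1", "rabbit://new-host:5671/")["url"])  # rabbit://new-host:5671/
```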
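On the cast and call that reach `exchange 'nova' topic 'compute.<hostname>'` and then go nowhere (23:50 and 23:59): as the log line suggests, the message is published to an exchange named `nova` with the topic as its routing key. A compute service receives it only if a queue is bound to that exchange with a matching key. In AMQP 0-9-1, a plain publish that matches no binding is silently discarded; the publisher only hears about it if it sets the `mandatory` flag. So a missing or broken binding would look exactly like "it connects, it sends, and then nothing". Whether bindings were actually lost here is a hypothesis that this log does not confirm. The toy topic exchange below implements the standard topic matching rules (`*` is exactly one word, `#` is zero or more) with in-memory lists.

```python
from typing import Dict, List, Tuple


def topic_matches(binding_key: str, routing_key: str) -> bool:
    """AMQP topic-exchange matching: '*' matches one word, '#' matches zero or more."""
    pattern, words = binding_key.split("."), routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(words)
        if pattern[i] == "#":
            # Let '#' absorb any number of words, including none.
            return any(match(i + 1, k) for k in range(j, len(words) + 1))
        if j == len(words):
            return False
        return pattern[i] in ("*", words[j]) and match(i + 1, j + 1)

    return match(0, 0)


class ToyTopicExchange:
    def __init__(self) -> None:
        self.bindings: List[Tuple[str, str]] = []  # (binding_key, queue_name)
        self.queues: Dict[str, List[dict]] = {}

    def bind(self, queue_name: str, binding_key: str) -> None:
        self.queues.setdefault(queue_name, [])
        self.bindings.append((binding_key, queue_name))

    def publish(self, routing_key: str, message: dict) -> int:
        targets = {q for key, q in self.bindings if topic_matches(key, routing_key)}
        for q in targets:
            self.queues[q].append(message)
        return len(targets)  # 0 means the message was dropped as unroutable


nova = ToyTopicExchange()
nova.bind("compute.host1", "compute.host1")
delivered = nova.publish("compute.host1", {"method": "build_and_run_instance"})

nova.bindings.clear()  # simulate a lost binding; the queue itself still exists
dropped = nova.publish("compute.host1", {"method": "build_and_run_instance"})
print(delivered, dropped, len(nova.queues["compute.host1"]))  # 1 0 1
```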