*** EugenMayer4 is now known as EugenMayer | 08:36 | |
opendevreview | Maxim Korezkij proposed openstack/nova master: Handle volume attachments https://review.opendev.org/c/openstack/nova/+/833232 | 09:13 |
---|---|---|
opendevreview | Maxim Korezkij proposed openstack/nova master: Handle volume attachments https://review.opendev.org/c/openstack/nova/+/833233 | 09:15 |
opendevreview | Maxim Korezkij proposed openstack/nova master: Handle volume attachments https://review.opendev.org/c/openstack/nova/+/833234 | 09:21 |
lajoskatona | bauzas: Hi, do you know if there is any Nova/Neutron xproject topic? On the nova etherpad there's none at the moment, but perhaps you know more :-) | 09:28 |
bauzas | lajoskatona: hey :) | 09:30 |
bauzas | lajoskatona: I created the etherpad last week, so for the moment, we don't have a lot of topics | 09:30 |
bauzas | lajoskatona: but you can surely add your own topics if you want and we can find some timeslot after ;) | 09:31 |
opendevreview | ribaudr proposed openstack/nova master: Fix unit tests when they are run with OS_DEBUG=True https://review.opendev.org/c/openstack/nova/+/833115 | 09:31 |
lajoskatona | bauzas: ok, thanks, I will check, but currently I don't know any | 09:31 |
bauzas | lajoskatona: I can add a section, in case people want to discuss | 09:32 |
lajoskatona | bauzas: thanks, perhaps that's the best, and we will see if there will be concrete issues/features to discuss | 09:35 |
bauzas | lajoskatona: just updating the etherpad | 09:36 |
*** gibi is now known as gibi_pto | 09:38 | |
gibi_pto | I will be back around a bit Monday and Thursday | 09:39 |
elodilles | gibi_pto: we will have national holiday on those days ;) | 09:42 |
gibi_pto | not on thursday :) | 09:42 |
elodilles | true :) | 09:43 |
gibi_pto | and my wife works on Monday so she cannot prevent me from looking at IRC | 09:43 |
kashyap | gibi_pto: Get off IRC, you're polluting your PTO, really :) | 09:45 |
elodilles | :] | 09:45 |
kashyap | Embrace JOMO (joy of missing out). | 09:46 |
opendevreview | OpenStack Release Bot proposed openstack/nova stable/yoga: Update .gitreview for stable/yoga https://review.opendev.org/c/openstack/nova/+/833241 | 09:54 |
opendevreview | OpenStack Release Bot proposed openstack/nova stable/yoga: Update TOX_CONSTRAINTS_FILE for stable/yoga https://review.opendev.org/c/openstack/nova/+/833242 | 09:54 |
opendevreview | OpenStack Release Bot proposed openstack/nova master: Update master for stable/yoga https://review.opendev.org/c/openstack/nova/+/833243 | 09:54 |
opendevreview | OpenStack Release Bot proposed openstack/nova master: Add Python3 zed unit tests https://review.opendev.org/c/openstack/nova/+/833244 | 09:54 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Attach Manila shares via virtiofs(manila abstraction) https://review.opendev.org/c/openstack/nova/+/831194 | 10:02 |
opendevreview | ribaudr proposed openstack/nova master: [WIP] Enable and use COMPUTE_STORAGE_VIRTIO_FS and COMPUTE_MEM_BACKING_FILE traits. https://review.opendev.org/c/openstack/nova/+/833090 | 10:02 |
opendevreview | Elod Illes proposed openstack/nova stable/yoga: [stable-only] Update .gitreview for stable/yoga https://review.opendev.org/c/openstack/nova/+/833241 | 10:03 |
opendevreview | Elod Illes proposed openstack/nova stable/yoga: [stable-only] Update TOX_CONSTRAINTS_FILE for stable/yoga https://review.opendev.org/c/openstack/nova/+/833242 | 10:03 |
bauzas | gibi_pto: enjoy your time off | 10:07 |
bauzas | elodilles: so not around too on Monday and Tuesday ? | 10:09 |
elodilles | bauzas: yepp, those are holidays here | 10:13 |
bauzas | cool, enjoy then ! | 10:14 |
elodilles | bauzas: thanks :) | 10:15 |
elodilles | bauzas: though i also might be around, will look at IRC and participate on the weekly meeting :) | 10:16 |
bauzas | elodilles: this is holidays | 10:16 |
elodilles | but i don't promise anything o:) | 10:16 |
elodilles | bauzas: yepp :D | 10:16 |
bauzas | elodilles: don't be around and rest | 10:16 |
elodilles | bauzas: will do that as well ;) | 10:16 |
bauzas | in August, I'll be off for 3.5 weeks and I'd be back 2 days before FF... | 10:17 |
elodilles | that sounds like a brave thing :D | 10:17 |
elodilles | but the team is here, so there will be surely someone who can help in :) | 10:18 |
elodilles | will it be a big family - tesla tour around France? :) | 10:19 |
opendevreview | changxin xiao proposed openstack/nova master: Fix openstack/nova git repo https://bugs.launchpad.net/nova/+bug/1964548 https://review.opendev.org/c/openstack/nova/+/833248 | 10:24 |
bauzas | elodilles: just the usual annual family holidays with 2 weeks in Corsica, indeed | 10:28 |
elodilles | that also sounds relaxing :) | 10:28 |
bauzas | with 2 kids ? not exactly relaxing | 10:42 |
bauzas | but at least, it's called "holidays" | 10:42 |
bauzas | the only change is I won't dad taxi every day | 10:42 |
opendevreview | Maxim Korezkij proposed openstack/nova master: fixup! Handle volume attachments https://review.opendev.org/c/openstack/nova/+/833257 | 10:44 |
opendevreview | Maxim Korezkij proposed openstack/nova master: Handle volume attachments https://review.opendev.org/c/openstack/nova/+/833234 | 10:45 |
elodilles | bauzas: let's say, different kind of relaxing :D | 11:57 |
opendevreview | Merged openstack/nova stable/yoga: [stable-only] Update .gitreview for stable/yoga https://review.opendev.org/c/openstack/nova/+/833241 | 12:46 |
opendevreview | Merged openstack/nova stable/yoga: [stable-only] Update TOX_CONSTRAINTS_FILE for stable/yoga https://review.opendev.org/c/openstack/nova/+/833242 | 12:51 |
opendevreview | Erlon R. Cruz proposed openstack/nova master: Adds regression test for bug LP#1944619 https://review.opendev.org/c/openstack/nova/+/833166 | 13:55 |
opendevreview | Erlon R. Cruz proposed openstack/nova master: Fix pre_live_migration rollback https://review.opendev.org/c/openstack/nova/+/815324 | 13:55 |
bauzas | huzzah, we now have master be Zed :) | 14:01 |
bauzas | thanks elodilles | 14:01 |
chateaulav | all hail Zed | 14:02 |
*** dasm|off is now known as dasm | 14:06 | |
dansmith | kashyap: I wonder if you might be interested in chasing a CI failure I've seen a couple times on centos jobs, where we ask libvirt to detach a volume, and it just never happens | 14:29 |
dansmith | it's just, AFAICT, a simple file-based volume so I dunno if it's a guest refusing to let it go or what | 14:30 |
kashyap | dansmith: Got a bug or a link? (Detach volume sucks the marrow out of my life ... but got used to it :D) | 14:36 |
dansmith | yeah, lemme get one | 14:36 |
kashyap | Is it this one? - https://bugs.launchpad.net/nova/+bug/1960346 | 14:39 |
dansmith | kashyap: https://zuul.opendev.org/t/openstack/build/87df2018e335440f830b08fe1a05bfb7/logs | 14:41 |
* kashyap clicks | 14:42 | |
dansmith | kashyap: ah, looks like it! | 14:42 |
kashyap | Okay, I was hoping: "Oh, not again, not a new one" -- me and Gibi recently spent a few days chasing it down | 14:43 |
dansmith | so what's the outcome there? it looks like just making a centos job non-voting :) | 14:43 |
* kashyap goes to check the email thread about it | 14:44 | |
kashyap | There was a direct thread about it w/ libvirt folks. We had two hypotheses: | 14:45 |
kashyap | (Actually I noted both in the bug. Lemme look what's the current status) | 14:45 |
kashyap | dansmith: Okay, to summarize, our two hypotheses were these: | 14:45 |
kashyap | 1) the guest OS didn't confirm the detach | 14:45 |
kashyap | 2) there was a recent bug in QEMU triggered by using JSON syntax for `-device` | 14:45 |
kashyap | It turns out to be #1 | 14:46 |
dansmith | but why does this seem to only happen on the centos jobs? it's all the same cirros in the guest right? | 14:46 |
kashyap | For now, we've hacked around it by adding extra delay :-( | 14:46 |
dansmith | are you talking about the wait_until=SSHABLE patches in terms of the delay? | 14:47 |
kashyap | dansmith: Good question! Damned if I know, why it's happening only on CentOS jobs. (I'm tempted to say "something to do w/ virt-package versions") | 14:47 |
kashyap | dansmith: No, that's the bigger thing that's not merged yet (IIRC). There was a 120-sec delay patch from Gibi ... lemme look | 14:48 |
dansmith | because I think it's still happening, AFAICT | 14:48 |
dansmith | oh jeez, 120s? | 14:48 |
sean-k-mooney | kashyap: the centos 9 stream jobs are much much newer versions of libvirt and qemu vs ubuntu | 14:48 |
sean-k-mooney | i think sshable has merged or mostly merged in the last day or two | 14:49 |
dansmith | right, which is why I'm not sure how we could ever really run on something so bleeding edge where we need stability | 14:49 |
sean-k-mooney | im not sure if all of the patches are landed but there was progress on them | 14:49 |
dansmith | sean-k-mooney: that landed on 3/3 I think | 14:49 |
sean-k-mooney | well its the same version we will be releasing stable wallaby on downstream | 14:49 |
sean-k-mooney | also ubuntu 22.04 will have similar version when it releases | 14:50 |
sean-k-mooney | so its kind of good that centos 9 stream is catching this | 14:50 |
dansmith | sean-k-mooney: doesn't stream9's version track upstream closer, like almost constantly moving? | 14:51 |
kashyap | Not quite; it is the "upstream of RHEL" | 14:51 |
kashyap | So not as bleeding as Fedora, but not as "stable" as RHEL either | 14:52 |
sean-k-mooney | dansmith: the packages are older than fedora but newer than rhel but not by much | 14:52 |
kashyap | So you might get the worst of everything w/ Stream :D | 14:52 |
dansmith | heh | 14:52 |
sean-k-mooney | effectively stream is what would be in the next point release of rhel | 14:53 |
dansmith | well, I'd be concerned if we're adding sleeps to paper over something that would be a real problem with the host triggering the guest to release the block device | 14:53 |
dansmith | seems like something must have changed if the guest is identical | 14:53 |
sean-k-mooney | dansmith: the sleep was to see if it was related to the kernel booting | 14:53 |
sean-k-mooney | not an actual fix | 14:53 |
sean-k-mooney | before the sshable series landed | 14:53 |
dansmith | sean-k-mooney: ah okay I thought kashyap was suggesting that was the workaround | 14:54 |
sean-k-mooney | at least that was my understanding | 14:54 |
dansmith | [06:46:25] <kashyap> For now, we've hacked around it by adding extra delay :-( | 14:54 |
dansmith | this ^ but maybe that's not what he meant | 14:54 |
sean-k-mooney | the sleep i believe was in tempest not nova if its the patch im thinking of | 14:54 |
kashyap | dansmith: Sorry, I should've been clearer; I don't see the 120sec patch in tree; but I swear Gibi mentioned it in a thread | 14:54 |
dansmith | sean-k-mooney: yeah I assumed tempest | 14:55 |
kashyap | sean-k-mooney: For the SSHable series to land, "someone" (a body) needs to shepherd it...Not sure who has the will for it | 14:55 |
sean-k-mooney | kashyap: gmann took it over | 14:56 |
sean-k-mooney | i think its landed already | 14:56 |
dansmith | yeah it's already merged | 14:56 |
dansmith | and I still see fails, like this one I think from five days later: | 14:56 |
dansmith | https://zuul.opendev.org/t/openstack/build/ee63e247893c42a69e096e14f4305850 | 14:56 |
dansmith | I haven't dug into that one yet but looks the same | 14:57 |
sean-k-mooney | ya we did not know if the sshable would fix it we just hoped it would reduce the issue. we saw some cases where tests were doing attach, detach and live migrate all before the kernel was at the login prompt from the initial boot | 14:58 |
sean-k-mooney | and the kernel then crashed during the migration | 14:58 |
dansmith | okay, but nova retries things like the volume detach like ten times, | 14:58 |
dansmith | so I would expect that would resolve that race for detach right? | 14:59 |
sean-k-mooney | ya so we need to remove that retry | 14:59 |
sean-k-mooney | gibi and i spoke about this a while ago | 14:59 |
sean-k-mooney | kashyap: can probably verify but i think qemu now considers it an error to detach an already detaching volume | 14:59 |
dansmith | is libvirt/qemu only requesting the guest drop it once? because subsequent ones seem to be a libvirt refusal to try again | 15:00 |
dansmith | yeah I see that error in the logs after the first attempt | 15:00 |
sean-k-mooney | ya so it used to be undefined behavior that on some release aborted the detach then qemu started rejecting it and | 15:00 |
kashyap | sean-k-mooney: Yes, that's right - a device that is already being unplugged, QEMU will consider another attempt at it an error | 15:00 |
sean-k-mooney | im not really sure what the intended behavior is now | 15:01 |
dansmith | so .. what to do? if the guest isn't ready to handle the signal we're just screwed until reboot or something? | 15:01 |
dansmith | that seems pretty broken | 15:01 |
sean-k-mooney | i dont know honestly. i dont think we want to do what we did in the past for snapshot which was stop and start the vm | 15:02 |
sean-k-mooney | when live snapshots were not possible | 15:02 |
dansmith | yeah :/ | 15:02 |
sean-k-mooney | other than that i dont know of a way to force this from nova | 15:02 |
dansmith | but again, I'm still curious about why this seems to only happen on centos hosts | 15:02 |
dansmith | unless you think that behavior changed between 20.04's libvirt and now? | 15:03 |
sean-k-mooney | we were seeing it on ubuntu too with q35 | 15:03 |
kashyap | dansmith: So, this was the DNM patch of 120sec delay: https://review.opendev.org/c/openstack/devstack/+/828705/8/.zuul.yaml | 15:03 |
sean-k-mooney | i dont think we have seen this with pc and ubuntu | 15:03 |
dansmith | sean-k-mooney: oh, are the centos jobs all q35 by default? | 15:03 |
sean-k-mooney | no i dont think so | 15:04 |
kashyap | The thing is, CirrOS needs 10 sec to boot in our CI, but Nova returns "ACTIVE" when the guest spawns. | 15:04 |
dansmith | kashyap: ah that's the timeout not a delay, and just runs more futile retries I guess? | 15:04 |
sean-k-mooney | but it could be a combination of the different way attach is done with q35 and the new versions treatment of pc | 15:04 |
kashyap | Right, I read the timeout as "delay before the detach" | 15:04 |
sean-k-mooney | dansmith: we could try using a debian or maybe ubuntu latest job to verify | 15:04 |
sean-k-mooney | ill check nodepool quickly | 15:05 |
sean-k-mooney | but i think we have a debian 11 image available | 15:05 |
sean-k-mooney | and i think it will have a similar libvirt as c9s | 15:05 |
kashyap | dansmith: Also, yes, Q35 does have some additional hidden special bugs with hot unplug. I was told <cough> CentOS/RHEL 8.6 has better fixes in that area | 15:05 |
kashyap | (Ouch, I should not use the c-word in these times) | 15:06 |
dansmith | kashyap: may I suggest <ahem> | 15:06 |
kashyap | Heh, sure | 15:07 |
sean-k-mooney | debian-bullseye is there so we could try recreating it on that | 15:07 |
kashyap | dansmith: Incidentally, I was supposed to work with Red Hat QE today/Monday to test those bits | 15:08 |
dansmith | sean-k-mooney: so what's the point of doing that? to see if it seems to be characteristic of new libvirt/qemu and not something else about stream9 itself? | 15:08 |
kashyap | ("those bits" == supposed fixes in QEMU from 8.6) | 15:09 |
dansmith | kashyap: does that make it to stream9 at some point I hope? | 15:09 |
kashyap | Yes, definitely. They should. | 15:09 |
kashyap | "There were number of improvements for both native PCI-E (albeit it still slow to react (due to how it's implemented in guest OS) and now q35 supports ACPI base hotplug, can you check with latest machine type (which supposedly should use ACPI hotplug) and see if it resolved the issue." | 15:10 |
sean-k-mooney | dansmith: yep basically | 15:10 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=2007129 | 15:10 |
kashyap | That's the comment from a PCI(e) dev from QEMU (from a RHT bug) | 15:10 |
sean-k-mooney | ^ that is the main bug right | 15:10 |
kashyap | (Where "latest machine type" == 8.6 / 9) | 15:10 |
dansmith | ack | 15:12 |
sean-k-mooney | from that bug they say "This bug is related to some change in qemu-6.2 so it should not be there in RHEL 8.4/8.2," | 15:12 |
sean-k-mooney | however we have had detach issue on 8.4 | 15:12 |
sean-k-mooney | anyway it might just be down to the use of 6.2+ on c9s | 15:12 |
*** hemna1 is now known as hemna | 15:27 | |
opendevreview | Andre Aranha proposed openstack/nova master: Move FIPS jobs to experimental and periodic queue https://review.opendev.org/c/openstack/nova/+/833431 | 15:50 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] enable block VDPA operations https://review.opendev.org/c/openstack/nova/+/832330 | 15:55 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] enable blocked VDPA operations https://review.opendev.org/c/openstack/nova/+/832330 | 15:55 |
opendevreview | Andre Aranha proposed openstack/nova master: Test setting the nova job to centos-9-stream https://review.opendev.org/c/openstack/nova/+/831844 | 15:56 |
sean-k-mooney | i need to add a few unit tests and a release note but i think ^ that is basically done | 15:56 |
sean-k-mooney | i still want to test this with real hardware however before i do and i probably need to update the docs too | 15:57 |
* bauzas stops to work for this week | 15:58 | |
bauzas | \o | 15:58 |
sean-k-mooney | o/ | 16:01 |
opendevreview | sean mooney proposed openstack/nova stable/xena: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833411 | 16:14 |
opendevreview | sean mooney proposed openstack/nova stable/wallaby: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833435 | 16:21 |
opendevreview | sean mooney proposed openstack/nova stable/victoria: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833436 | 16:22 |
opendevreview | sean mooney proposed openstack/nova stable/ussuri: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833437 | 16:23 |
opendevreview | sean mooney proposed openstack/nova stable/train: reenable greendns in nova. https://review.opendev.org/c/openstack/nova/+/833438 | 16:23 |
opendevreview | Takashi Natsume proposed openstack/nova master: Update min supported service version for Zed https://review.opendev.org/c/openstack/nova/+/833440 | 16:49 |
zigo | I'm packaging nova RC1. I've seen that os-win is removed. Is that library useless now? | 17:05 |
zigo | Oh... setup.cfg ... :P | 17:06 |
opendevreview | Takashi Natsume proposed openstack/nova master: Update contributor guide for Zed https://review.opendev.org/c/openstack/nova/+/833441 | 17:06 |
sean-k-mooney | zigo: its an optional dep (and always was) so its now in extras | 17:21 |
sean-k-mooney | zigo: https://github.com/openstack/nova/commit/86d87be8db588cc3125d53cd92e271fb45b1a3aa for context | 17:22 |
sean-k-mooney | zigo: this was partly prompted by unmaintained packages that were breaking the gate | 17:23 |
zigo | sean-k-mooney: Is zVMCloudConnector completely gone? | 17:24 |
zigo | Or will it stay ... | 17:24 |
zigo | In other words: should I ask for its removal from Debian and erase all traces of it? | 17:25 |
gmann | dansmith: kashyap sean-k-mooney for detach failure/SSHable things, this last one needs to be merged, rescue negative test which this patch making SSHable was failing in reported bug. https://review.opendev.org/c/openstack/tempest/+/831608 | 17:25 |
gmann | it is not ready, need to debug on change failure though | 17:25 |
dansmith | gmann: ah are you saying that the sshable patches that already merged are working and that there are just a few remaining ones needing to be converted (in that patch)? | 17:29 |
sean-k-mooney | zigo: we still have it in tree https://github.com/openstack/nova/tree/master/nova/virt/zvm im not sure what its state is | 17:30 |
gmann | dansmith: patches merged are few volume detach are made SSH-able but the failing test in centos9-stream was rescue negative which is in-progress in 831608 | 17:30 |
gmann | dansmith: those merged were not failing, may be due to the wait between server create and detach operation call. in rescue negative timing were playing key role | 17:31 |
dansmith | gmann: okay the latest failure I'm looking at includes test_rescued_vm_detach_volume but there's another in there, which may or may not be related | 17:32 |
dansmith | but yeah good to know | 17:32 |
sean-k-mooney | dansmith: how often are you seeing the failure by the way | 17:34 |
sean-k-mooney | is it blocking the gate consistently | 17:34 |
gmann | dansmith: yeah that test and in negative test just try the detach and assert expected failure as detach cannot be done on rescue server but later this test does un-rescue server and detach in cleanup there it stuck | 17:34 |
dansmith | sean-k-mooney: on centos, this was one very common one we were suffering in the glance job | 17:34 |
gmann | * in that negative test | 17:34 |
sean-k-mooney | does glance need to test this? | 17:35 |
dansmith | gmann: do those tests actually ssh for some reason, or are we just using ssh to determine readiness? | 17:35 |
gmann | dansmith: just to check readiness | 17:35 |
dansmith | sean-k-mooney: yeah, this was a glance-cinder-multistore job which needs to run volume-related tests of course | 17:35 |
sean-k-mooney | some volume operations are certainly glance related but detach probably isnt | 17:35 |
dansmith | gmann: we could use the login prompt via console instead to reduce the need for secgroups, if that's hard for some reason | 17:36 |
dansmith | sean-k-mooney: it's a job that tests cinder-glance multistore arrangements | 17:36 |
gmann | dansmith: that is one try if that fix it. but there might be some other issue. we will see if that patch (once pass gate) can pass centos9 job too | 17:36 |
sean-k-mooney | dansmith: we removed using the console in an earlier patch | 17:36 |
dansmith | sean-k-mooney: but the only reason we were running on centos was because we were trying to get a fips job and used one of our existing wide-coverage jobs to do that | 17:36 |
dansmith | gmann: ack | 17:37 |
gmann | yeah SSH-able was preferred than console check | 17:37 |
dansmith | it's definitely good, it's just more than required for this but fair enough | 17:37 |
gmann | let me debug sec group thing today or monday. I compared and it was same as other test doing but i might be missing something | 17:37 |
sean-k-mooney | i think the console check failed in a specific edgecase but i dont recall what it was exactly. | 17:38 |
dansmith | it's just out-of-band, so a little less fragile, | 17:38 |
dansmith | but it's cool if we're trying to stick to sshable as the indicator | 17:38 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Add whitebox-devstack-multinode job to periodic https://review.opendev.org/c/openstack/nova/+/833453 | 18:10 |
*** artom__ is now known as artom | 18:10 | |
opendevreview | Dan Smith proposed openstack/nova master: Attempt to thin out nova-ceph-multistore https://review.opendev.org/c/openstack/nova/+/833470 | 21:46 |
*** dasm is now known as dasm|off | 22:27 |
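
The detach thread above (14:58 to 15:01) lands on one point: once an unplug has been requested, QEMU treats a second request for the same device as an error, so retrying the request does not help. A minimal sketch of the "ask once, then wait" alternative follows. It is not Nova, libvirt, or Tempest code: `request_detach` and `device_present` are hypothetical callables standing in for whatever the driver provides, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable


def detach_once_and_wait(
    request_detach: Callable[[], None],
    device_present: Callable[[], bool],
    timeout: float = 20.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Send a single detach request, then poll until the device is gone.

    Returns True if the device disappeared before the timeout, else False.
    The request is deliberately never re-sent, since a repeated unplug of a
    device that is already being unplugged is assumed to be rejected.
    """
    request_detach()  # hypothetical: one unplug request to the hypervisor
    deadline = clock() + timeout
    while device_present():  # hypothetical: is the disk still in the guest config?
        if clock() >= deadline:
            return False  # guest never released the device
        sleep(interval)
    return True
```

If the guest is not yet far enough into boot to acknowledge the unplug, this returns False rather than looping on rejected requests, which is the case the SSHABLE wait in the tests is meant to avoid.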