16:02:15 #startmeeting nova
16:02:15 Meeting started Tue Nov 7 16:02:15 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:15 The meeting name has been set to 'nova'
16:02:22 o/
16:02:30 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:02:45 (this will be live, I haven't updated the wiki yet)
16:03:08 o/
16:03:43 o/
16:04:44 there, let's start
16:04:48 #topic Bugs (stuck/critical)
16:04:53 #info No Critical bug
16:04:58 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 32 new untriaged bugs (-4 since the last meeting)
16:05:02 #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:05:12 Uggla_: any bug you wanted to tell us about?
16:05:52 anyway, let's move on
16:06:09 elodilles: fancy taking the baton?
16:06:44 bauzas: yepp
16:06:54 i can take it :)
16:06:55 cool thanks
16:07:08 #info bug baton is elodilles
16:07:15 elodilles: ++
16:08:12 oh
16:08:16 yes
16:08:23 #topic Gate status
16:08:28 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:08:33 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:08:43 https://bugs.launchpad.net/nova/+bug/2039381
16:08:47 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:09:13 Uggla_: oh sorry we moved to the gate section, can we discuss your bug in the open discussion then?
16:09:46 melwitt has a fix up for the vnc thing.. I've asked her a question, but assume we'll get that on the way soon once she's around
16:09:58 fwiw, all periodics are in green
16:10:04 yep sure
16:10:08 dansmith: oh, nice to hear, any change I could dent?
16:10:41 sec
16:11:17 https://review.opendev.org/c/openstack/grenade/+/900257
16:11:50 dansmith: cool, CCing it
16:12:18 oh that's why it's failing intermittently
16:12:25 depending on the host you land on
16:12:53 I nacked the idea when gibi suggested it because I was too lazy to look at the job and I just trusted the fact we got green runs
16:12:56 what a shame
16:13:04 bauzas: that's what I'm wondering
16:13:19 seems like we should be failing a lot more though,
16:13:34 and why this has become a thing all of a sudden also seems weird if it's been broken like this for a while
16:13:43 but otherwise yeah, hopefully an easy fix
16:13:56 because yesterday we merged my patch that added the server actions checks in grenade
16:14:18 this test was flaky but never run, and I just opened the can of worms
16:14:33 but I think it has failed on other patches, not just yours
16:15:03 well, then the vnc check itself could be present on other tempest tests
16:15:20 or we could run the server actions list in other jobs, rather
16:15:22 anyway
16:16:01 moving on
16:16:49 #topic Release Planning
16:16:54 #link https://releases.openstack.org/caracal/schedule.html
16:16:58 #info Nova deadlines will be proposed in the schedule above
16:17:12 I just need to file a release patch with the correct dates :)
16:17:21 #info Caracal-1 milestone in 1 week
16:17:30 #info Spec review day today
16:17:56 I think I made a correct round of reviews but there are still some specs I haven't looked at yet
16:18:09 I'll continue my duty until EOB
16:18:47 any list of specs just waiting for a +W?
16:19:02 sure
16:19:19 Steven Relf proposed openstack/nova master: Adding basic auth to dynamic vendordata api calls https://review.opendev.org/c/openstack/nova/+/900252
16:19:21 there is a set of reproposals that are easy wins
16:19:30 dansmith: you mean a second +2 or just a plain +W? :)
16:19:46 bauzas: second review I mean?
16:19:48 https://review.opendev.org/q/(project:openstack/nova-specs)+status:open+NOT+owner:self+NOT+label:Workflow%253C%253D-1+label:Verified%253E%253D1%252Czuul+NOT+reviewedby:self+is:mergeable+NOT+label:Code-Review%253C%253D-1%252Cnova-core+label:Code-Review%253E%253D2
16:19:59 not sure the link will render correctly
16:20:11 but tl;dr: dansmith you have the sole spec that needs a second +2 :)
16:20:37 and yeah, there are reproposals on the way
16:20:44 btw. I wonder why they don't show up
16:21:09 bauzas: ah bummer.. maybe gibi can circle back on that .. that's the only one I can't finish :)
16:21:14 https://review.opendev.org/q/project:openstack/nova-specs+status:open+label:Code-Review%253E%253D2
16:21:31 but yeah I'll look at the re-proposals
16:21:38 sorry, my dash link seems to be wrong
16:21:59 fwiw, had no time yet to correctly write a mdev live-migration spec
16:22:01 on CI bugs, nova-emulation job is failing too (I wanted to check if it's failing for more patches, but builds page is not opening for me right now)
16:22:04 I'll bug folks directly :)
16:22:04 i can also try and loop back
16:22:09 i'll be doing more review later today
16:22:21 dansmith: on your spec
16:22:36 * gibi will check back on the device alias
16:22:41 (I keep trying to do the ironic re-proposal, but I keep getting distracted, sorry)
16:22:43 Dmitriy Rabotyagov proposed openstack/nova stable/2023.1: Fix rebuild compute RPC API exception for rolling-upgrades https://review.opendev.org/c/openstack/nova/+/900336
16:22:44 Dmitriy Rabotyagov proposed openstack/nova stable/2023.1: Adding server actions tests to grenade-multinode https://review.opendev.org/c/openstack/nova/+/900337
16:22:58 gibi: you left a +1 because of Uggla_'s concern but AFAICR, it was resolved
16:23:11 so I just went +2
16:23:18 bauzas: OK, thanks
16:23:31 Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: add a regression test for all compute RPCAPI 6.x pinnings for rebuild https://review.opendev.org/c/openstack/nova/+/900309
16:23:40 okay, moving on
16:24:21 dvo-plv wanted us to do a group discussion on https://review.opendev.org/c/openstack/nova-specs/+/895924/ since I asked to *not* use a trait but let's not discuss this now
16:24:32 Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Fix rebuild compute RPC API exception for rolling-upgrades https://review.opendev.org/c/openstack/nova/+/900338
16:24:33 Dmitriy Rabotyagov proposed openstack/nova stable/2023.2: Adding server actions tests to grenade-multinode https://review.opendev.org/c/openstack/nova/+/900339
16:24:41 if people want, we could discuss this during open discussion
16:25:17 (or just read my gerrit comments, you'll get my point, which is please avoid adding traits for just a libvirt version check that's already supported for all computes except Antelope and older)
16:25:53 anyway, moving on
16:26:16 yeah
16:26:28 needs to be capability-based, IMHO
16:26:38 "supports X" not "is version X"
16:27:09 yep
16:27:17 so support for the virtio-packed format
16:27:20 is a capability
16:27:22 I said we could resolve that with a service version check
16:27:22 not a version check
16:27:27 so it should be a trait
16:27:35 and we should not need a compute service check
16:27:43 sean-k-mooney: that's not exactly what's written in both code and spec
16:27:48 as this is not a feature that is implemented at the compute manager level
16:27:56 the requirement is "is libvirt >6.7"
16:28:12 right that is the requirement for the feature to work
16:28:33 since bobcat supports 7.0, we're good
16:28:35 but it is modeled by reporting a trait for the host where the capability is supported
16:28:38 yep
16:28:51 but we now need to support n-2 and different oss
16:29:00 *operating systems
16:29:02 so it only leaves the rolling upgrade case with a caracal env + an antelope node
16:29:21 which will no longer be a problem with D
16:29:29 well this feature like many others can be documented as only supported after a full upgrade is complete
16:29:49 so again, I'm very against adding a trait for modeling a libvirt version that's gonna be minimum for all our computes next cycle anyway
16:29:59 bauzas: that's not what it's modeling
16:30:27 and it's required in my view for as long as we have any virt driver (other than ironic and libvirt) that supports vms
16:30:36 https://review.opendev.org/c/openstack/nova/+/876075/25/nova/virt/libvirt/driver.py
16:30:57 that exactly models the libvirt version
16:31:14 not really
16:31:16 it's the only way to detect it because libvirt does not have a way to detect it without a version check
16:31:19 but
16:31:26 since we now have the min version requirement
16:31:31 it's exposing whether or not the feature is supported, but the way it does that is by the libvirt version internally
16:31:40 this would be a static trait that is exposed by computes using the libvirt driver
16:31:42 that's different than exposing a trait of the actual version
16:31:52 dansmith: exactly
16:31:54 sean-k-mooney: then I'd suggest a prefilter that would only ensure that we land on a libvirt host
16:32:08 the prefilter can be the trait though
16:32:15 bauzas: i believe the prefilter was already in the spec
16:32:34 bauzas: it's something i have definitely discussed for this feature previously
16:32:37 yeah, but again, I don't like us adding yet another trait for this
16:32:56 bauzas: i believe this is exactly what traits should be used for
16:32:58 but this is what we do right? all the other traits are just like this I think
16:33:08 dansmith: more or less
16:33:17 then, let's just provide this trait without a conditional
16:33:28 bauzas: yes i also said that in my comments
16:33:40 and i said this last cycle too when we originally proposed it
16:33:41 if it's really about directing to libvirt nodes
16:33:41 we can do that if we hard-require the min version
16:34:20 bauzas: basically the context you're missing is we had previously agreed that if we enforced the min version we would update the spec depending on which feature landed first
16:34:27 kasyap's min version bump
16:34:29 or this feature
16:34:48 in the reproposal it should be updated to drop the check because the min version bump already happened in bobcat
16:35:05 so to be clear, we're already hard-requiring the min version needed for this, and thus the trait is just for the upgrade case where we might have old computes?
16:35:11 Dmitriy Rabotyagov proposed openstack/nova stable/zed: Fix rebuild compute RPC API exception for rolling-upgrades https://review.opendev.org/c/openstack/nova/+/900341
16:35:12 Dmitriy Rabotyagov proposed openstack/nova stable/zed: Adding server actions tests to grenade-multinode https://review.opendev.org/c/openstack/nova/+/900342
16:35:21 what *you* missed is that I figured this out before the meeting (that spec is older than our min bump) but I missed the fact we want to direct to libvirt-only nodes, hence my mistake
16:35:28 dansmith: upgrade case or where you're mixing virt drivers
16:35:41 dansmith: but our current min is 7.0.0 i believe and the feature is in 6.x
16:35:48 sean-k-mooney: okay libvirt and ironic being the only possibilities there :)
16:35:50 dansmith: the former (upgrade case) was my concern
16:35:58 sean-k-mooney: it requires 6.7
16:36:26 so I mean, I would probably lean towards just a service version check to not let this work until everything is upgraded, but the multi-virt driver thing is a fair point
16:36:27 bauzas: yep i said 6.x because i know we met the min requirement with our min supported version
16:36:31 I don't want us to buy a new trait for something that's only because we support N-2 this cycle
16:36:40 and even though we're dropping basically all the others, we could have a new one in the future without this support, so...
16:36:57 dansmith: well we don't need a new compute service version for this feature in general
16:37:24 although i guess we could do one for the new prefilter
16:37:27 dansmith: so you're on the same page as me, a service version check for upgrades is enough, but if we really want to avoid other hypervisors we could need a trait
16:37:28 sean-k-mooney: we don't need it, but we could use it (cheaper than a trait) for the usual purpose of not exposing features until everything is upgraded
16:37:51 bauzas: no, I said I lean that way, but I'm also fine with a trait because of the virt driver possibility
16:37:52 dansmith: oh i disagree i always considered a compute version bump more expensive or at most the same
16:38:17 sean-k-mooney: well, we disagree then.. we bump service versions all the time for stuff like this, where there isn't even a rpc bump to correlate
16:38:37 and it is a single integer in one tree versus a new enum in traits, a package release, a dep update, and then a nova patch
16:38:39 tbh, I'd prefer having the prefilter asking 'get me a libvirt compute' rather than 'get me a compute that supports foo' since this feature is very QEMU-centric
16:39:05 sean-k-mooney: the fact is, this service version check can be dropped next cycle
16:39:05 bauzas: that would be a misuse of traits
16:39:26 sean-k-mooney: starting with D, all computes will support a libvirt recent enough
16:39:29 sean-k-mooney: how is that a misuse of traits?
16:39:32 if we want to also have a service version bump and an additional check in the api prior to the call to the scheduler we can
16:39:52 dansmith: we previously said we didn't want to have a trait for which virt driver is in use
16:39:58 if two virt drivers were proposing the same feature, then yeah a trait sounds good to me
16:40:07 we can revert that if we want but i know there was pushback to that in the past
16:40:30 but here that capability is purely qemu-based
16:40:37 bauzas: so is the only reason you're taking this stance because we did the min libvirt bump last cycle
16:40:43 sean-k-mooney: we have traits for which type of hardware is on the host, this seems similar, but we can also filter on hypervisor type from host state anyway right? so we can do it without a trait
16:40:50 say some other feature does the same, we're gonna add another trait and another prefilter
16:41:19 bauzas: yes we should anytime there is a capability that is not supported by all supported drivers
16:41:25 and boom, traits explosion, plus the fact we yet again push hypervisor features upfront
16:41:44 bauzas: traits are cheap and placement was built to deal with many of them
16:41:59 honestly, this particular feature is probably not critical, right? meaning:
16:42:13 service version checks that we can get rid of next cycle are cheaper IMHO and,
16:42:21 no traits is cheaper than a single one :)=
16:42:29 if we restrict to libvirt hosts with a filter, then it's easy for us to say in the reno that if you're not upgraded that feature request will not be honored by old computes that don't know about it
16:42:36 after the upgrade it's all good
16:42:51 this is an optimization not like "I *need* 32G of memory else please reject"
16:43:01 I just feel we can say in the notes 'please do what you need in case you have mixed hypervisors'
16:43:43 bauzas: i really dislike that direction as i feel our scheduler should ensure you land on a host that can support the requested feature without additional admin intervention
16:44:25 dansmith: in this particular case if we don't have the trait it will be non critical only in that on older hosts it will be ignored
16:44:26 only if you have mixed hypervisors, right?
16:44:41 bauzas: no even with a single hypervisor
16:44:45 and in that case, you probably already did the setup in order to shard your cloud
16:44:51 sean-k-mooney: right, it just seems like a best-effort sort of thing compared to some others
16:45:06 sean-k-mooney: still talking of the 'I want a compute with libvirt recent enough' then?
16:45:15 again, doesn't sound to me worth adding a trait for this
16:45:19 dansmith: if it's a flavor extra spec we should always guarantee it's able to work on the selected host
16:45:43 so if we are fine with rejecting it on the compute node sure
16:45:47 anyway, I feel we're arguing right, but time flies and we're in a meeting
16:45:49 i'm not ok with allowing the vm to boot
16:46:04 can we drop this until the end of this meeting
16:46:06 ?
16:46:10 yep
16:46:14 sure or to the spec review
16:46:18 and we could try to find a way forward just after
16:46:29 cool, moving on then
16:47:13 #topic Review Priorities
16:47:18 still an action item on me
16:47:33 I have to create an etherpad as we agreed at PTG
16:47:46 so I'll keep this bullet until I'm set
16:47:46 I have some issue with electricity, battery dead, so ping me and i will answer later when I am back online
16:48:11 #action bauzas to create a tracking etherpad for this cycle
16:48:38 dvo-plv: np, we also captured this conversation in the meeting logs that'll show up once we end the meeting
16:48:57 #topic Stable Branches
16:49:03 elodilles: your time
16:49:12 #info stable gates don't seem blocked
16:49:20 (stable/victoria's nova-ceph-multistore looked suspicious as it failed with POST_FAILURE a couple of times, but it has passed now)
16:49:35 #info stable release patches proposed: https://review.opendev.org/q/project:openstack/releases+is:open+intopic:nova
16:50:03 last time we agreed to wait for some patches to merge ^^^
16:50:24 feel free to update the patch when they are merged
16:50:27 yeah and noonedeadpunk proposed backports
16:50:35 ++
16:50:38 related to the RPC fixes we wanted to land
16:50:42 so, we need reviews
16:50:54 I surely can review things I wrote
16:51:03 ACK, will review them then :)
16:51:18 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:51:30 and that's all from my side
16:51:35 elodilles: ping me if you need the change numbers
16:51:41 cool, thanks
16:51:53 rushing, we have little time left
16:51:57 #topic Open discussion
16:52:07 (gibi): Seeking specless approval for https://blueprints.launchpad.net/nova/+spec/libvirt-migrate-with-hostname-instead-of-ip based on the PTG discussion https://etherpad.opendev.org/p/nova-caracal-ptg#L815. I have a proposed impl https://review.opendev.org/c/openstack/nova/+/900203
16:52:17 so
16:52:44 as we discussed at the PTG I would like to make it possible for the libvirt driver to use hostnames during cold migration instead of IP addresses
16:52:45 bauzas: fwiw, backport to Zed is failing unit tests
16:52:53 so a look there is appreciated
16:53:09 the live migration already uses hostnames by default
16:53:15 gibi: ++
16:53:30 this specless bp proposes a new config option to make the new behavior opt-in
16:53:31 noonedeadpunk: I have a clue, but let's not discuss this now (I guess this is the rpc version check unittest that fails)
16:53:31 gibi: no new RPC or object stuff needed, just a change on which thing we advertise to the remote side right?
16:53:46 dansmith: we might need a db change
16:53:47 no rpc, no object, just a config option for the libvirt driver
16:53:56 sean-k-mooney: why?
16:53:58 sean-k-mooney: no DB change is needed afaik
16:54:03 and this is per-compute?
16:54:06 gibi: cool, ++ for specless BP from me
16:54:16 bauzas: this is per compute for the incoming migrations
16:54:19 don't we look up the remote system "connection" info via fields in the object
16:54:22 that are stored in the db
16:54:28 it is in the migration object
16:54:32 we store the IP there today
16:54:34 as a string
16:54:39 but it's just a string, AFAIK
16:54:43 right so will the content of that change
16:54:44 yeah
16:54:46 now it will be either IP or a hostname
16:54:54 ok that is what i was wondering about
16:55:12 and we said it was opt-in
16:55:18 ok provided we don't assume that it's an ip anywhere today
16:55:19 so this is not a breaking upgrade change
16:55:23 and given it's opt-in
16:55:30 it is opt-in, the default of the new config is to use my_ip as before
16:55:30 i think i'm fine with that as specless
16:55:35 yeah, so I'm in favor of approving it
16:55:44 any concerns?
16:56:07 the only thing we need to do is document that you should not set the new config option until the cloud is fully upgraded
16:56:21 it will likely work with the hostname
16:56:41 and old nova
16:56:43 I can add a reno
16:56:53 ack then all good from me
16:56:56 and amend the config doc with a warning
16:56:59 I bet it works even with old ones
16:57:05 it depends
16:57:09 provided it's resolvable it likely will
16:57:13 yeah
16:57:17 but it will depend on dns and /etc/hosts
16:57:35 I mean, that will depend on how the cloud is configured, but old computes can work
16:57:37 for sure
16:57:42 gibi: and just to clarify it can be an fqdn right (ip, hostname or fqdn)
16:57:52 it being the new config option
16:58:25 it can be IP, it is defaulted to my_ip, it can be a user defined FQDN or hostname, or it can be "%s" which will be replaced with the hostname of the node
16:58:34 +1
16:58:37 yup
16:58:42 okay, if you don't mind,
16:58:51 the last is the most useful for me in my deployment work
16:59:00 #agreed https://blueprints.launchpad.net/nova/+spec/libvirt-migrate-with-hostname-instead-of-ip is approved as a specless BP
16:59:07 we're on time
16:59:14 any last min question?
16:59:28 yep
16:59:34 looking at https://bugs.launchpad.net/nova/+bug/2039381
16:59:59 do you think it could be linked to a configuration (service token) issue?
17:00:21 oh damn, forgot
17:00:22 I'll end the meeting now, we'll discuss this right after
17:00:29 ok
17:00:43 thanks all
17:00:48 #endmeeting