16:00:02 <bauzas> #startmeeting nova
16:00:02 <opendevmeet> Meeting started Tue May 2 16:00:02 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:02 <opendevmeet> The meeting name has been set to 'nova'
16:00:11 <bauzas> welcome everyone
16:00:19 <elodilles> o/
16:00:25 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:00:42 <gmann> o/
16:00:50 <gibi> o/
16:00:54 <gibi> (on a spotty connection)
16:01:19 <dansmith> bauzas: got it
16:02:02 <bauzas> coolio, let's start
16:02:11 <bauzas> #topic Bugs (stuck/critical)
16:02:21 <bauzas> #info One Critical bug
16:02:30 <bauzas> #link https://bugs.launchpad.net/nova/+bug/2012993
16:02:35 <bauzas> shall be quickly reverted
16:02:45 <bauzas> and I'll propose backports as soon as it lands
16:03:25 <bauzas> not sure we need to discuss it for now
16:03:38 <bauzas> so we can move on, unless people wanna know more about it
16:04:07 <sean-k-mooney> i have +w'ed it so it's on its way
16:04:21 <bauzas> yup, and as you wrote, two backports are planned
16:04:26 <bauzas> down to 2023.1 and Zed
16:04:45 <bauzas> anyway, moving on
16:04:56 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 20 new untriaged bugs (+3 since the last meeting)
16:05:16 <bauzas> artom: I assume you didn't have a lot of time for spinning around those bugs?
16:05:29 <bauzas> again, no worries
16:05:32 <Uggla> o/
16:05:53 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:06:08 <bauzas> Uggla: you're the next on the list, happy to axe some of the bugs?
16:06:19 <artom> bauzas, trying to do a few now
16:06:28 <bauzas> cool, no rush
16:06:36 <Uggla> bauzas, yep, ok for me.
16:06:43 <bauzas> thanks
16:06:45 <artom> For https://bugs.launchpad.net/nova/+bug/2018172 I think we need kashyap to weigh in, 'virtio' may not be a thing in RHEL 9.1?
16:06:51 <bauzas> #info bug baton is being passed to Uggla
16:06:55 <bauzas> artom: looking
16:07:01 <dansmith> uh, what
16:07:03 <artom> Or maybe just close immediately and direct them to a RHEL... thing
16:07:07 <kashyap> artom: Wait; it's impossible that "virtio" isn't a thing in RHEL 9.1. /me looks
16:07:21 <dansmith> hah, yeah
16:07:27 <dansmith> but looks like maybe just a particular virtio video model/
16:07:30 <bauzas> that's a libvirt exception
16:07:42 <bauzas> so, IMHO invalid
16:07:50 <bauzas> (for the project)
16:08:23 <artom> Right, but before that I'd like to confirm that Nova is doing the right thing
16:08:46 <dansmith> is the video model an extra spec/
16:08:50 <opendevreview> Rafael Weingartner proposed openstack/nova master: Nova to honor "cross_az_attach" during server(VM) migrations https://review.opendev.org/c/openstack/nova/+/864760
16:08:57 * dansmith curses his ? key
16:09:10 <artom> I guess I'm going about it backwards - it's just an easier thing to check initially, RHEL 9.1 support for virtio video
16:09:12 <artom> dansmith, yeah
16:09:14 <sean-k-mooney> it's an image property
16:09:19 <bauzas> yeah
16:09:41 <dansmith> is virtio the only option there or is it like virtio plus a device model name/
16:09:47 <sean-k-mooney> virtio should be a thing in rhel9
16:09:48 <bauzas> we just try to generate a domain which is badly incorrect given the image metadata
16:09:49 <dansmith> just wondering if they're specifying the wrong thing
16:09:56 <sean-k-mooney> technically it maps to virtio-gpu
16:10:05 <bauzas> that rings a bell to me
16:10:39 <bauzas> anyway, I'd propose to punt this bug today by asking the reporter for their flavor and image
16:10:57 <opendevreview> Rafael Weingartner proposed openstack/nova master: Nova to honor "cross_az_attach" during server(VM) migrations https://review.opendev.org/c/openstack/nova/+/864760
16:11:02 <dansmith> why are we even discussing this in the meeting?
16:11:43 <bauzas> I guess because artom wanted to triage it
16:11:48 <bauzas> so please => mark it Incomplete
16:11:49 <artom> I just wanted to ping kashyap on it, is all
16:11:49 <sean-k-mooney> i guess it was a bug artom thought should be brought to our attention based on the triage
16:11:57 <bauzas> and ask for the image metadata
16:12:11 <kashyap> artom: We can sort it out off-meeting :) Thx!
16:12:25 <sean-k-mooney> artom: i have been using hw_video_model=virtio for years so it's valid for sure
16:12:48 <sean-k-mooney> but ya we can move on
16:12:56 <kashyap> (Yeah; plain "virtio" must work)
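
A quick note on the property being triaged above: hw_video_model is Glance image metadata rather than a flavor extra spec, "virtio" is a valid value for it, and the libvirt driver maps it to a virtio-gpu video device in the guest domain. The property can be set with, for example, `openstack image set --property hw_video_model=virtio <image>` (the image name is a placeholder); whether the resulting device actually works then depends on the QEMU/libvirt build on the compute host, which is why the failure in the bug surfaces as a libvirt exception rather than a Nova one.
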
16:13:20 <bauzas> moving on then
16:13:36 <kashyap> Is it still on-topic to discuss a CirrOS seg-fault that dansmith pointed out earlier?
16:13:41 <bauzas> #topic Gate status
16:13:48 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:13:53 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures
16:14:04 <bauzas> now it's time to discuss gate failures
16:14:26 <bauzas> dansmith: shoot the good news and the bad ones if you have any
16:14:49 <dansmith> tldr is that I have got the ceph job moved to jammy and cephadm and it's working finally, and:
16:15:10 <dansmith> that I found a bunch of places where we thought we were requiring SSHABLE before volume activities where it was being silently ignored
16:15:17 <dansmith> basically what gibi asserted
16:15:28 <dansmith> so I've got a stack of tempest patches (and cinder-tempest-plugin) to not only fix those,
16:15:51 <dansmith> but also sanity check and raise an error if someone asks for SSHABLE but without the requisite extra stuff to actually honor it
16:16:13 <dansmith> that helps us pass the ceph job in the new config but will also likely help improve volume failures in the other jobs
16:16:37 <bauzas> cool
16:16:43 <bauzas> thanks for the janitorial work
16:16:57 <gibi> dansmith: awesome
16:17:06 <gibi> thank you
16:17:16 <bauzas> which patches should be looked at, for people who care?
16:17:24 <dansmith> it's been a lot of work and frustration, but happy to see it becoming fruitful
16:17:36 <dansmith> well, no patches to nova, but I can get you a link, hang on
16:17:56 <gmann> this one #link https://review.opendev.org/q/topic:sshable-volume-tests
16:18:00 <dansmith> this stack on tempest: https://review.opendev.org/c/openstack/tempest/+/881925
16:18:14 <dansmith> and this one against cinder-tempest-plugin: https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764
16:18:30 <dansmith> the nova patch I have up is a DNM just to test our job with that full stack, but we don't need to merge anything
16:18:55 <bauzas> #link https://review.opendev.org/q/topic:sshable-volume-tests Tempest patches that add more ssh checks for volume-related tests
16:19:11 <bauzas> cool, excellent, thanks a lot dansmith for this hard work
16:19:26 * dansmith nods
16:19:32 <bauzas> that's noted.
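
For readers following along, the pattern dansmith describes looks roughly like the sketch below: a test that wants to exercise volume operations only once the guest is reachable asks for the SSHABLE wait state, and that request is only honoured when the server is created as validatable with validation resources attached (the new sanity checking is meant to raise an error instead of silently ignoring the request). This is an illustrative sketch against tempest's compute base class, not one of the actual patches under review, and the helper names may differ slightly from the real changes.

    # Illustrative sketch only; see the sshable-volume-tests topic on Gerrit
    # for the real patches.
    from tempest.api.compute import base


    class VolumeAttachSshableExample(base.BaseV2ComputeTest):

        def test_attach_volume_after_sshable(self):
            # Per-test validation resources (keypair, security group, floating
            # IP) are what make an SSHABLE wait actually possible.
            validation_resources = self.get_test_validation_resources(
                self.os_primary)
            server = self.create_test_server(
                validatable=True,
                validation_resources=validation_resources,
                wait_until='SSHABLE')
            # Only after the guest is reachable do we touch the volume path.
            volume = self.create_volume()
            self.attach_volume(server, volume)
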
16:19:50 <bauzas> any other CI failure to report or mention?
16:20:22 <bauzas> looks not, excellent
16:20:37 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:21:08 <bauzas> the fips job run had a node failure, but should be green next week
16:22:11 <bauzas> on a side note, I'm trying to add some kind of specific vgpu job that would use the mtty sample framework for validating the usage we have, on a periodic basis
16:22:55 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:23:01 <bauzas> #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:23:06 <bauzas> that's it for me on that topic
16:23:08 <bauzas> moving on
16:23:15 <bauzas> #topic Release Planning
16:23:19 <bauzas> #link https://releases.openstack.org/bobcat/schedule.html
16:23:31 <bauzas> #info Nova deadlines are set in the above schedule
16:23:37 <bauzas> #info Bobcat-1 is in 1 week
16:23:56 <bauzas> #info Next Tuesday 9th is stable branches review day https://releases.openstack.org/bobcat/schedule.html#b-nova-stable-review-day
16:24:24 <bauzas> I'll communicate on that review day by emailing -discuss
16:24:46 <bauzas> but let's discuss that more in the stable topic we have later
16:25:04 <bauzas> #topic Review priorities
16:25:04 <bauzas> #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:25:06 <bauzas> #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:25:11 <bauzas> #topic Stable Branches
16:25:16 <bauzas> elodilles: go for it
16:25:23 <elodilles> #info stable nova versions were released for zed (26.1.1) and for yoga (25.1.1) last week
16:25:29 <elodilles> otherwise not much happened
16:25:35 <bauzas> right
16:25:38 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:25:50 <elodilles> that's all from me
16:27:05 <bauzas> excellent, thanks
16:27:30 <bauzas> and as I said just before, we should hopefully get round to some patches next week
16:27:36 <elodilles> \o/
16:28:02 <bauzas> #topic Open discussion
16:28:11 <bauzas> (enriquetaso) Discuss the blueprint: NFS Encryption Support for qemu
16:28:14 <bauzas> enriquetaso: shoot
16:28:18 <enriquetaso> hi
16:28:21 <enriquetaso> sure
16:28:30 <enriquetaso> As discussed at the PTG a couple weeks ago, I’ve proposed the blueprint.
16:28:42 <enriquetaso> #link https://blueprints.launchpad.net/nova/+spec/nfs-encryption-support
16:28:57 <enriquetaso> Summary: Cinder is working on supporting encryption on NFS volumes. The NFS driver uses LUKS inside qcow2 for this.
16:29:10 <enriquetaso> This affects Nova because Nova cannot handle the qemu + LUKS inside qcow2 disk format at the moment.
16:29:25 <enriquetaso> What are your thoughts?
16:29:35 <enriquetaso> should I mention that rolling upgrades would be a problem in the bp?
16:30:20 <bauzas> lemme reopen the ptg notes
16:30:50 <bauzas> right
16:31:51 <bauzas> we basically said we were quite okay with the proposed design but we were wondering if it was worth not scheduling to old computes
16:32:03 <bauzas> we have three options here:
16:32:23 <bauzas> 1/ avoid scheduling to old computes (by adding a prefilter)
16:33:07 <bauzas> 2/ preventing this feature at the API level by checking the compute service versions
16:33:59 <bauzas> 3/ do some compute check that would prevent the volume from being encrypted under some preconditions
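
To make option 2 concrete: the usual API-side pattern in nova is to look up the minimum nova-compute service version across all cells and refuse the operation when that minimum is too old. A minimal sketch follows; the version constant is a made-up placeholder for whatever service version would ship the compute-side support, and, as gibi points out below, a service version check alone does not cover the libvirt/QEMU version requirement.

    # Sketch of an API-side minimum compute service version gate (option 2).
    # MIN_COMPUTE_NFS_LUKS_IN_QCOW is a hypothetical placeholder value.
    from nova.objects import service as service_obj

    MIN_COMPUTE_NFS_LUKS_IN_QCOW = 68  # placeholder, not a real nova constant

    def nfs_luks_in_qcow_supported(ctxt):
        minver = service_obj.get_minimum_version_all_cells(
            ctxt, ['nova-compute'])
        return minver >= MIN_COMPUTE_NFS_LUKS_IN_QCOW
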
16:34:36 <bauzas> #2 seems a bit harsh to me
16:35:02 <sean-k-mooney> didn't we discuss adding a trait
16:35:07 <bauzas> we did
16:35:09 <sean-k-mooney> so 1
16:35:21 <bauzas> without having full quorum, hence me restating the options
16:35:34 <sean-k-mooney> well i vote 1, or file a spec
16:36:04 <bauzas> then I tend to say option #1 and a specless blueprint, as that's basically the direction we went down
16:36:12 <gibi> I'm OK with 1/
16:36:16 <sean-k-mooney> because if we don't just use a trait/prefilter then i think we need a spec to explain why that is not sufficient and describe it in detail
16:37:01 <dansmith> so a trait of "this is newer than X"?
16:37:09 <sean-k-mooney> no
16:37:16 <dansmith> that's kinda fundamentally wrong, so you need to expose it as a feature flag
16:37:22 <sean-k-mooney> COMPUTE_something
16:37:36 <bauzas> sean-k-mooney: do we all agree now that we accept a new trait saying something like "I_CAN_ENCRYPT_YOUR-STUFF"
16:37:38 <dansmith> is there any compute config that needs to be enabled? if so, ideally that would control the exposure (or not) of the trait
16:37:38 <sean-k-mooney> to report the hypervisor capability to support luks in qcow
16:37:56 <sean-k-mooney> dansmith: i think this just needs to check the qemu/libvirt version
16:38:00 <bauzas> my only concern is the traits inflation, but that's just a string
16:38:02 <sean-k-mooney> and report it statically if we are above that
16:38:09 <sean-k-mooney> i don't think we need a config option
16:38:17 <bauzas> sean-k-mooney: that's why I was considering option 3
16:38:35 <dansmith> sean-k-mooney: yeah, that's just a little annoying I think
16:38:39 <bauzas> which would be "meh dude, you don't have what I need, I'll just do what I can do"
16:38:45 <sean-k-mooney> bauzas: i don't think optional encryption is acceptable
16:38:47 <dansmith> because it becomes not so much a feature flag as a shadow version number
16:39:22 <bauzas> sean-k-mooney: true
16:39:28 <sean-k-mooney> if you asked for the storage to be encrypted we either need to do it or raise an error
16:40:05 <bauzas> sounds reasonable then to ERROR the instance
16:40:12 <enriquetaso> what does a `new trait` involve?
16:40:18 <sean-k-mooney> dansmith: i'm not against a min compute service version check in the api as well, by the way
16:40:24 <bauzas> that's option 3
16:40:25 <sean-k-mooney> i don't really think 2 is harsh
16:40:29 <bauzas> s/3/2
16:40:46 <sean-k-mooney> we normally don't enable features until the cloud is fully upgraded
16:40:54 <dansmith> I'm not saying that's how it needs to be, I'm just saying it feels like we're bordering on trait abuse here
16:40:56 <dansmith> so FWIW,
16:41:10 <sean-k-mooney> we have in the past done this on a per compute host basis
16:41:15 <dansmith> we could also have a scheduler filter that requires a service version at or above a number
16:41:20 <sean-k-mooney> but in general i think a min compute version check is preferable
16:41:23 <dansmith> and we could add hints/advice to the scheduler for this sort of thing
16:41:31 <dansmith> which would be nice for this and other things I imagine
16:41:49 <bauzas> yeah, this sounds like quite a reasonable tradeoff
16:42:03 <sean-k-mooney> this being?
16:42:20 <bauzas> I'm just wondering whether we expose the service version on the scheduling side
16:42:31 <sean-k-mooney> i think you can already schedule based on the compute service version
16:42:45 <sean-k-mooney> with either the json or compute capabilities filter
16:42:58 <sean-k-mooney> but in any case we want this to work without any configuration required
16:43:20 <sean-k-mooney> bauzas: why not just do 2?
16:43:24 <gibi> I might be missing something, but if this feature needs compute code + libvirt/qemu versions then a simple compute version check is not enough
16:43:34 <dansmith> sean-k-mooney: I'm saying make it integrated
16:43:44 <dansmith> sean-k-mooney: kinda like a prefilter
16:43:56 <bauzas> wait https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L399
16:44:17 <dansmith> gibi: yeah, you need the service version and libvirt version and qemu version, right?
16:44:20 <sean-k-mooney> so if feature x requires min version y, have a prefilter or similar that will request a host with that min version
16:44:25 <bauzas> that's ephemeral encryption
16:44:50 <sean-k-mooney> ya, that's separate
16:44:56 <sean-k-mooney> we are talking about encryption for nfs
16:45:09 <sean-k-mooney> for cinder volume backends that use nfs
16:45:10 <dansmith> gibi: that's kinda why I just think this is trait pollution, because we end up with all these feature flags for any possible combination of several versions
16:45:24 <dansmith> and especially in three years, that trait is useless as everything exposes it all the time
16:45:55 <dansmith> so I'm not sure what the best plan is here, to be clear, I'm just saying none of these simple things feels right
16:46:07 <sean-k-mooney> dansmith: not really, using a trait we report an abstract capability. but in general i would prefer both a trait and a compute version bump
16:46:43 <dansmith> that's the thing though:
16:47:02 <sean-k-mooney> enriquetaso: how is this feature requested on the volume, by the way?
16:47:08 <dansmith> just exposing a "can do nfs encryption" because all the versions are new enough just feels like we could have a thousand of those things
16:47:20 <enriquetaso> when attaching the volume sean-k-mooney
16:47:35 <sean-k-mooney> how exactly, an attribute on the volume?
16:47:44 <sean-k-mooney> enriquetaso: we have two distinct but related problems
16:47:55 <sean-k-mooney> for attachment we know the host and have to check if the host supports this
16:48:03 <sean-k-mooney> for new boots or move operations
16:48:13 <sean-k-mooney> we need to find another host that also supports it
16:48:38 <bauzas> that's why I tend to lean on option 2
16:48:40 <sean-k-mooney> and we need this to work without any operator configuration of aggregates etc.
16:48:46 <bauzas> it's an interim solution
16:48:54 <dansmith> bauzas: me too, but 2 is not enough, right?
16:49:07 <sean-k-mooney> option 2 works if and only if our min libvirt/qemu versions are new enough
16:49:10 <dansmith> because you could be running an older libvirt/qemu
16:49:11 <enriquetaso> mmh, the volume attr says it's encrypted and the volume image format is qcow2: https://review.opendev.org/c/openstack/nova/+/854030/6/nova/virt/libvirt/utils.py
16:49:18 <dansmith> and I suspect it also depends on brick versions?
16:49:24 <enriquetaso> not sure I understand the questions
16:49:42 <sean-k-mooney> enriquetaso: i was asking because if we are booting a new vm we would need to look that up
16:49:55 <sean-k-mooney> and then include it as an input to placement and the scheduler in some way
16:50:00 <bauzas> dansmith: hah, true, I was considering an API check, but that would require the libvirt versions, so never mind my foolishness
16:50:24 <bauzas> yeah, so that's a tuple of (libvirt, compute) accepted versions
16:50:26 <sean-k-mooney> well, it may or may not, again it depends on whether our min version is above or below the version that introduced it
16:50:44 <bauzas> you need libvirt AND compute versions to be recent enough
16:50:44 <enriquetaso> oh, the volume doesn't have anything special, it's just volume type=nfs and encrypted=true sean-k-mooney
16:51:22 <dansmith> sean-k-mooney: yeah, well, that's a good reason not to just add the compute+libvirt+qemu into a trait IMHO
16:51:35 <sean-k-mooney> dansmith: right, which is not what i suggested
16:51:57 <dansmith> I know :)
16:51:57 <sean-k-mooney> i very explicitly said have a trait for the capability to support luks in qcow
16:52:06 <gibi> dansmith: on the trait pollution: either we encode a list of versions as a capability on the compute side, or we expose those versions to the scheduler / placement and code up a similar mapping of capability to versions on the scheduler side. We just move similar logic around
16:52:40 <dansmith> gibi: yeah, understood, it's just that traits are supposed to be timeless, right?
16:52:47 <bauzas> we have 5 mins to find a solution or defer to a spec, honestly
16:53:00 <dansmith> I'm not saying I know which solution (combination) is best, I'm just saying nothing feels particularly natural to me
16:53:19 <sean-k-mooney> so i was suggesting having the driver check the requirements and report COMPUTE_LUKS_IN_QCOW if it supports it
16:53:50 <sean-k-mooney> and have a prefilter request that if the volume was nfs and qcow and encrypted=true
16:54:01 <gibi> dansmith: we will have a bunch of traits yes, but I don't see what problem that causes. We never remove min compute version checks from the code either
16:54:07 <sean-k-mooney> and additionally have a min compute service check for the rolling upgrade case
16:54:36 <sean-k-mooney> the other thing about traits is they are meant to be virt driver independent
16:55:08 <gibi> LUKS_IN_QCOW does not seem to be libvirt dependent
16:55:13 <sean-k-mooney> to dansmith's timeless point, i.e. if we add one it should be reusable by other virt drivers
16:55:19 <sean-k-mooney> ya, i was just thinking that
16:55:25 <sean-k-mooney> it's the qcow bit, but
16:55:32 <sean-k-mooney> we have that already
16:56:17 <sean-k-mooney> https://github.com/openstack/os-traits/blob/master/os_traits/compute/ephemeral.py#L18 and https://github.com/openstack/os-traits/blob/master/os_traits/compute/image.py#L28
16:56:23 <sean-k-mooney> almost would work together
16:56:43 <sean-k-mooney> but we can't assume that ephemeral encryption support for luks means nfs also works
16:56:58 <enriquetaso> LUKS_IN_QCOW is already a config option in Nova gibi?
16:57:13 <sean-k-mooney> enriquetaso: not really
16:57:23 <sean-k-mooney> or not that i'm aware of
16:57:35 <sean-k-mooney> in the context of cinder
16:57:39 <gibi> enriquetaso: we are discussing creating a new placement trait to represent whether a compute supports luks in qcow
16:57:54 <enriquetaso> gibi++ thanks
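
To illustrate the trait-plus-prefilter approach being discussed (option 1): the libvirt driver would report a capability trait only when its libvirt/QEMU versions are new enough, and a scheduler request filter would add that trait as required whenever the request involves an encrypted NFS-backed qcow2 volume, so only capable hosts remain candidates. The snippet below is a self-contained sketch of that pattern with stand-in names: COMPUTE_LUKS_IN_QCOW is only the name floated here (it does not exist in os-traits), the minimum versions are placeholders, and a plain dict stands in for nova's RequestSpec; the real prefilters live in nova/scheduler/request_filter.py.

    # Self-contained sketch of option 1 (trait + prefilter); all names and the
    # dict-based request spec are stand-ins, not nova's real objects.
    TRAIT_LUKS_IN_QCOW = 'COMPUTE_LUKS_IN_QCOW'  # proposed, not yet in os-traits

    MIN_LIBVIRT = (8, 0, 0)   # placeholder minimum versions
    MIN_QEMU = (6, 2, 0)

    def driver_capability_traits(libvirt_version, qemu_version):
        """Compute side: expose the trait only when the hypervisor can do it."""
        traits = set()
        if libvirt_version >= MIN_LIBVIRT and qemu_version >= MIN_QEMU:
            traits.add(TRAIT_LUKS_IN_QCOW)
        return traits

    def luks_in_qcow_prefilter(request_spec):
        """Scheduler side: require the trait for encrypted NFS qcow2 volumes."""
        vol = request_spec.get('volume', {})
        if vol.get('type') == 'nfs' and vol.get('encrypted'):
            request_spec.setdefault('required_traits', set()).add(
                TRAIT_LUKS_IN_QCOW)
            return True
        return False

    # Example: the request now carries the required trait, so placement only
    # returns hosts that report it.
    spec = {'volume': {'type': 'nfs', 'encrypted': True}}
    luks_in_qcow_prefilter(spec)
    assert spec['required_traits'] == {TRAIT_LUKS_IN_QCOW}
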
16:58:19 <bauzas> I honestly think we need to settle this
16:58:49 <bauzas> given the very short time we have left, I hereby propose that enriquetaso create a spec and describe the feature
16:58:58 <bauzas> we could then chime in on the upgrade concerns
16:59:06 <enriquetaso> okay u.u
16:59:15 <bauzas> enriquetaso: are you familiar with the spec process?
16:59:26 <enriquetaso> is it too different from the cinder one?
16:59:39 <enriquetaso> bauzas, do you have a doc? :P
16:59:55 <bauzas> enriquetaso: I can provide you pointers and, more than that, guidance
17:00:11 <enriquetaso> sure bauzas
17:00:19 <bauzas> #agreed enriquetaso to provide a spec for this feature
17:00:33 <bauzas> #action bauzas to provide enriquetaso details on the spec process
17:00:40 <bauzas> we're on time
17:00:45 <bauzas> the agenda is done
17:00:56 <bauzas> so thanks all and see you next week
17:01:01 <bauzas> #endmeeting