16:00:02 #startmeeting nova
16:00:02 Meeting started Tue May 2 16:00:02 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:02 The meeting name has been set to 'nova'
16:00:11 welcome everyone
16:00:19 o/
16:00:25 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:00:42 o/
16:00:50 o/
16:00:54 (on a spotty connection)
16:01:19 bauzas: got it
16:02:02 coolio, let's start
16:02:11 #topic Bugs (stuck/critical)
16:02:21 #info One Critical bug
16:02:30 #link https://bugs.launchpad.net/nova/+bug/2012993
16:02:35 shall be quickly reverted
16:02:45 and I'll propose backports as soon as it lands
16:03:25 not sure we need to discuss it right now
16:03:38 so we can move on, unless people wanna know more about it
16:04:07 i have +w'ed it so it's on its way
16:04:21 yup, and as you wrote, two backports are planned
16:04:26 down to 2023.1 and Zed
16:04:45 anyway, moving on
16:04:56 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 20 new untriaged bugs (+3 since the last meeting)
16:05:16 artom: I assume you didn't have a lot of time for spinning around those bugs?
16:05:29 again, no worries
16:05:32 o/
16:05:53 #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:06:08 Uggla: you're the next on the list, happy to axe some of the bugs?
16:06:19 bauzas, trying to do a few now
16:06:28 cool, no rush
16:06:36 bauzas, yep ok for me.
16:06:43 thanks
16:06:45 For https://bugs.launchpad.net/nova/+bug/2018172 I think we need kashyap to weigh in, 'virtio' may not be a thing in RHEL 9.1?
16:06:51 #info bug baton is being passed to Uggla
16:06:55 artom: looking
16:07:01 uh, what
16:07:03 Or maybe just close immediately and direct them to a RHEL...
thing
16:07:07 artom: Wait; it's impossible "virtio" can't be a thing in RHEL 9.1. /me looks
16:07:21 hah, yeah
16:07:27 but looks like maybe just a particular virtio video model/
16:07:30 that's a libvirt exception
16:07:42 so, IMHO invalid
16:07:50 (for the project)
16:08:23 Right, but before that I'd like to confirm that Nova is doing the right thing
16:08:46 is the video model an extra spec/
16:08:50 Rafael Weingartner proposed openstack/nova master: Nova to honor "cross_az_attach" during server(VM) migrations https://review.opendev.org/c/openstack/nova/+/864760
16:08:57 * dansmith curses his ? key
16:09:10 I guess I'm going about it backwards - it's just an easier thing to check initially, RHEL 9.1 support for virtio video
16:09:12 dansmith, yeah
16:09:14 it's an image property
16:09:19 yeah
16:09:41 is virtio the only option there or is it like virtio plus a device model name/
16:09:47 virtio should be a thing in rhel9
16:09:48 we just try to generate a domain which is badly incorrect given the image metadata
16:09:49 just wondering if they're specifying the wrong thing
16:09:56 technically it maps to virtio-gpu
16:10:05 that rings a bell to me
16:10:39 anyway, I'd propose to punt this bug today by asking the reporter for its flavor and image
16:10:57 Rafael Weingartner proposed openstack/nova master: Nova to honor "cross_az_attach" during server(VM) migrations https://review.opendev.org/c/openstack/nova/+/864760
16:11:02 why are we even discussing this on the meeting?
16:11:43 I guess because artom wanted to triage it
16:11:48 so please => incomplete it
16:11:49 I just wanted to ping kashyap on it, is all
16:11:49 i guess it was a bug artom thought should be brought to our attention based on the triage
16:11:57 and ask for the image metadata
16:12:11 artom: We can sort it out off-meeting :) Thx!
16:12:25 artom: i have been using hw_video_model=virtio for years so it's valid for sure
16:12:48 but ya we can move on
16:12:56 (Yeah; plain "virtio" must work)
16:13:20 moving on so
16:13:36 Is it still on-topic to discuss a CirrOS seg-fault that dansmith pointed out earlier?
16:13:41 #topic Gate status
16:13:48 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:13:53 #link https://etherpad.opendev.org/p/nova-ci-failures
16:14:04 there, it is time to discuss gate failures now
16:14:26 dansmith: shoot the good news and the bad ones if you have
16:14:49 tldr is that I have got the ceph job moved to jammy and cephadm and it's working finally, and:
16:15:10 that I found a bunch of places where we thought we were requiring SSHABLE before volume activities where it was being silently ignored
16:15:17 basically what gibi asserted
16:15:28 so I've got a stack of tempest patches (and cinder-tempest-plugin) to not only fix those,
16:15:51 but also sanity check and raise an error if someone asks for SSHABLE but without the requisite extra stuff to actually honor it
16:16:13 that helps us pass the ceph job in the new config but will also likely help improve volume failures in the other jobs
16:16:37 cool
16:16:43 thanks for the janitorial work
16:16:57 dansmith: awesome
16:17:06 thank you
16:17:16 which patches shall be looked at for people who care?
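[Editor's sketch] The sanity check dansmith describes — raising an error when a test asks for SSHABLE but the job config cannot honor it, instead of silently ignoring the request — could look roughly like this. This is an illustrative stand-in, not the actual tempest code; the function and parameter names are assumptions.

```python
# Hypothetical sketch of the SSHABLE sanity check discussed above:
# fail fast when SSHABLE validation is requested but SSH validation
# is disabled, rather than silently downgrading the wait condition.

def check_sshable_request(wait_until, run_ssh_validation):
    """Validate a requested wait condition against the job config.

    wait_until: the state a test asked to wait for, e.g. "ACTIVE"
    or "SSHABLE" (names assumed for illustration).
    run_ssh_validation: whether SSH validation is enabled in config.
    """
    if wait_until == "SSHABLE" and not run_ssh_validation:
        raise ValueError(
            "SSHABLE was requested but SSH validation is disabled; "
            "enable run_validation to actually honor it")
    return wait_until
```

With a check like this, a misconfigured job fails loudly at the point of the request instead of passing tests that never actually verified SSH reachability.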
16:17:24 it's been a lot of work and frustration, but happy to see it becoming fruitful
16:17:36 well, no patches to nova, but I can get you a link, hang on
16:17:56 this one #link https://review.opendev.org/q/topic:sshable-volume-tests
16:18:00 this stack on tempest: https://review.opendev.org/c/openstack/tempest/+/881925
16:18:14 and this one against cinder-tempest https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/881764
16:18:30 the nova patch I have up is a DNM just to test our job with that full stack, but we don't need to merge anything
16:18:55 #link https://review.opendev.org/q/topic:sshable-volume-tests Tempest patches that add more ssh checks for volume-related tests
16:19:11 cool, excellent, thanks a lot dansmith for this hard work
16:19:26 * dansmith nods
16:19:32 that's noted.
16:19:50 any other CI failure to relate or mention?
16:20:22 looks not, excellent
16:20:37 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:21:08 the fips job run had a node failure, but should be green next week
16:22:11 on a side note, I'm trying to add some kind of specific vgpu job that would use the mtty sample framework for validating the usage we have on a periodic basis
16:22:55 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:23:01 #info STOP DOING BLIND RECHECKS aka.
'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:23:06 that's it for me on that topic
16:23:08 moving on
16:23:15 #topic Release Planning
16:23:19 #link https://releases.openstack.org/bobcat/schedule.html
16:23:31 #info Nova deadlines are set in the above schedule
16:23:37 #info Bobcat-1 is in 1 week
16:23:56 #info Next Tuesday 9th is stable branches review day https://releases.openstack.org/bobcat/schedule.html#b-nova-stable-review-day
16:24:24 I'll communicate on that review day by emailing -discuss
16:24:46 but let's discuss that more in the stable topic we have later
16:25:04 #topic Review priorities
16:25:04 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:25:06 #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:25:11 #topic Stable Branches
16:25:16 elodilles: go for it
16:25:23 #info stable nova versions were released for zed (26.1.1) and for yoga (25.1.1) last week
16:25:29 otherwise not much happened
16:25:35 right
16:25:38 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:25:50 that's all from me
16:27:05 excellent, thanks
16:27:30 and as I said just before, we shall round up some patches next week hopefully
16:27:36 \o/
16:28:02 #topic Open discussion
16:28:11 (enriquetaso) Discuss the blueprint: NFS Encryption Support for qemu
16:28:14 enriquetaso: shoot
16:28:18 hi
16:28:21 sure
16:28:30 As discussed on the PTG a couple weeks ago, I've proposed the blueprint.
16:28:42 #link https://blueprints.launchpad.net/nova/+spec/nfs-encryption-support
16:28:57 Summary: Cinder is working on supporting encryption on NFS volumes. To do this, the NFS driver uses LUKS inside qcow2.
16:29:10 This affects Nova because Nova cannot handle the qemu + LUKS inside qcow2 disk format at the moment.
16:29:25 What are your thoughts?
16:29:35 should I mention that rolling upgrades would be a problem on the bp?
16:30:20 lemme reopen the ptg notes
16:30:50 right
16:31:51 we basically said we were quite okay with the proposed design but we were wondering if it was worth not scheduling to old computes
16:32:03 we have three options here:
16:32:23 1/ avoid scheduling to old computes (by adding a prefilter)
16:33:07 2/ preventing this feature at the API level by checking the compute service versions
16:33:59 3/ do some compute check that would prevent the volume from being encrypted on some preconditions
16:34:36 #2 seems a bit harsh to me
16:35:02 didn't we discuss adding a trait?
16:35:07 we did
16:35:09 so 1
16:35:21 without having full quorum, hence me restating the options
16:35:34 well i vote 1 or file a spec
16:36:04 then I tend to say option #1 and specless blueprint, as that's basically the path we went down
16:36:12 I'm OK with 1/
16:36:16 because if we don't just use a trait/prefilter then i think we need a spec to explain why that is not sufficient and describe it in detail
16:37:01 so a trait of "this is newer than X"?
16:37:09 no
16:37:16 that's kinda fundamentally wrong, so you need to expose it as a feature flag
16:37:22 COMPUTE_something
16:37:36 sean-k-mooney: do we all agree now that we accept a new trait saying something like "I_CAN_ENCRYPT_YOUR-STUFF"
16:37:38 is there any compute config that needs to be enabled?
if so, ideally that would control the exposure (or not) of the trait
16:37:38 to report the hypervisor capability to support luks in qcow
16:37:56 dansmith: i think this just needs to check the qemu/libvirt version
16:38:00 my only concern is the traits inflation but that's a string
16:38:02 and report it statically if we are above that
16:38:09 i don't think we need a config option
16:38:17 sean-k-mooney: that's why I was considering option 3
16:38:35 sean-k-mooney: yeah, that's just a little annoying I think
16:38:39 which would be "meh dude, you don't have what I need, I'll just do what I can do"
16:38:45 bauzas: i don't think optional encryption is acceptable
16:38:47 because it becomes not as much a feature flag but a shadow version number
16:39:22 sean-k-mooney: true
16:39:28 if you asked for the storage to be encrypted we either need to do it or raise an error
16:40:05 sounds then reasonable to ERROR the instance
16:40:12 what does a `new trait` involve?
16:40:18 dansmith: i'm not against a min compute service version check in the api as well, by the way
16:40:24 that's option 3
16:40:25 i don't really think 2 is harsh
16:40:29 s/3/2
16:40:46 we normally don't enable features until the cloud is fully upgraded
16:40:54 I'm not saying that's how it needs to be, I'm just saying it feels like we're bordering on trait abuse here
16:40:56 so FWIW,
16:41:10 we have in the past done this on a per compute host basis
16:41:15 we could also have a scheduler filter that requires a service version at or above a number
16:41:20 but in general i think a min compute version check is preferable
16:41:23 and we could add hints/advice to the scheduler for this sort of thing
16:41:31 which would be nice for this and other things I imagine
16:41:49 yeah this sounds quite a reasonable tradeoff
16:42:03 this being?
16:42:20 I'm just wondering whether we expose the service version on the scheduling side
16:42:31 i think you can already schedule based on the compute service version
16:42:45 with either the json or compute capability filter
16:42:58 but in any case we want this to work without any configuration required
16:43:20 bauzas: why not just do 2?
16:43:24 I might be missing something, but if this feature needs compute code + libvirt/qemu version then a simple compute version check is not enough
16:43:34 sean-k-mooney: I'm saying make it integrated
16:43:44 sean-k-mooney: kinda like a prefilter
16:43:56 wait https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L399
16:44:17 gibi: yeah, you need service version and libvirt version and qemu version, right?
16:44:20 so if feature x requires min version y, have a prefilter or similar that will request a host with that min version
16:44:25 that's ephemeral encryption
16:44:50 ya that's separate
16:44:56 we are talking about encryption for nfs
16:45:09 for cinder volume backends that use nfs
16:45:10 gibi: that's kinda why I just think this is trait pollution, because we end up with all these feature flags for any possible combination of several versions
16:45:24 and especially in three years, that trait is useless as everything exposes it all the time
16:45:55 so I'm not sure what the best plan is here, to be clear, I'm just saying none of these simple things feels right
16:46:07 dansmith: not really, using a trait we report an abstract capability. but in general i would prefer both a trait and a compute version bump
16:46:43 that's the thing though:
16:47:02 enriquetaso: how is this feature requested on the volume, by the way?
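[Editor's sketch] The prefilter pattern referenced above (cf. the linked ephemeral-encryption entry in nova's request_filter.py) can be sketched roughly as follows. The trait string, the `RequestSpec` shape, and the filter name are all illustrative assumptions, not actual Nova code.

```python
# Illustrative sketch of option 1 (a scheduler prefilter), under assumed
# names: when the request involves an encrypted NFS volume, require a
# capability trait so placement only returns hosts that report it, and
# old computes are never scheduled to.

COMPUTE_LUKS_IN_QCOW = "COMPUTE_LUKS_IN_QCOW"  # assumed trait name


class RequestSpec:
    """Minimal stand-in for nova's RequestSpec, for illustration only."""

    def __init__(self, volume_type=None, encrypted=False):
        self.volume_type = volume_type
        self.encrypted = encrypted
        # Traits that every candidate host must expose in placement.
        self.root_required = set()


def nfs_encryption_filter(request_spec):
    """Require the capability trait for encrypted NFS volumes.

    Returns True when the filter modified the request, mirroring the
    convention of nova's request filters.
    """
    if request_spec.volume_type == "nfs" and request_spec.encrypted:
        request_spec.root_required.add(COMPUTE_LUKS_IN_QCOW)
        return True
    return False
```

The appeal of this shape, per the discussion, is that it needs no operator configuration of aggregates: scheduling simply follows whichever hosts report the trait.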
16:47:08 just exposing a "can do nfs encryption" because all the versions are new enough just feels like we could have a thousand of those things
16:47:20 when attaching the volume sean-k-mooney
16:47:35 how exactly, an attribute on the volume?
16:47:44 enriquetaso: we have two distinct but related problems
16:47:55 for attachment we know the host and have to check if the host supports this
16:48:03 for new boots or move operations
16:48:13 we need to find another host that also supports it
16:48:38 that's why I tend to lean on option 2
16:48:40 and we need this to work without any operator configuring of aggregates etc.
16:48:46 it's an interim solution
16:48:54 bauzas: me too, but 2 is not enough, right?
16:49:07 option 2 works if and only if our min libvirt/qemu versions are new enough
16:49:10 because you could be running an older libvirt/qemu
16:49:11 mmh, the volume attr says it's encrypted and the volume image format is qcow2: https://review.opendev.org/c/openstack/nova/+/854030/6/nova/virt/libvirt/utils.py
16:49:18 and I suspect it also depends on brick versions?
16:49:24 not sure I understand the questions
16:49:42 enriquetaso: i was asking because if we are booting a new vm we would need to look that up
16:49:55 and then include it as an input to placement and the scheduler in some way
16:50:00 dansmith: hah, true, I was considering an API check, but that would require the libvirt versions, so never mind my foolishness
16:50:24 yeah, so that's a tuple (libvirt, compute) for accepted versions
16:50:26 well it may or may not, again it depends on whether our min version is above or below the version that introduced it
16:50:44 you need libvirt AND compute versions to be recent enough
16:50:44 oh, the volume doesn't have anything special, it just has volume type=nfs and encrypted=true sean-k-mooney
16:51:22 sean-k-mooney: yeah, well, that's a good reason not to just add the compute+libvirt+qemu into a trait IMHO
16:51:35 dansmith: right, which is not what i suggested
16:51:57 I know :)
16:51:57 i very explicitly said have a trait for the capability to support luks in qcow
16:52:06 dansmith: on the trait pollution: either we encode a list of versions as a capability on the compute side, or we expose those versions to the scheduler / placement and code up a similar mapping of capability - versions on the scheduler side. We just move around similar logic
16:52:40 gibi: yeah, understood, it's just that traits are supposed to be timeless, right?
16:52:47 we have 5 mins to find a solution or defer to a spec, honestly
16:53:00 I'm not saying I know which solution (combination) is best, I'm just saying nothing feels particularly natural to me
16:53:19 so i was suggesting having the driver check the requirements and report COMPUTE_LUKS_IN_QCOW if it supports it
16:53:50 and have a prefilter request that if the volume was nfs and qcow and encrypted=true
16:54:01 dansmith: we will have a bunch of traits yes, but I don't see what problem that causes.
We never remove min compute version checks from the code either
16:54:07 and additionally have a min compute service check for the rolling upgrade case
16:54:36 the other thing about traits is they are meant to be virt driver independent
16:55:08 LUKS_IN_QCOW does not seem to be libvirt dependent
16:55:13 to dansmith's timeless point, i.e. if we add one it should be reusable by other virt drivers
16:55:19 ya i was just thinking that
16:55:25 it's the qcow bit but
16:55:32 we have that already
16:56:17 https://github.com/openstack/os-traits/blob/master/os_traits/compute/ephemeral.py#L18 and https://github.com/openstack/os-traits/blob/master/os_traits/compute/image.py#L28
16:56:23 almost would work together
16:56:43 but we can't assume that ephemeral encryption support for luks means nfs also works
16:56:58 LUKS_IN_QCOW is already a config option on Nova, gibi?
16:57:13 enriquetaso: not really
16:57:23 or not that i'm aware of
16:57:35 in the context of cinder
16:57:39 enriquetaso: we are discussing creating a new placement trait to represent whether a compute supports luks in qcow
16:57:54 gibi++ thanks
16:58:19 I honestly think we need to let the dust settle
16:58:49 given the very short time we have left, I hereby propose enriquetaso to create a spec and describe the feature
16:58:58 we could then chime in on the upgrade concerns
16:59:06 okay u.u
16:59:15 enriquetaso: are you familiar with the spec process?
16:59:26 is it too different from the cinder one?
16:59:39 bauzas, do you have a doc? :P
16:59:55 enriquetaso: I can provide you pointers and more than that: guidance
17:00:11 sure bauzas
17:00:19 #agreed enriquetaso to provide a spec for this feature
17:00:33 #action bauzas to provide enriquetaso details on the spec process
17:00:40 we're on time
17:00:45 the agenda is done
17:00:56 so thanks all and see you next week
17:01:01 #endmeeting
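[Editor's sketch] The driver-side half of the approach discussed above — report the capability trait only when the hypervisor versions are new enough, statically at startup — could look roughly like this. The minimum libvirt/qemu versions and function names here are placeholders for illustration; the real requirements would be settled in the spec.

```python
# Sketch of driver-side capability reporting, per the meeting discussion.
# The version minimums below are assumptions, not the real requirements
# for LUKS-inside-qcow2 support.

MIN_LIBVIRT_LUKS_IN_QCOW = (8, 0, 0)  # assumed minimum libvirt version
MIN_QEMU_LUKS_IN_QCOW = (6, 2, 0)     # assumed minimum qemu version


def luks_in_qcow_supported(libvirt_version, qemu_version):
    """Static check the virt driver could run once when it starts up."""
    return (libvirt_version >= MIN_LIBVIRT_LUKS_IN_QCOW
            and qemu_version >= MIN_QEMU_LUKS_IN_QCOW)


def compute_capability_traits(libvirt_version, qemu_version):
    """Capability traits this host would report to placement."""
    traits = set()
    if luks_in_qcow_supported(libvirt_version, qemu_version):
        traits.add("COMPUTE_LUKS_IN_QCOW")
    return traits
```

Combined with a scheduler prefilter that requires the trait for encrypted NFS volumes (option 1) and a minimum compute service version check for the rolling-upgrade case, this matches the hybrid sean-k-mooney described: the trait stays an abstract, virt-driver-independent capability rather than a shadow version number.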