16:01:15 #startmeeting nova
16:01:15 Meeting started Tue Dec 12 16:01:15 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:15 The meeting name has been set to 'nova'
16:01:27 sorry folks for the delay, I had to write the agenda :D
16:01:33 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:41 who's around ?
16:01:49 o/
16:01:54 o/
16:02:58 o/
16:04:14 let's slowly start
16:04:21 hopefully people will join
16:04:58 #topic Bugs (stuck/critical)
16:05:03 #info No Critical bug
16:05:08 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 41 new untriaged bugs (+3 since the last meeting)
16:05:17 not sure anyone had time to look at bugs this week
16:05:22 #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:05:54 melwitt is on the next round for the bug baton but she's off
16:06:03 I'll ask her later iirc
16:06:23 anything about bugs ?
16:06:38 looks not
16:06:40 moving on
16:06:45 #topic Gate status
16:06:50 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:06:59 looks like the gate is more stable today
16:07:06 * gibi had no time to look at bugs, sorry
16:07:07 I was able to recheck a few changes without issues
16:07:50 #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:08:20 most of the patches from https://review.opendev.org/q/topic:%22nova-ci-improvements%22 are now merged
16:08:51 I'll note that Ironic's gate was broken with some of the recent Ironic<>Nova driver changes; there's a small fix in the gate now we've been trying to merge since yesterday. I don't think there's an action for the Nova team as it was quickly approved and I'm rechecking.
16:09:14 JayF: ack thanks
16:09:28 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:09:39 all greens for master
16:09:56 nova-emulation continues to fail on stable/zed, but that's not a problem
16:10:05 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:10:11 anything about the gate ?
16:10:22 looks not
16:10:28 #topic Release Planning
16:10:32 #link https://releases.openstack.org/caracal/schedule.html#nova
16:10:35 #info Caracal-2 (and spec freeze) milestone in 4 weeks
16:10:38 time flies
16:10:51 last week, we had a good spec review day with 3 specs merged
16:11:10 but I beg cores here to look at the other specs :)
16:11:44 fwiw, I'll do my duty
16:12:34 https://review.opendev.org/q/project:openstack/nova-specs+is:open+file:%5Especs/2024.1/.* is the list of open specs for 2024.1
16:13:00 that's it for me on the release cadence
16:13:04 nothing else really important
16:13:10 moving on
16:13:14 #topic Review priorities
16:13:29 o/
16:13:34 one important thing
16:13:40 #link https://etherpad.opendev.org/p/nova-caracal-status
16:13:47 I updated the etherpad
16:14:18 #info please use and reuse this etherpad by looking at both the specs and the bugfixes
16:14:39 do we want to add a fixed/merged section in that
16:14:49 sean-k-mooney: we have it
16:14:55 but not for the bugfixes
16:14:58 I can add it
16:15:10 ya we have feature complete
16:15:33 basically it might be a nice reference for the prolog or release summary
16:15:50 yes
16:16:31 anyway, moving on
16:16:37 #topic Stable Branches
16:16:46 elodilles_pto: oh, he's on PTO
16:16:58 #info stable gates don't seem blocked
16:17:02 #info stable release patches still open for review: https://review.opendev.org/q/project:openstack/releases+is:open+intopic:nova
16:17:13 #info yoga is going to be unmaintained, so the final stable/yoga release should happen ASAP - https://etherpad.opendev.org/p/nova-stable-yoga-eom
16:17:25 also, I'll add my own point
16:18:22 #link Yoga EOL change https://review.opendev.org/c/openstack/releases/+/903278
16:19:03 folks, if you want to hold the EOL change until some other change merges, please say so above ^
16:19:16 for me, I already +1d this EOL change
16:19:38 oh shit
16:19:39 I'll note that's the Ussuri EOL change if you wanna fix the minutes
16:19:40 #undo
16:19:40 Removing item from minutes: #link https://review.opendev.org/c/openstack/releases/+/903278
16:19:47 jinx :)
16:19:56 #link *Ussuri* EOL change https://review.opendev.org/c/openstack/releases/+/903278
16:20:08 voila
16:20:20 for Yoga, that's for EM
16:20:33 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:20:42 that's it I guess for the stable topic
16:20:56 anything else for stable branches ?
16:21:03 one thing
16:21:12 depending on how the new images work on master
16:21:29 we may want to consider backporting those changes to stable branches if we see
16:21:33 kernel panics there
16:21:45 so we should revisit this topic in a few weeks
16:21:52 nothing to do now
16:22:12 we should wait a bit until we backport them
16:22:14 but sure
16:22:24 looks to me the gate is better by now
16:23:06 ok, then moving to the next topic
16:23:11 #topic vmwareapi 3rd-party CI efforts Highlights
16:23:23 #Info Fully working devstack in lab environment manually set up. Now working on automatic setup / teardown for CI
16:23:27 fwiesel: grandchild: anything you want to tell us ?
16:23:41 fwiesel: ++
16:23:44 thanks !
16:23:49 So, "fully working" meaning, we can spin up an instance and it has network, etc..., but....
16:23:56 #Info Initial test runs show vmwareapi is broken (only boot from volume works, not nova boot), bugfixes will come after working CI
16:24:07 ahah
16:24:08 We cannot simply boot from an image
16:24:21 so that's why we need a 3rd party CI :)
16:24:32 looks like we regressed at some point
16:24:49 So probably the CI needs to be non-voting in the beginning, I assume
16:24:55 fwiesel: will you be able to provide some bugfixes ? have you found the root cause ?
16:25:06 fwiesel: oh yeah, definitely
16:25:24 we'll run the job as non-voting first
16:25:27 bauzas: I haven't looked at the root cause yet, as I thought a working CI has priority. Then we tackle the bugs one by one.
16:25:35 fwiesel: cool, no worries
16:25:41 fwiesel: third party CI cannot be voting
16:25:51 fwiesel: in case you need help, we can discuss those in your topic in the next weeks
16:25:53 bauzas: But yes, we will be able to provide the bug fixes. It should not be terribly difficult
16:25:55 you can leave code review +1 or -1
16:26:04 but never Verified +1 or -1
16:26:19 or at least not in a way that would prevent a patch from merging
16:26:22 sean-k-mooney: Ah, thanks for the explanation. Then I got the terminology wrong.
16:26:32 no worries
16:26:39 sean-k-mooney: I think we had 3rd-party CI jobs voting before ? (with zuul 2)
16:26:45 no
16:26:47 never
16:27:02 we discussed it in the past and said that was not ok
16:27:10 as we can't have the gate blocked by a third party CI
16:27:39 if we are reviewing a vmware patch and it breaks the vmware CI
16:27:44 I meant, it probably shouldn't even leave a -1 in the beginning
16:27:44 we are very unlikely to merge it
16:27:52 but that was left to cores to judge
16:27:59 you can have a third party CI -1 and +1 without blocking anything. Only -2 blocks
16:28:00 okay, maybe this was in 2015 or earlier, but IIRC we had a job that was testing the DB time when upgrading and it was a 3rd-party CI
16:28:18 clarkb: we can yes
16:28:28 since the gate only looks at the Verified vote from zuul
16:28:30 but maybe it never voted, can't exactly remember the details
16:29:02 bauzas: as far as I am aware we have never had third party voting CI and
16:29:16 I am not sure I want to change that in the future
16:29:16 anyway, this is not a problem
16:29:29 sure we just need to see the logs and whether it passed or failed
16:29:51 let's see what fwiesel and grandchild can do with their CI and what they can provide for regression bugfixes
16:30:15 That's from my side. Any questions?
16:30:15 yep
16:30:24 fwiesel: just one thing
16:30:28 fwiw, I'm okay with checking some link every week during our meeting to see how many job runs failed
16:30:31 you said local images don't work
16:30:39 did you make sure to use vmdks
16:30:42 instead of qcow
16:30:46 so even if we don't make them voting, we could continue to check that those jobs keep working
16:31:08 sean-k-mooney: Sure, we only run with vmdks.
16:31:25 ok I was wondering if it was a simple format issue
16:31:35 feel free to file a bug with details when you have time
16:31:49 fwiesel: do you know we changed the VMDK types ?
16:32:08 oh we blocked one of the types right
16:32:15 https://bugs.launchpad.net/nova/+bug/1996188 for the context
16:32:24 bauzas: No, I don't. Thanks for the info
16:32:34 so you now need to pass an allowed list of vmdk types
16:32:50 Ah, no. That one is fine... The same check is in cinder, and it works with boot from volume
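
As an illustrative aside, here is a hedged sketch of the kind of VMDK subformat allow-list check being referred to; the allowed subformats below mirror commonly used defaults but are an assumption rather than the authoritative option values, and qemu-img must be available on the host:

    # Hedged sketch only: check a VMDK image's subformat against an allow-list,
    # similar in spirit to the vmdk_allowed_types handling mentioned above.
    # The allowed set below is an assumed default, not the real option value.
    import json
    import subprocess

    ALLOWED_VMDK_SUBFORMATS = {"streamOptimized", "monolithicSparse"}  # assumption

    def vmdk_subformat_allowed(path: str) -> bool:
        out = subprocess.check_output(["qemu-img", "info", "--output=json", path])
        info = json.loads(out)
        if info.get("format") != "vmdk":
            return True  # not a vmdk, nothing to check here
        # qemu-img reports the vmdk subformat as "create-type" in the
        # format-specific data of the JSON output.
        create_type = info.get("format-specific", {}).get("data", {}).get("create-type")
        return create_type in ALLOWED_VMDK_SUBFORMATS

    print(vmdk_subformat_allowed("/tmp/my-image.vmdk"))
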
16:32:55 maybe this is the root cause, maybe not
16:33:16 fwiesel: nova has its own config option
16:33:29 https://review.opendev.org/c/openstack/nova/+/871612
16:33:41 Sure, but we use the image subformat that isn't blocked
16:33:49 cool
16:33:57 (streamOptimized or something)
16:34:01 was more a fyi, just in case
16:34:16 the vmdk bug hit me before :)
16:34:45 and when we had it, no one was around to tell us whether it was a problem for vmwareapi :)
16:34:57 anyway
16:35:01 shouldn't be a problem
16:35:19 fwiesel: thanks for the report, greatly appreciated
16:35:30 you're welcome
16:35:36 fwiesel: someone also freaked out on the mailing list
16:35:45 I haven't replied but you could
16:35:57 bauzas: Good idea, I will
16:36:11 bauzas: openstack-discuss?
16:36:18 oh sean-k-mooney did
16:36:35 fwiesel: yup https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/DLTQ2KHQPFD4S7LLYKTWUPNFXRSTRTHU/
16:36:54 ya I just said what we discussed before
16:37:07 i.e. we won't remove anything until at least m2 depending on the CI status
16:37:17 and advised that while people can deprecate support
16:37:18 fwiw, I see good progress
16:37:20 they should not remove it
16:37:28 but let's continue to discuss it every week
16:37:33 yep
16:37:44 anyway, good talks
16:38:54 fwiw, I also advertised in the previous OpenInfraLive episode what we agreed at the PTG and I explained verbally our current state, which is good for telling 'if you care about something, come tell us'
16:39:22 taking vmwareapi non-removal as an example of a working effort
16:39:46 anyway, time flies
16:39:53 #topic Open discussion
16:39:58 we have two topics
16:40:07 that we punted from last week
16:40:12 (artom) Specless blueprint for persistent mdevs
16:40:17 https://blueprints.launchpad.net/nova/+spec/persistent-mdevs
16:40:24 artom: around ?
16:40:40 Heya
16:41:16 To basically yeah - from memory, this would be limited to the libvirt driver, the idea is to persist mdevs when an instance is booted that uses an mdev
16:41:24 I'm maybe opinionated, so I won't really say a lot, but I think this is a simple specless feature that only touches how our virt driver creates an mdev
16:41:32 So that in case of a host reboot, the instances can come back without operator intervention
16:42:02 There might be operator intervention necessary to manually clean mdevs in certain cases
16:42:04 I don't see any upgrade concerns about it, the fact is that we will start persisting mdevs upon reboot on every compute that's upgraded
16:42:15 Because the mdevs would outlive their instances and host reboots
16:42:37 So for instance, changing the enabled mdev types (after draining the host), the operator would need to clean up the old mdevs
16:42:52 one thing you need to be careful of
16:42:56 artom: surely this will require a releasenote and some upstream docs, but this doesn't require adding a new DB model or anything about RPC
16:42:57 as draining the host will not remove mdevs?
16:43:08 is on restarting nova to a version with this support
16:43:19 if we have vms using mdevs created via sysfs
16:43:49 we need to support creating the libvirt nodedev object to persist them
16:44:05 sean-k-mooney: I see the upgrade path for persisting the mdevs as restarting the instances
16:44:12 i.e. we need to support upgrade in place without restarts
16:44:21 why?
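
As an illustrative aside, a minimal sketch of the libvirt nodedev define approach being discussed for persisting an existing transient mdev, assuming a recent libvirt (7.3+ for define, 7.8+ for autostart) with the Python bindings; the UUID, parent device and mdev type below are made-up placeholders, not values from the blueprint:

    # Hedged sketch only: re-define an already-running transient mdev as a
    # persistent libvirt nodedev with the same UUID so it comes back after a
    # host reboot. UUID, parent PCI device and mdev type are placeholders.
    import libvirt

    MDEV_UUID = "c1f343ae-99a3-4d42-9d35-f3fbe83d52c0"   # placeholder; in practice taken from the guest XML
    PARENT_DEV = "pci_0000_84_00_0"                      # placeholder parent GPU
    MDEV_TYPE = "nvidia-610"                             # placeholder enabled mdev type

    mdev_xml = f"""
    <device>
      <parent>{PARENT_DEV}</parent>
      <capability type='mdev'>
        <type id='{MDEV_TYPE}'/>
        <uuid>{MDEV_UUID}</uuid>
      </capability>
    </device>
    """

    conn = libvirt.open("qemu:///system")
    try:
        # Defining the device writes out a persistent config; since the mdev
        # already exists, libvirt should not need to create a new one.
        dev = conn.nodeDeviceDefineXML(mdev_xml)
        # Autostart so the mdev is recreated when the host reboots.
        dev.setAutostart(True)
    finally:
        conn.close()

Something along these lines is presumably what an init_host-style reconciliation, discussed just below, would do for mdevs that were created via sysfs before the upgrade.
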
16:44:33 we should not need vm downtime or move operations
16:44:41 oh my bad, I was wrong
16:44:50 the mdev would already be created
16:44:54 yep
16:44:57 and in use by the vm
16:45:09 we just need to create the mdev with the same uuid in the libvirt api
16:45:12 to have it persisted
16:45:19 so, I think this feature requires some upgrade doc that explains how to persist the mdev
16:45:33 well nova can do it
16:45:34 I mean, an admin doc
16:45:42 but we can have an upgrade doc to cover how this works
16:46:10 I would hope it's literally just update the nova-compute binary and restart the compute agent
16:46:15 no other upgrade impact
16:46:15 sean-k-mooney: are you thinking of a nova-compute startup method that would check every single mdev and persist it ?
16:46:29 init host can reconcile what we expect based on the current xmls
16:46:41 ya
16:46:45 that is what I was thinking
16:46:48 that's an additional effort, sure, still specless I think
16:46:57 we can defer that to implementation review if we like
16:47:13 but I would like to see upgrade in place support, with or without a hard reboot I guess
16:47:15 sounds acceptable to me
16:47:37 we already have a broken method that's run on init_host
16:47:48 we could amend it to persist the mdev instead
16:48:12 (ie. delete and recreate it using the libvirt API with the same uuid)
16:48:38 artom: does that sound reasonable to you ?
16:49:06 Sorry, reading back, in multiple places at the same time
16:49:28 IRC meetings are good, but people do many things at the same time :)
16:49:39 I wish we could be more in sync somehow :)
16:50:34 given we only have 10 mins left and another procedural approval to do
16:50:41 lemme summarize
16:51:06 1/ the feature will ensure that every new mdev we create will use the libvirt API
16:51:38 2/ at compute restart, the implementation will check every mdev that's created and will recreate it using the libvirt API
16:52:11 OK, yeah, I think that makes sense, though the mechanics of persisting existing transient mdevs are less obvious to me at this time
16:52:25 3/ documentation will address the fact that the operator needs some cleanup (unpersisting the mdevs) in case they want to change the type, in addition to the fact that they need to drain vgpu instances from that host
16:52:38 I would hope it's just generating an xml and asking libvirt to create it
16:52:54 artom: don't be afraid, I see exactly what, where and how to do it
16:52:57 it should see that it already exists and I hope just write the mdevctl file
16:53:07 we can do this at the end of the series
16:53:12 to not block the overall feature
16:53:32 based on those 3 bullet points, I don't see anything that requires a spec
16:53:37 anyone disagreeing ?
16:53:58 looks not
16:54:13 and as a reminder, a specless approval isn't a blank check about design
16:54:28 if something controversial comes up, we could revisit that and ask for a spec
16:54:34 based on that
16:54:40 looks OK to me
16:54:54 #agreed https://blueprints.launchpad.net/nova/+spec/persistent-mdevs accepted as a specless blueprint
16:55:08 I'm not clear about when manual cleanup is needed but we can discuss that in the review
16:55:15 #action artom to amend the blueprint description to note what we agreed
16:55:24 moving on
16:55:32 I really want the last item to be discussed
16:55:37 (JayF/johnthetubaguy) Specless blueprint for ironic guest metadata
16:55:40 o/
16:55:42 JayF: 'sup ?
16:55:48 I am unsure if John will be here, but I am.
https://blueprints.launchpad.net/nova/+spec/ironic-guest-metadata
16:55:53 shoot
16:56:44 reading it quickly it looks reasonable but I would use the flavor uuid instead of the name, or both
16:56:59 ditto here
16:57:00 Essentially, libvirt instances get a large amount of useful metadata that Ironic would like to get as well for various uses -- the primary case that drove us to this was implementing Ironic's "automatic_lessee" support, allowing Ironic to give the project that provisioned an instance some RBAC access to it
16:57:30 but generally many of those metadata items map to previous feature requests / things that operators have asked for in node instance_info for troubleshooting in the past (like flavor)
16:57:33 I only care about the upgrade path
16:57:36 so it seemed like a good fit/easy win
16:57:37 so I guess my request would be can we update the description with the full list of things we want to be set
16:58:04 would we need some interim period to ensure all computes report that metadata ?
16:58:22 Essentially this is just additional metadata you'd set on deploy
16:58:28 it'd be Ironic's job to do the right thing if it's set/not set
16:58:30 so this is setting metadata on the ironic nodes right
16:58:32 not the compute nodes
16:58:34 from a Nova standpoint, it should be 100% backwards compatible
16:58:41 as in when an instance is scheduled to a node
16:58:52 so for upgrade I guess we could have a nova-manage command
16:58:57 to set it for existing instances
16:59:02 oh, you mean for backfilling instance metadata, I understand
16:59:08 yep
16:59:10 that's a case I hadn't even considered!
16:59:28 so I assume this would normally only be set on spawn
16:59:37 since ironic does not support resize
16:59:45 well spawn or rebuild/evacuate
17:00:02 Alright, looks like I have two actions: 1) List all the specific fields and 2) add details about the migration path for preexisting instances and how/if they get metadata
17:00:08 sean-k-mooney: we do rebuild, very common use case
17:00:19 ya so we would want to update it on rebuild right
17:00:33 so 3 actions. list the data to set, list when it will be set
17:00:39 sounds like an implementation detail to me then
17:00:44 and then if we want to have a nova-manage command to backfill then detail that too
17:00:50 yeah but one I don't mind enumerated in the blueprint to ensure I don't miss it
17:01:27 nova-manage what ? fill my ironic stuff for that instance ?
17:02:05 bauzas: ya set the metadata on the corresponding ironic node for an existing instance
17:02:19 couldn't it be some ironic script that would gather the details from the nova API ?
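
As an illustrative aside, a hedged sketch of what such a node update could look like, whether done from the nova ironic driver at deploy time or from a backfill script; openstacksdk's baremetal update_node call is assumed, and the instance_info keys are placeholders rather than the schema from the blueprint:

    # Hedged sketch only: record guest metadata on an Ironic node.
    # The instance_info keys below are illustrative placeholders.
    import openstack

    conn = openstack.connect(cloud="envvars")  # credentials from OS_* environment variables

    node_uuid = "00000000-0000-0000-0000-000000000000"  # placeholder Ironic node UUID

    node = conn.baremetal.get_node(node_uuid)
    instance_info = dict(node.instance_info or {})
    instance_info.update({
        "nova_flavor_id": "11111111-1111-1111-1111-111111111111",  # placeholder flavor uuid
        "nova_project_id": "22222222222222222222222222222222",     # placeholder project id
    })
    # This results in a PATCH against the node resource in the Ironic API.
    conn.baremetal.update_node(node, instance_info=instance_info)

Either way it ends up as a PATCH against the node in the Ironic API, which is the call being discussed just below.
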
17:02:23 that should be pretty simple to do, it's just an ironic API call
17:02:40 it could but that feels like a worse solution to me
17:02:49 So I'll note
17:02:52 sean-k-mooney: I'm not 100% happy with a very specific virt driver method in our nova-manage command
17:03:00 From an Ironic standpoint, having the data backfilled is not super awesome
17:03:07 I'm pretty sure it would not be the first
17:03:09 we aren't going to do much with it
17:03:12 this would require nova-manage to be able to speak the ironic language
17:03:20 but this feels like the volume refresh commands to me
17:03:32 bauzas: it already can talk to ironic
17:03:33 so while I'm happy to implement it, and I'm sure someone would find a use for it, I don't think it's in the primary path for enabling the sorta features we want
17:03:56 so I don't really mind where it ends up I guess but I think it would be nice to have
17:03:59 sean-k-mooney: creds and all the like are set in nova.conf that nova-manage reads ?
17:04:30 bauzas: you already have creds on nova-computes to do the calls you need
17:04:43 PATCH calls to /v1/node/{UUID}
17:04:50 JayF: nova-manage isn't meant to be run on nova computes
17:04:55 ack
17:05:15 bauzas: well it's normally run on the controller which often by accident is where nova-compute with ironic runs
17:05:20 despite we shipped that sail with the volume attach command :)
17:05:23 but we can't assume it will be colocated
17:06:03 did I say "we shipped that sail" ? oh gosh, I'm tired
17:06:21 anyway
17:06:52 sounds like there is kind of a grey path about what we would do for non-greenfield instances
17:07:09 I like the idea of Ironic owning the migration script
17:07:09 time is flying tho and we're *again* late
17:07:18 and will think about it further and likely propose that
17:07:23 we should figure out a solution for existing instances but we don't need to do that now
17:07:25 (I'm really sorry about it)
17:07:39 JayF: what would be your preference ?
17:07:42 JayF: do you want to think about that for a few days and let us know what you think is the best approach
17:08:11 approving the blueprint with a note saying "this is only a path for new instances, the migration path is yet to be defined" ?
17:08:17 I'm kind of feeling like a spec would help by the way
17:08:24 or we could revisit the approval in later meetings
17:08:27 Yeah I'm thinking an Ironic-side script, because then we can allow the Ironic-side actions to be done, too
17:08:30 I'd say let's revisit
17:08:30 but if others are ok I'm not going to say we must have one
17:08:39 I think I'll get to talk to John in the intervening week
17:08:52 okay, I'll keep the blueprint in the agenda
17:08:56 I wasn't sure what the edges were on this, now I know what they are and can file them down :)
17:08:59 thank you
17:08:59 and we could revisit it next week
17:09:04 thanks
17:09:05 I just don't know this code as well as others so it would help me to have a little more detail. but we could just put more detail in the blueprint
17:09:17 and because we're horribly late, I'll end the meeting now
17:09:24 o/
17:09:24 thanks all
17:09:30 and sorry again
17:09:32 #endmeeting