16:00:24 <Uggla> #startmeeting nova
16:00:24 <opendevmeet> Meeting started Tue Jul 1 16:00:24 2025 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:24 <opendevmeet> The meeting name has been set to 'nova'
16:00:32 <Uggla> Hello everyone
16:01:32 <gibi> o/
16:01:41 <bauzas> o/
16:01:52 <bauzas> (I'll need to leave a bit early, like 15 mins before the end)
16:02:21 <masahito> o/
16:02:46 <Uggla> #topic Bugs (stuck/critical)
16:02:53 <Uggla> #info No critical bug
16:03:10 <Uggla> #topic Gate status
16:03:23 <Uggla> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:03:31 <Uggla> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:03:37 <Uggla> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova & Placement periodic jobs status
16:03:45 <Uggla> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:03:51 <Uggla> #info Please try to provide a meaningful comment when you recheck.
16:04:18 <Uggla> #topic tempest-with-latest-microversion job status
16:04:26 <Uggla> #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=0
16:04:39 <Uggla> gmaan, something you'd like to tell us?
16:04:39 <fwiesel> o/
16:05:45 <Uggla> Not sure gmaan is available today, so I'm going to move on.
16:06:01 <Uggla> #topic Release Planning
16:06:15 <Uggla> #link https://releases.openstack.org/flamingo/schedule.html
16:06:22 <Uggla> #info Nova deadlines are set in the above schedule
16:06:29 <Uggla> #info Nova spec freeze is Thursday.
16:06:56 <Uggla> ⚠️ ^
16:08:21 <Uggla> FYI, I have discussed https://review.opendev.org/c/openstack/nova-specs/+/951636 with masahito and whether he will be able to submit an update, so it will give us a chance to approve this one.
16:08:30 <opendevreview> Stephen Finucane proposed openstack/nova master: api: Correct expected errors https://review.opendev.org/c/openstack/nova/+/951640
16:08:49 <Uggla> #topic Review priorities
16:08:58 <masahito> Yup, let me focus on updating my specs tomorrow.
16:09:09 <Uggla> masahito 👍
16:09:16 <Uggla> #link https://etherpad.opendev.org/p/nova-2025.2-status
16:10:25 <Uggla> I have updated the doc and updated Launchpad accordingly, so blueprints and specless blueprints should all be in the correct status.
16:10:47 <Uggla> If you see something wrong in the doc, please let me know.
16:11:22 <Uggla> #topic OpenAPI
16:11:30 <Uggla> #link https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:11:36 <Uggla> #info 19 open changes, a decrease of 6. \o/
16:12:14 <Uggla> elodilles is not available, so I'll skip the stable branch topic.
16:12:33 <Uggla> #topic vmwareapi 3rd-party CI efforts highlights
16:12:41 <Uggla> fwiesel, do you have something to share?
16:14:34 <Uggla> fwiesel?
16:14:37 <fwiesel> Uggla: No, nothing from my side.
16:14:49 <Uggla> fwiesel: ok, thanks.
16:14:56 <Uggla> #topic Gibi's news about eventlet removal
16:15:05 <Uggla> #link Blog: https://gibizer.github.io/categories/eventlet/
16:15:14 <Uggla> #link nova-scheduler series is ready for core review, starting at https://review.opendev.org/c/openstack/nova/+/947966
16:15:14 <gibi> o/
16:15:19 <Uggla> gibi, the mic is yours
16:15:32 <gibi> so the scatter-gather refactor patch has landed
16:15:37 <gibi> thanks for the reviews
16:15:40 <Uggla> \o/
16:16:01 <gibi> the next two patches are also approved, but the gate was / is pretty flaky in recent days, so they have not landed yet
16:16:12 <gibi> (hybrid-plug and ceph issues)
16:16:47 <gibi> I fixed the comments from bauzas on the next patch in the series about a possible race condition. Please take a look.
16:17:09 <gibi> On top of the series I started an effort to run unit tests without eventlet
16:17:29 <gibi> the effort starts here https://review.opendev.org/c/openstack/nova/+/953436/5
16:18:07 <gibi> a couple of patches later in that series we have a job that runs most of our unit tests without monkey patching and with the threading oslo.service and oslo.messaging backends
16:18:19 <gibi> there is a list of excluded tests I have to work through
16:18:57 <gibi> I'm tracking two types of problems: SQLAlchemy nonsense errors and simple test case hangs
16:19:15 <gibi> you can see them noted in the test exclude list
16:19:37 <gibi> I think we can make the new Zuul job voting pretty soon
16:19:45 <gibi> (while keeping some excludes)
16:20:23 <gibi> that is it from me
16:20:51 <sean-k-mooney> o/ sorry, had this behind another window
16:21:48 <Uggla> gibi, do you have any kind of "urgent" review remaining?
16:22:01 <bauzas> I'll continue to review the series for sure
16:22:25 <gibi> Uggla: the bottom of the series is as urgent as the eventlet removal :)
16:22:51 <gibi> bauzas: thanks
16:24:23 <gibi> I think we can move on :)
16:24:30 <Uggla> BTW we have the Nova eventlet removal sync tomorrow too.
16:25:11 <Uggla> 14:30 UTC
16:25:12 <gibi> yepp, we have the call
16:25:14 <gibi> as planned
16:25:32 <Uggla> thanks gibi, next topic
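[Editorial aside] For readers following along, the pattern gibi describes above (running the per-cell scatter-gather without eventlet monkey patching) roughly amounts to fanning the per-cell calls out onto native threads. The snippet below is a minimal, hypothetical sketch of that general pattern only, not the actual Nova implementation from the linked series; the helper name, timeout value and result handling are illustrative assumptions.

```python
# Hypothetical sketch only -- not the code from the series above. It shows the
# general shape of a scatter-gather helper built on native threads instead of
# eventlet green threads, so it behaves the same with or without monkey patching.
import concurrent.futures

CELL_TIMEOUT = 60  # seconds; illustrative value, not Nova's actual setting


def scatter_gather(cells, fn, *args):
    """Call fn(cell, *args) for every cell concurrently and collect results."""
    results = {}
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(cells) or 1)
    try:
        futures = {pool.submit(fn, cell, *args): cell for cell in cells}
        done, not_done = concurrent.futures.wait(futures, timeout=CELL_TIMEOUT)
        for fut in done:
            try:
                results[futures[fut]] = fut.result()
            except Exception as exc:
                # Record the failure instead of raising, so one broken cell
                # does not hide the results from the healthy ones.
                results[futures[fut]] = exc
        for fut in not_done:
            # Cells that did not answer within the timeout are marked as such.
            results[futures[fut]] = TimeoutError("cell did not respond")
    finally:
        # Do not block on laggard cells while cleaning up the pool.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```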
16:25:39 <Uggla> #topic Open discussion
16:25:47 <Uggla> #topic (fwiesel) Evacuate Action & InstanceInvalidState
16:26:20 <Uggla> fwiesel, please go ahead.
16:26:28 <fwiesel> So, we are currently looking into doing HA by using the evacuate action on instances, and were running into various states where it was not possible.
16:27:00 <fwiesel> From our side, evacuation usually happens when the host is down, so it can happen at any time.
16:27:39 <fwiesel> And I was wondering if it wouldn't make sense to adjust the code so it can evacuate an instance in as many situations as possible.
16:28:47 <fwiesel> So, raising an InstanceInvalidState is merely a reflection of what has been implemented. What do you say?
16:29:44 <gibi> do you have specific examples of states you want to evacuate from?
16:29:48 <cardoe> So I'll say internally we've got extra stuff which we don't see nova ever being able to handle.
16:29:59 <cardoe> Like even human-involved state
16:30:37 <fwiesel> Let me think. For starters: verify-resize.
16:31:13 <fwiesel> But let's say we have a live migration, and that host falls down. I still would like to do a failover.
16:31:57 <gibi> verify-resize: so while you could evacuate from that state, you also need to implement a proper cleanup in the nova-compute coming up on the source node of the original resize
16:32:00 <sean-k-mooney> so evacuate requires you, the admin,
16:32:11 <sean-k-mooney> to assert that the vm is not running on any host
16:32:31 <sean-k-mooney> so allowing it in verify-resize and other states is questionable
16:32:52 <sean-k-mooney> especially since in that state you're in the middle of an ongoing migration
16:32:55 <cardoe> Yeah, there's a lot of squirrely cases there which need to be monitored for.
16:33:07 <sean-k-mooney> and the flavor that is in use is kind of ill defined
16:33:47 <sean-k-mooney> it's technically the new one in a real resize, but it has not been confirmed, so you would want to evacuate to the old one, but the numa topology and other things are all in a weird hybrid state
16:34:28 <sean-k-mooney> what you likely should be able to do is revert the resize in verify-resize
16:34:29 <gibi> so in general it would be nice to support these, but in practice it seems like a bunch of cans of worms
16:34:37 <sean-k-mooney> even if the dest host is down
16:34:41 <sean-k-mooney> then evacuate it
16:34:43 <sean-k-mooney> if needed
16:35:13 <cardoe> We've got a whole thing out of band that's specific to the hypervisor in use for these items.
16:35:31 <gibi> yeah, allowing revert-resize while the source host is down is one way, even though a weird one. Who wants to revert a resize that will lead to the VM running nowhere :)
16:35:33 <fwiesel> But technically, an evacuate is practically a rebuild. So, I can start from "scratch" (nova wise)
16:35:54 <sean-k-mooney> gibi: well, it would revert the resize when the dest is down and confirm the resize when the source is down
16:35:56 <fwiesel> I just have to build the VM up somewhere fitting the spec
16:36:05 <gibi> yeah, starting is easy, cleaning up the remnants is messy
16:36:12 <sean-k-mooney> fwiesel: yes and no
16:36:13 <cardoe> yeah, but you need to make sure that gets fully cleaned up.
16:36:20 <fwiesel> Of course
16:36:25 <sean-k-mooney> we have some api contract about preserving storage in some cases
16:36:40 <sean-k-mooney> plus there is the neutron/cinder state cleanup that we need to do
16:36:48 <fwiesel> Exactly
16:36:48 <gibi> sean-k-mooney: revert-resize: ahh right
16:37:07 <fwiesel> Right now, I have the choice to implement all that outside of nova, or within nova
16:37:25 <sean-k-mooney> to do this in nova would need a very detailed spec
16:37:41 <fwiesel> Well, I would not tackle all cases in one go.
16:37:42 <sean-k-mooney> and you would have to consider things like cross-cell resize
16:38:01 <fwiesel> And maybe then not resize :)
16:38:15 <cardoe> it's a worthy goal
16:38:18 <sean-k-mooney> right, but even if you take an incremental approach you need to have each incremental change be valid
16:38:55 <sean-k-mooney> i think for resize it would be simpler to allow the migration to complete or revert based on which end is alive
16:38:57 <cardoe> I'll say when I did this with xen it was much easier, because xapi had an object version field, so I used that as a monotonic number which also then was stored in the nova DB
16:39:03 <sean-k-mooney> and if they are both dead, well, you're kind of out of luck
16:39:10 <fwiesel> Of course. I was more fearing that it will be difficult to test that in the CI, and it adds complexity, which might be a reason you might not want it at all.
16:39:26 <cardoe> So each time the instance got updated, the monotonic number increased and that got pushed down into xapi
16:39:28 <gibi> fwiesel: start with a simpler state first, and write up a spec on how you would make sure we end up in a consistent state after a successful evac and a recovered compute (or a failed evac and a recovered compute, or a failed evac and a retried evac and a recovered compute, ...)
16:39:31 <cardoe> But libvirt doesn't have that
16:39:47 <cardoe> So when we'd go to clean up, anything that wasn't at the current version could get nuked.
16:39:59 <sean-k-mooney> fwiesel: so testing that in tempest would be hard, but it would be more doable in the functional suite
16:40:03 <cardoe> But there's all kinds of dangling references to storage and whatnot that really are a pain to fix
16:40:36 <gibi> yepp, use functional tests for sure
16:40:51 <fwiesel> Okay, I'll take that with me. Thanks.
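[Editorial aside] As background to fwiesel's point that InstanceInvalidState "is merely a reflection of what has been implemented": Nova's compute API guards actions such as evacuate with a state-checking decorator that whitelists vm_state/task_state values and rejects everything else. The sketch below is a simplified paraphrase of that pattern, not the exact Nova source; the state lists, class names and the evacuate signature are illustrative assumptions.

```python
# Simplified paraphrase (not the exact Nova source) of the pattern behind
# InstanceInvalidState: API actions carry an explicit whitelist of allowed
# vm_state/task_state values, so "which states allow evacuate" is simply
# whatever list the decorator was given.
import functools


class InstanceInvalidState(Exception):
    pass


def check_instance_state(vm_state=None, task_state=(None,)):
    """Reject the call unless the instance is in one of the allowed states."""
    def decorator(func):
        @functools.wraps(func)
        def inner(self, context, instance, *args, **kwargs):
            if vm_state is not None and instance.vm_state not in vm_state:
                raise InstanceInvalidState(
                    f"cannot {func.__name__} while vm_state is {instance.vm_state}")
            if task_state is not None and instance.task_state not in task_state:
                raise InstanceInvalidState(
                    f"cannot {func.__name__} while task_state is {instance.task_state}")
            return func(self, context, instance, *args, **kwargs)
        return inner
    return decorator


class ComputeAPI:
    # verify_resize is not in the whitelist, which is why the evacuation
    # attempt described above is rejected rather than handled.
    @check_instance_state(vm_state=['active', 'stopped', 'error'],
                          task_state=[None])
    def evacuate(self, context, instance, host, on_shared_storage=False):
        ...  # rebuild the instance on another host
```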
16:40:52 <gibi> we have a limited evac test somewhere after tempest
16:40:57 <sean-k-mooney> cardoe: so we don't need a monotonic object version in general, because we associate the instance with the compute service that is currently managing it
16:40:58 <gibi> in a playbook
16:41:11 <sean-k-mooney> in the post test hook in nova-next
16:41:16 <cardoe> For us we implemented this more as monitoring that an unexpected object was on a host.
16:41:17 <sean-k-mooney> and i think the ceph job?
16:42:03 <sean-k-mooney> cardoe: ya, so that kind of approach is not very compatible with the nova design, although we do have some per-host periodics that are intended to clean up these types of issues
16:42:08 <cardoe> sean-k-mooney: when you perform a live migrate and the dest goes down at the very end during the resume
16:42:09 <sean-k-mooney> especially on compute agent start-up
16:42:53 <sean-k-mooney> cardoe: if the vm dies before resume, libvirt will abort the migration
16:43:01 <sean-k-mooney> if we got to post live migrate
16:43:21 <sean-k-mooney> it's too late to abort, but we will update the instance host to point to the dest
16:43:28 <gibi> this is the evac testing in devstack https://github.com/openstack/nova/blob/master/roles/run-evacuate-hook/tasks/main.yaml
16:43:58 <Uggla> guys, is it ok to move to the second point?
16:44:05 <gibi> OK for me
16:44:08 <cardoe> I'm just saying it wasn't a simple problem when I was working in the VM world, because hardware dies at the worst possible times and nova can be in many different code paths when that happens.
16:44:10 <sean-k-mooney> yep, let's move on for now.
16:44:24 <Uggla> #topic (jonnyb) Blueprint for new compute created weigher (https://blueprints.launchpad.net/nova/+spec/node-uptime-weigher)
16:44:47 <Uggla> jonnyb wants to discuss this weigher
16:45:01 <Uggla> jonnyb, please go ahead.
16:45:13 <jonnyb> so I propose a scheduler weigher based on the creation date of the compute service
16:45:41 <jonnyb> the initial idea was to use uptime, but that is too dependent on the hypervisor and its driver
16:46:11 <jonnyb> for us the creation date is good enough, and we use it since we migrate and deploy a lot of servers
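[Editorial aside] For context on the proposal, a weigher of this kind would follow Nova's standard BaseHostWeigher interface. The sketch below is a minimal illustration (not the PoC linked later in the discussion) and assumes host_state.service is populated and carries created_at, which is exactly what the discussion that follows questions (partial object / lazy loading).

```python
# Minimal sketch of a compute-created-date weigher using Nova's standard
# weigher interface; the class name and weighting function are illustrative,
# not the linked PoC.
import time

from nova.scheduler import weights


class ComputeCreatedWeigher(weights.BaseHostWeigher):
    """Prefer compute services that were created more recently."""

    def _weigh_object(self, host_state, weight_properties):
        service = host_state.service or {}
        created_at = service.get('created_at')
        if created_at is None:
            # No data: stay neutral instead of penalizing the host.
            return 0.0
        # Return the negated age in seconds so that, after the framework's
        # normalization, the youngest compute service gets the highest weight.
        return -(time.time() - created_at.timestamp())
```

As the discussion below notes, with a multiplier defaulting to 1.0 such a weigher is effectively active as soon as it is loaded, which is why it can be checked in existing CI logs and why it is later suggested to disable it by default.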
16:46:13 <gibi> do you have the service creation time as part of the HostState the weigher is running on?
16:46:29 <sean-k-mooney> i believe we do
16:46:36 <dansmith> that surprises me
16:46:53 <sean-k-mooney> we have it from the compute node object, i believe
16:47:01 <dansmith> sure, but in hoststate?
16:47:38 <sean-k-mooney> so this is the poc https://review.opendev.org/c/openstack/nova/+/947503/9/nova/scheduler/weights/compute_created.py
16:48:27 <dansmith> we have an object, but I didn't think it was queried from the db and thus not complete, but maybe that's not right
16:49:08 <sean-k-mooney> so yes https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L176
16:49:26 <sean-k-mooney> you could be correct about it being a partial object
16:49:26 <gibi> based on the poc we have HostState.service.created_at; as created_at isn't expected to change, it is fair to assume it is up to date
16:50:34 <sean-k-mooney> so it looks like this is only populated via update; it's not populated via __init__
16:51:02 <sean-k-mooney> so i'm not sure if it's there if you disable the track_instance_state notifications stuff
16:51:17 <dansmith> okay, we are doing a scatter to gather nodes and hosts from cells
16:51:18 <sean-k-mooney> we would have to confirm that it's always present in both configurations
16:51:30 <dansmith> I was thinking we did an optimized construction from a list there
16:51:44 <dansmith> sean-k-mooney: yeah, good point
16:52:11 <sean-k-mooney> i think that config only affects the instance objects and how those are populated
16:52:16 <dansmith> because this could also be triggering massive lazy loads, which would be bad
16:52:19 <sean-k-mooney> but assuming we have that always
16:52:43 <sean-k-mooney> i think it would be ok to use it in the weigher, provided we are not lazy loading this
16:53:04 <dansmith> we need to make sure this is run in CI at least once so we can examine logs for lazy loads
16:56:32 <sean-k-mooney> ya, we can turn this on in one of the jobs
16:57:08 <sean-k-mooney> it won't really have much of an effect with only 1-2 nodes
16:57:10 <dansmith> we don't have to permanently, but just a DNM on top of this to get a run and make sure we don't see "lazy loading service.created_at" :)
16:58:10 <sean-k-mooney> we are wrapping the service in a readonly dict class that i have not really seen before. i would assume that should also block lazy loading?
16:58:24 <sean-k-mooney> going back on topic
16:58:34 <sean-k-mooney> are we ok with approving this as a specless blueprint
16:58:38 <sean-k-mooney> and continuing to review
16:59:16 <dansmith> idk
16:59:53 <Uggla> +1 for me, it looks like an interesting weigher.
17:00:20 <Uggla> gibi, ok for you?
17:00:24 <gibi> I'm fine
17:00:46 <sean-k-mooney> i don't necessarily have an objection, just note that if there are lazy loading issues they will need to be resolved before we can merge it
17:00:56 <dansmith> same
17:00:58 <sean-k-mooney> so it might not be ready before FF
17:01:17 <sean-k-mooney> but i'm ok to move to the review and take a look at CI logs etc.
17:01:30 <Uggla> jonnyb, is that ok for you?
17:01:35 <jonnyb> that's fine with me, i wasn't aware that lazy loading could be an issue. but good point
17:02:04 <sean-k-mooney> you defaulted to 1.0, so in theory it's enabled by default
17:02:15 <sean-k-mooney> so we should be able to look at any of the current CI logs
17:02:28 <Uggla> ok, last topic:
17:02:28 <Uggla> #topic (gibi) Workaround for cpython3.13 GC bug needs review https://review.opendev.org/c/openstack/nova/+/952966
17:02:36 <dansmith> oh, it definitely seems like it should be disabled by default
17:02:53 <dansmith> that's merged?
17:03:05 <sean-k-mooney> well, we can definitely change that in the review, and ya, the gc thing is merged
17:03:26 <gibi> yepp, it is merged
17:03:29 <Uggla> dansmith: yes, it seems so
17:03:48 <gibi> I proposed the backport to Epoxy as I know Debian wants it there
17:04:23 <gibi> (the cpython bug fixing effort is progressing in the background)
17:05:14 <gibi> so other than the backport there is nothing else left here
17:05:18 <gibi> we can move on
17:05:41 <sean-k-mooney> jonnyb: so without digging deeply, https://zuul.opendev.org/t/openstack/build/8a9b0a0404034a79a95621b18e16534b/log/controller/logs/screen-n-sch.txt#1017 your weigher was enabled and i don't see lazy loading, but we need to look at that properly after the meeting
17:05:44 <Uggla> ok, cardoe wants to discuss a patch
17:06:03 <Uggla> cardoe, please go ahead, and then we will close.
17:06:07 <cardoe> https://review.opendev.org/c/openstack/nova-specs/+/471815
17:06:36 <cardoe> It is a proposed spec change to add trunk ports with their vlan tags to the network_data.json
17:06:47 <sean-k-mooney> ya, i'm +2 on that; gibi is +1 because they won't have review bandwidth, i believe
17:07:04 <gibi> I'm +2-1=+1, signalling review bandwidth, yes
17:07:16 <cardoe> It's something written originally by someone not involved in the effort currently.
17:07:23 <cardoe> But neutron supports defining ports like that.
17:07:55 <cardoe> There have been a few patches floating around from different operators that are using that functionality as a downstream patch.
17:07:56 <dansmith> sean-k-mooney: jonnyb: agree, I see the weigher being weighed and no lazy loads in that log, so seems fine
17:08:06 <gibi> having a +1 from rubasov (Bence) is a good sign, he was involved in the original trunk work
17:08:15 <cardoe> I've been working to consolidate the downstream operators around a single implementation and wanting to get it upstreamed.
17:08:20 <sean-k-mooney> yep
17:08:48 <cardoe> We've got tempest tests written and an implementation.
17:08:52 <sean-k-mooney> there is code for nova and tempest tests too
17:09:04 <sean-k-mooney> ya, https://review.opendev.org/q/topic:%22bp/expose-vlan-trunking%22
17:09:20 <sean-k-mooney> this is useful for both vms and for ironic
17:09:24 <cardoe> This is just part of my TC work to try and get downstream operators to push back to the community.
17:10:29 <sean-k-mooney> so the request is basically for another core to sponsor/review this before spec freeze on Thursday
17:10:57 <sean-k-mooney> and then to review the actual patches before m3
17:11:54 <sean-k-mooney> the one thing i will say is i think we only have trunks in one job
17:12:12 <sean-k-mooney> so we will need to confirm that this is properly tested in our CI and that that job passes
17:13:07 <sean-k-mooney> the logs have rotated, and the one nova patch that is required https://review.opendev.org/c/openstack/nova/+/941227 is in merge conflict if i try a rebase
17:13:17 <cardoe> I'll work to get folks to update things based on feedback.
17:13:25 <sean-k-mooney> oh, it's just missing Signed-off-by
17:14:50 <Uggla> cardoe, something else you want to ask?
17:15:16 <cardoe> No, my only question is how I can get this on the roadmap
17:18:12 <Uggla> We are overtime, so we should close.
17:18:19 <Uggla> Thanks all
17:18:25 <Uggla> #endmeeting