16:00:24 <Uggla> #startmeeting nova
16:00:24 <opendevmeet> Meeting started Tue Jul  1 16:00:24 2025 UTC and is due to finish in 60 minutes.  The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:24 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:24 <opendevmeet> The meeting name has been set to 'nova'
16:00:32 <Uggla> Hello everyone
16:01:32 <gibi> o/
16:01:41 <bauzas> o/
16:01:52 <bauzas> (I'll need to leave a bit early, like 15 mins before the end)
16:02:21 <masahito> o/
16:02:46 <Uggla> #topic Bugs (stuck/critical)
16:02:53 <Uggla> #info No Critical bug
16:03:10 <Uggla> #topic Gate status
16:03:23 <Uggla> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:03:31 <Uggla> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:03:37 <Uggla> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:03:45 <Uggla> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:03:51 <Uggla> #info Please try to provide a meaningful comment when you recheck
16:04:18 <Uggla> #topic tempest-with-latest-microversion job status
16:04:26 <Uggla> #link https://zuul.opendev.org/t/openstack/builds?job_name=tempest-with-latest-microversion&skip=0
16:04:39 <Uggla> gmaan, something you'd like to tell us ?
16:04:39 <fwiesel> o/
16:05:45 <Uggla> not sure gmaan is available today, so I'm gonna move on.
16:06:01 <Uggla> #topic Release Planning
16:06:15 <Uggla> #link https://releases.openstack.org/flamingo/schedule.html
16:06:22 <Uggla> #info Nova deadlines are set in the above schedule
16:06:29 <Uggla> #info Nova spec freeze is Thursday.
16:06:56 <Uggla> ⚠️ ^
16:08:21 <Uggla> FYI, I have discussed https://review.opendev.org/c/openstack/nova-specs/+/951636 with masahito and whether he will be able to submit an update, so it will give us a chance to approve this one.
16:08:30 <opendevreview> Stephen Finucane proposed openstack/nova master: api: Correct expected errors  https://review.opendev.org/c/openstack/nova/+/951640
16:08:49 <Uggla> #topic Review priorities
16:08:58 <masahito> yup. let me focus on updating my specs tomorrow.
16:09:09 <Uggla> masahito 👍
16:09:16 <Uggla> #link https://etherpad.opendev.org/p/nova-2025.2-status
16:10:25 <Uggla> I have updated the doc and updated launchpad accordingly, so blueprints and specless blueprints should all be in the correct status.
16:10:47 <Uggla> If you see something wrong in the doc, please let me know.
16:11:22 <Uggla> #topic OpenAPI
16:11:30 <Uggla> #link: https://review.opendev.org/q/topic:%22openapi%22+(project:openstack/nova+OR+project:openstack/placement)+-status:merged+-status:abandoned
16:11:36 <Uggla> #info 19 open, a decrease of 6. \o/
16:12:14 <Uggla> elodilles is not available so I'll skip the stable branch topic.
16:12:33 <Uggla> #topic vmwareapi 3rd-party CI efforts Highlights
16:12:41 <Uggla> fwiesel do you have something to share ?
16:14:34 <Uggla> fwiesel ?
16:14:37 <fwiesel> Uggla: No, nothing from my side
16:14:49 <Uggla> fwiesel ok thx
16:14:56 <Uggla> #topic Gibi's news about eventlet removal.
16:15:05 <Uggla> #link Blog: https://gibizer.github.io/categories/eventlet/
16:15:14 <Uggla> #link nova-scheduler series is ready for core review, starting at https://review.opendev.org/c/openstack/nova/+/947966
16:15:14 <gibi> o/
16:15:19 <Uggla> gibi the mic is yours
16:15:32 <gibi> so the scatter gather refactor patch has been landed
16:15:37 <gibi> thanks for the reviews
16:15:40 <Uggla> \o/
16:16:01 <gibi> the next two patches are also approved, but the gate was / is pretty flaky in recent days so they have not landed yet
16:16:12 <gibi> (hybrid-plug and ceph issues)
16:16:47 <gibi> I addressed the comments from bauzas on the next patch in the series about a possible race condition. Please take a look
16:17:09 <gibi> On top of the series I started an effort to run unit tests without eventlet
16:17:29 <gibi> the effort starts here https://review.opendev.org/c/openstack/nova/+/953436/5
16:18:07 <gibi> a couple of patches later in that series we have a job that runs most of our unit tests without monkey patching and with the threading oslo.service and oslo.messaging backends
16:18:19 <gibi> there is a list of excluded tests I have to work through
16:18:57 <gibi> I'm tracking two types of problems: SQLAlchemy nonsense errors and simple test case hangs
16:19:15 <gibi> you can see them noted in the test exclude list
16:19:37 <gibi> I think we can make the new zuul job voting pretty soon
16:19:45 <gibi> (while keeping some excludes)
16:20:23 <gibi> that is it from me
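
(For context on the exclude mechanism: nova's actual job tracks the exclusions in a regex list rather than in code, but a per-test guard along the following lines, a minimal sketch with hypothetical names that assumes eventlet is importable, shows the same idea of skipping tests that only pass once eventlet has monkey patched the stdlib.)

    # Minimal sketch, not nova's actual helper: skip a test that is known to
    # hang when the stdlib has not been monkey patched by eventlet.
    import unittest

    import eventlet.patcher


    def skip_without_eventlet(reason):
        """Return a skip decorator when eventlet has not patched 'thread'."""
        if not eventlet.patcher.is_monkey_patched('thread'):
            return unittest.skip(reason)
        return lambda test_func: test_func


    class ExampleTestCase(unittest.TestCase):

        @skip_without_eventlet("hangs without eventlet; on the exclude list")
        def test_something_threaded(self):
            self.assertTrue(True)
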
16:20:51 <sean-k-mooney> o/ sorry, had this behind another window
16:21:48 <Uggla> gibi, do you still have any "urgent" reviews?
16:22:01 <bauzas> I'll continue to review the series for sure
16:22:25 <gibi> Uggla: the bottom of the series is as urgent as the eventlet removal :)
16:22:51 <gibi> bauzas: thanks
16:24:23 <gibi> I think we can move on :)
16:24:30 <Uggla> BTW we have Nova Eventlet removal sync tomorrow too.
16:25:11 <Uggla> 14:30 UTC
16:25:12 <gibi> yepp we have the call
16:25:14 <gibi> as planned
16:25:32 <Uggla> thanks gibi, next topic
16:25:39 <Uggla> #topic Open discussion
16:25:47 <Uggla> #topic (fwiesel) Evacuate Action & InstanceInvalidState
16:26:20 <Uggla> fwiesel please go ahead.
16:26:28 <fwiesel> So, we are currently looking into doing HA by using the evacuate action on instances, and we ran into various states where it was not possible.
16:27:00 <fwiesel> From our side, evacuation usually happens when the host is down, so it can happen at any time.
16:27:39 <fwiesel> And I was wondering if it wouldn't make sense to adjust the code so it can evacuate an instance in as many situations as possible.
16:28:47 <fwiesel> So, raising an InstanceInvalidState is merely a reflection of what has been implemented. What do you say?
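
(To make the above concrete: nova's compute API gates actions such as evacuate behind a state check and raises InstanceInvalidState for anything outside the allowed list. The snippet below is a simplified sketch of that pattern, not nova's exact decorator, and the allowed-state list is illustrative only.)

    # Simplified sketch of the guard pattern under discussion; nova's real
    # decorator (check_instance_state in nova.compute.api) takes more
    # arguments, and the exact states allowed for evacuate may differ.
    from nova.compute import vm_states
    from nova import exception


    def check_instance_state(allowed_vm_states):
        def decorator(func):
            def wrapper(self, context, instance, *args, **kwargs):
                if instance.vm_state not in allowed_vm_states:
                    # This is the InstanceInvalidState fwiesel keeps hitting.
                    raise exception.InstanceInvalidState(
                        attr='vm_state',
                        instance_uuid=instance.uuid,
                        state=instance.vm_state,
                        method=func.__name__)
                return func(self, context, instance, *args, **kwargs)
            return wrapper
        return decorator


    class API(object):

        @check_instance_state([vm_states.ACTIVE, vm_states.STOPPED,
                               vm_states.ERROR])
        def evacuate(self, context, instance, host, on_shared_storage=False):
            """Rebuild the instance on another host (body elided)."""
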
16:29:44 <gibi> do you have specific examples of states you want to evacuate from?
16:29:48 <cardoe> So I'll say internally we've got extra stuff which we don't see nova ever being able to handle.
16:29:59 <cardoe> Like even human involved state
16:30:37 <fwiesel> Let me think. For starters: Verify-Resize.
16:31:13 <fwiesel> But let's say we have a live migration and that host falls down. I would still like to do a failover
16:31:57 <gibi> verify-resize: so while you could evacuate from that state, you also need to implement a proper cleanup in the nova-compute coming back up on the source node of the original resize
16:32:00 <sean-k-mooney> so evacuate requires you, the admin,
16:32:11 <sean-k-mooney> to assert that the vm is not running on any host
16:32:31 <sean-k-mooney> so allowing it in verify-resize and other states is questionable
16:32:52 <sean-k-mooney> especially since in that state you're in the middle of an ongoing migration
16:32:55 <cardoe> Yeah there's a lot of squirrely cases there which need to be monitored for.
16:33:07 <sean-k-mooney> and the flavor that is in use is kind of ill defined
16:33:47 <sean-k-mooney> it's technically the new one in a real resize, but it has not been confirmed, so you would want to evacuate to the old one; but the numa topology and other things are all in a weird hybrid state
16:34:28 <sean-k-mooney> what you likely should be able to do is revert the resize in verify-resize
16:34:29 <gibi> so in general it would be nice to support these, but in practice it seems like a bunch of cans of worms
16:34:37 <sean-k-mooney> even if the dest host is down
16:34:41 <sean-k-mooney> then evacuate it
16:34:43 <sean-k-mooney> if needed
16:35:13 <cardoe> We've got a whole thing out of band that's specific to the hypervisor in use for these items.
16:35:31 <gibi> yeah, allowing revert-resize when the source host is down is one way, even though a weird one. Who wants to revert a resize that will lead to the VM running nowhere :)
16:35:33 <fwiesel> But technically, an evacuate is practically a rebuild. So, I can start from "scratch" (nova wise)
16:35:54 <sean-k-mooney> gibi: well, it would revert the resize when the dest is down and confirm the resize when the source is down
16:35:56 <fwiesel> I just have to build the VM up somewhere fitting the spec
16:36:05 <gibi> yeah starting is easy, cleaning up the remnants is messy
16:36:12 <sean-k-mooney> fwiesel: yes and no
16:36:13 <cardoe> yeah but you need to make sure that gets fully cleaned up.
16:36:20 <fwiesel> Of course
16:36:25 <sean-k-mooney> we have some api contract about preserving storage in some cases
16:36:40 <sean-k-mooney> plus there is the neutron/cinder state cleanup that we need to do
16:36:48 <fwiesel> Exactly
16:36:48 <gibi> sean-k-mooney: revert-resize: ahh right
16:37:07 <fwiesel> Right now, I have the choice to implement all that outside of nova. Or within nova
16:37:25 <sean-k-mooney> to do this in nova would need a very detailed spec
16:37:41 <fwiesel> Well, I would not tackle all cases in one go.
16:37:42 <sean-k-mooney> and you would have to consider things like cross-cell resize
16:38:01 <fwiesel> And maybe then not resize :)
16:38:15 <cardoe> its a worthy goal
16:38:18 <sean-k-mooney> right, but even if you take an incremental approach you need to have each incremental change be valid
16:38:55 <sean-k-mooney> i think for resize it would be simpler to allow the migration to complete or revert based on which end is alive
16:38:57 <cardoe> I'll say when I did this with xen it was much easier because xapi had an object version field so I used that as a monotonic number which also then was stored in the nova DB
16:39:03 <sean-k-mooney> and if they are both dead, well, you're kind of out of luck
16:39:10 <fwiesel> Of course. I was more fearing that it will be difficult to test in the CI and adds complexity, which might be a reason you might not want it at all.
16:39:26 <cardoe> So each time the instance got updated the monotonic number increased and that got pushed down into xapi
16:39:28 <gibi> fwiesel: start with a simpler state first, write up a spec on how you would make sure we end up in a consistent state after a successful evac and a recovered compute (or a failed evac and a recovered compute, or a failed evac and a retried evac and a recovered compute, ...)
16:39:31 <cardoe> But libvirt doesn't have that
16:39:47 <cardoe> So when we'd go to clean up, anything that wasn't at the current version could get nuked.
16:39:59 <sean-k-mooney> fwiesel: so testing that in tempest would be hard, but it would be more doable in the functional suite
16:40:03 <cardoe> But there's all kinds of dangling references to storage and what not that really are a pain to fix
16:40:36 <gibi> yepp use functional test for sure
16:40:51 <fwiesel> Okay, I'll take that with me. Thanks.
16:40:52 <gibi> we have a limited evac test somewhere after tempest
16:40:57 <sean-k-mooney> cardoe: so we don't need a monotonic object version in general because we associate the instance with the compute service that is currently managing it
16:40:58 <gibi> in a playbook
16:41:11 <sean-k-mooney> in the post test hook in nova-next
16:41:16 <cardoe> For us we implemented this more as monitoring that an unexpected object was on a host.
16:41:17 <sean-k-mooney> and i think the ceph job?
16:42:03 <sean-k-mooney> cardoe: ya, so that kind of approach is not very compatible with the nova design, although we do have some per-host periodics that are intended to clean up these types of issues
16:42:08 <cardoe> sean-k-mooney: when you perform a live migrate and the dest goes down at the very end during the resume
16:42:09 <sean-k-mooney> especially on compute agent startup
16:42:53 <sean-k-mooney> cardoe: if the vm dies before resume, libvirt will abort the migration
16:43:01 <sean-k-mooney> if we got to post live migrate
16:43:21 <sean-k-mooney> it's too late to abort, but we will update the instance host to point to the dest
16:43:28 <gibi> this is the evac testing in devstack https://github.com/openstack/nova/blob/master/roles/run-evacuate-hook/tasks/main.yaml
16:43:58 <Uggla> guys is it ok to move to the second point ?
16:44:05 <gibi> OK for me
16:44:08 <cardoe> I'm just saying it wasn't a simple problem when I was working in the VM world, because hardware dies at the worst possible times and nova can be in many different code paths when that happens.
16:44:10 <sean-k-mooney> yep lets move on for now.
16:44:24 <Uggla> #topic (jonnyb) Blueprint for new compute created weigher (https://blueprints.launchpad.net/nova/+spec/node-uptime-weigher)
16:44:47 <Uggla> jonnyb wants to discuss this weigher
16:45:01 <Uggla> jonnyb please go ahead.
16:45:13 <jonnyb> so I propose a scheduler weigher based on creation date of the compute service
16:45:41 <jonnyb> the initial idea was to use uptime, but this is too dependent on the hypervisor and its driver
16:46:11 <jonnyb> for us the creation date is good enough, and we use it since we migrate and deploy a lot of servers
16:46:13 <gibi> do you have the service creation time as part of the HostState the weigher is running on?
16:46:29 <sean-k-mooney> i believe we do
16:46:36 <dansmith> that surprises me
16:46:53 <sean-k-mooney> we have it from the compute node object, i believe
16:47:01 <dansmith> sure, but in hoststate?
16:47:38 <sean-k-mooney> so this is the poc https://review.opendev.org/c/openstack/nova/+/947503/9/nova/scheduler/weights/compute_created.py
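
(For anyone not familiar with the weigher API, the PoC boils down to roughly the following; option handling and weight normalization are omitted and the names are illustrative rather than copied from the patch.)

    # Rough sketch of a creation-date weigher, not the PoC verbatim.
    from nova.scheduler import weights


    class ComputeCreatedWeigher(weights.BaseHostWeigher):

        def weight_multiplier(self, host_state):
            # In the real patch this comes from a config option; a non-zero
            # default means the weigher is effectively enabled by default.
            return 1.0

        def _weigh_object(self, host_state, weight_properties):
            # host_state.service is built from the Service object; whether
            # created_at is reliably populated there without triggering a
            # lazy load is exactly what is questioned below.
            created_at = host_state.service['created_at']
            # Newer services get a larger raw weight here (tz handling is
            # elided); the multiplier's sign decides whether the scheduler
            # ends up preferring newer or older compute services.
            return created_at.timestamp()
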
16:48:27 <dansmith> we have an object, but I didn't think it was queried from the db and thus not complete, but maybe that's not right
16:49:08 <sean-k-mooney> so yes https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L176
16:49:26 <sean-k-mooney> you could be correct about it being a partial object
16:49:26 <gibi> based on the poc we have HostState.service.created_at; as created_at isn't expected to change, it is fair to assume it is up to date
16:50:34 <sean-k-mooney> so it looks like this is only populated via update(); it's not populated via __init__
16:51:02 <sean-k-mooney> so i'm not sure if it's there if you disable the track_instance_state notifications stuff
16:51:17 <dansmith> okay we are doing a scatter to gather nodes and hosts from cells
16:51:18 <sean-k-mooney> we would have to confirm that it's always present in both configurations
16:51:30 <dansmith> I was thinking we did an optimized construction from a list there
16:51:44 <dansmith> sean-k-mooney: yeah, good point
16:52:11 <sean-k-mooney> i think that config only affects the instance objects and how those are populated
16:52:16 <dansmith> because this could also be triggering massive lazy loads, which would be bad
16:52:19 <sean-k-mooney> but assuming we always have that
16:52:43 <sean-k-mooney> i think it would be ok to use it in the weigher, provided we are not lazy loading this
16:53:04 <dansmith> we need to make sure this is run in CI at least once so we can examine logs for lazy loads
16:56:32 <sean-k-mooney> ya, we can turn this on in one of the jobs
16:57:08 <sean-k-mooney> it won't really have much of an effect with only 1-2 nodes
16:57:10 <dansmith> we don't have to do it permanently, but just a DNM on top of this to get a run and make sure we don't see "lazy loading service.created_at" :)
16:58:10 <sean-k-mooney> we are wrapping the service in a readonly dict class that i have not really seen before. i would assume that should also block lazy loading?
16:58:24 <sean-k-mooney> going back on topic
16:58:34 <sean-k-mooney> are we ok with approving this as a specless blueprint
16:58:38 <sean-k-mooney> and continuing to review
16:59:16 <dansmith> idk
16:59:53 <Uggla> +1 for me, it looks like an interesting weigher.
17:00:20 <Uggla> gibi, ok for you ?
17:00:24 <gibi> I'm fine
17:00:46 <sean-k-mooney> i don't necessarily have an objection, just note that if there are lazy loading issues they will need to be resolved before we can merge it
17:00:56 <dansmith> same
17:00:58 <sean-k-mooney> so it might not be ready before FF
17:01:17 <sean-k-mooney> but i'm ok to move on to the review and take a look at ci logs etc.
17:01:30 <Uggla> jonnyb is that ok for you ?
17:01:35 <jonnyb> that's fine with me, i wasn't aware that lazy loading could be an issue. but good point
17:02:04 <sean-k-mooney> you defaulted to 1.0, so in theory it's enabled by default
17:02:15 <sean-k-mooney> so we should be able to look at any of the current ci logs
17:02:28 <Uggla> ok last topic:
17:02:28 <Uggla> #topic (gibi) Workaround for cpython3.13 GC bug needs review https://review.opendev.org/c/openstack/nova/+/952966
17:02:36 <dansmith> oh, definitely seems like it should be disabled by default
17:02:53 <dansmith> that's merged?
17:03:05 <sean-k-mooney> well, we can definitely change that in the review, and ya, the gc thing is merged
17:03:26 <gibi> yepp it is merged
17:03:29 <Uggla> dansmith yes it seems
17:03:48 <gibi> I proposed the backport to Epoxy as I know debian wants it there
17:04:23 <gibi> (the cpython bug fixing effort is progressing in the background)
17:05:14 <gibi> so other than the backport there is nothing else left here
17:05:18 <gibi> we can move on
17:05:41 <sean-k-mooney> jonnyb: so without digging deeply https://zuul.opendev.org/t/openstack/build/8a9b0a0404034a79a95621b18e16534b/log/controller/logs/screen-n-sch.txt#1017 your weigher was enabled and i don't see lazy loading, but we need to look at that properly after the meeting
17:05:44 <Uggla> ok, cardoe wants to discuss a patch
17:06:03 <Uggla> cardoe please go ahead, and then we will close.
17:06:07 <cardoe> https://review.opendev.org/c/openstack/nova-specs/+/471815
17:06:36 <cardoe> It is a proposed spec change to add trunk ports with their vlan tags to the network_data.json
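
(Purely as an illustration of what "trunk ports with their vlan tags in network_data.json" could mean; the trunk key below is invented for this sketch and the actual schema is whatever the spec under review settles on.)

    # Hypothetical shape only, not the schema from the spec: the idea is that
    # the guest can read the trunk's subports and their VLAN IDs from
    # network_data.json and configure its own tagged interfaces.
    network_data = {
        "links": [
            {
                "id": "tap-parent-port",
                "type": "vif",
                "ethernet_mac_address": "fa:16:3e:00:00:01",
                # invented key carrying the trunk information
                "trunk": {
                    "subports": [
                        {"segmentation_type": "vlan", "segmentation_id": 100},
                        {"segmentation_type": "vlan", "segmentation_id": 200},
                    ],
                },
            },
        ],
    }
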
17:06:47 <sean-k-mooney> ya, i'm +2 on that; gibi is +1 because they won't have review bandwidth, i believe
17:07:04 <gibi> I'm +2-1=+1, signalling limited review bandwidth, yes
17:07:16 <cardoe> It's something written originally by someone not involved in the effort currently.
17:07:23 <cardoe> But neutron supports defining ports like that.
17:07:55 <cardoe> There have been a few patches floating around from different operators that are using that functionality as a downstream patch.
17:07:56 <dansmith> sean-k-mooney: jonnyb: agree, I see the weigher being weighed and no lazy loads in that log, so seems fine
17:08:06 <gibi> having +1 from rubasov (Bence) is a good sign, he was involved in the original work of trunk
17:08:15 <cardoe> I've been working to consolidate the downstream operators around a single implementation and wanting to get it upstreamed.
17:08:20 <sean-k-mooney> yep
17:08:48 <cardoe> We've got tempest tests written and an implementation.
17:08:52 <sean-k-mooney> there is code for nova and tempest tests too
17:09:04 <sean-k-mooney> ya https://review.opendev.org/q/topic:%22bp/expose-vlan-trunking%22
17:09:20 <sean-k-mooney> this is useful both for vms and for ironic
17:09:24 <cardoe> This is just part of my TC work to try and get downstream operators to push back to the community.
17:10:29 <sean-k-mooney> so the request is basically for another core to sponsor/review this before spec freeze on thursday
17:10:57 <sean-k-mooney> and then to review the actual patches before m3
17:11:54 <sean-k-mooney> the one thing i will say is i think we only have trunks in one job
17:12:12 <sean-k-mooney> so we will need to confirm that this is properly tested in our ci and that that job passes
17:13:07 <sean-k-mooney> the logs have rotated, and the one nova patch that is required https://review.opendev.org/c/openstack/nova/+/941227 is in merge conflict if i try a rebase
17:13:17 <cardoe> I'll work to get folks to update things based on feedback.
17:13:25 <sean-k-mooney> oh, it's just missing a Signed-off-by
17:14:50 <Uggla> cardoe something else you want to ask ?
17:15:16 <cardoe> No, my only question is how I can get this on the roadmap
17:18:12 <Uggla> We are overtime, so we should close.
17:18:19 <Uggla> Thanks all
17:18:25 <Uggla> #endmeeting