16:01:18 <bauzas> #startmeeting nova
16:01:18 <opendevmeet> Meeting started Tue Jan  7 16:01:18 2025 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:18 <opendevmeet> The meeting name has been set to 'nova'
16:01:24 <bauzas> hi folks
16:01:29 <bauzas> who's around ?
16:01:34 <sean-k-mooney> o/
16:01:40 <gibi> /o (partially)
16:01:56 <elodilles> o/
16:02:17 <bauzas> (I'm partially here too but I need to run the meeting :) )
16:02:32 * bauzas likes doing three things at the same time
16:03:14 <bauzas> okay let's start and hopefully it will be quick
16:03:57 <bauzas> #topic Bugs (stuck/critical)
16:04:02 <bauzas> #info No Critical bug
16:04:09 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:18 <bauzas> any bugs people wanna raise ?
16:04:21 <Uggla> o/
16:04:57 <fwiesel> o/
16:05:03 <bauzas> looks not, moving on
16:05:09 <bauzas> #topic Gate status
16:05:15 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:05:21 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:05:30 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:05:52 <bauzas> nova-emulation is in the weeds
16:06:09 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:15 <bauzas> #info Please try to provide meaningful comment when you recheck
16:07:51 <bauzas> #topic Release Planning
16:07:56 <bauzas> a few things to mention here
16:08:04 <bauzas> #link https://releases.openstack.org/epoxy/schedule.html
16:08:10 <bauzas> #info Nova deadlines are set in the above schedule
16:08:15 <bauzas> #info Implementation review day is planned for tomorrow
16:08:32 <bauzas> I'll send an email about it ^
16:08:58 <bauzas> #action bauzas to notify about review day thru email
16:09:13 <bauzas> the other, also important :
16:09:16 <bauzas> #info Specs approval freeze planned for Thursday EOB
16:09:22 <bauzas> you are warned
16:09:42 <bauzas> I'm mostly burned today by spec reviews and I'll continue tomorrow
16:09:58 <bauzas> anything worth mentioning now ?
16:10:26 <s3rj1k> late hey to all, connection issues
16:10:44 <bauzas> moving on
16:11:53 <bauzas> #topic Review priorities
16:12:01 <bauzas> #link https://etherpad.opendev.org/p/nova-2025.1-status
16:12:17 <bauzas> nothing to mention, continuing
16:12:23 <bauzas> #topic Stable Branches
16:12:32 <bauzas> elodilles: happy new year
16:12:36 <elodilles> :)
16:12:45 <elodilles> happy new year too o/
16:12:46 <elodilles> :)
16:12:57 <elodilles> speaking of that
16:13:19 <elodilles> i see not much activity on stable branches in the past weeks
16:13:56 <elodilles> but i'm not aware of any stable gate issue
16:14:11 <elodilles> (maybe because of the above reason o:))
16:14:27 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:14:49 <elodilles> please if you see any issue, add there ^^^
16:15:09 <elodilles> that's all about stable branches from me
16:15:29 <bauzas> cool
16:15:47 <bauzas> #topic vmwareapi 3rd-party CI efforts Highlights
16:16:04 <bauzas> fwiesel: heya, happy new year too
16:16:05 <fwiesel> Hi, happy new year. No updates from my side.
16:16:08 <bauzas> ++
16:16:18 <bauzas> #topic Open discussion
16:16:38 <bauzas> one item in the agenda, quite on time given thursday's deadline
16:16:46 <bauzas> (bauzas) Specless approval for https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher ?
16:17:06 <bauzas> tl;dr: I'm hereby asking for an exception from having to write a spec
16:17:22 <bauzas> this is just a weigher, we already have everything we need
16:17:41 <bauzas> in the past, some filters and weighers required a spec, some others not
16:18:02 <bauzas> so I'm leaning towards asking you to grant this blueprint as it is
16:18:17 <gibi> bauzas: fine by me to have it specless
16:22:09 <bauzas> tbc, we have Instance objects in the HostState
16:23:27 <sean-k-mooney> we do
16:23:39 <sean-k-mooney> for affinity/anti-affinity filters
16:23:48 <sean-k-mooney> and as a result the weighers have them too for the same reason
16:24:19 <bauzas> the weigher design will be 'look at each of the instances on the host, get their image metadata from the instance and compare it with the request'
16:24:20 <sean-k-mooney> and the instance objects have the image metadata available
16:24:33 <sean-k-mooney> the only concern is whether that is lazy loaded today or not
16:24:42 <bauzas> the instances list ?
16:24:48 <sean-k-mooney> not the instance list
16:25:10 <sean-k-mooney> the image metadata, which is constructed from the cached copy in the instance_system_metadata table
16:25:22 <sean-k-mooney> i don't know if we load that in the host manager
16:25:32 <sean-k-mooney> we probably do, just wanted to call that out
16:25:39 <bauzas> ah good point
16:26:12 <bauzas> we can't load that later
16:26:20 <bauzas> as we're on the scheduler
16:26:33 <bauzas> or we need to target the cell db
16:26:34 <sean-k-mooney> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L70-L71
16:26:46 <sean-k-mooney> we are good, the instance system metadata is in the default fields
16:26:58 <sean-k-mooney> so we have that already
16:27:29 <bauzas> excellent
16:27:34 <bauzas> thanks for checking
16:27:46 <bauzas> are we then ok with the design ?
16:28:05 <sean-k-mooney> i think so. i captured most of my feedback in the blueprint already
16:28:29 <sean-k-mooney> so i'm more or less ok with proceeding to the implementation review based on that design
16:29:35 <bauzas> ++
16:29:58 <bauzas> okay, then let's approve it as specless
16:30:14 <bauzas> #approved https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
16:30:20 <bauzas> doh
16:30:35 <bauzas> #agreed https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
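Note: the design agreed above (look at each instance on the host, read its image metadata, compare it with the request) might be sketched roughly as below. This is a standalone illustration under stated assumptions, not the blueprint's implementation: `Instance`, `HostState`, `weigh_host` and `rank_hosts` are hypothetical stand-ins and do not use nova's real classes or weigher API. The scoring rule (one point per matching property per instance) is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical stand-in: image properties cached with the instance.
    image_props: dict


@dataclass
class HostState:
    # Hypothetical stand-in for the scheduler's per-host view.
    host: str
    instances: dict = field(default_factory=dict)  # uuid -> Instance


def weigh_host(host_state, requested_props):
    """Count how many requested image properties match instances on the host."""
    score = 0
    for inst in host_state.instances.values():
        for key, value in requested_props.items():
            if inst.image_props.get(key) == value:
                score += 1
    return score


def rank_hosts(host_states, requested_props):
    """Return hosts ordered from best to worst match (stable for ties)."""
    return sorted(
        host_states,
        key=lambda hs: weigh_host(hs, requested_props),
        reverse=True,
    )
```

In this sketch a host already running instances booted from similar images ranks higher; flipping the sort order would spread such instances instead of packing them.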
16:30:39 <bauzas> that was it for me
16:30:44 <bauzas> anything else to mention ?
16:30:51 <sean-k-mooney> if there is time/energy i have one topic
16:31:09 <bauzas> I don't have energy but we have time
16:31:36 <sean-k-mooney> https://blueprints.launchpad.net/nova/+spec/host-discovery-distributed-lock
16:31:57 <sean-k-mooney> we talked about this at the ptg
16:32:06 <sean-k-mooney> and there is a related spec https://review.opendev.org/c/openstack/nova-specs/+/936389
16:32:27 <sean-k-mooney> tldr they would like to be able to enable the discover hosts periodic on multiple schedulers at a time
16:32:38 <sean-k-mooney> so i don't actually like the proposal as written
16:32:49 <sean-k-mooney> but i did a short poc of one of the alternatives
16:32:58 <sean-k-mooney> https://review.opendev.org/c/openstack/nova/+/938523
16:33:15 <sean-k-mooney> i do think this could be a reasonable minor improvement
16:33:39 <sean-k-mooney> the change is basically just this https://review.opendev.org/c/openstack/nova/+/938523/2/nova/scheduler/manager.py#113
16:34:16 <sean-k-mooney> what i wanted to know is if we took this super simple approach of doing leader election in the periodic and returning if not the leader
16:34:26 <sean-k-mooney> would this be a spec/blueprint or bugfix?
16:35:01 <bauzas> hmmmm
16:35:06 <sean-k-mooney> i'm asking because of the time constraint for the first two options
16:35:12 <gibi> it has no API impact so I lean towards specless bp
16:35:36 <s3rj1k> specless from me if that counts :)
16:36:08 <sean-k-mooney> s3rj1k: input is always welcome :)
16:36:25 <gibi> sean-k-mooney: but you can formulate it as a bug as nova allows enabling the periodic in multiple schedulers today
16:36:27 <s3rj1k> (plus it is somewhat covered in the alternatives section of the spec mentioned above)
16:36:55 <gibi> sean-k-mooney: without a safety net
16:37:15 <sean-k-mooney> ya so my perspective was: if we are taking the heavyweight approach of adding a distributed lock manager, that needs a spec because it's complex and a large change
16:37:29 <bauzas> can't we just add a config option ?
16:37:39 <sean-k-mooney> if we do a very very small change to just do best-effort leader election it might be a bug
16:37:45 <bauzas> I understand the reasoning, magically nova will elect a new leader
16:38:07 <sean-k-mooney> bauzas: we could also add a config option yes
16:38:08 <bauzas> but I'm afraid this periodic could silently run somewhere else without the op noticing it
16:38:14 <dansmith> omg, DLM?
16:38:16 * dansmith reads up
16:38:31 <sean-k-mooney> bauzas: so you have to opt into the periodic
16:38:33 <bauzas> and you know, brainsplits and all the likes happen
16:38:40 <sean-k-mooney> by setting a config option to specify the interval today
16:38:51 <bauzas> that's my point
16:39:02 <sean-k-mooney> so this would just be a change to the behavior when you opt in
16:39:04 <bauzas> -1 disables the periodic IIRC
16:39:14 <sean-k-mooney> yep and i dont want to change that
16:39:24 <bauzas> but I wouldn't trust python for electing my leader
16:39:38 <bauzas> particularly the sorted command
16:39:42 <sean-k-mooney> so today the db enforces that bad things don't happen
16:39:53 <sean-k-mooney> you just get exceptions in the log
16:40:14 <sean-k-mooney> if multiple race that is
16:40:22 <sean-k-mooney> which is annoying for operators
16:40:44 <dansmith> if you want to automate host discovery active-active, you can do that yourself with nova-manage and your own DLM right?
16:40:54 <sean-k-mooney> yes
16:41:04 <bauzas> yes, that's why I don't like that approach
16:41:12 <sean-k-mooney> that's a pain in k8s however, which s3rj1k cares about
16:41:18 <bauzas> you're considering that DLM is just not needed because sorted exists
16:41:25 <gibi> definitely having a full DLM in nova just for this is way overkill
16:41:39 <sean-k-mooney> yep which is why i did not like the dlm/tooz approach
16:41:56 <bauzas> the SG API also isn't good at doing live healthchecks
16:42:19 <bauzas> there could be a reasonable amount in time where nova wouldn't see a node done
16:42:26 <bauzas> gone*
16:42:33 <sean-k-mooney> which is fine
16:42:37 <gibi> today if you run the periodic in multiple schedulers you burn power unnecessarily but you don't break nova DB just get exceptions.
16:42:48 <sean-k-mooney> we can miss running the periodic for a protracted period of time without bad impacts
16:43:56 <bauzas> we could just let CONF.scheduler.discover_hosts_in_cells_interval be mutable and leave DLMs to manage nova-scheduler A/Ps instead of us
16:44:04 <sean-k-mooney> nope
16:44:09 <dansmith> There are also things we could do in nova that don't require a DLM if we really care
16:44:16 <sean-k-mooney> that does not work for k8s deployments which is the motivating usecase
16:44:31 <dansmith> like the periodic could say "am I the oldest-started nova-scheduler service that is currently up? If so, then run the discovery, if not, then don't"
16:44:34 <sean-k-mooney> dansmith: right im proposing not using a DLM
16:44:49 <dansmith> then only one of them would run it until you shut down the oldest and then the next one would do it.. sort of lazy-consensus election
16:44:52 <sean-k-mooney> dansmith: that's basically what my patch does
16:44:53 <gibi> dansmith: exactly, that is very close to what sean-k-mooney proposes
16:44:59 <bauzas> dansmith: I don't like to trust the service group API for electing my leader
16:45:02 <dansmith> oh, then.. yeah :D
16:45:22 <dansmith> bauzas: it's not really trust, it's just optimization
16:45:30 <bauzas> sorted was one option, the oldest is another alternative
16:45:38 <gibi> bauzas: don't think about it as leader election, it is more like limiting the number of discover host runs based on an input
16:45:51 <dansmith> I see sean-k-mooney's patch now, sorry I'm catching up
16:45:52 <sean-k-mooney> sorted is just for determinium
16:45:58 <sean-k-mooney> dansmith: no worries
16:46:15 <dansmith> determinism .. yeah, let's do something like that if we care, for sure
16:46:20 <gibi> bauzas: it could be a last run timestamp if we don't like age
16:46:55 <gibi> bauzas: but as we have age, we have a good value for picking the single scheduler with the min / max of that value to be the only one doing the work
16:46:57 <bauzas> well,
16:47:24 <sean-k-mooney> so for today what i really want to know is spec, bug or specless blueprint so i know if we have till thursday or m3 to finalise the design and implementation
16:47:47 <bauzas> there are possibilities where a nova-scheduler could see itself being the oldest while another one, running 2 secs later could also see it as the oldest
16:47:59 <sean-k-mooney> that's why i'm not using time
16:48:06 <sean-k-mooney> i'm sorting on the value of host
16:48:10 <bauzas> I know
16:48:18 <bauzas> but that's still the same
16:48:23 <dansmith> bauzas: sort by id, filter by up, that should be stable
16:48:33 <bauzas> some host could see itself as up while some other could too, 1 sec later
16:48:55 <dansmith> if it does, then it's flapping from the view of the operator, which I think is not likely to be unnoticed
16:49:23 <sean-k-mooney> also people run with this in production without this protection or any kind of arbitration today.
16:49:31 <bauzas> dansmith: if you are okay with operators not noticing which nova-scheduler runs where, then ok
16:49:35 <sean-k-mooney> i.e. they just ignore the logs when there is a collision
16:49:47 <dansmith> bauzas: which nova scheduler runs...host discovery?
16:49:53 <bauzas> yes
16:50:10 <sean-k-mooney> why would they care?
16:50:15 <dansmith> isn't that the point of this? instead of run-everywhere it's run-one-place, automatically managed?
16:50:28 <dansmith> right now, they can all run it in parallel all the time, if you care, but that's expensive
16:50:46 <gibi> ^^ yepp
16:50:51 <dansmith> I thought the point was to make it less expensive and automatically decide that (hopefully) only one does it each time.. seems fine to me
16:50:52 <bauzas> OK, then let's go for the proposal
16:50:57 <dansmith> worst case, two do it, no problem
16:51:10 <sean-k-mooney> yep ^
16:51:12 <bauzas> well, you're right
16:51:22 <bauzas> two running at same time don't split brains
16:51:38 <bauzas> because of the periodic itself
16:51:41 <sean-k-mooney> they don't even cause an error if you have not added any hosts
16:51:52 <dansmith> yeah
16:51:58 <bauzas> okay, then to answer sean-k-mooney's question, I'm fine with it being specless
16:52:08 <bauzas> I actually would prefer it to be specless please
16:52:32 <bauzas> as I wouldn't want to step into DLM muds and leader elections
16:52:40 <sean-k-mooney> ok i think i'll file a new blueprint for it and update the description
16:52:54 <sean-k-mooney> if we are ok with that, are we ok to approve that async
16:52:58 <bauzas> sure
16:53:08 <sean-k-mooney> i'll do it now and ping outside of the meeting
16:53:36 <bauzas> #agreed https://review.opendev.org/c/openstack/nova/+/938523 can be filed as a specless blueprint
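Note: the "lazy-consensus" idea discussed above (filter services by up, sort deterministically, only the first one runs discovery) can be sketched as follows. This is a minimal standalone illustration, not the code in the patch under review: `Service` and `should_run_discovery` are hypothetical names, and sorting on the host name follows what sean-k-mooney described in the meeting rather than any confirmed implementation.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical stand-in for a nova-scheduler service record.
    host: str
    up: bool


def should_run_discovery(my_host, services):
    """Return True only for the first 'up' scheduler in sorted host order.

    Best effort only: two schedulers with a different view of who is up
    may both return True, which just means discovery runs twice.
    """
    candidates = sorted(s.host for s in services if s.up)
    return bool(candidates) and candidates[0] == my_host
```

If two schedulers briefly disagree on who is up, both may run the periodic, which per the discussion matches today's behaviour when it is enabled everywhere: extra work and possibly log noise, not a split brain.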
16:53:54 <s3rj1k> sean-k-mooney: thanks for spending time on this
16:53:55 <bauzas> cool thanks
16:54:30 <sean-k-mooney> s3rj1k: no worries, i meant to do it before going on pto, i just didn't finish hacking on it until i got back on monday
16:54:54 <sean-k-mooney> ok thats if from me
16:55:05 <sean-k-mooney> s/if/it/
16:55:52 <bauzas> cool
16:55:55 <bauzas> thanks all then
16:55:58 <bauzas> #endmeeting