16:01:18 <bauzas> #startmeeting nova
16:01:18 <opendevmeet> Meeting started Tue Jan 7 16:01:18 2025 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:18 <opendevmeet> The meeting name has been set to 'nova'
16:01:24 <bauzas> hi folks
16:01:29 <bauzas> who's around ?
16:01:34 <sean-k-mooney> o/
16:01:40 <gibi> /o (partially)
16:01:56 <elodilles> o/
16:02:17 <bauzas> (I'm partially here too but I need to run the meeting :) )
16:02:32 * bauzas likes doing three things at same time
16:03:14 <bauzas> okay let's start and hopefully it will be quick
16:03:57 <bauzas> #topic Bugs (stuck/critical)
16:04:02 <bauzas> #info No Critical bug
16:04:09 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:18 <bauzas> any bugs people wanna raise ?
16:04:21 <Uggla> o/
16:04:57 <fwiesel> o/
16:05:03 <bauzas> looks not, moving on
16:05:09 <bauzas> #topic Gate status
16:05:15 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:05:21 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:05:30 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:05:52 <bauzas> nova-emulation is in the weeds
16:06:09 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:15 <bauzas> #info Please try to provide meaningful comment when you recheck
16:07:51 <bauzas> #topic Release Planning
16:07:56 <bauzas> a few things to mention here
16:08:04 <bauzas> #link https://releases.openstack.org/epoxy/schedule.html
16:08:10 <bauzas> #info Nova deadlines are set in the above schedule
16:08:15 <bauzas> #info Implementation review day is planned tomorrow
16:08:32 <bauzas> I'll send an email about it ^
16:08:58 <bauzas> #action bauzas to notify about review day thru email
16:09:13 <bauzas> the other, also important :
16:09:16 <bauzas> #info Specs approval freeze planned for Thursday EOB
16:09:22 <bauzas> you are warned
16:09:42 <bauzas> I'm mostly burned today by spec reviews and I'll continue tomorrow
16:09:58 <bauzas> anything worth mentioning now ?
16:10:26 <s3rj1k> late hey to all, connection issues
16:10:44 <bauzas> moving on
16:11:53 <bauzas> #topic Review priorities
16:12:01 <bauzas> #link https://etherpad.opendev.org/p/nova-2025.1-status
16:12:17 <bauzas> nothing to mention, continuing
16:12:23 <bauzas> #topic Stable Branches
16:12:32 <bauzas> elodilles: happy new year
16:12:36 <elodilles> :)
16:12:45 <elodilles> happy new year too o/
16:12:46 <elodilles> :)
16:12:57 <elodilles> speaking of that
16:13:19 <elodilles> i see not much activity on stable branches in the past weeks
16:13:56 <elodilles> but i'm not aware of any stable gate issue
16:14:11 <elodilles> (maybe because of the above reason o:))
16:14:27 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:14:49 <elodilles> please if you see any issue, add there ^^^
16:15:09 <elodilles> that's all about stable branches from me
16:15:29 <bauzas> cool
16:15:47 <bauzas> #topic vmwareapi 3rd-party CI efforts Highlights
16:16:04 <bauzas> fwiesel: heya, happy new year too
16:16:05 <fwiesel> Hi, happy new year. No updates from my side.
16:16:08 <bauzas> ++
16:16:18 <bauzas> #topic Open discussion
16:16:38 <bauzas> one item in the agenda, quite on time given thursday's deadline
16:16:46 <bauzas> (bauzas) Specless approval for https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher ?
16:17:06 <bauzas> tl;dr: I'm hereby asking for an exception to writing a spec
16:17:22 <bauzas> this is just a weigher, we have everything we already need
16:17:41 <bauzas> in the past, some filters and weighers required a spec, some others not
16:18:02 <bauzas> so I'm leaning towards asking you to grant this blueprint as it is
16:18:17 <gibi> bauzas: fine by me to have it specless
16:22:09 <bauzas> tbc, we have Instance objects in the HostState
16:23:27 <sean-k-mooney> we do
16:23:39 <sean-k-mooney> for affinity/anti-affinity filters
16:23:48 <sean-k-mooney> and as a result the weighers have them too for the same reason
16:24:19 <bauzas> the weigher design will be 'look at each of the instances from the host, see their imagemeta from the instance and compare with the request'
16:24:20 <sean-k-mooney> and the instance objects have the image metadata available
16:24:33 <sean-k-mooney> the only concern is: is that lazy loaded today or not
16:24:42 <bauzas> the instances list ?
16:24:48 <sean-k-mooney> not the instance list
16:25:10 <sean-k-mooney> the image metadata which is constructed from the cached copy in the instance_system_metadata table
16:25:22 <sean-k-mooney> i dont know if we load that in the host manager
16:25:32 <sean-k-mooney> we probably do, just wanted to call that out
16:25:39 <bauzas> ah good point
16:26:12 <bauzas> we can't load that later
16:26:20 <bauzas> as we're on the scheduler
16:26:33 <bauzas> or we need to target the cell db
16:26:34 <sean-k-mooney> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L70-L71
16:26:46 <sean-k-mooney> we are good, the instance system metadata is in the default fields
16:26:58 <sean-k-mooney> so we have that already
16:27:29 <bauzas> excellent
16:27:34 <bauzas> thanks for checking
16:27:46 <bauzas> are we then ok with the design ?
16:28:05 <sean-k-mooney> i think so. i captured most of my feedback in the blueprint already
16:28:29 <sean-k-mooney> so im more or less ok with proceeding to the implementation review based on that design
16:29:35 <bauzas> ++
16:29:58 <bauzas> okay, then let's approve it as specless
16:30:14 <bauzas> #approved https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
16:30:20 <bauzas> doh
16:30:35 <bauzas> #agreed https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
16:30:39 <bauzas> that was it for me
16:30:44 <bauzas> anything else to mention ?
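[editor's sketch] The weigher design quoted at 16:24:19 (look at each instance on the host, read its image metadata, compare with the request) might look roughly like the following. This is a standalone illustration, not nova code: the Instance and HostState classes here are simplified stand-ins, weigh_host and rank_hosts are hypothetical names, and the scoring rule (one point per matching image property per instance) is an assumption that the blueprint may define differently.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Stand-in for the image metadata cached alongside an instance.
    image_props: dict = field(default_factory=dict)


@dataclass
class HostState:
    # Stand-in for the scheduler's per-host view, including its instances.
    host: str
    instances: list = field(default_factory=list)


def weigh_host(host_state, requested_props):
    """Score a host by how many image properties its instances share
    with the request (assumed rule: one point per matching key/value)."""
    score = 0
    for instance in host_state.instances:
        for key, value in requested_props.items():
            if instance.image_props.get(key) == value:
                score += 1
    return score


def rank_hosts(host_states, requested_props):
    # Highest score first; ties keep their input order because sorted() is stable.
    return sorted(
        host_states,
        key=lambda h: weigh_host(h, requested_props),
        reverse=True,
    )
```

Whether a higher score should attract or repel new instances would be a matter for a multiplier setting, which this sketch leaves out.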
16:30:51 <sean-k-mooney> if there is time/energy i have one topic
16:31:09 <bauzas> I don't have energy but we have time
16:31:36 <sean-k-mooney> https://blueprints.launchpad.net/nova/+spec/host-discovery-distributed-lock
16:31:57 <sean-k-mooney> we talked about this at the ptg
16:32:06 <sean-k-mooney> and there is a related spec https://review.opendev.org/c/openstack/nova-specs/+/936389
16:32:27 <sean-k-mooney> tldr they would like to be able to enable the discover hosts periodic on multiple schedulers at a time
16:32:38 <sean-k-mooney> so i dont actually like the proposal as written
16:32:49 <sean-k-mooney> but i did a short poc of one of the alternatives
16:32:58 <sean-k-mooney> https://review.opendev.org/c/openstack/nova/+/938523
16:33:15 <sean-k-mooney> i do think this could be a reasonable minor improvement
16:33:39 <sean-k-mooney> the change is basically just this https://review.opendev.org/c/openstack/nova/+/938523/2/nova/scheduler/manager.py#113
16:34:16 <sean-k-mooney> what i wanted to know is if we took this super simple approach of doing leader election in the periodic and returning if not the leader
16:34:26 <sean-k-mooney> would this be a spec/blueprint or bugfix?
16:35:01 <bauzas> hmmmm
16:35:06 <sean-k-mooney> im asking because of the time constraint for the first two options
16:35:12 <gibi> it has no API impact so I lean towards specless bp
16:35:36 <s3rj1k> specless from me if that counts :)
16:36:08 <sean-k-mooney> s3rj1k: input is always welcome :)
16:36:25 <gibi> sean-k-mooney: but you can formulate it as a bug as nova allows enabling the periodic in multiple schedulers today
16:36:27 <s3rj1k> (plus it is somewhat covered in the alternatives section of the spec mentioned above)
16:36:55 <gibi> sean-k-mooney: without a safety net
16:37:15 <sean-k-mooney> ya so my perspective was. if we are taking the heavyweight approach of adding a distributed lock manager that needed a spec because its complex and a large change
16:37:29 <bauzas> can't we just add a config option ?
16:37:39 <sean-k-mooney> if we do a very very small change to just do best effort leader election it might be a bug
16:37:45 <bauzas> I understand the reasoning, magically nova will elect a new leader
16:38:07 <sean-k-mooney> bauzas: we could also add a config option yes
16:38:08 <bauzas> but I'm afraid this periodic could silently run somewhere else without the op noticing it
16:38:14 <dansmith> omg, DLM?
16:38:16 * dansmith reads up
16:38:31 <sean-k-mooney> bauzas: so you have to opt into the periodic
16:38:33 <bauzas> and you know, brainsplits and all the likes happen
16:38:40 <sean-k-mooney> by setting a config option to specify the interval today
16:38:51 <bauzas> that's my point
16:39:02 <sean-k-mooney> so this would just be a change to the behavior when you opt in
16:39:04 <bauzas> -1 disables the periodic IIRC
16:39:14 <sean-k-mooney> yep and i dont want to change that
16:39:24 <bauzas> but I wouldn't trust python for electing my leader
16:39:38 <bauzas> particularly the sorted command
16:39:42 <sean-k-mooney> so today the db enforces that bad things dont happen
16:39:53 <sean-k-mooney> you just get exceptions in the log
16:40:14 <sean-k-mooney> if multiple race that is
16:40:22 <sean-k-mooney> which is annoying for operators
16:40:44 <dansmith> if you want to automate host discovery active-active, you can do that yourself with nova-manage and your own DLM right?
16:40:54 <sean-k-mooney> yes
16:41:04 <bauzas> yes, that's why I don't like that approach
16:41:12 <sean-k-mooney> thats a pain in k8s however which s3rj1k cares about
16:41:18 <bauzas> you're considering that DLM is just not needed because sorted exists
16:41:25 <gibi> definitely having a full DLM in nova just for this is way overkill
16:41:39 <sean-k-mooney> yep which is why i did not like the dlm/tooz approach
16:41:56 <bauzas> the SG API isn't also good at doing live healthchecks
16:42:19 <bauzas> there could be a reasonable amount in time where nova wouldn't see a node done
16:42:26 <bauzas> gone*
16:42:33 <sean-k-mooney> which is fine
16:42:37 <gibi> today if you run the periodic in multiple schedulers you burn power unnecessarily but you don't break nova DB just get exceptions.
16:42:48 <sean-k-mooney> we can miss running the periodic for a protracted period of time without bad impacts
16:43:56 <bauzas> we could just let CONF.scheduler.discover_hosts_in_cells_interval be mutable and leave DLMs to manage nova-scheduler A/Ps instead of us
16:44:04 <sean-k-mooney> nope
16:44:09 <dansmith> There are also things we could do in nova that don't require a DLM if we really care
16:44:16 <sean-k-mooney> that does not work for k8s deployments which is the motivating usecase
16:44:31 <dansmith> like the periodic could say "am I the oldest-started nova-scheduler service that is currently up? If so, then run the discovery, if not, then don't"
16:44:34 <sean-k-mooney> dansmith: right im proposing not using a DLM
16:44:49 <dansmith> then only one of them would run it until you shut down the oldest and then the next one would do it.. sort of lazy-consensus election
16:44:52 <sean-k-mooney> dansmith: thats basically what my patch does
16:44:53 <gibi> dansmith: exactly, that is very close to what sean-k-mooney proposes
16:44:59 <bauzas> dansmith: I don't like to trust the service group API for electing my leader
16:45:02 <dansmith> oh, then.. yeah :D
16:45:22 <dansmith> bauzas: it's not really trust, it's just optimization
16:45:30 <bauzas> sorted was one option, the oldest is another alternative
16:45:38 <gibi> bauzas: don't think about it as leader election, it is more like limiting the number of discover host runs based on an input
16:45:51 <dansmith> I see sean-k-mooney's patch now, sorry I'm catching up
16:45:52 <sean-k-mooney> sorted is just for determinium
16:45:58 <sean-k-mooney> dansmith: no worries
16:46:15 <dansmith> determinism .. yeah, let's do something like that if we care, for sure
16:46:20 <gibi> bauzas: it could be a last run timestamp if we don't like age
16:46:55 <gibi> bauzas: but as we have age we have a good set of values to find a single scheduler that is min / max in that value to be the only one to do the work
16:46:57 <bauzas> well,
16:47:24 <sean-k-mooney> so for today what i really want to know is spec, bug or specless blueprint so i know if we have till thursday or m3 to finalise the design and implementation
16:47:47 <bauzas> there are possibilities where a nova-scheduler could see itself being the oldest while another one, running 2 secs later could also see it as the oldest
16:47:59 <sean-k-mooney> thats why im not using time
16:48:06 <sean-k-mooney> im sorting on the value of host
16:48:10 <bauzas> I know
16:48:18 <bauzas> but that's still the same
16:48:23 <dansmith> bauzas: sort by id, filter by up, that should be stable
16:48:33 <bauzas> some host could see itself as up while some other too, 1 sec later
16:48:55 <dansmith> if it does, then it's flapping from the view of the operator, which I think is not likely to be unnoticed
16:49:23 <sean-k-mooney> also people run with this in production without this protection or any kind of arbitration today.
16:49:31 <bauzas> dansmith: if you are okay with operators not noticing which nova-scheduler runs where, then ok
16:49:35 <sean-k-mooney> i.e. they just ignore the logs when there is a collision
16:49:47 <dansmith> bauzas: which nova scheduler runs...host discovery?
16:49:53 <bauzas> yes
16:50:10 <sean-k-mooney> why would they care?
16:50:15 <dansmith> isn't that the point of this? instead of run-everywhere it's run-one-place, automatically managed?
16:50:28 <dansmith> right now, they can all run it in parallel all the time, if you care, but that's expensive
16:50:46 <gibi> ^^ yepp
16:50:51 <dansmith> I thought the point was to make it less expensive and automatically decide that (hopefully) only one does it each time.. seems fine to me
16:50:52 <bauzas> OK, then let's go for the proposal
16:50:57 <dansmith> worst case, two do it, no problem
16:51:10 <sean-k-mooney> yep ^
16:51:12 <bauzas> well, you're right
16:51:22 <bauzas> two running at same time don't split brains
16:51:38 <bauzas> because of the periodic itself
16:51:41 <sean-k-mooney> they dont even cause an error if you have not added any hosts
16:51:52 <dansmith> yeah
16:51:58 <bauzas> okay, then to answer sean-k-mooney's question, I'm fine with it being specless
16:52:08 <bauzas> I actually would prefer it to be specless please
16:52:32 <bauzas> as I wouldn't want to step into DLM muds and leader elections
16:52:40 <sean-k-mooney> ok i think ill file a new blueprint for it and update the description
16:52:54 <sean-k-mooney> if we are ok with that are we ok to approve that async
16:52:58 <bauzas> sure
16:53:08 <sean-k-mooney> ill do it now and ping outside of the meeting
16:53:36 <bauzas> #agreed https://review.opendev.org/c/openstack/nova/+/938523 can be filed as a specless blueprint
16:53:54 <s3rj1k> sean-k-mooney: thanks for spending time on this
16:53:55 <bauzas> cool thanks
16:54:30 <sean-k-mooney> s3rj1k: no worries i meant to do it before going on pto i just didnt finish hacking on it until i got back on monday
16:54:54 <sean-k-mooney> ok thats if from me
16:55:05 <sean-k-mooney> s/if/it/
16:55:52 <bauzas> cool
16:55:55 <bauzas> thanks all then
16:55:58 <bauzas> #endmeeting
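[editor's sketch] The best-effort election discussed in the open discussion ("sort by id, filter by up", with each scheduler running discovery only if it comes first) could be sketched as below. This is not the code from the linked patch, which per the discussion sorts on the host value: the Service class is a simplified stand-in for a service record, should_run_discovery is a hypothetical helper name, and running discovery when no peer is visible is an assumption based on the point that two schedulers running it at once causes no harm.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Simplified stand-in for a nova-scheduler service record.
    id: int
    host: str
    up: bool


def should_run_discovery(my_host, scheduler_services):
    """Return True if this scheduler should run host discovery this period.

    Only services reported as up are considered, and the one with the
    lowest id is chosen, so every scheduler looking at the same list
    reaches the same answer without any lock.
    """
    up_services = [s for s in scheduler_services if s.up]
    if not up_services:
        # Assumption: with no visible peers, run anyway. A duplicate run
        # is wasteful but harmless, as noted in the discussion.
        return True
    chosen = min(up_services, key=lambda s: s.id)
    return chosen.host == my_host
```

A periodic task would call the helper at the start of each run and return early when it gets False. If two schedulers briefly disagree about who is up, both may run, which matches the "worst case, two do it" outcome accepted in the meeting.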