16:01:18 #startmeeting nova
16:01:18 Meeting started Tue Jan 7 16:01:18 2025 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:18 The meeting name has been set to 'nova'
16:01:24 hi folks
16:01:29 who's around ?
16:01:34 o/
16:01:40 /o (partially)
16:01:56 o/
16:02:17 (I'm partially here too but I need to run the meeting :) )
16:02:32 * bauzas likes doing three things at the same time
16:03:14 okay let's start and hopefully it will be quick
16:03:57 #topic Bugs (stuck/critical)
16:04:02 #info No Critical bug
16:04:09 #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:04:18 any bugs people wanna raise ?
16:04:21 o/
16:04:57 o/
16:05:03 looks like not, moving on
16:05:09 #topic Gate status
16:05:15 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:05:21 #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:05:30 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0 Nova&Placement periodic jobs status
16:05:52 nova-emulation is in the weeds
16:06:09 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:15 #info Please try to provide a meaningful comment when you recheck
16:07:51 #topic Release Planning
16:07:56 a few things to mention here
16:08:04 #link https://releases.openstack.org/epoxy/schedule.html
16:08:10 #info Nova deadlines are set in the above schedule
16:08:15 #info Implementation review day is planned for tomorrow
16:08:32 I'll send an email about it ^
16:08:58 #action bauzas to notify about review day through email
16:09:13 the other thing, also important:
16:09:16 #info Spec approval freeze planned for Thursday EOB
16:09:22 you are warned
16:09:42 I'm mostly buried in spec reviews today and I'll continue tomorrow
16:09:58 anything worth mentioning now ?
16:10:26 a late hey to all, connection issues
16:10:44 moving on
16:11:53 #topic Review priorities
16:12:01 #link https://etherpad.opendev.org/p/nova-2025.1-status
16:12:17 nothing to mention, continuing
16:12:23 #topic Stable Branches
16:12:32 elodilles: happy new year
16:12:36 :)
16:12:45 happy new year too o/
16:12:46 :)
16:12:57 speaking of that
16:13:19 i haven't seen much activity on stable branches in the past weeks
16:13:56 but i'm not aware of any stable gate issue
16:14:11 (maybe because of the above reason o:))
16:14:27 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:14:49 please, if you see any issue, add it there ^^^
16:15:09 that's all about stable branches from me
16:15:29 cool
16:15:47 #topic vmwareapi 3rd-party CI efforts Highlights
16:16:04 fwiesel: heya, happy new year too
16:16:05 Hi, happy new year. No updates from my side.
16:16:08 ++
16:16:18 #topic Open discussion
16:16:38 one item on the agenda, quite timely given Thursday's deadline
16:16:46 (bauzas) Specless approval for https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher ?
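The blueprint just named proposes a new scheduler weigher; as discussed below, the design is to look at each instance already on a host and compare its image metadata with the properties requested. A minimal sketch of what such a weigher could look like, assuming an illustrative class name and a hard-coded pair of properties to compare (neither comes from the actual blueprint):

```python
# Sketch only -- not the blueprint's implementation. The class name and
# the compared properties are illustrative assumptions.
from nova.scheduler import weights

_PROPS_TO_COMPARE = ('hw_machine_type', 'os_distro')  # assumed, illustrative


class ImageMetadataPropsWeigher(weights.BaseHostWeigher):

    def _weigh_object(self, host_state, request_spec):
        """Score a host by how many of its instances share the requested
        image metadata properties.
        """
        if not request_spec.obj_attr_is_set('image') or request_spec.image is None:
            return 0.0
        wanted = request_spec.image.properties
        score = 0
        # host_state.instances is already populated for the (anti-)affinity
        # filters, so the weigher can reuse it without extra lookups.
        for instance in host_state.instances.values():
            # Instance.image_meta is built from the copy cached in the
            # instance_system_metadata table (see the discussion below).
            have = instance.image_meta.properties
            for prop in _PROPS_TO_COMPARE:
                if (wanted.obj_attr_is_set(prop)
                        and have.obj_attr_is_set(prop)
                        and getattr(wanted, prop) == getattr(have, prop)):
                    score += 1
        return float(score)
```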
16:17:06 tl;dr: I'm hereby asking for an exception from writing a spec
16:17:22 this is just a weigher, we already have everything we need
16:17:41 in the past, some filters and weighers required a spec, some others did not
16:18:02 so I'm leaning towards asking you to grant this blueprint as it is
16:18:17 bauzas: fine by me to have it specless
16:22:09 to be clear, we have Instance objects in the HostState
16:23:27 we do
16:23:39 for the affinity/anti-affinity filters
16:23:48 and as a result the weighers have them too for the same reason
16:24:19 the weigher design will be 'look at each of the instances on the host, get their image meta from the instance and compare it with the request'
16:24:20 and the instance objects have the image metadata available
16:24:33 the only concern is whether that is lazy-loaded today or not
16:24:42 the instances list ?
16:24:48 not the instance list
16:25:10 the image metadata, which is constructed from the cached copy in the instance_system_metadata table
16:25:22 i don't know if we load that in the host manager
16:25:32 we probably do, just wanted to call that out
16:25:39 ah good point
16:26:12 we can't load that later
16:26:20 as we're in the scheduler
16:26:33 or we'd need to target the cell db
16:26:34 https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L70-L71
16:26:46 we are good, the instance system metadata is in the default fields
16:26:58 so we have that already
16:27:29 excellent
16:27:34 thanks for checking
16:27:46 are we then ok with the design ?
16:28:05 i think so. i captured most of my feedback in the blueprint already
16:28:29 so i'm more or less ok with proceeding to the implementation review based on that design
16:29:35 ++
16:29:58 okay, then let's approve it as specless
16:30:14 #approved https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
16:30:20 doh
16:30:35 #agreed https://blueprints.launchpad.net/nova/+spec/image-metadata-props-weigher approved as specless
16:30:39 that was it for me
16:30:44 anything else to mention ?
16:30:51 if there is time/energy i have one topic
16:31:09 I don't have energy but we have time
16:31:36 https://blueprints.launchpad.net/nova/+spec/host-discovery-distributed-lock
16:31:57 we talked about this at the PTG
16:32:06 and there is a related spec https://review.opendev.org/c/openstack/nova-specs/+/936389
16:32:27 tl;dr: they would like to be able to enable the discover_hosts periodic on multiple schedulers at a time
16:32:38 so i don't actually like the proposal as written
16:32:49 but i did a short PoC of one of the alternatives
16:32:58 https://review.opendev.org/c/openstack/nova/+/938523
16:33:15 i think this could be a reasonable minor improvement
16:33:39 the change is basically just this https://review.opendev.org/c/openstack/nova/+/938523/2/nova/scheduler/manager.py#113
16:34:16 what i wanted to know is: if we took this super simple approach of doing leader election in the periodic and returning if not the leader
16:34:26 would this be a spec/blueprint or a bugfix?
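The PoC itself is not reproduced here; the following is only a rough guess at what the "best effort leader election" check described above could look like, assuming a made-up helper name and the host-name ordering of up schedulers that is debated later in the discussion:

```python
# Sketch only -- an assumption about the approach, not the patch under review.
from nova import objects
from nova import servicegroup


def _should_run_discover_hosts(context, my_host):
    """Best-effort election: only the first (by host name) of the
    nova-scheduler services reported as up runs the discover_hosts periodic.
    """
    servicegroup_api = servicegroup.API()
    schedulers = objects.ServiceList.get_by_binary(context, 'nova-scheduler')
    up_hosts = sorted(svc.host for svc in schedulers
                      if servicegroup_api.service_is_up(svc))
    return bool(up_hosts) and up_hosts[0] == my_host
```

The periodic would call this at the top and return early when it is not the "leader"; worst case two schedulers briefly both pass the check and both run the discovery, which, as the discussion below concludes, only costs a redundant run.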
16:35:01 hmmmm
16:35:06 i'm asking because of the time constraint for the first two options
16:35:12 it has no API impact so I lean towards a specless bp
16:35:36 specless from me if that counts :)
16:36:08 s3rj1k: input is always welcome :)
16:36:25 sean-k-mooney: but you can formulate it as a bug, as nova allows enabling the periodic in multiple schedulers today
16:36:27 (plus it's somewhat covered in the alternatives section of the spec mentioned above)
16:36:55 sean-k-mooney: without a safety net
16:37:15 ya so my perspective was: if we are taking the heavyweight approach of adding a distributed lock manager, that needed a spec because it's complex and a large change
16:37:29 can't we just add a config option ?
16:37:39 if we do a very very small change to just do best-effort leader election it might be a bug
16:37:45 I understand the reasoning, magically nova will elect a new leader
16:38:07 bauzas: we could also add a config option yes
16:38:08 but I'm afraid this periodic could silently run somewhere else without the op noticing it
16:38:14 omg, DLM?
16:38:16 * dansmith reads up
16:38:31 bauzas: so you have to opt into the periodic
16:38:33 and you know, split brains and the like happen
16:38:40 by setting a config option to specify the interval today
16:38:51 that's my point
16:39:02 so this would just be a change to the behaviour when you opt in
16:39:04 -1 disables the periodic IIRC
16:39:14 yep and i don't want to change that
16:39:24 but I wouldn't trust Python for electing my leader
16:39:38 particularly the sorted() call
16:39:42 so today the db enforces that bad things don't happen
16:39:53 you just get exceptions in the log
16:40:14 if multiple schedulers race, that is
16:40:22 which is annoying for operators
16:40:44 if you want to automate host discovery active-active, you can do that yourself with nova-manage and your own DLM right?
16:40:54 yes
16:41:04 yes, that's why I don't like that approach
16:41:12 that's a pain in k8s however, which s3rj1k cares about
16:41:18 you're considering that a DLM is just not needed because sorted() exists
16:41:25 definitely having a full DLM in nova just for this is way overkill
16:41:39 yep which is why i did not like the DLM/tooz approach
16:41:56 the SG API also isn't good at doing live healthchecks
16:42:19 there could be a reasonable amount of time where nova wouldn't see a node done
16:42:26 gone*
16:42:33 which is fine
16:42:37 today if you run the periodic in multiple schedulers you burn power unnecessarily but you don't break the nova DB, you just get exceptions.
16:42:48 we can miss running the periodic for a protracted period of time without bad impacts
16:43:56 we could just let CONF.scheduler.discover_hosts_in_cells_interval be mutable and leave DLMs to manage nova-scheduler A/Ps instead of us
16:44:04 nope
16:44:09 There are also things we could do in nova that don't require a DLM if we really care
16:44:16 that does not work for k8s deployments, which are the motivating use case
16:44:31 like the periodic could say "am I the oldest-started nova-scheduler service that is currently up? If so, then run the discovery, if not, then don't"
16:44:34 dansmith: right, i'm proposing not using a DLM
16:44:49 then only one of them would run it until you shut down the oldest and then the next one would do it.. sort of lazy-consensus election
16:44:52 dansmith: that's basically what my patch does
16:44:53 dansmith: exactly, that is very close to what sean-k-mooney proposes
16:44:59 dansmith: I don't like to trust the service group API for electing my leader
16:45:02 oh, then.. yeah :D
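For reference on the opt-in mentioned above: the periodic only runs on a scheduler whose interval option is set to a positive value, and -1 keeps it disabled, so today operators choose where it runs via nova.conf. A config example (the interval value is arbitrary):

```ini
[scheduler]
# A positive value enables the discover_hosts periodic on this scheduler;
# -1 (the default) disables it, as noted above.
discover_hosts_in_cells_interval = 300
```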
16:45:22 bauzas: it's not really trust, it's just optimization
16:45:30 sorted() was one option, the oldest is another alternative
16:45:38 bauzas: don't think about it as leader election, it is more like limiting the number of discover_hosts runs based on an input
16:45:51 I see sean-k-mooney's patch now, sorry I'm catching up
16:45:52 sorted() is just for determinism
16:45:58 dansmith: no worries
16:46:15 determinism .. yeah, let's do something like that if we care, for sure
16:46:20 bauzas: it could be a last-run timestamp if we don't like age
16:46:55 bauzas: but as we have age we have a good set of values to find a single scheduler that is min/max in that value to be the only one to do the work
16:46:57 well,
16:47:24 so for today what i really want to know is: spec, bug or specless blueprint, so i know if we have till Thursday or M3 to finalise the design and implementation
16:47:47 there are possibilities where a nova-scheduler could see itself as the oldest while another one, running 2 secs later, could also see itself as the oldest
16:47:59 that's why i'm not using time
16:48:06 i'm sorting on the value of host
16:48:10 I know
16:48:18 but that's still the same
16:48:23 bauzas: sort by id, filter by up, that should be stable
16:48:33 some host could see itself as up while some other one could too, 1 sec later
16:48:55 if it does, then it's flapping from the view of the operator, which I think is not likely to go unnoticed
16:49:23 also people run this in production without this protection or any kind of arbitration today.
16:49:31 dansmith: if you are okay with operators not noticing which nova-scheduler runs where, then ok
16:49:35 i.e. they just ignore the logs when there is a collision
16:49:47 bauzas: which nova scheduler runs...host discovery?
16:49:53 yes
16:50:10 why would they care?
16:50:15 isn't that the point of this? instead of run-everywhere it's run-one-place, automatically managed?
16:50:28 right now, they can all run it in parallel all the time, if you care, but that's expensive
16:50:46 ^^ yepp
16:50:51 I thought the point was to make it less expensive and automatically decide that (hopefully) only one does it each time.. seems fine to me
16:50:52 OK, then let's go for the proposal
16:50:57 worst case, two do it, no problem
16:51:10 yep ^
16:51:12 well, you're right
16:51:22 two running at the same time don't cause split brain
16:51:38 because of the periodic itself
16:51:41 they don't even cause an error if you have not added any hosts
16:51:52 yeah
16:51:58 okay, then to answer sean-k-mooney's question, I'm fine with it being specless
16:52:08 I actually would prefer it to be specless please
16:52:32 as I wouldn't want to step into the DLM mud and leader elections
16:52:40 ok i think i'll file a new blueprint for it and update the description
16:52:54 if we are ok with that, are we ok to approve that async?
16:52:58 sure
16:53:08 i'll do it now and ping outside of the meeting
16:53:36 #agreed https://review.opendev.org/c/openstack/nova/+/938523 can be filed as a specless blueprint
16:53:54 sean-k-mooney: thanks for spending time on this
16:53:55 cool thanks
16:54:30 s3rj1k: no worries, i meant to do it before going on PTO, i just didn't finish hacking on it until i got back on monday
16:54:54 ok thats if from me
16:55:05 s/if/it/
16:55:52 cool
16:55:55 thanks all then
16:55:58 #endmeeting