16:00:27 <bauzas> #startmeeting nova
16:00:27 <opendevmeet> Meeting started Tue Nov 26 16:00:27 2024 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:27 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:27 <opendevmeet> The meeting name has been set to 'nova'
16:01:18 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:27 <tkajinam> o/
16:01:33 <elodilles> o/
16:01:48 <s3rj1k> hi all
16:02:27 <Uggla> o/
16:03:30 <bauzas> hey
16:04:22 <bauzas> starting slowly
16:04:31 <bauzas> #topic Bugs (stuck/critical)
16:04:49 <bauzas> #info No Critical bug
16:05:02 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:05:08 <bauzas> any questions about bugs ?
16:07:06 <bauzas> ok moving on
16:07:35 <bauzas> #topic Gate status
16:07:43 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:07:48 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:07:59 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&branch=stable%2F*&branch=master&pipeline=periodic-weekly&skip=0
16:08:14 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:08:19 <bauzas> #info Please try to provide a meaningful comment when you recheck
16:08:29 <bauzas> I saw a couple of failures but those are known issues
16:08:40 <bauzas> anything about CI failures that is pretty new ?
16:08:52 <bauzas> (all periodics are green)
16:10:00 <bauzas> looks not, moving on
16:10:09 <bauzas> #topic Release Planning
16:10:15 <bauzas> #link https://releases.openstack.org/epoxy/schedule.html
16:10:20 <bauzas> #action bauzas to add Epoxy nova deadlines in the schedule
16:10:45 <bauzas> I'm pretty much done with the patch proposal but I need to fix something before uploading it
16:11:03 <bauzas> #topic Review priorities
16:11:10 <bauzas> #link https://etherpad.opendev.org/p/nova-2025.1-status
16:11:28 <bauzas> the page should be up to date, feel free to use it and amend it
16:12:46 <bauzas> anything about that ?
16:13:15 <gibi> o/
16:13:24 <sean-k-mooney> o/ nothing from me on that topic
16:13:33 <bauzas> cool
16:13:41 <bauzas> #topic Stable Branches
16:13:49 <bauzas> elodilles: shoot
16:14:02 <elodilles> #info stable/2024.2 gate seems to be OK
16:14:12 <elodilles> #info stable/2024.1 gate is blocked on grenade-skip-level & stable/2023.2 is blocked on nova-grenade-multinode
16:14:21 <elodilles> the failure is due to the stable/2023.1->unmaintained/2023.1 transition; devstack and grenade fixes are proposed
16:14:52 <elodilles> and actually the 2024.1 branch fix (grenade) patch is already in the gate queue
16:15:05 <elodilles> though another workaround is to set these jobs to non-voting, given that the gate should not rely on an unmaintained branch
16:15:21 <elodilles> see further details:
16:15:28 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:15:58 <elodilles> and that's all from me about stable branches now
16:16:09 <bauzas> thanks
16:16:13 <elodilles> np
16:16:30 <bauzas> #topic vmwareapi 3rd-party CI efforts Highlights
16:16:33 <bauzas> fwiesel: around ?
16:17:20 <bauzas> looks like he's AFK
16:17:24 <bauzas> no worries, moving on
16:17:28 <fwiesel> Sorry , I am here
16:17:31 <bauzas> ah
16:17:41 <bauzas> anything to raise from your side ?
16:17:59 <fwiesel> There was a regression in oslo.utils (master) and I have created a change to fix it: https://review.opendev.org/c/openstack/oslo.utils/+/936247
16:18:23 <fwiesel> Hopefully the builds will be back to the two failures and I will tackle those then.
16:18:26 <fwiesel> That's from my side
16:18:28 <sean-k-mooney> ah, is that related to removing netifaces?
16:18:55 <bauzas> okay, gtk
16:18:57 <bauzas> thanks
16:19:11 * tkajinam is aware of the proposed fix and will ping the other cores to get that in
16:19:23 <bauzas> nice, thanks tkajinam
16:19:24 <tkajinam> fwiesel, if you need a new release with the fix early then ping me
16:19:35 <tkajinam> once that is merged
16:20:01 <fwiesel> tkajinam: Thanks, I'll let you know
16:20:39 <bauzas> cool
16:20:47 <bauzas> then moving to the last item from the agenda
16:20:54 <bauzas> #topic Open discussion
16:21:01 <bauzas> nothing in the agenda, so anything, anyone ?
16:21:11 <s3rj1k> there is this https://bugs.launchpad.net/nova/+bug/2089386
16:21:34 <sean-k-mooney> I have one followup from last week too
16:21:48 <sean-k-mooney> let's start with s3rj1k's topic
16:22:29 <bauzas> ok, s3rj1k, shoot
16:23:30 <s3rj1k> the idea is to allow host discovery to be concurrent, both the CLI and the internal periodic, using distributed locking
16:24:25 <sean-k-mooney> so perhaps I can provide some context
16:24:46 <s3rj1k> this is mostly needed for k8s-like envs where discovery is run in multiple places
16:24:47 <sean-k-mooney> s3rj1k is interested in using the discover hosts periodic in an HA env
16:25:09 <bauzas> s3rj1k: I think that topic requires a proper discussion that can't be done during a meeting
16:25:18 <sean-k-mooney> currently we require that if you use the periodic, it's enabled on at most one host
16:25:29 <sean-k-mooney> they would like to address that pain point
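A minimal sketch of the distributed-locking idea floated above, assuming a tooz coordination backend reachable by every scheduler and CLI invocation; the backend URL, member id, lock name, and discover_hosts() stub are hypothetical placeholders rather than existing nova code:

    from tooz import coordination

    def discover_hosts():
        # Hypothetical stand-in for the existing discovery routine.
        pass

    # Hypothetical backend URL and member id; any tooz driver (etcd3, zookeeper, ...) would do.
    coord = coordination.get_coordinator('etcd3+http://127.0.0.1:2379', b'scheduler-1')
    coord.start(start_heartbeat=True)

    # One shared lock name for every periodic run and CLI invocation.
    lock = coord.get_lock(b'nova-discover-hosts')
    if lock.acquire(blocking=False):
        try:
            discover_hosts()
        finally:
            lock.release()
    else:
        # Another worker is already discovering hosts; skip this run.
        pass
    coord.stop()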
16:25:35 <bauzas> if we want to discuss the design, it has to be an async conversation in a properly formatted document
16:26:04 <bauzas> that's the reason why we introduced our specification program for those kinds of feature requests
16:26:14 <s3rj1k> bauzas: a spec? or is an RFE enough this time?
16:26:30 <sean-k-mooney> so this would definitely be a spec if you were going to work on it
16:26:36 <bauzas> s3rj1k: are you familiar with spec writing or do you need guidance ?
16:27:03 <s3rj1k> bauzas: I've done one for neutron, so all OK
16:27:11 <sean-k-mooney> I think before going that far, however, s3rj1k wanted some initial feedback on whether this is in scope for nova to fix
16:27:44 <bauzas> sean-k-mooney: well, I'm not sure we have a quorum today for such a design discussion
16:28:10 <bauzas> if that was something before the PTG, we would have said "sure, just add that to the PTG and we'll discuss it"
16:28:24 <sean-k-mooney> thats still an option
16:28:37 <sean-k-mooney> I suggested that s3rj1k bring it here to advertise that it exists
16:28:47 <bauzas> honestly, I haven't yet formally written the nova deadlines for Epoxy but we're already running short on time
16:28:49 <sean-k-mooney> and then start either a mailing list or spec discussion after that
16:29:19 <bauzas> what exact problem are we trying to solve then ?
16:29:45 <sean-k-mooney> currently if you enable the discover hosts periodic task in more than one scheduler it can get duplicate key errors from the db
16:29:47 <bauzas> are we speaking of concurrent nova-scheduler services that need to be HA active-active for X reasons ?
16:29:54 <sean-k-mooney> as 2 processes can race to create the mappings
16:29:59 <sean-k-mooney> leading to errors in the logs
16:30:04 <sean-k-mooney> we don't actually support that today
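The duplicate key error described above comes from the [scheduler]discover_hosts_in_cells_interval periodic being enabled in more than one scheduler. A rough sketch of one possible mitigation (treating the duplicate as "another worker already mapped this host"), assuming nova's HostMapping object and oslo.db's DBDuplicateEntry; the map_host() helper is hypothetical, not the actual nova code path:

    from oslo_db import exception as db_exc
    from nova.objects import host_mapping

    def map_host(ctxt, host, cell_mapping):
        # Hypothetical helper: create the host mapping, tolerating races.
        try:
            host_mapping.HostMapping(context=ctxt, host=host,
                                     cell_mapping=cell_mapping).create()
        except db_exc.DBDuplicateEntry:
            # Another scheduler's periodic (or a CLI run) won the race;
            # the host ends up mapped either way, so this is benign.
            pass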
16:30:09 <bauzas> I think we always said that nova-scheduler has to be active-passive
16:30:13 <sean-k-mooney> but our documentation on that is kind of lacking
16:30:17 <sean-k-mooney> no
16:30:26 <bauzas> I'd pretty much bet we documented it
16:30:28 <sean-k-mooney> the scheduler has been supported in active-active for a very long time
16:30:31 <bauzas> never
16:30:35 <sean-k-mooney> yes
16:31:03 <tkajinam> as far as I can tell TripleO in the past deployed it on all controllers
16:31:12 <bauzas> with placement, we thought that we /could/ run it active-active but there were reasons not to
16:31:21 <sean-k-mooney> nope
16:31:26 <bauzas> tkajinam: which was a bug that we raised a couple of times
16:31:33 <sean-k-mooney> downstream it's been active-active since like 16, maybe before
16:31:41 <bauzas> and I think TripleO changed it to A-P
16:31:49 <bauzas> for that exact reason
16:31:53 <sean-k-mooney> nope
16:31:56 <tkajinam> no
16:32:22 <sean-k-mooney> ok, well I think we need a longer discussion on this RFE
16:32:36 <sean-k-mooney> likely a spec, and we probably don't have time to complete it in Epoxy
16:32:50 <sean-k-mooney> but we should discuss this more async
16:33:22 <s3rj1k> no prob, thanks sean-k-mooney for taking the lead on explaining
16:34:28 <bauzas> I have to admit that none of that tribal knowledge is written in https://docs.openstack.org/nova/latest/admin/scheduling.html
16:34:43 <sean-k-mooney> it's also not in the config option
16:35:01 <sean-k-mooney> I left my initial feedback on the bug when I triaged it as Opinion
16:35:30 <sean-k-mooney> I didn't mark it as Invalid as I thought we should at least discuss it more widely first
16:36:01 <bauzas> for now, we should document that active-passive HA configuration for sure
16:36:16 <sean-k-mooney> for the periodic only
16:36:19 <bauzas> because indeed, we know that there is no eventual consistency between schedulers
16:36:27 <sean-k-mooney> the scheduler should generally be deployed active-active
16:36:42 <bauzas> that's your opinion :)
16:36:44 <sean-k-mooney> but also the periodic has performance issues
16:36:50 <sean-k-mooney> bauzas: it's what we use in our product
16:36:58 <sean-k-mooney> and what almost all installers do by default
16:37:42 <bauzas> https://specs.openstack.org/openstack/nova-specs/specs/abandoned/parallel-scheduler.html
16:37:53 <tkajinam> yeah > almost all installers do by default
16:38:11 <sean-k-mooney> that's a different proposal
16:38:36 <bauzas> I literally quote the first sentence of that spec:
16:38:42 <bauzas> "If you running two nova-scheduler processes they race each other, they don’t find out about each others choices until the DB gets updated by the nova-compute resource tracker. This has lead to many deployments opting for an Active/Passive HA setup for the nova-scheduler process."
16:39:10 <tkajinam> people may prefer act-act for simplicity, to avoid the clustering mechanism needed to implement active-passive.
16:39:24 <tkajinam> without a large warning :-P
16:39:31 <sean-k-mooney> bauzas: that does not really apply as of placement
16:39:55 <sean-k-mooney> bauzas: I would consider it to be very incorrect advice to document that active-active is not supported
16:40:50 <gibi> yeah, the goal of placement is to shrink the race window between parallel schedulers
16:41:07 <gibi> it is a solved problem for those resources that are tracked in placement
16:41:13 <bauzas> I don't disagree with the fact that HA active-active schedulers are a problem to solve
16:41:30 <gibi> for those that are not tracked there, the compute manager has a lock around the claim to prevent overallocation
16:41:36 <gibi> and we have alternates for rescheduling
16:41:50 <bauzas> gibi: exactly, hence the A/P mechanism
16:41:57 <gibi> no this is A A
16:42:10 <gibi> the only A P problem is in the periodic discovery
16:42:13 <bauzas> in the very early times, we were considering reschedules as a way to address the problem
16:42:38 <bauzas> we dropped that tenet by wanting to reduce reschedules, which indeed leads to a broader problem
16:43:10 <gibi> we reduced reschedules with placement
16:43:16 <bauzas> originally, the scheduler wasn't intended to provide an exact solution
16:43:21 <gibi> and we improved reschedules with alternate host generation
16:43:30 <bauzas> right, which is why we never solved that problem
16:43:46 <bauzas> we reduced the scope of reschedules, that's it
16:43:49 <sean-k-mooney> we solved it to the point that we recommend active-active as the default
16:43:50 <gibi> in a distributed system there are limits to what you can solve exactly
16:44:11 <gibi> I agree with sean-k-mooney, we can recommend A A
16:44:19 <gibi> actually OSP 18 does A A A
16:44:42 <gibi> (or as many As as you want :D)
16:44:43 <sean-k-mooney> right, our product does not support active-passive, but I believe that was true in 17 as well
16:44:45 <bauzas> A A A is OK to me with resources tracked by placement
16:45:16 <sean-k-mooney> anyway, perhaps we should move on?
16:45:23 <bauzas> agreed
16:45:28 <sean-k-mooney> we can talk about this more but probably don't need to in the meeting
16:45:32 <bauzas> and agreed on the fact we need a spec
16:45:43 <bauzas> but maybe the solution is to add more resources to placement
16:46:01 <sean-k-mooney> well that is the general direction anyway
16:46:04 <bauzas> or consider this a non-solvable problem and accept reschedules as a caveat
16:46:11 <sean-k-mooney> but that does not address the reported problem
16:46:17 <gibi> on the proposal of a distributed discover, I can suggest doing the discover outside of the scheduler periodic to avoid the race
16:46:19 <sean-k-mooney> nova-audit would
16:46:28 <bauzas> anyway, moving on
16:46:31 <sean-k-mooney> gibi: yes, it's very different
16:46:41 <bauzas> s3rj1k: fancy writing a spec ?
16:46:41 <sean-k-mooney> bauzas: ack, so I had one quick topic
16:46:47 <bauzas> sean-k-mooney: shoot
16:46:51 <s3rj1k> gibi: a similar issue would exist with the CLI, check out the RFE
16:47:05 <s3rj1k> bauzas: will do
16:47:12 <sean-k-mooney> so last week I raised adding rodolfo to os-vif core
16:47:23 <sean-k-mooney> I sent a mail to the list and no one objected
16:47:24 <gibi> s3rj1k: I mean, if you make sure the discover only runs from a single CLI session at a time then I assume there is no race
16:47:37 <sean-k-mooney> so if there is no other objection here I will proceed with that after the call.
16:48:01 <s3rj1k> gibi: yes, need external control on how CLI gets run
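For reference, the CLI path gibi refers to here is nova-manage cell_v2 discover_hosts; the external control s3rj1k mentions would mean ensuring only one such invocation runs at a time (for example, a single job rather than one per controller in the k8s-like setups described earlier), which is an assumption about the deployment rather than anything nova enforces today.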
16:48:16 <s3rj1k> lets move on, yes
16:48:31 <gibi> sean-k-mooney: no objection on my side
16:48:51 <bauzas> sean-k-mooney: no objections indeed
16:49:17 <sean-k-mooney> ack so that is all i had
16:49:36 <tkajinam> I have no objections but +1 :-) (I'm not a core, though)
16:50:18 <sean-k-mooney> I'll send a mail to the list and then I'll add them after that
16:50:37 <sean-k-mooney> just to keep a record of it beyond this meeting
16:51:59 <bauzas> ++
16:52:07 <bauzas> okay, then I think we're done for today
16:52:11 <bauzas> anything else ?
16:52:39 <bauzas> looks not
16:52:44 <bauzas> have a good end of day
16:52:47 <bauzas> thanks all
16:52:50 <bauzas> #endmeeting