16:00:07 <bauzas> #startmeeting nova
16:00:07 <opendevmeet> Meeting started Tue Nov 29 16:00:07 2022 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 <opendevmeet> The meeting name has been set to 'nova'
16:00:12 <bauzas> hey folks
16:00:16 <auniyal> O/
16:00:22 <gibi> o/
16:00:48 <Uggla> o/
16:01:16 <elodilles> o/
16:01:36 <bauzas> let me grab a coffee and we start
16:02:50 <bauzas> ok let's start and welcome
16:03:00 <bauzas> #topic Bugs (stuck/critical)
16:03:03 <dansmith> o/
16:03:05 <bauzas> #info No Critical bug
16:03:07 <sean-k-mooney> o/
16:03:09 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 16 new untriaged bugs (+5 since the last meeting)
16:03:15 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:31 <bauzas> I know this was a busy week
16:03:40 <bauzas> any bug to discuss ?
16:03:49 <bauzas> (apart from the gate ones)
16:04:12 <bauzas> looks not
16:04:20 <bauzas> elodilles: can you take the baton for the next bugs ?
16:04:26 <elodilles> yepp
16:04:29 <bauzas> cool thanks !
16:04:35 <bauzas> #info bug baton is being passed to elodilles
16:04:41 <bauzas> #topic Gate status
16:04:45 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:04:55 <bauzas> it was a busy week
16:05:17 <bauzas> #info ML thread about the gate blocking issues we had https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031357.html
16:05:30 <bauzas> kudos to the team for the hard work
16:05:34 <bauzas> it looks like the gate is back now
16:05:49 <bauzas> unfortunately, we had to skip some tests :(
16:06:23 <bauzas> but actually maybe they were not necessary :)
16:06:33 <bauzas> anyway
16:06:40 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:06:47 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:51 <bauzas> #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
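[editor's note: the linked guide asks that a recheck comment name the failure being rechecked rather than a bare 'recheck'; a hypothetical example of such a Gerrit comment, with a made-up bug number:
    recheck nova-lvm job failed on a volume detach timeout, see bug 1234567
]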
16:06:51 <gibi> nah no test is necessary :) only the code needs to work :)
16:07:01 <opendevreview> Arnaud Morin proposed openstack/nova master: Unbind port when offloading a shelved instance  https://review.opendev.org/c/openstack/nova/+/853682
16:07:25 <bauzas> anything to discuss about the gate ?
16:07:48 <bauzas> #topic Release Planning
16:07:52 <bauzas> #link https://releases.openstack.org/antelope/schedule.html
16:07:57 <bauzas> #info Antelope-2 is in 5 weeks
16:08:47 <bauzas> as a reminder, you could be off during the last week(s) of December :)
16:09:04 <bauzas> so even if we have 5 weeks until A-2, it may be less for you :)
16:09:27 <sean-k-mooney> ya I don't know if we want to have another review day before then
16:09:27 <bauzas> should we do another spec review day before end of December, btw ?
16:09:48 <gibi> I vote for something after the 13th of Dec :)
16:09:49 <bauzas> sean-k-mooney: we agreed to have an implementation review day around the end of Dec, like Dec 20
16:09:49 <sean-k-mooney> well spec or implementation or both
16:10:06 <sean-k-mooney> that might be a bit late
16:10:14 <bauzas> for implementations, not really
16:10:20 <gibi> more specifically between 14th and 19th
16:10:25 <bauzas> as we only have a deadline for A-3
16:10:30 <sean-k-mooney> because of vacation
16:10:31 <bauzas> for specs, yes
16:10:38 <bauzas> ah
16:11:01 <bauzas> then, we could do a spec review day on Dec 13th
16:11:10 <gibi> 14 please
16:11:25 <gibi> there is an internal demo on 13th I will be busy with :)
16:11:34 <bauzas> and when should we be doing an implementation review day ?
16:11:56 <sean-k-mooney> ya I'm off from the 19th so 14th-16th for feature review
16:11:58 <sean-k-mooney> would be ok
16:11:59 <bauzas> gibi: haha, I don't see what you're saying :p
16:12:25 <gibi> yeah I support what sean-k-mooney proposes 14-16
16:12:31 <bauzas> gibi: as a reminder, last week, I was discussing here during the meeting while adding some slides for some internal meeting :p
16:12:47 <bauzas> you surely could do the same :D
16:12:54 <gibi> maybe a spec review day on 14th and an impl review day on 15th :)
16:13:05 <bauzas> if people agree to have two upstream days
16:13:11 <sean-k-mooney> that would work for me
16:13:12 <gibi> bauzas: I'm nowhere near your ability to multitask
16:13:17 <bauzas> during the same week
16:13:23 <sean-k-mooney> that leaves the 16th to wrap up stuff before PTO
16:13:32 <gibi> yepp
16:13:42 <gibi> (I will be here on 19th too but off from 20th)
16:13:51 <bauzas> gibi: otherwise I'd prefer to have an implementation day once we're back
16:14:10 <bauzas> not all of us work upstream every day :)
16:14:10 <sean-k-mooney> we could but the idea was to have 2 of them
16:14:15 <sean-k-mooney> to take the pressure off m3
16:14:26 <sean-k-mooney> so one before m2 and one before m3
16:14:32 <gibi> I'm back on the 5th of Jan
16:14:55 <sean-k-mooney> so we could probably do one on the 10th of January
16:15:01 <sean-k-mooney> most will be around by then
16:15:09 <bauzas> sean-k-mooney: I don't disagree, I'm just pointing out that some folks may not be able to do two review days in the same week
16:15:15 <sean-k-mooney> if we want to keep it aligned to the meeting days
16:15:28 <gibi> 10th works for me
16:16:17 <bauzas> gibi: we don't really need to align those review days to our meeting
16:16:26 <bauzas> gibi: but this is nice as a reminder
16:17:02 <gibi> so I think we are converging on Dec 14th as a spec review day and Jan 10th as a code review day
16:17:36 <bauzas> I think this works for me
16:17:53 <bauzas> and we can have another implementation review day later after Jan 10th
16:18:15 <sean-k-mooney> ya sure that sounds workable
16:18:39 <bauzas> as a reminder, Antelope-3 (FF) is Feb 16th
16:18:55 <bauzas> more than 5 weeks after Jan 10th
16:19:09 <sean-k-mooney> there are still a few bits I would hope we can merge by the end of the year however, namely I would like to see us make progress on the PCI in placement series
16:19:20 <bauzas> sure
16:19:52 <sean-k-mooney> ok so i think we can move on for now
16:19:56 <bauzas> what we can do is say that we can review some changes by Dec 15th if we want
16:20:23 <bauzas> that wouldn't be an official review day, but people would know that *some* folks can review their changes by that day
16:20:39 <bauzas> anyway, I think we found a way
16:21:08 <gibi> yepp
16:21:11 <bauzas> #agreed Dec-14th will be a spec review day and Jan-10th will be an implementation review day, mark your calendars
16:21:41 <bauzas> #action bauzas to send an email about it
16:22:16 <bauzas> #agreed Some nova-cores can review some feature changes around Dec 15th, you now know about it
16:22:27 <gibi> :)
16:22:28 <bauzas> OK, that's it
16:22:43 <bauzas> moving on
16:22:50 <bauzas> (sorry, that was a long discussion)
16:22:54 <bauzas> #topic Review priorities
16:23:00 <bauzas> #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:23:05 <bauzas> #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:23:30 <bauzas> I'm happy to see people using it
16:23:56 <bauzas> that's it for that topic
16:24:00 <bauzas> next one
16:24:07 <bauzas> #topic Stable Branches
16:24:13 <bauzas> elodilles: your turn
16:24:16 <elodilles> ack
16:24:20 <elodilles> this will be short
16:24:23 <elodilles> #info stable branches seem to be unblocked / OK
16:24:27 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:24:30 <elodilles> that's it
16:25:58 <gibi> nice
16:26:14 <bauzas> was quick and awesome
16:26:36 <bauzas> last topic but not the least in theory,
16:26:45 <bauzas> #topic Open discussion
16:26:55 <bauzas> nothing in the wikipage
16:26:58 <bauzas> so
16:27:04 <bauzas> anything to discuss here right now ?
16:27:08 <gibi> -
16:27:17 <sean-k-mooney> did you merge the patch skipping the failing nova-lvm tests yet
16:27:26 <sean-k-mooney> or is the master gate still exploding on that
16:27:30 <bauzas> I think yesterday we said we could discuss the test skips during this meeting
16:27:46 <bauzas> but given we merged gmann's patch, the ship has sailed
16:27:56 <sean-k-mooney> ack
16:28:02 <sean-k-mooney> so they are disabled currently
16:28:05 <bauzas> sean-k-mooney: see my ML thread above ^
16:28:06 <sean-k-mooney> the failing detach tests
16:28:15 <sean-k-mooney> ah ok will check after meeting
16:28:19 <bauzas> sean-k-mooney: you'll get the link to the gerrit change
16:28:20 <sean-k-mooney> nothing else from me
16:28:29 <auniyal> hand-raise: zuul jobs frequently time out / fail - this seems to be a resource issue, is it possible zuul resources can be increased ?
16:29:09 <bauzas> sean-k-mooney: tl;dr: yes we skipped the related tests but maybe they are actually not needed as you said
16:29:16 <sean-k-mooney> auniyal: not really, timeouts are not that common in our jobs
16:29:24 <bauzas> auniyal: see what I said above, we had problems with the gate very recently
16:29:26 <sean-k-mooney> auniyal: do you have an example
16:29:31 <auniyal> in the morning when there are fewer jobs running, if we run the same job it passes
16:29:40 <auniyal> like fewer than 20
16:29:47 <auniyal> right now 60 jobs are running
16:29:52 <sean-k-mooney> that should not really be a thing
16:30:03 <sean-k-mooney> unless we have issues with our ci providers
16:30:17 <bauzas> auniyal: if you mean job results reporting timeouts, I agree with sean-k-mooney, you should tell us which ones so we can investigate
16:30:24 <sean-k-mooney> we occasionally have issues with slow providers but it's not normally correlated with the number of running jobs
16:30:29 <bauzas> yup
16:30:35 <auniyal> ack
16:30:38 <bauzas> timeouts are generally an infra issue
16:30:43 <bauzas> from a ci provider
16:30:50 <bauzas> but "generally"
16:31:04 <bauzas> which means sometimes we may have a larger problem
16:31:07 <sean-k-mooney> auniyal: do you have a gerrit link to a change where it happened
16:31:12 <dansmith> are they fips jobs?
16:31:31 <clarkb> bauzas: I'm not sure I agree with that statement
16:31:31 <sean-k-mooney> oh ya it could be that, did we add the extra 30 mins to the job yet?
16:31:38 <clarkb> we have significant amounts of very inefficient test payload
16:31:55 <clarkb> yes slow providers make that worse, but we have lots of ability to improve things in the jobs just about every time I look
16:32:15 <sean-k-mooney> clarkb: we don't often see timeouts in the jobs that run on the nova gate
16:32:29 <sean-k-mooney> we tend to be well within the job timeout interval
16:32:45 <sean-k-mooney> that is not necessarily the same for other projects
16:32:46 <clarkb> (it is common for tempest jobs to dig into swap which slows everything down, devstack uses osc which is super slow because it gets a new token for every request and has Python spin-up time, ansible loops are costly with large numbers of entries and so on)
16:32:58 <auniyal> sean, I am trying to find a link but it's taking time
16:33:04 <clarkb> sean-k-mooney: yes swap is a common cause for the difference in behaviors and that isn't an infra issue
16:33:15 <clarkb> sean-k-mooney: and devstack runtime could be ~halved if we stopped using osc
16:33:25 <clarkb> or improved osc's startup and token acquisition time
16:33:31 <sean-k-mooney> clarkb: ack
16:33:36 <clarkb> I just want to avoid the idea that it's an infra issue so we can ignore it
16:33:39 <sean-k-mooney> ya the osc thing is a long-running known issue
16:33:49 <clarkb> this assertion gets made often, then I go looking and there is plenty of job payload that is just slow
16:33:55 <sean-k-mooney> the parallel improvements dansmith did helped indirectly
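[editor's note: a minimal Python sketch of the osc overhead discussed above, assuming openstacksdk and a clouds.yaml entry named 'devstack' (resource names are hypothetical). Each `openstack` CLI invocation pays interpreter startup plus a fresh keystone token, whereas one SDK process authenticates once and reuses its session:

    import openstack

    # authenticate once; the token is cached on the connection's session
    conn = openstack.connect(cloud='devstack')

    for name in ('demo-net-a', 'demo-net-b'):
        # each call reuses the cached token instead of re-authenticating
        conn.network.create_network(name=name)
]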
16:33:55 <bauzas> clarkb: sorry, I was unclear
16:33:59 <auniyal> although, I have experienced this a lot: if my zuul jobs are not passing at night (IST) even after a recheck, I run them in the morning and then they pass
16:34:10 <bauzas> clarkb: I wasn't trying to pin the fault on someone else
16:34:52 <bauzas> clarkb: I was just explaining to some new nova contributor that given the current situation, we only have timeouts with nova jobs due to some ci provider issue
16:35:04 <clarkb> bauzas: right I disagree with that
16:35:13 <bauzas> clarkb: but I agree with you on some jobs that are wasting resources
16:35:13 <clarkb> jobs time out due to an accumulation of slow steps
16:35:21 <clarkb> some of those may be due to a slow provider or slow instance
16:35:30 <clarkb> but, it is extremely rare that this is the only problem
16:35:34 <sean-k-mooney> clarkb: we tend to see an average runtime at about 75% or less of the job timeout in my experience
16:35:42 <clarkb> and I know nova tempest jobs have a large number of other slowness problems
16:35:55 <clarkb> sean-k-mooney: yes, but if a job digs deeply into swap it's all downhill from there
16:35:56 <bauzas> clarkb: that's a fair point
16:35:56 <sean-k-mooney> we have 2-hour timeouts on our tempest jobs and we rarely go above about 90 mins
16:36:05 <clarkb> suddenly your 75% typical runtime can balloon to 200%
16:36:08 <bauzas> except the fips one
16:36:14 <sean-k-mooney> clarkb: sure but I don't think we are
16:36:25 <sean-k-mooney> but it's something we can look at
16:36:40 <sean-k-mooney> auniyal: the best thing you can do is provide us an example and we can look into it
16:36:46 <clarkb> ++ to looking at it
16:36:47 <sean-k-mooney> and then see if there is a trend
16:37:10 <auniyal> ack
16:38:36 <bauzas> I actually wonder how we can track the trend
16:38:50 <sean-k-mooney> https://zuul.openstack.org/builds?project=openstack%2Fnova&result=TIMED_OUT&skip=0
16:38:56 <sean-k-mooney> that, but it's currently loading
16:39:17 <sean-k-mooney> we have a couple every few days
16:39:18 <bauzas> sure, but that doesn't show how long a SUCCESS job runs
16:39:28 <bauzas> which is what we should track
16:39:29 <clarkb> you can show both success and timeouts in a listing
16:39:35 <clarkb> (and failures, etc)
16:39:37 <sean-k-mooney> well we can change the result to filter both
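[editor's note: the Zuul builds API appears to accept repeated result filters, so a single listing can show both outcomes alongside the duration column; an example built on the query above: https://zuul.openstack.org/builds?project=openstack%2Fnova&result=TIMED_OUT&result=SUCCESS ]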
16:39:51 <bauzas> the duration field, shit, missed it
16:40:01 <sean-k-mooney> we also have yet to fix the fips job
16:40:09 <sean-k-mooney> I'll create a patch for that I think
16:40:16 <bauzas> sean-k-mooney: I said I should do it
16:40:27 <sean-k-mooney> bauzas: ok please do
16:40:33 <bauzas> sean-k-mooney: that's simple to do and it's been like 4 weeks since I promised it
16:41:13 <bauzas> sean-k-mooney: you know what ? I'll end this meeting now so everyone can do what they want, including me writing a zuul patch :)
16:41:25 <sean-k-mooney> :)
16:41:35 <bauzas> having said it,
16:41:39 <bauzas> thanks folks
16:41:43 <bauzas> #endmeeting