16:00:07 <bauzas> #startmeeting nova
16:00:07 <opendevmeet> Meeting started Tue Nov 29 16:00:07 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 <opendevmeet> The meeting name has been set to 'nova'
16:00:12 <bauzas> hey folks
16:00:16 <auniyal> O/
16:00:22 <gibi> o/
16:00:48 <Uggla> o/
16:01:16 <elodilles> o/
16:01:36 <bauzas> let me grab a coffee and we start
16:02:50 <bauzas> ok let's start and welcome
16:03:00 <bauzas> #topic Bugs (stuck/critical)
16:03:03 <dansmith> o/
16:03:05 <bauzas> #info No Critical bug
16:03:07 <sean-k-mooney> o/
16:03:09 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 16 new untriaged bugs (+5 since the last meeting)
16:03:15 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:31 <bauzas> I know this was a busy week
16:03:40 <bauzas> any bug to discuss ?
16:03:49 <bauzas> (apart from the gate ones)
16:04:12 <bauzas> looks not
16:04:20 <bauzas> elodilles: can you use the baton for the next bugs ?
16:04:26 <elodilles> yepp
16:04:29 <bauzas> cool thanks !
16:04:35 <bauzas> #info bug baton is being passed to elodilles
16:04:41 <bauzas> #topic Gate status
16:04:45 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:04:55 <bauzas> it was a busy week
16:05:17 <bauzas> #info ML thread about the gate blocking issues we had https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031357.html
16:05:30 <bauzas> kudos to the team for the hard work
16:05:34 <bauzas> it looks now the gate is back
16:05:49 <bauzas> unfortunately, we had to skip some tests :(
16:06:23 <bauzas> but actually maybe they were not necessary :)
16:06:33 <bauzas> anyway
16:06:40 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:06:47 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:51 <bauzas> #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:06:51 <gibi> nah no test is necessary :) only the code need to work :)
16:07:01 <opendevreview> Arnaud Morin proposed openstack/nova master: Unbind port when offloading a shelved instance https://review.opendev.org/c/openstack/nova/+/853682
16:07:25 <bauzas> anything to discuss about the gate ?
16:07:48 <bauzas> #topic Release Planning
16:07:52 <bauzas> #link https://releases.openstack.org/antelope/schedule.html
16:07:57 <bauzas> #info Antelope-2 is in 5 weeks
16:08:47 <bauzas> as a reminder, remember that the last December week(s) you could be off :)
16:09:04 <bauzas> so even if we have 5 weeks until A-2, maybe less for you :)
16:09:27 <sean-k-mooney> ya i dont know if we want to have another review day before then
16:09:27 <bauzas> should we do another spec review day before end of December, btw ?
16:09:48 <gibi> I vote for something after 13th of Dec :)
16:09:49 <bauzas> sean-k-mooney: we accepted to have an implementation review day around end of Dec, like Dec 20
16:09:49 <sean-k-mooney> well spec or implementation or both
16:10:06 <sean-k-mooney> that might be a bit late
16:10:14 <bauzas> for implementations, not really
16:10:20 <gibi> more specifically between 14th and 19th
16:10:25 <bauzas> as we only have a deadline for A-3
16:10:30 <sean-k-mooney> because of vacation
16:10:31 <bauzas> for specs, yes
16:10:38 <bauzas> ah
16:11:01 <bauzas> then, we could do a spec review day on Dec 13th
16:11:10 <gibi> 14 please
16:11:25 <gibi> there is an internal demo on 13th I will be busy with :)
16:11:34 <bauzas> and when should we be doing an implementation review day ?
16:11:56 <sean-k-mooney> ya I'm off from the 19th so 14th-16th for feature review
16:11:58 <sean-k-mooney> would be ok
16:11:59 <bauzas> gibi: haha, I don't see what you're saying :p
16:12:25 <gibi> yeah I support what sean-k-mooney proposes 14-16
16:12:31 <bauzas> gibi: as a reminder, last week, I was discussing here during the meeting while adding some slides for some internal meeting :p
16:12:47 <bauzas> you surely could do the same :D
16:12:54 <gibi> maybe a spec review day on 14th and an impl review day on 15th :)
16:13:05 <bauzas> if people accept to have two upstream days
16:13:11 <sean-k-mooney> that would work for me
16:13:12 <gibi> bauzas: I'm nowhere near your ability to multitask
16:13:17 <bauzas> during the same week
16:13:23 <sean-k-mooney> that leaves the 16th to wrap up stuff before pto
16:13:32 <gibi> yepp
16:13:42 <gibi> (I will be here on 19th too but off from 20th)
16:13:51 <bauzas> gibi: or then I'd prefer to have an implementation day once we're back
16:14:10 <bauzas> not all of us work upstream every day :)
16:14:10 <sean-k-mooney> we could but the idea was to have 2 of them
16:14:15 <sean-k-mooney> to take the pressure off m3
16:14:26 <sean-k-mooney> so one before m2 and one before m3
16:14:32 <gibi> I'm back on the 5th of Jan
16:14:55 <sean-k-mooney> so we could probably do one on the 10th of january
16:15:01 <sean-k-mooney> most will be around by then
16:15:09 <bauzas> sean-k-mooney: I don't disagree, I'm just advocating that some folks might not be able to have two review days in the same week
16:15:15 <sean-k-mooney> if we want to keep it aligned to the meeting days
16:15:28 <gibi> 10th works for me
16:16:17 <bauzas> gibi: we don't really need to align those review days to our meeting
16:16:26 <bauzas> gibi: but this is nice as a reminder
16:17:02 <gibi> so I think we are converging on Dec 14th as spec and Jan 10th as a code review day
16:17:36 <bauzas> I think this works for me
16:17:53 <bauzas> and we can have another implementation review day later after Jan 10th
16:18:15 <sean-k-mooney> ya sure that sounds workable
16:18:39 <bauzas> as a reminder, Antelope-3 (FF) is Feb 16th
16:18:55 <bauzas> more than 5 weeks after Jan 10th
16:19:09 <sean-k-mooney> there are still a few bits I would hope we can merge by the end of the year however; namely I would like to see us make progress on the PCI in placement series
16:19:20 <bauzas> sure
16:19:52 <sean-k-mooney> ok so I think we can move on for now
16:19:56 <bauzas> what we can do is say that we can review some changes by Dec 15th if we want to
16:20:23 <bauzas> that shouldn't be a specific review day, but people would know that *some* folks can review their changes by this day
16:20:39 <bauzas> anyway, I think we found a way
16:21:08 <gibi> yepp
16:21:11 <bauzas> #agreed Dec-14th will be a spec review day and Jan-10th will be an implementation review day, mark your calendars
16:21:41 <bauzas> #action bauzas to send an email about it
16:22:16 <bauzas> #agreed Some nova-cores can review some feature changes around Dec 15th, you now know about it
16:22:27 <gibi> :)
16:22:28 <bauzas> OK, that's it
16:22:43 <bauzas> moving on
16:22:50 <bauzas> (sorry, that was a long discussion)
16:22:54 <bauzas> #topic Review priorities
16:23:00 <bauzas> #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:23:05 <bauzas> #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:23:30 <bauzas> I'm happy to see people using it
16:23:56 <bauzas> that's it for that topic
16:24:00 <bauzas> next one
16:24:07 <bauzas> #topic Stable Branches
16:24:13 <bauzas> elodilles: your turn
16:24:16 <elodilles> ack
16:24:20 <elodilles> this will be short
16:24:23 <elodilles> #info stable branches seem to be unblocked / OK
16:24:27 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:24:30 <elodilles> that's it
16:25:58 <gibi> nice
16:26:14 <bauzas> was quick and awesome
16:26:36 <bauzas> last topic but not the least in theory,
16:26:45 <bauzas> #topic Open discussion
16:26:55 <bauzas> nothing in the wikipage
16:26:58 <bauzas> so
16:27:04 <bauzas> anything to discuss here by now ?
16:27:08 <gibi> -
16:27:17 <sean-k-mooney> did you merge skipping the failing nova-lvm tests yet
16:27:26 <sean-k-mooney> or is the master gate still exploding on that
16:27:30 <bauzas> I think yesterday we said we could discuss the test skips during this meeting
16:27:46 <bauzas> but given we merged gmann's patch, the ship has sailed
16:27:56 <sean-k-mooney> ack
16:28:02 <sean-k-mooney> so they are disabled currently
16:28:05 <bauzas> sean-k-mooney: see my ML thread above ^
16:28:06 <sean-k-mooney> the failing detach tests
16:28:15 <sean-k-mooney> ah ok will check after meeting
16:28:19 <bauzas> sean-k-mooney: you'll get the link to the gerrit change
16:28:20 <sean-k-mooney> nothing else from me
16:28:29 <auniyal> hand-raise: zuul frequent timeout issue/fails - this seems to be a resource issue, is it possible zuul resources can be increased ?
16:29:09 <bauzas> sean-k-mooney: tl;dr: yes we skipped the related tests but maybe they are actually not needed as you said
16:29:16 <sean-k-mooney> auniyal: not really, timeouts are not that common in our jobs
16:29:24 <bauzas> auniyal: see what I said above, we had problems with the gate very recently
16:29:26 <sean-k-mooney> auniyal: do you have an example
16:29:31 <auniyal> in the morning when there are fewer jobs running, if we run the same job it passes
16:29:40 <auniyal> like fewer than 20
16:29:47 <auniyal> right now 60 jobs are running
16:29:52 <sean-k-mooney> that should not really be a thing
16:30:03 <sean-k-mooney> unless we have issues with our ci providers
16:30:17 <bauzas> auniyal: if you speak about job results showing timeouts, agreed with sean-k-mooney, you should tell which ones so we could investigate
16:30:24 <sean-k-mooney> we occasionally have issues with slow providers but it's not normally correlated with the number of running jobs
16:30:29 <bauzas> yup
16:30:35 <auniyal> ack
16:30:38 <bauzas> timeouts are generally an infra issue
16:30:43 <bauzas> from a ci provider
16:30:50 <bauzas> but "generally"
16:31:04 <bauzas> which means sometimes we may have a larger problem
16:31:07 <sean-k-mooney> auniyal: do you have a gerrit link to a change where it happened
16:31:12 <dansmith> are they fips jobs?
16:31:31 <clarkb> bauzas: I'm not sure I agree with that statement
16:31:31 <sean-k-mooney> oh ya it could be that, did we add the extra 30 mins to the job yet
16:31:38 <clarkb> we have significant amounts of very inefficient test payload
16:31:55 <clarkb> yes slow providers make that worse, but we have lots of ability to improve things in the jobs just about every time I look
16:32:15 <sean-k-mooney> clarkb: we don't often see timeouts in the jobs that run on the nova gate
16:32:29 <sean-k-mooney> we tend to be well within the job timeout interval
16:32:45 <sean-k-mooney> that is not necessarily the same for other projects
16:32:46 <clarkb> (it is common for tempest jobs to dig into swap which slows everything down, devstack uses osc which is super slow because it gets a new token for every request and has python spin up time, ansible loops are costly with large numbers of entries and so on)
16:32:58 <auniyal> sean, I am trying to find a link but it's taking time
16:33:04 <clarkb> sean-k-mooney: yes swap is a common cause for the difference in behaviors and that isn't an infra issue
16:33:15 <clarkb> sean-k-mooney: and devstack runtime could be ~halved if we stopped using osc
16:33:25 <clarkb> or improved osc's startup and token acquisition time
16:33:31 <sean-k-mooney> clarkb: ack
16:33:36 <clarkb> I just want to avoid the idea it's an infra issue so ignore it
16:33:39 <sean-k-mooney> ya the osc thing is a long-running known issue
16:33:49 <clarkb> this assertion gets made often, then I go looking and there is plenty of job payload that is just slow
16:33:55 <sean-k-mooney> the parallel improvements dansmith did helped indirectly
16:33:55 <bauzas> clarkb: sorry, I was unclear
16:33:59 <auniyal> although, I have experienced this a lot: if my zuul job is not passing at night time (IST), even after a recheck, when I run it in the morning it passes
16:34:10 <bauzas> clarkb: I wasn't advocating about someone else's fault
16:34:52 <bauzas> clarkb: I was just explaining to some new nova contributor that given the current situation, we only have timeouts with nova jobs due to some ci provider issue
16:35:04 <clarkb> bauzas: right I disagree with that
16:35:13 <bauzas> clarkb: but I agree with you on some jobs that are wasting resources
16:35:13 <clarkb> jobs time out due to an accumulation of slow steps
16:35:21 <clarkb> some of those may be due to a slow provider or slow instance
16:35:30 <clarkb> but, it is extremely rare that this is the only problem
16:35:34 <sean-k-mooney> clarkb: we tend to be seeing an average runtime at about 75% or less of the job timeout in my experience
16:35:42 <clarkb> and I know nova tempest jobs have a large number of other slowness problems
16:35:55 <clarkb> sean-k-mooney: yes, but if a job digs deeply into swap it's all downhill from there
16:35:56 <bauzas> clarkb: that's a fair point
16:35:56 <sean-k-mooney> we have 2 hour timeouts on our tempest jobs and we rarely go above about 90 mins
16:36:05 <clarkb> suddenly your 75% typical runtime can balloon to 200%
16:36:08 <bauzas> except the fips one
16:36:14 <sean-k-mooney> clarkb: sure but I don't think we are
16:36:25 <sean-k-mooney> but it's something we can look at
16:36:40 <sean-k-mooney> auniyal: the best thing you can do is provide us an example and we can look into it
16:36:46 <clarkb> ++ to looking at it
16:36:47 <sean-k-mooney> and then see if there is a trend
16:37:10 <auniyal> ack
16:38:36 <bauzas> I actually wonder how we can track the trend
16:38:50 <sean-k-mooney> https://zuul.openstack.org/builds?project=openstack%2Fnova&result=TIMED_OUT&skip=0
16:38:56 <sean-k-mooney> that, but it's currently loading
16:39:17 <sean-k-mooney> we have a couple every few days
16:39:18 <bauzas> sure, but you don't have the time a SUCCESS job runs
16:39:28 <bauzas> which is what we should track
16:39:29 <clarkb> you can show both success and timeouts in a listing
16:39:35 <clarkb> (and failures, etc)
16:39:37 <sean-k-mooney> well we can change the result to filter both
16:39:51 <bauzas> the duration field, shit, missed it
16:40:01 <sean-k-mooney> we also haven't fixed the fips job
16:40:09 <sean-k-mooney> I'll create a patch for that I think
16:40:16 <bauzas> sean-k-mooney: I said I should do it
16:40:27 <sean-k-mooney> bauzas: ok please do
16:40:33 <bauzas> sean-k-mooney: that's simple to do and it's like 4 weeks since I promised it
16:41:13 <bauzas> sean-k-mooney: you know what ? I'll end this meeting by now so everyone can do what they want, including me writing a zuul patch :)
16:41:25 <sean-k-mooney> :)
16:41:35 <bauzas> having said it,
16:41:39 <bauzas> thanks folks
16:41:43 <bauzas> #endmeeting
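
One way to follow up on the trend-tracking question raised at 16:38:36 (comparing the durations of successful runs against the timed-out ones that sean-k-mooney's builds URL lists) is to query the Zuul builds API directly instead of reading the web listing. The sketch below is illustrative only and was not part of the meeting: the /api/builds endpoint and its project/job_name/result/limit parameters exist in Zuul's REST API, but the exact host path assumed for zuul.openstack.org and the example job name "nova-next" are assumptions, not something agreed here.

    # Hedged sketch: compare durations of recent SUCCESS vs TIMED_OUT runs of a
    # nova job via the public Zuul builds API. The endpoint path and the job
    # name below are assumptions for illustration, not confirmed by the meeting.
    import statistics

    import requests

    ZUUL_API = "https://zuul.openstack.org/api/builds"  # assumed tenant-scoped API path


    def fetch_durations(result, job_name="nova-next", limit=50):
        """Return durations (seconds) of the most recent builds with the given result."""
        params = {
            "project": "openstack/nova",
            "job_name": job_name,
            "result": result,
            "limit": limit,
        }
        builds = requests.get(ZUUL_API, params=params, timeout=30).json()
        # Each build record carries a "duration" field; skip entries without one.
        return [b["duration"] for b in builds if b.get("duration")]


    if __name__ == "__main__":
        ok = fetch_durations("SUCCESS")
        timed_out = fetch_durations("TIMED_OUT")
        if ok:
            print(f"SUCCESS:   n={len(ok)}, median={statistics.median(ok) / 60:.0f} min")
        print(f"TIMED_OUT: n={len(timed_out)} (each of these hit the job timeout)")

Running something like this periodically, or plotting the per-build durations over time, would show whether successful runs are creeping toward the 2-hour timeout that sean-k-mooney mentions, rather than only counting the builds that already timed out.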