16:00:07 #startmeeting nova
16:00:07 Meeting started Tue Nov 29 16:00:07 2022 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 The meeting name has been set to 'nova'
16:00:12 hey folks
16:00:16 O/
16:00:22 o/
16:00:48 o/
16:01:16 o/
16:01:36 let me grab a coffee and we'll start
16:02:50 ok, let's start and welcome
16:03:00 #topic Bugs (stuck/critical)
16:03:03 o/
16:03:05 #info No Critical bug
16:03:07 o/
16:03:09 #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 16 new untriaged bugs (+5 since the last meeting)
16:03:15 #info Add yourself to the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:31 I know this was a busy week
16:03:40 any bugs to discuss?
16:03:49 (apart from the gate ones)
16:04:12 looks like not
16:04:20 elodilles: can you take the baton for the next bugs?
16:04:26 yepp
16:04:29 cool, thanks!
16:04:35 #info bug baton is being passed to elodilles
16:04:41 #topic Gate status
16:04:45 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:04:55 it was a busy week
16:05:17 #info ML thread about the gate blocking issues we had https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031357.html
16:05:30 kudos to the team for the hard work
16:05:34 it looks like the gate is back now
16:05:49 unfortunately, we had to skip some tests :(
16:06:23 but actually maybe they were not necessary :)
16:06:33 anyway
16:06:40 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:06:47 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:06:51 #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
16:06:51 nah, no test is necessary :) only the code needs to work :)
16:07:01 Arnaud Morin proposed openstack/nova master: Unbind port when offloading a shelved instance https://review.opendev.org/c/openstack/nova/+/853682
16:07:25 anything to discuss about the gate?
16:07:48 #topic Release Planning
16:07:52 #link https://releases.openstack.org/antelope/schedule.html
16:07:57 #info Antelope-2 is in 5 weeks
16:08:47 as a reminder, you could be off during the last week(s) of December :)
16:09:04 so even if we have 5 weeks until A-2, it may be less for you :)
16:09:27 ya, I don't know if we want to have another review day before then
16:09:27 should we do another spec review day before the end of December, btw?
16:09:48 I vote for something after the 13th of Dec :)
16:09:49 sean-k-mooney: we agreed to have an implementation review day around the end of Dec, like Dec 20
16:09:49 well, spec or implementation, or both
16:10:06 that might be a bit late
16:10:14 for implementations, not really
16:10:20 more specifically, between the 14th and the 19th
16:10:25 as we only have a deadline for A-3
16:10:30 because of vacation
16:10:31 for specs, yes
16:10:38 ah
16:11:01 then we could do a spec review day on Dec 13th
16:11:10 14th please
16:11:25 there is an internal demo on the 13th I will be busy with :)
16:11:34 and when should we be doing an implementation review day?
16:11:56 ya, I'm off from the 19th, so 14th-16th for feature review
16:11:58 would be ok
16:11:59 gibi: haha, I don't see what you're saying :p
16:12:25 yeah, I support what sean-k-mooney proposes, 14-16
16:12:31 gibi: as a reminder, last week I was discussing here during the meeting while adding some slides for some internal meeting :p
16:12:47 you surely could do the same :D
16:12:54 maybe a spec review day on the 14th and an impl review day on the 15th :)
16:13:05 if people accept having two upstream days
16:13:11 that would work for me
16:13:12 bauzas: I'm nowhere near your ability to multitask
16:13:17 during the same week
16:13:23 that leaves the 16th to wrap up stuff before PTO
16:13:32 yepp
16:13:42 (I will be here on the 19th too, but off from the 20th)
16:13:51 gibi: or then I'd prefer to have an implementation day once we're back
16:14:10 not all of us work upstream every day :)
16:14:10 we could, but the idea was to have 2 of them
16:14:15 to take the pressure off m3
16:14:26 so one before m2 and one before m3
16:14:32 I'm back on the 5th of Jan
16:14:55 so we could probably do one on the 10th of January
16:15:01 most will be around by then
16:15:09 sean-k-mooney: I don't disagree, I'm just pointing out that some folks might not be able to have two review days in the same week
16:15:15 if we want to keep it aligned with the meeting days
16:15:28 10th works for me
16:16:17 gibi: we don't really need to align those review days with our meeting
16:16:26 gibi: but it is nice as a reminder
16:17:02 so I think we are converging on Dec 14th as a spec review day and Jan 10th as a code review day
16:17:36 I think this works for me
16:17:53 and we can have another implementation review day later, after Jan 10th
16:18:15 ya, sure, that sounds workable
16:18:39 as a reminder, Antelope-3 (FF) is Feb 16th
16:18:55 more than 5 weeks after Jan 10th
16:19:09 there are still a few bits I would hope we can merge by the end of the year, however; namely, I would like to see us make progress on the PCI in placement series
16:19:20 sure
16:19:52 ok, so I think we can move on for now
16:19:56 what we can do is say that we can review some changes by Dec 15th if we want to
16:20:23 that wouldn't be a specific review day, but people would know that *some* folks can review their changes by that day
16:20:39 anyway, I think we found a way
16:21:08 yepp
16:21:11 #agreed Dec-14th will be a spec review day and Jan-10th will be an implementation review day, mark your calendars
16:21:41 #action bauzas to send an email about it
16:22:16 #agreed Some nova-cores can review some feature changes around Dec 15th, you now know about it
16:22:27 :)
16:22:28 OK, that's it
16:22:43 moving on
16:22:50 (sorry, that was a long discussion)
16:22:54 #topic Review priorities
16:23:00 #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2)
16:23:05 #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review
16:23:30 I'm happy to see people using it
16:23:56 that's it for that topic
16:24:00 next one
16:24:07 #topic Stable Branches
16:24:13 elodilles: your turn
16:24:16 ack
16:24:20 this will be short
16:24:23 #info stable branches seem to be unblocked / OK
16:24:27 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:24:30 that's it
16:25:58 nice
16:26:14 that was quick and awesome
16:26:36 last topic, but not the least in theory,
16:26:45 #topic Open discussion
16:26:55 nothing on the wiki page
16:26:58 so
16:27:04 anything to discuss here now?
16:27:08 -
16:27:17 did you merge skipping the failing nova-lvm tests yet
16:27:26 or is the master gate still exploding on that
16:27:30 I think yesterday we said we could discuss the test skips during this meeting
16:27:46 but given we merged gmann's patch, the ship has sailed
16:27:56 ack
16:28:02 so they are currently disabled
16:28:05 sean-k-mooney: see my ML thread above ^
16:28:06 the failing detach tests
16:28:15 ah ok, will check after the meeting
16:28:19 sean-k-mooney: you'll get the link to the gerrit change
16:28:20 nothing else from me
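For context on the skip mentioned above: one common way a known-bad tempest test is disabled while a bug remains open is tempest's skip_because decorator, which records the Launchpad bug in the skip reason. The sketch below is only a generic illustration of that mechanism; the class, test name, and bug number are placeholders, not the actual patch that was merged for the failing detach tests.

    # Generic illustration of skipping a tempest test while a gate bug is open.
    # The class, test name, and bug number below are placeholders.
    from tempest.api.compute import base
    from tempest.lib import decorators


    class VolumeDetachSkipExample(base.BaseV2ComputeTest):

        @decorators.idempotent_id('0e8f9a1c-3b5d-4c6e-8f70-123456789abc')
        @decorators.skip_because(bug='1111111')  # placeholder Launchpad bug number
        def test_detach_volume_from_shelved_server(self):
            # The decorator makes the runner skip this test and keeps the bug
            # number in the skip reason, so the skip stays traceable.
            pass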
16:28:29 hand-raise: frequent zuul timeout issues/failures - this seems to be a resource issue, is it possible the zuul resources can be increased?
16:29:09 sean-k-mooney: tl;dr: yes, we skipped the related tests, but maybe they are actually not needed, as you said
16:29:16 auniyal: not really, timeouts are not that common in our jobs
16:29:24 auniyal: see what I said above, we had problems with the gate very recently
16:29:26 auniyal: do you have an example?
16:29:31 in the morning, when there are fewer jobs running, if we run the same job it passes
16:29:40 like fewer than 20
16:29:47 right now 60 jobs are running
16:29:52 that should not really be a thing
16:30:03 unless we have issues with our CI providers
16:30:17 auniyal: if you're talking about job results showing timeouts, I agree with sean-k-mooney, you should tell us which ones so we can investigate
16:30:24 we occasionally have issues with slow providers, but it's not normally correlated with the number of running jobs
16:30:29 yup
16:30:35 ack
16:30:38 timeouts are generally an infra issue
16:30:43 from a CI provider
16:30:50 but "generally"
16:31:04 which means sometimes we may have a larger problem
16:31:07 auniyal: do you have a gerrit link to a change where it happened?
16:31:12 are they fips jobs?
16:31:31 bauzas: I'm not sure I agree with that statement
16:31:31 oh ya, it could be that; did we add the extra 30 mins to the job yet?
16:31:38 we have significant amounts of very inefficient test payload
16:31:55 yes, slow providers make that worse, but we have lots of ability to improve things in the jobs just about every time I look
16:32:15 clarkb: we don't often see timeouts in the jobs that run on the nova gate
16:32:29 we tend to be well within the job timeout interval
16:32:45 that is not necessarily the same for other projects
16:32:46 (it is common for tempest jobs to dig into swap which slows everything down, devstack uses osc which is super slow because it gets a new token for every request and has python spin-up time, ansible loops are costly with large numbers of entries, and so on)
16:32:58 sean, I am trying to find a link but it's taking time
16:33:04 sean-k-mooney: yes, swap is a common cause for the difference in behaviors and that isn't an infra issue
16:33:15 sean-k-mooney: and devstack runtime could be ~halved if we stopped using osc
16:33:25 or improved osc's startup and token acquisition time
16:33:31 clarkb: ack
16:33:36 I just want to avoid the idea that it's an infra issue so we ignore it
16:33:39 ya, the osc thing is a long-running known issue
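To make the osc point above concrete: each standalone `openstack` CLI invocation pays interpreter start-up plus a fresh keystone token request, while a long-lived SDK connection authenticates once and reuses that session for every call. A minimal sketch with openstacksdk, assuming a 'devstack' entry exists in the local clouds.yaml:

    # Minimal sketch: one authentication, many API calls on the same session.
    # Assumes a 'devstack' cloud entry in clouds.yaml.
    import time

    import openstack

    conn = openstack.connect(cloud='devstack')   # single auth round-trip

    start = time.monotonic()
    for _ in range(10):
        list(conn.compute.flavors())             # token is reused, no re-auth
    print(f"10 compute API calls on one session: {time.monotonic() - start:.2f}s")

    # Ten separate `openstack flavor list` invocations would each start a new
    # interpreter and fetch a new token, which is the overhead discussed above.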
16:33:49 this assertion gets made often, then I go looking and there is plenty of job payload that is just slow
16:33:55 the parallel improvements dansmith did helped indirectly
16:33:55 clarkb: sorry, I was unclear
16:33:59 although, I have experienced this a lot: if my zuul jobs are not passing at night time (IST), even after a recheck, when I run them in the morning they pass
16:34:10 clarkb: I wasn't saying it's someone else's fault
16:34:52 clarkb: I was just explaining to a new nova contributor that, given the current situation, we only have timeouts with nova jobs due to some CI provider issue
16:35:04 bauzas: right, I disagree with that
16:35:13 clarkb: but I agree with you that some jobs are wasting resources
16:35:13 jobs time out due to an accumulation of slow steps
16:35:21 some of those may be due to a slow provider or slow instance
16:35:30 but it is extremely rare that this is the only problem
16:35:34 clarkb: we tend to be seeing an average runtime at about 75% or less of the job timeout, in my experience
16:35:42 and I know nova tempest jobs have a large number of other slowness problems
16:35:55 sean-k-mooney: yes, but if a job digs deeply into swap, it's all downhill from there
16:35:56 clarkb: that's a fair point
16:35:56 we have 2 hour timeouts on our tempest jobs and we rarely go above about 90 mins
16:36:05 suddenly your 75% typical runtime can balloon to 200%
16:36:08 except the fips one
16:36:14 clarkb: sure, but I don't think we are
16:36:25 but it's something we can look at
16:36:40 auniyal: the best thing you can do is provide us an example and we can look into it
16:36:46 ++ to looking at it
16:36:47 and then see if there is a trend
16:37:10 ack
16:38:36 I actually wonder how we can track the trend
16:38:50 https://zuul.openstack.org/builds?project=openstack%2Fnova&result=TIMED_OUT&skip=0
16:38:56 that, but it's currently loading
16:39:17 we have a couple every few days
16:39:18 sure, but then you don't have the time a SUCCESS job runs
16:39:28 which is what we should track
16:39:29 you can show both success and timeouts in a listing
16:39:35 (and failures, etc)
16:39:37 well, we can change the result to filter both
16:39:51 the duration field, shit, I missed it
16:40:01 we also have not fixed the fips job
16:40:09 I'll create a patch for that, I think
16:40:16 sean-k-mooney: I said I should do it
16:40:27 bauzas: ok, please do
16:40:33 sean-k-mooney: that's simple to do, and it's been like 4 weeks since I promised it
16:41:13 sean-k-mooney: you know what? I'll end this meeting now so everyone can do what they want, including me writing a zuul patch :)
16:41:25 :)
16:41:35 having said that,
16:41:39 thanks folks
16:41:43 #endmeeting
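Following up on the duration-trend question from the open discussion: the zuul.openstack.org builds page is backed by a REST endpoint, so successful-run durations and recent timeouts can be pulled side by side instead of eyeballing the web listing. A rough sketch only; the endpoint and field names follow the public Zuul web API, so treat the exact parameters as assumptions and verify them against the Zuul API docs:

    # Rough sketch: compare SUCCESS durations with recent TIMED_OUT builds for
    # openstack/nova using the Zuul builds REST endpoint.
    import statistics

    import requests

    API = "https://zuul.openstack.org/api/builds"


    def fetch(result, limit=200):
        """Return recent openstack/nova builds with the given result."""
        params = {"project": "openstack/nova", "result": result, "limit": limit}
        resp = requests.get(API, params=params, timeout=60)
        resp.raise_for_status()
        return resp.json()


    durations = [b["duration"] for b in fetch("SUCCESS") if b.get("duration")]
    timed_out = fetch("TIMED_OUT")

    print(f"median successful duration: {statistics.median(durations) / 60:.1f} min")
    print(f"recent timeouts: {len(timed_out)}")
    for build in timed_out[:5]:
        print(build.get("job_name"), build.get("end_time"), build.get("log_url"))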