15:00:09 #startmeeting tc
15:00:09 Meeting started Thu Apr 28 15:00:09 2022 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:09 The meeting name has been set to 'tc'
15:00:16 #topic Roll call
15:00:21 o/
15:00:22 tc-members meeting time
15:00:23 o/
15:00:24 o/
15:00:25 o/
15:00:40 o/
15:00:43 o/
15:00:46 o/
15:01:32 today's agenda #link https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
15:02:08 let's start
15:02:16 #topic Follow up on past action items
15:02:34 dpawlik to send the new dashboard communication on openstack-discuss ML
15:02:44 he sent that #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028346.html
15:02:46 already done
15:02:48 :)
15:02:50 thanks
15:03:02 we will talk about it in a separate topic too
15:03:12 gmann to schedule call on tick-tock release note question
15:03:29 I did not do this, I will do it today
15:03:58 rosmaita: dansmith what time is best for this discussion?
15:04:18 our next TC meeting is a video call on 5th May, do you want to cover it in that?
15:04:28 that sounds like a good plan
15:04:30 sure
15:04:51 ok, I will keep at least half an hour for this, or more if there are fewer topics
15:04:54 thanks
15:04:56 o/
15:05:16 that sounds like plenty of time
15:05:41 ++
15:05:49 +1
15:05:51 and in parallel I am checking with the foundation about trademark checks on the 'tick-tock' name, so we might be ready with that
15:06:34 I thought we were going to try to get away from the term tick-tock?
15:06:59 we said these are ok, but let's also check the legal side of the name
15:07:03 jungleboyj: we did?
15:07:20 Maybe that was the Cinder team that was complaining about the name.
15:07:21 tick-tock has the same meaning elsewhere in the industry, so we get the benefit of not inventing new words
15:07:31 yes, they did do that :)
15:07:31 jungleboyj: I think we only said check if there are concerns, since it seems connected to Intel.
15:07:46 yeah, if the trademark is ok then we will go with these names
15:07:52 Ok ...
15:08:05 arne_wiebalck: right
15:08:09 wes said he already heard back from legal counsel, but i guess he hasn't followed up with you yet
15:08:10 If they complain again I will explain accordingly. :-)
15:08:16 i'll prod him
15:08:20 jungleboyj: +1
15:08:25 if we can't use them, then we can't, but if we can, I think it is best to stick to the same terminology
15:08:37 agree
15:08:52 we need to hold the next ptg in new jersey so we can all meet at the Tick Tock Diner
15:09:05 intel failed to trademark 3, 8, and 6, but.. tick-tock, maybe they were successful :P
15:09:18 ooo i miss in-person ptgs.
15:09:44 next one might be. but not final
15:09:48 Cedar Rapids, IA would work as well. One of our favorite restaurants is called the 'Tick Tock'.
15:09:49 * arne_wiebalck checks the menu at the tick tock diner ...
15:10:00 :)
15:10:01 let's move to the next topic
15:10:11 #topic Gate health check
15:10:16 any news?
15:10:51 yeah, so,
15:11:02 I've been working on the perf stats collection thing, which has been ...
educational
15:11:29 but I realized that we're generating more than 100k rows of query log, which causes it to roll over, such that comparing two runs to each other is problematic
15:11:40 I upped the limit to 1m rows and we OOM
15:11:50 that's ironic
15:11:56 so I'm trying other limits, but I'm concerned we'll be adding fuel to the fire here at this point :/
15:12:13 I wish we could get query stat counters without logging friggin everything, but it doesn't look like we can
15:12:22 this is sql query logs specifically?
15:12:23 how about going with 'compare with static data' only?
15:12:38 gmann: that's the problem and what I'm trying to do
15:13:05 gmann: generating static data from one run where we only have "the last 100k queries, whatever those are" is a different set of data from "the last 100k queries of this run"
15:13:13 so the numbers change randomly, when they shouldn't
15:13:45 works fine for tempest smoke, but when you compare two full tempest runs, you get seemingly wide variations because you're comparing different sets of data
15:13:56 so anyway,
15:13:57 Ack sorry, I'm here!
15:13:59 and that is mainly because of polling API calls?
15:14:27 gmann: no, I thought that was what was generating the variation, but it's because we're rolling over our logs
15:14:40 gmann: I mean, there's some variation due to polling as well, but I'm nulling that out for the api counters now
15:14:50 ok. +1
15:14:54 we don't need to solve it here,
15:14:55 I remember
15:14:56 just giving an update
15:15:18 that I'm trying, and I thought the counter-based approach would be easier because it wouldn't be sensitive to performance variations, but of course, devil in the details
15:15:19 +1, I think this perf data is going to be a very important thing to check in our CI
15:15:57 other than that, I think there was some oslo policy thing breaking docs changes?
15:15:57 regarding rechecks I started "monitoring" them and I prepared a simple etherpad https://etherpad.opendev.org/p/recheck-weekly-summary
15:16:26 dansmith: I heard in the nova channel about the oslo policy break but did not check yet.
15:16:31 slaweq: what are the numbers, like 5,0?
15:16:52 gmann: I think whoami-rajat has a patch up, or so I heard
15:16:55 dansmith: average number of rechecks before patches in that project were merged
15:17:01 ack
15:17:09 I checked patches merged in the last week
15:17:11 slaweq: naked or non-naked?
15:17:35 so the two numbers are like check and gate pipeline?
15:17:36 what do You mean by naked?
15:17:49 recheck without a reason included in the comment
15:17:54 I basically count the number of "build failed" comments from zuul
15:17:55 "recheck" with no reason.. you are just counting all rechecks?
15:18:03 ah
15:18:06 ahh, I didn't check them
15:18:18 interesting that swift is so high
15:18:22 but in the patches which were rechecked the most times it was "naked"
15:18:35 ack
15:19:02 our message has not gotten through yet
15:19:05 next week I will add info about how many patches were merged
15:19:09 I still did not get what 0,37 or 5,0 means?
15:19:35 gmann: 0,37 is the average number of rechecks across all merged patches in the last week
15:19:38 average 5 rechecks to merge
15:19:58 so I checked each patch which was merged and counted in each of them how many "build failed" comments were on the last patchset
15:20:10 that's how many times it basically had to be rechecked before it was merged
15:20:13 0,37 meaning 37%?
15:20:26 and 5,0 meaning 500%?
and I count the average number of such rechecks across all patches
15:20:45 fungi: no, I think it means on average 5 rechecks to merge a patch
15:20:56 fungi: no, 5.0 means that all patches in swift on average were rechecked 5 times to get merged
15:21:04 so for example this 2,75. what does 2 denote and what does 75 denote?
15:21:13 2.75
15:21:17 it could be e.g. 2 patches - one rechecked 10 times and one merged at first try
15:21:23 oh is it . or ,
15:21:37 gmann: put on your eurogoggles :)
15:21:45 yeah, it should be with "." :)
15:21:56 is "build failed" the same as "recheck", or simply close enough? (a broken patch set will also result in "build failed" and then a fixed set is pushed and the build will work, no?)
15:21:59 it's a decimal number
15:21:59 eurogoggles :)
15:22:00 with , I thought it was two different numbers denoting two things
15:22:08 yeah, sorry for the confusion
15:22:13 got it now.
15:22:21 I will be better with it next week :)
15:22:38 "eurogoggles" - I like that :)
15:22:38 arne_wiebalck: build failed and rechecks are not the same for sure, and some people will resolve with a patch push instead of a recheck
15:22:43 yeah, so 0,37 is the same as 0.37 (37%), and 5,0 is the same as 5.0 (500%) like i said
15:22:51 so swift could be pushing lots of broken patches and be high on the list
15:22:54 without rechecking
15:23:02 dansmith: this is what I mean
15:23:09 yeah
15:23:10 arne_wiebalck: that's why I count build failed comments only on the last patchset
15:23:10 fungi: I don't see how these are percentages :)
15:23:14 the one which was merged
15:23:33 rechecks as 500% of patches
15:23:34 slaweq: oh, ok, thanks!
15:23:35 slaweq: ah, that's better
15:23:38 a.k.a. 5x
15:23:38 slaweq: yeah. but there are still more rechecks on patches in progress
15:24:10 gmann: but I'm checking only build failed on the last patchset of patches which are already merged
15:24:17 ok
15:24:21 so there are only "good" patches counted
15:24:25 and final ones
15:25:15 slaweq: so it is rather conservative, as there might be rechecks before
15:25:36 i'm confused about these numbers. the swift patch listed under "Patches with most rechecks from last week" (https://review.opendev.org/c/openstack/swift/+/837036) supposedly has 5 rechecks, but i see only 2. perhaps the non-check/gate pipelines should not be counted?
15:26:06 timburke_: yeah, now I see that I need to improve it a bit
15:26:19 it seems it counted build failed (arm64 pipeline) results too
15:26:31 and that probably shouldn't be the case
15:26:49 timburke_: the main idea is not to do a recheck without a comment or debugging #link https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
15:26:55 when I wrote that script there wasn't that arm64 pipeline yet :)
15:27:02 yes, looks like failures in non-voting pipelines were probably included
15:27:38 slaweq: can we just count the rechecks with no comment? I think that is what we target first as part of this work, to educate people on this.
15:27:51 slaweq: in case it's not clear, we're still very thankful that you're doing this, despite having some feedback about methodology :)
15:27:54 i agree limiting it to check and gate is prudent. failures in pipelines like promote or experimental are noise for this purpose
15:28:05 how many rechecks have a comment is a different question, more like 'why is this project not so stable, or what frequent failure do we need to solve'
15:28:18 got it. i can work on that.
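As a rough sketch of the counting approach slaweq describes above (for each patch merged in the last week, count Zuul's "Build failed" comments on the final patchset, then average per project), something like the following could be run against the Gerrit REST API. This is not the actual script behind the etherpad; the query window, the matched comment text, and the missing check/gate pipeline filtering are simplifying assumptions.

```python
import json
import requests  # assumed available; any HTTP client would do

GERRIT = "https://review.opendev.org"

def average_rechecks(project, days=7):
    """Average number of 'Build failed' comments on the final patchset
    of each change merged in `project` within the last `days` days."""
    resp = requests.get(
        f"{GERRIT}/changes/",
        params={
            # -age:7d approximates "merged in the last week"
            "q": f"status:merged project:{project} -age:{days}d",
            "o": ["MESSAGES", "CURRENT_REVISION"],
        },
    )
    # Gerrit prefixes JSON responses with ")]}'" to prevent XSSI
    changes = json.loads(resp.text.split("\n", 1)[1])

    counts = []
    for change in changes:
        last_ps = change["revisions"][change["current_revision"]]["_number"]
        failures = sum(
            1
            for msg in change.get("messages", [])
            if msg.get("_revision_number") == last_ps
            and "Build failed" in msg.get("message", "")
            # NOTE: as discussed above, a real script should also ignore
            # failures reported by pipelines other than check and gate.
        )
        counts.append(failures)
    return sum(counts) / len(counts) if counts else 0.0

print(round(average_rechecks("openstack/swift"), 2))
```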
fwiw, the other two failures fall into either repo mirror troubles (the probe test retry limit) or an eventlet deadlock bug (https://github.com/eventlet/eventlet/issues/742 -- i need to bug temoto for a release that includes the fix)
15:28:20 gmann: I will do something to check "naked" rechecks
15:28:33 timburke_: +1
15:28:34 gmann: well I think slaweq was also looking to find which projects are "recheck grinding" patches into the gate, which might further make it less stable
15:28:45 so I think both sets of data are useful
15:29:35 sure, if we can do both that is great. and at the end of the month or so we can push it to the ML or to projects, saying 'you are doing naked rechecks' and 'you are not that stable so you need to figure that out first'
15:30:09 gmann: I will prepare such data for each project for next week
15:30:19 but thanks to slaweq for starting it, it's a nice start. let's discuss after the meeting what we can improve
15:30:25 slaweq: thanks again.
15:30:56 on gate, one thing is that QA decided to drop centos-8-stream testing #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028321.html
15:31:20 if you know of any project jobs using it, please remove it or replace the testing with c9s
15:31:54 anything else on gate health?
15:32:49 #topic Retiring the status.openstack.org server
15:32:54 #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028279.html
15:33:42 there are two things not yet replaced which are proposed for retirement: 1. #link http://status.openstack.org/elastic-recheck/ 2. #link http://status.openstack.org/reviews/
15:33:59 for elastic recheck we will discuss in the next topic
15:34:19 for the review dashboard, does anyone use that? (or was using it, as it is broken now)
15:34:49 it's probably been broken for many months, i really have no idea when it started returning empty json
15:35:22 and nobody's replied to my message about it on the ml yet, so i'm expecting it's been entirely unused for a long time
15:35:34 oops, I didn't even know about it :)
15:35:44 but the question is, if we need it, can we fix it and bring it back up?
15:36:03 or is there no option for that, and we need to retire it as there is no help or infra resource?
15:36:22 We do have Upstream University content that refers to this. So we will need to remove/update that.
15:37:51 I'm not sure we mention rechecks but could add it
15:38:10 fungi: do we have the option to bring it up again if anyone wants it?
15:38:27 jungleboyj: refers to the reviewday query dashboard, or to something else?
15:40:07 spotz: I think we had information on rechecks. I know we refer to status.openstack.org for monitoring patches being checked and a mention of elastic recheck.
15:40:18 Anyway, not a show stopper in this discussion.
15:40:27 ahh, yes, well elastic-recheck is next on the agenda
15:40:36 yeah, many projects are also using that. we will discuss it next
15:42:24 let's wait for replies on the ML, but if we do not have the option to bring it up if anyone needs it, then I think we cannot do much. and whatever the opendev team (fungi) decides is ok.
15:42:35 moving to the next topic?
15:42:44 yeah, the plan outlined in the e-mail was to take it down tomorrow
15:42:56 ok
15:42:59 #topic Communicating the new ELK service dashboard and login information
15:43:16 #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028346.html
15:43:34 dpawlik: go ahead on updates and elastic-recheck possibilities?
15:44:48 so the TripleO project got its own version of e-r.
They containerized the service - https://opendev.org/opendev/elastic-recheck/src/branch/rdo
15:44:57 Still don't know if TripleO can handle that service or whether they have enough resources on that server to provide e-r for OpenStack.
15:46:05 If they can provide it I think that will be great, as we do not have an e-r service now.
15:46:26 by saying own version I mean http://ci-health.tripleo.org/
15:46:49 it's also maintained in a separate fork of the e-r tool (same repo but different branch)
15:47:13 clarkb: exactly, branch rdo
15:47:24 yeah, no idea why, but I think we can sync up with the TripleO team on this
15:48:13 it will be the best idea to get more information
15:48:13 dpawlik: that is ok, whatever server it runs on; as long as we get that in OpenStack it is fine. we do not have much resource in opendev now, so sharing from TripleO or another place is all good
15:48:34 any tc-members to sync up on this with the TripleO team?
15:49:14 especially redhat folks as they might know each other
15:49:54 I will talk
15:50:23 thanks, and i will check if any tc-member can join you also.
15:50:26 next week I should have a full answer
15:50:37 anything else dpawlik on this topic?
15:51:09 it's probably worth reiterating that the opendev team would like to continue to be able to support these use cases
15:51:28 the problem we're facing is that it seems it's more likely for projects to go and do it themselves and not contribute to the commons
15:51:50 I can probably help
15:51:55 clarkb: you mean the ELK service or e-r specifically?
15:51:59 I think it is a bug that tripleo forked the tool and did their own thing rather than work in the commons. But that ship has sailed for this particular instance
15:52:01 gmann: both
15:52:17 spotz: great, please coordinate with dpawlik.
15:53:28 clarkb: I am confused. do you mean that if anyone comes and maintains them in opendev, then we can go back on the decision of shutting those down, or that the opendev team can do it with current resources?
15:54:11 gmann: I mean a year ago we identified shutdown as a risk and said we need help. Since then tripleo forked the tools and has stood up a completely separate instance of things, ignoring the commons, which have now been shut down and partially replaced with opensearch
15:54:14 gmann: if those people joined the opendev team we'd have capacity to do more
15:54:33 right, if when we said "we need help" we got help instead of the tool getting forked, then we would be in a very different situation today
15:54:48 I think the ship has sailed and we should move on. But I don't want the impression to be, with the next thing, that we don't want help
15:54:53 ok. so it's the same situation and the reason you have to shut down
15:55:00 our position is still that help is what we need as the priority
15:55:04 we're doing our best otherwise
15:55:08 I agree more coordination from tripleo would be good here
15:55:28 from the other topic: it would be good to know what visualizations can be created that will be helpful for developers. I made simple ones, but they should be replaced. I will write an email in a few days
15:55:51 but maybe the opendev team needs to brainstorm how to get that help from projects and why they do not give it
15:56:03 dpawlik: +1, sure
15:56:35 dpawlik: I will keep this topic on the agenda and we will continue the discussion on e-r
15:56:39 moving next
15:56:48 #topic Open Reviews
15:56:59 #link https://review.opendev.org/q/projects:openstack/governance+is:open
15:57:12 I will quickly list them here
15:58:43 I checked and most of them are good to go.
but we need review on the FIPS goal milestone #link https://review.opendev.org/c/openstack/governance/+/838601
15:58:47 please check
15:59:07 that is all from my side. the next meeting is a video call. I will paste the link in the wiki.
15:59:29 thanks all for joining
15:59:33 #endmeeting