#openstack-tc log

15:00:09 <gmann> #startmeeting tc
15:00:09 <opendevmeet> Meeting started Thu Apr 28 15:00:09 2022 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:09 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:09 <opendevmeet> The meeting name has been set to 'tc'
15:00:16 <gmann> #topic Roll call
15:00:21 <slaweq> o/
15:00:22 <gmann> tc-members meeting time
15:00:23 <rosmaita> o/
15:00:24 <dpawlik> o/
15:00:25 <gmann> o/
15:00:40 <dansmith> o/
15:00:43 <knikolla> o/
15:00:46 <jungleboyj> o/
15:01:32 <gmann> today's agenda #link https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Next_Meeting
15:02:08 <gmann> let's start
15:02:16 <gmann> #topic Follow up on past action items
15:02:34 <gmann> dpawlik to send the new dashboard communication on openstack-discuss ML
15:02:44 <gmann> he sent that #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028346.html
15:02:46 <dansmith> already done
15:02:48 <dansmith> :)
15:02:50 <gmann> thanks
15:03:02 <gmann> we will talk about it in a separate topic too
15:03:12 <gmann> gmann to schedule call on tick-tock release note question
15:03:29 <gmann> I did not do this; I will do it.
15:03:58 <gmann> rosmaita, dansmith: what time is best for this discussion?
15:04:18 <gmann> our next TC meeting is a video call on 5th May; do you want to cover it in that?
15:04:28 <rosmaita> that sounds like a good plan
15:04:30 <dansmith> sure
15:04:51 <gmann> ok, I will keep at least half an hour for this, or more if there are fewer topics
15:04:54 <gmann> thanks
15:04:56 <arne_wiebalck> o/
15:05:16 <rosmaita> that sounds like plenty of time
15:05:41 <jungleboyj> ++
15:05:49 <slaweq> +1
15:05:51 <gmann> and in parallel I am checking with the Foundation about trademark checks on the 'tick-tock' name, so we might be ready with that
15:06:34 <jungleboyj> I thought we were going to try to get away from the term tick-tock ?
15:06:59 <gmann> we said these are ok, but let's also check the legal side of the name
15:07:03 <dansmith> jungleboyj: we did?
15:07:20 <jungleboyj> Maybe that was the Cinder team that was complaining about the name.
15:07:21 <dansmith> tick-tock has the same meaning elsewhere in the industry, so we get the benefit of not inventing new words
15:07:31 <dansmith> yes, they did do that :)
15:07:31 <arne_wiebalck> jungleboyj: I think we only said check if there are concerns since it seems connected to intel.
15:07:46 <gmann> yeah, if the trademark is ok then we will go with these names
15:07:52 <jungleboyj> Ok ...
15:08:05 <dansmith> arne_wiebalck: right
15:08:09 <fungi> wes said he already heard back from legal counsel, but i guess he hasn't followed up with you yet
15:08:10 <jungleboyj> If they complain again I will explain accordingly.  :-)
15:08:16 <fungi> i'll prod him
15:08:20 <gmann> jungleboyj: +1
15:08:25 <dansmith> if we can't use them, then we can't but if we can, I think it is best to stick to the same terminology
15:08:37 <gmann> agree
15:08:52 <rosmaita> we need to hold the next ptg in new jersey so we can all meet at the Tick Tock Diner
15:09:05 <dansmith> intel failed to trademark 3, 8, and 6, but.. tick-tock, maybe they were successful :P
15:09:18 <knikolla> ooo i miss in-person ptgs.
15:09:44 <gmann> the next one might be, but it's not final
15:09:48 <jungleboyj> Cedar Rapids, IA would work as well. One of our favorite restaurants is called the 'Tick Tock'.
15:09:49 * arne_wiebalck checks the menu at the tick tock diner ...
15:10:00 <gmann> :)
15:10:01 <gmann> let's move to next topic
15:10:11 <gmann> #topic Gate health check
15:10:16 <gmann> any news?
15:10:51 <dansmith> yeah, so,
15:11:02 <dansmith> I've been working on the perf stats collection thing, which has been ... educational
15:11:29 <dansmith> but I realized that we're generating more than 100k rows of query log, which causes it to roll over, such that comparing two runs to each other is problematic
15:11:40 <dansmith> I upped the limit to 1m rows and we OOM
15:11:50 <rosmaita> that's ironic
15:11:56 <dansmith> so I'm trying other limits, but I'm concerned we'll be adding fuel to the fire here at this point :/
15:12:13 <dansmith> I wish we could get query stat counters without logging friggin everything, but it doesn't look like we can
15:12:22 <fungi> this is sql query logs specifically?
15:12:23 <gmann> how about going with 'compare with static data' only?
15:12:38 <dansmith> gmann: that's the problem and what I'm trying to do
15:13:05 <dansmith> gmann: generating static data from one run where we only have "the last 100k queries, whatever those are" is a different set of data from "the last 100k queries of this run"
15:13:13 <dansmith> so the numbers change randomly, when they shouldn't
15:13:45 <dansmith> works fine for tempest smoke, but when you compare two full tempest runs, you get seemingly wide variations because you're comparing different sets of data
15:13:56 <dansmith> so anyway,
15:13:57 <spotz> Ack sorry, I'm here!
15:13:59 <gmann> and that is mainly because of polling API calls?
15:14:27 <dansmith> gmann: no, I thought that was what was generating the variation, but it's because we're rolling over our logs
15:14:40 <dansmith> gmann: I mean, there's some variation due to polling as well, but I'm nulling that out for the api counters now
15:14:50 <gmann> ok. +1
15:14:54 <dansmith> we don't need to solve it here,
15:14:55 <gmann> I remember
15:14:56 <dansmith> just giving an update
15:15:18 <dansmith> that I'm trying, and I thought the counter-based approach would be easier because it wouldn't be sensitive to performance variations, but of course, devil in the details
15:15:19 <gmann> +1, I think this perf data is going to be a very important thing to check in our CI
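The rollover problem dansmith describes can be illustrated in a few lines of Python. This is a hypothetical sketch only: the statement strings, the 100-row window, and `windowed_counts` are made up for illustration and do not model the actual query log used by the perf stats collection. It only shows why two runs that issue identical queries can report different counts once the log keeps just its newest N rows, while plain counters over the whole run stay stable.

```python
from collections import Counter, deque

def windowed_counts(queries, max_rows):
    """Count statements as seen through a log that keeps only the newest
    `max_rows` entries; older entries roll off the front."""
    log = deque(queries, maxlen=max_rows)
    return Counter(log)

# Hypothetical runs: run_b does exactly the same work as run_a,
# plus 30 extra polling SELECTs at the end.
run_a = ["UPDATE instances"] * 40 + ["SELECT instances"] * 60
run_b = run_a + ["SELECT instances"] * 30

# Plain counters over the whole run agree on the UPDATE count.
assert Counter(run_a)["UPDATE instances"] == Counter(run_b)["UPDATE instances"] == 40

# Through a 100-row window, run_b's extra rows push its first 30 UPDATEs
# out of the log, so the two runs appear to differ when they should not.
print(windowed_counts(run_a, 100)["UPDATE instances"])  # 40
print(windowed_counts(run_b, 100)["UPDATE instances"])  # 10
```

Raising the window fixes the comparison only while every run fits inside it, which is why raising the limit trades the rollover problem for memory pressure.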
15:15:57 <dansmith> other than that, I think there was some oslo policy thing breaking docs changes?
15:15:57 <slaweq> regarding rechecks I started "monitoring" them and I prepared simple etherpad https://etherpad.opendev.org/p/recheck-weekly-summary
15:16:26 <gmann> dansmith: I heard in the nova channel about the oslo policy break but have not checked it yet.
15:16:31 <dansmith> slaweq: what are the numbers, like 5,0?
15:16:52 <dansmith> gmann: I think whoami-rajat has a patch up, or so I heard
15:16:55 <slaweq> dansmith: average number of rechecks before patches in that project were merged
15:17:01 <gmann> ack
15:17:09 <slaweq> I checked patches merged in last week
15:17:11 <dansmith> slaweq: naked or non-naked?
15:17:35 <gmann> so are the two numbers for the check and gate pipelines?
15:17:36 <slaweq> what do You mean by naked?
15:17:49 <fungi> recheck without a reason included in the comment
15:17:54 <slaweq> I count basically number of "build failed" comments from zuul
15:17:55 <dansmith> "recheck" with no reason.. you are just counting all rechecks?
15:18:03 <dansmith> ah
15:18:06 <slaweq> ahh, I didn't check them
15:18:18 <dansmith> interesting that swift is so high
15:18:22 <slaweq> but in the patches which were rechecked the most, it was "naked"
15:18:35 <dansmith> ack
15:19:02 <rosmaita> our message has not gotten through yet
15:19:05 <slaweq> next week I will add info about how many patches were merged
15:19:09 <gmann> I still do not get what 0,37 or 5,0 means?
15:19:35 <slaweq> gmann: 0,37 is average number of rechecks in all merged patches in last week
15:19:38 <dansmith> average 5 rechecks to merge
15:19:58 <slaweq> so I checked each patch which was merged and counted in each of them how many "build failed" comments were on the last patchset
15:20:10 <slaweq> that many times basically it had to be rechecked before it was merged
15:20:13 <fungi> 0,37 meaning 37%?
15:20:26 <fungi> and 5,0 meaning 500%?
15:20:27 <slaweq> and I computed the average number of such rechecks across all patches
15:20:45 <dansmith> fungi: no, I think it means on average 5 rechecks to merge a patch
15:20:56 <slaweq> fungi: no, 5.0 means that all patches in swift were, on average, rechecked 5 times to get merged
15:21:04 <gmann> so for example this 2,75: what does 2 denote and what does 75 denote?
15:21:13 <dansmith> 2.75
15:21:17 <slaweq> it could be e.g. 2 patches - one rechecked 10 times and one merged at first try
15:21:23 <gmann> oh is it . or ,
15:21:37 <dansmith> gmann: put on your eurogoggles :)
15:21:45 <slaweq> yeah, it should be with "." :)
15:21:56 <arne_wiebalck> is "build failed" the same as "recheck", or simply close enough? (a broken patch set will also result in "build failed" and then a fixed set is pushed and the build will work, no?)
15:21:59 <slaweq> it's decimal number
15:21:59 <spotz> eurogoggles:)
15:22:00 <gmann> with the comma I thought it was two different numbers denoting two things
15:22:08 <slaweq> yeah, sorry for confusion
15:22:13 <gmann> got it now.
15:22:21 <slaweq> I will be better with it next week :)
15:22:38 <slaweq> "eurogoggles" - I like that :)
15:22:38 <dansmith> arne_wiebalck: build failed and rechecks are not the same for sure, and some people will resolve with a patch push instead of a recheck
15:22:43 <fungi> yeah, so 0,37 is the same as 0.37 (37%), and 5,0 is the same as 5.0 (500%) like i said
15:22:51 <dansmith> so swift could be pushing lots of broken patches and be high on the list
15:22:54 <dansmith> without rechecking
15:23:02 <arne_wiebalck> dansmith: this is what I mean
15:23:09 <gmann> yeah
15:23:10 <slaweq> arne_wiebalck: that's why I count build failed comments only on last patchset
15:23:10 <dansmith> fungi: I don't see how these are percentages :)
15:23:14 <slaweq> the one which was merged
15:23:33 <fungi> rechecks as 500% of patches
15:23:34 <arne_wiebalck> slaweq: oh, ok, thanks!
15:23:35 <dansmith> slaweq: ah, that's better
15:23:38 <fungi> a.k.a. 5x
15:23:38 <gmann> slaweq: yeah, but there are still more rechecks on in-progress patches
15:24:10 <slaweq> gmann: but I'm checking only build failed in the last patchset on patches which are already merged
15:24:17 <gmann> ok
15:24:21 <slaweq> so there are only "good" patches counted
15:24:25 <slaweq> and final ones
15:25:15 <arne_wiebalck> slaweq: so it is rather conservative as there might be rechecks before
15:25:36 <timburke_> i'm confused about these numbers. the swift patch listed under "Patches with most rechecks from last week" (https://review.opendev.org/c/openstack/swift/+/837036) supposedly has 5 rechecks, but i see only 2. perhaps the non-check/gate pipelines should not be counted?
15:26:06 <slaweq> timburke_: yeah, now I see that I need to improve it a bit
15:26:19 <slaweq> it seems it counted build failed (arm64 pipeline) results too
15:26:31 <slaweq> and that shouldn't be the case probably
15:26:49 <gmann> timburke_: main idea is not to do recheck without comment or debugging #link https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures
15:26:55 <slaweq> when I wrote that script there wasn't that arm64 pipeline yet :)
15:27:02 <fungi> yes, looks like failures in non-voting pipelines were probably included
15:27:38 <gmann> slaweq: can we just count the rechecks with no comment? I think that is what we are targeting first as part of this work, to educate people on this.
15:27:51 <dansmith> slaweq: in case it's not clear, we're still very thankful that you're doing this, despite having some feedback about methodology :)
15:27:54 <fungi> i agree limiting it to check and gate is prudent. failures in pipelines like promote or experimental are noise for this purpose
15:28:05 <gmann> how many rechecks have a comment is a different question: 'why is this project not so stable, or what frequent failure do we need to solve'
15:28:18 <timburke_> got it. i can work on that. fwiw, the other two failures fall into either repo mirror troubles (the probe test retry limit) or an eventlet deadlock bug (https://github.com/eventlet/eventlet/issues/742 -- i need to bug temoto for a release that includes the fix)
15:28:20 <slaweq> gmann: I will do something to check "naked" rechecks
15:28:33 <gmann> timburke_: +1
15:28:34 <dansmith> gmann: well I think slaweq was also looking to find which projects are "recheck grinding" patches into the gate that might further make it less stable
15:28:45 <dansmith> so I think both sets of data are useful
15:29:35 <gmann> sure, if we can do both that is great. At the end of the month or so we can push it to the ML or to projects, saying 'you are doing naked rechecks' and 'you are not that stable, so you need to figure that out first'
15:30:09 <slaweq> gmann: I will prepare such data for each project for next week
15:30:19 <gmann> but thanks to slaweq for starting it; it's a nice start. Let's discuss after the meeting what we can improve
15:30:25 <gmann> slaweq: thanks again.
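A minimal sketch of the counting rule as slaweq describes it, with the refinements raised in the meeting: count Zuul "Build failed" reports only on the final, merged patchset, only in the check and gate pipelines, and track naked rechecks (a bare "recheck" with no reason) separately. The `Comment`/`MergedChange` types, the message strings, and the `check-arm64` pipeline name are hypothetical stand-ins for data fetched from Gerrit; this is not slaweq's actual script.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Refinement from the meeting: only check and gate count, not arm64 etc.
COUNTED_PIPELINES = {"check", "gate"}

@dataclass
class Comment:
    author: str
    message: str
    pipeline: Optional[str] = None  # hypothetical: pipeline of a Zuul report

@dataclass
class MergedChange:
    project: str
    # Comments on the final (merged) patchset only.
    comments: List[Comment] = field(default_factory=list)

def failed_builds(change: MergedChange) -> int:
    """Zuul 'Build failed' reports in check/gate on the merged patchset."""
    return sum(
        1
        for c in change.comments
        if c.author == "zuul"
        and c.pipeline in COUNTED_PIPELINES
        and c.message.startswith("Build failed")
    )

def naked_rechecks(change: MergedChange) -> int:
    """Human 'recheck' comments that give no reason."""
    return sum(
        1
        for c in change.comments
        if c.author != "zuul" and c.message.strip().lower() == "recheck"
    )

def average_rechecks(changes: List[MergedChange], project: str) -> float:
    """Mean number of counted failures per merged change in `project`."""
    counts = [failed_builds(c) for c in changes if c.project == project]
    return sum(counts) / len(counts) if counts else 0.0

# Hypothetical week of merged swift changes.
noisy = MergedChange("swift", [
    Comment("zuul", "Build failed (check pipeline).", "check"),
    Comment("dev", "recheck"),
    Comment("zuul", "Build failed (check pipeline).", "check"),
    Comment("dev", "recheck probe test hit a mirror error"),
    Comment("zuul", "Build failed (check-arm64 pipeline).", "check-arm64"),
])
clean = MergedChange("swift")  # merged on the first try

print(f"{average_rechecks([noisy, clean], 'swift'):.2f}")  # 1.00
print(naked_rechecks(noisy))  # 1
```

Printing the average with a "." decimal separator (1.00 rather than 1,00) also avoids the comma confusion seen earlier in the discussion.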
15:30:56 <gmann> on the gate, one thing is that QA decided to drop centos-8-stream testing #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028321.html
15:31:20 <gmann> if you know of any project jobs still using it, please remove them or replace the testing with c9s
15:31:54 <gmann> anything else on gate health ?
15:32:49 <gmann> #topic Retiring the status.openstack.org server
15:32:54 <gmann> #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028279.html
15:33:42 <gmann> there are two things proposed for retirement which are not yet replaced: 1. #link http://status.openstack.org/elastic-recheck/ 2. #link http://status.openstack.org/reviews/
15:33:59 <gmann> for elastic recheck we will discuss in next topic
15:34:19 <gmann> for the review dashboard, does anyone use it? (or was anyone using it, as it is broken now)
15:34:49 <fungi> it's probably been broken for many months, i really have no idea when it started returning empty json
15:35:22 <fungi> and nobody's replied to my message about it on the ml yet, so i'm expecting it's been entirely unused for a long time
15:35:34 <slaweq> oops, I didn't even know about it :)
15:35:44 <gmann> but the question is: if we need it, can we fix it and bring it back up?
15:36:03 <gmann> or is there no option for that, and we need to retire it as there is no help or infra resource?
15:36:22 <jungleboyj> We do have Upstream University content that refers to this.  So we will need to remove/update that.
15:37:51 <spotz> I'm not sure we mention rechecks but could add it
15:38:10 <gmann> fungi: do we have the option to bring it up again if anyone wants it?
15:38:27 <fungi> jungleboyj: refers to the reviewday query dashboard, or to something else?
15:40:07 <jungleboyj> spotz: I think we had information on rechecks.  I know we refer to status.openstack.org for monitoring patches being checked and a mention of elastic recheck.
15:40:18 <jungleboyj> Anyway, not a show stopper in this discussion.
15:40:27 <fungi> ahh, yes, well elastic-recheck is next on the agenda
15:40:36 <gmann> yeah, many projects are also using that. We will discuss it next
15:42:24 <gmann> let's wait for replies on the ML, but if we do not have an option to bring it up when someone needs it, then I think we cannot do much, and whatever the opendev team (fungi) decides is ok.
15:42:35 <gmann> moving to next topic?
15:42:44 <fungi> yeah, the plan outlined in the e-mail was to take it down tomorrow
15:42:56 <gmann> ok
15:42:59 <gmann> #topic Communicating the new ELK service dashboard and login information
15:43:16 <gmann> #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028346.html
15:43:34 <gmann> dpawlik: go ahead with updates and elastic-recheck possibilities?
15:44:48 <dpawlik> so the TripleO project has its own version of e-r. They containerized the service - https://opendev.org/opendev/elastic-recheck/src/branch/rdo
15:44:57 <dpawlik> I still don't know if TripleO can handle that service, or if they have enough resources on that server to provide e-r for OpenStack.
15:46:05 <gmann> If they can provide it, I think that will be great, as we do not have an e-r service now.
15:46:26 <dpawlik> by saying own version I mean http://ci-health.tripleo.org/
15:46:49 <clarkb> it's also maintained in a separate fork of the e-r tool (same repo but different branch)
15:47:13 <dpawlik> clarkb: exactly, branch rdo
15:47:24 <gmann> yeah, no idea why, but I think we can sync up with the tripleo team on this
15:48:13 <dpawlik> that will be the best way to get more information
15:48:13 <gmann> dpawlik: it does not matter which server it runs on; as long as we get it for OpenStack it is fine. We do not have much resource in opendev now, so sharing from TripleO or another place is all good
15:48:34 <gmann> any tc-members to sync up on this with the TripleO team?
15:49:14 <gmann> especially Red Hat folks, as they might know each other
15:49:54 <dpawlik> I will talk
15:50:23 <gmann> thanks, and I will check if any tc-member can join you too.
15:50:26 <dpawlik> next week I should have full answer
15:50:37 <gmann> anything else dpawlik on this topic ?
15:51:09 <clarkb> it's probably worth reiterating that the opendev team would like to continue to be able to support these use cases
15:51:28 <clarkb> the problem we're facing is that it seems it's more likely for projects to go and do it themselves and not contribute to the commons
15:51:50 <spotz> I can probably help
15:51:55 <gmann> clarkb: you mean the ELK service or e-r specifically?
15:51:59 <clarkb> I think it is a bug that tripleo forked the tool and did their own thing rather than work in the commons. But that ship has sailed for this particular instance
15:52:01 <clarkb> gmann: both
15:52:17 <gmann> spotz: great, please coordinate with dpawlik .
15:53:28 <gmann> clarkb: I am confused. Do you mean that if someone comes and maintains them in opendev, we could go back on the decision to shut them down, or what the opendev team can do with current resources?
15:54:11 <clarkb> gmann: I mean a year ago we identified shutdown as a risk and said we need help. Since then tripleo forked the tools and has stood up a completely separate instance of things ignoring the commons which have now been shutdown and partially replaced with opensearch
15:54:14 <fungi> gmann: if those people joined the opendev team we'd have capacity to do more
15:54:33 <clarkb> right, if when we said "we need help" we had gotten help instead of the tool getting forked, we would be in a very different situation today
15:54:48 <clarkb> I think the ship has sailed and we should move on. But with the next thing, I don't want the impression to be that we don't want help
15:54:53 <gmann> ok, so it's the same situation and the reason you had to shut down
15:55:00 <clarkb> our position is still that help is what we need as the priority
15:55:04 <clarkb> we're doing our best otherwise
15:55:08 <gmann> I agree more coordination from tripleo would be good here
15:55:28 <dpawlik> from the other topic: it would be good to know what visualizations can be created that will be helpful for developers. I made a simple one, but it should be replaced. I will write an email in a few days
15:55:51 <gmann> but maybe the opendev team needs to brainstorm how to get that help from projects and why they do not provide it
15:56:03 <gmann> dpawlik: +1, sure
15:56:35 <gmann> dpawlik:  I will keep this topic in agenda and we will continue discussion on e-r
15:56:39 <gmann> moving next
15:56:48 <gmann> #topic Open Reviews
15:56:59 <gmann> #link https://review.opendev.org/q/projects:openstack/governance+is:open
15:57:12 <gmann> I will list the quick ones here
15:58:43 <gmann> I checked and most of them are good to go, but we need reviews on the FIPS goal milestone #link https://review.opendev.org/c/openstack/governance/+/838601
15:58:47 <gmann> please check
15:59:07 <gmann> that is all from my side. next meeting is on video call. I will paste the link in wiki.
15:59:29 <gmann> thanks all for joining
15:59:33 <gmann> #endmeeting