15:00:10 <gmann> #startmeeting tc
15:00:10 <opendevmeet> Meeting started Thu Sep 30 15:00:10 2021 UTC and is due to finish in 60 minutes.  The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:10 <opendevmeet> The meeting name has been set to 'tc'
15:00:15 <yoctozepto> o/
15:00:18 <jungleboyj> o/
15:00:21 <gmann> fungi: clarkb dpawlik ping
15:00:23 <gmann> #topic Roll call
15:00:24 <gmann> o/
15:00:26 <dpawlik> o/
15:00:26 <fungi> mnaser is still presenting on openinfra.live
15:00:34 <belmoreira> o/
15:01:57 <clarkb> hi
15:02:06 <gmann> only four tc-members, let's wait for 1 more min.
15:02:48 <mnaser> o/ kinda?
15:03:06 <gmann> let's start
15:03:08 <gmann> #topic Follow up on past action items
15:03:15 <gmann> gmann to remove 'TC tags analysis' from agenda
15:03:16 <gmann> done
15:03:26 <gmann> #topic Gate health check (dansmith/yoctozepto)
15:03:35 <gmann> dansmith: yoctozepto go ahead
15:03:57 <yoctozepto> gates are unbelievably happy our I'm just oblivious to gate issues
15:04:02 <yoctozepto> or*
15:04:05 <yoctozepto> :-)
15:04:10 <jungleboyj> :-)
15:04:12 <gmann> after lot of fixes :)
15:04:34 <yoctozepto> yeah, but it was much smaller than previous similar events
15:04:41 <yoctozepto> at least how I remembered them
15:04:46 <gmann> apache2 on devstack uwsgi setup is fixed which blocked gate for almost 1.5 days
15:05:03 <gmann> we also faced two more issue: 1. nova ceph on stable and 2. nova grenade on master failure
15:05:04 <fungi> mit's fedora 34 mirror we sync from in opendev was broken, we just moments ago completed a switch to a mirror at uh.edu
15:05:06 <gmann> both are fixed now
15:05:29 <gmann> +1
15:05:48 <clarkb> did the dstat replacement issue ever get addressed?
15:05:53 <fungi> also we've got another zuul restart coming up today for some more stabilization fixes in advance of xena release week
15:06:09 <yoctozepto> clarkb: no, unfortunately
15:06:32 <gmann> release is planned for next week so let's keep eyes on gate.
15:06:38 <yoctozepto> guess it's not bad enough
15:06:55 <yoctozepto> I'll look at it at some time for sure
15:07:24 <gmann> thanks
15:07:28 <gmann> anything else on gate?
15:08:10 <gmann> #topic Place to maintain the external hosted ELK, E-R, O-H services
15:08:20 <gmann> As background and to refresh the memory:
15:08:56 <gmann> As there is/was no maintainer to maintain ELK services in OpenDev and these services were proposed to shutdown, we raised the ELK service help in community as well as to Board of Directors.
15:09:21 <gmann> Allison(BoD chair) putting a great effort on this and it seems, there are good possibility (still under discussion) of getting hosted ELK service from OpenSearch (an open source fork of Elasticsearch).
15:09:39 <gmann> in addition to hosted ELK, dpawlik volunteer to maintain it.
15:10:07 <gmann> this etherpad has more details #link https://etherpad.opendev.org/p/elk-service-maintenance-plan
15:10:08 <dpawlik> exactly gmann
15:10:23 <gmann> Now two open questions on us:
15:10:32 <gmann> let's go one by one
15:10:39 <gmann> 1. is OpenSearch can fulfill our requirement for ELK service usage?
15:10:42 <clarkb> Note OpenSearch doesn't do the L in ELK. Only the E and K
15:10:53 <gmann> ok
15:11:06 <gmann> clarkb: fungi dpawlik you can give more details on us
15:11:25 <yoctozepto> can we use fluentd instead of logstash then?
15:11:28 <gmann> s/us/this
15:11:31 <fungi> and there's a bunch more tech in that suite openstack is relying on, elasticsearch is merely the queryable storage backend
15:11:41 <yoctozepto> gmann: perhaps on us as well, who knows
15:11:53 <wendar> yoctozepto: yes, and also, OpenSearch is working on an alternative to logstash, it just isn't ready yet
15:12:05 <gmann> wendar: hi
15:12:15 <wendar> hi
15:12:18 <fungi> yoctozepto: whoever's going to run it can presumably design it to use whatever ingestion software they want, just note that there's already a pile of logstash grok parsers
15:12:27 <gmann> fungi: yes, basically "ELK, Elastic-Recheck, and OpenStack-Health services"
15:12:28 <fungi> grok parser rules
15:12:52 <yoctozepto> fungi: yeah, I was wondering how much investment was done already
15:12:59 <yoctozepto> then a logstash clone would be better
15:12:59 <wendar> and, there is also an existing plugin for logstash to write to OpenSearch, so that might be the simplest in terms of immediate migration
15:13:08 <yoctozepto> or that ^
15:13:19 <gmann> yeah
15:14:58 <gmann> any other challenge we see on OpenSearch option?
15:14:59 * mnaser actually here
15:15:05 <gmann> mnaser: hi
15:15:25 <fungi> also opendev's sysadmins don't want to be responsible for custom notification bits for this, so we need to be able to drop the gearman event bits from our base job and instead the ingestion should work on polling available apis
15:15:48 <mnaser> ingestion rules are a thing, fluentd stuff is a thing, there's a few other ways that we can get data into opensearch which is going to have to probably be dealt with by whoever decides to pick this up tbh
15:16:06 <yoctozepto> mhm
15:16:08 <dpawlik> gmann: on disabling security plugin, we can have elasticsearch like it is running right now on opendev, but the proxy rules what operations like PUT/POST/DELETE needs to be filtered
15:16:16 <mnaser> i figure whoever will be pushing for this will probably get to be the authoritive answer at the end of the day
15:16:17 <dpawlik> by disabling*
15:16:19 <diablo_rojo> Sorry am late, was hosting an Open Infra Live episode
15:16:49 <gmann> mnaser: yeah we will be discussing that as next item which is important
15:17:09 <fungi> dpawlik: however, according to the opensearch folks at amazon we talked to, kibana isn't going to work correctly through a read-only proxy. they suggested switching to a shared read-only account for queries
15:17:51 <ricolin> sorry I'm late
15:18:01 <dpawlik> fungi: on rdo we share the read only user account
15:18:04 <gmann> shared read-only should work though
15:18:05 <clarkb> right one of the biggest things that kept us from upgraded prior to losing help was new ES needs new Kibana and new Kiabana can't operate anonymously
15:18:16 <fungi> something like what we do for the openstack-health subuinit database
15:18:19 <clarkb> opensearch addresses this by open sourcing AAA tooling
15:18:27 <fungi> s/subuinit/subunit/
15:19:02 <fungi> (though we actually have a query proxy in front of the mysql api for that database too, now that oi think about it, i should add that to the list in the pad)
15:20:12 <dpawlik> clarkb: that's why we do a trick to inject header with user credentials to automatically  login into the kibana
15:21:01 <dpawlik> clarkb: by saying "operate" you mean just to make a query and check visualizations, right?
15:21:17 <fungi> right, they alluded to that on the opensearch call... basically embed http basic auth creds
15:21:38 <clarkb> dpawlik: no, it needed write access to the database which we've removed with the proxy
15:22:10 <clarkb> anyway it sounds like their authentication and authorization tooling allows sufficient fine tuning to make kibana work
15:22:23 <fungi> from what they were saying, new kibana sort of "drops privileges" when a user authenticates to it, but needs essentially administrative access through the api for other functions
15:23:19 <dpawlik> fungi: right, there are some indexes that the user should be able to have write access to it
15:24:09 <clarkb> I don't think kibana will be an issue. And if it is it sounds like there may be sufficient interest on the opensearch side for potentailly improving things
15:24:22 <gmann> true
15:24:31 <dpawlik> clarkb: +1
15:24:36 <clarkb> The bigger issue is figuring out updating ingestion so that it works against a modern system. Then operating and maintaining that system
15:24:50 <clarkb> Since OpenSearch as a service leaves that to the user
15:24:57 <gmann> wendar: clarkb how is their contract going to be, yearly?
15:25:46 <wendar> Tom described it as "perpetual", and also said to chat with him if the current amount/year isn't enough (it probably will be)
15:25:54 <gmann> clarkb: yeah. let's figure out where to place those which we will discuss next.
15:26:11 <gmann> wendar: ok
15:27:11 <gmann> I think we seems ok on opensearch on our requirements and later we can see what all we can truncate  the things
15:27:41 <gmann> now next question is on operating it
15:27:44 <gmann> 2. Where to maintain the hosted ELK? OpenStack?  or OpenDev?
15:28:12 <gmann> Due to less/no long term maintainers for ELK and more focus on key services, the OpenDev team decided to cease maintenance for the ELK, Elastic-Recheck, and OpenStack-Health services.
15:28:13 <TheJulia> I think there is a third option, spin up a new namespace on opendev because its use can be more generally applicable.
15:28:51 <gmann> I added few option in L16 https://etherpad.opendev.org/p/elk-service-maintenance-plan
15:28:59 <fungi> i think we're well past the time for opendev to continue being responsible for these services. we tried, we asked for help, we finally decided it was too far gone and have planned to shut it down
15:29:09 <gmann> fungi: agree
15:29:54 <TheJulia> so the determination then becomes, where does it need to live and the tooling operate, and much of that would be implementation detail based on re-engineering right?
15:30:24 <fungi> yes, i believe so
15:30:26 <fungi> at the moment, if something happens and the servers it's running on suffer a catastrophic failure, we have no way to rebuild what's there
15:30:43 <gmann> current options:
15:30:49 <gmann> Option1: Add it under TACT SIG
15:30:52 <gmann> Option2: Add it in OpenStack QA.
15:30:55 <TheJulia> so in essence, we're kind of spinning an entirely new project then
15:30:56 <gmann> Option3: Start a new project or SIG.
15:31:07 <gmann> or Option4: (for completeness) Stop relying on these services.  but we want these so let's keep it for later
15:31:16 <yoctozepto> Option2 is not feasible, we are already too small
15:31:31 <clarkb> re this tooling being generally applicable, I think it totally is and tehre is a lot of neat stuff that can be done in the psace. But we've not seen interst from outside of openstack really
15:31:34 <gmann> yoctozepto: yeah, same opinion from me too and listed in etherpad too
15:31:39 <yoctozepto> ok
15:31:52 <TheJulia> clarkb: could that also be the connoation that comes with the openstack name?
15:32:15 <TheJulia> connatation
15:32:18 <clarkb> TheJulia: maybe? What I've heard from the zuul devs is that it just isn't necessary for their workflows because Zuul's logging is much easier to sfit through
15:32:28 <yoctozepto> gmann: but I can't see this in the etherpad, you sure you wrote it down?
15:32:53 <gmann> yoctozepto: ah, missed.
15:33:03 <clarkb> I think a certain scale is necessary before you cross the threshold where developer time is better spent learning lucene instead of grep
15:33:08 * yoctozepto glad it's not him being blind
15:33:21 <gmann> done
15:33:25 <yoctozepto> thanks
15:33:31 <fungi> the reason it's been useful for openstack is a matter of scale, there are numerous failures which occur a fraction of a percent of the time, but you need to run tens or hundreds of thousands of times to get a large enough sample size to isolate and classify them
15:33:57 <gmann> yeah, scale is all key thing here
15:34:16 <clarkb> Zuul being a much smaller project also has a stronger sense of ownership for issues that pop up. If my tests fail due to some random reason I go and debug it and hopefully fix it and many of the zuul devs do that here
15:34:28 <fungi> the other projects in opendev just don't have the change volume openstack does, and don't run similar integration tests nearly as many times, so they lack sample size to make good use of this sort of solution
15:34:36 <clarkb> With many more devs in openstack not everyone can be expected to do that and rechecks happen which allow a larger percentage of these things through long term
15:34:54 <yoctozepto> ++ on all the thoughts
15:35:28 <gmann> ditto, +1. and with the integration scale openstack has it is natural.
15:35:40 <gmann> ok so we left with two option now
15:35:51 <gmann> TACT SIG or new project/SIG under openstack
15:36:06 <TheJulia> does it *have* to be under openstack?
15:36:14 <gmann> for TACT SIG fungi its again you need to give opinion.
15:36:37 <gmann> TheJulia: it is better if openstack is only user of it and no one else want it
15:36:37 <fungi> TheJulia: it doesn't have to be an openstack project or sig, no
15:36:57 <gmann> fungi: then new open infra project ?
15:36:57 <wendar> TheJulia: It doesn't have to be under openstack, but openstack is currently the only user, so it makes some sense, at least as a starting point.
15:37:06 <yoctozepto> it would look better under opendev to gather more folks around the solution
15:37:07 <fungi> the software could still be generally applicable even if the instance of the service being run isn't used beyond openstack projects
15:37:14 <TheJulia> gmann: it will never be able to gain external usage or adoption outside of openstack if we brand it openstack upfront
15:37:15 <yoctozepto> even if only for naming sake
15:37:26 <yoctozepto> TheJulia ++
15:37:28 <clarkb> TheJulia: wow that is an extremely pessimistic take
15:37:31 <gmann> opendev seems not option so I think openstack is default then
15:37:46 <jungleboyj> TheJulia:  ++
15:37:55 <gmann> or what other option we have?
15:38:00 <jungleboyj> clarkb:  It is pessimistic but probably realistic.
15:38:19 <wendar> We can always move it out of openstack later, if/when we start to generalize the code.
15:38:22 <clarkb> sure but OpenDev the non openstack location for doing this work has said "hey lets try and figure this out" for a few years now
15:38:31 <gmann> wendar: exactly.
15:38:32 <clarkb> and literally we have had negative interest (the team has shrunk over that time)
15:38:36 <fungi> the tact sig doesn't really have any "resources", it's effectively an umbrella to cover things people happen to be working on which didn't fit as opendev services but also didn't really make sense as an openstack project team
15:38:46 <clarkb> I don't think calling it something other than openstack helps either is kinda my point there
15:38:51 <TheJulia> clarkb: the name comes with a lot, good and bad unfortunately
15:38:58 <gmann> fungi: we have maintainer now its just we need a place it acn better fit for now
15:39:12 <wendar> The OpenSearch team even said they'd be willing to consider taking it on as a project, if it becomes generally useful.
15:39:29 <TheJulia> wendar: that sounds like a great possibility
15:39:29 <gmann> fungi: and if we again face no maintainer issue then it goes away
15:39:31 <wendar> (But, there's quite a bit of work to do, before that would make sense anyway.)
15:39:34 <fungi> we've been winding down all the services the opendev sysadmins were running outside of opendev itself, in order to be able to keep our core services in good shape. the only one which we don't currently have plans to discontinue (but would like to) is wiki.openstack.org
15:39:37 <clarkb> at least putting it in openstack whee people seem to care would give it potential to survive
15:39:57 <gmann> clarkb: indeed
15:40:28 <gmann> and if more project care and we have more maintainer then we can things of opendev or at external place again
15:40:50 <gmann> so let's go with the current situation 'only openstack need/care about it'
15:41:26 <gmann> and openstack would not object if it goes at some central place for others to use
15:42:26 <fungi> we have said in the past that we'd manage server resources at a low level for any project-specific efforts which needed them, but we stopped considering that viable some years back when it became apparent that people would build things which weren't long-term manageable, users would start to rely on them, then the services would be abandoned and the sysadmin group would be pressured to
15:42:28 <fungi> keep them running (ask.openstack.org as a prime example)
15:43:21 <gmann> so we have (will have) hosted service and maintainer to maintain it so place hardly matter for now and starting in openstack make sense to me and later we can again re-think on place based on interest of other projects.
15:43:53 <yoctozepto> ok, so a SIG?
15:44:02 <gmann> new SIG or TACT?
15:44:10 <yoctozepto> Debuggability SIG
15:44:14 <fungi> wiki.openstack.org is perhaps an even better example. we had volunteers from wikimedia foundation running it, they left, it wasn't really set up compatible with how we did our configuration management, so now it's in a really precarious state where basically nobody knows how to tend to it, its very outdated (probably rife with security holes) and we'd be hard pressed to rebuild it if
15:44:16 <fungi> something happened to the server it's on
15:44:39 <yoctozepto> fungi: good to know
15:44:56 <TheJulia> fungi: sounds like everyone just needs to begin moving off of it (projects wise)
15:45:14 <fungi> TheJulia: well, it's an example of what i don't want whatever this replacement is to become
15:45:19 <spotz__> Hey all
15:45:24 * TheJulia adds to ironic's PTG topic list
15:45:29 <TheJulia> fungi: ++
15:46:49 <gmann> I am fine on new SIG if fungi think it is ok and leave TACT SIG as it is as per its current scope ?
15:46:50 <gmann> but for me TACT SIG seems more close and even fungi there helping on migration or any other question, sharing same IRC channel or so
15:46:58 <fungi> each time we've proposed discontinuing the wiki server over the years, there's been uproar but no new volunteers to fix the problem. and the elastic-recheck/openstack-health discontinuation is at risk of going down the same path (though hopefully not, as we've learned those lessons i think?)
15:47:48 <fungi> i'm fine with the coordination happening through the tact sig, as a peg to hang the sign on, but just be aware that doesn't come with any guaranteed resources
15:47:50 <gmann> openstack-health we can again re-think if needed or not. QA is not using it so much as it used to be
15:47:58 <TheJulia> fungi: yeah, we need to adapt and change processes, and it sounds like getting rid of the gearman bits would help, I hope at least.
15:48:11 <clarkb> gmann: in fact I think the health services has been dead for a few months now and no one noticed :)
15:48:28 <gmann> fungi: true, and same thing if there is maintainer issue then yes we re-think on it. Will not put everything on you.
15:48:36 <gmann> clarkb: :)
15:49:06 <gmann> let me bring it again in QA meeting on 'who need it' ? i raised it few month back but we did not decided
15:49:43 <fungi> to be fair, hald of elastic-recheck has been broken for months/years so i don't think all of its functionality is being relied on, and sometimes the backend is down for a week with nobody noticing
15:49:49 <fungi> s/hald/half/
15:50:16 <clarkb> fungi: yup it tends to get more use during periods of time like now with release candidates and stabilization and people caring about the gate
15:50:26 <gmann> yeah
15:50:30 <clarkb> then no one cares during the 3 months of merge everything as quickly as possible
15:50:40 <gmann> ok so are out of time for other agenda
15:50:44 <yoctozepto> (btw, I think TC should now concern itself with wiki going down to help TACT SIG too)
15:51:22 <gmann> so as summary 1. OpenSearch is good option 2. TACT SIG is new place for hosted ELK maintenance
15:51:31 <yoctozepto> (as I doubt it's common knowledge)
15:51:45 <gmann> any more things to discuss on this, though I will keep it in agenda for next meeting too
15:52:01 <clarkb> note it isn't just the wiki and the ELK stack. It is all the services right? This is why I continue to try and remind people that helping OpenDev ensures this doesn't happen to a growing list of services. But I think many have sort of accepted that ship has sailed at this point?
15:52:29 <gmann> thanks for joining and great discussion on this  clarkb fungi dpawlik wendar fungi
15:52:35 <gmann> moving next
15:52:41 <clarkb> Now I don't think OpenStack should be the only project to help either. Zuul does a bunch of help but primarily on the CI side. Airship and Starlginx could probably be involved more
15:52:42 <gmann> #topic Project Health checks framework
15:52:51 <gmann> clarkb: indeed.
15:53:19 <gmann> ricolin: I think as we discussed in previous meeting, we are going to work on documentation in parallel to current tool right?
15:53:27 <gmann> #link https://etherpad.opendev.org/p/health_check
15:54:00 <ricolin> Indeed we have that action
15:54:16 <gmann> ricolin: ok, any more updates or so you would like to share today?
15:54:19 <ricolin> not yet starts it
15:54:40 <gmann> ok, no issue. if we can have somthing by PTG then it is good to discuss there
15:54:55 <ricolin> NP
15:54:57 <gmann> moving next
15:55:00 <gmann> #toppic OpenStack newsletter
15:55:31 <gmann> we have two items there for newsletter, please add more if you think to highlight #link https://etherpad.opendev.org/p/newsletter-openstack-news
15:55:45 <gmann> may be by tomorrow as deadline? diablo_rojo ?
15:56:22 <clarkb> You should probably have something about the new release?
15:56:25 <clarkb> Or maybe that is implied
15:56:50 <gmann> clarkb: yeah not sure when it is going to publish before or after release ?
15:57:21 <gmann> we usually add it explicitly if i remember correctly
15:57:32 <diablo_rojo> gmann, actually its tuesday
15:57:42 <diablo_rojo> for me to write the content
15:57:46 <diablo_rojo> so preferably Monday
15:57:48 <diablo_rojo> lol
15:58:07 <gmann> diablo_rojo: ok then we can add it if it is released after Xena then add
15:58:10 <gmann> #topic Open Reviews
15:58:17 <gmann> #link https://review.opendev.org/q/projects:openstack/governance+is:open
15:58:45 <gmann> this one to review and provide early feedback if you have any #link https://review.opendev.org/c/openstack/governance/+/810721
15:59:31 <diablo_rojo> I expect there is going to be a focus on Xena in the next one if not this one. We can mention it but I don't think I will be needing to spend a lot of time on that particular topic.
15:59:49 <gmann> diablo_rojo: +1
16:00:11 <gmann> one last thing, next meeting is going to be Video call.
16:00:15 <gmann> that's all from me today.
16:00:19 <gmann> thanks all for joining
16:00:25 <gmann> #endmeeting