15:00:10 #startmeeting tc
15:00:10 Meeting started Thu Sep 30 15:00:10 2021 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:10 The meeting name has been set to 'tc'
15:00:15 o/
15:00:18 o/
15:00:21 fungi: clarkb dpawlik ping
15:00:23 #topic Roll call
15:00:24 o/
15:00:26 o/
15:00:26 mnaser is still presenting on openinfra.live
15:00:34 o/
15:01:57 hi
15:02:06 only four tc-members, let's wait one more minute.
15:02:48 o/ kinda?
15:03:06 let's start
15:03:08 #topic Follow up on past action items
15:03:15 gmann to remove 'TC tags analysis' from agenda
15:03:16 done
15:03:26 #topic Gate health check (dansmith/yoctozepto)
15:03:35 dansmith: yoctozepto go ahead
15:03:57 gates are unbelievably happy or I'm just oblivious to gate issues
15:04:05 :-)
15:04:10 :-)
15:04:12 after a lot of fixes :)
15:04:34 yeah, but it was much smaller than previous similar events
15:04:41 at least how I remember them
15:04:46 apache2 in the devstack uwsgi setup is fixed, which had blocked the gate for almost 1.5 days
15:05:03 we also faced two more issues: 1. nova ceph failures on stable and 2. nova grenade failures on master
15:05:04 MIT's fedora 34 mirror we sync from in opendev was broken; we just moments ago completed a switch to a mirror at uh.edu
15:05:06 both are fixed now
15:05:29 +1
15:05:48 did the dstat replacement issue ever get addressed?
15:05:53 also we've got another zuul restart coming up today for some more stabilization fixes in advance of xena release week
15:06:09 clarkb: no, unfortunately
15:06:32 the release is planned for next week, so let's keep eyes on the gate.
15:06:38 guess it's not bad enough
15:06:55 I'll look at it at some point for sure
15:07:24 thanks
15:07:28 anything else on the gate?
15:08:10 #topic Place to maintain the external hosted ELK, E-R, O-H services
15:08:20 As background and to refresh the memory:
15:08:56 As there was no maintainer for the ELK services in OpenDev and these services were proposed for shutdown, we raised the request for ELK service help with the community as well as with the Board of Directors.
15:09:21 Allison (BoD chair) is putting great effort into this, and it seems there is a good possibility (still under discussion) of getting a hosted ELK service from OpenSearch (an open source fork of Elasticsearch).
15:09:39 in addition to the hosted ELK, dpawlik volunteered to maintain it.
15:10:07 this etherpad has more details #link https://etherpad.opendev.org/p/elk-service-maintenance-plan
15:10:08 exactly gmann
15:10:23 Now two open questions for us:
15:10:32 let's go one by one
15:10:39 1. can OpenSearch fulfill our requirements for ELK service usage?
15:10:42 Note OpenSearch doesn't do the L in ELK. Only the E and K
15:10:53 ok
15:11:06 clarkb: fungi dpawlik you can give more details on this
15:11:25 can we use fluentd instead of logstash then?
15:11:31 and there's a bunch more tech in that suite openstack is relying on, elasticsearch is merely the queryable storage backend
15:11:41 gmann: perhaps on us as well, who knows
15:11:53 yoctozepto: yes, and also, OpenSearch is working on an alternative to logstash, it just isn't ready yet
15:12:05 wendar: hi
15:12:15 hi
15:12:18 yoctozepto: whoever's going to run it can presumably design it to use whatever ingestion software they want, just note that there's already a pile of logstash grok parsers
15:12:27 fungi: yes, basically "ELK, Elastic-Recheck, and OpenStack-Health services"
15:12:28 grok parser rules
15:12:52 fungi: yeah, I was wondering how much investment was done already
15:12:59 then a logstash clone would be better
15:12:59 and, there is also an existing plugin for logstash to write to OpenSearch, so that might be the simplest in terms of immediate migration
15:13:08 or that ^
15:13:19 yeah
15:14:58 any other challenges we see with the OpenSearch option?
15:14:59 * mnaser actually here
15:15:05 mnaser: hi
15:15:25 also opendev's sysadmins don't want to be responsible for custom notification bits for this, so we need to be able to drop the gearman event bits from our base job, and instead the ingestion should work by polling available apis
15:15:48 ingestion rules are a thing, fluentd stuff is a thing, there are a few other ways that we can get data into opensearch, which is probably going to have to be dealt with by whoever decides to pick this up tbh
15:16:06 mhm
15:16:08 gmann: by disabling the security plugin, we can have elasticsearch like it is running right now on opendev, but then the proxy rules need to filter operations like PUT/POST/DELETE
15:16:16 i figure whoever will be pushing for this will probably get to be the authoritative answer at the end of the day
15:16:19 Sorry I am late, was hosting an Open Infra Live episode
15:16:49 mnaser: yeah, we will be discussing that as the next item, which is important
15:17:09 dpawlik: however, according to the opensearch folks at amazon we talked to, kibana isn't going to work correctly through a read-only proxy. they suggested switching to a shared read-only account for queries
15:17:51 sorry I'm late
15:18:01 fungi: on rdo we share the read-only user account
15:18:04 shared read-only should work though
15:18:05 right, one of the biggest things that kept us from upgrading prior to losing help was that new ES needs new Kibana, and new Kibana can't operate anonymously
15:18:16 something like what we do for the openstack-health subunit database
15:18:19 opensearch addresses this by open sourcing AAA tooling
15:19:02 (though we actually have a query proxy in front of the mysql api for that database too, now that i think about it, i should add that to the list in the pad)
15:20:12 clarkb: that's why we do a trick to inject a header with user credentials to automatically log into kibana
15:21:01 clarkb: by saying "operate" you mean just making a query and checking visualizations, right?
15:21:17 right, they alluded to that on the opensearch call... basically embed http basic auth creds
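[Editor's note: the polling-based ingestion mentioned above (dropping the gearman event bits from the base job and instead polling available APIs) could look roughly like the Python sketch below, which polls Zuul's public builds API and indexes build metadata into an OpenSearch-compatible endpoint. The OpenSearch URL, index name, and credentials are hypothetical placeholders, not actual opendev/openstack values; this illustrates the approach being discussed, not an agreed implementation.]

    # Sketch only: poll completed builds from Zuul's REST API instead of
    # relying on gearman-pushed events, then index the build metadata.
    import requests

    ZUUL_BUILDS_API = "https://zuul.opendev.org/api/tenant/openstack/builds"
    OPENSEARCH_URL = "https://opensearch.example.org:9200"  # hypothetical endpoint
    INDEX = "ci-builds"                                     # hypothetical index name

    def poll_and_index(limit=50):
        # Fetch the most recent builds; Zuul exposes these without auth.
        builds = requests.get(ZUUL_BUILDS_API, params={"limit": limit},
                              timeout=30).json()
        for build in builds:
            doc = {
                "uuid": build.get("uuid"),
                "job_name": build.get("job_name"),
                "result": build.get("result"),
                "log_url": build.get("log_url"),
                "end_time": build.get("end_time"),
            }
            # One document per build, keyed on uuid so re-polling is idempotent;
            # a real ingester would batch via the bulk API and also fetch and
            # parse the job logs themselves.
            requests.put(
                f"{OPENSEARCH_URL}/{INDEX}/_doc/{doc['uuid']}",
                json=doc,
                auth=("ingest", "changeme"),  # hypothetical write account
                timeout=30,
            )

    if __name__ == "__main__":
        poll_and_index()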
15:21:38 dpawlik: no, it needed write access to the database, which we've removed with the proxy
15:22:10 anyway, it sounds like their authentication and authorization tooling allows sufficient fine tuning to make kibana work
15:22:23 from what they were saying, new kibana sort of "drops privileges" when a user authenticates to it, but needs essentially administrative access through the api for other functions
15:23:19 fungi: right, there are some indexes that the user should be able to have write access to
15:24:09 I don't think kibana will be an issue. And if it is, it sounds like there may be sufficient interest on the opensearch side for potentially improving things
15:24:22 true
15:24:31 clarkb: +1
15:24:36 The bigger issue is figuring out updating ingestion so that it works against a modern system. Then operating and maintaining that system
15:24:50 Since OpenSearch as a service leaves that to the user
15:24:57 wendar: clarkb how is their contract going to be, yearly?
15:25:46 Tom described it as "perpetual", and also said to chat with him if the current amount/year isn't enough (it probably will be)
15:25:54 clarkb: yeah. let's figure out where to place those, which we will discuss next.
15:26:11 wendar: ok
15:27:11 I think we seem ok with opensearch for our requirements, and later we can see which things we can truncate
15:27:41 now the next question is about operating it
15:27:44 2. Where to maintain the hosted ELK? OpenStack? or OpenDev?
15:28:12 Due to few/no long-term maintainers for ELK and more focus on key services, the OpenDev team decided to cease maintenance for the ELK, Elastic-Recheck, and OpenStack-Health services.
15:28:13 I think there is a third option: spin up a new namespace on opendev, because its use can be more generally applicable.
15:28:51 I added a few options at L16 https://etherpad.opendev.org/p/elk-service-maintenance-plan
15:28:59 i think we're well past the time for opendev to continue being responsible for these services. we tried, we asked for help, we finally decided it was too far gone and have planned to shut it down
15:29:09 fungi: agree
15:29:54 so the determination then becomes: where does it need to live and the tooling operate, and much of that would be implementation detail based on re-engineering, right?
15:30:24 yes, i believe so
15:30:26 at the moment, if something happens and the servers it's running on suffer a catastrophic failure, we have no way to rebuild what's there
15:30:43 current options:
15:30:49 Option1: Add it under the TACT SIG
15:30:52 Option2: Add it in OpenStack QA.
15:30:55 so in essence, we're kind of spinning up an entirely new project then
15:30:56 Option3: Start a new project or SIG.
15:31:07 or Option4: (for completeness) Stop relying on these services. but we want these, so let's keep that for later
15:31:16 Option2 is not feasible, we are already too small
15:31:31 re this tooling being generally applicable, I think it totally is and there is a lot of neat stuff that can be done in the space. But we've not seen interest from outside of openstack really
15:31:34 yoctozepto: yeah, same opinion from me too, and listed in the etherpad too
15:31:39 ok
15:31:52 clarkb: could that also be the connotation that comes with the openstack name?
15:32:18 TheJulia: maybe? What I've heard from the zuul devs is that it just isn't necessary for their workflows because Zuul's logging is much easier to sift through
15:32:28 gmann: but I can't see this in the etherpad, are you sure you wrote it down?
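[Editor's note: the proxy behaviour discussed above (filtering write methods like PUT/POST/DELETE, plus the trick of injecting credentials for a shared read-only account so users land in kibana already logged in) could be sketched as a small WSGI middleware like the one below. The account name, password, and allowed-method list are purely illustrative assumptions, not the RDO or opendev configuration, and the later comments in the discussion note kibana itself needs more than this.]

    # Sketch only: reject write methods at the proxy layer and embed HTTP basic
    # auth for a shared read-only account before handing off to the backend app.
    import base64

    READ_ONLY_METHODS = {"GET", "HEAD", "OPTIONS"}

    def read_only_auth_middleware(app, username="readonly", password="changeme"):
        token = base64.b64encode(f"{username}:{password}".encode()).decode()

        def middleware(environ, start_response):
            if environ.get("REQUEST_METHOD") not in READ_ONLY_METHODS:
                # PUT/POST/DELETE and friends are filtered out here.
                start_response("405 Method Not Allowed",
                               [("Content-Type", "text/plain")])
                return [b"read-only access\n"]
            # Inject the shared credentials so anonymous users can still query.
            environ["HTTP_AUTHORIZATION"] = "Basic " + token
            return app(environ, start_response)

        return middleware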
15:32:53 yoctozepto: ah, missed.
15:33:03 I think a certain scale is necessary before you cross the threshold where developer time is better spent learning lucene instead of grep
15:33:08 * yoctozepto glad it's not him being blind
15:33:21 done
15:33:25 thanks
15:33:31 the reason it's been useful for openstack is a matter of scale: there are numerous failures which occur a fraction of a percent of the time, but you need to run tens or hundreds of thousands of times to get a large enough sample size to isolate and classify them
15:33:57 yeah, scale is the key thing here
15:34:16 Zuul being a much smaller project also has a stronger sense of ownership for issues that pop up. If my tests fail due to some random reason, I go and debug it and hopefully fix it, and many of the zuul devs do that here
15:34:28 the other projects in opendev just don't have the change volume openstack does, and don't run similar integration tests nearly as many times, so they lack the sample size to make good use of this sort of solution
15:34:36 With many more devs in openstack, not everyone can be expected to do that, and rechecks happen, which allows a larger percentage of these things through long term
15:34:54 ++ on all the thoughts
15:35:28 ditto, +1. and with the integration scale openstack has it is natural.
15:35:40 ok, so we are left with two options now
15:35:51 TACT SIG or a new project/SIG under openstack
15:36:06 does it *have* to be under openstack?
15:36:14 for the TACT SIG, fungi, again you need to give your opinion.
15:36:37 TheJulia: it is better if openstack is the only user of it and no one else wants it
15:36:37 TheJulia: it doesn't have to be an openstack project or sig, no
15:36:57 fungi: then a new open infra project?
15:36:57 TheJulia: It doesn't have to be under openstack, but openstack is currently the only user, so it makes some sense, at least as a starting point.
15:37:06 it would look better under opendev to gather more folks around the solution
15:37:07 the software could still be generally applicable even if the instance of the service being run isn't used beyond openstack projects
15:37:14 gmann: it will never be able to gain external usage or adoption outside of openstack if we brand it openstack upfront
15:37:15 even if only for naming's sake
15:37:26 TheJulia ++
15:37:28 TheJulia: wow, that is an extremely pessimistic take
15:37:31 opendev seems not to be an option, so I think openstack is the default then
15:37:46 TheJulia: ++
15:37:55 or what other options do we have?
15:38:00 clarkb: It is pessimistic but probably realistic.
15:38:19 We can always move it out of openstack later, if/when we start to generalize the code.
15:38:22 sure, but OpenDev, the non-openstack location for doing this work, has said "hey, let's try and figure this out" for a few years now
15:38:31 wendar: exactly.
15:38:32 and literally we have had negative interest (the team has shrunk over that time)
15:38:36 the tact sig doesn't really have any "resources", it's effectively an umbrella to cover things people happen to be working on which didn't fit as opendev services but also didn't really make sense as an openstack project team
15:38:46 I don't think calling it something other than openstack helps either, is kinda my point there
15:38:51 clarkb: the name comes with a lot, good and bad unfortunately
15:38:58 fungi: we have a maintainer now, it's just that we need a place where it can better fit for now
15:39:12 The OpenSearch team even said they'd be willing to consider taking it on as a project, if it becomes generally useful.
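[Editor's note: to illustrate the lucene-versus-grep point made earlier in this discussion, the kind of query elastic-recheck relies on looks roughly like the Python sketch below, run against every indexed job rather than a single log file. The endpoint, index pattern, and failure signature are invented examples, not real tracked bugs or the actual service URLs.]

    # Sketch only: count hits for one failure signature across all indexed jobs.
    import requests

    query = {
        "size": 10,
        "query": {
            "query_string": {
                # Lucene-style query string, similar in spirit to the per-bug
                # queries elastic-recheck keeps; the message and filename here
                # are made up for illustration.
                "query": 'message:"Timed out waiting for reply" '
                         'AND filename:"logs/screen-n-cpu.txt"',
            }
        },
    }

    resp = requests.get(
        "https://opensearch.example.org:9200/logstash-*/_search",  # hypothetical
        json=query,
        timeout=30,
    )
    # grep over one job's logs finds one occurrence; this returns the sample
    # size across tens of thousands of runs, which is the scale argument above.
    print(resp.json()["hits"]["total"])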
15:39:29 wendar: that sounds like a great possibility
15:39:29 fungi: and if we again face a no-maintainer issue, then it goes away
15:39:31 (But, there's quite a bit of work to do before that would make sense anyway.)
15:39:34 we've been winding down all the services the opendev sysadmins were running outside of opendev itself, in order to be able to keep our core services in good shape. the only one which we don't currently have plans to discontinue (but would like to) is wiki.openstack.org
15:39:37 at least putting it in openstack, where people seem to care, would give it the potential to survive
15:39:57 clarkb: indeed
15:40:28 and if more projects care and we have more maintainers, then we can think of opendev or an external place again
15:40:50 so let's go with the current situation: 'only openstack needs/cares about it'
15:41:26 and openstack would not object if it moves to some central place for others to use
15:42:26 we have said in the past that we'd manage server resources at a low level for any project-specific efforts which needed them, but we stopped considering that viable some years back when it became apparent that people would build things which weren't long-term manageable, users would start to rely on them, then the services would be abandoned and the sysadmin group would be pressured to keep them running (ask.openstack.org as a prime example)
15:43:21 so we have (will have) a hosted service and a maintainer to maintain it, so the place hardly matters for now; starting in openstack makes sense to me, and later we can re-think the place based on the interest of other projects.
15:43:53 ok, so a SIG?
15:44:02 new SIG or TACT?
15:44:10 Debuggability SIG
15:44:14 wiki.openstack.org is perhaps an even better example. we had volunteers from wikimedia foundation running it, they left, it wasn't really set up compatible with how we did our configuration management, so now it's in a really precarious state where basically nobody knows how to tend to it, it's very outdated (probably rife with security holes) and we'd be hard pressed to rebuild it if something happened to the server it's on
15:44:39 fungi: good to know
15:44:56 fungi: sounds like everyone just needs to begin moving off of it (projects wise)
15:45:14 TheJulia: well, it's an example of what i don't want whatever this replacement is to become
15:45:19 Hey all
15:45:24 * TheJulia adds to ironic's PTG topic list
15:45:29 fungi: ++
15:46:49 I am fine with a new SIG if fungi thinks it is ok, and we leave the TACT SIG as it is per its current scope?
15:46:50 but for me the TACT SIG seems closer, and fungi is even there helping with the migration or any other questions, sharing the same IRC channel and so on
15:46:58 each time we've proposed discontinuing the wiki server over the years, there's been uproar but no new volunteers to fix the problem. and the elastic-recheck/openstack-health discontinuation is at risk of going down the same path (though hopefully not, as we've learned those lessons i think?)
15:47:48 i'm fine with the coordination happening through the tact sig, as a peg to hang the sign on, but just be aware that doesn't come with any guaranteed resources
15:47:50 openstack-health we can re-think whether it is needed or not. QA is not using it as much as it used to
15:47:58 fungi: yeah, we need to adapt and change processes, and it sounds like getting rid of the gearman bits would help, I hope at least.
15:48:11 gmann: in fact, I think the health services have been dead for a few months now and no one noticed :)
15:48:28 fungi: true, and same thing here: if there is a maintainer issue then yes, we re-think it. Will not put everything on you.
15:48:36 clarkb: :)
15:49:06 let me bring it up again in the QA meeting on 'who needs it'? i raised it a few months back but we did not decide
15:49:43 to be fair, half of elastic-recheck has been broken for months/years, so i don't think all of its functionality is being relied on, and sometimes the backend is down for a week with nobody noticing
15:50:16 fungi: yup, it tends to get more use during periods of time like now, with release candidates and stabilization and people caring about the gate
15:50:26 yeah
15:50:30 then no one cares during the 3 months of "merge everything as quickly as possible"
15:50:40 ok, so we are running out of time for the other agenda items
15:50:44 (btw, I think the TC should now concern itself with the wiki going down, to help the TACT SIG too)
15:51:22 so as a summary: 1. OpenSearch is a good option 2. the TACT SIG is the new place for hosted ELK maintenance
15:51:31 (as I doubt it's common knowledge)
15:51:45 any more things to discuss on this? though I will keep it on the agenda for the next meeting too
15:52:01 note it isn't just the wiki and the ELK stack. It is all the services, right? This is why I continue to try and remind people that helping OpenDev ensures this doesn't happen to a growing list of services. But I think many have sort of accepted that ship has sailed at this point?
15:52:29 thanks for joining and great discussion on this clarkb fungi dpawlik wendar
15:52:35 moving next
15:52:41 Now I don't think OpenStack should be the only project to help either. Zuul does a bunch of help but primarily on the CI side. Airship and StarlingX could probably be involved more
15:52:42 #topic Project Health checks framework
15:52:51 clarkb: indeed.
15:53:19 ricolin: I think as we discussed in the previous meeting, we are going to work on documentation in parallel to the current tool, right?
15:53:27 #link https://etherpad.opendev.org/p/health_check
15:54:00 Indeed we have that action
15:54:16 ricolin: ok, any more updates you would like to share today?
15:54:19 not started it yet
15:54:40 ok, no issue. if we can have something by the PTG then it is good to discuss there
15:54:55 NP
15:54:57 moving next
15:55:00 #topic OpenStack newsletter
15:55:31 we have two items there for the newsletter, please add more if you have something to highlight #link https://etherpad.opendev.org/p/newsletter-openstack-news
15:55:45 maybe by tomorrow as the deadline? diablo_rojo ?
15:56:22 You should probably have something about the new release?
15:56:25 Or maybe that is implied
15:56:50 clarkb: yeah, not sure when it is going to be published, before or after the release?
15:57:21 we usually add it explicitly if i remember correctly
15:57:32 gmann, actually it's tuesday
15:57:42 for me to write the content
15:57:46 so preferably Monday
15:57:48 lol
15:58:07 diablo_rojo: ok, then if it is published after the Xena release we can add it
15:58:10 #topic Open Reviews
15:58:17 #link https://review.opendev.org/q/projects:openstack/governance+is:open
15:58:45 this one is for review; please provide early feedback if you have any #link https://review.opendev.org/c/openstack/governance/+/810721
15:59:31 I expect there is going to be a focus on Xena in the next one if not this one. We can mention it, but I don't think I will be needing to spend a lot of time on that particular topic.
15:59:49 diablo_rojo: +1
16:00:11 one last thing: the next meeting is going to be a video call.
16:00:15 that's all from me today.
16:00:19 thanks all for joining
16:00:25 #endmeeting