15:00:08 #startmeeting tc 15:00:08 Meeting started Thu Apr 21 15:00:08 2022 UTC and is due to finish in 60 minutes. The chair is gmann. Information about MeetBot at http://wiki.debian.org/MeetBot. 15:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 15:00:08 The meeting name has been set to 'tc' 15:00:14 o/ 15:00:17 o/ 15:00:17 o/ 15:00:18 #topic Roll call 15:00:20 o/ 15:01:29 \o 15:02:01 slaweq will not be present as he informed before the meeting. kendall is in an Asia TZ so would not join. 15:02:01 o/ 15:02:31 knikolla arne_wiebalck rosmaita meeting time 15:02:37 o/ 15:02:58 Today's agenda #link https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee#Agenda_Suggestions 15:03:02 #topic Follow up on past action items 15:03:42 no action item as such, as this is the first meeting after PTG 15:03:51 #topic Zed cycle Tracker 15:04:03 #link https://etherpad.opendev.org/p/tc-zed-tracker 15:04:33 #link http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028206.html 15:04:49 as you know, I have prepared the zed cycle tracker 15:04:59 please check the items. 15:05:55 two items, 2nd and 3rd, need an assignee 15:06:30 I will wait for spotz arne_wiebalck Kendall to check first and then in the next meeting we can discuss it 15:06:54 idea is we all in TC distribute the items we target for the cycle 15:06:59 I added myself to the Documentation updates for release cadence. 15:07:20 I just added myself on to 2 15:07:30 Not 2 or 3 though:( 15:08:02 ok, looks like me & jay for #2 15:08:11 Also could work with rosmaita on the logging one. Did logging stuff back in the day. 15:08:24 +1 15:08:45 Should not sign up for more as I know it is going to be a very busy 6 months for me. 15:08:49 #3 is osc as in openstackclient yeah? 15:08:56 o/ 15:08:56 yes, that is important 15:09:23 jungleboyj: np!
15:09:31 I'm also signed up for multiple things, but I will be second fiddle on the osc one for sure 15:09:43 I think glance might be a tough sell on that one, but important :/ 15:09:54 dansmith: thanks 15:10:00 I will be second fiddle on the Release Cadence. 15:10:10 i need to look into the claim of cinderclient feature parity 15:10:14 The user survey analysis shouldn't take too much more time. 15:10:22 anyone else for osc one as primary assignee ? 15:10:31 rosmaita: ++ That does seem suspicious. 15:11:25 mainly to find someone to drive the goal which is difficult so mainly "drive the goal" 15:12:45 I thought keystone had already deprecated their CLI 15:12:58 I think so but not 100% 15:13:36 anyways I will keep it open if anyone wants to help with this. dansmith is already signed up to help but we need a primary assignee 15:13:52 anything else on tracker? 15:13:53 dmendiza[m]: Do you know if the Keystone CLI is deprecated? 15:14:28 gmann: I am not signed up for anything yet, so ... 15:14:34 arne_wiebalck: osc! 15:14:45 ++ 15:15:03 ++:) 15:15:04 woooot 15:15:08 arne_wiebalck: great, thanks. 15:15:24 and we will check the progress every alternate week on each item 15:15:43 arne_wiebalck: you are the lucky winner! 15:16:04 moving on.. 15:16:05 *gets a "hot potato" feeling* 15:16:10 :) 15:16:15 #topic Gate health check 15:16:29 any news on gate. 15:16:39 so, I just heard in the glance meeting that their periodic cs9 fips job is failing off and on 15:16:49 and a quick look showed another volume test 15:17:02 looks like cinder-volume is unable to allocate memory, 15:17:06 but I don't think there's an OOM 15:17:13 ok maybe we need to apply the SSH-able workaround in more tests? 15:17:19 ok 15:17:20 no, I think it's memory-related 15:17:25 I see 15:17:26 which is a great transition to....
https://review.opendev.org/c/openstack/devstack/+/837139 15:17:46 +1 15:17:51 this ^ might help us see that one service is really fat on cs9 or something and help explain the difference 15:17:55 in devstack we have c9 as voting, need to check if that is blocked 15:17:56 https://github.com/bloomberg/memray was getting traction on hacker news yesterday as a python memory profiler 15:18:10 gmann: it's not 100% apparently, but yeah 15:18:17 might be a good subsequent thing to look into once dansmith's tracking has landed and offenders are identified 15:18:18 k 15:18:23 cool 15:18:43 was thinking we might be able to have a profiler like that as a non default devstack service that we can turn on for specific changes or something 15:18:48 dansmith: how will we track some service going high in memory? compare with a few previous runs' data? 15:18:50 then you don't really even need to run it locally if you don't want to 15:18:53 clarkb: yeah 15:19:17 or capture the current usage somewhere and use that as a threshold 15:19:21 gmann: still working on that for total automation, but I'm going to work on at least a simple tool that will let you compare two runs and highlight major differences 15:19:32 dansmith: great 15:19:43 gmann: the easy thing to do is to just establish some baselines and then warn if you're 5% over that (per job type) or something 15:20:03 yeah, that will be good 15:20:07 but clarkb is also maybe going to help me with some grafana stuff and dpawlik is digesting the json file into opensearch 15:21:01 #link https://review.opendev.org/c/openstack/ci-log-processing/+/838655 15:21:05 dansmith: the logscraper patch is in review https://review.opendev.org/c/openstack/ci-log-processing/+/838655 15:21:12 ++ 15:21:22 this is a nice direction.
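[The baseline-comparison idea dansmith describes above (compare two runs, warn when a service is 5% over a per-job baseline) could be sketched roughly as below. This is a minimal illustration only: the report structure and field names are assumptions, not devstack's actual performance.json schema.]

```python
# Sketch: flag services whose memory usage grew past a threshold,
# comparing a baseline report against the current run's report.
# The {"services": [{"service": ..., "rss": ...}]} shape is assumed
# for illustration, not the real performance.json format.

THRESHOLD = 0.05  # warn when usage is more than 5% over baseline


def compare_reports(baseline, current, threshold=THRESHOLD):
    """Return {service_name: (baseline_rss, current_rss)} for offenders."""
    base = {s["service"]: s["rss"] for s in baseline["services"]}
    offenders = {}
    for svc in current["services"]:
        old = base.get(svc["service"])
        if old and svc["rss"] > old * (1 + threshold):
            offenders[svc["service"]] = (old, svc["rss"])
    return offenders


if __name__ == "__main__":
    # Toy data echoing the c-bak discussion later in the meeting.
    baseline = {"services": [{"service": "c-bak", "rss": 900_000_000},
                             {"service": "n-api", "rss": 300_000_000}]}
    current = {"services": [{"service": "c-bak", "rss": 1_000_000_000},
                            {"service": "n-api", "rss": 305_000_000}]}
    for name, (old, new) in compare_reports(baseline, current).items():
        print(f"{name}: {old} -> {new} (+{(new - old) / old:.0%})")
```

[In practice the baselines would be established per job type from previous runs, as suggested above; the 5% figure is the one floated in the discussion.]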
anyway, that's all I know of current state-of-the-gate 15:21:46 git things are also merged in all supported branches and I think on EM also 15:22:03 when it's merged, I will recreate the container image and deploy the latest version, so soon there should be some results in the Opensearch 15:22:11 I saw the recheck on T this morning so I assume that landed but didn't check and if so, all good on the git stuff yeah 15:22:31 dpawlik: we have to merge the actual change to generate the file too, but yeah awesome :) 15:22:51 yeah all merged for git things- https://review.opendev.org/q/topic:bug%252F1968798 15:22:55 #link https://review.opendev.org/q/topic:bug%252F1968798 15:23:00 sweet 15:23:04 dpawlik: +1 15:23:38 anything else on gate? 15:23:55 not from me 15:23:58 my, my, c-bak really is a memory hog 15:24:15 as is privsep 15:24:25 we were disabling that by default in devstack, is that merged? 15:24:41 gmann: no the change is still WIP'd I think it failed because there is some testing of c-bak in tempest? 15:24:46 #lin https://review.opendev.org/c/openstack/devstack/+/837630 15:24:46 it will need more coordination to land that 15:24:48 #link https://review.opendev.org/c/openstack/devstack/+/837630 15:25:16 yeah, I think we can run those in a separate job, I am sure cinder has a few c-bak specific jobs 15:25:18 I removed my WIP feel free to update if we want to keep pushing that 15:25:24 +1 15:25:27 rosmaita: yeah a gig is a lot :/ 15:25:37 rosmaita: it's on you I think ^^ 15:26:33 we are waiting for your opinion there but let's discuss it in gerrit 15:26:47 yeah, i will see if we can move the c-bak tests to cinder-tempest-plugin or something 15:27:07 or just run from tempest location. 15:27:18 see, performance.json is already helping and it isn't even merged.
:P 15:27:39 * dansmith whispers in kopecmartin's ear 15:27:44 let's discuss that in review or in the qa channel 15:27:48 #topic Migration from old ELK service to new Dashboard 15:28:16 before we talk about the shutdown of the old ELK server, let's check the new dashboard 15:28:16 I guess I can just jump in? 15:28:17 Communicating the new ELK service dashboard and login information 15:28:20 ah I'll wait 15:28:21 yeah 15:28:33 dpawlik: if you can update here what is the state of the new dashboard 15:29:23 so the whole cluster is working normally without any issue so far 15:29:49 I created a few simple visualizations and one dashboard, but it just shows basic information, nothing else 15:30:08 in the future I have a plan to create more visualizations/a better dashboard that will be helpful for developers 15:30:50 +1 15:31:00 the auto login is.... crazy to implement. I spent some time on doing it and I did not create a good configuration that will do autologin 15:31:23 I would assume that is a lower priority since sharing credentials isn't the end of the world 15:31:30 also backup/restore of kibana objects like visualization/dashboard should be done 15:32:01 yeah, autologin will be good to have but not a blocker or priority thing. 15:32:32 the logstash service is not needed. The logscraper + logsender is working fine but will be good to have some feedback from the community if some builds are not there 15:33:58 and dpawlik has updated the login/url information in the tact sig and p-t-g 15:34:00 #link https://docs.openstack.org/project-team-guide/testing.html#checking-status-of-other-job-results 15:34:07 #link https://governance.openstack.org/sigs/tact-sig.html#opensearch 15:34:21 these places we can use to communicate to the community 15:34:37 I think the next step is to send it on the ML and ask the community to use it. dpawlik ? 15:35:18 yup, will be good.
I will write a message 15:35:19 #link https://review.opendev.org/833264 also raises the question of whether opendev will be able to get rid of the status.o.o server as well, the main thing i'm wondering at this point is whether people are still actively using http://status.openstack.org/reviews/ (reviewday) 15:35:47 yeah, we will discuss next what all is good to shut down 15:35:48 um, 15:36:02 how would we get rid of that.. we need it for the zuul dashboard at least right? 15:36:09 status.o.o I mean 15:36:10 dpawlik: thanks, Please send and thanks again for your work. 15:36:15 the zuul dashboard is zuul.opendev.org 15:36:30 #action dpawlik to send the new dashboard communication on openstack-discuss ML 15:36:31 oh, sorry I didn't realize it was redirecting there, I see 15:36:43 I still hit status.o.o by habit 15:36:45 status.openstack.org is what served the elastic recheck and openstack health dashboards, both of which are going away now 15:36:47 Shutdown of old ELK service 15:36:55 same as dansmith 15:36:58 clarkb: your turn 15:37:29 ok so about a year ago the opendev team called out the concern that the ELK systems were under maintained and running on old distro releases. We asked for help but said without help we would have to shut it down. We also said we'd do our best to keep it up through the yoga release 15:37:56 Yoga has happened and while we didn't get direct help openstack took on the effort of using opensearch to replace this service themselves 15:38:13 For this reason I think we're ready to go ahead and remove the service and servers from opendev. 15:38:46 There is another pressure pushing this along which is that zuul is changing the way executors run playbooks which means we cannot rely on firewall rules to protect the submission of indexing jobs to the service with iptables any longer 15:39:10 This gives us a bit of a hard deadline for removing things: when those zuul updates land.
But I'd like to go ahead and start shutting it down sooner (now) 15:39:10 As the new dashboard is ready to use and we are going to communicate it on the ML, we are good to shut down the old server. 15:39:26 any thoughts from other tc-members ? 15:39:41 yeah, I guess so 15:40:25 Makes sense. 15:40:27 dpawlik's new design is also a much safer approach than how the old gearman submission solution worked, so doesn't suffer the risks clarkb mentioned 15:40:35 the first step is to remove the gearman job submissions from the base zuul job. Then we can remove our configuration management and finally delete the servers 15:40:47 nice 15:40:47 i don't have any objection given dpawlik's work ... is there any big functionality we will lose? 15:40:47 But I expect things can move fairly quickly once we have a decision 15:41:08 rosmaita: I'm not sure elastic-recheck is hooked up to the new system (and it currently runs on status.o.o which fungi would like to also cleanup) 15:41:32 but elastic-recheck could be made to run against opensearch. Maybe hosting it on the server doing the processing? 15:41:46 yes, part of the goal was for us to not be running elastic-recheck in opendev 15:42:00 it's part of that whole set 15:42:06 ok 15:42:10 right basically all the related tools are 1) not maintained because they lack maintainers and 2) running on old systems with old config management that need to be removed 15:42:14 so is e-r moved to opensearch or does it need to be? 15:42:16 (logstash, kibana, elasticsearch, and so on) 15:42:31 dpawlik ^^ 15:42:42 2 +1 implies deletion. We started by asking for help to address 1 without deleting things but didn't get that so here we are :/ 15:43:19 rosmaita: there can be only one situation, that we need to ask Zuul if they can fix that. I mean: the logscraper is taking the latest 1k job builds that have been done, but the results of those jobs can be from a long time ago, so later it will not be processed.
I can explain that "bug" later 15:43:39 ok 15:44:07 so in opensearch, we will have log search and e-r right? 15:44:14 so the new deployment is not using any of the Opendev services 15:44:22 I mean logstash/gearman etc 15:44:40 gmann: I think you have log search today but no e-r 15:44:57 at least I'm unaware of a new e-r deployment talking to the new opensearch indexes 15:45:07 yeah, I am thinking what is the plan before we agree on e-r shutdown 15:45:29 specifically, the component which creates and publishes these graphs: http://status.openstack.org/reviews/ 15:45:36 from opendev's side the plan is we are shutting it down because it isn't sustainable or maintained :) 15:45:36 er, http://status.openstack.org/elastic-recheck/ 15:46:09 #link http://status.openstack.org/elastic-recheck/ 15:47:32 the source code and the queries for e-r aren't going away. If/when the openstack opensearch deployment wants to deploy e-r that should be doable 15:47:49 and I think tripleo has a branch with updates to work against more modern elasticsearch as well 15:47:51 because I think we need e-r when we talked about opensearch. or am I missing things here? 15:48:09 yeah, source code is there 15:48:23 gmann: I think there are two things. First is that yes e-r would be great to have. But second we literally cannot run this service any longer it must be shut down. That means someone somewhere somehow will need to address e-r 15:48:24 clarkb: so with the ELK shutdown, this also goes away http://status.openstack.org/elastic-recheck/ 15:48:30 it looks like the opendev/elastic-recheck repository where people were curating the logstash queries hasn't been touched in a year either 15:48:49 so i don't think it's providing much value at this point 15:48:55 we have tried for quite some time to convince that someone somehow somewhere to help us and keep running the service but that did not happen.
Instead a decision was made to address this without opendev :/ 15:49:00 it's certainly not classifying any bugs which are new in the past two cycles 15:49:13 and yes e-r is in the group of tools that are under maintained 15:49:26 tripleo decided that instead of maintaining it upstream they would essentially fork it on a branch dedicated to them 15:49:39 it's not ideal, but there isn't much that fungi and I can do about it considering the other demands on us 15:49:41 fork what, e-r? 15:49:42 I agree on the opendev perspective, just waiting for dpawlik if he has a plan to integrate it in the new system ? 15:49:49 dansmith: elastic-recheck 15:49:53 really.. 15:49:55 dansmith: yes tripleo soft forked e-r 15:50:12 #link https://opendev.org/opendev/elastic-recheck/commits/branch/rdo 15:50:21 I thought they said they would maintain the opendev/elastic-recheck repo only 15:50:22 they have some improvements that will work with Opensearch 15:50:49 maybe the plan is to merge the rdo branch into master once we take down status.o.o 15:51:04 they host it on their server 15:51:46 right, i'm saying the master branch is unmaintained and we're shutting down the instance we have of it, so what's in the rdo branch could realistically become the new master branch anyway 15:52:00 We can ping and see 15:52:06 ok, so back to shutdown of ELK 15:52:11 I need to check where it is hosted 15:52:22 dpawlik: I think in rdo servers. 15:52:35 yes, but I need to check utilization of that server 15:52:50 dpawlik: but the question is if openstack wants to move e-r to the new opensearch system, do we have a server for that? 15:52:55 ok 15:53:24 yeah, because whatever AWS servers/creds we have we need to see their consumption 15:54:06 we can replace the logstash with elasticsearch recheck and should be ok... 15:54:07 so if we shut down the ELK now we lose the e-r service and dpawlik will check the possibility to have it on the new system. then is it ok for ELK shutdown?
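[For context on the elastic-recheck mechanics discussed above: each tracked bug is a curated Lucene query string, and the tool counts recent failed builds whose indexed logs match it. A rough sketch of what re-pointing that at the new OpenSearch deployment might involve; the index pattern, field names, and URL below are illustrative assumptions, not the real deployment's schema.]

```python
# Sketch: build an OpenSearch search body for one elastic-recheck-style
# bug signature. Field names ("build_status", "@timestamp") and the
# index pattern are assumptions for illustration.

def build_bug_query(signature, hours=24):
    """Return a search body counting recent failed builds matching a signature."""
    return {
        "query": {
            "bool": {
                "must": [{"query_string": {"query": signature}}],
                "filter": [
                    {"term": {"build_status": "FAILURE"}},
                    {"range": {"@timestamp": {"gte": f"now-{hours}h"}}},
                ],
            }
        },
        "size": 0,  # only the hit count is needed, not the documents
    }

# Hypothetical usage against a live cluster (needs network and credentials):
# import requests
# body = build_bug_query('message:"Unable to allocate memory"')
# resp = requests.get("https://opensearch.example.org/logstash-*/_search",
#                     json=body)
# print(resp.json()["hits"]["total"])
```

[The graph generation on status.o.o runs statically every half hour, as clarkb notes below; a replacement could run the curated queries on the same cadence from the logscraper host.]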
15:54:10 the ci-log-processor system is running on a vm in one of opendev's donor clouds, we're just not managing it (dpawlik et al. have root access) 15:54:20 ok 15:54:34 so that's already not using aws credits 15:54:54 fungi: exactly. I added my team and attached the logscraper host into our monitoring system 15:55:24 so if they want to also integrate a new elastic-recheck replacement, they could do it on that server or another similar one 15:55:26 gmann: seems to be a good plan 15:55:44 i expect elastic-recheck uses almost no system resources to run anyway 15:56:07 it's just hosting queries which link out to kibana anyway 15:56:15 when it generates the graphs it can use some memory but ya most of the hard work is in elasticsearch 15:56:24 ok, so as agreement 1. we are ok to shut down the old ELK server even though e-r is not ready on the new system 2. dpawlik to investigate e-r on the new system 15:56:28 fungi: no those queries are not live. They are statically done every half hour 15:56:34 ahh, okay 15:56:34 tc-members any objection on that ^^ plan 15:56:58 no objection 15:57:05 I think that sounds fine to me. 15:57:22 sounds good to me 15:57:27 sounds good 15:57:30 ok 15:57:47 thanks! We'll probably remove the gearman job submissions today then and then work on systems cleanup tomorrow 15:57:55 ok, clarkb we have agreement on the old ELK server shutdown, please proceed on that 15:58:12 fungi: and on http://status.openstack.org/reviews/ and status.o.o let's continue the discussion in the next meeting ? 15:58:34 sure 15:58:36 fungi: but meanwhile if you can ask this on the ML if anyone needs it, that will be good input for that meeting 15:58:40 fungi: thanks 15:58:54 that's my plan, just wanted to double-check if anyone here knew of teams still relying on it 15:59:00 sure.
15:59:10 since it's really the last thing on that site now 15:59:19 skipping open reviews but tc-members can check those and review 15:59:27 #topic Open Discussion 15:59:33 tick-tock release notes feedback (rosmaita) 15:59:52 rosmaita: not sure how much time we need on this but we can extend the meeting a little bit 15:59:56 yeah, the cinder team had some concerns about what was on the etherpad 15:59:59 i will be quick 16:00:05 sure, 16:00:09 1. don't like the "synopsis" idea ("Last week on OpenStack, ..."), because it's really hard to know what is important and what's not (it's all important -- that's why we wrote the release note in the first place!) 16:00:19 2. how about the beginning of the release notes having a section, something like, "If you are upgrading from Tick, read these first (link to n-1 release notes)" 16:00:28 3. finally, important for all projects to do the same thing, or it will be a pain point for operators 16:00:37 that's all 16:00:50 :-) 16:03:00 :) 16:03:03 rosmaita: which one in https://etherpad.opendev.org/p/tc-zed-ptg ? 16:04:05 maybe we start tagging 'for-tick-too' while we do any release notes so we do not need to figure out at the end? 16:04:30 I really don't want to make reading the last release notes required, and if we did, no point in bringing things forward 16:04:33 well, the question is, aren't they all important? 16:04:39 but "recommending" that they read them is certainly good 16:04:42 well, reading the release notes isn't required 16:04:48 :D 16:04:53 no, all our release notes are not important :) 16:05:04 maybe in nova 16:05:07 :P 16:05:09 and not even all notes we think are important are important to everyone, 16:05:21 well, the problem is how do you tell the difference? 16:05:23 :-) I think recommending is good. If we recommend and they don't do it ... well, we can't force them.
16:05:32 let them read and decide for themselves 16:05:32 so I would think any upgrade-related ones are critical to highlight, security maybe, but not every single bug 16:05:46 +1, upgrade and deprecation ones if any 16:06:05 the important thing is that we don't *rely* on them reading them along with the current ones 16:06:13 if they read none, then they're out of luck, 16:06:13 or we start not putting non-important (not recommended to read) things in release notes :) 16:06:21 why are they going to read the current ones, though? 16:06:26 but we can't just say "we expect you've read about all the upgrade and deprecation things in the previous notes of the release you didn't run" 16:06:30 i mean, why are we writing them at all? 16:06:38 especially because sometimes those will be different in terms of what they need to do by the time the tick rolls around 16:07:07 yeah, with tick-tock they need to know what things happened in between these two releases 16:07:32 * arne_wiebalck needs to leave to pick up kids o/ 16:07:40 we're over time and I have to get to another thing 16:07:40 adding previous tock release links for reading does no harm 16:07:46 maybe we need a voice call to argue about this? 16:07:47 yeah, me too 16:08:01 +1, let's set up a call to figure out all these things 16:08:03 it's not a big deal until AA 16:08:11 so we don't need to settle this now 16:08:22 but we need to agree on things well before 16:08:30 let me check about a call later 16:08:32 yeah, but still plenty of time 16:08:36 really CC is when we need to care 16:08:36 and end today's meeting 16:08:38 ++ 16:09:03 #action gmann to schedule call on tick-tock release note question 16:09:06 thanks all for joining 16:09:09 #endmeeting