15:00:00 #startmeeting kolla
15:00:00 Meeting started Wed Sep 8 15:00:00 2021 UTC and is due to finish in 60 minutes. The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:01 The meeting name has been set to 'kolla'
15:00:12 #topic rollcall
15:01:59 \o
15:02:01 o/
15:03:35 .
15:04:24 o/
15:05:12 ok, I think it's time to start :)
15:05:17 #topic agenda
15:05:26 * Announcements
15:05:26 * Review action items from the last meeting
15:05:26 * CI status
15:05:26 * Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )
15:05:26 * DockerHub limits hurting users https://bugs.launchpad.net/kolla-ansible/+bug/1942134
15:05:28 * Future of Monasca - in context of CI failures and Elasticsearch switch to AWS Opensearch
15:05:28 * Release tasks
15:05:30 * Xena cycle planning
15:05:30 * Yoga PTG planning
15:05:32 * Open discussion
15:05:54 #topic Announcements
15:06:26 No announcements from me - well, it seems there was a mail with PTL announcements - so it's official now.
15:06:34 Anybody have anything else?
15:07:32 Guess not.
15:07:40 #topic Review action items from the last meeting
15:08:15 No action items.
15:08:33 #topic CI status
15:08:54 Seems all green based on the whiteboard.
15:10:02 #topic Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )
15:10:06 yoctozepto: I guess it's yours.
15:12:56 oh, sorry
15:13:04 I wanted to speak up on CI status but missed it
15:13:11 You still can :)
15:13:19 yeah :-)
15:13:32 so, relating to CI status; the gates are green, obviously :-)
15:13:53 I asked hongbin for help with the zun scenario (because it was red on ubuntu and prevented us from seeing other issues)
15:14:14 and so he proposed to drop the capsule testing, as upstream did as well
15:14:27 I merged the change on all supported branches
15:14:31 you can see it in the workarounds
15:14:36 now the zun scenario is green
15:14:42 Great to hear.
15:14:45 remember the zun scenario tests multinode cinder with iscsi
15:14:52 so it's useful for this purpose
15:14:54 ok, another
15:14:57 cinder failure in xena
15:15:05 it got promoted to critical before release
15:15:19 the cinder folks are debating the best approach to it
15:15:25 but it's in progress
15:15:39 Good, let's track it - I'll subscribe to the bug.
15:15:42 our cephadm upgrade seems to work when any patch is applied
15:15:57 they just need to merge something finally (-:
15:16:06 What I'm more concerned with is the Monasca scenario, but maybe let's discuss that during the Monasca/Elasticsearch topic parallax raised.
15:16:09 remember it's our only multinode upgrade job
15:16:21 yeah, Monasca is to be discussed later so I'm not starting on this now
15:16:36 Debian I will fix another time
15:16:46 that's all for CI status; now onto the "current topic"
15:16:56 so "Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )"
15:17:13 what I mean here is that it's obviously highly recommended to merge additional testing
15:17:26 but I think we need one more job variant - forward upgrades
15:17:48 so that we actually exercise the proposed code's ability to be *upgraded from*
15:18:13 i.e., even with the stricter testing we would not catch that libvirt issue immediately
15:18:34 only after it merged on wallaby would we see failures on master and scratch our heads
15:18:48 do we have a feel for how many bugs this would have caught?
15:19:27 likely not that many tbh; thus I have not worked on that upfront and just put up a discussion point
15:19:36 that we have this obvious hole in testing
15:20:04 we can always just "ack" it and go ahead; other projects don't seem to do forward testing either
15:20:18 (unless perhaps tripleo does, but their CI scares me)
15:20:48 tripleo's CI is not the scariest element of tripleo ;-)
15:20:59 hard to tell really
15:21:23 would it prevent us from merging some changes? we already have some cases where we have to push a fix back to N-3 before it can be merged in later branches
15:22:15 That's my concern, we are doing reverse-backports from time to time, so this could complicate such scenarios.
15:22:20 yeah, I was wondering the same thing, but I would recommend the wisdom of the elders - when you have an emergency and know the root cause, disable the blocking jobs and merge the necessary fixes ;-)
15:22:55 my biggest concern is that the general stability and gate time hit would not be compensated by the extra testing scope
15:23:42 if we are all sceptical, then it's perhaps best to just ack this point and move on
15:23:52 get back to it when we hit another similar issue
15:24:07 (could be like in 3 years' time and we don't even remember lol)
15:24:14 while we may merge some breaking changes currently, we would at least catch them when running upgrade jobs in later branches
15:24:22 assuming our CI catches the issue
15:24:35 yeah, the issue is: only after the fact, and you have to guess which change broke it
15:24:42 otherwise definitely
15:25:05 which it didn't with the libvirt issue, since you added testing to catch it
15:25:15 yes
15:25:27 i.e.
15:25:40 with the new testing in place for now, we would have this situation
15:25:47 1) we actually break master
15:25:56 2) we backport the breakage to wallaby
15:26:12 3) we merge the breakage in wallaby because everything is nice and dandy
15:26:20 4) suddenly we see master jobs break
15:26:39 5) we investigate and figure out (after some time) that it was a change merged on wallaby that broke it
15:26:40 so
15:26:52 So from my perspective it's already better ;-)
15:26:54 at least we are already reducing the time window for affecting users
15:26:58 it is
15:27:10 the point was whether we want to make it even tighter
15:27:20 but it comes with all the already mentioned drawbacks ;-)
15:27:54 wu.chunyang proposed openstack/kolla-ansible master: Remove chrony role from kolla https://review.opendev.org/c/openstack/kolla-ansible/+/791743
15:28:35 So, let's agree to revisit it if we think about it again / happen to be in a similar situation?
15:29:11 works for me; glad you now also understand the matter :-)
15:29:34 #agreed not to pursue forward upgrade testing currently and to get back to it when we hit another similar issue.
15:29:47 #topic DockerHub limits hurting users https://bugs.launchpad.net/kolla-ansible/+bug/1942134
15:29:51 yoctozepto: you again!
15:29:52 mgoddard: also agrees?
15:29:58 mnasiadka: me again, yes (-:
15:30:14 agree
15:30:20 ok, so users are legitimately giving up on kolla because of dockerhub
15:30:23 good that we all agree to agreeing.
15:30:48 what could we do? could we just point to quay.io?
15:30:51 and forget dockerhub?
15:30:56 I thought it would sooner or later come to that point.
15:31:04 mnasiadka: me kinda too
15:31:08 Will it be better with quay.io for a regular user?
15:31:21 Well, CI is better - so I guess it kind of will.
15:31:31 same thinking here ;-)
15:31:50 it also gets those daily updates :D
15:31:57 I think we should also encourage in the docs, with bold blinking letters on an orange background, to build your own images and set up your own registry.
15:32:10 a wild mgoddard needs to appear in the discussion too
15:32:13 mnasiadka: ++
15:32:22 I'm reading
15:32:32 +2 for blinking, like it's still the 90s (-:
15:32:35 Start writing :)
15:32:47 mnasiadka: mgoddard in read-only mode
15:32:55 alecorps proposed openstack/kolla-ansible master: Add support for VMware NSXP https://review.opendev.org/c/openstack/kolla-ansible/+/807404
15:32:57 I'm just biding my time before saying quay.io++
15:33:07 ok, action me on that
15:33:22 and now - what about the docs? mnasiadka to propose better wording?
15:33:34 or mgoddard (as a native English speaker)?
15:33:35 -1 blinking
15:33:35 how about this: a registry deployed by default as a caching proxy for the outer docker hub / quay.io
15:33:42 #action yoctozepto to point all deployments to quay.io
15:33:56 parallax: dockerhub still breaks behind a proxy
15:34:01 :-(((
15:34:05 #action mnasiadka to update docs encouraging users to build their own containers and run their own registry
15:34:24 should we also write a contrib/ playbook to fetch quay.io containers and push them to your own registry?
15:34:27 tbh - never had these issues when deploying all-in-one dev envs
15:34:42 mnasiadka: yeah, that could help for sure
15:34:49 mnasiadka: build/retag/push was a kolla feature request
15:35:13 s/build/pull/
15:35:14 parallax: yeah, aio does not break, but all-in-two does afair lol (-:
15:35:26 especially with retries
15:35:35 Do we have a volunteer to work on the playbook? Or even a kolla-ansible command?
15:35:38 dockerhub is a joke really
15:35:50 kolla command
15:36:04 mgoddard: now we're talking pull-push
15:36:47 I don't think we have a blueprint around that feature.
15:37:30 hmm
15:38:20 Ok, so - any volunteer to create a blueprint and write down what it should do - and then we can find a volunteer to write the code?
15:38:46 here is the original kayobe RFE: https://storyboard.openstack.org/#!/story/2007731
15:40:24 ok then, it doesn't sound very detailed, but it's a start.
15:41:03 for the sake of meeting time, I'll create the blueprint and reference it in Kayobe's RFE - if anyone is interested in writing that, they can pick it up.
15:41:21 #action mnasiadka to create pull-retag-push blueprint based on kayobe RFE: https://storyboard.openstack.org/#!/story/2007731
15:41:40 Is there anything else left to discuss in that topic, or can we move on?
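For context on the pull-retag-push idea discussed above, a minimal sketch in Python of what such a tool could do, assuming the docker Python SDK (pip install docker) is available and a local registry is reachable; the registry, namespace, tag and image names are placeholders, and the eventual kolla blueprint or command may well look different:

# pull_retag_push.py - illustrative sketch only, not the kolla implementation
import docker

SOURCE_REGISTRY = "quay.io"                   # public source registry (placeholder)
SOURCE_NAMESPACE = "openstack.kolla"          # assumed source namespace (placeholder)
LOCAL_REGISTRY = "registry.example.com:4000"  # your own registry (placeholder)
TAG = "master"                                # tag to mirror (placeholder)
IMAGES = ["keystone", "nova-compute"]         # illustrative subset; real kolla image names may carry base/type prefixes

client = docker.from_env()

for name in IMAGES:
    source = f"{SOURCE_REGISTRY}/{SOURCE_NAMESPACE}/{name}"
    target = f"{LOCAL_REGISTRY}/{SOURCE_NAMESPACE}/{name}"
    image = client.images.pull(source, tag=TAG)   # pull from the public registry
    image.tag(target, tag=TAG)                    # retag for the local registry
    for line in client.images.push(target, tag=TAG, stream=True, decode=True):
        if "error" in line:                       # surface push failures early
            raise RuntimeError(f"pushing {target} failed: {line['error']}")

A real version would also need registry authentication and retries, which is roughly where the dockerhub pain discussed above comes from in the first place.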
15:42:43 #topic Future of Monasca - in context of CI failures and Elasticsearch switch to AWS Opensearch
15:42:51 parallax: the stage is yours
15:43:46 We don't know if there are actual users of it, do we?
15:44:13 Thinking about recent releases, e.g. Wallaby
15:44:40 mgoddard: any past user inquiries about Monasca in Kolla? Do you remember?
15:45:02 have you checked bug reports?
15:45:50 There's a Wallaby bug report
15:45:56 well, we are jumping the gun a little here - StackHPC is still a monasca user
15:46:07 we have quite a few open, it seems
15:46:19 Yes, so we need to assume it needs to live.
15:46:21 and other people do use it
15:46:34 mgoddard: I see at least two StackHPC folks wishing to kill Monasca
15:46:38 parallax: what's the main problem (apart from CI liveliness)?
15:47:55 anyhow, any idea what broke in CI?
15:47:59 Elasticsearch in Kolla is getting slightly outdated; however, Monasca depends on it
15:48:00 I proposed https://review.opendev.org/c/openstack/kolla-ansible/+/807689 to try to fix the CI
15:48:13 This blocks e.g. migration to OpenSearch
15:48:24 parallax: what about centralised logging?
15:48:30 parallax: we can have both streams
15:48:41 Which is not impossible, but likely requires development effort in Monasca
15:49:06 mgoddard: that is WIP and it looks like it stops testing Monasca in the Monasca scenario ;p
15:49:42 not really
15:50:06 so, uhm, the index in es means nothing?
15:50:36 it does, but that is just one part of the test
15:51:12 OK, maybe we could leave ES as it is and proceed with OpenSearch for centralised logging
15:51:19 ok, I now see it's still red there
15:51:27 Anyway, so from what I understand Monasca is persisting with Elasticsearch - and we'd like to move our central logging to use OpenSearch
15:51:27 Is there anything that blocks us from keeping Elasticsearch for Monasca (and waiting for them to move in some direction), adding OpenSearch as new container images, and using them in the central logging feature?
15:51:29 in the absence of someone willing to properly investigate, I'd rather have a green job
15:51:38 mnasiadka: ++
15:51:44 mgoddard: ++
15:52:13 Not removing / deprecating anything, just adding
15:52:21 seems reasonable
15:53:00 #agreed Keep Elasticsearch for Monasca and work on separate container images for OpenSearch and include them in the central_logging feature
15:53:22 Great, another one solved.
15:53:30 Nice
15:53:37 Let's go forward...
15:53:41 #topic Release tasks
15:54:16 Winter is coming, err - the feature freeze is coming.
15:55:17 Sep 27 - Oct 01 is R-1 - the Feature Freeze date
15:55:40 3 weeks from now
15:56:00 #topic Xena cycle planning
15:56:26 Should we quickly go through the priorities once again?
15:56:46 Ansible - it's on track, three changes awaiting reviews.
15:56:55 or even four.
15:57:09 Ceph packages have been bumped to Pacific
15:57:52 kevko is again not here, and there are a lot of changes in the proxysql series
15:58:44 I've been reviewing system scope patches recently
15:58:48 I will not be doing consul this cycle
15:58:53 as masakari delayed this
15:58:58 I think we won't be landing system scope this cycle
15:59:09 but maybe we can start it if we break it up
15:59:40 I didn't have time to review the modernisation of the Swift role - did you, mgoddard?
15:59:49 nope
15:59:56 ceph radosgw and the gather facts command are awaiting other core reviews
16:00:26 ok, keystone system scope feels like a big thing in just one patch, so it would be nice to break it up and come up with a plan
16:00:39 headphoneJames: online?
16:00:59 well, it's already time to end the meeting
16:01:13 let's try to work on those reviews
16:01:23 #endmeeting kolla