15:00:00 <mnasiadka> #startmeeting kolla
15:00:00 <opendevmeet> Meeting started Wed Sep  8 15:00:00 2021 UTC and is due to finish in 60 minutes.  The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:01 <opendevmeet> The meeting name has been set to 'kolla'
15:00:12 <mnasiadka> #topic rollcall
15:01:59 <mgoddard> \o
15:02:01 <mnasiadka> o/
15:03:35 <parallax> .
15:04:24 <yoctozepto> o/
15:05:12 <mnasiadka> ok, I think it's time to start :)
15:05:17 <mnasiadka> #topic agenda
15:05:26 <mnasiadka> * Announcements
15:05:26 <mnasiadka> * Review action items from the last meeting
15:05:26 <mnasiadka> * CI status
15:05:26 <mnasiadka> * Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )
15:05:26 <mnasiadka> * DockerHub limits hurting users https://bugs.launchpad.net/kolla-ansible/+bug/1942134
15:05:28 <mnasiadka> * Future of Monasca - in the context of CI failures and the Elasticsearch switch to AWS OpenSearch
15:05:28 <mnasiadka> * Release tasks
15:05:30 <mnasiadka> * Xena cycle planning
15:05:30 <mnasiadka> * Yoga PTG planning
15:05:32 <mnasiadka> * Open discussion
15:05:54 <mnasiadka> #topic Announcements
15:06:26 <mnasiadka> No announcements from me - well, it seems there was a mail with the PTL announcements - so it's official now.
15:06:34 <mnasiadka> Anybody have anything else?
15:07:32 <mnasiadka> Guess not.
15:07:40 <mnasiadka> #topic Review action items from the last meeting
15:08:15 <mnasiadka> No action items.
15:08:33 <mnasiadka> #topic CI status
15:08:54 <mnasiadka> Seems all green based on the whiteboard.
15:10:02 <mnasiadka> #topic Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )
15:10:06 <mnasiadka> yoctozepto: I guess it's yours.
15:12:56 <yoctozepto> oh, sorry
15:13:04 <yoctozepto> I wanted to speak up on CI status but missed it
15:13:11 <mnasiadka> You still can :)
15:13:19 <yoctozepto> yeah :-)
15:13:32 <yoctozepto> so, relating to CI status; the gates are green, obviously :-)
15:13:53 <yoctozepto> I asked hongbin for help with the zun scenario (because it was red on ubuntu and prevented us from seeing other issues)
15:14:14 <yoctozepto> and so he proposed to drop the capsule testing as upstream did as well
15:14:27 <yoctozepto> I merged the change on all supported branches
15:14:31 <yoctozepto> you can see it in the workarounds
15:14:36 <yoctozepto> now the zun scenario is green
15:14:42 <mnasiadka> Great to hear.
15:14:45 <yoctozepto> remember the zun scenario tests multinode cinder with iSCSI
15:14:52 <yoctozepto> so it's useful for this purpose
15:14:54 <yoctozepto> ok, another
15:14:57 <yoctozepto> cinder failure in xena
15:15:05 <yoctozepto> it got promoted to critical before release
15:15:19 <yoctozepto> the cinder folks are debating on the best approach to it
15:15:25 <yoctozepto> but it's in progress
15:15:39 <mnasiadka> Good, let's track it - I'll subscribe to the bug.
15:15:42 <yoctozepto> our cephadm upgrade seems to work when any patch is applied
15:15:57 <yoctozepto> they just need to merge something finally (-:
15:16:06 <mnasiadka> What I'm more concerned with is the Monasca scenario, but maybe let's discuss that during the Monasca/Elasticsearch topic parallax raised.
15:16:09 <yoctozepto> remember it's our only multinode upgrade job
15:16:21 <yoctozepto> yeah, Monasca is to be discussed later so I'm not starting this now
15:16:36 <yoctozepto> Debian I will fix another time
15:16:46 <yoctozepto> that's all for CI status; now onto the "current topic"
15:16:56 <yoctozepto> so "Forward upgrade testing ( re: https://bugs.launchpad.net/kolla-ansible/+bug/1941706 )"
15:17:13 <yoctozepto> what I mean here is that it's obviously highly recommended to merge the additional testing
15:17:26 <yoctozepto> but I think we need one more job variant - forward upgrades
15:17:48 <yoctozepto> so that we actually exercise the proposed code's ability to be *upgraded from*
15:18:13 <yoctozepto> i.e., even with the stricter testing we would not catch that libvirt issue immediately
15:18:34 <yoctozepto> only after it merged on wallaby, would we see failures on master and scratch our heads
15:18:48 <mgoddard> do we have a feel for how many bugs this would have caught
15:19:27 <yoctozepto> likely not that many tbh; thus I have not worked on it upfront and just put up a discussion point
15:19:36 <yoctozepto> that we have this obvious hole in testing
15:20:04 <yoctozepto> we can always just "ack" it and go ahead; other projects don't seem to do forward testing either
15:20:18 <yoctozepto> (unless perhaps tripleo does but their CI scares me)
15:20:48 <mnasiadka> tripleo's CI is not the scariest element of tripleo ;-)
15:20:59 <yoctozepto> hard to tell really
15:21:23 <mgoddard> would it prevent us from merging some changes? we already have some cases where we have to push a fix back to N-3 before it can be merged in later branches
15:22:15 <mnasiadka> That's my concern, we are doing reverse-backports from time to time, so this could complicate such scenarios.
15:22:20 <yoctozepto> yeah, I was wondering the same thing but I would recommend the wisdom of the elders - when you have an emergency and know the root cause, disable the blocking jobs and merge the necessary fixes ;-)
15:22:55 <yoctozepto> my biggest concern is that the general stability and gate time hit would not be compensated for by the extra testing scope
15:23:42 <yoctozepto> if we are all sceptical, then it's perhaps best to just ack this point and move on
15:23:52 <yoctozepto> get back to it when we hit another similar issue
15:24:07 <yoctozepto> (could be like in 3 years time and we don't even remember lol)
15:24:14 <mgoddard> while we may merge some breaking changes currently, we would at least catch them when running upgrade jobs in later branches
15:24:22 <mgoddard> assuming our CI catches the issue
15:24:35 <yoctozepto> yeah, the issue is: only after the fact, and you have to guess which change broke things
15:24:42 <yoctozepto> otherwise definitely
15:25:05 <mgoddard> which it didn't with the libvirt issue, since you added testing to catch it
15:25:15 <yoctozepto> yes
15:25:27 <yoctozepto> i.e.
15:25:40 <yoctozepto> with the new testing in place as of now, we would have this situation:
15:25:47 <yoctozepto> 1) we actually break master
15:25:56 <yoctozepto> 2) we backport the breakage to wallaby
15:26:12 <yoctozepto> 3) we merge the breakage in wallaby because everything is nice and dandy
15:26:20 <yoctozepto> 4) suddenly we see master jobs break
15:26:39 <yoctozepto> 5) we investigate and figure out (after some time) that it was a change merged on wallaby that broke it
15:26:40 <yoctozepto> so
15:26:52 <mnasiadka> So from my perspective it's already better ;-)
15:26:54 <yoctozepto> at least we are already reducing the time window for affecting users
15:26:58 <yoctozepto> it is
15:27:10 <yoctozepto> the point was if we want to make it even tighter
15:27:20 <yoctozepto> but it comes with all the already mentioned drawbacks ;-)
15:27:54 <opendevreview> wu.chunyang proposed openstack/kolla-ansible master: Remove chrony role from kolla  https://review.opendev.org/c/openstack/kolla-ansible/+/791743
15:28:35 <mnasiadka> So, let's agree to revisit it if we think about it again/happen to be in a similar situation?
15:29:11 <yoctozepto> works for me; glad now you also understand the matter :-)
15:29:34 <mnasiadka> #agreed to not pursue forward-upgrade-testing currently and get back to it when we hit another similar issue.
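For reference, a forward-upgrade job variant of the kind discussed above could be expressed roughly as the Zuul sketch below. This is only an illustration of the idea that was parked: the job name, parent and variables are assumptions and are not taken from kolla-ansible's actual zuul.d definitions.

```yaml
# Hypothetical sketch only: deploy the code under review first, then upgrade
# to the next branch, exercising the change's ability to be *upgraded from*.
# Names and variables are assumptions, not existing kolla-ansible job options.
- job:
    name: kolla-ansible-ubuntu-source-upgrade-forward
    parent: kolla-ansible-ubuntu-source-upgrade
    description: |
      Deploy the proposed change on its own branch, then upgrade the
      environment to the following branch (e.g. stable/wallaby -> master).
    branches: ^stable/.*$
    vars:
      # Deploy from the branch the change is proposed against.
      deploy_release: "{{ zuul.branch }}"
      # Then upgrade towards the development branch.
      upgrade_release: master
```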
15:29:47 <mnasiadka> #topic DockerHub limits hurting users https://bugs.launchpad.net/kolla-ansible/+bug/1942134
15:29:51 <mnasiadka> yoctozepto: you again!
15:29:52 <yoctozepto> mgoddard: also agrees?
15:29:58 <yoctozepto> mnasiadka: me again, yes (-:
15:30:14 <mgoddard> agree
15:30:20 <yoctozepto> ok, so users are legitimately giving up on kolla because of dockerhub
15:30:23 <mnasiadka> good that we all agree to agreeing.
15:30:48 <yoctozepto> what could we do? could we just point to quay.io?
15:30:51 <yoctozepto> and forget dockerhub?
15:30:56 <mnasiadka> I thought it would sooner or later come to that point.
15:31:04 <yoctozepto> mnasiadka: me kinda too
15:31:08 <mnasiadka> Will it be better with quay.io for a regular user?
15:31:21 <mnasiadka> Well, CI is better - so I guess it kind of will.
15:31:31 <yoctozepto> same thinking here ;-)
15:31:50 <yoctozepto> it also gets those daily updates :D
15:31:57 <mnasiadka> I think we should also encourage in the docs, with bold blinking letters on an orange background, to build your own images and set up your own registry.
15:32:10 <yoctozepto> wild mgoddard needs to appear in the discussion too
15:32:13 <yoctozepto> mnasiadka: ++
15:32:22 <mgoddard> I'm reading
15:32:32 <yoctozepto> +2 for blinking, like it's 90s still (-:
15:32:35 <mnasiadka> Start writing :)
15:32:47 <yoctozepto> mnasiadka: mgoddard in read-only mode
15:32:55 <opendevreview> alecorps proposed openstack/kolla-ansible master: Add support for VMware NSXP  https://review.opendev.org/c/openstack/kolla-ansible/+/807404
15:32:57 <mgoddard> I'm just biding my time before saying quay.io++
15:33:07 <yoctozepto> ok, action me on that
15:33:22 <yoctozepto> and now - what about the docs? mnasiadka to propose better wording?
15:33:34 <yoctozepto> or mgoddard (as English native)?
15:33:35 <mgoddard> -1 blinking
15:33:35 <parallax> how about this: registry deployed by default as a caching proxy to the outer docker hub / quay.io
15:33:42 <mnasiadka> #action yoctozepto to point all deployments to quay.io
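A minimal sketch of what pointing a deployment at quay.io might look like in globals.yml; docker_registry and docker_namespace are existing kolla-ansible options, but the exact quay.io organisation shown below is an assumption.

```yaml
# globals.yml sketch: pull images from quay.io instead of Docker Hub.
# The namespace is an assumption - use whatever organisation the kolla
# images are actually published under on quay.io.
docker_registry: "quay.io"
docker_namespace: "openstack.kolla"
```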
15:33:56 <yoctozepto> parallax: dockerhub still breaks behind proxy
15:34:01 <yoctozepto> :-(((
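For completeness, parallax's caching-proxy idea maps onto the standard registry pull-through cache; a minimal config.yml sketch is below (deployers would additionally point dockerd's registry-mirrors at the proxy). As noted above, Docker Hub outages and limits can still surface through such a cache on misses.

```yaml
# Minimal config.yml for running the stock "registry:2" image as a
# pull-through cache in front of Docker Hub. Upstream failures can still
# propagate through the cache on misses, per the caveat above.
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io
```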
15:34:05 <mnasiadka> #action mnasiadka to update docs encouraging to build your own containers and run your own registry
15:34:24 <mnasiadka> should we also write a contrib/ playbook to fetch quay.io containers and push them to your own registry?
15:34:27 <parallax> tbh - never had these issues when deploying all-in-one dev envs
15:34:42 <yoctozepto> mnasiadka: yeah, that could help for sure
15:34:49 <mgoddard> mnasiadka: build/retag/push was a kolla feature request
15:35:13 <mgoddard> s/build/pull/
15:35:14 <yoctozepto> parallax: yeah, aio does not break, but all-in-two does afair lol (-:
15:35:26 <yoctozepto> especially with retries
15:35:35 <mnasiadka> Do we have a volunteer to work on the playbook? Or even a kolla-ansible command?
15:35:38 <yoctozepto> dockerhub is a joke really
15:35:50 <mgoddard> kolla command
15:36:04 <yoctozepto> mgoddard: now we talk pull-push
15:36:47 <mnasiadka> I don't think we have a blueprint around that feature.
15:37:30 <yoctozepto> hmm
15:38:20 <mnasiadka> Ok, so - any volunteer to create a blueprint, write down what it should do - and then we can find a volunteer to write the code?
15:38:46 <mgoddard> here is the original kayobe RFE: https://storyboard.openstack.org/#!/story/2007731
15:40:24 <mnasiadka> ok then, doesn't sound very detailed, but it's a start.
15:41:03 <mnasiadka> for the sake of meeting time, I'll create the blueprint and reference it in Kayobe's RFE - if anyone is interested in writing that, they can pick it up.
15:41:21 <mnasiadka> #action mnasiadka to create pull-retag-push blueprint based on kayobe RFE: https://storyboard.openstack.org/#!/story/2007731
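As a starting point for whoever picks up the blueprint, a rough pull/retag/push sketch as an Ansible playbook follows; the image names, tag and registry endpoints are placeholders, and the real feature may land as a kolla CLI command instead.

```yaml
# Rough sketch only: pull images from quay.io, retag them for a local
# registry and push them there. All names below are placeholders.
- hosts: localhost
  gather_facts: false
  vars:
    source_registry: "quay.io/openstack.kolla"
    local_registry: "registry.example.com:4000/kolla"
    image_tag: "master"
    images:
      - ubuntu-source-keystone
      - ubuntu-source-nova-compute
  tasks:
    - name: Pull, retag and push image
      community.docker.docker_image:
        name: "{{ source_registry }}/{{ item }}:{{ image_tag }}"
        repository: "{{ local_registry }}/{{ item }}:{{ image_tag }}"
        source: pull
        push: true
      loop: "{{ images }}"
```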
15:41:40 <mnasiadka> Is there anything else left to discuss in that topic, or can we move on?
15:42:43 <mnasiadka> #topic Future of Monasca - in the context of CI failures and the Elasticsearch switch to AWS OpenSearch
15:42:51 <mnasiadka> parallax: stage is yours
15:43:46 <parallax> We don't know if there are actual users of it, do we?
15:44:13 <parallax> Thinking about recent releases, e.g. Wallaby
15:44:40 <mnasiadka> mgoddard: any past user inquiries about Monasca in Kolla? Do you remember?
15:45:02 <yoctozepto> have you checked bug reports?
15:45:50 <mnasiadka> There's a Wallaby bug report
15:45:56 <mgoddard> well, we are jumping the gun a little here - StackHPC is still a monasca user
15:46:07 <yoctozepto> we have quite a few open it seems
15:46:19 <mnasiadka> Yes, so we need to assume it needs to live.
15:46:21 <mgoddard> and other people do use it
15:46:34 <yoctozepto> mgoddard: I see at least two StackHPC folks wishing to kill Monasca
15:46:38 <mnasiadka> parallax: what's the main problem (apart from CI liveliness)?
15:47:55 <yoctozepto> anyhow, any idea what broke in CI?
15:47:59 <parallax> Elasticsearch in Kolla is getting slightly outdated; however, Monasca depends on it
15:48:00 <mgoddard> I proposed https://review.opendev.org/c/openstack/kolla-ansible/+/807689 to try to fix the CI
15:48:13 <parallax> This blocks e.g. migration to OpenSearch
15:48:24 <mgoddard> parallax: what about centralised logging?
15:48:30 <yoctozepto> parallax: we can have both streams
15:48:41 <parallax> Which is not impossible, but likely requires development effort in Monasca
15:49:06 <yoctozepto> mgoddard: that is WIP and looks like it stops testing Monasca in the Monasca scenario ;p
15:49:42 <mgoddard> not really
15:50:06 <yoctozepto> so, uhm, the index in es means nothing?
15:50:36 <mgoddard> it does, but that is just one part of the test
15:51:12 <parallax> OK, maybe we could leave ES as it is and proceed with OpenSearch for centralised logging
15:51:19 <yoctozepto> ok, I now see it's still read there
15:51:21 <yoctozepto> red*
15:51:27 <mnasiadka> Anyway, so from what I understand Monasca is persisting in using Elasticsearch - and we'd like to move our central logging to use OpenSearch
15:51:27 <mnasiadka> Is there anything that blocks us from keeping Elasticsearch for Monasca (and waiting for them to move in some direction), adding OpenSearch as new container images, and using them in the central logging feature?
15:51:29 <mgoddard> in the absence of someone willing to properly investigate, I'd rather have a green job
15:51:38 <yoctozepto> mnasiadka: ++
15:51:44 <yoctozepto> mgoddard: ++
15:52:13 <parallax> Not removing / deprecating anything, just adding
15:52:21 <mgoddard> seems reasonable
15:53:00 <mnasiadka> #agreed Keep Elasticsearch for Monasca and work on separate container images for OpenSearch and include it in central_logging feature
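To make the agreed direction concrete, a deployment might eventually opt in along these lines; enable_central_logging and enable_elasticsearch are existing options, while the OpenSearch toggle is hypothetical and would only exist once the new images and role land.

```yaml
# Hypothetical globals.yml shape for the agreed direction; the OpenSearch
# option name is an assumption until the implementation defines it.
enable_central_logging: "yes"
# Monasca keeps using the existing Elasticsearch images.
enable_elasticsearch: "yes"
# New, separately built OpenSearch images back the central logging feature.
enable_opensearch: "yes"
```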
15:53:22 <mnasiadka> Great, another one solved.
15:53:30 <parallax> Nice
15:53:37 <mnasiadka> Let's go forward...
15:53:41 <mnasiadka> #topic Release tasks
15:54:16 <mnasiadka> The winter is coming, err - the feature freeze is coming.
15:55:17 <mnasiadka> Sep 27 - Oct 01 is R-1 - Feature Freeze date
15:55:40 <mnasiadka> 3 weeks from now
15:56:00 <mnasiadka> #topic Xena cycle planning
15:56:26 <mnasiadka> Should we go quickly through priorities once again?
15:56:46 <mnasiadka> Ansible - it's on track, three changes awaiting reviews.
15:56:55 <mnasiadka> or even four.
15:57:09 <mnasiadka> Ceph packages have been bumped to Pacific
15:57:52 <mnasiadka> kevko is again not here, and there's a lot of changes in the proxysql series
15:58:44 <mgoddard> I've been reviewing system scope patches recently
15:58:48 <yoctozepto> I will not be doing consul this cycle
15:58:53 <yoctozepto> as masakari delayed this
15:58:58 <mgoddard> I think we won't be landing system scope this cycle
15:59:09 <mgoddard> but maybe we can start it if we break it up
15:59:40 <mnasiadka> I didn't have time to review the modernisation of the Swift role - did you, mgoddard?
15:59:49 <mgoddard> nope
15:59:56 <mnasiadka> ceph radosgw and gather facts command are awaiting other core reviews
16:00:26 <mnasiadka> ok, keystone system scope feels like a big thing in just one patch, so it would be nice to break it up and come up with a plan
16:00:39 <mnasiadka> headphoneJames: online?
16:00:59 <mnasiadka> well, it's already time to end the meeting
16:01:13 <mnasiadka> let's try to work on those reviews
16:01:23 <mnasiadka> #endmeeting kolla