15:00:06 <mgoddard> #startmeeting kolla
15:00:06 <opendevmeet> Meeting started Wed Aug  4 15:00:06 2021 UTC and is due to finish in 60 minutes.  The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 <opendevmeet> The meeting name has been set to 'kolla'
15:00:14 <mgoddard> #topic rollcall
15:00:17 <mgoddard> \o
15:00:30 <yoctozepto> o/
15:03:20 <priteau> \o/
15:04:23 <mgoddard> #topic agenda
15:04:31 <mgoddard> * Roll-call
15:04:33 <mgoddard> * Agenda
15:04:35 <mgoddard> * Announcements
15:04:37 <mgoddard> * Review action items from the last meeting
15:04:39 <mgoddard> * CI status
15:04:41 <mgoddard> * Release tasks
15:04:43 <mgoddard> ** Policy on dead projects (vide Freezer) https://bugs.launchpad.net/kolla-ansible/+bug/1901698
15:04:45 <mgoddard> * Xena cycle planning
15:04:47 <mgoddard> ** Xena feature prioritisation https://docs.google.com/spreadsheets/d/1BuVMwP8eLnOVJDX8f3Nb6hCrNcNpRQl57T2ENU9Xao8/edit?usp=sharing
15:04:49 <mgoddard> * Kolla operator pain points https://etherpad.opendev.org/p/pain-point-elimination
15:04:51 <mgoddard> * Open discussion
15:04:53 <mgoddard> #topic announcements
15:05:00 <mgoddard> none here
15:05:03 <mgoddard> anyone else?
15:05:36 <mgoddard> #topic Review action items from the last meeting
15:05:58 <mgoddard> mgoddard email kolla-klubbers & openstack-discuss about pain points
15:06:00 <mgoddard> done
15:06:09 <mgoddard> no responses :)
15:06:13 <yoctozepto> ;-(
15:06:32 <mgoddard> clearly our users are pain free
15:06:54 <yoctozepto> painKillers
15:07:19 <mgoddard> #topic CI status
15:07:51 <mgoddard> I noticed that the debian upgrade job is borked on master
15:08:33 <mgoddard> #link https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b1e/793664/7/check/kolla-ansible-debian-source-upgrade/b1e0b6b/primary/logs/ansible/chrony-install
15:09:50 <yoctozepto> ah, yes, looks like ansible broke with python autodetection; come on, ansible :P
15:10:05 <mgoddard> last passed 2021-07-01...
15:10:26 <yoctozepto> the wonder of non-voting jobs
15:11:10 <mgoddard> oh, is that what it is? makes sense now
15:12:13 <mgoddard> https://opendev.org/openstack/kolla-ansible/src/branch/master/tests/upgrade.sh#L27
15:14:29 <mgoddard> any other CI breakage to speak of?
15:15:10 <yoctozepto> not seen any ohter
15:15:16 <yoctozepto> nor other
15:15:18 <yoctozepto> nor otter
15:15:26 <yoctozepto> nor oh-errr
15:15:41 <mgoddard> good
15:15:52 <mgoddard> #topic Release tasks
15:16:16 <mgoddard> R-9
15:18:22 <mgoddard> #link https://docs.openstack.org/kolla/latest/contributor/release-management.html
15:18:40 <mgoddard> so next week we switch binary images to the current release
15:18:52 <mgoddard> otherwise I think we're ok
15:19:06 <mgoddard> #topic Policy on dead projects (vide Freezer) https://bugs.launchpad.net/kolla-ansible/+bug/1901698
15:19:11 <mgoddard> I smell a yoctozepto
15:19:35 <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: Add mariadb arbitrator to mariadb role  https://review.opendev.org/c/openstack/kolla-ansible/+/780811
15:19:59 <yoctozepto> a wild yoctozepto appeared
15:20:21 <yoctozepto> Discuss / Change core / Items / Run
15:20:39 <yoctozepto> and back to the topic - yes
15:20:57 <yoctozepto> and I see yesterday we also got asked on the channel about freezer specifically
15:21:02 <yoctozepto> anyhow
15:21:15 <mgoddard> it's certainly in my list of 'would not recommend' projects
15:21:19 <yoctozepto> the topic is general - how should we approach projects that seem dead and unusable
15:21:23 <yoctozepto> because
15:21:26 <mgoddard> but how do we decide?
15:21:30 <yoctozepto> ( mgoddard: mine too )
15:21:33 <yoctozepto> mgoddard: precisely
15:21:40 <yoctozepto> but let me continue
15:21:43 <yoctozepto> because
15:21:53 <yoctozepto> the issue is we (as kolla) endorse the projects we support
15:22:09 <yoctozepto> it is just how the world seems to view that
15:22:40 <yoctozepto> and that is problematic because it just grows the frustration in users as there are parts that are simply known-not-to-work
15:22:51 <yoctozepto> as opposed to merely not-known-whether-it-works
15:23:05 <opendevreview> Michal Arbet proposed openstack/kolla-ansible master: [CI] Test Mariadb-Arbitrator with shards in the nova cells scenario  https://review.opendev.org/c/openstack/kolla-ansible/+/780970
15:23:21 <yoctozepto> I think we kind of need a policy/doc thing on that
15:23:37 <yoctozepto> with extra bells and whistles around tries to use such projects
15:23:50 <yoctozepto> though I, personally, would just drop them not to create illusions of workability
15:24:27 <yoctozepto> what are your thoughts?
15:24:33 <yoctozepto> I know it's mostly only mgoddard today
15:24:35 <priteau> New variable? enable_experimental_services: True
15:24:40 <yoctozepto> but perhaps priteau can say a thing or two
15:24:42 <yoctozepto> oh, he did
15:24:43 <mgoddard> it's a tricky one
15:24:56 <priteau> And without it you can't deploy them?
15:25:10 <yoctozepto> priteau: that's a part of those bells and whistles, yeah, good idea, thanks!
15:25:23 <mgoddard> on the one hand I don't like that we have unstable/unusable projects that add to our maintenance burden
15:25:40 <priteau> Doc is important too of course, but we know it isn't read by everyone
15:26:13 <mgoddard> on the other hand, I quite like that a project can go quiet but be rekindled by an interested party
15:27:07 <yoctozepto> well, perhaps the freezer example is just radical and we should just drop freezer and be happy with all the others
15:27:19 <mgoddard> ansible-collection-kolla-moribund-projects
15:27:30 <yoctozepto> xD
15:27:49 <yoctozepto> what other projects come to your mind, btw?
15:27:51 <yoctozepto> be honest
15:27:55 <mgoddard> it could quickly go stale though
15:28:10 <mgoddard> unless we maintained it
15:28:51 <mgoddard> freezer seems pretty dead, if it doesn't support py3.8
15:29:05 <mgoddard> a targeted approach might be best
15:29:15 <mgoddard> for others, maybe a long period of deprecation?
15:29:37 <yoctozepto> ok, so for freezer a mail to the mailing list that it looks bad
15:29:45 <yoctozepto> perhaps some TC look at it too
15:29:52 <mgoddard> https://www.stackalytics.io/?module=freezer-group
15:29:53 <yoctozepto> and what "others" do you have in mind?
15:29:57 <mgoddard> there have been some commits
15:30:45 <mnasiadka> we could just add metadata to a role if it's tested and supported, or deprecated (known not to work, no data, no known users, etc) - and build some support matrix based on that? I'm not saying we should remove those deprecated, but just deprecate them?
15:30:52 <yoctozepto> yeah, there are commits but nobody looks at bug reports
15:32:05 <priteau> Isn't this the fix for your bug? https://review.opendev.org/c/openstack/freezer/+/795715
15:33:06 <yoctozepto> priteau: looks like it
15:33:14 <priteau> Although, source & master images should have it?
15:33:18 <yoctozepto> the bug is not mine though :-)
15:33:25 <priteau> Oh, bug comments are from 2020
15:33:29 <yoctozepto> yeah, but people are running stable
15:33:29 <yoctozepto> yeah
15:34:37 <hrw> Sooner or later we will reverse support. Provide a list of known-to-work images and the rest will generate an 'UNSUPPORTED, YOU HAVE BEEN WARNED' message on build and deploy. plus 1 minute sleep per image
15:34:39 <yoctozepto> priteau: you could comment with that in the bug report
15:34:45 <priteau> Already did
15:34:57 <yoctozepto> hrw: I suggest one hour, make them learn :D
15:35:02 <yoctozepto> priteau: great!
15:35:20 <hrw> yoctozepto: 1 minute is annoying. 1 hour is 'where it is, I'll kill it'
15:35:33 <mgoddard> would love to get rid of these:
15:35:35 <mgoddard> aodh
15:35:37 <mgoddard> murano
15:35:39 <mgoddard> solum
15:35:41 <mgoddard> trove
15:35:43 <mgoddard> gnocchi
15:35:45 <mgoddard> qdrouterd
15:35:47 <mgoddard> storm
15:35:49 <mgoddard> vitrage
15:35:51 <mgoddard> mistral
15:35:53 <mgoddard> ovs-dpdk
15:35:55 <mgoddard> vmtp
15:35:57 <mgoddard> tacker
15:35:59 <mgoddard> watcher
15:36:01 <mgoddard> collectd
15:36:03 <mgoddard> sahara
15:36:05 <mgoddard> telegraf
15:36:07 <mgoddard> ceilometer
15:36:09 <mgoddard> freezer
15:36:11 <mgoddard> senlin
15:36:13 <mgoddard> skydive
15:36:20 <hrw> base
15:36:22 <yoctozepto> for the love of CI, that's one long list :-)
15:36:43 <mgoddard> I doubt it would help our CI - few are tested!
15:36:45 <priteau> I suppose there are still many deployments using ceilometer + gnocchi + aodh
15:36:50 <mnasiadka> vitrage's HA is broken, so nothing new
15:37:01 <priteau> collectd, wasn't flint adding new things recently?
15:37:10 <mgoddard> priteau: true, that's just my preference :)
15:37:11 <yoctozepto> I hope he was not
15:37:16 <yoctozepto> :D
15:37:49 <mgoddard> so where are we now?
15:37:56 <mgoddard> freezer fixed?
15:38:11 <yoctozepto> perhaps? at least that error is hidden now
15:38:47 <mgoddard> it stumbles on
15:39:06 <mgoddard> taking us back to the general question of how proactive we should be
15:39:46 <mgoddard> I'd quite like to see a page in docs for each service
15:39:52 <yoctozepto> that priteau-proposed flag sounds cool
15:40:02 <yoctozepto> mgoddard: yeah, and warnings on bad services
15:40:08 <mgoddard> as I think a lot of users at least get that far - search for 'kolla freezer'
15:40:20 <mgoddard> the flag sounds interesting
15:40:52 <mgoddard> but what are the criteria?
15:41:14 <mgoddard> having a banner in docs that says 'we do not test this in CI' makes sense to me
15:41:18 <priteau> We have a support matrix right?
15:41:31 <mgoddard> true
15:41:40 <yoctozepto> projects with a long history of weird issues and no help coming are yet another stage
15:42:29 <priteau> Maybe we need another level in the matrix in addition to "C - Community maintained"
15:42:44 <mgoddard> D - despised
15:42:53 <mgoddard> U - unloved?
15:43:01 <yoctozepto> P - promoted
15:43:04 <yoctozepto> U - underrated
15:43:43 <priteau> U - unsupported
15:43:46 <johnthetubaguy[m]> FWIW, I like what cinder did around their drivers with requiring testing, and eventually removing things with no tests. Maybe you have to set: allow_crazy_untested_services: True or the config doesn't validate?
15:45:09 <johnthetubaguy[m]> in Nova we found no one bothered to read the support matrix, some people noticed having to enable extra config flags to use dodgy stuff
15:46:32 <mgoddard> is there anyone who would like to work on this?
15:46:44 <yoctozepto> I can
15:46:49 <yoctozepto> unless priteau wants to
15:46:56 <priteau> E - experimental (another suggestion for the matrix)
15:47:01 <yoctozepto> anyhow, I think we agree on two things
15:47:04 <yoctozepto> or three
15:47:14 <priteau> Unfortunately no time at the moment
15:47:21 <hrw> now we have Terrible, Common issues, Not even try levels in support matrix
15:47:28 <yoctozepto> 1) add the extra flag (starting Xena, with reno and general docs)
15:47:51 <yoctozepto> 2) extend the support matrix to mark "worse" services (this needs discussing how to choose and call these really)
15:48:29 <yoctozepto> 3) add docs per service with warning banner for services not tested in CI and especially for those that we know are ugly
15:48:53 <priteau> 4) drop the really bad ones that won't be fixed?
15:49:09 <yoctozepto> ad 4) that is long term I guess
15:49:30 <yoctozepto> anyhow, I can drive this
15:49:32 <mgoddard> just to add another take on it - H - needs help. That was an original goal of the support matrix, to highlight where we lack skills/interest, and attract help
15:49:43 <yoctozepto> oh, good one
15:50:02 <priteau> Nice one
15:50:03 <mgoddard> although that assumes the issues are on our side
15:50:07 <mgoddard> sor tof
15:50:12 <yoctozepto> so rtof
15:50:22 <yoctozepto> read the online form
15:50:26 <mgoddard> anyway, I think your list makes sense yoctozepto, especially if you're picking it up :)
15:50:33 <yoctozepto> mgoddard: thanks
15:50:47 <yoctozepto> I just want to focus on these and restore https://review.opendev.org/c/openstack/kolla-ansible/+/773246 and its stack
15:50:57 <yoctozepto> shameless plug but please consider reviewing and pushing that
15:51:07 <yoctozepto> as it's rotting for no good reason
15:51:08 <mgoddard> yeah, let's do that
15:51:24 <mgoddard> ok, 10m left
15:51:36 <mgoddard> #topic Xena feature prioritisation https://docs.google.com/spreadsheets/d/1BuVMwP8eLnOVJDX8f3Nb6hCrNcNpRQl57T2ENU9Xao8/edit?usp=sharing
15:52:01 <mgoddard> We covered this last time
15:52:12 <yoctozepto> I gave it some thought and I'm using Launchpad blueprints for Masakari now
15:52:13 <mgoddard> How did that project-config patch go?
15:52:19 <yoctozepto> perhaps we could use it for Kolla projects too
15:52:42 <yoctozepto> got stuck
15:53:05 <mgoddard> could do
15:53:07 <priteau> mgoddard: the description for "H" could explain that it needs help in integration / testing in Kolla Ansible and/or fixing issues upstream
15:53:20 <mgoddard> priteau: true
15:53:41 <mgoddard> re blueprints, it could work
15:53:51 <mgoddard> we don't really use them much today
15:53:55 <yoctozepto> yup
15:53:58 <yoctozepto> we could start
15:54:01 <mgoddard> there are many stale ones
15:54:13 <yoctozepto> yeah, needs some form of cleanup
15:54:24 <yoctozepto> read my post on masakari for how I've done that
15:54:30 <yoctozepto> closed the obviously implemented ones
15:54:38 <yoctozepto> and reset priorities for other oldies
15:54:57 <yoctozepto> then we set priorities for our real goals
15:54:58 <mgoddard> I think I have done that mostly
15:55:04 <yoctozepto> and handle it all from there
15:55:09 <yoctozepto> plus the gerrit tags
15:55:12 <yoctozepto> I will ping infra
15:56:47 <yoctozepto> done
15:56:51 <yoctozepto> both channels
15:56:56 <mgoddard> so no hashtags for masakari?
15:57:32 <yoctozepto> nah, we don't have that much change traffic
15:57:43 <yoctozepto> and only really one core that does most of the reviews
15:58:11 <mgoddard> doh
15:58:15 <yoctozepto> yup
15:58:50 <mgoddard> #topic Kolla operator pain points https://etherpad.opendev.org/p/pain-point-elimination
15:59:00 <mgoddard> I think I misspoke earlier about this
15:59:17 <mgoddard> while I received no direct responses, we do have 4 pain points
15:59:28 <mgoddard> or 3, if you can count
15:59:37 <mgoddard> (clarkb) Stopping docker and its containers to perform rolling reboots for operating system patches can result in bad rabbitmq state preventing services like cinder-scheduler from reconnecting successfully afterwards. This was addressed by completely rebuilding the rabbitmq cluster with a blank state. Not sure if there is a better rolling reboot process or maybe improvements can be made
15:59:39 <mgoddard> to rabbitmq in kolla to make it more resilient? This was with a victoria kolla deployment.
15:59:41 <mgoddard> (Fl1nt): Kolla-ansible lacks something of an offline mode for companies that use it in heavily restricted environments with no internet access at all, only access to internal mirrors and cloned repositories. I'm already working on this issue; a few patches have already been proposed and merged, and a few more are coming.
15:59:43 <mgoddard> Kolla-ansible lacks some self-recovery mechanisms for offline environments. For example, mariadb cannot recover automatically after the cluster is restarted, and human intervention is highly likely to be required. It is hoped that some self-recovery solutions can be introduced to improve the recoverability of the cluster.
16:00:47 <yoctozepto> :-(
16:00:49 <yoctozepto> sad stories
16:00:53 <mgoddard> some food for thought
16:01:01 <mgoddard> time's up y'all
16:01:05 <mgoddard> thanks
16:01:07 <mgoddard> #endmeeting