15:00:34 <mgoddard> #startmeeting kolla
15:00:35 <openstack> Meeting started Wed Jul  1 15:00:34 2020 UTC and is due to finish in 60 minutes.  The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'kolla'
15:00:57 <mgoddard> #topic rollcall
15:01:07 <suryasingh> \o
15:01:13 <yoctozepto> o/
15:01:31 <mgoddard> Hi, this meeting is clashing with the opendev large scale discussions. Let's see who shows up
15:01:32 <mgoddard> \o
15:01:40 <hrw> o/
15:03:33 <yoctozepto> I can see only priteau with me in there
15:03:38 <headphoneJames> o/
15:03:44 <yoctozepto> so we might as well start
15:03:47 <priteau> o/
15:03:51 <yoctozepto> edge is not really my thing
15:05:15 <mgoddard> mgoddard mnasiadka hrw egonzalez yoctozepto rafaelweingartne cosmicsound osmanlicilegi
15:05:19 <mgoddard> ^ meeting
15:05:35 <mgoddard> #topic agenda
15:05:38 <mgoddard> * Roll-call
15:05:40 <mgoddard> * Announcements
15:05:42 <mgoddard> * Review action items from last meeting
15:05:44 <mgoddard> * CI status
15:05:48 <mgoddard> * Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:05:48 <openstack> Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:05:48 <mgoddard> * Switching Ussuri to track stable branches
15:05:50 <mgoddard> * Future of the common role
15:05:52 <mgoddard> * Keystone fernet keys handling https://review.opendev.org/707080
15:05:53 <patchbot> patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:05:54 <mgoddard> * CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:05:56 <mgoddard> * Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:05:58 <mgoddard> * Victoria release planning (kayobe)
15:06:01 <mgoddard> * Victoria release planning (kolla & kolla ansible)
15:06:02 <mgoddard> * Kolla klub
15:06:05 <mgoddard> Lots of topics today
15:06:06 <mgoddard> #topic Announcements
15:06:40 <mgoddard> #info OpenDev large scale infra sessions took place this week
15:06:57 <mgoddard> #link https://www.openstack.org/events/opendev-2020/opendev-schedule
15:07:09 <mgoddard> Videos on youtube, notes on etherpad
15:07:46 <mgoddard> Any others?
15:08:45 <hrw> go on
15:09:03 <mgoddard> #topic Review action items from last meeting
15:09:13 <mgoddard> mnasiadka backport stein CI fixes
15:09:15 <mgoddard> yoctozepto to email list about kolla dev kalls
15:09:20 <mgoddard> mnasiadka did his
15:09:24 <mgoddard> yoctozepto did his
15:09:34 <mgoddard> #topic CI status
15:09:59 <yoctozepto> pretty green
15:10:02 <mgoddard> yeah
15:10:06 <mgoddard> Rocky still broken
15:10:06 <hrw> yep
15:10:07 <yoctozepto> random errors with mirrors
15:10:12 <mgoddard> https://review.opendev.org/#/c/738344/
15:10:12 <patchbot> patch 738344 - kolla (stable/rocky) - Fix multiple issues - 3 patch sets
15:10:33 <yoctozepto> any idea what's left to be done?
15:10:39 <yoctozepto> (for rocky)
15:11:01 <mgoddard> no, haven't dug into it
15:11:05 <yoctozepto> ah, you reverted my fix part
15:11:13 <mgoddard> probably
15:11:29 <hrw> +2
15:12:11 <openstackgerrit> Radosław Piliszek proposed openstack/kolla stable/rocky: Fix multiple issues  https://review.opendev.org/738344
15:12:38 <yoctozepto> +2 from me as well, let's see now
15:12:46 <mgoddard> let's see how it goes
15:12:56 <mgoddard> hold your +2s until we pass...
15:13:06 <hrw> no +W until pass
15:13:17 <yoctozepto> meh, I agree with the contents
15:13:24 <yoctozepto> fingers crossed for CI jobs
15:13:38 <yoctozepto> rocky k-a fine?
15:13:46 <mgoddard> hopefully
15:13:51 <yoctozepto> ok
15:13:59 <mgoddard> we are seeing instability in kayobe CI
15:14:11 <mgoddard> at least partially down to out of disk space issues
15:14:20 <yoctozepto> as I mentioned above mirrors are very grumpy
15:14:33 <yoctozepto> so could affect kayobe as well
15:14:38 <mgoddard> likely
15:14:56 <mgoddard> let's see how it goes, but probably needs attention
15:15:00 <mgoddard> #topic Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:15:01 <openstack> Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:15:22 <yoctozepto> all in for changing the default
15:15:40 <yoctozepto> even for ussuri now (with a reno, ml post and alike)
15:16:46 <mgoddard> it would leave us without NTP by default
15:17:09 <hrw> add it to bootstrap-servers?
15:17:19 <yoctozepto> there is some ntp nowadays by default though
15:17:48 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca  https://review.opendev.org/738859
15:18:23 <hrw> my Debian 'buster' laptop lacks chronyd
15:18:27 <mgoddard> true, but if we disable it then existing environments may lose any custom config
15:18:34 <yoctozepto> that's true
15:18:47 <priteau> Make it an option in Ussuri and earlier?
15:18:56 <priteau> And change the default in Victoria
15:18:56 <yoctozepto> it's already an option :-)
15:19:02 <yoctozepto> the default is bad
15:19:08 <mgoddard> I think that makes sense
15:19:10 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca  https://review.opendev.org/738859
15:19:14 <yoctozepto> but changing is disruptive for those unaware ;/
15:20:14 <mgoddard> I suppose if we get in quick and make a release we could minimise disruption in ussuri
15:20:18 <hrw> install chronyd, enable it, check if it is running. if it fails then fail the deploy at bootstrap-servers
15:20:29 <yoctozepto> mgoddard: my thoughts exactly :-)
15:20:34 <hrw> as that would mean user config being wrong
15:20:38 <yoctozepto> just add reno, post to ml, all that nice stuff
15:20:55 <yoctozepto> we could do better with prechecks, that's true
15:21:21 <yoctozepto> all our target platforms have systemd
15:21:27 <yoctozepto> so timedatectl check would do
15:21:35 <hrw> I always forget order of prechecks and bootstrap-servers ;D
15:21:50 <yoctozepto> hrw: what does your laptop say about timedatectl ?
15:22:13 <hrw> Time zone: Europe/Warsaw (CEST, +0200)
15:22:13 <hrw> System clock synchronized: yes
15:22:13 <hrw> NTP service: active
15:22:18 <mgoddard> mine says unknown command check
15:22:22 <yoctozepto> so you've got ntp
15:22:25 <hrw> no ntp nor chronyd
15:22:25 <yoctozepto> mgoddard: :-(
15:22:35 <yoctozepto> hrw: probably timesyncd
15:22:44 <hrw> no
15:22:45 <yoctozepto> we don't really check for all conflicting ntp clients
15:22:58 <hrw> chronyd ntp ntpd timesyncd missing
15:23:08 <yoctozepto> odd
15:23:26 <mgoddard> my bionic laptop is using ntpd
15:23:35 <mgoddard> anyway, I think we're rabbit holing
15:23:43 <hrw> systemd-timesyncd.service not timesyncd ;d
15:23:45 <mgoddard> we have a few things to get through
15:23:52 <yoctozepto> on bionic, focal and buster I see systemd-timesyncd
15:23:56 <yoctozepto> did not touch anything
15:24:05 <mgoddard> does anyone want to pick this up?
15:24:14 <dougsz> +1, Bionic: systemd-timesyncd.service active: yes
15:24:22 <yoctozepto> mgoddard: changing default + prechecks with timedatectl?
15:24:45 <yoctozepto> centos 7 obviously chronyd
15:24:57 <hrw> so chronyd on centos and systemd-timesyncd on debuntu?
15:25:18 <yoctozepto> it looks so
15:25:21 <yoctozepto> checking centos8
15:25:21 <mgoddard> can chrony not be installed on buster?
15:25:36 <priteau> IIRC timesyncd is just a client, so we cannot do the NTP server on VIP like we do with chrony
15:26:13 <mgoddard> presumably chrony is supported on all platforms, given we have a container
15:26:15 <priteau> https://packages.debian.org/buster/chrony
15:27:33 <yoctozepto> it is
15:27:38 <yoctozepto> centos8 is also chrony
15:27:57 <yoctozepto> priteau: if we drop controlling ntp, then it's not our biz to carry on with that
15:28:26 <yoctozepto> timesyncd is sntp client only
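[editor's note: the timedatectl-based precheck floated above could be sketched roughly as below. This is an illustrative assumption, not actual kolla-ansible code; the task name and failure condition are invented, and `timedatectl show` needs a reasonably recent systemd, so older platforms would need a fallback.]

```yaml
# Hypothetical precheck sketch (not the real kolla-ansible task):
# fail early when the host reports no active NTP synchronisation.
- name: Check that an NTP client is active on the host
  command: timedatectl show --property NTPSynchronized --value
  register: ntp_sync
  changed_when: false
  failed_when: ntp_sync.stdout | trim != 'yes'
```

This stays agnostic about which client (chronyd, ntpd, systemd-timesyncd) provides sync, which matches the discussion above about not checking for every conflicting NTP client individually.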
15:28:57 <yoctozepto> does k-a actually run chronyd on controllers as server?
15:29:01 <mgoddard> I updated the bug report
15:29:04 <priteau> Sorry if I misunderstood, I thought you were discussing configuring it as a host service via bootstrap-servers
15:29:48 <mgoddard> ah, so we're not planning to add configuration in bootstrap-servers?
15:29:57 <yoctozepto> I wouldn't
15:30:10 <yoctozepto> I mean, if it's to be done somewhere, then kayobe is probably a better place
15:30:20 <yoctozepto> as MAAS/Foreman will deal with NTP anyhow
15:30:24 <mgoddard> there's a bit of a regression there then, in the ability to provide custom config
15:30:29 <yoctozepto> CD install or cloud image have it by default as well
15:30:46 <yoctozepto> depends on how many folks actually rely on that feature
15:30:51 <yoctozepto> always sad to revert such things
15:31:08 <yoctozepto> well, we are leaving the switch for now
15:31:15 <mgoddard> true
15:31:27 <yoctozepto> we might deprecate it and remove in W or later but it does not hurt much
15:32:21 <mgoddard> I think whoever picks this up can spend some time thinking about it
15:32:25 <mgoddard> Anyone want to
15:32:27 <mgoddard> ?
15:32:58 <priteau> yoctozepto: My understanding is that k-a configures the VIP as one of the servers, and that makes the active controller act as an NTP server
15:33:26 <yoctozepto> mgoddard: /me as secondary candidate
15:33:37 <yoctozepto> priteau: I never really used it so no idea, could check
15:34:20 <priteau> Although the external NTP servers are also listed in the configuration, and they have a lower stratum, so not sure if it's really used
15:34:35 <yoctozepto> we've also got enable_host_ntp for lolz
15:34:41 <yoctozepto> what about it
15:34:44 <mgoddard> priteau: I haven't seen that VIP support
15:34:47 <yoctozepto> looks half-butted
15:34:58 <mgoddard> won't work on centos 8
15:35:22 <mgoddard> ok, calling time on this
15:35:24 <yoctozepto> ack, add to the bug report
15:35:30 <mgoddard> #topic Switching Ussuri to track stable branches
15:35:40 <yoctozepto> +1
15:35:47 <hrw> +1
15:35:53 <mgoddard> We agreed to switch kolla stable/ussuri to track stable branches rather than versions at the PTG
15:36:04 <mgoddard> we made the GA release, so now we can do it
15:36:06 <yoctozepto> yeah, so no need to discuss
15:36:08 <mgoddard> anyone want to?
15:36:08 <yoctozepto> next topic
15:36:17 <mgoddard> looking for a volunteer :)
15:36:21 <hrw> I will
15:36:24 <yoctozepto> hrw sounds like the guy
15:36:25 <hrw> continue
15:36:26 <mgoddard> thanks hrw
15:36:27 <yoctozepto> you see
15:36:28 <yoctozepto> :D
15:36:57 <mgoddard> #action hrw to switch kolla to use stable branches and update release documentation
15:37:24 <mgoddard> #topic Future of the common role
15:37:48 <mgoddard> I've been spending a lot of time profiling ansible recently
15:38:04 <yoctozepto> much appreciated
15:38:12 <mgoddard> At scale (100+ hosts) the common role takes the most time to execute
15:38:31 <yoctozepto> you know the drill: deprecate and drop :-)
15:38:53 <mgoddard> We can improve this by removing the role dependency and executing it like any other role
15:39:08 <mgoddard> There is a behaviour change here though
15:39:30 <mgoddard> The common role was previously added as a dependency to all other roles.
15:39:33 <mgoddard> It would set a fact after running on a host to avoid running twice. This
15:39:34 <mgoddard> had the nice effect that deploying any service would automatically pull
15:39:36 <mgoddard> in the common services for that host. When using tags, any services with
15:39:39 <mgoddard> matching tags would also run the common role. This could be both
15:39:40 <mgoddard> surprising and sometimes useful.
15:39:42 <mgoddard> 
15:39:44 <mgoddard> When using Ansible at large scale, there is a penalty associated with
15:39:47 <mgoddard> executing a task against a large number of hosts, even if it is skipped.
15:39:47 <openstackgerrit> Merged openstack/kolla-ansible master: Use public interface for Magnum client and trustee Keystone interface  https://review.opendev.org/738351
15:39:49 <mgoddard> The common role introduces some overhead, just in determining that it
15:39:51 <mgoddard> has already run.
15:39:52 <mgoddard> 
15:39:55 <mgoddard> This change extracts the common role into a separate play, and removes
15:39:56 <mgoddard> the dependency on it from all other roles. New groups have been added
15:39:58 <mgoddard> for cron, fluentd, and kolla-toolbox, similar to other services. This
15:40:00 <mgoddard> changes the behaviour in the following ways:
15:40:02 <mgoddard> 
15:40:04 <yoctozepto> ah, that paste
15:40:05 <mgoddard> * The common role is now run for all hosts at the beginning, rather than
15:40:07 <mgoddard> prior to their first enabled service
15:40:08 <mgoddard> * Hosts must be in the necessary group for each of the common services
15:40:11 <mgoddard> in order to have that service deployed. This is mostly to avoid
15:40:13 <mgoddard> deploying on localhost or the deployment host
15:40:14 <mgoddard> * If tags are specified for another service e.g. nova, the common role
15:40:16 <mgoddard> will *not* automatically run for matching hosts. The common tag must
15:40:18 <mgoddard> be specified explicitly
15:40:21 <mgoddard> 
15:40:22 <mgoddard> The last of these is probably the largest behaviour change. While it
15:40:24 <mgoddard> would be possible to determine which hosts should automatically run the
15:40:26 <mgoddard> common role, it would be quite complex, and would introduce some
15:40:28 <mgoddard> overhead that would probably negate the benefit of splitting out the
15:40:31 <mgoddard> common role.
15:40:32 <mgoddard> just dropping in my commit message for reference :)
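[editor's note: the split described in the commit message above could be sketched as a standalone play like the one below. The group names (cron, fluentd, kolla-toolbox) come from the commit message; the playbook layout itself is an illustrative assumption, not the merged implementation.]

```yaml
# Sketch of the proposed refactor: the common role becomes its own
# play at the top of the site playbook, instead of a dependency
# declared in every service role's meta/main.yml.
- name: Apply role common
  hosts:
    - cron
    - fluentd
    - kolla-toolbox
  tags: common
  roles:
    - role: common

# Service plays no longer pull in common automatically, so running
# with tags requires naming it explicitly, e.g.:
#   kolla-ansible deploy --tags common,nova
```

The explicit group membership is what avoids running common on localhost or the deployment host, and the explicit tag is the behaviour change called out as the largest one.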
15:40:34 <mgoddard> skip to the end...
15:40:37 <mgoddard> is this a hit we are willing to take to improve performance?
15:41:13 <yoctozepto> since we have other "dependent" roles, I +1 as it matches most of our design
15:41:22 <yoctozepto> the 'all depends on common' is funky
15:41:25 <priteau> IMHO I would say yes, because I always run Kayobe with --kolla-skip-tags common unless I know I need it
15:41:40 <yoctozepto> priteau: truer words have never been spoken
15:41:48 <yoctozepto> priteau: although /me with k-a directly
15:42:11 <yoctozepto> mgoddard: how much better does it get?
15:42:14 <yoctozepto> performance-wise
15:42:15 <hrw> I do not have opinion on it.
15:42:39 <mgoddard> it depends. I think there are non-linear scaling factors
15:43:02 <yoctozepto> non-linear is bad
15:43:05 <mgoddard> at 100+ hosts just skipping a task takes a long time
15:43:08 <yoctozepto> unless it's sub-linear
15:43:21 <yoctozepto> yeah, figured, and there is a bunch
15:43:25 <yoctozepto> how was it for you?
15:43:34 <yoctozepto> at least 10% decrease in time?
15:44:02 <mgoddard> I'm doing a few runs with multiple improvements applied. I don't know if I'll get an exact figure just for this one
15:44:30 <yoctozepto> I see; well, it would be nicer if it was measurable
15:44:43 <dougsz> the other thing - sometimes you want to just run the common role which I don't think you can at the moment
15:44:50 <yoctozepto> if we are refactoring for 1% performance gain then it's not really worth it
15:44:55 <mgoddard> that's true. This makes that possible
15:45:06 <yoctozepto> and this makes it possible to finally split the common role
15:45:14 <yoctozepto> to be less confusing
15:45:34 <mgoddard> true, although more roles == more painful includes :)
15:45:41 <yoctozepto> haha, that's true
15:45:52 <yoctozepto> deprecate all, let's go for the monolith 8-)
15:46:06 <mgoddard> I don't think this one change will be a 10% improvement, but I'd expect more than 1%
15:46:19 <yoctozepto> mgoddard: measurements welcome
15:46:46 <mgoddard> I can add up some tasks that won't exist :)
15:47:32 <mgoddard> I'm going to be doing quite a few small improvements. I don't know if it will be realistic to benchmark them all, but I am benchmarking the underlying behaviour
15:47:44 <mgoddard> #link https://github.com/stackhpc/ansible-scaling
15:47:55 <mgoddard> WIP
15:48:18 <yoctozepto> nice, instastar
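[editor's note: for reproducing the per-task timing measurements discussed above, Ansible ships built-in `timer` and `profile_tasks` callback plugins that print elapsed time per play and per task. A minimal ansible.cfg sketch; note the `callback_whitelist` key was later renamed `callbacks_enabled` in newer Ansible releases.]

```ini
# ansible.cfg - enable Ansible's built-in per-task profiling
[defaults]
callback_whitelist = timer, profile_tasks
```

With this enabled, even skipped tasks show up with their wall-clock cost, which is exactly the overhead being measured at 100+ hosts.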
15:48:23 <mgoddard> #topic Keystone fernet keys handling https://review.opendev.org/707080
15:48:24 <patchbot> patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:48:25 <JamesBenson> wouldn't there be a lot of repetition between roles if we separated it out completely?  Perhaps demarcating the common role more, but not completely?
15:49:18 <mgoddard> JamesBenson: possibly. I don't plan to split at this point
15:49:23 <yoctozepto> ++
15:49:26 <mgoddard> who added this topic?
15:49:29 <yoctozepto> ok, now on to the keystone
15:49:36 * yoctozepto did
15:49:53 <yoctozepto> just quick recap
15:50:02 <yoctozepto> whether you have something to say about it
15:50:15 <yoctozepto> it seems we are still plagued by the random keystone issues
15:50:55 <yoctozepto> I don't want to take up too much meeting time, but rather make sure you remember we have this pesky issue; reviews and any more info are welcome
15:51:12 <yoctozepto> mnasiadka is not around today so we might as well postpone any further discussion till the next meeting
15:51:45 <mgoddard> +1 for prioritising it
15:51:47 * yoctozepto said what he wanted to say
15:51:59 <yoctozepto> dougsz, priteau: your thoughts maybe?
15:52:00 <mgoddard> #topic CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:52:01 <mgoddard> hrw:
15:52:05 <hrw> yes, me
15:52:17 <hrw> we need those two patches in
15:52:31 <priteau> yoctozepto: Sorry I've not looked closely at the fernet issues
15:52:31 <dougsz> yoctozepto: Only that the follow up patch resolved my issues on a single node deploy
15:52:31 <hrw> otherwise no rabbitmq for centos/aarch64 so no deployments
15:52:41 <dougsz> agree we need those 2 patches
15:52:55 <hrw> and no, I do not plan to work on building upstream erlang to satisfy sick yoctozepto wishes
15:52:58 <yoctozepto> priteau, dougsz: thanks, guys
15:53:12 <yoctozepto> hrw: I'm sorry to hear you consider them sick
15:54:13 <yoctozepto> I'm fine as long as we don't have to cater for multiarch deployment of rabbitmq (unlikely) or rdo-provided rmq breaks k-a logic at some point (more likely)
15:54:14 <hrw> I come from 'get it working and then fix it if bugs are reported' rather than 'spend an extra week or two on stuff no one uses'
15:55:34 <mgoddard> +1
15:55:35 <mgoddard> +2
15:55:49 <yoctozepto> I'm an overthinker as you might have noticed
15:55:58 <yoctozepto> I like to know the traps ahead
15:56:02 <hrw> yoctozepto: rmq goes from upstream repo not centos
15:56:10 <hrw> https://24eade5565127d985eb0-7e6feee1594781d3a430e22d861f8db7.ssl.cf2.rackcdn.com/737473/2/check-arm64/kolla-build-centos8-source-aarch64/b9f98b0/kolla/build/rabbitmq.log
15:56:19 <yoctozepto> hrw: yeah, but erlang is critical to rmq's happiness :-)
15:56:30 <hrw> yoctozepto: so go, spend some time on building it.
15:56:38 <hrw> copr will make it quite easy probably
15:56:41 <yoctozepto> hrw: you are supporting aarch64
15:56:44 <mgoddard> hrw: ussuri fails
15:57:00 <yoctozepto> I'm supporting your supporting
15:57:05 <yoctozepto> and that's it :-)
15:57:07 <hrw> mgoddard: once master gets approved I will look at ussuri
15:57:12 <mgoddard> ok
15:57:16 <mgoddard> it's approved :)
15:57:19 <mgoddard> #topic Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:57:23 <mgoddard> hrw again
15:57:29 <hrw> yes
15:57:37 <hrw> review patches, read and comment notes
15:57:51 <hrw> or better read/comment/review even
15:58:14 <hrw> as we had so many discussions that I am starting to lose track of what we agreed and what not
15:58:38 <yoctozepto> ++
15:58:54 <yoctozepto> only me looking at that etherpad though
15:59:59 <mgoddard> will read the pad after the meeting
16:00:03 <hrw> thx
16:00:10 <mgoddard> that's a wrap
16:00:18 <mgoddard> Kolla klub tomorrow
16:00:22 <mgoddard> see you there
16:00:24 <hrw> time for #endmeeting
16:00:25 <mgoddard> #endmeeting