15:00:34 <mgoddard> #startmeeting kolla
15:00:35 <openstack> Meeting started Wed Jul 1 15:00:34 2020 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'kolla'
15:00:57 <mgoddard> #topic rollcall
15:01:07 <suryasingh> \0
15:01:13 <yoctozepto> o/
15:01:31 <mgoddard> Hi, this meeting is clashing with the opendev large scale discussions. Let's see who shows up
15:01:32 <mgoddard> \o
15:01:40 <hrw> o/
15:03:33 <yoctozepto> I can see only priteau with me in there
15:03:38 <headphoneJames> o/
15:03:44 <yoctozepto> so we might start as well
15:03:47 <priteau> o/
15:03:51 <yoctozepto> edge is not really my thing
15:05:15 <mgoddard> mgoddard mnasiadka hrw egonzalez yoctozepto rafaelweingartne cosmicsound osmanlicilegi
15:05:19 <mgoddard> ^ meeting
15:05:35 <mgoddard> #topic agenda
15:05:38 <mgoddard> * Roll-call
15:05:40 <mgoddard> * Announcements
15:05:42 <mgoddard> * Review action items from last meeting
15:05:44 <mgoddard> * CI status
15:05:48 <mgoddard> * Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:05:48 <openstack> Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:05:48 <mgoddard> * Switching Ussuri to track stable branches
15:05:50 <mgoddard> * Future of the common role
15:05:52 <mgoddard> * Keystone fernet keys handling https://review.opendev.org/707080
15:05:53 <patchbot> patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:05:54 <mgoddard> * CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:05:56 <mgoddard> * Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:05:58 <mgoddard> * Victoria release planning (kayobe)
15:06:01 <mgoddard> * Victoria release planning (kolla & kolla ansible)
15:06:02 <mgoddard> * Kolla klub
15:06:05 <mgoddard> Lots of topics today
15:06:06 <mgoddard> #topic Announcements
15:06:40 <mgoddard> #info OpenDev large scale infra sessions took place this week
15:06:57 <mgoddard> #link https://www.openstack.org/events/opendev-2020/opendev-schedule
15:07:09 <mgoddard> Videos on youtube, notes on etherpad
15:07:46 <mgoddard> Any others?
15:08:45 <hrw> go on
15:09:03 <mgoddard> #topic Review action items from last meeting
15:09:13 <mgoddard> mnasiadka backport stein CI fixes
15:09:15 <mgoddard> yoctozepto to email list about kolla dev kalls
15:09:20 <mgoddard> mnasiadka did his
15:09:24 <mgoddard> yoctozepto did his
15:09:34 <mgoddard> #topic CI status
15:09:59 <yoctozepto> pretty green
15:10:02 <mgoddard> yeah
15:10:06 <mgoddard> Rocky still broken
15:10:06 <hrw> yep
15:10:07 <yoctozepto> random errors with mirrors
15:10:12 <mgoddard> https://review.opendev.org/#/c/738344/
15:10:12 <patchbot> patch 738344 - kolla (stable/rocky) - Fix multiple issues - 3 patch sets
15:10:33 <yoctozepto> any idea what's left to be done?
15:10:39 <yoctozepto> (for rocky)
15:11:01 <mgoddard> no, haven't dug into it
15:11:05 <yoctozepto> ah, you reverted my fix part
15:11:13 <mgoddard> probably
15:11:29 <hrw> +2
15:12:11 <openstackgerrit> Radosław Piliszek proposed openstack/kolla stable/rocky: Fix multiple issues https://review.opendev.org/738344
15:12:38 <yoctozepto> +2 from me as well, let's see now
15:12:46 <mgoddard> let's see how it goes
15:12:56 <mgoddard> hold your +2s until we pass...
15:13:06 <hrw> no +W until pass
15:13:17 <yoctozepto> meh, I agree with the contents
15:13:24 <yoctozepto> fingers crossed for CI jobs
15:13:38 <yoctozepto> rocky k-a fine?
15:13:46 <mgoddard> hopefully
15:13:51 <yoctozepto> ok
15:13:59 <mgoddard> we are seeing instability in kayobe CI
15:14:11 <mgoddard> at least partially down to out of disk space issues
15:14:20 <yoctozepto> as I mentioned above mirrors are very grumpy
15:14:33 <yoctozepto> so could affect kayobe as well
15:14:38 <mgoddard> likely
15:14:56 <mgoddard> let's see how it goes, but probably needs attention
15:15:00 <mgoddard> #topic Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:15:01 <openstack> Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:15:22 <yoctozepto> all in for changing the default
15:15:40 <yoctozepto> even for ussuri now (with a reno, ml post and alike)
15:16:46 <mgoddard> it would leave us without NTP by default
15:17:09 <hrw> add it to bootstrap-servers?
15:17:19 <yoctozepto> there is some ntp nowadays by default though
15:17:48 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca https://review.opendev.org/738859
15:18:23 <hrw> my Debian 'buster' laptop lacks chronyd
15:18:27 <mgoddard> true, but if we disable it then existing environments may lose any custom config
15:18:34 <yoctozepto> that true
15:18:47 <priteau> Make it an option in Ussuri and earlier?
15:18:56 <priteau> And change the default in Victoria
15:18:56 <yoctozepto> it's already an option :-)
15:19:02 <yoctozepto> the default is bad
15:19:08 <mgoddard> I think that makes sense
15:19:10 <openstackgerrit> Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca https://review.opendev.org/738859
15:19:14 <yoctozepto> but changing is disruptive for those unaware ;/
15:20:14 <mgoddard> I suppose if we get in quick and make a release we could minimise disruption in ussuri
15:20:18 <hrw> install chronyd, enable it, check is it running. if fails then fail deploy at bootstrap-servers
15:20:29 <yoctozepto> mgoddard: my thoughts exactly :-)
15:20:34 <hrw> as that would mean user config being wrong
15:20:38 <yoctozepto> just add reno, post to ml, all that nice stuff
15:20:55 <yoctozepto> we could do better with prechecks, that true
15:21:21 <yoctozepto> all our target platforms have systemd
15:21:27 <yoctozepto> so timedatectl check would do
15:21:35 <hrw> I always forget order of prechecks and bootstrap-servers ;D
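[Context: the option yoctozepto refers to already exists. A minimal sketch of opting out of the containerised chrony today via the stock enable_chrony flag, shown only as an illustration of the switch under discussion, not of any agreed default change:

    # /etc/kolla/globals.yml
    # Disable the chrony container and rely on the host's own NTP
    # service (e.g. systemd-timesyncd or a host-level chronyd) instead.
    enable_chrony: "no"

Operators carrying custom chrony configuration should note mgoddard's caveat above: flipping this on an existing deployment removes the containerised service that applies that config.]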
15:21:50 <yoctozepto> hrw: what does your laptop say about timedatectl ?
15:22:13 <hrw> Time zone: Europe/Warsaw (CEST, +0200)
15:22:13 <hrw> System clock synchronized: yes
15:22:13 <hrw> NTP service: active
15:22:18 <mgoddard> mine says unknown command check
15:22:22 <yoctozepto> so you've got ntp
15:22:25 <hrw> no ntp nor chronyd
15:22:25 <yoctozepto> mgoddard: :-(
15:22:35 <yoctozepto> hrw: probably timesyncd
15:22:44 <hrw> no
15:22:45 <yoctozepto> we don't really check for all conflicting ntp clients
15:22:58 <hrw> chronyd ntp ntpd timesyncd missing
15:23:08 <yoctozepto> odd
15:23:26 <mgoddard> my bionic laptop is using ntpd
15:23:35 <mgoddard> anyway, I think we're rabbit holing
15:23:43 <hrw> systemd-timesyncd.service not timesyncd ;d
15:23:45 <mgoddard> we have a few things to get through
15:23:52 <yoctozepto> on bionic, focal and buster I see systemd-timesyncd
15:23:56 <yoctozepto> did not touch anything
15:24:05 <mgoddard> does anyone want to pick this up?
15:24:14 <dougsz> +1, Bionic: systemd-timesyncd.service active: yes
15:24:22 <yoctozepto> mgoddard: changing default + prechecks with timedatectl?
15:24:45 <yoctozepto> centos 7 obviously chronyd
15:24:57 <hrw> so chronyd on centos and systemd-timesyncd on debuntu?
15:25:18 <yoctozepto> it looks so
15:25:21 <yoctozepto> checking centos8
15:25:21 <mgoddard> can chrony not be installed on buster?
15:25:36 <priteau> IIRC timesyncd is just a client, so we cannot do the NTP server on VIP like we do with chrony
15:26:13 <mgoddard> presumably chrony is supported on all platforms, given we have a container
15:26:15 <priteau> https://packages.debian.org/buster/chrony
15:27:33 <yoctozepto> it is
15:27:38 <yoctozepto> centos8 is also chrony
15:27:57 <yoctozepto> priteau: if we drop controlling ntp, then it's not our biz to carry on with that
15:28:26 <yoctozepto> timesyncd is sntp client only
15:28:57 <yoctozepto> does k-a actually run chronyd on controllers as server?
15:29:01 <mgoddard> I updated the bug report
15:29:04 <priteau> Sorry if I misunderstood, I thought you were discussing configuring it as a host service via bootstrap-servers
15:29:48 <mgoddard> ah, so we're not planning to add configuration in bootstrap-servers?
15:29:57 <yoctozepto> I wouldn't
15:30:10 <yoctozepto> I mean, if it's to be done somewhere, then kayobe is probably a better place
15:30:20 <yoctozepto> as MAAS/Foreman will deal with NTP anyhow
15:30:24 <mgoddard> there's a bit of a regression there then, to provide custom config
15:30:29 <yoctozepto> CD install or cloud image have it by default as well
15:30:46 <yoctozepto> depends on how many folks actually rely on that feature
15:30:51 <yoctozepto> always sad to revert such things
15:31:08 <yoctozepto> well, we are leaving the switch for now
15:31:15 <mgoddard> true
15:31:27 <yoctozepto> we might deprecate it and remove in W or later but it does not hurt much
15:32:21 <mgoddard> I think whoever picks this up can spend some time thinking about it
15:32:25 <mgoddard> Anyone want to
15:32:27 <mgoddard> ?
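[For reference, a minimal sketch of the "prechecks with timedatectl" idea yoctozepto proposes at 15:24:22. The task below is hypothetical, not the actual kolla-ansible precheck; it greps the human-readable status output, matching the "System clock synchronized: yes" line hrw pasted above, and would apply only when the containerised chrony is disabled:

    # prechecks sketch: fail early on hosts whose clocks are not
    # synchronised, instead of debugging clock-skew symptoms after deploy
    - name: Checking host clock synchronisation
      shell: timedatectl status | grep -q 'System clock synchronized: yes'
      changed_when: false
      when: not enable_chrony | bool

A machine-readable variant (timedatectl show --property NTPSynchronized) exists on newer systemd releases, but as mgoddard's laptop demonstrates above, not everywhere.]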
15:32:58 <priteau> yoctozepto: My understanding is that k-a configures the VIP as one of the servers, and that makes the active controller act as an NTP server
15:33:26 <yoctozepto> mgoddard: /me as secondary candidate
15:33:37 <yoctozepto> priteau: I never really used it so no idea, could check
15:34:20 <priteau> Although the external NTP servers are also listed in the configuration, and they have a lower stratum, so not sure if it's really used
15:34:35 <yoctozepto> we've also got enable_host_ntp for lolz
15:34:41 <yoctozepto> what about it
15:34:44 <mgoddard> priteau: I haven't seen that VIP support
15:34:47 <yoctozepto> looks half-butted
15:34:58 <mgoddard> won't work on centos 8
15:35:22 <mgoddard> ok, calling time on this
15:35:24 <yoctozepto> ack, add to the bug report
15:35:30 <mgoddard> #topic Switching Ussuri to track stable branches
15:35:40 <yoctozepto> +1
15:35:47 <hrw> +1
15:35:53 <mgoddard> We agreed at the PTG to switch kolla stable/ussuri to track stable branches rather than versions
15:36:04 <mgoddard> we made the GA release, so now we can do it
15:36:06 <yoctozepto> yeah, so no need to discuss
15:36:08 <mgoddard> anyone want to?
15:36:08 <yoctozepto> next topic
15:36:17 <mgoddard> looking for a volunteer :)
15:36:21 <hrw> I will
15:36:24 <yoctozepto> hrw sounds like the guy
15:36:25 <hrw> continue
15:36:26 <mgoddard> thanks hrw
15:36:27 <yoctozepto> you see
15:36:28 <yoctozepto> :D
15:36:57 <mgoddard> #action hrw to switch kolla to use stable branches and update release documentation
15:37:24 <mgoddard> #topic Future of the common role
15:37:48 <mgoddard> I've been spending a lot of time profiling ansible recently
15:38:04 <yoctozepto> much appreciated
15:38:12 <mgoddard> At scale (100+ hosts) the common role takes the most time to execute
15:38:31 <yoctozepto> you know the drill: deprecate and drop :-)
15:38:53 <mgoddard> We can improve this by removing the role dependency and executing it like any other role
15:39:08 <mgoddard> There is a behaviour change here though
15:39:30 <mgoddard> The common role was previously added as a dependency to all other roles.
15:39:33 <mgoddard> It would set a fact after running on a host to avoid running twice. This
15:39:34 <mgoddard> had the nice effect that deploying any service would automatically pull
15:39:36 <mgoddard> in the common services for that host. When using tags, any services with
15:39:39 <mgoddard> matching tags would also run the common role. This could be both
15:39:40 <mgoddard> surprising and sometimes useful.
15:39:42 <mgoddard>
15:39:44 <mgoddard> When using Ansible at large scale, there is a penalty associated with
15:39:47 <mgoddard> executing a task against a large number of hosts, even if it is skipped.
15:39:47 <openstackgerrit> Merged openstack/kolla-ansible master: Use public interface for Magnum client and trustee Keystone interface https://review.opendev.org/738351
15:39:49 <mgoddard> The common role introduces some overhead, just in determining that it
15:39:51 <mgoddard> has already run.
15:39:52 <mgoddard>
15:39:55 <mgoddard> This change extracts the common role into a separate play, and removes
15:39:56 <mgoddard> the dependency on it from all other roles. New groups have been added
15:39:58 <mgoddard> for cron, fluentd, and kolla-toolbox, similar to other services. This
15:40:00 <mgoddard> changes the behaviour in the following ways:
15:40:02 <mgoddard>
15:40:04 <yoctozepto> ah, that paste
15:40:05 <mgoddard> * The common role is now run for all hosts at the beginning, rather than
15:40:07 <mgoddard> prior to their first enabled service
15:40:08 <mgoddard> * Hosts must be in the necessary group for each of the common services
15:40:11 <mgoddard> in order to have that service deployed. This is mostly to avoid
15:40:13 <mgoddard> deploying on localhost or the deployment host
15:40:14 <mgoddard> * If tags are specified for another service e.g. nova, the common role
15:40:16 <mgoddard> will *not* automatically run for matching hosts. The common tag must
15:40:18 <mgoddard> be specified explicitly
15:40:21 <mgoddard>
15:40:22 <mgoddard> The last of these is probably the largest behaviour change. While it
15:40:24 <mgoddard> would be possible to determine which hosts should automatically run the
15:40:26 <mgoddard> common role, it would be quite complex, and would introduce some
15:40:28 <mgoddard> overhead that would probably negate the benefit of splitting out the
15:40:31 <mgoddard> common role.
15:40:32 <mgoddard> just dropping in my commit message for reference :)
15:40:34 <mgoddard> skip to the end...
15:40:37 <mgoddard> is this a hit we are willing to take to improve performance?
15:41:13 <yoctozepto> since we have other "dependent" roles, I +1 as it matches most of our design
15:41:22 <yoctozepto> the 'all depends on common' is funky
15:41:25 <priteau> IMHO I would say yes, because I always run Kayobe with --kolla-skip-tags common unless I know I need it
15:41:40 <yoctozepto> priteau: truer words have never been spoken
15:41:48 <yoctozepto> priteau: although /me with k-a directly
15:42:11 <yoctozepto> mgoddard: how much better does it get?
15:42:14 <yoctozepto> performance-wise
15:42:15 <hrw> I do not have opinion on it.
15:42:39 <mgoddard> it depends. I think there are non-linear scaling factors
15:43:02 <yoctozepto> non-linear is bad
15:43:05 <mgoddard> at 100+ hosts just skipping a task takes a long time
15:43:08 <yoctozepto> unless it's sub-linear
15:43:21 <yoctozepto> yeah, figured, and there is a bunch
15:43:25 <yoctozepto> how was it for you?
15:43:34 <yoctozepto> at least 10% decrease in time?
15:44:02 <mgoddard> I'm doing a few runs with multiple improvements applied. I don't know if I'll get an exact figure just for this one
15:44:30 <yoctozepto> I see; well, it would be nicer if it was measurable
15:44:43 <dougsz> the other thing - sometimes you want to just run the common role which I don't think you can at the moment
15:44:50 <yoctozepto> if we are refactoring for 1% performance gain then it's not really worth it
15:44:55 <mgoddard> that's true. This makes that possible
15:45:06 <yoctozepto> and this makes it possible to finally split the common role
15:45:14 <yoctozepto> to be less confusing
15:45:34 <mgoddard> true, although more roles == more painful includes :)
15:45:41 <yoctozepto> haha, that true
15:45:52 <yoctozepto> deprecate all, let's go for the monolith 8-)
15:46:06 <mgoddard> I don't think this one change will be a 10% improvement, but I'd expect more than 1%
15:46:19 <yoctozepto> mgoddard: measurements welcome
15:46:46 <mgoddard> I can add up some tasks that won't exist :)
15:47:32 <mgoddard> I'm going to be doing quite a few small improvements. I don't know if it will be realistic to benchmark them all, but I am benchmarking the underlying behaviour
15:47:44 <mgoddard> #link https://github.com/stackhpc/ansible-scaling
15:47:55 <mgoddard> WIP
15:48:18 <yoctozepto> nice, instastar
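[For reference, a rough sketch of the extracted play described in the commit message above; the exact layout is whatever the patch under review settles on, so treat the group names and file placement as illustrative:

    # site.yml sketch: common becomes its own play, run once for all
    # matching hosts up front, instead of a dependency of every role
    - name: Apply role common
      hosts:
        - cron
        - fluentd
        - kolla-toolbox
      roles:
        - role: common
          tags: common

This also addresses dougsz's point: `kolla-ansible deploy --tags common` can then run just the common services, while `--tags nova` alone no longer drags common in, so both tags must be given where both are wanted.]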
15:48:23 <mgoddard> #topic Keystone fernet keys handling https://review.opendev.org/707080
15:48:24 <patchbot> patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:48:25 <JamesBenson> wouldn't there be a lot of repetition between roles separating it out completely? Perhaps demarcating the common role more, but not completely?
15:49:18 <mgoddard> JamesBenson: possibly. I don't plan to split at this point
15:49:23 <yoctozepto> ++
15:49:26 <mgoddard> who added this topic?
15:49:29 <yoctozepto> ok, now onto the keystone
15:49:36 * yoctozepto did
15:49:53 <yoctozepto> just quick recap
15:50:02 <yoctozepto> whether you have something to say about it
15:50:15 <yoctozepto> it seems to me we are still plagued by the random keystone issues
15:50:55 <yoctozepto> I don't want to take up too much meeting time but rather make sure you remember we have this pesky issue and reviewing and any more info is welcome
15:51:12 <yoctozepto> mnasiadka is not around today so we might as well postpone any further discussion till the next meeting
15:51:45 <mgoddard> +1 for prioritising it
15:51:47 * yoctozepto said what he wanted to say
15:51:59 <yoctozepto> dougsz, priteau: your thoughts maybe?
15:52:00 <mgoddard> #topic CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:52:01 <mgoddard> hrw:
15:52:05 <hrw> yes, me
15:52:17 <hrw> we need those two patches in
15:52:31 <priteau> yoctozepto: Sorry I've not looked closely at the fernet issues
15:52:31 <dougsz> yoctozepto: Only that the follow up patch resolved my issues on a single node deploy
15:52:31 <hrw> otherwise no rabbitmq for centos/aarch64 so no deployments
15:52:41 <dougsz> agree we need those 2 patches
15:52:55 <hrw> and no, I do not plan to work on building upstream erlang to satisfy sick yoctozepto wishes
15:52:58 <yoctozepto> priteau, dougsz: thanks, guys
15:53:12 <yoctozepto> hrw: I'm sorry to hear you consider them sick
15:54:13 <yoctozepto> I'm fine as long as we don't have to cater for multiarch deployment of rabbitmq (unlikely) or rdo-provided rmq breaks k-a logic at some point (more likely)
15:54:14 <hrw> I came from 'get it working and then fix when bugs are reported' rather than 'spend an extra week or two on stuff no one uses'
15:55:34 <mgoddard> +1
15:55:35 <mgoddard> +2
15:55:49 <yoctozepto> I'm an overthinker as you might have noticed
15:55:58 <yoctozepto> I like to know the traps ahead
15:56:02 <hrw> yoctozepto: rmq goes from upstream repo not centos
15:56:10 <hrw> https://24eade5565127d985eb0-7e6feee1594781d3a430e22d861f8db7.ssl.cf2.rackcdn.com/737473/2/check-arm64/kolla-build-centos8-source-aarch64/b9f98b0/kolla/build/rabbitmq.log
15:56:19 <yoctozepto> hrw: yeah, but erlang is critical to rmq's happiness :-)
15:56:30 <hrw> yoctozepto: so go, spend some time on building it.
15:56:38 <hrw> copr will make it quite easy probably
15:56:41 <yoctozepto> hrw: you are supporting aarch64
15:56:44 <mgoddard> hrw: ussuri fails
15:57:00 <yoctozepto> I'm supporting your supporting
15:57:05 <yoctozepto> and that's it :-)
15:57:07 <hrw> mgoddard: once master gets approved I will look at ussuri
15:57:12 <mgoddard> ok
15:57:16 <mgoddard> it's approved :)
15:57:19 <mgoddard> #topic Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:57:23 <mgoddard> hrw again
15:57:29 <hrw> yes
15:57:37 <hrw> review patches, read and comment notes
15:57:51 <hrw> or better read/comment/review even
15:58:14 <hrw> as we had so many discussions that I am starting to lose track of what we agreed and what not
15:58:38 <yoctozepto> ++
15:58:54 <yoctozepto> only me looking at that etherpad though
15:59:59 <mgoddard> will read the pad after the meeting
16:00:03 <hrw> thx
16:00:10 <mgoddard> that's a wrap
16:00:18 <mgoddard> Kolla klub tomorrow
16:00:22 <mgoddard> see you there
16:00:24 <hrw> time for #endmeeting
16:00:25 <mgoddard> #endmeeting