15:00:34 #startmeeting kolla
15:00:35 Meeting started Wed Jul 1 15:00:34 2020 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 The meeting name has been set to 'kolla'
15:00:57 #topic rollcall
15:01:07 \0
15:01:13 o/
15:01:31 Hi, this meeting is clashing with the opendev large scale discussions. Let's see who shows up
15:01:32 \o
15:01:40 o/
15:03:33 I can see only priteau with me in there
15:03:38 o/
15:03:44 so we might start as well
15:03:47 o/
15:03:51 edge is not really my thing
15:05:15 mgoddard mnasiadka hrw egonzalez yoctozepto rafaelweingartne cosmicsound osmanlicilegi
15:05:19 ^ meeting
15:05:35 #topic agenda
15:05:38 * Roll-call
15:05:40 * Announcements
15:05:42 * Review action items from last meeting
15:05:44 * CI status
15:05:48 * Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:05:48 Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:05:48 * Switching Ussuri to track stable branches
15:05:50 * Future of the common role
15:05:52 * Keystone fernet keys handling https://review.opendev.org/707080
15:05:53 patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:05:54 * CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:05:56 * Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:05:58 * Victoria release planning (kayobe)
15:06:01 * Victoria release planning (kolla & kolla ansible)
15:06:02 * Kolla klub
15:06:05 Lots of topics today
15:06:06 #topic Announcements
15:06:40 #info OpenDev large scale infra sessions took place this week
15:06:57 #link https://www.openstack.org/events/opendev-2020/opendev-schedule
15:07:09 Videos on youtube, notes on etherpad
15:07:46 Any others?
15:08:45 go on
15:09:03 #topic Review action items from last meeting
15:09:13 mnasiadka backport stein CI fixes
15:09:15 yoctozepto to email list about kolla dev kalls
15:09:20 mnasiadka did his
15:09:24 yoctozepto did his
15:09:34 #topic CI status
15:09:59 pretty green
15:10:02 yeah
15:10:06 Rocky still broken
15:10:06 yep
15:10:07 random errors with mirrors
15:10:12 https://review.opendev.org/#/c/738344/
15:10:12 patch 738344 - kolla (stable/rocky) - Fix multiple issues - 3 patch sets
15:10:33 any idea what's left to be done?
15:10:39 (for rocky)
15:11:01 no, haven't dug into it
15:11:05 ah, you reverted my fix part
15:11:13 probably
15:11:29 +2
15:12:11 Radosław Piliszek proposed openstack/kolla stable/rocky: Fix multiple issues https://review.opendev.org/738344
15:12:38 +2 from me as well, let's see now
15:12:46 let's see how it goes
15:12:56 hold your +2s until we pass...
15:13:06 no +W until pass
15:13:17 meh, I agree with the contents
15:13:24 fingers crossed for CI jobs
15:13:38 rocky k-a fine?
15:13:46 hopefully
15:13:51 ok
15:13:59 we are seeing instability in kayobe CI
15:14:11 at least partially down to out of disk space issues
15:14:20 as I mentioned above mirrors are very grumpy
15:14:33 so could affect kayobe as well
15:14:38 likely
15:14:56 let's see how it goes, but probably needs attention
15:15:00 #topic Chrony & Ceph octopus (https://bugs.launchpad.net/kolla-ansible/+bug/1885689 & https://storyboard.openstack.org/#!/story/2007872)
15:15:01 Launchpad bug 1885689 in kolla-ansible victoria "Ceph octopus incompatible with containerised chrony" [Medium,Triaged]
15:15:22 all in for changing the default
15:15:40 even for ussuri now (with a reno, ml post and alike)
15:16:46 it would leave us without NTP by default
15:17:09 add it to bootstrap-servers?
15:17:19 there is some ntp nowadays by default though
15:17:48 Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca https://review.opendev.org/738859
15:18:23 my Debian 'buster' laptop lacks chronyd
15:18:27 true, but if we disable it then existing environments may lose any custom config
15:18:34 that's true
15:18:47 Make it an option in Ussuri and earlier?
15:18:56 And change the default in Victoria
15:18:56 it's already an option :-)
15:19:02 the default is bad
15:19:08 I think that makes sense
15:19:10 Doug Szumski proposed openstack/kolla-ansible master: Set a chunk size for Fluentd bulk log upload to Monasca https://review.opendev.org/738859
15:19:14 but changing is disruptive for those unaware ;/
15:20:14 I suppose if we get in quick and make a release we could minimise disruption in ussuri
15:20:18 install chronyd, enable it, check that it is running. if it fails then fail deploy at bootstrap-servers
15:20:29 mgoddard: my thoughts exactly :-)
15:20:34 as that would mean user config being wrong
15:20:38 just add reno, post to ml, all that nice stuff
15:20:55 we could do better with prechecks, that's true
15:21:21 all our target platforms have systemd
15:21:27 so timedatectl check would do
15:21:35 I always forget order of prechecks and bootstrap-servers ;D
15:21:50 hrw: what does your laptop say about timedatectl ?
15:22:13 Time zone: Europe/Warsaw (CEST, +0200)
15:22:13 System clock synchronized: yes
15:22:13 NTP service: active
15:22:18 mine says unknown command check
15:22:22 so you've got ntp
15:22:25 no ntp nor chronyd
15:22:25 mgoddard: :-(
15:22:35 hrw: probably timesyncd
15:22:44 no
15:22:45 we don't really check for all conflicting ntp clients
15:22:58 chronyd ntp ntpd timesyncd missing
15:23:08 odd
15:23:26 my bionic laptop is using ntpd
15:23:35 anyway, I think we're rabbit holing
15:23:43 systemd-timesyncd.service not timesyncd ;d
15:23:45 we have a few things to get through
15:23:52 on bionic, focal and buster I see systemd-timesyncd
15:23:56 did not touch anything
15:24:05 does anyone want to pick this up?
15:24:14 +1, Bionic: systemd-timesyncd.service active: yes
15:24:22 mgoddard: changing default + prechecks with timedatectl?
15:24:45 centos 7 obviously chronyd
15:24:57 so chronyd on centos and systemd-timesyncd on debuntu?
15:25:18 it looks so
15:25:21 checking centos8
15:25:21 can chrony not be installed on buster?
15:25:36 IIRC timesyncd is just a client, so we cannot do the NTP server on VIP like we do with chrony
15:26:13 presumably chrony is supported on all platforms, given we have a container
15:26:15 https://packages.debian.org/buster/chrony
15:27:33 it is
15:27:38 centos8 is also chrony
15:27:57 priteau: if we drop controlling ntp, then it's not our biz to carry on with that
15:28:26 timesyncd is sntp client only
15:28:57 does k-a actually run chronyd on controllers as server?
15:29:01 I updated the bug report
15:29:04 Sorry if I misunderstood, I thought you were discussing configuring it as a host service via bootstrap-servers
15:29:48 ah, so we're not planning to add configuration in bootstrap-servers?
15:29:57 I wouldn't
15:30:10 I mean, if it's to be done somewhere, then kayobe is probably a better place
15:30:20 as MAAS/Foreman will deal with NTP anyhow
15:30:24 there's a bit of a regression there then, to provide custom config
15:30:29 CD install or cloud image have it by default as well
15:30:46 depends on how many folks actually rely on that feature
15:30:51 always sad to revert such things
15:31:08 well, we are leaving the switch for now
15:31:15 true
15:31:27 we might deprecate it and remove in W or later but it does not hurt much
15:32:21 I think whoever picks this up can spend some time thinking about it
15:32:25 Anyone want to
15:32:27 ?
15:32:58 yoctozepto: My understanding is that k-a configures the VIP as one of the servers, and that makes the active controller act as an NTP server
15:33:26 mgoddard: /me as secondary candidate
15:33:37 priteau: I never really used it so no idea, could check
15:34:20 Although the external NTP servers are also listed in the configuration, and they have a lower stratum, so not sure if it's really used
15:34:35 we've also got enable_host_ntp for lolz
15:34:41 what about it
15:34:44 priteau: I haven't seen that VIP support
15:34:47 looks half-butted
15:34:58 won't work on centos 8
15:35:22 ok, calling time on this
15:35:24 ack, add to the bug report
15:35:30 #topic Switching Ussuri to track stable branches
15:35:40 +1
15:35:47 +1
15:35:53 We agreed to switch kolla stable/ussuri to track stable branches rather than versions at the PTG
15:36:04 we made the GA release, so now we can do it
15:36:06 yeah, so no need to discuss
15:36:08 anyone want to?
15:36:08 next topic
15:36:17 looking for a volunteer :)
15:36:21 I will
15:36:24 hrw sounds like the guy
15:36:25 continue
15:36:26 thanks hrw
15:36:27 you see
15:36:28 :D
15:36:57 #action hrw to switch kolla to use stable branches and update release documentation
15:37:24 #topic Future of the common role
15:37:48 I've been spending a lot of time profiling ansible recently
15:38:04 much appreciated
15:38:12 At scale (100+ hosts) the common role takes the most time to execute
15:38:31 you know the drill: deprecate and drop :-)
15:38:53 We can improve this by removing the role dependency and executing it like any other role
15:39:08 There is a behaviour change here though
15:39:30 The common role was previously added as a dependency to all other roles.
15:39:33 It would set a fact after running on a host to avoid running twice. This
15:39:34 had the nice effect that deploying any service would automatically pull
15:39:36 in the common services for that host. When using tags, any services with
15:39:39 matching tags would also run the common role. This could be both
15:39:40 surprising and sometimes useful.
15:39:42
15:39:44 When using Ansible at large scale, there is a penalty associated with
15:39:47 executing a task against a large number of hosts, even if it is skipped.
15:39:47 Merged openstack/kolla-ansible master: Use public interface for Magnum client and trustee Keystone interface https://review.opendev.org/738351
15:39:49 The common role introduces some overhead, just in determining that it
15:39:51 has already run.
15:39:52
15:39:55 This change extracts the common role into a separate play, and removes
15:39:56 the dependency on it from all other roles. New groups have been added
15:39:58 for cron, fluentd, and kolla-toolbox, similar to other services. This
15:40:00 changes the behaviour in the following ways:
15:40:02
15:40:04 ah, that paste
15:40:05 * The common role is now run for all hosts at the beginning, rather than
15:40:07 prior to their first enabled service
15:40:08 * Hosts must be in the necessary group for each of the common services
15:40:11 in order to have that service deployed. This is mostly to avoid
15:40:13 deploying on localhost or the deployment host
15:40:14 * If tags are specified for another service e.g. nova, the common role
15:40:16 will *not* automatically run for matching hosts. The common tag must
15:40:18 be specified explicitly
15:40:21
15:40:22 The last of these is probably the largest behaviour change. While it
15:40:24 would be possible to determine which hosts should automatically run the
15:40:26 common role, it would be quite complex, and would introduce some
15:40:28 overhead that would probably negate the benefit of splitting out the
15:40:31 common role.
15:40:32 just dropping in my commit message for reference :)
15:40:34 skip to the end...
15:40:37 is this a hit we are willing to take to improve performance?
15:41:13 since we have other "dependent" roles, I +1 as it matches most of our design
15:41:22 the 'all depends on common' is funky
15:41:25 IMHO I would say yes, because I always run Kayobe with --kolla-skip-tags common unless I know I need it
15:41:40 priteau: truer words have never been spoken
15:41:48 priteau: although /me with k-a directly
15:42:11 mgoddard: how much better does it get?
15:42:14 performance-wise
15:42:15 I do not have an opinion on it.
15:42:39 it depends. I think there are non-linear scaling factors
15:43:02 non-linear is bad
15:43:05 at 100+ hosts just skipping a task takes a long time
15:43:08 unless it's sub-linear
15:43:21 yeah, figured, and there is a bunch
15:43:25 how was it for you?
15:43:34 at least 10% decrease in time?
15:44:02 I'm doing a few runs with multiple improvements applied. I don't know if I'll get an exact figure just for this one
15:44:30 I see; well, it would be nicer if it was measurable
15:44:43 the other thing - sometimes you want to just run the common role which I don't think you can at the moment
15:44:50 if we are refactoring for 1% performance gain then it's not really worth it
15:44:55 that's true. This makes that possible
15:45:06 and this makes it possible to finally split the common role
15:45:14 to be less confusing
15:45:34 true, although more roles == more painful includes :)
15:45:41 haha, that's true
15:45:52 deprecate all, let's go for the monolith 8-)
15:46:06 I don't think this one change will be a 10% improvement, but I'd expect more than 1%
15:46:19 mgoddard: measurements welcome
15:46:46 I can add up some tasks that won't exist :)
15:47:32 I'm going to be doing quite a few small improvements. I don't know if it will be realistic to benchmark them all, but I am benchmarking the underlying behaviour
15:47:44 #link https://github.com/stackhpc/ansible-scaling
15:47:55 WIP
15:48:18 nice, instastar
15:48:23 #topic Keystone fernet keys handling https://review.opendev.org/707080
15:48:24 patch 707080 - kolla-ansible - Fix fernet bootstrap and key distribution - follow up - 14 patch sets
15:48:25 wouldn't there be a lot of repetition between roles separating it out completely? Perhaps demarcating the common role more, but not completely?
15:49:18 JamesBenson: possibly. I don't plan to split at this point
15:49:23 ++
15:49:26 who added this topic?
15:49:29 ok, now onto the keystone
15:49:36 * yoctozepto did
15:49:53 just a quick recap
15:50:02 whether you have something to say about it
15:50:15 it seems we are still plagued by the random keystone issues
15:50:55 I don't want to take up too much meeting time but rather make sure you remember we have this pesky issue and reviewing and any more info is welcome
15:51:12 mnasiadka is not around today so we might as well postpone any further discussion till the next meeting
15:51:45 +1 for prioritising it
15:51:47 * yoctozepto said what he wanted to say
15:51:59 dougsz, priteau: your thoughts maybe?
15:52:00 #topic CentOS/AArch64: use Erlang from CentOS to get RabbitMQ working: https://review.opendev.org/#/q/I2559267d120081f2e5eabc9d966b019517a5ad5d
15:52:01 hrw:
15:52:05 yes, me
15:52:17 we need those two patches in
15:52:31 yoctozepto: Sorry I've not looked closely at the fernet issues
15:52:31 yoctozepto: Only that the follow up patch resolved my issues on a single node deploy
15:52:31 otherwise no rabbitmq for centos/aarch64 so no deployments
15:52:41 agree we need those 2 patches
15:52:55 and no, I do not plan to work on building upstream erlang to satisfy sick yoctozepto wishes
15:52:58 priteau, dougsz: thanks, guys
15:53:12 hrw: I'm sorry to hear you consider them sick
15:54:13 I'm fine as long as we don't have to cater for multiarch deployment of rabbitmq (unlikely) or rdo-provided rmq breaks k-a logic at some point (more likely)
15:54:14 I came from 'get it working and then fix if bug reports' rather than 'spend extra week or two on stuff no one uses'
15:55:34 +1
15:55:35 +2
15:55:49 I'm an overthinker as you might have noticed
15:55:58 I like to know the traps ahead
15:56:02 yoctozepto: rmq goes from upstream repo not centos
15:56:10 https://24eade5565127d985eb0-7e6feee1594781d3a430e22d861f8db7.ssl.cf2.rackcdn.com/737473/2/check-arm64/kolla-build-centos8-source-aarch64/b9f98b0/kolla/build/rabbitmq.log
15:56:19 hrw: yeah, but erlang is critical to rmq's happiness :-)
15:56:30 yoctozepto: so go, spend some time on building it.
15:56:38 copr will make it quite easy probably
15:56:41 hrw: you are supporting aarch64
15:56:44 hrw: ussuri fails
15:57:00 I'm supporting your supporting
15:57:05 and that's it :-)
15:57:07 mgoddard: once master gets approved I will look at ussuri
15:57:12 ok
15:57:16 it's approved :)
15:57:19 #topic Infra images: https://etherpad.opendev.org/p/Kolla-infra-images
15:57:23 hrw again
15:57:29 yes
15:57:37 review patches, read and comment notes
15:57:51 or better read/comment/review even
15:58:14 as we had so many discussions that I am starting to lose track of what we agreed and what not
15:58:38 ++
15:58:54 only me looking at that etherpad though
15:59:59 will read the pad after the meeting
16:00:03 thx
16:00:10 that's a wrap
16:00:18 Kolla klub tomorrow
16:00:22 see you there
16:00:24 time for #endmeeting
16:00:25 #endmeeting
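
Editor's note on the timedatectl precheck idea raised in the Chrony & Ceph octopus topic (15:21:27 to 15:24:22): the sketch below is only an illustration of how such a check might read the status text pasted at 15:22:13. It is not kolla-ansible code. The function names parse_timedatectl_status and clock_is_synchronized are hypothetical, and the assumption that the check keys on the "System clock synchronized" field is ours, not something agreed in the meeting.

```python
# Hypothetical helper names; not part of kolla-ansible.
# Input is the human-readable status text, as pasted in the log at 15:22:13.


def parse_timedatectl_status(text):
    """Split 'Key: value' lines into a dict, ignoring lines without a colon."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as timestamps survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def clock_is_synchronized(text):
    """Assumed precheck rule: pass only if the clock is reported as synced."""
    fields = parse_timedatectl_status(text)
    return fields.get("System clock synchronized") == "yes"


if __name__ == "__main__":
    sample = (
        "Time zone: Europe/Warsaw (CEST, +0200)\n"
        "System clock synchronized: yes\n"
        "NTP service: active\n"
    )
    print(clock_is_synchronized(sample))
```

Because the check only looks at the reported synchronisation state, it would not care whether chronyd, ntpd or systemd-timesyncd is the client in use, which matches the mix of clients reported across CentOS, Debian and Ubuntu in the discussion. Output that lacks the field, such as the "unknown command check" reply at 15:22:18, is treated as not synchronised.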