15:00:31 #startmeeting kolla
15:00:32 Meeting started Wed Nov 25 15:00:31 2020 UTC and is due to finish in 60 minutes. The chair is mgoddard. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:35 The meeting name has been set to 'kolla'
15:00:59 #topic rollcall
15:01:01 \o
15:01:05 o/
15:01:05 o/
15:01:19 \o
15:02:15 #topic agenda
15:02:19 * Roll-call
15:02:20 * Announcements
15:02:22 ** Kolla Wallaby priorities https://etherpad.opendev.org/p/kolla-wallaby-priorities
15:02:24 ** Stein release now in Extended Maintenance (EM)
15:02:26 * Review action items from the last meeting
15:02:28 * CI status
15:02:30 * Victoria release planning
15:02:32 * Dockerhub pull rate limits https://etherpad.opendev.org/p/docker-pull-limits
15:02:34 * Cinder active/active https://bugs.launchpad.net/kolla-ansible/+bug/1904062
15:02:36 Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka)
15:02:36 * Wallaby PTG actions
15:02:38 * Review new retirements (Wallaby)
15:02:40 * Cinder v2 to be dropped in Wallaby http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018697.html
15:02:42 * Stein Extended Maintenance
15:02:44 #topic announcements
15:03:06 #info Kolla Wallaby priorities
15:03:10 #link Kolla Wallaby priorities https://etherpad.opendev.org/p/kolla-wallaby-priorities
15:03:13 #undo
15:03:14 Removing item from minutes: #link https://etherpad.opendev.org/p/kolla-wallaby-priorities
15:03:19 #link https://etherpad.opendev.org/p/kolla-wallaby-priorities
15:03:32 Voting now finished, we have our priorities
15:03:38 I've added them to the whiteboard
15:03:51 #link https://etherpad.opendev.org/p/KollaWhiteBoard
15:04:51 Feel free to add yourself as an owner/interested party on one of the priorities
15:04:54 #info Stein release now in Extended Maintenance (EM)
15:05:05 Final releases have been created
15:05:19 We can discuss more later
15:05:22 Any others?
15:06:43 #topic Review action items from the last meeting
15:06:58 mgoddard to email openstack-discuss about final reminder for wallaby priority voting
15:07:00 yoctozepto fix NFV
15:07:05 mgoddard did his
15:07:15 yoctozepto seems to have been busy
15:07:26 do you want another action?
15:08:23 mgoddard: yes, please
15:08:29 mgoddard: health issues
15:08:53 yoctozepto: sorry to hear
15:09:03 #action yoctozepto fix NFV
15:09:13 #topic CI status
15:09:34
15:09:48 docker rate limits we can discuss again later
15:10:22 https://bugs.launchpad.net/nova/+bug/1902696
15:10:23 Launchpad bug 1902696 in oslo.messaging "nova-compute fails with Unhandled error: TypeError: _wrap_socket_sni() got an unexpected keyword argument 'ca_certs'" [Undecided,New]
15:10:36 there was another patch for requirements, but it didn't help us
15:11:16 I'm sure there was a version of the patch that worked, do we need to go back and find out what it was
15:11:36 5.0.2 does not help
15:12:15 no
15:13:16 looks like PS1 passed
15:13:26 nov 7 11:35PM
15:14:03 PS1 of which one?
15:14:04 at that time, the requirements patch was PS2: https://review.opendev.org/c/openstack/requirements/+/761194/2
15:14:08 OK
15:14:39 which pinned amqp<=2.6.1
15:14:40 so yeah, it simply says new kombu is b0rken
15:14:59 and kombu<=5.0.1
15:15:37 current patch allows amqp 5.0.2
15:16:10 and kombu 5.0.2
15:16:41 yes
15:17:50 duh, new gerrit clumsy
15:17:55 yeah, and slow
15:18:05 hopefully it'll improve
15:18:30 ++
15:18:33 #action mgoddard to try reverting to https://review.opendev.org/c/openstack/requirements/+/761194/2
15:19:11 I think other CI issues are unchanged
15:19:29 #topic Victoria release planning
15:19:43 https://etherpad.opendev.org/p/KollaWhiteBoard
15:19:49 L144 lists release blockers
15:20:11 Kolla has the rabbitmq TLS issue we were just discussing
15:20:20 anyone looked into a/a?
15:20:24 Kolla ansible has cinder-volume active/active
15:20:40 I don't think so
15:20:59 heh
15:21:15 not beyond mnasiadka's patch anyway
15:21:36 I'll try to put some time into it tomorrow
15:21:41 yeah
15:21:47 best approach with CI testing
15:21:56 what happens if we upgrade to new approach
15:22:01 when we have volumes in place
15:22:07 might be worth to have this in general
15:22:13 instead of cleaning up
15:23:29 yes
15:24:02 create, test, delete, create, test, upgrade, test, delete, create, test, delete
15:24:13 something like that
15:24:15 ++
15:24:37 for simplicity
15:24:50 we can treat reconf = upgrade
15:24:55 would not hurt
15:25:58 well, we need to be sure that no volumes are in -ing state before upgrade
15:26:03 although it's effectively a noop, so we'd be unlikely to catch anything
15:26:25 (ignore the above, we don't test after reconf)
15:26:28 (it = reconfigure in CI)
15:26:33 (and remove old agents from db or ask the user to remove them)
15:26:37 after upgrade
15:26:54 or am I missing something major? :)
15:27:08 we are discussing testing to see the impact
15:27:18 and strengthen our testing in general
15:27:32 volumes disappearing due to an upgrade is a big no-no
15:28:11 ok, let's move on
15:28:21 #topic Dockerhub pull rate limits https://etherpad.opendev.org/p/docker-pull-limits
15:29:05 I was thinking, given the lack of any action on this, how about we go with hrw's suggestion to publish less frequently?
15:29:29 it should help, although not guaranteed
15:30:12 pretty simple to implement though, and it might be nice to have a pattern for less frequent publishing, e.g. for EM branches
15:30:32 any thoughts?
15:30:46 but we get penalized for *pulling*
15:30:51 not pushing
15:30:55 yes
15:31:10 but the less we push, the less we have to pull into registry mirrors
15:31:19 since they should cache images
15:31:21 Would it be possible for the community/foundation to implement a proxy-registry somehow?
15:31:24 hmm, clever
15:31:33 i think reconfigure can do some reload work(or restart one by one without downtime)
15:32:07 wuchunyang: rate limits are per 6 hours, not per second :)
15:32:46 rafaelweingartne: opendev infra provide registry mirrors, which we use
15:32:46 weekly images sound fine
15:32:51 ok..
15:32:56 and absurdly easy to implement now
15:33:03 just switch the pipeline
15:33:05 :D
15:33:30 mgoddard: so, we do not actually need to worry about the pull problem, right? If we document on how people can use these mirrors then
15:33:37 oh nice, periodic-weekly exists
15:33:44 indeed
15:33:54 the only problem is
15:33:59 the failures
15:33:59 :D
15:34:06 ever-b0rken images
15:34:12 never-new images
15:34:20 sad reality
15:34:53 rafaelweingartne: we're mostly focussing on how it affects CI here. Users need their own solution, such as a local registry or registry mirror
15:35:17 I see, that is actually what we are using.
15:35:28 rafaelweingartne: sensible, even before the rate limit
15:35:41 yoctozepto: true. I was thinking more like every 2-3 days
15:35:56 with some early exit in the publish job
15:36:50 if hash_of_job_info % 7 in days_to_publish then publish
15:37:33 would be nice if zuul gave the date of the last successful run of the job
15:37:47 possibly we could query it
15:38:02 sounds sensible to query
15:38:04 that would allow for failures
15:38:09 though never used the zuul api before
15:38:17 it's easy enough
15:38:18 mhm
15:39:06 anyway, some solution in there
15:39:09 anyone want to pick it up?
15:39:22 I suppose periodic-weekly would be an easy win
15:39:40 although they probably all run on a sunday
15:39:48 and an easy self-sabotage ;d
15:40:08 let's go with the zuul querying
15:40:21 though it's all problematic as these jobs only ever run periodically
15:40:28 need to simulate in DNM first
15:40:45 yes
15:41:08 I think a static hash makes sense. It distributes jobs over days
15:41:31 a system failure would lead to stampeding if we use the last success
15:41:55 so you prefer stochastic approaches
15:42:01 fine by me
15:42:36 I don't mind too much, as long as it gets fixed :)
15:43:03 somebody fix :-)
15:43:09 it's available, if someone wants to pick it up
15:43:13 #topic Cinder active/active https://bugs.launchpad.net/kolla-ansible/+bug/1904062
15:43:15 Launchpad bug 1904062 in kolla-ansible wallaby "external ceph cinder volume config breaks volumes on ussuri upgrade" [High,In progress] - Assigned to Michal Nasiadka (mnasiadka)
15:43:35 I don't think we have too much to discuss on this one, other than we need to get on with it
15:44:12 #topic Wallaby PTG actions
15:44:28 TODO(): High level documentation, eg. examples of networking config, diagrams, justification of use of containers, not k8s etc.
15:44:30 TODO(): document justification for kolla/kolla-ansible
15:44:32 TODO(yoctozepto): wait for zuul-runner, try it out, fix issues, document how to use it
15:44:34 TODO(mgoddard): Poll community for a new kolla klub timeslot
15:44:36 TODO(Fl1nt): Call for contributor campaign, offer some (limited) mentorship
15:44:38 TODO(mgoddard): Reach out to existing community members in non EU/US
15:44:40 TODO() find 'interested parties' for infra image CI work
15:44:42 TODO(Fl1nt): add a note to documentation (FAQ/troubleshooting?) about new Docker hub limits
15:44:44 TODO(): ask opendev infra about open source project application process
15:44:46 TODO(): add tags for register & bootstrap?
15:44:48 TODO(yoctozepto): make genconfig + deploy-containers work
15:44:50 TODO(yoctozepto): Deprecate reconfigure command
15:44:52 TODO(): Modernise the old skool Swift role
15:44:54 TODO(Fl1nt, or others): PoC and/or spec for podman
15:44:56 TODO(yoctozepto): work on masakari hostmonitor integration
15:44:58 TODO(headphoneJames): write up high level description of how Letsencrypt fits together
15:45:00 TODO(): write a high level design document/spec for kayobe multiple environments
15:45:02 TODO(): 'Deprecate' devicemapper on stable branches, require some action (set a flag) to override
15:45:04 TODO(jovial): make a kayobe story for switching to networkmanager
15:45:06 TODO(dougszu): Strip out Grafana post configure functionality and move it to Kolla-Ansible
15:45:08 TODO(dougszu): Investigate ansible collections, reference custom playbook repo (e.g. kayobe-ops)
15:45:10 any updates?
15:45:12 anyone want to pick one up that is not assigned?
15:45:37 the high level description: https://etherpad.opendev.org/p/kolla-ansible-letsencrypt-https
15:45:42 it is not in this list, but we did update https://review.opendev.org/c/openstack/kolla-ansible/+/695432
15:45:44 that was quick
15:45:55 I added the requested documentation on how to create a DEV env. for testing
15:47:17 thanks rafaelweingartne, that should help reviewers & testers
15:47:42 Cool, we are guessing that now, people would be more confident in testing
15:47:46 and then we can move on with that
15:48:20 yes, I think we should prioritise it once Victoria is released
15:48:54 awesome, thanks!
15:49:05 headphoneJames: a little off topic, but have you been communicating with Jason?
15:49:17 Deprecate reconfigure command this task i can help
15:49:22 I did reach out to him
15:49:54 he knows that I'm running with LE
15:49:57 ok, great
15:50:07 I also let him know that I updated the spec
15:50:14 as long as we're keeping him in the loop
15:50:25 anyone uses podman in production ?
15:51:02 i use podman to deploy ceph in production, but i hit a podman bug..
15:51:26 wuchunyang: the list is on https://etherpad.opendev.org/p/kolla-wallaby-ptg, feel free to add your nick
15:51:38 ok
15:52:13 wuchunyang: I think we normally use docker
15:52:20 #topic Review new retirements (Wallaby)
15:52:25 yoctozepto: is this you?
15:52:38 yes, i think docker is more reliable ..
15:52:44 yees
15:53:01 wuchunyang, we use a mix (docker for openstack, podman for ceph)
15:53:03 we got a bit of retirements
15:53:10 on the mailing list
15:53:22 should we allow time for people to come forward to help?
15:53:23 so we might want to deprecate as well
15:53:27 dswebb me too.. but our ceph hit a podman bug
15:53:39 yeah, I'm fine with allowing more time
15:53:47 I guess deprecation doesn't hurt
15:53:48 you having issues with the node exporter not starting properly on deploy?
15:53:48 just keep this in the back of our minds
15:53:58 good thinking
15:54:02 you can action me on it
15:54:07 I will propose notes
15:54:10 to keep around
15:54:19 #action yoctozepto deprecate the retired Wallabies
15:54:26 thx
15:54:30 dswebb https://github.com/containers/podman/issues/2553
15:54:34 #topic Cinder v2 to be dropped in Wallaby http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018697.html
15:54:47 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018697.html
15:54:50 yoctozepto again?
15:55:05 you guessed it!
15:55:12 I actually missed that one
15:55:28 now, it's trickier than the previous one
15:55:39 we need to handle the drop and removal of existing
15:55:51 it should not be breaking if we forget
15:55:57 but better play it safe
15:56:09 we need a way to track things like this
15:56:24 another etherpad?
15:56:29 whiteboard feels lengthy
15:56:33 or maybe
15:56:34 was thinking another list next to priorities
15:56:36 just use wiki
15:56:45 oh well
15:56:46 considering it only has one item currently :)
15:56:52 whatever works
15:56:56 true that
15:57:04 etherpads get forgotten
15:57:24 we can always start in a contest for longest etherpad :)
15:57:58 mnasiadka: ++
15:58:06 yoctozepto: I remember we had the discussion before, that we need a role/change in service-ks-register to remove endpoints
15:58:12 hopefully the prizes are worth it
15:58:19 mnasiadka: we had
15:58:26 but there was no urgency
15:58:29 now there is
16:00:09 I've added it as a priority for kolla-ansible
16:00:11 what about one patch to deprecate and next one to remove with WIP status?
16:00:30 or similar
16:00:44 hrw: for cinder or retired projects?
16:00:50 retired
16:00:53 ok
16:01:04 can do, was just hoping to avoid possibly unnecessary work
16:01:13 if someone picks up the projects
16:01:35 but I won't/can't stop anyone proposing a patch
16:01:40 #topic Stein Extended Maintenance
16:01:45 last topic
16:01:48 Stein is now EM
16:02:27 so we can stop publish/care?
16:02:33 I was hoping to see something on https://docs.openstack.org/kolla/latest/contributor/release-management.html about what we normally do for EM
16:02:37 but there is nothing
16:02:48 I think, we do this
16:03:11 stop backports by default, but accept if proposed
16:03:37 stop publishing (although not usually immediately)
16:04:02 we could also try switching to stable branches in source images (as we did in ussuri)
16:04:11 and switch to weekly publish
16:04:14 thoughts?
16:04:58 well, switching to stable branches could be good for users using Stein
16:05:00 it is em. I would do one final release if there were changes since previous
16:05:13 final release is done
16:05:59 I would end.
16:06:07 do final publish and done
16:06:20 switching to stable branches can bring new issues
16:07:18 rocky was last published 8 months ago
16:07:22 when someone proposes a backport with a sensible reason then we merge and let users build on their own or wait for weekly/monthly publish
16:07:45 mgoddard: and no one asked about rocky so we do not recognize that name anymore?
16:08:17 ok, we're past time
16:08:17 thanks all
16:08:19 #endmeeting
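
Note on the publish gating discussed under the Dockerhub pull rate limits topic (15:35:41 to 15:41:31): the "static hash" idea can be sketched roughly as below. This is only one possible reading of "if hash_of_job_info % 7 in days_to_publish then publish", not the implementation kolla adopted. The function names `publish_slot` and `should_publish`, the job names, and the 3-day period are assumptions made for illustration. `hashlib` is used instead of Python's built-in `hash()` because string hashes from `hash()` are salted per process and would not give a job a stable slot across runs.

```python
import hashlib
from datetime import date


def publish_slot(job_key: str, period_days: int) -> int:
    """Map a job identifier to a stable slot in [0, period_days)."""
    digest = hashlib.sha256(job_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % period_days


def should_publish(job_key: str, today: date, period_days: int = 3) -> bool:
    """Return True when today falls on this job's slot in the cycle.

    Each job publishes once every period_days days, and different jobs
    land on different days depending on their hash.
    """
    return today.toordinal() % period_days == publish_slot(job_key, period_days)


if __name__ == "__main__":
    # Hypothetical job names, used only for illustration.
    for job in ("kolla-publish-centos-source", "kolla-publish-ubuntu-source"):
        print(job, should_publish(job, date.today()))
```

A publish job could call something like `should_publish` at the start and exit early when it returns False, which matches the "early exit in the publish job" suggestion. Because the slot depends only on the job identifier and the calendar date, an outage does not cause every job to publish at once afterwards, which was the stampeding concern raised about keying off the last successful run.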