15:00:59 #startmeeting openstack_ansible_meeting
15:01:00 Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:00 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:00 The meeting name has been set to 'openstack_ansible_meeting'
15:01:06 #topic office hours
15:01:23 right on time :D
15:02:30 so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements
15:02:46 to "files"
15:03:18 stream-9 distro jobs - well, I intended to fix them one day
15:03:47 they're NV as of today, so why not. But I had a dependency on something to get them working, can't really recall
15:04:02 * noonedeadpunk fixed alarm clock :D
15:04:18 :D
15:04:33 Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs https://review.opendev.org/c/openstack/openstack-ansible/+/851041
15:05:28 o/ hello
15:05:58 Will try to spend some time on them maybe next week
15:07:16 oh, ok, so the reason why distro jobs are failing is that zed packages are not published yet
15:07:32 and we need openstacksdk 0.99 with our collection versions
15:09:32 So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default?
15:09:42 maybe it's more a ptg topic though
15:10:15 is there a benefit?
15:10:20 But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default
15:10:41 Well, yes? You don't copy root each time, but just use cow and snapshot?
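The "files" matcher change discussed at 15:02:30 could look roughly like this in a Zuul job definition. This is only a sketch: the job name is a placeholder and the exact regexes are guesses, not copied from the repo.

```yaml
# Sketch: job name and patterns are illustrative placeholders.
- job:
    name: openstack-ansible-infra-job
    files:
      - ^ansible-role-requirements\.yml$
      - ^ansible-collection-requirements\.yml$
```

With these entries in `files`, the job would also trigger on changes to the two requirements files rather than only on the paths already listed.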
15:11:12 I have no idea if it's working at all tbh as of today, as I haven't touched that part for real
15:11:28 but you should save quite some space with that on controllers
15:12:02 i think we have good tests for all of these, so it should be easy to see
15:12:03 Each container is ~500MB
15:12:12 And diff would be like 20MB tops
15:12:33 Nah, more. Image is 423MB
15:12:56 So we would save 423MB per each container that runs
15:13:43 Another thing that we totally forgot to mention during the previous PTG is getting rid of dash-separated groups and using underscores only
15:13:58 We're postponing this for quite a while now, to have that said...
15:14:11 But it might not be that big a chunk of work.
15:14:24 I will try to look into that, but not now for sure
15:15:20 Right now I'm trying to work on the AZ concept and what needs to be done from the OSA side to get this working. Will publish a scenario to docs once done if there are no objections
15:15:45 env.d overrides are quite massive as of today, to have that said
15:16:18 i wonder if we prune journals properly
15:16:31 Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038
15:16:41 i am already using COW on my controllers (zfs) and the base size of the container is tiny compared to the total
15:17:24 ah. well, then there's no profit in overlayfs for you :D
15:18:27 But I kind of wonder if there are consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these, from what I read
15:18:33 i guess i mean that the delta doesn't always stay small
15:18:43 it will be when you first create them
15:19:45 well, delta would be basically venvs size I guess?
15:20:01 for $time-since-reinstall on the one i'm looking at, it's 25G for 13 containers even with COW
15:20:03 as logs and journals and databases are passed into the container and not root
15:20:46 that is kind of weird?
15:28:42 wow..
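A back-of-envelope version of the space argument above: with the dir backend every container gets a full copy of the image, while overlayfs shares one lower layer plus a small per-container delta. The figures are the ones from the discussion (423 MB image, ~20 MB initial delta), and as noted later the deltas grow over time as venvs are upgraded in place, so this is a best-case estimate.

```shell
# Rough rootfs usage for N containers under the two LXC backing stores.
# Numbers are the ones quoted in the discussion, not measurements.
containers=13
image_mb=423
delta_mb=20
dir_mb=$((containers * image_mb))            # dir: full copy per container
ovl_mb=$((image_mb + containers * delta_mb)) # overlayfs: shared base + deltas
echo "dir backend:       ${dir_mb} MB"
echo "overlayfs backend: ${ovl_mb} MB"
```

For 13 containers that is roughly 5.5 GB versus 0.7 GB, which matches the "you save ~423 MB per container" framing above.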
https://github.com/ansible/ansible/issues/78344
15:29:50 huh
15:30:08 they're going to complain next at the slightly unusual use of the setup module, i can just feel it
15:30:09 let's make the control persist 4 hours.. ;p
15:30:49 what good responses..
15:31:07 noonedeadpunk: in my container /openstack is nearly 1G for the keystone container, each keystone venv is ~200M so over a couple of upgrades this adds up pretty quickly
15:32:02 btw, retention of old venvs is a good topic
15:32:27 should we create some playbook, at least in the ops repo, for that?
15:35:54 it's pretty coupled to the inventory so might be hard for the ops repo?
15:36:47 mgariepy: i will reproduce with pipelining=False and show that it handles -13 in that case
15:36:55 i'm sure i saw it doing that
15:37:46 hmm i haven't been able to reproduce it with pipelining false.
15:37:58 because it re-tries :)
15:38:12 no
15:38:26 because scp catches it to transfer the file.
15:38:33 then reactivates the error
15:38:54 error / socket
15:40:00 well - the question is whether it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false
15:40:13 when pipelining is false
15:40:18 it will transfer the file.
15:40:43 if there is an error it most likely ends up being exit(255).
15:40:50 then it retries.
15:40:56 no ?
15:41:16 let's test it
15:41:19 you never catch the same race with pipelining false.
15:41:37 no, but my code that was printing p.returncode was showing what was happening though
15:42:38 anyway, let's meet some more, then look at that
15:44:30 we did a Y upgrade in the lab last week
15:44:48 which went really quite smoothly, just a couple of issues
15:44:57 nice
15:45:27 galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github....
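For reference on the pipelining/ControlPersist settings being debated above (including the "make the control persist 4 hours" joke), these knobs live in `ansible.cfg`. A minimal sketch, with illustrative values rather than anything recommended by this discussion:

```ini
# ansible.cfg fragment: pipelining and SSH connection sharing.
# ControlPersist keeps the master SSH connection open between tasks;
# pipelining sends module code over the existing connection instead of
# copying files with scp/sftp (which is the code path discussed above).
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```

With `pipelining = False`, Ansible falls back to transferring the module file first, which is why the race manifests differently in the two modes.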
15:46:08 also lxc_hosts_container_build_command is not taking account of lxc_apt_mirror
15:46:42 noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml
15:47:07 but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1`
15:47:07 anskiy: ah, well :)
15:48:14 oh, well, it would restart all containers
15:48:23 yeah, pretty much
15:48:42 and i think setup-hosts does that across * at the same time
15:49:03 what I had to do with that is to systemctl stop lxc, systemctl mask lxc, then update the package, unmask and reboot
15:49:41 if we pass package_state: latest, then yeah...
15:49:46 i don't know if we want to manage that a bit more
15:50:06 but that might be a big surprise for people expecting the upgrades to not hose everything at once
15:51:16 I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186
15:53:02 we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52
15:53:04 So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in the script
15:53:36 ok, that is a good point.
15:53:50 well i don't know if serial is the right thing to do
15:54:47 are you thinking about putting lxc on hold?
15:55:09 I really can't recall how it's done on centos though
15:57:11 but yes, I do agree we need to fix that for people
15:58:16 not upgrading packages - well, it's also a thing.
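The manual workaround described at 15:49:03 (stop and mask lxc before upgrading the package, so all containers are not restarted at once) can be captured as a per-host checklist. This sketch deliberately only prints the steps for review rather than executing them; it assumes Ubuntu controllers, and the upgrade command shown is one plausible way to upgrade just the lxc package.

```shell
# Print the per-host sequence for upgrading the lxc package without
# restarting every container simultaneously. Run the printed commands
# manually, one controller at a time.
for step in \
    "systemctl stop lxc" \
    "systemctl mask lxc" \
    "apt-get install --only-upgrade lxc" \
    "systemctl unmask lxc" \
    "reboot"; do
  printf '%s\n' "$step"
done
```

Doing this one controller at a time lets galera and the other clustered services on the remaining controllers keep quorum while each host's containers bounce.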
Maybe we can adjust the docs not to mention `-e package_state=latest` as something "default", but as what can be added if you want to update packages, with some precautions
15:58:39 As at the very least this also affects neutron, as ovs being upgraded also causes disturbances
15:59:33 andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now
16:00:01 well, we just disabled unattended upgrades...
16:00:19 yeah i might be confused here, but it is the same sort of area
16:01:10 so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus maintenance. During maintenance interruptions happen.
16:01:25 But in the case of LXC you would need to re-bootstrap galera at the very least
16:01:43 as all goes down at the same time
16:01:48 right, it's not at all self-healing just by running some playbooks
16:01:53 which is kind of what people expect
16:02:51 and what I see we can do - either manage serial, then restarting lxc will likely self-heal as others are alive, or put some packages on hold and add another flag to upgrade LXC to make it intentional
16:03:15 as I don't think we are able to pin the version
16:04:02 ok, let's wait for andrewbonney then to make a decision
16:04:06 #endmeeting
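If the "manage serial" option were chosen, the play in containers-lxc-host.yml could expose an overridable batch size that run-upgrade.sh then pins. A sketch only: the `lxc_hosts_serial` variable name is made up for illustration and the play header is paraphrased, not copied from the playbook.

```yaml
# Sketch: hypothetical overridable serial for the LXC hosts play.
# Default of 100% preserves today's behaviour; an upgrade script could
# pass e.g. -e lxc_hosts_serial=1 to go one host at a time.
- name: Configure LXC hosts
  hosts: lxc_hosts
  serial: "{{ lxc_hosts_serial | default('100%') }}"
```

With `serial: 1`, an lxc package upgrade would restart only one controller's containers at a time, giving galera a chance to stay up on the others instead of needing a full re-bootstrap.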