15:00:59 <noonedeadpunk> #startmeeting openstack_ansible_meeting
15:01:00 <opendevmeet> Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:00 <opendevmeet> The meeting name has been set to 'openstack_ansible_meeting'
15:01:06 <noonedeadpunk> #topic office hours
15:01:23 <mgariepy> right on time :D
15:02:30 <noonedeadpunk> so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements
15:02:46 <noonedeadpunk> to "files"
15:03:18 <noonedeadpunk> stream-9 distro jobs - well, I intended to fix them one day
15:03:47 <noonedeadpunk> they're NV as of today, so why not. But I had a dependency on smth to get them working, can't really recall
15:04:02 * noonedeadpunk fixed alarm clock :D
15:04:18 <damiandabrowski> :D
15:04:33 <opendevreview> Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs https://review.opendev.org/c/openstack/openstack-ansible/+/851041
15:05:28 <jrosser> o/ hello
15:05:58 <noonedeadpunk> Will try to spend some time on them maybe next week
15:07:16 <noonedeadpunk> oh, ok, so the reason why distro jobs are failing is that zed packages are not published yet
15:07:32 <noonedeadpunk> and we need openstacksdk 0.99 with our collection versions
15:09:32 <noonedeadpunk> So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default?
15:09:42 <noonedeadpunk> maybe it's more a ptg topic though
15:10:15 <jrosser> is there a benefit?
15:10:20 <noonedeadpunk> But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default
15:10:41 <noonedeadpunk> Well, yes? You don't copy root each time, but just use cow and snapshot?
15:11:12 <noonedeadpunk> I have no idea if it's working at all tbh as of today, as I haven't touched that part for real
15:11:28 <noonedeadpunk> but you should save quite some space with that on controllers
15:12:02 <jrosser> i think we have good tests for all of these, so it should be easy to see
15:12:03 <noonedeadpunk> Each container is ~500MB
15:12:12 <noonedeadpunk> And diff would be like 20MB tops
15:12:33 <noonedeadpunk> Nah, more. Image is 423MB
15:12:56 <noonedeadpunk> So we would save 423MB per each container that runs
15:13:43 <noonedeadpunk> Another thing that we totally forgot to mention during the previous PTG is getting rid of dash-separated groups and using underscores only
15:13:58 <noonedeadpunk> We've been postponing this for quite a while now, to have that said...
15:14:11 <noonedeadpunk> But it might not be that big a chunk of work.
15:14:24 <noonedeadpunk> I will try to look into that, but not now for sure
15:15:20 <noonedeadpunk> Right now I'm trying to work on the AZ concept and what needs to be done from the OSA side to get this working. Will publish a scenario to docs once done if there're no objections
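For context on the overlayfs idea discussed above, the backing store for newly built containers is a one-line override in a deployment's user_variables.yml. A minimal sketch only, assuming the `lxc_container_backing_store` variable from the lxc_* roles is the relevant knob; verify against the deployed release:

```yaml
---
# user_variables.yml (sketch): build new containers on overlayfs instead of
# the default "dir" store, so each container keeps only its delta on top of
# the shared base image. Variable name assumed from the lxc_* roles.
lxc_container_backing_store: overlayfs
```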
15:15:45 <noonedeadpunk> env.d overrides are quite massive as of today, to have that said
15:16:18 <jrosser> i wonder if we prune journals properly
15:16:31 <opendevreview> Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038
15:16:41 <jrosser> i am already using COW on my controllers (zfs) and the base size of the container is like tiny compared to the total
15:17:24 <noonedeadpunk> ah. well, then there's no profit in overlayfs for you :D
15:18:27 <noonedeadpunk> But I kind of wonder if there are consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these from what I read
15:18:33 <jrosser> i guess i mean that the delta doesn't always stay small
15:18:43 <jrosser> it will be when you first create them
15:19:45 <noonedeadpunk> well, delta would be basically venvs size I guess?
15:20:01 <jrosser> for $time-since-reinstall on one i'm looking at 25G for 13 containers even with COW
15:20:03 <noonedeadpunk> as logs and journals and databases are passed into the container and not root
15:20:46 <noonedeadpunk> that is kind of weird?
15:28:42 <mgariepy> wow.. https://github.com/ansible/ansible/issues/78344
15:29:50 <jrosser> huh
15:30:08 <jrosser> they're going to complain next at the slightly unusual use of the setup module, i can just feel it
15:30:09 <mgariepy> let's make the control persist 4 hours.. ;p
15:30:49 <mgariepy> what good responses..
15:31:07 <jrosser> noonedeadpunk: in my container /openstack is nearly 1G for the keystone container, each keystone venv is ~200M so for a couple of upgrades this adds up pretty quick
15:32:02 <noonedeadpunk> btw, retention of old venvs is a good topic
15:32:27 <noonedeadpunk> should we create some playbook at least for the ops repo for that?
15:35:54 <jrosser> it's pretty coupled to the inventory so might be hard for the ops repo?
15:36:47 <jrosser> mgariepy: i will reproduce with pipelining=False and show that it handles -13 in that case
15:36:55 <jrosser> i'm sure i saw it doing that
15:37:46 <mgariepy> hmm i haven't been able to reproduce it with pipelining false.
15:37:58 <noonedeadpunk> because it re-tries :)
15:38:12 <mgariepy> no
15:38:26 <mgariepy> because scp catches it to transfer the file.
15:38:33 <mgariepy> then reactivates the error
15:38:54 <mgariepy> error / socket
15:40:00 <jrosser> well - question is if it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false
15:40:13 <mgariepy> when pipelining is false
15:40:18 <mgariepy> it will transfer the file.
15:40:43 <mgariepy> if there is an error it most likely ends up being exit(255).
15:40:50 <mgariepy> then it retries.
15:40:56 <mgariepy> no ?
15:41:16 <jrosser> let's test it
15:41:19 <mgariepy> you never catch the same race with pipelining false.
15:41:37 <jrosser> no, but my code that was printing p.returncode was showing what was happening though
15:42:38 <jrosser> anyway, let's meet some more then look at that
15:44:30 <jrosser> we did a Y upgrade in the lab last week
15:44:48 <jrosser> which went really quite smoothly, just a couple of issues
15:44:57 <spotz_> nice
15:45:27 <jrosser> galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github....
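On the ControlPersist race mgariepy joked about above, SSH connection behaviour can be tuned per deployment through standard Ansible connection variables. A minimal sketch with purely illustrative values; this only narrows the window and is not a fix for the race tracked in ansible/ansible#78344:

```yaml
---
# group_vars/all.yml (sketch): keep pipelining enabled but stretch the
# ControlPersist window (the "4 hours" joke above) so the master connection
# is less likely to expire mid-task. Illustrative values only.
ansible_ssh_pipelining: true
ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=4h"
```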
15:46:08 <jrosser> also lxc_hosts_container_build_command is not taking lxc_apt_mirror into account
15:46:42 <anskiy> noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml
15:47:07 <jrosser> but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1`
15:47:07 <noonedeadpunk> anskiy: ah, well :)
15:48:14 <noonedeadpunk> oh, well, it would restart all containers
15:48:23 <jrosser> yeah, pretty much
15:48:42 <jrosser> and i think setup-hosts does that across * at the same time
15:49:03 <noonedeadpunk> what I had to do with that is systemctl stop lxc, systemctl mask lxc, then update the package, unmask and reboot
15:49:41 <noonedeadpunk> if we pass package_state: latest, then yeah...
15:49:46 <jrosser> i don't know if we want to manage that a bit more
15:50:06 <jrosser> but that might be a big surprise for people expecting the upgrades to not hose everything at once
15:51:16 <noonedeadpunk> I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186
15:53:02 <jrosser> we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52
15:53:04 <noonedeadpunk> So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in the script
15:53:36 <noonedeadpunk> ok, that is a good point.
15:53:50 <jrosser> well i don't know if serial is the right thing to do
15:54:47 <noonedeadpunk> are you thinking about putting lxc on hold?
15:55:09 <noonedeadpunk> I really can't recall how it's done on centos though
15:57:11 <noonedeadpunk> but yes, I do agree we need to fix that for ppl
15:58:16 <noonedeadpunk> not upgrading packages - well, it's also a thing. Maybe we can adjust the docs not to mention `-e package_state=latest` as smth "default" but as what can be added if you want to update packages, with some precautions
15:58:39 <noonedeadpunk> At the very least this also affects neutron, as ovs being upgraded also causes disturbances
15:59:33 <jrosser> andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now
16:00:01 <noonedeadpunk> well, we just disabled unattended upgrades...
16:00:19 <jrosser> yeah i might be confused here, but it is the same sort of area
16:01:10 <noonedeadpunk> so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus maintenance. During maintenance interruptions happen.
16:01:25 <noonedeadpunk> But in the case of LXC you would need to re-bootstrap galera at the very least
16:01:43 <noonedeadpunk> as it all goes down at the same time
16:01:48 <jrosser> right, it's not at all self-healing just by running some playbooks
16:01:53 <jrosser> which is kind of what people expect
16:02:51 <noonedeadpunk> and what I see we can do - either manage serial, then restarting lxc will likely self-heal as the others are alive, or put some packages on hold and add another flag so upgrading LXC is done intentionally
16:03:15 <noonedeadpunk> as I don't think we are able to pin the version
16:04:02 <noonedeadpunk> ok, let's wait for andrewbonney then to make a decision
16:04:06 <noonedeadpunk> #endmeeting
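One way to make the "hold plus explicit flag" option from the end of the discussion concrete is to mark the lxc package as held before any blanket `package_state: latest` run. A rough sketch only; the play, host group usage and condition are illustrative, and nothing like this exists in the OSA or ops repos today:

```yaml
---
# Sketch of a pre-upgrade play: put the lxc package on "hold" on LXC hosts so
# a blanket "-e package_state=latest" run cannot restart every container at
# once; switch the selection back to "install" during a planned LXC upgrade
# window. Illustrative only.
- hosts: lxc_hosts
  tasks:
    - name: Hold the lxc package to avoid restarting all containers at once
      ansible.builtin.dpkg_selections:
        name: lxc
        selection: hold
      when: ansible_facts['pkg_mgr'] == 'apt'
```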