| *** ysandeep|out is now known as ysandeep | 01:55 | |
| *** ysandeep is now known as ysandeep|afk | 03:21 | |
| *** ysandeep|afk is now known as ysandeep | 05:03 | |
| jrosser | morning | 07:38 |
| *** ysandeep is now known as ysandeep|afk | 08:02 | |
| noonedeadpunk | morning! | 08:07 |
| damiandabrowski | hi! | 09:10 |
| *** tosky_ is now known as tosky | 09:10 | |
| *** dviroel__ is now known as dviroel | 11:05 | |
| mgariepy | good morning everyone | 11:39 |
| *** ysandeep|afk is now known as ysandeep | 12:30 | |
| mgariepy | hmm ? ERROR! couldn't resolve module/action 'openstack.cloud.os_auth'. This often indicates a misspelling, missing collection, or incorrect module path. | 14:44 |
| opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.auth module https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 14:47 |
| noonedeadpunk | do we need to backport that? ^ | 14:49 |
| jrosser | whoops | 14:50 |
| mgariepy | maybe we need to start managing the versions of the modules we install ? | 14:50 |
| jrosser | i think the zuul stuff is set up now to run the infra jobs when those playbooks are touched | 14:50 |
| jrosser | we didn't do that before and could merge broken stuff | 14:50 |
| jrosser | huh interesting https://github.com/openstack/openstack-ansible/blob/master/zuul.d/jobs.yaml#L267-L277 | 14:52 |
| jrosser | yes so we don't actually run that anywhere in CI? | 14:53 |
| *** dviroel is now known as dviroel|lunch | 14:55 | |
| opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Set the number of threads for processes to 2 https://review.opendev.org/c/openstack/openstack-ansible/+/850942 | 14:56 |
| jrosser | also why are we running stream-9 distro jobs | 14:59 |
| jrosser | that looks like a mistake | 14:59 |
| *** ysandeep is now known as ysandeep|out | 15:00 | |
| noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
| opendevmeet | Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
| opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:01 |
| noonedeadpunk | #topic office hours | 15:01 |
| mgariepy | right on time :D | 15:01 |
| noonedeadpunk | so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements | 15:02 |
| noonedeadpunk | to "files" | 15:02 |
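The fix being discussed could look roughly like this in zuul.d/jobs.yaml — the job name and exact patterns below are assumptions for illustration, not the real job definition:

```yaml
# sketch: extend the infra job file matchers so that changes to the
# role/collection requirements files also trigger the jobs
- job:
    name: openstack-ansible-infra-jobs
    files:
      - ^playbooks/.*
      - ^ansible-role-requirements.yml$
      - ^ansible-collection-requirements.yml$
```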
| noonedeadpunk | stream-9 distro jobs - well, I intended to fix them one day | 15:03 |
| noonedeadpunk | they're NV as of today, so why not. But I had a dependency on smth to get them working, can't really recall | 15:03 |
| * noonedeadpunk fixed alarm clock :D | 15:04 | |
| damiandabrowski | :D | 15:04 |
| opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs https://review.opendev.org/c/openstack/openstack-ansible/+/851041 | 15:04 |
| jrosser | o/ hello | 15:05 |
| noonedeadpunk | Will try to spend some time on them maybe next week | 15:05 |
| noonedeadpunk | oh, ok, so the reason why distro jobs are failing is that zed packages are not published yet | 15:07 |
| noonedeadpunk | and we need openstacksdk 0.99 with our collection versions | 15:07 |
| noonedeadpunk | So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default? | 15:09 |
| noonedeadpunk | maybe it's more a ptg topic though | 15:09 |
| jrosser | is there a benefit? | 15:10 |
| noonedeadpunk | But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default | 15:10 |
| noonedeadpunk | Well, yes? You don't copy root each time, but just use cow and snapshot? | 15:10 |
| noonedeadpunk | I have no idea if it's working at all tbh as of today, as I haven't touched that part for real | 15:11 |
| noonedeadpunk | but you should save quite some space with that on controllers | 15:11 |
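The switch noonedeadpunk suggests would likely be a one-line override, assuming the lxc_container_create role exposes a backing-store variable with this name (worth verifying against the role defaults):

```yaml
# user_variables.yml - hypothetical override; the variable name and the
# 'overlayfs' value should be checked against the role's defaults
lxc_container_backing_store: overlayfs
```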
| jrosser | i think we have good tests for all of these, so it should be easy to see | 15:12 |
| noonedeadpunk | Each container is ~500MB | 15:12 |
| noonedeadpunk | And diff would be like 20MB tops | 15:12 |
| noonedeadpunk | Nah, more. Image is 423MB | 15:12 |
| noonedeadpunk | So we would save 423MB per container that runs | 15:12 |
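Back-of-envelope math for the figures quoted above (the ~423 MB base image, and the 13 containers jrosser mentions later):

```shell
# with the 'dir' backend every container carries a full copy of the
# ~423 MB base image; overlayfs would share it via copy-on-write
containers=13
image_mb=423
echo "$(( containers * image_mb )) MB duplicated"   # → 5499 MB duplicated
```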
| noonedeadpunk | Another thing, that we totally forgot to mention during the previous PTG, is getting rid of dash-separated groups and using underscores only | 15:13 |
| noonedeadpunk | We've been postponing this for quite a while now, to have that said... | 15:13 |
| noonedeadpunk | But it might not be that big a chunk of work. | 15:14 |
| noonedeadpunk | I will try to look into that, but not now for sure | 15:14 |
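The dash-to-underscore rename itself is mechanical; a hedged sketch (the group name below is made up for illustration, and a real inventory would also need env.d/conf.d definitions updated consistently):

```shell
# hypothetical one-off rename pass over a dash-separated group name
echo "repo-infra_hosts" | sed 's/-/_/g'   # → repo_infra_hosts
```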
| noonedeadpunk | Right now I'm trying to work on AZ concept and what needs to be done from OSA side to get this working. Will publish a scenario to docs once done if there're no objections | 15:15 |
| noonedeadpunk | env.d overrides are quite massive as of today, to have that said | 15:15 |
| jrosser | i wonder if we prune journals properly | 15:16 |
| opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 15:16 |
| jrosser | i am already using COW on my controllers (zfs) and the base size of the container is like tiny compared to the total | 15:16 |
| noonedeadpunk | ah. well, then there's no profit in overlayfs for you :D | 15:17 |
| noonedeadpunk | But I kind of wonder if there're consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these, from what I read | 15:18 |
| jrosser | i guess i mean that the delta doesnt always stay small | 15:18 |
| jrosser | it will be when you first create them | 15:18 |
| noonedeadpunk | well, delta would be basically venvs size I guess? | 15:19 |
| jrosser | for $time-since-reinstall on one i'm looking at 25G for 13 containers even with COW | 15:20 |
| noonedeadpunk | as logs and journals and databases are passed into the container and not root | 15:20 |
| noonedeadpunk | that is kind of weird? | 15:20 |
| mgariepy | wow.. https://github.com/ansible/ansible/issues/78344 | 15:28 |
| jrosser | huh | 15:29 |
| jrosser | they're going to complain next at the slightly unusual use of the setup module, i can just feel it | 15:30 |
| mgariepy | let's make the control persist 4 hours.. ;p | 15:30 |
| mgariepy | what a good response.. | 15:30 |
| jrosser | noonedeadpunk: in my container /openstack is nearly 1G for keystone container, each keystone venv is ~200M so for a couple of upgrade this adds up pretty quick | 15:31 |
| noonedeadpunk | btw, retention of old venvs is a good topic | 15:32 |
| noonedeadpunk | should we create some playbook at least for ops repo for that? | 15:32 |
| jrosser | it's pretty coupled to the inventory so might be hard for the ops repo? | 15:35 |
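A minimal sketch of the retention idea, assuming venvs live under /openstack/venvs as `<service>-<version>` directories — demonstrated in a temp dir, purely an illustration rather than an existing OSA tool:

```shell
# keep only the newest venv: version-sort the names and list everything
# but the last as removal candidates (head -n -1 is GNU coreutils)
venvs=$(mktemp -d)                      # stand-in for /openstack/venvs
mkdir "$venvs/keystone-24.0.0" "$venvs/keystone-25.0.0" "$venvs/keystone-26.1.0"
ls -1 "$venvs" | sort -V | head -n -1   # → keystone-24.0.0 and keystone-25.0.0
```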
| jrosser | mgariepy: i will reproduce with pipelining=False and show that it handles -13 in that case | 15:36 |
| jrosser | i'm sure i saw it doing that | 15:36 |
| mgariepy | hmm i haven't been able to reproduce it with pipelining false. | 15:37 |
| noonedeadpunk | because it re-tries :) | 15:37 |
| mgariepy | no | 15:38 |
| mgariepy | because scp catches it to transfer the file. | 15:38 |
| mgariepy | then reactivate the error | 15:38 |
| mgariepy | error / socket | 15:38 |
| jrosser | well - question is if it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false | 15:40 |
| mgariepy | when pipelining is false | 15:40 |
| mgariepy | it will transfer the file. | 15:40 |
| mgariepy | if there is an error it most likely ends up being exit(255). | 15:40 |
| mgariepy | then it retries. | 15:40 |
| mgariepy | no ? | 15:40 |
| jrosser | lets test it | 15:41 |
| mgariepy | you never catch the same race with pipelining false. | 15:41 |
| jrosser | no, but my code that was printing p.returncode was showing what was happening though | 15:41 |
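For reference on the -13 that keeps coming up: Python's subprocess reports a child killed by signal N as returncode -N, and SIGPIPE is signal 13; a shell reports the same death as 128+13:

```shell
# reproduce the exit status of a process killed by SIGPIPE (signal 13),
# the same signal behind ansible seeing returncode == -13
sh -c 'kill -PIPE $$'
echo $?   # → 141, i.e. 128+13
```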
| jrosser | anyway, let's finish the meeting then look at that | 15:42 |
| jrosser | we did a Y upgrade in the lab last week | 15:44 |
| jrosser | which went really quite smoothly, just a couple of issues | 15:44 |
| spotz_ | nice | 15:44 |
| jrosser | galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github.... | 15:45 |
| jrosser | also lxc_hosts_container_build_command is not taking account of lxc_apt_mirror | 15:46 |
| anskiy | noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml | 15:46 |
| jrosser | but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1` | 15:47 |
| noonedeadpunk | anskiy: ah, well :) | 15:47 |
| noonedeadpunk | oh, well, it would restart all containers | 15:48 |
| jrosser | yeah, pretty much | 15:48 |
| jrosser | and i think setup-hosts does that across * at the same time | 15:48 |
| noonedeadpunk | what I had to do with that was: systemctl stop lxc, systemctl mask lxc, then update the package, unmask and reboot | 15:49 |
| noonedeadpunk | if we pass package_state: latest, then yeah... | 15:49 |
| jrosser | i don't know if we want to manage that a bit more | 15:49 |
| jrosser | but that might be a big surprise for people expecting the upgrades to not hose everything at once | 15:50 |
| noonedeadpunk | I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186 | 15:51 |
| jrosser | we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52 | 15:53 |
| noonedeadpunk | So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in script | 15:53 |
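A sketch of that change — the variable name is an assumption, and the default keeps today's all-at-once behaviour:

```yaml
# playbooks/containers-lxc-host.yml - hypothetical serial control
- name: Basic lxc host setup
  hosts: "{{ lxc_host_group | default('lxc_hosts') }}"
  serial: "{{ lxc_hosts_serial | default('100%') }}"
```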
| noonedeadpunk | ok, that is a good point. | 15:53 |
| jrosser | well i don't know if serial is the right thing to do | 15:53 |
| noonedeadpunk | are you thinking about putting lxc on hold? | 15:54 |
| noonedeadpunk | I really can't recall how it's done on centos though | 15:55 |
| noonedeadpunk | but yes, I do agree we need to fix that for ppl | 15:57 |
| noonedeadpunk | not upgrading packages - well, it's also a thing. Maybe we can adjust docs to not mention `-e package_state=latest` as smth "default", but as smth that can be added if you want to update packages, with some precautions | 15:58 |
| noonedeadpunk | As at the very least this also affects neutron, as ovs being upgraded also causes disturbances | 15:58 |
| jrosser | andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now | 15:59 |
| noonedeadpunk | well, we just disabled unattended upgrades... | 16:00 |
| jrosser | yeah i might be confused here, but it is the same sort of area | 16:00 |
| noonedeadpunk | so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus maintenance. During maintenance interruptions happen. | 16:01 |
| noonedeadpunk | But in the case of LXC you would need to re-bootstrap galera at the very least | 16:01 |
| noonedeadpunk | as all goes down at the same time | 16:01 |
| jrosser | right, it's not at all self-healing just by running some playbooks | 16:01 |
| *** dviroel|lunch is now known as dviroel | 16:01 | |
| jrosser | which is kind of what people expect | 16:01 |
| noonedeadpunk | and what I see we can do - either manage serial, then restarting lxc will likely self-heal as others are alive, or put some packages on hold and add another flag to upgrade LXC to make it intentional | 16:02 |
| noonedeadpunk | as I don't think we are able to pin the version | 16:03 |
| noonedeadpunk | ok, let's wait for andrewbonney then to make a decision | 16:04 |
| noonedeadpunk | #endmeeting | 16:04 |
| opendevmeet | Meeting ended Tue Jul 26 16:04:06 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:04 |
| opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.html | 16:04 |
| opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.txt | 16:04 |
| opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.log.html | 16:04 |
| noonedeadpunk | oh, I can recall that we somehow prevented package triggers to run for galera I guess. | 16:08 |
| noonedeadpunk | and we already have lxc_container_allow_restarts variable | 16:09 |
| noonedeadpunk | https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_install_apt.yml#L65-L71 | 16:09 |
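The hold approach being floated could be expressed as an Ansible task like this — the package list is a guess, and unholding would need a matching task gated on an explicit upgrade flag:

```yaml
# hypothetical task: keep apt from upgrading lxc underneath running
# containers; a deliberate upgrade would unhold these first
- name: Hold lxc packages
  ansible.builtin.dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop:
    - lxc
    - liblxc1
  when: ansible_facts['pkg_mgr'] == 'apt'
```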
| jrosser | mgariepy: without pipelining https://paste.opendev.org/show/bfdqBx2uOhGOr8LJRMGV/ | 16:15 |
| jrosser | and with -vvv https://paste.opendev.org/show/bWCoeQuDgiFeR8msOpop/ | 16:17 |
| opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Allow to provide serial for lxc_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/851049 | 16:17 |
| jrosser | and with pipelining enabled https://paste.opendev.org/show/baJ8I2xKlAiovdwHvniM/ | 16:20 |
| opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Prevent lxc.service from being restarted on package update https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/851071 | 16:29 |
| noonedeadpunk | huh, how does it retry then.... | 16:30 |
| jrosser | i don't know | 16:34 |
| jrosser | mgariepy: noonedeadpunk the other thing i totally don't understand is why this always happens with the setup module | 16:43 |
| jrosser | or is it really 'always', or are we just unlucky in the structure of our playbooks that T>ControlPersist has elapsed before we try to gather facts on the repo hosts | 16:44 |
| mgariepy | i think it's reproducible with another module. | 17:02 |
| mgariepy | but it would need to not exit 255.. | 17:03 |
| mgariepy | we probably can trigger it with the command module also | 17:03 |
| jrosser | perhaps there is something specific with the setup module | 17:06 |
| mgariepy | maybe. | 17:06 |
| mgariepy | but as long as we get the same race and an exit outside of 255 it should be failing. | 17:07 |
| mgariepy | the fun part is to make the other modules take as much time as setup takes :D | 17:10 |
| jrosser | ok here we go with the file: module https://paste.opendev.org/show/bFNbYfNTxDRBENHh6h9S/ | 17:11 |
| mgariepy | how nice :D | 17:11 |
| mgariepy | https://github.com/ansible/ansible/blob/e4087baa835df6046950a6579d2d2b56df75d12b/lib/ansible/plugins/connection/ssh.py#L1235 | 17:16 |
| mgariepy | checkrc=False | 17:17 |
| mgariepy | so if it's scp or sftp it does not care about the exit status and retries if so | 17:18 |
| jrosser | here is another case where we get `muxclient: master hello exchange failed` https://paste.opendev.org/show/b3xllr0pREQ3vDvpyoax/ | 17:18 |
| jrosser | i am still not able to make a MODULE FAILURE with anything else though | 17:38 |
| mgariepy | the file module with pipelining - does it do the same as the setup module ? | 17:41 |
| spatel | jamesdenton around? | 17:41 |
| mgariepy | or does it transfer the file in another way ? | 17:42 |
| jrosser | it's possible it's different, just look at the number of SSH steps in my paste | 17:42 |
| jrosser | i'm just trying something else like package: | 17:42 |
| mgariepy | some vote please :D https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 17:43 |
| mgariepy | also any comments on : https://review.opendev.org/c/openstack/openstack-ansible/+/850942/ ? | 17:49 |
| mgariepy | i think having it at 1 makes the process not respond to haproxy, which then disconnects the backend server. | 17:50 |
| jrosser | package: MODULE FAILURE https://paste.opendev.org/show/bGFkEBoURUovv3ktl9JJ/ | 17:51 |
| jrosser | mgariepy: is it really threads - i always find this sooo confusing with python | 17:51 |
| mgariepy | let's rewrite openstack in go. | 17:52 |
| mgariepy | simple deployment, real threads. and so on. | 17:52 |
| jrosser | i think what i mean is i don't know if that actually does what you expect :) | 17:54 |
| mgariepy | so should we set the number of process to 2 instead ? | 17:56 |
| mgariepy | i assume that with 2 threads it should be a bit better at responding to multiple requests. | 17:56 |
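For context on the review being discussed, a uWSGI config mixes both knobs; which one helps depends on how CPU-bound the service is under the GIL (the values below are only illustrative, not the patch's exact settings):

```ini
; sketch of the uwsgi knobs under discussion - more processes sidestep
; the GIL entirely, extra threads only help while workers block on I/O
[uwsgi]
processes = 2
threads = 2
```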
| jrosser | noonedeadpunk: ^ do you understand how this works? | 17:58 |
| mgariepy | arf.. meeting now | 17:58 |
| noonedeadpunk | I wonder what's wrong with linters actually. seems that they always fail | 19:22 |
| noonedeadpunk | um, not sure I got where this leads... | 19:28 |
| mgariepy | the linter is fixed in the other patch. | 19:29 |
| noonedeadpunk | (last was regarding python threads) | 19:30 |
| mgariepy | i saw a few failures when tempest was running where the services were just not responding. | 19:38 |
| mgariepy | not 100% sure having 2 threads will fix 100% of the cases. but maybe. | 19:39 |
| mgariepy | maybe i could try to take the time to test if it actually helps. but i'm kinda short on time for the rest of the week. | 19:39 |
| opendevreview | Merged openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 20:14 |
| *** anskiy1 is now known as anskiy | 21:28 | |
| *** dviroel is now known as dviroel|out | 23:29 | |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!