*** ysandeep|out is now known as ysandeep | 01:55 | |
*** ysandeep is now known as ysandeep|afk | 03:21 | |
*** ysandeep|afk is now known as ysandeep | 05:03 | |
jrosser | morning | 07:38 |
*** ysandeep is now known as ysandeep|afk | 08:02 | |
noonedeadpunk | morning! | 08:07 |
damiandabrowski | hi! | 09:10 |
*** tosky_ is now known as tosky | 09:10 | |
*** dviroel__ is now known as dviroel | 11:05 | |
mgariepy | good morning everyone | 11:39 |
*** ysandeep|afk is now known as ysandeep | 12:30 | |
mgariepy | hmm ? ERROR! couldn't resolve module/action 'openstack.cloud.os_auth'. This often indicates a misspelling, missing collection, or incorrect module path. | 14:44 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.auth module https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 14:47 |
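The review above renames the task to the module name the collection actually ships. A minimal sketch of the corrected task (the `cloud: default` value is an illustrative clouds.yaml entry, not from the patch):

```yaml
# Old, no longer resolvable:       openstack.cloud.os_auth
# Current FQCN in the collection:  openstack.cloud.auth
- name: Authenticate to the cloud and fetch the service catalog
  openstack.cloud.auth:
    cloud: default   # illustrative named cloud from clouds.yaml
```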
noonedeadpunk | do we need to backport that? ^ | 14:49 |
jrosser | whoops | 14:50 |
mgariepy | maybe we need to start managing the version of the modules we install ? | 14:50
jrosser | i think the zuul stuff is set up now to run the infra jobs when those playbooks are touched | 14:50 |
jrosser | we didnt do that before and could merge broken stuff | 14:50 |
jrosser | huh interesting https://github.com/openstack/openstack-ansible/blob/master/zuul.d/jobs.yaml#L267-L277 | 14:52 |
jrosser | yes so we don't actually run that anywhere in CI? | 14:53 |
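Making zuul run the infra jobs when the requirements files change means widening the job's `files` matcher. A rough sketch of what that stanza could look like (the job name and exact patterns are illustrative, not taken from the repo):

```yaml
- job:
    name: openstack-ansible-infra   # illustrative job name
    files:
      - ^playbooks/.*
      - ^ansible-role-requirements\.yml$
      - ^ansible-collection-requirements\.yml$
```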
*** dviroel is now known as dviroel|lunch | 14:55 | |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Set the number of threads for processes to 2 https://review.opendev.org/c/openstack/openstack-ansible/+/850942 | 14:56 |
jrosser | also why are we running stream-9 distro jobs | 14:59 |
jrosser | that looks like a mistake | 14:59 |
*** ysandeep is now known as ysandeep|out | 15:00 | |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
opendevmeet | Meeting started Tue Jul 26 15:00:59 2022 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:01 |
noonedeadpunk | #topic office hours | 15:01 |
mgariepy | right on time :D | 15:01 |
noonedeadpunk | so for infra jobs yeah, we probably should also add ansible-role-requirements and ansible-collection-requirements | 15:02 |
noonedeadpunk | to "files" | 15:02 |
noonedeadpunk | stream-9 distro jobs - well, I intended to fix them one day | 15:03 |
noonedeadpunk | they're NV as of today, so why not. But I had a dependency on smth to get them working, can't really recall | 15:03 |
* noonedeadpunk fixed alarm clock :D | 15:04 | |
damiandabrowski | :D | 15:04 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Remove centos-9-stream distro jobs https://review.opendev.org/c/openstack/openstack-ansible/+/851041 | 15:04 |
jrosser | o/ hello | 15:05 |
noonedeadpunk | Will try to spend some time on them maybe next week | 15:05 |
noonedeadpunk | oh, ok, so the reason why distro jobs are failing, is that zed packages are not published yet | 15:07 |
noonedeadpunk | and we need openstacksdk 0.99 with our collection versions | 15:07 |
noonedeadpunk | So. I was looking today at our lxc jobs. And wondering if we should try switching to overlayfs instead of dir by default? | 15:09 |
noonedeadpunk | maybe it's more a ptg topic though | 15:09 |
jrosser | is there a benefit? | 15:10 |
noonedeadpunk | But I guess as of today, all supported distros (well, except focal - do we support it?) have kernel 3.13 by default | 15:10 |
noonedeadpunk | Well, yes? You don't copy root each time, but just use cow and snapshot? | 15:10 |
noonedeadpunk | I have no idea if it's working at all tbh as of today, as I haven't touched that part for real | 15:11 |
noonedeadpunk | but you should save quite some space with that on controllers | 15:11 |
jrosser | i think we have good tests for all of these, so it should be easy to see | 15:12 |
noonedeadpunk | Each container is ~500MB | 15:12 |
noonedeadpunk | And diff would be like 20MB tops | 15:12 |
noonedeadpunk | Nah, more. Image is 423MB | 15:12 |
noonedeadpunk | So we would save 423MB per each container that runs | 15:12
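Trying the copy-on-write backend is essentially a one-variable override in the LXC roles; a sketch for user_variables.yml, assuming the role accepts an overlayfs value for the backing store the way it does for `dir`:

```yaml
# user_variables.yml -- sketch, assuming the lxc roles accept this value
# default is 'dir' (full copy of the base image per container);
# overlayfs snapshots the base image copy-on-write instead
lxc_container_backing_store: overlayfs
```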
noonedeadpunk | Another thing, that we totally forgot to mention during the previous PTG, is getting rid of dash-separated groups and using underscores only | 15:13
noonedeadpunk | We're postponing this for quite a while now to have that said... | 15:13 |
noonedeadpunk | But it might be not that big chunk of work. | 15:14 |
noonedeadpunk | I will try to look into that, but not now for sure | 15:14 |
noonedeadpunk | Right now I'm trying to work on AZ concept and what needs to be done from OSA side to get this working. Will publish a scenario to docs once done if there're no objections | 15:15 |
noonedeadpunk | env.d overrides are quite massive as of today to have that said | 15:15
jrosser | i wonder if we prune journals properly | 15:16 |
opendevreview | Marc Gariépy proposed openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 15:16 |
jrosser | i am already using COW on my controllers (zfs) and the base size of the container is like tiny compared to the total | 15:16 |
noonedeadpunk | ah. well, then there's no profit in overlayfs for you :D | 15:17 |
noonedeadpunk | But I kind of wonder if there are consequences to using overlayfs with other bind-mounts, as it kind of changes the representation of these from what I read | 15:18
jrosser | i guess i mean that the delta doesnt always stay small | 15:18 |
jrosser | it will be when you first create them | 15:18 |
noonedeadpunk | well, delta would be basically venvs size I guess? | 15:19 |
jrosser | for $time-since-reinstall on one i'm looking at 25G for 13 containers even with COW | 15:20
noonedeadpunk | as logs and journals and databases are passed into the container and not root | 15:20 |
noonedeadpunk | that is kind of weird? | 15:20 |
mgariepy | wow.. https://github.com/ansible/ansible/issues/78344 | 15:28 |
jrosser | huh | 15:29 |
jrosser | they're going to complain next at the slightly unusual use of the setup module, i can just feel it | 15:30 |
mgariepy | let's make the control persist 4 hours.. ;p | 15:30 |
mgariepy | what good responses.. | 15:30
jrosser | noonedeadpunk: in my container /openstack is nearly 1G for keystone container, each keystone venv is ~200M so for a couple of upgrade this adds up pretty quick | 15:31 |
noonedeadpunk | btw, retention of old venvs is a good topic | 15:32 |
noonedeadpunk | should we create some playbook at least for ops repo for that? | 15:32 |
jrosser | it's pretty coupled to the inventory so might be hard for the ops repo? | 15:35 |
jrosser | mgariepy: i will reproduce with piplining=False and show that it handles -13 in that case | 15:36 |
jrosser | i'm sure i saw it doing that | 15:36 |
mgariepy | hmm i haven't been able to reproduce it with pipelining false. | 15:37 |
noonedeadpunk | because it re-tries :) | 15:37 |
mgariepy | no | 15:38 |
mgariepy | because scp catches it to transfer the file. | 15:38
mgariepy | then reactivates the error | 15:38
mgariepy | error / socket | 15:38 |
jrosser | well - question is if it throws AnsibleControlPersistBrokenPipeError and handles it properly when pipelining is false | 15:40 |
mgariepy | when pipelining is false | 15:40 |
mgariepy | it will transfer the file. | 15:40
mgariepy | if there is an error it most likely ends up being exit(255). | 15:40
mgariepy | then it retries. | 15:40 |
mgariepy | no ? | 15:40 |
jrosser | lets test it | 15:41 |
mgariepy | you never catch the same race with pipelining false. | 15:41 |
jrosser | no, but my code that was printing p.returncode was showing what was happening though | 15:41 |
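The race being chased here is the ControlMaster socket expiring mid-task; the half-joking mitigation floated above is simply stretching ControlPersist. A sketch of the relevant ansible.cfg knobs (the 3600s value is illustrative):

```ini
[ssh_connection]
pipelining = True
# keep the shared ControlMaster socket alive well past any long-running module,
# so a task started near the expiry window doesn't hit a broken pipe
ssh_args = -o ControlMaster=auto -o ControlPersist=3600s
```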
jrosser | anyway, let's meeting some more then look at that | 15:42
jrosser | we did a Y upgrade in the lab last week | 15:44 |
jrosser | which went really quite smoothly, just a couple of issues | 15:44 |
spotz_ | nice | 15:44 |
jrosser | galaxy seems to have some mishandling of http proxies so we had to override all the things to be https://github.... | 15:45 |
jrosser | also lxc_hosts_container_build_command is not taking account of lxc_apt_mirror | 15:46 |
anskiy | noonedeadpunk: there is already one: https://opendev.org/openstack/openstack-ansible-ops/src/branch/master/ansible_tools/playbooks/cleanup-venvs.yml | 15:46 |
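The linked ops playbook automates the venv retention discussed above; the core idea is just pruning release-tagged venv directories that don't match the active release. A runnable sketch of that logic (the paths and release tag are illustrative stand-ins, not what the playbook uses):

```shell
#!/bin/sh
# Sketch: keep only the venv for the current release tag, prune older ones.
current="25.0.0"                 # illustrative release tag
venv_root="/tmp/demo-venvs"      # stand-in for /openstack/venvs
mkdir -p "$venv_root/keystone-24.0.0" "$venv_root/keystone-$current"

for d in "$venv_root"/*; do
  case "$d" in
    *-"$current") ;;             # active venv: keep
    *) rm -rf "$d" ;;            # older release: prune
  esac
done
ls "$venv_root"
```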
jrosser | but the biggest trouble was from `2022-07-22 08:04:35 upgrade lxc:all 1:4.0.6-0ubuntu1~20.04.1 1:4.0.12-0ubuntu1~20.04.1` | 15:47 |
noonedeadpunk | anskiy: ah, well :) | 15:47 |
noonedeadpunk | oh, well, it would restart all containers | 15:48 |
jrosser | yeah, pretty much | 15:48 |
jrosser | and i think setup-hosts does that across * at the same time | 15:48 |
noonedeadpunk | what I had to do with that, is to systemctl stop lxc, systemctl mask lxc, then update package, unmask and reboot | 15:49 |
noonedeadpunk | if we pass package_state: latest, then yeah... | 15:49 |
jrosser | i don't know if we want to manage that a bit more | 15:49 |
jrosser | but that might be a big surprise for people expecting the upgrades to not hose everything at once | 15:50
noonedeadpunk | I bet this is causing troubles https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/run-upgrade.sh#L186 | 15:51 |
jrosser | we also put that in the documentation https://opendev.org/openstack/openstack-ansible/src/branch/master/doc/source/admin/upgrades/minor-upgrades.rst#L52 | 15:53 |
noonedeadpunk | So we would need to add serial here https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/containers-lxc-host.yml#L24 and override it in script | 15:53 |
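A serial knob on that play would look roughly like the following; the variable name and default are illustrative, not taken from the actual change:

```yaml
# containers-lxc-host.yml -- sketch of an overridable serial
- name: Configure lxc hosts
  hosts: lxc_hosts
  # default keeps today's all-at-once behaviour; the upgrade script
  # could override it to e.g. '1' to roll hosts one at a time
  serial: "{{ lxc_hosts_serial | default('100%') }}"
```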
noonedeadpunk | ok, that is a good point. | 15:53
jrosser | well i don't know if serial is the right thing to do | 15:53 |
noonedeadpunk | are you thinking about putting lxc on hold? | 15:54 |
noonedeadpunk | I really can't recall how it's done on centos though | 15:55 |
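One way to stop a blanket package upgrade from restarting every container at once is pinning the lxc package at its installed version; a sketch of an apt preferences pin (the version string is the one from the lab upgrade above — on CentOS the dnf versionlock plugin serves the same purpose):

```
# /etc/apt/preferences.d/lxc-hold -- sketch: hold lxc at the installed version
Package: lxc
Pin: version 1:4.0.6-0ubuntu1~20.04.1
Pin-Priority: 1001
```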
noonedeadpunk | but yes, I do agree we need to fix that for ppl | 15:57 |
noonedeadpunk | not upgrading packages - well, it's also a thing. Maybe we can adjust docs to not mention `-e package_state=latest` as smth "default", but as smth that can be added if you want to update packages, with some precautions | 15:58
noonedeadpunk | At the very least this also affects neutron, as ovs being upgraded also causes disturbances | 15:58
jrosser | andrewbonney is back tomorrow - i'm sure we have some use of unattended upgrades here with config to prevent $bad-things but i can't find where that is just now | 15:59 |
noonedeadpunk | well, we just disabled unattended upgrades... | 16:00
jrosser | yeah i might be confused here, but it is the same sort of area | 16:00 |
noonedeadpunk | so my intention when I added `-e package_state=latest` was - you're planning an upgrade and thus maintenance. During maintenance interruptions happen. | 16:01
noonedeadpunk | But in the case of LXC you would need to re-bootstrap galera at the very least | 16:01
noonedeadpunk | as all goes down at the same time | 16:01 |
jrosser | right, it's not at all self-healing just by running some playbooks | 16:01 |
*** dviroel|lunch is now known as dviroel | 16:01 | |
jrosser | which is kind of what people expect | 16:01
noonedeadpunk | and what I see we can do - either manage serial, then restarting lxc will likely self-heal as the others are alive, or put some packages on hold and add another flag to upgrade LXC to make it intentional | 16:02
noonedeadpunk | as I don't think we are able to pin the version | 16:03 |
noonedeadpunk | ok, let's wait for andrewbonney then to make a decision | 16:04 |
noonedeadpunk | #endmeeting | 16:04 |
opendevmeet | Meeting ended Tue Jul 26 16:04:06 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:04 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.html | 16:04 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.txt | 16:04 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2022/openstack_ansible_meeting.2022-07-26-15.00.log.html | 16:04 |
noonedeadpunk | oh, I can recall that we somehow prevented package triggers from running for galera I guess. | 16:08
noonedeadpunk | and we already have lxc_container_allow_restarts variable | 16:09 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/tasks/galera_install_apt.yml#L65-L71 | 16:09 |
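The usual Debian mechanism for suppressing service restarts from maintainer scripts, which may be what the linked galera tasks rely on, is a temporary `policy-rc.d`; a sketch of the file's content (install it before the upgrade, remove it after):

```
#!/bin/sh
# Sketch of /usr/sbin/policy-rc.d: while this file exists and exits 101,
# package maintainer scripts are denied service start/restart actions.
exit 101
```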
jrosser | mgariepy: without pipelining https://paste.opendev.org/show/bfdqBx2uOhGOr8LJRMGV/ | 16:15 |
jrosser | and with -vvv https://paste.opendev.org/show/bWCoeQuDgiFeR8msOpop/ | 16:17 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Allow to provide serial for lxc_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/851049 | 16:17 |
jrosser | and with pipelining enabled https://paste.opendev.org/show/baJ8I2xKlAiovdwHvniM/ | 16:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Prevent lxc.service from being restarted on package update https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/851071 | 16:29 |
noonedeadpunk | huh, how does it re-try then.... | 16:30
jrosser | i don't know | 16:34 |
jrosser | mgariepy: noonedeadpunk the other thing i totally don't understand is why this always happens with the setup module | 16:43
jrosser | or is it really 'always', or are we just unlucky in the structure of our playbooks that T>ControlPersist has elapsed before we try to gather facts on the repo hosts | 16:44
mgariepy | i think it's reproducible by another module. | 17:02
mgariepy | but it would need not to exit 255.. | 17:03 |
mgariepy | we probably can trigger it with the command module also | 17:03
jrosser | perhaps there is something specific with the setup module | 17:06 |
mgariepy | maybe. | 17:06 |
mgariepy | but as long as we get the same race and an exit outside of 255 it should be failing. | 17:07 |
mgariepy | the fun part is to make the other modules take as much time as setup takes :D | 17:10
jrosser | ok here we go with the file: module https://paste.opendev.org/show/bFNbYfNTxDRBENHh6h9S/ | 17:11
mgariepy | how nice :D | 17:11 |
mgariepy | https://github.com/ansible/ansible/blob/e4087baa835df6046950a6579d2d2b56df75d12b/lib/ansible/plugins/connection/ssh.py#L1235 | 17:16 |
mgariepy | checkrc=False | 17:17 |
mgariepy | so if it's scp or sftp it does not care about the exit status and retries if so | 17:18
jrosser | here is another case where we get `muxclient: master hello exchange failed` https://paste.opendev.org/show/b3xllr0pREQ3vDvpyoax/ | 17:18 |
jrosser | i am still not able to make a MODULE FAILURE with anything else though | 17:38 |
mgariepy | the file module with pipelining, does it do the same as the setup module ? | 17:41
spatel | jamesdenton around? | 17:41 |
mgariepy | or it does transfer the file in another way ? | 17:42 |
jrosser | it's possible it's different, just look at the number of SSH steps in my paste | 17:42 |
jrosser | i'm just trying something else like package: | 17:42 |
mgariepy | some vote please :D https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 17:43 |
mgariepy | also any comments on : https://review.opendev.org/c/openstack/openstack-ansible/+/850942/ ? | 17:49 |
mgariepy | i think having it at 1 makes the process not respond to haproxy, and then it disconnects the endpoint server. | 17:50
jrosser | package: MODULE FAILURE https://paste.opendev.org/show/bGFkEBoURUovv3ktl9JJ/ | 17:51 |
jrosser | mgariepy: is it really threads - i always find this sooo confusing with python | 17:51 |
mgariepy | let's rewrite openstack in go. | 17:52
mgariepy | simple deployment, real threads. and so on. | 17:52 |
jrosser | i think what i mean is i don't know if that actually does what you expect :) | 17:54 |
mgariepy | so should we set the number of process to 2 instead ? | 17:56 |
mgariepy | i assume that with 2 threads it should be a bit better at responding to multiple requests. | 17:56
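The knobs being debated for review 850942 are the uWSGI ones; `processes` and `threads` are real uWSGI option names, the values here are the ones proposed. A sketch of the distinction:

```ini
[uwsgi]
; worker processes: real parallelism, each with its own interpreter and GIL
processes = 2
; threads per worker: GIL-bound, but let a worker keep answering
; (e.g. haproxy health checks) while another thread blocks on I/O
threads = 2
```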
jrosser | noonedeadpunk: ^ do you understand how this works? | 17:58 |
mgariepy | arf.. meeting now | 17:58 |
noonedeadpunk | I wonder what's wrong with linters actually. seems they always fail | 19:22
noonedeadpunk | um, not sure I got where this leads... | 19:28 |
mgariepy | the linter is fixed in the other patch. | 19:29 |
noonedeadpunk | (last was regarding python threads) | 19:30 |
mgariepy | i saw a few failure when tempest was running that the services were just not responding. | 19:38 |
mgariepy | not 100% sure having 2 threads will fix 100% of the cases. but maybe. | 19:39
mgariepy | maybe i could try to take the time to test if it actually helps. but i'm kinda short on time for the rest of the week. | 19:39
opendevreview | Merged openstack/openstack-ansible master: Fix cloud.openstack.* modules https://review.opendev.org/c/openstack/openstack-ansible/+/851038 | 20:14 |
*** anskiy1 is now known as anskiy | 21:28 | |
*** dviroel is now known as dviroel|out | 23:29 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!