jrosser | noonedeadpunk: odd errors here https://review.opendev.org/c/openstack/openstack-ansible/+/836378 | 07:42 |
---|---|---|
jrosser | i wonder what changed there | 07:42 |
jrosser | can we merge this https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836377 | 07:43 |
noonedeadpunk | mornings | 07:56 |
noonedeadpunk | basically lxc fails | 07:57 |
noonedeadpunk | that sounds like some issue with our connection plugin definition | 07:58 |
noonedeadpunk | I think we use relative path import there or some nasty thing that likely has changed | 07:58 |
noonedeadpunk | I wonder if that's `AnsiballZ - Ensure we use the full python package in the module cache filename to avoid a case where collections: is used to execute a module via short name, where the short name duplicates another module from ansible.builtin or another collection that was executed previously.` | 08:00 |
noonedeadpunk | https://github.com/ansible/ansible/blob/stable-2.12/changelogs/CHANGELOG-v2.12.rst#v2-12-4 | 08:00 |
jrosser | so maybe that's because we duplicate the name of the ssh connection plugin with ours? | 08:08 |
noonedeadpunk | but at the same time we provide full path.... | 08:21 |
noonedeadpunk | https://opendev.org/openstack/openstack-ansible/src/branch/master/scripts/openstack-ansible.rc#L50 | 08:21 |
noonedeadpunk | but I'm 99.9% sure it has smth to do with connection plugin this way or another. Will spawn aio then to test this out | 08:22 |
*** ysandeep is now known as ysandeep|lunch | 08:28 | |
opendevreview | Merged openstack/openstack-ansible-plugins master: Fix detection of Rocky Linux for ssh_keypairs role https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836377 | 08:29 |
*** ysandeep|lunch is now known as ysandeep | 09:00 | |
opendevreview | Merged openstack/openstack-ansible-openstack_hosts stable/wallaby: Use correct system.conf.d permissions https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/836339 | 09:03 |
jrosser | noonedeadpunk: as far as i can see it is trying to ssh to the container rather than the physical host https://paste.opendev.org/show/bl3xVqKH6LFaOljc82ig/ | 09:09 |
noonedeadpunk | yup.. | 09:10 |
noonedeadpunk | which makes me think that our connection plugin is simply not used for some reason | 09:10 |
jrosser | it is certainly using the strategy plugin as some of those messages come from that | 09:12 |
opendevreview | Merged openstack/openstack-ansible-openstack_hosts stable/xena: Use correct system.conf.d permissions https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/836338 | 09:20 |
noonedeadpunk | ok, it's 2.12.3 that actually broke us | 09:21 |
noonedeadpunk | `ssh connection now uses more correct host source as play_context can ignore loop/delegation variations.` | 09:22 |
noonedeadpunk | `gather_facts action now handles the move of base connection plugin types into collections to add/prevent subset argument correctly` | 09:22 |
noonedeadpunk | sounds like smth related to the issue:) | 09:23 |
jrosser | https://github.com/ansible/ansible/commit/be19863e44cc6b78706147b25489a73d7c8fbcb5 | 09:25 |
noonedeadpunk | mmm, yeah... | 09:27 |
noonedeadpunk | but! I think it's indeed just facts gathering | 09:28 |
noonedeadpunk | nah, disregard that | 09:28 |
jrosser | i think this is where we override the target https://github.com/openstack/openstack-ansible-plugins/blob/master/plugins/connection/ssh.py#L414-L417 | 09:29 |
jrosser | but in the ansible patch they add a whole bunch of `self.host = self.get_option('host') or self._play_context.remote_addr` | 09:30 |
noonedeadpunk | so we basically should unset 'host' from options as well I guess | 09:31 |
jrosser | actually that is the only change they make to the ssh plugin | 09:33 |
noonedeadpunk | but hm, shouldn't self.host be returned for get_option('host') | 09:34 |
noonedeadpunk | ok, no, it doesn't | 09:35 |
opendevreview | Merged openstack/openstack-ansible-repo_server master: Use /run/nginx.pid https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836374 | 09:36 |
noonedeadpunk | jrosser: ok, I have patch I think | 09:39 |
jrosser | just from the debug message we can see that it's using self.host `<172.29.238.145> ESTABLISH SSH CONNECTION FOR USER: root` | 09:40 |
jrosser | the IP in < > is self.host | 09:41 |
jrosser | and self._play_context.remote_addr does indeed contain the thing we want | 09:43 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836585 | 09:43 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836585 | 09:45 |
noonedeadpunk | jrosser: ^ this works nicely in aio.... | 09:46 |
noonedeadpunk | looks like a nasty hack though | 09:46 |
noonedeadpunk | but I printed self.get_options('host') and it was container address | 09:47 |
noonedeadpunk | I dunno if self.host is used anywhere down the line though.... | 09:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-plugins master: Define physical_host in options https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836585 | 09:50 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Update ansible-core to 2.12.4 https://review.opendev.org/c/openstack/openstack-ansible/+/836378 | 09:50 |
jrosser | works here too - we get to see if it also works on 2.12.2 with 836585 | 09:52 |
noonedeadpunk | I think on 2.12.2 it just uses context, but not 100% sure | 09:55 |
noonedeadpunk | at least after core downgrade locally things remain working | 09:56 |
jrosser | yes it looks good | 10:00 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server stable/xena: Use /run/nginx.pid https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836593 | 10:04 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server stable/wallaby: Use /run/nginx.pid https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836594 | 10:04 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-repo_server stable/victoria: Use /run/nginx.pid https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836595 | 10:04 |
jrosser | nginx pid is not the fix for stable branches it seems https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836594 | 12:15 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible stable/xena: Connect openstack_pki_regen_ca variable to pki role https://review.opendev.org/c/openstack/openstack-ansible/+/834017 | 12:24 |
opendevreview | Merged openstack/openstack-ansible-os_tempest stable/xena: Set py_modules to an empty list in setup.py https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/836315 | 12:40 |
noonedeadpunk | hm | 12:41 |
noonedeadpunk | wondering if it is for master as well... | 12:42 |
noonedeadpunk | oh | 12:42 |
noonedeadpunk | but it fails for buster | 12:42 |
noonedeadpunk | so likely buster is different | 12:42 |
opendevreview | Merged openstack/openstack-ansible-os_tempest stable/wallaby: Set py_modules to an empty list in setup.py https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/836169 | 12:45 |
opendevreview | Merged openstack/openstack-ansible-os_tempest stable/victoria: Set py_modules to an empty list in setup.py https://review.opendev.org/c/openstack/openstack-ansible-os_tempest/+/836330 | 12:45 |
jrosser | oh no, more patches get a zuul -2 because of an early +2+W | 13:02 |
opendevreview | Merged openstack/openstack-ansible-plugins master: Define physical_host in options https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/836585 | 13:03 |
mgariepy | what ? | 13:03 |
jrosser | mgariepy: see what happened here https://review.opendev.org/c/openstack/openstack-ansible/+/836378 | 13:04 |
jrosser | that's super unhelpful behaviour | 13:04 |
mgariepy | so the workflow would be to +2 wait for zuul, then +w ? | 13:05 |
jrosser | i think it's because it has an unmerged depends-on in a different zuul queue | 13:05 |
jrosser | but i don't really see why that would warrant a -2 | 13:05 |
jrosser | it used to just do nothing at that point | 13:05 |
mgariepy | so cross repo depend is broken ? | 13:06 |
jrosser | but the +2+W of someone else would be retained, so the reviewer didn't have to wait for the dependent patch to merge | 13:06 |
noonedeadpunk | let's go to #openstack-infra maybe to discuss that? I tried to understand but I can't | 13:06 |
jrosser | i asked yesterday but perhaps we need to ask again | 13:06 |
jrosser | noonedeadpunk: do wo have a PTG etherpad? | 13:08 |
jrosser | *we | 13:09 |
noonedeadpunk | https://etherpad.opendev.org/p/osa-Z-ptg | 13:09 |
jrosser | i expect PTG week is a tricky week to get answers on this stuff | 13:18 |
noonedeadpunk | well, yes... but I have a lack of ideas I believe.... or well, a lack of time to think about adding new cool stuff:) | 13:46 |
mgariepy | today is not quite a good day for me. i'll be running out of power on my UPS in like 30-60 minutes for my internet :( so i won't be able to be there for ptg today. | 13:50 |
noonedeadpunk | doh.... | 14:03 |
noonedeadpunk | that's a bummer | 14:03 |
mgariepy | they always quote the worst case for work on the power lines but it's 10 am and they say it will be back for 8 pm.. so i don't expect much. | 14:05 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-os_senlin master: Updated from OpenStack Ansible Tests https://review.opendev.org/c/openstack/openstack-ansible-os_senlin/+/835721 | 14:20 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Define zuul queue https://review.opendev.org/c/openstack/openstack-ansible/+/836657 | 14:23 |
jrosser | noonedeadpunk: do we think that is correct? ^ | 14:23 |
jrosser | if all our changes get queued together and one at the head of the queue fails, doesn't that invalidate the whole queue | 14:24 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Use integrated queue for project https://review.opendev.org/c/openstack/openstack-ansible/+/836658 | 14:24 |
jrosser | where actually pretty much most of our repos can merge patches independently | 14:24 |
jrosser | afaik this is why we see giant queues for some projects in zuul status, but ours stay short and have good throughput because they don't break each other | 14:25 |
noonedeadpunk | oh, indeed, you're right.... | 14:27 |
noonedeadpunk | I read that different way at first | 14:27 |
jrosser | my understanding of this is a little lightweight though | 14:28 |
jrosser | i think a lot of the style in openstack is driven by mono-repo or single project oriented things | 14:29 |
jrosser | and we really do something a bit different to that | 14:29 |
noonedeadpunk | this just wasn't stated anywhere in docs this way:) | 14:30 |
noonedeadpunk | And `Any projects which interact with each other in tests should be part of the same shared queue in order to ensure that they don’t merge changes which break the others.` sounded like indeed smth we should have had | 14:31 |
jrosser | yeah, though i was thinking about that | 14:31 |
jrosser | and the logical conclusion is that there could only ever be one queue | 14:31 |
jrosser | because we might legitimately want to depends-on SDK, or nova, or requirements | 14:31 |
jrosser | or anything really | 14:31 |
jrosser | so that's when i start to get confused about what the actual concept is here | 14:32 |
noonedeadpunk | well, true... | 14:32 |
opendevreview | Merged openstack/openstack-ansible-repo_server stable/xena: Use /run/nginx.pid https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/836593 | 14:32 |
noonedeadpunk | And indeed fungi said "their testing has to be reset if there's a failure for a change ahead of the current change" which suggests that you're right and it would be a mess if we used them I guess. Like any patch failure would lead to invalidation of the others.... | 14:33 |
jrosser | oh yes just imagine trying to merge a proposal bot set of changes | 14:33 |
fungi | depends on whether your projects often fail gate jobs after passing in check | 14:33 |
jrosser | it would be impossible | 14:34 |
fungi | zuul's dependent pipelines are optimized for rapidly merging changes, so jobs with a high rate of spurious failures unrelated to the changes being tested tend to slow things up as a result | 14:35 |
jrosser | fungi: is the assumption there that it is rapid merging to the same repo? | 14:35 |
noonedeadpunk | and likely we even wanted to push it here instead of all projects https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml | 14:35 |
fungi | if your jobs usually only fail when there's something wrong with the change itself, then things should merge very quickly with dependent queues | 14:35 |
fungi | rapid merging to all repos, but yes | 14:36 |
jrosser | i think we are mostly getting failure due to $external-random-thing that we have no control over (ansible-galaxy gives error) | 14:36 |
noonedeadpunk | we tend to have intermittent issues related to some infra/resource related stuff | 14:36 |
jrosser | or our current 'noise floor' of occasional errors that we've not got to the bottom of yet | 14:37 |
noonedeadpunk | or OOM :) | 14:37 |
jrosser | hah | 14:37 |
fungi | i wonder if we could improve the galaxy issues with some caching. i know tripleo has zuul directly provide a number of galaxy modules as git checkouts in their jobs | 14:37 |
jrosser | we do that for everything we can already | 14:37 |
jrosser | but some collections do not contain the right stuff to allow local installs to work | 14:38 |
noonedeadpunk | tripleo installs roles as python packages, so dunno what you're talking about :) | 14:38 |
fungi | ahh, okay i couldn't remember if that had already been explored | 14:38 |
fungi | i probably confused the two in that case | 14:38 |
jrosser | though really i guess this discussion is all a result of our cross-queue trouble | 14:38 |
fungi | i guess it was openstack-ansible installing galaxy modules from git checkouts | 14:38 |
jrosser | and if conceptually we are doing the right thing or not | 14:38 |
noonedeadpunk | we do that in gates, yes | 14:39 |
noonedeadpunk | wherever we can | 14:39 |
fungi | some background: the cross-project dependent queuing concept in zuul originated early in openstack's development, because we wanted to make sure that changes to cinder or cinderclient didn't break volume attachments through the nova api, for example, so we tested changes for all of them together in an integrated queue using devstack/tempest to make sure they remained interoperable at | 14:40 |
fungi | each change before merging it | 14:40 |
dmsimard | o/ I'm not at the openstack PTG but feel free to reach out if you have ansible or ara questions | 14:40 |
jrosser | dmsimard: versions in galaxy.yml should be mandatory in the git repos :) | 14:41 |
jrosser | not added after-the-fact as something is published to galaxy - otherwise you can't install from git source | 14:41 |
fungi | so if a change to cinder's api was queued ahead of a change to nova's volume handling and the former broke the latter, then that prevented the latter from merging. however if the former change couldn't pass its own tests, the latter would be re-tested without it and merged if it passed on its own | 14:41 |
fungi | that way we could test both changes in parallel with the assumption they would both pass their tests, and only need to restart testing if that assumption was untrue | 14:42 |
jrosser | dmsimard: so specifically ansible.netcommon ansible.utils openvswitch.openvswitch don't do this, which means they can't be installed with type: git in a collection requirements file | 14:44 |
dmsimard | jrosser: oh ? can you show me an example ? | 14:44 |
* dmsimard looks in the meantime | 14:44 | |
jrosser | here is our collection requirements file https://github.com/openstack/openstack-ansible/blob/master/ansible-collection-requirements.yml | 14:45 |
dmsimard | jrosser: oh I see what you mean | 14:45 |
jrosser | and the ones at the bottom cannot come from git, so cannot use cached repos in CI, so contribute to our CI failure noise floor | 14:45 |
dmsimard | you're right -- I think that's because they insert it dynamically at build/publish time | 14:45 |
dmsimard | but it's definitely worth talking about | 14:45 |
jrosser | if the process of applying the tag also committed the version, that would be cool | 14:46 |
jrosser | but i guess that can be tricky as you want the repo structure always to be 'master' or 'devel' or whatever | 14:46 |
noonedeadpunk | dmsimard: actually, Paul was quite picky about not adding version to galaxy.yml for whatever reason | 14:47 |
dmsimard | I wonder if we could make the argument that it's a bug in the ansible-galaxy CLI (as in, if there's no version it's not a fatal error) | 14:47 |
dmsimard | noonedeadpunk: Paul has moved on and ain't maintaining those anymore | 14:47 |
noonedeadpunk | ah | 14:47 |
noonedeadpunk | well, actually galaxy is not failing anymore | 14:47 |
noonedeadpunk | but it shows collection version as "*" | 14:47 |
noonedeadpunk | which is not helpful either | 14:48 |
* jrosser rechecked a patch for galaxy 502 error today | 14:48 | |
noonedeadpunk | but super valid from ansible-galaxy perspective | 14:48 |
noonedeadpunk | dmsimard: fwiw I created a bug https://github.com/ansible-collections/openvswitch.openvswitch/issues/94 | 14:48 |
noonedeadpunk | but not sure if worth spreading it across every collection.... | 14:49 |
dmsimard | jrosser: those 502 errors are a plague | 14:49 |
jrosser | right, and it's hours and hours of wasted CPU time in an openstack context | 14:49 |
dmsimard | I see a lot of HTTP 429's too | 14:49 |
jrosser | the collections are cached on the CI nodes for some of them | 14:49 |
jrosser | dmsimard: in our jobs we pre-process the collections requirement file and re-write the entries that have local clones of the collections, like this https://github.com/openstack/openstack-ansible/blob/master/scripts/get-ansible-collection-requirements.yml#L50-L67 | 14:51 |
dmsimard | ouch | 14:52 |
jrosser | it's such a big deal relying on upstream galaxy being reliable that such measures are needed | 14:54 |
jrosser | whilst i understand that there is 'policy' that things are installed with the galaxy backend / api / whatever, reality says otherwise | 14:55 |
jrosser | and i would also like an option on ansible-galaxy which can install from git and retain the git metadata | 14:55 |
jrosser | as the developer workflow for collections is pretty terrible right now | 14:55 |
dmsimard | I will hunt that git version one for now but it would be helpful if you could organize and write down some of these papercuts somewhere that I can easily share them with the right folks | 14:56 |
dmsimard | fwiw I reproduced it with ansible.netcommon and it ain't even a helpful error message: ERROR! Collection artifact at '/home/dmsimard/.ansible/tmp/ansible-local-24116076pwl8md3/tmpyr5shbyf/ansible.ne-lp9xq87k/ansible.netcommon' is not a valid tar file. | 14:58 |
jrosser | :) | 14:58 |
noonedeadpunk | Just in case - we're about to start in https://www.openstack.org/ptg/rooms/essex | 15:02 |
* jrosser was just wondering where to find the schedule | 15:02 | |
jrosser | kinda absent from https://www.openstack.org/ptg/ | 15:02 |
noonedeadpunk | https://ptg.opendev.org/ptg.html | 15:03 |
damiandabrowski[m] | I also had a problem with that :D | 15:03 |
jrosser | doh | 15:03 |
*** ysandeep is now known as ysandeep|out | 16:03 | |
dmsimard | noonedeadpunk, jrosser: I spent some time reproducing the galaxy.yml version issue and asking folks about it -- in my testing, even though it says "Installing 'ansible.netcommon:*'", it installs the correct version (I tried 2.5.1 and 2.6.1) | 20:19 |
dmsimard | I've tested with the latest version of ansible-core (2.12.4) though | 20:20 |
dmsimard | The error I got earlier was because I had set "source: git" instead of "type: git" so it was my fault | 20:20 |
jrosser | dmsimard: interesting | 20:24 |
jrosser | we have a patch in flight to upgrade to 2.12.4 already | 20:25 |
jrosser | those tags all exist in ansible.netcommon git repo so maybe now it's more tolerant of missing version info in galaxy.yml | 20:26 |
opendevreview | Merged openstack/openstack-ansible-tests stable/xena: Add ansible.utils collection requirement https://review.opendev.org/c/openstack/openstack-ansible-tests/+/836361 | 20:32 |
*** dviroel is now known as dviroel|out | 20:36 | |
dmsimard | jrosser: I can try to reproduce with earlier versions to find out whether it was fixed at some point | 20:42 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server stable/wallaby: Verify if hosts file already managed with OSA https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/836167 | 21:40 |
opendevreview | Merged openstack/openstack-ansible-repo_server master: Use ssh_keypairs role to generate keys for repo sync https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/827100 | 21:50 |
dmsimard | jrosser: seems to work with the latest 2.11 too (2.11.10) | 23:08 |
dmsimard | doesn't work with 2.9 (but also neither does ansible.posix with type: git) | 23:10 |
dmsimard | interestingly enough, ansible-base 2.10 does print an actual warning: [WARNING]: Collection at '/home/dmsimard/.ansible/tmp/ansible-local-2454180er65d81l/tmpr_cryqol/ansible.netcommon' does not have a valid version set, falling back to '*'. Found version: 'None' | 23:11 |
dmsimard | that warning doesn't exist in 2.11 and 2.12 | 23:12 |
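The 2.12.3 regression debugged between 09:09 and 09:47 comes down to which value wins when the ssh connection plugin resolves its target. What follows is a hypothetical, simplified model: `MockSSHConnection`, `PlayContext` and `redirect_to_physical_host` are invented for illustration and are not Ansible or openstack-ansible code. It reproduces only the fallback quoted at 09:30, `self.host = self.get_option('host') or self._play_context.remote_addr`. It also assumes, as observed at 09:47, that the `host` option still holds the container address. It is not the actual change in 836585.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PlayContext:
    """Hypothetical stand-in for Ansible's play context (only remote_addr)."""
    remote_addr: str


@dataclass
class MockSSHConnection:
    """Hypothetical model of target selection in the ssh connection plugin
    after ansible-core 2.12.3; not the real plugin class."""
    play_context: PlayContext
    options: Dict[str, str] = field(default_factory=dict)

    def get_option(self, name: str) -> Optional[str]:
        # Returns None for unset options, so the `or` fallback below applies
        return self.options.get(name)

    @property
    def host(self) -> str:
        # Pattern quoted from the upstream patch: the 'host' option wins,
        # and play_context.remote_addr is only used as a fallback
        return self.get_option("host") or self.play_context.remote_addr


def redirect_to_physical_host(conn: MockSSHConnection, physical_addr: str,
                              set_option: bool) -> None:
    """Hypothetical helper: point the connection at the physical LXC host."""
    # Per the discussion, rewriting only the play context appeared to be
    # enough before 2.12.3
    conn.play_context.remote_addr = physical_addr
    if set_option:
        # Assumed approach: also override the 'host' option so it cannot
        # shadow the play context value
        conn.options["host"] = physical_addr


if __name__ == "__main__":
    container_addr = "172.29.238.145"  # container IP from the 09:40 debug line
    physical_addr = "192.0.2.10"       # hypothetical physical host (TEST-NET-1)

    before = MockSSHConnection(PlayContext(container_addr), {"host": container_addr})
    redirect_to_physical_host(before, physical_addr, set_option=False)
    print(before.host)  # 172.29.238.145: ssh targets the container, as in the paste

    after = MockSSHConnection(PlayContext(container_addr), {"host": container_addr})
    redirect_to_physical_host(after, physical_addr, set_option=True)
    print(after.host)   # 192.0.2.10: ssh targets the physical host
```

The point of the model is the `or` fallback: once the `host` option is populated, changes made only to the play context are invisible to the plugin. That matches the 09:47 observation that printing the `host` option showed the container address.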