mordred | (garbage, org, repo, more_garbage, number) = s.rsplit('/', 4) really | 00:00 |
* mordred loves split | 00:01 | |
jlk | oh yeah I could do that | 00:02 |
jlk | does "__" by itself have special meaning in python? | 00:05 |
jlk | pep8 might bitch about defining garbage and more_garbage but not using them | 00:05 |
jamielennox | jlk: there's convention but it doesn't mean anything by default | 00:08 |
jamielennox | in web stuff it's generally used as the _('string') translation, which is annoying when you want to use it for the ignore-variable case | 00:09 |
jamielennox | so pep8 and a lot of things have rules built in to ignore it | 00:09 |
jlk | yeah I'll just use _ twice. | 00:13 |
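A minimal sketch of the unpacking under discussion, with a hypothetical pull request URL. Two details worth noting: five targets require maxsplit=4 (maxsplit splits yield maxsplit+1 parts), and _ may appear twice as a throwaway target in tuple unpacking:

    # Hypothetical URL of the shape being parsed above.
    s = 'https://github.com/j2sol/z8s-sandbox/pull/1'
    # rsplit with maxsplit=4 yields exactly five parts from the right.
    _, org, repo, _, number = s.rsplit('/', 4)
    print(org, repo, number)  # j2sol z8s-sandbox 1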
jlk | DEBUG:connection.github:Updating <Change 0x7f20b8159c50 2,943abf6d7430610d348e105b6196e67626ff2d32>: Getting commit-dependent pull request j2sol/z8s-sandbox/1 | 00:23 |
mordred | jlk: ooh - that seems promising | 00:24 |
jlk | zuul-scheduler_1 | DEBUG:zuul.IndependentPipelineManager:Checking for changes needed by <Change 0x7f20b8159c50 2,943abf6d7430610d348e105b6196e67626ff2d32>: | 00:25 |
jlk | zuul-scheduler_1 | DEBUG:zuul.IndependentPipelineManager: Change <Change 0x7f20b8159c50 2,943abf6d7430610d348e105b6196e67626ff2d32> needs change <Change 0x7f20b81b38d0 1,bb71d444ce36ea561bf98d83d05d2ccefe38525a>: | 00:25 |
jlk | zuul-scheduler_1 | DEBUG:zuul.IndependentPipelineManager: Change <Change 0x7f20b81b38d0 1,bb71d444ce36ea561bf98d83d05d2ccefe38525a> is needed | 00:25 |
* mordred hands jlk a fatted calf | 00:25 | |
jlk | it's falling over a bit later, but yes, definitely progress | 00:25 |
SpamapS | nice | 00:29 |
jamielennox | jlk: oh you're working on dependencies - awesome | 00:36 |
jlk | thought I'd throw some brain at it today | 00:36 |
jamielennox | jlk: re https://review.openstack.org/#/c/474300/ - what is a non-PR event? | 00:37 |
jamielennox | like a merge event? | 00:37 |
jlk | a push | 00:37 |
jamielennox | so direct push to a repo, why are we handling that/running tests at all | 00:37 |
jlk | either by way of somebody merging a pr, or a direct push | 00:37 |
jlk | post-merge deployment pipelines | 00:37 |
jlk | like publishing to pypi | 00:38 |
jlk | a push event covers tags too | 00:38 |
jlk | so somebody may want pipeline action when a tag is applied | 00:38 |
jamielennox | hmm, yea that's difficult in the current model | 00:39 |
jamielennox | i imagine gerrit has the same problem, there's not really anywhere to report a pypi publish failing | 00:39 |
jlk | it kind of has that problem yes | 00:39 |
jlk | but it doesn't really come up in openstack because it's rare for something to be pushed directly rather than merged via gerrit | 00:40 |
jamielennox | you still push tags for version publishing, but yea | 00:42 |
jlk | zuul-scheduler_1 | Project j2sol/z8s-sandbox change 1,bb71d444ce36ea561bf98d83d05d2ccefe38525a based on None | 01:02 |
jlk | zuul-scheduler_1 | Project j2sol/z8s-sandbox change 2,943abf6d7430610d348e105b6196e67626ff2d32 based on <QueueItem 0x7fecd40c0310 for <Change 0x7fecd410a8d0 1,bb71d444ce36ea561bf98d83d05d2ccefe38525a> in check> | 01:02 |
jlk | oh cute, the github web UI is "helpfully" translating my Depends-On URLs into relative links | 01:04 |
jlk | but it DOES notice that and makes the link | 01:04 |
jlk | comments about it in the linked-to PR | 01:05 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github [wip] https://review.openstack.org/474401 | 01:20 |
jlk | So ^^ seems to be working at least in the independent manager. My setup doesn't yet _do_ jobs, but it is at least merging the right things and scheduling the jobs. I know it doesn't reflect the work happening in https://review.openstack.org/#/c/451423 but I can easily integrate that when it lands. | 01:21 |
jlk | of course need to add tests and what not | 01:21 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: Add support for zuul.d configuration split https://review.openstack.org/473764 | 02:27 |
*** isaacb has joined #zuul | 03:22 | |
*** isaacb has quit IRC | 03:24 | |
SpamapS | jlk: nice! | 03:44 |
* SpamapS has been fully distracted and hopes to be undistracted tomorrow | 03:44 | |
*** nt has quit IRC | 03:59 | |
*** nt has joined #zuul | 04:01 | |
jlk | tomorrow I'm mostly AFK, playing co-pilot for my grandmother, to show her how to get to the recovery facility where my grandfather is for the next few weeks. | 04:53 |
jlk | I'm still really loving testing out this live code from my source tree via docker compose. Really fast iteration. | 04:59 |
jamielennox | jlk: how are you kicking it off though? i'm still testing on our v3 instance and making github reissue the events | 05:03 |
jamielennox | means you have to do everything right though | 05:04 |
jamielennox | cannot figure out why the delegate_to: localhost is still not producing output | 05:04 |
*** hashar has joined #zuul | 06:39 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: add support for custom ansible_port https://review.openstack.org/468710 | 07:04 |
*** ajafo has quit IRC | 07:13 | |
*** ajafo has joined #zuul | 07:13 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: bubblewrap: adds --die-with-parent option https://review.openstack.org/473164 | 07:18 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: run trusted playbook in a bubblewrap https://review.openstack.org/474460 | 07:18 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: run trusted playbook in a bubblewrap https://review.openstack.org/474460 | 07:49 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default https://review.openstack.org/474484 | 08:23 |
*** yolanda_ has joined #zuul | 08:59 | |
*** jkilpatr has quit IRC | 10:32 | |
*** jkilpatr has joined #zuul | 10:49 | |
*** yolanda__ has joined #zuul | 11:30 | |
*** yolanda_ has quit IRC | 11:30 | |
*** olaph has joined #zuul | 11:54 | |
*** yolanda__ is now known as yolanda | 12:05 | |
dmsimard | pabelanger: what are the odds of seeing https://review.openstack.org/#/c/464283/ backported to master ? | 13:20 |
dmsimard | I don't think anything in there is zuulv3 specific but I could be mistaken | 13:20 |
SpamapS | dmsimard: new features are not being merged into master for the most part, but I'm sure exceptions can be made. Also I believe there's an effort to allow running nodepoolv3 with zuul v2.5 for a while via a shim. | 13:30 |
mordred | SpamapS: fwiw, nobody (me) has gotten around to working on that shim | 13:32 |
SpamapS | mordred: so there's a *stalled* effort ... ;) | 13:35 |
mordred | heh | 13:36 |
jlk | jamielennox: I have copied a few event payloads into files, and then just use curl locally to throw the file at the scheduler. The PRs themselves the files reference are still open | 13:37 |
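A minimal sketch of that replay workflow in Python rather than curl; the endpoint URL, port, and payload filename are assumptions for illustration, not a confirmed zuul API:

    import requests

    # Hypothetical saved GitHub webhook payload, captured from a real delivery.
    with open('pull_request_opened.json') as f:
        payload = f.read()

    # Throw the saved payload at a locally running scheduler (URL assumed).
    resp = requests.post(
        'http://localhost:8001/connection/github/payload',
        data=payload,
        headers={
            'Content-Type': 'application/json',
            # GitHub sends the event type in this header; it must match the payload.
            'X-GitHub-Event': 'pull_request',
        },
    )
    print(resp.status_code)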
dmsimard | SpamapS: okay. | 13:45 |
*** hashar has quit IRC | 13:47 | |
*** hashar has joined #zuul | 13:54 | |
*** hashar has quit IRC | 14:04 | |
*** hashar has joined #zuul | 14:12 | |
Shrews | jlk: any chance you'd share your docker setup? | 14:18 |
jlk | Shrews: https://github.com/j2sol/z8s | 14:36 |
jlk | when I want to mount in my local code directory instead of using the clone that happens during the image build, I use docker-compose -f docker-compose.yaml -f devel.yaml up zuul-zookeeper zuul-scheduler zuul-executor | 14:36 |
jlk | pointing to both docker-compose.yaml and devel.yaml | 14:36 |
jlk | and you could use ZUUL_SRC=/path/to/your/checkout to change from the default of ~/src/zuul | 14:37 |
Shrews | jlk: thx. will have to give that a try soon | 14:39 |
pabelanger | dmsimard: not sure, it should be straightforward to manually backport right now. Biggest change is the shade dependency | 14:50 |
dmsimard | pabelanger: ok tristanC will take a shot at it | 14:53 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add boot-from-volume support for nodes https://review.openstack.org/474607 | 15:07 |
tristanC | dmsimard: pabelanger: let's see how it goes ^ | 15:08 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default https://review.openstack.org/474484 | 15:29 |
*** hashar has quit IRC | 15:31 | |
jeblair | tristanC: oh thank you! | 15:32 |
jeblair | that removes 147 lines of the most boring, hard to read, error-prone code we have! :) | 15:33 |
jlk | oh lordy! | 15:34 |
jlk | then again, python 3's ConfigParser has that built into it. | 15:34 |
jlk | parser.get('section', 'value', fallback='something_to_fall_on') | 15:35 |
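A minimal runnable sketch of that Python 3 behavior, the same fallback handling the removed wrapper emulated by hand:

    import configparser

    parser = configparser.ConfigParser()
    parser.read_string('[section]\nvalue = configured\n')

    # Python 3's ConfigParser resolves the default in the library itself...
    print(parser.get('section', 'value', fallback='something_to_fall_on'))
    # ...including when the option is missing entirely.
    print(parser.get('section', 'missing', fallback='something_to_fall_on'))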
jlk | BTW I'm mostly out today. I have to deal with some grandparent medical stuffs. | 15:36 |
jeblair | we're on the cusp of moving to py3 entirely, we could probably just move to that. however, we haven't actually turned off the py2 tests yet, so would be hard to land that right now. maybe go with tristanC's, then move to py3 version later...? | 15:36 |
jlk | jeblair: yeah his is absolutely an improvement | 15:37 |
jlk | and there may be reason to keep our own wrapper, who knows. | 15:37 |
*** jkilpatr_ has joined #zuul | 15:40 | |
jeblair | SpamapS: https://review.openstack.org/474064 and https://review.openstack.org/474188 are good warmups. :) | 15:42 |
*** jkilpatr has quit IRC | 15:43 | |
pabelanger | jeblair: I'd like to stop ze01.o.o and pick up the latest zuul changes | 16:15 |
jeblair | pabelanger: wfm | 16:17 |
mordred | jlk, jeblair: his version does have the expand_user flag, which is nice. | 16:22 |
jeblair | mordred: ++ | 16:33 |
*** jkilpatr has joined #zuul | 16:36 | |
*** jkilpatr_ has quit IRC | 16:37 | |
jlk | yeah, the v3 move might just be to simplify the wrapper, without losing the wrapper. | 16:39 |
mordred | jeblair: also - re: py3/py2 - as we consider operator documentation and whatnot, we should be careful about py2 on nodes, as ansible modules themselves are the least-likely thing to be py3 compat at the moment | 17:08 |
mordred | jeblair: jamielennox ran in to that with the delegate_to: localhost issue | 17:09 |
SpamapS | so if you delegate_to: localhost it uses whatever interpreter is running ansible-playbook ? | 17:09 |
jeblair | mordred: oh, i thought the py3 issue was on remote nodes -- that was on delegate_to: localhost? | 17:09 |
SpamapS | seems like you can solve that with ansible_interpreter=/usr/bin/python2.7 or something | 17:09 |
jeblair | mordred: no wonder we couldn't figure it out. | 17:10 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Default bubblewrap to work_root https://review.openstack.org/473099 | 17:10 |
mordred | SpamapS: yes - we just need to make sure people realize that "zuul v3 runs py3" doesn't mean "I don't need py2 on my nodes" | 17:11 |
mordred | jeblair: well, delegate_to: localhost uses localhost as a remote node | 17:11 |
clarkb | SpamapS: mordred not sure about delegate_to but connection local uses system python iirc, there are all sorts of problems using that with virtualenvs | 17:12 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Read layout from BuildSet in first merge scenario https://review.openstack.org/474064 | 17:12 |
jeblair | mordred: yeah, i grok that. i just remember having conversations like "python on the remote node is python2" | 17:12 |
mordred | yah. I think there is stuff here, like clarkb and SpamapS mention, that we probably just need to tighten just a bit | 17:12 |
jeblair | mordred: i was clearly missing an important piece of information. i'm happy i have it now and it all makes much more sense. | 17:12 |
mordred | woot | 17:12 |
clarkb | it's my big grump when running ansible tests locally out of a venv, because my virtualenv makes a python3 env by default, then it does python2 "remotely" iirc | 17:13 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Limit github reporters to event types https://review.openstack.org/474300 | 17:18 |
*** greghaynes is now known as greghayn1 | 17:18 | |
*** greghayn1 has quit IRC | 17:25 | |
pabelanger | mordred: we seem to no longer be redirecting ansible error output to zuul-executor debug logs? Is that intentional? | 17:34 |
jeblair | pabelanger: not sure what you mean -- i see some ansible output, what's missing? | 17:47 |
jeblair | pabelanger: (also, remember since you restarted, if you want -vvv you'll need to 'zuul-launcher verbose' again) | 17:48 |
jeblair | looks like you did that though, i think? | 17:49 |
pabelanger | jeblair: if you have a syntax error in your playbook / role, we no longer seem to log that into zuul-executor. I thought we did that before | 17:49 |
pabelanger | jeblair: yes, verbose is enabled | 17:49 |
mordred | pabelanger: that's likely a bug | 17:50 |
jeblair | pabelanger: oh, i don't know if that was there or not. | 17:50 |
jeblair | mordred: is it though? if that can end up in the ansible_log, isn't that enough? | 17:50 |
jeblair | (obviously, if that doesn't end up in the ansible log, it should be in the executor log) | 17:51 |
jeblair | i just don't want to make the executor log noisier than necessary :) | 17:51 |
jeblair | 2017-06-15 17:51:11,695 DEBUG zuul.AnsibleJob: [build: 74ce26349c2546d29f31e38a03ccd516] Ansible output: b'' | 17:53 |
jeblair | 2017-06-15 17:52:09,729 DEBUG zuul.AnsibleJob: [build: 74ce26349c2546d29f31e38a03ccd516] Ansible output terminated | 17:53 |
pabelanger | I think adding it to the executor log makes sense, because we also add normal ansible output today. It seems we are just missing errors. If we want to remove that from the executor log too, that is okay. But it makes debugging a little harder; if your log publishers are down (like they are today for us) you become blind | 17:53 |
jeblair | pabelanger: ok | 17:53 |
jeblair | pabelanger: those lines tell me that the ~1 minute delay is between the last line being read and stdout closing | 17:54 |
pabelanger | jeblair: agree | 17:55 |
pabelanger | Is ansible production ready under python3? I've been having a hard time over the last few days trying to figure out a series of errors since we switched to ze01.o.o, and have not made much progress writing jobs | 17:56 |
pabelanger | as an example: http://paste.openstack.org/show/612739/ | 17:57 |
pabelanger | we are getting random file exists errors from tox now | 17:57 |
pabelanger | and I am not seeing this on zuul.o.o today | 17:58 |
jeblair | pabelanger: how is that related to ansible? that's an error in virtualenv | 17:59 |
pabelanger | I'm not sure it is, but I am having a hard time determining what changed between moving to ze01.o.o and the random failures | 18:00 |
pabelanger | I've been basically trying to make our existing jobs stable for the last day and a half, and haven't been able to write any new jobs | 18:01 |
jeblair | pabelanger, mordred: switching back to the one minute delay -- actually, i'm a little confused -- the "Ansible output: b''" log line is the sentinel for the iterator. as soon as it is read, the for loop should exit. | 18:01 |
jeblair | pabelanger: i'm happy to help investigate. which problem would you like to focus on? | 18:01 |
jeblair | pabelanger: 1min delay, or venv error? | 18:01 |
pabelanger | jeblair: 1min delay, lets continue with that | 18:02 |
jeblair | okay | 18:02 |
pabelanger | I'll put venv error on hold | 18:02 |
jeblair | pabelanger, mordred, clarkb, SpamapS: i think i may not fully understand the iter() method and how it relates to the sentinel | 18:02 |
jeblair | i'm going to poke at that and see if i can gain a better understanding | 18:03 |
jeblair | oh, i think i see it | 18:04 |
jeblair | the b'' in the log is probably originally a b'\n'. then we strip it | 18:04 |
mordred | jeblair: oh. yes. | 18:04 |
jeblair | so that's not the sentinel being logged | 18:04 |
jeblair | so indeed, it looks like we get a blank line from ansible, then 1 minute later, stdout closes. | 18:05 |
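A minimal sketch of the two-argument iter() pattern being debugged, with a hypothetical child process standing in for ansible-playbook. readline() returns the b'' sentinel only at EOF, ending the loop; a blank line in the stream arrives as b'\n', which strips to a b'' that looks identical to the sentinel in the log:

    import subprocess

    proc = subprocess.Popen(['echo', 'hello'], stdout=subprocess.PIPE)

    # iter(callable, sentinel): call readline() until it returns b'' at EOF.
    for line in iter(proc.stdout.readline, b''):
        # Stripping the newline is what makes a blank line log as b''.
        print('Ansible output: %r' % line.rstrip(b'\n'))
    proc.wait()
    print('Ansible output terminated')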
jeblair | mordred, pabelanger: how about we throw some more vvv logging into zuul_stream to see if it's being slow | 18:06 |
jeblair | (it doesn't appear slow enough to log errors, but maybe there's still something going on?) | 18:06 |
pabelanger | wfm | 18:07 |
jeblair | working on that | 18:07 |
mordred | jeblair: ++ | 18:08 |
* mordred also looking at that code, fwiw | 18:08 | |
jeblair | mordred: part of me wants the 'handle list of streamers' change to land before we do too much more on this | 18:09 |
mordred | jeblair: want me to address jamie's comments and push that back up real quick? | 18:09 |
jeblair | mordred: sure | 18:11 |
jeblair | pabelanger: while that's going on, i want to try to strace an ansible-playbook process | 18:15 |
jeblair | pabelanger: do you have any idea what the file/stat loop is? | 18:16 |
jeblair | 2017-06-15 18:16:58,929 DEBUG zuul.AnsibleJob: [build: fa29ae54fa2a40a4bc3ad3082ad3bb2c] Ansible output: b'Using module file /usr/local/lib/python3.5/dist-packages/ansible/modules/files/stat.py' | 18:17 |
jeblair | 2017-06-15 18:16:59,210 DEBUG zuul.AnsibleJob: [build: fa29ae54fa2a40a4bc3ad3082ad3bb2c] Ansible output: b'Using module file /usr/local/lib/python3.5/dist-packages/ansible/modules/files/file.py' | 18:17 |
pabelanger | jeblair: no, I haven't debugged that yet. I did notice it | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Make logging helper method in zuul_stream https://review.openstack.org/472963 | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Special case shell logging on localhost https://review.openstack.org/474216 | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Change log streaming link to finger protocol https://review.openstack.org/437764 | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Extract get_playhosts listing in zuul_stream to a method https://review.openstack.org/472964 | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle lists of streamers https://review.openstack.org/474230 | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Direct streaming at delegated_to target https://review.openstack.org/474215 | 18:17 |
jeblair | it's real slow | 18:17 |
mordred | jeblair: ok. there's the stack fixed and restacked - I checked it locally against a node running a zuul_console and it did what I expected | 18:18 |
jeblair | i have a hunch... working on it | 18:19 |
jeblair | i think it's bubblewrap | 18:20 |
jeblair | ansible exits as expected, but it takes bubblewrap 1 minute to then exit itself | 18:21 |
mordred | oh - interesting. I'm not running anything with bubblewrap in my local tests | 18:21 |
jeblair | when we're between the b'' and 'Ansible output terminated' log lines, there is a bwrap process, but no ansible-playbook process | 18:21 |
jeblair | wait4(-1, | 18:21 |
jeblair | [{WIFEXITED(s) && WEXITSTATUS(s) == 255}], 0, NULL) = 28 | 18:21 |
jeblair | wait4(-1, 0x7ffdb3e6cc9c, 0, NULL) = -1 ECHILD (No child processes) | 18:21 |
jeblair | exit_group(2) = ? | 18:21 |
jeblair | +++ exited with 2 +++ | 18:21 |
jeblair | strace only reports that ^ | 18:22 |
pabelanger | cool, now we know what is happening | 18:23 |
pabelanger | is it possible we could terminate bubblewrap ourselves from the executor? | 18:27 |
jeblair | pabelanger: i doubt we would get the correct exit code, and it could even be that bwrap *is* terminating already | 18:35 |
mordred | maybe it's taking time removing things? | 18:38 |
pabelanger | that likely explains why I also see | 18:40 |
pabelanger | zuul 12161 0.0 0.0 0 0 ? Zs 18:39 0:00 [bwrap] <defunct> | 18:40 |
pabelanger | ansible-playbook terminated, and bwrap waiting to close | 18:40 |
jeblair | mordred: seems to be stuck on a wait -- there was only one bwrap process, and i followed threads in strace. so i think that was the only thing it was doing. | 18:41 |
mordred | jeblair: so might just be a bug in bwrap? | 18:44 |
mordred | https://github.com/projectatomic/bubblewrap/blob/a4709b6547caf438e41cb478b0b9faded7e4b941/bubblewrap.c#L384 | 18:45 |
jeblair | mordred: i think there are 3 wait/waitpids in bwrap; i haven't quite wrapped my head around what they do | 18:45 |
jeblair | oh that helps | 18:45 |
jeblair | mordred: oh | 18:46 |
jeblair | i should look for the whole process group | 18:46 |
jeblair | there could be something other than just bwrap or ansible-playbook... | 18:46 |
jeblair | ugh all the jobs are in the long boring file/stat loop | 18:48 |
jeblair | wow | 18:48 |
jeblair | that's running gcc every time | 18:48 |
jeblair | (i don't know if it's file.py or stat.py) | 18:48 |
mordred | jeblair: really? something is running gcc at runtime every time? | 18:49 |
jeblair | mordred: every second | 18:49 |
jeblair | bwrap───ansible-playboo─┬─ansible-playboo───sh───sh───python3───python3───sh───gcc───collect2───ld └─{ansible-playboo} | 18:49 |
jeblair | that did not work, sorry. | 18:49 |
Shrews | heh, i thought weechat just threw up | 18:50 |
jeblair | Shrews: i threw up into the channel | 18:50 |
* jeblair mops up ascii characters | 18:50 | |
jeblair | okay that's running /tmp/a3d7e5eac7b84be59ac32d2f088fb39d/ansible/post_playbook_2/git.openstack.org/openstack-infra/openstack-zuul-jobs/playbooks/base/post.yaml | 18:51 |
jeblair | maybe it's terrible because /opt/zuul-logs/ doesn't exist? | 18:51 |
pabelanger | Oh | 18:51 |
pabelanger | yes, that is likely the case | 18:51 |
jeblair | pabelanger: maybe we should just empty the task list for the base post playbook right now? | 18:51 |
SpamapS | gcc at runtime smells like ffi? | 18:51 |
pabelanger | jeblair: ++ | 18:51 |
jeblair | SpamapS: yeah | 18:51 |
pabelanger | I will do that now | 18:52 |
jeblair | pabelanger: thx | 18:52 |
jeblair | pabelanger: i'm considering mkdiring that directory for the moment, but then rm-rfing it when your patch lands | 18:53 |
mordred | pabelanger, jeblair: which patch - should I go review real quick? | 18:54 |
jeblair | mordred: pabelanger is writing it now | 18:54 |
pabelanger | jeblair: WFM | 18:54 |
pabelanger | I had to manually create it on zuulv3-dev.o.o a while back too | 18:55 |
jeblair | mkdir didn't help the copy already in progress :( | 18:55 |
SpamapS | so catching up... there's a thought that bubblewrap's pid piece is waiting for a minute after the last process exits? | 18:56 |
jeblair | SpamapS: yep -- or, at least, one minute after ansible-playbook exits. i have not confirmed that's the last process | 18:57 |
SpamapS | yeah its entirely possible there are some other processes spawned out by ansible that need cleaning up | 18:57 |
jeblair | okay, after all that, there was no 1 minute delay on that playbook | 18:57 |
jeblair | so at least for the base/post playbook, we don't see the behavior | 18:58 |
jeblair | i'll need to retrigger something to try to catch it | 18:58 |
jeblair | pabelanger: oh, /opt/zuul-logs isn't in bwrap, that's why the mkdir didn't help | 18:59 |
pabelanger | Aha | 18:59 |
jeblair | pabelanger: anyway, new jobs should get your change | 18:59 |
pabelanger | ++ | 18:59 |
jeblair | bwrap───ssh | 19:00 |
jeblair | aha! | 19:00 |
jeblair | it's the ssh connection cache thingy | 19:00 |
jeblair | er... controlmaster? | 19:00 |
SpamapS | ah and it times out at 1 minute | 19:01 |
jeblair | yep | 19:01 |
pabelanger | right, that now makes sense, it is 60s | 19:01 |
pabelanger | ControlPersist=60s is what I am thinking of | 19:02 |
jeblair | yep | 19:02 |
jeblair | this is interesting... i guess with bwrap we won't be able to utilize that across playbook runs... | 19:02 |
SpamapS | if we execute 'ssh -O cancel' I think that might do it. | 19:02 |
SpamapS | jeblair: it's still useful for each play sharing SSH conns inside a single playbook. | 19:03 |
SpamapS | so not a total sadmaker | 19:03 |
SpamapS | also we could get fancy and pass the socket in | 19:03 |
jeblair | SpamapS: yep. best we can do right now i think, at least without a lot of creative thought | 19:03 |
jeblair | oh look, creative thought | 19:03 |
pabelanger | SpamapS: ya, was just searching for a socket thing | 19:03 |
SpamapS | It would work just like ssh-agent | 19:04 |
jeblair | is it just one process that handles all connections to all hosts? | 19:04 |
mordred | how do controlpersist and agent work with each other? | 19:04 |
pabelanger | Ya, looks like control_path is what we'd have to bindmount | 19:04 |
SpamapS | mordred: controlpersist just works like a multiplexer for the ssh socket. | 19:05 |
SpamapS | so agent doesn't interact with controlpersist | 19:05 |
SpamapS | oh hm | 19:05 |
SpamapS | OH | 19:05 |
SpamapS | we could replace copying the agent in with already-established SSH conns on a controlpersist master | 19:06 |
SpamapS | would reduce the attack vector a bit | 19:06 |
jeblair | SpamapS: if a non-ssh play took longer than 60s, the controlpersist master will die, and the next ssh task ansible performs will start a new one, right? | 19:07 |
SpamapS | since they couldn't establish new conns | 19:07 |
SpamapS | jeblair: you can run a persistent master. | 19:07 |
SpamapS | that just sticks around until you tell it to go away | 19:07 |
jeblair | oh, neat | 19:07 |
SpamapS | (assuming your keepalives work so you don't lose the transport) | 19:07 |
jeblair | so basically *just like* the agent model. which is what you said. but now with more emphasis. :) | 19:07 |
SpamapS | Yeah you just set ControlPersist to 0 to make it live indefinitely. | 19:08 |
mordred | the downside of that is that we'd have to know which connections to start and when | 19:08 |
SpamapS | mordred: you'd have to do a ping all I think | 19:08 |
SpamapS | which isn't a bad idea anyway | 19:08 |
mordred | SpamapS: nod. so write a playbook that the executor runs that does hosts: all ping essentially, but with it set to control-persist 0 | 19:09 |
mordred | then pass in the created socket to the real playbook invocations | 19:09 |
mordred | then destroy the thing when we're done with the job | 19:09 |
mordred | (saying that to make sure I grok) | 19:09 |
SpamapS | Basically just set ControlPersist to 0, run ansible -i ourinventory -m ping all in trusted context, then after the job 'ssh -O exit' | 19:10 |
jeblair | SpamapS: there's still a middle ground, right? run both agent and master before starting. ansible can still create new connections, but we solve the caching and exit problem. | 19:10 |
SpamapS | I actually really like this and it might even prove simpler than managing agents. | 19:10 |
pabelanger | If we moved ControlPath outside bwrap and passed it in, doesn't that solve the cache issue? But we still have the ControlPersist=60s issue? | 19:14 |
pabelanger | also, I would think ansible-playbook might want to run ssh -O cancel directly, to just avoid this issue on shutdown too | 19:14 |
jeblair | pabelanger: bwrap is still running because the ssh process is running. so to solve the exit issue, you need to start the control master before running bwrap. | 19:14 |
pabelanger | jeblair: Ah, I see now | 19:15 |
pabelanger | thank you | 19:15 |
jeblair | so i think the two options on the table are: | 19:15 |
pabelanger | okay, ya. So setting up control master, with ssh-agent seems to make sense then | 19:15 |
jeblair | A) start agent and controlmaster before first ansible invocation. kill both after last ansible invocation. all connections will actually be made within the playbooks, and new connections can be made. the agent can be accessed by the playbooks and manipulated. | 19:16 |
jeblair | B) start controlmaster and then ping all hosts before first ansible-playbook invocation. kill controlmaster after last ansible invocation. all connections will be made before playbooks run, no new connection can be made. no agent is required. | 19:17 |
mordred | jeblair: I think I like B better - less moving parts - and if we can't make a connection to a remote host it happens in a place where we're totally in control of that and will know exactly what is happening | 19:18 |
mordred | jeblair: that said - if someone wants to reboot a host in the middle of a playbook, this would prevent that | 19:19 |
mordred | because the connection would go away and the playbook would not be able to re-connect | 19:19 |
pabelanger | interesting, ya. I'd love for us to be able to support that | 19:19 |
jeblair | mordred: yeah. B has a lot going for it (it removes the need for the agent, *and* the whole ssh-key-removal pre-playbook dance SpamapS is working on). but it does have that problem. | 19:20 |
mordred | that's potentially a valid thing to want to do for testing kernels and nothing else in the system prevents that | 19:20 |
jeblair | mordred: or really sophisticated deployment testing? | 19:20 |
mordred | yah | 19:21 |
jeblair | like the sort of hypothetical "run ursula" job | 19:21 |
pabelanger | ++ In the case of losing reboots, I think A gets my vote | 19:21 |
mordred | exactly. although that one, I believe, needs to run ursula itself on a node pointed at the other node | 19:21 |
mordred | since it needs to do things we don't allow | 19:22 |
jeblair | right. "trusted run usula" :) | 19:22 |
jeblair | ursula even | 19:22 |
SpamapS | A+=1 | 19:22 |
SpamapS | less state, more resilience. | 19:23 |
jeblair | okay. the A's have it. but we'll all remember B and how it's actually so much cooler than A. | 19:23 |
pabelanger | :) | 19:24 |
jeblair | i'm going to grab some lunch; pabelanger i'll look at the venv thing with you when i get back | 19:24 |
SpamapS | I liked B before it was cool. | 19:25 |
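A minimal sketch of option A's ssh handling from the executor side; the host, ControlPath, and sequencing are assumptions about how this could be wired up, not the implemented design:

    import subprocess

    host = 'node.example.com'  # hypothetical build node
    opts = ['-o', 'ControlMaster=auto',
            '-o', 'ControlPersist=0',  # 0 == keep the master alive indefinitely
            '-o', 'ControlPath=/var/lib/zuul/cm/%C']  # a path outside bwrap

    # Start the persistent master before the first ansible-playbook run
    # (-N: no remote command, -f: go to background after authentication).
    subprocess.check_call(['ssh', '-N', '-f'] + opts + [host])

    # ... run the playbooks, bind-mounting the ControlPath directory into
    # bwrap so every ssh connection multiplexes over the cached master ...

    # After the last playbook, tell the master to exit so nothing lingers
    # past the job (avoiding the 60s ControlPersist wait seen above).
    subprocess.check_call(['ssh', '-O', 'exit'] + opts + [host])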
*** jkilpatr has quit IRC | 20:18 | |
jeblair | pabelanger: what job was your venv error from? | 20:53 |
clarkb | jeblair: openstack ansible jobs pep8 iirc | 20:53 |
jeblair | er, should have asked which build :) | 20:53 |
jeblair | or does it consistently happen on all builds of that job? | 20:54 |
pabelanger | jeblair: I believe 473845 was one that showed the issue | 20:55 |
pabelanger | so far I have seen it on pep8, and docs jobs | 20:55 |
pabelanger | now, it is completely possible it is limited to my patchset, that's what I've been trying to determine | 20:56 |
jeblair | pabelanger: do you have the whole log from that build? | 20:57 |
pabelanger | I can manually grab it now | 20:57 |
jeblair | pabelanger: from where? | 20:58 |
pabelanger | jeblair: /tmp/paul.txt on ze01.o.o | 20:58 |
jeblair | pabelanger: ok i'll look at it there | 20:58 |
pabelanger | I manually copied it from jobdir | 20:58 |
*** jkilpatr has joined #zuul | 20:59 | |
pabelanger | jeblair: I have to step away for a bit, for family activity. Will be able to loop back in a few hours | 20:59 |
mordred | SpamapS: if you have a second: https://review.openstack.org/#/c/474230 and the 5 before it are ready to go in | 21:09 |
jeblair | clarkb, mordred: okay, i'm on a node where it failed... there is, indeed, a tox pep8 venv. | 21:27 |
jeblair | when i try to run 'tox' on my own, in various forms, i can't reproduce the error | 21:27 |
clarkb | jeblair: check the git reflog to see if host was reused maybe? | 21:27 |
clarkb | (should have checkouts of earlier change ?) | 21:28 |
jeblair | clarkb: i checked that; syslog looks new, and the last time zl01 used the host was a couple hours ago. uptime is ~20 mins | 21:29 |
jeblair | but i'm surprised that i can't reproduce the error | 21:32 |
jeblair | run-tox.sh works whether .tox/pep8 exists or not. | 21:32 |
clarkb | ya tox seems to be deciding that it needs to execute virtualenv even though the venv exists already | 21:32 |
clarkb | and that is what fails. I bet if you manually tried to run virtualenv there it would fail too | 21:33 |
jeblair | clarkb: nope that works too | 21:33 |
jeblair | /usr/bin/python2 -m virtualenv --python /usr/bin/python2 pep8 | 21:33 |
jeblair | succeeds | 21:33 |
jeblair | (that's the command from the log) | 21:33 |
clarkb | that is in .tox/ ? | 21:33 |
jeblair | 2017-06-15 21:09:31.343068 | ubuntu-xenial | ERROR: InvocationError: /usr/bin/python2 -m virtualenv --python /usr/bin/python2 pep8 (see /home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep8/log/pep8-0.log) | 21:33 |
jeblair | clarkb: yep. still succeeds. | 21:34 |
jeblair | the log file in that line, however, does not exist. | 21:34 |
mordred | jeblair: something something bubblewrap? | 21:35 |
jeblair | mordred: i hope not, we're actually on the remote node now. but maybe something something ansible? | 21:36 |
mordred | virtualenv uses python egg-link files right? | 21:36 |
mordred | oh- duh. remote node | 21:36 |
jeblair | mordred: i'm going to try being zuul and running ansible-playbook from zl01 | 21:36 |
mordred | jeblair: ++ | 21:36 |
jeblair | oops, we missed adding some env vars to our debug lines | 21:37 |
jeblair | also, ssh-agent makes this hard | 21:37 |
clarkb | huh, ya, confirmed outside of tox and all that: virtualenv doesn't complain if you re-create the venv | 21:38 |
mordred | jeblair: fwiw,when I run ansible-playbook by hand in the context of our zuul stuff, I do: | 21:38 |
mordred | jeblair: ZUUL_JOB_OUTPUT_FILE=$(pwd)/log.out ansible-playbook | 21:38 |
jeblair | clarkb: the traceback almost looks like it's a file/dir mixup (like it's mkdiring over a file maybe?) | 21:38 |
jeblair | mordred: ya, got past that, just need to figure out how to ssh now | 21:39 |
mordred | jeblair: but - this is making me think we should make a utility like "zuul-playbook" that we can use to run ansible-playbook in the foreground like zuul runs it | 21:39 |
mordred | jeblair: ++ | 21:39 |
mordred | (just as a zuul-admin debug tool) | 21:39 |
jeblair | 2017-06-15 21:07:05,545 DEBUG zuul.AnsibleJob: [build: 783818dc9fcb4ada9724bf32c6cc5e5b] Ansible command: ANSIBLE_CONFIG=/tmp/783818dc9fcb4ada9724bf32c6cc5e5b/ansible/untrusted.cfg ansible-playbook -vvv /tmp/783818dc9fcb4ada9724bf32c6cc5e5b/work/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/playbooks/base/pre.yaml | 21:40 |
jeblair | that used to work as a copy paste. we can add the ZUUL_JOB_OUTPUT_FILE var to it, but we'll need to solve the ssh-agent thing. | 21:40 |
mordred | ++ | 21:40 |
jeblair | mordred: that may be the thing that pushes us to needing zuul-playbook | 21:40 |
mordred | jeblair: --private-key= | 21:41 |
mordred | jeblair: argument to ansible-playbook | 21:41 |
jeblair | mordred: oh neat | 21:41 |
jeblair | just in time for the clock to run out (i put a sleep 40 mins in a change since we don't have node holding yet) | 21:42 |
jeblair | will try again | 21:42 |
mordred | jeblair: fwiw, I'm running a tox -epep8 via ansible with our libs/ansible plugins loaded against a node I grabbed a few days ago from regular nodepool | 21:45 |
mordred | jeblair: and it worked | 21:45 |
*** yolanda has quit IRC | 21:45 | |
jeblair | okay, i'm on a node before it's gotten around to running tox | 21:48 |
jeblair | and can confirm that there is no .tox dir at the moment | 21:48 |
jeblair | and now there is | 21:49 |
jeblair | 2017-06-15 21:48:14.564793 | ubuntu-xenial | py.error.ENOENT: [No such file or directory]: open('/home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep8/log/pep8-0.log', 'r') | 21:49 |
jeblair | okay, that's a slightly different error | 21:49 |
jeblair | clarkb, mordred: i am able to reproduce when running via ansible | 21:54 |
mordred | jeblair: and it's base/pre.yaml that's failing? | 21:54 |
jeblair | mordred: no, tox/linters.yaml | 21:54 |
jeblair | when i run it with no existing .tox dir, i get the first error (the one about logs). if i run it after that, i get the other error about the python binary. | 21:55 |
jeblair | /tmp/jeblair1 and /tmp/jeblair2 are the zuul job log output files for both of those cases | 21:55 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Make logging helper method in zuul_stream https://review.openstack.org/472963 | 21:56 |
jeblair | also http://paste.openstack.org/show/612753/ is the console output for one of those | 21:56 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Extract get_playhosts listing in zuul_stream to a method https://review.openstack.org/472964 | 21:57 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Change log streaming link to finger protocol https://review.openstack.org/437764 | 21:57 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Direct streaming at delegated_to target https://review.openstack.org/474215 | 21:58 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Special case shell logging on localhost https://review.openstack.org/474216 | 21:58 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Handle lists of streamers https://review.openstack.org/474230 | 21:58 |
mordred | jeblair: ok - as another data point - if I run that playbook with the role from openstack-zuul-jobs with ansible with our plugins but on the host I just happen to have - it works | 22:01 |
mordred | jeblair: so I have not managed to reproduce it using ansible from my laptop | 22:01 |
jeblair | i made a shell script on the node that set the same env variables that were printed in the log | 22:15 |
jeblair | i ran it with "env -i". and it works | 22:15 |
jeblair | but ansible-playbook from zl01 continues to fail | 22:15 |
jeblair | ZUUL_JOB_OUTPUT_FILE=/tmp/jeblair3 ANSIBLE_CONFIG=/tmp/fc8b064159e742fd8d7ecfe216335fb4/ansible/untrusted.cfg ansible-playbook -vvv /tmp/fc8b064159e742fd8d7ecfe216335fb4/work/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/playbooks/tox/linters.yaml --private-key=/var/lib/zuul/ssh/nodepool_id_rsa | 22:15 |
jeblair | is the command i'm using | 22:16 |
jeblair | when i run that test script using ansible-playbook on zl01, it fails | 22:17 |
jeblair | and there goes the host | 22:20 |
mordred | blast | 22:20 |
jeblair | getting a new one | 22:20 |
clarkb | virtualenv uses the python you are running by default, which may be affected by ansible, but tox passes an explicit -p to select the python | 22:22 |
*** jkilpatr has quit IRC | 22:24 | |
mordred | clarkb: it seems to be the right python in both cases | 22:31 |
mordred | clarkb: /home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox$ /usr/bin/python2 -m virtualenv --python /usr/bin/python2 pep8 >/home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep8/log/pep8-0.log | 22:32 |
clarkb | ya, it passes -p, or really --python, which is as explicit as you can get, so the calling python shouldn't matter at all | 22:32 |
jeblair | by commenting out a lot of stuff in run-tox.sh and removing the .tox dir, i got it to succeed. | 22:41 |
jeblair | working out what's relevant now | 22:41 |
mordred | jeblair: woot | 22:43 |
jeblair | nope. this is not consistent. | 22:44 |
jeblair | it sometimes succeeds and sometimes fails. | 22:44 |
jeblair | i'm out of ideas | 22:45 |
mordred | jeblair: which things did you comment out? | 22:45 |
mordred | jeblair: were any of them the env vars it sets? | 22:45 |
jeblair | mordred: yes | 22:45 |
jeblair | mordred: i don't trust that experiment | 22:45 |
mordred | me either | 22:45 |
jeblair | mordred: maybe commenting some of them out makes it sometimes succeed instead of always fail | 22:46 |
jeblair | but... ?? | 22:46 |
pabelanger | and back, just catching up | 22:46 |
mordred | yah - I'd like to understand what's different in the env | 22:46 |
mordred | especially since I can run it successfully from my laptop using the role and playbook | 22:46 |
mordred | so there is $something being changed in between those two | 22:47 |
* jlk back from a long day | 22:47 | |
jlk | ... to start working | 22:47 |
mordred | jeblair: are you working in /tmp/d3ed28ec9f954f56b897a52ecefa62ea/ansible ? | 22:48 |
jeblair | mordred: yes | 22:48 |
jeblair | 2017-06-15 22:51:58.461919 | ubuntu-xenial | py.error.EACCES: [Permission denied]: open('/home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep8/log/pep8-1.log', 'w') | 22:52 |
mordred | jeblair: I'm running a few times | 22:52 |
jeblair | what is | 22:52 |
jeblair | oh | 22:52 |
jeblair | that explains why every time i run i get a different error | 22:52 |
mordred | no - this is the first time I ran - sorry - thought you'd stopped | 22:52 |
* mordred no longer touching anything | 22:52 | |
jeblair | 2017-06-15 22:54:14.881247 | ubuntu-xenial | shutil.Error: `/usr/lib/python2.7/copy_reg.py` and `/home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep8/lib/python2.7/copy_reg.py` are the same file | 22:54 |
jeblair | now a success | 22:54 |
mordred | wtf | 22:54 |
jeblair | another success | 22:55 |
pabelanger | ya, that's been my last 24 hours. pulling hair out at randomness | 22:55 |
jeblair | mordred: all yours :) | 22:55 |
mordred | 2017-06-15 22:56:00.397324 | mordred-irc.inaugust.com | /usr/local/jenkins/slave_scripts/run-tox.sh: .tox/pep8/bin/pip: /home/zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/pep: bad interpreter: No such file or directory | 22:56 |
jeblair | mordred: note, i have edited run-tox.sh. i only commented things out. feel free to edit. | 22:56 |
jeblair | mordred, pabelanger: i have stopped nodepool-launcher on nl01. | 22:58 |
jeblair | that should prevent this host from being deleted. | 22:58 |
pabelanger | k | 22:58 |
mordred | jeblair: so - I'm currently removing the .tox dir between each run | 22:59 |
jeblair | oh wow, i did that just in time. the job completed. i lost the tmpdir on ze01, but the host is still there. | 22:59 |
mordred | woot | 23:00 |
jeblair | mordred: yeah, i did that too | 23:00 |
mordred | jeblair: I have not yet gotten it to fail - so I'm uncommenting the things and trying again | 23:02 |
jeblair | mordred: i'll see if i can convert another build directory on ze01 to work with this host | 23:02 |
mordred | jeblair: ok. I can run it over and over again and it works - so something is different from my execution and the build dir on ze01 | 23:04 |
jeblair | mordred: ok, can you pause for a min, and lemme see if i have this reconstructed? | 23:04 |
mordred | yup. | 23:04 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Implement pipeline reject filter for github https://review.openstack.org/474001 | 23:08 |
jeblair | ZUUL_JOB_OUTPUT_FILE=/tmp/jeblair3 ANSIBLE_CONFIG=/tmp/39321fb5d76f4a54a37de4f3beb384fa/ansible/untrusted.cfg ansible-playbook -vvv /tmp/39321fb5d76f4a54a37de4f3beb384fa/ansible/playbook_0/git.openstack.org/openstack-infra/openstack-zuul-jobs/playbooks/tox/linters.yaml --private-key=/var/lib/zuul/ssh/nodepool_id_rsa | 23:09 |
jeblair | mordred: okay, that just hit the error | 23:09 |
jeblair | mordred: i ran this on the host before running that: | 23:09 |
jeblair | rm -fr ~zuul/.cache/ ~zuul/src/git.openstack.org/openstack-infra/openstack-zuul-jobs/.tox/ | 23:09 |
mordred | jeblair: cool | 23:09 |
jeblair | mordred: do you want to try running that? | 23:11 |
mordred | yes - copying a few things real quick ... | 23:11 |
mordred | jeblair: ok. I'm running it again - the first time worked | 23:18 |
mordred | second time worked | 23:18 |
mordred | ran it again without cleaning and it worked | 23:18 |
mordred | cleaned the dir - running again | 23:19 |
mordred | k. it keeps working for me | 23:20 |
mordred | let's figure out what's different between me and ze01 | 23:20 |
SpamapS | on the surface.. ze01 is a VM and you are a human? | 23:21 |
* SpamapS will stop snarking and try to be helpful | 23:21 | |
SpamapS | is there something I can do to be a third pair of eyes? | 23:21 |
jeblair | mordred: have you run it on ze01 yet? | 23:21 |
jeblair | mordred: i'm guessing your most recent series was copy to your workstation and run from there? | 23:22 |
mordred | jeblair: yes, that is correct - so all the files from the work dir are what I'm working from | 23:22 |
mordred | I also just updated my local zuul copy to be the same as the copy of zuul on ze01 and it still works | 23:23 |
mordred | jeblair: oh wait - it just failed | 23:24 |
jeblair | yay? | 23:26 |
SpamapS | success fail! | 23:26 |
mordred | ok. SO - one of the differences ... | 23:27 |
jeblair | SpamapS: i feel like i'm successfail in all my endeavors! | 23:27 |
mordred | are 429428c0200becf7ac59d83ad1a61fb37171de5c and 1dbf5f96b5928695d65587cccc7fe56630f6114a | 23:27 |
mordred | which are updates to the command module from latest ansible added to try to fix python3 related issues | 23:27 |
jeblair | mordred: interesting... | 23:28 |
mordred | I have reverted those two locally - trying again | 23:28 |
jeblair | mordred: those should be easy to revert on their own, right? | 23:28 |
jeblair | ++ | 23:28 |
mordred | bingo | 23:29 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Revert sync from latest command from ansible https://review.openstack.org/474808 | 23:30 |
mordred | jeblair: now - the purpose of those is that running our command module with ansible with python3 was borking for jamie | 23:31 |
jlk | bitten by ansible updates again? | 23:31 |
jeblair | mordred: but not us? | 23:31 |
mordred | jeblair: we weren't running jobs when it bit him | 23:31 |
pabelanger | I don't think we are using command much right now | 23:31 |
pabelanger | just shell | 23:32 |
mordred | we use it a ton | 23:32 |
pabelanger | or, is that the same | 23:32 |
mordred | it's the same thing | 23:32 |
mordred | yup | 23:32 |
pabelanger | ah | 23:32 |
jeblair | mordred: but was that also the same problem as delegate_to? | 23:32 |
jeblair | let me rephrase | 23:32 |
jeblair | were the python3 ansible issues also related to delegate_to tasks? | 23:32 |
mordred | yes | 23:32 |
pabelanger | mordred: thank you, I was going crazy | 23:33 |
jeblair | ok, so that's another reason we may not see those issues immediately | 23:33 |
mordred | I think we should consider, at least until we've got this particular topic sorted, running ansible via python2 and setting ansible_python_interpreter in our inventory files to python2 as well | 23:33 |
jeblair | mordred: ansible_python_interpreter sounds straightforward... running ansible-playbook via python2 is, perhaps, less so? | 23:35 |
jeblair | mordred: since we're listing ansible as a dependency of zuul; we'd have to switch to having it externally installed, and then we lose control over versioning | 23:35 |
mordred | yes - I really just mean for a few days, not as a general approach | 23:36 |
jeblair | oh ok | 23:36 |
jeblair | well, probably a week or two at this point :| | 23:36 |
pabelanger | I'm okay for python2 and ansible | 23:36 |
mordred | so that we can unstick folks from working on jobs and figure out the py3 issues in isolation | 23:36 |
mordred | jeblair: yah | 23:36 |
jeblair | mordred: okay, so what's the proposal on the table? | 23:36 |
mordred | jeblair: we need to do a pip install of ansible on ze01 with python2's pip so that "ansible-playbook" gets us a python2 version | 23:37 |
mordred | jeblair: then we can also add ansible_python_version to the inventory we write out | 23:37 |
mordred | or, ansible_python_interpreter rather | 23:37 |
jeblair | mordred: okay... i'm probably going to want to blow away ze01 when we're ready to revert this | 23:38 |
mordred | ++ | 23:38 |
jeblair | i don't trust that "uninstall something we manually installed with pip2" is going to leave us in a fresh state :) | 23:39 |
jeblair | mordred: how do we use the py2 version of ansible-playbook? | 23:39 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Use python2 for ansible on remote hosts https://review.openstack.org/474810 | 23:39 |
jeblair | (considering that the py3 one will still be installed) | 23:40 |
clarkb | fwiw pip has gotten much better at uninstalling stuff | 23:40 |
clarkb | and worst case you just delete everything under the python2 site packages in /usr/local | 23:40 |
mordred | the latest pip install will overwrite the /usr/local/bin/ansible-playbook link | 23:40 |
pabelanger | what about installing ansible into a virtualenv, then setting up a symlink? | 23:40 |
mordred | jeblair: you can verify after installing by doing ansible-playbook --version and it'll also print the python version | 23:40 |
jeblair | mordred: i guess that's okay until there's another ansible release and pip installs it for zuul again; that's probably not going to happen this or next week i guess. :) | 23:41 |
mordred | jeblair: yah. this is definitely not the state we want long-term | 23:42 |
mordred | jeblair: I mean - I'm going to wake up in the morning and try to figure out why the module changes break things | 23:42 |
jeblair | mordred: does delegate_to not honor python_interpereter_whatever? | 23:42 |
*** jkilpatr has joined #zuul | 23:42 | |
jeblair | ansible_python_interpreter. that's it. | 23:42 |
jeblair | mordred: i guess i'm wondering if just setting *that*, but still running py3 ansible-playbook would be okay... | 23:43 |
mordred | jeblair: it should - I honestly don't know why delegate_to was finding python3 instead of python2 | 23:43 |
mordred | I'll _also_ look in to that | 23:43 |
mordred | jeblair: we can certainly try that | 23:43 |
jeblair | mordred: will save a bunch of time, if it works. | 23:43 |
mordred | jeblair: we could pull those revert patches into ze01, then add ansible_python_interpreter to the inventory and run the ze01 test again | 23:44 |
mordred | actually - lemme see if I can test python3 real quick | 23:44 |
pabelanger | if we are concerned about changing ansible versions on ze01.o.o over time, we could have bwrap launch from a tarball. I think we talked about that in the past. Obviously, that increases the burden on the operator to get said tarball onto ze01.o.o | 23:44 |
mordred | jeblair: k. that seems to work | 23:44 |
jeblair | pabelanger: i'm not at all concerned with that under normal circumstances. i don't want to deal with that now. | 23:45 |
mordred | yup | 23:45 |
mordred | jeblair: I believe I have verified that reverting those patches should unstick us without any additional python interpreter stuffs | 23:45 |
mordred | jeblair: this may re-break jamie and delegate_to | 23:45 |
mordred | but I've got a constrained surface area to investigate tomorrow | 23:46 |
jeblair | mordred: right, but the ansible_python_interpreter may fix jamie ? | 23:46 |
mordred | oh - I think the problem there might be that localhost connection=local may use the action plugin's in-process python | 23:46 |
jeblair | mordred: or did jamie try ansible_python_interpreter and it still didn't work? | 23:46 |
mordred | I don't know | 23:46 |
jeblair | ok. | 23:47 |
mordred | but that's a problem we can figure out | 23:47 |
jeblair | jamielennox: ^ we said your name a lot :) | 23:47 |
* mordred needs to eod - but promises to pick up helping solve the issue with the hting we just reverted in the morning | 23:48 | |
clarkb | mordred: I want to say connection=local forks a new 'python' and doesn't use the existing process | 23:48 |
jeblair | mordred: good night! | 23:48 |
clarkb | mordred: this is why virtualenvs don't work as people expect with connection=local | 23:48 |
mordred | clarkb: k. in that case, ansible_python_interpreter MAY still help that | 23:48 |
clarkb | yes I think ansible_python_interpreter is how you work around the virtuaelnv issue, you point it at the python in the venv and then its happy | 23:48 |
SpamapS | That's actually quite nice as you may intentionally want to use the same python you use for your remote nodes for local connections, but you may want a specific python for ansible that is different. | 23:52 |
SpamapS | Not how you'd expect, sure, because PATH is being ignored, but functional once revealed. | 23:52 |
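A minimal sketch of what pinning the interpreter in an inventory could look like; the hostnames, paths, and playbook name are hypothetical, and this illustrates the workaround idea rather than zuul's actual inventory writer:

    # Pin the module interpreter to python2 for remote nodes and for
    # connection=local tasks alike.
    inventory = (
        'node1.example.com ansible_python_interpreter=/usr/bin/python2\n'
        'localhost ansible_connection=local '
        'ansible_python_interpreter=/usr/bin/python2\n'
    )
    with open('/tmp/inventory.ini', 'w') as f:
        f.write(inventory)
    # ansible-playbook -i /tmp/inventory.ini site.yaml would then run all
    # modules, including delegate_to: localhost tasks, under python2.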
jamielennox | jeblair: oh? i flicked through the history and didn't see highlights | 23:54 |
jamielennox | there's a lot of overnight in the zuul channels at the moment | 23:54 |
jamielennox | jeblair: so the patch we've merged fixes the localhost issue afaik | 23:55 |
jamielennox | i have other localhost issues, primarily at the moment that i don't get any output from a task on localhost, and so i have a failing task and cannot figure out why or what it's doing | 23:55 |
jamielennox | but from what i can see when you connection=local you are still in the same venv | 23:56 |
jamielennox | (though realistically i am running command: with localhost so i don't actually notice) | 23:56 |