jeblair | jlk: i think maybe half of what that test is doing is still relevant -- testing that if a new repo shows up after zuul starts, it doesn't bork. however, we certainly can't have a configuration for that repo until after it exists now. as pabelanger found out earlier today. | 00:00 |
---|---|---|
jlk | seems kind of chicken-egg | 00:00 |
jlk | you can't use a repo unless it's defined in a tenant config | 00:00 |
jeblair | indeed | 00:00 |
jlk | can't put a repo in a tenant config until it exists | 00:01 |
jeblair | yeah, so our repo creation in infra will need to be at least two phases: create in gerrit, then, after existing, add to tenant config. | 00:02 |
jeblair | but aside from that, zuul can get events from that repo even if it's not in its config, and it could end up getting used as a foreign project. | 00:02 |
jeblair | we should probably drop that test, and add a test for foreign projects. i think that would cover what we need. | 00:03 |
jlk | alright | 00:04 |
jeblair | SpamapS, clarkb: i would rank ssh-agent above generating keys per node -- keys-per-node won't work well for a static host inventory in nodepool | 00:04 |
jeblair | pabelanger: ^ | 00:04 |
*** harlowja has quit IRC | 00:04 | |
jlk | "# TODOv3 (jeblair): re-add support for foreign projects if needed" -- hehe, found that. | 00:05 |
clarkb | jlk: that's a good point re static hosts | 00:05 |
jeblair | (also, just to make sure this is out there -- the custom ansible plugins should prevent a rogue check job from accessing the private key data. however, that's just a belt, not belt-and-suspenders, so i think pursuing suspenders here is worthwhile) | 00:05 |
clarkb | er jeblair ^ | 00:05 |
jeblair | been thinking about that a lot lately :) | 00:06 |
pabelanger | SpamapS: I was able to get ssh-agent working inside bubblewrap container on local fedora system using steps listed: https://review.openstack.org/#/c/453851/7/zuul/driver/bubblewrap/__init__.py | 00:19 |
pabelanger | SpamapS: I am going to dive more into it tomorrow | 00:19 |
pabelanger | I've been side tracked by flatpak tonight :) | 00:20 |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Add github reporter status_url config option https://review.openstack.org/449794 | 01:27 |
*** jamielennox is now known as jamielennox|away | 02:03 | |
*** jamielennox|away is now known as jamielennox | 02:17 | |
jamielennox | should it be possible to filter reporters based on source? | 02:37 |
jamielennox | this is mostly a conceptual thing i'm thinking of | 02:38 |
jamielennox | i'm hooking v3 up to both gerrit and github simultaneously, eventually it would be nice to do cross project dependencies and so you need triggers for both sources | 02:38 |
jamielennox | however i started by just reusing the pipeline in both projects | 02:39 |
jamielennox | but then on action i have both a gerrit and github reporter so when a github patch succeeds/fails it tries to report on gerrit as well | 02:44 |
jamielennox | obviously i can (and will for now) create separate pipelines - but is this the intention or should i be able to say only report to github if the source is github? | 02:45 |
jamielennox | current zuul.yaml: https://github.com/BonnyCI/project-config-v3/blob/d7830586dbc98b4c3ff10742b3d3862b6448d869/zuul.yaml | 02:47 |
jlk | jamielennox: that's a good question! | 03:30 |
jlk | I would think that a report should be tied to the source the job came through | 03:30 |
jlk | also, because that's how multiple connections to the same source should work too | 03:31 |
jlk | er multiple connections via the same driver | 03:31 |
jamielennox | jlk: so i'm wondering how complex this should be, but i'm at least thinking that on the pipeline you could say: | 03:32 |
jamielennox | reporters: {'github': {'source': 'github'}} | 03:32 |
jlk | well | 03:32 |
jamielennox | so a reporterfilter in the same way you do event filter | 03:32 |
jlk | we did work so that the trigger is source specific from the event | 03:32 |
jlk | I would think that same work should translate somehow to the reporter things | 03:33 |
jamielennox | maybe | 03:33 |
jlk | I have to go afk, kiddo bed time | 03:33 |
jamielennox | i mean an event is always specific to the source, it has to be | 03:33 |
jlk | I believe the defining bit is the canonical_hostname | 03:33 |
jlk | event hostname has to match the trigger source hostname | 03:33 |
jamielennox | a reporter could be like please record a log over here that is common to all reporters | 03:33 |
jlk | oooh, you're thinking of a global reporter and maybe connection specific reporters. | 03:34 |
jamielennox | so i would be ok saying like this reporter triggers for these sources, and this reporter triggers for all sources | 03:34 |
jlk | sounds like a larger discussion | 03:34 |
jamielennox | (just preferably not in every action) | 03:34 |
* jlk really afk | 03:34 | |
jamielennox | yea, i figured i'd put it out now and people can read and i'll chase it tomorrow | 03:35 |
jamielennox | i think things like github/gerrit reporters should be source specific by default, smtp, sql global by default and preferably overridable in config | 03:38 |
jamielennox | (can't think of a legitimate reason you'd override github/gerrit being non source specific) | 03:42 |
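A rough sketch of the filtering rule being proposed above: a source-specific reporter only fires when the item's change arrived over that reporter's connection, while "global" reporters (smtp, sql) always fire. All of the names below (`should_report`, `source_specific`, `item`, `canonical_hostname`) are illustrative assumptions, not Zuul's actual API or the patch Jamie pushed.

```python
def should_report(reporter, item):
    """Decide whether a reporter applies to this queue item (sketch).

    Source-specific reporters (gerrit, github) only report when the
    change arrived over the same connection; "global" reporters
    (smtp, sql) always report.  Every name here is illustrative.
    """
    if not getattr(reporter, 'source_specific', False):
        return True  # e.g. smtp or sql: report regardless of source
    change_host = item.change.project.canonical_hostname
    return reporter.connection.canonical_hostname == change_host
```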
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 03:57 |
jamielennox | not sure ^ is appropriate or not, going to try it out | 03:57 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 04:16 |
*** bhavik1 has joined #zuul | 05:37 | |
*** yolanda has quit IRC | 06:20 | |
jamielennox | jeblair: so with the change to canonical hostnames i'm missing a way in the job to determine the directory that the project code was cloned into | 06:20 |
jamielennox | for example the merger used to maintain BonnyCI/sandbox and now it maintains github.com/BonnyCI/sandbox | 06:21 |
jamielennox | however github.com there is a default, in my jobs etc i only say BonnyCI/sandbox | 06:22 |
jamielennox | i used to locate this directory like: "{{ workspace_root }}/src/{{ zuul.project }}" however now there's an additional github.com folder and no zuul variable to tell me what it should be | 06:24 |
jamielennox | there's a couple of ways i could see to fix this and just want to know your opinion - i think zuul.project should be the canonical name but there's a few ways to do that and don't want to go down the wrong rabbit hole | 06:25 |
*** hashar has joined #zuul | 07:28 | |
*** isaacb has joined #zuul | 08:48 | |
*** Cibo_ has joined #zuul | 09:01 | |
*** Cibo_ has quit IRC | 09:15 | |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp https://review.openstack.org/461331 | 09:30 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key https://review.openstack.org/461332 | 09:30 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp https://review.openstack.org/461331 | 10:04 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key https://review.openstack.org/461332 | 10:04 |
*** jkilpatr has quit IRC | 10:31 | |
mordred | jamielennox: that commit message ^^ made me twitch | 10:37 |
jamielennox | mordred: which? | 10:41 |
mordred | jamielennox: "Use tenant_name to look up project..." | 10:42 |
mordred | jamielennox: just incorrect brain context moment | 10:42 |
mordred | I was like "No - that's backwards!!! ... oh, wait, wrong project" | 10:42 |
jamielennox | oh, right - not allowed to call them tenants for keystone so i'm maintaining the barrier | 10:42 |
*** jkilpatr has joined #zuul | 10:49 | |
*** bhavik1 has quit IRC | 10:52 | |
*** hashar is now known as hasharLunch | 11:48 | |
*** isaacb has quit IRC | 12:01 | |
*** isaacb has joined #zuul | 12:03 | |
*** hasharLunch is now known as hashar | 12:19 | |
*** isaacb has quit IRC | 12:23 | |
pabelanger | morning! | 13:02 |
hashar | good morning | 13:04 |
*** dkranz_ has joined #zuul | 13:20 | |
*** isaacb has joined #zuul | 13:21 | |
SpamapS | pabelanger: so I had a thought about the SSH key in untrusted context conundrum. | 14:03 |
SpamapS | pabelanger: wondering if we can basically load the key into ssh-agent _inside_ the bwrap, and then rm the key, and kill the agent if ansible-playbook exits. | 14:06 |
jeblair | SpamapS: let bubblewrap handle the process mgmt? | 14:07 |
SpamapS | http://paste.openstack.org/show/608711/ | 14:09 |
SpamapS | jeblair: bwrap is pid 1... and will aggressively remove the entire pid space under it. But I mean in between bwrap and ansible-playbook. | 14:09 |
pabelanger | That would mean multiple ssh-agents running on the host, inside bubblewrap, over single ssh-agent outside. Trying to understand why that is better | 14:09 |
jeblair | jamielennox: i could see making zuul.project be the canonical project name (hostname + full path). or we could do a series of variables based on the internal ones like https://etherpad.openstack.org/p/vB1WjfL804 (mordred, clarkb: thoughts?) | 14:10 |
SpamapS | pabelanger: because that is a copy of the key in RAM that we can kill | 14:10 |
SpamapS | but now that I think about it | 14:10 |
SpamapS | bwrap's going to aggressively kill anything on exit of ansible-playbook.... so an attacker is already going to have to _replace_ and fork under ansible-playbook | 14:11 |
SpamapS | so this gets us nothing | 14:11 |
pabelanger | not sure I follow | 14:11 |
pabelanger | the last part | 14:11 |
SpamapS | bwrap forks and execs its arguments, in our case, that is 'ansible-playbook --foo --bar baz' | 14:12 |
SpamapS | and when that process exits, bwrap being its pid 1, it simply terminates the entire namespace, and itself. | 14:12 |
SpamapS | so if somebody wants to break out of ansible-playbook, they will have to do it by forking something from ansible-playbook, and keeping ansible-playbook running. | 14:13 |
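For context, the process model SpamapS describes, roughly as an executor might invoke it with Python's subprocess: bwrap becomes PID 1 of a fresh PID namespace, execs the single command it is given, and tears the whole namespace down when that command exits. The flags and mounts below are generic bubblewrap options chosen for illustration, not Zuul's exact command line.

```python
import subprocess


def run_in_bwrap(playbook_cmd, work_dir):
    """Run ansible-playbook as the single child of a bubblewrap sandbox.

    bwrap becomes PID 1 of a new PID namespace; when the wrapped command
    exits, everything forked underneath it is torn down with the sandbox.
    The flags and mounts here are illustrative, not Zuul's exact set.
    """
    cmd = [
        'bwrap',
        '--unshare-pid',               # fresh PID namespace, bwrap is PID 1
        '--die-with-parent',           # sandbox dies if the executor dies
        '--ro-bind', '/usr', '/usr',
        '--ro-bind', '/etc/resolv.conf', '/etc/resolv.conf',
        '--proc', '/proc',
        '--dev', '/dev',
        '--bind', work_dir, work_dir,  # only the job dir is writable
        '--chdir', work_dir,
    ] + playbook_cmd
    return subprocess.Popen(cmd)
```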
SpamapS | since SSH forks from ansible-playbook and reads its private key every time, I don't know that we can prevent a nefarious process from doing the same. | 14:14 |
SpamapS | (we could with SELinux or AppArmor, because we could say only ssh is allowed to read the key) | 14:14 |
pabelanger | is it possible to list a private key from ssh-agent? | 14:15 |
SpamapS | So, a single executor ssh-agent buys us something, as it will keep the SSH key in RAM and not sprayed out into the various work dirs, so prevents accidental key material propagation. But it doesn't protect us from ansible-playbook escapers. | 14:16 |
SpamapS | Yes, ssh-agent's protocol is to request a private key and get it back over the socket. | 14:16 |
SpamapS | oy, have to AFK now to get kids off to school | 14:16 |
*** isaacb has quit IRC | 14:17 | |
pabelanger | Right, that's the first part I am concerned about right now. I'd like to minimize the copying of our private keys around, if possible. I think clarkb's suggestion of ssh-agent gets us that | 14:17 |
SpamapS | I'm still concerned that the private SSH key will be a single ansible-playbook breakout away, and thus must if nothing else be rotated aggressively. | 14:17 |
pabelanger | as for breaking out of ansible-playbook, I haven't considered testing that yet | 14:17 |
SpamapS | The only reason we're bwrapping is we think it's possible. | 14:17 |
*** openstackgerrit has quit IRC | 14:18 | |
*** isaacb has joined #zuul | 14:18 | |
jeblair | SpamapS: i thought the ssh-agent protocol was signing operations, not key data transfer | 14:19 |
pabelanger | right, it also stops playbooks from looking into /home/zuul on executor too, which contains our sensitive data right now | 14:19 |
jeblair | so if you use ssh-agent, you don't request a copy of the key, but rather, request that the key be used to sign something | 14:19 |
jeblair | if that's the case, we don't have to worry as much about losing the key material in a breakout, however, that job would still be able to use the key to log into any host for the duration of the job. | 14:21 |
jeblair | https://tools.ietf.org/id/draft-miller-ssh-agent-00.html#rfc.section.4.5 | 14:22 |
pabelanger | Ya, having somebody break out and use key is something I haven't looked into yet. I was mostly wanting to stop somebody writing a cat ~/.ssh/id_rsa playbook right now, and think ssh-agent might provide protection against that | 14:23 |
*** isaacb has quit IRC | 14:24 | |
jeblair | pabelanger: sure, but the cat playbook only works in the case of a breakout too | 14:24 |
pabelanger | or somebody missed a trusted playbook change to do that | 14:25 |
pabelanger | but, agree | 14:25 |
jeblair | pabelanger: trusted playbooks will not run with bwrap, so this won't stop that. | 14:26 |
pabelanger | I'm still trying to understand why mount private key into bwrap at all, over using ssh-agent outside | 14:26 |
pabelanger | jeblair: ah, right. keep forgetting that | 14:27 |
pabelanger | if we moved to ssh-agent on executor, we could keep ssh_private_key contents outside of zuul in theory. Not including the discussion from last night about having zuul run ssh-add commands | 14:29 |
jeblair | pabelanger: there will be multiple keys in the future, some of them managed by zuul, so it's likely that we would need multiple ssh agents, and i think zuul would have to start them. | 14:31 |
dmsimard | fungi, pabelanger: so, the reason why I'm interested in booting from volume with nodepool is because the cloud I'm working with has a huge disparity in performance between ephemeral (local spindles) and volumes (all ssd ceph) | 14:47 |
dmsimard | some jobs can land in ephemeral just fine (say, simple tox jobs) but I'd rather have our "intensive" jobs to use volumes (i.e, openstack installation/integration tests) | 14:48 |
pabelanger | I don't see why it couldn't be used, I think shade supports it out of the box | 14:49 |
dmsimard | I would guess so since ansible modules can use boot from volume | 14:51 |
Shrews | shade supports boot from volume, nodepool does not so that's where the work would be | 14:51 |
fungi | dmsimard: is there any openstack feature allowing you to pre-image a volume? or do you need a separate step to boot from image, attach the seed volume, overwrite it with the bootable image, detach it and then clone that for subsequent booting? | 14:51 |
pabelanger | Ya, create_server takes a few parameters, 50GB volume seems to be default right now. I think it would be a matter of exposing some configuration settings into 'images' section to indicate we'd want to use the image from a volume | 14:51 |
pabelanger | Shrews: agree | 14:52 |
clarkb | yes, ssh-agent doesn't expose the key. If bubblewrap doesn't protect access to memory we lose regardless | 14:52 |
clarkb | jeblair: re project names, maybe have a project name object whose default representation/text is the canonical version. But include methods to get the hostname/short_name/name/protocol? | 15:04 |
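A hypothetical sketch of clarkb's suggestion, with invented class and attribute names (this is not Zuul's actual model): one object that stringifies to the canonical name but still exposes the individual parts.

```python
class ProjectName:
    """Hypothetical holder for the different forms of a project's name."""

    def __init__(self, hostname, name):
        self.hostname = hostname          # e.g. 'github.com'
        self.name = name                  # e.g. 'BonnyCI/sandbox'

    @property
    def canonical_name(self):
        return '%s/%s' % (self.hostname, self.name)

    @property
    def short_name(self):
        return self.name.split('/')[-1]   # e.g. 'sandbox'

    def __str__(self):
        # Default text form is the canonical name, so a job can build
        # its checkout path without knowing about the hostname prefix.
        return self.canonical_name


# e.g. src_dir = os.path.join(workspace_root, 'src', str(project))
```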
*** openstackgerrit has joined #zuul | 15:06 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Increase file permissions around generate keys https://review.openstack.org/461218 | 15:06 |
pabelanger | jeblair: any objection to updating zuulv3-dev.o.o with new openstack-zuul-jobs / openstack-zuul-roles config changes? | 15:08 |
*** rcarrillocruz has quit IRC | 15:16 | |
jeblair | pabelanger: please do | 15:17 |
*** rcarrillocruz has joined #zuul | 15:18 | |
pabelanger | nice, reload worked | 15:23 |
mordred | fungi, Shrews, pabelanger, dmsimard: I _believe_ (would have to double-check) that we have options for create_server that allow booting from a pre-created volume, as well as booting from an image but on to a volume. there is also a shade call which is "please make me a volume from this image" | 15:24 |
fungi | oh, very nice! | 15:25 |
mordred | (we honestly should think about using such things for our long-lived servers, fwiw) | 15:25 |
fungi | so in theory the logic within nodepool could remain basically the same if you have "please make me a volume from this image" followed by "please boot from this volume" | 15:26 |
mordred | yes | 15:26 |
mordred | in fact, it's just a boolean flag to shade | 15:26 |
mordred | so it can be _very_ little code in nodepool | 15:26 |
fungi | configuration wise you still have images you're maintaining, so just need somewhere (new toggle in nodepool config?) for "and i want it on a volume thxbye" | 15:27 |
mordred | if you say "boot_from_volume=True" - you then have 2 options - provide a volume to boot from in "boot_volume" - or or provide an image to turn into a volume in the normal image param | 15:27 |
mordred | you also want "volume_size" | 15:27 |
mordred | which is in G | 15:27 |
mordred | (it defaults, as pabelanger mentions, to 50Gb) | 15:27 |
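The shade options mordred lists map onto a create_server call along these lines; the cloud name, image, flavor and size below are made-up values, and the 50 GB default for volume_size is the one pabelanger mentions above.

```python
import shade

# Illustrative only: cloud name, image, flavor and size are made up.
cloud = shade.openstack_cloud(cloud='my-cloud')

# Boot onto a volume created from an image:
server = cloud.create_server(
    name='test-node',
    image='ubuntu-xenial',       # image shade converts into the boot volume
    flavor='general-purpose-8',
    boot_from_volume=True,       # root filesystem on a volume, not local disk
    volume_size=80,              # in GB; shade defaults to 50 when omitted
    wait=True,
)

# Or boot from an already-prepared volume instead of an image:
# cloud.create_server(name='test-node', flavor='general-purpose-8',
#                     boot_volume='my-prepared-volume', wait=True)
```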
fungi | ahh, yep right | 15:27 |
mordred | so maybe 2 new options - boot_from_volume and volume_size? | 15:28 |
fungi | and maybe volume type/flavor i guess if your provider has more than just a default one | 15:28 |
pabelanger | Ya, I think the hard part is new options to bike shed on | 15:28 |
pabelanger | hopefully we could also try this in nodepool dsvm job too | 15:28 |
fungi | like in rax we can choose between sata (default) and ssd (non-default) | 15:28 |
pabelanger | mordred: why for long lived servers? Not making the connection | 15:31 |
mordred | pabelanger: it allows us to pick the size of the root filesystem | 15:42 |
mordred | pabelanger: and also for our root filesystem to be in a cinder volume instead of just on a local disk - this might or might not be an improvement, depending on the server - but maybe it's worth investigating and thinking about if/when we should use it | 15:42 |
mordred | fungi: oh - excellent call on volume_type - that is, in fact, _not_ exposed in shade - but I believe it should be | 15:44 |
mordred | will need to read up on how block_device_mapping_v2 works for volume types | 15:44 |
clarkb | mordred: I mentioned it in -infra but apparently image metadata can control volume related things on boot, any idea if it can be used to have an image boot on volume? | 15:44 |
clarkb | mordred: beacuse if it can then you could do that in nodepool with no changes to any code, just config | 15:45 |
mordred | clarkb: I do not know - but I'll admit to being wary of making more use of magic glance metadata ... I don't believe any of them are actually documented | 15:45 |
fungi | certainly sounds convenient | 15:45 |
clarkb | mordred: I mean 95% of shade is doing undocumented things | 15:46 |
clarkb | :P | 15:46 |
mordred | :) | 15:46 |
clarkb | like figuring out ip addresses | 15:46 |
mordred | if we can learn how to trigger such a thing, that seems potentially fine - however it might also be nice to add the options | 15:46 |
mordred | so that nodepool users can just express "boot these using boot-from-volume" and not have to themselves learn BFV image metadata magic :) | 15:47 |
mordred | (also, if we do learn more about them, it might be worthwhile exposing them in some way as options somehow in shade so that it's easy for people to choose to do that themselves) | 15:47 |
pabelanger | mordred: got it, thanks | 15:48 |
clarkb | mordred: definitely, just thinking it might be doable now if people need it (but ya undocumented in general so who knows!?) | 15:50 |
*** isaacb has joined #zuul | 15:51 | |
SpamapS | jeblair: AH! TIL something :) | 15:57 |
SpamapS | My understanding was still based on ssh1 keys.. when was ssh2 created, 18 years ago? | 15:57 |
SpamapS | :-P | 15:57 |
mordred | SpamapS: that's still super new | 15:57 |
SpamapS | I really just hadn't thought about how ssh-agent works completely. | 15:58 |
SpamapS | https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.agent has the whole thing spelled out nicely btw | 15:59 |
jeblair | yeah, that agrees with the ietf thing too | 15:59 |
SpamapS | so yes, ssh-agent exposed into the bwrap will limit an ansible escaper to using the key while they're running on the box. | 16:00 |
SpamapS | so changes to the current implementation are needed.. and I think 1 spec revision. | 16:03 |
SpamapS | spec should add a brief SSH key discussion | 16:03 |
SpamapS | and we still need to add the venv and now ssh agent to the implementation. Yes? | 16:04 |
jeblair | SpamapS: sounds about right | 16:07 |
* SpamapS is on it | 16:08 | |
SpamapS | pabelanger: have you started on any of that? | 16:09 |
pabelanger | SpamapS: no, I have POC'd ssh-agent locally but think we need to solve venv issue first. So we can test bwrap in gate | 16:10 |
pabelanger | currently head down, trying to get openstack-zuul-jobs setup | 16:11 |
SpamapS | pabelanger: k, I'm going to take them both on. | 16:11 |
eventingmonkey | pabelanger: Hey, SpamapS suggested I reach out to you about https://review.openstack.org/#/c/454396/3/? | 16:13 |
pabelanger | eventingmonkey: sure, what can I help with | 16:14 |
eventingmonkey | pabelanger: I'm looking for something I can help out with, and SpamapS told me to ping you about in here. | 16:16 |
pabelanger | eventingmonkey: sure, the idea of 454396, is we want to add playbooks to get code coverage for all our blocked lookup plugins here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/lookup?h=feature/zuulv3 Right now, the test is setup as an untrusted-project, which means we should report FAILURE if we try to use certain lookups. | 16:19 |
pabelanger | for example, lookup('passwd') will always fail | 16:19 |
SpamapS | pabelanger: is there already an example test written, or maybe one that asserts success that could be modified to test for fail? | 16:20 |
pabelanger | just what we have in 454396 right now | 16:20 |
* eventingmonkey relooks over 454396... | 16:21 | |
pabelanger | eventingmonkey: SpamapS: what I was thinking, we'd write playbooks to use the lookup function, register a variable, then assert it failed. I do something like that here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/tests/fixtures/config/ansible/git/common-config/playbooks/check-vars.yaml?h=feature/zuulv3 | 16:23 |
pabelanger | then, we can enhance the test, and validate we get the right failure message in back from zuul | 16:23 |
SpamapS | pabelanger: k. I just want eventingmonkey to have a framework to hang his tests off of, because the test fakes can be a little confusing at first. | 16:24 |
pabelanger | sure, I can add some code to 454396 on how I envision it working. Ensure we are all happy with that, then iterate on it | 16:26 |
jeblair | what if we proceeded with what was already in 454396, and then improved it with the variable registration later? | 16:27 |
pabelanger | that works for me, in that case eventingmonkey should be okay to keep iterating on it | 16:29 |
eventingmonkey | pabelanger: cool, I'm down. | 16:30 |
eventingmonkey | pabelanger and SpamapS: What I'm not clear on is what it is you would like me to iterate on? I can look at it and see where I would go, but if you had something in mind already I'd rather not jump into the wrong thing... | 16:31 |
jeblair | eventingmonkey: that test covers a portion of the custom lookup plugins; we want to test all of them. | 16:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Move playbook out of zuul https://review.openstack.org/462210 | 16:31 |
jlk | o/ | 16:32 |
eventingmonkey | jeblair: Ah...I see, so working in the same area, just add tests for the other ones. Cool beans. | 16:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:35 |
*** isaacb_ has joined #zuul | 16:35 | |
*** isaacb has quit IRC | 16:39 | |
*** isaacb_ has quit IRC | 16:39 | |
jeblair | jlk: if you have a minute to weigh in on 461331, i'd love another opinion there. | 16:40 |
* jlk reads | 16:40 | |
* jeblair subscribes to openstack-zuul-jobs and updates dashboards | 16:43 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:44 |
jeblair | Zuul: Patch Set 2: Verified-1 (2017-05-03 16:25:54+0000) < Reply > | 16:45 |
jeblair | Unknown configuration error | 16:45 |
jeblair | we should track that down | 16:45 |
* jeblair looks in logs | 16:45 | |
pabelanger | I think it is job permissions | 16:45 |
jeblair | http://paste.openstack.org/show/608728/ | 16:46 |
jeblair | we should make sure that error propagates back to gerrit | 16:46 |
pabelanger | yes, I am struggling to setup dependencies between untrusted projects ATM | 16:46 |
pabelanger | learning how it all works | 16:47 |
jeblair | pabelanger: i think it might be very hard to move these between repos | 16:47 |
*** harlowja has joined #zuul | 16:48 | |
jeblair | because you can't have jobs in two different repos with the same name | 16:48 |
pabelanger | should I not expect depends-on to work properly in this case? | 16:48 |
jeblair | pabelanger: you should. and the second change should work. but the first change won't. | 16:49 |
jeblair | pabelanger: the first change, which adds a second copy of the jobs, is invalid configuration. the second job, which removes the first copy of the jobs, should be valid config. | 16:50 |
jeblair | pabelanger: (assuming the second depends-on the first) | 16:50 |
pabelanger | okay, great | 16:50 |
jeblair | but that sequence will never merge with one-way dependencies (if we did two-way cross-repo-deps, we could merge them both simultaneously and that would be correct) | 16:51 |
jeblair | pabelanger: i have an idea though | 16:51 |
*** hashar has quit IRC | 16:51 | |
jlk | jeblair: commented. I'm on the same fence as you. | 16:51 |
pabelanger | jeblair: please share :) | 16:51 |
pabelanger | also, just seen the following: | 16:52 |
pabelanger | 2017-05-03 16:51:06,879 DEBUG zuul.IndependentPipelineManager: <QueueItem 0x7efe18089590 for <Change 0x7efdfeea4ed0 462214,1> in check> is a failing item because ['is a non-live item with no items behind'] | 16:52 |
jeblair | pabelanger: that's a pretty normal message | 16:52 |
jeblair | pabelanger: that's a dependent change being dequeued after the depending change reported | 16:53 |
jeblair | pabelanger: (recall that dependent changes are added to the queue, but aren't acted upon) | 16:54 |
pabelanger | k | 16:54 |
pabelanger | I'm having some trouble understanding why 462212 is not running jobs on zuulv3 ATM | 16:54 |
pabelanger | I don't think openstack-infra/zuul can see openstack-infra/openstack-zuul-jobs right now | 16:55 |
pabelanger | Oh | 16:57 |
pabelanger | I think I know | 16:57 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:58 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:59 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 17:01 |
jeblair | pabelanger: you keep shadowing those jobs. it's not going to work. | 17:03 |
pabelanger | jeblair: maybe I am confused. How do I get zuul to run tox-linters again? | 17:03 |
jeblair | pabelanger: it's what i said above -- you can not have the same job defined in two different repos. | 17:04 |
*** Cibo_ has joined #zuul | 17:04 | |
jeblair | pabelanger: maybe we should have a phone call? | 17:04 |
pabelanger | yes, that would help | 17:05 |
pabelanger | give me 1 moment | 17:05 |
clarkb | couldn't you use a new name for the moved stuff, remove the old stuff, then rename to name you want? | 17:05 |
jeblair | clarkb: yes, that's a possibility | 17:05 |
pabelanger | okay, back. should we jump into pbx.o.o? | 17:07 |
jeblair | yep. let's go with room 6812 | 17:07 |
jeblair | anyone is welcome to join: https://wiki.openstack.org/wiki/Infrastructure/Conferencing | 17:08 |
SpamapS | pabelanger: btw, the bubblewrap implementation doesn't bind mount in the user's home dir | 17:21 |
SpamapS | pabelanger: --dir just creates that dir in the chroot. | 17:21 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 17:24 |
pabelanger | SpamapS: okay cool. I was confused | 17:24 |
jeblair | quick call summary: we're going to proceed with: 1) a change that removes the jobs from zuul and switches zuul to noop; 2) a change that adds them to openstack-role-jobs and depends-on (1); 3) a change to start running the newly moved jobs on zuul that depends-on (2) | 17:24 |
jeblair | we should end up with +1s all the way on that. but we may have bugs. :) | 17:25 |
pabelanger | okay, I think 462212,6 is correct now | 17:26 |
pabelanger | I see zuulv3 reporting no jobs ATM | 17:26 |
pabelanger | Also, I wonder if we should also bubble up that message to gerrit | 17:27 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 17:27 |
SpamapS | pabelanger: ^ that should now copy in the venv | 17:27 |
SpamapS | well, ro-bind it | 17:27 |
*** Cibo_ has quit IRC | 17:29 | |
jeblair | pabelanger: no jobs isn't normally an error, so probably shouldn't report to gerrit | 17:29 |
jeblair | pabelanger: i'm going to go work on a test case for 'dynamically adding a job in one repo and dynamically running it from another one' and see if i can reproduce this | 17:32 |
SpamapS | so with the SSH agent. I'm not sure what to do if SSH_AUTH_SOCK is already set. | 17:32 |
pabelanger | jeblair: ok | 17:32 |
SpamapS | do I just use it? save it, execute stuff, and restore it? | 17:32 |
clarkb | i would pass it through? that seems like it would be least surprise method | 17:32 |
jeblair | SpamapS: force it to our value? | 17:33 |
jeblair | i think zuul has to control the ssh agent(s), otherwise we won't be able to manage having multiple keys later | 17:33 |
jeblair | we're just talking about setting the env var with bubblewrap, right? so we're not altering zuul's environment? | 17:34 |
jlk | jeblair: in gertty, when in a diff view, when I'm trying to comment on a line it keeps opening the comment on the left side (old code) rather than the right side (new code). What changes the focus there? | 17:36 |
jeblair | jlk: tab | 17:37 |
jlk | oh arrow key too | 17:37 |
jeblair | oh yep | 17:37 |
jlk | why didn't I try that already | 17:37 |
jeblair | it should also be persistent once you switch | 17:37 |
jlk | and is it normal that my comment box opens on top of somebody else's comment, but once saved it'll show _below_ the comment? | 17:38 |
jeblair | yeah. not my preferred behavior, but programming is hard. | 17:38 |
jeblair | i might know enough about urwid to fix both of those things now. :) | 17:38 |
jlk | haha, no worries | 17:39 |
pabelanger | If SSH_AUTH_SOCK was already set up, then zuul also added a key, you would get 2 keys I would assume? | 17:39 |
SpamapS | jeblair: right | 17:39 |
SpamapS | currently we don't set env= when calling bwrap | 17:40 |
SpamapS | but perhaps we should be very explicit about what gets passed in there | 17:40 |
* SpamapS ponders while brewing mmoar coffee | 17:40 | |
jlk | jamielennox: I implemented fixes for your feedback on the github patches. I pushed up the result to my 'github-v3' branch in j2sol/zuul, but I'm not going to push up to gerrit until we get more feedback, since it'd be a lot of noise. | 17:44 |
pabelanger | SpamapS: jeblair: so, it sounds like we might be okay with https://review.openstack.org/#/c/331148/ for initializing ssh-agent and ssh-add / ssh-remove of keys when executor starts | 17:44 |
jeblair | SpamapS: pabelanger noted an env variable option to bwrap -- maybe that's what we need to use instead of the env= argument to the subprocess call? | 17:44 |
jeblair | or maybe we need both? | 17:44 |
jeblair | pabelanger: something like that, maybe -- but i think it needs to happen entirely inside the executor, because i think it will eventually need to start and stop multiple agents. | 17:46 |
pabelanger | yes, I used --setenv SSH_AUTH_SOCK $SSH_AUTH_SOCK in testing last night, however should maybe confirm if that is needed. I'm not sure which env vars are passed to bwrap by default | 17:46 |
jeblair | pabelanger: (so, it shouldn't touch any files in zuul/cmd/; only zuul/executor/server) | 17:46 |
pabelanger | k | 17:47 |
pabelanger | make sense | 17:47 |
SpamapS | I've not looked at the env var option to bwrap | 17:48 |
SpamapS | let me see | 17:48 |
*** Cibo_ has joined #zuul | 17:48 | |
SpamapS | ah ok, those are just unset/set options. No real difference I think than just env= to Popen | 17:49 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 17:50 |
SpamapS | pabelanger: bwrap gets zuul's env, and thus, ansible-playbook gets the same env | 17:50 |
SpamapS | which is why I say, maybe we should black/whitelist envvars | 17:51 |
pabelanger | yes, I would be okay with that. | 17:51 |
*** Cibo_ has quit IRC | 17:53 | |
pabelanger | jeblair: SpamapS: the good news, is paramiko has an SSH agent interface which we can likely use | 17:53 |
SpamapS | hm | 17:54 |
SpamapS | I'll take a look | 17:54 |
pabelanger | hmm, at first glance it seems we cannot add a key to the paramiko SSH agent, just read keys that are already set up | 17:56 |
clarkb | pabelanger: its just for consuming keys out of an agent? | 17:57 |
pabelanger | clarkb: looks like it | 17:57 |
pabelanger | guess we could push that upstream | 17:57 |
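For reference, the read/sign-only shape of paramiko's agent interface that pabelanger is describing looks roughly like this (it assumes SSH_AUTH_SOCK points at a running agent): the agent signs data on the client's behalf but never returns private key material, and there is no call for adding keys.

```python
import paramiko

# Assumes SSH_AUTH_SOCK in the environment points at a running ssh-agent.
agent = paramiko.Agent()

for key in agent.get_keys():       # AgentKey objects: public parts only
    print(key.get_name(), key.get_fingerprint().hex())

    # Private-key operations are delegated to the agent over its socket;
    # the key bytes themselves are never returned to this process.
    signature = key.sign_ssh_data(b'data to sign')

agent.close()

# There is no paramiko call to add a key to the agent -- loading keys
# still means shelling out to ssh-add (or patching paramiko upstream).
```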
jeblair | pabelanger: hrm, a 2-change series to add jobs in one repo and use them in another passes tests. i'm going to extend my test to do the 3-change series we're doing now. | 17:58 |
pabelanger | ack | 18:02 |
SpamapS | pabelanger: so yeah, I think if we move the Evseev change into the executor specifically, and pass that agent's envvars into bwrap via Popen's env= .. I think we're good. | 18:07 |
SpamapS | we can talk about filtering the env later. | 18:07 |
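In rough Python, the idea is to stop inheriting the executor's environment and instead hand the wrapped process an explicit one, with only the agent's socket injected. The whitelist below is a guess at a minimal set, not an agreed-upon list, and the helper names are made up.

```python
import os
import subprocess

# Illustrative whitelist of variables worth forwarding from the executor;
# not a vetted or agreed-upon set.
ENV_WHITELIST = ('PATH', 'HOME', 'LANG', 'LC_ALL')


def build_job_env(ssh_auth_sock):
    env = {k: os.environ[k] for k in ENV_WHITELIST if k in os.environ}
    # Point ssh inside the sandbox at the per-build agent's socket
    # (the socket itself must also be bind-mounted into the sandbox).
    env['SSH_AUTH_SOCK'] = ssh_auth_sock
    return env


def run_playbook(bwrap_cmd, playbook_cmd, ssh_auth_sock):
    # Passing env= replaces the inherited environment entirely, so nothing
    # from the executor's environment leaks in unless whitelisted above.
    return subprocess.Popen(bwrap_cmd + playbook_cmd,
                            env=build_job_env(ssh_auth_sock))
```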
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:48 |
jeblair | pabelanger: ^ i think that's the bug that's causing no jobs to run. the key is the branch -- the openstack-zuul-jobs definitions are getting implicit 'master' branch matchers, so they aren't running on feature/zuulv3. | 18:49 |
pabelanger | Ah interesting | 18:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:50 |
jeblair | git add ^ | 18:50 |
jlk | that's quite the commit message | 18:50 |
*** Cibo_ has joined #zuul | 18:52 | |
jeblair | it's longer than the change (but not the test) | 18:53 |
jeblair | wow, i'm going to python that a bit better, hang on. | 18:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:54 |
jeblair | there goes 3 ridiculous lines | 18:54 |
jlk | oof, okay | 18:55 |
jlk | re-reviews | 18:55 |
jeblair | sorry, it's hot here. | 18:55 |
jlk | hahah | 18:55 |
*** tobiash_ has joined #zuul | 19:01 | |
jeblair | tobiash_: have you seen https://review.openstack.org/461509 ? | 19:02 |
tobiash_ | jeblair: not yet, title sounds interesting :) | 19:03 |
tobiash_ | will have a look asap | 19:04 |
tobiash_ | jeblair: I encountered a bug with v3 where it reports with 'Starting check jobs' for every change in the gerrit instance | 19:06 |
tobiash_ | https://review.openstack.org/#/c/455711 | 19:06 |
tobiash_ | that's my try to fix that | 19:06 |
jeblair | tobiash_: ah thanks, that slipped my attention | 19:06 |
pabelanger | SpamapS: here is the bwrap command running on my local zuul env, but don't think we have the right mounts setup: http://paste.openstack.org/raw/608739/ | 19:11 |
pabelanger | ansible-playbook runs properly now, but missing zuul.ansible libraries | 19:12 |
pabelanger | check ansible.cfg paths now | 19:13 |
Shrews | morgan, mordred, jeblair: why were the zuul py3 changes (325661, 327436, 327459) all abandoned? | 19:14 |
jeblair | Shrews: maybe because of the gear work? | 19:16 |
pabelanger | I think I see the issue | 19:18 |
*** Cibo_ has quit IRC | 19:20 | |
jeblair | Shrews: SpamapS has changes in progress for gear | 19:20 |
tobiash_ | jeblair: the spec sounds great, thanks for writing this :) | 19:24 |
*** hashar has joined #zuul | 19:26 | |
*** tobiash_ has quit IRC | 19:29 | |
Shrews | jeblair: good spec. i think dkranz_ would probably be interested in that, too | 19:33 |
jlk | it's too nice to be inside any more. I'm out for a few hours of cycling. | 20:02 |
dkranz_ | Shrews: I'll check it out. | 20:10 |
jeblair | pabelanger, jlk: i think we need something close to the old implied branch-matcher behavior for the project-pipeline variants. in other words, the change i wrote makes sense for job definitions, but i think if you stuck a project definition inside of a project repo, you would expect implied branch matchers for that... i'm working on a new revision | 20:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 20:25 |
pabelanger | Shrews: jeblair: some progress on bwrap^ I think we need to mount the full git repo in bwrap, because we pip install -e for tox.ini | 20:26 |
pabelanger | Shrews: sorry | 20:26 |
pabelanger | SpamapS: ^ | 20:26 |
pabelanger | the blocker now, is untrusted and Executing local code is prohibited | 20:26 |
jeblair | pabelanger: hrm. that sounds like it could be complex -- how would we know whether (and how) to do that? | 20:31 |
jeblair | (keeping in mind that the goal here isn't just to handle zuul's unit tests, but to also handle zuul being installed in a virtualenv in production. even an editable virtualenv, i guess) | 20:31 |
pabelanger | jeblair: so, I am using the copy module now | 20:40 |
pabelanger | http://paste.openstack.org/show/608757/ | 20:40 |
pabelanger | but trying to determine if that is a valid use case or not | 20:40 |
pabelanger | today, that actually breaks | 20:41 |
pabelanger | IIRC, we said it was okay for untrusted jobs to write into the executor job dir | 20:42 |
jeblair | pabelanger: what's the error? | 20:44 |
pabelanger | http://paste.openstack.org/show/608758/ | 20:45 |
pabelanger | src is none | 20:45 |
pabelanger | happy to fix, but want to confirm that is actually valid code for untrusted executor | 20:46 |
pabelanger | untrusted job on executor* | 20:46 |
jeblair | pabelanger: yep, i think so | 20:46 |
pabelanger | k, let me work on fixing it | 20:48 |
SpamapS | pabelanger: I thought work_dir would have all the git repos in it. | 20:57 |
*** dkranz_ has quit IRC | 20:58 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 21:02 |
pabelanger | okay, that should pass, and use bwrap properly. Comments most welcome!^ | 21:03 |
SpamapS | pabelanger: left a comment inline. | 21:28 |
SpamapS | pabelanger: Little confused by ../../.. | 21:28 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 21:28 |
jeblair | i'm really sorry about that commit message | 21:28 |
jeblair | it's kind of "but wait! there's more!" | 21:29 |
SpamapS | approve now and you'll get not one, but TWO changes | 21:34 |
jeblair | our layaway plan lets you buy now and pay down your technical debt over the lifetime of the program! | 21:35 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users https://review.openstack.org/462300 | 21:51 |
jeblair | pabelanger: i think those two changes take care of all the issues we've seen with your move. | 21:52 |
jeblair | i think the first one is done and passes tests now, so we should probably see about merging it asap to proceed with the job move. i think the second one is good too, just waiting on tests now. | 21:53 |
SpamapS | well this is stupid | 21:54 |
SpamapS | ssh-add -x locks an agent | 21:54 |
SpamapS | but it won't accept stdin redirects as the password | 21:54 |
jeblair | SpamapS: is locking feasible for us? i'd assume ansible might start new ssh connections at any time... | 21:54 |
SpamapS | jeblair: I was going to look at whether we could use the controlsocket functionality | 21:55 |
SpamapS | jeblair: so establish the connection and configure ssh to use it, then lock | 21:56 |
SpamapS | but that's a bit of a stretch | 21:56 |
jeblair | SpamapS: interesting... ansible does use controlsocket on its own. maybe it does the right thing if there is one already, but i don't know. | 21:57 |
SpamapS | actually, locking just prevents ssh-add operations. I think. | 21:57 |
SpamapS | I thought anyway | 21:58 |
jeblair | SpamapS: according to the rfc (draft), locking should "at least" suspend private key operations. so i'd expect it to not sign anything. but maybe also not accept additions. | 22:00 |
*** fbouliane has joined #zuul | 22:01 | |
SpamapS | the docs are not awesome | 22:01 |
SpamapS | for openssh | 22:01 |
SpamapS | While locked, the agent will refuse all requests except | 22:03 |
SpamapS | SSH_AGENTC_UNLOCK, SSH_AGENTC_REQUEST_RSA_IDENTITIES and | 22:03 |
SpamapS | SSH2_AGENTC_REQUEST_IDENTITIES. | 22:03 |
SpamapS | jeblair: the concern is that the escaper can forward the agent to the test node, and then poke stuff into the agent | 22:04 |
SpamapS | which suggests that agent-per-bwrap may be necessary | 22:04 |
*** jkilpatr has quit IRC | 22:05 | |
jeblair | SpamapS: *nod* | 22:06 |
clarkb | the main issue being removing a key that subsequent jobs would depend on? | 22:06 |
jeblair | clarkb: that, or adding a bunch of useless keys (dos?) | 22:07 |
SpamapS | both | 22:07 |
SpamapS | also keeping it very busy | 22:07 |
SpamapS | we can cgroup the bwraps and they'll share CPU fairly. But there would be only one ssh-agent. | 22:08 |
SpamapS | I think in the scheme of things, ssh-agent with one key loaded is going to be quite a bit smaller than ansible-playbook | 22:09 |
jeblair | yeah, doesn't seem troubling. | 22:13 |
*** cmurphy_ has joined #zuul | 22:24 | |
*** hashar_ has joined #zuul | 22:26 | |
*** SotK_ has joined #zuul | 22:26 | |
*** cmurphy has quit IRC | 22:26 | |
*** cmurphy_ is now known as cmurphy | 22:26 | |
*** nibz has joined #zuul | 22:27 | |
*** jesusaur has quit IRC | 22:28 | |
*** SotK has quit IRC | 22:28 | |
*** nibalizer has quit IRC | 22:28 | |
*** hashar has quit IRC | 22:28 | |
*** harlowja has quit IRC | 22:28 | |
*** jesusaur has joined #zuul | 22:29 | |
*** nibz is now known as nibalizer | 22:31 | |
jamielennox | so much scroll | 22:32 |
jamielennox | is it possible to do per-run ssh keys? | 22:32 |
jamielennox | so the image or cloud-init gets baked with an ssh key for a trusted zuul process, be that executor-trusted or zuul elsewhere | 22:33 |
jamielennox | it generates a new ssh key there, copies public to the job nodes, and private into the executor | 22:34 |
*** jkilpatr has joined #zuul | 22:39 | |
pabelanger | SpamapS: replied to your comment | 22:45 |
pabelanger | jeblair: ack, looking now | 22:46 |
jeblair | jamielennox: interesting idea; though i think the easier way to do that would just be to have nodepool generate the keys for every node, which is possible but not something i'm excited about (it may be easier, but it's very difficult -- key management in openstack is difficult to get right). neither approach maps well either to nodepool static nodes or zuul bypassing nodepool to log into machines directly with project- or tenant-specific keys. | 22:51 |
pabelanger | ya, that is something we talked about last night too. Wen | 22:53 |
jamielennox | that would require nodepool having to log into the new nodes though right? | 22:53 |
pabelanger | we'd need to update glean too | 22:53 |
jamielennox | like you can't do an arbitrary "put this key on this vm" in openstack, you have to register it with nova (AFAIK) | 22:54 |
jamielennox | which sucks | 22:54 |
clarkb | jamielennox: yup | 22:54 |
jeblair | jamielennox: right, which means making boatloads of keys. | 22:54 |
jeblair | possibly more than we have quota for | 22:54 |
jamielennox | so you'd still be in a position where something with the starting privileged key has to log in to the box and replace that key with a per-job generated one | 22:55 |
jeblair | (you can have 1000 servers. and 10 keys. have fun.) | 22:55 |
jeblair | jamielennox: well, only if we can't use nova. | 22:55 |
jamielennox | jeblair: having nodepool maintain and rotate X keys in nova sounds horrible | 22:56 |
jamielennox | so given complete freedom on this (and not having many deployments to worry about) the ideal for me would be a pam module out to hashicorp vault | 22:57 |
* jamielennox would need to double check you can do ssh keys via pam, but the same daemon on box idea | 22:57 | |
jeblair | jamielennox: yes, it was when we were doing one key per day for image builds. but at this point, we're pretty good at working around nova leaking things, so the only thing (other than just not wanting to do it because it's so painful) that would technically block it is the key quota issue. | 22:58 |
jamielennox | but vault (or similar) is not something you can put a hard dep on | 22:58 |
SpamapS | I don't think we need key per node. | 22:58 |
SpamapS | But making keys short lifetime is a good idea. | 22:58 |
clarkb | jamielennox: there are ldap things but require changes to sshd? | 22:58 |
clarkb | I think rhel may carry the patches | 22:59 |
jamielennox | SpamapS: i'm happy with short for now, but there are some easily exploitable things in our deploy atm, like test nodes come up with the same keys on basically sequential ips so malicious actor can easily pre-hose a few waiting boxes | 22:59 |
jamielennox | not a huge problem for now, | 23:00 |
jamielennox | a big problem if you want to ever entertain private jobs in tenants | 23:00 |
jamielennox | clarkb: i think sssd can do it | 23:00 |
pabelanger | another idea, since zuul is using a http server for job secrets, is add ssh-import-id support to glean. Then have zuul worker fetch data from zuul on boot. | 23:00 |
SpamapS | Key per tenant is one we can do too. | 23:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Store initial repo state in the merger https://review.openstack.org/461176 | 23:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use previously stored repo state on executor https://review.openstack.org/461177 | 23:01 |
pabelanger | IIRC, cloud-init supports that, from github URLs | 23:01 |
SpamapS | yeah, cloud-init does github or lp users | 23:01 |
jeblair | jamielennox: the key should not be available to untrusted projects, so that should not be easily exploitable. | 23:02 |
clarkb | SpamapS: pabelanger I wouldn't rely on external services that way | 23:02 |
clarkb | they are far too unreliable | 23:02 |
SpamapS | clarkb: it would be zuul | 23:02 |
SpamapS | not an external service | 23:02 |
jeblair | jamielennox: (the work SpamapS is doing is still in the realm of protecting from an escape from ansible) | 23:03 |
jamielennox | jeblair: in which model? the key has to be present for ansible to ssh even in the untrusted phase | 23:03 |
SpamapS | and you don't really need ssh-import-id support. just curl http://{{zuul}}/{{key_name}} >> /root/.ssh/authorized_keys | 23:03 |
clarkb | SpamapS: zuul maybe external to the cloud? | 23:03 |
jeblair | SpamapS: we can't rely on zuul being accessible from the nodes | 23:03 |
jeblair | right that | 23:04 |
SpamapS | it's true, we did fix that | 23:04 |
SpamapS | so let's not break it again :) | 23:04 |
pabelanger | clarkb: sure, but one way it works today. | 23:04 |
jamielennox | SpamapS: right, that was where i was starting because you don't need to execute that from cloud-init/glean, you can do that from the trusted playbooks to set up a key for the untrusted components | 23:04 |
SpamapS | I think we just have to live with the DoS potential of an ansible-playbook escaper. | 23:05 |
SpamapS | Unless we do key-per-test-node, which is quite expensive. | 23:05 |
*** hashar_ has quit IRC | 23:05 | |
jeblair | SpamapS: if you want to use zuul to drive CD, getting the key is about as bad as things can get though. so i wouldn't characterize it as a dos potential. i'd characterize it as privilege escalation. | 23:06 |
SpamapS | IMO CD would never be untrusted. | 23:07 |
SpamapS | In fact that's where I think key-per-tenant makes sense. | 23:07 |
jeblair | (unless your CD key is a different key, which is the plan in the spec) | 23:07 |
jeblair | SpamapS: yeah, that's what the spec describes, is a per-tenant (and also per-project) key. | 23:07 |
clarkb | having an agent would protect the key though righr? | 23:07 |
SpamapS | clarkb: if they escape, they get to use it to ssh to things while they're escaped | 23:08 |
SpamapS | and once they ssh somewhere, they can agent forward | 23:08 |
pabelanger | trying to catch up, are we saying ssh-agent might be an issue? | 23:08 |
clarkb | right which should be same test env? | 23:08 |
jeblair | SpamapS, clarkb: so assuming we have agents set up correctly, an exposure of the nodepool key doesn't necessarily mean an exposure of the CD key, which is good. but we just need to keep that case in mind. | 23:08 |
SpamapS | and then from there.. as long as they are still connected and running.. they're ssh'ing wherever they can | 23:08 |
pabelanger | jeblair: yes, that is my thinking too | 23:08 |
SpamapS | so we have to think in job timeout windows | 23:09 |
jeblair | still, if you grab the nodepool key, you could start sshing into nodes that use secrets and you may be able to steal the secrets | 23:09 |
SpamapS | If we let jobs go for 4 hours.. that's how long somebody can play with a key. | 23:09 |
SpamapS | that's nsaty | 23:10 |
SpamapS | nasty even | 23:10 |
SpamapS | So.. what if we did have the executor just do 'ssh-keygen -t rsa -b 1024 -f tmpdir/jobuuid ; ssh-agent ; ssh-add tmpdir/jobuuid ; ssh-copy-id node-address ; bwrap ansible-playbook' ? | 23:12 |
SpamapS | (in pseudo-commands...) | 23:12 |
SpamapS | basically just make a crappy key.. copy it up for this _one playbook run_ using the real key, and then only expose the crappy key to untrusted execution. | 23:13 |
jeblair | i think it would work for dynamic nodes; needs something to cleanup static nodes though | 23:14 |
jeblair | SpamapS, jamielennox: the "cleanup" job idea from SpamapS yesterday may apply here | 23:14 |
jeblair | (and i think we could do this either hard-coded in the executor, or as a trusted pre-playbook as jamielennox suggested; the setup/cleanup cycle may be easier that way actually) | 23:15 |
jamielennox | SpamapS: that's basically what i meant by generating a key in the trusted pre-playbooks, though i wasn't thinking of the ssh-agent | 23:15 |
pabelanger | a trusted playbook seems like it would fix well | 23:15 |
SpamapS | I think this goes inside zuul | 23:16 |
jeblair | if we do trusted pre-playbook we have to remove the trusted key from the agent after installing the new one | 23:16 |
jamielennox | that ssh-agent is single use right? you just shut it down? | 23:17 |
SpamapS | And we can make the cleanup a finally: after bwrap | 23:17 |
SpamapS | jamielennox: yep | 23:17 |
jamielennox | i think it preferable to code this into the executor that playbook - but either works | 23:17 |
jamielennox | s/that/than | 23:17 |
SpamapS | it's foreground, you just read the sock path from it in and feed it to bwrap/ansible-playbook as env vars and then after kill it off | 23:18 |
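Put together, the per-build flow being sketched (start an agent, generate and load a throwaway key, hand only the socket path to the sandboxed ansible-playbook runs, kill the agent when the build ends) might look roughly like this on the executor. It is a sketch of the idea under discussion, not Zuul's implementation; copying the public key to the nodes and cleaning it up again would live in trusted setup/cleanup playbooks.

```python
import os
import re
import subprocess


class BuildSshAgent:
    """One ssh-agent and one throwaway keypair per build (sketch only)."""

    def start(self, jobdir):
        # Parse SSH_AUTH_SOCK and the agent pid out of `ssh-agent -s` output.
        out = subprocess.check_output(['ssh-agent', '-s']).decode()
        self.sock = re.search(r'SSH_AUTH_SOCK=([^;]+);', out).group(1)
        self.pid = re.search(r'SSH_AGENT_PID=(\d+);', out).group(1)

        # Generate a build-specific key inside the job dir and load it.
        self.key = os.path.join(jobdir, 'build_ssh_key')
        subprocess.check_call(['ssh-keygen', '-t', 'rsa', '-b', '2048',
                               '-N', '', '-f', self.key])
        subprocess.check_call(['ssh-add', self.key],
                              env=dict(os.environ, SSH_AUTH_SOCK=self.sock))
        # Feed this socket path to bwrap/ansible-playbook as SSH_AUTH_SOCK.
        return self.sock

    def stop(self):
        # Kill the agent; the key files disappear with the job dir cleanup.
        subprocess.check_call(['kill', self.pid])


# Hypothetical usage around a build:
#   agent = BuildSshAgent()
#   sock = agent.start('/var/lib/zuul/builds/<uuid>')
#   ... run the build's playbooks with SSH_AUTH_SOCK=sock inside bwrap ...
#   agent.stop()
```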
jamielennox | i don't see a situation where you would want to opt-out from this | 23:18 |
SpamapS | yeah, this feels like it should be in the executor | 23:18 |
jeblair | jamielennox: yeah, though i was imagining an agent per job, not per ansible-playbook invocation; so you'd use the agent with the nodepool key for the first pre-playbook, then remove the nodepool key, the subsequent playbooks use the new key | 23:18 |
jeblair | SpamapS: note that we only want to create one key per job, but we run ansible-playbook and bwrap several times within a job | 23:19 |
SpamapS | per job seems more sane | 23:19 |
pabelanger | So, if we do pre-playbook, don't we still have an issue if somebody breaks out of bwrap, they can still access our primary SSH key to access the node? | 23:19 |
SpamapS | jeblair: can set it up as a lazyload singleton | 23:19 |
SpamapS | that gets torn down after the job | 23:19 |
jeblair | pabelanger: that's why i say if we do that, the pre-playbook needs to remove the key from the agent | 23:19 |
jamielennox | so what is this key per job idea? | 23:19 |
pabelanger | jeblair: right, but how does the next job SSH into the next node? | 23:20 |
jamielennox | jeblair: i don't think it's removed from the agent, you would start a new ssh-agent with only the new key and put that socket through to the bwrap environment | 23:20 |
jeblair | pabelanger: agent is per-job too | 23:20 |
jeblair | jamielennox: i would expect something like 3-10 ansible invocations per job; a mix of bubblewrap and non-bubblewrap. that's why i was thinking agent-per-job. | 23:21 |
jamielennox | jeblair: oh sorry, my bad i mentally read that as key per project for some reason | 23:21 |
jamielennox | yes agent-per-job | 23:22 |
pabelanger | sorry, still confused :( Does the executor SSH into the zuul worker and add key B to its authorized_keys file? Then bwrap connects with key B to the zuul worker? | 23:22 |
SpamapS | Yeah ok, but I don't think playbooks would do this well. | 23:22 |
jamielennox | well, no agent-per-build/executor not per defined job | 23:22 |
jamielennox | well, no agent-per-build/run not per defined job | 23:22 |
SpamapS | yeah, agent+key per build | 23:22 |
jeblair | jamielennox: i stand corrected: agent per build. :) | 23:22 |
SpamapS | run? | 23:23 |
SpamapS | gah | 23:23 |
SpamapS | terms | 23:23 |
SpamapS | >: | 23:23 |
jamielennox | i'm still not exactly sure of all the terms | 23:23 |
jeblair | yeah, "build" is the right zuul vocab | 23:23 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Remove timestamp in devstack plugin https://review.openstack.org/462319 | 23:23 |
pabelanger | Can we maybe get an etherpad going? | 23:23 |
jeblair | (it's sometimes loose in the executor since we accept gearman "jobs" which tell us to perform a zuul "build") | 23:23 |
jeblair | https://etherpad.openstack.org/p/EIrCy4mhuR | 23:24 |
pabelanger | ty | 23:24 |
SpamapS | so the way I'm seeing it, we have the executor basically just say "I want to run ansible playbook" and we go "do I have an ssh agent yet? no. Ok... let's create one.. and spray the key out to the nodes (maybe that part is an internal-to-zuul playbook) and then run the playbook you asked for" | 23:24 |
SpamapS | and then when a build completes we tear that agent down and clean the key off the nodes. | 23:25 |
SpamapS | but I don't think we actually ever remove the "real" nodepool key. We just leave that agent running and use it for setup/teardown | 23:25 |
SpamapS | since it's never exposed to untrusted playbooks | 23:25 |
jamielennox | i don't think there's a teardown in any of these scenarios | 23:26 |
jamielennox | new keys are generated in jobdir that executor auto cleans up | 23:26 |
jamielennox | and ssh-agent exits at the end of the build | 23:26 |
pabelanger | right, if the real nodepool key is still on executor, I am struggling to understand why we go through all the steps for per-build ssh keys | 23:27 |
jamielennox | pabelanger: because it is super easy to have ansible-playbook exec something on the executor that just proxies the ssh-agent out to the rest of the world | 23:27 |
SpamapS | jamielennox: you have to remove the keys from static nodes | 23:28 |
jeblair | pabelanger: so that if someone escapes out of ansible, into the bubblewrap container, they can't use the nodepool key (whether on disk or in agent) to ssh to anything else other than the nodes for that build | 23:28 |
jamielennox | SpamapS: oh, static, right | 23:28 |
jeblair | i think i have the steps for hardcode-in-zuul in the etherpad | 23:28 |
jeblair | working on pre-playbook now | 23:28 |
SpamapS | jamielennox: and ssh-agent would only exit when you kill it, so it's still part of tearing things down. | 23:30 |
*** harlowja has joined #zuul | 23:31 | |
jeblair | okay, i think that's the plan for either hardcode or playbook variants | 23:31 |
* SpamapS looking | 23:31 | |
jamielennox | from ssh-agent man page: If a command line is given, this is executed as a subprocess of the agent. When the command dies, so does the agent. | 23:31 |
jamielennox | may or may not be possible if you have to add keys etc | 23:31 |
jeblair | jamielennox: we still have multiple playbook invocations to deal with though | 23:32 |
jamielennox | also not sure how that would play with bubblewrap | 23:32 |
jamielennox | yea, ok, ssh-agent cleanup | 23:32 |
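The subprocess mode quoted from the man page does still allow adding keys, since the agent sets SSH_AUTH_SOCK in the child's environment; a sketch, with illustrative paths:

```python
import subprocess

# ssh-agent <command> exits when <command> does, so cleanup is automatic.
# Keys can still be added from inside that command via a small shell wrapper.
subprocess.run([
    'ssh-agent', 'sh', '-c',
    'ssh-add /var/lib/zuul/builds/abc123/build_key && '
    'ansible-playbook /var/lib/zuul/builds/abc123/playbook.yaml',
], check=True)
```

As jeblair notes below, a build involves several ansible invocations, so all of them would have to run under one such wrapper for the self-cleanup to be useful.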
pabelanger | I'm leaning to trying it with pre-playbook first, see how it goes | 23:35 |
SpamapS | it's a pretty easy cleanup | 23:36 |
jeblair | SpamapS: i'm with you on having this be a batteries included thing (and not something that we want users to mess up). but i'm hesitant to hard-code assumptions about the target hosts into zuul. so i kind of like the pre-playbook idea since it lets us put it in the zuul standard-lib base job so that everyone can use it easily. but in the rare case that someone has to do something different in some circumstance, it's still site-customizable. | 23:36 |
SpamapS | build.agent.kill() | 23:36 |
jamielennox | pabelanger: can you establish an agent in a pre-playbook that will influence how the untrusted playbooks are run? | 23:36 |
SpamapS | jeblair: I hadn't thought about a base job, but yes, I think you're right. | 23:37 |
SpamapS | err | 23:37 |
SpamapS | base playbook? | 23:37 |
jamielennox | pabelanger: and have an always-run cleanup of the agent even on failure? | 23:37 |
jeblair | SpamapS: job ~= playbook in this situation | 23:37 |
SpamapS | k | 23:37 |
pabelanger | jamielennox: I think so. I'd need to test | 23:38 |
jamielennox | somehow you need to tell bwrap to plumb the socket into the untrusted environment and i'm not sure how you do that from a playbook | 23:38 |
jeblair | jamielennox: yeah, theoretically a trusted playbook could start a daemon, and a post playbook always runs, even on failure. but.. yes ^ that's one of the reasons we probably want zuul managing the agent, so it can set up the right bubblewrap cmdline | 23:39 |
jamielennox | (and mostly because i don't know how a lot of this stuff works in bwrap) | 23:39 |
pabelanger | we should be able to place the ssh-agent bind_address inside our job dir, right? | 23:39 |
jamielennox | because depending on how you put code into the bwrap environment, you might be able to pass through the actual domain socket as something like {jobdir}/ssh-agent | 23:40 |
pabelanger | then untrusted would look for it | 23:40 |
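A sketch of how the executor might plumb the jobdir (and the agent socket inside it) through to the bubblewrap environment; the bwrap options shown are real flags, but the bind set is deliberately incomplete and the wiring is illustrative, not the actual zuul bubblewrap driver:

```python
import subprocess

def bwrap_command(jobdir, auth_sock, inner_cmd):
    """Sketch: expose a per-build agent socket inside bubblewrap."""
    return [
        'bwrap',
        '--unshare-all',
        '--ro-bind', '/usr', '/usr',        # a real invocation needs more binds
        '--proc', '/proc',
        '--dev', '/dev',
        '--bind', jobdir, jobdir,           # makes the socket visible inside
        '--setenv', 'SSH_AUTH_SOCK', auth_sock,
    ] + inner_cmd

cmd = bwrap_command('/var/lib/zuul/builds/abc123',
                    '/var/lib/zuul/builds/abc123/ssh-agent.sock',
                    ['ansible-playbook', 'playbook.yaml'])
subprocess.run(cmd, check=True)
```

Bind-mounting the jobdir is enough for the unix socket to work inside the namespace, which is why zuul building the command line itself (rather than a playbook) is attractive here.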
SpamapS | Yeah I like the idea of zuul just having the per-build agent be in all playbooks environments. | 23:40 |
SpamapS | but all the stuff we do with the agent makes a lot more sense being done in playbook | 23:40 |
SpamapS | since we want to copy data from executor -> all nodes, that's kind of ansible's job :) | 23:40 |
jeblair | ++ | 23:41 |
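Since pushing data from the executor out to the nodes is "kind of ansible's job", that distribution step might look like the following ad-hoc call with the stock authorized_key module; the inventory path, remote user, and key location are assumptions, and in practice this would live in the pre playbook (with state=absent for teardown):

```python
import subprocess

# Push the per-build public key into every node's authorized_keys.
with open('/var/lib/zuul/builds/abc123/build_key.pub') as f:
    pubkey = f.read().strip()

subprocess.run([
    'ansible', 'all',
    '-i', '/var/lib/zuul/builds/abc123/inventory',
    '-m', 'authorized_key',
    '-a', 'user=zuul state=present key="{}"'.format(pubkey),
], check=True)
```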
SpamapS | pabelanger: so.. turning attention to the venv... | 23:41 |
SpamapS | pabelanger: since we pip install -e for tox jobs, the problem is that where the code is coming from is not actually _inside_ the venv. Yes? | 23:41 |
pabelanger | I do like the idea of having nodepool add the one-time key to the worker, then passing the key via zookeeper to zuul, but no more SSH | 23:41 |
pabelanger | SpamapS: right, it is 2 directories up | 23:42 |
SpamapS | pabelanger: in the tox case. | 23:42 |
SpamapS | in prod.. ???? | 23:42 |
pabelanger | tox | 23:42 |
pabelanger | well | 23:42 |
pabelanger | I mean, somebody could pip -e in production, but we don't | 23:43 |
pabelanger | pip install -U /opt/zuul | 23:43 |
SpamapS | so.. point being, venvs can have symlinks to wherever | 23:44 |
SpamapS | so we can't just bind mount in the venv itself. | 23:44 |
SpamapS | pabelanger: also we don't need zuul's code, we need ansible | 23:44 |
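Because an editable install leaves the venv pointing at a source tree elsewhere, the executor can't just bind the venv directory; a sketch of resolving what actually has to be visible inside bubblewrap, assuming the packages are importable by the executor's interpreter:

```python
import importlib.util
import os

def real_package_dir(name):
    """Return the directory a package is actually loaded from, following
    symlinks and editable-install indirection."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise RuntimeError('%s is not importable here' % name)
    return os.path.realpath(os.path.dirname(spec.origin))

# Both locations may need to be bind-mounted into the bubblewrap environment.
for pkg in ('ansible', 'zuul'):
    print(pkg, '->', real_package_dir(pkg))
```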
pabelanger | SpamapS: we actually get ansible on your patch | 23:45 |
jeblair | technically, we need a tiny bit of zuul's code, for our custom plugins, but we can remove that dependency | 23:45 |
pabelanger | but, we are missing zuul.ansible | 23:45 |
pabelanger | yes, that | 23:45 |
jeblair | (i think) | 23:45 |
SpamapS | jeblair: that's already handled when we copy the action plugins in. I think? | 23:45 |
SpamapS | Oh, those import zuul stuff. | 23:45 |
SpamapS | durn shared library code | 23:46 |
jeblair | yep. but it's just zuul.ansible.path, which is a 70 line file, mostly comments, whose only further dependencies are ansible | 23:46 |
pabelanger | http://paste.openstack.org/show/608776/ is the traceback | 23:47 |
SpamapS | yeah, so.. hm | 23:47 |
SpamapS | two ideas popped into my head immediately. They might be insane. | 23:47 |
SpamapS | 1) move that into its own pypi library that we require | 23:48 |
SpamapS | (2) was in fact insane and I will not share. | 23:48 |
jeblair | a simple option is: we can copy that specific file into somewhere in sys.path in the bubblewrap and change the import statements to match. | 23:48 |
jeblair | (was that #2? :) | 23:49 |
SpamapS | sorta | 23:49 |
SpamapS | yeah | 23:49 |
jeblair | (and i guess we'd have to add /var/lib/zuul/ansible to sys.path for the non-bubblewrap case) | 23:49 |
jeblair | SpamapS: i feel it's less insane because we're already compelled to copy the custom plugins to some location anyway. | 23:50 |
jeblair | really just adding that location to sys.path might be enough | 23:50 |
SpamapS | Yeah the custom plugins make everything bonkers already. | 23:50 |
jeblair | then "import path" instead of "import zuul.path". obviously, path is a bad name then, so "import zuulpaths" or something. | 23:50 |
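One way the "copy it next to the plugins and import it as a top-level module" idea could look; the source filename, jobdir layout, and the zuulpaths name are assumptions taken from the discussion:

```python
import os
import shutil

# Assumed locations: the small zuul.ansible paths module jeblair mentions,
# wherever it lives on the executor, and the per-build plugin directory.
src = '/opt/zuul/zuul/ansible/paths.py'
plugin_dir = '/var/lib/zuul/builds/abc123/ansible'

os.makedirs(plugin_dir, exist_ok=True)
shutil.copy(src, os.path.join(plugin_dir, 'zuulpaths.py'))

# The copied action plugins then say "import zuulpaths", and the executor runs
# ansible-playbook (bubblewrap or not) with plugin_dir on PYTHONPATH:
env = dict(os.environ)
env['PYTHONPATH'] = plugin_dir + os.pathsep + env.get('PYTHONPATH', '')
```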
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Turn of time stamps in dib log https://review.openstack.org/462321 | 23:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users https://review.openstack.org/462300 | 23:52 |
SpamapS | jeblair: that will bork in py3 unfortunately... | 23:53 |
SpamapS | but we can make a well known parent dir and add that to sys.path | 23:53 |
jeblair | SpamapS: why would it bork? | 23:53 |
SpamapS | jeblair: py3 does not do relative imports anymore | 23:53 |
SpamapS | you need package name | 23:54 |
jeblair | SpamapS: but this would appear as a "root" package, right? if it's /var/lib/zuul/zuulpaths.py and /var/lib/zuul is in sys.path? | 23:54 |
SpamapS | perhaps? It's a solvable problem | 23:55 |
SpamapS | it was a nit pick that flew out of me without much thought | 23:55 |
SpamapS | EOD rushing without testing ;) | 23:55 |
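For what it's worth, the "root package" reading checks out: a lone .py file in a directory on sys.path imports as a plain top-level module under Python 3, no relative imports involved. A throwaway check, using the zuulpaths placeholder name from above:

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, 'zuulpaths.py').write_text('ANSWER = 42\n')
    sys.path.insert(0, d)
    try:
        print(importlib.import_module('zuulpaths').ANSWER)   # -> 42
    finally:
        sys.path.remove(d)
```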
* SpamapS has to go | 23:55 | |
jeblair | ya | 23:55 |
* SpamapS -> AFK | 23:56 | |
pabelanger | jeblair: would we ever have a zuul-executor that would only run trusted playbooks? | 23:59 |
jeblair | pabelanger: nope (unless the only playbooks in the system were trusted) | 23:59 |