jeblair | jlk: i think maybe half of what that test is doing is still relevant -- testing that if a new repo shows up after zuul starts, it doesn't bork. however, we certainly can't have a configuration for that repo until after it exists now. as pabelanger found out earlier today. | 00:00 |
---|---|---|
jlk | seems kind of chicken-egg | 00:00 |
jlk | you can't use a repo unless it's defined in a tenant config | 00:00 |
jeblair | indeed | 00:00 |
jlk | can't put a repo in a tenant config until it exists | 00:01 |
jeblair | yeah, so our repo creation in infra will need to be at least two phases: create in gerrit, then, after existing, add to tenant config. | 00:02 |
jeblair | but aside from that, zuul can get events from that repo even if it's not in its config, and it could end up getting used as a foreign project. | 00:02 |
jeblair | we should probably drop that test, and add a test for foreign projects. i think that would cover what we need. | 00:03 |
jlk | alright | 00:04 |
jeblair | SpamapS, clarkb: i would rank ssh-agent above generating keys per node -- keys-per-node won't work well for a static host inventory in nodepool | 00:04 |
jeblair | pabelanger: ^ | 00:04 |
*** harlowja has quit IRC | 00:04 | |
jlk | "# TODOv3 (jeblair): re-add support for foreign projects if needed" -- hehe, found that. | 00:05 |
clarkb | jlk: that's a good point re static hosts | 00:05 |
jeblair | (also, just to make sure this is out there -- the custom ansible plugins should prevent a rogue check job from accessing the private key data. however, that's just a belt, not belt-and-suspenders, so i think pursuing suspenders here is worthwhile) | 00:05 |
clarkb | er jeblair ^ | 00:05 |
jeblair | been thinking about that a lot lately :) | 00:06 |
pabelanger | SpamapS: I was able to get ssh-agent working inside bubblewrap container on local fedora system using steps listed: https://review.openstack.org/#/c/453851/7/zuul/driver/bubblewrap/__init__.py | 00:19 |
pabelanger | SpamapS: I am going to dive more into it tomorrow | 00:19 |
pabelanger | I've been side tracked by flatpak tonight :) | 00:20 |
openstackgerrit | K Jonathan Harker proposed openstack-infra/zuul feature/zuulv3: Add github reporter status_url config option https://review.openstack.org/449794 | 01:27 |
*** jamielennox is now known as jamielennox|away | 02:03 | |
*** jamielennox|away is now known as jamielennox | 02:17 | |
jamielennox | should it be possible to filter reporters based on source? | 02:37 |
jamielennox | this is mostly a conceptual thing i'm thinking of | 02:38 |
jamielennox | i'm hooking v3 up to both gerrit and github simultaneously, eventually it would be nice to do cross project dependencies and so you need triggers for both sources | 02:38 |
jamielennox | however i started by just reusing the pipeline in both projects | 02:39 |
jamielennox | but then on action i have both a gerrit and github reporter so when a github patch succeeds/fails it tries to report on gerrit as well | 02:44 |
jamielennox | obviously i can (and will for now) create separate pipelines - but is this the intention or should i be able to say only report to github if the source is github? | 02:45 |
jamielennox | current zuul.yaml: https://github.com/BonnyCI/project-config-v3/blob/d7830586dbc98b4c3ff10742b3d3862b6448d869/zuul.yaml | 02:47 |
jlk | jamielennox: that's a good question! | 03:30 |
jlk | I would think that a report should be tied to the source the job came through | 03:30 |
jlk | also, because that's how multiple connections to the same source should work too | 03:31 |
jlk | er multiple connections via the same driver | 03:31 |
jamielennox | jlk: so i'm wondering how complex this should be, but i'm at least thinking that on the pipeline you could say: | 03:32 |
jamielennox | reporters: {'github': {'source': 'github'}} | 03:32 |
jlk | well | 03:32 |
jamielennox | so a reporterfilter in the same way you do event filter | 03:32 |
jlk | we did work so that the trigger is source specific from the event | 03:32 |
jlk | I would think that same work should translate somehow to the reporter things | 03:33 |
jamielennox | maybe | 03:33 |
jlk | I have to go afk, kiddo bed time | 03:33 |
jamielennox | i mean an event is always specific to the source, it has to be | 03:33 |
jlk | I believe the defining bit is the canonical_hostname | 03:33 |
jlk | event hostname has to match the trigger source hostname | 03:33 |
jamielennox | a reporter could be like please record a log over here that is common to all reporters | 03:33 |
jlk | oooh, you're thinking of a global reporter and maybe connection specific reporters. | 03:34 |
jamielennox | so i would be ok saying like this reporter triggers for these sources, and this reporter triggers for all sources | 03:34 |
jlk | sounds like a larger discussion | 03:34 |
jamielennox | (just preferably not in every action) | 03:34 |
* jlk really afk | 03:34 | |
jamielennox | yea, i figured i'd put it out now and people can read and i'll chase it tomorrow | 03:35 |
jamielennox | i think things like github/gerrit reporters should be source specific by default, smtp, sql global by default and preferably overridable in config | 03:38 |
jamielennox | (can't think of a legitimate reason you'd override github/gerrit being non source specific) | 03:42 |
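A rough sketch of the filtering rule being proposed above: a source-specific reporter only fires when the item's change arrived over that reporter's connection, while "global" reporters (smtp, sql) always fire. All of the names below (`should_report`, `source_specific`, `item`, `canonical_hostname`) are illustrative assumptions, not Zuul's actual API or the patch Jamie pushed.

```python
def should_report(reporter, item):
    """Decide whether a reporter applies to this queue item (sketch).

    Source-specific reporters (gerrit, github) only report when the
    change arrived over the same connection; "global" reporters
    (smtp, sql) always report.  Every name here is illustrative.
    """
    if not getattr(reporter, 'source_specific', False):
        return True  # e.g. smtp or sql: report regardless of source
    change_host = item.change.project.canonical_hostname
    return reporter.connection.canonical_hostname == change_host
```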
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 03:57 |
jamielennox | not sure ^ is appropriate or not, going to try it out | 03:57 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Only report to gerrit if the action is from gerrit https://review.openstack.org/461981 | 04:16 |
*** bhavik1 has joined #zuul | 05:37 | |
*** yolanda has quit IRC | 06:20 | |
jamielennox | jeblair: so with the change to canonical hostnames i'm missing a way in the job to determine the directory that the project code was cloned into | 06:20 |
jamielennox | for example the merger used to maintain BonnyCI/sandbox and now it maintains github.com/BonnyCI/sandbox | 06:21 |
jamielennox | however github.com there is a default, in my jobs etc i only say BonnyCI/sandbox | 06:22 |
jamielennox | i used to locate this directory like: "{{ workspace_root }}/src/{{ zuul.project }}" however now there's an additional github.com folder and no zuul variable to tell me what it should be | 06:24 |
jamielennox | there's a couple of ways i could see to fix this and just want to know your opinion - i think zuul.project should be the canonical name but there's a few ways to do that and don't want to go down the wrong rabbit hole | 06:25 |
*** hashar has joined #zuul | 07:28 | |
*** isaacb has joined #zuul | 08:48 | |
*** Cibo_ has joined #zuul | 09:01 | |
*** Cibo_ has quit IRC | 09:15 | |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp https://review.openstack.org/461331 | 09:30 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key https://review.openstack.org/461332 | 09:30 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use routes for URL handling in webapp https://review.openstack.org/461331 | 10:04 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Use tenant_name to look up project key https://review.openstack.org/461332 | 10:04 |
*** jkilpatr has quit IRC | 10:31 | |
mordred | jamielennox: that commit message ^^ made me twitch | 10:37 |
jamielennox | mordred: which? | 10:41 |
mordred | jamielennox: "Use tenant_name to look up project..." | 10:42 |
mordred | jamielennox: just incorrect brain context moment | 10:42 |
mordred | I was like "No - that's backwards!!! ... oh, wait, wrong project" | 10:42 |
jamielennox | oh, right - not allowed to call them tenants for keystone so i'm maintaining the barrier | 10:42 |
*** jkilpatr has joined #zuul | 10:49 | |
*** bhavik1 has quit IRC | 10:52 | |
*** hashar is now known as hasharLunch | 11:48 | |
*** isaacb has quit IRC | 12:01 | |
*** isaacb has joined #zuul | 12:03 | |
*** hasharLunch is now known as hashar | 12:19 | |
*** isaacb has quit IRC | 12:23 | |
pabelanger | morning! | 13:02 |
hashar | good morning | 13:04 |
*** dkranz_ has joined #zuul | 13:20 | |
*** isaacb has joined #zuul | 13:21 | |
SpamapS | pabelanger: so I had a thought about the SSH key in untrusted context conundrum. | 14:03 |
SpamapS | pabelanger: wondering if we can basically load the key into ssh-agent _inside_ the bwrap, and then rm the key, and kill the agent if ansible-playbook exits. | 14:06 |
jeblair | SpamapS: let bubblewrap handle the process mgmt? | 14:07 |
SpamapS | http://paste.openstack.org/show/608711/ | 14:09 |
SpamapS | jeblair: bwrap is pid 1... and will aggressively remove the entire pid space under it. But I mean in between bwrap and ansible-playbook. | 14:09 |
pabelanger | That would mean multiple ssh-agents running on the host, inside bubblewrap, over single ssh-agent outside. Trying to understand why that is better | 14:09 |
jeblair | jamielennox: i could see making zuul.project be the canonical project name (hostname + full path). or we could do a series of variables based on the internal ones like https://etherpad.openstack.org/p/vB1WjfL804 (mordred, clarkb: thoughts?) | 14:10 |
SpamapS | pabelanger: because that is a copy of the key in RAM that we can kill | 14:10 |
SpamapS | but now that I think about it | 14:10 |
SpamapS | bwrap's going to aggressively kill anything on exit of ansible-playbook.... so an attacker is already going to have to _replace_ and fork under ansible-playbook | 14:11 |
SpamapS | so this gets us nothing | 14:11 |
pabelanger | not sure I follow | 14:11 |
pabelanger | the last part | 14:11 |
SpamapS | bwrap forks and execs its arguments, in our case, that is 'ansible-playbook --foo --bar baz' | 14:12 |
SpamapS | and when that process exits, bwrap being its pid 1, it simply terminates the entire namespace, and itself. | 14:12 |
SpamapS | so if somebody wants to break out of ansible-playbook, they will have to do it by forking something from ansible-playbook, and keeping ansible-playbook running. | 14:13 |
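For context, the process model SpamapS describes, roughly as an executor might invoke it with Python's subprocess: bwrap becomes PID 1 of a fresh PID namespace, execs the single command it is given, and tears the whole namespace down when that command exits. The flags and mounts below are generic bubblewrap options chosen for illustration, not Zuul's exact command line.

```python
import subprocess


def run_in_bwrap(playbook_cmd, work_dir):
    """Run ansible-playbook as the single child of a bubblewrap sandbox.

    bwrap becomes PID 1 of a new PID namespace; when the wrapped command
    exits, everything forked underneath it is torn down with the sandbox.
    The flags and mounts here are illustrative, not Zuul's exact set.
    """
    cmd = [
        'bwrap',
        '--unshare-pid',               # fresh PID namespace, bwrap is PID 1
        '--die-with-parent',           # sandbox dies if the executor dies
        '--ro-bind', '/usr', '/usr',
        '--ro-bind', '/etc/resolv.conf', '/etc/resolv.conf',
        '--proc', '/proc',
        '--dev', '/dev',
        '--bind', work_dir, work_dir,  # only the job dir is writable
        '--chdir', work_dir,
    ] + playbook_cmd
    return subprocess.Popen(cmd)
```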
SpamapS | since SSH forks from ansible-playbook and reads its private key every time, I don't know that we can prevent a nefarious process from doing the same. | 14:14 |
SpamapS | (we could with SELinux or AppArmor, because we could say only ssh is allowed to read the key) | 14:14 |
pabelanger | is it possible to list a private key from ssh-agent? | 14:15 |
SpamapS | So, a single executor ssh-agent buys us something, as it will keep the SSH key in RAM and not sprayed out into the various work dirs, so prevents accidental key material propagation. But it doesn't protect us from ansible-playbook escapers. | 14:16 |
SpamapS | Yes, ssh-agent's protocol is to request a private key and get it back over the socket. | 14:16 |
SpamapS | oy, have to AFK now to get kids off to school | 14:16 |
*** isaacb has quit IRC | 14:17 | |
pabelanger | Right, that's the first part I am concerned about right now. I'd like to minimize the copying of our private keys around, if possible. I think clarkb's suggestion of ssh-agent gets us that | 14:17 |
SpamapS | I'm still concerned that the private SSH key will be a single ansible-playbook breakout away, and thus must if nothing else be rotated aggressively. | 14:17 |
pabelanger | as for breaking out of ansible-playbook, I haven't considered testing that yet | 14:17 |
SpamapS | The only reason we're bwrapping is we think it's possible. | 14:17 |
*** openstackgerrit has quit IRC | 14:18 | |
*** isaacb has joined #zuul | 14:18 | |
jeblair | SpamapS: i thought the ssh-agent protocol was signing operations, not key data transfer | 14:19 |
pabelanger | right, it also stops playbooks from looking into /home/zuul on executor too, which contains our sensitive data right now | 14:19 |
jeblair | so if you use ssh-agent, you don't request a copy of the key, but rather, request that the key be used to sign something | 14:19 |
jeblair | if that's the case, we don't have to worry as much about losing the key material in a breakout, however, that job would still be able to use the key to log into any host for the duration of the job. | 14:21 |
jeblair | https://tools.ietf.org/id/draft-miller-ssh-agent-00.html#rfc.section.4.5 | 14:22 |
pabelanger | Ya, having somebody break out and use key is something I haven't looked into yet. I was mostly wanting to stop somebody writing a cat ~/.ssh/id_rsa playbook right now, and think ssh-agent might provide protection against that | 14:23 |
*** isaacb has quit IRC | 14:24 | |
jeblair | pabelanger: sure, but the cat playbook only works in the case of a breakout too | 14:24 |
pabelanger | or somebody missed a trusted playbook change to do that | 14:25 |
pabelanger | but, agree | 14:25 |
jeblair | pabelanger: trusted playbooks will not run with bwrap, so this won't stop that. | 14:26 |
pabelanger | I'm still trying to understand why mount private key into bwrap at all, over using ssh-agent outside | 14:26 |
pabelanger | jeblair: ah, right. keep forgetting that | 14:27 |
pabelanger | if we moved to ssh-agent on executor, we could keep ssh_private_key contents outside of zuul in theory. Not including the discussion from last night about having zuul run ssh-add commands | 14:29 |
jeblair | pabelanger: there will be multiple keys in the future, some of them managed by zuul, so it's likely that we would need multiple ssh agents, and i think zuul would have to start them. | 14:31 |
dmsimard | fungi, pabelanger: so, the reason why I'm interested in booting from volume with nodepool is because the cloud I'm working with has a huge disparity in performance between ephemeral (local spindles) and volumes (all ssd ceph) | 14:47 |
dmsimard | some jobs can land in ephemeral just fine (say, simple tox jobs) but I'd rather have our "intensive" jobs to use volumes (i.e, openstack installation/integration tests) | 14:48 |
pabelanger | I don't see why it couldn't be used, I think shade supports it out of the box | 14:49 |
dmsimard | I would guess so since ansible modules can use boot from volume | 14:51 |
Shrews | shade supports boot from volume, nodepool does not so that's where the work would be | 14:51 |
fungi | dmsimard: is there any openstack feature allowing you to pre-image a volume? or do you need a separate step to boot from image, attach the seed volume, overwrite it with the bootable image, detach it and then clone that for subsequent booting? | 14:51 |
pabelanger | Ya, create_server takes a few parameters, 50GB volume seems to be default right now. I think it would be a matter of exposing some configuration settings into 'images' section to indicate we'd want to use the image from a volume | 14:51 |
pabelanger | Shrews: agree | 14:52 |
clarkb | yes, ssh-agent doesn't expose the key. If bubblewrap doesn't protect access to memory we lose regardless | 14:52 |
clarkb | jeblair: re project names, maybe have a project name object whose default representation/text is the canonical version. But include methods to get the hostname/short_name/name/protocol? | 15:04 |
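A hypothetical sketch of clarkb's suggestion, with invented class and attribute names (this is not Zuul's actual model): one object that stringifies to the canonical name but still exposes the individual parts.

```python
class ProjectName:
    """Hypothetical holder for the different forms of a project's name."""

    def __init__(self, hostname, name):
        self.hostname = hostname          # e.g. 'github.com'
        self.name = name                  # e.g. 'BonnyCI/sandbox'

    @property
    def canonical_name(self):
        return '%s/%s' % (self.hostname, self.name)

    @property
    def short_name(self):
        return self.name.split('/')[-1]   # e.g. 'sandbox'

    def __str__(self):
        # Default text form is the canonical name, so a job can build
        # its checkout path without knowing about the hostname prefix.
        return self.canonical_name


# e.g. src_dir = os.path.join(workspace_root, 'src', str(project))
```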
*** openstackgerrit has joined #zuul | 15:06 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Increase file permissions around generate keys https://review.openstack.org/461218 | 15:06 |
pabelanger | jeblair: any objection to updating zuulv3-dev.o.o with new openstack-zuul-jobs / openstack-zuul-roles config changes? | 15:08 |
*** rcarrillocruz has quit IRC | 15:16 | |
jeblair | pabelanger: please do | 15:17 |
*** rcarrillocruz has joined #zuul | 15:18 | |
pabelanger | nice, reload worked | 15:23 |
mordred | fungi, Shrews, pabelanger, dmsimard: I _believe_ (would have to double-check) that we have options for create_server that allow booting from a pre-created volume, as well as booting from an image but on to a volume. there is also a shade call which is "please make me a volume from this image" | 15:24 |
fungi | oh, very nice! | 15:25 |
mordred | (we honestly should think about using such things for our long-lived servers, fwiw) | 15:25 |
fungi | so in theory the logic within nodepool could remain basically the same if you have "please make me a volume from this image" followed by "please boot from this volume" | 15:26 |
mordred | yes | 15:26 |
mordred | in fact, it's just a boolean flag to shade | 15:26 |
mordred | so it can be _very_ little code in nodepool | 15:26 |
fungi | configuration wise you still have images you're maintaining, so just need somewhere (new toggle in nodepool config?) for "and i want it on a volume thxbye" | 15:27 |
mordred | if you say "boot_from_volume=True" - you then have 2 options - provide a volume to boot from in "boot_volume" - or or provide an image to turn into a volume in the normal image param | 15:27 |
mordred | you also want "volume_size" | 15:27 |
mordred | which is in G | 15:27 |
mordred | (it defaults, as pabelanger mentions, to 50Gb) | 15:27 |
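The shade options mordred lists map onto a create_server call along these lines; the cloud name, image, flavor and size below are made-up values, and the 50 GB default for volume_size is the one pabelanger mentions above.

```python
import shade

# Illustrative only: cloud name, image, flavor and size are made up.
cloud = shade.openstack_cloud(cloud='my-cloud')

# Boot onto a volume created from an image:
server = cloud.create_server(
    name='test-node',
    image='ubuntu-xenial',       # image shade converts into the boot volume
    flavor='general-purpose-8',
    boot_from_volume=True,       # root filesystem on a volume, not local disk
    volume_size=80,              # in GB; shade defaults to 50 when omitted
    wait=True,
)

# Or boot from an already-prepared volume instead of an image:
# cloud.create_server(name='test-node', flavor='general-purpose-8',
#                     boot_volume='my-prepared-volume', wait=True)
```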
fungi | ahh, yep right | 15:27 |
mordred | so maybe 2 new options - boot_from_volume and volume_size? | 15:28 |
fungi | and maybe volume type/flavor i guess if your provider has more than just a default one | 15:28 |
pabelanger | Ya, I think the hard part is new options to bike shed on | 15:28 |
pabelanger | hopefully we could also try this in nodepool dsvm job too | 15:28 |
fungi | like in rax we can choose between sata (default) and ssd (non-default) | 15:28 |
pabelanger | mordred: why for long lived servers? Not making the connection | 15:31 |
mordred | pabelanger: it allows us to pick the size of the root filesystem | 15:42 |
mordred | pabelanger: and also for our root filesystem to be in a cinder volume instead of just on a local disk - this might or might not be an improvement, depending on the server - but maybe it's worth investigating and thinking about if/when we should use it | 15:42 |
mordred | fungi: oh - excellent call on volume_type - that is, in fact, _not_ exposed in shade - but I believe it should be | 15:44 |
mordred | will need to read up on how block_device_mapping_v2 works for volume types | 15:44 |
clarkb | mordred: I mentioned it in -infra but apparently image metadata can control volume related things on boot, any idea if it can be used to have an image boot on volume? | 15:44 |
clarkb | mordred: beacuse if it can then you could do that in nodepool with no changes to any code, just config | 15:45 |
mordred | clarkb: I do not know - but I'll admit to being wary of making more use of magic glance metadata ... I don't believe any of them are actually documented | 15:45 |
fungi | certainly sounds convenient | 15:45 |
clarkb | mordred: I mean 95% of shade is doing undocumented things | 15:46 |
clarkb | :P | 15:46 |
mordred | :) | 15:46 |
clarkb | like figuring out ip addresses | 15:46 |
mordred | if we can learn how to trigger such a thing, that seems potentially fine - however it might also be nice to add the options | 15:46 |
mordred | so that nodepool users can just express "boot these using boot-from-volume" and not have to themselves learn BFV image metadata magic :) | 15:47 |
mordred | (also, if we do learn more about them, it might be worthwhile exposing them in some way as options somehow in shade so that it's easy for people to choose to do that themselves) | 15:47 |
pabelanger | mordred: got it, thanks | 15:48 |
clarkb | mordred: definitely, just thinking it might be doable now if people need it (but ya undocumented in general so who knows!?) | 15:50 |
*** isaacb has joined #zuul | 15:51 | |
SpamapS | jeblair: AH! TIL something :) | 15:57 |
SpamapS | My understanding was still based on ssh1 keys.. when was ssh2 created, 18 years ago? | 15:57 |
SpamapS | :-P | 15:57 |
mordred | SpamapS: that's still super new | 15:57 |
SpamapS | I really just hadn't thought about how ssh-agent works completely. | 15:58 |
SpamapS | https://github.com/openssh/openssh-portable/blob/master/PROTOCOL.agent has the whole thing spelled out nicely btw | 15:59 |
jeblair | yeah, that agrees with the ietf thing too | 15:59 |
SpamapS | so yes, ssh-agent exposed into the bwrap will limit an ansible escaper to using the key while they're running on the box. | 16:00 |
SpamapS | so changes to the current implementation are needed.. and I think 1 spec revision. | 16:03 |
SpamapS | spec should add a brief SSH key discussion | 16:03 |
SpamapS | and we still need to add the venv and now ssh agent to the implementation. Yes? | 16:04 |
jeblair | SpamapS: sounds about right | 16:07 |
* SpamapS is on it | 16:08 | |
SpamapS | pabelanger: have you started on any of that? | 16:09 |
pabelanger | SpamapS: no, I have POC'd ssh-agent locally but think we need to solve venv issue first. So we can test bwrap in gate | 16:10 |
pabelanger | currently head down, trying to get openstack-zuul-jobs setup | 16:11 |
SpamapS | pabelanger: k, I'm going to take them both on. | 16:11 |
eventingmonkey | pabelanger: Hey, SpamapS suggested I reach out to you about https://review.openstack.org/#/c/454396/3/? | 16:13 |
pabelanger | eventingmonkey: sure, what can I help with | 16:14 |
eventingmonkey | pabelanger: I'm looking for something I can help out with, and SpamapS told me to ping you about in here. | 16:16 |
pabelanger | eventingmonkey: sure, the idea of 454396, is we want to add playbooks to get code coverage for all our blocked lookup plugins here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/ansible/lookup?h=feature/zuulv3 Right now, the test is setup as an untrusted-project, which means we should report FAILURE if we try to use certain lookups. | 16:19 |
pabelanger | for example, lookup('passwd') will always fail | 16:19 |
SpamapS | pabelanger: is there already an example test written, or maybe one that asserts success that could be modified to test for fail? | 16:20 |
pabelanger | just what we have in 454396 right now | 16:20 |
* eventingmonkey relooks over 454396... | 16:21 | |
pabelanger | eventingmonkey: SpamapS: what I was thinking, we'd write playbooks to use the lookup function, register a variable, then assert it failed. I do something like that here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/tests/fixtures/config/ansible/git/common-config/playbooks/check-vars.yaml?h=feature/zuulv3 | 16:23 |
pabelanger | then, we can enhance the test, and validate we get the right failure message in back from zuul | 16:23 |
SpamapS | pabelanger: k. I just want eventingmonkey to have a framework to hang his tests off of, because the test fakes can be a little confusing at first. | 16:24 |
pabelanger | sure, I can add some code to 454396 on how I envision it working. Ensure we are all happy with that, then iterate on it | 16:26 |
jeblair | what if we proceeded with what was already in 454396, and then improved it with the variable registration later? | 16:27 |
pabelanger | that works for me, in that case eventingmonkey should be okay to keep iterating on it | 16:29 |
eventingmonkey | pabelanger: cool, I'm down. | 16:30 |
eventingmonkey | pabelanger and SpamapS: What I'm not clear on is what it is you would like me to iterate on? I can look at it and see where I would go, but if you had something in mind already I'd rather not jump into the wrong thing... | 16:31 |
jeblair | eventingmonkey: that test covers a portion of the custom lookup plugins; we want to test all of them. | 16:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Move playbook out of zuul https://review.openstack.org/462210 | 16:31 |
jlk | o/ | 16:32 |
eventingmonkey | jeblair: Ah...I see, so working in the same area, just add tests for the other ones. Cool beans. | 16:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:35 |
*** isaacb_ has joined #zuul | 16:35 | |
*** isaacb has quit IRC | 16:39 | |
*** isaacb_ has quit IRC | 16:39 | |
jeblair | jlk: if you have a minute to weigh in on 461331, i'd love another opinion there. | 16:40 |
* jlk reads | 16:40 | |
* jeblair subscribes to openstack-zuul-jobs and updates dashboards | 16:43 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:44 |
jeblair | Zuul: Patch Set 2: Verified-1 (2017-05-03 16:25:54+0000) < Reply > | 16:45 |
jeblair | Unknown configuration error | 16:45 |
jeblair | we should track that down | 16:45 |
* jeblair looks in logs | 16:45 | |
pabelanger | I think it is job permissions | 16:45 |
jeblair | http://paste.openstack.org/show/608728/ | 16:46 |
jeblair | we should make sure that error propagates back to gerrit | 16:46 |
pabelanger | yes, I am struggling to setup dependencies between untrusted projects ATM | 16:46 |
pabelanger | learning how it all works | 16:47 |
jeblair | pabelanger: i think it might be very hard to move these between repos | 16:47 |
*** harlowja has joined #zuul | 16:48 | |
jeblair | because you can't have jobs in two different repos with the same name | 16:48 |
pabelanger | should I not expect depends-on to work properly in this case? | 16:48 |
jeblair | pabelanger: you should. and the second change should work. but the first change won't. | 16:49 |
jeblair | pabelanger: the first change, which adds a second copy of the jobs, is invalid configuration. the second job, which removes the first copy of the jobs, should be valid config. | 16:50 |
jeblair | pabelanger: (assuming the second depends-on the first) | 16:50 |
pabelanger | okay, great | 16:50 |
jeblair | but that sequence will never merge with one-way dependencies (if we did two-way cross-repo-deps, we could merge them both simultaneously and that would be correct) | 16:51 |
jeblair | pabelanger: i have an idea though | 16:51 |
*** hashar has quit IRC | 16:51 | |
jlk | jeblair: commented. I'm on the same fence as you. | 16:51 |
pabelanger | jeblair: please share :) | 16:51 |
pabelanger | also, just seen the following: | 16:52 |
pabelanger | 2017-05-03 16:51:06,879 DEBUG zuul.IndependentPipelineManager: <QueueItem 0x7efe18089590 for <Change 0x7efdfeea4ed0 462214,1> in check> is a failing item because ['is a non-live item with no items behind'] | 16:52 |
jeblair | pabelanger: that's a pretty normal message | 16:52 |
jeblair | pabelanger: that's a dependent change being dequeued after the depending change reported | 16:53 |
jeblair | pabelanger: (recall that dependent changes are added to the queue, but aren't acted upon) | 16:54 |
pabelanger | k | 16:54 |
pabelanger | I'm having some trouble understanding why 462212 is not running jobs on zuulv3 ATM | 16:54 |
pabelanger | I don't think openstack-infra/zuul can see openstack-infra/openstack-zuul-jobs right now | 16:55 |
pabelanger | Oh | 16:57 |
pabelanger | I think I know | 16:57 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:58 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 16:59 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 17:01 |
jeblair | pabelanger: you keep shadowing those jobs. it's not going to work. | 17:03 |
pabelanger | jeblair: maybe I am confused. How do I get zuul to run tox-linters again? | 17:03 |
jeblair | pabelanger: it's what i said above -- you can not have the same job defined in two different repos. | 17:04 |
*** Cibo_ has joined #zuul | 17:04 | |
jeblair | pabelanger: maybe we should have a phone call? | 17:04 |
pabelanger | yes, that would help | 17:05 |
pabelanger | give me 1 moment | 17:05 |
clarkb | couldn't you use a new name for the moved stuff, remove the old stuff, then rename to name you want? | 17:05 |
jeblair | clarkb: yes, that's a possibility | 17:05 |
pabelanger | okay, back. should we jump into pbx.o.o? | 17:07 |
jeblair | yep. let's go with room 6812 | 17:07 |
jeblair | anyone is welcome to join: https://wiki.openstack.org/wiki/Infrastructure/Conferencing | 17:08 |
SpamapS | pabelanger: btw, the bubblewrap implementation doesn't bind mount in the user's home dir | 17:21 |
SpamapS | pabelanger: --dir just creates that dir in the chroot. | 17:21 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: WIP: Add jobs back to .zuul.yaml https://review.openstack.org/462212 | 17:24 |
pabelanger | SpamapS: okay cool. I was confused | 17:24 |
jeblair | quick call summary: we're going to proceed with: 1) a change that removes the jobs from zuul and switches zuul to noop; 2) a change that adds them to openstack-role-jobs and depends-on (1); 3) a change to start running the newly moved jobs on zuul that depends-on (2) | 17:24 |
jeblair | we should end up with +1s all the way on that. but we may have bugs. :) | 17:25 |
pabelanger | okay, I think 462212,6 is correct now | 17:26 |
pabelanger | I see zuulv3 reporting no jobs ATM | 17:26 |
pabelanger | Also, I wonder if we should also bubble up that message to gerrit | 17:27 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 17:27 |
SpamapS | pabelanger: ^ that should now copy in the venv | 17:27 |
SpamapS | well, ro-bind it | 17:27 |
*** Cibo_ has quit IRC | 17:29 | |
jeblair | pabelanger: no jobs isn't normally an error, so probably shouldn't report to gerrit | 17:29 |
jeblair | pabelanger: i'm going to go work on a test case for 'dynamically adding a job in one repo and dynamically running it from another one' and see if i can reproduce this | 17:32 |
SpamapS | so with the SSH agent. I'm not sure what to do if SSH_AUTH_SOCK is already set. | 17:32 |
pabelanger | jeblair: ok | 17:32 |
SpamapS | do I just use it? save it, execute stuff, and restore it? | 17:32 |
clarkb | i would pass it through? that seems like it would be least surprise method | 17:32 |
jeblair | SpamapS: force it to our value? | 17:33 |
jeblair | i think zuul has to control the ssh agent(s), otherwise we won't be able to manage having multiple keys later | 17:33 |
jeblair | we're just talking about setting the env var with bubblewrap, right? so we're not altering zuul's environment? | 17:34 |
jlk | jeblair: in gertty, when in a diff view, when I'm trying to comment on a line it keeps opening the comment on the left side (old code) rather than the right side (new code). What changes the focus there? | 17:36 |
jeblair | jlk: tab | 17:37 |
jlk | oh arrow key too | 17:37 |
jeblair | oh yep | 17:37 |
jlk | why didn't I try that already | 17:37 |
jeblair | it should also be persistent once you switch | 17:37 |
jlk | and is it normal that my comment box opens on top of somebody else's comment, but once saved it'll show _below_ the comment? | 17:38 |
jeblair | yeah. not my preferred behavior, but programming is hard. | 17:38 |
jeblair | i might know enough about urwid to fix both of those things now. :) | 17:38 |
jlk | haha, no worries | 17:39 |
pabelanger | If SSH_AUTH_SOCK was already set up, then zuul also added a key, you would get 2 keys I would assume? | 17:39 |
SpamapS | jeblair: right | 17:39 |
SpamapS | currently we don't set env= when calling bwrap | 17:40 |
SpamapS | but perhaps we should be very explicit about what gets passed in there | 17:40 |
* SpamapS ponders while brewing mmoar coffee | 17:40 | |
jlk | jamielennox: I implemented fixes for your feedback on the github patches. I pushed up the result to my 'github-v3' branch in j2sol/zuul, but I'm not going to push up to gerrit until we get more feedback, since it'd be a lot of noise. | 17:44 |
pabelanger | SpamapS: jeblair: so, it sounds like we might be okay with https://review.openstack.org/#/c/331148/ for initializing ssh-agent and ssh-add / ssh-remove of keys when executor starts | 17:44 |
jeblair | SpamapS: pabelanger noted an env variable option to bwrap -- maybe that's what we need to use instead of the env= argument to the subprocess call? | 17:44 |
jeblair | or maybe we need both? | 17:44 |
jeblair | pabelanger: something like that, maybe -- but i think it needs to happen entirely inside the executor, because i think it will eventually need to start and stop multiple agents. | 17:46 |
pabelanger | yes, I used --setenv SSH_AUTH_SOCK $SSH_AUTH_SOCK in testing last night, however should maybe confirm if that is needed. I'm not sure which env vars are passed to bwrap by default | 17:46 |
jeblair | pabelanger: (so, it shouldn't touch any files in zuul/cmd/; only zuul/executor/server) | 17:46 |
pabelanger | k | 17:47 |
pabelanger | make sense | 17:47 |
SpamapS | I've not looked at the env var option to bwrap | 17:48 |
SpamapS | let me see | 17:48 |
*** Cibo_ has joined #zuul | 17:48 | |
SpamapS | ah ok, those are just unset/set options. No real difference I think than just env= to Popen | 17:49 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 17:50 |
SpamapS | pabelanger: bwrap gets zuul's env, and thus, ansible-playbook gets the same env | 17:50 |
SpamapS | which is why I say, maybe we should black/whitelist envvars | 17:51 |
pabelanger | yes, I would be okay with that. | 17:51 |
*** Cibo_ has quit IRC | 17:53 | |
pabelanger | jeblair: SpamapS: the good news, is paramiko has an SSH agent interface which we can likely use | 17:53 |
SpamapS | hm | 17:54 |
SpamapS | I'll take a look | 17:54 |
pabelanger | hmm, at first glance it seems we cannot add a key to the paramiko SSH agent, just read keys that are already set up | 17:56 |
clarkb | pabelanger: its just for consuming keys out of an agent? | 17:57 |
pabelanger | clarkb: looks like it | 17:57 |
pabelanger | guess we could push that upstream | 17:57 |
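For reference, the read/sign-only shape of paramiko's agent interface that pabelanger is describing looks roughly like this (it assumes SSH_AUTH_SOCK points at a running agent): the agent signs data on the client's behalf but never returns private key material, and there is no call for adding keys.

```python
import paramiko

# Assumes SSH_AUTH_SOCK in the environment points at a running ssh-agent.
agent = paramiko.Agent()

for key in agent.get_keys():       # AgentKey objects: public parts only
    print(key.get_name(), key.get_fingerprint().hex())

    # Private-key operations are delegated to the agent over its socket;
    # the key bytes themselves are never returned to this process.
    signature = key.sign_ssh_data(b'data to sign')

agent.close()

# There is no paramiko call to add a key to the agent -- loading keys
# still means shelling out to ssh-add (or patching paramiko upstream).
```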
jeblair | pabelanger: hrm, a 2-change series to add jobs in one repo and use them in another passes tests. i'm going to extend my test to do the 3-change series we're doing now. | 17:58 |
pabelanger | ack | 18:02 |
SpamapS | pabelanger: so yeah, I think if we move the Evseev change into the executor specifically, and pass that agent's envvars into bwrap via Popen's env= .. I think we're good. | 18:07 |
SpamapS | we can talk about filtering the env later. | 18:07 |
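In rough Python, the idea is to stop inheriting the executor's environment and instead hand the wrapped process an explicit one, with only the agent's socket injected. The whitelist below is a guess at a minimal set, not an agreed-upon list, and the helper names are made up.

```python
import os
import subprocess

# Illustrative whitelist of variables worth forwarding from the executor;
# not a vetted or agreed-upon set.
ENV_WHITELIST = ('PATH', 'HOME', 'LANG', 'LC_ALL')


def build_job_env(ssh_auth_sock):
    env = {k: os.environ[k] for k in ENV_WHITELIST if k in os.environ}
    # Point ssh inside the sandbox at the per-build agent's socket
    # (the socket itself must also be bind-mounted into the sandbox).
    env['SSH_AUTH_SOCK'] = ssh_auth_sock
    return env


def run_playbook(bwrap_cmd, playbook_cmd, ssh_auth_sock):
    # Passing env= replaces the inherited environment entirely, so nothing
    # from the executor's environment leaks in unless whitelisted above.
    return subprocess.Popen(bwrap_cmd + playbook_cmd,
                            env=build_job_env(ssh_auth_sock))
```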
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:48 |
jeblair | pabelanger: ^ i think that's the bug that's causing no jobs to run. the key is the branch -- the openstack-zuul-jobs definitions are getting implicit 'master' branch matchers, so they aren't running on feature/zuulv3. | 18:49 |
pabelanger | Ah interesting | 18:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:50 |
jeblair | git add ^ | 18:50 |
jlk | that's quite the commit message | 18:50 |
*** Cibo_ has joined #zuul | 18:52 | |
jeblair | it's longer than the change (but not the test) | 18:53 |
jeblair | wow, i'm going to python that a bit better, hang on. | 18:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 18:54 |
jeblair | there goes 3 ridiculous lines | 18:54 |
jlk | oof, okay | 18:55 |
jlk | re-reviews | 18:55 |
jeblair | sorry, it's hot here. | 18:55 |
jlk | hahah | 18:55 |
*** tobiash_ has joined #zuul | 19:01 | |
jeblair | tobiash_: have you seen https://review.openstack.org/461509 ? | 19:02 |
tobiash_ | jeblair: not yet, title sounds interesting :) | 19:03 |
tobiash_ | will have a look asap | 19:04 |
tobiash_ | jeblair: I encountered a bug with v3 where it reports with 'Starting check jobs' for every change in the gerrit instance | 19:06 |
tobiash_ | https://review.openstack.org/#/c/455711 | 19:06 |
tobiash_ | that's my try to fix that | 19:06 |
jeblair | tobiash_: ah thanks, that slipped my attention | 19:06 |
pabelanger | SpamapS: here is the bwrap command running on my local zuul env, but don't think we have the right mounts setup: http://paste.openstack.org/raw/608739/ | 19:11 |
pabelanger | ansible-playbook runs properly now, but missing zuul.ansible libraries | 19:12 |
pabelanger | check ansible.cfg paths now | 19:13 |
Shrews | morgan, mordred, jeblair: why were the zuul py3 changes (325661, 327436, 327459) all abandoned? | 19:14 |
jeblair | Shrews: maybe because of the gear work? | 19:16 |
pabelanger | I think I see the issue | 19:18 |
*** Cibo_ has quit IRC | 19:20 | |
jeblair | Shrews: SpamapS has changes in progress for gear | 19:20 |
tobiash_ | jeblair: the spec sounds great, thanks for writing this :) | 19:24 |
*** hashar has joined #zuul | 19:26 | |
*** tobiash_ has quit IRC | 19:29 | |
Shrews | jeblair: good spec. i think dkranz_ would probably be interested in that, too | 19:33 |
jlk | it's too nice to be inside any more. I'm out for a few hours of cycling. | 20:02 |
dkranz_ | Shrews: I'll check it out. | 20:10 |
jeblair | pabelanger, jlk: i think we need something close to the old implied branch-matcher behavior for the project-pipeline variants. in other words, the change i wrote makes sense for job definitions, but i think if you stuck a project definition inside of a project repo, you would expect implied branch matchers for that... i'm working on a new revision | 20:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 20:25 |
pabelanger | Shrews: jeblair: some progress on bwrap^ I think we need to mount the full git repo in bwrap, because we pip install -e for tox.ini | 20:26 |
pabelanger | Shrews: sorry | 20:26 |
pabelanger | SpamapS: ^ | 20:26 |
pabelanger | the blocker now, is untrusted and Executing local code is prohibited | 20:26 |
jeblair | pabelanger: hrm. that sounds like it could be complex -- how would we know whether (and how) to do that? | 20:31 |
jeblair | (keeping in mind that the goal here isn't just to handle zuul's unit tests, but to also handle zuul being installed in a virtualenv in production. even an editable virtualenv, i guess) | 20:31 |
pabelanger | jeblair: so, I am using the copy module now | 20:40 |
pabelanger | http://paste.openstack.org/show/608757/ | 20:40 |
pabelanger | but trying to determine if that is a valid use case or not | 20:40 |
pabelanger | today, that actually breaks | 20:41 |
pabelanger | IIRC, we said it was okay for untrusted jobs to write into the executor job dir | 20:42 |
jeblair | pabelanger: what's the error? | 20:44 |
pabelanger | http://paste.openstack.org/show/608758/ | 20:45 |
pabelanger | src is none | 20:45 |
pabelanger | happy to fix, but want to confirm that is actually valid code for untrusted executor | 20:46 |
pabelanger | untrusted job on executor* | 20:46 |
jeblair | pabelanger: yep, i think so | 20:46 |
pabelanger | k, let me work on fixing it | 20:48 |
SpamapS | pabelanger: I thought work_dir would have all the git repos in it. | 20:57 |
*** dkranz_ has quit IRC | 20:58 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add untrusted-projects ansible test https://review.openstack.org/461881 | 21:02 |
pabelanger | okay, that should pass, and use bwrap properly. Comments most welcome!^ | 21:03 |
SpamapS | pabelanger: left a comment inline. | 21:28 |
SpamapS | pabelanger: Little confused by ../../.. | 21:28 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Be less agressive with implied branch matchers https://review.openstack.org/462268 | 21:28 |
jeblair | i'm really sorry about that commit message | 21:28 |
jeblair | it's kind of "but wait! there's more!" | 21:29 |
SpamapS | approve now and you'll get not one, but TWO changes | 21:34 |
jeblair | our layaway plan lets you buy now and pay down your technical debt over the lifetime of the program! | 21:35 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users https://review.openstack.org/462300 | 21:51 |
jeblair | pabelanger: i think those two changes take care of all the issues we've seen with your move. | 21:52 |
jeblair | i think the first one is done and passes tests now, so we should probably see about merging it asap to proceed with the job move. i think the second one is good too, just waiting on tests now. | 21:53 |
SpamapS | well this is stupid | 21:54 |
SpamapS | ssh-add -x locks an agent | 21:54 |
SpamapS | but it won't accept stdin redirects as the password | 21:54 |
jeblair | SpamapS: is locking feasible for us? i'd assume ansible might start new ssh connections at any time... | 21:54 |
SpamapS | jeblair: I was going to look at whether we could use the controlsocket functionality | 21:55 |
SpamapS | jeblair: so establish the connection and configure ssh to use it, then lock | 21:56 |
SpamapS | but that's a bit of a stretch | 21:56 |
jeblair | SpamapS: interesting... ansible does use controlsocket on its own. maybe it does the right thing if there is one already, but i don't know. | 21:57 |
SpamapS | actually, locking just prevents ssh-add operations. I think. | 21:57 |
SpamapS | I thought anyway | 21:58 |
jeblair | SpamapS: according to the rfc (draft), locking should "at least" suspend private key operations. so i'd expect it to not sign anything. but maybe also not accept additions. | 22:00 |
*** fbouliane has joined #zuul | 22:01 | |
SpamapS | the docs are not awesome | 22:01 |
SpamapS | for openssh | 22:01 |
SpamapS | While locked, the agent will refuse all requests except | 22:03 |
SpamapS | SSH_AGENTC_UNLOCK, SSH_AGENTC_REQUEST_RSA_IDENTITIES and | 22:03 |
SpamapS | SSH2_AGENTC_REQUEST_IDENTITIES. | 22:03 |
SpamapS | jeblair: the concern is that the escaper can forward the agent to the test node, and then poke stuff into the agent | 22:04 |
SpamapS | which suggests that agent-per-bwrap may be necessary | 22:04 |
*** jkilpatr has quit IRC | 22:05 | |
jeblair | SpamapS: *nod* | 22:06 |
clarkb | the main issue being removing a key that subsequent jobs would depend on? | 22:06 |
jeblair | clarkb: that, or adding a bunch of useless keys (dos?) | 22:07 |
SpamapS | both | 22:07 |
SpamapS | also keeping it very busy | 22:07 |
SpamapS | we can cgroup the bwraps and they'll share CPU fairly. But there would be only one ssh-agent. | 22:08 |
SpamapS | I think in the scheme of things, ssh-agent with one key loaded is going to be quite a bit smaller than ansible-playbook | 22:09 |
jeblair | yeah, doesn't seem troubling. | 22:13 |
*** cmurphy_ has joined #zuul | 22:24 | |
*** hashar_ has joined #zuul | 22:26 | |
*** SotK_ has joined #zuul | 22:26 | |
*** cmurphy has quit IRC | 22:26 | |
*** cmurphy_ is now known as cmurphy | 22:26 | |
*** nibz has joined #zuul | 22:27 | |
*** jesusaur has quit IRC | 22:28 | |
*** SotK has quit IRC | 22:28 | |
*** nibalizer has quit IRC | 22:28 | |
*** hashar has quit IRC | 22:28 | |
*** harlowja has quit IRC | 22:28 | |
*** jesusaur has joined #zuul | 22:29 | |
*** nibz is now known as nibalizer | 22:31 | |
jamielennox | so much scroll | 22:32 |
jamielennox | is it possible to do per-run ssh keys? | 22:32 |
jamielennox | so the image or cloud-init gets baked with an ssh key for a trusted zuul process, be that executor-trusted or zuul elsewhere | 22:33 |
jamielennox | it generates a new ssh key there, copies public to the job nodes, and private into the executor | 22:34 |
*** jkilpatr has joined #zuul | 22:39 | |
pabelanger | SpamapS: replied to your comment | 22:45 |
pabelanger | jeblair: ack, looking now | 22:46 |
jeblair | jamielennox: interesting idea; though i think the easier way to do that would just be to have nodepool generate the keys for every node, which is possible but not something i'm excited about (it may be easier, but it's very difficult -- key management in openstack is difficult to get right). neither approach maps well either to nodepool static nodes or zuul bypassing nodepool to log into machines directly with project- or tenant-specific keys. | 22:51 |
pabelanger | ya, that is something we talked about last night too. Wen | 22:53 |
jamielennox | that would require nodepool having to log into the new nodes though right? | 22:53 |
pabelanger | we'd need to update glean too | 22:53 |
jamielennox | like you can't do an arbitrary "put this key on this vm" in openstack, you have to register it with nova (AFAIK) | 22:54 |
jamielennox | which sucks | 22:54 |
clarkb | jamielennox: yup | 22:54 |
jeblair | jamielennox: right, which means making boatloads of keys. | 22:54 |
jeblair | possibly more than we have quota for | 22:54 |
jamielennox | so you'd still be in a position where something with the starting privileged key has to log in to the box and replace that key with a per-job generated one | 22:55 |
jeblair | (you can have 1000 servers. and 10 keys. have fun.) | 22:55 |
jeblair | jamielennox: well, only if we can't use nova. | 22:55 |
jamielennox | jeblair: having nodepool maintain and rotate X keys in nova sounds horrible | 22:56 |
jamielennox | so given complete freedom on this (and not having many deployments to worry about) the ideal for me would be a pam module out to hashicorp vault | 22:57 |
* jamielennox would need to double check you can do ssh keys via pam, but the same daemon on box idea | 22:57 | |
jeblair | jamielennox: yes, it was when we were doing one key per day for image builds. but at this point, we're pretty good at working around nova leaking things, so the only thing (other than just not wanting to do it because it's so painful) that would technically block it is the key quota issue. | 22:58 |
jamielennox | but vault (or similar) is not something you can put a hard dep on | 22:58 |
SpamapS | I don't think we need key per node. | 22:58 |
SpamapS | But making keys short lifetime is a good idea. | 22:58 |
clarkb | jamielennox: there are ldap things but require changes to sshd? | 22:58 |
clarkb | I think rhel may carry the patches | 22:59 |
jamielennox | SpamapS: i'm happy with short for now, but there are some easily exploitable things in our deploy atm, like test nodes come up with the same keys on basically sequential ips so malicious actor can easily pre-hose a few waiting boxes | 22:59 |
jamielennox | not a huge problem for now, | 23:00 |
jamielennox | a big problem if you want to ever entertain private jobs in tenants | 23:00 |
jamielennox | clarkb: i think sssd can do it | 23:00 |
pabelanger | another idea, since zuul is using a http server for job secrets, is add ssh-import-id support to glean. Then have zuul worker fetch data from zuul on boot. | 23:00 |
SpamapS | Key per tenant is one we can do too. | 23:00 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Store initial repo state in the merger https://review.openstack.org/461176 | 23:01 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use previously stored repo state on executor https://review.openstack.org/461177 | 23:01 |
pabelanger | IIRC, cloud-init supports that, from github URLs | 23:01 |
SpamapS | yeah, cloud-init does github or lp users | 23:01 |
jeblair | jamielennox: the key should not be available to untrusted projects, so that should not be easily exploitable. | 23:02 |
clarkb | SpamapS: pabelanger I wouldn't rely on external services that way | 23:02 |
clarkb | they are far too unreliable | 23:02 |
SpamapS | clarkb: it would be zuul | 23:02 |
SpamapS | not an external service | 23:02 |
jeblair | jamielennox: (the work SpamapS is doing is still in the realm of protecting from an escape from ansible) | 23:03 |
jamielennox | jeblair: in which model? the key has to be present for ansible to ssh even in the untrusted phase | 23:03 |
SpamapS | and you don't really need ssh-import-id support. just curl http://{{zuul}}/{{key_name}} >> /root/.ssh/authorized_keys | 23:03 |
clarkb | SpamapS: zuul maybe external to the cloud? | 23:03 |
jeblair | SpamapS: we can't rely on zuul being accessible from the nodes | 23:03 |
jeblair | right that | 23:04 |
SpamapS | it's true, we did fix that | 23:04 |
SpamapS | so let's not break it again :) | 23:04 |
pabelanger | clarkb: sure, but one way it works today. | 23:04 |
jamielennox | SpamapS: right, that was where i was starting because you don't need to execute that from cloud-init/glean, you can do that from the trusted playbooks to set up a key for the untrusted components | 23:04 |
SpamapS | I think we just have to live with the DoS potential of an ansible-playbook escaper. | 23:05 |
SpamapS | Unless we do key-per-test-node, which is quite expensive. | 23:05 |
*** hashar_ has quit IRC | 23:05 | |
jeblair | SpamapS: if you want to use zuul to drive CD, getting the key is about as bad as things can get though. so i wouldn't characterize it as a dos potential. i'd characterize it as privilege escalation. | 23:06 |
SpamapS | IMO CD would never be untrusted. | 23:07 |
SpamapS | In fact that's where I think key-per-tenant makes sense. | 23:07 |
jeblair | (unless your CD key is a different key, which is the plan in the spec) | 23:07 |
jeblair | SpamapS: yeah, that's what the spec describes, is a per-tenant (and also per-project) key. | 23:07 |
clarkb | having an agent would protect the key though righr? | 23:07 |
SpamapS | clarkb: if they escape, they get to use it to ssh to things while they're escaped | 23:08 |
SpamapS | and once they ssh somewhere, they can agent forward | 23:08 |
pabelanger | trying to catch up, are we saying ssh-agent might be an issue? | 23:08 |
clarkb | right which should be same test env? | 23:08 |
jeblair | SpamapS, clarkb: so assuming we have agents set up correctly, an exposure of the nodepool key doesn't necessarily mean an exposure of the CD key, which is good. but we just need to keep that case in mind. | 23:08 |
SpamapS | and then from there.. as long as they are still connected and running.. they're ssh'ing wherever they can | 23:08 |
pabelanger | jeblair: yes, that is my thinking too | 23:08 |
SpamapS | so we have to think in job timeout windows | 23:09 |
jeblair | still, if you grab the nodepool key, you could start sshing into nodes that use secrets and you may be able to steal the secrets | 23:09 |
SpamapS | If we let jobs go for 4 hours.. that's how long somebody can play with a key. | 23:09 |
SpamapS | that's nsaty | 23:10 |
SpamapS | nasty even | 23:10 |
SpamapS | So.. what if we did have the executor just do 'ssh-keygen -t rsa -b 1024 -f tmpdir/jobuuid ; ssh-agent ; ssh-add tmpdir/jobuuid ; ssh-copy-id node-address ; bwrap ansible-playbook' ? | 23:12 |
SpamapS | (in pseudo-commands...) | 23:12 |
SpamapS | basically just make a crappy key.. copy it up for this _one playbook run_ using the real key, and then only expose the crappy key to untrusted execution. | 23:13 |
jeblair | i think it would work for dynamic nodes; needs something to cleanup static nodes though | 23:14 |
jeblair | SpamapS, jamielennox: the "cleanup" job idea from SpamapS yesterday may apply here | 23:14 |
jeblair | (and i think we could do this either hard-coded in the executor, or as a trusted pre-playbook as jamielennox suggested; the setup/cleanup cycle may be easier that way actually) | 23:15 |
jamielennox | SpamapS: that's basically what i meant by generating a key in the trusted pre-playbooks, though i wasn't thinking of the ssh-agent | 23:15 |
pabelanger | a trusted playbook seems like it would fix well | 23:15 |
SpamapS | I think this goes inside zuul | 23:16 |
jeblair | if we do trusted pre-playbook we have to remove the trusted key from the agent after installing the new one | 23:16 |
jamielennox | that ssh-agent is single use right? you just shut it down? | 23:17 |
SpamapS | And we can make the cleanup a finally: after bwrap | 23:17 |
SpamapS | jamielennox: yep | 23:17 |
jamielennox | i think it preferable to code this into the executor that playbook - but either works | 23:17 |
jamielennox | s/that/than | 23:17 |
SpamapS | it's foreground, you just read the sock path from it in and feed it to bwrap/ansible-playbook as env vars and then after kill it off | 23:18 |
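Put together, the per-build flow being sketched (start an agent, generate and load a throwaway key, hand only the socket path to the sandboxed ansible-playbook runs, kill the agent when the build ends) might look roughly like this on the executor. It is a sketch of the idea under discussion, not Zuul's implementation; copying the public key to the nodes and cleaning it up again would live in trusted setup/cleanup playbooks.

```python
import os
import re
import subprocess


class BuildSshAgent:
    """One ssh-agent and one throwaway keypair per build (sketch only)."""

    def start(self, jobdir):
        # Parse SSH_AUTH_SOCK and the agent pid out of `ssh-agent -s` output.
        out = subprocess.check_output(['ssh-agent', '-s']).decode()
        self.sock = re.search(r'SSH_AUTH_SOCK=([^;]+);', out).group(1)
        self.pid = re.search(r'SSH_AGENT_PID=(\d+);', out).group(1)

        # Generate a build-specific key inside the job dir and load it.
        self.key = os.path.join(jobdir, 'build_ssh_key')
        subprocess.check_call(['ssh-keygen', '-t', 'rsa', '-b', '2048',
                               '-N', '', '-f', self.key])
        subprocess.check_call(['ssh-add', self.key],
                              env=dict(os.environ, SSH_AUTH_SOCK=self.sock))
        # Feed this socket path to bwrap/ansible-playbook as SSH_AUTH_SOCK.
        return self.sock

    def stop(self):
        # Kill the agent; the key files disappear with the job dir cleanup.
        subprocess.check_call(['kill', self.pid])


# Hypothetical usage around a build:
#   agent = BuildSshAgent()
#   sock = agent.start('/var/lib/zuul/builds/<uuid>')
#   ... run the build's playbooks with SSH_AUTH_SOCK=sock inside bwrap ...
#   agent.stop()
```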
jamielennox | i don't see a situation where you would want to opt-out from this | 23:18 |
SpamapS | yeah, this feels like it should be in the executor | 23:18 |
jeblair | jamielennox: yeah, though i was imagining an agent per job, not per ansible-playbook invocation; so you'd use the agent with the nodepool key for the first pre-playbook, then remove the nodepool key, the subsequent playbooks use the new key | 23:18 |
jeblair | SpamapS: note that we only want to create one key per job, but we run ansible-playbook and bwrap several times within a job | 23:19 |
SpamapS | per job seems more sane | 23:19 |
pabelanger | So, if we do pre-playbook, don't we still have an issue if somebody breaks out of bwrap, they can still access our primary SSH key to access the node? | 23:19 |
SpamapS | jeblair: can set it up as a lazyload singleton | 23:19 |
SpamapS | that gets torn down after the job | 23:19 |
jeblair | pabelanger: that's why i say if we do that, the pre-playbook needs to remove the key from the agent | 23:19 |
jamielennox | so what is this key per job idea? | 23:19 |
pabelanger | jeblair: right, but how does the next job SSH into the next node? | 23:20 |
jamielennox | jeblair: i don't think it's removed from the agent, you would start a new ssh-agent with only the new key and put that socket through to the bwrap environment | 23:20 |
jeblair | pabelanger: agent is per-job too | 23:20 |
jeblair | jamielennox: i would expect something like 3-10 ansible invocations per job; a mix of bubblewrap and non-bubblewrap. that's why i was thinking agent-per-job. | 23:21 |
jamielennox | jeblair: oh sorry, my bad i mentally read that as key per project for some reason | 23:21 |
jamielennox | yes agent-per-job | 23:22 |
pabelanger | sorry, still confused :( Does the executor SSH into the zuul worker and add key B to its authorized_keys file? Then bwrap connects with key B to the zuul worker? | 23:22 |
SpamapS | Yeah ok, but I don't think playbooks would do this well. | 23:22 |
jamielennox | well, no agent-per-build/executor not per defined job | 23:22 |
jamielennox | well, no agent-per-build/run not per defined job | 23:22 |
SpamapS | yeah, agent+key per build | 23:22 |
jeblair | jamielennox: i stand corrected: agent per build. :) | 23:22 |
SpamapS | run? | 23:23 |
SpamapS | gah | 23:23 |
SpamapS | terms | 23:23 |
SpamapS | >: | 23:23 |
jamielennox | i'm still not exactly sure of all the terms | 23:23 |
jeblair | yeah, "build" is the right zuul vocab | 23:23 |
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Remove timestamp in devstack plugin https://review.openstack.org/462319 | 23:23 |
pabelanger | Can we maybe get an etherpad going? | 23:23 |
jeblair | (it's sometimes loose in the executor since we accept gearman "jobs" which tell us to perform a zuul "build") | 23:23 |
jeblair | https://etherpad.openstack.org/p/EIrCy4mhuR | 23:24 |
pabelanger | ty | 23:24 |
SpamapS | so the way I'm seeing it, we have the executor basically just say "I want to run ansible playbook" and we go "do I have an ssh agent yet? no. Ok... let's create one.. and spray the key out to the nodes (maybe that part is an internal-to-zuul playbook) and then run the playbook you asked for" | 23:24 |
SpamapS | and then when a build completes we tear that agent down and clean the key off the nodes. | 23:25 |
SpamapS | but I don't think we actually ever remove the "real" nodepool key. We just leave that agent running and use it for setup/teardown | 23:25 |
SpamapS | since it's never exposed to untrusted playbooks | 23:25 |
jamielennox | i don't think there's a teardown in any of these scenarios | 23:26 |
jamielennox | new keys are generated in jobdir that executor auto cleans up | 23:26 |
jamielennox | and ssh-agent exits at the end of the build | 23:26 |
pabelanger | right, if the real nodepool key is still on executor, I am struggling to understand why we go through all the steps for per-build ssh keys | 23:27 |
jamielennox | pabelanger: because it is super easy to have ansible-playbook exec something on the executor that just proxies the ssh-agent out to the rest of the world | 23:27 |
SpamapS | jamielennox: you have to remove the keys from static nodes | 23:28 |
jeblair | pabelanger: so that if someone escapes out of ansible, into the bubblewrap container, they can't use the nodepool key (whether on disk or in agent) to ssh to anything else other than the nodes for that build | 23:28 |
jamielennox | SpamapS: oh, static, right | 23:28 |
jeblair | i think i have the steps for hardcode-in-zuul in the etherpad | 23:28 |
jeblair | working on pre-playbook now | 23:28 |
SpamapS | jamielennox: and ssh-agent would only exit when you kill it, so it's still part of tearing things down. | 23:30 |
*** harlowja has joined #zuul | 23:31 | |
jeblair | okay, i think that's the plan for either hardcode or playbook variants | 23:31 |
* SpamapS looking | 23:31 | |
jamielennox | from ssh-agent man page: If a command line is given, this is executed as a subprocess of the agent. When the command dies, so does the agent. | 23:31 |
jamielennox | may or may not be possible if you have to add keys etc | 23:31 |
jeblair | jamielennox: we still have multiple playbook invocations to deal with though | 23:32 |
jamielennox | also not sure how that would play with bubblewrap | 23:32 |
jamielennox | yea, ok, ssh-agent cleanup | 23:32 |
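The subprocess mode quoted from the man page does still allow adding keys, since the agent sets SSH_AUTH_SOCK in the child's environment; a sketch, with illustrative paths:

```python
import subprocess

# ssh-agent <command> exits when <command> does, so cleanup is automatic.
# Keys can still be added from inside that command via a small shell wrapper.
subprocess.run([
    'ssh-agent', 'sh', '-c',
    'ssh-add /var/lib/zuul/builds/abc123/build_key && '
    'ansible-playbook /var/lib/zuul/builds/abc123/playbook.yaml',
], check=True)
```

As jeblair notes below, a build involves several ansible invocations, so all of them would have to run under one such wrapper for the self-cleanup to be useful.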
pabelanger | I'm leaning to trying it with pre-playbook first, see how it goes | 23:35 |
SpamapS | it's a pretty easy cleanup | 23:36 |
jeblair | SpamapS: i'm with you on having this be a batteries included thing (and not something that we want users to mess up). but i'm hesitant to hard-code assumptions about the target hosts into zuul. so i kind of like the pre-playbook idea since it lets us put it in the zuul standard-lib base job so that everyone can use it easily. but in the rare case that someone has to do something different in some circumstance, it's still site-customizable. | 23:36 |
SpamapS | build.agent.kill() | 23:36 |
jamielennox | pabelanger: can you establish an agent in a pre-playbook that will influence how the untrusted playbooks are run? | 23:36 |
SpamapS | jeblair: I hadn't thought about a base job, but yes, I think you're right. | 23:37 |
SpamapS | err | 23:37 |
SpamapS | base playbook? | 23:37 |
jamielennox | pabelanger: and have an always-run cleanup of the agent even on failure? | 23:37 |
jeblair | SpamapS: job ~= playbook in this situation | 23:37 |
SpamapS | k | 23:37 |
pabelanger | jamielennox: I think so. I'd need to test | 23:38 |
jamielennox | somehow you need to tell bwrap to plumb the socket into the untrusted environment and i'm not sure how you do that from a playbook | 23:38 |
jeblair | jamielennox: yeah, theoretically a trusted playbook could start a daemon, and a post playbook always runs, even on failure. but.. yes ^ that's one of the reasons we probably want zuul managing the agent, so it can set up the right bubblewrap cmdline | 23:39 |
jamielennox | (and mostly because i don't know how a lot of this stuff works in bwrap) | 23:39 |
pabelanger | we should be able to place the ssh-agent bind_address inside our job dir, right? | 23:39 |
jamielennox | because depending on how you put code into the bwrap environment, you might be able to pass through the actual domain socket as something like {jobdir}/ssh-agent | 23:40 |
pabelanger | then untrusted would look for it | 23:40 |
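A sketch of how the executor might plumb the jobdir (and the agent socket inside it) through to the bubblewrap environment; the bwrap options shown are real flags, but the bind set is deliberately incomplete and the wiring is illustrative, not the actual zuul bubblewrap driver:

```python
import subprocess

def bwrap_command(jobdir, auth_sock, inner_cmd):
    """Sketch: expose a per-build agent socket inside bubblewrap."""
    return [
        'bwrap',
        '--unshare-all',
        '--ro-bind', '/usr', '/usr',        # a real invocation needs more binds
        '--proc', '/proc',
        '--dev', '/dev',
        '--bind', jobdir, jobdir,           # makes the socket visible inside
        '--setenv', 'SSH_AUTH_SOCK', auth_sock,
    ] + inner_cmd

cmd = bwrap_command('/var/lib/zuul/builds/abc123',
                    '/var/lib/zuul/builds/abc123/ssh-agent.sock',
                    ['ansible-playbook', 'playbook.yaml'])
subprocess.run(cmd, check=True)
```

Bind-mounting the jobdir is enough for the unix socket to work inside the namespace, which is why zuul building the command line itself (rather than a playbook) is attractive here.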
SpamapS | Yeah I like the idea of zuul just having the per-build agent be in all playbooks environments. | 23:40 |
SpamapS | but all the stuff we do with the agent makes a lot more sense being done in playbook | 23:40 |
SpamapS | since we want to copy data from executor -> all nodes, that's kind of ansible's job :) | 23:40 |
jeblair | ++ | 23:41 |
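Since pushing data from the executor out to the nodes is "kind of ansible's job", that distribution step might look like the following ad-hoc call with the stock authorized_key module; the inventory path, remote user, and key location are assumptions, and in practice this would live in the pre playbook (with state=absent for teardown):

```python
import subprocess

# Push the per-build public key into every node's authorized_keys.
with open('/var/lib/zuul/builds/abc123/build_key.pub') as f:
    pubkey = f.read().strip()

subprocess.run([
    'ansible', 'all',
    '-i', '/var/lib/zuul/builds/abc123/inventory',
    '-m', 'authorized_key',
    '-a', 'user=zuul state=present key="{}"'.format(pubkey),
], check=True)
```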
SpamapS | pabelanger: so.. turning attention to the venv... | 23:41 |
SpamapS | pabelanger: since we pip install -e for tox jobs, the problem is that where the code is coming from is not actually _inside_ the venv. Yes? | 23:41 |
pabelanger | I do like the idea of having nodepool add the one-time key to the worker, then passing the key via zookeeper to zuul, but no more SSH | 23:41 |
pabelanger | SpamapS: right, it is 2 directories up | 23:42 |
SpamapS | pabelanger: in the tox case. | 23:42 |
SpamapS | in prod.. ???? | 23:42 |
pabelanger | tox | 23:42 |
pabelanger | well | 23:42 |
pabelanger | I mean, somebody could pip -e in production, but we don't | 23:43 |
pabelanger | pip install -U /opt/zuul | 23:43 |
SpamapS | so.. point being, venvs can have symlinks to wherever | 23:44 |
SpamapS | so we can't just bind mount in the venv itself. | 23:44 |
SpamapS | pabelanger: also we don't need zuul's code, we need ansible | 23:44 |
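Because an editable install leaves the venv pointing at a source tree elsewhere, the executor can't just bind the venv directory; a sketch of resolving what actually has to be visible inside bubblewrap, assuming the packages are importable by the executor's interpreter:

```python
import importlib.util
import os

def real_package_dir(name):
    """Return the directory a package is actually loaded from, following
    symlinks and editable-install indirection."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise RuntimeError('%s is not importable here' % name)
    return os.path.realpath(os.path.dirname(spec.origin))

# Both locations may need to be bind-mounted into the bubblewrap environment.
for pkg in ('ansible', 'zuul'):
    print(pkg, '->', real_package_dir(pkg))
```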
pabelanger | SpamapS: we actually get ansible on your patch | 23:45 |
jeblair | technically, we need a tiny bit of zuul's code, for our custom plugins, but we can remove that dependency | 23:45 |
pabelanger | but, we are missing zuul.ansible | 23:45 |
pabelanger | yes, that | 23:45 |
jeblair | (i think) | 23:45 |
SpamapS | jeblair: that's already handled when we copy the action plugins in. I think? | 23:45 |
SpamapS | Oh, those import zuul stuff. | 23:45 |
SpamapS | durn shared library code | 23:46 |
jeblair | yep. but it's just zuul.ansible.path, which is a 70 line file, mostly comments, whose only further dependencies are ansible | 23:46 |
pabelanger | http://paste.openstack.org/show/608776/ is the traceback | 23:47 |
SpamapS | yeah, so.. hm | 23:47 |
SpamapS | two ideas popped into my head immediately. They might be insane. | 23:47 |
SpamapS | 1) move that into its own pypi library that we require | 23:48 |
SpamapS | (2) was in fact insane and I will not share. | 23:48 |
jeblair | a simple option is: we can copy that specific file into somewhere in sys.path in the bubblewrap and change the import statements to match. | 23:48 |
jeblair | (was that #2? :) | 23:49 |
SpamapS | sorta | 23:49 |
SpamapS | yeah | 23:49 |
jeblair | (and i guess we'd have to add /var/lib/zuul/ansible to sys.path for the non-bubblewrap case) | 23:49 |
jeblair | SpamapS: i feel it's less insane because we're already compelled to copy the custom plugins to some location anyway. | 23:50 |
jeblair | really just adding that location to sys.path might be enough | 23:50 |
SpamapS | Yeah the custom plugins make everything bonkers already. | 23:50 |
jeblair | then "import path" instead of "import zuul.path". obviously, path is a bad name then, so "import zuulpaths" or something. | 23:50 |
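One way the "copy it next to the plugins and import it as a top-level module" idea could look; the source filename, jobdir layout, and the zuulpaths name are assumptions taken from the discussion:

```python
import os
import shutil

# Assumed locations: the small zuul.ansible paths module jeblair mentions,
# wherever it lives on the executor, and the per-build plugin directory.
src = '/opt/zuul/zuul/ansible/paths.py'
plugin_dir = '/var/lib/zuul/builds/abc123/ansible'

os.makedirs(plugin_dir, exist_ok=True)
shutil.copy(src, os.path.join(plugin_dir, 'zuulpaths.py'))

# The copied action plugins then say "import zuulpaths", and the executor runs
# ansible-playbook (bubblewrap or not) with plugin_dir on PYTHONPATH:
env = dict(os.environ)
env['PYTHONPATH'] = plugin_dir + os.pathsep + env.get('PYTHONPATH', '')
```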
openstackgerrit | Ian Wienand proposed openstack-infra/nodepool master: Turn of time stamps in dib log https://review.openstack.org/462321 | 23:52 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Report job shadow errors to users https://review.openstack.org/462300 | 23:52 |
SpamapS | jeblair: that will bork in py3 unfortunately... | 23:53 |
SpamapS | but we can make a well known parent dir and add that to sys.path | 23:53 |
jeblair | SpamapS: why would it bork? | 23:53 |
SpamapS | jeblair: py3 does not do relative imports anymore | 23:53 |
SpamapS | you need package name | 23:54 |
jeblair | SpamapS: but this would appear as a "root" package, right? if it's /var/lib/zuul/zuulpaths.py and /var/lib/zuul is in sys.path? | 23:54 |
SpamapS | perhaps? It's a solvable problem | 23:55 |
SpamapS | it was a nit pick that flew out of me without much thought | 23:55 |
SpamapS | EOD rushing without testing ;) | 23:55 |
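For what it's worth, the "root package" reading checks out: a lone .py file in a directory on sys.path imports as a plain top-level module under Python 3, no relative imports involved. A throwaway check, using the zuulpaths placeholder name from above:

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    pathlib.Path(d, 'zuulpaths.py').write_text('ANSWER = 42\n')
    sys.path.insert(0, d)
    try:
        print(importlib.import_module('zuulpaths').ANSWER)   # -> 42
    finally:
        sys.path.remove(d)
```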
* SpamapS has to go | 23:55 | |
jeblair | ya | 23:55 |
* SpamapS -> AFK | 23:56 | |
pabelanger | jeblair: would we ever have a zuul-executor that would only run trusted playbooks? | 23:59 |
jeblair | pabelanger: nope (unless the only playbooks in the system were trusted) | 23:59 |