*** dkranz has quit IRC | 00:16 | |
clarkb | you could dual license but that would only make things more complicated I think | 02:00 |
dmsimard | clarkb: (huge coincidence that I saw you reply just now) yeah, that's sort of why I almost don't want to bother with ARA | 02:02 |
dmsimard | considering there hasn't been a lot of contributors (yet), it's not under CLA, it's not under openstack foundation governance, etc. | 02:02 |
dmsimard | If you dual license, it sort of becomes ambiguous, confusing and you have to be careful about what you import where so you don't taint in the wrong direction.. | 02:04 |
dmsimard | mordred: btw thanks for http://lists.openstack.org/pipermail/openstack-dev/2017-April/115013.html | 02:06 |
* dmsimard totally not switching from uuid primary keys to ids right now | 02:06 | |
*** jesusaurum has quit IRC | 03:41 | |
mordred | dmsimard: you're welcome! and totes - although I _do_ recommend switching at some point - it'll make you happier with larger installs | 03:57 |
mordred | dmsimard: and yah - there's no reason for you to not just make ARA GPL if you have the agreement from all of the people who have contributed patches (just make sure you actually have agreement from their employers, since most people don't individually have the legal authority to agree) | 03:58 |
mordred | dmsimard: it only matters for copyrightable patches - https://review.openstack.org/#/c/414381/1/ara/webapp.py, for instance, I don't think you need to worry about | 03:59 |
mordred | dmsimard: from looking at stackalytics, it looks like you have 17 commits you need to look at, determine if they are completely trivial, and if not contact the author for permission. it would be 'best' to make the patch to switch to GPL and then get each author whose permission you need to +1 that commit | 04:02 |
*** bhavik1 has joined #zuul | 04:37 | |
*** bhavik1 has quit IRC | 05:57 | |
*** isaacb has joined #zuul | 06:17 | |
*** hashar has joined #zuul | 06:29 | |
*** amoralej|off is now known as amoralej | 06:45 | |
*** yolanda_ has joined #zuul | 07:05 | |
*** yolanda_ has quit IRC | 07:06 | |
*** 7ITABD5MB has joined #zuul | 07:06 | |
*** 07IAALFJ9 has joined #zuul | 07:06 | |
*** 07IAALFJ9 has quit IRC | 07:07 | |
*** 7ITABD5MB has quit IRC | 07:08 | |
*** yolanda_ has joined #zuul | 07:08 | |
*** yolanda_ is now known as yolanda | 07:11 | |
*** isaacb has quit IRC | 07:15 | |
*** lennyb has quit IRC | 07:19 | |
*** isaacb has joined #zuul | 07:23 | |
*** lennyb has joined #zuul | 07:32 | |
jamielennox | hey is there a zuul logo/mascot i can put in a slide? | 07:37 |
jamielennox | i feel like i've seen one before | 07:40 |
jamielennox | mordred: as you're in this tz and might be here ^ | 07:40 |
mordred | jamielennox: I'm not sure we've produced one of those yet | 07:51 |
*** isaacb has quit IRC | 09:15 | |
*** isaacb has joined #zuul | 09:16 | |
*** amoralej is now known as amoralej|brb | 10:08 | |
*** jkilpatr has quit IRC | 10:45 | |
*** jkilpatr has joined #zuul | 11:02 | |
*** hashar is now known as hasharLunch | 11:13 | |
*** amoralej|brb is now known as amoralej | 11:34 | |
*** dkranz has joined #zuul | 11:50 | |
*** hasharLunch is now known as hashar | 13:01 | |
*** isaacb has quit IRC | 13:58 | |
*** isaacb has joined #zuul | 14:12 | |
dmsimard | For Zuul v3 secrets ( https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#secrets ) | 14:30 |
dmsimard | How would you pass the equivalent of a credentials-binding for a file rather than a string? Encrypt the base64 or something? | 14:30 |
pabelanger | dmsimard: just confirming, you want to encrypt the whole file? | 14:38 |
dmsimard | pabelanger: currently jenkins allows you to encrypt a text (string) or a file | 14:39 |
dmsimard | and then at runtime it sends that file to the slave, decrypts it and makes it available as an env var | 14:39 |
pabelanger | Ya, I don't think we support files ATM. But you should be able to store file_contents as encrypted blob then template it | 14:40 |
pabelanger | that's what we'd plan to do with SSH private keys | 14:41 |
jeblair | only up to 4096 bits | 14:57 |
jeblair | or, actually, i think a bit less than that | 14:57 |
jeblair | 4096, according to the docs: https://docs.openstack.org/infra/zuul/feature/zuulv3/user/encryption.html | 14:58 |
jeblair | dmsimard: ^ | 14:58 |
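(A minimal sketch of the workaround pabelanger describes above, assuming the Zuul v3 secret syntax from the encryption docs jeblair links: base64-encode the file's contents, encrypt that string as the secret value, then have the job's playbook write it back to disk. The secret name, field name, and paths are illustrative, not from the log.)

```yaml
# Hypothetical Zuul v3 secret: the file's contents are base64-encoded and
# then encrypted, since only strings can be encrypted (subject to the
# 4096-bit RSA limit noted above). All names here are illustrative.
- secret:
    name: my_credentials
    data:
      service_account_b64: !encrypted/pkcs1-oaep |
        <ciphertext produced by zuul's encryption tooling>

# In the job's playbook, decode the blob back into a file on the node.
- hosts: all
  tasks:
    - name: Write decoded credentials file
      copy:
        content: "{{ my_credentials.service_account_b64 | b64decode }}"
        dest: /tmp/service-account.json
        mode: "0600"
```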
*** isaacb has quit IRC | 15:08 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove ZUUL_PROJECT https://review.openstack.org/486251 | 15:19 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove ZUUL_UUID https://review.openstack.org/486252 | 15:19 |
pabelanger | jeblair: mordred: 485824 should be a straightforward review for zuul-jobs | 15:36 |
jeblair | pabelanger: +3. anything else i should look at? | 15:37 |
pabelanger | jeblair: just that for now, thanks. Working on more refactor patches this morning | 15:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: testing https://review.openstack.org/486665 | 15:38 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove nodepool DIB specific logic https://review.openstack.org/485824 | 15:40 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove .txt suffix from tox logs https://review.openstack.org/486665 | 15:45 |
*** hashar is now known as hasharMeeting | 15:52 | |
leifmadsen | just to confirm, master branch == zuul v2.5 and all v3 work still on feature/zuulv3 ? | 16:01 |
pabelanger | yes | 16:03 |
leifmadsen | thx | 16:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:06 |
*** bhavik1 has joined #zuul | 16:10 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:14 |
*** bhavik1 has quit IRC | 16:20 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove .txt suffix from tox logs https://review.openstack.org/486665 | 16:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Allow loading additional variables file for site config https://review.openstack.org/447734 | 16:30 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:44 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Implement autohold https://review.openstack.org/486692 | 16:45 |
pabelanger | jeblair: I'd like to restart zuulv3 to pick up latest logging improvements | 16:47 |
jeblair | pabelanger: all yours | 16:54 |
Shrews | jeblair: so, before 692 ^^^ starts getting into the actual meat of the change, curious as to how you see the in-memory autohold list being managed. Like, do we delete the project/job from the list after the first hold? | 16:55 |
Shrews | jeblair: also, do we need to specify tenant? | 16:57 |
pabelanger | zuulv3 restarted | 16:58 |
jeblair | Shrews: lookin | 16:59 |
Shrews | well, not much to see there yet. it's just the beginnings of plugging into the scheduler :) | 17:00 |
jeblair | Shrews: (well, that also includes looking at what i wrote in storyboard so i sound like i know what i'm talking about) | 17:01 |
Shrews | lol | 17:01 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Log an extra blank line to get space after each skip https://review.openstack.org/486698 | 17:04 |
jeblair | Shrews: ah ok! good questions! :) in nodepool v0, we tell it how many failed nodes it should accumulate for a given job. i think we default to 3. so maybe we should do that here -- add an extra cmdline argument to specify the count. | 17:04 |
jeblair | Shrews: in v0, nodepool puts a note in the 'comment' field in the node table in the db like "auto held for job foo". it counts those to figure out if it has met the limit | 17:06 |
jeblair | Shrews: we could do something similar in v3, or we could actually add a field to the zk node rec for this purpose. like "zuul_job" or something. | 17:06 |
Shrews | jeblair: ah, ok. | 17:07 |
jeblair | Shrews: i think maybe once we've hit the limit, drop the entry from zuul's in-memory list? we don't do that in nodepool v0, but i think this might be a better behavior. | 17:07 |
jeblair | (only fungi is good at remembering to clean up autoheld nodes :) | 17:08 |
jeblair | Shrews: and yes, we need to specify tenant as well | 17:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 17:09 |
*** harlowja has joined #zuul | 17:09 | |
Shrews | jeblair: great. thanks. | 17:10 |
jeblair | Shrews: and the project name should obey the new convention we're establishing -- it should be a fully-qualified canonical project name (ie, git.openstack.org/foo/bar) if that's required to disambiguate it from another similarly named project, or if it's unique, it can just be "foo/bar". the Tenant.getProject method will take care of all that for you, so you can just treat it as an opaque string and hand it off to that method to get a project back (or an error). | 17:10 |
Shrews | ah. yeah, i suppose i should use that to validate the input | 17:12 |
jeblair | Shrews: i wouldn't try to do much local input validation -- just pass it over the wire and validate it on the zuul-scheduler side, then return errors from that if there are any. i think most of the other methods work that way. | 17:13 |
Shrews | *nod* | 17:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 17:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 17:20 |
pabelanger | jeblair: mordred: okay, so I think we are ready to bike shed on https://review.openstack.org/#/q/topic:tox_environment_defaults | 17:28 |
pabelanger | that gives us a way to setup tox defaults, but allows anybody to also override them | 17:29 |
* fungi doesn't feel like he does a particularly excellent job of remembering to delete his held or autoheld nodes | 17:29 | |
pabelanger | mordred: jeblair: I'll reserve comments until you've had a chance to look | 17:31 |
jeblair | fungi: then the rest of us are even worse off! | 17:33 |
fungi | yikes | 17:34 |
leifmadsen | are there any documentation patches, especially around setting up zuul w/ github (and just generally getting started) that I can review/test? | 17:37 |
jeblair | leifmadsen: nothing in flight at the moment, but we do have some stuff merged. all docs: https://docs.openstack.org/infra/zuul/feature/zuulv3/ | 17:39 |
leifmadsen | thanks, reading though now, looks like I'll have to do some code digging | 17:40 |
jeblair | leifmadsen: the administrators guide has things for someone setting up zuul: https://docs.openstack.org/infra/zuul/feature/zuulv3/admin/index.html | 17:40 |
leifmadsen | I generated the latest stuff locally | 17:40 |
jeblair | leifmadsen: there are two big weak spots we know about: | 17:40 |
jeblair | leifmadsen: a good install HOWTO. we want to have a playbook to help with that. | 17:41 |
leifmadsen | just remember that playbooks are not documentation :) | 17:41 |
jeblair | leifmadsen: exactly. we still need everything to be fully documented. but "i just want to see it run" is never going to be quick and easy with a distributed system, so it'll be nice to have both. :) | 17:42 |
SpamapS | leifmadsen: You might be able to glean some info from BonnyCI's deployment ansible, called hoist... which deploys pointed at github | 17:42 |
SpamapS | leifmadsen: https://github.com/BonnyCI/hoist | 17:42 |
jeblair | leifmadsen: and we know there's some stuff missing in the github docs about how to actually set up the webhooks/triggers/etc in github's interface itself. | 17:43 |
SpamapS | There's still stuff for v2.5 in there but v3 works | 17:43 |
leifmadsen | yea, mostly interested in v3 with github events as I'm starting a comparison / review between zuulv3 and prow | 17:43 |
leifmadsen | and just understanding how both work, etc | 17:43 |
Shrews | SpamapS: lol @ hoist. i'm sensing a theme | 17:44 |
Shrews | "mateys-ahoy" ... theme confirmed | 17:44 |
leifmadsen | nautical name theme definitely a k8s style thing :) | 17:45 |
SpamapS | Shrews: click 'Projects' for a hearty flagon of pirate humor. | 17:45 |
SpamapS | well, org projects | 17:46 |
SpamapS | https://github.com/orgs/BonnyCI/projects/1 | 17:46 |
SpamapS | We don't groom the backlog.. we swab it. ;) | 17:46 |
Shrews | 404'd on that | 17:46 |
jeblair | leifmadsen: please let us know about any other missing/confusing docs | 17:47 |
SpamapS | Oh I wonder if that's org-only :-P well it's our scrum board and we named it Poop Deck. ;-) | 17:48 |
* fungi is _not_ swabbing the poopdeck | 17:49 | |
jeblair | SpamapS: what's the status of bonnyci/charts? | 17:51 |
SpamapS | jeblair: it was a spike by jamielennox .. not sure how far he got. | 17:52 |
jeblair | ah, thus the "20 days ago" | 17:52 |
SpamapS | We're being compelled to move our stuff off our openstack cloud, which will be shut down soon, so we were going to see if we could use that to deploy onto BlueMix k8s | 17:52 |
SpamapS | (and get nodes from some public cloud vendor) | 17:53 |
jeblair | gotcha. it'll be nice to have helm charts too. | 17:55 |
SpamapS | I agree, it's a good fit I think | 17:57 |
SpamapS | I was actually also going to play with Habitat | 17:57 |
SpamapS | but.. distractions abound | 17:58 |
Shrews | squirrel! | 17:58 |
jeblair | leifmadsen: fyi, right now we're heavily focused on prepping to move openstack to zuul, hopefully in a little over a month. we're working on a shared job library so that not everyone has to write their own version of a "run $language unit tests" job, and building openstack's installation on top of that. and of course, fixing any issues that surface as part of that. | 17:59 |
leifmadsen | well, I'll just be over here toiling on trying to get it working as a newbie :) | 17:59 |
jeblair | leifmadsen: cool, just wanted to give you some context | 18:02 |
adam_g | v2.5 problem, anyone have any tips for debugging an issue where a node sometimes gets re-used for two changes? it looks like zmq msgs are being processed correctly, but im watching nodepool happily hand out a USED node after a previous job has completed. its fairly easy to reproduce in our env /w a loaded queue and triggers being delivered in quick succession | 18:02 |
pabelanger | http://git.openstack.org/cgit/openstack/ansible-role-zuul should get you most of the way to zuulv3. but I haven't tested it with github integration | 18:02 |
jeblair | adam_g: are you sending OFFLINE_NODE_WHEN_COMPLETE=1 as a job parameter? | 18:04 |
adam_g | jeblair: no, not afaics | 18:04 |
adam_g | should i be? | 18:04 |
pabelanger | was just going to ask that | 18:05 |
jeblair | adam_g: yes | 18:05 |
adam_g | i'll give that a shot | 18:06 |
jeblair | adam_g: remember, the v2.5 launcher is basically emulating jenkins, so nodes normally just stay attached to the "master". | 18:06 |
jeblair | adam_g: so that's emulating the thing we added to the gearman plugin to take a node offline when the job is done. | 18:06 |
*** amoralej is now known as amoralej|off | 18:07 | |
adam_g | jeblair: ok, so it happens to work w/o that setting because the deleter eventually kicks in after DELETE_DELAY ? | 18:07 |
jeblair | adam_g: yes. this addresses that race condition. | 18:08 |
adam_g | jeblair: great | 18:08 |
pabelanger | you should be able to reuse our openstack_functions.py python-file and setup the following regex: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n1112 | 18:09 |
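(A sketch of that hookup for context, assuming the v2.5 layout format; the regex and function name mirror the linked project-config example but are illustrative here. The referenced Python function injects the OFFLINE_NODE_WHEN_COMPLETE job parameter so the launcher takes the node offline after a single use.)

```yaml
# Hypothetical v2.5 layout.yaml excerpt: attach a parameter function to
# every job. In openstack_functions.py the function does, in effect,
#   params['OFFLINE_NODE_WHEN_COMPLETE'] = '1'
# which prevents a USED node from being handed out again.
jobs:
  - name: ^.*$
    parameter-function: set_node_options
```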
adam_g | jlk: jamielennox SpamapS ^ look for a hoist patch to apply this, surprised we didn't see this more often /w our bonny jobs at peak working hours | 18:13 |
SpamapS | adam_g: "peak" ;-) | 18:13 |
SpamapS | adam_g: actually it's entirely possible our jobs were happy to run again without breaking maybe | 18:13 |
jeblair | SpamapS: that's possible, but even so, if nodepool decides to delete the node mid-run, that's also, erm, problematic. | 18:21 |
jeblair | SpamapS: though, actually, not as much as it could be... because zuul is likely to reschedule the job in that case | 18:21 |
jeblair | SpamapS: so there's a pretty convincing explanation for how it could go unnoticed. | 18:21 |
jeblair | "cloud node disappearing out from under me" is something zuul is designed to handle. even if it's self-inflicted. :/ | 18:22 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:25 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:30 |
SpamapS | jeblair: That's a bit schizophrenic, but I like that we have coping strategies. ;) | 18:38 |
jeblair | SpamapS: "stop hitting yourself!" | 18:50 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:51 |
*** hasharMeeting is now known as hasharDinner | 18:56 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 19:00 |
SpamapS | jeblair: perhaps all distributed systems problems can be boiled down to sibling rivalry tropes. Kerberos key exchange problems might be "I know you are but what am I?" | 19:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 19:09 |
jeblair | SpamapS: this is your chance for the big time: No results found for "i know you are but what am i algorithm". | 19:12 |
SpamapS | jeblair: It's too generic to patent. :) | 19:14 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Implement autohold https://review.openstack.org/486692 | 19:21 |
*** hasharDinner has quit IRC | 19:23 | |
Shrews | jeblair: When you have a moment, looking at https://review.openstack.org/#/c/486692/2/zuul/scheduler.py , I know you said not to do much validation, but am I trying to do too much there? My thinking is that returning False (which I hope will mean job failure????) would be a friendlier way to tell the user "nope, couldn't do the hold". | 19:25 |
Shrews | without those checks, we could just fallback to the less friendly exceptions that might occur b/c of invalid things | 19:27 |
leifmadsen | is there an example tenant configuration for the github driver somewhere I could peep at? | 19:30 |
leifmadsen | oh might have just figured it out (of course, right after I ask) | 19:32 |
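(For anyone else looking for the same thing: a minimal sketch of a tenant config using a github connection, assuming the v3 main.yaml format of the time; tenant, connection, and project names are made up.)

```yaml
# Hypothetical /etc/zuul/main.yaml; 'github' must match a connection
# defined in zuul.conf with driver=github.
- tenant:
    name: example-tenant
    source:
      github:
        config-projects:
          - my-org/project-config
        untrusted-projects:
          - my-org/my-app
```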
Shrews | jeblair: oh, doesn't look like returning False is enough to signal that. Would have to throw an exception. bummer | 19:34 |
Shrews | guess i could just 'raise Exception()' instead | 19:36 |
jeblair | Shrews: yeah, all the current errors are job exceptions. | 19:36 |
jeblair | Shrews: take a look at handle_enqueue in rpclistener | 19:36 |
jeblair | Shrews: it does input validation and returns nice error exceptions that indicate the problem | 19:37 |
SpamapS | leifmadsen: helps to get things out of your own head :) | 19:38 |
Shrews | jeblair: perfect. thx | 19:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing into fetch-testr-output https://review.openstack.org/485840 | 20:43 |
*** dkranz has quit IRC | 20:44 | |
*** jkilpatr has quit IRC | 21:04 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing into fetch-testr-output https://review.openstack.org/485840 | 21:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Allow loading additional variables file for site config https://review.openstack.org/447734 | 21:50 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove state_dir from setMountsMap https://review.openstack.org/486766 | 21:50 |
jeblair | tristanC: can you take a look at 486766 and make sure i'm correct about that? | 21:50 |
jeblair | jamielennox: i picked up your site vars change (447734); can you take a look and let me know if that works for you? | 21:51 |
Shrews | anyone else care to review/+A the nodepool uuid change and its parent? https://review.openstack.org/484414 Already two +2's | 21:54 |
Shrews | SpamapS or pabelanger? ^^^ | 21:55 |
jeblair | it's zuul meeting time in #openstack-meeting-alt | 22:01 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Monitor job root and kill over limit jobs https://review.openstack.org/485902 | 22:07 |
SpamapS | jeblair: good news.. ^^ now that we're synchronously killing jobs, the tests don't need to whitelist executor-diskaccountant | 22:07 |
*** jkilpatr has joined #zuul | 22:21 | |
jeblair | clarkb: we should have an expand-all button if we do that | 23:02 |
clarkb | jeblair: ++ | 23:02 |
jamielennox | clarkb: as a counterpoint though, in 99% of cases where a test fails (and you're not on the -infra team), it's not the node's fault and all i really care about is the output of my tox | 23:02 |
clarkb | pabelanger: I left a review on one of your tox playbook changes | 23:02 |
jeblair | cause, yeah, we need to be able to see everything, but we do also have a problem in that right now, the actual error is usually right in the middle of the log. with a bunch of ignorable errors below it! :) | 23:03 |
jamielennox | i'm not saying remove it, but debugging for example the pep8 jobs in projects involves skipping 100s of lines of setup to find the actual console output | 23:03 |
pabelanger | clarkb: thanks, replied | 23:03 |
clarkb | jamielennox: I ^F error, which breaks in the collapsed style setup | 23:03 |
pabelanger | clarkb: FWIW: I do not like that patch myself. But need a good way to support all the paths for tox_environment | 23:03 |
jamielennox | anyway we can deal with the UX later, this is an awesome start | 23:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:05 |
clarkb | pabelanger: I'm having a hard time parsing that last message :) | 23:05 |
pabelanger | clarkb: so, we had a discussion last week about how no defined variable is better than a defined empty variable | 23:06 |
pabelanger | when it comes to playbooks | 23:06 |
clarkb | pabelanger: does environment: {} and environment: omit behave differently? | 23:06 |
jamielennox | jeblair: scrolling back re BonnyCI/charts, it largely works - I've definitely got it running jobs, and i'm currently still struggling with getting the right secrets in place for uploading logs, which is a problem of the non-kubernetes infrastructure | 23:06 |
pabelanger | clarkb: yes, omit would not pass environment to the task. | 23:07 |
pabelanger | but {} would be passed | 23:07 |
clarkb | pabelanger: right but does that behave differently? | 23:07 |
SpamapS | jamielennox: at least the tox jobs that have subunit give you the nice HTML breakdown though. ;) | 23:07 |
jeblair | jamielennox: oh nice. i mean, not the struggling, but the rest of it. :) | 23:07 |
jamielennox | jeblair: the main concerns are that it is more difficult to debug, and if you get for example the scheduler pod restarting then you end up in a really odd state | 23:07 |
SpamapS | maybe we should make pep8 run through subunit | 23:07 |
jamielennox | so i sort of stopped when all the option changes happened | 23:07 |
clarkb | pabelanger: if it does then I would worry that setting vars would not do what we want either because we still want to overlay with the system defaults right? so the three layers would be system defaults, tox defaults, playbook explicit env | 23:08 |
jeblair | SpamapS: pep8 is on my short list of things to move to line-review-comments once we add that :) | 23:08 |
jeblair | jamielennox: anything about site-vars we didn't touch on in the meeting? | 23:08 |
SpamapS | jeblair: mmmmmmmmmmmmmmmmmmmmmmmmmmm | 23:08 |
jamielennox | there's a few problems that really require coordination with putting code into zuul itself - which IMO makes it a post v3 thing | 23:08 |
* SpamapS dreams of line review comments | 23:08 | |
clarkb | pabelanger: stuff like LANG and so on we likely want to inherit from the system? (which is current zuulv2.5 behavior iirc) | 23:09 |
jamielennox | jeblair: all i've looked at at the moment is the executor/server file and it seems to do the same thing | 23:09 |
pabelanger | clarkb: I don't know if there is a difference, but today when using the shell command, we don't pass empty environment for tasks. So, need to test | 23:09 |
jamielennox | jeblair: at the moment we're not using it because i got sick of rebasing the patch :P | 23:09 |
pabelanger | clarkb: right, we don't overwrite them | 23:09 |
pabelanger | unless somebody bassed LANG into tox_environment | 23:09 |
pabelanger | passed* | 23:09 |
SpamapS | jamielennox: pod restarting seems like something that k8s should have facilities for doing carefully. | 23:09 |
SpamapS | isn't there a way to tell k8s "only one of these ever" ? | 23:10 |
clarkb | pabelanger: right but if you pass environment: {} would that overwrite system default env? | 23:10 |
jeblair | jamielennox: yep. i didn't change anything substantial. but i wrote docs and tests -- i mostly wanted to make sure we knew what the story was with precedence. | 23:10 |
clarkb | pabelanger: if not then omit and {} should be equivalent right? but using {} will reduce playbook complexity? | 23:10 |
pabelanger | clarkb: well, so do we always want to pass environment for the run tox shell task? or only pass it when a variable is defined | 23:11 |
jamielennox | SpamapS: it'll restart just fine, and yes it'll only run 1, but it assumes that it should be able to move pods if it has to, but if you take down the scheduler without coordinating the other components things get weird | 23:11 |
clarkb | pabelanger: well if you always pass it then you simplify the playbook significantly and assuming the behavior isn't different that seems preferable to me | 23:11 |
jamielennox | so i can't say (that i know of) 'if you restart the scheduler, also restart these executors' | 23:11 |
pabelanger | clarkb: we can try passing environment: {} | 23:11 |
clarkb | pabelanger: because then you can just combine the two dicts and then pass the result in | 23:11 |
clarkb | pabelanger: you don't even need a special block you just combine them at the environment: statement | 23:12 |
jamielennox | so this is the sort of thing that just needs fixes to zuul to better reconnect gearman processes and to store some more state | 23:12 |
pabelanger | clarkb: we still need logic to check if tox_environment and tox_environment_defaults are defined, but yes | 23:12 |
clarkb | pabelanger: well you'd define them to default to {} so they would be defined | 23:12 |
clarkb | but yes that | 23:13 |
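(What clarkb is sketching, with the variable names from the review topic; the task itself is illustrative. Note that Ansible's environment: keyword only adds or overrides variables for the task, so system defaults like LANG are still inherited even when the merged dict is empty.)

```yaml
# Default both dicts to {} and merge them inline at the environment:
# keyword, so no conditional blocks are needed; tox_environment wins
# on key collisions. tox_envlist is an illustrative variable name.
- name: Run tox
  shell: tox -e {{ tox_envlist | default('venv') }}
  environment: "{{ tox_environment_defaults | default({}) | combine(tox_environment | default({})) }}"
```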
jamielennox | another thing that's annoying is that you basically need to run the nodepool-builder and the zuul-executor with --privileged for dib and bubblewrap | 23:13 |
jeblair | i have to go run some errands now | 23:13 |
jamielennox | again i think we could tune that out with a bit of dedicated effort | 23:13 |
clarkb | jamielennox: dib at least essentially is privileged though | 23:14 |
clarkb | jamielennox: since it can mount and write filesystems and do all sorts of fun things | 23:14 |
jamielennox | clarkb: there should be a way of providing that cap though without giving privileged right? | 23:15 |
jamielennox | because we're only mounting things within the container | 23:15 |
clarkb | jamielennox: aiui the reason mount is part of privileged (it can be separately given out) is that if you can mount you can mount whatever including the host fs? | 23:16 |
clarkb | and once you've done that you own the system | 23:16 |
*** artgon has left #zuul | 23:16 | |
jamielennox | you would need to have access to the host fs right? or is the implication you can still get that through /dev? | 23:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:17 |
jamielennox | so the classic security issue is running as root in the container and mounting directories in | 23:17 |
clarkb | jamielennox: I think worst case you just create the node in /dev ? | 23:18 |
jamielennox | ah, ok, didn't realize you could just recreate the node | 23:18 |
clarkb | where worst case is "my host tried to hide it from me" | 23:18 |
jamielennox | mknod has always been magic to me | 23:18 |
clarkb | its been a while since I looked into all this with the iscsi container woes | 23:18 |
clarkb | but ya mount is scary in containers | 23:18 |
jamielennox | so that'll probably affect bubblewrap as well? | 23:19 |
clarkb | jamielennox: reading really quickly mknod is a default docker privilege | 23:19 |
clarkb | jamielennox: so its possible this is just docker being silly too | 23:20 |
clarkb | jamielennox: so if you add mount to a docker container it already has mknod and thats all you need | 23:20 |
jamielennox | yea ok, so in this case nodepool-builder i thought might be fixable, but is reasonably controlled/trusted | 23:22 |
jamielennox | running zuul-executor with --privileged is a big problem | 23:22 |
jamielennox | having said that i think part of the reason is the whole bubblewrap setuid thing | 23:23 |
jamielennox | i'm not actually sure how it works if i run the executor itself as root | 23:23 |
pabelanger | you only need root for finger today, did you change the port to something > 1024 ? | 23:23 |
jamielennox | pabelanger: yea i just put the port number up for that | 23:24 |
jamielennox | there's a problem here that i don't fully understand | 23:24 |
pabelanger | I'd like us to drop root in openstack-infra too, once we have websocket proxy | 23:24 |
jamielennox | if you don't run bubblewrap as root you generally give it setuid so it can run | 23:24 |
jamielennox | but there is a problem (to do with user namespaces afaict) with running setuid within the container | 23:25 |
jamielennox | anyway, once i gave it --privileged it worked, and i moved on with a note to come back to the problem | 23:26 |
pabelanger | not sure I understand, I'm running bubblewrap locally as non-root. I don't think I setup anything with setuid | 23:26 |
pabelanger | something, something, container? | 23:26 |
jamielennox | it's not close enough for a production use yet anyway | 23:26 |
jamielennox | pabelanger: i think the .deb puts setuid on the bin right? | 23:26 |
pabelanger | Hmm, need to check. I am using fedora | 23:27 |
pabelanger | unless rpm did something | 23:27 |
clarkb | iirc you need setuid on older kernels | 23:27 |
clarkb | where older kernel is like anything not newer than 2 months old | 23:27 |
jamielennox | -rwsr-xr-x 1 root root 47072 May 2 16:41 /usr/bin/bwrap | 23:27 |
clarkb | so if using a .deb that implies ubuntu/debian which have old kernels | 23:27 |
jamielennox | that's after install on an up to date xenial | 23:28 |
pabelanger | -rwxr-xr-x. 1 root root 48904 May 26 02:32 /usr/bin/bwrap | 23:28 |
jamielennox | clarkb: yea, my understanding is that there's a kernel fix that still hasn't made it into xenial | 23:28 |
jamielennox | that will fix the bwrap problem in particular | 23:29 |
jamielennox | but i'm not sure why user namespaces and setuid is a problem, but it's mentioned in a number of places | 23:29 |
pabelanger | jamielennox: confirmed, that is how bwrap is setup on xenial | 23:30 |
pabelanger | https://anonscm.debian.org/cgit/collab-maint/bubblewrap.git/tree/debian/rules | 23:31 |
clarkb | jamielennox: I think it is because the setuid perms in a namespace will setuid to a non privileged user | 23:31 |
clarkb | jamielennox: if you use the host namespace then setuid is going to use proper root and be happy | 23:31 |
jamielennox | interestingly if it's a kernel problem then i'm not sure what happens if we flip the docker container over to centos or something because the underlying infrastructure might not be on the host | 23:32 |
jamielennox | clarkb: that's interesting because at least theoretically for this you only need to be root in that container, you're not writing anything out | 23:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:33 |
pabelanger | clarkb: okay, updated^ | 23:34 |
jamielennox | but again, apparently this is something that is fixed/improved in later kernels | 23:34 |
clarkb | jamielennox: except that bubblewrap is using kernel capabilities that an unprivileged in-container user won't have aiui | 23:34 |
jamielennox | so it's probably something where the practical has not yet caught up with the theoretical | 23:34 |
clarkb | jamielennox: in newer kernels they made those capabilities more fine grained so that you don't need proper root like caps | 23:34 |
pabelanger | and EOD for me | 23:35 |
jamielennox | clarkb: yep, we can add specific caps to the container fairly easily, which i'm ok with doing, just would prefer not to do the full --privileged | 23:35 |
clarkb | jamielennox: ya though my understanding is until you have a newer kernel that basically means root, so it's probably six of one, half a dozen of the other until you can rely on newer kernels | 23:36 |
clarkb | clearly we just need the future here today to solve all the problems | 23:36 |
jamielennox | clarkb: yea, which is how i've basically got to the point that all this is super interesting but i wouldn't feel comfortable running this in any sort of prod today | 23:37 |
jamielennox | regardless of how you lock it down | 23:37 |
jamielennox | which is a shame because i think having a fairly easy chart you could deploy to something like GKE would be good for adoption, but something we can look at again in future | 23:38 |
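(A sketch of the tuning jamielennox describes, in Kubernetes pod-spec terms; container and image names are illustrative. Instead of a blanket privileged: true, grant only the capability needed for the mount/namespace calls dib and bubblewrap make, with the caveat from the discussion above that on older kernels this is still effectively root.)

```yaml
# Hypothetical container spec for a zuul-executor pod.
containers:
  - name: zuul-executor
    image: example/zuul-executor
    securityContext:
      # privileged: true        # the blunt option that "just worked"
      capabilities:
        add: ["SYS_ADMIN"]      # narrower grant for bwrap's mount use
```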