*** bhavik1 has joined #zuul | 05:36 | |
*** bhavik1 has quit IRC | 06:09 | |
*** pbelamge has joined #zuul | 09:15 | |
pbelamge | Hello All | 09:17 |
---|---|---|
pbelamge | For the past few days I have been exploring Zuul by following the openstack infra zuul web page | 09:18 |
pbelamge | As per the doc, when I run zuul-server, it just exits and doesn't provide any logs as to why it is exiting | 09:20 |
pbelamge | if I run with -d option, then it runs fine on the console | 09:20 |
pbelamge | am I missing anything in first case? | 09:20 |
pbelamge | anybody? | 09:52 |
*** _ari_|gone is now known as _ari_ | 13:09 | |
pabelanger | pbelamge: what version of zuul are you running? | 13:11 |
pabelanger | we had this issue recently with a change to logging IIRC | 13:11 |
pabelanger | make sure your logging file is correct | 13:11 |
openstackgerrit | Merged openstack-infra/nodepool master: Add mirror support for fedora-25 DIB https://review.openstack.org/456372 | 13:49 |
openstackgerrit | Merged openstack-infra/nodepool master: Switch to /etc/ci/mirror_info.sh for nodepool mirrors https://review.openstack.org/456374 | 13:50 |
*** jkilpatr has joined #zuul | 13:50 | |
pabelanger | yay | 13:51 |
pabelanger | https://review.openstack.org/#/c/455770/ if people are reviewing :D | 13:51 |
openstackgerrit | Merged openstack-infra/nodepool master: Add functional test for key-name and glean https://review.openstack.org/455770 | 13:59 |
*** dkranz has joined #zuul | 14:09 | |
*** pbelamge has quit IRC | 14:11 | |
*** eggshell has left #zuul | 15:57 | |
*** eggshell has joined #zuul | 15:57 | |
*** corvus is now known as jeblair | 16:12 | |
jeblair | good morning! | 16:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (1/2) https://review.openstack.org/453362 | 16:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace config/project repos with config/untrusted projects https://review.openstack.org/453347 | 16:21 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove source from pipelines (2/2) https://review.openstack.org/453821 | 16:21 |
Shrews | jeblair: welcome back | 16:44 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix dynamic reconfiguration https://review.openstack.org/454395 | 16:44 |
jeblair | mordred, pabelanger: i guess you didn't make any more headway on https://review.openstack.org/454396 ? | 16:48 |
pabelanger | jeblair: sadly no, it was a light week for me last week on zuul things | 16:49 |
*** harlowja has quit IRC | 16:49 | |
*** harlowja has joined #zuul | 16:52 | |
*** jkilpatr has quit IRC | 17:26 | |
*** jkilpatr has joined #zuul | 17:27 | |
* SpamapS cracks knuckles and prepares to dive back in | 17:35 | |
SpamapS | jeblair: feeling recharged I hope? :) | 17:35 |
jeblair | SpamapS: yes! let's merge some changes! :) | 17:40 |
Shrews | i think there's still the random job failure issue, yeah? | 17:41 |
clarkb | Shrews: yes I think so | 17:43 |
clarkb | I have a change up that runs tests twice that seems to catch it that you can recheck to see | 17:43 |
jeblair | clarkb: what's that telling us? | 17:47 |
SpamapS | If running twice hits sometimes.. tells me there's a race. | 17:47 |
SpamapS | If it hits always, tells me there's a cleanup problem. | 17:48 |
SpamapS | jlk: IIRC, --analyze-isolation did not find a bad interaction, right? | 17:48 |
clarkb | jeblair: SpamapS that and if it's test order dependent the .testrepository data from first run influences test order of second run | 17:48 |
clarkb | local testing showed clean first runs are more likely to pass | 17:48 |
jeblair | clarkb: have you gotten data from your experiment yet? | 17:49 |
jlk | SpamapS: that's right | 17:49 |
clarkb | the change I pushed? it failed as expected to match local results. I haven't had much time to look at it further though | 17:50 |
jlk | I can run with concurrency of 4 and get failures. | 17:50 |
jlk | concurrency 8 is fine | 17:50 |
jeblair | clarkb: did it fail the first run or the second? | 17:51 |
SpamapS | jlk: on a box w/ 8 CPUs yeah? | 17:52 |
jlk | yeah, I haven't tried doing this on a 4 cpu box but with forced 8 concurrency | 17:52 |
clarkb | jeblair: the second | 17:52 |
SpamapS | weren't we also suspicious about the sqla reporter tests? | 17:53 |
jeblair | clarkb: where will you go next with that change? | 17:54 |
jlk | I removed those from my set and still got failures | 17:54 |
clarkb | jeblair: I think we need to track down the fails and fix them then possibly merge the change if we think it will prevent regressions else abandon | 17:55 |
clarkb | it was mostly a sanity check that the gate wasn't special | 17:55 |
*** jonesn has joined #zuul | 17:56 | |
jeblair | clarkb: okay, i wasn't sure if you had a plan to use that change to track down the failures. | 17:57 |
jonesn | Is anyone around who could answer a few (probably basic) questions about adding a gate to a project? | 17:57 |
jeblair | clarkb: we have a significant first-run error rate as it is, so i don't think running twice is necessary to prevent regressions. | 17:57 |
jeblair | clarkb: (our error rate is also significant enough that i don't think that a single run of that change is enough to show that successive runs always cause problems. i run locally with no cleanup between runs and pass/fail about as often as the gate) | 17:59 |
clarkb | ok | 17:59 |
jeblair | clarkb: may want to throw a bunch of rechecks at it? | 17:59 |
SpamapS | jonesn: I'm certain people in here can answer questions about gates and projects. It may be best to just ask, and then when people have a moment they can answer. | 18:03 |
SpamapS | jonesn: most of us are pretty focused on v3 dev, so your patience is very much appreciated. :) | 18:03 |
clarkb | jeblair: ya, though I think SpamapS's --analyze-isolation plan is likely to return the best results in short term | 18:03 |
jonesn | Is getting a tox based gate added as simple as "have a toxenv to run, edit zuul/layout.yaml, edit jenkins/jobs/projects.yaml"? | 18:03 |
jeblair | jonesn: are you asking about openstack's instance of zuul? | 18:05 |
jonesn | jeblair: Yes, I think. I'm trying to add a bandit gate to the cinder project. | 18:06 |
SpamapS | clarkb: To be clear, my suggestion was to hold a node that fails, and use --analyze-isolation on it when it fails. But jlk basically simulated that and got no insight. | 18:06 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721 | 18:07 |
jeblair | jonesn: thanks, that context helps. these doc links may help: https://docs.openstack.org/infra/manual/creators.html#add-basic-jenkins-jobs should be pretty close to what you want, and for further information: https://docs.openstack.org/infra/manual/drivers.html#running-jobs-with-zuul | 18:07 |
clarkb | SpamapS: you shouldn't need to hold a node, just make it fail locally (which seems easy) and run analyze-isolation on that. :( that it didn't catch anything though | 18:07 |
jeblair | jonesn: the #openstack-infra channel is for discussion of openstack infrastructure tools, and there are more folks there that can help with this specific kind of issue | 18:08 |
jonesn | jeblair: Thank you. Sorry for being on the wrong channel. | 18:08 |
jeblair | jonesn: you're welcome (and it's not the *wrong* channel, just that there's a better one :) | 18:09 |
SpamapS | so.. | 18:11 |
SpamapS | the recent shake-up of Ubuntu dev has me worried about landing bubblewrap in xenial-backports | 18:12 |
*** jonesn has left #zuul | 18:12 | |
SpamapS | I've asked on the ubuntu-devel mailing list if they need help and gotten no replies | 18:12 |
SpamapS | Anybody want to migrate to Debian unstable? ;-) | 18:12 |
jeblair | SpamapS: iirc, fungi does. i'm not opposed. | 18:13 |
jeblair | (i'm also not opposed to centos, fwiw :) | 18:13 |
SpamapS | Yeah either one would be fine. | 18:13 |
clarkb | my concern with centos generally is lag on security patches | 18:14 |
SpamapS | also Debian's releasing every 2 years now, and stretch is about 3 weeks away. | 18:14 |
SpamapS | so it's actually not such a terrible thing to be on stable | 18:14 |
jeblair | SpamapS: it's possible i misspoke for fungi and that's actually what he would prefer :) | 18:14 |
jeblair | SpamapS: what's the feasibility of doing our own backport? | 18:14 |
SpamapS | jeblair: Oh our own backport is done. Just don't know if that's something infra wants to host somewhere. | 18:15 |
SpamapS | the bug requesting it is literally just a rubber stamp, the backporters team will run a script and upload the backport as soon as they get to it | 18:15 |
SpamapS | but there are 43 others in front of it. | 18:15 |
SpamapS | I've also offered to start helping with that, since I find it quite useful to have a functioning ubuntu backports system. | 18:16 |
mordred | SpamapS: we _do_ depend on a PPA in infra for one package, although it's not the happiest thing in the world to depend on it since it's a one-off with no process around it | 18:16 |
jlk | eww | 18:16 |
jlk | those always bit us at Blue Box | 18:17 |
SpamapS | PPA's are, IMO, Launchpad's killer feature. | 18:17 |
mordred | patched version of vhd-utils is needed to be able to make images for rackspace public cloud | 18:17 |
SpamapS | but yeah, one-offs w/o process are a problem. | 18:17 |
mordred | yup. totally agree | 18:17 |
jlk | agreed | 18:17 |
jlk | I had championed a similar thing over in Fedora land | 18:17 |
mordred | also, it's super annoying that there isn't already a tool that can make vhd images that rax public can consume | 18:17 |
jeblair | at least this would be time-limited (until next release) | 18:17 |
mordred | jeblair: ++ | 18:17 |
jlk | it just took a lot of time for it to come to fruition (after I left) | 18:18 |
SpamapS | Yeah a backports PPA is better than a "that's never getting into Debian" PPA. | 18:18 |
jlk | mordred: seems like that problem will solve itself in the future... :( | 18:18 |
mordred | yah. the vhd-util thing falls in to a "wow, that's a terrible patch" category | 18:18 |
mordred | jlk: sssh. we're hoping that future remains far away - will be a lot of work if it comes to pass soon :( | 18:19 |
SpamapS | and with a xenial PPA, it's somewhat natural to end up on the next LTS of Ubuntu with the package coming from universe instead of that PPA. | 18:19 |
mordred | SpamapS: ++ | 18:19 |
SpamapS | also it's possible the backporters team gets to it eventually | 18:19 |
SpamapS | and we just delete it from the PPA | 18:19 |
SpamapS | kk I'm convinced | 18:19 |
mordred | in any case, I personally would be fine with a bubblewrap ppa | 18:19 |
mordred | for now - since there is a path to the future | 18:19 |
Shrews | mordred: should you find time, i'm hoping if you could check out https://review.openstack.org/456721 and see if i'm heading in the correct direction. it's pretty basic atm | 18:22 |
mordred | Shrews: looking now | 18:22 |
Shrews | mordred: and if you wonder why i chose socketserver, it's the only thing i found that works with handling sockets in a forked/threaded manner | 18:22 |
Shrews | could not get my own manual version to function correctly :( | 18:23 |
mordred | Shrews: that looks good to me so far - jeblair you wanna double check that to make sure I'm not crazy? | 18:27 |
*** hashar has joined #zuul | 18:58 | |
*** hashar has quit IRC | 19:18 | |
*** dkranz has quit IRC | 19:33 | |
fungi | jeblair: you know me too well. i'd put in time on getting stuff going on sid if there was some consensus it's a good idea ;) | 19:34 |
fungi | (i mean, would be convenient for me since that's what my dev systems run...) | 19:34 |
fungi | (or stretch, sure, why not) | 19:34 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721 | 19:37 |
mordred | fungi: I think bubblewrap is already in sid, so we'd be good there | 19:39 |
pabelanger | clarkb: do you have examples of the security lag for centos-7? never heard that before | 19:39 |
clarkb | pabelanger: patches to things (for example heartbleed) take a day or two longer than the other distros out there | 19:39 |
fungi | mordred: already in stretch (stable rsn!) and in jessie-backports... https://packages.debian.org/search?keywords=bubblewrap | 19:40 |
mordred | fungi: woot | 19:40 |
fungi | so pick your poison | 19:40 |
clarkb | pabelanger: I think because instead of centos patching things directly they wait for rhel then do all their testing then push? something like that maybe? | 19:40 |
pabelanger | clarkb: ya, if centos depends on RHEL, I could see the day lag or so | 19:41 |
fungi | mordred: if you _specifically_ want >=0.1.8 though, you're probably stuck waiting post stretch for the packages presently hiding in experimental | 19:41 |
fungi | but 0.1.7 can be had now in stable(backports), testing(frozen) or unstable | 19:41 |
fungi | jeblair: clarkb: SpamapS: should we add the executor security spec on today's meeting agenda for some last-minute digging into the comments about the on-a-test-node alternative? would like to be certain on the pros and cons before approval | 19:44 |
SpamapS | fungi: I feel like that horse is dead, but we can of course reanimate it if there are some more questions we forgot to ask while we killed it. | 19:49 |
fungi | clarkb: are you still on the other side of the fence there after my and SpamapS' subsequent comments about the additional cons? (or have you read them yet?) | 19:51 |
clarkb | I haven't read them yet. I wasn't going to stop moving foward as is (I noted that last meeting) | 19:51 |
* clarkb looks now | 19:52 | |
SpamapS | I actually kind of think the executor is a perfect case for a real kubernetes system btw. | 19:56 |
SpamapS | scale them out as needed, isolate each pod to vms owned only by the one project | 19:56 |
SpamapS | but that's a large pile of design work | 19:56 |
mordred | I agree with both things - I think there are some potential benefits of k8s things, and also that it'll be a large pile of design work to figure out how | 19:57 |
* clarkb discovered that k8s reused the metadata service design from openstack (and possibly elsewhere) had a sad | 20:00 | |
mordred | sigh | 20:00 |
mordred | oh well | 20:00 |
mordred | well, to be fair, the openstack one wouldn't be a problem if it was just a part of the normal api layer and thus scaled out with it instead of a separate service layer that nobody wanted to spend resources to scale | 20:01 |
mordred | it's not like scaling a rest service that returns json blobs is hard | 20:02 |
clarkb | mordred: I think NAT by definition is in the poor to scale category of solutions | 20:02 |
mordred | well - yah | 20:02 |
mordred | that part is stupid | 20:02 |
fungi | clarkb: yeah, i appreciate you were willing to not hold up approval, but since pabelanger scheduled it for approval tomorrow anyway we have time to find out if you still actually disagree. that's important to me (at least) | 20:02 |
clarkb | also I discovered it because it wasn't working :) | 20:02 |
*** dkranz has joined #zuul | 20:02 | |
clarkb | fungi: yes I think I still do disagree. We know the alternative works, and works relatively well at scale. We also know that ansible has been tricky to secure so I think using an unproven system is less desirable | 20:03 |
clarkb | that's not to say the other system can't work, it's just a lot more risky imo | 20:04 |
pabelanger | I thought the issue of scaling out zuul-executor today was the caching of our git repos? Or have I confused something | 20:04 |
mordred | pabelanger: you have not | 20:04 |
clarkb | I think the option chosen is a good one if not going with the alternative and SpamapS did an excellent job laying out the problem space and options available | 20:05 |
mordred | clarkb: can you expand "the alternative" into slightly more words so I'm sure I'm following you? | 20:06 |
clarkb | mordred: running the ansible in the test env itself rather than on the executor. Basically what we do today with eg d-g | 20:06 |
fungi | clarkb: important enough to give up on being able to have untrusted pre/post playbooks which need access to things only the executor can do, like uploading logs/tarballs? it seemed like a pretty useful (if ambitious) feature, but i understand everything is of course a trade-off to some degree | 20:06 |
clarkb | fungi: thinking about it operationally, every time a hole is discovered we'd have to turn off zuul | 20:07 |
mordred | clarkb: gotcha. thank you | 20:07 |
fungi | or i guess the publication playbooks themselves do still need to be trusted, so it's more than those would have to run on the executor (without protections) while untrusted pre/post playbooks run on the test node? | 20:07 |
*** jkilpatr has quit IRC | 20:07 | |
clarkb | fungi: right | 20:07 |
fungi | er, more that those | 20:07 |
*** jkilpatr has joined #zuul | 20:07 | |
pabelanger | I mean, I like the idea of what clarkb is saying, mostly because that is the only way to do it today. However, I am happy to give the bubblewrap approach a shot too. | 20:09 |
fungi | i suppose there's nothing in the current design preventing a trusted playbook on the executor from calling ansible on the test node to run untrusted playbooks, so in theory we could support both if we already set up executor-side protections | 20:09 |
clarkb | fungi: the problem is you don't control that in the current design | 20:09 |
fungi | at worst it makes the spec under discussion redundant/overkill if it ends up only securing trusted playbook execution | 20:10 |
mordred | well - yah. I believe we need to write the "run ansible on test node" stuff at some point regardless - because there will be things that want to run with no restrictions but are untrusted | 20:10 |
clarkb | fungi: your users can push arbitrary code to run in the sandbox and current experience is it's quite arbitrary | 20:10 |
clarkb | mordred: speaking of, does http://docs.ansible.com/ansible/raw_module.html need to be handled specially? | 20:10 |
mordred | clarkb: I don't believe so, no? but I'll go look at it to make sure | 20:12 |
mordred | clarkb: yah - it doesn't look like it does anything particularly interesting | 20:13 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721 | 20:13 |
clarkb | mordred: executable set that to something like dd | 20:13 |
SpamapS | mordred: executable=/something/evil ? | 20:13 |
clarkb | mordred: then the freeform arg options to dd | 20:13 |
clarkb | SpamapS: ya that exactly | 20:13 |
SpamapS | needs a path filter | 20:14 |
clarkb | and you have to make sure the args list can't be abused either if it's run by shell by default | 20:14 |
clarkb | since you can ; foo | 20:14 |
mordred | executable is talking about remote executable | 20:14 |
mordred | not local | 20:14 |
fungi | okay, so situation is that 444495 outlines a mechanism for securing ansible on the executor in the face of untrusted playbooks. we need to be able to run at least some trusted playbooks on the executor anyway, and if there is a shift in consensus later that we should only run untrusted playbooks remotely on single-use test nodes and not on the executor that's probably not a huge additional amount of work (but | 20:15 |
fungi | does lose us some exciting v3 features unfortunately) | 20:15 |
clarkb | ah ok | 20:15 |
mordred | as in, "don't run bash on the remote host, run XXX" | 20:15 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Initial code for a fingerd log streamer https://review.openstack.org/456721 | 20:15 |
SpamapS | fungi: I already have a working bubblewrap patch btw | 20:16 |
mordred | SpamapS: woot! | 20:16 |
SpamapS | https://review.openstack.org/453851 | 20:16 |
SpamapS | just needs bwrap | 20:16 |
SpamapS | doesn't do seccomp yet | 20:16 |
clarkb | SpamapS: why the subshell in bwrap-executor.sh? | 20:18 |
SpamapS | clarkb: for the FDs | 20:19 |
clarkb | does that not work without a subshell too? I guess concern is that something already has fd 11 and 12? | 20:19 |
SpamapS | clarkb: so /etc/passwd is just the result of the getent | 20:19 |
SpamapS | clarkb: we fork to run this so shouldn't be. | 20:20 |
SpamapS | oh we don't close all tho | 20:20 |
clarkb | (mostly just curious because shell magics) | 20:20 |
SpamapS | clarkb: we could get the fds | 20:20 |
jeblair | clarkb: the "test like production" design goal of v3 would be significantly compromised by "run ansible from the test node" alternative. i think we have to run ansible from something completely outside of the test framework in the general case. doing that with k8s is a reasonable thing to look into, much later. in the mean time, a light-weight containerization is very close from the POV of security architecture (just not in terms of scaling). ... | 20:33 |
jeblair | ... i hear your concern about vulnerabilities; this adds defense in depth -- we'll have (at least) two layers of protection for the executor -- which i find very reassuring. | 20:33 |
jeblair | (it also would constrain the CD aspect -- if you have zuul run jobs on a production server, there is no test node from which to launch ansible) | 20:34 |
clarkb | yup, as I said I think the spec does a good job of laying out the options and reasoning about why this one is chosen. I just personally think that given the trade off of turn zuul off for indeterminate period of time vs a couple annoying aspects about the run in the test env option the run in test env option is better | 20:34 |
clarkb | jeblair: why couldn't it run on the production nodes too? | 20:34 |
clarkb | or at least a production node if production == a test env | 20:35 |
jeblair | clarkb: ansible to the production node to run ansible? | 20:36 |
clarkb | I also don't think k8s fixes it in the general case where code is coming from users and is arbitrary (it would be if you scoped it down to a tenant and trust your tenant users though) | 20:36 |
clarkb | jeblair: something to the production node to run ansible (possibly ansible) | 20:37 |
jeblair | clarkb: if you have a job which is "run something on all the git servers" where does that job launch from? | 20:37 |
clarkb | but that something would be more tightly controlled | 20:37 |
clarkb | jeblair: head (list_of_git_servers) ? | 20:37 |
clarkb | (there are other security concerns to that too, but at least you've confined the scope of breakage to within the "env", and not env or orchestrator) | 20:38 |
jeblair | clarkb: that seems pretty arbitrary, and not really i think how people are accustomed to using ansible. the goal is to try to be as transparent as possible. if you think "i run this playbook to update my cluster" that should map easily to "zuul runs this playbook to update my cluster". i think if we add topology design requirements for users beyond that, it won't be very attractive. | 20:39 |
clarkb | jeblair: yes, I understand that. The problem is it's a very poor choice for how we run zuul for openstack | 20:40 |
clarkb | at least if you are worried about the orchestrator thing being compromised the scope of that is quite large | 20:40 |
jeblair | i don't see why it's a poor choice | 20:40 |
jeblair | i see that it's important to be careful and get right; but that doesn't necessarily make it poor | 20:41 |
clarkb | because if I manage to get control of the orchestrator now I control everything and not just the test env that I owned | 20:41 |
jeblair | clarkb: sure, but there are many other aspects of zuul that if you got control over would give you similar access | 20:41 |
clarkb | and from what we have seen using ansible in this capacity is incredibly leaky | 20:41 |
clarkb | jeblair: right but we don't let arbitrary code execute within the context of those pieces of zuul | 20:42 |
jeblair | clarkb: we let zuul run with arbitrary configuration | 20:42 |
jeblair | clarkb: that seems almost as dangerous. if not worse. | 20:42 |
clarkb | jeblair: I'm not sure I follow? today zuul config is not arbitrary. Its reviewed by multiple individuals first | 20:43 |
jeblair | (personally, i'm actually more worried that we'll mess up something there than someone will escape bubblewrap) | 20:43 |
jeblair | clarkb: in v3 we have dynamic config | 20:43 |
clarkb | jeblair: you are saying bigger concern that tenant A might somehow get tenant B's secrets by configuring themselves to run that job? | 20:44 |
clarkb | (I can see that being a concern too) | 20:44 |
SpamapS | in a CD situation, Maybe this is misguided, but I don't expect any untrusted playbooks to be running. | 20:44 |
jeblair | clarkb: sure, or run a job they aren't supposed to, or run a job on a node they aren't supposed to. | 20:44 |
jeblair | SpamapS: i agree that would be bad form :) | 20:44 |
clarkb | SpamapS: but you don't actually have control over that | 20:44 |
SpamapS | I, the zuul admin, do. | 20:45 |
clarkb | (though maybe that's something that should be configurable? if it isn't already) | 20:45 |
SpamapS | final for all the things. | 20:45 |
jeblair | clarkb: i'm not sure what you mean by "but you don't actually have control over that" | 20:45 |
SpamapS | actually not final for all the things. | 20:45 |
clarkb | jeblair: users can push changes that change how jobs run. And then they will run before being reviewed | 20:45 |
clarkb | jeblair: so in the CD case you could have someone push a playbook that takes out testing/staging/prod whatever potentially without being reviewed first? | 20:46 |
SpamapS | just that if it's CD, I'm not running check jobs on the "actually deploy to things" branch | 20:46 |
SpamapS | of course, you still get a gun and a foot | 20:46 |
jeblair | clarkb: i would not recommend putting CD jobs in untrusted projects; config projects don't run with dynamic config. so if you put your CD stuff in config projects, you should be fine. | 20:47 |
jeblair | clarkb: (unless we mess up implementation, as i was saying above) | 20:47 |
clarkb | right gotcha | 20:47 |
SpamapS | one would hope anybody with access to the guns will be trying hard not to aim at their foot, and mostly use push powers (guns) for dire situations where reviews are broken. | 20:47 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Send interface_ip in the node description https://review.openstack.org/455639 | 20:49 |
clarkb | I think my concern on the ansible code execution front is I've spent a little time with the code base now and basically think there is no way to fully lock it down. Which is where bubblewrap comes in. And in that space people who know a lot more about it than me still say containers are not an isolation primitive. Whereas VMs (which still do get compromised too) seem to have a better reputation in that space | 20:49 |
jeblair | clarkb: i think we may be crossing the hump on container as security primitive. certainly the bubblewrap folks seem to think it's reasonable. and it's being used more and more. | 20:50 |
SpamapS | so, bubblewrap _is_ trying to drop privileges, and close known doors out of containers. | 20:51 |
SpamapS | it's not just making namespaces | 20:52 |
jeblair | ++ | 20:52 |
jeblair | clarkb: i agree that ansible blocking is leaky. and containers have potential vulns (they are still new, in this context, in the scheme of things). but i think that the two together give me warm fuzzies. | 20:52 |
SpamapS | the one piece I'm missing in the implementation I did already is seccomp to further ratchet down access to kernel subsystems. | 20:53 |
clarkb | definitely (which I've said before, bubble wrap of the not VM option seems like a good one) | 20:53 |
SpamapS | and then another piece which I think is a layer we would want but is hard if not impossible to build into zuul, is configuration of a MAC | 20:53 |
mordred | clarkb: ++ | 20:53 |
mordred | this is why I've been an advocate for both things. I agree that we're unlikely to get perfect on locking down ansible - and also that i'm still holding out concerns about containers - but with both of them at the same time it seems to me to be in the acceptable range - and also will give me some time in production with a container tech to get to trust it more | 20:54 |
clarkb | I'm also slightly concerned about the operational overhead involved with getting a working secure bubblewrap on $distro, but that's not my real concern, that's solvable once and then done | 20:54 |
mordred | without relying _only_ on the container tech | 20:54 |
jeblair | clarkb: and i think that the design we've chosen for v3 is a good one -- i think that the way we are looking at jobs in v3 is nothing short of revolutionary in both the CI and CD spaces. i think it's worth doing as much as we can to try to achieve that. if bwrap+ansiblock is insufficient, i'd rather tack toward k8s/nodepool-extra-node/etc rather than give up on that idea. | 20:54 |
SpamapS | It may make sense to include an apparmor and/or an selinux setup kit. But I think those may be hard to make dynamic and built in. | 20:54 |
mordred | SpamapS: yah | 20:55 |
jeblair | SpamapS: the day we can run zuul from packages that include that will be a happy day :) | 20:55 |
mordred | jeblair: ++ | 20:56 |
pabelanger | fedora rawhide soon(tm) | 20:56 |
clarkb | (I also don't want to go too far down the path of k8s magically solving this problem as I do not think it does, it solves an orthogonal problem of having cheap throwaway execution envs that may or may not be leaky themselves) | 20:57 |
jeblair | clarkb: but ultimately, i think that bwrap+ansiblock(+more) is good now, and if we can get security+scalability out of k8s/nodepool/etc later, that's a good plan. | 20:57 |
jeblair | clarkb: indeed | 20:57 |
SpamapS | at this point, with just bubblewrap, if you can get ansible to bust out, you're sitting in a purpose built readonly dir with a bind-ro mounted, user namespaced (There is no root) /usr with a namespaced everything-except-networking, a locked fsuid ... | 20:58 |
clarkb | SpamapS: interesting that networking is not namespaced (but since there is no root that should be fine) | 20:58 |
SpamapS | it can be | 20:59 |
SpamapS | but we don't need it | 20:59 |
SpamapS | You are also limited to CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_NET_ADMIN, CAP_SETUID, CAP_SETGID (but without a way to get a uid==0) | 20:59 |
clarkb | SpamapS: do we want to allow CAP_NET_ADMIN if not namespacing networking (or do you mean in general you can have CAP_NET_ADMIN?) | 21:00 |
SpamapS | clarkb: bubblewrap allows those caps, and drops all others. | 21:01 |
SpamapS | could probably suggest a feature where if not doing --unshare-net then CAP_NET_ADMIN is dropped | 21:02 |
SpamapS | a lot of what's done in bubblewrap is dropping "things that make no sense to give to applications". But there are, I guess, apps that need that CAP sometimes. | 21:02 |
clarkb | SpamapS: ya just thinking that could be used to DoS the env if you break out of ansiblock | 21:03 |
SpamapS | it also implements privilege separation for things that it does post-fork | 21:04 |
SpamapS | it's just solid sandboxing | 21:04 |
jeblair | Shrews, mordred: why not make fingerd part of executord? | 21:13 |
Shrews | jeblair: tried that first, but gotta be root for the finger port | 21:14 |
jeblair | Shrews: ah fun. well, we still have to solve that even if it's a separate program, right? we could drop privileges when daemonizing... | 21:16 |
Shrews | jeblair: so it was either a separate thing, or change executor to drop privs to the configured user | 21:16 |
jeblair | is there a way to use capabilities with a python program? | 21:16 |
mordred | https://pypi.python.org/pypi/deescalate/0.1 | 21:17 |
jeblair | (can you setcap /usr/local/bin/foo.py or would you have to setcap /usr/bin/python)? | 21:17 |
clarkb | could also potentially do the gerrit thing and run on some high port by default? | 21:17 |
clarkb | (that's less useful if you just want your finger command to work though) | 21:17 |
mordred | which is, of course, using C: https://github.com/stephane-martin/deescalate/blob/master/deescalate/_deescalate.pyx | 21:18 |
jeblair | clarkb: yeah, could do that and iptables. | 21:18 |
jeblair | i think the daemon module also supports switching uids | 21:20 |
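Editorial sketch of the "get the port, then drop privileges" idea discussed above, using only the Python standard library. The function names `open_listener` and `drop_privileges`, and the account name passed in, are hypothetical and not taken from Zuul; this is one possible shape under those assumptions, not the approach the project settled on.

```python
import os
import pwd
import socket


def open_listener(port, host="0.0.0.0"):
    """Bind and listen on a TCP port (ports below 1024 normally need root)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(5)
    return sock


def drop_privileges(username):
    """Switch to an unprivileged user; returns False if we were not root."""
    if os.getuid() != 0:
        return False  # nothing to drop
    pw = pwd.getpwnam(username)
    os.setgroups([])      # clear supplementary groups while still root
    os.setgid(pw.pw_gid)  # change group before user, or setgid would fail
    os.setuid(pw.pw_uid)  # after this call root cannot be regained
    return True
```

A daemon started as root would call `open_listener(79)` first and `drop_privileges("zuul")` immediately afterwards, so that only the already-bound socket carries over from the privileged phase.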
Shrews | jeblair: i went with the separate daemon to avoid any pesky security things by changing how the executor privs work, but i could go back and rework it again to be in the executor if you prefer. | 21:20 |
Shrews | i think the separate process is actually pretty simplified. it can get all the info about jobs it needs from the zuul.conf file. but tomato tomato | 21:22 |
* Shrews just glad to be coding on non-nodepool things :) | 21:22 | |
jeblair | Shrews: i'm not convinced i know the right answer right now. combining it with executor pros: there's a 1:1 relationship between executors and fingerds, so it makes sense. there would be less boilerplate process code for devs and fewer daemons for operators to know and run. if we used threads, we have easy internal access to the jobdir and the host inventory. cons: more work for the executord, especially if we use threads. we can still use ... | 21:24 |
jeblair | ... fork, though then we lose the easy access to internal variables. | 21:24 |
jeblair | Shrews: on balance, i'm leaning toward ignoring the internal variable access argument so that we can choose thread/fork as appropriate. but i'm being swayed by the idea that if we combine them, we don't need to keep track of (or explain to operators) an extra process. | 21:25 |
SpamapS | I'm curious now... | 21:26 |
SpamapS | how does zuul know what to put in the finger hostname? | 21:26 |
jeblair | Shrews: (and i realize you have already solved the jobdir location problem, so that shouldn't be an issue. at some point in the future, we will probably want to know host inventory so we can request /var/log/syslog@test-host. but of course, we can read the ansible inventory file. :) | 21:27 |
SpamapS | in the past, the telnet hostname is just the node's best effort public IP | 21:27 |
jeblair | SpamapS: the scheduler will know the executor running a job, so it can say "finger UUID@executorhostname" | 21:27 |
Shrews | jeblair: playing devil's advocate, the executor should just "execute" jobs | 21:28 |
Shrews | but also, we're sending the finger requests to the executor host, so.... | 21:28 |
SpamapS | jeblair: yeah, then that's another good argument for fingerd==executord | 21:28 |
jeblair | Shrews: i hear that. i think there's a fuzzy line between too few and too many microservices. i'm not sure the best way to characterize that, but things like "relationship between services and hosts" is one of them, "how annoying is it for operators" is another, and "how does it affect scalability". | 21:30 |
jeblair | Shrews: those first 2 might be the same thing. :) | 21:30 |
Shrews | jeblair: i'll code it up the other way and then we can do a side-by-side comparison. maybe that will help with the decision making? | 21:30 |
Shrews | grrr... i think i lost the code from the first time. ah well. | 21:32 |
fungi | from a security perspective, separate daemons _feels_ safer. but the devil is of course in the details | 21:32 |
Shrews | fungi: YOU are the devil | 21:32 |
jeblair | Shrews: at any rate, for the last one, i'd say that's a weak push toward microservice, but the first two, i'd rate as a slightly stronger push toward monolithic. | 21:33 |
fungi | i advocate for him well, at any rate | 21:33 |
jeblair | fungi: good point, that should be on the list too. :) | 21:33 |
jeblair | fungi, Shrews: i think the security footprint is similar in both cases (we want this to run as the zuul user after getting the port and dropping privileges regardless). so it should end up having the same level of access. | 21:34 |
* fungi still thinks qmail was a good design, security-wise (just made for a fairly unfun management situation at times if you forgot what needed to be kept running) | 21:34 | |
fungi | yeah, i guess my worry is that you inadvertently open up an anonymous/unauthenticated vulnerability in the finger socket implementation which allows an attacker to influence or even take control of an executor running a sensitive job | 21:35 |
fungi | i get that's a ton of hand-waving though | 21:35 |
Shrews | fungi: yeah, that was my initial thinking in choosing the separate daemon pathway | 21:37 |
fungi | and still conceivable even if they're separate... assuming the fingerd has access to all the same files on disk that the executor does | 21:38 |
fungi | though would it be possible to limit its access to just the logs it's intended to stream? | 21:38 |
jeblair | fungi: if that's a concern, we *could* start a separate process from the executor daemon (like we do for geard). so at least it's transparent for the user. the identical access argument makes me weigh this fairly lightly though. | 21:38 |
fungi | yep, i get that | 21:39 |
fungi | just doing my part at devil's advocacy | 21:39 |
fungi | ultimately it's going to be about the same either way, and one way is less work | 21:39 |
jeblair | Shrews: i left some total nits on PS4 because that's what you were on when i started typing them. :) | 21:40 |
jeblair | (ok, 3 nits and one actual thing) | 21:40 |
Shrews | jeblair: k. ps5 just adds the actual streaming | 21:41 |
Shrews | which is really just a rip off of zuul_console.py | 21:42 |
jeblair | that seems to have been successfully streaming logs, so ++ :) | 21:42 |
Shrews | fwiw, i do not believe zuul_console would actually properly close the sockets at the end of the log. i had a hard time getting that to actually work | 21:43 |
mordred | jeblair: btw I read the finger protocol RFC ... and in so doing learned about finger foo@bar.com@bang.com syntax | 21:43 |
Shrews | and the vending machine query functionality! | 21:45 |
jlk | today's data point | 21:46 |
jlk | my tree passes tests with 8 cores, concurrency 8 | 21:46 |
jlk | fails tests with 8 cores concurrency 4 | 21:46 |
jlk | fails tests with 4 cores concurrency 8 | 21:46 |
jlk | (unless tox /testr forces concurrency to be no more than cores) | 21:47 |
jlk | (also fails 4 cores, concurrency 4) | 21:47 |
fungi | unit test race(s)? | 21:48 |
mordred | jlk: I swear to god, the fix is going to be a comma somewhere | 21:49 |
SpamapS | jlk: have you tried changing the hard fail to soft fail in the timeout setup? | 21:50 |
jlk | I have not. | 21:50 |
jlk | what would that look like? | 21:50 |
SpamapS | gentle=False changes to gentle=True | 21:51 |
SpamapS | in tests/base.py | 21:51 |
SpamapS | jlk: also do you have a log of your fail that isn't berzillions of bytes? | 21:52 |
jlk | no | 21:52 |
jlk | because it seems to need to be the whole she-bang | 21:52 |
jeblair | mordred, Shrews: would "finger /var/log/nova.log@compute@build_uuid@zuul.example.com" be appropriate? or would the rfc have us use something other than @ there? | 21:52 |
jlk | smaller sets of tests have passed | 21:52 |
jamielennox | no meeting today? | 21:52 |
SpamapS | jlk: k, is there one I can pull down? | 21:52 |
jeblair | jamielennox: i think we are going to have one? | 21:53 |
jeblair | unless we all decide it's not useful. :) | 21:53 |
jamielennox | isn't now the time? | 21:53 |
Shrews | jeblair: i think so. i was considering 'finger JOB:/log1@compute@something' | 21:53 |
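Editorial sketch of how the chained `@` syntax mentioned above is usually handled, as far as RFC 1288 describes it: the client connects to the last host in the chain and sends everything before it as the query, which that host may forward onward. The helper name `split_finger_target` is hypothetical, and nothing here reflects how Zuul parses requests.

```python
def split_finger_target(target):
    """Split a finger target into (query_to_send, host_to_connect_to).

    The host is whatever follows the last '@'; the rest, including any
    earlier '@host' hops, is sent to that host unchanged.
    """
    query, sep, host = target.rpartition("@")
    if not sep:
        return target, None  # no '@' at all: a purely local query
    return query, host


query, host = split_finger_target(
    "/var/log/nova.log@compute@build_uuid@zuul.example.com"
)
# host is the machine to connect to; query is what gets sent to it.
```

With jeblair's example, the client would connect to `zuul.example.com` and send `/var/log/nova.log@compute@build_uuid`, leaving the server free to interpret the remaining `@` segments however it chooses.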
jeblair | jamielennox: in 7 minutes. | 21:53 |
jamielennox | or have my calendars gone crazy again | 21:53 |
jamielennox | doh | 21:53 |
jamielennox | yea, my bad | 21:53 |
mordred | jeblair: I will not be able to be at the meeting- it took us slightly longer to arrive back in Dallas and we're just now hitting rush hour traffic | 21:53 |
SpamapS | jamielennox: sorry, we don't do wallaby time. | 21:53 |
jamielennox | SpamapS: i try so hard to get the hours lined up i completely screwed up the minutes ;) | 21:54 |
jlk | you could have been from one of those dictatorships that set their time to be a few minutes off, just to dick with people. | 21:54 |
fungi | i see you've read the constitution for my jerkocracy | 21:56 |
jamielennox | just google calendar changing the notification to 10 minutes, rather than starting now | 21:56 |
jeblair | fungi: i didn't think you let anyone read that? | 21:57 |
jlk | first rule of jerkocracy, nobody gets to know what the rules are. | 21:57 |
clarkb | the timezones that are off by half an hour always make my brain stop working | 21:57 |
jlk | I rather like the UTC complication on my watch face. | 21:58 |
clarkb | I can do hour math mostly. But when I have to add or subtract half an hour I dunno what happens but brain breaks | 21:59 |
SpamapS | jlk: you mean, like.. Bangalore? | 21:59 |
SpamapS | which is UTC+1230 | 21:59 |
SpamapS | or something like that | 21:59 |
jlk | I think there are a few others like that | 21:59 |
jamielennox | adelaide | 22:00 |
SpamapS | UTC+0930 I think | 22:00 |
jamielennox | luckily nothing really happens in adelaide | 22:00 |
SpamapS | like "oh we couldn't possibly have tea with the sun in the sky 3 degrees further on. no no no" | 22:00 |
pabelanger | Newfoundland time zone checking in! UTC-3:30 | 22:00 |
SpamapS | so now there is a meeting yeah? | 22:00 |
SpamapS | pabelanger: I guess closer to the poles it might matter. ;) | 22:01 |
jeblair | meeting time now | 22:01 |
jeblair | or 1 minute ago | 22:01 |
*** jkilpatr has quit IRC | 22:40 | |
*** jkilpatr has joined #zuul | 23:18 |