clarkb | https://lwn.net/Articles/706025/ | 00:01 |
SpamapS | ok I'm finding the cgroup things | 00:01 |
clarkb | looks like it may still be a work in progress | 00:02 |
SpamapS | It works | 00:03 |
SpamapS | but it's new | 00:03 |
SpamapS | so I think the way you're supposed to do it | 00:03 |
SpamapS | is write a template unit file | 00:03 |
SpamapS | and then systemd-nspawn with that | 00:03 |
jlk | So it might help if we sketch out what we think we want to happen | 00:04 |
jlk | and then look at what tooling will do some of it, and what tooling will require us to write glue around it | 00:05 |
SpamapS | Yeah the spec right now is what we think might happen if somebody breaks out, and weighs the capabilities against that. | 00:06 |
SpamapS | But it doesn't do much to help design the happy path. | 00:06 |
SpamapS | What I do see as the happy path is 0) rootfs image with ansible-playbook, git, and deps, built periodically 1) Copy image into empty dir. 2) Make writable scratch space in dir. 3) [git magic in scratch space] 4) run trusted pre-playbooks in dir 5) chroot into dir and run ansible-playbook in untrusted context 6) run trusted post-playbooks in dir. 7) dust off, nuke it from orbit | 00:11 |
SpamapS | (it's the only way to be sure) | 00:11 |
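As a rough shell sketch of those numbered steps (every path, image name, and playbook name here is hypothetical, and step 3 stays hand-waved as in the discussion):

```sh
JOBDIR=$(mktemp -d)                               # 1) copy image into empty dir
tar -C "$JOBDIR" -xf /var/cache/zuul/rootfs.tar   # the periodically built rootfs from 0)
mkdir "$JOBDIR/scratch"                           # 2) writable scratch space in dir
# 3) [git magic in scratch space]
ansible-playbook trusted-pre.yaml                 # 4) trusted pre-playbooks in dir
sudo chroot "$JOBDIR" \
    ansible-playbook /scratch/untrusted.yaml      # 5) untrusted context inside dir
ansible-playbook trusted-post.yaml                # 6) trusted post-playbooks in dir
rm -rf "$JOBDIR"                                  # 7) nuke it from orbit
```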
jlk | I'm fuzzy on v3 stuff, the launcher is going to push code into the node, yes? | 00:13 |
SpamapS | yes, the pre playbook does that | 00:13 |
jlk | and the launcher is also what will fetch code from the upstream to merge the proposed patch(es) in the right order? | 00:13 |
SpamapS | [git magic] | 00:13 |
SpamapS | yes that's step 3 | 00:14 |
jlk | any reason why the [git magic] can't happen inside the containment? | 00:14 |
jlk | does it require too much zuul ? | 00:14 |
SpamapS | so that's where the debate on whether or not to have separate mergers comes from | 00:15 |
SpamapS | I recall there being some reason that having it separate is desirable | 00:15 |
clarkb | SpamapS: scaling is a big one (git merging is slow) | 00:15 |
SpamapS | but clearly it got ejected from my LRU cache | 00:15 |
jlk | there's a security concern | 00:15 |
jlk | if dealing with private repos | 00:15 |
SpamapS | Right, private repos is enabled by the auth system | 00:15 |
SpamapS | In theory anyway | 00:16 |
SpamapS | probably some plumbing to make that work eventually | 00:16 |
SpamapS | Since right now connections are what define how to fetch code. | 00:16 |
SpamapS | jlk: step 3 is in [] because that may also just be that we copy the merge result which the mergers created somehow somewhere. | 00:17 |
SpamapS | I think I'll add a revision to the spec that puts that general flow in for debate | 00:17 |
jlk | kk | 00:18 |
jlk | walk backwards | 00:18 |
jlk | to execute the in repo playbook, we need, the repo contents. heh. | 00:18 |
jlk | but wait | 00:18 |
SpamapS | aye we do | 00:18 |
jlk | repo has a playbook in it | 00:19 |
SpamapS | but in-repo playbooks are only in step 5 | 00:19 |
SpamapS | pre/post come from config repos | 00:19 |
jlk | k | 00:19 |
jlk | I'm still struggling with step 5 for a moment | 00:19 |
SpamapS | that's the biggie :) | 00:19 |
jlk | an in-repo playbook, gets executed by the launcher, so it has to be written from the context of .... | 00:20 |
jlk | this still feels weird to me, as in why we aren't prepping a VM, and making that in-repo playbook execute on the _vm_, with all the code around it | 00:20 |
SpamapS | jlk: the reason we're using ansible is it lets us do some interesting things between vms | 00:21 |
SpamapS | jlk: but.. that's not a bad idea | 00:21 |
jlk | it does, but it opens up a big can of worms | 00:21 |
jlk | because if I want to do interesting things between VMs, I'm going to care about the ansible I'm using to do it | 00:22 |
* rbergeron pulls out her pandora's box of dependencies | 00:22 | |
jlk | and the environment it's running in | 00:22 |
SpamapS | rbergeron: no it's ok, we're rewriting ansible in Rust | 00:22 |
SpamapS | almost done | 00:22 |
SpamapS | ransible | 00:22 |
rbergeron | SpamapS: oh good | 00:22 |
SpamapS | you heard it here first | 00:22 |
jlk | ansibust | 00:22 |
SpamapS | ding ding | 00:22 |
* SpamapS renames | 00:23 | |
SpamapS | jlk: so I see what you're getting at | 00:23 |
SpamapS | and I kind of love it | 00:23 |
SpamapS | just push the git trees up and run ansible from one of the nodes | 00:23 |
SpamapS | and they can do whatever they want | 00:23 |
SpamapS | no plugins needed even | 00:24 |
jlk | it just feels wrong to me to hand the repo rights to execute things on our control environment | 00:24 |
SpamapS | I agree | 00:24 |
SpamapS | ansible is too powerful for that | 00:24 |
jlk | maybe it costs jobs an extra VM as a bastion | 00:24 |
jlk | but it side steps a whole lot of nastiness | 00:24 |
SpamapS | Right, that bastion can be pretty tiny | 00:25 |
SpamapS | and if your job can handle it, just run your tasks on localhost | 00:25 |
jlk | I may be biased, I've written ansible to run ansible before | 00:25 |
SpamapS | so the default py27-tox job is just a localhost shell. | 00:26 |
jlk | yeah | 00:26 |
jlk | you can get even trickier | 00:26 |
jlk | this is ugly, but you add 10 hosts, all with ansible_connection=local | 00:26 |
jlk | and then you can do things in parallel, locally | 00:26 |
jlk | stop, disregard that. | 00:27 |
SpamapS | Yeah that's not the thing :) | 00:27 |
SpamapS | I think what you'd do is simply have a predictable host that the executor uses to run the in-repo playbooks. | 00:27 |
SpamapS | mordred: ^ why didn't you think of this? ;) | 00:28 |
SpamapS | jeblair: ^ | 00:28 |
SpamapS | clarkb: ^ | 00:28 |
jlk | like, Please do convince me why it's a good idea to be doing the in-repo playbook execution _on_ the executor | 00:28 |
SpamapS | oh man, time changed.. it's 5:30 but I could like, go outside and feel the day star still | 00:28 |
clarkb | except it's raining | 00:28 |
SpamapS | clarkb: what you need is a nice drought-addled state ;) | 00:29 |
clarkb | SpamapS: oregon is completely drought free for the first time in 5 years or something | 00:30 |
clarkb | ERAIN | 00:30 |
jlk | I think we've had 3 "mild" days since October. | 00:33 |
SpamapS | I believe California is now 98% drought free | 00:35 |
SpamapS | with San Diego being the only part still slightly behind | 00:35 |
SpamapS | jlk: seems mordred and jeblair are not around. I think we should discuss your thinking with them tomorrow. | 00:35 |
clarkb | also coldest winter in a quarter century. I have not enjoyed this winter | 00:36 |
jlk | I could be missing something fundamental | 00:36 |
SpamapS | Because honestly.. I'd much rather inject ansible into the node that we're already working hard to build and isolate, than try to execute it securely on our executor. | 00:36 |
SpamapS | http://www.laalmanac.com/weather/we13a.htm | 00:37 |
SpamapS | We're a wee bit ahead of usual season totals ;) | 00:37 |
jlk | not exactly zuul related, but I have a question on workflow | 00:38 |
jlk | I feel like I'm doing this wrong. | 00:38 |
jlk | I've got this long series of patches I'm trying to bring over, one by one to v3 and submit them | 00:39 |
jlk | what I've been doing is making a new topic branch based on the previous topic branch, then cherry-picking the old patch over. | 00:39 |
jlk | fixups, then git review -t | 00:39 |
jlk | I realize that if I need to fix one of the early patches, I now have like 20 branches to mess with | 00:40 |
jlk | should I be doing this all from _one_ local branch, so that a fixup/rebase will bubble up the stack once and a git review -t will re-submit new patch sets for the whole stack in one shot? | 00:40 |
jhesketh | jlk: that is the approach I take with long series (and I've seen others do the same).. It's kind of expected that a change down the stack will rebase subsequent work. The patchset diff view in gerrit makes re-reviewing changes easier | 00:41 |
SpamapS | jlk: one branch, definitely | 00:42 |
SpamapS | name the branch your desired -t argument | 00:42 |
jlk | okay. I had started with one branch, but for some reason I abandoned that early on, and I can't recall why. | 00:42 |
SpamapS | git review will set topic based on branch name IIRC | 00:42 |
clarkb | ya I do a lot of git rebase -i HEAD~N | 00:42 |
SpamapS | that^ | 00:43 |
clarkb | and jeblair has a little utility to figure out that rebase command automagically for you | 00:43 |
clarkb | I forget the name though | 00:43 |
SpamapS | would be cool to have that as a git sub-command | 00:43 |
clarkb | git restack? | 00:43 |
SpamapS | like git review-rebase or something | 00:43 |
clarkb | SpamapS: ya I think it is | 00:43 |
SpamapS | though it's probably also part of stgit | 00:43 |
clarkb | so if you pip install it you just git foo it | 00:43 |
SpamapS | well but stgit doesn't check gerrit for you, just checks when you last stacked | 00:43 |
jeblair | yep, git-restack; it's on pypi | 00:44 |
clarkb | I just do git rebase -i HEAD~N because I have used it for years and it's harder to retrain my brain than to keep going with it | 00:44 |
jeblair | yeah, it's like ^ but for people who can't count. like me. :) | 00:45 |
jeblair | https://pypi.python.org/pypi/git-restack/1.0.0 | 00:45 |
jeblair | has doc and source links | 00:45 |
jhesketh | I usually rebase back onto master or the target branch as it helps with merge-conflicts and the testing from the gate anyway | 00:52 |
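Pulling the advice above together, a sketch of the single-branch workflow (branch and topic names are hypothetical; `<sha-N>` stand in for the old commits):

```sh
# One local branch for the whole series:
git checkout -b v3-port origin/feature/zuulv3
git cherry-pick <sha-1> <sha-2> <sha-3>   # bring the old patches over in order
git review -t v3-port                     # push the whole stack under one topic

# Fixing an early patch later: edit it in place and the change bubbles
# up the stack; re-running git review re-submits new patch sets for all.
git rebase -i HEAD~20                     # or: pip install git-restack && git restack
git review -t v3-port
```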
SpamapS | jlk: I'm writing up what we talked about into the spec. I do have a few little gotchas (but they're smaller than "use docker") | 00:53 |
SpamapS | But it also gains us a feature | 00:54 |
SpamapS | which is that you'd only need a public IP on the untrusted executor node | 00:54 |
SpamapS | s/public/reachable/ | 00:54 |
SpamapS | jlk: spec updated to include execution on node | 01:07 |
SpamapS | I suspect the reason it was discounted was that putting ansible on the node and running it there feels like intruding on the user's node. | 01:08 |
jlk | so the jobs and such that would run tox, were they going to run on a node, or were they going to run right on the executor? | 01:12 |
jlk | A concern I have outside of security with allowing execution on executor, is that we'd have to scale them a lot more. Before, nodepool was the scale point, because that's where the jobs executed. More jobs == more nodes. But if it can happen on the executor, that's something new we'd have to scale | 01:14 |
jlk | and that cost model may be different. Nodepool resources may be "cheaper" (donated), but infra resources need to be _ours_ and more tightly managed. | 01:14 |
pabelanger | jlk: SpamapS: FWIW this is how I test ansible things today with zuulv2.5, clone repos on node, pip install ansible, set host to 127.0.0.1. Works as expected for single host jobs | 01:40 |
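Roughly the flow pabelanger describes, as a sketch (repo URL and playbook name are hypothetical):

```sh
git clone https://git.openstack.org/openstack/some-ansible-repo   # hypothetical repo
pip install ansible
ansible-playbook -i '127.0.0.1,' -c local test-playbook.yaml      # single-host job, no SSH
```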
jlk | in V3, in layout, is there still a syntax when defining a project to say "run these jobs only if this first job succeeds" ? | 01:41 |
jlk | the docs say yes, but my testing is upset with the syntax | 01:41 |
jlk | pabelanger: that's what we do too | 01:43 |
jlk | we're actually doing that plus multi-node to test ansible working with multiple nodes | 01:43 |
pabelanger | devstack works that way today too | 01:43 |
jlk | ah, syntax changed slightly | 01:45 |
pabelanger | but, so far with my testing of zuulv3, things work as expected from zuul-executor. Assuming we can agree on the container / chroot method of running ansible-playbook, that gets us 99% of things covered I think | 01:46 |
jlk | SpamapS: are we tracking docs bugs in v3 anywhere? | 01:46 |
pabelanger | then we don't need to worry about bootstrapping workers with ansible dependencies | 01:46 |
jlk | well... | 01:46 |
jlk | it works until you want to test your stuff with a different version of Ansible than what's on the zuul-executor | 01:46 |
pabelanger | sure, but that is an issue today that is not specific to zuul | 01:47 |
jlk | right, however | 01:47 |
jlk | "run on the node" model allows the user to influence the version of Ansible used | 01:47 |
jlk | "run on the executor" does not | 01:47 |
pabelanger | right | 01:47 |
pabelanger | but | 01:47 |
pabelanger | I think we want to have a playbook that allows a user to run ansible on the node | 01:48 |
pabelanger | so, zuul-executor runs a playbook on a remote worker, to then run ansible | 01:48 |
pabelanger | I wonder | 01:49 |
pabelanger | if we do have ansible-playbook in a container, on executor, could we not expose which container (and version of ansible) to run? | 01:49 |
jlk | I honestly think people got too excited that Ansible is the task execution engine and crept too far up the stack. | 01:50 |
pabelanger | that will get complicated fast I think | 01:50 |
jlk | There's a ton of work that went into scale/isolation of nodes to run suspect code | 01:50 |
jlk | re-producing that inside the executor feels wrong | 01:50 |
jlk | pabelanger: I think we'd want to just simply expose containers in general as a "node" resource in nodepool to go down that route | 01:51 |
pabelanger | I think this is where k8s comes into play | 01:53 |
pabelanger | and with that | 01:53 |
pabelanger | I am getting a beer and calling it a night | 01:53 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for dependent pipelines with github https://review.openstack.org/445292 | 04:55 |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Don't try and delete nodes with no external_id https://review.openstack.org/445299 | 05:20 |
mordred | jlk: the intent is that work always happens on nodes, never on executors - that's why we disallow local execution etc | 05:21 |
mordred | jlk: you said many more things above that will be easier to chat about when we're both online at the same time I think though :) | 05:22 |
jlk | most likely, yes | 05:22 |
jlk | maybe even in a voice situation | 05:22 |
mordred | yah - possibly so | 05:23 |
mordred | SpamapS, jlk: yes, amongst the reasons we didn't execute the ansible on the remote node is that then we'd have to install things on the remote node - and we'd have to re-implement the remote execution code that ansible already has for executing things remotely in order to execute things remotely - so we'd end up having unclean target nodes again and a more brittle execution pipeline | 05:29 |
jlk | it's a game of tradeoffs | 05:29 |
mordred | yes, it is | 05:29 |
jlk | because now we're re-implementing user provided code execution isolation | 05:30 |
mordred | well, we don't necessarily have to re-implement that part - it is totally acceptable for us to use an existing technology for such containment | 05:31 |
jlk | well, I mean, you're having to add a second containment system so that you can execute things in your existing containment system | 05:31 |
mordred | again - tradeoffs - so far we've been uncomfortableish with the root requirements of most of them - or the VC-backed obvious pattern of industry abuse of the one that doesn't require zuul itself to be root | 05:31 |
mordred | but the executor executing a local process that is "rkt ansible-playbook" instead of "ansible-playbook" isn't nearly as different as replacing "ansible-playbook" with "use paramiko to scp files to remote host, then use paramiko to execute ansible-playbook there" | 05:33 |
mordred | because then we'd also have to do something about trusted pre and post playbooks copying resources - because if we allow code to execute on the same host - even if it's the remote one, we can't trust that the user didn't rootkit the host anymore | 05:34 |
jlk | right | 05:34 |
mordred | which means we wouldn't be able to use ansible to write log publishing or artifact copying or execution of jobs that need secrets | 05:34 |
jlk | How does the user write things with this in mind? | 05:35 |
rbergeron | stupid rootkits | 05:35 |
mordred | jlk: so - 2 different answers in my head - for two different userbases ... | 05:35 |
jlk | I'm trying to map this in my head, how I would craft ansible playbooks to do my unit tests and whatnot | 05:36 |
jhesketh | mordred: why not use ansible on the executor to run ansible on the node? | 05:37 |
mordred | yah - so the main thing is that you can't write playbooks that do host=localhost - they must always be host = somenodename - and if you have tasks that naturally do something on the localhost like copy - you can't reference absolute paths that are outside of the execution context | 05:37 |
mordred | jhesketh: right - that's the answer for folks who _do_ need to test more exquisitely crafted ansible | 05:37 |
mordred | (btw, I'm saying all this like how it is mostly just to try to explain thinking thus far, not to necessarily remove discussion of what might be) | 05:39 |
jlk | what feels "easiest" to me, would be using ansible on executor to run the pre playbook (since we control that content), which preps the remote. Then whatever the user provided (be it a shell call or a full playbook) gets executed _on_ the node (since it's isolated, tenant separated, etc..) via ansible calling ansible, and then the post is run again from the executor since we control that content, and ostensibly nothing has leaked back from the node to the executor. | 05:41 |
jlk | but I get that it's awkward | 05:42 |
mordred | well - I 100% think we should write that | 05:42 |
mordred | and have it be an opt-in that a user can request for one of their jobs | 05:42 |
mordred | because we need it to handle the case of "test ursula" - since ursula needs plugins and whatnot (ursula is the most complex ansible I'm deeply familiar with, so it's my usual go-to in my head for complex use-case) | 05:43 |
jlk | right, and Ursula cares _which_ version of ansible is used | 05:44 |
mordred | but I _think_ we can write a lot of the "execute remote ansible" with ansible | 05:44 |
jlk | or say OpenStack-Ansible | 05:44 |
mordred | like a role | 05:44 |
jlk | which may want to test _multiple_ versions of Ansible | 05:44 |
mordred | that you can request an ansible version with | 05:44 |
mordred | so like "role: remote-ansible version: 2.1 target: {{ zuul.work_dir }}" or something | 05:44 |
jlk | but this also feels kind of like python envs or ruby envs | 05:44 |
mordred | that'll be super crazy ansible but that we wrote | 05:45 |
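A guess at what invoking such a role could look like from a job playbook, echoing mordred's inline sketch above (the role name and variable names are hypothetical):

```sh
cat > run-remote-ansible.yaml <<'EOF'
- hosts: node1
  roles:
    - role: remote-ansible                        # hypothetical standard-library role
      remote_ansible_version: "2.1"               # which ansible to run on the node
      remote_ansible_target: "{{ zuul.work_dir }}"
EOF
ansible-playbook -i inventory run-remote-ansible.yaml
```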
jlk | in travis we can say "I want py27, py32" | 05:45 |
jlk | I don't know how those are implemented, but I know when my job runs on them that version of python is there | 05:45 |
jlk | those might be nodepool images | 05:45 |
mordred | I don't think they need to be images | 05:45 |
mordred | I think the "run ansible" job in the standard library can install the requested version of ansible on the node in a pre-playbook | 05:46 |
jlk | yeah, it could be pre-playbooks to install them | 05:46 |
mordred | and then run with that | 05:46 |
mordred | and in zuul.yaml you'll just say "I want to run the "run-ansible" job on my repo" sort of like how in travis you say "I want to run with python2.7" | 05:47 |
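A sketch of the pre-playbook idea from a few lines up: install the requested Ansible version into a venv on the node before the job runs (variable name, default version, and paths are all hypothetical):

```sh
cat > pre-install-ansible.yaml <<'EOF'
- hosts: all
  tasks:
    - name: Install the requested Ansible version into a venv on the node
      pip:
        name: "ansible=={{ ansible_version_requested | default('2.1.0') }}"
        virtualenv: /home/zuul/ansible-venv
EOF
```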
jlk | wandering into the territory of "how does a site decide and expose these different env types" | 05:47 |
mordred | but for people who just want to run py27, they can just say "I want to run the py27 job" | 05:47 |
jlk | or, maybe our site just doesn't _do_ that feature. | 05:47 |
mordred | jlk: yah - we definitely need a plan for that so we don't accidentally and with the best of intention turn zuul into another non-compatible openstack mess | 05:48 |
jlk | heh | 05:48 |
jlk | so far, so good | 05:48 |
mordred | jlk: I think zuul itself eventually wants some amount of a "standard library" of jobs | 05:48 |
mordred | and some amount of site specific jobs | 05:48 |
mordred | so all zuul users, no matter the operator, should be able to, for instance, depend on a base python27 unittests job | 05:48 |
mordred | that's pretty much like the travis: python: 2.7 thing | 05:49 |
jlk | right. We were hoping to be able to survey a bunch of users of travis and others, to see if there are some common things like that which bubble up | 05:49 |
mordred | but then, you know, sites that aren't openstack probably don't need a pre-canned "run devstack for me" - although that's clearly a centrally defined job openstack needs | 05:49 |
jlk | well, travis has a py27 env, but it doesn't run anything by default. | 05:50 |
jlk | you still have to define a script to run | 05:50 |
mordred | yah - I think we want to have one that does run things - or multiples ... like pabelanger's run-tox thing he's been working on for tox | 05:51 |
mordred | but then having a "install py27" pre-playbook / base job that people can use is probably also a good idea | 05:51 |
jlk | yup. our hunch is that the vast majority of those script calls are going to be to tox, or something like that | 05:51 |
mordred | yup | 05:51 |
jlk | and that there are similar patterns for other languages | 05:52 |
jlk | doing it as pre stuff is an interesting balance. | 05:52 |
mordred | so we make it easy for people to write scripts that run python with different pythons, but also bake in a tox target that'll just run tox so they don't even have to write a script for it | 05:52 |
mordred | yup | 05:52 |
jlk | reduces the number of images you have to build every night, at the cost of a longer "spin up" for each test | 05:52 |
mordred | yes | 05:52 |
mordred | building the ansible standard library for this is one of the exciting bits, but I think often gets lost in our discussion of building the zuul framework itself | 05:52 |
mordred | so when the answer is often "you can just do that in ansible" - that doesn't mean a user _has_ to write ansible, because hopefully we've written enough standard library for normal tasks that they're just referencing pre-written zuul jobs | 05:53 |
jlk | right. the "in ansible" is just an implementation detail they don't care about | 05:54 |
mordred | (run-autotools, run-maven, run-cargo, run-cmake, etc) | 05:54 |
mordred | yup | 05:54 |
jlk | it's just one more job to list, I suppose | 05:54 |
mordred | and it's there for them when they want to get advanced | 05:54 |
* SpamapS catching up with backscroll now | 05:54 | |
jlk | SpamapS: we've carefully danced around actually arguing about the spec, and have found happier topics to discuss instead | 05:55 |
* mordred apologizes for being in europe and waking up to do the jet-lag-conversation-bomb | 05:55 | |
mordred | jlk: :) | 05:55 |
SpamapS | no it looks productive | 05:55 |
mordred | jlk: to be honest, I think this is actually really good background content | 05:55 |
SpamapS | and about the spec :) | 05:55 |
mordred | yah - what SpamapS said | 05:55 |
jlk | I'm doing a fine job of avoiding working on my book. | 05:55 |
mordred | ++ | 05:55 |
jlk | the chapter I'm working on right now is ansible + containers, so... | 05:56 |
mordred | \o/ | 05:56 |
SpamapS | So I feel like we could put ansible into a chroot in a temp dir venv, run it, and not have it disturb anything on the node. Crazy? | 05:56 |
SpamapS | jlk: THIS IS working on that ;) | 05:56 |
mordred | fwiw, if the problem with bubblewrap is that it needs zuul-launcher to have root, that's also the problem with rkt - but I think just granting zuul-launcher the ability to run rkt as root is better than not wrapping or wrapping incompletely | 05:57 |
SpamapS | and we wouldn't need any special execution code in zuul itself. We'd have local ansible that we control from a config repo run remote ansible playbooks that come from in-repo def. | 05:57 |
SpamapS | mordred: still feels like we're reinventing isolation instead of just using what we already have. | 05:58 |
rbergeron | jlk: oh, ansible + containers isn't a rabbit hole or anything | 05:59 |
jlk | I just had a bad vision of porting oslo runroot over to zuul. | 05:59 |
* rbergeron sighs | 05:59 | |
jlk | rbergeron: I'm just doing a surface scratch, Packt wanted another chapter. | 05:59 |
SpamapS | btw I really wish we had like, a rack of ribs, glasses of whiskey, and a patio view of the sea, to get this discussion done. :) | 05:59 |
mordred | [repasting from 05:29] SpamapS, jlk: yes, amongst the reasons we didn't execute the ansible on the remote node is that then we'd have to install things on the remote node - and we'd have to re-implement the remote execution code that ansible already has for executing things remotely in order to execute things remotely - so we'd end up having unclean target nodes again and a more brittle execution pipeline | 05:59 |
jlk | I also think this discussion is funny, because decisions here may drastically impact the talk I pitched for mordred and I do to at AnsibleFest in London. | 06:00 |
SpamapS | Yeah so I'm suggesting that we could probably do that without being brittle or dirtying the node. | 06:00 |
mordred | otoh - just make zuul run "sudo rkt ansible-playbook" | 06:00 |
rbergeron | jlk: yup, i think that's about what oreilly wanted for the next thing that lorin and rene are working on. that and "ansible 2.0 plz" | 06:00 |
jlk | shocker, same. | 06:01 |
SpamapS | And we could probably do it with less complexity than rkt/bubblewrap/systemd-nspawn/etc. | 06:01 |
SpamapS | mordred: yeah rkt may be the answer | 06:01 |
SpamapS | It's a nice tight wrapper around systemd-nspawn and even kvm. | 06:01 |
jlk | if we gave zuul a very fine line of being able to call rkt to launch unprivileged containers, that's not too dirty | 06:02 |
jlk | ideally not running all of zuul-executor as root | 06:02 |
mordred | yah | 06:02 |
SpamapS | The thing I worry about is that we're also building images. | 06:02 |
SpamapS | (with ansible in them) | 06:02 |
SpamapS | and it's just a lot. | 06:02 |
mordred | we shouldn't be building images with ansible in them | 06:02 |
mordred | oh - rkt images | 06:03 |
rbergeron | since i missed half of this: is this mostly related to the containery topic or ... evvvveryting? (not the rkt, but everything else about where the zuulecutioner lives) | 06:03 |
SpamapS | mordred: right | 06:03 |
SpamapS | it's a smaller thing | 06:03 |
mordred | I do not believe you need to build full images for rkt like you do with docker | 06:03 |
SpamapS | and less surface | 06:03 |
SpamapS | but.. just feels weird. | 06:03 |
mordred | I'm pretty sure you can point rkt at a chroot like dir | 06:03 |
jlk | you don't, just a directory tree | 06:03 |
mordred | yah | 06:03 |
jlk | but you still have to build that tree | 06:03 |
SpamapS | yes it's still an image | 06:03 |
SpamapS | a chroot image | 06:03 |
mordred | and we already build a directory tree anyway | 06:03 |
jlk | and do an overlay for each run | 06:03 |
SpamapS | but still an OS | 06:03 |
SpamapS | a user space anyway | 06:03 |
SpamapS | This will feel a lot less awkward once I'm done with Ansibust | 06:04 |
mordred | we don't need a full user space - we need ansible-playbook | 06:04 |
jlk | rbergeron: it's around "how do we run user supplied code on a zuul control node without allowing the user to own the control node" | 06:04 |
SpamapS | mordred: which needs rsync, python, in the midnight hour, it screams moar moar moar | 06:04 |
jlk | rbergeron: to satisfy a zuul v3 feature of "directly run user provided playbook content" | 06:04 |
SpamapS | mordred: it's not a "let's not do this" | 06:04 |
mordred | SpamapS: yah - but it doesn't necessarily need to be like a full ubuntu image | 06:05 |
SpamapS | it's "what if we didn't have to do this?" | 06:05 |
mordred | totes | 06:05 |
mordred | I'm just saying - it would be neat to figure out what the smallest thing that looks more like an app container is that we need to run ansible-playbook | 06:05 |
mordred | I'd also like to keep 2 things as separate for now until they need to not be separate | 06:06 |
mordred | that's a) allow zuul to test 'ursula' and b) protect the ansible execution on the launchers | 06:06 |
mordred | it MIGHT be that solving b solves a | 06:06 |
rbergeron | jlk: without allowing the user to own the control node == you're theoretically enabling the user to pwn the control node through some malicious playbook? | 06:06 |
jlk | rbergeron: correct. Ansible is pretty powerful, if you give a user the ability to provide their own playbook content, there are numerous ways they can interact with the host that's running ansible-playbook, in nefarious ways | 06:07 |
mordred | but it's not necessarily true - and I think that jlk brought up many good points such as needing specific ansible versions that make me still think a) is a great job for a really nice complex ansible role we write ourselves | 06:07 |
rbergeron | i guess that's not an == but something else, i am tired... | 06:07 |
mordred | for b, keeping the rkt image really small should be easy since we still wouldn't be allowing host=localhost - the number of binaries that are actually needed should be super minimal | 06:08 |
mordred | we _already_ copy the ansible modules library on zuul restarts - so we're already managing a copy of 75% of what would go into the temp dir we point rkt at :) | 06:09 |
rbergeron | jlk: not to open the floor to obvious comments, but... i wonder how much of that is handled by tower. or if not at all, aside from "assuming there are multiple layers of permissions / eyeballs on things" which... doesn't really fix it if you're like, super sekurity oriented | 06:13 |
SpamapS | mordred: yeah I'm pretty sure it will be a small chroot | 06:13 |
rbergeron | and i have to think they've gotten hit on that question before | 06:13 |
mordred | rbergeron: not at all | 06:13 |
jlk | rbergeron: it's a problem that exists in tower as well | 06:13 |
jlk | rbergeron: without careful human review of what goes into tower, one can pretty easily own the tower box. | 06:13 |
SpamapS | And I was kind of hoping these app sandboxers would be more aligned with what we want. | 06:13 |
mordred | yah - there are people in ansible-land, including bcoca, who want a restricted-ansible ... but doing it _right_ is hard because ansible was never designed for it | 06:14 |
SpamapS | they all have some o_O piece | 06:14 |
jlk | or at least gain access to whatever items are on the box that can be read by whatever user is used to execute ansible on tower. | 06:14 |
mordred | SpamapS: ++ | 06:14 |
rbergeron | jlk: ssh, don't tell the rhel folks | 06:14 |
jlk | I think they know | 06:14 |
SpamapS | Somebody said it earlier.. Unix gives you a bunch of tools to get anything done. But if you actually use all of them.. it looks like a complicated mess. | 06:14 |
mordred | :) | 06:14 |
mordred | ok - I gotta run ... this is a super great conversation though and I look forward to having more of it | 06:15 |
SpamapS | I kind of want a nice simple facade in front of chroot+cgroup+namespaces | 06:15 |
SpamapS | rkt is probably the closest to simple | 06:15 |
mordred | SpamapS: ++ | 06:15 |
SpamapS | systemd-machined/systemd-nspawn might also count | 06:15 |
SpamapS | except systemd makes me want to throw stuff | 06:15 |
SpamapS | (but rkt is just a go frontend for that anyway) | 06:16 |
mordred | yah - but then you'll make me actually type that word in a non-joking manner | 06:16 |
rbergeron | spamaps: that's 2 out of three | 06:16 |
rbergeron | i should have a bot | 06:16 |
mordred | I'd rather say "we use rkt" than "we directly use a spawn of the devil" | 06:16 |
mordred | rbergeron: ++ | 06:16 |
mordred | because I'd NEVER be able to get through a conference talk on the subject without grumbling | 06:16 |
mordred | but I can choose to ignore rkt's implementation details ... | 06:17 |
SpamapS | exactly | 06:17 |
SpamapS | same | 06:17 |
* SpamapS did write a chapter of the upstart cookbook after all | 06:17 | |
SpamapS | still a little bitter about that ;) | 06:17 |
mordred | SpamapS: so say we all | 06:18 |
jlk | I helped port Fedora from sysv to upstart to systemd. I'm done with init systems. | 06:18 |
rbergeron | man, y'all don't know bitter about that topic like i do :) | 06:19 |
SpamapS | rbergeron: yeah, you probably had to actually talk to lennart. ;) | 06:19 |
SpamapS | (even though Lennart was just the angry shouting monkey that distracted us while Kay stole pid 1 from our pockets) ;) | 06:19 |
jlk | I spent enough time talking to Lennart. | 06:20 |
jlk | and Hhoyer for the stuff around it | 06:20 |
jlk | and ported s390x boot up stuff to Dracut. That was totes fun | 06:21 |
jlk | welp, I've reached full on WTF for the night, so I'm done. | 06:22 |
rbergeron | I feel like if y'all poked at vbatts' or dwalsh's brain a bit they might have ideas. maybe even rackerhacker | 06:22 |
rbergeron | since he's off in sekurity land pretty often | 06:23 |
jlk | right, SELinux is likely part of this strategy | 06:23 |
mordred | rbergeron: I've talked to vbatts a little bit | 06:25 |
mordred | rbergeron: the problem is - as awesome as he is, the amount of the problemspace that one needs to page in to be able to be helpful is more than they have time for | 06:25 |
rbergeron | yeah | 06:26 |
rbergeron | so many ppl i wish i could clone to further all my causes | 06:26 |
mordred | rbergeron: but I think vbatts would say "use bubblewrap, or rkt/runc" | 06:27 |
rbergeron | mordred: wait, are we just talking about the container case or also not container cases? i thought it was both? | 06:28 |
rbergeron | mordred: also, i thoughtyou were going to like, do things and stuff :) | 06:28 |
mordred | rbergeron: yah - I'm working on it :) | 06:28 |
mordred | rbergeron: can you say "container case or also not container cases" but with different words? I don't understand the question | 06:28 |
mordred | oh - I think I do now | 06:29 |
mordred | rbergeron: we're talking about ALL invocations of ansible-playbook on user supplied code | 06:29 |
mordred | and since we run the ansible-playbook process on the zuul launcher, wrapping those invocations with rkt or some other container tech would help disallow malicious code | 06:29 |
mordred | there was also some discussion of running ansible-playbook on the remote node - which is a thing we _will_ want to do in some cases, but I'm arguing that we don't want to do it for all of them, and am hoping we can use ansible to do that when we need to do that | 06:30 |
* mordred may try to write a small POC of using ansible to do that tonight | 06:31 | |
rbergeron | mordred: yeah, i think we want to avoid that becoming the defacto standard in zuul. ppl do it with ansible, but it's usually like, "because complicated stuff" | 06:31 |
jlk | command: ansible-playbook ...... | 06:31 |
SpamapS | yeah I'm not so concerned about that | 06:32 |
SpamapS | I'm more concerned with isolating ansible-playbook on the remote node from the rest of the node | 06:32 |
SpamapS | which may be only slightly simpler than rkt/systemd-nspawning on the executor | 06:32 |
mordred | rbergeron: I don't want humans to write ansible to run ansible when it's needed - I want us to do it one time and have it be a pre-canned role people can refer to similar to run-tox | 06:32 |
rbergeron | esp since a lot of this is just "ansible is easy and makes it easy for you to do stuff" that i get to say over and over and then being like "oh but actually maybe not bring all your best practices over because aieeeeeeee" | 06:33 |
mordred | SpamapS: I'm _very_ concerned with the theory of doing the paramiko calls ourselves and not using ansible for that - paramiko likes to break and do crazy things and it's really hard to get it all right | 06:33 |
rbergeron | mordred: yeah, i'm still wrapping my brain around... mostly words with that, i poked at jim about that today in the meeting | 06:34 |
SpamapS | mordred: that is never an option IMO | 06:34 |
mordred | rbergeron: right - so there will be a better story for this that doesn't expose all of these guts - most of the ugly we're talking about here are impl details that most users should never need to know | 06:34 |
mordred | SpamapS: yah. IMO too | 06:34 |
mordred | ok. srrsly ... must AFK | 06:35 |
SpamapS | just ansible -m ansible-playbook-runner node1 | 06:35 |
SpamapS | go AFK | 06:35 |
SpamapS | nao | 06:35 |
rbergeron | i think there are reasons we went from paramiko to ssh as default :) | 06:35 |
jlk | somebody showed how much faster it was... :) | 06:36 |
* jlk afk | 06:36 | |
rbergeron | ;) | 06:37 |
openstackgerrit | Joshua Hesketh proposed openstack-infra/nodepool feature/zuulv3: Merge branch 'master' into feature/zuulv3 https://review.openstack.org/445325 | 06:59 |
rbergeron | jlk, mordred, in your glorious awayness: i guess that thing called tower has started using bubblewrap for things, but unclear to what extent (and yes, obvious comments here, lol) | 07:10 |
rbergeron | unclear to.. my immediate eyeballs / brainpower anyway | 07:10 |
Shrews | such scrollback | 12:36 |
Shrews | at such wee hours | 12:37 |
Shrews | you folks are weird | 12:37 |
rbergeron | lol | 14:12 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command https://review.openstack.org/445169 | 14:21 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase https://review.openstack.org/445175 | 14:21 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs https://review.openstack.org/445512 | 14:21 |
Shrews | jeblair: pabelanger: rebased those two reviews on new review 445512, which fixes an odd edge case in our tests. We hit that today in 445169 tests. | 14:22 |
pabelanger | looking | 14:23 |
pabelanger | Shrews: Hmm, so if I understand right, you are staying min-ready: 2, max-servers: 1 is an invalid config? | 14:26 |
Shrews | pabelanger: not an invalid config | 14:27 |
Shrews | pabelanger: it's invalid for our tests (because we can't have them hang) | 14:27 |
Shrews | totally fine for production because you will have something freeing up nodes eventually | 14:27 |
pabelanger | Hmm, let me check something. | 14:28 |
Shrews | pabelanger: btw, this is not the problem with nl01. that problem is that we have lost requests that we don't process. still working up a fix for that | 14:29 |
pabelanger | okay, not the issue I was thinking of | 14:31 |
pabelanger | Shrews: so, just to confirm, if we had 2 providers, like node_vhd_and_qcow2.yaml, each with max-servers: 1 and label of min-ready: 2. We'd properly launch 1 node in each provider? | 14:33 |
Shrews | pabelanger: not necessarily. we don't know which provider will attempt to satisfy the min-ready requests. | 14:34 |
pabelanger | right, but if provider A was at max-servers, doesn't nodepool-launcher move to the next provider | 14:34 |
Shrews | so if provider A ends up trying to handle both min-ready requests (1 per node), then it will pause because it can only have 1 server ready | 14:34 |
pabelanger | Oh | 14:35 |
pabelanger | so, it won't release the request back to other launchers? | 14:35 |
Shrews | nope. that's not the algorithm. the algorithm says to "pause if we are at quota". | 14:35 |
Shrews | pabelanger: if we were to do that, and there were no other launchers, the request would fail. And we don't really want that. This is a very weird case for the min-ready nodes. | 14:36 |
pabelanger | Ya, a little different how we do it today, but that is okay. | 14:37 |
Shrews | we can probably put our heads together and come up with a more elegant solution for that, but i'm not concentrating on that right now. gotta fix the lost requests thing. | 14:51 |
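To make the scenario concrete, roughly the config shape being discussed (two providers each capped at one server, one label wanting two ready nodes); only the min-ready and max-servers keys come from the conversation, the surrounding layout is indicative only:

```sh
cat > nodepool.yaml <<'EOF'
labels:
  - name: ubuntu-xenial
    min-ready: 2          # two ready nodes wanted overall

providers:
  - name: provider-a
    max-servers: 1        # each provider can only hold one,
  - name: provider-b      # so one handler pausing at quota
    max-servers: 1        # can strand the second request
EOF
```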
jeblair | jlk, SpamapS, rbergeron: i think you mostly covered this last night, but here's background on why we're executing ansible on the launcher^Wexecutor: https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#execution | 14:52 |
jeblair | jlk: and no doc bugs at the moment. we've completely stopped updating docs in zuulv3 and will update/rewrite them as a separate step. until then, the spec and my PTG email are the docs. | 14:54 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create run-cover role https://review.openstack.org/441332 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable https://review.openstack.org/441441 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Organize playbooks folder https://review.openstack.org/441547 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename prepare-workspace role to bootstrap https://review.openstack.org/441440 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add run-docs role and tox-docs job https://review.openstack.org/441345 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info to bootstrap role https://review.openstack.org/441617 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs https://review.openstack.org/441467 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create tox-tarball job https://review.openstack.org/441609 | 15:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create run-cover role https://review.openstack.org/441332 | 15:16 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create zuul_workspace_root job variable https://review.openstack.org/441441 | 15:16 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Organize playbooks folder https://review.openstack.org/441547 | 15:16 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename prepare-workspace role to bootstrap https://review.openstack.org/441440 | 15:16 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add run-docs role and tox-docs job https://review.openstack.org/441345 | 15:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add net-info to bootstrap role https://review.openstack.org/441617 | 15:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add revoke-sudo role and update tox jobs https://review.openstack.org/441467 | 15:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Create tox-tarball job https://review.openstack.org/441609 | 15:17 |
pabelanger | I feel like hacking at a coffeeshop today | 15:49 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command https://review.openstack.org/445169 | 16:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase https://review.openstack.org/445175 | 16:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs https://review.openstack.org/445512 | 16:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix race on node state check in node cleanup https://review.openstack.org/445557 | 16:10 |
Shrews | rebased stack on a race fix ^^^ | 16:10 |
* Shrews hopes to stop finding bugs so that he can fix the one bug that was supposed to be the focus of his day | 16:15 | |
rbergeron | shrews: that's not how software works, dontcha know :) | 16:20 |
* Shrews gives up for a while to release stress at the gym. bbl | 16:23 | |
pabelanger | I've arrived at a coffeeshop | 16:27 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Remove ready-script support https://review.openstack.org/445567 | 16:36 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Stop writing nodepool bash variable on nodes https://review.openstack.org/445572 | 16:42 |
SpamapS | jeblair: thanks. I am still feeling like as long as we're taking such great pains to isolate tests, we could use some or most of that to isolate untrusted playbooks. | 16:43 |
SpamapS | jeblair: another way to put it is.. if we're going to teach nodepool to kubernetes.. maybe we should just do that instead and run untrusted playbooks on kubernetes things? I dunno. | 16:43 |
SpamapS | That may be over-reductive. | 16:44 |
jeblair | SpamapS: take a look at the 2 comments i just left on your spec | 16:50 |
SpamapS | jeblair: kk | 16:55 |
SpamapS | jeblair: So I agree it's confusing and I hope we can get through this exercise with something not confusing either way. Ansibling on the node *is* a bit weird, however, by setting up the node in inventory by its name/groups, and just setting ansible_connection=local the only confusing part is that there's no SSH in -vvvv right? local is just "local to ansible-playbook's execution context" | 17:00 |
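That inventory shape, as a sketch (node and playbook names hypothetical):

```sh
cat > inventory.ini <<'EOF'
[nodes]
node1 ansible_connection=local    # keeps its real name/groups, just no SSH hop
EOF
ansible-playbook -i inventory.ini -vvvv job.yaml
```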
SpamapS | I think the bigger obstacle is getting a working ansible on the node without affecting it too much. | 17:01 |
jeblair | right. it was a design goal to avoid that. | 17:01 |
SpamapS | So what I think I'm driving at is whether we can use the node abstraction, rather than the nodes from nodepool itself. | 17:05 |
jeblair | can you say that in different words? | 17:05 |
SpamapS | Yeah I'm working on it. I want to use code if I can. | 17:05 |
jeblair | i'd really rather just understand the sentence you just said | 17:06 |
SpamapS | Like basically I wonder if we can make something like 'untrusted-node: k8s-ansible' or if you have plenty of capacity 'untrusted-node: ubuntu-xenial' .. makes more sense? | 17:07 |
SpamapS | Not the same as the test execution node. | 17:07 |
SpamapS | So you'd have your test execution nodes for your job, but then an untrusted node as well. But use the same abstraction. | 17:08 |
SpamapS | and instead of crippled ansible in untrusted context, you'd get a playbook that runs ansible on whatever untrusted-node is defined. | 17:09 |
jeblair | why would it be useful to have a choice there? | 17:09 |
SpamapS | Because we can have one that is entirely doable without operator pain, but maybe isn't suitable for all end users. | 17:10 |
SpamapS | And we can take advantage of nodepool when users think that's suitable. | 17:10 |
jeblair | ubuntu-xenial is doable without operator pain? | 17:10 |
SpamapS | I think it is, if users are already deploying zuul against a large cloud with lots of test (so, infra-esque users). | 17:11 |
SpamapS | lots of tests. | 17:11 |
jeblair | openstack-infra has free donated cloud resources on a huge scale, and i can't see it embracing the idea that we need 2 test nodes in order to run a pep8 job. | 17:11 |
jeblair | (even if one of them is "only" a 1G vm) | 17:12 |
SpamapS | I'd say even smaller would be required to make this sensible. | 17:12 |
SpamapS | (and address quotas likely become the limiting factor here) | 17:12 |
jeblair | right; those are not choices we have | 17:12 |
SpamapS | Also one optimization would be that untrusted nodes could be shared by a trigger? | 17:13 |
SpamapS | Dunno | 17:13 |
jeblair | SpamapS: i guess i misunderstood where you were at the end of your marathon conversation with mordred last night | 17:14 |
SpamapS | I'm working it out in my head, but the point is really that I'm pretty sure one size won't fit all. | 17:14 |
SpamapS | jeblair: I slept since then. :-P | 17:14 |
SpamapS | lots of ideas flowing out of my head | 17:15 |
jeblair | SpamapS: okay, so how can we have the conversation you need in a productive way? | 17:15 |
jeblair | because i feel like we're going in circles a bit here | 17:15 |
SpamapS | jeblair: maybe I should state the problems I have with the other method. | 17:15 |
SpamapS | Rather than trying to sell something that is also very complex. | 17:15 |
SpamapS | So isolating on the executor SHOULD be simple. But it's feeling pretty heavy from a development standpoint. | 17:16 |
SpamapS | Now, rkt may be simpler than I thought, and it may be a good option. I'm still processing its documentation. IMO, rkt is much bigger and more ambitious than what we need. | 17:17 |
jeblair | i guess that's the part i don't understand -- i'm still under the impression that "bubblewrap ansible-playbook" or "rkt ansible-playbook" or similar things are viable candidates. | 17:18 |
SpamapS | bubblewrap requires you to build a chroot to run things in. | 17:18 |
jeblair | we're like 90% of the way there, right? | 17:18 |
SpamapS | and doesn't do anything with LSMs/MACs.. so you're still vulnerable to namespace breakouts. | 17:18 |
SpamapS | rkt is more or less the same, and mostly acts as a frontend for systemd-nspawn and systemd-machined interactions. | 17:20 |
SpamapS | jeblair: my point is really that the complexity of bubblewrap and rkt aren't gaining us that much, other than another layer to peel. | 17:21 |
SpamapS | I suppose with that in mind, the simplest one should win. :-P | 17:22 |
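For concreteness, a minimal sketch of the "bubblewrap ansible-playbook" option under discussion, assuming a prebuilt chroot at a hypothetical path; per SpamapS's caveat above, bwrap gives namespaces, not LSM/MAC confinement:

```sh
JOBDIR=/var/lib/zuul/jobdir/12345                 # hypothetical job directory
bwrap --ro-bind /var/lib/zuul/ansible-chroot / \
      --bind "$JOBDIR" /work \
      --proc /proc --dev /dev --tmpfs /tmp \
      --unshare-all --share-net \
      --die-with-parent \
      ansible-playbook /work/playbooks/run.yaml   # keep net so it can reach the nodes
```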
pabelanger | my container-fu is not good, but we have a JobDir today, created by zuul. What would be the simplest tool today to make that a container, and just bind-mount the ansible bits? | 17:26 |
pabelanger | without diving into image builds atm | 17:26 |
clarkb | pabelanger: you'd need a fully contained python and ansible install too | 17:27 |
clarkb | (which is doable with virtualenv iirc) | 17:27 |
pabelanger | clarkb: right, say we just use virtualenv, inside the chroot | 17:27 |
clarkb | you'd have to do the full copy thing and also likely carry along things like libc? | 17:27 |
jeblair | we already copy most of ansible into a zuul staging area when we start. that plus the jobdir is why i say we're close. | 17:27 |
pabelanger | I mean, we can quickly build something in DIB. I've done that before | 17:27 |
pabelanger | jeblair: that's what I am thinking too. | 17:28 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Rename zuul-launcher to zuul-executor https://review.openstack.org/445594 | 17:29 |
SpamapS | virtualenv is not chrootable IIRC | 17:30 |
SpamapS | once you chroot in, the paths change | 17:31 |
SpamapS | and thus, it freaks out | 17:31 |
clarkb | SpamapS: it has a copy everything option which should avoid that problem | 17:31 |
clarkb | SpamapS: by default that is true because it symlinks and does funny relative-to-system-python things | 17:31 |
SpamapS | clarkb: I've always been told virtualenvs are not relocatable. | 17:32 |
* SpamapS testing | 17:33 | |
clarkb | SpamapS: looks like it copies all the python things but not your system deps | 17:33 |
clarkb | so not complete solution | 17:33 |
clarkb | and for relocatable I think the --relocatable flag would work in this case | 17:34 |
clarkb | (if you use --always-copy too) | 17:34 |
SpamapS | oh see they're always making new things | 17:34 |
clarkb | the general case is not relocatable for a variety of reasons. symlinks relative to things that move, versions of python being different etc | 17:34 |
clarkb | er not being relative to things that move | 17:35 |
SpamapS | you have to run --relocatable after creating | 17:35 |
SpamapS | interesting | 17:35 |
* SpamapS is lurning | 17:35 | |
clarkb | but yes general case is not relocatable | 17:35 |
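What's being attempted here, in shell form (the chroot path is hypothetical); as the next lines show, it still fails because the system libraries aren't inside:

```sh
virtualenv --always-copy /tmp/jail/venv    # copy python bits instead of symlinking
virtualenv --relocatable /tmp/jail/venv    # rewrite scripts to use relative paths
sudo chroot /tmp/jail /venv/bin/python     # still breaks: libc and friends are missing
```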
SpamapS | I can't seem to chroot into it anyway | 17:37 |
SpamapS | because it's not a statically compiled python maybe | 17:38 |
SpamapS | though it's saying not found | 17:38 |
clarkb | ya you'd need all the system deps too so not complete solution | 17:39 |
clarkb | which might make a normal python install in a simple image better/simpler | 17:40 |
SpamapS | yeah, it's got to be a real user-space chroot | 17:42 |
SpamapS | venv isn't even close | 17:42 |
SpamapS | bash-4.3# python | 17:42 |
SpamapS | Could not find platform independent libraries <prefix> | 17:43 |
SpamapS | even once I got the system libs in there | 17:43 |
SpamapS | Much simpler to just diskimage-builder up a tarball with ansible installed | 17:43 |
jeblair | that can probably be done fairly quickly at executor start, yeah? | 17:43 |
SpamapS | Yep | 17:44 |
SpamapS | gah.. forgot how long it takes the first time you do an ubuntu-minimal dib | 17:59 |
jlk | so yes | 18:00 |
jlk | I figured at executor start, and by operator trigger, you could cause a new base image to be created | 18:00 |
jlk | the base image is what you start with, and you would overlay the dir that zuul is prepping with all the source | 18:00 |
jlk | but here is my concern as an operator | 18:00 |
jlk | I feel very uncomfortable having end user provided code executing on the resources I have to secure | 18:00 |
jlk | we already have an isolation and scale system in place, the nodepool cloud | 18:01 |
jlk | those may be resources I don't have full control over | 18:01 |
jlk | and they're totally ephemeral | 18:01 |
SpamapS | jlk: to jeblair's point from before.. to run python unit tests, you either need two vms, or one polluted vm. | 18:01 |
jlk | but if this is executing _on_ the executor, then I have to scale the more costly resource I control, _and_ deal with clean ups and break outs and such | 18:02 |
jlk | SpamapS: a polluted VM, yes | 18:02 |
jlk | you get the VM node, + enough to run ansible locally | 18:02 |
SpamapS | Or, you deploy a VM, and you make two containers in it... ansible executor, and unit test executor. | 18:02 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Add per-repo public and private keys https://review.openstack.org/406382 | 18:02 |
jlk | and then if you're doing pep8, you're doing it locally | 18:02 |
SpamapS | But that's the easy case. devstack-gate ... this gets complicated. | 18:02 |
jlk | right, I need to read the link jeblair sent this morning, before getting too far into this again | 18:03 |
clarkb | SpamapS: re d-g its not that complicated we already do that thing today | 18:03 |
SpamapS | jlk: it's pretty brief. But basically suggests that it is simpler to just run ansible that we control than it is to try and cleanly inject ansible. | 18:03 |
clarkb | (there is the extra overhead of teaching all the hosts how to talk to each other properly) | 18:03 |
jlk | right, that bit, teaching the hosts to talk, that's hard. | 18:04 |
SpamapS | clarkb: one might expect in a multi-node test that the point is for the nodes to talk to eachother? | 18:04 |
jlk | and we hit that now in 2.5 with our multi-node setup | 18:04 |
clarkb | SpamapS: yes it is, but there is overhead to set up the communication from a job control perspective | 18:04 |
jlk | devstack-gate does feel pretty special case though. | 18:04 |
clarkb | we do it today. This is exactly how d-g runs if you run it today | 18:04 |
SpamapS | -rw-r--r-- 1 clint clint 712M Mar 14 11:04 ansible-chroot.tar | 18:05 |
jlk | SpamapS: compare that to the size of the docker image that Ansible produces to run Ansible in | 18:05 |
clarkb | setuppy things make sure ssh keys are in place, then we run ansible on the "polluted" node to run the job | 18:05 |
SpamapS | That's the result of running 'disk-image-create -t tar -o ansible-chroot ubuntu-minimal -p ansible' | 18:05 |
jlk | oh they don't any more :/ | 18:05 |
SpamapS | not totally shocking really | 18:06 |
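For reference, consuming the tarball built above would look something like this (target path hypothetical):

```sh
sudo mkdir -p /var/lib/zuul/ansible-chroot
sudo tar -C /var/lib/zuul/ansible-chroot -xf ansible-chroot.tar
sudo chroot /var/lib/zuul/ansible-chroot ansible-playbook --version
```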
jlk | Another issue I have | 18:06 |
clarkb | SpamapS: how old is your dib? there was a bug in old dib that made ubuntu minimal really not minimal | 18:06 |
clarkb | SpamapS: I think you can likely shave off at least 250MB just by updating dib if you are on an older version | 18:06 |
pabelanger | SpamapS: I have something smaller, I managed to get a tarball down to about 50MB over christmas | 18:06 |
jlk | doing all the multi-node things from executor host(s) means an increased amount of forks/ssh threads going out from those | 18:06 |
clarkb | but ya not small | 18:06 |
jlk | which costs memory | 18:06 |
jlk | instead of pushing that work down to the cloud resource | 18:06 |
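For reference, the fan-out jlk is worried about is bounded per playbook run by Ansible's forks setting, which caps how many parallel host connections (each a separate process) one ansible-playbook invocation opens; a minimal sketch, where 5 is just Ansible's stock default rather than anything Zuul-specific:

    # ansible.cfg
    [defaults]
    # Maximum number of parallel host connections per playbook run;
    # each fork is its own process on the machine running Ansible.
    forks = 5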
* pabelanger restores ubuntu-rootfs elements | 18:06 | |
SpamapS | clarkb: git pulled just before | 18:07 |
SpamapS | oh, why do I have a kernel? | 18:08 |
jlk | DiB may be dumb | 18:08 |
SpamapS | yeah | 18:08 |
jlk | and not know how to do things without a kernel | 18:08 |
SpamapS | that's 209MB | 18:08 |
jlk | Fedora did all this with a thing called "mock" | 18:08 |
SpamapS | more actually | 18:08 |
jlk | which was just a wrapper around yum install --over-there/ | 18:08 |
SpamapS | 360MB | 18:08 |
pabelanger | ubuntu-minimal is very opinionated for a VM | 18:09 |
SpamapS | wow why is firmware so big?? | 18:09 |
jlk | dnf has this too, where you can just dnf --installroot=/over/there <my-package-set> | 18:09 |
jlk | nothing "depends" on kernel, so you can avoid a whole bunch of code | 18:09 |
SpamapS | right, debootstrap does this too, dib just forces a bunch of VM stuff in | 18:11 |
jlk | oh I see, it's a wrapper around a wrapper | 18:11 |
SpamapS | so it's 446M without kernel stuff | 18:11 |
jlk | are... are you in a wrap battle? | 18:11 |
jlk | (I'll see myself out) | 18:12 |
SpamapS | SpamapS The Wrapper | 18:12 |
clarkb | it has to do kernel things because debootstrap is silly with kernels iirc | 18:12 |
SpamapS | jlk: Dad Joke Level 7 achieved. | 18:12 |
SpamapS | clarkb: nope | 18:12 |
SpamapS | it adds the kernel | 18:12 |
clarkb | SpamapS: yes because debootstrap adds the wrong kernel iirc | 18:12 |
SpamapS | debootstrap is very smart about just installing essentials | 18:12 |
jlk | if by "silly" you mean "it doesn't include the kernel", then yes | 18:12 |
pabelanger | we also don't clean out apt-cache in ubuntu-minimal, so there will be cached dpkgs in the tarball | 18:13 |
SpamapS | kernel is not an essential | 18:13 |
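For context, debootstrap's minimal variant really does stop at the essentials plus apt, which is why no kernel shows up unless something adds one; a sketch of building a kernel-free rootfs directly, with the suite name and target path being illustrative:

    # --variant=minbase installs only Essential: yes packages plus apt;
    # no kernel, bootloader, or firmware ends up in the tree.
    sudo debootstrap --variant=minbase xenial /srv/rootfs http://archive.ubuntu.com/ubuntu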
jlk | so here's the thing. | 18:13 |
clarkb | I was pretty sure debootstrap was installing a kernel but not the current kernel | 18:13 |
jlk | size of the image isn't that big of a deal | 18:13 |
clarkb | so dib swings around and updates it to be current and cleans out the old one | 18:13 |
jlk | because you're going to maintain one, maybe 2 per executor, locally | 18:13 |
clarkb | something like that | 18:13 |
jlk | not shlepping them around | 18:13 |
jlk | and start up is super fast | 18:13 |
jlk | and overlay is low overhead | 18:13 |
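For reference, the low-overhead overlay jlk describes maps onto a plain overlayfs mount: the read-only base image is the lower layer and a throwaway scratch dir absorbs the writes; all paths below are illustrative:

    mkdir -p /var/lib/zuul/{scratch,work,merged}
    # base/ stays pristine; writes land in scratch/; merged/ is what the job sees
    mount -t overlay overlay \
        -o lowerdir=/var/lib/zuul/base,upperdir=/var/lib/zuul/scratch,workdir=/var/lib/zuul/work \
        /var/lib/zuul/merged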
SpamapS | yeah sorry I got distracted by the size | 18:14 |
SpamapS | pabelanger: good point, 56MB more | 18:14 |
pabelanger | just restored: https://review.openstack.org/#/q/status:open+topic:debootstrap-minimal (untested). That was the path I started down to create a ubuntu-rootfs element, for minimal container things | 18:14 |
pabelanger | will have to look at my notes again, but managed to get things pretty small | 18:14 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Add per-repo public and private keys https://review.openstack.org/406382 | 18:14 |
pabelanger | even a minimal ansible + python tarball | 18:14 |
jlk | I like this work anyway | 18:14 |
SpamapS | oh, ansible doesn't like being in a chroot | 18:15 |
SpamapS | http://paste.openstack.org/show/602732/ | 18:15 |
jlk | because even if we weren't running untrusted code on executor, I still like the layers of running the ansible from inside a containment zøne | 18:15 |
SpamapS | (that looks like a devpts issue or something) | 18:15 |
jlk | whoh, not sure how I made that ø | 18:15 |
SpamapS | compose key fun | 18:16 |
jlk | anyway, the idea of running the ansible things in ephemeral mostly immutable containments sits well with me, regardless of having end user code or not in there. | 18:16 |
SpamapS | feels weird to be talking about tools to build containers. Isn't that like, what the whole world is doing right now? How do rkt users build containers? | 18:17 |
clarkb | oh also I think apt installs recommends by default? another area where you can probably trim the tree size | 18:18 |
SpamapS | clarkb: I believe we turn that off, let me check | 18:18 |
jlk | https://github.com/containers/build | 18:19 |
SpamapS | Yeah, dib turns off recommends | 18:19 |
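The knob in question is plain apt configuration; a sketch of what turning recommends (and suggests) off looks like, with the file name being illustrative:

    # /etc/apt/apt.conf.d/95-no-recommends  (file name illustrative)
    APT::Install-Recommends "false";
    APT::Install-Suggests "false";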
jlk | rkt can use docker images, so the same tooling there | 18:19 |
SpamapS | jlk: so, ansible-container......... | 18:19 |
jlk | no | 18:19 |
jlk | it's very tech-preview | 18:19 |
pabelanger | container builds should be no different than image builds IMO | 18:20 |
SpamapS | bummer | 18:20 |
jlk | and it's built up around docker-compose | 18:20 |
pabelanger | right | 18:20 |
SpamapS | cause, that would be very congruent | 18:20 |
jlk | it would, but it's WEIRD | 18:20 |
pabelanger | I'd rather not have to depend on docker for build things. | 18:20 |
SpamapS | pabelanger: except without kernels and boot loaders and partitions and... | 18:20 |
jlk | it builds a container, to run ansible in, to talk to a second container | 18:20 |
jeblair | SpamapS: apparently mounting a tmpfs at /dev/shm should solve the error 38 problem: http://stackoverflow.com/questions/6033599/oserror-38-errno-38-with-multiprocessing | 18:20 |
jlk | to deploy software into | 18:20 |
pabelanger | SpamapS: right, we just need a new element, which diskimage-builder doesn't include today | 18:20 |
SpamapS | jeblair: yeah I figured that's just my naive chrooting .. thanks for looking that up | 18:20 |
pabelanger | I raised a topic on ML about this a few months ago | 18:21 |
SpamapS | Hopefully ansible playbook can survive without proc or sys mounted. | 18:21 |
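Error 38 (ENOSYS) here comes from Python's multiprocessing, which needs POSIX semaphores backed by /dev/shm; a sketch of the mounts a chroot would want before running ansible-playbook inside it, with the chroot path being illustrative:

    # multiprocessing's semaphores live on /dev/shm; without it you get OSError 38
    mount -t tmpfs tmpfs /srv/chroot/dev/shm
    # only needed if the playbook or its modules actually read /proc
    mount -t proc proc /srv/chroot/proc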
pabelanger | some interest but wanted it to be a larger refactor | 18:21 |
clarkb | also if you use tarball output you already forgo boot loaders and partitions iirc | 18:21 |
pabelanger | simple-playbook is also a thing: https://review.openstack.org/#/c/385608/ which allows you to run ansible outside the chroot, to populate things inside the chroot :) | 18:22 |
jlk | yeah, there's a chroot connection method I thought | 18:22 |
jlk | so you can do ansible to build the chroot and wrap it up | 18:23 |
jlk | or there's the docker connection method | 18:23 |
jlk | but that's kind of awkward too | 18:23 |
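A sketch of the chroot connection method jlk mentions: with Ansible's chroot plugin the inventory hostname is the path to the chroot itself, and the run has to be root since chroot(2) requires it (paths and playbook name illustrative):

    # inventory.ini -- the "host" is the chroot path
    /srv/rootfs ansible_connection=chroot

    sudo ansible-playbook -i inventory.ini build-rootfs.yml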
SpamapS | Is the fact that I have to re-read the elements I wrote 3 years ago a sign that I'm getting old, or lazy? | 18:23 |
jlk | nah, it's that we have small scratch space in our brain for retaining details | 18:25 |
jlk | the rest gets paged out | 18:25 |
SpamapS | yeah, lately, aggressively :) | 18:25 |
jlk | SpamapS: so a more serious answer to your question about building containers | 18:26 |
jlk | just about everybody in the world builds container images by starting with a base image. | 18:27 |
jlk | and then layering the changes on top of that | 18:27 |
jlk | so they start with something from docker hub or whathave you | 18:27 |
jlk | I think it's far more rare to be building images completely from scratch | 18:27 |
pabelanger | which is great, until you lose a layer some place. eg: image gets deleted | 18:29 |
SpamapS | jlk: makes sense | 18:29 |
clarkb | it also means you are beholden to the choices of the layers below you | 18:29 |
clarkb | (which has been a problem for us in the "real world") | 18:30 |
SpamapS | Right, that's the rub with using Ubuntu's images | 18:30 |
SpamapS | you can't really build the exact same image the way they do | 18:30 |
SpamapS | which is really why ubuntu-minimal exists | 18:30 |
SpamapS | pabelanger: did you actually make ubuntu-rootfs or something like it? | 18:31 |
SpamapS | or just proposed? | 18:31 |
SpamapS | because that's mostly all that's needed I think | 18:31 |
pabelanger | yes, I've had something for a while. | 18:31 |
pabelanger | but haven't gotten it merged into diskimage-builder | 18:31 |
pabelanger | I've been meaning to push it into project-config for now | 18:32 |
jlk | clarkb: correct, it's very much a concern. | 18:34 |
jlk | honestly, I feel that if zuul is going to depend on images at runtime, it should be in the business of building them up from scratch, not doing a FROM <somebody_else's_work> thing. | 18:35 |
pabelanger | right, nodepool-builder could handle that. Just a matter of getting the tarball to the zuul-executor host | 18:36 |
jlk | does it have to be nodepool builder? | 18:36 |
jlk | that feels very back/forth to me | 18:36 |
clarkb | no, but the overlap is significant | 18:36 |
pabelanger | it is our image build service today | 18:36 |
clarkb | you could have zuul just configure a known image location then users use whatever they want | 18:36 |
clarkb | we'd likely use nodepool-builder | 18:37 |
* jlk waves hands "these are not the microservices you're looking for" | 18:37 | |
jlk | anyway, that's still implementation detail without having solid agreement on overall approach | 18:38 |
clarkb | jlk: not sure what the concern is? you just need something to put a tarball at $PATH | 18:38 |
jlk | moving part complexity I guess, bootstrap dances | 18:39 |
jlk | it'd feel simpler if the executor just built the darn image locally when the service starts. | 18:39 |
jlk | but I guess it can just rely on a configured path to find the image | 18:39 |
jlk | and how that file gets there is a site decision | 18:39 |
clarkb | ya I think that's what it could do via from nodepool import builder ; builder.build_image_at(/path) | 18:39 |
clarkb | then if you want to use some other building facility you'd set the config flag to "just look for your image here" | 18:40 |
clarkb | anyways its all hand wavy at this point | 18:40 |
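As clarkb says, this is all hand-wavy; purely as a sketch of the shape being discussed, with every name below hypothetical:

    # Hypothetical sketch only -- none of these APIs exist today.
    def get_execution_image(conf):
        path = conf.get('image_path')
        if path:
            # Site decision: something else already put the tarball here.
            return path
        # Otherwise build locally by reusing nodepool's build machinery.
        from nodepool import builder              # hypothetical import
        return builder.build_image_at('/var/lib/zuul/images')  # hypothetical call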
pabelanger | only concern about building images on the executor: sudo permissions are needed | 18:41 |
jlk | ... are they? | 18:41 |
pabelanger | and, increase pressure on HDD for caching things | 18:41 |
clarkb | jlk: to chroot yes | 18:41 |
pabelanger | yup | 18:41 |
jlk | uh | 18:41 |
pabelanger | I've always wanted to see if fakechroot would work | 18:41 |
clarkb | I guess we might be able to work around those since non-VM image builds are simpler | 18:41 |
jlk | that doesn't gel with my memory of building chroots on Fedora | 18:41 |
pabelanger | but never tried | 18:41 |
clarkb | jlk: it has to do with how it chroots and bind mounts things in iirc | 18:42 |
jlk | which was just yum to an alt path | 18:42 |
clarkb | possibly changeable for simpler builds | 18:42 |
jlk | you need the bind mounts to chroot into the thing, yes, but to build it?? | 18:42 |
clarkb | jlk: the way dib builds are done it bind mounts things into a chroot to do the build | 18:43 |
jlk | well that's dumb :) | 18:43 |
clarkb | jlk: its actually quite useful for many use cases. Just probably not necessary in all | 18:43 |
pabelanger | debootstrap needs sudo, IIRC | 18:44 |
SpamapS | dib's main reason for existing is efficiency, not simplicity of implementation. It does a lot of dumb stuff to make it faster. | 18:50 |
jlk | oh hrm, doing it as non-root would make permissions of the files installed all wonky, so maybe I'm misremembering things | 18:50 |
SpamapS | clarkb: do you think importing nodepool's builder like that is a good idea? I haven't thought it through, but that would make me feel a lot better about this if I could just have zuul-executor start as root.. run that to build image.. then drop to non-root. | 18:55 |
SpamapS | Pretty standard daemony behavior.. do some stuff as root then drop all the privileges and capabilities. | 18:56 |
clarkb | SpamapS: maybe, I also think that the dib 2.0 work is aimed at making that sort of use case work better with stock dib | 18:56 |
Shrews | CALLING ALL PYTHON EXPERTS: I need to compute the difference of two lists where elements can be repeated (so I cannot use set operations). Can anyone save me some time here? | 18:56 |
clarkb | so possibly it would be better to just support that work | 18:56 |
jlk | Shrews: feels like a google interview challenge | 18:57 |
jlk | let me go get a whiteboard and a crippling case of imposter syndrome | 18:57 |
SpamapS | indeed it does | 18:57 |
Shrews | jlk: ssssshhh, they're looking over my shoulder | 18:57 |
SpamapS | Shrews: without set operations. "the difference" could mean different things. | 18:58 |
clarkb | Shrews: collections.Counter might help | 18:58 |
jhesketh | Morning | 18:58 |
clarkb | depending on your definition for difference Counter.subtract() may do what you want | 18:58 |
Shrews | SpamapS: [1, 1, 1, 2] and [1, 2] ... diff would be [1, 1] | 18:59 |
SpamapS | The difference in the actual content as-ordered, or the set of keys, duplicated or not? What I mean is, what do you want when you compare ['a', 'b', 'a'] and [ 'a', 'b', 'b'] ? | 18:59 |
Shrews | not as-ordered | 18:59 |
Shrews | [1, 2, 1, 1] - [1, 2] is still [1, 1] | 19:00 |
SpamapS | There's a name for this | 19:00 |
clarkb | Shrews: then ya I think collections.Counter will work | 19:00 |
SpamapS | it's the opposite of the union | 19:00 |
* SpamapS will probably find out that name is "difference" | 19:00 | |
clarkb | and treat negative values as zero | 19:00 |
SpamapS | yeah counter | 19:01 |
Shrews | clarkb: that might work. thx | 19:01 |
SpamapS | mmmmm freshly shorn yak | 19:01 |
jeblair | infra meeting is starting now in #openstack-meeting | 19:03 |
Shrews | clarkb: worked brilliantly. thx x 2 | 19:07 |
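For the record, the Counter approach is nearly a one-liner, and the subtraction operator already drops negative counts, so clarkb's "treat negative values as zero" caveat is handled automatically; a minimal sketch:

    from collections import Counter

    def multiset_diff(a, b):
        # Counter's '-' keeps only positive counts, so elements that
        # appear at least as often in b as in a simply vanish.
        return list((Counter(a) - Counter(b)).elements())

    multiset_diff([1, 2, 1, 1], [1, 2])  # -> [1, 1]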
jlk | jeblair: how would you feel about "running user's playbook directly on executor" being a flaggable feature? So that an implementer could say "nope" to that whole code path? | 19:13 |
jeblair | jlk: what's the alternative? | 19:21 |
jlk | ah, that's a good question. I guess I hadn't thought that through enough, as all the multi-node stuff we rely on now would be going away. | 19:21 |
harlowja | mordred SpamapS since i think i want to pull u in, https://lists.cncf.io/pipermail/cncf-ci-public/2017-March/000025.html | 19:22 |
jlk | is it safe to say that any multi-node testing would require doing the playbook on the executor ? | 19:22 |
harlowja | ` | 19:22 |
harlowja | In summary, we are offering to build (initially) a proof-of-concept implementation of a CNCF-centric CI Workflow Platform to make it easy to compose and run cross-project continuous integration workflows composed out of re-usable and configurable building blocks (which incorporate other CI systems like Jenkins, CircleCI, Travis, Zuul, etc).` | 19:22 |
harlowja | :( | 19:22 |
harlowja | why do people do this, lol | 19:22 |
jlk | a number of us are reading that right now | 19:24 |
harlowja | kk | 19:24 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Allow github trigger to match on branches/refs https://review.openstack.org/445625 | 19:25 |
harlowja | https://lists.cncf.io/pipermail/cncf-toc/2017-March/000699.html is the other place this shows up (sent to both?) | 19:25 |
* harlowja would rather have zuul just work here (not build yet-another...) | 19:33 | |
harlowja | but that may require some reach-out from folks here | 19:33 |
harlowja | reach-out/education... | 19:33 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers https://review.openstack.org/445632 | 19:45 |
Shrews | ALL: The nodepool gate seems to have gained some instability with all of the new code added the last few days, and tests re-enabled. Be aware. Trying to squash what I can. | 19:46 |
jeblair | jlk: how about we try to continue down the path that we're on now, so that we can get something running based on the current design, and get some experience? the most viable alternatives i see involve interacting with a container service (eg k8s). revisiting this after (or as part of) our work to design and add container support to nodepool may be more fruitful. i think if we can agree that we want ansible to run from the perspective of an | 19:50 |
jeblair | external party to the worker nodes, we will be able to port to other ways of running ansible without needing to re-work playbooks. if we run into technical roadblocks, or we become so convinced this is a terrible idea and will never work, let's try to schedule a phone call to go over the constraints and design, because this touches a lot of stuff. | 19:50 |
jlk | jeblair: yeah, i'm backing away from the cliff. | 19:51 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers https://review.openstack.org/445632 | 19:56 |
pabelanger | also, https://review.openstack.org/#/c/445594/ is now ready to bikeshed over. Renames zuul-launcher to zuul-executor | 20:13 |
pabelanger | I still have to update puppet, but holding off until we get some +2's on it | 20:14 |
openstackgerrit | Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Better merge message for GitHub pull requests https://review.openstack.org/445644 | 20:15 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove the --no-delete option from nodepool https://review.openstack.org/445241 | 20:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add per-repo public and private keys https://review.openstack.org/406382 | 20:24 |
jeblair | okay, that's step one of secrets ready for review; i'm going to continue building on that | 20:24 |
SpamapS | harlowja: so, background on that.. Huawei and I spoke at the Tahoe linux foundation leadership summit thing about a month ago.. and they feel it is complementary to Zuul | 20:27 |
SpamapS | I haven't read the full thing so I'm not sure it actually is | 20:27 |
harlowja | u spoke with all of huawei | 20:27 |
harlowja | lol | 20:27 |
harlowja | mr.huawei | 20:27 |
harlowja | but fair point, they can take it wherever they want (more power to them i guess) | 20:28 |
SpamapS | harlowja: yes, yes I did. | 20:29 |
SpamapS | I spoke with a team led by Quanyi Ma, specifically | 20:29 |
harlowja | cools | 20:31 |
harlowja | just made me wonder when i saw `In summary, we are offering to build (initially) a proof-of-concept implementation of a CNCF-centric CI Workflow Platform to make it easy to compose and run cross-project continuous integration workflows composed out of re-usable and configurable building blocks (which incorporate other CI systems like Jenkins, CircleCI, Travis, Zuul, etc). ` | 20:31 |
harlowja | throw away the 'blah blah' terms from that | 20:31 |
harlowja | and it seems like zuul? | 20:31 |
pabelanger | jeblair: exciting | 20:37 |
* pabelanger off to read the spec on secrets again | 20:37 | |
pabelanger | jeblair: left a question / observation on 406382 | 20:45 |
jeblair | pabelanger: excellent point :) | 20:45 |
Shrews | Would appreciate some reviews on https://review.openstack.org/445632 and https://review.openstack.org/445557 so we can merge those and stabilize the np gate at least a bit | 20:45 |
Shrews | When folks have time, of course | 20:46 |
pabelanger | looking | 20:52 |
pabelanger | +3 | 20:54 |
jeblair | pabelanger: so we definitely need to add the source dir there. i'll take a look and see if i can put the public and private keys into the same file there. we might decide that we want them in two different directories (private/public), but i don't think anything else will access those files directly, so it should be fine. | 20:54 |
pabelanger | jeblair: cool | 20:55 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix race on node state check in node cleanup https://review.openstack.org/445557 | 20:58 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix min-ready/max-servers in test configs https://review.openstack.org/445512 | 20:58 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix for unpaused request handlers https://review.openstack.org/445632 | 20:58 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Add request-list nodepool command https://review.openstack.org/445169 | 21:01 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove AllocatorTestCase and RoundRobinTestCase https://review.openstack.org/445175 | 21:01 |
Shrews | sweet. those top 3 should help. thx | 21:04 |
harlowja | hmmmm, unsure who on kazoo is around to review stuff anymore | 21:05 |
harlowja | none of the other guys seem to be on IRC anymore, lol | 21:05 |
harlowja | i hope they're just out or on vacation, lol | 21:06 |
pabelanger | Shrews: so, once nl01.o.o updates, we can start it back up? | 21:07 |
Shrews | pabelanger: no. unfortunately, the plethora of other bugs affecting the gate has pulled me away from the lost requests bug. | 21:08 |
pabelanger | Shrews: Ah. I see | 21:08 |
Shrews | sorry | 21:08 |
pabelanger | no problems | 21:08 |
pabelanger | Shrews: do you have an idea what the current issue is? | 21:08 |
Shrews | pabelanger: i think so. If we kill n-l while it's processing requests, they are left in the PENDING state with nodes allocated for them. We never continue processing them on the restart | 21:09 |
Shrews | pabelanger: plan is to clean them up on restart and restart the request handling for them from scratch | 21:10 |
pabelanger | k | 21:10 |
Shrews | i just actually have to GET to the point of coding that | 21:10 |
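Purely as a sketch of the restart cleanup Shrews describes, with every API name below hypothetical and not matching real nodepool code:

    # Hypothetical sketch -- names are illustrative only.
    def cleanup_stale_requests(zk, my_launcher_id):
        for request in zk.node_request_iter():
            # A PENDING request we were handling when we died: free its
            # nodes and put it back to REQUESTED so handling restarts.
            if request.state == 'pending' and request.launcher == my_launcher_id:
                for node in request.allocated_nodes:
                    zk.deallocate_node(node)
                request.state = 'requested'
                zk.store_node_request(request)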
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Allow loading logging config from yaml https://review.openstack.org/445656 | 21:15 |
SpamapS | jeblair: ok, just to update you on the spec.. I'm going to move the "run on the node" bit down to alternatives (as in, not what we're doing). I'm also going to start experimenting, in earnest, with bubblewrap and rkt (and add rkt to the list) | 21:25 |
jeblair | SpamapS: cool, thx | 21:26 |
pabelanger | SpamapS: https://review.openstack.org/#/q/topic:debootstrap-minimal | 21:33 |
pabelanger | gets a tarball down to 110M | 21:33 |
pabelanger | -rw-r--r-- 1 pabelanger pabelanger 110M Mar 14 21:32 ansible-chroot.tar | 21:33 |
pabelanger | that is missing python and ansible currently | 21:33 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add per-repo public and private keys https://review.openstack.org/406382 | 21:55 |
jeblair | pabelanger: that should be to spec :) | 21:55 |
pabelanger | jeblair: cool, will review shortly | 21:55 |
dmsimard | I'm going to ask a stupid question that just crossed my mind | 21:56 |
dmsimard | What's stopping Zuul from doing the VM provisioning with Ansible OpenStack modules ? | 21:56 |
dmsimard | I guess nodepool also does other things like images and pre-reserved instances | 21:56 |
dmsimard | Yeah, nevermind | 21:57 |
jeblair | dmsimard: we've discussed that more or less as a plan to use linch-pin with a nodepool worker to do that. | 21:58 |
jeblair | dmsimard: the short version is: using openstack at scale is hard, so we need a native launcher for that. | 21:59 |
dmsimard | Makes sense | 21:59 |
dmsimard | It's also okay to keep things mostly single-purpose, i.e, do one thing and do it well | 21:59 |
jeblair | dmsimard: but there are folks who would like to use nodepool with other clouds at not-openstack scale, so outsourcing that to ansible modules makes sense. linch-pin helps organize those kinds of requests, so the sort of napkin-level idea is to have a nodepool launcher that uses linch-pin to talk to anything ansible can talk to. | 22:00 |
jeblair | dmsimard: if we run into scaling trouble, we can always make a new launcher for that system. eg, if we end up with a huge aws user. | 22:00 |
dmsimard | TIL about https://github.com/CentOS-PaaS-SIG/linch-pin | 22:01 |
jeblair | ya that's the one | 22:01 |
pabelanger | not to side track us or anything, but the plugin interface spec for nodepool would be interesting. could be fun to try and add gcloud support for nodepool | 22:02 |
SpamapS | pabelanger: cool. | 22:04 |
pabelanger | SpamapS: it's about 251M with ansible bits | 22:04 |
SpamapS | yeah I'm fine with it being 700M too btw.. just was surprised. :) | 22:06 |
pabelanger | jeblair: sorry for the noob question, but pem files can contain both public / private keys? | 22:10 |
dmsimard | pem is a bit of a misnomer | 22:12 |
pabelanger | jeblair: I am trying to understand why we are no longer writing the public_key_file | 22:12 |
dmsimard | it's meant for x509 certificate stuff iirc | 22:12 |
pabelanger | and looks like we get the public key from the pem file | 22:12 |
pabelanger | ya, that looks to be true. The googles says so | 22:18 |
jeblair | it's 'pem encoded' so i called the file .pem (plus that's what's in the spec) | 22:19 |
pabelanger | great, thanks | 22:19 |
jeblair | and yeah, you can get the public key from the private key, so unless we have something else that wants to read that file on its own, i don't think there's too much of a point in writing it out separately at the moment. | 22:20 |
pabelanger | makes sense | 22:20 |
jeblair | (we'll be serving the public key via the webserver, but that's part of zuul and has access to the extracted public key) | 22:20 |
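What jeblair describes, deriving the public key from the PEM-encoded private key on demand rather than writing it out separately, looks roughly like this with the cryptography library (file name illustrative):

    from cryptography.hazmat.backends import default_backend
    from cryptography.hazmat.primitives import serialization

    with open('repo.pem', 'rb') as f:
        private_key = serialization.load_pem_private_key(
            f.read(), password=None, backend=default_backend())

    # The public half is recoverable from the private key whenever needed.
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo)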
SpamapS | 5 minutes playing with bubblewrap the right way has me liking it a lot | 22:50 |
SpamapS | might be worth running it setuid, which is how the Ubuntu packages install it anyway | 22:50 |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app https://review.openstack.org/445674 | 22:58 |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Split webapp into its own nodepool application https://review.openstack.org/445675 | 22:58 |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Split webapp into its own nodepool application https://review.openstack.org/445675 | 23:02 |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool feature/zuulv3: Refactor nodepool apps into base app https://review.openstack.org/445674 | 23:02 |
SpamapS | so bubblewrap is actually pretty great. | 23:53 |
SpamapS | when run as a setuid helper.. it pretty much locks down a chroot completely. | 23:54 |
SpamapS | only thing missing is cgroups.. which I'm less concerned about (so technically an ansible-escaper could use up all of the executor's allocated RAM/CPU) | 23:55 |
SpamapS | but we can probably just make a templated systemd unit for that | 23:55 |
SpamapS | and then each bubblewrapped playbook will be in its own cgroup | 23:55 |
clarkb | ya should be simple to apply those too if needed | 23:55 |
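A sketch of the template-unit idea, pairing bubblewrap's namespace sandboxing with systemd's per-instance cgroup resource limits; the unit name, paths, and limit values are all illustrative:

    # /etc/systemd/system/zuul-playbook@.service  (name illustrative)
    [Service]
    # bwrap drops the playbook into a read-only rootfs with fresh namespaces;
    # --share-net keeps networking so ansible can still ssh to nodes
    ExecStart=/usr/bin/bwrap --unshare-all --share-net \
        --ro-bind /var/lib/zuul/rootfs / \
        --proc /proc --dev /dev --tmpfs /tmp \
        /usr/bin/ansible-playbook %I
    # systemd puts each instance in its own cgroup, closing the
    # RAM/CPU exhaustion gap bwrap leaves open
    MemoryLimit=2G
    CPUQuota=100%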