*** dkranz has quit IRC | 00:16 | |
clarkb | you could dual license but that would only make things more complicated I think | 02:00 |
dmsimard | clarkb: (huge coincidence that I saw you reply just now) yeah, that's sort of why I almost don't want to bother with ARA | 02:02 |
dmsimard | considering there hasn't been a lot of contributors (yet), it's not under CLA, it's not under openstack foundation governance, etc. | 02:02 |
dmsimard | If you dual license, it sort of becomes ambiguous, confusing and you have to be careful about what you import where so you don't taint in the wrong direction.. | 02:04 |
dmsimard | mordred: btw thanks for http://lists.openstack.org/pipermail/openstack-dev/2017-April/115013.html | 02:06 |
* dmsimard totally not switching from uuid primary keys to ids right now | 02:06 | |
*** jesusaurum has quit IRC | 03:41 | |
mordred | dmsimard: you're welcome! and totes - although I _do_ recommend switching at some point - it'll make you happier with larger installs | 03:57 |
mordred | dmsimard: and yah - there's no reason for you to not just make ARA GPL if you have the agreement from all of the people who have contributed patches (just make sure you actually have agreement from their employers, since most people don't individually have the legal authority to agree) | 03:58 |
mordred | dmsimard: it only matters for copyrightable patches - https://review.openstack.org/#/c/414381/1/ara/webapp.py, for instance, I don't think you need to worry about | 03:59 |
mordred | dmsimard: from looking at stackalytics, it looks like you have 17 commits you need to look at, determine if they are completely trivial, and if not contact the author for permission. it would be 'best' to make the patch to switch to GPL and then get each author whose permission you need to +1 that commit | 04:02 |
*** bhavik1 has joined #zuul | 04:37 | |
*** bhavik1 has quit IRC | 05:57 | |
*** isaacb has joined #zuul | 06:17 | |
*** hashar has joined #zuul | 06:29 | |
*** amoralej|off is now known as amoralej | 06:45 | |
*** yolanda_ has joined #zuul | 07:05 | |
*** yolanda_ has quit IRC | 07:06 | |
*** 7ITABD5MB has joined #zuul | 07:06 | |
*** 07IAALFJ9 has joined #zuul | 07:06 | |
*** 07IAALFJ9 has quit IRC | 07:07 | |
*** 7ITABD5MB has quit IRC | 07:08 | |
*** yolanda_ has joined #zuul | 07:08 | |
*** yolanda_ is now known as yolanda | 07:11 | |
*** isaacb has quit IRC | 07:15 | |
*** lennyb has quit IRC | 07:19 | |
*** isaacb has joined #zuul | 07:23 | |
*** lennyb has joined #zuul | 07:32 | |
jamielennox | hey is there a zuul logo/mascot i can put in a slide? | 07:37 |
jamielennox | i feel like i've seen one before | 07:40 |
jamielennox | mordred: as you're in this tz and might be here ^ | 07:40 |
mordred | jamielennox: I'm not sure we've produced one of those yet | 07:51 |
*** isaacb has quit IRC | 09:15 | |
*** isaacb has joined #zuul | 09:16 | |
*** amoralej is now known as amoralej|brb | 10:08 | |
*** jkilpatr has quit IRC | 10:45 | |
*** jkilpatr has joined #zuul | 11:02 | |
*** hashar is now known as hasharLunch | 11:13 | |
*** amoralej|brb is now known as amoralej | 11:34 | |
*** dkranz has joined #zuul | 11:50 | |
*** hasharLunch is now known as hashar | 13:01 | |
*** isaacb has quit IRC | 13:58 | |
*** isaacb has joined #zuul | 14:12 | |
dmsimard | For Zuul v3 secrets ( https://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#secrets ) | 14:30 |
dmsimard | How would you pass the equivalent of a credentials-binding for a file rather than a string? Encrypt the base64 or something? | 14:30 |
pabelanger | dmsimard: just confirming, you want to encrypt the whole file? | 14:38 |
dmsimard | pabelanger: currently jenkins allows you to encrypt a text (string) or a file | 14:39 |
dmsimard | and then at runtime it sends that file to the slave, decrypts it and makes it available as an env var | 14:39 |
pabelanger | Ya, I don't think we support files ATM. But you should be able to store file_contents as encrypted blob then template it | 14:40 |
pabelanger | that's what we'd plan to do with SSH private keys | 14:41 |
jeblair | only up to 4096 bits | 14:57 |
jeblair | or, actually, i think a bit less than that | 14:57 |
jeblair | 4096, according to the docs: https://docs.openstack.org/infra/zuul/feature/zuulv3/user/encryption.html | 14:58 |
jeblair | dmsimard: ^ | 14:58 |
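(A minimal sketch of the workaround pabelanger describes above, assuming the Zuul v3 secret syntax from the encryption docs jeblair links: base64-encode the file's contents, encrypt that string as the secret value, then have the job's playbook write it back to disk. The secret name, field name, and paths are illustrative, not from the log.)

```yaml
# Hypothetical Zuul v3 secret: the file's contents are base64-encoded and
# then encrypted, since only strings can be encrypted (subject to the
# 4096-bit RSA limit noted above). All names here are illustrative.
- secret:
    name: my_credentials
    data:
      service_account_b64: !encrypted/pkcs1-oaep |
        <ciphertext produced by zuul's encryption tooling>

# In the job's playbook, decode the blob back into a file on the node.
- hosts: all
  tasks:
    - name: Write decoded credentials file
      copy:
        content: "{{ my_credentials.service_account_b64 | b64decode }}"
        dest: /tmp/service-account.json
        mode: "0600"
```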
*** isaacb has quit IRC | 15:08 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove ZUUL_PROJECT https://review.openstack.org/486251 | 15:19 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove ZUUL_UUID https://review.openstack.org/486252 | 15:19 |
pabelanger | jeblair: mordred: 485824 should be a straightforward review for zuul-jobs | 15:36 |
jeblair | pabelanger: +3. anything else i should look at? | 15:37 |
pabelanger | jeblair: just that for now, thanks. Working on more refactor patches this morning | 15:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: testing https://review.openstack.org/486665 | 15:38 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove nodepool DIB specific logic https://review.openstack.org/485824 | 15:40 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove .txt suffix from tox logs https://review.openstack.org/486665 | 15:45 |
*** hashar is now known as hasharMeeting | 15:52 | |
leifmadsen | just to confirm, master branch == zuul v2.5 and all v3 work still on feature/zuulv3 ? | 16:01 |
pabelanger | yes | 16:03 |
leifmadsen | thx | 16:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:06 |
*** bhavik1 has joined #zuul | 16:10 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:14 |
*** bhavik1 has quit IRC | 16:20 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove .txt suffix from tox logs https://review.openstack.org/486665 | 16:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Allow loading additional variables file for site config https://review.openstack.org/447734 | 16:30 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 16:44 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Implement autohold https://review.openstack.org/486692 | 16:45 |
pabelanger | jeblair: I'd like to restart zuulv3 to pick up latest logging improvements | 16:47 |
jeblair | pabelanger: all yours | 16:54 |
Shrews | jeblair: so, before 692 ^^^ starts getting into the actual meat of the change, curious as to how you see the in-memory autohold list being managed. Like, do we delete the project/job from the list after the first hold? | 16:55 |
Shrews | jeblair: also, do we need to specify tenant? | 16:57 |
pabelanger | zuulv3 restarted | 16:58 |
jeblair | Shrews: lookin | 16:59 |
Shrews | well, not much to see there yet. it's just the beginnings of plugging into the scheduler :) | 17:00 |
jeblair | Shrews: (well, that also includes looking at what i wrote in storyboard so i sound like i know what i'm talking about) | 17:01 |
Shrews | lol | 17:01 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Log an extra blank line to get space after each skip https://review.openstack.org/486698 | 17:04 |
jeblair | Shrews: ah ok! good questions! :) in nodepool v0, we tell it how many failed nodes it should accumulate for a given job. i think we default to 3. so maybe we should do that here -- add an extra cmdline argument to specify the count. | 17:04 |
jeblair | Shrews: in v0, nodepool puts a note in the 'comment' field in the node table in the db like "auto held for job foo". it counts those to figure out if it has met the limit | 17:06 |
jeblair | Shrews: we could do something similar in v3, or we could actually add a field to the zk node rec for this purpose. like "zuul_job" or something. | 17:06 |
Shrews | jeblair: ah, ok. | 17:07 |
jeblair | Shrews: i think maybe once we've hit the limit, drop the entry from zuul's in-memory list? we don't do that in nodepool v0, but i think this might be a better behavior. | 17:07 |
jeblair | (only fungi is good at remembering to clean up autoheld nodes :) | 17:08 |
jeblair | Shrews: and yes, we need to specify tenant as well | 17:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: do not merge https://review.openstack.org/486679 | 17:09 |
*** harlowja has joined #zuul | 17:09 | |
Shrews | jeblair: great. thanks. | 17:10 |
jeblair | Shrews: and the project name should obey the new convention we're establishing -- it should be a fully-qualified canonical project name (ie, git.openstack.org/foo/bar) if that's required to disambiguate it from another similarly named project, or if it's unique, it can just be "foo/bar". the Tenant.getProject method will take care of all that for you, so you can just treat it as an opaque string and hand it off to that method to get a project back (or an error). | 17:10 |
Shrews | ah. yeah, i suppose i should use that to validate the input | 17:12 |
jeblair | Shrews: i wouldn't try to do much local input validation -- just pass it over the wire and validate it on the zuul-scheduler side, then return errors from that if there are any. i think most of the other methods work that way. | 17:13 |
Shrews | *nod* | 17:13 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 17:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 17:20 |
pabelanger | jeblair: mordred: okay, so I think we are ready to bike shed on https://review.openstack.org/#/q/topic:tox_environment_defaults | 17:28 |
pabelanger | that gives us a way to setup tox defaults, but allows anybody to also override them | 17:29 |
* fungi doesn't feel like he does a particularly excellent job of remembering to delete his held or autoheld nodes | 17:29 | |
pabelanger | mordred: jeblair: I'll reserve comments until you've had a chance to look | 17:31 |
jeblair | fungi: then the rest of us are even worse off! | 17:33 |
fungi | yikes | 17:34 |
leifmadsen | are there any documentation patches, especially around setting up zuul w/ github (and just generally getting started) that I can review/test? | 17:37 |
jeblair | leifmadsen: nothing in flight at the moment, but we do have some stuff merged. all docs: https://docs.openstack.org/infra/zuul/feature/zuulv3/ | 17:39 |
leifmadsen | thanks, reading though now, looks like I'll have to do some code digging | 17:40 |
jeblair | leifmadsen: the administrators guide has things for someone setting up zuul: https://docs.openstack.org/infra/zuul/feature/zuulv3/admin/index.html | 17:40 |
leifmadsen | I generated the latest stuff locally | 17:40 |
jeblair | leifmadsen: there are two big weak spots we know about: | 17:40 |
jeblair | leifmadsen: a good install HOWTO. we want to have a playbook to help with that. | 17:41 |
leifmadsen | just remember that playbooks are not documentation :) | 17:41 |
jeblair | leifmadsen: exactly. we still need everything to be fully documented. but "i just want to see it run" is never going to be quick and easy with a distributed system, so it'll be nice to have both. :) | 17:42 |
SpamapS | leifmadsen: You might be able to glean some info from BonnyCI's deployment ansible, called hoist... which deploys pointed at github | 17:42 |
SpamapS | leifmadsen: https://github.com/BonnyCI/hoist | 17:42 |
jeblair | leifmadsen: and we know there's some stuff missing in the github docs about how to actually set up the webhooks/triggers/etc in github's interface itself. | 17:43 |
SpamapS | There's still stuff for v2.5 in there but v3 works | 17:43 |
leifmadsen | yea, mostly interested in v3 with github events as I'm starting a comparison / review between zuulv3 and prow | 17:43 |
leifmadsen | and just understanding how both work, etc | 17:43 |
Shrews | SpamapS: lol @ hoist. i'm sensing a theme | 17:44 |
Shrews | "mateys-ahoy" ... theme confirmed | 17:44 |
leifmadsen | nautical name theme definitely a k8s style thing :) | 17:45 |
SpamapS | Shrews: click 'Projects' for a hearty flagon of pirate humor. | 17:45 |
SpamapS | well, org projects | 17:46 |
SpamapS | https://github.com/orgs/BonnyCI/projects/1 | 17:46 |
SpamapS | We don't groom the backlog.. we swab it. ;) | 17:46 |
Shrews | 404'd on that | 17:46 |
jeblair | leifmadsen: please let us know about any other missing/confusing docs | 17:47 |
SpamapS | Oh I wonder if that's org-only :-P well it's our scrum board and we named it Poop Deck. ;-) | 17:48 |
* fungi is _not_ swabbing the poopdeck | 17:49 | |
jeblair | SpamapS: what's the status of bonnyci/charts? | 17:51 |
SpamapS | jeblair: it was a spike by jamielennox .. not sure how far he got. | 17:52 |
jeblair | ah, thus the "20 days ago" | 17:52 |
SpamapS | We're being compelled to move our stuff off our openstack cloud, which will be shut down soon, so we were going to see if we could use that to deploy onto BlueMix k8s | 17:52 |
SpamapS | (and get nodes from some public cloud vendor) | 17:53 |
jeblair | gotcha. it'll be nice to have helm charts too. | 17:55 |
SpamapS | I agree, it's a good fit I think | 17:57 |
SpamapS | I was actually also going to play with Habitat | 17:57 |
SpamapS | but.. distractions abound | 17:58 |
Shrews | squirrel! | 17:58 |
jeblair | leifmadsen: fyi, right now we're heavily focused on prepping to move openstack to zuul, hopefully in a little over a month. we're working on a shared job library so that not everyone has to write their own version of a "run $language unit tests" job, and building openstack's installation on top of that. and of course, fixing any issues that surface as part of that. | 17:59 |
leifmadsen | well, I'll just be over here toiling on trying to get it working as a newbie :) | 17:59 |
jeblair | leifmadsen: cool, just wanted to give you some context | 18:02 |
adam_g | v2.5 problem, anyone have any tips for debugging an issue where a node sometimes gets re-used for two changes? it looks like zmq msgs are being processed correctly, but im watching nodepool happily hand out a USED node after a previous job has completed. its fairly easy to reproduce in our env /w a loaded queue and triggers being delivered in quick succession | 18:02 |
pabelanger | http://git.openstack.org/cgit/openstack/ansible-role-zuul should get you most of the way to zuulv3. but I haven't tested it with github integration | 18:02 |
jeblair | adam_g: are you sending OFFLINE_NODE_WHEN_COMPLETE=1 as a job parameter? | 18:04 |
adam_g | jeblair: no, not afaics | 18:04 |
adam_g | should i be? | 18:04 |
pabelanger | was just going to ask that | 18:05 |
jeblair | adam_g: yes | 18:05 |
adam_g | i'll give that a shot | 18:06 |
jeblair | adam_g: remember, the v2.5 launcher is basically emulating jenkins, so nodes normally just stay attached to the "master". | 18:06 |
jeblair | adam_g: so that's emulating the thing we added to the gearman plugin to take a node offline when the job is done. | 18:06 |
*** amoralej is now known as amoralej|off | 18:07 | |
adam_g | jeblair: ok, so it happens to work w/o that setting because the deleter eventually kicks in after DELETE_DELAY ? | 18:07 |
jeblair | adam_g: yes. this addresses that race condition. | 18:08 |
adam_g | jeblair: great | 18:08 |
pabelanger | you should be able to reuse our openstack_functions.py python-file and setup the following regex: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n1112 | 18:09 |
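(A sketch of that hookup for context, assuming the v2.5 layout format; the regex and function name mirror the linked project-config example but are illustrative here. The referenced Python function injects the OFFLINE_NODE_WHEN_COMPLETE job parameter so the launcher takes the node offline after a single use.)

```yaml
# Hypothetical v2.5 layout.yaml excerpt: attach a parameter function to
# every job. In openstack_functions.py the function does, in effect,
#   params['OFFLINE_NODE_WHEN_COMPLETE'] = '1'
# which prevents a USED node from being handed out again.
jobs:
  - name: ^.*$
    parameter-function: set_node_options
```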
adam_g | jlk: jamielennox SpamapS ^ look for a hoist patch to apply this, surprised we didn't see this more often /w our bonny jobs at peak working hours | 18:13 |
SpamapS | adam_g: "peak" ;-) | 18:13 |
SpamapS | adam_g: actually it's entirely possible our jobs were happy to run again without breaking maybe | 18:13 |
jeblair | SpamapS: that's possible, but even so, if nodepool decides to delete the node mid-run, that's also, erm, problematic. | 18:21 |
jeblair | SpamapS: though, actually, not as much as it could be... because zuul is likely to reschedule the job in that case | 18:21 |
jeblair | SpamapS: so there's a pretty convincing explanation for how it could go unnoticed. | 18:21 |
jeblair | "cloud node disappearing out from under me" is something zuul is designed to handle. even if it's self-inflicted. :/ | 18:22 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:25 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:30 |
SpamapS | jeblair: That's a bit schizophrenic, but I like that we have coping strategies. ;) | 18:38 |
jeblair | SpamapS: "stop hitting yourself!" | 18:50 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 18:51 |
*** hasharMeeting is now known as hasharDinner | 18:56 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 19:00 |
SpamapS | jeblair: perhaps all distributed systems problems can be boiled down to sibling rivalry tropes. Kerberos key exchange problems might be "I know you are but what am I?" | 19:08 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 19:09 |
jeblair | SpamapS: this is your chance for the big time: No results found for "i know you are but what am i algorithm". | 19:12 |
SpamapS | jeblair: It's too generic to patent. :) | 19:14 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Implement autohold https://review.openstack.org/486692 | 19:21 |
*** hasharDinner has quit IRC | 19:23 | |
Shrews | jeblair: When you have a moment, looking at https://review.openstack.org/#/c/486692/2/zuul/scheduler.py , I know you said not to do much validation, but am I trying to do too much there? My thinking is that returning False (which I hope will mean job failure????) would be a friendlier way to tell the user "nope, couldn't do the hold". | 19:25 |
Shrews | without those checks, we could just fallback to the less friendly exceptions that might occur b/c of invalid things | 19:27 |
leifmadsen | is there an example tenant configuration for the github driver somewhere I could peep at? | 19:30 |
leifmadsen | oh might have just figured it out (of course, right after I ask) | 19:32 |
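(For anyone else looking for the same thing: a minimal sketch of a tenant config using a github connection, assuming the v3 main.yaml format of the time; tenant, connection, and project names are made up.)

```yaml
# Hypothetical /etc/zuul/main.yaml; 'github' must match a connection
# defined in zuul.conf with driver=github.
- tenant:
    name: example-tenant
    source:
      github:
        config-projects:
          - my-org/project-config
        untrusted-projects:
          - my-org/my-app
```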
Shrews | jeblair: oh, doesn't look like returning False is enough to signal that. Would have to throw an exception. bummer | 19:34 |
Shrews | guess i could just 'raise Exception()' instead | 19:36 |
jeblair | Shrews: yeah, all the current errors are job exceptions. | 19:36 |
jeblair | Shrews: take a look at handle_enqueue in rpclistener | 19:36 |
jeblair | Shrews: it does input validation and returns nice error exceptions that indicate the problem | 19:37 |
SpamapS | leifmadsen: helps to get things out of your own head :) | 19:38 |
Shrews | jeblair: perfect. thx | 19:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing into fetch-testr-output https://review.openstack.org/485840 | 20:43 |
*** dkranz has quit IRC | 20:44 | |
*** jkilpatr has quit IRC | 21:04 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing into fetch-testr-output https://review.openstack.org/485840 | 21:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Allow loading additional variables file for site config https://review.openstack.org/447734 | 21:50 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove state_dir from setMountsMap https://review.openstack.org/486766 | 21:50 |
jeblair | tristanC: can you take a look at 486766 and make sure i'm correct about that? | 21:50 |
jeblair | jamielennox: i picked up your site vars change (447734); can you take a look and let me know if that works for you? | 21:51 |
Shrews | anyone else care to review/+A the nodepool uuid change and its parent? https://review.openstack.org/484414 Already two +2's | 21:54 |
Shrews | SpamapS or pabelanger? ^^^ | 21:55 |
jeblair | it's zuul meeting time in #openstack-meeting-alt | 22:01 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Monitor job root and kill over limit jobs https://review.openstack.org/485902 | 22:07 |
SpamapS | jeblair: good news.. ^^ now that we're synchronously killing jobs, the tests don't need to whitelist executor-diskaccountant | 22:07 |
*** jkilpatr has joined #zuul | 22:21 | |
jeblair | clarkb: we should have an expand-all button if we do that | 23:02 |
clarkb | jeblair: ++ | 23:02 |
jamielennox | clarkb: as a counterpoint though, in 99% of cases where a test fails (and you're not on the -infra team), it's not the node's fault and all i really care about is the output of my tox | 23:02 |
clarkb | pabelanger: I left a review on one of your tox playbook changes | 23:02 |
jeblair | cause, yeah, we need to be able to see everything, but we do also have a problem in that right now, the actual error is usually right in the middle of the log. with a bunch of ignorable errors below it! :) | 23:03 |
jamielennox | i'm not saying remove it, but debugging for example the pep8 jobs in projects involves skipping 100s of lines of setup to find the actual console output | 23:03 |
pabelanger | clarkb: thanks, replied | 23:03 |
clarkb | jamielennox: I ^F error, which breaks in the collapsed style setup | 23:03 |
pabelanger | clarkb: FWIW: I do not like that patch myself. But need a good way to support all the paths for tox_environment | 23:03 |
jamielennox | anyway we can deal with the UX later, this is an awesome start | 23:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:05 |
clarkb | pabelanger: I'm having a hard time parsing that last message :) | 23:05 |
pabelanger | clarkb: so, we had a discussion last week about how no defined variable is better than a defined empty variable | 23:06 |
pabelanger | when it comes to playbooks | 23:06 |
clarkb | pabelanger: does environment: {} and environment: omit behave differently? | 23:06 |
jamielennox | jeblair: scrolling back re BonnyCI/charts, it largely works - I've definitely got it running jobs, and i'm currently still struggling with getting the right secrets in place for uploading logs, which is a problem of the non-kubernetes infrastructure | 23:06 |
pabelanger | clarkb: yes, omit would not pass environment to the task. | 23:07 |
pabelanger | but {} would be passed | 23:07 |
clarkb | pabelanger: right but does that behave differently? | 23:07 |
SpamapS | jamielennox: at least the tox jobs that have subunit give you the nice HTML breakdown though. ;) | 23:07 |
jeblair | jamielennox: oh nice. i mean, not the struggling, but the rest of it. :) | 23:07 |
jamielennox | jeblair: the main concerns are that it is more difficult to debug, and if you get for example the scheduler pod restarting then you end up in a really odd state | 23:07 |
SpamapS | maybe we should make pep8 run through subunit | 23:07 |
jamielennox | so i sort of stopped when all the option changes happened | 23:07 |
clarkb | pabelanger: if it does then I would worry that setting vars would not do what we want either because we still want to overlay with the system defaults right? so the three layers would be system defaults, tox defaults, playbook explicit env | 23:08 |
jeblair | SpamapS: pep8 is on my short list of things to move to line-review-comments once we add that :) | 23:08 |
jeblair | jamielennox: anything about site-vars we didn't touch on in the meeting? | 23:08 |
SpamapS | jeblair: mmmmmmmmmmmmmmmmmmmmmmmmmmm | 23:08 |
jamielennox | there's a few problems that really require coordination with putting code into zuul itself - which IMO makes it a post v3 thing | 23:08 |
* SpamapS dreams of line review comments | 23:08 | |
clarkb | pabelanger: stuff like LANG and so on we likely want to inherit from the system? (which is current zuulv2.5 behavior iirc) | 23:09 |
jamielennox | jeblair: all i've looked at at the moment is the executor/server file and it seems to do the same thing | 23:09 |
pabelanger | clarkb: I don't know if there is a difference, but today when using the shell command, we don't pass empty environment for tasks. So, need to test | 23:09 |
jamielennox | jeblair: at the moment we're not using it because i got sick of rebasing the patch :P | 23:09 |
pabelanger | clarkb: right, we don't overwrite them | 23:09 |
pabelanger | unless somebody bassed LANG into tox_environment | 23:09 |
pabelanger | passed* | 23:09 |
SpamapS | jamielennox: pod restarting seems like something that k8s should have facilities for doing carefully. | 23:09 |
SpamapS | isn't there a way to tell k8s "only one of these ever" ? | 23:10 |
clarkb | pabelanger: right but if you pass environment: {} would that overwrite system default env? | 23:10 |
jeblair | jamielennox: yep. i didn't change anything substantial. but i wrote docs and tests -- i mostly wanted to make sure we knew what the story was with precedence. | 23:10 |
clarkb | pabelanger: if not then omit and {} should be equivalent right? but using {} will reduce playbook complexity? | 23:10 |
pabelanger | clarkb: well, so do we always want to pass environment for the run tox shell task? or only pass it when a variable is defined | 23:11 |
jamielennox | SpamapS: it'll restart just fine, and yes it'll only run 1, but it assumes that it should be able to move pods if it has to, but if you take down the scheduler without coordinating the other components things get weird | 23:11 |
clarkb | pabelanger: well if you always pass it then you simplify the playbook significantly and assuming the behavior isn't different that seems preferable to me | 23:11 |
jamielennox | so i can't say (that i know of) 'if you restart the scheduler, also restart these executors' | 23:11 |
pabelanger | clarkb: we can try passing environment: {} | 23:11 |
clarkb | pabelanger: because then you can just combine the two dicts and then pass the result in | 23:11 |
clarkb | pabelanger: you don't even need a special block you just combine them at the environment: statement | 23:12 |
jamielennox | so this is the sort of thing that just needs fixes to zuul to better reconnect gearman processes and to store some more state | 23:12 |
pabelanger | clarkb: we still need logic to check if tox_environment and tox_environment_defaults are defined, but yes | 23:12 |
clarkb | pabelanger: well you'd define them to default to {} so they would be defined | 23:12 |
clarkb | but yes that | 23:13 |
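(What clarkb is sketching, with the variable names from the review topic; the task itself is illustrative. Note that Ansible's environment: keyword only adds or overrides variables for the task, so system defaults like LANG are still inherited even when the merged dict is empty.)

```yaml
# Default both dicts to {} and merge them inline at the environment:
# keyword, so no conditional blocks are needed; tox_environment wins
# on key collisions. tox_envlist is an illustrative variable name.
- name: Run tox
  shell: tox -e {{ tox_envlist | default('venv') }}
  environment: "{{ tox_environment_defaults | default({}) | combine(tox_environment | default({})) }}"
```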
jamielennox | another thing that's annoying is that you basically need to run the nodepool-builder and the zuul-executor with --privileged for dib and bubblewrap | 23:13 |
jeblair | i have to go run some errands now | 23:13 |
jamielennox | again i think we could tune that out with a bit of dedicated effort | 23:13 |
clarkb | jamielennox: dib at least essentially is privileged though | 23:14 |
clarkb | jamielennox: since it can mount and write filesystems and do all sorts of fun things | 23:14 |
jamielennox | clarkb: there should be a way of providing that cap though without giving privileged right? | 23:15 |
jamielennox | because we're only mounting things within the container | 23:15 |
clarkb | jamielennox: aiui the reason mount is part of privileged (it can be separately given out) is that if you can mount you can mount whatever including the host fs? | 23:16 |
clarkb | and once you've done that you own the system | 23:16 |
*** artgon has left #zuul | 23:16 | |
jamielennox | you would need to have access to the host fs right? or is the implication you can still get that through /dev? | 23:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:17 |
jamielennox | so the classic security issue is running as root in the container and mounting directories in | 23:17 |
clarkb | jamielennox: I think worst case you just create the node in /dev ? | 23:18 |
jamielennox | ah, ok, didn't realize you could just recreate the node | 23:18 |
clarkb | where worst case is "my host tried to hide it from me" | 23:18 |
jamielennox | mknod has always been magic to me | 23:18 |
clarkb | its been a while since I looked into all this with the iscsi container woes | 23:18 |
clarkb | but ya mount is scary in containers | 23:18 |
jamielennox | so that'll probably affect bubblewrap as well? | 23:19 |
clarkb | jamielennox: reading really quickly mknod is a default docker privilege | 23:19 |
clarkb | jamielennox: so its possible this is just docker being silly too | 23:20 |
clarkb | jamielennox: so if you add mount to a docker container it already has mknod and thats all you need | 23:20 |
jamielennox | yea ok, so in this case nodepool-builder i thought might be fixable, but is reasonably controlled/trusted | 23:22 |
jamielennox | running zuul-executor with --privileged is a big problem | 23:22 |
jamielennox | having said that i think part of the reason is the whole bubblewrap setuid thing | 23:23 |
jamielennox | i'm not actually sure how it works if i run the executor itself as root | 23:23 |
pabelanger | you only need root for finger today, did you change the port to something > 1024 ? | 23:23 |
jamielennox | pabelanger: yea i just put the port number up for that | 23:24 |
jamielennox | there's a problem here that i don't fully understand | 23:24 |
pabelanger | I'd like us to drop root in openstack-infra too, once we have websocket proxy | 23:24 |
jamielennox | if you don't run bubblewrap as root you generally give it setuid so it can run | 23:24 |
jamielennox | but there is a problem (to do with user namespaces afaict) with running setuid within the container | 23:25 |
jamielennox | anyway, once i gave it --privileged it worked, and i moved on with a note to come back to the problem | 23:26 |
pabelanger | not sure I understand, I'm running bubblewrap locally as non-root. I don't think I setup anything with setuid | 23:26 |
pabelanger | something, something, container? | 23:26 |
jamielennox | it's not close enough for a production use yet anyway | 23:26 |
jamielennox | pabelanger: i think the .deb puts setuid on the bin right? | 23:26 |
pabelanger | Hmm, need to check. I am using fedora | 23:27 |
pabelanger | unless rpm did something | 23:27 |
clarkb | iirc you need setuid on older kernels | 23:27 |
clarkb | where older kernel is like anything not newer than 2 months old | 23:27 |
jamielennox | -rwsr-xr-x 1 root root 47072 May 2 16:41 /usr/bin/bwrap | 23:27 |
clarkb | so if using a .deb that implies ubuntu/debian which have old kernels | 23:27 |
jamielennox | that's after install on an up to date xenial | 23:28 |
pabelanger | -rwxr-xr-x. 1 root root 48904 May 26 02:32 /usr/bin/bwrap | 23:28 |
jamielennox | clarkb: yea, my understanding is that there's a kernel fix that still hasn't made it into xenial | 23:28 |
jamielennox | that will fix the bwrap problem in particular | 23:29 |
jamielennox | but i'm not sure why user namespaces and setuid is a problem, but it's mentioned in a number of places | 23:29 |
pabelanger | jamielennox: confirmed, that is how bwrap is setup on xenial | 23:30 |
pabelanger | https://anonscm.debian.org/cgit/collab-maint/bubblewrap.git/tree/debian/rules | 23:31 |
clarkb | jamielennox: I think it is because the setuid perms in a namespace will setuid to a non privileged user | 23:31 |
clarkb | jamielennox: if you use the host namespace then setuid is going to use proper root and be happy | 23:31 |
jamielennox | interestingly if it's a kernel problem then i'm not sure what happens if we flip the docker container over to centos or something because the underlying infrastructure might not be on the host | 23:32 |
jamielennox | clarkb: that's interesting because at least theoretically for this you only need to be root in that container, you're not writing anything out | 23:33 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Create tox_environment_defaults variable for tox based jobs https://review.openstack.org/486679 | 23:33 |
pabelanger | clarkb: okay, updated^ | 23:34 |
jamielennox | but again, apparently this is something that is fixed/improved in later kernels | 23:34 |
clarkb | jamielennox: except that bubblewrap is using kernel capabilities that an unprivileged in-container user won't have aiui | 23:34 |
jamielennox | so it's probably something where the practical has not yet caught up with the theoretical | 23:34 |
clarkb | jamielennox: in newer kernels they made those capabilities more fine grained so that you don't need proper root like caps | 23:34 |
pabelanger | and EOD for me | 23:35 |
jamielennox | clarkb: yep, we can add specific caps to the container fairly easily, which i'm ok with doing, just would prefer not to do the full --privileged | 23:35 |
clarkb | jamielennox: ya though my understanding is until you have a newer kernel that basically means root, so it's probably six of one, half a dozen of the other until you can rely on newer kernels | 23:36 |
clarkb | clearly we just need the future here today to solve all the problems | 23:36 |
jamielennox | clarkb: yea, which is how i've basically got to the point that all this is super interesting but i wouldn't feel comfortable running this in any sort of prod today | 23:37 |
jamielennox | regardless of how you lock it down | 23:37 |
jamielennox | which is a shame because i think having a fairly easy chart you could deploy to something like GKE would be good for adoption, but something we can look at again in future | 23:38 |
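(A sketch of the tuning jamielennox describes, in Kubernetes pod-spec terms; container and image names are illustrative. Instead of a blanket privileged: true, grant only the capability needed for the mount/namespace calls dib and bubblewrap make, with the caveat from the discussion above that on older kernels this is still effectively root.)

```yaml
# Hypothetical container spec for a zuul-executor pod.
containers:
  - name: zuul-executor
    image: example/zuul-executor
    securityContext:
      # privileged: true        # the blunt option that "just worked"
      capabilities:
        add: ["SYS_ADMIN"]      # narrower grant for bwrap's mount use
```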