*** erbarr has quit IRC | 00:01 | |
*** ianychoi has quit IRC | 00:07 | |
*** Goneri has quit IRC | 00:08 | |
*** ianychoi has joined #zuul | 00:08 | |
*** igordc has joined #zuul | 00:12 | |
*** tosky has quit IRC | 00:28 | |
*** dmellado has quit IRC | 01:22 | |
*** mhu has quit IRC | 01:26 | |
*** ianychoi has quit IRC | 01:30 | |
*** ianychoi has joined #zuul | 01:32 | |
*** dmellado has joined #zuul | 01:46 | |
*** ianychoi has quit IRC | 02:10 | |
*** swest has quit IRC | 02:14 | |
*** ianychoi has joined #zuul | 02:18 | |
*** swest has joined #zuul | 02:30 | |
*** bhavikdbavishi has joined #zuul | 02:59 | |
*** igordc has quit IRC | 03:30 | |
*** ianychoi has quit IRC | 03:35 | |
*** ianychoi has joined #zuul | 03:37 | |
*** ianychoi has quit IRC | 05:00 | |
*** ianychoi has joined #zuul | 05:03 | |
*** raukadah is now known as chandankumar | 05:18 | |
*** saneax has joined #zuul | 05:27 | |
*** evrardjp has quit IRC | 05:35 | |
*** evrardjp has joined #zuul | 05:35 | |
*** ianychoi has quit IRC | 06:01 | |
*** ianychoi has joined #zuul | 06:03 | |
*** bolg has joined #zuul | 06:07 | |
*** ianychoi has quit IRC | 06:11 | |
*** ianychoi has joined #zuul | 06:13 | |
*** ianychoi has quit IRC | 06:29 | |
*** ianychoi has joined #zuul | 06:31 | |
*** y2kenny has quit IRC | 06:38 | |
*** ianychoi has quit IRC | 06:39 | |
*** ianychoi has joined #zuul | 06:41 | |
*** marvs has joined #zuul | 06:49 | |
*** ianychoi has quit IRC | 06:58 | |
*** ianychoi has joined #zuul | 07:05 | |
*** avass has joined #zuul | 07:12 | |
*** Defolos has quit IRC | 07:14 | |
*** sshnaidm|afk is now known as sshnaidm | 07:16 | |
*** dpawlik has joined #zuul | 07:17 | |
*** hashar has joined #zuul | 07:45 | |
*** jcapitao has joined #zuul | 07:46 | |
*** Defolos has joined #zuul | 08:26 | |
*** tosky has joined #zuul | 08:28 | |
*** mhu has joined #zuul | 08:45 | |
*** avass has quit IRC | 08:46 | |
*** gouthamr has quit IRC | 08:49 | |
*** mgoddard has quit IRC | 08:49 | |
*** gouthamr has joined #zuul | 08:50 | |
*** mgoddard has joined #zuul | 08:50 | |
*** jpena|off is now known as jpena | 08:52 | |
*** AJaeger has quit IRC | 09:54 | |
*** AJaeger has joined #zuul | 10:14 | |
-openstackstatus- NOTICE: The mail server for lists.openstack.org is currently not handling emails. The infra team will investigate and fix during US morning. | 10:27 | |
*** hashar has quit IRC | 10:31 | |
*** avass has joined #zuul | 10:59 | |
*** ttx has quit IRC | 11:17 | |
*** ttx has joined #zuul | 11:19 | |
*** ttx has quit IRC | 11:19 | |
*** ianychoi has quit IRC | 11:24 | |
*** ttx has joined #zuul | 11:28 | |
*** rlandy has joined #zuul | 11:54 | |
*** Goneri has joined #zuul | 11:58 | |
*** jcapitao is now known as jcapitao_lunch | 11:58 | |
*** jcapitao_lunch has quit IRC | 12:03 | |
*** jcapitao_lunch has joined #zuul | 12:05 | |
*** sshnaidm is now known as sshnaidm|afk | 12:10 | |
*** jpena is now known as jpena|lunch | 12:22 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Enforce sql connections for scheduler and web https://review.opendev.org/630472 | 12:41 |
---|---|---|
*** jpena|lunch is now known as jpena | 13:01 | |
*** plaurin has joined #zuul | 13:11 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Add ensure-snap role https://review.opendev.org/712414 | 13:17 |
*** bhavikdbavishi has quit IRC | 13:19 | |
*** irclogbot_3 has quit IRC | 13:22 | |
*** irclogbot_2 has joined #zuul | 13:24 | |
*** jamesmcarthur has joined #zuul | 13:27 | |
*** jcapitao_lunch has quit IRC | 13:30 | |
*** jcapitao has joined #zuul | 13:33 | |
*** zxiiro has joined #zuul | 13:43 | |
*** y2kenny has joined #zuul | 14:07 | |
*** avass has quit IRC | 14:10 | |
y2kenny | As far as I understand, in order to get zuul to build any project, that project has to be referenced in the tenant config. But if I have a really large project that is not "zuul-native", what is the best way to handle it? If I include the project under untrusted-projects, the very large project will get scanned on zuul startup. What I have done so far | 14:32 |
y2kenny | is to include the project under untrusted-projects with an empty include. Is this the best way to do it? | 14:32 |
clarkb | y2kenny: yes, that makes the repo available in jobs without needing to worry about loading configuration for jobs from it | 14:34 |
clarkb | we do similar when we load repos from external sources like public github repos | 14:35 |
y2kenny | great, thanks for confirming. | 14:36 |
mordred | y2kenny: https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L43-L76 for instance | 14:37 |
y2kenny | mordred: oh cool, that's useful | 14:37 |
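A minimal sketch of the "empty include" approach confirmed above, written as a Python structure that mirrors the tenant YAML. The tenant name, the `gerrit` connection name and the project name are made up for illustration; only the `untrusted-projects` entry with an empty `include` list comes from the discussion.

```python
import json

# Hypothetical tenant definition: names are placeholders, not a real deployment.
tenant_config = [
    {
        "tenant": {
            "name": "example-tenant",
            "source": {
                "gerrit": {
                    "untrusted-projects": [
                        # Empty include: the repo is usable in jobs, but
                        # no Zuul configuration is loaded from it.
                        {"linux-upstream": {"include": []}},
                    ],
                },
            },
        },
    },
]

# Print the structure as JSON to inspect it.
print(json.dumps(tenant_config, indent=2))
```

Hand-written YAML, as in the opendev file mordred links, is the usual form; the dict here only makes the nesting explicit.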
*** sshnaidm|afk is now known as sshnaidm | 14:37 | |
y2kenny | Another question: if I want to trigger off only a very specific branch of a project, I will have to first associate a pipeline with a project and then specify the branches in a job that is associated with the pipeline of that project, is that right? | 14:43 |
clarkb | y2kenny: the easiest way to do that is to put the config in those branches. Then they can add in jobs they want that apply directly to those branches | 14:44 |
y2kenny | clarkb: I can't really do that at this point because we don't control those projects (linux kernel upstream) | 14:45 |
y2kenny | With the way I have it set up right now, it seems like all branches will trigger a bit of something regardless of the branches filter in the job. | 14:47 |
*** bhavikdbavishi has joined #zuul | 14:48 | |
clarkb | I see. In that case you can use a branch filter on the jobs, I would suggest that goes into an unbranched config project for simplicity | 14:48 |
y2kenny | And that bit of something seems to be attempting the merge (merge-mode). Which raises another question: is there a way to specify a merge-mode: none? (for example, testing a change that hasn't been rebased or is not able to merge?) | 14:49 |
fungi | right, generally the only time putting configuration like that in a branched repo works out is if the branch names are identical between the project hosting the configuration and the project for which the jobs are being run | 14:49 |
y2kenny | this usually happens on a change that is WIP | 14:49 |
clarkb | y2kenny: I don't think that behavior exists. Zuul currently wants to report the need for a rebase | 14:52 |
fungi | y2kenny: not to my knowledge. for one thing that would make it basically impossible to mix in any declared (not implicit) change dependencies since they need to share a common git history if they're for the same repository+branch | 14:52 |
mordred | y2kenny: I don't think we've had that use case come up as desirable before. the normal way we think of the test runs is "if we approved this for merge right now, would it pass everything" - and something that is unmergable would not do that (and the act of rebasing can, itself, cause test failures) - so we skip things and report the merge failure | 14:52 |
mordred | also - clarkb, fungi and I all just wrote the same thing :) | 14:52 |
mordred | yay async communication! | 14:53 |
y2kenny | so the use case I have is to also allow developers to use the zuul/CI infrastructure for WIP testing/experiment | 14:54 |
mordred | y2kenny: is the process your devs use to work on a kernel patch to get it functionally and semantically correct without worrying about rebasing to target, then once that's happy do the rebase and re-review the results? | 14:54 |
*** y2kenny has quit IRC | 14:54 | |
mordred | ohno. we lost y2kenny | 14:54 |
*** y2kenny has joined #zuul | 14:55 | |
mordred | y2kenny: yeah - we do that a ton - I agree, It's super-valuable. we've just all also adopted the practice of keeping rebased with the target branch as we work on those WIPs - or Do-not-merge experiments | 14:56 |
y2kenny | um... weird... got disconnected again. | 14:56 |
y2kenny | Our policy is also to rebase eventually but it's also possible that the set of changes is not rebase-able or merge ready due to code churn. | 14:57 |
mordred | that said - maybe a merge-mode: none for a check pipeline wouldn't be impossible to implement? obviously it wouldn't work for a gate pipeline. definitely a corvus question | 14:57 |
y2kenny | how do you set up the do-not-merge jobs/pipeline btw? I was wondering if there is integration with Gerrit's WIP status | 14:58 |
mordred | we don't set up a specific pipeline - we just push patches up for review | 14:59 |
mordred | and mark them WIP so reviewers ignore them | 14:59 |
y2kenny | (let's say someone pushed a patch to gerrit with WIP on, or has a [WIP] in the subject line. The pipeline will trigger but won't submit. The pipeline then triggers when the WIP status changes.) | 14:59 |
y2kenny | ok. I suppose the change won't get merged if there's no CR+2 anyway | 15:00 |
mordred | yeah | 15:00 |
mordred | y2kenny: for instance: https://review.opendev.org/#/c/705808/ | 15:00 |
mordred | or https://review.opendev.org/#/c/704644/ | 15:01 |
fungi | but also in opendev we set a workflow -1 (labeled "work in progress") vote which blocks merge | 15:01 |
y2kenny | ok... so this one is probably my workflow specifically... I have a bit of a weird situation where some folks don't want to use Gerrit for CR while others do. I think I will probably have to handle this with a custom job. | 15:01 |
mordred | y2kenny: that sounds like a complicated situation | 15:02 |
mordred | y2kenny: I'm guessing since we're talking kernel that some of your folks prefer email review? | 15:03 |
y2kenny | let's just say, in the linux kernel community, they have this checkpatch script (pretty much a lint script) that would scan for Change-ID in the commit message and flag them. | 15:03 |
y2kenny | mordred: yea.... | 15:03 |
corvus | y2kenny: you could entice them to use gerrit with the opportunity to have a nice ci system :) also, for the folks who may be reluctant to use gerrit's web ui, they may be interested in gertty https://pypi.org/project/gertty/ | 15:03 |
*** hashar has joined #zuul | 15:03 | |
y2kenny | corvus: I have been trying for half a decade :) And yes, I was pushing gertty too but inertia is hard to move. | 15:05 |
y2kenny | Are you the gertty author btw? | 15:06 |
corvus | also, ftr, we've talked about an email driver for zuul (so it could respond to emailed patches as changes). it may be possible. | 15:06 |
corvus | y2kenny: yes | 15:06 |
* corvus is james blair | 15:06 | |
y2kenny | cool. I think we chatted briefly in Sunnyvale. | 15:07 |
y2kenny | good to put some names and faces to the chat handle. | 15:07 |
corvus | y2kenny: nice to meet you again! :) | 15:07 |
corvus | mordred is monty taylor, who gave the zuul talk there | 15:07 |
y2kenny | oh nice. Everybody is here :D | 15:08 |
* mordred waves | 15:10 | |
fbo | hi, I wanted to continue the effort to package Zuul into Fedora. Basically adding it to rawhide. I figured out that a zuul dependency, ws4py, is not compatible with py37 and py38 (the rawhide python versions). I see Zuul is tested against py35 only. Is there an effort to test Zuul against newer versions of Python? Should it be started? | 15:10 |
mordred | sorry I didn't go out for dinner/drinks after - I was *SUPER* jet lagged | 15:10 |
*** hashar has quit IRC | 15:11 | |
corvus | fbo: it's tested against py37 | 15:11 |
mordred | fbo: we also have a py37 tox job and our container images build and test with py38 I believe | 15:11 |
mordred | oh - my bad - containers are also py37 | 15:12 |
fbo | oh ok I'll double check because I see that https://opendev.org/zuul/zuul/src/branch/master/tox.ini#L4 | 15:13 |
*** AJaeger has quit IRC | 15:14 | |
mordred | fbo: ah - that's just what tox will run if you don't give it any parameters | 15:15 |
mordred | fbo: we have an explicit tox-py37 job - I don't know that many of us (any of us) just run "tox" ever - so I doubt that line is closely maintained | 15:16 |
fbo | mordred: ok well that's just me maybe ;) just running "tox" locally w/o version param. So ok nevermind. | 15:20 |
*** sreejithp has joined #zuul | 15:24 | |
*** mattw4 has joined #zuul | 15:26 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Add tox-py38 in check https://review.opendev.org/712480 | 15:27 |
mordred | fbo: good to know there is someone who does that! | 15:28 |
*** sgw has quit IRC | 15:32 | |
fbo | Yes it would be super nice to have zuul and nodepool in fedora. dnf install zuul nodepool. It could help to lower the entry cost for Fedora users. | 15:33 |
*** bolg has quit IRC | 15:48 | |
*** sgw has joined #zuul | 15:48 | |
fungi | my position on the envlist parameter is we should just put "py3" in there | 15:49 |
fungi | and then unqualified `tox` will run with whatever the default python3 is on the system | 15:49 |
zenkuro | fbo: I think it's already there | 15:50 |
*** mattw4 has quit IRC | 15:51 | |
corvus | fungi: oh, if that's an option, ++ | 15:52 |
fungi | it's what i do for my personal projects | 15:56 |
tobiash | corvus: should we test all python versions we officially support for zuul (and I guess we'd need to define them precisely)? | 15:56 |
tobiash | asking because of ^ | 15:56 |
*** AJaeger has joined #zuul | 15:56 | |
corvus | tobiash: we test the lowest and highest available | 15:57 |
*** jcapitao is now known as jcapitao_afk | 15:57 | |
fungi | because honestly, the options are between tox throwing an error saying you don't have an appropriate python version installed, or tests failing because you're running on an unsupported python version, but either way i don't think the envlist in tox.ini is the place to declare what python versions you support | 15:57 |
tobiash | corvus: ok, with that strategy we should change py37 to py38 probably instead | 15:57 |
corvus | tobiash: i think we can probably skip the versions in the middle, unless we think there's some specific risk | 15:57 |
corvus | tobiash: i agree | 15:57 |
clarkb | tobiash: fbo: note that ubuntu-bionic doesn't have python3.8 packaged as far as I know so that proposed job won't work as is | 15:58 |
tobiash | clarkb: it seems to execute tests though | 15:59 |
fungi | clarkb: https://packages.ubuntu.com/bionic-updates/python3.8 | 15:59 |
clarkb | woah | 15:59 |
fungi | should work as long as we install it | 15:59 |
tobiash | but it might still run py37 | 16:00 |
y2kenny | Are there any recommendations on sizing and scaling various components of zuul? I was looking at the components doc page and it sounds like executor and merger need some disk space since they work with the git repositories. But how many executors do I need (would I need an executor per available node?) Also, are there any size recommendations for | 16:03 |
y2kenny | zookeeper? (I assume it might need to scale with the number of nodes but I am new to zookeeper myself.) | 16:03 |
corvus | y2kenny: in opendev, our executors have 8G of ram and 8vcpus and handle about 80-100 concurrent builds | 16:05 |
corvus | each | 16:05 |
clarkb | y2kenny: you can see opendev's sizing via cacti at http://cacti.openstack.org/cacti/graph_view.php (I can never figure out how to deep link to hosts so you'll have to use the nav bar on the left). zeXX are executors, zmXX are mergers, zuul01 is the scheduler and zkXX is zookeeper | 16:05 |
clarkb | corvus: y2kenny executors also have 80GB of disk which is enough for our git repo set and about that many concurrent builds | 16:06 |
clarkb | our scheduler is a bit oversized which was good for us when developing zuulv3 but can probably be shrunk at this point | 16:06 |
clarkb | seems like starting with an executor of about that size is a reasonable starting point, then you can add more as you grow. (similar with mergers) | 16:07 |
corvus | yeah, looking at its graphs, i'd probably give the scheduler 8-16G of ram | 16:07 |
y2kenny | does the executor share some kind of git cache/bare repo or does each executor have its own complete git repo for each job it supports? | 16:08 |
mordred | tobiash, clarkb, fungi: I believe we updated the tox jobs to install the needed python if it's available | 16:08 |
corvus | y2kenny, clarkb: deep link to scheduler: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=691&nodeid=node1_691&host_group_data= | 16:08 |
corvus | and executor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=558&nodeid=node1_558&host_group_data= | 16:09 |
clarkb | y2kenny: as long as the /var/lib/zuul/ dir is a single filesystem it (mostly git) will use hardlinks to share common git files between logically different repos | 16:09 |
clarkb | y2kenny: zuul will create a "copy" of the repo for each job but largely avoids extra disk overhead for that via the hardlinking behavior of git | 16:09 |
clarkb | y2kenny: that means we end up with one actual copy + the separate git objects (merges really) for each build | 16:10 |
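The hard-link behaviour clarkb describes can be seen in isolation with a short Python sketch. This is not Zuul's code (git itself creates the links when cloning locally on one filesystem); the file names are hypothetical and stand in for git object files.

```python
import os
import tempfile


def hardlink_shares_storage(directory):
    """Create a file plus a hard link to it and report whether both
    names point at the same on-disk data (same inode, link count 2)."""
    original = os.path.join(directory, "cache.pack")  # stand-in for a cached git object
    linked = os.path.join(directory, "build.pack")    # stand-in for a per-build copy
    with open(original, "wb") as handle:
        handle.write(b"x" * 4096)
    # A hard link adds a second name for the same data; nothing is copied.
    os.link(original, linked)
    first, second = os.stat(original), os.stat(linked)
    return first.st_ino == second.st_ino and first.st_nlink == 2


with tempfile.TemporaryDirectory() as tmp:
    shares = hardlink_shares_storage(tmp)
    print(shares)
```

Hard links cannot cross filesystems, which is why the single-filesystem condition on /var/lib/zuul/ matters.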
tobiash | y2kenny: regarding number of executors, this depends highly on the kind of jobs (how big are the repos, how many repos are used per job). But a rough guideline could be one executor per 20-50 concurrent jobs. | 16:10 |
mordred | tobiash, clarkb, fungi: https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul.d/python-jobs.yaml#L113 https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/tox/pre.yaml#L4 | 16:12 |
y2kenny | corvus, clarkb, tobiash: thanks! | 16:13 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul master: Declare support for Python3.8 https://review.opendev.org/712489 | 16:16 |
fungi | mordred: yeah, and there is already a tox-py38 job in zuul-jobs | 16:18 |
mordred | yah | 16:18 |
mordred | clarkb: do we still need fix-tox in zuul? | 16:18 |
y2kenny | For the executor, is it possible to specify a location to use with git clone reference (https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt)? I am thinking of pointing to a network local NFS-backed git cache and have it shared by all the executors but maybe this is a premature optimization. | 16:19 |
mordred | for the tox-py35 job? | 16:19 |
fungi | tobiash: y2kenny: how it's worked out for us in opendev is that we need roughly one executor per 100 remote job nodes | 16:19 |
clarkb | mordred: the thing that pinned importlib-resources? I don't think so | 16:20 |
mordred | clarkb: kk. | 16:20 |
clarkb | https://review.opendev.org/#/c/712107/ would confirm globally, but zuul got its own fix merged too | 16:20 |
AJaeger | fungi: yes, tox-py38 exists | 16:20 |
y2kenny | fungi: ok, perhaps I don't need to worry about scaling executors just yet :) | 16:20 |
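The rules of thumb quoted in this thread (one executor per 20-50 concurrent jobs from tobiash, roughly one per 100 job nodes in opendev from fungi) reduce to simple arithmetic. A rough sketch, with a hypothetical helper name and a made-up load of 300 concurrent builds:

```python
import math


def executors_needed(concurrent_builds, builds_per_executor):
    """Estimate the executor count for a given load.

    builds_per_executor is the rule-of-thumb capacity of one executor;
    at least one executor is always returned.
    """
    if concurrent_builds < 0 or builds_per_executor <= 0:
        raise ValueError("load must be >= 0 and capacity must be > 0")
    return max(1, math.ceil(concurrent_builds / builds_per_executor))


# Compare the guidelines from the discussion for an assumed load of 300 builds.
for capacity in (20, 50, 100):
    print(capacity, executors_needed(300, capacity))
```

The real capacity depends on repo sizes and how many repos each job uses, so these numbers are only a starting point to refine against observed load.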
AJaeger | fungi: oh, that wasn't a question. Sorry ;( | 16:20 |
fungi | no worries, nice to get confirmation ;) | 16:21 |
openstackgerrit | Jeremy Stanley proposed zuul/nodepool master: Declare support for Python3.8 https://review.opendev.org/712494 | 16:25 |
fungi | okay so 712489 and 712494 bump the upper python version tox jobs to py38, add trove classifiers to our package metadata indicating the python minor versions we expect we work with, and replace the py35 in the tox default envlist with just py3 (for convenience and also to avoid confusion) | 16:26 |
fungi | fbo: does that make more sense? | 16:26 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Remove fix-tox workaround for python3.5 https://review.opendev.org/712495 | 16:26 |
openstackgerrit | Monty Taylor proposed zuul/zuul master: Remove duplicate variables for tox jobs https://review.opendev.org/712496 | 16:26 |
fbo | fungi: yes, that looks better like that | 16:29 |
fungi | after all, the python package metadata (and/or documentation) are the places to declare what python versions zuul is supposed to work with | 16:30 |
*** jcapitao_afk is now known as jcapitao | 16:32 | |
fungi | declaring them in tox.ini assumes either 1. developers are going to have all those versions installed on their system to test with when running `tox` or 2. relies on them ignoring failures with errors about missing python interpreters or 3. requires a dangerous additional setting in tox.ini which can result in tox happily returning success when it didn't run explicitly requested testenvs | 16:32 |
tobiash | zuul-maint: this saves about 5-8% of github api requests in our prod system: https://review.opendev.org/710985 | 16:33 |
fungi | #3 is especially a problem, since while it's fairly obvious for local invocations, it can result in slight misconfigurations of tox-using ci jobs to test nothing and return success (like if they're set to use an image which doesn't have the requested python version) | 16:34 |
*** jcapitao has quit IRC | 16:42 | |
mordred | tobiash: nice | 16:43 |
corvus | Shrews, mordred, tristanC, clarkb: i've taken my 3-node tls zk cluster and connected to it with a kazoo client | 16:45 |
corvus | so i think i've manually worked out everything that's needed; i'll try to distill this into a modification of our test setup now | 16:45 |
mordred | corvus: neat | 16:46 |
Shrews | awesome | 16:48 |
*** jamesmcarthur has quit IRC | 16:49 | |
*** jamesmcarthur has joined #zuul | 16:50 | |
tobiash | cool | 16:58 |
*** jamesmcarthur has quit IRC | 17:01 | |
*** Defolos has quit IRC | 17:12 | |
*** jamesmcarthur has joined #zuul | 17:15 | |
openstackgerrit | Merged zuul/zuul master: Store build.error_detail in SQL https://review.opendev.org/709857 | 17:16 |
*** chandankumar is now known as raukadah | 17:18 | |
*** jamesmcarthur has quit IRC | 17:23 | |
*** jamesmcarthur has joined #zuul | 17:28 | |
*** evrardjp has quit IRC | 17:35 | |
*** evrardjp has joined #zuul | 17:35 | |
*** bhavikdbavishi has quit IRC | 17:36 | |
*** erbarr has joined #zuul | 17:36 | |
*** jamesmcarthur has quit IRC | 17:37 | |
*** jamesmcarthur has joined #zuul | 17:37 | |
*** hashar has joined #zuul | 17:39 | |
fungi | looks like my proposed tox-py38 failed: https://zuul.opendev.org/t/zuul/build/29a876f8481449e3afd35657adb46181 | 17:40 |
*** jamesmcarthur has quit IRC | 17:40 | |
fungi | OSError: [Errno 88] Socket operation on non-socket | 17:41 |
fungi | in tests.unit.test_daemon.TestDaemon.test_daemon | 17:41 |
corvus | neat; maybe that's a real problem? is anyone using zuul under 3.8 yet? | 17:47 |
Shrews | clarkb: heh, looking to a solution for the need to use zk-shell to erase provider upload records, it seems i had already created such an option in the nodepool cli (the 'erase' command), only it doesn't cover uploads and deletes ALL zk data for a provider. I think modifying this to be a bit more flexible should do the trick. | 17:49 |
clarkb | ++ | 17:49 |
*** jamesmcarthur has joined #zuul | 17:54 | |
openstackgerrit | Merged zuul/zuul master: Cache getUser in Github connection https://review.opendev.org/710985 | 17:55 |
corvus | mordred just triggered this error http://paste.openstack.org/show/790558/ with this patch https://review.opendev.org/712525 | 17:56 |
corvus | looks like a zuul bug | 17:57 |
*** jpena is now known as jpena|off | 17:57 | |
corvus | i'd track it down, but i'm knee deep in the zk stuff | 17:57 |
corvus | if anyone else wants to look into it, that'd be great | 17:57 |
clarkb | I need to step away for a bit, but when I get back in an hour I can probably look if others aren't already | 17:59 |
Shrews | for some reason, we cap python-daemon. i'd wager we need a newer version. but also looking at nodepool things now | 18:00 |
Shrews | quick scan of the change log references socket things in a newer version: https://pagure.io/python-daemon/blob/master/f/ChangeLog#_34 | 18:02 |
Shrews | fungi: might want to uncap that and see if it helps or breaks worse | 18:02 |
Shrews | https://pagure.io/python-daemon/issue/34 | 18:03 |
Shrews | yep, that's it | 18:03 |
*** sshnaidm is now known as sshnaidm|afk | 18:04 | |
mordred | awesome | 18:12 |
*** y2kenny has quit IRC | 18:13 | |
mordred | Shrews: I think we were capping python-daemon because it either did something bad in a newer version, or didn't work with newer python | 18:13 |
mordred | Shrews: the commit message is sparse on details | 18:13 |
mordred | Shrews: The latest python-deamon has broken zuul. Constrain zuul below | 18:13 |
mordred | the last release. | 18:13 |
mordred | Shrews: so yeah - maybe the answer is uncapping it, seeing what breaks and what is needed to fix it | 18:14 |
Shrews | mordred: oh, i'm sure there was some reason for the cap. my personal 'git log' of my brain reveals no details either | 18:14 |
Shrews | mordred: the nodepool side of that change says: | 18:16 |
Shrews | 2.1.0 throws the exception: | 18:16 |
Shrews | 18:16 | |
Shrews | daemon.daemon.DaemonOSEnvironmentError: Unable to change process owner ([Errno 1] Operation not permitted) | 18:16 |
Shrews | so... there's that | 18:16 |
mordred | NEAT | 18:16 |
Shrews | maybe that was a thing fixed in 2.1.1 and we never followed up | 18:17 |
mordred | maybe so? | 18:18 |
*** hashar_ has joined #zuul | 18:21 | |
*** hashar has quit IRC | 18:21 | |
*** hashar_ has quit IRC | 18:22 | |
*** hashar has joined #zuul | 18:22 | |
SpamapS | wow ok I have a *really* weird thing, it's more git than zuul, but maybe y'all can help | 18:26 |
SpamapS | I've been using git-crypt for the last 18 months on this repo, but this week we're migrating to SOPS | 18:27 |
SpamapS | this means removing the '.gitattributes' file that makes git-crypt function. | 18:27 |
SpamapS | In my local repo, a file shows plaintext. But in a zuul test node, when the branch is merged on top of master, it shows as the old version, still encrypted... | 18:28 |
clarkb | is it possible that state is persisting in the merger repos? | 18:29 |
SpamapS | git log <thefile> shows the same commit for both, the one where the file was added, but it was recently moved and decrypted, so I'm also just not sure why it doesn't show a change in either checkout | 18:29 |
mordred | SpamapS: I don't know off the top of my head (don't know much about git-crypt) - but could it have to do with zuul not having the ability to decrypt the file, so it can't properly merge it? | 18:30 |
SpamapS | I also suspect something with rename wonkiness | 18:30 |
mordred | (I'm assuming you've done things with the file over the last 18 months though) | 18:30 |
SpamapS | One thing, the file basically moved from being binary, to text | 18:30 |
SpamapS | git-crypt stores as binaries and then uses a filter to show you the text. | 18:31 |
SpamapS | whoa.. Ok, this is confusing | 18:32 |
SpamapS | http://paste.openstack.org/show/790561/ | 18:33 |
SpamapS | dafuq? | 18:33 |
mordred | corvus: the unknown config error was a red herring - it's the one-project-two-tenants issue - that was an error from the opendev tenant | 18:33 |
SpamapS | Ok.. | 18:33 |
corvus | mordred: ah, well, it's probably still a bug | 18:33 |
SpamapS | git-crypt did something wonky | 18:33 |
*** AJaeger has quit IRC | 18:34 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 18:34 |
mordred | SpamapS: yeah - I definitely don't know enough about git-crypt to be useful there | 18:34 |
SpamapS | hiding the fact that this file was modified locally. I did `rm id_rsa_zuul.pub && git checkout id_rsa_zuul.pub` and the binary version was there | 18:34 |
SpamapS | mordred: nobody does, that's why we're going to sops. ;) | 18:34 |
SpamapS | which is super awesome btw. | 18:34 |
mordred | corvus: yes - but I feel like we tracked it down once before - and it's related to the fact that there are config errors for system-config in the opendev tenant due to missing repos already | 18:34 |
mordred | corvus: or something like that | 18:35 |
SpamapS | for in-repo secrets, sops is tha bomb | 18:35 |
SpamapS | ok I see now.. if you've had git-crypt in a repo, and you remove the .gitattributes files, you also need to remove some other junk in .git | 18:35 |
mordred | SpamapS: which zuul doesn't know to do in this case | 18:36 |
*** AJaeger has joined #zuul | 18:37 | |
SpamapS | mordred: yeah this was 0 zuul's fault, which I was pretty sure of. :) | 18:39 |
corvus | mordred, Shrews, tristanC, clarkb, fungi: ^ 712531 is a script to manage a CA for issuing zk certs; i'll build on that for our tests, but also, i think it's something folks could use for production zk deployments too (we could just run that in opendev ansible) | 18:39 |
SpamapS | zuul actually didn't have the .git crud, but my local clone did. | 18:39 |
*** plaurin has quit IRC | 18:52 | |
*** klindgren has quit IRC | 18:59 | |
*** klindgren has joined #zuul | 19:04 | |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Add options to CLI info command https://review.opendev.org/712539 | 19:08 |
openstackgerrit | David Shrewsbury proposed zuul/nodepool master: Add options to CLI info command https://review.opendev.org/712539 | 19:09 |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Don't access parent layout errors if there is no parent layout https://review.opendev.org/712544 | 19:32 |
clarkb | corvus: mordred ^ that is admittedly half a stab in the dark, but I think it covers the behavior shown by that traceback | 19:32 |
AJaeger | mordred: want to do the zuul_return trick on openstack-zuul-jobs and zuul-jobs as well? | 19:32 |
mordred | AJaeger: yeah. I can follow up with them in just a few | 19:35 |
AJaeger | thanks | 19:35 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file https://review.opendev.org/712547 | 19:46 |
*** jamesmcarthur has quit IRC | 19:48 | |
*** hashar has quit IRC | 19:48 | |
*** jamesmcarthur has joined #zuul | 19:49 | |
*** jamesmcarthur has quit IRC | 19:49 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file https://review.opendev.org/712547 | 19:51 |
*** jamesmcarthur has joined #zuul | 19:59 | |
fungi | Shrews: corvus: mordred: clearly the solution to build failures on my changes is to disappear for lunch and then magic research elves find the problem for me ;) | 20:17 |
fungi | i'll add an uncap for python-daemon and see what happens | 20:17 |
fungi | interestingly it didn't break the equivalent nodepool change, but i'll uncap there too for consistency | 20:18 |
openstackgerrit | Jeremy Stanley proposed zuul/nodepool master: Declare support for Python3.8 https://review.opendev.org/712494 | 20:22 |
openstackgerrit | Jeremy Stanley proposed zuul/zuul master: Declare support for Python3.8 https://review.opendev.org/712489 | 20:22 |
*** saneax has quit IRC | 20:42 | |
*** zenkuro has quit IRC | 20:48 | |
*** armstrongs has joined #zuul | 21:00 | |
*** armstrongs has quit IRC | 21:12 | |
*** Defolos has joined #zuul | 21:17 | |
*** saneax has joined #zuul | 21:19 | |
*** saneax has quit IRC | 21:27 | |
corvus | clarkb: zookeeper in bionic is 3.4.10, and i think the tls stuff was added in 3.5.1 | 21:38 |
corvus | so i think adding it to test-setup.sh and using it in tox tests may be problematic | 21:39 |
openstackgerrit | Monty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file https://review.opendev.org/712547 | 21:40 |
corvus | we could add it to the quick-start since that runs zk in a container (so at least we would be setting up zk in a recommended manner) | 21:40 |
mordred | corvus: does that mean we'll be looking at updating our zk deploy for zuul to be container-based for opendev for us to be able to roll out tls? | 21:42 |
corvus | we could also rethink whether it's actually required... SpamapS was encouraging us to avoid putting unencrypted secrets in zk (so they don't end up on disk in a checkpoint), so maybe it's not necessary? | 21:42 |
corvus | mordred: i think that would be the most expedient thing, yes | 21:42 |
corvus | i mean, i tend toward always thinking that adding tls and authentication is a good thing, but maybe it's not strictly necessary. | 21:42 |
mordred | corvus: yeah. I mean ... hrm | 21:43 |
corvus | the sasl auth uses digest-md5, so... technically it's probably okay to go over the wire plaintext. | 21:43 |
corvus | amusingly, since the tls setup essentially requires a ca, it's also an effective auth system, so if you enable tls, it's reasonable to disable sasl | 21:44 |
mordred | corvus: I thought the current plan was to put secrets into zk to get them to the executor - do we have a replacement thought for that? | 21:45 |
corvus | but maybe with stuff like this, it's good to have a belt and suspenders. even if you're wearing overalls. | 21:45 |
mordred | yeah | 21:45 |
mordred | like - maybe it's good to have the TLS even if we don't put the unencrypted secrets into zk | 21:46 |
corvus | mordred: i think the thought on secrets was that we would encrypt them in zk with a global key that is distributed to all zuul components out of band. | 21:46 |
mordred | nod | 21:46 |
corvus | i think the scheduler would decrypt the secret using the project key for a build, encrypt it using the shared global key, and stick that in zk. then the executor would decrypt that with its copy of the shared global key | 21:47 |
corvus | clarkb, fungi, Shrews, SpamapS: ^ if you have any thoughts on the merits of tls with zk, they would be welcome | 21:55 |
corvus | i'll work on docs for this to at least capture what i've learned. then we can look at next steps. | 21:56 |
*** sreejithp has quit IRC | 22:01 | |
fungi | yep, i'm just trying to get it straight in my noggin | 22:02 |
clarkb | what does auth look like if not tls'd? other thoughts: we could deploy from zk's upstream tarball. I use it locally and it has worked well for me | 22:06 |
corvus | clarkb: i'm mostly concerned with tests here (since that was the concern you brought up). our tests run on bionic now and use the bionic package. we could have the tests run with the upstream tarball, and if we did so, it would be easier to have the test runner do so without special system-wide configuration. | 22:07 |
corvus | (i want to say we actually did something like that at some point) | 22:07 |
fungi | shared keys are also risky if they have an effective lifespan of less than the secrets they're protecting... scenario is that a nefarious actor snags the encrypted secret, then later decrypts it with the leaked shared key. basically you have to consider any leakage of the shared key to also be a leakage of anything it ever encrypted which could have been leaked | 22:09 |
fungi | it's possible that simply distributing the zuul scheduler's asymmetric encryption key to the executors is equally safe and simpler to design | 22:10 |
corvus | fungi: well, there's an unbounded number of those | 22:11 |
*** dpawlik has quit IRC | 22:11 | |
fungi | look at it this way: if you have the scheduler asymmetric key and some shared symmetric key both of which are needed to protect the secrets being encrypted, then that's two possible pieces of information either of which when leaked could compromise the integrity of the encrypted data | 22:11 |
corvus | (no one has yet managed to figure out how to use tobiash's single-key system with a supported python encryption lib) | 22:12 |
*** rlandy is now known as rlandy|bbl | 22:12 | |
fungi | oh, good point, the asym keys are one per repo | 22:12 |
fungi | so you'd need to leak all of them (or at least the ones for the data you wanted to decrypt) | 22:12 |
corvus | i see it this way: the asym keys exist to allow people to put secrets into git repos. the sym key would exist to protect zuul's working memory. zuul's working memory can currently be compromised by compromising any zuul host (by connecting to gearman and accepting jobs). that would not change with the proposed sym key. it would protect it from a compromise of the zk hosts (or the disks of the zk hosts) | 22:15 |
fungi | also the gains of giving each executor its own asym keypair and then having the scheduler reencrypt to the relevant executor's key aren't that great... basically you reduce the risk of leaking an infrequently-used credential in environments with a large number of executors | 22:17 |
fungi | so on balance, a shared system-wide symmetric key is safer than not encrypting the data being stashed in zk. how safe comes down to the mechanisms used to distribute and protect that shared key | 22:18 |
fungi | currently the model is scheduler decrypts the ciphertext and then injects the plaintext into zk where it's available to all executors (and possibly any connected zk client if there are no acls), right? | 22:20 |
fungi | so to SpamapS's point, also exposed in any on-disk snapshots zk records for recovery purposes | 22:21 |
corvus | fungi: currently it's in gearman in plain text | 22:21 |
fungi | oh, right, gearman | 22:21 |
corvus | which means it's not written to disk, which means it's not tripping any compliance alarms | 22:21 |
fungi | and this is about the zk equivalent for distributed scheduler | 22:21 |
corvus | yep | 22:21 |
corvus | naively we would just do the same thing, but since it's zk, snapshots will be written to disk, which means it's a new potential channel for leaking plaintext | 22:22 |
fungi | though if the system where the geard is running is under memory pressure and doesn't use encrypted swap (still not all that common for virtual machines today)... | 22:22 |
fungi | presumably the same could be said of the scheduler if its allocations containing plaintext are paged out to swapspace? | 22:23 |
corvus | fungi: yes -- we can probably assume for sake of argument that an org that is worried about that is either using encrypted swap or no swap. :) | 22:23 |
corvus | at any rate, that's outside our control | 22:24 |
corvus | i think "please don't write out unencrypted secrets to disk" is a reasonable request | 22:25 |
fungi | looking at this from the opposite direction... should we use a shared key to encrypt everything zuul stashes in zk? | 22:25 |
fungi | not just secrets? | 22:26 |
corvus | fungi: for extra validation? | 22:26 |
clarkb | Zuulians may be interested in http://zuul.openstack.org/build/18c85ad49ca5491291e7a8c2dc703fc5 we are trying to publish website activity stats that don't disclose any personal info but can be used to fix broken links that result in 404s, improve pages that are popular, and otherwise generally give people editing zuul-website some insight into where efforts can be best spent | 22:26 |
clarkb | if you click on the Goaccess report there you'll get some data | 22:26 |
fungi | it gets you several benefits. one is that if you are forced to decrypt the data before processing it, you may be able to protect a majority of your code paths from exploitation by maliciously constructed data which was injected by someone without access to the key | 22:27 |
clarkb | interesting data so far: the webm demo video represents the bulk of our traffic (no surprise) | 22:28 |
fungi | this is the primary reason openvpn adds a shared "tls key" for clients | 22:28 |
clarkb | we probably want to add a robots.txt | 22:28 |
fungi | basically if a set of data doesn't decrypt to something relevant using the shared key, you discard it immediately with prejudice | 22:29 |
fungi | but it also means you don't have to treat secrets and non-secrets differently when it comes to putting them in and retrieving them from zk | 22:30 |
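A rough sketch of the "discard it immediately with prejudice" idea fungi describes above. This is not Zuul code: the `seal` and `unseal` names and the JSON payload are hypothetical, and a keyed HMAC stands in for the shared-key encryption under discussion, so it shows only the authenticity check and provides no confidentiality. A real design would presumably use authenticated encryption, which rejects bad data the same way.

```python
import hashlib
import hmac
import json

# Length of the SHA-256 tag that prefixes every sealed blob.
TAG_LEN = hashlib.sha256().digest_size


def seal(shared_key: bytes, payload: dict) -> bytes:
    """Serialize a payload and prefix it with an HMAC-SHA256 tag."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    tag = hmac.new(shared_key, body, hashlib.sha256).digest()
    return tag + body


def unseal(shared_key: bytes, blob: bytes):
    """Return the payload, or None if the blob was not made with shared_key."""
    if len(blob) < TAG_LEN:
        return None
    tag, body = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(shared_key, body, hashlib.sha256).digest()
    # Constant-time comparison; reject before the body is ever parsed.
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)
```

Because the tag is checked before `json.loads` runs, data injected by someone without the key never reaches the parsing code path. As noted in the discussion, this shows that a blob came from some holder of the key, not which one.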
corvus | fungi: that seems to have a lot of overlap with using tls with verification (since you would need one of the client certs to talk to the zk cluster anyway, and they're probably sitting right next to the symmetric key on the same hosts) | 22:31 |
fungi | on the other hand, depending on how resistant your selected symmetric encryption algorithm is to things like chosen plaintext oracles, it may mean you're exposing the key more | 22:31 |
corvus | fungi: which, to me suggests a dual sided argument: if we encrypt everything with a symmetric key, we could dispense with tls | 22:31 |
*** Shrews has quit IRC | 22:31 | |
*** tributarian has quit IRC | 22:31 | |
*** gundalow has quit IRC | 22:31 | |
*** jtanner has quit IRC | 22:31 | |
*** stevthedev has quit IRC | 22:31 | |
*** dustinc has quit IRC | 22:31 | |
*** portdirect has quit IRC | 22:31 | |
*** jbryce has quit IRC | 22:31 | |
fungi | yes, i think that if we need peer-to-peer encryption anyway (to avoid data at rest disclosure) then transport layer security may be redundant | 22:33 |
*** jbryce has joined #zuul | 22:34 | |
*** portdirect has joined #zuul | 22:34 | |
*** jtanner has joined #zuul | 22:34 | |
*** jamesmcarthur has quit IRC | 22:34 | |
clarkb | except for authentication (maybe md5sums possibly good enough) | 22:34 |
*** gundalow has joined #zuul | 22:35 | |
fungi | well, if you ignore the possibility that malicious ciphertext could compromise your decryption routines, you can infer authenticity from whether or not the payload decrypts to something usable | 22:35 |
fungi | possession of the shared key authenticates your payload as having originated from a system which was trusted with a copy of the key | 22:36 |
fungi | though of course it doesn't identify a particular system | 22:36 |
*** tributarian has joined #zuul | 22:36 | |
*** stevthedev has joined #zuul | 22:36 | |
*** guilhermesp has quit IRC | 22:36 | |
*** clayg has quit IRC | 22:36 | |
*** mnaser has quit IRC | 22:36 | |
*** gmann has quit IRC | 22:36 | |
*** lseki has quit IRC | 22:36 | |
*** donnyd has quit IRC | 22:36 | |
*** erbarr has quit IRC | 22:36 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:37 |
corvus | there are still other malicious acts that could be done without worrying about ciphertext -- locking or deleting znodes | 22:38 |
fungi | right, i was only considering the peer interactions. not the client/server interactions | 22:39 |
clarkb | or filling the disk | 22:39 |
*** mnaser has joined #zuul | 22:39 | |
corvus | so at least one other system (tls, sasl or firewall) should be used. (firewall is our only current option and is what we use) | 22:40 |
fungi | securing communication of zuul data between zuul daemons is still potentially distinct from securing zuul daemon interactions with zk itself | 22:40 |
*** Shrews has joined #zuul | 22:40 | |
*** dustinc has joined #zuul | 22:40 | |
*** clayg has joined #zuul | 22:40 | |
*** gmann has joined #zuul | 22:41 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:41 |
*** guilhermesp has joined #zuul | 22:41 | |
fungi | but it may mean that, if the generic zk interactions like manipulating znodes do not represent a leaky side channel then you could get by simply security them with authentication and not need encryption for them | 22:41 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:41 |
corvus | Shrews, clarkb, fungi, mordred: if you want to look at https://review.opendev.org/712531 i added a 'howto' for zk which covers sasl auth and tls -- so that should give you an idea of what's involved | 22:42 |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:43 |
*** lseki has joined #zuul | 22:43 | |
fungi | er, securing them with authentication | 22:43 |
*** donnyd has joined #zuul | 22:44 | |
*** erbarr has joined #zuul | 22:45 | |
clarkb | corvus: left a couple notes on ps4 | 22:45 |
*** tdasilva has quit IRC | 22:46 | |
*** evgenyl has quit IRC | 22:46 | |
clarkb | it doesn't seem that bad, the worst part might be getting a new enough zk to support tls? | 22:47 |
clarkb | (the docs in particular make it approachable I think) | 22:47 |
*** irclogbot_2 has quit IRC | 22:47 | |
*** evgenyl has joined #zuul | 22:47 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:48 |
*** openstackstatus has quit IRC | 22:48 | |
corvus | clarkb: yeah, and, i like to think, a shell script that runs all those crazy keytools commands | 22:48 |
*** irclogbot_3 has joined #zuul | 22:49 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: ZK TLS https://review.opendev.org/712531 | 22:49 |
*** zbr has quit IRC | 22:50 | |
*** wxy-xiyuan has quit IRC | 22:50 | |
*** ChrisShort has quit IRC | 22:50 | |
*** samccann has quit IRC | 22:50 | |
*** mnasiadka has quit IRC | 22:50 | |
clarkb | as a side note for opendev and zuul upstream development. It looks like Gerrit is planning to rely on zookeeper for its HA solution. For administration of the code review system I half expect we'll end up wanting zk anyway eventually | 22:50 |
clarkb | understanding how to run it better is a good thing in general I expect | 22:51 |
*** tdasilva has joined #zuul | 22:51 | |
corvus | i think i'm leaning toward continuing to add support for both sasl and tls, and maybe don't use tls in the unit tests for now (it doesn't affect the code base very much -- certainly not as much as sasl with the acl stuff, and it would necessitate a major overhaul to how we run zk in tests). but maybe look at adding tls to the quickstart, both for testing and as a better example. | 22:53 |
fungi | sounds reasonable | 22:54 |
*** ChrisShort has joined #zuul | 22:54 | |
*** mnasiadka has joined #zuul | 22:54 | |
fungi | and represents a good defense-in-depth posture | 22:54 |
*** zbr has joined #zuul | 22:55 | |
corvus | so next step is to build on 712531 with actually adding the tls arguments to zuul, then the same to nodepool, then we can add it to the quickstart | 22:55 |
clarkb | ya I think covering the case in quickstart should be sufficient for knowing that our tls support works | 22:55 |
*** wxy-xiyuan has joined #zuul | 22:55 | |
clarkb | (don't need it for every unittest) | 22:56 |
*** iamweswilson has quit IRC | 22:56 | |
*** dcastellani has quit IRC | 22:56 | |
*** webknjaz has quit IRC | 22:56 | |
*** samccann has joined #zuul | 22:57 | |
*** dcastellani has joined #zuul | 23:01 | |
*** webknjaz has joined #zuul | 23:01 | |
*** iamweswilson has joined #zuul | 23:02 | |
*** ianychoi has joined #zuul | 23:06 | |
*** Goneri has quit IRC | 23:15 | |
*** Defolos has quit IRC | 23:25 | |
*** Defolos has joined #zuul | 23:25 | |
*** tosky has quit IRC | 23:44 | |
*** armstrongs has joined #zuul | 23:45 | |
*** armstrongs has quit IRC | 23:52 | |
openstackgerrit | Merged zuul/nodepool master: Install zypper on the nodepool-builder image https://review.opendev.org/712177 | 23:53 |
SpamapS | fungi,corvus: TLS should still be used, as we tend to want to operate in a cloud "0-trust" network, where it's conceivable that an IP may be stolen by a rogue actor, even on an internal network. The digest auth would be vulnerable to MITM. | 23:54 |
SpamapS | I am intrigued by the thought of just requiring client certs signed by CA, and dropping the SASL. | 23:55 |
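A minimal sketch of the client side of the "client certs signed by CA" idea, using only Python's standard `ssl` module. The function name and its file-path parameters are hypothetical, and this is not how Zuul or its ZooKeeper client library is actually configured. It only shows the two pieces involved: trusting a private CA to verify the server, and presenting a CA-signed client certificate so the server can verify the client.

```python
import ssl


def make_zk_client_context(ca_file=None, cert_file=None, key_file=None):
    """Build a TLS client context for mutual authentication (sketch).

    All three paths are hypothetical; they are optional here only so the
    context can be constructed without real certificate files on disk.
    """
    # PROTOCOL_TLS_CLIENT turns on certificate verification and hostname
    # checking by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_file:
        # Trust only the private CA that signed the server certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # Present a CA-signed client certificate to the server.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A client would then wrap its socket with `ctx.wrap_socket(sock, server_hostname=...)` before speaking to the server's TLS port. Whether the server rejects clients that lack a valid certificate is a server-side setting that this sketch does not cover.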