Wednesday, 2020-03-11

*** erbarr has quit IRC00:01
*** ianychoi has quit IRC00:07
*** Goneri has quit IRC00:08
*** ianychoi has joined #zuul00:08
*** igordc has joined #zuul00:12
*** tosky has quit IRC00:28
*** dmellado has quit IRC01:22
*** mhu has quit IRC01:26
*** ianychoi has quit IRC01:30
*** ianychoi has joined #zuul01:32
*** dmellado has joined #zuul01:46
*** ianychoi has quit IRC02:10
*** swest has quit IRC02:14
*** ianychoi has joined #zuul02:18
*** swest has joined #zuul02:30
*** bhavikdbavishi has joined #zuul02:59
*** igordc has quit IRC03:30
*** ianychoi has quit IRC03:35
*** ianychoi has joined #zuul03:37
*** ianychoi has quit IRC05:00
*** ianychoi has joined #zuul05:03
*** raukadah is now known as chandankumar05:18
*** saneax has joined #zuul05:27
*** evrardjp has quit IRC05:35
*** evrardjp has joined #zuul05:35
*** ianychoi has quit IRC06:01
*** ianychoi has joined #zuul06:03
*** bolg has joined #zuul06:07
*** ianychoi has quit IRC06:11
*** ianychoi has joined #zuul06:13
*** ianychoi has quit IRC06:29
*** ianychoi has joined #zuul06:31
*** y2kenny has quit IRC06:38
*** ianychoi has quit IRC06:39
*** ianychoi has joined #zuul06:41
*** marvs has joined #zuul06:49
*** ianychoi has quit IRC06:58
*** ianychoi has joined #zuul07:05
*** avass has joined #zuul07:12
*** Defolos has quit IRC07:14
*** sshnaidm|afk is now known as sshnaidm07:16
*** dpawlik has joined #zuul07:17
*** hashar has joined #zuul07:45
*** jcapitao has joined #zuul07:46
*** Defolos has joined #zuul08:26
*** tosky has joined #zuul08:28
*** mhu has joined #zuul08:45
*** avass has quit IRC08:46
*** gouthamr has quit IRC08:49
*** mgoddard has quit IRC08:49
*** gouthamr has joined #zuul08:50
*** mgoddard has joined #zuul08:50
*** jpena|off is now known as jpena08:52
*** AJaeger has quit IRC09:54
*** AJaeger has joined #zuul10:14
-openstackstatus- NOTICE: The mail server for lists.openstack.org is currently not handling emails. The infra team will investigate and fix during US morning.10:27
*** hashar has quit IRC10:31
*** avass has joined #zuul10:59
*** ttx has quit IRC11:17
*** ttx has joined #zuul11:19
*** ttx has quit IRC11:19
*** ianychoi has quit IRC11:24
*** ttx has joined #zuul11:28
*** rlandy has joined #zuul11:54
*** Goneri has joined #zuul11:58
*** jcapitao is now known as jcapitao_lunch11:58
*** jcapitao_lunch has quit IRC12:03
*** jcapitao_lunch has joined #zuul12:05
*** sshnaidm is now known as sshnaidm|afk12:10
*** jpena is now known as jpena|lunch12:22
openstackgerritJan Kubovy proposed zuul/zuul master: Enforce sql connections for scheduler and web  https://review.opendev.org/63047212:41
*** jpena|lunch is now known as jpena13:01
*** plaurin has joined #zuul13:11
openstackgerritSorin Sbarnea proposed zuul/zuul-jobs master: Add ensure-snap role  https://review.opendev.org/71241413:17
*** bhavikdbavishi has quit IRC13:19
*** irclogbot_3 has quit IRC13:22
*** irclogbot_2 has joined #zuul13:24
*** jamesmcarthur has joined #zuul13:27
*** jcapitao_lunch has quit IRC13:30
*** jcapitao has joined #zuul13:33
*** zxiiro has joined #zuul13:43
*** y2kenny has joined #zuul14:07
*** avass has quit IRC14:10
y2kennyAs far as I understand, in order to get zuul to build any project, that project has to be referenced in the tenant config.  But if I have a really large project that is not "zuul-native", what is the best way to handle it?  If I include the project under untrusted-projects, the very large project will get scanned on zuul startup.  What I have done so far14:32
y2kennyis to include the project under untrusted-projects with an empty include.  Is this the best way to do it?14:32
clarkby2kenny: yes, that makes the repo available in jobs without needing to worry about loading configuration for jobs from it14:34
clarkbwe do similar when we load repos from external sources like public github repos14:35
y2kennygreat, thanks for confirming.14:36
mordredy2kenny: https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L43-L76 for instance14:37
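A minimal sketch of the empty-include approach, following the pattern in the main.yaml mordred links (the tenant, connection, and project names here are hypothetical):

```yaml
- tenant:
    name: example
    source:
      gerrit:
        untrusted-projects:
          # The empty include makes the repo available to jobs without
          # Zuul scanning it for zuul.yaml/.zuul.yaml configuration.
          - torvalds/linux:
              include: []
```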
y2kennymordred: oh cool, that's useful14:37
*** sshnaidm|afk is now known as sshnaidm14:37
y2kennyAnother question: if I want to trigger off only a very specific branch of a project, I will have to first associate a pipeline with a project and then specify the branches in a job that is associated with the pipeline of that project, is that right?14:43
clarkby2kenny: the easiest way to do that is to put the config in those branches. Then they can add in jobs they want that apply directly to those branches14:44
y2kennyclarkb: I can't really do that at this point because we don't control those projects (linux kernel upstream)14:45
y2kennyWith the way I set it up right now, it seems like all branches will trigger a bit of something regardless of the branches filter in the job.14:47
*** bhavikdbavishi has joined #zuul14:48
clarkbI see. In that case you can use a branch filter on the jobs, I would suggest that goes into an unbranched config project for simplicity14:48
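A rough sketch of what clarkb is suggesting: a job with a branch filter plus a project stanza, carried in an unbranched config project (the job name, branch regex, and project name are all hypothetical):

```yaml
- job:
    name: kernel-build
    # Only run for changes/refs on branches matching this regex.
    branches: ^stable/.*$

- project:
    name: torvalds/linux
    check:
      jobs:
        - kernel-build
```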
y2kennyAnd that bit of something seems to be attempting a merge.  Which raises another question: is there a way to specify merge-mode: none?  (for example, testing a change that hasn't been rebased or is not able to merge?)14:49
fungiright, generally the only time putting configuration like that in a branched repo works out is if the branch names are identical between the project hosting the configuration and the project for which the jobs are being run14:49
y2kennythis usually happens on a change that is WIP14:49
clarkby2kenny: I don't think that behavior exists. Zuul currently wants to report the need for a rebase14:52
fungiy2kenny: not to my knowledge. for one thing that would make it basically impossible to mix in any declared (not implicit) change dependencies since they need to share a common git history if they're for the same repository+branch14:52
mordredy2kenny: I don't think we've had that use case come up as desirable before. the normal way we think of the test runs is "if we approved this for merge right now, would it pass everything" - and something that is unmergable would not do that (and the act of rebasing can, itself, cause test failures) - so we skip things and report the merge failure14:52
mordredalso - clarkb, fungi and I all just wrote the same thing :)14:52
mordredyay async communication!14:53
y2kennyso the use case I have is to also allow developers to use the zuul/CI infrastructure for WIP testing/experiment14:54
mordredy2kenny: is the process your devs use to work on a kernel patch to get it functionally and semantically correct without worrying about rebasing to target, then once that's happy do the rebase and re-review the results?14:54
*** y2kenny has quit IRC14:54
mordredohno. we lost y2kenny14:54
*** y2kenny has joined #zuul14:55
mordredy2kenny: yeah - we do that a ton - I agree, It's super-valuable. we've just all also adopted the practice of keeping rebased with the target branch as we work on those WIPs - or Do-not-merge experiments14:56
y2kennyum... weird... got disconnected again.14:56
y2kennyOur policy is also to rebase eventually but it's also possible that that set of changes are not rebase-able or merge ready due to code churn.14:57
mordredthat said - maybe a merge-mode: none for a check pipeline wouldn't be impossible to implement? obviously it wouldn't work for a gate pipeline. definitely a corvus question14:57
y2kennyhow do you setup the do-not-merge jobs/pipeline btw?  I was wondering if there are integration with Gerrit's WIP status14:58
mordredwe don't set up a specific pipeline - we just push patches up for review14:59
mordredand mark then WIP so reviewers ignore them14:59
y2kenny(let's say someone pushed a patch to gerrit with WIP on, or have a [WIP] at the subject line.  Pipeline will trigger but won't submit.  Pipeline then trigger when the WIP status change.)14:59
y2kennyok.  I suppose the change won't get merged if there's no CR+2 anyway15:00
mordredyeah15:00
mordredy2kenny: for instance: https://review.opendev.org/#/c/705808/15:00
mordredor https://review.opendev.org/#/c/704644/15:01
fungibut also in opendev we set a workflow -1 (labeled "work in progress") vote which blocks merge15:01
y2kennyok... so this one is probably my workflow specifically... I have a bit of a weird situation where some folks don't want to use Gerrit for CR while others do.  I think I will probably have to handle this with a custom job.15:01
mordredy2kenny: that sounds like a complicated situation15:02
mordredy2kenny: I'm guessing since we're talking kernel that some of your folks prefer email review?15:03
y2kennylet's just say, in the linux kernel community, they have this checkpatch script (pretty much a lint script) that would scan for Change-ID in the commit message and flag them.15:03
y2kennymordred: yea....15:03
corvusy2kenny: you could entice them to use gerrit with the opportunity to have a nice ci system :)  also, for the folks who may be reluctant to use gerrit's web ui, they may be interested in gertty https://pypi.org/project/gertty/15:03
*** hashar has joined #zuul15:03
y2kennycorvus: I have been trying for half a decade :)   And yes, I was pushing gertty too but inertia is hard to move.15:05
y2kennyAre you the gertty author btw?15:06
corvusalso, ftr, we've talked about an email driver for zuul (so it could respond to emailed patches as changes).  it may be possible.15:06
corvusy2kenny: yes15:06
* corvus is james blair15:06
y2kennycool.  I think we chatted briefly in Sunnyvale.15:07
y2kennygood to put some names and faces to the chat handle.15:07
corvusy2kenny: nice to meet you again! :)15:07
corvusmordred is monty taylor, who gave the zuul talk there15:07
y2kennyoh nice.  Everybody is here :D15:08
* mordred waves15:10
fbohi, I wanted to continue the effort to package Zuul into Fedora. Basically adding it to rawhide. I figured out that a zuul dep, ws4py, is not compatible with py37 and py38 (rawhide python versions). I see Zuul is tested against py35 only. Is there an effort to test Zuul against newer versions of Python? Should it be started?15:10
mordredsorry I didn't go out for dinner/drinks after - I was *SUPER* jet lagged15:10
*** hashar has quit IRC15:11
corvusfbo: it's tested against py3715:11
mordredfbo: we also have a py37 tox job and our container images build and test with py38 I believe15:11
mordredoh - my bad - containers are also py3715:12
fbooh ok I'll double check because I see that https://opendev.org/zuul/zuul/src/branch/master/tox.ini#L415:13
*** AJaeger has quit IRC15:14
mordredfbo: ah - that's just what tox will run if you don't give it any parameters15:15
mordredfbo: we have an explicit tox-py37 job - I don't know that many of us (any of us) just run "tox" ever - so I doubt that line is closely maintained15:16
fbomordred: ok well that's just me maybe ;) just running "tox" locally w/o version param. So ok nevermind.15:20
*** sreejithp has joined #zuul15:24
*** mattw4 has joined #zuul15:26
openstackgerritFabien Boucher proposed zuul/zuul master: Add tox-py38 in check  https://review.opendev.org/71248015:27
mordredfbo: good to know there is someone who does that!15:28
*** sgw has quit IRC15:32
fboYes it would be super nice to have zuul and nodepool in fedora. dnf install zuul nodepool. It could help lower the entry cost for Fedora users.15:33
*** bolg has quit IRC15:48
*** sgw has joined #zuul15:48
fungimy position on the envlist parameter is we should just put "py3" in there15:49
fungiand then unqualified `tox` will run with whatever the default python3 is on the system15:49
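The tox.ini change fungi describes would look roughly like this (replacing the pinned default envlist):

```ini
[tox]
# Unqualified `tox` runs against whatever the default python3 is on the
# system; supported versions are declared in package metadata, not here.
envlist = py3
```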
zenkurofbo: I think it's already there15:50
*** mattw4 has quit IRC15:51
corvusfungi: oh, if that's an option, ++15:52
fungiit's what i do for my personal projects15:56
tobiashcorvus: should we test all python versions we officially support for zuul (and I guess we'd need to define them precisely)?15:56
tobiashasking because of ^15:56
*** AJaeger has joined #zuul15:56
corvustobiash: we test the lowest and highest available15:57
*** jcapitao is now known as jcapitao_afk15:57
fungibecause honestly, the options are between tox throwing an error saying you don't have an appropriate python version installed, or tests failing because you're running on an unsupported python version, but either way i don't think the envlist in tox.ini is the place to declare what python versions you support15:57
tobiashcorvus: ok, with that strategy we should probably change py37 to py38 instead15:57
corvustobiash: i think we can probably skip the versions in the middle, unless we think there's some specific risk15:57
corvustobiash: i agree15:57
clarkbtobiash: fbo: note that ubuntu-bionic doesn't have python3.8 packaged as far as I know so that proposed job won't work as is15:58
tobiashclarkb: it seems to execute tests though15:59
fungiclarkb: https://packages.ubuntu.com/bionic-updates/python3.815:59
clarkbwoah15:59
fungishould work as long as we install it15:59
tobiashbut it might still run py3716:00
y2kennyAre there any recommendations on sizing and scaling the various components of zuul?  I was looking at the components doc page and it sounds like the executor and merger need some disk space since they work with the git repositories.  But how many executors do I need (would I need an executor per available node?)  Also, are there any size recommendations for16:03
y2kennyzookeeper?  (I assume it might need to scale with the number of nodes but I am new to zookeeper myself.)16:03
corvusy2kenny: in opendev, our executors have 8G of ram and 8vcpus and handle about 80-100 concurrent builds16:05
corvuseach16:05
clarkby2kenny: you can see opendev's sizing via cacti at http://cacti.openstack.org/cacti/graph_view.php (I can never figure out how to deep link to hosts so you'll have to use the nav bar on the left. zeXX are executors, zmXX are mergers, zuul01 is the scheduler and zkXX is zookeeper)16:05
clarkbcorvus: y2kenny executors also have 80GB of disk which is enough for our git repo set and about that many concurrent builds16:06
clarkbour scheduler is a bit oversized which was good for us when developing zuulv3 but can probably be shrunk at this point16:06
clarkbseems like starting with an executor of about that size is a reasonable starting point, then you can add more as you grow. (similar with mergers)16:07
corvusyeah, looking at its graphs, i'd probably give the scheduler 8-16G of ram16:07
y2kennydoes the executor share some kind of git cache/bare repo or does each executor have its own complete git repo for each job it supports?16:08
mordredtobiash, clarkb, fungi: I believe we updated the tox jobs to install the needed python if it's available16:08
corvusy2kenny, clarkb: deep link to scheduler: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=691&nodeid=node1_691&host_group_data=16:08
corvusand executor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=558&nodeid=node1_558&host_group_data=16:09
clarkby2kenny: as long as the /var/lib/zuul/ dir is a single filesystem it (mostly git) will use hardlinks to share common git files between logically different repos16:09
clarkby2kenny: zuul will create a "copy" of the repo for each job but largely avoids extra disk overhead for that via the hardlinking behavior of git16:09
clarkby2kenny: that means we end up with one actual copy + the separate git objects (merges really) for each build16:10
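The hardlink effect clarkb describes can be seen with a quick stdlib-only sketch (the pack filename is just a stand-in for a git object file):

```python
import os
import tempfile

# A hardlinked "copy" of a file shares the same inode, so it costs no
# extra disk space -- this is how logically separate repo copies under
# /var/lib/zuul can share git object files.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "pack-1234.pack")  # stand-in pack file
with open(original, "wb") as f:
    f.write(b"\x00" * 4096)

copy = os.path.join(workdir, "copy.pack")
os.link(original, copy)  # what a local `git clone` does for object files

same_inode = os.stat(original).st_ino == os.stat(copy).st_ino
link_count = os.stat(original).st_nlink
print(same_inode, link_count)  # → True 2
```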
tobiashy2kenny: regarding the number of executors, this depends highly on the kind of jobs (how big are the repos, how many repos are used per job). But a rough guideline could be one executor per 20-50 concurrent jobs.16:10
mordredtobiash, clarkb, fungi: https://opendev.org/zuul/zuul-jobs/src/branch/master/zuul.d/python-jobs.yaml#L113 https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/tox/pre.yaml#L416:12
y2kennycorvus, clarkb, tobiash: thanks!16:13
openstackgerritJeremy Stanley proposed zuul/zuul master: Declare support for Python3.8  https://review.opendev.org/71248916:16
fungimordred: yeah, and there is already a tox-py38 job in zuul-jobs16:18
mordredyah16:18
mordredclarkb: do we still need fix-tox in zuul?16:18
y2kennyFor the executor, is it possible to specify a location to use with git clone reference (https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt)?  I am thinking of pointing a network local NFS-backed git cache and have it shared by all the executors but may be this is a premature optimization.16:19
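The reference-clone mechanism y2kenny is asking about can be sketched entirely locally (the NFS-backed cache is simulated with a plain directory; whether NFS latency actually beats re-fetching is worth measuring before committing to it):

```shell
set -e
cd "$(mktemp -d)"

# A stand-in "upstream" repo (in place of the real kernel tree).
git init -q --bare upstream.git
git clone -q upstream.git seed
git -C seed -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "initial commit"
git -C seed push -q origin HEAD:master

# A shared object cache, e.g. on local disk or an NFS mount.
git clone -q --mirror upstream.git cache.git

# --reference-if-able borrows objects from the cache and falls back to a
# normal clone if the cache path is missing.
git clone -q --reference-if-able "$PWD/cache.git" upstream.git borrowed

# The borrowed clone records the cache in its alternates file.
cat borrowed/.git/objects/info/alternates
```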
mordredfor the tox-py35 job?16:19
fungitobiash: y2kenny: how it's worked out for us in opendev is that we need roughly one executor per 100 remote job nodes16:19
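The capacity numbers quoted in channel (roughly 100 concurrent builds or remote nodes per executor for opendev, 20-50 per tobiash's more conservative guideline) reduce to simple arithmetic; these constants are illustrative, not zuul settings:

```python
import math

def executors_needed(max_concurrent_builds, builds_per_executor=100):
    """Naive executor count for a target build concurrency."""
    return max(1, math.ceil(max_concurrent_builds / builds_per_executor))

print(executors_needed(250))      # → 3  (opendev-style ratio)
print(executors_needed(250, 50))  # → 5  (conservative ratio)
```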
clarkbmordred: the thing that pinned importlib-resources? I don't think so16:20
mordredclarkb: kk.16:20
clarkbhttps://review.opendev.org/#/c/712107/ would confirm globally, but zuul got its own fix merged too16:20
AJaegerfungi: yes, tox-py38 exists16:20
y2kennyfungi: ok, perhaps I don't need to worry about scaling executors just yet :)16:20
AJaegerfungi: oh, that wasn't a question. Sorry ;(16:20
fungino worries, nice to get confirmation ;)16:21
openstackgerritJeremy Stanley proposed zuul/nodepool master: Declare support for Python3.8  https://review.opendev.org/71249416:25
fungiokay so 712489 and 712494 bump the upper python version tox jobs to py38, add trove classifiers to our package metadata indicating the python minor versions we expect we work with, and replace the py35 in the tox default envlist with just py3 (for convenience and also to avoid confusion)16:26
fungifbo: does that make more sense?16:26
openstackgerritMonty Taylor proposed zuul/zuul master: Remove fix-tox workaround for python3.5  https://review.opendev.org/71249516:26
openstackgerritMonty Taylor proposed zuul/zuul master: Remove duplicate variables for tox jobs  https://review.opendev.org/71249616:26
fbofungi: yes that looks better like that16:29
fungiafter all, the python package metadata (and/or documentation) are the places to declare what python versions zuul is supposed to work with16:30
*** jcapitao_afk is now known as jcapitao16:32
fungideclaring them in tox.ini assumes either 1. developers are going to have all those versions installed on their system to test with when running `tox` or 2. relies on them ignoring failures with errors about missing python interpreters or 3. requires a dangerous additional setting in tox.ini which can result in tox happily returning success when it didn't run explicitly requested testenvs16:32
tobiashzuul-maint: this saves about 5-8% of github api requests in our prod system: https://review.opendev.org/71098516:33
fungi#3 is especially a problem, since while it's fairly obvious for local invocations, it can result in slight misconfigurations of tox-using ci jobs to test nothing and return success (like if they're set to use an image which doesn't have the requested python version)16:34
*** jcapitao has quit IRC16:42
mordredtobiash: nice16:43
corvusShrews, mordred, tristanC, clarkb: i've taken my 3-node tls zk cluster and connected to it with a kazoo client16:45
corvusso i think i've manually worked out everything that's needed; i'll try to distill this into a modification of our test setup now16:45
mordredcorvus: neat16:46
Shrewsawesome16:48
*** jamesmcarthur has quit IRC16:49
*** jamesmcarthur has joined #zuul16:50
tobiashcool16:58
*** jamesmcarthur has quit IRC17:01
*** Defolos has quit IRC17:12
*** jamesmcarthur has joined #zuul17:15
openstackgerritMerged zuul/zuul master: Store build.error_detail in SQL  https://review.opendev.org/70985717:16
*** chandankumar is now known as raukadah17:18
*** jamesmcarthur has quit IRC17:23
*** jamesmcarthur has joined #zuul17:28
*** evrardjp has quit IRC17:35
*** evrardjp has joined #zuul17:35
*** bhavikdbavishi has quit IRC17:36
*** erbarr has joined #zuul17:36
*** jamesmcarthur has quit IRC17:37
*** jamesmcarthur has joined #zuul17:37
*** hashar has joined #zuul17:39
fungilooks like my proposed tox-py38 failed: https://zuul.opendev.org/t/zuul/build/29a876f8481449e3afd35657adb4618117:40
*** jamesmcarthur has quit IRC17:40
fungiOSError: [Errno 88] Socket operation on non-socket17:41
fungiin tests.unit.test_daemon.TestDaemon.test_daemon17:41
corvusneat; maybe that's a real problem?  is anyone using zuul under 3.8 yet?17:47
Shrewsclarkb: heh, looking into a solution for the need to use zk-shell to erase provider upload records, it seems i had already created such an option in the nodepool cli (the 'erase' command), only it doesn't cover uploads and deletes ALL zk data for a provider. I think modifying this to be a bit more flexible should do the trick.17:49
clarkb++17:49
*** jamesmcarthur has joined #zuul17:54
openstackgerritMerged zuul/zuul master: Cache getUser in Github connection  https://review.opendev.org/71098517:55
corvusmordred just triggered this error http://paste.openstack.org/show/790558/ with this patch https://review.opendev.org/71252517:56
corvuslooks like a zuul bug17:57
*** jpena is now known as jpena|off17:57
corvusi'd track it down, but i'm knee deep in the zk stuff17:57
corvusif anyone else wants to look into it, that'd be great17:57
clarkbI need to step away for a bit, but when I get back in an hour I can probably look if others aren't already17:59
Shrewsfor some reason, we cap python-daemon. i'd wager we need a newer version. but also looking at nodepool things now18:00
Shrewsquick scan of the change log references socket things in a newer version: https://pagure.io/python-daemon/blob/master/f/ChangeLog#_3418:02
Shrewsfungi: might want to uncap that and see if it helps or breaks worse18:02
Shrewshttps://pagure.io/python-daemon/issue/3418:03
Shrewsyep, that's it18:03
*** sshnaidm is now known as sshnaidm|afk18:04
mordredawesome18:12
*** y2kenny has quit IRC18:13
mordredShrews: I think we were capping python-daemon because it either did something bad in a newer version, or didn't work with newer python18:13
mordredShrews: the commit message is sparse on details18:13
mordredShrews: The latest python-deamon has broken zuul. Constrain zuul below18:13
mordred    the last release.18:13
mordredShrews: so yeah - maybe the answer is uncapping it, seeing what breaks and what is needed to fix it/18:14
Shrewsmordred: oh, i'm sure there was some reason for the cap. my personal 'git log' of my brain reveals no details either18:14
Shrewsmordred: the nodepool side of that change says:18:16
Shrews2.1.0 throws the exception:18:16
Shrews18:16
Shrews      daemon.daemon.DaemonOSEnvironmentError: Unable to change process owner ([Errno 1] Operation not permitted)18:16
Shrewsso... there's that18:16
mordredNEAT18:16
Shrewsmaybe that was a thing fixed in 2.1.1 and we never followed up18:17
mordredmaybe so?18:18
*** hashar_ has joined #zuul18:21
*** hashar has quit IRC18:21
*** hashar_ has quit IRC18:22
*** hashar has joined #zuul18:22
SpamapSwow ok I have a *really* weird thing, it's more git than zuul, but maybe y'all can help18:26
SpamapSI've been using git-crypt for the last 18 months on this repo, but this week we're migrating to SOPS18:27
SpamapSthis means removing the '.gitattributes' file that makes git-crypt function.18:27
SpamapSIn my local repo, a file shows plaintext. But in a zuul test node, when the branch is merged on top of master, it shows as the old version, still encrypted...18:28
clarkbis it possible that state is persisting in the merger repos?18:29
SpamapSgit log <thefile> shows the same commit for both, the one where the file was added, but it was recently moved and decrypted, so I'm also just not sure why it doesn't show a change in either checkout18:29
mordredSpamapS: I don't know off the top of my head (don't know much about git-crypt) - but could it have to do with zuul not having the ability to decrypt the file, so it can't properly merge it?18:30
SpamapSI also suspect something with rename wonkiness18:30
mordred(I'm assuming you've done things with the file over the last 18 months though)18:30
SpamapSOne thing, the file basically moved from being binary, to text18:30
SpamapSgit-crypt stores as binaries and then uses a filter to show you the text.18:31
SpamapSwhoa.. Ok, this is confusing18:32
SpamapShttp://paste.openstack.org/show/790561/18:33
SpamapSdafuq?18:33
mordredcorvus: the unknown config error was a red herring - it's the one-project-two-tenants issue - that was an error from the opendev tenant18:33
SpamapSOk..18:33
corvusmordred: ah, well, it's probably still a bug18:33
SpamapSgit-crypt did something wonky18:33
*** AJaeger has quit IRC18:34
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253118:34
mordredSpamapS: yeah - I definitely don't know enough about git-crypt to be useful there18:34
SpamapShiding the fact that this file was modified locally. I did `rm id_rsa_zuul.pub && git checkout id_rsa_zuul.pub` and the binary version was there18:34
SpamapSmordred: nobody does, that's why we're going to sops. ;)18:34
SpamapSwhich is super awesome btw.18:34
mordredcorvus: yes - but I feel like we tracked it down once before - and it's related to the fact that there are config errors for system-config in the opendev tenant due to missing repos already18:34
mordredcorvus: or something like that18:35
SpamapSfor in-repo secrets, sops is tha bomb18:35
SpamapSok I see now.. if you've had git-crypt in a repo, and you remove the .gitattributes files, you also need to remove some other junk in .git18:35
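The leftover state SpamapS is describing can be simulated and cleaned up like this (git-crypt normally installs smudge/clean filters in .git/config and caches keys under .git/git-crypt; `cat` stands in for the real git-crypt filter commands here):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo

# Simulate what git-crypt init leaves behind in .git/config: filters that
# keep rewriting file contents even after .gitattributes is removed.
git config filter.git-crypt.smudge cat
git config filter.git-crypt.clean cat
git config diff.git-crypt.textconv cat

# The cleanup, roughly: drop the filter config and any cached key material.
git config --remove-section filter.git-crypt
git config --remove-section diff.git-crypt
rm -rf .git/git-crypt

git config --get filter.git-crypt.smudge || echo "filter removed"
```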
mordredSpamapS: which zuul doesn't know to do in this case18:36
*** AJaeger has joined #zuul18:37
SpamapSmordred: yeah this was 0% zuul's fault, which I was pretty sure of. :)18:39
corvusmordred, Shrews, tristanC, clarkb, fungi: ^ 712531 is a script to manage a CA for issuing zk certs; i'll build on that for our tests, but also, i think it's something folks could use for production zk deployments too (we could just run that in opendev ansible)18:39
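The CA workflow such a script has to cover can be sketched with plain openssl (hostnames and lifetimes here are hypothetical; because ZK TLS requires a CA anyway, possession of a CA-signed cert doubles as authentication, as corvus notes later):

```shell
set -e
cd "$(mktemp -d)"

# 1. A self-signed CA for the ZooKeeper cluster.
openssl req -x509 -newkey rsa:2048 -days 365 -nodes \
    -keyout ca.key -out ca.crt -subj "/CN=zk-ca"

# 2. A key + CSR for one ZooKeeper server.
openssl req -newkey rsa:2048 -nodes \
    -keyout zk1.key -out zk1.csr -subj "/CN=zk1.example.com"

# 3. Sign the server cert with the CA.
openssl x509 -req -in zk1.csr -CA ca.crt -CAkey ca.key \
    -CAcreateserial -out zk1.crt -days 365

# The server cert should verify against the CA.
openssl verify -CAfile ca.crt zk1.crt
```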
SpamapSzuul actually didn't have the .git crud, but my local clone did.18:39
*** plaurin has quit IRC18:52
*** klindgren has quit IRC18:59
*** klindgren has joined #zuul19:04
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: Add options to CLI info command  https://review.opendev.org/71253919:08
openstackgerritDavid Shrewsbury proposed zuul/nodepool master: Add options to CLI info command  https://review.opendev.org/71253919:09
openstackgerritClark Boylan proposed zuul/zuul master: Don't access parent layout errors if there is no parent layout  https://review.opendev.org/71254419:32
clarkbcorvus: mordred ^ that is admittedly half a stab in the dark, but I think it covers the behavior shown by that traceback19:32
AJaegermordred: want to do the zuul_return trick on openstack-zuul-jobs and zuul-jobs as well?19:32
mordredAJaeger: yeah. I can follow up with them in just a few19:35
AJaegerthanks19:35
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file  https://review.opendev.org/71254719:46
*** jamesmcarthur has quit IRC19:48
*** hashar has quit IRC19:48
*** jamesmcarthur has joined #zuul19:49
*** jamesmcarthur has quit IRC19:49
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file  https://review.opendev.org/71254719:51
*** jamesmcarthur has joined #zuul19:59
fungiShrews: corvus: mordred: clearly the solution to build failures on my changes is to disappear for lunch and let the magic research elves find the problem for me ;)20:17
fungii'll add an uncap for python-daemon and see what happens20:17
fungiinterestingly it didn't break the equivalent nodepool change, but i'll uncap there too for consistency20:18
openstackgerritJeremy Stanley proposed zuul/nodepool master: Declare support for Python3.8  https://review.opendev.org/71249420:22
openstackgerritJeremy Stanley proposed zuul/zuul master: Declare support for Python3.8  https://review.opendev.org/71248920:22
*** saneax has quit IRC20:42
*** zenkuro has quit IRC20:48
*** armstrongs has joined #zuul21:00
*** armstrongs has quit IRC21:12
*** Defolos has joined #zuul21:17
*** saneax has joined #zuul21:19
*** saneax has quit IRC21:27
corvusclarkb: zookeeper in bionic is 3.4.10, and i think the tls stuff was added in 3.5.121:38
corvusso i think adding it to test-setup.sh and using it in tox tests may be problematic21:39
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Use a fake zuul_return and an .ansible-lint file  https://review.opendev.org/71254721:40
corvuswe could add it to the quick-start since that runs zk in a container (so at least we would be setting up zk in a recommended manner)21:40
mordredcorvus: does that mean we'll be looking at updating our zk deploy for zuul to be container-based for opendev for us to be able to roll out tls?21:42
corvuswe could also rethink whether it's actually required... SpamapS was encouraging us to avoid putting unencrypted secrets in zk (so they don't end up on disk in a checkpoint), so maybe it's not necessary?21:42
corvusmordred: i think that would be the most expedient thing, yes21:42
corvusi mean, i tend toward always thinking that adding tls and authentication is a good thing, but maybe it's not strictly necessary.21:42
mordredcorvus: yeah. I mean ... hrm21:43
corvusthe sasl auth uses digest-md5, so... technically it's probably okay to go over the wire plaintext.21:43
corvusamusingly, since the tls setup essentially requires a ca, it's also an effective auth system, so if you enable tls, it's reasonable to disable sasl21:44
mordredcorvus: I thought the current plan was to put secrets into zk to get them to the executor - do we have a replacement thought for that?21:45
corvusbut maybe with stuff like this, it's good to have a belt and suspenders.  even if you're wearing overalls.21:45
mordredyeah21:45
mordredlike - maybe it's good to have the TLS even if we don't put the unencrypted secrets into zk21:46
corvusmordred: i think the thought on secrets was that we would encrypt them in zk with a global key that is distributed to all zuul components out of band.21:46
mordrednod21:46
corvusi think the scheduler would decrypt the secret using the project key for a build, encrypt it using the shared global key, and stick that in zk.  then the executor would decrypt that with its copy of the shared global key21:47
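The flow corvus sketches, as a stdlib-only round-trip sketch. xor_cipher is a toy placeholder for a real symmetric cipher (e.g. AES/Fernet) -- never use XOR for real secrets; it is only here to make the round trip self-contained and testable:

```python
import os

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Placeholder symmetric cipher: XOR is its own inverse.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Shared global key, distributed to all Zuul components out of band.
global_key = os.urandom(32)

def scheduler_prepares(plaintext_secret: bytes) -> bytes:
    # Scheduler has already decrypted the secret with the project key;
    # it re-encrypts under the shared key before writing it to ZooKeeper,
    # so ZK (and its on-disk snapshots) only ever hold ciphertext.
    return xor_cipher(plaintext_secret, global_key)

def executor_reads(blob_from_zk: bytes) -> bytes:
    # Executor decrypts with its copy of the shared global key.
    return xor_cipher(blob_from_zk, global_key)

blob = scheduler_prepares(b"db_password=hunter2")
print(executor_reads(blob))  # → b'db_password=hunter2'
```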
corvusclarkb, fungi, Shrews, SpamapS: ^ if you have any thoughts on the merits of tls with zk, they would be welcome21:55
corvusi'll work on docs for this to at least capture what i've learned.  then we can look at next steps.21:56
*** sreejithp has quit IRC22:01
fungiyep, i'm just trying to get it straight in my noggin22:02
clarkbwhat does auth look like if not tls'd? other thoughts: we could deploy from zk's upstream tarball. I use it locally and it has worked well for me22:06
corvusclarkb: i'm mostly concerned with tests here (since that was the concern you brought up).  our tests run on bionic now and use the bionic package.  we could have the tests run with the upstream tarball, and if we did so, it would be easier to have the test runner do so without special system-wide configuration.22:07
corvus(i want to say we actually did something like that at some point)22:07
fungishared keys are also risky if they have an effective lifespan of less than the secrets they're protecting... scenario is that a nefarious actor snags the encrypted secret, then later decrypts it with the leaked shared key. basically you have to consider any leakage of the shared key to also be a leakage of anything it ever encrypted which could have been leaked22:09
fungiit's possible that simply distributing the zuul scheduler's asymmetric encryption key to the executors is equally safe and simpler to design22:10
corvusfungi: well, there's an unbounded number of those22:11
*** dpawlik has quit IRC22:11
fungilook at it this way: if you have the scheduler asymmetric key and some shared symmetric key both of which are needed to protect the secrets being encrypted, then that's two possible pieces of information either of which when leaked could compromise the integrity of the encrypted data22:11
corvus(no one has yet managed to figure out how to use tobiash's single-key system with a supported python encryption lib)22:12
*** rlandy is now known as rlandy|bbl22:12
fungioh, good point, the asym keys are one per repo22:12
fungiso you'd need to leak all of them (or at least the ones for the data you wanted to decrypt)22:12
corvusi see it this way: the asym keys exist to allow people to put secrets into git repos.  the sym key would exist to protect zuul's working memory.  zuul's working memory can currently be compromised by compromising any zuul host (by connecting to gearman and accepting jobs).  that would not change with the proposed sym key.  it would protect it from a compromise of the zk hosts (or the disks of the zk hosts)22:15
fungialso the gains of giving each executor its own asym keypair and then having the scheduler reencrypt to the relevant executor's key aren't that great... basically you reduce the risk of leaking an infrequently-used credential in environments with a large number of executors22:17
fungiso on balance, a shared system-wide symmetric key is safer than not encrypting the data being stashed in zk. how safe comes down to the mechanisms used to distribute and protect that shared key22:18
fungicurrently the model is scheduler decrypts the ciphertext and then injects the plaintext into zk where it's available to all executors (and possibly any connected zk client if there are no acls), right?22:20
fungiso to SpamapS's point, also exposed in any on-disk snapshots zk records for recovery purposes22:21
corvusfungi: currently it's in gearman in plain text22:21
fungioh, right, gearman22:21
corvuswhich means it's not written to disk, which means it's not tripping any compliance alarms22:21
fungiand this is about the zk equivalent for distributed scheduler22:21
corvusyep22:21
corvusnaively we would just do the same thing, but since it's zk, snapshots will be written to disk, which means it's a new potential channel for leaking plaintext22:22
fungithough if the system where the geard is running is under memory pressure and doesn't use encrypted swap (still not all that common for virtual machines today)...22:22
fungipresumably the same could be said of the scheduler if its allocations containing plaintext are paged out to swapspace?22:23
corvusfungi: yes -- we can probably assume for sake of argument that an org that is worried about that is either using encrypted swap.  or no swap.  :)22:23
corvusat any rate, that's outside our control22:24
corvusi think "please don't write out unencrypted secrets to disk" is a reasonable request22:25
fungilooking at this from the opposite direction... should we use a shared key to encrypt everything zuul stashes in zk?22:25
funginot just secrets?22:26
corvusfungi: for extra validation?22:26
clarkbZuulians may be interested in http://zuul.openstack.org/build/18c85ad49ca5491291e7a8c2dc703fc5 we are trying to publish website activity stats that don't disclose any personal info but can be used to fix broken links that result in 404s, improve pages that are popular, and otherwise generally give people editing zuul-website some insight into where efforts can be best spent22:26
clarkbif you click on the Goaccess report there you'll get some data22:26
fungiit gets you several benefits. one is that if you are forced to decrypt the data before processing it, you may be able to protect a majority of your code paths from exploitation by maliciously constructed data which was injected by someone without access to the key22:27
clarkbinteresting data so far: the webm demo video represents the bulk of our traffic (no surprise)22:28
fungithis is the primary reason openvpn adds a shared "tls key" for clients22:28
clarkbwe probably want to add a robots.txt22:28
fungibasically if a set of data doesn't decrypt to something relevant using the shared key, you discard it immediately with prejudice22:29
fungibut it also means you don't have to treat secrets and non-secrets differently when it comes to putting them in and retrieving them from zk22:30
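[The "discard with prejudice" idea fungi describes — rejecting any znode payload that doesn't decrypt under the shared key — falls out for free from an authenticated encryption scheme. A sketch with the same hypothetical Fernet setup as above: a writer without the shared key cannot produce a payload the readers will accept.]

```python
# Hypothetical sketch: authenticated decryption rejects injected payloads.
from cryptography.fernet import Fernet, InvalidToken

shared = Fernet(Fernet.generate_key())  # key held by trusted zuul components
rogue = Fernet(Fernet.generate_key())   # an attacker with zk access but no key

payload = rogue.encrypt(b"injected data")  # written to a znode maliciously
try:
    shared.decrypt(payload)
    accepted = True
except InvalidToken:
    accepted = False  # discarded immediately, with prejudice
assert accepted is False
```

[Because the HMAC check happens before any of the ciphertext is interpreted, this also limits the "malicious ciphertext compromises your decryption routines" exposure fungi raises below.]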
corvusfungi: that seems to have a lot of overlap with using tls with verification (since you would need one of the client certs to talk to the zk cluster anyway, and they're probably sitting right next to the symmetric key on the same hosts)22:31
fungion the other hand, depending on how resistant your selected symmetric encryption algorithm is to things like chosen plaintext oracles, it may mean you're exposing the key more22:31
corvusfungi: which, to me suggests a dual sided argument: if we encrypt everything with a symmetric key, we could dispense with tls22:31
*** Shrews has quit IRC22:31
*** tributarian has quit IRC22:31
*** gundalow has quit IRC22:31
*** jtanner has quit IRC22:31
*** stevthedev has quit IRC22:31
*** dustinc has quit IRC22:31
*** portdirect has quit IRC22:31
*** jbryce has quit IRC22:31
fungiyes, i think that if we need peer-to-peer encryption anyway (to avoid data at rest disclosure) then transport layer security may be redundant22:33
*** jbryce has joined #zuul22:34
*** portdirect has joined #zuul22:34
*** jtanner has joined #zuul22:34
*** jamesmcarthur has quit IRC22:34
clarkbexcept for authentication (maybe md5sums possibly good enough)22:34
*** gundalow has joined #zuul22:35
fungiwell, if you ignore the possibility that malicious ciphertext could compromise your decryption routines, you can infer authenticity from whether or not the payload decrypts to something usable22:35
fungipossession of the shared key authenticates your payload as having originated from a system which was trusted with a copy of the key22:36
fungithough of course it doesn't identify a particular system22:36
*** tributarian has joined #zuul22:36
*** stevthedev has joined #zuul22:36
*** guilhermesp has quit IRC22:36
*** clayg has quit IRC22:36
*** mnaser has quit IRC22:36
*** gmann has quit IRC22:36
*** lseki has quit IRC22:36
*** donnyd has quit IRC22:36
*** erbarr has quit IRC22:36
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:37
corvusthere are still other malicious acts that could be done without worrying about ciphertext -- locking or deleting znodes22:38
fungiright, i was only considering the peer interactions. not the client/server interactions22:39
clarkbor filling the disk22:39
*** mnaser has joined #zuul22:39
corvusso at least one other system (tls, sasl or firewall) should be used.  (firewall is our only current option and is what we use)22:40
fungisecuring communication of zuul data between zuul daemons is still potentially distinct from securing zuul daemon interactions with zk itself22:40
*** Shrews has joined #zuul22:40
*** dustinc has joined #zuul22:40
*** clayg has joined #zuul22:40
*** gmann has joined #zuul22:41
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:41
*** guilhermesp has joined #zuul22:41
fungibut it may mean that, if the generic zk interactions like manipulating znodes do not represent a leaky side channel then you could get by simply securing them with authentication and not need encryption for them22:41
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:41
corvusShrews, clarkb, fungi, mordred: if you want to look at  https://review.opendev.org/712531 i added a 'howto' for zk which covers sasl auth and tls -- so that should give you an idea of what's involved22:42
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:43
*** lseki has joined #zuul22:43
*** donnyd has joined #zuul22:44
*** erbarr has joined #zuul22:45
clarkbcorvus: left a couple notes on ps422:45
*** tdasilva has quit IRC22:46
*** evgenyl has quit IRC22:46
clarkbit doesn't seem that bad, the worst part might be getting a new enough zk to support tls?22:47
clarkb(the docs in particular make it approachable I think)22:47
*** irclogbot_2 has quit IRC22:47
*** evgenyl has joined #zuul22:47
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:48
*** openstackstatus has quit IRC22:48
corvusclarkb: yeah, and, i like to think, a shell script that runs all those crazy keytool commands22:48
*** irclogbot_3 has joined #zuul22:49
openstackgerritJames E. Blair proposed zuul/zuul master: WIP: ZK TLS  https://review.opendev.org/71253122:49
*** zbr has quit IRC22:50
*** wxy-xiyuan has quit IRC22:50
*** ChrisShort has quit IRC22:50
*** samccann has quit IRC22:50
*** mnasiadka has quit IRC22:50
clarkbas a side note for opendev and zuul upstream development. It looks like Gerrit is planning to rely on zookeeper for its HA solution. For administration of the code review system I half expect we'll end up wanting zk anyway eventually22:50
clarkbunderstanding how to run it better is a good thing in general I expect22:51
*** tdasilva has joined #zuul22:51
corvusi think i'm leaning toward continuing to add support for both sasl and tls, and maybe don't use tls in the unit tests for now (it doesn't affect the code base very much -- certainly not as much as sasl with the acl stuff, and it would necessitate a major overhaul to how we run zk in tests).  but maybe look at adding tls to the quickstart, both for testing and as a better example.22:53
fungisounds reasonable22:54
*** ChrisShort has joined #zuul22:54
*** mnasiadka has joined #zuul22:54
fungiand represents a good defense-in-depth posture22:54
*** zbr has joined #zuul22:55
corvusso next step is to build on 712531 with actually adding the tls arguments to zuul, then the same to nodepool, then we can add it to the quickstart22:55
clarkbya I think covering the case in quickstart should be sufficient for knowing that our tls support works22:55
*** wxy-xiyuan has joined #zuul22:55
clarkb(don't need it for every unittest)22:56
*** iamweswilson has quit IRC22:56
*** dcastellani has quit IRC22:56
*** webknjaz has quit IRC22:56
*** samccann has joined #zuul22:57
*** dcastellani has joined #zuul23:01
*** webknjaz has joined #zuul23:01
*** iamweswilson has joined #zuul23:02
*** ianychoi has joined #zuul23:06
*** Goneri has quit IRC23:15
*** Defolos has quit IRC23:25
*** Defolos has joined #zuul23:25
*** tosky has quit IRC23:44
*** armstrongs has joined #zuul23:45
*** armstrongs has quit IRC23:52
openstackgerritMerged zuul/nodepool master: Install zypper on the nodepool-builder image  https://review.opendev.org/71217723:53
SpamapSfungi,corvus: TLS should still be used, as we tend to want to operate in a cloud "0-trust" network, where it's conceivable that an IP may be stolen by a rogue actor, even on an internal network. The digest auth would be vulnerable to MITM.23:54
SpamapSI am intrigued by the thought of just requiring client certs signed by CA, and dropping the SASL.23:55

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!