*** rfolco has quit IRC | 00:22 | |
*** rfolco has joined #zuul | 00:22 | |
*** toabctl has quit IRC | 00:24 | |
*** rfolco has quit IRC | 00:24 | |
*** rfolco has joined #zuul | 00:24 | |
*** rfolco has quit IRC | 00:25 | |
*** rfolco has joined #zuul | 00:25 | |
*** toabctl has joined #zuul | 00:27 | |
*** tosky has quit IRC | 00:28 | |
*** rfolco has quit IRC | 00:28 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add OpenShift SCC and functional test https://review.opendev.org/702758 | 00:28 |
---|---|---|
*** rfolco has joined #zuul | 00:29 | |
*** rfolco has quit IRC | 00:30 | |
*** rfolco has joined #zuul | 00:31 | |
*** rfolco has quit IRC | 00:33 | |
*** rfolco has joined #zuul | 00:33 | |
*** fdegir has quit IRC | 00:35 | |
*** fdegir has joined #zuul | 00:36 | |
*** rfolco has quit IRC | 00:38 | |
*** mattw4 has quit IRC | 00:38 | |
tristanC | ^ finally passed ci for both minikube and openshift | 00:57 |
*** jamesmcarthur has joined #zuul | 00:57 | |
*** jamesmcarthur has quit IRC | 01:17 | |
*** igordc has quit IRC | 01:43 | |
*** jamesmcarthur has joined #zuul | 03:16 | |
*** jamesmcarthur has quit IRC | 03:27 | |
*** jamesmcarthur has joined #zuul | 03:27 | |
*** rlandy has quit IRC | 04:03 | |
*** jamesmcarthur has quit IRC | 04:14 | |
*** jamesmcarthur has joined #zuul | 04:33 | |
*** jamesmcarthur has quit IRC | 04:41 | |
*** raukadah is now known as chkumar|rover | 05:17 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #zuul | 05:33 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Add spec for scale out scheduler https://review.opendev.org/621479 | 05:51 |
*** bolg has joined #zuul | 05:53 | |
bolg | corvus, tobiash: just made another iteration on https://review.opendev.org/c/621479 | 05:53 |
*** swest has joined #zuul | 06:14 | |
mnaser | corvus: +1'd from me, and +2 from tristanC so i think it's probably fine if you merge it | 06:15 |
mnaser | corvus: perhaps we can setup an acl for zuul-helm-maintainers as a set that includes zuul-maint and those interested in maintaining those charts? (hi!) | 06:16 |
*** bolg has quit IRC | 06:38 | |
*** persia has quit IRC | 07:06 | |
*** persia has joined #zuul | 07:08 | |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Update upload-afs README https://review.opendev.org/705166 | 07:35 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Fix github app authentication to work with checks API endpoints https://review.opendev.org/705167 | 07:37 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement github checks API https://review.opendev.org/705168 | 07:37 |
openstackgerrit | Merged zuul/zuul-helm master: Set ClusterIP to None for executor logs https://review.opendev.org/705132 | 07:44 |
openstackgerrit | Simon Westphahl proposed zuul/zuul master: wip: Add support for tracing using OpenTelemetry https://review.opendev.org/705170 | 07:47 |
*** bolg has joined #zuul | 07:52 | |
*** tosky has joined #zuul | 08:23 | |
*** avass has joined #zuul | 08:35 | |
*** hashar has joined #zuul | 08:45 | |
*** bhavikdbavishi has joined #zuul | 08:48 | |
*** flaper87 has quit IRC | 09:19 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul master: Add build history link to summary https://review.opendev.org/705049 | 09:47 |
*** bhavikdbavishi has quit IRC | 10:05 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement github checks API https://review.opendev.org/705168 | 10:17 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add foreground option https://review.opendev.org/635649 | 10:21 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add foreground option https://review.opendev.org/635649 | 10:26 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Deprecate -d switch for running in foreground https://review.opendev.org/705185 | 10:34 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't enforce forefround with -d switch https://review.opendev.org/705189 | 10:38 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Don't enforce foreground with -d switch https://review.opendev.org/705189 | 10:54 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 11:16 |
*** hashar has quit IRC | 11:30 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Make reporting asynchronous https://review.opendev.org/691253 | 11:44 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Make direct-push configurable on project-level https://review.opendev.org/677109 | 11:45 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Implement push job in merger https://review.opendev.org/677110 | 11:45 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Push changes in GerritReporter if direct-push is enabled https://review.opendev.org/677111 | 11:45 |
*** hashar has joined #zuul | 11:52 | |
*** felixedel has joined #zuul | 12:06 | |
*** rfolco has joined #zuul | 12:10 | |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Report retried builds in a build set via mqtt. https://review.opendev.org/632727 | 12:14 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Report retried builds via sql reporter. https://review.opendev.org/633501 | 12:14 |
openstackgerrit | Felix Schmidt proposed zuul/zuul master: Store information about gate resets in MQTT and SQL reporter https://review.opendev.org/696670 | 12:14 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 12:32 |
*** hashar has quit IRC | 12:40 | |
*** hashar has joined #zuul | 12:44 | |
*** hashar has quit IRC | 12:55 | |
*** hashar has joined #zuul | 12:55 | |
*** rlandy has joined #zuul | 13:08 | |
*** jamesmcarthur has joined #zuul | 13:22 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 13:23 |
*** felixedel has quit IRC | 13:26 | |
*** bolg has quit IRC | 13:28 | |
*** jamesmcarthur has quit IRC | 13:35 | |
*** jamesmcarthur has joined #zuul | 13:37 | |
*** jamesmcarthur has quit IRC | 13:42 | |
zbr | corvus: does "build history" look ok on https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0ff/705049/2/check/zuul-build-dashboard/0ff3ac3/npm/html/build/2e6d36faf2114b749ddcfdc30e35cc05 ? | 14:06 |
zbr | is too close? should I put a long-dash in between maybe? | 14:06 |
*** jamesmcarthur has joined #zuul | 14:06 | |
*** jamesmcarthur has quit IRC | 14:13 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 14:26 |
*** zxiiro has joined #zuul | 14:35 | |
openstackgerrit | Antoine Musso proposed zuul/zuul master: web: humanize time durations https://review.opendev.org/705120 | 14:50 |
hashar | tristanC: hello, I have updated my patch to format duration on the web pages :] https://review.opendev.org/#/c/705120/ | 14:52 |
hashar | used https://www.npmjs.com/package/moment-duration-format which does exactly what I wanted \o/ | 14:52 |
corvus | tristanC, mnaser, tobiash: i could use your help in understanding the user situation on the executor container. as it stands now, i don't understand how anyone has ever gotten this working in k8s. :) let me explain my confusion: | 14:52 |
hashar | tristanC: no rush though since I am heading out soonish. But I will be back around in two hours | 14:52 |
corvus | tristanC, mnaser, tobiash: it looks like the current zuul-executor images don't specify a user, therefore they run as root. that means most things conveniently work (everything is writable!). but if you run "rsync -a" inside bwrap as root, rsync will try to change directory ownership, but bwrap will disallow that. the fetch-output role does this and therefore fails. | 14:53 |
corvus | tristanC, mnaser, tobiash: but merely setting a runAsUser on the container doesn't work, because the user isn't defined in the container and therefore defaults to a home directory of /, which is not writeable, and therefore ansible won't run at all (among other problems). | 14:54 |
tristanC | corvus: isn't that fixed by https://review.opendev.org/650246 ? I needed that to make zuul work in openshift | 14:55 |
corvus | tristanC, mnaser, tobiash: it looks like in order for z-e to run in k8s at all, we need a proper user created in the container (with a writeable homedir for ansible) | 14:55 |
tobiash | corvus: our executor runs as user 1000 but in a privileged container | 14:55 |
tobiash | but we haven't migrated to the official images yet | 14:56 |
mnaser | mine.. works.. somehow? | 14:56 |
mnaser | i def have jobs working with zuul-executor under k8s using cri-o | 14:56 |
mnaser | and using those helm charts and the upstream images | 14:56 |
corvus | tristanC: yes, i think that would -- except, do we need a writeable home directory too? i just see 246 creating a passwd entry | 14:57 |
corvus | tobiash: i think i'm in a privileged container too (i believe that's default in gke) -- do you create a user and homedir in your image build? | 14:57 |
corvus | mnaser: are you perhaps not using the fetch-output role? | 14:57 |
tobiash | let me dig up our k8s spec | 14:57 |
*** jamesmcarthur has joined #zuul | 14:58 | |
corvus | mnaser: (or even any role that runs synchronize to pull stuff back to the executor?) | 14:58 |
mnaser | corvus: isn't that pretty much a requirement to be able to push logs up? | 14:58 |
mnaser | we use swift to publish logs | 14:58 |
corvus | mnaser: more or less, yes... | 14:58 |
tobiash | yes, we create a user during image build | 14:58 |
tobiash | with a pre-defined uid | 14:58 |
corvus | tobiash: like with useradd/adduser? so it gets a normal homedir? | 14:58 |
tobiash | corvus: http://paste.openstack.org/show/789009/ | 14:59 |
mnaser | im looking at a build on jan 28 on the zuul we run using those charts with logs on swift | 15:00 |
tobiash | and boot.sh does some git settings (could probably go to the image build as well) | 15:00 |
tristanC | corvus: https://review.opendev.org/#/c/650246/6/tools/uid_entrypoint.sh sets up the user with HOME, and the operator sets it like so: https://review.opendev.org/#/c/702106/17/conf/zuul/resources.dhall@591 | 15:00 |
corvus | mnaser: https://gerrit-zuul.inaugust.com/t/gerrit/build/7f9e1bcd84bc498d8e61c78da0bccd28/console is the error i get (and i was able to repro by running the command as root in bwrap) | 15:01 |
corvus | tristanC: what's $HOME in that? | 15:01 |
corvus | oh i see | 15:02 |
corvus | /var/lib/zuul | 15:02 |
mnaser | i can share a build in private that works just fine | 15:02 |
corvus | ok, so 246 does end up with a writeable homedir because of "volume /var/lib/zuul" | 15:02 |
tristanC | corvus: exactly | 15:03 |
corvus | tristanC: cool, i think that would address all the problems i'm seeing... | 15:03 |
tobiash | corvus: to be more precise, our executor container starts boot.sh as root and privileged and this at the end does this: | 15:03 |
corvus | now i'm just really curious why it's working for mnaser | 15:03 |
tobiash | exec runuser -u zuul -g zuul -- /sbin/tini -g -s -- /usr/bin/eatmydata -- /opt/zuul/bin/zuul-executor -d | 15:03 |
tristanC | corvus: note that this is documented in https://docs.okd.io/latest/creating_images/guidelines.html | 15:04 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: add support for arbritary uid https://review.opendev.org/650246 | 15:05 |
mnaser | corvus: im happy to help investigate certain things inside the container if you'd like | 15:06 |
corvus | mnaser: i'm wondering if something about the default security policy is different, but i don't yet know how to investigate that | 15:07 |
mnaser | corvus: thats my guess, it probably has to do with the k8s that you get from gke vs what we deploy (via kubeadm) | 15:08 |
*** hashar has quit IRC | 15:09 | |
corvus | (wow there's some real garbage on the internet about this) | 15:10 |
*** jamesmcarthur has quit IRC | 15:11 | |
*** jamesmcarthur has joined #zuul | 15:12 | |
*** zbr has quit IRC | 15:15 | |
*** jamesmcarthur has quit IRC | 15:17 | |
corvus | tristanC: looks like a legit quick-start error on that change: https://zuul.opendev.org/t/zuul/build/2560c9f640e0472ebbf663c81bd401ad/log/container_logs/gerritconfig.log | 15:19 |
tristanC | corvus: oh thanks for the link, i'll fix that then | 15:19 |
corvus | tristanC: i guess the gerritconfig container needs a volume? | 15:20 |
corvus | tristanC: also left an inline note about this: https://zuul.opendev.org/t/zuul/build/2560c9f640e0472ebbf663c81bd401ad/log/container_logs/gerritconfig.log#1 | 15:20 |
corvus | mnaser: i don't know what to investigate. but i'll let you know if i think of something. | 15:22 |
mnaser | corvus: cool, happy to run any commands to check permissions of any specific folders/paths/etc | 15:22 |
*** zbr has joined #zuul | 15:23 | |
*** bhavikdbavishi has joined #zuul | 15:25 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: add support for arbritary uid https://review.opendev.org/650246 | 15:25 |
*** jamesmcarthur has joined #zuul | 15:27 | |
clarkb | do we need rsync -a if the end result is a swift upload? | 15:28 |
clarkb | would not running with -a simplify this enough to be happy in a container as is? | 15:28 |
corvus | clarkb: no, we could turn off the perms settings, but it's easier/better to just avoid running as root i think :) | 15:29 |
corvus | that was "no" to the first question and "yes" to the second :) | 15:29 |
clarkb | also why is root different than another user? or is the idea that the ownership would match remote so no chowning with nonroot? | 15:30 |
*** pcaruana has quit IRC | 15:30 | |
*** Goneri has joined #zuul | 15:32 | |
*** jamesmcarthur has quit IRC | 15:32 | |
tristanC | hm, that whoami command doesn't work | 15:32 |
tristanC | well it's always failing because of the incorrect redirection, resulting in the user being created... | 15:33 |
corvus | clarkb: well if it's running as non-root, i don't think it fails on trying to chown | 15:33 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 15:33 |
corvus | tristanC: yes, so it's failing toward working in the usual case. but would do the wrong thing if the user did exist. | 15:33 |
sugaar | hi guys | 15:35 |
sugaar | in the gerrit deployment, did you include ansible-playbook yourselves? | 15:35 |
sugaar | because it seems like my container can't find it under the path written in docker-compose | 15:35 |
corvus | sugaar: which container? can you share the error? | 15:36 |
sugaar | corvus bash-4.2$ /usr/local/lib/zuul/ansible/2.8/bin/ansible-playbook /var/playbooks/setup.yaml | 15:37 |
sugaar | bash: /usr/local/lib/zuul/ansible/2.8/bin/ansible-playbook: No such file or directory | 15:37 |
sugaar | that is inside gerrit | 15:37 |
*** ssbarnea has left #zuul | 15:38 | |
tristanC | corvus: good thing the user does not exist then :) what do you think of using `[[ `whoami` =~ ^[a-zA-Z]+ ]] || ...` ? | 15:39 |
sugaar | corvus and here the file, if you wanna have a look https://gitlab.com/celduin/infrastructure/celduin-infra/blob/39a877851b185f101d0c433e2887db22eec7f65e/zuul-deployment/gerrit.yaml | 15:40 |
corvus | tristanC: whoami works in my testing | 15:40 |
tristanC | corvus: you mean it exit(1) if the user is missing? | 15:41 |
corvus | tristanC: yes; 1 sec | 15:41 |
tobiash | sugaar: is that a fork of the quickstart docker-compose definition? | 15:41 |
sugaar | sort of yes | 15:41 |
sugaar | I am trying to make it in k8s so later on I can tune it to our needs | 15:41 |
corvus | tristanC: http://paste.openstack.org/show/789012/ | 15:41 |
sugaar | but first I want to get a basic version of it working | 15:42 |
pabelanger | corvus: when this issue last came up for sean-k-mooney, https://review.opendev.org/633796/ was the proposed fix to devstack. But long term, was to have same user on remote node / executor to help keep rsync happy | 15:42 |
tobiash | sugaar: the ansible-playbook path you posted should be like this in the executor image | 15:42 |
tristanC | corvus: ha, my bad, i was using an image already patched with the uid_entrypoint | 15:42 |
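The `whoami` check discussed above comes down to asking whether the current uid has a passwd entry. A minimal Python sketch of the same test follows. It assumes nothing about the actual uid_entrypoint script; `passwd_status`, `home_is_writable` and the `lookup` parameter are hypothetical names used only for illustration, and the `/` fallback mirrors the behaviour corvus describes rather than anything the `pwd` module provides.

```python
import os
import pwd


def passwd_status(uid, lookup=pwd.getpwuid):
    """Return (name, home) for uid, or (None, "/") if uid has no passwd entry.

    The "/" fallback is an assumption taken from the discussion above; it is
    not something the pwd module returns.
    """
    try:
        entry = lookup(uid)
    except KeyError:
        # pwd.getpwuid raises KeyError when the uid is not in the passwd database.
        return None, "/"
    return entry.pw_name, entry.pw_dir


def home_is_writable(uid, lookup=pwd.getpwuid):
    """True if the home directory resolved for uid is writable by this process."""
    _, home = passwd_status(uid, lookup)
    return os.access(home, os.W_OK)


if __name__ == "__main__":
    uid = os.getuid()
    name, home = passwd_status(uid)
    print(f"uid={uid} name={name} home={home} writable={home_is_writable(uid)}")
```

Run inside a container started with an arbitrary uid, a `None` name together with a non-writable home would match the situation described for ansible earlier in the log.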
pabelanger | http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2019-01-29.log.html#t2019-01-29T16:05:03 was last time chown error came up | 15:43 |
tobiash | sugaar: you've probably taken that from the gerritconfig container of the quick start which uses the zuul-executor image and not the gerrit image | 15:43 |
sugaar | oh that is true | 15:43 |
pabelanger | well, one of the last times | 15:43 |
sugaar | my bad | 15:43 |
sugaar | thanks tobiash | 15:43 |
tobiash | no problem | 15:43 |
corvus | pabelanger, clarkb: hrm, maybe we should start adding user/group: no to sync tasks? | 15:44 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: uid_entrypoint: fix redirection error https://review.opendev.org/705239 | 15:44 |
*** bhavikdbavishi1 has joined #zuul | 15:44 | |
*** jamesmcarthur has joined #zuul | 15:45 | |
tobiash | corvus: I think that's a good idea anyway because setting wrong uid on files might block the deletion of the job dir | 15:45 |
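The suggestion above amounts to keeping rsync's archive behaviour while skipping ownership preservation. As a rough sketch of the underlying rsync options (not the fetch-output role itself, which goes through Ansible's synchronize module): `-a` is shorthand for `-rlptgoD`, so the owner and group parts can be switched off again with `--no-owner --no-group`. The helper name `rsync_pull_args` and the example paths are made up for illustration.

```python
def rsync_pull_args(src, dest, preserve_owner=False):
    """Build an rsync argv that copies src to dest.

    -a expands to -rlptgoD; unless preserve_owner is set, the o (owner) and
    g (group) parts are disabled so rsync does not try to preserve ownership.
    """
    args = ["rsync", "-a"]
    if not preserve_owner:
        args += ["--no-owner", "--no-group"]
    return args + [src, dest]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(rsync_pull_args("node:/home/zuul/zuul-output/", "/tmp/work/")))
```

In a synchronize task the corresponding knobs are the `owner` and `group` parameters, which is presumably what "user/group: no" refers to.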
corvus | tristanC: wait, we don't need to share the /var/lib/zuul volumes | 15:45 |
*** bhavikdbavishi has quit IRC | 15:46 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 15:46 | |
corvus | tristanC: i think we just need to add it to the gerritconfig container | 15:46 |
*** ofosos has joined #zuul | 15:46 | |
corvus | tristanC: but wait -- gerritconfig is zuul-executor; it should have a volume | 15:47 |
openstackgerrit | Tristan Cacqueray proposed zuul/nodepool master: uid_entrypoint: fix redirection error https://review.opendev.org/705241 | 15:47 |
tristanC | corvus: yeah, i don't get why docker-compose didn't use the VOLUME defined in the zuul image... | 15:47 |
corvus | tristanC: yeah, i'm not sure i understand the original error anymore | 15:48 |
*** pcaruana has joined #zuul | 15:49 | |
pabelanger | It isn't clear for me where to look at google zuul configuration for zuul / zuul-executors, could somebody please share | 15:52 |
*** felixedel has joined #zuul | 15:53 | |
corvus | pabelanger: it's still mostly on my workstation until i can get something that actually works. https://gerrit-review.googlesource.com/c/zuul/ops/+/252316 is close though. | 15:53 |
pabelanger | thanks! | 15:55 |
corvus | tristanC: are we sure that a VOLUME entry will actually have the right perms? maybe it needs to be chowned? | 15:56 |
*** jamesmcarthur_ has joined #zuul | 15:58 | |
pabelanger | corvus: if I understand, zuul-config volume, will contain the zuul.conf file? That isn't generated by some template, but something you'd created (as secret)? | 15:59 |
pabelanger | was trying to see what the trusted_rw_dirs / trusted_rw_paths were set to | 16:00 |
corvus | pabelanger: correct, that's entirely local right now because it has the mysql password in it. once i have a working system, i can generate it with zuul secrets. | 16:01 |
*** jamesmcarthur has quit IRC | 16:01 | |
*** chkumar|rover is now known as raukadah | 16:02 | |
pabelanger | corvus: ack, np. | 16:02 |
corvus | pabelanger: the only trusted_ option i have set is trusted_ro_paths=/authdaemon/token | 16:02 |
corvus | pabelanger: that's just for a special google service account token thing | 16:02 |
felixedel | @zuul-maintainers We have three patches https://review.opendev.org/#/q/topic:bloeffler-report-reties+(status:open+OR+status:merged) that are pending for a long time now and I would like to ask if you could provide some feedback :) They are all about providing additional information in the MQTT reporter which we are using to get some statistics about | 16:03 |
felixedel | the builds running in our Zuul CI instance. The first patch https://review.opendev.org/#/c/632727/16 is IMHO consistent as is and could be integrated. For the second one I did some adaptions to incorporate the requested changes (store the retry_builds in a different table than the normal builds), but at second glance this doesn't quite look right - as | 16:03 |
felixedel | both contain the same data structure in the end. Please see my comment on patchset 14 of https://review.opendev.org/#/c/633501/. I still kept the change to get your opinion about this. If we decide to go this way and store the retry_builds in a different table, I will adapt the child change accordingly (which is about storing gate resets in the same | 16:03 |
felixedel | manner). | 16:03 |
pabelanger | corvus: where are the logs going, some volume in k8s? | 16:03 |
corvus | pabelanger: no, uploaded to gcs | 16:03 |
pabelanger | corvus: do you have default_username=root in zuul.conf / executor section? | 16:11 |
corvus | pabelanger: no; nodepool provides the username of zuul | 16:12 |
pabelanger | corvus: ack, and zuul-executor is running as root? Or as zuul user | 16:13 |
corvus | pabelanger: as root (that's the discussion earlier) | 16:14 |
pabelanger | k, | 16:14 |
pabelanger | I think we could reproduce this via quickstart | 16:14 |
pabelanger | today, quickstart looks to be using root user to log into container jobs | 16:14 |
*** felixedel has quit IRC | 16:16 | |
pabelanger | so far, in the past we've had 2 options, keep executor user / nodepool user in sync (uid/gid). Or relax rsync to not preserve permissions | 16:17 |
pabelanger | I don't believe, at job runtime, we'd be able to create the user in bwrap? | 16:18 |
corvus | pabelanger: yes we can; we currently create the user that zuul is running as | 16:18 |
pabelanger | ah | 16:19 |
pabelanger | so, maybe we need to update https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/fetch-output/tasks/main.yaml#L11 to also try to set uid / gid when we create directories. | 16:19 |
corvus | we could change that, but i think it could complicate things, especially if you wanted the bwrap user to be able to write to the executor (ie, a trusted job) | 16:19 |
corvus | pabelanger: that would not affect any subdirs that were copied over | 16:20 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Authorization rules: add templating https://review.opendev.org/705193 | 16:26 |
corvus | tristanC: i don't understand how /etc/passwd is writable | 16:28 |
corvus | tristanC: surely uid_entrypoint runs as the container user | 16:29 |
tristanC | corvus: https://review.opendev.org/#/c/650246/8/Dockerfile@45 | 16:30 |
corvus | tristanC: doesn't that just make the group permission bits of /etc/passwd the same as user? but both user and group are root? | 16:31 |
tristanC | corvus: yes, and iirc k8s gid are set to 0 | 16:33 |
corvus | tristanC: well, isn't this whole exercise supposed to be about not running as root? | 16:34 |
corvus | i think running with gid is nearly as bad | 16:34 |
tristanC | container process gid are set to 0 when running in k8s | 16:34 |
corvus | with gid 0 | 16:34 |
corvus | tristanC: yes, container process uid are also set to 0 | 16:34 |
tristanC | corvus: iiuc, gid 0 doesn't give you root capability, it's mostly for filesystem access | 16:34 |
corvus | my understanding is that this exercise was meant to support "runasuser:0" and "runasgroup:0" | 16:34 |
corvus | tristanC: but we've just made */etc/passwd* writeable by a supposedly "unprivileged" user | 16:35 |
corvus | i mean, that's a pretty short escalation path | 16:35 |
corvus | you just write it again and set your uid to 0, then spawn a shell and you're root | 16:35 |
tristanC | corvus: that implies you can change uid | 16:36 |
tristanC | which shouldn't be the case in container | 16:36 |
corvus | there's a *lot* of if's going on here. it also assumes that no important files will be gid=0+w | 16:37 |
tristanC | corvus: iiuc, allowing root group to write to /etc/passwd doesn't give any more privilege or attack vector, and this is what okd recommend for container images ( https://docs.okd.io/latest/creating_images/guidelines.html ) | 16:38 |
tristanC | either you are privileged, and you can already do that kind of thing (change file mode or change uid), or you are not privileged, and then you can't change uid or do privileged actions | 16:39 |
tristanC | it's just that allowing unprivileged users to write to /etc/passwd lets you fix issues for tools like git or rsync that expect the uid to be a real user | 16:40 |
tristanC | in other words, once you have one /etc/passwd per context, then its content is not really relevant security wise | 16:43 |
*** jamesmcarthur_ has quit IRC | 16:47 | |
*** mattw4 has joined #zuul | 16:56 | |
*** jamesmcarthur has joined #zuul | 17:00 | |
*** jamesmcarthur has quit IRC | 17:00 | |
corvus | tristanC: i don't agree that running a container process as gid=0 is more secure than not. it is one setgid binary away from a privilege escalation. it's a loss of defense in depth -- and a very risky change since it reverses the security assumptions that almost every other part of the system is making. | 17:00 |
*** jamesmcarthur has joined #zuul | 17:00 | |
*** hashar has joined #zuul | 17:01 | |
tristanC | corvus: oh right, but is this something we can change? (i was only referring to the chmod g=u /etc/passwd) | 17:02 |
openstackgerrit | Merged zuul/zuul-jobs master: Add a markdownlint job and role https://review.opendev.org/607691 | 17:05 |
corvus | tristanC: found one. here's a privilege escalation using writeable /etc/passwd: http://paste.openstack.org/show/789014/ | 17:10 |
corvus | i just set the password for the zuul user, and set its uid to 0, then run su. | 17:11 |
corvus | probably could have just set root's password too. | 17:11 |
tristanC | corvus: if you think that's an issue, then i don't know how to prevent it while supporting arbritary uid | 17:21 |
tristanC | iiuc, to prevent process from changing identity you drop the cap_setuid | 17:24 |
corvus | tristanC: i definitely think that's an issue. if we believe that not running as root is valuable (and i think we all do), then clearly this is not meeting our goal. i mean, that's hardly an exploit -- it's really just using tools as they were meant to be used. i think at the very least, if you're using that uid_entrypoint script anywhere, you should update it to "chmod g-w /etc/passwd". that should help | 17:28 |
corvus | prevent this particular hole, but i feel like this is an example of what can happen when you reverse the assumptions of the unix permissions model, and more creative people than i may still come up with a way to utilize being gid 0. | 17:28 |
corvus | also, it's kind of a time bomb for anyone who wants to use the image and overrides the entrypoint. they may not care about anyuid, it will work fine for them, but they'll have an exploit waiting to happen on the image. i really don't think it's safe. | 17:29 |
tristanC | corvus: that's fair, would you like to propose adding the chmod g-w to the okd guideline? | 17:31 |
*** evrardjp has quit IRC | 17:33 | |
*** evrardjp has joined #zuul | 17:34 | |
corvus | tristanC: any idea where that source is? | 17:34 |
tristanC | corvus: https://github.com/RHsyseng/container-rhel-examples/blob/master/starter-arbitrary-uid/Dockerfile.centos7#L28 | 17:35 |
tristanC | which is the example referenced by the okd guidelines | 17:35 |
corvus | tristanC: oh i thought you meant update the okd docs themselves | 17:35 |
tristanC | corvus: imo, in the context of the zuul-executor pod, i think that if a malicious user can execute arbritary action as the zuul user, then there isn't much more to gain by getting root privilege. | 17:36 |
tristanC | corvus: for example, you also need to disable the service-account token, otherwise the malicious user might as well spawn another pod using the root account directly | 17:36 |
tristanC | corvus: and if you start enabling such security setting, then you might as well configure the pod security context to prevent setuid altogether | 17:38 |
corvus | tristanC: indeed. but that argument is very close to "it's okay to run a container as root". (it kind-of is okay to do that, which is why we haven't been too worried about this up to this point, but if we're going to agree that it's better not to, then we should really do that) | 17:38 |
corvus | tristanC: remind me again, are we relying on bwrap being setuid, or is it all caps now? | 17:40 |
tristanC | corvus: i think we do because k8s doesn't support userns iirc | 17:43 |
tristanC | but i don't quite remember what is really required between seccomp, capabilities(7) and selinux... | 17:46 |
tristanC | if the uid_entrypoint script enables privilege escalation, then we should fix the script upstream. | 17:48 |
*** jamesmcarthur has quit IRC | 17:48 | |
openstackgerrit | Antoine Musso proposed zuul/zuul master: web: humanize time durations https://review.opendev.org/705120 | 17:49 |
*** jamesmcarthur has joined #zuul | 17:50 | |
corvus | tristanC: i have 3 concerns: 1) running gid 0 may create vulnerabilities because it runs counter to assumptions that other tool makers may have made. 2) leaving /etc/passwd writable after the script runs clearly opens an easily-exploitable vulnerability. this is an example of #1, but perhaps not the only one. this is easy to fix. 3) leaving /etc/passwd writable in the image itself creates a latent | 17:51 |
corvus | opportunity for #2 to happen if someone overrides the entrypoint. no one expects overriding the entrypoint to make a container image exploitable. | 17:51 |
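Concerns 2 and 3 both hinge on `/etc/passwd` carrying the group write bit. A small sketch of how one might check for that from inside a container is below. The function name `group_writable` is illustrative, and nothing here is taken from the Zuul images themselves.

```python
import os
import stat


def group_writable(path):
    """True if the group write bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


if __name__ == "__main__":
    target = "/etc/passwd"
    if os.path.exists(target):
        st = os.stat(target)
        # A gid 0 process can rewrite the file if it is group-owned by gid 0
        # and group-writable, which is the situation discussed above.
        print(f"{target}: gid={st.st_gid} group_writable={group_writable(target)} "
              f"process_gid={os.getgid()}")
```

If this reports a group-writable file owned by gid 0 while the process itself runs with gid 0, the `chmod g-w /etc/passwd` step mentioned earlier has not been applied.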
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: add support for arbritary uid https://review.opendev.org/650246 | 17:54 |
corvus | finally found the source: https://github.com/openshift/openshift-docs/blob/master/modules/images-create-guide-openshift.adoc#support-arbitrary-user-ids | 17:54 |
corvus | tristanC: https://github.com/openshift/openshift-docs/commit/1128e47ee0ea01ac1410cfff42370f4872dbbffd | 17:55 |
tristanC | corvus: ha, but the PR comment says that only works with OCPv4 | 17:56 |
corvus | oh, the real info is in the pr comments, not the commit message. i was wondering. | 17:57 |
corvus | it still doesn't say why it's not applicable. | 17:57 |
corvus | tristanC: any ideas why it is "unnecessary" in 4? | 17:58 |
tristanC | corvus: your concerns make sense to me, i just don't know why the uid_entrypoint is designed like that | 17:58 |
tristanC | corvus: perhaps in 4 there is oci hook in place to take care of that? no idea | 17:59 |
hashar | hi :) | 18:00 |
corvus | tristanC: in the mean time, is it possible to tell openshift to run a container with a fixed uid? can we just say "runasuser: 1000"? | 18:01 |
corvus | (we're already not running in the default config because of privileged) | 18:01 |
tristanC | corvus: it is, but not with the default scc | 18:01 |
corvus | tristanC: that's what we have to do for the executor for privileged, right? | 18:02 |
tristanC | corvus: i kept the uid range, but that can be changed here https://review.opendev.org/#/c/702758/15/deploy/scc.yaml@22 | 18:03 |
tristanC | corvus: so we could have an | 18:03 |
tristanC | extra scc for the rest of the zuul service to runasuser:1000, or we grant the zuul-operator to use the existing anyuid scc | 18:05 |
corvus | one of those would be my vote for how to handle this for all of our images for now. i'd like to see what's coming in v4 to perhaps allow running as a range. | 18:06 |
tristanC | corvus: and thus replacing the uid_entrypoint with a useradd ? | 18:08 |
corvus | tristanC: yes | 18:08 |
tristanC | corvus: wfm | 18:08 |
tristanC | corvus: i'm waiting to see if last PS of https://review.opendev.org/#/c/650246/ worked (where i moved the VOLUME statement after the USER), then i'll update it accordingly | 18:10 |
*** plaurin has joined #zuul | 18:16 | |
plaurin | Hi irc people! I wonder if anyone could help me out. Still working trying to make kubernetes work in zuul. I have a couple of shell steps, which all work until one returns the error \"ansible.errors.AnsibleError: Unable to create local directories(/home/ubuntu/.ansible/tmp): [Errno 13] Permission denied: b'/home/ubuntu/.ansible/tmp'\"], 'stderr_lines': []}" | 18:23 |
plaurin | Seems like a common issue found on google, fixed via ansible.cfg, but I'm not sure where I should be "editing" ansible.cfg in a zuul installation | 18:25 |
plaurin | other than passing as variables in /etc/zuul/variables.yml | 18:25 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: create and configure a zuul user with uid 1000 https://review.opendev.org/650246 | 18:26 |
pabelanger | plaurin: are you using 'ubuntu' as the user to connect to nodepool nodes? | 18:26 |
tristanC | corvus: no luck with the gerritconfig eperm error, let see if that new dockerfile does the trick ^ | 18:27 |
plaurin | yes, it's a docker image built using packer, from the official ubuntu docker image. I pass the "USER ubuntu" to the committed image | 18:28 |
plaurin | so connecting to the image it works with ubuntu user | 18:28 |
plaurin | unless zuul/nodepool changes this i'm not sure | 18:28 |
pabelanger | if you SSH into node, as ubuntu and cannot change permissions on /home/ubuntu. Sounds like something isn't configured properly | 18:29 |
pabelanger | can you check permissions? | 18:29 |
plaurin | it's using kubectl exec, not ssh as this is using ansible's kubernetes driver | 18:29 |
pabelanger | okay, can you log into node and ls -la /home/ubuntu | 18:30 |
plaurin | sure | 18:30 |
plaurin | just a sec | 18:30 |
corvus | tristanC: https://github.com/RHsyseng/container-rhel-examples/issues/103 | 18:30 |
pabelanger | we then need to see what user ansible is using | 18:30 |
plaurin | oh well look at that the .ansible belongs to root:root 😑️ | 18:31 |
tristanC | corvus: thanks! | 18:31 |
plaurin | I never thought of checking that output after building with packer | 18:32 |
pabelanger | plaurin: yah, I would assume packer ansible, was root and left it | 18:32 |
pabelanger | then zuul cannot | 18:32 |
plaurin | I'm surprised because the ansible provisioner used with packer is configured to use 'ubuntu' user | 18:33 |
plaurin | heh I guess I need to fix around packer, not zuul then :P | 18:34 |
*** armstrongs has joined #zuul | 18:35 | |
plaurin | ohh I use "become: yes" in my initial yml "" facepalm "" | 18:35 |
pabelanger | \o/ | 18:35 |
*** tosky has quit IRC | 18:36 | |
plaurin | thx I'll fix that out | 18:36 |
plaurin | (-‸ლ) | 18:36 |
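The ownership check discussed above (`ls -la /home/ubuntu` revealing a root-owned `.ansible`) can also be scripted. What follows is only a minimal sketch, assuming a POSIX system; the helper name `find_foreign_owned` is hypothetical and is not part of zuul, nodepool, or packer.

```python
import os


def find_foreign_owned(directory, expected_uid):
    """Return sorted names of direct children of `directory`
    whose owner is not `expected_uid`."""
    foreign = []
    for entry in os.scandir(directory):
        # Inspect the entry itself rather than the target of a symlink.
        if entry.stat(follow_symlinks=False).st_uid != expected_uid:
            foreign.append(entry.name)
    return sorted(foreign)


if __name__ == "__main__":
    # Example: list anything in the current user's home that another
    # user (for instance root, after a task run with become) owns.
    home = os.path.expanduser("~")
    print(find_foreign_owned(home, os.geteuid()))
```

Run as the image's login user after the image build, an empty list would suggest that no provisioning step left root-owned entries directly under the home directory.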
plaurin | I should come here more often you guys help me fix things much faster | 18:37 |
pabelanger | anything to get more zuul users :) | 18:38 |
*** jamesmcarthur has quit IRC | 18:38 | |
plaurin | I'm doing fun/complex stuff with zuul. Upgrading firmwares and BMCs, and running tests automagically 24/7 | 18:39 |
*** jamesmcarthur has joined #zuul | 18:40 | |
Shrews | plaurin: sounds worthy of a blog post ;) | 18:40 |
plaurin | & making sure you won't break your BIOS if you flash it 30 times in a row and cutting current midway hehe | 18:41 |
plaurin | It's nice to create my ansible recipes to do what I want and just "plug" them in zuul with a cron | 18:42 |
*** hashar has quit IRC | 18:43 | |
plaurin | okay gotta go back focus on my work, as usual thx for your help! | 18:43 |
*** armstrongs has quit IRC | 18:45 | |
corvus | plaurin: have you talked with jamesmcarthur? he's looking for zuul users | 18:45 |
plaurin | I haven't | 18:45 |
corvus | plaurin: maybe you can pm him your email address if you have a minute. i'm sure he'd appreciate it | 18:46 |
plaurin | sure, will do | 18:47 |
clarkb | plaurin: as a user of things that have firmware it makes me happy to hear people care about not bricking my devices :) | 18:47 |
corvus | plaurin: yeah, i can confidently say we're all rooting for you. (so to speak) | 18:47 |
*** jamesmcarthur has quit IRC | 18:48 | |
plaurin | hehe, yeah I do devops to try and brick stuff in a manner of speaking | 18:48 |
corvus | tristanC, mnaser, tobiash: it looks like you can't set the owner of a mounted secret in k8s, so if we change our uid, i think we also have to drop the defaultMode lines we have in the helm charts | 18:49 |
tobiash | corvus: I set this on the pod (not container) spec: http://paste.openstack.org/show/789017/ | 18:50 |
tobiash | which afaik should change the group of mounted things | 18:50 |
tobiash | but maybe that doesn't work | 18:52 |
tobiash | looks like they are still owned by root | 18:53 |
*** plaurin has quit IRC | 18:57 | |
*** igordc has joined #zuul | 19:03 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add google-cloud-storage to executor ansible https://review.opendev.org/705279 | 19:07 |
*** bhavikdbavishi has quit IRC | 19:07 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: WIP: Add uid/gid 1000 to Dockerfile https://review.opendev.org/705280 | 19:07 |
corvus | tristanC: ^ just so i don't lose track, that's what i did locally to build an image (based on tobiash's paste from earlier) | 19:08 |
corvus | it's slightly different than what you uploaded (and is missing USER lines). i don't have an opinion on which is better/correct. but nuances about volumes and ownership may be important here. | 19:09 |
tristanC | corvus: quickstart is now failing because of Saving key "/var/ssh/admin" failed: Permission denied | 19:17 |
tristanC | tobiash: i confirm k8s secrets are always mounted as root, and therefore require using a relaxed defaultMode, e.g. the one by default is 0644 | 19:22 |
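Why a relaxed mode is needed when the files stay owned by root can be illustrated with plain permission-bit arithmetic. This is a minimal sketch, assuming ordinary POSIX permission checks with no ACLs or extra capabilities; the function name `readable_by_non_owner` is hypothetical.

```python
import stat


def readable_by_non_owner(mode, same_group):
    """Return True if a process that is NOT the file owner may read a
    file with permission bits `mode`.

    POSIX consults the group bits when the process shares the file's
    group, and the "other" bits otherwise.
    """
    if same_group:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


# A root-owned file with mode 0644 is readable by an unrelated uid,
# while 0600 is not.
print(readable_by_non_owner(0o644, same_group=False))
print(readable_by_non_owner(0o600, same_group=False))
```

Under these assumptions, a mode such as 0640 would only help if the process ran with the group that owns the mounted files.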
tristanC | corvus: it seems like docker-compose volume doesn't respect the container user. perhaps we have to define the `user` attribute in the compose file? | 19:26 |
tristanC | or according to https://github.com/docker/compose/issues/3270#issuecomment-543603959 , we should mkdir /var/ssh && chown zuul /var/ssh in the Dockerfile | 19:29 |
*** pcaruana has quit IRC | 19:30 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: create and configure a zuul user with uid 1000 https://review.opendev.org/650246 | 19:32 |
corvus | https://gerrit-zuul.inaugust.com/t/gerrit/build/508e47023cd243c68d5f8c128556a03b | 20:08 |
corvus | a successful build with an uploaded log | 20:08 |
corvus | tristanC: hrm, i'm not a fan of having quick-start specific volumes in the docker image.... | 20:14 |
corvus | (or rather, quickstart specific paths) | 20:15 |
corvus | tristanC: what if we add a container to docker-compose.yaml with a dockerfile which creates and chowns those mountpoints, and volume mounts for all the quickstart-specific volumes, then have the quickstart services all depends_on that container? | 20:26 |
corvus | depends_on waits for a container to "start" -- i'm assuming that means initializing new volumes... | 20:27 |
tristanC | corvus: not sure what `volume, when first used by a container, get it's initial content and permission inherited from the container` (from the compose issue) implies. i'm not a fan either, but the cost of such directories in the layers is rather negligible | 20:28 |
corvus | tristanC: i think that means that docker will copy the contents from the container image to the volume the first time a container mounts a newly created docker volume. | 20:31 |
corvus | tristanC: i think it's really messy to have this kind of interaction between the images and quickstart; i'd prefer a straightforward upstream image that people can use downstream. | 20:32 |
*** hashar has joined #zuul | 20:32 | |
clarkb | https://zuul.opendev.org/t/openstack/build/b4ad5a108af94478b09cd715b4e6a5f4/log/job-output.txt#3140 do we need a become:true in collect-container-logs ? | 20:34 |
clarkb | mnaser: ^ you added that role. Is the expectation that we do that at a higher level? | 20:34 |
corvus | clarkb: normally the zuul user is in the docker group | 20:34 |
corvus | clarkb: perhaps that's not the case for the system-config jobs though | 20:35 |
tristanC | corvus: so you prefer an extra container in the docker-compose to chown 1000 the mountpoints ? | 20:35 |
corvus | clarkb: (because they use the system-config method of bootstrapping docker?) | 20:35 |
clarkb | ya looks like there is a local install-docker | 20:35 |
clarkb | because we install docker in production | 20:36 |
tristanC | corvus: also note that when the upstream zuul image sets the USER, it may break current users. e.g. tripleo folks copied the docker-compose and they will need to also use the extra container to do the chowning | 20:38 |
clarkb | I've added apply: become: yes to the include_role I expect that will fix it | 20:39 |
corvus | tristanC: yes, it is becoming apparent that adding USER is going to be a breaking change for anyone using images. i think we will need to announce that. | 20:40 |
corvus | that's really separate from quick-start though | 20:41 |
corvus | tristanC: do you think there's any chance that if we mounted the volumes under /var/lib/zuul they would pick up the perms from that (even though /var/lib/zuul is also a volume)? | 20:42 |
tristanC | corvus: actually we don't have to set USER now, we could just create the user, that should unblock k8s deployment | 20:42 |
corvus | tristanC: agreed (then work on USER later) | 20:42 |
corvus | tristanC: it looks like https://review.opendev.org/705280 is sufficient to work in k8s with runAsUser and does not break quick-start | 20:43 |
corvus | i have to grab lunch; biab | 20:43 |
*** rfolco has quit IRC | 20:43 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: Dockerfile: create a zuul user with uid 10001 https://review.opendev.org/650246 | 20:47 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: Add OpenShift SCC and functional test https://review.opendev.org/702758 | 20:52 |
*** mattw4 has quit IRC | 20:54 | |
openstackgerrit | Antoine Musso proposed zuul/zuul master: executor: ansible does not need cowsay https://review.opendev.org/705294 | 20:56 |
*** mattw4 has joined #zuul | 20:58 | |
corvus | tristanC, tobiash: how should we reconcile https://review.opendev.org/705280 and https://review.opendev.org/650246 ? should we have an addgroup? should we set the homedir to /var/lib/zuul ? | 21:53 |
corvus | (and what should the userid be?) | 21:53 |
corvus | how about uid 10001, homedir=/var/lib/zuul, add zuul group ? | 21:53 |
corvus | that seems like it gets the highlights from both -- does that work? | 21:54 |
corvus | oh, also we should do a chmod on /var/lib/zuul since that's a volume we expect docker to create, so that would let it work as non-root in docker | 21:55 |
corvus | let me propose that just to get a strawman up | 21:55 |
tristanC | corvus: zuul doesn't need a special gid iiuc | 21:56 |
*** rlandy has quit IRC | 21:56 | |
corvus | (oh, the -m will create the homedir, so the chown is already there for /var/lib/zuul) | 21:57 |
corvus | tristanC: what does it end up with in your patch? | 21:57 |
tristanC | corvus: with 650246 and this change https://review.opendev.org/#/c/702758/15..16 (using runAsUser), both k8s and openshift for some reason failed with: https://zuul.opendev.org/t/zuul/build/0ed04a02b1254c33949b84c619fd49a6/console#3/0/6/controller | 21:58 |
corvus | tristanC: that error probably means non-writeable homedir | 21:58 |
corvus | it's probably trying to write to ~/.ansible/tmp -- at least that's what was happening in my earlier trials | 21:59 |
tristanC | corvus: is there a reason why that would only happen with ansible-2.9? | 22:00 |
corvus | tristanC: oh wow, i didn't notice it was only 2.9 that failed... i *thought* earlier i saw all 4 versions fail. so either i was mistaken in my earlier test, or this is a different error. | 22:00 |
tristanC | i guess ansible-2.9 does something special on init with the home directory, though i don't get why 702758 is failing with the useradd addition while it was working with the uid_entrypoint script | 22:02 |
*** tosky has joined #zuul | 22:05 | |
corvus | tristanC: you can run the built image from insecure-ci-registry.opendev.org:5000/zuul/zuul-executor:d1f09d7823254a54bac0ed0f13d23606_latest | 22:07 |
corvus | tristanC: i'm doing that now to inspect it | 22:07 |
corvus | (ftr, the other question: uid=10001(zuul) gid=10001(zuul) groups=10001(zuul)) | 22:07 |
corvus | tristanC: i'm unable to repro in gke, it seems to work fine | 22:10 |
tristanC | corvus: podman doesn't seem to be able to run that image: http://paste.openstack.org/show/789025/ | 22:10 |
tristanC | it works with docker though | 22:12 |
tristanC | corvus: could you paste the k8s container resources you apply on gke | 22:13 |
corvus | tristanC: this is what i just did to test that http://paste.openstack.org/show/789026/ | 22:13 |
corvus | it seems to have started up without incident and joined the scheduler | 22:14 |
*** mattw4 has quit IRC | 22:17 | |
tristanC | corvus: it seems like one difference is the securityContext.runAsUser being set for the podspec while i put it for the container (along with the privileged attr), i'll update the operator to try do that | 22:18 |
corvus | tristanC: yeah, that was accidental on my part. i'll retry with everything in the container spec and see if it breaks | 22:18 |
corvus | tristanC: didn't seem to change anything | 22:19 |
*** avass has quit IRC | 22:22 | |
openstackgerrit | Merged zuul/zuul master: executor: ansible does not need cowsay https://review.opendev.org/705294 | 22:26 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-operator master: DNM: add PodSpec securityContext attribute https://review.opendev.org/705308 | 22:26 |
tristanC | corvus: fwiw, ^ renders http://paste.openstack.org/show/789027/ | 22:28 |
corvus | tristanC: (btw it looks like that has the same service bug i proposed a fix to zuul-helm -- the service should have a "clusterIP: None" entry in the spec) | 22:29 |
corvus | tristanC: why the HOME env var? | 22:30 |
tristanC | corvus: previous PS worked fine without the clusterIP | 22:30 |
*** mattw4 has joined #zuul | 22:30 | |
tristanC | corvus: the HOME is a leftover from the uid_entrypoint | 22:31 |
corvus | tristanC: you won't notice the clusterip thing until you try to stream a log | 22:31 |
tristanC | corvus: i did stream logs without issue, on both k8s and openshift | 22:32 |
tristanC | (the resources have been tested end-to-end with a job running on kubernetes) | 22:32 |
corvus | tristanC: in the web dashboard? | 22:33 |
corvus | i must have missed the test for that, but that's pretty cool if we have it. | 22:33 |
tristanC | i meant manually tested using the web interface streaming this job: https://softwarefactory-project.io/r/#/c/17347/3/conf/sf/applications/SoftwareFactory.dhall@158, we still need a ci job to demonstrate this automatically. | 22:37 |
corvus | tristanC: i'd be interested in knowing what name the scheduler connected to | 22:39 |
corvus | because my understanding is that unless you make a headless service you don't get an entry in dns for the statefulset pod | 22:40 |
corvus | https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id | 22:44 |
tristanC | corvus: i have the executor running job, trying to reproduce using curl to console-stream | 22:46 |
corvus | tristanC: in my test, the scheduler was trying to contact zuul-executor-0.zuul-executor.zuul.svc.cluster.local | 22:47 |
corvus | that name did not resolve unless i made the service headless | 22:48 |
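The hostname above follows the stable network identity pattern from the Kubernetes StatefulSet documentation: the pod name (`<statefulset>-<ordinal>`), then the governing service, the namespace, `svc`, and the cluster domain. A minimal sketch of that composition; the function name is hypothetical and `cluster.local` is assumed as the cluster domain.

```python
def statefulset_pod_fqdn(statefulset, ordinal, service, namespace,
                         cluster_domain="cluster.local"):
    """Build the per-pod DNS name that a headless governing service
    publishes for a StatefulSet pod."""
    pod_name = f"{statefulset}-{ordinal}"
    return f"{pod_name}.{service}.{namespace}.svc.{cluster_domain}"


# First executor pod, governed by a service named "zuul-executor"
# in the "zuul" namespace.
print(statefulset_pod_fqdn("zuul-executor", 0, "zuul-executor", "zuul"))
```

The name is only expected to resolve when the governing service is headless, i.e. its spec carries `clusterIP: None`.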
tristanC | corvus: here is the url i got from the status page "finger://sf-executor-0.executor.myproject.svc.cluster.local/30c0dbdf68214ce1aafa25534163dc54" | 22:49 |
corvus | and that hostname resolves from the scheduler? | 22:50 |
corvus | regardless, i think a headless service is more correct anyway -- it doesn't make sense to connect to executor.myproject.svc.cluster.local:7900 (executors aren't a load-balanced service). so while i'm curious about the different behavior, i still think we should add it to the operator. it's not urgent though, we have enough problems. :) | 22:52 |
tristanC | corvus: it does from zuul-web : http://paste.openstack.org/show/789029/ | 22:52 |
tristanC | and the same ip from the scheduler | 22:53 |
corvus | you can sudo on zuul-web? | 22:54 |
corvus | oh, you're doing that from outside k8s | 22:54 |
tristanC | corvus: yes, to be able to use `dig` | 22:55 |
tristanC | corvus: actually, i may not have tested the zuul-web console-stream on k8s, only the job-output.txt | 22:56 |
clarkb | https://zuul.opendev.org/t/openstack/build/c13a6d5a761e4de9af2be566033c5f51/console#3/1/3/refstack01.openstack.org somehow the container name is becoming container_command | 23:00 |
clarkb | between these two tasks https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/collect-container-logs/tasks/main.yaml#L6-L16 | 23:00 |
clarkb | anyone understand how that could happen? | 23:01 |
clarkb | maybe something to do with the loop? | 23:01 |
tristanC | corvus: here is a cli test: sudo dnf install -y python3-websocket-client; function zuul-stream { wsdump -v 2 -t '{"uuid": "'$(echo $1 | sed 's/.*stream.\([^\?]*\).*/\1/')'", "logfile": "console.log"}' -r ws://172.30.45.62:9000/api/tenant/local/console-stream ; } | 23:03 |
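The sed expression in the one-liner above pulls the build UUID out of a stream URL and wraps it in the JSON message that is sent to the console-stream websocket. A rough Python equivalent of just that step, as a sketch: the function name is hypothetical, and the example URL shape is an assumption used only for illustration.

```python
import json
import re


def console_stream_payload(stream_url, logfile="console.log"):
    """Build the JSON text sent to the console-stream websocket."""
    # Mirrors the sed pattern: match greedily up to the last "stream",
    # skip one character, then capture everything up to the first "?"
    # (or backslash, which the sed bracket expression also excludes).
    match = re.match(r".*stream.([^\\?]*).*", stream_url)
    # Like sed, fall back to the unmodified input when nothing matches.
    uuid = match.group(1) if match else stream_url
    return json.dumps({"uuid": uuid, "logfile": logfile})


# Hypothetical URL; only the "stream/<uuid>?..." shape matters here.
print(console_stream_payload(
    "http://example.test/t/local/stream/"
    "30c0dbdf68214ce1aafa25534163dc54?logfile=console.log"))
```

Sending the resulting string still needs a websocket client, such as the `wsdump` tool used in the one-liner.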
clarkb | it is almost like every variable in the loop is getting replaced with {{ item }} | 23:04 |
corvus | clarkb: i agree with what you're seeing and it makes no sense to me. | 23:05 |
clarkb | I'm skimming ansible commit history to see if there is any suspect fixes, but this really looks like an ansible loop bug to me | 23:08 |
clarkb | because that var is not overridden in the previous task and its value is correct there | 23:08 |
clarkb | this job ran with 2.8.7, there is a 2.8.8 | 23:08 |
clarkb | nothing jumps out though | 23:09 |
clarkb | pabelanger: ^ you seem to keep on top of ansible behaviors does that look familiar to you at all? | 23:09 |
tristanC | clarkb: perhaps setting `loop_control: {loop_var: item}` would yield helpful result? | 23:11 |
clarkb | tristanC: I can try that, then depends on it to check, changes in a minute | 23:11 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Debug weird Ansible loop behavior https://review.opendev.org/705312 | 23:14 |
hashar | clarkb: do you have a way to run the playbook with extra verbosity? ansible-playbook -vvvvv | 23:15 |
hashar | or ANSIBLE_DEBUG=1 | 23:15 |
hashar | or something similar | 23:15 |
clarkb | I don't know, it is zuul running it | 23:15 |
corvus | clarkb: i'm unable to reproduce locally with a similar playbook under 2.8.7 | 23:16 |
corvus | clarkb: zuul-executor verbose | 23:16 |
corvus | you might need to do that for all the executors before the job starts though | 23:16 |
corvus | make sure you turn it back off if you do :) | 23:16 |
clarkb | ya, let me see if we get different behavior with the above change and then decide if I need to do that | 23:17 |
corvus | that gets you "-vvv" | 23:17 |
corvus | which is not quite as many v's as hashar suggested. not sure if that's enough | 23:17 |
clarkb | another option is to set the jobs ansible version to 2.9 | 23:21 |
clarkb | to try and narrow down the behavior | 23:21 |
corvus | tristanC: i'm not convinced that 702758 used the zuul image builds from 650246 (though i don't understand why not) | 23:27 |
hashar | clarkb: not sure if that can help, but 'loop' lets you register the results which would have the value of 'item' (well it is known but still.. that might help maybe) | 23:29 |
hashar | https://docs.ansible.com/ansible/latest/user_guide/playbooks_loops.html#registering-variables-with-a-loop | 23:29 |
clarkb | in this case it is the input of the loop commands that is the problem not the output | 23:30 |
hashar | eek :( | 23:30 |
clarkb | the verbose run idea is a good one, but I'll avoid that until I have no other options because I have to update all the executors to do it | 23:30 |
corvus | tristanC: yeah https://zuul.opendev.org/t/zuul/build/12605547681f4171aef450e20060e758/log/zuul-info/inventory.yaml shows 246 in the chain, but no artifacts from it | 23:31 |
tristanC | corvus: indeed, could recheck fix that? | 23:34 |
corvus | tristanC: possibly; i went ahead and did that. | 23:34 |
corvus | i think we need more debug lines in the provides/requires code | 23:34 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add some debug lines for provides/requires https://review.opendev.org/705313 | 23:42 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add some debug lines for provides/requires https://review.opendev.org/705313 | 23:43 |
*** rfolco has joined #zuul | 23:48 | |
clarkb | setting the loop var did not change anything | 23:50 |
*** igordc has quit IRC | 23:52 |