openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: zk: skip node already being deleted in cleanup leaked instance task https://review.openstack.org/576288 | 00:09 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: job: add ansible-tags and ansible-skip-tags attribute https://review.openstack.org/575672 | 00:10 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: angular: call enableProdMode https://review.openstack.org/573494 | 00:13 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: gerrit: add support for report only connection https://review.openstack.org/568216 | 00:15 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route https://review.openstack.org/541521 | 00:29 |
*** jiapei has joined #zuul | 03:23 | |
*** elyezer has quit IRC | 04:24 | |
*** elyezer has joined #zuul | 04:26 | |
tobiash | tristanC: these functions now return (True, 'Reason') or (False, 'Reason') and both still evaluate to True and I think that's dangerous for a 'matches' function | 05:24 |
tobiash | tristanC: but you could wrap this in a class and overwrite the __bool__ function https://docs.python.org/3/reference/datamodel.html#object.__bool__ | 05:30 |
tobiash | tristanC: like a FalseWithReason class | 05:33 |
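A minimal sketch of the wrapper tobiash is suggesting (the class and method names here are illustrative, not taken from Zuul's code):

```python
class FalseWithReason:
    """Falsy result object that still carries an explanation for logging."""

    def __init__(self, reason):
        self.reason = reason

    def __bool__(self):
        # Unlike a (False, 'Reason') tuple, this object evaluates to False,
        # so `if matcher.matches(change):` behaves as expected.
        return False

    def __str__(self):
        return self.reason


# A matcher could then return True on success, or e.g.
# FalseWithReason("branch does not match") on failure.
```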
*** elyezer has quit IRC | 05:35 | |
*** elyezer has joined #zuul | 05:36 | |
*** gtema has joined #zuul | 05:53 | |
*** bhavik1 has joined #zuul | 06:35 | |
tristanC | tobiash: oh good idea, i'll update the review | 06:35 |
*** pcaruana has joined #zuul | 06:36 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: github: add event filter debug https://review.openstack.org/580547 | 06:37 |
*** bhavik1 has quit IRC | 06:40 | |
*** hashar has joined #zuul | 07:00 | |
*** quiquell|off is now known as quiquell | 07:04 | |
*** elyezer has quit IRC | 07:21 | |
*** fbo|off is now known as fbo | 07:23 | |
*** elyezer has joined #zuul | 07:24 | |
*** elyezer has quit IRC | 07:33 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/pipelines route https://review.openstack.org/541521 | 07:33 |
*** elyezer has joined #zuul | 07:36 | |
*** jiapei has quit IRC | 07:43 | |
*** dmsimard has quit IRC | 07:54 | |
*** dmsimard has joined #zuul | 07:55 | |
tobiash | yay, tested that caching patch in prod now and freeze time of one of our bigger jobs went down from 400ms before to 9ms now :) | 08:09 |
tristanC | a ~98% reduction in freeze time :-) | 08:20
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Reduce number of reconfigurations on branch delete https://review.openstack.org/580967 | 09:10 |
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache https://review.openstack.org/582897 | 09:10 |
*** electrofelix has joined #zuul | 09:14 | |
openstackgerrit | Markus Hosch proposed openstack-infra/zuul master: Per-branch management of unparsed config cache https://review.openstack.org/582897 | 09:19 |
*** sambetts_ is now known as sambetts | 09:25 | |
*** electrofelix has quit IRC | 09:59 | |
*** electrofelix has joined #zuul | 10:11 | |
*** rcarrill1 is now known as rcarrillocruz | 10:16 | |
*** elyezer has quit IRC | 10:23 | |
*** elyezer has joined #zuul | 10:24 | |
*** electrofelix has quit IRC | 10:34 | |
quiquell | tristanC: zuul.project.src_dir is the place to generate stuff in a job ? | 10:50 |
tristanC | quiquell: you need to fetch to zuul.executor.log_root | 10:51 |
quiquell | tristanC: This directory is also the directory of the log.o.o ? | 10:55 |
quiquell | tristanC: I have tried zuul.executor.work_root but I don't have permissions there | 10:56
tristanC | quiquell: the upload-logs role executed by the base post playbook will export the files in executor.log_root to logs.o.o | 11:00
quiquell | tristanC: Ok, thanks ! | 11:01 |
quiquell | tristanC: And zuul.executor.work_root, we cannot write there. | 11:02 |
tristanC | quiquell: you're welcome. no you can't write to work_root, but log_root should be fine | 11:05 |
quiquell | tristanC: Ok, is there any way we can instruct upload-logs to ignore some dirs? | 11:06
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: docs: add job's logs guide https://review.openstack.org/582921 | 11:12 |
tristanC | quiquell: https://review.openstack.org/582921 is a copy of https://softwarefactory-project.io/docs/user/zuul_user.html#export-logs-artifacts-to-the-logserver which contains more detailed instructions | 11:12 |
tristanC | not sure logstash_processor_config are available to untrusted-projects in zuul.openstack.org | 11:13 |
quiquell | tristanC: So looks like zuul.project.src_dir is the correct place to generate stuff | 11:15 |
quiquell | tristanC: Then copy what want to be logged to log_root | 11:15 |
quiquell | Is that correct ? | 11:15 |
tristanC | oh right, you can prepare artifacts in zuul.project.src_dir before doing the fetch to zuul.executor.log_root | 11:16 |
quiquell | tristanC: And after that just copy it to log_root | 11:16 |
quiquell | ok | 11:17 |
tristanC | yes, zuul.project is on the ephemeral test instance, zuul.executor is on the executor, e.g. localhost | 11:17 |
quiquell | so zuul.project.src_dir is like the workspace for the build | 11:17 |
quiquell | and also the place for the cloned project | 11:17 |
tristanC | you can also create a dir at "{{ ansible_env.HOME }}/workspace" on the test node to get a clean workspace | 11:20 |
quiquell | tristanC: Yep, we were doing so, but we were looking for an already-created place | 11:21
quiquell | tristanC: Do you see any problem on using src_dir ? | 11:21 |
tristanC | quiquell: beware that zuul.project.src_dir is relative to ansible_env.HOME | 11:22 |
*** elyezer has quit IRC | 11:22 | |
tristanC | quiquell: src_dir should work, but make sure you don't conflict with existing files, for example in case there is already a "build" or "logs" directory in the projects, or as a result of the test | 11:22 |
tristanC | otherwise you'll fetch and export extra bits | 11:22 |
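A minimal sketch of a post-run playbook following this advice; the dedicated artifacts/ subdirectory is an assumption made here to avoid exporting the whole source tree:

```yaml
- hosts: all
  tasks:
    # Pull files from the test node into the executor's log root, where the
    # base job's upload-logs role will publish them to logs.o.o.
    - name: Copy build artifacts from the node to the executor
      synchronize:
        src: "{{ ansible_env.HOME }}/{{ zuul.project.src_dir }}/artifacts/"
        dest: "{{ zuul.executor.log_root }}/artifacts/"
        mode: pull
```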
quiquell | tristanC: That's a good one, will go back to 'workspace' dir | 11:23 |
quiquell | tristanC: Thanks | 11:23 |
*** elyezer has joined #zuul | 11:23 | |
tristanC | quiquell: you're welcome :-) | 11:23 |
*** electrofelix has joined #zuul | 11:23 | |
*** GonZo2000 has joined #zuul | 11:30 | |
*** GonZo2000 has quit IRC | 11:30 | |
*** GonZo2000 has joined #zuul | 11:30 | |
*** gtema has quit IRC | 11:32 | |
*** sshnaidm is now known as sshnaidm|rover | 11:43 | |
*** quiquell is now known as quiquell|lunch | 12:12 | |
*** rlandy has joined #zuul | 12:23 | |
*** quiquell|lunch is now known as quiquell | 12:37 | |
*** elyezer has quit IRC | 12:44 | |
*** samccann has joined #zuul | 12:50 | |
*** elyezer has joined #zuul | 12:57 | |
fungi | note that having an explicit workspace the job is allowed to write to on the job nodes is a bit of a jenkinsism inherited in zuul v2 which we got rid of in v3. as discussed in #openstack-infra the job can write anywhere on a node you like as long as the usual posix filesystem permissions are taken into account (so if you want the "zuul" user to write in /opt/mydir you need to use root permissions in the job to | 13:04 |
fungi | create and chown/chmod it accordingly) | 13:04 |
*** gtema has joined #zuul | 13:05 | |
fungi | there _is_ a "workspace" on the executors because they're shared by lots of builds and tightly restricted as to where they allow writes, but you shouldn't write files explicitly on the executor unless you know you need to do that (e.g., a nodeless executor-only job) | 13:06 |
*** electrofelix has quit IRC | 13:07 | |
*** gtema has quit IRC | 13:08 | |
mordred | Shrews: I made a pbrx patch for nodepool too https://review.openstack.org/#/c/582732/ | 13:29 |
mordred | tristanC: I'm gonna try to get something pushed up this morning, but if I don't, I'll hand it off to you for sure | 13:29 |
*** rcarrill1 has joined #zuul | 13:30 | |
Shrews | cool | 13:30 |
*** rcarrillocruz has quit IRC | 13:31 | |
*** rcarrill1 is now known as rcarrillocruz | 13:33 | |
Shrews | hrm, i'm not sure what's happening with the nodepool-functional-py35-src job: http://logs.openstack.org/32/582732/2/check/nodepool-functional-py35-src/7c024da/job-output.txt.gz#_2018-07-15_13_58_14_680051 | 13:41 |
Shrews | python3 should be the version in use, so that's weird | 13:41 |
*** acozine1 has joined #zuul | 13:42 | |
Shrews | but obviously it's not :/ | 13:44 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: ara-report: add missing ara_report_run check https://review.openstack.org/577675 | 13:52 |
*** sshnaidm|rover is now known as sshnaidm|afk | 13:54 | |
pabelanger | Shrews: looks like broken dependency for diskimage-builder | 13:55 |
pabelanger | guess something dropped 2.7 support | 13:56 |
Shrews | oh, so maybe yesterday's release of astroid breaks dib | 13:59 |
Shrews | i thought dib was also using py3 but i guess not? | 14:00 |
Shrews | oh, i think dhellmann already has a fix out | 14:04 |
*** openstackgerrit has quit IRC | 14:04 | |
Shrews | maybe not | 14:07 |
*** weshay is now known as weshay_mtg | 14:30 | |
*** quiquell is now known as quiquell|off | 14:32 | |
*** sshnaidm|afk is now known as sshnaidm|rover | 14:47 | |
corvus | tristanC, tobiash: any thoughts about my email question regarding building container images? | 14:55 |
pabelanger | I still need to read up on it, but hope to reply this afternoon | 14:57 |
*** mhu is now known as mhu|He-Man | 15:00 | |
*** mhu|He-Man is now known as mhu | 15:01 | |
clarkb | corvus: I need to write a response to that, tl;dr is dib can already build container images (it might not be the best/most useful tool for that but is possible iirc) so might be more a question of uploading images than building them? | 15:02 |
*** pcaruana has quit IRC | 15:02 | |
*** jiapei has joined #zuul | 15:03 | |
corvus | clarkb: yeah, i meant to ask less about "how" and more about "whether we should, regardless of mechanism" | 15:03 |
corvus | clarkb: i agree, if we decide we want to we should consider dib | 15:04 |
pabelanger | Yah, some optimization for DIB and containers would be recommended, but works pretty well. | 15:05 |
pabelanger | a while back I created an ubuntu-rootfs element to help with container builds: https://review.openstack.org/413115/ (needs to be rebased) | 15:07
mordred | corvus: thanks for the reminder - I just replied to the list | 15:08 |
tobiash | corvus: sorry, I've been in fire-fighting mode for quite some time, will read later this evening | 15:09
*** weshay_mtg is now known as weshay | 15:13 | |
Shrews | i've read mordred's words and find myself agreeing with him (crazy, i know) and cannot think of convincing arguments in the other direction | 15:21 |
corvus | yeah, mordred and jhesketh both make compelling arguments :) | 15:23 |
corvus | Shrews: also, yes, if you agree with mordred you are by definition crazy | 15:23 |
corvus | i frequently agree with mordred, for what that's worth :) | 15:23 |
Shrews | now i agree with corvus, which makes me some sort of exponential of crazy | 15:25 |
mordred | you're all nuts | 15:27 |
corvus | i agree | 15:27 |
mordred | speaking of nuts - if anyone wants to review an ansible role: https://review.openstack.org/#/c/580730/ | 15:27 |
mordred | I should recheck the depends-on patches for that just to make sure... | 15:28 |
mordred | corvus: (also, as you may have already seen, I did a 'build images of nodepool' patch as well | 15:29 |
Shrews | speaking of crazy... mordred, i sort of want to land your nodepool openstacksdk change. we ready for that? | 15:29 |
mordred | Shrews: yes. we should totally be ready for that stack | 15:29 |
Shrews | well, i think it's the last one | 15:29 |
Shrews | anyone else want to review that change? https://review.openstack.org/572829 | 15:30 |
Shrews | already have two +2's, but since it's a rather large deal, others might be interested | 15:31 |
Shrews | mordred: just did a 'check experimental' on it, fwiw | 15:32 |
mordred | cool | 15:34 |
*** ssbarnea1 has joined #zuul | 15:39 | |
ssbarnea1 | https://review.openstack.org/#/c/570546/ anyone? one liner. | 15:40 |
*** samccann has quit IRC | 15:44 | |
*** samccann has joined #zuul | 15:45 | |
*** samccann has quit IRC | 15:46 | |
*** samccann has joined #zuul | 15:46 | |
*** openstackgerrit has joined #zuul | 15:57 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul master: Add tenant yaml validation option to zuul client https://review.openstack.org/574265 | 15:57 |
*** sshnaidm|rover is now known as sshnaidm|bbl | 16:28 | |
*** sshnaidm|bbl has quit IRC | 16:31 | |
tobiash | fbo: added a question to 574265 | 16:36 |
tobiash | corvus: what do you think about ^ ? | 16:36 |
pabelanger | looking for some job design help, hopefully this will be clear. In rdoproject.org, we have a project (rdoinfo) that potentially needs to have required-projects of 300 packaging repos (eg: openstack/nova-distgit). Today, I don't believe https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.name can be a regex; any concerns about making it one? | 16:40
pabelanger | The 2nd part of the design: we don't actually need to push the 300 projects using prepare-workspace to the node; only if rdoinfo has the depends-on header do we actually care about using that project in required-projects to then build an rpm. Trying to think of the best way to make the rsync more dynamic. Could I use zuul.items here? | 16:42
clarkb | pabelanger: maybe update zuul to have depends on populate required projects? | 16:50 |
clarkb | could be a job attribute (I think it may do this in some cases already though?) | 16:50 |
*** sambetts is now known as sambetts|afk | 16:51 | |
*** gtema has joined #zuul | 16:54 | |
pabelanger | I see: if zuul is able to find the project, but it's not in required-projects, auto-add it? | 16:54
pabelanger | when using depends-on | 16:54 |
clarkb | yeah an implicit required project mode based on depends on | 16:55 |
pabelanger | yah, that would actually work well here. the other concern was, if we added 300 required-projects, how do we keep it in sync when we add new distgit packages, aside from writing check jobs to help with validation | 16:58
fungi | could have some nasty side effects if parts of the job rely on required-projects to inform them of what versions of things to install (think tox-siblings) since it could result in deadlocking | 16:58 |
fungi | so we should make it non-default, or have a second required-projects-like var which they go into, or just document that you shouldn't rely on required-projects to only be influenced by explicit static job configuration | 16:59 |
fungi | leaning toward something like a required-projects-implicit list, and then have zuul merge and check those out on disk but keep the vars distinct so you know which were required by configuration and which by dynamic dependencies | 17:01 |
tobiash | corvus: do we already have a concept for buildset resources (nodes) or buildset lifetime of nodes? | 17:01
pabelanger | Yah, I can see that. I don't actually mind if we allow regex for required-projects then write some check job to deal with sync issues too | 17:02 |
pabelanger | the 300 projects would be ^openstack/*-distgit | 17:02 |
tobiash | corvus: our projects are starting to kill our artifactory with cached stuff which should be handed over to the next child job | 17:02 |
clarkb | pabelanger: my immediate thought on the repo sync is that we should always sync the required projects. Which is why I'm trying to come up with some way of influencing that list rather than special casing behavior around syncing | 17:03 |
clarkb | pabelanger: if you have asserted a repo is required then it should be synced | 17:03
tobiash | buildset resources or optional buildset lifetimes of nodes could be a good way to solve this use case much better than using a centralized hosting service | 17:03
pabelanger | clarkb: yah, I think if we added some logic or a new role to only push projects in zuul.items for this job, it might be something that works today | 17:04
clarkb | tobiash: something similar came up in the k8s discussion. A way to keep a build around for the lifetime of all other builds (so that it could host a registry iirc) | 17:04 |
*** gtema has quit IRC | 17:04 | |
pabelanger | if I understand zuul.items correctly | 17:04 |
*** hashar is now known as hasharAway | 17:04 | |
pabelanger | tobiash: yah, rdoproject just pushed to central log server today for artifacts, so really looking forward to container spec to help remove that dependency | 17:05 |
fungi | tobiash: at one point we had discussed providing (limited) scratch space on executors readable by other builds, arranging so dependent jobs all get their builds for a particular changeset run from the same executor, and expiring the shared scratch space when the last build in a dependent job set terminates | 17:05 |
tobiash | clarkb: yes, I remember this. Did this reach a point where I could start to implement such a functionality? | 17:05 |
clarkb | pabelanger: it will still have to update 300 repos for that job though, you are only optimizing half of the problem (the sync from executor to the other test nodes) | 17:06 |
fungi | oh, but the long-lived build also makes sense. especially if it can come with some sort of provider affinity | 17:06 |
clarkb | tobiash: I think the spec may have been updated to mention it, but unsure if anything more has been done | 17:06 |
pabelanger | clarkb: agree, downside to that | 17:07 |
fungi | build node provider affinity might also be easier to satisfy than provider-pinned executors | 17:07 |
fungi | and easier to scale | 17:07 |
pabelanger | clarkb: however, step 1 here is likely to solve the issue however we can today without zuul changes, then work to maybe implement what we discussed here | 17:08
fungi | we already have something similar for multi-node build provider affinity anyway | 17:08 |
clarkb | pabelanger: ya the downside is it means you have to support and deprecate that functionality in the repo syncing role | 17:09 |
openstackgerrit | Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails https://review.openstack.org/582690 | 17:09 |
pabelanger | clarkb: yup, and the job is broken today, so working to land the new feature (assuming we want to do it) might just be the best path here | 17:11
pabelanger | I'd like to hear from mordred / corvus too | 17:11 |
tobiash | clarkb, fungi: ah yes, it was in the container spec | 17:13 |
logan- | pabelanger: i'm confused why required-projects is being used for the repos that should only be cloned/synced when depends-on is present | 17:13 |
logan- | those will get cloned/synced due to the depends-on even if they are not in required projects | 17:14 |
clarkb | logan-: I thought we might do that but wasn't sure. | 17:16 |
tobiash | So I think buildset lifetime and provider affinity could be solved one after the other | 17:16 |
pabelanger | I also think I am leaving out a key piece of information, we also need to setup the https://zuul-ci.org/docs/zuul/user/config.html#attr-job.required-projects.override-checkout setting, because these projects don't have a master branch: https://review.rdoproject.org/r/13330/ | 17:17 |
pabelanger | we get the following error: ERROR Project review.rdoproject.org/openstack/networking-cisco-distgit does not have the default branch master | 17:17 |
tobiash | provider affinity in nodepool might be as easy as if a provider is requested, all other providers just decline the request | 17:17 |
corvus | wow, are we talking about two really complicated issues at the same time? | 17:17 |
fungi | so it seems | 17:17 |
tobiash | oh sorry, I'll defer my thoughs | 17:18 |
corvus | okay, i'm going to need like 10 minutes to untangle scrollback | 17:18 |
fungi | calls for a scrollback deinterlacer | 17:20 |
corvus | i'm doing it. i'll have an etherpad in a minute | 17:20 |
corvus | https://etherpad.openstack.org/p/ajP8DUX02S | 17:23 |
corvus | first time i've ever had to do that :) | 17:23 |
corvus | okay, i'm going to read the conversation about required-projects first | 17:23 |
clarkb | I'm sure slack would've solved that for us right? >_> | 17:24 |
corvus | /kick clarkb | 17:25 |
mordred | clarkb: wasn't chromakode working on a threaded chat client at one point? | 17:25 |
clarkb | mordred: ya I think he was involved with some startup that was going to fix chat problems. Like the other dozen startups all doing that :) | 17:25 |
clarkb | tl;dr it is a hard problem | 17:26 |
mordred | yup | 17:27 |
corvus | okay, i've read the required-projects chat, and my understanding is: i agree with logan-: depends-on should cause the repo to show up on disk, but pabelanger says that we *also* need a specific branch of that repo checked out. pabelanger, what branch do you want to have checked out? presumably these repos don't have the branch of the change (otherwise it would have checked them out). is the issue that you | 17:30 |
corvus | need a specific branch checked out, or that those projects need a default branch so that the job checks *something* out and the error goes away? | 17:30 |
corvus | pabelanger: try setting "default-branch" on the projects in question in their project stanzas. you can do this in a config-project. you can even do it with a regex project matcher, like "name: .*-distgit" so you don't have to type it 300 times. https://zuul-ci.org/docs/zuul/user/config.html#attr-project.default-branch | 17:33 |
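A sketch of the config-project stanza corvus describes; the rpm-master default branch is an assumption used here for illustration:

```yaml
# In a config-project, matching all of the packaging repos at once:
- project:
    name: .*-distgit
    default-branch: rpm-master
```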
pabelanger | corvus: we'd need a specific branch checked out; in the case of the error it is rpm-master, but we'd also need queens-rdo, pike-rdo. I am unsure of the history here for rdoproject, or why the branches are different | 17:34
pabelanger | corvus: ack, I can test that | 17:34 |
corvus | pabelanger: well, that doesn't apply if you need different branches checked out | 17:34 |
pabelanger | yes, I was first looking to fix the master error first, but rdoproject does development on the other branches too | 17:35 |
corvus | pabelanger: if you have a project (-distgit) with branches that correspond to an equal set of branches in another project, zuul can only handle that if they have the same name. | 17:36 |
corvus | pabelanger: what i would do is recognize the advantages of having corresponding branches of different projects have the same name and use that system going forward so that zuul and humans can both do the intuitive thing. pike == pike. :) | 17:37 |
corvus | pabelanger: but, if you need to map different branch names to each other, then i'd recommend setting the default-branch attribute to get rid of the error, then you can always manually check out the right branch in a pre-playbook. | 17:37 |
corvus | pabelanger: so you can say "if zuul.branch == pike, checkout pike-rdo" | 17:38 |
logan- | would https://zuul-ci.org/docs/zuul/user/config.html#attr-pragma be useful to bridge the branch name issue? | 17:38 |
corvus | pabelanger: you can even iterate over all of the non-required projects by iterating over zuul.projects and looking for required=false. those will be depends-on projects (or, possibly, the project of the change under test) | 17:38 |
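A rough sketch of such a pre-playbook task; the "<zuul.branch>-rdo" branch mapping and the -distgit name filter are assumptions based on the examples in this discussion:

```yaml
- hosts: all
  tasks:
    - name: Check out the matching -rdo branch in projects pulled in via Depends-On
      command: "git checkout {{ zuul.branch }}-rdo"
      args:
        chdir: "{{ ansible_env.HOME }}/{{ item.value.src_dir }}"
      with_dict: "{{ zuul.projects }}"
      when:
        - not item.value.required           # only non-required (depends-on) projects
        - item.key is search('-distgit$')   # only the packaging repos
```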
pabelanger | okay, thanks. Let me get error resolved first using defaul-branch, and confirm if other branches have an issue | 17:38 |
pabelanger | I don't really know yet | 17:39 |
corvus | logan-: it can do so for job definitions (ie, jobs on pike should apply to changes pike-rdo). and may have a place here just for that depending on what the job definitions look like. but i don't think it's a complete solution to the problem -- | 17:39 |
corvus | logan-: because it won't cause the pike-rdo versions of the repos to be checked out for a pike change | 17:39 |
logan- | ah | 17:40 |
corvus | logan-: *however*, if one were in a situation where you had branch variants of jobs and could specify required-projects for every branch, you could construct a mapping like that. so the pike job runs on changes to pike, and changes to pike-rdo, and its required-projects says: check out nova@pike and nova-distgit@pike-rdo. but that only works for things added to required-projects. | 17:41
corvus | so it's really close, but the 300 projects thing throws a wrench in it here :) | 17:42 |
corvus | pabelanger: okay, sounds like you've got the next one or two steps to take yeah? | 17:43 |
pabelanger | I do, thanks for help | 17:44 |
corvus | ok, i'll read the buildset conversation now :) | 17:44 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef https://review.openstack.org/583010 | 17:47 |
corvus | tobiash, clarkb: yeah, i think we can start with having jobs zuul_return something from the main playbook which says "i completed successfully, keep me running until my child jobs are finished, then run my post playbook". that's what i was getting at in the container spec, and should allow us to implement all kinds of things based on a create, wait, cleanup pattern. | 17:48 |
corvus | provider affinity makes that better, but isn't required for a first pass, and shouldn't interfere with the initial implementation. | 17:49 |
tobiash | corvus: ah so that would keep the job alive instead of just the node | 17:49 |
tobiash | corvus: that's an interesting idea | 17:49 |
*** GonZo2000 has quit IRC | 17:49 | |
corvus | tobiash: yep. that makes things like "create an object store container; child jobs use it; delete container" easy to do. | 17:50 |
corvus | that could be a zero-node job | 17:50 |
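A sketch of what that zuul_return call might look like from the run playbook; the pause key is hypothetical, since the feature is only being designed in this conversation:

```yaml
# Run playbook of the parent (e.g. container registry or cache) job:
- hosts: localhost
  tasks:
    - name: Ask Zuul to keep this build alive until its child jobs finish
      zuul_return:
        data:
          zuul:
            pause: true   # hypothetical key name
```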
tobiash | ok, I was more on the track of adding a leave-me-alive annotation to a node in a buildset, but your idea seems to be more powerful | 17:51
corvus | this *also* touches on SpamapS's idea of a cleanup job -- a job which runs after all other jobs are finished. but i think the idea of suspending then resuming the parent job may be able to accomplish the same things, and more, so may be the better approach. | 17:51 |
corvus | tobiash: to be fair, i think what i wrote in the spec suggests more of what you describe. i think i, erm, may have forgotten to write down this newly revised version. :) | 17:52 |
tobiash | corvus: so with that zuul_return info would you want to pause just at the end of the playbook that called this or just at the end of the run playbook? | 17:53 |
corvus | tobiash: i think only at the end of the run playbook? i think it will be difficult to figure out what to do if a job pauses twice, so if we say we can only do it after the run playbook, that makes things clearer? | 17:54 |
corvus | though, i guess we could say that we only pause the first time. | 17:55 |
tobiash | corvus: depends, we could also say pause will be done just once, | 17:55 |
corvus | whether that happens in a pre/run/post doesn't really matter, i guess... | 17:55 |
tobiash | corvus: except you want to attach the success status to this signal | 17:56 |
pabelanger | just catching up, +1 for zuul_return + keep running | 17:56 |
tobiash | corvus: but I don't think we need to attach the success status to this signal as that job still can just use zuul_return to forward any variable to the child jobs | 17:57 |
tobiash | corvus: so I have a slight preference to just let it pause once regardless at which playbook | 17:58 |
tobiash | (but also can live with end of run playbook) | 17:58 |
tobiash | detail question: would we add a new metric to the executor stats (jobs starting, running, paused)? | 17:59
corvus | tobiash: well, we still decide whether to run child jobs based on the result of the parent. right now, that's the final result. we'd be talking about moving that to either the run playbook, or any playbook. so if we allow it on any, then we can end up in a situation where the pre_playbook pauses with success and child jobs run, then the run playbook fails, so the job result switches to failure. that's | 18:00
corvus | weird, but, i guess okay? only allowing this from the run playbook avoids that situation. | 18:00 |
openstackgerrit | Adam Harwell proposed openstack-infra/zuul-jobs master: Make revoke-sudo work on base cloud-init images https://review.openstack.org/564674 | 18:01 |
tobiash | corvus: good point | 18:01 |
corvus | tobiash: a new metric sounds like a fine idea | 18:01 |
corvus | (we should also expose the job state in the api and the status page) | 18:01 |
* mordred likes pause-at-end-of-run | 18:02 | |
tobiash | uhm, more work, but yes, definitely | 18:02 |
tobiash | corvus: so with your point we should do pause only at end of run | 18:02 |
corvus | my gut says only do this for the run playbook. it's clear and should be sufficient. i think if we decide we want to allow it at other playbooks later, we make that change later. | 18:02 |
corvus | (if a use case comes up that pause-at-end-of-run can't handle) | 18:03 |
tobiash | however the first job still could switch to post failure | 18:03 |
tobiash | but I guess that's ok then in this case | 18:03 |
pabelanger | does parent unpause once child jobs are finished? | 18:03 |
corvus | yep. i feel like that's a fairly minor change. | 18:03 |
corvus | pabelanger: yes | 18:04 |
pabelanger | and we'd pause the parent watchdog too I assume | 18:04
tobiash | corvus: so to start, automatic unpause when all recursive children are finished? | 18:05
corvus | someday, someone is going to ask to be able to unpause before child jobs are finished. i'm sure we'll be able to accomodate that then. but until then, let's keep it simple and just unpause when all child jobs are finished. | 18:05 |
corvus | tobiash: exactly | 18:05 |
tobiash | corvus: that use case probably could be added easily also via zuul_return | 18:05 |
corvus | tobiash: ya | 18:05 |
tobiash | (later) | 18:05 |
pabelanger | yah, I don't see a need to unpause for the rdoproject use case | 18:05 |
tobiash | so I guess I will start tomorrow with that | 18:06 |
corvus | pabelanger: good question -- should the parent job timeout be paused? i think probably so. | 18:06 |
mordred | I think so too | 18:06 |
tobiash | because we'll kill our artifactory within the next two weeks probably without this feature :( | 18:06 |
corvus | tobiash: great, when i next update the spec, i'll clarify this section to match :) | 18:06 |
pabelanger | ++ | 18:07 |
* mordred is excited about this | 18:07 | |
pabelanger | exciting | 18:07 |
pabelanger | mordred: YES | 18:07 |
tobiash | mordred: excited about killing artifactory? ;) | 18:07 |
corvus | one more detail: i think currently aborted jobs don't run post-playbooks | 18:07 |
corvus | so you could end up creating containers but not cleaning them up | 18:08 |
tobiash | corvus: yes, they just stop | 18:08 |
tobiash | corvus: do you think for the container use case we need an 'always run this post playbook'-annotation in the job? | 18:09 |
clarkb | tobiash: corvus maybe a new cleanup-playbook: specifier | 18:10 |
tobiash | or that | 18:10 |
tobiash | definitly better than a post playbook annotation | 18:10 |
corvus | yeah, i think one of those would be useful. | 18:10 |
corvus | with cleanup-playbook, we need to define the nesting order (it has 4 dimensions now, which is harder to think about than 3 :). with annotation, we need to alter the yaml structure to allow for annotations (post-run is currently a simple list of strings; we'd have to make it a list of [string or dict]) | 18:11
corvus | i think if we did cleanup-playbook, maybe add cleanup playbooks before post playbooks at each level. | 18:13 |
corvus | like: pre-parent, pre-child, run, cleanup-child, post-child, cleanup-parent, post-parent | 18:14 |
corvus | the annotation approach would let you do that plus more options. | 18:15 |
corvus | i think either would work fine, just a matter of (a) whether we want the extra flexibility, and (b) whether we're going to end up needing annotations anyway in the future for some other change :) | 18:16 |
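A purely illustrative sketch of the annotation form under discussion; neither the dict entry nor the key exists at this point:

```yaml
post-run:
  - playbooks/upload-logs.yaml          # plain string entry, as today
  - playbook: playbooks/cleanup.yaml    # hypothetical dict form
    run-on-abort: true                  # hypothetical annotation
```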
mordred | corvus: I'm curious - why before post and not after post? | 18:16 |
pabelanger | I could see post-run always running regardless of aborts, and maybe expect users to use blocks with zuul_success; then we can add cleanup handler things into roles in an always section. But maybe too much work on the user side? | 18:17
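A sketch of that block/always pattern in a post-run playbook; the role names are hypothetical, while zuul_success is the variable Zuul already passes to post playbooks:

```yaml
- hosts: all
  tasks:
    - block:
        - name: Collect extra logs only when the build succeeded
          include_role:
            name: collect-extra-logs      # hypothetical role
          when: zuul_success | bool
      always:
        - name: Clean up resources the job created, success or failure
          include_role:
            name: cleanup-test-resources  # hypothetical role
```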
tobiash | mordred: after post could be difficult if you deregister the build ssh key in the post | 18:17 |
*** acozine1 has quit IRC | 18:17 | |
mordred | oh. good point | 18:17 |
corvus | mordred: because in the simplest case of just one job level (no child inheritance), you'd probably want "upload logs" to be last, and that's the only way you could do that. | 18:17 |
tobiash | mordred: and you might want to have logs about the cleanup | 18:17 |
corvus | pabelanger: oh, yes, that's another option. but we'd need to change a lot of existing jobs, i bet. | 18:18 |
clarkb | corvus: tobiash could also have cleanup be exclusive to post | 18:18 |
clarkb | have one or the other | 18:18 |
mordred | still gotta deal with inheritance hierarchy though | 18:18 |
clarkb | ya so parent pre, child pre, run, child cleanup, parent post type of deal? | 18:19 |
* mordred has been convinced of pre-run-cleanup-post as a sequence | 18:19 |
clarkb | not sure that is easier or more clear | 18:19 |
tobiash | hrm, the annotation would make the inheritance easy | 18:20 |
corvus | clarkb: true, but i don't think that gains us much (and loses us the ability to have a job with both a cleanup and regular post playbook; granted, you can still do the same thing with conditionals, but you may have to build more logic into the playbook than otherwise) | 18:20 |
pabelanger | corvus: yah, some jobs today (want to say tox) are already using zuul_success for log collection, but agree, we'd likely need some post-run clean up | 18:20 |
corvus | pabelanger: yeah. right now, for example, we're not uploading logs for aborted builds, just because the playbook isn't running. | 18:21 |
*** jiapei has quit IRC | 18:23 | |
corvus | tobiash: i lean ever so slightly towards the annotation idea, because it's more future proof, and because it keeps the pre/run/post sequence looking simple (but intuitively accommodates more complexity when needed) | 18:23
tobiash | corvus: shall we use storyboard for noting such ideas? | 18:26 |
tobiash | I find it hard to find them after weeks buried deep in the backlog | 18:26 |
corvus | tobiash: i thought you were writing this tomorrow? :) | 18:26 |
tobiash | corvus: that were two ideas :) | 18:27 |
corvus | i think we'll find quickly that pausing jobs will require cleanup playbooks :) | 18:27 |
tobiash | yes, but technically they're independent of each other | 18:27 |
corvus | tobiash: storyboard is a fine place for such ideas, though the first half of this idea is in the container spec | 18:29 |
tobiash | corvus: I'll need pausing jobs now to avoid being killed in the next two weeks, and after I've survived I can volunteer to implement the cleanup if nobody else has taken that task | 18:29
corvus | tobiash: you don't need a cleanup for artifactory? | 18:30 |
tobiash | corvus: ok, I think that both may fit well into the container spec | 18:30 |
tobiash | corvus: we do that by annotating expiry dates to the artifacts and asynchronously deleting expired stuff | 18:30 |
corvus | tobiash: or are you going to do the thing described in the container spec and run a service on a node for the duration of the buildset? | 18:30 |
corvus | tobiash: so what are you going to use the pause for? | 18:31 |
tobiash | corvus: my plan is to leave a lightweight node with the first job running serving the short lived cache instead of artifactory | 18:31 |
corvus | ah, ok. so yeah, that's the model described in the container spec, and i agree, it doesn't need cleanup (deleting the node is sufficient) | 18:32 |
tobiash | corvus: like a prepare-workspace job that gets the synced source and git-lfs data (several GB) | 18:32 |
tobiash | and all the other jobs get their data from that node and push their data to that node | 18:32 |
tobiash | then that's not hitting artifactory at all and the network traffic is more decentral in the cloud and not targeted to a single load balancer | 18:33 |
corvus | having said all of that, fungi made a point earlier that we could engineer inter-job scratch space on the executors after we add executor affinity. | 18:33 |
corvus | but, i think in the long run, having both of these options will be good. and 'pause' is probably both easier to implement, and also useful for the container work. | 18:34 |
mordred | ++ | 18:34 |
tobiash | and in my case probably distributes the network traffic better | 18:34 |
fungi | yeah, i like the resource build idea anyway since it's more flexible | 18:34 |
mordred | I think the scratch space from pause seems more potentially scalable, since the space can grow with job nodesets as needed ... but I could also see executor scratch space with affinity being a thing too | 18:35 |
mordred | also - scratch space from pause will work soon (potentially) for tobias - and just be inefficient for openstack/multi-cloud scenarios until affinity is done | 18:35 |
corvus | (pause, i'll note, will benefit from *provider* affinity too, and that may be effectively required for the container use case in some environments, but isn't *strictly* required like executor affinity is for scratch space) | 18:35 |
fungi | the scratch space on executor model wins on simplicity but mostly only handles the one use case of dependent jobs sharing artifacts | 18:35 |
mordred | corvus: jinx | 18:35 |
corvus | mordred: :) | 18:35 |
fungi | and is also yet one more place to run into executor scaling issues | 18:36 |
fungi | if provider build affinity gets implemented then a resource build could theoretically handle sharing very large ephemeral artifacts between jobs with decent performance | 18:37 |
corvus | yep | 18:37 |
fungi | whereas executor scratch space would need to be tightly constrained to avoid creating a denial of service scenario | 18:37 |
fungi | (not just in terms of disk space but also bandwidth consumption) | 18:38 |
fungi | and to get similar network performance you'd need provider-specific executors, which is yet one more scaling axis to manage | 18:39 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Add role for installing docker and configuring registry mirror https://review.openstack.org/580730 | 18:39 |
tobiash | ok, so the plan is to implement job pause, then cleanup, then provider affinity? | 18:40 |
fungi | and probably we could have roles available in the zuul-jobs stdlib to set up arbitrary storage during a resource build and pass around the necessary credentials, so in the end it _could_ be made just as easy as the scratch space on executor idea, i think | 18:41 |
corvus | tobiash, fungi: ++ | 18:46 |
tobiash | fungi: the credentials (ssh key?) could also be passed via zuul_return to the child jobs | 18:48 |
tobiash | so ++ for zuul-jobs | 18:48 |
fungi | tobiash: yes, that's what i had in mind, just thinking we could orchestrate the handling of it via zuul_return in said role(s) | 18:48 |
tobiash | yes good idea | 18:49 |
tobiash | corvus: regarding provider affinity, would you request that via an info in zuul_return (on demand) or in the project pipeline (static)? | 18:54 |
* mordred would vote for project pipeline - so that zuul/nodepool would know at the beginning that they might need to allocate all the nodes for a job graph in the same provider (might be important to know from a capacity perspective) | 18:56 | |
mordred | like, if parent (2 nodes) + 2 children (4 each) need a total of 10 nodes and one of the providers only has 8 nodes of capacity - letting the parent schedule there then request affinity via zuul_return would potentially lead to a completely stuck situation | 18:58 |
mordred | s/capacity/total capacity/ | 18:58 |
mordred | but I'm just thikning out loud | 18:58 |
tobiash | mordred: hrm, that would require a credit card like request model, take 5 nodes now but make sure you can fulfill 13 nodes | 18:59 |
corvus | tobiash, mordred: project-pipeline (static) means that we can provide the most information to nodepool. whether we make use of it now or not is a separate question. it would let us do the really sophisticated thing that mordred describes in the future if we want, but we can still do a simpler version (request child jobs in the same provider as parent if they indicate they need it) as well. | 19:27 |
tobiash | corvus: just had a further idea: with a prepare-workspace job that pauses, it might reduce load on the executor if we could tag a job to skip setting up non-playbook/roles repos | 19:29
tobiash | regarding provider affinity, I would maybe tag the jobs in the project pipeline with a provider group (an arbitrary user-chosen value) to indicate that this set of jobs needs to run on the same provider. With this information we could easily validate whether the whole group could in theory be satisfied by the provider (the abs quota check in nodepool) | 19:39
*** rcarrill1 has joined #zuul | 19:49 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Don't write docker proxy config if docker_mirror undef https://review.openstack.org/583010 | 19:50 |
*** rcarrillocruz has quit IRC | 19:51 | |
mordred | Shrews: ^^ woot! | 19:53 |
mordred | Shrews: I now expect a bazillion patches to land all in a row | 19:54 |
Shrews | mordred: was that the dependency for the pbrx jobs? | 19:54 |
mordred | yup | 19:56 |
Shrews | oh, that was that one's parent, actually | 19:56 |
mordred | yeah | 19:56 |
mordred | that one there is really just a cleanup | 19:57 |
*** sshnaidm|bbl has joined #zuul | 20:00 | |
corvus | tobiash: could you, today, make a new base job which didn't copy the workspaces over, and inherit from that for jobs which shouldn't do that? | 20:21 |
pabelanger | there was a good idea from mordred about adding a new group into inventory that was something like skip_git, then we update prepare-workspace to run on hosts: all,!skip_git and repos shouldn't get pushed. But I haven't tested that yet. | 20:23 |
pabelanger | but there is also a need for that workflow in rdoproject to help save some IO / time | 20:23 |
tobiash | corvus: we have such a base job but I mean in this case we don't even have to prepare all repos on the executor | 20:23 |
tobiash | pabelanger: our base job just reacts on a skip_synchronization variable that can be set on a job or even parts of the nodes in a nodeset | 20:25 |
tobiash | pabelanger: you don't need groups for that | 20:25 |
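A sketch of the variable-based approach tobiash describes, as it could look in a base job's pre playbook; the variable name is his, the rest is illustrative:

```yaml
- hosts: all
  tasks:
    - name: Push the prepared git repos to the node unless the job opts out
      include_role:
        name: prepare-workspace
      when: not (skip_synchronization | default(false))
```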
openstackgerrit | Goutham Pacha Ravi proposed openstack-infra/zuul-jobs master: Attempt to copy the coverage report even if job fails https://review.openstack.org/582690 | 20:27 |
dmsimard | corvus: not entirely sure what that job runs but it's probably worth considering sending some of that output to log files instead of stdout so that the job logs and the ara-report are manageable | 20:47
corvus | dmsimard: the job only emits output on test failure because the output is so big. all the tests failing is a pathological case. | 20:47 |
dmsimard | oh, so it's not generally that big -- got it | 20:48 |
corvus | ya. normal case is, say 0-5 test failures :) | 20:48 |
corvus | (for channel context, this is in re ara's performance with a very large sqlite database) | 20:49 |
*** hasharAway has quit IRC | 20:58 | |
SpamapS | Oh interesting... a parent job that can say "I've done the things children might need" and then pause and wait for the children to finish. I like that, and the implementation would be pretty simple I think, since you could just use SIGSTOP/SIGCONT | 21:07 |
mordred | SpamapS: you're a SIGCONT | 21:07 |
SpamapS | Or a control socket or something else I suppose. | 21:07 |
SpamapS | mordred: A feckless SIGCONT? | 21:08 |
* SpamapS should probably have SIGSTOP'd himself there. | 21:08 | |
*** samccann has quit IRC | 21:12 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars https://review.openstack.org/583047 | 21:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Build container images using pbrx https://review.openstack.org/580160 | 21:28 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Specify a prefix for building the images https://review.openstack.org/582396 | 21:28 |
*** ianw_pto is now known as ianw | 21:44 | |
openstackgerrit | Merged openstack-infra/zuul master: Update bindep file with compile profiles https://review.openstack.org/580159 | 21:50 |
openstackgerrit | Merged openstack-infra/zuul master: Add alpine packages to bindep.txt https://review.openstack.org/582276 | 21:58 |
*** harlowja has joined #zuul | 21:59 | |
*** jpena|off has quit IRC | 22:33 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul master: Install less than alpine-sdk https://review.openstack.org/583062 | 22:35 |
tristanC | mordred: should i look into using a TenantName singleton service to query api/info and manage the zuul_api_root_url? | 22:45
tristanC | mordred: and update all the components to wait for the singleton service to be set up... | 22:46
tristanC | mordred: i was thinking we could have a tenant drop-down list, like the project list in horizon, where you could just switch tenants if many are available | 22:46
tristanC | hum, but that wouldn't work with the '/t/{tenant}/page.html' routing... | 22:47 |
tristanC | mordred: what do you think would be the easiest fix for the current ui issue? | 22:48 |
tristanC | well, i volunteer to fix that bug as it seems like a release blocker, but are there other blockers i can work on? | 22:59 |
*** harlowja has quit IRC | 23:03 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs https://review.openstack.org/581204 | 23:45 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: upload-logs: generate a script to download logs https://review.openstack.org/581204 | 23:55 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Put ubuntu_gpg_key into defaults instead of vars https://review.openstack.org/583047 | 23:58 |