*** rfolco has quit IRC | 00:17 | |
*** paladox is now known as paladox_UK_IN_EU | 00:20 | |
*** rfolco has joined #zuul | 00:23 | |
*** rf0lc0 has joined #zuul | 00:25 | |
*** rfolco has quit IRC | 00:27 | |
*** rf0lc0 has quit IRC | 00:29 | |
openstackgerrit | Paul Belanger proposed zuul/zuul-jobs master: Revert "Make ara-report role to zuul_return an artifact" https://review.opendev.org/703902 | 00:30 |
---|---|---|
pabelanger | AJaeger: ianw: fbo: can we revert^? We are seeing an error in zuul.ansible.com: https://object-storage-ca-ymq-1.vexxhost.net/v1/a0b4156a37f9453eb4ec7db5422272df/ansible_18/18/addd84d133eddfb9966e633686d6346d49fa0ef1/check/ansible-test-network-integration-iosxr-netconf-python36/123ceb7/job-output.html#l4577 | 00:31 |
pabelanger | that looks to have been merged today | 00:31 |
pabelanger | this is because, we don't compress ARA html | 00:32 |
pabelanger | so ara_generated is undefined | 00:32 |
*** rf0lc0 has joined #zuul | 00:32 | |
clarkb | pabelanger: that particular error is probably bceause you ansible 2.9 though right? it should be not foo is skipped | 00:34 |
clarkb | or foo is not skipped? something like that | 00:34 |
clarkb | since ansible decided that you can no longer use the | filter syntax on conditionals | 00:34 |
pabelanger | maybe, I haven't looked too much | 00:35 |
pabelanger | just noticed the error, and proposed revert | 00:35 |
pabelanger | i don't think our jobs are breaking, but looking to confirm | 00:35 |
clarkb | pretty sure that would fix it | 00:35 |
clarkb | maybe you can test locally and confirm? | 00:35 |
ianw | the same thing is in the line above, how come it doesn't cause an error there? | 00:36 |
pabelanger | clarkb: I can tomorrow, just wrapping up for night. Hence the revert, otherwise would push up patch | 00:36 |
clarkb | ianw: oh interesting | 00:36 |
pabelanger | ianw: ara_compress_html | bool | 00:37 |
pabelanger | is false | 00:37 |
pabelanger | and defined | 00:37 |
pabelanger | ara_generate isn't defined | 00:37 |
pabelanger | we can add to default vars | 00:37 |
pabelanger | but is registered on line56 | 00:37 |
ianw | yeah, i guess it never gets evaluated in this context, but if it's a syntax error it's still an issue? | 00:38 |
clarkb | | bool is also no longer valid iirc | 00:38 |
clarkb | oh wait I bet | bool actually is a filter not a conditional | 00:38 |
clarkb | because it converts the input to a bool | 00:38 |
*** rf0lc0 has quit IRC | 00:38 | |
ianw | i'd agree on a quick revert while we think about that, mind if i just trim the commit message? | 00:38 |
clarkb | what makes this confusing is that conditionals and filters were used the same before and three isn't a super clear "this is a conditional vs this is a filter" determination | 00:38 |
clarkb | so when you are writing the yaml you have to think on it a bit | 00:39 |
pabelanger | ianw: go for it | 00:39 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: Revert "Make ara-report role to zuul_return an artifact" https://review.opendev.org/703902 | 00:39 |
pabelanger | k, have to drop now | 00:40 |
pabelanger | thanks! | 00:40 |
clarkb | I'm 99% sure that changing it to is skipped will fix it (but have to fix the other ones that ianw points out too | 00:41 |
*** jamesmcarthur has joined #zuul | 00:57 | |
*** wxy-xiyuan has joined #zuul | 01:20 | |
*** paladox_UK_IN_EU is now known as paladox | 01:24 | |
*** jamesmcarthur has quit IRC | 01:36 | |
*** jamesmcarthur has joined #zuul | 01:36 | |
*** jamesmcarthur has quit IRC | 01:42 | |
*** jamesmcarthur has joined #zuul | 01:42 | |
*** jamesmcarthur has quit IRC | 01:55 | |
*** jamesmcarthur has joined #zuul | 01:55 | |
*** jamesmcarthur has quit IRC | 02:01 | |
*** yolanda has quit IRC | 02:12 | |
*** jamesmcarthur has joined #zuul | 02:14 | |
*** yolanda has joined #zuul | 02:17 | |
*** yolanda has quit IRC | 02:27 | |
*** yolanda has joined #zuul | 02:34 | |
*** jamesmcarthur has quit IRC | 02:38 | |
*** jamesmcarthur has joined #zuul | 02:42 | |
*** jamesmcarthur has quit IRC | 02:47 | |
*** jamesmcarthur has joined #zuul | 02:50 | |
*** sgw has quit IRC | 02:50 | |
*** jamesmcarthur has quit IRC | 02:54 | |
*** jamesmcarthur has joined #zuul | 02:59 | |
*** bhavikdbavishi has joined #zuul | 03:04 | |
*** bhavikdbavishi1 has joined #zuul | 03:06 | |
*** jamesmcarthur has quit IRC | 03:07 | |
*** jamesmcarthur has joined #zuul | 03:07 | |
*** bhavikdbavishi has quit IRC | 03:08 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 03:08 | |
*** rlandy|bbl is now known as rlandy | 03:17 | |
*** sgw has joined #zuul | 03:29 | |
*** jamesmcarthur has quit IRC | 03:53 | |
*** jamesmcarthur has joined #zuul | 03:54 | |
*** zxiiro has quit IRC | 03:56 | |
*** jamesmcarthur has quit IRC | 04:15 | |
*** jamesmcarthur has joined #zuul | 04:16 | |
*** jamesmcarthur has quit IRC | 04:21 | |
*** jamesmcarthur has joined #zuul | 04:46 | |
*** mattw4 has joined #zuul | 04:48 | |
*** mattw4 has quit IRC | 04:53 | |
*** jhesketh has quit IRC | 04:54 | |
*** jamesmcarthur has quit IRC | 04:54 | |
*** jhesketh has joined #zuul | 04:55 | |
*** stevthedev has quit IRC | 05:28 | |
*** stevthedev has joined #zuul | 05:28 | |
*** evrardjp has quit IRC | 05:34 | |
*** evrardjp has joined #zuul | 05:34 | |
*** reiterative has quit IRC | 05:41 | |
*** wxy-xiyuan has quit IRC | 05:49 | |
*** jamesmcarthur has joined #zuul | 05:50 | |
*** jamesmcarthur has quit IRC | 05:55 | |
*** swest has joined #zuul | 05:56 | |
*** jamesmcarthur has joined #zuul | 05:58 | |
*** jamesmcarthur has quit IRC | 06:03 | |
*** raukadah is now known as chkumar|rover | 06:09 | |
openstackgerrit | Merged zuul/zuul master: bindep: fixed wrong dep names on rpm platform https://review.opendev.org/703403 | 06:27 |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "Make ara-report role to zuul_return an artifact" https://review.opendev.org/703902 | 06:45 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Fix ansible-2.9 skipped problem https://review.opendev.org/703922 | 06:48 |
AJaeger | pabelanger, ianw, one more skipped problem - wonder why you did not hit that before, pabelanger ? ^ | 06:48 |
AJaeger | reading backscroll, clarkb, anything else needed? ^ | 06:53 |
*** themroc has joined #zuul | 07:44 | |
*** themr0c has joined #zuul | 07:48 | |
*** themroc has quit IRC | 07:50 | |
*** swest has quit IRC | 07:58 | |
*** fungi has quit IRC | 07:59 | |
*** gouthamr has quit IRC | 07:59 | |
*** frickler has quit IRC | 07:59 | |
*** Diabelko has quit IRC | 07:59 | |
*** ssbarnea has quit IRC | 08:01 | |
*** irclogbot_1 has quit IRC | 08:04 | |
*** jpena|off is now known as jpena | 08:05 | |
*** irclogbot_3 has joined #zuul | 08:06 | |
*** tosky has joined #zuul | 08:13 | |
*** avass has joined #zuul | 08:18 | |
*** swest has joined #zuul | 08:45 | |
*** fungi has joined #zuul | 08:45 | |
*** gouthamr has joined #zuul | 08:45 | |
*** frickler has joined #zuul | 08:45 | |
*** Diabelko has joined #zuul | 08:45 | |
*** openstackstatus has quit IRC | 08:48 | |
*** saneax has joined #zuul | 09:04 | |
*** coldtom has joined #zuul | 09:14 | |
openstackgerrit | Clément Mondion proposed zuul/nodepool master: add tags support for aws provider https://review.opendev.org/703651 | 09:25 |
*** pcaruana has joined #zuul | 09:30 | |
openstackgerrit | Clément Mondion proposed zuul/nodepool master: add tags support for aws provider https://review.opendev.org/703651 | 09:32 |
*** bhavikdbavishi has quit IRC | 09:59 | |
*** sshnaidm|afk is now known as sshnaidm | 10:16 | |
*** reiterative has joined #zuul | 10:38 | |
reiterative | Quassel occasionally fails to auto-identify for me on freenode before trying to rejoin a channel that requires it. Is there any way to finesse this or is it just something I'll have to live with? | 10:43 |
reiterative | Oops wrong channel! (But I guess someone here may know...) | 10:44 |
*** reiterative has quit IRC | 10:53 | |
*** reiterative has joined #zuul | 10:53 | |
*** hashar has joined #zuul | 11:32 | |
*** bhavikdbavishi has joined #zuul | 11:51 | |
*** jpena is now known as jpena|lunch | 12:00 | |
*** ssbarnea has joined #zuul | 12:00 | |
*** rf0lc0 has joined #zuul | 12:01 | |
*** electrofelix has joined #zuul | 12:02 | |
*** avass has quit IRC | 12:08 | |
AJaeger | we have a couple of legacy maven build and upload jobs - has anybody worked on Zuul v3 ones so that we can replace the legacy ones? | 12:10 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Fix mqtt log url reporting when report-status-page is active https://review.opendev.org/703983 | 12:27 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Fix mqtt log url reporting when report-build-page is active https://review.opendev.org/703983 | 12:30 |
hashar | AJaeger: hi, those maven jobs are just for the jenkins plugins arent they? | 12:38 |
hashar | x/gearman-plugin comes to mind | 12:38 |
hashar | and you archived x/zmq-event-publisher already great ;) | 12:40 |
AJaeger | hashar: and for monasca | 12:52 |
AJaeger | hashar: search in openstack-zuul-jobs for mvn | 12:53 |
*** bhavikdbavishi1 has joined #zuul | 13:00 | |
*** bhavikdbavishi has quit IRC | 13:02 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 13:02 | |
*** bhavikdbavishi has quit IRC | 13:08 | |
*** bhavikdbavishi has joined #zuul | 13:10 | |
*** jpena|lunch is now known as jpena | 13:16 | |
pabelanger | AJaeger: will test this morning | 13:32 |
pabelanger | AJaeger: ah, won't be able to test, trusted job. I think we also want to add a test job to help validate | 13:41 |
* AJaeger won't have time for writing one the next days, happy to take one nevertheless | 13:44 | |
*** jpena is now known as jpena|off | 14:12 | |
*** jpena|off is now known as jpena | 14:20 | |
openstackgerrit | wes hayutin proposed zuul/zuul-jobs master: bindep: Add missing virtualenv and fixed repo install https://review.opendev.org/693637 | 14:31 |
*** saneax has quit IRC | 14:36 | |
*** ffloreth__ has joined #zuul | 14:48 | |
*** themr0c has quit IRC | 14:48 | |
*** hashar has quit IRC | 14:50 | |
*** jamesmcarthur has joined #zuul | 14:53 | |
*** ffloreth__ has quit IRC | 14:54 | |
*** jamesmcarthur has quit IRC | 14:57 | |
*** jamesmcarthur has joined #zuul | 15:00 | |
*** pcaruana has quit IRC | 15:04 | |
corvus | reminder: call about zuul-operator in ~2 hours (1700 utc) | 15:08 |
openstackgerrit | Merged zuul/zuul-jobs master: Fix ansible-2.9 skipped problem https://review.opendev.org/703922 | 15:17 |
Shrews | swest: with your annotated logger changes, should we consider removing duplicate info from annotated log entries? for example: "[node_request: 100-0000000000] Locking request 100-0000000000" Probably don't need the request ID in there twice now | 15:18 |
*** Goneri has joined #zuul | 15:18 | |
Shrews | tobiash: maybe you have thoughts on that too ^^ | 15:18 |
corvus | i think it would be fine to change tha to "[node_request: 100-0000000000] Locking request" | 15:20 |
corvus | that reads fine in the text logs; i'm assuming it would be fine in something that reads structured logs. but that's where input from swest and tobiash would be helpful | 15:22 |
corvus | (ie, if it were in something like logstash, would you have to make sure the right columns were enabled to get the full picture?) | 15:22 |
Shrews | yeah, it's mostly structured log parsing that i'm worried about | 15:24 |
*** plaurin has joined #zuul | 15:30 | |
plaurin | Hello people! | 15:30 |
plaurin | clarkb: Hey sorry I left quite unexpectedly yesterday after fixing the nodepool-kubernetes issue | 15:31 |
*** ffloreth__ has joined #zuul | 15:31 | |
plaurin | Not sure what you guys talked about later in the conversation. I don't use IRC a lot so not sure how I can go back to see what was talked about | 15:32 |
Shrews | plaurin: we have channel logs: http://eavesdrop.openstack.org/irclogs/%23zuul/ | 15:33 |
tobiash | corvus, Shrews: I agree, we also still have the full text (minus logger, timestamp and log level) as one message and have additonally those fields as extra labels | 15:33 |
plaurin | also changed nickname, not sure why but I cannot login with plaurin34 anymore.. | 15:33 |
plaurin | oh great thx Shrews | 15:33 |
pabelanger | plaurin: re nick, https://freenode.net/kb/answer/registration will allow you to register the nick to yourself | 15:34 |
pabelanger | then, if somebody is using it, ghost it back | 15:35 |
tobiash | so it will work just fine for us if we remove that duplicated info | 15:35 |
Shrews | tobiash: great. less bytes in the log | 15:35 |
plaurin | I might just use this one right now "plaurin" I just needed to reset my password. I guess we can get rid of plaurin34 | 15:35 |
corvus | tobiash: so whatever you're using to show the logs, shows the extra labels conveniently? | 15:35 |
tobiash | corvus: currently we parse those eytra labels but don't remove from the message | 15:36 |
plaurin | pabelanger, corvus, tristanC: Thanks for your help yesterday. So basically I got passed the Gathering Fact issue in the job. But now the "prepare workspace" step cannot work because it uses "synchronise" module because it doesn't work with kubectl connection | 15:37 |
tobiash | So we see them just as text log but still can filter them efficiently by label | 15:37 |
tobiash | and if we wanted we could filter them from.the message as well and have them just as label | 15:38 |
tobiash | that's just a matter of logstash config :) | 15:38 |
corvus | tobiash: ok, i wasn't sure how convenient that was for you | 15:38 |
pabelanger | plaurin: I believe we still need to create prepare-workspace-k8s in zuul-jobs | 15:39 |
pabelanger | so far, I have found https://review.opendev.org/631402/ for prepare-workspace-openshift | 15:39 |
plaurin | yeah I am creating one for myself right now called prepare-workspace-kubernetes | 15:40 |
pabelanger | ++ | 15:40 |
plaurin | thx I'll take a look | 15:40 |
corvus | tristanC, plaurin, pabelanger: would https://review.opendev.org/631402 work on k8s too? | 15:40 |
pabelanger | would have to defer to tristanC | 15:41 |
plaurin | corvus: I guess maybe with some modifications yes (probably remplace the 'oc' in rsync.yaml command, from a first quick look | 15:41 |
corvus | i can't remember if oc works against plain k8s | 15:42 |
corvus | plaurin: at the very least, that's a role you can copy/replace :) | 15:42 |
corvus | if oc does work against plain k8s, maybe it will run as-is | 15:42 |
corvus | tobiash, pabelanger: ^ should we merge 631402? | 15:43 |
plaurin | 'oc' have rsync, kubectl doesn't. kubectl has 'cp' that might come in handy | 15:43 |
tobiash | corvus: I'll look in a bit | 15:44 |
tristanC | corvus: i assume oc rsync only uses vanilla k8s api | 15:44 |
plaurin | I'm a bit surprised I tought zuul had kubernetes natively working for a long time. | 15:45 |
corvus | me too | 15:45 |
plaurin | I guess we'll make it so :P | 15:45 |
plaurin | I never contributed directly to opendev but if I can help :) | 15:45 |
*** ffloreth__ has quit IRC | 15:46 | |
tristanC | tobiash: corvus: well, 631402 works, but to support both openstack and k8s from a single base job, here is what i wrote: https://softwarefactory-project.io/cgit/third-party-ci-config/tree/playbooks/upstream-base-pod/pre.yaml (as documented and demonstrated in https://review.opendev.org/687435 ) | 15:46 |
tristanC | tobiash: corvus: thus we may want to make prepare-workspace works in both situation instead, but there is also the issue of the build-ssh-key which doesn't work on k8s... | 15:47 |
tristanC | tobiash: corvus: and more generally, as explained on http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-September/001015.html , to use zuul-jobs with kubectl we really need https://review.opendev.org/#/q/topic:zuul-jobs-with-kubectl too | 15:48 |
tobiash | corvus: according to docs oc is kompatible with kubectl as long no openshift only feature is used | 15:48 |
tristanC | which makes me wonder if it wouldn't be easier to have a zuul/zuul-jobs-containers repository ... | 15:49 |
*** chkumar|rover is now known as raukadah | 15:49 | |
corvus | tristanC: what would zuul-jobs-containers do? | 15:50 |
tobiash | I think I'd prefer adding k8s support to the eyisting roles | 15:50 |
tobiash | in case build-ssh-key that probably would be just a ignore | 15:51 |
pabelanger | As a person doing network and linux testing in same base playbooks, it is hard sometimes to deal with edge cases using 'all'. I've ended up creating a new group called 'appliance' to run specific playbooks against it for roles. | 15:51 |
corvus | yeah, that sounds pretty straightforward. | 15:51 |
tristanC | tobiash: me too, but it is already four month since the issue is documented and the fixes are up for reviews | 15:51 |
corvus | tristanC: if i may make a polite suggestion -- tell us what you want to have happen | 15:52 |
corvus | tristanC: getting big changes done isn't just pushing up changes and waiting for folks to review them | 15:52 |
corvus | tristanC: we need you to drive the plan from start to finish | 15:52 |
corvus | tristanC: so, tell us right now what's the next thing you need to have happen to make k8s better | 15:52 |
tristanC | corvus: well, we have been waiting for follow-up on http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-September/001015.html | 15:52 |
corvus | tristanC: http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-September/001020.html i think mordred agreed with your fetch-output plan | 15:54 |
corvus | plaurin: based on what tobiash says about oc probably working against k8s, are you interested in trying out 631402 as written and testing that? | 15:55 |
plaurin | sure thing | 15:57 |
corvus | tristanC: so i guess the next step is review topic:zuul-jobs-with-kubectl ? | 15:57 |
plaurin | "as written" however not sure how this can work, I don't use openshift / oc | 15:59 |
tristanC | corvus: yes please, that's the biggest issue | 15:59 |
plaurin | basically I don'T have the oc cli | 15:59 |
tristanC | corvus: as asked http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2020-01-13.log.html#t2020-01-13T21:16:17 , is the zuul_use_fetch_output toggle still needed? | 16:00 |
corvus | plaurin: iiuc, if you have "oc" installed on the executor, that job should work, so you'd need to install the oc client | 16:00 |
plaurin | I see, yes I can test it however I don't think the end result should depend on oc for vanilla k8s | 16:01 |
plaurin | I understand this patch is specifically for openshift | 16:01 |
tristanC | tobiash: corvus: also, since we are introducing behavior such as mitogen, we could also make https://review.opendev.org/#/c/681553/ opt-in, then we wouldn't have to change the zuul-jobs | 16:01 |
corvus | plaurin: well, if it works for both, we can use it for both | 16:01 |
corvus | who's adding mitogen? | 16:01 |
tristanC | corvus: https://review.opendev.org/#/c/657024/ | 16:02 |
plaurin | imho we cannot assume everyone want to depend on oc cli | 16:02 |
corvus | plaurin: is there a particular reason why not? | 16:03 |
plaurin | well for my part I wasn't aware of the existence of this oc cli before today actually | 16:03 |
corvus | we can document why its used and how to install it | 16:03 |
tristanC | plaurin: afaik, oc is the only cli that support a compatible interface for the synchronize feature we need | 16:03 |
plaurin | I think kubectl cp can work too for people like me not using os/openshift | 16:04 |
plaurin | ' oc | 16:04 |
tobiash | if it's documented it shouldn't matter if one installs kubectl or oc, as it's basically the same | 16:04 |
corvus | tristanC: i would not say we have committed to doing anything with mitogen at this point | 16:05 |
plaurin | I use k3s from rancher. it depends on vanialla kubectl. I understand oc is juste a cli but I might not want to depend on it | 16:05 |
tristanC | corvus: alright, then i think the cleanest thing to do is ensure the zuul-jobs do not use synchronize | 16:06 |
*** jpena is now known as jpena|off | 16:06 | |
tristanC | plaurin: i'm not sure kubectl cp is really designed to upload potentienlla big directory with lots of files, but there could be a 'prepare-k8s-workspace' role that use it | 16:07 |
plaurin | yes, that's what I think would be the best | 16:07 |
plaurin | huh I need a red hat account to get the oc cli?? | 16:08 |
tristanC | plaurin: nop, you can download it from https://www.okd.io/download.html#oc-platforms | 16:08 |
plaurin | got it | 16:09 |
tristanC | plaurin: it is 100% free software, and you can rebuild it from source too if you prefer | 16:09 |
openstackgerrit | Andreas Jaeger proposed zuul/zuul-jobs master: Update roles/prepare-workspace-openshift/README.rst https://review.opendev.org/704017 | 16:09 |
corvus | tristanC: yeah. separate from mitogen, i think 681553 is clever, and we certainly have modified other modules (the command module!). i think the thing is that every time we do that we make upgrading ansible harder, so it's a big ongoing cost for us, and it's hidden. | 16:09 |
tristanC | corvus: though, i also with clarkb initial concern, a zuul-jobs playbook may not run with a vanilla ansible if we land 681553. while the command module modification shouldn't affect re-use | 16:11 |
plaurin | seems oc does pretty much the same thing as kubectl what is it for? | 16:11 |
tristanC | agree* with clarkb | 16:11 |
plaurin | / diff | 16:11 |
tristanC | plaurin: the rsync sub-command takes care of managing rsync or tar subprocess on the container | 16:11 |
tobiash | plaurin: oc is essantially kubectl but supports additional features that openshift provides on top of kubernetes | 16:12 |
tristanC | enabling efficient synchronization of files between the executor and the pod | 16:12 |
plaurin | so it's a wrapper around kubectl to add more features? | 16:12 |
tobiash | I'm not sure if it's a wrapper or a fork | 16:13 |
plaurin | okay | 16:13 |
tristanC | plaurin: yes, but it's not a wrapper, nor a fork, it just integrate the kubernetes client code and adds more functions to it | 16:13 |
plaurin | let me load that into my zuul server | 16:13 |
corvus | that's apparently how things work with go :) vendor and enhance... | 16:14 |
plaurin | I guess I just need to put the oc bin into my /usr/bin | 16:14 |
tristanC | plaurin: yes | 16:14 |
corvus | tristanC: yeah, that's a good additional point for 553 | 16:14 |
corvus | tristanC: regarding the question about whether the flag is needed -- iiuc, the plan is to add the flag to all the jobs, ask people to add fetch-output to their base jobs, switch the default, remove the flag. does that sound right? | 16:15 |
plaurin | tristanC: okay done now I will get the openshift workspace patch and test that out | 16:15 |
tobiash | corvus, tristanC: I'm not sure I'd vote for 553 instead of adapting synchronize-workspace after reading the comments on the upstream pr | 16:17 |
tristanC | corvus: that works for me, then let me know when i should rebase the rest of the changes. Let's start with https://review.opendev.org/#/c/681882/ | 16:17 |
plaurin | I wonder what would be the difference between the kubernetes and openshift drivers for nodepool | 16:19 |
openstackgerrit | Merged zuul/zuul-jobs master: Add prepare-workspace-openshift role https://review.opendev.org/631402 | 16:19 |
*** mattw4 has joined #zuul | 16:20 | |
tobiash | tristanC: I have a question about this ^ | 16:21 |
tobiash | is this intended to the 'get k8s workspace' or 'get k8s pod' use case? | 16:21 |
corvus | plaurin: they're almost the same | 16:23 |
tristanC | tobiash: the 'get k8s workspace' needs a very different base job, where you may manage the container build outside of the pod. this is for the 'get k8s pod' use case | 16:24 |
tristanC | tobiash: e.g. here is our openshift native job: https://softwarefactory-project.io/cgit/config/tree/zuul.d/_jobs-openshift.yaml#n4 | 16:25 |
tobiash | tristanC: don't we have in the 'get k8s pod' use case an ansible post per pod? | 16:25 |
tobiash | 402 takes a list of pods while we could just get the pod from the hostvars instead | 16:26 |
corvus | i thought 402 did get the list of pods from the hostvars | 16:27 |
plaurin | corvus, tristanC: hey that works https://review.opendev.org/#/c/631402/ | 16:27 |
corvus | doesn't the default of zuul.resources do that? | 16:27 |
corvus | plaurin: \o/ | 16:27 |
*** mattw4 has quit IRC | 16:27 | |
plaurin | I have other issues but that would probably my job really failing because of bad configuration,, but did get though the run start process | 16:27 |
*** mattw4 has joined #zuul | 16:28 | |
*** tosky has quit IRC | 16:28 | |
tristanC | tobiash: from the hostvars may works indeed, don't remember why i used the zuul.resource instead. | 16:28 |
tobiash | corvus: if all pods are in the inventory (I'm not sure about that part, tristanC do you have a link at hand to an inventory of a k8s job?) then we can avoid the loop and just run against the host like regular tasks | 16:29 |
plaurin | https://gist.github.com/plaurin84/bb4502361897d8177b2de5663894dbb1 | 16:29 |
tristanC | tobiash: yes, please see the test-openshift-tox-py36-fedora-30 job result posted by 'Software Factory CI' | 16:30 |
tristanC | plaurin: glad 402 works for you! | 16:31 |
tristanC | tobiash: on https://review.opendev.org/#/c/682049/ | 16:32 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul-jobs master: fetch-output-openshift: initial role https://review.opendev.org/682044 | 16:33 |
tristanC | plaurin: next you'll need https://review.opendev.org/682044 to fetch the job logs from the pod | 16:33 |
plaurin | logs works for me | 16:34 |
plaurin | well I have my job-output.txt | 16:34 |
plaurin | and my zuul-info/inventory | 16:35 |
tristanC | plaurin: i mean build artifacts, like html report generated on the pod | 16:35 |
plaurin | I might need that at some point yea | 16:35 |
plaurin | checking | 16:35 |
tobiash | tristanC: so ansible_host is at least the pod name and I guess the namespace and context will be available after gathering facts automatically | 16:36 |
tristanC | tobiash: perhaps? | 16:36 |
tristanC | tobiash: but since zuul.resources got what we need, it seems safer to use that | 16:42 |
tobiash | tristanC: yes, it works, just feels a little bit odd to treat the pods as data in this role and using them as regular hosts in other roles | 16:44 |
tobiash | tristanC: I'm also thinking about modifying the standard synchronize-workspace role. Doing this should work with hosts instead of looping. | 16:46 |
tobiash | but as a separate role I guess it's ok | 16:47 |
tobiash | same for 682044, if we want to integrate this into the default fetch role, it also needs to be done with hosts | 16:48 |
plaurin | tristanC: I won'T be able to test 682044 now because I have some changes to make in my jobs to make this work. But I'll definitely need it | 16:48 |
tristanC | plaurin: fwiw, here is our base pre job https://softwarefactory-project.io/cgit/config/tree/playbooks/base/pre.yaml (not ideal, but that works for both openstack and openshift jobs) | 16:50 |
tristanC | and here is how we do base post: https://softwarefactory-project.io/cgit/config/tree/playbooks/base/post.yaml | 16:50 |
plaurin | 👍️ | 16:51 |
corvus | tristanC: q on 681748 | 16:54 |
plaurin | I realise I wasn't using the fetch output at all, we synchronised some logs files directly in the jobs.. I have some improvemenents to do! | 16:55 |
*** jamesmcarthur has quit IRC | 16:55 | |
*** jamesmcarthur has joined #zuul | 16:55 | |
*** jamesmcarthur has quit IRC | 16:57 | |
tristanC | corvus: i suggest we look at fetch-javascript-output at the end, it was the most difficult because it has multiple strategy with regards to symlinks | 16:57 |
corvus | tristanC: okay. i left +2s on the other first 3 changes in that topic | 16:59 |
* plaurin lunchtime | 16:59 | |
AJaeger | tristanC: you better rebase and resolve merge conflicts... | 17:01 |
corvus | mnaser, pabelanger, Shrews, fungi: operator call starting now if you want to join | 17:01 |
Shrews | nod | 17:02 |
pabelanger | joining | 17:02 |
mnaser | dialing | 17:04 |
pabelanger | connected | 17:04 |
tristanC | i'm in as well ( sip:6001@pbx.openstack.org ) | 17:04 |
mnaser | sorry -- i'm having issues with my phone dialing in, please start and i'll do my best to dial in | 17:07 |
*** zxiiro has joined #zuul | 17:07 | |
fungi | i'm on | 17:07 |
*** jamesmcarthur has joined #zuul | 17:08 | |
corvus | mnaser: there's a sip option if the pstn isn't working for you | 17:08 |
mnaser | yeah i'm not equipped with a softphone but trying to setup one.. | 17:09 |
openstackgerrit | Merged zuul/nodepool master: Cleanup exception logging in static provider https://review.opendev.org/702828 | 17:13 |
* tristanC raises hand | 17:16 | |
*** jpena|off is now known as jpena | 17:27 | |
*** jamesmcarthur has quit IRC | 17:29 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul-jobs master: Make ara-report role to zuul_return an artifact https://review.opendev.org/704045 | 17:33 |
*** evrardjp has quit IRC | 17:34 | |
*** evrardjp has joined #zuul | 17:34 | |
tristanC | pabelanger: one big advantage of gobased operator is that you can use the kubernetes-client object directly, which give you type safety | 17:38 |
mnaser | ^^ | 17:38 |
tristanC | but it's also more complicated, you have to manage many details such as resources ownership, which are taken care of by ansible based operator | 17:39 |
tristanC | and, you have to integrate with the whole kubernetes go library system, which is quite brutal to start learning go imo | 17:39 |
AJaeger | pabelanger: want to review https://review.opendev.org/704045 ? | 17:41 |
*** mattw4 has quit IRC | 17:51 | |
*** armstrongs has joined #zuul | 17:53 | |
*** themroc has joined #zuul | 17:57 | |
armstrongs | I see there is a gitlab driver under active development is there any estimates on when it will be ready to use and how much work is remaining? | 18:00 |
*** armstrongs has quit IRC | 18:00 | |
*** electrofelix has quit IRC | 18:06 | |
tristanC | armstrongs already left, but here is the gitlab driver: https://review.opendev.org/#/q/status:open+project:zuul/zuul+branch:master+topic:gitlab | 18:08 |
tobiash | fbo: did you see my comment on the first one of this? ^ | 18:08 |
tobiash | the other changes lgtm | 18:09 |
tristanC | tobiash: fbo is at a conf for https://devconfcz2020a.sched.com/event/YOtV/cicd-for-fedora-packaging-with-zuul | 18:10 |
tobiash | ah cool | 18:11 |
tobiash | tristanC: do you know if there will be a video recording of this talk available? Sounds interesting | 18:13 |
tristanC | tobiash: usualy they are published on youtube afterward yes | 18:16 |
tobiash | cool | 18:16 |
pabelanger | tristanC: what has been feedback in fedora project about zuul in general for CI? Is this the first exposure to zuul for them? | 18:19 |
tristanC | pabelanger: the recent feedback is onboarding issue where they don't want to add project to the tenant configuration, and when comparing with gitlab, they would just click 'gitlab ci' to get it running | 18:21 |
tristanC | pabelanger: and the second difficulty is enforcing pr (which we don't) because power packager can't afford an extra step to click or approve the hundreds of change they do when maintaining their packages | 18:23 |
clarkb | wikimedia also called out the explicit addition step as a problem with zuul in their evaluation. I know there was discussion on how to address this yesterday though | 18:24 |
clarkb | it occured to me that another option might be to automate the change through the api? then peopel could have a web dashboard hit it? | 18:24 |
fungi | tristanC: what is "enforcing pr"? | 18:24 |
clarkb | that might make the ordering problem more explicit | 18:24 |
corvus | yesterday we discussed supporting adding all projects. that is a change that makes sense in zuul | 18:24 |
tristanC | fungi: fedora packaging workflow usualy results in 'git push' | 18:25 |
*** jamesmcarthur has joined #zuul | 18:25 | |
tobiash | tristanC: so they do post-merge-ci? | 18:25 |
clarkb | tristanC: zuul can operate on git push events... | 18:25 |
pabelanger | tristanC: yah, we try to encorce that with branch protections in github, but have seen push back | 18:25 |
tobiash | (which is also possible with zuul) | 18:25 |
pabelanger | long term, I think solution for us is to drop merge commit access | 18:25 |
fungi | clarkb: yeah, i was imagining some webui where a checkbox pushed a change proposal for the tenant config and then a special zuul pipeline configured to trigger on that push event to deploy the config and reload the scheduler | 18:26 |
pabelanger | but, giving up root is hard :) | 18:26 |
clarkb | zuul's own release tooling operates on pushes | 18:26 |
tristanC | tobiash: yes, fedora has an extra system to build package and propose promotion (called bodhi). Once a package has enough karma, then it gets pushed to cdn | 18:26 |
corvus | but regarding explicit addition -- there's no reason that "user edits yaml file" should be the only way a project gets added; zuul supports a pluggable system for getting the tenant config yaml, so it's possible to create any kind of integration to programatically generate the tenant yaml. | 18:26 |
tristanC | clarkb: yes, that's how we are using it, on push events, but then it's no longer gating, which we would like to do with PR | 18:27 |
fungi | basically folks want a ci system which tests things for them when they have time to look at test results, but lets them "move quickly and break things" when management puts them under a deadline | 18:27 |
pabelanger | clarkb: I've really wanted to refactor some of the openstack releases project, into something more generic, but haven't had time | 18:27 |
clarkb | tristanC: ok that sounds to me like less of a zuul problem and more of a workflwo problem? maybe I've misunderstood the concern | 18:27 |
clarkb | I guess users might conflate the two | 18:27 |
fungi | corvus: oh, great point, i forgot there was the option of the scheduler running something to generate the tenant config | 18:27 |
pabelanger | fungi: yes, that is exact issue we deal with some time talking to ansible folks | 18:27 |
tobiash | I guess it would be easy to write a small service that receives project webhooks and services the tenant config to zuul | 18:28 |
tobiash | it could even be done incrementally by using smart-reconfig | 18:28 |
pabelanger | fungi: so far, having stable tests really helps change peoples minds | 18:28 |
tobiash | but easier for the user would be to have this builtin | 18:28 |
tristanC | clarkb: exactly, that's a workflow problem, resulting in zuul benefit less interesting over the other system that are being evaluated | 18:28 |
fungi | tobiash: or the scheduler's tenant config hook could just query your code review system for all repositories which have the checkbox ticked, and then you just need an api call to trigger it to reload | 18:28 |
tristanC | clarkb: and it's also because fedora project content is not made to be used from the merge change, but from the package built out of them | 18:29 |
tobiash | at least smart reconfig already works incrementally so the only thing to think through is how and where to get the projects into the tenant config | 18:30 |
fungi | tristanC: but couldn't building and installing the package and running validation tests on that be a job? | 18:30 |
fungi | i think that's how obs works | 18:30 |
corvus | tobiash: i'm not sure "one size fits all" for an automatically generated tenant config, which is why it's currently a hook for a script. maybe there are ways to generalize it, but i think there would still be a lot of site specific configuration. | 18:30 |
fungi | it's also what debian's source-only uploads do | 18:30 |
fungi | (and ubuntu's since ages, i gather) | 18:30 |
fungi | so it could be done pre-merge rather than post-merge | 18:31 |
tobiash | corvus: yes, with the hook script it's already now possible with say half a day of scripting | 18:31 |
tristanC | fungi: yes, that is what fbo is going to demonstrate tomorrow, here you can zuul building a nodepool rpm, running integration test and voting on the PR: https://src.fedoraproject.org/rpms/nodepool/pull-request/7 | 18:31 |
fungi | tristanC: awesome. hopefully that sways some package maintainers into the pre-merge testing camp! | 18:32 |
tobiash | if we want to build it into zuul we could maybe add a regex matching into the tenant config or a catch-all entry | 18:32 |
tobiash | regex would be awesome to easily get all projects of a github org into the tenant | 18:33 |
tristanC | fungi: yes, especially since it's compatible with the in-package testing (that uses ansible roles). But some package owner are managing hundreds of repos, and those are not interested in having a PR workflow | 18:33 |
fungi | zuul can certainly also just do post-merge testing for them, or periodic even | 18:33 |
corvus | tobiash: i agree, that would work; tristanC and i discussed that a bit yesterday; should be in eavesdrop | 18:34 |
clarkb | fungi: tristanC and with artifact promotion you can build the package in the gate, test it, merge the changes, then push the rpm into $repo after merging | 18:34 |
clarkb | its actualy very similar to how the container job setup works I think | 18:35 |
clarkb | containers are just another type of package | 18:35 |
fungi | yep, that's a great point | 18:35 |
corvus | tobiash: starts here http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2020-01-22.log.html#t2020-01-22T16:12:41 | 18:35 |
pabelanger | it would be also cool, if we have buildset-yum-repo too | 18:35 |
tristanC | clarkb: that is what we are doing indeed, here are the templates: https://pagure.io/fedora-zuul-jobs/blob/master/f/zuul.d/templates.yaml | 18:35 |
clarkb | tristanC: exciting! | 18:35 |
pabelanger | which reminds me, really need to push up builset-galaxy job | 18:36 |
tristanC | and the job using artifacts requirements: https://pagure.io/fedora-zuul-jobs/blob/master/f/zuul.d/jobs.yaml | 18:36 |
tobiash | corvus: thanks, I think if we expand the list of projects when parsing the tenant config we could just reuse smart-reconfig instead of adding a further method for reconfiguration | 18:37 |
corvus | tobiash: yeah now that you mention that, i agree. not sure why i didn't think of that yesterday. (cc tristanC) | 18:37 |
*** jamesmcarthur has quit IRC | 18:37 | |
pabelanger | since we have a lot of humans online, I'd love to get some reviews / feedback on https://review.opendev.org/703892/ UI change. zuul-build-dashboard job should give working demo, if you hover over status bar for a job now | 18:38 |
tobiash | but the use case is also solvable with the tenant config script hook. If that builds the project list outside of zuul the only think to do when a new project is added would be to trigger a smart-reconfig (because this already reloads the tenant config) | 18:39 |
tobiash | that would require zero changes to zuul | 18:40 |
clarkb | pabelanger: on quick test of it I notice that when jobs are starting and before they have an executor assigned (when you get the spinny blue bar) it says a few seconds left which isn't accurate | 18:40 |
corvus | cool, so tristanC and friends could try smart-reconfig with the fedora pagure now to see how that works, and we can later add regex support to the configloader to make that easier | 18:41 |
clarkb | then when the job starts it has an accurate estimate I think | 18:41 |
pabelanger | clarkb: yah, open to idea how to fix. I think that is just the data we get back from estimated_time | 18:43 |
clarkb | corvus: tobiash meaning smart reconfig when you get a "project was added" event? | 18:43 |
tobiash | clarkb: yes | 18:43 |
clarkb | makes sense, though you'd have to get the list somehow on next startup right? | 18:43 |
clarkb | (maybe explicit listing covers that) | 18:43 |
tobiash | smart reconfig re-parses the tenant config -> config script fetches all projects -> smart reconfig reconfigures that tenant | 18:44 |
clarkb | got it | 18:44 |
plaurin | tristanC: I'm back. I might have a chance to try the fetch-output-openshift somewhere in the next hour or so | 18:44 |
plaurin | 044 | 18:44 |
clarkb | pabelanger: ya that might be a bug fix in the api | 18:44 |
corvus | clarkb: this would also go hand-in-hand with the suggested change to support adding a regex to the tenant config; so if/when that is in place, the middle step can change from 'script' to 'zuul connection' | 18:44 |
clarkb | pabelanger: have it return None or NaN or something when in that state | 18:45 |
tobiash | corvus: I was wrong, smart-reconfig works per tenant that has changed, not incrementally within a tenant, so we'd still need a further reconfig method, that works quite similar to the smart reconfig | 18:45 |
pabelanger | clarkb: k, I'll poke at it shortly | 18:45 |
fungi | alternative workaround in the ui is to not report an estimate if "a few seconds" is what the api returns | 18:45 |
corvus | tobiash: ah. (cc tristanC nevermind earlier cc :) | 18:46 |
tobiash | but it would be just a slight variant of the existing smart reconfig. Actually a hybrid of tenant reconfig and smart reconfig | 18:46 |
fungi | incremental-reconfig or differential-reconfig (maybe shortened to inc-reconfig or diff-reconfig)> | 18:46 |
*** themroc has quit IRC | 18:47 | |
*** jpena is now known as jpena|off | 18:48 | |
*** openstackgerrit has quit IRC | 18:51 | |
tobiash | or I might be wrong again and smart reconfig doesn't delete the unparsed branch config | 18:54 |
tobiash | in this case it really works incrementally already | 18:54 |
tobiash | so after looking at the code I'm confident that smart reconfig works totally incrementally (aka, reloads only changed tenants and also re-uses cached config within tenants) | 19:02 |
tobiash | cc tristanC double nevermind ;) | 19:03 |
tobiash | zuul-maint: I'd have a couple of tiny fixes that would need a second review: https://review.opendev.org/#/q/status:open+project:zuul/zuul+branch:master+topic:small-fixes | 19:10 |
*** bhavikdbavishi has quit IRC | 19:10 | |
clarkb | I'll take a look | 19:11 |
clarkb | tobiash: if you have a moment too curious what you think about the test change at https://review.opendev.org/#/c/703688/ | 19:11 |
clarkb | I think the testing results show it is about a minute faster according to stestr on similar hardware comparisons | 19:12 |
clarkb | for the whole suite | 19:12 |
clarkb | not a huge chunk but is measurable | 19:12 |
tobiash | clarkb: It might or might not make much difference for the whole test suite, but I can imagine that even then it is *very* useful in local debugging e.g. when adding a new ansible version | 19:14 |
clarkb | that is a good point since you'll be focused on running that ansible verion's tests | 19:15 |
tobiash | yes | 19:15 |
tobiash | when debugging a single test case it really matters if the test runs in 30s or 2minutes | 19:15 |
tobiash | I have the same issue with test_playbook actually | 19:15 |
tobiash | I don't like running that locally as a single test case as it takes so long ;) | 19:16 |
clarkb | ya thats the next slowest test after these ones | 19:16 |
clarkb | much harder to make faster though | 19:16 |
clarkb | could possibly split it up, but not sure that helps | 19:16 |
tobiash | pabelanger: you might be interested in https://review.opendev.org/702378. This fixes occasional wrong change urls with github. | 19:24 |
pabelanger | tobiash: I am, we see that in gate pipeline often, and returns just api url | 19:26 |
pabelanger | will review soon | 19:26 |
tobiash | thx | 19:26 |
*** gmann is now known as gmann_lunch | 19:26 | |
tobiash | clarkb: I think I addressed your comment on 684414 (fix canceling builds in starting phase) | 19:27 |
clarkb | anyone know why nodepool would delete an unused node after 22 seconds? | 19:28 |
tobiash | clarkb: I guess there was no config change? | 19:28 |
clarkb | http://paste.openstack.org/show/788727/ | 19:28 |
clarkb | no config change | 19:28 |
clarkb | maybe zuul decided it didn't need the node afterall? but you should keep the unused ready node until the timeout right? | 19:29 |
clarkb | that is a min ready request too | 19:30 |
clarkb | did we break that somehow? | 19:30 |
tobiash | nodepool should delete the node only if it's marked as used or in-use | 19:30 |
tobiash | clarkb: could it be that zuul used it for a job and then aborted the job (maybe because of a gate reset)? | 19:31 |
clarkb | tobiash: it is possible | 19:31 |
clarkb | except it did it to the next node too | 19:31 |
clarkb | I think we may have broken min-ready somehow | 19:31 |
Shrews | well, nodepool will delete the oldest unused node if it needs to free up capacity for another node, but what's curious is that node should be assigned to a request (or have been marked USED at least) | 19:32 |
clarkb | Shrews: it is from min ready | 19:32 |
clarkb | which is why it doesn't | 19:33 |
Shrews | ah | 19:33 |
clarkb | so the conflict here is between min-ready and deleting the oldest unused node | 19:34 |
clarkb | that kinda makes me wonder if we should just stop using min ready | 19:34 |
clarkb | opendev at least since we spend a lot of time at capacity | 19:34 |
Shrews | clarkb: the theory is, if the pool thread must pause request handling (b/c capacity), deleting the oldest, unused node would help (hopefully) get it unpaused faster | 19:34 |
fungi | i don't think opendev gets a bunch of benefit from min-ready these days, but other environments with much higher node setup costs and less frequency of quota exhaustion still might | 19:34 |
tobiash | clarkb: that could make sense, we stopped using min-ready a year ago | 19:34 |
clarkb | Shrews: ya the problem is we are just spinning with the min ready | 19:35 |
tobiash | however we stopped using it because we have many different images | 19:35 |
clarkb | Shrews: so we delete it, then boot it again then delete it and never make progress on the nodes that jobs demand | 19:35 |
clarkb | or at least not the intended progress | 19:35 |
fungi | in opendev our boot times are low enough that the additional quota can be put to better use in the available pool, i agree | 19:36 |
pabelanger | fungi: agree, I've aways wanted to remove that logic for opendev, since moving to zuulv3 | 19:36 |
fungi | but it still sounds like a legitimate bug which should be fixed (or at least optional behavior which should be considered mutually-exclusive) | 19:36 |
fungi | like if you're going to use the unused node cannibalization feature then don't set a min-ready on your nodes or they'll get into a fight | 19:37 |
fungi | or add some intelligence to pause min-ready booting when quota utilization crosses a certain threshold | 19:38 |
tobiash | hrm, min-ready booting is decoupled from the providers afaik | 19:38 |
pabelanger | so, we don't set min-ready in nodepool for zuul.a.c, but sometime get leaks. So, node is online and ready, for us we set max-read-age: 600, so help make sure they get cleaned up | 19:39 |
tobiash | but maybe it works if the provider ignores min-ready requests if it got paused in the last 10 minutes | 19:39 |
fungi | that could be a solution | 19:39 |
tobiash | Shrews: what do you think? | 19:39 |
fungi | so your min-ready nodes accumulate in your less-active providers when possible, though more likely your providers see proportional usage so min-ready requests would just get refused globally anyway | 19:40 |
fungi | the timeframe there could lead to a hysteresis though i suppose | 19:41 |
fungi | at the right quota sizes and constant utilization around the threshold | 19:41 |
fungi | you could see min-ready nodes bounce unhelpfully back and forth between providers as they put them under or at quota | 19:42 |
fungi | but maybe in practice that's not an issue or it's infrequent enough behavior to ignore | 19:43 |
clarkb | tobiash: the race fix in 684414 lgtm. One thing I notice on rereview is line 484 of https://review.opendev.org/#/c/684414/6/zuul/executor/client.py the old get was for data['worker_name'] but now refer to build.worker.name. Does worker_name get translated into just name by updateFromData? | 19:43 |
* clarkb is double checking this now too | 19:43 | |
Shrews | well, we could build in some sort of buffer for declining min-ready requests (either timeframe since pause, or how close we are to capacity). i'm not sure what other impact that might have | 19:43 |
clarkb | aha it is. ok +2 | 19:43 |
tobiash | clarkb: awesome, thanks | 19:44 |
Shrews | clarkb: is the current situation hurting anything? or are we just interested in avoiding unnecessary node churn? | 19:45 |
corvus | min ready only helps opendev on weekends :) | 19:45 |
clarkb | Shrews: my concern would be the goal is to ensure we get nodes for jobs but are not and that is creating unnecessary churn | 19:45 |
Shrews | ++ | 19:45 |
clarkb | Shrews: if we were managing to get nodes for jobs then the churn would be fine | 19:46 |
clarkb | but as is it is just a waste | 19:46 |
fungi | Shrews: yep, though again min-ready nodes count toward quota, making it a feedback loop so mixing in a delay once again means likely hysteresis | 19:46 |
clarkb | could we have a provider that just transitioned from at capacity to under, ignore min-ready requests unless those are the only requests available? | 19:46 |
tobiash | what priority do the min-ready have? | 19:47 |
clarkb | 100 | 19:47 |
clarkb | maybe lower their priority? | 19:47 |
clarkb | (I think that I read my log paste above properly to get the priority)_ | 19:47 |
fungi | Shrews: we could maybe do like hvac compressors do to avoid short-cycling and force an over/under on the quota too | 19:47 |
tobiash | clarkb: correct, so essentially min-ready have the same prio as the first item in the gate | 19:48 |
Shrews | yeah, we could totally lower their priority | 19:48 |
fungi | as long as nodes aren't cannibalized at or near the same quota threshold where we consider it safe to build min-ready nodes we avoid short-cycles | 19:48 |
tobiash | I think lowering the prio would solve this mostly already | 19:49 |
clarkb | zuul maintainers I think https://review.opendev.org/#/c/684414/6 is ready now if you want to be second reviewr. Its a bit more involved than the simple fixes from tobiash earlier but should help address a fun bug | 19:49 |
fungi | tobiash: clarkb: oh, yep, the reprioritization seems like that could also be a much easier fix | 19:49 |
*** gmann_lunch is now known as gmann | 19:52 | |
*** openstackgerrit has joined #zuul | 19:54 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Lower min-ready request priority https://review.opendev.org/704058 | 19:54 |
clarkb | is it as simple as ^ | 19:54 |
tobiash | clarkb: you were faster ;) | 19:55 |
corvus | clarkb, tobiash: i will reviwe 414 after lunch | 19:55 |
clarkb | I had to figure out which direction things were sorted in :) | 19:55 |
tobiash | clarkb: but I'd suggest to use a different prio as the priority is sorted alphabetically | 19:56 |
clarkb | oh | 19:56 |
clarkb | right its string not numberic | 19:56 |
tobiash | my too slow patch used 900 | 19:56 |
clarkb | k | 19:56 |
tobiash | but you can use anything you like >=400 | 19:57 |
corvus | you could also change 'getNodeRequests' to map to int first :) | 19:57 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Lower min-ready request priority https://review.opendev.org/704058 | 19:58 |
clarkb | corvus: I worry that might have unexpected behavior differences | 19:58 |
clarkb | but that could be a good followup if we can dobule check it? | 19:58 |
tobiash | corvus: thanks :) | 19:59 |
corvus | oh wait, getNoteRequests isn't relevant | 19:59 |
clarkb | _assignHandlers | 20:00 |
clarkb | I think | 20:00 |
corvus | or maybe it is, via nodeRequestIterator | 20:00 |
corvus | clarkb: ah yep, it's resorted in assignhandler | 20:01 |
tobiash | oh, actually it's a little bit more complicated: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/launcher.py#L179 | 20:01 |
corvus | but still by string | 20:01 |
corvus | yeah, that's the key line | 20:01 |
corvus | so stick an int on [0] and [2] in that tuple | 20:02 |
tobiash | yes, then it would be int sorted | 20:02 |
fungi | is there any chance someone might have used non-numeric priorities there? | 20:02 |
corvus | nope | 20:02 |
fungi | okay, so don't have to worry about that being a non-backward-compat change | 20:03 |
corvus | zuul sets the priority and uses only 100, 200, and 300 (plus or minus 1) | 20:03 |
corvus | i think zuul can set a priority of 99; i'm unsure if that's 099 or not | 20:03 |
fungi | cool, yeah as long as those aren't user-configurable strings somewhere it should be safe | 20:03 |
corvus | which means we may already have a latent bug... trying to confirm | 20:03 |
tobiash | I found tests/unit/test_v3.py: self.assertEqual(reqs[0]['_oid'], '099-0000000001') | 20:04 |
fungi | oh, though you make a good point, we'd still be altering the sort order if different orders of magnitude are in use | 20:04 |
tobiash | so we have a test to rule out that latent bug :) | 20:04 |
corvus | tobiash: whew :) | 20:04 |
fungi | nice! | 20:04 |
fungi | it's like someone thought of everything | 20:04 |
corvus | Shrews: 704058 lgtm if you want to +W | 20:05 |
* corvus lunches | 20:05 | |
tobiash | 099 is btw the 'there is already a paused job taking resources so use highest prio for child' priority | 20:05 |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Sort requests by numeric provider value https://review.opendev.org/704060 | 20:05 |
Shrews | yeah, i'm just looking for any unit tests that might need changing | 20:06 |
clarkb | that may want a try except just in case the zk data is manually edited poorly | 20:06 |
clarkb | but I need to eat lunch then see the optometrist so feel free to edit if necesary | 20:06 |
clarkb | hopefully they won't dilate my eyes and I'll be useful after | 20:07 |
tobiash | Shrews: I ran a local test run on 704058 and it's green | 20:08 |
*** openstackstatus has joined #zuul | 20:09 | |
*** ChanServ sets mode: +v openstackstatus | 20:09 | |
*** sshnaidm is now known as sshnaidm|afk | 20:09 | |
*** jamesmcarthur has joined #zuul | 20:11 | |
Shrews | i think the change to sort by numeric value might need more thought. it's currently inconsistent with storing a request with a string priority | 20:16 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Support emitting warnings via zuul_return https://review.opendev.org/651526 | 20:21 |
openstackgerrit | Merged zuul/zuul master: Fix bogus error message on reconfigure event https://review.opendev.org/703229 | 20:23 |
openstackgerrit | Merged zuul/zuul master: Add job.override-checkout to rest api https://review.opendev.org/703239 | 20:23 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Use implied branch matcher for implied branches https://review.opendev.org/640272 | 20:23 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Use mirror infrastructure during zuul-quick-start https://review.opendev.org/649448 | 20:33 |
openstackgerrit | Merged zuul/zuul master: Change links on projects page to canonical name https://review.opendev.org/703240 | 20:38 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Use mirror infrastructure during zuul-quick-start https://review.opendev.org/649448 | 20:53 |
openstackgerrit | Merged zuul/zuul master: Correctly handle noop on job page https://review.opendev.org/703242 | 20:59 |
*** johanssone_ has joined #zuul | 21:05 | |
*** openstackgerrit has quit IRC | 21:18 | |
*** irclogbot_3 has quit IRC | 21:18 | |
*** jhesketh has quit IRC | 21:18 | |
*** yolanda has quit IRC | 21:18 | |
*** panda has quit IRC | 21:18 | |
*** adam_g has quit IRC | 21:19 | |
*** johanssone has quit IRC | 21:19 | |
*** mgoddard has quit IRC | 21:19 | |
*** jpena|off has quit IRC | 21:19 | |
*** aluria has quit IRC | 21:19 | |
*** openstackstatus has quit IRC | 21:20 | |
*** notnone is now known as at_work | 21:22 | |
*** irclogbot_0 has joined #zuul | 21:26 | |
*** plaurin has quit IRC | 21:28 | |
corvus | remote: https://review.opendev.org/704068 Change default Gerrit HTTP auth method | 21:54 |
corvus | (gerritbot seems to be split) | 21:54 |
*** openstackgerrit has joined #zuul | 22:01 | |
*** jhesketh has joined #zuul | 22:01 | |
*** yolanda has joined #zuul | 22:01 | |
*** panda has joined #zuul | 22:01 | |
*** adam_g has joined #zuul | 22:01 | |
*** mgoddard has joined #zuul | 22:01 | |
*** jpena|off has joined #zuul | 22:01 | |
*** aluria has joined #zuul | 22:01 | |
*** jamesmcarthur has quit IRC | 22:02 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Use apt mirror infrastructure during zuul-quick-start https://review.opendev.org/649448 | 22:04 |
*** tosky has joined #zuul | 22:16 | |
*** armstrongs has joined #zuul | 22:34 | |
*** Goneri has quit IRC | 22:37 | |
*** armstrongs has quit IRC | 22:42 | |
clarkb | corvus: for https://review.opendev.org/#/c/704068/1 does opendev need a change to hardset digest? | 23:13 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!