jeblair | mordred: you may be able to provide input on https://review.openstack.org/505354 to help with that | 00:00 |
---|---|---|
*** jasondot_ has quit IRC | 00:01 | |
mordred | jeblair: it's up in my browser for review ... I need to run - but will review first thing tomorrow | 00:03 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Honor constraints files in tox-siblings https://review.openstack.org/523949 | 00:11 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Combine tox-siblings and tox roles https://review.openstack.org/523950 | 00:11 |
*** jasondotstar has joined #zuul | 00:14 | |
*** threestrands has quit IRC | 00:21 | |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add inventory variables for checkouts https://review.openstack.org/521976 | 00:36 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: add log_stream_port and log_stream_file settings https://review.openstack.org/523697 | 00:42 |
*** neatherweb has joined #zuul | 01:45 | |
*** threestrands has joined #zuul | 02:55 | |
*** xinliang has quit IRC | 05:00 | |
*** xinliang has joined #zuul | 05:01 | |
*** baiyi has joined #zuul | 05:28 | |
*** bhavik1 has joined #zuul | 05:29 | |
baiyi | hello everyone, when I start zuul-executor in v3, the log shows an error: git.exc.GitCommandError: Cmd('git') failed due to: exit code(-13) | 05:31 |
tobiash | baiyi: do you have a full log? | 06:06 |
*** baiyi has quit IRC | 06:08 | |
*** baiyi has joined #zuul | 06:08 | |
baiyi | cmdline: git clone ssh://esadmin@192.168.210.32:29418/requirements /var/lib/zuul/executor-git/192.168.210.32/requirements | 06:08 |
baiyi | stdout: 'Cloning into '/var/lib/zuul/executor-git/192.168.210.32/requirements'...' | 06:08 |
baiyi | 2017-11-30 13:42:46,831 INFO zuul.Merger: Updating local repository gerrit/requirements | 06:08 |
baiyi | 2017-11-30 13:42:46,905 ERROR zuul.Merger: Unable to update gerrit/requirements | 06:08 |
baiyi | Traceback (most recent call last): | 06:08 |
baiyi | File "/usr/local/python3/lib/python3.6/site-packages/zuul/merger/merger.py", line 382, in updateRepo | 06:08 |
baiyi | repo.reset() | 06:08 |
baiyi | File "/usr/local/python3/lib/python3.6/site-packages/zuul/merger/merger.py", line 143, in reset | 06:08 |
baiyi | self.update() | 06:08 |
baiyi | File "/usr/local/python3/lib/python3.6/site-packages/zuul/merger/merger.py", line 284, in update | 06:08 |
baiyi | repo = self.createRepoObject() | 06:08 |
baiyi | File "/usr/local/python3/lib/python3.6/site-packages/zuul/merger/merger.py", line 136, in createRepoObject | 06:08 |
tobiash | baiyi: please use http://paste.openstack.org/ for posting logs | 06:08 |
tobiash | baiyi: does your executor have write permission in /var/lib/zuul/executor-git ? | 06:09 |
baiyi | http://paste.openstack.org/show/627791/ | 06:10 |
baiyi | yes | 06:11 |
tobiash | baiyi: try to execute the git command within the user context of the executor | 06:12 |
baiyi | It works normally when I use zuul-executor -d | 06:12 |
tobiash | maybe it's a permission problem (no access to ssh key etc) | 06:13 |
baiyi | But I use "zuul-executor -d", and there's nothing wrong with that | 06:14 |
tobiash | baiyi: so you're starting it with your uid? | 06:15 |
tobiash | does the git clone work if you do it by yourself? | 06:15 |
baiyi | it works | 06:17 |
tobiash | in the same path, using the same user? | 06:19 |
tobiash | within the same environment? | 06:19 |
baiyi | yes | 06:21 |
tobiash | baiyi: too bad it doesn't print stdout and stderr | 06:27 |
tobiash | so currently I'm out of ideas | 06:28 |
tobiash | you could catch the GitCommandError in File "/usr/local/python3/lib/python3.6/site-packages/zuul/merger/merger.py", line 128, in _git_clone and log stderr and stdout to further debug this | 06:29 |
tobiash | (catch, log, reraise) | 06:29 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Do pep8 housekeeping according to zuul rules https://review.openstack.org/522945 | 06:29 |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Use same flake8 config as in zuul https://review.openstack.org/509715 | 06:29 |
*** xinliang has quit IRC | 06:41 | |
*** threestrands has quit IRC | 06:50 | |
*** xinliang has joined #zuul | 06:54 | |
*** baiyi has quit IRC | 06:57 | |
*** baiyi has joined #zuul | 06:57 | |
*** neatherweb has quit IRC | 07:29 | |
openstackgerrit | Rico Lin proposed openstack-infra/nodepool master: Fix getProviderManager in alien_list https://review.openstack.org/524092 | 08:08 |
openstackgerrit | Krzysztof Klimonda proposed openstack-infra/zuul feature/zuulv3: [WIP] Support autoholding nodes for specific changes/refs https://review.openstack.org/515169 | 08:21 |
*** baiyi1 has joined #zuul | 08:24 | |
*** baiyi has quit IRC | 08:25 | |
*** baiyi1 is now known as baiyi | 08:25 | |
openstackgerrit | Krzysztof Klimonda proposed openstack-infra/zuul feature/zuulv3: [WIP] Support autoholding nodes for specific changes/refs https://review.openstack.org/515169 | 08:27 |
*** baiyi1 has joined #zuul | 08:32 | |
*** baiyi has quit IRC | 08:33 | |
*** baiyi1 is now known as baiyi | 08:33 | |
*** bhavik1 has quit IRC | 08:57 | |
*** flepied has quit IRC | 09:33 | |
*** neatherweb has joined #zuul | 09:35 | |
*** EmilienM has quit IRC | 09:39 | |
*** EmilienM has joined #zuul | 09:40 | |
*** EmilienM has quit IRC | 09:41 | |
*** EmilienM has joined #zuul | 09:41 | |
*** hashar has joined #zuul | 09:47 | |
*** electrofelix has joined #zuul | 10:00 | |
*** kklimonda has quit IRC | 10:09 | |
*** kklimonda has joined #zuul | 10:15 | |
*** jesusaur has quit IRC | 10:17 | |
openstackgerrit | Krzysztof Klimonda proposed openstack-infra/zuul feature/zuulv3: Support autoholding nodes for specific changes/refs https://review.openstack.org/515169 | 10:20 |
openstackgerrit | Krzysztof Klimonda proposed openstack-infra/zuul feature/zuulv3: Support autoholding nodes for specific changes/refs https://review.openstack.org/515169 | 10:22 |
*** jesusaur has joined #zuul | 10:22 | |
*** sshnaidm|off is now known as sshnaidm|rover | 10:48 | |
*** jasondotstar has quit IRC | 11:02 | |
*** jasondotstar has joined #zuul | 11:02 | |
*** jasondotstar has quit IRC | 11:07 | |
*** jasondotstar has joined #zuul | 11:08 | |
*** jlk has quit IRC | 11:12 | |
*** jlk has joined #zuul | 11:12 | |
*** jlk has quit IRC | 11:13 | |
*** jlk has joined #zuul | 11:13 | |
*** jesusaur has quit IRC | 11:30 | |
*** jesusaur has joined #zuul | 11:32 | |
*** jkilpatr has quit IRC | 11:37 | |
*** jkilpatr has joined #zuul | 11:56 | |
*** jkilpatr has quit IRC | 12:14 | |
*** jkilpatr has joined #zuul | 12:17 | |
*** baiyi1 has joined #zuul | 12:26 | |
*** baiyi has quit IRC | 12:27 | |
*** baiyi1 is now known as baiyi | 12:28 | |
openstackgerrit | Krzysztof Klimonda proposed openstack-infra/zuul feature/zuulv3: Support autoholding nodes for specific changes/refs https://review.openstack.org/515169 | 12:34 |
Shrews | mordred: re: 505354, i don't know how to reconcile that. the changes from your original fix (https://review.openstack.org/505626) do not seem to correspond to current code. i'm tempted to remove that squash from mine | 14:08 |
*** neatherweb has quit IRC | 14:15 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Move to dictionary list of projects zuul._projects (take 2) https://review.openstack.org/518815 | 14:17 |
*** dkranz has joined #zuul | 14:18 | |
*** jasondotstar is now known as iamjasonc | 14:40 | |
*** iamjasonc is now known as JasonC | 14:42 | |
mordred | Shrews: yah - I say just remove that squash from yours | 14:43 |
mordred | Shrews: refactoring that whole file is coming up soon on my TDL | 14:43 |
*** JasonC is now known as JasonCL | 14:43 | |
Shrews | k | 14:45 |
*** JasonCL has quit IRC | 14:48 | |
*** JasonCL has joined #zuul | 14:48 | |
*** JasonCL has quit IRC | 14:51 | |
*** JasonCL has joined #zuul | 14:52 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Changes for Ansible 2.4 https://review.openstack.org/505354 | 14:56 |
*** JasonCL has quit IRC | 15:02 | |
*** JasonCL has joined #zuul | 15:02 | |
*** JasonCL has quit IRC | 15:03 | |
*** JasonCL has joined #zuul | 15:03 | |
pabelanger | mordred: friendly reminder https://review.openstack.org/521324/ :) | 15:26 |
kklimonda | hmm, can multiple nodepool launchers share same cloud? | 15:28 |
rcarrillocruz | yeah, but caution with naming, you should name them differently | 15:29 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul-jobs master: mirror-workspace-git-repos: Pass a dict, not a list containing a dict https://review.openstack.org/524211 | 15:30 |
pabelanger | kklimonda: yes, I think we still have a setting to namespace them | 15:30 |
pabelanger | looks at docs | 15:30 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: mirror-workspace-git-repos: Pass a dict, not a list containing a dict https://review.openstack.org/524211 | 15:31 |
pabelanger | it was nodepool-id in v2, but I'm not sure nodepool-launcher needs that any more | 15:32 |
kklimonda | I have two launchers registered with ZK, and one of them was printing "Quota exceeded" in the logs - I'm now thinking that's due to both of them competing for that last instance, and one of them receiving 403 from nova. | 15:35 |
rcarrillocruz | yeah | 15:35 |
rcarrillocruz | so that's the problem | 15:35 |
Shrews | kklimonda: launchers should not share clouds | 15:35 |
kklimonda | I've had a weird problem today, with nodepool leaving all nodes in state "Ready, Locked" | 15:35 |
rcarrillocruz | i have two launchers sharing a cloud | 15:35 |
Shrews | rcarrillocruz: you should not | 15:35 |
rcarrillocruz | but had that thing you depicted | 15:35 |
Shrews | sharing resources across launchers is a thing we have talked about for the future, but right now we can't support that | 15:36 |
kklimonda | with a ton of errors like http://paste.openstack.org/show/627883/ | 15:36 |
rcarrillocruz | yeah, not saying you should in prod (I have in testing), but if you go down that route the only way to not have launchers fight over each other's resources is by having them with different names | 15:37 |
dmsimard | By the way, is there a reason why opening up a console that hasn't started yet will show ---- END OF STREAM ---- ? Could we either wait until it has started to make a link clickable or make the console wait for the stream to begin or something ? | 15:37 |
pabelanger | Yah, you need a single launcher per provider right now. You could share the cloud, but need to set up different providers in nodepool. I think that would work | 15:38 |
dmsimard | Refreshing isn't a big deal but just wondering | 15:38 |
kklimonda | mhm, thanks - I'll shut down the other launcher, and try to actually track down and fix that ZK issue I've been having | 15:38 |
jeblair | i'd like to draw your attention to this spec: https://review.openstack.org/524024 it's written generally for new projects that we might start hosting in openstack, but i'd also like to pilot it with zuul. i'd particularly like to get these things in place before we release zuul 3.0 | 15:38 |
jeblair | dmsimard: if we had the executor drop a line in job-output.txt before sending the gearman status update, it should have the effect you want | 15:39 |
kklimonda | my nodepool is getting "RuntimeError: ('xids do not match, expected %r received %r', 913440, 913439)" and then gets stuck in "ZooKeeper suspended. Waiting" loop | 15:39 |
jeblair | dmsimard: just a log line like "starting" or something | 15:39 |
Shrews | rcarrillocruz: pabelanger: you can fudge it with different names, but that's really not how it's supposed to work and calculations of quota things may be weird | 15:39 |
dmsimard | jeblair: Ah, so just something like "Console starting..." yeah | 15:39 |
dmsimard | jeblair: I'll figure out where it is and send a patch | 15:39 |
jeblair | dmsimard: thx++ | 15:39 |
rcarrillocruz | yup | 15:39 |
pabelanger | Shrews: Yah, that's what we did for OSIC under v2. Was single cloud, but different flavors. Which, isn't something we'd do with zuulv3 now | 15:40 |
kklimonda | jeblair: if you have time to review https://review.openstack.org/#/c/515169/ I'd appreciate it :) | 15:40 |
jeblair | kklimonda: thanks -- i should get to it soon -- i'm working my way back through after returning from vacation/holiday :) | 15:41 |
kklimonda | haha, I'll leave you to that then | 15:42 |
jeblair | there are a bunch of changes in gate that have been stuck as 'queued' for 14 hours | 15:45 |
jeblair | it seems to be due to: http://paste.openstack.org/show/627887/ | 15:45 |
pabelanger | I've seen that in logs before | 15:47 |
jeblair | that node doesn't seem to be in zk (which makes sense from that message) | 15:48 |
Shrews | jeblair: well, if it was never locked (or lost its lock somehow), it could have been re-used or recycled | 15:48 |
Shrews | would need to check corresponding nodepool logs | 15:49 |
jeblair | https://etherpad.openstack.org/p/o9Tv7MVq0x | 15:49 |
jeblair | i'm accumulating info there | 15:49 |
pabelanger | I do see some launch errors in grafana.o.o | 15:49 |
pabelanger | looking at logs now | 15:49 |
pabelanger | we also had a new shade release yesterday, something to also check | 15:50 |
jeblair | pabelanger: this is more than likely related to the zuul restart | 15:50 |
pabelanger | ok | 15:50 |
Shrews | so it was set to USED by zuul, but lost its lock (due to restart?) | 15:52 |
Shrews | hrm, set to in use twice? | 15:53 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Print a message when we start the Zuul console https://review.openstack.org/524225 | 15:53 |
Shrews | 3 hours apart | 15:53 |
rcarrillocruz | hey, so i got it working 3rd party CI with zuulv3 + github https://github.com/rcarrillocruz-org/ansible-fork/pull/13 (note the ricky2-zuul, that's the 3rd party CI doing a noop). However to make it work i had to set the untrusted project I want to run test against to exclude job and project, otherwise the reconfiguration of the untrusted project (loading the .zuul.yaml) was referencing jobs and projects not | 15:54 |
jeblair | Shrews: well, the last log block is being continuously repeated | 15:54 |
rcarrillocruz | present on the 3rd party zuul | 15:54 |
rcarrillocruz | https://github.com/rcarrillocruz-org/ansible-fork/pull/13 | 15:54 |
rcarrillocruz | erm | 15:54 |
rcarrillocruz | http://paste.openstack.org/show/627889/ | 15:54 |
rcarrillocruz | is that ok approach, or I'm misunderstanding stuff and should do it some other way | 15:54 |
pabelanger | rcarrillocruz: not following your question | 15:55 |
mordred | rcarrillocruz: the .zuul.yaml in ansible-fork had jobs defined that referenced jobs you didn't have? | 15:55 |
rcarrillocruz | from origin zuul | 15:56 |
jeblair | Shrews: i'm working on getting the restart times from the log now | 15:56 |
pabelanger | nm, just re-read it | 15:56 |
rcarrillocruz | if i defined them on the third party zuul | 15:56 |
rcarrillocruz | then the funny thing is that it appears to try to run them | 15:56 |
rcarrillocruz | but what i just want to run the jobs defined in the 3rd party | 15:56 |
rcarrillocruz | that's the only way i could make it work | 15:56 |
mordred | rcarrillocruz: yah - so - I think excluding project is important and definitely correct | 15:57 |
mordred | rcarrillocruz: (when we add ansible/ansible to openstack zuul we'll definitely exclude project) | 15:57 |
mordred | rcarrillocruz: since otherwise exactly what you mention happens | 15:57 |
rcarrillocruz | k | 15:57 |
mordred | rcarrillocruz: for jobs - it sort of depends on whether or not you want to re-use any of the jobs defined in the repo as part of your third party testing - but if you do, you might have to include additional repos to get the job depends | 15:58 |
mordred | rcarrillocruz: but - yah - excluding job and project is a totally legit thing - and the safest place to start | 15:58 |
rcarrillocruz | k thx | 15:58 |
jeblair | Shrews, pabelanger: hrm, the restart was 3 hours before this node was created. | 15:59 |
*** JasonCL has quit IRC | 16:00 | |
*** JasonCL has joined #zuul | 16:02 | |
Shrews | jeblair: any zk disconnects between 03:16:38 and 03:21:44 ? | 16:03 |
Shrews | or session errors | 16:03 |
jeblair | hrm, i don't see any in zuul's log | 16:05 |
Shrews | weird | 16:06 |
jeblair | Shrews: any idea why nodepool would unlock/lock/unlock from 03:13 -- 03:16? | 16:30 |
jeblair | Shrews: oh there are 2 requests | 16:31 |
jeblair | presumably something happened to request 3740 and then it got reattached to 3778 | 16:31 |
Shrews | jeblair: i haven't looked at the logs (except what you pasted). which launcher? | 16:31 |
Shrews | seems like you figured it out though | 16:31 |
jeblair | Shrews: nl01 (and i was just reading that from the ones in the etherpad) | 16:31 |
mordred | jeblair: (I'm still operating under the assumption that it's unlikely this is related to the shade release so I'm hammering on other things, but if it starts to look like shade is involved, please ping) | 16:32 |
jeblair | mordred: ++ | 16:32 |
Shrews | ah | 16:32 |
Shrews | jeblair: curious that it would be assigned to two requests. did zuul return it unused? | 16:33 |
jeblair | trying to dig into the first request now | 16:33 |
Shrews | i think nodepool would log if it cleared the first allocation | 16:34 |
* Shrews looks | 16:34 | |
jeblair | Shrews: oh -- how do we tell if that first request was min-ready? | 16:35 |
jeblair | (cause i don't see it in zuul logs, so i'm starting to suspect it was) | 16:35 |
Shrews | jeblair: it was | 16:36 |
Shrews | 2017-11-30 03:13:27,302 INFO nodepool.PoolWorker.tripleo-test-cloud-rh1-main: Assigning node request <NodeRequest {'reuse': False, 'node_types': ['ubuntu-xenial'], 'id': '100-0001293740', 'nodes': [], 'state': 'requested', 'state_time': 1512011599.1659088, 'requestor': 'NodePool:min-ready', 'declined_by': [], 'stat': ZnodeStat(czxid=742752127, mzxid=742752127, ctime=1512011599165, mtime=1512011599165, | 16:36 |
Shrews | version=0, cversion=0, aversion=0, ephemeralOwner=98788114264752898, dataLength=172, numChildren=0, pzxid=742752127)}> | 16:36 |
jeblair | Shrews: ah there it is, thanks :) | 16:36 |
Shrews | requestor | 16:36 |
Shrews | so that makes sense | 16:37 |
jeblair | okay, so that's a red herring. whatever interesting happened it was with the second request it was assigned to | 16:37 |
Shrews | nothing unusual about that 2nd request in the logs | 16:38 |
Shrews | nodepool logs, that is | 16:38 |
jeblair | weird, there's a 2.5 hour gap between returning the node and trying to use it again. | 16:40 |
pabelanger | is that the timeout for job? | 16:42 |
jeblair | pabelanger: oh interesting idea | 16:44 |
rcarrillocruz | hmm | 16:45 |
rcarrillocruz | interesting | 16:45 |
rcarrillocruz | so | 16:45 |
rcarrillocruz | set up 3rd party CI on GitHub | 16:46 |
rcarrillocruz | get the two zuul, the one managing the repo I push the PR and the 3rd party one, reporting | 16:46 |
rcarrillocruz | now i uninstall the 3rd party CI Zuul App from the org | 16:46 |
rcarrillocruz | i do a recheck | 16:46 |
rcarrillocruz | and i get again the two zuuls on the status | 16:47 |
rcarrillocruz | despite obviously the 3rd party Zuul not receiving events, as I uninstalled it | 16:47 |
rcarrillocruz | looks as if the PR somehow caches it has two CIs | 16:47 |
rcarrillocruz | tristanC, jlk: any hint ^ ? | 16:47 |
jeblair | rcarrillocruz: well, the report sets a status on the pr; i imagine that when you remove the app it doesn't unset the previously set status fields | 16:48 |
rcarrillocruz | the zuul bot does not report tho, BUT the status thingy does show the two zuuls | 16:48 |
rcarrillocruz | ah, so it's a static thing | 16:48 |
rcarrillocruz | k, that explains, thx | 16:48 |
rcarrillocruz | prepping a demo for tomorrow to my peers | 16:48 |
*** kmalloc has quit IRC | 17:11 | |
*** JasonCL has quit IRC | 17:17 | |
*** JasonCL has joined #zuul | 17:18 | |
jeblair | okay, i think i'm narrowing this down to an interaction between pipeline window sizes and restarts | 17:28 |
jeblair | er reconfigurations | 17:28 |
jeblair | there were 2 reconfigurations between 03:00 and 06:00. i think after each reconfiguration, the window size is reset. so after the first reconfig, the tripleo window size went from 32 to 20. that put the change we're looking at out of the window during the second reconfiguration. i think that caused build results not to be ported over, though nodesets were. so once it entered the window again, it tried to restart the build with an old nodeset. | 17:30 |
jeblair | i think that explains why there is a set of changes at the top of the tripleo queue all like this -- they were the ones that fit between positions #20 - 32 in the queue. | 17:31 |
jeblair | ie, the ones that got cut out of the window when it shrunk during the reconfig. | 17:31 |
clarkb | huh | 17:37 |
*** sshnaidm|rover is now known as sshnaidm|off | 18:02 | |
*** hashar is now known as hasharDinner | 18:06 | |
*** jkilpatr has quit IRC | 18:23 | |
*** myoung|ruck has joined #zuul | 18:31 | |
*** jkilpatr has joined #zuul | 18:39 | |
*** openstackgerrit has quit IRC | 18:48 | |
*** openstackgerrit has joined #zuul | 19:30 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove tox-siblings role stub https://review.openstack.org/523996 | 19:30 |
pabelanger | mordred: ^ does that mean we can now push on removing tox_install.sh? | 19:33 |
pabelanger | okay, I think updated-pti is the topic you are using | 19:34 |
*** electrofelix has quit IRC | 19:38 | |
*** jkilpatr has quit IRC | 19:39 | |
mordred | pabelanger: yes - for the projects that are not using tox_install.sh to install additional software, it should be safe to make patches like https://review.openstack.org/#/c/508061/ | 19:47 |
pabelanger | mordred: coolio! I'll start looking after some food | 19:48 |
*** kmalloc has joined #zuul | 20:07 | |
jeblair | yay i think i have a reproducing test case | 20:09 |
*** JasonCL has quit IRC | 20:12 | |
mordred | jeblair: \o/ | 20:13 |
*** JasonCL has joined #zuul | 20:21 | |
*** hashar has joined #zuul | 21:05 | |
*** hasharDinner has quit IRC | 21:08 | |
*** jkilpatr has joined #zuul | 21:09 | |
*** threestrands has joined #zuul | 21:12 | |
*** threestrands has quit IRC | 21:12 | |
*** threestrands has joined #zuul | 21:12 | |
*** dkranz has quit IRC | 21:30 | |
*** JasonCL has quit IRC | 21:39 | |
*** hashar has quit IRC | 21:39 | |
*** JasonCL has joined #zuul | 22:22 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove nodesets from builds canceled during reconfiguration https://review.openstack.org/524409 | 22:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't shrink windows on reconfiguration https://review.openstack.org/524410 | 22:26 |
jeblair | clarkb: if you could take a look at that, i'd appreciate it | 22:26 |
clarkb | looking | 22:27 |
clarkb | that is a not small set of fixing | 22:28 |
clarkb | (thankfully mostly tests \o/) | 22:28 |
jeblair | yeah, the actual changes are pretty small tweaks. mostly tests. | 22:29 |
mordred | jeblair: wow. those are fun | 22:34 |
clarkb | observation, ExecutorClient.execute() is really long | 22:35 |
jeblair | it's generally getting shorter :) | 22:36 |
clarkb | indeed, one line at a time :) | 22:36 |
*** dkranz has joined #zuul | 22:39 | |
clarkb | jeblair: reading this can we just stop keeping track of the nodeset in the buildset entirely? | 22:40 |
clarkb | seems more natural and less redundant that each build would track its own nodeset? | 22:40 |
clarkb | (that's a bigger change after digging through model.py so maybe something to do in the future) | 22:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't set job var override_checkout if null https://review.openstack.org/524414 | 22:42 |
jeblair | that ^ reduces it by one more line, thanks to comment trickery :) | 22:42 |
jeblair | clarkb: the buildset holds the nodeset before the build is started, so it serves a pretty important role still | 22:43 |
clarkb | jeblair: but couldn't you have a buildset(build(state: notstarted, nodeset: foo)) and track it that way? it just feels weird double accounting this | 22:44 |
clarkb | but yes not easy to change | 22:44 |
jeblair | clarkb: that makes sense, but currently the build is not created until it's launched, so yeah, that's non-trivial | 22:44 |
jeblair | another option would be to remove the nodeset from the buildset as soon as the build is created, and have the routines that look for a nodeset check both places | 22:45 |
jeblair | that's slightly less invasive, but not quite as much of an increase in clarity as your suggestion | 22:45 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove nodesets from builds canceled during reconfiguration https://review.openstack.org/524409 | 22:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't shrink windows on reconfiguration https://review.openstack.org/524410 | 22:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't set job var override_checkout if null https://review.openstack.org/524414 | 22:49 |
jeblair | i just added ordered=false to the asserts in the tests since the jobs can complete out of order (that was the test failure reported on the change) | 22:50 |
clarkb | jeblair: https://review.openstack.org/#/c/524409/1/tests/unit/test_scheduler.py line 3886, what causes that reconfigure to cancel the other jobs? its just applying the old config again isn't it? | 22:50 |
clarkb | s/old/previous/ | 22:51 |
clarkb | oh do we remove non running builds on reconfiguration regardless of config? | 22:52 |
clarkb | well that can't be because then the first reconfigure would cancel them | 22:53 |
jeblair | clarkb: ah i think i see it: after the first reconfiguration, the item is still marked active even though it's outside the window, but as soon as the queue processor runs on it, it flips that bit to inactive. so then the next reconfiguration, the item is considered inactive, and that cancels the job. | 22:58 |
jeblair | iow, the key is that we shrink the window during a reconfiguration, perform a pass through the queue processor, then reconfigure a second time. | 22:58 |
clarkb | what does the pass through the queue processor? does the second reconfiguration trigger that first? | 22:59 |
jeblair | clarkb: after completing the first reconfiguration, we do pass through (that's what would cause new jobs to run) | 23:00 |
jeblair | (i mean, we run the processor after every reconfiguration for that reason) | 23:00 |
clarkb | gotcha | 23:01 |
clarkb | stack lgtm | 23:04 |
clarkb | the keeping of old windows is slightly brain trippy, means you can't force a smaller window if things are going good, but also means we don't artificially inhibit queue growth if things are going good | 23:05 |
clarkb | (and the intent is to let the current state of goodness determine values more so than human fiddling so that wins out) | 23:05 |
clarkb | looks like tests are still unhappy though | 23:06 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't shrink windows on reconfiguration https://review.openstack.org/524410 | 23:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Don't set job var override_checkout if null https://review.openstack.org/524414 | 23:32 |
jeblair | that should fix the tests -- we were setting windows on queues which didn't have them. | 23:32 |
*** bramwelt has joined #zuul | 23:35 |
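
Note on the 06:29 suggestion (catch, log, reraise): a minimal sketch of that pattern follows. It is not Zuul's code. The decorator name `log_and_reraise`, the logger name, and the example function are all hypothetical. The sketch assumes only that the caught exception may carry `status`, `stdout` and `stderr` attributes, as GitPython's `GitCommandError` does, and it falls back to `None` when an attribute is missing.

```python
import functools
import logging

# Hypothetical logger name, chosen for illustration only.
log = logging.getLogger("merger.debug")


def log_and_reraise(exc_type=Exception):
    """Return a decorator that logs command output from exc_type, then re-raises."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_type as exc:
                # Read the attributes defensively: not every exception has them.
                log.error(
                    "%s failed: status=%r stdout=%r stderr=%r",
                    func.__name__,
                    getattr(exc, "status", None),
                    getattr(exc, "stdout", None),
                    getattr(exc, "stderr", None),
                )
                # A bare raise keeps the original traceback intact.
                raise
        return wrapper
    return decorator
```

In a real deployment one would presumably pass `git.exc.GitCommandError` as `exc_type` and wrap the function that performs the clone, so that the stderr from git ends up in the executor log before the error propagates as before.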
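
Note on the 17:30 analysis (window shrinking from 32 to 20 on reconfiguration): the arithmetic can be sketched as below. This is a toy calculation, not Zuul's queue logic. The function name is hypothetical, and it assumes 1-indexed queue positions where a window of size N covers positions 1 through N.

```python
def positions_cut_by_shrink(old_window: int, new_window: int) -> range:
    """Return the 1-indexed positions inside the old window but outside the new one."""
    if new_window >= old_window:
        # The window did not shrink, so nothing falls out of it.
        return range(0)
    return range(new_window + 1, old_window + 1)


# Window sizes taken from the discussion: 32 before the reconfiguration, 20 after.
cut = positions_cut_by_shrink(32, 20)
print(f"{len(cut)} positions cut: {cut.start} through {cut.stop - 1}")
```

Under these assumptions the affected changes sit at positions 21 through 32, which roughly matches the "#20 - 32" range mentioned in the channel.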