*** bobh has joined #openstack-infra | 00:01 | |
*** dingyichen has quit IRC | 00:02 | |
*** dingyichen has joined #openstack-infra | 00:04 | |
*** bobh has quit IRC | 00:06 | |
*** dhill_ has quit IRC | 00:07 | |
*** abelur_ has joined #openstack-infra | 00:14 | |
*** lbragstad has joined #openstack-infra | 00:14 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Export DIB_ROOT_LABEL from final state https://review.openstack.org/532279 | 00:19 |
*** bobh has joined #openstack-infra | 00:20 | |
*** tosky has quit IRC | 00:23 | |
*** bobh has quit IRC | 00:24 | |
*** salv-orlando has quit IRC | 00:29 | |
*** bobh has joined #openstack-infra | 00:31 | |
openstackgerrit | Merged openstack-infra/elastic-recheck master: Add query for stable branch cellsv1 job libvirt crash bug 1745838 https://review.openstack.org/538614 | 00:34 |
openstack | bug 1745838 in OpenStack Compute (nova) "legacy-tempest-dsvm-cells constantly failing on stable pike and ocata due to libvirt connection reset" [Undecided,New] https://launchpad.net/bugs/1745838 | 00:34 |
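For context, elastic-recheck queries like the one merged above are small YAML files (one per Launchpad bug) whose `query:` field holds the Elasticsearch/Lucene search that matches the failure signature in job logs. A rough sketch of the file shape follows; the filename tracks the bug number, but the query text here is illustrative rather than the exact content of review 538614.

```yaml
# queries/1745838.yaml -- illustrative sketch only; the real query text
# merged in review 538614 may differ.
query: >-
  message:"End of file while reading data" AND
  build_name:"legacy-tempest-dsvm-cells"
```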
*** bobh has quit IRC | 00:35 | |
*** bobh has joined #openstack-infra | 00:40 | |
*** bobh has quit IRC | 00:45 | |
*** kiennt26 has joined #openstack-infra | 00:46 | |
*** gcb has joined #openstack-infra | 00:53 | |
*** hemna has quit IRC | 00:55 | |
jlvillal | Any ideas on why this patch isn't moving to the 'gate'? https://review.openstack.org/#/c/537972/ | 00:57 |
*** cuongnv has joined #openstack-infra | 00:57 | |
jlvillal | It has CR +2, V +1, WF +1 | 00:57 |
jlvillal | Doesn't appear to depend on any un-merged patches. | 00:57 |
* jlvillal tries a recheck | 00:59 | |
*** hemna has joined #openstack-infra | 01:01 | |
corvus | jlvillal: the w+1 event may have been missed due to the zuul outage earlier. a second w+1 or toggling the existing w+1 would enqueue it; or if no core reviewers are available, recheck works (though it'll go through check again first) | 01:02 |
jlvillal | corvus, I did try doing a W -1, CR -2 | 01:03 |
jlvillal | corvus, And then undoing it. Didn't seem to take. | 01:03 |
jlvillal | corvus, So I'll see if the recheck works | 01:03 |
jlvillal | Thanks | 01:03 |
corvus | well, it's in check now, but if that didn't work, it may not move into gate. i'll take a look and see if i can spot what was missing | 01:04 |
corvus | jlvillal: oh, it needs a w+1. | 01:07 |
corvus | jlvillal: if you just go ahead and give it that now, it'll go into gate | 01:07 |
jlvillal | corvus, It has a W+1 right now | 01:07 |
jlvillal | Not by me | 01:07 |
corvus | jlvillal: right, it needs to see the event | 01:07 |
jlvillal | Okay. I'll try :) | 01:08 |
jlvillal | corvus, I see it in the 'gate' now. Thanks. | 01:08 |
corvus | jlvillal: it's event driven, so the things that put a change into gate are either w+1 or v+1 | 01:09 |
jlvillal | corvus, Ah, okay. Thanks | 01:09 |
corvus | (if a matching event happens, then it checks the other requirements) | 01:09 |
jlvillal | corvus, If I removed my W+1, would it stop? Or keep going? | 01:09 |
* jlvillal just curious | 01:10 | |
corvus | jlvillal: it'll keep going. will still need at least one +1 on there when it's done to merge. | 01:10 |
jlvillal | corvus, Okay. | 01:11 |
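For anyone curious how the event-driven behaviour corvus describes is expressed in configuration: a Zuul v3 gate pipeline is typically declared roughly as below, where the `trigger` section lists the Gerrit events (a Workflow +1, or a Verified +1 from Zuul) that cause a change to be evaluated, and only when such an event matches are the `require` conditions checked. This is a simplified sketch, not the exact openstack-infra pipeline definition.

```yaml
# Simplified sketch of an event-driven gate pipeline; values are
# illustrative, not the exact openstack-infra configuration.
- pipeline:
    name: gate
    manager: dependent
    trigger:
      gerrit:
        # These events make Zuul (re)evaluate a change for the gate...
        - event: comment-added
          approval:
            - Workflow: 1
        - event: comment-added
          username: zuul
          approval:
            - Verified: 1
    require:
      gerrit:
        # ...and only then are these requirements checked.
        open: True
        current-patchset: True
        approval:
          - Code-Review: 2
          - Workflow: 1
```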
*** liujiong has joined #openstack-infra | 01:19 | |
*** gcb has quit IRC | 01:21 | |
*** liujiong has quit IRC | 01:27 | |
*** gongysh has joined #openstack-infra | 01:29 | |
*** salv-orlando has joined #openstack-infra | 01:29 | |
*** salv-orlando has quit IRC | 01:35 | |
*** mriedem has quit IRC | 01:37 | |
*** b_bezak has joined #openstack-infra | 01:59 | |
*** b_bezak has quit IRC | 02:04 | |
*** hongbin has joined #openstack-infra | 02:06 | |
*** spligak_ has joined #openstack-infra | 02:11 | |
*** spligak has quit IRC | 02:12 | |
*** armax has joined #openstack-infra | 02:14 | |
*** cuongnv has quit IRC | 02:27 | |
*** cuongnv has joined #openstack-infra | 02:28 | |
*** salv-orlando has joined #openstack-infra | 02:31 | |
*** salv-orlando has quit IRC | 02:35 | |
*** jamesmcarthur has joined #openstack-infra | 02:37 | |
*** sshnaidm has quit IRC | 02:40 | |
*** jamesmcarthur has quit IRC | 02:47 | |
*** jamesmcarthur has joined #openstack-infra | 02:47 | |
*** yamamoto has joined #openstack-infra | 02:58 | |
*** gongysh has quit IRC | 03:03 | |
*** rcernin has quit IRC | 03:08 | |
*** harlowja has joined #openstack-infra | 03:11 | |
openstackgerrit | Nguyen Van Trung proposed openstack-dev/hacking master: Drop py34 target in tox.ini https://review.openstack.org/538731 | 03:14 |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: [DNM] Separate initial state and create state https://review.openstack.org/538732 | 03:15 |
*** xinliang has quit IRC | 03:16 | |
*** xinliang has joined #openstack-infra | 03:16 | |
*** harlowja has quit IRC | 03:18 | |
*** dave-mccowan has quit IRC | 03:20 | |
*** olaph has joined #openstack-infra | 03:20 | |
*** stakeda has joined #openstack-infra | 03:21 | |
*** olaph1 has quit IRC | 03:22 | |
*** jamesmcarthur has quit IRC | 03:25 | |
*** jamesmcarthur has joined #openstack-infra | 03:26 | |
*** jgwentworth is now known as melwitt | 03:26 | |
*** jamesmcarthur has quit IRC | 03:31 | |
*** salv-orlando has joined #openstack-infra | 03:31 | |
*** cshastri has joined #openstack-infra | 03:32 | |
*** salv-orlando has quit IRC | 03:36 | |
*** annp has joined #openstack-infra | 03:40 | |
*** jamesmcarthur has joined #openstack-infra | 03:56 | |
*** jamesmcarthur has quit IRC | 04:03 | |
*** rcernin has joined #openstack-infra | 04:04 | |
*** hongbin has quit IRC | 04:04 | |
*** jamesmcarthur has joined #openstack-infra | 04:09 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Set default label for XFS disks https://review.openstack.org/532279 | 04:11 |
*** olaph1 has joined #openstack-infra | 04:13 | |
*** olaph has quit IRC | 04:13 | |
*** owalsh_ has joined #openstack-infra | 04:13 | |
*** jamesmcarthur has quit IRC | 04:14 | |
*** jamesmcarthur has joined #openstack-infra | 04:16 | |
*** owalsh has quit IRC | 04:17 | |
*** psachin has joined #openstack-infra | 04:17 | |
*** sree has joined #openstack-infra | 04:19 | |
*** jamesmcarthur has quit IRC | 04:21 | |
*** ykarel has joined #openstack-infra | 04:21 | |
*** bhagyashris has quit IRC | 04:22 | |
EmilienM | I keep having post failures on http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/245ceec/job-output.txt.gz | 04:22 |
EmilienM | not sure what to do really | 04:22 |
EmilienM | dmsimard, pabelanger^ is that related to the fsck thing you're running? | 04:23 |
*** xarses has quit IRC | 04:24 | |
*** xarses has joined #openstack-infra | 04:25 | |
*** rosmaita has quit IRC | 04:26 | |
*** jamesmcarthur has joined #openstack-infra | 04:29 | |
*** pgadiya has joined #openstack-infra | 04:30 | |
*** daidv has joined #openstack-infra | 04:31 | |
*** salv-orlando has joined #openstack-infra | 04:32 | |
*** jamesmcarthur has quit IRC | 04:35 | |
ianw | EmilienM: hmm, is that the remote hanging up? | 04:36 |
*** salv-orlando has quit IRC | 04:37 | |
EmilienM | ianw: not sure tbh | 04:37 |
ianw | rax-ord ... are your failures consistent in that region maybe? | 04:37 |
EmilienM | ianw: consistent | 04:38 |
EmilienM | I did 10 rechecks | 04:38 |
EmilienM | always failing | 04:38 |
EmilienM | I'll see tomorrow, cheers | 04:38 |
ianw | nope, http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/74c04f7/zuul-info/inventory.yaml for example is different region | 04:38 |
*** jamesmcarthur has joined #openstack-infra | 04:38 | |
ianw | ubuntu-xenial | Error: Cannot find source directory `/home/zuul/src/git.openstack.org/openstack/os-net-config/releasenotes/source'. | 04:39 |
ianw | i think that's the root cause | 04:39 |
*** jamesmcarthur has quit IRC | 04:42 | |
*** armax has quit IRC | 04:49 | |
*** pgadiya has quit IRC | 04:50 | |
*** pgadiya has joined #openstack-infra | 04:51 | |
*** jamesmcarthur has joined #openstack-infra | 04:59 | |
*** jamesmcarthur has quit IRC | 05:03 | |
*** pgadiya has quit IRC | 05:03 | |
*** ramishra has joined #openstack-infra | 05:06 | |
*** claudiub|2 has joined #openstack-infra | 05:07 | |
*** hongbin has joined #openstack-infra | 05:09 | |
*** jamesmcarthur has joined #openstack-infra | 05:12 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/storyboard master: Make notification driver configurable https://review.openstack.org/538574 | 05:16 |
openstackgerrit | Matthew Treinish proposed openstack-infra/storyboard master: WIP: Add MQTT notification publisher https://review.openstack.org/538575 | 05:16 |
*** pgadiya has joined #openstack-infra | 05:16 | |
*** jamesmcarthur has quit IRC | 05:17 | |
openstackgerrit | Merged openstack/diskimage-builder master: Don't install dmidecode on Fedora ppc64le https://review.openstack.org/536852 | 05:23 |
openstackgerrit | Merged openstack/diskimage-builder master: Add support for Fedora 27, remove EOL Fedora 25 https://review.openstack.org/536759 | 05:23 |
*** jamesmcarthur has joined #openstack-infra | 05:27 | |
*** hongbin has quit IRC | 05:28 | |
*** jamesmcarthur has quit IRC | 05:32 | |
*** salv-orlando has joined #openstack-infra | 05:33 | |
*** jamesmcarthur has joined #openstack-infra | 05:36 | |
*** janki has joined #openstack-infra | 05:37 | |
*** salv-orlando has quit IRC | 05:37 | |
*** jamesmcarthur has quit IRC | 05:41 | |
*** agopi|out has quit IRC | 05:44 | |
*** jamesmcarthur has joined #openstack-infra | 05:46 | |
*** wolverineav has joined #openstack-infra | 05:49 | |
*** jamesmcarthur has quit IRC | 05:52 | |
*** jamesmcarthur has joined #openstack-infra | 06:01 | |
*** xinliang has quit IRC | 06:02 | |
*** e0ne has joined #openstack-infra | 06:04 | |
*** jamesmcarthur has quit IRC | 06:06 | |
*** chenying has joined #openstack-infra | 06:12 | |
*** chenying_ has quit IRC | 06:12 | |
*** jamesmcarthur has joined #openstack-infra | 06:13 | |
*** xinliang has joined #openstack-infra | 06:14 | |
*** xinliang has quit IRC | 06:14 | |
*** xinliang has joined #openstack-infra | 06:14 | |
*** jamesmcarthur has quit IRC | 06:18 | |
*** jamesmcarthur has joined #openstack-infra | 06:23 | |
*** ramishra has quit IRC | 06:27 | |
*** jamesmcarthur has quit IRC | 06:27 | |
*** ramishra has joined #openstack-infra | 06:28 | |
*** salv-orlando has joined #openstack-infra | 06:34 | |
*** jamesmcarthur has joined #openstack-infra | 06:35 | |
*** salv-orlando has quit IRC | 06:38 | |
*** salv-orlando has joined #openstack-infra | 06:38 | |
*** jamesmcarthur has quit IRC | 06:40 | |
*** yolanda_ has quit IRC | 06:44 | |
*** dsariel has joined #openstack-infra | 06:47 | |
*** jamesmcarthur has joined #openstack-infra | 06:48 | |
*** jamesmcarthur has quit IRC | 06:53 | |
*** e0ne has quit IRC | 06:56 | |
*** makowals has joined #openstack-infra | 06:56 | |
*** e0ne has joined #openstack-infra | 06:58 | |
*** jamesmcarthur has joined #openstack-infra | 07:00 | |
*** jamesmcarthur has quit IRC | 07:05 | |
*** jamesmcarthur has joined #openstack-infra | 07:06 | |
*** rcernin has quit IRC | 07:11 | |
*** jamesmcarthur has quit IRC | 07:11 | |
*** e0ne has quit IRC | 07:11 | |
*** pcichy has joined #openstack-infra | 07:12 | |
*** jamesmcarthur has joined #openstack-infra | 07:16 | |
*** pcichy has quit IRC | 07:17 | |
*** jamesmcarthur has quit IRC | 07:21 | |
*** namnh has joined #openstack-infra | 07:21 | |
*** armaan has joined #openstack-infra | 07:26 | |
*** jamesmcarthur has joined #openstack-infra | 07:27 | |
*** matbu has quit IRC | 07:29 | |
*** jamesmcarthur has quit IRC | 07:31 | |
*** andreas_s has joined #openstack-infra | 07:32 | |
*** ramishra has quit IRC | 07:32 | |
*** salv-orlando has quit IRC | 07:33 | |
*** jamesmcarthur has joined #openstack-infra | 07:33 | |
*** slaweq has joined #openstack-infra | 07:34 | |
*** ramishra has joined #openstack-infra | 07:35 | |
*** ykarel is now known as ykarel|lunch | 07:35 | |
*** olaph has joined #openstack-infra | 07:37 | |
*** olaph1 has quit IRC | 07:37 | |
*** jamesmcarthur has quit IRC | 07:38 | |
*** slaweq has quit IRC | 07:38 | |
*** slaweq has joined #openstack-infra | 07:38 | |
*** gcb has joined #openstack-infra | 07:39 | |
*** jamesmcarthur has joined #openstack-infra | 07:39 | |
*** slaweq has quit IRC | 07:40 | |
*** slaweq has joined #openstack-infra | 07:40 | |
*** pcaruana has joined #openstack-infra | 07:44 | |
*** jamesmcarthur has quit IRC | 07:44 | |
*** florianf has joined #openstack-infra | 07:48 | |
*** jamesmcarthur has joined #openstack-infra | 07:50 | |
*** jamesmcarthur has quit IRC | 07:54 | |
*** salv-orlando has joined #openstack-infra | 07:55 | |
*** jamesmcarthur has joined #openstack-infra | 08:00 | |
*** jtomasek has joined #openstack-infra | 08:01 | |
*** gongysh has joined #openstack-infra | 08:03 | |
*** jamesmcarthur has quit IRC | 08:04 | |
*** kjackal has joined #openstack-infra | 08:08 | |
*** gongysh has quit IRC | 08:08 | |
*** b_bezak has joined #openstack-infra | 08:09 | |
*** links has joined #openstack-infra | 08:09 | |
openstackgerrit | Duong Ha-Quang proposed openstack-infra/project-config master: Remove legacy jobs in tap-as-a-service https://review.openstack.org/513228 | 08:14 |
*** jamesmcarthur has joined #openstack-infra | 08:15 | |
*** zhenguo has joined #openstack-infra | 08:16 | |
*** ralonsoh has joined #openstack-infra | 08:20 | |
*** jamesmcarthur has quit IRC | 08:20 | |
*** d0ugal has quit IRC | 08:22 | |
*** alexchadin has joined #openstack-infra | 08:25 | |
*** AJaeger has quit IRC | 08:26 | |
*** tesseract has joined #openstack-infra | 08:26 | |
prometheanfire | I probably need some zuul help, I'm not sure what's causing the seemingly random fails, but something is... | 08:27 |
*** d0ugal has joined #openstack-infra | 08:27 | |
prometheanfire | see https://review.openstack.org/536793 for example | 08:28 |
*** jamesmcarthur has joined #openstack-infra | 08:28 | |
prometheanfire | https://review.openstack.org/537645 too | 08:28 |
prometheanfire | I wonder if meltdown mitigation patches slowed down gate and we are running against that | 08:29 |
prometheanfire | evrardjp: we should probably talk here :P | 08:29 |
prometheanfire | evrardjp: thanks for looking into it, tony set up the gate initially | 08:30 |
*** AJaeger has joined #openstack-infra | 08:31 | |
prometheanfire | evrardjp: where do you see those stats? | 08:31 |
evrardjp | I am just checking http://zuul.openstack.org/builds.html | 08:32 |
evrardjp | for the patch you've shown me needing a few rechecks, it was due to many reasons | 08:32 |
*** jamesmcarthur has quit IRC | 08:33 | |
evrardjp | so I don't think it's a timeout thing that deserves changing | 08:33 |
evrardjp | I dug a little deeper into the numbers, and you can see that one of your offenders could theoretically be the job: cross-neutron-py35 | 08:33 |
evrardjp | (enter that and the project: openstack/requirements ) | 08:33 |
evrardjp | you'll see it's not a big offender, and I don't think you should change your limits right now, it should be fine. | 08:34 |
evrardjp | if there are too many timeouts, then maybe a deeper investigation would be wise. Then maybe change the timeouts if nothing else can be fixed. | 08:34 |
evrardjp | and that's a maybe | 08:35 |
evrardjp | (and only for that job) | 08:35 |
evrardjp | well that's what I'd do | 08:35 |
evrardjp | but after quickly checking your running jobs, I saw one of them had a post-merge issue | 08:35 |
evrardjp | that's something that could interest infra: http://logs.openstack.org/c5/c5053646aa1bbbd0b2f2a5b269ddb42a9f29d49e/post/publish-openstack-python-branch-tarball/8205ccb/job-output.txt.gz#_2018-01-29_08_15_27_782740 | 08:36 |
evrardjp | it looks like the branch tarball got a (maybe temporary?) issue. | 08:36 |
*** efoley has joined #openstack-infra | 08:37 | |
*** ykarel|lunch is now known as ykarel | 08:37 | |
*** sshnaidm has joined #openstack-infra | 08:37 | |
evrardjp | just saying, I am no infra person, and all. | 08:38 |
*** jpich has joined #openstack-infra | 08:41 | |
*** masber has quit IRC | 08:41 | |
*** masber has joined #openstack-infra | 08:42 | |
*** jamesmcarthur has joined #openstack-infra | 08:42 | |
*** apetrich has quit IRC | 08:44 | |
*** salv-orlando has quit IRC | 08:45 | |
*** dingyichen has quit IRC | 08:46 | |
*** jamesmcarthur has quit IRC | 08:47 | |
*** jpena|off is now known as jpena | 08:48 | |
openstackgerrit | Chandan Kumar proposed openstack-infra/project-config master: Added check-requirements and publish-to-pypi jobs https://review.openstack.org/538838 | 08:48 |
*** makowals has quit IRC | 08:50 | |
*** jamesmcarthur has joined #openstack-infra | 08:50 | |
*** olaph has quit IRC | 08:50 | |
*** amoralej|off is now known as amoralej | 08:53 | |
*** olaph has joined #openstack-infra | 08:54 | |
*** jamesmcarthur has quit IRC | 08:54 | |
*** rossella_s has joined #openstack-infra | 08:56 | |
*** jamesmcarthur has joined #openstack-infra | 08:57 | |
AJaeger | prometheanfire, evrardjp, what's the timeout for those jobs? Is it the default? | 08:58 |
*** alexchadin has quit IRC | 08:58 | |
*** makowals has joined #openstack-infra | 08:58 | |
prometheanfire | AJaeger: ya, default | 08:58 |
*** alexchadin has joined #openstack-infra | 08:58 | |
AJaeger | you might want to follow https://review.openstack.org/537016 to set it to 2400s | 08:58 |
AJaeger | prometheanfire: shall I send you a patch? | 08:59 |
prometheanfire | AJaeger: that'd be awesome | 08:59 |
prometheanfire | I could probably figure it out (with that example) though | 08:59 |
prometheanfire | let me do it, good learning | 08:59 |
*** d0ugal has quit IRC | 09:00 | |
AJaeger | prometheanfire: https://review.openstack.org/538842 done already - sorry, read your message too late | 09:01 |
prometheanfire | AJaeger: https://gist.github.com/e7dc44d57043c9fa3e1204341c8829c8 | 09:01 |
prometheanfire | lol | 09:01 |
*** jamesmcarthur has quit IRC | 09:01 | |
*** efoley has quit IRC | 09:01 | |
AJaeger | prometheanfire: but can abandon ;) | 09:01 |
AJaeger | prometheanfire: your call.. | 09:01 |
*** rossella_s has quit IRC | 09:01 | |
*** rfolco|off is now known as rfolco | 09:01 | |
*** salv-orlando has joined #openstack-infra | 09:02 | |
AJaeger | prometheanfire: yeah, that works - I just put it earlier :) | 09:02 |
prometheanfire | nah, just tell me if my patch would do it | 09:02 |
prometheanfire | cool | 09:02 |
AJaeger | prometheanfire: so, get that change in quickly to avoid some problems. | 09:02 |
prometheanfire | ya | 09:02 |
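For reference, the change AJaeger mentions simply raises the per-job timeout in the repo's Zuul configuration; a job entry in a project's .zuul.yaml or zuul.d/ tree sets `timeout` in seconds. A hedged sketch follows; the job name is taken from the one discussed above and is not necessarily what review 538842 actually touches.

```yaml
# Hedged sketch: raise a job's timeout to 2400s (40 minutes). The job
# name is illustrative, not necessarily the one changed in 538842.
- job:
    name: cross-neutron-py35
    # The default timeout was occasionally being hit; allow 40 minutes.
    timeout: 2400
```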
AJaeger | regarding the tarball: that might be a node going down, some infra-root needs to check that later | 09:03 |
prometheanfire | not sure you could accelerate that, up to you | 09:03 |
AJaeger | prometheanfire, no cannot | 09:03 |
*** e0ne has joined #openstack-infra | 09:04 | |
*** rossella_s has joined #openstack-infra | 09:05 | |
jianghuaw | hi, I hit an error on a compute node when using devstack to set up a multi-node env. I think the problem is that we need to include libpcre3-dev as a general prerequisite package. | 09:06 |
jianghuaw | As python-pcre is now added in upper-constraints.txt and python-pcre depends on libpcre3-dev. | 09:06 |
jianghuaw | https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L440 | 09:06 |
jianghuaw | I created a patch here to fix it: https://review.openstack.org/#/c/538841/ | 09:06 |
jianghuaw | could you help to have a look? | 09:07 |
*** giblet is now known as gibi_ | 09:07 | |
jianghuaw | Thanks. My multi-node testing job is broken due to the above problem. At the moment I install this package in the image. | 09:08 |
prometheanfire | I thought we just removed python-pcre from requirements | 09:08 |
prometheanfire | maybe that was another conversation | 09:09 |
jianghuaw | prometheanfire, I'm not sure why python-pcre is added. | 09:09 |
jianghuaw | ok. | 09:09 |
jianghuaw | already there is some discussion on it. | 09:09 |
jianghuaw | Do we have a conclusion? And who is removing it? | 09:10 |
prometheanfire | it doesn't look like it was removed, just never added | 09:11 |
prometheanfire | anyway, it's after 3 and I have to sleep some time | 09:11 |
*** jamesmcarthur has joined #openstack-infra | 09:11 | |
*** finucannot is now known as sfinucan | 09:12 | |
*** sfinucan is now known as stephenfin | 09:12 | |
jianghuaw | prometheanfire, ok. Thanks. But that's really in https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L440 | 09:13 |
prometheanfire | oh, maybe I missed it (was just looking at recent commits) | 09:15 |
prometheanfire | that was added or changed 2 months ago | 09:15 |
*** matbu has joined #openstack-infra | 09:15 | |
prometheanfire | https://github.com/openstack/requirements/commit/89cebce27b7bd84260ea8e01a3fff1b64851e41d added as a dep from another module it looks like | 09:16 |
*** jamesmcarthur has quit IRC | 09:16 | |
jianghuaw | yes | 09:16 |
prometheanfire | anyway, time for me to sleep | 09:17 |
jianghuaw | Thanks anyway:-) | 09:17 |
*** owalsh_ has quit IRC | 09:17 | |
*** owalsh has joined #openstack-infra | 09:17 | |
*** makowals has quit IRC | 09:18 | |
*** pcichy has joined #openstack-infra | 09:19 | |
*** dbecker has joined #openstack-infra | 09:19 | |
*** d0ugal has joined #openstack-infra | 09:20 | |
*** derekh has joined #openstack-infra | 09:23 | |
*** kiennt26 has quit IRC | 09:24 | |
*** abelur_ has quit IRC | 09:24 | |
jianghuaw | AJaeger, added you to review my patch: https://review.openstack.org/#/c/538841/ | 09:24 |
jianghuaw | Thanks in advance:-) | 09:25 |
AJaeger | jianghuaw: I'm not a devstack core, better ask on #openstack-qa. | 09:25 |
*** kopecmartin has joined #openstack-infra | 09:26 | |
*** jamesmcarthur has joined #openstack-infra | 09:26 | |
jianghuaw | AJaeger, thanks. | 09:26 |
*** apetrich has joined #openstack-infra | 09:28 | |
*** yamahata has quit IRC | 09:28 | |
AJaeger | jianghuaw: I commented nevertheless... | 09:28 |
jianghuaw | thanks. Indeed that should be *now*. My bad:-) | 09:29 |
*** yamamoto has quit IRC | 09:30 | |
*** jamesmcarthur has quit IRC | 09:31 | |
*** jamesmcarthur has joined #openstack-infra | 09:32 | |
*** s-shiono has quit IRC | 09:33 | |
*** e0ne has quit IRC | 09:35 | |
jianghuaw | AJaeger, uploaded a new PS. | 09:35 |
*** jamesmcarthur has quit IRC | 09:37 | |
openstackgerrit | Masahito Muroi proposed openstack-infra/project-config master: Add publish-to-pypi in blazar-nova repo https://review.openstack.org/538185 | 09:42 |
*** shardy has joined #openstack-infra | 09:43 | |
evrardjp | rcarrillocruz: mordred: jeblair: I am not sure where you are in the integration of tests from PRs inside ansible github's repo to our zuul jobs, but I started to work on this real quick: https://review.openstack.org/#/c/538856/ to add a job on our side to be able to test ansible modules. | 09:44 |
evrardjp | when one of you is available, ping me to know if I should continue or not. | 09:44 |
evrardjp | I just wanted to have a draft, get it working, and adapt that pattern if need be. Improvements of speed can come later. | 09:45 |
*** makowals has joined #openstack-infra | 09:45 | |
*** jamesmcarthur has joined #openstack-infra | 09:45 | |
*** makowals has quit IRC | 09:45 | |
AJaeger | evrardjp: read backscroll about https://review.openstack.org/537955 - that'S all I know... | 09:45 |
*** efoley has joined #openstack-infra | 09:46 | |
evrardjp | oh there are already devstack tests apparently. Maybe I am too late in the game. | 09:46 |
AJaeger | evrardjp: best to talk to mordred and corvus later | 09:47 |
*** yamamoto has joined #openstack-infra | 09:47 | |
evrardjp | AJaeger: yup, thanks! | 09:47 |
chandankumar | AJaeger: Hello, Thanks for working on tempest-lib project removal :-) | 09:49 |
chandankumar | AJaeger: https://review.openstack.org/#/c/538838/ | 09:50 |
*** jamesmcarthur has quit IRC | 09:50 | |
chandankumar | AJaeger: i have added openstackci as a owner for uploading the package to pypi | 09:51 |
chandankumar | for python-tempestconf | 09:52 |
AJaeger | chandankumar: pypi.org/pypi/python-tempestconf does not exist | 09:52 |
AJaeger | please double check ^ | 09:52 |
*** jamesmcarthur has joined #openstack-infra | 09:52 | |
*** erlon_ has quit IRC | 09:53 | |
*** pblaho has joined #openstack-infra | 09:53 | |
chandankumar | AJaeger: https://pypi.python.org/pypi/python-tempestconf | 09:53 |
chandankumar | refresh again | 09:54 |
AJaeger | yeah, looks fine now... | 09:54 |
AJaeger | chandankumar: home page looks wrong - but next release will fix that ;) | 09:55 |
chandankumar | AJaeger: https://review.openstack.org/#/c/538840/ for new release | 09:55 |
AJaeger | chandankumar: let's update setup.cfg first, please. That's sooo wrong for an official openstack project | 09:56 |
*** jamesmcarthur has quit IRC | 09:57 | |
*** makowals has joined #openstack-infra | 09:58 | |
*** e0ne has joined #openstack-infra | 10:00 | |
chandankumar | AJaeger: sorry, I didn't get that. You mean adding version metadata in setup.cfg, or updating the Red Hat stuff part? | 10:01 |
*** kopecmartin has quit IRC | 10:01 | |
AJaeger | chandankumar: yes, remove Red Hat stuff. home page, mailing list, author look wrong | 10:01 |
*** sree has quit IRC | 10:01 | |
chandankumar | AJaeger: sure | 10:02 |
*** sree has joined #openstack-infra | 10:02 | |
AJaeger | chandankumar: you could even publish to docs.o.o (no job setup currently) - if you have docs for that. | 10:02 |
*** jamesmcarthur has joined #openstack-infra | 10:03 | |
chandankumar | AJaeger: currently we don't have many docs, but docs will be added in the future, then we can publish them | 10:07 |
chandankumar | AJaeger: https://review.openstack.org/#/c/538862/ | 10:07 |
*** jamesmcarthur has quit IRC | 10:09 | |
*** namnh has quit IRC | 10:09 | |
*** namnh has joined #openstack-infra | 10:09 | |
*** sree has quit IRC | 10:11 | |
AJaeger | chandankumar: thanks! | 10:14 |
*** chenying_ has joined #openstack-infra | 10:15 | |
*** chenying has quit IRC | 10:15 | |
*** jamesmcarthur has joined #openstack-infra | 10:16 | |
*** cuongnv has quit IRC | 10:17 | |
*** jamesmcarthur has quit IRC | 10:21 | |
*** stakeda has quit IRC | 10:21 | |
*** adarazs is now known as adarazs_brb | 10:25 | |
ssbarnea | Would it be possible to get rid of the wiki captcha? Its presence makes any kind of contribution a huge PITA. No wonder pages end up being outdated. | 10:26 |
*** jamesmcarthur has joined #openstack-infra | 10:27 | |
*** hjensas has joined #openstack-infra | 10:27 | |
*** oidgar has joined #openstack-infra | 10:28 | |
*** gcb has quit IRC | 10:29 | |
*** ldnunes has joined #openstack-infra | 10:31 | |
*** jamesmcarthur has quit IRC | 10:31 | |
*** threestrands_ has joined #openstack-infra | 10:33 | |
*** jamesmcarthur has joined #openstack-infra | 10:35 | |
*** threestrands has quit IRC | 10:36 | |
*** lucas-afk is now known as lucasagomes | 10:37 | |
rcarrillocruz | evrardjp: where's openstack-ansible-functional-<os> parent job defined? | 10:37 |
evrardjp | openstack-ansible-tests | 10:37 |
*** jappleii__ has joined #openstack-infra | 10:37 | |
evrardjp | https://github.com/openstack/openstack-ansible-tests/blob/master/zuul.d/jobs.yaml | 10:37 |
evrardjp | it's just a start to get the ball rolling. | 10:38 |
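For anyone following along, a parent job plus a per-distro child in a zuul.d/jobs.yaml typically look roughly like the sketch below; the names, nodeset, timeout and playbook path are illustrative rather than copied from openstack-ansible-tests.

```yaml
# Rough sketch of a parent job and a per-distro child, in the style of
# a zuul.d/jobs.yaml; names and paths are illustrative.
- job:
    name: openstack-ansible-functional
    parent: base
    run: zuul.d/playbooks/functional.yml
    timeout: 5400

- job:
    name: openstack-ansible-functional-ubuntu-xenial
    parent: openstack-ansible-functional
    nodeset: ubuntu-xenial
```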
*** jappleii__ has quit IRC | 10:38 | |
*** panda|off is now known as panda | 10:38 | |
*** makowals has quit IRC | 10:39 | |
*** jappleii__ has joined #openstack-infra | 10:39 | |
evrardjp | but if you're already on something else, I can drop this. It's just that I promised you testing, and I am on my way to delivering now. | 10:40 |
*** jamesmcarthur has quit IRC | 10:40 | |
*** threestrands_ has quit IRC | 10:41 | |
*** dhajare has joined #openstack-infra | 10:42 | |
evrardjp | If you don't need it, fine for me, less code in our repos. :D | 10:43 |
*** pbourke has quit IRC | 10:43 | |
*** pbourke has joined #openstack-infra | 10:45 | |
*** danpawlik has quit IRC | 10:46 | |
*** jamesmcarthur has joined #openstack-infra | 10:48 | |
*** tpsilva has joined #openstack-infra | 10:49 | |
*** jaosorior has joined #openstack-infra | 10:52 | |
*** namnh has quit IRC | 10:52 | |
*** jamesmcarthur has quit IRC | 10:53 | |
*** gcb has joined #openstack-infra | 10:53 | |
cmurphy | something up with http://zuul.openstack.org/ ? | 10:55 |
*** alexchadin has quit IRC | 10:55 | |
*** alexchadin has joined #openstack-infra | 10:55 | |
cmurphy | "The proxy server received an invalid response from an upstream server." | 10:55 |
cmurphy | infra-root ^ | 10:56 |
*** makowals has joined #openstack-infra | 10:56 | |
cmurphy | going to cry if my change was dequeued :'( | 10:57 |
AJaeger | cmurphy: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all | 10:59 |
*** gcb has quit IRC | 10:59 | |
*** alexchadin has quit IRC | 10:59 | |
AJaeger | infra-root, we hit out of memory on zuul ;( | 10:59 |
cmurphy | oh lovely :( | 10:59 |
AJaeger | 16 GB of swap ;( http://cacti.openstack.org/cacti/graph.php?local_graph_id=64794&rra_id=all | 11:00 |
AJaeger | #status notice Zuul is currently under heavy load. Do not *recheck* or *approve* any changes. | 11:01 |
openstackstatus | AJaeger: sending notice | 11:01 |
*** annp has quit IRC | 11:01 | |
*** ganso has joined #openstack-infra | 11:02 | |
cmurphy | thanks AJaeger | 11:02 |
*** jamesmcarthur has joined #openstack-infra | 11:02 | |
-openstackstatus- NOTICE: Zuul is currently under heavy load. Do not *recheck* or *approve* any changes. | 11:03 | |
*** adarazs_brb is now known as adarazs | 11:03 | |
*** alexchadin has joined #openstack-infra | 11:04 | |
openstackstatus | AJaeger: finished sending notice | 11:04 |
ianw | hmm, it's gone nuts i guess | 11:04 |
AJaeger | cmurphy: that's all I can do ;( | 11:05 |
cmurphy | seems to be back and the queues are still there | 11:05 |
AJaeger | infra-root, I fear that the mass approval of https://review.openstack.org/#/q/status:open+topic:zuulv3-projects is the problem ;( | 11:05 |
AJaeger | ianw, cmurphy shall we wait and let them finish? | 11:05 |
* AJaeger goes to lunch - bbl | 11:05 | |
ianw | there's stuff in there about exceptions and nodes not locked | 11:05 |
*** electrofelix has joined #openstack-infra | 11:06 | |
ianw | i'm afraid at this point, i'm not really up for a debugging session | 11:06 |
ianw | so although i could restart fairly easily, if it goes wrong we might be in a worse position than we are now | 11:06 |
ianw | i think it's best if someone else looks into it, i'm off now | 11:07 |
cmurphy | i bet mordred or fungi or pabelanger will be up in the next couple of hours | 11:07 |
ianw | yep, and they have all day to fix it up too ;) ttyl | 11:07 |
*** alexchadin has quit IRC | 11:07 | |
*** jamesmcarthur has quit IRC | 11:07 | |
*** danpawlik has joined #openstack-infra | 11:13 | |
*** jamesmcarthur has joined #openstack-infra | 11:14 | |
*** gcb has joined #openstack-infra | 11:17 | |
*** jamesmcarthur has quit IRC | 11:18 | |
*** gcb has quit IRC | 11:18 | |
*** tosky has joined #openstack-infra | 11:21 | |
*** olaph has quit IRC | 11:22 | |
*** olaph has joined #openstack-infra | 11:23 | |
*** sambetts|afk is now known as sambetts| | 11:24 | |
*** sambetts| is now known as sambetts | 11:24 | |
*** andreas_s has quit IRC | 11:25 | |
*** andreas_s has joined #openstack-infra | 11:25 | |
*** dhajare has quit IRC | 11:26 | |
*** jamesmcarthur has joined #openstack-infra | 11:27 | |
*** alexchadin has joined #openstack-infra | 11:30 | |
*** _ari_|DevConf is now known as _ari_|conf | 11:32 | |
*** _ari_|conf is now known as _ari_|brno | 11:32 | |
*** jamesmcarthur has quit IRC | 11:32 | |
sambetts | Hey Infra, what's the status of Zuul? I see the message from Friday but wanted to make sure before I start rechecking stuff | 11:33 |
sambetts | I can't get on zuul.openstack.org so I assume something's not good | 11:33 |
sambetts | ? | 11:33 |
tosky | sambetts: there was a notification not long ago | 11:33 |
*** alexchadin has quit IRC | 11:33 | |
AJaeger | sambetts: https://wiki.openstack.org/wiki/Infrastructure_Status | 11:34 |
sambetts | oh ... my client timestamped that message Friday.... not sure why... | 11:34 |
*** andreas_s has quit IRC | 11:35 | |
AJaeger | sambetts: we had other challenges on Friday ;( | 11:35 |
sambetts | :( | 11:36 |
*** dhajare has joined #openstack-infra | 11:38 | |
*** andreas_s has joined #openstack-infra | 11:40 | |
*** jamesmcarthur has joined #openstack-infra | 11:40 | |
*** alexchadin has joined #openstack-infra | 11:41 | |
*** alexchadin has quit IRC | 11:43 | |
*** jamesmcarthur has quit IRC | 11:45 | |
*** jamesmcarthur has joined #openstack-infra | 11:49 | |
*** dklyle has quit IRC | 11:52 | |
*** david-lyle has joined #openstack-infra | 11:53 | |
*** jamesmcarthur has quit IRC | 11:53 | |
*** andreas_s has quit IRC | 11:55 | |
*** andreas_s has joined #openstack-infra | 12:00 | |
*** jamesmcarthur has joined #openstack-infra | 12:01 | |
*** alexchadin has joined #openstack-infra | 12:02 | |
*** andreas_s has quit IRC | 12:03 | |
*** andreas_s has joined #openstack-infra | 12:03 | |
tosky | AJaeger: while waiting for zuul to come up, I have a question where you may help, related to publishing artifacts | 12:04 |
tosky | recently we (sahara) merged this https://review.openstack.org/#/c/532690 and I realized that the artifacts are published now under http://tarballs.openstack.org/sahara-extra/dist/ instead of http://tarballs.openstack.org/sahara/dist/ | 12:05 |
tosky | now, before changing all references to the old URLs, is there an easy way (or are we even allowed) to make the output of a sahara-extra job publish under tarballs.o.o/sahara/ and not /sahara-extra? | 12:06 |
tosky | or if not, can we put a symlink? | 12:06 |
*** jamesmcarthur has quit IRC | 12:07 | |
*** rfolco is now known as rfolco|ruck | 12:07 | |
*** sree has joined #openstack-infra | 12:10 | |
*** pblaho has quit IRC | 12:11 | |
*** jamesmcarthur has joined #openstack-infra | 12:11 | |
*** erlon_ has joined #openstack-infra | 12:13 | |
*** salv-orlando has quit IRC | 12:14 | |
*** jamesmcarthur has quit IRC | 12:16 | |
*** jamesmcarthur has joined #openstack-infra | 12:22 | |
*** salv-orlando has joined #openstack-infra | 12:22 | |
*** jamesmcarthur has quit IRC | 12:26 | |
*** sree_ has joined #openstack-infra | 12:27 | |
*** sree_ is now known as Guest40215 | 12:27 | |
*** sshnaidm has quit IRC | 12:28 | |
*** jamesmcarthur has joined #openstack-infra | 12:28 | |
*** salv-orlando has quit IRC | 12:29 | |
*** sree has quit IRC | 12:30 | |
*** salv-orlando has joined #openstack-infra | 12:32 | |
*** jpena is now known as jpena|lunch | 12:33 | |
*** jamesmcarthur has quit IRC | 12:33 | |
*** katkapilatova has joined #openstack-infra | 12:39 | |
dmsimard | config-core, infra-root: I'll be semi-afk all week in a certification thingy so I might not be responsive to pings. Hopefully fungi and clarkb are back this week! | 12:40 |
mnaser | sigh | 12:41 |
mnaser | it looks like someone in puppet world took the liberty of doing the remove project name patches | 12:41 |
*** jamesmcarthur has joined #openstack-infra | 12:41 | |
mnaser | without a wait timeout | 12:41 |
mnaser | and then someone else did the same. causing even more load | 12:41 |
mnaser | then the 2nd person went and -1'd the 1st person's patches | 12:42 |
* mnaser flips table | 12:42 | |
*** pblaho has joined #openstack-infra | 12:42 | |
*** sshnaidm has joined #openstack-infra | 12:43 | |
*** cshastri has quit IRC | 12:43 | |
*** jamesmcarthur has quit IRC | 12:46 | |
*** janki has quit IRC | 12:48 | |
*** pcichy has quit IRC | 12:48 | |
*** armaan_ has joined #openstack-infra | 12:49 | |
*** jamesmcarthur has joined #openstack-infra | 12:50 | |
*** adarazs is now known as adarazs_lunch | 12:51 | |
*** armaan has quit IRC | 12:52 | |
*** jamesmcarthur has quit IRC | 12:55 | |
*** jamesmcarthur has joined #openstack-infra | 12:56 | |
*** jamesmcarthur has quit IRC | 13:01 | |
cmurphy | i think the zuul issues are going to start causing a lot of POST_FAILUREs like this one http://logs.openstack.org/41/538541/1/gate/neutron-grenade/4c60129/job-output.txt.gz :( | 13:04 |
AJaeger | tosky: best discuss with mordred , he's the master of publishing in CI ;) | 13:04 |
AJaeger | mnaser: and some people approved a large number of these changes ... | 13:05 |
tosky | AJaeger: I was trying to avoid filling mordred's request buffer :) | 13:05 |
AJaeger | tosky: I'm fine reviewing but can't help in designing. No time today to dig into this | 13:06 |
*** rosmaita has joined #openstack-infra | 13:06 | |
openstackgerrit | Balazs Gibizer proposed openstack-infra/project-config master: consolidate nova job definitions https://review.openstack.org/538908 | 13:07 |
tosky | AJaeger: sure, sure, it was not a request to proceed further; I just felt sorry about the load on mordred | 13:07 |
tosky | (and in general for all people with a lot of stuff on their plate) | 13:07 |
*** amoralej is now known as amoralej|lunch | 13:07 | |
*** sshnaidm_ has joined #openstack-infra | 13:11 | |
*** jamesmcarthur has joined #openstack-infra | 13:12 | |
ssbarnea | anyone know if gerritbot supports patterns for matching the project list? | 13:12 |
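The question above goes unanswered here, but for reference the gerritbot channel configuration in project-config lists projects explicitly per channel, roughly as sketched below (channel and project names are examples); whether glob patterns are honoured in that list is exactly what would need checking in gerritbot itself.

```yaml
# Rough sketch of a gerritbot channels.yaml entry; channel and project
# names are examples, and pattern support in the projects list is the
# open question above.
openstack-sahara:
  events:
    - patchset-created
    - change-merged
  projects:
    - openstack/sahara
    - openstack/sahara-extra
  branches:
    - master
```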
*** Guest40215 has quit IRC | 13:13 | |
*** pgadiya has quit IRC | 13:13 | |
rosmaita | is it just me, or has http://zuul.openstack.org/ become non-responsive? | 13:13 |
cmurphy | rosmaita: known issue https://wiki.openstack.org/wiki/Infrastructure_Status | 13:14 |
*** sshnaidm has quit IRC | 13:14 | |
cmurphy | hoping an infra-root can help soon :( | 13:14 |
rosmaita | cmurphy: ty, i always look at the topic in #openstack-infra-incident but looks like the wiki is a better place | 13:16 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Move github webhook from webapp to zuul-web https://review.openstack.org/535711 | 13:16 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Move status_url from webapp to web section https://review.openstack.org/536773 | 13:16 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove webapp https://review.openstack.org/536780 | 13:16 |
cmurphy | rosmaita: yeah there was a notice sent out but i guess that doesn't change the channel topics | 13:17 |
*** jamesmcarthur has quit IRC | 13:17 | |
*** dprince has joined #openstack-infra | 13:19 | |
*** jamesmcarthur has joined #openstack-infra | 13:19 | |
*** dhajare has quit IRC | 13:20 | |
*** panda is now known as panda|lunch | 13:22 | |
*** alexchad_ has joined #openstack-infra | 13:22 | |
*** kopecmartin has joined #openstack-infra | 13:23 | |
*** vivsoni has quit IRC | 13:23 | |
AJaeger | cmurphy: shall I change the topics? I can send out a status alert... | 13:24 |
*** alexchadin has quit IRC | 13:24 | |
*** vivsoni has joined #openstack-infra | 13:24 | |
*** mkopec_ has joined #openstack-infra | 13:25 | |
*** jamesmcarthur has quit IRC | 13:25 | |
cmurphy | AJaeger: your call | 13:26 |
AJaeger | ;) | 13:27 |
*** kopecmartin has quit IRC | 13:28 | |
AJaeger | What about this wording: #status alert Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead. | 13:29 |
cmurphy | AJaeger: lgtm | 13:29 |
AJaeger | #status alert Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead. | 13:29 |
openstackstatus | AJaeger: sending alert | 13:29 |
*** rlandy has joined #openstack-infra | 13:30 | |
AJaeger | thanks cmurphy. Then let's do it - we can always change. | 13:30 |
AJaeger | it might avoid some questions... | 13:30 |
-openstackstatus- NOTICE: Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead. | 13:32 | |
*** ChanServ changes topic to "Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead." | 13:32 | |
AJaeger | From Zuul (I could just attach it) "Queue lengths: 751 events, 0 management events, 1355 results." | 13:32 |
*** jpena|lunch is now known as jpena | 13:33 | |
*** fultonj has joined #openstack-infra | 13:34 | |
*** alexchadin has joined #openstack-infra | 13:34 | |
*** tmorin has joined #openstack-infra | 13:34 | |
openstackstatus | AJaeger: finished sending alert | 13:35 |
*** alexchad_ has quit IRC | 13:37 | |
*** d0ugal has quit IRC | 13:38 | |
*** d0ugal has joined #openstack-infra | 13:38 | |
AJaeger | infra-root, FYI ^ - please investigate and restart zuul | 13:40 |
*** edmondsw has joined #openstack-infra | 13:41 | |
*** trown|outtypewww is now known as trown | 13:42 | |
*** trown is now known as trown|rover | 13:42 | |
*** alexchadin has quit IRC | 13:43 | |
pabelanger | morning | 13:44 |
*** tellesnobrega has quit IRC | 13:44 | |
*** tellesnobrega has joined #openstack-infra | 13:45 | |
*** hemna_ has joined #openstack-infra | 13:45 | |
AJaeger | pabelanger, good morning! Ready for a short fire drill? Otherwise grab your coffee first, please... | 13:45 |
pabelanger | sure, grabbing coffee | 13:46 |
pabelanger | and seeing if I can save queues | 13:46 |
*** alexchadin has joined #openstack-infra | 13:47 | |
AJaeger | pabelanger: don't restore *all* changes, otherwise we swap again | 13:47 |
AJaeger | pabelanger: my theory: We're swapping due to too many approvals of topic:zuulv3-projects changes | 13:48 |
pabelanger | AJaeger: why, is there a specific changes thta pushed up over? | 13:48 |
pabelanger | okay, I'm not sure how to filter them out | 13:48 |
AJaeger | pabelanger: just this list: https://review.openstack.org/#/q/status:open+topic:zuulv3-projects | 13:49 |
pabelanger | we can start with gate first | 13:49 |
AJaeger | pabelanger: those are in gate... | 13:49 |
AJaeger | remove all openstack-ansible from gate ;) | 13:49 |
AJaeger | that's the bulk of the changes | 13:49 |
pabelanger | that won't be easy | 13:49 |
pabelanger | I only have patchset ids | 13:50 |
pabelanger | actually | 13:50 |
pabelanger | let me first dump queue | 13:50 |
AJaeger | pabelanger: worst case: Do not restore gate for now... | 13:51 |
AJaeger | pabelanger: sorry, can't help further - meeting time. | 13:51 |
*** dhajare has joined #openstack-infra | 13:52 | |
pabelanger | okay, I also know the project | 13:54 |
pabelanger | dumping queues | 13:54 |
pabelanger | stopping zuul | 13:55 |
*** kgiusti has joined #openstack-infra | 13:56 | |
AJaeger | pabelanger: can you do the #status ok later, please? | 13:57 |
* AJaeger gave 30 mins ago a status alert | 13:58 | |
*** sree has joined #openstack-infra | 13:58 | |
*** ykarel is now known as ykarel|away | 14:00 | |
pabelanger | still bringing zuul back online | 14:00 |
*** yamamoto has quit IRC | 14:01 | |
*** slaweq has quit IRC | 14:02 | |
pabelanger | merger:cat running | 14:02 |
*** slaweq has joined #openstack-infra | 14:02 | |
*** mriedem has joined #openstack-infra | 14:03 | |
*** sree has quit IRC | 14:03 | |
pabelanger | loading queue (minus some OSA patches) | 14:05 |
*** zhenguo has quit IRC | 14:05 | |
*** ykarel|away has quit IRC | 14:06 | |
*** ihrachys has joined #openstack-infra | 14:06 | |
*** dhajare has quit IRC | 14:07 | |
*** slaweq has quit IRC | 14:07 | |
*** jamesmcarthur has joined #openstack-infra | 14:08 | |
*** dhill_ has joined #openstack-infra | 14:09 | |
*** tmorin has quit IRC | 14:09 | |
pabelanger | loading check queue now | 14:12 |
*** david-lyle has quit IRC | 14:15 | |
*** psachin has quit IRC | 14:15 | |
*** dmsimard is now known as dmsimard|afk | 14:17 | |
*** eyalb has joined #openstack-infra | 14:17 | |
*** mkopec_ has quit IRC | 14:17 | |
*** yamamoto has joined #openstack-infra | 14:18 | |
*** r-daneel has joined #openstack-infra | 14:20 | |
*** myoung is now known as myoung|reboot | 14:20 | |
pabelanger | okay, I've stopped loading changes into check, up to 18GB of RAM | 14:21 |
*** Goneri has joined #openstack-infra | 14:24 | |
*** jcoufal has joined #openstack-infra | 14:24 | |
*** edmondsw has quit IRC | 14:25 | |
*** dhajare_ has joined #openstack-infra | 14:25 | |
pabelanger | #status notice we've been able to restart zuul, and re-enqueue changes for gate. Please hold off on recheck or approves, we are still recovering. More info shortly. | 14:28 |
openstackstatus | pabelanger: sending notice | 14:28 |
evrardjp | pabelanger: what's happening on our side? | 14:28 |
*** superdan is now known as dansmith | 14:28 | |
*** dave-mccowan has joined #openstack-infra | 14:28 | |
*** edmondsw has joined #openstack-infra | 14:28 | |
*** myoung|reboot is now known as myoung | 14:29 | |
-openstackstatus- NOTICE: we've been able to restart zuul, and re-enqueue changes for gate. Please hold off on recheck or approves, we are still recovering. More info shortly. | 14:29 | |
*** makowals has quit IRC | 14:29 | |
*** dhajare_ has quit IRC | 14:30 | |
pabelanger | AJaeger: do we know how to stop corvus script? | 14:31 |
openstackstatus | pabelanger: finished sending notice | 14:31 |
*** makowals has joined #openstack-infra | 14:31 | |
*** makowals has quit IRC | 14:32 | |
*** ralonsoh_ has joined #openstack-infra | 14:32 | |
mnaser | pabelanger: re approving puppet-openstack, did that and notified #puppet-openstack to not approve anything in the meantime too :) | 14:32 |
*** dave-mccowan has quit IRC | 14:33 | |
*** efoley has quit IRC | 14:33 | |
*** efoley has joined #openstack-infra | 14:33 | |
pabelanger | great, thank you | 14:33 |
*** makowals has joined #openstack-infra | 14:34 | |
*** ykarel|away has joined #openstack-infra | 14:34 | |
*** dave-mccowan has joined #openstack-infra | 14:34 | |
pabelanger | so, we're at about 19GB for zuul right now | 14:34 |
pabelanger | things look to have leveled off for the moment | 14:35 |
*** eyalb has left #openstack-infra | 14:35 | |
*** ralonsoh has quit IRC | 14:35 | |
*** jamesmca_ has joined #openstack-infra | 14:36 | |
AJaeger | pabelanger: corvus's script sends a new change every 20 mins - best talk with him on next steps. It was all going well so far - but I see at least 40 of those changes *approved*, so that was too much. Perhaps he has to stop for now... | 14:37 |
*** jamesmca_ is now known as jamesmcarthur_ | 14:39 | |
d0ugal | Has this moved? http://zuulv3.openstack.org/ | 14:40 |
mnaser | d0ugal: zuul.openstack.org | 14:41 |
evrardjp | d0ugal: yes, a while ago | 14:41 |
d0ugal | aha, thanks | 14:41 |
d0ugal | (sorry, I have been out for a few weeks) | 14:41 |
evrardjp | haha, that would explain it. There was a redirect from zuulv3 to zuul.openstack.org before. | 14:41 |
evrardjp | just changing the bookmark would do the trick :p | 14:42 |
*** amoralej|lunch is now known as amoralej | 14:42 | |
*** rloo has joined #openstack-infra | 14:43 | |
rloo | hi, is there some way to remove a patch from zuul, e.g. abandon the patch? | 14:43 |
pabelanger | rloo: yes, abandon will dequeue it from zuul | 14:44 |
rloo | pabelanger: sweet. will do that until things are working better in zuul-la-la-land | 14:44 |
*** dhajare_ has joined #openstack-infra | 14:45 | |
*** janki has joined #openstack-infra | 14:46 | |
*** kopecmartin has joined #openstack-infra | 14:47 | |
*** daidv has quit IRC | 14:49 | |
*** jcoufal has quit IRC | 14:49 | |
*** daidv has joined #openstack-infra | 14:49 | |
*** salv-orlando has quit IRC | 14:49 | |
*** salv-orlando has joined #openstack-infra | 14:50 | |
*** esberglu has joined #openstack-infra | 14:51 | |
*** gcb has joined #openstack-infra | 14:52 | |
*** ykarel|away has quit IRC | 14:53 | |
*** bfournie has joined #openstack-infra | 14:54 | |
*** salv-orlando has quit IRC | 14:55 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/nodepool master: Fix relaunch attempts when hitting quota errors https://review.openstack.org/536930 | 14:58 |
*** fresta has quit IRC | 14:59 | |
*** jcoufal has joined #openstack-infra | 15:00 | |
*** fresta has joined #openstack-infra | 15:01 | |
*** lucasagomes is now known as lucas-hungry | 15:01 | |
*** gibi_ is now known as gibi | 15:01 | |
*** fresta has quit IRC | 15:02 | |
*** r-daneel_ has joined #openstack-infra | 15:02 | |
*** rfolco has joined #openstack-infra | 15:03 | |
*** rfolco|ruck has quit IRC | 15:03 | |
*** rfolco_ has joined #openstack-infra | 15:03 | |
pabelanger | afk for a few moments | 15:03 |
*** Goneri has quit IRC | 15:03 | |
*** r-daneel has quit IRC | 15:04 | |
*** r-daneel_ is now known as r-daneel | 15:04 | |
*** tmorin has joined #openstack-infra | 15:04 | |
*** slaweq has joined #openstack-infra | 15:05 | |
*** rfolco_ is now known as rfolco|ruck | 15:05 | |
*** adarazs_lunch is now known as adarazs | 15:05 | |
*** kiennt26 has joined #openstack-infra | 15:05 | |
*** fresta has joined #openstack-infra | 15:06 | |
mriedem | e-r seems to be dead http://status.openstack.org/ - no updates since saturday | 15:09 |
*** slaweq has quit IRC | 15:10 | |
*** yamahata has joined #openstack-infra | 15:10 | |
*** r-daneel has quit IRC | 15:11 | |
*** r-daneel has joined #openstack-infra | 15:11 | |
sc` | the monday thundering herd :( | 15:11 |
*** hongbin has joined #openstack-infra | 15:12 | |
*** panda|lunch is now known as panda | 15:12 | |
*** gus has quit IRC | 15:13 | |
pabelanger | mriedem: I can look shortly | 15:13 |
*** makowals_ has joined #openstack-infra | 15:13 | |
sc` | my cookbooks' tests prevented me from pushing more than one of the zuul changes through at a time for chef. the ones that checked cleanly got merged to keep the number of in-flight changes low | 15:13 |
*** eharney has joined #openstack-infra | 15:13 | |
sc` | ...come monday. thanks, finger-return. | 15:14 |
*** gus has joined #openstack-infra | 15:14 | |
*** makowals has quit IRC | 15:16 | |
*** Goneri has joined #openstack-infra | 15:17 | |
mgagne | pabelanger: did you see any improvement since we enable the quota refresher in inap-mtl01? | 15:19 |
*** oidgar has quit IRC | 15:19 | |
*** rfolco has quit IRC | 15:20 | |
pabelanger | mgagne: possibly - when did you enable it? I haven't seen any failures in a while | 15:20 |
*** rfolco has joined #openstack-infra | 15:20 | |
pabelanger | mgagne: aside from when we upload new images, the thundering herd issue | 15:20 |
mgagne | enabled since January 16 | 15:21 |
*** rfolco has quit IRC | 15:21 | |
mgagne | pabelanger: ok, don't you have the same issue with other providers? | 15:21 |
mgagne | or only with us? | 15:21 |
pabelanger | mgagne: we have an issue in citycloud with quotas, but think that is a bug in nodepool. | 15:22 |
pabelanger | mgagne: maybe OVH would be the other place where we see some quota issues | 15:23 |
mgagne | pabelanger: and how about the thundering herd issue? | 15:23 |
*** hemna_ has quit IRC | 15:23 | |
pabelanger | mgagne: mostly inap, we often see compute nodes fail to boot the new images just after upload. But it's usually fixed by the 2nd time we launch on the compute node | 15:24 |
smcginnis | We're still supposed to be holding off on approvals, right? | 15:24 |
*** myoung is now known as myoung|brb | 15:24 | |
pabelanger | smcginnis: yah, for a bit longer. Would like to make sure corvus or another infra-root is online before we open the flood gates again | 15:24 |
mgagne | pabelanger: ok, I can try something on my side and see if it improves | 15:25 |
smcginnis | pabelanger: ack, thanks! | 15:25 |
pabelanger | mgagne: I think we talked about this in the past, but do compute nodes convert images to raw after download? Or is that setting disabled in nova? | 15:26 |
mgagne | pabelanger: you are thinking like me =) | 15:26 |
*** bobh_ has joined #openstack-infra | 15:26 | |
pabelanger | mgagne: :) | 15:26 |
*** ijw has joined #openstack-infra | 15:28 | |
*** bobh_ has quit IRC | 15:28 | |
*** mylu has joined #openstack-infra | 15:33 | |
*** dhajare_ has quit IRC | 15:35 | |
*** Goneri has quit IRC | 15:36 | |
*** alexchadin has quit IRC | 15:37 | |
*** myoung|brb is now known as myoung | 15:37 | |
*** Goneri has joined #openstack-infra | 15:39 | |
*** r-daneel_ has joined #openstack-infra | 15:39 | |
*** r-daneel has quit IRC | 15:40 | |
*** r-daneel_ is now known as r-daneel | 15:40 | |
*** sree has joined #openstack-infra | 15:43 | |
*** claudiub has joined #openstack-infra | 15:46 | |
corvus | AJaeger, pabelanger: i stopped the script. it shouldn't have been a problem for zuul, but i guess folks got carried away? | 15:47 |
*** gcb has quit IRC | 15:47 | |
pabelanger | corvus: I'm still looking to see what tipped us over on memory, but ya, first indications are we just +A'd too many at once | 15:48 |
AJaeger | corvus: 40+ of your changes approved in short order and all in gate is my theory of what brought us down... | 15:48 |
*** claudiub|3 has joined #openstack-infra | 15:48 | |
*** esberglu has quit IRC | 15:48 | |
*** hemna_ has joined #openstack-infra | 15:49 | |
*** claudiub|2 has quit IRC | 15:49 | |
*** esberglu has joined #openstack-infra | 15:49 | |
*** ramishra has quit IRC | 15:49 | |
corvus | perhaps i should only run the script during the week instead of on the weekend | 15:49 |
AJaeger | corvus: yeah... | 15:50 |
AJaeger | how do you want to get those in that are approved already? | 15:51 |
*** salv-orlando has joined #openstack-infra | 15:51 | |
*** claudiub has quit IRC | 15:51 | |
*** mylu has quit IRC | 15:53 | |
*** hemna_ has quit IRC | 15:54 | |
* AJaeger will be back later | 15:54 | |
corvus | AJaeger: not sure i understand the question | 15:54 |
*** andreww has joined #openstack-infra | 15:54 | |
*** mylu has joined #openstack-infra | 15:55 | |
AJaeger | corvus: we have 40+ approved but not merged changes for this. Do you want to recheck them one by one? Merge them with some time in between? Or what should be done with them? | 15:55 |
*** salv-orlando has quit IRC | 15:55 | |
*** xarses_ has joined #openstack-infra | 15:55 | |
*** tosky has quit IRC | 15:56 | |
*** inc0 has joined #openstack-infra | 15:56 | |
*** david-lyle has joined #openstack-infra | 15:56 | |
*** b_bezak has quit IRC | 15:57 | |
*** sree_ has joined #openstack-infra | 15:57 | |
*** felipemonteiro has joined #openstack-infra | 15:57 | |
corvus | AJaeger: i don't really want to do anything with them right now. | 15:57 |
*** sree_ is now known as Guest64456 | 15:57 | |
*** eharney has quit IRC | 15:58 | |
mnaser | corvus: there are 2 individuals who have been doing a lot of bulk changes in puppet openstack without discussing with us; they noticed your changes and started doing the same as you | 15:58 |
mnaser | but without 20 minutes.. | 15:58 |
mnaser | i think their attempt at helping might have contributed | 15:58 |
*** andreww has quit IRC | 15:59 | |
*** sree has quit IRC | 15:59 | |
corvus | mnaser: i guess i should send out emails when i do this. i had hoped to just save everyone from having to worry or even think about it by just doing it. | 15:59 |
AJaeger | mnaser: I saw melissaml and Tuan - and sent both emails once I noticed. Tuan abandoned the changes. I think melissa did not submit new ones... | 15:59 |
*** lucas-hungry is now known as lucasagomes | 15:59 | |
mnaser | AJaeger: thanks, i tried to send emails too, some -1's were tossed at each other :\ | 16:00 |
mnaser | we see this often in puppet-openstack, where we get a big bulk of changes like this without being consulted, but anyways | 16:00 |
AJaeger | mnaser: and a third person indeed on puppet Hoang Trung Hieu ;( | 16:00 |
*** eharney has joined #openstack-infra | 16:01 | |
mnaser | its a bit of a mess yeah, i'm waiting for gate to slow down and i was staggering my approves too | 16:01 |
*** felipemonteiro has quit IRC | 16:01 | |
AJaeger | question now is, abandon those puppet reviews and let corvus's ones in - or merge them? | 16:02 |
* AJaeger really leaves now | 16:02 | |
mnaser | AJaeger: ill discuss this with the puppet team | 16:03 |
*** tesseract has quit IRC | 16:03 | |
*** tosky has joined #openstack-infra | 16:04 | |
pabelanger | okay, zuul's results queue looks to be caught up. I'm hoping the waviness of nodes / requests will start to level out: http://grafana.openstack.org/dashboard/db/zuul-status | 16:05 |
*** jappleii__ has quit IRC | 16:05 | |
pabelanger | I'm going to top up coffee | 16:05 |
*** hemna_ has joined #openstack-infra | 16:06 | |
*** kiennt26 has quit IRC | 16:10 | |
EmilienM | pabelanger: I'm not sure what to do with https://review.openstack.org/#/c/538012/ | 16:13 |
EmilienM | in POST_FAILURE for release notes | 16:13 |
*** xarses_ has quit IRC | 16:14 | |
*** slaweq has joined #openstack-infra | 16:14 | |
*** slaweq has quit IRC | 16:14 | |
*** xarses_ has joined #openstack-infra | 16:14 | |
EmilienM | is it related to the zuul heavy load? | 16:14 |
pabelanger | EmilienM: looks like bug with job: http://logs.openstack.org/12/538012/1/check/build-openstack-releasenotes/5ee5d6d/job-output.txt.gz#_2018-01-29_16_02_43_562070 | 16:14 |
pabelanger | EmilienM: I'll dig more into it shortly, still watching zuul this morning | 16:15 |
corvus | pabelanger: fyi the status alert still says don't recheck or approve | 16:15 |
pabelanger | corvus: yah, I think we can clear now. | 16:15 |
clarkb | EmilienM: pabelanger looks like the release note build failed because the source dir couldn't be found which led to not having build artifacts to sync which led to a post failure | 16:16 |
pabelanger | how does the following sound: status ok zuul.o.o is back online, feel free to recheck / approve patches. | 16:16 |
*** dsariel has quit IRC | 16:16 | |
*** mylu has quit IRC | 16:17 | |
EmilienM | clarkb: ah yeah, sources dir is missing, let me fix that | 16:17 |
EmilienM | that's probably it | 16:17 |
*** felipemonteiro has joined #openstack-infra | 16:18 | |
pabelanger | #status ok zuul.o.o is back online, feel free to recheck / approve patches. | 16:18 |
openstackstatus | pabelanger: sending ok | 16:18 |
pabelanger | AJaeger: mriedem: looks like we still need to get https://review.openstack.org/533608/ to help nova. I rechecked the depends-on patches already | 16:19 |
*** links has quit IRC | 16:20 | |
mriedem | pabelanger: yup thanks - i was holding off on the rechecks b/c of the earlier 'don't recheck' | 16:20 |
*** ChanServ changes topic to "Discussion of OpenStack Developer and Community Infrastructure | docs http://docs.openstack.org/infra/ | bugs https://storyboard.openstack.org/ | source https://git.openstack.org/cgit/openstack-infra/ | channel logs http://eavesdrop.openstack.org/irclogs/%23openstack-infra/" | 16:20 | |
-openstackstatus- NOTICE: zuul.o.o is back online, feel free to recheck / approve patches. | 16:20 | |
*** felipemonteiro_ has joined #openstack-infra | 16:21 | |
openstackgerrit | Merged openstack-infra/nodepool master: Fix race in test_failed_provider https://review.openstack.org/538529 | 16:21 |
pabelanger | clarkb: have you seen http://paste.openstack.org/show/657148/ before? that is on logstash-worker01.o.o | 16:22 |
clarkb | pabelanger: ya that shouldn't be fatal I don't think unless we've filled the disk again with crm114 data | 16:23 |
*** olaph1 has joined #openstack-infra | 16:24 | |
openstackstatus | pabelanger: finished sending ok | 16:24 |
openstackgerrit | Javier Peña proposed openstack-infra/system-config master: Move AFS mirror code to puppet-openstackci https://review.openstack.org/529032 | 16:24 |
*** olaph has quit IRC | 16:24 | |
*** felipemonteiro has quit IRC | 16:25 | |
pabelanger | clarkb: yah, doesn't appear to be fatal, but I haven't seen that worker process anything else after it | 16:25 |
pabelanger | let me poke around why that is | 16:25 |
*** pcaruana has quit IRC | 16:25 | |
pabelanger | logproc+ 18309 4.5 0.0 0 0 ? Z 16:19 0:20 [classify-log.cr] <defunct> | 16:27 |
pabelanger | that doesn't looks healthy | 16:27 |
pabelanger | around same time too | 16:27 |
*** yamamoto has quit IRC | 16:28 | |
pabelanger | clarkb: where should I be looking for crm114 data? | 16:28 |
clarkb | pabelanger: /var/run/crm114 iirc | 16:29 |
clarkb | might be /var/lib/crm114 | 16:29 |
clarkb | pabelanger: typically when I debug e-r/es/logstash problems I try to start at the bottom of the pipeline. So make sure es is happy first. Then logstash is running. Then logstash workers. It could be that logstash OOM'd or something which made the workers unhappy | 16:32 |
*** dhill_ has quit IRC | 16:33 | |
*** kopecmartin has quit IRC | 16:34 | |
pabelanger | okay | 16:34 |
*** slaweq has joined #openstack-infra | 16:34 | |
*** armaan has joined #openstack-infra | 16:36 | |
*** andreas_s has quit IRC | 16:37 | |
*** tushar_ has joined #openstack-infra | 16:37 | |
*** andreas_s has joined #openstack-infra | 16:38 | |
*** armaan_ has quit IRC | 16:39 | |
tushar_ | Hi | 16:39 |
tushar_ | can we use zuul v2 with github instead of gerrit? | 16:39 |
*** felipemonteiro_ has quit IRC | 16:40 | |
pabelanger | tushar_: only zuulv3 today has native support for github integration | 16:40 |
*** dhill_ has joined #openstack-infra | 16:40 | |
*** andreas_s has quit IRC | 16:41 | |
*** andreas_s has joined #openstack-infra | 16:42 | |
*** andreas_s has quit IRC | 16:42 | |
*** andreas_s has joined #openstack-infra | 16:43 | |
*** yamamoto has joined #openstack-infra | 16:43 | |
*** jpena is now known as jpena|brb | 16:44 | |
*** spligak_ has quit IRC | 16:44 | |
pabelanger | mgagne: do you mind checking the following IP in http://logs.openstack.org/23/524423/44/gate/openstack-tox-py35/a86c30b/zuul-info/inventory.yaml | 16:45 |
mgagne | pabelanger: what's happening? | 16:45 |
mgagne | orphan/zombie instance? | 16:45 |
pabelanger | mgagne: we're seeing WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! | 16:45 |
pabelanger | yah, maybe | 16:45 |
pabelanger | http://logs.openstack.org/23/524423/44/gate/openstack-tox-py35/a86c30b/job-output.txt.gz#_2018-01-29_16_35_05_946124 | 16:45 |
tushar_ | pabelanger : ok, that means zuul v2 can only work with gerrit | 16:45 |
*** e0ne has quit IRC | 16:46 | |
pabelanger | tushar_: yah, we only supported gerrit for zuulv2 | 16:46 |
tushar_ | pabelanger : thank you | 16:46 |
*** slaweq_ has joined #openstack-infra | 16:47 | |
*** tesseract has joined #openstack-infra | 16:47 | |
*** oidgar has joined #openstack-infra | 16:47 | |
tobiash | tushar_: there were some experimental patches for v2 but | 16:48 |
*** yamamoto has quit IRC | 16:48 | |
tobiash | I think for github it's better to switch to v3 | 16:48 |
*** ykarel|away has joined #openstack-infra | 16:49 | |
tushar_ | tobiash : yes , thank you | 16:49 |
prometheanfire | 10:50 < smcginnis > prometheanfire: I think I saw something that the "/#/c" format of the URLs don't work for the new Depends-on syntax. | 16:50 |
prometheanfire | can someone confirm? | 16:50 |
mgagne | pabelanger: I found the instance and destroyed it | 16:51 |
mordred | prometheanfire: the c format works fine | 16:51 |
*** salv-orlando has joined #openstack-infra | 16:51 | |
*** yamamoto has joined #openstack-infra | 16:51 | |
prometheanfire | mordred: ok, thanks | 16:51 |
*** yamamoto has quit IRC | 16:51 | |
prometheanfire | smcginnis: #/c works aparently :D | 16:52 |
*** slaweq_ has quit IRC | 16:52 | |
*** andreas_s has quit IRC | 16:52 | |
pabelanger | mgagne: thanks! do you know the UUID? Want to see if we had an error on nodepool side | 16:52 |
smcginnis | Oh, great. | 16:52 |
mgagne | c7de571d-bc73-4945-97b6-4c738276962e | 16:52 |
*** Guest64456 has quit IRC | 16:53 | |
smcginnis | prometheanfire, mordred: What about the second to last paragraph here: http://lists.openstack.org/pipermail/openstack-dev/2018-January/126650.html | 16:53 |
Roamer` | tobiash, pabelanger, tushar_, but can't one still use github with zuulv2 only as a Git mirror for "projects from git", not as an event stream? Still follow the review.o.o event stream and get the patches from Gerrit, but then check out the other repos from GitHub? | 16:53 |
*** salv-orlando has quit IRC | 16:54 | |
Roamer` | (I've never tried it, I just can't see a reason why it would not work) | 16:54 |
*** salv-orlando has joined #openstack-infra | 16:54 | |
*** eharney_ has joined #openstack-infra | 16:54 | |
mgagne | pabelanger: seems there are more, will check | 16:54 |
tobiash | Roamer`: sure, that works | 16:56 |
*** andreas_s has joined #openstack-infra | 16:57 | |
pabelanger | mgagne: nodepool looks to have been fine with that UUID, no exceptions raised | 16:57 |
*** jamesmcarthur has quit IRC | 16:57 | |
mordred | smcginnis: oh! well, ignore me and pay attention to jim always | 16:57 |
mgagne | pabelanger: ok, could be that if you restart a Nova service, it might just drop the request and never properly delete the instance. | 16:58 |
pabelanger | Roamer`: right, in that case zuul doesn't have any interactions with github, just gerrit. So that is supported, as long as you replicate projects to github, end users could pull from github. But like you say, patches must be sent to gerrit | 16:58 |
pabelanger | mgagne: k | 16:58 |
*** eharney has quit IRC | 16:58 | |
mordred | prometheanfire: (also, pay attention to smcginnis and corvus and ignore me - I'm wrong) | 16:58 |
mgagne | pabelanger: deleting ~10 rogue instances | 16:58 |
*** jpich has quit IRC | 16:59 | |
pabelanger | mgagne: thank you | 16:59 |
prometheanfire | mordred: ok, so just remove the /#/c from the url? | 17:00 |
mordred | prometheanfire: yah | 17:00 |
smcginnis | mordred: :) | 17:00 |
mnaser | at what point can i queue another check in zuul to avoid memory blowing up? once i see the jobs appear in zuul.openstack.org -- can i continue at that point? | 17:00 |
mgagne | all done. bug is still in our backlog... =( | 17:00 |
*** andreas_s has quit IRC | 17:01 | |
mordred | Roamer`: you can ... but you should almost certainly not, as github is actually less stable as a source of cloning git refs. you should use git.openstack.org as a source of cloning refs and review.o.o as the source of the event stream | 17:03 |
*** janki has quit IRC | 17:04 | |
mordred | Roamer`: I'm not saying that to be a hater, fwiw - we recently connected zuulv3 to ansible/ansible on github and had a pile of issues related to being able to clone/fetch from github which caused us to need to turn off the github connection until we can go make the cloning more robust | 17:05 |
*** janki has joined #openstack-infra | 17:05 | |
*** pcichy has joined #openstack-infra | 17:06 | |
tobiash | mordred: did you have problems with auth and so on or just networking instabilities? | 17:08 |
*** ijw has quit IRC | 17:09 | |
mordred | tobiash: just networking and/or missing refs | 17:09 |
tobiash | mordred: when running in github app mode you also need https://review.openstack.org/#/c/535716/ | 17:09 |
*** janki has quit IRC | 17:09 | |
*** sshnaidm_ is now known as sshnaidm | 17:09 | |
tobiash | at least if auth matters | 17:10 |
mordred | tobiash: oh - it's possible I don't actually know - in this case it's a public repo so we should have just been cloning/fetching normally | 17:10 |
tobiash | (and it matters for api rate limits) | 17:10 |
*** wolverineav has quit IRC | 17:10 | |
pabelanger | mnaser: if the results queue on zuul.o.o is backing up (currently 316 results) then it is likely zuul is doing dynamic reloads. Not always, but it seems to be a good indicator. | 17:10 |
mnaser | ok i'll watch that number and try to approve when that number is low | 17:10 |
pabelanger | mnaser: ideally it should be 0 most of the time, when it is growing, zuul is usually doing something with CPU | 17:11 |
* mnaser write a bot to notify when is a good time to approve changes :P | 17:11 | |
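(A minimal sketch of the kind of watcher mnaser jokes about here, assuming Zuul's public status.json exposes the result event queue length; the endpoint URL and the JSON field names below are assumptions for illustration, not a documented contract.)

    # Poll Zuul's status endpoint and report whether the results queue looks
    # quiet enough to approve changes. Field names are assumed, not confirmed.
    import json
    import time
    import urllib.request

    STATUS_URL = "http://zuul.openstack.org/status.json"  # assumed endpoint
    QUIET_THRESHOLD = 10  # "ideally it should be 0 most of the time"

    def results_queue_length():
        with urllib.request.urlopen(STATUS_URL, timeout=30) as resp:
            status = json.loads(resp.read().decode("utf-8"))
        return status.get("result_event_queue", {}).get("length", 0)

    if __name__ == "__main__":
        while True:
            length = results_queue_length()
            state = "quiet, ok to approve" if length <= QUIET_THRESHOLD else "busy, hold off"
            print("results queue: %d (%s)" % (length, state))
            time.sleep(60)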
mordred | tobiash: I haven't looked at the actual errors in the logs myself - but I think ours were different than yours in that patch | 17:12 |
*** eharney_ is now known as eharney | 17:12 | |
tobiash | mordred: probably, but github also rate limits anonymous clones | 17:13 |
*** armax has joined #openstack-infra | 17:14 | |
mordred | tobiash: good point | 17:14 |
pabelanger | mordred: mnaser: do you mind reviewing https://review.openstack.org/537995/, it is the removal of infracloud-chocolate from nodepool. | 17:15 |
pabelanger | would like to see about starting to clean that up this week | 17:15 |
*** ykarel|away has quit IRC | 17:16 | |
mordred | pabelanger: do you want me to avoid +A? | 17:16 |
mordred | pabelanger: +2 - will let you +A as needed | 17:16 |
pabelanger | mordred: yah, just wanted to get some eyes on it. I'm happy to +3 if everybody is good | 17:17 |
pabelanger | clarkb: ^ | 17:17 |
Shrews | pabelanger: oh, that reminds me. i'm going to use the new erase command to cleanup vanilla data | 17:17 |
pabelanger | Shrews: yay | 17:17 |
*** tmorin has quit IRC | 17:18 | |
Shrews | pabelanger: w00t. success | 17:18 |
Shrews | 7 nodes of build data removed | 17:18 |
pabelanger | cool | 17:19 |
mnaser | pabelanger: lgtm too, feel free to +A when you would like | 17:20 |
pabelanger | Shrews: is there another command to erase images? | 17:20 |
Shrews | pabelanger: no, 'erase' does both images and nodes (note the actual images or instances, just the zk data) | 17:21 |
Shrews | pabelanger: i had manually done the vanilla nodes earlier (which led to me creating the new command) | 17:21 |
*** r-daneel_ has joined #openstack-infra | 17:21 | |
Shrews | s/note/not/ | 17:22 |
Shrews | pabelanger: 'nodepool info infracloud-chocolate' will show what will be removed if you s/info/erase/ | 17:22 |
*** r-daneel has quit IRC | 17:23 | |
*** r-daneel_ is now known as r-daneel | 17:23 | |
clarkb | pabelanger: ya I just wanted to make sure we didn't remove it if someone had more info or had reason to keep it | 17:24 |
clarkb | but its been a while now probably fine to remove if it hasn't come back | 17:24 |
pabelanger | Shrews: okay, cool. So did you run it against vanilla right? | 17:25 |
Shrews | pabelanger: correct. you want to do chocolate? | 17:25 |
*** mlavalle has joined #openstack-infra | 17:25 | |
pabelanger | Shrews: yah, think so. It is still down and we are landing 537995 | 17:25 |
pabelanger | clarkb: wfm | 17:26 |
mlavalle | Hi, are we also expected to move the neutron periodic jobs to the neutron repo: https://github.com/openstack-infra/project-config/blob/master/zuul.d/projects.yaml#L10220? | 17:26 |
*** gyee has joined #openstack-infra | 17:29 | |
*** armaan_ has joined #openstack-infra | 17:29 | |
*** armaan_ has quit IRC | 17:31 | |
clarkb | mlavalle: I think the ultimate goal is to have everything except for maybe the system-required template moved into the repos themselves | 17:31 |
clarkb | that said, I'm not sure how periodic which are branchless will interact if defined in branched repos | 17:31 |
*** jpena|brb is now known as jpena | 17:31 | |
*** armaan_ has joined #openstack-infra | 17:32 | |
mlavalle | clarkb: thanks. how can we clarify that? | 17:32 |
*** armaan has quit IRC | 17:32 | |
clarkb | mlavalle: probably just need to try it and see how it works. I think you may end up wanting to define branch specific periodic jobs in each branch? | 17:33 |
corvus | yes, that should work | 17:33 |
mlavalle | clarkb, corvus: ok cool. I'll put together a patch for Neutron master and start playing with it. I will hold off on merging until Rocky | 17:34 |
corvus | instead of 'periodic-neutron-foo-pike' just create 'periodic-neutron-foo' and define it on the pike branch, and put another copy on the ocata branch, etc. | 17:34 |
mlavalle | ok, sounds reasonable | 17:35 |
mlavalle | matches this: https://docs.openstack.org/infra/manual/zuulv3.html#periodic-jobs | 17:35 |
*** slaweq_ has joined #openstack-infra | 17:36 | |
AJaeger | mlavalle: no need to name them periodic-, see https://docs.openstack.org/infra/manual/drivers.html#consistent-naming-for-jobs-with-zuul-v3 | 17:37 |
AJaeger | mlavalle: and you can easily test them: add them *initially* to check/gate to see that the job works - but don't merge. Once it works, add them to the periodic pipeline. | 17:38 |
*** ykarel|away has joined #openstack-infra | 17:38 | |
mlavalle | AJaeger: ok, will keep both things in mind :-) | 17:38 |
mlavalle | thanks | 17:38 |
mlavalle | we will merge after we release Queens | 17:39 |
*** oidgar has quit IRC | 17:40 | |
*** slaweq_ has quit IRC | 17:40 | |
corvus | oh, yes, what AJaeger said. | 17:41 |
*** myoung is now known as myoung|food | 17:42 | |
smcginnis | Looks like we had a release job failure for python-blazarclient. | 17:42 |
smcginnis | - tag-releases finger://ze03.openstack.org/1a4929c9670547a89fd3eb23896329d7 : POST_FAILURE in 3m 55s | 17:42 |
smcginnis | How do we determine what happened there? | 17:43 |
smcginnis | cc dhellmann ^ | 17:43 |
corvus | smcginnis: infra-root is required if all we have is the finger url | 17:43 |
corvus | smcginnis: i'll dig it up | 17:43 |
smcginnis | Thanks corvus | 17:44 |
corvus | 2018-01-29 17:28:16,458 DEBUG zuul.AnsibleJob: [build: 1a4929c9670547a89fd3eb23896329d7] msg: 'There was an issue creating /srv/static/logs/9c/9c06fcd854a83f2e348e062cbea507b7d752369a | 17:45 |
corvus | 2018-01-29 17:28:16,459 DEBUG zuul.AnsibleJob: [build: 1a4929c9670547a89fd3eb23896329d7] as requested: [Errno 13] Permission denied: ''/srv/static/logs/9c/9c06fcd854a83f2e348e062cbea507b7d752369a''' | 17:45 |
corvus | dmsimard|afk, infra-root: ^ i suspect something is amiss with the rsync | 17:46 |
pabelanger | oh no | 17:46 |
*** jamesmcarthur_ has quit IRC | 17:46 | |
corvus | ls -la /srv/static/logs/9c/ | 17:46 |
dhellmann | smcginnis : the new missing-releases output: http://paste.openstack.org/show/657270/ | 17:46 |
corvus | drwx------ 2 root root 4096 Jan 28 14:04 . | 17:46 |
pabelanger | read-only FS again | 17:47 |
corvus | pabelanger: where do you see that? | 17:47 |
corvus | pabelanger: it's an ownership issue | 17:47 |
openstackgerrit | Merged openstack-infra/project-config master: Remove infracloud-chocolate from nodepool https://review.openstack.org/537995 | 17:47 |
pabelanger | corvus: still confirming | 17:47 |
pabelanger | corvus: yes, please ignore me | 17:48 |
corvus | ok | 17:48 |
*** jamesmcarthur has joined #openstack-infra | 17:49 | |
corvus | infra-root: i'm not really well enough to operate machinery as root. can someone correct the filesystem permissions, and decide if we need to stop dmsimard's rsync? | 17:50 |
pabelanger | yes, give me a moment and I'll start looking | 17:50 |
clarkb | I'm not entirely sure I know everything going on here but let me know how I can help | 17:50 |
*** sree has joined #openstack-infra | 17:51 | |
pabelanger | clarkb: yes, we have a mix of root / jenkins permissions on /srv/static/logs | 17:52 |
pabelanger | I think this is because dmsimard|afk rsync process is running as root | 17:52 |
*** yamamoto has joined #openstack-infra | 17:52 | |
pabelanger | and not jenkins user | 17:52 |
corvus | smcginnis: the job otherwise succeeded, only failed to copy the logs | 17:53 |
clarkb | is it running with -a? that should've preserved uids | 17:53 |
clarkb | but ya we may need to stop the rsync, chmod what is there then restart rsync with proper uid handle | 17:53 |
pabelanger | I am not sure if temp server has jenkins user, would need to first check | 17:53 |
clarkb | pabelanger: even if it doesn't the filesystem should have ownership set to those uids | 17:53 |
Shrews | pabelanger: fyi, chocolate zk data erased | 17:53 |
pabelanger | clarkb: possible chown happens after all data is copied? | 17:54 |
pabelanger | Shrews: ack | 17:54 |
dhellmann | corvus , smcginnis : I see the zaqar client release jobs in the queue | 17:54 |
clarkb | pabelanger: my concern about that is the rsync is supposed to take like a week? and then who knows how long the chown will take | 17:54 |
*** slaweq_ has joined #openstack-infra | 17:55 | |
*** florianf has quit IRC | 17:55 | |
smcginnis | dhellmann: I think this was blazarclient. | 17:55 |
*** sree has quit IRC | 17:55 | |
pabelanger | clarkb: agree. I think best case now is stop rsync, chown /srv/static/logs, then look back into rsync restore | 17:55 |
clarkb | pabelanger: that would be my vote too | 17:56 |
pabelanger | okay, let me connect to screen | 17:56 |
pabelanger | and stopped | 17:56 |
dhellmann | smcginnis : blazarclient just merged and is still in release-post | 17:57 |
pabelanger | chown -R jenkins:jenkins running | 17:57 |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Added endpoint get speakers summits assistance by summit https://review.openstack.org/538992 | 17:58 |
*** rkukura has joined #openstack-infra | 17:58 | |
*** slaweq_ has quit IRC | 17:59 | |
*** wolverineav has joined #openstack-infra | 18:00 | |
*** derekh has quit IRC | 18:00 | |
*** yamamoto has quit IRC | 18:01 | |
*** sambetts is now known as sambetts|afk | 18:01 | |
*** sree has joined #openstack-infra | 18:04 | |
*** slaweq has quit IRC | 18:07 | |
*** agopi|out has joined #openstack-infra | 18:07 | |
*** slaweq has joined #openstack-infra | 18:08 | |
*** efoley has quit IRC | 18:08 | |
*** sree has quit IRC | 18:09 | |
dmsimard|afk | corvus, pabelanger: hey, briefly stepping in before I go out again.. the rsync should be running with -avz --progress | 18:09 |
*** weshay|ruck is now known as weshay|ruck|brb | 18:09 | |
dmsimard|afk | I also questioned in infra-incident yesterday whether we should bother with the rsync at all considering it is bound to take several days | 18:09 |
*** david-lyle has quit IRC | 18:09 | |
*** wolverineav has quit IRC | 18:10 | |
*** yamahata has quit IRC | 18:10 | |
dmsimard|afk | I'm totally open to stopping the rsync altogether | 18:10 |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Added endpoint get speakers summits assistance by summit https://review.openstack.org/538992 | 18:11 |
*** trown|rover is now known as trown|lunch | 18:11 | |
pabelanger | rsync is already stopped, now trying to change permissions of root:root back to jenkins:jenkins | 18:11 |
pabelanger | clarkb: wonder if we should just find -type d first, then set those to jenkins:jenkins. I'm then guessing ansible would do the right thing to upload new files | 18:12 |
*** ralonsoh_ has quit IRC | 18:12 | |
pabelanger | we can then run it again on files after | 18:12 |
clarkb | pabelanger: ya not sure if the extra checking will be faster or slower | 18:13 |
clarkb | what you've got running now is probably fine? | 18:13 |
pabelanger | k, I did chown top level directories first | 18:14 |
pabelanger | but it is running now across everything, and is in the 00/ directory | 18:14 |
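(A rough sketch of the two-pass ordering discussed above - directories first so new uploads can land, then a sweep over the files - assuming a local log root and a local jenkins account; this is illustrative, not the exact command that was run on the server.)

    import grp
    import os
    import pwd

    LOG_ROOT = "/srv/static/logs"             # log root from the discussion
    uid = pwd.getpwnam("jenkins").pw_uid      # assumes a local jenkins user
    gid = grp.getgrnam("jenkins").gr_gid

    # Pass 1: fix directory ownership first so jobs can publish again.
    for dirpath, dirnames, filenames in os.walk(LOG_ROOT):
        os.lchown(dirpath, uid, gid)

    # Pass 2: sweep the files afterwards.
    for dirpath, dirnames, filenames in os.walk(LOG_ROOT):
        for name in filenames:
            os.lchown(os.path.join(dirpath, name), uid, gid)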
*** myoung|food is now known as myoung | 18:14 | |
*** ykarel|away has quit IRC | 18:16 | |
*** Swami has joined #openstack-infra | 18:17 | |
openstackgerrit | Fabien Boucher proposed openstack-infra/zuul-jobs master: Propose to move submit-log-processor-jobs and submit-logstash-jobs in zuul-jobs https://review.openstack.org/537847 | 18:17 |
*** jamesmcarthur has quit IRC | 18:18 | |
*** jpena is now known as jpena|off | 18:18 | |
mgagne | pabelanger: so I disabled force_raw_images in inap-mtl01, lets see how it goes | 18:19 |
pabelanger | mgagne: let me check my notes, I want to say there is a 2nd setting needed | 18:19 |
mgagne | pabelanger: ok, let me know =) | 18:20 |
*** weshay|ruck|brb is now known as weshay | 18:20 | |
*** weshay is now known as weshay|ruck | 18:20 | |
pabelanger | mgagne: https://review.openstack.org/368955/ | 18:20 |
pabelanger | mgagne: we also set libvirt/images_type to qcow2 | 18:21 |
*** jamesmcarthur has joined #openstack-infra | 18:21 | |
*** slaweq has quit IRC | 18:23 | |
*** slaweq has joined #openstack-infra | 18:23 | |
pabelanger | smcginnis: dhellmann: do you mind trying your release-test project tag, would like to see if logging has been corrected (I believe it should be for release). | 18:24 |
mgagne | pabelanger: could it be that image_type defaults to flat if cow is disabled? https://github.com/openstack/nova/blob/ffd59abf1635b35e38396468f9828e2d8cc85f09/nova/virt/libvirt/imagebackend.py#L1149 | 18:24 |
*** tosky has quit IRC | 18:24 | |
*** felipemonteiro has joined #openstack-infra | 18:24 | |
mgagne | but use_cow_images is true by default: https://github.com/openstack/nova/blob/f96e89cc5183e107cffeaf47525ab337c18d7e14/nova/conf/compute.py#L230 | 18:25 |
*** felipemonteiro_ has joined #openstack-infra | 18:25 | |
clarkb | pabelanger: what server is the rsync running from? | 18:26 |
*** jamesmcarthur has quit IRC | 18:26 | |
pabelanger | clarkb: logs.o.o is where I found screen | 18:26 |
pabelanger | mgagne: I'm not sure, I'd have to look into it more | 18:26 |
mgagne | ok, or lets wait until tomorrow and see if it fixes anything. Or I can stop being lazy and test it =) | 18:27 |
pabelanger | mgagne: yah, infracloud was mitaka, so it might have been fixed since then | 18:28 |
mgagne | pabelanger: oh, we are still running mitaka =) | 18:28 |
dmsimard|afk | clarkb, pabelanger: the server the data is in is 104.130.246.187 | 18:28 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Partial revert for disabled provider change https://review.openstack.org/538995 | 18:28 |
clarkb | dmsimard|afk: pabelanger cool thanks | 18:28 |
mgagne | pabelanger: btw, why aren't you updating? should you be using ci/cd? /s =) | 18:28 |
Shrews | I'm not sure what made me discover it, but 538995 fixes a change that just merged. | 18:29 |
pabelanger | mgagne: lost access to hardware | 18:29 |
mgagne | that's a good reason I guess lol | 18:29 |
pabelanger | :) | 18:29 |
*** felipemonteiro has quit IRC | 18:29 | |
*** efried is now known as efried_hexchat | 18:29 | |
*** r-daneel_ has joined #openstack-infra | 18:33 | |
rosmaita | hate to be a PITA (who am i kidding), but could someone take a look at https://review.openstack.org/536525 ? it's been showing as all success on zuul status page for a while now, but is still in the integrated queue | 18:34 |
rosmaita | "for a while | 18:34 |
rosmaita | " > 1 hour | 18:34 |
*** r-daneel has quit IRC | 18:34 | |
clarkb | rosmaita: it cannot merge until the things ahead of it in the gate queue merge or are rejected | 18:34 |
rosmaita | clarkb gotcha, i was only looking locally | 18:36 |
rosmaita | now the outlook is much bleaker when i look globally :( | 18:36 |
*** r-daneel has joined #openstack-infra | 18:37 | |
*** spzala has joined #openstack-infra | 18:37 | |
*** spzala has quit IRC | 18:37 | |
*** r-daneel_ has quit IRC | 18:38 | |
*** jamesmcarthur has joined #openstack-infra | 18:38 | |
*** e0ne has joined #openstack-infra | 18:40 | |
*** jamesmcarthur has quit IRC | 18:42 | |
*** jamesmcarthur has joined #openstack-infra | 18:44 | |
*** jaosorior has quit IRC | 18:48 | |
mriedem | are we ok to recheck things now? | 18:50 |
*** camunoz has joined #openstack-infra | 18:50 | |
smcginnis | mriedem: Should be OK. | 18:51 |
mriedem | ok | 18:51 |
clarkb | the only thing you may run into at this point is the permissions issue on the log server which pabelanger is working to fix | 18:51 |
*** hemna_ has quit IRC | 18:52 | |
clarkb | pabelanger: any idea how far it has gotten at this point? | 18:53 |
*** yamahata has joined #openstack-infra | 18:53 | |
*** shardy has quit IRC | 18:56 | |
*** jamesmcarthur has quit IRC | 18:57 | |
cmurphy | https://review.openstack.org/#/c/537645/ is about to fail with a timeout in neutron-grenade and domino a bunch of other jobs, would it make sense to increase the timeout in neutron-grenade? | 18:58 |
*** tushar_ has quit IRC | 19:00 | |
AJaeger | cmurphy: we increased timeout of unit tests already, do we see this more often? Then it makes sense... | 19:01 |
AJaeger | cmurphy: job lives in neutron repo | 19:01 |
*** slaweq_ has joined #openstack-infra | 19:01 | |
AJaeger | cmurphy: http://zuul.openstack.org/builds.html?job_name=neutron-grenade | 19:02 |
clarkb | infra-root can you take a look at https://bugs.launchpad.net/openstack-ci/+bug/1745512 to see if my comment(s) are accurate? | 19:03 |
openstack | Launchpad bug 1745512 in OpenStack Core Infrastructure "openstack email server blacklisted" [High,Confirmed] - Assigned to OpenStack CI Core (openstack-ci-core) | 19:03 |
cmurphy | i guess this is the first i've seen it for neutron-grenade in particular, it just seems like something or other is always timing out and taking out a queue of successful jobs with it | 19:03 |
*** jamesmcarthur has joined #openstack-infra | 19:03 | |
*** tosky has joined #openstack-infra | 19:03 | |
clarkb | cmurphy: its fairly common for things to break all over during feature freeze :/ | 19:03 |
cmurphy | okay, it's just really discouraging and i'm looking for ways to help | 19:04 |
clarkb | cmurphy: what seems to happen is we merge a ton of code last minute that hasn't had the same level of review or testing in order to get it in with the idea we'll fix it before release | 19:05 |
*** slaweq_ has quit IRC | 19:05 | |
clarkb | that coupled with the extra load in general for the extra demand results in sadness | 19:05 |
clarkb | and the result is we spend the next few weeks whack a moling things to make them happy again | 19:06 |
*** jamesmcarthur has quit IRC | 19:06 | |
cmurphy | yeah, i've definitely learned to plan better for next time | 19:07 |
*** lucasagomes is now known as lucas-afk | 19:08 | |
*** jamesmcarthur has joined #openstack-infra | 19:09 | |
*** dprince has quit IRC | 19:10 | |
*** sshnaidm is now known as sshnaidm|afk | 19:12 | |
*** jamesmcarthur has quit IRC | 19:12 | |
*** trown|lunch is now known as trown|rover | 19:14 | |
*** jamesmcarthur has joined #openstack-infra | 19:16 | |
AJaeger | cmurphy's https://review.openstack.org/#/c/537645/ has timed out - the first roles ran and there has been no output for 90 mins ;( we should take it out of the queue instead of waiting longer ;( | 19:17 |
pabelanger | clarkb: only in 03, we have a long way to go | 19:17 |
pabelanger | AJaeger: no, please done | 19:19 |
pabelanger | don't* | 19:19 |
cmurphy | AJaeger: looks like it just aborted | 19:19 |
pabelanger | I think zuul would have rerun it | 19:19 |
pabelanger | :( | 19:19 |
AJaeger | ARGH ;( | 19:19 |
AJaeger | sorry, i rebased | 19:19 |
*** CrayZee has joined #openstack-infra | 19:19 | |
*** CrayZee is now known as snapiri- | 19:19 | |
cmurphy | i didn't know that timing out jobs would automatically be rerun, that's good to know | 19:20 |
*** jamesmcarthur has quit IRC | 19:20 | |
pabelanger | it depends on the return result we get back from ansible, in some cases we requeue the job to try again | 19:21 |
AJaeger | cmurphy: I didn't either | 19:21 |
pabelanger | especially if provider is having issues | 19:21 |
pabelanger | at least, that is what I have seen before | 19:21 |
AJaeger | indeed, we do - should have looked closer ;( | 19:22 |
*** edmondsw has quit IRC | 19:23 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: [WIP] zuul web: add admin endpoint, enqueue commands https://review.openstack.org/539004 | 19:23 |
*** harlowja has joined #openstack-infra | 19:25 | |
pabelanger | AJaeger: yah, the lack of output is a bug in the timeout handler. Often seen if we have networking issues with a provider. We'd wait up to the timeout value for all phases of the post-run playbooks | 19:26 |
pabelanger | I think it was hung on deleting SSH keys | 19:26 |
*** david-lyle has joined #openstack-infra | 19:27 | |
*** tesseract has quit IRC | 19:31 | |
*** snapiri- has quit IRC | 19:32 | |
openstackgerrit | sebastian marcet proposed openstack-infra/openstackid-resources master: Added endpoint delete summit speaker assistance https://review.openstack.org/539007 | 19:35 |
*** jamesmcarthur has joined #openstack-infra | 19:38 | |
*** xarses_ has quit IRC | 19:44 | |
*** dprince has joined #openstack-infra | 19:45 | |
*** kjackal has quit IRC | 19:48 | |
*** mriedem1 has joined #openstack-infra | 19:52 | |
*** mriedem has quit IRC | 19:52 | |
*** xarses_ has joined #openstack-infra | 19:53 | |
*** e0ne has quit IRC | 19:54 | |
*** jamesmcarthur has quit IRC | 19:55 | |
*** e0ne has joined #openstack-infra | 19:57 | |
*** mriedem1 is now known as mriedem | 19:57 | |
openstackgerrit | Merged openstack-infra/openstackid-resources master: Added endpoint delete summit speaker assistance https://review.openstack.org/539007 | 19:58 |
*** e0ne has quit IRC | 19:59 | |
*** jamesmcarthur has joined #openstack-infra | 20:03 | |
*** pramodrj07 has joined #openstack-infra | 20:04 | |
*** slaweq_ has joined #openstack-infra | 20:08 | |
*** Swami has quit IRC | 20:09 | |
*** eharney has quit IRC | 20:09 | |
*** jamesmca_ has joined #openstack-infra | 20:11 | |
pabelanger | another gate reset by nova tox job, timed out again | 20:12 |
pabelanger | we should consider maybe promoting those patches | 20:12 |
AJaeger | EmilienM: please wait with further approvals on the Zuul project name removal changes | 20:12 |
*** slaweq_ has quit IRC | 20:12 | |
*** jamesmca_ has quit IRC | 20:13 | |
pabelanger | infra-root: any objections to promoting https://review.openstack.org/536936/ to help stop the integrated queue from resetting? | 20:13 |
AJaeger | EmilienM: every change to zuul config files increases Zuul's memory usage and might lead us to having to kill it again... | 20:13 |
EmilienM | AJaeger: ok... | 20:13 |
*** jamesmcarthur has quit IRC | 20:13 | |
*** jamesmca_ has joined #openstack-infra | 20:13 | |
EmilienM | AJaeger: I'm just reviewing patches | 20:13 |
EmilienM | how long should I wait? | 20:14 |
AJaeger | EmilienM: please take a break on those that touch zuul yaml files - until the current ones are merged I guess | 20:14 |
pabelanger | sorry, it is 537933 | 20:14 |
pabelanger | but 536936 is also important, but can look at that in a bit | 20:15 |
AJaeger | pabelanger: I'm fine with 537933 | 20:15 |
AJaeger | pabelanger: 536936 only defines when to run jobs | 20:15 |
ianw | seems ok (just catching up ...) | 20:15 |
*** Swami has joined #openstack-infra | 20:15 | |
AJaeger | so, important to not introduce regressions but it's on stable/ocata, so could wait as well a bit... | 20:15 |
AJaeger | morning ianw | 20:15 |
EmilienM | AJaeger: ok | 20:16 |
pabelanger | AJaeger: I believe https://review.openstack.org/533608/ is part of the issue too | 20:16 |
pabelanger | AJaeger: but 536936 and 536934 should also include the timeout bump from 537933, otherwise the same issues will happen on stable branches | 20:17 |
pabelanger | mriedem: ^FYI | 20:17 |
cmurphy | 537933 seems to be timing out itself though :/ | 20:18 |
clarkb | thats interesting that they seem to blame kernel patches for meltdown? | 20:18 |
clarkb | I wonder if we have more data on that | 20:18 |
*** ldnunes has quit IRC | 20:18 | |
AJaeger | pabelanger: agreed | 20:19 |
*** Goneri has quit IRC | 20:19 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul autohold: allow filtering per commit https://review.openstack.org/536993 | 20:19 |
mriedem | pabelanger: yeah i think i made a note to myself about that earlier today when the backports were failing b/c nova-tox-functional was timing out | 20:20 |
mriedem | that i might have to backport and squash the timeout change too | 20:20 |
pabelanger | cmurphy: I'm not sure what happened in tempest-full, but openstack-tox-functional-py35 is a duplicate test currently. Replaced by nova-tox-functional-py35 | 20:20 |
pabelanger | mriedem: ya, i think that might be good. | 20:20 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove webapp https://review.openstack.org/536780 | 20:21 |
AJaeger | mriedem: what about removing depends-on from https://review.openstack.org/#/c/533608/ and merge the change? | 20:21 |
pabelanger | AJaeger: another option is to just remove openstack-tox-functional-py35 from master branch (skip) then do so for other stable branches, until everything is properly backported in nova. To help keep master moving | 20:21 |
AJaeger | pabelanger: yep, works as well. mriedem, what do you want? pabelanger and myself can +2A quickly... | 20:22 |
*** eharney has joined #openstack-infra | 20:22 | |
pabelanger | IMO: 533608 should be updated to use branches on stable only. | 20:22 |
mriedem | ah yeah we could do that, | 20:23 |
mriedem | change https://review.openstack.org/#/c/533608/ to be !master, | 20:23 |
pabelanger | once landed, turn our focus to stable branches | 20:24 |
pabelanger | mriedem: yah | 20:24 |
mriedem | and then push the removal patch on top with the depends-on | 20:24 |
mriedem | i can do that quick | 20:24 |
pabelanger | k | 20:25 |
*** amoralej is now known as amoralej|off | 20:26 | |
*** edmondsw has joined #openstack-infra | 20:30 | |
*** rfolco|ruck is now known as rfolco|off | 20:31 | |
*** Swami has quit IRC | 20:31 | |
*** edmondsw has quit IRC | 20:32 | |
*** edmondsw_ has joined #openstack-infra | 20:32 | |
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config master: Moving nova functional test def to in tree https://review.openstack.org/533608 | 20:34 |
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config master: Only run openstack-tox-functional on nova stable branches https://review.openstack.org/539016 | 20:34 |
mriedem | pabelanger: AJaeger: ok i think this does it ^ | 20:34 |
mriedem | i didn't mess with the bottom patch to run on stable/queens since it doesn't exist yet, and figured these should all be merged by the time we cut stable/queens anyway | 20:35 |
*** agopi|out has quit IRC | 20:36 | |
AJaeger | mriedem: looks good. | 20:37 |
AJaeger | mriedem: yeah, hope this can be solved quickly | 20:37 |
pabelanger | +3 | 20:37 |
AJaeger | mriedem: might need 80 mins to get nodes - and then should merge quickly... | 20:38 |
pabelanger | yah, lets see how long new nodes take | 20:39 |
mriedem | after waiting a week, i think i can wait 80 minutes :) | 20:41 |
mriedem | good idea on slicing that up btw | 20:41 |
AJaeger | mriedem: btw. you asked me a couple of days ago for a nova docs change - done in https://review.openstack.org/538163 . could you put that on your review queue, please? | 20:46 |
mriedem | ah yeah thanks | 20:46 |
mriedem | i figured you'd find a suse minion | 20:46 |
clarkb | pabelanger: I'm about to grab lunch but kids should be napping and I can help dig into the logstash stuff | 20:47 |
pabelanger | clarkb: thanks, I haven't made much progress | 20:48 |
AJaeger | mriedem: was quicker fixing myself ;) | 20:48 |
AJaeger | mriedem: thanks | 20:49 |
clarkb | pabelanger: es02 reports that cluster is happy and "green". Logstash-worker01 workers all seem to be writing to their log files. Did you do anything there? | 20:53 |
*** Swami has joined #openstack-infra | 20:53 | |
pabelanger | clarkb: I did restart workers on that server, but believe logstash-worker02 has the same issue, but I didn't restart anything | 20:54 |
clarkb | 02 looks happy too | 20:54 |
*** e0ne has joined #openstack-infra | 20:54 | |
* clarkb scans all the servers really quick | 20:54 | |
pabelanger | k | 20:54 |
clarkb | the gearman server reports we are keeping up too fwiw | 20:55 |
pabelanger | clarkb: oh, could it have been permissions on logs.o.o? | 20:55 |
clarkb | possibly | 20:55 |
pabelanger | yah, might be related to that | 20:55 |
pabelanger | so, I've spent most of today watching zuul.o.o, and it does appear we are spending a lot of time doing dynamic reloads. Maybe 5-10mins at a time, then we get results from the queue, and proceed to do reloads again, get more results, etc. During the periods of reloads, we don't seem to be processing node requests, and unblock each time we get more results | 20:57 |
clarkb | e-r reports happiness now too, so whatever it was I'm guessing it was something related to the logs server | 20:57 |
pabelanger | leading to wavy graphs in grafana: http://grafana.openstack.org/dashboard/db/nodepool | 20:57 |
AJaeger | pabelanger: agree with your observation. So, this results in ready nodes that are waiting for jobs for those 5-10 mins and thus start later than they could? | 21:06 |
*** olaph has joined #openstack-infra | 21:06 | |
*** olaph1 has quit IRC | 21:07 | |
pabelanger | AJaeger: yah, we end up pooling ready nodes (from nodepool) and eventually zuul has CPU to switch to in-use. | 21:09 |
pabelanger | having the gate reset isn't helping, as I think it then triggers a new round of dynamic reloads | 21:10 |
smcginnis | So every few results, we end up introducing another 5-10 minutes of latency? | 21:12 |
AJaeger | smcginnis: this hits us pretty hard with gate resets. Not really a problem with new changes added at the ned | 21:14 |
AJaeger | s/ned/end | 21:14 |
pabelanger | well, I'd like to avoid saying that is what is happening until we confirm from logs. But that is what I have just been noticing looking at zuul.o.o and grafana | 21:16 |
AJaeger | pabelanger: it explains nicely what I'm seeing as well | 21:17 |
* AJaeger waves good night | 21:18 | |
*** anticw has joined #openstack-infra | 21:22 | |
*** dprince has quit IRC | 21:22 | |
anticw | is there a channel (this one?) to ask about zuul (v3) quirks/issues? | 21:22 |
*** slaweq_ has joined #openstack-infra | 21:22 | |
clarkb | anticw: if they are specific to openstack's use of zuulv3 this channel is a good place. But if you are running your own zuul for your CI then #zuul may be better | 21:24 |
anticw | clarkb: specific to openstack zuulv3 ... thanks | 21:26 |
anticw | re: http://zuul.openstack.org/ ... if i have a filter (most of the time) ... the bulk of the screen is taken up with queues that aren't relevant/useful ... how can i hide those? | 21:27 |
*** slaweq_ has quit IRC | 21:27 | |
clarkb | anticw: for that we'd likely need to modify the filtering javascript to remove empty pipelines after applying filters | 21:27 |
anticw | re: .zuul.yaml ... i searched the docs, but couldn't find a way to do this ... is there a way to specify the *minimum* resources suitable for a builder? some are slow and will timeout, this is wasted effort for everyone | 21:27 |
clarkb | anticw: all of our test nodes should be roughly equivalent. 8 vcpus, at least 80GB of disk, an external ip etc | 21:28 |
clarkb | anticw: so as of right now there isn't much to distinguish the test nodes. There is talk of starting to add smaller instances for jobs like pep8 though which will likely be implemented via a different "label" that you can use in your nodesets | 21:29 |
anticw | clarkb: some are unreliable, i thought about doing tests on hostname to detect known patterns of VMs which fail often but that feels a bit snarky | 21:29 |
clarkb | anticw: can you be more specific about the unreliableness? I think the best way forward there is to address those problems instead of avoiding them entirely. We've explicitly tried to build a system that is resilient to cloud failures and that starts to break down if you exclude clouds | 21:30 |
anticw | clarkb: for openstack-helm... it does... a lot... and it's not uncommon to see docker and/or kubernetes timeouts | 21:31 |
anticw | when i search the failed logs (this was a week or two back) some hosts seemed more problematic than others | 21:31 |
*** e0ne has quit IRC | 21:32 | |
pabelanger | what sort of timeouts related to docker / k8s? | 21:32 |
clarkb | anticw: its not uncommon for cloud specific behavior to create problems but typically we can and have addressed that directly and have not just avoided the cloud entirely | 21:32 |
clarkb | specific details are helpful so that we can understand the actual problems here | 21:33 |
pabelanger | +1 | 21:33 |
*** r-daneel has quit IRC | 21:33 | |
*** r-daneel has joined #openstack-infra | 21:33 | |
anticw | clarkb: the builder script checks things are up ... and times out after say 10 minutes ... some are fine after five, some are not | 21:34 |
clarkb | anticw: and that implies not all services are running in that amount of time? | 21:35 |
clarkb | anticw: do we know if that is because they are blocking on network io to get packages or images? | 21:35 |
anticw | my guess is slow IO | 21:35 |
anticw | but i don't think we know | 21:35 |
anticw | portdirect: ? do we know? | 21:35 |
clarkb | ok, I think ^ is what we need to figure out before we start "solving" the problem | 21:35 |
anticw | fair | 21:36 |
anticw | later today i will dig out recent failures related to things being abnormally slow and point at the specific log items | 21:36 |
clarkb | we've put a lot of effort into making things like caching proxies for docker images for example | 21:36 |
*** vivsoni has quit IRC | 21:36 | |
anticw | ones from 2+ weeks ago i think are less useful | 21:36 |
pabelanger | yah, wonder if maybe downloading packages from the network, we've dealt with that in the past with regional mirrors / apache reverse proxy | 21:36 |
clarkb | and if you aren't using that proxy image downloads likely will be slow | 21:36 |
anticw | pabelanger: i don't think it's image download performance | 21:36 |
pabelanger | eg: if you are downloading directly from docker.io, I can see that | 21:36 |
clarkb | but straightforward fix for problems like that is using the mirrors | 21:36 |
*** vivsoni has joined #openstack-infra | 21:36 | |
clarkb | etc | 21:36 |
anticw | it's usually that k8s pods are slow to reach a ready state | 21:36 |
*** dsariel has joined #openstack-infra | 21:37 | |
pabelanger | is there logs showing the failure? | 21:38 |
clarkb | right we should dig into the causes of that then decide on the best way to address it | 21:38 |
*** kgiusti has left #openstack-infra | 21:38 | |
anticw | pabelanger: yeah, there are but as gate scripts change daily i will point only recent ones | 21:38 |
anticw | (almost daily) | 21:38 |
*** jamesmca_ has quit IRC | 21:39 | |
anticw | most builders have how much ram? i'm going to start one here as a reference | 21:40 |
clarkb | anticw: they all have 8GB of ram and 8vcpus | 21:41 |
*** jamesmcarthur has joined #openstack-infra | 21:42 | |
anticw | thanks | 21:42 |
openstackgerrit | Merged openstack-infra/nodepool master: Partial revert for disabled provider change https://review.openstack.org/538995 | 21:43 |
*** eharney has quit IRC | 21:43 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: [WIP] zuul web: add admin endpoint, enqueue commands https://review.openstack.org/539004 | 21:43 |
*** myoung is now known as myoung|bbl | 21:46 | |
*** Goneri has joined #openstack-infra | 21:47 | |
*** jamesmcarthur has quit IRC | 21:53 | |
*** olaph1 has joined #openstack-infra | 21:54 | |
*** olaph has quit IRC | 21:55 | |
*** jamesmcarthur has joined #openstack-infra | 21:56 | |
*** andreww has joined #openstack-infra | 21:58 | |
clarkb | anticw: https://docs.openstack.org/infra/manual/testing.html is a general document on the topic | 22:01 |
*** xarses_ has quit IRC | 22:01 | |
*** trown|rover is now known as trown|outtypewww | 22:02 | |
*** dsariel has quit IRC | 22:02 | |
anticw | clarkb: thanks | 22:03 |
clarkb | needs an update though we got rid of all the static privileged VMs | 22:03 |
anticw | it specifies the RAM which i've only seen through experience not documentation before | 22:04 |
*** threestrands has joined #openstack-infra | 22:05 | |
*** threestrands has quit IRC | 22:05 | |
*** threestrands has joined #openstack-infra | 22:05 | |
*** threestrands_ has joined #openstack-infra | 22:07 | |
*** threestrands has quit IRC | 22:08 | |
*** threestrands_ has quit IRC | 22:08 | |
*** threestrands has joined #openstack-infra | 22:08 | |
*** threestrands has quit IRC | 22:08 | |
*** threestrands has joined #openstack-infra | 22:08 | |
*** jamesmcarthur has quit IRC | 22:09 | |
openstackgerrit | Clark Boylan proposed openstack-infra/infra-manual master: Update testing doc with zuul v3 info https://review.openstack.org/539029 | 22:13 |
*** dtruong has quit IRC | 22:13 | |
clarkb | updated to reflect current situation a bit better | 22:13 |
*** jamesmcarthur has joined #openstack-infra | 22:14 | |
pabelanger | 539016 is finally running jobs | 22:14 |
*** dmellado has quit IRC | 22:17 | |
*** stevebaker has quit IRC | 22:17 | |
*** stevebaker has joined #openstack-infra | 22:18 | |
*** dmellado has joined #openstack-infra | 22:20 | |
*** threestrands_ has joined #openstack-infra | 22:21 | |
*** felipemonteiro_ has quit IRC | 22:22 | |
*** threestrands has quit IRC | 22:23 | |
pabelanger | okay, fixed another permission issue on logs.o.o, had to chmod 0775 top-level directories. We might also want to update our publish playbooks to confirm that permission too | 22:23 |
*** edmondsw_ is now known as edmondsw | 22:26 | |
clarkb | is the chown still running? I am guessing it is | 22:28 |
*** ganso has quit IRC | 22:29 | |
*** dbecker has quit IRC | 22:30 | |
*** dave-mccowan has quit IRC | 22:31 | |
*** dbecker has joined #openstack-infra | 22:31 | |
*** jamesmcarthur has quit IRC | 22:33 | |
pabelanger | yah, up to 08 | 22:33 |
*** jappleii__ has joined #openstack-infra | 22:35 | |
*** jappleii__ has quit IRC | 22:36 | |
*** jappleii__ has joined #openstack-infra | 22:37 | |
*** jcoufal has quit IRC | 22:37 | |
EmilienM | gerrit is down? | 22:37 |
*** threestrands_ has quit IRC | 22:37 | |
EmilienM | ssh: connect to host review.openstack.org port 29418: Network is unreachable | 22:37 |
pabelanger | EmilienM: works for me | 22:40 |
*** bfournie has quit IRC | 22:40 | |
EmilienM | ok | 22:41 |
EmilienM | again my canadian line | 22:41 |
*** mylu has joined #openstack-infra | 22:42 | |
openstackgerrit | Matthieu Huin proposed openstack-infra/zuul master: zuul autohold: allow filtering per commit https://review.openstack.org/536993 | 22:43 |
EmilienM | pabelanger: try pushing a patch | 22:44 |
EmilienM | it doesn't work, wes tried as well | 22:44 |
pabelanger | EmilienM: ssh review.openstack.org -p29418 | 22:45 |
pabelanger | that work? | 22:45 |
EmilienM | meh | 22:45 |
*** mylu has quit IRC | 22:46 | |
*** rcernin has joined #openstack-infra | 22:46 | |
*** mylu has joined #openstack-infra | 22:46 | |
pabelanger | remote: https://review.openstack.org/539036 Test | 22:46 |
pabelanger | EmilienM: possible VPN issue? | 22:46 |
EmilienM | pabelanger: it works now | 22:48 |
EmilienM | weird | 22:48 |
EmilienM | anyway | 22:48 |
openstackgerrit | Merged openstack-infra/project-config master: Only run openstack-tox-functional on nova stable branches https://review.openstack.org/539016 | 22:49 |
pabelanger | mriedem: AJaeger: ^now merged | 22:50 |
*** mylu has quit IRC | 22:50 | |
mriedem | woot | 22:52 |
mriedem | about 30 minutes for that to flush through rihgt? | 22:52 |
mriedem | *right | 22:52 |
pabelanger | maybe 120mins | 22:54 |
pabelanger | job looks gone now in gate for master branch | 22:55 |
pabelanger | I'll keep an eye out for 537933 | 22:56 |
pabelanger | but first grabbing some food | 22:56 |
*** slaweq_ has joined #openstack-infra | 23:00 | |
bnemec | EmilienM: I had similar intermittent problems in the past week or two. | 23:03 |
bnemec | My suspicion is that it was trying to use the ipv6 DNS entry. | 23:03 |
pabelanger | clarkb: Shrews: looks like citycloud-sto2 might be full of ready nodes, and zuul doesn't know it. I manually deleted a node, but maybe want to see why that is. It looks to be around the time zuul was swapping a lot this morning | 23:03 |
bnemec | I hard-coded the ipv4 address in /etc/hosts and haven't had a problem since. | 23:04 |
bnemec | At some point I'll have to look into a more permanent fix. | 23:04 |
*** slaweq_ has quit IRC | 23:04 | |
*** dtruong has joined #openstack-infra | 23:04 | |
pabelanger | clarkb: Shrews: maybe this is where max-ready-age comes into play? | 23:05 |
*** jamesmcarthur has joined #openstack-infra | 23:05 | |
*** edmondsw has quit IRC | 23:06 | |
clarkb | pabelanger: are there records for the nodes in zookeeper ? | 23:06 |
clarkb | (I'm trying to determine that myself using zk-shell) | 23:07 |
*** weshay|ruck is now known as weshay|ruck|afk | 23:07 | |
pabelanger | clarkb: I haven't looked yet, but nodepool list shows them as ready / unlocked | 23:07 |
pabelanger | so not sure why zuul isn't using them | 23:07 |
clarkb | in that case they must be in zk as nodepool list is getting its info from there | 23:08 |
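(For reference, a small sketch of peeking at those records directly in ZooKeeper with the kazoo client - roughly the data "nodepool list" reads; the /nodepool/nodes path and the JSON field names are assumptions for illustration.)

    import json
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")  # assumed ZK endpoint
    zk.start()

    # Each child znode is a node id; its data is a JSON blob with the
    # provider, label and state that "nodepool list" displays.
    for node_id in sorted(zk.get_children("/nodepool/nodes")):
        data, _stat = zk.get("/nodepool/nodes/" + node_id)
        node = json.loads(data.decode("utf-8"))
        print(node_id, node.get("provider"), node.get("type"), node.get("state"))

    zk.stop()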
pabelanger | oh, they just went in-use | 23:08 |
clarkb | my understanding is that the request handler is basically first to lock wins | 23:08 |
pabelanger | I wonder if my delete maybe updated something in zookeeper | 23:08 |
*** salv-orlando has quit IRC | 23:08 | |
*** ekcs has joined #openstack-infra | 23:08 | |
clarkb | so if the other clouds' threads are quicker at locking the requests then those nodes won't be used until the request handler for that cloud gets some locks | 23:08 |
*** salv-orlando has joined #openstack-infra | 23:09 | |
*** tpsilva has quit IRC | 23:09 | |
pabelanger | | 0002238654 | citycloud-sto2 | centos-7 | 53d89441-aa55-4e0c-9b5d-c5bae64ef3a6 | 77.81.189.44 | | ready | 00:10:34:28 | unlocked | | 23:09 |
*** Goneri has quit IRC | 23:09 | |
*** claudiub|3 has quit IRC | 23:10 | |
pabelanger | would be good to see why that is still idle | 23:10 |
*** rlandy is now known as rlandy|bbl | 23:11 | |
pabelanger | yah, I only see a few centos-7 now ready for 10 hours in sto2 | 23:13 |
*** salv-orlando has quit IRC | 23:13 | |
*** uberjay_ has joined #openstack-infra | 23:14 | |
*** bfournie has joined #openstack-infra | 23:16 | |
clarkb | pabelanger: my understanding of how it works is there is a request handler thread for each pool. These poll zookeeper for new requests in zk and if they see one attempt to lock it. Once they have the request lock they check if they have any existing nodes to fulfill the request if they don't then they attempt to boot new instances for the request. If they are at quota they block until they are no | 23:16 |
clarkb | longer at quota before fulfilling the request | 23:16 |
clarkb | my guess is that when zuul was under load it made a ton of node requests. sto2 fulfilled them but then zuul went away so the requests went away but we had nodes for them | 23:17 |
Shrews | pabelanger: citycloud-sto2 is at quota | 23:18 |
clarkb | ah that would explain why it blocks | 23:18 |
*** uberjay has quit IRC | 23:18 | |
*** felipemonteiro_ has joined #openstack-infra | 23:18 | |
Shrews | pabelanger: there are 4 ready nodes, the sto2 thread is trying to handle a request that wants a ubuntu-xenial node. none of those ready nodes are ubuntu-xenial | 23:18 |
Shrews | so it's paused waiting for quota release to build one | 23:19 |
*** bnemec has quit IRC | 23:20 | |
pabelanger | Shrews: yah, so we must have blocked sto2 again (somehow) and my delete requests got things flowing again | 23:20 |
*** mlavalle has quit IRC | 23:20 | |
*** bnemec has joined #openstack-infra | 23:20 | |
clarkb | pabelanger: your delete requests changed it from being at quota to not being at quota | 23:20 |
pabelanger | clarkb: yah | 23:20 |
*** edmondsw has joined #openstack-infra | 23:20 | |
clarkb | I think the issue here is that if you block at quota but none of your existing nodes belong to zuul requests then you'll be deadlocked | 23:20 |
Shrews | what did you delete? | 23:20 |
pabelanger | Shrews: I wanted to see if cloud was processing the request (maybe outage) | 23:21 |
pabelanger | clarkb: I think if we had max-ready-age set to some value (2 hours) we would have deleted a node and unwedged. not the best, but a potential workaround | 23:22 |
Shrews | that assumes you have something to delete | 23:22 |
clarkb | ya. Another approach may be to have nodepool decline requests if it would block and is at quota and has ready nodes | 23:22 |
clarkb | (but that may result in failed requests unexpectedly) | 23:23 |
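(A toy sketch of the heuristic clarkb floats here: decline a request when the pool is at quota and all of its capacity is tied up in ready nodes that cannot satisfy the request. Purely illustrative - not nodepool code, and the label matching is simplified.)

    def should_decline(request_labels, ready_node_labels, at_quota):
        """Return True if accepting the request would just deadlock the pool."""
        if not at_quota:
            return False
        usable = any(label in request_labels for label in ready_node_labels)
        return not usable

    # The sto2 situation discussed above: quota full of centos-7/ubuntu-xenial
    # ready nodes, incoming request wants ubuntu-trusty -> decline.
    print(should_decline({"ubuntu-trusty"},
                         ["centos-7"] * 25 + ["ubuntu-xenial"] * 25,
                         at_quota=True))  # True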
clarkb | Shrews: in this case the entire quota was consumed by ready nodes not tied to existing zuul reuqests | 23:23 |
clarkb | I think because we restarted zuul | 23:23 |
clarkb | so any of the ready nodes could be deleted | 23:23 |
pabelanger | yah, believe so, too. They were unlocked for 10 hours | 23:24 |
Shrews | clarkb: can you expand more on "ready nodes not tied to existing zuul requests"? | 23:24 |
clarkb | Shrews: yes, sto2 had 50 ready nodes which put it at quota. So the next request it got made it block. Those ready nodes would never go away because the zuul process that requested them was stopped | 23:25 |
clarkb | Shrews: and since there is a single request handler per pool we weren't able to use those ready nodes in other zuul requests | 23:25 |
Shrews | wait | 23:25 |
Shrews | so, i saw 5 ready nodes | 23:26 |
Shrews | are you saying all 50 were READY and locked? | 23:26 |
pabelanger | 50 ready and unlocked | 23:26 |
pabelanger | http://grafana.openstack.org/dashboard/db/nodepool-city-cloud | 23:26 |
pabelanger | for ~10 hours | 23:26 |
Shrews | that graph is not helpful for me. do either of you have raw zk data for that? | 23:27 |
clarkb | I don't, just going off of what pabelanger said | 23:27 |
Shrews | nodepool list --detail output maybe? | 23:27 |
pabelanger | I just have nodepool list | 23:27 |
*** olaph1 is now known as olaph | 23:27 | |
Shrews | pabelanger: may i see that? | 23:27 |
pabelanger | 1 sec | 23:27 |
Shrews | thx | 23:27 |
clarkb | basically zuul is running and gets into an unhappy state, while on its way to this unhappy state sto2 made a bunch of centos7 nodes for it. Then we restart zuul unlocking all of those nodes and "freeing" them up | 23:28 |
clarkb | except that the next request sto2 processed was for a different flavor and thus blocked | 23:28 |
pabelanger | Shrews: clarkb: http://paste.openstack.org/show/657537/ | 23:29 |
pabelanger | $ sudo -H -u nodepool nodepool delete 0002238037 | 23:29 |
clarkb | basically deadlocking because it had used up its quota with a single label but was trying to fulfill a new request for a different label | 23:29 |
pabelanger | is what I ran to test cloud | 23:29 |
clarkb | we can't free up nodes because no jobs can run on them and we can't boot new instance because we have no free quota | 23:29 |
clarkb | deleting one node allowed the blocked request to proceed, then if the next request was for centos7 everything starts to get happy | 23:30 |
*** uberjay_ has quit IRC | 23:30 | |
*** stakeda has joined #openstack-infra | 23:30 | |
clarkb | ah ok so it wasn't 50 of a single label | 23:30 |
clarkb | looking at that list, I think a trusty or debian or fedora or suse request would block though, as those aren't xenial or centos7 | 23:31 |
*** uberjay has joined #openstack-infra | 23:31 | |
pabelanger | I'd have to see what nl04.o.o was doing (that is the launcher for citycloud) | 23:31 |
pabelanger | but because they were unlocked, I thought zuul would just iterate over unlocked nodes and use them | 23:32 |
Shrews | pabelanger: was this list *after* restarting a zuul process? | 23:32 |
pabelanger | Shrews: at least 8 hours | 23:32 |
pabelanger | 1 sec | 23:32 |
pabelanger | 14:00 UTC is when I started zuul up again | 23:33 |
pabelanger | so, 9.5 hours I'd say | 23:33 |
*** jamesmcarthur has quit IRC | 23:34 | |
Shrews | something isn't making sense here. i don't think i can evaluate it w/o seeing it in real time myself. | 23:36 |
pabelanger | Ya, I really should not have deleted the node | 23:36 |
*** s-shiono has joined #openstack-infra | 23:37 | |
*** olaph has quit IRC | 23:37 | |
*** olaph has joined #openstack-infra | 23:37 | |
*** ekhugen- has joined #openstack-infra | 23:44 | |
Shrews | pabelanger: clarkb: ah, so further log digging shows that citycloud-sto2 was handling a request for an ubuntu-trusty node. I count 50 nodes (max-servers = 50 for sto2) in pabelanger's output | 23:44 |
Shrews | none of those 50 were a trusty node | 23:44 |
pabelanger | okay, so wedged right? | 23:45 |
Shrews | pabelanger: right | 23:45 |
pabelanger | k | 23:45 |
pabelanger | also :( | 23:45 |
Shrews | pabelanger: so max-ready-age would definitely have helped in this scenario. | 23:45 |
*** ekhugen_alt has quit IRC | 23:45 | |
*** igormarnat has quit IRC | 23:45 | |
*** logan- has quit IRC | 23:45 | |
*** StevenK has quit IRC | 23:45 | |
*** clarkb has quit IRC | 23:45 | |
*** jlvillal has quit IRC | 23:45 | |
*** zeus has quit IRC | 23:45 | |
*** _Cyclone_ has quit IRC | 23:45 | |
*** mandre has quit IRC | 23:45 | |
*** honza has quit IRC | 23:45 | |
*** adarazs has quit IRC | 23:45 | |
*** jistr has quit IRC | 23:45 | |
*** dtantsur|afk has quit IRC | 23:45 | |
*** r-daneel has quit IRC | 23:45 | |
Shrews | i was just trying to make sure there was not a bug | 23:45 |
pabelanger | k | 23:45 |
*** edmondsw has quit IRC | 23:46 | |
* Shrews returns to his evening | 23:47 | |
pabelanger | Shrews: thanks! | 23:47 |
*** edmondsw has joined #openstack-infra | 23:47 | |
*** freerunner has quit IRC | 23:48 | |
Shrews | pabelanger: np. i think we could try setting a fairly low max-ready-age in our environment. maybe an hour? will need to consider that a bit more i suppose | 23:50 |
pabelanger | Shrews: yup, a good topic to discuss | 23:50 |
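For reference, a minimal sketch of what a max-ready-age sweep buys in this scenario — this is not nodepool's cleanup code, the field names and the node's label are made up, and the one-hour threshold is just the value floated above — unlocked READY nodes older than the threshold get deleted, so stale capacity returns to the quota instead of wedging the pool for ~10 hours:

```python
# Illustrative max-ready-age sweep (not nodepool's implementation).
import time
from dataclasses import dataclass

MAX_READY_AGE = 3600  # seconds; the "maybe an hour?" value discussed above

@dataclass
class Node:
    id: str
    label: str
    state: str
    locked: bool
    state_time: float  # when the node entered its current state

def ready_age_sweep(nodes, now=None):
    """Return the nodes a periodic cleanup pass would delete."""
    now = now if now is not None else time.time()
    return [n for n in nodes
            if n.state == "ready"
            and not n.locked
            and now - n.state_time > MAX_READY_AGE]

# One of the ~10-hour-old ready nodes from the paste above (label is illustrative).
stale = Node("0002238037", "centos-7", "ready", False, time.time() - 9.5 * 3600)
print([n.id for n in ready_age_sweep([stale])])  # -> ['0002238037']
```

The trade-off, as the exchange above suggests, is picking a threshold low enough to unwedge a pool quickly without constantly discarding ready capacity that would otherwise be reused.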
*** freerunner has joined #openstack-infra | 23:51 | |
*** jamesmcarthur has joined #openstack-infra | 23:51 | |
*** igormarnat has joined #openstack-infra | 23:51 | |
*** logan- has joined #openstack-infra | 23:51 | |
*** StevenK has joined #openstack-infra | 23:51 | |
*** clarkb has joined #openstack-infra | 23:51 | |
*** jlvillal has joined #openstack-infra | 23:51 | |
*** zeus has joined #openstack-infra | 23:51 | |
*** _Cyclone_ has joined #openstack-infra | 23:51 | |
*** mandre has joined #openstack-infra | 23:51 | |
*** honza has joined #openstack-infra | 23:51 | |
*** adarazs has joined #openstack-infra | 23:51 | |
*** jistr has joined #openstack-infra | 23:51 | |
*** dtantsur|afk has joined #openstack-infra | 23:51 | |
*** mylu has joined #openstack-infra | 23:51 | |
*** edmondsw has quit IRC | 23:51 | |
*** dave-mccowan has joined #openstack-infra | 23:53 | |
*** jamesmcarthur has quit IRC | 23:55 | |
*** hongbin has quit IRC | 23:57 |