*** mattw4 has quit IRC | 00:06 | |
*** zer0c00l_ has quit IRC | 00:06 | |
pabelanger | okay, the tox-remote failure is getting worse it seems. I'll try to spend some time tomorrow looking into it | 00:19 |
pabelanger | I think tristanC has some code to reproduce, but haven't looked yet | 00:19 |
fungi | agreed, i just hit it a moment ago too | 00:26 |
fungi | and i *really* don't push many zuul changes | 00:27 |
tristanC | tobiash: oh, the zuul_console is actually started by the opendev pre run, and i think tox-remote is indeed using the existing service instead of creating a new one with the speculative state | 00:27 |
tristanC | pabelanger: i'll update the code to re-add the instrumentation with a fix for ^ | 00:28 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add retries to getPullReviews() with github https://review.opendev.org/655204 | 00:28 |
pabelanger | SpamapS: ^ is that what you were thinking about, using for/else? I agree there is a bug, guess we need help on what to do when we loop 5 times | 00:29 |
pabelanger | tristanC: ah, cool | 00:30 |
pabelanger | also, have no idea how we test that, aside from maybe mock? on 655204 | 00:30 |
pabelanger | what does: Requirements ['docker-image'] not met by build e4abfc519a8c431c91c2564a2848127f | 00:52 |
pabelanger | mean again? | 00:52 |
pabelanger | https://review.opendev.org/658486/ | 00:52 |
pabelanger | is that because the depends-on patch docker image is missing? | 00:53 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Update quickstart nodepool node to python3 https://review.opendev.org/658486 | 00:54 |
pabelanger | going to drop it and see | 00:54 |
pabelanger | https://zuul.openstack.org/build/e4abfc519a8c431c91c2564a2848127f | 00:57 |
pabelanger | ah | 00:58 |
pabelanger | I see | 00:58 |
pabelanger | the parent change failed | 00:58 |
pabelanger | but odd | 00:58 |
pabelanger | in that case, does it matter if it's the same project? I would expect the container to be rebuilt again | 00:59 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Update quickstart nodepool node to python3 https://review.opendev.org/658486 | 01:25 |
fungi | zuul-maint: 659674 is passing now and will allow us to uncap gerrit in the quickstart jobs again | 01:32 |
SpamapS | pabelanger: you definitely got for/else right. | 01:46 |
SpamapS | pabelanger: I think this is a better exception than exploding later when the variable is undefined. | 01:46 |
pabelanger | SpamapS: okay, neat. I still think it prevents zuul from enqueuing the change, but guess that is what we live with if github api is wonky | 01:47 |
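The for/else pattern being discussed can be sketched like this. This is a hedged illustration, not the actual getPullReviews() patch; all names here are invented:

```python
# Hedged sketch of the for/else retry pattern discussed above; this is
# not the actual Zuul change, and all names are invented.

class RetriesExhausted(Exception):
    """Raised when every retry attempt has failed."""


def fetch_with_retries(fetch, attempts=5):
    for _ in range(attempts):
        try:
            return fetch()  # success: return immediately
        except IOError:
            continue  # transient failure, try the next iteration
    else:
        # Reached only when the loop ran to completion, i.e. every
        # attempt raised: fail loudly instead of leaving a variable
        # undefined for later code to trip over.
        raise RetriesExhausted("gave up after %d attempts" % attempts)
```

The `else:` branch on a `for` loop runs only when the loop finishes without a `break` (an early `return` also skips it), which is exactly the "we looped 5 times, now what?" case.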
*** ianychoi has joined #zuul | 02:00 | |
tristanC | wouldn't it make sense to have a periodic task to check open changes missing ci votes, and automatically enqueue those in zuul? | 02:02 |
pabelanger | also thought of doing that | 02:03 |
pabelanger | but, ideally we don't miss the events to start | 02:03 |
pabelanger | however, so far when we do, it has been because of 500 errors on github side | 02:03 |
tristanC | pabelanger: it can also happen when upgrading zuul, or when we need to reboot the control plane | 02:08 |
pabelanger | tristanC: yah, I think the HA scheduler will fix some of that; there has been talk of fixing it | 02:10 |
pabelanger | so, we shouldn't miss any events if scheduler is down | 02:11 |
pabelanger | but agree, a known issue | 02:11 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 02:29 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 02:50 |
*** jesusaur has quit IRC | 02:58 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 03:28 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: prevent incorrect task exit code detection https://review.opendev.org/659708 | 03:28 |
*** ianychoi has quit IRC | 03:49 | |
*** ianychoi has joined #zuul | 03:49 | |
*** ianychoi has quit IRC | 03:52 | |
*** ianychoi has joined #zuul | 03:52 | |
*** bhavikdbavishi has joined #zuul | 03:54 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 03:59 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate github logs with the event id https://review.opendev.org/658645 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate gerrit event logs https://review.opendev.org/658646 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Attach event to queue item https://review.opendev.org/658647 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate some logs in the scheduler with event id https://review.opendev.org/658648 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate logs in the zuul driver with event ids https://review.opendev.org/658649 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Add event id to timer events https://review.opendev.org/658650 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate pipeline processing with event id https://review.opendev.org/658651 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate merger logs with event id https://review.opendev.org/658652 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate job freezing logs with event id https://review.opendev.org/658888 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Annotate node request processing with event id https://review.opendev.org/658889 | 04:11 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: WIP: Annotate builds with event id https://review.opendev.org/658895 | 04:11 |
daniel2 | SpamapS: there was no nginx specific stuff in the docs, just apache | 04:19 |
*** bhavikdbavishi1 has joined #zuul | 04:23 | |
*** bhavikdbavishi has quit IRC | 04:23 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:23 | |
tobiash | tristanC: I think I've understood the tox-remote failures. It's a race, not a parsing problem. I'll sum it up later when I have more time. | 04:24 |
tristanC | tobiash: i've seen my comment about zuul_console not being restarted by the job and using the one setup by the opendev.org job? | 04:24 |
tristanC | have you seen* ? | 04:24 |
tobiash | yes, that's unfortunate and should be fixed as well | 04:25 |
*** bhavikdbavishi has quit IRC | 04:29 | |
tobiash | tristanC: but we shouldn't fix this by restarting zuul_console but by running a second on a different port | 04:35 |
*** bhavikdbavishi has joined #zuul | 04:36 | |
tobiash | Otherwise we're doomed when a change breaks it as this will also break the overall job output | 04:36 |
*** yolanda_ has quit IRC | 04:41 | |
*** yolanda_ has joined #zuul | 04:42 | |
*** saneax has joined #zuul | 04:45 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: kill previously started zuul_console service https://review.opendev.org/659708 | 04:47 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 04:47 |
*** pcaruana has joined #zuul | 05:20 | |
openstackgerrit | Merged zuul/zuul master: tox: Integrate tox-docker https://review.opendev.org/649041 | 05:42 |
*** bhavikdbavishi has quit IRC | 05:43 | |
tristanC | tobiash: alright, then i'll give https://review.opendev.org/#/c/535538/ another try | 05:46 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service https://review.opendev.org/659708 | 06:09 |
*** bjackman_ has joined #zuul | 06:12 | |
*** gtema has joined #zuul | 06:15 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 06:23 |
*** bhavikdbavishi has joined #zuul | 06:30 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service https://review.opendev.org/659708 | 06:39 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 06:42 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service https://review.opendev.org/659708 | 06:55 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul_stream: add debug to investigate tox-remote failures https://review.opendev.org/657914 | 06:55 |
*** EmilienM|pto has quit IRC | 07:16 | |
*** EmilienM has joined #zuul | 07:17 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: [DNM] run zuul-tox-remote many times https://review.opendev.org/657914 | 07:21 |
tristanC | tobiash: ok, 6 SUCCESS on 657914 for zuul-tox-remote | 07:21 |
AJaeger | tristanC: Yeah, thanks for fixing! | 07:26 |
tristanC | AJaeger: you're welcome, so the fix may be: https://review.opendev.org/659708 | 07:32 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: DNM: further zuul-remote debugging attempts https://review.opendev.org/659631 | 07:46 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix race in log streaming https://review.opendev.org/659738 | 07:46 |
tristanC | unfortunately that doesn't fully fix the issue; a second recheck showed some failures | 07:48 |
tobiash | tristanC: 659738 should fix that race hopefully | 07:49 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: zuul-tox-remote: use unique zuul_console service https://review.opendev.org/659708 | 07:56 |
*** hashar has joined #zuul | 07:57 | |
*** jpena|off is now known as jpena | 07:59 | |
tobiash | all 10 tox-remote successful on https://review.opendev.org/659631 | 08:15 |
tobiash | :) | 08:15 |
AJaeger | tobiash, so is tristanC's change not needed? | 08:17 |
tobiash | AJaeger: it's needed for a different reason (we don't test zuul_stream without it) | 08:19 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: DNM: further zuul-remote debugging attempts https://review.opendev.org/659631 | 08:24 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Increase timeout of zuul-tox-remote https://review.opendev.org/659743 | 08:24 |
tobiash | and also increase the timeout as many runs are close to the default of 30 minutes ^ | 08:24 |
openstackgerrit | Merged zuul/zuul master: Add proper __repr__ to merger repo https://review.opendev.org/649949 | 08:46 |
*** panda|rover|off is now known as panda|rover | 09:10 | |
*** saneax has quit IRC | 09:33 | |
*** saneax has joined #zuul | 09:35 | |
*** saneax has quit IRC | 09:42 | |
*** hashar has quit IRC | 09:47 | |
*** gtema has quit IRC | 10:04 | |
*** gtema has joined #zuul | 10:05 | |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/ https://review.opendev.org/604404 | 10:23 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/ https://review.opendev.org/604404 | 10:29 |
tobiash | 659631 passed 20 zuul-tox-remote jobs in a row :) | 10:40 |
AJaeger | yeah! | 10:45 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Update cached repo during job startup only if needed https://review.opendev.org/648229 | 10:47 |
*** gtema has quit IRC | 11:06 | |
*** gtema has joined #zuul | 11:06 | |
*** hashar has joined #zuul | 11:08 | |
*** jpena is now known as jpena|lunch | 11:31 | |
*** bhavikdbavishi has quit IRC | 12:06 | |
mordred | tristanC: did you see we had to revert the react v2 change yesterday? I haven't had time to reproduce locally and debug, but it looks like the same issue was happening in the build-dashboard job, so it should be reproducible | 12:10 |
mordred | tobiash, tristanC: nice zuul-stream patches! | 12:13 |
AJaeger | mordred: did you see https://review.opendev.org/#/c/659738/ https://review.opendev.org/#/c/659743/ and https://review.opendev.org/659708 ? Those address the tox-remote failures... | 12:14 |
AJaeger | mordred: ah, you did see it ;) | 12:14 |
mordred | :) | 12:14 |
mordred | yes - nice patches | 12:14 |
AJaeger | indeed! | 12:15 |
tobiash | thanks :) | 12:29 |
mordred | tobiash: maybe with the updates to tox-remote and fixes to the image builds we'll actually be able to land your log event id stack :) | 12:32 |
tobiash | mordred: that would be great :) | 12:32 |
Shrews | mordred: i don't think i've noticed this before, but re: the react revert (https://review.opendev.org/659655), it looked like it failed gate initially but immediate requeued and succeeded. Is that a thing? | 12:32 |
Shrews | there is no recheck comment there | 12:34 |
mordred | Shrews: I'm very confused about that | 12:35 |
tobiash | was there a scheduler restart around that time? | 12:35 |
tobiash | a restart + reenqueue race could explain that | 12:35 |
mordred | oh yeah - there might have been | 12:35 |
mordred | we did restart the scheduler yesterday | 12:35 |
Shrews | ah | 12:36 |
*** jpena|lunch is now known as jpena | 12:36 | |
tobiash | btw, 659631 succeeded now 30 remote tests in a row, so I think the tox-remote stack looks good now | 12:37 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down https://review.opendev.org/576192 | 12:39 |
*** rlandy has joined #zuul | 12:40 | |
Shrews | tobiash: what finally led you to find the cause of the remote job failure? | 12:44 |
Shrews | +3'd , btw | 12:44 |
mordred | fbo_: accidental file deletion in that last patchset && | 12:46 |
mordred | Shrews: I'm going to choose to believe he loaded the tests into a bmw and had someone drive it around the test track and eventually the autopilot pointed out the error | 12:47 |
Shrews | mordred: it looks more like he flogged zuul until it gave up its secrets. i approve of this method. | 12:48 |
tobiash | you know the feeling when you suddenly have an idea how to fix a hard problem and you don't have access to a laptop to immediately try it out? I had this moment this morning before I took the bike to work. | 12:49 |
AJaeger | mordred: it's race - so, that involves multiples cars and drivers ;) | 12:49 |
tobiash | it's like torture having to think about that for an hour until you can work on that ;) | 12:49 |
tobiash | Shrews: just a sec, looking for the logs with the hint | 12:50 |
Shrews | tobiash: i think i found it | 12:50 |
tobiash | Shrews: I found the final hint there: http://logs.openstack.org/31/659631/2/check/zuul-tox-remote-5/e2b93e4/testr_results.html.gz | 12:52 |
tobiash | search for 'XXX: streamers stop and log not found controller' | 12:52 |
Shrews | yep, that's what i found :) | 12:53 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down https://review.opendev.org/576192 | 12:53 |
fbo_ | mordred: argh, yes thanks | 12:53 |
mordred | tobiash: yes! I know that feeling - it can be very annoying. | 12:54 |
*** bjackman has joined #zuul | 12:57 | |
*** bjackman_ has quit IRC | 12:58 | |
tobiash | mordred: thinking about this .keep file, is it possible to remove that and instead create the target dir on the fly? | 13:13 |
tobiash | almost everyone accidentally removes that occasionally in changes | 13:13 |
tobiash | (including me) | 13:13 |
mordred | tobiash: probably? I can't remember what gets confused when the directory is gone - I wanna say something in react-scripts was grumpy | 13:14 |
mordred | (I agree, I dislike the file) | 13:14 |
*** bjackman has quit IRC | 13:21 | |
openstackgerrit | Merged zuul/zuul master: Fix race in log streaming https://review.opendev.org/659738 | 13:23 |
tobiash | \o/ | 13:23 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Pagure driver - https://pagure.io/pagure/ https://review.opendev.org/604404 | 13:26 |
*** bhavikdbavishi has joined #zuul | 13:27 | |
clarkb | Shrews: mordred I zuul enqueued the revert back to the gate after it failed | 13:32 |
openstackgerrit | Merged zuul/zuul master: Increase timeout of zuul-tox-remote https://review.opendev.org/659743 | 14:00 |
openstackgerrit | Benedikt Löffler proposed zuul/zuul master: Get playbook data from extra_vars https://review.opendev.org/659802 | 14:03 |
*** bhavikdbavishi has quit IRC | 14:05 | |
pabelanger | Hmm, that's new | 14:10 |
pabelanger | http://paste.openstack.org/show/751518/ | 14:10 |
pabelanger | running zuul tests on fedora-30 now | 14:10 |
pabelanger | guess something changed | 14:10 |
pabelanger | http://paste.openstack.org/show/751519/ | 14:11 |
pabelanger | first one was not complete | 14:11 |
clarkb | pabelanger: rebuild your venvs | 14:12 |
clarkb | distro likely updated libcrypt and now you need to relink against it | 14:12 |
pabelanger | clarkb: I did, but let me try again | 14:12 |
pabelanger | oh | 14:13 |
pabelanger | https://fedoraproject.org/wiki/Changes/FullyRemoveDeprecatedAndUnsafeFunctionsFromLibcrypt | 14:13 |
pabelanger | that looks related | 14:13 |
pabelanger | https://github.com/psycopg/psycopg2/issues/912 | 14:15 |
pabelanger | libxcrypt-compat | 14:16 |
pabelanger | seems to be the workaround | 14:16 |
pabelanger | will update bindep.txt for fedora-30 | 14:16 |
tobiash | pabelanger: yeah I had the same problem after upgrading to fedora 30 and installing libxcrypt-compat is the solution | 14:18 |
pabelanger | +1 | 14:19 |
pabelanger | let me finish this python-path test, and will push up that too | 14:19 |
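The bindep.txt addition mentioned above might look like this (the profile name is an assumption about how the repo scopes it):

```
# Fedora 30 removed legacy libcrypt symbols; pull in the compat
# package so manylinux wheels such as psycopg2 keep linking.
libxcrypt-compat [platform:fedora]
```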
clarkb | the discussion on the manylinux wheel bug is interesting | 14:32 |
fungi | the psycopg2 one or something else? | 14:35 |
fungi | but yeah, manylinux1 wheels which rely on libcrypt are simply going to be broken across the board on fedora 30 | 14:36 |
fungi | which is why it's called "many linux" and not "all linux" | 14:37 |
fungi | just like musl makes manylinux1 wheels not applicable on alpine | 14:37 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down https://review.opendev.org/576192 | 14:37 |
openstackgerrit | Fabien Boucher proposed zuul/zuul master: Prevent Zuul scheduler to crash at startup if gerrit down https://review.opendev.org/576192 | 14:38 |
fungi | clarkb: ahh, you were likely referring to the https://github.com/pypa/manylinux/issues/305 referenced from it? | 14:38 |
fungi | this mess is why i prefer to jump through hoops and reinvent some wheels (pun intended) to stick to pure python any time it's remotely possible | 14:40 |
clarkb | ya | 14:41 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add more test coverage on using python-path https://review.opendev.org/659812 | 14:41 |
pabelanger | zuul-maint: ^ adds more testing around the python-path setting from nodepool in zuul | 14:42 |
pabelanger | I've added my +2 back to https://review.opendev.org/659812 | 14:42 |
pabelanger | and confident that it is now working correctly | 14:42 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source https://review.opendev.org/658835 | 14:46 |
*** kmalloc is now known as kmalloc_away | 14:47 | |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality https://review.opendev.org/657837 | 14:47 |
AJaeger | zuul-maint, is updating our linting jobs to ansible 2.7 correct? See https://review.opendev.org/659810 and https://review.opendev.org/659811 | 14:49 |
pabelanger | AJaeger: we should consider going right to ansible 2.8, since I expect us to start try it out in opendev soon | 14:50 |
AJaeger | pabelanger: let's do it stepwise ;) | 14:50 |
pabelanger | sure | 14:50 |
AJaeger | our users are not that fast | 14:50 |
AJaeger | and with multi-version ansible, i wonder whehter linting jobs iwth 2.7 is correct or whether we need 2.5 as "deprecated" version | 14:51 |
pabelanger | yah, that is the tricky part, we need to keep backwards compat for all versions zuul supports | 14:52 |
pabelanger | ansible 2.5 isn't EOL just yet | 14:52 |
pabelanger | another 30 days I think | 14:52 |
AJaeger | 2.5 is marked as deprecated in Zuul | 14:52 |
AJaeger | should I send an email to zuul list and wip the changes for now? | 14:53 |
pabelanger | maybe | 14:53 |
pabelanger | we'll have to learn as we go I think | 14:53 |
AJaeger | ;) | 14:54 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Add Bitbucket Server source functionality https://review.opendev.org/657837 | 14:54 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket build status reporter https://review.opendev.org/658335 | 14:54 |
openstackgerrit | Mark Meyer proposed zuul/zuul master: Create a basic Bitbucket event source https://review.opendev.org/658835 | 14:54 |
pabelanger | are any jobs in opendev setup to use another version of ansible besides 2.7? | 14:54 |
AJaeger | pabelanger: not that I know of | 14:56 |
tobiash | achievement unlocked, 500 jobs in parallel | 14:57 |
pabelanger | nice | 14:58 |
mordred | tobiash: as a heads up - 0.28.0 of openstacksdk was released which contains a refactor in image processing code. it should be no different than before - but just wanted you to be aware | 15:43 |
tobiash | thanks for the info :) | 15:44 |
fungi | #status log applied opendev migration renames to the storyboard-dev database to stop puppet complaining | 15:44 |
openstackstatus | fungi: finished logging | 15:44 |
fungi | er, wrong channel, sorry for the noise | 15:44 |
*** rlandy is now known as rlandy|biab | 15:58 | |
*** panda|rover is now known as panda|rover|off | 16:06 | |
*** hashar has quit IRC | 16:07 | |
*** armstrongs has joined #zuul | 16:09 | |
armstrongs | https://storyboard.openstack.org/#!/story/2004868 I am currently hitting this bug; the scheduler is started but not rendering the tenants on the web interface. As a result my executor can't connect to Gearman. Is there any known fix or workaround to get back up and running again? | 16:12 |
*** armstrongs has quit IRC | 16:21 | |
*** jangutter has quit IRC | 16:22 | |
*** rlandy|biab is now known as rlandy | 16:30 | |
clarkb | armstrongs is gone, but I wonder what the merger says | 16:48 |
*** mattw4 has joined #zuul | 17:01 | |
tobiash | yepp, that exception means that the merger failed | 17:07 |
*** mattw4 has quit IRC | 17:16 | |
*** mattw4 has joined #zuul | 17:21 | |
*** jpena is now known as jpena|off | 17:22 | |
*** gtema has quit IRC | 17:46 | |
tobiash | fbo_: I commented on https://review.opendev.org/576192 | 17:48 |
tobiash | pabelanger: +2 with comment on https://review.opendev.org/659812 | 17:53 |
*** Armstrongs has joined #zuul | 17:55 | |
Armstrongs | I'm not running a separate merger component | 17:56 |
tobiash | Armstrongs: the executor is a merger too | 17:56 |
Armstrongs | Sure | 17:56 |
tobiash | your exception indicates that you either don't run an executor/merger or have problems accessing the repo from the executor/merger | 17:56 |
Armstrongs | So that starts then says ending log stream and stops | 17:57 |
Armstrongs | In the logs | 17:57 |
Armstrongs | Debug logs say can't connect to gearman | 17:57 |
Armstrongs | Port | 17:57 |
tobiash | Armstrongs: can you post a log of the scheduler and the executor? | 17:57 |
Armstrongs | Will do 2secs | 17:58 |
fungi | also, is the scheduler configured to run a gearman server? or are you trying to run a separate gearman service for it? | 17:58 |
tobiash | that would have been my next question ;) | 17:58 |
tobiash | because gearman is started before contacting the executor (if gearman is configured to be started by the scheduler) | 17:59 |
Armstrongs | Scheduler is running based on the setup from zuul from scratch guide | 17:59 |
Armstrongs | So it runs gearman | 17:59 |
Armstrongs | I believe | 17:59 |
tobiash | so you have this config: https://zuul-ci.org/docs/zuul/admin/zuul-from-scratch.html#zuul ? | 17:59 |
Armstrongs | Yeah | 18:00 |
Armstrongs | That's how my components are configured | 18:00 |
tobiash | ok, then I think I need scheduler logs, executor logs and the zuul.conf to understand the problem | 18:00 |
Armstrongs | Will do | 18:01 |
Armstrongs | Logging into my laptop now | 18:01 |
Armstrongs | Thanks again | 18:02 |
pabelanger | tobiash: replied | 18:02 |
tobiash | no hurry ;) | 18:02 |
*** armstrongs_ has joined #zuul | 18:02 | |
fungi | Armstrongs: also is your executor on a different machine than your scheduler? if so, have to make sure you have firewall rules on the scheduler allowing access to the gearman port, at least from the ip address of the executor | 18:02 |
Armstrongs | No executor on same machine | 18:03 |
Armstrongs | All on 1 box at moment | 18:03 |
fungi | in that case it's not likely to be a firewall problem at least | 18:03 |
tobiash | pabelanger: I guess there is a misunderstanding, I just meant that I would have expected 'self.fake_nodepool.python_path = python_path' on that line | 18:03 |
Armstrongs | No I can telnet the port | 18:03 |
tobiash | pabelanger: because you pass the path to this function and ignore it there | 18:04 |
tobiash | Armstrongs: you're running *only* the scheduler? | 18:05 |
tobiash | oh, I think I misunderstood that | 18:05 |
tobiash | Armstrongs: so you're running scheduler and executor on the same box right? | 18:06 |
armstrongs_ | i had a running executor, web and scheduler | 18:06 |
armstrongs_ | suddenly the executor stopped | 18:06 |
armstrongs_ | i couldnt start it again | 18:06 |
pabelanger | tobiash: Oh | 18:06 |
pabelanger | ha | 18:06 |
pabelanger | yah, that is a typo | 18:06 |
armstrongs_ | and it has the errors in the logs that match the ticket i found | 18:06 |
pabelanger | I see now | 18:06 |
armstrongs_ | wheres the best place to send you the logs | 18:06 |
tobiash | paste.openstack.org | 18:07 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Add more test coverage on using python-path https://review.opendev.org/659812 | 18:07 |
armstrongs_ | ok 2 secs | 18:07 |
pabelanger | tobiash: updatred | 18:07 |
pabelanger | updated* | 18:07 |
tobiash | +2 | 18:07 |
armstrongs_ | executor.log is here paste.openstack.org/show/751533/ | 18:19 |
*** Armstrongs has quit IRC | 18:19 | |
armstrongs_ | scheduler.log is here paste.openstack.org/show/751534/ | 18:22 |
armstrongs_ | and the zuul.conf is here paste.openstack.org/show/751535/ | 18:24 |
tobiash | armstrongs_: you should change your db password now ;) | 18:28 |
armstrongs_ | its a poc and test instance | 18:29 |
armstrongs_ | we are evaluating would never be our prod one haha | 18:29 |
armstrongs_ | hoping to get this complete so we can use it in production at just eat instead of team city :) | 18:30 |
tobiash | are those logs incomplete? | 18:31 |
tobiash | I don't see the error there | 18:31 |
armstrongs_ | just checking | 18:32 |
tobiash | also you might be interested in trying out the quickstart instead which is docker based and easier to start with | 18:34 |
armstrongs_ | i had it all working | 18:34 |
tobiash | ah ok | 18:34 |
armstrongs_ | it just stopped after the 3rd day | 18:34 |
armstrongs_ | this happened before too | 18:35 |
armstrongs_ | so defo eventually hits a bug | 18:35 |
armstrongs_ | will update full logs now | 18:35 |
armstrongs_ | sorry | 18:35 |
armstrongs_ | will re-upload the scheduler | 18:37 |
tobiash | in which order did you start the services? | 18:39 |
armstrongs_ | paste.openstack.org/show/751537/ is the full scheduler log | 18:39 |
tobiash | to me it looks like the executor tried for some time to connect to gearman and then gave uo | 18:39 |
tobiash | *up | 18:39 |
armstrongs_ | started them in the order in the guide, so executor, web, then scheduler | 18:39 |
armstrongs_ | yeah i have tried all orders | 18:39 |
tobiash | try the scheduler first | 18:40 |
armstrongs_ | i have | 18:40 |
tobiash | all services need gearman which is started by the scheduler | 18:40 |
armstrongs_ | it starts but the main page doesnt load | 18:40 |
armstrongs_ | on the web | 18:40 |
armstrongs_ | the tenants | 18:40 |
tobiash | also I noticed that the executor log looks weird as there are large time gaps in between | 18:40 |
armstrongs_ | so defo not in a good state | 18:40 |
armstrongs_ | yeah that was me trying to restart it | 18:40 |
armstrongs_ | after debugging the scheduler logs | 18:41 |
armstrongs_ | i will try starting scheduler then web then executor | 18:42 |
armstrongs_ | in that order now | 18:42 |
tobiash | web is not necessary for the startup at first | 18:42 |
tobiash | could you please delete all logs before and then just start scheduler and executor? | 18:43 |
armstrongs_ | wiil do | 18:43 |
armstrongs_ | scheduler.log paste.openstack.org/show/751538/ | 18:47 |
armstrongs_ | executor.log paste.openstack.org/show/751539/ | 18:49 |
armstrongs_ | so scheduler service is running but executor just dies as before after starting | 18:49 |
SpamapS | armstrongs_: you shouldn't drop the https:// ... makes me have to do an extra 2 clicks to see your pastes. ;) | 18:50 |
tobiash | armstrongs_: the executor just dies? | 18:51 |
SpamapS | yeah that's weird | 18:51 |
tobiash | armstrongs_: you could try to run it in foreground by executing 'zuul-executor -d' | 18:52 |
armstrongs_ | ok will try that | 18:52 |
clarkb | also check dmesg for oomkiller when processes just die | 18:53 |
tobiash | yeah, maybe the instance is now too small | 18:53 |
armstrongs_ | the web interface cant load my tenants now either though | 18:55 |
tobiash | it will only load the tenants after the full startup | 18:55 |
armstrongs_ | scheduler just sits there with -d | 18:56 |
armstrongs_ | saying setting to sleep | 18:56 |
armstrongs_ | polling 1 connection | 18:56 |
armstrongs_ | then nothing | 18:56 |
armstrongs_ | as looking for gearman port | 18:57 |
armstrongs_ | its a powerful box and i was only using a small test repo on it | 18:57 |
armstrongs_ | m5.large | 18:57 |
*** tjgresha has quit IRC | 18:59 | |
SpamapS | armstrongs_: how's zookeeper's health? | 19:01 |
*** tjgresha has joined #zuul | 19:02 | |
armstrongs_ | healthy service | 19:03 |
clarkb | fwiw 8GB of memory isn't a ton for a service like zuul (though usually its only a problem under load) | 19:04 |
clarkb | I would still double check dmesg for oomkiller output | 19:04 |
clarkb | another thing to check is if it is segfaulting | 19:04 |
clarkb | python3 on some distros has been less than stable | 19:05 |
armstrongs_ | this is fedora 28 | 19:05 |
armstrongs_ | the box | 19:05 |
pabelanger | should be okay with fedora-28, I was using that for local development for 6+ months | 19:05 |
armstrongs_ | i gave up on centos 7 | 19:05 |
armstrongs_ | :) | 19:05 |
pabelanger | yah, I wouldn't do centos-7 yet | 19:05 |
pabelanger | hopefully centos-8 things are better | 19:06 |
armstrongs_ | defo | 19:06 |
clarkb | 28 is eol right? | 19:06 |
pabelanger | yah, should be now, if not soon | 19:06 |
armstrongs_ | yeah it is | 19:06 |
armstrongs_ | need to package some new images | 19:06 |
pabelanger | so, executor starts then dies? | 19:06 |
armstrongs_ | :) | 19:06 |
armstrongs_ | yeah | 19:06 |
pabelanger | starting with systemd? | 19:06 |
pabelanger | or manually | 19:06 |
armstrongs_ | so i was setting zuul-executor to verbose | 19:06 |
armstrongs_ | when it happened | 19:07 |
armstrongs_ | if that shines any light on it | 19:07 |
pabelanger | I'd check what clarkb says, dmesg for OOM | 19:07 |
armstrongs_ | as i wanted my ansible output to be verbose | 19:07 |
pabelanger | what version of zuul? | 19:07 |
pabelanger | 3.8.2.dev6 | 19:08 |
armstrongs_ | looking now | 19:08 |
armstrongs_ | 3.8.2.dev6 | 19:09 |
armstrongs_ | not seeing any OOM | 19:10 |
pabelanger | strange | 19:11 |
tjgresha | have a question on separation on zuul from nodepool onto different servers if anyone has insights (is it possible?) | 19:12 |
pabelanger | only real issue I had was with not enough entropy for /dev/random. Installing haveged fixed that for me, for VMs | 19:12
pabelanger | for zuul-executors | 19:12 |
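The entropy starvation pabelanger mentions is easy to check (a sketch; the ~1000 threshold is a rule of thumb, and the haveged install line is the usual Fedora incantation):

```shell
# Check the kernel entropy pool; chronically low values (below ~1000)
# can stall anything blocking on /dev/random
cat /proc/sys/kernel/random/entropy_avail 2>/dev/null || echo "not available"
# If it stays low on a VM, install and enable haveged, e.g. on Fedora:
#   sudo dnf install -y haveged && sudo systemctl enable --now haveged
```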
pabelanger | tjgresha: yup, ask away, also it is possible | 19:13
pabelanger | armstrongs_: maybe run it under strace and see what happens | 19:13 |
clarkb | segfault would be my next suspicion if it is just dying | 19:14 |
clarkb | strace will show that I think | 19:14
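The strace run pabelanger and clarkb suggest would look roughly like this (a sketch; it assumes `zuul-executor` is on PATH, and the SIGSEGV line below is a sample in the standard strace signal-stop format, not real output from this incident):

```shell
# The actual run would be (-f follows forked workers, -tt timestamps
# each syscall):
#   strace -f -tt -o /tmp/zuul-executor.strace zuul-executor -d
#
# A crash then shows up in the trace as a signal stop; grep the trace
# for fatal signals. This sample line stands in for real trace output:
echo '--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x0} ---' \
  | grep -E 'SIGSEGV|SIGABRT'
```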
armstrongs_ | will give it a go in a bit, thanks for all the help debugging so far, my girlfriends home now, its 8pm on a friday here so will get in trouble for not paying attention to her :) | 19:15 |
armstrongs_ | will give strace a go and report back | 19:15 |
tjgresha | where do i give zuul the location of the nodepool it should use? | 19:15 |
*** armstrongs_ has quit IRC | 19:15 | |
clarkb | tjgresha: you don't explicitly set that instead you point zuul and nodepool at a common zookeeper database | 19:16 |
pabelanger | tjgresha: https://zuul-ci.org/docs/zuul/admin/components.html#components might help with connections too | 19:17 |
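What clarkb describes — Zuul and Nodepool each pointed at the same ZooKeeper rather than at each other — looks roughly like this in the two config files (the hostname is made up; the `[zookeeper]` section in zuul.conf and the `zookeeper-servers` key in nodepool.yaml are the documented settings):

```ini
# zuul.conf
[zookeeper]
hosts=zk.example.org:2181
```

```yaml
# nodepool.yaml
zookeeper-servers:
  - host: zk.example.org
    port: 2181
```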
tjgresha | hmm ok - that is what I thought -- so we put ZK and the Nodepool onto a different server than the zuul now, and changed the nodepool.yaml and zuul.conf to point to the new zk | 19:22 |
clarkb | yes | 19:22 |
tjgresha | does zuul need to be restarted after a zuul.conf change? | 19:24 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: Set iterate_timeout to 60 for pause jobs https://review.opendev.org/659871 | 19:24 |
pabelanger | tjgresha: usually yes | 19:24 |
pabelanger | tobiash: clarkb: just seen a pause job timeout, ^ should give us some more room while waiting for results | 19:25
tjgresha | thanks all | 19:27 |
clarkb | pabelanger: I think part of the reason for those values is OS_TEST_TIMEOUT is set to 240 | 19:28 |
clarkb | 30*4 comes in under 240 but 60*4 == 240 | 19:28 |
clarkb | pabelanger: I still think the change is ok and if we hit that longer timeout we'll see it | 19:28 |
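clarkb's arithmetic spelled out (the 240s OS_TEST_TIMEOUT and the factor of 4 come straight from the discussion; the "4 waits" framing is illustrative):

```shell
OS_TEST_TIMEOUT=240        # per-test timeout in the tox environment
old=$((4 * 30))            # four 30s iterate_timeout windows: 120s, comfortably under
new=$((4 * 60))            # four 60s windows: 240s, exactly at the per-test limit
echo "old=$old new=$new limit=$OS_TEST_TIMEOUT"
```

Which is why bumping iterate_timeout to 60 may mean OS_TEST_TIMEOUT needs a bump too.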
pabelanger | clarkb: Ah, that makes sense | 19:28 |
pabelanger | yah, we might need to bump that too | 19:29 |
pabelanger | so far, I've only seen it fail once in 3 weeks, since we improved our testing | 19:29
*** tjgresha__ has joined #zuul | 19:30 | |
*** tjgresha__ has quit IRC | 19:30 | |
*** tjgresha has left #zuul | 19:30 | |
*** tjgresha has joined #zuul | 19:31 | |
pabelanger | clarkb: we also seem to be hitting a subunit limit? | 19:31 |
pabelanger | http://logs.openstack.org/08/659708/6/gate/tox-py35/4d7ddb8/job-output.txt.gz#_2019-05-17_13_32_55_896191 | 19:31 |
clarkb | pabelanger: ya thats the too much data in the per test stream | 19:32 |
pabelanger | yah | 19:33 |
pabelanger | test_plugins | 19:33 |
pabelanger | looks to have timed out | 19:33 |
pabelanger | think it is bumping up against OS_TEST_TIMEOUT | 19:33 |
pabelanger | let me confirm | 19:33 |
tobiash | gear is very chatty in the tests | 19:33 |
clarkb | also zuul only attaches logs on failure | 19:34 |
pabelanger | tests.unit.test_v3.TestAnsible28.test_plugins [260.366155s] ... FAILED | 19:34 |
clarkb | so that means the test is failing for some other reason and this error is making it harder to debug but not the root source of the error | 19:34 |
pabelanger | wonder if we have a new failure with 2.8 stuff | 19:34
pabelanger | yah, I think we might need to bump that timeout for test_plugins or look to split it up | 19:36 |
pabelanger | very busy that test | 19:37 |
pabelanger | run in ovh-bhs1, so possible we got a slower node there | 19:37 |
pabelanger | clarkb: up for a review on https://review.opendev.org/659812 / https://review.opendev.org/637339 that is to allow for python3 only images for zuul / nodepool | 20:00 |
*** pcaruana has quit IRC | 20:02 | |
clarkb | pabelanger: reading that second change there isn't a way to use python3 on the executor/localhost? | 20:38
clarkb | we don't put localhost in the inventory and it isn't managed by nodepool | 20:38 |
clarkb | do we need to consider this case too? | 20:38 |
*** mattw4 has quit IRC | 21:00 | |
*** mattw4 has joined #zuul | 21:00 | |
openstackgerrit | Merged zuul/zuul master: zuul-tox-remote: use unique zuul_console service https://review.opendev.org/659708 | 21:08 |
*** tjgresha has quit IRC | 21:11 | |
openstackgerrit | Merged zuul/zuul master: Set iterate_timeout to 60 for pause jobs https://review.opendev.org/659871 | 21:11 |
*** tjgresha_ has joined #zuul | 21:11 | |
*** tjgresha has joined #zuul | 21:14 | |
pabelanger | clarkb: yah, good question. For that, we don't have a way to override right now | 21:15 |
pabelanger | so, we should think about that | 21:15 |
pabelanger | not localhost, but for nodes not managed by nodepool, I use add_host to set it | 21:16 |
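pabelanger's add_host approach might look like this in a playbook (the host name, group, and interpreter path are illustrative; `add_host` and `ansible_python_interpreter` are standard Ansible):

```yaml
# In a pre-run playbook: register a node Nodepool doesn't manage and
# pin its Python interpreter explicitly
- hosts: localhost
  tasks:
    - name: Add an unmanaged node with an explicit python3 interpreter
      add_host:
        name: static-node01.example.org
        groups: extra_nodes
        ansible_python_interpreter: /usr/bin/python3
```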
*** tjgresha_ has quit IRC | 21:16 | |
*** tjgresha has quit IRC | 21:22 | |
*** Armstrongs has joined #zuul | 21:22 | |
*** Armstrongs has quit IRC | 21:31 | |
*** mattw4 has quit IRC | 21:35 | |
*** mattw4 has joined #zuul | 21:35 | |
pabelanger | clarkb: replied, but also friday. Likely pick it up on monday | 21:50 |
*** mattw4 has quit IRC | 21:54 | |
*** mattw4 has joined #zuul | 22:05 | |
*** rlandy has quit IRC | 22:31 | |
*** mattw4 has quit IRC | 22:38 | |
*** mattw4 has joined #zuul | 22:42 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!