pabelanger | looking | 00:01 |
---|---|---|
corvus | no failures so far, and looking at the times, individual test runtimes look to be about 25% less than the current typical (while the entire job is about 20% longer) | 00:01 |
corvus | i'd trade an extra 5 minutes for more reliability | 00:01 |
corvus | we're losing way more than 5m each time it bombs out | 00:02 |
pabelanger | yah | 00:02 |
pabelanger | ++ | 00:02 |
corvus | we've got 12 test runs under the belt so far, i just did a recheck to get 10 more | 00:02 |
pabelanger | I kinda like that devstack has a dstat log, that you can also look at to see what the node is doing. Maybe we should consider a zuul job to do the same thing | 00:04 |
*** mattw4 has quit IRC | 00:07 | |
*** jamesmcarthur has quit IRC | 00:08 | |
*** jamesmcarthur has joined #zuul | 00:20 | |
*** jamesmcarthur has quit IRC | 00:36 | |
SpamapS | Hrm.. I rotated my github private key yesterday and now I'm getting 401's... | 00:48 |
corvus | pabelanger: feel free to play around with https://review.opendev.org/610100 | 00:49 |
pabelanger | corvus: will do | 00:50 |
SpamapS | ah.. we forgot to bounce the merger | 00:50 |
pabelanger | mnaser: w00t! zuul streaming logs via ipv6 now | 00:51 |
pabelanger | next up, ipv6 ansible | 00:51 |
pabelanger | but that is for another day | 00:51 |
pabelanger | need to prep for avengers! | 00:51 |
fungi | okay, just finished confirming Revert "Prepend path with bin dir of ansible virtualenv" fixed openstack's release jobs | 01:15 |
fungi | so that definitely seems to have been the cause for the regression we experienced | 01:16 |
mnaser | pabelanger: nice! | 01:17 |
mnaser | pabelanger: I hope to one day run OpenStack with iptables blocking all ipv4 traffic :) | 01:17 |
mnaser | but we've got some work still to do | 01:18 |
pabelanger | mnaser: seems like a great goal | 01:18 |
*** jamesmcarthur has joined #zuul | 01:27 | |
pabelanger | https://review.opendev.org/655808/ looks happy now, that is patch from corvus to fix race condition in testing | 01:29 |
pabelanger | I +2'd | 01:29 |
*** jamesmcarthur has quit IRC | 01:34 | |
*** jamesmcarthur has joined #zuul | 01:35 | |
openstackgerrit | Merged zuul/zuul master: web: add triggers information to pipeline list https://review.opendev.org/637670 | 02:09 |
*** jamesmcarthur has quit IRC | 02:18 | |
*** jamesmcarthur has joined #zuul | 02:26 | |
*** bhavikdbavishi has joined #zuul | 02:28 | |
*** jamesmcarthur has quit IRC | 02:43 | |
*** jamesmcarthur has joined #zuul | 02:45 | |
*** bhavikdbavishi has quit IRC | 02:59 | |
*** jamesmcarthur has quit IRC | 03:15 | |
*** jamesmcarthur has joined #zuul | 03:45 | |
*** bhavikdbavishi has joined #zuul | 03:50 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: trigger: add job filter event https://review.opendev.org/639905 | 03:51 |
*** jamesmcarthur has quit IRC | 03:51 | |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: webtrigger: add initial driver and event https://review.opendev.org/555153 | 03:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: webtrigger: add web route and rpclistener https://review.opendev.org/554839 | 03:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: web: add build button to trigger job https://review.opendev.org/635716 | 03:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: webtrigger: add support for parameterized trigger https://review.opendev.org/644484 | 03:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: web: add build modal with a parameter form https://review.opendev.org/644485 | 03:59 |
openstackgerrit | Tristan Cacqueray proposed zuul/zuul master: web: add support for checkbox and list parameters https://review.opendev.org/648661 | 03:59 |
*** bhavikdbavishi1 has joined #zuul | 04:01 | |
*** bhavikdbavishi has quit IRC | 04:03 | |
*** bhavikdbavishi1 is now known as bhavikdbavishi | 04:03 | |
*** jamesmcarthur has joined #zuul | 04:20 | |
*** pcaruana has joined #zuul | 04:23 | |
*** jamesmcarthur has quit IRC | 04:27 | |
*** threestrands has joined #zuul | 04:38 | |
*** bhavikdbavishi has quit IRC | 04:52 | |
*** bhavikdbavishi has joined #zuul | 04:53 | |
*** jamesmcarthur has joined #zuul | 05:00 | |
*** jamesmcarthur has quit IRC | 05:05 | |
*** jamesmcarthur has joined #zuul | 05:34 | |
*** jamesmcarthur has quit IRC | 05:39 | |
*** quiquell has joined #zuul | 06:06 | |
*** electrofelix has joined #zuul | 06:16 | |
*** jamesmcarthur has joined #zuul | 06:35 | |
*** jamesmcarthur has quit IRC | 06:40 | |
*** pcaruana has quit IRC | 06:48 | |
*** pcaruana has joined #zuul | 06:55 | |
*** jamesmcarthur has joined #zuul | 07:13 | |
*** jamesmcarthur has quit IRC | 07:25 | |
*** jpena|off is now known as jpena | 07:43 | |
*** threestrands has quit IRC | 07:50 | |
openstackgerrit | Merged zuul/zuul master: Fix race in test_job_pause_pre_skipped_child https://review.opendev.org/655808 | 09:03 |
*** panda|off is now known as panda | 09:34 | |
*** threestrands_ has joined #zuul | 10:03 | |
*** threestrands_ has quit IRC | 10:16 | |
*** bhavikdbavishi has quit IRC | 10:36 | |
*** panda is now known as panda|lunch | 11:10 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** jamesmcarthur has joined #zuul | 11:45 | |
*** bhavikdbavishi has joined #zuul | 11:49 | |
*** jamesmcarthur has quit IRC | 11:50 | |
*** quiquell is now known as quiquell|lunch | 11:51 | |
*** panda|lunch is now known as panda | 11:57 | |
*** maxamillion has quit IRC | 12:02 | |
*** maxamillion has joined #zuul | 12:03 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Recover cached repos from corrupt object files https://review.opendev.org/655890 | 12:11 |
*** jamesmcarthur has joined #zuul | 12:17 | |
*** jamesmcarthur has quit IRC | 12:30 | |
*** EmilienM is now known as EvilienM | 12:31 | |
*** jpena|lunch is now known as jpena | 12:32 | |
*** gtema has joined #zuul | 12:36 | |
*** quiquell|lunch is now known as quiquell | 12:40 | |
openstackgerrit | Merged zuul/zuul-jobs master: Don't repeat the etc/alias setup for buildset registry pushes https://review.opendev.org/655802 | 12:47 |
*** jamesmcarthur has joined #zuul | 12:50 | |
*** jamesmcarthur has quit IRC | 13:16 | |
*** gtema has quit IRC | 13:17 | |
*** gtema has joined #zuul | 13:18 | |
*** openstackgerrit has quit IRC | 13:27 | |
Shrews | pabelanger: endgame was sooo good | 13:59 |
pabelanger | Shrews: yes! No spoilers but was great | 14:01 |
pabelanger | also, http://paste.openstack.org/show/749814/ | 14:01 |
pabelanger | how can we get nodepool to give more info on that failure? | 14:01 |
Shrews | pabelanger: you can't. that's all you're given back from openstacksdk (and iirc, that's all it gets back from nova) | 14:02 |
pabelanger | boo | 14:02 |
pabelanger | okay, asking mnaser to help see what happened | 14:03 |
*** openstackgerrit has joined #zuul | 14:46 | |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: Revert "Add environment debugging to ensure-twine role" https://review.opendev.org/655916 | 14:46 |
*** gtema has quit IRC | 14:49 | |
*** ericbarrett has quit IRC | 14:55 | |
*** ianychoi_ has joined #zuul | 15:00 | |
*** ianychoi has quit IRC | 15:03 | |
*** zbr|rover is now known as zbr|over | 15:16 | |
openstackgerrit | Sean McGinnis proposed zuul/zuul-jobs master: Revert "Add environment debugging to ensure-twine role" https://review.opendev.org/655916 | 15:35 |
openstackgerrit | Merged zuul/zuul master: Recover cached repos from corrupt object files https://review.opendev.org/655890 | 15:43 |
openstackgerrit | Paul Belanger proposed zuul/zuul master: WIP: Run dstat and generate graphs in unit tests https://review.opendev.org/610100 | 15:52 |
*** electrofelix has quit IRC | 16:00 | |
corvus | i've restarted our scheduler with only a revert of 3704095c7927568a1f32317337c3646a9d15769e to see if it is the cause of the memory leak, or the other change (02b07a362b201382f62bb5dd0bb82e3bce35e4cc) | 16:05 |
corvus | (so if our memory usage is steady, then the problem was the cancel jobs patch; if it grows now, then it was the missing project patch) | 16:06 |
corvus | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all | 16:06 |
clarkb | corvus: 9fc2 was the other change | 16:07 |
corvus | er yes, that | 16:09 |
corvus | 9f7c642ae1dc5ac7de1cb0ff5c7e32d6426bd1b3 | 16:10 |
*** chandankumar is now known as raukadah | 16:12 | |
openstackgerrit | Merged zuul/zuul-jobs master: Revert "Add environment debugging to ensure-twine role" https://review.opendev.org/655916 | 16:20 |
*** mattw4 has joined #zuul | 16:20 | |
*** jpena is now known as jpena|off | 16:42 | |
*** panda is now known as panda|off | 16:56 | |
*** maxamillion has quit IRC | 16:58 | |
*** maxamillion has joined #zuul | 16:58 | |
*** zxiiro has quit IRC | 17:03 | |
*** zxiiro has joined #zuul | 17:04 | |
mordred | Shrews: looks like zuul-preview has gone south again - I wonder if we're running our latest fixes there | 17:24 |
mordred | corvus: ^^ | 17:25 |
mordred | this is obviously not urgent, and I don't think we should spend much effort on it today | 17:25 |
mordred | if any effort at all | 17:25 |
corvus | oh probably not, i've been completely neglecting it | 17:25 |
corvus | probably not running latest fixes | 17:25 |
mordred | but: http://site.4f816af6b10540b1b99d19fca3adc551.opendev.zuul-preview.opendev.org/ is hanging, which is the symptom that caused the fix patch | 17:25 |
mordred | cool | 17:25 |
mordred | should I just do a pull/restart? | 17:25 |
corvus | yeah, seems easy and no reason not to | 17:26 |
mordred | done | 17:26 |
mordred | works now | 17:26 |
*** jamesmcarthur has joined #zuul | 17:30 | |
*** tjgresha has joined #zuul | 17:31 | |
Shrews | mordred: that's exciting | 17:33 |
Shrews | glad it's working now tho | 17:34 |
*** tjgresha has quit IRC | 17:36 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix memory leak in job cancelation https://review.opendev.org/655982 | 17:49 |
corvus | tobiash, clarkb, mordred, Shrews: i left some commentary on https://review.opendev.org/640609 on changes i observed and the possible culprit. 655982 ^ should be a fix if i'm right. | 17:49 |
corvus | i have not experimentally verified any of that; that's just from the old mental python interpreter. | 17:50 |
clarkb | cool I'll take a look soon. Happy I could help narrow things down | 17:50 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix memory leak in job cancelation https://review.opendev.org/655982 | 17:52 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix memory leak in job cancellation https://review.opendev.org/655982 | 17:53 |
tobiash | corvus: oh, I fixed the first one in the fail-fast patch too: https://review.opendev.org/#/c/652764/7/zuul/scheduler.py | 17:53 |
corvus | tobiash: whoops. :) | 17:54 |
Shrews | corvus: that change had me doing a double take there | 17:54 |
Shrews | the new type hints threw me | 17:55 |
corvus | Shrews: yeah, sorry, the fix is just a few characters, the type hints are penance for allowing the error. :) | 17:55 |
tobiash | yeah, the type hints would have prevented that bug... | 17:56 |
Shrews | lgtm though | 17:56 |
corvus | i'm conflicted on whether i like them for python, but i have to admit that they can help in this case. alternatively, we could be more consistent on taking job *objects* and then job.name will naturally crash if you give it the wrong thing. | 17:57 |
tobiash | corvus: is https://review.opendev.org/#/c/655982/3/zuul/manager/__init__.py a drive by fix or is there a relation I don't see? | 17:58 |
*** mattw4 has quit IRC | 17:58 | |
tobiash | ah, looks like it was an unused parameter | 17:59 |
corvus | yeah, i updated the commit to mention that in the latest ps | 18:00 |
corvus | commit msg | 18:00 |
mnaser | yum, type-hint'd python... | 18:01 |
*** mattw4 has joined #zuul | 18:02 | |
mordred | corvus: yay for type hints being useful! | 18:12 |
*** gtema has joined #zuul | 18:16 | |
*** gtema has quit IRC | 18:20 | |
*** jamesmcarthur has quit IRC | 18:34 | |
*** jamesmcarthur has joined #zuul | 18:34 | |
*** jamesmcarthur has quit IRC | 18:36 | |
*** mattw4 has quit IRC | 18:45 | |
*** mattw4 has joined #zuul | 18:48 | |
*** jamesmcarthur has joined #zuul | 18:49 | |
*** jamesmcarthur has quit IRC | 18:56 | |
*** jamesmcarthur has joined #zuul | 18:59 | |
*** mattw4 has quit IRC | 19:03 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix stale job dir deletion on startup https://review.opendev.org/656003 | 19:55 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Fix race in test_job_pause_retry https://review.opendev.org/656004 | 19:56 |
tobiash | corvus: I thought I remembered that the executor was supposed to clean up the job dirs on startup, but it never did in my deployment. This fixes it ^ | 19:56 |
corvus | tobiash: https://review.opendev.org/620697 is where i left that | 20:04 |
corvus | tobiash: looks like the same fix, though i added a non-working test | 20:06 |
corvus | it's the test that was holding up that patch | 20:06 |
tobiash | oh, looks like I overlooked that | 20:06 |
corvus | considering the way the test was failing, i think it might be worth looking into that | 20:06 |
corvus | (i haven't looked into it to find out if it would cause a production issue, or only tests) | 20:07 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup https://review.opendev.org/620697 | 20:08 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Add more debugging to tests https://review.opendev.org/656006 | 20:13 |
openstackgerrit | Merged zuul/zuul master: Fix memory leak in job cancellation https://review.opendev.org/655982 | 20:42 |
*** nickx-intel has joined #zuul | 20:45 | |
nickx-intel | hi zuulies. I'm working through the quickstart guide ..... I'm not seeing zuul approve my reviews though :( | 20:45 |
nickx-intel | I replied to gerrit change @zuul "looks ok to me, a human - what do you think zuul?" | 20:49 |
nickx-intel | +1 +1 +1 ... just waiting for zuul to approve | 20:49 |
nickx-intel | I think it's probably because of noop jobs but the guide doesn't really point to that; I'm expecting it to work as is | 20:49 |
clarkb | nickx-intel: you have to leave a workflow +1 vote for zuul to pick it up looking at the quickstart example pipeline. Did you do that? | 20:50 |
corvus | i think https://zuul-ci.org/docs/zuul/_images/review-1003.png is the relevant screenshot | 20:51 |
nickx-intel | I left workflow +1 vote ... 2 +1 in fact ... and I'm actually in the Test Zuul Pipelines subheader right now | 20:51 |
corvus | so if you click code-review +2, and click workflow +1, then click send | 20:51 |
nickx-intel | I have to +2 code-review? | 20:51 |
nickx-intel | let me try that | 20:51 |
nickx-intel | ok, +2, submitted, merged | 20:52 |
mordred | \o/ | 20:52 |
*** tjgresha has joined #zuul | 20:53 | |
corvus | nickx-intel: ah, if you're at "Test Zuul Pipelines", the only thing we're expecting is for zuul to report on the change on patchset upload. so i think you were working ahead of the rest of the class. :) | 20:53 |
nickx-intel | oh? I was expecting zuul to +1? huh | 20:54 |
nickx-intel | verified +1 vote? | 20:54 |
nickx-intel | errata since quick-start was written or so? :1 | 20:55 |
corvus | nickx-intel: yeah, roughly speaking, "Test Zuul Pipelines" should get you to the point where zuul should leave a +1 when you upload a patch. the next section ("Configure a Base Job") gets all the way to merging it. | 20:55 |
clarkb | unrelated but I'm about to shut down the desktop where I took these notes. The window where the memory leak arose was between 2019-04-16 03:02:39 UTC Restarted zuul-scheduler and zuul-web on commit 0bb220c and 2019-03-19 21:11:31 UTC restarted all of zuul at commit 77ffb70104959803a8ee70076845c185bd17ddc1 just in case we have to do more debugging of that | 20:56 |
nickx-intel | ........ do I need to reload docker after the pipelines merge? for zuul to get its new config? | 20:56 |
corvus | nickx-intel: nope that should all be automatic | 20:57 |
nickx-intel | so I've done something wrong with the bootstrap corvus? I've been over and over this .... same results same results | 20:58 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix deletion of stale build dirs on startup https://review.opendev.org/620697 | 20:58 |
corvus | nickx-intel: i'm not sure, i thought you said it worked? what's not happening as you expect? | 20:58 |
nickx-intel | the merge of check and gate pipelines was successful vis a vis bootstrapping | 20:59 |
nickx-intel | I'm expecting the basic noop check and gate configuration to evaluate proposed changes for untrusted projects | 21:00 |
nickx-intel | I see quul is added as a reviewer but not verified +1 | 21:01 |
corvus | so when you pushed up your test change, you did not see this? https://zuul-ci.org/docs/zuul/_images/check1-1002.png | 21:01 |
nickx-intel | ^quul^zuul | 21:01 |
nickx-intel | is that ...... when you upload via html? | 21:01 |
corvus | nickx-intel: that should be what happens after you run "git review" | 21:01 |
nickx-intel | I'm going to type out some text here, 1sec: | 21:02 |
nickx-intel | # git review | 21:03 |
nickx-intel | > could not connect to gerrit. | 21:03 |
nickx-intel | > enter your gerrit username: [username] | 21:03 |
tobiash | corvus: latest PS on 620697 should fix the tests. As far as I can see there was no problem with the log streamer but just some places in the tests that didn't account for the changed job dir location | 21:03 |
nickx-intel | > trying again with ssh://[username]@localhost:29418/test1 | 21:03 |
nickx-intel | > Enter passphrase for key [...]: | 21:03 |
tobiash | nickx-intel: zuul cannot handle ssh keys with pass phrases atm | 21:04 |
nickx-intel | > creating a remote called "gerrit" that maps to: ssh://[username]@localhost:29418/test1 | 21:04 |
nickx-intel | uhh | 21:04 |
nickx-intel | hmm | 21:04 |
corvus | tobiash: that's not the issue | 21:04 |
nickx-intel | so anyway, after all that | 21:04 |
nickx-intel | > remote: SUCCESS | 21:04 |
corvus | tobiash: nickx-intel is running through the quickstart docs -- that's the git-review command being used to push up changes to the embedded gerrit | 21:04 |
nickx-intel | remote: http://localhost:8080/c/test1/+/1002 Add Auul test1 job | 21:05 |
tobiash | oops, missed that detail | 21:05 |
nickx-intel | yeah so review succeeds as [user] | 21:05 |
corvus | nickx-intel: can you go to the http://localhost:8080/c/test1/+/1002 make a screenshot and paste the link here? | 21:05 |
corvus | (firefox has built-in screenshot support if you're using it; should be under the "..." menu in the url bar) | 21:06 |
nickx-intel | what's a good screenshot hosting site again? not pastebin .. uh .. | 21:07 |
corvus | imgur works | 21:07 |
nickx-intel | word corvus, sec | 21:07 |
nickx-intel | https://imgur.com/a/ZY2shOY | 21:08 |
nickx-intel | 800x600 - apologies | 21:09 |
corvus | nickx-intel: can you click "expand all" in the bottom right hand corner, scroll to the bottom, and screenshot that? | 21:10 |
nickx-intel | yup yup | 21:10 |
nickx-intel | https://imgur.com/a/2wK35z8 | 21:13 |
corvus | nickx-intel: sorry i meant to expand the comments at the bottom | 21:14 |
nickx-intel | oh haha ok 1sec | 21:14 |
nickx-intel | https://imgur.com/a/GqLo96O | 21:15 |
nickx-intel | see me in the comments? "zuul pls" | 21:16 |
*** pcaruana has quit IRC | 21:16 | |
nickx-intel | corvus, ^ | 21:17 |
corvus | nickx-intel: can you visit http://localhost:9000/t/example-tenant/status and paste a screenshot? | 21:17 |
nickx-intel | hmm port 9000 is closed | 21:19 |
nickx-intel | I'm clearly not operating on localhost:#### | 21:19 |
nickx-intel | I see netstat listening on 9000 | 21:19 |
tjgresha | which cloud are you on @intel | 21:20 |
corvus | nickx-intel: how about the output of "docker ps" -- you can use paste.openstack.org to paste it here if you want | 21:20 |
nickx-intel | iptables shows 9000 open | 21:20 |
tjgresha | need to change the security in the tenant to open port 9000 if it is not | 21:20 |
nickx-intel | tjgresha, if you have an internal address pm me | 21:21 |
nickx-intel | ooooo let me check that, although, I've added those rules, but let's see | 21:21 |
corvus | nickx-intel: ah, you're sshing into a cloud vm where you're running docker? and i guess you have port 8080 open so you can reach gerrit? | 21:22 |
nickx-intel | yeahhhhh I'm just missing 9000 from the "cloud" security group rules, sec | 21:22 |
nickx-intel | 9000 open, ss incoming | 21:23 |
tjgresha | see - i know things | 21:23 |
SpamapS | feature idea: it would be cool if we could make checks cancelled if a job goes in to gate. I waste a *lot* of compute on parallel check/gate runs when we're paired up and reviewing fast. | 21:23 |
nickx-intel | corvus, check: 0 gate: 0; 0 events; 0 management events; 0 results | 21:24 |
nickx-intel | tjgresha, +1 :D | 21:24 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix test race in test_job_pause_post_fail https://review.opendev.org/656019 | 21:25 |
tobiash | corvus: another test race fix ^ | 21:25 |
corvus | nickx-intel: is there a little bell in the top right hand corner? | 21:25 |
nickx-intel | nope corvus | 21:25 |
corvus | nickx-intel: at this point i think i would look at the scheduler log to see if there are errors there, and if it saw the event from gerrit. | 21:26 |
corvus | nickx-intel: you should be able to see that with "docker logs examples_scheduler_1" i think | 21:27 |
nickx-intel | I can see noop jobs in zuul-config master check/gate jobs .... | 21:27 |
corvus | SpamapS: yeah we could probably use that in our new opendev and zuul tenants (which don't have clean check either) | 21:28 |
SpamapS | corvus:right, I was thinking something like "cancels: {pipeline}" which would be able to cancel something in the other pipeline with the same change ID. | 21:28 |
corvus | SpamapS: yeah i think that's the way to do it | 21:29 |
nickx-intel | uh, binary file (standard input) matches .. even with xargs :1 just tryina grep error | 21:29 |
SpamapS | corvus:I'll do it with one of my free minutes. | 21:30 |
corvus | SpamapS: shouldn't take more than one. ;) | 21:30 |
nickx-intel | lol | 21:30 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Increase zookeeper priority during tests https://review.opendev.org/656021 | 21:32 |
nickx-intel | corvus, handscanning this log, I see no errors, would be nice to grep but "people in hell want ice water" :9 | 21:32 |
corvus | nickx-intel: oh you might have to "docker logs examples_scheduler_1 2>&1 | grep" | 21:33 |
corvus | i think the logs may go to stdout and docker "helpfully" maintains that | 21:33 |
corvus | er stderr | 21:33 |
nickx-intel | same binary file (standard input) matches | 21:34 |
nickx-intel | I'm working on the concept that stuff is succeeding improperly | 21:34 |
corvus | nickx-intel: the logs should have a lot of information including all of the events received from gerrit, starting and completing jobs, and reporting to gerrit, though obviously at least the last one will be missing. | 21:34 |
corvus | nickx-intel: another thing you can check in the zuul web interface is the builds tab -- http://localhost:9000/t/example-tenant/builds | 21:35 |
corvus | nickx-intel: if you see something there, then zuul got the event from gerrit and ran jobs; if not, then we've narrowed it down to either zuul not receiving the event, or it did receive the event but didn't match it to any projects or pipelines, and therefore ran no jobs. | 21:36 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix test race in test_job_pause_post_fail https://review.opendev.org/656019 | 21:36 |
corvus | tobiash: http://logs.openstack.org/06/656006/1/check/tox-py35/b15bf22/testr_results.html.gz looks like maybe we have a similar race in plain old "test_job_pause". i know there's a bunch of unrelated errors there, but i think the failure on test_job_pause is a legitimate test race. | 21:37 |
nickx-intel | corvus, https://pastebin.com/9V6ZU81k | 21:37 |
nickx-intel | builds appears to be empty :o | 21:38 |
tobiash | corvus: yes, looks like exactly the same race, fixing | 21:38 |
corvus | nickx-intel: then the answer is going to be in the scheduler log. if you want to paste that, i can help analyze it. | 21:39 |
*** mattw4 has joined #zuul | 21:39 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix test race in test_job_pausee https://review.opendev.org/656024 | 21:40 |
nickx-intel | thanks corvus, let me check with someone here about that, and I'll get back to you with a "we figured it out" or "ugh wat" | 21:40 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix test race in test_job_pause https://review.opendev.org/656024 | 21:41 |
corvus | nickx-intel: good luck. several other folks here can help pinpoint problems too, if i'm not around. the log is very verbose, but if you haven't run a zuul you might not know what's *not* being printed. | 21:42 |
nickx-intel | thanks for tracking corvus, we're reviewing internally at present | 22:12 |
nickx-intel | corvus, there were a couple of issues: 1. I made myself admin in gerrit (force of habit) 2. verified review as user and took zuul's job | 22:35 |
nickx-intel | so zuul was sulking without throwing any errors XD | 22:36 |
nickx-intel | <zuul> "ok fine u do it then" | 22:36 |
*** sshnaidm has joined #zuul | 22:37 | |
*** sshnaidm is now known as sshnaidm|off | 22:43 | |
*** sshnaidm|off has quit IRC | 22:46 | |
*** EvilienM is now known as EmilienM | 22:51 | |
*** sshnaidm has joined #zuul | 22:53 | |
*** sshnaidm is now known as sshnaidm|off | 22:54 | |
*** jamesmcarthur has quit IRC | 23:05 | |
pabelanger | https://review.opendev.org/656024/ could use another +2 / +A, thanks to tobiash / corvus. I was looking into them this afternoon, but got distracted | 23:13 |
*** sshnaidm|off has quit IRC | 23:24 | |
pabelanger | so, I've now seen across multiple tests that we don't seem to be shutting down all the threads properly: http://logs.openstack.org/04/656004/1/gate/tox-py36/1e59b10/testr_results.html.gz | 23:38 |
pabelanger | I can see the thread ID, eg: Thread: 140429056014080 then traceback | 23:39 |
pabelanger | however, I am unsure how it maps to: http://paste.openstack.org/show/749842/ | 23:40 |
pabelanger | <Thread(Thread-2929, started daemon 140429056014080)>, doesn't really explain what it is | 23:40 |
pabelanger | where others are: <Thread(Gearman client poll, started daemon 140429592884992)> | 23:40 |
corvus | pabelanger: that happens when tests timeout; it's a symptom not a cause | 23:51 |
pabelanger | corvus: ah, okay. thanks for info | 23:51 |
corvus | pabelanger: usually what i do is look for the first Traceback; if it's in the middle of a test and it relates to a disconnect, it means the system is too busy and we had a connection timeout | 23:53 |
corvus | pabelanger: if it's a timeout exception, then that's straightforward -- it's just a test timeout, also probably because the test is too busy | 23:53 |
corvus | pabelanger: if one of those happens, then the tests that run afterwards are suspect -- they're probably going to output errors like that even if they ran okay | 23:54 |
corvus | pabelanger: if the first error is an actual test assertion error though, that's probably debuggable. | 23:54 |
pabelanger | k | 23:56 |
pabelanger | I just rechecked https://review.opendev.org/656019/ it failed in gate, but 656024 is the fix. | 23:57 |
pabelanger | then will look at output of 655805 again, which seems to show we are still losing gearman connections sometimes | 23:58 |
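
The triage order corvus describes near the end of the log (find the first Traceback, then decide whether it is a disconnect, a timeout, or a real assertion failure) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the category labels, and the keyword matches are hypothetical and are not part of Zuul or its test suite.

```python
TRACEBACK_HEADER = "Traceback (most recent call last):"

# Hypothetical category -> reading, paraphrasing the advice in the log.
NOTES = {
    "disconnect": "system probably too busy; later tests are suspect",
    "timeout": "test timeout; later tests are suspect",
    "assertion": "real assertion failure; probably debuggable",
    "other": "unclassified; read the traceback by hand",
    "no-traceback": "no traceback found",
}


def first_exception_line(log_text):
    """Return the exception line that closes the first traceback, or None."""
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if line.strip() != TRACEBACK_HEADER:
            continue
        header_indent = len(line) - len(line.lstrip())
        # Frame lines are indented deeper than the header; the exception
        # line is the first non-blank line back at the header's indent.
        for follower in lines[index + 1:]:
            if not follower.strip():
                continue
            indent = len(follower) - len(follower.lstrip())
            if indent <= header_indent:
                return follower.strip()
        return None
    return None


def triage(log_text):
    """Classify a test log by its first traceback only (keywords are guesses)."""
    exception_line = first_exception_line(log_text)
    if exception_line is None:
        return "no-traceback"
    lowered = exception_line.lower()
    if "assert" in lowered or "mismatcherror" in lowered:
        return "assertion"
    if "timeout" in lowered or "timed out" in lowered:
        return "timeout"
    if "disconnect" in lowered or "connection" in lowered:
        return "disconnect"
    return "other"
```

Calling `NOTES[triage(text)]` on the text of a test result would give a one-line hint. Only the first traceback is inspected on purpose, since errors after a timeout or disconnect are described above as symptoms rather than causes.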