*** stakeda has joined #openstack-infra | 00:03 | |
*** vaidy has quit IRC | 00:04 | |
*** isviridov_away has quit IRC | 00:04 | |
*** lnxnut has quit IRC | 00:07 | |
*** markvoelker has joined #openstack-infra | 00:12 | |
*** yamamoto has joined #openstack-infra | 00:14 | |
*** yamamoto has quit IRC | 00:14 | |
*** yamamoto has joined #openstack-infra | 00:14 | |
*** dave-mccowan has joined #openstack-infra | 00:15 | |
*** yamamoto has quit IRC | 00:19 | |
*** lukebrowning has joined #openstack-infra | 00:28 | |
*** lukebrowning_ has joined #openstack-infra | 00:32 | |
*** lukebrowning has quit IRC | 00:33 | |
*** lukebrowning_ has quit IRC | 00:37 | |
*** edmondsw has joined #openstack-infra | 00:39 | |
*** psachin has quit IRC | 00:40 | |
*** psachin has joined #openstack-infra | 00:41 | |
*** lukebrowning has joined #openstack-infra | 00:43 | |
*** edmondsw has quit IRC | 00:44 | |
*** markvoelker has quit IRC | 00:45 | |
*** lukebrowning has quit IRC | 00:47 | |
*** lukebrowning has joined #openstack-infra | 00:49 | |
*** kiennt26 has joined #openstack-infra | 00:50 | |
*** lukebrowning has quit IRC | 00:54 | |
*** lukebrowning has joined #openstack-infra | 00:56 | |
*** lukebrowning has quit IRC | 01:00 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove legacy loci jobs https://review.openstack.org/508556 | 01:00 |
---|---|---|
*** jascott1 has joined #openstack-infra | 01:03 | |
*** claudiub has quit IRC | 01:05 | |
*** lukebrowning has joined #openstack-infra | 01:06 | |
SamYaple | is it possible to give jobs priorities? for example if i have a job that i know takes 20 minutes to run, i want that to start before a job that i know takes 10 minutes to run (instead of sitting "queued") | 01:07 |
*** lukebrowning_ has joined #openstack-infra | 01:09 | |
*** cuongnv has joined #openstack-infra | 01:09 | |
*** lukebrowning has quit IRC | 01:11 | |
*** lukebrowning_ has quit IRC | 01:13 | |
*** hongbin has joined #openstack-infra | 01:15 | |
*** lukebrowning has joined #openstack-infra | 01:15 | |
*** yamamoto has joined #openstack-infra | 01:19 | |
*** lukebrowning has quit IRC | 01:20 | |
*** lukebrowning has joined #openstack-infra | 01:21 | |
*** yamamoto has quit IRC | 01:25 | |
*** lukebrowning has quit IRC | 01:26 | |
*** cshastri has joined #openstack-infra | 01:26 | |
*** lukebrowning has joined #openstack-infra | 01:27 | |
*** dave-mccowan has quit IRC | 01:30 | |
*** lukebrowning has quit IRC | 01:32 | |
*** dave-mccowan has joined #openstack-infra | 01:33 | |
*** lukebrowning has joined #openstack-infra | 01:34 | |
*** lnxnut has joined #openstack-infra | 01:34 | |
*** edmondsw has joined #openstack-infra | 01:37 | |
jeblair | SamYaple: they're run in the order listed, so you can put the long ones first | 01:38 |
*** lukebrowning has quit IRC | 01:39 | |
*** lihi has quit IRC | 01:39 | |
*** dimak has quit IRC | 01:39 | |
*** lihi has joined #openstack-infra | 01:39 | |
*** dimak has joined #openstack-infra | 01:40 | |
*** leyal has quit IRC | 01:40 | |
*** lukebrowning has joined #openstack-infra | 01:40 | |
*** leyal has joined #openstack-infra | 01:40 | |
SamYaple | jeblair: if thats the design, thats not the way its working. they start in seemingly random order | 01:41 |
SamYaple | ive had the long job start first sometimes | 01:41 |
SamYaple | see the job in the queue right this second | 01:41 |
SamYaple | first job is queued, last job is running | 01:41 |
*** edmondsw has quit IRC | 01:41 | |
SamYaple | and a mix in between | 01:41 |
*** markvoelker has joined #openstack-infra | 01:42 | |
*** vaidy has joined #openstack-infra | 01:44 | |
*** lukebrowning has quit IRC | 01:44 | |
*** isviridov_away has joined #openstack-infra | 01:45 | |
*** lukebrowning has joined #openstack-infra | 01:46 | |
*** dave-mcc_ has joined #openstack-infra | 01:49 | |
*** lukebrowning_ has joined #openstack-infra | 01:50 | |
*** dave-mccowan has quit IRC | 01:51 | |
*** lukebrowning has quit IRC | 01:51 | |
jeblair | SamYaple: well, that's the order in which the nodes are requested. the order they arrive is determined by cloud. :) | 01:53 |
*** lukebrowning has joined #openstack-infra | 01:53 | |
SamYaple | oh i see. well then i suppose it doesn't make much of a difference! ill still move it to the top though | 01:54 |
*** lukebrowning_ has quit IRC | 01:54 | |
SamYaple | did zuulv3 add a way to move artifacts from one job to another (say from a job in the gate queue to a job in the post queue)? | 01:55 |
jeblair | yeah, the only time it would make a difference is if we're at capacity with no turnover. that generally never happens (even when we're at capacity, we turn over like 10 nodes a minute), so you'll never notice. | 01:55 |
jeblair | SamYaple: not yet, that's still via tarballs.o.o for now | 01:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver https://review.openstack.org/468753 | 01:56 |
SamYaple | ok. (in this case i was hoping to publish something from the gate queue to tarballs.o.o ) | 01:56 |
*** dave-mccowan has joined #openstack-infra | 01:57 | |
SamYaple | its not a big thing. i just have to rebuild it in post | 01:57 |
*** lukebrowning has quit IRC | 01:58 | |
*** dave-mcc_ has quit IRC | 01:59 | |
SamYaple | oh man. so excited. very happy with zuulv3 | 02:00 |
*** lnxnut has quit IRC | 02:01 | |
SamYaple | by the end of next week i should be publishing images to dockerhub based on changes to the cinder/keystone/nova/etc repo | 02:01 |
*** shu-mutou-AWAY is now known as shu-mutou | 02:01 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl https://review.openstack.org/508793 | 02:06 |
*** efried_thbagh has quit IRC | 02:07 | |
*** lukebrowning has joined #openstack-infra | 02:07 | |
jeblair | i'm going to restart zuul again | 02:07 |
*** lukebrowning has quit IRC | 02:12 | |
SamYaple | monday its going to get hammered so hard | 02:13 |
*** lukebrowning has joined #openstack-infra | 02:14 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl https://review.openstack.org/508793 | 02:15 |
*** markvoelker has quit IRC | 02:16 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl https://review.openstack.org/508793 | 02:17 |
*** dave-mcc_ has joined #openstack-infra | 02:17 | |
*** lukebrowning has quit IRC | 02:18 | |
*** dave-mccowan has quit IRC | 02:18 | |
*** yamamoto has joined #openstack-infra | 02:21 | |
jeblair | infra-root: i've restarted zuul with a locally applied patch to add a repl so we can do some interactive debugging tomorrow ^ | 02:23 |
*** lukebrowning has joined #openstack-infra | 02:26 | |
*** esberglu has joined #openstack-infra | 02:26 | |
*** yamamoto has quit IRC | 02:27 | |
*** ijw has joined #openstack-infra | 02:28 | |
*** lukebrowning_ has joined #openstack-infra | 02:30 | |
*** lukebrowning has quit IRC | 02:30 | |
*** esberglu has quit IRC | 02:31 | |
*** ijw has quit IRC | 02:32 | |
*** lukebrowning_ has quit IRC | 02:35 | |
*** lukebrowning has joined #openstack-infra | 02:37 | |
*** lukebrowning has quit IRC | 02:41 | |
*** lukebrowning has joined #openstack-infra | 02:43 | |
*** bobh has quit IRC | 02:46 | |
*** lukebrowning has quit IRC | 02:47 | |
*** ericyoung has quit IRC | 02:48 | |
*** lukebrowning has joined #openstack-infra | 02:49 | |
*** baoli has quit IRC | 02:52 | |
*** lukebrowning has quit IRC | 02:54 | |
*** dave-mcc_ has quit IRC | 02:55 | |
*** lukebrowning has joined #openstack-infra | 02:56 | |
Jeffrey4l | could anybody check this in-project job? zuulv3 shows that the kolla-build-centos-source job is in the queue all the time. | 02:59 |
Jeffrey4l | https://review.openstack.org/#/c/508768/1/.zuul.yaml | 02:59 |
*** ekcs has quit IRC | 03:00 | |
*** lukebrowning has quit IRC | 03:00 | |
*** ekcs has joined #openstack-infra | 03:01 | |
*** bobh has joined #openstack-infra | 03:01 | |
*** lukebrowning has joined #openstack-infra | 03:02 | |
*** rkukura has joined #openstack-infra | 03:05 | |
*** lukebrowning has quit IRC | 03:06 | |
*** lukebrowning has joined #openstack-infra | 03:08 | |
*** dhajare has joined #openstack-infra | 03:12 | |
*** markvoelker has joined #openstack-infra | 03:13 | |
*** lukebrowning has quit IRC | 03:13 | |
*** nicolasbock_ has quit IRC | 03:16 | |
*** lukebrowning has joined #openstack-infra | 03:17 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Extract releasenotes build into a role https://review.openstack.org/508765 | 03:17 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Run release note jobs only when they change https://review.openstack.org/508762 | 03:17 |
*** rossella_s has quit IRC | 03:19 | |
*** rossella_s has joined #openstack-infra | 03:20 | |
*** lukebrowning has quit IRC | 03:22 | |
SpamapS | SamYaple: we _could_ give you a way to upload stuff to dockerhub in gate. But the git sha will change between gate and post since Zuul just tells gerrit to merge... | 03:23 |
*** yamamoto has joined #openstack-infra | 03:23 | |
boris_42_ | heh =( this patch didn't help with Rally gates https://review.openstack.org/#/c/508630/ ;( | 03:24 |
boris_42_ | not sure how to add dib-utils to legacy jobs =( | 03:24 |
boris_42_ | I take a look at other project they just add them to Projects ... | 03:24 |
SpamapS | boris_42_: let me look | 03:25 |
clarkb | boris_42_: look at required-projects | 03:25 |
clarkb | also the comment about gnocchi was correct | 03:25 |
clarkb | that wont work | 03:25 |
*** edmondsw has joined #openstack-infra | 03:25 | |
clarkb | you need to install it from wherever they publish it now | 03:25 |
SpamapS | github | 03:26 |
SpamapS | so technically | 03:26 |
clarkb | SpamapS: its orthogonal to zuul | 03:26 |
clarkb | devstack-gate assumes things | 03:27 |
clarkb | so it wont work | 03:27 |
SpamapS | does it assume openstack as an org? | 03:27 |
clarkb | yes | 03:27 |
boris_42_ | clarkb: could you elaborate about required-projects ? where that thing is? | 03:27 |
clarkb | boris_42_: its a job setting should be plenty of examples to grep for in openstack-zuul-jobs | 03:28 |
*** lnxnut has joined #openstack-infra | 03:28 | |
*** yamamoto has quit IRC | 03:29 | |
SpamapS | clarkb: oh, so we can't just add https://github.com/gnocchixyz/gnocchi as a project? | 03:29 |
*** ijw has joined #openstack-infra | 03:29 | |
*** lukebrowning has joined #openstack-infra | 03:30 | |
SpamapS | (probably not the best time to be trying out new trigger drivers ;) | 03:30 |
boris_42_ | clarkb: ah ok | 03:30 |
*** edmondsw has quit IRC | 03:30 | |
SamYaple | SpamapS: hmmm. theres a thought. | 03:30 |
SamYaple | SpamapS: nah you know its probably fine. it takes 15m to build the wheels that i need, its not a huge issue right now. i can wait until there is better methods | 03:31 |
clarkb | SpamapS: not to solve boris_42_'s problem I dont think | 03:32 |
clarkb | SpamapS: we'd have to sort out how it also breaks in devstack-gate | 03:33 |
*** ijw has quit IRC | 03:33 | |
*** lukebrowning has quit IRC | 03:34 | |
leyal | Hi, i got a -1 from zuul on this patch - https://review.openstack.org/#/c/508785/1 , where can i check why it failed? it doesn't contain references to any failed test .. | 03:34 |
*** lukebrowning has joined #openstack-infra | 03:36 | |
clarkb | leyal: did you see the comment from zuul? | 03:36 |
boris_42_ | clarkb: so I can enable gnocchi as a devstack plugin | 03:37 |
clarkb | I think it may be a yaml indentation thing, after a : the next line needs 2 spaces of indentation | 03:37 |
leyal | clarkb, nope - just - Verified "-1 | 03:37 |
clarkb | leyal: toggle ci at the bottom | 03:38 |
clarkb | there is a comment from zuul that tries to explain what is going on | 03:39 |
leyal | clarkb , thanks :) | 03:39 |
*** lukebrowning has quit IRC | 03:41 | |
*** lukebrowning has joined #openstack-infra | 03:42 | |
openstackgerrit | Boris Pavlovic proposed openstack-infra/openstack-zuul-jobs master: Fix Rally jobs: Add dib-utils to required projects https://review.openstack.org/508799 | 03:43 |
boris_42_ | clarkb: ^ so something like this? | 03:44 |
clarkb | boris_42_: ya | 03:44 |
boris_42_ | clarkb: okay it will take some time to get familiar with this new system =) | 03:45 |
*** isaacb has joined #openstack-infra | 03:46 | |
*** markvoelker has quit IRC | 03:46 | |
*** lukebrowning has quit IRC | 03:47 | |
*** lukebrowning has joined #openstack-infra | 03:48 | |
*** links has joined #openstack-infra | 03:48 | |
SamYaple | trying to make a commit to the requirements repo, nova function test is failing, http://logs.openstack.org/91/508791/1/check/legacy-cross-nova-func/f469b5d/job-output.txt.gz#_2017-10-02_02_57_39_951015 | 03:51 |
SamYaple | any ideas? | 03:51 |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 03:52 |
*** lukebrowning has quit IRC | 03:53 | |
*** isaacb has quit IRC | 03:54 | |
sshnaidm|off | cores, please check problem with XStatic package on pypi mirror on rh1 cloud: https://bugs.launchpad.net/tripleo/+bug/1720721 | 03:54 |
openstack | Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger) | 03:54 |
*** sshnaidm|off is now known as sshnaidm | 03:54 | |
*** lukebrowning has joined #openstack-infra | 03:55 | |
*** gouthamr has joined #openstack-infra | 03:55 | |
*** lnxnut has quit IRC | 03:55 | |
clarkb | SamYaple: I think that job may be running tox in requirements repo and not nova | 03:56 |
clarkb | sshnaidm: that package isnt in our mirror index. Where are your jobs finding it? is it a constraint? (but likely means our bandersnatch isnt updating properly for some reason and will have to be debugged) | 03:58 |
openstackgerrit | Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 03:58 |
*** lukebrowning has quit IRC | 03:59 | |
*** dhajare has quit IRC | 04:04 | |
*** lukebrowning has joined #openstack-infra | 04:11 | |
*** lukebrowning_ has joined #openstack-infra | 04:14 | |
SamYaple | clarkb: i am unsure how to fix that... ill start digging into the legacy job i suppose | 04:14 |
*** esberglu has joined #openstack-infra | 04:14 | |
*** lukebrowning has quit IRC | 04:15 | |
sshnaidm | clarkb, it happens when zuul starts to install ara, it's even before job code runs, it's infra step | 04:17 |
SamYaple | clarkb: https://github.com/openstack-infra/openstack-zuul-jobs/blob/master/playbooks/legacy/cross-nova-func/run.yaml#L70 so this should run in the nova src_dir correct? | 04:17 |
*** esberglu has quit IRC | 04:19 | |
*** jaosorior has joined #openstack-infra | 04:19 | |
*** lukebrowning_ has quit IRC | 04:19 | |
clarkb | SamYaple: ya, but workspace root will be for the repo change is made against I think | 04:19 |
*** lukebrowning has joined #openstack-infra | 04:19 | |
clarkb | sshnaidm: hrm, I wonder why ara works for us elsewhere | 04:20 |
SamYaple | so ive got to basically check that it is nova and run it in workspace root, otherwise run it in the cloned nova dir? | 04:20 |
SamYaple | wait if this is just the cross-nova-func, then it shouldnt run in nova at all | 04:22 |
SamYaple | and nova should always be in the cloned dir.. right | 04:22 |
clarkb | I dont think so as the job runs against requirements changes only | 04:24 |
clarkb | I think that is what confuses it | 04:24 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: test CI job https://review.openstack.org/508660 | 04:24 |
*** lukebrowning has quit IRC | 04:24 | |
clarkb | I think you put nova in required projects then just cd into the nova repo dir | 04:24 |
*** yamamoto has joined #openstack-infra | 04:25 | |
SamYaple | nova is in required-projects http://logs.openstack.org/91/508791/1/check/legacy-cross-nova-func/f469b5d/job-output.txt.gz#_2017-10-02_02_55_22_768865 | 04:26 |
SamYaple | so i just need to update the chdir | 04:26 |
*** hongbin_ has joined #openstack-infra | 04:26 | |
clarkb | ya | 04:27 |
*** yamamoto has quit IRC | 04:29 | |
*** lukebrowning has joined #openstack-infra | 04:30 | |
*** hongbin has quit IRC | 04:30 | |
*** gouthamr has quit IRC | 04:30 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove obsolete releasenotes jobs https://review.openstack.org/508709 | 04:32 |
SamYaple | is there a variable preset with /home/zuul/src/git.openstack.org/ ? or do i just need to put in /home/zuul/src/git.openstack.org/openstack/nova | 04:33 |
openstackgerrit | Sam Yaple proposed openstack-infra/openstack-zuul-jobs master: Move to the appropriate directory before tox https://review.openstack.org/508803 | 04:34 |
*** lukebrowning has quit IRC | 04:34 | |
*** lukebrowning_ has joined #openstack-infra | 04:34 | |
*** yamahata has joined #openstack-infra | 04:35 | |
*** ykarel has joined #openstack-infra | 04:36 | |
clarkb | I'm not sure would have to look | 04:37 |
clarkb | and its time to call it a day. Will check back in the morning | 04:37 |
SamYaple | ok | 04:38 |
SamYaple | thanks | 04:38 |
*** lukebrowning_ has quit IRC | 04:39 | |
*** lukebrowning has joined #openstack-infra | 04:40 | |
*** markvoelker has joined #openstack-infra | 04:43 | |
*** lukebrowning has quit IRC | 04:45 | |
SamYaple | clarkb: that was not the issue. its definitely running in the correct directory.... something else is going on and im afraid its beyond me | 04:46 |
SamYaple | im not familiar with the tests | 04:46 |
*** hongbin_ has quit IRC | 04:46 | |
*** lukebrowning has joined #openstack-infra | 04:47 | |
*** lukebrowning has quit IRC | 04:51 | |
*** lukebrowning has joined #openstack-infra | 04:53 | |
*** lukebrowning has quit IRC | 04:58 | |
*** bobh has quit IRC | 04:58 | |
*** lukebrowning has joined #openstack-infra | 04:59 | |
AJaeger | SamYaple: see also https://review.openstack.org/508783 | 04:59 |
*** bobh has joined #openstack-infra | 04:59 | |
openstackgerrit | Sam Yaple proposed openstack-infra/openstack-zuul-jobs master: Move to the appropriate directory before tox https://review.openstack.org/508803 | 05:02 |
*** numans has quit IRC | 05:03 | |
*** lukebrowning has quit IRC | 05:04 | |
SamYaple | AJaeger: ill give it a go | 05:04 |
*** lukebrowning has joined #openstack-infra | 05:06 | |
*** numans has joined #openstack-infra | 05:06 | |
*** jascott1 has quit IRC | 05:07 | |
*** jascott1 has joined #openstack-infra | 05:08 | |
AJaeger | ianw, yolanda, jhesketh, could you put https://review.openstack.org/#/c/508598/ https://review.openstack.org/508706 https://review.openstack.org/#/c/508764/ on your review queue, please? | 05:08 |
AJaeger | SamYaple: sorry, can't help with your change, hope others will review later | 05:09 |
SamYaple | AJaeger: not a problem i get it. its really late here for me, im going to sleep soon | 05:09 |
SamYaple | thanks! | 05:09 |
*** lukebrowning has quit IRC | 05:10 | |
AJaeger | SamYaple: early morning here ;) Good night! | 05:10 |
*** lukebrowning has joined #openstack-infra | 05:12 | |
AJaeger | jlk: the current translation jobs are broken, you have two changes up (https://review.openstack.org/#/c/502207/ and https://review.openstack.org/#/c/502208/), should we get these in and migrate to your new ones instead of debugging the legacy ones? | 05:13 |
*** edmondsw has joined #openstack-infra | 05:13 | |
* AJaeger will rebase 502208 now | 05:14 | |
*** lukebrowning_ has joined #openstack-infra | 05:16 | |
*** lukebrowning has quit IRC | 05:16 | |
*** markvoelker has quit IRC | 05:17 | |
*** edmondsw has quit IRC | 05:18 | |
*** logan- has quit IRC | 05:18 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/project-config master: Add translation jobs https://review.openstack.org/502208 | 05:19 |
*** lukebrowning_ has quit IRC | 05:20 | |
*** logan- has joined #openstack-infra | 05:21 | |
openstackgerrit | yatin proposed openstack-infra/project-config master: Remove legacy magnum jobs from pipeline https://review.openstack.org/508804 | 05:21 |
*** lnxnut has joined #openstack-infra | 05:22 | |
*** kiennt26 has quit IRC | 05:26 | |
*** yamamoto has joined #openstack-infra | 05:26 | |
sshnaidm | clarkb, should this change resolve the hardlinks problem? https://review.openstack.org/#/c/508772/ | 05:29 |
jeblair | infra-root: i've started some zuulv3 memory analysis which is taking a lot of cpu time, and is likely to affect performance while it runs. it'd be nice if we can leave it running for at least a few hours, but if it hasn't finished by the time things get busy later today and is causing errors, feel free to restart zuul-scheduler. | 05:29 |
*** nunchuck has quit IRC | 05:31 | |
*** deduped has quit IRC | 05:31 | |
*** yamamoto has quit IRC | 05:31 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Use sub_nodes_private https://review.openstack.org/508752 | 05:32 |
*** bobh has quit IRC | 05:32 | |
*** bobh has joined #openstack-infra | 05:34 | |
*** bobh has quit IRC | 05:34 | |
AJaeger | pabelanger, mordred, https://review.openstack.org/#/q/project:openstack-infra/zuul-jobs+status:open shows a couple of changes from both of you that are in merge conflict - which ones do we need to move forward and which can be abandoned? I hope there's nothing we really need that just needs updating... | 05:40 |
*** thorst has joined #openstack-infra | 05:41 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Increase ansible internal_poll_interval https://review.openstack.org/508805 | 05:42 |
*** ethfci has joined #openstack-infra | 05:44 | |
ethfci | fungi thanks for fixing legacy-horizon-selenium-headless requirements for Zuul v3 | 05:47 |
*** logan- has quit IRC | 05:48 | |
*** lnxnut has quit IRC | 05:50 | |
frickler | jeblair: how long would you expect zuul to take until it starts processing stuff again? | 05:50 |
*** dizquierdo has joined #openstack-infra | 05:50 | |
*** thorst has quit IRC | 05:51 | |
*** thorst has joined #openstack-infra | 05:51 | |
AJaeger | frickler: it should still process - just slower | 05:52 |
*** thorst has quit IRC | 05:52 | |
AJaeger | frickler: and looking at queue length: It is processing... | 05:53 |
*** logan- has joined #openstack-infra | 05:53 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 05:53 |
frickler | AJaeger: hmm, indeed, seems I need to be more patient with snail zuul | 05:57 |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 06:01 |
AJaeger | frickler: it's fast compared to yesterday evening ;) | 06:02 |
*** esberglu has joined #openstack-infra | 06:02 | |
*** bobh has joined #openstack-infra | 06:05 | |
*** esberglu has quit IRC | 06:07 | |
*** bobh has quit IRC | 06:09 | |
openstackgerrit | garyk proposed openstack-infra/project-config master: Add missing projects to vmware-nsxlib https://review.openstack.org/508779 | 06:11 |
frickler | has anyone seen "cannot create hard link" chain of errors when cloning repos? http://logs.openstack.org/81/508781/1/check/legacy-neutron-dynamic-routing-dsvm-functional/fed6569/job-output.txt.gz#_2017-10-02_06_07_32_299327 | 06:13 |
openstackgerrit | Yuval Brik proposed openstack-infra/openstack-zuul-jobs master: Remove karborclient LIBS_FROM_GIT from Karbor gate https://review.openstack.org/508807 | 06:13 |
*** markvoelker has joined #openstack-infra | 06:14 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Require requirements prj for legacy-requirements https://review.openstack.org/508460 | 06:14 |
*** lukebrowning has joined #openstack-infra | 06:17 | |
*** kzaitsev1pi has quit IRC | 06:20 | |
*** kzaitsev_pi has joined #openstack-infra | 06:22 | |
*** jtomasek has joined #openstack-infra | 06:24 | |
*** eumel8 has joined #openstack-infra | 06:24 | |
*** ekcs has quit IRC | 06:25 | |
ykarel | frickler, this should fix hardlink: https://review.openstack.org/#/c/508772/ | 06:27 |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/project-config master: Add neutron to required-projects for neutron-dynamic-routing https://review.openstack.org/508775 | 06:28 |
*** yamamoto has joined #openstack-infra | 06:28 | |
*** lukebrowning has quit IRC | 06:30 | |
frickler | ykarel: thx, added that to my test-patch | 06:32 |
*** yamamoto has quit IRC | 06:33 | |
openstackgerrit | Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 06:34 |
*** stakeda has quit IRC | 06:36 | |
*** shardy has joined #openstack-infra | 06:37 | |
*** lukebrowning has joined #openstack-infra | 06:38 | |
*** lukebrowning has quit IRC | 06:43 | |
*** lukebrowning has joined #openstack-infra | 06:44 | |
*** markvoelker has quit IRC | 06:47 | |
*** ykarel has quit IRC | 06:48 | |
*** lukebrowning has quit IRC | 06:49 | |
*** lukebrowning has joined #openstack-infra | 06:51 | |
openstackgerrit | Yuval Brik proposed openstack-infra/openstack-zuul-jobs master: Remove karborclient LIBS_FROM_GIT from Karbor gate https://review.openstack.org/508807 | 06:51 |
*** lukebrowning has quit IRC | 06:55 | |
sshnaidm | I use the patch for fixing hardlinks (508772), but still half of jobs have errors: Invalid cross-device link | 06:56 |
*** lukebrowning has joined #openstack-infra | 06:57 | |
openstackgerrit | Numan Siddique proposed openstack-infra/project-config master: ovn: Make scenario007-multinode-oooq-container voting https://review.openstack.org/502899 | 06:58 |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 06:58 |
*** jpena has joined #openstack-infra | 06:59 | |
*** edmondsw has joined #openstack-infra | 06:59 | |
*** lukebrowning has quit IRC | 07:01 | |
*** lukebrowning has joined #openstack-infra | 07:03 | |
*** rcernin has joined #openstack-infra | 07:04 | |
*** edmondsw has quit IRC | 07:04 | |
*** lukebrowning has quit IRC | 07:08 | |
*** pcaruana has joined #openstack-infra | 07:08 | |
*** hashar has joined #openstack-infra | 07:09 | |
*** lukebrowning has joined #openstack-infra | 07:10 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 07:14 |
*** lukebrowning has quit IRC | 07:14 | |
*** dizquierdo has quit IRC | 07:15 | |
openstackgerrit | Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 07:16 |
*** lnxnut has joined #openstack-infra | 07:17 | |
frickler | hmm, seems my hard link errors are related to running with "sudo -u stack" rather than different file systems | 07:18 |
*** electrofelix has joined #openstack-infra | 07:18 | |
*** lukebrowning has joined #openstack-infra | 07:19 | |
*** martinkopec has joined #openstack-infra | 07:20 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 07:21 |
*** tesseract has joined #openstack-infra | 07:22 | |
*** lukebrowning has quit IRC | 07:23 | |
*** martinkopec has quit IRC | 07:24 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 07:24 |
*** lukebrowning has joined #openstack-infra | 07:25 | |
*** martinkopec has joined #openstack-infra | 07:26 | |
dmellado | I'm having issues using LIBS_FROM_GIT in my jobs since the zuulv3 migration | 07:26 |
dmellado | i.e. 2017-09-29 08:37:04.664 | + inc/python:check_libs_from_git:404 : die 404 'The following LIBS_FROM_GIT were not installed correct: python-octaviaclient diskimage-builder' | 07:26 |
*** yamamoto has joined #openstack-infra | 07:29 | |
*** lukebrowning has quit IRC | 07:30 | |
frickler | dmellado: its broken, see https://review.openstack.org/508344 | 07:30 |
*** tmorin has joined #openstack-infra | 07:31 | |
dmellado | frickler: I see, thanks! | 07:31 |
dmellado | I had a workaround patch which I'll stall for me | 07:31 |
*** lukebrowning has joined #openstack-infra | 07:31 | |
*** yamamoto has quit IRC | 07:36 | |
*** lukebrowning has quit IRC | 07:36 | |
*** lukebrowning has joined #openstack-infra | 07:38 | |
*** lukebrowning has quit IRC | 07:42 | |
*** lin_yang has quit IRC | 07:43 | |
*** egonzalez has joined #openstack-infra | 07:43 | |
*** lukebrowning has joined #openstack-infra | 07:44 | |
*** lnxnut has quit IRC | 07:44 | |
*** markvoelker has joined #openstack-infra | 07:44 | |
*** yamamoto has joined #openstack-infra | 07:45 | |
*** seanhandley has left #openstack-infra | 07:47 | |
*** lukebrowning has quit IRC | 07:48 | |
*** jpich has joined #openstack-infra | 07:50 | |
*** lukebrowning has joined #openstack-infra | 07:50 | |
*** esberglu has joined #openstack-infra | 07:51 | |
*** kiennt26 has joined #openstack-infra | 07:52 | |
*** lukebrowning has quit IRC | 07:54 | |
oanson | Hi. Do Depends-On flags in commit messages work for project-config? e.g. the Depends-On here: https://review.openstack.org/#/c/508761 would make a difference? | 07:54 |
*** esberglu has quit IRC | 07:56 | |
*** slaweq has joined #openstack-infra | 07:56 | |
*** kiennt26 has quit IRC | 07:56 | |
*** lukebrowning has joined #openstack-infra | 07:56 | |
openstackgerrit | Merged openstack-infra/project-config master: Add releasenotes publication job https://review.openstack.org/508764 | 08:00 |
*** slaweq has quit IRC | 08:00 | |
*** lukebrowning has quit IRC | 08:01 | |
tmorin | infra-root: I have a few (non-legacy) jobs (e.g. pep8 in networking-bgpvpn) that fail because zuul-cloner can't find the repo to clone; I understand that I need to add the repo to required-projects and that this happens somewhere in openstack-zuul-jobs/zuul.d . I've found out how I would do this for a legacy job, but not how to do this for a non-legacy generic/templated job such as pep8. Can someone provide guidance? | 08:01 |
*** hashar has quit IRC | 08:02 | |
*** hashar has joined #openstack-infra | 08:02 | |
*** kiennt26 has joined #openstack-infra | 08:02 | |
*** lukebrowning has joined #openstack-infra | 08:03 | |
*** seanhandley has joined #openstack-infra | 08:03 | |
seanhandley | I have a question about Sphinx doc | 08:03 |
*** lucas-pto is now known as lucasagomes | 08:03 | |
seanhandley | I'm building out RST documents at https://github.com/openstack/publiccloud-wg using the openstacktheme. And I can view the build output locally and it looks fine. I'm missing something though - how does that compiled doc get picked up and published on o.o ? | 08:05 |
*** rossella_s has quit IRC | 08:05 | |
*** slaweq has joined #openstack-infra | 08:05 | |
seanhandley | for instance, the API WG has this RST doc defined in their docs repo: https://github.com/openstack/api-wg/blob/master/guidelines/discoverability.rst | 08:05 |
seanhandley | and it appears over at specs.o.o http://specs.openstack.org/openstack/api-wg/guidelines/discoverability.html | 08:06 |
*** Hal has joined #openstack-infra | 08:06 | |
*** bobh has joined #openstack-infra | 08:06 | |
*** Hal is now known as Guest62304 | 08:07 | |
*** lukebrowning has quit IRC | 08:07 | |
*** lukebrowning has joined #openstack-infra | 08:09 | |
sshnaidm | Did something happen to the cirros image? I have an error: No such file or directory: '/opt/stack/cache/files/cirros-0.3.5-x86_64-disk.img' | 08:09 |
*** garyk has joined #openstack-infra | 08:11 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove legacy-requirements-python34 https://review.openstack.org/508598 | 08:11 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove old publish-openstack-python-docs templates https://review.openstack.org/508691 | 08:11 |
*** bobh has quit IRC | 08:11 | |
*** lnxnut has joined #openstack-infra | 08:11 | |
garyk | Is there anyone versed in the required-projects support? I posted https://review.openstack.org/#/c/508779/ and then a patch depending on that, and the same issue was hit. | 08:12 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: look for info https://review.openstack.org/508820 | 08:12 |
frickler | oanson: garyk: I'm seeing the same issue, seems either the dependencies don't work or the fix must be different | 08:12 |
garyk | frickler: thanks! | 08:13 |
*** lukebrowning has quit IRC | 08:13 | |
oanson | garyk, frickler, I see in https://docs.openstack.org/infra/manual/developers.html#limitations-and-caveats that Depends-On isn't supported. I was hoping it was solved in Zuulv3, which is why I asked. | 08:13 |
garyk | hmm. so we are between a rock and a hard place here - wonder how we can test the updated required-projects? | 08:14 |
frickler | oanson: it should be solved indeed, working for other patches. maybe the solution is wrong, let me do a different version | 08:15 |
frickler | tmorin: see https://review.openstack.org/508775 but it doesn't seem to work yet | 08:15 |
*** lnxnut has quit IRC | 08:15 | |
* tmorin looking | 08:16 | |
*** ralonsoh has joined #openstack-infra | 08:16 | |
oanson | frickler, sure. Thanks. | 08:16 |
*** namnh has joined #openstack-infra | 08:16 | |
tmorin | frickler: ok, got the idea, will now try apply... thanks! | 08:17 |
garyk | AJaeger: can you please look at https://review.openstack.org/#/c/508779/ and let us know if there is anything missing? | 08:18 |
*** markvoelker has quit IRC | 08:18 | |
*** lukebrowning has joined #openstack-infra | 08:19 | |
*** dizquierdo has joined #openstack-infra | 08:21 | |
eumel8 | seanhandley: maybe something like this: https://review.openstack.org/#/c/507660/ | 08:21 |
*** lukebrowning has quit IRC | 08:23 | |
*** bauwser is now known as bauzas | 08:25 | |
*** akscram1 has quit IRC | 08:25 | |
seanhandley | thanks eumel8 | 08:25 |
seanhandley | I'll try to make some sense of that | 08:25 |
*** lukebrowning has joined #openstack-infra | 08:26 | |
*** akscram1 has joined #openstack-infra | 08:27 | |
*** lukebrowning has quit IRC | 08:30 | |
*** derekh has joined #openstack-infra | 08:30 | |
*** lukebrowning has joined #openstack-infra | 08:32 | |
*** stakeda has joined #openstack-infra | 08:32 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo https://review.openstack.org/508822 | 08:33 |
frickler | garyk: oanson: tmorin: https://docs.openstack.org/infra/manual/zuulv3.html#installation-of-sibling-requirements makes me think that we may need different job names, so I'm trying this now ^^ | 08:33 |
*** lukebrowning has quit IRC | 08:36 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/project-config master: Add neutron to required-projects for neutron-dynamic-routing https://review.openstack.org/508775 | 08:38 |
*** lukebrowning has joined #openstack-infra | 08:38 | |
openstackgerrit | Zhiyuan Cai proposed openstack-infra/openstack-zuul-jobs master: Fix nodesets for tricircle multi-region job https://review.openstack.org/508824 | 08:40 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Change in to work dir before executing tox https://review.openstack.org/508783 | 08:40 |
garyk | frickler: i do not think that will address our issues as we need to include a number of different neutron projects. In addition to this I am not sure how you can test that this actually addresses the issues in the dynamic-routing project. | 08:40 |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn) https://review.openstack.org/508825 | 08:41 |
garyk | first the project config patch will need to be approved, then you can post a patch in dynamic routing, then rinse and repeat. | 08:41 |
seanhandley | eumel8: The docs team recommends a word with clarkb | 08:41 |
*** lukebrowning has quit IRC | 08:43 | |
tmorin | frickler: can you have a look at https://review.openstack.org/508825 ? (seen https://review.openstack.org/508822 but in n8g-bagpipe and n8g-bgpvpn we need other repos than just neutron) | 08:43 |
*** jascott1 has quit IRC | 08:44 | |
*** jascott1 has joined #openstack-infra | 08:44 | |
*** lukebrowning has joined #openstack-infra | 08:45 | |
eumel8 | seanhandley: yes, then you have to wait for the US wake up time, I think :) | 08:45 |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 08:45 |
frickler | garyk: with zuulv3 depends-on should be working for this kind of tests, it did for a couple of other things. except we might be hitting another zuul bug here | 08:46 |
seanhandley | ok eumel8 - I can wait :) | 08:47 |
*** edmondsw has joined #openstack-infra | 08:48 | |
frickler | tmorin: zuul/layout.yaml is/was for zuul v2, you need to look at stuff in zuul.d now. also I'm still testing stuff for neutron-dynamic-routing, might be easier if you wait a moment until that one is working | 08:49 |
*** jascott1 has quit IRC | 08:49 | |
garyk | frickler: the depends on does not seem to be working. | 08:49 |
*** lukebrowning has quit IRC | 08:49 | |
*** lukebrowning has joined #openstack-infra | 08:51 | |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects https://review.openstack.org/508827 | 08:51 |
*** edmondsw has quit IRC | 08:52 | |
tmorin | frickler: wops, didn't notice the directory was different... | 08:54 |
* aspiers is at the Gerrit User Summit in London. Any other stackers attending? | 08:54 | |
*** lukebrowning has quit IRC | 08:55 | |
tmorin | frickler: will modify so that the work progresses in parallel; but understood: I'll wait for your confirmation that the neutron-dynamic-routing thing is fixed before I start bugging you on the other one :) | 08:55 |
*** yamamoto has quit IRC | 08:56 | |
*** lukebrowning has joined #openstack-infra | 08:57 | |
frickler | garyk: seems you are correct about the dependency not working :( | 08:57 |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects https://review.openstack.org/508827 | 08:58 |
*** bhavik1 has joined #openstack-infra | 09:00 | |
*** lukebrowning has quit IRC | 09:02 | |
*** lukebrowning has joined #openstack-infra | 09:03 | |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn) https://review.openstack.org/508825 | 09:05 |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 09:06 |
*** bobh has joined #openstack-infra | 09:07 | |
*** lukebrowning has quit IRC | 09:08 | |
frickler | AJaeger: yolanda: could you consider speculatively merging https://review.openstack.org/508775 and https://review.openstack.org/508822 ? seems the speculative testing on https://review.openstack.org/508781 isn't working, I'm still seeing the old jobs being run there | 09:08 |
*** lukebrowning has joined #openstack-infra | 09:09 | |
oanson | AJaeger, yolanda: Same for https://review.openstack.org/#/c/508785/ , if possible. | 09:10 |
*** bobh has quit IRC | 09:11 | |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects https://review.openstack.org/508827 | 09:13 |
*** lnxnut has joined #openstack-infra | 09:13 | |
*** witek has joined #openstack-infra | 09:13 | |
*** lukebrowning has quit IRC | 09:14 | |
*** markvoelker has joined #openstack-infra | 09:15 | |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn) https://review.openstack.org/508825 | 09:17 |
*** e0ne has joined #openstack-infra | 09:18 | |
*** lukebrowning has joined #openstack-infra | 09:20 | |
*** panda|bbl is now known as panda | 09:23 | |
*** tosky has joined #openstack-infra | 09:23 | |
*** dizquierdo has quit IRC | 09:24 | |
*** lukebrowning has quit IRC | 09:24 | |
*** lukebrowning has joined #openstack-infra | 09:26 | |
*** sambetts|afk is now known as sambetts | 09:28 | |
openstackgerrit | Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn) https://review.openstack.org/508825 | 09:30 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 09:30 |
*** lukebrowning has quit IRC | 09:31 | |
*** dizquierdo has joined #openstack-infra | 09:31 | |
*** lukebrowning has joined #openstack-infra | 09:33 | |
*** stakeda has quit IRC | 09:35 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 09:36 |
*** lukebrowning has quit IRC | 09:37 | |
*** esberglu has joined #openstack-infra | 09:39 | |
*** lukebrowning has joined #openstack-infra | 09:39 | |
*** lnxnut has quit IRC | 09:39 | |
openstackgerrit | Andrey Kurilin proposed openstack-infra/project-config master: Remove legacy-rally-dsvm-keystone-v2api-rally job https://review.openstack.org/508833 | 09:40 |
*** greghaynes has quit IRC | 09:40 | |
*** esberglu has quit IRC | 09:43 | |
*** lukebrowning has quit IRC | 09:43 | |
*** greghaynes has joined #openstack-infra | 09:45 | |
*** lukebrowning has joined #openstack-infra | 09:45 | |
openstackgerrit | Andrey Kurilin proposed openstack-infra/project-config master: [rally] fix cases when *verify* job should be launched https://review.openstack.org/508834 | 09:46 |
aspiers | AJaeger: who are our main folks looking after Gerrit? | 09:48 |
*** markvoelker has quit IRC | 09:48 | |
openstackgerrit | David Pursehouse proposed openstack-infra/lodgeit master: Removes unnecessary utf-8 encoding https://review.openstack.org/418748 | 09:49 |
*** lukebrowning has quit IRC | 09:50 | |
*** vsaienk0 has joined #openstack-infra | 09:50 | |
*** kiennt26 has quit IRC | 09:51 | |
*** lukebrowning has joined #openstack-infra | 09:51 | |
vsaienk0 | infra team could you please help to figure out why timeout for ironic-grenade job is set to 110 min according to logs http://logs.openstack.org/83/506983/3/check/legacy-grenade-dsvm-ironic/783379c/job-output.txt.gz#_2017-10-02_07_23_28_887978 while it should be 180min according to job definition https://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/zuul-legacy-jobs.yaml?h=refs/heads/master#n2791 | 09:53 |
*** egonzalez has quit IRC | 09:55 | |
tosky | uhm uhm, was the issue related to jobs running on trusty instead of xenial fixed? | 09:56 |
*** lukebrowning has quit IRC | 09:56 | |
*** yamamoto has joined #openstack-infra | 09:56 | |
frickler | tosky: should be fixed, yes | 09:57 |
*** lukebrowning has joined #openstack-infra | 09:58 | |
tosky | frickler: thanks, then I can safely recheck :) | 09:58 |
*** lukebrowning has quit IRC | 10:02 | |
*** yamamoto has quit IRC | 10:02 | |
openstackgerrit | Julie Pichon proposed openstack-infra/openstack-zuul-jobs master: Add required-projects for Cliff legacy tox jobs https://review.openstack.org/508837 | 10:06 |
*** lnxnut has joined #openstack-infra | 10:07 | |
*** bobh has joined #openstack-infra | 10:08 | |
*** shu-mutou is now known as shu-mutou-AWAY | 10:08 | |
*** cshastri has quit IRC | 10:08 | |
*** lukebrowning has joined #openstack-infra | 10:09 | |
*** lnxnut has quit IRC | 10:11 | |
*** bobh has quit IRC | 10:13 | |
*** lukebrowning has quit IRC | 10:13 | |
*** cuongnv has quit IRC | 10:13 | |
vsaienk0 | clarkb: could you please help to figure out why job timeout is not applied to ironic-grenade job ^ | 10:14 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add multinode integration jobs and integration tests for known_hosts https://review.openstack.org/504787 | 10:14 |
*** lukebrowning has joined #openstack-infra | 10:15 | |
frickler | vsaienk0: clarkb: maybe zuul sets BUILD_TIMEOUT but doesn't export it, so it isn't set when "./safe-devstack-vm-gate-wrap.sh" is executed? | 10:17 |
*** dtantsur|afk is now known as dtantsur | 10:18 | |
vsaienk0 | frickler: it might be as according to job BUILD_TIMEOUT was set to 120 min which is the default | 10:18 |
*** lukebrowning has quit IRC | 10:19 | |
frickler | vsaienk0: the zuul/timeout variable seems to be correctly set here http://logs.openstack.org/83/506983/3/check/legacy-grenade-dsvm-ironic/783379c/zuul-info/inventory.yaml | 10:20 |
openstackgerrit | Merged openstack-infra/project-config master: Remove legacy-rpm-packaging-tox-lint https://review.openstack.org/508610 | 10:20 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add integration tests for multi-node-hosts-file https://review.openstack.org/505789 | 10:20 |
vsaienk0 | frickler: but I don't see where BUILD_TIMEOUT is set | 10:21 |
*** ociuhandu has joined #openstack-infra | 10:21 | |
*** lukebrowning has joined #openstack-infra | 10:21 | |
frickler | vsaienk0: in zuul/ansible/filter/zuul_filters.py L30 | 10:22 |
*** namnh has quit IRC | 10:23 | |
openstackgerrit | Merged openstack-infra/project-config master: Remove legacy shade jobs from os-client-config https://review.openstack.org/508773 | 10:25 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Fix Rally jobs: Add dib-utils to required projects https://review.openstack.org/508799 | 10:25 |
*** lukebrowning has quit IRC | 10:25 | |
*** lukebrowning has joined #openstack-infra | 10:27 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Really only copy logs dir for chef-rake-integration https://review.openstack.org/508841 | 10:29 |
*** jkilpatr has quit IRC | 10:29 | |
*** lukebrowning has quit IRC | 10:32 | |
*** lukebrowning has joined #openstack-infra | 10:33 | |
*** psachin has quit IRC | 10:34 | |
yolanda | AJaeger, question... look at https://review.openstack.org/#/c/506138/ . See that legacy-bifrost-integration-tinyipa-opensuse-423 is being triggered. But according to the zuul.d/projects.yaml it should not even be triggered: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5111 | 10:35 |
yolanda | are we missing something? | 10:35 |
*** edmondsw has joined #openstack-infra | 10:36 | |
*** lukebrowning has quit IRC | 10:38 | |
*** bhavik1 has quit IRC | 10:39 | |
*** lukebrowning has joined #openstack-infra | 10:40 | |
*** edmondsw has quit IRC | 10:40 | |
*** vsaienk0 has quit IRC | 10:44 | |
*** lukebrowning has quit IRC | 10:44 | |
*** vsaienk0 has joined #openstack-infra | 10:45 | |
*** markvoelker has joined #openstack-infra | 10:45 | |
*** jascott1 has joined #openstack-infra | 10:46 | |
*** lukebrowning has joined #openstack-infra | 10:46 | |
*** lukebrowning has quit IRC | 10:50 | |
*** pbourke has quit IRC | 10:50 | |
*** adisky has quit IRC | 10:51 | |
*** lukebrowning has joined #openstack-infra | 10:52 | |
*** pbourke has joined #openstack-infra | 10:52 | |
*** slaweq has quit IRC | 10:54 | |
Shrews | yolanda: that job is also listed in 'check' | 10:54 |
*** slaweq has joined #openstack-infra | 10:54 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 10:55 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Remove rpm-packaging-tox-lint legacy job https://review.openstack.org/508609 | 10:57 |
*** lukebrowning has quit IRC | 10:57 | |
yolanda | Shrews, but also with stable/newton|ocata there | 10:58 |
yolanda | to block from running for those branches | 10:58 |
yolanda | what do i miss' | 10:58 |
yolanda | ? | 10:58 |
*** lukebrowning has joined #openstack-infra | 10:58 | |
*** yamamoto has joined #openstack-infra | 10:58 | |
*** slaweq has quit IRC | 10:59 | |
Shrews | yolanda: oh yeah, i missed the branch. looks like a bug on branch matching. jeblair: mordred: ^^^ | 11:01 |
AJaeger | yolanda: looking at http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5138 - that's the check definition - something is indeed wrong | 11:01 |
*** slaweq has joined #openstack-infra | 11:01 | |
AJaeger | Yeah, as Shrews said | 11:01 |
*** nicolasbock_ has joined #openstack-infra | 11:01 | |
yolanda | ok, shall i report a bug ? or just the mention here is ok? | 11:02 |
AJaeger | yolanda: best to ask jeblair later what to do... | 11:02 |
AJaeger | yolanda: https://review.openstack.org/#/c/508689 and https://review.openstack.org/#/c/508697 are ready for review if time permits | 11:02 |
openstackgerrit | Andrey Kurilin proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-rally-dsvm-keystone-v2api-rally definition https://review.openstack.org/508850 | 11:03 |
*** lukebrowning has quit IRC | 11:03 | |
*** yamamoto has quit IRC | 11:04 | |
openstackgerrit | Frank Kloeker proposed openstack-infra/puppet-zanata master: Preparation for Zanata 4 version https://review.openstack.org/506795 | 11:04 |
*** sdague has joined #openstack-infra | 11:05 | |
*** lukebrowning has joined #openstack-infra | 11:05 | |
*** slaweq has quit IRC | 11:05 | |
*** jkilpatr has joined #openstack-infra | 11:07 | |
*** slaweq has joined #openstack-infra | 11:07 | |
*** dave-mccowan has joined #openstack-infra | 11:07 | |
*** lnxnut has joined #openstack-infra | 11:08 | |
*** bobh has joined #openstack-infra | 11:09 | |
*** lukebrowning has quit IRC | 11:09 | |
*** nicolasbock_ has quit IRC | 11:10 | |
*** bcafarel has joined #openstack-infra | 11:11 | |
*** slaweq has quit IRC | 11:11 | |
*** edmondsw has joined #openstack-infra | 11:13 | |
andreykurilin | AJaeger: hi! should we enable new-jobs in the local repo or it should be still done in project-config? | 11:13 |
*** bobh has quit IRC | 11:13 | |
openstackgerrit | Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs https://review.openstack.org/508851 | 11:13 |
*** edmondsw has quit IRC | 11:14 | |
*** lukebrowning has joined #openstack-infra | 11:15 | |
*** yamamoto has joined #openstack-infra | 11:17 | |
*** yamamoto has quit IRC | 11:18 | |
*** markvoelker has quit IRC | 11:19 | |
*** lucasagomes is now known as lucas-hungry | 11:19 | |
*** alexchadin has joined #openstack-infra | 11:19 | |
*** lukebrowning has quit IRC | 11:20 | |
garyk | AJaeger: is there a timeout for the unit tests? ours take about 40 minutes and we are getting a timeout at 35 minutes. Any way of tweaking this? | 11:20 |
AJaeger | andreykurilin: I suggest first consolidating the existing jobs before tackling new ones ;) The infra manual explains how to do that - once you moved them over to your repo, new jobs should go to your repo... | 11:20 |
AJaeger | garyk: you should be able to override this - similar to how some changes are in the queue for adding required-repositories | 11:21 |
garyk | AJaeger: do you have an example by chance? | 11:21 |
*** lukebrowning has joined #openstack-infra | 11:21 | |
*** bobh has joined #openstack-infra | 11:22 | |
*** bobh has quit IRC | 11:23 | |
*** nicolasbock_ has joined #openstack-infra | 11:23 | |
AJaeger | garyk: similar to what you do for vmware-nsxlib, just add timeout. | 11:23 |
AJaeger | garyk: this is new for all of us ;) | 11:23 |
AJaeger | garyk: so, don't have a working example for this specific case | 11:23 |
garyk | AJaeger: gracias! | 11:24 |
*** egonzalez has joined #openstack-infra | 11:24 | |
*** mrunge has quit IRC | 11:24 | |
*** mrunge has joined #openstack-infra | 11:24 | |
*** dhajare has joined #openstack-infra | 11:25 | |
frickler | garyk: do you have an example of a job run with a timeout? there seem to be issues that configured timeouts sometimes do not get applied correctly, see my conversation with vsaienk0 earlier | 11:26 |
garyk | frickler: https://review.openstack.org/508840 and https://review.openstack.org/508809 | 11:26 |
openstackgerrit | Merged openstack-infra/project-config master: Remove old publish-openstack-python-docs jobs https://review.openstack.org/508689 | 11:26 |
*** lukebrowning has quit IRC | 11:26 | |
*** yamamoto has joined #openstack-infra | 11:28 | |
*** lukebrowning has joined #openstack-infra | 11:28 | |
*** bobh has joined #openstack-infra | 11:28 | |
* AJaeger runs some errands, will be back later | 11:28 | |
*** kjackal_ has joined #openstack-infra | 11:29 | |
openstackgerrit | Merged openstack-infra/project-config master: Fix typo in comment https://review.openstack.org/508697 | 11:32 |
*** lukebrowning has quit IRC | 11:32 | |
openstackgerrit | Andrey Kurilin proposed openstack-infra/openstack-zuul-jobs master: Fix typo in descr of build-openstack-sphinx-docs https://review.openstack.org/508854 | 11:32 |
sshnaidm | cat /etc/nodepool/sub_nodes_private - 15.184.65.108, it doesn't seem like a private address.. | 11:32 |
*** ociuhandu has quit IRC | 11:33 | |
*** ociuhandu has joined #openstack-infra | 11:33 | |
*** lukebrowning has joined #openstack-infra | 11:34 | |
*** ijw has joined #openstack-infra | 11:34 | |
*** lnxnut has quit IRC | 11:36 | |
openstackgerrit | Javier Peña proposed openstack-infra/openstack-zuul-jobs master: Remove legacy Packstack jobs https://review.openstack.org/508855 | 11:38 |
*** lukebrowning has quit IRC | 11:38 | |
eumel8 | what does that mean: ERROR! /home/zuul/src/git.openstack.org/openstack-infra/project-config not found | 11:38 |
*** ijw has quit IRC | 11:39 | |
eumel8 | in legacy-openstackci-beaker-ubuntu-trusty job | 11:39 |
*** jamesdenton has quit IRC | 11:39 | |
*** lukebrowning has joined #openstack-infra | 11:40 | |
openstackgerrit | Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Really only copy logs dir for chef-rake-integration https://review.openstack.org/508841 | 11:41 |
*** yamamoto has quit IRC | 11:43 | |
frickler | eumel8: it means that you need to add the missing project to your job definition, see https://review.openstack.org/508799 for an example | 11:44 |
*** lukebrowning has quit IRC | 11:45 | |
*** yamamoto has joined #openstack-infra | 11:45 | |
*** lukebrowning has joined #openstack-infra | 11:46 | |
eumel8 | frickler: so the legacy jobs will also run in the future? or is there a plan to migrate this to normal jobs? | 11:47 |
eumel8 | I'm affected since this morning in https://review.openstack.org/#/c/506795/ | 11:49 |
openstackgerrit | Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs https://review.openstack.org/508851 | 11:50 |
openstackgerrit | Javier Peña proposed openstack-infra/openstack-zuul-jobs master: Remove legacy Packstack jobs https://review.openstack.org/508855 | 11:51 |
*** lukebrowning has quit IRC | 11:51 | |
*** mat128 has joined #openstack-infra | 11:52 | |
*** slaweq has joined #openstack-infra | 11:52 | |
frickler | eumel8: you may want to read https://docs.openstack.org/infra/manual/zuulv3.html for the big picture | 11:53 |
*** lukebrowning has joined #openstack-infra | 11:53 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/openstack-zuul-jobs master: Add neutron to dragonflow requiered-project https://review.openstack.org/508856 | 11:55 |
*** lukebrowning has quit IRC | 11:57 | |
*** jamesdenton has joined #openstack-infra | 11:58 | |
*** dprince has joined #openstack-infra | 11:59 | |
eumel8 | frickler: thx for the link. First I'm on a puppet module change for I18n and now I have a Zuul-v3 migration at the neck :) | 11:59 |
*** lukebrowning has joined #openstack-infra | 11:59 | |
*** pblaho has joined #openstack-infra | 11:59 | |
*** yamamoto has quit IRC | 11:59 | |
*** tpsilva has joined #openstack-infra | 12:00 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 12:01 |
*** ldnunes has joined #openstack-infra | 12:02 | |
*** yamamoto has joined #openstack-infra | 12:02 | |
*** tmorin has quit IRC | 12:02 | |
*** jamesdenton has quit IRC | 12:02 | |
*** lnxnut has joined #openstack-infra | 12:03 | |
*** lukebrowning has quit IRC | 12:04 | |
leyal | AJaeger - is that the right place to insert the required project of a specific job - https://review.openstack.org/#/c/508856/1 ? | 12:04 |
*** lukebrowning has joined #openstack-infra | 12:05 | |
*** alexchadin has quit IRC | 12:05 | |
*** jamesdenton has joined #openstack-infra | 12:06 | |
*** markvoelker has joined #openstack-infra | 12:06 | |
*** alexchadin has joined #openstack-infra | 12:06 | |
openstackgerrit | Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Add missing repo to legacy-openstackci-beaker jobs https://review.openstack.org/508857 | 12:08 |
*** sshnaidm is now known as sshnaidm|afk | 12:08 | |
eumel8 | frickler: maybe it helps ^^ | 12:08 |
*** edmondsw has joined #openstack-infra | 12:09 | |
*** lukebrowning has quit IRC | 12:10 | |
*** lucas-hungry is now known as lucasagomes | 12:10 | |
*** lukebrowning has joined #openstack-infra | 12:11 | |
*** trown|outtypewww is now known as trown | 12:14 | |
*** lukebrowning has quit IRC | 12:16 | |
*** claudiub has joined #openstack-infra | 12:17 | |
*** sshnaidm|afk is now known as sshnaidm | 12:20 | |
*** mkostrzewa has joined #openstack-infra | 12:21 | |
*** lnxnut has quit IRC | 12:21 | |
*** lukebrowning has joined #openstack-infra | 12:21 | |
*** tmorin has joined #openstack-infra | 12:22 | |
mkostrzewa | Hi guys! I've heard that there is some work going on for Nodepool Windows support? | 12:23 |
openstackgerrit | Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Add missing repo to legacy-openstackci-beaker jobs https://review.openstack.org/508857 | 12:24 |
Shrews | infra-root: / on nodepool.o.o is full again | 12:24 |
*** aviau has quit IRC | 12:24 | |
*** aviau has joined #openstack-infra | 12:24 | |
*** jpena is now known as jpena|lunch | 12:24 | |
openstackgerrit | Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow https://review.openstack.org/508785 | 12:25 |
*** lukebrowning has quit IRC | 12:26 | |
*** yamamoto has quit IRC | 12:26 | |
fungi | Shrews: happen to remember what the command was to clean up the excess zk snapshots? | 12:27 |
Shrews | fungi: https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_administering | 12:27 |
Shrews | fungi: i'm not sure about the values we should use though | 12:27 |
Shrews | reading now | 12:27 |
tobiash | mkostrzewa: yes, https://review.openstack.org/#/q/status:open++topic:windows-support | 12:28 |
fungi | i think we determined that we only need to keep at most a couple since the data is unimportant/ephemeral | 12:28 |
*** lukebrowning has joined #openstack-infra | 12:28 | |
tobiash | mkostrzewa: not complete yet, configuring client keys on the executor is missing (but hello world worked already with hard coded keys hacked in) | 12:29 |
Shrews | fungi: yeah. anything more than 2 and less than the current 12000 is probably good | 12:29 |
Shrews | fungi: /var/lib/zookeeper/version-2$ ls -l snap* | wc -l | 12:30 |
Shrews | 12004 | 12:30 |
Shrews | fungi: i'll let you have the honors :) | 12:30 |
fungi | i'm still pulling up the documentation | 12:30 |
*** jaypipes has joined #openstack-infra | 12:30 | |
Shrews | i'm not even sure where the zk jar file lives | 12:31 |
*** lukebrowning has quit IRC | 12:32 | |
Shrews | fungi: there is a /usr/share/zookeeper/bin/zkCleanup.sh, but i'm not sure what that does | 12:32 |
* Shrews googles | 12:33 | |
Shrews | fungi: https://community.hortonworks.com/content/supportkb/48795/how-to-purge-old-zookeeper-directory-files.html | 12:33 |
*** lukebrowning has joined #openstack-infra | 12:34 | |
Shrews | that script does use the PurgeTxnLog command | 12:35 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 12:35 |
Shrews | i wonder if snapshot dir and log dir are the same (given that it seems those files exist in the same directory) | 12:35 |
*** ijw has joined #openstack-infra | 12:36 | |
*** rtjure has quit IRC | 12:36 | |
*** bobh has quit IRC | 12:36 | |
fungi | maybe. the paths to the jarfiles seem to be present in the command line for the zk daemon as it appears in the ps output | 12:36 |
Shrews | # the directory where the snapshot is stored. | 12:36 |
Shrews | dataDir=/var/lib/zookeeper | 12:36 |
Shrews | # Place the dataLogDir to a separate physical disc for better performance | 12:37 |
Shrews | # dataLogDir=/disk2/zookeeper | 12:37 |
Shrews | from /etc/zookeeper/conf/zoo.cfg | 12:37 |
Shrews | ^^^ | 12:37 |
fungi | i guess i need to sub in the path to the log4j jar and conffile from the example config as well | 12:37 |
Shrews | so maybe: zkCleanup.sh /var/lib/zookeeper /var/lib/zookeeper 3 | 12:37 |
Shrews | fungi: i think the .sh script takes care of it all for you | 12:38 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 12:38 |
*** lukebrowning has quit IRC | 12:38 | |
*** rtjure has joined #openstack-infra | 12:38 | |
*** rlandy has joined #openstack-infra | 12:39 | |
Shrews | fungi: it uses zkEnv.sh to grab the proper paths, it seems | 12:39 |
fungi | Shrews: oh, i'll pull up that url next | 12:39 |
fungi | i'm not actually at my computer so this is taking way too long. just a sec i'll go downstairs | 12:40 |
*** lukebrowning has joined #openstack-infra | 12:40 | |
*** ijw has quit IRC | 12:40 | |
andreaf | mordred, jeblair, clarkb: I guess this may be due to remote not being defined in zuulv3 cloned repos? I think this specific issue is not on the known issues list? http://logs.openstack.org/32/504232/4/gate/legacy-requirements/b740a6a/job-output.txt.gz#_2017-10-02_10_46_14_454791 but it probably affects more than tempest? | 12:40 |
mnaser | afaik there is no origin with how zuulv3 clones things | 12:42 |
*** hemna_ has joined #openstack-infra | 12:42 | |
andreaf | mnaser: yeah that's my understanding too | 12:43 |
*** hemna_ has quit IRC | 12:44 | |
fungi | #status log ran `sudo -u zookeeper ./zkCleanup.sh /var/lib/zookeeper 3` in /usr/share/zookeeper/bin on nodepool.openstack.org to free up 22gib of space for its / filesystem | 12:44 |
openstackstatus | fungi: finished logging | 12:45 |
fungi | infra-root: ^ | 12:45 |
*** lukebrowning has quit IRC | 12:45 | |
*** kgiusti has joined #openstack-infra | 12:45 | |
Shrews | 44% usage. now to see if the nodepool launchers recover | 12:45 |
*** dhajare has quit IRC | 12:46 | |
*** lukebrowning has joined #openstack-infra | 12:46 | |
fungi | now we only have 3 snapshot files in /var/lib/zookeeper/version-2 | 12:47 |
fungi | which is what the outcome was supposed to be, so i think that worked | 12:47 |
Shrews | yeah, but the launchers don't seem to be recovering. may have to manually restart them | 12:48 |
*** mat128 has quit IRC | 12:48 | |
mkostrzewa | tobiash: thx for the link; I can see you use Ansible for Windows? Were you thinking about using DSC anywhere? | 12:49 |
tobiash | mkostrzewa: dsc? | 12:50 |
tobiash | mkostrzewa: yes, ansible works for windows | 12:50 |
Shrews | #status log Restarted nodepool-launcher on nl01 and nl02 to fix zookeeper connection | 12:50 |
openstackstatus | Shrews: finished logging | 12:50 |
mkostrzewa | Desired State Configuration, Microsoft's Ansible equivalent | 12:50 |
*** lukebrowning has quit IRC | 12:51 | |
tobiash | mkostrzewa: well, my home is not windows ;) | 12:51 |
Shrews | fungi: it seems to be a bug in the launchers where the suspended ZK connection never revives. i'll have to work on debugging that. | 12:51 |
Shrews | infra-root: fyi ^^^ | 12:52 |
mkostrzewa | tobiash: sure. I was wondering because we had lots of little issues with Ansible for Windows, and the module support just wasn't there... | 12:52 |
tobiash | mkostrzewa: zuul3 is based entirely on ansible so dsc would probably not be supported (other than running ansible to kick dsc) | 12:52 |
*** lukebrowning has joined #openstack-infra | 12:53 | |
Shrews | fungi: i think we need that cleanup command added to nodepool.o.o cron | 12:53 |
mkostrzewa | tobiash: do you install any Windows Features / Roles via Ansible? | 12:53 |
tobiash | mkostrzewa: no, I just did a hello world job so far to prove that it is possible to integrate windows in zuulv3 | 12:55 |
*** rhallisey has joined #openstack-infra | 12:55 | |
pabelanger | morning! | 12:55 |
pabelanger | ready to get to work | 12:55 |
pabelanger | what should I be looking at? | 12:55 |
fungi | Shrews: the documentation suggested it could be added to the zk config to have it automagically clean up old snapshots when it creates new ones | 12:56 |
*** jcoufal has joined #openstack-infra | 12:56 | |
fungi | include the following in zoo.cfg and restart the zookeeper servers to automatically purge older files: | 12:57 |
fungi | autopurge.snapRetainCount=3 [number of logs to retain, in this example 3] | 12:57 |
fungi | autopurge.purgeInterval= 168 [in hours, in this example 1 week] | 12:57 |
Shrews | fungi: whatever works! :) | 12:57 |
fungi | https://community.hortonworks.com/content/supportkb/48795/how-to-purge-old-zookeeper-directory-files.html | 12:57 |
*** lukebrowning has quit IRC | 12:57 | |
fungi | anybody who's less confused by ansible who can take a guess as to what this job did? http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/ | 12:58 |
pabelanger | looking | 12:58 |
*** alexchadin has quit IRC | 12:58 | |
fungi | i can't even tell whether it succeeded or failed | 12:59 |
sshnaidm | is there any reason not to see the patch https://review.openstack.org/#/c/508660/ in the zuul status page? http://zuulv3.openstack.org/ | 12:59 |
*** lukebrowning has joined #openstack-infra | 12:59 | |
*** links has quit IRC | 12:59 | |
sshnaidm | pabelanger, please take a look at my mail in your time | 12:59 |
pabelanger | fungi: that job didn't have a run playbook, so only did pre-run and post-run. Why, is another question. Looking into job layout now | 12:59 |
fungi | sshnaidm: "Queue lengths: 532 events, 202 results" | 12:59 |
mkostrzewa | tobiash: OK.. we had lots of little issues like Powershell printing something to stderr would cause Ansible to stop. Stuff like that... | 13:00 |
pabelanger | sshnaidm: yup, will take me a bit to catch up | 13:00 |
mkostrzewa | tobiash: thx for help anyways | 13:00 |
sshnaidm | fungi, so it's just a queue? | 13:00 |
*** tosky_ has joined #openstack-infra | 13:01 | |
fungi | sshnaidm: if you just pushed the change in the past few minutes, zuul probably hasn't processed that trigger event yet | 13:01 |
*** tosky has quit IRC | 13:01 | |
*** tosky_ is now known as tosky | 13:01 | |
sshnaidm | fungi, ok, will wait then, thanks | 13:01 |
*** thorst has joined #openstack-infra | 13:02 | |
fungi | zuul v3 is a lot slower processing events at this scale than v2 was, so there's still some debugging and profiling going on to attempt to speed it up | 13:02 |
fungi | looks like the next reconfiguration event will likely push the server back over into swap space again | 13:03 |
*** mriedem has joined #openstack-infra | 13:03 | |
*** scottda_ has joined #openstack-infra | 13:03 | |
*** lukebrowning has quit IRC | 13:03 | |
*** hemna_ has joined #openstack-infra | 13:04 | |
frickler | fungi: jeblair commented earlier about memory debugging and a possible restart being needed | 13:04 |
fungi | yep, i read all the scrollback | 13:04 |
*** lukebrowning has joined #openstack-infra | 13:05 | |
fungi | though i'm still not entirely awake yet. need to get back to morning routine stuff and pour some coffee | 13:05 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config master: Fix typo in tag-release pre playbook https://review.openstack.org/508871 | 13:06 |
*** tmorin has quit IRC | 13:06 | |
pabelanger | fungi: there was an error in the log you posted, but it didn't bubble up to logs properly. If you look on ze08.o.o, you'll see ^ is the reason for the failure. | 13:06 |
fungi | pabelanger: thanks! | 13:07 |
*** lbragstad has joined #openstack-infra | 13:07 | |
Shrews | fungi: oh, you know what? i think maybe i just didn't give the np launchers enough time. zuul uses almost the exact same zk code and it only just now re-established the zk connection. | 13:07 |
andreaf | it looks like this http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/scripts/project-requirements-change.py#n205 will never work with a zuul cloned repo - I suppose we should change that to checking out local branches? | 13:08 |
fungi | ahh | 13:08 |
fungi | Shrews: so maybe it's okay in that case? | 13:08 |
Shrews | fungi: i think so. i'm going to poke around and see if maybe zuul uses different timeout settings (i think these were changed recently) | 13:09 |
Shrews | for zk, that is | 13:09 |
*** lukebrowning has quit IRC | 13:09 | |
pabelanger | andreaf: ya, all of the jenkins/script are going to need to be converted into ansible playbooks / roles. So, it is possible things won't work, like you found, in zuulv3. | 13:09 |
*** esberglu has joined #openstack-infra | 13:09 | |
*** mkostrzewa has quit IRC | 13:09 | |
fungi | andreaf: ahh, the `git checkout remotes/origin/...` definitely won't work, correct | 13:10 |
mnaser | is it possible zuul might need a restart as well re: that zk issue? | 13:10 |
pabelanger | andreaf: I'm not sure if we have discussed a plan recently, but what I was doing was move the bash script into an ansible role, then start refactoring it from bash to ansible over a series of commits | 13:10 |
mnaser | nothing added to queue in the past hour, 0 cpu usage | 13:10 |
*** tmorin has joined #openstack-infra | 13:10 | |
dmsimard | mnaser: yeah Shrews found a disk full issue again | 13:10 |
andreaf | fungi, pabelanger: before we ansiblelize that, can we just drop the remote/origin bit out of it? I'm happy to make a patch for that | 13:11 |
fungi | andreaf: but those branches should be pushed to the server for you with the target branch already checked out, so you should be able to just omit that entirely | 13:11 |
*** erlon has joined #openstack-infra | 13:11 | |
pabelanger | mnaser: I know jeblair is collecting memory stats, which affects performance | 13:11 |
*** bobh_ has joined #openstack-infra | 13:11 | |
Shrews | mnaser: zuul should be doing things now | 13:11 |
pabelanger | until jeblair comes online, I don't think we should restart anything on zuulv3.o.o | 13:11 |
*** lukebrowning has joined #openstack-infra | 13:11 | |
mnaser | dmsimard i think nodepool was restarted to reconnect but zuul (maybe?) was not | 13:12 |
Shrews | zuul only came back to life 5 minutes ago | 13:12 |
mnaser | cause cacti shows the server pretty much idle | 13:12 |
mnaser | ah okay | 13:12 |
mnaser | maybe that doesnt show it yet then | 13:12 |
*** isaacb has joined #openstack-infra | 13:13 | |
andreaf | fungi: heh ok - I was not sure if I could assume that | 13:14 |
andreaf | fungi: I guess not having the remote set is by design :) | 13:14 |
*** yamamoto has joined #openstack-infra | 13:15 | |
tosky | so... should we wait for zuul to notice the jobs that were not triggered so far, or we should wait a bit and retrigger? | 13:15 |
*** lukebrowning has quit IRC | 13:15 | |
fungi | andreaf: yes, the repositories are no longer pulled on the nodes, they're pushed to the nodes, so there is no git remote url involved | 13:15 |
pabelanger | tosky: best to wait for a bit | 13:16 |
tosky | pabelanger: ack | 13:16 |
*** thorst has quit IRC | 13:17 | |
fungi | andreaf: and dependency changes are merged in sequence to their respective target branches in the pushed copy of the repo, with the target branch for the change which triggered the job checked out in the worktree, so in most cases you should just be able to cd into it and use it, or explicitly checkout a local branch name if you need to work on a different branch than the one which triggered it | 13:17 |
*** alexchadin has joined #openstack-infra | 13:18 | |
*** lnxnut has joined #openstack-infra | 13:19 | |
pabelanger | infra-root: I'm going to see why graphite.o.o is no longer collecting statsd info | 13:19 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/project-config master: Do not checked from remote branch https://review.openstack.org/508875 | 13:20 |
andreaf | fungi: ^^^ | 13:20 |
andreaf | fungi: as far as I understood the logic there we need to temp switch to branch (from HEAD) so I think using the local branch is the right approach there | 13:21 |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Add zuulv3.o.o to graphite.o.o https://review.openstack.org/508876 | 13:21 |
*** yamamoto has quit IRC | 13:21 | |
andreaf | fungi: I guess that fix will require a new nodepool image if/when it merges? | 13:21 |
*** lukebrowning has joined #openstack-infra | 13:22 | |
pabelanger | infra-root: okay, I reloaded firewall on graphite.o.o, that has done something. I've also noticed zuulv3.o.o missing^ | 13:22 |
*** jcoufal_ has joined #openstack-infra | 13:22 | |
*** jcoufal has quit IRC | 13:24 | |
*** Guest62304 has quit IRC | 13:24 | |
*** jpena|lunch is now known as jpena | 13:24 | |
fungi | andreaf: oh, yes that will certainly be a slow one to iterate on if we continue trying to fix up that legacy job while it's relying on scripts baked into the images | 13:24 |
*** eharney has joined #openstack-infra | 13:25 | |
*** Hal has joined #openstack-infra | 13:25 | |
*** Hal is now known as Guest10594 | 13:26 | |
*** lukebrowning has quit IRC | 13:26 | |
andreaf | fungi: you mean we should have ansible role that contains that script and runs it, and migrate the legacy-requirements job to a new zuulv3 native job that uses it? | 13:28 |
andreaf | fungi: that sounds like some work too :) | 13:28 |
fungi | andreaf: i think that's what was done elsewhere, but i don't know for sure | 13:28 |
*** lukebrowning has joined #openstack-infra | 13:28 | |
*** thorst has joined #openstack-infra | 13:29 | |
*** lukebrowning has quit IRC | 13:32 | |
*** baoli has joined #openstack-infra | 13:32 | |
*** martinkopec has quit IRC | 13:32 | |
*** jcoufal has joined #openstack-infra | 13:32 | |
*** jcoufal_ has quit IRC | 13:33 | |
*** baoli_ has joined #openstack-infra | 13:33 | |
esberglu | Anyone know how the readthedocs pages are generated? I think we have something misconfigured for nova-powervm. The stable branches docs aren't showing up since newton | 13:33 |
esberglu | https://readthedocs.org/projects/nova-powervm/versions/ | 13:33 |
mriedem | can someone link me to the reviewday rendered hosted page? i always seem to lose it - unless it doesn't exist anymore, and i'm thinking of the bug smash page | 13:34 |
*** lukebrowning has joined #openstack-infra | 13:34 | |
*** Goneri has joined #openstack-infra | 13:34 | |
*** alex_xu has quit IRC | 13:35 | |
*** ijw has joined #openstack-infra | 13:36 | |
*** baoli has quit IRC | 13:37 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 13:37 |
fungi | mriedem: http://status.openstack.org/reviews/ | 13:38 |
edmondsw | esberglu they are showing up, but inactive... | 13:38 |
*** baoli_ has quit IRC | 13:38 | |
*** ihrachys has joined #openstack-infra | 13:38 | |
*** alex_xu has joined #openstack-infra | 13:38 | |
edmondsw | does anyone know if there's a way to mark them active automatically, or does that have to be done manually? | 13:38 |
*** Anticimex has quit IRC | 13:40 | |
mriedem | fungi: thanks, but i thought we had something that looked like this, but for reviews http://status.openstack.org/bugday/ | 13:40 |
*** ijw has quit IRC | 13:41 | |
*** Anticimex has joined #openstack-infra | 13:41 | |
fungi | mriedem: not that i remember, though i think sdague maybe did some custom pages for something along those lines at one point? i'm really not sure | 13:42 |
mriedem | ok np | 13:43 |
fungi | like maybe some nova-specific burndown charts for certain sprint-like events | 13:43 |
sdague | fungi: yeh, that was a bit different | 13:43 |
mriedem | we've had those burndown pages for some other things | 13:43 |
sdague | mriedem: what kind of view do you want? | 13:43 |
openstackgerrit | Anastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs https://review.openstack.org/508880 | 13:44 |
mriedem | i'll see if https://github.com/openstack-infra/reviewday.git generates it | 13:44 |
*** rloo has joined #openstack-infra | 13:44 | |
*** lukebrowning has quit IRC | 13:45 | |
*** thorst has quit IRC | 13:45 | |
*** lnxnut has quit IRC | 13:45 | |
*** sileht has joined #openstack-infra | 13:46 | |
sdague | that's a big rails app | 13:47 |
sshnaidm | fungi, maybe it's problem with zuul status page, but it seems like something is stuck there.. 492224,3 patch finished but appears there for an hour, I don't see any new patches arrive there | 13:47 |
sdague | oh, maybe it changed up | 13:47 |
mordred | andreaf, fungi: yes - we should definitely shift that script to being in a new non-legacy role so that it's not baked in to images anymore | 13:47 |
edmondsw | another issue with readthedocs is that http://nova-powervm.readthedocs.io redirects to latest, so how is someone supposed to determine what other branches are available? | 13:47 |
*** thorst has joined #openstack-infra | 13:47 | |
*** thorst has quit IRC | 13:48 | |
openstackgerrit | Vasyl Saienko proposed openstack-infra/openstack-zuul-jobs master: Set BUILD_TIMEOUT for ironic grenade jobs https://review.openstack.org/508882 | 13:48 |
dmsimard | I'm a bit confused. Is there a directory on the *node* that is uploaded to the executor by default before logs are uploaded to the logserver ? Say, "{{ ansible_user_dir }}/logs" for example ? I'm trying to find if there's one but I can't ? | 13:48 |
*** thorst has joined #openstack-infra | 13:48 | |
dmsimard | It seems what ends up being uploaded to logs is explicitly uploaded | 13:48 |
*** wolverineav has joined #openstack-infra | 13:49 | |
dmsimard | s/explicitly uploaded/explicitly uploaded to the executor/ | 13:49 |
mordred | dmsimard: there is a directory on the *executor* that is automatically uploaded to logs.o.o | 13:49 |
*** lukebrowning has joined #openstack-infra | 13:49 | |
mordred | dmsimard: things are (currently) explicitly placed in the logs dir on the executor | 13:49 |
dmsimard | mordred: right, but nothing is uploaded to the executor by default ? | 13:49 |
*** yamamoto has joined #openstack-infra | 13:50 | |
*** yamamoto_ has joined #openstack-infra | 13:50 | |
*** garyk has quit IRC | 13:52 | |
mordred | dmsimard: that is correct | 13:52 |
mordred | dmsimard: we've had a few discussions about changing that - but haven't had the brainspace yet to do so | 13:52 |
andreaf | mordred: does anything exist yet for the requirements job in zuulv3 native format? Is there a WIP on that? | 13:53 |
*** bnemec has joined #openstack-infra | 13:53 | |
dmsimard | mordred: right, I was thinking about that just now. I guess "feature parity" with v2 includes uploading "{{ ansible_user_dir }}/logs" by default but anyway it's not really important for the time being. | 13:53 |
*** lukebrowning has quit IRC | 13:53 | |
dmsimard | AJaeger: fyi ^ this was me validating the comment from https://review.openstack.org/#/c/508434/15/playbooks/upload-logs.yaml | 13:54 |
*** yamamoto has quit IRC | 13:54 | |
pabelanger | mordred: dmsimard: Ya, I think originally I didn't like the idea of auto upload back to executor, but after PTG, I might have reconsidered it. Would make writing jobs a little easier | 13:54 |
openstackgerrit | Alfredo Moralejo proposed openstack-infra/tripleo-ci master: Use infra proxy server for trunk.r.o in delorean-deps https://review.openstack.org/508884 | 13:54 |
*** lukebrowning has joined #openstack-infra | 13:55 | |
mordred | pabelanger: yah - I agree with you on both sides of that - I liked not doing it originally, but now I think I've become more of a fan of doing it | 13:56 |
pabelanger | ++ | 13:56 |
dmsimard | mordred: adding it to the base job makes it easy and convenient IMO, the only thing jobs have to do is put their logs there and they'll be picked up.. instead of offloading the logic of uploading their things first | 13:56 |
*** kgiusti has left #openstack-infra | 13:57 | |
mordred | andreaf: no - I don't think there is - but I can help put one together real quick if you like | 13:58 |
pabelanger | The other way, which I also like, is to create a whitelist of the logs you want, and the post-run role, will go out and fetch them. This way, it forces job authors to be specific about logs, and not glob everything | 13:59 |
andreaf | mordred: sure that would be great | 13:59 |
*** thorst has quit IRC | 13:59 | |
*** lukebrowning has quit IRC | 13:59 | |
*** mat128 has joined #openstack-infra | 14:01 | |
*** thorst has joined #openstack-infra | 14:01 | |
Jeffrey4l | how to handle the current blocked jobs? | 14:01 |
*** lukebrowning has joined #openstack-infra | 14:01 | |
dmsimard | pabelanger: I don't think there's much of a difference between putting the whitelist logic inside the job role parameter or inside some other log collection bash whatever | 14:02 |
vsaienk0 | frickler: clarkb could you please check https://review.openstack.org/#/c/508882 this should help with applying correct timeout for ironic jobs | 14:02 |
dmsimard | pabelanger: for example, devstack and other projects already have their log collection scripts which is pretty much a whitelist :p | 14:02 |
*** martinkopec has joined #openstack-infra | 14:03 | |
*** thorst has quit IRC | 14:04 | |
pabelanger | dmsimard: right, clarkb is a fan of whitelist approach, since we have limited storage on logs.o.o | 14:04 |
*** thorst has joined #openstack-infra | 14:04 | |
*** efried has joined #openstack-infra | 14:05 | |
dmsimard | infra-root: is there anyone actively working on keeping zuul/nodepool healthy right now? Doesn't seem clear from recent backlog | 14:05 |
pabelanger | dmsimard: not until jeblair comes online | 14:06 |
*** lukebrowning has quit IRC | 14:06 | |
pabelanger | for now, I'm just in pending mode | 14:06 |
dmsimard | ok, probably worth a #status notice or something, there's people asking all over the place | 14:06 |
mnaser | i think it's probably just stuck because it lost connectivity to zk (rather than stuck leaking memory) | 14:07 |
mnaser | according to cacti graphs | 14:07 |
*** hongbin has joined #openstack-infra | 14:08 | |
*** lukebrowning has joined #openstack-infra | 14:08 | |
pabelanger | Yah, we likely should ask people for some patience this week until we get everything optimized | 14:08 |
mnaser | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=557&page=2 idle with no cpu usage, in tune with the losing zookeeper connectivity | 14:08 |
mnaser | nodepool was restarted | 14:08 |
mnaser | but yeah i guess we can wait | 14:09 |
dansmith | it just flushed I think | 14:09 |
dansmith | although things I had in the backlog didn't seem to get queued | 14:09 |
pabelanger | yah, debug log on zuul is now doing merging | 14:09 |
pabelanger | let me see if I can find out why | 14:10 |
mnaser | oh it did | 14:10 |
mnaser | of course it does | 14:10 |
*** lukebrowning has quit IRC | 14:12 | |
*** srobert has joined #openstack-infra | 14:13 | |
*** vsaienk0 has quit IRC | 14:13 | |
dmsimard | pabelanger: fyi I removed -W on https://review.openstack.org/#/c/505233/ -- looked at logstash and we have occurrences. | 14:14 |
*** lukebrowning has joined #openstack-infra | 14:14 | |
dmsimard | ^ I'm seeing ansible sudo timeouts, it's an elastic-recheck query to track it | 14:14 |
dmsimard | For example http://logs.openstack.org/71/323971/59/check/openstack-tox-py27/9ad5128/job-output.txt#_2017-10-01_23_58_01_693309 | 14:15 |
pabelanger | I don't see anything specific in debug log for zuul. | 14:15 |
pabelanger | Seems after: 2017-10-02 14:00:54,509 DEBUG zuul.IndependentPipelineManager: Build <Build 1235b8f2fd404df483bad805885a651d of legacy-grenade-dsvm-neutron-multinode-live-migration on <Worker ze02.openstack.org>> completed | 14:15 |
pabelanger | things started moving again | 14:15 |
pabelanger | node requests being processed | 14:15 |
pabelanger | etc | 14:15 |
mordred | andreaf: remote: https://review.openstack.org/508891 Add requirements-check job | 14:18 |
tosky | https://review.openstack.org/#/c/508847/ (sahara-tests) is both in check and check-tripleo queues, is it expected? | 14:18 |
mordred | andreaf: (that includes your change, fwiw) | 14:18 |
*** lukebrowning has quit IRC | 14:18 | |
dmsimard | tosky: if you look at the check tripleo change, you'll see that there are no actual jobs | 14:18 |
dmsimard | tosky: that is zuul evaluating if there are any jobs to run in that pipeline | 14:19 |
*** kiennt26 has joined #openstack-infra | 14:19 | |
dmsimard | tosky: usually it'd be fast enough that you wouldn't notice it but there are performance issues right now. | 14:19 |
tosky | dmsimard: oh, oki | 14:19 |
tosky | thanks | 14:19 |
tosky | it's still chewing the delayed events | 14:20 |
pabelanger | http://grafana.openstack.org/dashboard/db/nodepool | 14:21 |
pabelanger | nodepool-launchers working again | 14:21 |
dmsimard | pabelanger: btw for that dashboard, https://review.openstack.org/#/c/508349/ | 14:21 |
pabelanger | mordred: mind adding https://review.openstack.org/508876/ to your review pipeline | 14:21 |
pabelanger | dmsimard: cool | 14:22 |
*** lukebrowning has joined #openstack-infra | 14:22 | |
jeblair | 13:53 < dmsimard> mordred: right, I was thinking about that just now. I guess "feature parity" with v2 includes uploading "{{ ansible_user_dir }}/logs" by default but anyway it's not really important for the time being. | 14:23 |
jeblair | dmsimard: that's not how jenkins worked at all | 14:23 |
pabelanger | dmsimard: left question | 14:23 |
pabelanger | dmsimard: nevermind, reading commit message now | 14:23 |
jeblair | dmsimard: jobs have always had to explicitly save logs | 14:24 |
dmsimard | jeblair: pretty sure all I have to do in a v2 job is to put the stuff I want collected in $WORKSPACE/logs and the scp log publisher took care of rsync'ing it to the logserver | 14:24 |
andreaf | mordred: cool, thanks | 14:24 |
*** eharney has quit IRC | 14:24 | |
jeblair | dmsimard: if you were using devstack-gate, that's because it had an scp publisher set up for $WORKSPACE/logs. | 14:24 |
jeblair | dmsimard: in other words, the jjb config said "copy $WORKSPACE/logs to the log server" | 14:25 |
dmsimard | jeblair: fair, I guess I took that publisher as default/granted | 14:25 |
dmsimard | as in, a vast proportion of jobs used that publisher, not just devstack/devstack-gate | 14:25 |
mordred | andreaf: and a follow up, remote: https://review.openstack.org/508894 Stop using zuul-cloner in project-requirements-change | 14:26 |
*** eharney has joined #openstack-infra | 14:26 | |
*** lukebrowning has quit IRC | 14:27 | |
jeblair | pabelanger: my query that was taking a long time is finished; feel free to restart zuulv3 if needed (i thought i said to do that anyway) | 14:27 |
dmsimard | jeblair: it looks like things have started properly dequeuing once again for the time being | 14:27 |
andreaf | mordred: thanks. The original job passed ZUUL_BRANCH to the script, any reason for not passing it anymore? | 14:28 |
dmsimard | pabelanger: so you want to keep zuul launchers for the time being ? I think two of your comments are contradicting :/ | 14:28 |
*** lukebrowning has joined #openstack-infra | 14:29 | |
*** rbrndt has joined #openstack-infra | 14:29 | |
dmsimard | pabelanger: ok, reading them in a chronological order makes sense :P | 14:29 |
pabelanger | jeblair: okay, I wanted to wait until you were online to check the state of zuul. But, we seem to be processing again. Possible it recovered? | 14:29 |
pabelanger | dmsimard: ya, sorry. 2 reviews | 14:29 |
jeblair | pabelanger: no idea, i've been asleep | 14:30 |
mordred | andreaf: yah - I missed that in patch one - did it in patch 2 | 14:30 |
pabelanger | I need to relocate to the library, will be back in a few minutes | 14:30 |
mnaser | https://review.openstack.org/#/c/508763 -- can i borrow a +W here? to help fix releasenote jobs and move projects to the new non legacy job | 14:31 |
jeblair | fungi: you said the next reconfiguration event will push the server into swap -- have we determined that reconfiguration events cause memory increase? | 14:31 |
jeblair | fungi: i thought i looked over the weekend and could not make that correlation, but if someone has, that would be great | 14:31 |
mordred | andreaf: although - we could go further with this and rework the script to not do any checking out - since zuul should be setting up the branches already how that script expects / desires | 14:31 |
jeblair | at the moment, i have no idea why memory use jumps. if anyone sees any correlations, please let me know. :) | 14:32 |
openstackgerrit | David Moreau Simard proposed openstack-infra/project-config master: Update Nodepool graphite metric names https://review.openstack.org/508349 | 14:32 |
mordred | jeblair: I'm not sure we're 100% satisfied it is a correlation - but it seemed like we were seeing memory jumps during reconfigurations | 14:32 |
jeblair | mordred: which kind of reconfiguration? full reconfig, tenant reconfig, or creating a dynamic layout? | 14:32 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Increase ansible internal_poll_interval https://review.openstack.org/508805 | 14:32 |
mordred | jeblair: but given the weekend nature of the observations, I would not stand by them strongly | 14:33 |
mordred | jeblair: one sec ... | 14:33 |
mnaser | i think we saw hangs during dynamic layout reconfig (and i assume the high volume of zuul.yaml changes contributes to the hangs) | 14:33 |
*** lukebrowning has quit IRC | 14:33 | |
mnaser | it was stuck during phase 1 for a little while according to logs | 14:33 |
jeblair | mnaser: that for sure -- it takes about 1 minute to make a dynamic config | 14:33 |
andreaf | mordred: yeah - so if I want to use this now I need to setup a .zuul.yaml job in tempest, and whatever is there will be combined with the jobs from layout.yaml, right? | 14:34 |
*** lukebrowning has joined #openstack-infra | 14:35 | |
mordred | jeblair: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-01.log.html#t2017-10-01T16:52:44 is where clarkb was talking about what he was seeing | 14:35 |
jeblair | mordred, clarkb: thanks, i'll dig into that timeframe | 14:37 |
mordred | andreaf: that's right - you can make that tempest patch a depends-on on that patch - and that could be a good way to verify it works properly - once we're happy with that patch and land it, we can update the check-requirements project-template in openstack-zuul-jobs | 14:37 |
mordred | jeblair: I remember noticing a dynamic config stuck in phase 1 (and before phase 1) for quite some time that was actually processing a shade patch | 14:38 |
andreaf | mordred: ok I'll do that | 14:39 |
mordred | jeblair: I think if you search for 500365,27 in the logs you should be able to see that | 14:39 |
mordred | andreaf: alternately ... | 14:39 |
*** lukebrowning has quit IRC | 14:39 | |
andreaf | mordred? | 14:39 |
mordred | andreaf: you could make a patch to openstack-zuul-jobs updating the project template and then just make some other patch in a different repo that depends on that... project-template changes themselves are speculative | 14:39 |
mordred | andreaf: (instead of making a patch to tempest adding it to the check pipeline there) | 14:39 |
pabelanger | o/ | 14:40 |
mordred | andreaf: in fact - why don't I make the o-z-j patch real quick, since we'll need it anyway | 14:40 |
andreaf | mordred: ok yeah that makes more sense | 14:40 |
andreaf | mordred: sure even easier for me :) | 14:40 |
*** lukebrowning has joined #openstack-infra | 14:41 | |
andreaf | jeblair, mordred: on the tempest native job, are you happy to move on with this series in d-g https://review.openstack.org/#/q/topic:run_tempest+(status:open+OR+status:merged)? I can move the relevant roles to zuul-jobs as a follow-up | 14:41 |
jeblair | andreaf: fyi https://review.openstack.org/504259 | 14:43 |
mordred | andreaf: sorry - I haven't looked at those much yet - I think roles/process-test-results probably wants to move into zuul-jobs and be merged with roles/fetch-stestr-output and roles/fetch-testr-output | 14:43 |
*** ericyoung has joined #openstack-infra | 14:43 | |
jeblair | andreaf: if you have time to take over 504259, that would be great | 14:44 |
mordred | andreaf: (like, I think one role that finds testr / stestr results and processes them as appropriate would be great) | 14:44 |
*** dizquierdo has quit IRC | 14:44 | |
mordred | andreaf: the other stuff should, I believe, actually move into the tempest repo - since it's about running tempest and whatnot | 14:44 |
mordred | jeblair: (does that sound right to you?) | 14:44 |
jeblair | mordred: yeah -- work on this started before the cutover so... :) i still have a todo to move the devstack job into devstack as well | 14:45 |
*** lukebrowning has quit IRC | 14:46 | |
mordred | jeblair: oh - we should do that before I land the shade patch that consumes it | 14:46 |
mordred | jeblair: since I think the shade patch is the only consumer of that atm | 14:46 |
*** tesseract has quit IRC | 14:46 | |
mordred | jeblair: I'll put moving it on my todo-list for today | 14:46 |
jeblair | mordred: i thought you were already consuming it | 14:46 |
jeblair | mordred: thanks | 14:46 |
mordred | jeblair: nah - haven't landed the patch yet | 14:46 |
jeblair | andreaf: i'm still surprised devstack files directory needs perm changes :) | 14:47 |
*** lukebrowning has joined #openstack-infra | 14:47 | |
*** ccamacho has left #openstack-infra | 14:48 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Replace legacy-requirements with requirements-check https://review.openstack.org/508898 | 14:48 |
jeblair | dmsimard, pabelanger, mordred: see the save-file role in 506835 for another component of the log conversation | 14:48 |
mordred | andreaf: ^^ that updates the template - so a depends-on from any project that should trigger that job should do the right thing | 14:48 |
*** wolverineav has quit IRC | 14:49 | |
dmsimard | andreaf: include_role works | 14:49 |
dmsimard | andreaf: if you don't use it with a conditional and static: no | 14:49 |
fungi | jeblair: i had an anecdotal correlation of memory surging right after a .zuul.yaml change merged, and there hadn't been any obvious job-related changes merging for nearly an hour prior to that; not a particularly strong correlation: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-01.log.html#t2017-10-01T17:37:16 | 14:49 |
andreaf | jeblair: yeah I can rebase https://review.openstack.org/#/c/504259/ | 14:49 |
mordred | jeblair, pabelanger, andreaf: agree - I think we should tee up a discussion about that log/artifact collection stuff first thing once we have the current zuul memory/config issue under control | 14:49 |
jeblair | dmsimard, pabelanger, mordred, andreaf: i was also thinking that we could try out setting a job variable on devstack jobs to specify files for save-file. so in the same way you can enable a plugin with a job var, you can also add extra files to save. | 14:50 |
andreaf | dmsimard: cool, so I should be able to use it in a loop, right? did a thing land for that? | 14:50 |
mordred | jeblair: ++ | 14:51 |
dmsimard | andreaf: in a loop ? why would you run an include_role task with items ? | 14:51 |
mordred | dmsimard: to collect a list of files | 14:51 |
openstackgerrit | Anastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs https://review.openstack.org/508880 | 14:51 |
andreaf | dmsimard: include_role: save-file with_items: fileA fileB | 14:51 |
mordred | andreaf: I *think* the suggestion from ansible people would be to make the role handle a list of inputs instead of doing include_role in a loop - so that each of the tasks in save-file has the with-items in it | 14:52 |
dmsimard | andreaf, mordred: I don't like that logic, I would run the role once with a list of files and the iteration occurs inside the role | 14:52 |
mordred | dmsimard: jinx | 14:52 |
dmsimard | you should not be including the role like 500 times | 14:52 |
*** lukebrowning has quit IRC | 14:52 | |
openstackgerrit | Anastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs https://review.openstack.org/508880 | 14:52 |
jeblair | dmsimard: can you find an infra-root that's not me to help you fix up include_role in zuul_json? pabelanger maybe? | 14:52 |
dmsimard | like include_role: name: foo vars: files: - one - two | 14:53 |
dmsimard | (oneline yaml is ew) | 14:53 |
jeblair | dmsimard: (sorry, i think my dance card is full for the next bit) | 14:53 |
dmsimard | jeblair: sure | 14:53 |
*** alexchadin has quit IRC | 14:53 | |
mordred | andreaf: the thing that is counter-intuitive with roles is that they aren't like functions/methods, with one definition and multiple calls - they are actually just fancy include statements or macro expansions - so an include_role in a loop winds up like you've got a copy of every task in the role for each item in the loop | 14:53 |
*** lukebrowning has joined #openstack-infra | 14:53 | |
dmsimard | andreaf, mordred, andreaf: if we're going to have a 'save-file' role, I'd rather have it in zuul-jobs btw | 14:54 |
dmsimard | instead of including andreaf twice, that last nickname should've been jeblair | 14:54 |
jeblair | fungi: thanks -- that helps -- .zuul.yaml changes landing are tenant reconfiguration events. | 14:54 |
mordred | dmsimard: yes - I think save-file and also process-test-results are great candidates for zuul-jobs | 14:54 |
andreaf | yeah that could go in zuul-jobs | 14:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Add project-config to legacy-openstackci-beaker job https://review.openstack.org/508900 | 14:55 |
fungi | jeblair: i was really only able to isolate that enough for it to be remotely suspicious because of the relative infrequency of memory jumps over the weekend coupled with the relative infrequency of changes merging. if there is actually a causal relationship there, i doubt it's one we can easily isolate at our current change frequency | 14:56 |
dmsimard | I'd also name it something like collect-logs or something, but that's bikeshedding :) | 14:56 |
mordred | andreaf: fwiw, in my crazy utopian future world, I would love it if we had a process-test-results role that was smart enough to find the test results from whatever test runner was used, and to transform them to html as appropriate | 14:56 |
andreaf | mordred, dmsimard: so is one role and a loop that uses a task better? I can put a task in the same folder... the thing is that I need to do a sequence of things for each file... | 14:56 |
pabelanger | mordred: fungi: Shrews: https://review.openstack.org/508876/ should allow zuulv3.o.o to post statsd info to graphite.o.o now | 14:56 |
*** wolverineav has joined #openstack-infra | 14:56 | |
mordred | andreaf: so that "process-test-results" could also be called at the end of a go job or a javascript job or a java job and things would just work - that may take a while, but I think handling testr/stestr as appropriate is a great first step | 14:57 |
mordred | pabelanger: +A | 14:57 |
fungi | thanks pabelanger! | 14:57 |
dmsimard | andreaf: what you want to do is fairly simple, right ? put all the logs in one place and then you archive them | 14:57 |
andreaf | dmsimard | 14:58 |
dmsimard | andreaf: so it's like one "aggregate logfiles" task which pulls all the logs to a single location and then another task to archive them | 14:58 |
*** xarses has joined #openstack-infra | 14:58 | |
*** yamahata has quit IRC | 14:58 | |
*** lukebrowning has quit IRC | 14:58 | |
dmsimard | this aggregate logfiles task would be the one with with_items, the archive would be something recursive that doesn't need an iteration | 14:58 |
dmsimard | that's how I would see things working imo | 14:59 |
*** yamahata has joined #openstack-infra | 14:59 | |
andreaf | dmsimard: I need to compress them individually not all in one archive | 14:59 |
*** baoli has joined #openstack-infra | 14:59 | |
andreaf | the sequence is: check that file exists, rename if applicable, compress - and copy | 15:00 |
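The per-file sequence andreaf describes (check that the file exists, rename if applicable, compress, copy) can be sketched in Python. This is only an illustration of the steps, not the save-file role from 506835: the function name `save_files`, its parameters, and the choice of gzip are assumptions. The loop lives inside the helper, in the spirit of running the role once with a list of files.

```python
import gzip
import shutil
from pathlib import Path


def save_files(files, dest_dir, renames=None):
    """Compress each existing file individually into dest_dir.

    Hypothetical helper: `files`, `dest_dir`, and `renames` are
    illustrative names, not part of any Zuul or devstack role.
    """
    renames = renames or {}
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in files:
        src = Path(name)
        # Step 1: skip files that do not exist.
        if not src.is_file():
            continue
        # Step 2: rename if a new name was requested for this file.
        target_name = renames.get(src.name, src.name)
        # Steps 3 and 4: compress the file on its own and write the
        # result straight into the destination directory.
        target = dest / (target_name + ".gz")
        with open(src, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        saved.append(target)
    return saved
```

Each file ends up as its own `.gz` rather than in one combined archive, which matches the requirement stated just above.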
dmsimard | andreaf: https://github.com/openstack-infra/zuul-jobs/blob/master/roles/emit-ara-html/tasks/main.yaml#L18 | 15:00 |
*** lukebrowning has joined #openstack-infra | 15:00 | |
*** wolverineav has quit IRC | 15:00 | |
*** wolverineav has joined #openstack-infra | 15:00 | |
eumel8 | pabelanger: 508900 seems duplicated. I've already invited you to https://review.openstack.org/#/c/508857/ :) | 15:00 |
andreaf | dmsimard: nice I'll try that | 15:01 |
dmsimard | andreaf: renaming files seems overkill, what's the purpose ? | 15:01 |
dmsimard | andreaf: for mimetypes on logs.o.o ? | 15:02 |
dmsimard | jeblair, mordred: seeing a lot of merger errors on zuulv3.o.o | 15:02 |
dmsimard | "merger failure" | 15:02 |
dmsimard | for example 508660 | 15:02 |
dansmith | yeah tons | 15:03 |
pabelanger | eumel8: +3 Thanks! | 15:03 |
eumel8 | :) | 15:03 |
*** lukebrowning has quit IRC | 15:05 | |
*** wolverineav has quit IRC | 15:05 | |
pabelanger | I | 15:05 |
pabelanger | err | 15:05 |
pabelanger | I've noticed glance jobs in gate that are not in the integrated change queue, is that expected? | 15:05 |
pabelanger | 507957 is the change | 15:06 |
*** tesseract has joined #openstack-infra | 15:07 | |
openstackgerrit | Monty Taylor proposed openstack-infra/devstack-gate master: Remove new-style devstack job https://review.openstack.org/508905 | 15:09 |
mordred | andreaf: remote: https://review.openstack.org/508906 Add devstack base job for zuul v3 | 15:09 |
mordred | andreaf: ^^ those two above shift the new-style devstack job to the devstack repo | 15:10 |
*** lukebrowning has joined #openstack-infra | 15:11 | |
pabelanger | jeblair: mordred: I believe zm05 and zm06 are currently stuck, did you want to check first before I asked to restart them? | 15:11 |
openstackgerrit | Merged openstack-infra/project-config master: Fix typo in tag-release pre playbook https://review.openstack.org/508871 | 15:12 |
pabelanger | both seems to be having issues cloning openstack/glance-specs | 15:12 |
jeblair | pabelanger: if it's just a stuck git process, go ahead and kill it. i think we need to add some timeouts. | 15:13 |
pabelanger | jeblair: ack | 15:13 |
mordred | pabelanger: yah- you can just kill the git process itself | 15:13 |
*** lnxnut has joined #openstack-infra | 15:13 | |
mordred | this is, btw, the second time that the thing it's been stuck on is cloning glance-specs | 15:13 |
mordred | so we might want to check to see if glance-specs is broken somehow | 15:13 |
mordred | (and also add some timeouts) | 15:13 |
lucasagomes | hi all, I'm getting an error in the networking-ovn gate jobs saying I explicitly need to add openstack/neutron to "required-projects"... Which repository do I need to submit a change to in order to add it? See: http://logs.openstack.org/31/507031/1/gate/openstack-tox-py27/8511859/tox/py27-1.log | 15:13 |
mordred | lucasagomes: that's a great question! (we should add that to the error message...) | 15:14 |
clarkb | lucasagomes: there is an example change, let me dig it up | 15:14 |
lucasagomes | mordred, ++ that would be very useful | 15:14 |
lucasagomes | clarkb, thanks a lot | 15:14 |
pabelanger | okay, mergers merging again | 15:14 |
mordred | lucasagomes: hrm. ... | 15:14 |
mordred | lucasagomes, clarkb: this is a little weird | 15:15 |
clarkb | lucasagomes: https://review.openstack.org/#/c/508775/ | 15:15 |
mordred | that's an openstack-tox-py27 job - I'm confused why it's trying to zuul-cloner anything | 15:15 |
clarkb | mordred: yes neutron is firmly in the weird location | 15:15 |
mnaser | can i get a very quick/simple review on https://review.openstack.org/#/c/508742 ? just moving release-note-jobs template to use the new job? | 15:15 |
clarkb | mordred: because neutron | 15:15 |
*** lukebrowning has quit IRC | 15:15 | |
clarkb | mordred: basically everything in neutron land deps on neutron | 15:15 |
mordred | oh ... there's a zuul-cloner getting run in the tox itself? | 15:15 |
clarkb | yes | 15:16 |
*** isaacb has quit IRC | 15:16 | |
jeblair | yeah, this is something we should rework with v3 | 15:16 |
jeblair | probably have it use tox-siblings or something | 15:16 |
mordred | yah - agree. for now I think openstack-python-jobs-neutron is good | 15:16 |
mordred | jeblair: ++ | 15:16 |
clarkb | but also stop deping on a server in general | 15:16 |
*** lukebrowning has joined #openstack-infra | 15:17 | |
jeblair | mordred: won't it use tox-siblings if we add neutron to r-p? | 15:17 |
mordred | well - there's a few issues- one is that we don't release servers to pypi | 15:17 |
lucasagomes | mordred, clarkb thanks a lot, I will submit one for networking-ovn then! | 15:17 |
mordred | jeblair: it will - but neutron still has to get installed the first time - so I'm guessing the neutron tarball is in the requirements file? | 15:17 |
* frickler is getting 502s from gerrit again like yesterday | 15:18 | |
clarkb | jeblair: http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_965612 | 15:18 |
mordred | jeblair, clarkb: k - I've read the script in there - I think we should put it on the list of things to do something about for sure | 15:19 |
clarkb | frickler: java melody says memory use did increase and just fell significantly, with no significant garbage collection overhead. I wonder if there is something else going on | 15:19 |
jeblair | clarkb: do you need me to look at something there? | 15:19 |
clarkb | jeblair: looks like the executor ran out of disk | 15:19 |
mordred | but the openstack-python-jobs-neutron should keep us fine for now | 15:19 |
jeblair | clarkb: any other evidence of that? | 15:20 |
clarkb | jeblair: not that I have, just noticed that restarted the gate | 15:20 |
openstackgerrit | Vasyl Saienko proposed openstack-infra/openstack-zuul-jobs master: Set BUILD_TIMEOUT for ironic grenade jobs https://review.openstack.org/508882 | 15:20 |
jeblair | clarkb: i'd appreciate it if you could dig into that | 15:20 |
aspiers | jeblair: I'm currently at the Gerrit User Summit in London. If there are any questions etc. from the OpenStack side I'll do my best to represent them | 15:20 |
jeblair | i've been at work for 1.5 hours and still haven't spent more than 2 minutes looking at the memory leak | 15:20 |
jeblair | in fact | 15:20 |
jeblair | infra-root: can we have a quick conversation in #openstack-infra-incident ? | 15:21 |
clarkb | aspiers: an updated tuning guide would probably be nice since our memory usage seems significantly higher than it used to be | 15:21 |
aspiers | clarkb: OK I'll see what I can find out | 15:21 |
clarkb | aspiers: eg is our gerrit just small and we need to tune for it better or is there something wrong | 15:21 |
*** lukebrowning has quit IRC | 15:21 | |
aspiers | google currently demoing 2.15 (rc0 was cut last night) | 15:21 |
openstackgerrit | Lucas Alvares Gomes proposed openstack-infra/project-config master: Add neutron to required-projects for networking-ovn https://review.openstack.org/508911 | 15:22 |
aspiers | in particular the migration to NoteDB | 15:22 |
mordred | jeblair: yes | 15:22 |
pabelanger | ++ | 15:23 |
aspiers | PolyGerrit looks really nice | 15:23 |
mordred | jeblair: when we're done with that - if you could give me a quick thumbs-up/thumbs-down on whether https://review.openstack.org/#/c/508767/ will do the right thing (that is - will file matchers work on post jobs with the merge commits involved) | 15:23 |
*** lukebrowning has joined #openstack-infra | 15:23 | |
*** rhallisey has quit IRC | 15:23 | |
*** rhallisey has joined #openstack-infra | 15:24 | |
mordred | aspiers: yah -there's several nice things in 2.14 and 2.15 that we're looking forward to making use of | 15:24 |
jeblair | clarkb, Shrews, pabelanger: can you join #openstack-infra-incident please? | 15:24 |
Shrews | jeblair: there already | 15:24 |
*** dizquierdo has joined #openstack-infra | 15:24 | |
aspiers | clarkb, mordred: I talked to Ericsson about their pretty large internal instance. They have an active/passive HA setup. | 15:25 |
mordred | aspiers: one of them is zaro added a thing for us so that we can register test results in a structured fashion so we don't need to have hacky javascript scraping the comment we leave to produce that table of test results | 15:25 |
*** srobert_ has joined #openstack-infra | 15:25 | |
pabelanger | jeblair: yes | 15:25 |
frickler | could the full disk earlier also have triggered this post_failure during log collection? http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_965612 | 15:27 |
chandankumar | AJaeger: regarding this review https://review.openstack.org/#/c/508502/4 in order to import tags also what i need to change ? | 15:27 |
*** pcaruana has quit IRC | 15:27 | |
*** lukebrowning has quit IRC | 15:28 | |
*** srobert has quit IRC | 15:28 | |
*** e0ne has quit IRC | 15:31 | |
*** lukebrowning has joined #openstack-infra | 15:32 | |
*** dave-mccowan has quit IRC | 15:32 | |
*** lnxnut has quit IRC | 15:32 | |
*** dave-mcc_ has joined #openstack-infra | 15:32 | |
*** lukebrowning has quit IRC | 15:35 | |
openstackgerrit | Chandan Kumar proposed openstack-infra/project-config master: Add python-tempestconf project https://review.openstack.org/508502 | 15:35 |
*** lukebrowning has joined #openstack-infra | 15:35 | |
*** lennyb_ has joined #openstack-infra | 15:36 | |
*** lennyb_ has quit IRC | 15:36 | |
*** kiennt26 has quit IRC | 15:36 | |
tosky | can anyone help explaining what's wrong with this migrated job, which fails with "[WARNING]: No hosts matched, nothing to do" on pip installation? http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/job-output.txt.gz | 15:38 |
*** martinkopec has quit IRC | 15:40 | |
*** rcernin has quit IRC | 15:41 | |
mordred | andreaf: you're working on some of the issues - we've spun up https://etherpad.openstack.org/p/zuulv3-issues to track things | 15:42 |
*** dhill_ has quit IRC | 15:43 | |
*** dhill__ has joined #openstack-infra | 15:43 | |
jlvillal | Our Ironic grenade jobs are failing since Zuul v3 changeover. Suspicion is that the BUILD_TIMEOUT variable is not being passed through. Proposed patch for that issue: https://review.openstack.org/#/c/508882/ | 15:46 |
openstackgerrit | Bernard Cafarelli proposed openstack-infra/project-config master: Add neutron to required-projects for networking-sfc https://review.openstack.org/508916 | 15:46 |
*** dbecker has joined #openstack-infra | 15:47 | |
*** jaypipes_ has joined #openstack-infra | 15:47 | |
mnaser | BUILD_TIMEOUT is passed in the env | 15:47 |
AJaeger | chandankumar: tags are imported by default - remove what you do *not* want | 15:47 |
*** apevec has joined #openstack-infra | 15:47 | |
chandankumar | AJaeger: i have cleaned up the things which are not needed | 15:47 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Handle z-c shim copies across filesystems https://review.openstack.org/508772 | 15:47 |
mriedem | fyi, the links in http://status.openstack.org/reviews/ are busted for the new version of gerrit. i can't reproduce getting the list locally, since it looks like reviewday scripts are all written to run server side (they require ssh to the gerrit server i think) | 15:47 |
mriedem | i can hit the gerrit REST API but that doesn't give you the same info as the CLI being used to query the database | 15:48 |
*** jaypipes has quit IRC | 15:48 | |
*** jistr is now known as jistr|off|mtg | 15:48 | |
*** jistr|off|mtg is now known as jistr | 15:49 | |
*** thorst has quit IRC | 15:49 | |
fungi | mriedem: if it has ,n,z on the end of the urls and they may need to be stripped | 15:50 |
fungi | s/and // | 15:50 |
mriedem | no, it's like this: https://review.openstack.org/#change,463987 | 15:50 |
mriedem | i don't know why the reviewday code even formats the url that way, b/c that's not what comes back from the gerrit CLI | 15:51 |
mriedem | i'm just going to remove the splitting | 15:51 |
openstackgerrit | Matt Riedemann proposed openstack-infra/reviewday master: Fix change URL links with latest review.openstack.org gerrit https://review.openstack.org/508919 | 15:52 |
mriedem | ^ but someone with access would have to test it | 15:52 |
*** egonzalez has quit IRC | 15:54 | |
*** gouthamr has joined #openstack-infra | 15:54 | |
AJaeger | chandankumar: please comment - if you haven't done so - in the review | 15:55 |
chandankumar | AJaeger: done | 15:57 |
*** lucasagomes is now known as lucas-afk | 15:57 | |
*** jascott1 has quit IRC | 15:58 | |
*** rpittau_ has joined #openstack-infra | 15:58 | |
*** ijw has joined #openstack-infra | 16:00 | |
*** xyang1 has joined #openstack-infra | 16:01 | |
*** rpittau has quit IRC | 16:01 | |
openstackgerrit | Joe D'Andrea proposed openstack-infra/system-config master: Add openstack-valet to statusbot and meetbot https://review.openstack.org/508924 | 16:02 |
*** scottda_ has quit IRC | 16:03 | |
*** trown is now known as trown|lunch | 16:04 | |
*** caphrim007 has joined #openstack-infra | 16:06 | |
*** chlong has joined #openstack-infra | 16:06 | |
*** vhosakot has joined #openstack-infra | 16:08 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs https://review.openstack.org/508660 | 16:09 |
*** thorst has joined #openstack-infra | 16:10 | |
*** tmorin has quit IRC | 16:12 | |
*** thorst has quit IRC | 16:13 | |
*** finucannot is now known as stephenfin | 16:13 | |
AJaeger | regarding release notes: Could somebody +2A https://review.openstack.org/#/c/508763/ , please? We need to merge that stack ... | 16:15 |
andreaf | mordred: the error message in https://review.openstack.org/#/c/508891 is not so verbose, but I'm guessing it doesn't like the empty project dict | 16:15 |
*** jaypipes_ is now known as jaypipes | 16:15 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config master: Bring ze09.o.o and ze10.o.o online https://review.openstack.org/508929 | 16:16 |
dansmith | do we know what the cause for the "merger_failure" is? like, is it a thing we should recheck for, or is that another fix that needs to be applied before we should expect it to work? | 16:16 |
pabelanger | clarkb: fungi: mordred: ^ bring 2 more zuul-executors online to help with load issues. Booting servers now | 16:17 |
clarkb | dansmith: I think that is probably a new one. Will add to the list at https://etherpad.openstack.org/p/zuulv3-issues | 16:17 |
fungi | thanks pabelanger | 16:17 |
dansmith | clarkb: okay seems to be affecting a large number of jobs to me | 16:17 |
dansmith | clarkb: also seeing retry_limit, dunno what that means | 16:18 |
fungi | dansmith: means that zuul tried to run the job several times but it aborted unnaturally every time so it stopped retrying | 16:18 |
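The retry_limit behaviour fungi describes can be illustrated roughly as follows. This is a sketch only, not Zuul's actual code: the function name, the default of three attempts, and the "ABORTED" result string are assumptions made for illustration.

```python
def run_with_retries(run_once, attempts=3):
    """Run a job until it finishes normally or attempts run out.

    Hypothetical sketch: `run_once` returns a result string, and the
    value "ABORTED" stands for a run that ended abnormally.
    """
    for _ in range(attempts):
        result = run_once()
        if result != "ABORTED":
            # A normal finish (pass or fail) is reported as-is.
            return result
    # Every attempt aborted, so give up and report the limit.
    return "RETRY_LIMIT"
```

Under this reading, a RETRY_LIMIT result says nothing about the change itself, only that no attempt ran to completion.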
pabelanger | dansmith: clarkb: fungi: this is likely a result of the large load times on executors | 16:19 |
pabelanger | I've seen zuul-executor killing ansible-playbook tasks because of 10min timeout for pre / post runs | 16:19 |
AJaeger | For releasenotes, https://review.openstack.org/#/c/508742 and https://review.openstack.org/#/c/508763/ look ready.. | 16:19 |
clarkb | pabelanger: ok, I guess step 1 is roll out load fixes and add executors, then reassess | 16:19 |
fungi | agreed, if we get load down on the executors this instability may subside | 16:19 |
pabelanger | clarkb: yah | 16:20 |
pabelanger | current load on ze01 for example: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all | 16:20 |
dansmith | so don't kill me, genuinely just asking: has rolling back been reconsidered? it seems like we're in a worse spot now than we were on saturday... | 16:21 |
openstackgerrit | Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs https://review.openstack.org/508851 | 16:21 |
clarkb | fungi: any idea why cacti wouldn't be picking up the /var/lib/zuul fs on ze0X nodes? | 16:21 |
fungi | dansmith: worse spot than saturday (because saturday was low load) but not worse than friday when we contemplated rolling back | 16:21 |
dansmith | fungi: okay | 16:21 |
openstackgerrit | Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs https://review.openstack.org/508851 | 16:22 |
fungi | clarkb: were those filesystems added since the last time snmpd restarted? | 16:22 |
clarkb | fungi: I've restarted snmpd about 15 minutes ago at this point. | 16:22 |
fungi | k | 16:22 |
clarkb | fungi: and poll interval is 5 minutes iirc | 16:22 |
clarkb | it's possible that systemctl restart snmpd isn't doing what I expect. I should double check the process timestamps | 16:22 |
fungi | snmp poll interval is, but i'm not sure how often cacti rechecks to determine what new counters to poll | 16:22 |
pabelanger | Yah, I think once we get load under control on zuul-executors, we'll start seeing more consistent job runs | 16:22 |
mordred | andreaf: oh - ha - I really suck there | 16:23 |
andreaf | mordred: also https://review.openstack.org/#/c/508906/1 has an error coming back from zuul while parsing .zuul.yaml | 16:23 |
fungi | dansmith: right now i think we have a handle on what problems are platform-related (which we're prioritizing) and which ones are job migration related (which we're also considering important but can more easily source fixes/reviews from a broader segment of the community for) | 16:24 |
mordred | andreaf: ah- yes on the other one - one sec | 16:24 |
openstackgerrit | Logan V proposed openstack-infra/project-config master: Remove duplicate definition of OSA integrated AIO job https://review.openstack.org/508931 | 16:24 |
dansmith | fungi: okay, it seemed like on friday we'd have one or two jobs fail per patch and now it's a lot, which is why it feels like a bit of a backslide. But, I know you have a better view than me, so I was just wondering | 16:25 |
clarkb | fungi: ya snmpd processes did actually restart, I guess next step is checking how often cacti refreshes its oid listings | 16:25 |
mordred | andreaf: thanks | 16:25 |
clarkb | dansmith: fungi I think on friday things were so completely broken that it helped self manage load | 16:26 |
clarkb | dansmith: fungi then over the weekend we fixed a bunch of things which is allowing the system to run itself over right now | 16:26 |
dansmith | clarkb: oh that's... good? :) | 16:27 |
fungi | "things seem worse because they're getting better" ;) | 16:27 |
pabelanger | indeed | 16:27 |
pabelanger | we are running a lot of ansible-playbook tasks now | 16:27 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Limit concurrency in zuul-executor under load https://review.openstack.org/508649 | 16:28 |
mordred | pabelanger: awesome ^^ we should roll that out now I think | 16:29 |
SpamapS | what's load like now? | 16:29 |
tosky | ehm, sorry for asking again in a short time: I see a failure in a converted job which I can't decode ("[WARNING]: No hosts matched, nothing to do" when installing with pip) | 16:29 |
tosky | what could it be? http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/job-output.txt.gz | 16:29 |
inc0 | dmsimard: man Ara is awesome. | 16:29 |
SpamapS | also did we merge the thing that makes ansible less-UI-responsive-but-also-less-CPU-guzzling ? | 16:29 |
pabelanger | mordred: ah, good idea. Reading up on it now | 16:29 |
*** thorst has joined #openstack-infra | 16:29 | |
*** lnxnut has joined #openstack-infra | 16:30 | |
*** thorst has quit IRC | 16:30 | |
pabelanger | SpamapS: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all is ze01.o.o atm | 16:30 |
pabelanger | ze03.o.o is currently running the fixes from tobiash | 16:31 |
mordred | pabelanger: awesome | 16:31 |
pabelanger | we can start to rotate other executors after puppet runs on the servers | 16:31 |
SpamapS | pabelanger: is ze03 more healthy? | 16:32 |
fungi | tosky: http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/logs/devstack-gate-setup-workspace-new.txt | 16:32 |
SpamapS | that was like, an epic find | 16:32 |
SpamapS | "hey Ansible, take a chill pill, k?" | 16:32 |
*** panda is now known as panda|bbl | 16:32 | |
*** jtomasek has quit IRC | 16:32 | |
electrofelix | unusual error appearing for a pep8 job for jjb - POST_FAILURE https://review.openstack.org/#/c/134307/ -> http://logs.openstack.org/07/134307/13/check/openstack-tox-pep8/708607e/job-output.txt.gz | 16:32 |
fungi | tosky: maybe openstack-dev/devstack isn't in the required-projects list for the legacy-sahara-cli job? | 16:32 |
inc0 | hey guys, qq, why is zuul -1 when only a non-voting job is red? https://review.openstack.org/#/c/508661/39 | 16:33 |
pabelanger | SpamapS: hard to say right now | 16:33 |
*** thorst has joined #openstack-infra | 16:33 | |
tosky | fungi: oh, let me check | 16:33 |
*** thorst has quit IRC | 16:33 | |
pabelanger | SpamapS: currently 166 ansible-playbook processes, with 36 load | 16:33 |
electrofelix | seeing a number of UNREACHABLE errors being reported from the ansible jobs as part of those POST_FAILURES | 16:34 |
pabelanger | oh | 16:34 |
clarkb | inc0: look in the review comments | 16:34 |
fungi | inc0: kolla-build-centos-source kolla-build-centos-source : NODE_FAILURE | 16:34 |
pabelanger | and we are swapping on ze03 | 16:34 |
*** yamamoto_ has quit IRC | 16:34 | |
mnaser | https://review.openstack.org/#/c/508742/ could use a very quick +2 to fix puppet's release note jobs :> | 16:34 |
SpamapS | yeah I think the concurrency limiter is going to be needed even with a bit of CPU help from the ansible internal polling factor | 16:34 |
fungi | inc0: toggle ci and then look at zuul's comment. i guess we don't pipe NODE_FAILURE results up into the ci results table we generate in gerrit | 16:34 |
SpamapS | 155 ansible processes is going to chew up a ton of RAM | 16:34 |
inc0 | ah ok, thanks | 16:34 |
inc0 | it's also voting, so probably that's why | 16:35 |
pabelanger | SpamapS: mordred: Ya, we are swapping on all zuul-executors currently. So more servers and patch hopefully helps | 16:35 |
*** hashar has quit IRC | 16:35 | |
*** efried is now known as efried_afk | 16:36 | |
fungi | electrofelix: looks like fetch-tox-output is where that first started going wrong: "ssh: connect to host 15.184.67.23 port 22: Connection timed out" | 16:39 |
fungi | i wouldn't be surprised if this is also fallout from executor load starving rsync/ssh of cycles | 16:39 |
*** yamamoto has joined #openstack-infra | 16:39 | |
tosky | fungi: in the requirements there is openstack-infra/devstack-gate, I guess it's not enough | 16:39 |
tosky | required-jobs | 16:39 |
pabelanger | okay, ze09.o.o launched, /etc/fstab cleaned up, rebooting to confirm | 16:40 |
fungi | tosky: openstack-dev/devstack is a different project from openstack-infra/devstack-gate, so probably not enough no | 16:40 |
*** thorst has joined #openstack-infra | 16:40 | |
tosky | fungi: I was expecting a transitive dependency (openstack-infra/devstack-gate bringing in openstack-dev/devstack), but no problem, going to add it | 16:41 |
*** jpena is now known as jpena|away | 16:41 | |
pabelanger | moving to launch ze10.o.o now | 16:41 |
openstackgerrit | Anastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs https://review.openstack.org/508880 | 16:42 |
clarkb | pabelanger: maybe before adding them to zuul we make sure the new code is already running on them too? | 16:42 |
clarkb | that means fewer restarts of services | 16:42 |
openstackgerrit | Joe D'Andrea proposed openstack-infra/irc-meetings master: Propose OpenStack Valet meeting date/time https://review.openstack.org/508933 | 16:42 |
pabelanger | clarkb: good idea, checking | 16:42 |
clarkb | ok snmpwalk seems to show that snmp sees the filesystems. So guessing it must be on the cacti side as far as grabbing that data | 16:42 |
*** lnxnut has quit IRC | 16:43 | |
openstackgerrit | Luigi Toscano proposed openstack-infra/openstack-zuul-jobs master: Add openstack-dev/devstack to all dsvm legacy sahara jobs https://review.openstack.org/508934 | 16:44 |
*** thorst has quit IRC | 16:44 | |
inc0 | can I turn off legacy jobs with local .zuul.yaml? | 16:45 |
fungi | tosky: there are plenty of jobs which use devstack-gate but don't use devstack, so hard to make assumptions there. if this had been a traditional "dsvm" job we likely would have guessed the required-projects list for it automatically, but it seems the sahara-cli job was a little bit special | 16:45 |
fungi | inc0: no, we'll need a patch removing them from the global configuration where they're defined instead | 16:46 |
mordred | inc0: you cannot - BUT - you can totally remove the legacy jobs from your project by submitting a patch to project-config and removing them | 16:46 |
*** dave-mcc_ has quit IRC | 16:46 | |
tosky | fungi: it should have had dsvm in the name probably; but also other jobs with dsvm in the name did not get openstack-dev/devstack as dependency (see my patch above) | 16:46 |
inc0 | well what I wanted to do is to turn them off in gate patch locally and remove from project-config when it merges | 16:46 |
mordred | inc0: I made a doc section: https://docs.openstack.org/infra/manual/zuulv3.html#howto-update-legacy-jobs | 16:47 |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/project-config master: Fix OVB jobs config for TripleO https://review.openstack.org/508936 | 16:47 |
tosky | fungi: out of curiosity, what is the use case of using devstack-gate without devstack? I thought that devstack-gate extends devstack with some default settings & co | 16:47 |
electrofelix | fungi: so does that mean, just wait a while? | 16:47 |
mordred | inc0: you'll need to either remove from project-config first, leaving you with a moment where you're not running either legacy or new versions - or land new jobs to zuul.yaml first (double-jobs) then remove from project-config | 16:47 |
pabelanger | fungi: clarkb: mordred: any objections if we make legacy-logstash-filters-ubuntu-trusty non-voting for the moment on system-config? Otherwise, we'll need somebody to fix the job. http://logs.openstack.org/29/508929/1/check/legacy-logstash-filters-ubuntu-trusty/efec67f/job-output.txt.gz#_2017-10-02_16_38_36_696318 Currently blocking patches to system-config | 16:47 |
mordred | pabelanger: fine by me | 16:48 |
mordred | pabelanger: let's put it on the list of jobs to fix though | 16:48 |
fungi | electrofelix: yes, we have a several-pronged effort underway to address load on the executors (a patch from tobiash to make ansible run slightly less frequently, one from SpamapS implementing a load-average-based governor, and also adding more executor servers to spread the current load out better) | 16:48 |
*** slaweq_ has joined #openstack-infra | 16:48 | |
mordred | pabelanger: I have added it to the list | 16:48 |
inc0 | ehh, I don't like this gate-less limbo, but double jobs will make it a nightmare to run | 16:48 |
pabelanger | mordred: thanks | 16:49 |
sshnaidm | does anybody know where are logs of legacy periodic jobs now? | 16:49 |
inc0 | what if I redeclare legacy jobs locally and run them with exit 0? | 16:49 |
electrofelix | fungi: thanks, will sit tight, it's not a critical fix | 16:49 |
fungi | tosky: devstack-gate is a poorly-named bit of software which provides a general job environment framework for setting up trees of repositories and doing log collection... you can completely remove any dependency on devstack by defining your own gate_hook function for it | 16:49 |
tosky | fungi: I see, thanks | 16:50 |
SpamapS | mordred: couldn't inc0 Depends-On: the project-config patch? | 16:50 |
mordred | inc0: you can turn them non-voting in your .zuul.yaml | 16:50 |
mordred | SpamapS: no. project-config patches are non speculative | 16:50 |
SpamapS | mordred: I mean, yeah he'd be double-gated for _one single patch_ but that would be the end of it. | 16:50 |
inc0 | they're largely non-voting, it's just time it takes to run is quite a problem | 16:50 |
tobiash | SpamapS: maybe a ram based limiter also makes sense | 16:50 |
SpamapS | tobiash: already written.. testing locally :) | 16:50 |
tobiash | :) | 16:50 |
fungi | pabelanger: i'd be okay with disabling that job temporarily if we're not merging the sorts of changes that's supposed to be testing (they seem likely to be a lower priority right now anyway) | 16:51 |
mordred | inc0: yah - it's a scenario that is not optimized at the moment - I'd recommend making the project-config patch and following up with a .zuul.yaml patch with a depends-on as SpamapS suggests - the .zuul.yaml patch won't actually be free of the legacy jobs until we land the project-config patch - but it should lower the amount of time you're exposed as much as possible | 16:51 |
openstackgerrit | Paul Belanger proposed openstack-infra/openstack-zuul-jobs master: Set legacy-logstash-filters-ubuntu-trusty non-voting https://review.openstack.org/508940 | 16:52 |
pabelanger | fungi: mordred: thanks! ^ if you don't mind reviewing | 16:52 |
*** jascott1 has joined #openstack-infra | 16:52 | |
inc0 | ok and I'll probably freeze non-gate, non-critical changes in meantime in kolla | 16:52 |
SpamapS | mordred: I kind of like it actually. You get one patch where the old and new both pass. | 16:52 |
tobiash | If load is still a concern with the limiters also a poll interval of 0.1 could be considered, which could reduce the load a bit further at the expense of about 10s longer jobs per 100 tasks | 16:53 |
SpamapS | I guess there might be a scenario where other things in the gate for your project behind the project-config patch could land w/o tests | 16:53 |
fungi | SpamapS: what it's especially non-optimal for is teams who just want to abandon the legacy jobs in favor of native jobs in their repos, rather than fixing the legacy jobs first | 16:53 |
fungi | since they end up needing to do it the other way around | 16:54 |
SpamapS | fungi: agreed! | 16:54 |
SpamapS | so yeah, for that you're exposed while the legacy jobs are gone | 16:54 |
*** slaweq_ has quit IRC | 16:55 | |
SpamapS | since you can't Depends-On -> the .zuul.yaml patch since it won't pass | 16:55 |
sshnaidm | pabelanger, can you please look when you have time? thanks https://bugs.launchpad.net/tripleo/+bug/1720721 | 16:56 |
openstack | Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger) | 16:56 |
pabelanger | sshnaidm: won't be right now, working on another issue. However, any infra-root will be able to see what is going on | 16:56 |
*** thorst has joined #openstack-infra | 16:57 | |
sshnaidm | any infra root, please look at https://bugs.launchpad.net/tripleo/+bug/1720721 | 16:57 |
openstack | Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger) | 16:57 |
SpamapS | Though I guess you could do a three-step ... #1- Submit patch to project-config disabling legacy jobs, #2- Submit .zuul.yaml patch that Depends-On #1, #3- Approve #2 and #1 in sequence. | 16:57 |
fungi | granted, most (if not all) awake infra-root sysadmins are also busy trying to stabilize zuul | 16:57 |
*** baoli has quit IRC | 16:57 | |
SpamapS | Then scream at anyone who +A's before #2 lands. | 16:58 |
fungi | SpamapS: yeah, i think that's what SamYaple ended up doing for the loci jobs | 16:58 |
*** baoli has joined #openstack-infra | 16:58 | |
*** chlong has quit IRC | 16:58 | |
SpamapS | fungi: we should write that down as a Zuul v3.1 feature request. :) | 16:58 |
SpamapS | hm, memory governor.. | 16:59 |
SamYaple | i didn't do a depends on, but about the same thing yea | 16:59 |
SamYaple | purge project-config jobs then drop in a quick noop, then i built real gates in-repo | 16:59 |
*** derekh has quit IRC | 16:59 | |
SpamapS | would people rather specify the minimum available memory as a percentage of total system memory, or something like minimum available MiB? | 16:59 |
*** jpich has quit IRC | 16:59 | |
SpamapS | like "stop accepting jobs when there is less than 10% of available memory" or "stop accepting jobs when there is less than 2500 MB of available system memory" ? | 17:00 |
pabelanger | SpamapS: does bubblewrap have any ability to do that? | 17:00 |
*** dklyle has joined #openstack-infra | 17:00 | |
SpamapS | pabelanger: that would fail the job too late | 17:00 |
SpamapS | pabelanger: we need to stop accepting jobs entirely | 17:00 |
pabelanger | ah | 17:00 |
pabelanger | right | 17:01 |
fungi | pabelanger: the idea is to implement this around the same code paths as the load average governor (could probably go in the same thread and just become an additional check and tuning config parameter) | 17:01 |
SpamapS | just let them sit in gearman | 17:01 |
*** thorst has quit IRC | 17:01 | |
SpamapS | It does go in the same thread | 17:01 |
SpamapS | I have it working | 17:01 |
SpamapS | but I did it as psutil.virtual_memory().percent | 17:01 |
SpamapS | and I'm just wondering if that's too abstract | 17:01 |
*** david-lyle has quit IRC | 17:02 | |
fungi | as for how to configure it, i like the percent implementation. probably not worth bikeshedding over for now and we can find out if it needs tweaking later | 17:02 |
SpamapS | that said, it's a nice default behavior, as it never needs to be tuned... just don't do concurrent jobs if you have less than 10% of whatever system RAM available. | 17:02 |
SpamapS | jeblair: ^ ? | 17:03 |
SpamapS | 10% could be a _lot_ of RAM tho ;) | 17:03 |
SpamapS | maybe 5% is better. | 17:03 |
SpamapS | also haven't looked at how psutil accounts for swap in that | 17:03 |
fungi | SpamapS: we're trying to let him focus on tracking down the memory leak/performance issues for now, so can probably just push an initial version into review | 17:03 |
SpamapS | fungi: (He had some strong thoughts on the load average implementation so I figured I'd ask ;) | 17:04 |
SpamapS | I'll let review do the talking | 17:04 |
fungi | fair | 17:04 |
*** bhavik1 has joined #openstack-infra | 17:04 | |
fungi | SpamapS: i have a feeling we can just get working implementations of these governors and then work out whether we need to change the defaults/scaling factors/config options before 3.0.0 gets tagged | 17:05 |
*** yamahata has quit IRC | 17:05 | |
fungi | this is not a binding api until we tag | 17:05 |
pabelanger | SpamapS: mordred: fungi: clarkb: i propose we restart ze01.o.o, which has been updated with both patches to zuul from tobiash and SpamapS | 17:05 |
openstackgerrit | Michal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs https://review.openstack.org/508944 | 17:07 |
pabelanger | mordred: fungi: clarkb: I also propose we consider a force merge of https://review.openstack.org/508929/ to bypass the load issues in the gate. That will bring online 2 new zuul-executors | 17:08 |
fungi | pabelanger: i concur, sounds good | 17:08 |
fungi | on both fronts | 17:08 |
fungi | i can merge 508929 | 17:08 |
pabelanger | great, ty | 17:08 |
inc0 | I'll remove legacy job definition when we're done with migrations | 17:08 |
openstackgerrit | boden proposed openstack-infra/openstack-zuul-jobs master: Add neutron-lib as required to legacy-tempest-dsvm-neutron-src https://review.openstack.org/508945 | 17:08 |
clarkb | pabelanger: sounds like a plan | 17:09 |
*** yamahata has joined #openstack-infra | 17:09 | |
*** harlowja has joined #openstack-infra | 17:10 | |
openstackgerrit | Merged openstack-infra/system-config master: Bring ze09.o.o and ze10.o.o online https://review.openstack.org/508929 | 17:11 |
fungi | pabelanger: ^ merged | 17:11 |
pabelanger | fungi: thanks! | 17:11 |
pabelanger | I'm preparing to stop ze01.o.o now | 17:11 |
*** claudiub|2 has joined #openstack-infra | 17:11 | |
*** trown|lunch is now known as trown | 17:13 | |
*** ykarel has joined #openstack-infra | 17:13 | |
*** claudiub has quit IRC | 17:14 | |
pabelanger | okay, should be stopping now | 17:15 |
*** chlong has joined #openstack-infra | 17:15 | |
SpamapS | pabelanger: how long does that take? ;) | 17:17 |
pabelanger | SpamapS: still running | 17:17 |
pabelanger | under zuulv2.5 it was much faster when we stopped | 17:17 |
pabelanger | almost instant | 17:18 |
*** bhavik1 has quit IRC | 17:18 | |
*** yamahata has quit IRC | 17:18 | |
SamYaple | yea lets just rollback | 17:18 |
pabelanger | possible that post-runs still need to complete on an abort. Will look into logs in a moment | 17:19 |
*** efried_afk is now known as efried | 17:19 | |
*** gouthamr has quit IRC | 17:21 | |
SpamapS | durn.. psutil.virtual_memory().percent is kinda lame | 17:21 |
SpamapS | includes buffers/cache | 17:21 |
pabelanger | almost done... | 17:22 |
*** slaweq_ has joined #openstack-infra | 17:22 | |
SpamapS | pabelanger: it's also possible that killing 155*3 processes takes a while | 17:22 |
pabelanger | and stopped :D | 17:22 |
pabelanger | SpamapS: ya, post-run was the reason | 17:22 |
SpamapS | actually 155*5 .. ssh agents.. ssh bin | 17:22 |
pabelanger | it was uploading logs and stuff | 17:22 |
pabelanger | which is a little different than v2.5 | 17:23 |
SpamapS | ah rsync didn't obey? | 17:23 |
pabelanger | either way, making note and going to start ze01 | 17:23 |
*** thorst has joined #openstack-infra | 17:23 | |
pabelanger | and ze01.o.o started | 17:24 |
* SpamapS hoping load governor 'splodes less than it helps ;) | 17:24 | |
*** Guest10594 has quit IRC | 17:27 | |
openstackgerrit | Michal Jastrzebski (inc0) proposed openstack-infra/openstack-zuul-jobs master: Removal of kolla legacy jobs https://review.openstack.org/508950 | 17:27 |
openstackgerrit | Michal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs https://review.openstack.org/508944 | 17:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing https://review.openstack.org/508951 | 17:27 |
mordred | infra-root: ^^ that patch should fix all of the issues with python tarball jobs and pypi release jobs | 17:28 |
pabelanger | SpamapS: mordred: we only have 13 ansible-playbook processes currently. Looking to see why now on ze01 | 17:29 |
SpamapS | pabelanger: load? | 17:29 |
SpamapS | also you do have to wait for stuff to be triggered/retried ;) | 17:29 |
pabelanger | 38 currently | 17:29 |
AJaeger | mordred: +471, -13006 lines? WOW! | 17:29 |
*** chlong has quit IRC | 17:30 | |
*** baoli_ has joined #openstack-infra | 17:30 | |
SpamapS | pabelanger: are these 8cpu flavors? | 17:30 |
pabelanger | SpamapS: climbing now, 50 | 17:30 |
pabelanger | i don't think I gave it enough time | 17:30 |
SpamapS | pabelanger: also it takes 30s to notice high load :-P | 17:30 |
mordred | AJaeger: right? nice patch right? | 17:31 |
*** chlong has joined #openstack-infra | 17:31 | |
*** tosky has quit IRC | 17:31 | |
jdandrea | Received a -1 from Zuul in reviewing Change 508933. "legacy-irc-meetings-tox-ical TIMED_OUT in 35m 37s" - should I wait or is there something I need to do/change? | 17:31 |
SpamapS | pabelanger: there's an INFO log line that it will print when it unregisters | 17:31 |
AJaeger | mordred: do a few more of these and maybe our memory problems disappear ;) | 17:31 |
*** baoli has quit IRC | 17:31 | |
pabelanger | SpamapS: ya, looking for that now | 17:32 |
mordred | AJaeger: :) | 17:32 |
pabelanger | 2017-10-02 17:28:56,232 INFO zuul.ExecutorServer: Unregistering due to high system load 29.44 > 20.0 | 17:32 |
pabelanger | possible we want to up that from 20 | 17:32 |
SpamapS | pabelanger: \o/ | 17:33 |
*** e0ne has joined #openstack-infra | 17:33 | |
pabelanger | let me check what we have on zuul-launchers | 17:33 |
*** slaweq_ has quit IRC | 17:33 | |
SpamapS | pabelanger: it's ncpu*2.5 ;) | 17:33 |
*** sshnaidm is now known as sshnaidm|afk | 17:33 | |
pabelanger | with a limit of 53 ansible-playbooks, that is down from 150 concurrent ansible-playbook processes on zuulv2.5 | 17:33 |
pabelanger | but, let me first roll out the change to all zuul-executors, and bring online ze09.o.o and ze10.o.o | 17:34 |
AJaeger | mordred: you add an empty initial line ;( | 17:34 |
pabelanger | then evaluate | 17:34 |
mordred | AJaeger: booo | 17:34 |
mordred | AJaeger: want me to fix? or just send in a followup? | 17:35 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: DNM: Fix branch matching logic https://review.openstack.org/508955 | 17:35 |
*** dizquierdo has quit IRC | 17:35 | |
AJaeger | mordred: the job is still in the queue, fix might be nice. But followup works as well... | 17:36 |
pabelanger | ze02.o.o stopping | 17:36 |
dansmith | clarkb: I put something on that etherpad you linked me to a bit ago. Should I put my name next to it if I reported it? Can't tell if those are locks or authors on the other ones | 17:36 |
dhill__ | hi guys | 17:37 |
dhill__ | what can make addresses be {} when we do a "nova list --debug" but we clearly see the neutron ports are still there? | 17:37 |
dhill__ | if we try to detach the port, it says it's being used | 17:37 |
dhill__ | this happened after last newton update | 17:37 |
dhill__ | older VMs almost all have addresses {} if they have a given net/subnet with ipv4 and ipv6 | 17:37 |
dhill__ | but we can create new vms in that same network | 17:37 |
AJaeger | dhill__: #openstack is the general channel for OpenStack related questions. | 17:37 |
SpamapS | pabelanger: if jobs queue, jobs queue. That's life. | 17:37 |
AJaeger | dhill__: This channel is about the infrastructure the project runs on | 17:37 |
AJaeger | dhill__: please ask again on #openstack | 17:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing https://review.openstack.org/508951 | 17:38 |
SpamapS | do we graph gear status somewhere? | 17:38 |
chandankumar | AJaeger: regarding this review https://review.openstack.org/#/c/506669/ will i add these jobs to openstack-zuul-jobs till neutron moves their legacy zuul v3 jobs under neutron repo? | 17:38 |
dhill__ | AJaeger, oh sorry | 17:38 |
pabelanger | SpamapS: yup, i figure it will take a few days to even everything out | 17:38 |
mordred | AJaeger: I went ahead and updated the patch | 17:39 |
AJaeger | chandankumar: this is all new to us ;) Why not start adding those directly to neutron repo - and then neutron can move the rest later... | 17:39 |
chandankumar | AJaeger: ok | 17:39 |
pabelanger | ze02.o.o started; ze03.o.o stopping | 17:39 |
mordred | frickler: I think we should roll forward with 508822 as the approach for neutron jobs - but I think we can learn from 508785 and just make variants in the project-template rather than full new jobs | 17:40 |
AJaeger | frickler: are you still around and want to give it a try? | 17:40 |
*** lnxnut has joined #openstack-infra | 17:40 | |
SpamapS | so the next thing to watch is this http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64005&rra_id=all <-- swap on ze01 | 17:41 |
SpamapS | looks like it's staying low so that's good | 17:41 |
clarkb | dansmith: names there are people willing to work to fix the problem. Feel free to volunteer or just put your name down as potentially having more info | 17:41 |
AJaeger | mordred: +2A on your change | 17:42 |
dansmith | clarkb: ack | 17:42 |
pabelanger | ze03.o.o started; ze04.o.o stopping | 17:43 |
*** andreww has joined #openstack-infra | 17:44 | |
*** andreww has quit IRC | 17:44 | |
AJaeger | frickler: I'll update your change now following mordred's suggestion | 17:44 |
*** dtantsur is now known as dtantsur|afk | 17:44 | |
*** ekcs has joined #openstack-infra | 17:44 | |
*** xarses has quit IRC | 17:45 | |
mordred | frickler, AJaeger: updating frickler's change real quick | 17:45 |
mordred | AJaeger: oh - I'm already on it ... | 17:45 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo https://review.openstack.org/508822 | 17:45 |
AJaeger | mordred: you're too quick ;) | 17:46 |
*** andreww has joined #openstack-infra | 17:46 | |
mordred | AJaeger: I'm going to make a project-config patch to make that change to all openstack/networking-* repos | 17:46 |
AJaeger | mordred: so, one change instead of many? OK! | 17:48 |
AJaeger | mordred: once you're done, I'll abandon the others and add links | 17:48 |
*** ralonsoh has quit IRC | 17:50 | |
*** camunoz has joined #openstack-infra | 17:50 | |
mordred | AJaeger: yah - I've got a script locally I can edit for doing some of these big edits - so when there's a decent pattern it's fairly easy | 17:51 |
*** shardy has quit IRC | 17:52 | |
AJaeger | convenient ;) | 17:52 |
*** yamahata has joined #openstack-infra | 17:53 | |
AJaeger | mordred: fungi likes the plan as well - he just +2A 508822. Now we have to wait for merging... | 17:53 |
*** sambetts is now known as sambetts|afk | 17:54 | |
*** lnxnut has quit IRC | 17:55 | |
*** tesseract has quit IRC | 17:55 | |
*** electrofelix has quit IRC | 17:56 | |
mordred | pabelanger: we should make ourselves a really good playbook for doing rolling restarts of ze0* | 17:56 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor https://review.openstack.org/508960 | 17:57 |
pabelanger | mordred: ya, we have something for zuul-launchers, shouldn't take much to update for zuul-executors | 17:57 |
*** gouthamr has joined #openstack-infra | 17:57 | |
mordred | pabelanger: maybe one that does a one-at-a-time version of 'stop ; no-really-stop ; run-puppet ; start' | 17:57 |
mordred | pabelanger: also - there are times when service zuul-executor stop doesn't, you know, stop things - when things are calmed down - we should really sort that out too :) | 17:58 |
SpamapS | mordred: and maybe a wait_for after start ... and serial: 1 | 17:58 |
mordred | SpamapS: yes! | 17:58 |
pabelanger | mordred: yah | 17:58 |
mordred | SpamapS: turns out we can wait_for the finger port to be open, since that's a thing the executors do when they start | 17:58 |
SpamapS | so 508960 above is the memory governor. I'm throwing it up to have it run through tests. I'll also test it on my local zuul box. | 17:58 |
SpamapS | mordred: nice | 17:58 |
*** tosky has joined #openstack-infra | 17:59 | |
mnaser | has there been any issues @ 137.74.26.164 ? | 18:00 |
mnaser | i dont see anything in buffer but i see this sort of failure - fatal: [centos-7]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"137.74.26.164\". Make sure this host can be reached over ssh", "unreachable": true} | 18:00 |
*** thorst_ has joined #openstack-infra | 18:01 | |
pabelanger | Hmm, ze07 has 3 zuul-executor processes again | 18:02 |
mnaser | (another similar case if this helps) | 18:02 |
mnaser | UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"15.184.66.218\". Make sure this host can be reached over ssh", "unreachable": true} | 18:02 |
pabelanger | mnaser: we are just stopping / starting executors, so possible it is related | 18:02 |
mnaser | ok cool | 18:02 |
mnaser | figured as much | 18:02 |
SpamapS | so, ze01 I see has dropped way below 20.0 | 18:03 |
pabelanger | mordred: ze07.o.o might be in a weird state. Did you want to look at it before I consider killing processes? | 18:03 |
AJaeger | argh, https://review.openstack.org/#/c/508822/2 got "Unknown configuration error" ;( | 18:03 |
*** Swami has joined #openstack-infra | 18:03 | |
*** thorst has quit IRC | 18:03 | |
pabelanger | SpamapS: ya, down to 36 ansible-playbook processes | 18:03 |
*** jpena|away is now known as jpena|off | 18:04 | |
*** bnemec has quit IRC | 18:04 | |
SpamapS | memory usage is still pretty high, but without the swapping | 18:04 |
*** nikhil has joined #openstack-infra | 18:04 | |
sc` | frickler kicked off a job for chef, and the integration gate said 'aborted' from ze06. is this relevant or meaningful? | 18:06 |
jlk | o/ | 18:06 |
mordred | infra-root, AJaeger, SpamapS, dmsimard: we have issues with tox flake8 envs in a bunch of infra repos - project-config is a good example ... | 18:06 |
jlk | any fires I can help with? | 18:06 |
*** dklyle is now known as david-lyle | 18:06 | |
mordred | jlk: we've made an etherpad ... | 18:06 |
*** kzaitsev_pi has quit IRC | 18:07 | |
mordred | jlk: https://etherpad.openstack.org/p/zuulv3-issues | 18:07 |
jlk | word. | 18:07 |
*** kzaitsev_pi has joined #openstack-infra | 18:07 | |
AJaeger | jlk, all translation jobs are broken - and I saw that you started earlier working on them. Those are more camp fires ;) I'd like to discuss whether to continue your changes instead of fixing the existing ones. But have a look first whether there are forest fires ;) | 18:07 |
jlk | ah yeah, it was... difficult if not impossible to test those translation jobs before merging at the time. | 18:08 |
mordred | infra-root, AJaeger, SpamapS, dmsimard, jlk: they are failing open - one of the issues is that we have a 'select = H231' which means ONLY run that test - but also when I try to run locally flake8 seems to actually run on nothing, while if I run pyflakes directly it works as expected | 18:08 |
jlk | might want to continue that work | 18:08 |
AJaeger | jlk: I added this to etherpad | 18:08 |
* mordred is bringing this up in channel because it's possible there is a deeper systemic issue with flake8 right now and our general usage in tox in openstack that might be worth figuring out | 18:09 | |
clarkb | SpamapS: pabelanger so initial indications are that it is working? | 18:09 |
mordred | AJaeger, jlk: I think switching translations to the new jobs is the route to go - the old auto-converted jobs aren't going to be able to run properly no matter what we do | 18:09 |
SpamapS | clarkb: it certainly stopped accepting jobs. I don't know that it has re-started as they completed. | 18:09 |
mordred | SpamapS: \o/ | 18:10 |
dmsimard | mordred: I guess I'm missing context, the linters job for project-config is passing right now | 18:10 |
pabelanger | clarkb: SpamapS: Load has been limited, seems our cap is 20.0 right now | 18:10 |
mordred | dmsimard: right - it's not actually doing anything | 18:10 |
mordred | dmsimard: for reasons I do not understand | 18:10 |
mordred | dmsimard: so it's not ACTUALLY running flake8 on the files we think it is, and it's not showing errors when they exist | 18:10 |
pabelanger | just finishing up with ze07.o.o, need to kill processes then all zuul-executors running the code | 18:10 |
pabelanger | and ze09.o.o and ze10.o.o are also online | 18:10 |
mordred | dmsimard: I can verify that just running flake8 in the project-config repo when there are files that very much should be failing - it just exits 0 | 18:11 |
*** slaweq_ has joined #openstack-infra | 18:12 | |
clarkb | mordred: SpamapS before I end up in completely the wrong direction debugging this disk issue, are the executors enforcing job disk limits? and if they are would those limits be enforced as disk out of space errors? | 18:12 |
dmsimard | mordred: but that's not something that was changed due to v3, right ? I mean, it's just tox | 18:12 |
SpamapS | clarkb: there's a thread that runs du and will stop jobs that use too much disk | 18:14 |
*** slaweq_ has quit IRC | 18:14 | |
clarkb | SpamapS: do you know off the top of your head if that du will manifest errors in this way http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_965612 I am thinking not as that appears to be the system call writing exploding | 18:15 |
*** hemna__ has joined #openstack-infra | 18:15 | |
SpamapS | clarkb: no | 18:15 |
*** thorst_ has quit IRC | 18:15 | |
SpamapS | that probably wrote faster than the disk accountant could kill it | 18:15 |
SpamapS | would expect that to have caused widespread explosion on full disk | 18:16 |
SpamapS | though maybe it immediately unlinked and released the space | 18:16 |
mnaser | SpamapS clarkb fwiw i saw this happen once on a debian image | 18:16 |
clarkb | ya I think bwrap likely cleaned it up quickly if it actually did fill the disk | 18:16 |
mnaser | further investigation showed that the growroot stuff didnt actually run | 18:16 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo https://review.openstack.org/508822 | 18:16 |
SpamapS | it COULD be in the tmpfs | 18:17 |
mnaser | so the partition was super tiny | 18:17 |
clarkb | mnaser: I think this was on our zuul executors, but good to know we might have that problem on the test nodes | 18:17 |
SpamapS | I forget where tmpfs's get mounted | 18:17 |
SpamapS | mnaser: this was on localhost | 18:17 |
mnaser | ah okay my bad | 18:17 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Use the openstack-python-jobs-neutron templates https://review.openstack.org/508961 | 18:17 |
mordred | AJaeger: ^^ there's the other side of that ozj patch | 18:17 |
*** hemna_ has quit IRC | 18:18 | |
AJaeger | mordred: great, thanks! | 18:19 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Set ZooKeeper purge_interval for on nodepool.o.o https://review.openstack.org/508962 | 18:20 |
jeblair | infra-root: i've got another debug routine going that may impact performance; please don't restart zuul until i give the all-clear, even if it's seemingly dead (the info i get may be valuable) | 18:20 |
clarkb | at least on ze04 I am seeing lower disk consumption since the restart, which makes sense because fewer jobs running | 18:20 |
pabelanger | ack | 18:20 |
fungi | roger jeblair! | 18:20 |
pabelanger | clarkb: yah | 18:20 |
clarkb | jeblair: noted | 18:20 |
fungi | Shrews: https://review.openstack.org/508962 should hopefully address the nodepool disk filling up | 18:20 |
mordred | jeblair: roger | 18:21 |
Shrews | fungi: spectacular | 18:22 |
clarkb | fungi: Shrews I've confirmed that the version of zk on nodepool.o.o is new enough for that feature | 18:22 |
fungi | clarkb: thanks for double-checking (dpkg told me 3.4.5 | 18:23 |
fungi | ) | 18:23 |
*** jbadiapa_ has joined #openstack-infra | 18:23 | |
AJaeger | mordred: gave -1 on all related changes and linked to the new one. | 18:23 |
clarkb | yup and 3.4 added it. I would approve but someone already beat me to it :) | 18:23 |
SpamapS | are the logs from executors somewhere searchable? Kibana? | 18:23 |
*** rhallisey has quit IRC | 18:23 | |
clarkb | SpamapS: they are not | 18:24 |
mordred | dhellmann, fungi, smcginnis: ok - I thought putting the new requirements job in the requirements repo straightaway was a good idea, but we have other jobs we need to fix there before we can land patches - so I'm going to add it to openstack-zuul-jobs and we can circle back and do a rename/move dance later | 18:24 |
dhellmann | requirements or release? | 18:24 |
mordred | dhellmann: requirements - sorry, I'm pinging the wrong people :) | 18:24 |
fungi | prometheanfire: ^ | 18:25 |
clarkb | SpamapS: is there something specific you'd like to see? I can work to collect data | 18:25 |
*** ykarel has quit IRC | 18:25 | |
*** jbadiapa has quit IRC | 18:25 | |
AJaeger | fungi, do you want to +2A https://review.openstack.org/#/c/508822/ again, please? | 18:25 |
fungi | wow, what happened to patchset 3 there? | 18:26 |
fungi | did gerrit lose it? | 18:26 |
clarkb | possible db incrementation issue? | 18:27 |
AJaeger | fungi, mordred : https://review.openstack.org/508822 still errors - no need to +2A | 18:27 |
prometheanfire | fungi: ? | 18:27 |
fungi | prometheanfire: mordred's comments about requirements jobs | 18:27 |
prometheanfire | k | 18:27 |
*** claudiub|2 has quit IRC | 18:27 | |
prometheanfire | ya, we've been broken since the switch I think :| | 18:27 |
openstackgerrit | Monty Taylor proposed openstack/os-client-config master: DNM Testing that new releasenotes job works https://review.openstack.org/508965 | 18:29 |
mordred | prometheanfire: fixes coming up for you soon | 18:29 |
prometheanfire | yep, thanks | 18:29 |
mordred | dhellmann, smcginnis: ^^ https://review.openstack.org/508965 should verify the new releasenotes build jobs work | 18:29 |
dhellmann | mordred : until we can release reno, those jobs are going to continue to fail. | 18:30 |
*** e0ne has quit IRC | 18:30 | |
dhellmann | I was watching job status for patches merging into the releases repo at http://zuulv3.openstack.org but now it's blank. Is that related to other changes? | 18:30 |
SpamapS | clarkb: got distracted.. but the disk accountant should log when it stops jobs | 18:30 |
mordred | dhellmann: ah - good point - well - the patch for fixing pypi jobs is up and has been approved: https://review.openstack.org/#/c/508951/2 | 18:30 |
mordred | dhellmann: are there any other issues you are aware of affecting releasing reno? | 18:31 |
AJaeger | regarding release notes, let's merge https://review.openstack.org/#/c/508742 and https://review.openstack.org/#/c/508763 , please | 18:31 |
dmsimard | pabelanger: for https://review.openstack.org/#/c/508940/ .. would it not have been the same amount of effort to just add the required-project ? | 18:31 |
dhellmann | mordred : we had weird post failures in the jobs associated with the releases repo; let me find a link | 18:31 |
mordred | AJaeger: are you comfortable landing those without seeing a job pass with them first? I guess we have to be because we need the reno fix ... | 18:32 |
dhellmann | mordred : http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/ | 18:32 |
pabelanger | dmsimard: possible, but there is more than 1 project to add. If you want to propose a fix, that would be great | 18:32 |
dmsimard | pabelanger: ok, I'll do it | 18:32 |
dhellmann | mordred : there doesn't appear to be a "run" step in that job | 18:32 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: DNM: Fix branch matching logic https://review.openstack.org/508955 | 18:32 |
AJaeger | mordred: if we really can test them, let's do it... | 18:32 |
mordred | dhellmann: I'm sorry - too many balls in the air - which job is missing a run step? | 18:33 |
AJaeger | mordred: ah, I see... | 18:33 |
mordred | dhellmann: ah - I can read scrollback better now | 18:33 |
dhellmann | smcginnis : were you looking into the release job failures? | 18:33 |
dhellmann | I thought someone else was, so I didn't... | 18:33 |
pabelanger | dhellmann: have a log handy? | 18:34 |
dhellmann | pabelanger : http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/ | 18:34 |
pabelanger | dhellmann: oh, that is fixed | 18:34 |
pabelanger | 1 sec | 18:34 |
pabelanger | dhellmann: mordred: https://review.openstack.org/508871/ for tag-releases job | 18:35 |
pabelanger | should be able to try again | 18:35 |
pabelanger | making note to bubble up error messages from debug.log | 18:35 |
dhellmann | ok. it looks like the bot that reports when changes merge isn't reporting in #openstack-release | 18:35 |
*** vsaienk0 has joined #openstack-infra | 18:36 | |
*** yamamoto has quit IRC | 18:37 | |
jlk | mordred: I'm fixing some pep8 issues with playbooks/files/project-requirements-change.py in openstack/requirements. This may be the first time py3 pep8 is run on it, so some stuff outside your change needs updating. Fixing. | 18:37 |
*** wolverineav has joined #openstack-infra | 18:39 | |
fungi | dhellmann: what merged change wasn't reported to the channel? i see openstackgerrit mention 508913 merging at 16:53 | 18:39 |
dhellmann | oh, maybe I had a client blip | 18:39 |
pabelanger | I think I dropped before posting | 18:41 |
pabelanger | dmsimard: No package matching 'emacs' is available do you know why? http://logs.openstack.org/40/508940/1/check/base-integration-ubuntu-xenial/3e2f551/ara/result/01e9f809-b683-4470-8d25-3f91ab56b765/ | 18:41 |
dmsimard | pabelanger: there is only this one required-project for the logstash job. | 18:41 |
dmsimard | pabelanger: should I send another patchset on top of yours so we don't end up doing 3 commits ? | 18:41 |
dhellmann | fungi : ignore that, I need new glasses | 18:41 |
dmsimard | pabelanger: that's the configure-mirror integration test | 18:41 |
dmsimard | pabelanger: hm, there should be a "clear cache" part somewhere | 18:42 |
SpamapS | clarkb: log = logging.getLogger("zuul.ExecutorDiskAccountant") | 18:42 |
SpamapS | clarkb: that logger should show anything done by the accountant. | 18:43 |
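A minimal sketch of how messages from the logger named above could be surfaced, using only the Python standard library. The logger name is taken from the log; the helper `attach_stream_handler`, the format string, and the sample message are illustrative assumptions, not Zuul's actual logging configuration.

```python
import io
import logging

# Logger name exactly as quoted in the channel.
log = logging.getLogger("zuul.ExecutorDiskAccountant")


def attach_stream_handler(logger, stream, level=logging.INFO):
    """Route records from `logger` to `stream` (hypothetical helper)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


buffer = io.StringIO()
attach_stream_handler(log, buffer)
# Illustrative message only; the real accountant's wording may differ.
log.info("example: job over disk limit")
captured = buffer.getvalue()
```

In a real deployment the handler would more likely be declared in the daemon's logging configuration file than attached in code.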
jlk | GAH I fucked that up | 18:43 |
SpamapS | but a job has to be over limits for a while | 18:43 |
dmsimard | pabelanger: there's a notify update apt cache but it doesn't look like it ran | 18:43 |
clarkb | SpamapS: thanks | 18:43 |
pabelanger | dmsimard: why would that be? | 18:43 |
pabelanger | Hmm, that is ze09.o.o | 18:44 |
pabelanger | which is a new executor | 18:44 |
dmsimard | pabelanger: trying to see, hang on | 18:44 |
clarkb | http://logs.openstack.org/92/489492/14/gate/openstack-tox-pep8/36e3dcd/job-output.txt.gz#_2017-10-02_18_39_16_680481 maybe we should shut off infracloud while we are debugging? | 18:44 |
clarkb | at least in nodepool | 18:45 |
*** chlong has quit IRC | 18:45 | |
pabelanger | dmsimard: fwiw, haven't used handlers much for running things like apt-get update. Personally prefer just using a task to handle that | 18:45 |
dmsimard | pabelanger: it's for idempotency | 18:45 |
*** vsaienk0 has quit IRC | 18:46 | |
*** thorst has joined #openstack-infra | 18:46 | |
*** thorst has quit IRC | 18:46 | |
*** chlong has joined #openstack-infra | 18:46 | |
pabelanger | dmsimard: sure, but not really applicable for jobs, since we run once and delete the node | 18:47 |
dmsimard | pabelanger: fair, but I write things to be idempotent by default to keep best practices and all | 18:48 |
dmsimard | pabelanger: so the configure repo step didn't return a 'changed' status and that's why the handler didn't fire | 18:48 |
*** thorst has joined #openstack-infra | 18:48 | |
dmsimard | That's weird. I don't see why the repositories would have already been set up and the template task would return ok | 18:49 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Replace legacy-requirements with new requirements-check https://review.openstack.org/508898 | 18:49 |
clarkb | infra-root ya I'm seeing a lot of failures to ssh to infracloud | 18:50 |
clarkb | I'll push up a change to set max-servers to 0 | 18:50 |
pabelanger | clarkb: confirmed, I am seeing it too | 18:50 |
clarkb | pabelanger: SpamapS general observation of the zuul status page is looking much better though | 18:50 |
mordred | jlk: oh - you updated https://review.openstack.org/#/c/508891 - let's see if your latest update works - if it does land it - if not, we can land https://review.openstack.org/508898 | 18:51 |
mordred | AJaeger: ^^ | 18:51 |
jlk | yeah I'm fixing it right | 18:51 |
jlk | mordred: I changed the wrong one. so I'm changing the beginning of the stack now and rebasing | 18:51 |
jlk | sorry for the noise :( | 18:51 |
SpamapS | yay | 18:51 |
*** baoli_ has quit IRC | 18:51 | |
mordred | jlk: no worries! thanks for the help | 18:52 |
dmsimard | pabelanger: looking at a xenial job that isn't in base-integration, the template task for setting up the mirror properly returns 'changed': http://logs.openstack.org/04/508704/2/check/legacy-ara-integration-py35-latest/c29d413/ara/reports/664af4eb-7bd3-4453-8c0b-2efccca3379b.html | 18:52 |
jeblair | zuul is swapping now and has slowed to a crawl. my query still hasn't returned. i have no idea how long it might take. should we cut losses and restart now? | 18:52 |
pabelanger | dmsimard: right, i think it is nice to have. Like I said, I haven't used handlers much myself for things like update apt. FWIW: we likely don't need to do that anymore either, since we added python-apt to our images. We might be able to have package / apt task do it directly | 18:52 |
*** lnxnut has joined #openstack-infra | 18:52 | |
mordred | jeblair: what query were you doing? | 18:53 |
openstackgerrit | Clark Boylan proposed openstack-infra/project-config master: Disable infracloud https://review.openstack.org/508969 | 18:53 |
pabelanger | clarkb: yah, executors seems happier now | 18:53 |
dmsimard | pabelanger: that particular handler is not really to install python-apt, it's to make sure the image doesn't run with stale apt cache after modifying the apt repos | 18:53 |
fungi | jeblair: would being able to run it immediately/soon following a restart increase the chances of getting data back? or would the data you get back likely be less useful? | 18:53 |
jeblair | mordred: an objgraph path search. there are 59 more layout objects than i expect; i'm trying to find their reference path. | 18:54 |
*** rcernin has joined #openstack-infra | 18:54 | |
mordred | jeblair: also - I'm on board with restarting the scheduler if you are- also ok to wait longer to see how it goes | 18:54 |
mordred | jeblair: nod | 18:54 |
dmsimard | pabelanger: but anyway, I confirm that the handler works as expected outside that job, so maybe there's something specific to the base-integration job. | 18:54 |
*** baoli has joined #openstack-infra | 18:54 | |
mordred | jeblair: fwiw - over the weekend I observed swap usage go back down after spiking - so it's not impossible for it to come back - but it also may take longer than is reasonable | 18:54 |
jeblair | fungi: i'd need it to at least have one memory bump. they don't seem to take long to happen when we're busy. | 18:55 |
jeblair | mordred: yeah, i even noted a reduction in layout objects earlier | 18:55 |
fungi | jeblair: so maybe restart, give it a bit, then try to query when it's not yet in danger of halting and catching fire? | 18:55 |
jeblair | fungi: yeah, i think that's how i'm leaning | 18:56 |
fungi | wfm | 18:56 |
jeblair | unfortunately, it's so dead i can't grab a copy of the queues | 18:56 |
pabelanger | dmsimard: well, for the sake of keeping things simple, can we change it to a task over notify handler? It is preventing jobs in openstack-zuul-jobs from passing | 18:56 |
*** slaweq_ has joined #openstack-infra | 18:56 | |
fungi | jeblair: and no way to abort the query to help it regain enough oomph to grab a copy of the pipelines? | 18:57 |
pabelanger | this sounds like what happened this morning too | 18:57 |
*** ociuhandu has quit IRC | 18:57 | |
jeblair | fungi: i think the swapping is what's killing it | 18:57 |
fungi | collateral damage i guess | 18:57 |
pabelanger | +1 to what jeblair thinks is best | 18:57 |
*** lukebrowning has quit IRC | 18:58 | |
jeblair | i mean, the query is probably causing some swapping... | 18:58 |
*** kjackal_ has quit IRC | 18:58 | |
jeblair | oh hey i got a queue list | 18:59 |
fungi | woo! | 18:59 |
*** dave-mccowan has joined #openstack-infra | 18:59 | |
*** SumitNaiksatam has quit IRC | 19:00 | |
dmsimard | pabelanger: I'd like to understand why the mirror configuration task did not return 'changed' first | 19:00 |
jeblair | it stopped actively swapping | 19:00 |
*** SumitNaiksatam has joined #openstack-infra | 19:00 | |
dmsimard | pabelanger: it implies that the mirrors had already been configured which should not happen | 19:00 |
*** gtmanfred has joined #openstack-infra | 19:00 | |
dmsimard | pabelanger: that job has been passing for as long as I can remember so I want to know if something new has been introduced | 19:01 |
jeblair | fungi, mordred: what do you think? leave it running for a bit more since it stopped swapping (for now), or take advantage of the fact we have a queue dump and restart? | 19:02 |
fungi | jeblair: i'm leaning toward the restart anyway, since we may have lucked out in getting queue dumps | 19:02 |
mordred | jeblair: it's a tough call - if you think you can still get valid data after the restart - then I say restart with the queue dump | 19:02 |
jeblair | okay, will restart now | 19:03 |
fungi | and hopefully having it under less memory/io pressure will allow you to more quickly get back debugging details | 19:03 |
*** e0ne has joined #openstack-infra | 19:03 | |
*** e0ne has quit IRC | 19:03 | |
jeblair | are there any scheduler fixes we need in place? | 19:03 |
pabelanger | dmsimard: that is fine, but honestly i don't want to spend too much time on making things idempotent for configure-mirror tasks. I'd rather just lay down files and always apt-get update and move on to running jobs. | 19:03 |
jeblair | (i'm running from a personal branch with extra debugging) | 19:04 |
mordred | jeblair: your patches for yappi and stack dumping are the only relevant patches outstanding | 19:04 |
jeblair | k, i'll use the same version as before | 19:05 |
jeblair | it's starting up | 19:05 |
clarkb | mordred: jeblair one sec | 19:05 |
clarkb | oh nevermind | 19:05 |
clarkb | (it's not super urgent) | 19:05 |
*** lnxnut has quit IRC | 19:05 | |
jeblair | clarkb: ok. i can always restart :) | 19:05 |
clarkb | mordred: was going to ask where the BUILD_TIMEOUT env injection change ended up. Was that in zuul? | 19:05 |
pabelanger | we should maybe send out a status notice for people not in channel | 19:05 |
clarkb | because ironic apparently has some tests with that not working | 19:05 |
mordred | clarkb: yes | 19:05 |
jeblair | clarkb: should be executor though | 19:05 |
clarkb | jeblair: ah ok so different daemon, got it | 19:05 |
mordred | clarkb: it's in zuul/ansible/filter/zuul_filters.py | 19:06 |
jeblair | amusingly, our 3 components are almost never running the same version :) | 19:06 |
mordred | jeblair: :) | 19:06 |
fungi | pabelanger: i guess we can, though that's probably not a huge upset given the continual disruption we've been inflicting over the past 4-5 days so far | 19:06 |
clarkb | mordred: in what situations is the timeout not set? eg could the ironic jobs be failing because they don't have a timeout set? | 19:06 |
mordred | clarkb: well - BUILD_TIMEOUT itself is just set based on zuul.timeout | 19:07 |
dmsimard | zuulv3.o.o is empty ? | 19:07 |
jlk | restarting | 19:07 |
mordred | clarkb: which comes from the job's config | 19:07 |
dmsimard | ah | 19:07 |
clarkb | mordred: https://review.openstack.org/#/c/508882 <- that is their proposed fix | 19:07 |
clarkb | mordred: which I think likely misses the point if we are fixing it that way | 19:07 |
jeblair | okay, running re-enqueues now | 19:07 |
mordred | clarkb: do we have an example of one that failed? | 19:08 |
jeblair | i'm going to grab lunch while memory use expands | 19:08 |
pabelanger | this will be a good test for zuul-executors and load | 19:08 |
dmsimard | the polling fix is in ? | 19:09 |
clarkb | mordred: not that I've seen (just going through our list and trying to distill what we can push through at this point) | 19:09 |
clarkb | mordred: so maybe we defer on that | 19:09 |
mordred | clarkb: http://logs.openstack.org/32/507632/4/check/legacy-grenade-dsvm-ironic-multinode-multitenant/62cb405/zuul-info/inventory.yaml | 19:09 |
SpamapS | one thing I find interesting, with the 20.0 cap on 1-minute load, 5 minute load average seems to be staying down around 3.5 - 4.0 or so. http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all | 19:09 |
mordred | clarkb: if you look at the bottom, you can see timeout: 10800 | 19:09 |
*** chlong_ has joined #openstack-infra | 19:09 | |
mordred | clarkb: so there IS a timeout set in that job, so I'd expect the environment passed to the shell tasks to include BUILD_TIMEOUT=10800000 | 19:10 |
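The relationship described above (a job `timeout` of 10800 seconds becoming `BUILD_TIMEOUT=10800000`) can be sketched as a seconds-to-milliseconds conversion. The function name `build_timeout_env` and the behaviour for a missing timeout are assumptions made for illustration; this is not the code in zuul_filters.py.

```python
def build_timeout_env(timeout_seconds):
    """Return an environment fragment for a job timeout given in seconds.

    Assumes BUILD_TIMEOUT is expressed in milliseconds, as the
    10800 -> 10800000 example in the log suggests.
    """
    if timeout_seconds is None:
        # Assumption: with no timeout configured, nothing is exported.
        return {}
    return {"BUILD_TIMEOUT": str(int(timeout_seconds) * 1000)}


env = build_timeout_env(10800)
```

If a job's inventory shows no `timeout` at all, a shell task reading `BUILD_TIMEOUT` would presumably see it unset, which is the case clarkb was asking about.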
pabelanger | SpamapS: I've noticed that too, but want to see what happens now that zuulv3.o.o was restarted | 19:10 |
SpamapS | but if you set that next to the memory graph, I think the reason for this is that we have more available RAM for caching/buffers, so it's an exponential improvement http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64003&rra_id=all | 19:10 |
pabelanger | SpamapS: ze01.o.o is up to 10.38 currently | 19:10 |
clarkb | SpamapS: ya fewer tires to spin | 19:10 |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Set legacy-logstash-filters-ubuntu-trusty non-voting https://review.openstack.org/508940 | 19:10 |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to legacy-logstash-filters jobs https://review.openstack.org/508971 | 19:10 |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Revert "Set legacy-logstash-filters-ubuntu-trusty non-voting" https://review.openstack.org/508972 | 19:10 |
*** lukebrowning has joined #openstack-infra | 19:11 | |
dmsimard | pabelanger: doh, mistakenly rebased your patch | 19:11 |
dmsimard | sorry :/ | 19:11 |
mordred | SpamapS: ++ ... turns out there are times when the way to improve both throughput and concurrency is to reduce concurrency | 19:11 |
pabelanger | 100 ansible-playbook processes on ze01 currently, load of 10.41 | 19:12 |
SpamapS | would graphite have the throughput of jobs per-executor? | 19:12 |
pabelanger | it should, IIRC | 19:12 |
pabelanger | but we need to land firewall change first | 19:12 |
pabelanger | 508876 | 19:12 |
clarkb | pabelanger: please add it to the list (also I've got a list of changes that should be good to review now if that one is ready for review) | 19:13 |
clarkb | all on the one etherpad | 19:13 |
pabelanger | clarkb: it's approved, trying to land fixes to system-config first. See 508940 | 19:13 |
SpamapS | I see nothing in graphite unfortunately | 19:14 |
jdandrea | Am I doing anything wrong? A lot of timeouts/limits/failures. Or something else might not be working ATM? https://review.openstack.org/#/c/508924 | 19:14 |
pabelanger | SpamapS: ya, firewall | 19:14 |
SpamapS | ohwell ;) | 19:14 |
*** e0ne has joined #openstack-infra | 19:14 | |
clarkb | pabelanger: ah ok 508940 had a failure but should get reenqueued so hopefully it gets in then progress | 19:14 |
jlk | jdandrea: we just restarted zuul, so there may have been some oddness | 19:15 |
jdandrea | jlk Ah, got it, thx. Will wait a bit. | 19:15 |
dmsimard | mordred: do you happen to have an answer for my question in https://review.openstack.org/#/c/507558/1/zuul.d/jobs.yaml@15 ? | 19:15 |
*** lukebrowning has quit IRC | 19:15 | |
pabelanger | clarkb: ya, there was a failure in integration-configure-mirror job too. dmsimard is looking into why | 19:15 |
mordred | dmsimard: probably? | 19:15 |
AJaeger | dmsimard: can't we merge https://review.openstack.org/#/c/508971/1 without the non-voting change? | 19:16 |
pabelanger | but, ya. we should do 507558 :) | 19:16 |
dmsimard | AJaeger: yeah 508971 should address the issue with the job but pabelanger submitted a non-voting change first | 19:16 |
dmsimard | 508971 should have already merged but it hasn't (flap in a job) and zuul restarts etc | 19:17 |
dmsimard | er, I meant 508940 should have already merged | 19:17 |
mordred | dmsimard: I do, in fact | 19:17 |
pabelanger | dmsimard: at this rate, I'd just rebase 971 to master and I'll abandon mine | 19:17 |
dmsimard | pabelanger: ok let's do that | 19:18 |
* AJaeger agrees with pabelanger | 19:18 | |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to legacy-logstash-filters jobs https://review.openstack.org/508971 | 19:18 |
dmsimard | rebased ^ | 19:18 |
dmsimard | mordred: it's possible to filter on files in different repos ? | 19:18 |
SpamapS | FYI memory governor works for me in my private zuul here | 19:18 |
AJaeger | pabelanger: I abandoned yours | 19:18 |
pabelanger | AJaeger: thanks | 19:18 |
mordred | dmsimard: yes | 19:19 |
AJaeger | pabelanger: do you want to +2A https://review.openstack.org/#/c/508971 later? | 19:19 |
smcginnis | dhellmann: Sorry, was gone for a bit, but yes, I was. The patch Paul linked to I thought would resolve the issues. | 19:19 |
dmsimard | mordred: how? the path is not relative to the project | 19:19 |
dmsimard | ... or is it ? | 19:19 |
mordred | dmsimard: it's relative to the root of the proposed patch | 19:19 |
dmsimard | pabelanger: if you have time, I'd like to troubleshoot https://review.openstack.org/#/c/504238/ for the finger zuul-stream-functional finger://ze05.openstack.org/40c5ebd0467545ad9fa37fac6fc12e28 | 19:20 |
dmsimard | pabelanger: it's the truncated json issue we looked at during the ptg. | 19:20 |
mordred | dmsimard: so if base-integration is added to a pipeline for zuul-jobs, then patches proposed to zuul-jobs that match the files in the files: section will control whether that patch should run the integration job or now | 19:20 |
mordred | not | 19:20 |
*** lukebrowning has joined #openstack-infra | 19:20 | |
mordred | dmsimard: but if a given patch doesn't have ^roles/configure-mirrors in it, whether because it's a patch to openstack-zuul-jobs so can't have that file, or it's a patch to zuul-jobs and doesn't touch that file, then that file matcher won't match | 19:21 |
pabelanger | dmsimard: -1 on 508971 | 19:21 |
mordred | dmsimard: does that make sense? (I may have said too many or not enough words) | 19:21 |
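The file-matcher behaviour mordred describes can be sketched as follows: patterns are tested against the paths a patch touches, relative to the root of the repository the patch is proposed to. The helper `job_matches` and the example file lists are hypothetical, and using `re.search` is an assumption rather than a statement about Zuul's matcher implementation.

```python
import re


def job_matches(file_patterns, changed_files):
    """Return True if any changed file matches any pattern (hypothetical helper)."""
    regexes = [re.compile(pattern) for pattern in file_patterns]
    return any(regex.search(path) for path in changed_files for regex in regexes)


# Pattern taken from the discussion above.
patterns = ["^roles/configure-mirrors"]

# Illustrative file lists; paths are relative to the patched repository's root.
zuul_jobs_patch = ["roles/configure-mirrors/tasks/main.yaml"]
other_patch = ["zuul.d/jobs.yaml"]

runs_for_zuul_jobs = job_matches(patterns, zuul_jobs_patch)
runs_for_other = job_matches(patterns, other_patch)
```

A patch to a repository that has no `roles/configure-mirrors` directory can never produce a matching path, so the job is simply skipped for it.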
*** lukebrowning has quit IRC | 19:22 | |
dmsimard | pabelanger: you're linking something from legacy-openstackci-beaker-ubuntu-trusty | 19:22 |
dmsimard | pabelanger: that's not logstash filters ?? | 19:22 |
*** lukebrowning has joined #openstack-infra | 19:22 | |
pabelanger | looking | 19:22 |
clarkb | SpamapS: pabelanger it's not a direct mapping to job throughput but 245 nodes in use right now | 19:22 |
clarkb | 191 not in use | 19:23 |
dmsimard | mordred: yeah that makes sense, I thought we needed to have a relative path from o-z-j to z-j | 19:23 |
dmsimard | mordred: your explanation makes more sense, i.e, relative to the project triggering the job | 19:23 |
pabelanger | dmsimard: the legacy-openstackci-beaker-ubuntu-trusty is what we need to fix in system-config, I am not sure where the logstash-filters job is coming from | 19:24 |
pabelanger | I've lost track currently | 19:24 |
dmsimard | pabelanger: that's the one you made non-voting lol | 19:24 |
dmsimard | pabelanger: we can fix that one too, hang on, I'll submit another patch | 19:24 |
pabelanger | Hmm, checking | 19:24 |
dmsimard | https://review.openstack.org/#/c/508940/ Set legacy-logstash-filters-ubuntu-trusty non-voting | 19:25 |
pabelanger | dmsimard: yup, i did it wrong to start with :( | 19:25 |
dmsimard | sending another patchset, hang on | 19:25 |
mordred | pabelanger, dmsimard: so ... BOTH of those jobs should be fixed :) | 19:25 |
dmsimard | mordred: yeah ++ | 19:25 |
clarkb | now 286 in use and 201 not in use | 19:25 |
pabelanger | mordred: ya, i think I crossed the wires | 19:25 |
mordred | pabelanger: there are many wires - easy to cross them ;) | 19:26 |
*** e0ne has quit IRC | 19:26 | |
clarkb | load on a spot check of servers including nodepool.o.o and nl01/2 lgtm | 19:26 |
mordred | dmsimard: I believe you should be able to just add it to legacy-infra-puppet-apply-base yeah? | 19:26 |
*** e0ne has joined #openstack-infra | 19:27 | |
SpamapS | clarkb: with 10 executors? | 19:27 |
mordred | clarkb: that's good - so the manage incoming executor load seems to be doing its job | 19:27 |
clarkb | SpamapS: ya | 19:27 |
*** hashar has joined #openstack-infra | 19:27 | |
*** e0ne has quit IRC | 19:27 | |
SpamapS | clarkb: and geard queue length? | 19:27 |
mordred | clarkb: oh - you were talking about launchers | 19:27 |
clarkb | mordred: I think so, obviously we need to watch it more and see if we are running enough jobs (but considering it doesn't sit at load of 20 the whole time I expect we are) | 19:27 |
SpamapS | (you should be able to run the status admin command without blowing up your terminal) | 19:27 |
*** e0ne has joined #openstack-infra | 19:27 | |
SpamapS | since it will just have like, 6 queues | 19:27 |
*** devananda has quit IRC | 19:28 | |
* clarkb looks at geard | 19:28 | |
SpamapS | well, 16, 1 for each executor's cancels | 19:28 |
*** e0ne has quit IRC | 19:28 | |
SpamapS | also lol, I inverted the logic on the memory governor.. fixing | 19:28 |
pabelanger | clarkb: fungi: mordred: I'd not object if we wanted to force merge: https://review.openstack.org/508876/ which opens firewalls for zuulv3.o.o to graphite.o.o | 19:28 |
pabelanger | then we'd be able to get better counts on online nodes | 19:28 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor https://review.openstack.org/508960 | 19:28 |
openstackgerrit | David Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to logstash-filters and openstackci-beaker jobs https://review.openstack.org/508971 | 19:28 |
*** e0ne has joined #openstack-infra | 19:29 | |
mordred | dmsimard: lgtm | 19:29 |
clarkb | SpamapS: geard closed the connection on me when I asked for status, guessing because we use ssl auth now? | 19:29 |
*** e0ne has quit IRC | 19:29 | |
clarkb | is there an easy way to connect? guessing with openssl s_client? | 19:29 |
*** e0ne has joined #openstack-infra | 19:30 | |
*** devananda has joined #openstack-infra | 19:30 | |
*** e0ne has quit IRC | 19:30 | |
smcginnis | Was there a restart or something? https://review.openstack.org/#/c/508867/1 was approved, but I don't see it in zuulv3.o.o. | 19:30 |
clarkb | pabelanger: I'm ok with it as operating blind is annoying | 19:31 |
fungi | pabelanger: yeah, i agree that getting information on the health of the system is a high enough priority; i'm willing to bypass ci on that patch given the triviality of it | 19:31 |
clarkb | smcginnis: there was, if it wasn't processed yet then it wouldn't have been readded to the queues, you can just approve it again | 19:31 |
*** e0ne has joined #openstack-infra | 19:31 | |
smcginnis | clarkb: ack, thanks | 19:31 |
mordred | pabelanger: I'm also ok merging it | 19:31 |
*** e0ne has quit IRC | 19:32 | |
SpamapS | clarkb: oh yeah, you can connect with s_client | 19:32 |
mordred | smcginnis: you may want to hold off on that patch until https://review.openstack.org/#/c/508951/ has landed | 19:32 |
openstackgerrit | Merged openstack-infra/system-config master: Add zuulv3.o.o to graphite.o.o https://review.openstack.org/508876 | 19:32 |
fungi | pabelanger: ^ | 19:33 |
pabelanger | fungi: danke! | 19:33 |
*** e0ne has joined #openstack-infra | 19:33 | |
*** jascott1 has quit IRC | 19:33 | |
*** e0ne has quit IRC | 19:33 | |
smcginnis | mordred: Hard to keep track what to wait for. :) | 19:33 |
clarkb | SpamapS: `openssl s_client -cert $wherever_zuul_keep_it -connect localhost:4730` look right to you? | 19:34 |
*** jascott1 has joined #openstack-infra | 19:34 | |
fungi | clarkb: i think you'll probably need to tell s_client where your private key is too so it will be able to receive | 19:35 |
mordred | smcginnis: yah- I know - sorry about that... | 19:35 |
mordred | smcginnis: also - while I'm looking- the legacy-releases-python35 should have just been a normal tox python35 job right? | 19:35 |
SpamapS | clarkb: yep | 19:36 |
SpamapS | oh yeah the key | 19:36 |
smcginnis | mordred: I think probably. Let me take a closer look. | 19:36 |
SpamapS | 2017-10-02 12:31:03,760 INFO zuul.ExecutorServer: Unregistering due to low avail memory 3.700000000000003% < 5.0 | 19:37 |
SpamapS | perhaps I should format that float... | 19:37 |
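A minimal sketch of formatting that percentage to one decimal place. The function `low_memory_message` and its `min_pct` default are hypothetical; only the message wording and the 5.0 threshold are taken from the log line above, and this is not the code from change 508960.

```python
def low_memory_message(avail_pct, min_pct=5.0):
    """Return an unregister message when available memory is below the floor."""
    if avail_pct < min_pct:
        return "Unregistering due to low avail memory {:.1f}% < {:.1f}".format(
            avail_pct, min_pct
        )
    # Enough memory is available, so there is nothing to report.
    return None


message = low_memory_message(3.700000000000003)
```

Rounding only at format time keeps the comparison itself on the unrounded value.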
mnaser | nah | 19:37 |
mnaser | we're precise in openstack world | 19:37 |
*** yamamoto has joined #openstack-infra | 19:38 | |
SpamapS | damn right! | 19:38 |
jlk | precise AF | 19:38 |
fungi | mnaser: just not terribly accurate? ;) | 19:38 |
clarkb | SpamapS: http://paste.openstack.org/show/622497/ the two statuses are about 30 seconds apart | 19:38 |
mnaser | there's a pun about floats to be made but | 19:38 |
SpamapS | precise to the quintillionth | 19:38 |
mnaser | ill leave it for someone else to find out | 19:38 |
jlk | we all Float down here | 19:38 |
smcginnis | mordred: I think you are correct that that job could just be the normal py35 job. But maybe dhellmann and others with more history can confirm. | 19:39 |
dhellmann | I don't think there's anything special about that job | 19:39 |
SpamapS | clarkb: ok, so executor:execute is the one that gets governed. Interesting that there are 10 registered in both cases. | 19:39 |
clarkb | `sudo openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key -CAfile /etc/zuul/ssl/ca.pem` is the command btw | 19:39 |
SpamapS | actually IIRC that also counts active jobs, not waiting workers | 19:40 |
SpamapS | clarkb: that's a handy command | 19:40 |
*** thorst has quit IRC | 19:41 | |
mordred | dhellmann, smcginnis: ok - I'm going to fix that- and also a couple of other things for the openstack/releases repo | 19:41 |
clarkb | SpamapS: now it's 369 369 10 | 19:41 |
clarkb | SpamapS: so it's not going off in either direction quickly (but maybe that is a good thing) | 19:41 |
dhellmann | mordred : ok, thanks | 19:41 |
smcginnis | mordred: Thank you! | 19:42 |
SpamapS | clarkb: should be fine to let it go up during busy periods. Just means jobs will sit queued for a while. | 19:43 |
SpamapS | clarkb: but if we're not utilizing nodes.. that's concerning. | 19:44 |
*** yamamoto has quit IRC | 19:44 | |
SpamapS | probably can raise the load limit up. Maybe go to 24.0 by raising the multiplier to 3.0 | 19:44 |
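The arithmetic behind that suggestion can be sketched as a load ceiling equal to a multiplier times the CPU count. The figures 20.0, 24.0 and 3.0 come from the log; the CPU count of 8 and the current multiplier of 2.5 are inferred from them and are assumptions, as are the function names and the use of `<=` for the comparison.

```python
def max_load(cpu_count, multiplier):
    """Load ceiling above which an executor stops accepting new jobs."""
    return cpu_count * multiplier


def should_accept_work(load_1min, cpu_count, multiplier):
    """True while the 1-minute load average is at or below the ceiling."""
    return load_1min <= max_load(cpu_count, multiplier)


# Assumed: 8 CPUs, so 2.5 gives the 20.0 cap and 3.0 gives 24.0.
current_cap = max_load(8, 2.5)
proposed_cap = max_load(8, 3.0)
```

On a live host the 1-minute figure could be read with `os.getloadavg()[0]` before calling `should_accept_work`.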
clarkb | SpamapS: ya let me check current node usage | 19:46 |
fungi | SpamapS: clarkb: i recommend also passing -quiet to s_client if you're not debugging ssl/tls negotiation | 19:46 |
clarkb | 493 in use and 129 not in use | 19:47 |
clarkb | so thats about 2/3 our capacity I think | 19:47 |
sdague | releasenotes not working is a known issue at this point - http://logs.openstack.org/22/490722/11/gate/legacy-releasenotes/c83671e/job-output.txt.gz#_2017-10-02_19_22_20_821466 ? | 19:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Update releases repo to use openstack-python35-jobs template https://review.openstack.org/508978 | 19:47 |
clarkb | sdague: it is, mordred is working to fix it | 19:47 |
clarkb | I think there are changes I should probably review related to that too | 19:47 |
fungi | clarkb: you can also pipe commands into s_client, like so: `echo status|sudo openssl s_client -quiet -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key -CAfile /etc/zuul/ssl/ca.pem 2>/dev/null` | 19:47 |
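Output captured from the `status` command above could be parsed with a sketch like the following. It assumes the Gearman administrative `status` format of one tab-separated line per function (name, total jobs, running jobs, available workers) terminated by a line containing a single `.`; the function `parse_gearman_status` is hypothetical and the sample values echo the `369 369 10` figures quoted earlier in the log.

```python
def parse_gearman_status(text):
    """Parse `status` output into {function: (total, running, workers)}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":
            # Skip blank lines and the terminating "." line.
            continue
        parts = line.split("\t")
        if len(parts) != 4:
            # Ignore anything that does not look like a status row.
            continue
        name, total, running, workers = parts
        result[name] = (int(total), int(running), int(workers))
    return result


sample = "executor:execute\t369\t369\t10\n.\n"
status = parse_gearman_status(sample)
```

Comparing the total and running columns over time gives a rough view of whether jobs are queuing faster than executors pick them up.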
sdague | ok, trying to figure out what kinds of patches are safe to review, and which aren't | 19:48 |
clarkb | fungi: nice | 19:48 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic https://review.openstack.org/508955 | 19:48 |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Switch release-note-jobs project-template to use new jobs https://review.openstack.org/508742 | 19:48 |
clarkb | sdague: I believe that nova and devstack should be generally safe except for use of release notes | 19:48 |
clarkb | tempest I'm not sure | 19:49 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor https://review.openstack.org/508960 | 19:50 |
sdague | clarkb: ok, good enough. Knowing that nova should be good except for release notes is handy | 19:51 |
inc0 | can I get +3 on https://review.openstack.org/#/c/508944/ please? Will make everyone's life better | 19:51 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove un-used legacy-releasenotes job https://review.openstack.org/508766 | 19:51 |
clarkb | inc0: I think it's changes like that that create the memory explosion in zuul? mordred: is that the case? I guess we can't really avoid them | 19:52 |
fungi | clarkb: i don't think we _want_ to avoid them | 19:52 |
mnaser | clarkb sdague https://review.openstack.org/#/c/508763 release note jobs should be fixed anytime now when that change merges | 19:53 |
inc0 | hmm, yeah it would validate *everything* | 19:53 |
clarkb | fungi: well we might be picky about using those that fix bugs first? | 19:53 |
fungi | maybe | 19:53 |
jdandrea | jlk Progress! One of the two changes passed. The other has two failures in the ubuntu-trusty department. https://review.openstack.org/#/c/508924/ | 19:53 |
fungi | clarkb: but also we might should get a few merged before jeblair returns from lunch so he has some memory explosions to analyze | 19:53 |
clarkb | fungi: ha ok, I'll review mordreds for release notes and inc0's | 19:53 |
jlk | jdandrea: was that a misfire? | 19:53 |
clarkb | oh fungi beat me to the mordred release notes fix | 19:54 |
mnaser | clarkb that weirdly enough got the +2 from zuul 13 minutes ago now | 19:55 |
mnaser | but hasnt merged? | 19:55 |
fungi | clarkb: also note i've added Shrews's 508955 fix for branch matching to the priority list on the etherpad. we should try to get that merged soon | 19:55 |
clarkb | mnaser: zuul is processing the event queues I think | 19:55 |
jdandrea | jlk Oh this is interesting. "ERROR! /home/zuul/src/git.openstack.org/openstack-infra/project-config not found" http://logs.openstack.org/24/508924/1/check/legacy-openstackci-beaker-ubuntu-trusty/a1746d2/job-output.txt.gz | 19:55 |
clarkb | mnaser: so need to wait for zuul to catch up then run those jobs | 19:55 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic https://review.openstack.org/508955 | 19:55 |
mnaser | clarkb oh i see | 19:56 |
*** thorst has joined #openstack-infra | 19:56 | |
AJaeger | jdandrea: should be fixed by dmsimard's https://review.openstack.org/508971 | 19:56 |
*** dhill__ has quit IRC | 19:57 | |
*** dhill_ has joined #openstack-infra | 19:58 | |
clarkb | AJaeger: comment on 508706, mordred can you double check my comment there is accurate? | 19:58 |
AJaeger | clarkb: I commented - that step is the complete retiring of repo, so it should get removed from the file, shouldn't it? | 20:00 |
clarkb | AJaeger: oh I read it as how to migrate for some reason | 20:00 |
* clarkb rereads | 20:00 | |
*** thorst has quit IRC | 20:00 | |
clarkb | ah yup I'm wrong, sorry for the noise | 20:01 |
AJaeger | clarkb: better safe than sorry - thanks for reviewing | 20:01 |
*** trown is now known as trown|brb | 20:01 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Simplify TestSchedulerBranchMatcher https://review.openstack.org/508980 | 20:01 |
*** lnxnut has joined #openstack-infra | 20:03 | |
* AJaeger calls it a day and waves good night | 20:04 | |
fungi | thanks for all the help AJaeger! | 20:05 |
fungi | have a good evening | 20:05 |
jdandrea | AJaeger Thanks for that, I'll give it a shot once that merges. | 20:06 |
*** gouthamr has quit IRC | 20:06 | |
clarkb | SpamapS: down to 202 202 10 now (I think because zuul is busy processing results queues and so not generating new jobs yet) | 20:06 |
openstackgerrit | Matthew Treinish proposed openstack-infra/openstack-zuul-jobs master: Add missing projects for openstackci beaker jobs https://review.openstack.org/508981 | 20:08 |
openstackgerrit | Matthew Treinish proposed openstack-infra/puppet-subunit2sql master: Ensure that build_names are unique per project https://review.openstack.org/508258 | 20:09 |
mtreinish | clarkb, fungi: the beaker jobs on ^^^ are still failing, hopefully that'll fix it | 20:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Create static publication base job https://review.openstack.org/508982 | 20:12 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add jobs for special static publication targets https://review.openstack.org/508983 | 20:12 |
mordred | clarkb, pabelanger, fungi, dhellmann, smcginnis: ^^ I think that stack should take care of most of the things around releases repo | 20:13 |
*** dave-mcc_ has joined #openstack-infra | 20:13 | |
dhellmann | mordred : is the idea that eventually the stuff in https://review.openstack.org/#/c/508978/1/zuul.d/projects.yaml will move to an in-tree file? | 20:14 |
mordred | dhellmann: yes - by and large. there will still be some things that will stay in there - like defining that openstack python projects run the openstack-python-jobs template, for instance | 20:15 |
dhellmann | ok | 20:15 |
mordred | dhellmann: also things that need to share a change queue, like the integrated gate, need to keep definitions in project-config ... | 20:16 |
dhellmann | smcginnis and I were talking earlier about how to update the validation that we have in the releases repo to ensure that project repos have a release job attached. It sounds like we'll need to look in 2 places for a while | 20:16 |
mordred | dhellmann: but for things that aren't those two cases, the rest should just migrate to in-repo | 20:16 |
*** dave-mccowan has quit IRC | 20:16 | |
mordred | dhellmann: yah - I believe that is true | 20:16 |
dhellmann | k | 20:16 |
*** dave-mcc_ is now known as dave-mccowan | 20:17 | |
dhellmann | mordred : are all (or some?) of the variables used by https://review.openstack.org/#/c/508982/1/playbooks/publish/static.yaml defined somewhere? | 20:17 |
dhellmann | fileserver.path, for example | 20:18 |
*** windmoon has joined #openstack-infra | 20:18 | |
*** dprince has quit IRC | 20:18 | |
mordred | dhellmann: they are - in this particular case 'fileserver' is defined by the 'add-fileserver' role | 20:18 |
dhellmann | ok, so if I find the docs for that it will explain that it creates that variable? | 20:19 |
mordred | dhellmann: oh - actually - no - that's totally a buggy patch from me :) | 20:19 |
dhellmann | and I guess the "artifacts" directory use is somehow standard? or is that something made up for this job? | 20:19 |
mordred | dhellmann: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/add-fileserver is where the add-fileserver role is - but is also documented here: https://docs.openstack.org/infra/zuul-jobs/roles.html#role-add-fileserver | 20:20 |
*** windmoon has quit IRC | 20:20 | |
dhellmann | aha, ok, I think I came across that earlier and didn't recognize the significance | 20:20 |
mordred | dhellmann: it's halfway in between - all of our fetch-docs roles put docs into artifacts/ on the executor | 20:21 |
dhellmann | I thought those were inputs to the job, not outputs | 20:21 |
* mordred fixing patches | 20:21 | |
*** lnxnut has quit IRC | 20:21 | |
*** jascott1 has quit IRC | 20:22 | |
*** jascott1 has joined #openstack-infra | 20:23 | |
*** gouthamr has joined #openstack-infra | 20:25 | |
*** jascott1 has quit IRC | 20:25 | |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Create static publication base job https://review.openstack.org/508982 | 20:26 |
*** jascott1 has joined #openstack-infra | 20:26 | |
mordred | dhellmann: ok - that ^^ might read a little better | 20:26 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Fix Gearman UnknownJob handler https://review.openstack.org/508992 | 20:26 |
*** ccamacho has joined #openstack-infra | 20:27 | |
openstackgerrit | Mohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Switch puppet unit tests base job https://review.openstack.org/508994 | 20:31 |
mnaser | ^ small one liner to change a base job to fix things temporarily till we move things over ^ | 20:32 |
*** jungleboyj has joined #openstack-infra | 20:32 | |
jungleboyj | Afternoon oh Infra Gods. We have a patch that can fix many of the problems we are seeing with Cinder but for some reason Zuul isn't picking it up. https://review.openstack.org/#/c/508541/ | 20:33 |
jungleboyj | Any guidance you can provide? | 20:34 |
mnaser | jungleboyj things are a bit slow in zuul land esp around reconfigs and what not | 20:34 |
mnaser | expect delays in things showing up in the queue | 20:35 |
clarkb | jungleboyj: zuul is busy with "config generation and an object graph walk" according to #zuul | 20:35 |
jungleboyj | mnaser: Ok, even of an hour plus? | 20:35 |
mnaser | yes, i've seen it just over an hour and it'll click and pop in | 20:35 |
fungi | _though_ zuul has never voted on that change | 20:35 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Add jobs for special static publication targets https://review.openstack.org/508983 | 20:35 |
fungi | will it actually act on an approval vote if there was never a verify +1 from zuul? | 20:35 |
smcginnis | Should be a full new patch now. | 20:36 |
fungi | or does it need a recheck first? | 20:36 |
smcginnis | check and gate. | 20:36 |
fungi | oh. yep new patchset at 19:55 | 20:36 |
*** Goneri has quit IRC | 20:36 | |
fungi | so right, it's probably just in the event queue still | 20:36 |
jungleboyj | fungi: Yep. Did that to add the bug and try to give it a kick. | 20:36 |
jungleboyj | Will just keep watching for it. | 20:37 |
*** thorst has joined #openstack-infra | 20:37 | |
*** bnemec has joined #openstack-infra | 20:37 | |
jeblair | i think we should restart it again | 20:37 |
*** trown|brb is now known as trown | 20:37 | |
fungi | jeblair: do you think the objgraph calls are killing it? | 20:37 |
jeblair | fungi: likely a contributing factor this time at least | 20:38 |
fungi | we only got up around 5gib ram used since the last restart | 20:38 |
jeblair | fungi: it's doing that and making a dynamic config at the same time. that's been going on for about 40m now | 20:38 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add openstack-tox-validate job https://review.openstack.org/508996 | 20:39 |
fungi | if you think we should restart it again, i won't argue. just don't want to make it harder for you to debug | 20:39 |
* mnaser much rather have a slow moving gate today but useful information to solve things for the long term | 20:40 | |
jeblair | i'm not getting any useful info right now :| | 20:40 |
jeblair | the ones that returned before it got really slow provided no info :( | 20:40 |
fungi | should we get Shrews' branch matcher fix pulled locally before restarting? | 20:40 |
jeblair | good idea | 20:41 |
mnaser | jeblair ouch, this is really difficult, i ran into this the other day - https://pypi.python.org/pypi/mem_top -- maybe could be interesting (if you haven't already run into it)? | 20:41 |
mnaser | the funny thing is they caught a gearman related memory leak as well with it :-P | 20:41 |
mnaser | surprisingly close to home.. maybe if it has any usefulness, it can be added pre-restart? | 20:41 |
mnaser | maybe after every dynamic reconfig or reconfig it can throw a logging.debug(mem_top()) ... just trying to throw some ideas that could be of help | 20:42 |
jeblair | mnaser: that's like objgraph.show_most_common_types() or objgraph.show_growth() which i added to zuul's sigusr2 debug handler over the weekend | 20:42 |
*** esberglu has quit IRC | 20:43 | |
mnaser | jeblair ah i see | 20:43 |
jeblair | mnaser: the things i'm doing now are asking objgraph to show me the reference chain to objects that i don't think should be in memory any more. i'm still coming up empty. | 20:43 |
mnaser | so technically according to objgraph the memory consumption is "normal" and everything has a refcount of non zero i guess? | 20:44 |
mnaser | how hard would it be to wire up a zuul instance or a test case that just reruns dynamic reconfig using openstack repos over and over again? | 20:45 |
jeblair | mnaser: yeah, the 'graph' part of it is that it helps find where you have references to objects still sitting around | 20:45 |
harlowja | python and dynamic debugging == the suck, lol | 20:45 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Remove centrally defined jobs for openstack/releases https://review.openstack.org/508998 | 20:45 |
jeblair | mnaser: i already tried doing this locally. i have not figured out the trigger yet. | 20:45 |
Shrews | fungi: jeblair: my change does pass the TestScheduler suite of tests (and my newly added test), but i did not verify it against the entire test suite. so *fairly* confident it fixes more than it breaks | 20:45 |
jeblair | Shrews: any chance you can run the whole suite real quick? | 20:46 |
mnaser | jeblair ouch, this is a real tricky one :( | 20:46 |
Shrews | jeblair: sure | 20:46 |
harlowja | https://github.com/mgedmin/dozer#tracking-down-memory-leaks may also be fun | 20:46 |
jeblair | we all want to say it has something to do with configuration, but every simple theory i've examined hasn't held up based on logs. | 20:46 |
harlowja | if anyone wants to try that too | 20:46 |
Shrews | jeblair: running now | 20:46 |
mordred | dhellmann, smcginnis: https://review.openstack.org/508998 is at the end of a depends-on chain that should, best I can tell, have your repo completely migrated | 20:46 |
Shrews | ugh, i have no mysql | 20:47 |
mnaser | jeblair i'd love to give a hand, if there are some instructions/docs on how to wire up a zuul instance locally (connected to openstack repos, etc), i'd gladly throw a hand | 20:47 |
jeblair | it probably does, but maybe it's more like "a syntax error in dynamic reconfiguration for re-enqueuing a change that failed once already after a tenant reconfig" or something crazy like that. :) | 20:47 |
SpamapS | There's likely a piece of the old config that gets carried into the new one. | 20:48 |
jeblair | so i'm still trying to extract "what's happening in production" out of this | 20:48 |
SpamapS | Or into running events | 20:48 |
*** jkilpatr has quit IRC | 20:48 | |
SpamapS | but that's going to take a ton of digging to find | 20:48 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Remove legacy static- and releases- jobs https://review.openstack.org/508999 | 20:48 |
mnaser | well the only thing i guess we've all noticed is the memory usage goes up when zuul is stuck and (from what i understand) it seems to be in a state of dynamic reconfig during that memory growth | 20:49 |
openstackgerrit | Brian Haley proposed openstack-infra/devstack-gate master: Support an IPv6 underlay network https://review.openstack.org/343041 | 20:49 |
SpamapS | Right, so it's building the new tree, and not letting go of the old one. Possibly even making copies of objects for transformation purposes that get kept around deep in some object. | 20:50 |
*** hashar has quit IRC | 20:50 | |
jeblair | on sunday zuul performed 100 dynamic reconfigs from 0900 to 1520 with no increase in memory usage. | 20:50 |
SpamapS | That's why I suspect it may be tied to ongoing events. | 20:50 |
fungi | not necessarily. it's been "stuck" for the last 45-ish minutes doing a dynamic reconfiguration and ram is holding fairly steady | 20:50 |
SpamapS | As in, a piece of config gets carried along forward with a long running event | 20:50 |
jeblair | SpamapS: yep. that's what i'm trying to find | 20:51 |
SpamapS | And until the job finishes, no free for you | 20:51 |
clarkb | jeblair: that is an interesting statistic (I didn't know that) | 20:51 |
jeblair | so earlier, there were 77 layouts in memory, but i could only find 19 of them by trawling buildsets. so what's holding on to the other 58? | 20:52 |
mnaser | would it be hard to extract logs of # of events queued by type and then graph that on top of the memory usage graphs we have? | 20:52 |
jeblair | that was the question i was trying to answer before the last restart. | 20:52 |
jeblair | and "what was holding on to the other 7" was the question i was trying to answer this time. :) | 20:52 |
mordred | jeblair: ++ | 20:52 |
mordred | jeblair: out of curiosity - do you know what the resident memory size is after constructing the initial config but before any event processing or reconfigurations? | 20:52 |
SpamapS | jeblair: maybe we should hack in a uuid for each config generation (or maybe just "all the shas hashed") and then you could look for that string in all the other objects. | 20:52 |
jeblair | mordred: not yet; it's on my list to find out | 20:53 |
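
The "all the shas hashed" idea from SpamapS could be sketched roughly as follows. This is a minimal illustration, not anything that exists in zuul: the function name `config_generation_id` and the assumption that a config generation is described by a collection of commit sha strings are hypothetical. Sorting the shas first makes the identifier independent of the order in which repos were read, so the resulting string can be searched for in other objects.

```python
import hashlib


def config_generation_id(shas):
    """Return a stable hex identifier for one config generation.

    `shas` is assumed (hypothetically) to be an iterable of commit sha
    strings, one per repo that contributed configuration.
    """
    digest = hashlib.sha256()
    for sha in sorted(shas):
        digest.update(sha.encode("ascii"))
        # Separator so that ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(b"\n")
    return digest.hexdigest()
```

Tagging every layout with such an identifier at build time would let a debugging session group lingering objects by the generation that produced them.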
mnaser | if the logs arent sensitive and include when an event is queued, i would volunteer to try and graph out different type of events queue'd over the memory usage recorded (if this is beneficial information?) | 20:54 |
*** caphrim007 has quit IRC | 20:54 | |
SpamapS | jeblair: oh also, have you checked that the layouts hanging around don't run into any of the gc cycle fails? | 20:54 |
jeblair | SpamapS: no; can you elaborate? | 20:54 |
*** caphrim007 has joined #openstack-infra | 20:54 | |
SpamapS | I recall python3.4+ have most of them resolved, but a few scenarios still cause self-referential objects to be un-gc'able | 20:54 |
SpamapS | jeblair: I will go get some backing info. | 20:55 |
Shrews | jeblair: test_scheduler.py and test_v3.py all pass. not sure which of the tests are requiring mysql, so i didn't run the entire suite. if we have time to wait, i can install the db and get it setup for testing | 20:55 |
*** esberglu has joined #openstack-infra | 20:55 | |
jeblair | SpamapS: ugh. i know we have a couple of cycles, but am expecting them to be okay. if that's not something we can assume that may be worth checking. | 20:55 |
*** hichihara has joined #openstack-infra | 20:55 | |
SpamapS | jeblair: Yeah I remember seeing a pycon talk specifically about the ones that are left and how it's kind of hard to get into that situation. But.. we're REALLY good at getting into nigh-impossible situations ;) | 20:56 |
dhellmann | jeblair, SpamapS : it's pretty hard in 3.5 to have objects that can't be collected. even cycles seem to need a reference from the outside, now. | 20:56 |
dhellmann | I had to do a lot of work to rewrite that section of pymotw for the gc module. | 20:56 |
SpamapS | dhellmann: zomg you are the right person to discuss this with. My swiss cheese brain can't remember the specific situations that can still get you into an un-collectable but seemingly unreferenced object. | 20:56 |
SpamapS | This was like, 2014 so.. ugh | 20:57 |
jeblair | SpamapS, dhellmann: is there a way to check? like, if i have a reference to an object i think should be collected, can i ask the gc about it? | 20:57 |
dhellmann | https://pymotw.com/3/gc/#finding-references-to-objects-that-cannot-be-collected | 20:57 |
*** ijw has quit IRC | 20:58 | |
SpamapS | There you go | 20:58 |
dhellmann | there are a couple of different scenarios on that page for showing what the gc has that might be interesting if you're debugging a memory leak | 20:58 |
SpamapS | also isn't there a way to basically tell python to try harder? | 20:58 |
jeblair | Shrews: i see failures for tests.unit.test_change_matcher.TestBranchMatcher.test_matches_returns_true_on_matching_ref and tests.unit.test_change_matcher.TestBranchMatcher.test_matches_returns_false_for_missing_attrs | 20:58 |
*** mat128 has quit IRC | 20:59 | |
dhellmann | the python 2 version of that had a lot more cases, but I couldn't get all of them to "work" in the same way for 3 | 20:59 |
SpamapS | gc.set_threshold() or something? | 20:59 |
jeblair | Shrews: they may just be unit tests that need updates | 20:59 |
*** trown is now known as trown|outtypewww | 20:59 | |
Shrews | jeblair: looking | 20:59 |
dhellmann | SpamapS : yeah, scroll down the page a bit to the next section and it shows how to use that | 20:59 |
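
A minimal sketch of "asking the gc" about an object that should have been collected, using only the standard-library `gc` module. The `Layout` class and the `registry` dict are hypothetical stand-ins, not zuul code: the sketch forces a full collection with `gc.collect()`, counts the instances the collector still tracks, and then uses `gc.get_referrers()` to list whatever is holding the survivor.

```python
import gc


class Layout:
    """Hypothetical stand-in for an object that should have been freed."""

    def __init__(self, name):
        self.name = name


def live_instances(cls):
    """Return every instance of cls that the collector is still tracking."""
    return [obj for obj in gc.get_objects() if type(obj) is cls]


def find_holders(target, ignore=()):
    """Return the objects that refer directly to target.

    Containers listed in `ignore` (for example our own result lists) are
    skipped so they do not show up as false positives.
    """
    ignored_ids = {id(obj) for obj in ignore}
    return [ref for ref in gc.get_referrers(target) if id(ref) not in ignored_ids]


def demo():
    layouts = [Layout("layout-%d" % i) for i in range(3)]
    registry = {"current": layouts[0]}  # the lingering reference we want to find
    del layouts
    gc.collect()  # force a full collection before counting survivors
    survivors = live_instances(Layout)
    holders = find_holders(survivors[0], ignore=[survivors])
    return survivors, registry, holders
```

In a real process the interesting part is repeating `find_holders` on each holder in turn, which is the reference-chain walk that objgraph automates.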
*** dave-mccowan has quit IRC | 21:01 | |
*** srobert_ has quit IRC | 21:01 | |
*** rockyg has joined #openstack-infra | 21:01 | |
SpamapS | dhellmann: thanks for the assist. | 21:01 |
SpamapS | I have to run do non-infra things for a bit | 21:01 |
dhellmann | SpamapS, jeblair : I have to drop off, but I hope that's helpful | 21:01 |
jeblair | dhellmann: thanks | 21:02 |
jeblair | looks like zuul has resumed work | 21:04 |
fungi | is the objgraph dump still in progress i guess? | 21:05 |
jeblair | yes | 21:05 |
jeblair | i think it may be finished now | 21:06 |
fungi | ooh! | 21:06 |
mnaser | can i get a very short and quick +W here for puppet unit tests? https://review.openstack.org/#/c/508994/ (sorry for asking so much lately :() | 21:08 |
clarkb | mnaser: done | 21:09 |
mnaser | clarkb tyvm | 21:09 |
* clarkb is waiting on those three project-config changes to merge | 21:10 | |
*** dhill_ has quit IRC | 21:11 | |
*** dhill_ has joined #openstack-infra | 21:11 | |
clarkb | so release notes should work now? and pypi releasing fix is on its way. Infracloud will be turned off until we can sort out its ssh problems | 21:12 |
mordred | clarkb: releasenotes and pypi-releasing will work as soon as their patches land | 21:12 |
clarkb | mordred: I think releasenotes landed | 21:12 |
clarkb | 93 nodes in use now | 21:12 |
mordred | https://review.openstack.org/#/c/508951/ is publish-to-pypi | 21:12 |
clarkb | mordred: ya that one is in the gate with test passed, just needs to merge | 21:13 |
mordred | woot | 21:13 |
*** jkilpatr has joined #openstack-infra | 21:13 | |
mordred | clarkb: https://review.openstack.org/#/c/508763 and https://review.openstack.org/#/c/508769 apply them | 21:13 |
mordred | the new releasenotes jobs | 21:13 |
*** ccamacho has quit IRC | 21:14 | |
Shrews | jeblair: ok, apparently change.ref can be None in the tests. i'll add that check back in | 21:14 |
mordred | clarkb: releasenotes jobs themselves need a new reno release, which needs the publish-to-pypi fixes landed | 21:14 |
clarkb | mordred: 8763 may need to be reenqueued to the gate? it says ready to submit but didn't actually get submitted | 21:14 |
jeblair | Shrews: i think it may never be none anymore; maybe double check that and alter the tests? | 21:14 |
*** edmondsw has quit IRC | 21:14 | |
inc0 | https://review.openstack.org/#/c/508661/ <- these two jobs just hung on some random line...is it possibly related to zuul memory explosion? | 21:15 |
jeblair | Shrews: (this is why i don't like unit tests) | 21:15 |
mordred | clarkb: let's wait til the pypi jobs thing lands | 21:15 |
mordred | clarkb: since the jobs will be bork either way without new reno | 21:15 |
mordred | clarkb: then once new reno is cut - we can either retrigger or just click the merge button (since it did actually get its +2) | 21:16 |
openstackgerrit | Merged openstack-infra/project-config master: Disable infracloud https://review.openstack.org/508969 | 21:16 |
openstackgerrit | Merged openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing https://review.openstack.org/508951 | 21:16 |
mordred | \o/ | 21:16 |
clarkb | mordred: oh its because its based on a non current patchset | 21:16 |
Shrews | jeblair: well the test is _explicitly_ checking what happens when ref is None (test_matches_returns_false_for_missing_attrs), so maybe delete the test? i try to never do that, but maybe it makes sense here? | 21:16 |
clarkb | mordred: so you'll need to rebase those two I think | 21:16 |
mordred | clarkb: ah. that. kk | 21:16 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Switch jobs to use new release notes job https://review.openstack.org/508763 | 21:16 |
openstackgerrit | Monty Taylor proposed openstack-infra/project-config master: Collapse releasenotes jobs to using project template https://review.openstack.org/508769 | 21:16 |
jeblair | Shrews: yes, i think that's the correct thing to do if there are no subclasses of Ref without a ref | 21:16 |
mordred | clarkb: rebased | 21:16 |
clarkb | mordred: should I reapprove? the +2's carried over | 21:18 |
jlvillal | So we keep trying rechecks on this patch to openstack-infra/openstack-zuul-jobs and the POST_FAILURE message seems to jump between jobs between rechecks: https://review.openstack.org/#/c/508882 | 21:18 |
mordred | clarkb: yah - might as well - it isn't going to break WORSE than it is right now | 21:18 |
*** ldnunes has quit IRC | 21:18 | |
*** lnxnut has joined #openstack-infra | 21:19 | |
jlvillal | Is the POST_FAILURE a known issue? | 21:19 |
jeblair | Shrews: those are the only test failures for your change | 21:19 |
mordred | jlvillal: oh - clarkb and I were looking at that patch and now I forget where we got | 21:19 |
clarkb | jlvillal: backing up a step I think that is the wrong fix for the problem | 21:19 |
Shrews | jeblair: ack | 21:19 |
clarkb | mordred: I think we need to check that BUILD_TIMEOUT is actually in the build env | 21:19 |
jlvillal | mordred: Thanks | 21:19 |
Shrews | thx | 21:19 |
jlvillal | clarkb: Okay. We aren't picky on how it is fixed. We would just like it fixed :) | 21:19 |
*** chlong_ has quit IRC | 21:20 | |
*** chlong has quit IRC | 21:20 | |
clarkb | jlvillal: basically BUILD_TIMEOUT should already be there | 21:20 |
clarkb | jlvillal: do you have an example log where a job failed because it wasn't? | 21:20 |
* jlvillal looks | 21:20 | |
clarkb | jlvillal: and yes as far as the POST_FAILURES go we are now throttling zuul executors based on load so they should be better at a whole bunch of things including servicing those ssh connections | 21:21 |
jlvillal | clarkb: Here is a failed job: https://review.openstack.org/#/c/505837/5 | 21:22 |
mordred | jlvillal: the legacy-grenade-dsvm-ironic one? | 21:22 |
jlvillal | http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_10_45_58_174719 | 21:22 |
jlvillal | mordred: yes | 21:22 |
clarkb | jlvillal: that is a change :) is http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/ ya ok that thanks | 21:22 |
smcginnis | mordred: Should we be OK on releases? Or are there other patches still outstanding? | 21:23 |
mordred | k. so - http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/zuul-info/inventory.yaml shows zuul.timeout to be 10800 | 21:23 |
smcginnis | mordred: Wasn't sure if we should get this through first: https://review.openstack.org/#/c/508997/ | 21:24 |
mordred | dmsimard: you don't magically collect environment variables passed to shell tasks in ara do you? | 21:25 |
clarkb | mordred: jlvillal declare -x DEVSTACK_GATE_TIMEOUT="110" is in reproduce.sh for that job | 21:25 |
mordred | smcginnis: no - that's just cleaning things up - to my knowledge all of the patches needed for releasing things to work should be in place | 21:25 |
clarkb | mordred: jlvillal the buffer is 10, so that would imply a BUILD_TIMEOUT of 120 (not 180 which is what inventory.yaml says) | 21:26 |
mordred | clarkb: I see BUILD_TIMEOUT being set to 120 in shell output | 21:27 |
smcginnis | mordred: Excellent, thanks. | 21:27 |
clarkb | mordred: in job-output? | 21:27 |
jlvillal | clarkb: mordred: So we had this before: https://github.com/openstack-infra/project-config/blob/master/jenkins/jobs/ironic.yaml#L36-L37 | 21:27 |
mordred | http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_09_08_09_421108 | 21:27 |
clarkb | jlvillal: those are independent, those apply to specific tests not the entire job | 21:27 |
mordred | yes - I see those in the current output | 21:27 |
clarkb | mordred: jlvillal I think those are unrelated | 21:28 |
mordred | I agree | 21:28 |
clarkb | you can see at http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_09_09_27_571148 that the job timeout is set properly at some point | 21:29 |
clarkb | well at least as a number and not a raw var name | 21:30 |
*** rloo has quit IRC | 21:30 | |
clarkb | (but that number isn't what we expect) | 21:30 |
jlvillal | Okay, yeah I'm not sure where 104 comes from | 21:30 |
clarkb | jlvillal: its 120 - 10 = 110, then that is ~6 minutes into the job so 104 | 21:30 |
jlvillal | Oh, I think it subtracts how long the job has been running for up to this point | 21:30 |
clarkb | yup | 21:30 |
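
The arithmetic behind the 104 figure, written out as a small sketch. The function and parameter names are hypothetical; the numbers (a BUILD_TIMEOUT of 120, a buffer of 10, and roughly 6 minutes already elapsed) are the ones quoted in the discussion, all taken to be in minutes.

```python
def remaining_job_minutes(build_timeout, cleanup_buffer, elapsed):
    """Minutes left for the job body; all arguments are in minutes."""
    return build_timeout - cleanup_buffer - elapsed


# Budget at the start of the job: 120 minus the 10 minute buffer.
budget = remaining_job_minutes(build_timeout=120, cleanup_buffer=10, elapsed=0)

# Roughly 6 minutes into the job, which is where the log line appears.
left = remaining_job_minutes(build_timeout=120, cleanup_buffer=10, elapsed=6)
```

With the 180 minute timeout the job was expected to have, the same calculation at the 6 minute mark would leave 164 minutes.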
clarkb | ok looking at the output again its totally set and timeout is doing what we expect | 21:31 |
*** lnxnut has quit IRC | 21:31 | |
clarkb | the problem is the timeout value is too short | 21:31 |
mnaser | jeblair just got another "Job base not defined" on https://review.openstack.org/#/c/508994/ | 21:31 |
clarkb | so exporting a raw BUILD_TIMEOUT is likely not going to change any of that | 21:31 |
jlvillal | The job starts at 9:03 and then dies at 10:45. That isn't 120 minutes. | 21:31 |
jlvillal | If the timeout is in minutes. | 21:31 |
mnaser | Shrews ^ maybe that helps as a hint of a possibly .. problem (I guess) | 21:31 |
mordred | clarkb: so we should toss in an 'env' call at the top of one of the shell snippets to verify what's getting passed in to the job | 21:32 |
Shrews | mnaser: eh? | 21:32 |
clarkb | jlvillal: correct its shorter | 21:32 |
mnaser | Shrews got a syntax error that job base is not defined ... maybe that might help hint towards memory / event issues | 21:32 |
clarkb | jlvillal: because we intentionally subtract cleanup overhead from that number | 21:32 |
Shrews | mnaser: oh, maybe. i'm not tracking that issue atm :) | 21:32 |
mnaser | okay :> | 21:33 |
clarkb | jlvillal: that is however ~104 minutes from when the job reports the job timeout is 104 minutes | 21:33 |
clarkb | jlvillal: so it is functioning properly in that regard | 21:33 |
*** rhallisey has joined #openstack-infra | 21:33 | |
clarkb | jlvillal: the problem here is that job should have a timeout of 180 minutes not 120 minutes | 21:33 |
*** jascott1 has quit IRC | 21:33 | |
clarkb | ah ok the default BUILD_TIMEOUT is 120 | 21:33 |
Shrews | jeblair: fyi, going to squash that test simplification change on the branch fix | 21:33 |
jlvillal | clarkb: okay | 21:33 |
*** jascott1 has joined #openstack-infra | 21:33 | |
*** eharney has quit IRC | 21:34 | |
clarkb | ok I see the problem (maybe) | 21:35 |
*** jascott1 has quit IRC | 21:35 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment for legacy-grenade-dsvm-ironic https://review.openstack.org/509007 | 21:35 |
clarkb | mordred: timeout is in seconds | 21:35 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment for legacy-grenade-dsvm-ironic https://review.openstack.org/509007 | 21:35 |
clarkb | mordred: but it should be in milliseconds? | 21:35 |
mordred | clarkb: the filter plugin does that | 21:35 |
mordred | params['BUILD_TIMEOUT'] = str(int(zuul['timeout']) * 1000) | 21:35 |
clarkb | mordred: ah ok | 21:35 |
mordred | clarkb, jlvillal: ^^ if we depends-on an ironic change to https://review.openstack.org/509007, we can see what is actually getting put into the environment | 21:36 |
clarkb | then ya printing the env is likely best next step | 21:36 |
mordred | clarkb: honestly - putting in a env in a step like that in the legacy-base pre playbook might not be a terrible idea for while we're debugging things like that | 21:37 |
jlvillal | mordred: clarkb: I can spin up a test ironic patch | 21:37 |
clarkb | mordred: ya | 21:37 |
*** claudiub|2 has joined #openstack-infra | 21:37 | |
*** jcoufal has quit IRC | 21:37 | |
jlvillal | mordred: So just make a dummy patch to Ironic and depend on the patch https://review.openstack.org/#/c/509007/ ? | 21:37 |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic https://review.openstack.org/508955 | 21:38 |
*** ijw_ has joined #openstack-infra | 21:38 | |
*** ijw_ has quit IRC | 21:38 | |
*** ijw has joined #openstack-infra | 21:38 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment in pre playbook of legacy-base job https://review.openstack.org/509009 | 21:38 |
mordred | clarkb: there ya go ^^ | 21:39 |
Shrews | jeblair: there ya go ^^^ all test_v3, test_scheduler, and test_change_matcher tests passing, and pep8 | 21:39 |
mordred | jlvillal: yes please | 21:39 |
clarkb | mordred: does that zuul_legacy_vars thing in the filter module get flattened? | 21:39 |
* Shrews squashes mordred's "there ya go" because he can | 21:39 | |
jlvillal | mordred: clarkb: Test patch is at: https://review.openstack.org/#/c/509010/ | 21:39 |
*** jascott1 has joined #openstack-infra | 21:40 | |
mordred | Shrews: lgtm | 21:41 |
openstackgerrit | Michal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs https://review.openstack.org/508944 | 21:41 |
inc0 | can I get re +3 on ^ please? | 21:41 |
*** rhallisey has quit IRC | 21:42 | |
inc0 | also mordred plz check if the publisher jobs I removed are correct | 21:42 |
inc0 | https://review.openstack.org/#/c/508944/2..3/zuul.d/projects.yaml mordred | 21:43 |
*** askb has joined #openstack-infra | 21:44 | |
inc0 | well looks good, hopefully I didn't break our release mechanism;) | 21:44 |
mordred | inc0: yes- that's correct | 21:44 |
inc0 | thanks | 21:44 |
mordred | inc0: we just landed https://review.openstack.org/508951 - so it's possible you might have to rebase if git complains - but since you're removing the same thing the base patch is I think it should just work | 21:44 |
mtreinish | fungi, clarkb, mordred, jeblair: do you know what the syntax error on: https://review.openstack.org/#/c/508981/ is coming from? | 21:45 |
mordred | inc0: oh - wait - I grok what you're showing me now ... | 21:45 |
inc0 | well I just rebased | 21:45 |
mordred | inc0: yes -that is the correct resolution of that rebase | 21:45 |
inc0 | cool, thanks | 21:45 |
mordred | mtreinish: no I do not - and that is fairly disturbing :( | 21:46 |
jeblair | mtreinish: not yet; mnaser pointed out that bug yesterday. it's definitely a zuul bug and you should recheck. | 21:46 |
* jeblair adds to etherpad | 21:46 | |
smcginnis | Seeing a lot of POST_FAILURES now. | 21:47 |
eumel8 | just wondering, there are no more translation sync jobs running since last Friday: http://status.openstack.org/openstack-health/#/g/build_queue/periodic?groupKey=build_queue&searchJob=translation | 21:47 |
*** hichihara has quit IRC | 21:48 | |
jlk | the translation jobs are currently broken, and we've got new jobs proposed to fix them | 21:49 |
jlk | but it'll be an iterative process. | 21:49 |
*** jascott1 has quit IRC | 21:50 | |
*** Swami has quit IRC | 21:51 | |
*** jascott1 has joined #openstack-infra | 21:51 | |
eumel8 | ok, something where I can help? | 21:51 |
mordred | jeblair: did you wind up getting data from the objdump? | 21:51 |
mordred | s/objdump/objgraph/ | 21:52 |
jeblair | mordred: still nothing useful | 21:52 |
clarkb | mordred: in environment: '{{ zuul | zuul_legacy_vars }}' what is zuul | doing? | 21:53 |
jlk | eumel8: probably not yet. :( There is https://review.openstack.org/#/c/502208 and https://review.openstack.org/#/c/502207 as the start, then later I think we'll have to update all the jobs to make use of them. | 21:53 |
mordred | clarkb: | applies a jinja filter | 21:53 |
*** jascott1 has quit IRC | 21:53 | |
mordred | clarkb: so that's saying "apply the filter zuul_legacy_vars to the variable zuul" | 21:53 |
eumel8 | jlk: thx, will take a look | 21:54 |
clarkb | mordred: gotcha, so that filter is basically returning just the bits we want in the environment from the large zuul dict | 21:56 |
jeblair | mordred: well, let me rephrase that to 'no smoking gun yet' | 21:56 |
jlk | clarkb: it takes the large zuul var, runs some python on it to transpose it the way we need it, and re-exposes it. | 21:56 |
jlk | or a subset of it | 21:56 |
*** jascott1 has joined #openstack-infra | 21:57 | |
mtreinish | jeblair: ok, will do | 22:02 |
*** jascott1 has quit IRC | 22:02 | |
*** jascott1 has joined #openstack-infra | 22:02 | |
*** rcernin has quit IRC | 22:02 | |
*** rockyg has quit IRC | 22:05 | |
*** baoli has quit IRC | 22:05 | |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add openstack-tox-validate job https://review.openstack.org/508996 | 22:06 |
jlk | eumel8: is there a particular project that we could use as a test case to see if we can make the jobs work? | 22:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/openstack-zuul-jobs master: Remove legacy static- and releases- jobs https://review.openstack.org/508999 | 22:06 |
*** jascott1 has quit IRC | 22:06 | |
*** bobh_ has quit IRC | 22:07 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Move shadow layout to item https://review.openstack.org/509014 | 22:07 |
*** thiagolib has joined #openstack-infra | 22:09 | |
mordred | infra-root: I +2'd ^^ but didn't +3 in case anyone else wanted to read/review | 22:11 |
ianw | sorry, so much scrollback ... should i be debugging why a recheck on 508367 hasn't made it into the queue in ~15 minutes? | 22:11 |
smcginnis | ianw: I saw one that took over an hour. | 22:12 |
clarkb | ianw: it's because zuul is very slowly doing dynamic reconfigurations | 22:12 |
eumel8 | jlk: searchlight-ui https://review.openstack.org/#/c/504768/ it's a small one with priority low | 22:12 |
clarkb | you'll notice the queues on the static page grow quite large then after time drop | 22:13 |
*** Swami has joined #openstack-infra | 22:13 | |
clarkb | (looks like they just dropped \o/ ) | 22:13 |
jlk | ah | 22:13 |
*** xyang1 has quit IRC | 22:13 | |
ianw | yep and there it is | 22:13 |
*** ihrachys has quit IRC | 22:14 | |
ianw | ok, i'm just going to try and keep an eye on etherpad etc and try and help if i can ... but i've been out for 3 days so mostly i'll just try to not make things worse :) | 22:15 |
jlk | hrm, I wonder if we could make this job depend on the other job that is trying to add the translation jobs | 22:15 |
*** thorst has quit IRC | 22:15 | |
jlk | or if we just have to land the translation jobs and then iterate on them. | 22:16 |
*** thorst has joined #openstack-infra | 22:18 | |
eumel8 | I'm not completely through 502208. there's a lot of stuff in it | 22:19 |
*** jascott1 has joined #openstack-infra | 22:20 | |
jlk | yeah, took a while to tear the old job apart to build the new ones | 22:20 |
*** thorst has quit IRC | 22:22 | |
*** jascott1 has quit IRC | 22:22 | |
eumel8 | but good to see someone is still working on it and the topic didn't get lost :) | 22:25 |
SpamapS | interesting.. kind of looks like the executors are still swapping a little, even with the reduced concurrency | 22:27 |
*** lnxnut has joined #openstack-infra | 22:28 | |
eumel8 | zuul syntax error: Job base not defined. What could have happened after recheck? | 22:29 |
jlk | mordred: I don't really follow what's going on with the requirements-check job. Looks like it's first getting moved into openstack/requirements, and then moved into openstack-zuul-jobs | 22:30 |
jlk | or is it different, you're first moving it to openstack-zuul-jobs, and THEN moving it to openstack-requirements? | 22:30 |
*** thorst has joined #openstack-infra | 22:31 | |
clarkb | looks like node_failure means a node request failed? | 22:32 |
*** ijw has quit IRC | 22:33 | |
clarkb | Shrews: http://paste.openstack.org/show/622511/ | 22:34 |
clarkb | is that a known issue? | 22:34 |
*** thorst has quit IRC | 22:35 | |
clarkb | 804 nodes are in use right now | 22:35 |
clarkb | which is a significant chunk of our quota | 22:36 |
clarkb | SpamapS: ^ we may not need to scale up the governor at all. I think the reconfigures have been slowing down job assignments more than anything else | 22:36 |
*** lbragstad has quit IRC | 22:36 | |
Shrews | clarkb: first i've seen that. | 22:37 |
SpamapS | clarkb: Looking at the system graphs, it looks like zuul executors are going to be memory bound more than anything. | 22:37 |
*** rlandy is now known as rlandy|brb | 22:37 | |
*** jascott1 has joined #openstack-infra | 22:38 | |
Shrews | clarkb: i did notice a few moments ago (before i started dinner) that nodepool had *many* outstanding requests, but also had many fulfilled requests that weren't going away quickly. | 22:38 |
SpamapS | the swapping we saw was the main source of load AFAICT. Tons of CPU available still right now, but not much room for more user memory usage when you take buffers and cache into account, which seem to be important given all the network and pipes Zuul does. | 22:38 |
*** gouthamr has quit IRC | 22:38 | |
* Shrews returns to dinner | 22:38 | |
*** lnxnut has quit IRC | 22:38 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add required-projects to logstash-filters and openstackci-beaker jobs https://review.openstack.org/508971 | 22:40 |
clarkb | up to 896 nodes in use according to nl02 | 22:42 |
sc` | huzzah. chef gates finally passed | 22:43 |
jeblair | i've started to look into the 'base not defined' error because it's not far from where i've been focusing on memory usage and think that understanding it may help | 22:44 |
*** nicolasbock_ has quit IRC | 22:45 | |
*** tpsilva has quit IRC | 22:47 | |
eumel8 | jeblair: you can look into https://review.openstack.org/#/c/508857/2 | 22:47 |
jeblair | eumel8: i know it is a zuul bug (not an actual error), and a recheck may clear it | 22:48 |
eumel8 | ok, I triggered the recheck. the last one took up to 75 minutes | 22:52 |
*** claudiub|2 has quit IRC | 22:52 | |
*** rbrndt has quit IRC | 22:54 | |
clarkb | mordred: is pip install -e just completely broken if you don't have a remote in your git config? http://logs.openstack.org/92/489492/14/gate/legacy-tempest-dsvm-cells/ed2d586/job-output.txt.gz#_2017-10-02_22_38_37_663923 /me experiments locally | 22:56 |
*** apevec has quit IRC | 22:56 | |
fungi | clarkb: check pip freeze... that's the biggest difference that i could find | 22:56 |
fungi | it doesn't show a git url, and instead includes a comment line about not finding any remote named origin | 22:57 |
openstackgerrit | Merged openstack-infra/infra-manual master: Update retire instructions for Zuul 3 https://review.openstack.org/508706 | 22:58 |
mgagne | mordred: remember that time I updated a network to be marked as "external" and everyone could see it? It looks like you could update the policy to make it not happen. Like you would need to mark it as external AND shared to be visible. (not just one or the other) | 23:00 |
mriedem | hmm, looks like py35 jobs are running against newton, where it's not supported | 23:01 |
mriedem | and grenade jobs are running on newton but there is no mitaka | 23:01 |
mriedem | tonyb: ^ | 23:01 |
clarkb | mriedem: there is a fix for that in the gate right now https://review.openstack.org/508955 | 23:01 |
mriedem | thanks | 23:01 |
*** rlandy|brb is now known as rlandy | 23:02 | |
*** andreww has quit IRC | 23:04 | |
clarkb | fungi: I see so tox is doing a freeze as part of its logging and hitting that | 23:05 |
*** mrunge_ has joined #openstack-infra | 23:05 | |
*** mrunge has quit IRC | 23:07 | |
openstackgerrit | Merged openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo https://review.openstack.org/508822 | 23:08 |
SpamapS | Seems like maybe 30s between load checks may be too long. the executor can gobble up a lot of jobs in 30s. | 23:08 |
SpamapS | I notice ze01 spiked up to a load of 40 and started swapping again | 23:08 |
*** tosky has quit IRC | 23:09 | |
SpamapS | and cache went down to almost nothing | 23:09 |
fungi | SpamapS: a big part of the problem is that the function we're checking returns a one-minute load average | 23:10 |
fungi | so it can go from zero to crazy before the one-minute load average plus up to half-minute polling delay notice | 23:11 |
fungi | not to mention, the imposed load may creep up once an accepted job is underway rather than immediately | 23:11 |
fungi | so lots of fuzz there | 23:12 |
*** ijw has joined #openstack-infra | 23:14 | |
*** hemna__ has quit IRC | 23:15 | |
*** baoli has joined #openstack-infra | 23:16 | |
SpamapS | fungi: it seems to be holding the line despite its flaws... so I'll just say that this teal bike shed is good enough to keep the bikes safe for now. | 23:16 |
fungi | SpamapS: yeah i think it's useful but based on observations so far the memory percentage governor may make even more sense | 23:17 |
SpamapS | fungi: the two in tandem should prevent any major catastrophe. | 23:17 |
SpamapS | 5% is still pretty conservative, the system will likely be healthy at 2%, but this gives us some head room. | 23:18 |
SpamapS | also I wondered about making a third check which was basically to just stop accepting jobs if swap goes above like 1%. | 23:19 |
SpamapS | since the major predictor of massive load and 400x slowdown is when we start copying RAM to the 400x slower disks. ;) | 23:19 |
*** gildub has joined #openstack-infra | 23:20 | |
fungi | well, keying off swap utilization is dicey since it can grow and then not shrink even if regular memory gets freed | 23:21 |
*** esberglu has quit IRC | 23:21 | |
fungi | so you might end up deadlocking an executor to indefinite non-concurrency | 23:22 |
fungi | if the kernel ends up paging out some things it doesn't really need anyway | 23:22 |
*** pahuang has joined #openstack-infra | 23:22 | |
*** ijw has quit IRC | 23:22 | |
fungi | maybe in conjunction with tuning kernel swappiness knobs it would make a little sense | 23:23 |
SamYaple | so question about the encryption/secrets. Are these secrets only available in the post job? | 23:24 |
SamYaple | also, encrypt_secret.py doesnt seem to exist anywhere | 23:26 |
fungi | they're only usable by jobs defined in the same repo they are, and only when running in pipelines whitelisted for secrets consumption | 23:26 |
SamYaple | (docs say it should be in zuul repo) | 23:26 |
jeblair | SamYaple: tools/ dir | 23:27 |
fungi | SamYaple: check the feature/zuulv3 branch (that hasn't merged back to master yet) | 23:27 |
jeblair | that too | 23:27 |
SamYaple | got it. that's what i needed | 23:27 |
SamYaple | what pipelines are whitelisted? | 23:29 |
SamYaple | im assuming post, yes? but the others? | 23:29 |
jeblair | SamYaple: look for 'post-review' in the pipeline config | 23:30 |
*** hongbin has quit IRC | 23:30 | |
*** armax has joined #openstack-infra | 23:30 | |
fungi | yeah, in openstack-infra/project-config zuul.d/pipelines.yaml | 23:31 |
SamYaple | awesome sauce | 23:31 |
SamYaple | im going to get dockerhub pushing going | 23:31 |
SamYaple | is there any way to retrigger a post job if it fails? | 23:32 |
fungi | at the moment you have to ask for a zuul admin to help with that, or merge another change | 23:32 |
fungi | we have a utility to reenqueue refs in pipelines | 23:33 |
SamYaple | understood. and im assuming access to the logs is also protected for those jobs and an admin would need to help me with that? | 23:33 |
fungi | the logs for zuul itself aren't currently exposed, but the job logs are published like normal | 23:33 |
SamYaple | right i meant for, say, a failed post job | 23:34 |
fungi | yeah, you can get to those just like you could in zuul v2 | 23:34 |
jeblair | so be careful not to have anything log the credential | 23:34 |
SamYaple | thats what i was getting at :) | 23:35 |
fungi | right, we _do_ have some tests to try and ensure that zuul itself won't accidentally leak decrypted secrets as part of normal operation, but mistakes writing the job are still a potential foot-cannon | 23:35 |
jeblair | ansible no_log may be helpful here | 23:36 |
jeblair | but still not a panacea | 23:36 |
*** lnxnut has joined #openstack-infra | 23:36 | |
fungi | so also, as i'm sure you already know, you should make separate credentials for the jobs to use so that you can revoke them easily if accidentally leaked | 23:37 |
SamYaple | well in my case i just need to login to docker, docker saves this info to a file and then references that to push in the future. so i think i can safely lock that down | 23:37 |
fungi | if the service with which you want to interact allows that | 23:37 |
fungi | and limit the access to only what the job needs to be able to accomplish | 23:37 |
fungi | yeah | 23:37 |
SamYaple | yea ill be making an infra user and assigning it access to only the repo it needs | 23:38 |
fungi | all the usual security belts and braces | 23:38 |
*** stakeda has joined #openstack-infra | 23:38 | |
SamYaple | i believe i can limit it so it cant delete | 23:38 |
mnaser | the puppet-nova job has the build openstack releasenotes job queued for close to 4:30 hours | 23:39 |
clarkb | mriedem: depending on that fix likely won't help, it needs to merge and zuul needs to restart | 23:42 |
SamYaple | im having trouble with encrypt_secret.py which repo are the public keys in? | 23:43 |
clarkb | SamYaple: they aren't in a repo, they are served by the zuul server | 23:43 |
jeblair | SamYaple: 'source' should be 'gerrit' (sorry, we're fixing this soon). the project should be your own project eg 'openstack/foo' | 23:43 |
SamYaple | got it, url is zuul.openstack.org? | 23:44 |
jeblair | zuulv3.openstack.org | 23:44 |
SamYaple | ok thanks | 23:44 |
SamYaple | :O "urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)>" | 23:45 |
*** lnxnut has quit IRC | 23:45 | |
SamYaple | tsk tsk tsk pulling encryption keys over http | 23:45 |
clarkb | https does work, but apparently it's still a self signed cert | 23:46 |
jeblair | yeah, it'll be over https after we finish migrating and have a cert for zuul.o.o | 23:46 |
*** aeng has joined #openstack-infra | 23:46 | |
fungi | ahh, yeah we have a signed cert for zuul.o.o which zuulv3 can use once we rename it to that | 23:46 |
SamYaple | clarkb: yea i just have to hack the script to hit it | 23:46 |
jeblair | SamYaple: this part was intentionally made hard :) | 23:47 |
SamYaple | cool | 23:47 |
SamYaple | jeblair: i now see your TODO :) | 23:47 |
tonyb | I have: ailed: Read-only file system (30)\nrsync error: error in file IO (code 11) at main.c(674) [Receiver=3.1.1] from http://logs.openstack.org/a1/a1411cd50c74234692e0fd64c7df365663d37067/post/legacy-static-election-publish/e02d4e2/job-output.txt.gz#_2017-10-01_22_33_45_968313 is that something I can fix? It *looks* to me that it's outside the sphere I can influence | 23:48 |
jeblair | infra-root: ^ can someone look at that (could be urgent) | 23:48 |
*** Swami has quit IRC | 23:49 | |
fungi | looks like that was a job run from over 24 hours ago? | 23:49 |
clarkb | it tried to write into the trusted/ dir of the bwrap overlay | 23:49 |
clarkb | iirc we only make the working dir writeable? | 23:50 |
jeblair | oh ok. read-only filesystem caught my attention. | 23:50 |
clarkb | so likely the job def just has a bad dest for the copy | 23:50 |
fungi | yeah, so i guess this is alarming-looking errors from bubblewrap's security model | 23:50 |
tonyb | clarkb: okay, I'll go look for that. | 23:50 |
clarkb | tonyb: playbooks/legacy/static-election-publish/post.yaml in ozj | 23:52 |
clarkb | looks like it copies to election/ | 23:52 |
* clarkb looks for other post copy examples | 23:52 | |
*** mat128 has joined #openstack-infra | 23:52 | |
jeblair | clarkb, fungi, mordred: ^ is that something that slipped past our first-level defense and hit the second level? as in, do we need to close a hole in the ansible plugin security barrier? | 23:53 |
jeblair | SpamapS: ^ | 23:54 |
clarkb | jeblair: possibly? though I think that first line of defence only applies for sync if copying from localhost | 23:54 |
jeblair | i thought it should catch dest outside of work/ too | 23:55 |
clarkb | tonyb: dest: "{{ zuul.executor.work_root }}/artifacts/" is what the tarball publishing job uses | 23:55 |
clarkb | tonyb: so I think you can update the playbook for election publishing to use that work root | 23:56 |
tonyb | clarkb: literally that? or "{{ zuul.executor.work_root }}/election" ? | 23:56 |
jeblair | i added to etherpad under 'further debugging' section | 23:56 |
clarkb | tonyb: "{{ zuul.executor.work_root }}/election" | 23:56 |
tonyb | clarkb: Thanks. | 23:57 |
SpamapS | does it maybe call something on the system that infers a path outside work? | 23:59 |
* SpamapS hasn't looked closely | 23:59 |
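
Note on the executor governor discussion (23:08 to 23:23): the idea of gating job acceptance on both the one-minute load average and the percentage of available memory can be sketched roughly as below. This is a minimal illustration, not Zuul's actual code. The function names and the `load_multiplier` value are hypothetical; only the 5% memory figure and the one-minute load average come from the conversation. The `/proc/meminfo` reader assumes Linux.

```python
import os


def should_accept_jobs(load_1min, cpu_count, avail_mem_pct,
                       load_multiplier=2.5, min_avail_mem_pct=5.0):
    """Decide whether an executor should take on more jobs.

    load_multiplier is a hypothetical tuning knob; min_avail_mem_pct
    mirrors the 5% figure mentioned in the log.
    """
    # Load check: the one-minute average lags reality, as fungi notes.
    if load_1min > cpu_count * load_multiplier:
        return False
    # Memory check: stop before the host is pushed into swapping.
    if avail_mem_pct < min_avail_mem_pct:
        return False
    return True


def read_avail_mem_pct(path="/proc/meminfo"):
    """Linux-only helper: MemAvailable as a percentage of MemTotal."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in ("MemTotal", "MemAvailable"):
                fields[key] = int(rest.split()[0])
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]


if __name__ == "__main__":
    load_1min = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    print(should_accept_jobs(load_1min, cpus, read_avail_mem_pct()))
```

Because both inputs are sampled only at each poll, a check like this can still overshoot between polls, which matches the "lots of fuzz" caveat raised in the channel.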
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!
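
Note on the `'{{ zuul | zuul_legacy_vars }}'` exchange (21:53 to 21:56): in Ansible, a custom Jinja filter is an ordinary Python callable exposed through a `FilterModule` class whose `filters()` method maps filter names to functions. The sketch below shows that shape only. It is not the real `zuul_legacy_vars` implementation: the filter name `legacy_env`, the chosen keys, and the output variable names are assumptions made for illustration.

```python
# Hypothetical sketch of an Ansible filter plugin; not the real
# zuul_legacy_vars code.


def legacy_env(zuul):
    """Return a small flat dict of strings derived from a nested dict.

    The keys read and the names emitted here are illustrative assumptions.
    """
    env = {}
    if "project" in zuul:
        env["ZUUL_PROJECT"] = str(zuul["project"].get("name", ""))
    if "branch" in zuul:
        env["ZUUL_BRANCH"] = str(zuul["branch"])
    if "pipeline" in zuul:
        env["ZUUL_PIPELINE"] = str(zuul["pipeline"])
    return env


class FilterModule(object):
    """Ansible looks for a class with this name in filter plugin files."""

    def filters(self):
        # Maps the name written after `|` in a template to the callable.
        return {"legacy_env": legacy_env}


if __name__ == "__main__":
    zuul = {"project": {"name": "openstack/foo"}, "branch": "master"}
    print(FilterModule().filters()["legacy_env"](zuul))
```

With a plugin like this on the filter path, a playbook could write `environment: '{{ zuul | legacy_env }}'`, which is the same pattern mordred and jlk describe: take the large variable, transform it in Python, and expose a subset.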