*** aluria has quit IRC | 01:29 | |
*** rlandy has quit IRC | 03:22 | |
*** openstackgerrit has quit IRC | 04:52 | |
*** openstackgerrit has joined #zuul | 04:57 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: support passing extra arguments to bdist_wheel in build-python-release https://review.openstack.org/607900 | 05:22 |
*** nilashishc has joined #zuul | 06:45 | |
*** quiquell|off is now known as quiquell|brb | 06:50 | |
*** nilashishc has quit IRC | 06:54 | |
*** pcaruana has joined #zuul | 06:57 | |
*** nilashishc has joined #zuul | 07:02 | |
*** nilashishc has quit IRC | 07:06 | |
*** quiquell|brb is now known as quiquell | 07:08 | |
*** nilashishc has joined #zuul | 07:08 | |
*** jpena|off is now known as jpena | 07:10 | |
*** nilashishc has quit IRC | 07:19 | |
*** nilashishc has joined #zuul | 07:22 | |
tobiash | tristanC: did you remove 'homepage' from the package.json on purpose in the revert of the revert? | 07:37 |
tristanC | tobiash: yes, it is actually not needed, the default to '/' is fine | 07:40 |
tobiash | tristanC: that broke my nifty sed to change it in the dockerfile ;) | 07:41 |
tristanC | tobiash: it also broke my sub-url patch ;) | 07:42 |
tobiash | tristanC: do you know if that's overridable by an env var during the build? | 07:42 |
tristanC | tobiash: i don't think so, you'll have to patch the json | 07:43 |
tobiash | ok | 07:43 |
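For reference: with the 'homepage' key gone from package.json there is nothing left for a sed to match, so a sub-url has to be patched into the json before the web build, as tristanC suggests. A minimal sketch of one way to do that is below; the manifest path and the /zuul/ prefix are assumptions for illustration, not taken from the actual Dockerfile, and it would run before the yarn/npm build step.

```python
# Hedged sketch: re-insert a "homepage" entry into package.json before the
# web build, now that the key is no longer present to sed. Path and prefix
# below are assumptions for illustration only.
import json

PACKAGE_JSON = "web/package.json"   # assumed location of the web app manifest
SUB_URL = "/zuul/"                  # assumed deployment prefix (sub-url)

with open(PACKAGE_JSON) as f:
    pkg = json.load(f)

# create-react-app reads "homepage" at build time to derive the public base path
pkg["homepage"] = SUB_URL

with open(PACKAGE_JSON, "w") as f:
    json.dump(pkg, f, indent=2)
    f.write("\n")
```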
*** aluria has joined #zuul | 07:47 | |
tobiash | tristanC: deployment works now | 07:52 |
tobiash | tristanC: found an issue: normal click on live log works, new tab of a live log results in 404 | 07:53 |
tristanC | tobiash: hum, it should, what does the link url look like? | 07:58 |
tobiash | tristanC: https://cc-dev1-ci.bmwgroup.net/zuul/t/cc-playground/stream/c32ac7dfe26d4d4e9ce5d1e578efb7f2?logfile=console.log | 07:58 |
tobiash | tristanC: is the stream route missing? https://git.zuul-ci.org/cgit/zuul/tree/zuul/web/__init__.py#n586 | 08:00 |
tobiash | I only see console-stream (which is the websocket) | 08:00 |
tristanC | tobiash: stream route is defined at L42 of https://review.openstack.org/#/c/607479/2/web/src/routes.js | 08:01 |
tristanC | tobiash: the web server shouldn't return 404, what's the url that fails? | 08:02 |
*** nilashishc has quit IRC | 08:02 | |
tobiash | tristanC: the url above | 08:02 |
tristanC | tobiash: does the other url, e.g. /builds, work? | 08:03 |
tobiash | tristanC: so there are two types of route? one in the js (for normal clicks) and one in cherrypy (for deep links?) | 08:03 |
tobiash | tristanC: yes, builds works as deep link | 08:04 |
tristanC | tobiash: there are web interface routes, i.e. how the index.html loads the page components, that is routes.js | 08:04 |
tristanC | tobiash: then there are api routes defined in cherrypy | 08:04 |
tristanC | tobiash: https://review.openstack.org/#/c/607479/2/zuul/web/__init__.py edits should return the index.html for both '/builds' and '/stream' requests | 08:05 |
tristanC | or is it not working because of the '?logfile' querystring? | 08:05 |
tobiash | tristanC: confirmed, 404 goes away when I remove the ?logfile querystring | 08:07 |
tobiash | but that breaks the streaming itself ;) | 08:08 |
tristanC | tobiash: i see, then maybe we need to add "*arg, **kwarg" to the default() method of the static handler | 08:09 |
tristanC | let me try this quickly | 08:09 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Revert "Revert "web: rewrite interface in react"" https://review.openstack.org/607479 | 08:13 |
tristanC | tobiash: ^ should fix that issue | 08:14 |
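A minimal sketch of the kind of fix tristanC describes: a cherrypy catch-all whose default() accepts arbitrary path segments and query-string parameters, so a deep link such as /t/<tenant>/stream/<uuid>?logfile=console.log falls through to index.html instead of 404ing on the unexpected parameter. The class name and static path below are assumptions, not the actual zuul-web code.

```python
# Hedged sketch only: illustrates accepting extra args/kwargs in a cherrypy
# handler; the real zuul-web static handler differs in detail.
import os
import cherrypy
from cherrypy.lib import static

STATIC_DIR = "/usr/share/zuul-web"  # assumed location of the built web app


class StaticHandler:
    @cherrypy.expose
    def default(self, *args, **kwargs):
        # *args swallows path segments ('t', tenant, 'stream', uuid, ...);
        # **kwargs swallows query parameters such as logfile=console.log --
        # without it cherrypy rejects the unexpected parameter with a 404.
        return static.serve_file(os.path.join(STATIC_DIR, "index.html"),
                                 content_type="text/html")


if __name__ == "__main__":
    cherrypy.quickstart(StaticHandler(), "/")
```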
*** nilashishc has joined #zuul | 08:15 | |
*** nilashishc has quit IRC | 08:24 | |
*** nilashishc has joined #zuul | 08:26 | |
*** panda|off is now known as panda | 08:40 | |
*** electrofelix has joined #zuul | 08:42 | |
tobiash | tristanC: confirmed, this fixes the issue :) | 08:55 |
*** chandankumar has joined #zuul | 09:06 | |
*** chandankumar has quit IRC | 09:48 | |
*** chandankumar has joined #zuul | 09:59 | |
*** chandankumar has quit IRC | 10:26 | |
*** sshnaidm is now known as sshnaidm|off | 10:35 | |
tobiash | mordred: btw, our ansible segfaults are gone with ubuntu based executors | 10:39 |
*** nilashishc has quit IRC | 10:45 | |
*** nilashishc has joined #zuul | 10:48 | |
*** jpena is now known as jpena|lunch | 11:04 | |
*** jesusaur has quit IRC | 11:06 | |
*** jesusaur has joined #zuul | 11:14 | |
*** quiquell is now known as quiquell|lunch | 11:20 | |
*** quiquell|lunch is now known as quiquell | 11:42 | |
*** jpena|lunch is now known as jpena | 12:06 | |
*** mrhillsman has joined #zuul | 12:19 | |
mrhillsman | any idea why i see success status and logs but zuul is not reporting back to github and all of our nodepool nodes are stuck in-use? payloads are successfully delivered and jobs are queued up | 12:19 |
mrhillsman | nodepool is not deleting nodes, zuul is not reporting status back to github | 12:20 |
mrhillsman | http://status.openlabtesting.org/t/openlab/status.html everything has just been "stuck" for hours | 12:21 |
*** rlandy has joined #zuul | 12:25 | |
tobiash | mrhillsman: in your status I see that there are events queued (probably also the result events that trigger reporting) | 12:25 |
*** samccann has joined #zuul | 12:25 | |
tobiash | mrhillsman: this is normal during reconfigurations, but not for hours | 12:26 |
tobiash | mrhillsman: in that case you probably need to check the zuul-scheduler logs for anomalies | 12:26 |
mrhillsman | for a time zookeeper was not reachable | 12:27 |
mrhillsman | 2018-10-05 10:35:47,546 WARNING kazoo.client: Connection dropped: outstanding heartbeat ping not received | 12:27 |
tobiash | mrhillsman: maybe the mergers have no connection to the scheduler via gearman | 12:28 |
mrhillsman | but that is all that is in scheduler and zookeeper | 12:28 |
tobiash | mrhillsman: you also might have had connection problems from mergers to the scheduler? | 12:28 |
tobiash | mrhillsman: currently mergers cannot detect this in some situations | 12:29 |
tobiash | mrhillsman: you could try to restart mergers and executors | 12:29 |
mrhillsman | i did restart them not long ago | 12:29 |
mrhillsman | what is weird is they are all on the same server | 12:29 |
mrhillsman | and things were fine until a couple of days ago | 12:29 |
tobiash | do you have a log of a merger? | 12:30 |
mrhillsman | i do | 12:30 |
mrhillsman | there were some errors yesterday but that was when i was fixing previous fail | 12:32 |
mrhillsman | once i got things back and "working"; nodes available to nodepool, all services restarted | 12:32 |
mrhillsman | i ran into an issue where the websocket was not available so a job would show up but it seemed like the executor could not connect to the node | 12:33 |
mrhillsman | it was late and i figured to check it when i got up | 12:33 |
mrhillsman | and this is what i woke up to lol | 12:33 |
tobiash | mrhillsman: maybe the last few log lines of the scheduler could help | 12:34 |
mrhillsman | https://www.irccloud.com/pastebin/aUV17ODS/ | 12:35 |
mrhillsman | i restarted zookeeper | 12:35 |
mrhillsman | before the executer and merger restart | 12:36 |
mrhillsman | so now the scheduler logs are normal | 12:36 |
mrhillsman | 2018-10-05 12:36:14,549 DEBUG zuul.RPCListener: Received job zuul:status_get | 12:36 |
mrhillsman | gearman is not showing any jobs | 12:37 |
tobiash | that's because I opened your status page link ;) | 12:37 |
mrhillsman | https://www.irccloud.com/pastebin/Ij9GOUVn/ | 12:37 |
tobiash | hrm, is there something unrelated to zk in the scheduler log before that? | 12:37 |
mrhillsman | there's a lot of those status_get lines | 12:37 |
mrhillsman | checking | 12:37 |
tobiash | so the mergers are there so my previous theory is wrong | 12:38 |
mrhillsman | so there are some lines like this 2018-10-05 08:01:29,478 DEBUG zuul.layout: Project <ProjectConfig github.com/cloudfoundry/bosh-openstack-cpi-release source: cloudfoundry/bosh-openstack-cpi-release/.zuul.yaml@master {ImpliedBranchMatcher:master}> did not match item <QueueItem 0x7ff9f01e01d0 for <Branch 0x7ff9f01e0a90 cloudfoundry/bosh-openstack-cpi-release refs/heads/wip_s3_compiled_releases updated None..None> in periodic> | 12:42 |
mrhillsman | and then things look fine | 12:43 |
mrhillsman | overall things look fine | 12:43 |
mrhillsman | that is the only anomaly | 12:43 |
mrhillsman | and there is an error much earlier than that about a particular nodetype not being available | 12:43 |
mrhillsman | Exception: The nodeset "ubuntu-bionic" was not found. | 12:43 |
mrhillsman | also these 2018-10-05 08:00:21,503 DEBUG zuul.Pipeline.openlab.periodic: <class 'zuul.model.Branch'> does not support dependencies | 12:44 |
tobiash | mrhillsman: hrm, maybe a thread is stuck | 12:45 |
mrhillsman | if i restart the scheduler will all those clear up? | 12:45 |
mrhillsman | the stuff on the dashboard | 12:45 |
tobiash | mrhillsman: wait a second | 12:45 |
mrhillsman | ok | 12:46 |
tobiash | mrhillsman: is your current queue important? | 12:46 |
mrhillsman | it is not | 12:46 |
mrhillsman | i can deal with the fallout | 12:46 |
mrhillsman | i think i want to kill the periodic and disable bosh jobs for now | 12:47 |
tobiash | ok, you should create a stack dump before the restart so we have a chance to check if a thread was stuck | 12:47 |
mrhillsman | ok | 12:47 |
tobiash | you can send SIGUSR2 to the scheduler process to do that | 12:47 |
mrhillsman | thx | 12:47 |
tobiash | it should print a stack trace of every thread to the log | 12:47 |
tobiash | a restart after that should be fine (if you're ok with a lost queue) | 12:48 |
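For anyone following along, here is a sketch of the mechanism tobiash describes: a SIGUSR2 handler that dumps a stack trace for every thread so a stuck one can be spotted. It only illustrates the idea; the real zuul-scheduler handler differs in detail. It would be triggered with something like `kill -USR2 <scheduler-pid>`.

```python
# Hedged sketch: dump the stack of every thread to the log on SIGUSR2.
import logging
import signal
import sys
import threading
import traceback

log = logging.getLogger("stack_dump")


def dump_all_threads(signum, frame):
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, stack in sys._current_frames().items():
        log.debug("Thread %s (%s):\n%s", thread_id,
                  names.get(thread_id, "unknown"),
                  "".join(traceback.format_stack(stack)))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    signal.signal(signal.SIGUSR2, dump_all_threads)
    signal.pause()  # wait for signals; a real daemon would be doing its work here
```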
mrhillsman | ok it printed the stack trace | 12:49 |
mrhillsman | http://paste.openstack.org/show/731586/ | 12:53 |
mrhillsman | hrmmm...maybe i will not have to restart the scheduler | 12:54 |
mrhillsman | a bunch of stuff just disappeared | 12:54 |
mrhillsman | and nodepool started deleting/building nodes again | 12:54 |
mrhillsman | this is crazy | 12:54 |
mrhillsman | jobs are running | 12:55 |
mrhillsman | and status updates sent to github | 12:56 |
tobiash | mrhillsman: the thing that was probably stuck was unlocking a node (line 282 in your stack dump) | 12:58 |
tobiash | mrhillsman: maybe that has a very long timeout | 12:58 |
mrhillsman | interesting | 12:59 |
mrhillsman | i wonder if that is a result of something with nodepool | 13:00 |
mrhillsman | cause all of a sudden all of the nodes that were in-use i guess unlocked and got deleted | 13:00 |
mrhillsman | and the executor/merger reported back to github | 13:00 |
tobiash | mrhillsman: if zuul loses its zookeeper session it automatically loses its locks (that is enforced by zookeeper) | 13:01 |
mrhillsman | it was like everything just ground to a halt after the jobs completed | 13:01 |
tobiash | if that happens nodepool deletes all those nodes that were in-use and unlocked | 13:01 |
mrhillsman | do you think i need to move zookeeper to its own node | 13:01 |
mrhillsman | i'll try to debug it a little via the logs | 13:02 |
mrhillsman | right now all zuul things are on one node and all nodepool on another | 13:02 |
tobiash | do you have zk on ceph or san with sometimes high latencies? | 13:02 |
mrhillsman | zk is on the same node as zuul daemons | 13:03 |
tobiash | in the beginning I had many problems with zk (I'm on ceph) until I made it run on tmpfs (with 5 replicas) | 13:03 |
mrhillsman | ok, i'll look into things | 13:03 |
mrhillsman | we were running fewer jobs and i had things spread out, then we consolidated | 13:04 |
mrhillsman | but now we have more stuff running | 13:04 |
mrhillsman | so making some changes is probably in order | 13:04 |
mrhillsman | thx for your help | 13:04 |
tobiash | no problem | 13:05 |
*** samccann has quit IRC | 13:09 | |
*** samccann has joined #zuul | 13:10 | |
*** evrardjp has joined #zuul | 13:25 | |
evrardjp | I am curious, is it possible to use a playbook in a job's run: stanza from a required_project, so not from the main project's repo? | 13:26 |
AJaeger | evrardjp: yes, you can do that - that's what we do all the time with project-config ;) | 13:27 |
evrardjp | it seems relative paths don't work: ../<required_projectname>/<playbook_relativepath_in_required_project>.yml | 13:27 |
evrardjp | AJaeger: opening project-config right now then :) | 13:27 |
AJaeger | evrardjp: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles and http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n117 | 13:28 |
tobiash | evrardjp: no, that is not possible with playbook run stanzas, but you can re-use jobs from different projects | 13:29 |
evrardjp | I am getting opposite messages there :) | 13:30 |
*** panda is now known as panda|off | 13:31 | |
evrardjp | I thought roles were like ansible roles, and therefore had to be called in plays to be units of re-use | 13:31 |
AJaeger | evrardjp: show us a change and let tobiash and myself review ;) | 13:31 |
tobiash | evrardjp: AJaeger probably meant roles while you asked for playbooks | 13:31 |
tobiash | I think there might be a misunderstanding ;) | 13:32 |
evrardjp | that is fair, I understand that roles would be the "reusable" unit :) | 13:32 |
evrardjp | I just didn't want to go for roles if I had to still write my own play. I will rethink this :) | 13:32 |
tobiash | evrardjp: yes, roles and jobs are reusable, but not playbooks | 13:32 |
AJaeger | tobiash: indeed - I talked about roles ;( | 13:32 |
AJaeger | evrardjp: so, either roles or jobs - not playbooks | 13:33 |
evrardjp | yup that's what I expected | 13:33 |
*** quiquell is now known as quiquell|off | 13:34 | |
*** EmilienM is now known as EvilienM | 13:57 | |
*** nilashishc has quit IRC | 14:10 | |
*** panda|off has quit IRC | 14:36 | |
*** panda has joined #zuul | 14:37 | |
*** pcaruana has quit IRC | 15:39 | |
*** jimi|ansible has joined #zuul | 15:56 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 15:58 |
clarkb | anyone know if ^ will be tested as is or do I need to do something more to test that? in any case I think it is a simple change that should make job pre run setup more reliable | 15:59 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 16:00 |
logan- | clarkb: I think you'll need register and until attrs on that task for retry to work there. (see https://docs.ansible.com/ansible/2.5/user_guide/playbooks_loops.html#do-until-loops) | 16:09 |
clarkb | logan-: ya it wasn't clear to me if the until is necessary if normal failure checking was good enough | 16:10 |
clarkb | logan-: the current failure checking of the task is good enough, do I need an explicit until to say until this succeeds? | 16:10 |
clarkb | or is that implied by retries > 0? | 16:10 |
logan- | just register: git_clone / until: git_clone is success should be sufficient | 16:10 |
logan- | yeah i think you have to specify it anyway | 16:10 |
logan- | based on the note "If the until parameter isn’t defined, the value for the retries parameter is forced to 1." | 16:11 |
clarkb | ok | 16:11 |
* clarkb updates | 16:11 | |
*** jpena is now known as jpena|off | 16:12 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 16:12 |
*** ianychoi_ is now known as ianychoi | 16:19 | |
*** pcaruana has joined #zuul | 16:42 | |
*** pcaruana has quit IRC | 16:50 | |
*** nilashishc has joined #zuul | 16:53 | |
pabelanger | clarkb: I can test it via ansible-network, it is an untrusted job for us | 17:03 |
clarkb | pabelanger: if you don't mind doing that and reviewing based on results I would appreciate it greatly | 17:03 |
pabelanger | sure | 17:04 |
clarkb | the jobs hit by this are retried due to failing in pre-run; not needing to spin up new test nodes for that, and instead delaying a few seconds and retrying, should be a benefit | 17:04 |
pabelanger | clarkb: ah, mirror-workspace-git-repos, sorry. We are not using that yet. I was planning on trying to implement that shortly | 17:06 |
pabelanger | in this case, you'll need to propose a new mirror-workspace-git-repos-test role, land it, then use base-test to test | 17:06 |
clarkb | any idea if the new role can be a symlink to the existing one or does it have to be a proper copy? | 17:07 |
pabelanger | yah, proper copy. roles will be loaded executor side, and I don't think zuul will allow symlinks | 17:10 |
clarkb | I guess it would also have to merge as well because it goes in base-test | 17:11 |
pabelanger | yes | 17:11 |
pabelanger | I'm hoping to do the git mirror things in untrusted when testing with ansible-network, but not sure it can because of the git mirror --push logic | 17:12 |
pabelanger | I think zuul will block it | 17:12 |
*** nilashishc has quit IRC | 17:17 | |
openstackgerrit | Merged openstack-infra/nodepool master: Run release-zuul-python on release https://review.openstack.org/607649 | 17:17 |
Shrews | Can someone else please +3 this race fix for a nodepool test? https://review.openstack.org/604678 | 17:19 |
clarkb | Shrews: I'll take a look | 17:22 |
Shrews | clarkb: thx | 17:27 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest https://review.openstack.org/608320 | 17:31 |
tobiash | Shrews: that failed in the gate because of a zk problem. I also see this sometimes in zuul. Maybe we should increase the session timeout and/or place zk data on tmpfs during tests | 17:47 |
Shrews | tobiash: it was only 4 seconds between connecting to zk and losing the connection. i don't think either of those would fix that. i've seen it before too, but have no idea what causes it | 17:51 |
tobiash | Hrm, shouldn't the default timeout be 10s? | 17:53 |
tobiash | Before the end of the session timeout the connection state cannot be lost but only suspended | 17:54 |
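For reference, the knob under discussion: kazoo's session timeout defaults to 10 seconds and is set when constructing the client. A hedged sketch, with an illustrative host string and value:

```python
# Sketch only: request a longer zookeeper session timeout for a kazoo client.
from kazoo.client import KazooClient

# timeout= is the requested session timeout in seconds (default 10.0);
# the host string below is an assumption for illustration.
client = KazooClient(hosts="zk01.example.org:2181", timeout=30.0)
client.start()
```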
*** electrofelix has quit IRC | 18:10 | |
*** jesusaur has quit IRC | 19:16 | |
*** jesusaur has joined #zuul | 19:23 | |
openstackgerrit | Merged openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest https://review.openstack.org/608320 | 19:29 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 19:31 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Add test workspace setup role https://review.openstack.org/608342 | 19:31 |
clarkb | ok ^ with https://review.openstack.org/608343 should test this retry change | 19:33 |
clarkb | pabelanger: logan- AJaeger ^ fyi | 19:33 |
pabelanger | +2 | 19:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 19:46 |
openstackgerrit | Merged openstack-infra/nodepool master: Fix race in test_launchNode_delete_error https://review.openstack.org/604678 | 20:18 |
*** samccann has quit IRC | 20:33 | |
*** EvilienM is now known as EmilienM | 22:06 | |
*** rlandy has quit IRC | 22:24 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 22:25 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 22:26 |