*** aluria has quit IRC | 01:29 | |
*** rlandy has quit IRC | 03:22 | |
*** openstackgerrit has quit IRC | 04:52 | |
*** openstackgerrit has joined #zuul | 04:57 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: support passing extra arguments to bdist_wheel in build-python-release https://review.openstack.org/607900 | 05:22 |
*** nilashishc has joined #zuul | 06:45 | |
*** quiquell|off is now known as quiquell|brb | 06:50 | |
*** nilashishc has quit IRC | 06:54 | |
*** pcaruana has joined #zuul | 06:57 | |
*** nilashishc has joined #zuul | 07:02 | |
*** nilashishc has quit IRC | 07:06 | |
*** quiquell|brb is now known as quiquell | 07:08 | |
*** nilashishc has joined #zuul | 07:08 | |
*** jpena|off is now known as jpena | 07:10 | |
*** nilashishc has quit IRC | 07:19 | |
*** nilashishc has joined #zuul | 07:22 | |
tobiash | tristanC: did you remove 'homepage' from the package.json on purpose in the revert of the revert? | 07:37 |
tristanC | tobiash: yes, it is actually not needed, the default to '/' is fine | 07:40 |
tobiash | tristanC: that broke my nifty sed to change it in the dockerfile ;) | 07:41 |
tristanC | tobiash: it also broke my sub-url patch ;) | 07:42 |
tobiash | tristanC: do you know if that's overridable by an env var during the build? | 07:42 |
tristanC | tobiash: i don't think so, you'll have to patch the json | 07:43 |
tobiash | ok | 07:43 |
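For reference: with the 'homepage' key gone from package.json there is nothing left for a sed to match, so a sub-url has to be patched into the json before the web build, as tristanC suggests. A minimal sketch of one way to do that is below; the manifest path and the /zuul/ prefix are assumptions for illustration, not taken from the actual Dockerfile, and it would run before the yarn/npm build step.

```python
# Hedged sketch: re-insert a "homepage" entry into package.json before the
# web build, now that the key is no longer present to sed. Path and prefix
# below are assumptions for illustration only.
import json

PACKAGE_JSON = "web/package.json"   # assumed location of the web app manifest
SUB_URL = "/zuul/"                  # assumed deployment prefix (sub-url)

with open(PACKAGE_JSON) as f:
    pkg = json.load(f)

# create-react-app reads "homepage" at build time to derive the public base path
pkg["homepage"] = SUB_URL

with open(PACKAGE_JSON, "w") as f:
    json.dump(pkg, f, indent=2)
    f.write("\n")
```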
*** aluria has joined #zuul | 07:47 | |
tobiash | tristanC: deployment works now | 07:52 |
tobiash | tristanC: found an issue: normal click on live log works, new tab of a live log results in 404 | 07:53 |
tristanC | tobiash: hum, it should, what does the link url look like? | 07:58 |
tobiash | tristanC: https://cc-dev1-ci.bmwgroup.net/zuul/t/cc-playground/stream/c32ac7dfe26d4d4e9ce5d1e578efb7f2?logfile=console.log | 07:58 |
tobiash | tristanC: is the stream route missing? https://git.zuul-ci.org/cgit/zuul/tree/zuul/web/__init__.py#n586 | 08:00 |
tobiash | I only see console-stream (which is the websocket) | 08:00 |
tristanC | tobiash: stream route is defined at L42 of https://review.openstack.org/#/c/607479/2/web/src/routes.js | 08:01 |
tristanC | tobiash: the web server shouldn't return 404, what's the url that fails? | 08:02 |
*** nilashishc has quit IRC | 08:02 | |
tobiash | tristanC: the url above | 08:02 |
tristanC | tobiash: does the other url, e.g. /builds, work? | 08:03 |
tobiash | tristanC: so there are two types of route? one in the js (for normal clicks) and one in cherrypy (for deep links?) | 08:03 |
tobiash | tristanC: yes, builds works as deep link | 08:04 |
tristanC | tobiash: there are web interface routes, i.e. how the index.html loads the page components, that is routes.js | 08:04 |
tristanC | tobiash: then there are api routes defined in cherrypy | 08:04 |
tristanC | tobiash: https://review.openstack.org/#/c/607479/2/zuul/web/__init__.py edits should return the index.html for both '/builds' and '/stream' requests | 08:05 |
tristanC | or is it not working because of the '?logfile' querystring? | 08:05 |
tobiash | tristanC: confirmed, 404 goes away when I remove the ?logfile querystring | 08:07 |
tobiash | but that breaks the streaming itself ;) | 08:08 |
tristanC | tobiash: i see, then maybe we need to add "*arg, **kwarg" to the default() method of the static handler | 08:09 |
tristanC | let me try this quickly | 08:09 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Revert "Revert "web: rewrite interface in react"" https://review.openstack.org/607479 | 08:13 |
tristanC | tobiash: ^ should fix that issue | 08:14 |
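A minimal sketch of the kind of fix tristanC describes: a cherrypy catch-all whose default() accepts arbitrary path segments and query-string parameters, so a deep link such as /t/<tenant>/stream/<uuid>?logfile=console.log falls through to index.html instead of 404ing on the unexpected parameter. The class name and static path below are assumptions, not the actual zuul-web code.

```python
# Hedged sketch only: illustrates accepting extra args/kwargs in a cherrypy
# handler; the real zuul-web static handler differs in detail.
import os
import cherrypy
from cherrypy.lib import static

STATIC_DIR = "/usr/share/zuul-web"  # assumed location of the built web app


class StaticHandler:
    @cherrypy.expose
    def default(self, *args, **kwargs):
        # *args swallows path segments ('t', tenant, 'stream', uuid, ...);
        # **kwargs swallows query parameters such as logfile=console.log --
        # without it cherrypy rejects the unexpected parameter with a 404.
        return static.serve_file(os.path.join(STATIC_DIR, "index.html"),
                                 content_type="text/html")


if __name__ == "__main__":
    cherrypy.quickstart(StaticHandler(), "/")
```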
*** nilashishc has joined #zuul | 08:15 | |
*** nilashishc has quit IRC | 08:24 | |
*** nilashishc has joined #zuul | 08:26 | |
*** panda|off is now known as panda | 08:40 | |
*** electrofelix has joined #zuul | 08:42 | |
tobiash | tristanC: confirmed, this fixes the issue :) | 08:55 |
*** chandankumar has joined #zuul | 09:06 | |
*** chandankumar has quit IRC | 09:48 | |
*** chandankumar has joined #zuul | 09:59 | |
*** chandankumar has quit IRC | 10:26 | |
*** sshnaidm is now known as sshnaidm|off | 10:35 | |
tobiash | mordred: btw, our ansible segfaults are gone with ubuntu based executors | 10:39 |
*** nilashishc has quit IRC | 10:45 | |
*** nilashishc has joined #zuul | 10:48 | |
*** jpena is now known as jpena|lunch | 11:04 | |
*** jesusaur has quit IRC | 11:06 | |
*** jesusaur has joined #zuul | 11:14 | |
*** quiquell is now known as quiquell|lunch | 11:20 | |
*** quiquell|lunch is now known as quiquell | 11:42 | |
*** jpena|lunch is now known as jpena | 12:06 | |
*** mrhillsman has joined #zuul | 12:19 | |
mrhillsman | any idea why i see success status and logs but zuul is not reporting back to github and all of our nodepool nodes are stuck in-use? payloads are successfully delivered and jobs are queued up | 12:19 |
mrhillsman | nodepool is not deleting nodes, zuul is not reporting status back to github | 12:20 |
mrhillsman | http://status.openlabtesting.org/t/openlab/status.html everything has just been "stuck" for hours | 12:21 |
*** rlandy has joined #zuul | 12:25 | |
tobiash | mrhillsman: in your status I see that there are events queued (probably also the result events that trigger reporting) | 12:25 |
*** samccann has joined #zuul | 12:25 | |
tobiash | mrhillsman: this is normal during reconfigurations, but not for hours | 12:26 |
tobiash | mrhillsman: in that case you probably need to check the zuul-scheduler logs for anomalies | 12:26 |
mrhillsman | for a time zookeeper was not reachable | 12:27 |
mrhillsman | 2018-10-05 10:35:47,546 WARNING kazoo.client: Connection dropped: outstanding heartbeat ping not received | 12:27 |
tobiash | mrhillsman: maybe the mergers have no connection to the scheduler via gearman | 12:28 |
mrhillsman | but that is all that is in scheduler and zookeeper | 12:28 |
tobiash | mrhillsman: you also might have had connection problems from mergers to the scheduler? | 12:28 |
tobiash | mrhillsman: currently mergers cannot detect this in some situations | 12:29 |
tobiash | mrhillsman: you could try to restart mergers and executors | 12:29 |
mrhillsman | i did restart them not long ago | 12:29 |
mrhillsman | what is weird is they are all on the same server | 12:29 |
mrhillsman | and things were fine until a couple of days ago | 12:29 |
tobiash | do you have a log of a merger? | 12:30 |
mrhillsman | i do | 12:30 |
mrhillsman | there were some errors yesterday but that was when i was fixing previous fail | 12:32 |
mrhillsman | once i got things back and "working"; nodes available to nodepool, all services restarted | 12:32 |
mrhillsman | i ran into an issue where the websocket was not available so a job would show up but it seemed like the executor could not connect to the node | 12:33 |
mrhillsman | it was late and i figured to check it when i got up | 12:33 |
mrhillsman | and this is what i woke up to lol | 12:33 |
tobiash | mrhillsman: maybe the last few log lines of the scheduler could help | 12:34 |
mrhillsman | https://www.irccloud.com/pastebin/aUV17ODS/ | 12:35 |
mrhillsman | i restarted zookeeper | 12:35 |
mrhillsman | before the executer and merger restart | 12:36 |
mrhillsman | so now the scheduler logs are normal | 12:36 |
mrhillsman | 2018-10-05 12:36:14,549 DEBUG zuul.RPCListener: Received job zuul:status_get | 12:36 |
mrhillsman | gearman is not showing any jobs | 12:37 |
tobiash | that's because I opened your status page link ;) | 12:37 |
mrhillsman | https://www.irccloud.com/pastebin/Ij9GOUVn/ | 12:37 |
tobiash | hrm, is there something unrelated to zk in the scheduler log before that? | 12:37 |
mrhillsman | there's a lot of those status_get lines | 12:37 |
mrhillsman | checking | 12:37 |
tobiash | so the mergers are there so my previous theory is wrong | 12:38 |
mrhillsman | so there are some lines like this 2018-10-05 08:01:29,478 DEBUG zuul.layout: Project <ProjectConfig github.com/cloudfoundry/bosh-openstack-cpi-release source: cloudfoundry/bosh-openstack-cpi-release/.zuul.yaml@master {ImpliedBranchMatcher:master}> did not match item <QueueItem 0x7ff9f01e01d0 for <Branch 0x7ff9f01e0a90 cloudfoundry/bosh-openstack-cpi-release refs/heads/wip_s3_compiled_releases updated None..None> in periodic> | 12:42 |
mrhillsman | and then things look fine | 12:43 |
mrhillsman | overall things look fine | 12:43 |
mrhillsman | that is the only anomaly | 12:43 |
mrhillsman | and there is an error much earlier than that about a particular nodetype not being available | 12:43 |
mrhillsman | Exception: The nodeset "ubuntu-bionic" was not found. | 12:43 |
mrhillsman | also these 2018-10-05 08:00:21,503 DEBUG zuul.Pipeline.openlab.periodic: <class 'zuul.model.Branch'> does not support dependencies | 12:44 |
tobiash | mrhillsman: hrm, maybe a thread is stuck | 12:45 |
mrhillsman | if i restart the scheduler will all those clear up? | 12:45 |
mrhillsman | the stuff on the dashboard | 12:45 |
tobiash | mrhillsman: wait a second | 12:45 |
mrhillsman | ok | 12:46 |
tobiash | mrhillsman: is your current queue important? | 12:46 |
mrhillsman | it is not | 12:46 |
mrhillsman | i can deal with the fallout | 12:46 |
mrhillsman | i think i want to kill the periodic and disable bosh jobs for now | 12:47 |
tobiash | ok, you should create a stack dump before the restart so we have a chance to check if a thread was stuck | 12:47 |
mrhillsman | ok | 12:47 |
tobiash | you can send SIGUSR2 to the scheduler process to do that | 12:47 |
mrhillsman | thx | 12:47 |
tobiash | it should print a stack trace of every thread to the log | 12:47 |
tobiash | a restart after that should be fine (if you're ok with a lost queue) | 12:48 |
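For anyone following along, here is a sketch of the mechanism tobiash describes: a SIGUSR2 handler that dumps a stack trace for every thread so a stuck one can be spotted. It only illustrates the idea; the real zuul-scheduler handler differs in detail. It would be triggered with something like `kill -USR2 <scheduler-pid>`.

```python
# Hedged sketch: dump the stack of every thread to the log on SIGUSR2.
import logging
import signal
import sys
import threading
import traceback

log = logging.getLogger("stack_dump")


def dump_all_threads(signum, frame):
    names = {t.ident: t.name for t in threading.enumerate()}
    for thread_id, stack in sys._current_frames().items():
        log.debug("Thread %s (%s):\n%s", thread_id,
                  names.get(thread_id, "unknown"),
                  "".join(traceback.format_stack(stack)))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    signal.signal(signal.SIGUSR2, dump_all_threads)
    signal.pause()  # wait for signals; a real daemon would be doing its work here
```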
mrhillsman | ok it printed the stack trace | 12:49 |
mrhillsman | http://paste.openstack.org/show/731586/ | 12:53 |
mrhillsman | hrmmm...maybe i will not have to restart the scheduler | 12:54 |
mrhillsman | a bunch of stuff just disappeared | 12:54 |
mrhillsman | and nodepool started deleting/building nodes again | 12:54 |
mrhillsman | this is crazy | 12:54 |
mrhillsman | jobs are running | 12:55 |
mrhillsman | and status updates sent to github | 12:56 |
tobiash | mrhillsman: the thing that was probably stuck was unlocking a node (line 282 in your stack dump) | 12:58 |
tobiash | mrhillsman: maybe that has a very long timeout | 12:58 |
mrhillsman | interesting | 12:59 |
mrhillsman | i wonder if that is a result of something with nodepool | 13:00 |
mrhillsman | cause all of a sudden all of the nodes that were in-use i guess unlocked and got deleted | 13:00 |
mrhillsman | and the executor/merger reported back to github | 13:00 |
tobiash | mrhillsman: if zuul loses its zookeeper session it automatically loses its locks (that is enforced by zookeeper) | 13:01 |
mrhillsman | it was like everything just ground to a halt after the jobs completed | 13:01 |
tobiash | if that happens nodepool deletes all those nodes that were in-use and unlocked | 13:01 |
mrhillsman | do you think i need to move zookeeper to its own node | 13:01 |
mrhillsman | i'll try to debug it a little via the logs | 13:02 |
mrhillsman | right now all zuul things are on one node and all nodepool on another | 13:02 |
tobiash | do you have zk on ceph or san with sometimes high latencies? | 13:02 |
mrhillsman | zk is on the same node as zuul daemons | 13:03 |
tobiash | in the beginning I had many problems with zk (I'm on ceph) until I made it run on tmpfs (with 5 replicas) | 13:03 |
mrhillsman | ok, i'll look into things | 13:03 |
mrhillsman | we were running fewer jobs and i had things spread out, then we consolidated | 13:04 |
mrhillsman | but now we have more stuff running | 13:04 |
mrhillsman | so making some changes is probably in order | 13:04 |
mrhillsman | thx for your help | 13:04 |
tobiash | no problem | 13:05 |
*** samccann has quit IRC | 13:09 | |
*** samccann has joined #zuul | 13:10 | |
*** evrardjp has joined #zuul | 13:25 | |
evrardjp | I am curious, is it possible to use a playbook in a job's run: stanza from a required_project, so not from the main project's repo? | 13:26 |
AJaeger | evrardjp: yes, you can do that - that's what we do all the time with project-config ;) | 13:27 |
evrardjp | it seems relative paths don't work: ../<required_projectname>/<playbook_relativepath_in_required_project>.yml | 13:27 |
evrardjp | AJaeger: opening project-config right now then :) | 13:27 |
AJaeger | evrardjp: https://zuul-ci.org/docs/zuul/user/config.html#attr-job.roles and http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/jobs.yaml#n117 | 13:28 |
tobiash | evrardjp: no, that is not possible with playbook run stanzas, but you can re-use jobs from different projects | 13:29 |
evrardjp | I am getting opposite messages there :) | 13:30 |
*** panda is now known as panda|off | 13:31 | |
evrardjp | I thought roles were like ansible roles, and therefore had to be called in plays to be units of re-use | 13:31 |
AJaeger | evrardjp: show us a change and let tobiash and myself review ;) | 13:31 |
tobiash | evrardjp: AJaeger probably meant roles while you asked for playbooks | 13:31 |
tobiash | I think there might be a misunderstanding ;) | 13:32 |
evrardjp | that is fair, I understand that roles would be the "reusable" unit :) | 13:32 |
evrardjp | I just didn't want to go for roles if I had to still write my own play. I will rethink this :) | 13:32 |
tobiash | evrardjp: yes, roles and jobs are reusable, but not playbooks | 13:32 |
AJaeger | tobiash: indeed - I talked about roles ;( | 13:32 |
AJaeger | evrardjp: so, either roles or jobs - not playbooks | 13:33 |
evrardjp | yup that's what I expected | 13:33 |
*** quiquell is now known as quiquell|off | 13:34 | |
*** EmilienM is now known as EvilienM | 13:57 | |
*** nilashishc has quit IRC | 14:10 | |
*** panda|off has quit IRC | 14:36 | |
*** panda has joined #zuul | 14:37 | |
*** pcaruana has quit IRC | 15:39 | |
*** jimi|ansible has joined #zuul | 15:56 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 15:58 |
clarkb | anyone know if ^ will be tested as is or do I need to do something more to test that? in any case I think it is a simple change that should make job pre run setup more reliable | 15:59 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 16:00 |
logan- | clarkb: I think you'll need register and until attrs on that task for retry to work there. (see https://docs.ansible.com/ansible/2.5/user_guide/playbooks_loops.html#do-until-loops) | 16:09 |
clarkb | logan-: ya it wasn't clear to me if the until is necessary if normal failure checking was good enough | 16:10 |
clarkb | logan-: the current failure checking of the task is good enough, do I need an explicit until to say until this succeeds? | 16:10 |
clarkb | or is that implied by retries > 0? | 16:10 |
logan- | just register: git_clone / until: git_clone is success should be sufficient | 16:10 |
logan- | yeah i think you have to specify it anyway | 16:10 |
logan- | based on the note "If the until parameter isn’t defined, the value for the retries parameter is forced to 1." | 16:11 |
clarkb | ok | 16:11 |
* clarkb updates | 16:11 | |
*** jpena is now known as jpena|off | 16:12 | |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 16:12 |
*** ianychoi_ is now known as ianychoi | 16:19 | |
*** pcaruana has joined #zuul | 16:42 | |
*** pcaruana has quit IRC | 16:50 | |
*** nilashishc has joined #zuul | 16:53 | |
pabelanger | clarkb: I can test it via ansible-network, it is an untrusted job for us | 17:03 |
clarkb | pabelanger: if you don't mind doing that and reviewing based on results I would appreciate it greatly | 17:03 |
pabelanger | sure | 17:04 |
clarkb | the jobs hit by this are retried due to failing in pre-run; not needing to spin up new test nodes for that, and instead delaying a few seconds and retrying, should be a benefit | 17:04 |
pabelanger | clarkb: ah, mirror-workspace-git-repos, sorry. We are not using that yet. I was planning on trying to implement that shortly | 17:06 |
pabelanger | in this case, you'll need to propose a new mirror-workspace-git-repos-test role, land it, then use base-test to test | 17:06 |
clarkb | any idea if the new role can be a symlink to the existing one or does it have to be a proper copy? | 17:07 |
pabelanger | yah, proper copy. roles will be loaded executor side, and I don't think zuul will allow symlinks | 17:10 |
clarkb | I guess it would also have to merge as well because it goes in base-test | 17:11 |
pabelanger | yes | 17:11 |
pabelanger | I'm hoping to do the git mirror things in untrusted when testing with ansible-network, but not sure it can because of the git mirror --push logic | 17:12 |
pabelanger | I think zuul will block it | 17:12 |
*** nilashishc has quit IRC | 17:17 | |
openstackgerrit | Merged openstack-infra/nodepool master: Run release-zuul-python on release https://review.openstack.org/607649 | 17:17 |
Shrews | Can someone else please +3 this race fix for a nodepool test? https://review.openstack.org/604678 | 17:19 |
clarkb | Shrews: I'll take a look | 17:22 |
Shrews | clarkb: thx | 17:27 |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest https://review.openstack.org/608320 | 17:31 |
tobiash | Shrews: that failed in the gate because of a zk problem. I also see this sometimes in zuul. Maybe we should increase the session timeout and/or place zk data on tmpfs during tests | 17:47 |
Shrews | tobiash: it was only 4 seconds between connecting to zk and losing the connection. i don't think either of those would fix that. i've seen it before too, but have no idea what causes it | 17:51 |
tobiash | Hrm, shouldn't the default timeout be 10s? | 17:53 |
tobiash | Before the end of the session timeout the connection state cannot be lost but only suspended | 17:54 |
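For reference, the knob under discussion: kazoo's session timeout defaults to 10 seconds and is set when constructing the client. A hedged sketch, with an illustrative host string and value:

```python
# Sketch only: request a longer zookeeper session timeout for a kazoo client.
from kazoo.client import KazooClient

# timeout= is the requested session timeout in seconds (default 10.0);
# the host string below is an assumption for illustration.
client = KazooClient(hosts="zk01.example.org:2181", timeout=30.0)
client.start()
```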
*** electrofelix has quit IRC | 18:10 | |
*** jesusaur has quit IRC | 19:16 | |
*** jesusaur has joined #zuul | 19:23 | |
openstackgerrit | Merged openstack-infra/zuul-website master: Update events lists/banner after 2018 Ansiblefest https://review.openstack.org/608320 | 19:29 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Retry failed git pushses on workspace setup https://review.openstack.org/608303 | 19:31 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul-jobs master: Add test workspace setup role https://review.openstack.org/608342 | 19:31 |
clarkb | ok ^ with https://review.openstack.org/608343 should test this retry change | 19:33 |
clarkb | pabelanger: logan- AJaeger ^ fyi | 19:33 |
pabelanger | +2 | 19:37 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 19:46 |
openstackgerrit | Merged openstack-infra/nodepool master: Fix race in test_launchNode_delete_error https://review.openstack.org/604678 | 20:18 |
*** samccann has quit IRC | 20:33 | |
*** EvilienM is now known as EmilienM | 22:06 | |
*** rlandy has quit IRC | 22:24 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 22:25 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: WIP: docker-compose quickstart example https://review.openstack.org/608344 | 22:26 |