Thursday, 2018-10-18

*** jangutter has quit IRC		08:05
*** jangutter has joined #softwarefactory		08:06
*** sshnaidm_ has joined #softwarefactory		08:59
*** sshnaidm_ has quit IRC		09:18
*** zoli is now known as zoli|lunch		09:42
*** zoli|lunch is now known as zoli		09:42
*** sshnaidm has joined #softwarefactory		10:06
*** sshnaidm has quit IRC		10:12
*** sshnaidm has joined #softwarefactory		10:58
*** zoli is now known as zoli|afk		11:34
*** zoli|afk is now known as zoli		11:34
*** sshnaidm_ has joined #softwarefactory		11:50
*** sshnaidm has quit IRC		11:53
sfbender	Matthieu Huin created software-factory/sf-config master: Add (undocumented) login collision strategy option in cauth config https://softwarefactory-project.io/r/14003	12:42
pabelanger	fbo: tristanC: mhu: we've deleted a branch from a repo, but don't believe zuul saw the event; now we are getting merge conflict errors on ansible-network/cloud_vpn. Can you check the zuul merge logs and see why? I also believe a full reconfigure of zuul may fix it, but have no way of triggering that currently	15:52
mhu	pabelanger, I'm going to have a look	15:55
pabelanger	It is likely this is a bug, since zuul didn't see the delete branch event, and didn't reload configuration	15:57
pabelanger	but logs will show more	15:57
mhu	I found PR #50, I'm following the thread...	16:00
pabelanger	yah, PR54 also fails: https://github.com/ansible-network/cloud_vpn/pull/54	16:01
mhu	found this on PR51, not sure if related or already fixed: https://pastebin.com/Sa152hVf	16:02
*** mhu has left #softwarefactory		16:06
*** mhu has joined #softwarefactory		16:06
mhu	oops	16:06
mhu	https://pastebin.com/HVvQcnN3	16:06
mhu	that's all i see in the logs	16:07
mhu	pabelanger, ^	16:07
mhu	pabelanger, a reconfigure requires a restart? if so I'm not too keen on restarting the service now, there are 33 jobs queued on the rdo tenant	16:10
pabelanger	mhu: okay, can you get me the relevant logs so we can report this in #zuul, because this looks to be a zuul ub	16:13
pabelanger	bug*	16:13
pabelanger	mhu: no, you can do full-reconfigure from CLI	16:13
*** sshnaidm_ is now known as sshnaidm		16:14
mhu	it's not documented in the CLI help? or is it something new?	16:14
pabelanger	mhu: SIGHUP	16:14
pabelanger	let me find docs	16:14
mhu	thanks	16:14
pabelanger	mhu: https://zuul-ci.org/docs/zuul/admin/components.html#operation	16:15
*** atarakt has left #softwarefactory		16:17
*** nhicher has joined #softwarefactory		16:17
pabelanger	mhu: did the docs help?	16:22
mhu	pabelanger, yeah but is that recent? the zuul-scheduler on sfio doesn't have this option	16:23
mhu	the only option is "stop"	16:23
pabelanger	mhu: yes, but you can kill -s SIGHUP <pid>	16:24
pabelanger	that is the original way of doing it	16:24
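The `kill -s SIGHUP <pid>` step mentioned above can also be sketched in Python. This is only an illustration of sending the signal; the function name `request_reconfigure` and its `kill` parameter are made up for this example, and finding the scheduler's pid is left to the operator.

```python
import os
import signal


def request_reconfigure(pid, kill=os.kill):
    """Send SIGHUP to the process with the given pid.

    Equivalent to running `kill -s SIGHUP <pid>` in a shell. The `kill`
    parameter is a hypothetical hook so the function can be exercised
    without signalling a real process.
    """
    kill(pid, signal.SIGHUP)


# Example (hypothetical pid; replace with the scheduler's real pid):
#   request_reconfigure(12345)
```

Note that this only delivers the signal; whether the target process reloads its configuration on SIGHUP depends on that process.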
mhu	right, sorry, I misread the doc	16:24
mhu	end of the day, etc	16:24
pabelanger	Yah, need more coverage in NA :)	16:25
mhu	ok, SIGHUP done	16:27
pabelanger	mhu: you should see the reload happening in the logs	16:27
pabelanger	odd	16:29
pabelanger	https://ansible-network.softwarefactory-project.io/zuul/status	16:29
pabelanger	Request failed with status code 500	16:29
pabelanger	mhu: did zuul stop?	16:31
pabelanger	https://softwarefactory-project.io/grafana/d/000000001/zuul-status?orgId=1&from=now-1h&to=now&refresh=5s	16:31
mhu	it didn't appear stopped in systemctl	16:31
pabelanger	I don't see any executors or mergers online	16:31
mhu	to be sure I restarted the scheduler and web	16:31
pabelanger	mhu: Oh, you stop / started everything?	16:32
mhu	odd, I'm not even on the executors nor mergers	16:32
mhu	just the scheduler	16:32
pabelanger	mhu: did you dump the queues first?	16:32
mhu	no ...	16:32
pabelanger	otherwise, we lose all the open patches that are running	16:32
pabelanger	okay, that is an issue then	16:33
pabelanger	we should avoid doing that, as all open changes now need to be rechecked	16:33
pabelanger	okay, my PR looks right	16:34
mhu	ahah, well there's that at least	16:35
pabelanger	mhu: you'll have to notify rdo all open changes need to be rechecked	16:35
mhu	yup, going there	16:35
pabelanger	mhu: https://docs.openstack.org/infra/system-config/zuulv3.html#restarting-the-scheduler	16:35
pabelanger	is a good doc explaining how to do restarts safely	16:35
pabelanger	also, dmsimard wrote a script upstream for infra, that dumped queues every minute, in case this happens	16:36
pabelanger	then we have something to at least try and re-enqueue	16:36
pabelanger	I would strongly recommend adding it to SF.io	16:36
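The idea of dumping the queues before a restart might look roughly like the sketch below. It is not dmsimard's script. It assumes a status document that has already been fetched and parsed, with the layout `pipelines` -> `change_queues` -> `heads` -> list of items carrying `live`, `project` and `id`; those field names and the function name `live_changes` are assumptions made for illustration.

```python
def live_changes(status):
    """Return (pipeline, project, change_id) for every live item.

    `status` is a dict parsed from a scheduler status dump. The key names
    used here are assumptions about that layout, not a documented contract.
    """
    found = []
    for pipeline in status.get("pipelines", []):
        for queue in pipeline.get("change_queues", []):
            for head in queue.get("heads", []):
                for change in head:
                    # Skip non-live items (dependencies shown for context only).
                    if not change.get("live"):
                        continue
                    found.append(
                        (pipeline["name"], change.get("project"), change.get("id"))
                    )
    return found
```

Saving this list periodically would give operators something to re-enqueue from after an unplanned restart, instead of asking every contributor to recheck.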
mhu	also I shouldn't be allowed near production systems past 6PM	16:36
mhu	and with that, I'm off, catch you later	16:37
pabelanger	matburt: there was a Zuul outage, see above. you might need to recheck open awx PRs	16:39
pabelanger	mhu: tristanC: fbo: https://review.openstack.org/#/c/532955/ is the patch from dmsimard, can we please add it to SF.io zuul if missing	16:40
matburt	pabelanger hah we noticed	16:54
pabelanger	matburt: yah, sorry about that. Going to work with SF.io team to help protect more from total outage	16:55
matburt	meh, it is what it is	16:55
dmsimard	pabelanger: I'm not sure if they're still saved in the zuul web dir, we might have moved them afterwards	17:03
pabelanger	mhu: fbo: tristanC: we are still getting a merge conflict from zuul, can we debug please: https://github.com/ansible-network/cloud_vpn/pull/54	17:46
pabelanger	check pipeline works, but when we move to gate, fails	17:50
*** ssbarnea_ has joined #softwarefactory		20:03
*** ssbarnea_ has quit IRC		21:53
