*** jangutter has quit IRC | 08:05 | |
*** jangutter has joined #softwarefactory | 08:06 | |
*** sshnaidm_ has joined #softwarefactory | 08:59 | |
*** sshnaidm_ has quit IRC | 09:18 | |
*** zoli is now known as zoli|lunch | 09:42 | |
*** zoli|lunch is now known as zoli | 09:42 | |
*** sshnaidm has joined #softwarefactory | 10:06 | |
*** sshnaidm has quit IRC | 10:12 | |
*** sshnaidm has joined #softwarefactory | 10:58 | |
*** zoli is now known as zoli|afk | 11:34 | |
*** zoli|afk is now known as zoli | 11:34 | |
*** sshnaidm_ has joined #softwarefactory | 11:50 | |
*** sshnaidm has quit IRC | 11:53 | |
sfbender | Matthieu Huin created software-factory/sf-config master: Add (undocumented) login collision strategy option in cauth config https://softwarefactory-project.io/r/14003 | 12:42 |
pabelanger | fbo: tristanC: mhu: we've deleted a branch from a repo, but don't believe zuul saw the event, now we are getting merge conflict errors on ansible-network/cloud_vpn. Can you check the zuul merge logs and see why? I also believe a full reconfigure of zuul may fix it, but have no way of triggering that currently | 15:52 |
mhu | pabelanger, I'm going to have a look | 15:55 |
pabelanger | It is likely this is a bug, since zuul didn't see the delete branch event, and didn't reload configuration | 15:57 |
pabelanger | but logs will show more | 15:57 |
mhu | I found PR #50, I'm following the thread... | 16:00 |
pabelanger | yah, PR54 also fails: https://github.com/ansible-network/cloud_vpn/pull/54 | 16:01 |
mhu | found this on PR51, not sure if related or already fixed: https://pastebin.com/Sa152hVf | 16:02 |
*** mhu has left #softwarefactory | 16:06 | |
*** mhu has joined #softwarefactory | 16:06 | |
mhu | oops | 16:06 |
mhu | https://pastebin.com/HVvQcnN3 | 16:06 |
mhu | that's all i see in the logs | 16:07 |
mhu | pabelanger, ^ | 16:07 |
mhu | pabelanger, a reconfigure requires a restart? if so I'm not too keen on restarting the service now, there are 33 jobs queued on the rdo tenant | 16:10 |
pabelanger | mhu: okay, can you get me the relevant logs so we can report this in #zuul, because this looks to be a zuul ub | 16:13 |
pabelanger | bug* | 16:13 |
pabelanger | mhu: no, you can do full-reconfigure from CLI | 16:13 |
*** sshnaidm_ is now known as sshnaidm | 16:14 | |
mhu | it's not documented in the CLI help? or is it something new? | 16:14 |
pabelanger | mhu: SIGHUP | 16:14 |
pabelanger | let me find docs | 16:14 |
mhu | thanks | 16:14 |
pabelanger | mhu: https://zuul-ci.org/docs/zuul/admin/components.html#operation | 16:15 |
*** atarakt has left #softwarefactory | 16:17 | |
*** nhicher has joined #softwarefactory | 16:17 | |
pabelanger | mhu: did the docs help? | 16:22 |
mhu | pabelanger, yeah but is that recent? the zuul-scheduler on sfio doesn't have this option | 16:23 |
mhu | the only option is "stop" | 16:23 |
pabelanger | mhu: yes, but you can kill -s SIGHUP <pid> | 16:24 |
pabelanger | that is the original way of doing it | 16:24 |
mhu | right, sorry, I misread the doc | 16:24 |
mhu | end of the day, etc | 16:24 |
pabelanger | Yah, need more coverage in NA :) | 16:25 |
mhu | ok, SIGHUP done | 16:27 |
pabelanger | mhu: you should see the reload happening in the logs | 16:27 |
pabelanger | odd | 16:29 |
pabelanger | https://ansible-network.softwarefactory-project.io/zuul/status | 16:29 |
pabelanger | Request failed with status code 500 | 16:29 |
pabelanger | mhu: did zuul stop? | 16:31 |
pabelanger | https://softwarefactory-project.io/grafana/d/000000001/zuul-status?orgId=1&from=now-1h&to=now&refresh=5s | 16:31 |
mhu | it didn't appear stopped in systemctl | 16:31 |
pabelanger | I don't see any executors or mergers online | 16:31 |
mhu | to be sure I restarted the scheduler and web | 16:31 |
pabelanger | mhu: Oh, you stop / started everything? | 16:32 |
mhu | odd, I'm not even on the executors nor mergers | 16:32 |
mhu | just the scheduler | 16:32 |
pabelanger | mhu: did you dump the queues first? | 16:32 |
mhu | no ... | 16:32 |
pabelanger | otherwise, we lose all the open patches that are running | 16:32 |
pabelanger | okay, that is an issue then | 16:33 |
pabelanger | we should avoid doing that, as all open changes now need to be rechecked | 16:33 |
pabelanger | okay, my PR looks right | 16:34 |
mhu | ahah, well there's that at least | 16:35 |
pabelanger | mhu: you'll have to notify rdo all open changes need to be rechecked | 16:35 |
mhu | yup, going there | 16:35 |
pabelanger | mhu: https://docs.openstack.org/infra/system-config/zuulv3.html#restarting-the-scheduler | 16:35 |
pabelanger | is a good doc explaining how to do restarts safely | 16:35 |
pabelanger | also, dmsimard wrote a script upstream for infra, that dumped queues every minute, in case this happens | 16:36 |
pabelanger | then we have something to at least try and re-enqueue | 16:36 |
pabelanger | I would strongly recommend adding it to SF.io | 16:36 |
mhu | also I shouldn't be allowed near production systems past 6PM | 16:36 |
mhu | and with that, I'm off, catch you later | 16:37 |
pabelanger | matburt: there was a Zuul outage, see above. you might need to recheck open awx PRs | 16:39 |
pabelanger | mhu: tristanC: fbo: https://review.openstack.org/#/c/532955/ is the patch from dmsimard, can we please add it to SF.io zuul if missing | 16:40 |
matburt | pabelanger hah we noticed | 16:54 |
pabelanger | matburt: yah, sorry about that. Going to work with SF.io team to help protect more from total outage | 16:55 |
matburt | meh, it is what it is | 16:55 |
dmsimard | pabelanger: I'm not sure if they're still saved in the zuul web dir, we might have moved them afterwards | 17:03 |
pabelanger | mhu: fbo: tristanC: we are still getting a merge conflict from zuul, can we debug please: https://github.com/ansible-network/cloud_vpn/pull/54 | 17:46 |
pabelanger | check pipeline works, but when we move to gate, fails | 17:50 |
*** ssbarnea_ has joined #softwarefactory | 20:03 | |
*** ssbarnea_ has quit IRC | 21:53 |
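
At 16:24 pabelanger points out that a full reconfigure can be requested with `kill -s SIGHUP <pid>` instead of a restart. A minimal Python sketch of the same step follows. The function name `request_reconfigure` is hypothetical, the caller must supply the scheduler PID, and whether a given Zuul version reloads its configuration on SIGHUP is an assumption to check against that version's documentation.

```python
import os
import signal


def request_reconfigure(pid: int) -> None:
    """Send SIGHUP to the process with the given PID.

    Hypothetical helper: it only delivers the signal. Whether the
    receiving scheduler performs a full reconfigure on SIGHUP depends
    on the Zuul version in use (assumption, not verified here).
    """
    if pid <= 0:
        # os.kill treats 0 and negative PIDs as process groups,
        # which would signal far more than one scheduler.
        raise ValueError("pid must be a positive integer")
    os.kill(pid, signal.SIGHUP)
```

Unlike the stop/start that followed at 16:31, sending a signal leaves the process running, so it does not by itself discard the queued changes.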
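
The queue dump discussed from 16:32 onward (save the running changes before a restart so they can be re-enqueued) can be sketched roughly as below. This is not dmsimard's script. It assumes the status JSON has the layout `pipelines` → `change_queues` → `heads` → change items carrying `id` and `project`, and that changes are re-enqueued with a `zuul enqueue` command taking `--tenant`, `--trigger`, `--pipeline`, `--project` and `--change`. Both assumptions should be checked against the deployed Zuul version. The function name and the tenant and trigger values in the usage note are hypothetical.

```python
import shlex


def enqueue_commands(status: dict, tenant: str, trigger: str = "gerrit") -> list:
    """Build one re-enqueue command per change found in a status dump.

    `status` is the already-parsed status JSON. The key names used here
    are an assumed layout, not a verified schema.
    """
    commands = []
    for pipeline in status.get("pipelines", []):
        for queue in pipeline.get("change_queues", []):
            for head in queue.get("heads", []):
                for change in head:
                    change_id = change.get("id")
                    # Skip items that are not changes (for example
                    # ref updates, which carry no "number,patchset" id).
                    if not change_id or "," not in change_id:
                        continue
                    args = [
                        "zuul", "enqueue",
                        "--tenant", tenant,
                        "--trigger", trigger,
                        "--pipeline", pipeline["name"],
                        "--project", change["project"],
                        "--change", change_id,
                    ]
                    commands.append(" ".join(shlex.quote(a) for a in args))
    return commands
```

Run before a restart, for example as `enqueue_commands(status, "ansible-network", trigger="github")`, and saved to a file, the output gives operators something to replay afterwards instead of asking every project to recheck its open changes.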