Saturday, 2026-02-28

@tkajinam:matrix.org I'm afraid some of the jobs are again kept stuck in queued status for some time https://zuul.opendev.org/t/openstack/status 15:25
@jim:acmegating.com this time it does look like there may be a zuul error involved 15:29
@tkajinam:matrix.org Some of the nodes have been in building state for a few hours but I don't know if that's related 15:55
@jim:acmegating.com yes it is 15:55
@tkajinam:matrix.org I suspect something is wrong with cloud providers though these nodes are distributed among multiple providers, if I read https://zuul.opendev.org/t/openstack/nodes correctly 15:57
@jim:acmegating.com no like i said it's a problem with zuul 15:57
@tkajinam:matrix.org ah, sorry. I misread it 15:58
@jim:acmegating.com i have a theory of a cause and a fix, unfortunately, 3 things are going on right now that are slowing it down, so even if i'm right, recovery may take a bit 16:01
@jim:acmegating.com i suspect the issue can be corrected by a reconfiguration, but the zuul upgrade is still in progress, so only one scheduler can actually issue the correction. the gerrit server that google runs for the gerrit project itself is slow, so during the reconfiguration, zuul is getting a lot of http read timeouts, which is slowing the reconfiguration. 16:03
@jim:acmegating.com so we need to wait for a lot of http read timeouts on gerrit's gerrit, then one of the schedulers can reconfigure each tenant, one at a time. and then, after we reconfigure all the tenants, slowly, we'll know if i'm right. :/ 16:05
@tkajinam:matrix.org hmm ok. I'll check the status tomorrow. 16:14
@tkajinam:matrix.org thanks! 16:14
@jim:acmegating.com infra-root: i suspect that google has engaged some abuse prevention measures that are affecting zuul's use of the gerrit-review.googlesource.com server. if i visit https://gerrit-review.googlesource.com/plugins/delete-project/info/refs?service=git-upload-pack in a browser, i get an immediate response, but a wget takes 43 seconds. that's longer than the 30s timeout, so zuul is unable to get the refs for all of the googlesource repos. 16:34
@jim:acmegating.com it looks like the tenant reconfigurations (which are now finally done) may have resolved the issue, so node requests are being processed now 16:52
@tkajinam:matrix.org oh, yes 17:05
@tkajinam:matrix.org some of the jobs which were in queued status are now moving forward 17:05
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [openstack/project-config] 978334: Remove googlesource projects https://review.opendev.org/c/openstack/project-config/+/978334 18:02
@jim:acmegating.com infra-root: i don't think we need to merge that immediately, but we may want to consider it for the safety of the system. 18:03
@jim:acmegating.com an alternative may be to change zuul's gerrit driver to get that information through the rest api (which is, weirdly, more resource intensive). that would be a moderately complicated undertaking, so not a quick fix. but perhaps the safest long-term one. 18:04
@fungicide:matrix.org #status log Rebooted wiki.openstack.org restoring it to working order again after further database disconnects 18:38
@status:opendev.org @fungicide:matrix.org: finished logging 18:38
