| @harbott.osism.tech:regio.chat | corvus: looks like we have multiple jobs stuck in queued state, likely after the zuul updates earlier today. like one in the openstack gate pipeline and one in periodic weekly, maybe you can take a closer look? | 15:30 |
|---|---|---|
| @harbott.osism.tech:regio.chat | https://zuul.opendev.org/t/openstack/buildset/613857b0af394833b6a53ad8d494346c and https://zuul.opendev.org/t/openstack/buildset/aafb769b692c45239da7de452c1958e4 are the affected buildsets, just in case they get cleaned up somehow | 15:36 |
| @harbott.osism.tech:regio.chat | schedulers were restarted ~ 2h ago, so well after those buildsets started | 15:46 |
| @harbott.osism.tech:regio.chat | not sure if related, this infra-prod-service-zuul run failed at 12:00 since two mergers were found to be unreachable https://zuul.opendev.org/t/openstack/build/e10d1b49e67f416abd8da5458fd0378f | 15:47 |
| @harbott.osism.tech:regio.chat | all component versions show `14.0.1.dev10 fa91e4b3a` which looks like the expected current master version | 15:55 |
| @harbott.osism.tech:regio.chat | nothing that looks obvious to me in the scheduler logs, lots of zk errors but those seem to have been there before | 16:05 |
| @jim:acmegating.com | Jens Harbott: it looks like neither rax-ord nor rax-iad is operating as expected, with hundreds of servers stuck in the openstack deleting state. i recommend taking those cloud regions out of service. | 16:35 |
| -@gerrit:opendev.org- | James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/zuul-providers] 977548: Disable rax-ord and rax-iad https://review.opendev.org/c/opendev/zuul-providers/+/977548 | 16:44 |
| @jim:acmegating.com | infra-root: something like that ^ | 16:44 |
| -@gerrit:opendev.org- | Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [opendev/zuul-providers] 977548: Disable rax-ord and rax-iad https://review.opendev.org/c/opendev/zuul-providers/+/977548 | 17:21 |
| @harbott.osism.tech:regio.chat | seems the launchers are getting `KeyError: 'servers'` when listing servers for that region. not sure yet if things are broken on the rax end or whether a fresh sdk update is hitting us, will try manually from bridge | 17:23 |
| @harbott.osism.tech:regio.chat | `openstack server list` for rax-ord and rax-iad seems to be working fine and fast, so that makes the sdk or something else in the current image the suspect for me | 17:26 |
| @harbott.osism.tech:regio.chat | corvus: do you expect the stuck buildsets to recover or should we dequeue them? | 17:30 |
| @harbott.osism.tech:regio.chat | completely unrelated: wasn't it possible earlier to horizontally scroll long log lines in the "Task Summary" tab like e.g. for https://zuul.opendev.org/t/openstack/build/7e9365f398fa4ccd8bc86eaf35fac006 ? currently that doesn't work for me with either firefox or chromium | 17:39 |
| @jim:acmegating.com | i think there is a good chance of self-recovery, but dequeuing builds should make it happen faster | 17:44 |
| @harbott.osism.tech:regio.chat | well it is the weekend, so I'd be willing to give it a chance until tomorrow | 17:46 |
| @harbott.osism.tech:regio.chat | looks like things are proceeding now at least for some of the older buildsets, like https://zuul.opendev.org/t/openstack/buildset/75f190d8968a4462969690ab674e52da , so there is some hope; although I note I also cannot horizontally scroll for the timeline of this buildset | 17:51 |
| @harbott.osism.tech:regio.chat | it does look a bit however like most recent requests are getting served faster than the old pending ones, so maybe they'll have to wait until we have less load | 18:09 |
| @harbott.osism.tech:regio.chat | corvus: looking at grafana, there are 42 nodes in "Requested" state for rax-ord, which don't seem to have moved since about 15:00, well before we disabled the region. so maybe there is some other issue at work there after all? | 22:22 |