| | | |
|---|---|---|
| *** ykarel_ is now known as ykarel | 10:32 | |
| dtantsur | Hi folks! I'd appreciate review on https://review.opendev.org/c/openstack/project-config/+/991483 please | 13:01 |
| *** jgilaber_ is now known as jgilaber | 14:01 | |
| clarkb | fungi: I know we're going into the board meeting tunnel I +2'd ^ with my note about the branches situation I think it will be fine but maybe double check what I wrote in case it jogs memory before we approve? | 14:55 |
| fungi | yeah, it sounds correct to me | 14:56 |
| fungi | approved it | 14:56 |
| clarkb | sounds good | 14:58 |
| opendevreview | Merged openstack/project-config master: Add 3 more Metal3 projects to untrusted-projects https://review.opendev.org/c/openstack/project-config/+/991483 | 15:09 |
| dtantsur | Thank you! | 15:38 |
| dtantsur | Hmm, is it just me or did gerrit go for a walk a minute ago? | 15:40 |
| dansmith | same here | 15:40 |
| dtantsur | Pings work and curl went through eventually, but Firefox times out. | 15:41 |
| dtantsur | Yeah, curl -I finishes successfully in 97 seconds. More AI happiness? | 15:43 |
| clarkb | system load is relatively low and gerrit is up and running. I wonder if it is the proxy getting overloaded | 15:45 |
| fungi | proxy as in apache? i can take a look at the server-status scorecard | 15:50 |
| clarkb | fungi: is apache status tracking enabled on review? I do suspect it is apache running out of slots. Though I can see logs from about 9 hours ago where it complains about running out of max request workers | 15:50 |
| clarkb | fungi: yes apache | 15:50 |
| clarkb | er though I can see it complaining in the error.log about that 9 hours ago I don't see current complaints | 15:51 |
| fungi | it's definitely acting like it's overloaded, just trying to `wget http://localhost/server-status` from a shell on the server is hanging right now | 15:52 |
| fungi | so definitely acting like it may be out of worker slots | 15:52 |
| fungi | finally returned to me and it looks like all workers are either open with no current process or reading requests | 15:53 |
| fungi | so all running workers are full up | 15:54 |
| clarkb | ok in the past we've avoided increasing those limits due to the thread pool sizes on the gerrit side | 15:54 |
| fungi | i'll try to characterize/summarize the pending requests | 15:54 |
| clarkb | thanks | 15:54 |
| fungi | though it looks like the bulk of them haven't actually gotten the request in yet, it's just showing up as empty | 15:55 |
| clarkb | fungi: look at `ss -npt \| less` or similar. One ip address stands out | 15:58 |
| clarkb | oh that's our own address? | 15:58 |
| clarkb | oh I'm looking at the columns wrong duh that's the local side | 15:58 |
| clarkb | but that does show an imbalance between frontend connections and connections to the backend | 15:59 |
| clarkb | so it's like we're not even trying to connect to gerrit? | 15:59 |
| fungi | i see a single ipv6 address in the server-status output with about 20 outstanding requests | 15:59 |
| fungi | another with 6, one with 4, a few with 2, and about 180 with only one request each | 16:00 |
| clarkb | sanity checking the backend further gerrit show-queue looks fine | 16:01 |
| clarkb | so I think this is on the frontend | 16:01 |
| fungi | might be a good excuse to add anubis there | 16:02 |
| clarkb | I see a bunch of git upload packs getting through | 16:05 |
| clarkb | in theory most of that traffic should be funneled to the gitea farm | 16:05 |
| clarkb | (but if that is the traffic that is the problem anubis may not be immediately helpful) | 16:06 |
| ykarel | hi can ^ also make zuul gate result not getting reported and patch not being merged? | 16:17 |
| ykarel | all jobs passed https://zuul.openstack.org/buildset/ce5b2b3bf586458c88f1d57d928cc5ee but it says buildset in progress | 16:18 |
| ykarel | clarkb, fungi ^ | 16:18 |
| fungi | ykarel: possibly, zuul does use the gerrit rest api in order to be able to return inline review comments | 16:18 |
| fungi | so if it can't establish an https connection to gerrit it may impact result reporting | 16:19 |
| ykarel | ack, so need to recheck or possible to merge based on above results itself? this was top in the gate queue | 16:19 |
| fungi | seems like zuul considers the buildset in progress but the change item is no longer enqueued in any pipeline according to the status view | 16:22 |
| clarkb | it's possible that the db reporting didn't happen because gerrit reporting failed? | 16:22 |
| opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Deprecate Vitrage project https://review.opendev.org/c/openstack/project-config/+/982878 | 16:22 |
| clarkb | the data you see in your zuul web ui for historical dbs comes from the database and that is a separate reporting step | 16:23 |
| fungi | clarkb: ooh, okay right this isn't in-memory state it's pulling from sql so yes may have not gotten recorded there | 16:23 |
| ykarel | when i was watching the status page, there were 3-4 patches in top of the gate queue which had all the jobs passed, at that time gerrit was not loading for me | 16:23 |
| fungi | ykarel: so yes, i expect that change possibly others failed to report, which will be evidenced by a missing verified vote from the zuul user, a recheck will unfortunately test them all over again but should work as long as gerrit doesn't start having problems again | 16:48 |
| fungi | it seems fine at the moment after forcefully restarting the main apache daemon a little while ago. we're still going through logs and coming up with some potential mitigations, though it's as of yet unclear what caused it to stop recycling worker processes or forwarding requests to gerrit | 16:51 |
| opendevreview | Stephen Finucane proposed openstack/pbr master: Add Resolute, py314 testing https://review.opendev.org/c/openstack/pbr/+/989938 | 22:06 |
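
Editor's note on the 15:52 to 15:54 exchange: the check fungi describes (all workers either "open with no current process" or "reading requests") can be approximated by tallying the scoreboard string that Apache's mod_status prints in its machine-readable `?auto` output. The sketch below is only an illustration and is not what was run on the server. The `SAMPLE` text is made up, the function names are hypothetical, and treating "no worker waiting for a connection" as saturation is a rough heuristic assumed here, not a rule from the log.

```python
from collections import Counter

# Scoreboard key characters as documented for Apache mod_status.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def tally_scoreboard(auto_status: str) -> Counter:
    """Count worker states from the 'Scoreboard:' line of ?auto output."""
    for line in auto_status.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            return Counter(SCOREBOARD_KEYS.get(ch, "unknown") for ch in board)
    raise ValueError("no Scoreboard line found")


def looks_saturated(tally: Counter) -> bool:
    """Heuristic (an assumption): no running worker is free to accept a connection."""
    return tally["waiting for connection"] == 0


# Hypothetical sample, shaped like the situation described in the log:
# every running worker is reading a request, the rest are open slots.
SAMPLE = "BusyWorkers: 6\nIdleWorkers: 0\nScoreboard: RRRRRR....\n"

if __name__ == "__main__":
    tally = tally_scoreboard(SAMPLE)
    for state, count in tally.most_common():
        print(f"{count:4d}  {state}")
    print("saturated:", looks_saturated(tally))
```

In practice the input would be the body returned by something like `http://localhost/server-status?auto`, which, as the log shows, may itself hang while the server is out of workers.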