Friday, 2026-03-13

08:19 <@mnasiadka:matrix.org> apache on static02 is probably down - connection refused...
08:20 <@gthiemonge:matrix.org> I get connection refused on https://releases.openstack.org/
08:26 <@mnasiadka:matrix.org> releases, docs, tarballs are affected
08:40 <@tobias-urdin:matrix.org> ^same
09:33 <@harbott.osism.tech:regio.chat> `Mar 13 01:45:54 static02 systemd[1]: apache2.service: Failed with result 'oom-kill'.`
09:34 <@harbott.osism.tech:regio.chat> restarted now, maybe the waf stuff took too much memory? also we should likely look into the systemd service doing automatic restarts?
09:55 <@mnasiadka:matrix.org> In theory there's Restart=on-abort in the systemd unit file, but it won't work with oom-kill - so we would need to switch to Restart=on-abnormal
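A drop-in override along these lines would get systemd to restart apache2 after an oom-kill; this is only a sketch, and the file path and RestartSec value are assumptions, not what was actually deployed:

```ini
# /etc/systemd/system/apache2.service.d/restart.conf (hypothetical path)
[Service]
# Per the discussion above, Restart=on-abort does not cover the
# 'oom-kill' result; on-abnormal restarts on unclean signals,
# timeouts, and watchdog events.
Restart=on-abnormal
RestartSec=5s
```

After dropping the file in place, `systemctl daemon-reload` makes systemd pick up the override without touching the distro-shipped unit file.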
13:17 <@tobias-urdin:matrix.org> looks like it died again
13:23 <@blasseye:matrix.org> Yes..
13:26 <@garyx:matrix.org> It's dead jim
13:28 <@jim:acmegating.com> the spike happens very quickly; it's not a gradual buildup
13:35 <@jim:acmegating.com> restarted again just to try to increase the uptime, but i don't expect it to last.  the url patterns in the log look like the crazy ones from our botnet.
13:50 <@fungicide:matrix.org> we're not maxxing out nf_conntrack_count at least
13:51 <@fungicide:matrix.org> definitely lots of apache worker slots in use though there are still plenty waiting/unassigned for the moment
13:55 <@fungicide:matrix.org> dmesg indicates the oom killer reaped an apache2 process at 01:43:45 and again at 13:15:13
13:55 <@fungicide:matrix.org> i guess it didn't just take out a worker process, but the parent supervisor both times?
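Checking whether the oom killer took out the supervisor rather than a worker boils down to matching victim pids from dmesg against the known parent pid. A minimal sketch of that tally; the log lines, pids, and timestamps below are made up for illustration:

```python
import re

# Hypothetical dmesg excerpts shaped like real oom-killer output;
# the pids and timestamps are invented, not from static02.
DMESG = """\
[12345.678] Out of memory: Killed process 1123 (apache2) total-vm:901234kB, anon-rss:456789kB
[54321.000] Out of memory: Killed process 2201 (apache2) total-vm:812345kB, anon-rss:398765kB
"""

OOM_RE = re.compile(r"Killed process (\d+) \((\S+)\)")

def oom_victims(dmesg_text):
    """Return (pid, comm) pairs for every oom-killer victim in the log."""
    return [(int(m.group(1)), m.group(2)) for m in OOM_RE.finditer(dmesg_text)]

def parent_was_killed(victims, supervisor_pid):
    """True if the apache parent/supervisor pid is among the victims."""
    return any(pid == supervisor_pid for pid, _ in victims)

victims = oom_victims(DMESG)
print(victims)                           # [(1123, 'apache2'), (2201, 'apache2')]
print(parent_was_killed(victims, 1123))  # True
```

In practice the supervisor pid would come from `systemctl show -p MainPID apache2` before the kill, or from the `MainPID` systemd logged when the service failed.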
13:59 <@jim:acmegating.com> seems that way.  i've been watching top, and everything looks sensible so far.  lots of workers using a modest amount each, nothing bigger than that.
14:48 <@clarkb:matrix.org> prior to this round of sadness we did increase the number of total valid workers/threads for apache2
14:48 <@clarkb:matrix.org> it's possible that was "safe" before with the characteristics of the requests then but is not safe with the changes we've made or due to changes in request patterns?
14:52 <@clarkb:matrix.org> Looks like the service has been running for about an hour now and memory usage is fine. We are not at the server limit either, but it isn't a low server count. But as corvus points out it seems to spike so maybe something very specific that triggers it
14:53 <@fungicide:matrix.org> well, we maxxed out workers last week at the same threshold and didn't trigger oom
14:53 <@fungicide:matrix.org> so maybe subsequent adjustments have caused it to use more ram per worker on average now
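The "safe before, not safe now" question is just arithmetic: the worker cap times the average per-worker RSS has to fit under available RAM. A back-of-the-envelope check with entirely made-up numbers (substitute the server's real values, measured with ps or smem):

```python
# All figures below are hypothetical placeholders, not static02's specs.
total_ram_mb = 8192       # total RAM on the server
reserved_mb = 1024        # headroom for the OS, page cache, other services
per_worker_rss_mb = 40    # average apache2 worker RSS

# Largest worker count that keeps worst-case usage under budget.
safe_workers = (total_ram_mb - reserved_mb) // per_worker_rss_mb
print(safe_workers)  # 179
```

The point fungi makes holds here too: if per-worker RSS creeps up (say from modsecurity buffering larger bodies), the same MaxRequestWorkers value that fit last week can push past the budget this week.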
15:02 <@clarkb:matrix.org> mnasiadka: not sure how your day is going, but I'm now looking at static again. I could probably go either way on whether we sync up now. Let me know if you have a strong preference
15:13 <@clarkb:matrix.org> about 27% of traffic to docs.openstack.org prior to the 01:44 ish OOM was either a 301 or 302 redirect. and about 7% was 403 rejections
15:13 <@clarkb:matrix.org> of course there are other vhosts too. I'm just trying to look at it from a high level and see if anything particularly problematic stands out
15:14 <@clarkb:matrix.org> docs.openstack.org is about 74% of the traffic around this time period
15:14 <@clarkb:matrix.org> s/traffic/requests total/
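The kind of tally behind those percentages can be sketched by pulling the status code out of each Apache combined-log line; the sample lines below are fabricated stand-ins, not real static02 traffic:

```python
import re
from collections import Counter

# Hypothetical access-log lines in Apache combined log format.
LOG = [
    '1.2.3.4 - - [13/Mar/2026:01:30:00 +0000] "GET /developer/a HTTP/1.1" 301 234 "-" "bot"',
    '1.2.3.4 - - [13/Mar/2026:01:30:01 +0000] "GET /ironic/x HTTP/1.1" 302 0 "-" "bot"',
    '5.6.7.8 - - [13/Mar/2026:01:30:02 +0000] "GET /secret HTTP/1.1" 403 199 "-" "bot"',
    '9.9.9.9 - - [13/Mar/2026:01:30:03 +0000] "GET /ironic/latest/x HTTP/1.1" 200 5120 "-" "ok"',
]

# The 3-digit status code sits right after the quoted request line.
STATUS_RE = re.compile(r'" (\d{3}) ')

counts = Counter(m.group(1) for line in LOG if (m := STATUS_RE.search(line)))
total = sum(counts.values())
redirects = counts["301"] + counts["302"]
print(f"redirects: {redirects / total:.0%}, 403s: {counts['403'] / total:.0%}")
# redirects: 50%, 403s: 25%
```

Anchoring on the quoted request line rather than naive whitespace splitting keeps the parse correct even when the request path itself contains no spaces but user agents do.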
15:41 <@fungicide:matrix.org> yeah, i'm not surprised since most of the problem paths we saw were for /developer/.* and that prefix is redirected to /.*
15:41 <@fungicide:matrix.org> i assume you were only looking at https access since http is also automatically redirected
15:42 <@fungicide:matrix.org> and for that matter, most paths like /ironic/something get redirected to /ironic/latest/something
15:43 <@clarkb:matrix.org> I think http and https are combined log files
15:51 -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 980473: Disable modsecurity response body access in two static vhosts https://review.opendev.org/c/opendev/system-config/+/980473
16:50 -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 980473: Disable modsecurity response body access in two static vhosts https://review.opendev.org/c/opendev/system-config/+/980473
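Going by the change title, the vhost fragments presumably end up with ModSecurity's standard directive for this, along these lines (a sketch inferred from the title, not the actual diff from change 980473):

```apache
# In the affected static vhosts: stop mod_security from buffering and
# inspecting response bodies, so large doc pages are not held in memory
# per request while the rules run.
SecResponseBodyAccess Off
```

With response body access off, request-side inspection still applies; the memory cost of buffering each (potentially large) response for rule evaluation goes away.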
16:55 <@clarkb:matrix.org> deployment reports success and I can see some of the apache worker processes have already rotated out for the reload
17:04 <@mnasiadka:matrix.org> Clark: unfortunately I've got an unplanned family adventure, can we do that on Monday?
17:05 <@clarkb:matrix.org> mnasiadka: yup! enjoy your weekend
17:23 <@clarkb:matrix.org> about 1/3 of the apache workers have still not aged out and been replaced. But memory usage remains at steady sustainable levels
17:24 -@gerrit:opendev.org- Jeremy Stanley (https://matrix.to/#/@fungicide:matrix.org) proposed wip: [zuul/zuul-jobs] 980499: Add and test an ensure-validate-pyproject role https://review.opendev.org/c/zuul/zuul-jobs/+/980499
17:27 -@gerrit:opendev.org- Jeremy Stanley (https://matrix.to/#/@fungicide:matrix.org) proposed: [openstack/project-config] 980500: Run validate-pyproject during package checks https://review.opendev.org/c/openstack/project-config/+/980500
17:43 <@fungicide:matrix.org> heading out for a late lunch, back shortly
19:26 -@gerrit:opendev.org- Jeremy Stanley (https://matrix.to/#/@fungicide:matrix.org) marked as active: [zuul/zuul-jobs] 980499: Add and test an ensure-validate-pyproject role https://review.opendev.org/c/zuul/zuul-jobs/+/980499
19:56 <@clarkb:matrix.org> Looking ahead to next week. After syncing up with mnasiadka on monday I'm thinking that may be a good time to upgrade ansible on bridge?
20:00 <@fungicide:matrix.org> count me in

Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!