Wednesday, 2026-03-04

-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 978824: Upgrade Gerrit images to 3.11.9 and 3.12.5 https://review.opendev.org/c/opendev/system-config/+/978824 00:10
@clarkb:matrix.orgok none of the plugins have the new tags so I've updated the change and made a note of this in the commit message00:11
-@gerrit:opendev.org- yatin proposed: [zuul/zuul-jobs] 961208: Make fips setup compatible to 10-stream https://review.opendev.org/c/zuul/zuul-jobs/+/961208 08:11
@fungicide:matrix.orgafter my current meeting i'll work on getting yesterday's hackarounds better encoded into changes in review14:06
@fungicide:matrix.orgnf_conntrack_count on static02 is still 379606 at the moment, so we probably shouldn't lower it just yet14:06
@fungicide:matrix.orgi'm seeing some attempts now to reach similar crazy urls like `docs.openstack.org/security-guide/content/compliance/dashboard/dashboard/identity/compute/common/image-storage/secrets-management/secrets-management/networking/secure-communication/identity/image-storage/identity/dashboard/image-storage/secrets-management/common/dashboard/checklist.html` though most of the addresses are already blocked at this point i think14:09
@fungicide:matrix.orgthe most recent new hit for the ^/developer/watcher/datasources/.* rule was at 12:25:55 (nearly 2 hours ago)14:13
@fungicide:matrix.orgokay, another hit on that rule 2 minutes ago, so it's still happening14:47
@clarkb:matrix.orgfungi: I was thinking that bumping up the expiry for keys on that table to a day may allow conntrack numbers to fall further?15:26
@clarkb:matrix.orgPossible the shorter timeout is letting more stuff hang around?15:26
@clarkb:matrix.orgI'm having a very slow start this morning, but I think if that Gerrit change is happy we might proceed with it and plan to restart Gerrit after it lands15:27
@fungicide:matrix.orgi expect the most straightforward way to get the connection tracking back to normalish sooner is a reboot of the server15:44
@fungicide:matrix.orgor using the conntrack utility to manually expire older sessions from the table ahead of their expiration15:45
@fungicide:matrix.orgas far as the mod_security honeypot table, i was planning to increase the record lifetime to match the 24-hour expiration as part of the patches i'm reworking15:47
@fungicide:matrix.orgbut can also do that manually on the server in the meantime15:47
@fungicide:matrix.orgapache has spun down about half the allowed worker processes, so it's keeping up pretty well from that angle15:48
@fungicide:matrix.orgas far as the gerrit upgrade, i'm heading out to run some lunch errands, but should be back in about an hour and am happy to help with the restart once i'm home15:51
@fungicide:matrix.orgbbiab15:53
@clarkb:matrix.orgok I'll probably approve that in 30-45 minutes if nothing else comes up before then16:09
@clarkb:matrix.orgI did a spot check and delete-project doesn't have new tags either. That was something I wanted to make sure we checked in case we had new tags to update to for plugins16:23
@clarkb:matrix.orgI have approved https://review.opendev.org/c/opendev/system-config/+/978824 to update gerrit to 3.11.9 17:01
@clarkb:matrix.orgwe have about half an hour to decide to unapprove it if something comes up17:01
@fungicide:matrix.orgthanks17:23
@fungicide:matrix.orgi'm on hand again for once it's through17:24
@clarkb:matrix.orgcorvus and I are currently debugging a zuul issue for gerrit's zuul that may be related to the submit whole topic updates. Since we don't use that feature I'm not super concerned but if we update we should pay attention to zuul weirdness around dependency loops 17:44
@clarkb:matrix.orgLooks like the issue may not be directly related to the Gerrit update after all so I think we proceed as planned within opendev17:54
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 978824: Upgrade Gerrit images to 3.11.9 and 3.12.5 https://review.opendev.org/c/opendev/system-config/+/978824 17:57
@fungicide:matrix.orginfra-prod-service-review is in progress18:01
@clarkb:matrix.orgWhich should be a noop. We're mostly needing promotions then we can proceed with restarting when ready18:01
@fungicide:matrix.orgyeah, that already succeeded18:02
@fungicide:matrix.orgi have a root screen session started on review0318:04
@clarkb:matrix.orgthanks I'll join shortly. Trying to pull up the image on quay.io first18:04
@fungicide:matrix.org`quay.io/opendevorg/gerrit       3.11      b4345ed1ab79   3 weeks ago    715MB` looks like the image we're running on18:05
@clarkb:matrix.orghttps://quay.io/repository/opendevorg/gerrit/manifest/sha256:4b1401a000571a4fc0042f32a651823ab1b9c52a76ff0912ebc340a73c9e64d3 is the new 3.11 tag18:05
@fungicide:matrix.org`quay.io/opendevorg/gerrit       3.11      2c44da2425cf   26 minutes ago   714MB` is what i just pulled18:07
@fungicide:matrix.org`quay.io/opendevorg/gerrit@sha256:4b1401a000571a4fc0042f32a651823ab1b9c52a76ff0912ebc340a73c9e64d3`18:07
@clarkb:matrix.orgthose hashes seem to match18:07
@fungicide:matrix.orgagreed18:07
@fungicide:matrix.org`docker compose -f /etc/gerrit-compose/docker-compose.yaml down && mv ~gerrit2/review_site/data/replication/ref-updates/waiting ~gerrit2/tmp/waiting_queue_2026-03-04 && rm ~gerrit2/review_site/cache/{gerrit_file_diff,git_file_diff,git_modified_files,modified_files,comment_context}.* && sudo docker compose -f /etc/gerrit-compose/docker-compose.yaml up -d`18:08
@fungicide:matrix.orgthat's what i've queued up to run18:08
@fungicide:matrix.orgstatus notice The Gerrit service on review.opendev.org will be offline momentarily for a software upgrade18:09
@fungicide:matrix.orgthat look okay?18:09
@clarkb:matrix.orgthe two big caches are in the cleanup list so that looks good to me18:09
@clarkb:matrix.organd yes the message also logtm18:09
@fungicide:matrix.orgshall i send that?18:09
@fungicide:matrix.orgor does anyone want to check anything else first?18:09
@clarkb:matrix.orgsure I'll get the zuul queue pausing ready18:10
@fungicide:matrix.org#status notice The Gerrit service on review.opendev.org will be offline momentarily for a software upgrade18:10
@status:opendev.org@fungicide:matrix.org: sending notice18:10
@fungicide:matrix.orgthanks18:10
@clarkb:matrix.orgzuul web ui shows the pause banner message I think we're good on that side18:10
@fungicide:matrix.orgi concur18:11
-@status:opendev.org- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily for a software upgrade18:13
@status:opendev.org@fungicide:matrix.org: finished sending notice18:13
@fungicide:matrix.orgrestarting gerrit now per above18:13
@fungicide:matrix.org2 minutes so far18:15
@clarkb:matrix.orgI think it is processing the git_file_diff cache db18:15
@clarkb:matrix.orgthat is the only cache db file with a lockfile18:15
@fungicide:matrix.org3 minutes now18:16
@clarkb:matrix.orgit is also huge18:16
@clarkb:matrix.orgthe gerrit_file_diff cache db was also huge but seems to have been processed18:16
@fungicide:matrix.orgone minute to go before it times out18:17
@fungicide:matrix.orgstopping took 4m13.1s18:17
@clarkb:matrix.orgwhile the timeout is not ideal I think we've largely decided it is probably harmless since we're going to rm the db backing files anyway18:17
@fungicide:matrix.orgservice should be on its way back up now18:18
@clarkb:matrix.org`[2026-03-04T18:18:10.920Z] [main] INFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.11.9-dirty ready`18:18
@fungicide:matrix.orgwebui is up, reports `Powered by Gerrit Code Review (3.11.9-dirty)`18:18
@clarkb:matrix.orgloading changes was slow but I can see the diffs on one of the changes I opened18:19
@clarkb:matrix.orgso seems to be catching back up quickly18:19
@fungicide:matrix.orgstanding by to do a `gerrit index start changes --force` once we're sure it's ready to go18:19
@clarkb:matrix.orgfungi: it is pruning the disk caches we didn't rm18:21
@clarkb:matrix.orgoh and it is done. show-queue doesn't show anything in a waiting state anymore18:21
@clarkb:matrix.orgI think the main thing to check now is pushing a new change/patchset and that replication is good18:21
@clarkb:matrix.orgI'm going to reenable zuul queues18:21
@clarkb:matrix.orgzuul should be back to normal now18:22
@clarkb:matrix.orgour hourly zuul deployment job failed with a post failure and reports it doesn't have any logs. I think we just check that one in the next hour and make sure it is happy again going forward18:23
@fungicide:matrix.orghttps://review.opendev.org/c/openstack/python-ironicclient/+/978917 was pushed after the restart18:23
@clarkb:matrix.orgIt seems to show up in gitea: https://opendev.org/openstack/python-ironicclient/commit/8d89587250c3de9c5922e652f41acb2d14ba474118:24
@clarkb:matrix.orgso I think pushing code and replicating it is working18:24
@fungicide:matrix.orgyep, confirmed it replicated fine18:24
@fungicide:matrix.orgokay, so start reindexing changes?18:24
@clarkb:matrix.orgI think so18:24
@fungicide:matrix.orgin progress18:24
@fungicide:matrix.orgthere's an error_log tail in the screen session filtered for reindex status updates18:25
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 978956: Expire mod_security table entries in one day https://review.opendev.org/c/opendev/system-config/+/978956 18:50
@fungicide:matrix.orgClark: is that ^ what you had in mind?18:51
@fungicide:matrix.orgif that doesn't break apache in our tests, i can also manually apply it for docs.openstack.org and then do some extended interactive testing18:52
@clarkb:matrix.orgthat was one idea yes. I think the documented method is: https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#seccollectiontimeout18:52
@clarkb:matrix.orgthe documented method would go in /etc/modsecurity.d/ or similar and apply globally to all collections18:52
@clarkb:matrix.orgfungi: /etc/apache2/mods-enabled/security2.conf loads IncludeOptional /etc/modsecurity/*.conf18:53
@fungicide:matrix.org`IncludeOptional /etc/modsecurity/*.conf`18:54
@fungicide:matrix.orgyep18:54
@fungicide:matrix.orgalso the `IncludeOptional /usr/share/modsecurity-crs/*.load` is what i was talking about on the opendevent call18:54
@fungicide:matrix.orgfor crs rules maintained by the owasp community18:54
@clarkb:matrix.orgoh does it already install them even?18:54
@clarkb:matrix.orgthats cool. Maybe we should just turn some of them on and hope for the best :)18:54
@fungicide:matrix.orglooks like we'd have to install the modsecurity-crs distro package, but otherwise yes18:55
@fungicide:matrix.org"modsecurity-crs/jammy 3.3.2-1 all: OWASP ModSecurity Core Rule Set"18:55
@fungicide:matrix.orgso a fairly minor ansible edit if/when we want to give that a try18:56
@fungicide:matrix.orgif i were to do `SecCollectionTimeout 86400` in a separate conf file, would it make sense to create a new ansible role for that like we do with apache-ua-filter or stick copies in individual service roles like we do for apache-connection-tuning?19:04
@clarkb:matrix.orgI'm thinking that the timeout values may need to be tuned per service so keeping that in service roles is probably fine. Alternatively you could have a separate role and make that a parameter19:08
@fungicide:matrix.orgi guess it's a question then of whether we have any reason not to tune them by table19:09
@fungicide:matrix.orgif we may want to vary the global expiration on a server-by-server basis, then is there a good reason not to just do it table-by-table?19:10
@fungicide:matrix.orgdo we expect to have more than one on a server?19:10
@clarkb:matrix.orgthe main reason is we don't know if the change you've pushed will work on a table by table basis. I sort of inferred it might based on the table structure that corvus reported from the tool, but none of the documentation explicitly documents that approach19:11
@fungicide:matrix.organd is there a benefit to having them share a timeout?19:11
@fungicide:matrix.orgah19:11
@clarkb:matrix.orgI think if the table by table approach does work then I'm fine with using it as you've proposed. If it doesn't then we're probably stuck with the global modsecurity wide setting19:11
@fungicide:matrix.orgmakes sense, in which case i guess i can start with including the file in each affected service role for now19:12
@clarkb:matrix.orgmy experience with the tool so far is that the parsing is quite strict, but they also let you do all the things. So I think if we pass the parse check then there is a really good chance it will just work19:12
@fungicide:matrix.orgyeah, so if 978956 doesn't bomb in zuul, i'll manually apply it on static02 and test it to see if a new block entry persists past an hour19:13
@clarkb:matrix.orgreindexing is nearing completion19:18
@fungicide:matrix.orgfinished19:26
@fungicide:matrix.org`WARN  com.google.gerrit.server.index.change.AllChangesIndexer : Failed 3/972287 changes`19:26
@fungicide:matrix.orgsame as always19:26
@clarkb:matrix.orgPerfect19:27
@fungicide:matrix.orgi'll go ahead and close out the screen session on review0319:27
@clarkb:matrix.orgI think you can shutdown the screen. I haven't detached yet but I don't think I need anything 19:27
@fungicide:matrix.orgdone19:27
@fungicide:matrix.org978956 tests ran clean, so i'll give it a try on static0219:39
@fungicide:matrix.orgmanually patched it into the docs.opendev.org and docs.openstack.org vhosts and am restarting apache now19:42
@fungicide:matrix.orgapache is serving content19:42
@fungicide:matrix.orgserver-status predictably has very few slots in use so far19:43
@fungicide:matrix.orgnf_conntrack_count is still 380840 but i don't expect that to drop substantially for a week, unless the server is rebooted19:43
@fungicide:matrix.orgmy test client has gotten itself blocked, so in a little over an hour i'll try again, like 21:00 utc19:47
@fungicide:matrix.orgif it's still blocked, then this approach seems to be working19:47
@clarkb:matrix.orgSounds good. I would wait at least 70 minutes since it seemed to have a small delay on actually updating the DB when we last tested it19:54
@fungicide:matrix.orgyep20:10
@fungicide:matrix.orginteresting, a wget of https://docs.openstack.org/ from my test client is timing out now21:05
@fungicide:matrix.orgmy local browser can load it fine though21:06
@fungicide:matrix.orgmy request is occupying a worker slot according to server-status21:07
@clarkb:matrix.orgyou're expecting a 403 from the wget test client?21:08
@fungicide:matrix.orgin theory yes21:09
@fungicide:matrix.orgthe request hasn't showed up in either of apache's access log or error log yet21:09
@fungicide:matrix.orgit eventually returned after about a minute, and was a successful get (302 redirect to a 200 page)21:10
@clarkb:matrix.orgboth docs.opendev.org and docs.openstack.org are still working for me. I wonder if we're tripping over some unexpected behavior because we're modifying that database in a way they don't expect us to? Or maybe it's specific to your test ip location?21:10
@fungicide:matrix.orghah, and now when i try it again i get a 403 forbidden21:11
@clarkb:matrix.organd you didn't rerequest the bad path in the interim?21:11
@fungicide:matrix.orgcorrect, nothing on that vm requested anything from static sites21:12
@fungicide:matrix.orgthe access log contains a record of the requests now, it's the recent client with a "Wget/1.25.0" ua21:13
@clarkb:matrix.orgdoes it show your 200/302? I wonder if there is a period where it thinks it is expired then unexpires it after processing a request?21:14
@fungicide:matrix.orgaccess log even seems to record the delay, there's a nearly 5-minute delay between the 302 redirect and the 200 page it redirected to21:14
@clarkb:matrix.orgAlso I've realized that my timezone math was off by an hour. The 2200-0000 opendevent time starts in about 45 minutes. Unfortunately I have to do a school run in about 55 minutes. If we readjust to 2235-0000 that is probably best for me and I'll do my best to get back from the school quickly21:15
@fungicide:matrix.organd then a minute after that i tried / again and got a 403 forbidden21:15
@clarkb:matrix.orgweird. This feels like "you're using initcol field and key expirations in an unexpected manner" resulting in unexpected behavior but that is just a hunch21:16
@fungicide:matrix.org`[04/Mar/2026:21:04:21 +0000] "GET / HTTP/1.1" 302 4302 "-" "Wget/1.25.0"`21:16
@fungicide:matrix.org`[04/Mar/2026:21:09:02 +0000] "GET /2025.2/ HTTP/1.1" 200 35990 "-" "Wget/1.25.0"`21:16
@fungicide:matrix.org`[04/Mar/2026:21:10:49 +0000] "GET / HTTP/1.1" 403 4241 "-" "Wget/1.25.0"`21:16
@clarkb:matrix.orgre opendevent we managed to cover all the topics in the last two sessions. I figured I'd show up to the third one today just to cover any additional topics people want to go over and catch those who couldn't make the first two sessions up on what we discussed. Hopefully a shift to a later start time works for that21:17
@fungicide:matrix.orgyeah, feels like it timed out trying to look up something, failed open and returned normal content21:17
@fungicide:matrix.orgthen managed to cache the actual lookup and refused the next request21:17
@fungicide:matrix.orgi'm honestly wondering if it wouldn't be simpler and still effective to just drop the overrides entirely and let it use the default hour expiry for everything21:19
@clarkb:matrix.orgya I think the only downside to an hour expiry is it might be less aggressive with shooing things away which may lead to larger conntrack tables21:21
@clarkb:matrix.orgbut the conntrack table now seems stable enough and the larger size doesn't seem to be an inherent issue with the actual problem under control21:21
@fungicide:matrix.orgif we set `SecCollectionTimeout 86400` globally on the server, do we need `expirevar:ip.honeypot=86400` in the rule at all?21:21
@clarkb:matrix.orgfungi: I'm not sure. I think the idea is that we have a row in a db with multiple values and the specific values can timeout separately21:22
@clarkb:matrix.orgfungi: in the case of IP addresses we'd keep the ip address row for up to a day before treating it as invalid. If we were to do a timeout for honeypot less than 24 hours then I think we need to keep the value21:23
@clarkb:matrix.organd it will expire the row later?21:23
@clarkb:matrix.orgtrying to math that out in my head I suspect we're ok with it as long as the value is the same but should have a value if they differ (and it really only makes sense to have differences that are smaller than the row timeout)21:23
@fungicide:matrix.orgyeah, i guess it was unclear to me whether ip.honeypot would inherit the global SecCollectionTimeout expiry21:24
@clarkb:matrix.orgfungi: I suspect the reason the documentation and examples show value specific expiration timers is that you can handle them on shorter intervals than the db row expiration21:26
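To make the interaction being discussed here concrete, a hypothetical honeypot rule sketch in ModSecurity v2 syntax (the rule ids are placeholders and this is not OpenDev's actual production rule; the honeypot path is the one mentioned earlier in the log). Without the `expirevar`, `ip.honeypot` lives as long as the whole `ip` collection record, i.e. until `SecCollectionTimeout`; the `expirevar` gives that one variable its own, typically shorter-or-equal, lifetime:

```apache
# Bind a per-client collection keyed on the source address
SecAction "id:900100,phase:1,pass,nolog,initcol:ip=%{REMOTE_ADDR}"

# Tag anyone fetching the honeypot path; expire the flag after a day
SecRule REQUEST_URI "@beginsWith /developer/watcher/datasources/" \
    "id:900101,phase:1,pass,nolog,setvar:ip.honeypot=1,expirevar:ip.honeypot=86400"

# Refuse all requests from tagged clients
SecRule IP:HONEYPOT "@eq 1" "id:900102,phase:1,deny,status:403"
```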
@fungicide:matrix.orgi just noticed that the libapache2-mod-security2 package installs a /etc/modsecurity/modsecurity.conf-recommended though that doesn't include any SecCollectionTimeout override21:36
@clarkb:matrix.orgyes I'm not sure how valuable that is. I think it overrides the database file location dir too21:37
@clarkb:matrix.orgcompared to the config we actually load in /etc/apache2/mods-enabled21:37
@fungicide:matrix.orgindeed, it switches SecDataDir from /var/cache/modsecurity to /tmp/21:39
@clarkb:matrix.orgthat might give us better performance with the db and locking if /tmp is a tmpfs?21:43
@clarkb:matrix.orgbut I don't know that is critical now that things have settled down21:43
@fungicide:matrix.orgalso /tmp would be cleared at reboot if so21:43
@clarkb:matrix.orgI'm noticing a couple of post failures in the zuul status dashboard with no logs similar to the zuul hourly deployment job from earlier today. I checked one of them (https://zuul.opendev.org/t/openstack/build/2d4562ce29f04349b221188ee51d52c2) and it ran on ze07 and attempted to upload to ovh gra21:59
@fungicide:matrix.orgi've unapplied the ip.KEY expiry overrides on static02 and added `SecCollectionTimeout 86400` in a `/etc/modsecurity/collection-timeout.conf` file then restarted apache and tested from another previously untainted vm21:59
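For reference, the drop-in described here amounts to something like the following (a sketch; the filename is the one given in the message, and the directory is picked up via the `IncludeOptional /etc/modsecurity/*.conf` noted earlier in `mods-enabled/security2.conf`):

```apache
# /etc/modsecurity/collection-timeout.conf
# Keep collection records (e.g. the per-IP honeypot table) for one day
# instead of mod_security's 3600-second default.
SecCollectionTimeout 86400
```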
@clarkb:matrix.orgI'm not sure yet if this is a persistent issue we need to be modifying base job config for but we should keep an eye on it.22:00
@clarkb:matrix.organd as mentioned earlier I have to pop out shortly for a school run which conflicts with the previously planned 2200 UTC start time for the last pre ptg opendevent block22:00
@clarkb:matrix.orgI'll be back around 2235 UTC and will jump on meetpad then22:01
@clarkb:matrix.organd if there isn't anything more to discuss we don't have to hang around longer. I just want to make sure my late start is noted so there is hopefully less confusion if anyone does join22:01
@fungicide:matrix.orgone thing i've noticed with triggering mod_security rules now is that there's a minute or so lag between when i hit a trigger url and when it starts returning 403 on other normal urls for the site, which i suspect is related to the size of the database. when we were originally testing it was ~instantaneous22:02
@clarkb:matrix.orgfungi: I think it's actually due to how often the different processes load the db from disk?22:03
@clarkb:matrix.orgI want to say I read something about that and it reloads the db every so often otherwise its local memory copy can be stale relative to the other processes?22:04
@fungicide:matrix.orgah, that would also make sense22:04
@fungicide:matrix.organyway, it's consistently far more laggy about it now than it was initially22:05
@clarkb:matrix.orgwhen we originally tested I suspect we had a small number of processes so it was probably more likely to avoid this issue? But I'm not positive this is the reason22:06
@clarkb:matrix.orgalso I have to pop out now. Back in a bit22:06
@fungicide:matrix.orgalso likely, yep. talk to you shortly22:06
@clarkb:matrix.orgok back22:36
@clarkb:matrix.orgLooks like I'm the only one on meetpad. I'll hang out for a bit and happy to go over anything we've already talked about or talk about new stuff22:37
@jim:acmegating.comClark: i can join if you need me, but i think i got fairly good coverage of topics of interest previously. lmk if you think otherwise.  :)22:48
@clarkb:matrix.orgcorvus: it's just fungi and I so far and ya I think we covered most of the stuff with you already22:48
@clarkb:matrix.orgif you don't have anything new then I wouldn't bother22:48

Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!