Tuesday, 2026-03-03

@mnasiadka:matrix.orgI think releases.openstack.org got overwhelmed by crawlers or something similar13:25
@gtema:matrix.orgdocs.openstack.org are also not reachable13:28
@garyx:matrix.orgtarballs.opendev.org is down as well 13:47
@mnasiadka:matrix.orgThat's probably the same server :)13:53
@garyx:matrix.orgyeah most likely, just reporting it as well :)13:53
@mnaser:matrix.orgah I was just about to join the train :)14:08
@fungicide:matrix.orgi'm looking into it, seems like i can't even ssh into the server14:10
@fungicide:matrix.orgsadly rackspace classic doesn't implement the console log, but what little i can see from the novnc console is showing blocked processes and hung kernel tasks, no idea how old those are though since it's timestamped by seconds since boot14:13
@fungicide:matrix.orgserver instance is in an active state and pressing return in the console gives me a login prompt, but we don't set passwords on any accounts and my ssh attempts to it time out both over ipv4 and ipv6 so best i can do is attempt a ctrl-alt-del from the console or reboot over nova api14:15
@fungicide:matrix.orgeven ping (icmpv4 and v6) are mostly dead. i did manage to get two v4 echo replies just now but that was like 95% lost14:17
@garyx:matrix.orgyeah ping for me is 99% dead14:17
@mnaser:matrix.orgare other rackspace servers properly working or not?  maybe its a widespread network issue there14:17
@fungicide:matrix.orgi can ssh into afs01.dfw.openstack.org in the same region just fine, but it also gets almost 100% packet loss trying to reach static02.opendev.org locally14:19
@mnaser:matrix.orgah so maybe vm or hypervisor local issue then14:19
@fungicide:matrix.orgthe secondary nic on static02 which is routed over a separate rfc1918 network is also almost 100% packet loss14:20
@fungicide:matrix.orgwe also don't seem to have any tickets from rackspace support notifying us of an impacting outage or anything, so i'll proceed with attempting a graceful reboot14:21
@fungicide:matrix.orgi think the server itself was likely overloaded, because as soon as it started terminating processes during shutdown i suddenly stopped getting packet loss14:22
@fungicide:matrix.orgserver instance14:22
@mnaser:matrix.orgyeah same here, and looks like its back14:23
@mnaser:matrix.orgat least icmp wise :)14:23
@fungicide:matrix.orgit's possible our recent increase in apache worker slots (in an attempt to handle more crawlers) was too generous14:23
@fungicide:matrix.orgi can ssh into it again14:24
@fungicide:matrix.organd sites it serves return content for me now14:24
@garyx:matrix.orgSame, content is working as far as I can tell. 14:25
@fungicide:matrix.org#status log Rebooted static02.opendev.org in order to return it to working order, as it was unreachable and appeared to be overloaded; investigation is underway14:25
@garyx:matrix.orgDo you have any monitoring running on this one to see the load and/or logs? 14:26
@fungicide:matrix.orggaryx: we do, it's a pain to get to at the moment because it's in severe need of being rebuilt/replaced and we've locked off public access in order to not expose any vulnerabilities in the seriously outdated software it runs14:27
@fungicide:matrix.orgbut i'll get an ssh tunnel set up to the cacti server from my workstation and pull up the graphs in a bit14:28
@garyx:matrix.orgThat's totally understandable, tech debt is a thing14:28
@fungicide:matrix.orgi think i may see what was impacting the networking14:29
@fungicide:matrix.orgdmesg is already flooded with "nf_conntrack: table full, dropping packet"14:29
@mnaser:matrix.orgahh14:29
@garyx:matrix.orgif you have the memory you can resize the table. 14:30
@fungicide:matrix.orgso whatever is hammering it seems to be exhausting the default 64k entry limit on conntrack table size14:30
@fungicide:matrix.orgwell, allowing it to handle more connections than that may also just cause it to fall over for other reasons14:30
@garyx:matrix.orgTrue, but I have made that table bigger in quite a few instances and usually it's fine. 14:31
@fungicide:matrix.orgseparately, statusbot never acknowledged my log command, so it probably needs to be looked at as well14:31
@garyx:matrix.orgYou can always revert later. 14:31
@fungicide:matrix.orgyeah, it's an 8gb ram server instance and currently using about 6gb of that immediately after reboot, though 1.5g is occupied by buffers/cache14:32
@fungicide:matrix.orggranted, the ram usage is all apache workers, so probably unusually high due to whatever bot army has decided to index the whole thing at once14:33
@garyx:matrix.orgyeah, that table if you resize it does not use that much memory in my experience. Apacher workers use much more. 14:34
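A rough back-of-the-envelope check supports that: assuming something on the order of 320 bytes per conntrack entry (a typical figure, not an exact one; the hash table buckets add a bit on top), even the largest limit tried here is modest next to a fleet of Apache workers:

```shell
# Rough memory cost of the conntrack table at the raised limit.
# 320 bytes/entry is an assumption (right order of magnitude for
# nf_conntrack; the exact struct size varies by kernel version).
entries=524288
bytes_per_entry=320
echo "~$((entries * bytes_per_entry / 1024 / 1024)) MiB"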
@fungicide:matrix.orgbut yeah, the web sites are already back to being unreachable for me, though my ssh session is still fine presumably due to having an established session14:34
@garyx:matrix.orgHopefully the army stays away but if the table fills up once, I usually see it happen again later. 14:36
@fungicide:matrix.orgokay, i doubled it with `sysctl -w net.netfilter.nf_conntrack_max=131072`14:36
@garyx:matrix.orgYou can save that also in /etc/sysctl.conf so it survives reboot. 14:37
@fungicide:matrix.orgwell, yeah this is just testing for now14:37
@garyx:matrix.orgGetcha14:37
@fungicide:matrix.orgif we want it to survive we'll put it in configuration management so it survives more than just reboots14:37
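For reference, the persistence garyx mentions would be a one-line drop-in like the following (a sketch; the filename is made up, and as fungi says the real deployment would be templated by configuration management instead):

```
# /etc/sysctl.d/90-conntrack.conf  (hypothetical filename)
net.netfilter.nf_conntrack_max = 524288
```

Applied on the running system with `sudo sysctl --system`.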
@fungicide:matrix.orgstill getting table full messages even after doubling the limit14:38
@fungicide:matrix.orgdoubled again to 262144 now14:38
@mnaser:matrix.orgi wonder if there's some form of bad clients that are opening and not closing connections14:39
@fungicide:matrix.orgstill getting those errors14:39
@fungicide:matrix.orgincreased to 524288 now14:40
@fungicide:matrix.orgfwiw, this is similar to the situation we've been getting on wiki.openstack.org for the past few days now14:41
@jim:acmegating.comfungi: statusbot failed due to an error from wiki14:41
@fungicide:matrix.orggo figure14:41
@fungicide:matrix.orgi think the internet may finally have collapsed due to badly-configured llm training crawlers, and we can take up goat farming14:42
@jim:acmegating.comi concur14:42
@garyx:matrix.orgYay, goat farming might just be the pivot I was looking for from devops. 14:43
@garyx:matrix.orgHave you run  `conntrack` to see if you have ip's hogging the connections table? 14:44
@fungicide:matrix.orgfwiw, i'm not getting new table full errors in dmesg yet at 524288, but i still can't load page content14:44
@mnaser:matrix.orgI'm going to guess that it's probably because all slots are occupied in apache14:45
@fungicide:matrix.orgat this point probably14:45
@fungicide:matrix.orgalso dmesg is periodically logging eth0 going in and out of promiscuous mode for a few seconds at a time14:45
@fungicide:matrix.orgnot sure if that's just normal operation14:45
@mnaser:matrix.orgthat's weird, anyone running tcpdump on that system?14:45
@jim:acmegating.comyep that's me14:45
@jim:acmegating.commost apache slots are in "R" reading request14:46
@fungicide:matrix.orgi concur that apache worker slots are probably full, but i can't get it to respond to my request for server-status over the loopback to confirm, so will just assume14:46
@fungicide:matrix.orgoh, you got a response14:46
@jim:acmegating.commost requests are for /developer/watcher/datasources/datasources/actions/strategi14:47
@jim:acmegating.com(it's truncated in the view)14:47
@jim:acmegating.comsome other docs in that hierarchy too14:48
@fungicide:matrix.orgsean-k-mooney: ^ any idea if that's something software could try to request programmatically or whether it's just documentation?14:48
@jim:acmegating.coma quick look at the ip addrs suggests that they are all unique.  also, ipv4 and ipv6.14:51
@fungicide:matrix.orgalso nf_conntrack_count did eventually reach 524288 and it's hovering there again14:52
@fungicide:matrix.orgi'm trying to figure out which table(s) it's in14:52
@tafkamax:matrix.orgSelf DOS would be a classic move though 😁14:54
@tafkamax:matrix.orgWe have openvas that does it to our infra...14:54
@tafkamax:matrix.orgneed to tune alertmanager because of it.14:55
@garyx:matrix.orgI mean who hasn't done that at least once in their career? 14:55
@jim:acmegating.comi'm a little confused; in afs, i don't see a "datasources" directory under /afs/openstack.org/docs/developer/watcher14:57
@jim:acmegating.comindeed: "GET /developer/watcher/datasources/datasources/actions/datasources/actions/strategies/datasources/actions/contributor/actions/strategies/man/strategies/man/integrations/strategies/admin/architecture.html HTTP/1.1" 30115:00
@jim:acmegating.comperhaps a redirect loop is involved15:00
@fungicide:matrix.organalyzing conntrack table entries, the majority of connections are from the server to itself, so presumably some activity (maybe hitting afs-hosted content since that's about the only thing it does) is resulting in local connections that aren't terminating15:00
@fungicide:matrix.orgcodesearch turns up a bunch of hits in openstack watcher and vitrage docs about datasources configuration15:03
@jim:acmegating.comfungi: it looks like something is generating requests for a bunch of bogus urls like above, perhaps a redirect loop or some other misconfiguration (either on our side or theirs).  perhaps each of those is resulting in afs traffic in order to get a negative fstat result; so we're not benefitting from the afs cache in that case.15:09
@fungicide:matrix.orgthat would certainly make sense15:10
@fungicide:matrix.orglooking at https://opendev.org/openstack/watcher/raw/branch/master/doc/source/admin/index.rst it seems like that could be the source of the initial links into nonexistent top-level datasources15:12
@jim:acmegating.comif i wget http://docs.openstack.org/developer/watcher/datasources/datasources/actions/strategies/actions/actions/strategies/man/strategies/actions/datasources/datasources/admin/install/admin/integrations/strategies/index.html i get a location header pointing to the same url15:12
@fungicide:matrix.orgyeah, that certainly would lead to a circular trap for crawlers15:13
@jim:acmegating.comoh wait sorry, that was an http->https redirect; i'm retrying that correctly15:14
@fungicide:matrix.orgfwiw those paths exist in the git repository but don't seem to have gotten installed into afs15:14
@jim:acmegating.comhrm, it looks like we don't split our http/https logs, so perhaps all those 301s are just http->https redirects also?15:15
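The http-vs-https mixup just above is easy to make when reading combined logs; the distinction being checked boils down to comparing the request URL with the Location header it returned. A tiny self-contained sketch (hypothetical helper, POSIX sh):

```shell
# classify_redirect URL LOCATION -> "loop", "scheme-upgrade", or "other".
# A Location identical to the request is a redirect loop; a Location that
# only swaps http for https is the harmless scheme-upgrade redirect.
classify_redirect() {
    url=$1
    loc=$2
    if [ "$loc" = "$url" ]; then
        echo loop
    elif [ "$loc" = "$(echo "$url" | sed 's|^http:|https:|')" ]; then
        echo scheme-upgrade
    else
        echo other
    fi
}

classify_redirect "http://docs.openstack.org/watcher/" \
                  "https://docs.openstack.org/watcher/"
```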
@fungicide:matrix.orgi wonder if their most recent docs build got interrupted during post-run and didn't write the whole thing15:15
@jim:acmegating.comroot marker is: Project: openstack/watcher Ref: master Build: 1324c0b3665643018366489b7dbbb248 Revision: 5f179609d0ee145fc7957972c83593cce242884d15:16
@jim:acmegating.comi can't remember if that's written at the start or end15:16
@tkajinam:matrix.orghttps://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-tox-docs&project=openstack/watcher15:17
@tkajinam:matrix.orgat least the promotion job succeeded without any error, according to zuul15:17
@fungicide:matrix.orgthe site's .htaccess file includes `redirectmatch 301 ^/developer/watcher($|/.*$) /watcher/latest/` and `redirectmatch 301 ^/watcher/?$ /watcher/latest/` (same as for other openstack projects) but watcher isn't doing versioned publishing either15:18
@jim:acmegating.comhttps://zuul.opendev.org/t/openstack/build/1324c0b3665643018366489b7dbbb248  this doesn't exist... that build should be in the openstack tenant, right?15:18
@jim:acmegating.comJun 30  2017 .root-marker15:20
@jim:acmegating.comis /afs/openstack.org/docs/developer/watcher the right place?15:20
@fungicide:matrix.orgoh! okay, `/afs/openstack.org/docs/developer/watcher` is stale content15:20
@tkajinam:matrix.orgI guess that /developer path is an old path ?15:20
@fungicide:matrix.orgthe redirects above should be going into `/afs/openstack.org/docs/watcher/latest` which does exist15:21
@fungicide:matrix.orgso whatever's making those requests isn't following the 301 redirect responses then15:21
@jim:acmegating.comor perhaps they are, but they're not getting an answer due to the afs contention15:22
@fungicide:matrix.organd e.g. `/afs/openstack.org/docs/watcher/latest/datasources` is there15:22
@jim:acmegating.comi haven't been able to complete a request to one of those urls yet15:23
@fungicide:matrix.orgokay, so possible this is a symptom15:23
@jim:acmegating.comif i restart apache, is it possible i might get a request through?15:24
@jim:acmegating.comi'd really like to see the response15:24
@fungicide:matrix.orgyes15:26
@jim:acmegating.comfungi: i'm going to try that15:26
@fungicide:matrix.orggo for it15:26
@fungicide:matrix.orgnot like it will get any more broken than it already is15:26
@jim:acmegating.comLocation: https://docs.openstack.org/watcher/latest/ [following]15:27
@jim:acmegating.comthat lgtm15:27
@jim:acmegating.comit's starting to look like bad url handling on the side of the botnet?15:28
@fungicide:matrix.orgokay, so presumably any redirect loops (if they're happening) are a cascade effect once things start breaking down15:28
@fungicide:matrix.orgthat does seem more likely15:28
@fungicide:matrix.orgwe could start adding those malformed urls to our waf rules temporarily, so that all the ip addresses requesting them get forbidden15:29
@jim:acmegating.comfungi: how do you feel about making /developer/watcher/datasources/.* a tripwire for the new system?15:29
@jim:acmegating.comyeah that :)15:29
@fungicide:matrix.orgprecisely, yes15:29
@jim:acmegating.comyou've got my +2 if you want to do that15:30
@fungicide:matrix.orgi'll temporarily add it to the vhost config and see what happens15:30
@fungicide:matrix.orgoh! actually that depends on https://review.opendev.org/978118 but i can hack something in for now15:31
@fungicide:matrix.orghopefully clients requesting anything under /developer/watcher/datasources/ on docs.o.o are getting 403 denied responses now15:38
@fungicide:matrix.org`[Tue Mar 03 15:38:56.480726 2026] [:error] [pid 17077:tid 140594856805952] [client <redacted>:57798] [client <redacted>] ModSecurity: Access denied with code 403 (phase 1). String match "/developer/watcher/datasources/" at REQUEST_URI. [file "/etc/apache2/sites-enabled/50-docs.openstack.org.conf"] [line "24"] [id "9002"] [hostname "docs.openstack.org"] [uri "/developer/watcher/datasources/datasources/actions/strategies/actions/actions/strategies/man/strategies/actions/strategies/configuration/datasources/admin/datasources/man/integrations/index.html"] [unique_id "aacAkIBlhckR6gJpFHdf3AAADU8"]`15:39
@fungicide:matrix.orgthat does seem to work15:40
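Pieced together from the error-log entry above, the temporary vhost rule would look roughly like this (a reconstructed sketch: the id, phase, match string, and honeypot variable match what the log shows, but the `@contains` operator and exact quoting are assumptions):

```
SecRule REQUEST_URI "@contains /developer/watcher/datasources/" \
    "id:9002,phase:1,t:lowercase,deny,setvar:ip.honeypot=+1,expirevar:ip.honeypot=86400"
```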
@fungicide:matrix.orgi'll put static02.opendev.org into the disable list for ansible updates temporarily, but i am starting to be able to get content again15:41
@noonedeadpunk:matrix.orgjust to mention that, right before the issue this patch was merged: https://review.opendev.org/c/openstack/openstack-ansible/+/94949715:42
@noonedeadpunk:matrix.orgwhich I wouldn't think would create issues, but it's kind of unconventional too15:42
@fungicide:matrix.orgthanks for the pointer, i agree it's probably unrelated15:43
@jim:acmegating.comfungi: um how do i clear my ip? :)15:45
@fungicide:matrix.orgchecking...15:46
@clarkb:matrix.orgcorvus: fungi I think if you restart apache it will rebuild the table?15:48
@clarkb:matrix.orgthen you can let the bot reblock itself15:48
@clarkb:matrix.orgmaybe15:48
@fungicide:matrix.orgtesting that theory now15:48
@clarkb:matrix.orgI'm not sure if there is a more precise apache mod security table edit method15:48
@fungicide:matrix.orgi guess it depends on whether this is in-memory tables or persisted to disk15:48
@clarkb:matrix.orgI'm 99% certain it is in memory not disk15:49
@clarkb:matrix.orgIs there anything I can be doing to help at this point?15:51
@fungicide:matrix.orgnot sure yet. ideating?15:52
@clarkb:matrix.orgI guess the service still isn't reachable with the waf block in place so that may not be the only issue?15:53
@jim:acmegating.comi think it's stored in /var/cache/modsecurity/15:53
@jim:acmegating.com-rw-r----- 1 www-data www-data 3469814784 Mar  3 15:53 /var/cache/modsecurity/www-data-ip.pag15:53
@fungicide:matrix.orgwell, after restarting apache we went back to the conntrack table being full15:54
@fungicide:matrix.orgwhich i think is why it's not responding again15:54
@jim:acmegating.comi wonder if apache is still doing some stats even with the mod_security rule?15:55
@jim:acmegating.comlike, maybe it's looking for htaccess files in case they modify the request path, even though it ends up going through mod_security in the end15:55
@jim:acmegating.comalso, there is some lock contention on the mod security database file15:56
@fungicide:matrix.orgat least we don't seem to have a custom 403 page configured in that vhost (only 404)15:56
@jim:acmegating.comokay i picked a random apache process to strace;  i'm not seeing any stats to  afs for bad watcher paths, so i don't like my theory that it's still doing stats.15:58
@clarkb:matrix.org`sudo conntrack -L | cut -d' ' -f 10 | sort | uniq -c | sort` makes it look like a proper ddos15:58
@jim:acmegating.comClark: i agree, i didn't see any duplicate ips15:59
@jim:acmegating.comi'm starting to see a higher proportion of legitimate requests... well, okay, requests from better-behaved llm crawlers, than bad ones16:01
@fungicide:matrix.orgno more conntrack table full errors for the past several minutes16:02
@fungicide:matrix.organd i'm starting to get page content returned again16:02
@fungicide:matrix.orgstill taking a while to fetch server-status16:03
@fungicide:matrix.orgfinally came back, and full of about 50% reading request / 50% logging16:04
@clarkb:matrix.orgcross referencing conntrack listed IPs to apache logs some show that request you identified as potentially problematic and others don't show up at all. I wonder if they simply never got far enough to process a request in apache16:04
@fungicide:matrix.orgmost likely, with the workers slap full16:05
@clarkb:matrix.orgalso many of the IPs seem to originate in a specific (but broad) location in the world16:05
@fungicide:matrix.orgi'm hoping that the waf contention is apache busy writing new clients to the block list and will settle down soon16:06
@jim:acmegating.comyeah... except i haven't found a way to remove an ip from that list16:06
@clarkb:matrix.orgwhere do we see the waf contention?16:06
@jim:acmegating.comClark: flock for the dbm file in strace of apache16:07
@fungicide:matrix.orgserver-status indicates that basically all requests are for /developer/watcher/datasources/...16:07
@clarkb:matrix.orgack thanks16:07
@fungicide:matrix.orgfollowing `/var/log/apache2/docs.openstack.org_error.log` it's constantly updating16:08
@jim:acmegating.comi see more slots in "Logging" state than "Reading"16:09
@jim:acmegating.comi wonder if writing to the list happens in that phase16:09
@fungicide:matrix.orgi wonder if we need to make the waf hits not log16:09
@fungicide:matrix.orgspot checks don't show us logging any client address more than once though16:10
@jim:acmegating.comzuul logs a lot more data than that; i doubt it's the actual access log that's slow16:10
@clarkb:matrix.orgfungi: I think you can add `,nolog` to the `"id:9002,phase:1,t:lowercase,deny,setvar:ip.honeypot=+1,expirevar:ip.honeypot=86400"` section to stop logging it16:10
@fungicide:matrix.orgso far it looks like it only logs once as it adds the offending client anyway16:11
@jim:acmegating.comi don't think we should stop logging it16:11
@fungicide:matrix.orgthat'll require another apache restart, which i'd rather avoid after how long it was offline during the previous restart16:11
@jim:acmegating.comwell, i've been trying to say we're going to have another one if we can't find a way to remove an ip16:12
@fungicide:matrix.orgpage content is returning quickly for me now too, so hopefully this has reached a happy state even though we're logging a constant flood of errors from waf16:12
@clarkb:matrix.orghttps://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v3.x)#persistent-storage indicates you can store things in memory or on disk16:12
@clarkb:matrix.orgbut I don't see how to convert this to a memory store yet16:13
@fungicide:matrix.orgmaybe we can switch to memory during the restart and then let it rebuild the table of offenders16:13
@clarkb:matrix.orglooks like it might be a global mod security configuration item16:15
@clarkb:matrix.orgrather than table specific16:15
@clarkb:matrix.orgwe are actually using v2 so this document is more correct: https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)16:17
@fungicide:matrix.orglooking at some of the trapped clients' requests in the access logs, they're using varied user agent strings from one request to the next too16:17
@fungicide:matrix.orgeven from the same ip address16:17
@fungicide:matrix.orgthe addresses are from allocations managed by rirs all over the world too, looks like maybe mobile clients?16:19
@fungicide:matrix.orgmy guess is one of the larger compromised device bot armies has been asked to crawl the entire web16:19
@fungicide:matrix.organd badly16:19
@clarkb:matrix.orgI suspect (but am not positive) that if we set SecDataDir in /etc/apache2/mods-enabled/security2.conf to some dir that apache can't write to it would stop persisting the data. But that seems super hacky. I feel like there must be some way to force memory only but haven't found it16:20
@jim:acmegating.comhttps://github.com/SpiderLabs/modsec-sdbm-util  might be useful for editing16:24
@clarkb:matrix.orgI'm half wondering if we should be looking at a non mod security solution so that we can switch to that then drop the db entirely (since that seems simpler than building a tool to edit the live db)16:30
@jim:acmegating.comit's an old repo but it compiles on jammy16:30
@clarkb:matrix.orgbut we've established a huge variety of IP source and different user agents. Maybe we just mod rewrite that location entirely for now?16:31
@jim:acmegating.comClark: what do you mean?  have mod_rewrite do what?16:31
@clarkb:matrix.orgcorvus: rewrite requests to /developer/watcher/datasources to a 403 response16:32
@jim:acmegating.comwithout the waf?16:32
@fungicide:matrix.orgit would be nice if any addresses requesting that path didn't consume resources requesting other paths too16:33
@clarkb:matrix.orgyes. Though I guess these IPs are requesting other paths as well? This is just the identifying bad location, so having the waf block them entirely may be what is making things better?16:33
@clarkb:matrix.orgya that16:33
@fungicide:matrix.orgthat's not the only thing they're requesting, it's just that those are the requests taking longer and eating more apache slots16:33
@fungicide:matrix.orgbecause they're nonexistent files and so not getting a response back from the afs cache immediately16:34
@clarkb:matrix.orglooking at conntrack the counts are slowly falling so I think the situation is improving? Just slowly16:34
@jim:acmegating.comi have built a copy of that utility and copied the database to the container where i built it and am testing it.  it seems to read the database correctly.  it has 86923 records.16:35
@fungicide:matrix.orgyes, i think as we get more of the offending addresses blocked, they're not keeping connections open as long because apache responds immediately with a http/40316:35
@jim:acmegating.com```16:37
__expire_KEY: 1772556167
KEY: <my ip>
TIMEOUT: 3600
__key: <my ip>
__name: ip
CREATE_TIME: 1772552567
UPDATE_COUNTER: 1
honeypot: 1
__expire_honeypot: 1772638967
```
@fungicide:matrix.orgserver-status is showing a bunch more worker slots waiting for new connections now16:37
@jim:acmegating.comthat's what a record looks like... it looks like expire_honeypot is 24h,  but expire is 1 hour16:37
@jim:acmegating.comi wonder if the 1 hour expiration would cause the entry to be removed even though the honeypot timeout is longer?16:38
@fungicide:matrix.org1772638967 is 15:42:47 utc tomorrow16:38
@jim:acmegating.comthe key expiration time is coming up in 4 minutes though.16:38
@fungicide:matrix.orgyeah, if you fall out of the database and are no longer blocked i guess we'll know16:39
@clarkb:matrix.orgthe 24 hour expiry is the value that our config attempts to expire at. If your ip clears out automatically in 4 minutes then our config isn't doing quite what we expected but maybe good enough for now16:39
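The epoch timestamps in the record corvus pasted line up exactly with those two lifetimes:

```shell
# Deltas between the record's epoch-second fields (values from the pasted
# record): the key itself expires after an hour, the honeypot flag after a day.
create=1772552567
expire_key=1772556167
expire_honeypot=1772638967
echo "key ttl:      $((expire_key - create))s"        # 3600s  = 1 hour
echo "honeypot ttl: $((expire_honeypot - create))s"   # 86400s = 24 hours
```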
@jim:acmegating.comi will do some dishes then check to see if i'm still blocked.16:39
@fungicide:matrix.org#status log Implemented temporary Apache mod_security WAF rules to block clients that are collectively causing a distributed denial of service against our static site hosting16:41
@status:opendev.org@fungicide:matrix.org: finished logging16:41
@fungicide:matrix.orgseems like whatever was impacting the wiki server has temporarily abated, since statusbot is working again16:41
@clarkb:matrix.orgthey moved from wiki to docs :)16:42
@jim:acmegating.comlooks like i'm still blocked; i'm getting an updated copy of the db16:47
@fungicide:matrix.orgthe entry is refreshed there i guess?16:47
@fungicide:matrix.orgi wonder if that tool's `-r` option needs to be used with apache offline16:48
@fungicide:matrix.orgseems apache is getting even happier, seeing open slots with no process now even16:49
@fungicide:matrix.organd some gracefully finishing, indicating it's able to recycle worker processes again16:50
@clarkb:matrix.orgwe can test the SecDataDir to invalid location idea with our CI jobs if we want to try and use that hack to force things into memory only16:50
@jim:acmegating.comoh and now i'm unblocked16:51
@fungicide:matrix.orghuh16:51
@jim:acmegating.comi suspect there may be a periodic cleanup that has to expire the keys16:51
@jim:acmegating.comthe db copy i got just after my unblock time still had an entry for me with the same data16:51
@jim:acmegating.comi'll get a third copy and see if it's gone now16:51
@fungicide:matrix.orggood to know, so our current 24h expiration isn't relevant because those entries get tossed after 1h anyway16:52
@jim:acmegating.comyeah, that's the hypothesis, gimme a min to confirm16:52
@clarkb:matrix.orgso other than the ddos itself two things to look into are manipulating the db somehow (maybe by forcing it into memory only or via a tool to edit in place?) and better understanding the expiration situation16:53
@jim:acmegating.comyep, my entry is no longer in the db16:53
@fungicide:matrix.orgneat16:54
@clarkb:matrix.orgwe do `expirevar:ip.honeypot=86400` which does seem to set a 24 hour expiration for the honeypot value. But the key for the entry is what has an hour long expiry and that also results in clearing the item from the db early16:55
@jim:acmegating.comyep16:55
@jim:acmegating.comfor completeness, i have used the tool to remove a key from the database.  it appears to have worked.16:55
@jim:acmegating.comso if we want to keep using the dbm files, then i think it's worth building this tool.  it's a pretty straightforward classic autoconf c tool, and builds easily on jammy.16:56
@jim:acmegating.com(oh, to clarify, i just removed an entry from my local copy, i have not manipulated the server)16:57
@jim:acmegating.comanyone have any other questions about modsec-sdbm-util before i delete my ephemeral container?16:57
@fungicide:matrix.orgi wonder if it handles write locking properly or needs apache to be stopped16:58
@fungicide:matrix.orgbut useful either way16:58
@fungicide:matrix.orgthe documentation was a bit light and i haven't dug into the source to see16:59
@jim:acmegating.comit uses the apache runtime library; perhaps the `apr_sdbm_open` function handles locking?17:00
@jim:acmegating.comi see a lot of flock calls in strace17:02
@clarkb:matrix.orgI wonder if we can do `expirevar:ip.KEY=86400` to increase the time there.17:02
@fungicide:matrix.orgyeah, so maybe we can use it live in that case17:02
@fungicide:matrix.orgeven more convenient if so17:03
@jim:acmegating.comfungi: yeah, i'd say let's assume so and if we're wrong, then we just lose a database we're happy to throw away anyway.  low stakes.  :)17:03
@fungicide:matrix.orgright, it's not a huge deal to test when things are a little more calm17:04
@jim:acmegating.comClark: or maybe https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#user-content-SecCollectionTimeout17:05
@jim:acmegating.comnot sure how you specify which collection (ie, "IP") that applies to?17:05
@jim:acmegating.comall of them?17:06
@clarkb:matrix.orgcorvus: I think that goes in the global conf so it probably applies to all of them. In this case that is fine since we only have the one?17:06
@jim:acmegating.comsgtm17:07
@clarkb:matrix.org`/etc/apache2/mods-enabled/security2.conf` this file appears to be the one17:07
@clarkb:matrix.orgit does do some includes too so we could stash that in a .d dir to avoid modifying that package supplied file17:08
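A sketch of that drop-in (the filename and IfModule guard are assumptions; notably, the ModSecurity v2 reference lists a 3600-second default for SecCollectionTimeout, which would neatly explain the one-hour key expiry observed above):

```
# /etc/apache2/mods-enabled/security2-local.conf  (hypothetical drop-in)
<IfModule security2_module>
    # raise the collection timeout to match the intended 24h honeypot block
    SecCollectionTimeout 86400
</IfModule>
```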
@clarkb:matrix.orgcorvus: I'm checking in with gerrit upstream and gerrit.googlesource.com appears to be slow again. Are you able to check your gerrit-review.googlesource.com test/logs to see if that is looking better?17:15
@clarkb:matrix.orgI want to give them as much info as I can17:15
@clarkb:matrix.orgcorvus: the curl example you gave yesterday appears to be subsecond quick right now17:17
@clarkb:matrix.orgsounds like the gerrit.googlesource.com slowness is `a different ongoing incident`17:19
@fungicide:matrix.orgi wonder if they're getting flooded with requests for nonexistent pages from a global phone botnet17:23
@clarkb:matrix.orgthe thought did cross my mind :)17:24
@clarkb:matrix.orgfungi: thinking out loud here: any idea if caching 404s for say 10 minutes or an hour would help with performance with afs?17:25
@clarkb:matrix.orgwe can cache them on the apache side maybe so that we're not hitting openafs for what are likely missing files17:25
@fungicide:matrix.orgmaybe...17:27
@fungicide:matrix.orgi can't see an obvious downside to that17:27
@fungicide:matrix.orgother than someone expectantly requesting a new page they've created after it's promoted but before the vos release happens seeing a somewhat longer delay before content finally appears, but that should be minor as inconveniences go17:28
@clarkb:matrix.orgright I think if we keep it short to say 10 minutes we mitigate that problem but potentially take a lot of load off of afs?17:29
@clarkb:matrix.orgthen we're checking these files once every 10 minutes rather than 10k times a second17:29
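One way that might look (an untested sketch: it assumes mod_cache, mod_cache_disk, and mod_headers are enabled, the expr= condition needs Apache 2.4.10 or newer, and whether mod_cache will actually store these 404s depends on the rest of the cache configuration):

```
CacheQuickHandler off
CacheEnable disk /
# attach explicit freshness to 404s so they become cacheable for 10 minutes
Header always set Cache-Control "max-age=600" "expr=%{REQUEST_STATUS} == 404"
```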
@fungicide:matrix.orgpicking a random blocked client from the log, i see it made a second attempt at requesting a different nonexistent url under the same base path roughly 8 minutes after initially being blocked, so the same clients are continuing to try17:30
-@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 978566: propose-updates: Add ansible-lint https://review.opendev.org/c/openstack/project-config/+/97856617:44
-@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 978566: propose-updates: Add ansible-lint target https://review.opendev.org/c/openstack/project-config/+/97856617:44
@fungicide:matrix.orgcurrent nf_conntrack_count/nf_conntrack_max on static02 is still 396818/524288 or 76%17:46
@fungicide:matrix.orgthough it's continuing to fall (albeit slowly)17:47
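The percentage quoted there is just count over max, rounded; as a reusable sketch (hypothetical helper name):

```shell
# conntrack_usage COUNT MAX -> utilization as a rounded percentage.
# On a live host the inputs come from
#   /proc/sys/net/netfilter/nf_conntrack_count and .../nf_conntrack_max
conntrack_usage() {
    count=$1
    max=$2
    echo "$(((100 * count + max / 2) / max))%"
}

conntrack_usage 396818 524288   # 76%
```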
@fungicide:matrix.org#status log Rebooted wiki.openstack.org in order to get user session logins working again17:58
@status:opendev.org@fungicide:matrix.org: finished logging17:58
@clarkb:matrix.orgre wiki I wonder if we can identify a similar marker we could waf (though setting that up is a bit more work there). And I feel bad I'm coming up with all of these crazy ideas and also stuck in meetings all day so can't really dig in too easily18:01
@fungicide:matrix.orgsame18:13
@garyx:matrix.orgThanks for your work today guys, it's really appreciated. 18:47
-@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 978566: propose-updates: Add pcu target https://review.opendev.org/c/openstack/project-config/+/97856619:01
@clarkb:matrix.orgfungi: and I are in the opendevent meetpad: https://meetpad.opendev.org/opendevent-march-2026 if anyone else wants to join us. This is replacing the weekly team meeting today19:08
-@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 978566: propose-updates: Add pcu target https://review.opendev.org/c/openstack/project-config/+/97856619:30
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 978824: Upgrade Gerrit images to 3.11.9 and 3.12.5 https://review.opendev.org/c/opendev/system-config/+/97882420:52
@clarkb:matrix.orginfra-root ^ as promised on meetpad. As a heads up I need to eat lunch now then I have to go do an appointment. So I'm not sure I'll be back in time today to babysit the upgrade.  I think we can wait until tomorrow though and get it done first thing20:53
@clarkb:matrix.orgif anyone else finds users confused about unexplained merge failures it looks like github was having an outage today and that impacted some merge requests for some jobs.21:46
@clarkb:matrix.orgcorvus: It doesn't look like we log the build that triggered the merge request just the buildset so its a bit harder to track this to a specific build that has deps that maybe it doesn't need. Not sure if that is something that should be changed though the logs are already quite verbose21:46
@clarkb:matrix.orgcorvus: specifically on zuul01 I see `2026-03-03 19:18:15,544 DEBUG zuul.Scheduler: [e: 9bca931fca214313b49a919aa0093dd0] Processing result event <MergeCompletedEvent job: 32e9baf77b9f4217a6f400abd4eb10eb buildset: c6513d9027874a8f8bbc26c6cad499a3 merged: False updated: False commit: None errors: []>` and was able to track that to zm0321:48
@clarkb:matrix.orgbut not sure what specific build tripped over novnc21:48
@clarkb:matrix.orgfungi: the gerrit image builds failed on a bazel error: https://zuul.opendev.org/t/openstack/build/18e3358146ee431f94894c1a740fae2c/log/job-output.txt#1214-121721:50
@clarkb:matrix.orgCross checking against https://gerrit.googlesource.com/plugins/delete-project/+refs maybe the issue is that not all plugins got the new 3.11.9 and 3.12.5 tags? I just sort of assumed they would when I wrote the change21:51
@clarkb:matrix.orgbut we may need to check all of those and update the change accordingly. But I have to pop out in the next handful of minutes. If I get back at a reasonable hour I can try to run that down later21:52
@clarkb:matrix.orgI wonder why that didn't fail in zuul21:52
@fungicide:matrix.orgClark: just saw that myself and found the error in the logs, yeah21:53
@clarkb:matrix.orglooks like the hooks plugin hasn't gotten the tags either so this may just be the main repo that got tagged? I can check them all when I get back and update the change. Or feel free to do it and update the change too21:53
@clarkb:matrix.orgfungi: my suspicion is that we're falling back to master since that is the default checkout ref21:53
@clarkb:matrix.orgfungi: and master is not compatible with 3.11 and 3.1221:53
@fungicide:matrix.orgoh yeah21:53
@clarkb:matrix.organd if we update all the tags to valid values then it should be happy again21:54
@clarkb:matrix.orgbut gotta run now21:54

Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!