| @clarkb:matrix.org | I wanted to remind everyone that https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-24h&to=now&timezone=utc exists as well since I also forgot | 00:20 |
|---|---|---|
| @clarkb:matrix.org | but you can see the time period where things went sideways pretty clearly in there. You can also see that currently things seem to be much better behaved after we made our changes and after we reapplied them with ansible. | 00:21 |
| @clarkb:matrix.org | I do note that we don't seem to capture the total number of backend servers in an up/down state. That might be a good improvement if anyone is interested in updating the little tool for that. /me makes a note that could be a fun little programming task | 00:21 |
| @jim:acmegating.com | Clark: fungi mnasiadka the resolute build job got better: https://review.opendev.org/982182 | 01:07 |
| that may lend support to Clark's theory that the problem was disk space (if that build landed on a node with a larger disk) | ||
| @jim:acmegating.com | also the free space dstat graph worked | 01:09 |
| @harbott.osism.tech:regio.chat | looks like dib also needs opendev.org: | 06:30 |
| ``` | ||
| 2026-04-08 01:27:41.225 | Could not open project list url: 'https://opendev.org/openstack/project-config/raw/gerrit/projects.yaml' | ||
| ``` | ||
| @harbott.osism.tech:regio.chat | according to grafana we had a different type of burst from about 1-5 | 06:46 |
| @harbott.osism.tech:regio.chat | I'm testing devstack on resolute locally now and I noticed that Ubuntu offers two flavors of their cloud image now: amd64 vs. amd64v3 (see https://cloud-images.ubuntu.com/resolute/20260328/). I wonder whether we would want to do the same to match our coverage. ofc this would imply having the v3 images only run on providers that have the appropriate hardware | 08:40 |
| @priteau:matrix.org | I am afraid opendev.org is unreachable again | 11:13 |
| @priteau:matrix.org | Getting 403 errors | 11:14 |
| @mnasiadka:matrix.org | Having a look, seems like another new bot attack | 11:40 |
| @lajos_katona:matrix.org | yeah, I had to change to use a local upper-constraints.txt to be able to run any tox -e.... command locally | 11:41 |
| @harbott.osism.tech:regio.chat | that looks like the event from tonight again in grafana, not like yesterday. maybe time to go for anubis after all | 12:06 |
| @mnasiadka:matrix.org | Yeah, chasing the bots is great fun | 12:13 |
| @mnasiadka:matrix.org | Actually, added some bots to ua-filter, but it's still the same (in terms of the effect) - however there are a lot of git client connections | 12:32 |
| @mnasiadka:matrix.org | Jens Harbott: any ideas? ;-) | 12:32 |
| -@gerrit:opendev.org- Zuul merged on behalf of sean mooney: [openstack/project-config] 951395: Update nodeset for test-release-openstack https://review.opendev.org/c/openstack/project-config/+/951395 | 12:53 | |
| @mnasiadka:matrix.org | Pierre Riteau: lajoskatona seems it's better now, don't know if it's my user agent filtering or not 🤷 | 13:20 |
| @harbott.osism.tech:regio.chat | I don't have any more ideas, just noting that "it", whatever it is, seems to just have stopped | 13:20 |
| @harbott.osism.tech:regio.chat | also I have muted this channel, that also mutes pings, please ping me in IRC if needed | 13:21 |
| @lajos_katona:matrix.org | thanks, I'll check it | 13:21 |
| @mnasiadka:matrix.org | Maybe it's time to get Anubis patch rolling so we stop wasting time on battling such events | 13:23 |
| @mnasiadka:matrix.org | Let me raise a patch on my ua-filter changes for reference | 13:23 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 983721: Add more ua-filters https://review.opendev.org/c/opendev/system-config/+/983721 | 13:27 | |
| @harbott.osism.tech:regio.chat | mnasiadka: hmm, that's four bots I did consider well-behaved before | 13:29 |
| @mnasiadka:matrix.org | Jens Harbott: I don't think it really fixed anything, more likely the "attacker" got what they wanted - but I've seen hundreds of queries from these bots and that was the only thing standing out | 13:31 |
| @fungicide:matrix.org | any burst you saw on the graph in the 0100-0500z timeframe was possibly the daily periodic jobs thundering herd | 13:56 |
| @mnasiadka:matrix.org | Shouldn't these use Zuul cached repos? | 13:59 |
| @fungicide:matrix.org | based on the discussion in the openstack tc channel, sounds like a lot of projects have started running a precommit configuration that installs dependencies from git remotes | 14:05 |
| @mnasiadka:matrix.org | Uh oh | 14:06 |
| -@gerrit:opendev.org- Stephen Finucane proposed: [openstack/project-config] 983728: Add GitHub mirroring for devstack-plugin-prometheus https://review.opendev.org/c/openstack/project-config/+/983728 | 14:17 | |
| @harbott.osism.tech:regio.chat | I don't think a couple of pep8 jobs could have caused this. also it only happened today, not the days before | 14:27 |
| @mnasiadka:matrix.org | Well, that would sort of fit the theory - I have mainly seen git clients fetching whole repositories (apart from the bots in the change) — maybe we should check the number of jobs that ran in the problematic timeframe | 14:29 |
| @mnasiadka:matrix.org | And it would be fantastic if we logged real external IPs in the Gitea/Apache logs (at least it would be helpful to stop running around between haproxy and these two) | 14:30 |
| @fungicide:matrix.org | yeah, Clark's proxy protocol change would give us that in theory, though anubis would break it since it doesn't support proxy protocol yet | 14:39 |
| @clarkb:matrix.org | fungi: I think daily periodic jobs start at 0200? So that may not explain the spike in demand, but may explain the large numbers of git client requests? | 14:45 |
| @mnasiadka:matrix.org | It might have been both - the bots and git client requests | 14:45 |
| @clarkb:matrix.org | fwiw looking at grafana it seems like this confirms our previous thought that the prior problem required both updates to make things better. Because I'm assuming now we're just running with more apache worker slots. Which makes things better but not good enough to keep up | 14:45 |
| @clarkb:matrix.org | in particular we seem to be maxing out the haproxy allowed connections. Maxconn is set to 4k and we're sitting there | 14:46 |
| @clarkb:matrix.org | if we assume a mostly even distribution of connections, we've configured apache to allow 2048 connections per backend and we have 6. That is 12288 theoretical connections. If we have backend resource room we could potentially increase the haproxy maxconn limit and see if that helps. It might also make things worse again by overwhelming the backends | 14:47 |
| @mnasiadka:matrix.org | Well, load avg wise the backend didn’t look bad | 14:48 |
| @clarkb:matrix.org | mnasiadka: ya so maybe increasing the haproxy maxconn limit is something we should try | 14:49 |
| @clarkb:matrix.org | I also want to look at request logs during that timeperiod and see if anything jumps out to me. I agree that in general I think we should continue to push towards anubis though as I think that will help with the request fingerprinting and make it far more automatic | 14:49 |
| @clarkb:matrix.org | so I think the ~3 things we should do are consider bumping haproxy maxconns, double check request logs during that time period to see if anything stands out to other eyeballs, and keep making progress towards anubis | 14:50 |
| @fungicide:matrix.org | i'm going to go grab lunch real quick, but can help with those and/or get back to the pending gitea config changes in an hour-ish | 14:52 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 983740: Increase the gitea haproxy maxconn limit https://review.opendev.org/c/opendev/system-config/+/983740 | 14:59 | |
| @clarkb:matrix.org | Consider that an RFC | 15:00 |
| @clarkb:matrix.org | now to go look at logs | 15:00 |
| @mnasiadka:matrix.org | Clark: just by looking at Grafana I think there was some spike exactly at 13:30 my time, which was the moment people reported problems with Gitea access - https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=1&from=now-6h&to=now&timezone=utc | 15:16 |
| @clarkb:matrix.org | it looks like git clients were about half (maybe slightly more?) of the total traffic during 11:00-13:00 UTC which is the bulk of the prior episode. About 1/3 of the remaining traffic looks to be blatant crawling. Then a good chunk of what is left is the haproxy healthchecks, internal gitea http calls back to itself, pip, more well behaved bots, and a few actual browsers | 15:18 |
| @clarkb:matrix.org | I do wonder if this was a flood of git client requests which, while generally cheaper, in sufficient numbers would cause problems | 15:22 |
| @clarkb:matrix.org | I note that OSA updates the git user agent to identify itself. Thank you for that! | 15:22 |
| @mnasiadka:matrix.org | Would it make sense to do that for every project in some central place? | 15:31 |
| @clarkb:matrix.org | I don't know if it can be centralized. I think it is some part of the git config? But yes adding that to zuul and kolla and anywhere else we potentially use a lot of git may be helpful | 15:32 |
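For reference, the knob in question appears to be git's `http.userAgent` config option; a minimal sketch (the UA strings here are illustrative, and whether this can be centralized per project remains the open question above):

```sh
# Override the User-Agent git sends over HTTP(S); value is illustrative
git config --global http.userAgent "my-deploy-tool/1.0 (git/2.43.0)"

# Or per invocation, without touching global config
git -c http.userAgent="my-deploy-tool/1.0" clone https://opendev.org/openstack/nova
```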
| @clarkb:matrix.org | fwiw it looks like we have a fairly steady amount of requests in the 09:00 hour with the vast majority being from git clients. Then there is a slight increase in the 10:00 hour up until about 10:30 ish and then it falls off like a cliff. This behavior does make me think we're just slowly reaching our limits and then when things start to queue it all goes sideways? | 15:34 |
| @clarkb:matrix.org | and about 10:34-10:35 is when request counts drop off significantly | 15:37 |
| @mnasiadka:matrix.org | I guess so - wonder how bumping the max connections to 8k is going to behave | 15:37 |
| @mnasiadka:matrix.org | Kolla should not really do a lot of git clone things - we fetch tarballs during build | 15:39 |
| @mnasiadka:matrix.org | But I can check what might need tuning to use cached copies | 15:39 |
| @clarkb:matrix.org | ya I suspect it isn't kolla simply because we've never identified kolla as causing problems in the past | 15:39 |
| @clarkb:matrix.org | looks like at around 10:59 the load balancer starts logging conntrack table full issues | 15:40 |
| @clarkb:matrix.org | so I think we effectively start having things queue up because we're not keeping up | 15:40 |
| @clarkb:matrix.org | this is probably tolerated for some time but things slow down. Then we go over the cliff where the ability to keep up is outpaced by the requests coming in. Eventually conntrack gets sad and things get worse | 15:40 |
| @clarkb:matrix.org | if we increase the limits then maybe we can keep up better particularly if the backends are never really in trouble during the early portion of this situation | 15:41 |
| @clarkb:matrix.org | but of course a big enough ddos will likely tip things over again. But if we can give ourselves more breathing room I suspect that is worth a try | 15:41 |
| @clarkb:matrix.org | I also notice that within a relatively short period of time (like half an hour) we've got a bunch of requests to update repos like python-barbicanclient that all send the same amount of data and have the same git version user agent. I wonder if this is less the web crawler dos and more the problem of having someone do an update of all their git repos in the datacenter for 10k nodes at the same time without a local cache. I'll see if I can classify some of these requests to narrow it down | 15:46 |
| @mnasiadka:matrix.org | Well, there was a report from somebody on #zuul:opendev.org that their "private" Zuul had problems updating git cache of opendev.org located repositories | 15:47 |
| @mnasiadka:matrix.org | I haven't looked at my downstream Zuul, but I also have a lot of opendev repos defined to be able to run upstream defined jobs in my downstream forks | 15:47 |
| @clarkb:matrix.org | mnasiadka: ya that was tristanC which I suspect is the softwarefactory zuul. I think that is one possibility too vs someone updating a bunch of deployment nodes or whatever. In theory zuul is good about caching things, but if you queue up enough zuul jobs then maybe | 15:48 |
| @clarkb:matrix.org | but if I can map a few IP addresses maybe we can say more definitively | 15:48 |
| @clarkb:matrix.org | ok it looks like trunk-builder-centos9.rdoproject.org may be the thing requesting python-barbican updates 100 times an hour. That said its overall request count is quite low compared to the total so I'm less convinced this is part of the problem. Just one of the things happening when everything else is happening. I'll keep digging | 15:55 |
| @fungicide:matrix.org | fwiw, i'm getting very slow responses for git operations at the moment | 16:16 |
| @clarkb:matrix.org | according to https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-24h&to=now&timezone=utc we've reentered the bad behavior situation | 16:22 |
| @clarkb:matrix.org | but I've got a new theory for what is going on. Earlier when I was looking at logs I was looking at logs from gitea itself which means apache was prefiltering things which would explain why the vast majority of requests look good and also total request counts look ok | 16:22 |
| @clarkb:matrix.org | digging into the data on the load balancer the total requests per hour goes up by an order of magnitude. And it seems like there is a super super long tail on the ip addresses involved (so it's a proper ddos) | 16:23 |
| @clarkb:matrix.org | I think that maybe we don't have enough haproxy capacity to serve the 403s back and the crawlers are persistent. That is causing legit traffic to back up | 16:23 |
| @clarkb:matrix.org | if there are no objections I think we can manually edit haproxy maxconn to 8000 as proposed in my change and restart haproxy to see if that changes things | 16:24 |
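A minimal sketch of the manual edit being proposed, assuming a stock haproxy.cfg layout (the real config is templated in system-config, so the exact file may differ):

```
# haproxy.cfg, global section - sketch only
global
    maxconn 8000    # raised from 4000 per the proposal above
```

followed by a graceful reload or restart of haproxy (how depends on the deployment; the old process drains existing connections while the new one takes over, as observed later in the log).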
| @clarkb:matrix.org | but also I'm back to thinking this is crawlers and not git requests, it's just that the vast majority of valid traffic we let through to gitea is git | 16:24 |
| @clarkb:matrix.org | alternatively I'm ready to take up farming | 16:24 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983757: Add an osh.openstack.org redirect site https://review.opendev.org/c/opendev/system-config/+/983757 | 16:24 | |
| @fungicide:matrix.org | Clark: sounds good to me | 16:26 |
| @fungicide:matrix.org | either haproxy tuning or farming. i'm partial to goats | 16:26 |
| @clarkb:matrix.org | ok I'll make those changes manually on gitea-lb03 now | 16:26 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983757: Add an osh.openstack.org redirect site https://review.opendev.org/c/opendev/system-config/+/983757 | 16:27 | |
| @clarkb:matrix.org | that is done. But we are also at the conntrack limit | 16:28 |
| @fungicide:matrix.org | which may be a symptom of the other problem, so increasing it temporarily may be in order if restarting haproxy didn't help | 16:30 |
| @clarkb:matrix.org | ya we can check back in a few more minutes | 16:30 |
| @clarkb:matrix.org | but I think my assessment is this looks like a proper ddos to me: the backends are idle because we're effectively saying no to the traffic that does make it that far. But the roadblock just up the way is plugged up first and not letting much through | 16:31 |
| @clarkb:matrix.org | fungi: conntrack remains high and close to the limit | 16:33 |
| @clarkb:matrix.org | all of our backends appear to be remaining up which is good and load there hasn't significantly increased. Which I think continues to back up some of the thoughts/ideas/claims I've had this morning | 16:35 |
| @clarkb:matrix.org | ok conntrack numbers are lowest I've seen (still extremely high) | 16:35 |
| @clarkb:matrix.org | oh and they are right back up at the limit again ugh | 16:35 |
| @clarkb:matrix.org | but this is the exact sort of traffic I was hoping the waf would address. Unfortunately it is also terrible at managing this sort of traffic | 16:37 |
| @clarkb:matrix.org | also looks like the napkin math between haproxy connection limits and apache worker counts is holding up | 16:41 |
| @clarkb:matrix.org | gitea09 is using about 2/3 of its apache worker capacity while we are at the 8k limit which is about 2/3 of the total aggregate apache limit | 16:42 |
| @clarkb:matrix.org | fungi: things remain at the conntrack limit. Do you think we should increase it? | 16:42 |
| @clarkb:matrix.org | I am able to navigate opendev.org just very slowly. So maybe we're ok to let it sit and see how it settles out? | 16:43 |
| @fungicide:matrix.org | memory and cpu on the lb look fine, so i don't see any reason we can't | 16:43 |
| @mnasiadka:matrix.org | Well, if we want to block such things closer (on the haproxy level) - I've stumbled across haphash recently (https://github.com/dgl/haphash) | 16:44 |
| @clarkb:matrix.org | mnasiadka: I think that would require us to terminate ssl on the haproxy. but maybe we've reached that point? | 16:44 |
| @clarkb:matrix.org | that said I feel like this is a losing battle no matter what we do | 16:44 |
| @clarkb:matrix.org | eventually we'll need a bigger haproxy and bigger backends then they'll get a bigger botnet | 16:45 |
| @clarkb:matrix.org | makes me want to brainstorm bigger, completely different ideas, but I've got none now so maybe that is a good next step | 16:46 |
| @fungicide:matrix.org | i'm just hoping we can hold on until the bottom falls out of the llm training scraper market and they find something more profitable to do with those botnets that doesn't involve hammering our servers | 16:46 |
| @clarkb:matrix.org | it kinda feels like we should be playing taps for the internet | 16:47 |
| @fungicide:matrix.org | mnasiadka: yeah, the debian community is using a version of haphash (though they ported it to varnish for legacy reasons) | 16:47 |
| @clarkb:matrix.org | all the incentives are wrong | 16:47 |
| @clarkb:matrix.org | we produce the data they want to train on, get nothing in return but more work and pain | 16:47 |
| @fungicide:matrix.org | that sounds like yet another version of how the tragedy of the commons has classically played out for open source anyway | 16:48 |
| @clarkb:matrix.org | fungi: I'm comparing ss and conntrack output and there is an order of magnitude difference. ss should report not fully established connections too right? | 16:49 |
| @clarkb:matrix.org | I'm mostly wondering if that is a sign that increasing the limit would be helpful | 16:49 |
| @fungicide:matrix.org | the most annoying part of this is that society had already started coming around to recognize that open source communities struggle and are woefully under-resourced, but especially in areas like code review/qa, vulnerability handling and infrastructure management, the exact activities that have gotten 10x harder with the "ai scourge" | 16:50 |
| @clarkb:matrix.org | also a single ip could git clone and fetch all refs in a reasonably small period of time. There is no reason for this | 16:50 |
| @fungicide:matrix.org | ss should also report half-open connections for local sockets, not necessarily anything that gets routed though (so in our case hopefully 1:1) | 16:51 |
| @clarkb:matrix.org | (basically the inverse of our gerrit replication to a new gitea backend which takes less than a day) | 16:51 |
| @fungicide:matrix.org | but conntrack might have leaked entries that it missed cleaning up for | 16:51 |
| @fungicide:matrix.org | that's why yesterday i suggested a reboot to completely reset the kernel's conntrack tables | 16:51 |
| @fungicide:matrix.org | otherwise we have to wait for them to expire | 16:52 |
| @clarkb:matrix.org | fungi: so `sudo sysctl -w net.netfilter.nf_conntrack_max=524288` to double the limit? | 16:52 |
| @fungicide:matrix.org | oh, also conntrack can track sessions for non-stateful protocols (e.g. udp, icmp...) | 16:52 |
| @fungicide:matrix.org | so ss won't have open sockets for those | 16:52 |
| @fungicide:matrix.org | Clark: yes, that ought to work | 16:53 |
| @clarkb:matrix.org | based on what I've seen I think these are largely tcp connections to the haproxy | 16:53 |
| @clarkb:matrix.org | ok I will try that. Current value is `net.netfilter.nf_conntrack_max = 262144` for the record | 16:53 |
| @fungicide:matrix.org | it's entirely possible that the kernel times out incomplete socket handshake states faster than conntrack | 16:54 |
| @fungicide:matrix.org | i don't know if there's a mechanism that synchronizes them or if they're expected to be tuned in tandem | 16:54 |
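For reference, a sketch of the commands involved in this comparison and the bump (all standard tooling; the doubled value matches the one quoted above):

```sh
# Current conntrack usage vs. ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Socket summary from ss for comparison
ss -s

# Double the ceiling at runtime (not persistent across reboots)
sudo sysctl -w net.netfilter.nf_conntrack_max=524288
```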
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983757: Add an osh.openstack.org redirect site https://review.opendev.org/c/opendev/system-config/+/983757 | 17:44 | |
| @mnasiadka:matrix.org | fungi: if you have some time to merge this stack it would be nice - https://review.opendev.org/c/opendev/zone-opendev.org/+/983609 :-) | 17:46 |
| @fungicide:matrix.org | i guess we still need to figure out why the launch-node script has stopped finding the ipv6 addresses of ovh server instances | 17:50 |
| @clarkb:matrix.org | they have always had a weird ipv6 setup and I wouldn't be surprised if that just never worked | 17:52 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/zone-opendev.org] 983600: Remove DNS entries for mirror02.bhs1.ovh https://review.opendev.org/c/opendev/zone-opendev.org/+/983600 | 17:52 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/zone-opendev.org] 983609: Add mirror04.gra1.ovh https://review.opendev.org/c/opendev/zone-opendev.org/+/983609 | 17:53 | |
| @mnasiadka:matrix.org | Clark: the script output says it’s not doing ipv6 because it’s OVH - but then it gets autoassigned? | 17:54 |
| @mnasiadka:matrix.org | Maybe we just need a note to fix the script to fetch ipv6 address on the next ovh node - happy to have a look | 17:55 |
| @clarkb:matrix.org | oh I wonder if we hardcoded it because ovh was weird and didn't do ipv6. And now ovh has since fixed things and we can drop that | 17:56 |
| @mnasiadka:matrix.org | I can also have a look tomorrow and just boot a test node which I’ll remove afterwards | 17:56 |
| @clarkb:matrix.org | mnasiadka: it wouldn't surprise me if ^ is the problem. I seem to recall that the neutron api had the info but then the systems themselves didn't get that data in the config drive (and maybe not via the metadata api) and there were no RAs so nothing autoconfigured it. You have to take the output of neutron api calls and manually configure things | 17:57 |
| @fungicide:matrix.org | yeah, once upon a time the ipv6 addresses in ovh weren't reported by nova | 17:57 |
| @clarkb:matrix.org | if things autoconfigure now that would imply that they are using RAs or that config drive/metadata has the info now | 17:57 |
| @clarkb:matrix.org | so maybe we can just drop whatever rule we had in place there? we used our flavor right? /me double checks to make sure this isn't maybe flavor specific | 17:58 |
| @clarkb:matrix.org | we did use our flavor so it isn't flavor specific | 17:58 |
| @clarkb:matrix.org | I suspect we can drop whatever hack we had in place there | 17:58 |
| @mnasiadka:matrix.org | I’ll have a look tomorrow :) | 17:59 |
| @clarkb:matrix.org | sounds good thanks | 17:59 |
| @mnasiadka:matrix.org | infra-prod-service-nameserver failed on 983609 deploy | 18:06 |
| @fungicide:matrix.org | `Failed to download remote objects and refs: fatal: unable to access 'https://opendev.org/opendev/zone-zuul-ci.org/': gnutls_handshake() failed: Error in the pull function.` | 18:12 |
| @fungicide:matrix.org | from /var/log/ansible/service-nameserver.yaml.log on bridge | 18:12 |
| @clarkb:matrix.org | wee | 18:14 |
| @clarkb:matrix.org | we can probably re-enqueue that safely | 18:14 |
| @clarkb:matrix.org | and cross fingers | 18:14 |
| @clarkb:matrix.org | though I'm not sure we should expect a different result | 18:14 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983757: Add an osh.openstack.org redirect site https://review.opendev.org/c/opendev/system-config/+/983757 | 18:15 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed on behalf of Clark Boylan: [opendev/system-config] 983740: Increase the gitea haproxy maxconn limit https://review.opendev.org/c/opendev/system-config/+/983740 | 18:38 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983778: Double Apache workers/connections on Gitea backend https://review.opendev.org/c/opendev/system-config/+/983778 | 18:38 | |
| @clarkb:matrix.org | fungi: can you see my comment about updating the default value too on https://review.opendev.org/c/opendev/system-config/+/983740 ? I'm not sure if it matters but I figure avoid problems and just bump it as well | 18:42 |
| @dpanech:matrix.org | Hi, Zuul posts job logs at URLs similar to https://storage.bhs.cloud.ovh.net and others. Can I find out the host names of such external storage services? Some of my users' firewall blocks these hosts. I'd like to give them a list of exceptions. I'm with the StarlingX project. Example: "View log" hyperlinked from here: https://zuul.opendev.org/t/openstack/build/d44f33a177d14e92818b980bcab822c9 points to some place in https://storage.bhs.cloud.ovh.net/ . | 18:42 |
| @clarkb:matrix.org | davlet: we upload the logs to swift in the rackspace cloud and the ovh cloud. We don't manage or have any real insight into what those systems are. but they are public clouds hosting the files publicly | 18:44 |
| @clarkb:matrix.org | davlet: if people have problems with specific names then you can look up that specific name. Alternatively you can see if ovh and rackspace publish any lists of IPs for these services | 18:45 |
| @clarkb:matrix.org | fungi: or I can update the change really quickly | 18:45 |
| @fungicide:matrix.org | i'm already working on it | 18:45 |
| @clarkb:matrix.org | ack | 18:45 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed on behalf of Clark Boylan: [opendev/system-config] 983740: Increase the gitea haproxy maxconn limit https://review.opendev.org/c/opendev/system-config/+/983740 | 18:46 | |
| @clarkb:matrix.org | fungi: perfect +2 on both of those from me | 18:47 |
| @fungicide:matrix.org | if nobody else is on hand to review those, i'll self-approve in a bit | 18:48 |
| @clarkb:matrix.org | davlet: it looks like `storage.bhs.cloud.ovh.net` resolves to a single IP address for me that is consistent in two different locations in north america and in one location in europe | 18:49 |
| @clarkb:matrix.org | I can't say definitively that they don't load balance that or have different IPs in different locations since I don't run the service but it appears to be consistent for me | 18:50 |
| @clarkb:matrix.org | dmsimard: may be able to confirm that | 18:50 |
| @dpanech:matrix.org | Clark: ok thanks | 18:51 |
| @clarkb:matrix.org | similarly `storage.gra.cloud.ovh.net` reports two IPs that seem consistent in the various locations I can check | 18:51 |
| @clarkb:matrix.org | I think that is the other ovh location that we use for zuul log storage | 18:52 |
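A quick way to spot-check those endpoints (answers vary by resolver and can change over time, as noted below):

```sh
dig +short storage.bhs.cloud.ovh.net
dig +short storage.gra.cloud.ovh.net
```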
| @dmsimard:matrix.org | Hi, I'm not sure what's the question :) | 18:53 |
| @clarkb:matrix.org | dmsimard: we were curious if OVH publishes IP addresses for public endpoints like the swift object storage system | 18:54 |
| @dpanech:matrix.org | Clark: I haven't encountered this "gra" one. Do these host names show up in some form in any git-controlled configuration files/repos ? Or is it just the front end URLs? Ie when you upload you use some other target URL or method? I was wondering if there was a way to scrape some source code or config files somewhere. | 18:54 |
| @fungicide:matrix.org | (other than merely through dns, which does indeed count as a kind of publication of course) | 18:54 |
| @clarkb:matrix.org | dmsimard: we store our zuul ci logs in OVH swift and apparently some corporate networks may be blocking access to them. So wondered if there was a list somewhere we could refer to | 18:54 |
| @clarkb:matrix.org | davlet: no we don't control any of those names. We upload the files and the system says "your file is accessible here" aiui | 18:55 |
| @fungicide:matrix.org | davlet: we also serve logs in several rackspace classic regions at the moment (dfw, iad, ord). but these also are likely to change over time. that is the nature of public cloud object storage | 18:55 |
| @clarkb:matrix.org | davlet: here is one with logs in gra https://zuul.opendev.org/t/openstack/build/20f55c3d6c03499fb1649bb917e264b6/logs | 18:56 |
| @dpanech:matrix.org | Clark: ok thanks | 18:57 |
| @fungicide:matrix.org | it's just like if we stored/distributed logs in amazon s3, the ip addresses and dns names would change over time outside our control | 18:58 |
| @dmsimard:matrix.org | I don't know if we publish these IPs somewhere but storage.bhs.cloud.ovh.net is in Canada (Beauharnois) and storage.gra.cloud.ovh.net in France (Gravelines). It's likely that anycast routing/load-balancing may provide different routes based on source/destination but that's the only thing that comes to mind. | 18:58 |
| @dpanech:matrix.org | OK I'll pass that along, thanks | 18:59 |
| @clarkb:matrix.org | dmsimard: thanks! | 19:01 |
| @fungicide:matrix.org | it looks like amazon centrally publishes ip address ranges for s3 (i doubt ovh or rackspace do the same for their swift endpoints but could be wrong): https://ip-ranges.amazonaws.com/ip-ranges.json | 19:05 |
| @dpanech:matrix.org | fungi: sorry does this have any significance for Zuul logs I asked about ? | 19:07 |
| @tafkamax:matrix.org | I suppose you could add the ip-ranges to the ACL allow list for the customer... | 19:08 |
| @fungicide:matrix.org | davlet: tangentially, amazon s3 is similar to swift in openstack public clouds, i was noting that publishing ip address ranges in machine-readable format is something that amazon does but that i don't think openstack public cloud providers have done typically | 19:08 |
| @tafkamax:matrix.org | Or give the tip to add these to allow list... | 19:08 |
| @dpanech:matrix.org | fungi: ok thanks | 19:09 |
| @clarkb:matrix.org | Taavi Ansper: maybe. I would also argue that overzealous corporate firewall rules are not something we're ever going to solve from the outside | 19:11 |
| @clarkb:matrix.org | I think davlet asking questions about what the hosts are and taking that back to their network folks is about the best we can do | 19:11 |
| @fungicide:matrix.org | corporate internet security egress filters have always been at cross-purposes with employees collaborating in open, public systems and communities | 19:13 |
| @mnasiadka:matrix.org | fungi: if we can merge https://review.opendev.org/c/opendev/system-config/+/983602 as well - then I can remove it tomorrow my morning from bhs1 cloud | 19:14 |
| @fungicide:matrix.org | yep, that looks safe to go in now, thanks! | 19:15 |
| @clarkb:matrix.org | fungi: I'm still eating lunch; it looks like the changes may pass in ~15 minutes or so. You said you could approve them? | 19:22 |
| @fungicide:matrix.org | yeah, will do | 19:26 |
| @clarkb:matrix.org | looks like changes passed. You still going to approve them or should I? I ended up getting distracted by my computer while eating | 19:42 |
| @fungicide:matrix.org | approved now | 19:44 |
| @clarkb:matrix.org | heh I think we both approved the first one (which is fine) | 19:44 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/system-config] 983602: Remove mirror02.bhs1.ovh https://review.opendev.org/c/opendev/system-config/+/983602 | 19:57 | |
| @clarkb:matrix.org | fungi: would it be helpful if I took a quick stab at updating the gitea http refactor to restart on app.ini updates? | 20:27 |
| @fungicide:matrix.org | hah, i was just typing up a question for you on that | 20:27 |
| @clarkb:matrix.org | I know you were going to look at that but everything sort of went sideways | 20:27 |
| @clarkb:matrix.org | Oh cool | 20:27 |
| @fungicide:matrix.org | do i need to move the "Get list of image IDs post pull" task into a handler? | 20:27 |
| @clarkb:matrix.org | No, handlers run after all normal tasks and I think we want this running early | 20:28 |
| @fungicide:matrix.org | or just expand the when condition in "Stop/Start gitea safely for Gerrit replication" | 20:28 |
| @clarkb:matrix.org | I think you can add a register to the app.ini template task then update the when condition to say or when app.ini is changed | 20:28 |
| @fungicide:matrix.org | what's the common pattern for checking whether a file got updated? | 20:28 |
| @fungicide:matrix.org | aha, that's what i was looking for | 20:28 |
| @fungicide:matrix.org | thanks! | 20:28 |
| @clarkb:matrix.org | Ya the `changed` attribute of what you register should be true when the file updates, false when the file is the same | 20:29 |
| @fungicide:matrix.org | ansible uses || and && for logical operations in compound conditionals? | 20:29 |
| @clarkb:matrix.org | https://docs.ansible.com/projects/ansible/latest/reference_appendices/common_return_values.html#changed | 20:30 |
| @clarkb:matrix.org | It's jinja https://jinja.palletsprojects.com/en/stable/templates/#logic so `and` and `or` I think | 20:31 |
| @fungicide:matrix.org | thanks, i found examples of that in the repo | 20:33 |
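An illustrative sketch of the register/when pattern Clark describes; the task names, paths, and the `gitea_images_changed` variable are assumptions, not the actual system-config playbook:

```yaml
- name: Write gitea app.ini
  ansible.builtin.template:
    src: app.ini.j2
    dest: /var/gitea/conf/app.ini
  register: gitea_app_ini    # .changed is true when the template result differs

- name: Stop/Start gitea safely for Gerrit replication
  ansible.builtin.include_tasks: gitea-restart.yaml
  # jinja logic: run when either the images or app.ini changed
  when: gitea_images_changed or gitea_app_ini.changed
```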
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983134: Remove intermediate HTTPS layer for Gitea backends https://review.opendev.org/c/opendev/system-config/+/983134 | 20:35 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 20:37 | |
| @clarkb:matrix.org | That looks about right. Let's see if zuul is happy with it | 20:38 |
| @fungicide:matrix.org | i didn't move /var/lib/anubis to /var/anubis since there doesn't seem to be consensus (and also you used /var/lib for the anubis implementation with mailman) | 20:39 |
| @clarkb:matrix.org | Yup I think that's fine | 20:41 |
| @clarkb:matrix.org | I wanted to note it as a thing but I don't feel strongly it should change | 20:41 |
| @clarkb:matrix.org | fungi: any reason to not add inventory hostname to the domains list? | 20:42 |
| @fungicide:matrix.org | oh, i missed that, lemme fix, thanks | 20:42 |
| @fungicide:matrix.org | aha, i think the reason i didn't do it originally before i split the changes up is that anubis wants to know the domain name... i'll look at how that worked with mailman since we have multiple domains there so we must have made it work somehow | 20:44 |
| @fungicide:matrix.org | maybe it was only a warning? we didn't set `REDIRECT_DOMAINS` for the mailman version of this... | 20:46 |
| @clarkb:matrix.org | Ya I think you just do something like `opendev.org,{{ inventory_hostname }}` | 20:46 |
| @clarkb:matrix.org | Setting it is probably a good idea but ya maybe they just warn us | 20:46 |
| @fungicide:matrix.org | i don't see it in /var/log/containers/docker-anubis.log on lists01, so i'll try leaving it out | 20:47 |
| @clarkb:matrix.org | fungi: https://anubis.techaro.lol/docs/admin/configuration/redirect-domains/ there is a security note here but I'm not sure I understand it | 20:49 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 20:50 | |
| @fungicide:matrix.org | oh, now that i think about it, i remember seeing a vulnerability report about anubis integration in a hosting platform turning it into an open proxy | 20:50 |
| @fungicide:matrix.org | and setting `REDIRECT_DOMAINS` was the mitigation | 20:51 |
| @fungicide:matrix.org | here we go: https://cvefeed.io/vuln/detail/CVE-2025-61587 | 20:52 |
| @fungicide:matrix.org | probably not relevant for our situation | 20:52 |
| @fungicide:matrix.org | and was an open redirect not an open proxy, so not as bad | 20:53 |
| @fungicide:matrix.org | oh, and the error message i was seeing in the log was misleading, corrected in development a few weeks ago by https://github.com/samjaninf/anubis/commit/c2ed62f | 20:58 |
| @fungicide:matrix.org | looks like https://anubis.techaro.lol/docs/admin/configuration/redirect-domains/ explains the redirect logic | 20:59 |
| @fungicide:matrix.org | so maybe we do want to set it for both mailman and gitea | 21:01 |
| @fungicide:matrix.org | though exploiting it looks rather convoluted | 21:01 |
| @fungicide:matrix.org | (for relatively little payoff) | 21:02 |
| @clarkb:matrix.org | Ya I think it isn't hard to set for either | 21:04 |
| @clarkb:matrix.org | In the gitea case I have the example value above and for lists we can just list all the list domains in our variables and make it dynamic I think | 21:04 |
| @fungicide:matrix.org | i'm looking to see if we have a jinja example already for how we'd do it dynamically with the mailman sites | 21:04 |
| @clarkb:matrix.org | We probably do somewhere | 21:05 |
| @fungicide:matrix.org | something like https://opendev.org/opendev/system-config/src/commit/7de36ff/playbooks/roles/mailman3/templates/domain_aliases.j2 | 21:06 |
| @fungicide:matrix.org | no, that's not it | 21:06 |
| @clarkb:matrix.org | https://opendev.org/opendev/system-config/src/commit/7de36ff/playbooks/roles/mailman3/templates/docker-compose.yaml.j2#L72 | 21:07 |
| @clarkb:matrix.org | That maybe | 21:07 |
| @fungicide:matrix.org | ah, yes we have the list in https://opendev.org/opendev/system-config/src/commit/7de36ff/inventory/service/group_vars/mailman3.yaml#L87 so just need to swap `:` separators for `,` | 21:10 |
| @fungicide:matrix.org | that's simple enough | 21:11 |
| @clarkb:matrix.org | yup and we're already doing it in the file we want to edit for django | 21:12 |
| @clarkb:matrix.org | so should be well tested already | 21:12 |
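A sketch of the two REDIRECT_DOMAINS settings being discussed, as docker-compose environment entries; the `mailman_sites`/`listdomain` names are assumptions inferred from the group_vars file referenced above:

```yaml
# gitea backends: the service domain plus the backend's own hostname
- "REDIRECT_DOMAINS=opendev.org,{{ inventory_hostname }}"
# mailman: join the per-site list domains with commas instead of colons
- "REDIRECT_DOMAINS={{ mailman_sites | map(attribute='listdomain') | join(',') }}"
```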
| -@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 983778: Double Apache workers/connections on Gitea backend https://review.opendev.org/c/opendev/system-config/+/983778 | 21:12 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 983740: Increase the gitea haproxy maxconn limit https://review.opendev.org/c/opendev/system-config/+/983740 | 21:12 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 21:15 | |
| @clarkb:matrix.org | I will check on ^ in a few | 21:15 |
| @clarkb:matrix.org | looks like gitea09 has already updated its apache configs | 21:16 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983802: Set REDIRECT_DOMAINS for Anubis with Mailman https://review.opendev.org/c/opendev/system-config/+/983802 | 21:19 | |
| @clarkb:matrix.org | +2 from me on ^ but we should check the logs from the check job before approving it | 21:20 |
| @fungicide:matrix.org | of course | 21:22 |
| @clarkb:matrix.org | ok apache updates are done across the backends. Next deploy buildset should update haproxy | 21:23 |
| @clarkb:matrix.org | then it's a matter of monitoring going forward to make sure we haven't overdone it. if we decide we have then we can put the hosts in the emergency file and tune things back quickly. Doing so on gitea-lb03 is likely most simple and effective | 21:23 |
| @clarkb:matrix.org | the load balancer appears to have updated as well. we did a graceful reload of haproxy so the old process and new process are running side by side. I think the old one will go away at some point though | 21:28 |
| @clarkb:matrix.org | I think the old haproxy process has gone away now and all connections are using the new config at this point | 21:34 |
| @clarkb:matrix.org | fungi: hrm I'm not sure we capture the value of the anubis environment or the docker-compose.yaml file anywhere in our logs | 21:51 |
| @clarkb:matrix.org | we have a checksum we can check against: https://5dce13a5d17b3a355e81-5c64cb79e45e2166ffcdcfd3b17f7c1b.ssl.cf2.rackcdn.com/openstack/b33fe879f1c246eb8985359fe9b68bcc/bridge99.opendev.org/ara-report/results/372.html but that seems error prone | 21:53 |
| @fungicide:matrix.org | we could add the compose file to log collection i suppose | 21:54 |
| @fungicide:matrix.org | assuming you're looking to find out if the list expansion worked as intended | 21:55 |
| @clarkb:matrix.org | yes that was the main thing. The other thought I had was "add a testinfra test case that exercises this request as if you were a browser" then realized that's the thing everyone hitting anubis wants to do and isn't doing yet and I don't want to help them :) | 21:55 |
| @fungicide:matrix.org | we have one of those in the gitea testinfra, just not in mailman's | 22:05 |
| @fungicide:matrix.org | well, we have a test that sets a ua in curl to look like a generic graphical browser and then checks to see if it gets back content that looks like the anubis proxy challenge | 22:08 |
| @fungicide:matrix.org | it's not an end-to-end exercise of the anubis challenge/response/redirect handshake no | 22:09 |
| @fungicide:matrix.org | merely enough to confirm that browsers will get anubis kicking in and plain curl will get passed straight through | 22:10 |
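Roughly what that existing check looks like as a testinfra sketch; the port, URL, and matched string are assumptions, not the actual system-config test:

```python
def test_anubis_challenges_browsers(host):
    # A browser-like UA should receive the anubis challenge page
    result = host.run(
        "curl -sk -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64)' "
        "https://localhost:3081/")
    assert "anubis" in result.stdout.lower()


def test_plain_curl_passes_through(host):
    # Plain curl should be passed straight through to gitea content
    result = host.run("curl -sk https://localhost:3081/")
    assert "anubis" not in result.stdout.lower()
```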
| @clarkb:matrix.org | ya I think to test the domain stuff you have to succeed with the challenge | 22:10 |
| @fungicide:matrix.org | which also makes me wonder why more of these crawlers don't just masquerade as curl (or maybe they do and we simply haven't noticed) | 22:10 |
| @clarkb:matrix.org | once you succeed the challenge it goes to redirect you and if the redirect is not valid you get an error. At least that is what I recall from testing the held gitea99 node and going to gitea99.opendev.org via a socks proxy instead of opendev.org | 22:11 |
| @fungicide:matrix.org | right, that would certainly be a lot harder, and is indeed what the crawlers would probably like to be doing themselves | 22:11 |
| @fungicide:matrix.org | maybe we could instrument it with selenium or similar | 22:11 |
| @fungicide:matrix.org | i can also just add a dnm change on top of the stack and hold a node, but will save that for tomorrow, it's getting late here | 22:13 |
| @clarkb:matrix.org | yup that may be simplest and yes you should take a break | 22:14 |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-jobs] 983641: Record filesystem free space in dstat https://review.opendev.org/c/zuul/zuul-jobs/+/983641 | 22:28 | |