| tonyb | fungi, noonedeadpunk: I think gitea13 is getting hit by 2 crawlers: one (Facebook) going via the load-balancer, and a second (ChatGPT) going direct to gitea13 | 01:07 |
|---|---|---|
| Clark[m] | tonyb: there is also a crawler coming out of Cloudflare IPs with legit-looking UAs, but you can tell it is a crawler due to the URL patterns (it's asking for every file in every repo for every commit hash) | 01:09 |
| Clark[m] | Those tend to be more problematic because they aren't good enough to correctly identify themselves. I noted earlier today that I suspect 13 has been identified by the crawlers on the web but the other 5 haven't been, so it gets direct requests and the others don't, leading to the imbalance. We may need to consider blocking direct access even though it makes debugging more annoying | 01:10 |
| Clark[m] | That should force things to balance out better | 01:10 |
| tonyb | Clark[m]: I'm not seeing that? | 01:12 |
| Clark[m] | tonyb it's possible they moved on since I checked about 10 hours ago | 01:12 |
| tonyb | Clark[m]: `grep -v 38.108.68.97 /var/log/apache2/gitea-ssl-access.log | awk -F\" '{sub(":.*", "", $1); print $1}' | sort | uniq -c | sort` shows to IPs that accound for 500+ connections | 01:13 |
| tonyb | they're both ChatGPT (I think) | 01:13 |
| Clark[m] | Ya the Cloudflare stuff is someone trying to appear like normal traffic, so it's many IPs and many user agents | 01:13 |
| tonyb | Ooooo | 01:14 |
| Clark[m] | If you look for requests that include commit hashes they stand out. | 01:14 |
| Clark[m] | But also look for weird user agents like Android 3 or Firefox 3 | 01:15 |
| Clark[m] | They must have systems with massive tables of valid user agents that they iterate through for each request. I've even found typos in the user agents in the past | 01:15 |
| tonyb | Oh yeah there are *lots* of those. I was ignoring them until the 2 I mentioned were dealt with. | 01:15 |
| tonyb | probably a mistake | 01:15 |
| tonyb | I wonder if we should a) block Facebook at the load-balancer (via robots.txt, which they claim to honour); and b) pull gitea13 out of the pool for a while so that real users don't get random slow servers. | 01:17 |
| Clark[m] | While the ChatGPT and Facebook and so on traffic is not zero impact, I suspect they are generally much better behaved, and it is the botnet crawling every repo commit and file from many IPs that is actually the problem | 01:18 |
| Clark[m] | Which is why they don't properly identify themselves because they know they are not doing what they should | 01:18 |
| tonyb | Fair enough | 01:18 |
| tonyb | If that's true, we could just do step "b" as the worst-behaving client(s) are going direct | 01:19 |
| tonyb | Or we could just ignore things, as it's annoying but not all that painful? | 01:20 |
| Clark[m] | Ya I guess I didn't consider just pulling 13 and letting it be a honeypot | 01:25 |
| Clark[m] | I think that works and is simple. I had initially rejected that as I assumed that we would redirect to another backend, but since the bad traffic is bypassing the load balancer entirely, that may actually work as an interim step | 01:26 |
| tonyb | The Facebook crawler will "migrate" but I think they will stay where they are | 01:34 |
| tonyb | #status log Removing gitea13 from the load-balancer due to several crawlers hitting gitea13 and bypassing the load-balancer. This leaves the node running as a honeypot and (hopefully) minimising human-visible impacts | 01:47 |
| Clark[m] | Thanks! | 01:51 |
| tonyb | As expected Facebook has migrated | 01:51 |
| opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/962557 | 02:14 |
| *** | ykarel_ is now known as ykarel | 07:40 |
| *** | mrunge_ is now known as mrunge | 07:55 |
| fungi | this thread may be worth watching: https://discuss.python.org/t/are-setuptools-abandoned/104390 | 10:21 |
| opendevreview | Merged zuul/zuul-jobs master: Make upload-image-s3 hash timeout configurable https://review.opendev.org/c/zuul/zuul-jobs/+/963663 | 14:54 |
| opendevreview | Merged zuul/zuul-jobs master: Allow disabling compression of uploaded images https://review.opendev.org/c/zuul/zuul-jobs/+/963669 | 14:54 |
| opendevreview | Merged zuul/zuul-jobs master: Make upload-image-swift hash timeout configurable https://review.opendev.org/c/zuul/zuul-jobs/+/963726 | 14:54 |
| opendevreview | Merged zuul/zuul-jobs master: Allow disabling compression of uploaded images https://review.opendev.org/c/zuul/zuul-jobs/+/963727 | 14:57 |
| opendevreview | Merged zuul/zuul-jobs master: Allow upload-image-s3 role to export S3 URLS https://review.opendev.org/c/zuul/zuul-jobs/+/963828 | 14:57 |
| opendevreview | Nicolas Hicher proposed zuul/zuul-jobs master: Refactor: multi-node-bridge to use linux bridge https://review.opendev.org/c/zuul/zuul-jobs/+/959393 | 18:27 |
| opendevreview | Tony Breeds proposed openstack/project-config master: [pti-python-tarball] Add compatibility for older wheels https://review.opendev.org/c/openstack/project-config/+/964251 | 20:12 |
| tonyb | fungi: ^^ That's my rough idea for dealing with the wheel dist-info issue(s) | 20:51 |
| opendevreview | Tony Breeds proposed openstack/project-config master: [pti-python-tarball] Add compatibility for older wheels https://review.opendev.org/c/openstack/project-config/+/964251 | 21:07 |
| opendevreview | Vladimir Kozhukalov proposed zuul/zuul-jobs master: [build-container-image] Update buildx change tag https://review.opendev.org/c/zuul/zuul-jobs/+/964255 | 21:10 |
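
The crawler-spotting approach discussed above (01:09–01:15) can be sketched roughly as follows. This is only an illustration, not the exact commands used in the channel: the log path is taken from tonyb's grep at 01:13, while the `/commit/<sha>` URL pattern and the combined-log-format field positions are assumptions about the Gitea/Apache setup.

```sh
#!/bin/sh
# Sketch: surface botnet-style crawler traffic that requests per-commit URLs.
# Assumes Apache "combined" log format (client address is the first
# whitespace-separated field; the user agent is field 6 when the line is
# split on double quotes) and Gitea-style /commit/<40-hex-sha> paths.
LOG=/var/log/apache2/gitea-ssl-access.log

# Top source addresses making commit-hash requests (busiest last)
grep -E '/commit/[0-9a-f]{40}' "$LOG" \
  | awk '{print $1}' \
  | sort | uniq -c | sort -n | tail -20

# Top user agents on those same requests, to spot the rotating-UA pattern
grep -E '/commit/[0-9a-f]{40}' "$LOG" \
  | awk -F'"' '{print $6}' \
  | sort | uniq -c | sort -n | tail -20
```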
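
For option (a) from 01:17, a robots.txt served in front of the Gitea backends might look like the snippet below. Again a sketch: "facebookexternalhit" is Facebook's documented crawler token, the destination path is hypothetical, and, as noted above, this only helps if the crawler actually honours robots.txt as it claims.

```sh
# Hypothetical: write a robots.txt asking Facebook's crawler to skip the site.
# The path below is an assumption, not the actual deployment location.
cat > /var/www/robots.txt <<'EOF'
User-agent: facebookexternalhit
Disallow: /
EOF
```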