Tuesday, 2026-04-07

<opendevreview> Michal Nasiadka proposed openstack/project-config master: propose-updates: Add pcu target  https://review.opendev.org/c/openstack/project-config/+/978566 [07:58]
<croelandt> Did the SSL issues come back? I'm seeing the same issue as last week in the CI: https://zuul.opendev.org/t/openstack/build/3d41b53f81b5493c9d94a5b7270b2c81 [14:59]
<croelandt> EOF occurred in violation of protocol (_ssl.c:1000)'))': /openstack/requirements/raw/branch/master/upper-constraints.txt [15:00]
<stephenfin> I was just about to ask the same thing [15:00]
<clarkb> yes its another round of crawler load overwhelming the cluster [15:00]
<clarkb> per usual with a bunch of spoofed user agents presumably from a large set of IPs but I haven't checked the IP side yet [15:00]
<stephenfin> oh, joy /o\ [15:01]
<croelandt> do we know who is running a DDOS^W^W^Wcrawling our resources? [15:05]
<croelandt> AI companies that believe our requirements might change every second? :p [15:05]
<clarkb> croelandt: no, they spoof their User Agents and use botnet looking source addresses so each request is from a different IP from just about anywhere in the world [15:07]
<clarkb> but if you look at the requests themselves its clear that there are crawler patterns at play [15:08]
<croelandt> but what is the purpose? Is this a deliberate attack on our resources? [15:09]
<fungi> it seems to be organized crime in control of vast botnets across the globe trying to scrape everything on the internet to resell to "legitimate" AI companies [15:09]
<clarkb> the "big" shops all identify their crawlers properly, google, meta, anthropic, openai, etc. Whether or not they buy the datasets being generated here I'm not sure (but I strongly suspect they do if you look in the gemma release info they have to run their own filters over the data sets implying they aren't generating them themselves) [15:09]
<clarkb> croelandt: no I think the purpose is that groups like google will pay money for large sets of data to train their LLMs [15:10]
<clarkb> croelandt: google then has to filter the data for legally problematic data. Then they feed the llm [15:10]
<fungi> ...and don't care where it comes from/how it was obtained (would probably prefer not to know) [15:10]
<clarkb> but I suspect there is enough money in this to make it a business for certain folks [15:10]
<croelandt> fascinating times [15:10]
<fungi> which has created a goldrush for organized crime [15:10]
<croelandt> and it's all related to the absurd prices of hardware, amazing [15:11]
<clarkb> fungi: yup exactly. Otherwise why does google need to run an entire legal filtration process [15:11]
<clarkb> https://ai.google.dev/gemma/docs/core/model_card_4#data_preprocessing this is what I refer to [15:15]
<clarkb> basically that to me strongly implies that they are not curating the data set themselves, or at least not doing so for the entire dataset [15:15]
<clarkb> and that requires them to apply preprocessing to avoid these problematic sources of data. It also implies that there is a market for collecting the data without regard to fallout because google is going to pay for it anyway [15:16]
<clarkb> including the far more problematic content that they are attempting to filter out [15:16]
-opendevstatus- NOTICE: Load on the opendev.org Gitea backends is under control again for now, if any Zuul jobs failed with SSL errors or disconnects reaching the service prior to 16:15 UTC they can be safely rechecked [17:03]
*** skandix4263990 is now known as skandix426399 [22:20]
