Friday, 2025-01-17

ianwclarkb: scrolling back :)00:00
clarkbianw: sorry its a bit stream of consciousness going on there00:02
ianwi do think we've probably never had something as old as <1.0.9 in the mix00:05
clarkbok thats good to know00:05
clarkbI expect that makes our risk here extra low00:05
clarkb(because anything that gets deleted probably should be deleted)00:05
clarkband hopefully borg mount isn't completely unhappy with the situation :)00:06
ianwfrom the initial commit 028d6553750eaafc24ef7c244fbd4832d68a11a200:06
clarkbshould know soon. I'm on the node in question just waiting for the job to actually get to that point now00:06
ianw+borg_version: 1.1.1300:06
clarkbya I see that too.00:07
clarkbI didn't even think to check that (too many other things)00:07
ianwso i think that clears up the TAM issue.  probably no way to do things other than what you're doing: working through issues trying to use the 1.2 client00:13
clarkbya and probably we'll continue to learn more if/when we eventually get 1.2 into production for paste00:14
clarkbits interesting how much of a forcing function python3.12 has become00:14
ianwupgrading the borg version has been on my todo list for ... forever.  i sort of vaguely had it in my head the best way might be to make sure that running 1.4 is done as a totally separate thing, in parallel.  basically, work things so that two borgs can run separately00:14
ianwthat seems to be most future proof, in that when a new major comes out you cut over to it in a fresh backup, and stop the old version when you're happy with the new00:15
clarkbthe 1.2 to 1.4 upgrade doesn't actually look too bad. And 1.1 to 1.2 is really only bad for the tam stuff. Everything else they've written down seems solvable00:15
clarkbbut ya maybe another option is a new backup server also running noble and running 1.4 and then paste runs 1.400:15
clarkbwe shall see where the testing gets us00:16
clarkb`borg mount not available: no FUSE support, BORG_FUSE_IMPL=pyfuse3,llfuse.` adding -e was helpful00:16
ianw++ broke my own rules of being "set -eu" safe there :)00:17
clarkbok I think I see how to fix this. New patch in a bit.00:19
ianwoh i didn't think of /var/log/borg-... updating while it's being backed up.  probably should have put that in a subdir that is ignored00:20
ianwi guess it's been warning/failing to back that up all the time, just 1.2 tells you about it?00:20
clarkbianw: ya the change is 1.2 exits 1 if there are warnings00:21
clarkbI've just written a monstrosity of a jinja template string. Lets see if it works00:23
opendevreviewClark Boylan proposed opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949100:23
clarkbthis might have to be the last patchset before dinner though. Feel free to poke at that change if there is interest. Otherwise I'll pick it up tomorrow00:23
ianwyou probably need a "+" in -> 'borgbackup[fuse]=='borg_version00:24
opendevreviewClark Boylan proposed opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949100:30
clarkbthanks! I figured I would get something wrong in that00:30
clarkbthe new pyfuse3 lib is lgpl licensed and maintained by the borg person so that seems fine to switch to00:31
clarkbI don't know why they broke out 'fuse' extras into different library options. Would be nice if they kept fuse and also added the more specific options for people who need them00:31
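The extras problem discussed above (a `fuse` extra on older borg, library-specific extras such as `pyfuse3` on newer borg) can be sketched in Python rather than Jinja. This is only an illustration, not the template from change 939491: the function name `borg_requirement` is hypothetical, and the cutoff at 1.2 is an assumption drawn from the conversation.

```python
def borg_requirement(borg_version: str) -> str:
    """Build a pip requirement string for borgbackup with a FUSE extra.

    Assumption: releases before 1.2 expose a generic "fuse" extra, while
    1.2 and later expect a library-specific extra (pyfuse3 is used here).
    """
    major, minor = (int(part) for part in borg_version.split(".")[:2])
    extra = "fuse" if (major, minor) < (1, 2) else "pyfuse3"
    # Note the explicit concatenation of extra, "==" and version: this is
    # the piece that is easy to get wrong in a hand-written template string.
    return f"borgbackup[{extra}]=={borg_version}"


if __name__ == "__main__":
    for version in ("1.1.18", "1.2.8"):
        print(borg_requirement(version))
```

Keeping the extra in its own variable is what would make it straightforward to override per server later on.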
opendevreviewClark Boylan proposed opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949100:53
clarkbit failed quickly enough with an actionable error message to get another ps in00:53
MattCrees[m]Hello #opendev, would anyone be available to hold a node for me please? I'd like to get in and debug the CI failures here https://review.opendev.org/c/openstack/kolla-ansible/+/92462310:54
fricklerMattCrees[m]: which job? and then I'd need your ssh public key11:07
MattCrees[m]kolla-ansible-ubuntu-podman11:07
MattCrees[m]ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFSiegoucsoBqvk22UEqHPX1NL48kAGIBT3/drMGTNed mattc@stackhpc.com 11:08
fricklerMattCrees[m]: ok, hold set up and recheck triggered, I'll let you know when the node is ready11:10
MattCrees[m]Brilliant, thank you :) 11:10
fungiMattCrees[m]: ssh root@213.32.78.12313:45
MattCrees[m]I'm in, thanks13:46
fungino sweat, just let us know once you're done so we can release its resources back into the pool13:48
priteauHello. Is there an issue with opendev right now? It is suddenly very slow for me, causing timeouts to fetch upper constraints.14:38
fricklerpriteau: I'm seeing sporadic failures, too, like:     stderr: 'fatal: unable to access 'https://opendev.org/zuul/zuul-jobs/': gnutls_handshake() failed: The TLS connection was non-properly terminated.'14:42
priteauI seem to remember a similar issue a while ago. Apache might need a restart?14:53
fungiwell, if recent experience is any indication, it's being overrun by new llm training crawlers we need to block14:57
priteauLovely14:58
fungiload average on all 6 gitea servers is over 2014:58
mnasiadkaSweet…14:59
fungihttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/apache-ua-filter/files/ua-filter.conf is the list of what we're blocking so far15:05
MattCrees[m]Hi again, I'm finished with that node now, feel free to release it15:15
fungithanks MattCrees[m]! i've cleaned up your hold15:18
fungiso, analyzing apache logs on gitea09, in the past 10 minutes it received 32107 requests from clients claiming this ua string:15:21
opendevreviewClark Boylan proposed opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949115:21
fungi"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43"15:21
clarkbfungi: that is not a modern chrome though I suppose it could be a current edge/safari. Does Edge report itself with the missing e?15:22
clarkbfungi: not urgent keep looking at gitea but I woke up this morning with the idea that I needed to refactor 939491 a bit to make it more sane and that is all the latest ps does. Should be equivalent just easier to understand and modify in the future15:22
fungithe other backends seem to be seeing similar request volumes with the exact same ua15:23
funginote that it's two orders of magnitude more requests than for the second most active ua15:23
fungiit looks like real edge uses "Edge/some.version" instead15:24
fungii'll stick this one into the list15:25
clarkb++15:25
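The per-user-agent request counts mentioned above can be reproduced with a short script. The following is a minimal sketch, not the command actually used on gitea09: it assumes each Apache log line ends with the user agent as the last double-quoted field (as in the common "combined" format), and the name `top_user_agents` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the user agent is the final double-quoted field on each line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def top_user_agents(lines, limit=5):
    """Count requests per user-agent string and return the most frequent."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    import sys

    # Usage: python ua_count.py < access.log
    for agent, hits in top_user_agents(sys.stdin):
        print(f"{hits:8d}  {agent}")
```

A gap of two orders of magnitude between the first and second entries, as described above, is the kind of signal that would stand out immediately in this output.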
clarkbfrickler: not sure if you've seen 939491 but its commit message is probably worth reading if you have time. Mostly to call out any additional things we should test15:28
clarkbI have since confirmed that our prune and verify scripts are tested in CI so I expect that a 1.2.8 client pushing to a 1.1.18 server can prune and verify/check safely using 1.1.1815:28
clarkbthe biggest issues will come later for existing hosts if/when we migrate them to 1.2. In particular if we have to clean up TAMless archives and then there is also a step to run to generate some index data I think15:29
opendevreviewJeremy Stanley proposed opendev/system-config master: Block another bogus crawler user agent  https://review.opendev.org/c/opendev/system-config/+/93954115:32
fungipriteau: hopefully that ^ will help15:32
clarkbfungi: ^ is missing the [OR] condition at the end of line on the line prior to your addition15:32
fungid'oh15:33
clarkbfungi: also you are doing a direct match so can use = instead of anchoring the regex15:33
clarkbthat might make it slightly faster15:33
clarkblook at examples near the beginning of the list and compare to the end15:33
opendevreviewJeremy Stanley proposed opendev/system-config master: Block another bogus crawler user agent  https://review.opendev.org/c/opendev/system-config/+/93954115:34
clarkbyou can avoid extra \s that way15:34
clarkbbut the regex should work so is fine too15:34
fungican do15:34
clarkbya I always have to remember to add the [OR] because its easy to yy p and then forget15:35
opendevreviewJeremy Stanley proposed opendev/system-config master: Block another bogus crawler user agent  https://review.opendev.org/c/opendev/system-config/+/93954115:35
fungiindeed, exactly what i did too15:36
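The forgotten `[OR]` flag is easy to catch mechanically. Below is a rough lint sketch, assuming the filter file is a chain of `RewriteCond` lines followed by a `RewriteRule`; that layout and the helper name `missing_or_flags` are assumptions, not a description of any existing OpenDev tooling.

```python
def missing_or_flags(conf_text):
    """Return 1-based line numbers of RewriteCond lines that are followed
    by another RewriteCond but do not carry the OR flag."""
    # Keep only real directives, remembering their original line numbers.
    directives = []
    for number, raw in enumerate(conf_text.splitlines(), 1):
        line = raw.strip()
        if line and not line.startswith("#"):
            directives.append((number, line))

    missing = []
    for (number, line), (_, following) in zip(directives, directives[1:]):
        if not (line.startswith("RewriteCond") and following.startswith("RewriteCond")):
            continue
        # Flags, if present, are the trailing bracketed group, e.g. [NC,OR].
        flags = line.rsplit("[", 1)[-1].rstrip("]") if line.endswith("]") else ""
        if "OR" not in [flag.strip().upper() for flag in flags.split(",")]:
            missing.append(number)
    return missing


if __name__ == "__main__":
    import sys

    for number in missing_or_flags(sys.stdin.read()):
        print(f"line {number}: RewriteCond is missing [OR]")
```

Without the flag, consecutive conditions are ANDed together, so a copied line with no `[OR]` on the line before it can silently stop the earlier entries from matching on their own.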
clarkbI've gone ahead and approved it. I don't think that can make things worse if CI passes and it should hopefully make things better15:36
clarkbthen I guess if we get gitea settled down we have to consider if we're gonna send the borg update.15:37
clarkbI think our CI coverage is pretty good so its mostly a matter of ensuring the change itself is structurally sound and that the various scary borg upgrade warnings have been considered15:38
fungithanks15:39
tobias-urdinstupid crawlers, holding my thumbs – been trying to clone https://opendev.org/openstack/puppet-barbican for a while now15:47
opendevreviewClark Boylan proposed opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949115:56
clarkbok that was a silly mistake in my refactor (the whole point was to capture the different fuse extras then I didn't) but I'm hopeful for this latest ps15:56
fungiseems like i missed that oopsie15:58
fungipypi has added the ability to "archive" deprecated projects: https://discuss.python.org/t/1393716:01
priteaufungi: Thanks. Just pointing out that a Google Search finds lots of User Agent entries for "Edg/114.0.1823.43", unlike with "Edge", so this may be a genuine one (that might still be used by a crawler)16:05
fungiinteresting, when i went looking up the ua for edge it seemed the official one was spelled out fully16:06
clarkbalso that version string is old16:07
fungimaybe it changed at some point16:07
clarkbI think even if it were valid we'd tell people to 1) upgrade and 2) stop being a nuisance16:07
clarkbbut yes it looks like Edg/version is potentially valid16:09
fungii wonder what prompted them to abbreviate a 4-character name to 316:10
fricklerfor me, opendev.org is still essentially down, so maybe force-merging 939541 once checks pass would be good? or manually applying?16:20
Clark[m]Manually applying would be quickest and shouldn't get undone by hourly jobs16:25
clarkbok breakfast and morning tasks are done. I'm starting to manually apply the rule on 14 and working my way down17:07
clarkbthe rule should be in place everywhere. I accidentally restarted apache on gitea10 instead of reloading it. Everywhere else got a reload17:13
clarkbI don't expect that restart was very noticeable with all the other problems.17:13
clarkbload seems to be steadily falling17:13
clarkbthings appear slow but improving in my testing of the web ui17:15
clarkbI suspect some of that is the crawling pollutes the caches in gitea so the things interactive users want need to be cached and then they get quicker17:15
clarkblooking at logs we're sending a lot of 403s now17:22
clarkbseems like the experience I get now is initial load of a new page is slow (likely because gitea has to rebuild and then cache the content) then subsequent loads are reasonable17:32
clarkbI suspect this is far more usable now17:32
opendevreviewMerged opendev/system-config master: Block another bogus crawler user agent  https://review.opendev.org/c/opendev/system-config/+/93954117:56
clarkbapplying ^ should be a noop at this point but good to get it into config so it doesn't unapply later17:57
clarkbfungi: if you're not otherwise distracted by other obligations I think my priority is borg fix then gitea then gerrit. And honestly I'd be happy just to muddle through the borg fix. If we want to prioritize based on risk I'd say gitea is less risky then borg and gerrit are probably similarly likely to need intervention (gerrit if the restart goes sideways again)18:07
fungiclarkb: i'm back from lunch so taking a look now18:16
fungihave to find some time this afternoon to tidy up the mil's yard a little since the weekend weather is looking gross, but otherwise nothing pressing18:18
fungiborg fix first sounds good18:19
clarkbas mentioned before I think with the scope of this initial borg update change the worst case should be new paste02 server backups are sad and everyone else continues as is. But our ci job seems to show general happiness. I am hopeful18:20
clarkbI guess I should triple check logs in the job but I looked and I think check/verify and prune are both happy. Can also triple check the correct versions are installed on the correct hosts18:21
fungiyeah, i looked through the borg release notes and agree the risk is low and manageable in the worst case18:21
fungichecking build logs now18:21
clarkbhttps://zuul.opendev.org/t/openstack/build/4f9d33fd3a4b4d24a17c937c0fe227ee/log/borg-backup01.region.provider.opendev.org/verify-borg-backups.log https://zuul.opendev.org/t/openstack/build/4f9d33fd3a4b4d24a17c937c0fe227ee/log/borg-backup01.region.provider.opendev.org/prune-borg-backups.log and18:22
clarkbhttps://zuul.opendev.org/t/openstack/build/4f9d33fd3a4b4d24a17c937c0fe227ee/log/borg-backup-noble.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log lgtm18:22
clarkbI'm trying to dig up installations to verify versions next18:22
clarkbhttps://d9c6b55444bb367e4cfd-f794ac2492944f87fa91e71db0988d60.ssl.cf2.rackcdn.com/939491/11/check/system-config-run-borg-backup/4f9d33f/bridge99.opendev.org/ara-report/playbooks/4.html ^F pip then check the entry for each node18:23
clarkbI think I'm happy with ^ (there is a test case that tries to check the installed version too)18:24
clarkbfungi: I guess let me know if you think the new version isn't more readable/understandable than what I had written yesterday. I think I personally prefer this as its less magic and more explicit18:25
clarkbit should also be easier to override on a server by server basis if we change borg_version18:25
clarkbas a followup we might want to make the _fuse_extras value overrideable too but that doesn't seem important yet18:25
fungiyeah, i don't see anything concerning in the logs18:30
clarkboh as a note nothing currently sets borg_version so we're keeping the overridable default for potential future use18:31
clarkbI see your +2. Do we want to go ahead and send it in now? I suspect there won't be anyone else to review it until Monday. Not sure if you think this is worth waiting on and then switch to other stuff in the interim18:33
clarkbI think I'd like to get it in before the weekend because that is when the verification script runs iirc. But I can go either way18:33
clarkbI'll defer to you on what you feel is safest for your availability. I expect to be able to be around all day if necessary18:36
fungiyeah, approved18:46
fungiand yes the backup verifications are weekly on sunday19:05
fungithat's what was conflicting with the gerrit server backups for a while19:05
clarkbcool getting this in now should get us good exercise then. ~4-6 backups then verification sunday19:06
clarkbassuming it works. It won't be the first time it fails but at least this time we are directly testing it so odds seem better19:06
clarkbthe change should merge around when hourly jobs end so we should know soon 19:08
fungihourly jobs have already finished19:22
fungilast gate job is wrapping up finally19:23
opendevreviewMerged opendev/system-config master: Use newer borgbackup on Ubuntu Noble  https://review.opendev.org/c/opendev/system-config/+/93949119:24
clarkbhere we go19:24
clarkbthe job has begun19:25
fungiload average on the gitea servers has fakken down around 119:30
fungis/fakken/fallen/19:30
clarkbpaste02 has crontab entries for backups now19:30
clarkbfungi: wouldn't surprise me if a good chunk of that is simply returning 403s19:31
fungiyeah19:32
fungideploy job succeeded19:32
clarkbhttps://zuul.opendev.org/t/openstack/build/b2813525b82a4d129c2e3b41c064f768 yup19:32
clarkb0526 and 1726 are the two backup attempt times19:32
clarkbso we're solidly in the middle of that. We could manually run them but I'm not sure that is necessary unless you want to give that a go19:33
clarkbfor https://review.opendev.org/c/opendev/system-config/+/938826 I guess we have to decide if we're concerned about the load impacting that somehow though as you mention load seems consistently reasonable now19:34
clarkbbut given my priority list ^ would be next19:34
clarkbI'm going to go clean up some autoholds I made for backups testing19:34
clarkbtkajinam still has a hold on a heat-functional setup19:35
clarkbtkajinam: ^ can we clean that up?19:35
fungiyeah, i think gitea activity has calmed down enough that rotating through the restarts should have minimal impact19:36
fungiclarkb: i can approve 938826 if you're ready for that19:37
clarkbfungi: I think I am. Not sure what other testing I would do for that at this point. I guess triple checking the backup situation. The gate jobs take about an hour though so I say approve it and we can unapprove it if we discover problems in the next hour with other things that demand attention19:38
clarkbI'm going to check borg versions on some representative servers19:38
fungiyeah, the held node looked fine to me too19:38
fungiapproved the gitea upgrade just now19:38
clarkbspot checking a backup server, paste02 and etherpad the borg versions are as expected19:39
fungii'm going to spend a few minutes on yard cleanup while that filters through the gate19:41
clarkbsounds good19:42
clarkbI'll be making something for lunch soon myself19:42
fungigate jobs are looking good so far, zuul is estimating another 50 minutes to merge19:57
clarkband I'm going to make food now.19:58
fungilooks like it'll be merging in the next few minutes, deploy may end up fighting the hourlies20:46
clarkbI suspect if it merges according to the estimate it will deploy before the hourlies20:47
fungiyeah, it finished faster than i expected20:47
opendevreviewMerged opendev/system-config master: Update to Gitea 1.23  https://review.opendev.org/c/opendev/system-config/+/93882620:48
fungibam20:48
clarkbI've got a git clone command ready to run against gitea09 once it is done and have gitea09 opened to system-config to check web20:50
clarkbhttps://gitea09.opendev.org:3081/opendev/system-config/ loads for me and reports the expected version20:52
clarkbgit clone works too20:52
clarkbmariadb switch seems to have mostly nooped as we expect20:53
clarkbimage ids match etc but container was restarted as part of the update and we report the quay location as the image name20:53
clarkbweb ui seems to be working for me20:54
clarkb10 is done now too20:55
fungiyeah, it's working fine for me20:55
clarkb11 is done. Halfway there20:56
clarkbnow down to waiting on 14 to be done21:00
clarkbfungi: if you happen to have a new patchset for an existing change you can push that would be good to check replication. I'll look at my change list too once 14 is done21:00
fungiload on them is spiking up around 5 but starting to fall again, so probably just additional startup pressure21:00
clarkbfungi: I think it must be doing upgrade tasks (db migrations?) the ssh service isn't starting for like 30-50 seconds while ansible waits for web to come up21:01
fungiyeah, seems likely21:02
fungiload average is back to 1 on gitea0921:02
clarkband now 14 is done21:02
clarkbhttps://zuul.opendev.org/t/openstack/build/ccec84325b7e4a0cbbe9e55da57355b8 job reports success too21:03
clarkbnow to find something that will trigger replication21:03
opendevreviewClark Boylan proposed opendev/lodgeit master: Reapply "Move lodgeit image publication to quay.io"  https://review.opendev.org/c/opendev/lodgeit/+/93938521:04
clarkbthat change needed a recheck anyway so I just updated the commit message to trigger reruns of jobs21:04
clarkbnow to check replication21:04
clarkb`git fetch origin refs/changes/85/939385/2` shows me what I expect with origin https://opendev.org/opendev/lodgeit (fetch)21:06
clarkbso I think replication is working21:06
clarkbincludes the edit to the commit message and the sha matches 21:06
opendevreviewJeremy Stanley proposed opendev/system-config master: Install ssl-cert-check from distro package not Git  https://review.opendev.org/c/opendev/system-config/+/93918721:06
* fungi was too slow21:06
clarkbno it is good to sanity check I'm not doing anything wrong21:07
clarkblooks like load is elevated on 9, 13, and 1421:07
clarkbnothing crazy though21:07
clarkbalso falling. It could just be luck of the draw for those getting reconnects or similar from clients that balance to them. I don't see anything that makes me think there is a problem21:08
clarkbI can't remember who was pointing out the occasional issues with their jobs fetching constraints. Was it noonedeadpunk? Anyway I am hopeful that the reliability changes that have gone into 1.23 make that better now21:11
opendevreviewClark Boylan proposed opendev/system-config master: Fix a typo in the uwsgi-base mirror short name field  https://review.opendev.org/c/opendev/system-config/+/93956221:16
clarkbI'm trying to find the periodic job logs now to see if that just failed previously. It doesn't look like there is a 'uwsgi base' image create for us anyway21:16
clarkbya the job failed and eventually went into retry failure21:18
clarkblanding 929562 should hopefully be all we need to correct that21:19
fungiERROR: failed to solve: quay.io/opendevmirror/uwsgi-base:3.11-bookworm: failed to resolve source metadata for quay.io/opendevmirror/uwsgi-base:3.11-bookworm: unexpected status from HEAD request to https://quay.io/v2/opendevmirror/uwsgi-base/manifests/3.11-bookworm: 401 UNAUTHORIZED21:20
fungireason for that i guess?21:21
clarkbyes21:22
fungi929562 is abandoned "DNM - Trest CentOS" so i assume you meant another change21:22
clarkb939562 the change I just pushed. Thats what I get for tpying the change out by hand21:22
clarkbironic given that was likely also the source of the original bug21:22
fungihah, whoops (re: 939562)21:24
opendevreviewMerged opendev/system-config master: Fix a typo in the uwsgi-base mirror short name field  https://review.opendev.org/c/opendev/system-config/+/93956221:34
clarkbwhat an eventful day. I think its basically EOD for fungi so we can hold off on Gerrit today (it is less urgent)21:49
clarkband that gives me an opportunity to maybe go out for a walk in the sunlight before it is gone21:49
fungido it!21:50
fungiand yeah, i'm mostly checked out at this point, but will try to keep an eye on stuff off and on over the weekend as time permits21:51
