| -@gerrit:opendev.org- Steve Baker proposed: | 05:02 | |
| - [openstack/diskimage-builder] 983813: Refactor 02-set-machine-id into an element https://review.opendev.org/c/openstack/diskimage-builder/+/983813 | ||
| - [openstack/diskimage-builder] 983814: Refactor 03-reset-bls-entries into an element https://review.opendev.org/c/openstack/diskimage-builder/+/983814 | ||
| - [openstack/diskimage-builder] 983815: Add tarball element for unprivileged container builds https://review.opendev.org/c/openstack/diskimage-builder/+/983815 | ||
| @mnasiadka:matrix.org | After yesterday's changes the LB Grafana dashboard might need some love - https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-24h&to=now&timezone=utc | 07:26 |
|---|---|---|
| @priteau:matrix.org | Hello. Is something broken with Zuul? https://zuul.opendev.org is returning 403 | 08:40 |
| @sean-k-mooney:matrix.org | It works for me just now | 08:45 |
| @sean-k-mooney:matrix.org | I guess you’re signing in? | 08:45 |
| @sean-k-mooney:matrix.org | Rather than accessing it without auth | 08:46 |
| @mnasiadka:matrix.org | Works for me | 08:59 |
| @mnasiadka:matrix.org | Hmm, reprepro Ubuntu mirror script is failing on missing bionic "Error: packages database contains unused 'bionic-backports|main|amd64' database." | 09:03 |
| @priteau:matrix.org | It works in Chrome but not in Safari | 09:18 |
| @priteau:matrix.org | User agent filtering? | 09:18 |
| @priteau:matrix.org | * It works in Chrome and Firefox, but not in Safari | 09:18 |
| @sean-k-mooney:matrix.org | so again not signed in but i just tested on safari and i can access it | 09:19 |
| @sean-k-mooney:matrix.org | both on mobile and on my macbook air | 09:20 |
| @mnasiadka:matrix.org | Pierre Riteau: might be, but my Safari works | 09:37 |
| @priteau:matrix.org | Mine is Safari 26.4 | 09:48 |
| @priteau:matrix.org | It works if I change the user agent to Chrome | 09:49 |
| @priteau:matrix.org | User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/26.4 Safari/605.1.15 | 10:02 |
| @mnasiadka:matrix.org | Yes, that one seems to be in Zuul Apache ua-filter.conf | 10:16 |
| @priteau:matrix.org | Same issue to access https://static.openstack.org/project/opendev.org/docs/opendev/infra-manual/latest/matrix.html | 10:16 |
| @priteau:matrix.org | And https://tarballs.openstack.org | 10:17 |
| @priteau:matrix.org | But https://docs.openstack.org works fine | 10:17 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 983845: Remove entry from ua-filter https://review.opendev.org/c/opendev/system-config/+/983845 | 10:20 | |
| @mnasiadka:matrix.org | ^^ that should fix it, once it gets reviewed and merged | 10:22 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/zone-opendev.org] 983851: Promote mirror04.gra1.ovh https://review.opendev.org/c/opendev/zone-opendev.org/+/983851 | 10:47 | |
| @mnasiadka:matrix.org | #status log Replaced mirror02.bhs1.ovh.opendev.org (b2a0f48a-c850-485f-838c-0c896c2cfa5d, volume: 05e72250-a3af-46c6-a2b8-aa1b8b0da928) with mirror03.bhs1.ovh.opendev.org | 10:52 |
| @status:opendev.org | @mnasiadka:matrix.org: finished logging | 10:52 |
| @harbott.osism.tech:regio.chat | it just looks like there were some counter wraps/resets? | 11:53 |
| @harbott.osism.tech:regio.chat | also the next attack wave seems to have started 10 mins ago :-/ | 11:53 |
| @mnasiadka:matrix.org | well, opendev.org is responsive for me, maybe a bit slow - but not that bad | 11:59 |
| -@gerrit:opendev.org- Dmitriy Rabotyagov proposed: [openstack/project-config] 981924: Introduce OpenStack-Ansible Power Reviewers group https://review.opendev.org/c/openstack/project-config/+/981924 | 12:04 | |
| @fungicide:matrix.org | i'm getting 500 internal server error when trying to clone a repo from gitea at the moment | 12:17 |
| @fungicide:matrix.org | so we may have tuned apache and haproxy up high enough that gitea can get overwhelmed now | 12:17 |
| @fungicide:matrix.org | yeah, looking at several backends, apache responds quickly but server-status shows most workers in "sending reply" state so may be waiting for gitea to stream the requested content | 12:21 |
| @fungicide:matrix.org | gitea09 is handling roughly one POST for a git-upload-pack each second | 12:29 |
| @fungicide:matrix.org | the other backends are seeing a lot less, so this may be the effect of hashing by client address | 12:31 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/system-config] 983845: Remove entry from ua-filter https://review.opendev.org/c/opendev/system-config/+/983845 | 12:31 | |
| @fungicide:matrix.org | yeah, load average on gitea09 is hovering around 25 too | 12:31 |
| @clarkb:matrix.org | The 500 errors should be logged in /var/log/containers/docker-gitea.log iirc | 12:33 |
| @fungicide:matrix.org | the poor folks on the launchpad team at canonical are struggling again today too, based on discussion in their matrix room | 12:33 |
| @clarkb:matrix.org | I got an error from gitea12. Looking at grafana graphs I suspect that the load balancer may be cycling the back ends if the healthz responses are slow or also 500ing | 12:34 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/zone-opendev.org] 983860: Add entries for mirror03.ord.rax https://review.opendev.org/c/opendev/zone-opendev.org/+/983860 | 12:35 | |
| @clarkb:matrix.org | The haproxy show stat command shows backend status iirc but I don't remember the exact command off the top of my head (we don't capture that in grafana) | 12:35 |
| @fungicide:matrix.org | `Apr 9 12:35:55 gitea09 docker-gitea[712]: 2026/04/09 12:35:54 services/context/repo.go:535:RepoAssignment() [E] GetReleaseCountByRepoID: Error 1040: Too many connections` | 12:36 |
| @fungicide:matrix.org | quite a few of those | 12:36 |
| @mnasiadka:matrix.org | Clark: # watch 'echo "show stat" | sudo nc -U /var/lib/haproxy/run/stats | cut -d "," -f 1,2,3,4,5,6,8-10,18 | column -s, -t' | 12:36 |
| @clarkb:matrix.org | Oh we overwhelmed the database | 12:36 |
| @mnasiadka:matrix.org | ah, backend status - more columns :) | 12:36 |
| @mnasiadka:matrix.org | actually not, it's in last column of my oneliner | 12:37 |
| @clarkb:matrix.org | fungi: error 1040 too many connections is the mariadb database complaining I think | 12:37 |
| @clarkb:matrix.org | I'm not sure if we want to try increasing the mariadb limit and restart services or drop the haproxy front end limits back down. It looks like maybe the prior 8000 value must be right on the limit of either what our DB connection limit can do or what the botnet is doing since we're hovering around 8k connections now | 12:39 |
| @clarkb:matrix.org | I believe there is a specific tuning config file for gitea mariadb connection limits already | 12:41 |
| @clarkb:matrix.org | But without access to gitea it's hard to find at the moment :) | 12:41 |
| @clarkb:matrix.org | If we want to increase DB connections I can probably get to a real computer and load ssh keys | 12:42 |
| @mnasiadka:matrix.org | Well, it's a question of whether to allow the botnet to scrape what they want faster (and explore other bottlenecks) or go back to the previous limits and wait until it finishes? | 12:44 |
| @clarkb:matrix.org | Yes, though yesterday we were theorizing a good chunk of the traffic would 403. However if we're getting far enough to talk to the db then that may also have changed. Where the traffic is now actually processing the entire request and waiting for a response rather than short circuiting | 12:45 |
| @clarkb:matrix.org | That may also explain why at 8k ish connections we're seeing trouble | 12:46 |
| @fungicide:matrix.org | the balance comes from being able to reject enough bogus requests before they hit the db | 12:46 |
| @fungicide:matrix.org | i'm definitely seeing a ton of the same sorts of mobile phone user agent strings associated with requests for /commit/ paths making it through to the backend still | 12:47 |
| @vhasko:matrix.org | hello guys, we from T Cloud Public (formerly OpenTelekomCloud) also experiencing 500 error on https://opendev.org/zuul/zuul-jobs/ | 12:47 |
| @fungicide:matrix.org | Vladi: thanks, it's known | 12:47 |
| @clarkb:matrix.org | So I think our options are either to drop the front end limits down again or increase the mariadb connection limits. Status quo is less desirable as I suspect this can impact gerrit replication stuff with the DB errors | 12:49 |
| @fungicide:matrix.org | though the majority of the crawlers getting through now are back to computer browser identifiers rather than mobile device ones | 12:50 |
| @clarkb:matrix.org | Yes, the traffic actually hitting the DB implies we're filtering less and returning fewer 403s. We may still be able to keep up if we stabilize via increased DB connection limits though | 12:51 |
| @clarkb:matrix.org | Or the system load will skyrocket and it won't be useable ;) hard to say from the current vantage point | 12:52 |
| @fungicide:matrix.org | but with the load average on gitea09 approaching 30, i expect other failure modes if we increase the db connections | 12:52 |
| @clarkb:matrix.org | That may be due to 09 being the only system up and then going down and the next system is hit. But yes not a good indicator | 12:53 |
| @fungicide:matrix.org | maybe it's time to add more backend servers? copying the data to them is time-consuming though | 12:53 |
| @mnasiadka:matrix.org | All gitea servers have around 30 load avg | 12:53 |
| @mnasiadka:matrix.org | It's not only 09 anymore | 12:53 |
| @fungicide:matrix.org | yeah, i was using 09 as an example, though maybe a poor choice since it was handling a lot more git requests in the past hour | 12:54 |
| @fungicide:matrix.org | or do we want to try to go ahead and land the anubis implementation? | 12:54 |
| @mnasiadka:matrix.org | Well I don't think lowering frontend limits is going to help, it was sort of the same situation yesterday - it just was blocked on LB instead of hammering backend | 12:55 |
| @clarkb:matrix.org | The upside to blocking on the frontend is that Gerrit replication could continue to succeed | 12:55 |
| @mnasiadka:matrix.org | Right | 12:56 |
| @mnasiadka:matrix.org | So - lower the LB frontend limit and land Anubis and see if it helps? | 12:56 |
| @fungicide:matrix.org | previously i didn't expect anubis to help much because the requests had to go through haproxy and apache anyway, but now it's actually load on the gitea service/database killing us, so it might be the solution | 12:56 |
| @clarkb:matrix.org | Ya we may need to increase the limits again as the frontend will just get overpowered but in theory anubis should push back and allow the backend to keep up | 12:57 |
| @clarkb:matrix.org | So step 1 limit front end which won't fix anything for clients. Step 2 deploy anubis. Step 3 see if that provides enough return to sanity? | 12:57 |
| @vhasko:matrix.org | I can confirm that implementing anubis saved our asses on our HelpCenter, kicked off many crawler bots and AI bots | 12:58 |
| @clarkb:matrix.org | Yes we've deployed it on another service | 12:58 |
| @clarkb:matrix.org | It has just been a bit more complicated and tricky to get into gitea particularly when we have had to run a fire drill every 6 hours | 12:59 |
| @fungicide:matrix.org | the main concern with using it in front of gitea is that the crawlers were overloading haproxy and apache before they would have reached anubis, but now we've tuned those to accept a lot more connections/requests | 12:59 |
| @clarkb:matrix.org | fungi: Jens Harbott has a comment about why the latest anubis ps failed. Do we want to get that updated and work on the http vs https switchover in the interim? | 13:03 |
| @clarkb:matrix.org | But also setting the limit on the frontend back to 4k will likely help with the config updates on the backend | 13:04 |
| @fungicide:matrix.org | looking, i hadn't seen it yet | 13:04 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 13:07 | |
| @clarkb:matrix.org | I do half wonder if we want to try and manually cut over a single backend too | 13:07 |
| @clarkb:matrix.org | But as previously mentioned there is a fair bit of config to update by hand | 13:08 |
| @fungicide:matrix.org | easy enough to disable ansible deployment to the remaining backends, but then deploying to them later will require manually running ansible i guess? | 13:08 |
| @clarkb:matrix.org | Yes or reenqueue the buildset assuming we don't land anything else conflicting in the interim | 13:09 |
| @clarkb:matrix.org | Looking at the http cutover the bulk of that change is in management code. But otherwise it's a fairly straightforward update of apache and gitea config and restarts of both services. Applying that by hand shouldn't be too bad. I can do that on say gitea14 after manually taking it out of the rotation if we think that would be good to verify. Then similarly anubis is docker compose update and apache update and could be done manually? Just to confirm it generally works and then we apply it via ansible? I dunno | 13:14 |
| @clarkb:matrix.org | There are no good easy straightforward options so just need to pick something and move forward I guess | 13:14 |
| @fungicide:matrix.org | the anubis change yes, though the parent change will be a bit more of a beast | 13:14 |
| @clarkb:matrix.org | It doesn't look too bad (the parent) | 13:15 |
| @clarkb:matrix.org | The main thing is that the management code that talks to https localhost will fail if it runs after a manual update. Which is probably ok let's just not create any projects right now? | 13:16 |
| @priteau:matrix.org | Thanks for fixing the UA filtering for https://zuul.opendev.org/ | 13:16 |
| @fungicide:matrix.org | yeah, i guess it's mainly just docker-compose.yaml, app.ini and gitea.vhost changes that need to get applied by hand, and then apache and containers restarted | 13:16 |
| @priteau:matrix.org | I see there is still the same issue for https://opendev.org | 13:17 |
| @clarkb:matrix.org | fungi: ya give me a few minutes to get situated at the computer but I'll remove gitea14 from the backend rotation then start on it | 13:18 |
| @fungicide:matrix.org | Pierre Riteau: yes, infra-prod-service-gitea failed in deploy for 983845 | 13:18 |
| @fungicide:matrix.org | probably related to one or more of the servers being overloaded | 13:18 |
| @fungicide:matrix.org | i'll need to check the log to confirm | 13:18 |
| @fungicide:matrix.org | `TASK [gitea : List keys again to ensure key ids are correct for deletion.] fatal: [gitea09.opendev.org]: FAILED! => "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"...` | 13:19 |
| @fungicide:matrix.org | so probably yes | 13:20 |
| @fungicide:matrix.org | zuul estimates another 45 minutes to finish system-config-run-gitea for checking the test update to 983061, and then it needs to get through the gate, but since other testing already succeeded on it earlier if we want to hand-apply it on one of the backends as a trial run i guess we could do it in the interim | 13:22 |
| @clarkb:matrix.org | 14 has been pulled out of the haproxy rotation via manual haproxy socat commands | 13:24 |
| @clarkb:matrix.org | I'm going to apply the two changes separately and we can confirm things look good between the two | 13:26 |
| @fungicide:matrix.org | thanks. sorry i'm a little distracted, in a completely unrelated conference call and have another right after this one | 13:31 |
| @clarkb:matrix.org | I'm staging everything in my homedir so that I'm not ninja editing and there is a bit of a paper trail | 13:32 |
| @clarkb:matrix.org | just in case anyone is wondering what is taking so long and/or if you want to look | 13:32 |
| @clarkb:matrix.org | ok gitea14 should be up after a manual transition to the http backend | 13:40 |
| @clarkb:matrix.org | fungi: ^ if you want to test that too (I am about to test it. Haven't done any checks beyond looking at processes so far) | 13:40 |
| @clarkb:matrix.org | web seems to be working for me via the socks proxy through the load balancer | 13:41 |
| @clarkb:matrix.org | I'm going to work on staging the anubis change next. That will take a bit of time giving time to test the http transition | 13:41 |
| @fungicide:matrix.org | hah, it's almost 100% git clients hitting the gitea-web interface now | 13:46 |
| @fungicide:matrix.org | load average is running sub-1.0 | 13:47 |
| @clarkb:matrix.org | fungi: on which host? | 13:47 |
| @fungicide:matrix.org | gitea14 | 13:47 |
| @clarkb:matrix.org | gitea14 should be out of haproxy so I wouldn't expect anything talking to it? | 13:47 |
| @clarkb:matrix.org | unless maybe those git clients are from before I took it out and haproxy lets them finish? | 13:47 |
| @fungicide:matrix.org | oh, okay that's internal gitea connections | 13:48 |
| @clarkb:matrix.org | anyway staging is taking me a minute to get my bearings | 13:48 |
| @fungicide:matrix.org | `GiteaHttpLib` | 13:48 |
| @clarkb:matrix.org | so I haven't anubis'd anything yet | 13:48 |
| @fungicide:matrix.org | that makes more sense as to why there's so little activity | 13:50 |
| @clarkb:matrix.org | ok I'm about ready to restart gitea14 services again just a heads up that I'll be doing that and your tests may start to fail | 13:55 |
| @fungicide:matrix.org | k | 13:56 |
| @fungicide:matrix.org | we're about 10 minutes out from system-config-run-gitea possibly moving into the gate if Jens Harbott's test suggestion worked | 13:57 |
| @fungicide:matrix.org | ooh, i see a firefox hit that made it through gitea | 13:58 |
| @fungicide:matrix.org | oh, was that you Clark? | 13:58 |
| @fungicide:matrix.org | looks like it used https://gitea14.opendev.org:3081/ as the url base | 13:59 |
| @clarkb:matrix.org | yes that was me | 13:59 |
| @clarkb:matrix.org | it isn't working I think due to the :3081 so I'm trying to fix that | 13:59 |
| @fungicide:matrix.org | does hitting apache on 443 not work, or are you trying to exclude that layer? | 14:00 |
| @clarkb:matrix.org | the anubis redirect says not an allowed redirect domain | 14:00 |
| @clarkb:matrix.org | I had gitea14.opendev.org in the redirect domains to match what the change should do but I think it needs the :3081 maybe? | 14:00 |
| @clarkb:matrix.org | I'm going to restart services again | 14:00 |
| @fungicide:matrix.org | looks like you got a 200 ok | 14:01 |
| @clarkb:matrix.org | yes that seems to fix it if I add :3081. I don't think this affects production so I think we can proceed as is and then do a followup to fix it | 14:02 |
| @clarkb:matrix.org | do you think I should add gitea14.opendev.org back to haproxy? My concern is that we'll get the backends round robinning due to the other issues and either gitea14 will take an outsized amount of load or the anubis cookie may confuse things? | 14:02 |
| @clarkb:matrix.org | however, we can always take it back out of rotation if that is a problem I guess if we want to go for it | 14:02 |
| @fungicide:matrix.org | yes, please do | 14:03 |
| @fungicide:matrix.org | i'm watching the log | 14:03 |
| @clarkb:matrix.org | done | 14:03 |
| @fungicide:matrix.org | i see normal traffic now | 14:03 |
| @fungicide:matrix.org | and as expected, it's almost all git traffic | 14:03 |
| @fungicide:matrix.org | lots and lots of it | 14:04 |
| @fungicide:matrix.org | though i do see some normal browsers getting through as well, they look like maybe more normal requests | 14:04 |
| @clarkb:matrix.org | cool I guess I can do 13 next and so on down the line and race ansible | 14:05 |
| @fungicide:matrix.org | looks like google gemini knows how to solve anubis challenges | 14:06 |
| @clarkb:matrix.org | on 13 I can just go to the final state too now that we know it works so maybe it will be quicker | 14:06 |
| @clarkb:matrix.org | fungi: can you monitor gitea14 while I do 13 as its load is steadily climbing | 14:07 |
| @clarkb:matrix.org | I worry it's going to get the bulk of the traffic due to being functional | 14:07 |
| @fungicide:matrix.org | yeah, we're up around 12 load average so far | 14:07 |
| @fungicide:matrix.org | actually it's fallen a little | 14:07 |
| @fungicide:matrix.org | requests haven't stopped coming in though (at a very rapid clip), so it's not like it fell out of haproxy | 14:08 |
| @fungicide:matrix.org | load average is now down around 6 | 14:09 |
| @fungicide:matrix.org | so it may be reaching a steady state | 14:09 |
| @clarkb:matrix.org | fungi: oh also if you can write a followup change to add the :3081 to the redirect domain stuff that would be great. You can look at the docker-compose.yaml on gitea14 to see what I mean specifically | 14:10 |
| @fungicide:matrix.org | on it now | 14:10 |
| @clarkb:matrix.org | otherwise I'll try to do that when I've either beaten ansible or ansible has won | 14:10 |
| @clarkb:matrix.org | thanks! | 14:10 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983875: Support proxy tunnel to Gitea Apache for testing https://review.opendev.org/c/opendev/system-config/+/983875 | 14:14 | |
| @fungicide:matrix.org | like that ^ ? | 14:14 |
| @clarkb:matrix.org | yes | 14:16 |
| @fungicide:matrix.org | i see pip hitting gitea successfully on gitea14 | 14:18 |
| @fungicide:matrix.org | someone fetched /openstack/requirements/raw/branch/stable/2025.2/upper-constraints.txt with a 200 ok response | 14:19 |
| @fungicide:matrix.org | load average is still down around 7-8 | 14:20 |
| @clarkb:matrix.org | I've just got gitea13 up and running and I think it looks good. I'll give it a minute for anyone to object before I put it back into the load balancer | 14:20 |
| @scott.little:matrix.org | Is there a time based element to these attacks? Starlingx is finding that around 8:00 am Eastern, you guys are almost always not responding. Builds at other times are much more likely to pass. | 14:21 |
| @clarkb:matrix.org | scott.little: yes they come in waves you can see them at https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-2d&to=now&timezone=utc | 14:22 |
| @fungicide:matrix.org | whoever's in control of this particular bot army seems to kick off batches which then subside | 14:23 |
| @clarkb:matrix.org | fungi: I've put 13 into the rotation now. I'll move onto 12 next | 14:24 |
| @clarkb:matrix.org | (if you're able to keep monitoring as I go and/or spot check configs and responses that is much appreciated) | 14:24 |
| @fungicide:matrix.org | can do, thanks | 14:25 |
| @fungicide:matrix.org | i do see reasonable-looking requests getting through to 13 | 14:26 |
| @fungicide:matrix.org | including pip fetching constraints files | 14:26 |
| @clarkb:matrix.org | 12 is up and appears to be working. I'll add it back to the lb shortly | 14:32 |
| @tafkamax:matrix.org | Tried opendev.org and got the waifu approval | 14:33 |
| @tafkamax:matrix.org | seems to be working | 14:33 |
| @fungicide:matrix.org | Taavi Ansper: yeah, for now it will be hit-or-miss depending on which backend you get routed to | 14:34 |
| @clarkb:matrix.org | fungi: 12 should be back in the lb if you want to spot check it. I'm moving onto 11. It isn't quick because I'm trying to be careful and check things as I go. but I think this must already be helping based on what people are reporting back | 14:34 |
| @clarkb:matrix.org | and it's a bit faster now as I'm not staging each step I'm just going to the end result | 14:35 |
| @clarkb:matrix.org | gitea11 is done now. I've added it to the load balancer again. I noticed that on 12 and 13 I failed to remove the bind mount of the certs. Doesn't affect functionality, but is something that I'll clean up after 10 and 09 | 14:42 |
| @clarkb:matrix.org | fungi: I've also just realized that since I'm doing this manually our two stage deployment from ansible will undo anubis. This is probably fine? But I wanted to mention it as we may want to edit the emergency file appropriately. Let both changes deploy in a noop fashion, then remove hosts from the emergency file and then reenqueue? | 14:45 |
| @clarkb:matrix.org | I think that makes sense to me but we have time to think about it | 14:45 |
| @fungicide:matrix.org | `RuntimeError: Cannot validate ip address '[::1]'` | 14:53 |
| @clarkb:matrix.org | I'm also noticing that we do indeed need to rereplicate as not all gitea backends have mnasiadka's change to clean up the ua filter | 14:53 |
| @mnasiadka:matrix.org | ugh, sorry for that | 14:53 |
| @fungicide:matrix.org | looks like system-config-run-gitea timed out on the anubis change, but the failure shows up in the child change | 14:54 |
| @fungicide:matrix.org | it may need to be tcp6://? | 14:54 |
| @clarkb:matrix.org | fungi: that is in the testinfra test? they do have docs that may have clues | 14:54 |
| @fungicide:matrix.org | yeah, i'm digging | 14:55 |
| @clarkb:matrix.org | 10 is back in the rotation. I'm moving to 09 next then will go back and fix the bind mount of certs on 13 and 12 | 14:55 |
| @clarkb:matrix.org | then maybe we put hosts in the emergency file. Get this into mergeable shape. And trigger replication? | 14:55 |
| @fungicide:matrix.org | we could also just drop the `assert anubis.is_listening` check you asked for, we exercise anubis in subsequent tests so it has to be listening anyway for those to pass | 14:56 |
| @clarkb:matrix.org | ya that seems fine for now too | 14:56 |
| @clarkb:matrix.org | can add that later when we figure it out | 14:56 |
| @fungicide:matrix.org | i could move it to a separate change so it doesn't block deployment | 14:56 |
| @clarkb:matrix.org | ++ | 14:56 |
| @fungicide:matrix.org | i'll do that now, but fold your requested :3081 fix in | 14:57 |
| @fungicide:matrix.org | now that more of the backends are not burdened with bogus requests, load average across them has fallen drastically | 14:57 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: | 15:02 | |
| - [opendev/system-config] 983134: Remove intermediate HTTPS layer for Gitea backends https://review.opendev.org/c/opendev/system-config/+/983134 | ||
| - [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | ||
| @clarkb:matrix.org | ok 09 is done. I'm going to clean up the bind mounts on 13 and 12 now so they will rotate out then back in | 15:03 |
| @clarkb:matrix.org | fungi: do you want to put the gitea backends in the emergency file? | 15:03 |
| @clarkb:matrix.org | I don't think that is a rush and I can do it when I'm done with 12 and 13 as well | 15:03 |
| @fungicide:matrix.org | huh, why did i get a new patchset on 983134 i wonder | 15:06 |
| @fungicide:matrix.org | weird, i didn't think i changed the parent base when i rebased | 15:07 |
| @clarkb:matrix.org | ok 12 and 13 are done | 15:07 |
| @clarkb:matrix.org | so all backends should now be running anubis using a config that matches the proposed changes if I have done the manual edits correctly | 15:07 |
| @clarkb:matrix.org | fungi: I can put things in the emergency file since I've run out of things that need doing immediately | 15:08 |
| @fungicide:matrix.org | if we can land the two changes for that before daily jobs run, we shouldn't need them in there, right? | 15:09 |
| @clarkb:matrix.org | fungi: we need them because the first change will undo anubis and then the second will readd it. I guess this may not be critical depending on the state of the system, but not flip flopping would be nice | 15:10 |
| @fungicide:matrix.org | ah, fair | 15:11 |
| @clarkb:matrix.org | ok they are in the emergency file | 15:12 |
| @fungicide:matrix.org | so we would put them in emergency, wait for the deploy on both changes to skip those servers, then take them out of emergency and let the daily deploy cover both changes together | 15:12 |
| @fungicide:matrix.org | (or some subsequent deploy that runs the same job) | 15:13 |
| @clarkb:matrix.org | yup or maybe even better is after both deployments happen remove the servers from the emergency file and reenqueue the deployment buildset for the second change | 15:13 |
| @clarkb:matrix.org | so that we don't have to wait until 02:00 to discover if there was an important difference between what I did and what is in ansible | 15:13 |
| @fungicide:matrix.org | for that matter, we could take them out of emergency between the first and second deploy | 15:13 |
| @clarkb:matrix.org | yup, though that may be a tight window | 15:13 |
| @clarkb:matrix.org | then once that is done I think we should attempt cleanup of the emergency ua filter stuff since we keep having false positives and see if anubis is sufficient (it should be based on current evidence) | 15:14 |
| @clarkb:matrix.org | we can't completely drop ua filters because not all services are behind anubis but we can start with the removal of those rules I think | 15:14 |
| @fungicide:matrix.org | okay, looks like we do the v6 listening test for keycloak and it just knows that the final `:` is the port separator | 15:15 |
| @fungicide:matrix.org | `keycloak = host.socket("tcp://::1:8080")` | 15:15 |
| @fungicide:matrix.org | i'll just reinclude it | 15:16 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 15:16 | |
| @clarkb:matrix.org | I have approved both changes now | 15:16 |
| @fungicide:matrix.org | thanks | 15:16 |
| @fungicide:matrix.org | gitea is working quickly and cleanly for me, browsing around and also doing git fetches. i did see the anubis screen flash up momentarily | 15:18 |
| @mnasiadka:matrix.org | Works well for me also, I can do some more testing in around an hour - if that’s useful | 15:19 |
| @fungicide:matrix.org | 5-minute load averages on all backends are ranging from 0.5-4.5 at the moment | 15:19 |
| @clarkb:matrix.org | ya I think general monitoring for unexpected behaviors is good. I want to audit the manual work I did after a short break to reset the head (and maybe a shower) | 15:20 |
| @clarkb:matrix.org | just to make sure I didn't miss anything else like the certs bind mount removal | 15:20 |
| @fungicide:matrix.org | i may have missed it in scrollback, but did full replication from gerrit get kicked off yet? | 15:21 |
| @fungicide:matrix.org | if not i can do that next | 15:21 |
| @clarkb:matrix.org | I did not start that. I think it would be good if you can do that. you can `gerrit show-queue -w` first to confirm it isn't already in progress | 15:21 |
| @fungicide:matrix.org | only 27 tasks running in gerrit's queue so i think it didn't | 15:21 |
| @fungicide:matrix.org | yeah i just checked that | 15:22 |
| @fungicide:matrix.org | i'll start it, unless there are reasons not to | 15:22 |
| @clarkb:matrix.org | I don't think so at this point unless you want to audit the manual work I did first | 15:23 |
| @fungicide:matrix.org | looks like the gerrit ssh api command is `replication start --all` | 15:23 |
| @clarkb:matrix.org | but I want a shower before I do that to reset the head and look at it with fresh eyes | 15:23 |
| @fungicide:matrix.org | i'd rather reduce the window of time people might be pulling outdated git refs, then can look over the backends while that's in progress | 15:24 |
| @fungicide:matrix.org | worst case we end up running it twice | 15:24 |
| @clarkb:matrix.org | wfm | 15:24 |
| @fungicide:matrix.org | running now | 15:26 |
| @fungicide:matrix.org | 2175 tasks in the gerrit queue | 15:26 |
| @clarkb:matrix.org | any objections if I pop out now and get that shower? Then when I get back I'll review my work | 15:27 |
| @fungicide:matrix.org | no objection | 15:28 |
| @clarkb:matrix.org | fungi: if you filter for GiteaHttpLib in the /var/gitea/logs/access.log file you'll see all the internal requests that I think are related to replication | 15:29 |
| @clarkb:matrix.org | they appear to have 200 response codes so I think it is working | 15:29 |
| @fungicide:matrix.org | status notice Anubis is now deployed on our Gitea backends, and things are back to working normally though you may notice an Anubis screen flash briefly when starting to browse opendev.org; any jobs which failed prior to 15:00 UTC today can be safely rechecked | 15:30 |
| @fungicide:matrix.org | that look reasonable? | 15:30 |
| @clarkb:matrix.org | yes | 15:31 |
| @clarkb:matrix.org | and I'll pop out now for ~20 minutes or so | 15:31 |
| @fungicide:matrix.org | #status notice Anubis is now deployed on our Gitea backends, and things are back to working normally though you may notice an Anubis screen flash briefly when starting to browse opendev.org; any jobs which failed prior to 15:00 UTC today can be safely rechecked | 15:32 |
| @status:opendev.org | @fungicide:matrix.org: sending notice | 15:32 |
| @clarkb:matrix.org | actually one last thought: looking at https://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-6h&to=now&timezone=utc it is hard to say if the end of the crawling coincided with our anubis deployment though it falls off gradually so I think we were actually pushing back properly. Previous editions were like an on off switch iirc | 15:32 |
| @clarkb:matrix.org | I guess the frontend apache logs may tell us if we really want to know | 15:33 |
| @fungicide:matrix.org | more the haproxy log if we're just trying to gauge request volume | 15:33 |
| @clarkb:matrix.org | ++ | 15:34 |
| @clarkb:matrix.org | ok really popping out now | 15:34 |
| @mnasiadka:matrix.org | Well, tomorrow 14:00 CEST will tell us - weirdly it’s sort of start of work time in EST timezone? | 15:34 |
| @fungicide:matrix.org | gerrit's down to 8521 tasks in the queue now | 15:35 |
| -@status:opendev.org- NOTICE: Anubis is now deployed on our Gitea backends, and things are back to working normally though you may notice an Anubis screen flash briefly when starting to browse opendev.org; any jobs which failed prior to 15:00 UTC today can be safely rechecked | 15:36 | |
| @status:opendev.org | @fungicide:matrix.org: finished sending notice | 15:36 |
| -@gerrit:opendev.org- Zuul merged on behalf of Gregory Thiemonge: [opendev/irc-meetings] 983191: Update meeting chair for Octavia https://review.opendev.org/c/opendev/irc-meetings/+/983191 | 15:45 | |
| @fungicide:matrix.org | i'm still seeing a few requests that look like crawlers faking browser-type user agent strings which would have had to solve the anubis challenges, but the volume is low | 15:49 |
| @fungicide:matrix.org | though likely an indicator that this workaround will only get us by for so long before they adapt | 15:49 |
| @fungicide:matrix.org | but maybe it slows them down enough, in the case of the ones that need to run js to solve challenges and get/redistribute cookies | 15:50 |
| @fungicide:matrix.org | replication from gerrit to the gitea backends has completed | 15:52 |
| @scott.little:matrix.org | We are trying to find a way to get more reliable git downloads. Would switching from https: to ssh: help? Another suggestion was to pull from review.opendev.org rather than opendev.org. | 15:55 |
| @fungicide:matrix.org | scott.little: we're trying to provide more reliable git downloads. please don't switch all your fetches to review.opendev.org or we'll end up overloading that server instead | 15:59 |
| @fungicide:matrix.org | we also don't have an ssh git endpoint | 15:59 |
| @jim:acmegating.com | remind me why jobs aren't just using zuul required-projects? | 16:00 |
| @fungicide:matrix.org | the request volume overall in the past few weeks has simply been too much for the services to keep up with, but we think we have a longer-term replacement in there now | 16:00 |
| @fungicide:matrix.org | i'm assuming scott.little isn't talking about zuul jobs, but yes if this is in the context of jobs running in zuul then setting their required-projects list appropriately will make the git access local on the test nodes rather than over the internet | 16:02 |
| @jim:acmegating.com | ack. i know there were a lot of job failures. i agree it's unclear if scott.little was asking in that context, but it seems regardless there may be at least some cases where the question may be relevant. | 16:03 |
| @clarkb:matrix.org | right I think to summarize the OpenDev team is doing what it can to make the opendev.org git services as reliable as possible. The solution to them not being up to the level of reliability desired is to join us and help make things better. I tried explaining this earlier today in another channel. But I have something like 8 jobs/roles right now? OpenDev sysadmin, opendev service coordinator, zuul maintainer, zuul community manager, I have been massaging contribution metrics with the switch to lfx, I also do CI engineering (hand wave around that one). I maintain python packaging systems and container images that are critical to many of our software projects and services. The list goes on and on. I'm spread incredibly thin. I know fungi is too. The problem isn't that this can't be solved. The problem is that we are currently demanding far too much of the people who care enough to try | 16:04 |
| @scott.little:matrix.org | not zuul, just the 'repo sync' that our designers tend to do in the morning to update their 80ish gits | 16:04 |
| @fungicide:matrix.org | yes, in openstack a lot of teams ended up recently merging a regression called "precommit" which apparently only knows how to pull things from remote git urls | 16:04 |
| @clarkb:matrix.org | for a concrete example anubis has been on our "let's do this" list for a few days but we simply haven't had any time to push it forward due to all the firefighting. Eventually things got bad enough that we said screw it and just applied it manually and are now going to sync up config management after the fact | 16:05 |
| @jim:acmegating.com | fungi: wow, that sounds very wrong. is there an architecture document or spec or something? | 16:06 |
| @fungicide:matrix.org | looking at what user agents are hitting the gitea backends now that anubis is in place, gitea09 and gitea13 apparently are handling very high volumes of requests from git clients (in the 15z hour, 20656 from "git/2.43.5" on gitea09 and 46400 from "git/2.47.3" on gitea13) | 16:06 |
| @clarkb:matrix.org | OpenDev has always been built on the idea that the hosted projects would get involved and help maintain OpenDev too. OpenStack and Zuul have regularly had overlap with the OpenDev team but starlingx has not. If there is interest in getting more involved I'm happy to help point people in the right direction (thats the service coordinator hat I wear talking) | 16:06 |
| @fungicide:matrix.org | the other backends don't exhibit this pattern, so likely coming from one or a handful of ip addresses | 16:07 |
| @fungicide:matrix.org | corvus: no, somebody pushed a bunch of changes to openstack repositories adding configuration for this "precommit" tool and then setting up tox to run that in order to do things like linting, and it does its own dependency installation. i think it originated in the golang ecosystem where everyone refers to dependencies with git urls | 16:08 |
| @fungicide:matrix.org | i'm not that familiar with it, just trying to get to the bottom of what happened there | 16:09 |
| @fungicide:matrix.org | scott.little: approximately what time of day are the designers running this repo sync task, and is there a central cache they share or are they doing it independently? | 16:10 |
| @scott.little:matrix.org | oh boy, we are spread thin too, but if opendev is open to adding a head from the StarlingX pool of talent, I can put that forward at our next team meeting. | 16:10 |
| @scott.little:matrix.org | independently. no cache. The North American members would likely try to update around 8-9 am Eastern | 16:11 |
| @jim:acmegating.com | fungi: it seems like a dependency of an opendev/openstack job should always be installed via zuul, never directly from gitea or gerrit. this seems like a major flaw in both job design and systems design. we were very careful to avoid that situation when originally designing the jobs. | 16:12 |
| @fungicide:matrix.org | well, in this case they replaced pulling dependencies from pypi to pulling them from git | 16:13 |
| @jim:acmegating.com | they were pulling openstack dependencies from pypi? | 16:14 |
| @clarkb:matrix.org | corvus: yes we had this discussion in the openstack tc channel yesterday | 16:14 |
| @clarkb:matrix.org | corvus: there was some indication that this isn't possible with precommit which I argued against as a git repo and git commit is consistent across locations | 16:14 |
| @fungicide:matrix.org | corvus: looks like it's generally just the "hacking" tool, e.g. https://opendev.org/openstack/nova/src/commit/622a015/.pre-commit-config.yaml#L48 | 16:14 |
| @jim:acmegating.com | yeah, i mean, git repos can be at file urls... | 16:15 |
| @mnasiadka:matrix.org | I didn’t add up to the discussion, but it sounds weird precommit can’t do file repo urls | 16:16 |
| @jim:acmegating.com | maybe we need to re-socialize some of the basic rules for setting up zuul jobs: never fetch from opendev.org and, really, really, never fetch from review.opendev.org. | 16:17 |
| @fungicide:matrix.org | (we do have some rare exceptions to the latter, like proposal jobs that need to check whether there's a pending change in gerrit that hasn't merged in order to decide whether to push a new change or a revision to the existing one) | 16:21 |
| @jim:acmegating.com | yep. and those jobs and exceptions were designed in consultation with the folks running the service. :) | 16:21 |
| @fungicide:matrix.org | (which we could maybe figure out from named refs in a full local mirror of the repo instead? not sure) | 16:21 |
| @scott.little:matrix.org | is there a doc on how a project can get involved in maintaining opendev infrastructure? | 16:21 |
| @clarkb:matrix.org | I've done my audits: all 6 backends have 5 containers running (gitea web, giteassh, anubis, mariadb, and memcached). All 6 giteas have what appears to be a consistent anubis config. All 6 giteas have the updated protocol http app.ini related changes. All 6 appear to have updated apache vhost configs. And all 6 have the anubis update to docker-compose.yaml and the correct bind mounts for gitea-web | 16:22 |
| @clarkb:matrix.org | scott.little: https://docs.opendev.org/opendev/system-config/latest/open-infrastructure.html this is a good overview of how things are built (though it may be out of date at times) and includes some pointers at contributing from there | 16:23 |
| @fungicide:matrix.org | other than the outsized volume of git requests getting balanced to the 09 and 13 backends, everything seems to be operating normally and even those two are still handling the increased volume gracefully | 16:23 |
| @clarkb:matrix.org | we also maintain a set of spec documents and help wanted info here: https://docs.opendev.org/opendev/infra-specs/latest/ | 16:23 |
| @clarkb:matrix.org | including a less formal etherpad link with lists of things we'd like to do that haven't been formalized or may not need to be formal because they are more straightforward | 16:23 |
| @fungicide:matrix.org | i'm going to take this opportunity to grab a much-belated shower now that Clark is back | 16:24 |
| @fungicide:matrix.org | brb | 16:24 |
| @mnasiadka:matrix.org | Now that the dust is sort of settled - Clark - I started working on the ord.rax mirror replacement/upgrade (what was funny the image that has OpenDev in the name was not booting, but the Cloud 24.04 image did) | 16:28 |
| @jim:acmegating.com | it's like we only have one shower | 16:28 |
| @clarkb:matrix.org | mnasiadka: the opendev image needs to be booted with the --config-drive flag to launch node in that cloud | 16:29 |
| @clarkb:matrix.org | mnasiadka: cloud init doesn't know how to use metadata in that cloud or they don't supply it or something. But config drive works | 16:29 |
| @clarkb:matrix.org | the gerrit 3.11->3.12 upgrade test job failed on the gitea http transition change | 16:30 |
| @clarkb:matrix.org | It looks like the project index lock wasn't removed when gerrit 3.11 shutdown so 3.12 couldn't reindex it on startup | 16:30 |
| @clarkb:matrix.org | I haven't seen that before so I think we can probably recheck it | 16:30 |
| @mnasiadka:matrix.org | Clark: Is it fine with using RAX 24.04 image or should I resort back to the OpenDev one? | 16:31 |
| @clarkb:matrix.org | mnasiadka: I suspect it is fine. In the past we have typically used the cloud supplied image but this time noble images were not uploaded quickly so we ended up uploading our own to all clouds | 16:32 |
| @mnasiadka:matrix.org | Ah, ok - so let me continue with that and see if there are any problems | 16:33 |
| @clarkb:matrix.org | sounds good | 16:33 |
| @clarkb:matrix.org | Looks like gerrit replication completed. I'm going to check each of the backends has the latest system-config commit now | 16:33 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 983911: Add mirror03.ord.rax https://review.opendev.org/c/opendev/system-config/+/983911 | 16:35 | |
| @clarkb:matrix.org | yes https://opendev.org/opendev/system-config/commit/b3e229b67919c540809289838ffea48969f6b324 seems to be present on each backend now | 16:35 |
| -@gerrit:opendev.org- Monty Taylor https://matrix.to/#/@mordred:inaugust.com proposed: [openstack/project-config] 983912: Add gerrit plugin for openclaw https://review.opendev.org/c/openstack/project-config/+/983912 | 16:36 | |
| @mnasiadka:matrix.org | Clark: the mirror volume in ORD seems to be 256 compared to 200 in OVH - should it stay 256? | 16:36 |
| @mnasiadka:matrix.org | (size) | 16:36 |
| @clarkb:matrix.org | mnasiadka: yes. The reason for that is that the http proxy cache pruner can't keep up on those volumes like it can on others so we give it more headroom (this is a classic case of tribal knowledge that should be written down somewhere probably) | 16:37 |
| @clarkb:matrix.org | mnasiadka: I would mimic the existing server for this reason. Basically we need more headroom so that http cachecleaning works. And apologies for the tribal knowledge delta | 16:38 |
| @fungicide:matrix.org | mordred: last week someone proposed adding an openclaw-as-a-service project to openstack too | 16:38 |
| @mnasiadka:matrix.org | Clark: I like tribal knowledge, it's everywhere :) | 16:38 |
| @mnasiadka:matrix.org | Ok, volume mounted to mirror03.ord.rax, server rebooting - https://review.opendev.org/c/opendev/system-config/+/983911 (and depends-on) need to be merged for it | 16:43 |
| @clarkb:matrix.org | cool. I should probably take a minute to update my todo list with all the gitea related stuff and also changes like that. | 16:44 |
| @clarkb:matrix.org | I'll try to get to them. But not sure where they are on the priority list at the moment | 16:44 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 983911: Add mirror03.ord.rax https://review.opendev.org/c/opendev/system-config/+/983911 | 16:45 | |
| @mnasiadka:matrix.org | no worries, I just aim to be done with all mirrors this week | 16:46 |
| @mordred:waterwanders.com | fungi: hah, really? I mean, it's not a bad idea, I've been noodling on an aaS myself, but too many ideas not enough time | 16:47 |
| @mordred:waterwanders.com | fungi: I've got a really nice matrix+gerrit+zuul+openclaw setup going that I'm starting to think would make a good conference talk at some point. they all work together _really_ well | 16:48 |
| @fungicide:matrix.org | mordred: nijaba is apparently working at the company behind openclaw these days, so proposed it | 16:48 |
| @mnasiadka:matrix.org | Ah, there's one more thing - the reprepro job for Ubuntu mirroring is failing on missing bionic things? | 16:49 |
| "Error: packages database contains unused 'bionic-backports|main|amd64' database." | ||
| @mnasiadka:matrix.org | And that sort of broke kolla-build jobs due to some mirror inconsistency | 16:49 |
| @clarkb:matrix.org | mnasiadka: I think that was fallout of removing the bionic reprepro config. There is a manual step we need to run. I thought it would just warn us but I guess it didn't | 16:50 |
| @clarkb:matrix.org | mnasiadka: let me get some documentation and maybe that is something we want to work through together? | 16:50 |
| @mnasiadka:matrix.org | Clark: sure :) | 16:50 |
| @mnasiadka:matrix.org | more tribal knowledge, the better | 16:50 |
| @clarkb:matrix.org | no this one is actually documented I just haven't had time to do it with all the fires | 16:51 |
| @clarkb:matrix.org | but I have to find the links first | 16:51 |
| @clarkb:matrix.org | mnasiadka: we need to run the process described here for the ~3 reprepro repos that had bionic removed: https://docs.opendev.org/opendev/system-config/latest/reprepro.html#removing-components All of that can and should be done with the mirroring keytabs for authentication. But generally openafs can also be accessed directly with your own account: https://docs.opendev.org/opendev/system-config/latest/afs.html But I think you can ignore this for that particular issue | 16:52 |
| @clarkb:matrix.org | I'll jump on mirror update now and see if I can grab the locks we need to grab. https://review.opendev.org/c/opendev/system-config/+/983221 shows the things that need cleanup | 16:53 |
| @clarkb:matrix.org | actually there is a step there that involves manually deleting files using your own kinit aklog so we do need that | 16:55 |
| @clarkb:matrix.org | ok I have started a root screen on mirror-update03 and grabbed the apt-puppetlabs, ubuntu-cloud-archive, and ubuntu locks | 16:57 |
| @clarkb:matrix.org | I'm going to start this process against apt-puppetlabs now | 16:59 |
| @clarkb:matrix.org | hrm the apt puppetlabs cleanup cleaned up more than I expected implying that the expectation that there be a warning not an error may be valid? Still the cleanup needs to be done anyway so I'll keep going | 17:02 |
| @mnasiadka:matrix.org | Ok, attached to screen | 17:10 |
| @clarkb:matrix.org | mnasiadka: cool window 0 has the overview, windows 1,2,3 are where I held specific locks and are where I will perform operations for each of the mirrors. and i just logged into openafs in window 4 to do the next step of apt-puppetlabs cleanup | 17:12 |
| @mnasiadka:matrix.org | Sure, following :) | 17:13 |
| @clarkb:matrix.org | I'm taking my time since I don't remember the last time I've done this :) | 17:14 |
| @clarkb:matrix.org | looks like we have an expired key which is why that instantaneously returned to us | 17:19 |
| @clarkb:matrix.org | fungi: are you back yet? | 17:19 |
| @clarkb:matrix.org | I think we need to find whatever key apt puppetlabs is using now and add it to the keychain then use it | 17:20 |
| @clarkb:matrix.org | There are a number of keys at https://apt.puppetlabs.com/ and I'm not sure which is current. I think I may leave this one in this state for now and continue to cloud archive | 17:21 |
| @clarkb:matrix.org | I suspect that if it was broken before and no one was complaining that this isn't urgent to fix | 17:21 |
| @fungicide:matrix.org | thanks! i was planning to run the bionic mirror cleanup steps but obviously distracted | 17:22 |
| @clarkb:matrix.org | fungi: maybe you want to look at the apt puppetlabs stuff? It looks like UCA was working until that change broke it so I think it is happier and its good for me to practice | 17:23 |
| @clarkb:matrix.org | fungi: see windows 0, 1, and 4 in the root screen on mirror update for where we ended up but tldr is after deleting everything including directly via rm after aklog the run of reprepro fails because the gpg key is expired | 17:24 |
| @clarkb:matrix.org | so it needs a new key and a rerun I guess | 17:24 |
| @clarkb:matrix.org | ok UCA is done and it appears to have been happy. I'm going to proceed with the big one, Ubuntu proper, now | 17:33 |
| @clarkb:matrix.org | fungi: also should I be doing an intermediate vos release for ubuntu if we've already downloaded stuff that isn't in a happy state? | 17:35 |
| @clarkb:matrix.org | fungi: I'm worried the intermediate release may put us into a spot where things are broken until we do the final release. Maybe I should just do one release at the end? | 17:35 |
| @fungicide:matrix.org | i would not manually run `vos release` for those volumes, no | 17:36 |
| @fungicide:matrix.org | the mirror script will skip it if reprepro failed | 17:37 |
| @clarkb:matrix.org | ok our docs at https://docs.opendev.org/opendev/system-config/latest/reprepro.html#removing-components say to do it. I'll skip the one in the middle and just run the mirror script after the cleanups and let it come to a fully happy state before proceeding | 17:37 |
| @clarkb:matrix.org | note I did do the intermediate release for apt-puppetlabs and for ubuntu-cloud-archive. Ubuntu-cloud-archive is completely done now and happy though and apt-puppetlabs is broken but I'm not sure how much that matters? | 17:38 |
| @clarkb:matrix.org | anyway I'll proceed with that plan for ubuntu (no intermediate release after clearvanished and deleteunreferenced) | 17:38 |
| @clarkb:matrix.org | mnasiadka: and I realize I'm sort of skimming the highlights here. If you want we can have a call or a focused discussion on how mirrors and openafs are done | 17:40 |
| @clarkb:matrix.org | I just figured since you noted things were broken I should get around to doing this already | 17:40 |
| @mnasiadka:matrix.org | Clark: no worries, I've been noting stuff as you go and there are some docs - I think that's enough information for today, but happy to gain some more knowledge on some more peaceful day :) | 17:41 |
| @clarkb:matrix.org | sounds good, and thanks again for putting up with the craziness. It has been a week | 17:41 |
| @mnasiadka:matrix.org | Hopefully Anubis should get us more planned activities and less fire drills | 17:42 |
| @clarkb:matrix.org | ++ | 17:42 |
| @clarkb:matrix.org | thinking about apt-puppetlabs more: I'm wondering if we should just remove it? Do we know if it is used anywhere? Presumably it hasn't updated in some time if the gpg key expired | 17:57 |
| @mnasiadka:matrix.org | Clark: I commented on the Prometheus patch - sorry it took so long ;) | 17:58 |
| @clarkb:matrix.org | thanks! and I won't throw stones :) I understand we've all been super distracted and busy. As you said hopefully we've made a positive change towards keeping this under control going forward | 17:58 |
| @mnasiadka:matrix.org | Ok, enough for today - it's 8pm here :) | 17:59 |
| @clarkb:matrix.org | good night! | 17:59 |
| @mnasiadka:matrix.org | I see the mirror script is running now, so hopefully everything should be fine | 17:59 |
| @clarkb:matrix.org | yes it looks like it is doing what is expected of it. Finding new packages etc | 18:00 |
| @clarkb:matrix.org | fungi: I think I will rerun the mirror script by hand for both uca and ubuntu just to be sure that they are happy with back to back syncs. Then I'm not sure what to do about apt-puppetlabs | 18:00 |
| @fungicide:matrix.org | tkajinam likely knows if that mirror is still used | 18:00 |
| @fungicide:matrix.org | the apt-puppetlabs package mirror i mean | 18:01 |
| @clarkb:matrix.org | what I have done for apt-puppetlabs is clear vanished, delete unreferenced, then manually rm the dists/ and lists/ content for stretch, xenial, and bionic as documented in our docs. Then did the intermediate vos_release. Running the actual script fails as the key is expected | 18:01 |
| @clarkb:matrix.org | * what I have done for apt-puppetlabs is clear vanished, delete unreferenced, then manually rm the dists/ and lists/ content for stretch, xenial, and bionic as documented in our docs. Then did the intermediate vos\_release. Running the actual script fails as the key is expired | 18:01 |
| @clarkb:matrix.org | https://apt.puppetlabs.com/ lists a number of keys I'm not sure which is valid now so presumably we can have a change to update the key and it will automatically pick up from there. Or I can leave the lock held and we try to manually fix it then catch up with the system-config updates to match? | 18:02 |
| @clarkb:matrix.org | looks like Ubuntu is solving the different versions of rust problem by having 30 versions of rust available | 18:02 |
| @fungicide:matrix.org | i'll see if i can track it down | 18:04 |
| @clarkb:matrix.org | I want to say `/*stdin*\ : Read error (39) : premature end` is ok and expected? | 18:04 |
| @clarkb:matrix.org | we just got a few of those | 18:05 |
| @fungicide:matrix.org | yeah, those are common and benign afaik | 18:05 |
| @fungicide:matrix.org | unrelated, argh, system-config-run-gitea is probably going to time out on 983134 this time | 18:07 |
| @clarkb:matrix.org | maybe we direct enqueue it to the gate if that happens | 18:07 |
| @clarkb:matrix.org | I wonder if anubis is making things slower though | 18:07 |
| @clarkb:matrix.org | its an extra layer of processing for all the thousands of requests we make to create projects and so on | 18:08 |
| @clarkb:matrix.org | it is running testinfra tests it will be close I bet | 18:09 |
| @clarkb:matrix.org | fungi: it succeeded! | 18:12 |
| @clarkb:matrix.org | two changes are now in the gate. The other thing to note is after anubis is in place we should monitor the first project-config update that creates a new project or modifies one. This is well covered in the gate testing, but its worth making sure it is happy when we do one for the first time just due to how much editing around that was done | 18:14 |
| @fungicide:matrix.org | just barely in under the wire | 18:14 |
| @clarkb:matrix.org | reprepro must be in its validate all the things are reachable stage and it isn't fast nor does it log any info about its progress | 18:22 |
| @clarkb:matrix.org | oh we're swapping. Thats not great. Multiple overlapping reprepro runs will do that I guess | 18:25 |
| @clarkb:matrix.org | ubuntu-ports and debian both seem to have started while ubuntu was going | 18:25 |
| @clarkb:matrix.org | in theory it does things like this every few hours so its probably fine | 18:25 |
| @fungicide:matrix.org | yes, i usually refrain from manually running more than one in parallel when doing a large catch-up sync, just because also that's a lot of churn in afs | 18:26 |
| @clarkb:matrix.org | well I am only manually running one. But cron started the other two | 18:26 |
| @fungicide:matrix.org | aha, yes hopefully those are for things that won't take long | 18:27 |
| @fungicide:matrix.org | if the delta is small | 18:27 |
| @fungicide:matrix.org | i'm going to go grab a long-overdue lunch, but when i get back i'll try to track down the correct/newer apt-puppetlabs keys and we can try to reenqueue the anubis deploy buildset with the gitea backends out of the emergency file again? | 18:29 |
| @fungicide:matrix.org | (assuming those changes merge and don't have to get rechecked) | 18:29 |
| @clarkb:matrix.org | sounds like a plan | 18:29 |
| @fungicide:matrix.org | okay, back in a while | 18:30 |
| @clarkb:matrix.org | I'm going to continue to try and finish up the reprepro bionic cleanups for UCA and Ubuntu in the meantime | 18:30 |
| @clarkb:matrix.org | debian completed as did debian security so now it is just ubuntu and ubuntu ports | 18:38 |
| -@gerrit:opendev.org- Monty Taylor https://matrix.to/#/@mordred:inaugust.com proposed: [openstack/project-config] 983924: Add an OpenClaw plugin for Zuul integration https://review.opendev.org/c/openstack/project-config/+/983924 | 18:45 | |
| @clarkb:matrix.org | mordred: maybe your new plugin will be the guinea pig after we get the anubis config applied via ansible atop what I did manually | 18:45 |
| @mordred:waterwanders.com | Clark: I love being a guinea pig :) | 18:46 |
| @mordred:waterwanders.com | Clark: I'm guessing that's more about the zuul api interaction and less about the gerrit one, yeah? | 18:47 |
| @clarkb:matrix.org | its the gitea api interaction. We're in the middle of converting gitea to listen on http instead of https to make anubis simpler. I manually applied the entire anubis change to the giteas already due to a ddos, but the changes to apply to prod are happening soon (they are in the gate) | 18:48 |
| @clarkb:matrix.org | and all of the machinery to create projects and all that in gitea was using https now need to use http | 18:48 |
| @clarkb:matrix.org | there is good coverage of this in the gate too we actually create a bunch of empty projects based on the prod projects.yaml file so it should just work. but its good to confirm and your new project should do that for us | 18:49 |
| @clarkb:matrix.org | only ubuntu mirroring is running now. All the others appear to have succeeded so I don't think we got OOMKillered or anything | 19:24 |
| -@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 983134: Remove intermediate HTTPS layer for Gitea backends https://review.opendev.org/c/opendev/system-config/+/983134 | 19:30 | |
| @clarkb:matrix.org | ok here we go. The 6 backend nodes are in the emergency file and that should noop | 19:30 |
| @clarkb:matrix.org | looks like we did enqueue manage-projects which should also noop I think | 19:31 |
| @clarkb:matrix.org | yes the manage projects playbook respects disabled lists | 19:35 |
| @mnasiadka:matrix.org | I forgot to tell - had a look in the Resolute build issues and yes, /opt/dib_tmp is full when the job ends up running in rax dfw (and maybe also somewhere else) - any ideas what to do other than making the image smaller?:) | 19:35 |
| @clarkb:matrix.org | mnasiadka: we could restrict the builds to the nested virt labels. I think they all have larger disks without the extra ephemeral drive. But otherwise ya I think we're looking at optimizing the builds themselves. Maybe doing vhd conversion first if it isn't already first so that there isn't a qcow2 and a raw image already there while we do the two step vhd conversion | 19:37 |
| @clarkb:matrix.org | ok service-gitea succeeded very quickly and my tail of syslog on gitea09 showed no ansible access | 19:37 |
| @clarkb:matrix.org | I think we may actually be able to remove the nodes from the emergency file if manage-projects completes before the second change merges | 19:38 |
| @clarkb:matrix.org | fungi: I don't know if you are back yet, but I think I will do that if the timing works out. Just one less thing to wait for this way | 19:38 |
| @clarkb:matrix.org | yup deploy for the first change just succeeded and the second has not merged yet. I'll remove the hosts from emergency now | 19:38 |
| @clarkb:matrix.org | done | 19:39 |
| @clarkb:matrix.org | if the second change merges it should deploy normally | 19:39 |
| @clarkb:matrix.org | ubuntu reprepro completed and now the vos release is running | 19:39 |
| @clarkb:matrix.org | I've never worked in a kitchen but I imagine this feels similar. I've got vos release simmering over there. I'm waiting for the sauce someone else is cooking to finish (zuul and the anubis change) so that I can then plate something else I've already prepped. Except I bet its a lot harder and more physically exhausting in a kitchen | 19:42 |
| -@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 983061: Apply Anubis to the Gitea backend servers https://review.opendev.org/c/opendev/system-config/+/983061 | 19:43 | |
| @clarkb:matrix.org | ok here we go. I'm monitoring gitea09 | 19:43 |
| @clarkb:matrix.org | ansible has started running on 09 | 19:46 |
| @clarkb:matrix.org | gitea09 and 10 have both had their ansible runs. We reloaded apache on both implying the vhost file wasn't an exact match. I tested gitea09 via socks proxy through the load balancer and it seems to work so I don't think this is an issue | 19:49 |
| @clarkb:matrix.org | none of the containers were restarted which is what I expected since app.ini and the images shouldn't have changed | 19:49 |
| @clarkb:matrix.org | all that to say I think this is working as expected but I'll test 10 now then look at the vhost files | 19:49 |
| @clarkb:matrix.org | yup 10 works too so I don't think the apache reload is a problem | 19:50 |
| @clarkb:matrix.org | ok I don't see anything obviously different with the vhost. Maybe just whitespace? | 19:51 |
| @clarkb:matrix.org | once the job completes I'll test the other 4 backends. Then I want to start pushing up some followups that I've been thinking about | 19:51 |
| @clarkb:matrix.org | also `Released volume mirror.ubuntu successfully` this happened. i'm going to manually rerun UCA now then when that is done I'll rerun ubuntu | 19:52 |
| @clarkb:matrix.org | UCA is done and ubuntu is running now | 19:54 |
| @clarkb:matrix.org | https://zuul.opendev.org/t/openstack/buildset/aa4d885597f648e1a011322cfd83ed5b deployment reports success | 19:56 |
| @clarkb:matrix.org | all backends work when accessed via socks tunnel | 19:56 |
| @clarkb:matrix.org | and the frontend seems to work for me | 19:57 |
| @clarkb:matrix.org | all backends are up according to the haproxy show stats command as well | 19:57 |
| @fungicide:matrix.org | okay, catching back up now | 20:00 |
| @fungicide:matrix.org | and yeah, looks like i just missed the exciting non-event of the changes rolling out in deployment? | 20:01 |
| @fungicide:matrix.org | good deal | 20:01 |
| @clarkb:matrix.org | fungi: once this ubuntu rerun finishes and if it is successful I will unlog -cell openstack.org and kdestroy in the screen window 4 if that is safe and won't affect the mirroring stuff. `tokens` only reports one token so I think it is | 20:01 |
| @clarkb:matrix.org | yup it all seems to be working as expected according to my testing | 20:02 |
| @clarkb:matrix.org | the main thing we haven't checked is new project creation. Mordred has a new project proposal if we want to proceed with that now. Though I think I want to finish up with the mirror cleanups and then push some changes I have in mind so I don't forget | 20:02 |
| @clarkb:matrix.org | https://mirror.dfw.rax.opendev.org/ubuntu/dists/ but bionic is gone from here | 20:03 |
| @fungicide:matrix.org | the kitchen analogy is an apt one | 20:03 |
| @clarkb:matrix.org | looking at https://grafana.opendev.org/d/9871b26303/afs?orgId=1&from=now-6h&to=now&timezone=utc I think we freed around 200GB of disk | 20:03 |
| @clarkb:matrix.org | I've also been collecting terminal windows like pokemon | 20:05 |
| @fungicide:matrix.org | gotta catch 'em all | 20:09 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 983929: Pin anubis container image to v1.25.0 https://review.opendev.org/c/opendev/system-config/+/983929 | 20:11 | |
| @clarkb:matrix.org | ok change 1 consider this an RFC | 20:11 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 983930: Cleanup Apache UA filters https://review.opendev.org/c/opendev/system-config/+/983930 | 20:14 | |
| @clarkb:matrix.org | change 2 again I'm happy to consider this an RFC | 20:14 |
| @clarkb:matrix.org | I also don't think any of these are super urgent. I just want to have an easy reminder | 20:14 |
| @fungicide:matrix.org | i agree we could stand to compress and maybe clear out a lot of our ua filter | 20:15 |
| @fungicide:matrix.org | it's mostly a disorganized pile of rules added in haste | 20:15 |
| @clarkb:matrix.org | ya I did a compression and simplification at one point but it was nowhere near complete | 20:16 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 983931: Increase the gitea mysqld connection limit https://review.opendev.org/c/opendev/system-config/+/983931 | 20:18 | |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 983932: Limit the gitea haproxy connection limit down to 10k https://review.opendev.org/c/opendev/system-config/+/983932 | 20:22 | |
| @clarkb:matrix.org | ok I'm looking for feedback on all of those. I don't feel strongly about them landing at all or landing with the specific values chosen | 20:22 |
| @fungicide:matrix.org | so was it just the apt-puppetlabs repo that needed troubleshooting still? | 20:28 |
| @clarkb:matrix.org | fungi: yes | 20:29 |
| @clarkb:matrix.org | fungi: if you join the root screen on mirror update you can see what I did in window 1 and window 4 (though it's at the very beginning of window 4 scrollback) | 20:29 |
| @clarkb:matrix.org | I followed our documentation up to the point where you have to rerun the normal reprepro mirror script and that immediately failed with a key is expired error | 20:29 |
| @clarkb:matrix.org | you can see that error in the usual log file location | 20:29 |
| @clarkb:matrix.org | window 1 is also where I'm holding the lock | 20:30 |
| @clarkb:matrix.org | fungi: I am still holding the uca and ubuntu locks. ubuntu is still rerunning its second pass. When that second pass finishes I will close window 2 and 3 and drop the locks for them. Then I will unlog -cell openstack and kdestroy in window 4 to undo my auth there | 20:31 |
| @clarkb:matrix.org | but I can keep windows 0, 1, and 4 open if it helps with apt-puppetlabs debugging (and it will hold the lock for apt-puppetlabs) | 20:31 |
| @fungicide:matrix.org | i think to avoid confusion, i'll wait until you clean up, feel free to exit the screen session and i'll hold a new lock under a fresh one just for the puppetlabs mirror | 20:44 |
| @clarkb:matrix.org | ok that works for me | 20:45 |
| @clarkb:matrix.org | still waiting for ubuntu to finish the second pass | 20:45 |
| @fungicide:matrix.org | i mainly just need to check the signature verification error to see what key fingerprint it sees being used, find and double-check the official full copy of that key, maybe hand-apply and test it quickly on the server, then push a change up to add it once i'm sure it's what we need | 20:45 |
| @clarkb:matrix.org | mnasiadka: so I'm looking at mirror03.ord.rax.opendev.org IP addresses lgtm. I thought that there was no volume attached yet and checked via openstack volume list (which showed it). Then got really confused. Then realized that I was doing mount | grep mapper and cat /etc/fstab on mirror-update03. Notice that mirror03 and mirror-update03 look similar to tired eyes :) anyway that was my fault not yours. I think your changes look good and I'll +2 them once I confirm the changes themselves and not just the server | 20:48 |
| @clarkb:matrix.org | actually just found a small but important issue with the dns change so I -1'd that one. But it's an easy fix | 20:52 |
| @fungicide:matrix.org | checking the end of /var/log/reprepro/apt-puppetlabs.log i see `VerifyRelease condition '9E61EF26' lists expired key '4528B6CD9E61EF26'.` so it's possible we're mirroring packages from an index signed by a key that's now expired and we merely need to do what's on the tin | 20:55 |
| @clarkb:matrix.org | `com.rackspace.servermill.failed_reason='auth_failure', rax_service_level_automation='Build Error'` I also see this against server show mirror03.ord.rax.opendev.org. This is a mirror so I'm not super concerned about proceeding with it. But figured I'd mention it | 20:56 |
| @clarkb:matrix.org | the server is up and running so it didn't completely fail | 20:56 |
| @clarkb:matrix.org | I just remembered that we're upgrading gerrit on sunday | 20:57 |
| @clarkb:matrix.org | I'm like 95% certain the prep work I've done is fairly complete so I don't think that changes after the rest of this week. More of a "oh ya I have to do that too" | 20:57 |
| @clarkb:matrix.org | this reprepro export is taking longer than the last one. I think that has to do with the reprepro for ubuntu ports overlapping at this point in the process? | 21:04 |
| @clarkb:matrix.org | I will attempt to practice patience | 21:04 |
| @fungicide:matrix.org | likely. if i need to clock out before it completes i can pick up apt-puppetlabs in the morning too | 21:04 |
| @clarkb:matrix.org | fungi: should I leave it locked in that case? | 21:05 |
| @clarkb:matrix.org | I guess it's going to noop each time it runs anyway so probably fine to unlock | 21:05 |
| @fungicide:matrix.org | it's fine to let it go back to running in the meantime | 21:06 |
| @fungicide:matrix.org | it's also a very small repository so doesn't run for long | 21:06 |
| @fungicide:matrix.org | i think puppetlabs didn't get the memo that gpg signatures are meant to verify the contents of your package repository haven't been tampered with, so serving that as https://apt.puppetlabs.com/pubkey.gpg is sort of silly | 21:11 |
| @fungicide:matrix.org | but easy to find i guess | 21:11 |
| @clarkb:matrix.org | fungi: if you are still near that screen session: does the klist output show only my principal and then the tickets it has? or are those service principals for mirror/reprepro operations? | 21:14 |
| @clarkb:matrix.org | (if you know) | 21:14 |
| @clarkb:matrix.org | my concern is that if I kdestroy I might impact some running process on the mirror updater | 21:15 |
| @fungicide:matrix.org | i really don't know. i guess you could wait ~20 hours until they expire | 21:16 |
| @clarkb:matrix.org | ya if I was smart I would've su'd back to my personal user | 21:16 |
| @clarkb:matrix.org | so that there was a clear distinction in ownership of credentials | 21:16 |
| @fungicide:matrix.org | the `Default principal:` lists your admin account, so i think those are just you | 21:17 |
| @fungicide:matrix.org | should be safe? | 21:17 |
| @clarkb:matrix.org | yes I think it is saying that default principal has these two service principals tickets? | 21:17 |
| @clarkb:matrix.org | the first I assume is from kinit and the second from aklog | 21:17 |
| @fungicide:matrix.org | that's how i interpret it, but even my beard is not grey enough to have 100% confidence in kerberos matters | 21:17 |
| @clarkb:matrix.org | actually man klist says you can pass -k to look at specific keytabs and since we use keytabs for the mirror operations I think this is a correct interpretation | 21:18 |
| @fungicide:matrix.org | worst case, some mirrors stop updating (again), which is unlikely but also not the end of the world | 21:18 |
| @clarkb:matrix.org | so it should be safe to unlog -cell openstack.org and then kdestroy. I'll run klist after unlogging to see if that second item goes away too | 21:18 |
| @clarkb:matrix.org | and I'll wait for reprepro to finish up so there is nothing running when I do that to be extra safe | 21:19 |
| @clarkb:matrix.org | ok it's done | 21:37 |
| @clarkb:matrix.org | but ubuntu-ports is running so I'll wait for that before I do the unlog and kdestroy | 21:37 |
| @fungicide:matrix.org | cool | 21:38 |
| @clarkb:matrix.org | fungi: ok I've dropped my flocks and closed the screen | 21:48 |
| @clarkb:matrix.org | after doing the unlog and kdestroy so hopefully everything is happy after. I think it will be | 21:48 |
| @fungicide:matrix.org | thanks! i was following along | 21:48 |
| @clarkb:matrix.org | I captured the output of tokens and klist locally just in case that becomes useful later | 21:49 |
| @clarkb:matrix.org | but it's mostly about the timestamps I think | 21:49 |
| @fungicide:matrix.org | i've checked the apt-puppetlabs upstream repo and there's a signing key from last year with a different fingerprint. sadly they don't seem to bother cross-signing keys so provenance is an issue | 21:49 |
| @clarkb:matrix.org | https://mirror.dfw.rax.opendev.org/apt-puppetlabs/timestamp.txt this is likely when the key expired I bet (a year ago) | 21:50 |
| @clarkb:matrix.org | given it has been a year I wonder if we can just drop the mirror entirely | 21:50 |
| @fungicide:matrix.org | we've got it configured to mirror focal puppet5 which no longer exists at https://apt.puppetlabs.com/dists/focal/ | 21:51 |
| @clarkb:matrix.org | fwiw I checked system-config and we pull from upstream in the few places we use puppet | 21:51 |
| @clarkb:matrix.org | ya in system-config we pull packages from their archive because things apparently go out of the apt repo | 21:51 |
| @fungicide:matrix.org | https://apt.puppetlabs.com/dists/focal/InRelease seems to be signed with 0x4528B6CD9E61EF26 which is the key in https://apt.puppetlabs.com/pubkey.gpg | 21:52 |
| @fungicide:matrix.org | so i think if we just update to that it'll solve the key error | 21:53 |
| @clarkb:matrix.org | then we'll find the next error :) but that is still progress | 21:53 |
| @fungicide:matrix.org | right, the next error i expect is no puppet5 index for focal | 21:53 |
| @clarkb:matrix.org | fwiw my ssh keys have aged out which is my primary "you've worked a long day" signal. Why are you still around? (I mean that in a good way we should both probably go get some fresh air or something) | 21:54 |
| @fungicide:matrix.org | oh, yes i can pick this up again tomorrow. great reminder | 21:54 |
| @fungicide:matrix.org | also if tkajinam happens to be around later, maybe he can tell us to just delete the whole thing and stop caring about it ;) | 21:55 |
| @clarkb:matrix.org | that would be an excellent outcome | 21:55 |
| @fungicide:matrix.org | i do enjoy deleting things a lot more than fixing them | 21:55 |
| @clarkb:matrix.org | fwiw grafana seems to show opendev.org is still happy and I can still browse it from here | 21:55 |
| @clarkb:matrix.org | I do think we should consider when/how we want to add a new project to gerrit and gitea. Doing that sooner than later is probably a good idea | 21:56 |
| @clarkb:matrix.org | and then I'm hoping I can review gerrit upgrade things tomorrow and feel prepared for sunday | 21:56 |
| @fungicide:matrix.org | did we not approve mordred's new openclaw gerrit plugin project yet? | 21:56 |
| @fungicide:matrix.org | a lighter weight intermediate test might be approving a gerrit acl change like https://review.opendev.org/981924 since that'll still run manage-projects | 21:57 |
| @fungicide:matrix.org | though the gitea side of that should be all noop | 21:58 |
| @clarkb:matrix.org | that's a good point if we run the gitea side of manage projects in a noop fashion against an acl update that gives us an early signal if there are any errors | 21:58 |
| @clarkb:matrix.org | then we can followup by landing mordred's new project | 21:59 |
| @clarkb:matrix.org | I mean this should all work as we test it extensively in CI, but it's a big change we just made so good to pay attention | 21:59 |
| @clarkb:matrix.org | corvus: btw we now have ~300GB ish of space in the mirror.ubuntu volume if we want to start working on resolute mirroring. Maybe bump up that quota a bit just to be safe then adjust back when we see how big resolute is? | 22:16 |
| @clarkb:matrix.org | that is 300GB of free space according to our quota | 22:17 |
| @clarkb:matrix.org | and mnasiadka confirmed the issue building resolute is with disk space so we may need to figure out if the images are unnecessarily large or if we can optimize the builds one way or another etc | 22:18 |
| @jim:acmegating.com | ok i was just looking at that, and i guess our dstat graph isn't telling the whole story | 22:19 |
| @fungicide:matrix.org | i think we should increase the quota by at least 50gb | 22:19 |
| @jim:acmegating.com | maybe it's measuring the wrong disk? | 22:19 |
| @jim:acmegating.com | because it sure looked like there was space at the end of the failing jobs | 22:19 |
| @clarkb:matrix.org | yes on rax it would be / on one disk and /opt on another | 22:19 |
| @clarkb:matrix.org | so maybe a mixup between those two? | 22:20 |
| @jim:acmegating.com | yeah -- maybe dstat is either recording only / and not opt, or it's recording the sum of the two, so we still can't see that opt is full | 22:20 |
| @fungicide:matrix.org | the mirror.ubuntu volume was running at something like 97% or 98% full before bionic removal started, so i fully expect resolute to exceed our available capacity since successive releases seem to only ever get larger, not smaller | 22:20 |
| @fungicide:matrix.org | also we're going to need to substantially increase quota for mirror.ubuntu-ports if we're also going to mirror resolute for arm | 22:21 |
| @clarkb:matrix.org | we're already floating around 80% capacity with bionic removed and no resolute too | 22:21 |
| @fungicide:matrix.org | since that's close to full even long after dropping the bionic mirror there | 22:21 |
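
The dstat discussion above (a graph that may record only `/`, or the sum of `/` and `/opt`) can be illustrated with a small sketch. It is not taken from the job or from dstat itself; the `summarize` helper and the sizes below are hypothetical, chosen only to show how a combined figure can look healthy while `/opt` is completely full.

```python
from typing import Dict, Tuple


def percent_used(used: int, total: int) -> float:
    """Return used space as a percentage of total space."""
    return 100.0 * used / total


def summarize(mounts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Report percent used per mount plus a 'combined' figure.

    `mounts` maps a mount point to (used, total) in the same unit.
    The 'combined' entry is what a graph summing all disks would show.
    """
    report = {path: percent_used(used, total)
              for path, (used, total) in mounts.items()}
    total_used = sum(used for used, _ in mounts.values())
    total_size = sum(total for _, total in mounts.values())
    report["combined"] = percent_used(total_used, total_size)
    return report


# Hypothetical sizes in GB: a roomy root disk and a full /opt disk.
example = {"/": (20, 160), "/opt": (40, 40)}
report = summarize(example)
for name, pct in report.items():
    print(f"{name}: {pct:.1f}% used")
```

On a live node the same per-mount numbers could be gathered with `shutil.disk_usage(path)` from the Python standard library, called once for `/` and once for `/opt`, rather than relying on a single aggregate graph.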