| @harbott.osism.tech:regio.chat | Clark: fungi please check https://review.opendev.org/c/openstack/project-config/+/978566 when you have a moment, I think this is ready to get tested for real now | 06:58 |
|---|---|---|
| -@gerrit:opendev.org- Kai Liu proposed: [zuul/zuul-jobs] 984689: Start zuul_console in prepare-workspace-{git,openshift} https://review.opendev.org/c/zuul/zuul-jobs/+/984689 | 07:15 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984694: Remove mirror03.gra1.ovh from configuration https://review.opendev.org/c/opendev/system-config/+/984694 | 07:54 | |
| -@gerrit:opendev.org- Kai Liu proposed: [zuul/zuul-jobs] 984695: Fix missing {{ }} in remove-registry-tag role https://review.opendev.org/c/zuul/zuul-jobs/+/984695 | 07:59 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984694: Remove mirror03.gra1.ovh from configuration https://review.opendev.org/c/opendev/system-config/+/984694 | 08:16 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984698: Remove mirror02.ord.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984698 | 08:20 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984702: Remove mirror02.dfw.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984702 | 08:40 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/zone-opendev.org] 984738: Remove mirror03.gra1, mirror02.ord and mirror02.dfw from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/984738 | 12:40 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984694: Remove mirror03.gra1.ovh from configuration https://review.opendev.org/c/opendev/system-config/+/984694 | 12:40 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984698: Remove mirror02.ord.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984698 | 12:41 | |
| -@gerrit:opendev.org- Michal Nasiadka proposed: | 12:41 | |
| - [opendev/system-config] 984698: Remove mirror02.ord.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984698 | ||
| - [opendev/system-config] 984702: Remove mirror02.dfw.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984702 | ||
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 984702: Remove mirror02.dfw.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984702 | 12:41 | |
| @mnasiadka:matrix.org | Ok, the whole set is ready for reviews | 12:41 |
| @lchams:matrix.org | Hi, I hope this is the right place to send this message. I think the recent update to Gerrit may have broken things for my account. I can no longer auth with git review via ssh. I've tested from multiple machines, it always denies my key. Could anyone help me with this please? | 13:29 |
| @fungicide:matrix.org | Leonie Chamberlin-Medd: can you let us know the error message? you might also add a `-v` on the command line to get more detail | 14:08 |
| @lchams:matrix.org | All here: https://paste.opendev.org/show/bzSg0Ok3cHdKAcToBRev/ | 14:17 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [openstack/project-config] 978566: propose-updates: Add pcu target https://review.opendev.org/c/openstack/project-config/+/978566 | 14:18 | |
| @fungicide:matrix.org | Leonie Chamberlin-Medd: in the gerrit sshd log i'm seeing a "user not found" error for the username you're supplying | 14:32 |
| @fungicide:matrix.org | i'll see if i can figure out what's up with that | 14:32 |
| @fungicide:matrix.org | Leonie Chamberlin-Medd: when was the last time it worked for you? i'm not seeing any successful ligins with that username for at least a month | 14:34 |
| @fungicide:matrix.org | s/ligins/logins/ | 14:34 |
| @harbott.osism.tech:regio.chat | not sure if this is a general zuul issue or just our database being slow: https://zuul.opendev.org/t/openstack/builds?project=openstack%2Fpython-cinderclient&project=openstack%2Fpython-brick-cinderclient-ext&pipeline=periodic-weekly&skip=0&limit=10 works fine for me, but having a larger limit the results page spins forever (limited by my patience) without showing results | 14:39 |
| @clarkb:matrix.org | fungi: Leonie Chamberlin-Medd https://review.opendev.org/c/opendev/sandbox/+/984731 this was pushed today. Not sure if over ssh or https though | 14:49 |
| @lchams:matrix.org | Interesting. I was using it absolutely fine last week on Wednesday. With LChams as the username right? | 14:49 |
| @lchams:matrix.org | Yeah that was me testing https so I could keep submitting patches | 14:50 |
| @clarkb:matrix.org | fungi: Leonie Chamberlin-Medd LChams shows up in the sshd_log from the 8th | 14:53 |
| @clarkb:matrix.org | and it looks like successful attempts. I would expect to see the unsuccessful attempts in today's log but don't | 14:54 |
| @clarkb:matrix.org | review.openstack.org appears to point at the correct location in DNS so that isn't the problem | 14:54 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: can you go to https://review.opendev.org/settings/ and confirm that Username: says LChams? | 14:55 |
| @clarkb:matrix.org | and https://review.opendev.org/settings/#SSHKeys shows an expected key? I'm just trying to rule things out while we sort out what the problem could be | 14:55 |
| @lchams:matrix.org | Yep LChams | 14:55 |
| @lchams:matrix.org | Yeah I even tried generating a fresh key but nothing | 14:56 |
| @clarkb:matrix.org | ok. I think the next step is to try ssh directly maybe you're on a system that can't negotiate an ssh connection with the updated MINA SSHD for some reason. If you run `ssh -p 29418 LChams@review.opendev.org gerrit ls-projects` this is a command that should list all of the projects via ssh and can be used to test things with fewer tools in the way. If that doesn't work you can add -v up to -vvv to get more ssh client debug output to see what the issue might be from that side. Then I'm hoping that will also create something in the sshd_log I can check | 14:57 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: nevermind I think i found the issue | 14:59 |
| @clarkb:matrix.org | it doesn't like your key | 15:00 |
| @clarkb:matrix.org | one second while I sanitize the log and share it | 15:00 |
| @clarkb:matrix.org | `failed (ExecutionError) to consult delegate for ssh-ed25519 key=SHA256:$KEYHASHHERE: java.lang.NoClassDefFoundError: net/i2p/crypto/eddsa/EdDSAPublicKey` | 15:00 |
| @clarkb:matrix.org | the only key I have in gerrit is also an ed25519 key so ed25519 generally works | 15:01 |
| @clarkb:matrix.org | but let me triple check that by using my key really quickly | 15:01 |
| @clarkb:matrix.org | yes I can use my ed25519 key against gerrit just fine (I used ssh-add -c to get a confirmation prompt then confirmed the hash/name matches what ssh-add -l lists as an ed25519 key) | 15:03 |
| @clarkb:matrix.org | are you using one of the newer -sk keys maybe? | 15:03 |
| @clarkb:matrix.org | my hunch here is that the key you supplied isn't strictly an ed25519 key (which is a subset of eddsa) and so MINA is looking for a generic eddsa handler and can't find it | 15:04 |
| @clarkb:matrix.org | possibly because your ssh client is supplying it as eddsa not ed25519 due to the difference | 15:05 |
| @clarkb:matrix.org | the example ssh command I gave above with an extra -vvv may help illustrate that case if it is the issue | 15:05 |
| @fungicide:matrix.org | Clark: Leonie Chamberlin-Medd: oh, sorry, i was going off the error from the paste which showed you trying to log in as user `leonie` and there were a bunch of errors in gerrit's `sshd_log` between 10:17 and 14:11 today saying `user-not-found` for that username | 15:07 |
| @clarkb:matrix.org | fungi: yup then git review prompts for the user name after probing the local machine username and that also failed | 15:08 |
| @clarkb:matrix.org | but oddly sshd_log doesn't get the errors only error_log does | 15:08 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/system-config] 984795: Add a script to test user agent patterns https://review.opendev.org/c/opendev/system-config/+/984795 | 15:10 | |
| @lchams:matrix.org | https://paste.opendev.org/show/bAzQ5uFBuKt8LqGOp5Zn/ | 15:11 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: that is a different key than the one that was failing before but the client hash and the server hash match so it is that key that is failing | 15:14 |
| @clarkb:matrix.org | and to be clear it is still failing the same way as the prior key | 15:15 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/system-config] 984796: Remove a rewrite rule that matches googlebot https://review.opendev.org/c/opendev/system-config/+/984796 | 15:16 | |
| @clarkb:matrix.org | let me rerun my test with my ED25519 key and see if there is a behavior delta | 15:16 |
| @clarkb:matrix.org | in the ssh -vvv output I mean | 15:16 |
| @jim:acmegating.com | Clark: fungi ^ i think that rule is blocking crawlers from our static sites | 15:16 |
| @jim:acmegating.com | Clark: fungi note it has a parent change with a script that can help us find things like this | 15:17 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: the only difference I see during that part of the negotiation is `explicit` vs `agent` but I think that just has to do with where the key lives | 15:18 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: how is the key generated? can you try rsa and see if that has the same problem? | 15:19 |
| @clarkb:matrix.org | corvus: I've approved both | 15:20 |
| @fungicide:matrix.org | it's possible the mina-sshd version changed between gerrit 3.11 and 3.12, and they dropped support for some older key options or regressed on something related to key handling | 15:21 |
| @clarkb:matrix.org | fungi: yes MINA did update. This new version should be post quantum ready for example | 15:21 |
| @clarkb:matrix.org | I'm about to generate a new ed25519 key and test it | 15:22 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 984800: Follow up for I170bdb3ffc89bc3307a08da7cd4c7f2793a4e491 https://review.opendev.org/c/openstack/project-config/+/984800 | 15:22 | |
| @clarkb:matrix.org | since the key I am using is a couple years old | 15:22 |
| @mnasiadka:matrix.org | fungi: noticed a flaw in job config in 984800 - and I'll followup with a mechanism to install pip-check-updates only in jobs that use it | 15:22 |
| @fungicide:matrix.org | i'm popping out for a lunch errand, back in an hourish | 15:23 |
| @lchams:matrix.org | Generated with ssh-keygen -t ed25519. Just tried with an rsa key and am seeing the same issue | 15:23 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: ok is that with openssh? and what version of openssh? | 15:24 |
| @harbott.osism.tech:regio.chat | seems https://docs.openstack.org/ is down? | 15:24 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 984804: Install pip-check-updates only in jobs requiring that https://review.opendev.org/c/openstack/project-config/+/984804 | 15:25 | |
| @lchams:matrix.org | Sorry yes. When on powershell ssh -V gives OpenSSH_for_Windows_9.5p2, LibreSSL 3.8.2 and then on wsl ubuntu OpenSSH_9.6p1 Ubuntu-3ubuntu13.15, OpenSSL 3.0.13 30 Jan 2024 | 15:27 |
| @clarkb:matrix.org | looks like even with -i ssh will prefer keys in my agent so this is taking longer than I had hoped. | 15:27 |
| @mnasiadka:matrix.org | Clark: With ssh-agent I usually end up with that mix that works: -o "IdentitiesOnly yes" -o "IdentityFile path_to_the_public_key" | 15:29 |
| @mnasiadka:matrix.org | (or killing ssh-agent or removing keys from ssh-agent) | 15:29 |
| @clarkb:matrix.org | yup I ended up doing `ssh-add -D` and then rerunning. I cannot reproduce. I generated a new key with `ssh-keygen -t ed25519 -f test-ed25519` and then ssh -i with that key works here. The explicit vs agent thing does seem to be as I assumed, as switching to the new key not in an agent reports explicit rather than agent. | 15:31 |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [opendev/system-config] 984795: Add a script to test user agent patterns https://review.opendev.org/c/opendev/system-config/+/984795 | 15:31 | |
| @clarkb:matrix.org | OpenSSH_10.2p1, OpenSSL 3.5.3 is my local ssh version running on opensuse tumbleweed | 15:31 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: did both of those openssh versions generate a non working key for you? | 15:32 |
| @clarkb:matrix.org | I think there must be something either in the key generation process or the client handling of the key that is not working with MINA SSHD | 15:33 |
| @clarkb:matrix.org | since both my old and new ed25519 keys work I don't think this is a problem with ed25519. You also reported that rsa has the same problem which is more evidence that the issue is maybe not type specific but more client specific for some reason | 15:34 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: could it be an encoding issue maybe? Does clicking the "Click To View" button at https://review.opendev.org/settings/#SSHKeys work for the pubkey? | 15:37 |
| @clarkb:matrix.org | I don't know if they do any validation of that content when rendering it. But if they do its possible you may get an error there? | 15:38 |
| @clarkb:matrix.org | Jens Harbott: it does look like we're running a full complement of apache servers there again | 15:38 |
| @lchams:matrix.org | Yeah. Don't really use powershell but also from my vm (OpenSSH_9.6p1 Ubuntu-3ubuntu13.14, OpenSSL 3.0.13 30 Jan 2024) the keys aren't working. | 15:40 |
| Click to view on the ssh section works fine | ||
| @clarkb:matrix.org | Leonie Chamberlin-Medd: did you try RSA? I don't see a recent failure with rsa | 15:45 |
| @clarkb:matrix.org | ya if I grep -v eddsa I get no returns for failed ssh attempts | 15:46 |
| -@gerrit:opendev.org- Michal Nasiadka proposed: [openstack/project-config] 984804: Install pip-check-updates only in jobs requiring that https://review.opendev.org/c/openstack/project-config/+/984804 | 15:48 | |
| @lchams:matrix.org | Just tried again with rsa, can you see the fail? | 15:51 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: yes, I see it but it reports the same error. As if the key is actually an ed25519/eddsa key | 15:52 |
| @clarkb:matrix.org | sha256sums should be the same length for the two key types right so I can't infer anything from that | 15:53 |
| @clarkb:matrix.org | I'm going to start reading the Gerrit source code now I guess | 15:54 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: `WARN com.google.gerrit.sshd.CachingPublicKeyAuthenticator` is the source of the issue and now I'm also wondering if the problem could be in whatever is caching the keys? Do you have other keys defined maybe and it is failing on them? Like maybe you added an -sk key at one time and it's looking through your keys and failing on that one before it can get to the others? | 15:56 |
| @lchams:matrix.org | Interesting. It's definitely an rsa key. | 15:58 |
| @clarkb:matrix.org | ya that is why I'm wondering if it is looking at an entirely different key in this process | 15:58 |
| @lchams:matrix.org | We added my coworker's key to my account. We can see it fails with my username LChams but works with his, mattcrees https://paste.opendev.org/show/bWL3LcQjPtltoii24mrR/ | 15:59 |
| Same ssh key in both | ||
| @clarkb:matrix.org | eg you say "here's my rsa key" then it goes through its cache of keys looking for it and if it hits a broken key before hand then short circuits? | 15:59 |
| @clarkb:matrix.org | ok, that doesn't rule out the client or the server. But probably does rule out problems with key generation | 15:59 |
| @clarkb:matrix.org | I think https://github.com/apache/mina-sshd/blob/sshd-2.15.0/sshd-core/src/main/java/org/apache/sshd/server/auth/pubkey/CachingPublicKeyAuthenticator.java#L57-L64 is generating the warning message in the logs. I think authenticate() is this: https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.12.6/java/com/google/gerrit/sshd/DatabasePubKeyAuth.java#111 | 16:10 |
| @lchams:matrix.org | I do have a -sk key, but I think it shouldn't be offering it as I'm passing -c core.sshCommand="ssh -i ~/.ssh/opendev-ssh-key" I only see it offer the one key in the verbose output. I've tried killing the ssh agent in case it was caching anything, but this didn't help | 16:19 |
| I've also just tried deleting all keys from Gerrit and adding a new ed25519 key but am still seeing issues | ||
| @clarkb:matrix.org | Leonie Chamberlin-Medd: ok I was asking questions upstream and they thought that maybe if you had -sk keys available locally it could cause this problem. I agree it seems from the -vvv output that your client is only offering what appears to be a normal ed25519 key | 16:20 |
| @clarkb:matrix.org | did you ever add that -sk key to gerrit? | 16:20 |
| @clarkb:matrix.org | I'm beginning to suspect: https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.12.6/java/com/google/gerrit/sshd/DatabasePubKeyAuth.java#128 is where things may be crashing if you did | 16:20 |
| @lchams:matrix.org | Yes it was added until I removed it a few mins ago | 16:21 |
| @clarkb:matrix.org | ok I think my theory then is that having a key type that Gerrit cannot handle may cause the authentication process to short circuit around line 128 linked above if that key is checked before any other valid keys | 16:22 |
| @clarkb:matrix.org | now if you've removed that key from your account but the problem is persisting the next question becomes how do we clear the cache for you | 16:22 |
| @clarkb:matrix.org | I think I can run the `gerrit flush-caches sshkeys` command and flush it for everyone | 16:26 |
| @lchams:matrix.org | Just got my coworker to try it and it works on his ubuntu machine now. I do need to head off, thank you for all your help today and we'll try and get it working on our end again soon. Must be an issue with the sk key | 16:26 |
| @clarkb:matrix.org | Leonie Chamberlin-Medd: try what? | 16:26 |
| @lchams:matrix.org | we uploaded my coworker's ssh key on my gerrit account and he could run git review -s just fine | 16:27 |
| @clarkb:matrix.org | oh interesting so something with your client then? | 16:27 |
| @lchams:matrix.org | wasn't working before so deleting the sk key seems to be the fix | 16:28 |
| @clarkb:matrix.org | got it | 16:28 |
| @lchams:matrix.org | Thank you for your help! :) | 16:28 |
| @clarkb:matrix.org | to be clear I won't flush keys given that it seems to be working now | 16:29 |
| @clarkb:matrix.org | feel free to check back in later if you find new issues. But to summarize we suspect that having a -sk key within the user account caused gerrit to short circuit in key cache lookups when trying to find a matching key during authentication | 16:29 |
| @clarkb:matrix.org | removing the -sk key seems to have resolved the problem | 16:29 |
| @lchams:matrix.org | Thanks again! | 16:30 |
| @clarkb:matrix.org | Jens Harbott: fungi re static/docs.openstack.org the mod security db is 2.6GB large. I think we should drop mod security from all of the vhosts now and restart apache and see if we can keep up better in that situation? | 16:41 |
| @clarkb:matrix.org | I worry that the large db is causing requests that go through mod security to do large amounts of work just to look up IPs and that is slowing requests down which causes apache workers to be used for longer | 16:41 |
| @clarkb:matrix.org | then we eventually run out of workers | 16:42 |
| @clarkb:matrix.org | but I want to defer to fungi on that in particular as fungi has been dealing with the static stuff more than anyone else I think | 16:44 |
| @clarkb:matrix.org | oh we still have mod security enabled for openstack docs too | 16:47 |
| @clarkb:matrix.org | and its rules are non trivial I suspect that is why we're slow again. I wonder why this was better on the other server? Maybe just less contention and dedicated resources/workers? | 16:47 |
| @fungicide:matrix.org | it seems to have recovered for the moment? | 16:48 |
| @fungicide:matrix.org | but yes, i'll push a change to clean up those rules | 16:48 |
| @clarkb:matrix.org | fungi: well but also the implication is that things are hitting those rules since the db is growing | 16:51 |
| @clarkb:matrix.org | but I guess it has to check the db on every request to determine if something is blocked | 16:51 |
| @clarkb:matrix.org | and this server is bigger than the old one that had issues that led to the mod security rules. So ya maybe we reset and evaluate with new valid info based on what we see after cleanups | 16:52 |
| @fungicide:matrix.org | that makes sense, yes | 17:05 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 984141: Add prepare-repos role https://review.opendev.org/c/zuul/zuul-jobs/+/984141 | 17:08 | |
| -@gerrit:opendev.org- Ron Stone proposed: [openstack/project-config] 984830: Update StarlingX docs promote job for R12 release https://review.opendev.org/c/openstack/project-config/+/984830 | 17:15 | |
| @clarkb:matrix.org | mnasiadka: we're removing nodes from inventory then from DNS? (just making sure I'm following the process properly) | 17:31 |
| @mnasiadka:matrix.org | Clark: we can do it in other order if you prefer that, we don’t use DNS names in Ansible inventory - but currently the DNS change depends on the system-config stack | 17:36 |
| @clarkb:matrix.org | no I think this is fine. I'm just getting my bearings it has been a busy morning | 17:36 |
| @clarkb:matrix.org | all three system-config changes lgtm. I'm not sure if anyone else will review those today (feels like everyone is busy). Maybe I single core approve them after lunch if no one else reviews them and I don't have new fires to fight | 17:37 |
| @clarkb:matrix.org | This is interesting: docs.opendev.org seems to be quite responsive right now and the total apache process count is down below the limit. But system load is high | 17:46 |
| @clarkb:matrix.org | so not sure if this is the leading edge of things that are going to break shortly or we're just seeing different behaviors from different types of crawlers? | 17:46 |
| @clarkb:matrix.org | ok ya we are right back to the process limit | 17:46 |
| @clarkb:matrix.org | and its slower but still responsive | 17:47 |
| @clarkb:matrix.org | and load is falling back off again (but still really high) | 17:47 |
| @fungicide:matrix.org | working on the config change for that now, sorry for the delay | 17:49 |
| @clarkb:matrix.org | load like that may be explained by the mod security db I suppose? I guess I'm trying to determine if we should change approach here | 17:50 |
| @clarkb:matrix.org | but I don't think so | 17:50 |
| @jim:acmegating.com | are we talking about https://review.opendev.org/981160 ? | 17:51 |
| @fungicide:matrix.org | oh, looks like we are | 17:52 |
| @fungicide:matrix.org | thanks for reminding me i'd already written this once | 17:52 |
| @clarkb:matrix.org | yes except that change won't disable them as the mod is enabled. I think we also need to disable the mod? | 17:52 |
| @fungicide:matrix.org | it drops all the additional secrules from the vhost though | 17:53 |
| @jim:acmegating.com | yeah, it looks like that change will remove the rules that add new ips to the honeypot, but we will still check. but eventually, the list will be empty and not grow any more. as that right? | 17:53 |
| @clarkb:matrix.org | oh yup for docs.openstack.org specifically | 17:53 |
| @jim:acmegating.com | * yeah, it looks like that change will remove the rules that add new ips to the honeypot, but we will still check. but eventually, the list will be empty and not grow any more. is that right? | 17:53 |
| @clarkb:matrix.org | corvus: yes I think if we clear the 2.6GB db as part of applying this change that would be the case | 17:53 |
| @clarkb:matrix.org | so that seems like a reasonable next step even without disabling the mod security module | 17:54 |
| @jim:acmegating.com | at that point, are we using mod_security for anything? should we disable it in a followup to that change? | 17:54 |
| @clarkb:matrix.org | corvus: I think we should disable it as a followup yes | 17:54 |
| @clarkb:matrix.org | it was an experiment that our experimentation has discovered has a fatal flaw and looking for other tools seems appropriate at this point | 17:54 |
| @fungicide:matrix.org | looks like that change is not even a month old. what does it say about the past month that it's erased my etch-a-sketch? | 17:54 |
| @clarkb:matrix.org | fungi: should we go ahead and approve https://review.opendev.org/c/opendev/system-config/+/981160 now or was there something you wanted to modify in it? | 17:55 |
| @jim:acmegating.com | fungi: haha -- all these bot fighting days run together | 17:55 |
| @fungicide:matrix.org | i'm still good with it as long as there's no merge conflict | 17:55 |
| @clarkb:matrix.org | I have approved it | 17:59 |
| @clarkb:matrix.org | Gerrit doesn't report a merge conflict and zuul should report one quickly if there is one | 17:59 |
| -@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 981160: Drop custom WAF rules from docs.openstack.org https://review.opendev.org/c/opendev/system-config/+/981160 | 18:14 | |
| @fungicide:matrix.org | load average on static03 is still falling, back under 4.0 now | 18:24 |
| @fungicide:matrix.org | and there are still crawlers hitting static04 even almost a day after we moved dns away from it | 18:24 |
| @jim:acmegating.com | just for situational awareness -- the googlebot cleanup patch hit a post-run failure so i rechecked it. based on the jobs, looks like this affects lists and gitea too. | 18:30 |
| @fungicide:matrix.org | yes, they all use the same ua filter | 18:31 |
| @jim:acmegating.com | i don't see a job for static, actually -- should we add the user agents file to the static job? | 18:31 |
| @fungicide:matrix.org | probably yes | 18:31 |
| @fungicide:matrix.org | Clark: ^ > | 18:31 |
| @fungicide:matrix.org | ? | 18:31 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/system-config] 984848: Add the ua-filter to static job file matcher https://review.opendev.org/c/opendev/system-config/+/984848 | 18:32 | |
| @jim:acmegating.com | did that as a followup ^ | 18:32 |
| @clarkb:matrix.org | oh yes we removed static because lp was failing | 18:33 |
| @clarkb:matrix.org | but I thought that was only on the system-config-run side not infra-prod-service- side | 18:33 |
| @jim:acmegating.com | i was talking about system-config-run | 18:33 |
| @clarkb:matrix.org | ah got it | 18:33 |
| @jim:acmegating.com | yeah looks like it's still there on prod-service | 18:34 |
| @clarkb:matrix.org | looks like the deployment did run against static for the custom waf rules | 18:34 |
| @clarkb:matrix.org | fungi: I think we need to stop apache, remove the /var/cache/modsecurity content then start apache again. Should I go ahead and do that? | 18:34 |
| @fungicide:matrix.org | please do, should only take a few seconds of downtime | 18:35 |
| @clarkb:matrix.org | corvus: I went ahead and approved 984848 since it is test only | 18:35 |
| @fungicide:matrix.org | most of that will be apache freeing swap | 18:35 |
| @clarkb:matrix.org | `systemctl stop apache2 && rm /var/cache/modsecurity/www-data-ip.dir && rm /var/cache/modsecurity/www-data-ip.pag && systemctl start apache2` ok this is the command I'll run | 18:36 |
| @fungicide:matrix.org | lgtm | 18:38 |
| @clarkb:matrix.org | that is done now | 18:38 |
| -@gerrit:opendev.org- Zuul merged on behalf of Ron Stone: [openstack/project-config] 984830: Update StarlingX docs promote job for R12 release https://review.opendev.org/c/openstack/project-config/+/984830 | 18:43 | |
| @clarkb:matrix.org | apache looks pretty happy right now so maybe this was the ticket | 18:49 |
| @fungicide:matrix.org | yeah, load average is back down around 0.3 | 18:51 |
| @fungicide:matrix.org | thrilled that the solution was that simple | 18:51 |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: | 18:51 | |
| - [opendev/system-config] 984796: Remove a rewrite rule that matches googlebot https://review.opendev.org/c/opendev/system-config/+/984796 | ||
| - [opendev/system-config] 984848: Add the ua-filter to static job file matcher https://review.opendev.org/c/opendev/system-config/+/984848 | ||
| @clarkb:matrix.org | ok lunch is over and I said I would proceed with mirror cleanups from mnasiadka going to do that now. static's load appears to have skyrocketed again while I was eating and is falling now | 19:54 |
| @clarkb:matrix.org | but the mod security db is still size 0 and the service is responsive and isn't using all worker processes so I think that may just be an artifact of how things are crawling? | 19:54 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 20:17 | |
| - [opendev/zuul-providers] 982182: Add Ubuntu resolute image build job https://review.opendev.org/c/opendev/zuul-providers/+/982182 | ||
| - [opendev/zuul-providers] 984866: Run vhd builds first https://review.opendev.org/c/opendev/zuul-providers/+/984866 | ||
| @jim:acmegating.com | Clark Jens Harbott mnasiadka ^ that's an idea to try to fit in the space we have | 20:17 |
| @jim:acmegating.com | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/lib/img-functions#L149-L154 is a reference btw. i looked at the vhd-util source and i don't see a way to avoid that without modification. | 20:18 |
| @clarkb:matrix.org | +2 from me. I think even if that doesn't fix this particular issue it should make things more reliable over all as the total disk usage at any one time should be reduced | 20:18 |
| @jim:acmegating.com | er, i think that commit should say "vhd conversion" not build but you get the idea | 20:18 |
| @jim:acmegating.com | another idea i had: we could remove /opt/dib_cache after the build but before the conversion -- is there a dib element hook point or whatever you call them where we could put that? | 20:20 |
| @clarkb:matrix.org | corvus: I think the "hooks" are the run phases. I suspect that we could add an element that executes a cache cleanup in the very last run phase | 20:20 |
| @clarkb:matrix.org | let me find the docs for the order | 20:20 |
| @clarkb:matrix.org | corvus: https://docs.openstack.org/diskimage-builder/2.7.0/developer/developing_elements.html | 20:23 |
| @clarkb:matrix.org | I think cleanup.d would be an appropriate spot for that | 20:23 |
| @clarkb:matrix.org | it runs outside of the chroot too so we don't have to break out or do anything weird like that | 20:24 |
| @jim:acmegating.com | cool, thanks, in progress | 20:24 |
| @clarkb:matrix.org | infra-root one of the things on my todo list is cleaning up the old h2 v1 caches from Gerrit 3.11 as they are consuming something like 60GB | 20:25 |
| @clarkb:matrix.org | any concern with doing that at this point? It seems unlikely we will revert and if we do we can start over with new caches like we did when we updated to h2 v2 | 20:26 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/zuul-providers] 984867: Delete cache at end of build https://review.opendev.org/c/opendev/zuul-providers/+/984867 | 20:29 | |
| @jim:acmegating.com | Clark: h2 cleanup sounds good | 20:30 |
| @jim:acmegating.com | Clark: ^ i did the delete-cache as a followup since it's more uncertain; but if we still have problems, we can move it up the stack | 20:30 |
| @jim:acmegating.com | each of those changes should give us ~10GB | 20:30 |
| @clarkb:matrix.org | sounds great reviewing that one now | 20:31 |
| @clarkb:matrix.org | corvus: I did post one thought but its more of a "if this doesn't work maybe try this next" option rather than something we need to address if it works as is | 20:33 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/system-config] 984694: Remove mirror03.gra1.ovh from configuration https://review.opendev.org/c/opendev/system-config/+/984694 | 20:33 | |
| @clarkb:matrix.org | ok old h2 v1 caches are deleted. It actually freed up about 79GB | 20:39 |
| @clarkb:matrix.org | I have to pop out in about an hour and 15 minutes to shuttle kids around. I've just approved the mirror02.ord.rax mirror cleanup change. Not sure I'll get to the dfw one today | 20:45 |
| @clarkb:matrix.org | the deployment for the first mirror03.gra1.ovh mirror removal failed on LE. Looks like we got an ansible rc -13 error against gitea10. This is the issue that occurs with ansible trying to use an ssh control persisted connection that is shutting down iirc. The deployment for removal of mirror02.ord.rax should hopefully be fine and then we'll be set. I don't think I need to reenqueue the failed deployment as these changes are pulling things out of the inventory | 20:59 |
| -@gerrit:opendev.org- Zuul merged on behalf of Michal Nasiadka: [opendev/system-config] 984698: Remove mirror02.ord.rax from configuration https://review.opendev.org/c/opendev/system-config/+/984698 | 21:27 | |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [opendev/zuul-providers] 984867: Delete cache at end of build https://review.opendev.org/c/opendev/zuul-providers/+/984867 | 21:29 | |
| @jim:acmegating.com | *sudo* delete cache :) | 21:30 |
| @clarkb:matrix.org | Oh I had the same problem when I tried to delete the old Gerrit caches | 21:30 |
| @clarkb:matrix.org | First time I didn't have permissions had to sudo :) | 21:30 |
| @clarkb:matrix.org | letsencrypt did successfully deploy in the second mirror cleanup change's deploy buildset | 21:52 |
| @clarkb:matrix.org | fungi: if you have a moment https://review.opendev.org/c/opendev/gerritlib/+/983476 and https://review.opendev.org/c/opendev/jeepyb/+/983482 are sort of related to the git-review change in that I'm trying to improve the testing that we run against tools that talk to gerrit. Note I think the jeepyb change may rebuild Gerrit images which isn't as big of a deal now that we're pinned to specific tags but keep that in mind as I'm out friday etc. Happy to approve next week during the ptg if we prefer | 21:57 |
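
A minimal sketch of the single-key SSH probe discussed in the log, combining the `gerrit ls-projects` test with the `IdentitiesOnly` suggestion so that no other agent keys are offered first. The function name `gerrit_ssh_probe_cmd` and the `GERRIT_HOST`/`GERRIT_PORT` variables are illustrative assumptions, not part of any tool; the function only prints the command so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Print (but do not run) an ssh command that offers exactly one key to Gerrit.
# Host and port default to the values used in the log; both the function
# name and the environment variable names are hypothetical.
gerrit_ssh_probe_cmd() {
    user="$1"   # Gerrit username as shown under Settings
    key="$2"    # path to the identity file to offer
    host="${GERRIT_HOST:-review.opendev.org}"
    port="${GERRIT_PORT:-29418}"
    # IdentitiesOnly=yes keeps ssh from trying other agent keys first;
    # -vvv prints the key negotiation for comparison with the server log.
    printf 'ssh -vvv -p %s -o IdentitiesOnly=yes -i %s %s@%s gerrit ls-projects\n' \
        "$port" "$key" "$user" "$host"
}

# Example: show the command for the user and key path mentioned in the log.
gerrit_ssh_probe_cmd LChams "$HOME/.ssh/opendev-ssh-key"
```

Running the printed command and comparing the offered key fingerprint in the `-vvv` output with the fingerprint in the server log is one way to confirm which key the server actually evaluated.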