| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 978980: Add Michal Nasiadka to base_users on all hosts https://review.opendev.org/c/opendev/system-config/+/978980 | 06:33 | |
| @scott.little:matrix.org | seeing an odd inconsistency in gerrit this morning. https://review.opendev.org/q/topic:%22concurrent-build%22 shows two code reviews with CR-1, but when I click on them there are no Code reviews on the current iteration. | 15:02 |
|---|---|---|
| @scott.little:matrix.org | replicated in a different machine/browser. So it's not a caching effect | 15:04 |
| @scott.little:matrix.org | * replicated in a different machine/browser. So it's not a caching effect within the browser | 15:04 |
| -@gerrit:opendev.org- Pierre Riteau proposed: [opendev/irc-meetings] 979022: Add Matt Crees as Blazar meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/979022 | 15:05 | |
| @fungicide:matrix.org | we did just upgrade gerrit yesterday, around the time that patchset was pushed... i bet one of gerrit's server-side caches is inconsistent (i did force a reindex after the upgrade but maybe it didn't cover the necessary index for that data?) | 15:07 |
| -@gerrit:opendev.org- Pierre Riteau proposed: [opendev/irc-meetings] 979022: Add Matt Crees as Blazar meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/979022 | 15:09 | |
| @fungicide:matrix.org | we can try rerunning the reindex too, but i'll wait for some other gerrit admins to be around first since this isn't a critical issue and likely only affects things pushed right when we were restarting the service | 15:09 |
| @mnasiadka:matrix.org | scott.little: seems to work for me (there's one with CR-1 right now) | 15:21 |
| @fungicide:matrix.org | mnasiadka: yeah, when i initially looked at that query result, 977874 was showing a code review -1 while clicking into the change that seemed to be a stale vote from the prior patchset | 15:42 |
| @fungicide:matrix.org | i suspect that when scott.little added several requested reviewers to the change, that caused whatever was cached for the query result to finally refresh | 15:43 |
| @fungicide:matrix.org | because it looks correct to me too now | 15:43 |
| @fungicide:matrix.org | popping out to lunch, back shortly | 16:20 |
| @clarkb:matrix.org | yes if the issue was stale index data then actions on the change that would trigger updates may bring it back into sync | 16:25 |
| @clarkb:matrix.org | fungi: and other admins: if we see things like this in the future related to specific changes I believe we can reindex the change data for a particular change. https://gerrit-review.googlesource.com/Documentation/cmd-index-changes.html this is the command (it is slightly different from the rebuild the whole index command) | 16:27 |
| @clarkb:matrix.org | I think for cases like this we would want to start with ^ doing a specific change reindex request | 16:27 |
| @fungicide:matrix.org | good call, that would help us narrow down what index is at fault | 17:26 |
| -@gerrit:opendev.org- Zuul merged on behalf of Pierre Riteau: [opendev/irc-meetings] 979022: Add Matt Crees as Blazar meeting chair https://review.opendev.org/c/opendev/irc-meetings/+/979022 | 17:52 | |
| @fungicide:matrix.org | i'll note that i forgot to perform my followup test to see if my client's forbidden entry for docs.opendev.org lasted past an hour yesterday, so i did it just now almost 20 hours later and a wget of https://docs.opendev.org/ still returns `ERROR 403: Forbidden.` | 17:53 |
| @fungicide:matrix.org | seems like the `SecCollectionTimeout 86400` in a new /etc/modsecurity/collection-timeout.conf file did the trick | 17:54 |
| @fungicide:matrix.org | i'll get a change going to encode it in ansible | 17:54 |
| @fungicide:matrix.org | i'll try again in a few hours after the one-day mark and see if it's back to working again | 17:55 |
| @clarkb:matrix.org | Any change in conntrack counts? | 18:01 |
| @fungicide:matrix.org | not really, 381508 right now | 18:02 |
| @fungicide:matrix.org | i expect most of them are from when things were going sideways and sessions never got cleanly shut down | 18:02 |
| @fungicide:matrix.org | my guess is that we have a floor of ~380k stale connections being tracked | 18:03 |
| @fungicide:matrix.org | server status suggests that only ~25% of the apache worker slots are reading requests or logging, and the other ~75% are waiting for connections or open slots with no worker process running | 18:04 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed wip: [opendev/system-config] 978956: Expire mod_security collection entries in one day https://review.opendev.org/c/opendev/system-config/+/978956 | 21:47 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org marked as active: [opendev/system-config] 978956: Expire mod_security collection entries in one day https://review.opendev.org/c/opendev/system-config/+/978956 | 21:48 | |
| @fungicide:matrix.org | tested again ~25 hours after getting my client blocked, and i was back to a normal http/200 ok from https://docs.opendev.org/ | 22:46 |
| @jim:acmegating.com | last weekend, an error in a zuul zk schema upgrade broke opendev's zuul, and to correct that for other zuul users, i made another change to zuul that will also break opendev (basically, it's a partial revert, so it breaks it backwards). i'm wondering if now would be a good time to do a zuul scheduler/launcher restart to force that breakage now. the symptom is just that node requests will hang until i do a full reconfiguration. zuul is fairly busy now, but slowing down, and i think this might be a good time to do that in a controlled manner. thoughts? | 22:53 |
| @fungicide:matrix.org | seems to me like it should be fine | 22:54 |
| @clarkb:matrix.org | yes usually pacific time afternoons things tend to quiet | 22:54 |
| @jim:acmegating.com | still some hours in my day so if it goes wrong i'm not dumping it on anyone else :) | 22:54 |
| @jim:acmegating.com | cool, i'll get started on that then | 22:54 |
| @clarkb:matrix.org | corvus: this will only impact jobs that are waiting for nodes or will it impact running jobs too? | 22:54 |
| @jim:acmegating.com | i don't think i'll announce anything since it should just manifest as a delay | 22:54 |
| @jim:acmegating.com | waiting for nodes | 22:55 |
| @clarkb:matrix.org | then ya impact should be minimal and as you say just a longer delay I think that is fine | 22:55 |
| @jim:acmegating.com | we didn't even lose any builds/results during the unexpected breakage over the weekend | 22:55 |
| @jim:acmegating.com | if google's gerrit is slow again, we're going to find out about it again though | 22:56 |
| @jim:acmegating.com | (the reconfigs over the weekend were how i noticed that) | 22:56 |
| @clarkb:matrix.org | your curl test is currently quick and gerrit.googlesource.com loaded for me in a reasonable amount of time | 22:57 |
| @fungicide:matrix.org | the more i look at the bogus urls that are getting denied on docs.openstack.org, the more i'm starting to think that this is actually a brute-force scan looking for unlinked pages by taking the path components from related urls and randomly rearranging/recombining/duplicating them | 23:08 |
| @clarkb:matrix.org | oohhhhh that would certainly explain the url path construction | 23:08 |
| @clarkb:matrix.org | any similar insights with the lists crawler behavior? | 23:09 |
| @fungicide:matrix.org | e.g. one of the most requested url base paths is /developer/diskimage-builder/user_guide/elements which obviously does not exist but /developer/diskimage-builder/user_guide and /developer/diskimage-builder/elements both exist | 23:09 |
| @fungicide:matrix.org | i haven't even started delving into lists yet | 23:09 |
| @fungicide:matrix.org | right now i'm analyzing the most frequent requests on docs.openstack.org to put together a short hitlist of tripwires | 23:10 |
| @fungicide:matrix.org | (by "exist" i mean redirect to something similar, e.g. /diskimage-builder/latest/user_guide and /diskimage-builder/latest/elements) | 23:11 |
| @jim:acmegating.com | doing the full reconfig now | 23:15 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 979089: Add WAF rules for docs.openstack.org https://review.opendev.org/c/opendev/system-config/+/979089 | 23:16 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 979090: Add our tripwire SecRule to docs.openstack.org https://review.opendev.org/c/opendev/system-config/+/979090 | 23:19 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 978118: Add WAF rule for developer.openstack.org https://review.opendev.org/c/opendev/system-config/+/978118 | 23:21 | |
| @fungicide:matrix.org | okay, those are split up the way we want now, i think | 23:22 |
| @clarkb:matrix.org | fungi: it's like they are doing a full matrix combo on all of the path components | 23:22 |
| @fungicide:matrix.org | yes, some of them get crazy long as a result | 23:23 |
| @fungicide:matrix.org | and if you don't truncate the requested paths, there are comparatively few repeats | 23:23 |
| @fungicide:matrix.org | i'll try to work on related bits for lists.opendev.org tomorrow | 23:24 |
| @clarkb:matrix.org | fungi: I noted one thing about the id: values in the rules: I was under the impression they need to be unique. I didn't -1 because if this deploys then maybe it is fine? But also maybe we should double check the docs | 23:25 |
| @clarkb:matrix.org | https://github.com/owasp-modsecurity/ModSecurity/wiki/Reference-Manual-(v2.x)#id is the documentation for that I think | 23:26 |
| @clarkb:matrix.org | what isn't clear to me is if those values need to be unique or if they can be for better logging etc | 23:27 |
| @jim:acmegating.com | all the required zuul components should be up and running with a valid config now, so the delay window should be over (i'm still finishing up restarting some redundant components) | 23:38 |
| @jim:acmegating.com | and i see previously queued jobs now running on newly-built nodes | 23:42 |
| @jim:acmegating.com | #status log restarted zuul web, scheduler, launcher components and performed a full-reconfiguration | 23:53 |
| @status:opendev.org | @jim:acmegating.com: finished logging | 23:53 |
| @jim:acmegating.com | that should be all done now, status page lgtm | 23:54 |
| @clarkb:matrix.org | thanks! | 23:56 |
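
The scan pattern discussed around 23:08 to 23:23 (unlinked-page probing built by recombining path components of real URLs, with few exact repeats unless paths are truncated) can be illustrated with a minimal Python sketch. This is not the analysis fungi ran and not the WAF rules in the proposed changes; the function names `looks_recombined` and `top_prefixes` and the truncation depth are hypothetical, and only the example paths come from the log.

```python
from collections import Counter


def split_path(path):
    """Split a URL path into its non-empty components."""
    return [part for part in path.strip("/").split("/") if part]


def looks_recombined(path, known_paths):
    """Return True if `path` is not a known path but is built only from
    components that appear in known paths (hypothetical heuristic)."""
    known = {"/".join(split_path(p)) for p in known_paths}
    parts = split_path(path)
    if not parts or "/".join(parts) in known:
        return False
    vocabulary = {part for p in known for part in p.split("/")}
    return all(part in vocabulary for part in parts)


def top_prefixes(paths, depth):
    """Count requests by their first `depth` components, since untruncated
    recombined paths rarely repeat exactly."""
    return Counter("/" + "/".join(split_path(p)[:depth]) for p in paths)


# Example paths taken from the discussion above.
KNOWN = [
    "/developer/diskimage-builder/user_guide",
    "/developer/diskimage-builder/elements",
]

if __name__ == "__main__":
    probe = "/developer/diskimage-builder/user_guide/elements"
    print(looks_recombined(probe, KNOWN))
    print(top_prefixes([probe, probe + "/elements/user_guide"], depth=4))
```

Counting truncated prefixes is what makes a short tripwire list practical in this sketch: long recombined paths collapse onto a small number of shared base paths.
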