| Nick | Message | Time |
|---|---|---|
| -@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 981064: Add new mirror in openmetal https://review.opendev.org/c/opendev/system-config/+/981064 | 06:19 | |
| @mnasiadka:matrix.org | I think there's a Zuul job stuck - https://zuul.opendev.org/t/openstack/status?change=978863 | 06:38 |
| @mnasiadka:matrix.org | And looking at static04 and complaints on the ML, we might be able to bump the connection limit - unless there's any other option to get some faster responses for docs.openstack.org ;-) | 12:43 |
| @fungicide:matrix.org | i agree, there's no memory pressure now, static04 hasn't even touched its swap space and apache's using around 8gb for normal allocations plus 20gb in buffers/cache | 13:10 |
| @fungicide:matrix.org | though apache's not using all available worker slots either according to server-status: "2784 requests currently being processed, 0 workers gracefully restarting, 0 idle workers" | 13:12 |
| @fungicide:matrix.org | should be able to handle 4096 the way it's configured | 13:12 |
| @harbott.osism.tech:regio.chat | browsing docs is still terribly slow for me. maybe we need to further dig into why? | 13:44 |
| @fungicide:matrix.org | <sarcasm>great idea!</sarcasm> | 13:48 |
| @fungicide:matrix.org | when did we *stop* digging into this? | 13:48 |
| @fungicide:matrix.org | i certainly haven't | 13:49 |
| @fungicide:matrix.org | but if you mean maybe people who aren't us should help understand why it's all falling down around our ears, i'm right there with you | 13:50 |
| @fungicide:matrix.org | judging from the apache server-status scorecard, most of the worker slots are in "logging" state, with the next most common being "gracefully finishing" | 13:55 |
| @fungicide:matrix.org | there are very few "waiting for connection" or "open slot with no current process" so yes maybe we have room to bump the MaxConnectionsPerChild from 8192 to 16384? though i'm worried that will put us right back into the same memory pressure downward spiral we saw on the smaller servers | 13:57 |
| @fungicide:matrix.org | another thing that might be worth trying is turning off mod_security for docs.openstack.org, it's possible this new server is fast enough that serving content to the bots will be more efficient than identifying and blocking them | 13:58 |
| @mnasiadka:matrix.org | fungi: If I see correctly - MaxRequestWorkers is at 150 but ServerLimit is at 128, isn't that why Apache is logging that scoreboard is full? | 14:10 |
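For reference, here is a sketch of how worker capacity is sized, assuming the event (or worker) MPM is in use. The log does not confirm which MPM static04 runs. Under these MPMs, ServerLimit counts child processes rather than workers, so capacity is ServerLimit × ThreadsPerChild. At startup, httpd rounds MaxRequestWorkers down to a multiple of ThreadsPerChild and caps it at that product.

The ThreadsPerChild values in the sketch are assumptions. A value of 32 is simply one that would make 128 processes add up to the 4096 mentioned above. Processes that are gracefully finishing still hold scoreboard slots, and that is the situation in which the event MPM logs "scoreboard is full, not at MaxRequestWorkers".

```shell
#!/bin/sh
# Sketch: effective worker capacity under Apache's event (or worker) MPM.
# Mirrors the two startup adjustments httpd makes for threaded MPMs:
# MaxRequestWorkers is rounded down to a multiple of ThreadsPerChild, then
# capped at ServerLimit * ThreadsPerChild. Edge cases (for example
# MaxRequestWorkers smaller than ThreadsPerChild) are not modelled.
effective_workers() {
    max_request_workers=$1
    server_limit=$2
    threads_per_child=$3
    rounded=$(( max_request_workers / threads_per_child * threads_per_child ))
    ceiling=$(( server_limit * threads_per_child ))
    if [ "$rounded" -gt "$ceiling" ]; then
        echo "$ceiling"
    else
        echo "$rounded"
    fi
}
```

With Apache's default ThreadsPerChild of 25, `effective_workers 150 128 25` prints 150. In that configuration a ServerLimit of 128 would not be the binding limit. Any spare ServerLimit above MaxRequestWorkers ÷ ThreadsPerChild is what leaves room for processes that are still gracefully finishing.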
| @mnaser:matrix.org | fungi: i think that's worth a shot, also.. maybe this is a spicy idea, but perhaps turning off access logging and making it something that you turn on "opt in" when the need for it comes ? | 14:10 |
| @mnaser:matrix.org | that'll cut on a lot of the unnecessary disk write io that's happening on every request | 14:11 |
| @fungicide:matrix.org | looks like almost all of the workers are in "gracefully finishing" state now | 14:11 |
| @fungicide:matrix.org | disk i/o doesn't seem to be high, fwiw, and overall system load is around 1 right now (on an 8-vcpu system) | 14:12 |
| @mnaser:matrix.org | fungi: i wonder if https://httpd.apache.org/docs/current/mod/mod_reqtimeout.html can help | 14:13 |
| @mnaser:matrix.org | i think "gracefully finishing", if i remember right, might mean it's waiting for the client to close the request or so | 14:14 |
| @mnaser:matrix.org | unsure if this has been pointed to here => https://bz.apache.org/bugzilla/show_bug.cgi?id=61551 | 14:16 |
| @jim:acmegating.com | anyone feel like pastebinning that for those of us without an account? | 14:17 |
| @mnaser:matrix.org | sure, 1 second, it is a very old issue (2017) but might be somehow relevant | 14:18 |
| @mnaser:matrix.org | https://paste.opendev.org/show/bpjWuzzdcxFCd4n18Obp/ not the greatest paste but it works i think | 14:19 |
| @fungicide:matrix.org | looks like mod_reqtimeout only has options for limiting how long it takes to complete the tls handshake and send the request body, but not how long to wait for the session to close after the response is sent | 14:19 |
| @fungicide:matrix.org | the scoreboard looks a lot like that one full of "G" | 14:20 |
| @mnaser:matrix.org | i wonder if it's the frequent restarting we do because of the max # of requests per process (or whatever), plus processes taking longer to gracefully terminate because they want to finalize their connections | 14:22 |
| @clarkb:matrix.org | > <@fungicide:matrix.org> should be able to handle 4096 the way it's configured | 14:24 |
| This is the G connection state preventing us from using all of the slots. They are consumed closing connections rather than starting new ones | ||
| @mnaser:matrix.org | i assume the apache server status is not exposed for us to see | 14:24 |
| @clarkb:matrix.org | It is not as it exposes info that shouldnt be public | 14:25 |
| @fungicide:matrix.org | yeah, it includes client addresses and other sensitive data | 14:26 |
| @fungicide:matrix.org | so we lock it down to just localhost, but i can export a redacted/truncated version in a few minutes | 14:26 |
| @clarkb:matrix.org | Given that apache bug I would be willing to try turning off mod security to see if we can keep up without it as it seems that mod security may play into this connection leakage? | 14:26 |
| @fungicide:matrix.org | yeah, that's a fast thing to test anyway | 14:26 |
| @mnaser:matrix.org | i _think_ there was a way to have a lite version of it that didn't include the actual requests, but that's okay for now, just wanted to see what it looked like (minus the ips), and yea, +1 from me for mod_sec | 14:27 |
| @fungicide:matrix.org | well, maybe not "fast" because it would take a while for the server to spiral out of control like this | 14:27 |
| @fungicide:matrix.org | i've put static04 into the emergency disable list in preparation for more experiments that will differ from the other static servers | 14:35 |
| @fungicide:matrix.org | if we want a long-term divergence between them, that will require a little reengineering of their ansible role(s) | 14:35 |
| @clarkb:matrix.org | Makes sense. Thanks for the heads up | 14:35 |
| @fungicide:matrix.org | browsing meetings.opendev.org is suddenly slow for me now | 14:36 |
| @clarkb:matrix.org | There is also the anubis change if we think that will beat back the bad crawlers. If it works that may backfill the lack of the waf | 14:36 |
| @mnaser:matrix.org | its loading ultra fast for me :X | 14:37 |
| @fungicide:matrix.org | also i can't grab server-status on static04 any longer, wget is just timing out connecting now | 14:37 |
| @jim:acmegating.com | yes i'm on attempt 9 of server-status | 14:38 |
| @fungicide:matrix.org | anyway, i'll go ahead and disable mod_security on static04 and restart apache there | 14:38 |
| @mnaser:matrix.org | meetings.opendev.org is resolving to static02 on my side fwiw | 14:38 |
| @mnaser:matrix.org | but confirmed 04 is dead | 14:38 |
| @fungicide:matrix.org | yeah, it should, only docs.openstack.org is on (the much larger) static04 | 14:39 |
| @jim:acmegating.com | fungi: wait | 14:39 |
| @jim:acmegating.com | fungi: have you done anything to stop apache on static04? | 14:39 |
| @fungicide:matrix.org | no, not yet | 14:42 |
| @fungicide:matrix.org | i need to make edits to the vhost configs since we didn't wrap the mod_security directives in a conditional | 14:43 |
| @fungicide:matrix.org | (though that will be the next change i push so that we can toggle it more easily in the future) | 14:44 |
| @jim:acmegating.com | i'm stracing/lsof the busy apache procs and i note that many of them are stuck waiting on the flocks for the waf ip database. the ones that get the lock are reading LOTS of null bytes. maybe there's database corruption causing a slowdown. or maybe it's just big. or maybe there's something else causing too much lock contention. | 14:45 |
| @fungicide:matrix.org | i would bet on all three at once | 14:46 |
| @jim:acmegating.com | i'm not sure if there's been some event (crash, bad shutdown, etc) that we could blame a corrupted database on. if so, maybe we call it a one time event and clear it and restart. if not, then maybe it's buggy and bugs are visible at this scale, and we need to stop using it. | 14:46 |
| @fungicide:matrix.org | i'm good trying either route, though it will take time to see if the server gets back into the same state | 14:47 |
| @jim:acmegating.com | it's not the first time that we've seen it spending a lot of time on locks, so i think i lean toward "remove" (which, i realize we were about to do anyway, but now we have more info). | 14:48 |
| @clarkb:matrix.org | yes we saw the same problem on static03 before creating 04 | 14:49 |
| @fungicide:matrix.org | okay, proceeding | 14:49 |
| @clarkb:matrix.org | which is a completely different filesystem so not the same exact db corruption if that is what it is | 14:49 |
| @clarkb:matrix.org | but could be triggered by the same set of circumstances on both hosts (since they are doing the same activity and being crawled by the same bots) | 14:50 |
| @fungicide:matrix.org | this is what i did on static04 to disable mod_security for now: `sudo rm /etc/apache2/mods-enabled/security2.conf /etc/apache2/mods-enabled/security2.load ; sudo sed -i '/.*Sec\(Action\|Rule\).*/d' /etc/apache2/sites-enabled/*.conf` | 14:50 |
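The sed address `/.*Sec\(Action\|Rule\).*/` deletes any line containing `SecAction` or `SecRule`, so `SecRuleEngine`, `SecRuleRemoveById` and similar directives go too. The leading and trailing `.*` are redundant for matching.

Below is a slightly more reversible variant. It assumes GNU sed, since `\|` in a basic regex and `--follow-symlinks` are GNU extensions. Keeping a `.bak` copy makes the change easy to undo. If the vhost files are symlinks (as a2ensite creates), `--follow-symlinks` edits the target file; plain `sed -i` would instead replace the symlink with a regular file. The `.conf.bak` backups are not picked up by a `*.conf` include. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: remove ModSecurity rule directives from Apache vhost files while
# keeping a .bak copy of each file so the change can be reverted.
# Assumes GNU sed: "\|" alternation in a basic regex and --follow-symlinks
# are GNU extensions.
strip_modsec_directives() {
    for conf in "$@"; do
        # Same address as the one-liner above, minus the redundant ".*".
        sed --follow-symlinks -i.bak '/Sec\(Action\|Rule\)/d' "$conf"
    done
}

# Possible use on a Debian/Ubuntu host, from a root shell, instead of
# deleting the mods-enabled symlinks by hand:
#   a2dismod security2
#   strip_modsec_directives /etc/apache2/sites-enabled/*.conf
#   apache2ctl configtest && systemctl restart apache2
```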
| @jim:acmegating.com | i saw a bunch of lock contention even on static02; we weren't diagnosing this problem then though. | 14:50 |
| @fungicide:matrix.org | after that, `sudo apache2ctl configtest` was successful so i restarted apache | 14:50 |
| @mnaser:matrix.org | is the ip database located on afs or directly on the fs? | 14:51 |
| @jim:acmegating.com | local | 14:51 |
| @fungicide:matrix.org | rootfs, in /var | 14:51 |
| @jim:acmegating.com | ``` | 14:51 |
-rw-r----- 1 www-data www-data 1916928 Mar 18 14:50 www-data-ip.dir
-rw-r----- 1 www-data www-data 12474331136 Mar 18 14:50 www-data-ip.pag
``` | ||
| @mnaser:matrix.org | ah okay so that should rule out networking/etc | 14:52 |
| @fungicide:matrix.org | yeah, i mean, "local" is sort of a vague concept. in this case i think it's connected over iscsi | 14:53 |
| @fungicide:matrix.org | it's pretend-local at least | 14:53 |
| @clarkb:matrix.org | oh wow is it building a 12GB table of ip addresses? | 14:53 |
| @mnaser:matrix.org | yeah fair point but i was ruling out any potential odd behaviour of afs and reading that file, but yea, pretend-local =) | 14:53 |
| @jim:acmegating.com | ``` | 14:53 |
| [pid 161684] read(199, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 | ||
| [pid 161684] lseek(199, 10995727360, SEEK_SET) = 10995727360 | ||
| [pid 161684] read(199, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 | ||
| [pid 161684] lseek(199, 10995728384, SEEK_SET) = 10995728384 | ||
| ``` | ||
| does not look healthy | ||
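The pasted strace shows the pattern:

- each lseek() moves 1 KiB further into the file (10995728384 - 10995727360 = 1024);
- each read() returns 1024 NUL bytes.

That looks like a sequential walk through empty blocks rather than a direct lookup. Below is a sketch for summarizing a longer capture the same way. It assumes the `strace -f` output format shown above; the PID and output path in the comment are placeholders.

```shell
#!/bin/sh
# Sketch: summarize a saved strace log of a busy Apache worker.
# Capture one with, e.g. (PID and output path are placeholders):
#   sudo strace -f -p "$PID" -e trace=flock,lseek,read -o /tmp/apache.strace

# Count read() calls whose returned buffer starts with NUL bytes
# (strace renders them as "\0\0\0...").
count_zero_reads() {
    grep -c 'read([0-9]*, "\\0\\0\\0\\0' "$1"
}

# Print the distance between consecutive absolute lseek() offsets;
# a steady 1024 means the file is being scanned block by block.
lseek_strides() {
    sed -n 's/.*lseek([0-9]*, \([0-9]*\), SEEK_SET).*/\1/p' "$1" |
        awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

For example, `lseek_strides /tmp/apache.strace | sort | uniq -c` gives a quick histogram of seek distances.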
| @clarkb:matrix.org | it's h2 all over again | 14:53 |
| @clarkb:matrix.org | yes I suspect that database doesn't reuse space but rather zeros out rows and appends. So it's reading through tons of 0s just to find the data it is looking for? | 14:54 |
| @jim:acmegating.com | i would expect it to be indexed, but maybe so | 14:54 |
| @mnaser:matrix.org | from 2021 - https://serverfault.com/questions/1074742/how-to-reduce-modsecurity-disk-io could be an interesting idea: move it to a ramdisk (after understanding why it's so big in the first place) | 14:54 |
| @clarkb:matrix.org | There must be some method to making this work better otherwise how would anyone use this successfully | 14:55 |
| @fungicide:matrix.org | also, sławek reported in this thread seeing etherpad slowness, though i have been unable to reproduce it: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/TGE4RD64BCFOGEMLI25I3QW2LTHWC3MC/ | 15:05 |
| @clarkb:matrix.org | thank you for following up on that thread /me is just reading it now | 15:07 |
| -@gerrit:opendev.org- Scott Little proposed: [openstack/project-config] 981139: Temporarily add the power to delete a starlingx branch. https://review.opendev.org/c/openstack/project-config/+/981139 | 15:08 | |
| @fungicide:matrix.org | for the moment, almost all the non-idle worker slots in the scoreboard are "reading request" but i don't know how long it'll take to change, if it does | 15:10 |
| @clarkb:matrix.org | fungi: I think one thing we should maybe be a bit more explicit about is that this problem is predominantly specific to just openstack docs. We have moved it into its own corner with a server that is 4x the size of the server for the other 32 vhosts and only docs is struggling this hard. Part of that is likely that it is a more attractive target, but also the construction of the redirects is creating a feeding frenzy for the crawlers. I am beginning to believe that the content itself is a contributor. | 15:10 |
| @clarkb:matrix.org | and so the lack of any docs maintenance for openstack is actively hampering our ability to make this better | 15:11 |
| @fungicide:matrix.org | yes, i touched on that in my main reply, but only briefly | 15:12 |
| @scott.little:matrix.org | would 'git push gerrit :refs/heads/bad-branch' work to delete the bad branch? | 15:17 |
| @clarkb:matrix.org | scott.little: not via this gerrit acl: https://review.opendev.org/c/openstack/project-config/+/981139/1/gerrit/acls/starlingx/meta-starlingx.config this only allows branch deletion through the web ui or gerrit rest api | 15:18 |
| @clarkb:matrix.org | doing it via git push requires additional git access that we typically don't give out because there are basically no safeguards | 15:19 |
| @fungicide:matrix.org | scott.little: the delete permission, as i explained earlier, gives users access to the delete button in the project management page on gerrit, as well as the delete action in gerrit's rest api | 15:20 |
| @fungicide:matrix.org | but not via git push | 15:20 |
| @fungicide:matrix.org | because the way you delete a branch through a git push is to push an empty ref over top of it, so from an underlying git perspective it's the same as `git push --force` | 15:21 |
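In refspec terms, the empty left-hand side of `git push <remote> :refs/heads/<branch>` is what turns a push into a deletion. `git push <remote> --delete <branch>` does the same thing.

The sketch below only demonstrates that against a throwaway local bare repository, with hypothetical names; it never talks to Gerrit. On review.opendev.org, the REST API route discussed here is the one the ACL grants.

```shell
#!/bin/sh
# Sketch: show that pushing an empty source ref deletes the remote branch.
# Runs entirely in a temporary directory against a local bare repository;
# it never talks to Gerrit. Branch and directory names are hypothetical.
set -e
demo=$(mktemp -d)
git init -q --bare "$demo/remote.git"
git init -q "$demo/work"
cd "$demo/work"
git -c user.name=demo -c user.email=demo@example.invalid -c commit.gpgsign=false \
    commit -q --allow-empty -m "initial commit"
git push -q "$demo/remote.git" HEAD:refs/heads/bad-branch

# An empty left-hand side in the refspec means "delete the destination".
git push -q "$demo/remote.git" :refs/heads/bad-branch

if git --git-dir="$demo/remote.git" show-ref --verify --quiet refs/heads/bad-branch; then
    echo "bad-branch still exists"
else
    echo "bad-branch deleted"
fi
```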
| -@gerrit:opendev.org- Zuul merged on behalf of Scott Little: [openstack/project-config] 981139: Temporarily add the power to delete a starlingx branch. https://review.opendev.org/c/openstack/project-config/+/981139 | 15:24 | |
| @clarkb:matrix.org | re static04 we've disabled mod security and kept the existing connection limits. We can double them with a reload I think but anything more requires a restart. But things look ok for now? | 15:25 |
| @fungicide:matrix.org | so far yes | 15:26 |
| @clarkb:matrix.org | so we're back to a monitor and continue to adjust state after the mod security removal | 15:26 |
| @clarkb:matrix.org | cool I feel caught up | 15:26 |
| @scott.little:matrix.org | so the rest api method .... that would look something like .... curl -X DELETE \ | 15:26 |
--user username:password \
"https://review.opendev.org/a/projects/starlingx/integ/branches/bad-branch"
| @fungicide:matrix.org | Clark: there was a spike in concurrent requests and i saw it get up over 750, don't know if it went higher, but now it's down around half that | 15:26 |
| @clarkb:matrix.org | scott.little: yes that looks about right based on https://review.opendev.org/Documentation/rest-api-projects.html#delete-branch. | 15:27 |
| @clarkb:matrix.org | note that your password is the gerrit api password/token. Not your ubuntu one openid password | 15:28 |
| @clarkb:matrix.org | you'll find this value at https://review.opendev.org/settings/#HTTPCredentials | 15:28 |
| @fungicide:matrix.org | scott.little: here's an example of where the openstack release managers scripted similar functionality in python, if it helps: https://opendev.org/openstack/releases/src/commit/70cab47/tools/delete_stable_branch.py#L35-L40 | 15:34 |
| @fungicide:matrix.org | also keep in mind that the branch deletion feature in gerrit includes a safeguard to prevent deleting without first closing any open changes targeting that branch, so that they don't get orphaned | 15:35 |
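Because of that safeguard, it can help to list open changes on the branch before trying to delete it. Below is a sketch that builds the query URL with Gerrit's `project:`, `branch:` and `status:` search operators against the authenticated `/a/changes/` endpoint. The function name is hypothetical, and the project and branch are just the ones from this conversation.

```shell
#!/bin/sh
# Sketch: build the Gerrit REST query for open changes on one branch.
# The helper name is hypothetical; GERRIT_URL can be overridden.
GERRIT_URL=${GERRIT_URL:-https://review.opendev.org}

open_changes_url() {
    project=$1
    branch=$2
    # Search operators are joined with "+" (an encoded space).
    printf '%s/a/changes/?q=project:%s+branch:%s+status:open\n' \
        "$GERRIT_URL" "$project" "$branch"
}

# Possible use: list the open changes, then abandon each change number N:
#   curl --user slittle1 "$(open_changes_url starlingx/integ WRCP_26.03)"
#   curl --user slittle1 -X POST "$GERRIT_URL/a/changes/N/abandon"
```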
| @fungicide:matrix.org | which is another good reason not to delete branches via git push | 15:35 |
| @clarkb:matrix.org | https://review.opendev.org/c/opendev/system-config/+/980993 is the anubis change if we need it. Testing seems to indicate it is working: https://zuul.opendev.org/t/openstack/build/76cc59168f894a24be9b050269d47c4d/log/job-output.txt#64391 | 15:37 |
| @clarkb:matrix.org | might be worth looking at and deciding if you'd like to see any changes and/or whether or not I should templatize the configuration ports so that we can run an anubis for each vhost | 15:37 |
| @fungicide:matrix.org | awesome, so depending on how the current experiment goes we could roll it into place fairly quickly | 15:37 |
| @clarkb:matrix.org | yes I think so. At least for docs.openstack.org specifically. The current change doesn't try to do anubis for the other vhosts as we need to run a different anubis for each vhost | 15:39 |
| @clarkb:matrix.org | the construction is apache 443 tls configuration proxies to anubis which then proxies to a not publicly exposed apache http vhost with the actual content. The configuration for anubis seems to be one to one between its frontend port mapping and its backend so we can't have an anubis that farms out to each vhost based on the servername as far as I can tell | 15:40 |
| @clarkb:matrix.org | but maybe I should keep reading docs to see if that is an option | 15:40 |
| @scott.little:matrix.org | didn't work, but no error reported and RC=0 ... curl -X DELETE \ | 15:43 |
> --user ${review_opendev_org_user}:${review_opendev_org_passwd} \
> "https://review.opendev.org/a/projects/starlingx/${project}/branches/${branch_to_delete}"
| @clarkb:matrix.org | scott.little: the first thing I would check is if you can do a read request. Something like https://review.opendev.org/Documentation/rest-api-accounts.html#get-account-status | 15:45 |
| @clarkb:matrix.org | just to confirm that your credentials are correct. Then the next things to check are whether your user is in the correct group and whether the gerrit acl update happened as expected (maybe you did your request before it applied) | 15:46 |
| @clarkb:matrix.org | fungi: I think I need https://anubis.techaro.lol/docs/admin/configuration/redirect-domains#configuring-allowed-redirect-domains set too | 15:46 |
| @scott.little:matrix.org | curl -X GET --user ${review_opendev_org_user}:${review_opendev_org_passwd} "https://review.opendev.org/accounts/self/status" | 15:48 |
| Resolving account 'self' requires login | ||
| @clarkb:matrix.org | scott.little: you need to add the /a/ prefix to the url I think | 15:49 |
| @clarkb:matrix.org | if you fetch from the / prefix it doesn't do any authentication and is limited to things that don't need auth. Adding /a/ has it attempt to do auth | 15:49 |
| @clarkb:matrix.org | and self is an alias for the currently logged in user so needs to be logged in | 15:49 |
| @scott.little:matrix.org | curl -X GET --user ${review_opendev_org_user}:${review_opendev_org_passwd} "https://review.opendev.org/a/accounts/self/status" | 15:51 |
| Unauthorizedslittle1 | ||
| @scott.little:matrix.org | guess that worked | 15:52 |
| @clarkb:matrix.org | I almost read that as it knows you are slittle1 now but is saying you are not authorized, possibly because the password is wrong? | 15:52 |
| @clarkb:matrix.org | you should get back the four byte json breaking prefix and then something like Available | 15:53 |
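Two details from the exchange above can be wrapped in small helpers:

- only paths under `/a/` are authenticated;
- JSON responses start with the `)]}'` line that Gerrit prepends to defeat cross-site script inclusion.

The sketch below assumes GNU sed and a POSIX shell. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: helpers for calling the Gerrit REST API from a shell.
# Helper names are hypothetical; GERRIT_URL can be overridden.
GERRIT_URL=${GERRIT_URL:-https://review.opendev.org}

# Only paths under /a/ are authenticated; without the prefix the request is
# anonymous and "accounts/self" cannot be resolved.
gerrit_api_url() {
    printf '%s/a/%s\n' "$GERRIT_URL" "${1#/}"
}

# Drop Gerrit's )]}' first line (if present) so the rest parses as JSON.
strip_xssi() {
    sed "1{/^)]}'\$/d;}"
}

# Possible use:
#   curl -s --user slittle1 "$(gerrit_api_url accounts/self/status)" | strip_xssi
```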
| @clarkb:matrix.org | is it possible that you reset your gerrit http password token at some point and are using a stale value? It only shows you the value when you generate it forcing you to stash it in a password manager immediately | 15:54 |
| @fungicide:matrix.org | scott.little: also make sure the password you're using is one generated from https://review.opendev.org/settings/#HTTPCredentials | 15:54 |
| @scott.little:matrix.org | yes, that's how I generated it | 15:54 |
| @fungicide:matrix.org | cool, just making sure you weren't trying your launchpad password or something | 15:55 |
| @clarkb:matrix.org | and --basic is the default for curl so you shouldn't need to add that | 15:56 |
| @clarkb:matrix.org | though the last time I used curl to talk to the gerrit rest api I did add an explicit --basic. But again I don't think that should be necessary. Let me test with my account really quickly | 15:57 |
| @clarkb:matrix.org | yes --basic is not necessary. However, I also note that I don't try to supply the password on the command line and let it prompt and provide it that way. I wonder if maybe the password has characters being interpreted by your shell and that mangles it | 15:59 |
| @clarkb:matrix.org | and I don't get unauthorized so I think that is something to fix and points to at least one issue | 15:59 |
| @scott.little:matrix.org | adding --basic didn't help. Is there some additional permission I need for REST API access? | 15:59 |
| @clarkb:matrix.org | no | 15:59 |
| @clarkb:matrix.org | I would try --user slittle1, and let it prompt you and provide the password that way. If that works then you likely need to quote or escape the password | 16:02 |
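Besides letting curl prompt, another way to keep the token out of shell expansion is curl's `--netrc-file` option. With it, curl reads the login and password from a netrc-format file that only the owner can read.

Below is a sketch that writes such a file from stdin. The file path and function name are hypothetical. It assumes the token contains no whitespace, because netrc fields are whitespace-separated.

```shell
#!/bin/sh
# Sketch: store Gerrit HTTP credentials in a netrc-format file so the token
# never passes through shell expansion on the curl command line.
# Reads the token from stdin; assumes it contains no whitespace.
write_gerrit_netrc() {
    netrc_file=$1
    user=$2
    IFS= read -r token
    (umask 077 && printf 'machine review.opendev.org login %s password %s\n' \
        "$user" "$token" > "$netrc_file")
    chmod 600 "$netrc_file"
}

# Possible use (read echoes what you type; piping from a password manager
# avoids showing the token):
#   write_gerrit_netrc ~/.gerrit-netrc slittle1
#   curl --netrc-file ~/.gerrit-netrc "https://review.opendev.org/a/accounts/self/status"
```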
| @clarkb:matrix.org | and if that doesn't work then you can always ask gerrit for a new password just to be sure it is correct | 16:03 |
| @clarkb:matrix.org | fungi: for static02 and the rest of the vhosts: do we want to move them to static03 and drop the waf? | 16:04 |
| @fungicide:matrix.org | well, nothing's really hitting the waf rules for those sites, so it probably isn't impacting them much. we could clear the existing database there and it would be about the same effect | 16:05 |
| @clarkb:matrix.org | oh right it's just the honeypot there, right? So the existing 12GB database is a holdover from the old docs handling. So yes, I guess clearing the db on static03 then moving things would have the same effect? | 16:06 |
| @scott.little:matrix.org | curl -X GET --user "${review_opendev_org_user}" "https://review.opendev.org/a/accounts/self/status" | 16:06 |
| Enter host password for user 'slittle': | ||
Unauthorizedslittle1 .... I'm puzzled why the password prompt was for slittle, not slittle1 ??? | ||
| @clarkb:matrix.org | scott.little: you are using a variable to pass the value in maybe the variable is set wrong? I would test using an explicit value | 16:07 |
| @scott.little:matrix.org | doh! review_opendev_org_user was missing the '1' | 16:10 |
| @scott.little:matrix.org | curl -X GET --user "${review_opendev_org_user}:${review_opendev_org_passwd}" "https://review.opendev.org/a/accounts/self/status" | 16:10 |
| )]}' | ||
| "" | ||
| @scott.little:matrix.org | is that what I should expect ? | 16:10 |
| @clarkb:matrix.org | yes that is what i get so now I would try other requests | 16:10 |
| @scott.little:matrix.org | curl -X DELETE --user ${review_opendev_org_user}:${review_opendev_org_passwd} "https://review.opendev.org/a/projects/starlingx/${project}/branches/${branch_to_delete}" | 16:11 |
| Not found: starlingx | ||
| @fungicide:matrix.org | scott.little: any literal "/" in the project name has to be %-encoded (%2f i think?) | 16:12 |
| @fungicide:matrix.org | since "/" is also the field separator for the rest api | 16:12 |
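Below is a sketch that builds the delete-branch URL with each "/" in the project (and branch) name sent as `%2F`. That encoding is what got past the `Not found: starlingx` error above. The sketch only encodes slashes; names with other reserved characters would need full percent-encoding. The helper names are hypothetical.

Note that curl exits 0 on an HTTP error status unless `--fail` is given. That is why a failed request can still report RC=0. Gerrit answers a successful branch deletion with `204 No Content`.

```shell
#!/bin/sh
# Sketch: build a Gerrit "delete branch" URL. Project and branch names are
# path segments, so any "/" inside them must be sent as %2F.
# Helper names are hypothetical; GERRIT_URL can be overridden.
GERRIT_URL=${GERRIT_URL:-https://review.opendev.org}

encode_slashes() {
    printf '%s\n' "$1" | sed 's|/|%2F|g'
}

delete_branch_url() {
    printf '%s/a/projects/%s/branches/%s\n' \
        "$GERRIT_URL" "$(encode_slashes "$1")" "$(encode_slashes "$2")"
}

# Possible use; --fail makes curl exit non-zero on 4xx/5xx responses and
# -w prints the HTTP status code:
#   curl --fail -X DELETE --user slittle1 -w '%{http_code}\n' \
#       "$(delete_branch_url starlingx/integ WRCP_26.03)"
```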
| @scott.little:matrix.org | curl -X DELETE --user ${review_opendev_org_user}:${review_opendev_org_passwd} "https://review.opendev.org/a/projects/starlingx%2f${project}/branches/${branch_to_delete}" | 16:12 |
| not permitted: delete on refs/heads/WRCP_26.03 | ||
| @fungicide:matrix.org | which project was that for? maybe its acl isn't inheriting from starlingx/meta-config | 16:13 |
| @fungicide:matrix.org | at least that's the next thing i'd double-check | 16:13 |
| @scott.little:matrix.org | starlingx/integ | 16:14 |
| @fungicide:matrix.org | looking... | 16:14 |
| @fungicide:matrix.org | yeah, your project acls don't have `[access] inheritFrom = starlingx/meta-config` so i think that's not yet actually used | 16:15 |
| @clarkb:matrix.org | fungi: scott.little: that acl belongs to the starlingx/meta-starlingx repo | 16:16 |
| @fungicide:matrix.org | ah, meta-starlingx | 16:16 |
| @clarkb:matrix.org | https://opendev.org/starlingx/meta-starlingx which is not a meta config repo | 16:17 |
| @fungicide:matrix.org | okay, yep that was an oversight in review | 16:18 |
| @fungicide:matrix.org | you'd need to put the delete permission in every acl because it doesn't appear starlingx has any inherited central acl that gets included in their per-project acls | 16:18 |
| @fungicide:matrix.org | in openstack we did it with an empty/unused repository: https://opendev.org/openstack/meta-config | 16:19 |
| @fungicide:matrix.org | and then we gave it an acl that contained the common parts we wanted included for every project: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/meta-config.config | 16:20 |
| @fungicide:matrix.org | and then all the normal project acls just inheritFrom that so they don't all contain a bunch of duplicated rules: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/openstack/governance.config#L1-L2 | 16:21 |
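Some ACL files may each need a rule added directly, as described for starlingx. One way to find them is to list the files that lack an `inheritFrom` line. Below is a sketch that assumes the `gerrit/acls/<namespace>/*.config` layout of openstack/project-config; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list per-project Gerrit ACL files in one directory that do not
# inherit from a shared parent ACL. A file that does inherit contains:
#   [access]
#       inheritFrom = openstack/meta-config
acls_without_inheritance() {
    grep -L '^[[:space:]]*inheritFrom[[:space:]]*=' "$1"/*.config
}

# Possible use from a project-config checkout:
#   acls_without_inheritance gerrit/acls/starlingx
```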
| @clarkb:matrix.org | fungi: should I stop apache on static03 and remove the mod security db? then start apache again and start updating dns records? I mostly don't want to get ahead of myself if we're still monitoring static02 for some reason (it has a 12GB db file so I think any slowness there may be due to this?) | 16:38 |
| @fungicide:matrix.org | yeah, that's probably fine. note that i continue to see bursts of bot traffic to 03, i think holding onto very stale dns now | 16:39 |
| @fungicide:matrix.org | most recent hit in the access log there was ~6 minutes ago | 16:40 |
| @clarkb:matrix.org | ya I think we can accept some disruption there as those clients are clearly doing the wrong thing at this point | 16:41 |
| @fungicide:matrix.org | even though it's been about 24 hours since we updated dns, i wonder if some recursive resolvers ignore relatively short ttls | 16:41 |
| @fungicide:matrix.org | i know that used to be common practice, but usually it was very short (e.g. 2-second) ttls that got ignored/overridden | 16:41 |
| @clarkb:matrix.org | ya I would expect 5 minutes is ok | 16:42 |
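To check whether a given recursive resolver honors the TTL, one can query it directly and look at the remaining TTL it reports. A caching resolver counts that value down from what it received. A figure above the zone's TTL (apparently 5 minutes here) would suggest the resolver is overriding it.

Below is a sketch that extracts the name and TTL from `dig +noall +answer` output. The function name is hypothetical, and the resolver address in the comment is only an example.

```shell
#!/bin/sh
# Sketch: print name and remaining TTL for each record in an answer section.
# Reads `dig +noall +answer` output on stdin; the TTL is the second field.
answer_ttls() {
    awk '!/^;/ && NF >= 5 { print $1, $2 }'
}

# Possible use (resolver address is only an example):
#   dig +noall +answer @9.9.9.9 docs.openstack.org | answer_ttls
```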
| @clarkb:matrix.org | `sudo systemctl stop apache2 && sudo rm /var/cache/modsecurity/www-data-ip.dir && sudo rm /var/cache/modsecurity/www-data-ip.pag && sudo systemctl start apache2` has been run on static03 | 16:43 |
| -@gerrit:opendev.org- Scott Little proposed: [openstack/project-config] 981152: Revert "Temporarily add the power to delete a starlingx branch." https://review.opendev.org/c/openstack/project-config/+/981152 | 16:49 | |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/zone-opendev.org] 981155: Switch static CNAME over to static03 https://review.opendev.org/c/opendev/zone-opendev.org/+/981155 | 16:52 | |
| @clarkb:matrix.org | fungi: ^ I haven't tested any of the vhosts on static03 yet but I think that is the change needed to shift the load over from 02 | 16:53 |
| @fungicide:matrix.org | Clark: i've already had one up for a day | 16:53 |
| @fungicide:matrix.org | i thought you were simply talking about approving it | 16:53 |
| @clarkb:matrix.org | oh gah | 16:54 |
| @clarkb:matrix.org | fungi: I can abandon mine. Too many things to keep track of. Does it need a rebase or is the serial good? | 16:55 |
| @fungicide:matrix.org | i don't think we merged anything else after it, but may need a serial refresh | 16:55 |
| @clarkb:matrix.org | fungi: yours is indeed in merge conflict and is WIP. I'll abandon mine and am happy to approve yours once it is ready for review | 16:55 |
| @fungicide:matrix.org | oh, yes, we merged the mirror addition | 16:55 |
| @fungicide:matrix.org | i'll switch gears real quick and update/un-wip | 16:56 |
| -@gerrit:opendev.org- Scott Little proposed: [openstack/project-config] 981156: Temporarily add the power to delete a starlingx branch. https://review.opendev.org/c/openstack/project-config/+/981156 | 16:56 | |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/zone-opendev.org] 981006: Move static hosting to static03 https://review.opendev.org/c/opendev/zone-opendev.org/+/981006 | 16:57 | |
| @clarkb:matrix.org | scott.little: as a side note I'm not sure this extra power needs to be temporary since it is scoped to your release team and is using the api branch deletion | 16:57 |
| @clarkb:matrix.org | but I think I'm also happy if we don't want to have that permission forever | 16:58 |
| @clarkb:matrix.org | fungi: I have approved that change carrying over mnasiadka's +2 | 16:58 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 981160: Drop custom WAF rules from docs.openstack.org https://review.opendev.org/c/opendev/system-config/+/981160 | 17:01 | |
| @clarkb:matrix.org | scott.little: and then the other thing to note is that we garbage collect our git repos regularly. This is important for Gerrit performance to ensure that dangling refs aren't polluting the ref space and that objects and refs that we want to keep get packed. What this means is that if you delete a ref/branch and we then garbage collect, any commits that aren't referred to by anything else will be deleted. OpenStack will tag its branches before deleting them to ensure there is a ref pointing to the content before deletion. This way history is preserved but active development is stopped on that branch. | 17:01 |
| @fungicide:matrix.org | that ^ should allow us to take static04 back out of the disable list, and make toggling mod_security easier in the future too | 17:01 |
| @clarkb:matrix.org | and you can't delete branches with open changes (so need to abandon any that remain open) | 17:01 |
| @fungicide:matrix.org | one thing to keep in mind with gerrit-managed repos and git garbage collection is that every change patchset is referred to automatically via named refs, so it's only really merge commits or dangling untagged history for imported branches from prior to being managed in gerrit | 17:05 |
| @fungicide:matrix.org | that would be garbage-collected | 17:06 |
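Below is a local illustration of tagging before deletion, in a plain git repository. In Gerrit, as noted above, patch set refs also keep change commits reachable. After the branches are deleted and garbage collection runs, the tagged commit survives and the untagged one is pruned. Repository, branch and tag names are hypothetical, and the `-eol` suffix is only an example.

```shell
#!/bin/sh
# Sketch: in a plain git repository, a tag keeps a deleted branch's commits
# reachable through garbage collection, while untagged ones are pruned.
# Everything happens in a temporary directory; all names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Commit/tag with a throwaway identity and no signing.
g() { git -c user.name=demo -c user.email=demo@example.invalid \
          -c commit.gpgsign=false -c tag.gpgsign=false "$@"; }

g commit -q --allow-empty -m "base"
base=$(git symbolic-ref --short HEAD)

# A branch that gets an end-of-life tag before deletion.
git checkout -q -b retired
g commit -q --allow-empty -m "last work on retired"
kept=$(git rev-parse HEAD)
g tag -a retired-eol -m "retired branch end of life"

# A branch that is deleted without a tag.
git checkout -q "$base"
git checkout -q -b abandoned
g commit -q --allow-empty -m "last work on abandoned"
lost=$(git rev-parse HEAD)

git checkout -q "$base"
git branch -q -D retired abandoned
# Drop reflog entries so they no longer protect the commits, then collect.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now

git cat-file -e "$kept" && echo "tagged commit kept"
git cat-file -e "$lost" 2>/dev/null || echo "untagged commit pruned"
```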
| -@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/zone-opendev.org] 981006: Move static hosting to static03 https://review.opendev.org/c/opendev/zone-opendev.org/+/981006 | 17:07 | |
| @fungicide:matrix.org | Clark: any objection to me going ahead and removing `/var/cache/modsecurity/*` on static04, on the assumption that we'll go ahead with 981160 before taking the server out of the ansible disable list? | 17:12 |
| @fungicide:matrix.org | corvus: ^? | 17:12 |
| @fungicide:matrix.org | didn't know if anyone wanted to do more spelunking in the tables first | 17:13 |
| @fungicide:matrix.org | though i suppose we've still got similar files on static02 for as long as we keep that vm around | 17:13 |
| @clarkb:matrix.org | none from me | 17:19 |
| @scott.little:matrix.org | fungi: can I get your eyes on https://review.opendev.org/c/openstack/project-config/+/981156 and https://review.opendev.org/c/openstack/project-config/+/981152 when you have time | 17:22 |
| @jim:acmegating.com | fungi: save a copy in case we feel like digging more? but i have no current plans to. | 17:24 |
| @jim:acmegating.com | it should compress very well | 17:24 |
| @fungicide:matrix.org | giving that a try, the vm is beefy enough it should make short work of that as well | 17:28 |
| -@gerrit:opendev.org- Zuul merged on behalf of Scott Little: | 17:36 | |
| - [openstack/project-config] 981152: Revert "Temporarily add the power to delete a starlingx branch." https://review.opendev.org/c/openstack/project-config/+/981152 | ||
| - [openstack/project-config] 981156: Temporarily add the power to delete a starlingx branch. https://review.opendev.org/c/openstack/project-config/+/981156 | ||
| @fungicide:matrix.org | hah, actually i think these files are almost 90% sparse? `du --apparent-size` says they're 12G but only actually using 149M | 17:41 |
| @fungicide:matrix.org | i wonder if it preallocates a fixed-size sparse file to contain the data? | 17:44 |
| @clarkb:matrix.org | no I think it is like h2 where it grows because it appends | 17:44 |
| @fungicide:matrix.org | and then sparsifies the blocks somehow when it zeroes them? | 17:44 |
| @clarkb:matrix.org | it isn't packing the data it just adds and adds then we're "deleting" the rows once a day that we don't want anymore and leaving the holes | 17:44 |
| @clarkb:matrix.org | ya | 17:45 |
| @fungicide:matrix.org | because most of the file isn't actually mapped to blocks on disk | 17:45 |
| @fungicide:matrix.org | but yeah, they're zero size on static02 so i guess not preallocated after all | 17:46 |
| @fungicide:matrix.org | i couldn't get tar to preserve the sparsification (even with `--sparse`) so i simply moved them to ~fungi/modsecurity-cache-20260318/ | 17:56 |
| -@gerrit:opendev.org- Scott Little proposed: [openstack/project-config] 981169: Revert "Temporarily add the power to delete a starlingx branch." https://review.opendev.org/c/openstack/project-config/+/981169 | 18:03 | |
| @jim:acmegating.com | maybe all we're missing is a GC cron or something | 18:10 |
| @clarkb:matrix.org | https://github.com/owasp-modsecurity/ModSecurity/issues/1928 this seems to say that this is the way it is | 18:13 |
| @clarkb:matrix.org | the v3 system did change the database format, so it's possible that in newer versions of mod security the issue is less of a problem? unfortunately I don't think noble has that newer version | 18:13 |
| @clarkb:matrix.org | yup double checked and there is only mod-security2 for noble not 3 | 18:18 |
| @fungicide:matrix.org | doesn't look like anyone's packaged 3 for debian yet | 18:19 |
| @jim:acmegating.com | seems like the `modsec-sdbm-util -n` suggestion is the most promising thing there. i'm not sure how creating a new file with that interacts with flock though. | 18:19 |
| @fungicide:matrix.org | but also it was growing by more than half a gb every hour, so we'd be doing that pretty frequently if we wanted to keep it to a manageable size | 18:21 |
| @fungicide:matrix.org | (not that i expect the growth was linear, so maybe even faster than that implies if there's an initial burst with ramp-down) | 18:22 |
| @clarkb:matrix.org | maybe we've found the fatal flaw with modsecurity and need to look at alternative options. Annoying because i doubt any will integrate with apache so easily, but still possible. haproxy for example can manage data tables and take actions based on their records. I think varnish can too? nginx has been mentioned previously; not sure if it relies on something like owasp's tooling though | 18:25 |
| @fungicide:matrix.org | fwiw, varnish is what the debian project is using, the ported a version of haphash to it to do anubis-like things too | 18:30 |
| @fungicide:matrix.org | er, they ported | 18:31 |
| @clarkb:matrix.org | I think for us using haproxy is probably simpler because we're already using it | 18:31 |
| @clarkb:matrix.org | or just using anubis in apache. Too many options, but I guess we are learning things which is still valuable | 18:32 |
| @clarkb:matrix.org | That said since things do seem mostly under control and traffic is shifting to static03 and seems ok I will try to catch up on the reviews I promised. | 18:32 |
| -@gerrit:opendev.org- Wade Carpenter proposed: [zuul/zuul-jobs] 961512: Bump bazelisk version to v1.28.1 https://review.opendev.org/c/zuul/zuul-jobs/+/961512 | 18:42 | |
| -@gerrit:opendev.org- David Shrewsbury proposed: [opendev/bindep] 978585: New exit code of 2 for input file parse errors. https://review.opendev.org/c/opendev/bindep/+/978585 | 19:03 | |
| -@gerrit:opendev.org- Wade Carpenter proposed: [zuul/zuul-jobs] 961512: Bump bazelisk version to v1.28.1 https://review.opendev.org/c/zuul/zuul-jobs/+/961512 | 19:22 | |
| @fungicide:matrix.org | heading out to grab a long-overdue lunch, but will check back in when i return | 19:54 |
| @shrews:matrix.org | but soooo close to dinner! | 20:04 |
| -@gerrit:opendev.org- Zuul merged on behalf of David Shrewsbury: [opendev/bindep] 978585: New exit code of 2 for input file parse errors. https://review.opendev.org/c/opendev/bindep/+/978585 | 21:12 | |
| @clarkb:matrix.org | ok now to the prometheus changes | 21:13 |
| @clarkb:matrix.org | mnaser: mnasiadka ok I've posted comments to the two prometheus changes. I think this looks great. I did call out a few places where I think we should either double check some things or maybe slightly tweak the approach but I think this is very close. Let me know what you think of my review and no rush I know it has been a busy week for many of us | 21:57 |
| @fungicide:matrix.org | checking back in on the active static servers, memory use and apache scoreboard still seem healthy, so we should probably go ahead with 981160 and take static04 out of the disable list again, but maybe waiting to see what it's all doing when i wake up in the morning would be good first? | 22:13 |
| @clarkb:matrix.org | That seems reasonable | 22:20 |
| @fungicide:matrix.org | cool, that's my current plan then | 22:29 |
Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!