Friday, 2025-10-03

*** mtreinish_ is now known as mtreinish00:22
clarkbanyone else seeing high packet loss in cogent on the way to review via ipv4? I don't think it is server specific as the mirror in that region seems to have the same problem for me but bridge has no issues back and forth00:22
clarkbalso pretty sure this isn't my local internet connection acting up as this irc connection is fine as are others00:23
clarkbheh I can't get cogents status page to load00:26
tonybI can't even get to review from here ATM01:05
tonybOh ping gets 80% packet loss01:06
Clark[m]Ya that's what I was getting but via bridge or my personal host in ovh it seems ok. The cogent status page is unreachable so I don't think it is vexxhost specific01:12
Clark[m]I decided to eat dinner and not worry about it for a bit01:13
opendevreviewOpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml  https://review.opendev.org/c/openstack/project-config/+/96255702:12
tonybstill seeing 14% packet loss to review.o.o, but at least it's functional now05:22
mnasiadkaI wonder if there’s some easy way I could debug the “Response is empty” errors when clicking in Gerrit, are the Gerrit server logs available somewhere10:02
mnasiadka(e.g. OpenSearch)?10:02
mnasiadkaIt might be an OpenJDK TLS handshake bug, or something similar (seen these somewhere else)10:02
tonybthose logs aren't in opensearch.   I can take a look in a bit10:05
tonybhow is your connectivity to Gerrit?10:06
tonybas you probably noticed we've been having problems10:06
opendevreviewIvan Anfimov proposed openstack/project-config master: wip  https://review.opendev.org/c/openstack/project-config/+/96292512:04
fungiyeah, my guess is that the gerrit webclient js in the browser is failing to pull content from the rest api when that happens13:22
fungiprobably wouldn't see anything in the server-side logs about that if it's the case, but browser devtools might tell you what's happened to those background requests13:23
mnasiadkafungi: thanks, I can try on Monday - because I think me (and my colleagues) have had that problem for a while14:08
clarkbfungi: I see note of potential replication errors to gitea from gerrit possibly due to the cogent issues (seems likely). You suggested we trigger a full run of gerrit replication once things are stable. They currently look good between me and the mirror node in ca-ymq-1 which was not the case last night and I can reach gerrit. Do we want to find an all clear from cogent or consider14:31
clarkbthat good enough and trigger the replication now?14:31
fungiprobably good enough14:31
fungiif there turn out to still be problems later we can always replicate again anyway14:32
fungii also observed serious slowness getting http(s) responses from sites on static02 this morning, even though pings were 100% clean and system load exceptionally (unrealistically?) low, haven't checked the apache scorecard yet but guessing something is tying up worker slots14:33
clarkbdo you want to trigger that or should I? I should be able to dig out the incantation in a bit (I need some tea first)14:33
fungii can, just a sec14:33
fungireindexer started14:35
clarkbfungi: not reindexer14:37
clarkbwe need replication14:37
clarkb(reindexing should be fine, but won't help the giteas)14:37
clarkbthough I'm not sure how the two will interact, might be worth waiting for indexing to complete before replicating to ensure that we don't miss any refs in the great git push14:39
clarkbreindexing is about 1/3 of the way through now14:45
mnasiadkastatic02 seems unresponsive - at least for tarballs.opendev.org14:47
fungid'oh, sorry clarkb yes. that was my bad14:48
fungimnasiadka: yes, i think it's overrun with too many parallel downloads, could be a rush for pulling openstack release artifacts now that flamingo is out, now that i think about it14:49
clarkbit did load for me just a bit slow14:50
fungiclarkb: replication started, i guess those tasks will queue up behind the reindex14:50
clarkbit being tarballs14:50
clarkbfungi: I think they use separate threads so should run in parallel14:51
clarkbya both have tasks that are not labeled 'waiting...' in the show-queue output so I think both are running at the same time14:51
fungipulling server-status locally on static02 is taking some time14:51
mnasiadkaclarkb: kolla-ansible and kayobe ironic CI jobs are failing (time out) when downloading tinyipa image, so I think httpd there is probably having hard time14:52
mnasiadkaICMP looks good at the same time14:52
clarkbwow looks like that is a half gig image14:54
fungiapache scorecard says 149 busy threads, 1 idle at the moment i pulled it14:55
fungialmost all are in the "R" (reading a request) state14:55
clarkbfungi: is that scorecard going to be per vhost or will it cover all vhosts? Just wondering if it shows a complete view or partial view when we compare against the mpm config14:55
fungiit's for the full apache service, all vhosts14:56
clarkbwe configure a maximum of 8k connections for the lifetime of a child worker to ensure they get recycled and age out old certs that have been reissued. But I'm not seeing us configure extra mpm workers/threads/slots14:57
fungithe bulk of connections are requesting docs.openstack.org urls14:57
clarkbmods-enabled/mpm_event.conf says max is 150 so ya we're at the limit14:58
fungithe bulk of the connections are from ipv4 addresses belonging to china mobile, chinanet and china unicom14:58
clarkbconsidering system load maybe we should consider updating the connection tuning config to bump up total connections14:58
corvusthe cpu usage is unusually low14:59
corvusi don't think those slots are doing any work, i think they're in a slow read situation15:00
fungiyeah, i suspect system load is sub 0.1 because apache is mostly stuffed up15:00
clarkbah ya that could explain it15:00
fungi[Fri Oct  3 12:45:04 2025] afs: Waiting for busy volume 536870992 () in cell openstack.org15:00
corvusoh hrm, do we think afs is contributing?15:01
fungithat was the only line in dmesg from today15:01
fungiso... maybe?15:01
corvusoh that might be ignorable though15:01
clarkbif it happened once then ya probably not a persistent issue15:01
corvusdoing some basic navigating around the docs volume seems fine15:01
corvuslooking at the cacti graphs, i think we have lots of room to increase apache workers (even under normal load)15:02
clarkbmaybe we start there and see if that gives us enough headroom to get past the slowread situation and let clients who can read quicker connect?15:02
fungii'm getting fast responses cat'ing random files from /afs/openstack.org/docs/... on static0215:03
corvusso i like the idea of doing that, and also, just restarting apache now.  the restart might clear out some slow connections and provide immediate relief.15:03
clarkbwe already have a stub connection tuning config in place. I can work on a patch to expand that15:03
fungii can restart apache now, sure15:03
fungiokay, it's restarted15:03
fungiall workers are accepting connections at the moment, according to server-status15:05
corvussome graphs: https://imgur.com/a/th9lGlm15:05
fungipage content is returning quickly for me at the moment too15:06
fungiyeah, i agree we have plenty of room to increase the worker max15:06
clarkbjust working on the math now15:09
clarkbthere are like 8 tunables that all play off of one another15:09
opendevreviewClark Boylan proposed opendev/system-config master: Increase static.o.o apache thread limits  https://review.opendev.org/c/opendev/system-config/+/96297315:16
clarkbsomething like that maybe? It is based on the etherpad connection tuning but I tuned it down a bit15:16
clarkbI figured a ~8x increase was probably a good start15:16
fungilgtm, thanks!15:17
clarkbgerrit reindexing completed with the expected error count of 315:17
clarkblooks like replication queues are empty too15:18
clarkbfungi: do we want to put that in place manually and speed up the deployment here?15:23
fungiclarkb: maybe? response times are already slow again15:33
fungieven pulling server-status is taking a while15:35
fungilike 30 seconds15:35
fungii've manually edited the config for 962973 and restarted apache15:37
fungii have an appointment i need to get to, but hopefully that'll hold it until the change deploys15:38
fungibbiaw15:38
clarkbthanks15:38
clarkbthe apache process counts are definitely increasing15:43
clarkbI have observed the process count fall from 13 to 12 (that's one parent and 12 or 11 child workers aiui) so I think maybe we've found the equilibrium point?15:58
clarkbstatic tuning failed on a testinfra test that I think is due to a change in redirects for docs. Looking into fixing that as part of the same change16:02
clarkbhttps://review.opendev.org/c/openstack/openstack-manuals/+/962684 this change updated the redirect from 301 to 302 and we are looking for 30116:03
clarkbbut we have a lot of 301 checks so I need to make sure this is the only test case affected16:03
opendevreviewClark Boylan proposed opendev/system-config master: Increase static.o.o apache thread limits  https://review.opendev.org/c/opendev/system-config/+/96297316:11
opendevreviewClark Boylan proposed opendev/system-config master: Update docs.openstack.org redirect test  https://review.opendev.org/c/opendev/system-config/+/96298316:11
fungiclarkb: yep, that was me, sorry i didn't realize we were also testing that in system-config17:11
clarkbapache processes have shrunk further17:38
fungioh good17:39
fungipage content is still loading quickly for me too17:40
clarkbya I think the high water mark was something like 360 concurrent requests17:40
clarkbwhich was well above 150 but now that we've served those requests things are falling back down again. We caught up essentially17:40
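[editor's note] The slot arithmetic discussed above can be sketched roughly as follows. This is only an illustration: the 150 limit, the 149 busy threads, and the ~360 high-water mark come from the log, while the new limit of exactly 8 x 150 is an assumption based on the "~8x" remark and may not match the values in change 962973. The function name is hypothetical.

```python
def utilization(busy, limit):
    """Return the fraction of worker slots in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return busy / limit


old_limit = 150            # max from mods-enabled/mpm_event.conf, per the log
new_limit = old_limit * 8  # assumption: "~8x" taken as exactly 8x

# Scoreboard before the change: 149 busy threads, 1 idle.
print(f"{utilization(149, old_limit):.1%}")  # 99.3%

# Reported high-water mark of ~360 concurrent requests after the change.
print(f"{utilization(360, new_limit):.1%}")  # 30.0%
```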
opendevreviewMerged zuul/zuul-jobs master: Fix up some EL10 compatibility  https://review.opendev.org/c/zuul/zuul-jobs/+/96219417:43
opendevreviewMerged opendev/system-config master: Update docs.openstack.org redirect test  https://review.opendev.org/c/opendev/system-config/+/96298317:44
opendevreviewMerged opendev/system-config master: Increase static.o.o apache thread limits  https://review.opendev.org/c/opendev/system-config/+/96297317:45
clarkbfungi: it does look like ^ restarted apache at 17:49 UTC18:05
fungiagreed18:05
clarkband we're down to 8 total processes so this seems to be in a happy steady state for now18:06
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/962826 may or may not do what we want with canonical links for gitea18:08
clarkbfungi: in particular my main concerns are whether or not git is impacted negatively (we do have minimal git clone testing in testinfra for gitea so maybe not?) and whether or not the query parameters have a ? prefixed on the string there18:08
fungioh, right i was about to look at that this morning before i ended up digging into gerrit and static content issues18:09
clarkbbut I think it is to a point where careful review is helpful and we can dig more into those concerns if we need to18:09
clarkbone idea I had is maybe we hold a node and test it with query parameters that way?18:09
clarkbor we can poke around the gitea ui and see if any requests use query parameters and just add that to testinfra test cases18:10
clarkbI think git uses query parameters actually so maybe that is a good test case18:10
fungii guess $QUERY_STRING includes the leading "?"18:13
fungibut yeah it doesn't look like we test it18:14
fungiunless as you say, git is actually exercising that anyway18:14
clarkbfungi: I just don't know if it does and I think git exercises it in that it will return the header but I suspect git ignores the header too18:16
clarkbfungi: but maybe making a git like request with curl is a good test to add and verify18:16
clarkbI'm going to poke around the web ui first to see if there is a query string used there (maybe via search)18:16
clarkbyup https://opendev.org/opendev/system-config/search?q=foo18:17
clarkblet me update the change to include a test that exercises ^ and confirms the Link header is correct for that path18:17
opendevreviewClark Boylan proposed opendev/system-config master: Set canonical Link paths for gitea resources  https://review.opendev.org/c/opendev/system-config/+/96282618:23
clarkbI'm really hoping we don't need some complicated rule to include the ? optionally18:23
clarkbbut that latest patchset should check for us18:23
clarkbRamereth[m]: Ramereth (not sure if these go to the same place or not) Ironic is asking if anyone is using Power + Ironic + PReP Partitioning support in Ironic. I think you may have power gear but don't know if ironic is involved and figured I would point it out18:41
clarkbRamereth[m]: Ramereth https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/5RLZTJ2ESGFRVFIS7OUNIE2VOAGHAWZE/ is the email thread18:41
Clark[m]fungi: I'm eating lunch but that test shows it does not add the ? prefix. Do you know how to add that conditionally?19:32
Clark[m]Maybe we can add it in the rewrite rule if the match is non empty?19:33
opendevreviewMathieu Parent proposed opendev/glean master: Use systemd-networkd automatically when enabled  https://review.opendev.org/c/opendev/glean/+/96301019:33
fungiClark[m]: that's what i was afraid of... yeah maybe a rewritecond on qs being nonempty20:05
fungithen we can add "?$QS" if and only if $QS20:05
clarkblooks like Header can have a conditional expression that might be the clearest understandable way to express this20:08
fungioh, even better20:09
clarkbya working on this now20:09
opendevreviewClark Boylan proposed opendev/system-config master: Set canonical Link paths for gitea resources  https://review.opendev.org/c/opendev/system-config/+/96282620:14
clarkbsomething like that maybe. It isn't clear to me if I need to use %{QS}e in the expr block. I didn't because none of the examples do but we'll find out20:15
clarkbI'm going to pop out for a bike ride while I wait for results on that. Feel free to push an update if it fails on something silly before I get back20:21
fungiwill do20:26
fungihave fun!20:26
Ramereth[m]<clarkb> "Ramereth: Ramereth (not sure..." <- We don't use Ironic so we don't use that20:26
fungiwe figured you probably didn't, but it was worth double-checking20:27
fungisystem-config-run-gitea for 962826,8 failed...20:59
fungiapache-ua-filter : Reload apache2 FAILED21:00
fungiAH00526: Syntax error on line 55 of /etc/apache2/sites-enabled/000-default.conf: Can't parse envclause/expression: Variable 'QS' does not exist21:01
fungioh, if %{QUERY_STRING} is empty on line 44 we never set QS i guess?21:02
fungimmm, though how is it going to know that at config parsing time? there's got to be another explanation21:03
fungithough i guess RU doesn't get used in an expr like QS does21:05
fungimaybe we need to preset QS to the empty string?21:06
fungi`Define QS ""` maybe?21:10
fungihttps://httpd.apache.org/docs/2.4/mod/core.html#define21:10
fungier, actually i guess those aren't envvars21:17
fungimaybe what we need to do is "expr=-z %{QUERY_STRING}"21:18
fungior should we be using `env=[!]varname` instead of `expr=...`21:20
opendevreviewJeremy Stanley proposed opendev/system-config master: Set canonical Link paths for gitea resources  https://review.opendev.org/c/opendev/system-config/+/96282621:22
fungithat ^ switches from expr to env21:22
fungiseems like the cleanest solution, reading https://httpd.apache.org/docs/current/mod/mod_headers.html#header21:22
clarkbfungi: I think I skipped over that because the docs said that is for defined and undefined vars and I think we always define QS? But maybe the rewrite cond for query string matching .* won't match empty string (it should)22:17
clarkbwe will find out experimentally I guess and can also change it to .+ potentially so that it doesn't match empty string and is unset if undefined?22:18
fungiit looked to me like we only defined it when QUERY_STRING was nonempty, but alternatively we could go back to `expr=-z %{QUERY_STRING}`22:18
clarkbya I could be misunderstanding `RewriteCond %{QUERY_STRING} (.*)`22:18
clarkbI would expect that to always match due to the * not + or whatever22:19
clarkbbut maybe rewrite conds always fail if the var is empty22:19
clarkbwe should know soon enough22:19
clarkbI guess the error message you got indicates QS doesn't exist always22:20
clarkbso ya it mustn't be set in all conditions.22:20
fungithat's the theory i was working on, but it's basically me fumbling around in the dark22:27
fungithe job's close to wrapping up now so we'll know in a few22:28
fungitest_matrix_server and test_matrix_client failed22:29
fungiyeah, i think we're getting a trailing ? when there's no query string, so you're probably right22:33
fungiso do you want to try a stricter RewriteCond or switch to using QUERY_STRING in an expr?22:34
fungiseems like the problem is that QS isn't defined at config parsing time but is always defined during runtime22:35
fungii'm about due to knock off and have my friday evening22:36
Clark[m]Enjoy! I have some pre travel errands to run too (turns out I really need some shoes for walking around in the rain potentially)22:39
Clark[m]We can pick this up Monday. I doubt we're going to deploy this over the weekend anyway. And ya maybe we try .+ Instead to see if that causes the matcher to fail22:39
fungiyeah, sounds fine. good luck shoe shopping!22:41
clarkbI'll get a new patchset up to use .+ before I go22:42
clarkboh heh my ssh keys unloaded already. Maybe that is an indication I should just go shopping22:42
fungicool, i'll try to take a peek at the job results later22:42
fungioh, i can push it22:42
clarkbno I can use the web ui22:42
clarkbit's fine22:42
opendevreviewClark Boylan proposed opendev/system-config master: Set canonical Link paths for gitea resources  https://review.opendev.org/c/opendev/system-config/+/96282622:43
clarkbin theory .+ means we only match if the query string is non empty and that means the rewrite rule only runs when set to set QS which means we can use env check to see if the var is set or not22:44
clarkbfingers crossed22:44
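[editor's note] The `.*` versus `.+` reasoning above can be illustrated with a small sketch. It uses Python's `re` module as a stand-in for the regex engine behind `RewriteCond`, which is an assumption, although the two agree for patterns this simple. The helper names and the exact `Link` header layout are hypothetical and are not taken from change 962826.

```python
import re


def pattern_matches(query_string, pattern):
    """Return True if the pattern matches, i.e. the rule would set QS."""
    return re.search(pattern, query_string) is not None


def canonical_link(base, path, query_string):
    """Build a canonical Link value, adding "?" only for a non-empty query."""
    url = base + path
    if query_string:
        url += "?" + query_string
    return f'<{url}>; rel="canonical"'


# (.*) matches the empty string, so QS would be set even without a query;
# (.+) requires at least one character, so QS would stay unset.
for qs in ("", "q=foo"):
    print(repr(qs), pattern_matches(qs, r"(.*)"), pattern_matches(qs, r"(.+)"))

print(canonical_link("https://opendev.org", "/opendev/system-config/search", "q=foo"))
print(canonical_link("https://opendev.org", "/opendev/system-config", ""))
```

With an empty query string only `(.*)` matches, which is consistent with the trailing `?` seen in the failing tests.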
