fungi | clarkb: thanks, sounds good | 00:11 |
fungi | i had to learn what a "mouse jiggler" is. the new netbook i just got last week still doesn't have great power management support in the mainline kernel, so there's no way to block its idle autosuspend functionality when the keyboard/mouse isn't active for too long (e.g. playing games with a controller) | 00:13 |
fungi | i tried every trick i could think of, finally ended up finding https://pypi.org/p/keep-presence which seems to be working | 00:14 |
clarkb | the autosuspend is built into the hardware? | 00:14 |
fungi | seems that way | 00:15 |
clarkb | weird | 00:15 |
fungi | systemd-inhibit couldn't block it | 00:15 |
fungi | i actually first noticed because the steam client normally blocks idle autosuspend on my other netbooks but wasn't on this one, so i started trying all sorts of workarounds | 00:15 |
fungi | nothing short of actually poking activity into the mouse/keyboard seems to work | 00:16 |
mnasiadka | /query fungi | 07:29 |
mnasiadka | Oops | 07:29 |
*** marios_ is now known as marios | 07:49 | |
*** ralonsoh_ is now known as ralonsoh | 08:08 | |
*** tkajinam is now known as Guest7334 | 09:05 | |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: js-package-manager: Allow setting additional env vars for build command https://review.opendev.org/c/zuul/zuul-jobs/+/940373 | 13:45 |
*** Guest7381 is now known as bbezak | 14:56 | |
clarkb | fungi: I think I'm ready to proceed with graphite cors updates: https://review.opendev.org/c/opendev/system-config/+/940328 and we can probably approve your exim update too if no one has objected yet (I don't see any objections myself) | 15:52 |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: Change grub variables for style and timeout https://review.opendev.org/c/openstack/diskimage-builder/+/937684 | 15:52 |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: Change grub variables for style and timeout https://review.opendev.org/c/openstack/diskimage-builder/+/937684 | 15:53 |
fungi | clarkb: agreed, i've approved 940328 just now | 15:55 |
clarkb | thanks | 15:55 |
clarkb | then the held node with the new version can be used to see if that made things better | 15:56 |
clarkb | as well as production to ensure no regressions for the existing version | 15:56 |
fungi | yep | 15:56 |
fungi | speaking of mailman, my post did garner some suggestions | 15:56 |
clarkb | looks like they are suggesting turning off verp bundling of deliveries entirely? | 15:57 |
clarkb | or at least removing most of the benefit by doing one delivery at a time? | 15:57 |
fungi | yeah, though the one-liner change to drop the triggering bounce content from being attached to the verp probe would also eliminate concerns with that | 15:58 |
clarkb | oh right because then the data wouldn't all be included | 15:58 |
clarkb | so maybe that is sufficient | 15:58 |
fungi | i'll follow the suggestion to open a feature request for a config toggle there | 15:58 |
clarkb | sounds good | 15:58 |
clarkb | looking at our ansible to deploy graphite I suspect that I may need to manually restart the container | 16:09 |
clarkb | but I'll wait for ansible to confirm that for us | 16:09 |
clarkb | actually I may be able to have nginx reload its config via SIGHUP or the nginx -s reload command? I can try that first | 16:18 |
clarkb | (this way we don't lose any stats while things restart) | 16:18 |
fungi | probably yes | 16:18 |
opendevreview | Merged opendev/system-config master: Update graphite to send CORS headers even on 400 responses https://review.opendev.org/c/opendev/system-config/+/940328 | 16:27 |
clarkb | ya that didn't restart anything. I'm sorting out how to gracefully reload the config next | 16:30 |
clarkb | reload seems to work based on timestamps but grafana isn't any happier | 16:35 |
clarkb | I'm trying to debug with curl next | 16:35 |
clarkb | if I curl --verbose -X GET https://graphite.opendev.org I get the headers back. If I hit https://graphite.opendev.org/metrics/find?from=1735576390&until=1738168392 I don't | 16:39 |
clarkb | just to convince myself the add_header generally works. Now in theory the always was supposed to fix this for 400 responses (hitting / is a 200) | 16:40 |
clarkb | maybe the nginx -s reload is insufficient for this new config update? | 16:40 |
fungi | yeah, i'm not familiar with the nuances of nginx config loading like i am with apache's | 16:42 |
fungi | like with apache there are definitely some kinds of configuration changes which are hot-reloadable, some which aren't, and some which are sort of warm depending on previous forks/processes to age out | 16:43 |
clarkb | apparently reload will let old processes age out and close all of their connections. However, ps seems to indicate that all of the processes except for the control one have restarted so I don't think we're waiting on that to happen | 16:47 |
clarkb | nothing super interesting in the access and error logs | 16:47 |
clarkb | basically just recording that it sends a 400 which I already know | 16:48 |
fungi | it's not like a restart is going to be that damaging if you want to try one | 16:48 |
fungi | it's just graphite, people might get an error page for a minute | 16:48 |
fungi | mostly useful to note what amount of hammering is required for an nginx config update like this, and then i suppose we can encode it into the deploy playbook | 16:49 |
clarkb | ya though it's an all in one container so we'll also stop recording stats in that time period since they are udp fire and forget | 16:50 |
clarkb | but also probably not the end of the world. I'll work on that next | 16:50 |
fungi | fair, there could be a hole in some graphs depending on how long the restart takes | 16:50 |
clarkb | restart is done. Currently getting 502 bad gateways (hopefully just due to startup needing a few seconds) and those include the header | 16:52 |
clarkb | ok I get a 400 now and that includes the headers so ya apparently a proper restart was needed? now to test our held node | 16:53 |
fungi | good to know | 16:54 |
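fungi's earlier idea of encoding the needed restart into the deploy playbook could look roughly like the following handler sketch. The handler name, compose directory, and notify key here are illustrative assumptions, not the actual system-config layout:

```yaml
# Hypothetical Ansible handler for the graphite role. A full container
# restart is used because "nginx -s reload" was not enough to pick up
# the add_header change in this setup.
- name: Restart graphite container
  shell:
    cmd: docker-compose down && docker-compose up -d
    chdir: /etc/graphite-docker  # assumed compose directory
  listen: graphite config updated
```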
clarkb | and now we get a different error: CORS preflight did not succeed | 16:56 |
clarkb | I think the problem now is the 400 means firefox (and chrome I checked there too) treat the preflight check as a failure even with the headers coming back clean | 16:56 |
clarkb | now I wonder if we have to upgrade graphite? | 16:57 |
clarkb | so that it will treat the options request as valid. In the proxy situation we never send the options in the first place because it isn't a cross origin request | 16:58 |
fungi | well, it's at least some progress | 16:58 |
clarkb | we don't pin the graphite version so in theory we're already running the latest version but I'm trying to run that down | 16:59 |
clarkb | I think docker hub doesn't work in firefox anymore | 17:00 |
clarkb | add that to the list of reasons something something | 17:00 |
clarkb | there is a :master tag on the image that has updated more recently but we're on the latest release looks like | 17:01 |
fungi | https://hub.docker.com/r/gerritcodereview/gerrit/tags is loading for me in ff134 | 17:02 |
clarkb | so I think our options are 1) upgrade to the master version of the image and yolo 2) do some nginx config to rewrite 400 responses to OPTIONS requests to some 20X response or 3) use the proxy method afterall | 17:02 |
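Option 2) above would be a small nginx addition that answers the preflight OPTIONS directly instead of passing it to graphite-web. A minimal sketch, assuming a hypothetical location path and backend address (neither is taken from the real vhost config):

```nginx
location /metrics/find {
    # Answer CORS preflights ourselves; graphite-web 400s on OPTIONS
    # because the query parameters it expects are absent.
    if ($request_method = OPTIONS) {
        add_header Access-Control-Allow-Origin "$http_origin" always;
        add_header Access-Control-Allow-Methods "GET, POST, OPTIONS" always;
        add_header Access-Control-Allow-Headers "Content-Type" always;
        return 204;
    }
    proxy_pass http://127.0.0.1:8080;  # assumed graphite-web backend
}
```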
clarkb | I like 3) because it's simple and already mostly implemented. I think it doesn't preclude us from the other options later too | 17:02 |
clarkb | fungi: hrm let me try an incognito tab maybe some config is causing it to spin | 17:03 |
clarkb | ok ya it loads there so ublock origin or something must be creating problems | 17:03 |
clarkb | corvus: ^ fyi since you looked into this yesterday too | 17:03 |
clarkb | changing gears 200.225.47.44 is the held etherpad 2.2.7 node. I've got a clarkb-test etherpad going if anyone else wants to check it out. I think it looks good and we can probably proceed with https://review.opendev.org/c/opendev/system-config/+/940337 and its parent | 17:08 |
clarkb | comparing to current grafana I don't see an options request just the POST request | 17:13 |
fungi | held etherpad 2.2.7 instance lgtm, playing around in https://etherpad.opendev.org/p/clarkb-test there | 17:13 |
clarkb | is the preflight cors OPTIONS check something the js can opt into doing maybe? | 17:13 |
clarkb | I wonder if the problem isn't really on the graphite side but with grafana being more discerning via some opt in CORS check (I honestly expected the browser to do that regardless) | 17:14 |
clarkb | ya looking at server side logs I don't see any OPTIONS requests when I hit the old version of grafana. The interwebs seem to indicate you can't opt out of these requests though and instead the browser decides when they are necessary based on the requests that are happening | 17:18 |
clarkb | my hunch here is that grafana has subtly changed how the requests to graphite are made causing both firefox and chrome to decide that the preflight check is required | 17:18 |
clarkb | effectively making graphite not working with grafana unless you proxy? | 17:18 |
clarkb | https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#simple_requests I suspect this explains the change in behavior | 17:20 |
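The MDN rules linked above boil down to a short checklist: the method must be GET, HEAD, or POST, every author-set header must be CORS-safelisted, and Content-Type must be one of three values. A rough sketch of that decision in Python (simplified; real browsers only inspect headers set by the page's code and apply a few extra value restrictions):

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SAFE_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SAFE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}

def is_simple_request(method, headers):
    """Return True if a browser may skip the CORS preflight (per MDN's rules)."""
    if method.upper() not in SIMPLE_METHODS:
        return False
    for name, value in headers.items():
        if name.lower() not in SAFE_HEADERS:
            return False
        if name.lower() == "content-type":
            # parameters like charset are allowed; compare the bare media type
            if value.split(";")[0].strip().lower() not in SAFE_CONTENT_TYPES:
                return False
    return True
```

Under these rules a fetch() POSTing application/json, or one adding any custom header, stops being "simple" and triggers the OPTIONS preflight, which would match the behavior change being discussed.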
clarkb | which I think means my list of three options above is still valid | 17:21 |
clarkb | fungi: re etherpad do we want to go ahead and upgrade it then? | 17:21 |
clarkb | fungi: note there is a parent change to switch the mariadb location too. I can squash the changes if we think there might be problems with having a two step update | 17:22 |
fungi | i approved both of them, not too worried about having two deploys | 17:23 |
clarkb | ack sounds good | 17:23 |
clarkb | I think those will automatically restart for us unlike graphite fwiw | 17:23 |
clarkb | also happy to help babysit exim things if you want to proceed with that (not sure if it is urgent after disabling bounce processing but other lists still have bounce processing enabled so probably still a good idea) | 17:23 |
fungi | as for the recent grafana regression, i guess they added a header that turned the request non-simple? | 17:24 |
fungi | yeah, the exim change would still be good to land since at a minimum it's causing some messages accepted by mailman to not get delivered | 17:25 |
clarkb | fungi: reading that last link from mozilla I think it is likely grafana changed the method from simple requests using the form element to using fetch() or XMLHttpRequest() which invokes the preflight check | 17:27 |
clarkb | oh reading more it looks like based on which headers you send you trip into the new behavior too so ya maybe they added a new header to the request | 17:27 |
fungi | i'm around to keep an eye on exim and etherpad deploys for a bit. have a conference call meeting in about 30 minutes but can multitask if something comes up. also need to disappear for a bit around 20:00 utc to run some quick errands | 17:28 |
fungi | otherwise around the remainder of my day | 17:29 |
clarkb | ya I'm generally around today too | 17:30 |
clarkb | ok I think they use fetch() | 17:31 |
clarkb | that is listed as the initiator in the firefox debug tools for the options request | 17:31 |
clarkb | but the existing deployment seems to say the same thing | 17:31 |
clarkb | also I polluted my main browser somehow and now it works there after getting requests to work against prod grafana. It still fails in an incognito tab though | 17:33 |
clarkb | the code doesn't look all that different to me in the debugger though so I'm not sure why we get the different behavior | 17:34 |
fungi | the transmitted headers are still the same? | 17:36 |
fungi | all cors-safelisted? | 17:36 |
fungi | and content-type is still the same? | 17:37 |
fungi | also we see it with some graphs but not others, right? i wonder how the requests differ between them | 17:38 |
clarkb | fungi: the problem is specific to graphs using query. These lookup information in graphite to then generate additional metric queries | 17:46 |
clarkb | for example we use query in the dib status to query a list of all the images | 17:46 |
clarkb | in the nodepool provider graphs we use it to generate a list of regions per provider. But yes only those using query break because it is the query requests that apparently trip the cors problem | 17:47 |
fungi | ah, i missed that nuance | 17:47 |
clarkb | I'm struggling to determine what headers are sent by the POST because the debugging tools only show you the headers for the options request and I'm not sure that they are all the same as what would be sent by the post | 17:47 |
fungi | yeah, probably requires comparing the function calls in the source between the two versions | 17:48 |
clarkb | basically the application wants to make a post but due to need for preflight checks we send an options request first and then all we record is that it wants to do a post not the headers it would've sent for that post | 17:48 |
clarkb | the options request is returning 400 because the query parameter (included in the post) is omitted | 17:49 |
clarkb | this is theoretically a graphite bug | 17:49 |
clarkb | since graphite should know that options requests would be made to satisfy cors preflight checks and it should handle missing query parameters in that instance | 17:50 |
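One way such an upstream fix could look: intercept bare OPTIONS requests before they reach graphite-web's views, which have no query parameters to parse. A hedged WSGI-style sketch, not graphite's actual code:

```python
def cors_preflight_middleware(app, allow_origin="*"):
    """Answer CORS preflight OPTIONS with 204 before the app can 400 on them."""
    def wrapped(environ, start_response):
        if environ.get("REQUEST_METHOD") == "OPTIONS":
            # Preflights carry no body or query; respond directly.
            start_response("204 No Content", [
                ("Access-Control-Allow-Origin", allow_origin),
                ("Access-Control-Allow-Methods", "GET, POST, OPTIONS"),
                ("Access-Control-Allow-Headers", "Content-Type"),
                ("Content-Length", "0"),
            ])
            return [b""]
        return app(environ, start_response)  # everything else passes through
    return wrapped
```

Wrapping the application this way would let the actual GET/POST handlers keep their strict parameter validation while preflights succeed.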
fungi | makes sense that they may only be testing/recommending locally-proxied backends and aren't even aware | 17:50 |
corvus | switching to proxy makes sense to me under the circumstances | 17:51 |
clarkb | one thing I'm going to test is if I override grafana.o.o in /etc/hosts if I get different behavior because then the parent domain is the same | 17:52 |
fungi | i see some cors fixes in grafana v11, wonder if this might have been fixed but not backported to 10.x | 17:53 |
clarkb | using /etc/hosts does not help | 17:53 |
clarkb | corvus: ya I've found some forum posts that basically say "its not that bad to proxy performance wise and it fixes all these problems so thats what we recommend" | 17:53 |
clarkb | part of me wants to solve the puzzle, but another part of me is thinking this isn't that important, time to move on | 17:54 |
clarkb | fungi: ya that could be | 17:55 |
clarkb | https://review.opendev.org/c/openstack/project-config/+/940276 is the change to proxy if others want to move on too | 17:55 |
fungi | interesting that it's a one-line config addition | 17:59 |
clarkb | https://github.com/grafana/grafana/blob/v10.4.14/public/app/plugins/datasource/graphite/datasource.ts#L194-L265 is the code in question I think | 18:00 |
clarkb | though that seems to be for the /render path not the /metrics path so maybe not | 18:01 |
clarkb | https://github.com/grafana/grafana/blob/v10.4.14/public/app/plugins/datasource/graphite/datasource.ts#L612-L656 this is the code actually I think since that is /metrics/find | 18:02 |
clarkb | but that code hasn't changed so it must be something deeper in the request | 18:03 |
clarkb | those requests do use range values and the header value must be simple to avoid the preflight check. I suppose it could be that changed | 18:07 |
clarkb | oh wait those are graphite timestamp ranges not http range headers never mind | 18:08 |
clarkb | anyway I think we just use proxy for now and maybe if we get things updated (including graphite) we can revisit | 18:16 |
opendevreview | Merged opendev/system-config master: Deploy mariadb for etherpad from opendev's quay mirror https://review.opendev.org/c/opendev/system-config/+/940336 | 18:18 |
opendevreview | Merged opendev/system-config master: Upgrade etherpad to v2.2.7 https://review.opendev.org/c/opendev/system-config/+/940337 | 18:20 |
clarkb | mariadb has restarted and I can still see our meetup etherpad | 18:21 |
clarkb | waiting on the 2.2.7 deployment | 18:22 |
clarkb | and now etherpad has restarted and reports that it is healthy. Checking our meetup pad now | 18:24 |
clarkb | it loads for me and I see a small edit I made on another browser so that lgtm | 18:25 |
clarkb | but I think I'm happy with it | 18:25 |
clarkb | fungi: for https://review.opendev.org/c/opendev/system-config/+/940248 is that one you want to approve after meetings and errands? I'll be around this afternoon and can help monitor then | 18:27 |
opendevreview | Rafal Lewandowski proposed openstack/diskimage-builder master: Change grub variables for style and timeout https://review.opendev.org/c/openstack/diskimage-builder/+/937684 | 18:34 |
fungi | clarkb: yeah, we can probably do 940248 around 21:00z | 18:42 |
fungi | my only real reservation is that it touches the exim config for every one of our servers, so while i don't expect a problem it has the potential to create widespread havoc of sorts | 18:44 |
fungi | etherpad deploys finished about half an hour ago, and pads i had up previously are still there and correct | 18:52 |
clarkb | ya in theory it is a noop for most servers and then only updating jammy servers (noble would already do that?) | 18:56 |
clarkb | but I'm good with that I'll be around | 18:56 |
fungi | yep, exactly | 18:56 |
fungi | noble technically doesn't already do it because we're not using the packaged configs from ubuntu anyway | 18:57 |
clarkb | ah | 18:57 |
clarkb | but we'd be aligned with the packaged configs for this setting | 18:57 |
clarkb | anyway still seems safe | 18:57 |
fungi | this is effectively including an option from noble's shipped config into our own config | 18:57 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/940349 is also related to mailman fwiw | 18:57 |
fungi | oh, yeah that should also be safe but would be good to keep an eye on when it rolls out | 18:58 |
opendevreview | Clark Boylan proposed opendev/system-config master: Mirror haproxy container image to opendevmirror on quay.io https://review.opendev.org/c/opendev/system-config/+/940403 | 19:03 |
clarkb | the mariadb image update for standalone mariadb used by zuul hit rate limits fetching haproxy | 19:04 |
clarkb | I think if we mirror haproxy too then none of the zuul install will need to fetch from docker hub anymore | 19:04 |
fungi | that'll be excellent | 19:05 |
clarkb | I'm going to go ahead and approve https://review.opendev.org/c/openstack/project-config/+/940276 | 19:08 |
fungi | sounds good. i may still be around when it deploys | 19:09 |
clarkb | it should land shortly | 19:29 |
opendevreview | Merged openstack/project-config master: Proxy Grafyaml requests to Graphite https://review.opendev.org/c/openstack/project-config/+/940276 | 19:32 |
clarkb | I think the easiest way to check if it applied properly is via browser debug tools to see where the requests go | 19:34 |
clarkb | the deploy pipeline succeeded so I will test ^ now | 19:34 |
clarkb | dashboards still load so we don't appear to have broken them at least | 19:35 |
clarkb | yup the render requests go through grafana now and have /proxy/ in their path | 19:35 |
clarkb | so I think this is working as advertised | 19:36 |
fungi | yeah, seems to be working fine to me | 19:41 |
fungi | i'm going to pop out briefly to run errands, back in an hour-ish and then we can work on landing the mailman/exim changes | 19:42 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/940073 I posted a comment on the grafana upgrade change about listed breaking changes in the changelog. I don't think they affect us and our testing seems to show that we are fine. but links are there for others to check against too | 19:42 |
clarkb | sounds good | 19:42 |
opendevreview | Jay Faulkner proposed openstack/diskimage-builder master: wip: ubuntu-minimal: respect DIB_APT_SOURCES https://review.opendev.org/c/openstack/diskimage-builder/+/940407 | 20:41 |
fungi | okay, errands done finally | 20:47 |
fungi | i'll go ahead and approve the exim change first, since it's got more chance to go slantways | 20:51 |
clarkb | ok | 20:54 |
clarkb | what was the second change? | 20:54 |
fungi | mailman container mirror | 21:08 |
clarkb | oh right | 21:08 |
opendevreview | Merged opendev/system-config master: Increase message_linelength_limit to 1G https://review.opendev.org/c/opendev/system-config/+/940248 | 22:01 |
clarkb | that's going to end up behind the hourly jobs | 22:03 |
fungi | i concur | 22:04 |
fungi | not like i'm going anywhere | 22:04 |
fungi | other than a major update to no man's sky that dropped today, i have no plans | 22:04 |
fungi | well, cooking dinner i guess | 22:04 |
clarkb | fungi: fwiw service-discuss@lists.opendev.org has dropped two people recently. One of them due to the mail server reporting that user doesn't exist in their local user list (so a valid drop I think). The other simply says it has tried too many times and it isn't clear if that one is valid | 22:15 |
clarkb | I'm going to operate under the assumption it is since the other is clearly valid, but if you're concerned about it let me know | 22:15 |
fungi | i'll look in a bit | 22:17 |
fungi | base is deploying now | 22:22 |
fungi | looks like exim.conf is changed:true on all the servers, probably an extra blank line | 22:28 |
clarkb | ya there are two blank lines between driver = smtp and the comment block below | 22:30 |
fungi | agreed | 22:30 |
fungi | just confirmed | 22:30 |
clarkb | I've checked a jammy node (etherpad) a noble node (paste02) and a focal node (zuul01) and they look correct to me | 22:30 |
clarkb | but that's based on checking the file itself. Not sure how to verify exim is happy | 22:31 |
fungi | yeah, i checked jammy nodes (lists01, gitea09) and focal (zuul01) for comparison | 22:31 |
clarkb | is it enough that it is running? it is running on etherpad | 22:31 |
fungi | running is enough yes | 22:31 |
fungi | if the config is malformed it won't start | 22:31 |
clarkb | cool it is running on all three of those hosts I spot checked | 22:32 |
clarkb | with a restart time of 4 minutes ago ish | 22:32 |
fungi | same, and yeah i checked process start times | 22:32 |
fungi | unless we observe odd behaviors i think this is good | 22:33 |
clarkb | agreed | 22:33 |
fungi | i've approved 940349 now, and will eat dinner while it winds through the machine | 22:35 |
clarkb | that reminds me I think I'm cooking dinner tonight. I'll have to remember that later | 22:40 |
fungi | yeah, i cooked while i was side-eyeing the zuul status page and ansible logs | 22:52 |
fungi | and somehow managed to not burn anything | 22:53 |
opendevreview | Merged opendev/system-config master: Switch mailman3 to opendevmirror hosted mariadb image https://review.opendev.org/c/opendev/system-config/+/940349 | 22:59 |
fungi | just *barely* beat the hourly batch with that one | 23:02 |
fungi | and the deploy for it succeeded too | 23:04 |
clarkb | looks like all the containers were restarted and docker reports the new image name | 23:04 |
fungi | yes, i'll keep an eye out and make sure we get at least one new list post delivered | 23:05 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!