clarkb | corvus: the same set of tripleo changes that we evicted last restart is apparently near merging and the estimates have them merging just before the zuul change. I'm thinking if those estimates are off it might be nice to give them a few more minutes just to see if we can flush them out of the queue pre restart | 00:02 |
corvus | clarkb: wfm | 00:10 |
clarkb | zuul just flushed them out. They beat the zuul change | 00:18 |
clarkb | hrm zuul -2'd the zuul change | 00:18 |
corvus | looking at tests | 00:19 |
clarkb | Both unittest jobs failed and not timeouts | 00:19 |
corvus | yeah, the new test... it may be racy :/ | 00:20 |
clarkb | looks like the assertFinalState check fails (the final state isn't empty) because the periodic jobs are racing in | 00:20 |
clarkb | corvus: you should be able to update the config to remove the jobs at the end of the test? | 00:20 |
corvus | wait i thought i had that in there | 00:20 |
corvus | doh... i deleted that with some extra test prints | 00:21 |
corvus | that was dumb; sorry :/ | 00:21 |
clarkb | no worries, happy to rereview | 00:22 |
corvus | merged; awaiting promote | 01:49 |
corvus | promote succeeded | 01:53 |
corvus | pulling | 01:53 |
clarkb | ok keys loaded and ready if needed | 01:54 |
corvus | pull done; i'll restart now | 01:56 |
clarkb | corvus: I think it is up now? | 02:08 |
corvus | yep, will re-enqueue | 02:08 |
fungi | i came back just in time for the fun! | 02:08 |
corvus | i got a couple of transient errors on re-enqueue that are worth looking into later, but i don't think they're critical now | 02:16 |
corvus | https://paste.opendev.org/show/809663/ | 02:16 |
corvus | i re-ran that and it worked 2nd time | 02:16 |
corvus | re-enqueue finished | 02:17 |
corvus | #status log restarted all of zuul on commit 659ba07f63dcc79bbfe62788d951a004ea4582f8 to pick up change cache fix for periodic jobs | 02:17 |
opendevstatus | corvus: finished logging | 02:17 |
corvus | generally looking good to me. hopefully tomorrow we won't see any misplaced changes in periodic pipelines. | 02:18 |
corvus | i think we saw them in zuul and vexxhost, so those are the tenants to check | 02:18 |
fungi | malformed entries in the cache? | 02:18 |
corvus | (it's the 0600 entries that would show it; the hourly pipelines don't trigger the bug) | 02:19 |
corvus | fungi: this bug: https://review.opendev.org/811488 | 02:19 |
fungi | but yeah, doesn't look urgent | 02:19 |
corvus | oh sorry the paste | 02:19 |
corvus | yeah, not sure about that one yet | 02:19 |
corvus | might be a cache collision, like the cache is being written to or something | 02:20 |
corvus | or... | 02:20 |
fungi | right, the paste, seems to have raised in zuul.zk.change_cache | 02:20 |
corvus | it might be a place where i missed changing from tuple change keys to structured cache keys | 02:20 |
fungi | ahh, that seems plausible | 02:21 |
corvus | yeah, i bet that's it. should be straightforward to track down (and is probably missing test coverage) | 02:21 |
corvus | i think i'll afk now | 02:22 |
clarkb | thanks! | 02:22 |
fungi | have a good evening! | 02:23 |
*** ykarel|away is now known as ykarel | 05:18 | |
*** ysandeep|out is now known as ysandeep | 05:43 | |
*** jpena|off is now known as jpena | 07:28 | |
*** ykarel is now known as ykarel|lunch | 08:49 | |
*** ykarel|lunch is now known as ykarel | 09:52 | |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 11:08 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 11:10 |
*** jpena is now known as jpena|lunch | 11:24 | |
*** jpena|lunch is now known as jpena | 12:22 | |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 12:22 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 12:26 |
opendevreview | Merged opendev/bindep master: Add Rocky Linux support https://review.opendev.org/c/opendev/bindep/+/809362 | 14:09 |
*** cloudnull8 is now known as cloudnull | 14:20 | |
*** ykarel is now known as ykarel|away | 14:59 | |
fungi | infra-root: just a reminder to be on our toes tomorrow, the old letsencrypt root cert (IdenTrust DST Root CA X3) is expiring, so long as our certs are already signed by the new root (ISRG Root X1) we should be fine, pretty sure we checked previously but might not hurt to look one more time in case we need to force renewals somewhere | 15:18 |
fungi | pretty sure they switched to the new root cert long enough ago that any certs we had prior to it would themselves be long expired now anyway | 15:19 |
clarkb | They switched at the beginning of this year or end of last iirc | 15:20 |
clarkb | it was a while ago | 15:20 |
fungi | yep. though also need to be on the lookout for breakage with openssl <= 1.0.2 and the like | 15:20 |
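For reference, a quick way to see which roots a server is actually sending (the host is illustrative; the "i:" lines name each certificate's issuer, so a lingering "DST Root CA X3" shows up there if it is still in the served bundle):

```sh
# List the chain a TLS server presents; look for "DST Root CA X3" in the
# issuer (i:) lines to spot the soon-to-expire cross-sign.
echo | openssl s_client -connect etherpad.opendev.org:443 -showcerts 2>/dev/null \
  | grep -E '^ *[0-9]+ s:| i:'
```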
*** dviroel is now known as dviroel|ruck | 15:23 | |
*** marios is now known as marios|out | 15:49 | |
clarkb | fungi: I actually think a number of our servers are not serving the new intermediate. That surprises me. Maybe acme.sh isn't updating the intermediate cert? | 16:00 |
clarkb | no I don't think that is it since paste has the old chain too. What is going on here | 16:04 |
clarkb | https://github.com/acmesh-official/acme.sh/issues/3663 | 16:06 |
clarkb | acme.sh has a --preferred-chain flag and if you don't specify it the default offered chain is used. That implies LE is offering people chains that will expire in less time than the cert validity period? | 16:19 |
clarkb | that makes no sense but wouldn't surprise me given all the moving parts here | 16:19 |
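A minimal sketch of the reissue being considered here, assuming acme.sh's documented flags (the domain and webroot path are placeholders; --preferred-chain matches on the chain root's common name and --force reissues even though the current cert is still valid):

```sh
# Request the short alternate chain ending at ISRG Root X1 instead of the
# default chain that is cross-signed by the expiring DST Root CA X3.
acme.sh --issue -d paste01.opendev.org -w /var/www/acme \
  --preferred-chain "ISRG Root X1" --force
```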
opendevreview | Clark Boylan proposed opendev/system-config master: Use the new LE chain to avoid expiring chain https://review.opendev.org/c/opendev/system-config/+/811749 | 16:22 |
clarkb | infra-root ^ I think that we want to land that then force reissuing all of our LE certs | 16:22 |
clarkb | I'm going to fetch that onto bridge and run manually against paste | 16:24 |
*** jpena is now known as jpena|off | 16:38 | |
clarkb | I've managed to get a new cert but it appears to have used preexisting verification and the same chain on paste01 | 16:49 |
clarkb | hrm it actually gave us a new ca.cer file | 16:50 |
clarkb | but it is for the same chain as far as I can tell | 16:50 |
clarkb | fungi: do you know how to force it to try again from scratch? it seems --force is insufficient | 16:51 |
clarkb | I can try moving the /root/.acme.sh stuff aside | 16:51 |
fungi | still catching up, i'll double-check the certs first to make sure i understand the concern | 16:51 |
clarkb | I'm really annoyed that LE is issuing certs this way at all | 16:54 |
fungi | clarkb: when looking at the https cert for etherpad.opendev.org in firefox, it's serving two additional chain certs, the let's encrypt r3 cert, and the isrg root x1 cert | 16:54 |
clarkb | They should've stopped >3 months before the cert expires | 16:54 |
clarkb | fungi: in firefox I see the r3 cert and the DST root cert not the isrg root | 16:55 |
fungi | for etherpad? | 16:55 |
clarkb | yes | 16:55 |
fungi | that cert was renewed earlier today too, from the look of its "not before" field | 16:55 |
clarkb | though looking at chrome I see the ISRG cert, how can that happen? | 16:55 |
clarkb | huh chrome is showing me ISRG cert but firefox shows the DST | 16:56 |
fungi | "Browsers (Chrome, Safari, Edge, Opera) generally trust the same root certificates as the operating system they are running on. Firefox is the exception: it has its own root store." https://letsencrypt.org/docs/certificate-compatibility/ | 16:56 |
fungi | maybe related? | 16:56 |
clarkb | maybe? This isn't the first time I've had problems with firefox's cert viewer either | 16:58 |
clarkb | the etherpad cert appears to be etherpad -> R3 -> ISRG -> DST (the primary chain for LE) in chrome | 16:58 |
clarkb | er sorry with s_client | 16:58 |
clarkb | review is review -> R3 -> ISRG (the alternate chain for LE) with s_client | 16:58 |
clarkb | I'm tempted to set my local desktop clock to tomorrow and see what happens | 16:59 |
clarkb | (though half worried that will make certain things angry) | 16:59 |
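A lower-risk alternative to moving the desktop clock, assuming the faketime utility is available: it fakes the date for a single process via LD_PRELOAD, so only the test command sees "tomorrow" (date and host are illustrative):

```sh
# Simulate the post-expiry date for one openssl invocation only.
faketime '2021-10-01 12:00:00' \
  openssl s_client -connect paste.opendev.org:443 </dev/null 2>/dev/null \
  | grep -E 'Verification|Verify return'
```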
fungi | i want to say debian's firefox package actually hacks it to use the system trust instead of its own, so maybe that's why i'm getting different results than you are with it | 16:59 |
clarkb | I'm using mozilla's beta build distribution | 17:00 |
clarkb | re review's chain above, I'm sorry, I was wrong. Review is the same as etherpad according to s_client; it is paste, which I just updated using my change and --force, that has the ISRG-only chain | 17:01 |
clarkb | I'm beginning to suspect that firefox's cert viewer isn't showing me the full chain and is instead showing the very end | 17:01 |
fungi | there's a -no_alt_chains option to s_client which might change that behavior too | 17:01 |
clarkb | no change with -no_alt_chains as it is a single linear chain aiui. The verification process is just supposed to stop before it gets to the DST root if something before it is trusted and verifies | 17:02 |
fungi | testing via `openssl s_client -connect etherpad.opendev.org:https -showcerts` i get etherpad.opendev.org -> Let's Encrypt R3 -> ISRG Root X1 -> DST Root CA X3 | 17:03 |
clarkb | yup and now if you do paste.opendev.org you'll lack the DST Root due to my --preferred-chain and --force attempt | 17:03 |
fungi | agreed the result with -no_alt_chains is identical | 17:03 |
clarkb | I can revert that back by removing my --preferred-chain if we like? | 17:03 |
clarkb | and I guess the problem here is firefox | 17:04 |
fungi | paste01.opendev.org -> Let's Encrypt R3 -> ISRG Root X1 | 17:04 |
fungi | yep | 17:04 |
fungi | so the concern is that serving an expired ca cert in the bundle will cause problems for some clients even if they already trust a later ca in the chain provided? | 17:05 |
clarkb | fungi: that is what old openssl et al will fail on | 17:05 |
fungi | got it | 17:05 |
clarkb | here is an interesting one. If I open the cert viewer for paste in FF I still see the DST root | 17:06 |
fungi | so we do want to try to force removal of the old DST Root CA X3 from the bundle on all the servers (which we could do by hand in theory, just editing the chain bundle to delete it) | 17:06 |
clarkb | I wonder if firefox is doing magic around their R3 | 17:06 |
clarkb | fungi: I think I've shown that https://review.opendev.org/c/opendev/system-config/+/811749 will do that the next time we issue certs | 17:06 |
fungi | right, i concur, but that will happen gradually over the course of ~two months | 17:07 |
clarkb | if we do that then certain older systems that don't trust the ISRG root will start to fail. Some of them will already fail due to the presence of the DST root. Others will succeed because they have a DST root hack (android) | 17:07 |
clarkb | fungi: no it won't because LE's default chain is the one with DST in it to make older android happy | 17:07 |
fungi | i mean after we merge 811749 | 17:07 |
fungi | it won't take effect across all our servers instantly | 17:07 |
clarkb | oh right we would have to wait for normal renewals. I guess it is up to us whether or not we'd prefer to have the primary default chain in which case we do no updates or force the ISRG cert? | 17:08 |
clarkb | My initial thought on that is the alternate chain might be better because you can always explicitly add the ISRG cert to your trust chain to make old clients work | 17:08 |
clarkb | But if you have an old client and it sees the DST root in many cases there is no real workaround other than upgrading | 17:09 |
fungi | the resulting bundle is the same except that --preferred-chain "X1" will omit DST Root CA X3 right? | 17:10 |
clarkb | fungi: yup I think so. If you look in etc/letsencrypt-certs/paste01.opendev.org and etc/letsencrypt-certs/paste01.opendev.org.bak you can compare them | 17:11 |
fungi | so if we're worried we'll bump up against renewal limits for le's acme api, we could make that same edit locally is what i'm saying | 17:12 |
clarkb | got it | 17:12 |
clarkb | to summarize: my firefox install is being really derpy (hopefully it doesn't break tomorrow but if it does it will be a local problem). Currently all our sites seem to have the site -> R3 -> ISRG -> DST chain except for paste01, where I manually reran a forced update with https://review.opendev.org/c/opendev/system-config/+/811749 which removed DST from the chain | 17:13 |
clarkb | All that to say modern clients and our servers shouldn't have problems tomorrow when DST expires. Old clients may start to have problems. We can attempt to mitigate those either by upgrading or adding ISRG to the trust chain on systems and dropping DST from our served chains | 17:14 |
clarkb | fungi: ^ does that seem about right? | 17:14 |
fungi | yeah, so if we do nothing, things will presumably keep working with the default chain except old (openssl?) which will choke on there being an expired root cert in the bundle | 17:14 |
fungi | i agree. probably hard to decide on a particular course of action until we see exactly what the fallout winds up being | 17:15 |
clarkb | fungi: I'm thinking maybe lets leave paste as is for now and we can use it to cross check if old clients show up | 17:15 |
fungi | agreed | 17:15 |
fungi | infra-root: to summarize, we'll take no action now, but if clients run into problems validating https certs for our services on/after tomorrow we can get them to double-check against paste.o.o and consider merging 811749 if that ends up being the better behavior | 17:16 |
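For the "add ISRG to the trust chain" mitigation mentioned above, a sketch for an old Debian/Ubuntu client (the paths follow the usual ca-certificates convention, which is an assumption for any given system; the URL is Let's Encrypt's published self-signed ISRG Root X1):

```sh
# update-ca-certificates picks up PEM files with a .crt extension from
# /usr/local/share/ca-certificates and rebuilds the system trust store.
curl -o /usr/local/share/ca-certificates/isrg-root-x1.crt \
  https://letsencrypt.org/certs/isrgrootx1.pem
update-ca-certificates
```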
clarkb | note merging 811749 alone wasn't sufficient to have it reissue files; I had to add a --force to the acme.sh --issue call in my checkout of system-config | 17:17 |
clarkb | however we can land 811749 then separately copy the ca.cer from paste.o.o to avoid reissuing things | 17:17 |
clarkb | I think | 17:17 |
fungi | yep | 17:17 |
fungi | i concur | 17:18 |
clarkb | fungi: One more question. If you look at the ca.cer and fullchain.cer files that acme.sh is writing out, it seems to not include the R3 cert, just the root(s); depending on where you look there is one or two | 17:37 |
clarkb | fungi: any idea why that is? | 17:37 |
clarkb | and yet somehow browsers and s_client all see the R3 in the middle | 17:38 |
fungi | i'll have to parse them with openssl x509 to confirm | 17:39 |
opendevreview | Merged zuul/zuul-jobs master: Make default tox run more strict about interpreter version https://review.opendev.org/c/zuul/zuul-jobs/+/807702 | 17:44 |
clarkb | fungi: ok I think I've figured it out. The cert in ca.cer and fullchain.cer on paste01 is the R3 cert | 17:49 |
clarkb | fungi: on other systems it is the R3 cert + the cross signed ISRG Root | 17:50 |
fungi | yes | 17:50 |
clarkb | In both cases the root-most cert is omitted (ISRG and DST respectively) because you're expected to have that locally | 17:50 |
fungi | i just finished hacking the files apart because i couldn't work out how to get openssl x509 -in to parse multiple certs out of a single file | 17:50 |
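For the record, one way to avoid hacking the files apart by hand: openssl x509 only reads the first PEM block, but splitting the bundle first makes each cert inspectable (a sketch; assumes the bundle starts with a BEGIN line, as acme.sh's fullchain.cer does):

```sh
# Split a PEM bundle into one file per certificate, then print each
# subject and expiry date.
awk '/-----BEGIN CERTIFICATE-----/{n++} {print > ("cert" n ".pem")}' fullchain.cer
for f in cert*.pem; do
  openssl x509 -in "$f" -noout -subject -enddate
done
```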
clarkb | In my firefox case it must be getting the R3 cert then using its local files rather than provided chain to verify from there which is how I end up with DST | 17:51 |
clarkb | I don't know what to expect locally tomorrow, but hopefully the blast radius is small and I can use chrome temporarily if necessary | 17:51 |
clarkb | still not great of firefox to do this imo | 17:52 |
fungi | yeah, probably ff works its way up the chain from the server cert until it reaches a cert signed by something it considers valid from its trust store | 17:52 |
fungi | so it hits the cross-signed cert, identifies one of the signers which is in its trust store, and reports that (even though the other signer is probably in its trust store too) | 17:53 |
clarkb | yup and the other signer has a much better validity period | 17:53 |
fungi | for all i know it could just match oldest first | 17:54 |
clarkb | oh this is even more interesting. I think the R3 cert firefox shows is different than the one we serve. Its almost like firefox has its own R3 cert and is trusting them from that point on rather than the next level down | 18:00 |
clarkb | Hopefully tomorrow it will use what is supplied if necessary | 18:01 |
*** ysandeep is now known as ysandeep|out | 18:53 | |
*** timburke__ is now known as timburke | 19:39 | |
melwitt | grafana is ... in a state rn https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=1 | 20:49 |
clarkb | melwitt: it seems to load for me | 20:52 |
clarkb | maybe try a hard refresh? | 20:52 |
melwitt | oh weird | 20:52 |
melwitt | hm, hard refresh same result. all panels say "N/A" and the error is "Failed to fetch". maybe something with my vpn connection? | 20:55 |
clarkb | possibly? Grafana loads the graphite data from graphite.opendev.org on the client side iirc | 20:56 |
clarkb | so it could be an issue getting to the backend data store from your client. | 20:56 |
melwitt | just checked with my phone that's not on vpn and same result. hm. | 20:56 |
clarkb | let me try another browser | 20:56 |
clarkb | https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=1 loads for me | 20:57 |
clarkb | in both firefox and chrome | 20:57 |
clarkb | I wonder if it could be an ipv6 issue? | 20:58 |
melwitt | you're right, shows "No Data" in graphite for me too | 20:58 |
clarkb | oh interesting | 20:58 |
clarkb | melwitt: https://graphite.opendev.org/?width=586&height=308&target=stats.zuul.executor.ze01_opendev_org.builds does something like that show data? | 20:59 |
melwitt | clarkb: yes, that does show data | 21:00 |
clarkb | ok so you can directly reach graphite which is where grafana pulls data from | 21:00 |
clarkb | do any of the other graph dashboard work on grafana? wonder if it could be specific to this one for some reason | 21:00 |
clarkb | also I just checked on my phone and it works too | 21:01 |
melwitt | oh it's working now | 21:01 |
melwitt | yeah, whatever was wrong resolved itself. how strange | 21:01 |
clarkb | weird. I guess let us know if it happens again and we can try to dig into it more | 21:01 |
melwitt | but yeah earlier I tried the ceph fail rate dashboard and the logstash queue dashboard and both were erroring in the same way | 21:02 |
melwitt | ok, thanks | 21:02 |
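Since grafana fetches the series client-side, the same data can be pulled straight from graphite's render API to take the browser (and any cross-site blockers) out of the loop; format=json is standard graphite, and the target is the one from the link above:

```sh
# Fetch the raw datapoints grafana would render; an error here points at
# the server or network, a success points at the browser.
curl -s 'https://graphite.opendev.org/render?target=stats.zuul.executor.ze01_opendev_org.builds&format=json' \
  | head -c 300
```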
melwitt | it's still in an error state on my phone but it's working on my laptop @_@ | 21:07 |
fungi | could also be a cross-site blocker in your browser | 21:13 |
clarkb | do mobile browsers have the dev debugging tools? Might be good to see what exactly is failing like if it is a specific request? | 21:28 |
*** dviroel|ruck is now known as dviroel|out | 21:34 | |
melwitt | I'll see what I can do | 21:45 |
fungi | looks like we may have mirrored a broken fedora 34 package state... https://zuul.opendev.org/t/zuul/build/687f1c00ab964462aea5c4d36003d625/console#0/3/24/fedora-34 | 21:49 |
clarkb | looks like it can't find packages implying the index updated before the packages were present | 21:51 |
clarkb | unfortunately for rpm mirrors we don't have a tool like reprepro to check things are valid before publishing | 21:51 |
fungi | yeah, if the mirror has refreshed since that failure, i'll just recheck and see if it was a temporary state | 21:51 |
fungi | and it has (just a few minutes ago) | 21:52 |
melwitt | clarkb: this is what I got with web inspector on my phone "TypeError: The certificate for this server is invalid. You might be connecting to a server that is pretending to be “graphite.opendev.org” which could put your confidential information at risk." | 21:59 |
clarkb | melwitt: ok that is super helpful actually. We have a known issue with apache where asking it to reload to pick up cert rotations doesn't always work because it waits nicely for existing processes to finish up | 22:00 |
melwitt | I'm not sure why it didn't just say that in the browser somehow... instead it just shows broken graphs with "No Data" | 22:00 |
clarkb | melwitt: it's possible that we did a cert rotation and then have a backend apache process that is still using the stale cert. I will check that now | 22:00 |
clarkb | hrm graphite doesn't do apache | 22:01 |
clarkb | graphite's cert rotated on august 27. Its docker container, which I expect is running nginx, last restarted 4 weeks ago which is in line with a restart for the new cert | 22:02 |
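A quick way to test the stale-cert theory is to compare what the running service presents against what is on disk (the on-disk path follows the letsencrypt-certs layout quoted later in this discussion and is an assumption for this host):

```sh
# Fingerprint of the cert the service is actually serving ...
echo | openssl s_client -connect graphite.opendev.org:443 2>/dev/null \
  | openssl x509 -noout -fingerprint -sha256
# ... versus the cert last written to disk; they should match.
openssl x509 -noout -fingerprint -sha256 \
  -in /etc/letsencrypt-certs/graphite02.opendev.org/graphite02.opendev.org.cer
```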
fungi | melwitt: did it provide any additional explanation for why the certificate was invalid? like expiration or wrong hostname? | 22:04 |
clarkb | s_client says the cert chain is just graphite -> R3 | 22:05 |
fungi | i wonder if this is the let's encrypt root key expiration taking effect | 22:05 |
melwitt | fungi: I didn't see more info but I'll look again. I'm not super familiar with web inspector | 22:06 |
clarkb | fungi: its about 14 hours early if so | 22:06 |
fungi | we could try removing the old root cert from the intermediate bundle to see if that solves it, i suppose | 22:06 |
fungi | time is an illusion, time on cell phones doubly so | 22:06 |
clarkb | fungi: right but even my local system with s_client complains | 22:06 |
clarkb | it does not complain with etehrpad | 22:07 |
fungi | oh, openssl s_client is erroring? | 22:07 |
clarkb | fungi: yes on my desktop | 22:07 |
fungi | yeah, maybe the nginx inside that container does something weird with cert bundles | 22:07 |
fungi | melwitt: have you tried going directly to https://graphite.opendev.org/ to see if that gives you errors as well? | 22:09 |
melwitt | oh yeah there it goes again | 22:09 |
melwitt | let's see... | 22:09 |
clarkb | ok we don't serve the chain at all looking at the nginx config | 22:11 |
fungi | clarkb: i think it's not actually serving any chain certs? if i use -showcerts with s_client i only see the server cert | 22:11 |
clarkb | fungi: should I try changing ssl_certificate /etc/letsencrypt-certs/graphite02.opendev.org/graphite02.opendev.org.cer; to ssl_certificate /etc/letsencrypt-certs/graphite02.opendev.org/fullchain.cer; ? | 22:11 |
fungi | yeah, looks like we both arrived at the same conclusion | 22:11 |
fungi | clarkb: for nginx you may need it, yeah | 22:12 |
clarkb | ok let me try that. Can always revert easily enough | 22:12 |
fungi | unless it has a separate option for the chain bundle | 22:12 |
fungi | melwitt: this seems likely to be the cause. your phone probably doesn't have the let's encrypt r3 root cert in its trust store, but your desktop might | 22:13 |
clarkb | http://nginx.org/en/docs/http/configuring_https_servers.html#chains says fullchain is what we want I think | 22:13 |
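The change being applied, as a sketch (the key path is assumed to follow acme.sh's usual domain.key naming; nginx expects the server cert concatenated with the intermediates in the ssl_certificate file):

```nginx
# Before: server cert only, so clients had to already trust the R3
# intermediate (the failure melwitt hit):
#   ssl_certificate /etc/letsencrypt-certs/graphite02.opendev.org/graphite02.opendev.org.cer;

# After: serve the full chain so clients can build the path themselves.
ssl_certificate     /etc/letsencrypt-certs/graphite02.opendev.org/fullchain.cer;
ssl_certificate_key /etc/letsencrypt-certs/graphite02.opendev.org/graphite02.opendev.org.key;
```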
melwitt | it's doing it on my desktop again | 22:13 |
melwitt | This page is not secure (broken HTTPS). | 22:13 |
melwitt | Certificate - missing | 22:13 |
melwitt | This site is missing a valid, trusted certificate (net::ERR_CERT_DATE_INVALID). | 22:13 |
clarkb | I'll do the fullchain switch and restart and see if that makes it happy | 22:13 |
fungi | melwitt: yep, okay this is likely the fix for the server side then | 22:13 |
fungi | thanks for the detail! | 22:13 |
fungi | i wonder what caused it to suddenly break | 22:14 |
fungi | maybe the graphite container has changed? | 22:14 |
clarkb | melwitt: can you try it again | 22:15 |
clarkb | fungi: this is config we supply | 22:15 |
clarkb | fungi: I think what changed was LE started using the R3 intermediate and no one noticed until now | 22:15 |
fungi | yeah, i'm wondering if something changed in configuration we don't supply, but i can't imagine what | 22:15 |
clarkb | melwitt: if it works now then I think this is the fix and I'll work on a change to make it permanent | 22:15 |
clarkb | fungi: the R3 intermediate is new in LE | 22:15 |
melwitt | clarkb: looks like that worked | 22:15 |
fungi | new like today new? | 22:15 |
clarkb | I don't know how new but relatively | 22:15 |
clarkb | fungi: no, older than that. Like february ish? but maybe they weren't giving it out to all requests yet | 22:15 |
fungi | but we've got it at least as far back as the last cert refresh, so i wonder why it's gone unnoticed until now | 22:16 |
fungi | maybe most of our users have the r3 cert in their browsers' trust stores already | 22:16 |
fungi | because we've been serving a cert relying on it there for over a month | 22:17 |
clarkb | yes that is my expectation | 22:17 |
clarkb | all of my browsers were totally happy with it | 22:17 |
fungi | same. bonkers | 22:18 |
fungi | git grep says the graphite container is the only place we're doing this | 22:19 |
opendevreview | Clark Boylan proposed opendev/system-config master: Use fullchain.cer on graphite for nginx https://review.opendev.org/c/opendev/system-config/+/811803 | 22:19 |
clarkb | fungi: I think gitea too? | 22:19 |
clarkb | oh gitea would've before we switched it to apache | 22:19 |
clarkb | now I don't think it cares becaus apache | 22:19 |
clarkb | melwitt: thanks for working through this with us. Config error on our end | 22:20 |
fungi | yeah, no matches on ssl_certificate outside the graphite config template | 22:20 |
clarkb | fungi: oh I was grepping for fullchain. gitea did its own termination in its golang https server | 22:20 |
fungi | ahh, yeah i was looking for other nginx server configs we might have missed adding the fullchain.cer file to | 22:21 |
clarkb | good idea :) | 22:21 |
melwitt | clarkb: np. it's still doing it on my laptop for some reason ... but my phone is working now | 22:22 |
clarkb | melwitt: you might need to create new ssl connections? | 22:22 |
clarkb | its also possible that ansible has helpfully undone what I did | 22:22 |
clarkb | no it hasn't undone it yet | 22:22 |
clarkb | but s_client verifies properly for me now so I'm reasonably confident this was the fix | 22:23 |
melwitt | clarkb: sorry what does that mean? | 22:23 |
fungi | melwitt: possible that your laptop is complaining for a different reason than your phone did | 22:24 |
clarkb | melwitt: openssl s_client is openssl's command line tool to make ssl connections and it prints a bunch of debugging info about the ssl connection setup. Previously I could confirm that s_client did not like the ssl cert setup on that server as it had no way to verify the cert through the intermediate. But I've since fixed that | 22:24 |
clarkb | melwitt: `openssl s_client -connect graphite.opendev.org:443` if you want to try it locally. Look for Verification: OK | 22:24 |
melwitt | oh, sorry, I'm dumb. graphite.opendev.org says "No Data" when you visit it without navigating anywhere and I didn't realize that | 22:24 |
clarkb | melwitt: ah yup you have to select a graph | 22:25 |
clarkb | er not a graph but things to graph | 22:25 |
melwitt | grafana dashboards all looking good on phone and laptop | 22:25 |
melwitt | thanks both! | 22:25 |
clarkb | excellent, and thank you as I probably wouldn't have thought to check ssl | 22:25 |
clarkb | (since it was working in my browser) | 22:26 |
melwitt | :) | 22:26 |
fungi | yeah, the fix should hopefully merge and deploy shortly, i've already approved it | 22:26 |
fungi | fedora 34 mirror is still broken by the way, we probably need to wait for the site we're mirroring from to refresh | 22:36 |
fungi | or switch to a different source mirror if it persists for too long | 22:37 |
clarkb | noted | 22:37 |
fungi | unfortunately i've got a whole stack of tox role fixes for zuul-jobs i've been trying to get merged for a month, and every time they get approved there's some new regression which becomes a blocker | 22:39 |
clarkb | :( | 22:39 |
fungi | now it's f34 testing | 22:40 |
clarkb | fungi: don't worry tomorrow will make you forget all about tox :P | 22:40 |
clarkb | "its the final countdown" | 22:40 |
fungi | indeed | 22:43 |
mordred | clarkb: what exciting thing is happening tomorrow? | 22:58 |
fungi | it's a week until openstack xena releases | 23:00 |
clarkb | mordred: lets encrypt's old root signing cert expires | 23:01 |
fungi | and that, yep | 23:01 |
mordred | Oh joy | 23:01 |
clarkb | mordred: a bunch of stuff is expected to stop working. And one of the compromises they are making is that by default they are doing a thing that will work for android but break openssl and gnutls if they are too old | 23:02 |
clarkb | I'm sure that compromise was made because there are billions of android devices that are way out of date but I can't help but feel that is the wrong prioritization | 23:02 |
fungi | because who cares about crufty old servers, as long as old cellphones can still browse stuff | 23:02 |
clarkb | we have a number of options available to us including removing the android hack from our cert chains and adding the new root explicitly to our servers trust chains | 23:03 |
clarkb | I think ESM also addresses this | 23:03 |
clarkb | and so on | 23:03 |
clarkb | It is hard to know right now what the total impact might be but fungi and I poked around this morning and made sure that our existing LE certs are at least doing the right thing (except for the one that melwitt found but that was on our end not acme.sh or LE's) | 23:04 |
clarkb | and I prepared a change to drop the older android compat if that helps us (since older android isn't really a primary user of our services) | 23:04 |
fungi | yeah, that was more just that we had pretty terribly misconfigured nginx in the graphite container | 23:04 |
fungi | because basically nowhere else do we use nginx | 23:04 |
clarkb | but we expect some non zero fallout and will try to debug and workaround it best we can | 23:04 |
clarkb | the nuclear option is we replace LE certs with sectigo certs | 23:05 |
clarkb | then use the year that gives us to upgrade things appropriately | 23:05 |
fungi | i have some spare change under my sofa cushions | 23:05 |
opendevreview | Merged opendev/system-config master: Use fullchain.cer on graphite for nginx https://review.opendev.org/c/opendev/system-config/+/811803 | 23:05 |
mordred | Wow. That does sound like it'll make everyone forget about tox | 23:07 |
fungi | carefully timed for maximum impact on openstack release week | 23:07 |
clarkb | also firefox shows me the ISRG root now | 23:09 |
clarkb | I have no idea what happened there | 23:09 |
clarkb | the one good thing about requests using its own chain by default :) | 23:09 |