Ramereth[m] | ianw fungi: please try using the public7 network. You should have a lot more ips to use now | 01:42 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/system-config master: nodepool osuosl : use public7 https://review.opendev.org/c/opendev/system-config/+/888967 | 01:45 |
ianw | ^ should do that | 01:48 |
Clark[m] | I'll review shortly | 02:11 |
clarkb | 23.253.22.137 is the held etherpad node and I think the colors and username are working | 02:21 |
clarkb | others should test that node (have to use /etc/hosts due to redirects) and review https://review.opendev.org/c/opendev/system-config/+/887006 | 02:22 |
opendevreview | Merged opendev/system-config master: nodepool osuosl : use public7 https://review.opendev.org/c/opendev/system-config/+/888967 | 03:10 |
*** | benj_7 is now known as benj_ | 08:16 |
frickler | fungi: did you want to do some further investigation regarding the meetpad cert or should we just restart to pick up the new one? | 14:27 |
fungi | frickler: i think we're just waiting for the apache workers to age out and get recycled, but a restart to force that sooner is probably fine (odds are nobody's using it for a call at the moment, though i can also do it later on when that's less likely) | 14:49 |
*** | dviroel__ is now known as dviroel | 14:55 |
Ramereth[m] | Are jobs running properly now on our cluster? | 15:03 |
fungi | Ramereth[m]: i see osuosl-regionone nodes with an in-use status at https://zuul.opendev.org/t/openstack/nodes so looks like yes | 15:08 |
Ramereth[m] | excellent! | 15:09 |
fungi | thanks for working on it! | 15:09 |
frickler | fungi: hmm, I did a quick check and we seem to be running nginx there, not apache. the worker processes are all very old. there is some weird "Unknown failure: 0" at the end of the LE refresh log https://paste.opendev.org/show/bIIKveSXIrsgfPqQVayW/ not sure if related | 15:14 |
fungi | oh, i forgot those were nginx. maybe its reload behavior is different from what we're used to with apache | 15:17 |
fungi | that "Unknown failure: 0" could very well be from the httpd reload | 15:17 |
frickler | in particular it is nginx in a jitsi/web container | 15:18 |
fungi | looks like our "letsencrypt updated meetpad01-main" handler calls roles/letsencrypt-create-certs/handlers/restart_jitsi_meet.yaml which does a `docker-compose restart web` on the server | 15:21 |
fungi | but that's only called if `pgrep -f nginx` exits 0 | 15:21 |
fungi | though maybe if acme.sh had a nonzero exit we never called that handler at all? | 15:23 |
fungi | we probably still have the ansible logs from that run | 15:23 |
fungi | i'm not immediately spotting any failures in /var/log/ansible/letsencrypt.yaml.log.2023-07-15T02:48:35 | 15:28 |
frickler | that restart play must have failed earlier though, the cert hasn't been updated: | 15:30 |
frickler | -rw-r--r-- 1 root root 5638 May 15 02:55 /var/jitsi-meet/web/keys/cert.crt | 15:30 |
frickler | well, it looks like it didn't actually fail, but the handler also doesn't seem to get triggered for some reason | 15:35 |
frickler | might be a special case: acme.sh already tried to refresh a day earlier, the cert issuance failed there, but the dns auth still seems to have been valid on the day after | 15:51 |
frickler | not sure how to reproduce that, nor how to clean up now | 15:52 |
fungi | we could try moving the staged cert out of the way and let it run again tomorrow | 15:55 |
Clark[m] | The cert did update though? I don't know if moving the cert aside triggers a reissue request. I thought simply restarting the container would fix it since it appeared all the cert data had updated | 19:42 |
fungi | the cert updated but ansible didn't copy it into place | 19:58 |
fungi | from what i can see | 19:58 |
Clark[m] | Ah we copy those certs rather than use them in place. And the copy happens in the handler so the handler probably didn't fire. I'm caught up now | 20:11 |
Clark[m] | We have had issues with flaky Ansible handlers due to Ansible bugs but if acme.sh failed then it wouldn't fire either | 20:11 |
Clark[m] | Moving the file aside won't cause the handler to run though so I don't think that will help | 20:14 |
fungi | i meant convince acme.sh that it needs to renew again and see if it actually triggers the handler this time | 20:16 |
fungi | i figured moving the staged cert out of the way might force another renewal | 20:17 |
fungi | basically make it think this was a fresh deployment | 20:17 |
Clark[m] | Ya that might work. I forget which things retrigger acme. | 20:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to Gitea 1.20 https://review.opendev.org/c/opendev/system-config/+/886993 | 23:06 |
clarkb | that patchset updates the commit message TODO list after I checked a few things and tries to add log collection for the gitea access log so that we can check that content | 23:09 |
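The gating fungi describes at 15:21 (the handler restarts the jitsi/web container, but only when `pgrep -f nginx` exits 0) can be sketched as a small shell simulation. The function name and echoed messages below are invented for illustration; the real handler runs `docker-compose restart web` rather than echoing:

```shell
# Pure simulation of the handler's gate. The argument stands in for
# the exit status of `pgrep -f nginx` (0 = an nginx process was found).
maybe_restart_web() {
    if [ "$1" -eq 0 ]; then
        # The real handler would run: docker-compose restart web
        echo "restart web"
    else
        echo "skip: nginx not running"
    fi
}

maybe_restart_web 0   # nginx found
maybe_restart_web 1   # nginx absent
```

This also illustrates the failure mode discussed above: if the handler is never notified (or the `pgrep` guard fails), the renewed cert is never copied in and the container keeps serving the old one.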
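frickler's evidence at 15:30 was the deployed cert's mtime; an equivalent check is to compare the staged and deployed certs by fingerprint. This is a self-contained sketch using a throwaway self-signed cert, since the real meetpad staging path is not shown in the log:

```shell
#!/bin/sh
# Compare two cert copies by SHA-256 fingerprint, the kind of check
# used above to tell whether the handler ever copied the renewed
# cert into place. Paths here are temporary files for illustration,
# not the real /var/jitsi-meet/web/keys/ or acme.sh locations.
set -e
tmp=$(mktemp -d)

# Stand-in for the acme.sh-staged cert: a fresh self-signed cert.
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$tmp/key.pem" \
    -out "$tmp/staged.crt" -days 1 -subj "/CN=example" 2>/dev/null

# Stand-in for the copy served by nginx (here: in sync).
cp "$tmp/staged.crt" "$tmp/deployed.crt"

a=$(openssl x509 -in "$tmp/staged.crt" -noout -fingerprint -sha256)
b=$(openssl x509 -in "$tmp/deployed.crt" -noout -fingerprint -sha256)

if [ "$a" = "$b" ]; then
    echo "deployed cert matches staged cert"
else
    echo "deployed cert is stale: handler never copied the renewal"
fi
```

In the incident above the two copies would have differed, pointing at the un-fired handler rather than at acme.sh itself.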
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!