Ramereth[m]ianw fungi: please give it a try using the public7 network. You should have a lot more ips to use now01:42
opendevreviewIan Wienand proposed opendev/system-config master: nodepool osuosl : use public7
ianw^ should do that01:48
Clark[m]I'll review shortly 02:11
clarkb23.253.22.137 is the held etherpad node and I think the colors and username are working02:21
clarkbothers should test that node (have to use /etc/hosts due to redirects) and review
opendevreviewMerged opendev/system-config master: nodepool osuosl : use public7
fricklerfungi: did you want to do some further investigation regarding the meetpad cert or should we just restart to pick up the new one?14:27
fungifrickler: i think we're just waiting for the apache workers to age out and get recycled, but a restart to force that sooner is probably fine (odds are nobody's using it for a call at the moment, though i can also do it later on when that's less likely)14:49
Ramereth[m]Are jobs running properly now on our cluster?15:03
fungiRamereth[m]: i see osuosl-regionone nodes with an in-use status at so looks like yes15:08
fungithanks for working on it!15:09
fricklerfungi: hmm, I did a quick check and we seem to be running nginx there, not apache. the worker processes are all very old. there is some weird "Unknown failure: 0" at the end of the LE refresh log, not sure if related15:14
fungioh, i forgot those were nginx. maybe its reload behavior is different from what we're used to with apache15:17
fungithat "Unknown failure: 0" could very well be from the httpd reload15:17
fricklerin particular it is nginx in a jitsi/web container15:18
fungilooks like our "letsencrypt updated meetpad01-main" handler calls roles/letsencrypt-create-certs/handlers/restart_jitsi_meet.yaml which does a `docker-compose restart web` on the server15:21
fungibut that's only called if `pgrep -f nginx` exits 015:21
fungithough maybe if it had a nonzero exit we never called that handler at all?15:23
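The guard fungi describes can be modeled as a small Python sketch to make the failure mode in that question concrete: if `pgrep -f nginx` exits nonzero, no restart is issued at all. The real handler is Ansible in `roles/letsencrypt-create-certs/handlers/restart_jitsi_meet.yaml`; this function only mirrors the two commands named in the log, and its name, the `compose_dir` parameter, and the injectable `run` hook are hypothetical additions for illustration and testing.

```python
import subprocess
from typing import Callable

# Signature-compatible stand-in for subprocess.run, so the logic can be tested.
Runner = Callable[..., subprocess.CompletedProcess]


def restart_web_if_nginx_running(compose_dir: str, run: Runner = subprocess.run) -> bool:
    """Restart the jitsi 'web' service only when nginx appears to be running.

    Hypothetical model of the handler guard: the restart is skipped entirely
    when `pgrep -f nginx` exits nonzero. Returns True if a restart was issued.
    """
    # pgrep exits 0 when at least one process matches, 1 when none match.
    probe = run(["pgrep", "-f", "nginx"], capture_output=True)
    if probe.returncode != 0:
        # No match (or a pgrep error): the restart never happens, so nginx
        # keeps serving whatever cert it loaded at startup.
        return False
    # compose_dir is hypothetical: the directory holding the docker-compose
    # file for the jitsi-meet services.
    run(["docker-compose", "restart", "web"], cwd=compose_dir, check=True)
    return True
```

Injecting `run` keeps the decision logic separate from the side effects, so both branches can be exercised without touching a real server.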
fungiwe probably still have the ansible logs from that run15:23
fungii'm not immediately spotting any failures in /var/log/ansible/letsencrypt.yaml.log.2023-07-15T02:48:3515:28
fricklerthat restart play must have failed earlier though, the cert hasn't been updated:15:30
frickler-rw-r--r-- 1 root root 5638 May 15 02:55 /var/jitsi-meet/web/keys/cert.crt15:30
fricklerwell not failed it looks like, but the handler also doesn't seem to get triggered for some reason15:35
fricklermight be a special case because it already tried to refresh a day earlier; the cert issuing failed there, but the dns auth still seems to have been valid on the day after15:51
fricklernot sure how to reproduce that, nor how to clean up now15:52
fungiwe could try moving the staged cert out of the way and let it run again tomorrow15:55
Clark[m]The cert did update though? I don't know if moving the cert aside triggers a reissue request. I thought simply restarting the container would fix it since it appeared all the cert data has updated19:42
fungithe cert updated but ansible didn't copy it into place19:58
fungifrom what i can see19:58
Clark[m]Ah we copy those certs rather than use them in place. And the copy happens in the handler so the handler probably didn't fire. I'm caught up now20:11
Clark[m]We have had issues with flaky Ansible handlers due to Ansible bugs, but if it failed then it wouldn't fire either20:11
Clark[m]Moving the file aside won't cause the handler to run though so I don't think that will help20:14
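Because the handler copies the cert into place rather than the container using it in place, a renewal can succeed while nginx keeps serving the old file, which matches the May 15 timestamp frickler found. A minimal Python sketch for spotting that drift, assuming the deployed file is meant to be a byte-for-byte copy of the staged one: the deployed path `/var/jitsi-meet/web/keys/cert.crt` comes from the log, while the staged path and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def deployed_cert_is_stale(staged: Path, deployed: Path) -> bool:
    """True when the renewed (staged) cert differs from the copy being served.

    Assumes the deployed file should be an exact copy of the staged file.
    A missing deployed file also counts as stale.
    """
    if not deployed.exists():
        return True
    # Compare contents, not mtimes: a copy step may preserve or reset timestamps.
    return sha256_of(staged) != sha256_of(deployed)
```

For example, `deployed_cert_is_stale(Path("<staged cert path>"), Path("/var/jitsi-meet/web/keys/cert.crt"))` returning True would indicate the renewed cert was never copied into place.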
fungii meant convince it that it needs to renew again and see if it actually triggers the handler this time20:16
fungii figured moving the staged cert out of the way might force another renewal20:17
fungibasically make it think this was a fresh deployment20:17
Clark[m]Ya that might work. I forget which things retrigger acme.20:27
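fungi's idea relies on the renewal step treating a missing staged cert like a fresh deployment. As Clark notes, it's not certain what actually retriggers ACME here, so the following is only a hedged Python model of a common client heuristic (issue when no cert exists, renew when expiry falls within a window). The function name and the 30-day `renew_days` default are assumptions, not the letsencrypt role's actual logic.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def should_request_cert(
    cert_path: Path,
    not_after: Optional[datetime],
    now: datetime,
    renew_days: int = 30,
) -> bool:
    """Model of a typical ACME renewal check (assumed, not the role's code).

    cert_path: where the staged cert would live (hypothetical).
    not_after: the cert's expiry time, or None if it could not be read.
    """
    # No staged cert on disk: behave like a fresh deployment and issue one.
    # This is the behaviour that moving the staged cert aside would rely on.
    if not cert_path.exists() or not_after is None:
        return True
    # Otherwise renew only once expiry falls inside the renewal window.
    return not_after - now <= timedelta(days=renew_days)
```

If the real role keys its decision on something other than the staged file (for example, stored ACME state), moving the cert aside would not trigger a new request, which is the uncertainty raised above.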
opendevreviewClark Boylan proposed opendev/system-config master: Update to Gitea 1.20
clarkbthat patchset updates the commit message TODO list after I checked a few things and tries to add log collection for the gitea access log so that we can check that content23:09

