Ramereth[m] | ianw fungi: please try using the public7 network. You should have a lot more ips to use now | 01:42 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/system-config master: nodepool osuosl : use public7 https://review.opendev.org/c/opendev/system-config/+/888967 | 01:45 |
ianw | ^ should do that | 01:48 |
Clark[m] | I'll review shortly | 02:11 |
clarkb | 23.253.22.137 is the held etherpad node and I think the colors and username are working | 02:21 |
clarkb | others should test that node (have to use /etc/hosts due to redirects) and review https://review.opendev.org/c/opendev/system-config/+/887006 | 02:22 |
opendevreview | Merged opendev/system-config master: nodepool osuosl : use public7 https://review.opendev.org/c/opendev/system-config/+/888967 | 03:10 |
*** | benj_7 is now known as benj_ | 08:16 |
frickler | fungi: did you want to do some further investigation regarding the meetpad cert or should we just restart to pick up the new one? | 14:27 |
fungi | frickler: i think we're just waiting for the apache workers to age out and get recycled, but a restart to force that sooner is probably fine (odds are nobody's using it for a call at the moment, though i can also do it later on when that's less likely) | 14:49 |
*** | dviroel__ is now known as dviroel | 14:55 |
Ramereth[m] | Are jobs running properly now on our cluster? | 15:03 |
fungi | Ramereth[m]: i see osuosl-regionone nodes with an in-use status at https://zuul.opendev.org/t/openstack/nodes so looks like yes | 15:08 |
Ramereth[m] | excellent! | 15:09 |
fungi | thanks for working on it! | 15:09 |
frickler | fungi: hmm, I did a quick check and we seem to be running nginx there, not apache. the worker processes are all very old. there is some weird "Unknown failure: 0" at the end of the LE refresh log https://paste.opendev.org/show/bIIKveSXIrsgfPqQVayW/ not sure if related | 15:14 |
fungi | oh, i forgot those were nginx. maybe its reload behavior is different from what we're used to with apache | 15:17 |
fungi | that "Unknown failure: 0" could very well be from the httpd reload | 15:17 |
frickler | in particular it is nginx in a jitsi/web container | 15:18 |
fungi | looks like our "letsencrypt updated meetpad01-main" handler calls roles/letsencrypt-create-certs/handlers/restart_jitsi_meet.yaml which does a `docker-compose restart web` on the server | 15:21 |
fungi | but that's only called if `pgrep -f nginx` exits 0 | 15:21 |
fungi | though maybe if acme.sh had a nonzero exit we never called that handler at all? | 15:23 |
fungi | we probably still have the ansible logs from that run | 15:23 |
fungi | i'm not immediately spotting any failures in /var/log/ansible/letsencrypt.yaml.log.2023-07-15T02:48:35 | 15:28 |
frickler | that restart play must have failed earlier though, the cert hasn't been updated: | 15:30 |
frickler | -rw-r--r-- 1 root root 5638 May 15 02:55 /var/jitsi-meet/web/keys/cert.crt | 15:30 |
frickler | well, it looks like it didn't actually fail, but the handler also doesn't seem to get triggered for some reason | 15:35 |
frickler | might be a special case: acme.sh already tried to refresh a day earlier, the cert issuance failed there, but the dns auth still seems to have been valid on the day after | 15:51 |
frickler | not sure how to reproduce that, nor how to clean up now | 15:52 |
fungi | we could try moving the staged cert out of the way and let it run again tomorrow | 15:55 |
Clark[m] | The cert did update though? I don't know if moving the cert aside triggers a reissue request. I thought simply restarting the container would fix it since it appeared all the cert data had updated | 19:42 |
fungi | the cert updated but ansible didn't copy it into place | 19:58 |
fungi | from what i can see | 19:58 |
Clark[m] | Ah we copy those certs rather than use them in place. And the copy happens in the handler so the handler probably didn't fire. I'm caught up now | 20:11 |
Clark[m] | We have had issues with flaky Ansible handlers due to Ansible bugs but if acme.sh failed then it wouldn't fire either | 20:11 |
Clark[m] | Moving the file aside won't cause the handler to run though so I don't think that will help | 20:14 |
fungi | i meant convince acme.sh that it needs to renew again and see if it actually triggers the handler this time | 20:16 |
fungi | i figured moving the staged cert out of the way might force another renewal | 20:17 |
fungi | basically make it think this was a fresh deployment | 20:17 |
Clark[m] | Ya that might work. I forget which things retrigger acme. | 20:27 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to Gitea 1.20 https://review.opendev.org/c/opendev/system-config/+/886993 | 23:06 |
clarkb | that patchset updates the commit message TODO list after I checked a few things and tries to add log collection for the gitea access log so that we can check that content | 23:09 |
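The gating fungi describes at 15:21 (the handler restarts the jitsi/web container, but only when `pgrep -f nginx` exits 0) can be sketched as a small shell simulation. The function name and echoed messages below are invented for illustration; the real handler runs `docker-compose restart web` rather than echoing:

```shell
# Pure simulation of the handler's gate. The argument stands in for
# the exit status of `pgrep -f nginx` (0 = an nginx process was found).
maybe_restart_web() {
    if [ "$1" -eq 0 ]; then
        # The real handler would run: docker-compose restart web
        echo "restart web"
    else
        echo "skip: nginx not running"
    fi
}

maybe_restart_web 0   # nginx found
maybe_restart_web 1   # nginx absent
```

This also illustrates the failure mode discussed above: if the handler is never notified (or the `pgrep` guard fails), the renewed cert is never copied in and the container keeps serving the old one.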
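frickler's evidence at 15:30 was the deployed cert's mtime; an equivalent check is to compare the staged and deployed certs by fingerprint. This is a self-contained sketch using a throwaway self-signed cert, since the real meetpad staging path is not shown in the log:

```shell
#!/bin/sh
# Compare two cert copies by SHA-256 fingerprint, the kind of check
# used above to tell whether the handler ever copied the renewed
# cert into place. Paths here are temporary files for illustration,
# not the real /var/jitsi-meet/web/keys/ or acme.sh locations.
set -e
tmp=$(mktemp -d)

# Stand-in for the acme.sh-staged cert: a fresh self-signed cert.
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$tmp/key.pem" \
    -out "$tmp/staged.crt" -days 1 -subj "/CN=example" 2>/dev/null

# Stand-in for the copy served by nginx (here: in sync).
cp "$tmp/staged.crt" "$tmp/deployed.crt"

a=$(openssl x509 -in "$tmp/staged.crt" -noout -fingerprint -sha256)
b=$(openssl x509 -in "$tmp/deployed.crt" -noout -fingerprint -sha256)

if [ "$a" = "$b" ]; then
    echo "deployed cert matches staged cert"
else
    echo "deployed cert is stale: handler never copied the renewal"
fi
```

In the incident above the two copies would have differed, pointing at the un-fired handler rather than at acme.sh itself.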
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!