Saturday, 2021-05-15

frigomorning! looks down07:40
fungifrigo seems to be gone, but i'm looking into it now11:45
fungithe haproxy-docker_haproxy_1 container on gitea-lb01 is in a "restarting" state according to `docker-compose ps`11:49
fungidowning and upping the container just brings it back to a restarting state11:50
fungithe last time syslog records haproxy forwarding anything was at 06:37:4711:52
fungi`docker image list` says the haproxy "latest" tag is for an image built 17 hours ago11:54
fungiwondering if something changed with it11:54
fungiswitching from latest to lts doesn't seem to have changed anything11:59
fungioh, latest and lts both seem to point to the same thing as 2.4.012:01
fungiswitching to the 2.3 tag fixed it12:03
fungii can load again12:03
fungi#status notice The load balancer for Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now12:05
openstackstatusfungi: sending notice12:05
-openstackstatus- NOTICE: The load balancer for Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now12:05
fungiinfra-root: i've put in the emergency disable list until we can merge a change to pin to the 2.3 tag12:06
openstackstatusfungi: finished sending notice12:08
openstackgerritJeremy Stanley proposed opendev/system-config master: Temporarily pin haproxy image to 2.3
fungiyeah, prior to yesterday, latest was an alias for 2.3 and lts was an alias for 2.2, but now they're both aliased to 2.412:38
fungiaccording to we can use haproxy -c to check the validity of our configuration. maybe something's not quite right syntax-wise and 2.4 is catching it12:44
fungii don't see mention of any configuration options removed in 2.4 so i doubt it's something that simple12:47
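For reference, a minimal sketch of the `haproxy -c` check mentioned above, run in a throwaway container so a candidate image tag can be tried against the existing configuration before it is deployed. The host config path passed in is hypothetical, and the in-container path is assumed to be the default location used by the official image. This only validates configuration syntax, so it would not catch a startup failure with some other cause.

```python
import subprocess

# Assumed default config location inside the official haproxy image.
CONTAINER_CFG = "/usr/local/etc/haproxy/haproxy.cfg"


def build_check_command(tag, host_cfg):
    """Return the docker argv that runs `haproxy -c` for one image tag."""
    return [
        "docker", "run", "--rm",
        # Mount the host config read-only at the path haproxy will check.
        "-v", f"{host_cfg}:{CONTAINER_CFG}:ro",
        f"haproxy:{tag}",
        "haproxy", "-c", "-f", CONTAINER_CFG,
    ]


def config_is_valid(tag, host_cfg):
    """Run the check; haproxy exits 0 when the config parses cleanly."""
    result = subprocess.run(build_check_command(tag, host_cfg))
    return result.returncode == 0
```

Calling `config_is_valid("2.3", path)` and `config_is_valid("2.4", path)` with the same file would show whether the newer release rejects something the older one accepts.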
frigothanks:)  looks good now13:17
fungithanks frigo!13:29
fungiappreciate the heads up13:29
frigohope you got some logs to investigate13:30
frigomaybe there is a real bug upstream? :)  if you did not know what to do during the weekend13:30
fungiwell, unfortunately docker wasn't logging much as to why the new images wouldn't start, but my guess is they reorganized the image layout or something between 2.3 and 2.413:31
fungiour tests probably already reproduce the behavior so shouldn't be hard to hold a node and have a working replica of the problem combination13:32
fungimaybe at some point we'll get sophisticated enough to always pin container versions for things we don't control and then auto-propose updates to those image versions so they can be tested before we deploy13:34
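The "pin, then auto-propose updates" idea could look roughly like the sketch below: given the currently pinned tag and a list of tags published for the image, pick out the numeric versions that are newer, so each one can be proposed as a change and run through the existing tests. The function names are hypothetical, and fetching the tag list from the registry is left out; this is only the comparison step.

```python
import re

# Matches purely numeric tags such as "2.3" or "2.3.10".
_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?$")


def parse_version(tag):
    """Turn '2.3' or '2.3.10' into a comparable tuple; None for aliases."""
    match = _VERSION.match(tag)
    if match is None:
        return None  # aliases such as 'latest' or 'lts'
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups("0"))


def newer_tags(pinned, available):
    """List numeric tags newer than the pinned one, oldest first."""
    current = parse_version(pinned)
    if current is None:
        raise ValueError(f"pinned tag {pinned!r} is not a numeric version")
    candidates = []
    for tag in available:
        version = parse_version(tag)
        if version is not None and version > current:
            candidates.append((version, tag))
    return [tag for _, tag in sorted(candidates)]
```

With a pin of `2.3`, aliases like `latest` and `lts` are ignored and only `2.4` would come back as a candidate from the tags discussed here.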
frigobut.. if the new image does not start, the change was still rolled out on all the nodes ?13:34
fungifrigo: right now we just always deploy the "latest" tag of the haproxy image from dockerhub13:34
fungiand the problem is that latest flipped from 2.3.x to 2.4.0 earlier today13:35
fungiwhich for some reason doesn't start13:35
frigoyeah I get that, but the change kept being rolled out even after a first node failed?13:35
fungithere was no "change" which was rolled out13:36
fungiwe continuously update the containers13:36
fungiwhenever "latest" updates on dockerhub, periodic redeployment picks that up automatically13:37
fungiit's not tested13:37
frigook ok :D13:37
fungiit was designed naively to assume whatever the latest tag for haproxy is would work the same as previous versions13:37
fungithere are ways we could catch that, we just haven't implemented them due to limited supply of people involved in running all this13:38
frigoof course of course13:38
fungiwe do test the images we build, but we're not building an haproxy image we're just consuming the "official" one13:39
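One cheap guard against the situation described above is a lint step that flags image references which can change underneath a deployment. The sketch below is an assumption about how such a check might be written, not something from system-config; the set of floating aliases is just the two named in this discussion.

```python
# Aliases that were observed to move between releases in this incident.
FLOATING_TAGS = {"latest", "lts"}


def image_tag(reference):
    """Return the tag of an image reference, or None if it has no tag."""
    name = reference.split("@", 1)[0]   # drop any digest suffix
    last = name.rsplit("/", 1)[-1]      # skip a registry host:port prefix
    if ":" not in last:
        return None
    return last.rsplit(":", 1)[1]


def is_floating(reference):
    """True if the reference has neither a digest nor a fixed tag."""
    if "@" in reference:
        return False  # pinned by digest
    tag = image_tag(reference)
    # No tag at all means docker falls back to "latest".
    return tag is None or tag in FLOATING_TAGS
```

Note that a tag like `2.3` still moves across patch releases, so this only catches the coarsest case; pinning by digest is the stricter option.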
fungiinfra-root: i've self-approved so i can remove the emergency disable entry for the load balancer in short order13:48
fungiin theory the revert of that should fail its system-config-run-gitea build and then we can use that to investigate further13:49
openstackgerritMerged opendev/system-config master: Temporarily pin haproxy image to 2.3
openstackgerritJeremy Stanley proposed opendev/system-config master: Revert "Temporarily pin haproxy image to 2.3"
fungiit's been long enough since 791596 merged i'm taking gitea-lb01 back out of the emergency disable list now16:43
fungiand setting an autohold for system-config-run-gitea on 79159816:44
fungiif i can figure out how to plumb that through docker-compose exec now16:49
fungiyay got it added16:50
fungiwill check back later when that (hopefully) fails16:51
fungiyep, as predicted system-config-run-gitea failed on the revert, at least our testing is predictable there. held node is
fungiso at first pass, it looks like a permissions error:
