*** darshna has quit IRC | 00:33 | |
*** tkajinam has joined #opendev | 02:38 | |
*** tkajinam has quit IRC | 02:38 | |
*** darshna has joined #opendev | 06:25 | |
*** calcmandan has quit IRC | 06:35 | |
*** calcmandan has joined #opendev | 06:36 | |
*** slaweq has quit IRC | 07:38 | |
*** frigo has joined #opendev | 07:39 | |
frigo | morning! opendev.org looks down | 07:40 |
*** frigo has quit IRC | 07:47 | |
*** tosky has joined #opendev | 07:49 | |
*** avass has quit IRC | 08:39 | |
*** avass has joined #opendev | 08:40 | |
*** akahat|ruck has quit IRC | 09:20 | |
*** kopecmartin has quit IRC | 09:22 | |
*** kopecmartin has joined #opendev | 09:26 | |
*** akahat has joined #opendev | 09:44 | |
fungi | frigo seems to be gone, but i'm looking into it now | 11:45 |
fungi | the haproxy-docker_haproxy_1 container on gitea-lb01 is in a "restarting" state according to `docker-compose ps` | 11:49 |
fungi | downing and upping the container just brings it back to a restarting state | 11:50 |
fungi | the last time syslog records haproxy forwarding anything was at 06:37:47 | 11:52 |
fungi | `docker image list` says the haproxy "latest" tag is for an image built 17 hours ago | 11:54 |
fungi | wondering if something changed with it | 11:54 |
*** ykarel has joined #opendev | 11:59 | |
fungi | switching from latest to lts doesn't seem to have changed anything | 11:59 |
fungi | oh, latest and lts both seem to point to the same thing as 2.4.0 | 12:01 |
fungi | switching to the 2.3 tag fixed it | 12:03 |
fungi | i can load https://opendev.org/ again | 12:03 |
fungi | #status notice The load balancer for opendev.org Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now | 12:05 |
openstackstatus | fungi: sending notice | 12:05 |
-openstackstatus- NOTICE: The load balancer for opendev.org Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now | 12:05 | |
fungi | infra-root: i've put gitea-lb01.opendev.org in the emergency disable list until we can merge a change to pin to the 2.3 tag | 12:06 |
openstackstatus | fungi: finished sending notice | 12:08 |
*** dmsimard has quit IRC | 12:12 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Temporarily pin haproxy image to 2.3 https://review.opendev.org/c/opendev/system-config/+/791596 | 12:21 |
fungi | yeah, prior to https://github.com/docker-library/haproxy/commit/ae10fbf9 yesterday, latest was an alias for 2.3 and lts was an alias for 2.2, but now they're both aliased to 2.4 | 12:38 |
fungi | according to https://www.haproxy.com/blog/announcing-haproxy-2-4/#validation we can use haproxy -c to check the validity of our configuration. maybe something's not quite right syntax-wise and 2.4 is catching it | 12:44 |
*** ykarel_ has joined #opendev | 12:44 | |
fungi | i don't see mention of any configuration options removed in 2.4 so i doubt it's something that simple | 12:47 |
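The `haproxy -c` check mentioned at 12:44 can be run against a candidate image before it is deployed. Below is a minimal sketch under stated assumptions. The helper names are hypothetical. The in-container path `/usr/local/etc/haproxy/haproxy.cfg` is assumed to be the official image's default config location, and the host path in the usage line is made up. According to the haproxy manual, `-c` only checks the configuration and exits before trying to bind, with exit status zero on success. That means runtime failures, such as being unable to bind a listening port, fall outside what this check covers.

```python
import subprocess
from typing import List

# Assumed default config location inside the official haproxy image.
CONTAINER_CFG = "/usr/local/etc/haproxy/haproxy.cfg"


def config_check_cmd(image: str, host_cfg: str) -> List[str]:
    """Build a `docker run` command that only validates the config via `haproxy -c`."""
    return [
        "docker", "run", "--rm",
        # Mount the host config read-only at the path haproxy is told to read.
        "-v", f"{host_cfg}:{CONTAINER_CFG}:ro",
        image,
        "haproxy", "-c", "-f", CONTAINER_CFG,
    ]


def config_is_valid(image: str, host_cfg: str) -> bool:
    """Run the check; haproxy -c exits 0 when the configuration is valid."""
    result = subprocess.run(
        config_check_cmd(image, host_cfg), capture_output=True, text=True
    )
    if result.returncode != 0:
        # Surface haproxy's own diagnostics to help explain the failure.
        print(result.stdout, result.stderr)
    return result.returncode == 0
```

For example, `config_is_valid("haproxy:2.4", "/etc/haproxy-test/haproxy.cfg")` (hypothetical host path) would report whether 2.4 accepts the same file that 2.3 runs with.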
*** ykarel has quit IRC | 12:47 | |
*** frigo has joined #opendev | 13:16 | |
frigo | thanks:) looks good now | 13:17 |
fungi | thanks frigo! | 13:29 |
fungi | appreciate the heads up | 13:29 |
frigo | hope you got some logs to investigate | 13:30 |
frigo | maybe there is a real bug upstream?:) if you did not know what to do during the week-end | 13:30 |
fungi | well, unfortunately docker wasn't logging much as to why the new images wouldn't start, but my guess is they reorganized the image layout or something between 2.3 and 2.4 | 13:31 |
fungi | our tests probably already reproduce the behavior so shouldn't be hard to hold a node and have a working replica of the problem combination | 13:32 |
fungi | maybe at some point we'll get sophisticated enough to always pin container versions for things we don't control and then auto-propose updates to those image versions so they can be tested before we deploy | 13:34 |
frigo | but.. if the new image does not start, the change was still rolled out on all the nodes? | 13:34 |
fungi | frigo: right now we just always deploy the "latest" tag of the haproxy image from dockerhub | 13:34 |
fungi | and the problem is that latest flipped from 2.3.x to 2.4.0 earlier today | 13:35 |
fungi | which for some reason doesn't start | 13:35 |
frigo | yeah I get that, but the change kept being rolled out even after a first node failed? | 13:35 |
fungi | there was no "change" which was rolled out | 13:36 |
fungi | we continuously update the containers | 13:36 |
fungi | whenever "latest" updates on dockerhub, periodic redeployment picks that up automatically | 13:37 |
fungi | it's not tested | 13:37 |
frigo | ok ok :D | 13:37 |
fungi | it was designed naively to assume whatever the latest tag for haproxy is would work the same as previous versions | 13:37 |
fungi | there are ways we could catch that, we just haven't implemented them due to limited supply of people involved in running all this | 13:38 |
*** slaweq has joined #opendev | 13:38 | |
frigo | of course of course | 13:38 |
fungi | we do test the images we build, but we're not building an haproxy image we're just consuming the "official" one | 13:39 |
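The idea floated at 13:34 is to pin exact image versions and auto-propose bumps, so each new upstream release goes through CI (for example `system-config-run-gitea`) before it reaches production. Its core decision could look like the sketch below. This is an illustration only: the function names are hypothetical and the tag list is made up. It assumes that only `MAJOR.MINOR.PATCH` tags count as real pins. In the official images a tag such as `2.3` still follows the newest 2.3.x release, and `latest`/`lts` can jump between minor versions, as happened here. Fetching tags from Docker Hub and submitting the review are left out.

```python
import re
from typing import List, Optional, Tuple

# Only fully qualified MAJOR.MINOR.PATCH tags count as pins; floating
# aliases such as "latest", "lts" or "2.3" are ignored on purpose.
PINNED_TAG = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for an explicit version tag, else None."""
    match = PINNED_TAG.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def propose_update(pinned: str, available: List[str]) -> Optional[str]:
    """Return the newest explicit tag newer than `pinned`, or None if up to date."""
    current = parse_version(pinned)
    if current is None:
        raise ValueError(f"{pinned!r} is not an explicit MAJOR.MINOR.PATCH tag")
    newer = [
        tag for tag in available
        if (version := parse_version(tag)) is not None and version > current
    ]
    if not newer:
        return None
    # Tuples compare element-wise, so 2.4.0 sorts above 2.3.10.
    return max(newer, key=lambda tag: parse_version(tag))
```

With a made-up tag list `["latest", "lts", "2.2.14", "2.3", "2.3.10", "2.4.0"]`, a pin of `2.3.10` yields a proposed bump to `2.4.0`. That proposal would then be tested like any other change instead of being picked up silently by periodic redeployment.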
*** slaweq has quit IRC | 13:47 | |
fungi | infra-root: i've self-approved https://review.opendev.org/791596 so i can remove the emergency disable entry for the load balancer in short order | 13:48 |
*** frigo has quit IRC | 13:49 | |
fungi | in theory the revert of that should fail its system-config-run-gitea build and then we can use that to investigate further | 13:49 |
*** slaweq has joined #opendev | 14:08 | |
*** tosky has quit IRC | 14:11 | |
*** DSpider has joined #opendev | 14:22 | |
*** ykarel_ has quit IRC | 14:23 | |
openstackgerrit | Merged opendev/system-config master: Temporarily pin haproxy image to 2.3 https://review.opendev.org/c/opendev/system-config/+/791596 | 14:36 |
*** slaweq has quit IRC | 14:49 | |
*** ykarel_ has joined #opendev | 16:06 | |
*** ricolin has quit IRC | 16:20 | |
*** ykarel_ has quit IRC | 16:40 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Revert "Temporarily pin haproxy image to 2.3" https://review.opendev.org/c/opendev/system-config/+/791598 | 16:43 |
fungi | it's been long enough since 791596 merged i'm taking gitea-lb01 back out of the emergency disable list now | 16:43 |
fungi | and setting an autohold for system-config-run-gitea on 791598 | 16:44 |
fungi | if i can figure out how to plumb that through docker-compose exec now | 16:49 |
fungi | yay got it added | 16:50 |
fungi | will check back later when that (hopefully) fails | 16:51 |
fungi | yep, as predicted system-config-run-gitea failed on the revert, at least our testing is predictable there. held node is 188.212.108.136 | 17:36 |
fungi | so at first pass, it looks like a permissions error: https://zuul.opendev.org/t/openstack/build/678857cf702541d789cd3daa3d829907/log/gitea-lb01.opendev.org/docker/haproxy-docker_haproxy_1.txt#5-7 | 17:41 |
*** brinzhang has joined #opendev | 18:01 | |
*** brinzhang0 has quit IRC | 18:04 | |
*** auristor has quit IRC | 19:15 | |
*** auristor has joined #opendev | 19:19 | |
*** tosky has joined #opendev | 19:20 | |
*** dmsimard has joined #opendev | 21:21 | |
*** darshna has quit IRC | 22:03 | |
*** irclogbot_0 has quit IRC | 23:30 | |
*** irclogbot_1 has joined #opendev | 23:33 | |
*** tosky has quit IRC | 23:34 | |
*** artom has quit IRC | 23:45 | |
*** artom has joined #opendev | 23:45 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!