corvus | zuul01 is back up along with apache, and lb is serving both | 00:00 |
---|---|---|
clarkb | yup I'm back to hitting 01 | 00:00 |
corvus | we can make a dashboard the "default"... i could do that for the zuul status page? | 00:01 |
corvus | oh, or is that just for that account? | 00:01 |
corvus | in which case.. maybe fungi's idea :) | 00:02 |
corvus | oh there's an organization default_home_dashboard_path | 00:03 |
corvus | okay, that's not really working and i don't think it's worth sinking more time into :) | 00:07 |
opendevreview | Merged opendev/system-config master: Do more robust checks against zuul-web with haproxy https://review.opendev.org/c/opendev/system-config/+/832141 | 00:15 |
fungi | yeah, https://grafana.opendev.org/dashboards seems like a more useful landing page than the root page | 00:24 |
Clark[m] | In theory we are doing the http checks now and I can still get https://zuul.opendev.org | 01:20 |
corvus | i will try the stop experiment again | 01:23 |
corvus | it's only downed finger... | 01:24 |
corvus | do we need to reload haproxy? | 01:24 |
corvus | hrm, i restarted haproxy and that hasn't improved things | 01:25 |
corvus | oooh | 01:28 |
corvus | Clark: apache returns 302 for options / and that's acceptable for haproxy -- at least for http | 01:29 |
corvus | checking the https logs now | 01:29 |
fungi | oh, i think it's our redirecting | 01:30 |
fungi | we probably need to check something other than / | 01:30 |
corvus | yeah... i'm not seeing the options request for https though... | 01:30 |
corvus | so i don't know what's going on there | 01:30 |
corvus | it may still be redirecting because of a hostname mismatch | 01:31 |
corvus | but i'd like to confirm | 01:31 |
fungi | option httpchk GET /wherever | 01:32 |
Clark[m] | Ah ya that is configurable | 01:32 |
Clark[m] | HEAD not GET is probably better | 01:32 |
fungi | option httpchk GET / HTTP/1.1\r\nHost: zuul.opendev.org | 01:33 |
corvus | also, btw, someone is crawling all the builds with python urllib | 01:33 |
fungi | that may be the replacement for logstash/elasticsearch | 01:33 |
Clark[m] | I'm doing dinner stuff so can't push anything now | 01:33 |
corvus | well, before we push anything, i still want to understand the https frontend | 01:34 |
corvus | i can't find any evidence of a check there | 01:34 |
Clark[m] | Ah | 01:36 |
fungi | i can't figure out how to manually emulate a browser when opening an ssl socket to apache on one of the schedulers | 01:42 |
fungi | no matter what i try to pass after opening the socket, apache sends back a 400 bad request | 01:43 |
Clark[m] | fungi: curl has a verbose mode that will show you all that iirc | 01:44 |
fungi | yeah, i'm able to use openssl s_client to connect to gitea01:3000 and send 'GET / HTTP/1.1\r\nHost: opendev.org' and get page content back | 01:46 |
fungi | can't seem to do something similar to zuul01 or zuul02 though | 01:46 |
corvus | only try zuul02 right now; 01 is down | 01:49 |
corvus | and /api/info is a good endpoint to get | 01:49 |
fungi | okay, i think i needed -noservername | 01:50 |
fungi | seems s_client may have been trying to do sni | 01:50 |
fungi | no, nevermind, i wasn't connecting to the host i meant to | 01:51 |
fungi | curl seems to be able to get content, yeah, setting --resolve zuul.opendev.org:443:104.130.246.31 | 01:54 |
corvus | i'm trying various options with haproxy, and i still don't see any https healthchecks | 01:56 |
corvus | option httpchk GET /api/info HTTP/1.1\r\nHost:\ zuul.opendev.org | 01:58 |
corvus | server zuul01.opendev.org 104.130.246.57:443 check-ssl verify none | 01:59 |
corvus | that's the current config :/ | 01:59 |
fungi | is it actually connecting? | 02:01 |
corvus | i see no evidence we have ever executed an https health check, and therefore zuul01 is currently in the load balancer despite being down. | 02:01 |
corvus | this may not be working on the gitea lb either -- i don't think there's anything zuul-specific about this | 02:02 |
fungi | yeah, i'm trying to figure out if it logs health check failures at all | 02:03 |
corvus | it's time for dinner here, so i'm going to restore the config and bring zuul01 back up; but this is definitely worth more investigation | 02:03 |
fungi | looking at the syslog on gitea-lb01 | 02:03 |
corvus | fungi: it does -- you can see them on zuul-lb01 for finger (since that is failing health checks) | 02:03 |
fungi | and yes, it's getting lateish here too but i'll keep poking for a bit | 02:03 |
corvus | i'm looking at the apache logs for the actual checks, and i saw them for http but not https | 02:03 |
corvus | okay, haproxy and zuul01 should both be coming back up now; it may take a bit for zuul01 to be back in service | 02:04 |
Clark[m] | What's odd is how it's up at all without doing checks | 02:10 |
Clark[m] | Seems like it would have to check it is up first. We had issues in testing when it tried to verify ssl implying at least then it checked | 02:11 |
corvus | i'm wondering if it's silently falling back to tcp checks; we could probably confirm by downing apache (but i'm not going to do that right now because i'm no longer here) | 02:14 |
fungi | i'm wondering if we've accidentally configured it for passive checks | 02:32 |
fungi | still reading up on haproxy's active vs passive checks | 02:33 |
fungi | yeah | 02:33 |
fungi | Whereas an active health check continually polls the server with either a TCP connection or an HTTP request, a passive health check monitors live traffic for errors. You can enable this mode by adding the check, observe, error-limit, and on-error parameters to a server line | 02:34 |
fungi | since we're forwarding at layer 4 not 7, i don't think we can use passive health checks | 02:34 |
fungi | https://www.haproxy.com/blog/how-to-enable-health-checks-in-haproxy/ explains a lot of the possibilities | 02:38 |
fungi | i think we need to set "http-check connect ssl" | 02:40 |
fungi | it looks like the check-ssl parameter on the server lines may only apply to passive checks not active | 02:41 |
*** mazzy5098812929580851 is now known as mazzy509881292958085 | 02:52 |
fungi | happy to push a change to add that if others' readings agree | 03:02 |
Clark[m] | fungi: the thing that confuses me is why is the ssl version different from the http version? | 03:36 |
Clark[m] | We are doing active checks with http but not https I guess because we don't tell it to connect with ssl? Your suggestion can't hurt | 03:37 |
*** mtreinish_ is now known as mtreinish | 09:12 |
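
For reference, the HTTPS health check the channel was trying to get haproxy to run can be sketched in Python. This is a minimal, hypothetical illustration (the function names are made up, and it is not how haproxy implements checks). It assumes haproxy's default `option httpchk` rule that any 2xx or 3xx status passes, which matches corvus seeing a 302 for `/` accepted. It connects to one backend address, disables certificate verification, and sends SNI plus a `Host: zuul.opendev.org` header for `/api/info`, roughly what `check-ssl verify none` combined with the `option httpchk GET /api/info ...` line is meant to do.

```python
import socket
import ssl


def check_passes(status: int) -> bool:
    """Assumed default httpchk rule: 2xx and 3xx pass, anything else fails."""
    return 200 <= status < 400


def build_check_request(path: str, host: str) -> bytes:
    """Raw HTTP/1.1 request, like the one typed into openssl s_client."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status(response_head: bytes) -> int:
    """Extract the status code from e.g. b'HTTP/1.1 302 Found\\r\\n...'."""
    status_line = response_head.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def https_check(backend_ip: str, host: str = "zuul.opendev.org",
                path: str = "/api/info", port: int = 443,
                timeout: float = 5.0) -> bool:
    """Run one HTTPS health check against a single backend by IP address."""
    ctx = ssl.create_default_context()
    # Approximates "verify none": check_hostname must be disabled before
    # verify_mode can be set to CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((backend_ip, port), timeout=timeout) as raw:
            # Send SNI for the public name so the right vhost answers,
            # similar to curl --resolve zuul.opendev.org:443:<ip>.
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                tls.sendall(build_check_request(path, host))
                head = tls.recv(4096)
    except OSError:
        # Refused, timed out, or TLS handshake failed (ssl.SSLError is an
        # OSError subclass): treat the backend as down.
        return False
    if not head:
        return False
    try:
        return check_passes(parse_status(head))
    except (IndexError, ValueError):
        # Malformed status line: fail the check.
        return False
```

For example, `https_check("104.130.246.31")` would probe the address fungi passed to `curl --resolve`. Because a 302 also passes under the assumed rule, pointing the check at `/api/info` rather than `/` is what makes a redirecting vhost distinguishable from a working zuul-web.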