opendevreview | James Page proposed openstack/project-config master: sunbeam: retire all single charm repositories https://review.opendev.org/c/openstack/project-config/+/903666 | 11:04 |
opendevreview | James Page proposed openstack/project-config master: Fix the ACL associated with charm-keystone-ldap-k8s https://review.opendev.org/c/openstack/project-config/+/903667 | 11:14 |
*** haleyb|out is now known as haleyb | 14:24 | |
*** jamesdenton_ is now known as jamesdenton | 15:37 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Temporarily lower max-servers for linaro https://review.opendev.org/c/openstack/project-config/+/903708 | 16:36 |
ttx | dpawlik fungi I'm looking at AWS dashboards and I can tell that Fargate is used for logstash. Trying to see if I can find something to explain the doubling in resources... | 16:41 |
ttx | OK I think there is a thing that doubled on Oct 18. There are two load balancers defined in the Fargate logstash thing, and one seems to constantly point to "unhealthy" targets. It used to have an average of 7 unhealthy targets (and 0 healthy ones) until Oct 18 14utc. Starting Oct 18 15utc the average is 18 unhealthy targets. So my read is that it was dysfunctional before, but since Oct 18 it spends twice as many resources to be dysfunctional | 16:52 |
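(For reference: the target health ttx is reading off the dashboards can also be pulled with the AWS SDK instead of the console. A minimal sketch with boto3, assuming access to the account; the target group ARN below is a placeholder, the real one would belong to the Fargate logstash stack:)

    import boto3  # AWS SDK for Python

    elbv2 = boto3.client('elbv2')

    # Placeholder ARN for the port-9999 target group discussed above.
    TARGET_GROUP_ARN = 'arn:aws:elasticloadbalancing:...:targetgroup/logstash-9999/...'

    # Lists each registered target with its health state and the reason
    # the health check reports (e.g. Target.Timeout, Target.FailedHealthChecks).
    for desc in elbv2.describe_target_health(
            TargetGroupArn=TARGET_GROUP_ARN)['TargetHealthDescriptions']:
        target, health = desc['Target'], desc['TargetHealth']
        print(target['Id'], target.get('Port'), health['State'],
              health.get('Reason', ''))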
fungi | ttx: thanks, that's an interesting piece of information. dpawlik: maybe we can temporarily down the half of the lb that's always pointing to unhealthy targets? do you have access to try that? | 16:54 |
ttx | the one on port 9600 looks ok, but the one on 9999 is clearly broken and useless. The increase in resources consumed might just be that Fargate got more efficient at respawning failed things. | 17:00 |
fungi | i wonder if the two load balancers are for two different parts of the service, and shutting down either of them will lead to disabling all log ingestion... but without a better understanding of the design, turning the seemingly broken one off is probably the easiest thing to try | 17:04 |
ttx | I mean... It's clearly not doing anything since the target hosts are all unhealthy | 17:07 |
fungi | yeah, i guess if it's that broken then disabling it probably won't make things any worse | 17:07 |
ttx | I could reduce the number of target hosts. That way it will still be unhealthy, but use fewer resources being unhealthy | 17:08 |
ttx | that sounds safe enough... | 17:08 |
fungi | sounds worth trying, i agree | 17:09 |
ttx | I keep the LB and the target group, I just reduce the number of targets from 30 to, say, 10 | 17:10 |
ttx | hmm, deregistering them is not enough, it still keeps the objective of 30 | 17:12 |
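(The "keeps the objective of 30" behavior fits the targets being tasks of an ECS service: deregistering a target only removes it from the load balancer, and the service then starts a replacement to satisfy its desired count, so the count springs back. Lowering the desired count itself is what would stick. A sketch, assuming the targets really are ECS tasks; the cluster and service names are placeholders:)

    import boto3

    ecs = boto3.client('ecs')

    # Placeholder names; the real ones belong to the logstash stack.
    # ECS stops the surplus tasks and the load balancer deregisters
    # them automatically, so no separate deregister call is needed.
    ecs.update_service(
        cluster='logstash-cluster',
        service='logstash-ingest-9999',
        desiredCount=10,  # down from 30
    )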
opendevreview | Merged openstack/project-config master: Temporarily lower max-servers for linaro https://review.opendev.org/c/openstack/project-config/+/903708 | 17:12 |
ttx | hmm it peaks at 28 unhealthy after I removed two, so it has an effect | 17:23 |
ttx | hmm, not really | 17:27 |
ttx | it seems to have some kind of effect, now the number of unhealthy targets is down, so I think it does spend less energy to be dysfunctional | 17:55 |
ttx | I'll look at it again tomorrow to see if it sticks, then observe if it results in a drop in resources. Would still be good to be able to ask the person who set it up whether that load balancer connected to unhealthy targets on port 9999 actually serves any purpose. | 17:57 |
pmatulis | anyone else having trouble connecting to https://opendev.org/ ? it's touch and go for me | 18:55 |
fungi | pmatulis: no problems here... are you getting to it over ipv4 or ipv6? | 19:08 |
JayF | WFM over my ipv4 | 19:08 |
fungi | a traceroute might help | 19:08 |
pmatulis | ipv4 ... digging | 19:10 |
pmatulis | yeah i think it's a regional carrier issue. i also had trouble accessing another site earlier | 19:15 |
pmatulis | https://imgur.com/a/0wXvLFq | 19:17 |
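(A quick way to answer fungi's IPv4-vs-IPv6 question from the affected machine, without traceroute: attempt a TCP connection to opendev.org over each address family separately. A Python sketch using only the standard library:)

    import socket

    HOST, PORT = 'opendev.org', 443

    for family, label in ((socket.AF_INET, 'IPv4'), (socket.AF_INET6, 'IPv6')):
        try:
            # Resolve over one address family only, then try a TCP handshake.
            addr = socket.getaddrinfo(HOST, PORT, family,
                                      socket.SOCK_STREAM)[0][4][0]
            with socket.create_connection((addr, PORT), timeout=5):
                print(f'{label} {addr}: reachable')
        except OSError as exc:
            print(f'{label}: failed ({exc})')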
fungi | pmatulis: likely an asymmetric route if the failure is on the last hop, i'll try tracing back to one of the early addresses in your trace for comparison | 20:52 |
fungi | pmatulis: confirmed, return path goes through cogent, who doesn't know how to reach you: https://paste.opendev.org/show/bbb1ztgC80lzTV6zfQwz/ | 20:54 |
fungi | cogent has a looking glass that confirms it too: https://www.cogentco.com/en/looking-glass | 20:56 |
fungi | plug in ipv4 trace, us - san jose (the pop indicated in our trace), and 64.230.11.206 (the earliest hop in your trace) | 20:57 |
fungi | their query form doesn't have deep-linking that i can find, sorry | 20:57 |
fungi | if you're not familiar with traceroute's annotations, !N is short for "network unreachable" | 20:59 |
* fungi goes back to pretending he wasn't a network engineer in a former life | 21:00 |