*** yamamoto has joined #openstack-lbaas | 00:10 | |
*** yamamoto has quit IRC | 00:33 | |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Adjust scenario tests for NotImplemented skip https://review.opendev.org/714004 | 00:37 |
---|---|---|
*** yamamoto has joined #openstack-lbaas | 02:30 | |
*** yamamoto has quit IRC | 02:35 | |
*** ramishra has joined #openstack-lbaas | 02:40 | |
*** sapd1_x has joined #openstack-lbaas | 02:45 | |
*** ramishra has quit IRC | 02:47 | |
*** armax has quit IRC | 02:48 | |
*** yamamoto has joined #openstack-lbaas | 02:56 | |
*** rcernin has quit IRC | 02:56 | |
sorrison | rm_work, johnsom: We have finally switched fully over to octavia, migrated the last of the old neutron lbaas over a couple weeks ago | 02:57 |
sorrison | got about 150 LBs running | 02:57 |
johnsom | Nice! | 02:58 |
*** sapd1_x has quit IRC | 02:59 | |
sorrison | mainly going ok, but about every day we get a bunch of amps going into ERROR status | 03:00 |
sorrison | been trying to figure out why | 03:00 |
*** rcernin has joined #openstack-lbaas | 03:04 | |
johnsom | Hmm, check the health manager log | 03:04 |
johnsom | That is unusual for sure. | 03:04 |
johnsom | Are they pretty stable LBs or do they have a high rate of changes going on? | 03:05 |
sorrison | both :-) | 03:06 |
sorrison | Amphora %(id)s health message was processed too ' | 03:06 |
sorrison | 'slowly: %(delay)ss! The system may be overloaded ' | 03:06 |
sorrison | 'or otherwise malfunctioning. This heartbeat has ' | 03:06 |
sorrison | 'been ignored and no update was made to the ' | 03:06 |
sorrison | 'amphora health entry. THIS IS NOT GOOD.', | 03:06 |
sorrison | We see this sometimes | 03:06 |
sorrison | I can't quite figure out why as the system is not overloaded | 03:07 |
johnsom | Oh! Yeah, huge red flag. Your database is overloaded | 03:07 |
sorrison | We have 3 health manager | 03:07 |
sorrison | our DB is def not overloaded | 03:07 |
johnsom | That means a simple db query took longer than 10 seconds to respond. | 03:07 |
sorrison | physical hardware less than a year old with nvme | 03:07 |
sorrison | we have no slow queries | 03:08 |
johnsom | I have seen this when people put 30+ containers on a single host, with the primary db and the master rabbit queue. | 03:08 |
sorrison | na we have dedicated hardware for our DB servers | 03:09 |
johnsom | Then that is very odd. We know it can handle 2000+ | 03:11 |
johnsom | From other deployments. | 03:11 |
sorrison | well we have the odd slow query so I do lie, but hardly any `Slow queries: 0% (224K/13B)` | 03:11 |
sorrison | So is it def a DB issue and couldn't be anything else? | 03:11 |
johnsom | What version are you running? | 03:11 |
sorrison | octavia is ussuri | 03:12 |
sorrison | with a couple of other patches on top, including the fail over refactor | 03:12 |
johnsom | Are there “dropped” messages or just the “this is not good” messages? | 03:13 |
sorrison | just trying to find the log string to chuck in kibana, I can't see any log messages with dropped in the string | 03:14 |
johnsom | It is either process/thread pool exhaustion or DB queries taking close to 10 seconds to respond. Assuming you haven’t changed the heartbeat interval. | 03:15 |
johnsom | Sorry on mobile so typos. | 03:16 |
sorrison | no, haven't changed the interval, something I've been thinking of but trying to figure this all out exactly first | 03:17 |
johnsom | I can give you the db query to run for monitoring or testing tomorrow when I am back in the office. | 03:17 |
sorrison | ok thanks, probably not in the middle of the work day like me :-) | 03:17 |
johnsom | It is a specially optimized query for this. I did a bunch of work optimizing that and scale testing it as we had 2000+ deployments that needed it. | 03:18 |
johnsom | Yeah, it is 8pm here. Watching a movie with the wife. | 03:19 |
johnsom | Oh, one other case we saw was an ha db deployment that was flapping master and having resync latency. | 03:21 |
*** ramishra has joined #openstack-lbaas | 03:22 | |
sorrison | yeah thought of that as we run a cluster, but all our queries are going to the 1 server and there hasn't been any flipping etc. | 03:22 |
*** rcernin has quit IRC | 03:35 | |
*** rcernin has joined #openstack-lbaas | 03:39 | |
*** sapd1_x has joined #openstack-lbaas | 03:41 | |
*** psachin has joined #openstack-lbaas | 03:51 | |
*** rcernin has quit IRC | 03:54 | |
sorrison | It seems to happen in spikes so trying to track that one down. Changed the log level from debug -> info and now getting `Health Update finished in: 0.018291685730218887 seconds` so will monitor this and see when it happens again | 03:58 |
*** rcernin has joined #openstack-lbaas | 04:08 | |
*** rcernin has quit IRC | 04:18 | |
*** rcernin has joined #openstack-lbaas | 04:19 | |
*** sapd1_x has quit IRC | 04:26 | |
*** spatel has joined #openstack-lbaas | 04:38 | |
*** spatel has quit IRC | 04:43 | |
johnsom | Yeah, that is slower than my cloud, but still respectable. I usually get 0.006... | 04:55 |
johnsom | 10 | 04:55 |
johnsom | 10 is where we start having problems with it. | 04:56 |
*** vishalmanchanda has joined #openstack-lbaas | 05:01 | |
*** ccamposr has joined #openstack-lbaas | 06:36 | |
*** ataraday_ has joined #openstack-lbaas | 07:18 | |
*** ccamposr__ has joined #openstack-lbaas | 07:32 | |
*** ccamposr has quit IRC | 07:34 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Add option to set default ssl ciphers in haproxy https://review.opendev.org/685337 | 08:11 |
*** kevinz has joined #openstack-lbaas | 08:32 | |
*** gcheresh has joined #openstack-lbaas | 08:49 | |
*** gcheresh has quit IRC | 08:55 | |
*** gcheresh has joined #openstack-lbaas | 09:08 | |
*** spatel has joined #openstack-lbaas | 09:20 | |
*** spatel has quit IRC | 09:24 | |
*** gcheresh has quit IRC | 09:27 | |
*** vishalmanchanda has quit IRC | 09:27 | |
cgoncalves | octavia-tox-functional-py37-tips (voting) is failing because of a recent change in octavia-lib. https://review.opendev.org/#/c/744520/ fixes the gate. if you folks have some time to review it... :) | 10:12 |
*** yamamoto has quit IRC | 10:20 | |
*** gcheresh has joined #openstack-lbaas | 10:32 | |
*** gcheresh has quit IRC | 10:40 | |
*** yamamoto has joined #openstack-lbaas | 10:51 | |
*** yamamoto has quit IRC | 12:25 | |
*** servagem has quit IRC | 12:54 | |
*** rcernin has quit IRC | 12:56 | |
*** servagem has joined #openstack-lbaas | 12:57 | |
*** spatel has joined #openstack-lbaas | 13:00 | |
*** spatel has quit IRC | 13:04 | |
*** yamamoto has joined #openstack-lbaas | 13:05 | |
*** yamamoto has quit IRC | 13:10 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Add SCTP support in API and Amphora https://review.opendev.org/738381 | 13:19 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: WIP SCTP traffic scenario tests https://review.opendev.org/738643 | 13:20 |
*** vishalmanchanda has joined #openstack-lbaas | 13:38 | |
openstackgerrit | Gregory Thiemonge proposed openstack/python-octaviaclient master: Add SCTP support https://review.opendev.org/748667 | 13:49 |
*** ataraday_ has quit IRC | 13:59 | |
*** TrevorV has joined #openstack-lbaas | 14:00 | |
openstackgerrit | Gregory Thiemonge proposed openstack/python-octaviaclient master: Add SCTP support https://review.opendev.org/748667 | 14:19 |
*** armax has joined #openstack-lbaas | 14:23 | |
*** sapd1 has quit IRC | 14:40 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-dashboard master: Add support for SCTP https://review.opendev.org/748681 | 14:46 |
openstackgerrit | Gregory Thiemonge proposed openstack/python-octaviaclient master: Add SCTP support https://review.opendev.org/748667 | 14:58 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-dashboard master: Add support for SCTP https://review.opendev.org/748681 | 15:00 |
*** yamamoto has joined #openstack-lbaas | 15:07 | |
*** yamamoto has quit IRC | 15:12 | |
*** gcheresh has joined #openstack-lbaas | 15:14 | |
*** gcheresh has quit IRC | 15:40 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Add proxy v2 protocol support https://review.opendev.org/747801 | 16:10 |
johnsom | cgoncalves the ALPN patch looks good, thanks! | 16:21 |
*** psachin has quit IRC | 16:24 | |
*** ccamposr has joined #openstack-lbaas | 16:50 | |
*** ccamposr__ has quit IRC | 16:53 | |
*** yamamoto has joined #openstack-lbaas | 16:56 | |
*** gcheresh has joined #openstack-lbaas | 16:58 | |
openstackgerrit | Brian Haley proposed openstack/octavia-tempest-plugin master: Change pool create scenario test to wait for operating status https://review.opendev.org/745962 | 17:00 |
*** yamamoto has quit IRC | 17:01 | |
openstackgerrit | Brian Haley proposed openstack/octavia master: Remove Neutron SDN-specific code https://review.opendev.org/718192 | 17:32 |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Adjust scenario tests for NotImplemented skip https://review.opendev.org/714004 | 17:34 |
johnsom | Ok, that should be back in working order after the ACL patch merged. | 17:34 |
openstackgerrit | Brian Haley proposed openstack/octavia-tempest-plugin master: Change pool create scenario test to wait for operating status https://review.opendev.org/745962 | 17:35 |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Adjust API tests for NotImplemented skip https://review.opendev.org/744805 | 18:09 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Fix nf_conntrack_buckets sysctl in Amphora https://review.opendev.org/748749 | 19:39 |
*** TrevorV has quit IRC | 19:57 | |
*** jamesdenton has quit IRC | 20:39 | |
*** jamesdenton has joined #openstack-lbaas | 20:40 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Switch to live from noop drivers https://review.opendev.org/748163 | 20:51 |
*** yamamoto has joined #openstack-lbaas | 20:58 | |
*** servagem has quit IRC | 20:59 | |
*** yamamoto has quit IRC | 21:03 | |
*** jamesdenton has quit IRC | 21:07 | |
*** jamesden_ has joined #openstack-lbaas | 21:07 | |
*** vishalmanchanda has quit IRC | 21:25 | |
*** yamamoto has joined #openstack-lbaas | 21:32 | |
*** gcheresh has quit IRC | 21:36 | |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Adjust API tests for NotImplemented skip https://review.opendev.org/744805 | 22:04 |
*** armax has quit IRC | 22:05 | |
-openstackstatus- NOTICE: A zuul server ended up with read only filesystems which caused many jobs to hit retry_limit. The server has been rebooted and appears happy. Jobs can be rechecked. | 22:13 | |
*** yamamoto has quit IRC | 22:16 | |
*** armax has joined #openstack-lbaas | 22:47 | |
*** rcernin has joined #openstack-lbaas | 23:10 | |
*** rcernin has quit IRC | 23:15 | |
*** armax has quit IRC | 23:26 | |
*** armax has joined #openstack-lbaas | 23:40 | |
*** armax has quit IRC | 23:52 | |
*** armax has joined #openstack-lbaas | 23:56 |
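
For anyone following the health manager discussion above (03:58 to 04:56), here is a minimal sketch of one way to watch for the spikes sorrison describes. It scans log lines for the `Health Update finished in: ... seconds` message quoted in the chat and reports updates that take a large share of the 10 second budget johnsom mentions. The function name `slow_updates`, the `warn_fraction` parameter, and the `HEARTBEAT_INTERVAL` constant are hypothetical names for illustration and are not part of Octavia; the 10 second value is assumed from the conversation, so adjust it if your heartbeat interval differs.

```python
import re

# Matches the health manager log message quoted in the chat above.
UPDATE_RE = re.compile(
    r"Health Update finished in: "
    r"(?P<secs>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) seconds"
)

# Assumption: 10 seconds, the figure johnsom gives for where problems start.
HEARTBEAT_INTERVAL = 10.0


def slow_updates(lines, warn_fraction=0.5, interval=HEARTBEAT_INTERVAL):
    """Yield (line_number, seconds) for updates slower than the threshold.

    The threshold is warn_fraction * interval, so the default flags any
    health update that took more than 5 seconds.
    """
    threshold = warn_fraction * interval
    for lineno, line in enumerate(lines, start=1):
        match = UPDATE_RE.search(line)
        if match is None:
            continue  # not a health update timing line
        secs = float(match.group("secs"))
        if secs > threshold:
            yield lineno, secs


# Usage example with made-up log lines (only the first value is from the chat).
sample = [
    "Health Update finished in: 0.018291685730218887 seconds",
    "some unrelated log line",
    "Health Update finished in: 7.5 seconds",
]
print(list(slow_updates(sample)))  # [(3, 7.5)]
```

Flagging at a fraction of the interval rather than at the full 10 seconds is a design choice in this sketch: it surfaces updates that are trending slow before heartbeats start being ignored.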