johnsom | See, this is what I don't get: When deadlock detection is enabled (the default) and a deadlock does occur, InnoDB detects the condition and rolls back one of the transactions (the victim). | 00:00 |
johnsom | So, it should only roll back one. It should still let one complete | 00:01 |
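Since InnoDB rolls back only the deadlock victim, the usual workaround on the application side is to retry the victim transaction rather than surface the error. A minimal sketch using oslo_db's wrap_db_retry (the function, table, and column names below are illustrative, not Octavia's actual code):

    from oslo_db import api as oslo_db_api
    from sqlalchemy import text

    # If this transaction loses the deadlock and is rolled back, wrap_db_retry
    # re-runs it; the winning transaction is left to complete normally.
    @oslo_db_api.wrap_db_retry(max_retries=3, retry_on_deadlock=True)
    def mark_amphora_busy(session, amphora_id):
        session.execute(
            text("UPDATE amphora_health SET busy = 1 "
                 "WHERE amphora_id = :amp_id"),
            {'amp_id': amphora_id})
        session.commit()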
johnsom | rm_work Ah, I see why it just stops.... | 00:07 |
*** gongysh has joined #openstack-lbaas | 00:08 | |
*** sshank has quit IRC | 00:08 | |
*** gongysh has quit IRC | 00:09 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Fix health monitor DB locking. https://review.openstack.org/493252 | 00:11 |
johnsom | Doesn't answer the deadlock, but will cause it to not matter as much. | 00:11 |
*** sshank has joined #openstack-lbaas | 00:13 | |
*** sshank has quit IRC | 00:22 | |
*** xingzhang has joined #openstack-lbaas | 00:25 | |
rm_work | eugh | 00:26 |
rm_work | http://paste.openstack.org/show/618236/ | 00:26 |
rm_work | followed by | 00:26 |
rm_work | http://paste.openstack.org/show/618237/ | 00:27 |
rm_work | this is spectacular | 00:27 |
rm_work | so much bug | 00:27 |
rm_work | this is what i was talking about before i think | 00:27 |
rm_work | the first one is that failovers should be able to ignore status | 00:28 |
rm_work | so it does seem to ALLOW failovers now | 00:30 |
rm_work | but that is pretty lulzy | 00:30 |
xgerman_ | yeah, looks like the wheels are coming off | 00:31 |
johnsom | The top try/catch block? | 00:31 |
johnsom | I mean, it should be ok for that health check to not get a lock, that is "normal" in a way | 00:31 |
johnsom | I probably should modify that get_stale try block to ignore the deadlock event. | 00:32 |
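Roughly what that would look like; a sketch only, with the repository and session names approximating the health manager code and a DBDeadlock treated as "another process already holds the lock":

    from oslo_db import exception as db_exc
    from oslo_log import log as logging
    from oslo_utils import excutils

    LOG = logging.getLogger(__name__)

    def get_stale_amphora(amp_health_repo, lock_session):
        try:
            amp = amp_health_repo.get_stale_amphora(lock_session)
            lock_session.commit()
            return amp
        except db_exc.DBDeadlock:
            # Another health manager won the row lock; nothing stale for
            # us this cycle, so don't treat it as an error.
            LOG.debug('Deadlock detected while checking for stale amphorae; '
                      'another process holds the lock.')
            lock_session.rollback()
            return None
        except Exception:
            with excutils.save_and_reraise_exception():
                lock_session.rollback()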
xgerman_ | it’s still saving the busy? I saw the commit in the calling func, but… | 00:33
rm_work | that was because of a previous failed failover | 00:34 |
rm_work | but | 00:34 |
johnsom | As for the failover status, this is an interesting one. It's locking the LB, which may have other healthy amps.... | 00:34 |
rm_work | basically if it tries to failover when the state is PENDING_UPDATE | 00:34 |
rm_work | it fails | 00:34 |
rm_work | and yeah, the busy stays | 00:34 |
rm_work | i have to figure out the second one | 00:34 |
johnsom | Those revert issues are just missing kwargs | 00:35 |
rm_work | trying to figure out where | 00:35 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/controller/worker/tasks/database_tasks.py#L922 | 00:36 |
rm_work | ah yeah there's one | 00:36 |
rm_work | and 907 | 00:36 |
rm_work | we should fix up all of those | 00:36 |
johnsom | Should look more like: https://github.com/openstack/octavia/blob/master/octavia/controller/worker/tasks/database_tasks.py#L1058 | 00:36 |
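The difference between the two patterns, roughly (the task and argument names here are illustrative, not copied from database_tasks.py): taskflow passes extra context such as flow_failures into revert(), so a signature without *args/**kwargs raises TypeError during flow reversion.

    from taskflow import task

    class MarkAmphoraAllocatedInDB(task.Task):
        def execute(self, amphora, loadbalancer_id):
            ...

        # Broken: TypeError as soon as taskflow calls
        # revert(amphora, loadbalancer_id, flow_failures=...).
        # def revert(self, amphora, loadbalancer_id):
        #     ...

        # Correct: accept and ignore the extra revert context.
        def revert(self, amphora, loadbalancer_id, *args, **kwargs):
            ...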
johnsom | Yeah, I fixed a ton of those at one point, but more must have slipped in | 00:37 |
johnsom | We probably need a hacking rule for that | 00:37 |
xgerman_ | +1 | 00:38 |
johnsom | Ha, that is currently the only one it looks like | 00:39 |
xgerman_ | It makes sense to me to lock an LB during failover even if it has more than one amp - we can’t guarantee that updates will reach all amps at that point in time | 00:39
xgerman_ | but we should ignore it when we failover another amp | 00:40 |
johnsom | Oh, I don't disagree that it should be locked, I'm just worried that if the update thread is still going on the other o-cw, it's going to mess with the state machine, i.e. unlock it | 00:41
xgerman_ | mmh | 00:41 |
johnsom | I mean it "should" fail out and go to ERROR instead of pending | 00:42 |
rm_work | yep lol | 00:42 |
rm_work | just the one spot | 00:42 |
rm_work | awesome >_> | 00:42 |
johnsom | So, either we don't failover when it's in PENDING_* and wait for it to exit that state or.... | 00:43 |
xgerman_ | well, we always need to failover - uptime is our ultimate goal | 00:43 |
johnsom | Yeah, but I don't want failover of one amp to cause failure of the other.... | 00:44 |
xgerman_ | ok, makes sense - so if we are not running SINGLE we can wait for the update (and hope it doesn’t crash by talking to the defunct amp) | 00:45
rm_work | but yeah that's what i was saying earlier -- "we always need to failover" | 00:45 |
rm_work | so blocking a failover because of an update is kinda >_> | 00:45 |
rm_work | but, yeah, easier said than done since it IS problematic | 00:45 |
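One conservative reading of the options above, as a sketch: skip the failover while the LB is in a PENDING_* state and let the next health check cycle retry it, rather than failing the amphora permanently. The constant names come from octavia.common.constants; the helper itself is hypothetical.

    from octavia.common import constants

    def should_failover_now(load_balancer):
        # Defer (don't abort) failover while another operation owns the LB.
        return load_balancer.provisioning_status not in (
            constants.PENDING_CREATE,
            constants.PENDING_UPDATE,
            constants.PENDING_DELETE)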
rm_work | anyway this HELPS since now failovers *happen*, but now i'm just getting deadlocks like constantly | 00:45 |
johnsom | Did you ever find the deadlock log? | 00:46 |
rm_work | seriously, just spewing them | 00:46 |
rm_work | looking | 00:46 |
rm_work | oh err | 00:47 |
rm_work | wait | 00:47 |
rm_work | am i using INNODB? | 00:47 |
johnsom | I super hope so | 00:47 |
rm_work | err | 00:47 |
rm_work | how do i verify that | 00:47 |
johnsom | http://paste.openstack.org/show/618235/ | 00:47 |
rm_work | i have this set up as percona+xtradb | 00:47
johnsom | Yeah, you are | 00:47 |
johnsom | It was in your status output | 00:48 |
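For completeness, one way to double-check which storage engine the Octavia tables actually ended up on (the connection URL and schema name are assumptions):

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://user:pass@127.0.0.1/octavia')
    with engine.connect() as conn:
        rows = conn.execute(text(
            "SELECT table_name, engine FROM information_schema.tables "
            "WHERE table_schema = 'octavia'"))
        for table_name, storage_engine in rows:
            # Percona XtraDB reports itself as InnoDB here.
            print(table_name, storage_engine)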
rm_work | does XtraDB not override that or something | 00:48
rm_work | XtraDB is madness | 00:48
johnsom | sqlalchemy + mysql is madness | 00:48
rm_work | lol | 00:48 |
*** leitan has quit IRC | 00:50 | |
rm_work | yeah i have no idea where the errors are going <_< | 00:50 |
rm_work | if anywhere | 00:51 |
johnsom | lsof? | 00:52 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Fix health monitor DB locking. https://review.openstack.org/493252 | 00:55 |
johnsom | That will shut it up | 00:55 |
xgerman_ | ha | 00:55 |
rm_work | lol.... | 00:55 |
rm_work | not sure that's ideal | 00:57 |
johnsom | Well, no. We still need to figure out what is deadlocking. | 00:58 |
rm_work | this is dumb | 01:12 |
rm_work | maybe i need to explicitly configure a log location? | 01:13 |
rm_work | ah percona xtradb is Galera | 01:13 |
johnsom | There should be a mysql variable that defines the error log location | 01:14 |
johnsom | But didn't you see those "row too long" messages? that should have been the error log | 01:14 |
rm_work | yeah i found those on all nodes | 01:15 |
rm_work | but nothing about deadlocks | 01:15 |
rm_work | i don't know if setting that global is working right | 01:15 |
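If the global in question is innodb_print_all_deadlocks, it is worth confirming it actually took and where the error log lives: with it OFF, only the most recent deadlock appears in SHOW ENGINE INNODB STATUS and nothing is written to the error log. A quick check (the connection URL is an assumption):

    from sqlalchemy import create_engine, text

    engine = create_engine('mysql+pymysql://user:pass@127.0.0.1/octavia')
    with engine.connect() as conn:
        print(conn.execute(text("SHOW VARIABLES LIKE 'log_error'")).fetchone())
        print(conn.execute(text(
            "SHOW VARIABLES LIKE 'innodb_print_all_deadlocks'")).fetchone())
        # Requires SUPER, and on a Galera/XtraDB cluster it is per-node, so
        # it has to be set on every node that takes writes:
        # conn.execute(text("SET GLOBAL innodb_print_all_deadlocks = ON"))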
rm_work | hmmmmmm | 01:15 |
rm_work | maybe I need to just ... | 01:15 |
rm_work | only send writes to one node <_< | 01:15 |
rm_work | one sec | 01:15 |
rm_work | doing that | 01:16 |
rm_work | man, what we ARE missing is active/passive | 01:16 |
rm_work | i want to have one node ONLY come up if the other is down | 01:17 |
rm_work | can't really do it with weights | 01:17 |
rm_work | johnsom: k i think that solves it -- so this is not really octavia's problem, so much as galera's optimistic locking and writing to more than one node | 01:19 |
johnsom | Are you kidding me? | 01:19 |
johnsom | Ugh, can't figure out why this regex doesn't work | 01:20 |
johnsom | (.)*def revert\(.+, (?!\*\*kwargs)\): | 01:21 |
rm_work | :3 | 01:28 |
rm_work | this is a little odd | 01:28 |
rm_work | johnsom: so i would say: throw that revert fix into the same HM patch, *remove* the bits that hide the deadlock messages from logs, and we should merge that | 01:33 |
rm_work | since it does solve a problem | 01:34 |
johnsom | I would consider it if I can get this damn regex to work | 01:35 |
*** yamamoto has joined #openstack-lbaas | 01:35 | |
rm_work | yeah i poked at it | 01:37 |
rm_work | not sure wtf | 01:37 |
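For what it's worth, the original pattern can never match because the (?!\*\*kwargs) lookahead sits immediately before \):, which means the argument list would have to literally end in ", ):". Anchoring the lookahead right after "self" avoids that. A sketch of the hacking rule follows; the check name, error code, and regex are illustrative, not an existing Octavia check:

    import re

    # Flag taskflow revert() definitions that don't accept **kwargs.
    # flake8/hacking hands checks the logical line, so wrapped signatures
    # are already joined into a single string before this runs.
    _REVERT_NO_KWARGS = re.compile(r'def revert\(self(?!.*\*\*kwargs).*\):')

    def assert_revert_accepts_kwargs(logical_line):
        """O3xx - revert() must accept **kwargs."""
        if _REVERT_NO_KWARGS.search(logical_line):
            yield (0, 'O3xx: revert() should end with *args, **kwargs so '
                      'taskflow can pass flow_failures and other context')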
*** ssmith has quit IRC | 01:50 | |
*** yamamoto has quit IRC | 02:03 | |
*** gongysh has joined #openstack-lbaas | 02:43 | |
*** yamamoto has joined #openstack-lbaas | 03:04 | |
*** xingzhang has quit IRC | 03:08 | |
*** xingzhang has joined #openstack-lbaas | 03:08 | |
*** yamamoto has quit IRC | 03:09 | |
*** xingzhang has quit IRC | 03:13 | |
*** xingzhang has joined #openstack-lbaas | 03:14 | |
*** rajivk has quit IRC | 03:28 | |
*** reedip has quit IRC | 03:28 | |
*** yamamoto has joined #openstack-lbaas | 03:29 | |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Improve error reporting for the octavia plugin https://review.openstack.org/493273 | 04:13 |
johnsom | Ok, that should pass through our fault strings to the user giving better error strings than "Bad Request" | 04:14 |
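The idea, in rough form (the helper below is an illustration assuming a requests-style Response object, not the actual python-octaviaclient code): prefer the API's faultstring over the bare HTTP reason phrase when building the error message.

    def format_api_error(response):
        """Prefer the Octavia API's faultstring over the HTTP reason."""
        try:
            body = response.json()
        except ValueError:
            body = {}
        detail = body.get('faultstring') if isinstance(body, dict) else None
        return detail or response.reason  # e.g. falls back to "Bad Request"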
*** yamamoto has quit IRC | 04:39 | |
*** yamamoto has joined #openstack-lbaas | 04:48 | |
*** yamamoto has quit IRC | 04:55 | |
*** gcheresh has joined #openstack-lbaas | 06:22 | |
*** gongysh has quit IRC | 06:36 | |
*** gcheresh has quit IRC | 06:40 | |
*** yamamoto has joined #openstack-lbaas | 06:54 | |
*** yamamoto has quit IRC | 06:59 | |
*** tesseract has joined #openstack-lbaas | 07:03 | |
*** KeithMnemonic has quit IRC | 07:24 | |
*** yamamoto has joined #openstack-lbaas | 07:27 | |
*** yamamoto has quit IRC | 07:32 | |
*** Alex_Staf has joined #openstack-lbaas | 08:12 | |
*** aojea has joined #openstack-lbaas | 08:19 | |
*** gongysh has joined #openstack-lbaas | 09:28 | |
*** gongysh has quit IRC | 09:28 | |
*** aojea has quit IRC | 09:47 | |
*** amotoki__away is now known as amotoki | 10:51 | |
*** aojea has joined #openstack-lbaas | 10:53 | |
*** aojea has quit IRC | 11:00 | |
*** dasanind has quit IRC | 11:02 | |
*** yamamoto has joined #openstack-lbaas | 11:27 | |
*** yamamoto has quit IRC | 12:01 | |
*** yamamoto has joined #openstack-lbaas | 12:21 | |
*** gcheresh has joined #openstack-lbaas | 12:24 | |
*** Alex_Staf has quit IRC | 12:30 | |
*** Alex_Staf has joined #openstack-lbaas | 12:35 | |
*** aojea has joined #openstack-lbaas | 12:57 | |
*** aojea has quit IRC | 13:01 | |
*** gcheresh has quit IRC | 13:17 | |
*** aojea has joined #openstack-lbaas | 14:58 | |
*** xingzhang has quit IRC | 14:58 | |
*** xingzhang has joined #openstack-lbaas | 14:59 | |
*** aojea has quit IRC | 15:02 | |
*** xingzhang has quit IRC | 15:03 | |
*** ajo has quit IRC | 15:31 | |
*** yamamoto has quit IRC | 15:40 | |
*** yamamoto has joined #openstack-lbaas | 15:41 | |
*** ipsecguy_ has joined #openstack-lbaas | 15:52 | |
*** ipsecguy has quit IRC | 15:56 | |
*** xingzhang has joined #openstack-lbaas | 16:09 | |
*** Alex_Staf has quit IRC | 16:33 | |
*** xingzhang has quit IRC | 16:42 | |
*** aojea has joined #openstack-lbaas | 16:58 | |
*** aojea has quit IRC | 17:03 | |
*** aojea has joined #openstack-lbaas | 17:04 | |
*** tesseract has quit IRC | 17:28 | |
*** xingzhang has joined #openstack-lbaas | 17:42 | |
*** Alex_Staf has joined #openstack-lbaas | 18:05 | |
*** xingzhang has quit IRC | 18:12 | |
*** aojea has quit IRC | 18:13 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Fix octavia logging to be more friendly https://review.openstack.org/493328 | 18:57 |
*** xingzhang has joined #openstack-lbaas | 19:12 | |
*** D33P-B00K has joined #openstack-lbaas | 19:30 | |
*** D33P-B00K has left #openstack-lbaas | 19:30 | |
*** xingzhang has quit IRC | 19:42 | |
*** Alex_Staf has quit IRC | 19:58 | |
*** gcheresh has joined #openstack-lbaas | 20:00 | |
johnsom | Nice, that works for the gates | 20:02 |
*** xingzhang has joined #openstack-lbaas | 20:42 | |
*** aojea has joined #openstack-lbaas | 20:42 | |
openstackgerrit | Merged openstack/neutron-lbaas master: Update reno for stable/pike https://review.openstack.org/492872 | 20:58 |
*** gcheresh has quit IRC | 21:04 | |
*** xingzhang has quit IRC | 21:12 | |
*** aojea has quit IRC | 21:37 | |
*** aojea has joined #openstack-lbaas | 21:37 | |
*** xingzhang has joined #openstack-lbaas | 22:12 | |
*** xingzhang has quit IRC | 22:42 | |
*** aojea has quit IRC | 23:31 | |
*** xingzhang has joined #openstack-lbaas | 23:42 |