Friday, 2016-02-12

*** amotoki has quit IRC00:02
*** madhu_ak_ is now known as madhu_ak00:05
*** doug-fis_ has joined #openstack-lbaas00:10
*** TrevorV has quit IRC00:12
openstackgerritMichael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug
*** doug-fish has quit IRC00:15
*** ducttape_ has joined #openstack-lbaas00:17
ajmillerdougwig you around?00:18
*** kevo has quit IRC00:46
*** SumitNaiksatam has quit IRC00:53
*** minwang2 has quit IRC00:58
*** ajmiller has quit IRC00:58
*** allan_h has quit IRC01:09
openstackgerritReedip proposed openstack/neutron-lbaas: Make subnet_id for lbaas v2 member create optional
*** ducttape_ has quit IRC01:21
*** SumitNaiksatam has joined #openstack-lbaas01:32
*** Aish has quit IRC01:37
*** yamamoto_ has joined #openstack-lbaas01:44
*** drjones has quit IRC01:44
*** harlowja has quit IRC01:55
*** amotoki has joined #openstack-lbaas01:58
*** shakamunyi has joined #openstack-lbaas01:58
*** ducttape_ has joined #openstack-lbaas02:00
*** amotoki has quit IRC02:02
*** yamamoto_ has quit IRC02:04
*** Aish has joined #openstack-lbaas02:07
*** Aish has left #openstack-lbaas02:07
*** shakamunyi has quit IRC02:12
*** rooney has quit IRC02:18
*** ducttape_ has quit IRC02:23
*** rooney has joined #openstack-lbaas02:24
*** shakamunyi has joined #openstack-lbaas02:31
*** Bjoern has joined #openstack-lbaas02:34
*** Bjoern has quit IRC02:39
*** bana_k has joined #openstack-lbaas02:42
*** minwang2 has joined #openstack-lbaas02:46
*** yamamoto_ has joined #openstack-lbaas02:51
*** madhu_ak has quit IRC02:58
*** ajmiller has joined #openstack-lbaas03:06
*** ajmiller_ has joined #openstack-lbaas03:08
*** ajmiller has quit IRC03:12
*** ajmiller_ is now known as ajmiller03:13
*** ducttape_ has joined #openstack-lbaas03:14
*** doug-fis_ has quit IRC03:19
*** doug-fish has joined #openstack-lbaas03:21
*** doug-fish has quit IRC03:21
*** kevo has joined #openstack-lbaas03:22
*** ducttape_ has quit IRC03:22
*** bana_k has quit IRC03:25
*** links has joined #openstack-lbaas03:32
*** ducttape_ has joined #openstack-lbaas03:45
*** woodster_ has quit IRC03:46
*** minwang2 has quit IRC03:48
*** minwang2 has joined #openstack-lbaas03:51
*** amotoki has joined #openstack-lbaas04:41
*** minwang2 has quit IRC04:42
*** minwang2 has joined #openstack-lbaas04:44
*** ducttape_ has quit IRC04:58
*** ducttape_ has joined #openstack-lbaas05:00
*** ducttape_ has quit IRC05:05
*** pc_m has quit IRC05:10
*** hockeynut has quit IRC05:10
*** rm_work has quit IRC05:11
*** hockeynut has joined #openstack-lbaas05:12
*** pc_m has joined #openstack-lbaas05:12
*** rm_work has joined #openstack-lbaas05:18
*** ajmiller_ has joined #openstack-lbaas05:18
*** chlong has quit IRC05:19
*** ajmiller has quit IRC05:21
*** ajmiller_ has quit IRC05:21
*** allan_h has joined #openstack-lbaas05:27
*** fnaval has quit IRC05:31
*** chlong has joined #openstack-lbaas05:33
*** fnaval has joined #openstack-lbaas05:41
*** chlong has quit IRC05:42
*** chlong has joined #openstack-lbaas05:54
*** rooney has quit IRC06:00
*** ducttape_ has joined #openstack-lbaas06:01
*** ducttape_ has quit IRC06:06
*** numans has joined #openstack-lbaas06:09
*** chlong has quit IRC06:11
*** prabampm has joined #openstack-lbaas06:14
*** rooney has joined #openstack-lbaas06:17
*** numans has quit IRC06:26
*** numans has joined #openstack-lbaas06:28
*** chlong has joined #openstack-lbaas06:34
*** numans has quit IRC06:53
*** armax has quit IRC06:54
*** ducttape_ has joined #openstack-lbaas07:03
*** chlong has quit IRC07:03
*** ducttape_ has quit IRC07:08
*** numans has joined #openstack-lbaas07:15
*** amotoki has quit IRC07:16
*** minwang2 has quit IRC07:22
*** minwang2 has joined #openstack-lbaas07:23
*** amotoki has joined #openstack-lbaas07:24
*** lmiccini|away is now known as lmiccini07:27
*** minwang2 has quit IRC07:28
*** minwang2 has joined #openstack-lbaas07:28
*** allan_h has quit IRC07:29
*** minwang2 has quit IRC07:30
*** amotoki has quit IRC07:35
*** minwang2 has joined #openstack-lbaas07:37
*** amotoki has joined #openstack-lbaas07:39
openstackgerritokamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy
*** minwang2 has quit IRC07:43
*** amotoki has quit IRC07:43
*** reedip is now known as reedip_away07:54
*** amotoki has joined #openstack-lbaas07:55
*** rooney has quit IRC07:57
openstackgerritStephen Balukoff proposed openstack/octavia: Fixed database model traversal
openstackgerritokamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy
*** kevo has quit IRC08:26
*** ducttape_ has joined #openstack-lbaas09:04
*** mhickey has joined #openstack-lbaas09:07
*** mhickey has quit IRC09:08
*** mhickey has joined #openstack-lbaas09:09
*** ducttape_ has quit IRC09:09
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 database structures
openstackgerritStephen Balukoff proposed openstack/octavia: Update repos for L7 policy / methods
*** amotoki has quit IRC09:34
openstackgerritStephen Balukoff proposed openstack/octavia: Update repos for L7 rules / validations
*** amotoki has joined #openstack-lbaas09:48
*** amotoki has quit IRC09:53
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 api - policies
*** amotoki has joined #openstack-lbaas09:56
*** amotoki has quit IRC10:01
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 api - rules
*** ducttape_ has joined #openstack-lbaas10:05
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 controller worker flows and tasks
*** ducttape_ has quit IRC10:09
*** amotoki has joined #openstack-lbaas10:10
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 jinja template updates
*** amotoki has quit IRC10:12
*** amotoki has joined #openstack-lbaas10:13
openstackgerritStephen Balukoff proposed openstack/octavia: Add L7 documentation
*** openstackgerrit has quit IRC10:32
*** openstackgerrit has joined #openstack-lbaas10:33
*** numans has quit IRC10:42
openstackgerritMerged openstack/octavia: Fixes an intermittent load balancer delete failure
*** ducttape_ has joined #openstack-lbaas11:06
*** ducttape_ has quit IRC11:11
*** numans has joined #openstack-lbaas11:30
*** amotoki has quit IRC11:30
*** numans has quit IRC11:34
*** numans has joined #openstack-lbaas11:36
*** ducttape_ has joined #openstack-lbaas12:07
*** links has quit IRC12:10
*** ducttape_ has quit IRC12:12
*** yamamoto_ has quit IRC12:12
*** intr1nsic has quit IRC12:22
*** intr1nsic has joined #openstack-lbaas12:25
*** rtheis has joined #openstack-lbaas12:25
*** ducttape_ has joined #openstack-lbaas13:07
*** _ducttape_ has joined #openstack-lbaas13:11
*** ducttape_ has quit IRC13:12
*** mhickey has quit IRC13:12
*** _ducttape_ has quit IRC13:25
*** amotoki has joined #openstack-lbaas13:33
*** amotoki has quit IRC13:43
*** amotoki has joined #openstack-lbaas13:51
*** amotoki has quit IRC13:52
*** amotoki has joined #openstack-lbaas13:58
*** neelashah has joined #openstack-lbaas13:58
*** ajmiller has joined #openstack-lbaas14:11
*** localloop127 has joined #openstack-lbaas14:20
*** ajmiller has quit IRC14:44
*** amotoki has quit IRC14:45
*** amotoki has joined #openstack-lbaas14:48
*** Bjoern has joined #openstack-lbaas14:48
*** prabampm has quit IRC14:53
*** amotoki has quit IRC14:53
*** piet has joined #openstack-lbaas14:59
*** ducttape_ has joined #openstack-lbaas14:59
*** doug-fish has joined #openstack-lbaas15:01
*** TrevorV has joined #openstack-lbaas15:04
*** amotoki has joined #openstack-lbaas15:09
*** yamamot__ has joined #openstack-lbaas15:09
*** doug-fish has quit IRC15:21
*** doug-fish has joined #openstack-lbaas15:22
*** ajmiller has joined #openstack-lbaas15:28
*** fnaval has quit IRC15:59
*** localloo1 has joined #openstack-lbaas16:02
*** localloop127 has quit IRC16:05
*** localloop127 has joined #openstack-lbaas16:06
*** localloo1 has quit IRC16:07
*** fnaval has joined #openstack-lbaas16:19
*** links has joined #openstack-lbaas16:23
*** numans has quit IRC16:31
*** armax has joined #openstack-lbaas16:32
*** woodster_ has joined #openstack-lbaas16:39
*** localloop127 has quit IRC16:40
*** allan_h has joined #openstack-lbaas16:41
*** localloop127 has joined #openstack-lbaas16:41
*** bana_k has joined #openstack-lbaas16:43
*** localloop127 has quit IRC16:49
*** localloop127 has joined #openstack-lbaas16:51
kevinbentonblogan: yo16:53
kevinbentonblogan: you had question about heartbeats?16:54
johnsomkevinbenton Octavia heartbeats?  I might be able to answer16:54
kevinbentonjohnsom: no idea, just had a ping from him from yesterday16:55
johnsomAh, ok16:55
johnsomCan't help you there16:55
sbalukoffjohnsom: I have to jump into a phone meeting for the next half hour,but I have something I want to discuss with you after that. Do you think you'll be available?16:57
johnsomsbalukoff available until 11, then after 1:3016:58
sbalukoff(It's about this  as well as what I think was causing the pushing of "out of date" configs to the amphorae)16:59
*** yamamot__ has quit IRC16:59
johnsomOk, yeah, ping me16:59
*** yamamoto_ has joined #openstack-lbaas17:00
blogankevinbenton: neutron agent heartbeats17:02
kevinbentonblogan: yeah, what about them?17:02
blogankevinbenton: if a heartbeat is found to be stale, does the agent's admin_state_up get moved to disabled, or is it the status that cahnges?17:02
kevinbentonblogan: just the status17:03
kevinbentonblogan: admin_state_up is purely controlled via the API17:03
kevinbentonblogan: admin_state_up is to prevent stuff from being scheduled to it17:04
blogankevinbenton: okay thats what i figured17:04
openstackgerritMichael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug
blogankevinbenton: well wouldn't you want agents that have a stale heartbeat to not be scheduled to?17:04
bloganif the status is OFFLINE or DOWN or whatever it is17:05
kevinbentonblogan: yes17:05
blogankevinbenton: okay just making sure on that, lbaas agent does not check status when scheduling17:05
kevinbentonblogan: i was just pointing out that admin_state_up is just used as a way to make it explicit17:05
blogankevinbenton: makes sense, thansk17:05
kevinbentonblogan: i don't think this applies to the L2 agent17:05
kevinbentonblogan: there was some discussion around changing that17:06
kevinbentonblogan: but don't look to l2 agent as example17:06
blogankevinbenton: just looking at the lbaas agent, and it uses the common agent code in neutron17:06
*** localloop127 has quit IRC17:09
*** localloop127 has joined #openstack-lbaas17:10
kevinbentonblogan: now that i think about it i'm not sure the status is actually stored anywhere17:15
kevinbentonblogan: for the other agents17:15
*** _cjones_ has joined #openstack-lbaas17:16
*** openstackgerrit has quit IRC17:17
*** openstackgerrit has joined #openstack-lbaas17:18
kevinbentonblogan: i think the dead/alive is calculated dynamically on API queries17:18
kevinbentonblogan: yeah, only thing in the DB is admin_state_up, no status column17:19
kevinbentonis_active is calculated based on delta from last updated timestampe17:20
*** amotoki has quit IRC17:31
blogankevinbenton: ah so a scheduler woudlnt be able to say don't schedule here if the heartbeats stopped17:31
kevinbentonblogan: yeah, it can. it can check that is_active property17:32
kevinbentonblogan: that will calculate based on the heartbeat17:32
blogankevinbenton: oh duh, sorry i was thinking about it wrong17:32
kevinbentonblogan: i was just pointing out that we don't even have a status column so it can't possibly be affected by anything else17:33
blogankevinbenton: yeah makes sense17:35
blogankevinbenton: thanks for your help17:35
kevinbentonblogan: no prob17:35
sbalukoffjohnsom: Ok! So! Wheneeeeever you're ready...17:38
johnsomsbalukoff Coffee in hand let's go17:38
sbalukoffOk, so first off, you may notice that the bugfix patch  I uploaded moves things to be a lot closer to what blogan wrote there originally, before the shared pools patch.17:39
sbalukoffWhat it does is populate a data model tree from the perspective of whatever database object you hand it first.17:39
sbalukoffblogan's (and this patch's) code refuse to revisit object classes we've seen before.17:40
sbalukoffThis means, that we can trust the data model to be accurate *for the first object we pass to it* but things will get less accurate the farther down the tree we traverse.17:40
sbalukoffMy code that I have in there (without this bugfix) tries to make things a little more fleshed out...17:40
sbalukoffYou can trust the accuracy of the object you pass to it, and all of its immediate children.17:41
sbalukoffHowever, the problem with that is once you have a moderate number of things populated in the tree...17:41
sbalukoffThe efficiency of this algorithm becomes worse than n^217:41
sbalukoffIt's really, really terrible, actually.17:42
johnsomYeah, not fun17:42
sbalukoffWhich is why I'm advocating for essentially a roll-back.17:42
sbalukoffWhere the rubber hits the road here is what we do in the flows whenever we do a amphora_driver_task.ListenersUpdate.17:42
sbalukoffRight now, we pass that method a data model which has not been populated from the perspective of the listeners we are updating.17:43
sbalukoffSo... you get inaccuracies.17:43
sbalukoffI see there being two potential solutions to this problem:17:43
johnsomOk, you lost me on that one17:44
sbalukoff1. Before we call ListenersUpdate, we have to populate the data model from the perspective of the listeners we pass to it, and then do our preliminary updates (eg. "delete this pool" "update this member", etc.) on the data model we've populated from the perspective of the listener...17:44
sbalukoffLet me find a concrete example.17:44
sbalukoffGive me a minute.17:44
johnsomI thought that was what these were for:
sbalukoffSo let's see...17:45
sbalukoffIf you look in
sbalukoffIn the update_member method.17:45
sbalukoffSee how we populate the member from the database....17:45
sbalukoffBut then 'listeners' is populated from that member data model?17:45
*** jwarendt has quit IRC17:46
sbalukoffThat's where we run into trouble.17:46
johnsomYeah, we get the old member from DB, store it and the member updates in the flow.  This allows the flow to update the model before the listener_update17:46
sbalukoffWe need to do a repo.get for the listener.17:46
*** bana_k has quit IRC17:46
sbalukoffThat's solution 1. to this problem.17:46
sbalukoffThe second solution is to just update the database first, and allow the amphora_driver_tasks.ListenersUpdate to populate the listener directly from database data.17:47
johnsomSo you are saying that listeners list, from the member.pool is incomplete?17:47
sbalukoffHear me out on this one:17:47
*** longstaff has joined #openstack-lbaas17:48
sbalukoffjohnsom: Yes, in effect. All the listeners are there, but listeners.pools is not because we're really doing this:   member.pool.listeners.pool.member17:48
sbalukoffEr.... sorry, memberl.pool.listeners.pools.member17:48
sbalukoffDammit, can't type today.17:48
sbalukoffAnyway, you see how once we've traversed a member object, then a pool object, then a listener object, we can't go back to a pool object?17:49
sbalukoffNot with the way we're populating the data model.17:49
sbalukoff(If we're following blogan's code, which I think is the only efficient way to do this.)17:49
sbalukoffSo....  I also want to say something about populating the database first:17:49
sbalukoffThink about about what we presently do when we revert:17:50
sbalukoffWe mark the listener as being in the 'ERROR' state.17:50
johnsomI think I am following you.  I think this is why we have the reload task.  We probably need to add that to the member update flow so it takes that list of members and updates those models17:50
sbalukoffI interpret that to mean "this listener in the database might be out of synch with whatever is actually in place on the amphora"17:50
sbalukoffjohnsom: Yep. We actually have to do that for any flow that is not a listener.17:51
sbalukoffSo, pool, health_monitor, session_persistence, and also (now) l7policy, l7pool.... etc. etc. etc.17:51
johnsomHmmm, but all of those work today17:52
sbalukoffI think it would be far simpler to just update the database first, and if we need to revert... well, we're already marking the listener as being in 'ERROR' right?17:52
sbalukoffTo be honest, I'm not sure they actually do work today. I think our scenario tests are fairly weak right now.17:53
sbalukoffEr... tempest tests.17:53
johnsomI'm not counting on scenario tests, I'm going on my testing17:53
*** jwarendt has joined #openstack-lbaas17:53
johnsomEVERYTHING was broken with that database pull line I removed.  That I know broke a ton17:54
johnsomLet me do a quick update test on my devstack with our latest master.17:54
sbalukoffYeah-- I'm not even sure my super-inefficient model population code saves us from this.17:55
sbalukoffBecause, really, we're not updating the model right in the controller_worker before we hand it to the ListenersUpdate task.17:55
johnsomI guess, I should pull your patch, right?17:55
sbalukoffThat would put you in a state much closer to how things were before the shared pools patch.17:56
sbalukoffThough if you want to be entertained (or bored), go with master, populate a complete load balancer tree, and try to do an API get on the load balancer.17:56
sbalukoffAnd watch the octavia-api daemon spin for about 30 seconds while consuming 100% CPU. XD17:57
johnsomOk, give me a minute.  My build LB script is running.  After it is done, I will pull your patch, restart and do updates on objects down the tree17:57
*** links has quit IRC17:58
sbalukoffThere's one other fly in the ointment that should probably be mentioned if we want to go option 1, and update the model only (not the DB) before passing the model to the ListenersUpdate task...17:58
sbalukoffThere's a lot of logic in the file (not just in the l7policy / l7rule stuff I'm trying to add) which is not presently reflected in the file.17:59
sbalukoffSo if you're trying to simulate update of the database while you're really only updating the data model... you're going to have to duplicate all the logic that happens in
sbalukoffWhich is ugly, to say the least.18:00
sbalukoffAlso, I should note that I could be totally wrong about all this-- this is what I discovered around 1:00am last night and I've only had about 4.5 hours of sleep. But I don't *think* I'm wrong about this. XD18:03
sbalukoffHmmm... there's also a distant third option, but it might not be possible....  We might be able to figure out a way to hold a database write lock while we attempt to update all the amphorae... and then roll back the transaction if we need to revert. But that also seems very ugly.18:06
johnsomWell, that was interesting.  I did a pool update to LEAST_CONNECTIONS and the haproxy didn't render right18:06
sbalukoffIt didn't?18:07
sbalukoffThis is after you've removed the database lookup from ListenersUpdate?18:07
johnsomsbalukoff Yeah, a transaction would be the best option.  Just the way we are doing repo etc. makes that nearly impossible18:07
sbalukoffjohnsom: Yeah. Super painful.18:08
xgermanlet me know when you figure out transactions - will need that for cascading deletes18:09
sbalukoffxgerman: I am so, so sorry.18:10
johnsomTransactions are easy.  repo and sqlalchemy make them harder18:10
*** piet has joined #openstack-lbaas18:10
openstackgerritTrevor Vardeman proposed openstack/octavia: WIP - Get me a Load Balancer
*** madhu_ak has joined #openstack-lbaas18:12
johnsomOk, now just a build isn't rendering the haproxy config right.18:12
sbalukoffjohnsom: Again, this is after you've removed the database lookup from ListenersUpdate?18:13
johnsomYeah, and added your latest.  Going to roll that back18:13
sbalukoffIf you're going to allow the ListenersUpdate to populate its data from the database, then the database should also be updated first.18:14
sbalukoff(Just sayin' ;) )18:14
sbalukoffAnyway, do your thing. Maybe I'm interpreting the situation wrong, eh.18:14
*** amotoki has joined #openstack-lbaas18:14
johnsomI'm going to roll you traversal patch back, then try a build.  It was failing on create, so not just update18:15
*** bana_k has joined #openstack-lbaas18:15
*** kevo has joined #openstack-lbaas18:15
johnsomo-cw said it was fine, but the haproxy conf was missing sections after I did the create of an lb with members18:16
sbalukoffSimilar to symptoms I was seeing last night.18:16
johnsomOk, still broken18:16
johnsomOk, with my "revert bad workaround" removed, still not rendering.  I'm going to pull out shared pools18:21
sbalukoffWhat is your "revert bad workaround"?18:22
*** openstack has joined #openstack-lbaas18:27
sbalukoffYeah, I screw up all the time, which it why it's good you're trying to verify whether my assessment of the situation is correct. :)18:27
johnsomOk, starting over, testing master, create LB18:28
johnsomOk, master, create LB works.18:29
sbalukoffOk. It's just horribly inefficient then. :)18:30
johnsomYeah, could be.  Let me try an update18:30
sbalukoffSeriously, populate like 3 listeners and three pools.18:30
sbalukoffAnd watch it spin...18:30
johnsomOk, so member update is showing the same, need to run twice issue.  trying pool update18:32
johnsom(though member update was re-ordered I thought) maybe that was delete18:32
johnsomIt was delete, nevermind18:33
sbalukoffI know that one of the member methods accidentally got skipped with that fix.18:33
johnsomLooks like our CLI help for pool-update is broken18:34
sbalukoffMost of the cli update help is shit.18:34
johnsomCareful, I implemented some of that... grin18:35
sbalukoff(create and delete help isn't htat bad.) ;)18:35
sbalukoffBut yeah... actually fixing the update help is down there on my to-do somewhere.18:35
sbalukoffProbably after "give Octavia its own CLI, already."18:36
johnsomIt was a rush implement at a hack-a-thon to get it in kilo18:36
sbalukoffThat doesn't surprise me too much. :)18:36
sbalukoffAnd really-- I'm not saying you didn't do what you had to do, eh.18:36
sbalukoff(Or that I would have done differently)18:36
sbalukoffI know how a lot of this code gets written, eh. ;)18:37
sbalukoffMost of the testing against Octavia I've been doing lately has just to run curl against the octavia API directly to reduce the complication / potential point of failiure of neutron-lbaas.18:38
sbalukoff(Makes it far simpler to determine whether it's an Octavia or a Neutron LBaaS problem I'm seeing.)18:38
sbalukoffThis might be helpful if you're interested in doing any of that:
*** openstackgerrit has quit IRC18:47
*** SumitNaiksatam has quit IRC18:47
*** openstackgerrit has joined #openstack-lbaas18:48
johnsomYeah, ok, master is not updating18:51
johnsomIt renders, but doesn't update on first call18:52
sbalukoffIf it's helpful, I can write a second patch (dependent on my traversal fix patch) which changes the flows so the database update comes first. Again, the revert method in this case marks the listener as "ERROR" which is accurate enough.18:53
sbalukoffAnd we can see whether that one fixes things. :/18:53
johnsomLet me mess with this a bit today.  This stuff worked at one point and I have no idea what broke it.  I want to look closer at those model updates.18:54
johnsomThat said, I'm pretty sure your patch breaks the config render18:55
sbalukoffWell.... I'd like to try things with the database update in flow order reversed anyway. So i'll go ahead and write that patch, even if we end up abandoning it. :)18:55
johnsomOk.  Just note I will push back.  But either way, this breaks the haproxy config render18:56
johnsomI can't create LBs with that patch in18:57
sbalukoffyep, makes sense that it would, with the database update happening after other things.18:57
*** piet has quit IRC18:57
sbalukoffAlso, you should feel free to push back. But have good technical reasons for doing so, eh. ;)18:57
johnsomWithout that one patch, it creates just fine18:57
*** localloo1 has joined #openstack-lbaas18:58
sbalukoffyes, and I think I know why. But also without that patch we can't support an octavia infrastructure with more than about 3 listeners and pools.18:58
johnsomSo, order is not the issue18:58
johnsomUnderstand, work to do.18:58
sbalukoffright. We're not populating the data model correctly. And it turns out populating it correctly is painful because we have to duplicate logic from the repository.18:59
*** localloop127 has quit IRC19:01
*** localloo1 has quit IRC19:02
*** localloo1 has joined #openstack-lbaas19:04
*** numans has joined #openstack-lbaas19:07
*** kobis has quit IRC19:08
*** mhickey has joined #openstack-lbaas19:10
mhickeyblogan, dougwig, johnsom: Hey19:13
mhickeyblogan, dougwig, johnsom: re, looking for your guru knowledge?19:16
openstackLaunchpad bug 1544861 in neutron "LBaaS: connection limit does not work with HA Proxy" [Undecided,New]19:16
*** numans has quit IRC19:17
*** localloop127 has joined #openstack-lbaas19:17
openstackgerritStephen Balukoff proposed openstack/octavia: Do DB updates before pushing listener conf to amp
*** localloo1 has quit IRC19:19
johnsommhickey looking19:20
sbalukoff(Gonna go AFK for a bit, then run the above patch through some manual scenario tests...)19:21
mhickeyjohnsom: ack19:22
*** kobis has joined #openstack-lbaas19:28
johnsommhickey I commented and marked incomplete19:28
johnsommhickey let me know if you need more19:29
mhickeyjohnsom: that's great, thanks.:)19:31
*** yamamoto_ has quit IRC19:34
bloganmhayden: left a comment on the bug
openstackLaunchpad bug 1544313 in neutron "LBaaSv2 agent schedules LB's to agents that are offline" [Undecided,New]20:01
sbalukoffjohnsom: I have to run off for a bit, but I just ran the above patch through several scenario tests and everything worked as expected. (ie. DB stays in synch with amphora haproxy.cfg, and changes get pushed immediately)20:11
sbalukoffI would be interested in finding out if anyone can find a breaking scenario test with the above patch.20:12
sbalukoffIn the mean time... during my shower this morning I thought of another way to populate the data model which would be more accurate, would probably be uglier, but also probably wouldn't be n^2 or worse efficiency.20:12
sbalukoffI'll try giving that a shot, but I'm still in favor of doing the DB updates first because that way we don't have to duplicate logic in the flows that's already in the repository.20:13
sbalukoffAnyway, I'll be checking back in here periodically.20:13
rm_worksbalukoff: i had a question on your L7 DB review20:16
rm_workalso -- showerthoughts best thoughts20:16
sbalukoffrm_work: What's the question?20:17
rm_worknothing major, unless my gut feeling about that _listeners var is right20:18
sbalukoffrm_work: I'll have a look and see about answering that later this afternoon. Thanks for the heads up!20:18
*** kevo has quit IRC20:19
*** localloop127 has quit IRC20:20
*** kobis has quit IRC20:26
*** localloop127 has joined #openstack-lbaas20:30
*** kobis has joined #openstack-lbaas20:46
*** yamamoto_ has joined #openstack-lbaas20:48
*** yamamoto_ has quit IRC20:52
*** kobis has quit IRC20:55
xgermanrm_work yt?21:04
xgermanmoar bbq questions21:04
*** piet has joined #openstack-lbaas21:05
*** kevo has joined #openstack-lbaas21:10
johnsomYeah, the more I think about it, the more I am against re-ordering the flows.  Much like we decided earlier in the week, I think it is best to fix the root cause of the problem and get the better code working again.21:17
*** yamamoto_ has joined #openstack-lbaas21:18
bloganjohnsom, sbalukoff: i agree, just by skimming over the scrollback, address the root cause21:21
*** yamamoto_ has quit IRC21:22
bloganjohnsom, sbalukoff: do yall think if we dumped the data models in favor of just passing sql alchemy models around would be better? I've had experiences where sqlalchemy modesl just don't do what you want, so thats how the data models came into existence, but they need cannto reproduce the back and forward links like sqlalchemy gives21:22
johnsomI don't know if it would be better or not.  It would be a big change21:23
bloganas far as the repo goes, i regret the decision to have them convert the SA models to data models, because it makes reusing the methods as subtransactions harder21:24
bloganwould rather have another layer that does that now21:24
bloganwell traversing the models is a pain in the ass, especially for people who don't know the limitations, and they're hard to spot21:25
dougwigblogan: you made that decision.21:25
bana_kanyone facing prob with create listener /21:25
dougwigblogan: i think we all said, "no hurry."  :)21:25
bana_kthere is a prob with peer_port being none in the haproxy.cfg file21:26
blogandougwig: why is it that you haven't said one thing but once i say i regret a decision you're quick to pounce?21:26
dougwigblogan: i have a deep need to add value.21:26
bloganbana_k: i have not but i haven't done a lot of octavia testing lately21:27
johnsombana_k Yes, that is related to the recent model changes we are discussing.  We have some bug(s) related to the models to figure out.21:27
bana_kblogan: it happens only with active-standby21:27
blogandougwig: i don't regret having the sa models be converted to data models, i regret doing that in that one layer21:27
bana_kjohnsom: ok thanks21:27
*** mhickey has quit IRC21:28
johnsombana_k I'm looking into it.  Hopefully we can run it to ground today.21:29
bana_kjohnsom: thanks a lot.21:37
*** TrevorV has quit IRC21:38
*** piet has quit IRC21:40
johnsomOk, so if I use commit 3a49fe2dcf30bd4979420bd82b74bf2c37ec215c (pre-shared pools) and cherry pick I can update the lb_algorithm on the pool with no trouble.  Updates the first time, like it should.21:40
*** ajmiller has quit IRC21:45
johnsommember-update is not working.21:47
johnsomIt takes two tries, so that is a problem I will look into after I try healthmon21:47
xgermanblogan Cosmos avoids sql alchemy altogether - I have been told but them it’s the devli21:48
*** yamamoto_ has joined #openstack-lbaas21:48
*** ajmiller has joined #openstack-lbaas21:48
johnsomNot completely true, it avoids the models21:48
xgermanyou guys need to tell me things more completley21:48
xgermanI am binary: devli/saint21:48
xgermanand we wouldn’t have those problems if we would have stuck to ruby...21:50
* xgerman hides21:50
*** yamamoto_ has quit IRC21:52
johnsomlbaas-healthmonitor-update also works correctly21:52
sbalukoff....aaand gerrit is down.22:15
*** localloop127 has quit IRC22:16
sbalukoffSo, on not reordering the flows:  By doing that you are saying we need to duplicate the logic already in the repositories when we update data models.22:16
sbalukoffAlso, this is inconsistent with creates, where we definitely do create the DB entries before we push out the listener configs.22:17
sbalukoffAnd...  for what? This gains us nothing in a revert:  You still end up with a listener in an 'ERROR' state.22:17
sbalukoffAm I wrong in interpreting a listener in 'ERROR' to mean that it's potentially out of synch with the amphorae?22:18
sbalukoffI'm not even convinced at this point that we would want to roll back the DB transactions, per se. I think "ERROR" should be good enough-- after all, with active-standby and active-active, what do you do when one amphora takes the update just fine, but others don't?22:19
sbalukoffAlso, realize that with everything I've tried (including member updates) is working.22:20
sbalukoffSo please... if you're going to argue for updating the model (and duplicating logic from the repo, including stuff the SQLAlchemy takes care of for us right now) so that we can leave the DB with the "original" objects (which won't be in the original state anyway)...  can someone please explain the technical reason for wanting to do this?22:22
johnsomPrior to shared pools (maybe with shared pools, haven't tried it) they all work except for member update.22:22
sbalukoffSee, it's that "except member update" part that says to me "It's not working correctly"22:23
johnsomThe reason is the same as we discussed earlier in the week.  We put a lot of effort in to having this capability (we could have just slammed the DB at the api layer) to allow retries (yes, not implemented yet), debugging of the error after the fact, and a rollback that can be more intelligent.22:24
johnsomNever said there isn't a bug with member.22:24
sbalukoffBut there's not a bug with the patch I've proposed.22:24
johnsomBut I'm not willing to throw everything else out, just because we can slam some code22:24
sbalukoffYou are mis-characterizing what I'm proposing.22:25
johnsomYes, you can no longer retry update actions, you can no longer debug the before and requested update state22:25
sbalukoffYou're talking about hypothetical code that's not been written yet. You could indeed retry update actions if you keep track of what those actions are supposed to be.22:25
johnsomThe code you proposed, by re-ordering, is incomplete in that you would need to pull all of the model management code out of the flows and probably just update the DB at the API layer.22:26
johnsomdebugging is not hypothetical22:26
sbalukoffI'm not sure I agree with you on not being able to debug the before update state.22:27
johnsomYou seem very worked up over the reorder patch.  What is the aversion to figuring out what is actually wrong with the model code?22:27
sbalukoffI agree that with my update it does make most of the model code seem pointless at this point. I'm not saying we out-and-out remove it at this stage in the game.22:27
johnsomIt's dead code with that change, so it really should be removed22:28
sbalukoffMy biggest aversion to it is that there is *significant* logic that happens in repo code for l7policies and l7rules.22:28
sbalukoffBecause the logic for handling those doesn't boil down to simple database relations.22:28
johnsomWould that need to change if we didn't re-order the flows?22:29
sbalukoffSo, if I have to do that in the data model as well...  not only do we potentially introduce a whole new area for bugs to creep in, but we've got to do all the stuff that SQLAchemy actually does take care of for us in there as well.22:29
*** rtheis has quit IRC22:30
sbalukoffIf we don't re-order the flows, I have to duplicate *all* of that logic in the data model.22:30
sbalukoffBecause, again, you're working around the fact that we don't have real database transactions by trying to fake it with the data model. That's fine as long as you are working off very simple models.22:30
sbalukoffL7 stuff isn't that simple.22:31
johnsomOk, so this, to me, is a totally different argument22:31
johnsomI'm going from a perspective of "it worked before(with a bug), allows retries, and maintains some level of idempotent behavior.22:33
sbalukoffDo we actually do retries?22:34
sbalukoffI would argue that this isn't really idempotent so long as we end up in an ERROR state.22:34
johnsomIf you want to be brash about it, give me ten minutes.22:34
sbalukoffOr rather, what I've proposed is no less idempotent.22:35
sbalukoffI'm not meaning to be brash.22:35
sbalukoffMaybe I don't understand what you mean by "allows retries"22:36
johnsomSo, what I am hearing is the real issue is around how we have implemented the model and the complexity of the L7 rules/etc.  That I have not considered or looked at, so I'm open to looking into that argument22:36
sbalukoffThat's part of the real issue, yes.22:38
johnsomWhat I mean is, much like we had mis-applied it, in the create amphora flows we used the retry capability of taskflow to retry certain actions n-times before fully reverting.  In a future phase, when we turn on job board, that includes trying from multiple workers.22:38
sbalukoffAnd, as it's implemented, things aren't updated in the DB until it's done?22:39
sbalukoff(If so, then I'm pretty sure all of our create flows are off)22:39
johnsomUpdate and delete are the two actions that maintain the actual state of the amp in the database while the actions are attempted.22:40
johnsomCreate updates DB as actions are completed (i.e. there is no previous state)22:41
sbalukoffWhat a time for gerrit to be down. :P22:42
johnsomSo, point me to the patch that has the repo changes it it, that would make model updates hard.22:42
johnsomAt least an hour22:42
sbalukoffI was about to do that.22:42
sbalukoffBut.... I can't right now because jenkins is down.22:42
sbalukoffgerrit. Whatever.22:43
johnsomI am still worried that the shared pools patch may have broken updates and the follow on to-model patch breaks creates22:45
sbalukoffWhen it's back up, it will be this patch:  And this patch:
sbalukoffAre you even going to look at my to-model patch + flow re-order patch combo?  (ie. the one I have not been able to get to do the wrong thing, even with member updates?)22:47
johnsomI can try it, but I don't know how re-order would fix the corrupt haproxy.cfg on create22:48
*** yamamoto_ has joined #openstack-lbaas22:48
sbalukoffGive it a shot.22:48
rm_workugh gerrrrrit22:48
rm_worki want to look at what you're talking about too but didn't get a chance yet lol22:49
sbalukoffThinking about the retry problem-- without actual DB transactions, we're going to be stuck faking transactions in one way or another. We could do this in the model, as I think is being proposed here. But we could also do this in the database by storing state information in a secondary table. (Just brainstorming, eh.) It's still an ugly pain-in-the-ass hack of course... It would be *so* nice to have real database transacti22:52
sbalukoffons here.22:52
*** yamamoto_ has quit IRC22:52
johnsomCorrection, we ARE doing it using the model.  Yes, using database as history was talked about when we were designing this.22:54
sbalukoffIf we're committed to a data model layer we really should have made sure that only that code accessed the repository. As it is, things like the API go straight to the repository.22:54
johnsomWell, maybe that is the right answer.  Develop a method of maintaining a sqlalchemy session transaction across tasks in a flow22:54
*** neelashah has quit IRC23:01
rm_workyeah i remember discussion a LB state history table23:04
rm_workthat gets really ridiculous eventually tho23:04
xgermanThe flow should be able to keep the history so we have things spanning multiple flows?23:05
johnsomI'm really warming to the idea of figuring out transactions23:05
sbalukoffWell... thinking about this some more, when jobboard is implemented... we'd have to build in state management,  right? And we want to attempt from multiple controllers...  which means we're probably storing state in the database in one form or another.23:05
xgermantask flow has revert and such — so what additional data do we need?23:06
johnsomIt's already handled for us.  We are storing that state in the taskflow flow storage.23:06
rm_workthat's interesting, as the queue is the only storage for the "new" state, and if it's picked up by a controller worker... and that CW fails....23:06
rm_workit would have to just be stuck back on the queue right?23:06
rm_workin which case another worker could pick it up anywhere anyway?23:06
johnsomWhen you turn on jobboard it persists that in a DB table while the flow is running23:06
rm_worktaskflow does it?23:06
rm_workstill have managed to avoid learning pretty much anything about taskflow somehow23:07
xgermanyeah, and since it has revert logic we should be able to have transcations23:07
xgermanunless we are worried that other flows intercept us23:07
sbalukoffxgerman: You're going to need to be able to lock other entities out of altering stuff. Presently we manage that through the 'PENDING_UPDATE' state.23:08
johnsomRight now our flow storage is in-memory.  That is part of the work to turn on job board, we define the persistent store.23:08
johnsomsbalukoff right23:08
xgermanwell, in-memory out of memory is an implementation deatil23:08
xgermanyep, we mark the whole LB pending23:09
sbalukoffSomething is not clear to me: Why couldn't we retry if the end-state we want for listeners is stored in the database?23:09
xgermanand reject any update23:09
sbalukoffxgerman: Yep.23:09
xgermandatabase = operational state23:09
sbalukoffThat's not true when we're updating.23:09
sbalukoff(Hence the reason we lock out other transactions with the pending update state)23:10
xgermanwe lock them out since it would cause concurrency issues23:10
sbalukoffYes, I know.23:11
sbalukoffBut the nature of that concurrency issue is that while the update is happening, the database is not actually in synch with what's on the amphorae.23:11
xgermanthen we give the state we want to task flow it makes it so or rolls back23:11
*** Bjoern has quit IRC23:11
sbalukoffBut.... roll backs don't always work.23:11
sbalukoffThat is to say, our current roll-back is to just go to the error state.23:12
xgermanthose are the bugs we need to fix :-)23:12
sbalukoffWouldn't it be possible to store "this is what the DB model looked like just before the update" and attempt to roll that back? (Again, I'm having trouble understanding the need for the data model here.)23:13
sbalukoffThat is...23:13
sbalukoffto store that in the same way we're going to have to store the data model state for retries?23:13
johnsomPrior to your changes, we don't need to store anything additional to what we currently have to implement retries23:14
xgermanIn think storing in the DB the state how it is should be our goal… otherwise monitoring/observing the system will get tricky23:14
xgermanI am Still puzzled that update would not follow that philosophy23:15
*** allan_h has quit IRC23:16
sbalukoffjohnsom: So are you saying we should just ditch the idea of supporting features that require more complex models like L7 policies and rules? :P23:16
*** piet_ has joined #openstack-lbaas23:16
*** ducttape_ has quit IRC23:16
johnsomNot my intent, just saying that your statement about storing state for retries did not exist.23:17
* sbalukoff sighs23:17
sbalukoffWhat do we currently retry?23:17
* johnsom sighs23:18
johnsomI am simply saying, there is nothing more to store23:18
johnsomIt's already there23:18
blogansbalukoff: are you saying no need for data models or sql alchemy models? (not sure if you're saying one or both)23:20
sbalukoffI'm not sure what we need the data models for.23:20
johnsomI think he doesn't want to update the data models for L723:20
blogansbalukoff: but do you mean the static data models, or the sqlalchemy models?23:21
sbalukoffEr... the ones described in common/ ?23:21
blogansbalukoff: okay, the redundant ones23:21
sbalukoffI agree that sqlalchemy adds some complication in inadvertent ways in the effort to make many other things around the database simpler.23:22
blogansbalukoff: so instead use the sqlalchemy models and pass those around?23:22
johnsomSo, most of the gang is here it seems.  Should we re-evaluation the "maintain history during update" decision?23:22
sbalukoffI'm not sure at this point whether I would advocate scrapping the sqlalchemy stuff right now.23:22
sbalukoffblogan: Er... yes? I think?  We don't have things named very descriptively, so it might help to reference the file in which they're defined.23:23
blogani would say keep the sqlalchemy models, but remove the data models if possible, but that's got some big cons to it23:23
sbalukoffIf by sql alchemy models you means the stuff in db/ that builds off of...23:23
*** jwarendt has quit IRC23:23
sbalukoffOh! I forgot we can get at this stuff outside of gerrit.23:24
johnsomSo, on update and delete calls to the Octavia API, is there value in maintaining the currently deployed state of the amp until the update on the amp completes successfully?23:24
*** jwarendt has joined #openstack-lbaas23:24
sbalukoffI think if we're going to keep the data models, we should force everything higher level to use that interface (like the api)23:25
bloganjohnsom: so update the db with the new information immediately at the API layer?23:25
johnsomblogan Yes23:25
blogansbalukoff: does it not now?23:25
sbalukoffblogan: That's what it does now.23:25
sbalukoffOr rather...23:25
bloganjohnsom: so if we update the information immediately, then users will see that information immediately on GETs to that resource...which isn't too big of a deal23:26
bloganthinking out loud here23:26
sbalukoffIt calls the handler, which does get sent to the controller worker.23:26
sbalukoffBut the API is going straight to the repository all over the place...23:26
johnsomblogan yes, the get call would return data that might not yet be in effect23:26
johnsomblogan Especially if we have a large number of amps23:27
blogansbalukoff: i dont understand what you mean yet23:27
bloganjohnsom: the whole reason to do the update only one success was to rollback to a previously known good state, obviously not implemented yet, but also it left open the ability to do a task like API in the future23:28
johnsomblogan Yep23:29
blogantask like API would be, someone requests a create, update, or delete, and tehn they get a task id and query the task id until active23:29
bloganbut that was too forward thinking honestly23:29
sbalukoffblogan: In api/v1/controller/ you will find a line that looks like this:  db_listener = self.repositories.listener.create(23:29
sbalukoff                session, **listener_dict)23:29
sbalukoffSo... we're going straight to the DB to do that.23:29
blogansbalukoff: yeah, so what should we do?23:29
sbalukoffor repository. Which is effectively the DB.23:29
*** jwarendt has quit IRC23:30
sbalukoffIf we keep the data models, we should disallow accessing the repository directly. So the above call becomes a call to a handler only, or something else which operates on the data_model.23:31
sbalukoffAlso, we're doing things like setting the LB and listener statuses on update.23:31
sbalukoffWhich, again, is happening directly against the DB.23:31
sbalukoff(or repository, again... which is effectively the DB)23:31
bloganso the main concern with going to the handler for that is the status change23:32
bloganwell not just that23:32
bloganbut now we'd need to build a whole rpc api for the db calls23:33
*** yamamoto_ has joined #openstack-lbaas23:33
johnsomSo, going through the models, we make the transaction thing worse right?23:33
sbalukoffwhich models?23:33
bloganalso what do yall mean by not being able to do transactions?23:33
blogando amp operations in the middle of a db transaction?23:33
johnsomyour proposal for no direct repo access.23:33
sbalukoffjohnsom: I suspect it would, yes.23:34
sbalukoffWell... maybe.23:34
johnsomYeah, right now we are doing a poor man's transaction in the flows by updating the models and leaving the DB in place.23:34
sbalukoffIf the handler handled keeping state so the API didn't have to worry about it at all that might be better.23:34
*** longstaff has quit IRC23:34
sbalukoffI dunno.23:35
sbalukoffA lot of this stuff is really opaque since we don't seem to be consistently using layers.23:35
blogani don't like going to the handler for db calls23:36
johnsomI think we should write native sql and get rid of it all!  grin23:37
blogani'm not entirely sure if oslo messaging makes rpc cast calls go over the queue, but either way GETs going straight to the workers will overload them23:37
*** yamamoto_ has quit IRC23:37
*** piet_ has quit IRC23:37
sbalukoffblogan: We're already going to the handler to update the DB in the case of updates and deletes.23:37
johnsomsbalukoff I don't follow you there23:38
blogansbalukoff: yes but i'm talking about GETs on the API will have to retrieve from the db, and GETs happen exponentially more frequently than POSTs PUTs and DELETEs, so the cotnroller worker serving all those GETs is asking for trouble23:38
johnsomWhat handler?23:38
bloganjohnsom: handler, right now its the queue producer basically23:38
sbalukoffAgain, looking at api/v1/controllers/   put method:   self.handler.update(db_listener, listener)23:38
sbalukoffThat's how we update the DB.23:38
johnsomblogan +1 to that23:38
johnsomblogan GETs should not go across the queue23:39
sbalukoffBut we don't do that for creates.23:39
blogansbalukoff: and johnsom is suggesting just doing the db update before the handler update23:39
bloganjust like creates23:39
sbalukoffblogan: Er.. that's what I have been suggesting.23:39
sbalukoffThe task flow reorder would do that.23:40
blogansbalukoff: what? you said the API node should go to the handler to get db access, which to me means go across the queue, or straight to the controller23:40
johnsomsbalukoff No, the order would go away23:40
sbalukoffjohnsom: Ok, now I'm confused.23:40
bloganwe all are :(23:40
sbalukoffblogan: You're right that I was suggesting going to the handler for DB access. But you're also right that I wasn't thinking about read-only queries (which should be safe to do directly).23:41
blogansbalukoff: ah okay now i think i know what you mean23:41
sbalukoffMostly I was pointing out that for creates we update the database THEN call the handler, whereas for updates and deletes we just call the handler.23:41
sbalukoff(And the handler updates the DB.)23:41
blogansbalukoff: so creates would go to the handler as well before updating the db?23:42
johnsomIf we are going to not maintain history, as the re-order patch does, there is no point to all of that model stuff.  The API should update the database straight away and call it a day.23:42
sbalukoffblogan: If we want to be consistent in how we handle retries / rollbacks.. yes. But I'm actually much more of a fan of the idea that we update the DB then call the handler for ALL write operations to the database...23:42
blogansbalukoff: if creates go to the handler before the db, the user won't be able to get an ID from the POST to know what load balancer to query23:43
sbalukoffblogan: Well... right now we return data from a delete and update that's stale, too.23:43
bloganso yeah i'm in favor of doing the updates the same way as creates23:43
johnsomsbalukoff NO, we do not return stale data on update or delete23:44
blogansbalukoff: depends on your definition of stale23:44
bloganwhen the request returns that is still hwo the lb is configured, or whatever resource23:44
johnsomWe currently, pre-reorder return the actual state of the amp23:44
*** armax has quit IRC23:44
*** armax has joined #openstack-lbaas23:44
sbalukoffjohnsom: So... we tell the user about a state that will only exist if the transaction succeeds?23:45
sbalukoffIn other words, the same data we would return if we just updated the database in the first place? ;)23:45
johnsomsbalukoff Not currently23:45
johnsomsbalukoff with the re-order, yes your would return in-accurate data if it fails23:46
sbalukoffRight. I'm just more concerned about returning accurate data if it succeeds.23:46
johnsomTo me re-order == update db at API time23:46
johnsomBoth the current way and re-order would return accurate data on success23:47
sbalukoffI'm not sure I understand what you mean when you say "pre-reorder return the actual state of the amp"23:47
sbalukoffSo... both the current way and re-order return inaccurate data on failure?23:47
johnsomsbalukoff Current code, pre-reorder, get calls to the API reflect the actual state of the LB at all times.23:47
xgermanand I like that23:48
johnsomCurrent way will continue to return accurate data on a failure, re-order would return inaccurate data23:48
sbalukoffHow do you figure?23:48
johnsomas the database has been updated, but the load balancer has not yet changed23:48
sbalukoffWe're talking failure mode: The load balancer is in error.23:48
xgermanblogan, gorge, and I decided a while ago to go with the real data in the DB - desired data passed into the queue/flow23:49
sbalukoffSo, almost by definition, we can't trust that the state of the listener is accurate EXCEPT for the fact that it's in error.23:49
johnsomState == Error, configuration == new configuration not currently running on the load balancer.23:49
johnsomxgerman +1 yes, we decided this.  This is a re-hash23:50
xgermanerror is a thorny issue and we might have bugs in it since those cases are hard23:50
xgermanso I am not defending todays rollback code23:50
johnsomJust because the lb is in ERROR (today) doesn't mean we shutdown the amps, they are still running with a config different than a GET would return23:50
sbalukoffjohnsom: By "State == Error, configuration == new configuration not currently running on the load balancer" are you talking about with or without the taskflow reorder.23:51
johnsomxgerman +1 we have not implemented retry code yet.23:51
sbalukoffBecause that still accurately describes things with the task flow reorder.23:51
johnsomError is a cop-out23:51
johnsomsbalukoff With the reorder.  It causes this23:52
xgermangoal is to make the code better so even error has the right info in the DB23:52
johnsomWithout re-order, we do not have that problem23:52
sbalukoffjohnsom: Without the reorder, today, we get the same thing.23:52
johnsomsbalukoff No!23:52
bloganjohnsom: would doing the db update for all updates at the api layer solve a lot of these issues?23:52
johnsomblogan No, it is the same as re-order, just a better implementation23:53
sbalukoffjohnsom: Er... I could be wrong, but I swear that's what I've been seeing in my testing...23:53
johnsomsbalukoff With current code, we fail, the lb goes into error, but a get will pull what is actually running on the amps23:53
xgermanblogan, making the DB show what we want makes it difficult to get a sense what’s actually running23:53
sbalukoffjohnsom: "what is actually running on the amps" is incorrect. You can't assume anything is actually running on the amps correctly at that point.23:54
johnsomsbalukoff because we do not commit the updates to the DB until after all of the amps have successfully been updated.  Active/Active scale out actually makes the re-order problem worse23:54
johnsomsbalukoff why?23:55
sbalukoffjohnsom: You have to know all the ways the amphora update can fail.23:55
johnsomsbalukoff we know the attempt to change what is running on the amp failed23:55
sbalukoffYep. And we don't know why.23:55
bloganjohnsom: if someone adds a member on another subnet, that network has to be plugged and that could cause the amp to fail somehow23:56
johnsomThat is a create call23:56
xgermanok, so you think amp in error - all bets are off, should it between the eyes and start over23:56
bloganso its possible the amp could not be running, but i'd says its unlikely23:56
bloganjohnsom: oh good point lol23:56
sbalukoffxgerman: That's generally what I expect most people to do.23:57
sbalukoffDo we not allow changing the member subnet and IP once they are created?23:57
xgermanwell, then the fresh amp fails and you start diggibng23:57
blogansbalukoff: nope, i see no reason to either23:57
bloganyou may as well creat a new member, thats essentially what you're doing23:57
xgermanand that digging part is helped by some hints from the DB...23:57
sbalukoffblogan: Ok. But that's one example. There are lots of reasons why the amp update might fail. You might just have lost power to part of your datacenter...23:58
xgermanor you file a ticket with your cloud admin to look at the logs...23:58
bloganxgerman: that is a decent reason23:58
johnsomYeah, that is the "debug" argument for keeping it as it is.23:59
blogani know23:59
johnsomAnd, thus the question, do we want to give up having the history in the DB23:59
bloganokay so, what are the alternatives here, keep things as is OR update db before?23:59

Generated by 2.14.0 by Marius Gedminas - find it at!