minwang2 | cool | 00:01 |
*** amotoki has quit IRC | 00:02 | |
*** madhu_ak_ is now known as madhu_ak | 00:05 | |
*** doug-fis_ has joined #openstack-lbaas | 00:10 | |
*** TrevorV has quit IRC | 00:12 | |
openstackgerrit | Michael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug https://review.openstack.org/279334 | 00:12 |
*** doug-fish has quit IRC | 00:15 | |
*** ducttape_ has joined #openstack-lbaas | 00:17 | |
ajmiller | dougwig you around? | 00:18 |
*** kevo has quit IRC | 00:46 | |
*** SumitNaiksatam has quit IRC | 00:53 | |
*** minwang2 has quit IRC | 00:58 | |
*** ajmiller has quit IRC | 00:58 | |
*** allan_h has quit IRC | 01:09 | |
openstackgerrit | Reedip proposed openstack/neutron-lbaas: Make subnet_id for lbaas v2 member create optional https://review.openstack.org/267935 | 01:13 |
*** ducttape_ has quit IRC | 01:21 | |
*** SumitNaiksatam has joined #openstack-lbaas | 01:32 | |
*** Aish has quit IRC | 01:37 | |
*** yamamoto_ has joined #openstack-lbaas | 01:44 | |
*** drjones has quit IRC | 01:44 | |
*** harlowja has quit IRC | 01:55 | |
*** amotoki has joined #openstack-lbaas | 01:58 | |
*** shakamunyi has joined #openstack-lbaas | 01:58 | |
*** ducttape_ has joined #openstack-lbaas | 02:00 | |
*** amotoki has quit IRC | 02:02 | |
*** yamamoto_ has quit IRC | 02:04 | |
*** Aish has joined #openstack-lbaas | 02:07 | |
*** Aish has left #openstack-lbaas | 02:07 | |
*** shakamunyi has quit IRC | 02:12 | |
*** rooney has quit IRC | 02:18 | |
*** ducttape_ has quit IRC | 02:23 | |
*** rooney has joined #openstack-lbaas | 02:24 | |
*** shakamunyi has joined #openstack-lbaas | 02:31 | |
*** Bjoern has joined #openstack-lbaas | 02:34 | |
*** Bjoern has quit IRC | 02:39 | |
*** bana_k has joined #openstack-lbaas | 02:42 | |
*** minwang2 has joined #openstack-lbaas | 02:46 | |
*** yamamoto_ has joined #openstack-lbaas | 02:51 | |
*** madhu_ak has quit IRC | 02:58 | |
*** ajmiller has joined #openstack-lbaas | 03:06 | |
*** ajmiller_ has joined #openstack-lbaas | 03:08 | |
*** ajmiller has quit IRC | 03:12 | |
*** ajmiller_ is now known as ajmiller | 03:13 | |
*** ducttape_ has joined #openstack-lbaas | 03:14 | |
*** doug-fis_ has quit IRC | 03:19 | |
*** doug-fish has joined #openstack-lbaas | 03:21 | |
*** doug-fish has quit IRC | 03:21 | |
*** kevo has joined #openstack-lbaas | 03:22 | |
*** ducttape_ has quit IRC | 03:22 | |
*** bana_k has quit IRC | 03:25 | |
*** links has joined #openstack-lbaas | 03:32 | |
*** ducttape_ has joined #openstack-lbaas | 03:45 | |
*** woodster_ has quit IRC | 03:46 | |
*** minwang2 has quit IRC | 03:48 | |
*** minwang2 has joined #openstack-lbaas | 03:51 | |
*** amotoki has joined #openstack-lbaas | 04:41 | |
*** minwang2 has quit IRC | 04:42 | |
*** minwang2 has joined #openstack-lbaas | 04:44 | |
*** ducttape_ has quit IRC | 04:58 | |
*** ducttape_ has joined #openstack-lbaas | 05:00 | |
*** ducttape_ has quit IRC | 05:05 | |
*** pc_m has quit IRC | 05:10 | |
*** hockeynut has quit IRC | 05:10 | |
*** rm_work has quit IRC | 05:11 | |
*** hockeynut has joined #openstack-lbaas | 05:12 | |
*** pc_m has joined #openstack-lbaas | 05:12 | |
*** rm_work has joined #openstack-lbaas | 05:18 | |
*** ajmiller_ has joined #openstack-lbaas | 05:18 | |
*** chlong has quit IRC | 05:19 | |
*** ajmiller has quit IRC | 05:21 | |
*** ajmiller_ has quit IRC | 05:21 | |
*** allan_h has joined #openstack-lbaas | 05:27 | |
*** fnaval has quit IRC | 05:31 | |
*** chlong has joined #openstack-lbaas | 05:33 | |
*** fnaval has joined #openstack-lbaas | 05:41 | |
*** chlong has quit IRC | 05:42 | |
*** chlong has joined #openstack-lbaas | 05:54 | |
*** rooney has quit IRC | 06:00 | |
*** ducttape_ has joined #openstack-lbaas | 06:01 | |
*** ducttape_ has quit IRC | 06:06 | |
*** numans has joined #openstack-lbaas | 06:09 | |
*** chlong has quit IRC | 06:11 | |
*** prabampm has joined #openstack-lbaas | 06:14 | |
*** rooney has joined #openstack-lbaas | 06:17 | |
*** numans has quit IRC | 06:26 | |
*** numans has joined #openstack-lbaas | 06:28 | |
*** chlong has joined #openstack-lbaas | 06:34 | |
*** numans has quit IRC | 06:53 | |
*** armax has quit IRC | 06:54 | |
*** ducttape_ has joined #openstack-lbaas | 07:03 | |
*** chlong has quit IRC | 07:03 | |
*** ducttape_ has quit IRC | 07:08 | |
*** numans has joined #openstack-lbaas | 07:15 | |
*** amotoki has quit IRC | 07:16 | |
*** minwang2 has quit IRC | 07:22 | |
*** minwang2 has joined #openstack-lbaas | 07:23 | |
*** amotoki has joined #openstack-lbaas | 07:24 | |
*** lmiccini|away is now known as lmiccini | 07:27 | |
*** minwang2 has quit IRC | 07:28 | |
*** minwang2 has joined #openstack-lbaas | 07:28 | |
*** allan_h has quit IRC | 07:29 | |
*** minwang2 has quit IRC | 07:30 | |
*** amotoki has quit IRC | 07:35 | |
*** minwang2 has joined #openstack-lbaas | 07:37 | |
*** amotoki has joined #openstack-lbaas | 07:39 | |
openstackgerrit | okamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy https://review.openstack.org/279402 | 07:41 |
*** minwang2 has quit IRC | 07:43 | |
*** amotoki has quit IRC | 07:43 | |
*** reedip is now known as reedip_away | 07:54 | |
*** amotoki has joined #openstack-lbaas | 07:55 | |
*** rooney has quit IRC | 07:57 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Fixed database model traversal https://review.openstack.org/279414 | 08:10 |
openstackgerrit | okamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy https://review.openstack.org/279402 | 08:23 |
*** kevo has quit IRC | 08:26 | |
*** ducttape_ has joined #openstack-lbaas | 09:04 | |
*** mhickey has joined #openstack-lbaas | 09:07 | |
*** mhickey has quit IRC | 09:08 | |
*** mhickey has joined #openstack-lbaas | 09:09 | |
*** ducttape_ has quit IRC | 09:09 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 database structures https://review.openstack.org/265430 | 09:10 |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Update repos for L7 policy / methods https://review.openstack.org/265529 | 09:31 |
*** amotoki has quit IRC | 09:34 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Update repos for L7 rules / validations https://review.openstack.org/276643 | 09:41 |
*** amotoki has joined #openstack-lbaas | 09:48 | |
*** amotoki has quit IRC | 09:53 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 api - policies https://review.openstack.org/265690 | 09:55 |
*** amotoki has joined #openstack-lbaas | 09:56 | |
*** amotoki has quit IRC | 10:01 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 api - rules https://review.openstack.org/277718 | 10:01 |
*** ducttape_ has joined #openstack-lbaas | 10:05 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 controller worker flows and tasks https://review.openstack.org/277768 | 10:06 |
*** ducttape_ has quit IRC | 10:09 | |
*** amotoki has joined #openstack-lbaas | 10:10 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 jinja template updates https://review.openstack.org/278223 | 10:11 |
*** amotoki has quit IRC | 10:12 | |
*** amotoki has joined #openstack-lbaas | 10:13 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 documentation https://review.openstack.org/278830 | 10:17 |
*** openstackgerrit has quit IRC | 10:32 | |
*** openstackgerrit has joined #openstack-lbaas | 10:33 | |
*** numans has quit IRC | 10:42 | |
openstackgerrit | Merged openstack/octavia: Fixes an intermittent load balancer delete failure https://review.openstack.org/278856 | 10:52 |
*** ducttape_ has joined #openstack-lbaas | 11:06 | |
*** ducttape_ has quit IRC | 11:11 | |
*** numans has joined #openstack-lbaas | 11:30 | |
*** amotoki has quit IRC | 11:30 | |
*** numans has quit IRC | 11:34 | |
*** numans has joined #openstack-lbaas | 11:36 | |
*** ducttape_ has joined #openstack-lbaas | 12:07 | |
*** links has quit IRC | 12:10 | |
*** ducttape_ has quit IRC | 12:12 | |
*** yamamoto_ has quit IRC | 12:12 | |
*** intr1nsic has quit IRC | 12:22 | |
*** intr1nsic has joined #openstack-lbaas | 12:25 | |
*** rtheis has joined #openstack-lbaas | 12:25 | |
*** ducttape_ has joined #openstack-lbaas | 13:07 | |
*** _ducttape_ has joined #openstack-lbaas | 13:11 | |
*** ducttape_ has quit IRC | 13:12 | |
*** mhickey has quit IRC | 13:12 | |
*** _ducttape_ has quit IRC | 13:25 | |
*** amotoki has joined #openstack-lbaas | 13:33 | |
*** amotoki has quit IRC | 13:43 | |
*** amotoki has joined #openstack-lbaas | 13:51 | |
*** amotoki has quit IRC | 13:52 | |
*** amotoki has joined #openstack-lbaas | 13:58 | |
*** neelashah has joined #openstack-lbaas | 13:58 | |
*** ajmiller has joined #openstack-lbaas | 14:11 | |
*** localloop127 has joined #openstack-lbaas | 14:20 | |
*** ajmiller has quit IRC | 14:44 | |
*** amotoki has quit IRC | 14:45 | |
*** amotoki has joined #openstack-lbaas | 14:48 | |
*** Bjoern has joined #openstack-lbaas | 14:48 | |
*** prabampm has quit IRC | 14:53 | |
*** amotoki has quit IRC | 14:53 | |
*** piet has joined #openstack-lbaas | 14:59 | |
*** ducttape_ has joined #openstack-lbaas | 14:59 | |
*** doug-fish has joined #openstack-lbaas | 15:01 | |
*** TrevorV has joined #openstack-lbaas | 15:04 | |
*** amotoki has joined #openstack-lbaas | 15:09 | |
*** yamamot__ has joined #openstack-lbaas | 15:09 | |
*** doug-fish has quit IRC | 15:21 | |
*** doug-fish has joined #openstack-lbaas | 15:22 | |
*** ajmiller has joined #openstack-lbaas | 15:28 | |
*** fnaval has quit IRC | 15:59 | |
*** localloo1 has joined #openstack-lbaas | 16:02 | |
*** localloop127 has quit IRC | 16:05 | |
*** localloop127 has joined #openstack-lbaas | 16:06 | |
*** localloo1 has quit IRC | 16:07 | |
*** fnaval has joined #openstack-lbaas | 16:19 | |
*** links has joined #openstack-lbaas | 16:23 | |
*** numans has quit IRC | 16:31 | |
*** armax has joined #openstack-lbaas | 16:32 | |
*** woodster_ has joined #openstack-lbaas | 16:39 | |
*** localloop127 has quit IRC | 16:40 | |
*** allan_h has joined #openstack-lbaas | 16:41 | |
*** localloop127 has joined #openstack-lbaas | 16:41 | |
*** bana_k has joined #openstack-lbaas | 16:43 | |
*** localloop127 has quit IRC | 16:49 | |
*** localloop127 has joined #openstack-lbaas | 16:51 | |
kevinbenton | blogan: yo | 16:53 |
kevinbenton | blogan: you had question about heartbeats? | 16:54 |
johnsom | kevinbenton Octavia heartbeats? I might be able to answer | 16:54 |
kevinbenton | johnsom: no idea, just had a ping from him from yesterday | 16:55 |
johnsom | Ah, ok | 16:55 |
johnsom | Can't help you there | 16:55 |
sbalukoff | johnsom: I have to jump into a phone meeting for the next half hour, but I have something I want to discuss with you after that. Do you think you'll be available? | 16:57 |
johnsom | sbalukoff available until 11, then after 1:30 | 16:58 |
sbalukoff | (It's about this https://review.openstack.org/#/c/279414/ as well as what I think was causing the pushing of "out of date" configs to the amphorae) | 16:59 |
*** yamamot__ has quit IRC | 16:59 | |
johnsom | Ok, yeah, ping me | 16:59 |
*** yamamoto_ has joined #openstack-lbaas | 17:00 | |
blogan | kevinbenton: neutron agent heartbeats | 17:02 |
kevinbenton | blogan: yeah, what about them? | 17:02 |
blogan | kevinbenton: if a heartbeat is found to be stale, does the agent's admin_state_up get moved to disabled, or is it the status that changes? | 17:02 |
kevinbenton | blogan: just the status | 17:03 |
kevinbenton | blogan: admin_state_up is purely controlled via the API | 17:03 |
kevinbenton | blogan: admin_state_up is to prevent stuff from being scheduled to it | 17:04 |
blogan | kevinbenton: okay thats what i figured | 17:04 |
openstackgerrit | Michael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug https://review.openstack.org/279334 | 17:04 |
blogan | kevinbenton: well wouldn't you want agents that have a stale heartbeat to not be scheduled to? | 17:04 |
blogan | if the status is OFFLINE or DOWN or whatever it is | 17:05 |
kevinbenton | blogan: yes | 17:05 |
blogan | kevinbenton: okay just making sure on that, lbaas agent does not check status when scheduling | 17:05 |
kevinbenton | blogan: i was just pointing out that admin_state_up is just used as a way to make it explicit | 17:05 |
blogan | kevinbenton: makes sense, thanks | 17:05 |
kevinbenton | blogan: i don't think this applies to the L2 agent | 17:05 |
kevinbenton | blogan: there was some discussion around changing that | 17:06 |
kevinbenton | blogan: but don't look to l2 agent as example | 17:06 |
blogan | kevinbenton: just looking at the lbaas agent, and it uses the common agent code in neutron | 17:06 |
*** localloop127 has quit IRC | 17:09 | |
*** localloop127 has joined #openstack-lbaas | 17:10 | |
kevinbenton | blogan: now that i think about it i'm not sure the status is actually stored anywhere | 17:15 |
kevinbenton | blogan: for the other agents | 17:15 |
*** _cjones_ has joined #openstack-lbaas | 17:16 | |
*** openstackgerrit has quit IRC | 17:17 | |
*** openstackgerrit has joined #openstack-lbaas | 17:18 | |
kevinbenton | blogan: i think the dead/alive is calculated dynamically on API queries | 17:18 |
kevinbenton | blogan: yeah, only thing in the DB is admin_state_up, no status column | 17:19 |
kevinbenton | blogan: | 17:19 |
kevinbenton | https://github.com/openstack/neutron/blob/master/neutron/db/agents_db.py#L72-L106 | 17:19 |
kevinbenton | is_active is calculated based on delta from last updated timestamp | 17:20 |
*** amotoki has quit IRC | 17:31 | |
blogan | kevinbenton: ah so a scheduler wouldn't be able to say don't schedule here if the heartbeats stopped | 17:31 |
kevinbenton | blogan: yeah, it can. it can check that is_active property | 17:32 |
kevinbenton | blogan: that will calculate based on the heartbeat | 17:32 |
blogan | kevinbenton: oh duh, sorry i was thinking about it wrong | 17:32 |
kevinbenton | blogan: i was just pointing out that we don't even have a status column so it can't possibly be affected by anything else | 17:33 |
blogan | kevinbenton: yeah makes sense | 17:35 |
blogan | kevinbenton: thanks for your help | 17:35 |
kevinbenton | blogan: no prob | 17:35 |
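A minimal sketch of the liveness check kevinbenton describes above: there is no stored status column, so dead/alive is derived from the delta between now and the agent's last heartbeat, and a scheduler can filter on that plus admin_state_up. The function names and the 75-second default below are assumptions for illustration, not the actual Neutron code (see the agents_db.py link above for the real is_active property).

```python
import datetime

AGENT_DOWN_TIME = 75  # assumed timeout in seconds; in Neutron this is a config option


def is_active(heartbeat_at, down_time=AGENT_DOWN_TIME):
    """Agent counts as alive if its last heartbeat is recent enough."""
    age = datetime.datetime.utcnow() - heartbeat_at
    return age.total_seconds() < down_time


def schedulable_agents(agents):
    """A scheduler could filter on admin_state_up plus the derived liveness."""
    return [a for a in agents
            if a['admin_state_up'] and is_active(a['heartbeat_timestamp'])]
```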
sbalukoff | johnsom: Ok! So! Wheneeeeever you're ready... | 17:38 |
johnsom | sbalukoff Coffee in hand let's go | 17:38 |
sbalukoff | Ok, so first off, you may notice that the bugfix patch I uploaded moves things to be a lot closer to what blogan wrote there originally, before the shared pools patch. | 17:39 |
johnsom | Yes | 17:39 |
sbalukoff | What it does is populate a data model tree from the perspective of whatever database object you hand it first. | 17:39 |
sbalukoff | blogan's (and this patch's) code refuses to revisit object classes we've seen before. | 17:40 |
johnsom | Right | 17:40 |
sbalukoff | This means that we can trust the data model to be accurate *for the first object we pass to it* but things will get less accurate the farther down the tree we traverse. | 17:40 |
sbalukoff | My code that I have in there (without this bugfix) tries to make things a little more fleshed out... | 17:40 |
sbalukoff | You can trust the accuracy of the object you pass to it, and all of its immediate children. | 17:41 |
sbalukoff | However, the problem with that is once you have a moderate number of things populated in the tree... | 17:41 |
sbalukoff | The efficiency of this algorithm becomes worse than n^2 | 17:41 |
sbalukoff | It's really, really terrible, actually. | 17:42 |
johnsom | Yeah, not fun | 17:42 |
sbalukoff | Which is why I'm advocating for essentially a roll-back. | 17:42 |
sbalukoff | But! | 17:42 |
sbalukoff | Where the rubber hits the road here is what we do in the flows whenever we do an amphora_driver_tasks.ListenersUpdate. | 17:42 |
sbalukoff | Right now, we pass that method a data model which has not been populated from the perspective of the listeners we are updating. | 17:43 |
sbalukoff | So... you get inaccuracies. | 17:43 |
sbalukoff | I see there being two potential solutions to this problem: | 17:43 |
johnsom | Ok, you lost me on that one | 17:44 |
sbalukoff | 1. Before we call ListenersUpdate, we have to populate the data model from the perspective of the listeners we pass to it, and then do our preliminary updates (eg. "delete this pool" "update this member", etc.) on the data model we've populated from the perspective of the listener... | 17:44 |
sbalukoff | Ok... | 17:44 |
sbalukoff | Let me find a concrete example. | 17:44 |
sbalukoff | Give me a minute. | 17:44 |
johnsom | I thought that was what these were for: https://github.com/openstack/octavia/blob/master/octavia/controller/worker/flows/listener_flows.py#L72 | 17:44 |
sbalukoff | Yes.. | 17:45 |
sbalukoff | So let's see... | 17:45 |
sbalukoff | If you look in controller_worker.py... | 17:45 |
sbalukoff | In the update_member method. | 17:45 |
sbalukoff | See how we populate the member from the database.... | 17:45 |
sbalukoff | But then 'listeners' is populated from that member data model? | 17:45 |
*** jwarendt has quit IRC | 17:46 | |
sbalukoff | That's where we run into trouble. | 17:46 |
johnsom | Yeah, we get the old member from DB, store it and the member updates in the flow. This allows the flow to update the model before the listener_update | 17:46 |
sbalukoff | We need to do a repo.get for the listener. | 17:46 |
*** bana_k has quit IRC | 17:46 | |
sbalukoff | That's solution 1. to this problem. | 17:46 |
sbalukoff | The second solution is to just update the database first, and allow the amphora_driver_tasks.ListenersUpdate to populate the listener directly from database data. | 17:47 |
johnsom | So you are saying that listeners list, from the member.pool is incomplete? | 17:47 |
sbalukoff | Hear me out on this one: | 17:47 |
*** longstaff has joined #openstack-lbaas | 17:48 | |
sbalukoff | johnsom: Yes, in effect. All the listeners are there, but listeners.pools is not because we're really doing this: member.pool.listeners.pool.member | 17:48 |
sbalukoff | Er.... sorry, member.pool.listeners.pools.member | 17:48 |
sbalukoff | Dammit, can't type today. | 17:48 |
sbalukoff | Anyway, you see how once we've traversed a member object, then a pool object, then a listener object, we can't go back to a pool object? | 17:49 |
sbalukoff | Not with the way we're populating the data model. | 17:49 |
sbalukoff | (If we're following blogan's code, which I think is the only efficient way to do this.) | 17:49 |
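A hedged illustration (not the actual Octavia to_data_model code) of the traversal guard being described: the recursion refuses to descend into an object type it has already visited, so member.pool.listeners gets populated but listeners.pools does not, and accuracy drops the further you get from the root object. The relations attribute and dict shape below are purely illustrative.

```python
def populate_tree(db_obj, seen_types=frozenset()):
    """Build a plain dict tree from db_obj, never revisiting an object type."""
    seen_types = seen_types | {type(db_obj).__name__}
    node = {'id': getattr(db_obj, 'id', None)}
    # 'relations' stands in for whatever relationship mapping the real models expose
    for name, children in getattr(db_obj, 'relations', {}).items():
        node[name] = [populate_tree(child, seen_types)
                      for child in children
                      if type(child).__name__ not in seen_types]  # skip seen classes
    return node
```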
sbalukoff | So.... I also want to say something about populating the database first: | 17:49 |
sbalukoff | Think about about what we presently do when we revert: | 17:50 |
sbalukoff | We mark the listener as being in the 'ERROR' state. | 17:50 |
johnsom | I think I am following you. I think this is why we have the reload task. We probably need to add that to the member update flow so it takes that list of members and updates those models | 17:50 |
sbalukoff | I interpret that to mean "this listener in the database might be out of synch with whatever is actually in place on the amphora" | 17:50 |
sbalukoff | johnsom: Yep. We actually have to do that for any flow that is not a listener. | 17:51 |
sbalukoff | So, pool, health_monitor, session_persistence, and also (now) l7policy, l7pool.... etc. etc. etc. | 17:51 |
johnsom | Hmmm, but all of those work today | 17:52 |
sbalukoff | I think it would be far simpler to just update the database first, and if we need to revert... well, we're already marking the listener as being in 'ERROR' right? | 17:52 |
sbalukoff | To be honest, I'm not sure they actually do work today. I think our scenario tests are fairly weak right now. | 17:53 |
sbalukoff | Er... tempest tests. | 17:53 |
johnsom | I'm not counting on scenario tests, I'm going on my testing | 17:53 |
*** jwarendt has joined #openstack-lbaas | 17:53 | |
sbalukoff | Ok. | 17:54 |
johnsom | EVERYTHING was broken with that database pull line I removed. That I know broke a ton | 17:54 |
sbalukoff | Yep. | 17:54 |
johnsom | Let me do a quick update test on my devstack with our latest master. | 17:54 |
sbalukoff | Ok. | 17:54 |
sbalukoff | Yeah-- I'm not even sure my super-inefficient model population code saves us from this. | 17:55 |
sbalukoff | Because, really, we're not updating the model right in the controller_worker before we hand it to the ListenersUpdate task. | 17:55 |
johnsom | I guess, I should pull your patch, right? | 17:55 |
sbalukoff | That would put you in a state much closer to how things were before the shared pools patch. | 17:56 |
sbalukoff | Though if you want to be entertained (or bored), go with master, populate a complete load balancer tree, and try to do an API get on the load balancer. | 17:56 |
sbalukoff | And watch the octavia-api daemon spin for about 30 seconds while consuming 100% CPU. XD | 17:57 |
johnsom | Ok, give me a minute. My build LB script is running. After it is done, I will pull your patch, restart and do updates on objects down the tree | 17:57 |
*** links has quit IRC | 17:58 | |
sbalukoff | There's one other fly in the ointment that should probably be mentioned if we want to go option 1, and update the model only (not the DB) before passing the model to the ListenersUpdate task... | 17:58 |
sbalukoff | There's a lot of logic in the repository.py file (not just in the l7policy / l7rule stuff I'm trying to add) which is not presently reflected in the data_models.py file. | 17:59 |
sbalukoff | So if you're trying to simulate update of the database while you're really only updating the data model... you're going to have to duplicate all the logic that happens in repository.py. | 18:00 |
sbalukoff | Which is ugly, to say the least. | 18:00 |
sbalukoff | Also, I should note that I could be totally wrong about all this-- this is what I discovered around 1:00am last night and I've only had about 4.5 hours of sleep. But I don't *think* I'm wrong about this. XD | 18:03 |
sbalukoff | Hmmm... there's also a distant third option, but it might not be possible.... We might be able to figure out a way to hold a database write lock while we attempt to update all the amphorae... and then roll back the transaction if we need to revert. But that also seems very ugly. | 18:06 |
johnsom | Well, that was interesting. I did a pool update to LEAST_CONNECTIONS and the haproxy didn't render right | 18:06 |
sbalukoff | It didn't? | 18:07 |
sbalukoff | This is after you've removed the database lookup from ListenersUpdate? | 18:07 |
johnsom | sbalukoff Yeah, a transaction would be the best option. Just the way we are doing repo etc. makes that nearly impossible | 18:07 |
sbalukoff | johnsom: Yeah. Super painful. | 18:08 |
xgerman | let me know when you figure out transactions - will need that for cascading deletes | 18:09 |
sbalukoff | xgerman: I am so, so sorry. | 18:10 |
johnsom | Transactions are easy. repo and sqlalchemy make them harder | 18:10 |
*** piet has joined #openstack-lbaas | 18:10 | |
openstackgerrit | Trevor Vardeman proposed openstack/octavia: WIP - Get me a Load Balancer https://review.openstack.org/256974 | 18:11 |
*** madhu_ak has joined #openstack-lbaas | 18:12 | |
johnsom | Ok, now just a build isn't rendering the haproxy config right. | 18:12 |
sbalukoff | johnsom: Again, this is after you've removed the database lookup from ListenersUpdate? | 18:13 |
johnsom | Yeah, and added your latest. Going to roll that back | 18:13 |
sbalukoff | If you're going to allow the ListenersUpdate to populate its data from the database, then the database should also be updated first. | 18:14 |
sbalukoff | (Just sayin' ;) ) | 18:14 |
sbalukoff | Anyway, do your thing. Maybe I'm interpreting the situation wrong, eh. | 18:14 |
*** amotoki has joined #openstack-lbaas | 18:14 | |
johnsom | I'm going to roll your traversal patch back, then try a build. It was failing on create, so not just update | 18:15 |
*** bana_k has joined #openstack-lbaas | 18:15 | |
*** kevo has joined #openstack-lbaas | 18:15 | |
sbalukoff | Ok. | 18:15 |
johnsom | o-cw said it was fine, but the haproxy conf was missing sections after I did the create of an lb with members | 18:16 |
sbalukoff | *nod* | 18:16 |
sbalukoff | Similar to symptoms I was seeing last night. | 18:16 |
johnsom | Ok, still broken | 18:16 |
johnsom | Ok, with my "revert bad workaround" removed, still not rendering. I'm going to pull out shared pools | 18:21 |
sbalukoff | What is your "revert bad workaround"? | 18:22 |
johnsom | https://review.openstack.org/#/c/277602/ | 18:23 |
sbalukoff | Right. | 18:23 |
*** openstack has joined #openstack-lbaas | 18:27 | |
sbalukoff | Yeah, I screw up all the time, which is why it's good you're trying to verify whether my assessment of the situation is correct. :) | 18:27 |
johnsom | Ok, starting over, testing master, create LB | 18:28 |
johnsom | Ok, master, create LB works. | 18:29 |
sbalukoff | Ok. It's just horribly inefficient then. :) | 18:30 |
johnsom | Yeah, could be. Let me try an update | 18:30 |
sbalukoff | Seriously, populate like 3 listeners and three pools. | 18:30 |
sbalukoff | And watch it spin... | 18:30 |
johnsom | Ok, so member update is showing the same need-to-run-twice issue. Trying pool update | 18:32 |
johnsom | (though member update was re-ordered I thought) maybe that was delete | 18:32 |
johnsom | It was delete, nevermind | 18:33 |
sbalukoff | I know that one of the member methods accidentally got skipped with that fix. | 18:33 |
sbalukoff | Yeah. | 18:33 |
johnsom | Looks like our CLI help for pool-update is broken | 18:34 |
sbalukoff | Most of the cli update help is shit. | 18:34 |
johnsom | Careful, I implemented some of that... grin | 18:35 |
sbalukoff | (create and delete help isn't that bad.) ;) | 18:35 |
sbalukoff | But yeah... actually fixing the update help is down there on my to-do somewhere. | 18:35 |
sbalukoff | Probably after "give Octavia its own CLI, already." | 18:36 |
johnsom | It was a rushed implementation at a hack-a-thon to get it into kilo | 18:36 |
sbalukoff | That doesn't surprise me too much. :) | 18:36 |
sbalukoff | And really-- I'm not saying you didn't do what you had to do, eh. | 18:36 |
sbalukoff | (Or that I would have done differently) | 18:36 |
sbalukoff | I know how a lot of this code gets written, eh. ;) | 18:37 |
sbalukoff | Most of the testing against Octavia I've been doing lately has just been to run curl against the octavia API directly to reduce the complication / potential point of failure of neutron-lbaas. | 18:38 |
sbalukoff | (Makes it far simpler to determine whether it's an Octavia or a Neutron LBaaS problem I'm seeing.) | 18:38 |
sbalukoff | This might be helpful if you're interested in doing any of that: https://gist.github.com/anonymous/f59d96918f3103ad5651 | 18:39 |
*** openstackgerrit has quit IRC | 18:47 | |
*** SumitNaiksatam has quit IRC | 18:47 | |
*** openstackgerrit has joined #openstack-lbaas | 18:48 | |
johnsom | Yeah, ok, master is not updating | 18:51 |
johnsom | It renders, but doesn't update on first call | 18:52 |
sbalukoff | If it's helpful, I can write a second patch (dependent on my traversal fix patch) which changes the flows so the database update comes first. Again, the revert method in this case marks the listener as "ERROR" which is accurate enough. | 18:53 |
sbalukoff | And we can see whether that one fixes things. :/ | 18:53 |
johnsom | Let me mess with this a bit today. This stuff worked at one point and I have no idea what broke it. I want to look closer at those model updates. | 18:54 |
sbalukoff | Ok. | 18:55 |
johnsom | That said, I'm pretty sure your patch breaks the config render | 18:55 |
sbalukoff | Well.... I'd like to try things with the database update in flow order reversed anyway. So i'll go ahead and write that patch, even if we end up abandoning it. :) | 18:55 |
johnsom | Ok. Just note I will push back. But either way, this https://review.openstack.org/#/c/279414/ breaks the haproxy config render | 18:56 |
johnsom | I can't create LBs with that patch in | 18:57 |
sbalukoff | yep, makes sense that it would, with the database update happening after other things. | 18:57 |
*** piet has quit IRC | 18:57 | |
sbalukoff | Also, you should feel free to push back. But have good technical reasons for doing so, eh. ;) | 18:57 |
johnsom | Without that one patch, it creates just fine | 18:57 |
*** localloo1 has joined #openstack-lbaas | 18:58 | |
sbalukoff | yes, and I think I know why. But also without that patch we can't support an octavia infrastructure with more than about 3 listeners and pools. | 18:58 |
johnsom | So, order is not the issue | 18:58 |
johnsom | Understand, work to do. | 18:58 |
sbalukoff | right. We're not populating the data model correctly. And it turns out populating it correctly is painful because we have to duplicate logic from the repository. | 18:59 |
*** localloop127 has quit IRC | 19:01 | |
*** localloo1 has quit IRC | 19:02 | |
*** localloo1 has joined #openstack-lbaas | 19:04 | |
*** numans has joined #openstack-lbaas | 19:07 | |
*** kobis has quit IRC | 19:08 | |
*** mhickey has joined #openstack-lbaas | 19:10 | |
mhickey | blogan, dougwig, johnsom: Hey | 19:13 |
mhickey | blogan, dougwig, johnsom: re https://bugs.launchpad.net/neutron/+bug/1544861, looking for your guru knowledge? | 19:16 |
openstack | Launchpad bug 1544861 in neutron "LBaaS: connection limit does not work with HA Proxy" [Undecided,New] | 19:16 |
*** numans has quit IRC | 19:17 | |
*** localloop127 has joined #openstack-lbaas | 19:17 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Do DB updates before pushing listener conf to amp https://review.openstack.org/279731 | 19:18 |
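A rough sketch of the reordering idea in that patch, expressed as a TaskFlow linear flow: the DB update task runs before the task that renders and pushes listener configs, so the push reads fresh data and a failed push simply leaves the listener in ERROR. The task classes and flow name below are illustrative placeholders, not the actual Octavia tasks.

```python
from taskflow import task
from taskflow.patterns import linear_flow


class UpdateMemberInDB(task.Task):
    """Persist the requested member changes via the repository layer."""
    def execute(self, member_id, member_updates):
        pass  # placeholder: apply member_updates to the member row


class ListenersUpdate(task.Task):
    """Reload listeners from the DB and push rendered configs to the amps."""
    def execute(self, loadbalancer_id):
        pass  # placeholder: render haproxy.cfg from current DB state and push


def get_update_member_flow():
    flow = linear_flow.Flow('update-member')
    # DB first, then the amphora push; a failed push marks the listener ERROR
    flow.add(UpdateMemberInDB(), ListenersUpdate())
    return flow
```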
*** localloo1 has quit IRC | 19:19 | |
johnsom | mhickey looking | 19:20 |
sbalukoff | (Gonna go AFK for a bit, then run the above patch through some manual scenario tests...) | 19:21 |
mhickey | johnsom: ack | 19:22 |
*** kobis has joined #openstack-lbaas | 19:28 | |
johnsom | mhickey I commented and marked incomplete | 19:28 |
johnsom | mhickey let me know if you need more | 19:29 |
mhickey | johnsom: that's great, thanks.:) | 19:31 |
*** yamamoto_ has quit IRC | 19:34 | |
blogan | mhayden: left a comment on the bug https://bugs.launchpad.net/neutron/+bug/1544313 | 20:01 |
openstack | Launchpad bug 1544313 in neutron "LBaaSv2 agent schedules LB's to agents that are offline" [Undecided,New] | 20:01 |
sbalukoff | johnsom: I have to run off for a bit, but I just ran the above patch through several scenario tests and everything worked as expected. (ie. DB stays in synch with amphora haproxy.cfg, and changes get pushed immediately) | 20:11 |
sbalukoff | I would be interested in finding out if anyone can find a breaking scenario test with the above patch. | 20:12 |
sbalukoff | In the mean time... during my shower this morning I thought of another way to populate the data model which would be more accurate, would probably be uglier, but also probably wouldn't be n^2 or worse efficiency. | 20:12 |
sbalukoff | I'll try giving that a shot, but I'm still in favor of doing the DB updates first because that way we don't have to duplicate logic in the flows that's already in the repository. | 20:13 |
sbalukoff | Anyway, I'll be checking back in here periodically. | 20:13 |
rm_work | sbalukoff: i had a question on your L7 DB review | 20:16 |
rm_work | also -- showerthoughts best thoughts | 20:16 |
sbalukoff | rm_work: What's the question? | 20:17 |
rm_work | https://review.openstack.org/#/c/265430/28 | 20:17 |
rm_work | nothing major, unless my gut feeling about that _listeners var is right | 20:18 |
sbalukoff | rm_work: I'll have a look and see about answering that later this afternoon. Thanks for the heads up! | 20:18 |
*** kevo has quit IRC | 20:19 | |
*** localloop127 has quit IRC | 20:20 | |
rm_work | kk | 20:21 |
*** kobis has quit IRC | 20:26 | |
*** localloop127 has joined #openstack-lbaas | 20:30 | |
*** kobis has joined #openstack-lbaas | 20:46 | |
*** yamamoto_ has joined #openstack-lbaas | 20:48 | |
*** yamamoto_ has quit IRC | 20:52 | |
*** kobis has quit IRC | 20:55 | |
xgerman | rm_work yt? | 21:04 |
xgerman | moar bbq questions | 21:04 |
*** piet has joined #openstack-lbaas | 21:05 | |
*** kevo has joined #openstack-lbaas | 21:10 | |
johnsom | Yeah, the more I think about it, the more I am against re-ordering the flows. Much like we decided earlier in the week, I think it is best to fix the root cause of the problem and get the better code working again. | 21:17 |
*** yamamoto_ has joined #openstack-lbaas | 21:18 | |
blogan | johnsom, sbalukoff: i agree, just by skimming over the scrollback, address the root cause | 21:21 |
*** yamamoto_ has quit IRC | 21:22 | |
blogan | johnsom, sbalukoff: do y'all think it would be better if we dumped the data models in favor of just passing sql alchemy models around? I've had experiences where sqlalchemy models just don't do what you want, so that's how the data models came into existence, but they cannot reproduce the back and forward links like sqlalchemy gives | 21:22 |
johnsom | I don't know if it would be better or not. It would be a big change | 21:23 |
blogan | as far as the repo goes, i regret the decision to have them convert the SA models to data models, because it makes reusing the methods as subtransactions harder | 21:24 |
blogan | would rather have another layer that does that now | 21:24 |
blogan | well traversing the models is a pain in the ass, especially for people who don't know the limitations, and they're hard to spot | 21:25 |
dougwig | blogan: you made that decision. | 21:25 |
bana_k | anyone facing prob with create listener / | 21:25 |
bana_k | ? | 21:25 |
dougwig | blogan: i think we all said, "no hurry." :) | 21:25 |
bana_k | there is a prob with peer_port being none in the haproxy.cfg file | 21:26 |
blogan | dougwig: why is it that you haven't said one thing but once i say i regret a decision you're quick to pounce? | 21:26 |
dougwig | blogan: i have a deep need to add value. | 21:26 |
blogan | lol | 21:26 |
blogan | bana_k: i have not but i haven't done a lot of octavia testing lately | 21:27 |
johnsom | bana_k Yes, that is related to the recent model changes we are discussing. We have some bug(s) related to the models to figure out. | 21:27 |
bana_k | blogan: it happens only with active-standby | 21:27 |
blogan | dougwig: i don't regret having the sa models be converted to data models, i regret doing that in that one layer | 21:27 |
bana_k | johnsom: ok thanks | 21:27 |
*** mhickey has quit IRC | 21:28 | |
johnsom | bana_k I'm looking into it. Hopefully we can run it to ground today. | 21:29 |
bana_k | johnsom: thanks a lot. | 21:37 |
*** TrevorV has quit IRC | 21:38 | |
*** piet has quit IRC | 21:40 | |
johnsom | Ok, so if I use commit 3a49fe2dcf30bd4979420bd82b74bf2c37ec215c (pre-shared pools) and cherry pick https://review.openstack.org/#/c/277602/ I can update the lb_algorithm on the pool with no trouble. Updates the first time, like it should. | 21:40 |
*** ajmiller has quit IRC | 21:45 | |
johnsom | member-update is not working. | 21:47 |
johnsom | It takes two tries, so that is a problem I will look into after I try healthmon | 21:47 |
xgerman | blogan Cosmos avoids sql alchemy altogether, I have been told - but then it's the devil | 21:48 |
*** yamamoto_ has joined #openstack-lbaas | 21:48 | |
*** ajmiller has joined #openstack-lbaas | 21:48 | |
johnsom | Not completely true, it avoids the models | 21:48 |
xgerman | you guys need to tell me things more completely | 21:48 |
xgerman | I am binary: devil/saint | 21:48 |
xgerman | and we wouldn't have those problems if we had stuck to ruby... | 21:50 |
* xgerman hides | 21:50 | |
*** yamamoto_ has quit IRC | 21:52 | |
johnsom | lbaas-healthmonitor-update also works correctly | 21:52 |
sbalukoff | ....aaand gerrit is down. | 22:15 |
*** localloop127 has quit IRC | 22:16 | |
sbalukoff | So, on not reordering the flows: By doing that you are saying we need to duplicate the logic already in the repositories when we update data models. | 22:16 |
sbalukoff | Also, this is inconsistent with creates, where we definitely do create the DB entries before we push out the listener configs. | 22:17 |
sbalukoff | And... for what? This gains us nothing in a revert: You still end up with a listener in an 'ERROR' state. | 22:17 |
sbalukoff | Am I wrong in interpreting a listener in 'ERROR' to mean that it's potentially out of synch with the amphorae? | 22:18 |
sbalukoff | (configuration-wise) | 22:18 |
sbalukoff | I'm not even convinced at this point that we would want to roll back the DB transactions, per se. I think "ERROR" should be good enough-- after all, with active-standby and active-active, what do you do when one amphora takes the update just fine, but others don't? | 22:19 |
sbalukoff | Also, realize that with https://review.openstack.org/279731 everything I've tried (including member updates) is working. | 22:20 |
sbalukoff | So please... if you're going to argue for updating the model (and duplicating logic from the repo, including stuff that SQLAlchemy takes care of for us right now) so that we can leave the DB with the "original" objects (which won't be in the original state anyway)... can someone please explain the technical reason for wanting to do this? | 22:22 |
johnsom | Prior to shared pools (maybe with shared pools, haven't tried it) they all work except for member update. | 22:22 |
sbalukoff | See, it's that "except member update" part that says to me "It's not working correctly" | 22:23 |
johnsom | The reason is the same as we discussed earlier in the week. We put a lot of effort in to having this capability (we could have just slammed the DB at the api layer) to allow retries (yes, not implemented yet), debugging of the error after the fact, and a rollback that can be more intelligent. | 22:24 |
johnsom | Never said there isn't a bug with member. | 22:24 |
sbalukoff | But there's not a bug with the patch I've proposed. | 22:24 |
johnsom | But I'm not willing to throw everything else out, just because we can slam some code | 22:24 |
sbalukoff | You are mis-characterizing what I'm proposing. | 22:25 |
johnsom | Yes, you can no longer retry update actions, you can no longer debug the before and requested update state | 22:25 |
sbalukoff | You're talking about hypothetical code that's not been written yet. You could indeed retry update actions if you keep track of what those actions are supposed to be. | 22:25 |
johnsom | The code you proposed, by re-ordering, is incomplete in that you would need to pull all of the model management code out of the flows and probably just update the DB at the API layer. | 22:26 |
johnsom | debugging is not hypothetical | 22:26 |
sbalukoff | I'm not sure I agree with you on not being able to debug the before update state. | 22:27 |
johnsom | You seem very worked up over the reorder patch. What is the aversion to figuring out what is actually wrong with the model code? | 22:27 |
sbalukoff | I agree that with my update it does make most of the model code seem pointless at this point. I'm not saying we out-and-out remove it at this stage in the game. | 22:27 |
johnsom | It's dead code with that change, so it really should be removed | 22:28 |
sbalukoff | My biggest aversion to it is that there is *significant* logic that happens in repo code for l7policies and l7rules. | 22:28 |
sbalukoff | Because the logic for handling those doesn't boil down to simple database relations. | 22:28 |
johnsom | Would that need to change if we didn't re-order the flows? | 22:29 |
sbalukoff | So, if I have to do that in the data model as well... not only do we potentially introduce a whole new area for bugs to creep in, but we've got to do all the stuff that SQLAlchemy actually does take care of for us in there as well. | 22:29 |
*** rtheis has quit IRC | 22:30 | |
sbalukoff | If we don't re-order the flows, I have to duplicate *all* of that logic in the data model. | 22:30 |
sbalukoff | Because, again, you're working around the fact that we don't have real database transactions by trying to fake it with the data model. That's fine as long as you are working off very simple models. | 22:30 |
sbalukoff | L7 stuff isn't that simple. | 22:31 |
johnsom | Ok, so this, to me, is a totally different argument | 22:31 |
johnsom | I'm going from a perspective of "it worked before(with a bug), allows retries, and maintains some level of idempotent behavior. | 22:33 |
sbalukoff | Do we actually do retries? | 22:34 |
sbalukoff | I would argue that this isn't really idempotent so long as we end up in an ERROR state. | 22:34 |
johnsom | If you want to be brash about it, give me ten minutes. | 22:34 |
sbalukoff | Or rather, what I've proposed is no less idempotent. | 22:35 |
sbalukoff | I'm not meaning to be brash. | 22:35 |
sbalukoff | Maybe I don't understand what you mean by "allows retries" | 22:36 |
johnsom | So, what I am hearing is the real issue is around how we have implemented the model and the complexity of the L7 rules/etc. That I have not considered or looked at, so I'm open to looking into that argument | 22:36 |
sbalukoff | That's part of the real issue, yes. | 22:38 |
johnsom | What I mean is, much like we had mis-applied it, in the create amphora flows we used the retry capability of taskflow to retry certain actions n-times before fully reverting. In a future phase, when we turn on job board, that includes trying from multiple workers. | 22:38 |
sbalukoff | And, as it's implemented, things aren't updated in the DB until it's done? | 22:39 |
sbalukoff | (If so, then I'm pretty sure all of our create flows are off) | 22:39 |
johnsom | Update and delete are the two actions that maintain the actual state of the amp in the database while the actions are attempted. | 22:40 |
johnsom | Create updates DB as actions are completed (i.e. there is no previous state) | 22:41 |
sbalukoff | What a time for gerrit to be down. :P | 22:42 |
johnsom | So, point me to the patch that has the repo changes it it, that would make model updates hard. | 22:42 |
johnsom | At least an hour | 22:42 |
sbalukoff | I was about to do that. | 22:42 |
sbalukoff | But.... I can't right now because jenkins is down. | 22:42 |
sbalukoff | gerrit. Whatever. | 22:43 |
johnsom | I am still worried that the shared pools patch may have broken updates and the follow on to-model patch breaks creates | 22:45 |
sbalukoff | When it's back up, it will be this patch: https://review.openstack.org/265529 And this patch: https://review.openstack.org/276643 | 22:45 |
johnsom | Ok | 22:45 |
sbalukoff | Are you even going to look at my to-model patch + flow re-order patch combo? (ie. the one I have not been able to get to do the wrong thing, even with member updates?) | 22:47 |
johnsom | I can try it, but I don't know how re-order would fix the corrupt haproxy.cfg on create | 22:48 |
*** yamamoto_ has joined #openstack-lbaas | 22:48 | |
sbalukoff | Give it a shot. | 22:48 |
rm_work | ugh gerrrrrit | 22:48 |
rm_work | i want to look at what you're talking about too but didn't get a chance yet lol | 22:49 |
sbalukoff | Thinking about the retry problem-- without actual DB transactions, we're going to be stuck faking transactions in one way or another. We could do this in the model, as I think is being proposed here. But we could also do this in the database by storing state information in a secondary table. (Just brainstorming, eh.) It's still an ugly pain-in-the-ass hack of course... It would be *so* nice to have real database transactions here. | 22:52 |
*** yamamoto_ has quit IRC | 22:52 | |
johnsom | Correction, we ARE doing it using the model. Yes, using database as history was talked about when we were designing this. | 22:54 |
sbalukoff | If we're committed to a data model layer we really should have made sure that only that code accessed the repository. As it is, things like the API go straight to the repository. | 22:54 |
johnsom | Well, maybe that is the right answer. Develop a method of maintaining a sqlalchemy session transaction across tasks in a flow | 22:54 |
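A minimal sketch, assuming SQLAlchemy, of the idea floated here: hand one session (and its open transaction) to every step of a flow, push to the amphorae while the DB changes are still uncommitted, and commit only if everything succeeds. This is only an outline of the concept, not a worked-out integration with TaskFlow storage or retries; the connection string and callables are placeholders.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:')  # stand-in connection string
Session = sessionmaker(bind=engine)


def run_update(db_steps, amphora_push):
    session = Session()
    try:
        for apply_change in db_steps:
            apply_change(session)   # every step shares the one open transaction
        amphora_push()              # push configs while DB changes are uncommitted
        session.commit()            # all amps updated: make the DB changes durable
    except Exception:
        session.rollback()          # any failure reverts the DB to its prior state
        raise
    finally:
        session.close()
```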
*** neelashah has quit IRC | 23:01 | |
rm_work | yeah i remember discussion a LB state history table | 23:04 |
rm_work | that gets really ridiculous eventually tho | 23:04 |
johnsom | Agreed | 23:05 |
xgerman | The flow should be able to keep the history so we have things spanning multiple flows? | 23:05 |
johnsom | I'm really warming to the idea of figuring out transactions | 23:05 |
sbalukoff | Well... thinking about this some more, when jobboard is implemented... we'd have to build in state management, right? And we want to attempt from multiple controllers... which means we're probably storing state in the database in one form or another. | 23:05 |
rm_work | hmm | 23:06 |
xgerman | task flow has revert and such — so what additional data do we need? | 23:06 |
johnsom | It's already handled for us. We are storing that state in the taskflow flow storage. | 23:06 |
rm_work | that's interesting, as the queue is the only storage for the "new" state, and if it's picked up by a controller worker... and that CW fails.... | 23:06 |
rm_work | it would have to just be stuck back on the queue right? | 23:06 |
rm_work | in which case another worker could pick it up anywhere anyway? | 23:06 |
rm_work | ah | 23:06 |
johnsom | When you turn on jobboard it persists that in a DB table while the flow is running | 23:06 |
rm_work | taskflow does it? | 23:06 |
rm_work | neat | 23:06 |
rm_work | still have managed to avoid learning pretty much anything about taskflow somehow | 23:07 |
johnsom | Hahaha | 23:07 |
xgerman | yeah, and since it has revert logic we should be able to have transcations | 23:07 |
xgerman | unless we are worried that other flows intercept us | 23:07 |
sbalukoff | xgerman: You're going to need to be able to lock other entities out of altering stuff. Presently we manage that through the 'PENDING_UPDATE' state. | 23:08 |
johnsom | Right now our flow storage is in-memory. That is part of the work to turn on job board, we define the persistent store. | 23:08 |
johnsom | sbalukoff right | 23:08 |
xgerman | well, in-memory vs. out-of-memory is an implementation detail | 23:08 |
xgerman | yep, we mark the whole LB pending | 23:09 |
sbalukoff | Something is not clear to me: Why couldn't we retry if the end-state we want for listeners is stored in the database? | 23:09 |
xgerman | and reject any update | 23:09 |
sbalukoff | xgerman: Yep. | 23:09 |
xgerman | database = operational state | 23:09 |
sbalukoff | That's not true when we're updating. | 23:09 |
sbalukoff | (Hence the reason we lock out other transactions with the pending update state) | 23:10 |
xgerman | we lock them out since it would cause concurrency issues | 23:10 |
sbalukoff | Yes, I know. | 23:11 |
sbalukoff | But the nature of that concurrency issue is that while the update is happening, the database is not actually in synch with what's on the amphorae. | 23:11 |
xgerman | then we give the state we want to task flow it makes it so or rolls back | 23:11 |
*** Bjoern has quit IRC | 23:11 | |
sbalukoff | But.... roll backs don't always work. | 23:11 |
sbalukoff | That is to say, our current roll-back is to just go to the error state. | 23:12 |
xgerman | those are the bugs we need to fix :-) | 23:12 |
sbalukoff | Wouldn't it be possible to store "this is what the DB model looked like just before the update" and attempt to roll that back? (Again, I'm having trouble understanding the need for the data model here.) | 23:13 |
sbalukoff | That is... | 23:13 |
sbalukoff | to store that in the same way we're going to have to store the data model state for retries? | 23:13 |
johnsom | Prior to your changes, we don't need to store anything additional to what we currently have to implement retries | 23:14 |
xgerman | I think storing the state as it is in the DB should be our goal… otherwise monitoring/observing the system will get tricky | 23:14 |
xgerman | I am still puzzled that update would not follow that philosophy | 23:15 |
*** allan_h has quit IRC | 23:16 | |
sbalukoff | johnsom: So are you saying we should just ditch the idea of supporting features that require more complex models like L7 policies and rules? :P | 23:16 |
*** piet_ has joined #openstack-lbaas | 23:16 | |
*** ducttape_ has quit IRC | 23:16 | |
johnsom | Not my intent, just saying that your statement about storing state for retries did not exist. | 23:17 |
* sbalukoff sighs | 23:17 | |
sbalukoff | What do we currently retry? | 23:17 |
* johnsom sighs | 23:18 | |
johnsom | I am simply saying, there is nothing more to store | 23:18 |
johnsom | It's already there | 23:18 |
blogan | sbalukoff: are you saying no need for data models or sql alchemy models? (not sure if you're saying one or both) | 23:20 |
sbalukoff | I'm not sure what we need the data models for. | 23:20 |
johnsom | I think he doesn't want to update the data models for L7 | 23:20 |
blogan | sbalukoff: but do you mean the static data models, or the sqlalchemy models? | 23:21 |
sbalukoff | Er... the ones described in common/data_models.py ? | 23:21 |
blogan | sbalukoff: okay, the redundant ones | 23:21 |
sbalukoff | I agree that sqlalchemy adds some complication in inadvertent ways in the effort to make many other things around the database simpler. | 23:22 |
blogan | sbalukoff: so instead use the sqlalchemy models and pass those around? | 23:22 |
johnsom | So, most of the gang is here it seems. Should we re-evaluate the "maintain history during update" decision? | 23:22 |
sbalukoff | I'm not sure at this point whether I would advocate scrapping the sqlalchemy stuff right now. | 23:22 |
sbalukoff | blogan: Er... yes? I think? We don't have things named very descriptively, so it might help to reference the file in which they're defined. | 23:23 |
blogan | i would say keep the sqlalchemy models, but remove the data models if possible, but that's got some big cons to it | 23:23 |
blogan | okay | 23:23 |
sbalukoff | If by sql alchemy models you means the stuff in db/models.py that repositories.py builds off of... | 23:23 |
blogan | sbalukoff: https://github.com/openstack/octavia/blob/master/octavia/db/models.py | 23:23 |
blogan | yes | 23:23 |
*** jwarendt has quit IRC | 23:23 | |
sbalukoff | Oh! I forgot we can get at this stuff outside of gerrit. | 23:24 |
sbalukoff | Muahahaha! | 23:24 |
johnsom | So, on update and delete calls to the Octavia API, is there value in maintaining the currently deployed state of the amp until the update on the amp completes successfully? | 23:24 |
*** jwarendt has joined #openstack-lbaas | 23:24 | |
sbalukoff | I think if we're going to keep the data models, we should force everything higher level to use that interface (like the api) | 23:25 |
blogan | johnsom: so update the db with the new information immediately at the API layer? | 23:25 |
johnsom | blogan Yes | 23:25 |
blogan | sbalukoff: does it not now? | 23:25 |
sbalukoff | blogan: That's what it does now. | 23:25 |
sbalukoff | Or rather... | 23:25 |
blogan | johnsom: so if we update the information immediately, then users will see that information immediately on GETs to that resource...which isn't too big of a deal | 23:26 |
blogan | thinking out loud here | 23:26 |
sbalukoff | It calls the handler, which does get sent to the controller worker. | 23:26 |
sbalukoff | But the API is going straight to the repository all over the place... | 23:26 |
johnsom | blogan yes, the get call would return data that might not yet be in effect | 23:26 |
johnsom | blogan Especially if we have a large number of amps | 23:27 |
blogan | sbalukoff: i dont understand what you mean yet | 23:27 |
blogan | johnsom: the whole reason to do the update only on success was to roll back to a previously known good state, obviously not implemented yet, but also it left open the ability to do a task-like API in the future | 23:28 |
johnsom | blogan Yep | 23:29 |
blogan | a task-like API would be: someone requests a create, update, or delete, and then they get a task id and query the task id until active | 23:29 |
blogan | but that was too forward thinking honestly | 23:29 |
sbalukoff | blogan: In api/v1/controllers/listener.py you will find a line that looks like this: db_listener = self.repositories.listener.create(session, **listener_dict) | 23:29 |
sbalukoff | So... we're going straight to the DB to do that. | 23:29 |
blogan | sbalukoff: yeah, so what should we do? | 23:29 |
sbalukoff | or repository. Which is effectively the DB. | 23:29 |
*** jwarendt has quit IRC | 23:30 | |
sbalukoff | If we keep the data models, we should disallow accessing the repository directly. So the above call becomes a call to a handler only, or something else which operates on the data_model. | 23:31 |
sbalukoff | Also, we're doing things like setting the LB and listener statuses on update. | 23:31 |
sbalukoff | Which, again, is happening directly against the DB. | 23:31 |
sbalukoff | (or repository, again... which is effectively the DB) | 23:31 |
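A hedged sketch of the layering being suggested here: the API controller would call a handler (or the data-model layer) for every write, and only that layer would touch the repositories. The class and method names below are illustrative, not existing Octavia code; only the listener create repository call mirrors the one quoted above.

```python
class ListenerWriteHandler:
    """Single choke point for listener writes; the API never sees the repo."""

    def __init__(self, repositories, get_session):
        self.repos = repositories
        self.get_session = get_session

    def create(self, listener_dict):
        session = self.get_session()
        # status handling and repository access live here instead of in the API
        return self.repos.listener.create(session, **listener_dict)
```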
blogan | so the main concern with going to the handler for that is the status change | 23:32 |
blogan | well not just that | 23:32 |
blogan | but now we'd need to build a whole rpc api for the db calls | 23:33 |
*** yamamoto_ has joined #openstack-lbaas | 23:33 | |
johnsom | So, going through the models, we make the transaction thing worse right? | 23:33 |
sbalukoff | which models? | 23:33 |
blogan | also what do yall mean by not being able to do transactions? | 23:33 |
blogan | do amp operations in the middle of a db transaction? | 23:33 |
johnsom | your proposal for no direct repo access. | 23:33 |
sbalukoff | johnsom: I suspect it would, yes. | 23:34 |
sbalukoff | Well... maybe. | 23:34 |
johnsom | Yeah, right now we are doing a poor man's transaction in the flows by updating the models and leaving the DB in place. | 23:34 |
sbalukoff | If the handler handled keeping state so the API didn't have to worry about it at all that might be better. | 23:34 |
*** longstaff has quit IRC | 23:34 | |
sbalukoff | I dunno. | 23:35 |
sbalukoff | A lot of this stuff is really opaque since we don't seem to be consistently using layers. | 23:35 |
blogan | i don't like going to the handler for db calls | 23:36 |
johnsom | I think we should write native sql and get rid of it all! grin | 23:37 |
blogan | i'm not entirely sure if oslo messaging makes rpc cast calls go over the queue, but either way GETs going straight to the workers will overload them | 23:37 |
*** yamamoto_ has quit IRC | 23:37 | |
*** piet_ has quit IRC | 23:37 | |
sbalukoff | blogan: We're already going to the handler to update the DB in the case of updates and deletes. | 23:37 |
johnsom | sbalukoff I don't follow you there | 23:38 |
blogan | sbalukoff: yes but i'm talking about GETs on the API, which will have to retrieve from the db, and GETs happen exponentially more frequently than POSTs, PUTs, and DELETEs, so the controller worker serving all those GETs is asking for trouble | 23:38 |
johnsom | What handler? | 23:38 |
blogan | johnsom: handler, right now its the queue producer basically | 23:38 |
sbalukoff | Again, looking at api/v1/controllers/listener.py.... put method: self.handler.update(db_listener, listener) | 23:38 |
sbalukoff | That's how we update the DB. | 23:38 |
johnsom | blogan +1 to that | 23:38 |
johnsom | blogan GETs should not go across the queue | 23:39 |
sbalukoff | But we don't do that for creates. | 23:39 |
blogan | sbalukoff: and johnsom is suggesting just doing the db update before the handler update | 23:39 |
blogan | just like creates | 23:39 |
sbalukoff | blogan: Er.. that's what I have been suggesting. | 23:39 |
sbalukoff | The task flow reorder would do that. | 23:40 |
blogan | sbalukoff: what? you said the API node should go to the handler to get db access, which to me means go across the queue, or straight to the controller | 23:40 |
johnsom | sbalukoff No, the order would go away | 23:40 |
sbalukoff | johnsom: Ok, now I'm confused. | 23:40 |
blogan | we all are :( | 23:40 |
sbalukoff | blogan: You're right that I was suggesting going to the handler for DB access. But you're also right that I wasn't thinking about read-only queries (which should be safe to do directly). | 23:41 |
blogan | sbalukoff: ah okay now i think i know what you mean | 23:41 |
sbalukoff | Mostly I was pointing out that for creates we update the database THEN call the handler, whereas for updates and deletes we just call the handler. | 23:41 |
sbalukoff | (And the handler updates the DB.) | 23:41 |
blogan | sbalukoff: so creates would go to the handler as well before updating the db? | 23:42 |
johnsom | If we are going to not maintain history, as the re-order patch does, there is no point to all of that model stuff. The API should update the database straight away and call it a day. | 23:42 |
sbalukoff | blogan: If we want to be consistent in how we handle retries / rollbacks.. yes. But I'm actually much more of a fan of the idea that we update the DB then call the handler for ALL write operations to the database... | 23:42 |
blogan | sbalukoff: if creates go to the handler before the db, the user won't be able to get an ID from the POST to know what load balancer to query | 23:43 |
sbalukoff | blogan: Well... right now we return data from a delete and update that's stale, too. | 23:43 |
blogan | so yeah i'm in favor of doing the updates the same way as creates | 23:43 |
johnsom | sbalukoff NO, we do not return stale data on update or delete | 23:44 |
blogan | sbalukoff: depends on your definition of stale | 23:44 |
blogan | when the request returns that is still how the lb is configured, or whatever resource | 23:44 |
johnsom | We currently, pre-reorder return the actual state of the amp | 23:44 |
*** armax has quit IRC | 23:44 | |
johnsom | LB/whatever | 23:44 |
*** armax has joined #openstack-lbaas | 23:44 | |
sbalukoff | johnsom: So... we tell the user about a state that will only exist if the transaction succeeds? | 23:45 |
sbalukoff | In other words, the same data we would return if we just updated the database in the first place? ;) | 23:45 |
johnsom | sbalukoff Not currently | 23:45 |
johnsom | sbalukoff with the re-order, yes, you would return inaccurate data if it fails | 23:46 |
sbalukoff | Right. I'm just more concerned about returning accurate data if it succeeds. | 23:46 |
johnsom | To me re-order == update db at API time | 23:46 |
johnsom | Both the current way and re-order would return accurate data on success | 23:47 |
sbalukoff | I'm not sure I understand what you mean when you say "pre-reorder return the actual state of the amp" | 23:47 |
sbalukoff | So... both the current way and re-order return inaccurate data on failure? | 23:47 |
johnsom | sbalukoff Current code, pre-reorder, get calls to the API reflect the actual state of the LB at all times. | 23:47 |
xgerman | and I like that | 23:48 |
johnsom | Current way will continue to return accurate data on a failure, re-order would return inaccurate data | 23:48 |
sbalukoff | How do you figure? | 23:48 |
johnsom | as the database has been updated, but the load balancer has not yet changed | 23:48 |
sbalukoff | We're talking failure mode: The load balancer is in error. | 23:48 |
xgerman | blogan, gorge, and I decided a while ago to go with the real data in the DB - desired data passed into the queue/flow | 23:49 |
sbalukoff | So, almost by definition, we can't trust that the state of the listener is accurate EXCEPT for the fact that it's in error. | 23:49 |
johnsom | State == Error, configuration == new configuration not currently running on the load balancer. | 23:49 |
johnsom | xgerman +1 yes, we decided this. This is a re-hash | 23:50 |
xgerman | error is a thorny issue and we might have bugs in it since those cases are hard | 23:50 |
xgerman | so I am not defending todays rollback code | 23:50 |
johnsom | Just because the lb is in ERROR (today) doesn't mean we shutdown the amps, they are still running with a config different than a GET would return | 23:50 |
sbalukoff | johnsom: By "State == Error, configuration == new configuration not currently running on the load balancer" are you talking about with or without the taskflow reorder. | 23:51 |
johnsom | xgerman +1 we have not implemented retry code yet. | 23:51 |
sbalukoff | Because that still accurately describes things with the task flow reorder. | 23:51 |
johnsom | Error is a cop-out | 23:51 |
xgerman | +1 | 23:51 |
johnsom | sbalukoff With the reorder. It causes this | 23:52 |
xgerman | goal is to make the code better so even error has the right info in the DB | 23:52 |
johnsom | Without re-order, we do not have that problem | 23:52 |
sbalukoff | johnsom: Without the reorder, today, we get the same thing. | 23:52 |
johnsom | sbalukoff No! | 23:52 |
blogan | johnsom: would doing the db update for all updates at the api layer solve a lot of these issues? | 23:52 |
johnsom | blogan No, it is the same as re-order, just a better implementation | 23:53 |
sbalukoff | johnsom: Er... I could be wrong, but I swear that's what I've been seeing in my testing... | 23:53 |
johnsom | sbalukoff With current code, we fail, the lb goes into error, but a get will pull what is actually running on the amps | 23:53 |
xgerman | blogan, making the DB show what we want makes it difficult to get a sense of what's actually running | 23:53 |
sbalukoff | johnsom: "what is actually running on the amps" is incorrect. You can't assume anything is actually running on the amps correctly at that point. | 23:54 |
johnsom | sbalukoff because we do not commit the updates to the DB until after all of the amps have successfully been updated. Active/Active scale out actually makes the re-order problem worse | 23:54 |
johnsom | sbalukoff why? | 23:55 |
sbalukoff | johnsom: You have to know all the ways the amphora update can fail. | 23:55 |
johnsom | sbalukoff we know the attempt to change what is running on the amp failed | 23:55 |
sbalukoff | Yep. And we don't know why. | 23:55 |
blogan | johnsom: if someone adds a member on another subnet, that network has to be plugged and that could cause the amp to fail somehow | 23:56 |
johnsom | That is a create call | 23:56 |
xgerman | ok, so you think amp in error - all bets are off, shoot it between the eyes and start over | 23:56 |
blogan | so it's possible the amp could not be running, but i'd say it's unlikely | 23:56 |
blogan | johnsom: oh good point lol | 23:56 |
sbalukoff | xgerman: That's generally what I expect most people to do. | 23:57 |
sbalukoff | Do we not allow changing the member subnet and IP once they are created? | 23:57 |
xgerman | well, then the fresh amp fails and you start digging | 23:57 |
blogan | sbalukoff: nope, i see no reason to either | 23:57 |
blogan | you may as well create a new member, that's essentially what you're doing | 23:57 |
xgerman | and that digging part is helped by some hints from the DB... | 23:57 |
sbalukoff | blogan: Ok. But that's one example. There are lots of reasons why the amp update might fail. You might just have lost power to part of your datacenter... | 23:58 |
xgerman | or you file a ticket with your cloud admin to look at the logs... | 23:58 |
blogan | xgerman: that is a decent reason | 23:58 |
johnsom | Yeah, that is the "debug" argument for keeping it as it is. | 23:59 |
blogan | i know | 23:59 |
xgerman | :-) | 23:59 |
johnsom | And, thus the question, do we want to give up having the history in the DB | 23:59 |
blogan | okay so, what are the alternatives here, keep things as is OR update db before? | 23:59 |