minwang2 | cool | 00:01 |
*** amotoki has quit IRC | 00:02 | |
*** madhu_ak_ is now known as madhu_ak | 00:05 | |
*** doug-fis_ has joined #openstack-lbaas | 00:10 | |
*** TrevorV has quit IRC | 00:12 | |
openstackgerrit | Michael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug https://review.openstack.org/279334 | 00:12 |
*** doug-fish has quit IRC | 00:15 | |
*** ducttape_ has joined #openstack-lbaas | 00:17 | |
ajmiller | dougwig you around? | 00:18 |
*** kevo has quit IRC | 00:46 | |
*** SumitNaiksatam has quit IRC | 00:53 | |
*** minwang2 has quit IRC | 00:58 | |
*** ajmiller has quit IRC | 00:58 | |
*** allan_h has quit IRC | 01:09 | |
openstackgerrit | Reedip proposed openstack/neutron-lbaas: Make subnet_id for lbaas v2 member create optional https://review.openstack.org/267935 | 01:13 |
*** ducttape_ has quit IRC | 01:21 | |
*** SumitNaiksatam has joined #openstack-lbaas | 01:32 | |
*** Aish has quit IRC | 01:37 | |
*** yamamoto_ has joined #openstack-lbaas | 01:44 | |
*** drjones has quit IRC | 01:44 | |
*** harlowja has quit IRC | 01:55 | |
*** amotoki has joined #openstack-lbaas | 01:58 | |
*** shakamunyi has joined #openstack-lbaas | 01:58 | |
*** ducttape_ has joined #openstack-lbaas | 02:00 | |
*** amotoki has quit IRC | 02:02 | |
*** yamamoto_ has quit IRC | 02:04 | |
*** Aish has joined #openstack-lbaas | 02:07 | |
*** Aish has left #openstack-lbaas | 02:07 | |
*** shakamunyi has quit IRC | 02:12 | |
*** rooney has quit IRC | 02:18 | |
*** ducttape_ has quit IRC | 02:23 | |
*** rooney has joined #openstack-lbaas | 02:24 | |
*** shakamunyi has joined #openstack-lbaas | 02:31 | |
*** Bjoern has joined #openstack-lbaas | 02:34 | |
*** Bjoern has quit IRC | 02:39 | |
*** bana_k has joined #openstack-lbaas | 02:42 | |
*** minwang2 has joined #openstack-lbaas | 02:46 | |
*** yamamoto_ has joined #openstack-lbaas | 02:51 | |
*** madhu_ak has quit IRC | 02:58 | |
*** ajmiller has joined #openstack-lbaas | 03:06 | |
*** ajmiller_ has joined #openstack-lbaas | 03:08 | |
*** ajmiller has quit IRC | 03:12 | |
*** ajmiller_ is now known as ajmiller | 03:13 | |
*** ducttape_ has joined #openstack-lbaas | 03:14 | |
*** doug-fis_ has quit IRC | 03:19 | |
*** doug-fish has joined #openstack-lbaas | 03:21 | |
*** doug-fish has quit IRC | 03:21 | |
*** kevo has joined #openstack-lbaas | 03:22 | |
*** ducttape_ has quit IRC | 03:22 | |
*** bana_k has quit IRC | 03:25 | |
*** links has joined #openstack-lbaas | 03:32 | |
*** ducttape_ has joined #openstack-lbaas | 03:45 | |
*** woodster_ has quit IRC | 03:46 | |
*** minwang2 has quit IRC | 03:48 | |
*** minwang2 has joined #openstack-lbaas | 03:51 | |
*** amotoki has joined #openstack-lbaas | 04:41 | |
*** minwang2 has quit IRC | 04:42 | |
*** minwang2 has joined #openstack-lbaas | 04:44 | |
*** ducttape_ has quit IRC | 04:58 | |
*** ducttape_ has joined #openstack-lbaas | 05:00 | |
*** ducttape_ has quit IRC | 05:05 | |
*** pc_m has quit IRC | 05:10 | |
*** hockeynut has quit IRC | 05:10 | |
*** rm_work has quit IRC | 05:11 | |
*** hockeynut has joined #openstack-lbaas | 05:12 | |
*** pc_m has joined #openstack-lbaas | 05:12 | |
*** rm_work has joined #openstack-lbaas | 05:18 | |
*** ajmiller_ has joined #openstack-lbaas | 05:18 | |
*** chlong has quit IRC | 05:19 | |
*** ajmiller has quit IRC | 05:21 | |
*** ajmiller_ has quit IRC | 05:21 | |
*** allan_h has joined #openstack-lbaas | 05:27 | |
*** fnaval has quit IRC | 05:31 | |
*** chlong has joined #openstack-lbaas | 05:33 | |
*** fnaval has joined #openstack-lbaas | 05:41 | |
*** chlong has quit IRC | 05:42 | |
*** chlong has joined #openstack-lbaas | 05:54 | |
*** rooney has quit IRC | 06:00 | |
*** ducttape_ has joined #openstack-lbaas | 06:01 | |
*** ducttape_ has quit IRC | 06:06 | |
*** numans has joined #openstack-lbaas | 06:09 | |
*** chlong has quit IRC | 06:11 | |
*** prabampm has joined #openstack-lbaas | 06:14 | |
*** rooney has joined #openstack-lbaas | 06:17 | |
*** numans has quit IRC | 06:26 | |
*** numans has joined #openstack-lbaas | 06:28 | |
*** chlong has joined #openstack-lbaas | 06:34 | |
*** numans has quit IRC | 06:53 | |
*** armax has quit IRC | 06:54 | |
*** ducttape_ has joined #openstack-lbaas | 07:03 | |
*** chlong has quit IRC | 07:03 | |
*** ducttape_ has quit IRC | 07:08 | |
*** numans has joined #openstack-lbaas | 07:15 | |
*** amotoki has quit IRC | 07:16 | |
*** minwang2 has quit IRC | 07:22 | |
*** minwang2 has joined #openstack-lbaas | 07:23 | |
*** amotoki has joined #openstack-lbaas | 07:24 | |
*** lmiccini|away is now known as lmiccini | 07:27 | |
*** minwang2 has quit IRC | 07:28 | |
*** minwang2 has joined #openstack-lbaas | 07:28 | |
*** allan_h has quit IRC | 07:29 | |
*** minwang2 has quit IRC | 07:30 | |
*** amotoki has quit IRC | 07:35 | |
*** minwang2 has joined #openstack-lbaas | 07:37 | |
*** amotoki has joined #openstack-lbaas | 07:39 | |
openstackgerrit | okamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy https://review.openstack.org/279402 | 07:41 |
*** minwang2 has quit IRC | 07:43 | |
*** amotoki has quit IRC | 07:43 | |
*** reedip is now known as reedip_away | 07:54 | |
*** amotoki has joined #openstack-lbaas | 07:55 | |
*** rooney has quit IRC | 07:57 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Fixed database model traversal https://review.openstack.org/279414 | 08:10 |
openstackgerrit | okamoto_takashi_p3 proposed openstack/neutron-lbaas: Fixed connection limit for HA Proxy https://review.openstack.org/279402 | 08:23 |
*** kevo has quit IRC | 08:26 | |
*** ducttape_ has joined #openstack-lbaas | 09:04 | |
*** mhickey has joined #openstack-lbaas | 09:07 | |
*** mhickey has quit IRC | 09:08 | |
*** mhickey has joined #openstack-lbaas | 09:09 | |
*** ducttape_ has quit IRC | 09:09 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 database structures https://review.openstack.org/265430 | 09:10 |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Update repos for L7 policy / methods https://review.openstack.org/265529 | 09:31 |
*** amotoki has quit IRC | 09:34 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Update repos for L7 rules / validations https://review.openstack.org/276643 | 09:41 |
*** amotoki has joined #openstack-lbaas | 09:48 | |
*** amotoki has quit IRC | 09:53 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 api - policies https://review.openstack.org/265690 | 09:55 |
*** amotoki has joined #openstack-lbaas | 09:56 | |
*** amotoki has quit IRC | 10:01 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 api - rules https://review.openstack.org/277718 | 10:01 |
*** ducttape_ has joined #openstack-lbaas | 10:05 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 controller worker flows and tasks https://review.openstack.org/277768 | 10:06 |
*** ducttape_ has quit IRC | 10:09 | |
*** amotoki has joined #openstack-lbaas | 10:10 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 jinja template updates https://review.openstack.org/278223 | 10:11 |
*** amotoki has quit IRC | 10:12 | |
*** amotoki has joined #openstack-lbaas | 10:13 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Add L7 documentation https://review.openstack.org/278830 | 10:17 |
*** openstackgerrit has quit IRC | 10:32 | |
*** openstackgerrit has joined #openstack-lbaas | 10:33 | |
*** numans has quit IRC | 10:42 | |
openstackgerrit | Merged openstack/octavia: Fixes an intermittent load balancer delete failure https://review.openstack.org/278856 | 10:52 |
*** ducttape_ has joined #openstack-lbaas | 11:06 | |
*** ducttape_ has quit IRC | 11:11 | |
*** numans has joined #openstack-lbaas | 11:30 | |
*** amotoki has quit IRC | 11:30 | |
*** numans has quit IRC | 11:34 | |
*** numans has joined #openstack-lbaas | 11:36 | |
*** ducttape_ has joined #openstack-lbaas | 12:07 | |
*** links has quit IRC | 12:10 | |
*** ducttape_ has quit IRC | 12:12 | |
*** yamamoto_ has quit IRC | 12:12 | |
*** intr1nsic has quit IRC | 12:22 | |
*** intr1nsic has joined #openstack-lbaas | 12:25 | |
*** rtheis has joined #openstack-lbaas | 12:25 | |
*** ducttape_ has joined #openstack-lbaas | 13:07 | |
*** _ducttape_ has joined #openstack-lbaas | 13:11 | |
*** ducttape_ has quit IRC | 13:12 | |
*** mhickey has quit IRC | 13:12 | |
*** _ducttape_ has quit IRC | 13:25 | |
*** amotoki has joined #openstack-lbaas | 13:33 | |
*** amotoki has quit IRC | 13:43 | |
*** amotoki has joined #openstack-lbaas | 13:51 | |
*** amotoki has quit IRC | 13:52 | |
*** amotoki has joined #openstack-lbaas | 13:58 | |
*** neelashah has joined #openstack-lbaas | 13:58 | |
*** ajmiller has joined #openstack-lbaas | 14:11 | |
*** localloop127 has joined #openstack-lbaas | 14:20 | |
*** ajmiller has quit IRC | 14:44 | |
*** amotoki has quit IRC | 14:45 | |
*** amotoki has joined #openstack-lbaas | 14:48 | |
*** Bjoern has joined #openstack-lbaas | 14:48 | |
*** prabampm has quit IRC | 14:53 | |
*** amotoki has quit IRC | 14:53 | |
*** piet has joined #openstack-lbaas | 14:59 | |
*** ducttape_ has joined #openstack-lbaas | 14:59 | |
*** doug-fish has joined #openstack-lbaas | 15:01 | |
*** TrevorV has joined #openstack-lbaas | 15:04 | |
*** amotoki has joined #openstack-lbaas | 15:09 | |
*** yamamot__ has joined #openstack-lbaas | 15:09 | |
*** doug-fish has quit IRC | 15:21 | |
*** doug-fish has joined #openstack-lbaas | 15:22 | |
*** ajmiller has joined #openstack-lbaas | 15:28 | |
*** fnaval has quit IRC | 15:59 | |
*** localloo1 has joined #openstack-lbaas | 16:02 | |
*** localloop127 has quit IRC | 16:05 | |
*** localloop127 has joined #openstack-lbaas | 16:06 | |
*** localloo1 has quit IRC | 16:07 | |
*** fnaval has joined #openstack-lbaas | 16:19 | |
*** links has joined #openstack-lbaas | 16:23 | |
*** numans has quit IRC | 16:31 | |
*** armax has joined #openstack-lbaas | 16:32 | |
*** woodster_ has joined #openstack-lbaas | 16:39 | |
*** localloop127 has quit IRC | 16:40 | |
*** allan_h has joined #openstack-lbaas | 16:41 | |
*** localloop127 has joined #openstack-lbaas | 16:41 | |
*** bana_k has joined #openstack-lbaas | 16:43 | |
*** localloop127 has quit IRC | 16:49 | |
*** localloop127 has joined #openstack-lbaas | 16:51 | |
kevinbenton | blogan: yo | 16:53 |
kevinbenton | blogan: you had question about heartbeats? | 16:54 |
johnsom | kevinbenton Octavia heartbeats? I might be able to answer | 16:54 |
kevinbenton | johnsom: no idea, just had a ping from him from yesterday | 16:55 |
johnsom | Ah, ok | 16:55 |
johnsom | Can't help you there | 16:55 |
sbalukoff | johnsom: I have to jump into a phone meeting for the next half hour, but I have something I want to discuss with you after that. Do you think you'll be available? | 16:57 |
johnsom | sbalukoff available until 11, then after 1:30 | 16:58 |
sbalukoff | (It's about this https://review.openstack.org/#/c/279414/ as well as what I think was causing the pushing of "out of date" configs to the amphorae) | 16:59 |
*** yamamot__ has quit IRC | 16:59 | |
johnsom | Ok, yeah, ping me | 16:59 |
*** yamamoto_ has joined #openstack-lbaas | 17:00 | |
blogan | kevinbenton: neutron agent heartbeats | 17:02 |
kevinbenton | blogan: yeah, what about them? | 17:02 |
blogan | kevinbenton: if a heartbeat is found to be stale, does the agent's admin_state_up get moved to disabled, or is it the status that changes? | 17:02 |
kevinbenton | blogan: just the status | 17:03 |
kevinbenton | blogan: admin_state_up is purely controlled via the API | 17:03 |
kevinbenton | blogan: admin_state_up is to prevent stuff from being scheduled to it | 17:04 |
blogan | kevinbenton: okay thats what i figured | 17:04 |
openstackgerrit | Michael Johnson proposed openstack/octavia: Stop logging amphora cert generation in debug https://review.openstack.org/279334 | 17:04 |
blogan | kevinbenton: well wouldn't you want agents that have a stale heartbeat to not be scheduled to? | 17:04 |
blogan | if the status is OFFLINE or DOWN or whatever it is | 17:05 |
kevinbenton | blogan: yes | 17:05 |
blogan | kevinbenton: okay just making sure on that, lbaas agent does not check status when scheduling | 17:05 |
kevinbenton | blogan: i was just pointing out that admin_state_up is just used as a way to make it explicit | 17:05 |
blogan | kevinbenton: makes sense, thanks | 17:05 |
kevinbenton | blogan: i don't think this applies to the L2 agent | 17:05 |
kevinbenton | blogan: there was some discussion around changing that | 17:06 |
kevinbenton | blogan: but don't look to l2 agent as example | 17:06 |
blogan | kevinbenton: just looking at the lbaas agent, and it uses the common agent code in neutron | 17:06 |
*** localloop127 has quit IRC | 17:09 | |
*** localloop127 has joined #openstack-lbaas | 17:10 | |
kevinbenton | blogan: now that i think about it i'm not sure the status is actually stored anywhere | 17:15 |
kevinbenton | blogan: for the other agents | 17:15 |
*** _cjones_ has joined #openstack-lbaas | 17:16 | |
*** openstackgerrit has quit IRC | 17:17 | |
*** openstackgerrit has joined #openstack-lbaas | 17:18 | |
kevinbenton | blogan: i think the dead/alive is calculated dynamically on API queries | 17:18 |
kevinbenton | blogan: yeah, only thing in the DB is admin_state_up, no status column | 17:19 |
kevinbenton | blogan: | 17:19 |
kevinbenton | https://github.com/openstack/neutron/blob/master/neutron/db/agents_db.py#L72-L106 | 17:19 |
kevinbenton | is_active is calculated based on delta from last updated timestamp | 17:20 |
*** amotoki has quit IRC | 17:31 | |
blogan | kevinbenton: ah so a scheduler wouldn't be able to say don't schedule here if the heartbeats stopped | 17:31 |
kevinbenton | blogan: yeah, it can. it can check that is_active property | 17:32 |
kevinbenton | blogan: that will calculate based on the heartbeat | 17:32 |
blogan | kevinbenton: oh duh, sorry i was thinking about it wrong | 17:32 |
kevinbenton | blogan: i was just pointing out that we don't even have a status column so it can't possibly be affected by anything else | 17:33 |
blogan | kevinbenton: yeah makes sense | 17:35 |
blogan | kevinbenton: thanks for your help | 17:35 |
kevinbenton | blogan: no prob | 17:35 |
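A minimal sketch of the liveness check kevinbenton describes above: there is no stored status column, so dead/alive is derived from the delta between now and the agent's last heartbeat, and a scheduler can filter on that plus admin_state_up. The function names and the 75-second default below are assumptions for illustration, not the actual Neutron code (see the agents_db.py link above for the real is_active property).

```python
import datetime

AGENT_DOWN_TIME = 75  # assumed timeout in seconds; in Neutron this is a config option


def is_active(heartbeat_at, down_time=AGENT_DOWN_TIME):
    """Agent counts as alive if its last heartbeat is recent enough."""
    age = datetime.datetime.utcnow() - heartbeat_at
    return age.total_seconds() < down_time


def schedulable_agents(agents):
    """A scheduler could filter on admin_state_up plus the derived liveness."""
    return [a for a in agents
            if a['admin_state_up'] and is_active(a['heartbeat_timestamp'])]
```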
sbalukoff | johnsom: Ok! So! Wheneeeeever you're ready... | 17:38 |
johnsom | sbalukoff Coffee in hand let's go | 17:38 |
sbalukoff | Ok, so first off, you may notice that the bugfix patch I uploaded moves things to be a lot closer to what blogan wrote there originally, before the shared pools patch. | 17:39 |
johnsom | Yes | 17:39 |
sbalukoff | What it does is populate a data model tree from the perspective of whatever database object you hand it first. | 17:39 |
sbalukoff | blogan's (and this patch's) code refuses to revisit object classes we've seen before. | 17:40 |
johnsom | Right | 17:40 |
sbalukoff | This means that we can trust the data model to be accurate *for the first object we pass to it* but things will get less accurate the farther down the tree we traverse. | 17:40 |
sbalukoff | My code that I have in there (without this bugfix) tries to make things a little more fleshed out... | 17:40 |
sbalukoff | You can trust the accuracy of the object you pass to it, and all of its immediate children. | 17:41 |
sbalukoff | However, the problem with that is once you have a moderate number of things populated in the tree... | 17:41 |
sbalukoff | The efficiency of this algorithm becomes worse than n^2 | 17:41 |
sbalukoff | It's really, really terrible, actually. | 17:42 |
johnsom | Yeah, not fun | 17:42 |
sbalukoff | Which is why I'm advocating for essentially a roll-back. | 17:42 |
sbalukoff | But! | 17:42 |
sbalukoff | Where the rubber hits the road here is what we do in the flows whenever we do an amphora_driver_tasks.ListenersUpdate. | 17:42 |
sbalukoff | Right now, we pass that method a data model which has not been populated from the perspective of the listeners we are updating. | 17:43 |
sbalukoff | So... you get inaccuracies. | 17:43 |
sbalukoff | I see there being two potential solutions to this problem: | 17:43 |
johnsom | Ok, you lost me on that one | 17:44 |
sbalukoff | 1. Before we call ListenersUpdate, we have to populate the data model from the perspective of the listeners we pass to it, and then do our preliminary updates (eg. "delete this pool" "update this member", etc.) on the data model we've populated from the perspective of the listener... | 17:44 |
sbalukoff | Ok... | 17:44 |
sbalukoff | Let me find a concrete example. | 17:44 |
sbalukoff | Give me a minute. | 17:44 |
johnsom | I thought that was what these were for: https://github.com/openstack/octavia/blob/master/octavia/controller/worker/flows/listener_flows.py#L72 | 17:44 |
sbalukoff | Yes.. | 17:45 |
sbalukoff | So let's see... | 17:45 |
sbalukoff | If you look in controller_worker.py... | 17:45 |
sbalukoff | In the update_member method. | 17:45 |
sbalukoff | See how we populate the member from the database.... | 17:45 |
sbalukoff | But then 'listeners' is populated from that member data model? | 17:45 |
*** jwarendt has quit IRC | 17:46 | |
sbalukoff | That's where we run into trouble. | 17:46 |
johnsom | Yeah, we get the old member from DB, store it and the member updates in the flow. This allows the flow to update the model before the listener_update | 17:46 |
sbalukoff | We need to do a repo.get for the listener. | 17:46 |
*** bana_k has quit IRC | 17:46 | |
sbalukoff | That's solution 1. to this problem. | 17:46 |
sbalukoff | The second solution is to just update the database first, and allow the amphora_driver_tasks.ListenersUpdate to populate the listener directly from database data. | 17:47 |
johnsom | So you are saying that listeners list, from the member.pool is incomplete? | 17:47 |
sbalukoff | Hear me out on this one: | 17:47 |
*** longstaff has joined #openstack-lbaas | 17:48 | |
sbalukoff | johnsom: Yes, in effect. All the listeners are there, but listeners.pools is not because we're really doing this: member.pool.listeners.pool.member | 17:48 |
sbalukoff | Er.... sorry, member.pool.listeners.pools.member | 17:48 |
sbalukoff | Dammit, can't type today. | 17:48 |
sbalukoff | Anyway, you see how once we've traversed a member object, then a pool object, then a listener object, we can't go back to a pool object? | 17:49 |
sbalukoff | Not with the way we're populating the data model. | 17:49 |
sbalukoff | (If we're following blogan's code, which I think is the only efficient way to do this.) | 17:49 |
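A hedged illustration (not the actual Octavia to_data_model code) of the traversal guard being described: the recursion refuses to descend into an object type it has already visited, so member.pool.listeners gets populated but listeners.pools does not, and accuracy drops the further you get from the root object. The relations attribute and dict shape below are purely illustrative.

```python
def populate_tree(db_obj, seen_types=frozenset()):
    """Build a plain dict tree from db_obj, never revisiting an object type."""
    seen_types = seen_types | {type(db_obj).__name__}
    node = {'id': getattr(db_obj, 'id', None)}
    # 'relations' stands in for whatever relationship mapping the real models expose
    for name, children in getattr(db_obj, 'relations', {}).items():
        node[name] = [populate_tree(child, seen_types)
                      for child in children
                      if type(child).__name__ not in seen_types]  # skip seen classes
    return node
```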
sbalukoff | So.... I also want to say something about populating the database first: | 17:49 |
sbalukoff | Think about about what we presently do when we revert: | 17:50 |
sbalukoff | We mark the listener as being in the 'ERROR' state. | 17:50 |
johnsom | I think I am following you. I think this is why we have the reload task. We probably need to add that to the member update flow so it takes that list of members and updates those models | 17:50 |
sbalukoff | I interpret that to mean "this listener in the database might be out of synch with whatever is actually in place on the amphora" | 17:50 |
sbalukoff | johnsom: Yep. We actually have to do that for any flow that is not a listener. | 17:51 |
sbalukoff | So, pool, health_monitor, session_persistence, and also (now) l7policy, l7pool.... etc. etc. etc. | 17:51 |
johnsom | Hmmm, but all of those work today | 17:52 |
sbalukoff | I think it would be far simpler to just update the database first, and if we need to revert... well, we're already marking the listener as being in 'ERROR' right? | 17:52 |
sbalukoff | To be honest, I'm not sure they actually do work today. I think our scenario tests are fairly weak right now. | 17:53 |
sbalukoff | Er... tempest tests. | 17:53 |
johnsom | I'm not counting on scenario tests, I'm going on my testing | 17:53 |
*** jwarendt has joined #openstack-lbaas | 17:53 | |
sbalukoff | Ok. | 17:54 |
johnsom | EVERYTHING was broken with that database pull line I removed. That I know broke a ton | 17:54 |
sbalukoff | Yep. | 17:54 |
johnsom | Let me do a quick update test on my devstack with our latest master. | 17:54 |
sbalukoff | Ok. | 17:54 |
sbalukoff | Yeah-- I'm not even sure my super-inefficient model population code saves us from this. | 17:55 |
sbalukoff | Because, really, we're not updating the model right in the controller_worker before we hand it to the ListenersUpdate task. | 17:55 |
johnsom | I guess, I should pull your patch, right? | 17:55 |
sbalukoff | That would put you in a state much closer to how things were before the shared pools patch. | 17:56 |
sbalukoff | Though if you want to be entertained (or bored), go with master, populate a complete load balancer tree, and try to do an API get on the load balancer. | 17:56 |
sbalukoff | And watch the octavia-api daemon spin for about 30 seconds while consuming 100% CPU. XD | 17:57 |
johnsom | Ok, give me a minute. My build LB script is running. After it is done, I will pull your patch, restart and do updates on objects down the tree | 17:57 |
*** links has quit IRC | 17:58 | |
sbalukoff | There's one other fly in the ointment that should probably be mentioned if we want to go option 1, and update the model only (not the DB) before passing the model to the ListenersUpdate task... | 17:58 |
sbalukoff | There's a lot of logic in the repository.py file (not just in the l7policy / l7rule stuff I'm trying to add) which is not presently reflected in the data_models.py file. | 17:59 |
sbalukoff | So if you're trying to simulate update of the database while you're really only updating the data model... you're going to have to duplicate all the logic that happens in repository.py. | 18:00 |
sbalukoff | Which is ugly, to say the least. | 18:00 |
sbalukoff | Also, I should note that I could be totally wrong about all this-- this is what I discovered around 1:00am last night and I've only had about 4.5 hours of sleep. But I don't *think* I'm wrong about this. XD | 18:03 |
sbalukoff | Hmmm... there's also a distant third option, but it might not be possible.... We might be able to figure out a way to hold a database write lock while we attempt to update all the amphorae... and then roll back the transaction if we need to revert. But that also seems very ugly. | 18:06 |
johnsom | Well, that was interesting. I did a pool update to LEAST_CONNECTIONS and the haproxy didn't render right | 18:06 |
sbalukoff | It didn't? | 18:07 |
sbalukoff | This is after you've removed the database lookup from ListenersUpdate? | 18:07 |
johnsom | sbalukoff Yeah, a transaction would be the best option. Just the way we are doing repo etc. makes that nearly impossible | 18:07 |
sbalukoff | johnsom: Yeah. Super painful. | 18:08 |
xgerman | let me know when you figure out transactions - will need that for cascading deletes | 18:09 |
sbalukoff | xgerman: I am so, so sorry. | 18:10 |
johnsom | Transactions are easy. repo and sqlalchemy make them harder | 18:10 |
*** piet has joined #openstack-lbaas | 18:10 | |
openstackgerrit | Trevor Vardeman proposed openstack/octavia: WIP - Get me a Load Balancer https://review.openstack.org/256974 | 18:11 |
*** madhu_ak has joined #openstack-lbaas | 18:12 | |
johnsom | Ok, now just a build isn't rendering the haproxy config right. | 18:12 |
sbalukoff | johnsom: Again, this is after you've removed the database lookup from ListenersUpdate? | 18:13 |
johnsom | Yeah, and added your latest. Going to roll that back | 18:13 |
sbalukoff | If you're going to allow the ListenersUpdate to populate its data from the database, then the database should also be updated first. | 18:14 |
sbalukoff | (Just sayin' ;) ) | 18:14 |
sbalukoff | Anyway, do your thing. Maybe I'm interpreting the situation wrong, eh. | 18:14 |
*** amotoki has joined #openstack-lbaas | 18:14 | |
johnsom | I'm going to roll your traversal patch back, then try a build. It was failing on create, so not just update | 18:15 |
*** bana_k has joined #openstack-lbaas | 18:15 | |
*** kevo has joined #openstack-lbaas | 18:15 | |
sbalukoff | Ok. | 18:15 |
johnsom | o-cw said it was fine, but the haproxy conf was missing sections after I did the create of an lb with members | 18:16 |
sbalukoff | *nod* | 18:16 |
sbalukoff | Similar to symptoms I was seeing last night. | 18:16 |
johnsom | Ok, still broken | 18:16 |
johnsom | Ok, with my "revert bad workaround" removed, still not rendering. I'm going to pull out shared pools | 18:21 |
sbalukoff | What is your "revert bad workaround"? | 18:22 |
johnsom | https://review.openstack.org/#/c/277602/ | 18:23 |
sbalukoff | Right. | 18:23 |
*** openstack has joined #openstack-lbaas | 18:27 | |
sbalukoff | Yeah, I screw up all the time, which is why it's good you're trying to verify whether my assessment of the situation is correct. :) | 18:27 |
johnsom | Ok, starting over, testing master, create LB | 18:28 |
johnsom | Ok, master, create LB works. | 18:29 |
sbalukoff | Ok. It's just horribly inefficient then. :) | 18:30 |
johnsom | Yeah, could be. Let me try an update | 18:30 |
sbalukoff | Seriously, populate like 3 listeners and three pools. | 18:30 |
sbalukoff | And watch it spin... | 18:30 |
johnsom | Ok, so member update is showing the same need-to-run-twice issue. Trying pool update | 18:32 |
johnsom | (though member update was re-ordered I thought) maybe that was delete | 18:32 |
johnsom | It was delete, nevermind | 18:33 |
sbalukoff | I know that one of the member methods accidentally got skipped with that fix. | 18:33 |
sbalukoff | Yeah. | 18:33 |
johnsom | Looks like our CLI help for pool-update is broken | 18:34 |
sbalukoff | Most of the cli update help is shit. | 18:34 |
johnsom | Careful, I implemented some of that... grin | 18:35 |
sbalukoff | (create and delete help isn't that bad.) ;) | 18:35 |
sbalukoff | But yeah... actually fixing the update help is down there on my to-do somewhere. | 18:35 |
sbalukoff | Probably after "give Octavia its own CLI, already." | 18:36 |
johnsom | It was a rushed implementation at a hack-a-thon to get it into kilo | 18:36 |
sbalukoff | That doesn't surprise me too much. :) | 18:36 |
sbalukoff | And really-- I'm not saying you didn't do what you had to do, eh. | 18:36 |
sbalukoff | (Or that I would have done differently) | 18:36 |
sbalukoff | I know how a lot of this code gets written, eh. ;) | 18:37 |
sbalukoff | Most of the testing against Octavia I've been doing lately has just been to run curl against the octavia API directly to reduce the complication / potential point of failure of neutron-lbaas. | 18:38 |
sbalukoff | (Makes it far simpler to determine whether it's an Octavia or a Neutron LBaaS problem I'm seeing.) | 18:38 |
sbalukoff | This might be helpful if you're interested in doing any of that: https://gist.github.com/anonymous/f59d96918f3103ad5651 | 18:39 |
*** openstackgerrit has quit IRC | 18:47 | |
*** SumitNaiksatam has quit IRC | 18:47 | |
*** openstackgerrit has joined #openstack-lbaas | 18:48 | |
johnsom | Yeah, ok, master is not updating | 18:51 |
johnsom | It renders, but doesn't update on first call | 18:52 |
sbalukoff | If it's helpful, I can write a second patch (dependent on my traversal fix patch) which changes the flows so the database update comes first. Again, the revert method in this case marks the listener as "ERROR" which is accurate enough. | 18:53 |
sbalukoff | And we can see whether that one fixes things. :/ | 18:53 |
johnsom | Let me mess with this a bit today. This stuff worked at one point and I have no idea what broke it. I want to look closer at those model updates. | 18:54 |
sbalukoff | Ok. | 18:55 |
johnsom | That said, I'm pretty sure your patch breaks the config render | 18:55 |
sbalukoff | Well.... I'd like to try things with the database update in flow order reversed anyway. So i'll go ahead and write that patch, even if we end up abandoning it. :) | 18:55 |
johnsom | Ok. Just note I will push back. But either way, this https://review.openstack.org/#/c/279414/ breaks the haproxy config render | 18:56 |
johnsom | I can't create LBs with that patch in | 18:57 |
sbalukoff | yep, makes sense that it would, with the database update happening after other things. | 18:57 |
*** piet has quit IRC | 18:57 | |
sbalukoff | Also, you should feel free to push back. But have good technical reasons for doing so, eh. ;) | 18:57 |
johnsom | Without that one patch, it creates just fine | 18:57 |
*** localloo1 has joined #openstack-lbaas | 18:58 | |
sbalukoff | yes, and I think I know why. But also without that patch we can't support an octavia infrastructure with more than about 3 listeners and pools. | 18:58 |
johnsom | So, order is not the issue | 18:58 |
johnsom | Understand, work to do. | 18:58 |
sbalukoff | right. We're not populating the data model correctly. And it turns out populating it correctly is painful because we have to duplicate logic from the repository. | 18:59 |
*** localloop127 has quit IRC | 19:01 | |
*** localloo1 has quit IRC | 19:02 | |
*** localloo1 has joined #openstack-lbaas | 19:04 | |
*** numans has joined #openstack-lbaas | 19:07 | |
*** kobis has quit IRC | 19:08 | |
*** mhickey has joined #openstack-lbaas | 19:10 | |
mhickey | blogan, dougwig, johnsom: Hey | 19:13 |
mhickey | blogan, dougwig, johnsom: re https://bugs.launchpad.net/neutron/+bug/1544861, looking for your guru knowledge? | 19:16 |
openstack | Launchpad bug 1544861 in neutron "LBaaS: connection limit does not work with HA Proxy" [Undecided,New] | 19:16 |
*** numans has quit IRC | 19:17 | |
*** localloop127 has joined #openstack-lbaas | 19:17 | |
openstackgerrit | Stephen Balukoff proposed openstack/octavia: Do DB updates before pushing listener conf to amp https://review.openstack.org/279731 | 19:18 |
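A rough sketch of the reordering idea in that patch, expressed as a TaskFlow linear flow: the DB update task runs before the task that renders and pushes listener configs, so the push reads fresh data and a failed push simply leaves the listener in ERROR. The task classes and flow name below are illustrative placeholders, not the actual Octavia tasks.

```python
from taskflow import task
from taskflow.patterns import linear_flow


class UpdateMemberInDB(task.Task):
    """Persist the requested member changes via the repository layer."""
    def execute(self, member_id, member_updates):
        pass  # placeholder: apply member_updates to the member row


class ListenersUpdate(task.Task):
    """Reload listeners from the DB and push rendered configs to the amps."""
    def execute(self, loadbalancer_id):
        pass  # placeholder: render haproxy.cfg from current DB state and push


def get_update_member_flow():
    flow = linear_flow.Flow('update-member')
    # DB first, then the amphora push; a failed push marks the listener ERROR
    flow.add(UpdateMemberInDB(), ListenersUpdate())
    return flow
```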
*** localloo1 has quit IRC | 19:19 | |
johnsom | mhickey looking | 19:20 |
sbalukoff | (Gonna go AFK for a bit, then run the above patch through some manual scenario tests...) | 19:21 |
mhickey | johnsom: ack | 19:22 |
*** kobis has joined #openstack-lbaas | 19:28 | |
johnsom | mhickey I commented and marked incomplete | 19:28 |
johnsom | mhickey let me know if you need more | 19:29 |
mhickey | johnsom: that's great, thanks.:) | 19:31 |
*** yamamoto_ has quit IRC | 19:34 | |
blogan | mhayden: left a comment on the bug https://bugs.launchpad.net/neutron/+bug/1544313 | 20:01 |
openstack | Launchpad bug 1544313 in neutron "LBaaSv2 agent schedules LB's to agents that are offline" [Undecided,New] | 20:01 |
sbalukoff | johnsom: I have to run off for a bit, but I just ran the above patch through several scenario tests and everything worked as expected. (ie. DB stays in synch with amphora haproxy.cfg, and changes get pushed immediately) | 20:11 |
sbalukoff | I would be interested in finding out if anyone can find a breaking scenario test with the above patch. | 20:12 |
sbalukoff | In the mean time... during my shower this morning I thought of another way to populate the data model which would be more accurate, would probably be uglier, but also probably wouldn't be n^2 or worse efficiency. | 20:12 |
sbalukoff | I'll try giving that a shot, but I'm still in favor of doing the DB updates first because that way we don't have to duplicate logic in the flows that's already in the repository. | 20:13 |
sbalukoff | Anyway, I'll be checking back in here periodically. | 20:13 |
rm_work | sbalukoff: i had a question on your L7 DB review | 20:16 |
rm_work | also -- showerthoughts best thoughts | 20:16 |
sbalukoff | rm_work: What's the question? | 20:17 |
rm_work | https://review.openstack.org/#/c/265430/28 | 20:17 |
rm_work | nothing major, unless my gut feeling about that _listeners var is right | 20:18 |
sbalukoff | rm_work: I'll have a look and see about answering that later this afternoon. Thanks for the heads up! | 20:18 |
*** kevo has quit IRC | 20:19 | |
*** localloop127 has quit IRC | 20:20 | |
rm_work | kk | 20:21 |
*** kobis has quit IRC | 20:26 | |
*** localloop127 has joined #openstack-lbaas | 20:30 | |
*** kobis has joined #openstack-lbaas | 20:46 | |
*** yamamoto_ has joined #openstack-lbaas | 20:48 | |
*** yamamoto_ has quit IRC | 20:52 | |
*** kobis has quit IRC | 20:55 | |
xgerman | rm_work yt? | 21:04 |
xgerman | moar bbq questions | 21:04 |
*** piet has joined #openstack-lbaas | 21:05 | |
*** kevo has joined #openstack-lbaas | 21:10 | |
johnsom | Yeah, the more I think about it, the more I am against re-ordering the flows. Much like we decided earlier in the week, I think it is best to fix the root cause of the problem and get the better code working again. | 21:17 |
*** yamamoto_ has joined #openstack-lbaas | 21:18 | |
blogan | johnsom, sbalukoff: i agree, just by skimming over the scrollback, address the root cause | 21:21 |
*** yamamoto_ has quit IRC | 21:22 | |
blogan | johnsom, sbalukoff: do y'all think it would be better if we dumped the data models in favor of just passing sql alchemy models around? I've had experiences where sqlalchemy models just don't do what you want, so that's how the data models came into existence, but they cannot reproduce the back and forward links like sqlalchemy gives | 21:22 |
johnsom | I don't know if it would be better or not. It would be a big change | 21:23 |
blogan | as far as the repo goes, i regret the decision to have them convert the SA models to data models, because it makes reusing the methods as subtransactions harder | 21:24 |
blogan | would rather have another layer that does that now | 21:24 |
blogan | well traversing the models is a pain in the ass, especially for people who don't know the limitations, and they're hard to spot | 21:25 |
dougwig | blogan: you made that decision. | 21:25 |
bana_k | anyone facing prob with create listener / | 21:25 |
bana_k | ? | 21:25 |
dougwig | blogan: i think we all said, "no hurry." :) | 21:25 |
bana_k | there is a prob with peer_port being none in the haproxy.cfg file | 21:26 |
blogan | dougwig: why is it that you haven't said one thing but once i say i regret a decision you're quick to pounce? | 21:26 |
dougwig | blogan: i have a deep need to add value. | 21:26 |
blogan | lol | 21:26 |
blogan | bana_k: i have not but i haven't done a lot of octavia testing lately | 21:27 |
johnsom | bana_k Yes, that is related to the recent model changes we are discussing. We have some bug(s) related to the models to figure out. | 21:27 |
bana_k | blogan: it happens only with active-standby | 21:27 |
blogan | dougwig: i don't regret having the sa models be converted to data models, i regret doing that in that one layer | 21:27 |
bana_k | johnsom: ok thanks | 21:27 |
*** mhickey has quit IRC | 21:28 | |
johnsom | bana_k I'm looking into it. Hopefully we can run it to ground today. | 21:29 |
bana_k | johnsom: thanks a lot. | 21:37 |
*** TrevorV has quit IRC | 21:38 | |
*** piet has quit IRC | 21:40 | |
johnsom | Ok, so if I use commit 3a49fe2dcf30bd4979420bd82b74bf2c37ec215c (pre-shared pools) and cherry pick https://review.openstack.org/#/c/277602/ I can update the lb_algorithm on the pool with no trouble. Updates the first time, like it should. | 21:40 |
*** ajmiller has quit IRC | 21:45 | |
johnsom | member-update is not working. | 21:47 |
johnsom | It takes two tries, so that is a problem I will look into after I try healthmon | 21:47 |
xgerman | blogan Cosmos avoids sql alchemy altogether, I have been told - but then it's the devil | 21:48 |
*** yamamoto_ has joined #openstack-lbaas | 21:48 | |
*** ajmiller has joined #openstack-lbaas | 21:48 | |
johnsom | Not completely true, it avoids the models | 21:48 |
xgerman | you guys need to tell me things more completely | 21:48 |
xgerman | I am binary: devil/saint | 21:48 |
xgerman | and we wouldn't have those problems if we had stuck to ruby... | 21:50 |
* xgerman hides | 21:50 | |
*** yamamoto_ has quit IRC | 21:52 | |
johnsom | lbaas-healthmonitor-update also works correctly | 21:52 |
sbalukoff | ....aaand gerrit is down. | 22:15 |
*** localloop127 has quit IRC | 22:16 | |
sbalukoff | So, on not reordering the flows: By doing that you are saying we need to duplicate the logic already in the repositories when we update data models. | 22:16 |
sbalukoff | Also, this is inconsistent with creates, where we definitely do create the DB entries before we push out the listener configs. | 22:17 |
sbalukoff | And... for what? This gains us nothing in a revert: You still end up with a listener in an 'ERROR' state. | 22:17 |
sbalukoff | Am I wrong in interpreting a listener in 'ERROR' to mean that it's potentially out of synch with the amphorae? | 22:18 |
sbalukoff | (configuration-wise) | 22:18 |
sbalukoff | I'm not even convinced at this point that we would want to roll back the DB transactions, per se. I think "ERROR" should be good enough-- after all, with active-standby and active-active, what do you do when one amphora takes the update just fine, but others don't? | 22:19 |
sbalukoff | Also, realize that with https://review.openstack.org/279731 everything I've tried (including member updates) is working. | 22:20 |
sbalukoff | So please... if you're going to argue for updating the model (and duplicating logic from the repo, including stuff that SQLAlchemy takes care of for us right now) so that we can leave the DB with the "original" objects (which won't be in the original state anyway)... can someone please explain the technical reason for wanting to do this? | 22:22 |
johnsom | Prior to shared pools (maybe with shared pools, haven't tried it) they all work except for member update. | 22:22 |
sbalukoff | See, it's that "except member update" part that says to me "It's not working correctly" | 22:23 |
johnsom | The reason is the same as we discussed earlier in the week. We put a lot of effort in to having this capability (we could have just slammed the DB at the api layer) to allow retries (yes, not implemented yet), debugging of the error after the fact, and a rollback that can be more intelligent. | 22:24 |
johnsom | Never said there isn't a bug with member. | 22:24 |
sbalukoff | But there's not a bug with the patch I've proposed. | 22:24 |
johnsom | But I'm not willing to throw everything else out, just because we can slam some code | 22:24 |
sbalukoff | You are mis-characterizing what I'm proposing. | 22:25 |
johnsom | Yes, you can no longer retry update actions, you can no longer debug the before and requested update state | 22:25 |
sbalukoff | You're talking about hypothetical code that's not been written yet. You could indeed retry update actions if you keep track of what those actions are supposed to be. | 22:25 |
johnsom | The code you proposed, by re-ordering, is incomplete in that you would need to pull all of the model management code out of the flows and probably just update the DB at the API layer. | 22:26 |
johnsom | debugging is not hypothetical | 22:26 |
sbalukoff | I'm not sure I agree with you on not being able to debug the before update state. | 22:27 |
johnsom | You seem very worked up over the reorder patch. What is the aversion to figuring out what is actually wrong with the model code? | 22:27 |
sbalukoff | I agree that with my update it does make most of the model code seem pointless at this point. I'm not saying we out-and-out remove it at this stage in the game. | 22:27 |
johnsom | It's dead code with that change, so it really should be removed | 22:28 |
sbalukoff | My biggest aversion to it is that there is *significant* logic that happens in repo code for l7policies and l7rules. | 22:28 |
sbalukoff | Because the logic for handling those doesn't boil down to simple database relations. | 22:28 |
johnsom | Would that need to change if we didn't re-order the flows? | 22:29 |
sbalukoff | So, if I have to do that in the data model as well... not only do we potentially introduce a whole new area for bugs to creep in, but we've got to do all the stuff that SQLAlchemy actually does take care of for us in there as well. | 22:29 |
*** rtheis has quit IRC | 22:30 | |
sbalukoff | If we don't re-order the flows, I have to duplicate *all* of that logic in the data model. | 22:30 |
sbalukoff | Because, again, you're working around the fact that we don't have real database transactions by trying to fake it with the data model. That's fine as long as you are working off very simple models. | 22:30 |
sbalukoff | L7 stuff isn't that simple. | 22:31 |
johnsom | Ok, so this, to me, is a totally different argument | 22:31 |
johnsom | I'm going from a perspective of "it worked before(with a bug), allows retries, and maintains some level of idempotent behavior. | 22:33 |
sbalukoff | Do we actually do retries? | 22:34 |
sbalukoff | I would argue that this isn't really idempotent so long as we end up in an ERROR state. | 22:34 |
johnsom | If you want to be brash about it, give me ten minutes. | 22:34 |
sbalukoff | Or rather, what I've proposed is no less idempotent. | 22:35 |
sbalukoff | I'm not meaning to be brash. | 22:35 |
sbalukoff | Maybe I don't understand what you mean by "allows retries" | 22:36 |
johnsom | So, what I am hearing is the real issue is around how we have implemented the model and the complexity of the L7 rules/etc. That I have not considered or looked at, so I'm open to looking into that argument | 22:36 |
sbalukoff | That's part of the real issue, yes. | 22:38 |
johnsom | What I mean is, much like we had mis-applied it, in the create amphora flows we used the retry capability of taskflow to retry certain actions n-times before fully reverting. In a future phase, when we turn on job board, that includes trying from multiple workers. | 22:38 |
sbalukoff | And, as it's implemented, things aren't updated in the DB until it's done? | 22:39 |
sbalukoff | (If so, then I'm pretty sure all of our create flows are off) | 22:39 |
johnsom | Update and delete are the two actions that maintain the actual state of the amp in the database while the actions are attempted. | 22:40 |
johnsom | Create updates DB as actions are completed (i.e. there is no previous state) | 22:41 |
sbalukoff | What a time for gerrit to be down. :P | 22:42 |
johnsom | So, point me to the patch that has the repo changes it it, that would make model updates hard. | 22:42 |
johnsom | At least an hour | 22:42 |
sbalukoff | I was about to do that. | 22:42 |
sbalukoff | But.... I can't right now because jenkins is down. | 22:42 |
sbalukoff | gerrit. Whatever. | 22:43 |
johnsom | I am still worried that the shared pools patch may have broken updates and the follow on to-model patch breaks creates | 22:45 |
sbalukoff | When it's back up, it will be this patch: https://review.openstack.org/265529 And this patch: https://review.openstack.org/276643 | 22:45 |
johnsom | Ok | 22:45 |
sbalukoff | Are you even going to look at my to-model patch + flow re-order patch combo? (ie. the one I have not been able to get to do the wrong thing, even with member updates?) | 22:47 |
johnsom | I can try it, but I don't know how re-order would fix the corrupt haproxy.cfg on create | 22:48 |
*** yamamoto_ has joined #openstack-lbaas | 22:48 | |
sbalukoff | Give it a shot. | 22:48 |
rm_work | ugh gerrrrrit | 22:48 |
rm_work | i want to look at what you're talking about too but didn't get a chance yet lol | 22:49 |
sbalukoff | Thinking about the retry problem-- without actual DB transactions, we're going to be stuck faking transactions in one way or another. We could do this in the model, as I think is being proposed here. But we could also do this in the database by storing state information in a secondary table. (Just brainstorming, eh.) It's still an ugly pain-in-the-ass hack of course... It would be *so* nice to have real database transactions here. | 22:52 |
*** yamamoto_ has quit IRC | 22:52 | |
johnsom | Correction, we ARE doing it using the model. Yes, using database as history was talked about when we were designing this. | 22:54 |
sbalukoff | If we're committed to a data model layer we really should have made sure that only that code accessed the repository. As it is, things like the API go straight to the repository. | 22:54 |
johnsom | Well, maybe that is the right answer. Develop a method of maintaining a sqlalchemy session transaction across tasks in a flow | 22:54 |
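A minimal sketch, assuming SQLAlchemy, of the idea floated here: hand one session (and its open transaction) to every step of a flow, push to the amphorae while the DB changes are still uncommitted, and commit only if everything succeeds. This is only an outline of the concept, not a worked-out integration with TaskFlow storage or retries; the connection string and callables are placeholders.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///:memory:')  # stand-in connection string
Session = sessionmaker(bind=engine)


def run_update(db_steps, amphora_push):
    session = Session()
    try:
        for apply_change in db_steps:
            apply_change(session)   # every step shares the one open transaction
        amphora_push()              # push configs while DB changes are uncommitted
        session.commit()            # all amps updated: make the DB changes durable
    except Exception:
        session.rollback()          # any failure reverts the DB to its prior state
        raise
    finally:
        session.close()
```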
*** neelashah has quit IRC | 23:01 | |
rm_work | yeah i remember discussion a LB state history table | 23:04 |
rm_work | that gets really ridiculous eventually tho | 23:04 |
johnsom | Agreed | 23:05 |
xgerman | The flow should be able to keep the history so we have things spanning multiple flows? | 23:05 |
johnsom | I'm really warming to the idea of figuring out transactions | 23:05 |
sbalukoff | Well... thinking about this some more, when jobboard is implemented... we'd have to build in state management, right? And we want to attempt from multiple controllers... which means we're probably storing state in the database in one form or another. | 23:05 |
rm_work | hmm | 23:06 |
xgerman | task flow has revert and such — so what additional data do we need? | 23:06 |
johnsom | It's already handled for us. We are storing that state in the taskflow flow storage. | 23:06 |
rm_work | that's interesting, as the queue is the only storage for the "new" state, and if it's picked up by a controller worker... and that CW fails.... | 23:06 |
rm_work | it would have to just be stuck back on the queue right? | 23:06 |
rm_work | in which case another worker could pick it up anywhere anyway? | 23:06 |
rm_work | ah | 23:06 |
johnsom | When you turn on jobboard it persists that in a DB table while the flow is running | 23:06 |
rm_work | taskflow does it? | 23:06 |
rm_work | neat | 23:06 |
rm_work | still have managed to avoid learning pretty much anything about taskflow somehow | 23:07 |
johnsom | Hahaha | 23:07 |
xgerman | yeah, and since it has revert logic we should be able to have transcations | 23:07 |
xgerman | unless we are worried that other flows intercept us | 23:07 |
sbalukoff | xgerman: You're going to need to be able to lock other entities out of altering stuff. Presently we manage that through the 'PENDING_UPDATE' state. | 23:08 |
johnsom | Right now our flow storage is in-memory. That is part of the work to turn on job board, we define the persistent store. | 23:08 |
johnsom | sbalukoff right | 23:08 |
xgerman | well, in-memory vs. out-of-memory is an implementation detail | 23:08 |
xgerman | yep, we mark the whole LB pending | 23:09 |
sbalukoff | Something is not clear to me: Why couldn't we retry if the end-state we want for listeners is stored in the database? | 23:09 |
xgerman | and reject any update | 23:09 |
sbalukoff | xgerman: Yep. | 23:09 |
xgerman | database = operational state | 23:09 |
sbalukoff | That's not true when we're updating. | 23:09 |
sbalukoff | (Hence the reason we lock out other transactions with the pending update state) | 23:10 |
xgerman | we lock them out since it would cause concurrency issues | 23:10 |
sbalukoff | Yes, I know. | 23:11 |
sbalukoff | But the nature of that concurrency issue is that while the update is happening, the database is not actually in synch with what's on the amphorae. | 23:11 |
xgerman | then we give the state we want to task flow it makes it so or rolls back | 23:11 |
*** Bjoern has quit IRC | 23:11 | |
sbalukoff | But.... roll backs don't always work. | 23:11 |
sbalukoff | That is to say, our current roll-back is to just go to the error state. | 23:12 |
xgerman | those are the bugs we need to fix :-) | 23:12 |
sbalukoff | Wouldn't it be possible to store "this is what the DB model looked like just before the update" and attempt to roll that back? (Again, I'm having trouble understanding the need for the data model here.) | 23:13 |
sbalukoff | That is... | 23:13 |
sbalukoff | to store that in the same way we're going to have to store the data model state for retries? | 23:13 |
johnsom | Prior to your changes, we don't need to store anything additional to what we currently have to implement retries | 23:14 |
xgerman | I think storing the state as it is in the DB should be our goal… otherwise monitoring/observing the system will get tricky | 23:14 |
xgerman | I am still puzzled that update would not follow that philosophy | 23:15 |
*** allan_h has quit IRC | 23:16 | |
sbalukoff | johnsom: So are you saying we should just ditch the idea of supporting features that require more complex models like L7 policies and rules? :P | 23:16 |
*** piet_ has joined #openstack-lbaas | 23:16 | |
*** ducttape_ has quit IRC | 23:16 | |
johnsom | Not my intent, just saying that your statement about storing state for retries did not exist. | 23:17 |
* sbalukoff sighs | 23:17 | |
sbalukoff | What do we currently retry? | 23:17 |
* johnsom sighs | 23:18 | |
johnsom | I am simply saying, there is nothing more to store | 23:18 |
johnsom | It's already there | 23:18 |
blogan | sbalukoff: are you saying no need for data models or sql alchemy models? (not sure if you're saying one or both) | 23:20 |
sbalukoff | I'm not sure what we need the data models for. | 23:20 |
johnsom | I think he doesn't want to update the data models for L7 | 23:20 |
blogan | sbalukoff: but do you mean the static data models, or the sqlalchemy models? | 23:21 |
sbalukoff | Er... the ones described in common/data_models.py ? | 23:21 |
blogan | sbalukoff: okay, the redundant ones | 23:21 |
sbalukoff | I agree that sqlalchemy adds some complication in inadvertent ways in the effort to make many other things around the database simpler. | 23:22 |
blogan | sbalukoff: so instead use the sqlalchemy models and pass those around? | 23:22 |
johnsom | So, most of the gang is here it seems. Should we re-evaluate the "maintain history during update" decision? | 23:22 |
sbalukoff | I'm not sure at this point whether I would advocate scrapping the sqlalchemy stuff right now. | 23:22 |
sbalukoff | blogan: Er... yes? I think? We don't have things named very descriptively, so it might help to reference the file in which they're defined. | 23:23 |
blogan | i would say keep the sqlalchemy models, but remove the data models if possible, but that's got some big cons to it | 23:23 |
blogan | okay | 23:23 |
sbalukoff | If by sql alchemy models you means the stuff in db/models.py that repositories.py builds off of... | 23:23 |
blogan | sbalukoff: https://github.com/openstack/octavia/blob/master/octavia/db/models.py | 23:23 |
blogan | yes | 23:23 |
*** jwarendt has quit IRC | 23:23 | |
sbalukoff | Oh! I forgot we can get at this stuff outside of gerrit. | 23:24 |
sbalukoff | Muahahaha! | 23:24 |
johnsom | So, on update and delete calls to the Octavia API, is there value in maintaining the currently deployed state of the amp until the update on the amp completes successfully? | 23:24 |
*** jwarendt has joined #openstack-lbaas | 23:24 | |
sbalukoff | I think if we're going to keep the data models, we should force everything higher level to use that interface (like the api) | 23:25 |
blogan | johnsom: so update the db with the new information immediately at the API layer? | 23:25 |
johnsom | blogan Yes | 23:25 |
blogan | sbalukoff: does it not now? | 23:25 |
sbalukoff | blogan: That's what it does now. | 23:25 |
sbalukoff | Or rather... | 23:25 |
blogan | johnsom: so if we update the information immediately, then users will see that information immediately on GETs to that resource...which isn't too big of a deal | 23:26 |
blogan | thinking out loud here | 23:26 |
sbalukoff | It calls the handler, which does get sent to the controller worker. | 23:26 |
sbalukoff | But the API is going straight to the repository all over the place... | 23:26 |
johnsom | blogan yes, the get call would return data that might not yet be in effect | 23:26 |
johnsom | blogan Especially if we have a large number of amps | 23:27 |
blogan | sbalukoff: i dont understand what you mean yet | 23:27 |
blogan | johnsom: the whole reason to do the update only on success was to roll back to a previously known good state, obviously not implemented yet, but also it left open the ability to do a task-like API in the future | 23:28 |
johnsom | blogan Yep | 23:29 |
blogan | a task-like API would be: someone requests a create, update, or delete, and then they get a task id and query the task id until active | 23:29 |
blogan | but that was too forward thinking honestly | 23:29 |
sbalukoff | blogan: In api/v1/controllers/listener.py you will find a line that looks like this: db_listener = self.repositories.listener.create(session, **listener_dict) | 23:29 |
sbalukoff | So... we're going straight to the DB to do that. | 23:29 |
blogan | sbalukoff: yeah, so what should we do? | 23:29 |
sbalukoff | or repository. Which is effectively the DB. | 23:29 |
*** jwarendt has quit IRC | 23:30 | |
sbalukoff | If we keep the data models, we should disallow accessing the repository directly. So the above call becomes a call to a handler only, or something else which operates on the data_model. | 23:31 |
sbalukoff | Also, we're doing things like setting the LB and listener statuses on update. | 23:31 |
sbalukoff | Which, again, is happening directly against the DB. | 23:31 |
sbalukoff | (or repository, again... which is effectively the DB) | 23:31 |
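A hedged sketch of the layering being suggested here: the API controller would call a handler (or the data-model layer) for every write, and only that layer would touch the repositories. The class and method names below are illustrative, not existing Octavia code; only the listener create repository call mirrors the one quoted above.

```python
class ListenerWriteHandler:
    """Single choke point for listener writes; the API never sees the repo."""

    def __init__(self, repositories, get_session):
        self.repos = repositories
        self.get_session = get_session

    def create(self, listener_dict):
        session = self.get_session()
        # status handling and repository access live here instead of in the API
        return self.repos.listener.create(session, **listener_dict)
```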
blogan | so the main concern with going to the handler for that is the status change | 23:32 |
blogan | well not just that | 23:32 |
blogan | but now we'd need to build a whole rpc api for the db calls | 23:33 |
*** yamamoto_ has joined #openstack-lbaas | 23:33 | |
johnsom | So, going through the models, we make the transaction thing worse right? | 23:33 |
sbalukoff | which models? | 23:33 |
blogan | also what do yall mean by not being able to do transactions? | 23:33 |
blogan | do amp operations in the middle of a db transaction? | 23:33 |
johnsom | your proposal for no direct repo access. | 23:33 |
sbalukoff | johnsom: I suspect it would, yes. | 23:34 |
sbalukoff | Well... maybe. | 23:34 |
johnsom | Yeah, right now we are doing a poor man's transaction in the flows by updating the models and leaving the DB in place. | 23:34 |
sbalukoff | If the handler handled keeping state so the API didn't have to worry about it at all that might be better. | 23:34 |
*** longstaff has quit IRC | 23:34 | |
sbalukoff | I dunno. | 23:35 |
sbalukoff | A lot of this stuff is really opaque since we don't seem to be consistently using layers. | 23:35 |
blogan | i don't like going to the handler for db calls | 23:36 |
johnsom | I think we should write native sql and get rid of it all! grin | 23:37 |
blogan | i'm not entirely sure if oslo messaging makes rpc cast calls go over the queue, but either way GETs going straight to the workers will overload them | 23:37 |
*** yamamoto_ has quit IRC | 23:37 | |
*** piet_ has quit IRC | 23:37 | |
sbalukoff | blogan: We're already going to the handler to update the DB in the case of updates and deletes. | 23:37 |
johnsom | sbalukoff I don't follow you there | 23:38 |
blogan | sbalukoff: yes but i'm talking about GETs on the API, which will have to retrieve from the db, and GETs happen exponentially more frequently than POSTs, PUTs, and DELETEs, so the controller worker serving all those GETs is asking for trouble | 23:38 |
johnsom | What handler? | 23:38 |
blogan | johnsom: handler, right now its the queue producer basically | 23:38 |
sbalukoff | Again, looking at api/v1/controllers/listener.py.... put method: self.handler.update(db_listener, listener) | 23:38 |
sbalukoff | That's how we update the DB. | 23:38 |
johnsom | blogan +1 to that | 23:38 |
johnsom | blogan GETs should not go across the queue | 23:39 |
sbalukoff | But we don't do that for creates. | 23:39 |
blogan | sbalukoff: and johnsom is suggesting just doing the db update before the handler update | 23:39 |
blogan | just like creates | 23:39 |
sbalukoff | blogan: Er.. that's what I have been suggesting. | 23:39 |
sbalukoff | The task flow reorder would do that. | 23:40 |
blogan | sbalukoff: what? you said the API node should go to the handler to get db access, which to me means go across the queue, or straight to the controller | 23:40 |
johnsom | sbalukoff No, the order would go away | 23:40 |
sbalukoff | johnsom: Ok, now I'm confused. | 23:40 |
blogan | we all are :( | 23:40 |
sbalukoff | blogan: You're right that I was suggesting going to the handler for DB access. But you're also right that I wasn't thinking about read-only queries (which should be safe to do directly). | 23:41 |
blogan | sbalukoff: ah okay now i think i know what you mean | 23:41 |
sbalukoff | Mostly I was pointing out that for creates we update the database THEN call the handler, whereas for updates and deletes we just call the handler. | 23:41 |
sbalukoff | (And the handler updates the DB.) | 23:41 |
blogan | sbalukoff: so creates would go to the handler as well before updating the db? | 23:42 |
johnsom | If we are going to not maintain history, as the re-order patch does, there is no point to all of that model stuff. The API should update the database straight away and call it a day. | 23:42 |
sbalukoff | blogan: If we want to be consistent in how we handle retries / rollbacks.. yes. But I'm actually much more of a fan of the idea that we update the DB then call the handler for ALL write operations to the database... | 23:42 |
blogan | sbalukoff: if creates go to the handler before the db, the user won't be able to get an ID from the POST to know what load balancer to query | 23:43 |
sbalukoff | blogan: Well... right now we return data from a delete and update that's stale, too. | 23:43 |
blogan | so yeah i'm in favor of doing the updates the same way as creates | 23:43 |
johnsom | sbalukoff NO, we do not return stale data on update or delete | 23:44 |
blogan | sbalukoff: depends on your definition of stale | 23:44 |
blogan | when the request returns that is still how the lb is configured, or whatever resource | 23:44 |
johnsom | We currently, pre-reorder return the actual state of the amp | 23:44 |
*** armax has quit IRC | 23:44 | |
johnsom | LB/whatever | 23:44 |
*** armax has joined #openstack-lbaas | 23:44 | |
sbalukoff | johnsom: So... we tell the user about a state that will only exist if the transaction succeeds? | 23:45 |
sbalukoff | In other words, the same data we would return if we just updated the database in the first place? ;) | 23:45 |
johnsom | sbalukoff Not currently | 23:45 |
johnsom | sbalukoff with the re-order, yes, you would return inaccurate data if it fails | 23:46 |
sbalukoff | Right. I'm just more concerned about returning accurate data if it succeeds. | 23:46 |
johnsom | To me re-order == update db at API time | 23:46 |
johnsom | Both the current way and re-order would return accurate data on success | 23:47 |
sbalukoff | I'm not sure I understand what you mean when you say "pre-reorder return the actual state of the amp" | 23:47 |
sbalukoff | So... both the current way and re-order return inaccurate data on failure? | 23:47 |
johnsom | sbalukoff Current code, pre-reorder, get calls to the API reflect the actual state of the LB at all times. | 23:47 |
xgerman | and I like that | 23:48 |
johnsom | Current way will continue to return accurate data on a failure, re-order would return inaccurate data | 23:48 |
sbalukoff | How do you figure? | 23:48 |
johnsom | as the database has been updated, but the load balancer has not yet changed | 23:48 |
sbalukoff | We're talking failure mode: The load balancer is in error. | 23:48 |
xgerman | blogan, gorge, and I decided a while ago to go with the real data in the DB - desired data passed into the queue/flow | 23:49 |
sbalukoff | So, almost by definition, we can't trust that the state of the listener is accurate EXCEPT for the fact that it's in error. | 23:49 |
johnsom | State == Error, configuration == new configuration not currently running on the load balancer. | 23:49 |
johnsom | xgerman +1 yes, we decided this. This is a re-hash | 23:50 |
xgerman | error is a thorny issue and we might have bugs in it since those cases are hard | 23:50 |
xgerman | so I am not defending todays rollback code | 23:50 |
johnsom | Just because the lb is in ERROR (today) doesn't mean we shutdown the amps, they are still running with a config different than a GET would return | 23:50 |
sbalukoff | johnsom: By "State == Error, configuration == new configuration not currently running on the load balancer" are you talking about with or without the taskflow reorder. | 23:51 |
johnsom | xgerman +1 we have not implemented retry code yet. | 23:51 |
sbalukoff | Because that still accurately describes things with the task flow reorder. | 23:51 |
johnsom | Error is a cop-out | 23:51 |
xgerman | +1 | 23:51 |
johnsom | sbalukoff With the reorder. It causes this | 23:52 |
xgerman | goal is to make the code better so even error has the right info in the DB | 23:52 |
johnsom | Without re-order, we do not have that problem | 23:52 |
sbalukoff | johnsom: Without the reorder, today, we get the same thing. | 23:52 |
johnsom | sbalukoff No! | 23:52 |
blogan | johnsom: would doing the db update for all updates at the api layer solve a lot of these issues? | 23:52 |
johnsom | blogan No, it is the same as re-order, just a better implementation | 23:53 |
sbalukoff | johnsom: Er... I could be wrong, but I swear that's what I've been seeing in my testing... | 23:53 |
johnsom | sbalukoff With current code, we fail, the lb goes into error, but a get will pull what is actually running on the amps | 23:53 |
xgerman | blogan, making the DB show what we want makes it difficult to get a sense of what's actually running | 23:53 |
sbalukoff | johnsom: "what is actually running on the amps" is incorrect. You can't assume anything is actually running on the amps correctly at that point. | 23:54 |
johnsom | sbalukoff because we do not commit the updates to the DB until after all of the amps have successfully been updated. Active/Active scale out actually makes the re-order problem worse | 23:54 |
johnsom | sbalukoff why? | 23:55 |
sbalukoff | johnsom: You have to know all the ways the amphora update can fail. | 23:55 |
johnsom | sbalukoff we know the attempt to change what is running on the amp failed | 23:55 |
sbalukoff | Yep. And we don't know why. | 23:55 |
blogan | johnsom: if someone adds a member on another subnet, that network has to be plugged and that could cause the amp to fail somehow | 23:56 |
johnsom | That is a create call | 23:56 |
xgerman | ok, so you think amp in error - all bets are off, shoot it between the eyes and start over | 23:56 |
blogan | so it's possible the amp could not be running, but i'd say it's unlikely | 23:56 |
blogan | johnsom: oh good point lol | 23:56 |
sbalukoff | xgerman: That's generally what I expect most people to do. | 23:57 |
sbalukoff | Do we not allow changing the member subnet and IP once they are created? | 23:57 |
xgerman | well, then the fresh amp fails and you start digging | 23:57 |
blogan | sbalukoff: nope, i see no reason to either | 23:57 |
blogan | you may as well create a new member, that's essentially what you're doing | 23:57 |
xgerman | and that digging part is helped by some hints from the DB... | 23:57 |
sbalukoff | blogan: Ok. But that's one example. There are lots of reasons why the amp update might fail. You might just have lost power to part of your datacenter... | 23:58 |
xgerman | or you file a ticket with your cloud admin to look at the logs... | 23:58 |
blogan | xgerman: that is a decent reason | 23:58 |
johnsom | Yeah, that is the "debug" argument for keeping it as it is. | 23:59 |
blogan | i know | 23:59 |
xgerman | :-) | 23:59 |
johnsom | And, thus the question, do we want to give up having the history in the DB | 23:59 |
blogan | okay so, what are the alternatives here, keep things as is OR update db before? | 23:59 |