13:06:06 <joehuang> #startmeeting tricircle
13:06:07 <openstack> Meeting started Wed Aug 26 13:06:06 2015 UTC and is due to finish in 60 minutes. The chair is joehuang. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:06:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:06:10 <openstack> The meeting name has been set to 'tricircle'
13:06:50 <joehuang> #topic rollcall
13:07:02 <gampel> #info gampel
13:07:05 <joehuang> #info joehuang
13:07:31 <saggi> #info saggi
13:08:03 <irenab> hi
13:08:26 <joehuang> hi irena, rollcall now
13:08:29 * irenab will attend partially, has a conflicting meeting
13:08:45 <joehuang> understood, thanks
13:08:54 <joehuang> hi zhiyuan
13:08:58 <zhiyuan_> hi joe
13:09:07 <irenab> #info irenab
13:09:13 <zhiyuan_> #info zhiyuan
13:09:23 <joehuang> #topic recent progress
13:09:56 <gampel> saggi, can you please explain what you did in nova and why
13:10:11 <joehuang> we have a design meeting this Monday about network connectivity
13:10:23 <joehuang> yes, please
13:10:50 <saggi> As I spoke about in the previous meeting, I thought up a way to implement what we need without changing nova core code.
13:11:08 <joehuang> how
13:11:25 <saggi> What I did was hook up the scheduler and have the cascade_service appear to be multiple nova-compute hosts
13:11:26 <joehuang> sorry, not shown in last meeting
13:11:40 <saggi> joehuang: Takes time to type :)
13:12:29 <saggi> So the general idea is that when the user wants to run a VM, we get the scheduling information in the cascade_service since it registers as the scheduler. Look at the AZ and return the node_name of the site.
13:12:42 <saggi> In the cascade service we have a compute_service per site
13:13:05 <saggi> so the cascade service always gets the request.
13:13:10 <joehuang> the cascade service as a scheduler?
13:13:20 <saggi> and multiple compute nodes :)
13:13:37 <saggi> can you guys connect to imgur or is it blocked?
13:13:52 <joehuang> what's imgur
13:14:34 <zhiyuan_> i can access
13:14:45 <zhiyuan_> http://imgur.com/ this url, right?
13:14:49 <saggi> yes
13:14:51 <joehuang> which cascade service node will be called for reboot/etc. VM operations?
13:15:29 <saggi> http://i.imgur.com/za5kZpy.png
13:15:34 <joehuang> ok, I can access too
13:15:58 <saggi> In this case I have 2 fake sites and they look like two compute hosts
13:16:36 <saggi> They are all actually the cascade service
13:16:54 <saggi> in the hypervisor view you can see that they are cascade sites http://i.imgur.com/tNWeDIn.png
13:16:58 <joehuang> go on please
13:18:36 <saggi> This is what I have ATM. You can register as many sites as you want and they will appear as compute hosts.
13:18:56 <saggi> And you can control all the stats from the cascade service
13:19:10 <joehuang> shall the cascade service collect resource usage from the bottom OpenStacks?
13:19:18 <saggi> Yes
13:19:41 <saggi> We will use the site aggregate stats as the host stats
13:20:12 <saggi> What I need to add next is the actual scheduler logic that uses AZs to select the host
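[To make the scheme saggi describes concrete: the cascade service registers as the scheduler and exposes one fake compute host per site, so AZ-based scheduling reduces to mapping the requested AZ to that site's node_name. A minimal sketch follows; the class names, the simplified select_destinations signature, and the request_spec shape are all illustrative assumptions, not the actual patch.]

class Site(object):
    """A bottom OpenStack instance, exposed to Nova as one fake compute host."""

    def __init__(self, name, availability_zone):
        self.name = name  # the fake node_name Nova sees
        self.availability_zone = availability_zone


class CascadeScheduler(object):
    """Stands in for the normal scheduler driver.

    Because the cascade service is both the scheduler and the fake
    compute hosts, scheduling information never has to leave the
    cascade layer.
    """

    def __init__(self, sites):
        # one fake compute host (and compute_service) per bottom site
        self.sites_by_az = {s.availability_zone: s for s in sites}

    def select_destinations(self, request_spec):
        """Look at the requested AZ and return the site's fake node_name."""
        az = request_spec.get('availability_zone')
        site = self.sites_by_az.get(az)
        if site is None:
            raise RuntimeError('no cascade site registered for AZ %r' % az)
        # Nova then sends the build request to this fake host, which is
        # actually the cascade service itself.
        return [site.name]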
13:20:44 <joehuang> so one cascade service will handle one bottom openstack?
13:21:10 <saggi> no, one cascade service will handle N bottom openstacks
13:21:33 <saggi> At the start we will have only one cascade service
13:21:52 <joehuang> only one cascade service?
13:22:01 <saggi> yes
13:22:09 <saggi> For now
13:22:23 <saggi> The design allows for more.
13:22:25 <zhiyuan_> where is the compute service running? I think for each bottom OS we need one compute service
13:22:25 <joehuang> availability needs to be taken into consideration
13:22:41 <saggi> The compute service isn't running anywhere. It's fake.
13:22:46 <gampel> first we need to establish the flow end to end
13:23:35 <saggi> It represents a whole site.
13:23:54 <gampel> we are not sure that the cascading service will be the bottleneck; it depends on the runtime info module (push, pull) and who will do that job
13:24:23 <saggi> The design allows distributing the fake hosts across multiple cascade services.
13:24:57 <saggi> But we don't want to start coding all the synchronization that this requires just now
13:25:04 <joehuang> do you mean two nodes for one bottom openstack?
13:25:27 <joehuang> making the flow work is comparatively simple
13:25:33 <gampel> No, every cascade service handles one or more bottom sites
13:26:00 <zhiyuan_> ic, so there is no rpc between the scheduler and the compute service, just function calls?
13:26:07 <saggi> You need to have a single point handling requests for a single site to have correct ordering of operations.
13:26:32 <saggi> there is no communication between them in nova.
13:26:53 <saggi> But because we are both the scheduler and the compute host, we can pass information between them in the cascade layer.
13:27:11 <joehuang> so how to forward an RPC call like reboot VM from the API to the cascade service?
13:27:13 <saggi> So that we don't lose the scheduling information when passing the create call down to the bottom OS
13:27:38 <saggi> Nova will contact the fake host. Which is the cascade service itself.
13:28:58 <joehuang> ok, the RPC call will be forwarded to a fixed fake node, right?
13:29:36 <joehuang> that means if you add more cascade services
13:29:39 <saggi> yes, which is just an instance inside the cascade service.
13:29:54 <joehuang> the RPC call will still be forwarded to the same fake node
13:30:08 <saggi> Yes, since it's the one managing that host
13:30:36 <joehuang> then how to scale out?
13:31:10 <joehuang> and if this fake node failed
13:31:35 <joehuang> which cascade service node will be selected for the bottom openstack?
13:31:48 <saggi> The scheduler tells nova what fake host to use.
13:31:54 <joehuang> and how to redirect the API rpc call to the new cascade service node?
13:32:03 <saggi> This makes nova contact the correct cascade_service
13:32:10 <saggi> this allows you to scale out
13:32:47 <joehuang> but in the database all VMs have already been allocated to the fake node
13:33:25 <joehuang> if the cascade service will act as the new fake node (with the same old name)
13:33:38 <saggi> yes
13:33:45 <saggi> as for redundancy, you could have an active-passive setup where cascade services spin up a fake node on another cascade service and it will handle the requests.
13:34:06 <saggi> Spinning up a fake node is just listening on the proper queue
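[A rough illustration of "spinning up a fake node is just listening on the proper queue", using oslo.messaging. The FakeComputeEndpoint class, the method it exposes, and the exact topic/server naming are assumptions for illustration only; the real patch may differ.]

from oslo_config import cfg
import oslo_messaging


class FakeComputeEndpoint(object):
    """Handles the compute RPC calls Nova sends to one fake host."""

    def __init__(self, site_name):
        self.site_name = site_name

    def reboot_instance(self, ctxt, instance, **kwargs):
        # Instead of touching a local hypervisor, forward the operation
        # to the bottom OpenStack that this fake host represents.
        print('forwarding reboot of %s to site %s' % (instance, self.site_name))


def spin_up_fake_node(site_name):
    transport = oslo_messaging.get_rpc_transport(cfg.CONF)
    # Nova addresses a compute host by topic plus server name, so a
    # standby cascade service takes over a site simply by starting an
    # RPC server on the same topic/server pair.
    target = oslo_messaging.Target(topic='compute', server=site_name)
    server = oslo_messaging.get_rpc_server(
        transport, target, [FakeComputeEndpoint(site_name)],
        executor='threading')
    server.start()
    return server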
13:34:07 <gampel> we do not see a problem with HA/scaling in this design
13:34:15 <gampel> I think we need to agree that HA is in the design but will be handled after we have the end-to-end flow
13:34:19 <saggi> There are issues with VNC connections.
13:34:41 <saggi> which will probably have to be reestablished since the proxy IP will change.
13:34:56 <saggi> But all commands that use the message queue will be unaffected.
13:35:31 <gampel> i am not sure regarding the vnc; when we get there we could offload the connection directly to the bottom OS
13:35:34 <saggi> Since the passive cascade service will spin up a fake host and listen on that topic
13:35:45 <saggi> gampel: maybe
13:36:18 <joehuang> what's the benefit compared to the PoC, where one compute node proxies one bottom openstack?
13:37:34 <gampel> small code change, not intrusive, very clear to understand what we changed and why, and one service can handle multiple bottom sites
13:38:36 <joehuang> for the PoC code, all RPC from the scheduler/API was kept as before
13:38:51 <saggi> It's also easier for us, at least at the start, to assume a single cascade service and not worry about ordering and distribution of information across multiple nodes.
13:40:17 <joehuang> if one cascade service will be responsible for multiple bottom openstacks, then is there any issue with the fanout RPC call from the neutron API?
13:41:02 <saggi> You need to take control of the scheduler anyway so you don't lose the scheduling information in the cascade layer, so you can pass it to the bottom scheduler.
13:41:31 <joehuang> no duplicated fake node allowed across multiple cascade services
13:41:48 <gampel> the Neutron/Nova layer will not be aware of the cascading service layout, so it must do fanout
13:42:28 <gampel> can you say that again? i did not understand
13:43:44 <joehuang> if you use fanout, then no two fake nodes (cascade services) can work for one bottom openstack
13:44:08 <gampel> I suggest that Saggi and I add the new nova design to the design doc and we could discuss it there (we will add the high-level design for HA)
13:44:14 <joehuang> if you use fanout, then two fake nodes (cascade services) working for one bottom openstack is not allowed
13:44:52 <gampel> no, as saggi said, we have only one active CS working on a bottom site
13:45:32 <joehuang> if there are a lot of API calls for one bottom OpenStack, then the other fake nodes should be moved to another cascade service
13:45:42 <gampel> i suggest we discuss this in the document and on the mailing list so we will have time to discuss the status of other tasks
13:45:54 <joehuang> but unfortunately, that load can't be estimated
13:45:59 <joehuang> in the end
13:46:17 <joehuang> we would have to deploy one cascade service per bottom openstack
13:46:42 <gampel> saggi will send his patch today and we will have documentation about it
13:47:27 <joehuang> ok
13:48:00 <gampel> we do not agree with that statement, but let's discuss this with a proper design doc
13:48:37 <joehuang> good
13:48:51 <gampel> what is the status of the API, DAL --> Neutron, Nova?
13:48:57 <joehuang> the more discussion, the better
13:49:04 <saggi> joehuang: :)
13:49:16 <joehuang> zhiyuan is working on it
13:50:03 <joehuang> the keystone part has been settled
13:50:25 <zhiyuan_> yes, I find that we need to store the endpoint url in the database, since a normal user cannot get endpoints via "endpoint-list"
13:50:40 <joehuang> agree
13:51:07 <gampel> can you explain a bit more please
13:51:08 <saggi> as caching?
13:51:13 <joehuang> and I will change back the site tables for url storage
13:51:45 <joehuang> no caching, because endpoints can only be fetched by an admin
13:52:10 <zhiyuan_> the context and site id are passed to the DAL, then the DAL queries the database to get the endpoint url according to the site id and resource type
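[The DAL flow zhiyuan_ describes could look roughly like this in SQLAlchemy. The table follows the siteserviceconfiguration table mentioned in the design doc, but the exact column names and the get_endpoint_url helper are assumptions for illustration.]

from sqlalchemy import Column, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class SiteServiceConfiguration(Base):
    """One row per (site, service type), holding the endpoint url."""

    __tablename__ = 'site_service_configuration'

    config_id = Column(String(64), primary_key=True)
    site_id = Column(String(64), nullable=False)
    service_type = Column(String(64), nullable=False)  # 'compute', 'network', ...
    service_url = Column(String(512), nullable=False)


def get_endpoint_url(session, site_id, service_type):
    """DAL entry point: map (site id, resource type) to an endpoint url."""
    config = (session.query(SiteServiceConfiguration)
              .filter_by(site_id=site_id, service_type=service_type)
              .first())
    if config is None:
        raise LookupError('no endpoint registered for site %s / %s'
                          % (site_id, service_type))
    return config.service_url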
13:52:21 <joehuang> but we don't want to put the admin information in the configuration
13:52:45 <gampel> are we talking about the DAL to the --> TOP neutron, nova?
13:52:52 <saggi> zhiyuan_: How will we make sure everything is synced then?
13:52:59 <joehuang> so restore to the table design in the doc
13:53:55 <saggi> How will we make sure it's all configured correctly? Keystone and cascade?
13:54:16 <saggi> Make sure nothing was changed at only one end
13:55:01 <zhiyuan_> We have the siteserviceconfiguration table in the design doc; this is to store the url information.
13:55:28 <zhiyuan_> The user needs to register this information via the cascade API
13:55:35 <saggi> Yes, but if the admin changes the information in keystone, how will we know?
13:55:49 <joehuang> saggi wants to know how to validate the url
13:56:32 <joehuang> this could be done in the API. but if a later change happens in keystone, then the admin has to reconfigure the cascade service too
13:57:09 <zhiyuan_> Or we give the cascade service an admin account to sync the changes
13:58:13 <saggi> zhiyuan_: I think we will need an admin account anyway. For information from nova and neutron.
13:58:43 <joehuang> do we want to have admin account configuration in the cascade service?
13:58:45 <saggi> gampel: can you think of any APIs we need right now that are admin only?
13:58:59 <joehuang> if yes, then caching works
13:59:12 <joehuang> if not, then store the url in the db
13:59:42 <gampel> on the bottom we hope to avoid admin calls
13:59:47 <joehuang> the API could be controlled by policy
13:59:48 <saggi> Top
13:59:57 <saggi> joehuang: We could have a sync_keystone() call that requires an admin context.
14:00:18 <saggi> If syncing keystone is our only issue.
14:00:24 <gampel> i think this will work
14:00:39 <saggi> I'll probably call it something different though :)
14:00:56 <joehuang> yes, if we want to get endpoints from keystone, then an admin context is needed
14:01:21 <gampel> and then we could use the keystone regions
14:01:30 <saggi> What I mean is that instead of having an API to add URIs, have an API to sync that information.
14:01:46 <joehuang> so the conclusion is that we configure admin information in the cascade service
14:01:51 <zhiyuan_> if we have the admin account, I think we can also use it to get the endpoints. Is there any reason we should limit the use of the admin account?
14:03:37 <joehuang> I got gampel's idea: use one api to refresh endpoint information from keystone, and store it in the db
14:03:45 <gampel> I do not see a problem having admin API access to keystone and the TOP; i hope to avoid admin on the bottoms
14:04:35 <zhiyuan_> joehuang: so the db works as a cache?
14:04:55 <joehuang> I think so. how about your ideas
14:05:13 <annegentle> knock knock
14:05:26 <saggi> We gotta bail guys
14:05:27 <zhiyuan_> oh, we ran out of time again....
14:05:31 <annegentle> :)
14:05:32 <joehuang> we have to end the meeting now.
14:05:35 <gampel> let's switch to #openstack-tricircle
14:05:37 <saggi> #neutron-tricircle ?
14:05:38 <joehuang> bye
14:05:45 <joehuang> #endmeeting
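[The approach the meeting converged on, one admin-context call that refreshes endpoint urls from keystone into the db cache, might be sketched as below with keystoneauth1 and python-keystoneclient. The function name sync_endpoints_from_keystone is illustrative (saggi said he would name it differently), and it reuses the SiteServiceConfiguration model from the earlier sketch; the region-per-site mapping follows gampel's suggestion to use keystone regions.]

from keystoneauth1 import identity, session
from keystoneclient.v3 import client


def sync_endpoints_from_keystone(db_session, auth_url, username,
                                 password, project_name):
    # An admin account is required here: normal users cannot list
    # endpoints via keystone.
    auth = identity.Password(auth_url=auth_url,
                             username=username,
                             password=password,
                             project_name=project_name,
                             user_domain_id='default',
                             project_domain_id='default')
    ks = client.Client(session=session.Session(auth=auth))

    service_types = {s.id: s.type for s in ks.services.list()}
    for endpoint in ks.endpoints.list(interface='public'):
        # Upsert each endpoint into the site_service_configuration
        # cache, keyed on the keystone region (one region per site).
        record = SiteServiceConfiguration(
            config_id=endpoint.id,
            site_id=endpoint.region_id,
            service_type=service_types[endpoint.service_id],
            service_url=endpoint.url)
        db_session.merge(record)
    db_session.commit()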