14:59:52 <n0ano> #startmeeting scheduler
14:59:53 <openstack> Meeting started Tue Aug  6 14:59:52 2013 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:56 <openstack> The meeting name has been set to 'scheduler'
15:00:10 <n0ano> anyone here for the scheduler meeting?
15:00:45 <debo_os> Hi this is Debo
15:00:49 <debo_os> for the scheduler meeting
15:01:15 <debo_os> I am covering for Senhua who left the Openstack world for now :)
15:01:32 * glikson here
15:01:34 <n0ano> debo_os, NP, just waiting for people to gather
15:02:07 <garyk> hi all
15:03:15 <n0ano> #topic instance groups
15:03:26 <n0ano> garyk, I believe this was your issue
15:03:45 <garyk> n0ano: yes, that is correct
15:03:57 <n0ano> care to expand upon it a little
15:04:02 <garyk> i just wanted to bring everybody up to date with our developments and bottlenecks
15:05:16 <garyk> n0ano: the BP is https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
15:05:25 <garyk> and the wiki is https://wiki.openstack.org/wiki/GroupApiExtension
15:05:46 <garyk> at the moment we have the DB support approved in H2
15:06:03 <garyk> the API was looking good until we were asked to use the object model.
15:06:16 <garyk> that was a blocking feature about 3 weeks ago.
15:06:44 <garyk> we have added the support - https://review.openstack.org/#/c/38979/ and are planning to update the API to use this
15:07:08 <debo_os> I am updating the API extn based on the support
15:07:10 <garyk> In addition to this the scheduling support has been added (was approved and then reverted) and is now back in review
15:07:21 <garyk> https://review.openstack.org/#/c/33956/
15:07:55 <garyk> At the moment my concern is that this feature, which we decided at the Portland summit was important, may not make the H3 cut due to issues that are out of our control.
15:08:13 <garyk> i wanted to know if there is any way that we can get some help here with the review process and the issue of the object.
15:08:40 <garyk> sorry - object support (this is added and we'll integrate in the coming days - just feels like we are going to default on this)
15:08:55 <n0ano> reviews are always an issue, just rattle the cages here and on the mailing list is the best technique
15:09:15 <n0ano> more interesting is your problem, is that with the object model and, if so, what's the issue?
15:09:23 <PhilDay> @garyk - in terms of polices within a group, what is implemented at the moment ?
15:09:41 <garyk> n0ano: the API patch was nacked due to the fact that it had direct access to the database
15:09:55 <garyk> PhilDay: at the moment anti affinity is in the review process
15:10:27 <garyk> PhilDay: Jay Lau wants to add affinity host support on top of this
15:10:35 <PhilDay> @garyk - Ok thanks, and I assume that builds on the existing filter ?
15:10:50 <garyk> PhilDay: we are also planning network proximity - but that will certainly not be in the H cycle
15:11:22 <garyk> PhilDay: yes, they are both using existing filters, they are now hooking into the database structure for instance group management.
15:11:30 <garyk> prior to that it was an ugly hack
15:12:08 <PhilDay> It was the network proximity reference in the wiki that made me ask the question - I didn't remember seeing anything related to that in the code
15:12:32 <garyk> PhilDay: it was something we discussed at the summit and was in the queue for development.
15:12:59 <garyk> i guess that we can discuss that part at the next summit.
15:13:01 <PhilDay> @garyk - thanks for the clarification - so this set of changes is really about tidying up the group management
15:13:18 <garyk> PhilDay: correct.
15:13:48 <n0ano> well, tidying up except for the nack on the API change, that seems rather significant, do you have an alternative?
15:13:51 <garyk> my concern is making the cut for the H cycle. I feel that we are being unlucky with various changes in Nova out of our control (for example the usage of objects)
15:14:47 <garyk> n0ano: not sure i understand
15:15:09 <n0ano> you said they nacked because of direct access to the DB, what's your alternative to that?
15:15:53 <garyk> n0ano: we have implemented the object support. We are in the process of integrating this into the API layer. Now it is just the review process.
15:16:04 <garyk> is there any way that we can get a core reviewer assigned to this?
15:17:24 <n0ano> I'm not a core reviewer so I can't help, have you asked on the mailing list?
15:17:25 <garyk> in neutron we do this - that is, a core reviewer is assigned to developments
15:18:03 <garyk> n0ano: i do not have anything to add. maybe dabo or glikson have something to add
15:18:14 <garyk> sorry debo_os not dabo
15:20:05 <n0ano> well, there's the Nova meeting this Thurs, you could ask for reviewers there, that meeting worries about scheduling/review issues
15:20:35 <garyk> n0ano: ok, will do. i'll try and attend - the hours are crazy for us
15:20:48 <debo_os> Well I am updating the API code based on Gary's patch and I would second Gary about the core reviewer
15:21:06 <n0ano> garyk, I'll be at the meeting, if you don't make it I'll be sure to raise your review issue
15:21:10 <debo_os> and his comments ....
15:21:21 <garyk> n0ano: thanks! much appreciated
15:21:38 <n0ano> NP
15:21:45 <n0ano> Unless there's more on this...
15:22:27 <n0ano> #topic overall scheduler plan
15:22:55 <n0ano> This was more jog0's issue but we talked about it a little at the last meeting and I suggested everyone think about it
15:23:38 <n0ano> there are a lot of random BPs out against the scheduler, does it make sense to come up with a more unified plan for the scheduler
15:24:18 <garyk> n0ano: agreed.
15:24:45 <garyk> n0ano: i think that the details that boris-42 posted are very important and highlight some serious issues
15:25:12 <n0ano> which details were you thinking of?  I'm concerned about scaling issues myself.
15:25:13 <boris-42> garyk our meeting time, hi all
15:25:25 <PhilDay> Can you repeat (or point to boris-42's comments) ?
15:25:26 <boris-42> n0ano scaling + flexibility
15:25:50 <garyk> boris-42: can you please post the link to your doc
15:25:59 <boris-42> garyk one sec
15:26:05 <n0ano> well, I think the current design (pluggable filters/weights) is pretty flexible, not so sure about its scalability
15:26:19 <boris-42> n0ano no it is not flexible
15:26:32 <debo_os> have folks done scalability tests and posted results somewhere?
15:26:46 <boris-42> debo_os we are going to test on real deployments
15:26:51 <boris-42> debo_os different approaches
15:27:02 <boris-42> debo_os and show results on HK summit
15:27:21 <n0ano> debo_os, yes/no, bluehost did some work but their environment is unique enough that some people don't think their results apply in general
15:27:39 <boris-42> n0ano they rewrote half of openstack=)
15:27:48 <n0ano> boris-42, indeed
15:28:01 <boris-42> by the way they agree with our approach
15:28:04 <PhilDay> They also don't really schedule as such - they place onto specific hosts
15:28:10 <boris-42> https://docs.google.com/a/mirantis.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit
15:28:14 <n0ano> boris-42, when you say 'real deployments', what kind of scale are you talking about?
15:28:31 <boris-42> we would like to test on 10-30k nodes
15:28:46 <PaulMurray> boris-42 it would be good to know where performance problems are as well as just straight scale measures
15:28:48 <boris-42> at least just use case of creating instances
15:28:51 <boris-42> ahaha
15:28:56 <PaulMurray> Is that in your plan?
15:29:03 <boris-42> whole openstack is one big bottleneck=)
15:29:11 <PaulMurray> :)
15:29:19 <PaulMurray> I noticed
15:29:23 <n0ano> boris-42, now, now :-)
15:29:27 <boris-42> first problem is periodic_tasks
15:29:31 <boris-42> second scheduler
15:29:45 <boris-42> and then we will see=)
15:30:02 <n0ano> boris-42, indeed, I still need to create a BP to remove the periodic scheduler update, we're in violent agreement there
15:30:12 <boris-42> actually we have almost finished the new version of the scheduler
15:30:27 <boris-42> DOC is a little bit out of date
15:30:33 <boris-42> we are going to store data not locally
15:30:36 <boris-42> in scheduler
15:30:50 <garyk> boris-42: there are a number of bottlenecks and they can be dealt with
15:30:51 <boris-42> but in distributed master-master memcached
15:30:52 <n0ano> more problematic is your idea to remove the DB, there is a significant body of opinion that fan-out to the scheduler is wrong and storing state in the DB is right
15:31:07 <PhilDay> Is there a BP / review for that work boris-42 ?
15:31:15 <garyk> the scheduler is certainly one and has a number of shortcomings. i guess it is a process to get this straight.
15:31:23 <boris-42> DB is a bottleneck
15:31:25 <garyk> these would be good topics for the upcoming summit
15:31:27 <boris-42> there is no fanout
15:31:45 <debo_os> sorry I am a scheduler noob ... but is the main issue the central DB and the fact that it's embedded into nova?
15:32:07 <boris-42> we will produce N/(PERIODIC_TASK_TIME*SCHEDULERS_COUNT) requests to schedulers
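A worked reading of the formula above, under an assumed interpretation: each of N compute nodes reports once per periodic-task interval, and updates are spread evenly across the schedulers, so each scheduler handles N / (PERIODIC_TASK_TIME * SCHEDULERS_COUNT) requests per second. The concrete numbers below are illustrative, not from the meeting:

```python
# Assumed scale for illustration only.
N = 10_000                 # compute nodes
PERIODIC_TASK_TIME = 60    # seconds between each node's status report
SCHEDULERS_COUNT = 5       # scheduler workers sharing the load

# Requests per second that each scheduler would have to absorb.
per_scheduler_rps = N / (PERIODIC_TASK_TIME * SCHEDULERS_COUNT)
print(per_scheduler_rps)   # ~33.3 requests/sec per scheduler
```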
15:32:20 <llu-laptop> boris-42: if no fanout, then how can all schedulers get the same compute node's update?
15:32:28 <boris-42> memcached
15:32:43 <n0ano> so you're replacing the DB with memcached
15:32:44 <boris-42> we will use one memcached for all schedulers
15:32:53 <boris-42> and scheduler will keep all data
15:32:56 <boris-42> not nova
15:33:48 <PaulMurray> boris-42 what do you mean not nova - is this scheduler outside nova?
15:34:04 <PhilDay> So running memcached will become a requirement for all Nova installs - or is this an optional new scheduler ?
15:34:30 <garyk> boris-42: i am not sure that memcached is a solution. but i think that the design is more important than the implementation at the moment
15:34:37 <PhilDay> (Just thinking that elsewhere memcached has been an option, not a requirement so far)
15:34:38 <boris-42> PhilDay We will implement it only for memcached (but you are able to implement support for other backends)
15:34:39 <debo_os> boris-42: well memcached or any other distributed low latency DB :)
15:34:50 <boris-42> yes
15:34:54 <boris-42> debo_os ^
15:35:00 <boris-42> there will be interface
15:35:08 <boris-42> with get(), put(), and get_all() methods
15:35:15 <debo_os> of course
15:35:21 <boris-42> you could implement it for mysql even=)
15:35:31 <boris-42> but we choose memcached=)
15:35:39 <PhilDay> So this is an alternative to the filter scheduler - or a new version / replacement for the filter scheduler ?
15:35:40 <boris-42> for 1 example
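The interface boris-42 describes (get(), put(), get_all(), with pluggable backends so mysql or redis could stand in for memcached) might be sketched roughly as follows. The class names and signatures are assumptions; the meeting only names the three methods:

```python
import abc

class SchedulerStateStore(abc.ABC):
    """Hypothetical sketch of the pluggable key-value interface:
    one shared store of per-host state that all schedulers read."""

    @abc.abstractmethod
    def put(self, host, state): ...

    @abc.abstractmethod
    def get(self, host): ...

    @abc.abstractmethod
    def get_all(self): ...

class InMemoryStateStore(SchedulerStateStore):
    """Trivial dict-backed backend for illustration; a memcached
    backend would implement the same three methods against a
    memcached client instead of a local dict."""

    def __init__(self):
        self._data = {}

    def put(self, host, state):
        self._data[host] = state

    def get(self, host):
        return self._data.get(host)

    def get_all(self):
        return dict(self._data)
```

The point of the abstraction is the one made in the discussion: the scheduler codes against get/put/get_all, and the choice of memcached versus any other backend becomes a deployment decision.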
15:35:44 <debo_os> is there a version built with memcached?
15:35:59 <boris-42> PhilDay we are going step by step to change the scheduler in the following way
15:36:19 <boris-42> 1. Remove the compute_node_get_all methods, and add one new rpc method to the scheduler, update_node()
15:36:32 <boris-42> and use the scheduler's DB to store all information
15:36:51 <boris-42> So the mechanism will be the same
15:37:03 <boris-42> 2. Cleanup Nova (to remove data from compute_node)
15:37:09 <boris-42> and periodic tasks
15:37:17 <n0ano> so, rather than the compute nodes updating a DB you're going to send the data to a scheduler so it can update the DB - how is that faster?
15:37:31 <boris-42> Memcached is faster
15:37:42 <boris-42> we will show real results from real big deployment
15:38:16 <n0ano> and your scheme will work with multiple schedulers?
15:38:17 <boris-42> 3. Add more flexibility: scheduler.compute_update() could be called from different projects
15:38:27 <boris-42> n0ano yes of course
15:38:44 <boris-42> 4. Use data from different sources, cinder first
15:38:51 <n0ano> which means you are doing a fan-out message
15:39:00 <boris-42> no
15:39:11 <boris-42> fan-out means compute nodes -> all schedulers
15:39:23 <PhilDay> So for step 1 the scheduler(s) will still update the existing DB - and then you'll make memcache an optional alternative Db driver for the scheduler ?
15:39:26 <boris-42> we are doing compute nodes -> one of the schedulers -> memcached
15:39:49 <llu-laptop> boris-42: so other schedulers read from memcached?
15:39:50 <boris-42> PhilDay I mean there will be 3 patches =)
15:39:57 <boris-42> PhilDay baby steps style=)
15:40:19 <boris-42> llu-laptop all schedulers are connected to one distributed memcached
15:40:32 <debo_os> but what state are you planning to store
15:40:37 <n0ano> you do have the issue that if the master scheduler dies, who takes over and how do you do the hand off?
15:40:44 <debo_os> for starters it will be a replica of the db state for now, right?
15:40:49 <PhilDay> Step style is good :-)   Just want to also see that steps introduce options rather than force a change in deployment
15:40:57 <boris-42> there is no master scheduler
15:41:14 <debo_os> I think we should have a layer to isolate the scheduler and the state stuff ... then it wont matter
15:41:27 <boris-42> debo_os it won't matter
15:41:29 <boris-42> already
15:41:37 <n0ano> only one scheduler receives messages and updates memcached => that's the master
15:41:39 <boris-42> We have memcached that is distributed
15:41:43 <boris-42> no
15:41:54 <boris-42> n0ano let me describe more carefully
15:41:56 <debo_os> agree memcached is just the store, right ....
15:42:14 <debo_os> instead of a slow DB you have memcached which could have been redis too or couch
15:42:24 <debo_os> so the point is why mandate just memcache
15:42:27 <boris-42> We have some KEY_VALUE storage
15:42:30 <n0ano> everybody take a deep breath and pause for a minute
15:42:30 <debo_os> have a state API
15:42:34 <boris-42> 1 KEY_VALUE storage
15:42:41 <boris-42> distributed fast and such things
15:42:50 <boris-42> we have A lot of schedulers
15:42:55 <boris-42> we have one RPC queue
15:42:55 <debo_os> ok that's good .. have a pure key-value api and maybe not mandate memcached specifically
15:42:58 <boris-42> for schedulers
15:43:14 <boris-42> all schedulers are getting messages one by one from the queue
15:43:24 <n0ano> I believe boris-42 has a detailed write up, we gave the link above, may I suggest we all read that write up, understand it, and come back next week to talk about it in detail
15:43:26 <boris-42> and update global DB
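To make the flow boris-42 describes concrete (no fan-out: compute updates land in one shared queue, any one of the schedulers picks each message up and writes it into the single global store), here is a hypothetical in-process sketch, with a plain `queue.Queue` standing in for the RPC queue and a locked dict standing in for distributed memcached:

```python
import queue
import threading

state_store = {}               # stands in for the shared memcached
store_lock = threading.Lock()
update_queue = queue.Queue()   # stands in for the single scheduler RPC queue

def scheduler_worker():
    """One scheduler: pull host updates one by one from the shared
    queue and write them into the global store. No fan-out, because
    each message is consumed by exactly one scheduler."""
    while True:
        msg = update_queue.get()
        if msg is None:        # shutdown sentinel
            break
        host, state = msg
        with store_lock:
            state_store[host] = state
        update_queue.task_done()
```

Any scheduler can then answer placement queries from `state_store` without touching Nova's central DB, which is the flexibility/scaling claim being debated above.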
15:43:28 <debo_os> do you have  a doc with this new design you are doing ....
15:43:39 <debo_os> yeah
15:43:49 <boris-42> n0ano we should update the part about using memcached
15:44:01 <boris-42> n0ano our first variant was without memcached
15:44:09 <boris-42> and fan-out
15:44:34 <n0ano> what I've been hearing is a lot of confusion that, hopefully, is cleared up a bit by the write up
15:45:26 <boris-42> n0ano We will update our DOC soon ok?) now it describes the solution with fan-out and without KEY-VALUE storage
15:45:51 <boris-42> n0ano could I ping you, after we update our doc?)
15:46:01 <n0ano> that would be good as those are areas that are confusing.
15:46:25 <n0ano> boris-42, please do ping me, yes
15:46:33 <boris-42> n0ano yeah that was the result of a common discussion with Mike (from BlueHost)
15:47:05 <n0ano> this is a good area but I want to make sure we can have a productive discussion about it.
15:47:34 <boris-42> n0ano sure
15:47:40 <boris-42> nods=)
15:47:42 <n0ano> let's move on (note that everyone needs to do homework before the next meeting :-)
15:47:57 <PhilDay> Can we all get pinged when the doc is updated ?
15:47:57 <n0ano> #topic multiple active scheduler policies
15:48:26 <n0ano> I don't think we completely finished the discussion last week, were there any more issues anyone here is concerned about on this topic?
15:48:49 <glikson> we've recently submitted a new patch
15:49:04 <glikson> https://review.openstack.org/#/c/37407/
15:49:23 <n0ano> are you getting reviewers :-)
15:49:31 <glikson> the implementation now is rather different than the original idea, but still..
15:49:55 <glikson> we've got Russell's -2 :-)
15:50:20 <n0ano> well, that's attention anyway
15:50:32 <glikson> which he didn't remove yet. hopefully the last patch addresses the main concerns, and we will be able to make progress.
15:50:44 <n0ano> how did the implementation change, in a nutshell
15:51:02 <glikson> now we just specify scheduler options in flavor extra spec
15:51:32 <glikson> no pre-defined configurations/policies in nova.conf, no association with aggregates..
15:51:56 <n0ano> so the user can select scheduler policies that are created by the administrator, I guess that's OK
15:52:40 <glikson> so, for example, you can define a flavor with CoreFilter and cpu_allocation_ratio=2.0, and another with a different set of parameters
15:53:03 <glikson> (yes, "you" = admin)
15:53:34 <PaulMurray> glikson these things will be visible to users yes?
15:53:43 <glikson> then, one can use existing AggregateInstanceExtraSpec to map to aggregates
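A hypothetical sketch of the scheme glikson outlines: scheduler options ride in a flavor's extra specs under a namespace (see the namespacing point later in the discussion), alongside the existing AggregateInstanceExtraSpec keys. The `scheduler:` prefix and key names below are assumptions for illustration; the patch under review defines the real ones:

```python
# A flavor as a plain dict; the extra-spec keys here are assumed,
# not taken from the actual patch.
FLAVOR = {
    "name": "m1.dense",
    "extra_specs": {
        "scheduler:cpu_allocation_ratio": "2.0",       # assumed namespace
        "scheduler:filters": "CoreFilter",             # assumed key
        "aggregate_instance_extra_specs:pool": "dense",
    },
}

def scheduler_options(flavor, namespace="scheduler:"):
    """Pull only the namespaced scheduler overrides out of a flavor's
    extra specs, leaving other extra-spec families untouched."""
    return {
        key[len(namespace):]: value
        for key, value in flavor["extra_specs"].items()
        if key.startswith(namespace)
    }
```

The same namespace prefix is what would let the API hide just this family of keys from non-admin users, as discussed below, rather than hiding extra specs wholesale.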
15:54:08 <glikson> PaulMurray: we are suggesting not to return these properties for non-admin users.
15:54:18 <PaulMurray> ok, i see
15:54:30 <n0ano> glikson, how would you do that? flavors are visible to all
15:55:13 <glikson> n0ano: in the api. the behavior is often different between admin and non-admin, so we added another "if"..
15:55:21 <glikson> everyone sees the flavor
15:55:30 <glikson> just some of the extra specs are not shown
15:55:40 <glikson> if you are not an admin
15:55:46 <PhilDay> There is an API extension which determines if the extra-spec values are available or not - is that enough ?
15:55:48 <PaulMurray> extra extra specs :)
15:56:09 <glikson> PhilDay: you mean, all the extra specs?
15:56:19 <n0ano> ugh, I don't really like that but I'm not going to argue against it, just seems wrong
15:56:54 <PaulMurray> Is there some drift going on here....
15:57:06 <PaulMurray> I mean are things being used in a way they were not meant to rather
15:57:21 <PaulMurray> ...in order to fit
15:57:23 <PhilDay> I didn't say I liked it either :-)    Just feels like extra specs might now be getting pulled in many different ways
15:57:50 <n0ano> PaulMurray, well, extra specs has always been a catch all but, with the scope applied to the keys, it sort of works out OK
15:58:19 <glikson> we are adding a namespace to the new keys
15:58:34 <n0ano> glikson, you better :-)
15:58:53 <n0ano> sorry guys but we're coming up to the hour
15:58:56 <glikson> I am not sure this is the ideal solution -- but this seemed to be the preferred approach from the feedback we received
15:59:16 <PhilDay> So maybe the extra_spec API extension should be updated to allow the admin to define which namespaces are exposed - rather than it being all or nothing
15:59:41 <n0ano> I'd like to thank everyone and we'll talk again next week (feel free to email me with agenda suggestions)
15:59:53 <glikson> PhilDay: interesting idea. hopefully our patch will not be blocked until it is implemented :-)
15:59:58 <n0ano> #endmeeting