14:59:52 #startmeeting scheduler
14:59:53 Meeting started Tue Aug 6 14:59:52 2013 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:56 The meeting name has been set to 'scheduler'
15:00:10 anyone here for the scheduler meeting?
15:00:45 Hi, this is Debo
15:00:49 for the scheduler meeting
15:01:15 I am covering for Senhua, who left the OpenStack world for now :)
15:01:32 * glikson here
15:01:34 debo_os, NP, just waiting for people to gather
15:02:07 hi all
15:03:15 #topic instance groups
15:03:26 garyk, I believe this was your issue
15:03:45 n0ano: yes, that is correct
15:03:57 care to expand upon it a little
15:04:02 i just wanted to bring everybody up to date with our developments and bottlenecks
15:05:16 n0ano: the BP is https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
15:05:25 and the wiki is https://wiki.openstack.org/wiki/GroupApiExtension
15:05:46 at the moment we have the DB support approved in H2
15:06:03 the API was looking good until we were asked to use the object model.
15:06:16 that was a blocking feature about 3 weeks ago.
15:06:44 we have added the support - https://review.openstack.org/#/c/38979/ - and are planning to update the API to use this
15:07:08 I am updating the API extension based on that support
15:07:10 In addition to this the scheduling support has been added (was approved and then reverted) and is now back in review
15:07:21 https://review.openstack.org/#/c/33956/
15:07:55 At the moment my concern is that this feature, which we decided at the Portland summit was important, may not make the H3 cut due to issues that are out of our control.
15:08:13 i wanted to know if there is any way that we can get some help here with the review process and the issue of the object.
15:08:40 sorry - object support (this is added and we'll integrate it in the coming days - it just feels like we are going to default on this)
15:08:55 reviews are always an issue, just rattling the cages here and on the mailing list is the best technique
15:09:15 more interesting is your problem: is it with the object model and, if so, what's the issue?
15:09:23 @garyk - in terms of policies within a group, what is implemented at the moment?
15:09:41 n0ano: the API patch was nacked due to the fact that it had direct access to the database
15:09:55 PhilDay: at the moment anti-affinity is in the review process
15:10:27 PhilDay: Jay Lau wants to add host affinity support on top of this
15:10:35 @garyk - Ok thanks, and I assume that builds on the existing filter?
15:10:50 PhilDay: we are also planning network proximity - but that will certainly not be in the H cycle
15:11:22 PhilDay: yes, they are both using existing filters, they are now hooking into the database structure for instance group management.
15:11:30 prior to that it was an ugly hack
15:12:08 It was the network proximity reference in the wiki that made me ask the question - I didn't remember seeing anything related to that in the code
15:12:32 PhilDay: it was something we discussed at the summit and was in the queue for development.
15:12:59 i guess that we can discuss that part at the next summit.
15:13:01 @garyk - thanks for the clarification - so this set of changes is really about tidying up the group management
15:13:18 PhilDay: correct.
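For readers unfamiliar with the filter mechanism being referenced, here is a minimal sketch of what an anti-affinity host filter in nova's existing filter style might look like. This is a hedged illustration only; the class name and the 'group_hosts' key are assumptions, not taken from the patch under review.

```python
# Illustrative sketch only -- not the code under review in
# https://review.openstack.org/#/c/33956/.  It assumes the scheduler has
# already looked up the group's current hosts and placed them into
# filter_properties['group_hosts'].
from nova.scheduler import filters


class SketchGroupAntiAffinityFilter(filters.BaseHostFilter):
    """Reject any host that already runs a member of the requested group."""

    def host_passes(self, host_state, filter_properties):
        group_hosts = filter_properties.get('group_hosts') or []
        # Pass only if no instance from the group is on this host yet.
        return host_state.host not in group_hosts
```

An affinity variant of the kind Jay Lau is said to want would simply invert the membership test.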
15:13:48 well, tidying up except for the nack on the API change; that seems rather significant, do you have an alternative?
15:13:51 my concern is making the cut for the H cycle. I feel that we are being unlucky with various changes in Nova out of our control (for example the usage of objects)
15:14:47 n0ano: not sure i understand
15:15:09 you said they nacked because of direct access to the DB, what's your alternative to that?
15:15:53 n0ano: we have implemented the object support. We are in the process of integrating this into the API layer. Now it is just the review process.
15:16:04 is there any way that we can get a core reviewer assigned to this?
15:17:24 I'm not a core reviewer so I can't help, have you asked on the mailing list?
15:17:25 in neutron we do this - that is, a core reviewer is assigned to developments
15:18:03 n0ano: i do not have anything to add. maybe dabo or glikson have something to add
15:18:14 sorry debo_os not dabo
15:20:05 well, there's the Nova meeting this Thurs, you could ask for reviewers there, that meeting worries about scheduling/review issues
15:20:35 n0ano: ok, will do. i'll try and attend - the hours are crazy for us
15:20:48 Well I am updating the API code based on Gary's patch and I would second Gary about the core reviewer
15:21:06 garyk, I'll be at the meeting, if you don't make it I'll be sure to raise your review issue
15:21:10 and his comments ....
15:21:21 n0ano: thanks! much appreciated
15:21:38 NP
15:21:45 Unless there's more on this...
15:22:27 #topic overall scheduler plan
15:22:55 This was more jog0's issue but we talked about it a little at the last meeting and I suggested everyone think about it
15:23:38 there are a lot of random BPs out against the scheduler, does it make sense to come up with a more unified plan for the scheduler?
15:24:18 n0ano: agreed.
15:24:45 n0ano: i think that the details that boris-42 posted are very important and highlight some serious issues
15:25:12 which details were you thinking of? I'm concerned about scaling issues myself.
15:25:13 garyk our meeting time, hi all
15:25:25 Can you repeat (or point to boris-42's comments)?
15:25:26 n0ano scaling + flexibility
15:25:50 boris-42: can you please post the link to your doc
15:25:59 garyk one sec
15:26:05 well, I think the current design (pluggable filters/weights) is pretty flexible, not so sure about its scalability
15:26:19 nano no it is not flexible
15:26:32 have folks done scalability tests and posted results somewhere
15:26:46 debo_os we are going to test on real deployments
15:26:51 debo_os different approaches
15:27:02 debo_os and show results at the HK summit
15:27:21 debo_os, yes/no, bluehost did some work but their environment is unique enough that some people don't think their results apply in general
15:27:39 nano they rewrote half of openstack =)
15:27:43 n0ano *
15:27:48 boris-42, indeed
15:28:01 by the way they agree with our approach
15:28:04 They also don't really schedule as such - they place onto specific hosts
15:28:10 https://docs.google.com/a/mirantis.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit
15:28:14 boris-42, when you say 'real deployments', what kind of scale are you talking about
15:28:31 we would like to test on 10-30k nodes
15:28:46 boris-42 it would be good to know where performance problems are as well as just straight scale measures
15:28:48 at least just the use case of creating instances
15:28:51 ahaha
15:28:56 Is that in your plan?
15:29:03 the whole of openstack is one big bottleneck =)
15:29:11 :)
15:29:19 I noticed
15:29:23 boris-42, now, now :-)
15:29:27 the first problem is periodic_tasks
15:29:31 the second is the scheduler
15:29:45 and then we will see =)
15:30:02 boris-42, indeed, I still need to create a BP to remove the periodic scheduler update, we're in violent agreement there
15:30:12 actually we have almost finished a new version of the scheduler
15:30:27 the doc is a little bit out of date
15:30:33 we are going to store data not locally
15:30:36 in the scheduler
15:30:50 boris-42: there are a number of bottlenecks and they can be dealt with
15:30:51 but in a distributed master-master memcached
15:30:52 more problematic is your idea to remove the DB, there is a significant group of opinion that fan-out to the scheduler is wrong and storing state in the DB is right
15:31:07 Is there a BP / review for that work boris-42?
15:31:15 the scheduler is certainly one and has a number of shortcomings. i guess it is a process to get this straight.
15:31:23 the DB is a bottleneck
15:31:25 these would be good topics for the upcoming summit
15:31:27 there is no fanout
15:31:45 sorry I am a scheduler noob ... but is the main issue the central DB and the fact that it's embedded into nova?
15:32:07 we will produce N/(PERIODIC_TASK_TIME*SCHEDULERS_COUNT) requests to schedulers
15:32:20 boris-42: if no fanout, then how can all schedulers get the same compute node's update?
15:32:28 memcached
15:32:43 so you're replacing the DB with memcached
15:32:44 we will use one memcached for all schedulers
15:32:53 and the scheduler will keep all data
15:32:56 not nova
15:33:48 boris-42 what do you mean not nova - is this scheduler outside nova?
15:34:04 So running memcached will become a requirement for all Nova installs - or is this an optional new scheduler?
15:34:30 boris-42: i am not sure that memcached is a solution. but i think that the design is more important than the implementations at the moment
15:34:37 (Just thinking that elsewhere memcache has been an option, not a requirement so far)
15:34:38 PhilDay we will implement it only for memcached (but you are able to implement it for other backends)
15:34:39 boris-42: well, memcached or any other distributed low latency DB :)
15:34:50 yes
15:34:54 debo_os ^
15:35:00 there will be an interface
15:35:08 with get(), put(), and get_all() methods
15:35:15 of course
15:35:21 you could implement it for mysql even =)
15:35:31 but we chose memcached =)
15:35:39 So this is an alternative to the filter scheduler - or a new version / replacement for the filter scheduler?
15:35:40 for one example
15:35:44 is there a version built with memcached?
15:35:59 PhilDay we are going step by step to change the scheduler in the following way
15:36:19 1. Remove the compute_node_get_all methods, and add one new RPC method to the scheduler, update_node()
15:36:32 and use the scheduler's DB to store all information
15:36:51 So the mechanism will be the same
15:37:03 2. Clean up Nova (to remove data from compute_node)
15:37:09 and periodic tasks
15:37:17 so, rather than the compute nodes updating a DB you're going to send the data to a scheduler so it can update the DB - how is that faster?
15:37:31 Memcached is faster
15:37:42 we will show real results from a real big deployment
15:38:16 and your scheme will work with multiple schedulers?
15:38:17 3. Add more flexibility: scheduler.compute_update() could be called from different projects
15:38:27 n0ano yes of course
15:38:44 4. Use data from different sources, Cinder first
15:38:51 which means you are doing a fan-out message
15:39:00 no
15:39:11 fan-out means compute nodes -> all schedulers
15:39:23 So for step 1 the scheduler(s) will still update the existing DB - and then you'll make memcache an optional alternative DB driver for the scheduler?
15:39:26 we are doing compute nodes -> one of the schedulers -> memcached
15:39:49 boris-42: so other schedulers read from memcached?
15:39:50 PhilDay I mean there will be 3 patches =)
15:39:57 PhilDay baby steps style =)
15:40:19 llu-laptop all schedulers are connected to one distributed memcached
15:40:32 but what state are you planning to store
15:40:37 you do have the issue of the master scheduler dying - who takes over and how do you do the handoff?
15:40:44 for starters it will be a replica of the DB state for now, right?
15:40:49 Step style is good :-) Just want to also see that the steps introduce options rather than force a change in deployment
15:40:57 there is no master scheduler
15:41:14 I think we should have a layer to isolate the scheduler and the state stuff ... then it won't matter
15:41:27 debo_os it won't matter
15:41:29 already
15:41:37 only one scheduler receives messages and updates memcached => that's the master
15:41:39 We have memcached that is distributed
15:41:43 no
15:41:54 nano let me describe more carefully
15:41:56 agree memcache is just the store, right ....
15:42:14 instead of a slow DB you have memcache, which could have been redis too, or couch
15:42:24 so the point is why mandate just memcache
15:42:27 We have some KEY_VALUE storage
15:42:30 everybody take a deep breath and pause for a minute
15:42:30 have a state API
15:42:34 one KEY_VALUE storage
15:42:41 distributed, fast and such things
15:42:50 we have a lot of schedulers
15:42:55 we have one RPC queue
15:42:55 ok that's good .. have a pure keyval API and maybe not mandate specific memcache etc
15:42:58 for schedulers
15:43:14 all schedulers are getting messages one by one from the queue
15:43:24 I believe boris-42 has a detailed write up, we gave the link above, may I suggest we all read that write up, understand it, and come back next week to talk about it in detail
15:43:26 and update the global DB
15:43:28 do you have a doc with this new design you are doing ....
15:43:39 yeah
15:43:49 n0ano we should update the part about using memcached
15:44:01 n0ano our first variant was without memcached
15:44:09 and fan-out
15:44:34 what I'm hearing is a lot of confusion that, hopefully, is cleared up a bit by the write up
15:45:26 n0ano we will update our doc soon, ok? =) now it describes a solution with fan-out and without key-value storage
15:45:51 n0ano could I ping you after we update our doc?)
15:46:01 that would be good as those are areas that are confusing.
15:46:25 boris-42, please do ping me, yes
15:46:33 n0ano yeah, that was the result of a common discussion with Mike (from BlueHost)
15:47:05 this is a good area but I want to make sure we can have a productive discussion about it.
15:47:34 n0ano sure
15:47:40 nods =)
15:47:42 let's move on (note that everyone needs to do homework before the next meeting :-)
15:47:57 Can we all get pinged when the doc is updated?
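To make the get()/put()/get_all() interface boris-42 describes a bit more concrete, here is a hedged sketch of what such a pluggable state store might look like, with memcached as one backend. Class names and the key-index bookkeeping are assumptions for illustration, not taken from the linked doc or patches.

```python
# Hedged sketch of the pluggable key-value state store described above:
# schedulers share host state through a get()/put()/get_all() interface,
# with memcached as one possible backend.  Names are illustrative, and the
# index-key bookkeeping below is not race-safe; a real backend would need
# something sturdier.
import abc

import memcache  # python-memcached


class SchedulerStateStore(object):
    """Interface schedulers use to share compute host state."""

    __metaclass__ = abc.ABCMeta

    @abc.abstractmethod
    def put(self, key, value):
        """Store (or replace) the state record for one host."""

    @abc.abstractmethod
    def get(self, key):
        """Return the state record for one host, or None."""

    @abc.abstractmethod
    def get_all(self):
        """Return a dict of all known host state records."""


class MemcachedStateStore(SchedulerStateStore):
    """Keep host state in a shared (distributed) memcached cluster."""

    def __init__(self, servers, prefix='host_state:'):
        self._client = memcache.Client(servers)
        self._prefix = prefix

    def put(self, key, value):
        # Track the set of known keys ourselves, since plain memcached
        # has no native way to list keys.
        index = self._client.get(self._prefix + '_index') or []
        if key not in index:
            index.append(key)
            self._client.set(self._prefix + '_index', index)
        self._client.set(self._prefix + key, value)

    def get(self, key):
        return self._client.get(self._prefix + key)

    def get_all(self):
        index = self._client.get(self._prefix + '_index') or []
        return dict((key, self.get(key)) for key in index)
```

With that split, whichever scheduler takes a compute node update off the shared RPC queue calls put(), and every other scheduler reads the same state back with get() / get_all() instead of each keeping its own copy; other backends (MySQL, redis, etc.) would just be alternative implementations of the same interface.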
15:47:57 #topic multiple active scheduler policies
15:48:26 I don't think we completely finished the discussion last week, were there any more issues anyone here is concerned about with this topic?
15:48:49 we've recently submitted a new patch
15:49:04 https://review.openstack.org/#/c/37407/
15:49:23 are you getting reviewers :-)
15:49:31 the implementation now is rather different from the original idea, but still..
15:49:55 we've got Russell's -2 :-)
15:50:20 well, that's attention anyway
15:50:32 which he didn't remove yet. hopefully the last patch addresses the main concerns, and we will be able to make progress.
15:50:44 how did the implementation change, in a nutshell
15:51:02 now we just specify scheduler options in flavor extra specs
15:51:32 no pre-defined configurations/policies in nova.conf, no association with aggregates..
15:51:56 so the user can select scheduler policies that are created by the administrator, I guess that's OK
15:52:40 so, for example, you can define a flavor with CoreFilter and cpu_allocation_ratio=2.0, and another one with a different set of parameters
15:53:03 (yes, "you" = admin)
15:53:34 glikson, these things will be visible to users, yes?
15:53:43 then, one can use the existing AggregateInstanceExtraSpec to map to aggregates
15:54:08 PaulMurray: we are suggesting not to return these properties for non-admin users.
15:54:18 ok, i see
15:54:30 glikson, how would you do that, flavors are visible to all
15:55:13 n0ano: in the API. the behavior is often different between admin and non-admin, so we added another "if"..
15:55:21 everyone sees the flavor
15:55:30 just some of the extra specs are not shown
15:55:40 if you are not an admin
15:55:46 There is an API extension which determines if the extra-spec values are available or not - is that enough?
15:55:48 extra extra specs :)
15:56:09 PhilDay: you mean, all the extra specs?
15:56:19 ugh, I don't really like that but I'm not going to argue against it, just seems wrong
15:56:54 Is there some drift going on here....
15:57:06 I mean, rather, are things being used in a way they were not meant to
15:57:21 ...in order to fit
15:57:23 I didn't say I liked it either :-) Just feels like extra specs might be now being pulled in many different ways
15:57:50 PaulMurray, well, extra specs have always been a catch-all but, with the scope applied to the keys, it sort works out OK
15:58:02 s/sort/sort of
15:58:19 we are adding a namespace to the new keys
15:58:34 glikson, you better :-)
15:58:53 sorry guys but we're coming up to the hour
15:58:56 I am not sure this is the ideal solution -- but this seemed to be the preferred approach from the feedback we received
15:59:16 So maybe the extra_spec API extension should be updated to allow the admin to define which namespaces are exposed - rather than it being all or nothing
15:59:41 I'd like to thank everyone and we'll talk again next week (feel free to email me with agenda suggestions)
15:59:53 PhilDay: interesting idea. hopefully our patch will not be blocked until it is implemented :-)
15:59:58 #endmeeting
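As a closing illustration of the per-flavor scheduler options discussed in the last topic, here is a hedged sketch of a filter that reads a namespaced extra spec from the flavor. The "scheduler:" namespace, the class name, and the key name are assumptions for illustration, not the ones used in review 37407.

```python
# Illustrative sketch only -- not the implementation in
# https://review.openstack.org/#/c/37407/.  A flavor created by the admin
# with, e.g., scheduler:cpu_allocation_ratio=2.0 in its extra specs would
# then be scheduled with that over-commit ratio instead of the default.
from nova.scheduler import filters


class SketchFlavorPolicyCoreFilter(filters.BaseHostFilter):
    """CoreFilter-style check driven by a namespaced flavor extra spec."""

    default_ratio = 16.0

    def host_passes(self, host_state, filter_properties):
        instance_type = filter_properties.get('instance_type') or {}
        extra_specs = instance_type.get('extra_specs') or {}
        ratio = float(extra_specs.get('scheduler:cpu_allocation_ratio',
                                      self.default_ratio))
        requested = instance_type.get('vcpus', 1)
        # Host passes if the requested vCPUs still fit under the
        # per-flavor over-commit limit.
        limit = host_state.vcpus_total * ratio
        return host_state.vcpus_used + requested <= limit
```

The admin could then expose different policies simply by setting different values on different flavors (for example with something like `nova flavor-key <flavor> set scheduler:cpu_allocation_ratio=2.0`), while the existing AggregateInstanceExtraSpec mapping mentioned above remains available for steering those flavors onto particular aggregates.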