14:00:28 #startmeeting nova_scheduler
14:00:29 Meeting started Mon Mar 27 14:00:28 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 The meeting name has been set to 'nova_scheduler'
14:00:37 o/
14:00:42 Good UGT morning! Who's here?
14:00:47 o/
14:00:51 hello Monday world
14:01:09 #link Agenda: https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:01:43 I know that cdent is on PTO today.
14:02:06 Hope jaypipes is around...
14:02:41 and bauzas :)
14:03:21 \o
14:03:36 edleafe: I'm not, unfortunately. I know I need to do reviews on traits stuff, and I will be spending 4 hours today doing those.
14:03:52 jaypipes: thanks
14:03:53 jaypipes: ok. I have a POC for the auto-import
14:04:06 #link autoimport: https://github.com/EdLeafe/autoimport
14:04:06 edleafe: awesome.
14:04:29 #topic Specs & Reviews
14:04:42 #link Traits series: https://review.openstack.org/#/c/376201/
14:04:47 alex_xu?
14:05:11 my colleague is working on the 'placement-manage' cli
14:05:32 #link https://review.openstack.org/#/c/450125/1
14:05:41 it is still WIP
14:05:52 two problems were found with that
14:06:19 alex_xu: any major blockers for your series?
14:06:32 first, that cmd wants to use the Trait object to create standard traits in the db
14:06:49 #link https://review.openstack.org/#/c/376199/28/nova/objects/resource_provider.py@1496
14:07:32 edleafe: ^ I probably need to remove that check from the obj layer, and move it into the api layer
14:08:05 alex_xu: I'm confused: I thought all standard traits were going to be in the os-traits module?
14:08:25 alex_xu: but yeah, that seems more like an API-level check
14:08:36 edleafe: yes, but we need to import all the standard traits from os-traits into the placement db
14:08:41 it is in the api already via json schema
14:09:05 mriedem: ah, yeah, I probably just need to remove that check
14:09:06 https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/resource_class.py#L33
14:10:02 alex_xu: what was the second problem?
14:10:02 second problem: do we want the placement-manage cmd to remove standard traits that have been removed from os-traits?
14:10:06 heh
14:11:07 I think once something is in os-traits, it's there for good. Removing it from the DB for a local modification might be OK, though
14:11:30 if yes, we need to take care of the case where the trait may already be associated with a specific resource provider
14:12:03 alex_xu: agreed. This would seem to be an ultra-low priority, though
14:12:15 Removing traits was never part of the main design
14:12:26 I thought we should return a fault for any trait associated with a resource provider. If the user still wants to remove it, they need to specify '--force'
14:13:10 Well, we should probably move the discussion to the review
14:13:21 so more people can comment
14:13:26 Moving on...
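(A minimal sketch to ground the trait-import discussion above: the placement-manage step alex_xu describes boils down to mirroring every standard trait constant from the os-traits module into the placement database, idempotently. This is an illustration, not the actual patch under review; the in-memory _db_traits set is a hypothetical stand-in for whatever the real Trait object/DB layer ends up providing.)

```python
import os_traits

# Hypothetical stand-in for the placement DB's trait table.
_db_traits = set()


def sync_standard_traits():
    # Mirror every standard trait from os-traits into the "DB".
    # Adding to a set is idempotent, so re-running is always safe.
    for name in dir(os_traits):
        value = getattr(os_traits, name)
        # Standard traits are uppercase string constants whose value
        # is the trait name itself, e.g. 'HW_CPU_X86_AVX2'.
        if name.isupper() and isinstance(value, str):
            _db_traits.add(value)
```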
14:13:30 #link os-traits reorg: https://review.openstack.org/#/c/448282/
14:13:36 edleafe: also agree removal is low priority; just thinking that if we implement it as above ^, we don't have an interface in the object layer to query which resource providers a trait is associated with
14:14:07 for the object layer to support such a query, there is still a WIP patch: https://review.openstack.org/#/c/429364/
14:14:08 jaypipes is breaking up os-traits from a single large file into a logical nesting of smaller files
14:14:36 There were issues with the design for importing those sub-packages
14:15:32 cdent had a POC, and I made another (linked above)
14:15:41 #link cdent POC: https://github.com/cdent/pony
14:16:04 yeah, just better than one single huge file
14:16:19 Nothing earth-shattering there; just trying to make computers do the boring repetitive stuff instead of humans
14:17:12 bauzas has an early WIP spec for making claims from placement:
14:17:13 #link WIP placement doing claims: https://review.openstack.org/437424
14:17:26 Comments there are always welcome.
14:18:03 #link Show sched. hints in server details: https://review.openstack.org/440580
14:18:18 There is some discussion as to whether this should be done
14:18:34 or keep scheduler hints an internal thing only
14:18:47 Nested Resource Provider series is pretty much on hold until traits is done
14:18:50 #link Nested RPs: https://review.openstack.org/#/c/415920/
14:19:08 Any other specs or reviews to discuss?
14:19:36 o/
14:19:53 i have re-proposed the nested RPs spec,
14:19:58 do we anticipate changes to that?
14:20:04 or should we just re-approve?
14:20:19 https://review.openstack.org/#/c/449381/
14:20:22 mriedem: I'd have to re-review
14:20:37 to make sure that the traits usage is current
14:20:38 if we don't expect changes, but it's just lower priority,
14:20:45 then we should still re-approve before p-1
14:20:50 * bauzas waves super-late (thanks DST)
14:21:03 ok, I'll look over that after the meeting
14:21:07 * macsz was late to add item for specs reviews
14:21:25 I saw a late addition just now:
14:21:26 Use local-scheduler spec
14:21:26 #link Add use-local-scheduler spec https://review.openstack.org/#/c/438936/
14:21:55 John started it, left me to finish it
14:21:56 johnthetubaguy: any comments on that?
14:22:03 macsz: ah!
14:22:04 that came up at the ptg,
14:22:07 like local conductor
14:22:32 so run nova-scheduler local to conductor
14:22:38 and don't require a separate nova-scheduler service
14:22:41 i think that's the gist
14:22:46 yeah, basically it is about dropping the scheduler process and moving its logic into the conductor
14:22:53 mriedem: yeah
14:23:21 ok, added to my growing list of tabs...
14:23:29 John had planned more in this spec
14:23:34 but we decided to split it up
14:23:49 and created two additional specs as follow-ups, but they're not scheduler related
14:23:55 the only issue I see with that is that we agreed to have a global scheduler for cellsv2 vs. local conductors for each cell
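(For context on the os-traits reorg and the two auto-import POCs linked above, cdent's pony and the autoimport repo: once the single large file is split into sub-packages, something has to hoist the trait constants back into the top-level namespace so existing callers keep working. A rough sketch of that idea follows, assuming os_traits becomes a package after the reorg; neither POC necessarily does it exactly this way.)

```python
import importlib
import pkgutil

import os_traits  # assumed here to be a package, post-reorg


def autoimport(package):
    # Walk every module under the package and hoist its uppercase
    # trait constants into the package namespace, so callers keep
    # writing os_traits.FOO no matter which sub-module FOO lives in.
    for _, modname, _ in pkgutil.walk_packages(
            package.__path__, prefix=package.__name__ + '.'):
        mod = importlib.import_module(modname)
        for attr in dir(mod):
            if attr.isupper():
                setattr(package, attr, getattr(mod, attr))


autoimport(os_traits)
```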
14:23:57 so i think we can skip it today
14:24:10 that sounds correct
14:24:36 it's all about making things simpler for operators
14:24:43 placement is the global scheduler now,
14:24:56 but yeah we still have n-sch global too
14:25:01 using host mappings
14:25:08 mriedem: not really, given we still need to look at filters
14:25:28 so, it could be merged with the super-conductor
14:25:38 not the local conductors, to be clear
14:25:43 right, placement is the key change here; there is no longer a benefit from the separate single nova-scheduler process (with active/passive HA, or whatever)
14:25:56 yes, it's an api cell thing still
14:26:00 johnthetubaguy: that's another point
14:26:10 because conductors are A/A
14:26:24 while schedulers are A/P
14:26:32 bauzas: that's not always true
14:26:46 it's required only for the caching scheduler
14:26:51 well
14:27:35 now the move to placement makes most of these reasons go away, as nova-scheduler is no longer "single threaded" in the way it once was (and that was a good thing)
14:27:52 johnthetubaguy: the problem is about the HostState
14:28:06 * alex_xu thought nova-scheduler is A/A...
14:28:11 I mean, HostState.consume_from_request()
14:28:36 but lemme provide my thoughts in the review
14:29:04 bauzas: yes, the idea is claims will eventually get rid of that
14:29:09 Good idea - let's all review that spec and add our comments
14:29:14 lastly, the point is about the scheduler fanout
14:29:19 to be clear, the current goal is to make running as part of conductor possible
14:29:25 deprecating the old way is a follow-on step
14:29:31 alex_xu: i don't think it's recommended to run more than one scheduler worker
14:29:45 mriedem: it's possible if you have good capacity
14:29:58 the main reason is what? collisions?
14:29:59 mriedem: but we don't recommend A/A, for example with Ironic
14:30:02 * johnthetubaguy points at stack vs spread
14:30:03 yup
14:30:08 we should doc this,
14:30:16 because i'm reminded of it frequently,
14:30:19 but forget what the reasons are
14:30:19 it's already documented AFAIR
14:30:31 * alex_xu needs to check the doc
14:30:38 but agreed with johnthetubaguy, possibly something we could fix with scheduler claims :)
14:30:59 as I said, my last concern is the scheduler fanout that computes do
14:31:13 for providing instances and aggregates knowing
14:31:18 to the scheduler
14:31:25 I thought we killed that? I guess it came back
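(A toy illustration, not actual nova code, of the HostState.consume_from_request() problem bauzas raises above: each scheduler process keeps its own in-memory view of a host and consumes resources locally, so two active/active schedulers can both believe the same nearly-full host has room. This is the window that claims, whether in the scheduler or in placement, are meant to close.)

```python
# Each scheduler worker holds its own private, in-memory host view.
class HostState:
    def __init__(self, free_ram_mb):
        self.free_ram_mb = free_ram_mb

    def consume_from_request(self, ram_mb):
        # Local bookkeeping only: no other scheduler process sees it.
        self.free_ram_mb -= ram_mb


scheduler_a = HostState(free_ram_mb=2048)  # A's view of host X
scheduler_b = HostState(free_ram_mb=2048)  # B's stale view of host X

scheduler_a.consume_from_request(2048)  # A fills host X...
scheduler_b.consume_from_request(2048)  # ...B, unaware, picks it too

print(scheduler_b.free_ram_mb)  # 0: B thought the host still had room
```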
14:31:27 but again, let's put that in the spec
14:31:51 johnthetubaguy: we added it somewhere around Juno/Liberty
14:31:58 so maybe we are a cycle early with the spec
14:32:09 bauzas: I thought it got killed soon after
14:32:11 but now we have placement, so we could be doing that using placement
14:32:33 johnthetubaguy: well, not that I know of
14:32:56 i have no idea what this means:
14:32:57 (9:30:59 AM) bauzas: as I said, my last concern is the scheduler fanout that computes do
14:32:57 (9:31:13 AM) bauzas: for providing instances and aggregates knowing
14:32:57 (9:31:18 AM) bauzas: to the scheduler
14:33:16 * alex_xu thought multiple schedulers are ok, and only a problem when resources starve
14:33:48 mriedem: talking about https://github.com/openstack/nova/blob/master/nova/scheduler/client/query.py#L41
14:33:48 yeah, I guess we added more stuff into that
14:33:54 alex_xu: if you stack, you are always resource starved
14:34:13 related to https://github.com/openstack/nova/blob/master/nova/scheduler/rpcapi.py#L134
14:34:27 macsz: sounds like we are a cycle too soon for that spec
14:34:38 but good to aim that way sooner rather than later
14:34:51 i added some godaddy ops guys,
14:34:54 johnthetubaguy: tbc, I'm agreeing with you
14:35:02 johnthetubaguy: ah, i got the point, but is the nova-scheduler stacking?
14:35:03 johnthetubaguy: for the direction
14:35:03 because the size of the conductor service has come up over time in the ops ml
14:35:12 alex_xu: depends on config
14:35:15 alex_xu: you can go both ways, there is a config option to choose
14:35:18 johnthetubaguy: we honestly don't need a separate RPC service for running filters
14:35:29 johnthetubaguy: well, better to start sooner than later :)
14:35:35 johnthetubaguy: I'm just trying to put down my thoughts explaining the tech debt
14:35:44 i.e. how to go from here to there :)
14:35:47 alex_xu: this is why i was saying i hope these conditions are all clearly documented somewhere, which i don't think they are
14:35:56 but I'm definitely +1 on the idea to merge the scheduler
14:36:02 mriedem: it's fair that we have a general process-size issue
14:36:11 mriedem: +1 for better dev-focused docs on this
14:36:54 yeah, I always thought I was clear about all of that; now, I think not :(
14:37:25 bauzas: if you can find any existing docs or guidelines about when you can or shouldn't run multiple schedulers, that'd be helpful,
14:37:30 if you can't find that, we should doc it
14:37:49 superdan pointed out something to me last week wrt multiple schedulers and using ironic, and why you can't,
14:37:50 mriedem: https://docs.openstack.org/developer/nova/scheduler_evolution.html#parallelism-and-concurrency
14:37:54 something with the hacks we were talking about last week
14:38:26 mriedem: which points to http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html
14:38:34 alex_xu: ^ worth reading
14:38:35 but if i'm not using NUMA or i'm using spread vs pack, then...
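(On the 'stack vs spread' point johnthetubaguy raises: nova's RAM weigher multiplies a host's free RAM by the ram_weight_multiplier option, so a positive value spreads instances toward emptier hosts while a negative value stacks them onto fuller ones, which is why a stacking deployment is always running near resource starvation. A simplified sketch of that ranking; the real weigher also normalizes weights.)

```python
def weigh_hosts(hosts, ram_weight_multiplier):
    # hosts: list of (hostname, free_ram_mb); highest weight wins.
    return sorted(hosts,
                  key=lambda host: host[1] * ram_weight_multiplier,
                  reverse=True)


hosts = [('mostly-full', 512), ('mostly-empty', 8192)]
print(weigh_hosts(hosts, 1.0))   # spread: mostly-empty ranked first
print(weigh_hosts(hosts, -1.0))  # stack: mostly-full ranked first
```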
14:38:41 bauzas: thanks
14:38:54 or i'm using ironic,
14:38:57 we could expand that doc a bit
14:39:20 mriedem: of course, we can document that further
14:39:23 ideally in the ops guide
14:39:41 running multiple schedulers can be acceptable *if you know the limitations*
14:39:53 yeah, i don't know the limitations :)
14:39:59 hence why i'm asking for docs
14:40:02 but anyway
14:40:06 a large cloud with good capacity can reasonably run multiple schedulers
14:40:09 i'll take a todo to sort through the docs
14:40:19 a small cloud using NUMA or Ironic can't :)
14:40:21 i also wonder how valid http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html is anymore
14:40:32 mriedem: it was written pre-placement
14:40:45 that's what i mean,
14:40:45 and I guess jaypipes never revisited it
14:40:48 lots of this is probably old
14:41:02 well, there are still some ideas that are valuable
14:41:22 if we consider placement as not having feature parity with the scheduler/conductor, which is what I think
14:41:37 placement is good for getting a list of hosts
14:41:46 but then we could still have filters/weighers
14:42:05 using the conductor, if we merge johnthetubaguy's spec mid-term, which I agree with
14:42:16 anyway
14:42:25 i've got a todo written down,
14:42:27 i'll bug people later
14:42:32 just saying those are the docs describing the current problems with the scheduler
14:43:05 sounds like time to refresh something
14:43:08 I still need to log my comments on johnthetubaguy's spec :)
14:43:14 could help us with the claims discussions
14:43:23 it's macsz's spec now
14:43:53 like I say, it sounds like something for next cycle, or something that needs more work to make possible
14:43:53 yeah, hence me pushing for scheduler claims, not placement claims
14:43:55 or both
14:44:14 the scheduler using placement for that tho
14:44:16 bauzas: I have lost track of the claims debates myself
14:44:44 johnthetubaguy: the main debate was about why we should merge that *before* the placement cut
14:44:50 johnthetubaguy: it's here: https://review.openstack.org/437424
14:45:39 I was long opposed to the idea of scheduler claims, but now that I see placement, I think it could be nice to use placement for scheduler claims
14:46:13 so I've changed my opinion
14:46:22 anyway
14:46:25 15 mins to the end
14:46:28 Let's move on
14:46:30 and I'm digressing
14:46:35 #topic Bugs
14:46:43 Didn't see any new ones.
14:46:54 Anything to discuss about bugs?
14:47:14 didn't see anything worth noting either, but i just started my day :)
14:47:22 ok then
14:47:28 #topic Open discussion
14:47:45 What's on your mind?
14:48:46 * edleafe only hears crickets
14:49:06 Guess that's a wrap. Back to work/sleep/fun!
14:49:09 #endmeeting