14:00:28 <edleafe> #startmeeting nova_scheduler
14:00:29 <openstack> Meeting started Mon Mar 27 14:00:28 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:37 <mriedem> o/
14:00:42 <edleafe> Good UGT morning! Who's here?
14:00:47 <alex_xu> o/
14:00:51 <macsz> hello Monday world
14:01:09 <edleafe> #link Agenda: https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:01:43 <edleafe> I know that cdent is on PTO today.
14:02:06 <edleafe> Hope jaypipes is around...
14:02:41 <alex_xu> and bauzas :)
14:03:21 <jroll> \o
14:03:36 <jaypipes> edleafe: I'm not, unfortunately. I know I need to do reviews on traits stuff, and I will be spending 4 hours today doing those.
14:03:52 <alex_xu> jaypipes: thanks
14:03:53 <edleafe> jaypipes: ok. I have a POC for the auto-import
14:04:06 <edleafe> #link autoimport: https://github.com/EdLeafe/autoimport
14:04:06 <jaypipes> edleafe: awesome.
14:04:29 <edleafe> #topic Specs & Reviews
14:04:42 <edleafe> #link Traits series: https://review.openstack.org/#/c/376201/
14:04:47 <edleafe> alex_xu?
14:05:11 <alex_xu> my colleague is working on the 'placement-manage' CLI
14:05:32 <alex_xu> #link https://review.openstack.org/#/c/450125/1
14:05:41 <alex_xu> it is still a WIP
14:05:52 <alex_xu> two problems were found with it
14:06:19 <edleafe> alex_xu: any major blocks for your series?
14:06:32 <alex_xu> first, that cmd wants to use the Trait object to create standard traits in the db
14:06:49 <alex_xu> #link https://review.openstack.org/#/c/376199/28/nova/objects/resource_provider.py@1496
14:07:32 <alex_xu> edleafe: ^ I probably need to remove that check from the obj layer and move it into the api layer
14:08:05 <edleafe> alex_xu: I'm confused: I thought all standard traits were going to be in the os-traits module?
14:08:25 <edleafe> alex_xu: but yeah, that seems more like an API-level check
14:08:36 <alex_xu> edleafe: yes, but we need to import all the standard traits from os-traits into the placement db
14:08:41 <mriedem> it is in the api already via json schema
14:09:05 <alex_xu> mriedem: ah, yeah, I probably just need to remove that check
14:09:06 <mriedem> https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/resource_class.py#L33
14:10:02 <edleafe> alex_xu: what was the second problem?
14:10:02 <alex_xu> second problem: do we want the placement-manage cmd to remove standard traits that have been removed from os-traits?
14:10:06 <edleafe> heh
14:11:07 <edleafe> I think once something is in os-traits, it's there for good. Removing it from the DB for a local modification might be OK, though
14:11:30 <alex_xu> if yes, we need to handle the case where the trait may already be associated with a specific resource provider
14:12:03 <edleafe> alex_xu: agreed. This would seem to be an ultra-low priority, though
14:12:15 <edleafe> Removing traits was never part of the main design
14:12:26 <alex_xu> I thought we should return a fault for any trait associated with a resource provider. If the user still wants to remove it, they need to specify '--force'
14:13:10 <edleafe> Well, we should probably move the discussion to the review
14:13:21 <edleafe> so more people can comment
14:13:26 <edleafe> Moving on...
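
[Editor's note: below is a minimal Python sketch of the trait-sync step alex_xu describes above, importing the standard traits from os-traits into the placement DB. It is not the placement-manage code under review; it assumes os_traits exposes a get_traits() helper returning the standard trait strings, and ensure_trait() plus the in-memory "table" are hypothetical stand-ins for the real DB write.]

    import os_traits

    existing = set()  # stand-in for the placement DB's traits table

    def ensure_trait(name):
        # hypothetical helper: an idempotent insert; a real implementation
        # would issue an INSERT that ignores already-present rows
        existing.add(name)

    # assumed helper: get_traits() yields every standard trait string
    for name in os_traits.get_traits():
        ensure_trait(name)
    print("synced %d standard traits" % len(existing))
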
14:13:30 <edleafe> #link os-traits reorg: https://review.openstack.org/#/c/448282/
14:13:36 <alex_xu> edleafe: also agreed that removal is low priority; just thinking that if we implement it as above ^, we don't have an interface in the object layer to query which resource providers a trait is associated with
14:14:07 <alex_xu> object-layer support for such a query is still a WIP patch https://review.openstack.org/#/c/429364/
14:14:08 <edleafe> jaypipes is breaking up os-traits from a single large file into a logical nesting of smaller files
14:14:36 <edleafe> There were issues with the design for importing those sub-packages
14:15:32 <edleafe> cdent had a POC, and I made another (linked above)
14:15:41 <edleafe> #link cdent POC: https://github.com/cdent/pony
14:16:04 <alex_xu> yeah, anything is better than one single huge file
14:16:19 <edleafe> Nothing earth-shattering there; just trying to make computers do the boring repetitive stuff instead of humans
14:17:12 <edleafe> bauzas has an early WIP spec for making claims from placement:
14:17:13 <edleafe> #link WIP placement doing claims: https://review.openstack.org/437424
14:17:26 <edleafe> Comments there are always welcome.
14:18:03 <edleafe> #link Show sched. hints in server details: https://review.openstack.org/440580
14:18:18 <edleafe> There is some discussion as to whether this should be done
14:18:34 <edleafe> or keep scheduler hints an internal thing only
14:18:47 <edleafe> The Nested Resource Provider series is pretty much on hold until traits is done
14:18:50 <edleafe> #link Nested RPs: https://review.openstack.org/#/c/415920/
14:19:08 <edleafe> Any other specs or reviews to discuss?
14:19:36 <diga> o/
14:19:53 <mriedem> i have re-proposed the nested RPs spec,
14:19:58 <mriedem> do we anticipate changes to that?
14:20:04 <mriedem> or should we just re-approve?
14:20:19 <mriedem> https://review.openstack.org/#/c/449381/
14:20:22 <edleafe> mriedem: I'd have to re-review
14:20:37 <edleafe> to make sure that the traits usage is current
14:20:38 <mriedem> if we don't expect changes, but it's just lower priority,
14:20:45 <mriedem> then we should still re-approve before p-1
14:20:50 * bauzas waves super-late (thanks DST)
14:21:03 <edleafe> ok, I'll look over that after the meeting
14:21:07 * macsz was late to add an item to the specs reviews
14:21:25 <edleafe> I saw a late addition just now:
14:21:26 <edleafe> Use local-scheduler spec
14:21:26 <edleafe> #link Add use-local-scheduler spec https://review.openstack.org/#/c/438936/
14:21:55 <macsz> John started it and left me to finish it
14:21:56 <edleafe> johnthetubaguy: any comments on that?
14:22:03 <edleafe> macsz: ah!
14:22:04 <mriedem> that came up at the ptg,
14:22:07 <mriedem> like local conductor
14:22:32 <mriedem> so run nova-scheduler local to conductor
14:22:38 <mriedem> and don't require a separate nova-scheduler service
14:22:41 <mriedem> i think that's the gist
14:22:46 <macsz> yeah, basically it is about dropping the scheduler process and moving its logic into the conductor
14:22:53 <macsz> mriedem: yeah
14:23:21 <edleafe> ok, added to my growing list of tabs...
14:23:29 <macsz> John had planned more in this spec
14:23:34 <macsz> but we decided to split it up
14:23:49 <macsz> and created two additional specs as follow-ups, but they're not scheduler related
14:23:55 <bauzas> the only issue I see with that is that we agreed to have a global scheduler for cellsv2 vs. local conductors for each cell
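
[Editor's note: a minimal sketch of the auto-import idea behind the two POCs linked earlier in this topic (EdLeafe/autoimport and cdent/pony), not the code from either repo. Dropped into a package's __init__.py, it walks the sub-packages created by the os-traits reorg and hoists their ALL_CAPS trait constants back into the top-level namespace, so flat attribute access keeps working. The function name and the constant-naming convention are the editor's assumptions.]

    import importlib
    import pkgutil
    import sys

    def hoist_constants(package_name):
        """Import every submodule and re-export its ALL_CAPS constants."""
        pkg = sys.modules[package_name]
        prefix = package_name + '.'
        for _, mod_name, _ in pkgutil.walk_packages(pkg.__path__, prefix):
            mod = importlib.import_module(mod_name)
            for attr in dir(mod):
                if attr.isupper():  # trait constants are ALL_CAPS by convention
                    setattr(pkg, attr, getattr(mod, attr))

    # called from the package's own __init__.py as: hoist_constants(__name__)
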
14:23:57 <macsz> so i think we can skip it today
14:24:10 <johnthetubaguy> that sounds correct
14:24:36 <johnthetubaguy> it's all about making things simpler for operators
14:24:43 <mriedem> placement is the global scheduler now,
14:24:56 <mriedem> but yeah we still have n-sch global too
14:25:01 <mriedem> using host mappings
14:25:08 <bauzas> mriedem: not really, given we still need to look at filters
14:25:28 <bauzas> so, it could be merged with the super-conductor
14:25:38 <bauzas> not the local conductors, to make it clear
14:25:43 <johnthetubaguy> right, placement is the key change here; there is no longer a benefit from the separate single nova-scheduler process (with active/passive HA, or whatever)
14:25:56 <johnthetubaguy> yes, it's an api cell thing still
14:26:00 <bauzas> johnthetubaguy: that's another point
14:26:10 <bauzas> because conductors are A/A
14:26:24 <bauzas> while schedulers are A/P
14:26:32 <johnthetubaguy> bauzas: that's not always true
14:26:46 <johnthetubaguy> it's required only for the caching scheduler
14:26:51 <bauzas> well
14:27:35 <johnthetubaguy> now the move to placement makes most of these reasons go away, as nova-scheduler is no longer "single threaded" in the way it once was (and that was a good thing)
14:27:52 <bauzas> johnthetubaguy: the problem is about the HostState
14:28:06 * alex_xu thought nova-scheduler is A/A...
14:28:11 <bauzas> I mean, HostState.consume_from_request()
14:28:36 <bauzas> but lemme provide my thoughts in the review
14:29:04 <johnthetubaguy> bauzas: yes, the idea is claims will eventually get rid of that
14:29:09 <edleafe> Good idea - let's all review that spec and add our comments
14:29:14 <bauzas> lastly, the point is about the scheduler fanout
14:29:19 <johnthetubaguy> to be clear, the current goal is to make running as part of conductor possible
14:29:25 <johnthetubaguy> deprecating the old way is a follow-on step
14:29:31 <mriedem> alex_xu: i don't think it's recommended to run more than one scheduler worker
14:29:45 <bauzas> mriedem: it's possible if you have good capacity
14:29:58 <mriedem> the main reason is what? collisions?
14:29:59 <bauzas> mriedem: but we don't recommend A/A, for example with Ironic
14:30:02 * johnthetubaguy points at stack vs spread
14:30:03 <bauzas> yup
14:30:08 <mriedem> we should doc this,
14:30:16 <mriedem> because i'm reminded of it frequently,
14:30:19 <mriedem> but forget what the reasons are
14:30:19 <bauzas> it's already documented AFAIR
14:30:31 * alex_xu needs to check the doc
14:30:38 <bauzas> but agreed with johnthetubaguy, possibly something we could fix via scheduler claims :)
14:30:59 <bauzas> as I said, my last concern is the scheduler fanout that computes do
14:31:13 <bauzas> for providing instances and aggregates knowing
14:31:18 <bauzas> to the scheduler
14:31:25 <johnthetubaguy> I thought we killed that? I guess it came back
14:31:27 <bauzas> but again, let's put that in the spec
14:31:51 <bauzas> johnthetubaguy: we added it somewhere around Juno/Liberty
14:31:58 <johnthetubaguy> so maybe we are a cycle early with the spec
14:32:09 <johnthetubaguy> bauzas: I thought it got killed soon after
14:32:11 <bauzas> but now we have placement, so we could be doing that using placement
14:32:33 <bauzas> johnthetubaguy: well, not that I know of
14:32:56 <mriedem> i have no idea what this means:
14:32:57 <mriedem> (9:30:59 AM) bauzas: as I said, my last concern is the scheduler fanout that computes do
14:32:57 <mriedem> (9:31:13 AM) bauzas: for providing instances and aggregates knowing
14:32:57 <mriedem> (9:31:18 AM) bauzas: to the scheduler
14:33:16 * alex_xu thought multiple schedulers are ok, and only a problem when resources are starved
14:33:48 <bauzas> mriedem: talking about https://github.com/openstack/nova/blob/master/nova/scheduler/client/query.py#L41
14:33:48 <johnthetubaguy> yeah, I guess we added more stuff into that
14:33:54 <johnthetubaguy> alex_xu: if you stack, you are always resource starved
14:34:13 <bauzas> related to https://github.com/openstack/nova/blob/master/nova/scheduler/rpcapi.py#L134
14:34:27 <johnthetubaguy> macsz: sounds like we are a cycle too soon for that spec
14:34:38 <johnthetubaguy> but good to aim that way sooner rather than later
14:34:51 <mriedem> i added some godaddy ops guys,
14:34:54 <bauzas> johnthetubaguy: tbc, I'm agreeing with you
14:35:02 <alex_xu> johnthetubaguy: ah, i got the point, but is nova-scheduler set to stack?
14:35:03 <bauzas> johnthetubaguy: on the direction
14:35:03 <mriedem> because the size of the conductor service has come up over time in the ops ml
14:35:12 <mriedem> alex_xu: depends on config
14:35:15 <johnthetubaguy> alex_xu: you can go both ways, there is a config option to choose
14:35:18 <bauzas> johnthetubaguy: we honestly don't need a separate RPC service for running filters
14:35:29 <macsz> johnthetubaguy: well, better to start sooner than later :)
14:35:35 <bauzas> johnthetubaguy: I'm just trying to put down my thoughts to explain the tech debt
14:35:44 <bauzas> ie. how to go from here to there :)
14:35:47 <mriedem> alex_xu: this is why i was saying i hope these conditions are all clearly documented somewhere, which i don't think they are
14:35:56 <bauzas> but I'm definitely +1 on the idea to merge the scheduler
14:36:02 <johnthetubaguy> mriedem: it's fair we have a general process size issue
14:36:11 <johnthetubaguy> mriedem: +1 for better dev-focused docs on this
14:36:54 <alex_xu> yeah, I always thought I was clear about all of that; now I think not :(
14:37:25 <mriedem> bauzas: if you can find any existing docs with guidelines about when you can or shouldn't run multiple schedulers, that'd be helpful,
14:37:30 <mriedem> if you can't find that, we should doc it
14:37:49 <mriedem> superdan pointed out something to me last week wrt multiple schedulers and using ironic, and why you can't,
14:37:50 <bauzas> mriedem: https://docs.openstack.org/developer/nova/scheduler_evolution.html#parallelism-and-concurrency
14:37:54 <mriedem> something with the hacks we were talking about last week
14:38:26 <bauzas> mriedem: which points to http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html
14:38:34 <bauzas> alex_xu: ^ worth reading
14:38:35 <mriedem> but if i'm not using NUMA or i'm using spread vs pack, then...
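
[Editor's note: a toy Python illustration (not nova code) of the limitation being discussed. Each filter scheduler consumes resources from its own in-memory HostState view via consume_from_request(), so two schedulers running in parallel can both place a workload on the same nearly-full host, and the collision only surfaces as a retry on the compute node. The helper and numbers below are invented for the example.]

    def schedule(request_mb, view):
        # pick the host with the most free RAM ("spread"), then consume from
        # the local snapshot, roughly what HostState.consume_from_request() does
        host = max(view, key=view.get)
        if view[host] >= request_mb:
            view[host] -= request_mb
            return host
        return None

    reported_free = {'host1': 4096, 'host2': 1024}  # MB free at the last sync
    view_a = dict(reported_free)  # scheduler A's private snapshot
    view_b = dict(reported_free)  # scheduler B's private snapshot
    print(schedule(3072, view_a))  # host1
    print(schedule(3072, view_b))  # host1 again: 6144 MB landed on a 4096 MB host
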
14:38:41 <alex_xu> bauzas: thanks
14:38:54 <mriedem> or i'm using ironic,
14:38:57 <mriedem> we could expand that doc a bit
14:39:20 <bauzas> mriedem: of course, we can document that further
14:39:23 <bauzas> ideally in the ops guide
14:39:41 <bauzas> running multiple schedulers can be acceptable *if you know the limitations*
14:39:53 <mriedem> yeah, i don't know the limitations :)
14:39:59 <mriedem> hence why i'm asking for docs
14:40:02 <mriedem> but anyway
14:40:06 <bauzas> a large cloud with good capacity can reasonably run multiple schedulers
14:40:09 <mriedem> i'll take a todo to sort through the docs
14:40:19 <bauzas> a small cloud using NUMA or Ironic can't :)
14:40:21 <mriedem> i also wonder how valid http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html is anymore
14:40:32 <bauzas> mriedem: it was written pre-placement
14:40:45 <mriedem> that's what i mean,
14:40:45 <bauzas> and I guess jaypipes never revisited it
14:40:48 <mriedem> lots of this is probably old
14:41:02 <bauzas> well, there are still some ideas there that are valuable
14:41:22 <bauzas> if we consider placement as not being at feature parity with scheduler/conductor, which is what I think
14:41:37 <bauzas> placement is good for getting a list of hosts
14:41:46 <bauzas> but then we could still have filters/weighers
14:42:05 <bauzas> run by the conductor if we merge johnthetubaguy's spec mid-term, which I agree with
14:42:16 <bauzas> anyway
14:42:25 <mriedem> i've got a todo written down,
14:42:27 <mriedem> i'll bug people later
14:42:32 <bauzas> just saying those are the docs describing the current problems with the scheduler
14:43:05 <johnthetubaguy> sounds like time to refresh something
14:43:08 <bauzas> I still need to log my comments on johnthetubaguy's spec :)
14:43:14 <johnthetubaguy> could help us with the claims discussions
14:43:23 <johnthetubaguy> it's macsz's spec now
14:43:53 <johnthetubaguy> like I say, sounds like something for next cycle, or something we need more work on to make possible
14:43:53 <bauzas> yeah, hence me pushing for scheduler claims, not placement claims
14:43:55 <johnthetubaguy> or both
14:44:14 <bauzas> the scheduler using placement for that, though
14:44:16 <johnthetubaguy> bauzas: I have lost track of the claims debates myself
14:44:44 <bauzas> johnthetubaguy: the main debate was about why we should merge that *before* the placement cut
14:44:50 <edleafe> johnthetubaguy: it's here: https://review.openstack.org/437424
14:45:39 <bauzas> I was long opposed to the idea of scheduler claims, but now that I see placement, I think it could be nice to use placement for scheduler claims
14:46:13 <bauzas> so I changed my opinion
14:46:22 <bauzas> anyway
14:46:25 <bauzas> 15 mins to the end
14:46:28 <edleafe> Let's move on
14:46:30 <bauzas> and I'm digressing
14:46:35 <edleafe> #topic Bugs
14:46:43 <edleafe> Didn't see any new ones.
14:46:54 <edleafe> Anything to discuss about bugs?
14:47:14 <macsz> didn't see anything worth discussing either, but i just started my day :)
14:47:22 <edleafe> ok then
14:47:28 <edleafe> #topic Open discussion
14:47:45 <edleafe> What's on your mind?
14:48:46 * edleafe only hears crickets
14:49:06 <edleafe> Guess that's a wrap. Back to work/sleep/fun!
14:49:09 <edleafe> #endmeeting