14:00:28 <edleafe> #startmeeting nova_scheduler
14:00:29 <openstack> Meeting started Mon Mar 27 14:00:28 2017 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:30 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:37 <mriedem> o/
14:00:42 <edleafe> Good UGT morning! Who's here?
14:00:47 <alex_xu> o/
14:00:51 <macsz> hello Monday world
14:01:09 <edleafe> #link Agenda: https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:01:43 <edleafe> I know that cdent is on PTO today.
14:02:06 <edleafe> Hope jaypipes is around...
14:02:41 <alex_xu> and bauzas :)
14:03:21 <jroll> \o
14:03:36 <jaypipes> edleafe: I'm not unfortunately. I know I need to do reviews on traits stuff, and I will be spending 4 hours today doing those.
14:03:52 <alex_xu> jaypipes: thanks
14:03:53 <edleafe> jaypipes: ok. I have a POC for the auto-import
14:04:06 <edleafe> #link autoimport: https://github.com/EdLeafe/autoimport
14:04:06 <jaypipes> edleafe: awesome.
14:04:29 <edleafe> #topic Specs & Reviews
14:04:42 <edleafe> #link Traits series: https://review.openstack.org/#/c/376201/
14:04:47 <edleafe> alex_xu?
14:05:11 <alex_xu> my colleague is working on the 'placement-manage' CLI
14:05:32 <alex_xu> #link https://review.openstack.org/#/c/450125/1
14:05:41 <alex_xu> it is still in WIP
14:05:52 <alex_xu> two problems found for that
14:06:19 <edleafe> alex_xu: any major blockers for your series?
14:06:32 <alex_xu> first, that cmd wants to use the Trait object to create standard traits in the db
14:06:49 <alex_xu> #link https://review.openstack.org/#/c/376199/28/nova/objects/resource_provider.py@1496
14:07:32 <alex_xu> edleafe: ^ I probably need to remove that check from the obj layer, and move it into the api layer
14:08:05 <edleafe> alex_xu: I'm confused: I thought all standard traits were going to be in the os-traits module?
14:08:25 <edleafe> alex_xu: but yeah, that seems more like an API-level check
14:08:36 <alex_xu> edleafe: yes, but we need to import all the standard traits from os-traits into the placement db
14:08:41 <mriedem> it is in the api already via json schema
14:09:05 <alex_xu> mriedem: ah, yea, I probably just need to remove that check
14:09:06 <mriedem> https://github.com/openstack/nova/blob/master/nova/api/openstack/placement/handlers/resource_class.py#L33
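For context on the first problem: the object-layer check being discussed rejects any trait name without the CUSTOM_ prefix, while placement-manage needs to seed exactly those standard names. Below is a minimal sketch of the intended seeding, assuming the Trait object from the series under review and that the prefix check has moved out of the object layer; it is illustrative, not the actual patch.

```python
# Illustrative sketch only -- not the patch under review. Seed every
# standard trait defined in the os-traits module into the placement DB,
# assuming the Trait object from the traits series, and assuming its
# create() no longer rejects non-CUSTOM_ names at the object layer.
import os_traits

from nova.objects import resource_provider as rp_obj


def sync_standard_traits(ctxt):
    # os-traits exposes the standard traits as UPPERCASE module constants
    names = [getattr(os_traits, sym) for sym in dir(os_traits)
             if sym.isupper() and isinstance(getattr(os_traits, sym), str)]
    for name in names:
        try:
            rp_obj.Trait(ctxt, name=name).create()
        except Exception:
            # the real code would catch a specific "already exists" error
            pass
```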
14:10:02 <edleafe> alex_xu: what was the second problem?
14:10:02 <alex_xu> second problem: do we want the placement-manage cmd to also remove standard traits which were removed from os-traits?
14:10:06 <edleafe> heh
14:11:07 <edleafe> I think once something is in os-traits, it's there for good. Removing it from the DB for a local modification might be OK, though
14:11:30 <alex_xu> if yes, we need to handle the case where a trait may already be associated with a specific resource provider
14:12:03 <edleafe> alex_xu: agreed. This would seem to be an ultra-low priority, though
14:12:15 <edleafe> Removing traits was never part of the main design
14:12:26 <alex_xu> I thought we should return a fault for any trait associated with a resource provider. If the user still wants to remove it, they need to specify '--force'
14:13:10 <edleafe> Well, we should probably move the discussion to the review
14:13:21 <edleafe> so more people can comment
14:13:26 <edleafe> Moving on...
14:13:30 <edleafe> #link os-traits reorg: https://review.openstack.org/#/c/448282/
14:13:36 <alex_xu> edleafe: also agree removal is low priority; just thinking that if we implement it as above ^, we don't have an interface in the object layer to query which resource providers a trait is associated with
14:14:07 <alex_xu> the object-layer support for such a query is still a WIP patch: https://review.openstack.org/#/c/429364/
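A rough sketch of the '--force' semantics alex_xu describes; the helper name and exception here are invented for illustration, and the provider query is exactly the missing object-layer piece the WIP patch above would add.

```python
# Hypothetical sketch of the proposed delete semantics -- names are
# invented for illustration, not taken from the patches above.
class TraitInUse(Exception):
    """Raised when a trait is still associated with resource providers."""


def delete_trait(ctxt, trait, force=False):
    # get_providers_with_trait is the object-layer query the WIP patch
    # above would add; its name and signature are assumptions.
    providers = get_providers_with_trait(ctxt, trait)
    if providers and not force:
        raise TraitInUse('%s is used by %d resource provider(s); pass '
                         '--force to remove it anyway'
                         % (trait.name, len(providers)))
    trait.destroy()  # assumes the Trait object grows a destroy()
```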
14:14:08 <edleafe> jaypipes is breaking up os-traits from a single large file to a logical nesting of smaller files
14:14:36 <edleafe> There were issues with the design for importing those sub-packages
14:15:32 <edleafe> cdent had a POC, and I made another (linked above)
14:15:41 <edleafe> #link cdent POC: https://github.com/cdent/pony
14:16:04 <alex_xu> yea, just better than one single huge file
14:16:19 <edleafe> Nothing earth-shattering there; just trying to make computers do the boring repetitive stuff instead of humans
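The gist of both POCs, sketched from scratch here rather than copied from either repo: at import time, walk the package's submodules and hoist their UPPERCASE trait constants to the top level.

```python
# A from-scratch sketch of the auto-import idea (not copied from either
# POC): recursively import every submodule of an os_traits-style package
# and hoist its UPPERCASE trait constants into the top-level namespace.
import importlib
import pkgutil


def autoload(package):
    for _, modname, _ in pkgutil.walk_packages(path=package.__path__,
                                               prefix=package.__name__ + '.'):
        mod = importlib.import_module(modname)
        for sym in dir(mod):
            if sym.isupper():
                setattr(package, sym, getattr(mod, sym))
```

The package's own __init__.py would call something like autoload(sys.modules[__name__]) so that references such as os_traits.HW_CPU_X86_SSE keep working after the split into sub-packages.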
14:17:12 <edleafe> bauzas has an early WIP spec for making claims from placement:
14:17:13 <edleafe> #link WIP placement doing claims: https://review.openstack.org/437424
14:17:26 <edleafe> Comments there are always welcome.
14:18:03 <edleafe> #link Show sched. hints in server details: https://review.openstack.org/440580
14:18:18 <edleafe> There is some discussion as to whether this should be done
14:18:34 <edleafe> or keep scheduler hints an internal thing only
14:18:47 <edleafe> The Nested Resource Provider series is pretty much on hold until traits is done
14:18:50 <edleafe> #link Nested RPs: https://review.openstack.org/#/c/415920/
14:19:08 <edleafe> Any other specs or reviews to discuss?
14:19:36 <diga> o/
14:19:53 <mriedem> i have re-proposed the nested RPs spec,
14:19:58 <mriedem> do we anticipate changes to that?
14:20:04 <mriedem> or should we just re-approve?
14:20:19 <mriedem> https://review.openstack.org/#/c/449381/
14:20:22 <edleafe> mriedem: I'd have to re-review
14:20:37 <edleafe> to make sure that the traits usage is current
14:20:38 <mriedem> if we don't expect changes, but it's just lower priority,
14:20:45 <mriedem> then we should still re-approve before p-1
14:20:50 * bauzas waves super-late (thanks DST)
14:21:03 <edleafe> ok, I'll look over that after the meeting
* macsz was late to add an item for spec reviews
14:21:25 <edleafe> I saw a late addition just now:
14:21:26 <edleafe> Use local-scheduler spec
14:21:26 <edleafe> #link Add use-local-scheduler spec https://review.openstack.org/#/c/438936/
14:21:55 <macsz> John started it, left me to finish it
14:21:56 <edleafe> johnthetubaguy: any comments on that?
14:22:03 <edleafe> macsz: ah!
14:22:04 <mriedem> that came up at the ptg,
14:22:07 <mriedem> like local conductor
14:22:32 <mriedem> so run nova-scheduler local to conductor
14:22:38 <mriedem> and don't require a separate nova-scheduler service
14:22:41 <mriedem> i think is the gist
14:22:46 <macsz> yeah, basically it is about dropping the scheduler process and moving its logic into the conductor
14:22:53 <macsz> mriedem: yeah
14:23:21 <edleafe> ok, added to my growing list of tabs...
14:23:29 <macsz> John had planned more in this spec
14:23:34 <macsz> but we decided to split it up
14:23:49 <macsz> and created two additional specs as follow-ups, but they're not scheduler related
14:23:55 <bauzas> the only issue I see with that is that we agreed to have a global scheduler for cellsv2 vs. local conductors for each cell
14:23:57 <macsz> so i think we can skip it today
14:24:10 <johnthetubaguy> that sounds correct
14:24:36 <johnthetubaguy> it's all about making things simpler for operators
14:24:43 <mriedem> placement is the global scheduler now,
14:24:56 <mriedem> but yeah we still have n-sch global too
14:25:01 <mriedem> using host mappings
14:25:08 <bauzas> mriedem: not really given we still need to look at filters
14:25:28 <bauzas> so, it could be merged with the super-conductor
14:25:38 <bauzas> not the local conductors to make it clear
14:25:43 <johnthetubaguy> right, placement is the key change here, there is no longer a benefit from the separate single nova-scheduler process (with active/passive HA, or whatever)
14:25:56 <johnthetubaguy> yes, it's an api cell thing still
14:26:00 <bauzas> johnthetubaguy: that's another point
14:26:10 <bauzas> because conductors are A/A
14:26:24 <bauzas> while schedulers are A/P
14:26:32 <johnthetubaguy> bauzas: thats not always true
14:26:46 <johnthetubaguy> it's required only for the caching scheduler
14:26:51 <bauzas> well
14:27:35 <johnthetubaguy> now the move to placement makes most of these reasons go away, as nova-scheduler is no longer "single threaded" in the way it once was (and that was a good thing)
14:27:52 <bauzas> johnthetubaguy: the problem is about the HostState
14:28:06 * alex_xu thought nova-scheduler is A/A...
14:28:11 <bauzas> I mean, HostState.consume_from_request()
14:28:36 <bauzas> but lemme provide my thoughts in the review
14:29:04 <johnthetubaguy> bauzas: yes, the idea is claims will eventually get rid of that
14:29:09 <edleafe> Good idea - let's all review that spec and add our comments
14:29:14 <bauzas> lastly, the point is about the scheduler fanout
14:29:19 <johnthetubaguy> to be clear, the current goal is to make running as part of conductor possible
14:29:25 <johnthetubaguy> deprecating the old way is a follow on step
14:29:31 <mriedem> alex_xu: i don't think it's recommended to run more than one scheduler worker
14:29:45 <bauzas> mriedem: it's possible if you have good capacity
14:29:58 <mriedem> the main reason is what? collisions?
14:29:59 <bauzas> mriedem: but we don't recommend A/A for example with Ironic
14:30:02 * johnthetubaguy points at stack vs spread
14:30:03 <bauzas> yup
14:30:08 <mriedem> we should doc this,
14:30:16 <mriedem> because i'm reminded of it frequently,
14:30:19 <mriedem> but forget what the reasons are
14:30:19 <bauzas> it's already documented AFAIR
14:30:31 * alex_xu needs to check the doc
14:30:38 <bauzas> but agreed with johnthetubaguy, possibly something we could fix with scheduler claims :)
14:30:59 <bauzas> as I said, my last concern is the scheduler fanout that the computes do
14:31:13 <bauzas> for providing instance and aggregate info
14:31:18 <bauzas> to the scheduler
14:31:25 <johnthetubaguy> I thought we killed that? I guess it came back
14:31:27 <bauzas> but again, let's put that in the spec
14:31:51 <bauzas> johnthetubaguy: we added it somewhere around Juno/Liberty
14:31:58 <johnthetubaguy> so maybe we are a cycle early with the spec
14:32:09 <johnthetubaguy> bauzas: I thought it got killed soon after
14:32:11 <bauzas> but now we have placement, so we could be doing that using placement
14:32:33 <bauzas> johnthetubaguy: well, not that I know of
14:32:56 <mriedem> i have no idea what this means:
14:32:57 <mriedem> (9:30:59 AM) bauzas: as I said, my last concern is the scheduler fanout that the computes do
14:32:57 <mriedem> (9:31:13 AM) bauzas: for providing instance and aggregate info
14:32:57 <mriedem> (9:31:18 AM) bauzas: to the scheduler
* alex_xu thought multiple schedulers are ok, and only a problem when resources are starved
14:33:48 <bauzas> mriedem: talking of https://github.com/openstack/nova/blob/master/nova/scheduler/client/query.py#L41
14:33:48 <johnthetubaguy> yeah, I guess we added more stuff into that
14:33:54 <johnthetubaguy> alex_xu: if you stack, you are always resource starved
14:34:13 <bauzas> related to https://github.com/openstack/nova/blob/master/nova/scheduler/rpcapi.py#L134
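For readers without the links open, the fanout in question is each compute node casting its instance info to every scheduler worker, roughly like this simplified paraphrase of the linked rpcapi code (not the verbatim nova source):

```python
# Simplified paraphrase of the fanout being discussed (see the two links
# above for the real code). Every compute host casts its instance info to
# *all* scheduler workers so each scheduler's in-memory HostState stays
# current.
def update_instance_info(self, ctxt, host_name, instance_info):
    cctxt = self.client.prepare(fanout=True)
    cctxt.cast(ctxt, 'update_instance_info',
               host_name=host_name, instance_info=instance_info)
```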
14:34:27 <johnthetubaguy> macsz: sounds like we are a cycle too soon for that spec
14:34:38 <johnthetubaguy> but good to aim that way sooner rather than later
14:34:51 <mriedem> i added some godaddy ops guys,
14:34:54 <bauzas> johnthetubaguy: tbc, I'm agreeing with you
14:35:02 <alex_xu> johnthetubaguy: ah, i got the point, but does nova-scheduler stack?
14:35:03 <bauzas> johnthetubaguy: for the direction
14:35:03 <mriedem> because the size of the conductor service has come up over time in the ops ml
14:35:12 <mriedem> alex_xu: depends on config
14:35:15 <johnthetubaguy> alex_xu: you can go both ways, there is a config to choose
14:35:18 <bauzas> johnthetubaguy: we don't honestly need a separate RPC service for running filters
14:35:29 <macsz> johnthetubaguy: well, better start sooner than later :)
14:35:35 <bauzas> johnthetubaguy: I'm just trying to put down my thoughts for explaining the tech debt
14:35:44 <bauzas> ie. how to go from here to there :)
14:35:47 <mriedem> alex_xu: this is why i was saying i hope these conditions are all clearly documented somewhere, which i don't think they are
14:35:56 <bauzas> but I'm definitely +1 on the idea to merge the scheduler
14:36:02 <johnthetubaguy> mriedem: it's fair we have a general process size issue
14:36:11 <johnthetubaguy> mriedem: +1 for better dev focused docs on this
14:36:54 <alex_xu> yea, I always thought I was clear about all of that, but now I think not :(
14:37:25 <mriedem> bauzas: if you can find any existing docs on guidelines about when you can or shouldn't run multiple schedulers, that'd be helpful,
14:37:30 <mriedem> if you can't find that, we should doc it
14:37:49 <mriedem> superdan pointed out something to me last week wrt multiple schedulers and using ironic, and why you can't,
14:37:50 <bauzas> mriedem: https://docs.openstack.org/developer/nova/scheduler_evolution.html#parallelism-and-concurrency
14:37:54 <mriedem> something with the hacks we were talking about last week
14:38:26 <bauzas> mriedem: which points to http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html
14:38:34 <bauzas> alex_xu: ^ worth reading
14:38:35 <mriedem> but if i'm not using NUMA or i'm using spread vs pack, then...
14:38:41 <alex_xu> bauzas: thanks
14:38:54 <mriedem> or i'm using ironic,
14:38:57 <mriedem> we could expand that doc a bit
14:39:20 <bauzas> mriedem: of course, we can document that further
14:39:23 <bauzas> ideally in the ops guide
14:39:41 <bauzas> running multiple schedulers can be acceptable *if you know the limitations*
14:39:53 <mriedem> yeah, i don't know the limitations :)
14:39:59 <mriedem> hence why i'm asking for docs
14:40:02 <mriedem> but anyway
14:40:06 <bauzas> it's reasonable for a large cloud with good capacity to run multiple schedulers
14:40:09 <mriedem> i'll take a todo to sort through the docs
14:40:19 <bauzas> a small cloud using NUMA or Ironic isn't :)
14:40:21 <mriedem> i also wonder how valid http://specs.openstack.org/openstack/nova-specs/specs/backlog/approved/parallel-scheduler.html is anymore
14:40:32 <bauzas> mriedem: it was written pre-placement
14:40:45 <mriedem> that's what i mean,
14:40:45 <bauzas> and I guess jaypipes never revisited it
14:40:48 <mriedem> lots of this is probably old
14:41:02 <bauzas> well, there are still some ideas there that are valuable
14:41:22 <bauzas> if we consider placement as not being at feature parity with scheduler/conductor, which is what I think
14:41:37 <bauzas> placement is good for getting a list of hosts
14:41:46 <bauzas> but then we could still have filters/weighers
14:42:05 <bauzas> using the conductor if we merge johnthetubaguy's spec mid-term, which I agree with
14:42:16 <bauzas> anyway
14:42:25 <mriedem> i've got a todo written down,
14:42:27 <mriedem> i'll bug people later
14:42:32 <bauzas> just saying those are the docs describing the current problems with the scheduler
14:43:05 <johnthetubaguy> sounds like time to refresh something
14:43:08 <bauzas> I still need to log my comments on johnthetubaguy's spec :)
14:43:14 <johnthetubaguy> could help us with the claims discussions
14:43:23 <johnthetubaguy> it's macsz's spec now
14:43:53 <johnthetubaguy> like I say, sounds like something for next cycle, or something that needs more work to make it possible
14:43:53 <bauzas> yeah, hence me pushing for scheduler claims, not placement claims
14:43:55 <johnthetubaguy> or both
14:44:14 <bauzas> scheduler using placement for that tho
14:44:16 <johnthetubaguy> bauzas: I have lost track of the claims debates myself
14:44:44 <bauzas> johnthetubaguy: the main debate was about why we should merge that *before* placement cut
14:44:50 <edleafe> johnthetubaguy: it's here: https://review.openstack.org/437424
14:45:39 <bauzas> I was long-opposed to the idea of scheduler claims, but now that I see placement, I think it could be nice to use placement for scheduler claims
14:46:13 <bauzas> so I changed my opinion
14:46:22 <bauzas> anyway
14:46:25 <bauzas> 15 mins to the end
14:46:28 <edleafe> Let's move on
14:46:30 <bauzas> and I'm digressing
14:46:35 <edleafe> #topic Bugs
14:46:43 <edleafe> Didn't see any new ones.
14:46:54 <edleafe> Anything to discuss about bugs?
14:47:14 <macsz> didn't see anything noteworthy either, but i just started my day :)
14:47:22 <edleafe> ok then
14:47:28 <edleafe> #topic Open discussion
14:47:45 <edleafe> What's on your mind?
14:48:46 * edleafe only hears crickets
14:49:06 <edleafe> Guess that's a wrap. Back to work/sleep/fun!
14:49:09 <edleafe> #endmeeting