14:00:09 <edleafe> #startmeeting nova_scheduler
14:00:09 <openstack> Meeting started Mon May  9 14:00:09 2016 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:15 <cdent> o/
14:00:18 <alex_xu> o/
14:00:18 <Yingxin> o/
14:00:19 <mlavalle> o/
14:00:21 <_gryf> o/
14:00:30 <mriedem1> o/
14:01:01 <edleafe> Agenda for today's meeting: https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:01:07 * bauzas waves
14:01:52 * edleafe is waiting for jaypipes to show up...
14:01:54 <bauzas> not sure I got the agenda
14:02:04 <bauzas> do you want us to circle over those ?
14:02:21 <edleafe> bauzas: I sent an email on Friday. You were on holiday, right?
14:02:29 <bauzas> yup
14:02:44 <edleafe> ok, no worries
14:02:51 <bauzas> but I saw the email
14:02:51 <edleafe> #topic Specs
14:02:52 * johnthetubaguy lurks
14:03:07 <bauzas> edleafe: my point is that I can only see gerrit links
14:03:26 <bauzas> edleafe: not sure I see what you want us to discuss
14:03:39 <edleafe> bauzas: ah
14:03:50 <edleafe> Well, I just wanted to keep those in view
14:03:53 <mriedem1> i'm assuming request for reviews since they aren't approved
14:04:10 <edleafe> if people had questions about them, they could review and ask here
14:04:26 <bauzas> okay, I'd prefer to see them rather in the etherpad
14:04:27 <mriedem1> generic-resource-pools is close, but cdent has some good questions in there for jaypipes
14:04:50 * cdent nods graciously at mriedem1
14:05:01 <mriedem1> i didn't see allocations in the agenda, but seems it's holding up other things https://review.openstack.org/#/c/300177/
14:05:04 <mriedem1> needs a rebase
14:05:27 <bauzas> tbc, I mean
14:05:27 <bauzas> * will-i-am (~will-dono@10.10.51.252) has just joined
14:05:27 <bauzas> * pradk (~prad@ovpn-113-136.phx2.redhat.com) has just joined
14:05:27 <bauzas> <sylvainb> dansmith, received a blank report for the ceph failure rate, sir.
14:05:27 <bauzas> <sylvainb> suspecting high noise on the radio transmission, sir.
14:05:28 <bauzas> <dansmith> sylvainb: heh, yeah, I think it was halfway through the report when my computer suspended on friday.. the email came the instant I woke it up, and cron sent me a traceback separately
14:05:29 <bauzas> argh
14:05:31 <mriedem1> i.e. there is a series that starts here and depends on allocations https://review.openstack.org/#/c/282442/
14:05:40 <edleafe> missed that one - thought allocations merged last week
14:05:42 <cdent> mriedem1: dansmith and I need to come to terms on that one
14:05:46 <bauzas> https://etherpad.openstack.org/p/newton-nova-priorities-tracking
14:05:50 <bauzas> rather
14:05:59 <johnthetubaguy> As a random heads up, I added a draft spec for the sharding idea doffm was asking me about: https://review.openstack.org/#/c/313519/
14:06:04 * bauzas facepalms for Ctrl-X
14:06:12 <edleafe> #link https://review.openstack.org/#/c/300177/ - Move allocation fields
14:06:25 <alex_xu> also worth adding claudiu's spec https://review.openstack.org/286520 - it is related to the qualitative work.
14:06:26 <mriedem1> that etherpad is stale
14:06:41 <bauzas> mriedem1: ++
14:06:43 <bauzas> that's my point
14:07:05 <edleafe> there was a lot of distaste expressed for working in the etherpad.
14:07:07 <bauzas> I wanted to do a spec review time, but https://etherpad.openstack.org/p/newton-nova-priorities-tracking was bad for me
14:07:10 <edleafe> should we revive it?
14:07:46 <bauzas> the consensus was to keep it until we found another better way, amirite ?
14:07:58 <edleafe> #link https://review.openstack.org/#/c/313519/ - Add distinct-subset-shard-scheduler spec
14:08:02 <bauzas> anyway, I don't want to nitpick
14:08:14 <mriedem1> i prefer to keep the newton-nova-priorities-tracking etherpad as small and focused as possible
14:08:14 <cdent> I think the issue with the etherpad is "who is keeping it up to date"
14:08:14 <johnthetubaguy> I think the idea was to keep the etherpad for priorities, and keep it short
14:08:22 <cdent> because we don't actually agree on what the priorities are
14:08:24 <mriedem1> so think of adding things in there that need to be approved this week, realistically
14:08:30 <cdent> so someone needs to be the boss
14:08:46 <mriedem1> from my pov, generic-resource-pools and allocations specs are top priority
14:08:50 <bauzas> ++
14:08:53 <mriedem1> since those are the base patches in the queue, right?
14:08:56 <edleafe> "boss" sounds almost like "bauzas", no? :)
14:09:05 <cdent> edleafe++
14:09:20 <bauzas> I think jaypipes explained what was achievable for Newton
14:10:11 <mriedem1> if we can get the quantitative stuff done for newton, it will be a miracle
14:10:25 <edleafe> bauzas: so do you want to update the etherpad?
14:10:26 <Yingxin> what's the state of  https://blueprints.launchpad.net/nova/+spec/host-state-level-locking
14:10:47 <Yingxin> is it worthwhile to continue the work to implement scheduler claim?
14:10:47 <bauzas> Yingxin: someone needs to revive that spec
14:10:52 <cdent> I agree with mriedem1, this is going a lot slower than anyone really wants it to, but I guess that's life. From my standpoint what I need to make significant progress is review _before_ stuff is done.
14:11:17 <cdent> Otherwise will just end up at the end of the cycle with someone saying "oh whoops, you forgot..."
14:11:55 <Yingxin> bauzas: I see it is already re-approved for newton
14:11:57 <bauzas> edleafe: well, I surely can, but the goal of the etherpad is that anyone can :)
14:12:09 <bauzas> Yingxin: oh nice
14:12:10 <edleafe> cdent: yeah, approving a spec doesn't seem to ensure that it's sufficient
14:12:38 <edleafe> bauzas: of course, but since you brought it up... :)
14:12:44 <bauzas> ack
14:12:51 <mriedem1> can we at least agree on the two priority specs in this meeting? is that generic-resource-pools and allocations?
14:13:03 * edleafe notes that bauzas' new Friday nick should be "boss"
14:13:05 <cdent> I think that's the right choice mriedem1
14:13:11 <edleafe> mriedem1: yep
14:13:15 <mriedem1> ok, so,
14:13:22 <bauzas> mriedem1: that's my understanding post-Summit yes
14:13:25 <mriedem1> there are comments in generic-resource-pools, we just need jaypipes to address
14:13:35 <mriedem1> as for the allocations one, it needs a rebase,
14:13:48 <mriedem1> but cdent - is there something coming out of the code reviews that needs to be addressed in the spec for that?
14:13:58 <mriedem1> i'm not sure what you and dansmith aren't agreeing on since i haven't reviewed those patches
14:14:06 <dansmith> ...me either
14:14:23 <jaypipes> mriedem1: I feel like I have addressed endless comments on generic-resource-pools, but I will do another round right now.
14:14:27 <cdent> sorry, mriedem1, I was talking about this: https://review.openstack.org/#/c/282442/, not the spec
14:14:33 <dansmith> as far as I'm concerned, the first thing that needs to happen right now is moving the inventory to the api db before we make any more progress on allocations, right?
14:14:47 <mriedem1> jaypipes: i'm pretty much +2 on generic-resource-pools
14:15:02 <jaypipes> mriedem1: I had a full reset of all my puters this weekend. a real spring cleaning. unfortunately, I didn't get to review cdent's questions, which I will do right now and ignore the scheduler meeting.
14:15:18 <mriedem1> dansmith: i was going to bring that up also,
14:15:20 <cdent> jaypipes: my latest comments are implementation related questions, not anything that ought to derail the main thrust
14:15:32 <mriedem1> because moving the inventory stuff to the api db is going to be a prerequisite for new things
14:15:38 <dansmith> mriedem1: right
14:15:39 <bauzas> I agree
14:15:42 <mriedem1> and not only for allocations,
14:15:49 <mriedem1> but also the aggregates work that doffm is doing
14:15:51 <dansmith> mriedem1: unless I've missed something, I haven't seen that happen yet, so the rest seems moot until it does
14:16:04 <bauzas> I also began to review the PCI related fixes
14:16:16 <dansmith> mriedem1: well, the inventory stuff isn't in the way of aggregates, but it is in the way of allocations I think
14:16:55 * mriedem1 checks the models
14:17:59 <dansmith> in this vein,
14:18:17 <dansmith> the keypairs spec and code is up.. a couple of nits on the spec now that I've written the code, but: https://review.openstack.org/#/q/status:open+branch:master+topic:bp/cells-keypairs-api-db
14:19:06 <dansmith> I think I've avoided any touching of the reqspec in the process
14:19:43 <edleafe> So since we're on *specs* now, can we identify what blockers there are for the specs that need to get approved?
14:20:07 <edleafe> We know that jay is working on addressing the latest comments
14:20:15 <mriedem1> for generic resource pools, yes
14:20:37 <mriedem1> for allocations, (1) spec needs to be rebased and (2) sounds like we need to migrate inventory to api db before doing allocations
14:20:57 <bauzas> that's my understanding
14:21:12 <dansmith> yeah
14:21:22 <dansmith> does the allocations spec now describe it as in the api db?
14:21:24 <mriedem1> and ftr, i don't think we need a new bp for migrating the inventory records, it could just be a work item in the allocations spec
14:21:26 <edleafe> I can handle the rebase
14:21:51 <mriedem1> dansmith: no
14:22:05 <edleafe> #action - edleafe to rebase https://review.openstack.org/#/c/300177/
14:22:09 <mriedem1> dansmith: i brought that up in the generic-resource-pools spec too, to be clear about which db the new table goes in
14:22:09 <bauzas> dansmith: nope, not explicitly
14:22:13 <jaypipes> mriedem1: do we need another spec for migrating the inventories and allocations tables to the api db?
14:22:16 <dansmith> mriedem1: yeah
14:22:26 <mriedem1> jaypipes: i don't think so
14:22:27 <bauzas> jaypipes: no, just amending the allocation one
14:22:31 <mriedem1> jaypipes: just make it a work item in the allocations spec
14:22:34 <jaypipes> bauzas: k, cool.
14:22:39 <jaypipes> mriedem1: ++
14:22:42 <mriedem1> "oh btw, migrate all of this crap first"
14:22:54 <edleafe> jaypipes: I can add  that when I rebase
14:24:06 <edleafe> So anything else we need to settle on specs?
14:24:14 <Yingxin> jaypipes: alex_xu:  what's the progress of capability spec series? does it also have priority in Newton?
14:24:31 <mriedem1> Yingxin: i think that is a long shot for newton
14:24:36 <dansmith> loooong
14:24:50 <mriedem1> Yingxin: priority spec freeze is after the midcycle,
14:25:01 <alex_xu> okay, i guess so
14:25:11 <jaypipes> Yingxin: edleafe is going to be working on specs in that series and trying to get a good game plan together for the qualitative side of the request.
14:25:13 <edleafe> Yingxin: I'm adding some capability-related stuff, and yeah, it's going to be difficult to get into Newton
14:25:16 <mriedem1> so if by some chance we're looking nearly complete for quantitative by the midcycle, and have a good plan for qualitative at the midcycle, then maybe something will get in
14:25:56 <Yingxin> yes, seems still need a lot of discussions for that
14:26:15 <alex_xu> +1
14:26:25 <bauzas> https://blueprints.launchpad.net/nova/+spec/resource-providers-standardize-extra-specs is kind of a big deal
14:26:46 <edleafe> Yingxin: I have https://review.openstack.org/#/c/313784/ up already if you want to give your feedback
14:26:56 <mriedem1> bauzas: and https://review.openstack.org/#/c/309762/ is a dependency for that
14:27:12 <bauzas> I meant https://review.openstack.org/#/c/309762/ indeed
14:27:24 <alex_xu> edleafe: fyi, claudiu updated his patch also https://review.openstack.org/#/c/285856 looking for some feedback also
14:27:31 <Yingxin> edleafe: thanks, will have a look
14:27:34 <edleafe> alex_xu: thanks
14:27:39 <mriedem1> so how many specs do we have now for capabilities / extra specs?
14:27:49 <mriedem1> are they all competing? do some build on others?
14:28:07 <mriedem1> because at least jay, ed and claudiu have one now
14:28:28 <edleafe> mriedem1: I think we're all looking at pieces of the elephant
14:28:41 <Yingxin> mriedem1: I see lots of specs in the summit etherpad
14:28:42 <edleafe> mriedem1: I'll work on tying them together to get a bigger picture
14:28:55 <mriedem1> edleafe: ok, yeah, that would be helpful
14:29:05 <alex_xu> mine depends on all of those things https://review.openstack.org/298188, but I think that is something we will think about after the looong work
14:29:06 <mriedem1> Yingxin: yes, lots of prior art here
14:29:29 <alex_xu> edleafe: count on me if there is anything I can help with
14:30:04 <Yingxin> edleafe: I'll work with alex
14:30:05 <edleafe> alex_xu: sure, thanks
14:30:14 <edleafe> Yingxin: great!
14:30:40 <mriedem1> so let's move on
14:30:54 <edleafe> Yes, I think we all have a good direction for this week
14:31:02 <edleafe> #topic Reviews
14:31:17 <edleafe> I put a few on the agenda
14:31:27 <edleafe> Any issues with those that we need to discuss?
14:31:40 <edleafe> Or any other reviews you want to bring up?
14:31:59 <jaypipes> mriedem1: I had a good hangout with edleafe last week. He's going to focus on refining the specs and getting a clean roadmap in place for the capabilities things.
14:32:04 <bauzas> I think we already discussed a lot on that :)
14:32:42 <mriedem1> these reviews seem like random adds, so unless there is a specific issue to discuss on them, let's move on
14:32:53 <mriedem1> they are all blocked on other things
14:33:09 <cdent> since my name's associated with those reviews, I'll just repeat what I said up above: I'd appreciate review long in advance of them being considered done
14:33:25 <cdent> otherwise I'm down an unlit tunnel
14:33:58 <cdent> all three of those are attached to the work items on the generic-resource-pool spec
14:34:11 <edleafe> cdent: you'll bump into the rest of us in that tunnel
14:34:42 <cdent> bring a candle edleafe and we can light the world
14:34:45 <edleafe> So everyone - please give your feedback on those reviews
14:34:46 <bauzas> moving on ?
14:34:53 <edleafe> #topic Opens
14:35:10 <edleafe> cdent has one: Someone please explain the logging issues to me as if I'm five
14:35:24 <cdent> I simply don't get why people want that info?
14:35:24 <mriedem1> is that referring to https://review.openstack.org/#/c/306647/ ?
14:35:38 <edleafe> Here's what I'm understanding: Some operators need to be able to inspect the internal decisions of the scheduler filters and weighers in order to determine why their configuration isn't yielding the results they anticipated.
14:35:51 <cdent> That part appears to be taken for granted or assumed, so I thought perhaps someone could, you know, light that candle
14:36:20 <edleafe> mriedem1: zactly
14:36:20 <cdent> edleafe: yes, but _why_?
14:36:35 <mriedem1> cdent: because on the outside when scheduling fails you get NoValidHost
14:37:01 <mriedem1> we don't expose something like, you're using numa filtering but there is no room to place this request on your x numa nodes
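A minimal, hypothetical sketch of the behaviour mriedem1 is describing; the names and the filter interface below are illustrative assumptions, not Nova's actual code. Each filter simply prunes the candidate host list, so by the time the list is empty the per-filter reasons are gone and the caller only ever sees NoValidHost:

    # Hypothetical sketch (not Nova's actual scheduler code).
    class NoValidHost(Exception):
        pass

    def select_host(hosts, filters, request):
        for flt in filters:
            # host_passes() is an assumed filter interface for this sketch.
            hosts = [h for h in hosts if flt.host_passes(h, request)]
        if not hosts:
            # All per-filter context is already lost at this point.
            raise NoValidHost("No valid host was found.")
        return hosts[0]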
14:37:22 <edleafe> mriedem1: but the flip side is when scheduling succeeds, but doesn't pick the host they thought it would
14:37:34 <edleafe> mriedem1: they want to understand that process, too
14:37:53 <mriedem1> yes, that's a more advanced ask from jlk, at least per my understanding
14:38:00 <cdent> Are people trying to use scheduler behavior as a proxy for resource monitoring?
14:38:24 <edleafe> we already expose which filters are removing the hosts
14:38:24 <cdent> Or because they just want to be able to answer someone's questions?
14:38:39 * cdent is still five
14:39:08 <Yingxin> maybe notifications are more helpful to see the claim decisions across compute nodes
14:39:40 <mriedem1> cdent: in part it's the latter, simply explaining to someone why their boot request failed
14:39:46 <mriedem1> "The primary use case is when an end-user tries to perform an operation that requires scheduling an instance, and it fails.  They then ask the Deployer why it failed, but there is no obvious failure reason in the logs."
14:40:28 <bauzas> the idea is to stack up the pile of errors so that it could eventually lead to a better, consistent error story, also
14:40:58 <cdent> It's not an error, though, right?
14:40:59 <bauzas> and not just grepping all logs trying to figure out the sequence
14:41:10 <mriedem1> cdent: no, it's not
14:41:13 <edleafe> cdent: that was my complaint
14:41:13 <bauzas> well, it's technically not an error, yes
14:41:17 <Yingxin> bauzas: ++
14:41:20 <edleafe> it's correct functioning
14:41:31 <edleafe> they just want visibility into that
14:41:39 <mriedem1> more like an audit log
14:41:44 <mriedem1> you requested x,
14:41:46 <edleafe> mriedem1: +1
14:41:47 <mriedem1> we processed through y,
14:41:52 <mriedem1> and ended up at z, which was a failure
14:41:54 <bauzas> it's rather helpful log messages for operators wanting to understand the decision in reverse
14:42:18 <mriedem1> this spec is about dumping the decisions made while processing (y)
14:42:27 <mriedem1> w/o enabling debug logging
14:42:30 <bauzas> yeah
14:42:37 <bauzas> which is a PITA for most operators
14:42:45 <johnthetubaguy> yeah, thats the key, debug logs when you need them, without needing debug all the time
14:42:59 <bauzas> just unstack whenever you want
14:43:04 <johnthetubaguy> well, enough logs to find out what failed, without needing all debug logs
14:43:16 <johnthetubaguy> what failed, and why ^
14:43:22 <cdent> so accumulate info, but only dump if things go "wrong" (for various definitions thereof)
14:43:40 <mriedem1> correct
14:43:40 <bauzas> at least, the idea is to provide an in-memory mechanism for that, not exactly what we'll do with it
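As a rough illustration of the in-memory mechanism bauzas mentions, the sketch below accumulates per-filter decisions for one request and only writes them out when the request ends with no hosts left. The class and method names are assumptions made for illustration, not taken from the spec or from Nova:

    # Hedged sketch of "accumulate in memory, dump only on failure".
    import logging

    LOG = logging.getLogger(__name__)

    class FilterDecisionRecorder(object):
        """Collects per-filter decisions for a single scheduling request."""

        def __init__(self, request_id):
            self.request_id = request_id
            self.decisions = []  # bounded by (number of hosts) x (number of filters)

        def record(self, filter_name, host, passed, reason=None):
            self.decisions.append((filter_name, host, passed, reason))

        def dump_if_failed(self, remaining_hosts):
            # Only emit the accumulated trail when the request ends with no
            # valid hosts, so operators get the "why" without debug logging.
            if remaining_hosts:
                return
            for filter_name, host, passed, reason in self.decisions:
                if not passed:
                    LOG.info("request %s: %s rejected host %s (%s)",
                             self.request_id, filter_name, host, reason)

A scheduler driver could then call record() inside its filtering loop and dump_if_failed() once filtering completes.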
14:44:03 <bauzas> but lxsli had an oslo.log spec to make that in the right place
14:44:06 <mriedem1> the spec has gotten muddier because of more advanced requests for dumping all the time
14:44:16 <mriedem1> and filtering out certain logs
14:44:45 <mriedem1> so now it's kind of a frankenstein of ideas
14:44:52 <mriedem1> and config options
14:44:55 <edleafe> design by committee?
14:45:03 <cdent> should we maybe just shoot for the very basic and build from there, later?
14:45:10 <cdent> accumulating and dumping is a nice start
14:45:28 <mriedem1> cdent: that's what i said last week
14:45:41 <edleafe> cdent: well, the "only certain filters" was in response to concern about massive memory loads
14:45:44 <cdent> yeah, but last week I was only 4, now I'm five
14:46:48 <mriedem1> anyway, i still need to go through the latest revision where a 3rd config option was added
14:47:02 <mriedem1> i defer to johnthetubaguy and bauzas here given their experience with this problem
14:47:24 <mriedem1> as long as the default behavior fixes the base issue, i'm ok with it
14:47:35 <bauzas> that's my point
14:48:05 <bauzas> the spec is nice for adding a piling mechanism for logs in the scheduler, which is like #1 complaint for operators digging for an issue
14:48:18 <edleafe> cdent: any closer to being 6 yet?
14:48:33 <cdent> edleafe: Getting close.
14:48:42 <edleafe> OK, that's progress then
14:48:44 <bauzas> my only concern would be then the memory footprint, but I don't expect much of that
14:48:51 <cdent> yes, thanks to all of you
14:48:59 <edleafe> Any other opens?
14:49:07 <bauzas> at least, that's something that I will be super cautious about in the implementation phase
14:49:14 * cdent wants a pony and a spaceship
14:49:28 <bauzas> just one point
14:49:45 <bauzas> again about the check-destinations blueprint that has been accepted for Newton
14:50:03 <bauzas> code is partially up, and I beg for reviews
14:50:10 <edleafe> bauzas: link?
14:50:24 <bauzas> now, I'm biting into the trickiest part, i.e. the API side of it
14:50:39 <bauzas> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/check-destination-on-migrations-newton
14:51:33 <bauzas> that's like operators #2 complaint I heard
14:51:55 <bauzas> "why I'm running into trouble every time I'm providing a destination ?'
14:52:13 <edleafe> ok, everyone please review the check destinations patches from bauzas
14:52:17 <bauzas> and "why my hints are not persisted ?"
14:52:22 <edleafe> Any other topics for opens?
14:53:11 <edleafe> OK, thanks everyone! Now start reviewing stuff!!
14:53:13 <edleafe> #endmeeting