14:00:09 #startmeeting nova_scheduler
14:00:09 Meeting started Mon May 9 14:00:09 2016 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 The meeting name has been set to 'nova_scheduler'
14:00:15 o/
14:00:18 o/
14:00:18 o/
14:00:19 o/
14:00:21 <_gryf> o/
14:00:30 o/
14:01:01 Agenda for today's meeting: https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:01:07 * bauzas waves
14:01:52 * edleafe is waiting for jaypipes to show up...
14:01:54 not sure I got the agenda
14:02:04 do you want us to circle over those?
14:02:21 bauzas: I sent an email on Friday. You were on holiday, right?
14:02:29 yup
14:02:44 ok, no worries
14:02:51 but I saw the email
14:02:51 #topic Specs
14:02:52 * johnthetubaguy lurks
14:03:07 edleafe: my point is that I can only see gerrit links
14:03:26 edleafe: not sure I see what you want us to discuss
14:03:39 bauzas: ah
14:03:50 Well, I just wanted to keep those in view
14:03:53 i'm assuming request for reviews since they aren't approved
14:04:10 if people had questions about them, they could review and ask here
14:04:26 okay, I'd prefer to see them in the etherpad instead
14:04:27 generic-resource-pools is close, but cdent has some good questions in there for jaypipes
14:04:50 * cdent nods graciously at mriedem1
14:05:01 i didn't see allocations in the agenda, but it seems it's holding up other things https://review.openstack.org/#/c/300177/
14:05:04 needs a rebase
14:05:27 tbc, I mean
14:05:27 dansmith, received a blank report for the ceph failure rate, sir. suspecting high noise on the radio transmission, sir.
14:05:28 sylvainb: heh, yeah, I think it was halfway through the report when my computer suspended on friday.. the email came the instant I woke it up, and cron sent me a traceback separately
14:05:29 argh
14:05:31 i.e. there is a series that starts here and depends on allocations https://review.openstack.org/#/c/282442/
14:05:40 missed that one - thought allocations merged last week
14:05:42 mriedem1: dansmith and I need to come to terms on that one
14:05:46 https://etherpad.openstack.org/p/newton-nova-priorities-tracking
14:05:50 rather
14:05:59 As a random heads up, I added a draft spec for the sharding idea doffm was asking me about: https://review.openstack.org/#/c/313519/
14:06:04 * bauzas facepalms for Ctrl-X
14:06:12 #link https://review.openstack.org/#/c/300177/ - Move allocation fields
14:06:25 also worth adding claudiu's spec https://review.openstack.org/286520 - it is related to qualitative.
14:06:26 that etherpad is stale
14:06:41 mriedem1: ++
14:06:43 that's my point
14:07:05 there was a lot of distaste expressed for working in the etherpad.
14:07:07 I wanted to do a spec review time, but https://etherpad.openstack.org/p/newton-nova-priorities-tracking was bad for me
14:07:10 should we revive it?
14:07:46 the consensus was to keep it until we found another better way, amirite?
14:07:58 #link https://review.openstack.org/#/c/313519/ - Add distinct-subset-shard-scheduler spec
14:08:02 anyway, I don't want to nitpick
14:08:14 i prefer to keep the newton-nova-priorities-tracking etherpad as small and focused as possible
14:08:14 I think the issue with the etherpad is "who is keeping it up to date"
14:08:14 I think the idea was to keep the etherpad for priorities, and keep it short
14:08:22 because we don't actually agree on what the priorities are
14:08:24 so think of adding things in there that need to be approved this week, realistically
14:08:30 so someone needs to be the boss
14:08:46 from my pov, generic-resource-pools and allocations specs are top priority
14:08:50 ++
14:08:53 since those are the base patches in the queue, right?
14:08:56 "boss" sounds almost like "bauzas", no? :)
14:09:05 edleafe++
14:09:20 I think jaypipes explained what was achievable for Newton
14:10:11 if we can get the quantitative stuff done for newton, it will be a miracle
14:10:25 bauzas: so do you want to update the etherpad?
14:10:26 what's the state of https://blueprints.launchpad.net/nova/+spec/host-state-level-locking
14:10:47 is it worthwhile to continue the work to implement scheduler claims?
14:10:47 Yingxin: someone needs to revive that spec
14:10:52 I agree with mriedem1, this is going a lot slower than anyone really wants it to, but I guess that's life. From my standpoint, what I need to make significant progress is review _before_ stuff is done.
14:11:17 Otherwise we'll just end up at the end of the cycle with someone saying "oh whoops, you forgot..."
14:11:55 bauzas: I see it is already re-approved for newton
14:11:57 edleafe: well, I surely can, but the goal of the etherpad is that anyone can :)
14:12:09 Yingxin: oh nice
14:12:10 cdent: yeah, approving a spec doesn't seem to ensure that it's sufficient
14:12:38 bauzas: of course, but since you brought it up... :)
14:12:44 ack
14:12:51 can we at least agree on the two priority specs in this meeting? is that generic-resource-pools and allocations?
14:13:03 * edleafe notes that bauzas' new friday nick should be "boss"
14:13:05 I think that's the right choice mriedem1
14:13:11 mriedem1: yep
14:13:15 ok, so,
14:13:22 mriedem1: that's my understanding post-Summit, yes
14:13:25 there are comments in generic-resource-pools, we just need jaypipes to address them
14:13:35 as for the allocations one, it needs a rebase,
14:13:48 but cdent - is there something coming out of the code reviews that needs to be addressed in the spec for that?
14:13:58 i'm not sure what you and dansmith aren't agreeing on since i haven't reviewed those patches
14:14:06 ...me either
14:14:23 mriedem1: I feel like I have addressed endless comments on generic-resource-pools, but I will do another round right now.
14:14:27 sorry, mriedem1, I was talking about this: https://review.openstack.org/#/c/282442/, not the spec
14:14:33 as far as I'm concerned, the first thing that needs to happen right now is moving the inventory to the api db before we make any more progress on allocations, right?
14:14:47 jaypipes: i'm pretty much +2 on generic-resource-pools
14:15:02 mriedem1: I had a full reset of all my puters this weekend. a real spring cleaning. unfortunately, I didn't get to review cdent's questions, which I will do right now and ignore the scheduler meeting.
14:15:18 dansmith: i was going to bring that up also,
14:15:20 jaypipes: my latest comments are implementation-related questions, not anything that ought to derail the main thrust
14:15:32 because moving the inventory stuff to the api db is going to be a prerequisite for new things
14:15:38 mriedem1: right
14:15:39 I agree
14:15:42 and not only for allocations,
14:15:49 but also the aggregates work that doffm is doing
14:15:51 mriedem1: unless I've missed something, I haven't seen that happen yet, so the rest seems moot until it does
14:16:04 I also began to review the PCI related fixes
14:16:16 mriedem1: well, the inventory stuff isn't in the way of aggregates, but it is in the way of allocations I think
14:16:55 * mriedem1 checks the models
14:17:59 in this vein,
14:18:17 the keypairs spec and code is up.. a couple of nits on the spec now that I've written the code, but: https://review.openstack.org/#/q/status:open+branch:master+topic:bp/cells-keypairs-api-db
14:19:06 I think I've avoided any touching of the reqspec in the process
14:19:43 So since we're on *specs* now, can we identify what blockers there are for the specs that need to get approved?
14:20:07 We know that jay is working on addressing the latest comments
14:20:15 for generic resource pools, yes
14:20:37 for allocations, (1) the spec needs to be rebased and (2) it sounds like we need to migrate inventory to the api db before doing allocations
14:20:57 that's my understanding
14:21:12 yeah
14:21:22 does the allocations spec now describe it as in the api db?
14:21:24 and ftr, i don't think we need a new bp for migrating the inventory records, it could just be a work item in the allocations spec
14:21:26 I can handle the rebase
14:21:51 dansmith: no
14:22:05 #action - edleafe to rebase https://review.openstack.org/#/c/300177/
14:22:09 dansmith: i brought that up in the generic-resource-pools spec too, to be clear about which db the new table goes in
14:22:09 dansmith: nope, not explicitly
14:22:13 mriedem1: do we need another spec for migrating the inventories and allocations tables to the api db?
14:22:16 mriedem1: yeah
14:22:26 jaypipes: i don't think so
14:22:27 jaypipes: no, just amend the allocations one
14:22:31 jaypipes: just make it a work item in the allocations spec
14:22:34 bauzas: k, cool.
14:22:39 mriedem1: ++
14:22:42 "oh btw, migrate all of this crap first"
14:22:54 jaypipes: I can add that when I rebase
14:24:06 So anything else we need to settle on specs?
14:24:14 jaypipes: alex_xu: what's the progress of the capability spec series? does it also have priority in Newton?
14:24:31 Yingxin: i think that is a long shot for newton
14:24:36 loooong
14:24:50 Yingxin: priority spec freeze is after the midcycle,
14:25:01 okay, i guess so
14:25:11 Yingxin: edleafe is going to be working on specs in that series and trying to get a good game plan together for the qualitative side of the request.
14:25:13 Yingxin: I'm adding some capability-related stuff, and yeah, it's going to be difficult to get into Newton
14:25:16 so if by some chance we're looking nearly complete for quantitative by the midcycle, and have a good plan for qualitative at the midcycle, then maybe something will get in
14:25:56 yes, it seems we still need a lot of discussion on that
14:26:15 +1
14:26:25 https://blueprints.launchpad.net/nova/+spec/resource-providers-standardize-extra-specs is kinda a big deal
14:26:46 Yingxin: I have https://review.openstack.org/#/c/313784/ up already if you want to give your feedback
14:26:56 bauzas: and https://review.openstack.org/#/c/309762/ is a dependency for that
14:27:12 I meant https://review.openstack.org/#/c/309762/ indeed
14:27:24 edleafe: fyi, claudiu updated his patch also https://review.openstack.org/#/c/285856 - looking for some feedback also
14:27:31 edleafe: thanks, will have a look
14:27:34 alex_xu: thanks
14:27:39 so how many specs do we have now for capabilities / extra specs?
14:27:49 are they all competing? do some build on others?
14:28:07 because at least jay, ed and claudiu have one now
14:28:28 mriedem1: I think we're all looking at pieces of the elephant
14:28:41 mriedem1: I see lots of specs in the summit etherpad
14:28:42 mriedem1: I'll work on tying them together to get a bigger picture
14:28:55 edleafe: ok, yeah, that would be helpful
14:29:05 mine depends on all of those things https://review.openstack.org/298188, but i think that is something we will think about after looong work
14:29:06 Yingxin: yes, lots of prior art here
14:29:29 edleafe: count on me if there is anything i can help with
14:30:04 edleafe: I'll work with alex
14:30:05 alex_xu: sure, thanks
14:30:14 Yingxin: great!
14:30:40 so let's move on
14:30:54 Yes, I think we all have a good direction for this week
14:31:02 #topic Reviews
14:31:17 I put a few on the agenda
14:31:27 Any issues with those that we need to discuss?
14:31:40 Or any other reviews you want to bring up?
14:31:59 mriedem1: I had a good hangout with edleafe last week. He's going to focus on refining the specs and getting a clean roadmap in place for the capabilities things.
14:32:04 I think we already discussed a lot on that :)
14:32:42 these reviews seem like random adds, so unless there is a specific issue to discuss on them, let's move on
14:32:53 they are all blocked on other things
14:33:09 since my name's associated with those reviews, I'll just repeat what I said up above: I'd appreciate review long in advance of them being considered done
14:33:25 otherwise I'm down an unlit tunnel
14:33:58 all three of those are attached to the work items on the generic-resource-pools spec
14:34:11 cdent: you'll bump into the rest of us in that tunnel
14:34:42 bring a candle edleafe and we can light the world
14:34:45 So everyone - please give your feedback on those reviews
14:34:46 moving on?
14:34:53 #topic Opens
14:35:10 cdent has one: Someone please explain the logging issues to me as if I'm five
14:35:24 I simply don't get why people want that info?
14:35:24 is that referring to https://review.openstack.org/#/c/306647/ ?
14:35:38 Here's what I'm understanding: Some operators need to be able to inspect the internal decisions of the scheduler filters and weighers in order to determine why their configuration isn't yielding the results they anticipated.
14:35:51 That part appears to be taken for granted or assumed, so I thought perhaps someone could, you know, light that candle
14:36:20 mriedem1: zactly
14:36:20 edleafe: yes, but _why_?
14:36:35 cdent: because on the outside when scheduling fails you get NoValidHost
14:37:01 we don't expose something like, you're using numa filtering but there is no room to place this request on your x numa nodes
14:37:22 mriedem1: but the flip side is when scheduling succeeds, but doesn't pick the host they thought it would
14:37:34 mriedem1: they want to understand that process, too
14:37:53 yes, that's a more advanced ask from jlk, at least per my understanding
14:38:00 Are people trying to use scheduler behavior as a proxy for resource monitoring?
14:38:24 we already expose which filters are removing the hosts
14:38:24 Or because they just want to be able to answer someone's questions?
14:38:39 * cdent is still five
14:39:08 maybe notifications are more helpful to see the claim decisions across compute nodes
14:39:40 cdent: in part it's the latter, simply explaining to someone why their boot request failed
14:39:46 "The primary use case is when an end-user tries to perform an operation that requires scheduling an instance, and it fails. They then ask the Deployer why it failed, but there is no obvious failure reason in the logs."
14:40:28 the idea is to stack up the pile of errors so that it could eventually lead to a better, consistent error story, also
14:40:58 It's not an error, though, right?
14:40:59 and not just grepping all logs trying to figure out the sequence
14:41:10 cdent: no, it's not
14:41:13 cdent: that was my complaint
14:41:13 well, it's technically not an error, yes
14:41:17 bauzas: ++
14:41:20 it's correct functioning
14:41:31 they just want visibility into that
14:41:39 more like an audit log
14:41:44 you requested x,
14:41:46 mriedem1: +1
14:41:47 we processed through y,
14:41:52 and ended up at z, which was a failure
14:41:54 it's rather helpful log messages for operators wanting to work backwards through the decision
14:42:18 this spec is about dumping the decisions made while processing (y)
14:42:27 w/o enabling debug logging
14:42:30 yeah
14:42:37 which is a PITA for most operators
14:42:45 yeah, that's the key, debug logs when you need them, without needing debug all the time
14:42:59 just unstack whenever you want
14:43:04 well, enough logs to find out what failed, without needing all debug logs
14:43:16 what failed, and why ^
14:43:22 so accumulate info, but only dump if things go "wrong" (for various definitions thereof)
14:43:40 correct
14:43:40 at least, the idea is to provide an in-memory mechanism for that, not exactly what we'll do with it
14:44:03 but lxsli had an oslo.log spec to put that in the right place
14:44:06 the spec has gotten muddier because of more advanced requests for dumping all the time
14:44:16 and filtering out certain logs
14:44:45 so now it's kind of a frankenstein of ideas
14:44:52 and config options
14:44:55 design by committee?
14:45:03 should we maybe just shoot for the very basic and build from there, later?
14:45:10 accumulating and dumping is a nice start
14:45:28 cdent: that's what i said last week
14:45:41 cdent: well, the "only certain filters" was in response to concern about massive memory loads
14:45:44 yeah, but last week I was only 4, now I'm five
14:46:48 anyway, i still need to go through the latest revision where a 3rd config option was added
14:47:02 i defer to johnthetubaguy and bauzas here given their experience with this problem
14:47:24 as long as the default behavior fixes the base issue, i'm ok with it
14:47:35 that's my point
14:48:05 the spec is nice for adding a piling mechanism for logs in the scheduler, which is like the #1 complaint for operators digging for an issue
14:48:18 cdent: any closer to being 6 yet?
14:48:33 edleafe: Getting close.
14:48:42 OK, that's progress then
14:48:44 my only concern would then be the memory footprint, but I don't expect much of that
14:48:51 yes, thanks to all of you
14:48:59 Any other opens?
14:49:07 at least, that's something that I will be super cautious about in the implementation phase
14:49:14 * cdent wants a pony and a spaceship
14:49:28 just one point
14:49:45 again about the check-destinations blueprint that has been accepted for Newton
14:50:03 code is partially up, and I beg for reviews
14:50:10 bauzas: link?
14:50:24 now, I'm biting into the trickiest part, i.e. the API side of it
14:50:39 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/check-destination-on-migrations-newton
14:51:33 that's like the #2 complaint I heard from operators
14:51:55 "why am I running into trouble every time I provide a destination?"
14:52:13 ok, everyone please review the check destinations patches from bauzas
14:52:17 and "why are my hints not persisted?"
14:52:22 Any other topics for opens?
14:53:11 OK, thanks everyone! Now start reviewing stuff!!
14:53:13 #endmeeting
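
As a footnote to the Opens discussion above, here is a minimal Python sketch of the "accumulate info, but only dump if things go wrong" idea, using only the standard library's logging.handlers.MemoryHandler. It is not the mechanism proposed in the logging spec or in oslo.log; the logger name, the toy schedule() loop, the example filters, and the flush-on-WARNING trigger are all illustrative assumptions. It only shows the general pattern: buffer per-request debug records in memory and emit them together when scheduling ends in a failure such as NoValidHost, so an operator can see the filter-by-filter trail without running with debug logging enabled all the time.

import logging
import logging.handlers

console = logging.StreamHandler()
console.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

# Buffer up to 1000 records in memory; nothing is written out unless a record
# at WARNING or above arrives, at which point the whole buffer is flushed to
# the console handler in one go.
buffered = logging.handlers.MemoryHandler(
    capacity=1000, flushLevel=logging.WARNING, target=console)

LOG = logging.getLogger("scheduler.audit")  # hypothetical logger name
LOG.setLevel(logging.DEBUG)
LOG.addHandler(buffered)
LOG.propagate = False  # keep the demo output on the buffered handler only


def schedule(request, hosts, filters):
    """Toy filter loop: record every decision, dump the trail only on failure."""
    for host_filter in filters:
        survivors = [h for h in hosts if host_filter(request, h)]
        LOG.debug("filter %s: %d -> %d hosts",
                  host_filter.__name__, len(hosts), len(survivors))
        hosts = survivors
    if not hosts:
        # The warning triggers the MemoryHandler flush, so the accumulated
        # per-filter debug lines above are emitted together with it.
        LOG.warning("NoValidHost for request %s", request)
    else:
        # Success: discard the buffered details instead of logging them.
        buffered.buffer.clear()
    return hosts


# Example run with two made-up filters: the first request succeeds and stays
# silent, the second ends up with no hosts and dumps the whole decision trail.
def has_ssd(req, host):
    return host != "node2"

def enough_ram(req, host):
    return req != "big-instance"

schedule("small-instance", ["node1", "node2"], [has_ssd, enough_ram])
schedule("big-instance", ["node1", "node2"], [has_ssd, enough_ram])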