13:59:53 #startmeeting nova-scheduler
13:59:54 Meeting started Mon Jan 11 13:59:53 2016 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:57 The meeting name has been set to 'nova_scheduler'
14:00:02 anyone here to talk about the scheduler
14:00:41 o/
14:00:46 yes
14:00:51 morning all
14:00:59 o/
14:02:06 staff meeting...
14:02:26 lxsli, so, I'm in two meetings right now myself :-)
14:02:27 heya
14:02:31 \o
14:02:35 n0ano: me too :)
14:02:45 o/
14:02:46 looks like we have quorum so let's go
14:02:57 #topic specs, BPs, patches
14:03:03 * bauzas attending his first '16 sched meeting \o/
14:03:19 * johnthetubaguy lurks
14:03:24 the big one here belongs to you jaypipes , what's happening with your resource providers BP?
14:04:25 n0ano: I have split it into resource-classes, resource-providers, generic-resource-pools, and compute-node-inventory blueprints.
14:04:39 n0ano: still working on compute-node-allocations blueprint, which is the final one in the series.
14:04:40 and some of them are being implemented, right?
14:05:06 bauzas: yes, cdent is working on resource-classes (patches pushed already I believe) and we are working on the others slowly
14:05:16 coolness
14:05:18 oh, hi, I'm here
14:05:26 I apologize for not being able to attend these meetings for a couple months :(
14:05:28 yeah: I pushed some resource-classes stuff
14:05:33 and for being so slow on the spec stuff :(
14:05:41 but have held off on the next step waiting on the specs to progress
14:05:43 I'd advocate for using the etherpad of doom then :)
14:05:59 jaypipes, NP, my only concern is we're beyond feature freeze, do we need to get exceptions for these 5 BPs?
14:06:02 ie. https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
14:06:06 n0ano: no
14:06:17 n0ano: no exceptions are possible now
14:07:05 bauzas, then what's the implication for Mitaka?
14:07:39 n0ano: we can continue to work, but no merges
14:07:45 n0ano: need to chat with johnthetubaguy about it. I was under the impression that the resource-providers work was a bit of an exception to the rule.
14:07:57 jaypipes, +1
14:08:26 cdent: jaypipes: like I said above, would it be possible to mention all the specs + implems in https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking ?
14:08:30 if we get it reviewed, and folks to work on the follow-up bits, that's OK I think
14:08:41 * bauzas was pretty dotted-line during the past weeks
14:08:48 bauzas: yes. I will do that ASAP.
14:09:04 yeah, in that etherpad sounds like a good plan
14:09:12 johnthetubaguy, do we have a hard cutoff date for getting the BPs approved?
14:09:29 it was a few months back
14:09:36 :)
14:10:01 but the process is here to help, so if we need exceptions, and everyone wants it, let's do what we can
14:10:15 so no, we'll just work with ASAP
14:10:38 honestly, I'm not that concerned by the Mitaka freeze
14:10:43 if it were not a priority item that's blocking so many things, it would be a hard no, but willing to see what other folks think for this
14:11:07 so, if jaypipes can update the epad with links to all the BPs we'll try and get them reviewed and approved as soon as possible
14:11:10 ie. we should be able to accept specs for Nxxx sooner or later
14:11:41 and we could just make sure that if we get enough stuff stable, it could be iterated on very quickly for landing in N before Summit
14:11:45 like we did for ReqSpec
14:12:10 no precise need for bypassing what was agreed
14:12:54 I don't think we're bypassing so much as interpreting guidelines :-)
14:13:08 anyway, let's see if we can get it approved as soon as possible.
14:14:06 I think that's it for BPs, all others have either been approved or deferred to N, are there any specific patches people want to discuss today?
14:14:26 I pushed a bug fix that needs some review: https://review.openstack.org/#/c/264349/
14:14:41 was rated as "high"
14:14:57 cdent, excellent, let's all try to review it
14:16:09 I pushed a backlog spec here that we talked about last week: https://review.openstack.org/#/c/263898/
14:16:14 cdent: in my pip
14:16:18 pipe even
14:17:06 carl_baldwin, cool, you should also update the mitaka topics page at https://etherpad.openstack.org/p/mitaka-nova-midcycle
14:17:55 n0ano: See L73
14:18:05 carl_baldwin: I was talking with armax about making sure we find time for the neutron related things, it's on my mental list as well
14:18:20 carl_baldwin, I'm blind, I didn't scroll down :-)
14:18:25 johnthetubaguy: many thanks
14:18:33 n0ano: No worries, it is a busy agenda.
14:19:00 kind of segues into the next topic
14:19:06 #topic mid-cycle meetup
14:20:01 given that the neutron scheduling is on one day and a general scheduler topic is on day one I think we're covered unless anyone disagrees
14:20:13 I feel so
14:20:55 I should note that scheduler testing is a suggested topic but not on a specific day yet, hopefully that won't be forgotten
14:21:54 should be good I think
14:22:16 OK, moving on
14:22:22 #topic bugs
14:22:41 I was looking into working on this: https://bugs.launchpad.net/nova/+bug/1431291
14:22:42 Launchpad bug 1431291 in OpenStack Compute (nova) "Scheduler Failures are no longer logged with enough detail for a site admin to do problem determination" [High,Confirmed] - Assigned to Pranav Salunke (dguitarbite)
14:22:48 I note that the current list is still at 38 (37 once we push cdent's fix)
14:23:02 but it feels like reality and the bug have gone in different directions since the bug was created
14:23:24 so I was wondering if someone with a bit more experience could make some statement about the current desired functionality (on the bug)?
14:23:33 cdent: I feel it would be worth discussing that at the midcycle
14:23:51 cdent: because it's more a placeholder bug saying "meh, logs suck"
14:23:59 Yeah, it kinda felt that way.
14:24:08 Is there something more tractable I could/should work on in the meantime?
14:24:10 personally I don't have a problem with closing bugs out with `NOTABUG'
14:24:27 cdent: I guess you saw johnthetubaguy's mentions of what has been done in the past with that bug ?
14:24:30 I have a bug I would like some insight on, I recently marked mine as a duplicate and posted a comment in this one: https://bugs.launchpad.net/nova/+bug/1469179 - can't really tell if it's the correct room but I got directed here by johnthetubaguy
14:24:31 Launchpad bug 1469179 in OpenStack Compute (nova) "instance.root_gb should be 0 for volume-backed instances" [Undecided,In progress] - Assigned to Feodor Tersin (ftersin)
14:24:38 bauzas: yes
14:25:09 n0ano: agreed, I feel we could close that one by mentioning what has been done already and asking to reopen if something more specific is needed
14:25:22 bauzas, +1
14:25:27 bauzas: +1
14:25:35 feels out of date post last release
14:25:41 okay, I can do that
14:28:02 tobasco, johnthetubaguy not seeing where this bug is scheduler related, I don't know if anyone here has any insight on this
14:28:54 ah, sorry
14:28:59 looking again
14:29:13 this is more resource tracker
14:29:28 that's even a driver-specific problem, nope ?
14:29:37 I was thinking the resource providers work, so we track shared storage better, would improve things
14:29:39 because the RT is dumpb
14:29:41 dumb
14:29:47 johnthetubaguy, which we are effectively re-writing so maybe this bug might become irrelevant
14:29:54 I think this is cinder volumes confusing local storage space
14:29:55 it only persists what the driver is providing to it
14:30:10 n0ano: not exactly
14:30:31 bauzas, but sounds like this might be cinder related then
14:30:31 so I think most folks doing cinder only would disable the diskfilter and not spot the bug
14:30:32 n0ano: IIUC, the problem is about what's provided as disk space when you have a volume-backed instance
14:30:40 bauzas: Nova is reading root_gb for cinder volumes so effectively this is the resource tracker which creates the wrong stats
14:30:44 but if you mix and match, it affects you, I guess
14:31:08 so, to be clear, possibly something has to be done on the RT classes
14:31:29 At the same time this affects the scheduler since the DiskFilter combined with the wrong stats creates bad scheduling
14:31:49 unless I'm wrong, the DiskFilter is not a default filter
14:31:53 The only solution so far is to exclude the DiskFilter or set the disk_allocation_ratio config value
14:32:04 bauzas: The DiskFilter was introduced as a default in Liberty iirc
14:32:08 bauzas: it got added, I think
14:32:21 it might have been dropped since then, lol
14:32:22 tobasco, affects the scheduler but it's still doing the right thing, give it the wrong data and it will correctly make the wrong decision
14:32:26 oh snap, it is indeed
14:32:37 n0ano: yes
14:32:52 so, yeah, the scheduler only schedules based on what the RT provides
14:32:53 so my thought was about resource providers
14:32:57 ++
14:33:02 I just want to shine some more light into this since running a completely Cinder-backed Nova today gives quite a headache
14:33:07 part of that is deciding between cinder vs local resources
14:33:24 zactly, we need jaypipes and cdent at wrok
14:33:25 work
14:33:35 should tobasco maybe have a discussion with some of the cinder people about this?
14:33:36 totes
14:33:44 tobasco: you probably should just disable the disk filter if you are doing that, mind, or is there something missing?
14:33:59 not sure we need cinder folk for this, seems a very Nova issue here
14:34:00 a possible workaround could be docs
14:34:15 ie. specify that the DiskFilter should be disabled if Cinder-backed volumes are used
14:34:23 as a known limitation
14:34:28 if there are no local disks in the BDM, we should ignore the flavor disk_gb in the resource tracker claim, and the scheduler stuff
14:34:28 johnthetubaguy: In our production environment I have disabled the DiskFilter however this still gives me the wrong stats and should be fixed iirc since it's wrong, or at least have a look at it.
14:34:37 *imo
14:34:49 tobasco: ok, true, it's more that it changes the priority
14:35:24 well, I guess the resource-providers epic already has lots of attention
14:35:32 For example, we run our hypervisors on very low disk and having the wrong statistics on what's actually on local disk is quite scary for people that don't know about this.
14:35:43 I feel it's a real bug
14:35:47 with a real problem
14:36:01 and with a possible solution that could be resource-providers
14:36:07 bauzas, yeah but it sounds like this is not really an RT problem, it's being given the wrong data
14:36:25 n0ano: if so, it's not nova
14:36:56 I agree with n0ano; if only instances processed by the resource tracker (and in the database) had root_gb set to 0, neither the scheduler nor the stats would have any issues
14:37:17 root_gb = 0 for Cinder-backed Nova instances, that is, for the root disk.
14:37:30 tobasco, but who should be setting it to 0, nova or cinder?
14:37:46 I remember some conditional in the RT about root_gb, hold on
14:38:51 meh, nevermind
14:38:58 n0ano: I saw some comment about the scheduler using the compute api is_volume_backed_instance function to check and set the root_gb to zero, however I assume this is only theoretical and not tested. I can't really be of very much help, I can troubleshoot code however I'm not yet so familiar with the OpenStack concepts and codebase that I can help out.
14:39:49 I would be more than happy to help out and discuss this issue with other people further if it's needed, I have seen a lot of people having this issue, including myself, so I would like to shine some light on it and get it resolved, that's all :)
14:39:54 bauzas: so it might be the spec object population code
14:39:57 bauzas: https://github.com/openstack/nova/blob/master/nova/scheduler/filters/disk_filter.py#L36
14:40:17 but yeah, let's take this into the openstack-nova channel I guess
14:40:31 +1
14:40:34 tobasco, having the scheduler do this check seems wrong, I'd like to see what cinder says about this
14:40:42 johnthetubaguy, +1 to the nova channel
14:41:19 yeah, I'd be very against any change in the scheduler codebase
14:41:30 tobasco, tnx for bringing this up but I think you'll get a better answer on #nova
14:41:34 IMHO, it's a resource issue, not a placement decision issue
14:41:38 n0ano: it's the spec object population, rather than the scheduler, but yeah, let's talk about that over on the other side
14:41:39 Ok, I'm satisfied; johnthetubaguy, can you please help me take this further, either by helping directly or giving me some hints on how to proceed? I would be glad to help out.
14:42:06 moving on ?
14:42:08 tobasco: yup, sounds like bauzas might be able to help us too ;-)
14:42:13 bauzas, indeed
14:42:15 aye
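A minimal, hypothetical sketch of the idea raised in the discussion above: ignore the flavor's root_gb when the root disk in the block device mapping lands on a volume, so neither the resource tracker claim nor the DiskFilter counts it against local disk. This is not nova code: the helper name is invented, and the dict keys 'boot_index' and 'destination_type' follow the BDM v2 convention but are assumptions here.

    # Illustrative only: how a volume-backed instance's root disk could be
    # excluded from the local disk accounting the scheduler sees.
    def disk_gb_to_claim(flavor_root_gb, flavor_ephemeral_gb, block_device_mappings):
        """Return the local disk (GB) that should count against the host.

        If the root device in the BDM list lands on a volume rather than
        local disk, the flavor's root_gb is not consumed locally and should
        not be claimed or filtered on.
        """
        root_is_volume_backed = any(
            bdm.get('boot_index') == 0 and bdm.get('destination_type') == 'volume'
            for bdm in block_device_mappings
        )
        root_gb = 0 if root_is_volume_backed else flavor_root_gb
        return root_gb + flavor_ephemeral_gb

    # Example: a flavor with a 40 GB root disk booted from a Cinder volume
    bdms = [{'boot_index': 0, 'destination_type': 'volume', 'source_type': 'image'}]
    print(disk_gb_to_claim(40, 0, bdms))  # -> 0, so the DiskFilter would not reject the host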
14:42:17 #topic opens
14:42:22 +1
14:42:27 that's all from me, anything new?
14:43:15 I have a bit of a blue sky question: Has anybody done any experimentation with using pandas DataFrames as the scheduler's data structure?
14:43:22 * cdent is looking for prior art
14:43:30 * cdent doesn't have any immediate plans, just playing
14:43:52 cdent, ed leafe looked at using Cassandra but we dropped that work, talking to him might be good
14:44:22 cdent, there wasn't really much interest in changing the back end
14:44:28 the selection loop didn't seem that slow, compared to overall time in the scheduler, so I stopped looking at that bit of the scheduler
14:44:34 * cdent nods
14:44:47 what johnthetubaguy said
14:45:02 we want to have a scalable scheduler from zero to beyond infinite
14:45:02 It's more of an exercise in understanding the conceptual stuff, not really coming up with a new solution
14:45:13 there is a fun unit test
14:45:31 cdent: luckily, any out-of-tree implementation can be done
14:45:46 cdent: you just need to implement the interfaces and rock on
14:45:50 * cdent nods
14:46:15 cdent, but expect issues if you want to merge such a thing back in
14:46:18 cdent: the main advantage of the cassandra design was that the scheduler claimed the resources
14:46:31 * bauzas could joke about the numpy dependency tho
14:46:34 cdent: it eliminated the raciness
14:47:21 cdent: https://github.com/openstack/nova/blob/master/nova/tests/unit/scheduler/test_caching_scheduler.py#L205
14:47:53 thanks johnthetubaguy
14:48:02 anything else?
14:49:17 hearing more crickets
14:49:45 I think we're done
14:49:51 tnx everyone, we'll talk again next week
14:49:54 #endmeeting
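On the blue sky question about pandas: no prior art to point at, but a toy sketch of what host filtering could look like with a DataFrame as the in-memory host state. The column names and numbers are invented for illustration; this is not an existing nova scheduler backend.

    # Toy sketch only: host filtering over a pandas DataFrame.
    import pandas as pd

    hosts = pd.DataFrame([
        {'host': 'node1', 'free_ram_mb': 8192,  'free_disk_gb': 120, 'vcpus_free': 4},
        {'host': 'node2', 'free_ram_mb': 2048,  'free_disk_gb': 300, 'vcpus_free': 8},
        {'host': 'node3', 'free_ram_mb': 16384, 'free_disk_gb': 10,  'vcpus_free': 2},
    ])

    # A "filter" is just a boolean mask; "weighing" is a sort.
    req = {'ram_mb': 4096, 'disk_gb': 20, 'vcpus': 2}
    candidates = hosts[
        (hosts.free_ram_mb >= req['ram_mb'])
        & (hosts.free_disk_gb >= req['disk_gb'])
        & (hosts.vcpus_free >= req['vcpus'])
    ].sort_values('free_ram_mb', ascending=False)

    print(candidates.host.tolist())  # -> ['node1']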