14:02:25 #startmeeting nova_scheduler
14:02:25 Meeting started Mon Feb 22 14:02:25 2016 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:29 The meeting name has been set to 'nova_scheduler'
14:02:31 #chair edleafe
14:02:32 Current chairs: bauzas edleafe
14:02:33 nah
14:02:45 #chair cdent
14:02:46 Current chairs: bauzas cdent edleafe
14:02:52 because he named me
14:02:57 heh
14:02:58 chairs al around!
14:03:01 all
14:03:07 supp' ?
14:03:18 is there an agenda prepared?
14:03:27 heh, guessing n0ano's one
14:03:32 bugs, features and open
14:03:34 so
14:03:44 #topic bugs (because we don't like'em)
14:03:53 so?
14:04:01 anything to notice ?
14:04:22 none that I recall
14:04:25 AFAIK, there was some initiative from n0ano to figure some Intel/RAX folks to help us
14:04:32 but I haven't heard more than that
14:04:51 * bauzas is checking the bug list
14:05:29 #info https://bugs.launchpad.net/nova/+bugs?field.tag=scheduler
14:05:35 I'll try to fix 1523450, 1523459, 1523506, 1515870(1517770)
14:05:59 ok, the last triaged bug is 16 years old
14:06:02 oops
14:06:08 s/years/days :D
14:06:26 Yingxin: cool, ping us anytime if you need further help or guidance
14:06:32 https://bugs.launchpad.net/nova/+bug/1523506 I don't know whether it is actually a bug to fix.
14:06:32 Launchpad bug 1523506 in OpenStack Compute (nova) "hosts within two availability zones" [Undecided,Incomplete] - Assigned to Yingxin (cyx1231st)
14:07:04 Yingxin: okay, I'll look into that one
14:07:06 DO we have more detail on what is needed for https://bugs.launchpad.net/nova/+bug/1431291 ?
14:07:06 Launchpad bug 1431291 in OpenStack Compute (nova) "Scheduler Failures are no longer logged with enough detail for a site admin to do problem determination" [High,Incomplete] - Assigned to Pranav Salunke (dguitarbite)
14:07:06 Yingxin: good evening!
14:07:18 Yingxin: but I fixed most of the races like 1.5yrs ago
14:07:30 jaypipes: good evening~
14:07:51 bauzas: I think I've found another one :P
14:07:57 Yingxin: ping me tomorrow morning EU if you wish and I'll triage https://bugs.launchpad.net/nova/+bug/1523506
14:07:57 Launchpad bug 1523506 in OpenStack Compute (nova) "hosts within two availability zones" [Undecided,Incomplete] - Assigned to Yingxin (cyx1231st)
14:08:13 Yingxin: interesting, but I doubt :p
14:08:18 bauzas: ok
14:08:43 edleafe: well, that one is Incomplete, so... :D
14:09:07 <_gryf> i've been working on that one: https://bugs.launchpad.net/nova/+bug/1442024 didn't able to reproduce it, scenario and all steps i've performed are as a comment. no one is complained so far
14:09:07 Launchpad bug 1442024 in OpenStack Compute (nova) "AvailabilityZoneFilter does not filter when doing live migration" [Medium,Invalid] - Assigned to Roman Dobosz (roman-dobosz)
14:09:17 edleafe: see https://bugs.launchpad.net/nova/+bug/1431291/comments/22
14:09:17 Launchpad bug 1431291 in OpenStack Compute (nova) "Scheduler Failures are no longer logged with enough detail for a site admin to do problem determination" [High,Incomplete] - Assigned to Pranav Salunke (dguitarbite)
14:09:27 bauzas: exactly. What would it take to give ops a good enough understanding?
14:09:52 edleafe: my point is that we need actionable items and that bug reports doesn't
14:09:57 We enhanced the logging - what else are they asking for?
14:09:58 so, it's incomplete
14:10:09 yeah, we can leave that one rest in peace IMHO
14:10:17 it's an invalid, you don't need to care about it
14:10:28 But it was a High, too
14:10:40 and ? :)
14:10:54 anywaty
14:11:06 _gryf: cool, thanks for helpiugn
14:11:09 moving on ?
14:11:18 * bauzas has fat fingers today
14:11:46 * edleafe gets bauzas a keyboard with bigger keys
14:11:54 I'll ask for AZERTY
14:11:58 anyway
14:12:11 #topic features and blueprints (because we like'em)
14:12:30 so big thread here
14:12:42 who shoots first?
14:13:13 bauzas: i can provide a quick update on resource-providers progress.
14:13:18 \o/
14:13:22 jaypipes: shoot
14:13:45 btw. lemme put your ML report here
14:14:11 #info http://lists.openstack.org/pipermail/openstack-dev/2016-February/086371.html
14:14:21 jaypipes: you got the mic
14:14:23 bauzas: I pushed up a new revision of the generic-resource-pools blueprint that changes the expected schema slightly (removes the resource_pools table and adds a couple fields to the resource_providers table).
14:14:37 jaypipes: yeah saw that one, it's in my pipe
14:15:01 bauzas: It also had changes to remove the external_id field and forces the use of --aggregate-uuid option in nova resource-pool-create
14:15:04 cdent: I guess you're modifying your series to match with that ?
14:15:12 bauzas: yes
14:15:19 this was based on discussions with superdan, alaski and cdent on Friday
14:15:20 jaypipes: I saw, I began to mark some notes but not uploaded yet
14:15:28 first patch is up (to adjust the models)
14:15:50 jaypipes: actually, lemme see if my notes are for the current PS or not
14:16:16 oh, I uploaded them
14:16:21 https://review.openstack.org/#/c/253187/11/specs/mitaka/approved/generic-resource-pools.rst
14:16:23 oops
14:16:24 #link https://review.openstack.org/#/c/253187/11/specs/mitaka/approved/generic-resource-pools.rst
14:16:33 dstepanenko continues his work on the pci-generate-stats blueprint. reviews welcome on that: https://review.openstack.org/#/q/topic:bp/pci-stats-generate,n,z
14:16:59 <_gryf> is the bp about resource pools (and implementation) at risk due to feature freeze?
14:17:20 _gryf: no. I believe we will be able to complete that one.
14:17:41 jaypipes: I had some concerns about the increase of complexity that BP was having
14:17:43 _gryf: the compute-node-inventory one is slightly at risk but we're trying our best to get most of that pushed.
14:17:53 it introduces a REST API
14:17:55 bauzas: it actually has *less* complexity than before.
14:18:01 which I agree
14:18:03 bauzas: yes.
14:18:16 <_gryf> jaypipes, ok, cool. if you require any help on that, just ping me on irc.
14:18:52 jaypipes: so for example, I was pointing out https://review.openstack.org/#/c/253187/11/specs/mitaka/approved/generic-resource-pools.rst@245
14:19:04 (just discovered that we can tag a specific line in a review, woot)
14:20:22 jaypipes: tbc, while I'm a big fan of your series, I just feel those need to be very described about what are the impact for the existing
14:20:25 bauzas: so, your comment there... we *already* pull all aggregate information in the call to select_destinations().
14:20:42 indeed, but then we filter out
14:20:55 which is per-host
14:21:11 I don't understand your point.
14:21:16 so, I need to make sure that what you want to modify is the only dummy ComputeNode.get_all() call
14:21:56 bauzas: I'm not prescribing anything there other than a long-term use case to be satisfied that isn't at all what the scheduler currently supports.
14:23:41 jaypipes: sorry if I'm unclear or misunderstood something, I just want to understand what will change and what will stay :)
14:24:41 bauzas: and I'd like to get some of these blueprints approved this cycle... I am struggling to add the level of detail you are asking for in all 6 of the blueprints in this series.
14:25:06 bauzas: there comes a point when we need to be able to amend a blueprint after agreeing on the direction.
14:25:27 jaypipes: that's a good point
14:25:36 jaypipes: agreed
14:26:34 bauzas: and I understand your concern around making any changes that require a refactoring of the filter scheduler.
14:27:24 jaypipes: again, I'm liking your direction, I'm just somehow struggling with operators impact - but we can figure that out later
14:27:51 we always need to think about the aim of the process, if there are details that are best delayed till you see the code, then thats fine
14:28:02 and cdent's patch series are worth reviewing them to see the impacts
14:28:12 bauzas: you are describing concerns about something that is marked as a future use-case that isn't currently supported by Nova. so the impact to operators is non-existent.
14:28:17 johnthetubaguy: agreed
14:29:25 so there are some upgrade worries around not sorting out the future use case, but at some point we need to just make some forward progress, and fix things as we go
14:29:32 jaypipes: okay, it seems we can discuss that offline and see how we can match
14:29:49 so I'd like to address the comments from cdent and bauzas in the next revision and get that pushed ASAP (as in less than an hour). And at that point I'd like an up/down vote on it, if we could manage that.
14:30:07 jaypipes: sounds like a good plan
14:30:21 ++
14:30:41 not to derail things, do we want to delay the scheduler API to newton at this point?
14:31:27 johnthetubaguy: no.
14:31:28 I think dansmith has some opinions on that johnthetubaguy
14:31:46 jaypipes: so creating a new endpoint by end of M-3 ?
14:31:54 johnthetubaguy: or at least if "scheduler API" means "support for the resource-pools stuff"
14:31:59 bauzas: yes
14:32:29 well
14:33:15 okay, it seems that we have a plan, moving on then ?
14:34:45 jaypipes: so, given that FF is in 2 weeks, it means that I need somehow to find more review time than the expected one for the next 2 weeks :)
14:35:00 but if you feel that's doable, then okay
14:35:26 so its normally at this point I -2 all blueprints that don't currently have all their code up for review, for context
14:35:40 but I want us to make progress here, and we should keep trying for that
14:35:44 so lets see what we can do
14:36:05 ++
14:36:18 I have cycles available to help out, too
14:36:30 the only reason I bring up the API, is I think we could get that first bit done
14:36:42 johnthetubaguy: ++
14:36:43 but adding the API seems like a mountain too far at this point
14:37:00 johnthetubaguy: I guess I disagree.
14:37:27 johnthetubaguy: plus, there's zero benefit to this work if there's no REST API that things can use to create shared pools of resources.
14:38:00 the benefit is we can add the rest API on top without having to implement the underneath bits
14:38:30 honestly, I do wonder about a nova-manage hack to let folks test out the new thing, while we agree a REST API
14:39:12 jaypipes: the rest api bit is really only required for the things we said are newton anyway right?
14:39:41 I think the shared storage stuff, kinda needs it, unless you slurp into the DB via a back door
14:39:50 right.
14:40:01 you need some way of adding those records.
14:40:16 right, but in mitaka, our only providers are internal -- compute nodes
14:40:23 but I would rather have the back door via nova-manage than a quickly written API, and marking those calls as experimental, will be removed, etc, etc
14:40:57 if that's what you want, that's fine.
14:41:17 johnthetubaguy: I don't even think we need that
14:41:17 so lets step back, if we get only internal providers sorted for mitaka, we have made a massive step forward, compared to what it looked like two months ago
14:41:20 I'm just a little weary from the analysis paralysis that's happened so far.
14:43:28 folks, that's very important conversation, and I feel we need to make an agreement, but could we move that offline ?
14:43:41 we're 15 mins away from the end of that meeting
14:43:48 so this is release critical right, how do we keep moving forward on this work
14:44:20 cdent: do we have any blockers to get compute's using the resource provider concept in mitaka, at this point?
14:44:37 johnthetubaguy: just the actual work and review.. thing major in the way, IMHO
14:44:48 if we drop the API, could we get in the supporting infrastructure for pools, even if its not useable?
14:45:03 I can see some in-flights patches
14:45:19 sec, pointing out the series
14:45:28 johnthetubaguy: I've been targeting resource-pools as my goal, not entirely certain on the status of compute providers without doing some digging
14:45:36 johnthetubaguy: the generic pool stuff we want for mitaka is nearly merged, and the rest is out for newton, AFAIK
14:45:45 #link https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/generic-resource-pools
14:46:07 ^ that is the generic-rp implementation patches
14:46:37 OK, so if we cut off the API bits off the top, where are we?
14:46:39 bauzas: that's only a small part.
14:46:42 mostly agreed?
14:47:21 https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/resource-providers seems Implemented to me
14:47:25 cdent: jaypipes: right?
14:47:30 johnthetubaguy: we still need the resource-tracker pieces, a nova-manage tool to add a resource pool, the work to change the scheduler to look at the resource pool inventory instead of the compute node's out-of-whack view of the shared resources.
14:48:11 jaypipes: that's all stuff for newton, yes?
14:48:29 dansmith: I was really hoping to have it in mitaka :(
14:48:54 jaypipes: last week on the hangout we said that was newton stuff... I'm confused
14:49:18 I feel like there's no _way_ that is all happening in mitaka
14:49:21 so that was our main disagreement post midcycle, I guess, I thought we agreed a different set of things, seems maybe not
14:49:25 dansmith: I don't remember that decision. I was referring to resource-providers-allocations and resource-providers-scheduler blueprints being in Newton.
14:50:14 well, I have a hard time using those specs as terms in a discussion.. so many specs makes it confusing.. so I've been talking in terms of actual work items, so maybe that's the problem
14:50:14 dansmith: if the three steps above are not done in Mitaka, there's no value at all to any of the patches, since nothing will be fixed w.r.t. how shared resources are tracked.
14:50:27 that's not true
14:50:42 so we just agreed we should make progress where we can
14:50:44 like I said before, the value is getting the online migrations of compute uuids, compute inventory records created, etc
14:50:49 so that in newton when we go to actually use them,
14:50:51 even if thats not end user visible
14:51:00 mitaka computes are already doing that and we don't need a dependency
14:51:09 dansmith: ah, right, the migrations, that is very visible
14:51:22 yeah, having the migrations completed, will make a big difference in terms of complexity
14:51:36 right, that is what I've been shooting hard for
14:51:37 like it maybe half the complexity
14:51:45 but shared resources will still be totally broken in mitaka. ok...
14:51:51 cdent: does this make sense from where you stand?
14:52:12 getting those migrations in place, that is
14:52:22 jaypipes: right, I don't think we're going to make any actual resource tracking improvement in mitaka.. there's just no time
14:52:38 but if we don
14:52:43 don't do this bit in mitaka,
14:52:44 we have like one week left of a working gate, at this point
14:52:52 johnthetubaguy: so, I was hoping to get a bit further, but I agree that getting migrations and models in place before the end of the cycle is the critical part
14:52:53 +1 for iterating fast on the compute stuff so we could avoid online migrations
14:52:54 dansmith: but you *do* think we should get the compute-node-inventory blueprint completed in mitaka?
14:52:55 we won't be able to reasonably make the improvement in newton either, I expect
14:53:44 jaypipes: again, I can't keep track of the blueprints :) .. I think we need compute nodes recording their inventories in the new place in mitaka, yes, but it won't be read by anything (right?) until newton
14:53:47 fuck I hate 6 month releases :(
14:54:03 dansmith: ++
14:54:24 jaypipes: if it helps the operators at the meet up hate them just as much, but the other way around
14:54:53 we do release every commit though, but lets not go into that hole
14:54:59 so for mitaka...
14:55:14 operators will always ask for stability and features at the same time, though.
14:55:20 get the DB in the right shape to accept the data we want to put in there for newton?
14:55:27 sure
14:55:49 ++
14:56:02 because we're 5 mins away, I'll take my chair cap and cut
14:56:02 I think the wibble there is do we add the resource pools bits as well, and since they don't need migrations (?) its not really an issue?
14:56:04 can we at least merge the generic-resource-pools blueprint though? that adds some necessary fields to the resource_providers table that will be needed.
14:56:30 jaypipes: I think it would be easier to merge if the API stuff were separate
14:56:31 jaypipes: sure, just put a new rev and I'll vote on it
14:57:00 I basically agree with the rest of that, at least
14:57:06 +1
14:57:14 #topic open questions
14:57:17 3 mins left
14:57:18 the API I just feel like I haven't fully understood it yet
14:57:25 johnthetubaguy: sigh, ok, yet another blueprint... let me separate the two resource_providers columns into yet another blueprint and then separate out the API bits into yet another blueprint.
14:57:27 anyone for anything ?
14:58:11 and I still need to split resource-providers-scheduler blueprint into two so that bauzas and I can argue about whether the compute node owns its inventory of resources on a separate blueprint.
14:58:22 jaypipes: I appreciate that :-)
14:58:26 so that will make 9 separate blueprints for this. awesome.
14:58:49 so half of those could be combined, but they are separate now, but lets take that offline
14:58:53 ++
14:58:55 * jaypipes goes to get food before he gets more grumpy.
15:00:14 okay, nothing raised, bye folks, we can continue the convo in -nova
15:00:17 #endmeeting