14:00:10 #startmeeting nova_scheduler
14:00:11 Meeting started Mon Jan 30 14:00:10 2017 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 The meeting name has been set to 'nova_scheduler'
14:00:16 o/
14:00:18 <_gryf> o/
14:00:24 Good UGT morning everyone!
14:00:29 \o
14:00:30 o/
14:00:37 \o
14:02:48 o/
14:03:00 OK, let's get started
14:03:08 #topic Specs & Reviews
14:03:17 The first item is: Prevent compute crash on discovery failure https://review.openstack.org/#/c/422780
14:03:25 But that's already merged, so...
14:03:42 <_gryf> oh
14:03:53 <_gryf> it was put on scheduler meeting last time
14:03:58 <_gryf> *on agenda
14:04:06 let's discuss the patch for updating the resource tracker for Ironic
14:04:09 #link https://review.openstack.org/#/c/404472/
14:04:27 _gryf: no worries
14:04:28 \o
14:04:41 Here is the basic problem we found:
14:04:45 I just posted a question about ^
14:04:58 about why we stop providing DB modifications
14:05:05 This patch changes how the RT reports Ironic inventory
14:05:12 the flavor doesn't contain the node.resource_class and therefore NoValidHosts is returned for any request?
14:05:36 o/ sorry for being late, stuck in traffic
14:05:36 We stop reporting the old VCPU-style values, and instead report 1 of the particular class of Ironic hardware
14:05:43 bauzas: ++
14:06:04 But the placement API cannot select based on custom resource classes (yet)
14:06:32 yeah, that's problematic
14:06:36 So once an ironic node is reporting new-style inventory, it cannot be selected. It's essentially invisible to placement
14:06:43 edleafe: well, there's nothing about custom resource classes that the placement API cannot select on. It's just that there is no mapping between flavor and ironic custom resource class yet.
14:07:03 jaypipes: yes, that's another way to say the same thing
14:07:06 lemme clarify my question
14:07:16 if we stop reporting the old way for new Ironic nodes
14:07:35 edleafe: cdent told me about the idea to just have BOTH the new custom resource class AND the old VCPU/MEMORY_MB/DISK_GB inventory records for a time period. I think that's a good idea.
14:07:40 then how could we possibly have ComputeCapabilitiesFilter using the HostState ?
14:07:57 which is one of the main filters ironic operators use
14:08:03 jaypipes: that's what I've been saying since the beginning, and I thought that's what this patch did in the beginning
14:08:12 (at least until we have traits)
14:08:16 jaypipes: that was our temporary fix for now
14:08:25 bauzas: totally different. one is a qualitative filter (ComputeCapabilitiesFilter) and the other is a quantitative filter.
14:08:40 jaypipes: fair enough
14:08:53 jroll: yes, I know you've been saying that from the beginning :( it slipped my mind. :(
14:08:54 jaypipes: but ComputeCapabilitiesFilter uses the HostState to know capabilities
14:08:57 so there are two things here: (1) keep current stuff working (2) replace all existing features
14:09:10 I think (1) is the first step, which reporting both old and new helps with
14:09:19 johnthetubaguy: +1
14:09:40 ++
14:09:43 I think we should still provide the current DB modifications even if we have new Ironic nodes
14:10:03 bauzas: that's the current path, I believe
14:10:07 bauzas: I don't know what you mean by "current DB modifications"?
14:10:13 bauzas: yes, we have to, we can't schedule on the new-style stuff
14:10:41 jaypipes: compute_nodes table is how I understood that
14:10:49 ah
14:10:52 existing compute_node.save()?
14:10:56 johnthetubaguy: unless I missed that, I think we stop using it
14:11:07 https://review.openstack.org/#/c/404472/26/nova/compute/resource_tracker.py@540
14:11:18 bauzas: g
14:11:29 bauzas: good point, that is bypassed, I missed that
14:11:51 ok, if that's just a mistake, I'm fine then
14:11:57 bauzas: right, I believe the suggestion was to continue calling update_resource_stats() *in addition to* doing the new custom resource class inventory. correct, johnthetubaguy, edleafe and cdent?
14:12:13 I thought it was rather a design point
14:12:32 saying that we should stop reporting the old way for new nodes, which I disagree with
14:12:52 we need all the old things to keep happening
14:12:59 we can decide what that looks like in the code
14:13:17 sure, my question wasn't about an implementation detail
14:13:20 We *eventually* want to stop reporting the old way
14:13:25 We just aren't there yet
14:13:31 rather a discussion to make sure we all agree that we need to support the old way
14:13:45 okay, seems to me we're all violently agreeing
14:13:46 +1 for both, that's what we were doing way back in patchset 4, not sure where it got lost
14:13:53 bauzas: +1
14:13:55 bauzas has spotted a bigger issue, which I think jaypipes is touching on: we still need to call compute_node.save()
14:13:58 bauzas: yeah
14:14:00 it's just an implementation mistake, period.
14:14:10 gotcha
14:14:54 that said
14:15:01 what concerns me is how we cover that
14:15:04 I mean
14:15:16 it seems to me Jenkins was fine with that
14:15:31 so there is another piece, and that's: what do we *need* in ocata
14:15:36 but, if we had merged it, then it could have been a problem for operators
14:15:47 I just want to make sure we are testing our path
14:16:01 johnthetubaguy: right
14:16:05 bauzas: jenkins failed on the last version, not sure about this updated one
14:16:06 and RC1 is in 3 days
14:16:45 so in pike, if we fixed the whole world... it would be good if the new resources were already there
14:16:53 yeah
14:17:01 because it's also a compute modification
14:17:09 but don't we also need the instance claims to be present for the new resources, else we still can't schedule on just the new resources?
14:17:19 s/claims/allocations/
14:17:27 meaning that we need to also support N-1 computes not reporting ironic resources the new way
14:17:47 which would defer the placement API being able to schedule something for Ironic until Queens
14:17:50 bauzas: remember the current CI job does not set the resource_class on ironic nodes, so it doesn't hit the new code path; that's what that test patch and my WIP experimental job are for
14:17:54 and yep, +1
14:18:26 (also because the flavors for this won't work until pike)
14:18:30 So is there any point of disagreement about the problem?
14:18:44 so here I'm thinking of some way to not change everything, but just report what's needed
14:18:45 If not, we can start discussing the path forward to fix it
14:18:51 well, if new resources present means new allocations are present, we might be able to transition inside one release
14:18:56 a very small and non-invasive patch that would stop sending things
14:19:02 if we report things incorrectly today, that will make it worse
14:19:09 right
14:19:34 that said, we can still fix things in a point release
14:19:47 and ask operators to deploy the latest point release on computes before upgrading
14:20:24 here, I want to begin reporting Ironic things in Ocata, without really changing too much
14:20:36 so... my question is really about this bit: https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L128
14:20:59 it feels like for ironic, what we want to do is claim all resources for the chosen node, regardless of what the flavor says
14:21:13 johnthetubaguy: FWIW, there is a bug with https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L138
14:21:20 because swap is miscounted
14:21:28 but meh
14:21:28 johnthetubaguy, bauzas: hold up, I think you're overthinking this.
14:21:42 quite possibly
14:21:49 * bauzas is my mark of fabric
14:22:19 I mean my trademark
14:22:42 bauzas, johnthetubaguy: if we simply keep the call to update_resource_stats() and in addition to that we just add an allocation record for the custom resource class (if present in ironic) then we should be fine.
14:22:55 sec, finding link.
14:23:28 jaypipes: that's what I was calling for :)
14:23:31 line 695 here: https://review.openstack.org/#/c/404472/26/nova/compute/resource_tracker.py
14:23:34 but, when we come to try and place things next cycle, all the ironic nodes will be showing free resources
14:23:56 bauzas, johnthetubaguy: so instead of deleting old allocations, we simply create the provider allocations if ironic.node_class is present.
14:23:56 I just want the least invasive thing that would just start reporting things in Ocata
14:24:39 I need to disappear in literally 2 mins :(
14:25:12 how about I just fix this up and push a change.
14:25:14 jaypipes: we totally can't delete old allocations, we are agreed there
14:25:15 gimme about an hour.
14:25:23 so there is still a problem here, for next cycle
14:25:38 although not sure it's as bad as I first thought
14:25:46 johnthetubaguy: if we can get allocations and inventory (both old and new) being written in Ocata for Ironic, I'd be happy.
14:26:05 yeah, me too
14:26:11 me three
14:26:13 ok, gimme an hour.
14:26:13 +1
14:26:16 if we do allocations and inventory
14:26:17 * jaypipes codes
14:26:28 gotta love PTO.
14:26:42 jaypipes: you could have timed it better
14:26:44 can't do half of it, that's my main concern
14:26:48 heh
14:27:00 I need to bail out, folks \o
14:27:12 ciao
14:27:21 thanks bauzas
14:27:46 jaypipes: ping me once you're done, it's now my top-prio review patch
14:27:53 bauzas: will do.
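
To make the "report BOTH old and new inventory" idea above concrete: a rough sketch of what a single ironic node's placement inventory could look like under dual reporting. The class name CUSTOM_BAREMETAL_GOLD and all numbers are hypothetical; this shows the shape of the idea, not the actual patch.

    # Hypothetical dual-reporting inventory for one ironic node.
    inventories = {
        # Legacy quantitative records the filter scheduler still relies on
        "VCPU": {"total": 8, "reserved": 0, "min_unit": 1, "max_unit": 8,
                 "step_size": 1, "allocation_ratio": 1.0},
        "MEMORY_MB": {"total": 16384, "reserved": 0, "min_unit": 1,
                      "max_unit": 16384, "step_size": 1,
                      "allocation_ratio": 1.0},
        "DISK_GB": {"total": 500, "reserved": 0, "min_unit": 1,
                    "max_unit": 500, "step_size": 1,
                    "allocation_ratio": 1.0},
        # New-style record: exactly one unit of the node's custom resource
        # class, since an ironic node is always consumed whole
        "CUSTOM_BAREMETAL_GOLD": {"total": 1, "reserved": 0, "min_unit": 1,
                                  "max_unit": 1, "step_size": 1,
                                  "allocation_ratio": 1.0},
    }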
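The resource tracker flow the meeting converged on would look roughly like the sketch below, assuming the Ocata-era scheduler report client; push_custom_class_inventory() is a hypothetical helper standing in for whatever the patch ends up doing.

    # Minimal sketch of the agreed approach, not the real patch.
    def update_ironic_node(compute_node, reportclient, node_resource_class):
        # Keep the legacy path: persist the compute_nodes row and report
        # the old-style VCPU/MEMORY_MB/DISK_GB inventory (the step the
        # patch under review accidentally bypassed).
        compute_node.save()
        reportclient.update_resource_stats(compute_node)

        # Additionally report one unit of the custom resource class when
        # the ironic node has one set, so placement can take over once
        # flavors can be mapped to that class.
        if node_resource_class:
            push_custom_class_inventory(compute_node, node_resource_class)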
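And for jaypipes' point about adding (not replacing) allocations: a rough sketch of the combined allocation record for an instance on an ironic node, in the shape the Ocata-era placement API accepted for PUT /allocations/{consumer_uuid}. The provider UUID, class name, and amounts are hypothetical.

    # Hypothetical combined allocation: old quantitative allocations kept,
    # custom-class allocation added alongside them.
    allocation_body = {
        "allocations": [
            {
                "resource_provider": {
                    "uuid": "f6486ac5-4c88-4b1a-9d8f-1f7a1b2c3d4e",
                },
                "resources": {
                    "VCPU": 8,
                    "MEMORY_MB": 16384,
                    "DISK_GB": 500,
                    "CUSTOM_BAREMETAL_GOLD": 1,
                },
            },
        ],
    }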
14:28:21 Looks like we have a plan
14:28:56 One other thing that I'd like to mention is that we could have avoided a lot of this if we had a true functional test in place ahead of time
14:29:25 IOW, something that would create an ironic node, have it report resources, and then have it selected by the scheduler
14:29:43 At the very least it would have identified the holes that needed filling in
14:29:56 such as the flavor extra-specs stuff
14:30:11 +many
14:32:20 Let's move on
14:32:24 #topic Bugs
14:32:28 Nothing on the agenda
14:32:40 I've got a couple of bug-related fixes that would be nice to get in
14:32:40 Anyone have anything to point out, bug-wise?
14:32:49 https://review.openstack.org/#/c/414230/
14:32:56 (and the one above it)
14:33:04 nothing super serious, but useful for debugging
14:34:05 Yes, let's get that in.
14:34:08 Anything else
14:34:10 ?
14:34:19 don't think so
14:34:37 Seems like everyone left after the big discussion :)
14:34:42 #topic Opens
14:34:54 So.... what's on your mind?
14:34:57 couple things
14:34:57 :)
14:35:14 cors didn't make it into the eyes of cores: https://review.openstack.org/#/c/392891/
14:35:28 but we agreed (in various places) that we would prefer to have it in ocata
14:35:32 but I guess it's too late now
14:35:55 the other thing is docs: I'm going to need help on the api-ref: https://review.openstack.org/#/c/409340/
14:36:11 not just writing the docs but also creating a gate job to draft and publish the docs
14:36:52 #link https://review.openstack.org/#/c/392891/
14:36:55 * cdent watches everyone leaping at once
14:36:56 #link https://review.openstack.org/#/c/409340/
14:37:45 cdent: Of course - everyone loves working on doc infrastructure!
14:38:25 ikr
14:38:55 OK, then I think since everyone's gone, let's get back to work
14:38:58 #endmeeting