15:01:02 #startmeeting gantt
15:01:03 Meeting started Tue Sep 2 15:01:02 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:07 The meeting name has been set to 'gantt'
15:01:13 anyone here to talk about the scheduler?
15:01:18 \o
15:01:21 yes
15:02:04 bauzas, you're supposed to be catching up on sleep since you don't get any for about the next 6 months :-)
15:02:22 n0ano: eh
15:02:36 n0ano: last night was pretty good for me, not for my wife ;)
15:03:07 in my family a bad night for one pretty much meant a bad night for the other
15:03:40 but just coming back from 3 days PTO :)
15:03:45 n0ano: :)
15:03:49 anyway, let's get started...
15:04:01 jaypipes seems to be not around
15:04:03 occurs to me I put the items in the wrong order so let's start with
15:04:08 #topic next steps
15:04:13 eh eh
15:04:31 as everyone should be aware the isolate scheduler DB bp was rejected for Juno...
15:04:52 I've started a thread to address what I think are problems with the current approval process...
15:04:54 n0ano: yeah, thanks for raising up the discussion in the list
15:05:18 ignoring that, what do we do to keep momentum for gantt going...
15:05:19 n0ano: as I'm the proposer, I was expecting to not reply on that specific problem
15:06:02 n0ano: here, the problem was about the -2 going one week before FPF
15:06:24 my thought is that this is a bump but not a major blockage, as soon as Kilo opens up we will just re-propose the BP, hopefully get it approved as people have more time to think about it, and we still do the split early in the Kilo cycle
15:06:32 n0ano: so I guess that discussion is related to all the ones about how we can improve design summits, reviewing slots etc.
15:06:51 n0ano: well, that's not that easy
15:06:59 n0ano: the concerns were good
15:07:12 n0ano: I mean, we were having a dependency on ERT
15:07:25 n0ano: and the current interface is not so good
15:07:40 n0ano: so, I'm thinking about proposing a new iteration as we discussed last week
15:07:49 bauzas, I'm hoping ERT will be resolved this week and then we can fix our BP, with or without ERT as appropriate
15:07:58 n0ano: oh, you didn't catch them
15:08:23 n0ano: ERT revert patch is abandoned, but Scheduler related one has been -2 today
15:08:30 I was kind of fixated on the BP review process, what did I miss
15:08:43 n0ano: and today is FF, so that means the scheduler side won't be there until Kilo opens
15:09:01 n0ano: one sec, giving you the reviews
15:09:15 * bauzas is coming back from vacation but still on page, eh ;)
15:09:23 bauzas, that's a given so, as I said, we get prepared and re-propose our BP as soon as Kilo opens
15:10:10 https://review.openstack.org/#/c/61773/ ERT scheduler side has been -2 for Juno
15:10:30 n0ano: that's what I'm saying, I think we need to look at other ways to do the bp
15:10:53 bauzas, question is, what are the odds that they will re-propose ERT?
15:11:05 n0ano: for both the reasons that ERT won't be there when Kilo opens, and also because the concerns were good
15:11:30 n0ano: I guess Paul will ask for a FFE or resubmit it for Kilo
15:11:40 n0ano: but I can't speak on behalf of it
15:12:08 so the question is should we wait for Paul to resolve ERT or should we just work around it
15:12:12 s/it/him (Paul is not an object, although he's working on some patches about them...)
15:12:29 n0ano: as we discussed last week, I will propose another way
15:12:45 n0ano: the idea is to pass Objects from RT to Sched
15:12:54 personally, I would prefer to not wait, I would prefer to work around it and, if need be, redo things when/if ERT is finalized.
15:13:03 bauzas, that works for me
15:13:21 n0ano: well, the design is really different if we don't go with ERT
15:13:35 n0ano: and IMHO, that's more "resilient" in terms of community approval :)
15:13:55 yeah but I don't feel we should wait on it, deal with ERT only when it's finalized
15:14:23 n0ano: agreed that's top prio, but we need to sort some things previously
15:15:28 n0ano: seriously, the current way is pretty crap...
15:15:52 so, if I understand, you're going to update the current BP and then we can look at implementing based upon that new design - right?
15:16:05 n0ano: yeah
15:16:18 n0ano: I was just expecting more support from others than just you and me
15:16:21 :)
15:16:28 n0ano: ie. I need to discuss with jay first
15:16:35 #action bauzas to update the isolate scheduler DB BP
15:16:42 thanks :)
15:17:02 will mark it as WIP, until Kilo spec template is there
15:17:20 because there are chances that the template will change
15:17:28 we're in agreement, getting some sort of consensus from Jay would be good but I'm willing to push through others if that's a problem.
15:17:36 n0ano: indeed
15:17:53 n0ano: hence I'm waiting to see what will be the Kilo process for blueprints
15:18:05 bauzas, I doubt the template will change dramitically, just a tweak or two
15:18:07 n0ano: because atm that's unclear
15:18:36 s/dramit/dramat
15:18:36 n0ano: I'm not really waiting for the template, I'm waiting for big decisions like runways and design summit reboot
15:18:43 * n0ano needs to work on spellin skills
15:18:57 * n0ano and typing skills
15:19:11 n0ano: the design summit format will change for Paris
15:19:41 n0ano: and the cores's reviewing process will probably change for Kilo
15:20:02 the format will change but we will still push for gantt no matter what the procedure we have to follow
15:20:50 as I've said in the past, no one has objected to gantt, the only issues have been how & when
15:20:53 n0ano: of course, but coming from a patch where 54 iterations were necessary for having merged it, I can just say that sometimes, you need to have visibility before doing anything
15:21:21 bauzas, completely agree (don't know whether to laugh or cry over that)
15:21:24 n0ano: the "how" is sometimes requiring more than 3 months for getting it done
15:21:51 remember, I started the gantt work over a year ago and we're still debating it
15:21:54 n0ano: the golden rule for Nova is patience
15:22:15 n0ano: indeed, I was sitting 2 rows behind you in HKG :)
15:22:39 HKG, I started this before Portland :-)
15:22:54 then that's not 1 year... :)
15:23:35 anyway, reviewing the NUMA patches was pretty worth it
15:23:45 I now totally understand ndipanov's concerns
15:24:00 Oh, you're right, I can't count, it was after Grizzly I started (I think, too long ago)
15:24:18 bauzas: can you explain briefly? I have not been following the NUMA stuff
15:24:22 atm, RT is pretty good for consuming CPUs and memory, but very bad at counting more complex objects
15:24:37 anyway, bottom line is you get to redo the BP and I get to needle the powers that be over review process
15:24:48 mspreitz_: well, the problem is really about details
15:24:56 mspreitz_: like what you get is not typed
15:25:08 mspreitz_: so you have to doublecheck what you get
15:25:26 mspreitz_: you also have to do some extra calls for getting some info
15:25:31 bauzas: are you explaining the NUMA concerns?
15:25:49 mspreitz_: yeah, all the fixme stuff these folks were doing
15:26:10 OK, thanks.
15:26:28 mspreitz_: I can just summarize that the problem is that you're providing non-typed and non-validated dictionaries
15:26:53 mspreitz_: and based on where you look at these bits, they are either serialized or not
15:26:55 I have to admit I have not been following the details, since it became clear a while ago that it would take a long time to get progress here
15:27:14 mspreitz_: seriously, I understand
15:27:41 mspreitz_: here the problem with isolate-sched-db is that we're not counting cpus or ram, but apples and bananas
15:27:59 mspreitz_: I mean, real complex objects
15:28:16 so we kinda need some formalization
15:28:54 and as I said last week, we *already* have the toolbox for it, that's called.... objects
15:29:33 thanks
15:29:38 anyway, I'll have to leave earlier today, we can maybe move on ?
15:29:49 yes
15:29:54 bauzas, fer sur
15:30:02 #topic opens
15:30:13 anyone have anything new for today?
15:30:17 I'm glad to say scheduler-lib is merged
15:30:26 +1
15:30:32 * bauzas silently says hurrah
15:30:47 * n0ano vocally says hurrah
15:31:03 that will ease the next steps discussed previously
15:31:44 anything else?
15:31:50 yup
15:32:11 https://review.openstack.org/#/c/117042/ that's something prerequisite for the work
15:32:28 the idea discussed previously is to provide ComputeNode objects to Scheduler
15:32:47 but unfortunately, ComputeNode is having a FK on Services for the bad or the good
15:33:07 so we need to do some prework about that, and this patch is doing that
15:33:42 because that makes no sense to give to Scheduler something having a dependency on anything else
15:33:46 how does this relate to the change to not send those updates unless something changes, there's no longer an update every 60 seconds
15:34:16 n0ano: this is not related, here the patch is about creating the DB entry at startup
15:34:35 but the updates are still done the same way, ie. checking if updated or not
15:35:34 n0ano: the problem was that we were looking the existence of an object every 60 secs
15:35:56 n0ano: your check is not before that, but after IIRC
15:36:09 n0ano: ie. before updating the DB
15:36:20 I thought the DB entry was already created at startup
15:36:28 n0ano: nope
15:36:34 n0ano: not exactly
15:37:01 n0ano: the update_avail_resource method was called when nova-compute was booting
15:37:16 n0ano: because of a post-hook mechanism
15:37:38 n0ano: but the check was still done every 60 secs
15:38:01 n0ano: here we refactor where we look at services table, that's really helpful
15:38:37 I have to study the code, I don't understand, let me try and understand this first
15:39:14 n0ano: sure
15:39:19 that's it for me
15:39:36 OK, unless there's anything else
15:40:03 I'll thank everyone and we'll talk next week.
15:40:08 #endmeeting
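
Editor's note on the "non-typed and non-validated dictionaries" point raised around 15:26: the contrast between passing a raw dict and passing a typed object can be sketched roughly as below. This is only an illustration under stated assumptions, not Nova's code or its objects API. The names NodeResources and from_untyped, and the fields host, vcpus and memory_mb, are hypothetical and chosen just for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeResources:
    """Hypothetical typed record a resource tracker could hand to a scheduler."""
    host: str
    vcpus: int
    memory_mb: int

    def __post_init__(self):
        # Validate once, at construction, so consumers need not re-check.
        if not isinstance(self.host, str) or not self.host:
            raise TypeError("host must be a non-empty string")
        for name in ("vcpus", "memory_mb"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
            if value < 0:
                raise ValueError(f"{name} must be non-negative")


def from_untyped(raw: dict) -> NodeResources:
    """Convert an unvalidated dict into the typed record, or raise."""
    return NodeResources(
        host=raw["host"],
        vcpus=raw["vcpus"],
        memory_mb=raw["memory_mb"],
    )
```

With a raw dict, a value such as "8" in place of 8 would travel unnoticed until some consumer used it; in this sketch the same input fails at the boundary with a TypeError, which is roughly the "formalization" asked for at 15:28.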