15:00:12 #startmeeting gantt
15:00:13 Meeting started Tue Jun 3 15:00:12 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 The meeting name has been set to 'gantt'
15:00:20 \o
15:00:25 o/
15:00:27 anyone here want to talk about the scheduler?
15:00:43 \o/
15:01:05 * n0ano wonders how many ways you can combine / and \
15:01:26 I'm left-handed :)
15:01:41 bauzas, you & my wife :-)
15:01:41 so \o is better than o/
15:01:44 * mspreitz /*\
15:01:54 * mspreitz /o\
15:02:48 well, why don't we get started (all the important people are here)
15:02:51 * johnthetubaguy is lurking, but on a call
15:02:58 #topic forklift
15:03:12 mainly status I think, anything to report bauzas ?
15:03:53 sorry, was mailing
15:04:02 so, yes, big status
15:04:20 progress so far on implementing the sched-lib
15:04:28 https://review.openstack.org/82778
15:04:43 (that's eating most of my nights now, as juno-1 is next week)
15:05:22 I'm about to deliver a new patchset (hoping to land by tomorrow) taking into account all comments
15:05:47 I spent most of my time this week on 2 big concerns
15:06:13 #1 : we're not using objects in RT, so I had to trick some things for using objects with sched-lib
15:06:35 that requires some refactoring effort on that patch
15:06:54 "RT" ?
15:07:01 #2 : I raised the concern that IMHO, logic should stay in the Sched-manager
15:07:10 mspreitz: RT : ResourceTracker, my bad
15:07:36 about #2, a dependent patch has been landed by yesterday
15:07:48 https://review.openstack.org/97232
15:08:05 your comments are welcome on that patch
15:08:33 #action everyone to review https://review.openstack.org/97232
15:08:37 it will be updated tomorrow with the updates from https://review.openstack.org/82778 (they are dependent)
15:08:52 well the most important thing is architectural
15:09:02 I mean, I ported the logic to the sched manager
15:09:20 bauzas: on https://review.openstack.org/82778, client.py line 55
15:09:27 but with that 97232 patch, that means that compute nodes are now sending updates to scheduler
15:09:28 I put a comment there
15:09:38 could you take a quick look please?
15:09:53 https://review.openstack.org/#/c/82778/13/nova/scheduler/client.py, line 55
15:10:10 bauzas, in re updates to sched - this is in addition to the compute nodes updating the DB?
15:10:11 toan-tran: yay, saw your comment
15:10:34 toan-tran: I'm sorry, but no this is service_name
15:10:54 toan-tran: you're getting a service with possibly multiple nodes
15:11:18 toan-tran: but wait for my new patchset, the logic will be rewritten so that it will be clearer to read
15:11:35 n0ano: not exactly
15:11:37 bauzas: thanks, it's rather confusing the variables' name
15:12:06 and please if you can add some description on compute_nodes' structure, that would be great
15:12:09 n0ano: the problem is that computes are using conductor to update DB for compute_nodes
15:12:47 n0ano: even if we externalize the call to the conductor into a separate library, that still means that computes literally update compute_nodes
15:13:14 n0ano: it should only place a call to an API to the sched
15:13:26 n0ano: so the sched would update its own DB
15:13:45 n0ano: but that means now that all RT updates will go thru sched
15:13:50 I thought no-db-scheduler was in the future
15:13:56 n0ano: that's a possible bottleneck
15:14:01 which is the way compute nodes used to work (the more things change the more they stay the same)
15:14:10 mspreitz: that's not related to no-db work
15:14:26 but it sounds like it..?
15:14:43 no-db work is about having a no-db backend for scheduler
15:14:54 but the blueprint is confusing
15:15:11 on my side, I'm not changing how we store things
15:15:18 mspreitz, I think the point is compute sends update to the sched, where sched stores that info is up to the sched, db for now, memory when no-db is in
15:15:31 I'm just making sure that only sched holds the compute_nodes table
15:15:40 n0ano: +1
15:16:15 I don't speak English
15:16:17 anyway, if we consider Gantt, this is a long-term feature
15:16:30 guah
15:16:34 as RT will need to call Gantt for updating its state
15:17:03 so anyway, RT will place an external call
15:17:20 the problem is that it requires Gantt (or the sched now) to be robust enough
15:17:22 I agree, I think compute status updates should go to the sched and then let sched decide the best way to store the info so this is good.
15:18:14 n0ano: this is rather heavy for Gantt
15:18:20 so, to sum up the most important work is on https://review.openstack.org/82778
15:18:38 should we have some synchronizer to handle DB ? like no-db
15:18:54 and reviews are welcome on https://review.openstack.org/97232 and https://review.openstack.org/89893
15:18:54 toan-tran, maybe but I've just created a BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ) to change the way we send updates...
15:19:12 n0ano: that's a good thing
15:19:19 change from periodic to on demand, I thought someone was already working on this but I guess not so I'll start it
15:19:32 mmm, that was about no-db discussion
15:19:36 IIRC
15:19:56 n0ano: +1
15:20:00 n0ano: ping us the nova-spec draft once you're done with it
15:20:17 n0ano: so I'll be able to review it
15:20:23 status updates are orthogonal to no-db, I think the no-db spec got a little overly complex
15:20:28 n0ano: could you do some analysis on performance ? comparison with current method
15:20:42 bauzas, sure, the BP is there, I have to do the details for the git repo
15:20:47 some graph would be nice :)
15:21:05 hello how are;-)
15:21:09 n0ano: I subscribed to the BP, so I'll get the patch link
15:21:10 toan-tran, hard for me, I have like a max of a 3 node system :-( I'm not a bluehost
15:21:49 n0ano: well, we don't need a real system for that
15:22:06 ok maybe I will make some Matlab graph to see
15:22:17 toan-tran: your ideas are welcome
15:22:17 estup
15:22:50 canaima172423: we're in the middle of a meeting, please join #openstack-101 if you want to talk about Openstack
15:23:00 toan-tran, any suggestions on how to get some scaling data from a small system would be welcome
15:23:18 bauzas, I tried to talk to him on a private dialog but he seems to be ignoring me
15:23:46 I just remind you all that juno-1 is next week
15:24:09 And we're having a Nova bug day today?
15:24:15 so, if you want to vote on having sched-lib to be merged by juno-1, please put some reviews :)
15:24:19 bauzas, anyway, sounds like you have the forklift well in had (barring some reviews) any other help you need?
15:24:26 s/had/hand
15:24:58 n0ano: as said last week, I'll probably require some help for implementing https://review.openstack.org/89893
15:25:09 it's targeted for juno-3
15:25:47 btw, I'll travelling next week
15:25:52 s/be
15:26:07 I have some colleges (sp?) in China, let me see if I can get someone to work on that
15:26:08 so I won't be able to attend the meeting (:
15:26:18 :(
15:26:38 bauzas, NP but if you can send me a quick email update beforehand that would be good
15:26:39 and Monday is bank holiday in France
15:26:58 n0ano: will do - don't hesitate to ping me by email ;)
15:26:58 so, we don't work for a bank :-)
15:27:24 * n0ano favorite holiday is Tomb Cleaning Day in China :-)
15:27:32 well, I don't know the word, I would say 'legal' holiday then :)
15:27:49 anyway, I'm done
15:27:55 bauzas, no, you were correct, I was just making a pun
15:28:01 any other questions about the forklift?
15:28:06 bauzas: well, depending on company, mine still works :)
15:28:06 bauzas, tnx, good work
15:28:07 n0ano: :D
15:28:28 #action n0ano to get someone to work on https://review.openstack.org/89893
15:28:36 moving on
15:28:47 #topic no-db scheduler
15:28:50 toan-tran: don't leave me explain Pentecost Day in France and its paperwork-related stuff :)
15:28:53 YorikSar, you there
15:29:04 Yea, hi
15:29:17 I've seen a lot of comments to my spec
15:29:22 hi YorikSar :)
15:29:28 YorikSar: indeed :)
15:29:31 indeed, we finally got moving on that
15:29:41 Although I never found time to answer or address them.
15:30:02 I guess I'll be working on that this week.
15:30:15 YorikSar: cool let us know
15:30:29 You all will know in Gerrit's emails ;)
15:30:48 ;)
15:30:48 BTW, for the rest of us who do not know Kafka, is there a short sharp summary of what it is and why the advocate thinks it is relevant?
15:30:53 (in? from? through?)
15:31:32 YorikSar: in john garbutt's comment
15:31:40 mspreitz: I'm sorry, maybe johnthetubaguy can comment it ?
15:31:45 mspreitz: I honestly didn't understand how it could fit in our scheme.
15:31:54 http://kafka.apache.org/
15:32:20 YorikSar: I haven't said the word tooz :)
15:32:21 just seemed a lot like the mem cache queue of updates, but already implemented
15:32:53 the feedback at the summit was it sounds like we are re-inventing a DB
15:33:06 YorikSar: there is also https://github.com/stackforge/tooz
15:33:13 johnthetubaguy, +1 (that's what I heard at the summit also)
15:33:19 +2
15:33:30 johnthetubaguy: That's very unfortunate outcome. I wish I could be there to avoid such confusion.
15:34:33 OK, I'll agree on the question of Kafka. The proposed design is about getting updates to schedulers, it seems to be working around some presumed problem with fanout
15:34:34 YorikSar, maybe a focused email to the dev list to address this issue from you would be good
15:34:46 bauzas: tooz seems to be not about delivering data from tons of servers to some number of recipients.
15:35:06 YorikSar: indeed, it's only about election, you're right
15:35:25 YorikSar: I was thinking about it for the scheduler
15:35:35 I mean, "why not Kafka" is a good question
15:35:39 YorikSar: but my mind slipped a little bit
15:35:50 I wouldn't mind background on why oslo's fanout is not good enough
15:35:51 I'll take a closer look at Kafka, yes. But I feel like it won't be good for our case.
15:36:01 well, the problem is about the spec with regards to the timeline
15:36:49 Synchronizer provides not only better delivery pace but also some semi-persistence for "subscribers" that just came online or were sleeping too long.
15:36:58 I mean, that's a big change, and we're only having 2 months for juno
15:37:24 bauzas: That's not a big change...
15:37:40 YorikSar: well, you introduce many concepts here :)
15:37:52 bauzas, if the backend is selectable between the current DB and the new scheme then the change isn't that disruptive
15:37:59 YorikSar: and some of them are disruptive, see my comments in the spec :)
15:38:12 * YorikSar wishes to hide this work behind some other name so that everybody would forget what's been said about it during the whole year of dreaming the design...
15:38:58 YorikSar, name change probably not an option but I understand you :-)
15:39:06 YorikSar: well, the problem is that the spec is not that clear, I'm sorry :(
15:39:35 YorikSar: I mean, it seems some points are overlapping other developments
15:40:03 I think I'll try to convince people in spec first. And then I'll probably start some ML topic so that community could follow current state of things with this bp
15:40:09 What is wrong with oslo's fanout messaging, and why would the proposed backend do the job better?
15:40:12 YorikSar: and you're proposing to rewrite the whole SQLA backend
15:40:41 mspreitz: IIRC, fanout has been banned a long time ago
15:40:49 bauzas: why?
15:41:06 mspreitz: lemme find the thread :)
15:41:19 (not an idle question, we need to know we are not re-producing the same problems)
15:41:20 bauzas: Well... It's a backend, right? This work just replaces a piece of wiring from compute nodes to the scheduler itself.
15:41:47 we discussed fan out a long time ago but I don't think there was a definitive result, there are still proponents & opponents of it
15:42:29 mspreitz: Imagine this. Currently we have 1 message for every node every 1 min. With fanout that number will get multiplied by the number of schedulers.
15:42:51 mspreitz: AFAIC that had been placing too much load to MQ.
15:43:11 YorikSar: the proposed design does as much messaging in total
15:43:43 and with several schedulers, the backend is sending most of it
15:43:44 YorikSar, note my new BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ), change the 1 min update to on demand and a lot of that load goes away
15:44:01 mspreitz: This design keeps number of messages the same (unless you plug compute nodes directly to synchronizer).
15:44:12 what is same as what
15:44:14 ?
15:44:19 mspreitz: there we go : https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler
15:44:32 1 message per node per minute
15:45:34 Both new design and oslo fanout send O((num schedulers) * (compute node update rate)) messages from backend / through message broker
15:45:51 n0ano: I thought the source of node state is not that static. E.g. you can add some RAM to compute node and it'll show up on periodic update.
15:45:55 s/messages/message content/
15:46:36 YorikSar, you're talking about hot add of mem - that's just another (unlikely) event that causes an update
15:46:52 anyway, I don't think the main discussion about no-db is here :)
15:46:53 mspreitz: No... Schedulers retrieve new records from backend in packs while compute nodes push them there with the same pace.
15:47:09 that's why I s/messages/message content/
15:48:00 How big is a compute node update? n0ano's question is relevant here
15:48:19 n0ano: Ok, I remember I had an example of change that was triggered independently from nova-compute but I don't remember what it was.
15:48:36 I'm just having pdb running
15:48:48 don't ask me to calculate the len
15:48:49 mspreitz, last I saw the log message it was about 20 lines of 80 characters
15:48:49 :)
15:49:30 1226 chars
15:49:34 :)
15:49:46 well, that depends of course
15:50:07 bauzas, pretty close to my 1600 estimate and yes, it varies a little, but not that much
15:50:17 cpu_info is the most greedy
15:50:45 and the bad is that it's very static
15:50:53 you don't change CPUs every day
15:51:01 * toan-tran wonders how close is 1600 to 1226
15:51:08 bauzas, and the most static, we could change the update into two types (static/dynamic) if the size is a big problem.
15:51:08 bauzas: Depends on your hobby :)
15:51:39 YorikSar: :)
15:51:39 toan-tran, within 1 order of magnitude, WFM :-)
15:52:08 n0ano: now I understand when you said "we don't work for the bank" :)
15:52:23 toan-tran, touche :-)
15:52:25 guys, I know that hyper-v people cancelled the next meeting, but is it reasonable to chat about it while we're only having 8 mins left ? :D
15:53:03 I guess we can finish no-db topic here. We'll continue the discussion in the spec draft.
15:53:09 bauzas, I get fried after 60 min. anyway, I'd prefer to have YorikSar update his spec and send out the emails and then discuss later
15:53:24 n0ano: strong approval here
15:54:01 #action YorikSar to update the spec and start email thread on the dev list
15:54:05 but that RPC payload discussion is really passionating
15:54:50 bauzas, I don't mind, strong opinions are good as long as no one gets intimidated
15:55:09 let's move on
15:55:12 #topic opens
15:55:19 anyone have anything new to raise today?
15:55:22 yey, I mean I would love to discuss about it still
15:55:39 5 mins left :)
15:55:57 well, I intended to talk about my new patch: https://review.openstack.org/#/c/61386/
15:56:12 just a reminder, won't be available from mon to thurs next week
15:56:20 but I don't think we have time left, so maybe next time :)
15:56:37 toan-tran: I briefly read your spec
15:56:50 it's on my demo at Atlanta
15:56:51 toan-tran, sure, I'll queue it up for next week (doesn't look like it's getting much love so far)
15:56:59 toan-tran: very interesting, but I think we need to define a clear path for this
15:57:11 bauzas: +1
15:57:24 toan-tran: and I would love to help you contributing on this
15:57:30 in fact I submitted it some months ago
15:57:35 #action n0ano to add https://review.openstack.org/#/c/61386/ to agenda for next week
15:57:47 and after Atlanta I got really good talk with Jay Lau
15:58:04 his Tetris is what I need to complete my schema
15:58:04 toan-tran: yey, I think that Jay and I are sharing same views
15:58:06 :)
15:58:23 toan-tran: but that's a big baby
15:58:38 bauzas: here is my presentation: https://docs.google.com/file/d/0B598PxJUvPrwcWZlaUlaOW11enM/edit?
15:58:47 page 20 is my vision on the whole scheduling
15:58:53 toan-tran: even bigger than Gantt IMHO :)
15:58:59 and Tetris fits right in Service Manager
15:59:45 toan-tran: based on last Summit, I fear that it will be too big for Nova
15:59:54 toan-tran: but that's a good fit for Gantt
16:00:11 bauzas: yeah, we expect Gantt will be part of it :D
16:00:20 so that will be Gantt + Tetris + Congress
16:00:49 I was thinking that GTC was related to fast cars :)
16:00:56 but the first step is small & simple, to make a policy-based engine that can fit in nova-scheduler or gantt
16:01:07 bauzas: +)
16:01:13 top of the hour guys, tnx, good discussion, we'll talk on email and be here next week.
16:01:17 :)
16:01:19 #endmeeting
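
Editor's note on the no-db topic: the message-volume argument in the log (1 message per node per minute, multiplied by the number of schedulers under fanout, with a payload of roughly 1226 characters) can be put into numbers with a back-of-the-envelope sketch. This is only an illustration: the function name `broker_load` and the node and scheduler counts are made up for the example, and nothing here is Nova or oslo code.

```python
def broker_load(num_nodes, num_schedulers, updates_per_min=1,
                payload_chars=1226, fanout=False):
    """Estimate per-minute update traffic reaching the schedulers.

    Assumption (from the discussion, not from any real API): without
    fanout each update is delivered once; with fanout every scheduler
    receives its own copy of every update.
    """
    copies = num_schedulers if fanout else 1
    messages = num_nodes * updates_per_min * copies
    return messages, messages * payload_chars


# Hypothetical deployment: 1000 compute nodes, 3 schedulers.
single = broker_load(1000, 3)
fanned = broker_load(1000, 3, fanout=True)
print("no fanout:", single)
print("fanout:   ", fanned)
```

With these made-up inputs the fanout case carries three times the messages and three times the content, which is the O((num schedulers) * (compute node update rate)) growth raised at 15:45:34. Lowering `updates_per_min` models the on-demand update idea from the forklift topic.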