15:00:12 <n0ano> #startmeeting gantt
15:00:13 <openstack> Meeting started Tue Jun 3 15:00:12 2014 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 <openstack> The meeting name has been set to 'gantt'
15:00:20 <bauzas> \o
15:00:25 <mspreitz> o/
15:00:27 <n0ano> anyone here want to talk about the scheduler?
15:00:43 <toan-tran> \o/
15:01:05 * n0ano wonders how many ways you can combine / and \
15:01:26 <bauzas> I'm left-handed :)
15:01:41 <n0ano> bauzas, you & my wife :-)
15:01:41 <bauzas> so \o is better than o/
15:01:44 * mspreitz /*\
15:01:54 * mspreitz /o\
15:02:48 <n0ano> well, why don't we get started (all the important people are here)
15:02:51 * johnthetubaguy is lurking, but on a call
15:02:58 <n0ano> #topic forklift
15:03:12 <n0ano> mainly status I think, anything to report bauzas ?
15:03:53 <bauzas> sorry, was mailing
15:04:02 <bauzas> so, yes, big status
15:04:20 <bauzas> progress so far on implementing the sched-lib
15:04:28 <bauzas> https://review.openstack.org/82778
15:04:43 <bauzas> (that's eating most of my nights now, as juno-1 is next week)
15:05:22 <bauzas> I'm about delivering a new patchset (hoping to land by tomorrow) taking in account all comments
15:05:47 <bauzas> I spent most of my time this week on 2 big concerns
15:06:13 <bauzas> #1 : we're not using objects in RT, so I had to trick some things for using objects with sched-lib
15:06:35 <bauzas> that requires some refactoring effort on that patch
15:06:54 <mspreitz> "RT" ?
15:07:01 <bauzas> #2 : I raised the concern that IMHO, logic should stay in the Sched-manager
15:07:10 <bauzas> mspreitz: RT : ResourceTracker, my bad
15:07:36 <bauzas> about #2, a dependent patch has been landed by yesterday
15:07:48 <bauzas> https://review.openstack.org/97232
15:08:05 <bauzas> your comments are welcome on that patch
15:08:33 <n0ano> #action everyone to review https://review.openstack.org/97232
15:08:37 <bauzas> it will be updated tomorrow with the updates from https://review.openstack.org/82778 (they are dependent)
15:08:52 <bauzas> well the most important thing is architectural
15:09:02 <bauzas> I mean, I ported the logic to the sched manager
15:09:20 <toan-tran> bauzas: on https://review.openstack.org/82778, client.py line 55
15:09:27 <bauzas> but with that 97232 patch, that means that now compute nodes are now sending updates to scheduler
15:09:28 <toan-tran> I put a comment there
15:09:38 <toan-tran> could you take a quick look please?
15:09:53 <toan-tran> https://review.openstack.org/#/c/82778/13/nova/scheduler/client.py, line 55
15:10:10 <n0ano> bauzas, in re updates to sched - this is in addition to the compute nodes updating the DB?
15:10:11 <bauzas> toan-tran: yay, saw your comment
15:10:34 <bauzas> toan-tran: I'm sorry, but no this is service_name
15:10:54 <bauzas> toan-tran: you're getting a service with possibly multiple nodes
15:11:18 <bauzas> toan-tran: but wait my new patchset, the logic will be rewritten so that it will be clearer to read
15:11:35 <bauzas> n0ano: not exactly
15:11:37 <toan-tran> bauzas: thanks, it's rather confusing the variables' name
15:12:06 <toan-tran> and please if you can add some description on compute_nodes' structure, that would be great
15:12:09 <bauzas> n0ano: the problem is that computes are using conductor to update DB for compute_nodes
15:12:47 <bauzas> n0ano: even if we externalize the call to the conductor into a separate library, that still means that computes literally update compute_nodes
15:13:14 <bauzas> n0ano: it should only place a call to an API to the sched
15:13:26 <bauzas> n0ano: so the sched would update its own DB
15:13:45 <bauzas> n0ano: but that means now that all RT updates will go thru sched
15:13:50 <mspreitz> I thought no-db-scheduler was in the future
15:13:56 <bauzas> n0ano: that's a possible bottleneck
15:14:01 <n0ano> which is the way compute nodes used to work (the more things change the more they stay the same)
15:14:10 <bauzas> mspreitz: that's not related to no-db work
15:14:26 <mspreitz> but it sounds like it..?
15:14:43 <bauzas> no-db work is about having a no-db backend for scheduler
15:14:54 <bauzas> but the blueprint is confusing
15:15:11 <bauzas> on my side, I'm not changing how we store things
15:15:18 <n0ano> mspreitz, I think the point is compute sends update to the sched, where sched stores that info is up to the sched, db for now, memory when no-db is in
15:15:31 <bauzas> I'm just making sure that only sched holds the compute_nodes table
15:15:40 <bauzas> n0ano: +1
15:16:15 <canaima172423> I don't speak English
15:16:17 <bauzas> anyway, if we consider Gantt, this is a long-term feature
15:16:30 <canaima172423> guah
15:16:34 <bauzas> as RT will need to call Gantt for updating its state
15:17:03 <bauzas> so anyway, RT will place an external call
15:17:20 <bauzas> the problem is that it requires Gantt (or the sched now) to be robust enough
15:17:22 <n0ano> I agree, I think compute status updates should go to the sched and then let sched decide the best way to store the info so this is good.
15:18:14 <toan-tran> n0ano: this is rather heavy for Gantt
15:18:20 <bauzas> so, to sum up the most important work is on https://review.openstack.org/82778
15:18:38 <toan-tran> should we have some synchronizer to handle DB ? like no-db
15:18:54 <bauzas> and reviews are welcome on https://review.openstack.org/97232 and https://review.openstack.org/89893
15:18:54 <n0ano> toan-tran, maybe but I've just created a BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ) to change the way we send updates...
15:19:12 <bauzas> n0ano: that's a good thing
15:19:19 <n0ano> change from periodic to on demand, I thought someone was already working on this but I guess not so I'll start it
15:19:32 <bauzas> mmm, that was about no-db discussion
15:19:36 <bauzas> IIRC
15:19:56 <toan-tran> n0ano: +1
15:20:00 <bauzas> n0ano: ping us the nova-spec draft once you're done with
15:20:17 <bauzas> n0ano: so I'll be able to review it
15:20:23 <n0ano> status updates are orthogonal to no-db, I think the no-db spec got a little overly complex
15:20:28 <toan-tran> n0ano: could you do some analysis on performance ? comparison with current method
15:20:42 <n0ano> bauzas, sure, the BP is there, I have to do the details for the git repo
15:20:47 <toan-tran> some graph would be nice :)
15:21:05 <canaima172423> hello how are;-)
15:21:09 <bauzas> n0ano: I subscribed to the BP, so I'll get the patch link
15:21:10 <n0ano> toan-tran, hard for me, I have like a max of a 3 node system :-( I'm not a bluehost
15:21:49 <toan-tran> n0ano: well, we don't need a real system for that
15:22:06 <toan-tran> ok maybe I will make some Matlab graph to see
15:22:17 <bauzas> toan-tran: your ideas are welcome
15:22:17 <canaima172423> estup
15:22:50 <bauzas> canaima172423: we're in the middle of a meeting, please join #openstack-101 if you want to talk about Openstack
15:23:00 <n0ano> toan-tran, any suggestions on how to get some scaling data from a small system would be welcome
15:23:18 <n0ano> bauzas, I tried to talk to him on a private dialog but he seems to be ignoring me
15:23:46 <bauzas> I just remind you all that juno-1 is next week
15:24:09 <mspreitz> And we're having a Nova bug day today?
15:24:15 <bauzas> so, if you want to vote on having sched-lib to be merged by juno-1, please put some reviews :)
15:24:19 <n0ano> bauzas, anyway, sounds like you have the forklift well in had (barring some reviews) any other help you need?
15:24:26 <n0ano> s/had/hand
15:24:58 <bauzas> n0ano: as said last week, I'll probably require some help for implementing https://review.openstack.org/89893
15:25:09 <bauzas> it's targeted for juno-3
15:25:47 <bauzas> btw, I'll travelling next week
15:25:52 <bauzas> s/be
15:26:07 <n0ano> I have some colleges (sp?) in China, let me see if I can get someone to work on that
15:26:08 <bauzas> so I won't be able to attend the meeting (:
15:26:18 <bauzas> :(
15:26:38 <n0ano> bauzas, NP but if you can send me a quick email update beforehand that would be good
15:26:39 <bauzas> and Monday is bank holiday in France
15:26:58 <bauzas> n0ano: will do - don't hesitate to ping me by email ;)
15:26:58 <n0ano> so, we don't work for a bank :-)
15:27:24 * n0ano favorite holiday is Tomb Cleaning Day in China :-)
15:27:32 <bauzas> well, I don't know the word, I would say 'legal' holiday then :)
15:27:49 <bauzas> anyway, I'm done
15:27:55 <n0ano> bauzas, no, you were correct, I was just making a pun
15:28:01 <bauzas> any other questions about the forklift?
15:28:06 <toan-tran> bauzas: well, depending on company, mine still works :)
15:28:06 <n0ano> bauzas, tnx, good work
15:28:07 <bauzas> n0ano: :D
15:28:28 <n0ano> #action n0ano to get someone to work on https://review.openstack.org/89893
15:28:36 <n0ano> moving on
15:28:47 <n0ano> #topic no-db scheduler
15:28:50 <bauzas> toan-tran: don't leave me explain Pentecost Day in France and its paperwork-related stuff :)
15:28:53 <n0ano> YorikSar, you there
15:29:04 <YorikSar> Yea, hi
15:29:17 <YorikSar> I've seen a lot of comments to my spec
15:29:22 <bauzas> hi YorikSar :)
15:29:28 <bauzas> YorikSar: indeed :)
15:29:31 <n0ano> indeed, we finally got moving on that
15:29:41 <YorikSar> Although I never found time to answer or address them.
15:30:02 <YorikSar> I guess I'll be working on that this week.
15:30:15 <bauzas> YorikSar: cool let us know
15:30:29 <YorikSar> You all will know in Gerrit's emails ;)
15:30:48 <bauzas> ;)
15:30:48 <mspreitz> BTW, for the rest of us who do not know Kafka, is there a short sharp summary of what it is and why the advocate thinks it is relevant?
15:30:53 <YorikSar> (in? from? through?)
15:31:32 <toan-tran> YorikSar: in john garbutt's comment
15:31:40 <bauzas> mspreitz: I'm sorry, maybe johnthetubaguy can comment it ?
15:31:45 <YorikSar> mspreitz: I honestly didn't understand how it could fit in our scheme.
15:31:54 <toan-tran> http://kafka.apache.org/
15:32:20 <bauzas> YorikSar: I haven't said the word tooz :)
15:32:21 <johnthetubaguy> just seemed a lot like the mem cache queue of updates, but already implemented
15:32:53 <johnthetubaguy> the feedback in the summit was it sounds like we are re-inventing a DB
15:33:06 <bauzas> YorikSar: there is also https://github.com/stackforge/tooz
15:33:13 <n0ano> johnthetubaguy, +1 (that's what I heard at the summit also)
15:33:19 <bauzas> +2
15:33:30 <YorikSar> johnthetubaguy: That's very unfortunate outcome. I wish I could be there to avoid such confusion.
15:34:33 <mspreitz> OK, I'll agree on the question of Kafka. The proposed design is about getting updates to schedulers, it seems to be working around some presumed problem with fanout
15:34:34 <n0ano> YorikSar, maybe a focused email to the dev list to address this issue from you would be good
15:34:46 <YorikSar> bauzas: tooz seems to be not about delivering data from tons of servers to some number of recipients.
15:35:06 <bauzas> YorikSar: indeed, it's only about election, you're right
15:35:25 <bauzas> YorikSar: I was thinking about it for the scheduler
15:35:35 <mspreitz> I mean, "why not Kafka" is a good question
15:35:39 <bauzas> YorikSar: but my mind slipped a little bit
15:35:50 <mspreitz> I wouldn't mind background on why oslo's fanout is not good enough
15:35:51 <YorikSar> I'll take a closer look at Kafka, yes. But I feel like it won't be good for our case.
15:36:01 <bauzas> well, the problem is about the spec with regards to the timeline
15:36:49 <YorikSar> Synchronizer provides not only better delivery pace but also some semi-persistence for "subscribers" that just came online or were sleeping too long.
15:36:58 <bauzas> I mean, that's a big change, and we're only having 2 months for juno
15:37:24 <YorikSar> bauzas: That's not a big change...
15:37:40 <bauzas> YorikSar: well, you introduce many concepts here :)
15:37:52 <n0ano> bauzas, if the backend is selectable between the current DB and the new scheme then the change isn't that disruptive
15:37:59 <bauzas> YorikSar: and some of them are disruptive, see my comments in the spec :)
15:38:12 * YorikSar wishes to hide this work behind some other name so that everybody would forget what've been said about it during the whole year of dreaming the design...
15:38:58 <n0ano> YorikSar, name change probably not an option but I understand you :-)
15:39:06 <bauzas> YorikSar: well, the problem is that the spec is not that clear, I'm sorry :(
15:39:35 <bauzas> YorikSar: I mean, it seems some points are overlapping other developments
15:40:03 <YorikSar> I think I'll try to convince people in spec first. And then I'll probably start some ML topic so that community could follow current state of things with this bp
15:40:09 <mspreitz> What is wrong with oslo's fanout messaging, and why would the proposed backend do the job better?
15:40:12 <bauzas> YorikSar: and you're proposing to rewrite the whole SQLA backend
15:40:41 <bauzas> mspreitz: IIRC, fanout has been banned a long time ago
15:40:49 <mspreitz> bauzas: why?
15:41:06 <bauzas> mspreitz: lemme find the thread :)
15:41:19 <mspreitz> (not an idle question, we need to know we are not re-producing the same problems)
15:41:20 <YorikSar> bauzas: Well... It's a backend, right? This work just replaces a piece of wiring from compute nodes to the scheduler itself.
15:41:47 <n0ano> we discussed fan out a long time ago but I don't think there was a definitive result, there are still proponents & opponents of it
15:42:29 <YorikSar> mspreitz: Imagine this. Currently we have 1 message for every node every 1 min. With fanout that number will get multiplied by the number of schedulers.
15:42:51 <YorikSar> mspreitz: AFAIC that had been placing too much load to MQ.
15:43:11 <mspreitz> YorikSar: the proposed design does as much messaging in total
15:43:43 <mspreitz> and with several schedulers, the backend is sending most of it
15:43:44 <n0ano> YorikSar, note my new BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ), change the 1 min update to on demand and a lot of that load goes away
15:44:01 <YorikSar> mspreitz: This design keeps number of messages the same (unless you plug compute nodes directly to synchronizer).
15:44:12 <mspreitz> what is same as what
15:44:14 <mspreitz> ?
15:44:19 <bauzas> mspreitz: there we go : https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler
15:44:32 <YorikSar> 1 message per node per minute
15:45:34 <mspreitz> Both new design and oslo fanout send O((num schedulers) * (compute node update rate)) messages from backend / through message broker
15:45:51 <YorikSar> n0ano: I thought the source of node state is not that static. E.g. you can add some RAM to compute node and it'll show up on periodic update.
15:45:55 <mspreitz> s/messages/message content/
15:46:36 <n0ano> YorikSar, you're talking about hot add of mem - that's just another (unlikely) event that causes an update
15:46:52 <bauzas> anyway, I don't think the main discussion about no-db is here :)
15:46:53 <YorikSar> mspreitz: No... Schedulers retrieve new records from backend in packs while compute nodes push them there with the same pace.
15:47:09 <mspreitz> that's why I s/messages/message content/
15:48:00 <mspreitz> How big is a compute node update? n0ano's question is relevant here
15:48:19 <YorikSar> n0ano: Ok, I remember I had an example of change that was triggered independently from nova-compute but I don't remember what it was.
15:48:36 <bauzas> I'm just having pdb running
15:48:48 <bauzas> don't ask me to calculate the len
15:48:49 <n0ano> mspreitz, last I saw the log message it was about 20 lines of 80 characters
15:48:49 <bauzas> :)
15:49:30 <bauzas> 1226 chars
15:49:34 <bauzas> :)
15:49:46 <bauzas> well, that depends of course
15:50:07 <n0ano> bauzas, pretty close to my 1600 estimate and yes, it varies a little, but not that much
15:50:17 <bauzas> cpu_info is the most greedy
15:50:45 <bauzas> and the bad is that it's very static
15:50:53 <bauzas> you don't change CPUs every day
15:51:01 * toan-tran wonders how close is 1600 to 1226
15:51:08 <n0ano> bauzas, and the most static, we could change the update into two type (static/dynamic) if the size is a big problem.
15:51:08 <YorikSar> bauzas: Depends on your hobby :)
15:51:39 <bauzas> YorikSar: :)
15:51:39 <n0ano> toan-tran, within 1 order of magnitude, WFM :-)
15:52:08 <toan-tran> n0ano: now I understand when you said "we don't work for the bank" :)
15:52:23 <n0ano> toan-tran, touche :-)
15:52:25 <bauzas> guys, I know that hyper-v people cancelled the next meeting, but is it reasonable to chat about it while we're only having 8 mins left ? :D
15:53:03 <YorikSar> I guess we can finish no-db topic here. We'll continue the discussion in the spec draft.
15:53:09 <n0ano> bauzas, I get fried after 60 min. anyway, I'd prefer to have YorikSar update his spec and send out the emails and then discuss later
15:53:24 <bauzas> n0ano: strong approval here
15:54:01 <n0ano> #action YorikSar to update the spec and start email thread on the dev list
15:54:05 <bauzas> but that RPC payload discussion is really passionating
15:54:50 <n0ano> bauzas, I don't mind, strong opinions are good as long as no one gets intimidated
15:55:09 <n0ano> let's move on
15:55:12 <n0ano> #topic opens
15:55:19 <n0ano> anyone have anything new to raise today?
15:55:22 <bauzas> yey, I mean I would love to discuss about it still
15:55:39 <bauzas> 5 mins left :)
15:55:57 <toan-tran> well, I intended to talk about my new patch: https://review.openstack.org/#/c/61386/
15:56:12 <bauzas> just a reminder, won't be available from mon to thurs next week
15:56:20 <toan-tran> but I don't think we have time left, so maybe next time :)
15:56:37 <bauzas> toan-tran: I briefly read your spec
15:56:50 <toan-tran> it's on my demo at Atlanta
15:56:51 <n0ano> toan-tran, sure, I'll queue it up for next week (doesn't look like it's getting much love so far)
15:56:59 <bauzas> toan-tran: very interesting, but I think we need to define a clear path for this
15:57:11 <toan-tran> bauzas: +1
15:57:24 <bauzas> toan-tran: and I would love to help you contributing on this
15:57:30 <toan-tran> in fact I submitted it some months ago
15:57:35 <n0ano> #action n0ano to add https://review.openstack.org/#/c/61386/ to agenda for next week
15:57:47 <toan-tran> and after Atlanta I got really good talk with Jay Lau
15:58:04 <toan-tran> his Tetris is what I need for complete my schema
15:58:04 <bauzas> toan-tran: yey, I think that Jay and I are sharing same views
15:58:06 <toan-tran> :)
15:58:23 <bauzas> toan-tran: but that's a big baby
15:58:38 <toan-tran> bauzas: here is my presentation: https://docs.google.com/file/d/0B598PxJUvPrwcWZlaUlaOW11enM/edit?
15:58:47 <toan-tran> page 20 is my vision on the whole scheduling
15:58:53 <bauzas> toan-tran: even bigger than Gantt IMHO :)
15:58:59 <toan-tran> and Tetris fits right in Service Manager
15:59:45 <bauzas> toan-tran: based on last Summit, I fear that it will be too big for Nova
15:59:54 <bauzas> toan-tran: but that's a good fit for Gantt
16:00:11 <toan-tran> bauzas: yeah, we expect Gantt will be part of it :D
16:00:20 <toan-tran> so that will be Gantt + Tetris + Congress
16:00:49 <bauzas> I was thinking that GTC was related to fast cars :)
16:00:56 <toan-tran> but the first step is small & simple, to make an policy-based engine that can fit in nova-scheduler or gantt
16:01:07 <toan-tran> bauzas: +)
16:01:13 <n0ano> top of the hour guys, tnx, good discussion, we'll talk on email and be here next week.
16:01:17 <bauzas> :)
16:01:19 <n0ano> #endmeeting
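
Editor's note: the message-load argument from the no-db topic can be put into rough numbers. The sketch below takes the two figures quoted in the meeting (one update per compute node per minute, and a measured payload of about 1226 characters) and compares a single delivery per update with a fanout to every scheduler. The node count, the scheduler count, and the function names are invented for illustration; none of them come from the meeting or from Nova.

```python
# Figures quoted in the meeting.
UPDATES_PER_NODE_PER_MIN = 1   # "1 message per node per minute"
PAYLOAD_CHARS = 1226           # size bauzas measured for one update


def deliveries_per_minute(num_nodes, num_schedulers, fanout):
    """Count message deliveries per minute for the periodic updates.

    Without fanout each update is delivered once; with fanout each
    update is delivered once per scheduler.
    """
    base = num_nodes * UPDATES_PER_NODE_PER_MIN
    return base * num_schedulers if fanout else base


def chars_per_minute(num_nodes, num_schedulers, fanout):
    """Approximate payload volume per minute, in characters."""
    return deliveries_per_minute(num_nodes, num_schedulers, fanout) * PAYLOAD_CHARS


# Hypothetical deployment size, chosen only to make the ratio visible.
NODES = 1000
SCHEDULERS = 3

for fanout in (False, True):
    label = "fanout" if fanout else "single delivery"
    print(label,
          deliveries_per_minute(NODES, SCHEDULERS, fanout),
          chars_per_minute(NODES, SCHEDULERS, fanout))
```

Under these assumed sizes the fanout case carries three times the deliveries of the single-delivery case, which is the multiplication YorikSar describes. It does not settle mspreitz's objection that every scheduler still has to receive the same content in total, whichever component sends it.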