14:59:50 <bauzas> #startmeeting gantt
14:59:51 <openstack> Meeting started Tue Feb 17 14:59:50 2015 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:55 <openstack> The meeting name has been set to 'gantt'
15:00:04 <edleafe> o/
15:00:09 <alex_xu> \o
15:00:16 <bauzas> hi, stepping up this time as n0ano can't make it, who's there ?
15:01:50 <bauzas> woah, please all don't speak at the same time :)
15:02:16 <edleafe> alex_xu and I were way ahead of you
15:02:26 <lxsli> o/
15:02:40 <bauzas> :)
15:02:46 <alex_xu> :)
15:03:02 <bauzas> so, let's start, so people can join
15:03:46 <bauzas> #topic Remove direct nova DB/API access by Scheduler Filters
15:04:10 <edleafe> So any ideas on the best way to represent the version of the compute node?
15:04:13 <bauzas> so, https://review.openstack.org/138444/ is updated very often, thanks edleafe
15:04:23 <edleafe> Since RPC version doesn't seem to fit
15:04:45 <edleafe> bauzas: I have another update waiting to go, after the results of this meeting
15:05:14 <bauzas> edleafe: here I was thinking an x.y.z version is good
15:05:38 <edleafe> it would seem that rolling compute node updates would have already been a problem
15:05:47 <edleafe> and that someone would have created a solution
15:05:58 <bauzas> edleafe: actually, the problem is that the scheduler is still really tied to Nova
15:06:09 <edleafe> bauzas: true
15:06:10 <bauzas> edleafe: but longer term, that would be an API
15:06:28 <edleafe> but in general, compute nodes are rolled out in bunches, not all at once
15:06:36 <bauzas> edleafe: so considering that's a separate API, in general, clients provide backwards compatibility by discovering versions
15:06:44 <edleafe> so the issue of differing behavior should have been a problem at some time, no?
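The x.y.z versioning idea floated above can be made concrete with a tiny helper. This is a hedged sketch, not any real Nova/Gantt code; the function name is made up for illustration.

```python
def parse_version(version_str):
    """Parse a dotted 'x.y.z' string into a tuple of ints.

    Tuples compare element-wise, so versions order correctly:
    '1.10.0' sorts after '1.9.0', which naive string comparison
    would get wrong.
    """
    return tuple(int(part) for part in version_str.split('.'))
```

With tuples, a scheduler can compare a reported compute-node version against a minimum supported version using ordinary `>=`.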
15:07:14 <bauzas> edleafe: here, on a longer term, we should imagine a Gantt client able to discover if the Gantt server has the compute capability
15:07:45 <bauzas> edleafe: at the moment, as we don't have a discovery mechanism, that's just something we pass to the Scheduler
15:07:57 <edleafe> the reverse is also true
15:08:16 <bauzas> edleafe: reverse of what ?
15:08:31 <edleafe> if the gantt server is relying on outside entities behaving a certain way, it needs to have a way to verify that
15:08:42 <bauzas> edleafe: that, I disagree with :)
15:08:50 <edleafe> i.e., gantt server discovering its clients
15:08:54 <bauzas> edleafe: that's all about capabilities of a given API
15:09:18 <bauzas> edleafe: so, if Gantt wants to know Nova capabilities, it will run the Nova client which provides that backwards compatibility
15:09:32 <edleafe> so you're saying that the gantt server will never be dependent on anything outside of itself?
15:09:44 <bauzas> edleafe: all the compatibility checks are usually done by the clients
15:09:53 <bauzas> edleafe: I'm not saying that :)
15:10:21 <bauzas> edleafe: I'm saying that if Gantt has to depend on something else, it will use the "something else" client library for knowing the "something else" capabilities
15:10:30 <edleafe> bauzas: we're not at a pure client/server relationship
15:10:51 <edleafe> we're relying on individual compute nodes as the source of truth
15:11:00 <bauzas> edleafe: right, and that comes back to my point: that's just because the Scheduler is really tied
15:11:01 <edleafe> not the compute api
15:11:12 <edleafe> understood
15:11:15 <bauzas> edleafe: right, and it won't change
15:11:33 <bauzas> edleafe: meaning that the compute nodes are running a scheduler client
15:11:38 <edleafe> that's what I'm trying to figure out how to deal with.
Not the ideal situation in the future
15:11:49 <bauzas> edleafe: so, we can just consider that the scheduler client has a version
15:12:20 <bauzas> edleafe: as it's already the case for Juno, we know that all updates are going through the client
15:12:38 <bauzas> edleafe: so bumping a client version seems reasonable
15:12:57 <edleafe> bauzas: bumping the scheduler client version, yes
15:13:02 <edleafe> but that doesn't help us here
15:13:04 <bauzas> edleafe: i.e. computes provide their stats through the client, the client is adding a version number to those stats
15:13:28 <bauzas> edleafe: so we keep the release tagging by the scheduler
15:13:36 <edleafe> it's adding the same version number to every stats report
15:13:43 <bauzas> edleafe: incorrect
15:13:56 <edleafe> ??
15:14:00 <bauzas> edleafe: computes have different scheduler client versions
15:14:14 <bauzas> edleafe: because the code is run by the compute node
15:14:16 <alex_xu> I'm confused about what we expect of Gantt in the future: Gantt polls data and nova also pushes data to gantt
15:14:43 <alex_xu> do we want to poll data or push data?
15:14:48 <bauzas> alex_xu: the direction is very clear: Gantt won't *poll* data unless exceptional circumstances
15:15:08 <edleafe> alex_xu: in the future, gantt will own the data
15:15:08 <bauzas> alex_xu: Computes (or others) will push data to Gantt
15:15:21 <lxsli> So we ask compute nodes to write their scheduler client version to DB on startup; then any node sans version which hasn't sent us a message we assume is old?
15:15:33 <bauzas> lxsli: you got it
15:15:35 <edleafe> lxsli: no, we don't even need the db then
15:15:45 <bauzas> edleafe: no, we still need the DB
15:15:46 <lxsli> :D
15:16:00 <bauzas> edleafe: because compute updates are going through the DB now
15:16:03 <alex_xu> bauzas: but we define an interface for syncing instance info.
We call gantt to tell it which instances need updating, then we let gantt poll the updated instances
15:16:04 <edleafe> bauzas: can't the scheduler keep track when it gets stats reports?
15:16:30 <edleafe> Host A has client version 1.2.3, host B has 1.2.4
15:16:30 <bauzas> edleafe: that's an async process
15:16:40 <edleafe> bauzas: yes
15:16:48 <bauzas> edleafe: sched client updates the conductor which updates the DB
15:16:52 <lxsli> edleafe: we need to know whether a node is old on startup, before it gets a chance to update us
15:16:54 <edleafe> and until a host reports a minimal version, we assume it is old
15:17:00 <lxsli> that's why we still need the DB afaik
15:17:17 <edleafe> lxsli: why do we need to know that at startup?
15:17:32 <edleafe> if it's old, it won't be sending updates
15:17:35 <edleafe> (for instances)
15:17:57 <lxsli> because a compute node doesn't message us until something changes on it and we need to be able to schedule to it immediately
15:18:12 <edleafe> when the time comes to add instance info to HostState objects, unless we've seen a minimal version for that host, we'll grab the InstanceList ourselves
15:18:18 <bauzas> edleafe: so you're just saying that we should consider that if no RPC calls are coming from a host, then the host is old - that's cautious :)
15:18:33 <edleafe> bauzas: sort of
15:18:43 <lxsli> we get the initial InstanceList from the DB, we want that to include whether the node is new so we don't do DB queries for new nodes
15:18:44 <edleafe> bauzas: I'm saying that the compute nodes are sending stats regularly
15:18:54 <bauzas> edleafe: I was just saying that computes periodically update stats to the DB using the client, so we can tag this client
15:18:56 <lxsli> edleafe: not regularly - on change
15:19:01 <bauzas> lxsli: +1
15:19:13 <edleafe> bauzas: once we see a minimal version for the client, we know that we are getting instance changes
15:19:28 <bauzas> edleafe: that's even better if we tag those stats - just because we're adding
a version
15:20:42 <bauzas> edleafe: anyway, the idea is the scheduler version - maybe the sync method (which I still think is a bad name, but anyway...) can just report that version
15:20:48 <edleafe> lxsli: if the compute node is new, it will also be sending syncs periodically
15:20:58 <edleafe> lxsli: so we'll know if we missed something
15:21:02 <lxsli> edleafe: ahhh the sync - OK, that can work
15:21:18 <bauzas> gosh, I really dislike the 'sync' word :/
15:21:44 <lxsli> sanity check? :)
15:21:44 <edleafe> bauzas: it's better than 'check_for_same_uuids' :)
15:22:09 <edleafe> bauzas: or 'are_we_in_sync'
15:22:27 <lxsli> OK to sum up - we assume any node is old unless we've had a sanity check from it, and the scheduler client version contained in that is new enough
15:22:28 <bauzas> edleafe: anyway, I don't want to nitpick
15:22:30 <edleafe> lxsli: gut_check
15:22:45 <edleafe> lxsli: that's what I'm thinking
15:22:47 <lxsli> (where 'new enough' is always true right now)
15:23:08 <lxsli> That works for me... we could do a few extra queries in the first minute, but ^^
15:23:35 <bauzas> edleafe: that could work - i.e. the 'sync' method is just "pass context to the scheduler, incl. version"
15:23:45 <edleafe> lxsli: yes, the startup will always be a little crazy until things settle down and we know what we're dealing with
15:23:57 <edleafe> bauzas: yep
15:24:10 <bauzas> +1 on it, I don't want to take too much time on that
15:24:21 <lxsli> +1 woohoo progress!
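The rule the group converges on ("assume any node is old unless we've had a sync from it, and the scheduler client version contained in that sync is new enough") can be sketched as a small tracker. All names here are hypothetical, not the actual Nova scheduler code.

```python
# Minimum scheduler client version considered "new"; illustrative value.
MIN_CLIENT_VERSION = (1, 0, 0)


class HostVersionTracker:
    """Track which compute hosts run a new-enough scheduler client."""

    def __init__(self):
        # host name -> (x, y, z) version tuple from its last sync
        self._versions = {}

    def record_sync(self, host, version):
        """Called whenever a sync message arrives from a compute host."""
        self._versions[host] = tuple(version)

    def is_new_enough(self, host):
        """Hosts we have never heard from are assumed old (cautious)."""
        version = self._versions.get(host)
        return version is not None and version >= MIN_CLIENT_VERSION
```

A host that has never synced reports as old, which matches lxsli's point that the scheduler may do a few extra DB queries in the first minute until syncs arrive.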
15:24:22 <edleafe> bauzas: actually, any sync at all means it's new enough :)
15:24:37 <bauzas> edleafe: agreed
15:24:51 <bauzas> edleafe: adding a version seems reasonable anyway
15:24:58 <edleafe> this is great, because it solves the hairy DB problem I was seeing in tests
15:25:09 <bauzas> eh eh
15:25:23 <edleafe> since the ComputeManager isn't going to be writing to the DB at startup
15:25:34 <bauzas> edleafe: it does :)
15:25:52 <bauzas> edleafe: because ComputeManager calls RT.update_resource_stats() at startup
15:26:17 <edleafe> bauzas: but that's already taken care of
15:26:30 <bauzas> edleafe: anyway, I'll leave you to update the spec
15:26:33 <edleafe> I don't have to mock out this new call all over the place
15:26:37 <lxsli> I put a comment on the spec
15:26:42 <lxsli> What's the next topic?
15:26:46 <bauzas> edleafe: about the spec, I had one comment about passing the user context as an argument
15:26:56 <edleafe> bauzas: yes
15:27:00 <bauzas> basically, I'm not seeing it as added value
15:27:21 <bauzas> if the scheduler wants to query its DB, it doesn't need the Nova user context
15:27:22 <edleafe> so it's not needed for the RPC stuff to work?
15:27:39 <edleafe> Just looking at the other calls, they all pass in context, 'method_name', **kwargs
15:27:50 <bauzas> edleafe: not at all, we are even trying to reduce the number of times we're passing a context around
15:28:00 <edleafe> ok, then I'll take it out
15:28:02 <bauzas> edleafe: like in the object methods
15:28:13 <bauzas> edleafe: we're just passing a context once
15:28:30 <bauzas> edleafe: again, I don't want to nitpick, lxsli thoughts ?
15:28:52 <lxsli> None
15:28:56 <bauzas> awesome
15:29:07 <edleafe> hey, if it's not needed, then I'll take it out
15:29:08 <bauzas> lxsli raised a NotFoundException
15:29:45 <edleafe> anything else on the spec?
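Earlier in the discussion, bauzas notes that computes report their stats through the scheduler client and that the client itself stamps each report with its own version, so different computes naturally report different versions. A hedged sketch of that shape; `SCHEDULER_CLIENT_VERSION` and `report_stats` are invented names, not the real scheduler client API.

```python
# Version of the scheduler client library installed on *this* compute
# node; each node reports whatever client version it actually runs.
SCHEDULER_CLIENT_VERSION = '1.2.3'


def report_stats(host, stats):
    """Build the payload a compute's scheduler client would send.

    The version field is added by the client library itself, not by
    the compute manager, so the scheduler can tell which computes run
    newer client code without any extra plumbing.
    """
    payload = dict(stats)
    payload['host'] = host
    payload['scheduler_client_version'] = SCHEDULER_CLIENT_VERSION
    return payload
```

Because the stamping lives in the client library, upgrading the package on a compute node is enough to change the version it reports, which is the rolling-upgrade property edleafe and bauzas are after.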
15:29:53 <bauzas> ok, before we definitely lose lxsli, we can move on :)
15:30:18 <bauzas> #topic Status on cleanup work
15:30:27 <edleafe> bauzas: need any help on your patch series?
15:30:41 <bauzas> edleafe: which one ? :D
15:30:53 <edleafe> any of them!
15:30:54 <bauzas> detach-service ?
15:31:08 * edleafe was thinking of detach server
15:31:11 <bauzas> so, detach-service-from-computenode is in quite good shape
15:31:12 <edleafe> ugh
15:31:13 <edleafe> service
15:31:18 <bauzas> because I have core support
15:31:26 <lxsli> dan went +A wild :)
15:31:35 <bauzas> and jaypipes kindly helped me on that one
15:31:41 <bauzas> I have other BPs
15:32:00 <bauzas> isolate-sched-db-aggregates is in good shape, i.e. coding coding coding
15:32:31 <bauzas> and RequestSpec objectification is currently blocked because of an Image objectification patch
15:32:48 <bauzas> so, I'm pretty late to the planning, but I can still handle my work :)
15:32:52 <edleafe> bauzas: how can the rest of us help out?
15:33:07 <bauzas> edleafe: honestly, not a lot of things
15:33:09 <lxsli> I have a bit of time too
15:33:13 <edleafe> ok, just checking
15:33:32 <bauzas> edleafe: as I said, it's all about coding a BP which is quite straightforward :)
15:33:48 <edleafe> I'm coding my spec as WIP, so when that finally gets approved, the code should be ready, too
15:33:49 <bauzas> lxsli: you told me this morning about something on the RT objectification, right ?
15:33:58 <bauzas> edleafe: awesome
15:34:24 <lxsli> Yeah Paul promised we'd help with that but we've been blocked / busy
15:34:46 <lxsli> I think my migration object chain is pretty OK right now so I was looking for something more to do in that area
15:35:27 <bauzas> lxsli: I'm lost with your series, could you just give us the link ?
15:36:20 <lxsli> So mine is https://review.openstack.org/#/c/79324/ but I need Jay's https://review.openstack.org/#/c/152689/ to merge, then sahid's patch, before mine can merge
15:36:44 <bauzas> lxsli: ok thanks, I'm starring those changes
15:36:59 <lxsli> So that one will take some time... meanwhile I'm looking for some more RT objectification to help with
15:37:24 <bauzas> lxsli: there are some patches in review about that
15:37:46 <bauzas> lxsli: I saw them in the pipeline from someone named Hans Lindgren
15:38:15 <bauzas> lxsli: here it is https://review.openstack.org/#/c/149224/
15:38:31 <bauzas> lxsli: you can probably help him
15:38:44 <lxsli> OK I'll have a look, thanks
15:39:21 <bauzas> any further things to discuss about the priority BPs or can I open the opens ?
15:39:50 <edleafe> open the opens!
15:40:04 <bauzas> #topic Open discussion
15:40:25 <bauzas> So, wazzup ?
15:41:17 <edleafe> bauzas: https://www.youtube.com/watch?v=tauYnVE6ykU
15:41:19 <lxsli> It's just us 4 here right? Maybe not too much
15:41:43 <edleafe> We discussed my issues already, so I'm good
15:41:46 <bauzas> edleafe: I was exactly on the same video but was too shy to propose it here :)
15:41:50 <bauzas> edleafe: I'm glad you did :)
15:42:00 <edleafe> bauzas: :)
15:42:06 <bauzas> ok, if crickets, then return
15:42:17 <lxsli> Early finish \o/
15:42:22 <bauzas> thanks all
15:42:24 <bauzas> #endmeeting