14:03:34 <edleafe> #startmeeting nova_scheduler
14:03:35 <openstack> Meeting started Mon Sep 12 14:03:34 2016 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:36 <zigo> Le'ts continue on #openstack-pkg
14:03:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:38 <openstack> The meeting name has been set to 'nova_scheduler'
14:03:45 <edleafe> who's here?
14:03:47 <Yingxin> o/
14:03:47 <cdent> o/
14:03:48 <bauzas> \o
14:03:56 <alex_xu> o/
14:03:57 * bauzas with very little support tho
14:04:14 <edleafe> This should be super-quick
14:04:18 <cdent> jay's either on a plane, or on his way to. He promises to review resource provider related stuff on the place
14:04:33 <bauzas> and I need to do my homework as well
14:04:39 <edleafe> cdent: or even the plane?
14:04:43 <edleafe> :_)
14:04:50 <bauzas> but for the moment, working on a regression bugfix
14:05:02 <edleafe> Anyway, everything we need to focus on is here: https://etherpad.openstack.org/p/placement-next
14:05:03 <cdent> edleafe: that too :)
14:05:34 <cdent> we probably need to take a moment (now or after) to make sure that etherpad is up to date, but yeah, it's the center of things at the moment
14:05:49 <edleafe> One thing that just came up, though, is alex_xu's comments on https://review.openstack.org/#/c/368035
14:06:04 <alex_xu> edleafe: thanks
14:06:18 <bauzas> cdent: at least focusing on the "Things we need for Newton" section, hope people can understand :)
14:06:23 <edleafe> that involves including the RP generation both a the request level, and at each individual inventory record
14:06:24 <alex_xu> question about why we have two resource_provider_generation in the request
14:06:34 <edleafe> bauzas: good point
14:06:50 <edleafe> cdent: can you comment on that?
14:06:58 <edleafe> (the multiple generation question)
14:07:12 <cdent> alex_xu: I think the gist there is that the generation at the top level is required, the one within the inventory itself is ignored, but allowed to be there because it is required in the post
14:07:24 <cdent> so if you have code that is generating inventories for both situations you can reuse it
14:07:46 <cdent> it is weird
14:07:57 <bauzas> okay, so we don't have schemas validating the JSON output ?
14:08:02 <alex_xu> cdent: ok, so we still have use that old format, right?
14:08:20 <cdent> it comes about because there was originally only POSTing one inventory at /inventories and then PUT for several inventories at /inventories was added
14:08:25 <bauzas> could we maybe enforce that (the JSON output) at the unittest level ?
14:08:41 <cdent> schemas are not set for output, just input
14:08:49 <bauzas> cdent: is gabbit able to enforce that ?
14:09:03 <bauzas> I mean validing our outputs ?
14:09:05 <cdent> validation of the output is currently only done in the gabbi tests but unit tests are possible as well, if that's desired
14:09:13 <cdent> bauzas: they already do don't they?
14:09:24 <bauzas> cdent: sorry I'm unclear
14:09:30 <cdent> that is: the gabbi tests would fail if the output wasn't what the gabbit tests wanted
14:09:33 <bauzas> cdent: I know we do validate the output with gabbi
14:09:42 <bauzas> cdent: but we validate that per-field, right?
14:09:56 <bauzas> cdent: I was more or less thinking of comparing the whole dict
14:10:03 <cdent> the jsonpaths won't work if the overall structure isn't right
14:10:12 <cdent> but yes, there's not single chunk that validates an entire structure
14:10:29 <johnthetubaguy> bauzas: you mean like the tempest tests that run the output validation json schema things?
14:10:31 <bauzas> cdent: I'm asking that for clarity, ie. something we could point users to
14:10:50 <bauzas> cdent: for the moment, we can only point to specs, right?
14:11:02 <cdent> Yes, we could. If you think we should do it now, instead of ocata, could you make a bug?
14:11:15 <bauzas> cdent: not really for Newton-ishb
14:11:20 <cdent> k
14:11:52 <bauzas> cdent: it's related to the point I missed alex_xu's point mostly because I guess I haven't correctly figured out the JSON output in my mind
14:12:05 * cdent nods
14:12:28 <alex_xu> cdent: the currently code didn't use the generation for each inventory...why we need keep that?
14:13:04 <cdent> alex_xu: we don't _need_ to keep it, but as I said above it is more flexible to do so: "so if you have code that is generating inventories for both situations you can reuse it"
14:13:06 <bauzas> johnthetubaguy: well, I'm rather thinking of the api sample tests that verify the response too
14:13:24 <johnthetubaguy> yeah, I get you now, the samples that then flow into the API docs
14:14:00 <bauzas> johnthetubaguy: I like reading the api sample templates everytime I'm looking for some API response
14:14:06 <cdent> bauzas, johnthetubaguy: we can write, in gabbi if we want to, tests that validate the full request and full response. Most of the gabbi tests right now validate only parts of the response, as they aren't intended to be testing the serializers, but as bauzas points out doing so would make good inspectability
14:14:10 <bauzas> because I'm getting the whole JSON schema
14:14:32 <alex_xu> cdent: ah, thanks
14:14:35 <bauzas> anyway, that's something not really needed for Newton, so no rush
14:14:44 <johnthetubaguy> cdent: yeah, something to capture / check the samples would be good, but yeah, thats of next time
14:16:04 <edleafe> PRobably was good not to have it now, as the structures changed so often as we figured things out
14:16:12 <edleafe> But once they settle down...
14:16:26 <cdent> edleafe++
14:17:25 <edleafe> So cdent - is there anything we can help with (besides reviews)?
14:17:35 * edleafe has a few spare cycles
14:17:43 <cdent> I seem to recall there was a bug posted late last week about something to do with migrations and allocations?
14:17:45 * cdent looks for it
14:18:08 <bauzas> cdent: yeah, we don't track migrations like we should
14:18:14 <cdent> https://bugs.launchpad.net/nova/+bug/1621709
14:18:15 <openstack> Launchpad bug 1621709 in OpenStack Compute (nova) "There is no allocation record for migration action" [Medium,Confirmed] - Assigned to Alex Xu (xuhj)
14:18:24 <alex_xu> yea, I file that bug
14:18:37 <bauzas> cdent: we only reconcile when we run the periodic update
14:18:43 <alex_xu> but after check the situation, looks like not very easy one
14:18:47 <bauzas> oh, alex_xu, you're on it ?
14:18:49 <edleafe> alex_xu: do you need help with that? I can pick it up after your day ends
14:19:02 <cdent> is the periodic thing not sufficient?
14:19:03 <bauzas> alex_xu: well, that seems easy to me
14:19:06 <bauzas> cdent: it is
14:19:22 <johnthetubaguy> the problem is failed live-migrates I assume?
14:19:23 <alex_xu> ok, you guys can free to take it :)
14:19:26 <bauzas> cdent: so that's not really a big deal for newton, only a stretch goal
14:19:35 <bauzas> johnthetubaguy: live migrations are worst than that
14:19:36 <johnthetubaguy> right, its not a big deal for newton
14:19:44 <alex_xu> johnthetubaguy: nothing failed, just we needn't record for claim
14:19:45 <cdent> the other thing, which mriedem wanted now not later, was https://bugs.launchpad.net/nova/+bug/1621888
14:19:46 <openstack> Launchpad bug 1621888 in OpenStack Compute (nova) "placement-api http responses are not marked for translation" [Medium,Confirmed] - Assigned to Chris Dent (cdent)
14:19:50 <bauzas> johnthetubaguy: here, I'm talking of resizes and cold migrations
14:19:51 <alex_xu> johnthetubaguy: agree
14:20:00 <bauzas> johnthetubaguy: because we have the MoveClaim
14:20:07 <cdent> which I'v assigned myself, but I was leaving it until the representations and logs really settled
14:20:16 <cdent> which I'm assuming won't be the case until the resource tracker has stabilized
14:20:16 <mriedem> cdent: we should just fix that today
14:20:19 <bauzas> johnthetubaguy: but, but, we don't have any claims for live-migration, which is evel
14:20:20 <bauzas> evil
14:20:24 <igordcard> :q
14:20:24 <mriedem> RC1 is on thursday
14:20:28 <johnthetubaguy> so, long term, if you don't have a claim, and new instance takes the spot of the place you want to move to, it all goes odd
14:20:52 <bauzas> johnthetubaguy: which means that even if we add the placement api call on migrations, we'd still derail on live-migs
14:21:05 <cdent> mriedem: there's a couple of server side things to merge, hopefully today.
14:21:08 <bauzas> but then the world would be fixed every 60 secs
14:21:17 <alex_xu> and we didn't cleanup the allocation which compute node didn't know in the update_available_resource
14:22:03 <bauzas> okay, I need to get done my cellsv2 patch super quick so I can help with the migration allocation patch
14:22:27 <mriedem> fyi, we have a grenade full job that runs with the placement API in the experimental queue
14:22:35 <cdent> mriedem++
14:22:46 <mriedem> if you're working on placement changes, you can 'check experimental' that
14:23:10 <bauzas> mriedem: <3
14:23:25 <edleafe> mriedem: kewl
14:23:28 <Yingxin> there might be another bug that "can_host" is always 0 for compute node resource providers
14:23:52 <cdent> Yingxin: ah, good catch. I think we've probably just forgotten that
14:24:05 <Yingxin> https://bugs.launchpad.net/nova/+bug/1622538
14:24:06 <openstack> Launchpad bug 1622538 in OpenStack Compute (nova) "Wrong "can_host" field of compute node resource providers" [Undecided,New] - Assigned to Yingxin (cyx1231st)
14:24:34 <Yingxin> easy fix
14:24:36 <cdent> Yingxin: I suspect that was just an oversight when doing the PUT
14:24:46 <Yingxin> cdent: yup
14:25:44 <cdent> dansmith: is your inventory stuff synced up with jay's representation changes?
14:25:55 <edleafe> Be sure to update the etherpad with these bugs and their fixes
14:26:15 <edleafe> #link https://bugs.launchpad.net/nova/+bug/1621709
14:26:18 <openstack> Launchpad bug 1621709 in OpenStack Compute (nova) "There is no allocation record for migration action" [Medium,Confirmed] - Assigned to Alex Xu (xuhj)
14:26:18 <edleafe> #link https://bugs.launchpad.net/nova/+bug/1622538
14:26:19 <openstack> Launchpad bug 1622538 in OpenStack Compute (nova) "Wrong "can_host" field of compute node resource providers" [Undecided,New] - Assigned to Yingxin (cyx1231st)
14:26:45 <Yingxin> edleafe: ok
14:27:18 <cdent> edleafe: in terms of "things to do to help" I reckon it's the same as last time: just try it in devstack and see what's wrong
14:27:37 <mriedem> are we holding up rc1 for those bugs?
14:27:39 <cdent> there are things like: https://bugs.launchpad.net/nova/+bug/1620748
14:27:40 <openstack> Launchpad bug 1620748 in OpenStack Compute (nova) "In placement when an attempt is made to write to missing inventory the error message is ugly" [Medium,Confirmed] - Assigned to Chris Dent (cdent)
14:27:48 <cdent> which are not critical, but present
14:28:10 <edleafe> cdent: ok
14:28:11 <cdent> mriedem: migration one probably not, can_host probably yes
14:28:21 <cdent> but the latter is a very quick fix
14:28:24 <mriedem> my assumption is we're not going to hold rc1 for placement bugs as it's optional
14:28:33 <mriedem> and bugs can be backported
14:28:44 * cdent shrugs
14:28:51 <mriedem> but hit me up if there is something ready that people feel we should get in
14:28:58 <cdent> I don't understand all these rules and regulations :)
14:29:18 <dansmith> cdent: no, I should probably rebase that on his I guess
14:29:42 <dansmith> cdent: I had expected mine to be merged already and that his would just fix it up in place, but...
14:30:06 * cdent nods at dansmith with ellipsis in his eyes
14:30:07 <edleafe> cdent: http://ru.memegenerator.net/instance/63122651
14:30:40 <bauzas> well, MHO is that I don't think we should hold RC1 for placement bugs - but, RC2 and later could be a possibility
14:30:40 <dansmith> mriedem: placement api changes should go in before rc1 if we can
14:30:45 <dansmith> mriedem: otherwise it's pretty messy
14:31:08 <mriedem> dansmith: sure, but the range of what those changes are is pretty diverse
14:31:34 <dansmith> mriedem: there's only one left, AFAIK
14:31:52 <dansmith> this one: https://review.openstack.org/#/c/368035/
14:32:40 <mriedem> ok, um https://etherpad.openstack.org/p/placement-next
14:32:46 <mriedem> whoever is updating the 'things we need for newton'
14:32:56 <mriedem> are we sure?
14:33:07 <mriedem> my point is let's just make sure the list in https://etherpad.openstack.org/p/placement-next is sane
14:33:41 <mriedem> anyway, i'll be hitting up people later after meetings
14:33:54 <edleafe> ...or just hitting people
14:33:56 <cdent> huh
14:34:18 <cdent> I apparently never have enough context when people mention things whether they mean now or later
14:34:52 <cdent> Nor does the arbitrariness of it all ever become clear
14:35:01 <mriedem> if something is going to bad to backport,
14:35:12 <mriedem> or make placement much worse to migrate to if you don't have it at rc1,
14:35:17 <mriedem> then let's mark it for rc1
14:35:25 <mriedem> but like the translation bug can be a non-rc1 thing
14:35:31 <dansmith> definitely
14:35:48 <dansmith> mriedem: I just made some updates to the list
14:35:55 <cdent> mriedem: so, on the translation thing, why do _any_ of it now instead of just waiting?
14:35:59 <dansmith> mriedem: three patches for newton, but one is the critical, the others could go later if we *had* to
14:36:05 <cdent> your comments on the bug is confusing
14:37:22 <mriedem> cdent: b/c it'll take 5 minutes to do it?
14:37:35 <mriedem> and there will be translations after rc1
14:37:46 <cdent> sure, but then we have some of it marked and some of it not, which just seems weird
14:37:57 <mriedem> i'm unclear on why we're waiting for things on the server side that's preventing us from translating the api side
14:38:03 <mriedem> s/server/RT/
14:38:07 <mriedem> unless api == server
14:38:18 <cdent> api does == server
14:38:24 <cdent> because the bug is about api responses
14:38:29 <mriedem> anyway, if there are new changes that are adding new exceptions, then let's just mark those for translatoin in the changes that introduces them...
14:38:31 <mriedem> and not add more gorp
14:39:17 <cdent> k
14:39:42 <edleafe> Anything else?
14:39:51 <edleafe> Or do we get back to work?
14:40:21 <mriedem> back to the pile
14:40:24 * edleafe only hears the ringing in my ears
14:40:33 <edleafe> ok, everyone - thanks!
14:40:35 <edleafe> #endmeeting
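
Note on the generation question (14:06:24 to 14:13:04): cdent's point was that the top-level resource_provider_generation is the one that counts in the bulk PUT, while a generation nested inside each inventory is ignored but tolerated so that code written for the single-inventory POST can be reused. The sketch below only illustrates that reuse. The helper names, the example values, and the exact body layout (inventories keyed by resource class) are assumptions for illustration, not something confirmed in the meeting or taken from the patch under review.

```python
import json


def build_inventory(resource_class, total, generation):
    """Build one inventory record in the single-record (POST) shape.

    Hypothetical helper; the field layout is an assumption.
    """
    return {
        "resource_class": resource_class,
        "total": total,
        # Needed when POSTing a single inventory. Per the discussion it is
        # allowed, but ignored, when the record is nested in a bulk PUT.
        "resource_provider_generation": generation,
    }


def build_bulk_put_body(records, generation):
    """Wrap single-record inventories in an assumed bulk (PUT) shape."""
    inventories = {}
    for record in records:
        # Copy first so the caller's record is left untouched.
        fields = dict(record)
        resource_class = fields.pop("resource_class")
        inventories[resource_class] = fields
    return {
        # The top-level generation is the one the server is said to check.
        "resource_provider_generation": generation,
        "inventories": inventories,
    }


records = [
    build_inventory("VCPU", 8, 3),
    build_inventory("MEMORY_MB", 16384, 3),
]
body = build_bulk_put_body(records, 3)
print(json.dumps(body, indent=2, sort_keys=True))
```

The same build_inventory output can be sent as-is for a single POST or wrapped for the PUT, which is the flexibility cdent describes; dropping the nested generation would mean maintaining two record builders.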