17:00:23 <alaski> #startmeeting nova_cells
17:00:23 <openstack> Meeting started Wed Jun 8 17:00:23 2016 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:27 <openstack> The meeting name has been set to 'nova_cells'
17:00:43 <mriedem> o/
17:00:44 <doffm> o/
17:00:44 <dansmith> ohai
17:00:45 <auggy> o/
17:00:48 <melwitt> o/
17:00:55 <alaski> hello everyone
17:00:59 <alaski> #topic Testing
17:01:14 <alaski> no breaks as far as I know
17:01:24 <alaski> auggy: any update on grenade testing?
17:01:34 <auggy> i've got a WIP change up - https://review.openstack.org/#/c/326105/
17:01:50 <alaski> awesome
17:01:51 <auggy> it's just copying the multinode test
17:01:54 <auggy> nothing special
17:01:56 * alaski opens a tab
17:01:59 <auggy> i'm still trying to get devstack-gate to work
17:02:15 <auggy> so i can get the grenade target devstack settings we need
17:02:24 <alaski> okay
17:02:46 <auggy> well, whenever the simple cells setup stuff is ready
17:02:47 <auggy> :)
17:03:07 <alaski> great. I'll check out the review in a bit
17:03:08 <auggy> right now i'm just troubleshooting adding that test and making jenkins pass
17:03:22 <alaski> for the simple cells setup I have https://review.openstack.org/#/c/322311/
17:03:24 <auggy> yeah it's failing right now because i'm not putting a thing somewhere it needs to go
17:04:02 <alaski> okay
17:04:07 <alaski> thanks for working on that
17:04:24 <auggy> alaski: oh great! as soon as i can get devstack-gate working to create a devstack i'll check out that change and see what happens
17:04:32 <auggy> yeah and feel free to pipe in if it looks like i'm going down a rabbit hole i shouldn't be
17:04:45 <alaski> sure
17:04:52 <alaski> #topic Open Reviews
17:05:01 <alaski> https://etherpad.openstack.org/p/newton-nova-priorities-tracking
17:05:20 <alaski> I have not been keeping my stuff up to date
17:05:30 <alaski> I will get on that in a bit
17:05:31 * dansmith wags his finger
17:05:42 * alaski hangs his head
17:05:43 <woodster_> o/
17:05:52 <melwitt> should WIP things go in there or no?
17:05:56 <alaski> please don't follow my lead, keep it up to date
17:06:02 <alaski> melwitt: I would say yes
17:06:10 <alaski> just mark it as such in there
17:06:21 <melwitt> okay
17:06:48 <alaski> #topic Open Discussion
17:06:55 <alaski> I have a few prepared items here today
17:07:10 <alaski> first, I want to mention an issue with instance.name
17:07:24 <alaski> by default it relies on using instance.id
17:07:30 <alaski> which is assigned by the db
17:07:41 <alaski> so returning that before writing to the cell db is problematic
17:07:54 <alaski> my planned solution is to make it blank until it's in a cell db
17:07:57 <dansmith> alaski: just want to be clear: we expose instance.name via the external API?
17:08:02 <alaski> yes
17:08:08 <dansmith> I'm not sure why we would do that, but.. heh, okay
17:08:10 <alaski> I'm pretty sure we do
17:08:13 <mriedem> external attribute i think
17:08:50 <mriedem> i'm not sure why we do a lot of the things we do
17:08:50 <alaski> yep, just another instance of oversharing implementation details
17:08:57 <mriedem> which is why sean is taking a flamethrower to the api :)
17:08:57 <dansmith> "OS-EXT-SRV-ATTR:instance_name": "instance-00000001",
17:09:16 <alaski> mriedem: I'm trying to start small fires here and there as well
17:09:19 <doffm> Why do we need to base it on the db id?
17:09:25 <doffm> Can't we give it a uuid instead?
17:09:28 <alaski> doffm: we don't
17:09:29 <doffm> For new servers?
17:09:34 <alaski> but in the past we did
17:09:44 <dansmith> doffm: we don't, it's just configurable and used in some scary places
17:09:46 <mriedem> doffm: see the instance_name_template config option
17:09:52 <doffm> Ok.
17:09:54 <alaski> doffm: the more complex answer is that it's generated on each access
17:10:07 <doffm> Ouch.
17:10:10 <alaski> if we persisted it for older instances we could update this
17:10:21 <alaski> but I didn't want to go down that rabbit hole atm
17:10:22 <dansmith> we can work around this, but it's far easier to either not return it until we know it, or have it appear to change
17:10:23 <mriedem> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L252
17:10:43 <dansmith> this is a relic from long ago
17:10:58 <doffm> LOL. That function.
17:11:00 <dansmith> the last time we tried to remove it, we realized all people with lvm-backed instances would be screwed
17:11:01 <alaski> dansmith: yeah, I want to run an object abuse past you for this. but I would like to keep it blank at first
17:11:04 <dansmith> and there are a couple other places
17:11:27 <dansmith> alaski: this is not remotable so we can abuse it at will, but I'd rather start with it blank or missing yeah
17:11:40 <alaski> cool
17:11:54 <mriedem> how are we going to namespace it per cell?
17:11:59 <mriedem> to avoid collisions?
17:12:03 <alaski> we aren't
17:12:19 <alaski> there's no guarantee of uniqueness here
17:12:34 <dansmith> that's the thing
17:12:34 <alaski> right now someone could make a static template
17:12:40 <dansmith> it will overlap.. a lot
17:12:44 <dansmith> right
17:12:52 <mriedem> it eventually just turns into the uuid if it can't fit the template
17:13:05 <mriedem> https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L268-L271
17:13:41 <alaski> yeah, but a template like 'instance' should work
17:13:43 <mriedem> anyway, yeah, you could goof that template up
17:13:49 <dansmith> it doesn't matter,
17:13:56 <dansmith> there are resources named from that template right now
17:14:03 <dansmith> if we change it, then those resources are lost
17:14:09 <alaski> I would like to address this eventually, but not as a blocker for what I'm working on now
17:14:09 <dansmith> like lvms for instance backing disks
17:14:30 <alaski> eventually I want to persist the name, and snapshot every current instance name
17:14:37 <alaski> but that's harder than it seems at first
17:15:08 <alaski> next up
17:15:16 <alaski> adding a new compute to a cell
17:15:35 <alaski> I realized yesterday that we have plans for migrating what exists to a cell
17:15:42 <alaski> but no plans for how things should be added to one
17:16:06 <alaski> so when a compute is added it needs to get mapped properly
17:16:18 <alaski> I would like it to look up the info it needs, and have it do it itself
17:16:27 <dansmith> yes please
17:16:30 <alaski> and all it needs is the cell mapping uuid
17:16:47 <alaski> so we could require that in a config, or put it in the cell db
17:17:25 <alaski> I favor putting it in the db, but a config is the simpler start I think
17:17:43 <alaski> any thoughts?
17:18:11 <melwitt> so every compute nova.conf would contain the cell uuid?
17:18:20 <alaski> yeah
17:18:34 <dansmith> I don't love it, but...
17:18:42 <melwitt> okay, just making sure I understand
17:18:44 <mriedem> how would the db one work?
17:18:58 <alaski> a new cell table that just stored that uuid
17:19:07 <alaski> since every compute is configured to point at a db
17:19:16 <dansmith> alaski: so there is another option maybe:
17:19:23 <dansmith> alaski: instead of the upcall which kinda sucks anyway,
17:19:38 <dansmith> alaski: what if we had a "discover" command either via api or nova-manage,
17:19:59 <dansmith> which would merely list all computes in each cell in sequence, find new compute nodes that aren't yet mapped, and add a mapping for them?
17:20:05 <dansmith> that would avoid the upcall, not require a config,
17:20:23 <dansmith> and would have the knowledge of the cell already so it doesn't need a cell uuid persisted anywhere other than where we have it already
17:20:34 <dansmith> and then we could let that run periodically or just say "put that in cron if you want it"
17:20:48 <dansmith> presumably people only want to do that when they know they're adding new computes for the first time
17:20:56 <dansmith> and it wouldn't be "register this one compute node" kind of sucky
17:21:08 <alaski> fair point
17:21:34 <dansmith> and it could even be "discover --all-cells" or "discover --cell=$uuid"
17:21:44 <dansmith> to be lightweight when you just lit up a new rack
17:21:55 <melwitt> interesting idea
17:22:00 <dansmith> the computes already check in to their own cell by queue, so no reason to make them upcall I think
17:22:06 <alaski> I don't love the extra deployer step, but it does simplify it
17:22:23 <dansmith> because people that don't allow upcalls by policy (which hopefully will be most people eventually) would have an issue registering new computes
17:22:45 <melwitt> that's a good point
17:22:58 <mriedem> alaski: there is an extra deployer step with the config option too
17:23:04 <dansmith> alaski: well, we could make a periodic task at the top at some point that just does "SELECT count(id) FROM compute_nodes" every few minutes on each cell db
17:23:04 <alaski> yeah
17:23:14 <dansmith> mriedem: yeah and that is more painful, IMHO
17:23:18 <alaski> mriedem: that's why I'm in favor of the db option. but it still requires the upcall
17:23:33 <dansmith> alaski: anyway, one more step right now that we can automate is not a huge deal I think
17:23:50 <alaski> dansmith: right. so I think this sounds like a good first step, and then it can be refined later
17:24:28 <alaski> I'll try that, and we can debate further on a review
17:25:01 <alaski> my final agenda item: I'm going to be heading to the airport during next week's meeting, so does someone want to run it, or skip?
17:25:03 <dansmith> alaski: I can think of lots of lightweight ways the scheduler could detect that we have more compute nodes than mappings, and trigger a discovery
17:25:25 <dansmith> and by "lots" I mean "at least one"
17:25:31 <melwitt> :)
17:25:35 <alaski> dansmith: yes, until the scheduler splits (I'm still hopeful on that)
17:25:46 <dansmith> I hate meetings, I suggest we skip
17:25:54 <doffm> We can probably miss a week.
17:26:02 <dansmith> alaski: I'm not sure where all this kind of stuff goes in that case anyway, but yeah
17:26:07 <melwitt> yeah, a skip is cool with me too
17:26:19 <alaski> cool
17:26:24 <alaski> #note no meeting next week
17:26:40 <alaski> dangit
17:26:45 <alaski> #info no meeting next week
17:26:54 <alaski> okay, any other topics for today?
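
[Editor's aside, not part of the log: a minimal, runnable sketch of the "discover" command dansmith proposes above. The data structures are in-memory stand-ins for the API database's cell/host mappings and for each cell database's compute_nodes table; none of the names are real Nova APIs.]

# cell uuid -> compute hostnames found in that cell's database (stand-in data)
cell_compute_nodes = {
    "cell1-uuid": {"compute1", "compute2"},
    "cell2-uuid": {"compute3"},
}

# (cell uuid, host) pairs already recorded as host mappings in the API database
host_mappings = {("cell1-uuid", "compute1")}


def discover_hosts(cell_uuid=None):
    """Map compute hosts that exist in a cell db but have no host mapping.

    Runs for one cell ("discover --cell=$uuid") or all cells
    ("discover --all-cells"). No upcall from the computes is needed: the
    caller already knows every cell database and its uuid.
    """
    cells = [cell_uuid] if cell_uuid else sorted(cell_compute_nodes)
    added = []
    for cell in cells:
        for host in sorted(cell_compute_nodes[cell]):
            if (cell, host) not in host_mappings:
                host_mappings.add((cell, host))
                added.append((cell, host))
    return added


print(discover_hosts())              # maps compute2 and compute3
print(discover_hosts("cell1-uuid"))  # nothing new to map
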
17:27:27 <melwitt> I wanted to mention I put up a WIP for people to have a look at for querying cell service version for compute RPC calls
17:27:57 <melwitt> https://review.openstack.org/#/c/326906/ so feel free to comment
17:28:17 <mriedem> hmmm,
17:28:29 <mriedem> that makes me think of the check i have in the get me a network rest api change
17:28:41 <mriedem> it's checking that all computes in the deployment are at least newton
17:28:57 <mriedem> with cells it would have to aggregate that all up
17:29:11 <mriedem> but you could be smarter and schedule the instance to a particular cell that is fully newton
17:29:20 <alaski> yeah
17:29:29 <alaski> I was thinking we could treat cells individually
17:30:07 <mriedem> where does the service table live in cells v2?
17:30:14 <dansmith> in the cell
17:30:41 <mriedem> so when asking for min nova-compute service, that will just be checking all computes as it does today,
17:30:49 <mriedem> but those computes would be grouped by cells in some mapping table
17:31:03 <dansmith> you can't do a single query of all compute services
17:31:18 <dansmith> you can do N for N cells and then pick the cells that are >=X
17:31:35 <mriedem> sure
17:31:48 <mriedem> that check in the api won't work unless the scheduler does the right thing though
17:32:33 <mriedem> easy out is just require all computes in all cells to be >=x
17:32:36 <dansmith> right, the scheduler would have to consider it
17:32:54 <dansmith> for single feature adds,
17:33:05 <dansmith> not allowing it until everything is upgraded is totally fine, IMHO
17:33:24 <mriedem> yeah i'm fine with that
17:33:52 <mriedem> do we have a stance on mixed cell deployments?
17:34:03 <mriedem> i.e. i can have a newton cell and a mitaka cell?
17:34:06 <alaski> they're going to need to be possible
17:34:07 <doffm> I think so.
17:34:14 <dansmith> yeah, not optional
17:34:18 <doffm> I mean you will want to roll cells.
17:34:24 <alaski> right
17:34:25 <mriedem> roll computes within your rolling cells
17:34:27 <mriedem> mfer
17:34:28 <dansmith> it'd be a regression to atomic upgrades
17:35:00 <alaski> always be upgrading
17:35:05 <dansmith> lol
17:35:08 <mriedem> god
17:35:14 <dansmith> need .. the .. tshirt
17:35:26 <mriedem> doffm: not it for the ansible changes to handle this
17:35:27 <alaski> hah
17:36:01 <doffm> mriedem: It. :(
17:36:03 <mriedem> ok i'm done with random questions
17:36:25 <melwitt> I was thinking of starting work on the server groups migrations if that's cool with everyone
17:36:44 <dansmith> I OBJECT
17:36:49 <dansmith> (I don't object)
17:36:50 <doffm> YES
17:36:53 * melwitt goes back in cave
17:36:54 <alaski> I'd rather we just got rid of them... but that's cool with me
17:37:07 <melwitt> heh
17:37:10 <mriedem> yeah i was going to say quotas is probably higher priority?
17:37:17 <mriedem> but shittier
17:37:28 <alaski> quotas is in progress right?
17:37:30 <melwitt> I think doffm is doing quotas right?
17:37:32 <mriedem> is it?
17:37:35 <mriedem> ha
17:37:36 <doffm> I will start on quotas next week.
17:37:40 <mriedem> seriously?
17:37:43 <doffm> I'll add it to our backlog.
17:37:46 <doffm> Or do it in the evening.
17:37:55 <mriedem> doffm: i need to know how this benefits ibm public cloud
17:38:14 <doffm> mriedem: Shhhhh. Nothing happening here.
17:38:14 <mriedem> anywho
17:38:19 <melwitt> I can do quotas if that's more needed, I just thought it was already taken
17:38:37 <mriedem> mark wrote the spec
17:38:40 <dansmith> melwitt: run away .. fast
17:38:52 <mriedem> but i'm pretty sure mark is overcommitted, but i'll let him hang himself if he wants
17:39:08 <melwitt> :)
17:39:55 <mriedem> let's end meeting before dan freaks out
17:40:02 <dansmith> yes please
17:40:14 <alaski> alright, anything else?
17:40:19 <alaski> better speak up quick
17:40:34 <alaski> thanks everyone!
17:40:36 <alaski> #endmeeting
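
[Editor's aside, not part of the log: a minimal sketch of the per-cell minimum service version check discussed at 17:31 ("you can do N for N cells and then pick the cells that are >=X"). The version numbers and data layout are stand-ins; in a real deployment each list would come from a query against one cell database, since no single services table spans cells.]

NEWTON_COMPUTE_VERSION = 16  # hypothetical minimum service version for the new feature

# cell uuid -> nova-compute service versions reported in that cell's database
cell_service_versions = {
    "cell1-uuid": [16, 16, 17],  # fully upgraded
    "cell2-uuid": [15, 16],      # still has an older compute
}


def cells_supporting(minimum):
    """Return the cells whose minimum nova-compute version is >= minimum.

    One query per cell (N queries for N cells). The API or scheduler can then
    either restrict scheduling to these cells or, as the easy out, refuse the
    feature until every cell qualifies.
    """
    return [cell for cell, versions in sorted(cell_service_versions.items())
            if min(versions) >= minimum]


eligible = cells_supporting(NEWTON_COMPUTE_VERSION)
print(eligible)                                      # ['cell1-uuid']
print(len(eligible) == len(cell_service_versions))   # False: don't enable globally yet
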