17:00:23 #startmeeting nova_cells
17:00:23 Meeting started Wed Jun 8 17:00:23 2016 UTC and is due to finish in 60 minutes. The chair is alaski. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:27 The meeting name has been set to 'nova_cells'
17:00:43 o/
17:00:44 o/
17:00:44 ohai
17:00:45 o/
17:00:48 o/
17:00:55 hello everyone
17:00:59 #topic Testing
17:01:14 no breaks as far as I know
17:01:24 auggy: any update on grenade testing?
17:01:34 i've got a WIP change up - https://review.openstack.org/#/c/326105/
17:01:50 awesome
17:01:51 it's just copying the multinode test
17:01:54 nothing special
17:01:56 * alaski opens a tab
17:01:59 i'm still trying to get devstack-gate to work
17:02:15 so i can get the grenade target devstack settings we need
17:02:24 okay
17:02:46 well, whenever the simple cells setup stuff is ready
17:02:47 :)
17:03:07 great. I'll check out the review in a bit
17:03:08 right now i'm just troubleshooting adding that test and making jenkins pass
17:03:22 for the simple cells setup I have https://review.openstack.org/#/c/322311/
17:03:24 yeah it's failing right now because i'm not putting a thing somewhere it needs to go
17:04:02 okay
17:04:07 thanks for working on that
17:04:24 alaski: oh great! as soon as i can get devstack-gate working to create a devstack i'll check out that change and see what happens
17:04:32 yeah and feel free to pipe in if it looks like i'm going down a rabbit hole i shouldn't be
17:04:45 sure
17:04:52 #topic Open Reviews
17:05:01 https://etherpad.openstack.org/p/newton-nova-priorities-tracking
17:05:20 I have not been keeping my stuff up to date
17:05:30 I will get on that in a bit
17:05:31 * dansmith wags his finger
17:05:42 * alaski hangs his head
17:05:43 o/
17:05:52 should WIP things go in there or no?
17:05:56 please don't follow my lead, keep it up to date
17:06:02 melwitt: I would say yes
17:06:10 just mark it as such in there
17:06:21 okay
17:06:48 #topic Open Discussion
17:06:55 I have a few prepared items here today
17:07:10 first, I want to mention an issue with instance.name
17:07:24 by default it relies on using instance.id
17:07:30 which is assigned by the db
17:07:41 so returning that before writing to the cell db is problematic
17:07:54 my planned solution is to make it blank until it's in a cell db
17:07:57 alaski: just want to be clear: we expose instance.name via the external API?
17:08:02 yes
17:08:08 I'm not sure why we would do that, but.. heh, okay
17:08:10 I'm pretty sure we do
17:08:13 external attribute i think
17:08:50 i'm not sure why we do a lot of the things we do
17:08:50 yep, just another instance of oversharing implementation details
17:08:57 which is why sean is taking a flamethrower to the api :)
17:08:57 "OS-EXT-SRV-ATTR:instance_name": "instance-00000001",
17:09:16 mriedem: I'm trying to start small fires here and there as well
17:09:19 Why do we need to base it on the db id?
17:09:25 Can't we give it a uuid instead?
17:09:28 doffm: we don't
17:09:29 For new servers?
17:09:34 but in the past we did
17:09:44 doffm: we don't, it's just configurable and used in some scary places
17:09:46 doffm: see the instance_name_template config option
17:09:52 Ok.
17:09:54 doffm: the more complex answer is that it's generated on each access
17:10:07 Ouch.
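The name being discussed here is never stored; it is rendered from the instance_name_template config option each time it is accessed, which is why the missing db id is a problem before the instance lands in a cell database. A minimal sketch of the general idea, assuming a made-up render_instance_name helper rather than nova's actual code:

```python
# Illustrative only: a simplified take on "generated on each access"
# from a template such as 'instance-%08x'.  render_instance_name is a
# made-up helper, not nova's implementation.

def render_instance_name(template, instance_id, instance_uuid):
    if instance_id is None:
        # No row in a cell database yet, so there is no id to format;
        # the plan discussed here is to leave the name blank until then.
        return ''
    try:
        return template % instance_id
    except (TypeError, ValueError):
        # If the template can't be applied, fall back to the uuid
        # (the "it eventually just turns into the uuid" case below).
        return instance_uuid


print(render_instance_name('instance-%08x', 1, 'aaaa-bbbb-cccc'))     # instance-00000001
print(render_instance_name('instance-%08x', None, 'aaaa-bbbb-cccc'))  # (blank)
```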
17:10:10 if we persisted it for older instances we could update this
17:10:21 but I didn't want to go down that rabbit hole atm
17:10:22 we can work around this, but it's far easier to either not return it until we know it, or have it appear to change
17:10:23 https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L252
17:10:43 this is a relic from long ago
17:10:58 LOL. That function.
17:11:00 the last time we tried to remove it, we realized all people with lvm-backed instances would be screwed
17:11:01 dansmith: yeah, I want to run an object abuse past you for this. but I would like to keep it blank at first
17:11:04 and there are a couple other places
17:11:27 alaski: this is not remotable so we can abuse it at will, but I'd rather start with it blank or missing yeah
17:11:40 cool
17:11:54 how are we going to namespace it per cell?
17:11:59 to avoid collisions?
17:12:03 we aren't
17:12:19 there's no guarantee of uniqueness here
17:12:34 that's the thing
17:12:34 right now someone could make a static template
17:12:40 it will overlap.. a lot
17:12:44 right
17:12:52 it eventually just turns into the uuid if it can't fit the template
17:13:05 https://github.com/openstack/nova/blob/master/nova/objects/instance.py#L268-L271
17:13:41 yeah, but a template like 'instance' should work
17:13:43 anyway, yeah, you could goof that template up
17:13:49 it doesn't matter,
17:13:56 there are resources named from that template right now
17:14:03 if we change it, then those resources are lost
17:14:09 I would like to address this eventually, but not as a blocker for what I'm working on now
17:14:09 like lvms for instance backing disks
17:14:30 eventually I want to persist the name, and snapshot every current instance name
17:14:37 but that's harder than it seems at first
17:15:08 next up
17:15:16 adding a new compute to a cell
17:15:35 I realized yesterday that we have plans for migrating what exists to a cell
17:15:42 but no plans for how things should be added to one
17:16:06 so when a compute is added it needs to get mapped properly
17:16:18 I would like it to look up the info it needs, and have it do it itself
17:16:27 yes please
17:16:30 and all it needs is the cell mapping uuid
17:16:47 so we could require that in a config, or put it in the cell db
17:17:25 I favor putting it in the db, but a config is the simpler start I think
17:17:43 any thoughts?
17:18:11 so every compute nova.conf would contain the cell uuid?
17:18:20 yeah
17:18:34 I don't love it, but...
17:18:42 okay, just making sure I understand
17:18:44 how would the db one work?
17:18:58 a new cell table that just stored that uuid
17:19:07 since every compute is configured to point at a db
17:19:16 alaski: so there is another option maybe:
17:19:23 alaski: instead of the upcall which kinda sucks anyway,
17:19:38 alaski: what if we had a "discover" command either via api or nova-manage,
17:19:59 which would merely list all computes in each cell in sequence, find new compute nodes that aren't yet mapped, and add a mapping for them?
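A rough sketch of what that discover pass could look like, using placeholder data structures rather than nova's real objects (the discover_computes helper, the cell uuids, and the record shapes are all assumptions for illustration):

```python
# Illustrative sketch of the "discover" idea described above: walk each
# cell, find compute hosts that have no host mapping yet, and create one.
# The data structures and helper name are placeholders, not nova's API.

def discover_computes(cells, host_mappings):
    """Return new {host: cell_uuid} mappings for unmapped compute hosts.

    cells: dict mapping cell uuid -> list of compute host names found in
           that cell's database.
    host_mappings: dict of already-mapped host name -> cell uuid
                   (the API-level mapping table).
    """
    new_mappings = {}
    for cell_uuid, hosts in cells.items():
        for host in hosts:
            if host not in host_mappings and host not in new_mappings:
                new_mappings[host] = cell_uuid
    return new_mappings


cells = {
    'cell1-uuid': ['compute1', 'compute2'],
    'cell2-uuid': ['compute3'],          # freshly added host, not mapped yet
}
existing = {'compute1': 'cell1-uuid', 'compute2': 'cell1-uuid'}

print(discover_computes(cells, existing))   # {'compute3': 'cell2-uuid'}
```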
17:20:05 that would avoid the upcall, not require a config,
17:20:23 and would have the knowledge of the cell already so it doesn't need a cell uuid persisted anywhere other than where we have it already
17:20:34 and then we could let that run periodically or just say "put that in cron if you want it"
17:20:48 presumably people only want to do that when they know they're adding new computes for the first time
17:20:56 and it wouldn't be "register this one compute node" kind of sucky
17:21:08 fair point
17:21:34 and it could even be "discover --all-cells" or "discover --cell=$uuid"
17:21:44 to be lightweight when you just lit up a new rack
17:21:55 interesting idea
17:22:00 the computes already check in to their own cell by queue, so no reason to make them upcall I think
17:22:06 I don't love the extra deployer step, but it does simplify it
17:22:23 because people that don't allow upcalls by policy (which hopefully will be most people eventually) would have an issue registering new computes
17:22:45 that's a good point
17:22:58 alaski: there is an extra deployer step with the config option too
17:23:04 alaski: well, we could make a periodic task at the top at some point that just does "SELECT count(id) FROM compute_nodes" every few minutes on each cell db
17:23:04 yeah
17:23:14 mriedem: yeah and that is more painful, IMHO
17:23:18 mriedem: that's why I'm in favor of the db option. but it still requires the upcall
17:23:33 alaski: anyway, one more step right now that we can automate is not a huge deal I think
17:23:50 dansmith: right. so I think this sounds like a good first step, and then it can be refined later
17:24:28 I'll try that, and we can debate further on a review
17:25:01 my final agenda item: I'm going to be heading to the airport during next week's meeting, so does someone want to run it, or skip?
17:25:03 alaski: I can think of lots of lightweight ways the scheduler could detect that we have more compute nodes than mappings, and trigger a discovery
17:25:25 and by "lots" I mean "at least one"
17:25:31 :)
17:25:35 dansmith: yes, until the scheduler splits (I'm still hopeful on that)
17:25:46 I hate meetings, I suggest we skip
17:25:54 We can probably miss a week.
17:26:02 alaski: I'm not sure where all this kind of stuff goes in that case anyway, but yeah
17:26:07 yeah, a skip is cool with me too
17:26:19 cool
17:26:24 #note no meeting next week
17:26:40 dangit
17:26:45 #info no meeting next week
17:26:54 okay, any other topics for today?
17:27:27 I wanted to mention I put up a WIP for people to have a look at for querying cell service version for compute RPC calls
17:27:57 https://review.openstack.org/#/c/326906/ so feel free to comment
17:28:17 hmmm,
17:28:29 that makes me think of the check i have in the get me a network rest api change
17:28:41 it's checking that all computes in the deployment are at least newton
17:28:57 with cells it would have to aggregate that all up
17:29:11 but you could be smarter and schedule the instance to a particular cell that is fully newton
17:29:20 yeah
17:29:29 I was thinking we could treat cells individually
17:30:07 where does the service table live in cells v2?
17:30:14 in the cell
17:30:41 so when asking for min nova-compute service, that will just be checking all computes as it does today,
17:30:49 but those computes would be grouped by cells in some mapping table
17:31:03 you can't do a single query of all compute services
17:31:18 you can do N for N cells and then pick the cells that are >=X
17:31:35 sure
17:31:48 that check in the api won't work unless the scheduler does the right thing though
17:32:33 easy out is just require all computes in all cells to be >=x
17:32:36 right, the scheduler would have to consider it
17:32:54 for single feature adds,
17:33:05 not allowing it until everything is upgraded is totally fine, IMHO
17:33:24 yeah i'm fine with that
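A minimal sketch of treating cells individually for that version check, assuming each cell database can report the service versions of its own computes; the eligible_cells helper, input shape, and version numbers are illustrative, not what the WIP review implements:

```python
# Minimal sketch of per-cell version checks: a cell qualifies only if the
# minimum nova-compute service version across its computes meets the
# feature's requirement.  Input format and helper name are illustrative.

def eligible_cells(cell_service_versions, required_version):
    """cell_service_versions: dict of cell uuid -> list of nova-compute
    service versions reported in that cell's database."""
    return [
        cell for cell, versions in cell_service_versions.items()
        if versions and min(versions) >= required_version
    ]


versions = {
    'cell1-uuid': [16, 16, 17],   # one compute still lagging behind
    'cell2-uuid': [17, 17],       # fully upgraded
}

# Scheduling only to cells that are entirely at the required version:
print(eligible_cells(versions, 17))   # ['cell2-uuid']

# The "easy out" from the discussion: require every cell to qualify.
print(len(eligible_cells(versions, 17)) == len(versions))   # False
```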
17:33:52 do we have a stance on mixed cell deployments?
17:34:03 i.e. i can have a newton cell and a mitaka cell?
17:34:06 they're going to need to be possible
17:34:07 I think so.
17:34:14 yeah, not optional
17:34:18 I mean you will want to roll cells.
17:34:24 right
17:34:25 roll computes within your rolling cells
17:34:27 mfer
17:34:28 it'd be a regression to atomic upgrades
17:35:00 always be upgrading
17:35:05 lol
17:35:08 god
17:35:14 need .. the .. tshirt
17:35:26 doffm: not it for the ansible changes to handle this
17:35:27 hah
17:36:01 mriedem: It. :(
17:36:03 ok i'm done with random questions
17:36:25 I was thinking of starting work on the server groups migrations if that's cool with everyone
17:36:44 I OBJECT
17:36:49 (I don't object)
17:36:50 YES
17:36:53 * melwitt goes back in cave
17:36:54 I'd rather we just got rid of them... but that's cool with me
17:37:07 heh
17:37:10 yeah i was going to say quotas is probably higher priority?
17:37:17 but shittier
17:37:28 quotas is in progress right?
17:37:30 I think doffm is doing quotas right?
17:37:32 is it?
17:37:35 ha
17:37:36 I will start on quotas next week.
17:37:40 seriously?
17:37:43 I'll add it to our backlog.
17:37:46 Or do it in the evening.
17:37:55 doffm: i need to know how this benefits ibm public cloud
17:38:14 mriedem: Shhhhh. Nothing happening here.
17:38:14 anywho
17:38:19 I can do quotas if that's more needed, I just thought it was already taken
17:38:37 mark wrote the spec
17:38:40 melwitt: run away .. fast
17:38:52 but i'm pretty sure mark is overcommitted, but i'll let him hang himself if he wants
17:39:08 :)
17:39:55 let's end meeting before dan freaks out
17:40:02 yes please
17:40:14 alright, anything else?
17:40:19 better speak up quick
17:40:34 thanks everyone!
17:40:36 #endmeeting