21:00:25 <dansmith> #startmeeting nova_cells
21:00:26 <openstack> Meeting started Wed Feb 21 21:00:25 2018 UTC and is due to finish in 60 minutes. The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 <openstack> The meeting name has been set to 'nova_cells'
21:00:33 <tssurya> o/
21:00:36 <dansmith> I got distracted talking to belmoreira
21:00:41 <mriedem> o/
21:00:43 <dansmith> hence my 26 second tardiness
21:01:32 <dansmith> #topic bugs
21:01:53 <dansmith> we've got a pretty good set on the agenda, cultivated by tssurya: https://wiki.openstack.org/wiki/Meetings/NovaCellsv2
21:02:16 <dansmith> tssurya: I'm not sure how I feel about continuing to work on that first one, for cellsv1, to be honest
21:02:33 <melwitt> o/
21:02:46 <tssurya> dansmith : well we have that patch in production now
21:03:01 <tssurya> and we would be moving away from cellsv1 soon :)
21:03:08 <dansmith> tssurya: yeah, it just doesn't work for our test environment, hence my concern
21:03:35 <tssurya> dansmith : so maybe we just keep it as WIP ?
21:03:38 <dansmith> ack, so I think I'll just leave it up in case people need it, but not really push on it
21:03:38 <dansmith> yeah
21:03:42 <dansmith> I'll make a note on it
21:04:16 <dansmith> the rest of the bugs up there look straightforward and almost all have reviews, which melwitt just added to the priorities list, so ... review those
21:04:23 <dansmith> tssurya: any of those you want to highlight?
21:04:26 <tssurya> I would appreciate some pointers on trying to write a test case for this : https://review.openstack.org/#/c/546660/ with
21:04:36 <tssurya> respect to deleting RPs
21:04:54 <dansmith> tssurya: okay cool
21:05:37 <tssurya> dansmith : thanks,
21:05:46 <tssurya> will wait for your comments in the review then
21:06:00 <dansmith> tssurya: sure, or mriedem.. he's good with that stuff
21:06:00 <tssurya> I don't have anything else to highlight
21:06:03 <dansmith> okay
21:06:15 <tssurya> dansmith : okay
21:06:18 <dansmith> #topic open reviews
21:06:23 <dansmith> I have this set up: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/placement-req-filter
21:06:35 <dansmith> which is about a pre-filtering mechanism for the scheduler, which isn't cells-specific,
21:06:52 <dansmith> but came up because of the concerns tssurya and belmoreira had about the scheduler choking on the full result set from placement
21:07:04 <dansmith> this will let us fine-tune what we ask of placement for lots of cases
21:07:18 <dansmith> this being one solution for the cells case: https://review.openstack.org/#/c/545002/
21:07:26 <tssurya> dansmith : thanks again for doing this
21:07:29 <dansmith> specifically over tenant cell assignment
21:07:46 * melwitt adds to priorities etherpad
21:07:50 <dansmith> here's the start of another one that isn't cells-specific: https://review.openstack.org/546282
21:08:01 <dansmith> which would let us do AZs without a post-scheduler filter like we do today,
21:08:09 <dansmith> which will be way more efficient when users ask for a specific AZ
21:08:35 <dansmith> there is some placement API work that has to be done first in order for both of these to work, but it's just a parity thing and not too major
21:08:50 <tssurya> dansmith : so the placement aggregates would be modelled to accommodate the avz ?
21:09:05 <dansmith> tssurya: for the AZ thing yeah
21:09:15 <tssurya> cool
21:09:26 <dansmith> jay is working on a spec to allow mirroring of aggregate operations up to placement,
21:09:36 <dansmith> so when you add an aggregate and add hosts to it, nova will tell placement about those things
21:09:40 <dansmith> so you don't have to do everything twice
21:09:51 <belmoreira> dansmith by not cell specific is because it uses aggregates?
21:10:00 <dansmith> however, until that happens, you'd just have to make sure placement knows about the links
21:10:05 <melwitt> I need to read up on that. placement will do some aggregate stuff but not all, like metadata I assume?
21:10:15 <dansmith> belmoreira: not cells-specific because people that just use AZs today would still want this
21:10:39 <dansmith> melwitt: right, placement already has aggregates for things like knowing which computes are connected to which networks, shared storage, etc
21:10:48 <dansmith> but it's not as heavy as nova's implementation
21:10:55 <melwitt> k, cool
21:11:09 <melwitt> not going to have all of the key=value stuff in it
21:11:13 <dansmith> correct
21:11:21 <melwitt> got it
21:11:51 <belmoreira> dansmith do you think that then we can also have the cell abstraction?
21:12:15 <dansmith> belmoreira: what do you mean?
21:12:23 <mriedem> model cells in placement i assume
21:12:33 <mriedem> like ed's idea about nested providers
21:12:37 <belmoreira> for large sites aggregates are fine grained. We organize things with cells
21:12:38 <mriedem> even though cells don't provide inventory
21:13:13 <dansmith> yeah, cells don't provide inventory, which is why I think it's a bad idea to model cells as parent providers
21:13:21 <dansmith> not to mention it makes the entire deployment in one tree
21:13:24 <belmoreira> meaning that we will need to duplicate the host-cell mapping that we already have per cell for the aggregates
21:14:03 <dansmith> belmoreira: placement is definitely not going to get a cell notion
21:14:19 <dansmith> belmoreira: the closest would be nova maintaining an aggregate per cell when hosts are mapped or something
21:14:42 <dansmith> which I guess we could do, but it doesn't excite me :)
21:14:56 <belmoreira> dansmith ack :)
21:15:08 <melwitt> nested aggregates anybody?
21:15:46 <dansmith> melwitt: that's what I'll tell people when they ask why I'm applying for my next job, yeah
21:15:46 <melwitt> or wait, we can already do that
21:15:59 <belmoreira> but if not done by nova, operators will need to keep them in sync (aggregate/cell). Not easy...
21:16:00 <dansmith> no, we don't have nested aggregates, we have overlapping aggregates
21:16:05 <melwitt> but maybe AZs messes that up. anyway
21:16:21 <melwitt> overlapping is what I was thinking of
21:16:23 <melwitt> okay
21:16:25 <dansmith> belmoreira: well, operators that need cell based scheduling
21:17:01 <dansmith> belmoreira: so far you're the only one I know of like that, and the other large operators I've talked to want *more* management-via-aggregate, like the allocation ratios thing
21:17:31 <melwitt> fwiw I predict other large operators wanting it
21:17:49 <melwitt> like, if they had cells, they'd want to manage ratios per cell if they could
21:17:52 <dansmith> so, I get that your case would require some manual syncing of those concepts, and I get why that sucks, I just need to kinda get my head around what we can do about it
21:18:08 <dansmith> melwitt: they can, by defining aggregates
21:18:27 <melwitt> but what if you have multi aggregates in one cell? then can't, right?
21:18:37 <dansmith> sure
21:18:54 <dansmith> that's why I think forcing people into one per cell is wrong anyway
21:19:06 <dansmith> because some people may have smallish cells and deal with things only on the cell level,
21:19:19 <dansmith> others may have giant cells, for which no one rule applies to all things in that cell
21:19:35 <dansmith> that's why aggregates can overlap and why they have metadata and not fixed attributes
21:20:05 <dansmith> and you define aggregates around the things with similar characteristics, and assign meaning to those as appropriate
21:20:12 <belmoreira> dansmith true
21:20:34 <belmoreira> but also because we lack metadata in cells
21:21:27 <dansmith> that's intentional though, so we don't have to apply all the things we can do with aggregates to cells in a different way
21:22:06 <melwitt> yeah ... that makes sense
21:22:19 <dansmith> there are a ton of things you can do with aggregates, and replicating that onto cells is just a terribly complex undertaking
21:22:27 <mriedem> isn't a lot of this trying to shoe-horn the old multi-level cells scheduler stuff into the new flat world rather than just doing things the way we can with what we have in flat scheduling?
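The overlapping-aggregates point made at 21:19:35 and 21:20:05 (hosts grouped by shared characteristics, with free-form metadata rather than fixed attributes) can be illustrated with a small sketch. This is plain Python, not nova or placement code; the `Aggregate` class, the helper functions, and all host and aggregate names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical stand-in for a host aggregate: hosts plus free-form metadata."""
    name: str
    hosts: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)


def aggregates_for_host(host, aggregates):
    """Return every aggregate that contains the host (aggregates may overlap)."""
    return [agg for agg in aggregates if host in agg.hosts]


def metadata_for_host(host, aggregates):
    """Collect all values seen for each metadata key across the host's aggregates."""
    merged = {}
    for agg in aggregates_for_host(host, aggregates):
        for key, value in agg.metadata.items():
            merged.setdefault(key, set()).add(value)
    return merged


# One aggregate mirrors a cell; the other two cut across it.
aggregates = [
    Aggregate("cell1", {"c1-h1", "c1-h2", "c1-h3"}, {"cell": "cell1"}),
    Aggregate("ssd", {"c1-h1", "c1-h2"}, {"disk": "ssd"}),
    Aggregate("az-east", {"c1-h2", "c1-h3"}, {"availability_zone": "east"}),
]

names_for_h2 = sorted(agg.name for agg in aggregates_for_host("c1-h2", aggregates))
meta_for_h3 = metadata_for_host("c1-h3", aggregates)
```

Here a single host can sit in a cell-shaped aggregate and in finer-grained ones at the same time, which is the flexibility a single set of per-cell attributes would not give.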
21:22:40 <dansmith> mriedem: yes
21:22:41 <melwitt> it might, yeah
21:22:41 <mriedem> like, in cells v1 we have 2 level scheduling and can optimize the cell that's picked,
21:22:51 <mriedem> ok
21:23:03 <dansmith> it's, IMHO, more about "tenants are fixed into these silos", which is valid
21:23:21 <dansmith> those silos used to be naturally cellsv1 cells, but I don't want to tie more meaning into a cell than we have to,
21:23:29 <mriedem> because i don't think we want to add a bunch of new complexity to maintain how things were done the cells v1 way
21:23:34 <dansmith> which is why I'm resistant to giving them more meaning than just a group of computes that share a db/mq
21:23:39 <dansmith> right
21:23:41 <mriedem> ack
21:23:52 * melwitt nods
21:23:53 <mriedem> i realize that makes the transition harder
21:24:26 <dansmith> mirroring cells into aggregates (i.e. when we discover a new cell mapping, we add an aggregate, and when we map a new host, we add it to the aggregate) is an option, I just don't want to make that too easy :)
21:24:56 <belmoreira> mriedem the transition yes, but I'm most worried about the operations
21:25:51 <belmoreira> I need to setup something to keep the aggregates in sync and we will have few of them
21:25:55 <melwitt> this talk is making me think it might be useful to brainstorm a few reference deployment layouts to include in our docs
21:26:07 <belmoreira> for example: aggregate-cell; aggregate-avz
21:26:29 <melwitt> complete with how you could draw your aggregates and cells
21:26:57 <melwitt> "if you currently do this with cells v1, this is how you would do that in cells v2"
21:27:29 <melwitt> going from the multi-level stuff to the flat. anyway, just an idea
21:28:01 <dansmith> melwitt: as long as we're not making a direct mapping, but describing how you achieve things in the new system
21:28:18 <dansmith> belmoreira: so, since this is going on and to wrap up a bit:
21:28:33 * melwitt nods
21:29:03 <dansmith> belmoreira: are you willing to do the mapping with aggregates and this pre-filter thing for a first-go, and report back with how heavy it is in reality for maintenance?
21:29:39 <dansmith> presumably the worst case here is "yes, this is very hard, we have aggregates out of sync sometimes, because $reasons, etc"
21:30:03 <belmoreira> dansmith sure. let's give it a go
21:30:36 <dansmith> belmoreira: okay cool.. this aggregate idea is the result of me transitioning from "hell no" to this, which gets us close,
21:31:12 <dansmith> so I think we'll end up with something workable given more soak time and learning more about the pain points, as we have already
21:31:42 <dansmith> and we have next week to smash our brains together on ideas for refining things
21:31:57 <dansmith> okay, so.. any other open reviews to highlight? :)
21:32:01 <dansmith> other than tssurya's bug reviews
21:32:32 <melwitt> not yet, consoles stuff is still up but spec not re-approved yet. fyi
21:32:36 <belmoreira> dansmith having some filtering in the placement is already something great. thanks for that. don't get me wrong :)
21:32:43 <dansmith> belmoreira: okay :)
21:32:59 <dansmith> melwitt: ack
21:33:12 <dansmith> #topic open discussion
21:33:20 <dansmith> we've already done a lot of discussing openly
21:33:25 <dansmith> no meeting next week because obvious.
21:33:32 <dansmith> anything else to bring up?
21:33:38 <tssurya> nope
21:33:51 * dansmith 's fingers are already tired
21:34:00 <melwitt> nay
21:34:06 <dansmith> tssurya: looking forward to meeting you next week!
21:34:16 <tssurya> dansmith : same here!
21:34:17 <melwitt> ++
21:34:28 <dansmith> aight, cells team out
21:34:30 <dansmith> #endmeeting
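
The cell-to-aggregate mirroring described at 21:24:26 (add an aggregate per cell mapping, add each mapped host to it), together with the out-of-sync worry raised at 21:15:59 and 21:29:39, can be sketched roughly as follows. This is a minimal illustration in plain Python, not nova code: the `cell-` naming prefix, both function names, and the sample hosts are assumptions made up for the example, and a real deployment would read the host-to-cell mapping and the aggregate membership from nova itself.

```python
CELL_AGG_PREFIX = "cell-"  # hypothetical naming convention for mirrored aggregates


def desired_cell_aggregates(host_to_cell):
    """Group hosts by cell: one aggregate per cell, named after the cell."""
    desired = {}
    for host, cell in host_to_cell.items():
        desired.setdefault(CELL_AGG_PREFIX + cell, set()).add(host)
    return desired


def aggregate_drift(desired, actual):
    """Report hosts to add/remove so the cell-mirroring aggregates match `desired`.

    Aggregates in `actual` that do not use the cell prefix (AZs, hardware
    groupings, and so on) are ignored, since they are managed separately.
    """
    mirrored = {name for name in actual if name.startswith(CELL_AGG_PREFIX)}
    drift = {}
    for name in sorted(set(desired) | mirrored):
        want = desired.get(name, set())
        have = actual.get(name, set())
        to_add, to_remove = want - have, have - want
        if to_add or to_remove:
            drift[name] = {"add": sorted(to_add), "remove": sorted(to_remove)}
    return drift


# Example: h2 was mapped to cell "a" but never added to its aggregate,
# and h4 is still in the cell "b" aggregate after being unmapped.
host_to_cell = {"h1": "a", "h2": "a", "h3": "b"}
actual = {
    "cell-a": {"h1"},
    "cell-b": {"h3", "h4"},
    "az-east": {"h1", "h3"},
}
drift = aggregate_drift(desired_cell_aggregates(host_to_cell), actual)
```

Running a check like this periodically is one way an operator could measure how often the two concepts drift apart, which is the maintenance cost belmoreira agreed to report back on.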