21:00:05 <dansmith> #startmeeting nova_cells
21:00:06 <openstack> Meeting started Wed Mar 8 21:00:05 2017 UTC and is due to finish in 60 minutes. The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:10 <openstack> The meeting name has been set to 'nova_cells'
21:00:18 <mriedem> o/
21:00:24 <melwitt> o/.
21:00:26 <macsz> \o
21:00:33 <pumaranikar> o/
21:00:42 <dansmith> melwitt: armpit?
21:01:09 <melwitt> what armpit
21:01:20 <dansmith> your hand up had a bogey
21:01:20 <macsz> the dot :)
21:01:33 <melwitt> oh, haha. I didn't even notice
21:01:38 <dansmith> #topic cells testing/bugs
21:01:47 <dansmith> so before we get into mriedem shitting all over it,
21:01:52 <dansmith> in regards to testing, I'd like to point out this:
21:01:59 <dansmith> http://logs.openstack.org/94/436094/14/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/f7d6160/logs/testr_results.html.gz
21:02:15 <dansmith> all but two tempest tests running with multiple cells, and I have patches up for those as well
21:02:35 <dansmith> unfortunately, no chance of having a clean run at this point due to when I pushed those up
21:02:54 <dansmith> but anyway, the effort of actually getting a clean test run on multicell devstack is progressing
21:02:54 <mriedem> \o/
21:03:00 <dansmith> the devstack patch itself still needs a lot of work,
21:03:07 <dansmith> but I won't ever get to it if mriedem keeps up his antics
21:03:18 <dansmith> anyway, anything else testing-related?
21:03:23 <mriedem> s/antics/excellent reviews/
21:03:51 <mriedem> yeah
21:03:52 <mriedem> on https://review.openstack.org/#/c/442861/
21:03:59 <mriedem> is the nova-status thing just separate from this series?
21:04:06 <mriedem> i think i thought that was fine but checking
21:04:11 <dansmith> mriedem: it is
21:04:22 <mriedem> ah yes https://review.openstack.org/#/c/442787/
21:04:38 <dansmith> mriedem: we pulled out a newer service version check at the end of ocata, you'll recall, and I continue to challenge the root concern anyway,
21:04:54 <dansmith> but not opposed to a status check of course
21:05:21 <mriedem> you mean the one in the scheduler filter for placement?
21:05:37 <dansmith> no, that was in pike, but that's another good example :)
21:05:55 <dansmith> we had one in compute/api about earlier computes before a cells patch from avolkov
21:06:00 <mriedem> so really our minimum version service check in nova-status should be whatever was required for that placement thing
21:06:14 <mriedem> which i think i had a bug for anyway
21:06:37 <mriedem> oh no different check https://bugs.launchpad.net/nova/+bug/1669433
21:06:37 <openstack> Launchpad bug 1669433 in OpenStack Compute (nova) "nova-status needs to check that placement 1.4 is available for pike" [High,In progress] - Assigned to Roman Podoliaka (rpodolyaka)
21:07:01 <mriedem> anyway, it's a good point that the minimum compute version is going to need to be 16
21:07:03 <mriedem> which is your patch
21:07:54 <dansmith> anything else on testing? the next topic on open reviews has a lot of material
21:08:04 <mriedem> no
21:08:37 <dansmith> #topic open reviews
21:08:55 <dansmith> so one of our oldest is dtp's console upcall patch, which I hit again today
21:09:09 <dansmith> I had a minor complaint about it doing some cleanup and functional change in the same and asked him to split
21:09:15 <dansmith> #link https://review.openstack.org/#/c/415922/
21:09:23 <dansmith> anyone else able to take a look at that soon?
21:09:42 <mriedem> i'd prefer melwitt to look at that given she was looking more into the spec
21:09:48 <melwitt> I'm planning to look at it
21:09:48 <mriedem> did that get re-proposed and approved btw?
21:10:00 <mriedem> in other words,
21:10:04 <melwitt> mriedem: not yet, going to do that maybe today. this week
21:10:05 <mriedem> shouldn't this change go under that blueprint?
21:10:22 <dansmith> mriedem: which blueprint? the console tokens in db one?
21:10:24 <melwitt> well, I guess the thing is this is an interim thing
21:10:39 <dansmith> right this is not really related to that larger effort
21:10:41 <melwitt> it was supposed to go in ocata as a stop-gap
21:10:42 <mriedem> oh
21:10:55 <mriedem> carry on then
21:11:12 <dansmith> cool
21:11:27 <dansmith> melwitt: can we maybe try to have that merged by this time next week?
21:11:54 <melwitt> dansmith: the spec? yeah. I will also get the placement spec up this week too
21:12:00 <dansmith> no, dtp's patch
21:12:06 <melwitt> oh, yeah. sorry. yeah
21:12:15 <dansmith> cool, the specs are important too of course,
21:12:25 <dansmith> but just want to avoid this withering on the vine too much
21:12:33 <melwitt> roger that
21:12:55 <dansmith> okay so the next set is the quotas stuff,
21:13:10 <dansmith> which got some activity this morning and I think melwitt is probably working on as we speak
21:13:16 <dansmith> I've been through parts of that patch but not the rest
21:13:25 <dansmith> the bottom two are approved and just holding until the third is ready to go
21:13:51 <dansmith> mriedem: in between shitting on my patches that might be a good one for you to look at too
21:13:57 <dansmith> you know, to spread the pain^Wlove around
21:14:22 <mriedem> john has been reviewing that right?
21:14:25 <dansmith> yeah
21:14:28 <mriedem> at this point i'm happy to let john handle it
21:14:39 <dansmith> well, it has some implications to behavior
21:14:50 <mriedem> i realize it's something i should know about...
21:14:54 <melwitt> yup. the top patch is not a picnic for review, a lot of it is deleting of code. so be on the lookout for gaps as something to watch out for
21:14:56 <dansmith> about how things behave when you're close to quota
21:15:09 <mriedem> do we have functional tests for the edge cases?
21:15:33 <dansmith> well, the point is the edge cases are leaky by design
21:16:32 <mriedem> sure. selfishly speaking, there are only so many super complicated series of things i can push into context in my brain at any given time, and with cells v2 and jay's inventory stuff and some other things, i just won't say i can get to it right now and give it a thorough review.
21:16:55 <dansmith> okay
21:17:07 <mriedem> i'm channeling my inner sdague here
21:17:51 <dansmith> melwitt: maybe we try to make sure the commit message/reno summarize the changes well enough that if he just reads that he won't be surprised on stage in the future
21:18:26 <melwitt> fwiw, the edge case discussions are contained at the moment as the only comments on the review. that makes it easier-ish to weigh in on those points
21:18:59 <dansmith> yeah
21:19:00 <melwitt> dansmith: yeah, that's a good idea in general, for anyone to be able to get the main points
21:19:04 <dansmith> yeah
21:19:19 <dansmith> alright anyway,
21:19:28 <mriedem> didn't we need the user/project in placement for counting quotas first?
21:19:35 <dansmith> no
21:19:39 <mriedem> or was that optional for now since we don't expect cells to be 'down' right now
21:19:41 <dansmith> it helps us do it better
21:19:43 <dansmith> yeah
21:20:11 <melwitt> yeah, we're going to go forward with this for now as a first step that has caveats, and expect the placement stuff to complete this cycle and close that gap
21:20:28 <melwitt> since multi cell isn't really a thing at the present moment, anyway
21:20:43 <dansmith> hey!
21:20:59 <dansmith> it is in my fairy tale life
21:20:59 <melwitt> sorry, I meant in the non CD case
21:21:00 <mriedem> look who is shitting on your stuff now
21:21:07 * dansmith steams
21:21:13 <mriedem> like a steaming pile of...
21:21:13 <dansmith> moving on?
21:21:15 <mriedem> yes
21:21:23 <melwitt> guh, no sorry not what I meant
21:21:25 <dansmith> the next series is my steaming pile of shit
21:21:44 <mriedem> don't worry, i also have searchlight to talk about at some point here
21:21:49 <dansmith> which I just realized won't work in the order I just pushed up, so I will have to transplant some code first
21:22:29 <mriedem> dansmith: as in the patch we just talked about first, but without the GET by id stuff?
21:22:31 <dansmith> however, on top of all of them, we pass a tempest run, although just a few minutes ago mriedem identified some issues that stem from historical leaks of things like internal DB ids
21:22:57 <dansmith> mriedem: no, I moved that up, but it had a refactor (load_cells) that the other ones need, so I need to transplant that
21:23:03 <mriedem> ok
21:23:18 <dansmith> mriedem: so one thing to note is that until you have multiple actual cells,
21:23:26 <dansmith> what I have is not any different than what we have today I think
21:23:41 <dansmith> but, you said you had an idea about moving forward with those?
21:23:44 <mriedem> sure for single cell this is fine
21:23:46 <mriedem> yeah
21:23:59 <mriedem> so, i think we can agree that we should stop leaking ids out of the cell databases in the REST API
21:24:00 <mriedem> correct?
21:24:15 <dansmith> are you saying you're okay merging this early with that caveat? because reordering back is much easier
21:24:19 <dansmith> uh, yes, agreed
21:24:23 <dansmith> obviously
21:24:25 <mriedem> dansmith: not yet
21:24:49 <mriedem> ok, so i think we can agree that we should probably do a microversion in os-hypervisors that returns the compute node uuid rather than the id, and takes a uuid rather than an id for GET calls
21:24:53 <mriedem> is that ok?
21:25:29 <dansmith> what are you asking is okay? that we stop being stupid? yes, that's okay :)
21:25:34 <mriedem> ok
21:25:40 <mriedem> just setting the foundation of shit we can agree on
21:25:56 <mriedem> next thing is, in this code that's the problem, and not cells aware,
21:26:15 <mriedem> if we have multiple cells and can't find a unique compute/service by id (not uuid), we fail with a 400
21:26:27 <mriedem> and force you to use the microversion to pass the uuid to find the thing you need
21:26:37 <dansmith> meaning, check all of them and if we find any dupes, then refuse to do that thing?
21:26:50 <mriedem> right, just like when we boot a server w/o a specific network
21:26:55 <mriedem> if there are duplicate networks, we fail
21:26:58 <dansmith> sure, that's a good idea
21:27:04 <dansmith> but only after we have the microversion api I guess
21:27:09 <mriedem> yeah, so,
21:27:18 <mriedem> you can still pass id before the microversion in the single cell case
21:27:19 <mriedem> that's fine
21:27:33 <mriedem> but in the multi-cell case, if you pass id and we find multiple, it's a 400,
21:27:40 <mriedem> and you have to pass the uuid using the microversion
21:27:46 <dansmith> aye
21:27:55 <mriedem> ok, if we're all happy with that, i can start the spe
21:27:56 <mriedem> *spec
21:28:13 <dansmith> so, there's probably a few things, right? os-hypervisors, os-services at least
21:28:17 <dansmith> shouldn't we do them all together?
21:28:21 <mriedem> i'm slightly less clear on the os-pci api here, but would have to investigate that more
21:28:29 <mriedem> yes probably
21:28:39 <mriedem> yeah for sure os-hypervisors and os-services
21:28:41 <dansmith> okay, well, anyway, I'm definitely on board with that
21:28:44 <mriedem> cool
21:28:50 <mriedem> i think we have the same issue in os-pci,
21:28:58 <mriedem> but i have 0 idea if anyone ever uses that api
21:29:02 <mriedem> it's not even documented
21:29:07 <dansmith> I will reswizzle these so this patch can be later in the stack and keep pushing what we can, and wait for that for this patch
21:29:59 <mriedem> ha, also, side note,
21:30:10 <mriedem> PCI_ADMIN_KEYS is used in os-pci but doesn't check if you're an admin,
21:30:15 <mriedem> or perform any kind of check
21:30:32 <dansmith> wt...f
21:30:33 <mriedem> anyway
21:30:41 <mriedem> well, the default policy on listing pci devices is admin only
21:30:44 <mriedem> but still
21:30:55 <dansmith> ah
21:30:56 <dansmith> yeah
21:31:09 <dansmith> okay, mriedem you wanted to call out the searchlight review I assume?
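The id/uuid rule mriedem proposes between 21:25:56 and 21:27:40 can be sketched in plain Python. A legacy integer id keeps working as long as it is unique, a duplicate match across cells becomes a 400, and the caller then has to pass the compute node uuid through the new microversion. This is a hypothetical illustration only. The ComputeNode record, the dict-of-cells layout, find_compute_node, and the BadRequest/NotFound exceptions are assumptions made for the sketch; they are not nova code or the os-hypervisors API.

```python
from dataclasses import dataclass


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP 400 response."""


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 response."""


@dataclass
class ComputeNode:
    id: int    # per-cell database id: NOT unique across cells
    uuid: str  # globally unique identifier
    host: str


def find_compute_node(cells, ident, by_uuid=False):
    """Find exactly one compute node across all cells (hypothetical helper).

    cells maps a cell name to the ComputeNode records in that cell's
    database, standing in for querying each cell DB in turn.
    by_uuid=True models a request made with the proposed microversion.
    """
    matches = []
    for nodes in cells.values():
        for node in nodes:
            key = node.uuid if by_uuid else node.id
            if key == ident:
                matches.append(node)
    if not matches:
        raise NotFound("compute node %s not found" % ident)
    if len(matches) > 1:
        # Ambiguous integer id across cells: refuse instead of guessing,
        # so the caller has to retry with the uuid.
        raise BadRequest("id %s matches %d compute nodes; pass the uuid"
                         % (ident, len(matches)))
    return matches[0]
```

With a single cell, find_compute_node(cells, 7) behaves as before. With two cells that both contain id 1, find_compute_node(cells, 1) raises BadRequest, while find_compute_node(cells, "uuid-b", by_uuid=True) still resolves to one node.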
21:31:22 <mriedem> yeah, sec
21:31:31 <mriedem> https://review.openstack.org/#/c/441692/
21:31:33 <dansmith> I have started looking at it a few times, but this guy keeps shitting on my patches
21:31:37 <mriedem> #link searchlight integration spec https://review.openstack.org/#/c/441692/
21:31:39 <dansmith> with "alternative facts"
21:31:52 <mriedem> i haven't gone through the latest round of comments in there,
21:31:58 <mriedem> but it's got quite a bit of detail,
21:32:19 <mriedem> net is it's a bit of a mess dependency-wis
21:32:22 <mriedem> *wise
21:32:28 <mriedem> searchlight doesn't support versioned notifications yet
21:32:38 <mriedem> they have a blueprint to do it, but aren't doing it yet
21:32:43 <dansmith> orly
21:32:55 <dansmith> I thought they were super interested in those
21:33:01 <mriedem> we also have an issue with the fact that when you delete a server in nova, they delete the index entry for that server in searchlight,
21:33:21 <mriedem> so if nova is using searching and you do nova list --deleted, you get nothing
21:33:44 <mriedem> elasticsearch used to have a concept of a ttl on the entry, but that's removed in v5.0
21:33:57 <melwitt> what are the implications of them not supporting versioned notifications? how do they currently get nova notifications?
21:33:58 <mriedem> they basically pushed the filtering on time to the client it sounds like
21:34:06 <mriedem> melwitt: they get the legacy unversioned notifications
21:34:19 <mriedem> they said they wanted to get versioned notification support in for ocata but didn't have the people to do it
21:35:00 <mriedem> i think it will happen, it's just something to note right now
21:35:05 <dansmith> okay
21:35:11 <mriedem> the delete thing is a bit more worrisome for me,
21:35:16 <dansmith> the deleted thing is probably an issue for them anyway right?
21:35:24 <mriedem> i've suggested a config option in searchlight for a time window before they delete the entry
21:35:28 <dansmith> because people that care about that won't be happy with searchlight as a semi replacement
21:36:04 <mriedem> we don't guarantee that you can get deleted instances forever anyway b/c of archive and purge, but it's something people are going to assume works
21:36:14 <mriedem> and i'm sure admins rely on for debug
21:36:15 <dansmith> yeah
21:36:44 <mriedem> as far as data migrations,
21:37:06 <mriedem> the upside is searchlight already has a searchlight-manage command that you can run to make searchlight hit the nova api and pull in all of the existing instances to populate indexes
21:37:20 <mriedem> so we don't have to worry about nova pushing that data out, or setting up a cron to issue instance.usage.exists
21:37:37 <dansmith> sweet
21:37:46 <mriedem> so you (1) setup searchlight, (2) pull the nova data to populate searchlight, (3) configure nova-api to use it, (4) restart nova-api
21:38:25 <mriedem> the other thing i noted in there that sucks is every new field we add to the rest api we have to add to our versioned notifications
21:38:30 <mriedem> that's not really new, but will be more strictly enforced
21:38:55 <mriedem> plus right now the searchlight guys said we'd also have to make a corresponding mapping change to searchlight to make it handle the new field
21:39:16 <mriedem> gibi pointed out that we have a bp to send the schema with the versioned notification payload, and searchlight could use that schema to add new mappings, but that's a long ways off i think
21:39:30 <mriedem> anyway, none of this is impossible, it's just not as trivial as "we'll just have searchlight do our stuff"
21:39:46 <mriedem> fin
21:39:57 <dansmith> okay that's not too bad,
21:40:02 <dansmith> if we're depending on them like we plan to
21:40:24 <dansmith> not unlike making changes to o.vo or os-vif that we need
21:40:53 <mriedem> yeah it would just suck if we have to make 3 changes before we can return something new out of the rest api
21:40:58 <mriedem> but anyway
21:41:16 <dansmith> well,
21:41:38 <dansmith> we'd have to make the searchlight changes before it would work in that environment, not necessarily for it to work at all
21:41:39 <dansmith> but yeah
21:41:55 <dansmith> not surprising given the level at which we're using them for api in this scenario though
21:42:55 <dansmith> anything else on stuff up for review?
21:43:15 <mriedem> i don't have anything
21:43:29 <dansmith> melwitt: ?
21:43:52 <melwitt> no, think everything got mentioned
21:44:03 <dansmith> cool
21:44:06 <dansmith> #topic open discussion
21:44:20 <melwitt> I wanted to clarify what I said earlier,
21:44:42 <melwitt> I was thinking of multi cell from an operator perspective as in, how long would they experience a gap in say, the "cell down quota issue"
21:45:15 <melwitt> I had been thinking we were going to signal to them that it's okay/recommended to create multiple cells at rc1
21:45:16 <mriedem> can someone explain to me what 'cell down' even means?
21:45:22 <mriedem> rabbit and db are dead for that cell?
21:45:30 <melwitt> like, lose communication with cell, for whatever reason
21:45:51 <melwitt> yeah, that's one example
21:45:59 <dansmith> melwitt: if mriedem stops shitting on patches, then yes I agree with that statement :)
21:46:00 <mriedem> how is that different from if your non-cells single region deployment loses rabbit/db today?
21:46:15 <dansmith> mriedem: your quota appears to expand in that case
21:46:21 <dansmith> mriedem: because you stop counting certain resources you can't see
21:46:32 <mriedem> ok
21:46:42 <mriedem> which is why we need the global allocation
21:46:44 <mriedem> via placement
21:46:46 <mriedem> got it
21:46:55 <dansmith> yeah
21:46:58 <mriedem> btw, it's fun that placement is the new keystone :)
21:47:01 <mriedem> has anyone mentioned that yet?
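dansmith's explanation at 21:46, that quota "appears to expand" when a cell is down because "you stop counting certain resources you can't see", can be illustrated with a toy count. In this hypothetical sketch each cell is a list of instance owners and an unreachable cell is marked None. count_instances and can_create are invented names, not nova's quota driver. Counting from a global source such as placement allocations, as discussed, is what would close the gap.

```python
def count_instances(cells, project_id):
    """Count a project's instances across all reachable cells (hypothetical).

    cells maps a cell name to a list of instance owner project ids,
    or to None when that cell's database/message queue is unreachable.
    """
    total = 0
    for owners in cells.values():
        if owners is None:
            # Cell is "down": its instances are invisible and are silently
            # left out of the total, which is the leak being discussed.
            continue
        total += sum(1 for owner in owners if owner == project_id)
    return total


def can_create(cells, project_id, limit, requested=1):
    """Return True if counted usage plus the request fits within limit."""
    return count_instances(cells, project_id) + requested <= limit
```

For example, with a limit of 4 and two instances of project p1 in each of two cells, can_create refuses a fifth instance. If the second cell becomes unreachable, the count drops to 2 and the same request is allowed.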
21:47:14 <dansmith> hah, jaypipes kinda did indirectly :P
21:47:19 <melwitt> yeah, so I was thinking if we aren't going to recommend to operators to create multiple cells until rc1, then we're "safe" in that we don't have a gap for the "cell down" case from their perspective
21:47:20 <macsz> in which way?
21:47:21 <dansmith> when discussing quota shit
21:47:50 <mriedem> macsz: what in which way? placement == keystone?
21:47:56 <macsz> yeah
21:47:59 <dansmith> melwitt: even if we get all my stuff landed, we can still say "it works with the following caveats, so don't use it if those bother you"
21:48:00 <mriedem> macsz: just that it's going to have user/project stuff in it and it's global
21:48:22 <dansmith> melwitt: in fact, I have one such caveat called out in the series already
21:48:25 <macsz> mriedem: oh, ok, got it :)
21:48:33 <melwitt> dansmith: yeah, true
21:48:33 <dansmith> although it's much smaller than quotas of course
21:48:40 <mriedem> melwitt: i'm also fine with saying multiple cells is ok with caveats
21:48:45 <mriedem> for what we know doesn't work
21:48:55 <dansmith> like.. pci :)
21:48:58 <mriedem> melwitt: however,
21:49:07 <mriedem> the fact you're thinking about caveats for rc1 at this point scares me
21:49:22 <melwitt> heh
21:49:25 <dansmith> mriedem: this quota caveat has been planned since before atlanta
21:49:55 <mriedem> but wasn't that before talking about putting user/project in allocations in placement?
21:49:58 <dansmith> as long as the caveats don't affect the single-cell case, I don't see the problem other than just limiting the scope of who can move to cellsv2 on release day
21:50:12 <melwitt> well, I woke up in the middle of the night and exclaimed (in my mind) "what if a cell goes down!" so I've been thinking about it. I'm sure there are other caveats I haven't thought of yet
21:50:34 <mriedem> i woke up thinking about sump pumps and patio furniture covers and sling tv
21:50:43 <mriedem> to each his/her own
21:50:44 <dansmith> yes
21:50:55 <dansmith> oops
21:51:19 <melwitt> dansmith: agreed with the single cell vs multi cell caveat thing
21:51:25 <dansmith> anything else? this is already the longest cells meeting on record
21:51:38 <macsz> just would like to say that me and pumaranikar are both working in osic with johnthetubaguy
21:51:45 <macsz> and we both are interested in doing some work for cells
21:52:07 <macsz> probably will start with some bugs, unfortunately most of cells bug reports are over 1 yr old
21:52:14 <macsz> but will start digging sth :)
21:52:18 <mriedem> macsz: i think there will be work to do with searchlight possibly
21:52:20 <dansmith> macsz: those are bugs we don't care much about
21:52:27 <dansmith> macsz: i.e. cellsv1 bugs
21:52:38 <mriedem> macsz: i don't know that anyone is slated to add versioned notification work to searchlight
21:52:55 <dansmith> macsz: testing more realistic stuff with cellsv2 is something major you guys could help out with
21:52:59 <mriedem> probably need to sort that out with steve mclellan and/or kevin_zheng
21:53:09 <macsz> ok :)
21:53:20 <mriedem> macsz: also, i've been trying to get a ci job setup with searchlight enabled and nova configured to send things to searchlight,
21:53:27 <mriedem> that's not as easy as i thought it would be
21:53:34 <mriedem> we can talk about that in -nova if you're interested
21:53:58 <macsz> mriedem: yeah, sure
21:54:36 <mriedem> ok so i think we can wrap up
21:54:46 <mriedem> excellent meeting gang!
21:54:57 <dansmith> sweet
21:54:59 <dansmith> #endmeeting