17:00:00 <dansmith> #startmeeting nova_cells
17:00:00 <openstack> Meeting started Wed Sep 27 17:00:00 2017 UTC and is due to finish in 60 minutes. The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:04 <openstack> The meeting name has been set to 'nova_cells'
17:00:08 <mriedem> o/
17:00:37 <melwitt> o/
17:01:03 <dansmith> doesn't look like surya is around
17:01:17 <dansmith> assuming I remember that nick properly
17:01:22 <mriedem> tssurya: here?
17:01:30 <dansmith> oh forgot there was a t in front
17:01:52 <dansmith> anyway
17:01:55 <dansmith> #topic bugs
17:02:09 <mriedem> https://bugs.launchpad.net/nova/+bugs?field.tag=cells&orderby=-datecreated&start=0
17:02:15 <mriedem> i was poking through those the other day
17:02:41 <mriedem> https://review.openstack.org/#/q/status:open+project:openstack/nova+topic:bug/1715533
17:02:49 <mriedem> i've got those backports lined up for the map_instances things
17:02:51 <mriedem> *thing
17:02:55 <dansmith> cool
17:03:01 <dansmith> I shall leave that open
17:03:25 <dansmith> anything else we should discuss here?
17:03:39 <mriedem> umm,
17:03:46 <mriedem> no. we should just scrub that list at some point
17:03:51 <dansmith> yeah
17:03:51 <mriedem> probably cells v1 things in there we won't fix
17:03:58 <dansmith> aye
17:04:05 <dansmith> #topic open reviews
17:04:20 <dansmith> before I get to whining about my instance list set, anything we should note here?
17:04:28 <dansmith> ed has the selection patches and spec up for review,
17:04:38 <dansmith> which I've been trying to circle back to every once in a while
17:04:53 <mriedem> are you lumping the alternate hosts stuff in with that as well?
17:04:56 <mriedem> i know he has a spec for both
17:05:00 <dansmith> #link https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/placement-allocation-requests
17:05:19 <dansmith> it's all part of the same effort yeah
17:06:01 <mriedem> ok, i haven't reviewed any of it yet
17:06:05 <mriedem> are there any major issues?
17:06:24 <dansmith> well, the limits thing might be, but I haven't gone back around to that since yesterday
17:08:13 <dansmith> anything else before we get to instance list stuff?
17:08:25 <mriedem> nope
17:08:40 <dansmith> okay, so, for the logs:
17:08:52 <dansmith> mriedem did some perf testing with and without the base instance list cutover patch
17:09:04 <dansmith> and seemed to measure some slowness with the patch applied
17:09:23 <dansmith> I tried to reproduce and I don't see the differing behavior
17:09:32 <dansmith> but in the process,
17:09:38 <dansmith> we probably uncovered some placement issues,
17:09:54 <dansmith> and definitely are seeing a large performance difference (with and without the patch applied) between 2.1 and 2.53
17:10:11 <dansmith> like going from average of 6s per request to 9s for listing 1000 instances
17:10:38 <mriedem> yup, plus you can't create 500 instances at once
17:10:40 <dansmith> details are here:
17:10:41 <dansmith> #link https://etherpad.openstack.org/p/nova-instance-list
17:10:46 <dansmith> mriedem: that's the placement thing I mentioned
17:10:53 <mriedem> oh right
17:11:04 <dansmith> so right now, I'm running every microversion from 1 to 53 for 10x and recording the results
17:11:11 <dansmith> I'm up to 2.22 on master
17:11:25 <dansmith> and so far I'm still at ~6s
17:11:27 <dansmith> looking for a jump
17:11:53 <dansmith> https://imgur.com/a/SHGmi
17:12:07 <melwitt> neat
17:12:35 <dansmith> mriedem: when you got your 30s run on my patch, did you run that a bunch of times or just once?
17:12:39 <dansmith> I know it was 10x per,
17:13:07 <dansmith> but I see some pretty big variances that correlate to other things on the system happening, so I'm just wondering if you being in a public cloud instance skewed one run
17:13:20 <mriedem> for GET /servers/detail with 2.53?
17:13:20 <dansmith> mine is running with plenty of ram, cpu and disk IO
17:13:23 <dansmith> yeah
17:13:34 <mriedem> it was just 10x in a loop
17:13:46 <dansmith> right but you didn't run the 10x loop a couple more times?
17:13:47 <mriedem> 46.3s was the highest
17:13:49 <mriedem> no
17:13:51 <dansmith> okay
17:14:02 <mriedem> i'm running loops of 10 with 1000 active instances with your big patch now
17:14:09 <dansmith> okay
17:14:47 <mriedem> otherwise yeah doing perf testing on a public cloud vm isn't ideal,
17:14:50 <mriedem> but it's basically all i've got
17:15:08 <dansmith> yeah, it's fine, it's just good to keep it in mind if we have an outlier we can't explain
17:15:16 <dansmith> anyway,
17:15:42 <dansmith> since I've got my thing up I'm going to keep poking at master, my patch, and the one that does the fault joining
17:15:45 <dansmith> at this point,
17:16:05 <dansmith> I think maybe we might want to drop the fault join because it doesn't really seem to help (and may hurt in its current form)
17:16:31 <dansmith> ah fsck, my token expired at 2.27 :P
17:17:31 <dansmith> so, anything else on this?
17:17:55 <mriedem> ha
17:18:02 <mriedem> yeah i have to remember to refresh that every once in a while
17:18:19 <mriedem> i don't have anything else on this
17:18:22 <dansmith> I have the timed loop isolated so I should just fetch a token before I run it each time
17:18:40 <dansmith> okay
17:18:41 <dansmith> #topic open discussion
17:18:44 <dansmith> anything else on any subject?
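[For reference, a rough sketch of the kind of timed listing loop discussed above: fetch a fresh Keystone token before each run so it cannot expire partway through the microversion sweep, then time GET /servers/detail ten times per microversion and average the results. The endpoint URLs, credentials, and structure here are placeholders, not the actual script behind the numbers in the etherpad.]

    # Sketch of a per-microversion timing loop for GET /servers/detail.
    # KEYSTONE/COMPUTE URLs and the admin credentials are placeholders
    # for a local devstack-style environment.
    import time
    import requests

    KEYSTONE = "http://127.0.0.1/identity/v3"   # placeholder auth URL
    COMPUTE = "http://127.0.0.1/compute/v2.1"   # placeholder compute endpoint

    def get_token():
        # Keystone v3 password auth; the token comes back in X-Subject-Token.
        body = {"auth": {
            "identity": {"methods": ["password"],
                         "password": {"user": {"name": "admin",
                                               "domain": {"id": "default"},
                                               "password": "secret"}}},
            "scope": {"project": {"name": "admin",
                                  "domain": {"id": "default"}}}}}
        resp = requests.post(KEYSTONE + "/auth/tokens", json=body)
        resp.raise_for_status()
        return resp.headers["X-Subject-Token"]

    def time_list(microversion, runs=10):
        token = get_token()  # refresh the token before each timed run
        headers = {"X-Auth-Token": token,
                   "X-OpenStack-Nova-API-Version": microversion}
        times = []
        for _ in range(runs):
            start = time.time()
            resp = requests.get(COMPUTE + "/servers/detail", headers=headers)
            resp.raise_for_status()
            times.append(time.time() - start)
        return sum(times) / len(times)

    # Sweep 2.1 through 2.53 and look for a jump in the average list time.
    for minor in range(1, 54):
        mv = "2.%d" % minor
        print(mv, time_list(mv))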
17:19:03 <mriedem> i've got a todo to circle back on forum topics
17:19:16 <mriedem> which i'll bring up in the team meeting tomorrow
17:19:22 <mriedem> and probably in the ML again before friday
17:19:29 <melwitt> just a FYI, I've been working on some cell databases testing stuff, to try to make our env more like reality to help catch more bugs in testing
17:19:46 <dansmith> melwitt: functional or devstack?
17:20:10 <mriedem> functional, it's the cells db fixture,
17:20:15 <mriedem> it defaults a context
17:20:19 <melwitt> dansmith: functional. namely trying to rejigger the way we use the fixture to see more reality
17:20:25 <mriedem> which can mask bugs when we should be targeting in code but aren't
17:20:29 <dansmith> ah I see
17:20:37 <mriedem> we've had a few bugs slip through testing b/c of that
17:20:40 <dansmith> yeah
17:20:58 <melwitt> not sure what yall are gonna think of it but, I've almost got it working (had to fix some tests). so I'll let you know when that's up
17:21:01 <mriedem> and then my huawei masters beat me
17:21:08 <melwitt> lol
17:21:47 <dansmith> okay anything else?
17:21:51 <mriedem> fortunately they don't care as much about instance list performance
17:21:58 <mriedem> only server create and delete performance
17:22:08 <mriedem> oh wait, nvm, they care about instance list too...
17:22:10 <mriedem> f me
17:22:11 <melwitt> well, you found that delete takes forever though
17:22:16 <mriedem> so did they
17:22:18 <melwitt> which is really weird
17:22:19 <melwitt> oh
17:22:37 <mriedem> but we can talk about that in -nova since dan is getting knifey
17:22:42 <dansmith> instance delete is pretty similar to instance boot
17:22:49 <dansmith> in terms of things involved
17:22:52 <mriedem> except,
17:22:55 <mriedem> it's serialized
17:22:57 <mriedem> unlike boot
17:23:04 <dansmith> eh?
17:23:04 <mriedem> where networking and bdms are concurrent
17:23:15 <dansmith> serialized where?
17:23:34 <mriedem> whichever one of the 10 methods in the compute manager cleans up network and volume attachments
17:23:42 <mriedem> _shutdown_instance?
17:23:51 <dansmith> oh you mean serialized in the delete process, not across all instances being deleted
17:24:00 <mriedem> right
17:24:02 <dansmith> gotcha
17:24:12 <mriedem> i think the public cloud team made that concurrent
17:24:18 <mriedem> they've done lots of things...
17:24:31 <mriedem> dims and i are trying to figure it out
17:24:31 <dansmith> anyway, yeah, we're off topic
17:24:35 <dansmith> so I think we're good to end, yes?
17:24:40 <mriedem> yes
17:24:42 <melwitt> yes
17:24:44 <dansmith> #endmeeting
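[For context on the serialized-versus-concurrent cleanup discussed at the end: the delete path tears down networking and block device mappings one after the other, while boot handles them concurrently. A minimal sketch of what concurrent cleanup could look like, using the standard library and hypothetical helper names rather than nova's real _shutdown_instance internals:]

    # Illustrative sketch only: running network teardown and volume detach
    # concurrently instead of one after the other. _teardown_network and
    # _detach_volumes are hypothetical stand-ins, not real nova methods.
    from concurrent import futures

    def _teardown_network(instance):
        ...  # deallocate ports/networking for the instance

    def _detach_volumes(instance, bdms):
        ...  # detach and clean up each block device mapping

    def shutdown_instance_concurrently(instance, bdms):
        with futures.ThreadPoolExecutor(max_workers=2) as pool:
            net = pool.submit(_teardown_network, instance)
            vols = pool.submit(_detach_volumes, instance, bdms)
            # Surface any exception from either cleanup path.
            net.result()
            vols.result()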