17:00:00 #startmeeting nova_cells
17:00:00 Meeting started Wed Sep 27 17:00:00 2017 UTC and is due to finish in 60 minutes. The chair is dansmith. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:04 The meeting name has been set to 'nova_cells'
17:00:08 o/
17:00:37 o/
17:01:03 doesn't look like surya is around
17:01:17 assuming I remember that nick properly
17:01:22 tssurya: here?
17:01:30 oh forgot there was a t in front
17:01:52 anyway
17:01:55 #topic bugs
17:02:09 https://bugs.launchpad.net/nova/+bugs?field.tag=cells&orderby=-datecreated&start=0
17:02:15 i was poking through those the other day
17:02:41 https://review.openstack.org/#/q/status:open+project:openstack/nova+topic:bug/1715533
17:02:49 i've got those backports lined up for the map_instances things
17:02:51 *thing
17:02:55 cool
17:03:01 I shall leave that open
17:03:25 anything else we should discuss here?
17:03:39 umm,
17:03:46 no. we should just scrub that list at some point
17:03:51 yeah
17:03:51 probably cells v1 things in there we won't fix
17:03:58 aye
17:04:05 #topic open reviews
17:04:20 before I get to whining about my instance list set, anything we should note here?
17:04:28 ed has the selection patches and spec up for review,
17:04:38 which I've been trying to circle back to every once in a while
17:04:53 are you lumping the alternate hosts stuff in with that as well?
17:04:56 i know he has a spec for both
17:05:00 #link https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/placement-allocation-requests
17:05:19 it's all part of the same effort yeah
17:06:01 ok, i haven't reviewed any of it yet
17:06:05 are there any major issues?
17:06:24 well, the limits thing might be, but I haven't gone back around to that since yesterday
17:08:13 anything else before we get to instance list stuff?
17:08:25 nope
17:08:40 okay, so, for the logs:
17:08:52 mriedem did some perf testing with and without the base instance list cutover patch
17:09:04 and seemed to measure some slowness with the patch applied
17:09:23 I tried to reproduce and I don't see the differing behavior
17:09:32 but in the process,
17:09:38 we probably uncovered some placement issues,
17:09:54 and definitely are seeing a large performance difference (with and without the patch applied) between 2.1 and 2.53
17:10:11 like going from an average of 6s per request to 9s for listing 1000 instances
17:10:38 yup, plus you can't create 500 instances at once
17:10:40 details are here:
17:10:41 #link https://etherpad.openstack.org/p/nova-instance-list
17:10:46 mriedem: that's the placement thing I mentioned
17:10:53 oh right
17:11:04 so right now, I'm running every microversion from 2.1 to 2.53 for 10x and recording the results
17:11:11 I'm up to 2.22 on master
17:11:25 and so far I'm still at ~6s
17:11:27 looking for a jump
17:11:53 https://imgur.com/a/SHGmi
17:12:07 neat
17:12:35 mriedem: when you got your 30s run on my patch, did you run that a bunch of times or just once?
17:12:39 I know it was 10x per,
17:13:07 but I see some pretty big variances that correlate to other things happening on the system, so I'm just wondering if you being on a public cloud instance skewed one run
17:13:20 for GET /servers/detail with 2.53?
17:13:20 mine is running with plenty of ram, cpu and disk IO
17:13:23 yeah
17:13:34 it was just 10x in a loop
17:13:46 right but you didn't run the 10x loop a couple more times?
17:13:47 46.3s was the highest
17:13:49 no
17:13:51 okay
17:14:02 i'm running loops of 10 with 1000 active instances with your big patch now
17:14:09 okay
17:14:47 otherwise yeah doing perf testing on a public cloud vm isn't ideal,
17:14:50 but it's basically all i've got
17:15:08 yeah, it's fine, it's just good to keep it in mind if we have an outlier we can't explain
17:15:16 anyway,
17:15:42 since I've got my thing up I'm going to keep poking at master, my patch, and the one that does the fault joining
17:15:45 at this point,
17:16:05 I think maybe we might want to drop the fault join because it doesn't really seem to help (and may hurt in its current form)
17:16:31 ah fsck, my token expired at 2.27 :P
17:17:31 so, anything else on this?
17:17:55 ha
17:18:02 yeah i have to remember to refresh that every once in a while
17:18:19 i don't have anything else on this
17:18:22 I have the timed loop isolated so I should just fetch a token before I run it each time
17:18:40 okay
17:18:41 #topic open discussion
17:18:44 anything else on any subject?
17:19:03 i've got a todo to circle back on forum topics
17:19:16 which i'll bring up in the team meeting tomorrow
17:19:22 and probably in the ML again before friday
17:19:29 just a FYI, I've been working on some cell databases testing stuff, to try to make our env more like reality to help catch more bugs in testing
17:19:46 melwitt: functional or devstack?
17:20:10 functional, it's the cells db fixture,
17:20:15 it defaults a context
17:20:19 dansmith: functional. namely trying to rejigger the way we use the fixture to see more reality
17:20:25 which can mask bugs when we should be targeting in code but aren't
17:20:29 ah I see
17:20:37 we've had a few bugs slip through testing b/c of that
17:20:40 yeah
17:20:58 not sure what yall are gonna think of it but, I've almost got it working (had to fix some tests). so I'll let you know when that's up
17:21:01 and then my huawei masters beat me
17:21:08 lol
17:21:47 okay anything else?
17:21:51 fortunately they don't care as much about instance list performance
17:21:58 only server create and delete performance
17:22:08 oh wait, nvm, they care about instance list too...
17:22:10 f me
17:22:11 well, you found that delete takes forever though
17:22:16 so did they
17:22:18 which is really weird
17:22:19 oh
17:22:37 but we can talk about that in -nova since dan is getting knifey
17:22:42 instance delete is pretty similar to instance boot
17:22:49 in terms of things involved
17:22:52 except,
17:22:55 it's serialized
17:22:57 unlike boot
17:23:04 eh?
17:23:04 where networking and bdms are concurrent
17:23:15 serialized where?
17:23:34 whichever of the 10 methods in the compute manager cleans up network and volume attachments
17:23:42 _shutdown_instance?
17:23:51 oh you mean serialized in the delete process, not across all instances being deleted
17:24:00 right
17:24:02 gotcha
17:24:12 i think the public cloud team made that concurrent
17:24:18 they've done lots of things...
17:24:31 dims and i are trying to figure it out
17:24:31 anyway, yeah, we're off topic
17:24:35 so I think we're good to end, yes?
17:24:40 yes
17:24:42 yes
17:24:44 #endmeeting
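
Editor's note: below is a minimal sketch of the kind of timed loop discussed in the meeting (timing GET /servers/detail ten times at each compute microversion from 2.1 to 2.53, fetching a fresh Keystone token before each batch so it cannot expire mid-run, as happened at 2.27). The endpoint URLs, credentials, and helper names are placeholders for illustration only; this is not the actual script behind the numbers in the etherpad.

```python
# Sketch of a per-microversion timing loop for GET /servers/detail.
# KEYSTONE_URL, COMPUTE_URL, and the credentials are placeholders.
import time
import requests

KEYSTONE_URL = "http://192.0.2.10/identity/v3/auth/tokens"  # placeholder
COMPUTE_URL = "http://192.0.2.10/compute/v2.1"               # placeholder


def get_token():
    """Fetch a project-scoped token via the Keystone v3 password method."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": "admin",
                        "domain": {"id": "default"},
                        "password": "secret",  # placeholder
                    }
                },
            },
            "scope": {
                "project": {"name": "admin", "domain": {"id": "default"}}
            },
        }
    }
    resp = requests.post(KEYSTONE_URL, json=body)
    resp.raise_for_status()
    return resp.headers["X-Subject-Token"]


def time_listing(token, microversion, runs=10):
    """Time GET /servers/detail `runs` times at the given nova microversion."""
    headers = {
        "X-Auth-Token": token,
        "X-OpenStack-Nova-API-Version": microversion,
    }
    timings = []
    for _ in range(runs):
        start = time.monotonic()
        resp = requests.get(COMPUTE_URL + "/servers/detail", headers=headers)
        resp.raise_for_status()
        timings.append(time.monotonic() - start)
    return timings


if __name__ == "__main__":
    for minor in range(1, 54):       # microversions 2.1 through 2.53
        token = get_token()          # refresh per version to avoid token expiry
        version = "2.%d" % minor
        times = time_listing(token, version)
        print("%s: avg %.2fs over %d runs"
              % (version, sum(times) / len(times), len(times)))
```

Refreshing the token inside the outer loop is the "fetch a token before I run it each time" idea mentioned above; averaging the ten runs per version is what would show a jump like the 6s-to-9s difference between 2.1 and 2.53.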