Wednesday, 2017-09-27

mriedemdansmith: interesting, listing without details, 500 error (cell0) and 500 active (cell1) is a lot faster than the 1000 active,00:13
mriedemi suppose because we don't have as much to join00:13
dansmithmriedem: with my patch or before?00:13
mriedemoh shit, nvm - copy paste error00:13
mriedemwas using the compute endpoint url from my other devstack :)00:13
mriedem"wow this is fast!"00:14
mriedemhah, here we go, nice and slow00:15
mriedemfault loading mofos00:15
mriedem4.495s with GET /servers, 1000 ACTIVE vms. 11.185s with 500 error, 500 active00:20
mriedeminteresting, listing with details and microversion 2.53 is not much worse than with microversion 2.1 for the error/active mix case - it was nearly double between microversions when all were active00:44
mriedemdansmith: time for your change, do i need or just the one below it?00:47
dansmithmriedem: the one below it should orphan those so they're never called00:48
dansmithso you shouldn't notice any difference afaik00:48
*** yufei has joined #openstack-nova01:35
mriedemdansmith: ok i have results in
mriedemwith your change01:38
dansmithis that faster/same except for details?01:39
mriedemcompared to w/o your change, (1) GET /servers with microversion 2.1 is slightly faster01:39
mriedemGET /servers/detail with microversion is about the same, a bit faster01:39
mriedembut, GET /servers/detail with microversion 2.53 is slower01:39
mriedemnot a ton, but it's slower01:40
mriedem25.78 compared to 30.1001:40
mriedembut, it's not a huge difference01:40
dansmithoh only detail with the later microversion01:40
mriedemsomething about >2.1 always makes listing with details slower01:40
dansmithand there's some fault handling behavior difference?01:40
mriedemat least because of the joins on the (1) services table and (2) tags table01:40
mriedemi don't think there is any fault handling behavior differences with microversion >2.101:41
dansmithokay I thought you were saying there was01:41
dansmithI dunno why because I'm pre-joining it when we were loading them separate01:41
*** mingyu has joined #openstack-nova01:41
mriedemthe only other joins i can think of right now with microversion >2.1 are on the services table (2.16) and tags table (2.26)01:41
mriedemstill, it's a difference of about 4 seconds, which isn't huge here01:42
dansmithso aside from fault, there's no difference in what I'm doing vs what we do currently,01:42
dansmithother than we're not serializing the queries01:42
dansmithwithout my change we issue the cell0 one and then the cell1 one, where now we're doing both at once01:43
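[Editor's sketch, not from the log] The difference dansmith describes, querying cell0 and then cell1 versus querying both at once, can be illustrated roughly as below. This is only a minimal sketch under stated assumptions: `list_cell` is a hypothetical stand-in for a per-cell database query, the records are plain dicts, and the merge via `heapq.merge` is an illustrative choice, not necessarily what the patch under discussion does.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def list_cell(cell_name, records):
    """Hypothetical per-cell query: return that cell's rows sorted by created_at."""
    return sorted(records, key=lambda r: r["created_at"])


def list_serial(cells):
    """Query each cell one after another, then merge the sorted results."""
    results = [list_cell(name, records) for name, records in cells.items()]
    return list(heapq.merge(*results, key=lambda r: r["created_at"]))


def list_parallel(cells):
    """Issue all per-cell queries at once, then merge the sorted results."""
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        futures = [pool.submit(list_cell, name, records)
                   for name, records in cells.items()]
        results = [f.result() for f in futures]
    # heapq.merge walks the already-sorted per-cell lists lazily,
    # without re-sorting or copying the combined list.
    return list(heapq.merge(*results, key=lambda r: r["created_at"]))
```

Both functions return the same merged list; only the way the per-cell queries are issued differs, which is why any timing gap would come from query overlap (or from extra joins) rather than from the merge step.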
dansmithis this a devstack vm on your laptop or something better?01:43
mriedemit's in a vexxhost vm01:43
mriedemthe fault stuff is the only major difference i can think of, since we'll be joining on fault all the time, rather than just for instances in ERROR state01:44
mriedemmaybe that is equaling things out somehow, idk, like if i had 1000 all in ERROR state before/after your change, that might be different in favor of yours01:45
dansmithhmm, yeah, I guess maybe that might be it01:45
mriedemi do have the numbers from yesterday before your change with 1000 ACTIVE,01:46
mriedemso tomorrow i could run yours through with all active and see if there is a bigger difference because of the fault join01:46
dansmithwell, I guess we could go back to the not automatic loading of fault01:46
mriedemi'll run that all active scenario tomorrow to see if it could be the fault stuff,01:46
mriedemit's nearly 9pm so i'm not going to do it tonight01:46
dansmiththere was something the API was doing that made it seem way better to do this than what it was doing01:47
dansmithbut it's been a while now01:47
dansmithwe could also plumb the logic of when to load the fault into the lower layers01:48
mriedemyup i was thinking that too01:49
mriedemanother thing that might be causing the microversion bloat, is maybe the microversion to pull the embedded flavor out of the instance01:49
mriedemadded in pike01:49
dansmithyou could run through each microversion and see where the spike is01:50
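[Editor's sketch, not from the log] One way to "run through each microversion and see where the spike is" is a small timing loop like the one below. It is a sketch under assumptions: `list_servers` is a hypothetical callable that performs one GET /servers/detail request at the given microversion (how it talks to the API is left out), and the helper names are made up for illustration.

```python
import time


def time_microversions(list_servers, microversions, repeats=3):
    """Return {microversion: best wall-clock seconds over `repeats` calls}."""
    timings = {}
    for version in microversions:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            list_servers(version)  # hypothetical: one listing request
            best = min(best, time.perf_counter() - start)
        timings[version] = best
    return timings


def biggest_jump(timings):
    """Return the microversion with the largest increase over the one before it."""
    versions = list(timings)  # dicts keep insertion order
    jumps = [(timings[b] - timings[a], b)
             for a, b in zip(versions, versions[1:])]
    return max(jumps)[1] if jumps else None
```

Passing the microversions in ascending order (for example 2.1, 2.16, 2.26, 2.47, 2.53) and calling `biggest_jump` on the result points at the version where listing first gets noticeably slower.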
dansmiththe sorting layer on top of this really has nothing to do with what we're sorting though01:50
dansmithit doesn't make any more copies of things, nor iterate the list more times01:50
dansmithso, the change right before the switchover should do the fault loading but not the sorting, so you could run against that and see if it's more like the earlier or more like the later01:51
mriedem ?01:53
mriedemlike, revert that on top of the change that uses the new code in the API01:53
mriedemoh, nvm,01:54
dansmithoh, I guess you were running on master already?01:54
mriedemyes, new devstack as of today01:54
dansmithyeah, okay01:54
*** Apoorva has joined #openstack-nova01:54
mriedemso 2.16 makes us join on services, 2.26 makes us join on tags, 2.47 returns instance.flavor, and your change always joins on faults01:55
mriedem2.47 is suspicious01:55
mriedemsince that's from instance_extra01:55
dansmithbut again, it shouldn't be any different01:55
mriedemyeah, nvm, we also didn't start loading that in the api as of 2.47, we already pulled out instance.flavor to get the link stuff01:57
mriedemtotally unrelated, but when we lazy-load instance.flavor, we're still joining on system_metadata now, we should be able to stop doing that01:59
mriedemok, i'll run through with 1000 active instances tomorrow with your change and see if that makes a big difference, and if so, it could be the fault thing02:04
*** thorst has joined #openstack-nova02:07
*** Tom_ has joined #openstack-nova02:24
*** Tom__ has joined #openstack-nova02:27
*** mingyu has quit IRC03:23
*** yangyapeng has quit IRC03:32
*** yangyapeng has joined #openstack-nova03:32
*** itlinux has quit IRC03:41
*** Tom has quit IRC03:55
*** owalsh has quit IRC03:56
*** crushil has quit IRC03:57
*** Sukhdev has quit IRC04:29
*** psachin has joined #openstack-nova04:47
*** mdbooth has joined #openstack-nova04:48
*** ansiwen has joined #openstack-nova04:48
*** vladikr has joined #openstack-nova04:50
*** chyka has quit IRC05:11
*** thorst has joined #openstack-nova05:13
*** vladikr has quit IRC05:14
*** vladikr has joined #openstack-nova05:17
*** Eran_Kuris has joined #openstack-nova05:38
openstackgerritjichenjc proposed openstack/nova master: check query param for server groups function
*** moshele has quit IRC06:38
*** armax has quit IRC06:50
*** sree has joined #openstack-nova07:07
*** ragiman has joined #openstack-nova07:39
*** manasm has quit IRC07:40
*** manasm has joined #openstack-nova07:41
gibijohnthetubaguy: hi! do you have any comments on ? mikal has already stated that rackspace private cloud is not using it
*** baoli has quit IRC08:32
ralonsohdansmith: hi, can I talk about
*** udesale has quit IRC08:57
*** udesale has joined #openstack-nova08:57
johnthetubaguygibi: I am very far removed from if that would be used now08:58
johnthetubaguygibi: I believe the context around that was providing an SLA that involved knowing about the 500 errors in the API, and doing bug fixes to reduce them08:59
johnthetubaguygibi: i.e. the SLA is related to the % of 5xx errors from the API, so errors in the API are really important to those folks08:59
johnthetubaguygibi: I don't remember using those notifications myself, mostly got that data from ELK when I was last looking at that stuff09:00
*** esberglu has joined #openstack-nova09:01
johnthetubaguygibi: tl;dr +1 on killing it09:01
gibijohnthetubaguy: thanks for the info09:02
johnthetubaguygibi: np09:02
stephenfingibi: Super easy docs patch needing another +2 here, if you're twiddling your thumbs at any point today :)
stephenfinWell, kinda easy09:04
gibistephenfin: I will look shortly09:04
stephenfinno rush09:04
*** claudiub|2 has joined #openstack-nova09:05
*** esberglu has quit IRC09:06
*** claudiub|3 has joined #openstack-nova09:06
*** alexchadin has joined #openstack-nova09:07
*** alexchadin has quit IRC09:08
*** claudiub has quit IRC09:08
*** alexchadin has joined #openstack-nova09:08
*** claudiub|2 has quit IRC09:10
*** alexchadin has quit IRC09:11
gibistephenfin: left some comments in
stephenfingibi: On it. Thanks :)09:42
gibistephenfin: sorry for being picky. I can accept if most of my comments are fixed in a followup09:42
gibistephenfin: the only thing that I think is a must
*** thorst has joined #openstack-nova09:47
gibistephenfin: if you need something to review then I can suggest this test improvement series :)09:47
*** Tom has joined #openstack-nova10:20
*** yamamoto has joined #openstack-nova10:20
*** Tom has quit IRC10:21
*** udesale has quit IRC10:52
*** phuongnh has quit IRC11:18
*** diga has quit IRC11:23
manasmbauzas: you may want to take a look at .11:57
openstackLaunchpad bug 1719859 in OpenStack Compute (nova) "Resize failure due to instance group being None in request spec" [Undecided,New]11:57
*** tylerderosagrund has quit IRC11:57
*** acormier has quit IRC12:29
*** liverpooler has joined #openstack-nova13:03
tssuryasdague : thanks for the reply, so here is my situation : I have some instances lying in my nova_cell1 DB which need to be mapped to my new cell (cell1). however the db access URL for this is inside nova_cell1.conf ? so when I run the map instances command, they do not get mapped.13:05
*** smatzek has quit IRC13:05
tssuryasdague: I think it's because the queries are sent to the nova_cell0 DB since nova.conf has that access url.13:06
efriedcdent Yo13:07
cdentefried: yo13:08
*** mriedem has joined #openstack-nova13:09
cdentefried: my question is basically: in powervm does the thing which is the actual hypervisor ever host workloads that are not managed by nova. The comments on this bug for context:
openstackLaunchpad bug 1718212 in OpenStack Compute (nova) "Compute resource tracker does not report correct information for drivers such as vSphere" [Medium,In progress] - Assigned to Radoslav Gerganov (rgerganov)13:09
cdentsince placement is authoritative for dynamic workloads if nova is not doing all the managing, there may be need for additional things talking to placement13:10
efriedcdent That's a multi-pronged question.13:10
cdentIt’s a full on dinner fork13:10
openstackgerritSean Dague proposed openstack/nova master: Support qemu >= 2.10
efriedFirst, nothing is stopping the user from creating VMs outside the auspices of OpenStack.13:11
*** jistr is now known as jistr|call13:12
efriedSecond, there's a mode (the most common one, historically) where the I/O virtualization is done by one (or two, cause redundancy/HA) separate partitions, in which case those guys are consuming resources outside of Nova's purview.13:12
efriedcdent Now I'll look at the bug...13:13
bauzasefried: creating VMs by the hypervisor directly is not supported by Nova13:13
*** pino has joined #openstack-nova13:13
*** moshele has joined #openstack-nova13:13
*** krtaylor has joined #openstack-nova13:13
efriedbauzas Oh, Nova won't pick them up, for sure.  But there's nothing stopping the user from doing it.13:14
cdentefried: So, short answer is “yes” so second question is “Do you yet have a plan on how to manage it”13:14
*** lucasxu has joined #openstack-nova13:14
efriedbauzas Well...13:14
efriedI agree in principle.13:14
bauzasefried: lemme find where we say that we don't support direct hypervisor calls13:14
efriedBut that doesn't mean we won't try to accommodate out-of-band partitions when reporting inventory/usage.13:15
openstackgerritMatt Riedemann proposed openstack/nova master: Log consumer uuid when retrying claims in the scheduler
efriedbauzas Sure, but I totally believe you.13:15
openstackgerritAndrey Volkov proposed openstack/nova master: [WIP] List instances performace optimization
*** MVenesio has joined #openstack-nova13:16
bauzasefried: I didn't find any explanations either in or
bauzasmaybe we should say that13:17
efriedcdent Okay, in general, we have very similar issues as described in comment #6 in that bug.13:17
*** chyka has joined #openstack-nova13:18
efriedWhen I started looking into converting over to get_inventory, it was going to be a matter of reporting "reserved" amounts based on whatever's going on OOB.13:18
johnthetubaguyI thought we were heading down the not doing live updates of resource usage?13:18
cdentefried: do you have a convenient way of distinguishing between nova managed and not-nova managed stuff?13:18
efried...and it's not trivial to figure out what's OOB.13:19
efried(I was just about to say :)13:19
efriedI can know off the bat to account for my Novalink partition (the node on which the compute service runs) and my Virtual I/O Servers.13:19
cdentjohnthetubaguy: that seems to work okay for libvirt, but not so great otherwise13:19
johnthetubaguycdent: I guess I am not seeing why, is that vmware bug the example?13:20
openstackgerritRodolfo Alonso Hernandez proposed openstack/os-vif master: Migrate from 'ip' commands to 'pyroute2'
cdentjohnthetubaguy: at this stage efried, rgerganov and I are just sort of having a chat, not making any decisions13:20
bhagyashrisjohnthetubaguy, mriedem: Hi,13:20
efriedSimplest would be, every time get_inventory/get_available_resource is called, I ask Nova for the full list of instances it knows about, ask my hypervisor for the instances IT knows about, and do a set-diff.13:21
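[Editor's sketch, not from the log] The set-diff efried describes could look roughly like this. It is a sketch under assumptions: the function name, the shape of `hypervisor_vms` (a dict keyed by instance UUID with `vcpus` and `memory_mb` fields), and the idea of returning the totals as "reserved" amounts are all hypothetical illustrations, not a real virt driver API.

```python
def out_of_band_reserved(nova_uuids, hypervisor_vms):
    """Sum the resources used by VMs the hypervisor knows about but Nova does not.

    nova_uuids: iterable of instance UUIDs Nova tracks on this host.
    hypervisor_vms: dict of uuid -> {"vcpus": int, "memory_mb": int}
                    (hypothetical shape for illustration).
    """
    # Set difference: everything on the hypervisor that Nova did not create.
    oob = set(hypervisor_vms) - set(nova_uuids)
    return {
        "VCPU": sum(hypervisor_vms[u]["vcpus"] for u in oob),
        "MEMORY_MB": sum(hypervisor_vms[u]["memory_mb"] for u in oob),
    }
```

The totals could then be reported as reserved amounts for the matching resource classes, so that a claim for 3 VCPU does not pass when one of those VCPUs is already consumed out of band.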
johnthetubaguyI am just curious where the problem isn't nova driver expected resource usage13:21
cdentjohnthetubaguy: that’s a specific case of a general issue of “things other than the nova-compute node using the stuff that nova-compute is also using”13:21
efriedjust so.13:21
johnthetubaguyI think I need a more concrete example13:21
mriedemcdent: i think the vcenter vms have some nova metadata associated with them, so you can tell which ones are nova-managed and which were created oob13:22
efriedjohnthetubaguy You try to spawn an instance and ask for 3 VCPU.  Resource tracker reports you've got 3 VCPU, so the claim passes.13:22
efriedjohnthetubaguy But in fact, one VCPU is being consumed by a VM that was spawned out of band13:22
rgerganovmriedem, that may work for vcpu and memory but it won't work well for storage13:22
*** chyka has quit IRC13:22
johnthetubaguyit's the out of band VMs, I thought we explicitly didn't support that13:22
efriedjohnthetubaguy Yeah, bauzas said the same, but didn't find where that's documented.13:23
efriedNot saying that means it's supported, or that it should be :)13:23
bauzasso, Nova isn't a proxy layer for hypervisors13:23
johnthetubaguyyeah, I could have sworn it was in here, but I don't see it:
bauzasjohnthetubaguy: yeah, I verified that13:24
bauzaslemme provide a change for it13:24
johnthetubaguywell, it's a bit late I guess13:24
cdentSo, even if the statement is “we don’t do that” the problem still holds for shared storage13:25
mriedemnova doesn't import existing vms on the hypervisor,13:25
mriedembut that doesn't mean we don't try to adjust inventory based on things running on the hypervisor host13:25
cdentwhere “the problem” is the generic notion of mixed accounting13:25
efriedmriedem Which actually makes it more problematic.13:25
johnthetubaguycdent: yeah, that sounds like a real thing we have to support13:25
mriedemthat's why we have reserved13:25
mriedemeven libvirt hosts have to account for things like ovs running on the same host13:26
cdentRight, so one of the underlying questions is:13:26
cdentDo we intend/expect that reserved will be dynamically adjusted, frequently
*** baoli has joined #openstack-nova13:26
cdentOr in the cases where we want it to be dynamically adjusted we should instead make allocations, via some third party?13:26
efriedRight, back to what johnthetubaguy mentioned earlier: "not doing live updates of resource usage"13:26
efriedThat is, will get_inventory() eventually be a thing that's run only once, rather than on a periodic?13:27
bhagyashrisjohnthetubaguy, mriedem: Could you please review patch: ? Addressed all review comments. Thank you :)13:27
*** lbragstad has joined #openstack-nova13:27
mriedemi don't think it will no13:27
bauzasI'm not seeing the reserved bit to be that dynamic13:27
johnthetubaguymriedem: it totally feels like toggling the reserved values ticks the 80% case, with very few surprises13:30
bauzasproviding a way to dynamically adjust resources in Nova would just mean we create things to support13:35
bauzasjohnthetubaguy: that's my point13:35
rgerganovjohnthetubaguy, most users run dedicated clusters but also use shared storage13:37
johnthetubaguywe should make SIG_HUP trigger a resource refresh?13:43
bauzasmriedem: yup, I just provided my thoughts literally 5 mins ago :p13:55
dansmithefried: nova wouldn't be reporting the inventory for a SAN disk, so no problem there :)13:56
dansmithefried: no, nova-compute isn't reporting any network resources14:00
*** ragiman has quit IRC14:00
bauzasmriedem: I got some late pings yesterday14:01
dansmithefried: I think the block you're stumbling over is that nova does not count things it does not count. Therefore, it does not dynamically update inventory for things it does not count. If you want to override the inventory for a thing it _does_ count to account for things it does not count, then you put that in config and HUP it to notice.14:01
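[Editor's sketch, not from the log] The "put that in config and HUP it" pattern can be sketched in plain Python as below. Everything here is an assumption for illustration: the `ReservedOverrides` class, the JSON file format, and the resource-class keys are hypothetical, and a real nova-compute service reads its own configuration files through its own machinery. `signal.SIGHUP` is only available on Unix-like systems.

```python
import json
import signal


class ReservedOverrides:
    """Hypothetical holder for operator-supplied reserved amounts."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        self.reload()

    def reload(self, signum=None, frame=None):
        """Re-read the overrides file; usable directly or as a signal handler."""
        with open(self.path) as f:
            self.values = json.load(f)


def install_sighup_handler(overrides):
    """Re-read the overrides whenever the process receives SIGHUP (Unix only)."""
    signal.signal(signal.SIGHUP, overrides.reload)
```

With the handler installed, an operator edits the file and sends SIGHUP to the process (for example with `kill -HUP <pid>`); the next inventory report would then use the updated reserved values, with no dynamic polling of out-of-band usage.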
bauzasmriedem: I also have a couple of bugfixes that are on hold in my queue14:02
bauzasthat would also help me focusing on rebasing such bugs14:02
tssuryadansmith : okay! so that means the current working is as expected ?14:06
dansmithtssurya: it's been a while since I looked at that command, but I would guess so14:07
*** sree has quit IRC14:09
mriedemthere is one thing missing14:10
mriedemon the failure path, it doesn't have a box saying that the source node is running the _rollback_live_migration method14:10
efriedmriedem That would be like a box to the left of that final 'call' arrow at the bottom?14:11
*** cdent has quit IRC14:17
*** hongbin has joined #openstack-nova14:27
mriedemi was noticing some stuff like this yesterday when writing another test,14:28
mriedemgibi: sure14:37
gibimriedem: thanks a lot14:37
efriedjohnthetubaguy That's a fun thought.  Is there a real use case for it?14:39
johnthetubaguyor disable hyperthreading14:39
*** mnestratov has joined #openstack-nova14:50
mriedemjohnthetubaguy: idk,14:51
cdentyeah, we had “do some performance testing” in the weekly rp update for so long that I eventually took it out from apparent lack of interest15:02
mriedemok. our public cloud guys have made tweaks to the scheduler for performance in mitaka, and lots of those tweaks i've said, "this thing in pike should resolve/replace that" but i don't have hard evidence15:02
dansmithso that's cool15:06
cdentso we race to get the transaction15:07
*** yamamoto has joined #openstack-nova15:10
dansmithcdent: my point is I don't know why we'd be hitting this need to retry with a single thread of allocations15:10
dansmithcdent: no, single 100-instance boot, so one for loop15:12
mriedemi see a buttload of the "we're on a pike compute with all pike computes, so not healing allocations" all the time15:13
dansmithcdent: it should be pretty easy to reproduce (or not) in a devstack and then you can instrument the code as needed15:16
mriedemwell, in this devstack vm which is not local15:17
*** manasm has quit IRC15:20
*** josecastroleon has joined #openstack-nova15:21
dansmithmriedem: speaking of reviewing, I'm not sure what else to do on the base switchover patch since the difference is not measurable on my box. I'm poking at the fault thing in the later patch, but we can just strip that out and work on it in parallel15:27
*** jmlowe has quit IRC15:32
dansmithcdent: but that's just incidental to him finding the problem15:37
cdentmriedem, dansmith, (and sean mooney): If any of you get a chance to look at the discussion on (limiting GET /allocation_candidates ) it’s gotten to the point where we are trying to decide what it is that we are actually optimizing for, so could do with more input15:42
*** mnestratov has quit IRC15:53
mriedemcdent: see my comment from may 26 on that patch16:01
openstackLaunchpad bug 1719915 in OpenStack Compute (nova) "test_live_migrate_delete race fail when checking allocations: MismatchError: 2 != 1" [Medium,In progress] - Assigned to Balazs Gibizer (balazs-gibizer)16:03
*** r-daneel has joined #openstack-nova16:12
*** shardy has joined #openstack-nova16:25
*** Tom_ has quit IRC16:28
openstackgerritMerged openstack/nova master: Fix IoOpsFilter test case class name.
*** david-lyle has quit IRC16:40
johnthetubaguyyeah, that's the ones I think16:47
dansmithjohnthetubaguy: at some point I think it'll be a little of both17:25
openstackgerritMatt Riedemann proposed openstack/nova master: Remove dest node allocations during live migration rollback
mriedemi wonder if we're lazy-loading the flavor extra specs?17:53
mriedemthe policy check for each instance?17:58
*** ralonsoh has quit IRC18:06
mriedemdefine "flavor pinning"18:09
Tengumriedem: the metadata is like "gen1=true" for first aggregate, "gen2=true" for the second.18:17
*** acormier has quit IRC18:34
*** lpetrut has joined #openstack-nova18:47
*** moshele has joined #openstack-nova18:55
bauzaswhen I saw the tweet for 2.47 :p19:17
sdaguejust so the test is more concisely valid19:31
*** gjayavelu has quit IRC19:38
sdaguein tests19:46
sdaguemriedem: yeh, that's the stat calls19:54
dansmithI see no impact as the first numbers are coming out20:05
mriedemso i'll just add a case for fake and make it the same as libvirt?20:38
*** thorst has quit IRC21:09
*** yamahata has joined #openstack-nova21:36
*** gjayavelu has joined #openstack-nova21:40
melwittit's easy to do on their site. there's no free option for US ppl, have to get the ETA visa and it costs $20 AUD21:45
mriedemooo the hotel has an infinity pool21:48
*** baoli has quit IRC21:54
*** takashin has joined #openstack-nova22:06
*** smatzek has quit IRC22:15
*** rtjure has quit IRC23:01
mriedem_awayeasier to change them later if needed23:31
