openstackgerrit | Matt Riedemann proposed openstack/nova master: Fix bad links for admin-guide https://review.openstack.org/580259 | 00:12 |
---|---|---|
openstackgerrit | Merged openstack/nova stable/queens: Add recreate test for RT.stats bug 1784705 https://review.openstack.org/587921 | 00:32 |
openstack | bug 1784705 in OpenStack Compute (nova) queens "ResourceTracker.stats can leak across multiple ironic nodes" [High,In progress] https://launchpad.net/bugs/1784705 - Assigned to Matt Riedemann (mriedem) | 00:32 |
*** mriedem has quit IRC | 01:06 | |
openstackgerrit | zhufl proposed openstack/nova master: xx_instance_type_id in list_migrations should be integer https://review.openstack.org/588481 | 01:23 |
*** lei-zh has joined #openstack-placement | 03:08 | |
openstackgerrit | Merged openstack/nova stable/queens: Make ResourceTracker.stats node-specific https://review.openstack.org/587976 | 03:18 |
openstackgerrit | Merged openstack/nova master: Fix none-ascii char in doc https://review.openstack.org/588422 | 03:18 |
openstackgerrit | Merged openstack/nova master: Add microversion info in the os-server-groups API samples https://review.openstack.org/589006 | 03:19 |
openstackgerrit | Merged openstack/nova master: api-ref: Add descriptions for rebuild https://review.openstack.org/588931 | 03:19 |
*** lei-zh has quit IRC | 04:03 | |
*** jaypipes has quit IRC | 04:11 | |
*** jaypipes has joined #openstack-placement | 04:11 | |
*** tetsuro has joined #openstack-placement | 04:18 | |
*** takashin has joined #openstack-placement | 05:11 | |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 05:18 |
*** e0ne has joined #openstack-placement | 05:40 | |
*** e0ne has quit IRC | 05:52 | |
*** e0ne has joined #openstack-placement | 06:03 | |
*** nicolasbock has joined #openstack-placement | 06:03 | |
*** takashin has left #openstack-placement | 06:39 | |
*** e0ne has quit IRC | 06:51 | |
*** s10 has joined #openstack-placement | 06:58 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: api-ref: fix min_version for parent_provider_uuid in responses https://review.openstack.org/579577 | 07:05 |
openstackgerrit | Tetsuro Nakamura proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 07:05 |
*** s10 has quit IRC | 07:08 | |
*** tetsuro_ has joined #openstack-placement | 08:06 | |
*** tetsuro has quit IRC | 08:07 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Use placement 1.28 in scheduler report client https://review.openstack.org/583667 | 08:10 |
*** cdent has joined #openstack-placement | 08:20 | |
*** tssurya has joined #openstack-placement | 08:27 | |
*** tetsuro_ has quit IRC | 08:41 | |
openstackgerrit | Kevin Zhao proposed openstack/nova master: Skip CPU comparison in Aarch64 https://review.openstack.org/589769 | 09:15 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/589513 | 09:21 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Add workaround to stop use of qemu-img by the RT https://review.openstack.org/589567 | 09:21 |
openstackgerrit | OpenStack Release Bot proposed openstack/osc-placement stable/rocky: Update .gitreview for stable/rocky https://review.openstack.org/589773 | 09:34 |
openstackgerrit | OpenStack Release Bot proposed openstack/osc-placement stable/rocky: Update UPPER_CONSTRAINTS_FILE for stable/rocky https://review.openstack.org/589774 | 09:34 |
openstackgerrit | OpenStack Release Bot proposed openstack/osc-placement master: Update reno for stable/rocky https://review.openstack.org/589775 | 09:34 |
*** edmondsw has joined #openstack-placement | 09:45 | |
*** edmondsw has quit IRC | 09:49 | |
openstackgerrit | Chris Dent proposed openstack/nova master: Add explicit functional-py36 tox target https://review.openstack.org/589825 | 10:05 |
openstackgerrit | Boxiang Zhu proposed openstack/nova master: Refactor cell_type in compute/api.py https://review.openstack.org/589833 | 10:38 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: Fix URI for IPv6 https://review.openstack.org/589548 | 10:43 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: Fix URI for IPv6 https://review.openstack.org/589548 | 11:02 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: Fix URI for IPv6 https://review.openstack.org/589548 | 11:05 |
openstackgerrit | Chen proposed openstack/nova master: Update ssh configuration doc https://review.openstack.org/589844 | 11:20 |
openstackgerrit | Chen proposed openstack/nova master: Update ssh configuration doc https://review.openstack.org/589844 | 11:23 |
openstackgerrit | Merged openstack/nova master: Fix resize revert to use non-legacy alloc handling https://review.openstack.org/589425 | 11:42 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Always escapte IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 12:13 |
*** tssurya has quit IRC | 12:13 | |
jaypipes | efried: when does stein open again? | 12:18 |
*** e0ne has joined #openstack-placement | 12:34 | |
*** tssurya has joined #openstack-placement | 12:36 | |
*** mriedem has joined #openstack-placement | 12:59 | |
*** edmondsw has joined #openstack-placement | 13:05 | |
edleafe | jaypipes: usually when RC1 is cut, so I think this Thursday | 13:06 |
jaypipes | ack, thx edleafe | 13:07 |
*** e0ne has quit IRC | 13:18 | |
cdent | edleafe, efried, jaypipes: I'm doing some hand-wavey scale explorations on placement and I have some tentative results that are going to warrant some additional digging (to see where the time is lost): | 13:18 |
cdent | I create 1000 resources providers, all with the same generic vcpu, disk, memory inventory | 13:19 |
cdent | http://127.0.0.1:8081/allocation_candidates?resources=VCPU:1,MEMORY_MB:256,DISK_GB:10&limit=5 is taking approx 17 seconds | 13:20 |
cdent | if I take off the limit it is still about 17 seconds, but the amount of data returned is much more, so that suggests it is in the results creation, not serializing those results to json or sending them out the door | 13:21 |
cdent | if I do just one resource class the time drops to 8 seconds | 13:22 |
cdent | no traits, nested or shared yet | 13:22 |
cdent | I'll be doing further experiments. The initial code is at https://github.com/cdent/placeload | 13:23 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 13:23 |
efried | cdent: Wow, very interesting. Previously you've said most performance issues could be resolved by giving the placement service more umph. Not so in this case? | 13:25 |
cdent | efried: I think things have changed a lot since: https://anticdent.org/placement-scale-fun.html | 13:27 |
cdent | and also the tests in ^ were only 75 resource providers | 13:27 |
jaypipes | cdent: yeah, that's way out of line. should be milliseconds with only 1000 providers. | 13:28 |
cdent | I'm going to switch from my containery thing to devstack so I can put in some profiling more easily | 13:29 |
jaypipes | cdent: am I missing something in placeload I don't see any calls to a-c? | 13:30 |
cdent | i did the a-c calls by hand after the data was loaded | 13:30 |
cdent | using curl | 13:30 |
*** e0ne has joined #openstack-placement | 13:31 | |
cdent | jaypipes: my intention was to get a bit further along and have more in placeload itself, but I paused to do a sanity check and things didn't seem sane, so I've diverted | 13:31 |
jaypipes | cdent: ah. | 13:31 |
jaypipes | cdent: how many candidates are returned from the initial call to a-c before any claiming? the expected 1000? | 13:32 |
cdent | if it's any consolation writing all the resource providers and allocations is _very_ fast | 13:33 |
cdent | (when doing the asyncio fun) | 13:33 |
cdent | as far as I can tell the json results are as expected | 13:34 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 13:37 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Fail deleting non-existent attachment in CinderFixtureNewAttachFlow https://review.openstack.org/589900 | 13:37 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 13:38 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update the parameter explain when updating a volume attachment https://review.openstack.org/565181 | 13:46 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update the parameter explain when updating a volume attachment https://review.openstack.org/565181 | 13:48 |
cdent | confirmed in devstack | 13:50 |
efried | cdent: How easy to revert to an earlier commit and try again? | 13:54 |
cdent | efried: not too difficult, but I'm going to some random logs in places first to be able to make a reasonable guess on which commit to go back to | 13:55 |
efried | cdent: Curious if the granular stuff is what's killing us. I haven't looked and don't remember clearly, but it's possible we're still doing "merge math" on the whole set of candidates even though not using granular groups. | 13:55 |
efried | cdent: Okay, so logging will tell that too. | 13:55 |
efried | I assume you're going to wrap the various sql bits in logs. | 13:56 |
cdent | pretty much yeah | 13:59 |
cdent | efried, jaypipes : part of it is the ResourceProvider.get_by_uuid added in https://review.openstack.org/#/c/567113/ line 3610 of objects/resource_provider.py | 14:15 |
cdent | that's maybe half of it | 14:15 |
efried | ugh, that's going to be a bear to unwind | 14:17 |
cdent | i don't think we want to unwind it, I think we want to fix it | 14:17 |
cdent | we've already loaded all the resource providers many times throughout this set of queries | 14:18 |
efried | It would be easy enough to prefetch a cache of RPs according to the root_idZ extracted from rp_tuples, but that's only going to help you where there's more than one candidate per RP, which there isn't in this case. | 14:18 |
efried | Yeah, implement some caching in get_by_uuid? | 14:18 |
efried | in such a way that the cache only survives for the duration of the transaction. | 14:19 |
cdent | no, that's not what I mean | 14:20 |
cdent | prior to getting to this point in the code we've already gathered all the data necessary to construct the rps, we don't need to requery, we need to reuse the information we already have | 14:20 |
efried | from e.g. the provider summaries, yah. | 14:21 |
efried | Well, I'm guessing the other half of the hit is the get_by_uuidZ we did to retrieve 'em the first time... | 14:21 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/589513 | 14:22 |
cdent | there are two other hot spots that haven't quite narrowed down | 14:23 |
cdent | something in _build_provider_summaries | 14:23 |
cdent | and then somewhere in the building of the allocation requests | 14:24 |
cdent | yeah same thing in _build_provider_summaries | 14:26 |
efried | get_by_uuid? | 14:26 |
jaypipes | cdent: ugh, I can't believe I failed to catch that N+1 queries problem in that patch. :( | 14:27 |
jaypipes | ffs | 14:27 |
cdent | I'll have a diff with commentary here in a few minutes and will put it on a bug and we can figure out what to do | 14:28 |
jaypipes | we're doing 1000 separate SELECT * FROM resource_providers WHERE uuid = ? queries there on line 3610. | 14:28 |
efried | yes | 14:28 |
efried | but... shouldn't we be able to do that? | 14:28 |
jaypipes | so it's not actually one of the "complicated" queries. it's the fact that we're calling RP.get_by_uuid() from within a loop. :( | 14:29 |
efried | and... is it really more efficient to do SELECT * FROM resource_providers WHERE uuid IN (1000 uuids) ? | 14:29 |
efried | or is it building the OVOs that's killing us? | 14:29 |
jaypipes | efried: yes. it's 1000% more efficient. | 14:29 |
jaypipes | efried: has little to nothing to do with ovo. | 14:29 |
efried | Okay, so back-of-napkin: 1) pull out the RP IDs so we can do a single query; 2) reuse providers retrieved earlier | 14:30 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 14:33 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Fail deleting non-existent attachment in CinderFixtureNewAttachFlow https://review.openstack.org/589900 | 14:33 |
jaypipes | efried, cdent: the only reason we're doing RP.get_by_uuid() there is to get the root_provider_uuid value. :( | 14:40 |
efried | jaypipes: We need it for the provider summaries anyway. | 14:41 |
jaypipes | efried: right, but we should have that information already before looping over things. | 14:41 |
jaypipes | cdent: good on ya for locating the source of the performance problems. I consider this the highest priority thing to fix right now. (for rocky) | 14:43 |
* cdent is making the bug | 14:43 | |
jaypipes | cdent: it will take more than a little refactoring, but I'm happy to do it. | 14:43 |
cdent | jaypipes, efried : https://bugs.launchpad.net/nova/+bug/1786055 | 14:50 |
openstack | Launchpad bug 1786055 in OpenStack Compute (nova) "performance degradation in placement with large number of resource providers" [High,Confirmed] | 14:50 |
cdent | the diff within already has one of the fixes (maybe; the functional tests still pass) and that fix is the one that saves the most time | 14:51 |
cdent | let me know if I'm not making any sense in that report | 14:51 |
efried | Thanks cdent, nice work sir. | 14:54 |
cdent | s/saves.*/recovers about half the time/ | 14:54 |
cdent | 15 -> 7 seconds | 14:54 |
cdent | i'd happily do this sort of stuff all the time if we could find time for it | 14:54 |
efried | jaypipes: Are you working on N-SELECTs-to-1 refactor rn? | 14:57 |
*** e0ne has quit IRC | 14:58 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Fix host validity check for live-migration https://review.openstack.org/401009 | 15:02 |
efried | cdent: What about putting your code into an "experimental" job (is that what they call it?) so we can run it via zuul from time to time. | 15:03 |
jaypipes | efried: yes | 15:04 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Track volume attachments within CinderFixtureNewAttachFlow https://review.openstack.org/587013 | 15:05 |
openstackgerrit | Matthew Booth proposed openstack/nova master: fixtures: Fail deleting non-existent attachment in CinderFixtureNewAttachFlow https://review.openstack.org/589900 | 15:05 |
cdent | jaypipes: is my second chunk fix okay, or are we missing tests that would be upset by that change? | 15:07 |
cdent | efried: that's probably a good idea. let me do some poking around | 15:08 |
efried | cdent: The second-chunk fix (we're talking about getting the rps from the provider summaries, yah?) is glorious. I don't think it needs extra testage. | 15:09 |
* cdent nods | 15:09 | |
efried | not sure what you would do to test it anyway. | 15:09 |
cdent | it's not that I think it needs extra testage | 15:09 |
cdent | but rather that it seems so obvious looking at it that I wonder if it is wrong and is only passing because of twists of fate | 15:10 |
cdent | but if those summaries are already "full" then cool | 15:10 |
jaypipes | cdent: that'll work for now. but I'm currentl fixing the root of the problem (pun intended) which is that _build_provider_summaries() also does the whole RP.get_by_uuid() thing. | 15:11 |
cdent | jaypipes: presumably fixing that could be contained to _build_provider_summaries, in which case the second chunk fix stands | 15:11 |
efried | wonder if we could implement some kind of fixture hack that would issue a warning when we call the same method with the same args | 15:13 |
efried | cdent: We *should* be guaranteed that the providers summaries contain all of the providers we will need. If it doesn't, that's a bug and I'm okay blowing up hard so we can find it and fix it. | 15:14 |
jaypipes | cdent: not really, since all the callers to _build_provider_summaries() need to find, but like I said, your fix is quick and addresses half of the perf issues. | 15:15 |
cdent | jaypipes: I'm not able to parse what you just said | 15:16 |
cdent | jaypipes, efried: or to put it another way: should I go ahead and submit a patch for the small change, or is that all under control in jay's changes? | 15:18 |
efried | jaypipes: IMO cdent's fix still makes sense regardless. Even if you cut the 7s down to .07, cdent's cuts the overall hit in half. | 15:19 |
jaypipes | cdent: submit the small change. | 15:19 |
efried | but yeah, fix in one patch or two? I say two. | 15:19 |
efried | yeah. | 15:19 |
cdent | roger | 15:19 |
jaypipes | two, yes | 15:19 |
jaypipes | sorry if I wasn't clear on that. | 15:19 |
openstackgerrit | Takashi NATSUME proposed openstack/nova master: [placement] api-ref: add description for 1.29 https://review.openstack.org/589407 | 15:21 |
openstackgerrit | Chris Dent proposed openstack/nova master: [placement] Avoid rp.get_by_uuid in allocation_candidates https://review.openstack.org/589941 | 15:29 |
openstackgerrit | Liam Young proposed openstack/nova master: Remove Neutron MetaAPIProxy from cellsv2-layout https://review.openstack.org/588525 | 15:35 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 15:35 |
openstackgerrit | Jay Pipes proposed openstack/nova master: get provider IDs once when building summaries https://review.openstack.org/589945 | 15:42 |
* cdent clicks | 15:42 | |
jaypipes | cdent: ^ and that should dovetail nicely with your fix. | 15:42 |
jaypipes | cdent: if you wouldn't mind running that (along with your fix) through your bench tool, that would be awesome. | 15:44 |
cdent | ayup | 15:46 |
* cdent waits for devstack | 15:54 | |
cdent | I need to further automate my container building workflow | 15:55 |
*** tssurya has quit IRC | 15:57 | |
* jaypipes waits for cdent-ci.org to complete. | 16:08 | |
cdent | unfortunately I just had a ci build failure because I was trying to be clever | 16:08 |
cdent | so starting over | 16:08 |
cdent | we'll be with you shortly caller, all our cpus are busy | 16:08 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add the guideline to write API reference https://review.openstack.org/569058 | 16:15 |
cdent | jaypipes, efried : neither patch: 15.6s. jay's patch: 13.507. my patch: 7.159. both: 3.858s | 16:32 |
cdent | jay's change is having less impact than I would have expected | 16:33 |
jaypipes | cdent: well, my patch cuts your patch in halfish. | 16:35 |
jaypipes | cdent: and your patch cuts master by halfish | 16:35 |
cdent | that's not layered | 16:35 |
cdent | that's either or | 16:35 |
jaypipes | cdent: sorry, I mean that put *together*, they seem to make a very positive difference. | 16:36 |
cdent | well yes, but it's confusing why yours is not more oomph | 16:36 |
cdent | i agree it is overall a win | 16:36 |
cdent | but I was expecting more | 16:36 |
jaypipes | cdent: it's likely that without your patch, my patch doesn't have as much impact (since your patch clearly cuts down on the first (maybe?) rounds of N-loop queries) | 16:37 |
jaypipes | cdent: I can try a more invasive approach but that would be a ton of work. | 16:37 |
jaypipes | :) | 16:37 |
jaypipes | cdent: I say let's go with these for now. | 16:38 |
cdent | yeah, agree | 16:38 |
* cdent tries something for sake of curiousity | 16:38 | |
jaypipes | from 15.6s to 3.9s is a great improvement for a few hours of hacks. | 16:38 |
jaypipes | cdent: but first... lunch. :) | 16:39 |
cdent | my "try something" not particular useful | 16:53 |
cdent | I saw we merge em and in a few days I'll find something else | 16:54 |
*** e0ne has joined #openstack-placement | 17:11 | |
cdent | mriedem: if you've got time to look at https://review.openstack.org/#/q/topic:bug/1786055 those fix the pretty critical perf problem discussion earlier | 17:12 |
openstackgerrit | Dan Smith proposed openstack/nova master: Update compute rpc version alias for rocky https://review.openstack.org/589972 | 17:12 |
cdent | s/ion/ed/ | 17:12 |
mriedem | cdent: discussed earlier...in this channel or your blog post? | 17:27 |
mriedem | probably doesn't matter to me really | 17:27 |
mriedem | "makes perf better" | 17:27 |
cdent | mriedem: sorry, meant here in the channel | 17:28 |
cdent | where we were talking about the bug | 17:28 |
mriedem | ok, so tl;dr makes perf better? | 17:29 |
cdent | mriedem: makes perf better, just like it says on the reviews | 17:33 |
mriedem | just for that, | 17:34 |
* mriedem breaks out the red pen | 17:34 | |
cdent | \o/ | 17:34 |
mriedem | i assume ResourceProvider.get_by_uuid is maybe expensive b/c of the joins to find root provider? | 17:37 |
mriedem | root & parent | 17:37 |
cdent | it's not that. it's that is a separate sql call, with all the overhead of making a new conversation with the db | 17:39 |
cdent | (even when there is connection pooling) | 17:39 |
mriedem | +2 | 17:40 |
mriedem | on yours | 17:40 |
cdent | so the combined change here is to 2000 calls to get_by_uuid (each with one sql statement) | 17:40 |
openstackgerrit | Merged openstack/nova master: conf: Deprecate 'network_manager' https://review.openstack.org/530923 | 17:40 |
cdent | into 1 sql call | 17:40 |
cdent | there's still a lot of looping overhead which could be further fixed (later) but it is basically 1/4 the time it was before the changes | 17:41 |
mriedem | this goes back to what, pike? | 17:42 |
mriedem | i'm sure cern would appreciate this....having ~20k providers... | 17:42 |
cdent | no, this is fairly new | 17:42 |
cdent | one sec | 17:43 |
cdent | one problem was introduced here: https://review.openstack.org/#/c/567113/ | 17:43 |
cdent | sorry, wrong | 17:45 |
mriedem | no that looks right | 17:46 |
mriedem | https://review.openstack.org/#/c/567113/14/nova/api/openstack/placement/objects/resource_provider.py@3610 | 17:46 |
cdent | yeah, I was confusing it with https://review.openstack.org/#/c/564351/ | 17:46 |
cdent | those are the two places | 17:47 |
cdent | so both are pretty new | 17:47 |
cdent | mriedem: the root cause of the get_by_uuid was making sure the ProviderSummary has access to root and parent provider info | 17:49 |
cdent | the cost of doing get_by_uuid for each one was apparently non-obvious | 17:49 |
*** e0ne has quit IRC | 18:03 | |
*** e0ne has joined #openstack-placement | 18:20 | |
openstackgerrit | melanie witt proposed openstack/nova master: Add functional test for affinity with multiple cells https://review.openstack.org/585073 | 18:23 |
openstackgerrit | melanie witt proposed openstack/nova master: Make scheduler.utils.setup_instance_group query all cells https://review.openstack.org/540258 | 18:23 |
openstackgerrit | melanie witt proposed openstack/nova master: Use TLSv1.2 for secure VNC access https://review.openstack.org/589992 | 18:35 |
jaypipes | cdent, mriedem, efried: we're aiming to get these perf fixes into rocky, right? (I hope so, just want to check with you) | 18:57 |
mriedem | one is approved | 18:58 |
mriedem | the other could be backported or put into rc2 if necessary | 18:58 |
mriedem | ask melwitt i guess if it's worth tagging with rocky-rc-potential | 18:58 |
mriedem | given rc1 is tomorrow | 18:58 |
melwitt | it sounds like a major perf improvement, so I'd want to get it into an RC. I don't know if it needs to be RC1 | 19:00 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: FakeDriver: adding and removing instances on live migration. https://review.openstack.org/243613 | 19:09 |
*** e0ne has quit IRC | 19:11 | |
*** e0ne has joined #openstack-placement | 19:19 | |
*** e0ne has quit IRC | 19:23 | |
melwitt | I've tagged the bug with rocky-rc-potential | 19:23 |
melwitt | it's really a major perf regression, so important enough for an RC | 19:25 |
cdent | i'm comfy with both, and think jay's thing is fine as is; we can tweak it for more betterness later | 19:39 |
efried | cdent, jaypipes, mriedem, melwitt: Back at the keys. What's the status of the perf fixes? They merging yet? | 19:42 |
melwitt | one is approved | 19:42 |
melwitt | jaypipes's is under review and is a candidate for RC2 being that it is low risk and solves a major perf regression | 19:43 |
efried | melwitt: Okay, I upgraded https://review.openstack.org/#/c/589945/ to +2. | 19:43 |
efried | melwitt: What's "RC2", and what's the approved one in if not that, and why wouldn't both of these go into the same place(s)? | 19:44 |
melwitt | if it lands before EOD tomorrow, we won't need a RC2 | 19:44 |
melwitt | does that change not need any test coverage? | 19:44 |
efried | melwitt: No. We could add unit tests for it, but it's _well_ covered by object-level and functional tests all over the place. | 19:45 |
melwitt | ok | 19:45 |
cdent | melwitt: it's effectively the same as the other one: existing func tests cover it. there are quite a lot of placement sql-related things for which large chunks are taking "as is" because it's mostly just "make me a sql query" | 19:45 |
melwitt | I see | 19:46 |
melwitt | so it's an in-place swap that is covered by existing coverage | 19:46 |
cdent | pretty much | 19:46 |
efried | cdent: The test whose results you quoted on https://review.openstack.org/#/c/589945/ -- that was GET /a_c with how many resources? | 19:48 |
cdent | efried: it says right there: "with 1000 resource providers" | 19:48 |
efried | cdent: GET /a_c?<what here?> | 19:49 |
efried | cause you had mentioned one run with only one resource class and another with more than that. | 19:49 |
cdent | oh, in the resources param, same as before: 3 resource classes | 19:49 |
efried | okay. Then your patch should really be getting us from 4000 calls down to 1000. | 19:50 |
efried | and then with jay's, down to 1 | 19:50 |
efried | because before we were doing get_by_uuid for each AllocationRequestResource | 19:51 |
cdent | yeah, that makes sense | 19:51 |
cdent | I feel like perhaps there's a lot we can do to make this stuff much better (later) | 19:52 |
efried | I too am surprised that jaypipes' only improved by ~13.5% | 19:52 |
efried | actually, it seems to be a bit more linear than that. | 19:53 |
jaypipes | efried: it improved from baseline 13.5%. It improved from cdent's patch 50%. | 19:53 |
cdent | because if that is 3000 loop because of that tuple arrangement, that seems... wastleful | 19:53 |
efried | jaypipes: yeah. I guess because the db is doing some kind of caching or whatever. | 19:53 |
jaypipes | efried: like I mentioned in a response, I *think* the reason is b/c cdent's patch addresses an outer loop and mine addresses an inner loop of sort. | 19:54 |
cdent | jaypipes: i'm not sure that's the case, I don't see the loops | 19:54 |
cdent | it's create summaries -> use summaries | 19:54 |
jaypipes | efried: or vice versa... I'm not entirely sure but I think it's clear that merging both patches is better than merging either. | 19:55 |
efried | certainly | 19:55 |
cdent | totally agree we should merge both | 19:55 |
cdent | sometime soon we should do a more complete analysis | 19:55 |
jaypipes | cdent: right, but my patch modifies the "create summaries" part to be a single query instead of 1000 queries. | 19:55 |
jaypipes | cdent: so maybe my part is the "outer loop" I've been talking about... | 19:55 |
cdent | yes, but the loops do not intersect | 19:56 |
jaypipes | in any case, not really worth debating :) | 19:56 |
cdent | loop A then loop B | 19:56 |
cdent | :) | 19:56 |
jaypipes | cdent: ack. | 19:56 |
cdent | someday it is worth understanding, I hope, but not now | 19:56 |
jaypipes | ++ | 19:56 |
jaypipes | it will all be rewritten in rust tomorrow anyway. | 19:56 |
cdent | oh it's rust now? cool | 19:57 |
cdent | the conversation in the nova channel reminds of a thing which today's performance issue made me realize may be less interesting that I though: having a "super-placement" server that manages multiple regions. each region's rps identified by a sort of auto-aggregate | 19:59 |
jaypipes | cdent: I was joking of course. :) | 20:00 |
openstackgerrit | Merged openstack/nova master: Refactor cell_type in compute/api.py https://review.openstack.org/589833 | 20:00 |
jaypipes | cdent: having a sharding key in the placement DB would be a good thing, yes. | 20:01 |
cdent | me too, I assumed "in rust tomorrow" is the new "in go tomorrow" | 20:01 |
jaypipes | cdent: well, with PEP 572, we're moving Python one step closer to being Golang. Much to dansmith's everlasting joy I presume. | 20:01 |
dansmith | dude I effing love golang | 20:02 |
* dansmith checks to see if any recruiters were watching | 20:02 | |
efried | jaypipes, cdent: FWIW, I looked around for other places we're looping get_by_uuid. Notwithstanding a TODO in nova.api.openstack.placement.handlers.allocation._resource_providers_by_uuid, this (GET /a_c) is really the only place we're likely to be doing it for more than a small handful of providers. E.g. querying for allocations, aggregates, traits, etc. - it's either a single provider, or (for allocations) is likely t | 20:02 |
cdent | likely t$ | 20:02 |
cdent | efried: seems you trunc'd | 20:03 |
efried | is likely to involve only a few providers. | 20:03 |
efried | weird | 20:03 |
mriedem | +W on the 2nd perf patch from jay | 20:03 |
efried | wait, which 'likely' did I trunc on? | 20:03 |
* efried waits for eavesdrop to catch up | 20:04 | |
cdent | "for allocations) is likely t" | 20:04 |
cdent | purpler is always up to date: http://p.anticdent.org/logs/openstack-placement?dated=2018-08-08%2020:04:37.232431#bottom | 20:04 |
efried | okay, yeah | 20:04 |
cdent | of course purpler has a nice bug: [t 2kaq] | 20:05 |
purplerbot | <efried> jaypipes, cdent: FWIW, I looked around for other places we're looping get_by_uuid. Notwithstanding a TODO in nova.api.openstack.placement.handlers.allocation._resource_providers_by_uuid, this (GET /a_c) is really the only place we're likely to be doing it for more than a small handful of providers. E.g. querying for allocations, aggregates, traits, etc. - it's either a single provider, or (for allocations) is likely t [201 | 20:05 |
efried | eek | 20:05 |
cdent | oh interesting, the bug has changed | 20:05 |
cdent | an overly long line used to cause it to crash | 20:06 |
melwitt | mriedem: thanks. we are on track for RC1andOnly | 20:07 |
mriedem | how romantic | 20:09 |
jaypipes | melwitt: RCAllFor1? | 20:09 |
mriedem | but i've got a promise ring for rc2 | 20:09 |
melwitt | ;p; | 20:09 |
melwitt | I mean, lol | 20:09 |
mriedem | i'll be out of the country for rc2 anyway so doesn't matter to me | 20:09 |
mriedem | you a-holes are on your own | 20:09 |
jaypipes | :) | 20:09 |
melwitt | well! | 20:09 |
jaypipes | mriedem: have fun on your yearly pilgrimage to antarctica. | 20:10 |
mriedem | i vomit chewed up fish into orphaned penguin mouths | 20:10 |
melwitt | sounds like a vacation | 20:10 |
cdent | if you get a chance, highly recommended (antarctica, not the vomit) | 20:11 |
cdent | https://www.flickr.com/photos/cdent/tags/antarctica/ | 20:12 |
dansmith | jaypipes: pretty sure the place mriedem is going is the opposite of antarctica | 20:13 |
melwitt | tropical? | 20:14 |
melwitt | heated? | 20:14 |
jaypipes | cdent: great pics :) | 20:14 |
cdent | it was pretty amazing. penguins make some serious smell. | 20:15 |
efried | ye gods please never ever ever | 20:15 |
efried | Winter in Texas pisses me off. Places with permanent ice? Forgetaboutit. | 20:16 |
efried | I've been known to go to cold places when there's skiing to be had. | 20:16 |
cdent | for CVO permanent | 20:17 |
cdent | night all | 20:31 |
*** cdent has quit IRC | 20:31 | |
openstackgerrit | melanie witt proposed openstack/nova master: Add a prelude release note for the 18.0.0 Rocky GA https://review.openstack.org/589303 | 20:39 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 20:51 |
openstackgerrit | Sergii Golovatiuk proposed openstack/nova master: libvirt: Always escape IPv6 addresses when used in migration URI https://review.openstack.org/589548 | 21:07 |
openstackgerrit | Jay Pipes proposed openstack/nova master: split gigantor SQL placement query into multiple https://review.openstack.org/590041 | 21:13 |
efried | ooo ^ | 21:13 |
openstackgerrit | Merged openstack/nova master: xx_instance_type_id in list_migrations should be integer https://review.openstack.org/588481 | 21:19 |
jaypipes | efried: yeah, I'd like cdent to run that patch through his benchmarks to validate it doesn't have too much of a negative impact on performance (it shouldn't, but good to check) | 21:22 |
efried | I'm waiting for test results. Did you run locally? | 21:22 |
jaypipes | efried: of course | 21:24 |
efried | you say that like it's a foregone conclusion. I can reheat my coffee with my laptop if I try that. | 21:25 |
*** mriedem has quit IRC | 21:30 | |
*** mriedem has joined #openstack-placement | 21:31 | |
openstackgerrit | Merged openstack/nova master: Add explicit functional-py36 tox target https://review.openstack.org/589825 | 21:38 |
openstackgerrit | Merged openstack/nova master: Add the guideline to write API reference https://review.openstack.org/569058 | 21:38 |
openstackgerrit | Merged openstack/nova master: Remove Neutron MetaAPIProxy from cellsv2-layout https://review.openstack.org/588525 | 21:56 |
*** nicolasbock has quit IRC | 22:06 | |
*** edmondsw has quit IRC | 22:10 | |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Docs: Add guide to migrate instance with snapshot https://review.openstack.org/584442 | 22:50 |
*** stephenfin_ has joined #openstack-placement | 23:02 | |
*** sean-k-mooney has quit IRC | 23:02 | |
*** stephenfin has quit IRC | 23:02 | |
openstackgerrit | melanie witt proposed openstack/nova stable/ocata: [stable only] Handle quota usage during create/delete races https://review.openstack.org/582413 | 23:04 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/382853 | 23:11 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/382853 | 23:12 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/pike: Fix instance evacuation with PCI devices https://review.openstack.org/590059 | 23:18 |
openstackgerrit | Matt Riedemann proposed openstack/nova stable/queens: Update nova network info when doing rebuild for evacuate operation https://review.openstack.org/590062 | 23:33 |
openstackgerrit | Merged openstack/nova master: get provider IDs once when building summaries https://review.openstack.org/589945 | 23:34 |
openstackgerrit | Merged openstack/nova master: libvirt: Reduce calls to qemu-img during update_available_resource https://review.openstack.org/589513 | 23:57 |
openstackgerrit | Merged openstack/nova master: Add tempest-slow job to run the tempest slow tests https://review.openstack.org/567697 | 23:57 |
openstackgerrit | Merged openstack/nova master: Fix bad links for admin-guide https://review.openstack.org/580259 | 23:57 |
openstackgerrit | Merged openstack/nova master: [placement] Avoid rp.get_by_uuid in allocation_candidates https://review.openstack.org/589941 | 23:57 |
openstackgerrit | Merged openstack/nova master: api-ref: fix min_version for parent_provider_uuid in responses https://review.openstack.org/579577 | 23:57 |
openstackgerrit | Merged openstack/nova master: doc: mark the max microversion for rocky https://review.openstack.org/589598 | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!