*** mriedem has quit IRC | 00:13 | |
*** mriedem has joined #openstack-placement | 00:19 | |
*** takashin has joined #openstack-placement | 00:43 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Use set instead of list https://review.openstack.org/639887 | 02:47 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Refactor _get_trees_matching_all() https://review.openstack.org/639888 | 02:47 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Adds debug log in allocation candidates https://review.openstack.org/639889 | 02:47 |
*** mriedem has quit IRC | 03:32 | |
openstackgerrit | Merged openstack/placement master: Remove redundant second cast to int https://review.openstack.org/639203 | 06:55 |
*** tssurya has joined #openstack-placement | 08:19 | |
*** takashin has left #openstack-placement | 08:30 | |
*** helenafm has joined #openstack-placement | 08:37 | |
*** rubasov has quit IRC | 08:50 | |
gibi | bauzas: why does the driver use both mdev and mdev_types for listing devices? | 08:54 |
gibi | https://github.com/openstack/nova/blob/337b24ca41d2297cf5315d31cd57458526e1e449/nova/virt/libvirt/host.py#L900 | 08:54 |
gibi | https://github.com/openstack/nova/blob/337b24ca41d2297cf5315d31cd57458526e1e449/nova/virt/libvirt/host.py#L893 | 08:54 |
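For reference, a minimal sketch of the two libvirt lookups those links point at: the 'mdev_types' capability lists parent devices that can create mediated devices (the physical GPUs), while 'mdev' lists mediated devices that already exist, which is presumably why the driver queries both. This is a standalone example, not the nova code itself:

    # Both lookups go through libvirt's node-device listing API.
    import libvirt

    conn = libvirt.openReadOnly('qemu:///system')
    # Devices able to create mdevs (e.g. pci_0000_84_00_0 for a pGPU)
    mdev_capable = conn.listDevices('mdev_types', 0)
    # Mediated devices that currently exist (e.g. mdev_<uuid>)
    existing_mdevs = conn.listDevices('mdev', 0)
    print(mdev_capable, existing_mdevs)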
*** ttsiouts has joined #openstack-placement | 09:11 | |
*** e0ne has joined #openstack-placement | 09:32 | |
openstackgerrit | Chris Dent proposed openstack/placement master: Docs: extract testing info to own sub-page https://review.openstack.org/639628 | 09:34 |
*** rubasov has joined #openstack-placement | 10:09 | |
*** rubasov has quit IRC | 10:19 | |
*** rubasov has joined #openstack-placement | 10:26 | |
*** cdent has joined #openstack-placement | 11:00 | |
cdent | jaypipes: if you get a brief moment to swoop in with an opinion on classmethods v module level methods in https://review.openstack.org/#/c/639391/ that would be handy as we continue to remove more code. no need, now, for a big review | 11:07 |
cdent | just an opinion on the quibble that eric and I are enjoying having | 11:08 |
*** ttsiouts has quit IRC | 11:10 | |
*** ttsiouts has joined #openstack-placement | 11:10 | |
*** ttsiouts has quit IRC | 11:15 | |
*** e0ne has quit IRC | 11:30 | |
*** e0ne has joined #openstack-placement | 11:36 | |
*** ttsiouts has joined #openstack-placement | 12:09 | |
bauzas | efried: jaypipes: I'm working on fixing https://review.openstack.org/#/c/636591/5/nova/virt/libvirt/driver.py@583 | 13:31 |
jaypipes | bauzas: k | 13:31 |
jaypipes | cdent: ack | 13:31 |
cdent | thanks | 13:31 |
bauzas | efried: jaypipes: but for that, I need to get allocations for all instances from a compute | 13:31 |
bauzas | efried: jaypipes: so I saw https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1887 but it looks like I can't use it? | 13:32 |
bauzas | because I would like to pass allocations to https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1202 | 13:33 |
jaypipes | bauzas: isn't the set of allocations passed to the reshape method? | 13:42 |
jaypipes | bauzas: oh, I see, you're trying to do this on restart after you've successfully reshaped.. | 13:43 |
jaypipes | bauzas: question: are mdev UUIDs guaranteed to persist in a consistent fashion across reboots? | 13:45 |
bauzas | jaypipes: unfortunately not :( | 13:51 |
bauzas | they disappear when rebooting the host | 13:52 |
bauzas | they're not persisted by the kernel | 13:52 |
bauzas | :( | 13:52 |
bauzas | hence why we have https://review.openstack.org/#/c/636591/5/nova/virt/libvirt/driver.py@579 | 13:52 |
bauzas | we basically recreate mdevs there | 13:52 |
jaypipes | bauzas: that's crazy. | 13:58 |
jaypipes | bauzas: in any case, I've added a comment to that patch in the function in question. please see that comment. | 13:58 |
jaypipes | bauzas: BTW, since nvidia invented the whole mdev system, what do *their* developers advise? | 13:59 |
jaypipes | bauzas: or are their developers basically not engaged at all? | 13:59 |
bauzas | jaypipes: humpf | 14:00 |
bauzas | I mean, nvidia wants to get paid | 14:00 |
bauzas | but all the devs I know are just working on their own kernel driver | 14:00 |
bauzas | anyway | 14:01 |
bauzas | jaypipes: I just looked at your comment | 14:01 |
bauzas | jaypipes: so we know the instance UUID and even the mdev UUID | 14:01 |
bauzas | jaypipes: that's what self._get_all_assigned_mediated_devices() does => it looks at the guest XMLs | 14:02 |
bauzas | *all* the guest XMLs | 14:02 |
jaypipes | bauzas: heh | 14:02 |
bauzas | and as you can see, we get the instance UUID | 14:03 |
jaypipes | bauzas: ok, so if we have that information, what do we need the allocs dicts for? | 14:03 |
bauzas | unfortunately we still need them, because we don't know which RP is used by the consumer | 14:03 |
bauzas | in case we have two pGPUs, we have two children RPs | 14:03 |
bauzas | so I need to get the allocation to know the RP UUID | 14:04 |
bauzas | if not, I'll create a new mdev for this instance, but maybe not on the same pGPU | 14:04 |
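To make that lookup concrete: given an instance's allocations from placement, the provider holding its VGPU allocation is the pGPU the recreated mdev should go back to. A minimal sketch, assuming the body shape of GET /allocations/{consumer_uuid} (allocations keyed by provider UUID, each with a resources dict); the helper name is illustrative:

    def vgpu_provider_for_instance(alloc_body):
        """Return the RP UUID holding the instance's VGPU allocation, if any."""
        for rp_uuid, alloc in alloc_body.get('allocations', {}).items():
            if 'VGPU' in alloc.get('resources', {}):
                return rp_uuid
        return None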
jaypipes | bauzas: oh, I'm sorry, I thought the mdev UUID *was* the provider UUID? | 14:05 |
bauzas | unfortunately no | 14:05 |
jaypipes | ah, no... I guess not, there are >1 mdevs corresponding to a single resource of VGPU consumed on the provider, right? | 14:05 |
bauzas | and it's not also the instance UUID | 14:05 |
bauzas | well, not sure I understand your question correctly, but... for one RP (the pGPU), we could have >1 mdev yes | 14:06 |
bauzas | mdev == VGPU basically | 14:07 |
jaypipes | yeah, sorry, forgot about that | 14:07 |
bauzas | np | 14:07 |
bauzas | what we *could* do is to ask operators to recreate the mdevs themselves | 14:07 |
bauzas | but... :) | 14:08 |
jaypipes | bauzas: well, in theory, I don't have any particular problem with passing the allocations information in init_host(). after all, we're doing an InstanceList.get_by_host() right after init_host() (https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1204-L1205) and that's essentially the exact same thing as getting allocations from placement | 14:08 |
jaypipes | bauzas: we could just as easily do that call to InstanceList.get_by_host() *before* calling init_host() and pass that stuff in init_host(). The allocations info from placement is a similar type of info. | 14:09 |
jaypipes | I wonder what mriedem would think of that though. | 14:09 |
bauzas | jaypipes: yeah I was thinking of that, but once we get all the instance UUIDs via self._get_all_assigned_mediated_devices() we need to call placement, right? | 14:11 |
bauzas | jaypipes: for knowing their allocs | 14:11 |
bauzas | jaypipes: so, say we have 1000 instances, we could hit placement 1000 times just for that :( | 14:11 |
bauzas | hence why I was looking at alternative API calls | 14:11 |
*** mriedem has joined #openstack-placement | 14:13 | |
jaypipes | bauzas: you're looking for basically a new placement API call. something like GET /allocations?consumer_id=in:<instance1>,<instance2>, etc... | 14:20 |
bauzas | if so, that's not good :) | 14:21 |
bauzas | jaypipes: I was looking at any *existing* placement call :) | 14:21 |
jaypipes | bauzas: the alternative to that would be to store a file on the compute service that keeps that mapping of instance UUID -> provider UUID for you. (this is actually what I said would be needed in the original spec review) | 14:21 |
bauzas | so that reshape services wouldn't be blocked because of a missing API :) | 14:21 |
bauzas | jaypipes: yeah... | 14:22 |
bauzas | I don't disagree with that | 14:22 |
bauzas | you know what ? I'm just about to write docs and say how terrible it is to reboot a host | 14:22 |
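For completeness, a rough sketch of the local-persistence alternative jaypipes mentions above: a small JSON file on the compute host mapping instance UUID to pGPU provider UUID (and mdev UUID), written whenever an mdev is assigned and read back after a reboot. The path and function names are hypothetical; nothing like this exists in the driver:

    import json
    import os

    # Hypothetical location for the instance -> pGPU/mdev mapping.
    MAPPING_PATH = '/var/lib/nova/vgpu_mdev_mappings.json'

    def save_mdev_mappings(mappings):
        # mappings: {instance_uuid: {'rp_uuid': ..., 'mdev_uuid': ...}}
        tmp = MAPPING_PATH + '.tmp'
        with open(tmp, 'w') as fh:
            json.dump(mappings, fh)
        os.rename(tmp, MAPPING_PATH)  # replace in one step

    def load_mdev_mappings():
        if not os.path.exists(MAPPING_PATH):
            return {}
        with open(MAPPING_PATH) as fh:
            return json.load(fh)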
jaypipes | bauzas: there is also GET /resource_providers/{compute_uuid}/allocations | 14:22 |
bauzas | jaypipes: that's the https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1887 report method, right? | 14:22 |
bauzas | jaypipes: I was a bit scared by looking at the docstring :D | 14:23 |
jaypipes | yes | 14:23 |
bauzas | oh wait | 14:23 |
bauzas | no, I can use this method :) | 14:23 |
jaypipes | see here a comment from efried: https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1970-L1971 | 14:23 |
bauzas | I confused it with https://github.com/openstack/nova/blob/master/nova/scheduler/client/report.py#L1907 | 14:23 |
bauzas | oh wait, actually I need the whole tree | 14:24 |
jaypipes | bauzas: all get_allocations_for_provider_tree() does is call get_allocations_for_provider() over and over again :) | 14:24 |
bauzas | yeah | 14:24 |
jaypipes | thus the comment from efried there :) | 14:24 |
bauzas | again, I was confused | 14:24 |
bauzas | heh | 14:24 |
jaypipes | no worries, this is a confusing area | 14:24 |
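For reference, the endpoint jaypipes points at can be called directly; it returns every consumer's allocations against that one provider. A minimal sketch, assuming an already-authenticated keystoneauth1 session with the placement endpoint in the service catalog:

    def allocations_for_provider(session, rp_uuid):
        # GET /resource_providers/{uuid}/allocations returns
        #   {"allocations": {consumer_uuid: {"resources": {...}}, ...},
        #    "resource_provider_generation": N}
        resp = session.get(
            '/resource_providers/%s/allocations' % rp_uuid,
            endpoint_filter={'service_type': 'placement'})
        return resp.json()['allocations']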
bauzas | and I think efried is right | 14:24 |
bauzas | the problem with persisting things on our side is that it becomes mandatory, just because libvirt lacks a feature | 14:25 |
bauzas | which is what I dislike the most | 14:25 |
bauzas | just because nvidia devs weren't able to convince the kernel folks to persist mdevs doesn't mean we should suffer for it | 14:25 |
bauzas | and do crazy things just because of this | 14:25 |
bauzas | jaypipes: hence my reluctance to persist any mdev mapping info | 14:26 |
jaypipes | bauzas: yeah, definitely it's a catch-22 situation. | 14:27 |
* bauzas googles this | 14:28 | |
* bauzas doesn't have a single shit of culture | 14:28 | |
bauzas | :( | 14:28 |
jaypipes | bauzas: oh, sorry... a better term would be "a lose-lose situation"? | 14:29 |
bauzas | ha-ah, I understand it better :) | 14:29 |
bauzas | but yeah I agree | 14:29 |
bauzas | the more I think about that, the more reluctant I am to make any change at all | 14:30 |
bauzas | I should rather file a libvirt bug asking for mdevs to be persistent | 14:30 |
bauzas | I don't know who decided mdevs should be part of sysfs | 14:31 |
bauzas | but then... | 14:31 |
bauzas | your call | 14:31 |
jaypipes | bauzas: I think we should chat with mriedem about the pros and cons of passing a built-up allocations variable to init_host() for all the instances on a host. (one of the cons being that this could be an ENORMOUS variable for nova-compute services running Ironic...) | 14:34 |
jaypipes | or frankly any nova-compute with hundreds or thousands of VMs on it. | 14:34 |
efried | jaypipes, bauzas: Do we need the allocations for everything under the compute node, or only allocations against the pGPU RPs? | 14:36 |
bauzas | efried: ideally the latter | 14:36 |
jaypipes | efried: are you thinking something like a GET /allocations?consumer_id=in:<uuids>&resource_class=VGPU ? | 14:37 |
cdent | ugh | 14:38 |
efried | No, I was thinking we already know the pGPU RP UUIDs, don't we? | 14:38 |
efried | same way we generated them? | 14:41 |
bauzas | no, because that's on init | 14:42 |
bauzas | oh wait | 14:42 |
bauzas | init is actually *after* urp | 14:42 |
bauzas | I think I see where you're coming from | 14:42 |
bauzas | we could get the RP UUIDs | 14:42 |
bauzas | by looking up the cache | 14:43 |
bauzas | in the driver | 14:43 |
bauzas | but then, we would still need to call placement to get the allocations | 14:43 |
bauzas | or or... | 14:43 |
* bauzas has his mind thinking multiple things | 14:43 | |
efried | Another option is GET /allocations?in_tree=<cn_uuid> | 14:44 |
efried | This would reduce the ironic issue down to a single node. | 14:44 |
efried | But could still be big if there's a crap ton of instances on the node. | 14:44 |
cdent | as a first pass, a lot of small GETs if you need lots of different allocations | 14:45 |
cdent | if that proves too clostly _then_ fix it | 14:45 |
cdent | but getting allocations ought to be one of the faster operations | 14:46 |
cdent | and if you really need it to be properly vast throw all the requests down an eventlet thread pool and async them | 14:46 |
cdent | s/vast/fast/ but vast works too | 14:47 |
efried | I've been trying to come up with a good definition of clostly | 14:47 |
cdent | clostly is _obviously_ a type of expense is close quarters or time constraints | 14:49 |
cdent | s/is/in/ | 14:49 |
* cdent sighs | 14:49 | |
efried | I figured it was something like that. | 14:53 |
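A minimal sketch of cdent's suggestion above: start with plain per-consumer GETs, and only if that proves too slow fan them out over an eventlet green pool. get_allocs is assumed to be whatever single-consumer lookup wrapper is already available:

    import eventlet

    def fetch_all_allocations(get_allocs, consumer_uuids, pool_size=20):
        # get_allocs(uuid) returns the allocations for one consumer.
        pool = eventlet.GreenPool(pool_size)
        results = pool.imap(get_allocs, consumer_uuids)  # order-preserving
        return dict(zip(consumer_uuids, results))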
bauzas | so, we have the providerTree even when we init_host() | 14:55 |
bauzas | so there is a way to know the pGPU RPs | 14:55 |
bauzas | but then, I'd need to get the allocations :( | 14:55 |
efried | you even have their names in the ProviderTree. | 14:55 |
bauzas | yeah but again, I need to lookup allocations | 14:56 |
efried | Right, but you can narrow it down to only the pGPUs on only one compute node. | 14:56 |
efried | which will be what, eight max? | 14:56 |
bauzas | oh I see | 14:56 |
bauzas | oh no | 14:56 |
efried | I mean, how many GPU cards are we expecting on one system? | 14:56 |
bauzas | hah, this, well, it depends but 8 looks reasonable | 14:57 |
efried | I mean, just as a heuristic | 14:57 |
bauzas | but a GPU card == N pGPUs | 14:57 |
bauzas | so theoretically, maybe 16 or 64 | 14:57 |
bauzas | but meh | 14:57 |
bauzas | reasonable enough | 14:57 |
efried | my point is, you'll be making somewhere on the order of 10-100 GET /allocations calls | 14:57 |
bauzas | efried: sure, but who would be the caller ? the libvirt driver, right? | 14:58 |
bauzas | which we've said no to N times | 14:58 |
bauzas | or, we need to get the providertree in the compute manager | 14:58 |
mriedem | someone is going to have to catch me up because i don't want to read all this scrollback | 14:58 |
efried | we have the provider tree in the compute manager - but it doesn't have allocations in it (nor should it). | 14:58 |
bauzas | mriedem: nothing really juicy atm | 14:58 |
bauzas | efried: do we ? | 14:58 |
bauzas | you'd make my day if so | 14:59 |
efried | bauzas: hum, well, we have a report client, so you could call get_provider_tree_and_ensure_root. | 14:59 |
efried | which would be the way you should get the provider tree in any case. | 14:59 |
efried | so if it's not already cached, it'll be pulled down. | 15:00 |
bauzas | hah | 15:01 |
bauzas | worth trying then | 15:01 |
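Putting the last few ideas together, a sketch of what the driver-side lookup could look like: pull the provider tree through the report client, pick out the child providers that expose VGPU inventory, and fetch allocations per provider. get_provider_tree_and_ensure_root is the real report client call named above; the ProviderTree accessors and the get_rp_allocations callable are assumptions, so treat this as a sketch rather than working driver code:

    def vgpu_allocations(reportclient, context, cn_uuid, get_rp_allocations):
        # get_rp_allocations(rp_uuid) is assumed to wrap
        # GET /resource_providers/{uuid}/allocations (see the earlier sketch).
        ptree = reportclient.get_provider_tree_and_ensure_root(context, cn_uuid)
        result = {}
        for rp_uuid in ptree.get_provider_uuids():
            data = ptree.data(rp_uuid)
            if 'VGPU' in data.inventory:  # only the pGPU child providers
                result[rp_uuid] = get_rp_allocations(rp_uuid)
        return result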
*** e0ne has quit IRC | 15:16 | |
*** e0ne has joined #openstack-placement | 15:19 | |
*** rubasov has quit IRC | 15:54 | |
*** nguyenhai_ has quit IRC | 16:14 | |
*** nguyenhai_ has joined #openstack-placement | 16:15 | |
*** e0ne has quit IRC | 16:16 | |
*** ttsiouts has quit IRC | 16:16 | |
*** ttsiouts has joined #openstack-placement | 16:21 | |
*** e0ne has joined #openstack-placement | 16:23 | |
*** Sundar has joined #openstack-placement | 16:29 | |
*** rubasov has joined #openstack-placement | 16:47 | |
*** helenafm has quit IRC | 16:53 | |
*** ttsiouts has quit IRC | 16:54 | |
*** tssurya has quit IRC | 16:57 | |
*** ttsiouts has joined #openstack-placement | 16:58 | |
*** ttsiouts has quit IRC | 16:59 | |
*** ttsiouts has joined #openstack-placement | 16:59 | |
*** ttsiouts has quit IRC | 17:04 | |
*** e0ne has quit IRC | 17:05 | |
efried | cdent: are you rebasing the placement ObjectList series? | 17:13 |
cdent | efried: yes | 17:13 |
cdent | right this minute | 17:13 |
openstackgerrit | Chris Dent proposed openstack/placement master: Factor listiness into an ObjectList base class https://review.openstack.org/637325 | 17:15 |
openstackgerrit | Chris Dent proposed openstack/placement master: Move _set_objects into ObjectList https://review.openstack.org/637328 | 17:15 |
openstackgerrit | Chris Dent proposed openstack/placement master: Move *List.__repr__ into ObjectList https://review.openstack.org/637332 | 17:15 |
openstackgerrit | Chris Dent proposed openstack/placement master: Clean up ObjectList._set_objects signature https://review.openstack.org/637335 | 17:15 |
openstackgerrit | Chris Dent proposed openstack/placement master: Use native list for lists of Usage https://review.openstack.org/639391 | 17:15 |
openstackgerrit | Chris Dent proposed openstack/placement master: Move RC_CACHE in resource_class_cache https://review.openstack.org/640114 | 17:15 |
* cdent takes a walk while that cooks | 17:17 | |
*** Sundar has quit IRC | 18:05 | |
*** e0ne has joined #openstack-placement | 18:51 | |
*** e0ne has quit IRC | 19:03 | |
mriedem | cdent: some comments on your osc-placement test cleanup patch https://review.openstack.org/#/c/639717/ | 19:36 |
cdent | thanks mriedem will respond, but brief glance there's nothing to disagree with | 19:51 |
cdent | the whole thing was rather bizarre. bunch of stuff totally broke for py3 | 19:54 |
*** ttsiouts has joined #openstack-placement | 20:09 | |
*** e0ne has joined #openstack-placement | 20:16 | |
openstackgerrit | Chris Dent proposed openstack/placement master: WIP: Move Allocation and AllocationList to own module https://review.openstack.org/640184 | 20:36 |
cdent | efried, jaypipes, edleafe ↑ is now the end of the big refactoring stack. I'm going to continue this stuff to the bitter end unless you guys want to stop me. Your feedback thus far has been great, so thanks for that. | 20:37 |
efried | I thought there was no end, bitter or otherwise | 20:37 |
efried | gdi, how do you override | ? | 20:38 |
edleafe | Agree with efried - there will never be an ending | 20:38 |
cdent | well, I don't want there to be an end, that's kind of the point: constant refactoring, endless loop | 20:39 |
cdent | efried: you mean when booleaning two objects? | 20:39 |
efried | cdent: I mean set union | 20:40 |
cdent | is it not __or__? | 20:41 |
efried | ah, yes | 20:41 |
efried | after all that, I changed my mind about suggesting it. | 20:44 |
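For the record, cdent's answer is right: the | operator is overridden with __or__ (plus __ror__ and __ior__ for the reflected and in-place forms). A tiny standalone example, not tied to any placement class:

    class ObjSet(object):
        """Toy container supporting set-style union via |."""

        def __init__(self, items=()):
            self.items = set(items)

        def __or__(self, other):
            return ObjSet(self.items | other.items)

    print((ObjSet({1, 2}) | ObjSet({2, 3})).items)  # {1, 2, 3}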
openstackgerrit | Chris Dent proposed openstack/osc-placement master: Update tox and tests to work with modern setups https://review.openstack.org/639717 | 20:46 |
openstackgerrit | Chris Dent proposed openstack/osc-placement master: Add support for 1.18 microversion https://review.openstack.org/639738 | 20:46 |
cdent | mriedem: that ↑ ought to fix your concerns. | 20:47 |
mriedem | fancy arrows | 20:47 |
cdent | I got this one too ↓ | 20:47 |
cdent | but that's as far as it goes | 20:47 |
cdent | that's enough for me today | 20:48 |
cdent | goodnight all | 20:48 |
*** cdent has quit IRC | 20:48 | |
mriedem | i'm +2 on https://review.openstack.org/#/q/topic:cd/make-tests-work+(status:open+OR+status:merged) if someone else wants to hit them, pretty trivial | 20:53 |
mriedem | 1.18 is the first placement microversion added in rocky, | 20:53 |
mriedem | so we're a bit behind on osc-placement parity with the api | 20:53 |
*** takashin has joined #openstack-placement | 20:56 | |
*** e0ne has quit IRC | 21:19 | |
*** s10 has joined #openstack-placement | 21:28 | |
*** e0ne has joined #openstack-placement | 21:42 | |
*** e0ne has quit IRC | 21:44 | |
*** e0ne has joined #openstack-placement | 21:45 | |
*** e0ne has quit IRC | 22:01 | |
*** s10 has quit IRC | 22:27 | |
openstackgerrit | Eric Fried proposed openstack/placement master: DNM: get_rc_cache https://review.openstack.org/640226 | 23:49 |