14:00:27 #startmeeting nova_scheduler
14:00:28 Meeting started Mon Jan 16 14:00:27 2017 UTC and is due to finish in 60 minutes. The chair is cdent. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 The meeting name has been set to 'nova_scheduler'
14:00:56 say hi if you're here for the nova scheduler team meeting
14:01:05 i'll be running the show today because edleafe is on PTO
14:01:31 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaScheduler#Agenda_for_next_meeting
14:01:33 <_gryf> o/
14:01:52 o/
14:01:59 o/
14:02:17 #topic specs & reviews
14:02:32 #link inventory tree https://review.openstack.org/#/c/415920
14:02:59 jaypipes: I think you and ed negotiated a new approach on this? do things in the report client instead of the virt manager?
14:03:13 also: thank you for choosing to change the name to ProviderTree
14:03:24 makes my brain feel smoother
14:03:31 cdent: yup, and I have made all those changes.
14:03:48 cdent: I still need to work NUMA cells into the patch series, but it's getting there.
14:03:51 \o
14:03:58 is that link above still good as the entry point to review stuff?
14:04:21 cdent: yes sir
14:04:29 cool
14:04:55 the agenda says (on that topic): "Is it possible to fix libvirt instead?"
14:05:04 I assume ed probably wrote that.
14:05:24 yeah. never mind that, I pulled all of it out of the virt layer.
14:05:29 could someone please give me more context about the change we're discussing?
14:05:42 I did read the commit msg of course :)
14:06:15 bauzas: you mean "why do we need this thing?"
14:06:16 bauzas: I had proposed a change to the virt driver API that would pass a new InventoryTree object to an update_inventory() method of the virt driver, where the driver would be responsible for updating the state of the inventory.
14:06:49 cdent: not the reasoning, just the problem statement :)
14:06:54 bauzas: edleafe rightly said it was too invasive and cdent said it felt yucky, so I reworked that to instead be self-contained within the scheduler reporting client.
14:07:12 "felt yucky" was my professional evaluation
14:07:18 mmm
14:07:42 because we don't yet have a hierarchical way for the inventories?
14:08:13 bauzas: well, simpler problem than that :)
14:08:20 actually, I was a bit out when you discussed the custom rps
14:08:31 oops
14:08:34 s/custom/nested
14:08:42 bauzas: we didn't yet have a way of identifying resource providers by name and not uuid, and the way the NUMA stuff was written made it impossible to add a uuid field to NUMACell.
14:09:09 ah I see
14:09:12 bauzas: in addition to the problem of nesting levels.
14:09:40 it was impossible because of the limbo dance in the virt hardware helper module?
14:09:49 bauzas: add to that the ongoing mutex on API microversion and it made for a fun Sunday of rebasing :)
14:10:26 okay, I don't want to eat too much time on that problem statement
14:10:26 bauzas: yes, if by limbo dance you mean the functional programming paradigms that module was written with, with read-only parameters and functions that return copies/mutations of the supplied parameters.
14:10:44 we could discuss that offline
14:10:49 :)
14:10:53 anyway, yeah, moving on.
14:11:04 so we got bauzas' patch merged. \o/
14:11:06 jaypipes: yup, that and all the conditionals in there
14:11:10 anyone have any other specs and reviews they'd like to mention?
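For context on the report-client approach discussed above (providers addressable by name as well as uuid, with nesting handled outside the virt driver), a rough illustration follows. This is a hypothetical minimal sketch, not the ProviderTree class from review 415920; the class and method names here are invented.

    # Hypothetical sketch of a provider tree owned by the scheduler report
    # client: providers are addressable by name or uuid, so children such as
    # NUMA cells can be added without the virt-layer objects carrying a uuid.
    import uuid

    class _Provider(object):
        def __init__(self, name, uuid_, parent=None):
            self.name = name
            self.uuid = uuid_
            self.parent = parent
            self.children = []
            self.inventory = {}  # e.g. {'VCPU': {'total': 16}}

    class ProviderTreeSketch(object):
        def __init__(self):
            self._by_uuid = {}
            self._by_name = {}

        def new_root(self, name, uuid_):
            root = _Provider(name, uuid_)
            self._index(root)
            return root

        def add_child(self, parent_name_or_uuid, name, uuid_=None):
            parent = self.find(parent_name_or_uuid)
            child = _Provider(name, uuid_ or str(uuid.uuid4()), parent=parent)
            parent.children.append(child)
            self._index(child)
            return child

        def find(self, name_or_uuid):
            return self._by_uuid.get(name_or_uuid) or self._by_name[name_or_uuid]

        def update_inventory(self, name_or_uuid, inventory):
            # In this sketch the report client, not the virt driver, updates
            # inventory and would then sync the result to the placement API.
            self.find(name_or_uuid).inventory = inventory

        def _index(self, provider):
            self._by_uuid[provider.uuid] = provider
            self._by_name[provider.name] = provider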
14:11:16 <_gryf> yeah
14:11:26 <_gryf> I didn't put that on the agenda
14:11:32 <_gryf> #link https://review.openstack.org/#/c/418393/
14:11:41 <_gryf> just a heads up
14:12:05 that will be a great thing to have once it happens
14:12:06 _gryf: ooh, nice. thanks for hopping on that!
14:12:11 <_gryf> this is a BP for providing detailed error info for the placement api
14:12:16 ++
14:12:23 <_gryf> cdent, thanks for the comments
14:12:57 you're welcome, more to come soon I'm sure
14:13:01 <_gryf> I feel that BP will take a while to reach a satisfying shape
14:13:07 :)
14:13:13 <_gryf> esp on messages
14:13:56 cdent: jaypipes: about the notification BP - https://blueprints.launchpad.net/nova/+spec/placement-notifications
14:13:59 several bikesheds will be needed
14:14:04 I forgot to add it to the agenda
14:14:29 #link notification bp: https://blueprints.launchpad.net/nova/+spec/placement-notifications
14:14:31 I have gone through the versioned objects & now have a good understanding of how they work
14:15:03 I will start writing a spec on it; if I need some help, I will ping you on nova IRC
14:15:16 awesome
14:15:45 ready to move on to bugs?
14:15:52 :)
14:16:03 #topic bugs
14:16:09 #link placement tagged bugs: https://bugs.launchpad.net/nova/+bugs?field.tag=placement&orderby=-id&start=0
14:16:12 there are a couple of new ones
14:16:40 diga: rock on.
14:16:43 both from mattr
14:16:54 jaypipes: :)
14:16:55 mattr doesn't matter.
14:17:42 cdent: cool on bugs. do you want to assign those or are you looking for volunteers?
14:17:43 dark mattr
14:18:03 heh
14:18:08 only one needs a volunteer: https://bugs.launchpad.net/nova/+bug/1656075
14:18:08 Launchpad bug 1656075 in OpenStack Compute (nova) "DiscoveryFailure when trying to get resource providers from the scheduler report client" [Low,Confirmed]
14:18:14 the other is already started
14:18:37 k
14:19:03 <_gryf> i can take a look at that
14:19:05 cdent: how about writing a note to the ML asking for a contributor on that bug.
14:19:13 cdent: or _gryf can take a look :)
14:19:24 #action _gryf to work on https://bugs.launchpad.net/nova/+bug/1656075
14:19:24 Launchpad bug 1656075 in OpenStack Compute (nova) "DiscoveryFailure when trying to get resource providers from the scheduler report client" [Low,Confirmed]
14:19:57 cdent, jaypipes: what level is that one? :)
14:20:32 the actual code would not be complicated, but there's some concern about how we keep finding new and different keystoneauth1 exceptions being raised
14:20:33 <_gryf> btw, how easy is it to set up the placement service with devstack?
14:20:36 oh _gryf is assigned already... nm
14:20:50 _gryf: it's already enabled by default
14:20:51 so there needs to be some inspection to find out what the real ones are
14:21:06 <_gryf> bauzas, no need for plugin setup or smth?
14:21:11 _gryf: no longer
14:21:16 <_gryf> cool
14:21:30 we defaulted devstack to include the placement service
14:21:36 <_gryf> +1
14:22:01 _gryf: ping me if you experience problems tho
14:22:07 any other bugs?
14:22:09 <_gryf> bauzas, sure
14:22:48 #topic open discussion
14:23:33 edleafe wrote on the agenda asking: are we going to allow the can_host parameter to be passed in queries to /resource_providers? is it going to be a part of the ResourceProvider object model?
14:24:03 that's somewhat related to a topic for a hangout happening later
14:24:13 indeed.
14:24:39 we also had discussed changing that field to "shared" or something like that.
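A note on bug 1656075 above: the concern cdent raises is that new keystoneauth1 exception types keep surfacing from the report client. One possible direction is sketched below, assuming DiscoveryFailure and its relatives derive from keystoneauth1's common ClientException base: catch that base in one wrapper rather than enumerating exception classes. The decorator name is invented here, and this is not necessarily the fix that eventually landed.

    # Sketch: centralize handling of keystoneauth1 client failures so a new
    # exception type (DiscoveryFailure, EndpointNotFound, ...) does not crash
    # the caller. Hypothetical helper, not the actual nova code.
    import functools
    import logging

    from keystoneauth1 import exceptions as ks_exc

    LOG = logging.getLogger(__name__)

    def safe_placement_call(func):
        """Log keystoneauth1 client failures and return None instead."""
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            try:
                return func(self, *args, **kwargs)
            except ks_exc.ClientException as exc:
                LOG.warning('Placement API unreachable (%s); skipping %s',
                            exc, func.__name__)
                return None
        return wrapper

Whether such a blanket catch is too broad is exactly the kind of inspection cdent says is still needed.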
14:26:40 I can't recall a specific explanation for why that particular field is needed
14:27:35 <_gryf> IIRC can_host indicates that this particular node can roll out VMs
14:27:56 _gryf: sure, but doesn't having inventory of VCPU mean the same thing?
14:28:13 <_gryf> well, implicitly
14:28:26 cdent: it's mostly for the querying of providers of shared resources. we need a way to separate the providers that have shared resources from the providers that are associated with providers that share resources.
14:28:55 cdent: the placement API is intended to be used by more than just Nova eventually.
14:29:01 of course
14:29:11 cdent: but yeah, there are other ways to query to get this information.
14:30:04 <_gryf> I can think of a situation where a host which has CPU shouldn't spin up VMs
14:30:09 cdent: the long-term solution for the problem of "where do I send this placement decision?" is to use a mapping table on the Nova side between the resource provider and the "processing worker" (a la the service record in Nova)
14:30:17 <_gryf> like specific hosts for HPC
14:31:45 cdent: to be specific about the querying problem...
14:31:51 cdent: here is the issue:
14:31:56 I suspect what's driving ed's question is avoiding a late-stage microversion: "Currently you cannot pass 'can_host=1' as a query parameter, since we only support 'name', 'uuid', 'member_of', and 'resources' as valid params."
14:32:05 cdent: if I have a request for DISK_GB of 100
14:32:28 cdent: and I have 3 hosts, all of which are associated with aggregate A.
14:32:46 cdent: 2 of the hosts have no local disk. one has local disk (and thus an inventory record of DISK_GB)
14:33:12 cdent: aggregate A is associated with provider D, which is a shared storage pool, obviously having an inventory record of DISK_GB
14:34:15 cdent: now, to find the providers that have VCPU, MEMORY_MB and DISK_GB, I would get all the provider records that have VCPU, MEMORY_MB and DISK_GB, along with the provider record for provider D which has the DISK_GB pool.
14:35:08 yeah, what you're describing here is pretty much the center of the question in that email thread
14:35:13 cdent: the potential is there to return the two compute hosts that have no local disk, with the compute host that has local disk as the provider for DISK_GB of those non-local-disk compute hosts.
14:35:28 and it is centered around what 'local_gb' currently means in the resource tracker
14:35:41 cdent: eh, partly, yes.
14:37:27 cdent: the way I'm currently doing the query in SQL -- you can do it with an intersection of providers that have *some* of the requested resources minus the set that has *all* of the resources, combined with the providers of only shared resources... but it's messy :)
14:37:45 cdent: having that can_host field simplifies things on the querying front.
14:38:01 cdent: but... I don't want to expose that through the main API if I don't have to.
14:38:29 cdent: basically, I'd like to be able to deduce "can_host" from looking at an inventory collection and the set of aggregates a provider is associated with.
14:38:30 if can_host matters, we need to start actually using it then, because last I checked the resource tracker doesn't?
14:38:34 ah
14:38:37 i see
14:38:38 right.
14:39:10 the resource tracker doesn't yet because the resource tracker hasn't yet been responsible for adding shared provider inventory.
14:39:57 and it may not be -- an external script, for instance, might be what does that.
but the RT will need to at least *know* which shared providers that compute node is associated with in order to set allocation records against it.
14:41:48 to clarify: how much of this matters before the end of ocata?
14:41:48 anyway, seems I've lost everyone :)
14:42:46 Well, what I think needs to happen (because I always think this needs to happen) is more writing in email (perhaps in response to the update # messages).
14:43:07 cdent: I would like to see shared resource inventory records being written in Ocata, but it's a stretch. I think if Ocata can focus on scheduler integration for VCPU, MEMORY_MB and DISK_GB as assumed local disk, I'd be happy.
14:43:27 cdent: and then in Pike, integrate scheduler decisions for shared resources (and nested stuff, etc)
14:43:45 cdent: agreed on communication to the ML.
14:43:48 so, for the non-ironic case, my email assumptions are effectively correct?
14:43:53 cdent: will try to dump brain on that this week.
14:44:07 cdent: need to read your email assumptions :(
14:44:09 (e.g., the meaning of local_gb)
14:44:16 * cdent shakes his tiny fist ;)
14:44:31 you'll probably want to read that before the hangout
14:44:35 * jaypipes hammers cdent with his YUGE palm.
14:44:45 * cdent has already built a wall that jaypipes paid for
14:44:59 * jaypipes has made bauzas pay for it (he doesn't know yet)
14:45:00 your illegal palm cannot reach my pristine homeland face
14:45:16 * cdent sighs
14:45:22 ok, enough fun.
14:45:28 anyone have any other open topics?
14:45:31 another thing I wanted to bring up...
14:45:32 yeah
14:45:59 I wanted to see if there's any interest from our more UX-centric friends in getting started on a Horizon placement plugin.
14:46:14 readonly for now.
14:46:47 <_gryf> does anyone use horizon?
14:46:48 as in for resource viewing?
14:46:51 * _gryf runz
14:47:18 listing providers and their inventories/allocations could be nice
14:47:40 or an admin being able to filter those by his list of hosts
14:48:00 oh, look, it's a mriedem.
14:48:05 welcome, friend.
14:48:16 jaypipes: i assume that anything for that will need to be owned by our team in a plugin outside of horizon
14:48:20 _gryf: yes, many folks use it :)
14:48:24 and i don't know anyone that works on horizon
14:48:31 * _gryf was just kidding
14:48:40 mriedem: right, me neither, that's why I was asking :)
14:48:50 we can throw it into the ptg etherpad of doom
14:48:56 hehe
14:49:16 <_gryf> at least a mail on the ML would be helpful
14:49:27 _gryf: k, will send.
14:49:36 * jaypipes has many communication action items..
14:49:51 #action jaypipes to read and write a fair bit of email
14:50:11 anyone or anything else?
14:50:58 guess not
14:51:22 thanks everyone for showing up and watching the cdent and jaypipes show, next week we'll have punch and judy
14:51:38 ciao
14:51:38 <_gryf> :D
14:51:49 #endmeeting
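To make the shared-storage example from the open-discussion section above concrete, the sketch below illustrates the filtering logic jaypipes describes: a compute provider is a candidate if it satisfies the request from its own inventory, or if the resources it lacks are offered by a sharing provider in one of its aggregates. This is set logic only, with invented data structures; the real implementation is SQL against the placement database, and the "shares" marker here merely stands in for whatever form can_host/"shared" ends up taking.

    # Aggregate A contains three compute hosts, only one with local disk,
    # plus shared storage pool D, per the DISK_GB example in the log above.
    REQUEST = {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 100}

    PROVIDERS = {
        # name: (inventory, aggregates, resource classes it shares)
        'compute1': ({'VCPU': 16, 'MEMORY_MB': 32768}, {'A'}, set()),
        'compute2': ({'VCPU': 16, 'MEMORY_MB': 32768}, {'A'}, set()),
        'compute3': ({'VCPU': 16, 'MEMORY_MB': 32768, 'DISK_GB': 500}, {'A'}, set()),
        'poolD': ({'DISK_GB': 4000}, {'A'}, {'DISK_GB'}),
    }

    def fits(inventory, resources):
        return all(inventory.get(rc, 0) >= amount
                   for rc, amount in resources.items())

    def candidates(request, providers):
        """Compute providers that can satisfy the request locally or via a
        sharing provider in one of their aggregates."""
        results = []
        for name, (inv, aggs, shares) in providers.items():
            if shares:
                continue  # sharing providers supply resources; they don't host
            missing = {rc: amt for rc, amt in request.items()
                       if inv.get(rc, 0) < amt}
            if not missing:
                results.append((name, []))  # satisfied from local inventory
                continue
            sharers = [s for s, (s_inv, s_aggs, s_shares) in providers.items()
                       if s_shares and (s_aggs & aggs) and fits(s_inv, missing)]
            if sharers:
                results.append((name, sharers))
        return results

    # candidates(REQUEST, PROVIDERS) pairs compute1 and compute2 with poolD
    # for DISK_GB, and accepts compute3 on its own inventory alone.

The messy part jaypipes mentions is doing this same intersection-and-difference in SQL without a flag on the provider row, which is why a can_host (or deduced "shared") property simplifies the query.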