14:00:15 #startmeeting nova_scheduler
14:00:16 Meeting started Mon Jun 6 14:00:15 2016 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 The meeting name has been set to 'nova_scheduler'
14:00:22 o/
14:00:23 o/
14:00:23 o/
14:00:24 o/
14:00:25 o/
14:00:26 o/
14:00:28 o/
14:00:42 Good crowd today
14:00:47 o/
14:01:00 I'm a little unprepared, as I was out most of last week
14:01:14 No one updated the agenda, so we'll have to wing it
14:01:20 o/
14:01:27 #topic Specs
14:01:39 Is there anything to discuss regarding specs?
14:01:57 so spec freeze happened last week
14:02:13 do we have a good list of the priority spec we still want to happen?
14:02:23 I guess the etherpad covers some of that
14:02:40 * bauzas waves
14:03:11 I think we possibly need to amend some already approved specs
14:03:20 plus the allocations sepc
14:03:21 spec
14:03:27 o/
14:03:29 given what jaypipes wrote
14:03:39 to clarify how the RT is updating its stats
14:03:52 well, I should abandon the wording "stats"
14:03:52 bauzas: I'm almost done with the amend to the generic-resource-pools spec.
14:03:59 jaypipes: coolness ++
14:04:13 jaypipes: thanks for the nice catch-up email btw., nice to dig into it
14:04:51 jaypipes: anything you need to discuss/clarify here?
14:04:53 so, I was saying there is a major implementation change (not really a design modification) about how the RT is sending its inventories
14:05:30 see the tl;dr in jaypipes's email of this morning/afternoon/
14:05:52 plus some cdent's open question about which interface we should use for that
14:06:03 (I just gave MHO to that)
14:07:08 Yeah, my comments there are trying to draw out people's opinions, get things clarified, etc
14:07:12 edleafe: no, just that the resource-providers-allocations spec will basically be overhauled.
14:07:22 o/
14:07:43 jaypipes: is it going to be a new gerrit patchset?
14:07:49 edleafe: since we will not try to do the migration in the Instance objects themselves but instead rely on a duplicate call to the placement API from the resource tracker to add allocation information via the placement API.
14:07:58 mlavalle: talking about the spec :)
14:08:13 mlavalle: but yes, it will likely be a new patchset.
14:08:29 jaypipes: thanks, that is what I meant
14:08:47 OK, thanks. We can continue discussion on the ML
14:09:16 #topic Reviews
14:09:34 Anyone have anything to bring up about code reviews?
14:09:36 well, reviews, I'm a bit on-hold now :)
14:10:03 jaypipes: cdent: so AFAICS, there is a patch series starting with the Allocation object that ends up with us having a new placement endpoint, correct?
14:10:30 bauzas: s/endpoint/service/
14:10:41 but yes
14:10:55 and much of it needs to be -W, because of the stuff we've talked about earlier today and late last week
14:10:59 but some of it is still stable
14:11:09 well, I don't think that would be a lot impacted
14:11:10 later today I'm going to extract the stable bits
14:11:18 we can still merge https://review.openstack.org/#/c/282442/
14:11:32 bauzas: no, that one is wrong too
14:11:37 will mark it -w now
14:11:38 and then, I could see how we would end-up with us having that new endpoint
14:11:41 actually I can't, will -1 it
14:11:43 cdent: why so ?
14:12:06 oh
14:12:07 bauzas: well, what we've come up with is a plan to put most methods on the ResourceProvider object itself.
14:12:37 bauzas, edleafe: for instance, have a ResourceProvider.update_inventory() method and a ResourceProvider.associate_aggregate() call, etc.
14:12:37 jaypipes: okay, I guess it's what I haven't read yet in your summary email :)
14:12:51 gotcha
14:12:59 bauzas: no, that was a conversation I had with cdent *after* I sent my email.
:)
14:13:07 graaaah
14:13:23 I officially state here that I have free time for helping you :)
14:13:28 bauzas: will send another ML post after I finalize that decision with dansmith
14:13:36 bauzas: understood.
14:13:48 so, just lemme know so I could bite a bit of that big cake :)
14:13:49 bauzas: will need your review help this week more than anything else.
14:14:01 bauzas: also...
14:14:45 (still waiting about what the supreme secret of the universe being...)
14:14:50 edleafe, bauzas: I have changed the REST API from /resource_pools to /resource_providers to standardize the terminology used. from last week's discussion with the Ironic team folks, it was clear the terminology was inconsistent and confusing.
14:15:05 jaypipes: k, wfm
14:15:14 yeah, makes sense
14:16:03 jaypipes: cdent: one thing is confusing me, do we need a new endpoint or a new *service* ?
14:16:08 ie. a new port ?
14:16:28 bauzas: it's been described as a new port from the start, to enable later extraction
14:16:43 I mean, the more can do the less, but that mostly impacts a lot of ops
14:17:10 and at midcycle when we decided new port, we also decided "use less of nova wsgi architecture"
14:17:15 we can still have a totally separate branch that would be behind a single endpoint, without requiring a new service
14:17:45 cdent: sure, but you know that every new service we create is just a clear PITA for packagers and ops running our infra ?
14:18:18 That may be, but we decided this back in January.
14:18:31 bauzas: it's a new service.
14:18:35 That was one of the few things we agreed on.
14:18:43 cdent: in Bristol ? I should have been sleeping by then :)
14:19:03 You snooze you lose :)
14:19:11 jaypipes: by service you mean a new thing in the catalog and a new port?
14:19:24 long term that's certainly the expectation, but not sure we need to do that _now_
14:19:32 we can, of course, but..
14:19:36 dansmith: my point, thanks for clarifying
14:20:27 the alternative is to not use the service catalog for now and a CONF value instead?
14:20:41 At the time, the fear was that if we didn't do it from the outset, then we would have to maintain the halfway-way forever.
14:20:53 or find it relative to nova?
14:20:55 So it would be better to do a clean bit of newness.
14:21:12 johnthetubaguy: I'm fine with having the placement API behind a separate endpoint, that's what we agreed
14:21:14 I guess I'm a bit afraid to commit to a new thing in the catalog and a new port at the moment
14:21:28 I'm only concerned by having a new n-something with a dedicated port
14:21:33 the nice thing is it means we don't have to add rpc from the api to that service if we go straight for a new port,
14:21:55 but that's a pretty weak decision, vs what hedging gives us in terms of being able to evolve the course
14:22:14 \b
14:22:18 IMHO, if it's a separate endpoint we're not committed to anything long-term
14:22:18 Ooops.
14:22:43 Before last week's re-ordering, the hedging was being done by not using the API at all, initially
14:23:12 and as I said last week, we don't *have* to use the API in newton
14:23:19 jaypipes kinda codified the decision in his summary,
14:23:40 but I was just explaining that I had expected we'd get there in newton and avoid the rpc upcall
14:24:42 It sounds like we still have quite a bit of getting-on-the-same-page to be doing.
14:24:46 I think the real sticking point for the api comes in where neutron needs to report their resources right?
14:25:25 seems like maybe jaypipes has gotten pulled away.. not sure we're still making progress here...
14:25:41 I
14:26:04 sorry, by service I mean a new thing in the catalog and a new port, yes.
14:26:35 heh
14:26:44 um
14:27:10 I wish things would have been clearer before, because that's really concerning me :/
14:27:20 bauzas: why?
14:27:48 jaypipes: because there are 2 possibilities with that
14:27:57 bauzas: we're going to need a separate scheduler service in the catalog when the REST API for placement exists.
14:28:28 #1 either we keep the current wsgi stack and just add a new service that would use the existing stack for running the new namespace
14:28:58 but that sounds like a huge operator impact for something that could be running on the same workers
14:29:57 #2 or we assume the existing wsgi stack has kind of a tech debt and we deploy a new framework, but that would be terrible because we would have 2 ways of writing REST resources within one single repo
14:30:54 creating a new service means more than allocating a new port, operational-wise :)
14:31:03 in particular for packagers and deployers :)
14:31:23 bauzas: we discussed a lot of that a while ago with sdague and determined it would actually be good to *not* have the placement API code inherit the Nova baggage and instead be a totally separate API service endpoint, just housed in /nova/api/openstack/placement instead of nova/api/openstack/compute
14:31:43 jaypipes: I agree with having its own endpoint
14:31:51 even, 100% to that
14:32:07 bauzas: I don't really see any benefit to keeping the placement API within the same nova-os-compute-api endpoint.
14:32:18 jaypipes: don't get me wrong
14:32:26 jaypipes: not talking of not having /placement
14:32:47 jaypipes: just talking of it running behind our single n-api service or not
14:33:05 bauzas: we don't currently have a "single n-api" service, though.
14:33:15 and AFAIR, we agreed on having it a separate endpoint, I'm fine with that :)
14:33:19 we have nova-os-api-compute and nova-api-metadata services.
14:33:30 right
14:34:42 bauzas: this definitely should start as a 3rd API on the network
14:34:53 otherwise, the split requires a proxy service
14:35:33 yeah, it has to be as separate as nova-api and metadata
14:35:39 at a minimum
14:36:02 johnthetubaguy: right
14:36:02 sdague: I see, because our endpoint is /v2.1/, not /v2.1/os-api-compute/ ?
14:36:25 bauzas: well, for a lot of reasons, but yes, that's a symptom
14:36:39 os-api (compute) has one router
14:36:41 okay, thanks for clarifying
14:36:45 you can't really mix routers
14:37:05 k, I see it now
14:37:18 bauzas: to be clear, this would be an API service running on a totally different port or top-level directory.
14:37:20 and I think we've been pretty clear on this approach since ... at least bristol
14:37:42 it seems like we need to expose this extra service to deployers eventually, seems like doing it now is the easiest path long term
14:37:59 I wasn't at Bristol, but I do remember this being one of the outcomes
14:38:02 agreed there is significant short term pain
14:38:10 I was getting lost in the terminology here.. I agree it needs to be a thing peer to compute and metadata
14:38:21 sdague: I think I mixed endpoint and service
14:38:35 I clearly remember us talking about the placement API being a separate endpoint
14:38:39 now that doesn't mean we have to get it in the service catalog right away, I guess?
14:38:43 bauzas: right, which is what this means
14:38:46 johnthetubaguy: yes
14:38:50 type=placement
14:38:51 sdague: okay
14:39:40 can we re-use the existing WSGI stack we have or do we need to somehow run a totally different router ?
14:40:11 that is up to whoever is doing it
14:40:16 you could do either
14:40:22 bauzas: In the POC at https://review.openstack.org/#/c/293104/ I'm using a different router because it is much simpler than the Nova Routes mode
14:40:30 okay, because the latter is a bit worrying me
14:40:35 my concern stems really from putting the build-a-new-api-from-new-parts in the critical path
14:41:16 dansmith: yeah, that's a good concern, it seems like we could create the API in parallel, if we cheat for the short term
14:41:21 dansmith: I agree that's a valid concern, but one of the things I've taken great pains to do is make sure that the code being used is very straightforward and small. _much_ more so than the compute-api
14:41:49 As the API is currently defined, there's no need for it to be super complicated.
14:41:53 for reference, this is where the service split between os-api and md happens today - https://github.com/openstack/nova/blob/903731e7a145eb3cd27e16461de83fdbab1baf03/nova/cmd/api.py#L52-L61 it's a super early split in the workers
14:42:12 johnthetubaguy: it's those "cheats" that involve using the ovo indirection_api that I believe add too much cruft to the solution that long-term will need to be undone.
14:42:47 well that is the trade off here
14:42:49 jaypipes: I totally don't understand what you just said
14:43:30 well, not sure I understand the cost of the "cruft", it doesn't seem too big, as it uses existing infrastructure, but I assume I am missing something?
14:43:42 cdent: well, the problem with using a new router can be seen with things like https://review.openstack.org/#/c/293104/50/nova/api/openstack/placement/handlers/aggregate.py
14:43:51 dansmith: the "cheats" that johnthetubaguy is referring to is not using the HTTP API at all in Newton and instead directly using InventoryList objects et al from the Nova resource tracker and using in-object-trickery to send inventory/allocation data to one place or the other.
14:44:13 cdent: that's a different way of coding that needs me to have 2 different mindsets for reviewing depending on which namespace I'm looking at
14:44:39 (plus the fact that we're duplicating the aggregates REST resource, which is a bit worrying me, but out of that convo now)
14:44:49 jaypipes: I don't get the "in-object-trickery" part.. you mean sending inventory updates to the api db?
14:45:23 (bauzas let's talk about that outside this meeting, because I think I can change your mind, at least a little bit)
14:45:32 k
14:45:35 dansmith: yes. via the ComputeNode object instead of having totally separate objects for the placement service and for Nova.
14:46:06 * dansmith is still confused
14:46:30 can't we just call the placement service code, pre split?
14:46:31 dansmith: in other words, the stuff I pushed a revert up for, I don't think that way of doing things is good. I believe we need totally different sets of objects in Nova and in the placement service (the split-out scheduler).
14:46:55 at least, that's what I expected to happen in that scheduler client seam
14:48:04 johnthetubaguy: well, that's my point... we don't *have* a placement service until we get the REST API done and we can't get that REST API done until the object interfaces are done. And we can't get the object interfaces done until we decide how to handle data in the old ComputeNode object vs. the new ResourceProvider object..
14:49:15 jaypipes: why can't you have a REST API until the object interfaces are done?
There should be some decoupling right
14:49:20 ++
14:49:32 sdague: that's my thinking too
14:49:48 johnthetubaguy: so my proposal to the ML was to work on the object interfaces for the ResourceProvider object (InventoryList, AllocationList, etc), get those defined to the point where they make sense for the split-out placement service, then get the REST API for placement finalized to use those object definitions and the API database in Nova, and then update the resource tracker to call
14:49:54 that placement REST API in addition to its existing call to ComputeNode.save() whenever inventory changes.
14:50:39 sdague: we could have the placement REST API just directly operate against the API DB instead of using Nova objects. Is that your suggestion?
14:51:06 well, we know from history that what we need in the data layer and the REST layer changes over time
14:51:14 given Ironic in particular is actively waiting for the generic-rp to be implemented (plus dynamic-rc), I'd suggest to not wait for a potential split-out that's blocking us, and rather iterate on things we can do quickly
14:51:25 and the REST layer needs certain guarantees to users and has to change more slowly
14:51:40 bauzas: Sylvain, that is precisely what I am trying to do: make some progress.
14:51:44 because I really want us to not have the Ironic host/node relationship to be kept again a couple of cycles
14:52:01 From my standpoint, there are a lot of good ideas here, but _none_ of them are being communicated clearly and completely. Can everybody be good and join the ML thread with more complete dumps of their concerns and ideas?
14:52:14 jaypipes: sure, I'm just trying to say that generic-rp is what we want to achieve, not the split-out (yet)
14:52:16 bauzas: I don't either
14:52:41 it's certainly a nice side effect that we could do that in the same time, but we shouldn't block us because of that
14:53:26 bauzas: I am at a loss right now.
14:53:50 bauzas: all I want to do is make forward progress but every time I suggest something you say something like "but that would be too much for operators to deal with".
14:54:15 bauzas: how do you propose to make progress here exactly?
14:54:35 jaypipes: I totally apologize if you feel that, I was confused by the reason behind a new service, but sdague clarified that
14:55:27 We only have 5 minutes left. We should continue in -nova and/or the ML
14:55:36 #topic Opens
14:55:38 jaypipes: my point is, can't we just add those Inventory and Allocation objects and then work on having the REST API implemented
14:55:50 Anything (different) to discuss?
14:57:14 * edleafe hears crickets
14:57:36 silence means security
14:57:45 OK, so let's continue this on the ML, where we can probably express our ideas more clearly
14:57:51 #endmeeting