*** ChanServ has quit IRC | 01:00 | |
*** ChanServ has joined #openstack-placement | 01:04 | |
*** barjavel.freenode.net sets mode: +o ChanServ | 01:04 | |
*** mriedem_afk has quit IRC | 01:05 | |
*** Nel1x has joined #openstack-placement | 01:28 | |
*** dansmith has quit IRC | 01:49 | |
*** dansmith has joined #openstack-placement | 01:51 | |
*** lei-zh has joined #openstack-placement | 02:01 | |
*** Nel1x has quit IRC | 02:28 | |
*** nicolasbock has quit IRC | 02:28 | |
*** dims has quit IRC | 02:28 | |
*** Nel1x has joined #openstack-placement | 02:30 | |
*** nicolasbock has joined #openstack-placement | 02:30 | |
*** dims has joined #openstack-placement | 02:30 | |
*** openstack has joined #openstack-placement | 13:23 | |
*** ChanServ sets mode: +o openstack | 13:23 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova-specs master: Add subtree filter for GET /resource_providers https://review.openstack.org/595236 | 13:24 |
cdent | giblet: ^^ that hacks tox.ini to make it possible to install placement from github. The problem I was experiencing before is that installs from git require pip directly, not whatever is installing nova itself (which I've noted in the updated commit message) | 13:24 |
*** openstackstatus has joined #openstack-placement | 13:25 | |
*** ChanServ sets mode: +v openstackstatus | 13:25 | |
giblet | cdent: thanks | 13:26 |
giblet | cdent: I'm pulling it down to play with it | 13:26 |
cdent | yay! | 13:27 |
cdent | I changed the reference to my branch, rather than the pull request so that edleafe's repo can fluctuate as required | 13:27 |
cdent | seems to be running okay in the gate (at least for functional). gonna run out for a bit | 13:33 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Remove unnecessary default parameter https://review.openstack.org/564451 | 13:50 |
*** efried_afk is now known as efried | 14:16 | |
*** efried is now known as senhor_granhular | 14:24 | |
senhor_granhular | leakypipes, giblet: Diacritics not allowed in nicknames, so I had to go with Portuguese. | 14:25 |
leakypipes | heh :) | 14:32 |
giblet | :) | 14:33 |
senhor_granhular | leakypipes, giblet: Responded. | 14:36 |
*** senhor_granhular is now known as fried_rice | 14:37 | |
fried_rice | leakypipes: Not having caught up with emails yet, did you give further thought to the refresh-in-reshaper-flow issue we started talking about yesterday? | 14:37 |
giblet | fried_rice: thanks | 14:39 |
fried_rice | lemme know if makes no sense | 14:39 |
fried_rice | Easy +A: https://review.openstack.org/#/c/595453/ | 14:39 |
giblet | fried_rice: easily approved :) | 14:41 |
fried_rice | thanks giblet | 14:41 |
giblet | fried_rice: and your comment about non-granular group totally makes sense | 14:42 |
fried_rice | phew | 14:42 |
*** ttsiouts has quit IRC | 14:46 | |
fried_rice | giblet: Your +2 on the bottom patch is useful, because 2x+2 all the way up the series will be our signal to remove the -2 and merge. | 14:57 |
mriedem | i suppose that's my cue to review the rest of the series now | 14:59 |
giblet | fried_rice: if you agree to do the doc change in a followup then I'll plug my +2 back | 15:01 |
fried_rice | giblet: Sure thing. | 15:01 |
giblet | fried_rice: I've plugged the +2 back | 15:03 |
fried_rice | giblet: thx | 15:03 |
cdent | our testing infrastructure is working against the kind of abuse I'm trying to do at the moment. I guess that's good most of the time. | 15:05 |
*** ttsiouts has joined #openstack-placement | 15:07 | |
fried_rice | giblet, leakypipes: Why no +W on https://review.openstack.org/#/c/584598/ ? Is it because of the spurious random -1? | 15:09 |
edleafe | cdent: Which files do we need to preserve in the current nova/api/openstack directory? | 15:10 |
cdent | edleafe: none, just the placement directory and below | 15:11 |
edleafe | cdent: thanks. Wasn't sure if we needed the wsgi stuff | 15:11 |
cdent | no, placement has its own (simpler) stuff | 15:11 |
edleafe | kewl | 15:11 |
cdent | okay, I just got a devstack working off that trimmed nova repo. nova-placement-api is using the placement repo code, but presenting itself as a nova thing | 15:13 |
cdent | gonna create some servers, make sure that's all happy, and then switch over to a placement-api using the same db | 15:14 |
mriedem | fried_rice: cdent: giblet: leakypipes: correct me if i'm wrong, but there is a behavior change in the RT with https://review.openstack.org/#/c/584598/ which has gone unnoticed | 15:15 |
mriedem | except maybe by the wily ying wang | 15:15 |
cdent | mriedem, fried_rice : how errors and return values are handled in the RT is a complete black hole for me :( | 15:16 |
fried_rice | heh | 15:17 |
mriedem | cdent: you were +1 on the change so i called you out | 15:17 |
mriedem | with the others | 15:17 |
fried_rice | If mriedem's comments are accurate, I really effed up. Because I fully intended to make that particular code path backward compatible other than the log msg | 15:17 |
cdent | mriedem: haven't you figured out by now that my +1 means "I'm happy to see this fail later"? | 15:17 |
fried_rice | looking closer... | 15:17 |
mriedem | cdent: i assume that's a joke | 15:18 |
fried_rice | ah, mriedem, I think you may have missed the `or {}` in the original. | 15:18 |
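For context, the `or {}` idiom at issue is what kept the refactor backward compatible: a value that can come back as None is coerced to an empty dict before callers touch it. A minimal Python sketch with hypothetical names (the real code lives in nova's report client):

    def get_allocations(resp):
        # resp is assumed to be a requests-style response whose JSON body
        # may carry {'allocations': None}; `or {}` preserves the original
        # behavior of treating a missing or None value as an empty mapping.
        return resp.json().get('allocations') or {}

    # The caller gets {} whether the body omitted 'allocations', carried
    # None, or carried {} -- the same result in all three cases.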
mriedem | well f me sideways | 15:18 |
* fried_rice looks up straight jackets again | 15:19 | |
mriedem | +W | 15:19 |
mriedem | sorry | 15:19 |
cdent | mriedem: well, more accurately: of the things I have experience in, this does not offend my sensibilities, and I trust the other people who have already reviewed. | 15:19 |
cdent | but also: I'm okay with bugs leaking through some of the time: they are the grist in the mill that keeps people coming back | 15:19 |
mriedem | https://bugs.launchpad.net/nova/ | 15:20 |
mriedem | 1 → 75 of 804 results | 15:20 |
mriedem | ... | 15:20 |
cdent | :) | 15:20 |
mriedem | half that shit is probably bugs in havana anyway | 15:20 |
cdent | several conflicting forces at work | 15:20 |
*** nicolasbock has joined #openstack-placement | 15:21 | |
fried_rice | fwiw, I think "of the things I have experience in, this does not offend my sensibilities, and I trust the other people" is a reasonable review strategy, and is why we require multiple +2s to merge a thing. | 15:21 |
giblet | mriedem: to be honest, when I now went back to look at your comment I did not notice the or {} | 15:21 |
fried_rice | mriedems of the world notwithstanding, it is unreasonable to expect even an illustrious core to understand all the implications of every change before approving it. | 15:22 |
mriedem | fried_rice: that's why i only +1ed this https://review.openstack.org/#/c/592285/ | 15:23 |
openstackgerrit | Claudiu Belu proposed openstack/nova master: hyper-v: autospec classes before they are instantiated https://review.openstack.org/342211 | 15:23 |
fried_rice | heh, and why I avoided it entirely | 15:23 |
openstackgerrit | Claudiu Belu proposed openstack/nova master: WIP: replace spec with autospec https://review.openstack.org/557299 | 15:23 |
fried_rice | because I figure I should understand at least *some* part of it before voting. | 15:24 |
leakypipes | fried_rice: sorry, was grabbing lunch. I decided to back off and propose a patch in the future that shows what I was talking about. | 15:25 |
mriedem | like me reviewing new powervm features... | 15:25 |
leakypipes | fried_rice: on the refresh-in-reshaper thing | 15:25 |
mriedem | ssp vios wtf | 15:25 |
fried_rice | leakypipes: Roger, saw that comment, looking forward to seeing what you come up with :) | 15:25 |
edleafe | cdent: I changed the extract script to move n/a/o/placement to placement/api instead of the placement directory. What about the test directories? Should n/t/u/a/o/placement be p/t/u/api or just p/t/unit? | 15:27 |
cdent | edleafe: latter. unless you have a nova package somewhere in the "real" code, that name shouldn't show up in the tests? | 15:28 |
leakypipes | fried_rice: also, on cdent's API patch, I'm cool addressing that one race condition at a later time. | 15:28 |
leakypipes | fried_rice: it would be super rare anyway and not something I'm too concerned about | 15:29 |
edleafe | I was writing a little search/replace thing to change the pathing, and stumbled upon that difference | 15:29 |
cdent | "little search/replace thing"++ | 15:29 |
fried_rice | I guess I would expect things under p/t/u/api to be testing the API, which would really be /f/ tests not /u/ tests. So I'd be in favor of nixing /api on the /u/ side. | 15:31 |
cdent | fried_rice: the strategy I have in my head is it's all "api" unless otherwise specified | 15:33 |
cdent | as that maps to what was already in place | 15:33 |
fried_rice | sure, makes sense. | 15:33 |
cdent | also, it _might_ help us keep placement a thing with only one long running service | 15:33 |
fried_rice | I guess PlacementDirect would be non-api? | 15:33 |
*** giblet is now known as giblet_off | 15:34 | |
cdent | when I say "all api" I mean there is no api directory | 15:34 |
cdent | so everything just goes where it aligns with the existing package hierarchy | 15:34 |
cdent | so since direct.py is top level, so would its tests be | 15:34 |
cdent | edleafe: does all that correspond with your thinking? | 15:37 |
edleafe | Hmmm... then why the need for placement/api for the current n/a/o/placement stuff, instead of just moving it to placement/ ? | 15:39 |
cdent | that's what I'm saying there shouldn't be any 'api' directory anywhere | 15:40 |
cdent | the contents of n/a/o/placement is what becomes $repo/placement | 15:40 |
cdent | I wrote that on https://etherpad.openstack.org/p/placement-extraction-file-notes line 25 ish. did you see that? | 15:41 |
edleafe | I've read that a bunch of times along with other stuff. I'm getting confused; hence the clarification request | 15:42 |
cdent | I'm okay with there being a subdir there as it might be tidier (thus it being "up for debate" on the etherpad). What do people think? | 15:42 |
edleafe | In my first push, placement/ had 3 subdirs: api/, db/, and tests/ | 15:43 |
edleafe | So cut that down to just the latter two? | 15:43 |
cdent | I think so, yeah, assuming you're not counting policies, schemas, handlers (and other?) dirs in that "two"? | 15:44 |
edleafe | well yeah, after the stuff in api/ is moved down | 15:45 |
*** ttsiouts has quit IRC | 15:46 | |
cdent | I feel like maybe I'm still not understanding you but I guess it will all become clear in the next example and we can continue to iterate | 15:51 |
edleafe | No, I think we're on the same page | 15:52 |
*** tssurya has quit IRC | 15:54 | |
fried_rice | Is any of this repathing instrumental to getting things working, or could it maybe be done in a subsequent change set after the repo is seeded? | 15:54 |
fried_rice | not advocating, just asking. | 15:54 |
fried_rice | I guess you're having to determine paths for everything and change import lines anyway. | 15:55 |
mriedem | fried_rice: so in https://review.openstack.org/#/c/584599/ why did you change over the heal_allocations CLI but not the other usage in compute and conductor? | 15:57 |
mriedem | assuming b/c it was the only user of include_generation=True? | 15:57 |
fried_rice | mriedem: Because the heal allocations one was the only one using the new microversion at the time | 15:57 |
fried_rice | yes | 15:57 |
fried_rice | the other changeovers were going to be pretty complicated, IIRC, and I wanted to stay focused on what was needed for the reshaper series. | 15:58 |
fried_rice | and do the others "later" | 15:58 |
mriedem | sure | 15:58 |
mriedem | just checking | 15:58 |
fried_rice | hence the implicit TODO in the commit msg https://review.openstack.org/#/c/584599/21//COMMIT_MSG@23 | 15:58 |
fried_rice | and in the code https://review.openstack.org/#/c/584599/21/nova/scheduler/client/report.py@1539 | 15:59 |
mriedem | yes yes | 15:59 |
fried_rice | sorry, not trying to be defensive, just double checking myself | 15:59 |
mriedem | consider me bludgeoned | 15:59 |
fried_rice | dredging up memories from way back when I wrote this. | 15:59 |
mriedem | i have some comments on the heal_allocations part of it, but haven't posted yet | 15:59 |
fried_rice | ack | 15:59 |
fried_rice | knowing that you're reviewing the series, I wasn't planning to lift the bottom -2 until you're done. | 16:00 |
mriedem | yeah hopefully will be done today | 16:01 |
cdent | fried_rice: if we want to be able to do testing with nova.api.openstack.placement.handler and placement.handler potentially being in the same python space (as I'm doing right now) then we need to repath things | 16:01 |
mriedem | fried_rice: ok comments inline on that one | 16:03 |
mriedem | note the risk of putting a release name in a commit message near a release boundary :) | 16:03 |
fried_rice | did that happen? oops | 16:04 |
mriedem | so it looks like you've got -1s to address on the next patch after this, so might as well respin that commit message when you rebase and i'll fast approve | 16:05 |
fried_rice | mriedem: ack. | 16:07 |
fried_rice | though I'm not completely sure about the -1s above <== giblet_off leakypipes | 16:08 |
mriedem | did we say everything must be down during reshape runs? if so, that kind of breaks rolling upgrades right? | 16:11 |
mriedem | dansmith: ^ | 16:11 |
dansmith | I'm not sure what you mean, | 16:12 |
mriedem | https://review.openstack.org/#/c/584648/20/nova/scheduler/client/report.py | 16:12 |
dansmith | obviously placement can't be down so I don't really get it | 16:13 |
dansmith | I'm focusing on something else right now so I can't grok all that at the moment | 16:13 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: doc: Note NUMA topology requirements for numa-aware-vswitches https://review.openstack.org/596393 | 16:14 |
cdent | Is that maybe about FFU and the direct interface, which is only used some of the time, not all the time | 16:16 |
cdent | fried_rice ^ ? | 16:16 |
fried_rice | I thought placement offline was the whole reason we did PlacementDirect | 16:18 |
openstackgerrit | Merged openstack/nova stable/rocky: Correct the release notes related to nova-consoleauth https://review.openstack.org/595890 | 16:18 |
openstackgerrit | Merged openstack/nova master: tests: Move mocking to setUp https://review.openstack.org/595802 | 16:18 |
mriedem | well we have 2 upgrade scenarios, | 16:18 |
fried_rice | But I think maybe we did say we would allow reshape to happen on compute startup. | 16:18 |
mriedem | first start of compute with the new code which is an online upgrade | 16:18 |
fried_rice | yeah, that would be the second one? | 16:18 |
mriedem | compute fails to start if reshape fails | 16:18 |
mriedem | FFU is the offline one | 16:19 |
mriedem | like how we migrated ironic flavors | 16:19 |
fried_rice | okay. And in either case, is it possible for a partial-evacuate scenario to exist? | 16:19 |
mriedem | we did the ironic flavor migration online during n-cpu start and had the offline option via nova-manage | 16:19 |
mriedem | sure | 16:19 |
fried_rice | okay. Then I guess we need to handle that case. How? | 16:19 |
mriedem | i've got 1000 hosts, 3 failed and i evacuated from them. then i upgrade to stein. | 16:20 |
mriedem | i left some comments - as gibi said, i'm not entirely sure it will ruin anything, but i'm not that far ahead in the series to know | 16:20 |
fried_rice | butbut, don't the evacuated thingies get purged before we get here? | 16:20 |
mriedem | when we reshape, aren't we only reshaping things for the root provider in the tree which is going to be the current node i'm on - specifically speaking for the libvirt and xen drivers | 16:21 |
mriedem | so if there are other providers in the tree i probably don't even care about those | 16:21 |
mriedem | so it depends on how this is used i guess https://review.openstack.org/#/c/584648/20/nova/scheduler/client/report.py@2055 | 16:22 |
leakypipes | fried_rice: my -1 on that patch is because by adding the sharing pool to the ptree before passing it to update_provider_tree() you are populating a record in the ProviderTree for the shared storage pool. And I wanted to functionally test the scenario when that would *not* be populated in the ProviderTree and when an allocation involving that shared storage pool popped up, that the ProviderTree would "fill in" the missing sharing provider | 16:22 |
leakypipes | record. | 16:22 |
fried_rice | You're talking about specific known cases of reshape that will be happening in Stein. In the future, a provider tree may need to be reshaped to a different provider tree shape. This needs to be made to work for both. | 16:22 |
mriedem | the RPs per consumer could be more than just the $nodename RP | 16:22 |
mriedem | we know of 2 cases where the RPs can be other than $nodename: 1) sharing providers - which we don't support yet and (2) evacuated-from nodes | 16:23 |
mriedem | right? | 16:23 |
fried_rice | leakypipes: You mean the ssp doesn't exist before we reshape? | 16:23 |
leakypipes | fried_rice: doesn't exist in the compute node's ProviderTree cache, yes. | 16:23 |
fried_rice | right, that's what I'm doing. | 16:23 |
mriedem | maybe assert what you expect is being setup? | 16:24 |
mriedem | for clarity? | 16:24 |
alex_xu | leakypipes: fried_rice cdent, our team is working on nvdimm devices, but we are facing a fragmentation issue and looking for some help from you guys. The case: if you already have 2gb allocated in the middle of a 10gb device, you can't allocate the remaining 8gb, since the device requires contiguous space. Pretty sure we can't modify inventory after a claim, just like the fpga and gpu cases, so I'd appreciate help on some ideas | 16:24 |
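To make the fragmentation constraint concrete, a small illustrative sketch (hypothetical, not placement code) of why 2gb allocated mid-device blocks an 8gb request even though 8gb is nominally free:

    # Model the device as a list of (start_gb, size_gb) used extents.
    def largest_contiguous_free(total_gb, used):
        # A contiguous device can only satisfy a request from a single
        # gap between used extents, never from the sum of all the gaps.
        edges = sorted(used) + [(total_gb, 0)]
        best, cursor = 0, 0
        for start, size in edges:
            best = max(best, start - cursor)
            cursor = max(cursor, start + size)
        return best

    # 10gb device, 2gb used starting at offset 4: 8gb is free in total,
    # but the largest single allocation that still fits is only 4gb.
    print(largest_contiguous_free(10, [(4, 2)]))  # -> 4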
leakypipes | fried_rice: the comment on that line is this: | 16:24 |
leakypipes | # Another unrelated compute node. We don't use the report client's | 16:24 |
leakypipes | # convenience methods because we don't want this guy in the cache. | 16:24 |
fried_rice | leakypipes: The whole design of upt (and its precursors) is to *make* it appear in the ProviderTree so that upt sees it and can figure out what to do with it. | 16:24 |
leakypipes | fried_rice: and I'm saying I would prefer that you do the same for the shared storage pool. create it outside of the provider tree convenience methods to ensure the provider tree doesn't have it already when get_allocations_for_provider() ends up being called | 16:25 |
fried_rice | ohhhhhh, now I follow. I wasn't reading the context carefully enough. The code comment you highlighted is talking about a different non-sharing provider elsewhere in placement, and your review comment is asking why same wasn't done 6LOC earlier when we created the SSP. Sorry, ack, will look closer. (Gotta run now) | 16:27 |
leakypipes | right, exactly. | 16:27 |
fried_rice | But I think the ssp won't show up in the cache if I do that. | 16:27 |
leakypipes | fried_rice: that's kinda what I'm trying to tease out... | 16:27 |
leakypipes | fried_rice: to see if the assumptions made in the code are tested for edge cases like this. | 16:28 |
* alex_xu scroll the screen, sounds like something broke | 16:28 | |
fried_rice | owait, I take it back, if MISC_SHARES is set, and it's in the same agg, it *should* show up. So yeah, I can change that out. | 16:28 |
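For readers following along: the trait referenced here is MISC_SHARES_VIA_AGGREGATE, which marks a provider as sharing its inventory; per the discussion, putting it in the same aggregate as the compute node is what should make it show up in the compute's ProviderTree cache. A rough sketch of the shape being set up in the test, with made-up values:

    # Illustrative shape of a shared storage pool as a sharing provider.
    shared_storage_pool = {
        'name': 'ssp1',
        'traits': ['MISC_SHARES_VIA_AGGREGATE'],  # marks it as sharing
        'aggregates': ['agg1-uuid'],  # same aggregate as the compute node
        'inventories': {'DISK_GB': {'total': 500, 'max_unit': 500}},
    }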
cdent | alex_xu: I've read your message above, and I'm thinking. | 16:28 |
leakypipes | fried_rice: rock on. | 16:28 |
fried_rice | alex_xu: This sounds like a job for reserved=total | 16:29 |
alex_xu | cdent: thanks, sorry for injecting the message | 16:29 |
alex_xu | fried_rice: emm... do you mean that after the allocation, you then change the reserved value of the inventory? | 16:30 |
cdent | alex_xu: I would guess some kind of dynamic max_unit adjustments might be workable | 16:30 |
leakypipes | alex_xu: if I'm being blunt, this sounds like something you'll need to solve on your own outside of placement. | 16:30 |
fried_rice | heh | 16:30 |
cdent | at start max_unit is 8, after the 2 is used, it is 3 or 4 | 16:30 |
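Both suggestions map onto fields that already exist on a placement inventory record. A hedged sketch of each, written as the Python-dict body of a PUT /resource_providers/{uuid}/inventories/{resource_class} call, using made-up values and a hypothetical CUSTOM_NVDIMM_GB resource class:

    # fried_rice's reserved=total: once a claim fragments the device so
    # badly that nothing useful fits, mark all capacity as unusable.
    inventory_reserved = {
        'resource_provider_generation': 5,  # made-up generation
        'total': 10, 'reserved': 10,
        'min_unit': 1, 'max_unit': 10, 'step_size': 1,
        'allocation_ratio': 1.0,
    }

    # cdent's dynamic max_unit: keep max_unit tracking the largest
    # contiguous chunk still free (4gb in the mid-device example above).
    inventory_max_unit = {
        'resource_provider_generation': 5,  # made-up generation
        'total': 10, 'reserved': 0,
        'min_unit': 1, 'max_unit': 4, 'step_size': 1,
        'allocation_ratio': 1.0,
    }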
fried_rice | that's three wildly different answers | 16:30 |
alex_xu | cdent: yea, but we can't dynamically change inventory | 16:31 |
cdent | i'm pretty okay with client side dynamically adjusting inventory if they feel that's the right thing | 16:31 |
cdent | alex_xu: why not? | 16:31 |
alex_xu | leakypipes: yea... my initial thought was that something on the host or device driver should fix the fragmentation issue, but it just doesn't fix it | 16:31 |
alex_xu | cdent: emm... that should be the same as the gpu case; if we dynamically change the inventory after a resource claim, we will have a race problem | 16:32 |
leakypipes | alex_xu: or you could punt to Cinder, since that's basically what you're doing with nvdimm... volume management. | 16:32 |
alex_xu | leakypipes: it isn't a cinder thing, it is a memory device. | 16:33 |
leakypipes | I've heard they really like live resize functionality. | 16:33 |
alex_xu | I have one idea... | 16:33 |
leakypipes | alex_xu: it's not memory. it's memory with caveats. | 16:33 |
alex_xu | ask the operator to set up the device first: for a 10gb device, separate it into 5 fragments, each one only 2gb, with max_unit,min_unit=2gb | 16:34 |
cdent | alex_xu: yes, that would be an option | 16:36 |
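That fixed-size partitioning maps onto the same inventory fields; a sketch with illustrative values (five 2gb fragments, every claim exactly one fragment, which is also the source of the flexibility objection below):

    inventory_fixed = {
        'resource_provider_generation': 5,  # made-up generation
        'total': 10, 'reserved': 0,
        # min_unit == max_unit == step_size pins every allocation to
        # exactly 2gb, so claims always land on whole fragments.
        'min_unit': 2, 'max_unit': 2, 'step_size': 2,
        'allocation_ratio': 1.0,
    }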
alex_xu | leakypipes: there are two ways to use an nvdimm device: as storage, or as memory. We are working on the memory case, actually just passing it through to the VM, which sees it as a vNVDIMM device | 16:36 |
alex_xu | cdent: but someone pushed back on me; the reason is it isn't flexible enough in usage | 16:36 |
cdent | alex_xu: yeah, I'm not sure there's a truly flexible solution if you are unable to dynamically manage inventory | 16:37 |
*** fried_rice is now known as fried_rolls | 16:39 | |
mriedem | fried_rolls: related to leakypipes' comment on the test, isn't it implicitly hitting the evacuate scenario? | 16:40 |
alex_xu | leakypipes: really not sure there is a document that explains nvdimm easily.. but from the qemu doc, it really is a memory device https://github.com/qemu/qemu/blob/master/docs/nvdimm.txt | 16:40 |
mriedem | "Another unrelated compute node." | 16:40 |
openstackgerrit | Dan Smith proposed openstack/nova master: Batch results per cell when doing cross-cell listing https://review.openstack.org/592698 | 16:41 |
openstackgerrit | Dan Smith proposed openstack/nova master: List instances from all cells explicitly https://review.openstack.org/593717 | 16:41 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make instance_list perform per-cell batching https://review.openstack.org/593131 | 16:41 |
openstackgerrit | Dan Smith proposed openstack/nova master: Record cell success/failure/timeout in CrossCellLister https://review.openstack.org/594265 | 16:41 |
openstackgerrit | Dan Smith proposed openstack/nova master: Optimize global marker re-lookup in multi_cell_list https://review.openstack.org/594577 | 16:41 |
alex_xu | cdent: can we let placement support dynamically managing inventory? | 16:41 |
alex_xu | leakypipes: fried_rolls ^ is that option, or we already totally say no | 16:41 |
cdent | alex_xu: I don't understand what you mean? | 16:41 |
cdent | you can change inventory in placement whenever you want | 16:42 |
leakypipes | alex_xu: no. | 16:42 |
cdent | if you're asking for placement to provide some form of transactional control, that's not going to happen | 16:42 |
leakypipes | what cdent said. | 16:42 |
alex_xu | cdent: leakypipes yea, 'transactional' is what i mean | 16:43 |
cdent | allocations and inventory are very intentionally not strongly connected | 16:43 |
alex_xu | cdent: emm.. what is the key reason we won't ever support it? just want to understand the thinking | 16:45 |
cdent | alex_xu: as I understand what you're asking, you want to avoid a race condition where at time X we allocate 2gb in the middle of nvdimm and at near the same time you want to change the inventory so that (in a variety of strategies) the inventory is represented in a way that allows more of it to be consumed accurately. To avoid the race there the initial allocation would somehow have to signal a lock on any interaction | 16:48 |
cdent | with the inventory until it was updated | 16:48 |
alex_xu | cdent: yes | 16:49 |
cdent | alex_xu: is that right? if so, that's more placement-side state management than we've designed into the system. adding something like that would be very complicated and contrary to some of the original design goals about allocations | 16:49 |
cdent | It is probably possible, but it would be hard, for what amounts to an edge case | 16:49 |
cdent | it would be better to either: accept the risk of the race, or figure out a way to manage the inventory in a way that works with the existing constraints | 16:50 |
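One constraint placement does already offer when "accepting the risk of the race": inventory writes carry the resource provider generation, so a concurrent change makes the write fail with a 409 conflict instead of silently clobbering state. A sketch of the client-side retry loop, using hypothetical helper methods (get_inventory/put_inventory are not real report client calls):

    def shrink_max_unit(client, rp_uuid, new_max_unit):
        # Optimistic concurrency: read the inventory (which carries the
        # provider generation), attempt the write, retry on 409 conflict.
        while True:
            inv = client.get_inventory(rp_uuid, 'CUSTOM_NVDIMM_GB')
            inv['max_unit'] = new_max_unit
            resp = client.put_inventory(rp_uuid, 'CUSTOM_NVDIMM_GB', inv)
            if resp.status_code != 409:
                return resp
            # Another writer (e.g. a fresh allocation) bumped the
            # generation first; loop and recompute against the new state.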
openstackgerrit | Dan Smith proposed openstack/nova master: Optimize global marker re-lookup in multi_cell_list https://review.openstack.org/594577 | 16:51 |
alex_xu | cdent: i got it, thanks | 16:51 |
alex_xu | cdent: leakypipes fried_rolls, so thanks, at least I got one option I won't think about anymore. probably will continue pushing the fixed-size idea | 16:53 |
cdent | alex_xu: it's at least a way to get started, and then you can iterate | 16:53 |
alex_xu | cdent: yes, agree with that, hope i can persuade people | 16:54 |
dansmith | mriedem: do you still want/need me to look at that upgrade thing from earlier or did it get worked out? | 16:57 |
mriedem | umm, might need to ask fried_rolls | 17:02 |
mriedem | i'm leaving for a bit | 17:02 |
*** mriedem is now known as mriedem_afk | 17:02 | |
* alex_xu continuously explains to people that placement won't support transactional control, from the cat and fpga to nvdimm | 17:05 |
openstackgerrit | Dan Smith proposed openstack/nova master: Batch results per cell when doing cross-cell listing https://review.openstack.org/592698 | 17:18 |
openstackgerrit | Dan Smith proposed openstack/nova master: List instances from all cells explicitly https://review.openstack.org/593717 | 17:18 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make instance_list perform per-cell batching https://review.openstack.org/593131 | 17:18 |
openstackgerrit | Dan Smith proposed openstack/nova master: Record cell success/failure/timeout in CrossCellLister https://review.openstack.org/594265 | 17:18 |
openstackgerrit | Dan Smith proposed openstack/nova master: Optimize global marker re-lookup in multi_cell_list https://review.openstack.org/594577 | 17:18 |
*** fried_rolls is now known as fried_rice | 17:47 | |
fried_rice | dansmith: What upgrade thing from earlier? | 17:47 |
fried_rice | oh | 17:47 |
fried_rice | There's been a couple of reshape scenarios identified where we may have the potential to race, and we're wondering to what extent the reshaper is only being run in a steady/reduced-activity state, and how that impacts those races. | 17:48 |
fried_rice | dansmith: The one we were talking about earlier was here: https://review.openstack.org/#/c/584648/20/nova/scheduler/client/report.py | 17:49 |
fried_rice | Wondering about evacuate and whether we can have one consumer with allocations straddling multiple computes' providers. | 17:49 |
fried_rice | which it sounds like we could | 17:50 |
dansmith | instances in the middle of a resize across an upgrade boundary is not (at all) uncommon | 17:50 |
dansmith | which means you have an upgraded and unupgraded node that hold allocations for that instance which may be reverted or confirmed on either side of the upgrade by the old or new node | 17:51 |
dansmith | we have no way to prevent that scenario, and when we've talked about it, real clouds confirmed that preventing it is completely impossible | 17:51 |
fried_rice | but those scenarios don't use the "migration UUID". | 17:51 |
fried_rice | So there would in fact be the same consumer_uuid existing on providers owned by two different hosts. | 17:51 |
dansmith | um, what? | 17:51 |
dansmith | no, the migration uuid is the consumer in that case, which I assume makes your thing easier, | 17:52 |
dansmith | but you may have an older node restoring allocations for a new one | 17:52 |
fried_rice | "older" meaning "before migration_uuid was a thing"? | 17:53 |
dansmith | because you have allocations held on an old node by migration uuid, then on revert, we use those to restore them against the new node | 17:53 |
dansmith | no, old as in pre-reshape | 17:53 |
fried_rice | okay, but I think that's fine as long as the consumer UUIDs are *different* on both sides (hosts) | 17:53 |
fried_rice | I don't care which one is the real instance UUID and which is the migration UUID | 17:53 |
dansmith | it's not if the older node tries to restore a flat allocation against a nested inventory | 17:54 |
dansmith | but regardless, I'm not all caught up on the actual scenario, | 17:54 |
dansmith | I'm just saying you really can't assume that "all evacuations are quiesced" before an upgrade or whatever you said this morning | 17:54 |
dansmith | and you can't assume scheduler or placement is down (or up) either | 17:55 |
fried_rice | Cool, got that part understood. | 17:55 |
fried_rice | So what I actually need to know now is whether there's any kind of move/migration/resize/evacuation/etc. where it's possible to have the same consumer UUID on two providers owned by different hosts. | 17:55 |
fried_rice | (sharing providers don't count) | 17:55 |
dansmith | I don't think you can say that won't or can't happen | 17:56 |
fried_rice | e.g. could GET /allocations/{c} ever return | 17:56 |
fried_rice | { cn1_rp_uuid: { ... }, | 17:56 |
fried_rice | cn2_rp_uuid: { ... } | 17:56 |
fried_rice | } | 17:56 |
fried_rice | ? | 17:56 |
dansmith | it shouldn't happen right now with nova as it is, but if cyborg gets in the mix I would think you could have that fairly easily | 17:57 |
fried_rice | well | 17:57 |
fried_rice | I don't think I care about that, do I? | 17:57 |
fried_rice | and | 17:57 |
dansmith | I definitely don't know what you care about | 17:57 |
fried_rice | cyborg is going to be "owning" the device providers, but those are still going to be nested under the compute RPs. | 17:58 |
dansmith | anyway, mriedem_afk can tell you all of this that you need to know, so I needn't be involved in this I think | 17:58 |
fried_rice | okay. Thanks for the input. | 17:58 |
openstackgerrit | Merged openstack/nova master: Make CELL_TIMEOUT a constant https://review.openstack.org/594570 | 18:08 |
* cdent waves | 18:14 | |
*** cdent has quit IRC | 18:14 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge security groups extension response into server view builder https://review.openstack.org/585475 | 18:34 |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Merge extended_status extension response into server view builder https://review.openstack.org/592092 | 18:35 |
openstackgerrit | Merged openstack/nova master: tests: Create functional libvirt test base class https://review.openstack.org/407055 | 18:41 |
*** mriedem_afk is now known as mriedem | 18:45 | |
mriedem | fried_rice: the answer to "So what I actually need to know now is whether there's any kind of move/migration/resize/evacuation/etc. where it's possible to have the same consumer UUID on two providers owned by different hosts." is definitely "yes" for an evacuated instance | 18:48 |
mriedem | but as noted in the review, if the original evacuated-from source host ever restarts we'll remove its allocations for any instances evacuated from that host | 18:48 |
mriedem | and we fixed the case that we'd orphan providers when deleting a nova-compute service in the os-services API | 18:49 |
mriedem | impossible to predict all of the weird shit that could happen, which is why we have heal_allocations i guess... | 18:49 |
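Illustratively (UUIDs and resource amounts made up), the evacuation case mriedem confirms means a GET /allocations/{consumer_uuid} for the evacuated instance can come back with the same consumer holding resources against both computes' root providers:

    # Hypothetical response body: one consumer, two hosts' providers.
    evacuated_instance_allocations = {
        'allocations': {
            'cn1-root-rp-uuid': {'resources': {'VCPU': 2, 'MEMORY_MB': 4096}},
            'cn2-root-rp-uuid': {'resources': {'VCPU': 2, 'MEMORY_MB': 4096}},
        },
    }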
fried_rice | mriedem: Thanks. I'll have to think through the implications in the reshaper context, which afaict will start with the fact that the allocations param we pass into the (second) update_provider_tree call will contain those allocations from the other host. Not sure what will actually fall out of that. May just be a case of documenting the possibility for the implementor of the upt reshape flow. | 18:56 |
fried_rice | sounds like for the evacuation scenario that documentation should simply say "leave these tf alone if you see 'em" | 18:57 |
fried_rice | which I imagine ought to be the default behavior anyway, since that flow should be focusing on moving allocations only for providers it's messing with, which should *not* include providers from other hosts. | 18:57 |
mriedem | ^ is kind of what i was getting at earlier and you said yes for now in the very specific stein case | 18:58 |
mriedem | the implementor won't actually know what the other RPs are probably w/o any kind of context, | 18:58 |
mriedem | i.e. looking up, "is this a compute node and if so, was it involved in some kind of migration?" | 18:58 |
fried_rice | right. It should employ the strategy of, "Is this allocation related to a provider I'm reshaping? No? Ignore." | 19:02 |
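That "ignore what isn't yours" rule is simple to express; a sketch with assumed shapes (allocations keyed by consumer UUID, each value carrying an 'allocations' mapping keyed by provider UUID, per the payload discussed above):

    def allocations_to_reshape(allocations, owned_rp_uuids):
        # Keep only consumers that touch a provider this driver is
        # actually reshaping; allocations pinned to other hosts'
        # providers (e.g. evacuation leftovers) pass through untouched.
        return {
            consumer_uuid: alloc
            for consumer_uuid, alloc in allocations.items()
            if set(alloc['allocations']) & set(owned_rp_uuids)
        }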
mriedem | the only thing is, | 19:09 |
mriedem | what does the virt driver have for context? it gets the nodename but it doesn't necessarily know based on RP UUID which RPs in the tree are the ones it cares about, right? | 19:10 |
mriedem | like, how does the virt driver identify the rp it actually cares about? | 19:10 |
mriedem | i know we *could* figure that out by looking up the compute node via CONF.host and nodename to get the CN UUID | 19:11 |
mriedem | but that's rather fugly | 19:11 |
mriedem | especially since the RT could just pass it down | 19:11 |
mriedem | "this is your nodename and this is your CN UUID" | 19:11 |
mriedem | if only we had a local yaml file that told nova-compute exactly what its local inventory was.... :) | 19:12 |
fried_rice | doesn't even get the nodename | 19:18 |
fried_rice | the rt already passes down my nodename | 19:18 |
fried_rice | and we already tell upt implementors (via the docstring) not to futz with providers they don't recognize as being owned by them. | 19:19 |
fried_rice | which they mostly identify by knowing which ones they think are supposed to be the ones in the tree | 19:19 |
fried_rice | but also potentially via namespacing | 19:20 |
fried_rice | which might or might not wind up causing problems here. | 19:20 |
fried_rice | leakypipes: I'm rebasing https://review.openstack.org/#/c/590041/ k? | 19:20 |
openstackgerrit | Eric Fried proposed openstack/nova master: [placement] split gigantor SQL query, add logging https://review.openstack.org/590041 | 19:23 |
leakypipes | fried_rice: sure, np | 19:38 |
leakypipes | fried_rice: did you want me to change that log message? | 19:38 |
fried_rice | leakypipes: I did, and I did. | 19:38 |
fried_rice | leakypipes: See epic cumulative comment response | 19:39 |
leakypipes | ah, never mind then :) | 19:39 |
fried_rice | melwitt: --^ | 19:39 |
leakypipes | thx fried_rice | 19:39 |
fried_rice | mriedem: leakypipes: Restacking the reshaper series. Are you done reviewing? | 19:39 |
*** mriedem has quit IRC | 19:40 | |
leakypipes | fried_rice: I was not, no. | 19:44 |
leakypipes | fried_rice: can you hold on a bit on that one? | 19:45 |
fried_rice | leakypipes: I've started working on the last one you -1'd. I can wait for you after that. | 19:45 |
leakypipes | fried_rice: k. five minutes pls | 19:48 |
fried_rice | sho | 19:48 |
*** mriedem has joined #openstack-placement | 19:48 | |
openstackgerrit | Merged openstack/nova master: Stash the cell uuid on the context when targeting https://review.openstack.org/594571 | 20:09 |
fried_rice | leakypipes: Kid run, bbiab. But finished local restack up to https://review.openstack.org/#/c/584648/ | 20:12 |
*** fried_rice is now known as efried_afk | 20:13 | |
*** efried_afk is now known as fried_rice | 20:14 | |
fried_rice | leakypipes: Cancel that, wires crossed, no kid run. | 20:17 |
leakypipes | fried_rice: :) | 20:17 |
openstackgerrit | Dan Smith proposed openstack/nova master: Batch results per cell when doing cross-cell listing https://review.openstack.org/592698 | 20:29 |
openstackgerrit | Dan Smith proposed openstack/nova master: List instances from all cells explicitly https://review.openstack.org/593717 | 20:29 |
openstackgerrit | Dan Smith proposed openstack/nova master: Make instance_list perform per-cell batching https://review.openstack.org/593131 | 20:29 |
openstackgerrit | Dan Smith proposed openstack/nova master: Record cell success/failure/timeout in CrossCellLister https://review.openstack.org/594265 | 20:30 |
openstackgerrit | Dan Smith proposed openstack/nova master: Optimize global marker re-lookup in multi_cell_list https://review.openstack.org/594577 | 20:30 |
leakypipes | fried_rice: k, done. sorry for delay. | 20:53 |
fried_rice | leakypipes: ack, thx | 20:53 |
fried_rice | leakypipes: What is necessary to get your +1 upgraded to +2 on the top patch? | 21:01 |
fried_rice | Cause I think I've got the rest of the pile done. | 21:01 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: Real get_allocs_for_consumer https://review.openstack.org/584599 | 21:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: get_allocations_for_provider_tree https://review.openstack.org/584648 | 21:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: _reshape helper, placement min bump https://review.openstack.org/585034 | 21:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: Report client: update_from_provider_tree w/reshape https://review.openstack.org/585049 | 21:13 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 21:13 |
fried_rice | leakypipes, mriedem, giblet_off, cdent: ^^^^^ | 21:14 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: fix volume attachment update policy note https://review.openstack.org/596489 | 21:29 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: api-ref: add a warning about calling swap volume directly https://review.openstack.org/596492 | 21:47 |
openstackgerrit | Eric Fried proposed openstack/nova master: Document no content on POST /reshaper 204 https://review.openstack.org/596494 | 21:49 |
openstackgerrit | Eric Fried proposed openstack/nova master: Compute: Handle reshaped provider trees https://review.openstack.org/576236 | 21:53 |
fried_rice | leakypipes: In case you were in the middle, forgot one thing in that top patch ^ | 21:53 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Add functional test for live migrate with anti-affinity group https://review.openstack.org/588935 | 21:53 |
mriedem | fried_rice: i suppose you should re-propose that reshaper spec for stein huh | 22:04 |
mriedem | or did you already? | 22:04 |
fried_rice | mriedem: did already | 22:05 |
fried_rice | mriedem: https://review.openstack.org/#/c/592650/ | 22:05 |
mriedem | ah i see it | 22:05 |
mriedem | yeah | 22:05 |
mriedem | oh god you had to format it didn't you | 22:06 |
openstackgerrit | Eric Fried proposed openstack/nova master: Fix race condition in reshaper handler https://review.openstack.org/596497 | 22:07 |
fried_rice | mriedem: no? | 22:10 |
mriedem | it's fine, very thorough as usual | 22:10 |
mriedem | +W | 22:10 |
fried_rice | Straight copy, plus deltas as advertised. | 22:10 |
fried_rice | Thanks. | 22:10 |
fried_rice | mriedem: Oh, I had to rename those linkylinks because they have to be globally unique across the whole doc build :( :( :( | 22:11 |
fried_rice | (made me 3x sad) | 22:12 |
mriedem | oh i was wondering about dropping the () | 22:15 |
mriedem | yeah that sucks | 22:15 |
melwitt | fried_rice: cool, will check it out | 22:16 |
mriedem | i hit something in the cinder api-ref the other day b/c of v2 and v3 api sections, took me awhile to realize why it was complaining | 22:16 |
openstackgerrit | Merged openstack/nova-specs master: Repropose reshaper spec for Stein https://review.openstack.org/592650 | 22:23 |
mriedem | fried_rice: looks like maybe another missing uuids.agg1 in the test here https://review.openstack.org/#/c/585034/ | 23:03 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deprecate Core/Ram/DiskFilter https://review.openstack.org/596502 | 23:28 |
openstackgerrit | Matt Riedemann proposed openstack/nova master: Deprecate Core/Ram/DiskFilter https://review.openstack.org/596502 | 23:32 |
openstackgerrit | melanie witt proposed openstack/nova master: Make scheduler.utils.setup_instance_group query all cells https://review.openstack.org/540258 | 23:45 |