*** tetsuro has joined #openstack-placement | 00:04 | |
*** mriedem_away has quit IRC | 00:23 | |
*** takashin has joined #openstack-placement | 02:19 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB https://review.openstack.org/620216 | 05:44 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB https://review.openstack.org/620216 | 06:01 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB https://review.openstack.org/620216 | 07:01 |
*** tssurya has joined #openstack-placement | 08:13 | |
*** tssurya has quit IRC | 08:49 | |
*** cdent has joined #openstack-placement | 09:01 | |
*** tssurya has joined #openstack-placement | 09:01 | |
openstackgerrit | Merged openstack/placement master: Documentation cleanup: front page https://review.openstack.org/619273 | 09:47 |
cdent | huzzah | 09:48 |
*** takashin has left #openstack-placement | 10:04 | |
openstackgerrit | Merged openstack/placement master: Add a doc describing a quick live environment https://review.openstack.org/613343 | 10:22 |
openstackgerrit | Chris Dent proposed openstack/placement master: Allow placement to start without a config file https://review.openstack.org/619049 | 10:38 |
cdent | gibi: fixed that missing word, thanks | 10:38 |
cdent | gibi: can you have a look at https://review.openstack.org/#/c/619121/ ? It's a pretty minor change but may help with figuring out the race that is happening on https://review.openstack.org/#/c/617941/ | 10:39 |
*** sean-k-mooney has quit IRC | 11:04 | |
*** sean-k-mooney has joined #openstack-placement | 11:08 | |
gibi | cdent: thanks for the update, plugged the +2 back | 11:31 |
cdent | thanks! | 11:31 |
gibi | cdent: I've queued https://review.openstack.org/#/c/617941/ for review as well | 11:31 |
cdent | double thanks! | 11:31 |
cdent | I have the nova functional tests running in an infinite loop (breaking on error) trying to make the race happen and I'm not making much progress :( | 11:32 |
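The "infinite loop (breaking on error)" approach can be sketched in a few lines. This is only an illustration, not the script cdent used: `run_until_failure` and its `max_runs` bound are hypothetical names, and the command passed in is whatever test invocation is being chased.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Re-run cmd until it exits non-zero.

    Returns the 1-based attempt number of the first failure, or None if
    every one of the max_runs attempts passed.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return attempt
    return None


# Stand-in commands: one that always fails and one that always passes.
always_fails = [sys.executable, "-c", "import sys; sys.exit(1)"]
always_passes = [sys.executable, "-c", "pass"]
```

For a real race the bound would be large (or dropped), and the command would be the functional test run itself.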
gibi | cdent: I can try to run the test in the background too, see if I'm more lucky | 11:33 |
cdent | hmm, just got a failure. 'ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured' which suggests CONF is getting messed up, which is not surprising (this doesn't appear to have anything to do with placement itself, but perhaps with conf being migrated back and forth) | 11:35 |
gibi | cdent: I saw that group failure before so that is definitely a thing | 11:37 |
cdent | a different thing :( | 11:37 |
gibi | server group checks are using global state so I'm also not surprised | 11:37 |
cdent | so that particular problem, at least, is unlikely to be driven by the placement changes if it's been around before? | 11:38 |
gibi | cdent: yes, I saw that before the placement separation became a reality | 11:39 |
cdent | k | 11:39 |
* cdent thinks | 11:39 | |
gibi | cdent: I think this is the place causing the group failure https://github.com/openstack/nova/blob/1a1ea8e2aa66a2654e6cc141c735e47bbd8c4fef/nova/scheduler/utils.py#L805 | 11:39 |
gibi | cdent: nasty globals | 11:39 |
cdent | ewww | 11:39 |
cdent | yeah | 11:39 |
gibi | cdent: did you run the nova functional on https://review.openstack.org/#/c/617941 with or without https://review.openstack.org/#/c/619121/ ? | 11:42 |
cdent | both | 11:42 |
cdent | right now I'm running with | 11:42 |
gibi | ack | 11:43 |
gibi | I'll first try without | 11:43 |
cdent | hmm. I can break that server group stuff regularly. I'll look into that... later | 11:45 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB https://review.openstack.org/620216 | 11:45 |
gibi | it wasn't frequent enough in my env to push me towards trying to fix it | 11:46 |
gibi | cdent: ... and at the first functional run I now hit the group race :) | 11:47 |
* cdent facepalms | 11:48 | |
cdent | It looks like 'class ServerGroupTestBase' ought to be doing some cleanup on those globals. Some of the tests mock them, but not all of them. | 11:55 |
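A minimal sketch of the kind of cleanup being suggested, assuming a module that caches a config-derived answer in a global. Everything here is a stand-in: `scheduler_utils`, `_SUPPORTS_ANTI_AFFINITY`, `supports_anti_affinity` and `GroupTestBase` are hypothetical names for illustration, not nova's actual code.

```python
import types
import unittest

# Hypothetical stand-in for a module that caches configuration in a global.
scheduler_utils = types.SimpleNamespace(_SUPPORTS_ANTI_AFFINITY=None)


def supports_anti_affinity(enabled_filters):
    """Compute the answer once, then serve it from the module-level cache."""
    if scheduler_utils._SUPPORTS_ANTI_AFFINITY is None:
        scheduler_utils._SUPPORTS_ANTI_AFFINITY = (
            "ServerGroupAntiAffinityFilter" in enabled_filters)
    return scheduler_utils._SUPPORTS_ANTI_AFFINITY


class GroupTestBase(unittest.TestCase):
    """Base class that clears the cached global before and after each test."""

    def setUp(self):
        super().setUp()
        self._reset_globals()
        self.addCleanup(self._reset_globals)

    @staticmethod
    def _reset_globals():
        scheduler_utils._SUPPORTS_ANTI_AFFINITY = None
```

Without the reset, whichever test touches the cache first decides the answer for every later test in the same process, which would be consistent with an order-dependent "not configured" failure.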
* cdent tries it | 12:11 | |
*** tetsuro has quit IRC | 12:13 | |
* cdent needs more hardware | 12:14 | |
cdent | lots more hardware | 12:14 |
* gibi is lucky enough to have access to a machine in the OPNFV lab that has 88 x86 cores | 12:20 | |
cdent | wow. nice. I max out 16 | 12:21 |
cdent | and since I've got that one in the infinite loop, I'm running some other tests on my laptop | 12:21 |
cdent | where I have 4 | 12:21 |
cdent | oh great. the more I poke at this, the worse it gets. I've got server group tests failing regularly, on nova master | 12:22 |
sean-k-mooney | gibi: hehe when i was still at intel i had a couple of those :) i miss my 88 core 192GB ram compute nodes | 12:25 |
cdent | luxury | 12:26 |
sean-k-mooney | gibi: also 88 core machines really show why defaulting service workers to $(nproc) is a dumb idea | 12:26 |
cdent | quite | 12:26 |
cdent | it's a dumb idea in any situation | 12:26 |
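One common alternative to a `$(nproc)` default is a bounded default. The sketch below is an assumption for illustration only: `default_workers` is a hypothetical helper and the cap of 8 is an arbitrary value, not a setting from any OpenStack service.

```python
import os


def default_workers(cpu_count=None, cap=8):
    """Return a worker count bounded by cap instead of one worker per CPU."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        cpu_count = os.cpu_count() or 1
    return max(1, min(cpu_count, cap))
```

On the 88-core machine mentioned above this yields 8 workers rather than 88, while a 4-core laptop still gets 4.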
cdent | sean-k-mooney: if you have a clean nova master lying around can you do a 'tox -efunctional test_get_groups_all_projects' let me know if it is happy? | 12:27 |
sean-k-mooney | i had 30% idle cpu usage in the server because of 88 gnocchi metric collectors for like 5 mins till i deleted gnocchi | 12:27 |
* cdent spins up a few more vms | 12:28 | |
sean-k-mooney | sure, i can try it; unfortunately it will be running on my personal hardware or a vm since i don't have beefy servers anymore | 12:29 |
cdent | no problem, I just want to confirm that the issue I'm seeing is just me | 12:30 |
* cdent also needs more spindles | 12:31 | |
sean-k-mooney | it ran without errors | 12:33 |
sean-k-mooney | want me to checkout a patch and run it again? | 12:33 |
sean-k-mooney | cdent: full output in case that helps http://paste.openstack.org/show/736081/ | 12:35 |
cdent | nope that's fine, thanks for doing that | 12:36 |
sean-k-mooney | no worries happy to help | 12:36 |
sean-k-mooney | is that the test that is racing with the placement fixture? | 12:37 |
cdent | no, I was going down the rabbit hole with regard to the server group tests, trying to get their own racing out of the picture | 12:39 |
gibi | cdent: I've tried running the tests in the same order as they were run in the failed gate job but it does not reproduce the problem. So I think it is not just two interfering test cases | 12:41 |
cdent | gibi: yeah. it's...weird | 12:42 |
cdent | gibi: when comparing master and the placement fixture branch, the ServerGroup tests are much more likely to fail on the latter | 13:01 |
cdent | which is yet more weird | 13:01 |
gibi | cdent: your patch removes a lot of tests but most of them are unit tests or gabbi tests so, yeah, weird | 13:03 |
gibi | cdent: btw I managed to produce another test failure with https://review.openstack.org/#/c/617941/ without the placement change. See http://paste.openstack.org/show/736086/ | 13:04 |
cdent | oh really | 13:05 |
cdent | that is useful | 13:05 |
cdent | yeah, so that's the original problem, before I tried adding the fixture tidy up patch as a depends-on | 13:06 |
cdent | if we can't cause that to happen with the depends-on then maybe it helps :) | 13:06 |
gibi | OK, I will start running the patch with its dependency | 13:06 |
cdent | thanks for doing this gibi, I think I might go crazy if I kept on with this stuff solo. | 13:08 |
gibi | cdent: running nova functional with the placement deps still fails with TransactionFactory is already started: http://paste.openstack.org/show/736088/ | 13:33 |
cdent | gibi: I guess that's to be expected | 13:54 |
cdent | any ideas? | 13:54 |
sean-k-mooney | you should not get the TransactionFactory error anymore after the run_once decorator | 13:55 |
gibi | cdent: I thought that your placement deps were fixing this but I read the patch and now I'm not sure | 13:56 |
cdent | sean-k-mooney: we're resetting that, on purpose | 13:56 |
sean-k-mooney | oh ok | 13:56 |
cdent | sean-k-mooney: we have to because we need up to 3 different engine-types in the same process | 13:56 |
cdent | it's a mess | 13:57 |
sean-k-mooney | i assume the different engine types are for different tests? | 13:59 |
cdent | yes | 14:02 |
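The exchange above turns on a run-once guard that tests deliberately reset. A minimal sketch of that pattern follows; it is not nova's or placement's implementation, and `configure` plus the `engine-a`/`engine-b` labels are hypothetical placeholders for whatever one-time database setup and engine types are involved.

```python
import functools


def run_once(func):
    """Call func only the first time; later calls are ignored until reset()."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if wrapper.called:
            return None
        wrapper.called = True
        return func(*args, **kwargs)

    def reset():
        wrapper.called = False

    wrapper.called = False
    wrapper.reset = reset
    return wrapper


starts = []


@run_once
def configure(engine_type):
    # Hypothetical one-time setup; a real one would start a transaction factory.
    starts.append(engine_type)


configure("engine-a")
configure("engine-b")   # ignored: already configured
configure.reset()       # what a test fixture would do on purpose
configure("engine-b")   # runs again after the reset
```

The guard stops accidental double starts, but once tests reset it in order to switch engine types inside one process, any state left over from the previous start is exposed again.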
cdent | gibi: responded to your comments on the fixture adjustments. I hadn't really expected them to fix the current issues. It was more of a wild guess, since I already had that code around for a few days and I was hoping (in a useless way) that having fewer globals would make a difference | 14:10 |
gibi | cdent: OK, thanks, I misunderstood the goal of the placement patch a bit | 14:12 |
cdent | on the ServerGroup stuff I think the issue may be with policy file handling | 14:15 |
*** mriedem has joined #openstack-placement | 14:18 | |
cdent | gibi: I think one of the several factors here _may_ be policy handling having global conf in itself | 14:23 |
cdent | but I'm not really clear. Unfortunately I have to do an internal thing before the end of day tomorrow so I need to drop this now, if you figure something out, feel free to fix it, or leave your notes on the changes | 14:25 |
gibi | cdent: ack, I also can't promise too much progress | 14:25 |
cdent | :) | 14:25 |
cdent | I keep hitting another variable and falling in a hole, so maybe after a break I'll figure out a way to narrow things | 14:26 |
gibi | :) | 14:26 |
cdent | mriedem or dansmith you may have thoughts on https://review.openstack.org/#/c/620216/ | 14:28 |
tssurya | efried: so once we disable the whole refreshing, the only call to placement during periodic update would be this periodic checker https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L781 for allocations | 14:28 |
cdent | (using stamp after migration) | 14:28 |
cdent | back after a while | 14:29 |
* cdent waves | 14:29 | |
tssurya | efried: I do see jaypipes's comment https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L1237 about this being "sucky code" | 14:29 |
*** cdent has quit IRC | 14:29 | |
tssurya | efried: but yea probably it's a good idea to have that query every 60secs to keep it consistent | 14:29 |
efried | tssurya: Catching up... | 14:41 |
efried | tssurya: Yeah, it looks like that will still happen. | 14:50 |
efried | It should be noted that there's a pretty sharp dividing line between [providers, inventories, traits, aggregates] and [consumers/instances, allocations] in terms of how and where we query, store, and use the information in the resource tracker. | 14:50 |
efried | To wit: we cache the former in the ProviderTree object in the report client, let the virt driver muck with them (via update_provider_tree), and assume they won't change out of band. | 14:50 |
efried | Whereas the latter we don't cache, and we treat it as much more dynamic and subject to changing, e.g. due to migrations of various forms. | 14:50 |
efried | That said, once we flush the last vestige of "doubling allocations" (I think that's in the evacuate path? gibi?) we may be able to do away with some of this "sucky" code. | 14:51 |
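The dividing line efried describes (provider-side data cached, allocation data fetched fresh) can be sketched as below. This is an illustrative model only: `FakePlacement` and `ReportClientSketch` are hypothetical classes, not the nova report client or the real placement API.

```python
class FakePlacement:
    """Hypothetical in-memory stand-in for the placement service."""

    def __init__(self):
        self.calls = []
        self.inventory = {"VCPU": 8}
        self.allocations = {}

    def get_provider(self, rp_uuid):
        self.calls.append("provider")
        return dict(self.inventory)

    def get_allocations(self, rp_uuid):
        self.calls.append("allocations")
        return dict(self.allocations)


class ReportClientSketch:
    """Caches provider data; always re-queries allocations."""

    def __init__(self, api):
        self._api = api
        self._provider_cache = {}

    def provider(self, rp_uuid):
        # Assumed not to change out of band, so fetch once and reuse.
        if rp_uuid not in self._provider_cache:
            self._provider_cache[rp_uuid] = self._api.get_provider(rp_uuid)
        return self._provider_cache[rp_uuid]

    def allocations(self, rp_uuid):
        # Treated as dynamic (e.g. migrations), so never cached.
        return self._api.get_allocations(rp_uuid)
```

Calling `provider()` twice hits the service once, while every `allocations()` call goes back to the service, which is why the periodic allocation query survives even when provider refreshing is disabled.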
mriedem | we also double allocations on resize to same host | 14:52 |
mriedem | ala https://review.openstack.org/#/c/619123/ | 14:52 |
efried | mriedem: ack, thx. We're on the road to fixing that, yah? | 14:53 |
efried | though I guess we haven't figured out how we're gonna do it yet | 14:53 |
gibi | efried: yepp, the only real doubling is during evacuation the rest is using the migration_uuid as a consumer for the dest host allocation | 14:54 |
gibi | mriedem: resize same host still uses two different consumers | 14:54 |
gibi | mriedem: as far as I know | 14:54 |
efried | but it's still effectively doubling the allocation, because both consumers are on the same host. | 14:54 |
gibi | efried: from host perspective it is doubled from consumer perspective it is not :) | 14:55 |
gibi | efried: but I agree | 14:55 |
mriedem | efried: well i've got the functional regression recreate there, and the bug reported, with hacky ideas in the bug report, but i'm not actively working on fixing that yet | 14:55 |
mriedem | gibi: yeah correct | 14:55 |
gibi | so evac doubles from consumer perspective but not from host perspective. The resize to same host doubles it from host perspective but not from consumer perspective. what a nice complete coverage of possibilities :) | 14:57 |
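gibi's summary can be checked with a toy model. The sketch below assumes each allocation is a `(consumer, host, vcpus)` tuple and that both halves of each operation use the same 2-VCPU flavor; the consumer and host names are placeholders, not real UUIDs.

```python
from collections import defaultdict


def totals(allocations):
    """Sum VCPU allocations per consumer and per host."""
    by_consumer = defaultdict(int)
    by_host = defaultdict(int)
    for consumer, host, vcpus in allocations:
        by_consumer[consumer] += vcpus
        by_host[host] += vcpus
    return dict(by_consumer), dict(by_host)


# Evacuation: one consumer holds allocations on two hosts.
evacuate = [("instance", "host-a", 2), ("instance", "host-b", 2)]

# Resize to same host: two consumers hold allocations on one host.
resize_same_host = [("migration", "host-a", 2), ("instance", "host-a", 2)]
```

In the evacuation case the consumer total doubles while each host sees a single allocation; in the same-host resize each consumer holds a single allocation while the host total doubles.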
*** ttsiouts has joined #openstack-placement | 15:58 | |
*** dansmith has quit IRC | 16:02 | |
*** dansmith has joined #openstack-placement | 16:02 | |
openstackgerrit | Merged openstack/placement master: Clean up and clarify tox.ini https://review.openstack.org/611719 | 16:14 |
*** ttsiouts has quit IRC | 17:06 | |
*** ttsiouts has joined #openstack-placement | 17:07 | |
*** ttsiouts has quit IRC | 17:11 | |
openstackgerrit | Jack Ding proposed openstack/nova-specs master: [WIP] Flavor Extra Spec and Image Properties Validation https://review.openstack.org/618542 | 17:12 |
*** cdent has joined #openstack-placement | 17:16 | |
openstackgerrit | Chris Dent proposed openstack/placement master: Start a contributor goals document https://review.openstack.org/618811 | 17:26 |
cdent | gibi: I've got a demo on the lower-constraints change showing the reason for the install command | 17:39 |
openstackgerrit | Chris Dent proposed openstack/placement master: Correct lower-constraints.txt and the related tox job https://review.openstack.org/614559 | 17:43 |
openstackgerrit | Artom Lifshitz proposed openstack/nova-specs master: Re-propose numa-aware-live-migration spec https://review.openstack.org/599587 | 17:49 |
*** tssurya has quit IRC | 18:50 | |
* cdent watches --until-failure not fail | 19:12 | |
edleafe | Not failing is failure? | 19:18 |
cdent | I'm unsure on how much I need to convince myself | 19:19 |
cdent | and when I do, I then need to convince myself in the other direction | 19:20 |
mriedem | this might make you feel better https://review.openstack.org/#/c/617662/ | 19:42 |
cdent | hurrah | 19:49 |
cdent | wow, that took a long time, but failed | 20:10 |
cdent | which disproves one hypothesis | 20:11 |
mriedem | no jaybird huh | 20:32 |
cdent | i'm not even in a maze of twisty passages, I'm in one of those smelly ponds full of stank and ugh | 20:42 |
efried | mriedem: while I've got it in front of me, would you please hit https://review.openstack.org/#/c/619299/ so we don't somehow forget it before the release? Easy peasy. | 20:58 |
mriedem | why does that depend on a grenade change? | 21:01 |
cdent | mriedem: that can probably go away now | 21:02 |
cdent | It was part of trying to figure out the issue with swift | 21:02 |
mriedem | ok so totally unrelated | 21:02 |
cdent | no, | 21:03 |
cdent | the only way it could be properly tested was if tempest and grenade were running | 21:03 |
cdent | and those were not working until my swift fix | 21:03 |
cdent | so it is unrelated _now_ | 21:03 |
cdent | but it wasn't then | 21:03 |
cdent | so if one of the two of you can clean it out, that would be awesome, as I'm in a deep hole | 21:03 |
openstackgerrit | Merged openstack/placement master: Add integrated-gate-py35 template to .zuul.yaml https://review.openstack.org/617565 | 21:22 |
mriedem | dansmith: fyi placement is now gating on tempest/devstack and grenade ^ | 21:23 |
dansmith | cool | 21:23 |
mriedem | we might want to send a thing to the ML to let people know that devstack is now using extracted placement... | 21:24 |
mriedem | devstack and grenade | 21:24 |
mriedem | in case weird issues crop up | 21:24 |
mriedem | who's it? | 21:24 |
mriedem | i guess i can do it | 21:26 |
mriedem | efried: cdent: ok comments inline on https://review.openstack.org/#/c/619299/ | 21:33 |
cdent | efried: i'm not going to be able to get to that until tomorrow if you feel inclined to do it today. I agree with what mriedem says | 21:36 |
mriedem | i was going to push up a change to drop [keystone] and fix the missing [keystone_authtoken] entry in the config docs | 21:36 |
mriedem | then i think i can just tweak the commit message and such and we're happy | 21:36 |
cdent | oh if you're happy to do that, then awesome | 21:37 |
mriedem | it's better than reviewing specs | 21:37 |
cdent | tru | 21:38 |
cdent | the thing I'm doing now will likely need to base off that as it's moaning about duplicated config | 21:39 |
openstackgerrit | Matt Riedemann proposed openstack/placement master: Remove keystoneauth1 opts from placement config group https://review.openstack.org/619299 | 21:46 |
openstackgerrit | Matt Riedemann proposed openstack/placement master: Remove keystoneauth1 opts from placement config group https://review.openstack.org/619299 | 21:56 |
openstackgerrit | Matt Riedemann proposed openstack/placement master: Remove [keystone] config options from placement https://review.openstack.org/620412 | 21:56 |
efried | cdent: Are we going to need ksa adapter opts for anything from placement? I wouldn't have thought so, right? | 22:08 |
cdent | it doesn't talk out, so I reckon no | 22:08 |
efried | mriedem: make https://review.openstack.org/620412 bigger, see comment. | 22:08 |
efried | cdent: ^ | 22:08 |
cdent | yeah, agree | 22:09 |
mriedem | it's just never enough is it | 22:12 |
efried | mriedem, meet world. | 22:12 |
cdent | less code mmm good | 22:13 |
efried | ++ | 22:13 |
efried | Who wrote that pos module anyway? | 22:13 |
mriedem | yeah yeah i'll do it after i'm done shitting on something else atm | 22:13 |
openstackgerrit | Matt Riedemann proposed openstack/placement master: Remove [keystone] config options from placement https://review.openstack.org/620412 | 22:25 |
efried | +2, yay. | 22:32 |
*** mriedem has quit IRC | 23:46 |