*** sapd1_x has joined #openstack-nova | 00:20 | |
*** sapd1_x has quit IRC | 00:33 | |
*** imacdonn has quit IRC | 01:15 | |
*** imacdonn has joined #openstack-nova | 01:15 | |
openstackgerrit | Arthur Dayne proposed openstack/python-novaclient stable/rocky: Add Python 3 Train unit tests https://review.opendev.org/670749 | 01:26 |
---|---|---|
openstackgerrit | ya.wang proposed openstack/nova master: Add compatibility checks for CPU mode and CPU models and extra flags https://review.opendev.org/670299 | 01:27 |
openstackgerrit | ya.wang proposed openstack/nova master: Support report multi CPU model traits https://review.opendev.org/670300 | 01:27 |
openstackgerrit | ya.wang proposed openstack/nova master: Add release note https://review.opendev.org/670441 | 01:27 |
*** yedongcan has joined #openstack-nova | 02:20 | |
*** factor has joined #openstack-nova | 02:51 | |
*** Spencer_Yu has joined #openstack-nova | 03:10 | |
*** tbachman has quit IRC | 03:13 | |
Spencer_Yu | https://bugs.launchpad.net/nova/+bug/1836141 Some failed actions leave remnants behind when resizing an instance, which causes the next resize to fail. | 03:19 |
openstack | Launchpad bug 1836141 in OpenStack Compute (nova) "vm resize failed due to the remains left by failed actions" [Undecided,New] | 03:19 |
*** whoami-rajat has joined #openstack-nova | 03:24 | |
*** psachin has joined #openstack-nova | 03:25 | |
*** Spencer_Yu has quit IRC | 03:36 | |
*** markvoelker has joined #openstack-nova | 03:37 | |
*** ricolin has joined #openstack-nova | 03:43 | |
*** sapd1_x has joined #openstack-nova | 03:48 | |
*** ricolin has quit IRC | 03:49 | |
*** udesale has joined #openstack-nova | 03:49 | |
*** ricolin has joined #openstack-nova | 03:50 | |
*** sapd1_x has quit IRC | 04:05 | |
*** tbachman has joined #openstack-nova | 04:28 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: DRY get_flavor in flavor manage tests https://review.opendev.org/668281 | 04:37 |
*** sapd1 has quit IRC | 05:03 | |
*** yaawang has quit IRC | 05:13 | |
*** yaawang has joined #openstack-nova | 05:13 | |
*** shilpasd has joined #openstack-nova | 05:16 | |
*** rcernin has quit IRC | 05:27 | |
*** Luzi has joined #openstack-nova | 05:44 | |
*** ivve has quit IRC | 05:58 | |
*** ratailor has joined #openstack-nova | 06:02 | |
*** ratailor_ has joined #openstack-nova | 06:05 | |
*** ratailor has quit IRC | 06:07 | |
*** ircuser-1 has quit IRC | 06:15 | |
*** luksky11 has joined #openstack-nova | 06:23 | |
*** damien_r has joined #openstack-nova | 06:24 | |
*** damien_r has quit IRC | 06:28 | |
*** dpawlik has joined #openstack-nova | 06:31 | |
*** yedongcan has quit IRC | 06:31 | |
*** yedongcan has joined #openstack-nova | 06:33 | |
*** pcaruana has joined #openstack-nova | 06:36 | |
*** pcaruana has quit IRC | 06:40 | |
*** rcernin has joined #openstack-nova | 06:44 | |
*** whoami-rajat has quit IRC | 06:50 | |
*** geekinutah has quit IRC | 06:50 | |
*** rouk has quit IRC | 06:50 | |
*** zbr has quit IRC | 06:50 | |
*** Hazelesque has quit IRC | 06:50 | |
*** fungi has quit IRC | 06:50 | |
*** amodi has quit IRC | 06:50 | |
*** ajo has quit IRC | 06:50 | |
*** dtantsur|afk has quit IRC | 06:50 | |
*** ildikov has quit IRC | 06:50 | |
*** NobodyCam has quit IRC | 06:50 | |
*** mugsie has quit IRC | 06:50 | |
*** Kevin_Zheng has quit IRC | 06:50 | |
*** niceplace_ has quit IRC | 06:50 | |
*** melwitt has quit IRC | 06:50 | |
*** yikun has quit IRC | 06:50 | |
*** rajinir has quit IRC | 06:50 | |
*** jhesketh has quit IRC | 06:50 | |
openstackgerrit | Brin Zhang proposed openstack/nova-specs master: Proposal for a safer noVNC console with password authentication https://review.opendev.org/623120 | 06:54 |
*** ivve has joined #openstack-nova | 06:57 | |
*** hoonetorg has quit IRC | 07:02 | |
*** xek has joined #openstack-nova | 07:08 | |
*** damien_r has joined #openstack-nova | 07:08 | |
*** ttsiouts has joined #openstack-nova | 07:08 | |
*** hemna has quit IRC | 07:13 | |
*** hemna has joined #openstack-nova | 07:15 | |
*** Kevin_Zheng has joined #openstack-nova | 07:16 | |
*** melwitt has joined #openstack-nova | 07:16 | |
*** niceplace_ has joined #openstack-nova | 07:16 | |
*** yikun has joined #openstack-nova | 07:16 | |
*** rajinir has joined #openstack-nova | 07:16 | |
*** jhesketh has joined #openstack-nova | 07:16 | |
*** whoami-rajat has joined #openstack-nova | 07:16 | |
*** geekinutah has joined #openstack-nova | 07:16 | |
*** rouk has joined #openstack-nova | 07:16 | |
*** zbr has joined #openstack-nova | 07:16 | |
*** Hazelesque has joined #openstack-nova | 07:16 | |
*** fungi has joined #openstack-nova | 07:16 | |
*** amodi has joined #openstack-nova | 07:16 | |
*** ajo has joined #openstack-nova | 07:16 | |
*** ildikov has joined #openstack-nova | 07:16 | |
*** NobodyCam has joined #openstack-nova | 07:16 | |
*** mugsie has joined #openstack-nova | 07:16 | |
*** openstackgerrit has quit IRC | 07:18 | |
*** rpittau|afk is now known as rpittau | 07:19 | |
*** panda has quit IRC | 07:19 | |
*** hoonetorg has joined #openstack-nova | 07:19 | |
*** panda has joined #openstack-nova | 07:21 | |
*** tbachman has quit IRC | 07:22 | |
*** tbachman has joined #openstack-nova | 07:23 | |
*** maciejjozefczyk has joined #openstack-nova | 07:23 | |
*** slaweq has joined #openstack-nova | 07:25 | |
*** tbachman has quit IRC | 07:28 | |
*** ttsiouts has quit IRC | 07:36 | |
*** ttsiouts has joined #openstack-nova | 07:36 | |
*** ccamacho has joined #openstack-nova | 07:38 | |
*** ricolin_ has joined #openstack-nova | 07:39 | |
*** ttsiouts has quit IRC | 07:41 | |
*** ricolin has quit IRC | 07:42 | |
*** ttsiouts has joined #openstack-nova | 07:46 | |
*** tssurya has joined #openstack-nova | 07:52 | |
*** helenafm has joined #openstack-nova | 07:53 | |
*** jangutter has joined #openstack-nova | 07:55 | |
*** lpetrut has joined #openstack-nova | 08:00 | |
*** cdent has joined #openstack-nova | 08:05 | |
alex_xu | efried, bauzas: I need your help on the direction of https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:claim_for_instance | 08:23 |
*** Kevin_Zheng has quit IRC | 08:26 | |
*** niceplace_ has quit IRC | 08:26 | |
*** melwitt has quit IRC | 08:26 | |
*** yikun has quit IRC | 08:26 | |
*** rajinir has quit IRC | 08:26 | |
*** jhesketh has quit IRC | 08:26 | |
*** whoami-rajat has quit IRC | 08:27 | |
*** geekinutah has quit IRC | 08:27 | |
*** rouk has quit IRC | 08:27 | |
*** zbr has quit IRC | 08:27 | |
*** Hazelesque has quit IRC | 08:27 | |
*** fungi has quit IRC | 08:27 | |
*** amodi has quit IRC | 08:27 | |
*** ajo has quit IRC | 08:27 | |
*** ildikov has quit IRC | 08:27 | |
*** NobodyCam has quit IRC | 08:27 | |
*** mugsie has quit IRC | 08:27 | |
*** panda has quit IRC | 08:29 | |
*** panda has joined #openstack-nova | 08:31 | |
*** Hazelesque has joined #openstack-nova | 08:35 | |
*** NobodyCam has joined #openstack-nova | 08:35 | |
*** geekinutah has joined #openstack-nova | 08:35 | |
*** rouk has joined #openstack-nova | 08:36 | |
*** whoami-rajat has joined #openstack-nova | 08:36 | |
*** ajo has joined #openstack-nova | 08:36 | |
*** niceplace has joined #openstack-nova | 08:36 | |
*** melwitt has joined #openstack-nova | 08:36 | |
*** fungi has joined #openstack-nova | 08:36 | |
*** zbr has joined #openstack-nova | 08:37 | |
*** yikun has joined #openstack-nova | 08:37 | |
*** mugsie has joined #openstack-nova | 08:37 | |
*** jhesketh has joined #openstack-nova | 08:37 | |
*** davidsha has joined #openstack-nova | 08:52 | |
*** sean-k-mooney has quit IRC | 08:52 | |
*** rcernin has quit IRC | 08:55 | |
*** sean-k-mooney has joined #openstack-nova | 08:56 | |
*** ralonsoh has joined #openstack-nova | 09:02 | |
*** lpetrut has quit IRC | 09:10 | |
sean-k-mooney | stephenfin: do you know the history behind why we went from testr to ostestr to stestr? im wondering if we should consider reverting to testr or look at finding a non-subunit-based alternative | 09:38 |
*** derekh has joined #openstack-nova | 09:38 | |
sean-k-mooney | https://blog.kortar.org/?p=370 | 09:40 |
sean-k-mooney | found the history ^ | 09:41 |
stephenfin | sean-k-mooney: I was pretty sure testr was subunit-based, tbh | 09:42 |
sean-k-mooney | it is | 09:42 |
sean-k-mooney | that does not mean it has the same bug | 09:43 |
*** ricolin_ has quit IRC | 09:47 | |
cdent | sean-k-mooney: are you talking about the overflow bug with too much logging? or something else? | 09:57 |
*** rpittau has quit IRC | 10:03 | |
sean-k-mooney | cdent: ya we are now hitting it downstream | 10:04 |
sean-k-mooney | RHEL8 ships newer versions of some libs than upper-constraints uses for stein | 10:04 |
sean-k-mooney | cdent: so we get more deprecation warnings than upstream | 10:05 |
cdent | the same bug is present in testr and ostestr (which is a testr wrapper). My understanding is the issue is in testtools and/or subunit not testr or stestr | 10:05 |
cdent | and it really comes down to a binary data structure being too small | 10:05 |
cdent | it should be per test | 10:05 |
sean-k-mooney | cdent: well yes and no | 10:06 |
*** johnsom has quit IRC | 10:06 | |
*** rpittau has joined #openstack-nova | 10:06 | |
sean-k-mooney | it is in the subunit parser | 10:06 |
sean-k-mooney | but the subunit protocol | 10:06 |
cdent | so finding and turning off the deprecation warnings (or simply fixing them) is probably the easiest thing | 10:06 |
sean-k-mooney | allows large packets to be fragmented | 10:06 |
sean-k-mooney | i think the issue is we are not fragmenting the stream | 10:06 |
sean-k-mooney | cdent: ya that is how we are fixing it upstream | 10:07 |
*** johnsom has joined #openstack-nova | 10:07 | |
sean-k-mooney | but downstream that is more of a challenge | 10:07 |
sean-k-mooney | it either requires backporting things that may not be backported upstream or downstream only patches | 10:07 |
* cdent wonders if we could or should make deprecation warnings 'once' | 10:08 | |
sean-k-mooney | cdent: i was wondering if there was a way to redirect all deprecation warnings to a separate file so that they do not end up in the test output | 10:13 |
*** lxkong has quit IRC | 10:13 | |
sean-k-mooney | we should be able to install a python logging filter i think but i havent tried | 10:14 |
*** lxkong has joined #openstack-nova | 10:14 | |
cdent | yeah, that does appear to be the case | 10:15 |
sean-k-mooney | the trick would be using a generic enough regex to match deprecations without redirecting real errors. we would also want the gate jobs to copy the deprecation file so that we can actually track how big it is and squash them | 10:18 |
*** ttsiouts has quit IRC | 10:19 | |
*** ttsiouts has joined #openstack-nova | 10:19 | |
sean-k-mooney | it looks like we already filter a bunch of warnings here https://github.com/openstack/nova/blob/master/nova/tests/fixtures.py#L815-L877 | 10:22 |
sean-k-mooney | cdent: actually https://github.com/openstack/nova/blob/master/nova/tests/fixtures.py#L820 should be making deprecation warnings print once | 10:24 |
*** ttsiouts has quit IRC | 10:24 | |
sean-k-mooney | but maybe that is per test? | 10:24 |
sean-k-mooney | i would have assumed it would be per logger but maybe we dont use the warning filter in all places we should | 10:24 |
cdent | could be that it is not turned on everywhere? there are some tests that don't use the usual base? | 10:26 |
sean-k-mooney | the test that is exploding downstream is the test_instance_action functional test which runs a bunch of tests internally https://github.com/openstack/nova/blob/7279d6fa009c6e276188bcad0ad5a1832849a4f9/nova/tests/functional/notification_sample_tests/test_instance.py#L357-L377 | 10:27 |
sean-k-mooney | it is not directly using the fixture but ill look at its inheritance tree | 10:28 |
sean-k-mooney | maybe we dont use it in the functional tests | 10:28 |
gibi | sean-k-mooney: doesn't this solve the problem https://review.opendev.org/#/c/656844/ ? | 10:29 |
sean-k-mooney | i can check if we have that on osp 15 which is stein but migi was saying the backports he has tried so far do not fix it for us | 10:30 |
sean-k-mooney | we dont have auto import of backports so its possible we are missing that one | 10:31 |
* gibi goes get some food | 10:32 | |
sean-k-mooney | gibi: ya we have that and no it does not fix it for us | 10:36 |
*** mdbooth has joined #openstack-nova | 10:37 | |
*** luksky11 has quit IRC | 10:38 | |
*** geekinutah has quit IRC | 10:39 | |
sean-k-mooney | cdent: looks like the warning filter is installed for the functional tests | 10:43 |
sean-k-mooney | so i guess we are still logging too much | 10:43 |
*** sapd1_x has joined #openstack-nova | 10:44 | |
sean-k-mooney | part of the issue is i think stestr is creating a log stream per worker not per test | 10:44 |
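For reference, the two ideas floated above (printing each deprecation only once, and sending deprecation warnings to a separate file) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not what the nova warnings fixture does: the names `redirect_deprecations` and `_is_deprecation` are hypothetical, and the handler filter matches on the literal text "DeprecationWarning" rather than on the regex discussed in the channel.

```python
import logging
import sys
import warnings


def _is_deprecation(record):
    # Formatted warnings look like "file.py:12: DeprecationWarning: text".
    return "DeprecationWarning" in record.getMessage()


def redirect_deprecations(path):
    """Write deprecation warnings to `path`; leave other warnings on stderr."""
    # Route warnings.warn() output through the "py.warnings" logger.
    logging.captureWarnings(True)
    # Emit each distinct deprecation message only once per process.
    warnings.simplefilter("once", DeprecationWarning)

    to_file = logging.FileHandler(path)
    to_file.addFilter(_is_deprecation)

    to_stderr = logging.StreamHandler(sys.stderr)
    to_stderr.addFilter(lambda record: not _is_deprecation(record))

    logger = logging.getLogger("py.warnings")
    logger.addHandler(to_file)
    logger.addHandler(to_stderr)
    logger.propagate = False  # keep captured warnings out of the root logger
    return to_file
```

A gate job could then archive the file at `path` so its size can be tracked over time, as suggested above. Whether this helps with the subunit overflow depends on where the test runner captures output, which this sketch does not address.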
*** ttsiouts has joined #openstack-nova | 10:56 | |
*** sapd1_x has quit IRC | 11:04 | |
*** luksky11 has joined #openstack-nova | 11:14 | |
*** ratailor_ has quit IRC | 11:17 | |
*** tesseract has joined #openstack-nova | 11:20 | |
*** sapd1 has joined #openstack-nova | 11:24 | |
*** pcaruana has joined #openstack-nova | 11:32 | |
*** psachin has quit IRC | 11:36 | |
*** udesale has quit IRC | 11:39 | |
*** udesale has joined #openstack-nova | 11:40 | |
*** tesseract has quit IRC | 11:48 | |
*** weshay|rover is now known as weshay | 11:51 | |
*** tesseract has joined #openstack-nova | 11:51 | |
*** needssleep is now known as TheJulia | 12:04 | |
*** cdent has quit IRC | 12:12 | |
*** ttsiouts has quit IRC | 12:15 | |
*** ttsiouts has joined #openstack-nova | 12:15 | |
*** ttsiouts has quit IRC | 12:18 | |
*** ttsiouts has joined #openstack-nova | 12:18 | |
*** hongda has joined #openstack-nova | 12:20 | |
*** hongda has quit IRC | 12:24 | |
*** ricolin_ has joined #openstack-nova | 12:30 | |
*** ricolin__ has joined #openstack-nova | 12:31 | |
*** ricolin_ has quit IRC | 12:35 | |
*** hongda has joined #openstack-nova | 12:35 | |
hongda | Hello everyone. Can you help to review: https://review.opendev.org/#/c/670016/ and https://review.opendev.org/#/c/669867/ ? They try to fix the live-migration failure when the token expires. Thanks a lot. XD | 12:56 |
*** lpetrut has joined #openstack-nova | 12:58 | |
sean-k-mooney | we would have to fix this on master first before https://review.opendev.org/#/c/670016/ can be applied | 13:00 |
sean-k-mooney | why is that stable only by the way | 13:00 |
efried | alex_xu: Talk to me | 13:01 |
*** mdbooth has quit IRC | 13:05 | |
sean-k-mooney | hongda: im not sure we should just blindly use the admin context | 13:05 |
sean-k-mooney | we could, but it would probably be better to only use the admin context if the token had expired | 13:06 |
*** jmlowe has quit IRC | 13:07 | |
sean-k-mooney | hongda: it also looks like you are reusing an old bug that was closed in 2017 | 13:07 |
efried | sean-k-mooney: I'm not paying much attention here, but service auth was made for a situation where you start off with a user token for a long-running operation and then it expires somewhere in the middle. | 13:07 |
sean-k-mooney | hongda: it would be better to file a new one and reference it | 13:07 |
efried | wrap the user auth in a service auth and you're good. | 13:07 |
efried | what service is this for? | 13:07 |
sean-k-mooney | efried: hongda patches https://review.opendev.org/#/c/669867/1 | 13:08 |
sean-k-mooney | and this stable only patch https://review.opendev.org/#/c/670016/ | 13:08 |
sean-k-mooney | its related to https://bugs.launchpad.net/nova/+bug/1647451 | 13:08 |
openstack | Launchpad bug 1647451 in OpenStack Compute (nova) newton "Post live migration step could fail due to auth errors" [Medium,Fix committed] - Assigned to Lee Yarwood (lyarwood) | 13:08 |
sean-k-mooney | efried: so instead of just creating an admin client here https://review.opendev.org/#/c/669867/1/nova/network/neutronv2/api.py we should use the service auth thing, right? | 13:09 |
efried | sean-k-mooney: I'm going to have to take a closer look at this a bit later. What's the operation that starts this flow? | 13:10 |
efried | i.e. why does the user token have the opportunity to expire before list_ports? | 13:10 |
sean-k-mooney | efried: i literally just started looking at this 10 mins ago, but i believe the admin does openstack server migrate --live near the end of the lifetime of the token | 13:11 |
sean-k-mooney | and it expires midway | 13:11 |
*** openstackgerrit has joined #openstack-nova | 13:11 | |
openstackgerrit | Merged openstack/python-novaclient master: Remove deprecated methods and properties https://review.opendev.org/667762 | 13:11 |
sean-k-mooney | or you know the live migration just takes a while | 13:12 |
efried | at a glance, it appears as though the rest of the stuff in this flow is using admin auth | 13:12 |
sean-k-mooney | in either case it expires by the time it gets to post live migrate | 13:12 |
efried | but I don't want to discount the possibility that the list_ports is being done under user auth specifically to guard against someone kicking off this operation when they shouldn't. | 13:12 |
sean-k-mooney | well live-migrate is an admin only op so its probably fine | 13:12 |
*** belmoreira has joined #openstack-nova | 13:13 | |
efried | okay. There's some more opportunities to reuse the admin client in this flow as well. | 13:14 |
efried | Let me come back to this after my mtg | 13:14 |
sean-k-mooney | sure i was more worried about breaking the api request id tracking stuff | 13:14 |
sean-k-mooney | but i guess that will be preserved in the keystone context so its probably fine | 13:15 |
*** boxiang has joined #openstack-nova | 13:15 | |
efried | sean-k-mooney: If get_client with admin=True is breaking the global request ID, then that's broken all over the place. Would be a separate issue, if it's an issue at all. | 13:18 |
sean-k-mooney | efried: it probably isnt a problem | 13:18 |
efried | nope, I'm looking at get_client itself now and it's properly handling the global request ID. | 13:19 |
sean-k-mooney | but that is just what i wanted to confirm before i +/-1'd and reviewed it properly | 13:19 |
sean-k-mooney | efried: arent you meant to be in a meeting :P | 13:19 |
* sean-k-mooney ignores the fact im on an emea wide meeting too | 13:20 | |
*** yedongcan has left #openstack-nova | 13:20 | |
efried | Yeah, if we're happy that this is supposed to be an admin-only flow anyway, then this change makes sense to me. | 13:21 |
openstackgerrit | Surya Seetharaman proposed openstack/nova master: API microversion 2.75: Add 'power-update' external event https://review.opendev.org/645611 | 13:22 |
artom | I don't think I'm doing this right | 13:23 |
artom | I'm trying to see if test_server_connectivity_cold_migration_revert started failing recently-ish | 13:23 |
sean-k-mooney | life? openstack? irc? | 13:23 |
artom | sean-k-mooney, I mean, yes, but: | 13:23 |
artom | Here's my logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22test_server_connectivity_cold_migration_revert*FAILED%5C%22%20and%20tags%3A%5C%22job-output.txt%5C%22 | 13:23 |
openstackgerrit | Merged openstack/python-novaclient master: Deprecate cells v1 and extension commands and APIs https://review.opendev.org/669597 | 13:24 |
openstackgerrit | Merged openstack/python-novaclient master: Add a guide to add a new microversion support https://review.opendev.org/667002 | 13:24 |
artom | Looks like I got the wildcard wrong - but I need it, because there's a timestamp in there | 13:24 |
artom | For example: tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_cold_migration_revert [221.049469s] ... FAILED | 13:24 |
artom | Context is, we may have merged https://review.opendev.org/#/c/663405/ too soon | 13:26 |
artom | That recently un-skipped test is failing pretty consistently (though not 100%) on https://review.opendev.org/#/c/668631/ | 13:27 |
openstackgerrit | Surya Seetharaman proposed openstack/python-novaclient master: API microversion 2.75: Add 'power-update' external event https://review.opendev.org/666792 | 13:27 |
sean-k-mooney | too soon in that its still broken or we need to recheck things | 13:27 |
artom | But everything looks right from the Nova events POV, so maybe there's something else, and we need to re-skip test_server_connectivity_cold_migration_revert | 13:27 |
artom | But step 1 is determining whether those failures started happening on all runs right after we merged the un-skip patch, or whether it's just my patch that's causing trouble | 13:28 |
artom | Hence the logstash query | 13:28 |
sean-k-mooney | ya | 13:29 |
sean-k-mooney | ill see if i can hack something to work too quickly | 13:29 |
*** jmlowe has joined #openstack-nova | 13:29 | |
sean-k-mooney | artom: it looks like your query is being ignored more or less | 13:31 |
artom | sean-k-mooney, I know :( | 13:31 |
*** mdbooth has joined #openstack-nova | 13:31 | |
sean-k-mooney | do you have an example failure | 13:32 |
*** hemna has quit IRC | 13:33 | |
artom | sean-k-mooney, http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_145040 | 13:33 |
sean-k-mooney | thanks | 13:33 |
artom | The message operator, it's per line, right? | 13:34 |
sean-k-mooney | yes it should be | 13:34 |
*** betherly has joined #openstack-nova | 13:35 | |
sean-k-mooney | well yes and no | 13:35 |
*** belmoreira has quit IRC | 13:35 | |
sean-k-mooney | the log stream should chunk it per new line | 13:35 |
sean-k-mooney | sorry not per line but per log output | 13:35 |
sean-k-mooney | e.g. if you log a multi line message the log/parser/stream will stream that full log message to logstash in one go | 13:36 |
*** hemna has joined #openstack-nova | 13:37 | |
artom | sean-k-mooney, oh, I think it splits on the _ | 13:38 |
*** belmoreira has joined #openstack-nova | 13:38 | |
*** maciejjozefczyk_ has joined #openstack-nova | 13:39 | |
*** boxiang has quit IRC | 13:39 | |
*** betherly has quit IRC | 13:40 | |
*** maciejjozefczyk has quit IRC | 13:40 | |
sean-k-mooney | it shouldnt | 13:41 |
sean-k-mooney | it might be but it shouldn't | 13:41 |
alex_xu | efried: looking for some feedback on this direction https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:claim_for_instance before I go further. | 13:41 |
efried | artom: (still) not paying a lot of attention here, but there ought to be lots of good examples in the elastic-recheck project | 13:41 |
artom | efried, ack, thanks | 13:42 |
efried | alex_xu: Okay, I saw a bunch of patches come in this morning. Were you planning to put up a spec for this? | 13:42 |
sean-k-mooney | ya https://github.com/openstack-infra/elastic-recheck/tree/master/queries i was looking at those | 13:42 |
sean-k-mooney | artom: ^ | 13:42 |
artom | sean-k-mooney, yeah... and I appear to be doing everything right :( | 13:44 |
sean-k-mooney | i dont see any example of actually matching on the tempest output | 13:45 |
artom | Wait, do I need to capitalize AND? | 13:45 |
sean-k-mooney | maybe | 13:45 |
sean-k-mooney | yes | 13:45 |
*** eharney has joined #openstack-nova | 13:45 | |
artom | *facepalm* | 13:46 |
artom | OK, now it's turning up nothing | 13:46 |
sean-k-mooney | artom: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_cold_migration_revert%5C%22%20AND%20message%3A%5C%22FAILED%5C%22%20AND%20tags%3A%5C%22job-output.txt%5C%22 | 13:47 |
sean-k-mooney | set it to 30days | 13:47 |
*** amodi has joined #openstack-nova | 13:47 | |
sean-k-mooney | and things show up | 13:47 |
sean-k-mooney | the last failure was about an hour ago | 13:48 |
sean-k-mooney | looks like it started to show up on the 13th | 13:49 |
artom | sean-k-mooney, that sounds about right | 13:50 |
artom | Means https://review.opendev.org/#/c/663405/ is causing it | 13:50 |
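The fix that made the query work was the upper-case AND: the Lucene query syntax only recognises AND, OR and NOT as operators when they are capitalised, and splitting the test name and FAILED into separate message clauses avoids needing a wildcard across the timestamp. A minimal sketch of building such a query string follows; the helper name `build_query` is hypothetical, and unlike the URLs pasted in the channel it does not backslash-escape the quotes (the `%5C%22` sequences).

```python
from urllib.parse import quote


def build_query(clauses):
    """Join (field, phrase) pairs into one Lucene query string."""
    # The operator must be upper-case; a lower-case "and" is not an operator.
    return " AND ".join(f'{field}:"{phrase}"' for field, phrase in clauses)


query = build_query([
    ("message", "test_server_connectivity_cold_migration_revert"),
    ("message", "FAILED"),
    ("tags", "job-output.txt"),
])

# Percent-encode the whole string so it can follow "?query=" in a URL.
encoded = quote(query, safe="")
```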
*** belmoreira has quit IRC | 13:51 | |
stephenfin | melwitt: If you're around today, could you take a look at this doc fix? https://review.opendev.org/#/c/670125/ | 13:51 |
artom | All on different changes, too | 13:52 |
sean-k-mooney | the first failure was https://review.opendev.org/#/c/627765/ | 13:52 |
sean-k-mooney | it might be caused by https://review.opendev.org/#/c/663405/ but it looks like a neutron issue to me | 13:53 |
artom | sean-k-mooney, it has to be "caused" by https://review.opendev.org/#/c/663405/ because before that landed we just didn't run that test :) | 13:55 |
alex_xu | efried: if we need a spec, I can write up one | 13:55 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Integrate 'pre-commit' https://review.opendev.org/665518 | 13:56 |
sean-k-mooney | well we dont know that https://review.opendev.org/#/c/663405/ would fix the issue in that test | 13:56 |
sean-k-mooney | we speculated that it should | 13:56 |
artom | sean-k-mooney, you mean https://review.opendev.org/#/c/667177/? | 13:56 |
sean-k-mooney | sorry https://review.opendev.org/#/c/663405/ is the tempest change | 13:56 |
artom | Yeah, we thought it'd be fine | 13:56 |
artom | Apparently not | 13:57 |
artom | And like I said, I checked logs for 2 failing runs, everything seems OK from the Nova POV | 13:57 |
sean-k-mooney | right so https://review.opendev.org/#/c/667177/ is likely not enough | 13:57 |
artom | We correctly wait for plug-time events, we received them, we finish booting up the guest | 13:57 |
sean-k-mooney | it looks like the neutron l3 agent has not correctly set up the floating ip | 13:57 |
sean-k-mooney | and that is why the ssh connection is not working | 13:57 |
*** jmlowe has quit IRC | 13:57 | |
artom | sean-k-mooney, ping is failing | 13:58 |
artom | But with no working fip we'd expect both ping and SSH to fail | 13:58 |
sean-k-mooney | the vm has got an ip from dhcp | 13:58 |
sean-k-mooney | so it has network connectivity | 13:58 |
sean-k-mooney | but since the ping/ssh are failing then that means its a floating ip issue | 13:58 |
artom | I'll start by filing a bug, we can start skipping that test again | 13:58 |
artom | And then work on a solution in its own time | 13:59 |
*** belmoreira has joined #openstack-nova | 13:59 | |
artom | Should I file it under Neutron then? | 14:00 |
sean-k-mooney | i think the neutron folks are aware of this | 14:00 |
efried | alex_xu: when looking through the patches, is there anything specific you want me to be aware of etc? | 14:00 |
sean-k-mooney | am yes but i would check with them first | 14:00 |
sean-k-mooney | artom: this is the important bit http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_307956 | 14:00 |
sean-k-mooney | it sent a select to the dhcp server to confirm it received the ip | 14:01 |
sean-k-mooney | then we try to ssh in and it fails | 14:01 |
*** jmlowe has joined #openstack-nova | 14:01 | |
sean-k-mooney | actually it connects a little later http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_311500 | 14:02 |
sean-k-mooney | artom: ok so its the ping after the revert that is failing | 14:06 |
*** yonglihe has joined #openstack-nova | 14:06 | |
artom | sean-k-mooney, yeah. | 14:07 |
alex_xu | efried: I added a device manager and want to hear your thoughts: is it something we need, or am I overcomplicating it? | 14:07 |
artom | sean-k-mooney, so my kids woke up (late sleepers), I'll be doing dad stuff for a bit, I'll check back in once I'm at the office | 14:07 |
sean-k-mooney | the difference i am seeing is we dont seem to be trying to ssh before we ping in the final case | 14:07 |
*** mdbooth has quit IRC | 14:07 | |
sean-k-mooney | artom: sure no worries | 14:08 |
yonglihe | sean-k-mooney, I still haven't fully got the unit tests working for 'orphan cleanup'. | 14:08 |
yonglihe | but still working on that. | 14:08 |
*** mdbooth has joined #openstack-nova | 14:09 | |
artom | sean-k-mooney, I guess I'll file the bug for now, in Neutron. We can always change component later. | 14:10 |
yonglihe | 'Add server sub-resource topology API' is in the run queue. I fixed the merge conflict. Unit tests should pass; the current zuul failure does not seem to be my fault: https://review.opendev.org/#/c/621476/ (it seems to be tripping on the image saving test case) | 14:11 |
yonglihe | It hit Bug 1737634: http://status.openstack.org/elastic-recheck/index.html#1737634 | 14:20 |
openstack | bug 1713163 in tempest "duplicate for #1737634 test_delete_saving_image fails because image hasn't transitioned to SAVING" [Medium,Confirmed] https://launchpad.net/bugs/1713163 | 14:20 |
*** belmoreira has quit IRC | 14:26 | |
*** artom has quit IRC | 14:26 | |
*** tesseract has quit IRC | 14:28 | |
*** dpawlik has quit IRC | 14:29 | |
*** belmoreira has joined #openstack-nova | 14:30 | |
*** tesseract has joined #openstack-nova | 14:30 | |
*** ivve has quit IRC | 14:36 | |
*** dklyle has joined #openstack-nova | 14:38 | |
*** belmoreira has quit IRC | 14:39 | |
*** TxGirlGeek has joined #openstack-nova | 14:39 | |
*** irclogbot_1 has quit IRC | 14:41 | |
*** belmoreira has joined #openstack-nova | 14:45 | |
*** ericyoung has quit IRC | 14:49 | |
*** ericyoung has joined #openstack-nova | 14:49 | |
*** mlavalle has joined #openstack-nova | 14:49 | |
*** irclogbot_2 has joined #openstack-nova | 14:51 | |
*** Luzi has quit IRC | 14:52 | |
*** belmoreira has quit IRC | 14:52 | |
*** rouk has quit IRC | 14:53 | |
*** belmoreira has joined #openstack-nova | 14:55 | |
*** betherly has joined #openstack-nova | 14:55 | |
*** beekneemech is now known as bnemec | 15:01 | |
*** tesseract has quit IRC | 15:02 | |
efried | dansmith: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-07-11.log.html#t2019-07-11T13:39:31 | 15:03 |
efried | LibvirtDriver.power_off() calls self._destroy() which calls guest.poweroff() which calls self._domain.destroy() | 15:03 |
*** tesseract has joined #openstack-nova | 15:03 | |
efried | That is, unless I'm missing something, the domain XML is destroyed at power off, not during the startup. | 15:04 |
*** luksky11 has quit IRC | 15:04 | |
dansmith | efried: I'm not sure what you're saying or asking | 15:05 |
efried | dansmith: When we were talking about doing device (specifically vpmem) claims via the virt driver | 15:06 |
dansmith | efried: when we go into a reboot operation, before we do anything to the guest, we can look at the existing xml.. then if we need to un/redefine it, we can do that with the context we just learned | 15:06 |
*** hemna has quit IRC | 15:07 | |
efried | dansmith: Okay, but if I do a stop, then "wait a while", then start again, I've lost that information. | 15:07 |
dansmith | no | 15:07 |
dansmith | you do not lose the xml across a stop/start | 15:07 |
dansmith | a libvirt destroy operation does not lose data, it stops the instance (old xen terminology) | 15:07 |
dansmith | an undefine operation drops the stored definition of the guest | 15:07 |
efried | self._domain.destroy() <== does this lose the data? | 15:08 |
dansmith | no | 15:08 |
efried | oh | 15:08 |
efried | interesting choice of name | 15:08 |
dansmith | it comes from xen | 15:08 |
efried | alex_xu: ^ | 15:08 |
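The distinction dansmith draws here (destroy stops a guest but keeps its stored definition, undefine drops the definition) can be illustrated with a toy model. This is only a sketch of the semantics described in the channel for a persistent domain: `ToyHypervisor` is a hypothetical class written for illustration and is not the libvirt API, although its method names mirror the libvirt operations being discussed.

```python
class ToyHypervisor:
    """Toy model of persistent domains; NOT the libvirt API."""

    def __init__(self):
        self.definitions = {}  # domain name -> stored XML
        self.running = set()   # names of currently active domains

    def define(self, name, xml):
        # Persist the domain definition without starting it.
        self.definitions[name] = xml

    def create(self, name):
        # Start a previously defined domain.
        if name not in self.definitions:
            raise KeyError(f"domain {name!r} is not defined")
        self.running.add(name)

    def destroy(self, name):
        # Hard power-off: the guest stops, the definition stays.
        self.running.discard(name)

    def undefine(self, name):
        # Remove the stored definition itself.
        self.definitions.pop(name, None)


hv = ToyHypervisor()
hv.define("guest-1", "<domain>...</domain>")
hv.create("guest-1")
hv.destroy("guest-1")
xml_after_destroy = hv.definitions.get("guest-1")  # still available
hv.undefine("guest-1")
```

In this model a stop followed by a later start can still read the old XML, which is the point made above about not losing information across stop/start.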
alex_xu | dansmith: why is the start instance action building xml... | 15:10 |
dansmith | alex_xu: to update the instance with any stored data or config changes that may have happened | 15:11 |
*** ricolin__ has quit IRC | 15:11 | |
dansmith | since nova is the authority for a lot of things (like config drive or nic mac address, etc) | 15:11 |
*** ricolin_ has joined #openstack-nova | 15:11 | |
alex_xu | ah.. | 15:12 |
efried | so why do we need to store things like PCI devs, NUMA topo, etc. in the db at all? | 15:12 |
*** ttsiouts has quit IRC | 15:13 | |
dansmith | because we have apis for them? | 15:13 |
*** ttsiouts has joined #openstack-nova | 15:13 | |
sean-k-mooney | efried: because we need to assign specific devices and cpus to instances, and we don't persist the libvirt xml | 15:14 |
dansmith | because we need aggregated views of that data without contacting every node in the cluster for each sort of thing. | 15:14 |
sean-k-mooney | so if we don't store it in the db we don't know which ones are used if an instance is not running | 15:14 |
alex_xu | sean-k-mooney: we were just talking about libvirt actually persisting everything | 15:14 |
sean-k-mooney | also we need it for the filters | 15:15 |
sean-k-mooney | alex_xu: that would cause other issues on upgrade of libvirt versions and nova | 15:15 |
dansmith | sean-k-mooney: not sure that's true (that we can't get it from libvirt from stored domains), | 15:15 |
dansmith | but the operations we need to do would be incredibly expensive if we had to collect "who is using what right now" by contacting every node | 15:15 |
sean-k-mooney | dansmith: well if we delete the domain on power off we need to reserve the cores so that we don't violate the requirement by booting a new vm | 15:16 |
efried | operations like scheduling, determining available resources... | 15:16 |
sean-k-mooney | if we don't undefine the domain on power off | 15:16 |
dansmith | sean-k-mooney: again, we do not delete the domain on poweroff | 15:16 |
sean-k-mooney | then it's possible we don't need to store them for that | 15:16 |
sean-k-mooney | but we would still want to have it for the filter | 15:16 |
*** burt has quit IRC | 15:16 | |
sean-k-mooney | we do on power on because it calls hard reboot | 15:17 |
efried | So in the utopian future where all of that is done properly in placement, the scheduler wouldn't need that info in the db and we could theoretically get rid of it. | 15:17 |
dansmith | right, we need that info stored because we've promised a cluster-wide API for them, and because the filters need to be able to survey the entire landscape efficiently | 15:17 |
sean-k-mooney | we might not on power off, i didn't check | 15:17 |
sean-k-mooney | i.e. i'll take your word for it | 15:17 |
*** ttsiouts has quit IRC | 15:18 | |
efried | dansmith: so, for a new resource we're tracking, specifically vpmem for this conversation, | 15:20 |
efried | since we're tracking the inventory properly in placement, | 15:20 |
efried | that means only the virt driver needs to know the connection between an allocation and a specific vpmem namespace, | 15:20 |
efried | so there's no reason to store vpmems in the database | 15:20 |
dansmith | if that is doable (i.e. we don't have to store them for some other reason) then I think that's ideal, yes | 15:21 |
efried | and, | 15:21 |
efried | since we can recover the vpmem information from the domain xml on operations other than migrations (see below), there's no need to store the vpmem info on the Instance either. | 15:21 |
*** ttsiouts has joined #openstack-nova | 15:21 | |
efried | So the only place we actually need the information is in a migrate context | 15:21 |
dansmith | do we need it in a migration context? | 15:21 |
alex_xu | yes | 15:22 |
efried | not sure | 15:22 |
dansmith | the two ends of a migration are storing the machine-local context in their own libvirts | 15:22 |
dansmith | maybe for the revert case? | 15:22 |
efried | alex_xu: why would we need it in migration context? | 15:22 |
alex_xu | I need to copy the source pmem data to the dest pmem. I need the dest pmem device path | 15:22 |
dansmith | wait | 15:22 |
dansmith | so for cold migration you're going to migrate the data as well? | 15:22 |
alex_xu | yes | 15:23 |
efried | there's a conf option | 15:23 |
alex_xu | yes, it is configurable by the extra spec | 15:23 |
dansmith | how is that going to work? write it to disk, consider it a disk image to move and then write it back? | 15:23 |
sean-k-mooney | dansmith: not by default | 15:23 |
alex_xu | https://review.opendev.org/#/c/634556/12/nova/privsep/libvirt.py | 15:23 |
dansmith | I *love* this snowflake feature you guys have come up with | 15:23 |
sean-k-mooney | we default to not copying the data, but we can optionally copy it | 15:23 |
alex_xu | ^ here is, ssh... | 15:23 |
dansmith | alex_xu: so you're going to add another scp operation? | 15:24 |
alex_xu | yes... | 15:24 |
dansmith | gross | 15:24 |
alex_xu | it is something we hate... | 15:24 |
*** hemna has joined #openstack-nova | 15:24 | |
efried | What's the alternative? Block migrations etc. Which is worse? | 15:24 |
dansmith | so if/when we get to the point of being able to use libvirt to move the images over the TLS tunnel, we're stuck scp'ing this thing? | 15:25 |
dansmith | efried: not promising that this data is persistent across moves is one option | 15:25 |
dansmith | efried: isn't this being added as a snowflake "you can never live migrate" feature? | 15:25 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: nova-manage: heal port allocations https://review.opendev.org/637955 | 15:25 |
efried | I thought live migrate was supported | 15:25 |
sean-k-mooney | dansmith: in the livemigration case qemu/libvirt can copy the data | 15:25 |
alex_xu | it supports lm | 15:26 |
sean-k-mooney | for cold migrate we have to do it but maybe we can make a libvirt feature request? | 15:26 |
dansmith | okay, I thought it wasn't going to be able to move that data | 15:26 |
alex_xu | i'm ok without copy for resize | 15:26 |
efried | http://specs.openstack.org/openstack/nova-specs/specs/train/approved/virtual-persistent-memory.html#live-migration | 15:27 |
sean-k-mooney | alex_xu: the issue with scp/rsync for resize is it won't work for cross cell resize | 15:27 |
efried | apparently libvirt moves it along with RAM | 15:27 |
sean-k-mooney | alex_xu: e.g. for cross cell resize we can't assume the compute nodes have network connectivity | 15:27 |
alex_xu | sean-k-mooney: yes, we are going to stop cross cell resize in the initial proposal | 15:27 |
*** igordc has joined #openstack-nova | 15:27 | |
sean-k-mooney | at least not in the edge deployment case | 15:27 |
dansmith | live migration is transparent to the user, so that's good, and resize is not, so I think it's reasonable to say that there's no data copy when you resize | 15:28 |
dansmith | you could resize to a flavor with/without pmem too, | 15:28 |
*** maciejjozefczyk_ has quit IRC | 15:28 | |
dansmith | so not copying would cover that case in both directions | 15:28 |
*** maciejjozefczyk_ has joined #openstack-nova | 15:28 | |
efried | per the spec, "reduction" in vpmem would not be allowed | 15:28 |
dansmith | because of this | 15:28 |
dansmith | I have to jump on a call now, | 15:29 |
*** hemna has quit IRC | 15:29 | |
dansmith | but I'm really unhappy with us adding another "just scp it across" thing | 15:29 |
efried | alex_xu: Sounds like the spec update https://review.opendev.org/669970 needs to be rethought :) | 15:29 |
alex_xu | yea, will update again | 15:29 |
alex_xu | I think the copy for resize can be removed | 15:30 |
efried | & cold migration | 15:30 |
efried | (which is the same thing?) | 15:30 |
dansmith | they are the same thing | 15:30 |
* efried shoots self in face | 15:30 | |
*** hemna has joined #openstack-nova | 15:31 | |
*** gyee has joined #openstack-nova | 15:31 | |
alex_xu | remove the copy, remove the vpmem field | 15:31 |
efried | doesn't the VM see the vpmem as persistent storage, though? | 15:31 |
sean-k-mooney | efried: not really | 15:31 |
efried | won't it freak out if it boots and the data is gone? | 15:31 |
efried | okay, then yeah, that vastly simplifies things. | 15:31 |
sean-k-mooney | it sees it as ram/dimms | 15:31 |
efried | unfortunate that we already merged https://review.opendev.org/#/c/662697/ -- seems as though we won't be using that? | 15:32 |
sean-k-mooney | the data is meant to be persistent | 15:32 |
alex_xu | and most of the use cases are for cache, so it is ok | 15:32 |
sean-k-mooney | but you should really store your data somewhere else and keep the working set in the pmem | 15:32 |
sean-k-mooney | ya most workloads use it as high capacity scratch space for operating on a subset of the data, but the long term storage of the data should be in a cinder volume | 15:33 |
efried | I've put a hold on https://review.opendev.org/#/c/634548/ for now | 15:36 |
*** lpetrut has quit IRC | 15:36 | |
*** ivve has joined #openstack-nova | 15:40 | |
gibi | efried: thanks for catching bug in https://review.opendev.org/#/c/637955/ I've fixed it. | 15:51 |
efried | gibi: cool | 15:51 |
efried | got a nice backlog today, but hopefully I can get back around to it. | 15:51 |
efried | gibi: since you're around, would you mind pushing https://review.opendev.org/#/c/657464/ ? | 15:51 |
gibi | efried: on it | 15:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Update supported transports for iscsi connector https://review.opendev.org/524443 | 15:52 |
efried | thanks | 15:52 |
*** hemna has quit IRC | 15:59 | |
efried | sean-k-mooney: is rebuild admin-only? | 16:00 |
sean-k-mooney | no | 16:00 |
sean-k-mooney | rebuild and resize can both be done by tenants | 16:00 |
efried | sean-k-mooney: okay, so back to that thing we were looking at earlier, it looks like setup_networks_on_host is called in rebuild and resize flows as well as migrate | 16:01 |
*** maciejjozefczyk_ has quit IRC | 16:02 | |
efried | hm, unless teardown=True only happens on migrations... | 16:03 |
openstackgerrit | Merged openstack/os-resource-classes master: Propose FPGA and PGPU resource classes https://review.opendev.org/657464 | 16:03 |
efried | does _confirm_resize only happen on migration-y resizes? | 16:03 |
*** damien_r has quit IRC | 16:03 | |
sean-k-mooney | efried: it happens on resize or cold migration | 16:04 |
sean-k-mooney | in both cases we go into resize_verify and you have to do resize --confirm | 16:05 |
efried | rats, nother meeting, will come back to this... | 16:05 |
sean-k-mooney | efried: stephenfin is adding an openstack server migrate --confirm as syntactic sugar but it's the same call underneath | 16:06 |
stephenfin | that threw me too, fwiw | 16:07 |
stephenfin | (whether cold migrations needed to be confirmed or not) | 16:07 |
*** ttsiouts has quit IRC | 16:07 | |
sean-k-mooney | stephenfin: well that depends on your config settings | 16:07 |
sean-k-mooney | but ya | 16:07 |
*** ttsiouts has joined #openstack-nova | 16:08 | |
sean-k-mooney | i hit that oddity back in havana so at this point i don't even think about them differently anymore. | 16:08 |
sean-k-mooney | (cold migration vs resize) | 16:08 |
*** tssurya has quit IRC | 16:11 | |
*** ttsiouts has quit IRC | 16:13 | |
*** belmoreira has quit IRC | 16:30 | |
*** rpittau is now known as rpittau|afk | 16:30 | |
*** artom has joined #openstack-nova | 16:39 | |
*** artom has quit IRC | 16:43 | |
*** helenafm has quit IRC | 16:49 | |
*** ricolin_ has quit IRC | 16:50 | |
*** davidsha has quit IRC | 16:51 | |
*** udesale has quit IRC | 16:56 | |
*** cdent has joined #openstack-nova | 16:58 | |
*** belmoreira has joined #openstack-nova | 16:58 | |
*** belmoreira has quit IRC | 17:02 | |
*** derekh has quit IRC | 17:03 | |
*** lpetrut has joined #openstack-nova | 17:03 | |
*** hongda has quit IRC | 17:06 | |
*** igordc has quit IRC | 17:07 | |
*** artom has joined #openstack-nova | 17:15 | |
bbobrov | sean-k-mooney: hey | 17:17 |
bbobrov | sean-k-mooney: regarding your comment to https://review.opendev.org/#/c/638680/24/nova/virt/libvirt/driver.py | 17:17 |
* sean-k-mooney clicks | 17:18 | |
bbobrov | sean-k-mooney: do you have a traceback how it blows up? | 17:18 |
sean-k-mooney | i have fixed it in https://review.opendev.org/#/c/670189/ | 17:18 |
sean-k-mooney | bbobrov: basically the old code would use q35 if there was no default for an architecture | 17:19 |
sean-k-mooney | which is invalid if you have sparc or many other arch specific qemu-* packages installed | 17:20 |
sean-k-mooney | so libvirt would raise an exception saying q35 is not supported by emulator X | 17:20 |
bbobrov | sean-k-mooney: understood, thanks. Don't we have a ci job to catch this kind of stuff? | 17:21 |
sean-k-mooney | this was not caught as this is currently not used outside of the tests and the tests mock out all calls to libvirt | 17:21 |
sean-k-mooney | so it was passing because of invalid test data | 17:22 |
sean-k-mooney | if you tried to call that function in the agent then you got tracebacks | 17:22 |
*** ralonsoh has quit IRC | 17:22 | |
sean-k-mooney | on ubuntu 18.04 they package all the emulators by default and install them when you install qemu and qemu-kvm | 17:23 |
bbobrov | ok, thanks. I'll review 670189 then | 17:23 |
sean-k-mooney | so i would have expected this to break but only when we added code to call this | 17:23 |
sean-k-mooney | i think the signature of get_domain_capabilities is probably not what we want, but i have fixed up the function without changing it for now, to not break the sev code and allow me to continue with my own work | 17:25 |
*** altlogbot_1 has quit IRC | 17:25 | |
*** irclogbot_2 has quit IRC | 17:26 | |
sean-k-mooney | the api signature of _get_domain_capabilities is actually more useful IMO. | 17:26 |
bbobrov | sean-k-mooney: how should it look then? Maybe i could quickly fix the sev code | 17:26 |
bbobrov | sean-k-mooney: and rebase on top of the fix | 17:26 |
sean-k-mooney | i think we should be able to optionally pass the arch and machine type to it and have it query them if it's not cached | 17:27 |
sean-k-mooney | https://review.opendev.org/#/c/670189/3/nova/virt/libvirt/host.py@674 | 17:27 |
sean-k-mooney | have it default to arch=None mtype=None and return all of them | 17:27 |
sean-k-mooney | but it depends on how we will be using it | 17:28 |
sean-k-mooney | if we look up data by the arch only then its fine | 17:28 |
sean-k-mooney | if we look it up by arch and mtype there is no guarantee the combination will be in the dict | 17:29 |
*** irclogbot_2 has joined #openstack-nova | 17:29 | |
*** TxGirlGeek has quit IRC | 17:29 | |
*** altlogbot_1 has joined #openstack-nova | 17:29 | |
sean-k-mooney | i can work with the api as it is for now as i just need it to create traits | 17:30 |
sean-k-mooney | but if i was using this in the driver the current interface would be limiting | 17:30 |
*** altlogbot_1 has quit IRC | 17:31 | |
sean-k-mooney | ideally the libvirt driver would never need to call _get_domain_capabilities directly, but since the mtype can be set in the image it's possible that you will not find it in the cached copy and would need to | 17:31 |
*** TxGirlGeek has joined #openstack-nova | 17:32 | |
*** irclogbot_2 has quit IRC | 17:32 | |
*** lpetrut has quit IRC | 17:33 | |
*** irclogbot_3 has joined #openstack-nova | 17:39 | |
*** altlogbot_2 has joined #openstack-nova | 17:40 | |
*** dpawlik has joined #openstack-nova | 17:41 | |
*** tesseract has quit IRC | 17:45 | |
*** dpawlik has quit IRC | 17:52 | |
*** dpawlik has joined #openstack-nova | 17:52 | |
*** igordc has joined #openstack-nova | 18:05 | |
*** brault has joined #openstack-nova | 18:36 | |
*** panda has quit IRC | 18:38 | |
*** panda has joined #openstack-nova | 18:40 | |
*** hemna has joined #openstack-nova | 18:40 | |
efried | bbobrov: Hi, since you're here, are you working on rebasing the SEV series? | 18:40 |
*** brault has quit IRC | 18:41 | |
*** tbachman has joined #openstack-nova | 18:42 | |
*** brault has joined #openstack-nova | 18:44 | |
*** jmlowe has quit IRC | 18:46 | |
*** brault has quit IRC | 18:51 | |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Provide HW_CPU_X86_AMD_SEV trait when SEV is supported https://review.opendev.org/638680 | 19:03 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Add extra spec parameter and image property for memory encryption https://review.opendev.org/664420 | 19:03 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Extract SEV-specific bits on host detection https://review.opendev.org/636334 | 19:03 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Add <launchSecurity> and <driver iommu='on' /> to config.py https://review.opendev.org/636318 | 19:03 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Apply SEV-specific guest config when SEV is required https://review.opendev.org/644565 | 19:03 |
openstackgerrit | Boris Bobrov proposed openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption https://review.opendev.org/666616 | 19:03 |
bbobrov | efried: here is the answer :) | 19:03 |
efried | blam! | 19:04 |
efried | thanks bbobrov | 19:04 |
bbobrov | i will reply to the comments now | 19:04 |
artom | What happened to aspiers? | 19:04 |
*** belmoreira has joined #openstack-nova | 19:21 | |
efried | was wondering same | 19:21 |
*** maciejjozefczyk_ has joined #openstack-nova | 19:22 | |
*** lee1 has joined #openstack-nova | 19:23 | |
*** lee1 is now known as lyarwood | 19:23 | |
*** jmlowe has joined #openstack-nova | 19:25 | |
*** maciejjozefczyk_ has quit IRC | 19:28 | |
*** luksky11 has joined #openstack-nova | 19:34 | |
openstackgerrit | Eric Fried proposed openstack/nova master: Use Adapter global_request_id kwarg https://review.opendev.org/670907 | 19:35 |
cdent | efried: the semaphore thing is indirectly related to https://bugs.launchpad.net/nova/+bug/1835958 (power states not under the same lock, but in the realm of nova-compute performance) | 19:35 |
openstack | Launchpad bug 1835958 in OpenStack Compute (nova) "Nova sync power state on large clusters causes poor performance" [Undecided,New] | 19:35 |
efried | cdent: Only slightly related, I've been talking to alex_xu about moving more stuff under that semaphore. | 19:37 |
efried | specifically driver-specific claim stuff | 19:37 |
efried | Not sure if you were around for those conversations. | 19:37 |
cdent | not that I'm aware of | 19:37 |
cdent | it's a problem for high throughput nova-computes (like in vmware where the nova-compute is a chokepoint, rather than a member of a nice bit of parallelism like in a big kvm cloud) | 19:38 |
*** belmoreira has quit IRC | 19:40 | |
efried | cdent: well, the plan is to start delegating virt-specific claimage to the virt driver itself. I could see where vmware could take its own semaphore on a specific internal nodeything and background the real claim job, so the next one could come in and get started (but on a different nodeything). | 19:40 |
efried | and it would be none of RT's business. | 19:40 |
efried | the more we move claimables into placement, the more we can delegate the claim logic for same to the virt driver. | 19:41 |
efried | so e.g. PCI devices would eventually become virt driver business. | 19:42 |
efried | and - as we were discussing this morning - not even stored in the db anymore. | 19:42 |
cdent | well that would be lovely | 19:44 |
*** eharney has quit IRC | 19:49 | |
slaweq | hi, is there any n-meta-api service expert here? Can You take a look at https://bugs.launchpad.net/neutron/+bug/1836642 from Nova PoV? Thx in advance :) | 19:57 |
openstack | Launchpad bug 1836642 in OpenStack Compute (nova) "Metadata responses are very slow sometimes" [Undecided,New] | 19:57 |
efried | artom: catching up, are you filing an elastic-recheck profile for bug 1836595 ? | 20:01 |
openstack | bug 1836595 in neutron "test_server_connectivity_cold_migration_revert failing" [Undecided,New] https://launchpad.net/bugs/1836595 | 20:01 |
*** betherly has quit IRC | 20:02 | |
artom | efried, I could - should I be? :) | 20:02 |
artom | efried, I'm kinda hoping https://review.opendev.org/#/c/670848/ would merge though | 20:02 |
artom | Speaking of which, gmann ^^ :) | 20:03 |
artom | (If you're around, not sure about your TZ) | 20:03 |
efried | artom: This is interesting, the failure on that skip patch http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/testr_results.html.gz | 20:05 |
artom | efried, looks like feral packets to me | 20:06 |
efried | conflict deleting allocations, looks like it's tempest itself that's being a placement client. Without having looked at the code, one wonders whether it should be retrying | 20:06 |
efried | feral packets? how so? | 20:06 |
artom | Yeah, tempest makes a point not to use any Python clients | 20:06 |
artom | efried, heh, a lazy way of saying "something unrelated I can't be arsed to debug" :P | 20:07 |
artom | So it'll do the GET requests itself | 20:07 |
efried | there is no placement client, my point is that this isn't tempest calling something in nova that's talking to placement and getting a conflict. | 20:07 |
efried | this is tempest talking directly to placement | 20:07 |
artom | And that's bad? | 20:07 |
efried | it means there's likely an opportunity to harden the tempest code so this failure doesn't happen anymore. | 20:08 |
* efried clones tempest for possibly the second time ever... | 20:08 | |
artom | *facepalm*, oh just got what you're saying | 20:08 |
artom | Tempest shouldn't be deleting allocations itself, Nova should be doing it | 20:09 |
artom | I think that response is from Nova though | 20:09 |
*** altlogbot_2 has quit IRC | 20:10 | |
efried | mmm, yes, it looks like you're right. | 20:11 |
artom | That test is really confusing | 20:11 |
efried | if the error is coming from nova, it means we have a bug in nova. | 20:13 |
*** altlogbot_1 has joined #openstack-nova | 20:13 | |
artom | Well, looks like tempest is creating a server, not waiting for it to become ACTIVE, then immediately deleting it | 20:16 |
artom | So I suspect that makes Nova hit a race between the build process and the delete request | 20:16 |
cdent | artom: that sounds right | 20:16 |
artom | Short term that can be "fixed" by making Tempest wait | 20:17 |
artom | Longer term I guess we'll need to stick a lock in Nova somewhere | 20:17 |
cdent | placement log entries are near http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-placement-api.txt.gz#_Jul_15_17_27_35_283720 | 20:18 |
cdent | handling for the req id starts here http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-n-cpu.txt.gz#_Jul_15_17_27_33_968264 | 20:20 |
artom | Actually from the little I know about placement it should handle those kinds of races with the generation thing, right? | 20:20 |
artom | So maybe Nova just needs to handle Placement error better | 20:20 |
efried | the generation thing is exactly what's happening here | 20:20 |
efried | alloc deletion was specifically set up so that you couldn't race deletion with some other consumer op | 20:20 |
efried | so this guard is doing exactly what it's supposed to, and it's the overarching operation that's broken. As you say, trying to delete while creating. | 20:21 |
cdent | "instance disappeared during build" http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-n-cpu.txt.gz#_Jul_15_17_27_34_806925 | 20:21 |
efried | only question is where to fix it. | 20:21 |
efried | it would be possible to redrive the alloc delete until it works. But that's still potentially racy if the other op is trying to create the alloc at the same time. | 20:22 |
efried | there's a couple races here actually. | 20:23 |
artom | @instance_state(ACTIVE) for delete :D | 20:23 |
efried | The other one is if the deletion goes through before the allocation is even created | 20:23 |
efried | it will not raise an exception (it returns False, but the caller doesn't check that) so we'll end up with a leaked allocation | 20:25 |
cdent | efried, artom if you end up creating a bug about this, please let me know what it is so I can follow along, tomorrow | 20:26 |
* cdent waves goodnight | 20:26 | |
*** cdent has quit IRC | 20:26 | |
artom | efried, mind doing it? You have a better grasp of that stuff, and I'm off in 30 minutes anyways, daycare duty | 20:27 |
efried | ack | 20:27 |
*** pcaruana has quit IRC | 20:29 | |
*** slaweq has quit IRC | 20:31 | |
*** dpawlik has quit IRC | 20:34 | |
*** TxGirlGeek has quit IRC | 20:35 | |
*** BjoernT has joined #openstack-nova | 20:37 | |
*** eharney has joined #openstack-nova | 20:41 | |
*** BjoernT_ has joined #openstack-nova | 20:41 | |
*** BjoernT has quit IRC | 20:42 | |
*** TxGirlGeek has joined #openstack-nova | 20:46 | |
artom | *snerk* | 20:48 |
artom | https://bugs.launchpad.net/nova/+bug/1836204 | 20:49 |
openstack | Launchpad bug 1836204 in OpenStack Compute (nova) "The allocation of VGPU has race problem" [High,Triaged] - Assigned to Alex Xu (xuhj) | 20:49 |
artom | I guess it hates blacks | 20:49 |
artom | *shakes head* I'm really sorry | 20:49 |
*** xek has quit IRC | 20:52 | |
*** artom has quit IRC | 21:07 | |
*** whoami-rajat has quit IRC | 22:04 | |
*** BjoernT_ has quit IRC | 22:07 | |
*** luksky11 has quit IRC | 22:20 | |
*** icarusfactor has joined #openstack-nova | 22:22 | |
*** factor has quit IRC | 22:22 | |
*** ircuser-1 has joined #openstack-nova | 22:23 | |
*** factor has joined #openstack-nova | 22:32 | |
*** factor has quit IRC | 22:33 | |
*** icarusfactor has quit IRC | 22:34 | |
*** tbachman has quit IRC | 22:50 | |
*** tbachman has joined #openstack-nova | 22:52 | |
efried | dansmith: Do you feel we need a bp/spec for claim_for_instance? | 22:57 |
efried | there should be no API, db, object, conf, upgrade, or doc impacts | 22:58 |
*** tbachman has joined #openstack-nova | 22:58 | |
*** tkajinam has joined #openstack-nova | 22:59 | |
*** rcernin has joined #openstack-nova | 23:29 | |
*** TxGirlGeek has quit IRC | 23:45 |
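
The second allocation race efried describes between 20:23 and 20:25 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Nova or placement code: `FakePlacement`, `put_allocation`, `delete_allocation`, and the helper functions are hypothetical names invented here, and an in-memory dict stands in for the placement service. It shows a delete that arrives before the build has written its allocation: the delete reports False, and a caller that drops that result ends up with a leaked allocation.

```python
class FakePlacement:
    """Hypothetical in-memory stand-in for the placement service."""

    def __init__(self):
        self.allocations = {}  # consumer id -> resources dict

    def put_allocation(self, consumer, resources):
        self.allocations[consumer] = dict(resources)

    def delete_allocation(self, consumer):
        """Return True if an allocation was removed, False if none existed."""
        return self.allocations.pop(consumer, None) is not None


def delete_ignoring_result(placement, consumer):
    # Mirrors the problem described in the log: the False result is dropped.
    placement.delete_allocation(consumer)


def delete_checking_result(placement, consumer, pending_deletes):
    # One possible mitigation (an assumption, not the fix Nova adopted):
    # remember consumers whose delete found nothing to remove.
    if not placement.delete_allocation(consumer):
        pending_deletes.add(consumer)


def build(placement, consumer, resources, pending_deletes=frozenset()):
    # The build path writes the allocation unless a delete already ran.
    if consumer in pending_deletes:
        return False
    placement.put_allocation(consumer, resources)
    return True


# Delete arrives first, build arrives later, nobody checks the result.
leaky = FakePlacement()
delete_ignoring_result(leaky, "inst-1")
build(leaky, "inst-1", {"VCPU": 1})
leaked = "inst-1" in leaky.allocations

# Same ordering, but the delete records that it found nothing.
guarded = FakePlacement()
pending = set()
delete_checking_result(guarded, "inst-1", pending)
built = build(guarded, "inst-1", {"VCPU": 1}, pending)
```

In the first ordering `leaked` ends up true because nothing ever cleans up the late allocation. The guarded variant avoids that only within this toy model; where to put such a check in Nova is exactly the open question in the discussion.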