*** brinzhang has joined #openstack-nova | 00:20 | |
*** ociuhandu has joined #openstack-nova | 00:21 | |
*** ociuhandu has quit IRC | 00:25 | |
*** threestrands has joined #openstack-nova | 00:35 | |
*** maohongbo has quit IRC | 00:46 | |
*** slaweq has joined #openstack-nova | 00:50 | |
*** brinzhang has quit IRC | 00:58 | |
*** brinzhang has joined #openstack-nova | 01:01 | |
*** brinzhang has quit IRC | 01:28 | |
*** brinzhang has joined #openstack-nova | 01:29 | |
*** sapd1 has quit IRC | 01:52 | |
*** sapd1 has joined #openstack-nova | 02:11 | |
*** ociuhandu has joined #openstack-nova | 02:26 | |
*** ociuhandu has quit IRC | 02:36 | |
*** ociuhandu has joined #openstack-nova | 02:37 | |
*** ociuhandu has quit IRC | 02:42 | |
*** huaqiang has joined #openstack-nova | 02:42 | |
*** ociuhandu has joined #openstack-nova | 02:55 | |
*** jhesketh has joined #openstack-nova | 02:56 | |
openstackgerrit | Qitao proposed openstack/nova master: Use unittest.mock instead of third party mock https://review.opendev.org/721145 | 02:59 |
*** ociuhandu has quit IRC | 03:05 | |
*** kevinz has joined #openstack-nova | 03:07 | |
*** ircuser-1 has quit IRC | 03:09 | |
*** mkrai has joined #openstack-nova | 03:15 | |
*** psachin has joined #openstack-nova | 03:37 | |
*** ociuhandu has joined #openstack-nova | 03:48 | |
*** ociuhandu has quit IRC | 03:53 | |
*** ratailor has joined #openstack-nova | 04:21 | |
*** evrardjp has quit IRC | 04:23 | |
*** udesale has joined #openstack-nova | 04:38 | |
*** ratailor has quit IRC | 04:38 | |
*** mkrai has quit IRC | 04:44 | |
*** ratailor has joined #openstack-nova | 04:44 | |
*** mkrai has joined #openstack-nova | 04:45 | |
*** tkajinam has quit IRC | 05:12 | |
*** tkajinam has joined #openstack-nova | 05:13 | |
*** iokiwi has quit IRC | 05:16 | |
*** iokiwi has joined #openstack-nova | 05:17 | |
*** yaawang_ has quit IRC | 05:18 | |
*** yaawang_ has joined #openstack-nova | 05:20 | |
*** mkrai has quit IRC | 05:41 | |
*** evrardjp has joined #openstack-nova | 05:41 | |
*** mkrai has joined #openstack-nova | 05:42 | |
*** vishalmanchanda has joined #openstack-nova | 05:46 | |
*** dpawlik has joined #openstack-nova | 06:09 | |
*** nightmare_unreal has joined #openstack-nova | 06:32 | |
*** ociuhandu has joined #openstack-nova | 07:00 | |
*** maciejjozefczyk has joined #openstack-nova | 07:02 | |
*** ttsiouts has joined #openstack-nova | 07:11 | |
*** rpittau|afk is now known as rpittau | 07:12 | |
*** tesseract has joined #openstack-nova | 07:12 | |
*** breizhkoala has joined #openstack-nova | 07:13 | |
*** ttsiouts has quit IRC | 07:14 | |
*** ttsiouts_ has joined #openstack-nova | 07:14 | |
*** threestrands_ has joined #openstack-nova | 07:18 | |
*** ralonsoh has joined #openstack-nova | 07:18 | |
*** threestrands_ has quit IRC | 07:18 | |
*** threestrands has quit IRC | 07:21 | |
bauzas | good morning Nova | 07:24 |
gibi | good morning Nova | 07:24 |
lyarwood | morning morning | 07:25 |
*** tobias-urdin has joined #openstack-nova | 07:27 | |
*** amodi has quit IRC | 07:45 | |
*** udesale_ has joined #openstack-nova | 07:47 | |
*** udesale has quit IRC | 07:50 | |
lyarwood | https://review.opendev.org/#/c/718964/ & https://review.opendev.org/#/c/701756/ could use stable core review so I can cut the release ahead of RC if anyone has time today btw | 07:56 |
*** ratailor has quit IRC | 08:05 | |
*** xek has joined #openstack-nova | 08:05 | |
*** ratailor has joined #openstack-nova | 08:05 | |
*** ociuhandu has quit IRC | 08:07 | |
*** ratailor has quit IRC | 08:07 | |
*** ratailor has joined #openstack-nova | 08:07 | |
*** ratailor has quit IRC | 08:10 | |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP nova-next: Start testing the q35 machine type https://review.opendev.org/708701 | 08:10 |
*** ccamacho has joined #openstack-nova | 08:12 | |
*** mkrai has quit IRC | 08:15 | |
*** mkrai_ has joined #openstack-nova | 08:15 | |
*** ratailor has joined #openstack-nova | 08:29 | |
*** ratailor_ has joined #openstack-nova | 08:38 | |
*** ratailor has quit IRC | 08:42 | |
*** ociuhandu has joined #openstack-nova | 08:43 | |
bauzas | lyarwood: any idea why https://review.opendev.org/699291 is not backported ? https://review.opendev.org/#/q/topic:bug/1855927+(status:open+OR+status:merged) | 08:48 |
*** ociuhandu has quit IRC | 08:48 | |
*** ttsiouts_ has quit IRC | 08:48 | |
bauzas | lyarwood: ah, nevermind, found the backport https://review.opendev.org/#/c/701756/2 | 08:49 |
bauzas | something is actually weird with those changes, trying to untangle it | 08:52 |
*** ttsiouts has joined #openstack-nova | 08:53 | |
bauzas | ah-ah, okay, got it | 08:54 |
bauzas | https://review.opendev.org/#/q/topic:bug/1856925+(status:open+OR+status:merged) vs. https://review.opendev.org/#/q/topic:bug/1855927+%28status:open+OR+status:merged%29 | 08:54 |
*** vishalmanchanda has quit IRC | 08:56 | |
*** tosky has joined #openstack-nova | 09:02 | |
lyarwood | bauzas: sorry was head down on something downstream | 09:02 |
* lyarwood reads | 09:02 | |
bauzas | lyarwood: no worries, commenting with a +1 | 09:02 |
* bauzas sorted out all this mess | 09:02 | |
lyarwood | right because https://review.opendev.org/#/c/699291/ hasn't landed yet | 09:03 |
lyarwood | sorry | 09:03 |
lyarwood | does anyone know if we dump the compute service version somewhere in the logs during n-cpu startup? | 09:04 |
* lyarwood is trying to debug a weird stable/queens issue downstream and assumes there's an older compute in the env but can't prove it | 09:04 | |
bauzas | lyarwood: I just proposed to provide another revision of https://review.opendev.org/#/c/701756/2 that would be in a separate branch and just isolated | 09:05 |
tosky | lyarwood: you can check the package version, I guess | 09:06 |
bauzas | because that's confusing | 09:06 |
*** ratailor has joined #openstack-nova | 09:06 | |
bauzas | lyarwood: AFAIR we do provide the compute service version, lemme grab you the logs | 09:06 |
*** ratailor_ has quit IRC | 09:06 | |
*** psachin has quit IRC | 09:09 | |
bauzas | lyarwood: you can get the compute package version here which allows you to get the compute service version by looking at the code https://zuul.opendev.org/t/openstack/build/96486cc66f7542318bbcfc4a43784d56/log/controller/logs/screen-n-cpu.txt#861 | 09:10 |
*** avolkov has joined #openstack-nova | 09:11 | |
lyarwood | bauzas: do we not dump anything in the API or scheduler about the compute versions they are aware of | 09:12 |
lyarwood | bauzas: I'm specifically trying to find the service version, not the package version btw. | 09:12 |
lyarwood | bauzas: https://github.com/openstack/nova/blob/stable/queens/nova/compute/api.py#L3976-L3998 for this | 09:12 |
bauzas | lyarwood: yup, I understood your question | 09:12 |
lyarwood | bauzas: I'm thinking that there's an older compute still registered somewhere that's causing that to use the old legacy path | 09:13 |
bauzas | but AFAIR, we don't expose the service versions, just the package versions | 09:13 |
lyarwood | kk thanks | 09:14 |
bauzas | lyarwood: this being said, the version field of the Service object is maybe expose thru the REST API | 09:15 |
bauzas | exposed* | 09:15 |
* bauzas goes looking at the api-ref | 09:16 | |
bauzas | meh, nvm | 09:16 |
bauzas | this would be a nova-manage thing | 09:17 |
*** psachin has joined #openstack-nova | 09:17 | |
bauzas | at least we don't return it to the API https://docs.openstack.org/api-ref/compute/?expanded=list-compute-services-detail#id368 | 09:18 |
*** ociuhandu has joined #openstack-nova | 09:18 | |
bauzas | lyarwood: I think I found something interesting for your problem | 09:21 |
*** dtantsur|afk is now known as dtantsur | 09:21 | |
bauzas | lyarwood: the logs can emit some information about an old service when it starts https://github.com/openstack/nova/blob/master/nova/service.py#L72-L81 | 09:22 |
*** ratailor has quit IRC | 09:23 | |
*** ratailor has joined #openstack-nova | 09:23 | |
*** gregwork has quit IRC | 09:24 | |
*** rcernin has quit IRC | 09:26 | |
*** ociuhandu has quit IRC | 09:32 | |
*** ociuhandu has joined #openstack-nova | 09:33 | |
nightmare_unreal | hello, is there a way to get nova CLI output in JSON? e.g. openstack server list -f json gives output in JSON, but there is no such thing (-format) in the nova CLI | 09:34 |
gibi | bauzas, lyarwood: the version notifications contain the service version https://github.com/openstack/nova/blob/master/doc/notification_samples/common_payloads/ServiceStatusPayload.json#L12 | 09:38 |
bauzas | gibi: ah good point, forgot it | 09:39 |
lyarwood | gibi: ah! thanks | 09:39 |
*** ociuhandu has quit IRC | 09:39 | |
lyarwood | gibi: but that's only for active services right? | 09:39 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize https://review.opendev.org/712741 | 09:39 |
* bauzas wonders if he was drunk when he uploaded his last bit of https://review.opendev.org/#/c/712741/ | 09:39 | |
lyarwood | gibi: I'm assuming these computes are dead tbh so maybe the db is the only real way to tell | 09:40 |
bauzas | I had to modify again something I fixed locally | 09:40 |
bauzas | for some reason, my editor provided me a stale version of an old file | 09:40 |
bauzas | so my last commit reverted another fix I made | 09:41 |
bauzas | strange... | 09:41 |
bauzas | (I have to be cautious now) | 09:41 |
bauzas | Atom-- | 09:41 |
gibi | lyarwood: if you can interact with the service via the REST API then you can get the above notification | 09:44 |
gibi | like when you disable it | 09:44 |
lyarwood | ack thanks | 09:45 |
bauzas | gibi: lyarwood: I don't think it's a crucial and secret information bit to expose the service object version thru the logs | 09:46 |
lyarwood | bauzas: for admins no I guess not | 09:47 |
gibi | bauzas: agree, this is not a secret for admins | 09:48 |
bauzas | just sayin', we could log something there https://github.com/openstack/nova/blob/stable/queens/nova/service.py#L166 (for DEBUG purposes) | 09:48 |
gibi | sure | 09:48 |
bauzas | we already do https://github.com/openstack/nova/blob/stable/queens/nova/service.py#L158 | 09:48 |
bauzas | lyarwood: feel free to file a patch ;) | 09:49 |
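The suspicion in the thread above is that one stale compute record with an older service version keeps the API on a legacy code path. A minimal sketch of that kind of gating, assuming the decision is made on the lowest version among registered, non-deleted nova-compute services. The `ServiceRecord` class, the host names, the version numbers and the threshold are all hypothetical and are not taken from nova.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for one registered service record."""
    host: str
    binary: str
    version: int
    deleted: bool = False


def _live_computes(services: Iterable[ServiceRecord]) -> List[ServiceRecord]:
    # Only non-deleted nova-compute records take part in the check.
    return [s for s in services if s.binary == "nova-compute" and not s.deleted]


def minimum_compute_version(services: Iterable[ServiceRecord]) -> Optional[int]:
    """Lowest version among live compute records, or None if there are none."""
    versions = [s.version for s in _live_computes(services)]
    return min(versions) if versions else None


def uses_legacy_path(services: Iterable[ServiceRecord], required: int) -> bool:
    """True when at least one live compute record is older than `required`."""
    minimum = minimum_compute_version(services)
    return minimum is not None and minimum < required


def stale_hosts(services: Iterable[ServiceRecord], required: int) -> List[str]:
    """Hosts whose compute record holds the minimum version below `required`."""
    return [s.host for s in _live_computes(services) if s.version < required]


# Hypothetical environment: one forgotten compute still registered at version 22.
SERVICES = [
    ServiceRecord("cmp-1", "nova-compute", 30),
    ServiceRecord("cmp-old", "nova-compute", 22),
    ServiceRecord("ctl-1", "nova-scheduler", 30),
]
REQUIRED_VERSION = 30  # hypothetical threshold for the newer code path
```

In this sketch a dead host still counts as long as its record is not deleted, which is why the conversation ends up pointing at the database rather than at notifications from running services.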
*** rcernin has joined #openstack-nova | 09:51 | |
*** vishalmanchanda has joined #openstack-nova | 09:53 | |
brinzhang | good morning all | 09:57 |
brinzhang | gibi: I am not sure which is true, but I replied something in https://review.opendev.org/#/c/720670/ | 09:58 |
brinzhang | Indeed, the nova-cinder interaction and the nova-neutron interaction have different results if the xxxclient reports an exception | 09:59 |
*** mkrai has joined #openstack-nova | 10:00 | |
*** ratailor has quit IRC | 10:00 | |
*** ratailor has joined #openstack-nova | 10:02 | |
*** mkrai_ has quit IRC | 10:02 | |
bauzas | stephenfin: gibi: fwiw, we said we could let it go https://review.opendev.org/#/c/712741/ | 10:03 |
bauzas | (don't leave it frozen) | 10:03 |
bauzas | okay, it was a terrible play of words | 10:04 |
bauzas | => [] | 10:04 |
brinzhang | bauzas: a nit inline ^ | 10:06 |
bauzas | brinzhang: can't see your comment :) | 10:07 |
gibi | bauzas: I will reply after lunch | 10:07 |
brinzhang | bauzas: my net so .. slowly | 10:07 |
bauzas | thanks migi and brinzhang | 10:07 |
gibi | brinzhang: same to you | 10:07 |
bauzas | oh shit, wrong mix | 10:07 |
brinzhang | gibi: thanks | 10:08 |
bauzas | thanks gibi | 10:08 |
bauzas | migi, gibi, dammit | 10:08 |
*** ociuhandu has joined #openstack-nova | 10:08 | |
bauzas | ah, sad, he's not connected here | 10:08 |
brinzhang | bauzas: done | 10:08 |
bauzas | brinzhang: ack, good, can respin since Zuul hasn't replied yet | 10:09 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize https://review.opendev.org/712741 | 10:11 |
*** breizhkoala has quit IRC | 10:13 | |
*** rpittau is now known as rpittau|bbl | 10:15 | |
brinzhang | bauzas: thanks for the quick update ^ | 10:16 |
brinzhang | bauzas: stephenfin is working on changing "import mock" to "from unittest import mock", https://review.opendev.org/#/c/714676/3 | 10:16 |
brinzhang | bauzas: does this [1] need to change? or wait for this to be merged, then stephenfin updates that patch? [1]https://review.opendev.org/#/c/712741/6/nova/tests/functional/libvirt/test_vgpu.py@17 | 10:17 |
bauzas | I honestly think this is a rathole :) | 10:17 |
brinzhang | bauzas: yeah, I think so | 10:19 |
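For readers unfamiliar with the import change being discussed, here is a small sketch of a test written against the standard library's `unittest.mock` rather than the third-party `mock` package. The `Greeter` class and the test are hypothetical illustrations and have nothing to do with the nova test suite.

```python
import unittest
from unittest import mock  # stdlib module, used instead of `import mock`


class Greeter:
    """Hypothetical class with a method the test does not want to run for real."""

    def fetch_name(self):
        raise RuntimeError("would talk to a real service")

    def greet(self):
        return "hello " + self.fetch_name()


class GreeterTest(unittest.TestCase):
    @mock.patch.object(Greeter, "fetch_name", return_value="nova")
    def test_greet(self, mock_fetch):
        # fetch_name is replaced by a MagicMock for the duration of the test.
        self.assertEqual("hello nova", Greeter().greet())
        mock_fetch.assert_called_once_with()


def run_greeter_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(GreeterTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Only the import line differs between the two styles; the `mock.patch.object` call and the assertions look the same either way.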
brinzhang | bauzas: https://review.opendev.org/#/c/712741/6/nova/virt/libvirt/driver.py@10111 need someone to check? I saw you add ? in | 10:20 |
brinzhang | the others look good to me | 10:22 |
*** ociuhandu has quit IRC | 10:22 | |
bauzas | brinzhang: not sure I understand your question ? | 10:32 |
bauzas | brinzhang: do you mean that the comment is confusing ? | 10:33 |
bauzas | b/c it's a question ? | 10:33 |
brinzhang | yes, | 10:33 |
brinzhang | that should be a note, right? | 10:33 |
bauzas | ahah, no, it's just something like "verify if we need to assign some mdevs" | 10:34 |
bauzas | if it was a question, it would be a FIXME or a TODO | 10:34 |
brinzhang | bauzas: ah, yes, I think you missed TODO or FIXME tag | 10:35 |
bauzas | brinzhang: again, no | 10:36 |
bauzas | it wasn't a question for others | 10:36 |
brinzhang | You only put one question here, and there is no extra detail, I think it needs to be added, isn't it? | 10:37 |
bauzas | brinzhang: it's not really a question for others, it's just explain what the method does | 10:38 |
bauzas | it just explains* sorry | 10:38 |
brinzhang | it's ok, it really confuses me, maybe I should take it seriously. | 10:39 |
brinzhang | thanks bauzas ^ | 10:39 |
bauzas | I can provide a FUP if you want | 10:41 |
*** derekh has joined #openstack-nova | 10:54 | |
gibi | brinzhang: responded in https://review.opendev.org/#/c/720670 | 10:56 |
gibi | bauzas: I have started looking at https://review.opendev.org/#/c/712741/ just now | 10:57 |
brinzhang | gibi: I think your reasoning makes sense, agree, thanks | 11:00 |
brinzhang | this is an invalid bug | 11:03 |
*** songwenping has joined #openstack-nova | 11:17 | |
*** ttsiouts has quit IRC | 11:20 | |
*** tkajinam has quit IRC | 11:21 | |
songwenping | gb:Hi gibi. I am working on nova-cyborg-interaction now, and committed this patch https://review.opendev.org/#/c/720670/ | 11:22 |
gibi | songwenping: hi | 11:22 |
songwenping | We don't show the ARQ id in the dashboard yet. | 11:22 |
songwenping | But I think we will show it like a cinder volume. | 11:23 |
songwenping | So should we handle the cyborg exception after showing it? | 11:24 |
gibi | songwenping: still, expecting the end user to _know_ where and how to clean up after a seemingly _successful_ server delete operation feels bad | 11:24 |
gibi | basically after every server delete the end user would need to check the cyborg API to know if the ARQs are freed up or not | 11:24 |
gibi | I don't like that | 11:25 |
brinzhang | gibi, songwenping: agree with gibi, if there are so many resources leaked in Cyborg, it will be heavy work to clean up | 11:26 |
brinzhang | but compared with the Cinder logic, it has the same issue. | 11:26 |
brinzhang | maybe we should have logic to deal with this, or deal with it in Cinder and/or Cyborg | 11:27 |
*** sean-k-mooney has joined #openstack-nova | 11:29 | |
*** ttsiouts has joined #openstack-nova | 11:30 | |
songwenping | gb:Yeah, leaking many resources in the system is indeed a problem. I just want to keep pace with the cinder logic. | 11:33 |
gibi | songwenping: what is the use case you want to solve? you mentioned deploy and undeploy cyborg. there I think before undeploy the admin needs to clean up the cyborg users. Also mentioned failure in cyborg. If that failure is intermittent (e.g service restart or network interrupt) then I think end user needs to retry the delete. if the cyborg failure is static then that is a cyborg but to be fixed | 11:36 |
gibi | s/but/bug/ | 11:37 |
*** mkrai has quit IRC | 11:37 | |
*** sapd1 has quit IRC | 11:42 | |
*** songwenping_ has joined #openstack-nova | 11:42 | |
songwenping_ | gibi:i want to solve the second use case. | 11:43 |
*** songwenping has quit IRC | 11:44 | |
brinzhang | gibi: I give you use case from my customer | 11:45 |
brinzhang | s/use case/ a use case | 11:45 |
brinzhang | gibi: Due to the system upgrade, the cyborg service cannot be started. If the user wants to clear the instance that contains ‘accel: device_profile_name’ in the flavor, the instance will be in an error state and cannot release scarce resources such as GPU and FPGA. If that is the user's only resource, it may also be considered for manual cleaning. This is common for small customers. | 11:45 |
brinzhang | Of course, this is an edge case. | 11:47 |
gibi | brinzhang: so during an upgrade some of the control plane services are still up (e.g. nova) but some of them are down (e.g cyborg) | 11:47 |
brinzhang | This may be a treatment, but it is not so perfect. | 11:48 |
gibi | but if cyborg is down then who the user could ever free up FPGA resources? | 11:48 |
gibi | s/who/how/ | 11:48 |
gibi | also even if it is freed up it cannot be used again as cyborg is down | 11:49 |
brinzhang | maybe that can be done in the db, this is perhaps the worst case | 11:49 |
*** dpawlik has quit IRC | 11:50 | |
*** dpawlik has joined #openstack-nova | 11:50 | |
gibi | sorry but in my mind if cyborg service is down, then the user cannot and should not do anything with resources managed by cyborg | 11:51 |
brinzhang | gibi: I just put forward such a scenario, and I agree with you, your reasoning is right | 11:51 |
gibi | then we agree that this use case is not valid :) | 11:52 |
songwenping_ | agree with gibi. | 11:52 |
*** dpawlik has quit IRC | 11:52 | |
*** dpawlik has joined #openstack-nova | 11:53 | |
brinzhang | yes, but I think I also need to think how to deal with this scenario, we did encounter this situation. | 11:53 |
*** dpawlik has quit IRC | 11:54 | |
*** dpawlik has joined #openstack-nova | 11:54 | |
brinzhang | another way: power off the server, migrate its instances, then re-deploy OpenStack | 11:55 |
brinzhang | in a new region | 11:55 |
*** ociuhandu has joined #openstack-nova | 11:57 | |
gibi | honestly I don't see why does your deployment need to support manipulating FPGAs while cyborg service is doewn | 11:57 |
gibi | down | 11:57 |
*** belmoreira has joined #openstack-nova | 12:01 | |
brinzhang | yes, it does not make sense. good bye gibi, hope you have a good day ^ | 12:01 |
*** ociuhandu has quit IRC | 12:02 | |
*** rpittau|bbl is now known as rpittau | 12:03 | |
gibi | brinzhang: have a nice afternoon | 12:05 |
*** ttsiouts has quit IRC | 12:10 | |
bauzas | gibi: ack thanks | 12:12 |
bauzas | brinzhang: if you want, ping me tomorrow to discuss something about reviews | 12:12 |
bauzas | brinzhang: (UTC+2 here) | 12:12 |
*** ttsiouts has joined #openstack-nova | 12:12 | |
*** ttsiouts has quit IRC | 12:17 | |
*** jangutter has joined #openstack-nova | 12:18 | |
*** ratailor has quit IRC | 12:20 | |
*** ttsiouts has joined #openstack-nova | 12:22 | |
*** derekh has quit IRC | 12:24 | |
*** ttsiouts has quit IRC | 12:27 | |
*** ttsiouts has joined #openstack-nova | 12:32 | |
*** udesale_ has quit IRC | 12:34 | |
gibi | bauzas: will there be a FUP for the comments from lyarwood in https://review.opendev.org/#/c/712118/ ? | 12:40 |
bauzas | gibi: haven't seen them yet | 12:40 |
bauzas | gibi: FWIW, in https://review.opendev.org/#/c/712118/ like I said in a comment, this change is no longer needed for https://review.opendev.org/#/c/712741/ | 12:41 |
*** Luzi has joined #openstack-nova | 12:42 | |
gibi | bauzas: will you then remove it from the series? | 12:42 |
bauzas | gibi: for the comment nits, sure I can do it in a FUP | 12:42 |
bauzas | gibi: we can merge it since I already provided a ML thread for out-of-tree drivers maintainers | 12:42 |
gibi | bauzas: yes, comment nits are totally FUPable but if the whole patch is not needed then it is even better | 12:42 |
bauzas | gibi: or wait, I'll rebase this one on top of https://review.opendev.org/#/c/712741/ and just provide a new revision for the nits | 12:43 |
bauzas | will be done in 1 min | 12:43 |
gibi | but then the allocation would be an unused param | 12:43 |
gibi | isn't it? | 12:43 |
*** ttsiouts has quit IRC | 12:44 | |
*** ttsiouts has joined #openstack-nova | 12:45 | |
bauzas | gibi: for finish_revert_migration() yes | 12:46 |
bauzas | gibi: to clarify, I'll just provide the new series | 12:46 |
bauzas | and people can discuss on the opportunity to merge https://review.opendev.org/#/c/712118/ if nothing uses the new param or not | 12:46 |
gibi | bauzas: OK, I will check the new series | 12:47 |
bauzas | gibi: should be done in 5 mins, just verifying unittests and functests because of a minor merge conflict | 12:47 |
gibi | cool | 12:49 |
*** priteau has joined #openstack-nova | 12:50 | |
*** derekh has joined #openstack-nova | 12:59 | |
francoisp | gibi hello, we would need an external reviewer (outside of RH) to check on https://review.opendev.org/#/c/669674/ , would you have time to have a look? | 13:00 |
bauzas | gibi: excellent concern FWIW https://review.opendev.org/#/c/712741/6/nova/tests/functional/libvirt/test_vgpu.py@45 | 13:00 |
*** artom has joined #openstack-nova | 13:00 | |
gibi | francoisp: based on a recent agreement you only need to keep the two company rule for high impact changes, anything that | 13:03 |
gibi | involves a microversion, service version, rpc version, or database | 13:03 |
gibi | migration. | 13:03 |
gibi | francoisp: but sure I will look at that bugfix | 13:03 |
gibi | bauzas: honestly I failed to prove that it can actually cause any problem but if you can add something to the setUp to reset the test object level variable that could scratch my itch | 13:04 |
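The request above is about resetting shared state in `setUp` so that one test cannot leak into the next. A generic sketch of that pattern follows. `FakeDriver` and both tests are hypothetical and do not reproduce the code in test_vgpu.py.

```python
import unittest


class FakeDriver:
    """Hypothetical fixture that records calls in a class-level list."""

    created = []  # shared by every instance, and therefore by every test

    def create(self, name):
        FakeDriver.created.append(name)


class DriverTest(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Reset the shared list so each test starts from a known baseline,
        # regardless of which tests ran before it.
        FakeDriver.created = []

    def test_first(self):
        FakeDriver().create("a")
        self.assertEqual(["a"], FakeDriver.created)

    def test_second(self):
        FakeDriver().create("b")
        self.assertEqual(["b"], FakeDriver.created)


def run_driver_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DriverTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Without the reset in `setUp`, whichever test runs second would see the entry left behind by the first, and the outcome would depend on test ordering.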
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize https://review.opendev.org/712741 | 13:04 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Pass allocations to virt drivers when reverting resize https://review.opendev.org/712118 | 13:04 |
francoisp | thanks very much gibi | 13:05 |
gibi | francoisp: for reference http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013553.html | 13:05 |
*** irclogbot_0 has quit IRC | 13:06 | |
lyarwood | gibi: I asked for additional review outside of RH in francoisp's case as that change impacts all callers to cinder across all virt drivers. | 13:07 |
francoisp | ok thanks gibi, that makes sense, otherwise you would get overwhelmed | 13:07 |
gibi | lyarwood: I see that is reasonable | 13:07 |
*** irclogbot_0 has joined #openstack-nova | 13:08 | |
*** mriedem has joined #openstack-nova | 13:09 | |
*** kevinz has quit IRC | 13:09 | |
*** kevinz has joined #openstack-nova | 13:14 | |
bauzas | gibi: if you don't mind reapplying your +2 on the vgpu resize change given the only change was due to a merge conflict resolution https://review.opendev.org/#/c/712741/6..7 | 13:22 |
*** ttsiouts has quit IRC | 13:22 | |
* bauzas goes working on the prelude section | 13:22 | |
*** lbragstad has quit IRC | 13:24 | |
*** lbragstad has joined #openstack-nova | 13:27 | |
gibi | bauzas: done. and thanks for writing the prelude | 13:27 |
bauzas | ta | 13:27 |
*** ociuhandu has joined #openstack-nova | 13:30 | |
gibi | francoisp, lyarwood: +A-d the cinder retry | 13:31 |
*** ttsiouts has joined #openstack-nova | 13:32 | |
francoisp | ok great, thank you gibi | 13:32 |
lyarwood | yup thanks gibi | 13:32 |
*** psachin has quit IRC | 13:34 | |
*** ociuhandu has quit IRC | 13:35 | |
*** ttsiouts has quit IRC | 13:37 | |
*** lbragstad_ has joined #openstack-nova | 13:37 | |
openstackgerrit | jayaditya gupta proposed openstack/nova master: Support for --force flag for nova-manage placement heal_allocations command https://review.opendev.org/715395 | 13:37 |
*** lbragstad has quit IRC | 13:39 | |
*** hoonetorg has quit IRC | 13:40 | |
*** ttsiouts has joined #openstack-nova | 13:42 | |
*** ttsiouts has quit IRC | 13:45 | |
*** ttsiouts has joined #openstack-nova | 13:45 | |
nightmare_unreal | mriedem: thanks for the review :) As you have suggested I have made changes accordingly but I am still facing 1 issue. It seems the allocated ram ( bogus ram) won't change if you call heal allocation with force flag or without force flag :/ I have added comments for it https://review.opendev.org/#/c/715395/10 | 13:47 |
nightmare_unreal | Thanks | 13:47 |
*** eharney has joined #openstack-nova | 13:52 | |
*** tkajinam has joined #openstack-nova | 13:53 | |
-openstackstatus- NOTICE: Zuul is temporarily offline; service should be restored in about 15 minutes. | 13:59 | |
*** hoonetorg has joined #openstack-nova | 13:59 | |
*** ociuhandu has joined #openstack-nova | 14:00 | |
*** mkrai has joined #openstack-nova | 14:03 | |
gmann | melwitt: stephenfin gibi seems like johnthetubaguy is not online. how we should proceed on these last bits to merge as 23rd is hard string freeze - https://review.opendev.org/#/q/topic:bp/policy-defaults-refresh+status:open | 14:05 |
gibi | gmann: I'm on a call I will ping back in an hour | 14:06 |
gibi | but overall I can try to spend 1 hour on those today | 14:06 |
gmann | gibi: thanks | 14:08 |
*** sapd1 has joined #openstack-nova | 14:10 | |
*** songwenping_ has quit IRC | 14:16 | |
*** hongbin has joined #openstack-nova | 14:21 | |
openstackgerrit | sean mooney proposed openstack/nova-specs master: move implemented spec for train https://review.opendev.org/706276 | 14:26 |
openstackgerrit | sean mooney proposed openstack/nova-specs master: move implemented spec for train https://review.opendev.org/706276 | 14:27 |
*** ociuhandu has quit IRC | 14:29 | |
mriedem | nightmare_unreal: that's the point of the feature, correct? if it's not working you're going to need to debug it. | 14:29 |
mriedem | but that's why i asked for that kind of test | 14:30 |
nightmare_unreal | yeaah | 14:31 |
*** dklyle has joined #openstack-nova | 14:31 | |
*** ociuhandu has joined #openstack-nova | 14:35 | |
openstackgerrit | sean mooney proposed openstack/nova-specs master: move implemented spec for ussuri https://review.opendev.org/721278 | 14:35 |
*** hemna_ has quit IRC | 14:35 | |
*** ociuhandu has quit IRC | 14:40 | |
kashyap | lyarwood: Hey, do you have the reproducer for that 'q35' thing on Ubuntu? | 14:40 |
* kashyap goes to check the nova-next WIP job URL... | 14:41 | |
kashyap | Ah, you updated this morning | 14:41 |
kashyap | Okay, it looks like a nudge. (Because, I don't see any 'diff' b/n 3..4: https://review.opendev.org/#/c/708701/3..4/.zuul.yaml) | 14:42 |
*** mlavalle has joined #openstack-nova | 14:57 | |
*** tkajinam has quit IRC | 14:59 | |
lyarwood | kashyap: sorry was hacking away on something downstream | 15:02 |
lyarwood | kashyap: I've just rebased that today to see if it still reproduces | 15:02 |
lyarwood | kashyap: I don't have anything written up, I only manually reproduced it before. | 15:02 |
kashyap | lyarwood: No problem; I asked on the change | 15:02 |
kashyap | (I don't count on instant responses :)) | 15:02 |
*** mgariepy has joined #openstack-nova | 15:03 | |
*** dpawlik has quit IRC | 15:07 | |
*** sapd1 has quit IRC | 15:14 | |
*** mkrai has quit IRC | 15:22 | |
*** mkrai has joined #openstack-nova | 15:22 | |
*** hemna_ has joined #openstack-nova | 15:23 | |
*** hemna_ has quit IRC | 15:24 | |
*** vishalmanchanda has quit IRC | 15:26 | |
*** sapd1 has joined #openstack-nova | 15:26 | |
*** hongbin has quit IRC | 15:28 | |
gibi | gmann: I'm +2 on the remaining policy changes. | 15:33 |
*** gyee has joined #openstack-nova | 15:40 | |
*** ttsiouts has quit IRC | 15:43 | |
*** happyhemant has joined #openstack-nova | 15:45 | |
*** amodi has joined #openstack-nova | 15:46 | |
*** hoonetorg has quit IRC | 15:46 | |
-openstackstatus- NOTICE: Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references. | 15:47 | |
gmann | gibi: thanks. should i revise this as per comment if you are online and can re+2 - https://review.opendev.org/#/c/720129/7 | 15:48 |
*** ttsiouts has joined #openstack-nova | 15:51 | |
*** hoonetorg has joined #openstack-nova | 15:59 | |
*** KeithMnemonic has joined #openstack-nova | 16:00 | |
gibi | gmann: if you respin it then I can re +2 | 16:00 |
gibi | gmann: but I might be slower during my evening | 16:01 |
gmann | gibi: cool. dojng | 16:01 |
gmann | doing | 16:01 |
gibi | cool | 16:01 |
*** ganso has quit IRC | 16:03 | |
*** hongbin has joined #openstack-nova | 16:05 | |
*** ganso has joined #openstack-nova | 16:05 | |
*** rpittau is now known as rpittau|afk | 16:07 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Add docs and releasenotes for BP policy-defaults-refresh https://review.opendev.org/720129 | 16:10 |
gmann | gibi: ^^ | 16:10 |
gibi | looking | 16:10 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add docs and releasenotes for BP policy-defaults-refresh https://review.opendev.org/720129 | 16:12 |
gibi | just fixed a missing verb in the same sentence ^^ | 16:12 |
gibi | but +2 | 16:12 |
*** Luzi has quit IRC | 16:15 | |
gmann | thanks | 16:19 |
*** ttsiouts has quit IRC | 16:22 | |
*** dtantsur is now known as dtantsur|afk | 16:32 | |
*** yaawang has joined #openstack-nova | 16:33 | |
*** yaawang_ has quit IRC | 16:33 | |
*** evrardjp has quit IRC | 16:35 | |
*** evrardjp has joined #openstack-nova | 16:35 | |
*** ociuhandu has joined #openstack-nova | 16:36 | |
*** ttsiouts has joined #openstack-nova | 16:38 | |
*** csatari has quit IRC | 16:54 | |
*** ttsiouts has quit IRC | 16:54 | |
*** csatari has joined #openstack-nova | 16:55 | |
*** ociuhandu has quit IRC | 16:56 | |
*** hemna has joined #openstack-nova | 16:58 | |
*** derekh has quit IRC | 17:03 | |
*** sapd1 has quit IRC | 17:07 | |
*** ociuhandu has joined #openstack-nova | 17:10 | |
stephenfin | gibi: If you can hit https://review.opendev.org/717884 https://review.opendev.org/719100 and https://review.opendev.org/720042 then we're done with policy, afaict | 17:17 |
*** tesseract has quit IRC | 17:18 | |
openstackgerrit | Merged openstack/nova stable/train: libvirt: Calculate disk_over_committed for raw instances https://review.opendev.org/718964 | 17:18 |
* stephenfin -> 🐕🚶 | 17:18 | |
*** priteau has quit IRC | 17:19 | |
*** ociuhandu has quit IRC | 17:22 | |
*** portdirect has quit IRC | 17:30 | |
*** portdirect has joined #openstack-nova | 17:30 | |
*** ttsiouts has joined #openstack-nova | 17:34 | |
openstackgerrit | Merged openstack/nova master: Add retry to cinder API calls related to volume detach https://review.opendev.org/669674 | 17:36 |
*** ttsiouts has quit IRC | 17:39 | |
*** ralonsoh has quit IRC | 17:39 | |
*** evrardjp has quit IRC | 17:44 | |
*** evrardjp has joined #openstack-nova | 17:49 | |
*** billkgr has joined #openstack-nova | 17:52 | |
*** maciejjozefczyk_ has joined #openstack-nova | 17:53 | |
*** maciejjozefczyk has quit IRC | 17:53 | |
*** happyhemant has quit IRC | 17:55 | |
*** hemna has quit IRC | 17:58 | |
*** hemna has joined #openstack-nova | 17:59 | |
*** maciejjozefczyk_ has quit IRC | 18:01 | |
*** ircuser-1 has joined #openstack-nova | 18:01 | |
*** nightmare_unreal has quit IRC | 18:02 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Fix the followup comment of policy doc https://review.opendev.org/721322 | 18:06 |
*** billkgr has quit IRC | 18:07 | |
*** ttsiouts has joined #openstack-nova | 18:11 | |
*** billkgr has joined #openstack-nova | 18:12 | |
*** ociuhandu has joined #openstack-nova | 18:15 | |
artom | gmann, wait, are we trying to land that policy doc before RC? | 18:18 |
artom | Didn't mean to sabotage that - but then my -1 carries less weight than gibi's +2, so :) | 18:19 |
*** ociuhandu has quit IRC | 18:20 | |
*** ociuhandu has joined #openstack-nova | 18:20 | |
*** hongbin has quit IRC | 18:21 | |
gmann | artom: yeah before RC. I am fixing your comment in follow up along with stephenfin comments -https://review.opendev.org/721322 | 18:23 |
openstackgerrit | Merged openstack/nova master: Introduce scope_types in servers attributes Policies https://review.opendev.org/719729 | 18:24 |
*** mkrai has quit IRC | 18:24 | |
*** mkrai_ has joined #openstack-nova | 18:24 | |
*** ttsiouts has quit IRC | 18:26 | |
*** ttsiouts has joined #openstack-nova | 18:26 | |
*** hongbin has joined #openstack-nova | 18:26 | |
*** hongbin has quit IRC | 18:27 | |
openstackgerrit | Merged openstack/nova master: Add new default roles in servers attributes policies https://review.opendev.org/719730 | 18:33 |
openstackgerrit | Merged openstack/nova master: Add test coverage of existing remaining servers policies https://review.opendev.org/720104 | 18:34 |
openstackgerrit | Merged openstack/nova master: Introduce scope_types in remaining servers Policies https://review.opendev.org/720106 | 18:34 |
openstackgerrit | Merged openstack/nova master: Add new default roles in remaining servers policies https://review.opendev.org/720116 | 18:34 |
*** jrosser has quit IRC | 18:41 | |
*** jrosser has joined #openstack-nova | 18:42 | |
*** slaweq has quit IRC | 18:50 | |
openstackgerrit | Ghanshyam Mann proposed openstack/nova master: Fix the followup comment of policy doc https://review.opendev.org/721322 | 18:54 |
gmann | stephenfin: artom i fixed the policy doc comment in followup, please check - https://review.opendev.org/#/c/721322/ | 18:54 |
*** slaweq has joined #openstack-nova | 18:56 | |
*** ociuhandu has quit IRC | 18:57 | |
*** ociuhandu has joined #openstack-nova | 18:58 | |
*** jdillaman has joined #openstack-nova | 19:00 | |
*** slaweq_ has joined #openstack-nova | 19:02 | |
*** ociuhandu has quit IRC | 19:03 | |
*** slaweq has quit IRC | 19:05 | |
*** ttsiouts has quit IRC | 19:06 | |
*** belmoreira has quit IRC | 19:10 | |
openstackgerrit | Merged openstack/nova master: Fix follow up comments on policy work https://review.opendev.org/717835 | 19:13 |
openstackgerrit | Merged openstack/nova master: Pass allocations to virt drivers when resizing https://review.opendev.org/589085 | 19:17 |
*** ttsiouts has joined #openstack-nova | 19:42 | |
*** ttsiouts has quit IRC | 19:44 | |
*** ttsiouts has joined #openstack-nova | 19:44 | |
*** grandchild has joined #openstack-nova | 19:47 | |
*** ttsiouts has quit IRC | 19:49 | |
*** ociuhandu has joined #openstack-nova | 19:50 | |
mnaser | sean-k-mooney: sorry to ping you here but i don't know what other channel to find you in -- happy to hear your thoughts on https://review.opendev.org/#/c/720107/3 :) | 19:56 |
sean-k-mooney | mnaser: i'm usually in nova, neutron, placement, kolla and sometimes infra or oslo | 19:57 |
*** ociuhandu has quit IRC | 19:57 | |
sean-k-mooney | but ya i'll take a look now | 19:58 |
mnaser | sean-k-mooney: fair :) whois showed a lot less than those today :P | 19:58 |
sean-k-mooney | whois sean-k-mooney | 19:58 |
sean-k-mooney | i have a few but ya, so container images | 19:59 |
sean-k-mooney | fun | 19:59 |
sean-k-mooney | i'm not sure that it's fair to describe kolla images as system images, e.g. lxc style, but they are not that lightweight either | 20:00 |
*** ccamacho has quit IRC | 20:01 | |
zigo | What's blocking this backport patch ? https://review.opendev.org/#/c/711233/ | 20:04 |
zigo | The bug https://bugs.launchpad.net/nova/+bug/1788014 is causing real life troubles and a fix would be really nice. | 20:04 |
openstack | Launchpad bug 1788014 in OpenStack Compute (nova) rocky "when live migration fails due to a internal error rollback is not handeled correctly." [Medium,In progress] - Assigned to Elod Illes (elod-illes) | 20:04 |
zigo | We had all sorts of down time due to it, lots of head scratching until we understood what was going on... | 20:05 |
melwitt | elod: question for your morrow ^ | 20:08 |
*** ociuhandu has joined #openstack-nova | 20:10 | |
*** grandchild has quit IRC | 20:13 | |
*** ttsiouts has joined #openstack-nova | 20:18 | |
*** ttsiouts has quit IRC | 20:28 | |
*** ttsiouts has joined #openstack-nova | 20:28 | |
*** ociuhandu has quit IRC | 20:30 | |
*** billkgr has quit IRC | 20:31 | |
*** xek has quit IRC | 20:33 | |
*** ociuhandu has joined #openstack-nova | 21:03 | |
*** ociuhandu has quit IRC | 21:08 | |
*** jangutter has quit IRC | 21:16 | |
*** avolkov has quit IRC | 21:31 | |
*** dosaboy has quit IRC | 21:46 | |
*** dosaboy has joined #openstack-nova | 22:02 | |
*** ociuhandu has joined #openstack-nova | 22:04 | |
*** rcernin has quit IRC | 22:05 | |
*** rcernin has joined #openstack-nova | 22:06 | |
*** mriedem has left #openstack-nova | 22:11 | |
*** ociuhandu has quit IRC | 22:18 | |
*** ociuhandu has joined #openstack-nova | 22:18 | |
*** tosky has quit IRC | 22:19 | |
*** ociuhandu has quit IRC | 22:23 | |
*** ttsiouts has quit IRC | 22:27 | |
*** abaindur has joined #openstack-nova | 22:34 | |
*** jangutter has joined #openstack-nova | 22:39 | |
*** tkajinam has joined #openstack-nova | 22:43 | |
*** jangutter has quit IRC | 22:45 | |
*** abaindur has quit IRC | 22:46 | |
*** abaindur has joined #openstack-nova | 22:46 | |
abaindur | Hello, I have a question about post copy live migration. What happens if live_migration_permit_post_copy is only set on nova compute on some hypervisors? Does it need to be the same across every host? | 22:47 |
sean-k-mooney | mnaser: this is my counter proposal https://review.opendev.org/#/c/720107/3/goals/proposed/container-images.rst@14 | 22:47 |
*** ttsiouts has joined #openstack-nova | 23:04 | |
*** ttsiouts has quit IRC | 23:09 | |
sean-k-mooney | abaindur: i think it is based on the source node | 23:11 |
sean-k-mooney | but we don't test it, so it should be the same on all nodes, but it might work if it's different | 23:12 |
abaindur | would there be any issues if we migrated from a source host that had post copy enabled, but a destination host that didn't? | 23:12 |
abaindur | we want to give it a shot - but only wanted to run it on a subset of hypervisors | 23:12 |
abaindur | sean-k-mooney: one other question about live migration: reason we are going to post-copy is because we're seeing significant downtime (15 - 30+ seconds) during live migration. Seems to always start when VM is Paused on source/Resumed on dest, then start working shortly after port-binding activate call is made, and port is plugged on the host | 23:14 |
abaindur | We thought that maybe giving post-copy a shot would help, since it would give us the benefit of this fix: https://opendev.org/openstack/nova/commit/1f48d3d83b4d5f6f9cd96ee06d2fc005635c1ff9 | 23:15 |
abaindur | But are there any known issues around pre-copy live migration? Bulk of the time seems to be taken up in _post_copy_live_migration() function on the source host. For example it took 18+ seconds from start of that function until the port-binding activate call was sent to neutron | 23:16 |
abaindur | sorry, not _post_copy_live_migration(). I meant _post_live_migration() function | 23:18 |
*** threestrands has joined #openstack-nova | 23:18 | |
*** slaweq_ has quit IRC | 23:21 | |
sean-k-mooney | abaindur: libvirt will check if the qemu and libvirt on each host support it | 23:22 |
sean-k-mooney | and only enable it if both do i believe | 23:22 |
sean-k-mooney | so in principle i don't think it would have a negative effect, just be aware that you would see different behavior migrating to a host with it enabled vs migrating from a host with it enabled | 23:23 |
sean-k-mooney | i don't recall off the top of my head which config we check to enable it but i believe it would have asymmetric behavior as i think we only check one of them | 23:24 |
*** slaweq_ has joined #openstack-nova | 23:25 | |
sean-k-mooney | abaindur: for what it is worth the port binding events should work with or without post copy | 23:25 |
sean-k-mooney | _post_live_migration is the function that cleans up the image on the source node and finishes any work required on the dest | 23:27 |
sean-k-mooney | if you are using a nova and neutron that do not support multiple port binding, _post_live_migration_at_dest is where we will do the port binding | 23:28 |
sean-k-mooney | but in a nova that supports the neutron multiple port binding api we will prebind the port on the source and activate it either in response to the live migration event or at the start of _post_live_migration | 23:29 |
abaindur | yea, but we noticed that in pre-copy LV mode, the port binding activate call is sent towards the end of that function. And connectivity is disrupted when VM is Paused. | 23:29 |
abaindur | https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6931 | 23:29 |
sean-k-mooney | ya so are you using ovs with the ovs firewall driver by any chance | 23:29 |
*** slaweq_ has quit IRC | 23:29 | |
abaindur | and here is where the port binding activation call is invoked: https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L7005 | 23:29 |
sean-k-mooney | or are you using iptables | 23:29 |
abaindur | between those 2 lines of code, we noticed nova taking 18+ seconds - 2 seconds for volume cleanup, few more seconds for disconnection, 5 sec for get_instance_nw_info, 5 more sec for compute_utils.notify_about_instance_action, etc... | 23:30 |
abaindur | Once the port binding activate call came, another 3-5 seconds for port to be plugged by OVS-agent | 23:31 |
abaindur | we're using iptables | 23:31 |
sean-k-mooney | on what release | 23:31 |
abaindur | Rocky | 23:31 |
sean-k-mooney | the iptables firewall should have less downtime than openvswitch | 23:31 |
sean-k-mooney | as we can pre plug the port and have neutron wire it up while we are waiting for migration to happen at the libvirt level. | 23:32 |
sean-k-mooney | when using the ovs firewall driver, because libvirt recreates the ovs port it takes longer as neutron has to do it twice | 23:33 |
abaindur | as mentioned, we're seeing non-trivial downtime in that _post_live_migration() function on the src host, between the log "'_post_live_migration() is started.." (or when VM is Paused), and when neutron receives the port-binding /activate call at https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L7005 | 23:33 |
sean-k-mooney | abaindur: yes but before the multiple port binding change it used to be even longer | 23:33 |
abaindur | so is that down time then to be expected or unavoidable? :( | 23:34 |
sean-k-mooney | no its avoidable | 23:34 |
sean-k-mooney | so this is the feature you are trying to use https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html | 23:34 |
sean-k-mooney | i did not think this required post-copy but let me double check | 23:34 |
abaindur | well thats why we were considering trying out post-copy - to see if it speeds up when nova makes the port binding call | 23:35 |
sean-k-mooney | ok so yes if you want the quick setup we are waiting for VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY | 23:35 |
abaindur | post-copy seems to trigger that call based on VIR_DOMAIN_EVENT states | 23:35 |
abaindur | pre-copy waits for that _post_live_migration() on the host to invoke network_api.migrate_instance_start(), which seems to take a while for us | 23:36 |
sean-k-mooney | that will allow us to activate the port binding from the source earlier than _post_live_migration | 23:36 |
abaindur | Right, thats why we will try post-copy mode. But was wondering if the downtime/delay we are seeing with pre-copy is expected or unavoidable? | 23:37 |
sean-k-mooney | abaindur: ya so it is a little strange that your _post_live_migration function takes so long to complete | 23:37 |
sean-k-mooney | i would not expect _post_live_migration to take multiple seconds | 23:37 |
abaindur | what are downsides to post copy besides that VM needs to be rebooted if theres a live migration error? | 23:38 |
abaindur | and page faults may slow down the VM as memory needs to be copied over the network? | 23:39 |
*** ttsiouts has joined #openstack-nova | 23:39 | |
sean-k-mooney | that is the main one. if there is a network outage while it's still in post copy phase then the vm will crash | 23:39 |
sean-k-mooney | yes page faults across the network might, but all writes happen locally | 23:39 |
sean-k-mooney | so the vm will only pause if it needs to read uncopied data | 23:39 |
sean-k-mooney | and once it's copied, local updates to it will happen in the dest memory | 23:40 |
sean-k-mooney | as you pointed out we are activating the port here https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L7005 | 23:40 |
sean-k-mooney | which happens near the start of the function so the delay is likely related to cinder performance | 23:40 |
sean-k-mooney | abaindur: have you tried live migrating vms that don't have cinder volumes | 23:41 |
abaindur | yea, we tried VMs both volume and ephemeral based | 23:41 |
sean-k-mooney | did you see the same delay? | 23:41 |
abaindur | pretty much | 23:42 |
abaindur | timed the volume code with some logs of our own, heres what we observed: | 23:42 |
abaindur | 1.913 seconds for _get_instance_block_device_info and self.driver.post_live_migration | 23:42 |
abaindur | about 0.4 sec for self.driver.get_volume_connector(instance) | 23:42 |
abaindur | 2.659 seconds for self.volume_api.terminate_connection() | 23:42 |
abaindur | 3.28 seconds for network_info = self.network_api.get_instance_nw_info(ctxt, instance) | 23:42 |
abaindur | 5.019 seconds for self._notify_about_instance_usage(ctxt, instance, "live_migration._post.start", network_info=network_info) | 23:43 |
abaindur | another almost 5 seconds nova spent just making the port binding activate API call, seems to be spending time in keystone and oslo_concurrency.lockutils code | 23:43 |
sean-k-mooney | is that absolute time or time for each function | 23:43 |
abaindur | yea, we just added logs throughout that post_live_migration() function | 23:43 |
abaindur | before/after each of those calls to see what was taking so long | 23:43 |
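The before/after logging described above can be sketched as a small timing helper. This is only an illustration, not the code abaindur actually added: the `timed` helper, the `fake_terminate_connection` stand-in, and the label names are hypothetical, and nothing here calls real nova APIs.

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def timed(label, results=None):
    """Log how long the wrapped block took (hypothetical helper)."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        LOG.info("%s took %.3f seconds", label, elapsed)
        if results is not None:
            # Keep the per-call duration so the slow steps can be compared.
            results[label] = elapsed


def fake_terminate_connection():
    # Stand-in for a slow call such as volume_api.terminate_connection().
    time.sleep(0.01)


timings = {}
with timed("terminate_connection", timings):
    fake_terminate_connection()
```

Because each block records its own duration, the numbers are per-call times rather than absolute timestamps, which avoids the subtraction step discussed in the log.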
sean-k-mooney | ok so i subtract those numbers to get the time for each | 23:44 |
sean-k-mooney | that is still very slow | 23:44 |
abaindur | for example, in one live migration, we saw: VM Paused at 25:06.192 | 23:44 |
sean-k-mooney | abaindur: are you using memcache for your keystone auth tokens | 23:44 |
abaindur | by time we saw nova-compute make the port binding call, it was at: 35:24.694 | 23:45 |
abaindur | VM Paused at 35:06.192 * | 23:46 |
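As a quick check of the gap between the two log timestamps quoted above (VM paused at 35:06.192, port binding call at 35:24.694), a minimal sketch follows. It assumes both timestamps fall within the same hour and use a minutes:seconds.milliseconds layout; the `elapsed_seconds` helper is hypothetical.

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%M:%S.%f"):
    """Return seconds between two 'MM:SS.mmm' timestamps in the same hour."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps taken from the messages above.
gap = elapsed_seconds("35:06.192", "35:24.694")
print(f"{gap:.3f} seconds between VM pause and port binding call")
```

This gives roughly 18.5 seconds, consistent with the "18+ seconds" reported earlier in the conversation.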
*** jangutter has joined #openstack-nova | 23:46 | |
sean-k-mooney | abaindur: can you check your nova.conf and see if you have https://zuul.opendev.org/t/openstack/build/dcde79801a624c25b195a46ead7af562/log/controller/logs/etc/nova/nova-cpu_conf.txt#62-68 | 23:46 |
sean-k-mooney | also 10 seconds to bind the port in precopy mode is too high so there is something else hurting performance, which is why i'm suspecting you do not have caching of keystone configured correctly. | 23:49 |
*** jangutter has quit IRC | 23:52 | |
abaindur | Ok no, memcache_servers is not set... | 23:52 |
abaindur | this is for nova compute on the hypervisor side right? | 23:52 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6982-L7008 could be safely moved to https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6939 by the way and in the libvirt case https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6933 would be fine too. | 23:53 |
sean-k-mooney | am, it will be used both on the controller and computes | 23:53 |
sean-k-mooney | so you are missing | 23:54 |
sean-k-mooney | [keystone_authtoken] | 23:54 |
sean-k-mooney | memcached_servers = localhost:11211 | 23:54 |
sean-k-mooney | or well your actual memcache servers | 23:54 |
abaindur | yeah, I dont see that config opt ever ok | 23:54 |
sean-k-mooney | this will be used to cache auth tokens for every api call we make | 23:54 |
abaindur | i dont see it ever set* | 23:55 |
sean-k-mooney | abaindur: so at least on the controller side it has a significant impact on the api | 23:58 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1836642 | 23:58 |
openstack | Launchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Incomplete] - Assigned to Slawek Kaplonski (slaweq) | 23:58 |
sean-k-mooney | we address this problem in the gate by turning on caching with https://github.com/openstack/devstack/commit/d33cdd01f83b891b010e0fd238f1816910f3fd77 | 23:58 |
abaindur | I dont see it on controller either | 23:59 |
sean-k-mooney | i am not sure if it is used by the compute node, but i think it will be whenever it is calling cinder, neutron or placement | 23:59 |
sean-k-mooney | abaindur: its optional | 23:59 |
sean-k-mooney | but it improves performance a lot | 23:59 |
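To check whether a given nova.conf sets the option discussed above, something like the following sketch could be used. It only parses the file text with Python's standard `configparser`; the `memcached_servers` helper and the sample config strings are hypothetical, and a real nova.conf may need more care (oslo.config has its own parsing rules).

```python
import configparser


def memcached_servers(conf_text):
    """Return the memcached servers listed under [keystone_authtoken]."""
    # strict=False tolerates duplicate options; interpolation is disabled
    # so that '%' characters in other values do not raise errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    value = parser.get("keystone_authtoken", "memcached_servers", fallback="")
    return [server.strip() for server in value.split(",") if server.strip()]


# Sample config text for illustration only.
with_cache = """
[DEFAULT]
debug = false

[keystone_authtoken]
memcached_servers = localhost:11211
"""
without_cache = "[DEFAULT]\ndebug = false\n"

print(memcached_servers(with_cache))
print(memcached_servers(without_cache))
```

An empty list means token caching is not pointed at any memcached server, which is the situation abaindur describes for both the compute and controller configs.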