*** rcernin has joined #openstack-nova | 00:14 | |
*** Techy2493 has quit IRC | 00:21 | |
*** rcernin has quit IRC | 00:22 | |
*** ircuser-1 has quit IRC | 00:22 | |
*** rcernin has joined #openstack-nova | 00:22 | |
*** tosky has quit IRC | 00:26 | |
*** LinPeiWen has joined #openstack-nova | 00:55 | |
*** swp20 has joined #openstack-nova | 01:53 | |
*** swp20 has quit IRC | 02:05 | |
*** sapd1 has joined #openstack-nova | 02:19 | |
*** macz_ has joined #openstack-nova | 03:17 | |
*** psachin has joined #openstack-nova | 03:22 | |
*** macz_ has quit IRC | 03:22 | |
*** rcernin has quit IRC | 03:22 | |
*** mkrai has joined #openstack-nova | 03:24 | |
*** hemanth_n has joined #openstack-nova | 03:27 | |
*** rcernin has joined #openstack-nova | 04:02 | |
*** rcernin has quit IRC | 04:07 | |
*** tinwood has quit IRC | 04:10 | |
*** mkrai has quit IRC | 04:10 | |
*** tinwood has joined #openstack-nova | 04:13 | |
*** mkrai has joined #openstack-nova | 04:15 | |
*** rcernin has joined #openstack-nova | 04:48 | |
*** rcernin has quit IRC | 04:48 | |
*** sapd1 has quit IRC | 04:53 | |
*** vishalmanchanda has joined #openstack-nova | 05:11 | |
*** rcernin has joined #openstack-nova | 05:26 | |
*** whoami-rajat_ has joined #openstack-nova | 05:51 | |
*** khomesh24 has joined #openstack-nova | 06:17 | |
*** ralonsoh has joined #openstack-nova | 06:43 | |
*** gokhani has joined #openstack-nova | 06:49 | |
*** Luzi has joined #openstack-nova | 07:02 | |
Luzi | gibi: i made a backport to victoria as you told me: https://review.opendev.org/c/openstack/nova/+/781211 | 07:04 |
*** pawan-gupta_ has joined #openstack-nova | 07:36 | |
*** mkrai has quit IRC | 07:39 | |
*** mkrai_ has joined #openstack-nova | 07:39 | |
*** slaweq has quit IRC | 07:40 | |
pawan-gupta_ | Hi, I am trying to create an instance using `adminPass` option and `nova.conf` is updated with `inject_password = True` but the provided password in `adminPass` does not work. I am using KVM hypervisor. Am I missing something? | 07:42 |
*** slaweq has joined #openstack-nova | 07:46 | |
*** ociuhandu has joined #openstack-nova | 07:46 | |
frickler | pawan-gupta_: are you running under py3? iirc this feature is broken there, check for errors in your logs | 07:49 |
gibi | Luzi: awesome, thank you. I've added two stable cores to the review to get it merged :) | 07:52 |
Luzi | gibi, thank you :) | 07:52 |
*** ociuhandu has quit IRC | 07:52 | |
gibi | frickler: I guess you are referring to https://review.opendev.org/c/openstack/nova/+/781211 | 07:56 |
gibi | frickler: sorry | 07:56 |
gibi | not that | 07:56 |
gibi | frickler: this https://bugs.launchpad.net/nova/+bug/1882421 | 07:56 |
openstack | Launchpad bug 1882421 in OpenStack Compute (nova) "inject_password fails with python3" [Medium,Confirmed] | 07:56 |
*** mkrai_ has quit IRC | 07:58 | |
*** khomesh24 has quit IRC | 08:08 | |
*** andrewbonney has joined #openstack-nova | 08:13 | |
*** rpittau|afk is now known as rpittau | 08:19 | |
*** rcernin has quit IRC | 08:23 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: trivial: fix word duplication in api ref https://review.opendev.org/c/openstack/nova/+/782028 | 08:25 |
*** mkrai has joined #openstack-nova | 08:28 | |
gibi | bauzas: hi! the rpc bump is in merge conflict | 08:28 |
*** sapd1 has joined #openstack-nova | 08:29 | |
pawan-gupta_ | frickler: thanks for the response but I am running it under py2.7.5 | 08:31 |
*** ociuhandu has joined #openstack-nova | 08:33 | |
*** ociuhandu has quit IRC | 08:38 | |
*** tosky has joined #openstack-nova | 08:49 | |
*** ociuhandu has joined #openstack-nova | 08:56 | |
*** lucasagomes has joined #openstack-nova | 08:56 | |
*** rcernin has joined #openstack-nova | 08:58 | |
*** macz_ has joined #openstack-nova | 09:02 | |
*** macz_ has quit IRC | 09:06 | |
*** sapd1 has quit IRC | 09:12 | |
*** ociuhandu has quit IRC | 09:18 | |
*** mkrai has quit IRC | 09:22 | |
*** mkrai_ has joined #openstack-nova | 09:22 | |
*** derekh has joined #openstack-nova | 09:27 | |
*** rcernin has quit IRC | 09:37 | |
*** ociuhandu has joined #openstack-nova | 09:44 | |
*** ociuhandu has quit IRC | 09:46 | |
*** ociuhandu_ has joined #openstack-nova | 09:46 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Add compute rpc version alias for wallaby as 5.13 https://review.opendev.org/c/openstack/nova/+/782115 | 09:49 |
*** viks____ has joined #openstack-nova | 09:56 | |
bauzas | gibi: yup, I need to rebase it | 09:58 |
bauzas | gibi: I'm working on the prelude atm | 09:58 |
gibi | bauzas: ack, thanks | 10:03 |
*** sapd1 has joined #openstack-nova | 10:04 | |
*** links has joined #openstack-nova | 10:06 | |
*** dtantsur|afk is now known as dtantsur | 10:06 | |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Update min supported service version for Xena. https://review.opendev.org/c/openstack/nova/+/782171 | 10:13 |
*** mkrai_ has quit IRC | 10:13 | |
*** ociuhandu_ has quit IRC | 10:16 | |
*** mkrai has joined #openstack-nova | 10:16 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Wallaby 22.0.0 prelude section https://review.opendev.org/c/openstack/nova/+/782172 | 10:17 |
bauzas | gibi: first round of prelude ^ | 10:17 |
gibi | bauzas: on it :) thank you | 10:17 |
bauzas | I need to dad taxi | 10:17 |
bauzas | gibi: do we have deprecations or removals this cycle ? AFAIK, nope | 10:17 |
* bauzas disappears for 20 mins-ish | 10:17 | |
gibi | bauzas: I will check that but I don't remember any deprecation | 10:22 |
lyarwood | gibi / bauzas ; do workarounds count? | 10:28 |
* lyarwood isn't sure if you're just talking about configurable deprecations or feature deprecation | 10:29 | |
*** stephenfin has joined #openstack-nova | 10:33 | |
*** ociuhandu has joined #openstack-nova | 10:34 | |
*** ociuhandu has quit IRC | 10:36 | |
*** ociuhandu has joined #openstack-nova | 10:37 | |
*** ociuhandu has quit IRC | 10:37 | |
*** ociuhandu has joined #openstack-nova | 10:38 | |
*** ociuhandu has quit IRC | 10:38 | |
*** ociuhandu has joined #openstack-nova | 10:38 | |
*** ociuhandu has quit IRC | 10:38 | |
*** ociuhandu has joined #openstack-nova | 10:40 | |
gibi | lyarwood: it is in the context of the reno prelude. If this is important to highlight in the prelude then let's do that | 10:42 |
lyarwood | gibi: it already has its own release note so I wouldn't bother tbh, I just wasn't sure what the criteria were for the prelude | 10:42 |
gibi | I think there are no hard criteria | 10:43 |
*** ociuhandu has quit IRC | 10:43 | |
*** ociuhandu has joined #openstack-nova | 10:44 | |
MrClayPole | Hi, we are looking to start regular "apt" patching in our test environment with a view to pushing it to production. Currently we are running Ubuntu 18.04.1 with OSA Rocky in test. Is there anything we need to be aware of or look at first, from a Nova perspective, before we start testing? I believe there may have been some issues in the past with live migration. Is that still an issue? | 10:51 |
gibi | MrClayPole: I suggest to read https://docs.openstack.org/nova/latest/user/upgrade.html | 10:53 |
MrClayPole | Thanks I'll take a look | 10:53 |
gibi | MrClayPole: and I also suggest reading the release notes related to the version you are upgrading to https://docs.openstack.org/releasenotes/nova/ | 10:53 |
*** mgariepy has quit IRC | 10:54 | |
*** brtknr has quit IRC | 10:56 | |
*** mkrai has quit IRC | 11:00 | |
*** gokhani has quit IRC | 11:02 | |
*** stephenfin has quit IRC | 11:04 | |
*** stephenfin has joined #openstack-nova | 11:05 | |
*** rcernin has joined #openstack-nova | 11:13 | |
*** ociuhandu has quit IRC | 11:18 | |
*** rcernin has quit IRC | 11:18 | |
*** ociuhandu has joined #openstack-nova | 11:28 | |
*** ociuhandu has quit IRC | 11:30 | |
*** ociuhandu has joined #openstack-nova | 11:31 | |
*** sapd1 has quit IRC | 11:38 | |
*** rcernin has joined #openstack-nova | 11:44 | |
*** mkrai has joined #openstack-nova | 11:56 | |
*** mgariepy has joined #openstack-nova | 12:02 | |
*** whoami-rajat_ is now known as whoami-rajat | 12:04 | |
*** artom has joined #openstack-nova | 12:05 | |
*** ociuhandu has quit IRC | 12:09 | |
*** rcernin has quit IRC | 12:31 | |
*** rcernin has joined #openstack-nova | 12:31 | |
gibi | bauzas: left feedback in the reno prelude | 12:32 |
*** ociuhandu has joined #openstack-nova | 12:39 | |
*** jamesdenton has quit IRC | 12:48 | |
*** jamesdenton has joined #openstack-nova | 12:49 | |
*** ociuhandu has quit IRC | 13:02 | |
*** gokhani has joined #openstack-nova | 13:04 | |
*** psachin has quit IRC | 13:04 | |
*** pawan-gupta_ has quit IRC | 13:05 | |
*** brinzhang has joined #openstack-nova | 13:05 | |
*** ociuhandu has joined #openstack-nova | 13:08 | |
*** hemna has quit IRC | 13:18 | |
*** hemna has joined #openstack-nova | 13:18 | |
*** hemanth_n has quit IRC | 13:20 | |
*** hemna has quit IRC | 13:30 | |
*** tbachman has joined #openstack-nova | 13:30 | |
*** hemna has joined #openstack-nova | 13:30 | |
*** hemna has quit IRC | 13:36 | |
*** amodi has joined #openstack-nova | 13:37 | |
*** sean-k-mooney has joined #openstack-nova | 13:37 | |
*** hemna has joined #openstack-nova | 13:39 | |
*** lpetrut has joined #openstack-nova | 13:41 | |
*** ociuhandu has quit IRC | 13:44 | |
*** ociuhandu has joined #openstack-nova | 13:46 | |
*** ociuhandu has quit IRC | 13:46 | |
*** ociuhandu has joined #openstack-nova | 13:49 | |
*** ociuhandu has quit IRC | 13:50 | |
*** ociuhandu has joined #openstack-nova | 13:50 | |
lyarwood | elod / melwitt / bauzas ; some stable/victoria changes ready for review if anyone has time btw - https://review.opendev.org/c/openstack/nova/+/773320 https://review.opendev.org/c/openstack/nova/+/772480 https://review.opendev.org/c/openstack/nova/+/773320 https://review.opendev.org/c/openstack/nova/+/773321/ | 13:53 |
* lyarwood is trying to burn down his stable backlog of reviews | 13:53 | |
lyarwood | if there's anything I can review in return let me know! | 13:53 |
*** ociuhandu has quit IRC | 13:55 | |
*** ociuhandu has joined #openstack-nova | 13:58 | |
*** ociuhandu has quit IRC | 14:00 | |
elod | lyarwood: ack, will try to review some | 14:00 |
*** ociuhandu has joined #openstack-nova | 14:01 | |
*** jhesketh has quit IRC | 14:02 | |
*** jhesketh has joined #openstack-nova | 14:04 | |
*** rcernin has quit IRC | 14:05 | |
*** luksky has joined #openstack-nova | 14:16 | |
*** ociuhandu_ has joined #openstack-nova | 14:21 | |
*** ociuhandu has quit IRC | 14:22 | |
*** ociuhandu has joined #openstack-nova | 14:23 | |
*** ociuhand_ has joined #openstack-nova | 14:24 | |
*** ociuhandu has quit IRC | 14:24 | |
*** ociuhandu has joined #openstack-nova | 14:25 | |
*** ociuhandu_ has quit IRC | 14:26 | |
*** ociuhand_ has quit IRC | 14:28 | |
gibi | cores, as far as I see we have a sort of rule that only admins can list deleted instances. What about soft-deleted instances? As far as I see, today only admins can list soft-deleted instances. So if a user knows the uuid of their accidentally deleted server then they can restore it, but if they only know the name of the server then there is no way for the user to list the soft-deleted instances to find out | 14:31 |
gibi | which one they want to restore | 14:31 |
*** sapd1 has joined #openstack-nova | 14:31 | |
*** ociuhandu_ has joined #openstack-nova | 14:33 | |
gibi | so if I know the uuid then I can show it and restore it, but if I only know the name then there is no way I can find the uuid | 14:34 |
*** ociuhandu has quit IRC | 14:34 | |
*** ociuhand_ has joined #openstack-nova | 14:34 | |
*** lbragstad__ is now known as lbragstad | 14:36 | |
*** ociuhandu_ has quit IRC | 14:38 | |
*** jangutter_ has joined #openstack-nova | 14:39 | |
*** jangutter has quit IRC | 14:42 | |
gibi | we do not list soft-delete vms by default https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/db/sqlalchemy/api.py#L1557 and a filter like vm_state=soft-delete does not change this | 14:43 |
sean-k-mooney | gibi: I'm not sure that only admins can list deleted instances, or at least all info about them | 14:43 |
sean-k-mooney | gibi: what I'm thinking about is the simple tenant usage api | 14:43 |
sean-k-mooney | where the interval will include the usage for deleted instances that were active in that interval | 14:44 |
sean-k-mooney | we may or may not require admin for listing them normally, I haven't checked the api ref | 14:44 |
gibi | sean-k-mooney: https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/api/openstack/compute/servers.py#L268 | 14:44 |
sean-k-mooney | but normal tenants have at least limited awareness | 14:44 |
*** jraju__ has joined #openstack-nova | 14:45 | |
gibi | so GET /servers does not return deleted instances to non admins | 14:45 |
sean-k-mooney | gibi: are you sure that takes effect if you use v2? | 14:45 |
*** links has quit IRC | 14:45 | |
sean-k-mooney | i guess it should | 14:46 |
sean-k-mooney | "Show deleted items only. In some circumstances deleted items will still be accessible via the backend database, however there is no contract on how long, so this parameter should be used with caution. 1, t, true, on, y and yes are treated as True (case-insensitive). Other than them are treated as False. | 14:47 |
sean-k-mooney | This parameter is only valid when specified by administrators. If non-admin users specify this parameter, it is ignored." | 14:47 |
sean-k-mooney | so ya admin only in the api ref too | 14:47 |
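The rule quoted from the api ref above can be sketched in Python. This is only an illustration of the quoted text, not Nova's actual code: the function name `parse_deleted_flag` and its `is_admin` argument are hypothetical.

```python
# Hypothetical helper illustrating the quoted api-ref rule; not Nova's code.
TRUE_STRINGS = {"1", "t", "true", "on", "y", "yes"}


def parse_deleted_flag(raw_value, is_admin):
    """Return the effective 'deleted' filter, or None when it is ignored."""
    if not is_admin:
        # Per the quoted text, a non-admin's value is silently ignored.
        return None
    # Case-insensitive match; anything else is treated as False.
    return str(raw_value).lower() in TRUE_STRINGS


print(parse_deleted_flag("Yes", is_admin=True))    # True
print(parse_deleted_flag("maybe", is_admin=True))  # False
print(parse_deleted_flag("true", is_admin=False))  # None
```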
gmann | yes, for v2 or v2.1, deleted instances are admin only | 14:48 |
gibi | and I agree that we cannot guarantee data about deleted instances | 14:48 |
gibi | so there the admin-onlyness is OK to me | 14:48 |
sean-k-mooney | yep same | 14:49 |
gibi | but soft-delete instances are always kept in the db so there we could return them to the owner | 14:49 |
sean-k-mooney | soft delete is not even guaranteed by the api | 14:49 |
sean-k-mooney | well isn't soft delete restoration an admin only api too? | 14:49 |
gibi | no | 14:49 |
gibi | soft delete is allowed to owner | 14:50 |
gmann | yeah admin-or-owner | 14:50 |
gibi | Policy defaults enable only users with the administrative role or the owner of the server to perform this operation | 14:50 |
gibi | yepp | 14:50 |
sean-k-mooney | ah https://docs.openstack.org/api-ref/compute/?expanded=list-servers-detail,restore-soft-deleted-instance-restore-action-detail#restore-soft-deleted-instance-restore-action | 14:50 |
sean-k-mooney | Policy defaults enable only users with the administrative role or the owner of the server to perform this operation. | 14:51 |
gibi | also GET /servers/<uuid> returns the soft-delete instance for the owner | 14:51 |
gibi | just GET /servers doesn't | 14:51 |
gibi | and GET /servers/details | 14:51 |
sean-k-mooney | i mean i guess it would be ok to list your soft-deleted instances | 14:51 |
gibi | yeah I feel the same ^^ | 14:51 |
sean-k-mooney | { | 14:52 |
sean-k-mooney | "restore": null | 14:52 |
gmann | as long as they know uuid they can get it via GET /servers/<uuid> | 14:52 |
sean-k-mooney | } | 14:52 |
gibi | yes | 14:52 |
gibi | if you know the uuid you can restore it | 14:52 |
sean-k-mooney | ... this looks like another case where we force null instead of allowing restore: {} | 14:52 |
gmann | sean-k-mooney: that inconsistency is in most of our action APIs | 14:53 |
*** dklyle has joined #openstack-nova | 14:53 | |
*** dansmith has quit IRC | 14:53 | |
sean-k-mooney | gmann: yep, I'm hoping we fix that sooner rather than later and allow {} everywhere null is allowed today | 14:53 |
sean-k-mooney | to correct the regression we introduced a few years ago in this regard | 14:54 |
gmann | yeah | 14:54 |
sean-k-mooney | that said, customers have not complained about it so low priority | 14:54 |
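For reference, the two request body shapes being compared can be written out as follows. This is a small sketch: the first body is the one shown in the api ref for the restore action (sent to POST /servers/{server_id}/action), and the second is the form sean-k-mooney would like to see accepted, which per this discussion is not allowed today.

```python
import json

# Body shown in the api ref for the restore action.
restore_body = {"restore": None}
print(json.dumps(restore_body))   # {"restore": null}

# Proposed alternative from the discussion; not accepted today.
proposed_body = {"restore": {}}
print(json.dumps(proposed_body))  # {"restore": {}}
```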
gmann | gibi: sean-k-mooney +1 on returning soft deleted instances in GET /servers GET /servers/detail | 14:54 |
*** Luzi has quit IRC | 14:55 | |
gibi | gmann: thanks | 14:56 |
sean-k-mooney | gmann: does changing the policy rule require a microversion? | 14:56 |
gmann | sean-k-mooney: no. | 14:56 |
sean-k-mooney | or can it be made admin_or_owner | 14:56 |
sean-k-mooney | ok | 14:56 |
gmann | sean-k-mooney: this is code check not policy | 14:56 |
sean-k-mooney | well yes | 14:57 |
gmann | https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/api/openstack/compute/servers.py#L265 | 14:57 |
sean-k-mooney | but we should still be limiting it to admin_or_owner right | 14:57 |
sean-k-mooney | e.g. i should not be able to list your deleted instances | 14:57 |
gmann | but it is the same thing, we are just opening up the permission here. no new field added in the response and no return code change | 14:57 |
sean-k-mooney | so we should likely remove the code check and control it via policy | 14:57 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Bump the Compute RPC API to version 6.0 https://review.opendev.org/c/openstack/nova/+/761452 | 14:58 |
gmann | sean-k-mooney: exactly. hard-coded is_admin checks are not so good | 14:58 |
*** dansmith has joined #openstack-nova | 14:58 | |
bauzas | gibi: rebased the compute RPC API change https://review.opendev.org/c/openstack/nova/+/761452 | 14:58 |
gibi | bauzas: ack, looking | 14:58 |
* bauzas looks at gibi's comments for the prelude | 14:58 | |
gmann | sean-k-mooney: and that is one of the next steps after the new secure rbac, to remove all hard-coded is_admin checks from everywhere (API, DB, etc.) | 14:59 |
*** whoami-rajat has quit IRC | 15:00 | |
*** macz_ has joined #openstack-nova | 15:01 | |
*** macz_ has quit IRC | 15:01 | |
*** macz_ has joined #openstack-nova | 15:02 | |
sean-k-mooney | yep makes sense | 15:04 |
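The behaviour agreed on above, listing soft-deleted servers under an admin_or_owner style rule rather than a hard-coded admin check, might look roughly like this. It is a sketch under stated assumptions, not Nova code: the `Server` class, the `list_soft_deleted` function and the idea that "owner" means a matching project ID are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical, simplified stand-in for an instance record.
    uuid: str
    name: str
    project_id: str
    vm_state: str


def list_soft_deleted(servers, requester_project_id, is_admin):
    """Return soft-deleted servers visible under an admin_or_owner style rule."""
    return [
        s for s in servers
        if s.vm_state == "soft-delete"
        and (is_admin or s.project_id == requester_project_id)
    ]


servers = [
    Server("uuid-1", "web", "proj-a", "soft-delete"),
    Server("uuid-2", "db", "proj-b", "soft-delete"),
    Server("uuid-3", "cache", "proj-a", "active"),
]

# An owner sees only their own soft-deleted server and can look up its uuid.
print([s.uuid for s in list_soft_deleted(servers, "proj-a", is_admin=False)])
# An admin sees every soft-deleted server.
print([s.uuid for s in list_soft_deleted(servers, "proj-x", is_admin=True)])
```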
stephenfin | melwitt: Can you look at https://review.opendev.org/c/openstack/osc-placement/+/743976 again today? Looks like gibi is waiting on you to spin back around to it first | 15:06 |
melwitt | stephenfin: yes, sorry, I had meant to look at that friday but didn't :( I will look today | 15:07 |
stephenfin | All good. Thanks! | 15:08 |
*** mgariepy has quit IRC | 15:12 | |
*** lpetrut has quit IRC | 15:18 | |
*** ralonsoh has quit IRC | 15:21 | |
*** __ministry1 has joined #openstack-nova | 15:23 | |
*** martinkennelly has joined #openstack-nova | 15:23 | |
*** mgariepy has joined #openstack-nova | 15:24 | |
*** ralonsoh has joined #openstack-nova | 15:27 | |
*** mkrai has quit IRC | 15:28 | |
*** mgariepy has quit IRC | 15:31 | |
*** __ministry1 has quit IRC | 15:40 | |
*** mlavalle has joined #openstack-nova | 15:48 | |
*** links has joined #openstack-nova | 15:49 | |
*** jraju__ has quit IRC | 15:49 | |
*** vishalmanchanda has quit IRC | 16:11 | |
*** manuvakery1 has joined #openstack-nova | 16:13 | |
*** mgariepy has joined #openstack-nova | 16:22 | |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Wallaby 23.0.0 prelude section https://review.opendev.org/c/openstack/nova/+/782172 | 16:28 |
bauzas | dansmith: gibi: thanks for looking at the prelude, updated ^ | 16:28 |
bauzas | arf, just saw stephenfin's comments | 16:29 |
sean-k-mooney | stephenfin: melwitt: could ye take a look at the discussion at https://review.opendev.org/c/openstack/nova/+/769614/2//COMMIT_MSG#18 again | 16:29 |
stephenfin | will do | 16:29 |
openstackgerrit | Sylvain Bauza proposed openstack/nova master: Wallaby 23.0.0 prelude section https://review.opendev.org/c/openstack/nova/+/782172 | 16:30 |
sean-k-mooney | stephenfin: melwitt: now that we are past FF and i have a little brain power back, i have 3 bugs i would like to make progress on: https://review.opendev.org/c/openstack/nova/+/769614, https://review.opendev.org/c/openstack/nova/+/777679 and https://review.opendev.org/c/openstack/nova/+/602432 | 16:32 |
melwitt | sean-k-mooney: I have been watching the discussion but not really understanding what's going on. all I know is experts on numa are disagreeing :) and I was thinking that with discussion maybe a new option that y'all agree on would be possible | 16:33 |
sean-k-mooney | melwitt: ack, my view is the purported optimisation was never valid or functional in any meaningful way and it was broken by design | 16:34 |
*** penick has joined #openstack-nova | 16:35 | |
*** penick has quit IRC | 16:35 | |
*** penick has joined #openstack-nova | 16:35 | |
sean-k-mooney | i think alex agreed with the design part in his last comment but was unsure if the optimisation actually provided a performance improvement | 16:35 |
manuvakery1 | Hi. What would be an acceptable load average on a compute host? I can see it's consistently between 30-40 when I stress a VM with the same number of cores as the compute host | 16:36 |
sean-k-mooney | and stephenfin was concerned about a regression in functionality and believes the optimisation may have been valid in some cases | 16:36 |
*** lpetrut has joined #openstack-nova | 16:37 | |
melwitt | ah, ok | 16:37 |
sean-k-mooney | if you have 40 cores then a load average of 40 means you are fully utilising the system and not over stressing the cpus | 16:37 |
sean-k-mooney | manuvakery1:^ | 16:37 |
sean-k-mooney | so a load average of <= number of cores means you are below or at capacity | 16:38 |
manuvakery1 | its a 48 core host | 16:38 |
sean-k-mooney | if you exceed it, it means there is contention between processes to execute cpu instructions | 16:38 |
sean-k-mooney | manuvakery1: 30-40 is perfectly acceptable in that case | 16:38 |
manuvakery1 | thanks sean-k-mooney | 16:38 |
sean-k-mooney | if the load average exceeds the core count it means the vms are underperforming because they are cpu starved | 16:39 |
sean-k-mooney | that may or may not matter depending on your use case but i would not be concerned with your current values | 16:39 |
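The rule of thumb above (load average compared against core count) can be checked with a few lines of Python. This is a rough sketch; the function name `load_status` is hypothetical, and note that on Linux the load average also counts tasks in uninterruptible sleep (for example waiting on I/O), so it is not purely a CPU measure.

```python
import os


def load_status(load_avg, cores):
    """Classify a load average against the core count (rule of thumb only)."""
    if load_avg <= cores:
        return "at or below capacity"
    return "cpu contention"


# The values from the discussion: load of 30-40 on a 48 core host.
print(load_status(35.0, 48))  # at or below capacity

# On a Unix host the live 1-minute value can be read like this.
if hasattr(os, "getloadavg"):
    one_min, _five_min, _fifteen_min = os.getloadavg()
    print(load_status(one_min, os.cpu_count() or 1))
```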
*** ociuhand_ has quit IRC | 16:42 | |
*** ociuhandu has joined #openstack-nova | 16:43 | |
manuvakery1 | that means when using cpu over commit, I can expect high load average and my vms can under perform when all are trying to gather cpu . I am ok if my vms are performing little slow but don't want my host machine to be non responsive | 16:43 |
kashyap | stephenfin: I take it that when you move content, you're _only_ moving content -- or are you also mixing in little fix-ups? | 16:43 |
kashyap | stephenfin: E.g. I'm looking at the SEV guide | 16:44 |
stephenfin | I might fix spellings and mess with the structure but in general the content is the same. I tried to keep major reworks separate | 16:44 |
sean-k-mooney | manuvakery1: we generally recommend confining the guests to run on a subset of host cores | 16:44 |
kashyap | Right; I see that you've also added hyperlinks where you can | 16:44 |
sean-k-mooney | manuvakery1: using vcpu_pin_set before train or cpu_shared_set and cpu_dedicated_set after train | 16:45 |
kashyap | E.g. on line-17 I see you've added the link to _deploying-sev-capable-infrastructure | 16:45 |
sean-k-mooney | manuvakery1: that way you can ensure that the host os never locks up | 16:45 |
kashyap | stephenfin: Okay; figured as much - major rework separate. Thx | 16:45 |
sean-k-mooney | manuvakery1: our general recommendation is to reserve at least the first core from each numa node for the OS to use | 16:45 |
manuvakery1 | sean-k-mooney: ok. I will try that | 16:46 |
kashyap | stephenfin: Err, thinko above: it's not a link, but an "anchor". | 16:46 |
sean-k-mooney | manuvakery1: so assuming you have 2 sockets each with 12 cores and 24 hyperthreads, you would ideally reserve cores 0,12,24,36 | 16:46 |
sean-k-mooney | manuvakery1: before train that would be done with vcpu_pin_set=1-11,13-23,25-35,37-47 | 16:48 |
*** lpetrut has quit IRC | 16:48 | |
sean-k-mooney | after train cpu_shared_set=1-11,13-23,25-35,37-47 | 16:48 |
sean-k-mooney | vcpu_pin_set was in the [DEFAULT] section and cpu_*_set are in the [compute] section of the nova.conf; this needs to be set on each compute node | 16:49 |
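The CPU list above can be derived rather than typed by hand. The sketch below assumes a host with 48 logical CPUs numbered 0-47 and the reserved set 0,12,24,36 from the discussion; the helper name `cpu_set_excluding` is hypothetical. The resulting string is what would go into cpu_shared_set in the [compute] section (or vcpu_pin_set in [DEFAULT] before train).

```python
def cpu_set_excluding(total_cpus, reserved):
    """Format CPUs 0..total_cpus-1, minus `reserved`, as a range list string."""
    reserved = set(reserved)
    keep = [c for c in range(total_cpus) if c not in reserved]

    # Collapse consecutive CPU IDs into (start, end) ranges.
    ranges = []
    start = prev = None
    for cpu in keep:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))

    return ",".join(f"{a}-{b}" if a != b else str(a) for a, b in ranges)


print(cpu_set_excluding(48, [0, 12, 24, 36]))
# 1-11,13-23,25-35,37-47
```

Check the real core and hyperthread sibling numbering on the host (for example with lscpu) before reserving CPUs, since the layout assumed here is not universal.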
manuvakery1 | sean-k-mooney: I am using train. I will see to it. Thanks for your response | 16:50 |
*** ociuhandu_ has joined #openstack-nova | 16:55 | |
*** ociuhandu has quit IRC | 16:58 | |
*** ociuhandu_ has quit IRC | 16:59 | |
*** lucasagomes has quit IRC | 17:07 | |
*** dtantsur is now known as dtantsur|afk | 17:09 | |
*** whoami-rajat_ has joined #openstack-nova | 17:10 | |
*** penick has quit IRC | 17:11 | |
*** rpittau is now known as rpittau|afk | 17:11 | |
MrClayPole | We are troubleshooting an issue with our OpenStack backup provider (Trilio) and our storage vendor/cinder driver (Zadara). As part of this troubleshooting we've been asked to disable iSCSI multipathing. I can see that this is installed as part of the nova deployment in OpenStack-Ansible from the multipath-tools package. Is it OK to just stop and disable the multipathd services, then run multipath -F, then check with multipath | 17:18 |
MrClayPole | -ll to ensure it's no longer active? | 17:18 |
*** jamesdenton has quit IRC | 17:24 | |
*** jamesdenton has joined #openstack-nova | 17:24 | |
openstackgerrit | Merged openstack/osc-placement master: Include usage in 'inventory list', 'inventory show' https://review.opendev.org/c/openstack/osc-placement/+/743976 | 17:34 |
*** amodi has quit IRC | 17:53 | |
sean-k-mooney | lyarwood: whats the status of nova-grenade-multinode | 17:55 |
sean-k-mooney | did you have a patch to move that to v3? or am i just wishing you did | 17:55 |
sean-k-mooney | ah https://review.opendev.org/c/openstack/nova/+/778885 | 17:56 |
sean-k-mooney | is ^ going to happen this cycle? | 17:56 |
*** amodi has joined #openstack-nova | 17:58 | |
*** whoami-rajat_ is now known as whoami-rajat | 18:06 | |
*** andrewbonney has quit IRC | 18:09 | |
*** links has quit IRC | 18:12 | |
*** ricolin has joined #openstack-nova | 18:19 | |
ricolin | stephenfin, hi about https://review.opendev.org/c/openstack/nova/+/781210 | 18:19 |
ricolin | I updated some logs in the comments; from what I can tell it only happens in the bionic environment on aarch64. It disappears once I move to focal | 18:21 |
openstackgerrit | Merged openstack/nova stable/victoria: Add config parameter 'live_migration_scheme' to live migration with tls guide https://review.opendev.org/c/openstack/nova/+/781211 | 18:32 |
*** dtantsur has joined #openstack-nova | 18:35 | |
stephenfin | ricolin: That looks like a bug. Can you open a bug on launchpad and I'll take a look tomorrow? | 18:36 |
stephenfin | ricolin: Referring to this http://paste.openstack.org/show/803788/ | 18:36 |
*** slaweq_ has joined #openstack-nova | 18:37 | |
*** dtantsur|afk has quit IRC | 18:41 | |
*** slaweq has quit IRC | 18:41 | |
ricolin | stephenfin, thanks I will open one for both errors I found | 18:47 |
sean-k-mooney | that's a python2 vs python3 issue i think | 19:06 |
sean-k-mooney | ricolin: master is not intended to run on bionic by the way | 19:07 |
sean-k-mooney | it might be compatible but it's not part of the official testing runtimes anymore | 19:07 |
sean-k-mooney | we have 1/2 jobs that use it for reasons but the arm jobs should be on focal | 19:08 |
sean-k-mooney | ricolin: ussuri was the last release to use bionic | 19:09 |
sean-k-mooney | all jobs for victoria, wallaby and master should be using 20.04/focal if they are ubuntu based | 19:09 |
*** manuvakery1 has quit IRC | 19:11 | |
sean-k-mooney | ricolin: is there a reason the aarch64 job was using bionic | 19:12 |
sean-k-mooney | ricolin: stephenfin: we have seen issues with libs incorrectly monkeypatching codecs.py in the past | 19:14 |
sean-k-mooney | this looks like we are hitting https://github.com/openstack/nova/commit/b862f6ff35d1611d0d63623a6254fc889012bfb9 | 19:15 |
sean-k-mooney | where blockdiag was messing with codecs.getreader | 19:16 |
*** hamalq has joined #openstack-nova | 19:27 | |
*** ralonsoh has quit IRC | 19:28 | |
*** Anticime1 is now known as Anticimex | 19:29 | |
*** rouk has joined #openstack-nova | 19:31 | |
*** gokhani has quit IRC | 19:31 | |
rouk | after updating to the latest stable/ussuri nova version, live migrations are broken due to cpu features being added that shouldnt be. | 19:34 |
rouk | i have merged the patches in master which allow disabling cpu features, but even that doesnt make migrations work. | 19:35 |
rouk | even after disabling these features, live migrate fights back with libvirt.libvirtError: operation failed: guest CPU doesn't match specification: extra features: npt,nrip-save | 19:35 |
*** sapd1 has quit IRC | 19:36 | |
rouk | how do i stop nova from adding features on migration and breaking things? | 19:36 |
rouk | seems like a pretty big breaking change that made it into a patch... | 19:40 |
sean-k-mooney | rouk: that is not a breaking change in nova | 19:42 |
sean-k-mooney | that is a breaking change in your kernel | 19:43 |
sean-k-mooney | rouk: nova does not add those features | 19:43 |
rouk | it only appeared after patching nova a week ago to stable/ussuri... | 19:43 |
rouk | and is present on old-kernel boxes. | 19:43 |
openstackgerrit | Merged openstack/nova stable/train: Prevent archiving of pci_devices records because of 'instance_uuid' https://review.opendev.org/c/openstack/nova/+/760978 | 19:44 |
sean-k-mooney | nova does not add those features | 19:44 |
sean-k-mooney | rouk: so i think you got some other change you did not intend | 19:44 |
rouk | it never used to, no. but these new changes that came in to "fix" live migrations, are adding features | 19:44 |
sean-k-mooney | which change are you referring to? | 19:45 |
rouk | sec while i grab the commit | 19:45 |
sean-k-mooney | this ? https://github.com/openstack/nova/commit/b6c473159ec45e0aa715edd45cde28f77484a5f7 | 19:49 |
rouk | there was one a bit earlier | 19:49 |
sean-k-mooney | there is no nova code to add those extraspec explicitly | 19:49 |
sean-k-mooney | *extra features | 19:49 |
sean-k-mooney | can you share how you have configured the cpu_mode/cpu_model/extra cpu flags | 19:50 |
sean-k-mooney | in your nova.conf | 19:50 |
rouk | i have model, i added extra flags to try and fix this, as i merged support for - syntax to remove features. | 19:50 |
rouk | but, just epyc-ibpb | 19:51 |
sean-k-mooney | are all your servers AMD EPYC? | 19:51 |
rouk | yep. | 19:51 |
rouk | https://patchwork.kernel.org/project/qemu-devel/patch/20190121155051.5628-1-vkuznets@redhat.com/ | 19:52 |
rouk | this happened a while ago, and it added these as features retroactively in qemu for the model. | 19:52 |
sean-k-mooney | this appears to be the definition of that model | 19:52 |
sean-k-mooney | http://paste.openstack.org/show/803793/ | 19:52 |
rouk | and then this got picked up by nova, which then added these as required features on migrate | 19:53 |
*** tbachman has quit IRC | 19:53 | |
sean-k-mooney | ok so this is not a nova bug then | 19:53 |
sean-k-mooney | those requirements are coming from libvirt | 19:53 |
rouk | so how do we stop them from retroactively being added on migrate from nova? | 19:54 |
sean-k-mooney | did you update the qemu/libvirt version when you updated to ussuri? | 19:54 |
sean-k-mooney | have you confirmed that is what is happening? | 19:54 |
sean-k-mooney | do the vms actually have them? | 19:54 |
sean-k-mooney | if the vm was hard rebooted it could have had the feature exposed to it | 19:54 |
rouk | we were on 4.0+ (where this qemu change happened) the whole time during ussuri, migrations only broke recently. | 19:54 |
rouk | hard rebooting the vm does fix it, by adding the feature | 19:55 |
rouk | i dont want to reboot an entire cloud. | 19:55 |
sean-k-mooney | fair but do you have the nova fix i mentioned above | 19:55 |
sean-k-mooney | https://github.com/openstack/nova/commit/b6c473159ec45e0aa715edd45cde28f77484a5f7 | 19:55 |
rouk | yes, i am on stable/ussuri as of a week ago. | 19:56 |
rouk | built from git. | 19:56 |
sean-k-mooney | ok so what it sounds like is the current vm definition, which is based on epyc-ibpb, | 19:57 |
sean-k-mooney | was generated before the model was updated | 19:57 |
sean-k-mooney | and now that it's migrating to the new host, the info we get from the dest libvirt is expecting the new definition | 19:57 |
sean-k-mooney | and it's failing as a result of the abi break that libvirt/qemu did to the model | 19:58 |
rouk | libvirt migrations usually don't add features during migrate, usually you wait for reboot for that. | 19:58 |
sean-k-mooney | correct it cant | 19:58 |
sean-k-mooney | and i dont think nova is | 19:58 |
sean-k-mooney | i think this is happening lower down the stack | 19:59 |
rouk | how would libvirt add its own policy | 19:59 |
rouk | cpu check mode is now full, instead of partial before. | 19:59 |
rouk | so that also changed. vms before the latest patch had no feature policy mentioned, just the model | 19:59 |
rouk | here, let me paste the 3 iterations i have observed | 20:00 |
sean-k-mooney | rouk: we ask libvirt to dump the migratable xml here https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L58 | 20:00 |
sean-k-mooney | rouk: just so you are aware the xml that libvirt uses and displays is not the same one that nova gives it | 20:01 |
sean-k-mooney | rouk: libvirt parses the xml we give it and adds extra info to it | 20:01 |
rouk | yeah, gotta use --live, etc | 20:01 |
sean-k-mooney | am if you have nova-compute in debug mode we will print the xml that we send to libvirt | 20:02 |
sean-k-mooney | but the one that is shown in any virsh output is not the one we gave it | 20:02 |
sean-k-mooney | its the one after it updates it and fills in things like guest pci devices | 20:02 |
rouk | http://paste.openstack.org/show/opnX0PSXurwehQCWsSbg/ | 20:03 |
sean-k-mooney | the migrate xml we use is one that we retrieve from libvirt | 20:03 |
sean-k-mooney | then we modify it | 20:03 |
sean-k-mooney | with https://github.com/openstack/nova/blob/master/nova/virt/libvirt/migration.py#L59-L69 | 20:03 |
rouk | so the --migratable doesnt have those cpu flags required. | 20:03 |
sean-k-mooney | but we dont modify the cpu flags | 20:03 |
sean-k-mooney | but are they listed | 20:04 |
rouk | nope. | 20:04 |
rouk | see paste, first block. | 20:04 |
sean-k-mooney | just looking at your link now | 20:04 |
sean-k-mooney | so looking at the paste | 20:05 |
sean-k-mooney | the very old vms are the ones booted and not rebooted since the model definition was updated | 20:05 |
rouk | "very old vms" is pre ussuri | 20:05 |
rouk | "somewhat old" is post ussuri, pre update. | 20:05 |
sean-k-mooney | ok | 20:05 |
rouk | all of ussuri had libvirt 4.0+, which is where the features got retroactively added | 20:06 |
rouk | unless kolla changed base distro... which is... well i can check. | 20:06 |
rouk | dang, i only have newer versions running, id have to check if kolla has a history somewhere. | 20:07 |
sean-k-mooney | kolla has different images per distro | 20:08 |
sean-k-mooney | are you on centos or ubuntu or debian | 20:08 |
sean-k-mooney | it wont change the os version within a release | 20:08 |
rouk | we are on ubuntu, which is based on 18.04 right now for ussuri, havnt jumped version. | 20:08 |
sean-k-mooney | so one thing i notice is <cpu mode='custom' match='exact' check='full'> | 20:08 |
sean-k-mooney | nova does not set check=full | 20:09 |
sean-k-mooney | that might be part of the issue here | 20:09 |
rouk | i tried to find any code in libvirt or nova for that, and couldnt find either. im... not sure what changed. | 20:09 |
rouk | it used to be partial. | 20:09 |
sean-k-mooney | this is a default change in libvirt | 20:09 |
rouk | and i confirmed, vms that used partial, do migrate | 20:10 |
rouk | even if theyre ancient | 20:10 |
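For readers following along, the `check` attribute and the per-feature policies being compared here can be pulled out of a dumped domain XML with a few lines of Python. This is only a minimal sketch using the standard library: the sample XML is illustrative rather than copied from rouk's paste, and `summarize_cpu` is a hypothetical helper name, not something from nova or libvirt.

```python
import xml.etree.ElementTree as ET

# Illustrative domain XML fragment (an assumption, not taken from the paste).
SAMPLE = """
<domain type='kvm'>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <feature policy='require' name='nrip-save'/>
    <feature policy='disable' name='npt'/>
  </cpu>
</domain>
"""


def summarize_cpu(domain_xml):
    """Return (check mode, model name, {feature name: policy}) for a domain XML string."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    model = cpu.findtext("model")
    # Map each <feature name=... policy=...> child to its policy.
    features = {f.get("name"): f.get("policy") for f in cpu.findall("feature")}
    return cpu.get("check"), model, features


check, model, features = summarize_cpu(SAMPLE)
print(check, model, features)
```

Running this against the output of `virsh dumpxml` for an old and a new instance would make the `partial` versus `full` difference easy to spot side by side.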
sean-k-mooney | so libvirt is from the ubuntu cloud archive in your case | 20:10 |
sean-k-mooney | kolla just uses the libvirt from uca for the given version | 20:10 |
rouk | i dont think ubuntu tampered with libvirt defaults... but ive seen worse, but also that would be a very strange mid-release change... | 20:11 |
rouk | so, what i dont get is i merged e0a8cd7ca8a907a3d178c759212e1685d0fa35c6 and c60f4df8b19b75c8c98ec570b2a506aece9a5a34 | 20:12 |
rouk | and backported them correctly, to try and just -npt to solve it, and migration still adds it. | 20:12 |
rouk | even though new vms properly have it explicitly disabled. | 20:13 |
rouk | im not sure what i should be doing? other than a reboot, what can i do? custom libvirt config override? this issue will affect more than me. | 20:13 |
sean-k-mooney | can you do "virsh dumpxml <vm> --update-cpu --migratable" on one of the instances that cannot live migrate | 20:14 |
rouk | i have played with that, but yes, i can get one i havnt touched and get you the before/after etc, sec. | 20:14 |
sean-k-mooney | rouk: so we get the xml from libvirt similar to ^ | 20:15 |
sean-k-mooney | but we do not update the cpu flags part | 20:15 |
sean-k-mooney | im wondering if we are getting those flags from libvirt in that call | 20:15 |
sean-k-mooney | the patch that we have for removing cpu flag only takes effect for new instances | 20:16 |
rouk | one way to find out, when i saw the commit from qemu that retroactively changed epyc/opteron i just sighed, cause now we got vms stuck with old and new. | 20:16 |
rouk | took a couple weeks after rollout to notice. | 20:16 |
sean-k-mooney | ya | 20:16 |
sean-k-mooney | they should have versioned the model definition | 20:17 |
sean-k-mooney | they should never update them in place | 20:17 |
rouk | yep. | 20:17 |
rouk | http://paste.openstack.org/show/5ozPeL4s49EoSssXRvK7/ | 20:19 |
rouk | not the result i expected | 20:19 |
*** whoami-rajat has quit IRC | 20:19 | |
sean-k-mooney | thats similar to what i expected | 20:20 |
rouk | so wheres it being added if thats nova's starting point? | 20:20 |
sean-k-mooney | --update-cpu should only change things for host model | 20:20 |
sean-k-mooney | possibly from the cpu baseline check on the dest | 20:20 |
rouk | is there any way i can trick nova into not seeing these new features somewhere? | 20:21 |
rouk | or any other output you want | 20:22 |
sean-k-mooney | the only way to trick it would be to copy the file and revert the change | 20:24 |
sean-k-mooney | im trying to get virsh cpu-baseline to work | 20:25 |
*** rcernin has joined #openstack-nova | 20:25 | |
rouk | the edits are in C, so it would be a recompile to fix it seems. | 20:25 |
rouk | but, the only way this could be responsible is if ubuntu jumped from 3.x to 4.x, as this qemu change is only in 4.0+ | 20:26 |
sean-k-mooney | no you would just need to edit the files in /usr/share/libvirt/cpu_map/*.xml | 20:26 |
sean-k-mooney | /usr/share/libvirt/cpu_map/x86_EPYC-IBPB.xml in your case | 20:26 |
rouk | https://patchwork.kernel.org/project/qemu-devel/patch/20190121155051.5628-1-vkuznets@redhat.com/ these qemu changes arent related? | 20:26 |
rouk | nrip nor npt are in my xml | 20:27 |
rouk | its not part of the qemu cpu_map | 20:27 |
rouk | s/qemu/libvirt | 20:27 |
rouk | its being added higher up, in qemu itself? | 20:28 |
*** tbachman has joined #openstack-nova | 20:28 | |
sean-k-mooney | i think what we do is basically http://paste.openstack.org/show/803802/ | 20:28 |
sean-k-mooney | or rather what libvirt does and then its comparing the baseline cpus between the source and dest host | 20:29 |
*** tbachman has quit IRC | 20:29 | |
rouk | it is present there, yeah. | 20:30 |
*** tbachman has joined #openstack-nova | 20:30 | |
sean-k-mooney | could you try "virsh capabilities > /tmp/caps.xml ; virsh cpu-baseline /tmp/caps.xml --migratable --features; rm -f /tmp/caps.xml" | 20:30 |
rouk | <feature policy='require' name='nrip-save'/> | 20:31 |
rouk | yeah its there. | 20:31 |
*** adrianc has quit IRC | 20:31 | |
sean-k-mooney | so that is where its coming from | 20:31 |
sean-k-mooney | let me see if i can find the nova code | 20:31 |
*** adrianc has joined #openstack-nova | 20:32 | |
sean-k-mooney | so this is not from the xml update | 20:32 |
sean-k-mooney | its from the earlier cpu compatibility check | 20:32 |
sean-k-mooney | in pre live migration i think | 20:32 |
rouk | :( and i hoped backporting those cpu feature changes would help me, heh. | 20:32 |
sean-k-mooney | this is where its failing https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/driver.py#L8991-L9060 | 20:35 |
sean-k-mooney | we ask libvirt if the xml is compatible https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/host.py#L1424 | 20:36 |
rouk | a migration should survive adding features no? could it be made soft/warn for new features? | 20:36 |
sean-k-mooney | rouk: no the cpu flags cannot have additions or removals in a live migration | 20:36 |
rouk | alright. then new features could be trimmed off? | 20:37 |
*** ociuhandu has joined #openstack-nova | 20:37 | |
rouk | new features on migrate shouldnt happen... even if its qemu being dumb and retroactively changing things. | 20:37 |
sean-k-mooney | am yes but im trying to see how nova is in this state | 20:38 |
sean-k-mooney | we can generate the cpu xml we pass in 3 ways | 20:38 |
sean-k-mooney | https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/driver.py#L9017-L9032 | 20:38 |
*** rcernin has quit IRC | 20:38 | |
sean-k-mooney | its only called in 2 places https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/driver.py#L8701-L8708 and here https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/driver.py#L872-L888 | 20:41 |
rouk | why would the vm be checked against target host features? if the vm doesnt have the feature, it shouldnt be checked on the vm side... | 20:42 |
rouk | nova doesnt care if the vm is missing a feature the host has | 20:43 |
rouk | if it fits within the features of the new host, it should pass. | 20:43 |
rouk | which, it does. | 20:43 |
rouk | if i was to change some of these checks to only care about the host having what the vm has, itd work, no? | 20:45 |
rouk | or would it migrate with host features and die | 20:46 |
sean-k-mooney | so nova does need to check that the host has all features that the vm uses | 20:46 |
sean-k-mooney | you are right it does not care about ones that are disabled | 20:46 |
sean-k-mooney | so nova should ignore ones that are disabled | 20:46 |
rouk | yeah, vm needs to fit inside the host, not the other side. | 20:46 |
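The "vm needs to fit inside the host" rule rouk describes is a one-directional subset check. The sketch below only illustrates that idea; it is not how nova or libvirt implement the comparison, `missing_on_host` is a hypothetical name, and the feature sets are made up for the example.

```python
def missing_on_host(vm_features, host_features):
    """Return the features the vm uses that the host cannot provide.

    Extra host features (ones the vm never saw) are deliberately ignored:
    only the vm -> host direction matters for this check.
    """
    return sorted(set(vm_features) - set(host_features))


# Hypothetical feature sets: the old vm predates npt/nrip-save on the model.
old_vm = {"ibpb"}
new_host = {"ibpb", "npt", "nrip-save"}

print(missing_on_host(old_vm, new_host))   # nothing missing, the vm fits
print(missing_on_host(new_host, old_vm))   # the reverse direction does not fit
```

An empty result means the vm fits; the strict `check='full'` comparison discussed earlier is the case where the extra host-side features are not ignored.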
sean-k-mooney | right so libvirt did not previously have disabled features we had to ignore until recently | 20:47 |
sean-k-mooney | and nova did not provide a way to disable them | 20:47 |
rouk | if these checks pass, will it migrate with the bad config, or the current vm config (which will work)? | 20:48 |
rouk | is it as simple as making these checks looser on the vm side? | 20:48 |
sean-k-mooney | it will migrate with the current config | 20:48 |
sean-k-mooney | which should work | 20:48 |
rouk | yeah | 20:48 |
sean-k-mooney | well | 20:48 |
sean-k-mooney | actually not necessarily | 20:48 |
sean-k-mooney | so the issue we have is that how migration works is actually different than most people think | 20:49 |
rouk | i got confused once i had to patch nova-ssh :p | 20:49 |
sean-k-mooney | libvirt on the source host asks libvirt on the dest host to spawn a qemu instance using an xml we provide | 20:49 |
sean-k-mooney | so that qemu instance is like a normal new instance that was booted with a given xml | 20:49 |
sean-k-mooney | if libvirt adds flags to that qemu because it has a different cpu model definition | 20:50 |
sean-k-mooney | then when qemu tries to do the migration from the source to dest it will fail | 20:50 |
rouk | so on move, vm-side missing features need to be added as disabled? | 20:51 |
sean-k-mooney | yes | 20:51 |
rouk | instead of just missing | 20:51 |
sean-k-mooney | i believe they would if they were enabled in the model | 20:51 |
rouk | yeah, thats what i tried to do with those patches, cause when i manually tested migration, i could make it move by disabling features on the target | 20:51 |
rouk | whats the most elegant way to add missing as disabled on migrate? | 20:52 |
rouk | or should i be forking qemu till i can get vms onto new features organically? | 20:53 |
rouk | heh | 20:53 |
sean-k-mooney | so the current failure is here https://github.com/openstack/nova/blob/3de7fb7c327db348d04d15d4cd3c4f811a336126/nova/virt/libvirt/driver.py#L8702-L8706 i ruled out the other code path | 20:53 |
rouk | id... rather not do that. | 20:53 |
sean-k-mooney | rouk: you dont need to fork qemu | 20:53 |
sean-k-mooney | for testing you could comment out both checks on the dest host | 20:54 |
rouk | but wouldnt libvirt then add the feature on move? | 20:55 |
sean-k-mooney | am well the xml we provide wont reference them | 20:55 |
sean-k-mooney | but yes youre right it might do it implicitly | 20:56 |
rouk | i can try, if you think its a worthy test | 20:56 |
sean-k-mooney | i think what will happen is the libvirt error will go away but you might get a qemu error when we actually call migrate | 20:57 |
rouk | probably, if nova isnt the one adding these in the first place. | 20:57 |
sean-k-mooney | what i was thinking was we could modify the migrate xml to remove/add the cpu feature based on the config | 20:57 |
sean-k-mooney | rouk: basically what you were originally trying to do | 20:57 |
sean-k-mooney | but also on migrate | 20:58 |
sean-k-mooney | the original patch only did it on spawn | 20:58 |
rouk | yeah, but wont that be a problem for people in other cases? | 20:58 |
rouk | for me, sure, it fixes my problem | 20:58 |
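The idea floated above, adding features the guest never had as explicitly disabled in the migration XML, could look roughly like the following. This is a sketch of the concept only: `disable_missing_features` is a hypothetical helper, not nova code, the input XML is illustrative, and nothing in this conversation confirms that libvirt/qemu would honor the result on migrate.

```python
import xml.etree.ElementTree as ET

# Illustrative <cpu> element of an old guest that lists no feature policies.
OLD_CPU = """
<cpu mode='custom' match='exact' check='partial'>
  <model fallback='forbid'>EPYC-IBPB</model>
</cpu>
"""


def disable_missing_features(cpu_xml, dest_extra_features):
    """Add <feature policy='disable'> for each named feature not already listed."""
    cpu = ET.fromstring(cpu_xml)
    present = {f.get("name") for f in cpu.findall("feature")}
    for name in sorted(set(dest_extra_features) - present):
        # Features already mentioned in the guest XML are left untouched.
        ET.SubElement(cpu, "feature", {"policy": "disable", "name": name})
    return ET.tostring(cpu, encoding="unicode")


patched = disable_missing_features(OLD_CPU, ["npt", "nrip-save"])
print(patched)
```

The helper is idempotent, so running it again on its own output adds nothing; the feature would then only be enabled by regenerating the XML, for example on a hard reboot.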
sean-k-mooney | yep im sure ill get a downstream bug for this ill have to help fix in a rush | 20:58 |
sean-k-mooney | im trying to think through 2 things currently | 20:59 |
sean-k-mooney | what would not be a horrible hack for you | 20:59 |
sean-k-mooney | and what we could do to workaround the libvirt/qemu abi break more generally | 20:59 |
rouk | well, more generally, cant we just edit the migration xml on compare error, we have the missing features, and we know which direction the failure is. | 21:00 |
rouk | if the vm is at fault, and its a new feature on the host, disable it, it will get enabled next reboot. | 21:00 |
sean-k-mooney | maybe. we are also currently reworking how we do the cpu compare | 21:00 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/762330 | 21:01 |
rouk | i might need a horrible hack though, depending on how long the fix will take. | 21:01 |
sean-k-mooney | although i dont think that will fix it | 21:01 |
rouk | every day this sits, new vms come up depending on new features, and old vms are stranded. | 21:01 |
rouk | got hosts with broken NICs i cant evict, heh | 21:01 |
rouk | thanks supermicro | 21:01 |
sean-k-mooney | ya so this will only affect existing instances now that you have updated all the containers | 21:02 |
rouk | if i didnt have like 1/3rd of my capacity having nics all blow up at once. | 21:02 |
sean-k-mooney | so option 1 is cold migration or hard reboot + live migration | 21:03 |
rouk | i would hardly mind this cpu change, cause id just uh... wait till everyone reboots. | 21:03 |
sean-k-mooney | not great but it would work | 21:03 |
rouk | yeah, its about 1500 VMs to reboot. | 21:03 |
rouk | and ill get quite the tomatoes thrown at me | 21:03 |
sean-k-mooney | option 2: patch the cpu model xml to use the old values | 21:03 |
*** rcernin has joined #openstack-nova | 21:03 | |
rouk | its not in the xml. | 21:03 |
rouk | its added higher up, in qemu cpu.c | 21:04 |
sean-k-mooney | no one sec | 21:04 |
sean-k-mooney | i mean /usr/share/libvirt/cpu_map/x86_EPYC-IBPB.xml | 21:04 |
*** gyee has joined #openstack-nova | 21:04 | |
rouk | yeah, it doesnt mention these features. | 21:04 |
sean-k-mooney | yep you could add them and set them disabled | 21:04 |
rouk | ah | 21:04 |
rouk | didnt see an arg for disabled on any of the existing lines. | 21:04 |
sean-k-mooney | then make a copy of the file as normal with a new name | 21:04 |
sean-k-mooney | and update the nova.conf to use that for new vms | 21:05 |
rouk | will nova know to use the old name on migrate? | 21:05 |
sean-k-mooney | yes because that is in the xml which we dont update | 21:05 |
*** tbachman has quit IRC | 21:05 | |
rouk | ah yeah | 21:05 |
rouk | its quite the hack, but... i can do it pretty trivially. | 21:06 |
rouk | just make it part of my nova build. | 21:06 |
sean-k-mooney | ya so basically mv <file>.xml <file>_v2.xml | 21:06 |
sean-k-mooney | well cp | 21:06 |
sean-k-mooney | and then edit <file.xml> | 21:07 |
sean-k-mooney | you could do that as a layer in the nova-libvirt container | 21:07 |
sean-k-mooney | if you want new vms to use the new definition just set cpu_model=epyc-ibpb-v2 | 21:08 |
sean-k-mooney | new vms will get that but migrated ones wont until you hard reboot | 21:08 |
sean-k-mooney | basically with a kolla build override file you would be patching in the versioning that libvirt should have done | 21:08 |
sean-k-mooney | rouk: option 3 is we figure out how to do this in code which could take a little while | 21:10 |
sean-k-mooney | rouk: have you filed a bug? | 21:10 |
sean-k-mooney | rouk: its not really a nova bug but we could workaround it i think | 21:11 |
sean-k-mooney | we would need to create a functional reproducer first, as while i understand why this happened its really due to an external change that we cant control | 21:13 |
rouk | i havnt filed a bug, no. | 21:14 |
rouk | but yeah, ill build a nova-libvirt with another model. | 21:14 |
*** ociuhandu has quit IRC | 21:15 | |
sean-k-mooney | actully i think there is something else you could try | 21:15 |
sean-k-mooney | so they added the "fix" for this to https://github.com/qemu/qemu/blob/1b507e55f8199eaad99744613823f6929e4d57c6/hw/i386/pc.c#L125-L147 | 21:16 |
rouk | yeah whats this compat, i googled it on friday | 21:16 |
*** rcernin has quit IRC | 21:17 | |
sean-k-mooney | you might be able to use a versioned machine type for this | 21:17 |
rouk | i couldnt find any docs for how these compat versions work. | 21:17 |
sean-k-mooney | so i have never looked at this before but i think it might be related to machine types | 21:17 |
rouk | i just have no idea what arg or config i need to do to activate this 3.1 compat | 21:18 |
rouk | i cant find a single note of documentation on it | 21:20 |
sean-k-mooney | so these are the machine types i have on centos. they are distro specific | 21:21 |
sean-k-mooney | http://paste.openstack.org/show/803804/ | 21:21 |
sean-k-mooney | im going to check a ubuntu vm | 21:22 |
rouk | oh... that notation, i saw some stuff for that in virt-manager before. | 21:22 |
*** jangutter has joined #openstack-nova | 21:22 | |
rouk | wonder if it has it listed | 21:22 |
sean-k-mooney | http://paste.openstack.org/show/803805/ | 21:23 |
sean-k-mooney | that is ubuntu 20.04 | 21:23 |
sean-k-mooney | i wonder if you can use pc-i440fx-3.1 | 21:24 |
rouk | pc-i440fx-3.1 Standard PC (i440FX + PIIX, 1996) | 21:24 |
rouk | yeah | 21:24 |
rouk | can nova set machine type... sec | 21:24 |
sean-k-mooney | yep | 21:25 |
sean-k-mooney | one of two ways: in the image, which wont help, or in the nova.conf, which should | 21:25 |
rouk | yeah, libvirt.hw_machine_type | 21:25 |
*** jangutter_ has quit IRC | 21:25 | |
rouk | can do that, that sounds a lot cleaner. | 21:25 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_machine_type | 21:25 |
sean-k-mooney | yep so add x86_64=pc-i440fx-3.1 | 21:25 |
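As an aside on the value format: `hw_machine_type` takes `arch=machine_type` pairs, and the nova docs linked above describe it as a comma-separated list. The parser below is a hypothetical illustration of how such a value breaks down; `parse_hw_machine_type` is an invented name and this is not nova's own parsing code.

```python
def parse_hw_machine_type(value):
    """Split 'arch=type[,arch=type...]' into a {arch: machine_type} dict."""
    mapping = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and empty values
        # partition splits on the first '=', so the arch is everything before it.
        arch, _, machine = item.partition("=")
        mapping[arch.strip()] = machine.strip()
    return mapping


print(parse_hw_machine_type("x86_64=pc-i440fx-3.1"))
```

In nova.conf the same value goes under the `[libvirt]` section as `hw_machine_type = x86_64=pc-i440fx-3.1`.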
rouk | ill get that tested after some food, been at this issue most of the day, heh. | 21:26 |
rouk | thanks for all the help, even though its not nova's fault, at least the knowledge that this dumb editing can happen from qemu means nova can defend against it. | 21:26 |
sean-k-mooney | ya. if using the versioned machine type fixes it then great | 21:27 |
rouk | maybe that needs to be a default thing thats recommended to set, idk | 21:27 |
sean-k-mooney | that would at least give us time to think about what to do in nova if anything in the future | 21:27 |
sean-k-mooney | well there are two camps | 21:28 |
sean-k-mooney | one camp is always use the versioned machine type | 21:28 |
sean-k-mooney | and ooo/ redhat openstack does this by default | 21:28 |
sean-k-mooney | the other camp is always use the latest one | 21:28 |
sean-k-mooney | so that you get the latest fixes/abi | 21:28 |
rouk | sure, that works great till qemu screws you | 21:29 |
rouk | with machine type though, we cant really move to the new instructions well | 21:29 |
rouk | the xml change would at least persist in the vm config and be fixed on reboot. | 21:29 |
rouk | version, we just delay till nova can handle it, i guess. | 21:30 |
rouk | not sure which is better. | 21:30 |
sean-k-mooney | https://github.com/qemu/qemu/blob/c40ae5a3ee387b13116948cbfe7824f03311db7e/hw/i386/pc_piix.c#L510-L525 | 21:30 |
sean-k-mooney | there we go | 21:30 |
sean-k-mooney | so yes those are the versioned machine type definitions | 21:31 |
rouk | is there any way to make nova migrate between versions on reboot? | 21:31 |
rouk | or will i need to use a custom model to do that | 21:31 |
sean-k-mooney | am nova used to but now we wont | 21:31 |
rouk | so i guess the custom model is better | 21:32 |
rouk | at least then i can get everyone to move organically next time they reboot without breaking migrations | 21:32 |
*** derekh has quit IRC | 21:32 | |
rouk | even if its a dirtier fix | 21:32 |
sean-k-mooney | rouk: https://specs.openstack.org/openstack/nova-specs/specs/wallaby/approved/libvirt-stash-instance-machine-type.html | 21:32 |
sean-k-mooney | well no so on ussuri you could set the machine type explicitly | 21:33 |
sean-k-mooney | migrate all the vms | 21:33 |
sean-k-mooney | then set it back to pc which is the default or leave it unset | 21:33 |
sean-k-mooney | then they would update on hard reboot | 21:33 |
rouk | but their migrations will break. | 21:33 |
rouk | or once its migrated once, it will stick in the config. | 21:33 |
sean-k-mooney | not unless this happens again | 21:34 |
rouk | hmm, i guess thats an okay upgrade plan | 21:34 |
rouk | as long as im not stranded on a setting forever. | 21:34 |
sean-k-mooney | so what the new spec says | 21:34 |
sean-k-mooney | is going forward we will record the machine type a vm boots with the first time | 21:34 |
sean-k-mooney | and it will have that forever | 21:34 |
sean-k-mooney | unless | 21:35 |
sean-k-mooney | you use the new nova-manage update_machine_type | 21:35 |
sean-k-mooney | command to change it | 21:35 |
sean-k-mooney | rouk: before that we would use whatever was in the config when generating the xml | 21:35 |
sean-k-mooney | if unset it would default to pc | 21:36 |
sean-k-mooney | anyway i need to grab food too | 21:37 |
sean-k-mooney | hopefully that helps and you can use a machine-type or update the cpu model for now | 21:37 |
sean-k-mooney | the reason you hit this issue is 1 you are not using a versioned machine type today and 2 qemu backported that abi break | 21:38 |
*** artom has quit IRC | 21:46 | |
*** rcernin has joined #openstack-nova | 21:51 | |
*** rcernin has quit IRC | 21:51 | |
*** rcernin has joined #openstack-nova | 21:52 | |
*** tbachman has joined #openstack-nova | 21:55 | |
*** artom has joined #openstack-nova | 22:00 | |
*** artom has quit IRC | 22:06 | |
*** ociuhandu has joined #openstack-nova | 22:13 | |
*** ociuhandu has quit IRC | 22:14 | |
rouk | sean-k-mooney: with the option set, still breaks. | 22:27 |
rouk | does it not take effect on migrate? | 22:27 |
rouk | hw_machine_type = x86_64=pc-i440fx-3.1 | 22:27 |
*** slaweq_ has quit IRC | 22:30 | |
*** hoonetorg has quit IRC | 22:37 | |
rouk | ah, the vm got the default machine='pc-i440fx-4.2' from before. | 22:37 |
rouk | will it be honored if i change the xml? hmm | 22:40 |
rouk | nah, so i guess the custom model is the only way. | 23:03 |
*** macz_ has quit IRC | 23:12 | |
rouk | sean-k-mooney: <feature policy='disable' name='npt'/> in the destination epyc-ibpb.xml also doesnt work. | 23:32 |
rouk | dont think anything can be disabled here? | 23:32 |
*** macz_ has joined #openstack-nova | 23:46 | |
*** macz_ has quit IRC | 23:50 | |
rouk | but yes, qemu moved from 3.1 to 4.2 in a single release version on ubuntu cloud repo | 23:53 |
rouk | guess i can rollback | 23:53 |
*** mlavalle has quit IRC | 23:57 |