*** tobias-urdin|pto is now known as tobias-urdin | 06:31 | |
*** bauzas_ is now known as bauzas | 06:52 | |
jlejeune | bauzas: hello, I would need a one more review on https://review.opendev.org/c/openstack/nova/+/925014 please :) | 07:12 |
---|---|---|
*** bauzas_ is now known as bauzas | 07:24 | |
*** bauzas_ is now known as bauzas | 07:56 | |
*** bauzas_ is now known as bauzas | 08:21 | |
opendevreview | Merged openstack/nova stable/2023.1: Fix test_vmdk_bad_descriptor_mem_limit and test_vmdk_bad_descriptor_mem_limit_stream_optimized https://review.opendev.org/c/openstack/nova/+/925014 | 08:28 |
jlejeune | thanks bauzas :) | 08:29 |
*** bauzas_ is now known as bauzas | 10:04 | |
ralonsoh | hi folks! please check this list of backports: https://review.opendev.org/q/Ie731045629f0899840a4680d21793a16ade9b98e | 10:07 |
ralonsoh | this is affecting the stable branches CI in Neutron right now | 10:07 |
ralonsoh | the master patch, merged 3 days ago, solved the issue for the master branch | 10:08 |
ralonsoh | thanks in advance! | 10:08 |
sean-k-mooney | bauzas: gibi: can try an dland these backports today https://review.opendev.org/q/topic:%22remove-ami-override%22 | 10:51 |
sean-k-mooney | this will be needed by neutorn and i want to include them in the release shas for the new release for the cve which i hope to base on what is merged by EOD | 10:52 |
frickler | elodilles: ^^ maybe? or as a last resort I can undust my release-team-inherited stable-maint hat | 10:58 |
sean-k-mooney | frickler: i think elodilles is on pto so its ok to wait for others | 11:01 |
sean-k-mooney | frickler: the timeline i have in my head is prepare the branches to be tagged today by merging any pending backport we think are reasonabe | 11:01 |
sean-k-mooney | then tomorrow ill submit the release patches adn we can review them in the nova meeting | 11:02 |
sean-k-mooney | if no objections at that point then we will proceed with the actual release | 11:02 |
sean-k-mooney | frickler: if really needed i can self approve my own backport or dansmith can approve them i was just avoiding ping people before they come online | 11:03 |
gibi | sean-k-mooney: I'm +2 on the backports | 11:16 |
sean-k-mooney | gibi: thanks if you and bauzas: are both happy i can add +w as required once the preceddeing patch is merged | 11:19 |
gibi | sounds good to me | 11:21 |
sean-k-mooney | by the way i hada weirdly passive agressinve interaction with someone over the weekend. i dont know why they were getting so defensive when i asked simple questions but they abandoned there patch | 11:21 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/925037 | 11:21 |
sean-k-mooney | apparently if we have mdevs allocated volume or possible also interface detach | 11:22 |
sean-k-mooney | can fail https://bugs.launchpad.net/nova/+bug/2074219 | 11:22 |
sean-k-mooney | bauzas: we should create a functional repoducer and or see if we can replicate this in your devstack mdev series | 11:23 |
sean-k-mooney | my guess is they work for Indiana University and just decieded that actully fixing this properly upstream was too much effrot and they are just going to carry the patch downstream | 11:26 |
sean-k-mooney | but if this is a real problem i wouls like to actully fix it for everyone properly | 11:27 |
sean-k-mooney | however we shoudl acutlly create a repoducer before doing that | 11:27 |
elodilles | sean-k-mooney: actually i'm back from today on, though i just started picking up the pace o:) | 11:43 |
sean-k-mooney | elodilles: well welcome back and dont burn yourself out. | 11:54 |
elodilles | thx, i will try no to o:) | 12:00 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Add regression test case for bug 1978573 https://review.opendev.org/c/openstack/nova/+/914653 | 12:21 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Restore original AZ when unshelve fails https://review.opendev.org/c/openstack/nova/+/911108 | 12:21 |
sean-k-mooney | elodilles: i dont really want to delay the release for the cve too long but if there are any backports you think we should include in the release im happy to review | 12:26 |
kashyap | sean-k-mooney: On that MDEV patch review, for a new contributor, I wouldn't use language like: "any attempt to rely on something like that would not pass code review." | 12:31 |
sean-k-mooney | kashyap: they are not really new htey have submitted patches before over a few releases and abandoned most of them | 12:32 |
sean-k-mooney | kashyap: but they were also implying that we were not following best pratices and relying on bad coding style. | 12:32 |
kashyap | I see; I've just seen that they've also contributed in the past a patch. | 12:33 |
sean-k-mooney | when i didnt recognise the name i looked to see if they were new and saw they ahd previous patches but i tought it was odd they abandoned most of them | 12:34 |
kashyap | Yeah, I saw that comment. Maybe they could've said, "_if_ we're relying on missing attributes ..." | 12:34 |
sean-k-mooney | perhaps they seams to ignore the content of my comment hosever and get very defensive | 12:35 |
sean-k-mooney | i on 2 ocations asked them to create a repoducer to demonstrate the orgainal bug and confrim there patch actully fixed it | 12:36 |
kashyap | But is parsing the alias field here wrong? I'm not 100% sure yet | 12:36 |
sean-k-mooney | the bug is that we dont check that all the objects have it before using it | 12:45 |
sean-k-mooney | and only some of them do | 12:45 |
sean-k-mooney | you can only request MDEVs via the flavor | 12:45 |
sean-k-mooney | so we dont actuly supprot detaching or attaching them to a running domain | 12:46 |
sean-k-mooney | which is why we did not add the alias field to them | 12:46 |
sean-k-mooney | now we can just do that and that will fix the current bug | 12:46 |
sean-k-mooney | we could also fix this in other ways | 12:46 |
sean-k-mooney | without a repoducer its hard to prevent a futrue regressions | 12:47 |
kashyap | Yeah, I see that bit - that we don't support detach/attach mdevs to a running domain. | 12:47 |
kashyap | I agree; repro will help here, and you've asked as much on the review | 12:47 |
sean-k-mooney | so the fact we dont supprot it is just an aside, its why we didnt think it was required when we started using the aliases | 12:48 |
sean-k-mooney | if we fix this by adding the filed | 12:48 |
sean-k-mooney | then we have a wackamole problem fo doing this for each possibel object that could be returened | 12:48 |
sean-k-mooney | if we fix this but only checking the aliase if the object has one then we fix it for all device types | 12:48 |
sean-k-mooney | in the previous case of a PF it was correct to add teh alias filed because we do supprot attaching/detaching those to a running guest | 12:49 |
sean-k-mooney | so there patch is not wrong jsut incompelte | 12:49 |
sean-k-mooney | kashyap: is this somehting your interested in fixing? | 12:50 |
sean-k-mooney | i was trying to highlight it to bauzas in case they wanted to fix it or where seeign it in there mtty serise | 12:51 |
kashyap | sean-k-mooney: I'm not sufficiantly clued into it; I was just curious based on your remark here. (Reading back your comments) | 12:51 |
kashyap | Hmm, you do raise a good point on the whack-a-mole problem of each object | 12:51 |
kashyap | Pff; I'd probably not risk to introduce new regressions at this stage, and given capacity. | 12:51 |
sean-k-mooney | ah ok, so the underliying problem is we actully woudl ideally have a new api call in libvirt | 12:51 |
kashyap | Hmm, really? | 12:52 |
kashyap | Ah, "ideally", you said. What's the current API call? | 12:52 |
sean-k-mooney | ya so ideally libvirt would have an api to get a domain device element given an alias | 12:52 |
sean-k-mooney | since it does not we get all the devices and do the filtering ourselves | 12:52 |
kashyap | Let me check on #virt (OFTC) if they're open to it | 12:54 |
sean-k-mooney | so this is where the actly bug is in nova https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L419 | 12:54 |
sean-k-mooney | dev.alias is only in some of our devices not all | 12:54 |
sean-k-mooney | so we coudl fix it in the base clase we use or we shoudl do a hasattr check before using it | 12:55 |
kashyap | Do we know a list of devices that libvirt _does_ return here? | 12:55 |
sean-k-mooney | its all devices in the domain after we parse it | 12:56 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L419-L452 | 12:56 |
kashyap | I'm looking at get_all_devices() | 12:56 |
sean-k-mooney | so we could fix this by adding alias to https://github.com/openstack/nova/blob/master/nova/virt/libvirt/config.py#L3000 | 12:57 |
sean-k-mooney | these are not really object returned form libvirt | 12:58 |
sean-k-mooney | they are our config objects | 12:58 |
sean-k-mooney | we construct them form parsing the xml | 12:58 |
sean-k-mooney | but we check the libvirt pyton binding before doing this to see if there was a fucntion we could use more directly and didnt find one at the time | 12:58 |
sean-k-mooney | what we would do ideally is just as libvirt for the singel device element based on the alias | 12:59 |
sean-k-mooney | instead of pulling the entire domain. and parsing out all the devics to try and find the specific one | 13:00 |
kashyap | Yeah, I know these (the config.py stuff) are all libvirt guest-config classes. | 13:02 |
kashyap | Maybe you want to summarize / copy paste your comment from here on the change for the record. | 13:02 |
sean-k-mooney | not really :) i could but since they have abandoned it i feel like they are nto going to work on it any more | 13:03 |
sean-k-mooney | so i wanted to see if one of use wanted to pick it up | 13:03 |
sean-k-mooney | kashyap: im expectign bauzas mtty patches to repodcue this | 13:03 |
sean-k-mooney | so as part of using mtty to test in the gate we will likely need to fix this anyway | 13:03 |
kashyap | You think so? | 13:03 |
kashyap | Got a link? | 13:04 |
sean-k-mooney | so if this is failing the way they reported it woudl cause volume detach to fail if libvirt report mdevs | 13:04 |
sean-k-mooney | so i woudl expect the volume tempest test to be failin on https://review.opendev.org/q/topic:%22mtty_support%22 | 13:05 |
sean-k-mooney | the fact it does not makes me want the repodcures even more | 13:06 |
sean-k-mooney | oh... | 13:06 |
sean-k-mooney | ok so i wonder if this only failes if you have both a vgpu and a cinder volume | 13:07 |
kashyap | Hmm | 13:07 |
sean-k-mooney | on the same vm | 13:07 |
sean-k-mooney | we dont have result anymore but it did actully fail | 13:08 |
sean-k-mooney | https://zuul.opendev.org/t/openstack/build/42a251fe1d174a2abef7016b96fe0dc4 | 13:08 |
kashyap | Ohh, so the vGPU + Cinder volume is the combo | 13:10 |
opendevreview | Merged openstack/nova stable/2024.1: Remove AMI snapshot format special case https://review.opendev.org/c/openstack/nova/+/924982 | 13:14 |
sean-k-mooney | kashyap: i think so but again they didnt provide a clear repoducer in the bug so its hard to tell exactly | 13:21 |
sean-k-mooney | but i think that has to be the case looking at the code | 13:21 |
sean-k-mooney | we are looping over the devices in the guest domain and the error is in the mdev class | 13:22 |
sean-k-mooney | so it must only be an issue when comibining both features | 13:22 |
sean-k-mooney | which would expliain why we dont see this today | 13:22 |
dansmith | sean-k-mooney: gibi: I thought we were into "written by a core, backported by a core, can be single-vote-landed by a core" on stable now | 13:30 |
sean-k-mooney | dansmith: i guess, it does not come up often | 13:31 |
sean-k-mooney | but in either case i was going to proxy your/my +2 by end of day if noone else said go for it | 13:32 |
dansmith | ack | 13:34 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Fix KeyError on assisted snapshot call https://review.opendev.org/c/openstack/nova/+/900783 | 13:45 |
opendevreview | Rajesh Tailor proposed openstack/nova master: Handle InstanceExists exception for duplicate instance https://review.opendev.org/c/openstack/nova/+/860938 | 13:57 |
elodilles | sean-k-mooney: i'm OK to propose stable releases anytime & i don't have any important backport on my radar | 14:00 |
sean-k-mooney | elodilles: bauzas asked me to include https://review.opendev.org/c/openstack/nova/+/924514 and its backports | 14:04 |
sean-k-mooney | i think we can include https://review.opendev.org/c/openstack/nova/+/923689 also on 2024.1 | 14:05 |
elodilles | ACK, I'll wait for them then o/ | 14:06 |
bauzas | cool | 14:06 |
sean-k-mooney | elodilles: i have not realy looked at the older branch contet much | 14:07 |
sean-k-mooney | i do see some nice to haves so i may do some stable review today | 14:07 |
sean-k-mooney | but i dont see anythign super urgent | 14:08 |
elodilles | ACK, i'll check them too | 14:09 |
bauzas | sean-k-mooney: can you summarize the mdev issue you said ? | 14:34 |
bauzas | -ETOOMUCHCONTENTTODIGEST | 14:34 |
bauzas | fwiw, I need to resume my mtty series now that devstack supports 24.04 | 14:35 |
sean-k-mooney | bauzas: it appears that if a gues has an mdev and a cinder volume we cant detach t he cinder volume because the get_device_by_alis code cant hanel the fact that we dont have an aias field in the class that modeles the mdev device in the guest xml | 14:36 |
sean-k-mooney | i.e. you cant detach a cinder volume for a guest with a vgpu | 14:37 |
sean-k-mooney | we can fix that in the way propsoed in the patch but it missed the underly ing bug and did not include a repoducer | 14:37 |
bauzas | i see | 14:37 |
sean-k-mooney | the underlying bug is the get_device_by_alias code assumes all geust devices have an alias filed | 14:38 |
bauzas | so people need to rebuild the instance with a flavor without VGPU first before detaching | 14:38 |
sean-k-mooney | so we shoudl fix that | 14:38 |
bauzas | agreed | 14:38 |
bauzas | is this https://review.opendev.org/c/openstack/nova/+/925037 ? | 14:38 |
sean-k-mooney | bauzas: they might just need to stop the instnace | 14:38 |
sean-k-mooney | yep | 14:39 |
sean-k-mooney | that patch will work but it does not have enouch test coverage | 14:39 |
sean-k-mooney | and as i said the bug is reallly here https://github.com/openstack/nova/blob/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/guest.py#L419 | 14:39 |
bauzas | the real problem is that we didn't had any functest for that | 14:40 |
sean-k-mooney | that if assumes all devices in get_all_devices will have an alias field | 14:40 |
bauzas | because mdev support predates alias | 14:40 |
sean-k-mooney | yep | 14:40 |
sean-k-mooney | so ideally we woudl create a fucntional repoducer | 14:40 |
bauzas | but when we merged aliases series, we created a regression | 14:40 |
bauzas | yup | 14:40 |
bauzas | we need a functtest | 14:40 |
sean-k-mooney | and then fix both the underlying bug and optionall add supprot ofr alias ot mdevs even if not used today | 14:41 |
sean-k-mooney | we can fix the actual bug in 1 of 2 ways | 14:41 |
sean-k-mooney | either check if dev has an alais field before usign it | 14:41 |
sean-k-mooney | or make sure that all device will have that eihter by extending the base class or checking all child class have it | 14:42 |
sean-k-mooney | so either add it here https://github.com/openstack/nova/blob/7a7427691e0bd4818bb7a2c5f5371e0244addbbb/nova/virt/libvirt/config.py#L56 or add a hasattr check | 14:43 |
bauzas | I'm fine with the first option | 14:46 |
bauzas | the second option is better in case of *another* regression | 14:46 |
bauzas | want me to help with the reproducer ? | 14:46 |
bauzas | I just have 3.5 days to work before my PTO and I really want my mtty series to be reviewable before I leavred | 14:47 |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Replace -- as 3-4 char in hostname with - https://review.opendev.org/c/openstack/nova/+/893072 | 15:23 |
*** efried1 is now known as efried | 15:25 | |
opendevreview | Merged openstack/nova stable/2023.2: Remove AMI snapshot format special case https://review.opendev.org/c/openstack/nova/+/924999 | 15:33 |
*** efried1 is now known as efried | 15:33 | |
*** efried1 is now known as efried | 15:53 | |
*** bauzas_ is now known as bauzas | 16:01 | |
opendevreview | Merged openstack/nova stable/2023.2: Fixed an error when caching multiple images in aggregate https://review.opendev.org/c/openstack/nova/+/904017 | 16:09 |
sean-k-mooney | bauzas: i rehcecked the last patch in your mtty seriese and it didnt repoduce the cinder issue but it did fail with https://paste.opendev.org/show/bexn6H6KYRrAlhNONtVn/ | 16:20 |
sean-k-mooney | there is a bug somewehre but im not sure that https://review.opendev.org/c/openstack/nova/+/918420/ is even current | 16:22 |
opendevreview | sean mooney proposed openstack/nova stable/2023.1: Fixed an error when caching multiple images in aggregate https://review.opendev.org/c/openstack/nova/+/904018 | 16:28 |
opendevreview | Merged openstack/nova stable/2023.2: Added context manager for instance lock https://review.opendev.org/c/openstack/nova/+/911394 | 16:39 |
opendevreview | Merged openstack/nova stable/2023.2: Separate OSError with ValueError https://review.opendev.org/c/openstack/nova/+/911395 | 16:44 |
sean-k-mooney | bauzas: if you have time to work on it do | 17:25 |
sean-k-mooney | perhaps just try that usecase "mdev + cinder volume in the same vm" when working on the mtty stuff | 17:26 |
sean-k-mooney | and see if you see the same issue | 17:26 |
sean-k-mooney | if you do then we can confirm the bug | 17:26 |
sean-k-mooney | and create a repoducer | 17:26 |
sean-k-mooney | in the fucntional tests | 17:26 |
opendevreview | Merged openstack/nova stable/2023.1: Fixed an error when caching multiple images in aggregate https://review.opendev.org/c/openstack/nova/+/904018 | 19:51 |
*** bauzas_ is now known as bauzas | 20:16 | |
opendevreview | Pavlo Shchelokovskyy proposed openstack/nova master: Replace -- as 3-4 char in hostname with - https://review.opendev.org/c/openstack/nova/+/893072 | 21:08 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!