| *** ykarel__ is now known as ykarel | 04:46 | |
| ralonsoh | sean-k-mooney, gibi hello folks. Yesterday I was looking at the issues with pyroute2 and asyncio in the nova-compute agent | 07:12 |
|---|---|---|
| ralonsoh | I know this is not a solution but it could help, as a workaround, to wrap all pyroute2 operations with a privsep context | 07:12 |
| ralonsoh | these operations will be executed in the daemon process instead of running in the main process | 07:13 |
| ralonsoh | that will avoid the asyncio problem experienced when executing non-privileged commands | 07:13 |
| ralonsoh | that will allow us to bump the pyroute2 version | 07:14 |
| gibi | ralonsoh: as I'm not sure about the root cause of the issue I cannot judge if moving the pyroute2 calls to another process will help or not. Have you tried to see if it helps? | 07:24 |
| ralonsoh | gibi no, but I can propose an os-vif patch, at least for the OVS related commands | 08:20 |
| ralonsoh | let me try it | 08:20 |
| gibi | cool | 08:21 |
| opendevreview | Rodolfo Alonso proposed openstack/os-vif master: Make ``PyPyroute2._lookup_interface`` private https://review.opendev.org/c/openstack/os-vif/+/991292 | 08:34 |
| opendevreview | Rodolfo Alonso proposed openstack/os-vif master: Make ``PyRoute2.exists`` privileged in the OVS library https://review.opendev.org/c/openstack/os-vif/+/991293 | 08:47 |
| ralonsoh | gibi, I've proposed https://review.opendev.org/c/openstack/requirements/+/973210, depending on ^^ | 08:54 |
| ralonsoh | nova-alt-configurations-os-vif should install this os-vif patch | 08:55 |
| opendevreview | dalekseev proposed openstack/nova master: Restrict machine type check to QEMU instances https://review.opendev.org/c/openstack/nova/+/991137 | 08:55 |
| opendevreview | Joan Gilabert proposed openstack/nova master: move compile earlier https://review.opendev.org/c/openstack/nova/+/950516 | 09:12 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Add mtty/mdpy support for testing fake mdevs https://review.opendev.org/c/openstack/nova/+/898100 | 09:12 |
| opendevreview | ribaudr proposed openstack/nova master: Add regression test for bug #2120927 https://review.opendev.org/c/openstack/nova/+/991294 | 09:17 |
| opendevreview | ribaudr proposed openstack/nova master: Fix shelve-offload/unshelve race wiping instance host https://review.opendev.org/c/openstack/nova/+/991295 | 09:17 |
| sean-k-mooney | ralonsoh: im not really a fan of that idea by the way | 10:17 |
| sean-k-mooney | i would prefer to just run the command in a futurist process pool | 10:18 |
| sean-k-mooney | than to pretend the call needs privilege escalation | 10:18 |
| sean-k-mooney | or provide a cli driver that we use when eventlet is enabled | 10:19 |
| sean-k-mooney | nova-compute already defaults to threaded mode. in september we can start removing eventlet support for the 2027.1 cycle if we choose to | 10:20 |
| sean-k-mooney | at which point we wont need the privsep hack | 10:20 |
| sean-k-mooney | frickler: so the next issue after you fix the first-boot being interrupted is apparently getting scheduled to a rax xen host where the cirros image takes 11 seconds for the initramfs to start, 19 seconds for /etc/init.d/rc.sysinit, and 58 seconds to get to the login prompt, when we only wait 15 seconds for the system to boot... | 10:26 |
| sean-k-mooney | so i dont think we can fix random slow nodes but thats progress at least | 10:27 |
| ralonsoh | sean-k-mooney, but if you are running in threaded mode, why does pyroute have this issue? | 10:30 |
| sean-k-mooney | im not sure it does have it in threaded mode | 10:30 |
| sean-k-mooney | but we support both modes and we only enabled threading by default this cycle | 10:30 |
| sean-k-mooney | as in its only been that way for like 2 months | 10:31 |
| sean-k-mooney | ralonsoh: i have not had time to look at the actual issue in quite a while | 10:32 |
| sean-k-mooney | but perhaps there are other approaches we could take | 10:32 |
| sean-k-mooney | is there a way to reproduce this issue locally? | 10:32 |
| sean-k-mooney | my understanding was it was only seen in ci intermittently | 10:32 |
| ralonsoh | no, that happens always in the CI when bumping pyroute to 0.9.6 | 10:34 |
| ralonsoh | I'm not sure exactly in what command | 10:34 |
| sean-k-mooney | oh ok if its repeatable can we trigger it with tempest locally in devstack? | 10:36 |
| sean-k-mooney | im asking because if we can i can maybe take a look at it in a vm and see if i can debug it a bit | 10:36 |
| opendevreview | Joan Gilabert proposed openstack/nova master: WIP : Add mtty support to nova-next https://review.opendev.org/c/openstack/nova/+/922140 | 12:04 |
| opendevreview | Ashish Gupta proposed openstack/nova master: tests: file-backed SQLite with WAL in threading mode for Database and CellDatabases Fixtures https://review.opendev.org/c/openstack/nova/+/988583 | 12:45 |
| opendevreview | Rodolfo Alonso proposed openstack/os-vif master: Make ``PyPyroute2._lookup_interface`` private https://review.opendev.org/c/openstack/os-vif/+/991292 | 13:31 |
| opendevreview | Rodolfo Alonso proposed openstack/os-vif master: Make ``PyRoute2.exists`` privileged in the OVS library https://review.opendev.org/c/openstack/os-vif/+/991293 | 13:32 |
| opendevreview | ribaudr proposed openstack/nova master: Add reproducer for bug #2117544 https://review.opendev.org/c/openstack/nova/+/991350 | 13:34 |
| opendevreview | ribaudr proposed openstack/nova master: Deserialize JSON properties from volume_image_metadata https://review.opendev.org/c/openstack/nova/+/991351 | 13:34 |
| opendevreview | ribaudr proposed openstack/nova master: Deserialize JSON properties from volume_image_metadata https://review.opendev.org/c/openstack/nova/+/991351 | 13:36 |
| opendevreview | Takashi Kajinami proposed openstack/nova-specs master: libvirt: AMD SEV-SNP support https://review.opendev.org/c/openstack/nova-specs/+/983376 | 13:50 |
| Uggla | I guess there is a dependency resolution issue with grenade and stable/2025.1 https://zuul.opendev.org/t/openstack/build/23ff7cd6c0e649a9b83bb7df6f9dfc7b | 14:40 |
| Uggla | elodilles are you aware of ^ | 14:40 |
| elodilles | Uggla: yes, rpds-py does not support python 3.10 anymore. it was discussed on the QA channel the other day and the plan is to release tempest and pin it for jobs that still use Ubuntu Jammy 22.04, which has python3.10 by default | 14:50 |
| Uggla | elodilles thx, so I will wait for the fix. | 14:52 |
| elodilles | +1 | 14:56 |
| gmaan | elodilles: Uggla which job is failing? I can fix that as i sent in email | 14:59 |
| Uggla | gmaan it is nova-grenade-multinode | 15:00 |
| Uggla | gmaan you can see it from here : https://review.opendev.org/c/openstack/nova/+/988154 | 15:01 |
| elodilles | gmaan: i can update this patch with your fix to unblock the gate https://review.opendev.org/c/openstack/nova/+/989579 | 15:02 |
| elodilles | o:) | 15:02 |
| Uggla | Reminder: upstream bug triage in ~30 min | 15:02 |
| gmaan | elodilles: Uggla ohk, grenade on 2025.1, which i suggest removing :) but if you are fixing it, that is ok | 15:03 |
| elodilles | Uggla: sorry, i didn't have enough time for bug triaging this time, so i'll just listen in today :S | 15:03 |
| elodilles | gmaan: yes, yes, that we agreed to remove o:) | 15:04 |
| elodilles | gmaan: my fix was about to land when the upper-constraints were bumped and struck again on the grenade o:) | 15:05 |
| Uggla | elodilles, no worries, tbh I do not expect much this week on the triage due to review activities. | 15:06 |
| *** ralonsoh is now known as ralonsoh_ooo | 15:09 | |
| Uggla | Upstream bug triage: https://meet.google.com/zjr-rxus-hzj | 15:28 |
| sean-k-mooney | thats in 20 minutes right? | 16:11 |
| sean-k-mooney | oh no it was an hour ago | 16:11 |
| sean-k-mooney | is it 15:30 UTC ? | 16:12 |
| melwitt | sean-k-mooney: yes 15:30 UTC | 16:23 |
| gmaan | elodilles: commented on the grenade job fix. you still need to pin tempest https://review.opendev.org/c/openstack/nova/+/989579 | 17:28 |
| gmaan | tempest change to support that is merged so no depends-on needed anymore | 17:28 |
| opendevreview | Ghanshyam Maan proposed openstack/nova stable/2025.1: [CI][stable-only] nova-grenade-multinode fix https://review.opendev.org/c/openstack/nova/+/989579 | 17:44 |
| sean-k-mooney | so while debugging with the new cirros image i noticed something interesting | 18:04 |
| sean-k-mooney | https://tinyurl.com/2k63h77v | 18:04 |
| sean-k-mooney | while it does not always cause a failure https://paste.opendev.org/show/b5V9zieRqmJVFjyb5uz9/ | 18:05 |
| sean-k-mooney | nova is triggering that traceback like 50 times a day | 18:06 |
| sean-k-mooney | or rather the volume_snapshot_delete assisted volume snapshot test | 18:06 |
| sean-k-mooney | is triggering that often | 18:06 |
| sean-k-mooney | if i remove the filter on nova that triggered 2341 times in the last 10 days | 18:07 |
| sean-k-mooney | thats with lvm or ceph as the backend | 18:08 |
| sean-k-mooney | it looks like if the snapshot is not found we explode because we dont have the type info | 18:09 |
| melwitt | that is a very old issue | 18:14 |
| sean-k-mooney | it felt familiar | 18:14 |
| sean-k-mooney | i dont see a bug report for it quickly so im going to file one | 18:14 |
| melwitt | yeah I'm looking for it, sec | 18:15 |
| melwitt | ok that works too | 18:15 |
| sean-k-mooney | https://bugs.launchpad.net/nova/+bug/2155187 | 18:15 |
| sean-k-mooney | i kind of hate "standard tracebacks" in logs | 18:16 |
| sean-k-mooney | thats partly why i try to pretend os-brick is not a thing | 18:17 |
| dansmith | agree, tracebacks in logs for normal occurrences is extremely uncool | 18:17 |
| dansmith | we had an effort like ten years ago to get rid of them all | 18:17 |
| dansmith | but as you say, brick is like a traceback generator :) | 18:17 |
| melwitt | yeah. I know there was a fix proposed for this exact problem and I can't remember why it stalled out. and I can't even find it right now | 18:18 |
| sean-k-mooney | we have definitely fixed other cases where we assumed a key was always present | 18:18 |
| sean-k-mooney | i dont recall if it was this exact one but we have had that class of issue with the cinder path a few times | 18:18 |
| sean-k-mooney | what i find interesting about this is we seem to mostly handle this internally | 18:19 |
| sean-k-mooney | as in its a key error but of the 2200 or so hits only like 180 of those failed | 18:19 |
| melwitt | it was this https://bugs.launchpad.net/nova/+bug/2033541 and this https://review.opendev.org/c/openstack/nova/+/900783 | 18:20 |
| sean-k-mooney | so at least in the volume delete path this is just ugly in the logs | 18:20 |
| sean-k-mooney | ah cool i can close it as a dupe then | 18:20 |
| melwitt | (that was what I was thinking of) | 18:20 |
| melwitt | is the one you saw with ceph also? | 18:20 |
| sean-k-mooney | yep | 18:21 |
| melwitt | I swear google is working less and less. launchpad search found me that when google couldn't | 18:21 |
| melwitt | you might consider closing the older one as a dupe, it clearly is not getting searched easily haha | 18:22 |
| sean-k-mooney | https://tinyurl.com/2k63h77v | 18:22 |
| sean-k-mooney | so thats the case where the trace happened but the job passes | 18:22 |
| sean-k-mooney | but if you widen that out | 18:22 |
| melwitt | yeah. it absolutely happens all the time. and there was/is a patch proposed but it got stalled out for reasons i don't remember and I think that discouraged everything | 18:23 |
| sean-k-mooney | its present in the cinder cinder-plugin-ceph-tempest, devstack-plugin-nfs-tempest-full | 18:23 |
| sean-k-mooney | and cinder-tempest-lvm-multibackend | 18:23 |
| sean-k-mooney | jobs | 18:23 |
| sean-k-mooney | so it seems to be backend independent | 18:23 |
| sean-k-mooney | if for any reason the snapshot is not found | 18:23 |
| sean-k-mooney | it will explode | 18:23 |
| melwitt | yeah it is the arbitrary dict thing which really sucks | 18:24 |
| sean-k-mooney | well in my case there was a 404 | 18:24 |
| melwitt | some backends use X keys some use Y keys and it's a wild west | 18:24 |
| sean-k-mooney | i.e. the snapshot was not found in cinder during the delete test | 18:24 |
| melwitt | there's no standard for that payload, last I checked | 18:24 |
| sean-k-mooney | and then we didnt have the info as a result | 18:24 |
| sean-k-mooney | melwitt: ya i have noticed that too | 18:25 |
| sean-k-mooney | i.e. that the payload is very backend specific | 18:25 |
| melwitt | I remember digging into it a bit and getting really discouraged | 18:25 |
| melwitt | yeah. and I think they are not documented either. so I'm not sure what the first step in the right direction is even | 18:25 |
| melwitt | that isn't just papering over something, I mean | 18:26 |
| melwitt | it could probably be its own PTG topic | 18:26 |
| sean-k-mooney | have you heard of microversions :P | 18:26 |
| melwitt | heh | 18:27 |
| melwitt | I mean like for this, we could catch and ignore KeyError for this specific thing but the problem is bigger than that | 18:27 |
| sean-k-mooney | skimming the patch comments im unsure if this just stalled out because of a 2 month delay in updating the patch for your -1 or if there was any other outstanding issue with it | 18:29 |
| sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/900783/13/nova/virt/libvirt/driver.py | 18:30 |
| sean-k-mooney | i think it is fine for this specific case | 18:30 |
| sean-k-mooney | it just does not resolve that class of issue generally | 18:30 |
| opendevreview | Rajesh Tailor proposed openstack/nova master: Fix KeyError on assisted snapshot call https://review.opendev.org/c/openstack/nova/+/900783 | 18:31 |
| melwitt | ok I see | 18:31 |
| sean-k-mooney | i just rebased that in the ui, lets see if it passes or not | 18:31 |
| melwitt | cool thanks | 18:31 |
| sean-k-mooney | that was july 2024 which was right around the time we were dealing with https://security.openstack.org/ossa/OSSA-2024-001.html | 18:34 |
| sean-k-mooney | i.e. we were fixing the qemu image stuff and the fallout from breaking iso etc. | 18:35 |
| opendevreview | Ghanshyam Maan proposed openstack/nova stable/2025.1: [CI][stable-only] nova-grenade-multinode fix https://review.opendev.org/c/openstack/nova/+/989579 | 21:07 |
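
The 18:09 to 18:27 exchange about the assisted snapshot delete path describes a dict payload that may lack a key when the snapshot is not found, which surfaces as a KeyError traceback in the logs. A minimal sketch of the "catch it for this specific thing" idea follows. It is not nova's code: `delete_info`, the `type` key name, `SnapshotInfoMissing` and both functions are hypothetical names chosen for illustration, and the real payload is backend specific as noted in the log.

```python
import logging

LOG = logging.getLogger(__name__)


class SnapshotInfoMissing(Exception):
    """Hypothetical error: the snapshot payload lacks a key we need."""


def snapshot_type(delete_info):
    """Return the snapshot type, or raise a clear error if it is absent.

    ``delete_info`` stands in for the backend-specific dict discussed in
    the log; the ``type`` key name is an assumption for illustration.
    """
    try:
        return delete_info["type"]
    except (KeyError, TypeError):
        # KeyError: key missing; TypeError: payload is None.
        raise SnapshotInfoMissing("snapshot payload has no 'type' key") from None


def delete_assisted_snapshot(delete_info):
    """Pretend to delete a snapshot; report a missing payload quietly."""
    try:
        kind = snapshot_type(delete_info)
    except SnapshotInfoMissing as exc:
        # One warning line instead of a full traceback in the logs.
        LOG.warning("Skipping assisted snapshot delete: %s", exc)
        return False
    LOG.info("Deleting %s snapshot", kind)
    return True
```

As melwitt points out at 18:27, this only papers over one key; it does nothing about the payload having no documented schema across backends.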
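
The "cli driver" alternative raised at 10:19 amounts to shelling out to a command in a child process instead of calling pyroute2 inside the agent process. A rough sketch under stated assumptions is below: `interface_exists` and the injectable `runner` parameter are hypothetical, nothing in the log says os-vif has or will have such a driver, and the only external behaviour relied on is that `ip link show dev <name>` exits non-zero when the device does not exist.

```python
import subprocess


def interface_exists(name, runner=subprocess.run):
    """Return True if ``ip link show dev <name>`` exits with status 0.

    ``runner`` is injectable so the check can be exercised without
    touching real network devices; it must accept the same arguments as
    ``subprocess.run`` and return an object with a ``returncode``.
    """
    result = runner(
        ["ip", "link", "show", "dev", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a missing device is an expected outcome, not an error
    )
    return result.returncode == 0
```

Because the lookup runs in a separate short-lived process, it never touches the caller's event loop; the cost is a fork and exec per call, which is part of why the log treats this as a fallback for eventlet mode rather than a default.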