Wednesday, 2024-07-03

fungiyeah, seems like tempest-integrated-compute-enforce-scope-new-defaults is repeatedly timing out (during volume tests but those aren't necessarily what's taking so long) for both the glance and nova changes previously discussed00:19
fungithough the glance one might barely finish just under the wire this time around00:19
fungionce it (hopefully) merges, i'll reenqueue and promote the nova change again00:20
fungiokay, glance change landed, nova change is back to the top of the gate for a third try00:25
dansmithfungi++00:48
dansmithfungi: that new defaults job has been timeout-heavy lately for sure, I think sean had a thing to bump the timeout, which I suppose we might have to do.. 00:54
opendevreviewDan Smith proposed openstack/nova stable/2024.1: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92334201:55
opendevreviewDan Smith proposed openstack/nova stable/2023.2: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92334301:55
opendevreviewDan Smith proposed openstack/nova stable/2023.1: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92334401:55
dansmithabhishek_: ^01:55
dansmithmelwitt: if you're around by chance and could monitor those ^ we need them in ASAP (master still pending) so we can get glance CVE patches merged that depend on them01:56
abhishek_dansmith: ++01:56
melwittdansmith: ok, I can do it02:12
dansmithmelwitt: thanks, master has been rechecked several times already, so who knows02:14
melwittyeah. we can at least get some more dice rolls in before EU timezone02:16
abhishek_is there someone in this timezone who can prioritise the patch in gate (just in case it's required)02:21
dansmithmelwitt: ++02:21
dansmithabhishek_: I'm not sure, frickler will be coming online, might be the earliest02:21
abhishek_ack02:21
dansmithI think he can do it, but not sure02:22
abhishek_will ask him if required, thank you!02:22
opendevreviewMerged openstack/nova master: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92332202:37
fungiyay!02:37
dansmithamazing02:37
dansmithfungi: do we prioritize the stable ones too or no?02:38
fungimight not be necessary this time of day if they already have passing check results and there's not much else in the gate, i haven't looked02:39
dansmithack02:41
fungibut if it will help in getting them merged faster we still can02:43
fungijust depends on whether there's much of a queue or jobs are continuing to be problematic02:43
dansmithabhishek_: ^ dunno how important it is02:44
abhishek_dansmith: I think we can wait, since it is going to take about a day to merge the master patches02:44
dansmithack02:44
abhishek_nova merged02:46
*** bauzas_ is now known as bauzas03:32
*** bauzas_ is now known as bauzas04:06
fricklerwell the gate is currently looking ok-ish after https://review.opendev.org/923344 caused a gate reset. let's hope the current stack merges without further failures. I'll try to keep an eye on it and look into reshuffling if further issues should happen05:13
fricklerand there goes hoping ... weird early failure in grenade-multinode that I haven't seen before, but looks 99% unrelated https://zuul.opendev.org/t/openstack/build/322833ee6e7a43b0aee2ca9803fc4f5706:00
bauzasgood morning folks07:27
bauzasfrickler: I was on PTO yesterday, so I just got the emails07:28
bauzasbut AFAICS, we needed to update Tempest right?07:28
bauzashttps://review.opendev.org/c/openstack/nova/+/92332207:28
bauzasI'm now chasing https://review.opendev.org/c/openstack/nova/+/923255/207:28
fricklerbauzas: the latter is failing in gate with the error I posted before, IMO it could be re-enqueued into gate immediately once zuul reports the failures, together with the stack on top of it07:32
* bauzas digs into https://zuul.opendev.org/t/openstack/build/322833ee6e7a43b0aee2ca9803fc4f5707:33
fricklerI found keystone (once again) to be essentially undebuggable with the amount of errors generated during normal operations07:34
bauzasyeah https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_322/923255/2/gate/nova-grenade-multinode/322833e/controller/logs/grenade.sh_log.txt07:34
bauzasmy only concern is that I'm not sure this is a transient issue07:35
bauzaslooks more a keystone regression to me07:35
bauzas2024-07-03 05:20:04.386 |     testtools.matchers._impl.MismatchError: '56fc53e2e0724c0284ab80060ec55421' != '39aa3070a75e446fb6aa6d48bab74c58'07:35
bauzasunless someone keystone-ed is able to tell me this is something like a race condition in Keystone07:36
opendevreviewMerged openstack/nova stable/2024.1: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92334208:00
fricklerbauzas: if it is a regression it should fail in gate next time, too, but I don't see any recent changes in keystone. so I still suggest to reenqueue the stack into gate, unless you prefer to only recheck it08:14
bauzasI'll recheck it 08:15
bauzasand we'll see08:15
bauzasdone08:17
opendevreviewMaxime Lubin proposed openstack/nova-specs master: USB over IP  https://review.opendev.org/c/openstack/nova-specs/+/92336208:29
opendevreviewMaxime Lubin proposed openstack/nova-specs master: USB over IP  https://review.opendev.org/c/openstack/nova-specs/+/92068708:43
opendevreviewMaxime Lubin proposed openstack/nova-specs master: USB over IP  https://review.opendev.org/c/openstack/nova-specs/+/92068708:56
opendevreviewMaxime Lubin proposed openstack/nova-specs master: USB over IP  https://review.opendev.org/c/openstack/nova-specs/+/92068709:16
Uggla@gibi, can you have a look at this patch : https://review.opendev.org/c/openstack/nova/+/868089/8  can it be merged ?09:20
fricklernext failure, this time one of the "regulars" tempest.api.compute.servers.test_server_actions.ServerActionsV293TestJSON.test_rebuild_volume_backed_server10:07
bauzasyeepeekay10:08
sean-k-mooneyfrickler: bauzas i requeued the first patch but i kind of feel like waiting for that to merge before queuing the rest10:43
sean-k-mooneywhat do ye think10:44
sean-k-mooneyshould we hold all other nova patches until this lands and ask people on the mailing list not to recheck10:44
sean-k-mooneythe random failures are not necessarily load related so im not really sure if it will help10:45
sean-k-mooneybut we are consuming a lot of ci resources enqueuing all 4 patches just to have them kicked out because of a failure in the first one10:45
fricklersean-k-mooney: on one hand I understand that concern, on the other hand these patches are being used in production in a lot of systems already, so IMO it should have highest priority to at least get them into master, even if other projects have to suffer a bit from that11:15
sean-k-mooneyok so you're in favor of queuing the rest. i was not sure if it would increase the likelihood of a failure but ok ill recheck the remaining 3 and get them moving11:16
sean-k-mooneyactually im not sure the second one needs a recheck, it still has +111:17
sean-k-mooneyi have rechecked the third patch11:18
frickleras I said earlier I would also be in favor of moving these to gate directly11:18
frickleras the failures pretty certainly look unrelated11:18
sean-k-mooneyim not against that, i just dont want to keep pinging fungi, you or clark to do that11:19
zhhuabjhi team, does anyone know why ci can't run in this patch - https://review.opendev.org/c/openstack/nova/+/90961111:19
sean-k-mooneyoh ya they are not related any more. we had a fully green run last night on the last patch so the set as a whole i think is good11:19
fricklerwell currently it is me who keeps pinging nova people with the offer ;)11:19
sean-k-mooneyi think we have got enough results from check in aggregate to be ok with merging them as is at this point11:20
fricklerzhhuabj: that patch needs a rebase, see the "not current" on the relation chain11:20
zhhuabjthanks frickler, I will ping patch owner to do rebase to have a try, thanks11:24
sean-k-mooneyzhhuabj: i left some feedback on the parent patch https://review.opendev.org/c/openstack/nova/+/92037411:30
sean-k-mooneythe unit tests you added are not in line with our code style, specifically you are adding multiple different test cases in a single test method and you are not using asserts correctly.11:31
fricklerok, so now I need to refresh my memory on how that "enqueue into gate" actually works ;)11:31
fricklerhmm, got myself a keycloak account, but that only seems to allow dequeue + promote, need to read more docs11:44
zhhuabjhi sean-k-mooney , I saw your comments , thanks for your review. you mentioned using assert_called_with instead of 'assert str(expected_call) == str(actual_call)', deepcopy may change object's memory address as well, so assert_called_with will not work12:07
fungialso, i'm done with my morning errands and can help (re)enqueue changes into the gate, reorder the queue, delete verified -2 results, et cetera12:07
fungifrickler: i keep forgetting about the webui, just been using the zuul-client cli12:08
zhhuabjsean-k-mooney: this is error log when using assert_called_with, we can see actually these two strings are equal, but assert-called-with will throw the exception, so I think that's the reason deepcopy has changed memory address of object - https://paste.openstack.org/show/boR6m5V8Zegg5p9iRXlG/12:10
sean-k-mooneythat is probably because the instance object has changed fields that are not being printed12:11
sean-k-mooneyyou could compare the object_to_primitive dump but just comparing the string wont compare them properly12:13
fricklerfungi: yes, I just needed to figure out how to drop the V-2 first, but found that in the force-merge guide12:13
fricklernow I was just a little bit confused because I only touched one change, but the whole stack landed in gate at the same time12:14
fricklerbut it seems all the others had already finished their rechecks12:14
fungizuul will auto-enqueue dependent changes if they meet its requirements12:15
zhhuabjsean-k-mooney: do you mean this one - https://paste.openstack.org/show/bd510w2dMMZdwOOjEvS7/ 12:25
sean-k-mooneykind of, depending on which parameter had the untracked changes12:35
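The two positions in the exchange above can be illustrated with a small sketch. It is not taken from the patch under review: `FakeBDM` is a hypothetical stand-in for an object that defines no `__eq__`, and `vars()` stands in for a primitive dump. Under those assumptions, `assert_called_with` rejects a deep copy because equality falls back to identity, while a string comparison can pass even though a field that `__repr__` omits has changed.

```python
import copy
from unittest import mock


class FakeBDM:
    """Hypothetical stand-in for an object that defines no __eq__."""

    def __init__(self, device_name, hidden=0):
        self.device_name = device_name
        self.hidden = hidden  # deliberately left out of __repr__

    def __repr__(self):
        return f"FakeBDM(device_name={self.device_name!r})"


original = FakeBDM("/dev/vda")
copied = copy.deepcopy(original)

# Case 1: the deep copy is a distinct object and, without __eq__, Python
# compares by identity, so assert_called_with fails despite equal reprs.
consumer = mock.Mock()
consumer(copied)
try:
    consumer.assert_called_with(original)
    identity_mismatch = False
except AssertionError:
    identity_mismatch = True

# Case 2: comparing strings passes even though a field not shown by
# __repr__ now differs between the two objects.
copied.hidden = 1
strings_match = str(mock.call(original)) == str(mock.call(copied))

# Case 3: comparing a dump of the fields (vars() here) catches the change.
fields_match = vars(original) == vars(copied)

print(identity_mismatch, strings_match, fields_match)
```

Comparing a field-by-field dump is the only one of the three checks here that both tolerates the copy and notices the changed field.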
opendevreviewTakashi Kajinami proposed openstack/os-vif master: Remove old excludes  https://review.opendev.org/c/openstack/os-vif/+/91761112:36
opendevreviewZhang Hua proposed openstack/nova master: BlockDeviceMapping object has no attribute copy  https://review.opendev.org/c/openstack/nova/+/92037412:52
zhhuabjhi sean-k-mooney , I posted a new change to address your comment - https://review.opendev.org/c/openstack/nova/+/920374/3/nova/tests/unit/virt/libvirt/test_blockinfo.py12:54
opendevreviewMerged openstack/nova stable/2023.2: Fix disk_formats in ceph job tempest config  https://review.opendev.org/c/openstack/nova/+/92334312:56
opendevreviewSahid Orentino Ferdjaoui proposed openstack/nova master: scheduler: fix _get_sharing_providers to support unlimited aggr  https://review.opendev.org/c/openstack/nova/+/92166512:58
sean-k-mooneyzhhuabj: thanks, we are currently focusing on upstream and downstream tasks related to the cve discussion yesterday but we will loop back and review when we have time13:04
opendevreviewAndrei Yachmenev proposed openstack/nova master: Fix processing uniqueness of instance host names  https://review.opendev.org/c/openstack/nova/+/92339513:10
dansmithhead nova patch is going to fail openstacksdk again13:23
dansmithsame autoallocate thing it appears :(13:25
sean-k-mooneythat has been flaky for a while it seems. do we want to make it non-voting until the cve patches are landed or just recheck13:28
dansmithidk13:28
sean-k-mooneywe could quickly update it with the flaky decorator13:28
sean-k-mooneythat will skip if it fails and run it otherwise13:28
sean-k-mooneybut the sdk team would need to be aware they need to fix it13:29
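The behaviour described above (run the test, but report a skip instead of a failure) could look roughly like the sketch below. The decorator name `skip_on_failure` and the test class are hypothetical and written only for illustration; this is not the decorator the projects actually ship.

```python
import functools
import unittest


def skip_on_failure(reason):
    """Hypothetical decorator: report a failing test as skipped."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except unittest.SkipTest:
                # An explicit skip from the test itself passes through as is.
                raise
            except Exception as exc:
                # AssertionError and other errors become a skip with context.
                raise unittest.SkipTest(f"{reason}: {exc}") from exc
        return wrapper
    return decorator


class ExampleTest(unittest.TestCase):
    @skip_on_failure("known flaky, tracked elsewhere")
    def test_sometimes_breaks(self):
        self.assertEqual(1, 2)

    @skip_on_failure("known flaky, tracked elsewhere")
    def test_passes(self):
        self.assertEqual(1, 1)


def run_example():
    """Run both example tests and return the collected TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The trade-off raised in the discussion still applies: a skip keeps the gate moving, but it also hides the failure, so whoever owns the test needs to know it is being masked.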
fricklerthere's a fix already proposed for the sdk job https://review.opendev.org/c/openstack/openstacksdk/+/92337913:33
fricklerI guess we should fast-approve that and fix stephenfin's comments in a followup?13:36
sean-k-mooneystephenfin: ^13:37
sean-k-mooneyyes i think so13:37
fricklerdone and promoted to top of gate. I think we should also dequeue and re-enqueue the nova stack?13:41
frickleroh, the promotion took care of that already13:42
*** bauzas_ is now known as bauzas13:46
bauzassorry my IRC bouncer has some problems 13:46
bauzasfrickler: where are we now ?13:47
fricklerbauzas: the nova stack was failing again due to an issue in the sdk job. this is to be fixed by https://review.opendev.org/c/openstack/openstacksdk/+/923379 which I've now promoted to the head of the gate pipeline13:48
bauzasack13:49
frickleralso https://meetings.opendev.org/irclogs/%23openstack-nova/latest.log.html is always there to help you, but updated only every 15 minutes13:49
fungiif you're in a hurry and don't want to wait for the update, the less fancy txt version should be more continuously updated too13:59
*** whoami-rajat_ is now known as whoami-rajat14:00
fungithe opendevmeet logbot buffers messages somewhat but streams them more quickly to the txt file, then a cronjob periodically generates the html version from that14:00
fungiwhich is the reason for the apparent delay if you're only looking at the html version of the file14:03
fungifor example https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2024-07-03.log (we don't have a "latest" redirector or any direct links for those, but could probably add them without too much work)14:05
*** bauzas_ is now known as bauzas14:16
fricklersdk patch got updated and gate restarted. I'll leave it like that unless the sdk job fails again in the top nova patch (doesn't seem to be happening 100%)14:18
ygk_12345hi folks14:19
ygk_12345can anyone tell me why we need to check resource limits when executing a qemu command in nova-compute using oslo prlimit utility ?14:20
fungiygk_12345: because qemu is not designed for running untrusted images, and broken or malicious images can cause it to consume unlimited resources (among other things)14:24
ygk_12345fungi: u mean something like ddos attacks 14:25
fungiygk_12345: well, not ddos (the first d in ddos means "distributed")14:31
fungithis would just be a plain old denial of service14:31
fungiygk_12345: though qemu tools can be coerced into doing worse things than just that, which is why we try to deeply inspect image files. see https://security.openstack.org/ossa/OSSA-2024-001.html and https://bugzilla.redhat.com/show_bug.cgi?id=2278875 for examples14:34
ygk_12345fungi: is this a new addition in bobcat release or is it in earlier versions as well ?14:38
fungiygk_12345: the openstack security advisory links to patches for releases as far back as 2023.1 (antelope), but there are also patches in the linked bug that make it possible to backport as far back as victoria (with some additional effort)14:39
fungior possibly even train14:40
ygk_12345fungi: one question though. the prlimit code, was it included in the oslo_concurrency tool long before this vulnerability was reported ? Also, do our antelope setups also have to be patched ?14:42
fungithe prlimit work was done previously, yes, in order to mitigate other known resource consumption risks in qemu. i was mentioning yesterday's openstack security advisory and that qemu bug as examples of other risks14:45
fungier, examples of additional risks i mean14:45
fungithe patches linked in ossa-2024-001 are new fixes for a vulnerability we just announced yesterday. the prlimit checks have been around much longer14:47
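As a rough illustration of the resource-limit idea discussed above, the sketch below caps a child process using only the Python standard library on a Unix host. It is not how nova or oslo.concurrency implement it; the helper name `run_limited` and the limit values are assumptions chosen for the example.

```python
import resource
import subprocess
import sys


def run_limited(cmd, cpu_seconds, address_space_bytes=None):
    """Run cmd in a child process with resource caps applied (Unix only)."""
    def apply_limits():
        # Runs in the child after fork() and before exec(), so the caps
        # apply only to the external tool, not to the calling service.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        if address_space_bytes is not None:
            resource.setrlimit(
                resource.RLIMIT_AS,
                (address_space_bytes, address_space_bytes))

    return subprocess.run(
        cmd, preexec_fn=apply_limits, capture_output=True, text=True)


# Demonstration: the child reports the CPU-time limit it inherited.
probe = [
    sys.executable, "-c",
    "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])",
]
result = run_limited(probe, cpu_seconds=30)
print(result.stdout.strip())
```

With caps like these, a tool that spins or balloons on a broken or malicious image is stopped by the kernel instead of exhausting the host. A call against an image might look like `run_limited(["qemu-img", "info", path], cpu_seconds=30, address_space_bytes=1 << 30)`, where the numbers are purely illustrative.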
ygk_12345fungi: thanks for the information14:59
fungiyou're welcome15:02
*** bauzas_ is now known as bauzas15:27
frickleranother gate failure, timeout on tempest-integrated-compute-enforce-scope-new-defaults, luckily only on the 3rd patch and #4 was out already anyway with sdk failure. will re-enqueue once zuul finishes on them https://zuul.opendev.org/t/openstack/build/4c80df617e06413e9635e0968947809f16:19
fricklerhmm, that did reset the whole bunch of 15 other bunches behind it, though. so CI will be fully loaded for quite some more time16:23
fungitempest-integrated-compute-enforce-scope-new-defaults is what i was seeing repeated timeouts on yesterday as well16:23
opendevreviewMerged openstack/nova master: Reject qcow files with data-file attributes  https://review.opendev.org/c/openstack/nova/+/92325516:23
frickler\o/ 1 done, 20 or so to go? ;)16:23
fungi20 or so if you're just talking about nova. across the three affected projects and four maintained branches for the full ossa it's about 50 changes16:24
fungimaybe more now16:27
fungii think we linked 48 in the advisory, but that's not including the additional testing fixes that ended up going in yesterday16:27
frickleryes, we should seriously reconsider whether doing more than a single patch per branch is really a good idea before next time16:37
fricklerunless of course we fix all gate stability and capacity issues until then :-D16:38
clarkbit's probably also worth considering reducing the bar set by the gate (both prior to major time sensitive changes showing up and generally)16:38
fungiwell, usually we try very, very, VERY hard not to do feature development for emergency security fixes and make them as concise and minimal as possible. this issue was an exception because of discovering that we basically needed to rip out any reliance on qemu-img being safe to rely on for security-sensitive codepaths16:39
clarkbonce upon a time removing jobs that were unreliable was a relatively common expectation (this is part of the reason we have the silent and experimental pipelines (though maybe silent is gone now))16:40
fricklerwell that's more questionable IMO. the high bar did at least notice a regression in cinder that went unnoticed before16:40
clarkbyes it is a tradeoff. The ideal is that if you've got a test job because you believe it is valuable to run that it will also be maintained to keep it reliable16:41
clarkbbut unfortunately doing so is often difficult and openstack has basically never managed to do so consistently over time and constantly flip flops between sad and more happy states16:41
fungithe original fix to just address the problem initially described in the bug report, and on which the ossa text focuses, was going to be a single concise patch for each service16:42
fungiit was complicated by the discovery that the method we'd been relying on already to do those things isn't considered safe by the maintainers of the tool we were using for it16:42
fungiand when we started down that road, the qemu maintainers had made it clear to us that they had no intention of fixing it to be safe in any immediate timeframe16:43
fungi(turns out they did later decide to release a security fix for qemu anyway, but the approach we ended up with should be a lot more future-proof)16:44
opendevreviewMerged openstack/nova master: Check images with format_inspector for safety  https://review.opendev.org/c/openstack/nova/+/92325617:18
dansmithsean-k-mooney: melwitt: gibi: bauzas: Tempest tests for glance import are going to fail in our ceph job when glance patches merge and until a tempest fix is merged, which is currently stuck waiting on tempest-core18:02
dansmithwe can either mark that job n-v (which is easy) or try to strategically skip those tests, or we can just wait for tempest when gmann shows up and sees18:02
dansmithwhat's your preference?18:02
dansmithokay gmann to the rescue18:04
dansmithfungi: abhishek_ I think gmann is +Wing now18:04
gmannjust logged in, checking18:04
gmanndansmith: abhishek_: this one also right ? https://review.opendev.org/c/openstack/tempest/+/923357 18:05
dansmithfungi: so I would prioritize *those* and drop abhi's patch I think18:05
fungiokay...18:05
dansmithyes18:05
fungijust a sec18:05
abhishek_and this one as well, https://review.opendev.org/c/openstack/tempest/+/92335218:05
fungi923357,3 and 923352,1 both for tempest18:07
abhishek_yes18:07
dansmithI hope fungi is blowing smoke from his finger guns after each of these like I imagine in my head18:08
fungiif only i were that cool18:08
abhishek_you are18:08
dansmith+218:09
gmannabhishek_: +w another one also, one comment for releasenotes but can be done in follow up https://review.opendev.org/c/openstack/tempest/+/923357/3/tempest/config.py#64818:09
fungiand both have approvals now, so doing18:09
gmann++18:09
dansmiththank you gmann18:09
gmannnp! sorry for logging in late. my toddler took a lot of time for his breakfast today :)18:10
fungiokay all done and dequeued glance's 923433,2 at the same time18:10
abhishek_gmann: ack, thank you!18:11
dansmithfungi: do it do it!18:11
abhishek_fungi ++18:11
* fungi blows smoke off the end of his finger-gun18:11
dansmithYASS18:11
abhishek_:D18:12
fungicount on me for all your fanservice needs18:12
dansmithheh18:12
sean-k-mooneydansmith: looks like gmann +w'd it18:19
sean-k-mooneyso i guess we want to wait for https://review.opendev.org/c/openstack/tempest/+/923352 to merge18:19
dansmithsean-k-mooney: yes, it just unfolded as soon as I brought it up :D18:19
dansmithsorry for the noise18:19
fungiand both merged about 35 minutes ago now20:36
dansmith++20:51
*** haleyb is now known as haleyb|out21:18
fungilooking into options to less-disruptively speed merging for the remaining security patches... are there any other changes not explicitly mentioned in the ossa which i should avoid sending to the back of the line?21:23
fungigoing to start fiddling with some surgical queue reordering in about an hour once i get my dinner down21:25
melwittfungi: https://review.opendev.org/c/openstack/nova/+/923344 is a backport that some glance CVE patches depend on21:27
melwitt(that I don't see mentioned in the ossa)21:27
sean-k-mooneymelwitt: that's not part of the cve fix really21:56
sean-k-mooneymelwitt: that's just a ci change21:56
melwittsean-k-mooney: I know, but it's my understanding that it's required for some of the glance CVE backports. so it probably should be avoided to send to the back of the line21:57
clarkbthe entire gate just reset I think22:02
clarkbso now is a good time to reorder cc fungi 22:02
sean-k-mooneymelwitt: ah good point22:02
fungijust got back22:11
fungiassembling the list now22:12
fungimmm, this is going to be tricky because i have to pass in both the project name and the change,revision for each change i evict and re-add, so a simple loop won't suffice22:23
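One way to handle the pairing problem mentioned above is to keep the project name and the change,revision together and generate every dequeue before any enqueue. The sketch below only builds the command lines and does not run them. The `zuul-client` subcommands and flags shown are written from memory and should be checked against `zuul-client --help`, and the two changes listed are identifiers from this log used as sample input, not the list that was actually reordered.

```python
import shlex


def build_reorder_commands(to_evict, tenant="openstack", pipeline="gate"):
    """Return dequeue commands followed by enqueue commands.

    to_evict is a list of (project, "change,revision") tuples. Dequeuing
    everything first and re-enqueuing afterwards sends those changes to the
    back of the queue in the order given.
    """
    def command(action, project, change):
        # Flag names are an assumption; verify them before running anything.
        return [
            "zuul-client", action,
            "--tenant", tenant,
            "--pipeline", pipeline,
            "--project", project,
            "--change", change,
        ]

    dequeues = [command("dequeue", p, c) for p, c in to_evict]
    enqueues = [command("enqueue", p, c) for p, c in to_evict]
    return dequeues + enqueues


# Sample input only: identifiers taken from the discussion in this log.
sample = [
    ("openstack/glance", "923433,2"),
    ("openstack/nova", "923258,3"),
]
for cmd in build_reorder_commands(sample):
    print(shlex.join(cmd))
```

Printing the commands instead of executing them leaves room to review the order, or to insert a change between the dequeue and enqueue steps, before touching the live queue.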
dansmithyeah, that one is required for the glance on the same branch22:23
dansmithpoor glance hasn't merged anything yet, but the head change is in gate right now22:25
fungishould https://review.opendev.org/c/openstack/glance/+/923433 be preserved too? it switches a job to non-voting22:26
dansmithfungi: this got kicked out of gate but should be able to go right back in: https://review.opendev.org/c/openstack/nova/+/923258/322:26
dansmithand the parent of it is in gate now I think22:26
fungii can insert it between the dequeue and reenqueue steps, sure22:27
dansmithfungi: I think abhi still wanted that mostly to insulate them from job timeouts, but it's not strictly required and I think it'd be better if we don't drop all that testing now that tempest is merged22:27
dansmithso you'd have my blessing to kick that out and make space, or I can if you want22:27
fungiwfm22:27
fungiadding it to the list of stuff i'll move to the back of the line22:28
dansmith++22:28
fungiokay, surgery complete22:36
fungithough in retrospect, as none of the changes at the top were being kept, it ended up being the same as if i'd done a promote of the ossa changes i guess22:37
fungii guess i expected there to be more changes from the ossa already approved22:45
dansmithmeaning the stable ones?22:46
fungiturned out there were only 5 in the gate (plus the one that was kicked out and re-added)22:46
dansmithall the master ones should be approved I think22:46
dansmithgiven how things have been going with the insane cross-dependencies I think we've been kinda trying to make sure they are merging in release order22:46
dansmithbut if you really want them all +Wd we can probably just do that22:47
funginah, it's okay22:47
dansmithI've heard great things about melwitt's +W finger22:47
melwittlol22:47
melwittyup, I'm really great at clicking buttons22:47
*** bauzas_ is now known as bauzas23:43