Thursday, 2023-05-11

jandersgood morning Ironic o/01:07
jandersJayF TheJulia when you have time I wanted to touch base with you regarding https://review.opendev.org/c/openstack/ironic/+/881358 on behalf of our partner - it seems it's ready to go, has 2 +2s, would it be possible to merge it? Thank you in advance!01:08
JayFI didn't realize I was the second +2, I'll approve03:33
JayFNo, I'm only the first in that patchset03:33
JayFThe new UI makes it more difficult to tell03:33
jandersthank you Jay! yeah I am struggling with the new GUI too03:36
*** dmellado9 is now known as dmellado05:04
rpittauJayF: thanks for checking, I saw that too but it looks like normal behavior, it's easy to verify in any other passing snmp job with focal, for example https://de19e57dd79c29d6fa65-9c39e1b31aead2b89889cdc7bed43508.ssl.cf2.rackcdn.com/882732/2/check/ironic-tempest-wholedisk-bios-snmp-pxe/959ed74/controller/logs/screen-virtualpdu.txt06:41
rpittauthe libvirt "domain not running" error is due to the domain being powered off, probably the message can be improved06:42
rpittaugood morning ironic! o/06:43
opendevreviewRiccardo Pittau proposed openstack/ironic master: Migrate to pysnmp lextudio ecosystem  https://review.opendev.org/c/openstack/ironic/+/88291706:53
opendevreviewSandeep Yadav proposed openstack/ironic master: [DNM] this is just a test  https://review.opendev.org/c/openstack/ironic/+/88291807:24
opendevreviewRodolfo Alonso proposed openstack/ironic master: Test LP#2019186  https://review.opendev.org/c/openstack/ironic/+/88293609:40
iurygregorygood morning Ironic11:04
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add to Redfish hardware inventory collection  https://review.opendev.org/c/openstack/ironic/+/88178311:54
opendevreviewMahnoor Asghar proposed openstack/ironic master: Add to Redfish hardware inventory collection  https://review.opendev.org/c/openstack/ironic/+/88178312:15
opendevreviewPierre Riteau proposed openstack/bifrost master: Skip unnecessary SDK get_machine calls  https://review.opendev.org/c/openstack/bifrost/+/88295013:28
JayFrpittau: okay, in that case I'm back to wtf-ville with you13:46
TheJuliagood morning13:47
iurygregorymorning TheJulia JayF o/13:48
JayFo/13:49
* TheJulia does her best zombie impression15:02
TheJuliabrrrraaaaaaiiinnns15:05
TheJuliaZombie movie Thursday?!15:05
jrosserhas anyone ever seen something like this with dell r730 era nodes 1) pxeboot tries a seemingly random nic out of all available ones 2) immediately then boots from HDD without trying the rest of the PXE capable interfaces15:10
TheJuliajrosser: UEFI mode or bios mode?15:11
jrosserin this case it is bios mode15:12
jrosserwe're having a bit of a nightmare with these actually to get the behaviour predictable and reliable enough15:13
TheJuliaWhat I've done on r730s is disable all the other network interfaces which shouldn't be setup for pxe15:25
TheJuliaconcurrently set pxe_enabled=False on the other network ports if there are multiple configured in ironic15:26
jrosseris that disable them in the server setup/idrac?15:26
TheJuliayers15:26
TheJuliayes15:26
jrosserwe've only specified one interface in the ironic config15:27
jrosserdoing factory reset has improved things a bit15:27
TheJuliaI think only ilo's driver knows to set/enforce that via the bmc15:27
jrosserbut still there are "some times" "some of the r730" do this weird boot order15:27
TheJuliaso in idrac, you really need to do it server side config15:27
jrosserand then those get stuck in cleaning as they end up rebooting back into the last deployment15:28
TheJuliawheeeeeee15:28
TheJulia... I'd look to see if UEFI mode changes that, since it overall changes the model/process and even running state config15:28
jrosseryes, we were using the idrac driver but that was worse from this boot order respect15:29
jrosserit set it persistently i think then the bmc got confused and it would never change again15:29
jrossertoday changed it all back to ipmi and things are better15:30
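
Note on the pxe_enabled suggestion above: the Ironic side of TheJulia's advice (leave PXE enabled on only one port per node) can be sketched as below. This is a minimal illustration, not something posted in the channel. The function name, the dict keys ("uuid", "address", "pxe_enabled") and the sample ports are assumptions made up for the example; the generated command uses the `--pxe-disabled` option of `openstack baremetal port set`. It only changes what Ironic believes about the ports; as noted in the discussion, the iDRAC/BIOS side still needs server-side configuration.

```python
def pxe_disable_commands(ports, provisioning_mac):
    """Return CLI commands that turn off PXE on every port except one.

    `ports` is a list of dicts with the hypothetical keys "uuid",
    "address" and "pxe_enabled"; `provisioning_mac` is the MAC address
    of the one NIC that should keep PXE enabled.
    """
    keep = provisioning_mac.lower()
    commands = []
    for port in ports:
        if port["address"].lower() == keep:
            continue  # leave the provisioning NIC alone
        if not port["pxe_enabled"]:
            continue  # already disabled, nothing to do
        commands.append(
            "openstack baremetal port set --pxe-disabled " + port["uuid"]
        )
    return commands


# Made-up sample data: three ports on one node, the first is the PXE NIC.
ports = [
    {"uuid": "port-a", "address": "AA:BB:CC:00:00:01", "pxe_enabled": True},
    {"uuid": "port-b", "address": "AA:BB:CC:00:00:02", "pxe_enabled": True},
    {"uuid": "port-c", "address": "AA:BB:CC:00:00:03", "pxe_enabled": False},
]
for command in pxe_disable_commands(ports, "aa:bb:cc:00:00:01"):
    print(command)
```

The sketch only prints the commands so they can be reviewed before anything is changed on a live node.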
dtantsurThis is a bit concerning: https://zuul.opendev.org/t/openstack/builds?job_name=ironic-tempest-bfv&project=openstack/ironic15:30
dtantsurThe last two runs failed with a confusing error.15:30
dtantsur Failed to request detach for volume 640c7e76-99fb-42ae-80b3-5000f102af1f from cinder for node 82fb464a-0581-4590-b049-42a405053084: Invalid volume: Unable to detach volume. Volume status must be 'in-use' and attach_status must be 'attached' to detach.15:31
TheJuliaugh15:31
TheJuliajrosser: I suspect what is at play is the configuration jobs which get applied in the idrac driver. some of that happens automatically with ipmi, but it is much more targeted/one time15:33
jrosseryes, we could see those being applied on the console15:34
jrosserthe nodes would get stuck trying to retrieve the power state15:34
dtantsurHmm, the first failure is actually ConflictNovaUsingAttachment: Detach volume from instance 2cad5e60-d082-44fd-af68-f6875e62579e using the Compute API (HTTP 409)15:34
dtantsurThen further retries cause the error above.15:34
jrosserand if we reset the bmc remotely, that would unwedge things15:34
TheJuliajrosser: what idrac firmware is this?15:34
jrosserwhatever is latest on r730, in desperation we upgraded everything15:35
TheJulia2.70.?100?.something ?15:35
TheJuliadtantsur: I *bet* the CVE fix broke the job15:36
dtantsurwhich one?15:36
dtantsurhttps://opendev.org/openstack/cinder/commit/6df1839bdf288107c600b3e53dff7593a6d4c161?15:37
dtantsurTheJulia: do you have an idea for a fix or should I reach to cinder folks?15:37
TheJuliagive me 5 minutes15:38
TheJuliaI'm on a call at the moment15:38
dtantsursure15:38
dtantsuryep, that's the issue. "Detected user call to delete in-use attachment. Call must come from the nova service and nova must be configured to send the service token." in Cinder logs.15:40
TheJuliayeah, looks like it is masked in the client code15:41
TheJuliawe're just calling the client and *boom*15:42
dtantsurI'm a bit puzzled why the error is different the 2nd and 3rd time.. but okay.15:42
TheJuliawe need to explicitly use a service token it seems15:46
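
Note on the rejection discussed above: the rule implied by the quoted Cinder log line can be pictured with a toy model. This is only a sketch of the behaviour as described in the channel, not Cinder's code. The exception class, function name, parameters and return value are all hypothetical, and the real check looks at more than the two inputs used here.

```python
class AttachmentDeleteRejected(Exception):
    """Hypothetical stand-in for the HTTP 409 seen in the job logs."""


def check_attachment_delete(volume_status, has_valid_service_token):
    """Toy version of the rule: in-use attachments need a service caller.

    `volume_status` is the volume's status string and
    `has_valid_service_token` says whether the request carried a token
    that the server accepted as coming from a service.
    """
    if volume_status == "in-use" and not has_valid_service_token:
        # A plain user call against an in-use attachment is refused.
        raise AttachmentDeleteRejected(
            "in-use attachment delete requires a service token"
        )
    return "deleted"


# A service-style call passes; a plain user call on the same volume fails.
print(check_attachment_delete("in-use", has_valid_service_token=True))
try:
    check_attachment_delete("in-use", has_valid_service_token=False)
except AttachmentDeleteRejected as exc:
    print("rejected:", exc)
```

Under this model, a conductor that calls the client with only the user's credentials always lands in the rejected branch, which matches the observation that the call simply goes "boom".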
jrosserTheJulia: we built an update iso which had iDRAC 2.84.84.84 / IDRAC with Lifecycle Controller V.,2.40.40.40 on it15:47
JayFI need to look at IRC logs, I think you all are doing a fix that nova already did15:47
JayFbut I lost context because my internet15:47
JayFI think they put a fix in the cinder tempest plugin15:48
JayFbut imbw15:48
* JayF read the backlog15:48
TheJuliayeah, i think in this case it is job configuration for us15:48
TheJuliabut we need to look at the config15:48
opendevreviewVerification of a change to openstack/ironic master failed: [iRMC] Fix parse_driver_info bug enforcing SNMP v3 under FIPS mode  https://review.opendev.org/c/openstack/ironic/+/88135815:48
TheJulialets mark the job non-voting for now, we're going to need to sort out a proper fix and make sure we're doing the needful at the right time15:49
JayFmight not hurt to ask someone from qa team if they have the answer in their back pocket and save some research time15:49
JayFbut I'll let you all decide that15:49
TheJuliaI'm looking at the code, they added additional checks15:49
dtantsurI don't think Nova can fix it for us, the error seems to be originating from the conductor?15:50
TheJuliahttps://review.opendev.org/c/openstack/cinder/+/882835/2/cinder/volume/api.py#254315:50
TheJuliayeah, because we do our own config toggling as well15:50
TheJuliaNova never updates us beyond the initial bind15:50
JayFack; the thing I saw was likely nova solving a similar bug then15:51
TheJuliajrosser: hmm, I know the base idrac version is right, the lifecycle controller version seems oldish, but the last time I waded into that I bricked a server15:52
jrosserit does seem old i agree, it's not completely clear whats going on there15:53
TheJuliayeah15:53
* jrosser is reminded why these were pretty much the last dells i bought15:54
TheJuliaAnyhow, if you can give UEFI a spin, I suspect the config application for interfaces will behave differently15:54
jrosserok thats interesting to know, just getting these stable will be a big win15:55
jrosserit's turned into a diversion from actually what we were trying to do now15:55
TheJuliaack, we explicitly set interfaces in the system setup to limit what can boot, which helps with idrac and ipmi drivers15:56
TheJuliain uefi, it should be a static order15:56
TheJuliaif *not*, that is definitely a firmware bug15:57
jrosseris uefi more explicit about which pxe device should be used?15:58
TheJuliayes, it gets asserted based upon NVRAM setting order15:59
TheJulia(for the record, the r730s and their blade brethren are the most painful lab machines I get to work with)16:00
jrosserdoh :)16:00
TheJuliahave you ever watched Legends of Tomorrow?16:01
jrosseri have not16:01
TheJuliaahh, you wouldn't get the idea of what it is like then :)16:02
TheJuliaAt least, the one in my head16:02
rpittaugood night! o/16:10
dtantsur"It is my pleasure to invite you to the Metal3 Community's Project Team Gathering, which will take place on May 15th, 2023 from 14:00 to 17:00 UTC."17:18
dtantsursee metal3-dev for details17:18
dtantsurSee you on Monday o/17:19
JayFTheJulia: iurygregory: dtantsur: others: Should we cancel the Ironic meeting since it overlaps with the metal3 PTG17:20
JayFOur meetings are rarely well attended and necessary, and this is a good excuse to not go through the motions for one week lol17:21
dtantsurI can keep one eye on it, but I'm all for canceling too17:21
iurygregoryI'm ok with both approaches17:21
* TheJulia really wishes we could have gotten the metal3 community to meet alongside us17:21
JayFwe did, we just show up and we meet alongside them17:22
JayFsome of them came to our PTG; we could offer the same courtesy17:22
TheJulia++17:22
dtantsurTheJulia: half of the community is here ;)17:22
TheJuliaI concur, we should cancel our meeting next week17:22
TheJuliadtantsur: true!17:22
opendevreviewJulia Kreger proposed openstack/ironic master: DPU modeling - parent_node DB/Model/API  https://review.opendev.org/c/openstack/ironic/+/88011417:33
opendevreviewJulia Kreger proposed openstack/ironic master: WIP: execute on child node support  https://review.opendev.org/c/openstack/ironic/+/88298217:33
TheJuliagetting there... on the child node execution17:33
JayF'child {} execution' (another reason I wish I could think of something better than parent/child)17:34
TheJuliayeah17:34
TheJuliaso, I  think the issue is we just don't operate cinder with a service config in devstack17:49
TheJuliahmm, it *should* be working17:56
TheJulia openstack --os-cloud devstack-system-admin role add service --user ironic --project service --user-domain Default --project-domain Default <-- le sigh17:56
TheJuliaapparently we might need to add more stuffs18:08
opendevreviewJulia Kreger proposed openstack/ironic master: WIP bfv service change  https://review.opendev.org/c/openstack/ironic/+/88298518:09
TheJuliaso the tl;dr is we're not being recognized as a service via the token, we might need to investigate client invocations18:17
TheJulia^ might need to change too18:17
JayFack18:17
JayFthank you for diggin this18:17
iurygregorypeople saying that ipa was looping over all devices they had in the machine...18:39
iurygregory84 disks... in a storage node18:41
iurygregoryOMG18:41
iurygregory.-.18:41
iurygregoryand what people do because the node was still inspecting? they just kill the metal3 pod \o/18:41
iurygregoryanother chapter for the hardware book of horror18:43
TheJuliayes, by design...18:45
TheJuliaoh, inspecting only?!18:45
TheJulialooping over doing what I guess is the major question18:46
TheJulianone of the io should be paused that long, nor should disk actions take long at all18:46
TheJuliaunless say... SATA expanders are at play18:46
iurygregoryit was looping to identify multipath devices18:52
iurygregorybut I think if they provide a root device hints with the disk it would just avoid looping 18:53
JayFChance to have feedback on swag... a couple of options we're looking at (no promises it'll be any of these): https://imgur.com/a/ArPA4ma18:53
iurygregoryhttps://paste.opendev.org/show/bLyoeAuT0IoORYm4b8Ta/18:53
JayFhave maybe 10 minutes to give feedback if you want before we order :)18:53
iurygregoryipa was showing this, but they decided to kill the pod, so I don't have logs from inspector to see .-.18:54
iurygregoryironic socks?!18:54
iurygregoryJayF, you rock!18:54
JayFg-research oss rocks*18:54
iurygregorytks g-research =)18:55
JayFseriously though, those look OK?18:55
iurygregoryyou have a +2 from me18:55
JayFhttps://imgur.com/a/N5fHYIL was an alternate we were looking at18:55
JayFiurygregory: which one :) 18:55
iurygregoryok, now it's complicated18:55
iurygregorybecause I like blue a lot18:56
JayFthe first link had two, fwiw, if you didn't see it18:56
iurygregorywoot?!18:56
iurygregoryI was looking at only the brown one18:56
JayFyeah the second one is blue there too18:56
iurygregoryAHA18:56
iurygregoryjust saw18:56
JayFalmost like a business sock for business (with foundation ironic logo)18:56
JayFand a pajama sock for fun time (with the fun original Lucas logo)18:56
iurygregoryJayF, I think you are almost in a war zone18:56
JayF?18:56
iurygregorybecause is old pixieboot vs new one18:57
iurygregoryI like the old logo a bit more18:57
JayFis intentional. We are both old and new :D we can have one of each18:57
iurygregoryXD18:57
iurygregoryI vote for the old one18:57
JayFTheJulia: you have any opinions on these? https://imgur.com/a/ArPA4ma and https://imgur.com/a/N5fHYIL18:58
iurygregorythe old mascot in all designs would look awesome fwiw18:58
JayFright now barring further feedback I think we'd use the pair of designs in the first link18:58
JayFiurygregory: I'll see what others think. It can be changed but I like the idea of having one of each tbh18:58
iurygregory++18:58
TheJuliathe latter suspiciously looks hockey-ish ;)18:58
TheJuliano issues :)18:58
JayFTheJulia: fwiw these were all designed by the Insight Softmax HR woman (the actual company on my paycheck that contracts me to GR), she did a good job and saved me from having to design socks :)18:59
JayFso if they were hockey it was accidental18:59
TheJulia\o.19:00
TheJuliaerr19:00
JayFTheJulia: for that first, brown and grey one, would you go old or new logo?19:00
TheJulia\o/19:00
TheJuliaI would do newer19:00
JayFokay, so that's 2-1, iurygregory loses by a single vote :P 19:00
TheJuliaheh19:00
JayFor 4-2, depending on if we're using openstack voting rules lol19:01
JayFwe're trying to get those socks printed by the summit; I'll have them for people attending then can mail out to other contributors who are unable to attend19:01
JayFdtantsur: rozman: Gauntlet thrown down. We'll have pixie boots socks, now we need metal3 gauntlets or something :D 19:03
iurygregoryXD19:05
iurygregorymetal3 gauntlets omg19:13
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Add DB model for Firmware  https://review.opendev.org/c/openstack/ironic/+/88303119:14
iurygregorynow time to work on the DB API19:16
TheJuliawheeeeeeeee19:18
iurygregoryprobably finishing this and object structure today (I hope)19:27
TheJuliaokay19:36
TheJuliaI need to revisit the dpu stuff, I might go get an actual lunch and then revisit19:36
opendevreviewHarald Jensås proposed openstack/ironic-tempest-plugin master: rback - Fix vif_attach expected return values  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/88303219:43
iurygregoryTheJulia, happy to take another look at it later today (going to the gym and physiotherapy for my knee) 19:45
TheJuliaiurygregory: sure, I'm going to get lunch and then revisit if nobody is screaming19:46
TheJuliaI think in the mean time, we may just want to mark the bfv stuff non-voting while we work on it19:46
iurygregory++19:47
JayFI'd +2 a patch to make bfv non-voting, as long as we're 100% certain it's a devstack/config issue, not an incompatibility with the fix for Ironic BFV functionality, if that makes sense20:01
JayFIf devstack (or Ironic plugin for it) is broken, but the feature works: NV it. It'd scare me to NV a job that was failing legitimately on a feature we want working.20:02
TheJuliaJayF: If it doesn't work with the change I posted, we're looking at likely having to add more code to session handling, which means not a quick fix anyhow20:42
JayFwell here's my concern: if we have to patch Ironic to work with the cinder patch20:43
JayFthat means any ironic user that installs the security patch and/or config will lose cinder support which is scary20:43
TheJuliayup20:43
TheJuliac'est la vie20:43
JayFthat will likely generate another OSSN and need us to do backports+releases 20:43
TheJuliaThe security fix is also breaking for folks using cinder directly20:44
TheJuliasince now service token usage is a hard requirement20:44
JayFaha, got it20:45
TheJuliaI think since they made an intentional break, it is not OSSN territory for us, just backports + releases20:45
TheJuliasince inherently the behavior on the service has changed20:45
TheJuliaand is so noted20:45
TheJuliaSpeaking of! Since I've had lunch!20:45
JayFI'd let fungi make the call if we have to go that way. OSSN or not is just a question of "will operators know how to fix this?"20:46
JayFalso I have not had lunch so don't make it too hearty TheJulia ;) 20:46
* JayF 's wife is out of town, which means he's lost track of all time and space20:46
fungii will try to quickly catch up on this discussion and see how i can help20:47
TheJuliaYeah, we're going to have to patch ironic most likely to force service token usage20:48
TheJuliasince apparently we're not being classified as a user of the service token20:48
JayFfungi: tl;dr: we're fairly certain at this point Ironic will require a patch to work with Cinder post-their-security-patch20:49
fungiyeah, i was worried the fallout from that might be wider-spread than just nova20:50
JayFBluntly, looking at the nature of the change, they're going to break downstream code for a lot of people, too.20:50
JayFThat was an incredibly disruptive patch20:50
TheJuliaAlso, the entire model of nova being the only consumer is kind of making me want to table flip20:50
TheJuliaIncredibly frustrating :(20:50
fungiyes, i pushed hard repeatedly to find alternatives and was told there basically were none if we want it fixed completely. the design was bad20:50
JayFIn the future, someone checking to see if our CI, and other projects, work against a patch like that would be nice. Or at least roping someone from Ironic in.20:51
fungiprobably the biggest deciding factor on whether it's a security-something is if the same vulnerability can be exploited by users of ironic in similar ways to how nova could20:51
TheJuliasigh, true20:55
TheJuliaI guess it also depends on enforced policy and defaults as well20:55
JayFIt's obvious that it's an unintended side effect of the security process, but I'm not super thrilled that we might have Ironic users actively breaking their installs right now in response to an OSSA :\ 20:56
TheJuliahttps://github.com/openstack/ironic/blob/master/ironic/common/policy.py#L151820:57
TheJuliaa member of the system itself, or an owner/lessee admin20:57
TheJuliaso basically humans with administrative rights could only use the feature anyhow20:58
TheJuliaotherwise it would be requested through nova20:58
JayFowner/lessee admin is a project-admin though20:58
JayFto be clear, right?20:58
TheJuliawhich had the elevated access20:58
TheJuliauser in project *with* admin role20:58
TheJulianot any user in the project20:58
JayFack20:58
TheJuliaI'd likely need to better understand the nova issue itself, but since we can still get the bind request through and the instance boots, that is a good sign20:59
TheJuliaI just suspect all power related recovery/sync with cinder ops are broken21:00
TheJuliasince we detach/reattach around power actions in case the endpoint config changes21:00
TheJulia(so we don't inadvertently break things21:00
TheJulia)21:00
* JayF AFK, need to get away from the PC and find some lunch21:02
TheJuliaso, yes. We error on the detach/reattach op we pull on power operations as well21:27
fungisorry for the uncomfortable silence, had to step away to grab some dinner for a sec21:30
fungibut yes, this is on me. we have a policy that says we don't fix bugs like this. if the only way to fix a vulnerability is by breaking existing deployments then we call it a class b1 "a vulnerability that can only be fixed in master" https://security.openstack.org/vmt-process.html#report-taxonomy21:32
fungii caved to pressure for backporting security fixes in this case, called it an exceptional circumstance, maybe i should have stuck to our longstanding policy on the matter21:33
funginormally we'd just say, "well we made some poor design decisions and now realize that there's a vulnerability in all existing versions, if you really care then there's a patch in master or you can wait and upgrade to the next coordinated release"21:34
TheJuliaI think the issue is we *should* be recognized as a service which short circuits the check, but It looks like we just aren't for some unknown reason. I guess I need to add some code to dump out what we're using to send on the credentials just to make sure....21:34
fungithe reason nova folks were involved in the discussion is that the original reporter reported it as a bug against nova, and i should have thought to ask the developers if the breaking change was going to break anyone not already involved in the discussion, but since we don't normally even allow breaking changes like this to be backported it didn't occur to me21:35
fungino excuses, sorry about that21:36
TheJuliac'est la vie21:36
fungialso we overran our self-imposed 90 day embargo limit by a week trying to do it. if i'd known early on how far-reaching and complicated this was going to be i would have insisted we just switch to working through it in public, and maybe some of this could have been avoided (at the expense of people who get upset when they don't get a heads up in advance with a fix before a21:42
fungivulnerability becomes public, but those same people are likely to be just as annoyed if the fix breaks something we didn't catch because we were trying to design fixes in a locked room)21:42
fungiit started out looking pretty simple, but then snowballed as the tangle was unravelled revealing more and more corner cases that were still vulnerable21:46
TheJulialooks like we use a different method when talking to the keystoneauth helper, going to look at details now21:48
TheJuliasame method, different name21:52
JayFfungi: I think this is one of the rare cases where a public RCA, or at least post to the mailing list for discussion and such, might be a good idea. At least once we get to the end of all of it :)22:02
fungii absolutely agree we should discuss it at length on the ml. i'm open to additional communication avenues too of course, but that's the bare minimum in my opinion22:03
JayFI say RCA just because in my previous experience that's how I'd have described the process22:03
fungiwe had multiple process failings on this one, including some i haven't even brought up22:04
JayFright now my scope of concern is (calling back to the TC/board sync) us violating our biggest feature of being boring :)22:05
fungithis was indeed an unfortunate bout of exciting22:06
fungione i do not wish to repeat22:06
TheJuliaokay, I think I get what is going on22:07
TheJuliawe *basically* need to do https://review.opendev.org/c/openstack/nova/+/88285222:07
TheJuliaor at least the same basic idea22:08
TheJuliabecause we get the pass-through rights, but in this case, we *must* use elevated privs22:08
TheJuliasuch that they are cast as service actions22:08
JayFif the equivalent change in Ironic is similarly scoped, it's hard for me to imagine us backporting it22:08
JayFunless we were extremely delicate to not break if people didn't set the new configs22:09
TheJuliaeh... it is not horrible actually22:09
JayFmore proof of my lack of imagination ;) 22:12
TheJuliaso yeah, the tl;dr is we end up leaning towards the request context the request comes in on, but in this very specific case, to meet the change, we need to elevate our access22:21
TheJuliai *think* this should do it22:21
opendevreviewJulia Kreger proposed openstack/ironic master: WIP bfv service change  https://review.opendev.org/c/openstack/ironic/+/88298522:22
TheJulia(well, tests and all, but I think it is the same basic idea as what nova did22:22
TheJuliaour operating pattern is a little different there though, but not awful22:22
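
Note on the fix idea above: the tl;dr (normally follow the incoming request context, but act as a service for the detach case) can be sketched as below. This is an illustration of the idea only, not the content of change 882985, which is not shown in this log. The function names, the set of "service-only" operations and the token strings are assumptions. `X-Auth-Token` and `X-Service-Token` are the standard Keystone header names for a user token and a service token.

```python
# Hypothetical list of operations that must be seen as service actions.
SERVICE_ONLY_OPERATIONS = {"detach"}


def build_headers(user_token, service_token=None):
    """Build request headers, optionally adding a service token."""
    headers = {"X-Auth-Token": user_token}
    if service_token is not None:
        headers["X-Service-Token"] = service_token
    return headers


def headers_for_operation(operation, request_token, service_token):
    """Pick credentials for one call to the volume service.

    Most calls pass through the token from the incoming request context.
    Operations in SERVICE_ONLY_OPERATIONS also carry the service token
    so the receiving side can treat them as service actions.
    """
    if operation in SERVICE_ONLY_OPERATIONS:
        return build_headers(request_token, service_token)
    return build_headers(request_token)


# Made-up token values, for illustration only.
print(headers_for_operation("attach", "user-token", "ironic-service-token"))
print(headers_for_operation("detach", "user-token", "ironic-service-token"))
```

Keeping the choice in one helper means the pass-through behaviour stays the default and the elevated path is limited to the operations that need it.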
JayFif you fixed it in that small of code22:36
JayFgood job22:36
JayFwill users need to set that keystone value, or just in devstack?22:36
TheJuliathe docs suggest it should be set22:38
TheJuliaI've not run down that code path22:38
TheJuliayet22:38
JayFack22:40
TheJuliaI think I'm going to end up jumping in a pool soon22:40
JayFsounds like a good plan for the afternoon :) after my EOD I have to keep a very sad dog exercised against his will (my wife not being here, and he has sep anxiety) then hunker down for Canes/Kraken games22:42
opendevreviewJulia Kreger proposed openstack/ironic master: DPU modeling - parent_node DB/Model/API  https://review.opendev.org/c/openstack/ironic/+/88011422:58
TheJuliaenjoy!22:58
JayFI'm off to do ^ that :D 23:01
TheJuliaoh, likely need to turn off user auth as well23:05
TheJuliaanother rev, later23:05
opendevreviewIury Gregory Melo Ferreira proposed openstack/ironic master: Add DB model for Firmware  https://review.opendev.org/c/openstack/ironic/+/88303123:46
