Thursday, 2023-11-30

opendevreviewSteve Baker proposed openstack/ironic-python-agent master: Switch to utf-8 for parsing efibootmgr -v  https://review.opendev.org/c/openstack/ironic-python-agent/+/90221401:59
stevebaker[m]TheJulia: alright, here is my take02:01
stevebaker[m]oh no, this would need to be backported all the way02:03
TheJuliaWheeee :(02:16
rpittaugood morning ironic! o/07:53
opendevreviewVerification of a change to openstack/ironic master failed: Fix *_by_arch documentation and un-deprecate the options without it  https://review.opendev.org/c/openstack/ironic/+/90195808:17
adam-metal3Hi all, this change is now breaking the virtualmedia boot in the Metal3 CI https://review.opendev.org/c/openstack/ironic-python-agent/+/895519/1 I have pasted the actual IPA error here with a link to the job https://paste.openstack.org/show/bZj140OMuzw5ZYtQcEc2/, we are running vmedia in a DHCP enabled env without glean in these tests so probably that is the cause of the confusion08:53
adam-metal3TheJulia: would it be possible to put this "glean restart" logic behind an option? It is quite limiting for any alternative network config approach that is not glean + networkd, or glean + NetworkManager with the rhifcfg plugin08:55
opendevreviewOpenStack Release Bot proposed openstack/ironic-inspector bugfix/11.8: Update .gitreview for bugfix/11.8  https://review.opendev.org/c/openstack/ironic-inspector/+/90224409:12
adam-metal3or actually it is enough if this wouldn't be active if glean is not present09:12
adam-metal3I would be happy to push the fix09:13
opendevreviewOpenStack Release Bot proposed openstack/ironic-python-agent bugfix/9.8: Update .gitreview for bugfix/9.8  https://review.opendev.org/c/openstack/ironic-python-agent/+/90224709:17
opendevreviewOpenStack Release Bot proposed openstack/ironic bugfix/23.1: Update .gitreview for bugfix/23.1  https://review.opendev.org/c/openstack/ironic/+/90225109:17
rpittauadam-metal3: let me check quickly, can you please open a bug in launchpad?09:37
rpittautrigger_glean_network_refresh should be triggered only if we actually have the backup in place; we assumed that is always the case09:45
rpittauadam-metal3: feel free to propose a fix for that09:45
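[editor's note] A minimal sketch of the guard discussed above: only refresh when glean is installed and a backup is actually in place. The function name, the `backup_dir` parameter, and the assumption that the executable is called `glean` are illustrative and not taken from the IPA code.

```python
import os
import shutil


def should_refresh_glean(backup_dir, which=shutil.which):
    """Return True only when a glean refresh can plausibly succeed.

    Hypothetical helper, not IPA's actual logic. Two assumed conditions:
    the `glean` executable is on PATH, and the backup directory exists
    and contains at least one file.
    """
    if which("glean") is None:
        # No glean in this ramdisk (e.g. cloud-init or a custom script).
        return False
    return os.path.isdir(backup_dir) and bool(os.listdir(backup_dir))
```

The `which` parameter is injectable so the check can be exercised without glean installed.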
dtantsuradam-metal3, rpittau, TheJulia: that absolutely should not have merged...10:02
dtantsurokay, it's morning, and I'm grumpy, and I'm going to call out people10:02
dtantsurTheJulia, JayF, stevebaker[m], you've merged a change on the critical path in IPA with 0 unit tests. Please don't do that any more.10:03
dtantsurah, gerrit UI shows me the wrong thing. Apologies then.10:04
dtantsur(gerrit loves to insert a changeset number in all links, sigh)10:05
dtantsurYet, I believe this change should be urgently reworked or reverted10:12
TheJuliaSigh, revert it with links to logs and hopefully I’ll have something to be able to look at next year even if I have time and spoons10:13
dtantsurThat's supposed to be waaaay too early for you10:13
TheJuliaAnd the gerrit ui really likes showing cached data now… :(10:13
dtantsurMy biggest beef with that is that it has no checks if glean is even present10:14
dtantsurIf someone uses cloud-init or a self-written script (I'm aware of at least attempts to do both, the result is a failure)10:14
TheJuliaReasonable, just not part of the original scope we as a group were expecting10:15
dtantsurTrue. But what we want to support is one thing, what people have is a whole different one..10:16
TheJuliaCloud init itself has to be modified or use non-stock config to do anything but dhcp too10:16
TheJuliaIndeed10:16
TheJuliaOnly way to find that out sometimes is to break something else10:16
dtantsurI'll leave comments on the patch to make re-introducing it easier. But I'm afraid I also don't have spoons to properly fix and test it.10:17
TheJuliaIt took me like six weeks, It is not an easy place in the path to fix/korify10:18
TheJuliaModify10:18
* TheJulia tries to go back to sleep10:19
dtantsurYep, get rest still10:19
adam-metal3Okay, so how to go about this now, are you reverting and reworking, or should I just start fixing and opening the bug in launchpad (I was having lunch during the time you have discussed)? 10:21
dtantsuradam-metal3: I'll revert the change for now, but please do file the bug.10:21
adam-metal3dtantsur: okay sounds great, I will open the bug10:22
dtantsuronce you have it, I'll propose the revert with Closes-Bug10:23
adam-metal3dtantsur: https://bugs.launchpad.net/ironic-python-agent/+bug/2045255 10:32
opendevreviewDmitry Tantsur proposed openstack/ironic-python-agent master: Revert "Fix vmedia network config drive handling"  https://review.opendev.org/c/openstack/ironic-python-agent/+/90223110:33
dtantsuradam-metal3, rpittau, TheJulia ^^10:34
adam-metal3+110:48
iurygregorygood morning Ironic11:18
dtantsuradam-metal3, iurygregory, rpittau, another CI blocker: https://github.com/metal3-io/ironic-image/pull/45411:48
dtantsurJayF: I think you mentioned this issue previously ^^11:48
iurygregoryI remember something related to this11:51
dtantsurI think it went unnoticed because the generated version of ironic-lib matched the version in upper-constraints11:52
dtantsurThen we landed something - and boom!11:52
iurygregorytoo many booms this week11:53
iurygregoryat least tomorrow is friday11:53
iurygregoryso I just need to survive two more days :D11:53
dtantsuryeah, I'm completely overwhelmed (and becoming visibly snappy, which is not great)11:54
rpittaudtantsur, adam-metal3: +W the revert12:22
* rpittau goes back to lunch12:22
adam-metal3rpittau: thanks!12:23
adam-metal3dtantsur, JayF: related to yesterday's discussion about Dmitry's ironic-operator demo https://youtu.be/0ScIghaBUhY?t=135112:26
opendevreviewDmitry Tantsur proposed openstack/ironic-python-agent-builder master: [PoC] An element to hijack Glean for a configurable configdrive label  https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/90229113:17
dtantsurTheJulia: I had something like this in mind ^^^ (absolutely untested)13:17
opendevreviewMerged openstack/ironic master: Generic API for attaching/detaching virtual media  https://review.opendev.org/c/openstack/ironic/+/89491814:02
dtantsurwoohooo14:04
rpittau\o/14:33
dtantsuriurygregory: fancy an lgtm on https://github.com/metal3-io/ironic-image/pull/453?14:50
iurygregorydtantsur, yes sir!14:52
dtantsurthx :)14:52
iurygregoryBMO will probably need some changes I think 14:52
dtantsuriurygregory: I've checked it briefly, and it seems like it does not use 'idrac' explicitly other than for the driver14:53
iurygregorynice!14:53
dtantsurIf you could take another look, would be great14:53
dtantsur(just to double-check me)14:53
iurygregorysure, will do14:53
TheJuliadtantsur: I need something backportable14:56
TheJulianot-backportable, is a total no-go.14:56
dtantsurTheJulia: do we follow the stable policy in IPA-builder even?14:56
TheJuliarpittau: does metal3-integration not test vmedia?14:56
dtantsuralso, the change I suggest has a lower risk than what you tried to land14:57
dtantsurso, if we backport either, I'd vote for IPA-builder14:57
TheJuliaWell, to be fair, It landed after working on making sure it works and working to fix other issues in the entire ecosystem of virtual media boot14:57
dtantsurTheJulia: re metal3-integration: due to resource constraints, it's very VERY simple14:57
TheJuliaso basically, there was no way to detect it before we merged anyway14:57
TheJuliasigh14:58
rpittauTheJulia: not really, just basic stuff :/14:58
TheJuliawhat is super weird, is we have glean-less vmedia it was run against. I wonder why that broke14:58
TheJuliahell, it successfully ran with cloud-init dhcp'ed vmedia14:59
dtantsurTheJulia: ironic will inject stuff into a CD regardless of whether IPA has glean working or at all14:59
TheJuliaeh, so cloud init might have broken the chain then14:59
TheJuliawhat is the ramdisk which is used in metal3 specifically?15:01
TheJulia(Trying to figure out why I didn't see this in testing in our CI)15:02
dtantsurTheJulia: well.. let's ask Adam once he's available. Metal3 uses our upstream ramdisks from tarballs.o.o, but I know that at least some CI jobs use something that Ericsson build with IPA-builder.15:03
TheJuliayeah, was hoping adam would have stayed around a little longer today 15:04
dtantsurRegardless, the patch makes some dangerous assumptions and I sorta -1.5 on landing it (and -2 on ever backporting something like that)15:04
TheJuliac'est la vie15:04
dtantsurYou can email him. He should be at work still, probably just left IRC.15:05
dtantsur(or use k8s slack)15:05
rpittauin ipa-downloader we just use our tarball from upstream15:05
dtantsurYes, but there is this downstream nordix build, I'm not sure which role it plays15:05
rpittauI believe the modified ramdisk is used only from Ericsson internally at the moment15:06
dtantsurah, okay15:07
dtantsurTheJulia: in addition to assuming files and the presence of glean, you also assume that glean can handle the situation where networking could have been injected by a malicious 3rd party. This is risky IMO.15:12
dtantsurAlso that nothing critical runs before IPA (like CERN downloading a tarball with plugins).15:12
opendevreviewMerged openstack/ironic-python-agent master: Revert "Fix vmedia network config drive handling"  https://review.opendev.org/c/openstack/ironic-python-agent/+/90223115:14
TheJuliaI commented on the change set15:16
TheJuliadtantsur: well, the entire idea was also disable glean from ever running upfront too15:16
TheJuliaso we no longer *need* simple-init at all15:16
dtantsurIt's not something you can assume for backports though...15:17
TheJuliatrue, which is why it cleaned up the state because it flat out fails with glean today, identifies the attached vmedia, unmounts the folder, and copies the files back for glean to run15:17
TheJuliawhich is something I don't think you groked from your comment15:18
TheJuliaThat being said, glean just not working today in the double cases was also involved with Centos9 cloud images including cloud-init by default15:20
TheJuliaGlean would try, and then be completely squashed by cloud init trying DHCP15:20
TheJulia.... because cloud-init will only look locally if the data source is explicitly ConfigDrive *and* dhcp is turned off15:21
dtantsurI guess what worried me is the transition of "We provide network_data, Glean is recommended" to "We provide network_data, and you MUST run Glean in the exact way we prescribe". Especially on stable branches.15:22
TheJuliaI mean that is navigable15:23
TheJuliaI broached just reading the data and dropping glean and the push back I got was that we only supported glean15:23
TheJuliaand just to use it15:23
TheJuliaChallenge is, out of the gate on a stock path, that just doesn't work either15:23
TheJuliaAdam likely has a magical line and a patch or something in the Nordix line that fixes that though15:24
dtantsurWhat is preventing it from working? It definitely did work at some point.15:24
TheJulia(and I kind of started working on fixing it in dib outputted images anyway)15:24
dtantsurhttps://github.com/Nordix/metal3-dev-tools/tree/main/ci/scripts/image_scripts/ipa_builder_elements15:24
TheJuliadtantsur: cloud-init, in initial phases now always triggers DHCP, even *if* you have the data source configured as ConfigDrive15:25
dtantsuryeah, I guess assuming that cloud-init is kicked out15:25
TheJuliaexcept, glean always also ran before cloud-init based upon networking defaults15:26
TheJuliaso glean would run far in advance, get things ready15:26
dtantsurFor the record: I don't suggest running Glean together with cloud-init15:26
TheJuliathen cloud init would fire up, attempt to do its thing, stomp upon fallback and wipe out glean's files15:26
dtantsurThis is a stupid thing to do that we shouldn't spend effort on working around15:26
TheJuliaoh, yeah, no. Please no, no mixing15:26
dtantsurI do recognize that some people may try to hack together something based on cloud-init or something completely downstream15:27
TheJuliayeah, which is why I originally just wanted to go "forget these helpers1"15:27
TheJulias/helpers1/helpers!/15:27
dtantsurI still don't quite get the benefits over something like https://review.opendev.org/c/openstack/ironic-python-agent-builder/+/902291?usp=dashboard15:28
dtantsur(which is closer to what Nordix is doing AFAIK)15:28
TheJuliawe have to make something that is opted in, instead of something that is able to manage/recover on older deployments without logic to add a bootloader argument as well15:30
dtantsurThere is a fall back to /dev/sr0 if that's what you're looking for15:30
dtantsur(could be a smarter way to detect a CD, I'm just showing the idea)15:31
TheJuliamore likely a usb device in real world instead of sr015:31
TheJuliaCI only gives us SATA attached CDs15:31
TheJuliauhh, I'm not sure what you mean by fallback to /dev/sr0 though15:32
dtantsurhttps://review.opendev.org/c/openstack/ironic-python-agent-builder/+/902291/1/dib/ironic-ramdisk-network-config/static/usr/local/bin/ipa-glean-early.sh#1615:32
dtantsurwhich, as you say, should be something slightly smarter, still excluding local disks15:33
dtantsurCan be: anything that is not SATA, SCSI, NVME or a partition15:33
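[editor's note] A rough sketch of the selection rule just stated (skip SATA, SCSI, NVMe, and partitions), assuming device data in the shape produced by `lsblk --json -o NAME,TYPE,TRAN`. The function names are hypothetical and this is not the logic in the linked PoC script. As noted in the chat, CI attaches CDs over SATA, so this literal rule would skip them there.

```python
import json
import subprocess

# Transports treated as "local disk" in this sketch (an assumption).
EXCLUDED_TRANSPORTS = {"sata", "scsi", "nvme"}


def candidate_devices(blockdevices):
    """Pick top-level devices that are not SATA, SCSI, NVMe, or partitions.

    `blockdevices` is the list found under the "blockdevices" key of
    lsblk's JSON output. Devices with no reported transport pass through.
    """
    result = []
    for dev in blockdevices:
        if dev.get("type") == "part":
            continue
        if (dev.get("tran") or "").lower() in EXCLUDED_TRANSPORTS:
            continue
        result.append("/dev/" + dev["name"])
    return result


def list_candidates():
    """Query lsblk on the running system and apply the filter."""
    out = subprocess.check_output(["lsblk", "--json", "-o", "NAME,TYPE,TRAN"])
    return candidate_devices(json.loads(out)["blockdevices"])
```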
TheJuliayeah, which is why I tried to solve this in IPA since we had existing guarding logic in IPA to help us hone in and identify the actual device instead of guess15:34
dtantsurYeah, but two of my three concerns here revolve around IPA being simply too late15:35
TheJuliain my testing, it just failed to configure the addressing or handle it at all15:36
TheJuliaand the hope was ultimately, explicitly remove glean as a default invocation moving forward15:36
TheJuliaso glean becomes a tool we use, we don't fire it up on start at all15:36
dtantsurSure, but IPA is too late anyway.15:36
dtantsurI mean, the current IPA even depends on network-online, which won't work any more :)15:37
TheJuliawell, not if moving forward, glean just doesn't have any configuration to run on boot15:37
TheJuliaeh, still goes network-online if it fails to actually address anything aiui15:37
TheJuliait tried, nothing was there15:37
dtantsurokay, so we no longer allow any services that require networking to run before IPA?15:38
TheJuliaThat was kind of my hope, get rid of them all so we can make the decisions upon the information which we have and trigger the appropriate configuration15:38
dtantsurnor really any glean alternatives?15:38
TheJuliawhich can be further enhanced as time goes on15:39
dtantsurwdym "get rid of them", people may need them?15:39
dtantsurFortunately OpenShift won't be affected, because that's life-critical for us15:39
TheJuliafrom a utility standpoint, my thought was we encourage getting rid of cloud-init and glean from running entirely15:39
dtantsurI also used CERN as an example15:39
JayFQuite some time ago we sorta made the decision that IPA isn't software, it's the full ramdisk15:40
TheJuliathey have DHCP in their environment, do they use cloud-init for that?15:40
JayFSo it's not unreasonable for us to dictate things about the environment IPA runs in, and what can/can't be in the ramdisk15:40
dtantsurJayF, TheJulia, I was not included in either "we" that you just used15:40
JayFbut such things can't be backported15:40
dtantsurLike, I'm actively very much NOT one of these we to the extent that my career depends on it.15:40
JayFdtantsur: the "we" in that case is strongly implied by code we have changed15:40
JayFdtantsur: and I don't mean like, last month, I mean over the course of years15:40
JayFI'm going to be honest, after re-reading/reviewing the patch, and reading the discussion this morning15:41
JayFthis feature just scares the crap out of me 15:41
JayFI keep trying to convince myself that none of the failure modes are security vulnerabilities so it's probably okay15:42
dtantsurPeople downloading software before IPA or even downloading the IPA itself from the internet is very much a reality.15:42
JayFyes, I agree, but I'm not sure I see an architecture where the combo of that + node network data + configdrive support on instances we are cleaning can combine15:43
dtantsurAt least two major Ironic consumers rely on that. While none will presumably be affected by this discussion, it's a sign that we cannot just assume to know everything about the environment IPA runs in.15:43
dtantsurJayF: well, I'm happy to present you with such an architecture. I've even just proposed an IPA-builder side of it.15:43
dtantsurRandomize the label name, teach Ironic to provide it, teach (or force) Glean to understand that.15:44
dtantsurBoom, done.15:44
TheJuliaand backport the feature on both sides15:44
JayFI thought there was just a whole thing about how this couldn't be glean-specific?15:44
* TheJulia starts to eyeball vacation early15:44
dtantsurJayF: yep, and this is not. You can still swap in your favorite thing that does ~ the same.15:45
* JayF notes he is sick and likely won't make it a full 8 hours today, just trying to keep himself productive+distracted until brain runs outta juice15:45
TheJuliaJayF: I was just going to suggest soup, and coffee15:45
TheJuliabut then I realized I have had no coffee today15:45
dtantsurTheJulia: feature backports are bad except when they aren't. As long as we don't break the default case, I'm happy to justify backporting a feature that fixes serious security limitations.15:46
dtantsurNot that we have never done that...15:46
JayFI'd have to see the IPA side to have a stronger opinion on backporting it15:46
dtantsurNo IPA side in what I propose; an Ironic side though15:47
JayFas long as an operator who doesn't care about this feature, even if they're doing a custom ramdisk, has no action, it should be OK though15:47
TheJuliaI was just trying to do it in a clean non-breaking way requiring agent and ironic to both be upgraded with additional external logic, but I don't see a path forward15:47
JayF^ criterion stays the same tho15:47
dtantsurIt will need to stay opt-in on stable branches. Kinda what we did with the agent token.15:48
TheJuliaeh, I don't think we backported that actively15:49
JayFnot upstream15:50
TheJuliaalso not on our downstream, we just let it flow in15:50
dtantsurI'm more referring to the state where we cannot be sure that IPA and Ironic are upgraded together15:50
opendevreviewWill Szumski proposed openstack/bifrost stable/2023.1: Restore discovery for dnsmasq dhcp provider  https://review.opendev.org/c/openstack/bifrost/+/90223315:50
TheJuliaoh, we've been having to deal with that reality for far longer than agent token :)15:51
TheJuliawe had to put a line in the sand and go "no, you must upgrade" for that though15:51
dtantsurI think the agent token was quite prominent :)15:51
TheJulialots of warning, and nasty mean errors15:51
* TheJulia takes pride in that one15:51
JayFwe handled it well, but that line did demark the first time when basically master IPA wouldn't work on arbitrarily old Ironic15:52
TheJuliawe also didn't lock out old agents until after we were outside of the established support window15:52
TheJulia(but we definitely did give people the knob! and we had some people us it!15:53
TheJulias/us/use/15:53
dtantsurThe case in question is a bit easier though because before TheJulia's patch, DHCP-less is opt-in on the image building level15:53
TheJuliait could have continued to be15:53
dtantsuryeah, I'm pointing out that the number of potentially broken operators is much smaller than with the agent token15:54
TheJuliaThen again, I've been working on it for so long I had sort of forgotten where I was at with things15:54
TheJuliaOn a way less stressful topic: How about this one https://review.opendev.org/c/openstack/ironic/+/89657016:05
TheJuliagreat points dtantsur! 16:05
TheJuliaDid I get the reasoning behind target_power_state correctly?16:06
dtantsur11 October was an eternity ago, did I say something smart?16:06
TheJuliayes, you did, you raised two excellent questions16:06
dtantsur\o/16:06
TheJuliaThe second one, If I'm understanding why, I suspect we just need to reach a consensus and document it, or we need to add more complex logic (and maybe the thing here is just fail fast)16:07
dtantsurMmm, upgrade_lock will most likely fail indeed, so mostly irrelevant.. 16:07
TheJuliaThe first one, dunno, I guess it *might* be good to turn the cards off, but maybe that could also just be a "presently we don't do this, we've considered it, if you have an opinion, please let us know"16:08
dtantsurOn that note, I wonder if we need to get locks beforehand. So that we don't end up with something half-powered-on16:08
TheJuliaI'm a little worried since it is a fragmented vendor ecosystem to begin with16:08
TheJuliadunno, it could also be it could partially power on16:08
dtantsurmy worry there is less about what the vendors do, more about what ironic will try to do16:09
TheJuliasome of these devices have like a "we're only going to run a single core until we're in full power mode" logic path16:09
dtantsurwill we document that you must power off child nodes manually before powering off the parent one?16:09
dtantsurwill it work in any of our automated processes?16:09
TheJuliaI think we should document it, and just note, you may create pain if you do not16:09
TheJuliaand leave the door to automating it16:10
TheJulias/door/door open/16:10
dtantsurE.g. when we reboot the parent node during deployment, how will the child nodes react? What will Ironic do about it?16:10
TheJuliawith our power off, power on, they will lose power completely and it will be restored16:11
TheJuliaso they too, will fresh boot16:11
TheJuliaunless they have an external PSU16:11
TheJuliaif you trigger the CPU interrupt to reboot, the cards will ignore it16:12
dtantsurOr, say, a node gets deprovisioned. It has DPUs running. Ironic powers it off. Children nodes suddenly go off too. Ironic says "hmm, this is wrong" and powers everything on again.16:12
TheJuliayeah, that is the sort of case this creates16:12
TheJuliaand one where we are kind of in an impossible situation without documenting $something16:13
dtantsurSo, will we expect the users do power DPUs off themselves before deprovisioning? Will they be able to do it if they don't have direct access to Ironic?16:13
dtantsurMaybe we're smelling something like node.is_power_state_inherited16:15
TheJuliayeah, they won't power them off16:15
TheJuliabut we may need them online to unprovision the node at all16:16
TheJuliasince all networking may pass through 16:16
dtantsur*nod*16:16
dtantsurit all sounds to me like maybe we should be more prescriptive in the beginning16:16
TheJuliahmm16:16
TheJuliayeah16:16
TheJulia"Here is what is going to happen, if you don't have an external power supply"16:17
TheJuliaI guess, there are a list of states where it makes sense to power the host on *if* it is off16:17
dtantsurIt really sounds to me like has_external_power should be an option in driver_info (or on the node, which I like better)16:17
TheJuliawe should only make such a change *in* those states, outside of that, "nope"16:17
TheJuliaI think the external power was proposed as a driver_info field16:18
TheJuliaif we add it to node, the snmp operators are going to demand to use it :)16:18
dtantsurwe can consider their demand in the due time ;)16:18
opendevreviewJulia Kreger proposed openstack/ironic master: Redfish UefiHttp boot support  https://review.opendev.org/c/openstack/ironic/+/90096416:18
opendevreviewJulia Kreger proposed openstack/ironic master: Add HTTP versions of network boot interfaces  https://review.opendev.org/c/openstack/ironic/+/90096516:19
TheJuliarebases due to merge conflict on ironic/common/boot_devices.py16:19
TheJuliaenjoy!16:19
TheJuliawell, demand, exepct it just does things16:20
* dtantsur just wants to boast that metal3 is on the verge of making Inspector optional https://github.com/metal3-io/ironic-image/pull/44316:20
TheJulia\o/16:20
dtantsurunfortunately, the rest of the inspector work is stuck for the time being..16:22
TheJuliaokay, so the thing really needed is if I'm powering off a host which has DPUs, we should likely just turn off child nodes first. Which is fine. Internally the vendors are each working out the mechanism to figure out when to let firmware *start* to proceed with booting once power is activated, and it looks like it will be vendor specific-ish until DMTF standardizes on the ?MCT? communication path16:23
TheJuliabecause today, we don't have bi-directional comms between the BMC and the Add-in Management Controller (the bmc on the card)16:24
TheJuliaso, docs saying "if you do this, this other thing will happen", and the thing we need to head off is just re-powering the host up16:26
TheJuliaSo we can just take the same power state requested, and roll across child devices16:27
TheJuliathat sort of makes us opinionated, resets their state16:27
TheJuliaand if someone comes along and tries to power one on, the host will, indeed, return to life too16:28
TheJuliawhich is just a thing we need to document.16:28
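[editor's note] A small sketch of the ordering idea above: apply the requested power state to the parent and its child nodes, turning children off before the parent. Powering the parent on before its children is an added assumption (children without an external PSU draw power from the host), and the function is hypothetical rather than Ironic code.

```python
POWER_ON = "power on"
POWER_OFF = "power off"


def power_order(parent, children, target):
    """Return node names in the order the target state should be applied.

    Power off: children first, then the parent (as suggested in the chat).
    Power on: parent first, then children (an assumption of this sketch).
    """
    if target == POWER_OFF:
        return list(children) + [parent]
    if target == POWER_ON:
        return [parent] + list(children)
    raise ValueError("unsupported power state: %r" % (target,))
```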
dtantsuryeah16:28
TheJuliaThat seems, reasonable sans vendor standardization16:28
TheJuliabecause the AMC's can't call home, yet.16:28
TheJulias/home/home to the chassis bmc and say, give me power!"16:28
* dtantsur has no idea who AMC is :)16:29
TheJuliaAdd-in Management Controller16:29
TheJuliaThe next generation of hosts will have multiple BMCs16:29
dtantsurBright future16:29
TheJuliaSo, about opening a distillery!16:29
dtantsurnow we're talking business16:30
TheJuliasigh, I think this might be contrary to the inline docs in the plugin: 2023-11-29 17:35:01.962851 | controller | {0} setUpClass (ironic_tempest_plugin.tests.scenario.ironic_standalone.test_basic_ops.BaremetalIPXEBootTestClass) ... SKIPPED: The driver: None used in test is not in the list of enabled_drivers [] or enabled_hardware_types ['ipmi'] in the tempest config.16:37
dtantsurOur tempest tests are probably easier to rewrite than to fix. Just saying.16:38
TheJulia(actually, that is sort of what this attempt is to do)16:38
TheJuliaoh, it is the other fields16:39
TheJuliaeasy enough to bypass *grin*16:39
TheJuliaadam-metal3: o/ what OS are you running the downstream metal3 IPA images with?16:43
adam-metal3o/ centos 9 stream, in the metal3 community, even more downstream we have SLES also16:44
TheJuliaweird16:45
TheJuliaI wonder if there is some special casing we ended up with how the ipa image gets built. I didn't look at all of the elements dtantsur linked me to16:45
dtantsurthe official metal3 one is the one we build16:46
adam-metal3yes16:46
adam-metal3that is the same in the XI16:46
adam-metal3CI16:46
adam-metal3I have a pipeline that builds our flavor of centos 9 stream IPA but that we don't use for our regular tests16:46
dtantsuradam-metal3: the IOError failure, which image did it happen with?16:47
TheJuliaadam-metal3: is ipa coming from a package in that pipeline, or source?16:47
adam-metal3it comes from whatever is pulled from upstream by IPA downloader16:48
rpittauwhich is our tarball :)16:48
adam-metal3yes16:48
adam-metal3https://tarballs.opendev.org/openstack/ironic-python-agent/dib/16:48
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Test multiple boot interfaces as part of one CI job  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/90217116:48
TheJuliaInteresting, oh well16:49
dtantsurI think I know where you're coming from. The glean code only handles config-2 which is the label we use when we add network_data.16:49
dtantsurThe IPA side you modified also handles vmedia_iso or something like that, which is our normal label.16:49
dtantsurThis is how you ended up in the Glean code without having Glean or network_data.16:49
TheJuliayup, and why I didn't hit it in our local CI16:50
TheJuliathere is likely a couple other things involved there, but all water under the bridge as it were16:51
dtantsurWhat worried me is why our CI does not catch it..16:51
TheJuliawe had the target folder in our CI jobs with resulting images16:52
dtantsurWe don't have any jobs with DIB any more, do we?16:52
TheJuliabut we build entirely from source16:52
TheJuliadtantsur: I developed the change with one, so dunno16:53
rpittaummm in ipa? we do have dib as base ramdisk type16:53
dtantsurI'm trying to understand why only metal3 caught the failure16:53
TheJuliait is, if we run on rax, we may change that to tinyipa otherwise we end up spending 10 minutes uncompressing the initrd16:54
dtantsurYeah, AND we probably don't use virtual media in the IPA gate16:54
rpittauoh right16:54
TheJuliabut the change I had explicitly forced us to use dib for that change16:54
TheJuliaI think the change I had was on ironic since that was where the ipa jobs were16:54
rpittauwe have ipa-tempest-uefi-redfish-vmedia-src16:55
rpittauthat should be virtualmedia16:55
dtantsur** WARNING ** - DIB based IPA images have been defined, however we are running devstack on an environment which does not support nested VMs. Due to virtualization constraints, we are automatically falling back to TinyIPA to ensure CI job passage.16:55
rpittauhehhh16:56
dtantsurMaybe we shouldn't "ensure CI job passage" at the expense of regressions...16:56
dtantsurBut yeah, this is why we missed it. The redfish job used tinyIPA.16:56
rpittauyep16:56
TheJuliayeah16:57
TheJuliaripping it out means re-tuning all job timing for running on rax all the time.16:58
JayFCan we set up a job like that, which requires DIB in order to perform its test, to never run on rax?16:59
dtantsur++16:59
rpittauthat would be ideal17:00
TheJuliathat is entirely doable17:00
TheJuliawell, maybe not the very last part17:00
TheJuliaI'm not sure if we can exclude an entire provider17:00
dtantsurA question for infra folks probably..17:02
* dtantsur has a feeling that has been discussed in the past17:02
*** tkajinam is now known as Guest866617:02
opendevreviewJulia Kreger proposed openstack/ironic master: Change snmp job to not use a focal node  https://review.opendev.org/c/openstack/ironic/+/89382417:04
TheJuliadtantsur: likely, and gets lost on a side commentary17:05
rpittaugood night! o/17:15
JayFdtantsur: our docs say, for Ironic, to not use the inspector-inside-ironic support yet, right?17:19
JayFuntil it's done?17:19
* JayF thinks he's confused about something17:20
JayFon the surface I was wondering if https://github.com/metal3-io/ironic-image/pull/443 is exposing Ironic-says-dont-use-this-its-not-done-yet functionality in metal317:21
JayFbut now I realize that may not be the case17:21
dtantsurLet's say, we like to live on the bleeding edge :D17:21
dtantsurThen again, my intention is to expose the functionality in question in this cycle17:21
dtantsurhttps://review.opendev.org/c/openstack/ironic/+/898237 starts adding docs, for instance, and is awaiting reviews ;)17:22
JayFI was confused by the statement earlier about the inspector work not getting finished this cycle17:22
JayFwhich I thought was the point we were going to flip that switch17:22
JayFI'm slightly worried if we get inspector-in-ironic "good enough" for metal3, it'll stay in limbo ~forever17:23
dtantsurI think we'll have to stabilize it with only a subset of features migrated :(17:23
dtantsurThat does not depend on what metal3 does, it depends on having people to work on it..17:23
JayFMakes sense. I do prefer metal3 operates on the what-we-consider-public API of Ironic though in general17:27
JayFbut it sounds like we're just sorta landing them togetherish17:28
dtantsurYep. The default is not switched yet.17:28
dtantsurI need it first and foremost for ironic-operator, which itself is a PoC17:29
opendevreviewJulia Kreger proposed openstack/ironic master: DNM: CI test for httpboot jobs  https://review.opendev.org/c/openstack/ironic/+/90118217:29
JayF++17:29
dtantsurBut yeah. I want to finish the docs. We got all processing hooks. So it's a decent feature set already, we can tell people to use that.17:30
dtantsurMaybe someone from my OSP counterparts can figure out the PXE filter ;) That will give us what 90% of people using inspection need.17:32
dtantsurInspection rules can be left as an exercise for the next internship or a new hire.17:32
JayFInspection rules is a specific place I've heard as negative feedback for Ironic recently from a couple of folks17:32
JayF(I don't think it's related to this change, just a note)17:33
dtantsurInteresting, any details?17:33
* dtantsur is reading Itamar's eventlet email with great interest in the meantime17:35
JayF"the documentation seems lacking" was the substance I had from that conversation; I'm trying to help get this person into IRC17:35
JayFItamar has been trickling these scary eventlet news bits into my slack nearly daily for weeks lol17:35
dtantsurThis is feedback we get for every part of OpenStack, always :)17:35
JayFoh yeah, apparently wheels are turning on the docs pro from GR side, too17:36
dtantsur\o/17:36
JayFthey found a person and are just figuring out remaining details17:36
dtantsurI'm asking because if the rules themselves suck, we can rewrite them now.17:37
dtantsurIf it's only about good docs.. yeah, inspector docs are not amazing (and them being separate from ironic is not helping either)17:37
* dtantsur needs to go, will catch up with you tomorrow17:38
JayFo/17:39
TheJuliazigo: o/ you around?17:50
opendevreviewMerged openstack/ironic-python-agent bugfix/9.8: Update .gitreview for bugfix/9.8  https://review.opendev.org/c/openstack/ironic-python-agent/+/90224717:51
TheJuliazigo: for when you appear, this question is likely better suited in the #openstack-tc room, but you're not there right now, so I figured anywhere might work because I also didn't want to introduce possible noise to the eventlet thread. When you said the ship had sailed, what specifically were you referring to? Or maybe yet what were you thinking that caused you to think that? Just where debian is at on the 3.12 timeline?17:53
JayFThat's how I read it17:54
TheJuliaI was thinking so as well, but was wondering if there was more context or if it was coming from someplace else.17:54
JayFhonestly, given the findings of Itamar, I'm slightly worried about how well things would run at high contention in Py 3.1118:04
TheJuliasort of feels like someone needs to chase down the tests just to see18:04
TheJuliaor at least, get a broad idea18:05
TheJuliaThen again, the hypothesis is interesting18:05
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Test multiple boot interfaces as part of one CI job  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/90217118:08
JayFWell, I am getting off the computer and going to go find a doctor, and hope they have some kind of medication to help me out18:08
TheJuliafeel better JayF!18:08
JayFty o/18:08
iurygregoryget well JayF o/18:35
opendevreviewMerged openstack/ironic master: CI: Remove deprecated devstack method  https://review.opendev.org/c/openstack/ironic/+/90121119:25
opendevreviewMerged openstack/ironic-inspector bugfix/11.8: Update .gitreview for bugfix/11.8  https://review.opendev.org/c/openstack/ironic-inspector/+/90224419:25
opendevreviewJulia Kreger proposed openstack/ironic-tempest-plugin master: WIP: Test multiple boot interfaces as part of one CI job  https://review.opendev.org/c/openstack/ironic-tempest-plugin/+/90217119:44
opendevreviewSteve Baker proposed openstack/ironic-python-agent master: Switch to utf-8 for parsing efibootmgr -v  https://review.opendev.org/c/openstack/ironic-python-agent/+/90221420:15
opendevreviewMerged openstack/ironic master: Fix *_by_arch documentation and un-deprecate the options without it  https://review.opendev.org/c/openstack/ironic/+/90195820:23
jamesdenton_hello all - working with idrac-redfish and curious to know if redfish_verify_ca=False is expected to be functional...Conductor appears to ignore the suggestion21:35
TheJuliaI believe it should be, but have not looked at it specifically21:49
jamesdenton_thanks TheJulia - actually just did a Zed->2023.1 upgrade and something in that fixed it21:57
*** zbitter is now known as zaneb22:23
iurygregoryGREEN \o/ https://review.opendev.org/c/openstack/ironic/+/893824 22:45
iurygregoryif we don't have other cores around I will +W since it is a CI change only22:45
iurygregoryand we really need to move out of focal in the snmp...22:45
