rpittau | good morning ironic! o/ | 07:41 |
---|---|---|
rpittau | JayF: great job on the highlights :) | 07:46 |
opendevreview | Adam Rozman proposed openstack/sushy-tools master: improve volume boot mode support https://review.opendev.org/c/openstack/sushy-tools/+/912113 | 09:28 |
opendevreview | Mohammed Boukhalfa proposed openstack/sushy-tools master: Add fake_ipa inspection, lookup and heartbeater to fake system https://review.opendev.org/c/openstack/sushy-tools/+/875366 | 11:10 |
iurygregory | good morning Ironic | 11:33 |
iurygregory | dtantsur, I'm looking at your comments in https://review.opendev.org/c/openstack/ironic/+/912336 , wondering about the eject part... this is what I had in mind https://paste.opendev.org/show/bHmFZsDXcYzEh2LWOy16/ so I'm trying to understand the case you mentioned | 12:18 |
* iurygregory brb, need to go to the insurance company to solve things about the car =X | 12:25 | |
opendevreview | Merged openstack/ironic stable/2023.1: Special case lenovo UEFI boot setup https://review.opendev.org/c/openstack/ironic/+/910447 | 12:27 |
opendevreview | Merged openstack/ironic master: ci: support overriding the service project name https://review.opendev.org/c/openstack/ironic/+/909221 | 12:49 |
dtantsur | iurygregory: "If a media was inserted trough Managers first and we update the BMC and now vmedia is trough Systems, I think the media was "automatically" ejected no?" - are you sure about this? | 13:02 |
TheJulia | Is that defined in the standard? I would hesiate unless it is explicitly defined | 13:21 |
dtantsur | That's basically my concern | 13:21 |
TheJulia | and *where* it is defined as well, we've seen meaning in schema spec be misconstrued | 13:21 |
TheJulia | multiple times, sometimes the same exact field | 13:21 |
opendevreview | Riccardo Pittau proposed openstack/ironic master: [trivial] add device_type param to attach_vmedia_device https://review.opendev.org/c/openstack/ironic/+/912457 | 14:11 |
opendevreview | Julia Kreger proposed openstack/ironic master: Add redfish https boot CI job https://review.opendev.org/c/openstack/ironic/+/901090 | 14:15 |
iurygregory | dtantsur, not really, but in case of the Lenovo you won't even have VirtualMedia available trough Managers, so how we would eject? | 14:31 |
dtantsur | iurygregory: we don't then? | 14:31 |
dtantsur | just make sure we don't try to access the missing attribute | 14:32 |
iurygregory | I'm ok with attemting to eject in both Systems/Managers, we just need to be sure they have the VirtualMedia resource | 14:32 |
dtantsur | you can do something like "if there is virtual media in systems, MissingAttributeError can be caught and ignored on the high level) | 14:33 |
iurygregory | dtantsur, yeah, basically check if Virtual Media is available in System (if yes we eject), check if is available in Managers (if yes, eject) | 14:33 |
* TheJulia likes this idea | 14:35 | |
dtantsur | ++ | 14:35 |
iurygregory | awesome, I will talk with winicius to update things o/ | 14:36 |
iurygregory | should we bump sushy in a separate patch or nope? | 14:37 |
dtantsur | we can bump here as long as we remember to remove the bump in backports | 14:41 |
iurygregory | also, I've noticed we have 5.0.0 | 14:44 |
iurygregory | should we bump to it? and not only 4.8.0? | 14:44 |
rpittau | iurygregory: that should be the minimum version needed for that to work | 14:47 |
dtantsur | yep, we never bump "just in case" | 14:47 |
iurygregory | rpittau, yeah, that's why I told him to use 4.8.0 | 14:47 |
rpittau | ack | 14:47 |
iurygregory | just wanted to be sure =) | 14:47 |
JayF | Happy DST meeting day for 'muricans :D | 15:00 |
JayF | #startmeeting ironic | 15:00 |
opendevmeet | Meeting started Mon Mar 11 15:00:39 2024 UTC and is due to finish in 60 minutes. The chair is JayF. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'ironic' | 15:00 |
rpittau | o/ | 15:00 |
iurygregory | o/ | 15:00 |
* TheJulia grumbles about daylight savings time | 15:00 | |
dtantsur | o/ | 15:00 |
haoZhou | o/ | 15:00 |
TheJulia | o/ | 15:00 |
JayF | Welcome to the weekly Ironic meeting. We operate under the OIF Code of Conduct. | 15:01 |
JayF | #topic Announcements/Reminders | 15:01 |
JayF | #info Standing reminder to review patches tagged ironic-week-prio and to hashtag any patches ready for review with ironic-week-prio: https://tinyurl.com/ironic-weekly-prio-dash | 15:01 |
JayF | #info Project Teams Gathering (PTG) will be held from Monday, April 8 to Friday, April 12 2024 | 15:01 |
JayF | #link https://etherpad.opendev.org/p/ironic-ptg-april-2024 | 15:01 |
JayF | #info Ironic Meetup/BareMetal SIG June 5, OpenInfra Days June 6 @ CERN. Signup at https://indico.cern.ch/event/1378171/ and https://indico.cern.ch/event/1376907/ | 15:01 |
JayF | #topic Caracal Release Schedule | 15:01 |
JayF | It's R-3 week, RC1s are due RC-ing projects by March 14th. | 15:02 |
JayF | We don't RC, but that also serves as a good point to us to get our release going | 15:02 |
JayF | If anyone has a specific change they want in Caracal, please let me know | 15:02 |
JayF | I'm going to be focusing on landing stragglers and cutting releases this week | 15:02 |
iurygregory | We will probably want https://review.opendev.org/c/openstack/ironic/+/912336 | 15:03 |
iurygregory | this week? :O | 15:03 |
iurygregory | oh wow | 15:03 |
dtantsur | ++ to this change | 15:03 |
JayF | 3 weeks until the release is final-final, and we can't ask releases team to take our deliverables last minute | 15:03 |
JayF | so yeah, we gotta get moving :D | 15:03 |
dtantsur | and also https://review.opendev.org/c/openstack/ironic/+/907991?usp=dashboard if we want the new inspection to be (more usable) for non-standalone folks | 15:03 |
rpittau | I'll try to squeeze in one more this week, fingers crossed | 15:04 |
JayF | Please let me know if you have a change up you want in so I don't ninja the release under you :D | 15:04 |
dtantsur | rpittau: you mean, vmedia implementation? | 15:04 |
rpittau | yeah | 15:04 |
dtantsur | oh yeah. API without a single implementation is not great | 15:04 |
rpittau | yep | 15:05 |
JayF | lets make sure we get things up fast :) time is coming up quickly | 15:06 |
JayF | #topic Review Ironic CI Status | 15:06 |
JayF | I'm pretty sure IPA gate is broken | 15:06 |
JayF | I have a task on my todo list to dig into that today | 15:06 |
opendevreview | Julia Kreger proposed openstack/ironic master: Fix artifical rbac policy constraint that resulted in 500s https://review.opendev.org/c/openstack/ironic/+/910969 | 15:07 |
JayF | I guess nothing else about CI status | 15:08 |
JayF | #topic vPTG planning | 15:08 |
JayF | Please review https://etherpad.opendev.org/p/ironic-ptg-april-2024 | 15:09 |
rpittau | JayF: I think that was (or is) metalsmith legacy job, I've retested a patch to verify | 15:09 |
JayF | we already have some pre-discussion happening | 15:09 |
JayF | rpittau: thanks | 15:09 |
JayF | rpittau: I am concerned it's a unit test issue | 15:09 |
JayF | rpittau: so I'm not convinced that's the only bit, we'll see | 15:09 |
dtantsur | yeah, I've left some comments today | 15:09 |
rpittau | oh ok, didn't spot that | 15:09 |
JayF | #topic Bug Deputy | 15:10 |
JayF | I was the bug deputy, but realistically in name only | 15:10 |
JayF | I'm happy to take it another week and do two weeks worth of triage :) | 15:11 |
JayF | Probably a valuable activity to take before release, anyway | 15:11 |
JayF | #topic Open Discussion | 15:11 |
JayF | We had an agenda item left | 15:11 |
JayF | #info Please review BMC CA Cerfificate - ironic spec (https://docs.google.com/document/d/1Vxn-MrcXEnzeHaWvni4IE0WuQpR1QNWNC8EF65wXUho/edit) for RFE https://bugs.launchpad.net/ironic/+bug/2040236. Any comments are welcome. | 15:12 |
JayF | My first feedback will be to push it to gerrit for further feedback :) | 15:12 |
JayF | but I also intend on looking over it, in gdoc form and providing coarse feedback | 15:12 |
JayF | Anything about this or anything else for open discussion? | 15:12 |
TheJulia | wait, a google doc spec? | 15:12 |
rpittau | yeah, they wanted to have a preliminary discussion on that :) | 15:13 |
dtantsur | I think we should communicate that it's about time for real discussions | 15:13 |
JayF | Yeah, it was added to our agenda. It needs to be moved into gerrit but I am not going to hold that against them the first time around :) | 15:13 |
rpittau | I agree | 15:13 |
JayF | Anything else for open discussion? | 15:14 |
haoZhou | I will send it to gerrit, thanks | 15:14 |
JayF | haoZhou: thanks for submitting it! | 15:15 |
JayF | Last call for open discussion items | 15:15 |
TheJulia | Okay, so my $0.02 feeling on that is the addition of the option itself, as long as it is a singular option is likely okay without a spec | 15:15 |
TheJulia | I'm less and less a fan of driver specific field names unless absolutely necessary | 15:16 |
dtantsur | I think JayF asked for the spec back in the RFE discussion days | 15:16 |
dtantsur | TheJulia: well.. while a reasonable deployment should have the same CA for all BMCs, regardless of vendor, I can imagine a situation where some hardware simply cannot be updated | 15:16 |
dtantsur | dunno | 15:16 |
JayF | Or at least, it'd need to be aligned on something that is meant to reflect physical distance | 15:17 |
JayF | like conductor group | 15:17 |
dtantsur | Could be a PTG topic because for now we have all these [redfish]kernel_append_params and so on | 15:17 |
TheJulia | yeah | 15:17 |
JayF | "a different conductor group needs a different CI" aligns with a design where you have one Ironic orchestrating multiple, separately owned/operated computer rooms | 15:17 |
JayF | s/CI/CA/ | 15:17 |
dtantsur | This is probably a different concern. You probably will have a different ironic.conf per conductor group. | 15:18 |
JayF | ooh | 15:18 |
TheJulia | dtantsur: ++ | 15:19 |
JayF | Either way, gerrit is a good place to have that discussion :) | 15:19 |
dtantsur | right | 15:20 |
JayF | Is there anything else or should I close it up? | 15:20 |
JayF | #endmeeting | 15:20 |
opendevmeet | Meeting ended Mon Mar 11 15:20:54 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:20 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-03-11-15.00.html | 15:20 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-03-11-15.00.txt | 15:20 |
opendevmeet | Log: https://meetings.opendev.org/meetings/ironic/2024/ironic.2024-03-11-15.00.log.html | 15:20 |
TheJulia | oh, ipa's unit tests actually codify the underlying ironic-lib behavior | 15:29 |
TheJulia | so we could just do "yes, we called the library" mock level testing instead of "these commands got executed" level of verification | 15:30 |
TheJulia | dtantsur: and regarding the parameter for the bmcs's ca, I'm not arguing about it being settable, I'm arguing about <drivername>_verify_ca being stamped all over the place, even on drivers which... have a shortend lifetime ahead of them | 15:32 |
dking | I see that the cleaning step `erase_devices` dispatches a call to `erase_block_device` across hardware managers, which makes it simple to override the erasing of block devices with custom code. However, I am not seeing a similar option for `erase_devices_metadata`. | 15:33 |
dtantsur | TheJulia: I assumed we would only add it to the drivers that can support this option? | 15:33 |
JayF | dking: erase_devices_metadata *is* the target of the hwmanager call | 15:34 |
JayF | dking: just define your own in a higher-prio hwm | 15:34 |
TheJulia | dtantsur: I can see sort of see an option, regardless of name making sense, It doesn't make sense to me to enabel a pattern of "ilo_verify_ca" and "redfish_verify_ca", because I have customers which will try to set it expecting both options to do something | 15:35 |
dking | I would like to be able to skip erasing metadata, specifically lvm and volume groups while allowing other drives and even volume groups to be wiped. Is there an easier way to do that than to essentially rewrite the entire `erase_devices_metadata`? | 15:35 |
TheJulia | even when on only one of those drivers... | 15:35 |
JayF | dking: the erase_devices bit is just there so you can implement "erase a single device" in a hardware-specific way | 15:35 |
JayF | dking: the assumption is (I think) that you don't need hardware spceific help for metadata wipes | 15:35 |
TheJulia | Is anyone working on ironic-lib? I ask because In a couple hours I need to drive to LA | 15:35 |
TheJulia | err | 15:35 |
TheJulia | ironic-python-agent's ironic-lib testing | 15:36 |
dtantsur | TheJulia: why not? I'm not sure I understood.\ | 15:36 |
TheJulia | https://www.irccloud.com/pastebin/0MbMHn35/ | 15:36 |
dtantsur | We don't expect redfish_verify_ca and ilo_verify_ca to be different.. but they might in an extreme case | 15:36 |
JayF | TheJulia: that's the ipa unit test issue I said I was looking at later, yeah? | 15:36 |
JayF | yeah | 15:36 |
TheJulia | dtantsur: but while being set on the same node is sort of the thing I'm concerned about | 15:36 |
JayF | I already have time set aside :) | 15:36 |
TheJulia | JayF: okay, yeah, thanks! | 15:36 |
dtantsur | TheJulia: wdym? | 15:37 |
dtantsur | ilo and redfish cannot apply to the same node | 15:37 |
dtantsur | not at the same time | 15:37 |
dking | JayF: I'm probably not doing something typical. What I'm looking to do at the moment is to skip wiping certain volume groups. So, it's not really a hardware problem. it's just that I want to make my own check before wiping them. Unfortunately in this case, I do want to wipe some other volume groups on the same node, so I still want the wiping functionality. | 15:37 |
JayF | in my own hwm manager then, I'd override erase_devices_metadata | 15:37 |
JayF | have it get a list of devices | 15:38 |
rpittau | oh you mean this https://review.opendev.org/c/openstack/ironic-python-agent/+/910480 | 15:38 |
JayF | filter those devices for the set you care about | 15:38 |
JayF | then just either call the methods being used by the generichwm implementation or just copy+paste the actual metadata wipe code | 15:38 |
JayF | don't obsess over DRY'ing it up too much, it's OK to just copy+paste | 15:38 |
rpittau | TheJulia: I ahve the fix for the unit test here https://review.opendev.org/c/openstack/ironic-python-agent/+/910480 | 15:38 |
dtantsur | Maybe we could improve that in IPA too | 15:38 |
dking | JayF: Yeah, that's hat I'm probably going to have to do. I just hate the idea of copying over the entire general erase_devices_metadata so that I can add in a skip. That means that I would lose any updates that get made to that method upstream, assuming that any ever do get made. | 15:39 |
JayF | not often :) | 15:39 |
JayF | and if your hardware doesn't change, it's a feature not a bug, imo, that it doesn't change | 15:39 |
iurygregory | dtantsur, what do you mean by normal AttributeError? https://review.opendev.org/c/openstack/ironic/+/912336/comment/3bc72da4_cb011dee/ doesn't sushy only raises MissingAttributeError in such cases? https://opendev.org/openstack/sushy/src/branch/master/sushy/exceptions.py#L41 | 15:40 |
TheJulia | dtantsur: I have a some customers who will try regardless to set both settings, regardless adding multiple options just starts to create divergent code which is driver specific, when those drivers are largely just being maintained with only upstream community engagement | 15:40 |
rpittau | JayF: also fix for that unit test that I forgot :D https://review.opendev.org/c/openstack/ironic-python-agent/+/910480 | 15:40 |
TheJulia | dtantsur: I guess, I'm worried about added perception and interaction complexity more than anything else | 15:41 |
dtantsur | TheJulia: "customers changing things they don't understand" is a problem but not necessarily a sort of problem that must unconditonally drive our decisions | 15:41 |
TheJulia | dtantsur: I'm not saying unconditionally, I'm just saying I'm not a fan of stamping driver specific verify_ca options all over the place without intention | 15:41 |
dtantsur | I'm not overly against just [conductor]bmc_verify_ca, but they we need to develop a policy around it | 15:41 |
TheJulia | agree 1000000% | 15:42 |
dtantsur | so that we don't doing similar things, possible deprecate [<driver>]kernel_append_params and so on | 15:42 |
TheJulia | More towards the ptg topic level of discussion, we likely need to message what the realistic lifetime and path is for, for example, the ilo hardware type | 15:43 |
TheJulia | and explicitly begin that process of expectation setting | 15:43 |
dtantsur | Yeah.. which reminds me, I need to have this conversation in Metal3 | 15:43 |
TheJulia | ++ | 15:43 |
TheJulia | same for the classic drac driver, and likely more urgently because of it | 15:43 |
dtantsur | has ironic communicated the deprecation already? | 15:43 |
TheJulia | where as ilo, we at least have contacts which will respond/engage | 15:44 |
dtantsur | The problem is: ilo4 is still quite actively used | 15:44 |
TheJulia | for classic drac I believe we did | 15:44 |
TheJulia | for ilo, I think we painted the wider context | 15:44 |
JayF | re: verify_ca | 15:44 |
TheJulia | maybe not an official deprecation, but we should go ahead and set that path out | 15:44 |
JayF | I think we need to think about models -- e.g. "this is a CA for BMCs" vs "this is a CA for agents" | 15:45 |
JayF | and if we go that route, of making it not-driver-settable, we need to be really explicit about our "zones" of ssl verification | 15:45 |
TheJulia | dtantsur: yeah, and I would expect ilo4 for a couple more years, it will sharply decline as the mean time between failures occur :) | 15:45 |
dtantsur | and the ilo5 driver will probably live for longer... sigh | 15:45 |
TheJulia | dtantsur: yeah, and we *will* hit a point, at some point, where the state of the proliantutils library will be the primary driver for us going "time to pull support" | 15:46 |
dking | JayF: Maybe then as a separate question, would there be any use in either a dispatch_to_managers for something like erase_block_device_metadata, which in the HardwareManager class simply calls disk_utils.destroy_disk_metadata, so that it could be overriden, or perhaps maybe even just add some type of hook to run a check before proceeding? | 15:46 |
TheJulia | I think that, and similar with dracclient, will be the prime forcing functions | 15:46 |
dtantsur | The problem may be RAID. ilo5 had RAID support via proliantutils but not redfish. | 15:47 |
JayF | dking: I'd have to see the change. I'm not -1 outta hand to it by any stretch, but that's code that's called in almost every Ironic deployment with cleaning so we should be careful -- and I think you're probably overly concerned about the risks of just overriding that method. Just do it! It's fine! | 15:47 |
TheJulia | dtantsur: ugh | 15:48 |
TheJulia | dtantsur: but with at least a stance written down, we have something to stand on on if we need to take any level of drastic action | 15:50 |
dtantsur | true | 15:51 |
dtantsur | have we deprecated ibmc too? | 15:51 |
TheJulia | I believe so!? | 15:51 |
TheJulia | At this point, I'd have to search | 15:51 |
JayF | Yes | 15:52 |
JayF | I put the high level in cycle highlights | 15:52 |
TheJulia | JayF: https://review.opendev.org/c/openstack/ironic/+/911158/4/devstack/lib/ironic#586 is not working :( (le sigh) | 15:53 |
JayF | - Several Ironic drivers have been deprecated in favor of more modern, redfish-based drivers. The ``ibmc``, ``xclarity``, and ``idrac-wsman`` drivers will be removed during a future development cycle. Operators utilizing these drivers are encouraged to use the redfish hardware type instead. Additionally, users of the ``ilo`` hardware type on newer ILO6-based hardware will now be prompted to use ``redfish`` instead. | 15:53 |
dtantsur | thanks! | 15:54 |
JayF | TheJulia: I can add it to the list, but it's lower prio than IPA master branch. I'll be starting from scratch | 15:54 |
JayF | BTW, we said it'd be removed in 2024.2 | 15:54 |
JayF | I think we need to wait longer | 15:54 |
JayF | (in release notes) | 15:54 |
JayF | but we can discuss that when removal time comes | 15:54 |
TheJulia | Anyway, I'm going to be hitting the road to drive to LA in ~45 minutes | 15:57 |
TheJulia | depending on traffic, I might not be on first thing in the morning | 15:57 |
TheJulia | (i.e. the "no way I'm spending that much time in traffic" drive home" scenario | 15:57 |
dking | JayF: I'm doing that, but there's a couple of frustrations. For one, I now also have to maintain the proper version of ironic-libs in my requirements, but also, now I have to go through the trouble of making sure that I mock out `disk_utils.destroy_disk_metadata` in my unit tests. | 16:05 |
JayF | If you're using a patch to do this, you're taking the wrong approach | 16:05 |
JayF | you should be able to make an indepedent hardware manager, like one of the examples we publish, that just implements that one command | 16:06 |
JayF | and it will override | 16:06 |
dking | JayF: Yes, that's what I'm doing. But I still want the metadata wiped for all other block devices. So, in my custom hardware manager, I've having to re-implement erase_devices_metadata so that I can add in a check before it calls disk_utils.destroy_disk_metadata. But that means that disk_utils.destroy_disk_metadata would still get called in my own unit tests and it means that I'll have to have the package installed from which i | 16:08 |
dking | t is called. | 16:08 |
JayF | actually, there's a better way than that, even | 16:08 |
JayF | lemme get the docs | 16:08 |
JayF | you're really close :D | 16:08 |
dking | JayF: Great. | 16:08 |
JayF | https://docs.openstack.org/ironic-python-agent/latest/contributor/hardware_managers.html#custom-hardwaremanagers-and-cleaning | 16:09 |
JayF | so you should be able to: | 16:09 |
JayF | hmmmm | 16:10 |
dtantsur | Side note: I plan to move the entire disk_utils into IPA. but that won't help the current situation. | 16:11 |
JayF | yeah, we'd need a dispatch style like we use for erase_devices | 16:11 |
JayF | because we can't do the "raise IncompatibleDeviceError and have upstream handle it" when the caller is saying "do N things" | 16:12 |
JayF | so I see two paths: 1) https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L3422 make this method a dispatch_to_managers() call everywhere so you can override it OR | 16:12 |
JayF | 2) as you mentioned above, change this to call out to dispatch inside the for loop here https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L1757 | 16:13 |
JayF | I think that's the only non-patch way to override this method *without also* reimplementing it entirely/copying it out | 16:13 |
JayF | which I still think is the route I'd take in your shoes | 16:13 |
dking | Yes, any of those are good for me. | 16:13 |
dking | That's what I'm doing for now to get the current functionality, but it's a hack and not terribly maintainable. | 16:14 |
JayF | well, that method barely ever changes | 16:14 |
dking | *what I'm working on now, because of this duscussion. | 16:14 |
dking | I understand, but even so it makes a few calls locally, including the important one, which isn't even in the ipa package itself, but the ironic-lib package. And all of that just means extra work even on top of hoping that none of the dependencies or original code changes. | 16:15 |
dking | it's not bad, but it's not ideal, and I can imagine that it would be generally useful for people to be able to cleanly make exceptions to what block devices (or metadata in this case) get wiped. | 16:16 |
JayF | these are the basic motivations around why we did it for erase devices | 16:16 |
JayF | I'm sold on the value; but I also don't think it's likely to be a personal priority to implement | 16:17 |
JayF | but like with anything I'm happy to review and help where I can if someone else is going to | 16:17 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Switch to qemu-img functions from ironic-lib 6.0.0 https://review.opendev.org/c/openstack/ironic/+/912471 | 16:18 |
opendevreview | Verification of a change to openstack/ironic master failed: Special case lenovo UEFI boot setup https://review.opendev.org/c/openstack/ironic/+/908946 | 16:27 |
dking | JayF: Actually, I just re-discovered get_skip_list_from_node, which might actually work. It looks like _list_erasable_devices has a default list of block_devices which is set to list_block_devices_check_skip_list, which in turn, in the default hardware manager, calls get_skip_list_from_node. | 16:32 |
JayF | oh, score | 16:32 |
JayF | if you find a way to finagle this in | 16:32 |
JayF | please do doc it up? | 16:32 |
JayF | I'm not surprised there's a path, I went looking for it | 16:32 |
opendevreview | Dmitry Tantsur proposed openstack/ironic-python-agent master: [WIP] Import disk_{utils,partitioner} from ironic-lib https://review.opendev.org/c/openstack/ironic-python-agent/+/912476 | 16:32 |
TheJulia | yeouch CI | 16:32 |
dtantsur | rpittau: will ^^ make fixing unit tests easier or harder? | 16:32 |
TheJulia | https://www.irccloud.com/pastebin/Om7sJpM1/ | 16:33 |
JayF | TheJulia: side effect from our drain code, perhaps? | 16:33 |
dking | JayF: I am confused, though. In the abstract HardwareManager, get_skip_list_from_node raises errors.IncompatibleHardwareMethodError. However, I don't see it ever being called by dispatch, it it looks like that's only useful if other methods or implimented along the way. | 16:33 |
JayF | TheJulia: where we might fire that log message inadvertantly? | 16:33 |
* JayF is hoping | 16:33 | |
TheJulia | JayF: we don't fire it inadvertently | 16:33 |
JayF | dking: so that means it actually is likely upstream-bugged | 16:33 |
TheJulia | JayF: it gets logged when we actually fail to update the heartbeat | 16:34 |
JayF | dking: because we'd skip erasing *all* metadata if *one* disk was skipped | 16:34 |
TheJulia | in a timely mannor | 16:34 |
JayF | okay, yikes then | 16:34 |
rpittau | dtantsur: mmm I don't think it would change much, but the problem is that the metalsmith legacy job also looks broken :/ | 16:34 |
TheJulia | so in other words, when our write is not making it to disk (as in committed into the db) | 16:34 |
dking | JayF: yeah, that's what it looks like, but I haven't tested. | 16:34 |
rpittau | dtantsur: I'm rerunning CI here https://review.opendev.org/c/openstack/ironic-python-agent/+/910480 if it passes the metalsmith job I think it would be fast to merge | 16:35 |
dtantsur | rpittau: it has just failed the metalsmith job | 16:35 |
rpittau | \o/ | 16:36 |
rpittau | alright so we need to check that too :/ | 16:36 |
dtantsur | TheJulia: did you mean to address comments on https://review.opendev.org/c/openstack/ironic/+/907991 now or in a follow-up? Especially the docs update is not something I can trivially do between other tasks. | 16:42 |
dtantsur | otherwise, it has 2x +2 | 16:42 |
TheJulia | follow-up is fine | 16:42 |
TheJulia | from my point of view | 16:42 |
dtantsur | Could you do a +W then? | 16:51 |
rpittau | yep, metalsmith-integration-ipa-src-legacy in IPA is foobar | 16:55 |
rpittau | 'Failed to create config drive on disk /dev/vda' | 16:55 |
rpittau | munmap_chunk(): invalid pointer | 16:55 |
dtantsur | WUT | 16:58 |
rpittau | that's parted doing weird things | 17:00 |
TheJulia | .... is it too late to take a vacation? | 17:00 |
rpittau | lol | 17:00 |
dtantsur | I'm dreaming of a vacation since.. a while | 17:01 |
TheJulia | ditto | 17:01 |
dtantsur | (my last attempt failed miserably, which did not help at all) | 17:01 |
TheJulia | I'm actually going to start heading towards los angeles shortly | 17:01 |
TheJulia | at least, to give a talk | 17:01 |
dtantsur | KubeCon next week, maybe I can get some rest there? | 17:01 |
TheJulia | bahahahahaha | 17:01 |
rpittau | LA, being a while I was there :) | 17:01 |
TheJulia | sorry | 17:01 |
dtantsur | c'mon, I can hope! :D | 17:02 |
TheJulia | dtantsur: indeed, I highly recommend the mid-week morning nap | 17:02 |
TheJulia | where you end up sleeping in by accident all morning | 17:02 |
dtantsur | ++ | 17:02 |
dtantsur | my most important events during the conference will happen in the evening anyway :) | 17:02 |
TheJulia | rpittau: doing a talk for the local openinfra meetup | 17:02 |
TheJulia | wish me luck | 17:02 |
JayF | Nobody mailed me a random gift, perchance, did they? | 17:02 |
dtantsur | good luck TheJulia! | 17:03 |
rpittau | TheJulia: nice! have fun! :) | 17:03 |
TheJulia | JayF: I... don't... think so. | 17:03 |
JayF | I had a piece of the first cray supercomputer, mounted, show up in my mailbox the other day with a vague anonymous note | 17:03 |
TheJulia | nice | 17:03 |
JayF | and it sounded ironic-adjacent so I figured I'd ask here | 17:03 |
JayF | yes, exactly | 17:03 |
JayF | I need to find, thank, and encourage this person to keep doing it | 17:03 |
JayF | lol | 17:03 |
TheJulia | heh | 17:03 |
JayF | ooooooooooooh I bet I know who it was | 17:03 |
rpittau | my brain refuses to go on, see ya tomorrow! o/ | 17:05 |
* TheJulia needs coffee | 17:05 | |
TheJulia | coffee is like a turbo button | 17:05 |
* TheJulia just dated herself there | 17:05 | |
JayF | Here's the real question: How much faster did a turbo button being pressed make computers go? | 17:06 |
TheJulia | 2.663Mhz faster | 17:06 |
dtantsur | I was too small and never risked pressing the button :( | 17:06 |
TheJulia | just enough to brak the graphics | 17:06 |
TheJulia | break the graphics | 17:06 |
TheJulia | Granted, those were CGA graphics | 17:07 |
JayF | you fell for the trick question | 17:07 |
JayF | traditionally, *enabling* turbo mode slowed the machine to 4.77mhz | 17:07 |
JayF | disabling turbo mode made it run at full speed, breakin' your games | 17:07 |
TheJulia | TIL | 17:07 |
JayF | (unless your system builder hooked the button up backwards, which was also common since enabling turbo mode to go faster is weird) | 17:07 |
JayF | s/en/dis/ | 17:08 |
dking | JayF: The more I look at the code, the more it looks like a lot of the methods in the abstract HardwareManager class were never really intended to be abstract. They seem to depend deeply upon each other and are called with `self.` instead of being dispatched to manager. Essentially, the way they are working, it seems like they might as well just be moved to the GenericHardwareManager, but I think that it would be nice if they w | 18:03 |
dking | ere instead dispatch_to_managers. Am I missing something? | 18:03 |
JayF | Every method in a clean step | 18:04 |
JayF | is dispatched to managers | 18:04 |
JayF | so if you have clean step X -> it gets dispatched | 18:04 |
JayF | there are only a few cases we'd need a "nested" dispatch | 18:04 |
JayF | and that's when the step is plural | 18:04 |
JayF | e.g. "erase_block_devices" has to then break into dispatches of "erase_block_device" in order for us to both ahve a step that says "erase everything" and the ability to override the erase of a single thing | 18:05 |
dking | JayF: Yes, the steps. I'm talking about the list of methods at the top of the abstract HardwareManager which simply `raise errors.IncompatibleHardwareMethodError` as if they are expected to be implemented by the custom hardware manager. | 18:05 |
JayF | otherwise, the usage pattern expected is create a custom HWM with a higher priority and override the method | 18:05 |
JayF | it's an abstract class | 18:05 |
JayF | it's not supposed to implement anything? | 18:06 |
JayF | I'm confused as to what you're asking | 18:06 |
dking | For instance, list_block_devices. Let's say that you looked at the HardwareManager abstract class and saw that it simply raises an IncompatibleHardwareMethodError, so you assumed that you could just write your own in your custom hardware manager. Let's say that you first did a check to see if you wanted to handle it, and then, deciding that you don't here, you raised the IncompatibleHardwareMethodError instead, just like in the | 18:17 |
dking | abstract class, expecting that it would go on to be handled by the default hardware manager. From what I see in the code, what would happen instead is that all the placess that try to list_block_devices would themselves throw IncompatibleHardwareMethodError because it's not dispatched and nothing is there to catch it. | 18:17 |
JayF | so we call *all* possible hwm | 18:18 |
JayF | if you have Custom1, Custom2, Generic in that priority order, and we call method_only_in_generic | 18:18 |
JayF | it'll call Custom1.method_only_in_generic() -> Incompatible, so it'll call Custom2.method_only_in_generic() -> Incompatible, so it'll call Generic.method_only_in_generic() -> succeeds | 18:18 |
dking | JayF: That's how it would seem to me that it _should_ work, but I don't think that it does for these cases, because they aren't being dispatched. It looks like it's calling these on `self.`. I'm going to make a proof of concept to confirm what I suspect and see if that's helpful. | 18:26 |
opendevreview | Merged openstack/ironic master: Add inspection PXE filter service https://review.opendev.org/c/openstack/ironic/+/907991 | 18:31 |
JayF | so I think what you're saying is that | 18:31 |
JayF | often GenericHardwareManager implementation details are not dispatched | 18:31 |
JayF | and that is likely by design; anywhere we dispatch we're creating a de-facto API/plugin interface | 18:32 |
JayF | so like, e.g. def some_erase_method(): self.do_pre_checks() | 18:32 |
JayF | you can't override do_pre_checks | 18:33 |
dking | What I am saying is that methods which exist in the abstract HardwareManager (not GenericHardwareManager), but which only raise IncompatibleHardwareMethodError, like list_block_devices, seem to be implying that they could be implemented by simply adding them to your custom hardware manager, and that if they do throw IncompatibleHardwareMethodError, then they would be run again by another hwm. | 18:39 |
JayF | aha | 18:44 |
JayF | I think I found the disconnect | 18:44 |
JayF | I suspect list_block_devices may need to be gone from abstract HWM | 18:44 |
JayF | when it was removed from GHWM | 18:44 |
dking | ^ | 18:44 |
JayF | that would explain some of the confusion and why I've been confused too lol | 18:45 |
dking | But the problem is that list_block_devices and others may actually be useful to be overridable. | 18:45 |
JayF | well, it's a pressure on both sides | 18:45 |
JayF | there are two ways to customize: configuration, which we've added more and more of | 18:45 |
JayF | and plugins/hardware managers | 18:45 |
JayF | often the difficulty is ... if you override list_block_devices, you'd also likely be disabling some documented Ironic functionality provided by IPA | 18:46 |
dking | I found it because I was looking for a way to have certain block devices be skipped during cleaning. There's a lot of great methods which do checks, but none of them are simple to override. Essentially, there's no good existing hook I can find yet to override that. I know that I _could_ add them in the properties, but I would like to do it in my custom hwm because I want to use code (in this case to check lvms and volume groups | 18:47 |
dking | for certain names). | 18:47 |
JayF | Yeah, I think the place we maybe disagree, design wise, is I basically think at the point you want to use code to pick devices to skip | 18:48 |
JayF | I kinda think you should take over the whole process | 18:48 |
JayF | e.g. just overriding erase_devices_metadata (or disabling the built-in one and using a step of your own) | 18:49 |
JayF | that's the pattern I've seen others follow (including myself) in this space in the past | 18:49 |
dking | There's lots of useful stuff in IPA that I want to keep. I don't want to replace any part of that. I just simply want to hook in and say "Oh, and skip these devices, too." | 18:49 |
JayF | I think that's a feature rather than a plugin. I suspect our answer to that request (if you filed an RFE) would be that a feature to skip based on some metadata would be a good one | 19:04 |
dking | JayF: So, if the official position is that these should simply not exist as overridable methods and should just be removed from HardwareManager in favor just existing in GenericHardwareManager, that's fine. Then, I probably could just add in an overridable method which would allow for that sort of thing. Perhaps, something like `def get_additional_skip_list(self, block_devices=none)` and maybe some wrapper method, like `def get | 19:05 |
dking | _device_skip_list(self, block_devices=none, just_raid=False)` which combines that (by dispatch_to_managers) and get_skip_list_from_node, and then call that every place that get_skip_list_from_node is called. Does that sound reasonable? | 19:05 |
JayF | and in fact, we could probably wire root_device_hints into skip_device_hints or similar (I'm surprised we haven't already done this) | 19:05 |
JayF | dking: s/official position/Jay's opinion/ | 19:05 |
JayF | that sounds sensible, hard to follow entirely without seeing the code | 19:05 |
JayF | but I also think just like ... hooking up a feature that does what you want, or gives you the tools to do what you want | 19:06 |
JayF | instead of giving a plugin hook | 19:06 |
JayF | might be a viable path, too | 19:06 |
dking | Well, device hints are great and we should use them, but I'd like something that can be run from hwm code, at least for my use case. | 19:06 |
JayF | but usually with stuff like this, whoever pushes the patch gets large amounts of input into how it's done | 19:06 |
dking | If it's practical, I'm thinking that I'd write the patch. I wouldn't want to make anybody else write it. But I do think that it can be useful for more people than just me. | 19:07 |
JayF | so I'd file an RFE about it anyway, lay out the approach you wanna take in there, and put it on the meeting agenda for rfe approval | 19:09 |
JayF | that'll get it before the eyes of enough people to say "yep" or "nope" at a high level before you get too much code written | 19:09 |
dking | Both the code for root_device_hints and get_skip_list_from_node use the same il_utils.find_devices_by_hints, so if you want device hints from the node, I think that's already there. I'm just wanting to discover the devices at runtime. | 19:09 |
JayF | yeah and if you're doing "LVM things named X get skipped" I doubt we'd wanna implement that | 19:10 |
JayF | because it breaks our API contract a little bit (it lets something *on the instance* decide if it gets deleted or not, which is not good for our security model) | 19:10 |
JayF | but if we add a hook in that spot | 19:10 |
JayF | and you write code that does it :D | 19:10 |
dking | Exactly. That won't be something that would be generic. But giving me a hwm hook to allow people to implement their own code is what I think is useful. | 19:11 |
JayF | honestly just adding a single | 19:11 |
JayF | dispatch_to_manaagers(custom_device_skip) | 19:11 |
JayF | in the appropriate place, with a true/false result | 19:11 |
JayF | and a device object passed thru | 19:11 |
dking | that way, it could be just about any silly thing. I'm sure people would want to skip cleaning for all kinds of things. Maybe they have a backup on a disk and want to leave the disks with that, or disks with certain volume groups, and so forth. | 19:11 |
JayF | might be a simple-to-implement piece | 19:11 |
JayF | This is a place where I like for it to be possible to do what you need | 19:12 |
JayF | but we gotta make sure the defaults are not a weapon pointed at anyones' foot :D | 19:12 |
JayF | which is why I'm giving cagey answers instead of solid answers w/o seeing the code lol | 19:12 |
dking | That is simple, but there are several places where the HardwareManager uses a skip list (in particular, the one from hints from the node), so that it might be useful to write a wrapper which includes both, to make the code cleaner and more DRY. | 19:13 |
JayF | I think DRY in this case is the enemy of extensibility. Not saying don't think about it; but it's less of a primary concern than being explicit | 19:13 |
JayF | I'd rather someone who is a 2/10 on python, mainly a sysadmin, be able to understand what's going on than I do someone who is a 10/10 on python being impressed with our python ;) | 19:14 |
JayF | (in context of hardware managers) | 19:14 |
dking | Definitely. However, I think that if somebody is going to roll their own "get_additional_skip_list" and populate its return with devices they want wiped, they've pretty intentionally at least tried to hammer their own foot. | 19:15 |
dking | In this case, the DRY I was speaking about wouldn't just be for readability, but to prevent somebody from accidentally forgetting an important step. | 19:16 |
dking | And the names would all be rather obvious, as well as how it is used. | 19:16 |
JayF | I'm interested to see what you come up with | 19:17 |
dking | Thanks. And thank you for walking down that trail with me. | 19:18 |
opendevreview | Jay Faulkner proposed openstack/ironic-python-agent master: Workaround unit test failures with extra mocking https://review.opendev.org/c/openstack/ironic-python-agent/+/912490 | 19:54 |
JayF | TheJulia: ^ fixes for IPA, I went the short way around, may try to delegate the fixing of the overall unit tests | 19:55 |
JayF | r/n I want to go look and see how this all landed in ironic-lib while breaking IPA unit tests | 19:55 |
JayF | we don't cross gate anything on ironic-lib, great | 19:56 |
JayF | I guess that makes sense given changes have to happen there first | 19:56 |
jamesdenton__ | QQ: Does Ironic support neutron's trunk port functionality? Especially interested in how this would look with ML2/NGS | 20:05 |
*** jamesdenton__ is now known as jamesdenton | 20:05 | |
JayF | I know we use it downstream, with some minimal patches | 20:06 |
jamesdenton | patches to ironic or to ngs? | 20:06 |
JayF | I *think* the patches are for our shenanigans, and that we support that generally upstream | 20:06 |
JayF | ngs | 20:06 |
jamesdenton | ok, that's managable | 20:06 |
JayF | I think we might have added some downstream filtering stuff (e.g. port X limited to VLAN Y/Z) | 20:06 |
jamesdenton | is this actual vlan pruning on a server switchport or more related to switch uplinks? | 20:09 |
JayF | actual knob-turning on switches | 20:10 |
JayF | our patch is basically a downstream thing that lets NGS go "no" if someone says "plug port X into VLAN A" where VLAN a is not allowed for port x | 20:10 |
opendevreview | Jay Faulkner proposed openstack/ironic stable/zed: stable only/ci: pin CI to dnsmasq 2.85/pin proliantutils/scciclient https://review.opendev.org/c/openstack/ironic/+/911158 | 20:12 |
JayF | TheJulia: ^ and that should be the fix that needs, I hope. Appears we were setting the microversion wrong | 20:13 |
opendevreview | Verification of a change to openstack/ironic master failed: Special case lenovo UEFI boot setup https://review.opendev.org/c/openstack/ironic/+/908946 | 20:26 |
jamesdenton | JayF that patch sounds nifty | 20:28 |
JayF | I think there's an upstream version of it somewhere | 20:29 |
JayF | https://review.opendev.org/c/openstack/networking-generic-switch/+/888047 | 20:29 |
JayF | that may actually be mergable | 20:30 |
jamesdenton | oh ok, it's a config setting... the allowlist | 20:30 |
jamesdenton | makes sense | 20:30 |
JayF | TheJulia: rpittau: Apparently everyone is workign on the same thing? https://review.opendev.org/c/openstack/ironic-python-agent/+/910480 + https://review.opendev.org/c/openstack/ironic-python-agent/+/912490 | 20:40 |
JayF | file that under an hour-ish wasted | 20:40 |
TheJulia | why am I sitting outside, sipping a cold drink... in LA... when it is windy on a cool day | 21:20 |
TheJulia | why?! | 21:20 |
TheJulia | we really need to get jamesdenton an irccloud account :) | 21:20 |
TheJulia | that, or glue his shoes to irc :) | 21:20 |
TheJulia | JayF: interesting, ack, thanks on the microversion | 21:22 |
opendevreview | Merged openstack/ironic master: Switch to qemu-img functions from ironic-lib 6.0.0 https://review.opendev.org/c/openstack/ironic/+/912471 | 21:27 |
opendevreview | Merged openstack/ironic master: [trivial] add device_type param to attach_vmedia_device https://review.opendev.org/c/openstack/ironic/+/912457 | 22:07 |
opendevreview | Steve Baker proposed openstack/sushy-tools master: Add virtual-media-boot to openstack driver https://review.opendev.org/c/openstack/sushy-tools/+/906768 | 22:37 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!