rpittau | good morning ironic! o/ | 07:13 |
dtantsur | JayF: at the very least, owner==null will remain the only possible state for standalone users | 09:16 |
dtantsur | Also, while a cluster-scoped nova instance makes little sense, a cluster-scoped baremetal is totally fine | 09:16 |
dtantsur | Is it going to solve any actual problems that users have? | 09:17 |
rpittau | final releases for antelope are up https://review.opendev.org/c/openstack/releases/+/932593 | 09:19 |
TheJulia | JayF: I'd kind of agree with Dmitry that deprecating node.owner=null is the wrong move for the project. I *do* think, at least in devstack, we should just let a normal project-scoped admin enroll nodes. | 10:14 |
TheJulia | JayF: In regard to service scoped users and such, I had a discussion with Vexxhost regarding the same issue/challenge, and truthfully think we should have had the option default to true instead of false. Once they swapped the option their deployment worked as expected. | 10:15 |
iurygregory | good morning ironic | 11:38 |
iurygregory | I just saw the email on openstack-discuss [devstack][nova][cinder][ironic][glance][swift][neutron][all] Deprecating/removing non-uWSGI deployment mechanisms? Since we are tagged, maybe we should try to provide some thoughts? | 11:40 |
Sandzwerg[m] | <TheJulia> "JayF: I'd kind of agree with..." <- For us nodes, so far, are not project scoped. We do this with filters. Nodes have a flavor, and you get quota for that flavor. So if we have multiple free nodes with the same spec you might end up on a different one every time. We might switch eventually. I haven't looked into this yet | 11:56 |
opendevreview | Mahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency https://review.opendev.org/c/openstack/sushy-tools/+/932496 | 11:56 |
dtantsur | iurygregory: I think the Ironic bits are represented correctly there: we care about standalone executables, so we'll need to find something that is not eventlet (fortunately, there is no shortage of HTTP servers for Python) | 12:03 |
opendevreview | Mahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency https://review.opendev.org/c/openstack/sushy-tools/+/932496 | 12:03 |
rpittau | iurygregory: I don't think there's a lot to say on our side | 12:05 |
rpittau | we have 2 patches from stephenfin to move in that direction that are already approved, but feel free to reply if you have any thought on that :) | 12:05 |
stephenfin | rpittau: iurygregory: Yup, I saw dtantsur's reply to tkajinam and assumed you were testing standalone mode in CI, but later investigation revealed the config opts in your devstack plugin were unused, so it was more straightforward than I thought (thankfully) :) | 12:07 |
dtantsur | stephenfin: it's tested in non-devstack jobs :) | 12:07 |
stephenfin | Good to hear. In any case, a non-issue now from my perspective :) | 12:08 |
rpittau | stephenfin: thanks for bringing that up btw :) | 12:11 |
dtantsur | stephenfin: mmm, just realized: ironic-inspector might be non-wsgi | 12:11 |
dtantsur | even in devstack | 12:11 |
stephenfin | I thought you'd deprecated that? Or am I thinking of a different recently-retired tool? | 12:12 |
dtantsur | stephenfin: it's deprecated but will stick around for some more time. | 12:12 |
stephenfin | gotcha. I guess some work is needed to toggle IRONIC_INSPECTOR_STANDALONE to false in (most) CI jobs, like neutron is doing. Or assume eventlet-in-openstack will outlive ironic-inspector | 12:14 |
dtantsur | yeah, we need to try it | 12:15 |
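(For reference: flipping that in a devstack-based job would presumably just be a local.conf toggle along these lines. A sketch only; the plugin URL and the option's default are assumptions here, the variable name is the one mentioned above.)

```
[[local|localrc]]
enable_plugin ironic-inspector https://opendev.org/openstack/ironic-inspector
# Run ironic-inspector behind uWSGI/Apache rather than its standalone
# eventlet-based server.
IRONIC_INSPECTOR_STANDALONE=False
```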
opendevreview | Mahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency https://review.opendev.org/c/openstack/sushy-tools/+/932496 | 12:21 |
*** iurygregory_ is now known as iurygregory | 12:57 | |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Replace image_format_inspector with its oslo.utils version https://review.opendev.org/c/openstack/ironic/+/929904 | 13:12 |
opendevreview | Merged openstack/ironic master: devstack: Remove IRONIC_USE_MOD_WSGI https://review.opendev.org/c/openstack/ironic/+/932501 | 13:28 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Redfish power: account for disable_power_off https://review.opendev.org/c/openstack/ironic/+/932610 | 13:29 |
JayF | TheJulia: dtantsur: makes a lot of sense why we wouldn't do that. I knew there had to be a good reason 😄 | 13:39 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: Redfish power: account for disable_power_off https://review.opendev.org/c/openstack/ironic/+/932610 | 13:58 |
opendevreview | Merged openstack/ironic master: devstack: Remove IRONIC_USE_WSGI https://review.opendev.org/c/openstack/ironic/+/932502 | 14:07 |
TheJulia | o/ It would be super awesome to get some eyes on https://review.opendev.org/c/openstack/ironic/+/930655 since 4k hardware is not going to go away. :) | 14:27 |
rpittau | double approval :D | 14:35 |
rpittau | didn't see JayF already approved it | 14:35 |
rpittau | cardoe: hey o/ I was wondering if you plan to add this https://opendev.org/openstack/sushy/commit/8928f45402f26e5adbe9d885c45d827a38db442c to other ironic repos | 14:40 |
TheJulia | much appreciated, thanks! | 15:02 |
opendevreview | Mahnoor Asghar proposed openstack/sushy-tools master: Reject node power off requests to align with ironic supporting NCSI https://review.opendev.org/c/openstack/sushy-tools/+/932623 | 15:02 |
opendevreview | Dmitry Tantsur proposed openstack/ironic master: IPMI power: account for disable_power_off https://review.opendev.org/c/openstack/ironic/+/932624 | 15:12 |
cardoe | rpittau: yes. I think I waited for the PTG cause I was gonna wait on all of those changes for that. But if that's not controversial, I'll push those out today. | 15:16 |
opendevreview | Merged openstack/ironic master: CI: Add a 4k disk CI job https://review.opendev.org/c/openstack/ironic/+/930655 | 15:37 |
rpittau | cardoe: my main issue with that is the pbr version: for consistency we should update the version in requirements.txt too, and we need to justify the min version required | 15:48 |
rpittau | my bad for not having underlined that for sushy | 15:48 |
cardoe | oh | 15:51 |
cardoe | You need pbr 6.0.0 for pyproject.toml support | 15:51 |
cardoe | Honestly once you upgrade everything to modern PEP stuff, I don't see what PBR is providing. | 15:52 |
cardoe | Cause if you look at the code paths it's pretty much just re-exporting or executing setuptools pieces. | 15:54 |
rpittau | cardoe: mmm I'm not sure about that, but I may be wrong, do you have docs supporting that statement? I see support for pep517 was added to pbr quite some time ago | 15:54 |
rpittau | cardoe: https://github.com/openstack/pbr/commit/09ee15341014fc0e3bb8a7c3b06a3fa912cfad38 | 15:54 |
cardoe | So that's what clarkb told us to do. | 15:54 |
cardoe | Or was it sean? | 15:55 |
cardoe | JayF: you recall^ | 15:55 |
JayF | rpittau: cardoe is right; pep517 basically means setuptools knows how to load pbr to run the install | 15:56 |
JayF | rpittau: theoretically many of the features we use are provided by setuptools now, but they are shaped differently and I am personally very -1 to us being different than the rest of openstack in that regard | 15:56 |
rpittau | JayF: ok, and why do we need pbr 6 or higher? | 15:56 |
clarkb | cardoe: PBR provides versions management and a couple of other minor things | 15:56 |
clarkb | you probably can replace pbr with other tools at this point but I'm not aware of anyone doing so for an openstack project | 15:57 |
JayF | rpittau: that piece I'm not sure of, maybe clarkb would know | 15:57 |
rpittau | that's my only doubt :) | 15:57 |
cardoe | https://review.opendev.org/c/openstack/nova/+/899753 is where we got the start from | 15:57 |
cardoe | You need pbr 6.0.0 to work correctly with devstack and GLOBAL_VENV | 15:57 |
clarkb | I think there have been a couple of bugfixes to pep 517 in pbr since the original 5.7.0 release with support | 15:58 |
cardoe | clarkb: ah true the git / semver management thing is still provided by pbr. Can't believe that's not in the stdlib yet. | 15:58 |
clarkb | off the top of my head, wsgi script generation (could be replaced by a simple file install instead) and authors file generation are features that are useful too | 15:59 |
clarkb | basically there are several useful features that you probably do want to find a port for and no one has taken the time to do that work | 15:59 |
rpittau | ok perfect, so we really do need to update the min pbr version in addition to introducing pyproject.toml? | 15:59 |
clarkb | but I suspect it is doable to port | 15:59 |
clarkb | rpittau: as a general rule outside of pyproject.toml use we expect you always use the latest version of pbr | 16:00 |
clarkb | this has to do with the way setup requires and easy install work. They will blindly install the newest version and not let you control that. So the expectation is the latest version always works. This is also why we can't remove python2 or older python3 support from pbr | 16:00 |
rpittau | clarkb: and we do, at least in CI since we have the upper constraints that enforce that, but I guess we need to reflect that in requirements.txt | 16:00 |
clarkb | pyproject.toml changes this because you can now control those versions | 16:01 |
clarkb | to start you probably want to use the latest version of pbr | 16:01 |
clarkb | rpittau: no that is wrong | 16:01 |
clarkb | upper constraints does not control the pbr version | 16:01 |
rpittau | clarkb: so why is there a pbr version in upper-constraints? | 16:01 |
rpittau | https://github.com/openstack/requirements/blob/master/upper-constraints.txt#L47 | 16:01 |
clarkb | rpittau: because some projects use pbr at runtime and you can control that version, but you can't control the one used to install the packages generally | 16:01 |
clarkb | personally I don't want any of those packages in upper constraints because it constantly leads to this confusion | 16:02 |
rpittau | alright so I think we're on the same path | 16:02 |
rpittau | thanks clarkb | 16:02 |
clarkb | the only way to control the pbr version for installation is to use pyproject.toml or have a preinstall step that explicitly installs the version of pbr that you want in all the places that will use pbr | 16:03 |
clarkb | but pip install foo --upper-constraints bar.txt won't do that | 16:03 |
rpittau | yep, that's clear, thanks | 16:04 |
rpittau | cardoe: I think we're good to move forward with pyproject.toml for the other ironic projects if you already have the patches, but we should probably fix requirements.txt and setup.py in sushy first :) | 16:06 |
JayF | if it fixes something w/r/t GLOBAL_VENV, I'd love to see that change in ironic soon | 16:07 |
JayF | since we're working on a fix for that with g mann right now | 16:07 |
JayF | (I'm curious what that version update fixes, but taken at face value it seems like a good idea) | 16:07 |
rpittau | JayF: yeah, we really need to have consistency between the ironic repos in general | 16:07 |
opendevreview | Doug Goldstein proposed openstack/sushy master: bump pbr to match what pyproject.toml requests https://review.opendev.org/c/openstack/sushy/+/932638 | 16:09 |
cardoe | There it is. bot was finishing lunch | 16:10 |
cardoe | rpittau: if I take that commit and the prior commit squashed together and submit that against more repos, that'd be good? | 16:10 |
JayF | rpittau with the fastest -1 in the west | 16:10 |
JayF | wa-pow! | 16:11 |
rpittau | lol | 16:11 |
cardoe | my goodness there's a lot of duplication here | 16:11 |
rpittau | yeah | 16:11 |
cardoe | Can we not get rid of setup.py now? | 16:11 |
clarkb | no pbr + pep517 uses setuptools | 16:11 |
clarkb | just like using pep517 with setuptools | 16:12 |
rpittau | heh this ^ | 16:12 |
clarkb | however you may be able to delete that line in setup.py | 16:12 |
* clarkb look at the docs | 16:12 | |
clarkb | oh ya maybe you can delete setup.py but you can't delete setuptools | 16:13 |
clarkb | worth testing | 16:13 |
JayF | Yeah, I was looking at that, fairly sure we can just kill setup.py | 16:13 |
clarkb | (the docs imply this and I'm pretty sure I tested things like that when I wrote that change) | 16:13 |
rpittau | if that can be done in one shot, let's ! | 16:13 |
cardoe | You're triggering old brain cells here. | 16:13 |
JayF | it's roughly equivalent to the boilerplate that runs with pep517+pyproject.toml | 16:13 |
cardoe | The only setuptools based project I still have hasn't had a setup.py for a while. | 16:13 |
clarkb | ya I think the pbr=True argument is supplanted by build-backend = "pbr.build" | 16:14 |
clarkb | so you don't need the setup.py at all unless you're doing some other magic (and sushy isn't thankfully) | 16:14 |
cardoe | well pep517+pyproject.toml = setuptools.setup(setup_requires=<pyproject.toml>.build-system.requires) | 16:14 |
cardoe | It looks like pbr.build as the entry point just calls setup() with setup(pbr=True) | 16:15 |
cardoe | yeah what clarkb said. :-D | 16:15 |
clarkb | yup | 16:15 |
JayF | nice --> it's roughly equivalent to the boilerplate that runs with pep517+pyproject.toml <-- | 16:15 |
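(For anyone reading along, the pattern under discussion is roughly the following. This is a sketch rather than the exact sushy patch; the pbr pin is the one mentioned above, while the setuptools entry and its pin are illustrative.)

```toml
# pyproject.toml -- lets the build-time pbr version be controlled,
# which setup_requires/easy_install never allowed.
[build-system]
requires = [
    "pbr>=6.0.0",      # per the discussion: needed for pyproject.toml support
    "setuptools>=64",  # illustrative pin
]
build-backend = "pbr.build"  # takes over from setup.py's setup(pbr=True)
```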
cardoe | I've been using poetry for years now and I'm actually switching over to uv. | 16:15 |
JayF | the pyproject.toml stuff has been a boon for gentoo | 16:15 |
JayF | poetry is my absolutely least favorite of the alternative python installer things | 16:15 |
cardoe | So I don't like it for installation. I like it for the lock file management. | 16:16 |
JayF | the defaulting to version locking leads folks down a path that will either make their software bitrot quickly or require constant maintenance to bump versions | 16:16 |
JayF | lol | 16:16 |
JayF | the thing I hate is the thing you like | 16:16 |
clarkb | you may also want to make sure zigo is aware | 16:16 |
cardoe | Well I follow the Rust guidelines on lock files. | 16:17 |
clarkb | if the debian packaging is still relying on setup.py this will force an update to something else (which is probably a solved problem in debian generally but maybe not in the debian packaging for openstack) | 16:17 |
cardoe | lock files for libraries are the minimum tested versions; lock files for binaries are the CI-tested versions. | 16:17 |
cardoe | You're free to install stuff without using the lock files (and I do). | 16:17 |
cardoe | rpittau: so which would ya prefer? version bump in setup.py or get wild and delete it? I'm leaning towards the former with a plan to remove it later? | 16:21 |
rpittau | cardoe: me too, I don't think removing setup.py is trivial, so let's do one step at a time | 16:21 |
rpittau | good night! o/ | 16:42 |
opendevreview | Doug Goldstein proposed openstack/sushy master: bump pbr to match what pyproject.toml requests https://review.opendev.org/c/openstack/sushy/+/932638 | 17:07 |
opendevreview | Ghanshyam proposed openstack/ironic master: Enable GLOBAL_VENV in ironic grenade jobs https://review.opendev.org/c/openstack/ironic/+/932016 | 17:07 |
JayF | cardoe: https://review.opendev.org/c/openstack/sushy/+/932638/2/setup.py#16 | 17:32 |
JayF | did I misunderstand? | 17:32 |
cardoe | Yeah, we just said we'd take the process in a bit smaller steps. | 17:49 |
cardoe | We'll patch it all and then remove setup.py afterwards. | 17:49 |
opendevreview | cid proposed openstack/ironic master: [WIP] Add inspection rules https://review.opendev.org/c/openstack/ironic/+/918303 | 18:31 |
rbudden | Does anyone have a handy solution for setting up IPMI/Redfish credentials during auto-discovery w/Ironic Inspector? From what I can see online, older versions of Ironic Discoverd supported ‘enable_setting_ipmi_credentials = true’ and allowed Ironic to use ipmitool to set credentials. That feature appears to have been removed (https://bugs.launchpad.net/ironic-inspector/+bug/1654318). | 19:55 |
rbudden | I’m curious how ppl are bulk adding nodes these days, if there’s a special hook or plugin that could handle this (or I could write) | 19:56 |
JayF | This is likely not helpful to you (sorry) but the answer for most places I've worked has been some automation-glue between some external CMDB and Ironic | 19:57 |
JayF | If I had a set of machines that came set to DHCP their BMC + common passwords between them, I'd probably write a simple script that changed the creds and added them to Ironic. | 19:58 |
rbudden | Yeah, I was afraid of that. I’m having two racks of 128 computes dropped in place shortly | 19:58 |
JayF | do you get any kind of sheet from your delivery? | 19:58 |
rbudden | and they need to be reconfigured, but also I’m assuming that I need to validate as part of the cleaning process that a user hasn’t changed anything with IPMI | 19:58 |
JayF | that you could use to iterate through to add to ironic | 19:58 |
JayF | Ironic will know if the creds are wrong/have been messed with by erroring | 19:59 |
JayF | might just wanna handle the "happy case" then iterate through whatever failures you might get | 19:59 |
JayF | cardoe: ^ you might have some insight here | 19:59 |
rbudden | well, you can have more than one account to IPMI… so that leaves me leery from a security perspective | 20:00 |
JayF | if you're using IPMI and are leery from a security perspective | 20:00 |
JayF | you are probably right :D | 20:00 |
JayF | regardless of if there's a bonus user or not lol | 20:00 |
rbudden | haha, yeah, sorry, using that loosely but yes… plan is Redfish ATM | 20:01 |
rbudden | but thinking outside the box | 20:01 |
rbudden | ATM these are recycled nodes, so luckily I have the old IPs, user/pass, and in the future I’m hoping the delivery will have all the details | 20:01 |
JayF | one thing to consider outside the box is that many of those BMCs will work with external auth | 20:01 |
JayF | my downstream ties it into ldap so they can rotate passwords external to ironic and just make the API call to update them in node.driver_info | 20:01 |
JayF | "ldap" I'll note I don't know the actual backend there :) | 20:02 |
rbudden | Yeah, the goal here was to move away from IPMI to Redfish and HTTPS logins. We’ve had vendors in the past (no names given) provide less than ideal BMC details. I was originally hoping something similar to an IPA boot could scrape LLDP info and build out address/credentials from data from the switch ports (since we rely on LLDP for Auto Discovery to name nodes) | 20:04 |
JayF | I'm not the biggest expert on inspection/discovery so it's EXTREMELY possible there's an option here I'm not wise to | 20:04 |
JayF | I suggest hanging out and seeing if someone else can point you in a better direction | 20:04 |
rbudden | Sure thing | 20:05 |
cardoe | So how far down the rabbit hole ya wanna go? | 20:12 |
rbudden | :) | 20:13 |
cardoe | At the end of the day there's a secret somewhere. | 20:13 |
rbudden | Sure sure | 20:13 |
cardoe | It's what I argue with the security folks all the time. Stop trying to check boxes and create real attack trees. | 20:14 |
JayF | cardoe: I was making the case to someone the other day that one of the major values of threat modeling is actually being able to use those threat models to tell stories to the business. | 20:14 |
rbudden | I’m happy to entertain ideas. Perhaps I’ll preface it with the goals. Automate hardware discovery and configuration of BMCs. | 20:15 |
rbudden | In the previous Beowulf style cluster the team used xCAT for imaging and used (I think) a Genesis tool for bootstrapping iDRACs, etc. and BMC setup | 20:15 |
cardoe | So that's the same goals we've got and what we're working with. | 20:16 |
rbudden | So I’m trying to find some equivalent that if we have Vendor A drop 10 racks of compute and they screw up the BMCs we don’t spend a ton of time manually doing things | 20:16 |
cardoe | But like I see people that use the default self-signed certificate with IPMI and stress about password complexity. | 20:16 |
cardoe | So I won't say I've got this all working today (cause I don't) | 20:17 |
rbudden | On a side note, security (NASA) would feel better if we move towards something better than IPMI locked down to a VLAN, so we’re attempting to move to RedFish and have the ability to be a bit more secure there. | 20:17 |
cardoe | Any of those folks at MSFC? | 20:17 |
rbudden | A few ppl there will likely use this system. I’m GSFC | 20:17 |
rbudden | We’ll be hosting some DGXs for inference work for a small team at Marshall | 20:18 |
cardoe | Just a curiosity aside. I can see MSFC from my window. | 20:18 |
rbudden | (roadmap plans) | 20:18 |
rbudden | Haha nice | 20:18 |
cardoe | So I _think_ end goal from a secure stand point is BMCs are segregated on VLAN 1 with a DHCP server. It sees new machines (or old machines that have been factory reset) and starts an initial process on them. | 20:20 |
cardoe | Our stuff is workflow driven (Argo Workflows to be exact). | 20:21 |
cardoe | It logs into the BMC with the factory default creds (we've selected to use the "insecure" one password to rule them all instead of a sticker on the chassis) and grabs the asset tag off the machine. | 20:22 |
cardoe | We then look that up and decide if it's a new machine or old machine. If its an old machine then it should already have an IP and other things assigned. If its new we provision that in the DCIM/IPAM system. | 20:23 |
cardoe | Set the password to something more real and update the IP info for the BMC and kick the power button. | 20:24 |
rbudden | Ok, so this all happens as a prep step before Ironic Inspection (or scripted node creates) happen | 20:25 |
cardoe | Oh sorry. We also drag the machine up to our minimum supported BMC version before allowing it on the real network. | 20:25 |
cardoe | Not using Ironic Inspection cause it just wasn't flexible enough. The plan is for us to work these flows into Ironic using the now integrated inspection. | 20:26 |
rbudden | Makes sense. If nothing existed in Ironic that I could use, the thought was yet another isolated VLAN with a small DHCP pool and serving up an image that ran and scraped LLDP info to pull node name from switch, DNS lookup, set IP on BMC, etc. | 20:27 |
cardoe | Once it's on the real network we're giving it OpenID Connect authentication and nuking the password based authentication. | 20:27 |
rbudden | or something like that | 20:27 |
rbudden | Nice | 20:27 |
cardoe | The device flow has more privileges while the authorization flow (I think that's what it's called) really only has viewer permissions to be able to pull up the visual console. | 20:27 |
rbudden | OIDC would be interesting… since I could natively integrate with the agency’s identity service | 20:28 |
cardoe | There's a PTG session around serial / graphical console that we hope to tie in. | 20:28 |
rbudden | We do OIDC integration in Horizon | 20:28 |
cardoe | Same here. | 20:28 |
cardoe | So we use dex today as a proxy. | 20:28 |
cardoe | So like my machines at home take my personal GitHub authentication to get to the BMC. | 20:29 |
rbudden | Are you using RedFish for BMC? or other vendor proprietary? | 20:29 |
cardoe | All Redfish. | 20:29 |
rbudden | Cool | 20:29 |
cardoe | Well Sponge Bob meme-ified Redfish | 20:29 |
cardoe | It's all Dell hardware. | 20:29 |
cardoe | Which the nicest thing I can say about it is that it is indeed 100% compliant with the Redfish DMTF validation suite. Assuming you use their fork of the Redfish DMTF validation suite. | 20:30 |
cardoe | Right now the push is to get everything we're touching in Redfish into Sushy (or sushy-oem-idrac) for everyone else to benefit. | 20:33 |
rbudden | Checking the Sushy website now... | 20:34 |
cardoe | Then enhance the apply_configuration step so that everything we touch once the BMC is on the good network can be done via Ironic. | 20:34 |
JayF | sushy is just the library we use (and maintain) for redfish access in Ironic | 20:34 |
rbudden | I’ll have some reading to do. Redfish is new to us, so right now it’s just the barebones getting it working with Ironic in our TDS | 20:35 |
cardoe | https://opendev.org/openstack/sushy | 20:35 |
JayF | right now it only supports things officially in redfish, even though many vendors, as alluded to by cardoe, have proprietary extensions | 20:35 |
JayF | we're looking at making sushy/redfish driver more flexible in face of those proprietary differences | 20:35 |
rbudden | Gotta love the proprietary extensions! | 20:35 |
rbudden | We’re enjoying that fun in our road down SONIC | 20:35 |
JayF | Doesn't matter if I love 'em or not, they exist so I gotta make them work :| | 20:35 |
JayF | although mine is not as dell flavored :D | 20:36 |
cardoe | Yeah JayF's spot on. There's a few recent patches to Ironic that are querying the board type and, if it's vendor X, make call Y instead. | 20:36 |
rbudden | haha, sorry I should have added /sarcasm, but yeah, it’s life | 20:36 |
cardoe | So what I think we (speaking in the ironic project sense) would like is a way that it's understood this ugliness might come up and how to not litter the code with vendor nonsense. | 20:37 |
cardoe | But have a clean way to load a vendor override for that operation. | 20:37 |
rbudden | makes sense | 20:38 |
cardoe | IMHO having the "idrac" driver in Ironic is less than ideal nowadays. | 20:39 |
cardoe | In 2024.2 the wsman based backend was removed and it became just a subclass of redfish. | 20:39 |
JayF | rbudden: Just so the space geek in me knows what NASA corner is running Ironic; can you tell me what this cluster is for? | 20:41 |
JayF | and how much would it cost to get the openstack source code on the next voyager, I think that'd be the best foot forward for our andromedean friends to find in a few dozen centuries :D /s | 20:42 |
rbudden | So we’re redesigning the way we approach HPC altogether at Goddard. This is the ‘Next Gen System’ that will replace Discover | 20:43 |
rbudden | Discover is our current HPC system with approx 180k cores and 100PB disk | 20:43 |
rbudden | This new system we are building from the ground up to be cloud-native, GitOps/DevSecOps driven, etc. A full modernization of the way we do HPC at NASA | 20:44 |
JayF | hell yeah! | 20:44 |
JayF | You know what might be another group you might have some commonalities in | 20:44 |
JayF | CERN is a longtime heavy Ironic user and community member | 20:44 |
JayF | I'd bet you have similar shaped problems | 20:44 |
JayF | I think kubajj might be the only CERN-y person in IRC these days (I am remembering right that you're at CERN kubajj, right?) | 20:45 |
rbudden | Right now we’re doing a proof of concept (we have a small TDS already running Kayobe) that will be around 10k cores, 1PB NVMe split between two filesystems. | 20:45 |
kubajj | @JayF: yes, indeed. | 20:46 |
rbudden | We have a few contacts at CERN I believe, but yeah, we’ve followed their work | 20:47 |
rbudden | So yeah, it’s exciting stuff, we’re very much hoping this lays the groundwork for an awesome next generation. | 20:48 |
rbudden | I’ll probably be around more asking questions, as I’ve been out of the Ironic loop for a little bit but very much in OpenStack for a while. | 20:48 |
JayF | The more folks running Ironic, the better for all Ironic users/contributors; so happy to help in any way I can. | 20:49 |
rbudden | Appreciate that. | 20:49 |
JayF | I'm in here most USA working hours so just let me know | 20:49 |
rbudden | I enjoyed chatting about the Networking-Generic-Switch code I’m working on the other day. | 20:49 |
JayF | ask things about cleaning, the agent, the API, just something other than inspection next time so I can feel more helpful ;) | 20:50 |
rbudden | Haha | 20:50 |
JayF | NGS, you're hitting my knowledge gaps throughout :D | 20:50 |
JayF | lol | 20:50 |
rbudden | Well I have it all working now thanks to you pointing me in the right direction. | 20:50 |
JayF | good stuff, I gotta go walk my doggo now; have a good one o/ | 20:50 |
rbudden | take it easy | 20:51 |
rbudden | gotta jet here as well | 20:51 |
rbudden | @cardoe thanks as well! | 20:51 |
cardoe | yeah happy to help any way I can. Cause like JayF said, I think a couple of us are going in the same direction, and if we use/contribute to Ironic it'll only be better | 20:51 |
rbudden | Agreed | 20:52 |
rbudden | StackHPC is doing a ton of similar work as well | 20:53 |
rbudden | I need to get my Neutron Trunk code for NGS fixed up for our SONIC switches, then merge with the work they’ve been doing and get it all upstreamed so it’s in the mainline code. | 20:53 |
rbudden | Alright, gotta run for real. Baseball coaching up next. Take it easy | 20:54 |
kubajj | Ah, just managed to go through the last hour or so of messages 😀 | 20:59 |
kubajj | The Ironic at NASA project sounds really cool | 21:01 |
JayF | Reminder to register for the PTG if you haven't already: https://ptg2024.openinfra.dev | 21:22 |
opendevreview | Ghanshyam proposed openstack/ironic master: Enable GLOBAL_VENV in ironic grenade jobs https://review.opendev.org/c/openstack/ironic/+/932016 | 21:48 |
JayF | gmann: that vmbcd change is brilliant, and explains a failure on my more complex change, ty | 21:51 |
shermanm | hey! I (uchicago/chameleon) actually caught up with some of the above discussions at openinfra this week. we're also in the process of rolling out redfish generally, virtual media boot at one site, and probably borrowing some of those NGS patches | 22:00 |
JayF | awesome | 22:00 |
shermanm | trying to be a bit more involved on here than previously | 22:00 |
JayF | we should probably see how to turn them from 'patches' into 'merges' if many people need them | 22:01 |
shermanm | we've got a growing need for vlan trunks on baremetal at least | 22:04 |
shermanm | on the "node auto enrollment" topic, rackspace had a talk which included how they're driving inspection (out of band + in-band) with ansible for that purpose | 22:06 |
JayF | Yeah, that requires work in Neutron before we can fully support it in NGS aiui | 22:06 |
JayF | yeah, most folks have some kinda local automation glue to do that integration piece | 22:06 |
shermanm | as an aside, I'm pretty sure I submitted a proposed change when I meant to just show some example code. I wanted to ask what the "right way" to include a patch with a launchpad bug is? | 22:11 |
clarkb | you can attach a diff/patch file to launchpad bugs. But pushing code to gerrit and linking to it is fine too (you can mark the change in gerrit as work in progress if you want people to avoid reviewing it as mergeable as is) | 22:14 |
shermanm | gotcha, thanks! | 22:15 |
shermanm | I'm probably going to have a bunch of somewhat odd bugs / feature requests to submit, all surrounding "I need my nodes to reboot as few times as possible, cause that's slow" | 22:18 |
JayF | Ironic in launchpad; and you can post patches to gerrit with a tag of "Related-Bug: #nnnnnn" or "Closes-bug: #nnnnnn" | 22:20 |
JayF | you /can/ post patches to LP bugs, it's how we handle security issues, but it's not our usual workflow | 22:20 |
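(For reference, those tags are just trailers at the end of the Git commit message, along these lines, with the placeholder bug number as above:)

```
Short summary of the change

Longer explanation of what the change does and why.

Related-Bug: #nnnnnn
```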
shermanm | ah, ok. I think the WIP tag is mostly what I needed, I'd submitted https://review.opendev.org/c/openstack/ironic/+/932418 and didn't want to accidentally step on any toes | 22:21 |
JayF | you'll never really bother us putting up code, although Riccardo's comment on there is valid -- you submitted it to a stable/ branch and we only backport patches into those, so it has to go to master first | 22:23 |
JayF | although I will tell you I will be a tough review on that feature; we generally strongly discourage fully disabling cleaning in Ironic | 22:23 |
JayF | and I'll ask questions like ... what use case is this for that our existing disk skip logic + regular cleaning can't handle :) | 22:24 |
shermanm | the main use-case is that we can't run regular cleaning, because the time taken for the extra reboot causes us major end user usability issues | 22:24 |
shermanm | because we're using blazar so users can reserve one particular node, and repeatedly reconfigure it | 22:25 |
shermanm | and if the node is spending 15 minutes rebooting into cleaning, the user will just get "no hosts available" from nova for that duration | 22:25 |
shermanm | but it's the same user "owning" the node before and after, so there are no security concerns | 22:26 |
shermanm | technically we could use root device hints to ensure a stable boot disk and avoid some of this, and I was also able to use deploy templates + node trait + flavor to trigger the cleaning during deploy behavior | 22:28 |
shermanm | but both of those require maintaining additional metadata in ironic per-node, when what we really want is a global setting | 22:28 |
JayF | so first of all; that's explicitly a use case we wouldn't support (or at least I wouldn't want to; but I don't speak for everyone) | 22:28 |
JayF | Is there a reason you wouldn't use rebuild to fill that need instead? | 22:28 |
JayF | Other thing I wonder is if there's some way to use node.owner to cooperate with blazar and help that work, but I don't think that's likely since node.owner doesn't populate up to nova in a meaningful way | 22:29 |
JayF | to be clear: the reason not to support that use case is baked into the heart of the patch you filed -- basically it's tough to ask Ironic to deploy to a node where it might not know the previous state | 22:29 |
JayF | lots of stuff can go wrong, including a handful of bugs we only recently escaped where IPA ramdisks could read a leftover configdrive | 22:30 |
shermanm | no, I totally get it, there's a reason that we're maintaining so many forks to support our use-cases | 22:30 |
JayF | one thing I am thinking is | 22:31 |
shermanm | we're also looking into e.g. fast-track to amortize some of the reboot cost | 22:31 |
JayF | I wonder if you could do a deploy step of erase_device_metadata | 22:31 |
JayF | before any of the imaging steps | 22:31 |
JayF | that might be the way to get what you want without going the long way round | 22:31 |
shermanm | basically all I wanted was the ability to specify "default" deploy steps, since the deploy templates approach worked | 22:32 |
JayF | https://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2637 | 22:32 |
JayF | so basically just hook up deploy templates, give all your nodes that trait and all the nova flavors you care about that trait | 22:33 |
JayF | put that step in as the first step that runs | 22:33 |
JayF | and you're likely in business, patch-free | 22:33 |
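(A rough sketch of that recipe with the OSC baremetal plugin. The trait name and priority are illustrative, and it assumes the IPA metadata-erase step is usable as a deploy step in your release; double-check the step name and the deploy-step priorities before relying on it.)

```
# Deploy template keyed to a custom trait; high priority so the step runs
# before the imaging deploy steps (value illustrative).
openstack baremetal deploy template create CUSTOM_ERASE_METADATA \
    --steps '[{"interface": "deploy", "step": "erase_devices_metadata",
               "args": {}, "priority": 95}]'

# Tag each node with the trait...
openstack baremetal node add trait <node> CUSTOM_ERASE_METADATA

# ...and require it from the nova flavors that should trigger it.
openstack flavor set --property trait:CUSTOM_ERASE_METADATA=required <flavor>
```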
shermanm | as to why not rebuilds: I think that would solve most of the issues, but the UI for it in horizon is kind of frustrating, and pretty different from the initial "launch an instance" one | 22:34 |
JayF | yeah I'd suggest pursuing rebuilds or doing a deploy template like that | 22:35 |
JayF | I wonder if we would consider setting a default deploy template | 22:35 |
JayF | so you could make the template, set something in config, and just be done | 22:35 |
shermanm | the deploy template approach absolutely works, and default deploy templates would completely solve my issue | 22:35 |
JayF | I would set up deploy templates for everything for today | 22:36 |
JayF | and file a bug, tag with RFE, with the default deploy templates idea | 22:36 |
JayF | we can see how others feel about it | 22:36 |
shermanm | e.g. I tested the template+flavor+trait approach already, but was trying to avoid needing to introduce traits into my environment | 22:36 |
JayF | we're using traits for more and more stuff | 22:36 |
shermanm | but yeah, that makes sense, sounds like a good path to get feedback | 22:36 |
JayF | e.g. the new runbooks feature uses traits to identify what nodes can run what runbooks | 22:36 |
JayF | you now have me wondering if someone could (ab)use node service to implement some wacky custom rebuild logic :) | 22:37 |
shermanm | traits do seem really useful, it was just "a new way nova scheduler can fail kind of opaquely", and I didn't want to spring it on my operators. I'm planning to deploy them sooner or later anyway to let users trigger bios changes | 22:38 |
JayF | I feel that in my bones re: "a new way it can fail opaquely" LOL | 22:39 |
JayF | I've probably screamed at nova-scheduler more than any other openstack service (and it's not really its fault, it was a cloud with a bad rabbitmq and scheduler was always just the canary) | 22:39 |
shermanm | I mean, I already found a race condition when creating nova aggregates from this same sort of work: https://bugs.launchpad.net/nova/+bug/1542491 | 22:40 |
shermanm | I'm right there with you | 22:40 |
shermanm | I just want nova to tell me *why* no hosts were found | 22:41 |
shermanm | but thanks again! this was helpful | 22:44 |
JayF | I went ahead and documented our chat in that gerrit patch | 22:46 |
JayF | https://review.opendev.org/c/openstack/ironic/+/932418/1#message-41143859b3410a6f8997eac577cb8158dca27159 | 22:46 |
JayF | if other folks are onboard, implementing a default is likely trivial: https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/steps.py#L339 just having this, or the callers, fallback to a configured default if one is set | 22:49 |
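(As a strawman for that RFE, the fallback could look something like the self-contained sketch below; every name here, including the config value, is hypothetical rather than Ironic's actual internals.)

```python
# Hypothetical sketch of a "default deploy template" fallback.
DEFAULT_DEPLOY_TEMPLATE = "CUSTOM_ERASE_METADATA"  # would come from ironic.conf

def templates_matching_node(node_traits, all_templates):
    """Match deploy templates to a node's traits, with a configured fallback."""
    matched = [t for t in all_templates if t.name in node_traits]
    if not matched and DEFAULT_DEPLOY_TEMPLATE:
        # Proposed behaviour: if nothing matched, apply the operator-configured
        # default template instead of running no extra deploy steps at all.
        matched = [t for t in all_templates if t.name == DEFAULT_DEPLOY_TEMPLATE]
    return matched
```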
shermanm | nice! yeah, I'll definitely take a crack at that RFE | 22:52 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent stable/2024.1: Remove non RE2 job config https://review.opendev.org/c/openstack/ironic-python-agent/+/932660 | 22:53 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent stable/2023.1: Remove non RE2 job config https://review.opendev.org/c/openstack/ironic-python-agent/+/932661 | 22:53 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent stable/2023.2: Remove non RE2 job config https://review.opendev.org/c/openstack/ironic-python-agent/+/932662 | 22:54 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent bugfix/9.12: Follow up to broken Zuul config https://review.opendev.org/c/openstack/ironic-python-agent/+/932663 | 22:58 |
opendevreview | Michael Sherman proposed openstack/ironic stable/2023.1: allow disk cleaning during deploy https://review.opendev.org/c/openstack/ironic/+/932418 | 23:02 |
shermanm | bah, just marking the WIP (and probably wontfix) as linked to the existing LP bug | 23:03 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent bugfix/9.9: Remove and disable examples job https://review.opendev.org/c/openstack/ironic-python-agent/+/928019 | 23:04 |
opendevreview | Steve Baker proposed openstack/ironic-python-agent bugfix/9.9: Inspect non-raw images for safety https://review.opendev.org/c/openstack/ironic-python-agent/+/927984 | 23:04 |
opendevreview | Merged openstack/ironic-python-agent bugfix/9.12: Follow up to broken Zuul config https://review.opendev.org/c/openstack/ironic-python-agent/+/932663 | 23:39 |