Thursday, 2024-10-17

rpittaugood morning ironic! o/07:13
dtantsurJayF: at the very least, owner==null will remain the only possible state for standalone users09:16
dtantsurAlso, while a cluster-scoped nova instance makes little sense, a cluster-scoped baremetal is totally fine09:16
dtantsurIs it going to solve any actual problems that users have?09:17
rpittaufinal releases for antelope are up https://review.opendev.org/c/openstack/releases/+/93259309:19
TheJuliaJayF: I'd kind of agree with Dmitry, that deprecation of node.owner=null is the wrong move for the project. I *do* think, at least in devstack, we should just let a normal project-scoped admin enroll nodes.10:14
TheJuliaJayF: In regard to the service-scoped user and such, I had a discussion with Vexxhost regarding the same issue/challenge, and truthfully think the option should have defaulted to true instead of false. Once they swapped the option, their deployment worked as expected.10:15
iurygregorygood morning ironic11:38
iurygregoryI just saw the email on openstack-discuss [devstack][nova][cinder][ironic][glance][swift][neutron][all] Deprecating/removing non-uWSGI deployment mechanisms? Since we are tagged, maybe we should try to provide some thoughts?11:40
Sandzwerg[m]<TheJulia> "JayF: I'd kind of agree with..." <- For us nodes, so far, are not project scoped. We do this with filters. Nodes have a flavor, and you get quota for that flavor. So if we have multiple free nodes with the same spec you might end up on a different one every time. We might switch eventually. I haven't looked into this yet11:56
opendevreviewMahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency  https://review.opendev.org/c/openstack/sushy-tools/+/93249611:56
dtantsuriurygregory: I think the Ironic bits are represented correctly there: we care about standalone executables, so we'll need to find something that is not eventlet (fortunately, there is no shortage of HTTP servers for Python)12:03
opendevreviewMahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency  https://review.opendev.org/c/openstack/sushy-tools/+/93249612:03
rpittauiurygregory: I don't think there's a lot to say on our side12:05
rpittauwe have 2 patches from stephenfin to move in that direction that are already approved, but feel free to reply if you have any thought on that :)12:05
stephenfinrpittau: iurygregory: Yup, I saw dtantsur's reply to tkajinam and assumed you were testing standalone mode in CI, but later investigation revealed the config opts in your devstack plugin were unused so it was more straightforward than I thought (thankfully) :)12:07
dtantsurstephenfin: it's tested in non-devstack jobs :)12:07
stephenfinGood to hear. In any case, a non-issue now from my perspective :)12:08
rpittaustephenfin: thanks for bringing that up btw :)12:11
dtantsurstephenfin: mmm, just realized: ironic-inspector might be non-wsgi12:11
dtantsureven in devstack12:11
stephenfinI thought you'd deprecated that? Or am I thinking of a different recently-retired tool?12:12
dtantsurstephenfin: it's deprecated but will stick around for some more time.12:12
stephenfingotcha. I guess some work is needed to toggle IRONIC_INSPECTOR_STANDALONE to false in (most) CI jobs, like neutron is doing. Or assume eventlet-in-openstack will outlive ironic-inspector12:14
dtantsuryeah, we need to try it12:15
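For reference, a minimal sketch of the toggle being discussed (a devstack local.conf override; the variable name is the one from the ironic-inspector devstack plugin mentioned above):

    # local.conf — run ironic-inspector under WSGI instead of as a standalone service
    IRONIC_INSPECTOR_STANDALONE=False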
opendevreviewMahnoor Asghar proposed openstack/sushy-tools master: Minor docs changes for better readability and consistency  https://review.opendev.org/c/openstack/sushy-tools/+/93249612:21
*** iurygregory_ is now known as iurygregory12:57
opendevreviewDmitry Tantsur proposed openstack/ironic master: Replace image_format_inspector with its oslo.utils version  https://review.opendev.org/c/openstack/ironic/+/92990413:12
opendevreviewMerged openstack/ironic master: devstack: Remove IRONIC_USE_MOD_WSGI  https://review.opendev.org/c/openstack/ironic/+/93250113:28
opendevreviewDmitry Tantsur proposed openstack/ironic master: Redfish power: account for disable_power_off  https://review.opendev.org/c/openstack/ironic/+/93261013:29
JayFTheJulia: dtantsur: makes a lot of sense why we wouldn't do that. I knew there had to be a good reason 😄13:39
opendevreviewDmitry Tantsur proposed openstack/ironic master: Redfish power: account for disable_power_off  https://review.opendev.org/c/openstack/ironic/+/93261013:58
opendevreviewMerged openstack/ironic master: devstack: Remove IRONIC_USE_WSGI  https://review.opendev.org/c/openstack/ironic/+/93250214:07
TheJuliao/ It would be super awesome to get some eyes on https://review.opendev.org/c/openstack/ironic/+/930655 since 4k hardware is not going to go away. :)14:27
rpittaudouble approval :D14:35
rpittaudidn't see JayF already approved it14:35
rpittaucardoe: hey o/ I was wondering if you plan to add this https://opendev.org/openstack/sushy/commit/8928f45402f26e5adbe9d885c45d827a38db442c to other ironic repos14:40
TheJuliamuch appreciated, thanks!15:02
opendevreviewMahnoor Asghar proposed openstack/sushy-tools master: Reject node power off requests to align with ironic supporting NCSI  https://review.opendev.org/c/openstack/sushy-tools/+/93262315:02
opendevreviewDmitry Tantsur proposed openstack/ironic master: IPMI power: account for disable_power_off  https://review.opendev.org/c/openstack/ironic/+/93262415:12
cardoerpittau: yes. I think I waited for the PTG cause I was gonna wait on all of those changes for that. But if that's not controversial, I'll push those out today.15:16
opendevreviewMerged openstack/ironic master: CI: Add a 4k disk CI job  https://review.opendev.org/c/openstack/ironic/+/93065515:37
rpittaucardoe: my main issue with that is the pbr version, for consistency we should update the version in requirements.txt too, and we need to justify the min version required15:48
rpittaumy bad for not having underlined that for sushy15:48
cardoeoh15:51
cardoeYou need pbr 6.0.0 for pyproject.toml support15:51
cardoeHonestly once you upgrade everything to modern PEP stuff, I don't see what PBR is providing.15:52
cardoeCause if you look at the code paths it's pretty much just re-exporting or executing setuptools pieces.15:54
rpittaucardoe: mmm I'm not sure about that, but I may be wrong, do you have docs supporting that statement? I see support for pep517 was added to pbr quite some time ago15:54
rpittaucardoe: https://github.com/openstack/pbr/commit/09ee15341014fc0e3bb8a7c3b06a3fa912cfad3815:54
cardoeSo that's what clarkb told us to do.15:54
cardoeOr was it sean?15:55
cardoeJayF: you recall^15:55
JayFrpittau: cardoe is right; pep517 basically means setuptools knows how to load pbr to run the install15:56
JayFrpittau: theoretically many of the features we use are provided by setuptools now, but they are shaped differently and I am personally very -1 to us being different than the rest of openstack in that regard15:56
rpittauJayF: ok, and why do we need pbr 6 or higher?15:56
clarkbcardoe: PBR provides versions management and a couple of other minor things15:56
clarkbyou probably can replace pbr with other tools at this point but I'm not aware of anyone doing so for an openstack project15:57
JayFrpittau: that piece I'm not sure of, maybe clarkb would know15:57
rpittauthat's my only doubt :)15:57
cardoehttps://review.opendev.org/c/openstack/nova/+/899753 is where we got the start from15:57
cardoeYou need pbr 6.0.0 to work correctly with devstack and GLOBAL_VENV15:57
clarkbI think there have been a couple of bugfixes to pep 517 in pbr since the original 5.7.0 release with support15:58
cardoeclarkb: ah true the git / semver management thing is still provided by pbr. Can't believe that's not in the stdlib yet.15:58
clarkboff the top of my head wsgi script generation (could be replaced by a simple file install instead) and authors file generation are features that are useful too15:59
clarkbbasically there are several useful features that you probably do want to find a port for and no one has taken the time to do that work15:59
rpittauok perfect, so do we really need to update the min pbr version besides the introduction of pyproject.toml?15:59
clarkbbut I suspect it is doable to port15:59
clarkbrpittau: as a general rule outside of pyproject.toml use we expect you always use the latest version of pbr16:00
clarkbthis has to do with the way setup requires and easy install work. They will blindly install the newest version and not let you control that. So the expectation is the latest version always works. This is also why we can't remove python2 or older python3 support from pbr16:00
rpittauclarkb: and we do, at least in CI since we have the upper constraints that enforce that, but I guess we need to reflect that in requirements.txt16:00
clarkbpyproject.toml changes this because you can now control those versions16:01
clarkbto start you probably want to use the latest version of pbr16:01
clarkbrpittau: no that is wrong16:01
clarkbupper constraints does not control the pbr version16:01
rpittauclarkb: so why is there a pbr version in upper-constraints?16:01
rpittauhttps://github.com/openstack/requirements/blob/master/upper-constraints.txt#L4716:01
clarkbrpittau: because some projects use pbr at runtime and you can control that version but you can't control the one used to install the packages generally16:01
clarkbpersonally I don't want any of those packages in upper constraints because it constantly leads to this confusion16:02
rpittaualright so I think we're on the same path16:02
rpittauthanks clarkb 16:02
clarkbthe only way to control the pbr version for installation is to use pyproject.toml or have a preinstall step that explicitly installs the version of pbr that you want in all the places that will use pbr16:03
clarkbbut pip install foo -c bar.txt won't do that16:03
rpittauyep, that's clear, thanks16:04
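A short sketch of the two options clarkb describes, assuming a project without a pyproject.toml yet (file names are placeholders):

    # option 1: preinstall step — explicitly pin the pbr used at build/install time
    pip install 'pbr>=6.0.0'
    pip install . -c upper-constraints.txt   # -c constrains runtime deps, not the build-time pbr
    # option 2: declare the pbr floor in pyproject.toml's [build-system] instead (see the sketch below)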
rpittaucardoe: I think we're good to move forward with pyproject.toml for the other ironic projects if you already have the patches, but we should probably fix requirements.txt and setup.py in sushy first :)16:06
JayFif it fixes something w/r/t GLOBAL_VENV, I'd love to see that change in ironic soon16:07
JayFsince we're working on a fix for that with g mann right now16:07
JayF(I'm curious what that version update fixes, but taken at face value it seems like a good idea)16:07
rpittauJayF: yeah, we really need to have consistency between the ironic repos in general16:07
opendevreviewDoug Goldstein proposed openstack/sushy master: bump pbr to match what pyproject.toml requests  https://review.opendev.org/c/openstack/sushy/+/93263816:09
cardoeThere it is. bot was finishing lunch16:10
cardoerpittau: if I take that commit and the prior commit squashed together and submit that against more repos, that'd be good?16:10
JayFrpittau with the fastest -1 in the west16:10
JayFwa-pow!16:11
rpittaulol16:11
cardoemy goodness there's a lot of duplication here16:11
rpittauyeah16:11
cardoeCan we not get rid of setup.py now?16:11
clarkbno pbr + pep517 uses setuptools16:11
clarkbjust like using pep517 with setuptools16:12
rpittauheh this ^16:12
clarkbhowever you may be able to delete that line in setup.py16:12
* clarkb look at the docs16:12
clarkboh ya maybe you can delete setup.py but you can't delete setuptools16:13
clarkbworth testing16:13
JayFYeah, I was looking at that, fairly sure we can just kill setup.py16:13
clarkb(the docs imply this and I'm pretty sure I tested things like that when I wrote that change)16:13
rpittauif that can be done in one shot, let's!16:13
cardoeYou're triggering old brain cells here.16:13
JayFit's roughly equivalent to the boilerplate that runs with pep517+pyproject.toml16:13
cardoeThe only setuptools-based project I still have hasn't had a setup.py for a bit.16:13
clarkbya I think the pbr=True argument is supplanted by build-backend = "pbr.build"16:14
clarkbso you don't need the setup.py at all unless you're doing some other magic (and sushy isn't thankfully)16:14
cardoewell pep517+pyproject.toml = setuptools.setup(setup_requires=<pyproject.toml>.build-system.requires)16:14
cardoeIt looks like pbr.build as the entry point just calls setup() with setup(pbr=True)16:15
cardoeyeah what clarkb said. :-D 16:15
clarkbyup16:15
JayFnice --> it's roughly equivalent to the boilerplate that runs with pep517+pyproject.toml <--16:15
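For reference, a minimal sketch of the pyproject.toml shape under discussion (the pbr>=6.0.0 floor is the one cardoe mentions above; the setuptools floor is an assumption):

    [build-system]
    requires = ["pbr>=6.0.0", "setuptools>=64"]
    build-backend = "pbr.build"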
cardoeI've been using poetry for years now and I'm actually switching over to uv.16:15
JayFthe pyproject.toml stuff has been a boon for gentoo16:15
JayFpoetry is my absolutely least favorite of the alternative python installer things16:15
cardoeSo I don't like it for installation. I like it for the lock file management.16:16
JayFthe defaulting to version locking leads folks down a path that will either make their software bitrot quickly or require constant maintenance to bump versions16:16
JayFlol16:16
JayFthe thing I hate is the thing you like16:16
clarkbyou may also want to make sure zigo is aware16:16
cardoeWell I follow the Rust guidelines on lock files.16:17
clarkbif the debian packaging is still relying on setup.py this will force an update to something else (which is probably a solved problem in debian generally but maybe not in the debian packaging for openstack)16:17
cardoelock files for libraries are the minimum tested versions. lock files for binaries are the CI-tested versions.16:17
cardoeYou're free to install stuff without using the lock files (and I do).16:17
cardoerpittau: so which would ya prefer? version bump in setup.py or get wild and delete it? I'm leaning towards the former with a plan to remove it later?16:21
rpittaucardoe: me too, I don't think removing setup.py is trivial, so let's do one step at a time16:21
rpittaugood night! o/16:42
opendevreviewDoug Goldstein proposed openstack/sushy master: bump pbr to match what pyproject.toml requests  https://review.opendev.org/c/openstack/sushy/+/93263817:07
opendevreviewGhanshyam proposed openstack/ironic master: Enable GLOBAL_VENV in ironic grenade jobs  https://review.opendev.org/c/openstack/ironic/+/93201617:07
JayFcardoe: https://review.opendev.org/c/openstack/sushy/+/932638/2/setup.py#1617:32
JayFdid I misunderstand?17:32
cardoeYeah we just said we'd take the process in a bit smaller steps.17:49
cardoeWe'll patch it all and then remove setup.py afterwards.17:49
opendevreviewcid proposed openstack/ironic master: [WIP] Add inspection rules  https://review.opendev.org/c/openstack/ironic/+/91830318:31
rbuddenDoes anyone have a handy solution for setting up IPMI/Redfish credentials during auto-discovery w/Ironic Inspector? From what I can see online older versions of Ironic Discoverd supported ‘enable_setting_ipmi_credentials = true’ and allowed Ironic to use ipmitool to set credentials. That feature appears to have been removed (https://bugs.launchpad.net/ironic-inspector/+bug/1654318).19:55
rbuddenI’m curious how ppl are bulk adding nodes these days, if there’s a special hook or plugin that could handle this (or I could write)19:56
JayFThis is likely not helpful to you (sorry) but the answer for most places I've worked has been some automation-glue between some external CMDB and Ironic19:57
JayFIf I had a set of machines that came set to DHCP their BMC + common passwords between them, I'd probably write a simple script that changed the creds and added them to Ironic.19:58
rbuddenYeah, I was afraid of that. I’m having two racks, 128 computes, dropped in place shortly19:58
JayFdo you get any kind of sheet from your delivery?19:58
rbuddenand they need to be reconfigured, but also I’m assuming that I need to validate as part of the cleaning process that a user hasn’t changed anything with IPMI19:58
JayFthat you could use to iterate through to add to ironic19:58
JayFIronic will know if the creds are wrong/have been messed with by erroring19:59
JayFmight just wanna handle the "happy case" then iterate through whatever failures you might get19:59
JayFcardoe: ^ you might have some insight here19:59
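A rough sketch of the kind of glue script JayF describes (the CSV layout, credential handling, and password variable are all hypothetical placeholders):

    #!/bin/bash
    # hypothetical inventory file: name,bmc_address,factory_user,factory_pass per line
    while IFS=, read -r name addr user pass; do
      # rotate the BMC credential out of band first (vendor tooling or a Redfish call), then enroll:
      openstack baremetal node create --name "$name" --driver redfish \
        --driver-info redfish_address="https://$addr" \
        --driver-info redfish_username="$user" \
        --driver-info redfish_password="$NEW_BMC_PASSWORD"
    done < bmcs.csv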
rbuddenwell, you can have more than one account in IPMI… so that leaves me leery from a security perspective20:00
JayFif you're using IPMI and are leery from a security perspective20:00
JayFyou are probably right :D 20:00
JayFregardless of if there's a bonus user or not lol20:00
rbuddenhaha, yeah, sorry, using that loosely but yes… plan is Redfish ATM20:01
rbuddenbut thinking outside the box20:01
rbuddenATM these are recycled nodes, so luckily I have the old IPs, user/pass, and in the future I’m hoping the delivery will have all the details20:01
JayFone thing to consider outside the box is that many of those BMCs will work with external auth20:01
JayFmy downstream ties it into ldap so they can rotate passwords external to ironic and just make the API call to update them in node.driver_info20:01
JayF"ldap" I'll note I don't know the actual backend there :) 20:02
rbuddenYeah, the goal here was to move away from IPMI to Redfish and HTTPS logins. We’ve had vendors in the past (no names given) provide less than ideal BMC details. I was originally hoping something similar to an IPA boot could scrape LLDP info and build out addresses/credentials from the switch port data (since we rely on LLDP for auto discovery to name nodes)20:04
JayFI'm not the biggest expert on inspection/discovery so it's EXTREMELY possible there's an option here I'm not wise to20:04
JayFI suggest hanging out and seeing if someone else can point you in a better direction20:04
rbuddenSure thing20:05
cardoeSo how far down the rabbit hole ya wanna go?20:12
rbudden:)20:13
cardoeAt the end of the day there's a secret somewhere.20:13
rbuddenSure sure20:13
cardoeIt's what I argue with the security folks all the time. Stop trying to check boxes and create real attack trees.20:14
JayFcardoe: I was making the case to someone the other day that one of the major values of threat modeling is actually being able to use those threat models to tell stories to the business.20:14
rbuddenI’m happy to entertain ideas. Perhaps I’ll preface it with the goals. Automate hardware discovery and configuration of BMCs.20:15
rbuddenIn the previous Beowulf style cluster the team used xCAT for imaging and used (I think) a Genesis tool for bootstrapping iDRACs, etc. and BMC setup20:15
cardoeSo that's the same goals we've got and what we're working with.20:16
rbuddenSo I’m trying to find some equivalent that if we have Vendor A drop 10 racks of compute and they screw up the BMCs we don’t spend a ton of time manually doing things20:16
cardoeBut like I see people that use the default self-signed certificate with IPMI and stress about password complexity.20:16
cardoeSo I won't say I've got this all working today (cause I don't)20:17
rbuddenOn a side note, security (NASA) would feel better if we move towards something better than IPMI locked down to a VLAN, so we’re attempting to move to RedFish and have the ability to be a bit more secure there.20:17
cardoeAny of those folks at MSFC?20:17
rbuddenA few ppl there will likely use this system. I’m GSFC20:17
rbuddenWe’ll be hosting some DGXs for inference work for a small team at Marshall 20:18
cardoeJust a curiosity aside. I can see MSFC from my window.20:18
rbudden(roadmap plans)20:18
rbuddenHaha nice20:18
cardoeSo I _think_ the end goal from a security standpoint is that BMCs are segregated on VLAN 1 with a DHCP server. It sees new machines (or old machines that have been factory reset) and starts an initial process on them.20:20
cardoeOur stuff is workflow driven (Argo Workflows to be exact).20:21
cardoeIt logs into the BMC with the factory default creds (we've selected to use the "insecure" one password to rule them all instead of a sticker on the chassis) and grabs the asset tag off the machine.20:22
cardoeWe then look that up and decide if it's a new machine or old machine. If it's an old machine then it should already have an IP and other things assigned. If it's new we provision that in the DCIM/IPAM system.20:23
cardoeSet the password to something more real and update the IP info for the BMC and kick the power button.20:24
rbuddenOk, so this all happens as a prep step before Ironic Inspection (or scripted node creates) happen20:25
cardoeOh sorry. We also drag the machine up to our minimum supported BMC version before allowing it on the real network.20:25
cardoeNot using Ironic Inspection cause it just wasn't flexible enough. The plan is for us to work these flows into Ironic using the now integrated inspection.20:26
rbuddenMakes sense. If nothing existed in Ironic that I could use, the thought was yet another isolated VLAN with a small DHCP pool, serving up an image that ran and scraped LLDP info to pull the node name from the switch, do a DNS lookup, set the IP on the BMC, etc.20:27
cardoeOnce it's on the real network we're giving it OpenID Connect authentication and nuking the password based authentication.20:27
rbuddenor something like that20:27
rbuddenNice20:27
cardoeThe device flow has more privileges while the authorization flow (I think it's called) really only has viewer permissions to be able to pull up the visual console.20:27
rbuddenOIDC would be interesting… since I could natively integrate with the agency’s identity service20:28
cardoeThere's a PTG session around serial / graphical console that we hope to tie in.20:28
rbuddenWe do OIDC integration in Horizon20:28
cardoeSame here.20:28
cardoeSo we use dex today as a proxy.20:28
cardoeSo like my machines at home take my personal GitHub authentication to get to the BMC.20:29
rbuddenAre you using RedFish for BMC? or other vendor proprietary?20:29
cardoeAll Redfish.20:29
rbuddenCool20:29
cardoeWell Sponge Bob meme-ified Redfish20:29
cardoeIt's all Dell hardware.20:29
cardoeWhich the nicest thing I can say about it is that it is indeed 100% compliant with the Redfish DMTF validation suite. Assuming you use their fork of the Redfish DMTF validation suite.20:30
cardoeRight now the push is to get everything we're touching in Redfish into Sushy (or sushy-oem-idrac) for everyone else to benefit.20:33
rbuddenChecking the Sushy website now...20:34
cardoeThen enhance the apply_configuration step so that everything we touch once the BMC is on the good network can be done via Ironic.20:34
JayFsushy is just the library we use (and maintain) for redfish access in Ironic20:34
rbuddenI’ll have some reading to do. Redfish is new to us, so right now it’s just the barebones getting it working with Ironic in our TDS20:35
cardoehttps://opendev.org/openstack/sushy20:35
JayFright now it only supports things officially in redfish, even though many vendors, as alluded to by cardoe, have proprietary extensions20:35
JayFwe're looking at making sushy/redfish driver more flexible in face of those proprietary differences20:35
rbuddenGotta love the proprietary extensions!20:35
rbuddenWe’re enjoying that fun in our road down SONIC20:35
JayFDoesn't matter if I love 'em or not, they exist so I gotta make them work :| 20:35
JayFalthough mine is not as dell flavored :D20:36
cardoeYeah JayF's spot on. There's a few recent patches to Ironic that are querying the board type and if it's vendor X, make call Y instead.20:36
rbuddenhaha, sorry I should have added /sarcasm, but yeah, it’s life20:36
cardoeSo what I think we (speaking in the ironic project sense) would like is a way that it's understood this ugliness might come up, and how to not litter the code with vendor nonsense.20:37
cardoeBut have a clean way to load a vendor override for that operation.20:37
rbuddenmakes sense20:38
cardoeIMHO having the "idrac" driver in Ironic is less than ideal nowadays.20:39
cardoeIn 2024.2 the wsman-based backend was removed and it became just a subclass of redfish.20:39
JayFrbudden: Just so the space geek in me knows what NASA corner is running Ironic; can you tell me what this cluster is for?20:41
JayFand how much would it cost to get the openstack source code on the next voyager, I think that'd be the best foot forward for our andromedean friends to find in a few dozen centuries :D /s 20:42
rbuddenSo we’re redesigning the way we approach HPC altogether at Goddard. This is the ‘Next Gen System’ that will replace Discover20:43
rbuddenDiscover is our current HPC system with approx 180k cores and 100PB disk20:43
rbuddenThis new system we are building from the ground up to be cloud-native, GitOps/DevSecOps driven, etc. A full remodernization of the way we do HPC at NASA20:44
JayFhell yeah!20:44
JayFYou know what might be another group you might have some commonalities in20:44
JayFCERN is a longtime heavy Ironic user and community member20:44
JayFI'd bet you have similar shaped problems20:44
JayFI think kubajj might be the only CERN-y person in IRC these days (I am remembering right that you're at CERN kubajj, right?)20:45
rbuddenRight now we’re doing a proof of concept (we have a small TDS already running Kayobe) that will be around 10k cores, 1PB NVMe split between two filesystems.20:45
kubajj@JayF: yes, indeed.20:46
rbuddenWe have a few contacts at CERN I believe, but yeah, we’ve followed their work20:47
rbuddenSo yeah, it’s exciting stuff, we’re very much hoping this lays the groundwork for an awesome next generation.20:48
rbuddenI’ll probably be around more asking questions as I’ve been out of the Ironic loop for a little bit, but very much in OpenStack for a while.20:48
JayFIt's good for all Ironic users/contributors the more folks that are running Ironic; so happy to help in any way I can. 20:49
rbuddenAppreciate that. 20:49
JayFI'm in here most USA working hours so just let me know 20:49
rbuddenI enjoyed chatting about the Networking-Generic-Switch code I’m working on the other day.20:49
JayFask things about cleaning, the agent, the API, just something other than inspection next time so I can feel more helpful ;) 20:50
rbuddenHaha20:50
JayFNGS, you're hitting my knowledge gaps throughout :D 20:50
JayFlol20:50
rbuddenWell I have it all working now thanks to you pointing me in the right direction.20:50
JayFgood stuff, I gotta go walk my doggo now; have a good one o/ 20:50
rbuddentake it easy20:51
rbuddengotta jet here as well20:51
rbudden@cardoe thanks as well!20:51
cardoeyeah happy to help any way I can. Cause like JayF said I think a couple of us are going in the same direction and if we use/contribute to Ironic it'll only be better20:51
rbuddenAgreed20:52
rbuddenStackHPC is doing a ton of similar work as well20:53
rbuddenI need to get my Neutron Trunk code for NGS fixed up for our SONIC switches, then merge with the work they’ve been doing and get it all upstreamed so it’s in the mainline code.20:53
rbuddenAlright, gotta run for real. Baseball coaching up next. Take it easy20:54
kubajjAh, just managed to go through the last hour or so of messages 😀20:59
kubajjThe Ironic at NASA project sounds really cool21:01
JayFReminder if you haven't to register for PTG https://ptg2024.openinfra.dev 21:22
opendevreviewGhanshyam proposed openstack/ironic master: Enable GLOBAL_VENV in ironic grenade jobs  https://review.opendev.org/c/openstack/ironic/+/93201621:48
JayFgmann: that vmbcd change is brilliant, and explains a failure on my more complex change, ty21:51
shermanmhey! I  (uchicago/chameleon) actually caught up with some of the above discussions at openinfra this week. we're also in the process of rolling out redfish generally, virtual media boot at one site, and probably borrowing some of those NGS patches22:00
JayFawesome22:00
shermanmtrying to be a bit more involved on here than previously22:00
JayFwe should probably see how to turn them from 'patches' into 'merges' if many people need them22:01
shermanmwe've got a growing need for vlan trunks on baremetal at least22:04
shermanmon the "node auto enrollment" topic, rackspace had a talk which included how they're driving inspection (out of band + in-band) with ansible for that purpose22:06
JayFYeah, that requires work in Neutron before we can fully support it in NGS aiui22:06
JayFyeah, most folks have some kinda local automation glue to do that integration piece22:06
shermanmas an aside, I'm pretty sure I submitted a proposed change when I meant to just show some example code. I wanted to ask what the "right way" to include a patch with a launchpad bug is?22:11
clarkbyou can attach a diff/patch file to launchpad bugs. But pushing code to gerrit and linking to it is fine too (you can mark the change in gerrit as work in progress if you want people to avoid reviewing it as mergeable as is)22:14
shermanmgotcha, thanks!22:15
shermanmI'm probably going to have a bunch of somewhat odd bugs / feature requests to submit, all surrounding "I need my nodes to reboot as few times as possible, cause that's slow"22:18
JayFIronic in launchpad; and you can post patches to gerrit with a tag of "Related-Bug: #nnnnnn" or "Closes-bug: #nnnnnn" 22:20
JayFyou /can/ post patches to LP bugs, it's how we handle security issues, but it's not our usual workflow22:20
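A sketch of the commit-message footer JayF describes (title, description, and bug number are placeholders; Closes-Bug works the same way when the patch should close the bug on merge):

    Add my new feature

    A longer description of the change.

    Related-Bug: #1234567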
shermanmah, ok. I think the WIP tag is mostly what I needed, I'd submitted https://review.opendev.org/c/openstack/ironic/+/932418 and didn't want to accidentally step on any toes22:21
JayFyou'll never really bother us putting up code, although Riccardo's comment on there is valid -- you submitted it to a stable/ branch and we only backport patches into those, so it has to go to master first22:23
JayFalthough I will tell you I will be a tough review on that feature; we generally strongly discourage fully disabling cleaning in Ironic22:23
JayFand I'll ask questions like ... what use case is this for that our existing disk skip logic + regular cleaning can't handle :)22:24
shermanmthe main use-case is that we can't run regular cleaning, because the time taken for the extra reboot causes us major end user usability issues22:24
shermanmbecause we're using blazar so users can reserve one particular node, and repeatedly reconfigure it22:25
shermanmand if the node is spending 15 minutes rebooting into cleaning, the user will just get "no hosts available" from nova for that duration22:25
shermanmbut it's the same user "owning" the node before and after, so there are no security concerns22:26
shermanmtechnically we could use root device hints to ensure a stable boot disk and avoid some of this, and I was also able to use deploy templates + node trait + flavor to trigger the cleaning during deploy behavior22:28
shermanmbut both of those require maintaining additional metadata in ironic per-node, when what we really want is a global setting22:28
JayFso first of all; that's explicitly a use case we wouldn't support (or at least I wouldn't want to; but I don't speak for everyone)22:28
JayFIs there a reason you wouldn't use rebuild to fill that need instead?22:28
JayFOther thing I wonder is if there's some way to use node.owner to cooperate with blazar and help that work, but I don't think that's likely since node.owner doesn't populate up to nova in a meaningful way22:29
JayFto be clear: the why to not support that use case is baked in the heart of the patch you filed -- basically it's tough to ask Ironic to deploy to a node where it might not know the previous state22:29
JayFlots of stuff can go wrong, including a handful of bugs we only recently escaped where IPA ramdisks could read a leftover configdrive22:30
shermanmno, I totally get it, there's a reason that we're maintaining so many forks to support our use-cases22:30
JayFone thing I am thinking is22:31
shermanmwe're also looking into e.g. fast-track to amortize some of the reboot cost22:31
JayFI wonder if you could do a deploy step of erase_devices_metadata22:31
JayFbefore any of the imaging steps22:31
JayFthat might be the way to get what you want without going the long way round22:31
shermanmbasically all I wanted was the ability to specify "default" deploy steps, since the deploy templates approach worked22:32
JayFhttps://opendev.org/openstack/ironic-python-agent/src/branch/master/ironic_python_agent/hardware.py#L2637 22:32
JayFso basically just hook up deploy templates, give all your nodes that trait and all the nova flavors you care about that trait22:33
JayFput that step in as the first step that runs22:33
JayFand you're likely in business, patch-free22:33
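A rough sketch of that setup with the baremetal CLI (the trait name and priority are placeholders; whether the metadata-erase step is exposed as a deploy step on a given release is worth verifying):

    # hypothetical trait; apply it to the nodes and require it on the nova flavor
    openstack baremetal node add trait <node> CUSTOM_WIPE_METADATA
    openstack flavor set my-bm-flavor --property trait:CUSTOM_WIPE_METADATA=required
    # deploy template keyed to the same trait, running the metadata erase ahead of imaging
    openstack baremetal deploy template create CUSTOM_WIPE_METADATA \
      --steps '[{"interface": "deploy", "step": "erase_devices_metadata", "args": {}, "priority": 90}]'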
shermanmas to why not rebuilds: I think that would solve most of the issues, but the UI for it in horizon is kind of frustrating, and pretty different from the initial "launch an instance" one22:34
JayFyeah I'd suggest pursuing rebuilds or doing a deploy template like that22:35
JayFI wonder if we would consider setting a default deploy template 22:35
JayFso you could make the template, set something in config, and just be done22:35
shermanmthe deploy template approach absolutely works, and default deploy templates would completely solve my issue22:35
JayFI would set up deploy templates for everything for today22:36
JayFand file a bug, tag with RFE, with the default deploy templates idea22:36
JayFwe can see how others feel about it22:36
shermanme.g. I tested the template+flavor+trait approach already, but was trying to avoid needing to introduce traits into my environment22:36
JayFwe're using traits for more and more stuff22:36
shermanmbut yeah, that makes sense, sounds like a good path to get feedback22:36
JayFe.g. the new runbooks feature uses traits to identify what nodes can run what runbooks22:36
JayFyou now have me wondering if someone could (ab)use node service to implement some wacky custom rebuild logic :)22:37
shermanmtraits do seem really useful, it was just "a new way nova scheduler can fail kind of opaquely", and I didn't want to spring it on my operators. I'm planning to deploy them sooner or later anyway to let users trigger bios changes22:38
JayFI feel that in my bones re: "a new way it can fail opaquely" LOL22:39
JayFI've probably screamed at nova-scheduler more than any other openstack service (and it's not really its fault, it was a cloud with a bad rabbitmq and scheduler was always just the canary)22:39
shermanmI mean, I already found a race condition when creating nova aggregates from this same sort of work: https://bugs.launchpad.net/nova/+bug/154249122:40
shermanmI'm right there with you22:40
shermanmI just want nova to tell me *why* no hosts were found22:41
shermanmbut thanks again! this was helpful22:44
JayFI went ahead and documented our chat in that gerrit patch22:46
JayFhttps://review.opendev.org/c/openstack/ironic/+/932418/1#message-41143859b3410a6f8997eac577cb8158dca2715922:46
JayFif other folks are onboard, implementing a default is likely trivial: https://opendev.org/openstack/ironic/src/branch/master/ironic/conductor/steps.py#L339 just having this, or the callers, fall back to a configured default if one is set22:49
shermanmnice! yeah, I'll definitely take a crack at that RFE22:52
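A rough, hypothetical sketch of the fallback JayF describes; the option name and helper here are assumptions for illustration, not existing Ironic code:

    # hypothetical: return the matched deploy templates, falling back to a configured default
    def deploy_templates_with_default(matching_templates, conf):
        if matching_templates:
            return matching_templates
        default = getattr(conf.conductor, 'default_deploy_template', None)  # assumed new option
        return [default] if default else []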
opendevreviewSteve Baker proposed openstack/ironic-python-agent stable/2024.1: Remove non RE2 job config  https://review.opendev.org/c/openstack/ironic-python-agent/+/93266022:53
opendevreviewSteve Baker proposed openstack/ironic-python-agent stable/2023.1: Remove non RE2 job config  https://review.opendev.org/c/openstack/ironic-python-agent/+/93266122:53
opendevreviewSteve Baker proposed openstack/ironic-python-agent stable/2023.2: Remove non RE2 job config  https://review.opendev.org/c/openstack/ironic-python-agent/+/93266222:54
opendevreviewSteve Baker proposed openstack/ironic-python-agent bugfix/9.12: Follow up to broken Zuul config  https://review.opendev.org/c/openstack/ironic-python-agent/+/93266322:58
opendevreviewMichael Sherman proposed openstack/ironic stable/2023.1: allow disk cleaning during deploy  https://review.opendev.org/c/openstack/ironic/+/93241823:02
shermanmbah, just making the WIP (and probably wontfix) linked to the existing LP bug23:03
opendevreviewSteve Baker proposed openstack/ironic-python-agent bugfix/9.9: Remove and disable examples job  https://review.opendev.org/c/openstack/ironic-python-agent/+/92801923:04
opendevreviewSteve Baker proposed openstack/ironic-python-agent bugfix/9.9: Inspect non-raw images for safety  https://review.opendev.org/c/openstack/ironic-python-agent/+/92798423:04
opendevreviewMerged openstack/ironic-python-agent bugfix/9.12: Follow up to broken Zuul config  https://review.opendev.org/c/openstack/ironic-python-agent/+/93266323:39
