Nick | Message | Time |
---|---|---|
arne_wiebalck | Good morning, Ironic! | 06:55 |
queensly[m] | Good morning | 07:03 |
AmarachiOrdor[m] | Good morning Ironic | 07:50 |
sylvr | Good morning Ironic | 08:29 |
abongale | Good Morning Ironic | 08:49 |
freemanboss[m] | Good morning | 09:37 |
sylvr | ironic-inspector is deprecated in favor of ironic which will handle inspection using power_management_interface to start PXE boot and IPA right ? | 09:41 |
sylvr | I'm looking through the ironic-inspector and Ironic in-band inspection documentation. With the newer solution (Ironic + IPA), does launching a node inspection now require ports to be created? (That wasn't the case for ironic-inspector.) | 09:58 |
dtantsur | sylvr: 1) yes, 2) by default - yes. Read on about unmanaged mode (which was the default for inspector): https://docs.openstack.org/ironic/latest/admin/inspection/managed.html#unmanaged-inspection | 11:04 |
frickler | fyi if you have something like redfish, you can run oob inspection to generate the ports automatically | 11:12 |
dtantsur | yes, and furthermore: the in-band inspection implementation will try to do it automatically before starting | 11:13 |
iurygregory | good morning ironic | 11:21 |
sylvr | dtantsur: thanks for your answers! just to be sure, without ironic-inspector, no TFTP server is supplied to the provisioning service, you have to set up your own? | 11:50 |
sylvr | frickler: yes I read about that, and it looks awesome! unfortunately, I'm stuck with old hardware that only supports IPMI (and I even had some weird boot order behavior which required me to manually intervene a few times) | 11:52 |
dtantsur | sylvr: ironic-inspector also never supplied a TFTP server, nothing has changed in this regard | 11:58 |
sylvr | dtantsur: ok I should've known that because it's deployed by kayobe/kolla-ansible. Well, I think I have some issue with the PXE filter and that confused me. Thanks for clarifying Ironic :) | 12:28 |
opendevreview | Merged openstack/ironic-python-agent master: Remove eventlet from Ironic Python Agent https://review.opendev.org/c/openstack/ironic-python-agent/+/946091 | 12:59 |
opendevreview | cid proposed openstack/ironic master: Add an index on ports.node_id https://review.opendev.org/c/openstack/ironic/+/948431 | 13:04 |
opendevreview | cid proposed openstack/networking-baremetal master: Add conductor group sharding support https://review.opendev.org/c/openstack/networking-baremetal/+/948432 | 13:04 |
opendevreview | cid proposed openstack/networking-baremetal master: Add conductor group sharding support https://review.opendev.org/c/openstack/networking-baremetal/+/948432 | 14:00 |
TheJulia | good morning | 15:16 |
shermanm | with the caveat that I know 2023.1, and especially ironic-inspector are not supported anymore, does anyone recall an issue where nodes would get stuck in "inspecting", and never leave it until the conductor is restarted? Just looking for a pointer on where to dig around in the codebase | 15:47 |
dtantsur | shermanm: do they get stuck immediately or after some time in inspect wait? | 16:07 |
dtantsur | I assume nothing interesting in the logs? | 16:08 |
TheJulia | This feels deja-vu-ey | 16:08 |
TheJulia | so I think we've seen this once or twice before, at least if it is the exact same thing. The client which gets cached for interactions breaks, but you'll obviously need to look at the logs | 16:12 |
TheJulia | once it breaks, you have to restart the conductor but the break itself is the oddity if memory serves. Quickly looking at git doesn't yield anything in that regard, so maybe my internal LLM is having a fever dream or something funky | 16:13 |
dtantsur | yeah, it's either the client or something in the hooks, the former being more likely | 16:13 |
TheJulia | I don't remember what exactly happened to the client | 16:14 |
shermanm | so, luckily my log archiving actually went back far enough to catch this. It looks like it failed "Unable to start managed inspection for node $uuid: Failed to create neutron ports for node's $uuid ports" | 16:15 |
shermanm | and that was due to a 504 from neutron, stemming from NGS slowness | 16:15 |
TheJulia | But yeah, entirely situational as well so there may be an opportunity to harden/restart the client. We've had a couple weird such bugs where clients break under weird circumstances, the case which comes to mind is around timeouts in certain states with redfish bmcs, not inspector but yeah... | 16:15 |
TheJulia | ... that shouldn't break it hard but I guess there is no way to really unwedge it from that point | 16:16 |
TheJulia | Likely need to invoke the error handler | 16:16 |
dtantsur | Hmm, the error would explain a failure but doesn't really tell us why the process got stuck | 16:17 |
dtantsur | this exception should have been handled properly | 16:17 |
shermanm | actually, it looks like it failed in the exception handler, after handling the rest of that exception properly, due to `oslo_db.exception.DBDataError: (pymysql.err.DataError) (1406, "Data too long for column 'user' at row 1")` | 16:18 |
TheJulia | That would do it | 16:18 |
shermanm | which IIRC was fixed in newer versions for node_history, I didn't realize it could crop up here too | 16:18 |
TheJulia | shermanm: How long is user in your environment? | 16:19 |
TheJulia | you have the federated config don't you | 16:22 |
shermanm | 64 characters, more details in this launchpad bug from back in xena w.r.t. node_history https://bugs.launchpad.net/ironic/+bug/2054594 | 16:22 |
shermanm | yup | 16:22 |
TheJulia | that is how we found that issue | 16:22 |
TheJulia | yeah | 16:22 |
TheJulia | okay, that explains it! | 16:22 |
shermanm | the rubber-ducking session is appreciated :) | 16:23 |
shermanm | the million dollar question for me becomes just how messy is it to try and backport the alembic migration to my 2023.1 downstream | 16:36 |
JayF | I would upgrade ironic only before I did that | 16:37 |
JayF | alternatively | 16:37 |
JayF | I wonder if the migration goes boom if you just make the user bigger | 16:37 |
shermanm | yeah, that's what I'm afraid of. migrations making future upgrades a huge headache has been an issue before, I'll probably just do an ugly workaround on the error handler until we can get ironic upgraded | 16:39 |
JayF | shermanm: we don't test it, but I will say some users upgrade Ironic at a faster cadence than the rest of the cloud. | 16:39 |
JayF | shermanm: I would prefer having Ironic version outta sync than backport a DB-related change | 16:40 |
shermanm | JayF: definitely a good point | 16:43 |
TheJulia | I don't think we've ever tried a backport of a db schema change because it would also break the upgrade ordering | 16:47 |
TheJulia | The *closest* thing we've ever done is make some of our upgrades around indexes super smart to check the state prior to applying the change | 16:47 |
TheJulia | And we ended up backporting docs like: hey, you could run these commands to make the DB happier, ironic knows what to do on upgrade | 16:48 |
shermanm | luckily there seems to be an easy workaround, just setting "node_history=false" until we get to 2024.1, so no crimes against alembic needed | 17:10 |
shermanm | pretty sure this would affect deployments running 2023.2, dunno if it would make sense to add a note in the docs about disabling node_history if using keystone federation | 17:13 |
alegacy_ | As discussed at yesterday's weekly meeting I have gone ahead and added a rough outline of some of the high-level topics to try and kickstart the conversation on the standalone network topic. Here: https://etherpad.opendev.org/p/ironic-standalone-networking | 17:19 |
TheJulia | alegacy_: I added some comments. We should likely try and have a call to get multiple folks on the same page, but I also know JayF might not be available easily this week. | 18:50 |
JayF | a call is maybe one of the easier things I could do | 18:50 |
JayF | work that can be done while horizontal >>>>>>>> * | 18:50 |
JayF | lol | 18:50 |
TheJulia | * glares ;) | 18:54 |
JayF | honestly I have finally found a comfortable position to sit with my laptop without my back screaming so I'm trying to get as much done as I can | 18:57 |
JayF | not just for like, productivity sake but for my own sanity lol | 18:57 |
opendevreview | Ivan Anfimov proposed openstack/ironic master: Remove tags from README https://review.opendev.org/c/openstack/ironic/+/948479 | 19:49 |
opendevreview | Ivan Anfimov proposed openstack/ironic master: Remove tags from README https://review.opendev.org/c/openstack/ironic/+/948479 | 19:50 |
opendevreview | Ivan Anfimov proposed openstack/ironic master: Remove tags from README https://review.opendev.org/c/openstack/ironic/+/948479 | 22:23 |
opendevreview | Ivan Anfimov proposed openstack/ironic master: Remove tags from README https://review.opendev.org/c/openstack/ironic/+/948479 | 22:23 |
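
The failure shermanm traced (an error handler that itself raises because the federated user name does not fit the `user` column) can be illustrated with a small standalone sketch. Nothing below is Ironic code: `FakeHistoryTable`, `fit_column`, `record_error`, and the 32-character width are hypothetical names and values chosen only to show why truncating before the write keeps the handler from failing. It roughly corresponds to the "ugly workaround on the error handler" idea, not to the upstream fix.

```python
from dataclasses import dataclass, field

# Assumed column width, for illustration only; check the real schema.
USER_COLUMN_WIDTH = 32


def fit_column(value, width=USER_COLUMN_WIDTH):
    """Truncate a string so it fits a fixed-width column; pass None through."""
    if value is None or len(value) <= width:
        return value
    return value[:width]


@dataclass
class FakeHistoryTable:
    """Hypothetical stand-in for a table with a fixed-width 'user' column."""
    width: int = USER_COLUMN_WIDTH
    rows: list = field(default_factory=list)

    def insert(self, event, user):
        # Mimics a strict-mode database rejecting an oversized value.
        if user is not None and len(user) > self.width:
            raise ValueError("Data too long for column 'user'")
        self.rows.append((event, user))


def record_error(table, event, user):
    """Record an error event without letting an oversized user name raise."""
    table.insert(event, fit_column(user, table.width))


federated_user = "a" * 64  # 64 characters, as in the environment described
table = FakeHistoryTable()
record_error(table, "inspect failed", federated_user)
```

Truncation loses information, so a real deployment would more likely disable the history write or upgrade, as discussed in the log.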