iurygregory | happy new year Ironic o/ | 10:30 |
---|---|---|
dtantsur | happy new year folks! | 11:30 |
opendevreview | Takashi Kajinami proposed openstack/ironic master: Replace crypt module https://review.opendev.org/c/openstack/ironic/+/937173 | 13:07 |
opendevreview | Takashi Kajinami proposed openstack/ironic-python-agent master: Replace crypt module https://review.opendev.org/c/openstack/ironic-python-agent/+/937175 | 13:07 |
opendevreview | Adam Rozman proposed openstack/ironic master: disable ISO cache image format and safety checks https://review.opendev.org/c/openstack/ironic/+/938363 | 13:51 |
iurygregory | dtantsur, do you have thoughts on https://review.opendev.org/c/openstack/ironic/+/938108 ? | 13:55 |
iurygregory | looking for feedback if this would be a valid approach or not | 13:55 |
dtantsur | iurygregory: generally yes, but I'd prefer the retry to be more granular. | 13:59 |
dtantsur | As in, I'm not sure if the whole prepare_ramdisk is even re-entrant | 13:59 |
dtantsur | I also don't quite understand why it belongs in prepare_ramdisk, not somewhere in the firmware update code | 14:00 |
iurygregory | let me find the link with the logs 1min | 14:00 |
iurygregory | https://paste.opendev.org/show/bdf8ZjY9DXtJhzYGKTDm/ | 14:01 |
iurygregory | here it goes | 14:01 |
iurygregory | the error trying to reach the BMC happened during prepare_ramdisk, that is why I added it there | 14:01 |
dtantsur | okay, so File "/opt/stack/ironic/ironic/drivers/modules/redfish/firmware.py", line 184 seems to be the right place to add something like "are we ready to proceed already?" | 14:01 |
iurygregory | https://opendev.org/openstack/ironic/src/branch/master/ironic/drivers/modules/redfish/firmware.py#L184 ? | 14:02 |
dtantsur | I don't think we should even try to call prepare_ramdisk if the BMC is not working | 14:02 |
iurygregory | so probably we would need logic inside reboot_to_finish_step? | 14:02 |
dtantsur | does your logic apply to normal deployment? cleaning? RAID? | 14:03 |
dtantsur | if not, why put it in the generic code? | 14:03 |
dtantsur | the cause of the failure is firmware update, right? | 14:03 |
dtantsur | so this has to be handled by (and constrained within) the firmware update code | 14:04 |
iurygregory | yeah, it makes sense, since it's only during firmware update | 14:04 |
dtantsur | right. so probably try to poke the system/managers list and retry. | 14:05 |
dtantsur | bonus: you will be sure you're not retrying something entirely unrelated | 14:05 |
dtantsur | now, I have another concern. Are you sure this condition will get resolved at all until reboot? | 14:06 |
iurygregory | ok, I have the feeling it would: the BMC becomes responsive after some time, and if I trigger a manual Redfish simple update we don't need to send a reboot | 14:07 |
iurygregory | so I'm assuming it would be ok | 14:07 |
dtantsur | great | 14:07 |
dtantsur | then our plan should be enough | 14:08 |
iurygregory | will do manual testing on the HPE and Dell again to see how it goes | 14:08 |
iurygregory | Thanks for the help! | 14:08 |
dtantsur | sure | 14:11 |
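
The plan agreed above (inside the firmware update code, poke a cheap Redfish endpoint such as the Systems or Managers list and retry until the BMC answers, instead of retrying the whole prepare_ramdisk) could look roughly like the sketch below. This is a minimal, hypothetical illustration, not Ironic code: `wait_for_bmc`, `BMCNotReady`, the `probe` callable, and the default `attempts`/`interval` values are all assumptions introduced here. In a real driver, `probe` would wrap whatever Redfish client call lists the Systems or Managers collection, and the caught exception should be narrowed to connection-level errors so that unrelated failures are not retried.

```python
import time
from typing import Callable


class BMCNotReady(Exception):
    """Hypothetical error raised when the BMC never becomes responsive."""


def wait_for_bmc(probe: Callable[[], object], attempts: int = 10,
                 interval: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> object:
    """Call ``probe`` until it succeeds or ``attempts`` run out.

    ``probe`` is a hypothetical stand-in for a cheap read-only Redfish
    request (e.g. listing the Systems or Managers collection). Any
    exception it raises is treated as "BMC not ready yet".
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Success: the BMC answered, so the caller can safely move on
            # (for example, to preparing the ramdisk and rebooting).
            return probe()
        except Exception as exc:  # narrow to connection errors in real code
            last_error = exc
            if attempt < attempts:
                # Wait before the next probe; no sleep after the last one.
                sleep(interval)
    raise BMCNotReady(
        f"BMC still unresponsive after {attempts} attempts") from last_error
```

Keeping the retry around a single read-only probe, rather than around prepare_ramdisk as a whole, matches the two concerns raised in the discussion: prepare_ramdisk may not be re-entrant, and a narrow probe ensures that only BMC unavailability is retried, not something entirely unrelated.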
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!