opendevreview | Nahian Pathan proposed openstack/sushy master: Support expanded Chassis and Storage for redfish https://review.opendev.org/c/openstack/sushy/+/955211 | 01:09 |
opendevreview | Jacob Anders proposed openstack/ironic master: [WIP] Skip initial reboot to IPA when updating firmware out-of-band https://review.opendev.org/c/openstack/ironic/+/954311 | 01:36 |
opendevreview | Jacob Anders proposed openstack/ironic master: [WIP] Skip initial reboot to IPA when updating firmware out-of-band https://review.opendev.org/c/openstack/ironic/+/954311 | 02:35 |
opendevreview | Jacob Anders proposed openstack/ironic master: [WIP] Skip initial reboot to IPA when updating firmware out-of-band https://review.opendev.org/c/openstack/ironic/+/954311 | 02:55 |
rpittau | good morning ironic! happy friday! o/ | 06:59 |
queensly[m] | Good morning :) | 07:53 |
dtantsur | TheJulia: wow, that's truly impressive given that it has 0 unit tests and I haven't tested it at all :D I'll try to finish it today | 11:23 |
TheJulia | dtantsur: yeah, I did have to set min_workers, and set the rejection function to behave differently, but realistically those are minor compared to watching the thread pool hover between 26-31 threads | 13:14 |
TheJulia | FWIW, I *think* in this new model, it wouldn't be a bad idea if debug logging is enabled to just log the thread count. That happens to be super useful. | 13:14 |
TheJulia | I guess I'll also add it as a metric at some point | 13:14 |
dtantsur | yeah, I'll definitely add some logging | 13:15 |
dtantsur | TheJulia: if you have a minute this morning, could you scan https://bugs.launchpad.net/ironic/+bug/2117178 for obvious reasons it would not work? | 13:15 |
TheJulia | dtantsur: I was sort of thinking on a periodic so we don't trigger any such logging on any periodic trigger | 13:16 |
TheJulia | or worker spawn | 13:16 |
dtantsur | TheJulia: it's actually an interesting bit of information whether we create/drop threads too often. We can always disable it via oslo.log settings if it becomes noisy. | 13:17 |
TheJulia | true | 13:18 |
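[The thread-count logging discussed above could be sketched roughly like this; this is an illustrative standalone sketch using `concurrent.futures`, not Ironic's actual pool implementation, and `start_pool_stats_logger` is a hypothetical name:]

```python
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)

def start_pool_stats_logger(pool: ThreadPoolExecutor, interval: float = 60.0) -> None:
    """Log the pool's current worker-thread count every ``interval`` seconds.

    ThreadPoolExecutor keeps its workers in the private ``_threads`` set,
    so reading its size here is a debugging convenience, not public API.
    A periodic timer avoids logging on every task submit or worker spawn.
    """
    def _log() -> None:
        LOG.debug("thread pool size: %d workers", len(pool._threads))
        timer = threading.Timer(interval, _log)
        timer.daemon = True  # don't keep the process alive for logging
        timer.start()

    _log()
```

[Since this logs at DEBUG level, it can be silenced per-module via the usual oslo.log/`logging` level settings if it gets noisy, as noted above.]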
TheJulia | One thing I *did* notice, my spread on power sync settled in at ~8:30 (min:sec) | 13:18 |
dtantsur | with how many nodes? | 13:18 |
TheJulia | with eventlet, ~6:40, *however* with some tuning I saw similar numbers on full threads too when the stars aligned until we ran out of new workers (then, it would go to 12 to 15 minutes) | 13:19 |
TheJulia | A little over 5k configured, ~4500 eligible for power state sync | 13:20 |
TheJulia | 1 second delay on *each* sync, which honestly is still kind of impressive | 13:20 |
dtantsur | That's not too bad, we can optimize further | 13:20 |
TheJulia | yeah, that was what I was thinking as well | 13:20 |
TheJulia | (and, I likely need to get to a cleaner state with less debug logging added as well (which can slow things down too)) | 13:21 |
TheJulia | so we can re-measure once we get some more stuff moving forward | 13:21 |
TheJulia | so regarding the launchpad item, I was sort of thinking similar for the power state syncing, even though in the current model it works well, it would be a boost if we could do async power state calls and then reconcile | 13:26 |
dtantsur | Yep. I have the power sync in mind too, it's just harder to implement | 13:27 |
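[The "do power state calls concurrently, then reconcile" idea might look roughly like the sketch below; `get_power_state` and `reconcile` are hypothetical placeholders for whatever the driver and conductor layers provide, not Ironic's real interfaces:]

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def sync_power_states(nodes, get_power_state, reconcile, max_workers=8):
    """Query every node's power state in parallel, then reconcile.

    Errors are collected rather than raised so that one slow or broken
    BMC cannot stall the whole sync pass; reconciliation runs after all
    queries complete.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(get_power_state, node): node for node in nodes}
        for fut in as_completed(futures):
            node = futures[fut]
            try:
                results[node] = fut.result()
            except Exception as exc:  # noqa: BLE001 - keep the pass going
                errors[node] = exc
    for node, state in results.items():
        reconcile(node, state)
    return results, errors
```

[A synchronous driver like ipmitool would simply occupy one worker per call here, which matches the point below that IPMI stays slow but still fits the model.]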
TheJulia | ipmi will always be synchronous and slower, but... it's also ipmi and we'll have a nice big AI generated sign in a project update saying "stop using ipmi" | 13:27 |
dtantsur | and yet, we require ipmitool by default :) | 13:27 |
TheJulia | oh, much, but I think the design still works super well, and doesn't have a 60 second update frequency requirement | 13:28 |
TheJulia | I thought we dialed that back | 13:28 |
TheJulia | of course, drivers still using it under the hood is a thing :( | 13:28 |
dtantsur | TheJulia: I figured why you had problems with rejection. I'm using queue_size/workers, so the pool never grows until tasks get queued. | 14:24 |
TheJulia | ahh, that makes sense | 14:25 |
dtantsur | I think it's a more confusing metric, so I'll switch to idle threads instead. | 14:28 |
TheJulia | ++ | 14:29 |
TheJulia | When I was glancing back at it trying to figure it out, that was sort of the gut feeling I was having | 14:29 |
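[The "grow on idle threads rather than queue depth" heuristic discussed above could be captured as something like this; the function name and thresholds are purely illustrative, not Ironic's actual policy:]

```python
def should_spawn_worker(current: int, max_workers: int, idle: int) -> bool:
    """Decide whether the pool should add a worker thread.

    Growing only when tasks are already queued means the pool never
    expands until work has waited (the confusing behavior noted above);
    growing as soon as no worker is idle keeps headroom ready instead.
    """
    if current >= max_workers:
        return False
    return idle == 0  # grow the moment every worker is busy
```

[With a queue-depth trigger, rejections can fire before the pool ever grows; keying on idle workers makes the growth condition observable directly from the pool's state.]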
alegacy | Trying to use bifrost for the first time. Nodes seem to be stuck in "clean failed". Is there an easy way to blow everything away and start again ... or is this a delete the bifrost VM and start again situation? | 14:39 |
TheJulia | If I'm recalling correctly, not terribly, but I'm sort of surprised the nodes are stuck in clean failed state, did they just not boot? the whole test environment setup is modeled to start as a working environment, so it's sort of weird it failed | 14:44 |
alegacy | dunno. showing as shut-off in virsh | 14:46 |
TheJulia | what does the last_error field indicate? and what did you use as your options when you installed? | 14:50 |
alegacy | last_error is: "Timeout reached while cleaning the node. Please check if the ramdisk responsible for the cleaning is running on the node. Failed on step {}" | 14:55 |
alegacy | no options... just "bifrost testenv" followed by "bifrost install --network-interface eth0 --testenv" | 14:55 |
TheJulia | oh, so that explains it | 14:56 |
TheJulia | re-install without passing a network interface | 14:56 |
TheJulia | that option is if you want to bind bifrost to a physical network outside of the host for physical machine testing/work | 14:57 |
alegacy | ah, so if testenv then no network-interface required? | 14:57 |
TheJulia | correct, it defaults out to virbr0 | 14:57 |
TheJulia | at which point, once you've reinstalled, you could likely do "baremetal node manage $name_or_id" and then do "baremetal node provide $name_or_id" | 14:58 |
alegacy | k, trying again | 15:02 |
clif | o/ gm ironic, is there a guide on how to update docs when adding a new attribute to an object such as Port? | 15:20 |
clif | I see there's a script called regenerate-samples.sh under ironic/api-ref but it seems like it hasn't been updated in quite a while | 15:21 |
clif | is it just manually finding sections that need to be updated and doing it? | 15:22 |
TheJulia | clif: no guide really, depends on what layer since the object is allowed to have fields which are not exposed by the rest API, so object -> rpc (version should change) -> api response and sample | 15:36 |
alegacy | TheJulia: that worked... I'm back in business. Thank you! | 15:46 |
TheJulia | bifrost is largely modeled around changing the source and re-installing | 16:00 |
TheJulia | keeps the db around, and allows you to kind of figure out what is going on and then you can navigate the state of being for a node based upon what is going on | 16:00 |
clif | TheJulia: thanks, do you know if we would want to expose the "vendor" and "class" port fields in the API? I would think we would want to expose them, but then again maybe not. | 19:52 |
TheJulia | we likely do want to | 19:59 |
TheJulia | I mean, how else would someone be able to set them :) | 19:59 |
-opendevstatus- | NOTICE: The Gerrit service on review.opendev.org will be offline briefly for a configuration and version update, but should return to service momentarily | 20:06 |
clif | true :) | 20:07 |
opendevreview | Clif Houck proposed openstack/ironic master: Add a new 'vendor' field to the Port object https://review.opendev.org/c/openstack/ironic/+/954966 | 20:08 |
opendevreview | Doug Goldstein proposed openstack/ironic master: allow running inspection hooks on redfish interface https://review.opendev.org/c/openstack/ironic/+/933066 | 22:38 |