opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for ports node_uuid https://review.opendev.org/c/openstack/ironic/+/862933 | 00:01 |
---|---|---|
TheJulia | JayF: ack ack | 00:10 |
hjensas | JayF: added the perf test data, it's both good and bad. I.e the plain db_api.get_port_list is a lot slower, but the more complete "API" test is a lot faster. | 00:11 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for node chassis_uuid https://review.opendev.org/c/openstack/ironic/+/864802 | 00:11 |
TheJulia | hjensas: so, it is a lazy population... I *wonder* if you change that to selectinload, if that would change the numbers dramatically | 00:22 |
TheJulia | since lazy defers the data population, and might speed up the db_api method call | 00:22 |
TheJulia | hmm lazy=joined | 00:27 |
hjensas | TheJulia: I will run a test with selectinload tomorrow. | 00:27 |
TheJulia | https://docs.sqlalchemy.org/en/20/orm/relationship_api.html#sqlalchemy.orm.relationship.params.lazy | 00:28 |
TheJulia | select is the parameter | 00:28 |
TheJulia | joined is a deferred lazy load | 00:28 |
TheJulia | select would cause two distinct queries | 00:28 |
TheJulia | so ports, then "hi db, give me this value for all of these id's kthxbai" | 00:28 |
TheJulia | hjensas: have a wonderful sleep() :) | 00:30 |
hjensas | TheJulia: that might help, I need to respin my devstack instance to test. But using select vs join sure changes the SQL generated - https://paste.opendev.org/show/be5fV2ynyF1W6cD30kjH/ | 00:31 |
TheJulia | yeah, there is a tax on the lazy that is a bit more overhead in later population | 00:32 |
TheJulia | but it is one of those "we should check both to be sure" things | 00:33 |
JayF | Lazy uses more sqla magic, right? | 00:33 |
JayF | It's like opting in to some magic? | 00:33 |
TheJulia | lazy is | 01:15 |
TheJulia | so, "select" is a lazy load, "selectin" is the double query | 01:15 |
TheJulia | joined might have unexpected results.... | 01:21 |
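For reference, SQLAlchemy's `lazy=` strategies discussed above behave roughly as follows. `"select"` (the default) issues a separate SELECT when the relationship attribute is first accessed. `"joined"` loads everything in one statement via a LEFT OUTER JOIN (an INNER JOIN with `innerjoin=True`). `"selectin"` runs the main query plus one `WHERE ... IN (...)` query for the related rows. The sketch below reproduces those three query shapes by hand with the standard-library `sqlite3` module and counts the statements each one issues. It does not use SQLAlchemy itself, and the `nodes`/`ports` schema is a hypothetical simplification of ironic's tables.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory DB: 3 nodes, 6 ports (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, uuid TEXT NOT NULL);
        CREATE TABLE ports (id INTEGER PRIMARY KEY, address TEXT,
                            node_id INTEGER REFERENCES nodes(id));
        """
    )
    conn.executemany("INSERT INTO nodes (id, uuid) VALUES (?, ?)",
                     [(i, f"node-{i}") for i in range(1, 4)])
    conn.executemany("INSERT INTO ports (address, node_id) VALUES (?, ?)",
                     [(f"52:54:00:00:00:0{i}", i % 3 + 1) for i in range(6)])
    return conn


class CountingDB:
    """Wraps a connection and counts the statements executed through it."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def load_lazy_select(db):
    # lazy="select": ports first, then one SELECT per node on first access.
    # The dict mimics SQLAlchemy's identity map, which skips the query for a
    # many-to-one target that is already loaded.
    ports = db.execute("SELECT address, node_id FROM ports ORDER BY id").fetchall()
    loaded, result = {}, []
    for address, node_id in ports:
        if node_id not in loaded:
            row = db.execute("SELECT uuid FROM nodes WHERE id = ?",
                             (node_id,)).fetchone()
            loaded[node_id] = row[0] if row else None
        result.append((address, loaded[node_id]))
    return result


def load_joined(db):
    # lazy="joined": a single statement with a LEFT OUTER JOIN.
    return db.execute(
        "SELECT ports.address, nodes.uuid FROM ports "
        "LEFT OUTER JOIN nodes ON nodes.id = ports.node_id ORDER BY ports.id"
    ).fetchall()


def load_selectin(db):
    # lazy="selectin": the ports query plus one "WHERE id IN (...)" query.
    ports = db.execute("SELECT address, node_id FROM ports ORDER BY id").fetchall()
    ids = sorted({node_id for _, node_id in ports if node_id is not None})
    nodes = {}
    if ids:
        marks = ",".join("?" * len(ids))
        nodes = dict(db.execute(
            f"SELECT id, uuid FROM nodes WHERE id IN ({marks})", ids).fetchall())
    return [(address, nodes.get(node_id)) for address, node_id in ports]


def run(strategy):
    """Return (rows, number_of_statements) for one loading strategy."""
    db = CountingDB(make_db())
    rows = strategy(db)
    return rows, db.queries
```

With 6 ports spread over 3 nodes, `run(load_lazy_select)` issues 4 statements, `run(load_joined)` issues 1 and `run(load_selectin)` issues 2, and all three return the same rows. Fewer statements is not automatically faster: the JOIN repeats the node columns on every port row, which is one reason to measure both, as suggested above.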
opendevreview | Steve Baker proposed openstack/bifrost master: Support PXE network boot with grub https://review.opendev.org/c/openstack/bifrost/+/807220 | 02:06 |
janders | good morning Ironic o/ | 07:10 |
janders | ajya I finally had a chance to look at test improvements you suggested for the SettingsURI fix (thanks!) - left you a response: https://review.opendev.org/c/openstack/sushy/+/856597/comments/e04fa10b_2b4ac1e9 - let me know what you think, TY! | 07:11 |
opendevreview | kamlesh chauvhan proposed openstack/sushy master: Fix volume deletion on newer iDRACs https://review.opendev.org/c/openstack/sushy/+/864845 | 07:47 |
kubajj | Good morning Ironic | 07:52 |
arne_wiebalck | Good morning kubajj and Ironic! | 07:55 |
ajya | Morning, janders, kubajj, arne_wiebalck and Ironic | 08:13 |
ajya | janders: thanks, I left a comment | 08:13 |
*** akahat|ruck is now known as akahat|ruck|lunch | 08:14 | |
arne_wiebalck | hey ajya o/ | 08:25 |
rpittau | good morning ironic! o/ | 08:35 |
opendevreview | Vanou Ishii proposed openstack/python-ironicclient master: Expand boot device choice https://review.opendev.org/c/openstack/python-ironicclient/+/864847 | 08:42 |
*** akahat|ruck|lunch is now known as akahat|ruck | 09:22 | |
janders | ajya WDYT about this approach to improving the tests in question: https://paste.openstack.org/show/bedD9rBJV08zDsuyhZ4R/ (L24-25). This is trying to ensure the GET request for eTag is hitting the expected path (System or Settings) | 10:04 |
ajya | janders: I guess this adds some extra checks, but might not be so obvious what it's testing. I'm not pushing hard for this, wait for other reviewers input. | 10:25 |
janders | ajya TY | 10:25 |
janders | dtantsur if you have time it would be great to hear your thoughts on this part - would be great to get this thing to merge :) | 10:26 |
janders | yeah I'm not super confident about using those GETs, if you folks have any pointers how to do this nicely with side_effects I can have a go tomorrow | 10:44 |
janders | (I mean using asserts to look for GET calls) | 10:45 |
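One way to do what janders asks (asserting which GET was issued, using `side_effect`) is sketched below with the standard `unittest.mock` API. `fetch_etag`, `fake_get` and both Redfish paths are hypothetical stand-ins, not sushy's code. The idea is that a path-keyed `side_effect` fails loudly (here with `KeyError`) on an unexpected URL, while `assert_called_once_with` pins the exact path that was requested.

```python
from unittest import mock

# Hypothetical Redfish paths; real tests would use the resource's own URIs.
SYSTEM_PATH = "/redfish/v1/Systems/1"
SETTINGS_PATH = "/redfish/v1/Systems/1/Settings"


def fetch_etag(conn, settings_path=None):
    """Hypothetical helper: GET the target resource and return its ETag."""
    path = settings_path or SYSTEM_PATH
    return conn.get(path).headers.get("ETag")


def fake_get(path):
    """side_effect answering per path; any other path raises KeyError."""
    etags = {SYSTEM_PATH: 'W/"system"', SETTINGS_PATH: 'W/"settings"'}
    response = mock.Mock()
    response.headers = {"ETag": etags[path]}
    return response


# Usage inside a test: both the returned ETag and the recorded call show
# which path was hit.
conn = mock.Mock()
conn.get.side_effect = fake_get
assert fetch_etag(conn, SETTINGS_PATH) == 'W/"settings"'
conn.get.assert_called_once_with(SETTINGS_PATH)
```

Keying the fake response on the path means the test checks the routing decision twice: through the value that comes back and through the recorded call arguments.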
iurygregory | morning Ironic | 11:32 |
dtantsur | janders: absolutely no context, sorry | 12:40 |
dtantsur | which patch are we talking about? | 12:40 |
TheJulia | Good morning! | 13:14 |
jrosser | is there anywhere in the docs where the 'lifecycle' of whatever is providing the dhcp lease for an ipmi port is described? | 13:35 |
jrosser | right now i have inspector offering dhcp leases to anything that wants one, but those seem ephemeral and i can't see how those become persistent once a node is created/inspected | 13:36 |
jrosser | *dnsmasq adjacent to inspector.... | 13:36 |
TheJulia | so you want to manage your BMCs using dhcp? | 13:40 |
jrosser | well, pretty much by accident that is what I'm doing now :) | 13:41 |
TheJulia | okay! | 13:41 |
TheJulia | Generally we recommend static addressing of BMCs since if you lose your dhcp server, you'll be in a huge world of hurt | 13:41 |
jrosser | and what happened was we did some firmware updates that involved taking the network ports down, and the leases expired | 13:41 |
jrosser | and now everything is $randomised! | 13:41 |
TheJulia | uhh | 13:41 |
TheJulia | that is exactly why we generally recommend that | 13:42 |
jrosser | so i was wondering if i just misunderstood and there was a step needed to convert the inspector/dnsmasq leases into neutron ports | 13:42 |
janders | dtantsur apologies. We were talking about this comment in the Lenovo - fixing SettingsURI patch : https://review.opendev.org/c/openstack/sushy/+/856597/comments/e04fa10b_2b4ac1e9 | 13:42 |
TheJulia | the inspector dnsmasq has always been for the connected interfaces for netbooting/hardware discovery, not the BMCs themselves | 13:43 |
janders | dtantsur if you have any inputs on the need to improve these tests (and even better hints how to go about it) that would be very much appreciated | 13:43 |
jrosser | ah ok i think we have struggled to find a reference architecture for this | 13:43 |
jrosser | and getting the bmc leases from inspector dnsmasq has indeed made it work pretty much by accident | 13:44 |
TheJulia | jrosser: not a step per se, more disjointed usage since most operators when using something like auto-discovery just discover the machine on a base vlan, and then use network switch management via an ml2 driver, or human port management to move the base network interfaces to a more appropriate network for the long term operation of the physical node | 13:44 |
TheJulia | yeah, we deadlocked on reference architectures because there are many ways this can all be plugged together, and that doesn't translate to workflows easily either :( | 13:45 |
TheJulia | most operators I know run static for the bmcs on an entirely separate isolated management network | 13:45 |
TheJulia | but yeah, it is easy to stumble into such a config :\ | 13:45 |
TheJulia | Also, we can't really just know what the bmc's IP is all the time. If ipmi, the bmc if memory serves says it is 0.0.0.0 | 13:46 |
TheJulia | Do you have a physical asset inventory? | 13:46 |
jrosser | at the moment we have some ansible that creates the baremetal nodes | 13:47 |
jrosser | so "yes" in as much as the nodes are described in ansible group vars | 13:47 |
TheJulia | okay.. I think you're going to need to get into the bmc of each node, and reconcile the addressing back out by hand and/or set it up either with a static lease provided by some dhcp service or just a static address, and then ensure that matches what ironic is configured for to perform power/boot management | 13:48 |
TheJulia | Which is kind of why i was thinking... do you have an asset inventory because typically accountants need serial numbers and sometimes those lists have mac addresses | 13:49 |
TheJulia | this is one of those things physical hardware access makes a ton easier, fwiw, since service tags are generally a thing | 13:49 |
jrosser | i think that we were seeing something looking really close to auto discovery / auto inspect and thinking how neat that was | 13:50 |
jrosser | like zero touch nearly | 13:50 |
TheJulia | most zero-touch things I've seen completely ignore the long term persistent management since they are just the first go at the machine's deployment | 13:51 |
jrosser | fwiw i am heavily revisiting the code and documentation in openstack-ansible as a side effect of deploying ironic for $job so i'd like to make sure what i do and document is reasonable | 13:52 |
TheJulia | jrosser: awesome, if there is any thoughts/guidance/opinion/info we can help leverage please just let us know | 13:52 |
TheJulia | fwiw, I know several operators who have pipelined ironic into zero touch of their stuff, but they have a hardware inventory to run with which helps advance setting of dhcp reservations for bmcs and supplies them passwords with just a little glue to fit their business process and vendor process on each side of getting the hardware in on the loading dock. | 13:59 |
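The inventory-driven approach described above can be sketched as a small generator that turns a hardware inventory into dnsmasq static reservations (`dhcp-host=<mac>,<ip>,<hostname>`), so that a BMC gets the same address back whenever it renews or re-requests a lease. This still depends on the DHCP server being up, which is why static addressing remains the usual recommendation. The inventory layout (`bmc_mac`, `bmc_address` keys) and the `-bmc` hostname suffix are hypothetical; adapt them to wherever the data actually lives, such as Ansible group vars.

```python
import ipaddress
import re

MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


def dnsmasq_bmc_reservations(inventory):
    """Render dnsmasq dhcp-host lines from a {name: {...}} inventory."""
    lines = []
    seen_ips = set()
    for name, entry in sorted(inventory.items()):
        mac = entry["bmc_mac"].strip().lower()
        if not MAC_RE.match(mac):
            raise ValueError(f"{name}: invalid BMC MAC {mac!r}")
        # IPv4 only here: dnsmasq expects IPv6 addresses in dhcp-host to be
        # bracketed. IPv4Address() raises a ValueError subclass if malformed.
        ip = ipaddress.IPv4Address(entry["bmc_address"])
        if ip in seen_ips:
            raise ValueError(f"{name}: duplicate BMC address {ip}")
        seen_ips.add(ip)
        lines.append(f"dhcp-host={mac},{ip},{name}-bmc")
    return lines


inventory = {
    "node2": {"bmc_mac": "52:54:00:AA:BB:02", "bmc_address": "192.0.2.12"},
    "node1": {"bmc_mac": "52:54:00:aa:bb:01", "bmc_address": "192.0.2.11"},
}
print("\n".join(dnsmasq_bmc_reservations(inventory)))
```

Validating MACs and rejecting duplicate addresses up front catches inventory mistakes before they turn into a BMC answering on the wrong IP.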
TheJulia | or static addresses pre-programmed by the factory | 14:00 |
jrosser | right - everything shipped with unique bmc passwords these days means some process is needed there | 14:00 |
TheJulia | yeah, for individual off the shelf orders that is a pain | 14:00 |
jrosser | also completely different angle, what is the situation with openbmc and ARM systems? | 14:00 |
jrosser | i am about to get some Ampere stuff which is both of those | 14:01 |
TheJulia | for bulk like whole rack orders, most vendors will provide a list of the passwords upfront | 14:01 |
TheJulia | oh good, because Ampere uses ironic in their lab | 14:01 |
TheJulia | w/r/t openbmc, they implemented enough ipmi that things kind of work as I understand it, and redfish apparently also just works, but I personally can't speak to it | 14:02 |
TheJulia | Arm wise, you can set the bootloader you want, I think ampere went with ipxe and had to build their own ipxe binary because of a bug or something | 14:02 |
TheJulia | (and distributors don't ship arm ipxe) | 14:03 |
jrosser | it's great to hear they are using ironic internally - they've been very helpful so far so I can just ask our local FAE | 14:03 |
TheJulia | you could likely use pre-built grub if you really wanted to, it is just not as flexible and doesn't contain fallback/retry logic | 14:03 |
TheJulia | also you can't just fall into introspection easily on grub | 14:04 |
TheJulia | anyone remember what irc handle morgan is using these days? | 14:07 |
johnthetubaguy | JayF: TheJulia: as a heads up, I think I found a way to finally fix that placement race when nova instances are deleted, then the ironic node goes into automatic cleaning, but there is a window when we keep handing out that ironic node as a valid candidate: https://review.opendev.org/c/openstack/nova/+/864773 and a related fix on retrying when placement gets out of sync has also just got approved: | 14:16 |
johnthetubaguy | https://review.opendev.org/c/openstack/nova/+/842478 Hopefully a bunch of failed builds that can be avoided, particularly when you are trying to patch supercomputers powered by OpenStack ;) | 14:16 |
TheJulia | johnthetubaguy: that is *awesome* news | 14:17 |
TheJulia | johnthetubaguy: Is there a path to backport it? | 14:19 |
johnthetubaguy | the general "feeling" was yes for both patches, I think | 14:21 |
TheJulia | nice | 14:23 |
TheJulia | hjensas: sorry, I looked at the wrong field, it was like 2500%, not 3500% | 14:23 |
TheJulia | improvement that is | 14:23 |
hjensas | TheJulia: no problem, still a lot. (I should've calculated myself before posting on commit msg.) | 14:56 |
opendevreview | kamlesh chauvhan proposed openstack/sushy master: Retry on iDRAC SYS518 errors for all requests https://review.opendev.org/c/openstack/sushy/+/864911 | 15:10 |
dtantsur | janders: I fell asleep, apologies.. will take a look | 16:05 |
* dtantsur suspects he needs a large break pretty soon | 16:05 | |
JayF | I feel that :) I'm off W/Th/Fri next week to visit family for American Thanksgiving and it's much needed | 16:46 |
TheJulia | JayF: fwiw, nobodycam mentioned to me that we should expect traction on https://review.opendev.org/c/openstack/ironic-python-agent/+/566544 after the new year | 17:12 |
JayF | I mean, it's not something I'm tracking or worried about :D | 17:12 |
JayF | I'm sure they are the most motivated to support their own hardware :D | 17:13 |
rpittau | good night! o/ | 17:17 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for ports node_uuid https://review.opendev.org/c/openstack/ironic/+/862933 | 18:00 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for port groups node_uuid https://review.opendev.org/c/openstack/ironic/+/864781 | 18:00 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for node chassis_uuid https://review.opendev.org/c/openstack/ironic/+/864802 | 18:00 |
hjensas | TheJulia: I changed lazy to "selectin" - with "select" I end up with error related to no active session. | 18:02 |
hjensas | TheJulia: selecting seems to be marginally better than "joined" and "joined" + "innerjoin=True" https://paste.opendev.org/show/bVCbtyXmPDd44OVskrea/ | 18:02 |
hjensas | s/selecting/selectin/ | 18:02 |
TheJulia | oh, right... yeah you can't chain anymore | 18:06 |
TheJulia | sorry, nest | 18:06 |
TheJulia | yeah maybe 200ms which is still surprising, I guess that means there was some result set deduplication occurring | 18:07 |
TheJulia | so selectin it is, I guess | 18:08 |
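The "no active session" error with `lazy="select"` is consistent with how lazy loading works: the related row is only fetched when the attribute is first read, so reading it after the session has closed cannot succeed (SQLAlchemy raises `DetachedInstanceError` in that case), whereas an eager strategy such as `selectin` has already populated the value. The stdlib sketch below mimics this, with a closed `sqlite3` connection standing in for the ended session. `LazyPort`, `EagerPort`, `list_ports` and the schema are hypothetical.

```python
import sqlite3


class LazyPort:
    """node_uuid is fetched on first access, like lazy="select"."""

    def __init__(self, conn, address, node_id):
        self._conn = conn
        self.address = address
        self._node_id = node_id

    @property
    def node_uuid(self):
        row = self._conn.execute(
            "SELECT uuid FROM nodes WHERE id = ?", (self._node_id,)).fetchone()
        return row[0] if row else None


class EagerPort:
    """node_uuid is filled in while the connection is open, like "selectin"."""

    def __init__(self, address, node_uuid):
        self.address = address
        self.node_uuid = node_uuid


def list_ports(eager):
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, uuid TEXT);"
        "CREATE TABLE ports (address TEXT, node_id INTEGER);"
        "INSERT INTO nodes VALUES (1, 'node-1');"
        "INSERT INTO ports VALUES ('52:54:00:00:00:01', 1);"
    )
    try:
        rows = conn.execute("SELECT address, node_id FROM ports").fetchall()
        if not eager:
            return [LazyPort(conn, a, n) for a, n in rows]
        # Second query with IN (...), as selectin does.
        ids = sorted({n for _, n in rows})
        marks = ",".join("?" * len(ids))
        uuids = dict(conn.execute(
            f"SELECT id, uuid FROM nodes WHERE id IN ({marks})", ids).fetchall())
        return [EagerPort(a, uuids.get(n)) for a, n in rows]
    finally:
        conn.close()  # the "session" ends before the caller reads the objects


print(list_ports(eager=True)[0].node_uuid)  # node-1
try:
    list_ports(eager=False)[0].node_uuid
except sqlite3.ProgrammingError as exc:
    print("lazy load failed:", exc)
```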
hjensas | add_port_filter_node_uuid_is_not_none <- removing that shaves ~500ms | 18:13 |
TheJulia | got a link to that method? | 18:16 |
hjensas | https://review.opendev.org/c/openstack/ironic/+/862933/6/ironic/db/sqlalchemy/api.py#214 | 18:17 |
hjensas | That check for not none is to "compensate" for removing what was added for https://launchpad.net/bugs/1748893 - i.e https://review.opendev.org/c/openstack/ironic/+/862933/6/ironic/api/controllers/v1/port.py#b169 | 18:19 |
TheJulia | oh! Interesting | 18:20 |
hjensas | I kind of feel that should not be an issue anymore. | 18:20 |
TheJulia | Okay, yeah | 18:20 |
TheJulia | I was thinking it shouldn't be either | 18:20 |
TheJulia | since there is no background query occurring anymore, the join result failure should just not have a value.... or might not show in the list | 18:21 |
TheJulia | oh | 18:21 |
TheJulia | it is selectin, so I think it would just be None | 18:21 |
TheJulia | so yeah, that really shouldn't be an issue | 18:21 |
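For readers unfamiliar with `association_proxy`: on a scalar relationship it exposes an attribute of the related object (here `Port.node_uuid` reading `Port.node.uuid`) and yields `None` when the related object is missing, which matches the behavior TheJulia expects above. The descriptor below is a hypothetical plain-Python analogue written only to show that behavior. It is not SQLAlchemy's implementation and supports no querying or assignment.

```python
class ScalarProxy:
    """Read-only analogue of a scalar association_proxy (illustrative only)."""

    def __init__(self, target_attr, value_attr):
        self.target_attr = target_attr
        self.value_attr = value_attr

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        target = getattr(obj, self.target_attr)
        # A missing related object yields None instead of raising.
        return None if target is None else getattr(target, self.value_attr)


class Node:
    def __init__(self, uuid):
        self.uuid = uuid


class Port:
    node_uuid = ScalarProxy("node", "uuid")

    def __init__(self, address, node=None):
        self.address = address
        self.node = node


print(Port("52:54:00:00:00:01", Node("node-1")).node_uuid)  # node-1
print(Port("52:54:00:00:00:02").node_uuid)                  # None
```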
JayF | you could possibly write a test to validate that as well, if it's a concern | 18:21 |
hjensas | so, ports should always be gone before the node is gone - https://opendev.org/openstack/ironic/src/branch/master/ironic/db/sqlalchemy/api.py#L801-L867 | 18:31 |
* JayF wonders if replicated read slaves are guaranteed to update in order | 18:31 | |
JayF | yes, at least for mysql | 18:31 |
hjensas | unless there is some replica that did not sync yet. | 18:31 |
TheJulia | yeah, its a weird edge case | 18:34 |
TheJulia | and largely revolves around the results being generated and being iterated through | 18:34 |
TheJulia | closing the time span gap also fixes it | 18:34 |
TheJulia | because you're not going back and re-consulting the db after the initial list was created 3-4 moments back | 18:35 |
hjensas | wonder if it would actually be faster to just continue if port.node_uuid is None in the api controller? | 18:35 |
TheJulia | I seem to remember ring repl could get out of order.... but I don't think you can do that anymore | 18:35 |
TheJulia | maybe | 18:36 |
TheJulia | although generally closing the window greatly reduces the possibility | 18:36 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for ports node_uuid https://review.opendev.org/c/openstack/ironic/+/862933 | 19:13 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for port groups node_uuid https://review.opendev.org/c/openstack/ironic/+/864781 | 19:13 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for node chassis_uuid https://review.opendev.org/c/openstack/ironic/+/864802 | 19:13 |
hjensas | I moved the node_uuid is None check to the API controller, afaict its overhead is far less invasive when done there. | 19:15 |
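Moving the check into the API controller amounts to skipping ports whose `node_uuid` resolves to `None` (for example, a node deleted while the port list was being built) during response serialization, instead of adding a predicate to the SQL query. A minimal sketch of that idea follows, with a hypothetical `Port` record and `ports_for_response` helper rather than ironic's actual controller code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Port:
    """Hypothetical stand-in for a port row with a proxied node_uuid."""
    uuid: str
    node_uuid: Optional[str]


def ports_for_response(ports: Iterable[Port]) -> Iterator[Port]:
    """Yield only ports that still resolve to a node."""
    for port in ports:
        if port.node_uuid is None:
            # The node disappeared between the query and serialization; skip
            # the port rather than returning it with a dangling reference.
            continue
        yield port


ports = [Port("p1", "n1"), Port("p2", None), Port("p3", "n2")]
print([p.uuid for p in ports_for_response(ports)])  # ['p1', 'p3']
```

One trade-off: if the limit is applied in the database and ports are then skipped in Python, a page of results can come back shorter than the requested limit, whereas a SQL-level filter keeps page sizes exact.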
stevebaker[m] | good morning | 19:33 |
TheJulia | good morning stevebaker[m] | 20:17 |
arne_wiebalck | TheJulia: I left some thoughts on https://review.opendev.org/c/openstack/ironic-specs/+/861803 | 20:39 |
arne_wiebalck | Good morning stevebaker[m] o/ | 20:39 |
stevebaker[m] | \o | 20:40 |
arne_wiebalck | .... aaaaand goodbye everyone, see you tmrw o/ :) | 20:40 |
stevebaker[m] | good night arne_wiebalck | 20:40 |
JayF | arne_wiebalck: I wrote you a novel in there for when you're back tomorrow :D | 21:39 |
JayF | tl;dr: conductors and computes have different scaling semantics and tying them together is a bad idea as a result | 21:40 |
TheJulia | brraaaaaains | 22:35 |
TheJulia | JayF: I think you kind of explained it nicely, fwiw | 22:47 |
TheJulia | it is so drastically different, and given it works, just not quickly... it kind of becomes deceiving | 22:47 |
JayF | it's not even that like, in some common deployment types they won't, as a coincidence, have similar scaling sizes | 22:48 |
JayF | but if you have someone who, e.g. is managing large numbers of nodes but mainly only doing 1-5 provision/clean cycles during a multi-year node lifetime | 22:48 |
JayF | you don't need quite so many conductors :D | 22:48 |
JayF | and I'm sure there are examples which go the other direction; but with the iscsi driver dead I don't have an easy one to pick on lol | 22:48 |
TheJulia | True | 22:51 |
TheJulia | heh | 22:51 |
TheJulia | we also changed some of the logic to be more preferential/smarter | 22:51 |
TheJulia | like... always starting power sync with the oldest | 22:51 |
TheJulia | And everything is tunable | 22:52 |
TheJulia | And... we're fairly laid back about power sync just being a long running "thing" | 22:52 |
JayF | we have config to disable it, yeah? | 22:52 |
TheJulia | of course | 22:52 |
JayF | I can't remember if that was a downstream patch or config | 22:52 |
JayF | ok good | 22:53 |
TheJulia | you set the interval to 0 | 22:53 |
TheJulia | and it stops | 22:53 |
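TheJulia's two points above, starting power sync with the oldest nodes and setting the interval to 0 to stop it (in ironic this is the `[conductor]sync_power_state_interval` option), can be sketched as a batch selector. This reads "the oldest" as the nodes whose power state was synced longest ago, which is an assumption. `NodeState`, `select_power_sync_batch` and the batch size are hypothetical, and ironic's real periodic task is more involved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    """Hypothetical record: a node and when its power state was last synced."""
    uuid: str
    last_power_sync: float  # epoch seconds


def select_power_sync_batch(nodes: List[NodeState], interval: int,
                            batch_size: int) -> List[str]:
    """Pick up to batch_size node UUIDs, least recently synced first."""
    if interval <= 0:
        # An interval of 0 disables the periodic sync entirely.
        return []
    ordered = sorted(nodes, key=lambda n: n.last_power_sync)
    return [n.uuid for n in ordered[:batch_size]]


nodes = [NodeState("a", 300.0), NodeState("b", 100.0), NodeState("c", 200.0)]
print(select_power_sync_batch(nodes, interval=60, batch_size=2))  # ['b', 'c']
print(select_power_sync_batch(nodes, interval=0, batch_size=2))   # []
```

Ordering by last sync time means that when a cycle cannot cover every node, the ones skipped this time are first in line next time, so no node is starved indefinitely.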
JayF | fyi my office hours today are starting in ~7 minutes | 22:53 |
JayF | going to be working on periodic task for the shards stuff | 22:53 |
JayF | of course folks are welcome to come stop by and request reviews or similar, too | 22:54 |
TheJulia | link? | 22:55 |
JayF | youtube.com/jayofdoom | 22:56 |
JayF | going live at the top of the hourish | 22:56 |
TheJulia | cool cool | 22:56 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for port groups node_uuid https://review.opendev.org/c/openstack/ironic/+/864781 | 23:57 |
opendevreview | Harald Jensås proposed openstack/ironic master: Use association_proxy for node chassis_uuid https://review.opendev.org/c/openstack/ironic/+/864802 | 23:57 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!