iurygregory | and https://xkcd.com/927/ strikes again in Redfish \o/ | 01:01 |
iurygregory | Reason: unable to start inspection: The attribute Links/ManagedBy is missing from the resource /redfish/v1/Systems/1 | 01:01 |
iurygregory | insert happy dance gif /s | 01:01 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 03:15 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 03:21 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 05:52 |
kubajj | Good morning Ironic! o/ | 06:47 |
opendevreview | Bela Szanics proposed openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 09:45 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 10:25 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 10:43 |
iurygregory | good morning ironic | 10:50 |
opendevreview | Bela Szanics proposed openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 11:11 |
iurygregory | dtantsur, would you be ok with me approving https://review.opendev.org/c/openstack/sushy/+/926370 ? (I think the nit can be in a follow-up, so we can move forward later with downstream process) wdyt? | 11:27 |
* iurygregory | is a bit upset because the issue is on the implementation done in the hardware... | 11:29 |
dtantsur | iurygregory: yep, totally | 12:01 |
opendevreview | Merged openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 12:23 |
opendevreview | Dmitry Tantsur proposed openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926410 | 12:34 |
TheJulia | good morning | 13:26 |
iurygregory | good morning TheJulia | 13:27 |
opendevreview | Merged openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926410 | 14:09 |
opendevreview | cid proposed openstack/ironic master: docs-audit-2024: Labeling references https://review.opendev.org/c/openstack/ironic/+/925691 | 14:16 |
opendevreview | cid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena https://review.opendev.org/c/openstack/ironic/+/926415 | 14:17 |
opendevreview | Doug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state https://review.opendev.org/c/openstack/python-ironicclient/+/924895 | 14:17 |
opendevreview | cid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena https://review.opendev.org/c/openstack/ironic/+/924349 | 14:20 |
TheJulia | cid: I think it would help to revise the commit message on https://review.opendev.org/c/openstack/ironic/+/924349 | 14:23 |
TheJulia | specifically the title | 14:23 |
opendevreview | cid proposed openstack/ironic master: Update configuration value https://review.opendev.org/c/openstack/ironic/+/924349 | 14:24 |
TheJulia | cid: add irmc :) | 14:24 |
cid | TheJulia: To the previous or current title :) | 14:25 |
TheJulia | current :) | 14:25 |
cid | Alright | 14:26 |
TheJulia | Thanks! | 14:26 |
opendevreview | cid proposed openstack/ironic master: Update configuration value in iRMC https://review.opendev.org/c/openstack/ironic/+/924349 | 14:27 |
TheJulia | Thanks again! | 14:37 |
JayF | jfyi I'm out sick today, if you need something from me urgently send a DM or sms | 14:37 |
opendevreview | Merged openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 14:47 |
masghar | Get well soon Jay! | 15:27 |
masghar | I have a question about the association of a node to a specific conductor in a multi-conductor setup: why are they linked? | 16:28 |
masghar | I mean: can't any conductor drive the transition of a node from one state to the next? And once that's done, the next transition can be done by any conductor? | 16:29 |
TheJulia | masghar: I was about to ask for you to clarify the question :) | 16:29 |
masghar | Also, am I right in understanding that each conductor has one API instance for itself? | 16:29 |
masghar | Also, how old are multi-conductor setups? Is it a fairly new design/implementation? | 16:30 |
masghar | Also, all nodes share one database? | 16:30 |
masghar | Sorry I am full of questions :) | 16:31 |
TheJulia | So, in theory, yes, but in reality you may have a global ironic deployment. Or you may have a group of servers with specific operating security requirements and different base configuration needs. Because config/needs/routes can differ, that is the basis | 16:31 |
masghar | sorry I meant all conductors | 16:31 |
TheJulia | Further you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient | 16:31 |
TheJulia | Ironic, really, has always been multi-conductor. We’ve added new capabilities to handle localization and logical resource groupings. | 16:32 |
masghar | When you say a group of servers with different security and config requirements, do you mean the nodes we provision or the nodes ironic runs on? | 16:34 |
TheJulia | Realistically, both. | 16:35 |
masghar | Oh I see | 16:35 |
masghar | It's a whole mixture of things | 16:35 |
jrosser | ^ I have found this all gigantically confusing when trying to make things highly available rather than at large scale | 16:35 |
masghar | 'Further you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient' (how do I quote reply) | 16:38 |
masghar | I don't quite follow... wouldn't that be load-balancing for the conductors? | 16:38 |
TheJulia | The plus side is, if you have multiple conductors in a conductor group, they can serve those specific nodes and you can scale them as you need, and leave the rest with no conductor group and scale those separately. It just impacts *how* the hash ring operates | 16:39 |
TheJulia | and truthfully, it is just a key injection. | 16:39 |
TheJulia | to the calculation. | 16:39 |
TheJulia | masghar: even in a theory where any conductor can service any node, a baremetal node has to map to a specific conductor if you have asked ironic to download and stage files on that conductor | 16:40 |
masghar | I see! | 16:40 |
TheJulia | you can't have every conductor download that payload to deploy in anticipation of need | 16:40 |
TheJulia | jrosser: Sorry, how can we help? | 16:41 |
masghar | So the servicing of that image-related request has to be done by one conductor IIUC (makes sense) | 16:41 |
TheJulia | Yes, so that also helps drive the locking model *as well* | 16:41 |
masghar | okay, makes sense! | 16:42 |
jrosser | TheJulia: well maybe it's just that I don't understand, but with pretty much the whole of the rest of openstack I can take down 1 of N controllers and life just carries on | 16:42 |
TheJulia | the conductor locks the baremetal "node" for exclusive action. You also don't want multiple conductors trying to do things like talk to the BMC, because you'll likely crash the BMC or lock out the user on it. | 16:42 |
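The exclusive-lock behaviour described here can be sketched as a minimal, self-contained Python illustration. This is not Ironic's actual TaskManager/locking code; the NodeLockManager and acquire names are hypothetical, and the real implementation coordinates lock state across conductors via the database rather than in-process locks.

```python
import threading
from contextlib import contextmanager


class NodeLockManager:
    """Hypothetical per-node exclusive lock: only one task may act on a
    given baremetal node (e.g. talk to its BMC) at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}  # node_uuid -> threading.Lock

    @contextmanager
    def acquire(self, node_uuid, timeout=30):
        with self._guard:
            lock = self._locks.setdefault(node_uuid, threading.Lock())
        if not lock.acquire(timeout=timeout):
            raise RuntimeError(f"node {node_uuid} is locked by another task")
        try:
            yield  # caller changes node state / talks to the BMC here
        finally:
            lock.release()


# usage sketch with an illustrative node UUID
locks = NodeLockManager()
with locks.acquire("1be26c0b-03f2-4d2e-ae87-c02d7f33c123"):
    pass  # e.g. issue a power action against the node's BMC
```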
jrosser | and so this becomes a bit of an operational headache when we get to doing software upgrades, maintenance, or even worse totally re-installing a controller | 16:42 |
TheJulia | jrosser: which part? You can have multiple conductors in the same group, or no conductor groups at all | 16:43 |
jrosser | but the baremetal nodes are still managed on an individual basis by one of the conductors, regardless of the use of a group? | 16:44 |
TheJulia | Think of a conductor group as a knob to control the hash ring | 16:45 |
TheJulia | in other words, how nodes are mapped to conductors | 16:45 |
TheJulia | so if one conductor that owns a node goes down, the hash ring recomputes | 16:45 |
TheJulia | future work gets re-assigned to the new conductor chosen by the hash ring | 16:45 |
jrosser | ah well that is what i am missing understanding :) | 16:46 |
TheJulia | that new conductor also takes over responsibility for the node until the hash ring recomputes | 16:46 |
masghar | And this recomputation happens within conductor groups only, or this is regular multi-conductor behaviour? | 16:46 |
TheJulia | The recomputation happens across all conductors, but think of conductor groups as limiting the scope and input into the calculation | 16:47 |
TheJulia | so the results may change, but the hash ring mapping for unaffected nodes in the same group would not change | 16:47 |
TheJulia | other conductors outside of that specific group may change their hash ring and may take over for nodes | 16:47 |
TheJulia | well, to be more precise, "take over for nodes" means "take over responsibility for the baremetal nodes of the conductor which has failed, or which need to be re-shuffled due to the failure." | 16:48 |
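The hash-ring behaviour discussed above can be sketched roughly as follows. This is a simplified, self-contained illustration rather than Ironic's real hash ring (which is built on tooz): node UUIDs are consistently hashed onto conductors, and a conductor group simply restricts which conductors are candidates for a node, so rebuilding one group's ring after a failure leaves the other groups untouched. All names below are illustrative.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class SimpleHashRing:
    """Toy consistent-hash ring mapping node UUIDs to conductors."""

    def __init__(self, conductors, replicas=32):
        # each conductor gets several points on the ring for smoother balance
        self._ring = sorted(
            (_hash(f"{c}-{i}"), c) for c in conductors for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def conductor_for(self, node_uuid: str) -> str:
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


# conductor groups limit which conductors are even in the ring for a node
conductors = {"": ["cond-1", "cond-2"], "leaf-a": ["cond-3", "cond-4"]}
rings = {group: SimpleHashRing(members) for group, members in conductors.items()}

node_uuid, node_group = "1be26c0b-03f2-4d2e-ae87-c02d7f33c123", "leaf-a"
print(rings[node_group].conductor_for(node_uuid))
# if cond-3 dies, rebuild only the "leaf-a" ring without it; the "" ring
# (and its node-to-conductor mapping) is unaffected
```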
jrosser | the other thing that looks difficult to h/a is the association of a node to a nova-compute instance? | 16:48 |
masghar | (That's so useful!) The hash ring basically maps nodes to conductors (at least one conductor)? | 16:48 |
TheJulia | jrosser: that is a depressing topic | 16:49 |
jrosser | right - because in the scenario of upgrading a controller you quite likely lose 1/Nth of the nova-compute instances dedicated to ironic | 16:50 |
TheJulia | jrosser: so, API wise, the nova-compute service uses the API so it doesn't see the failure of a conductor. If ironic can continue it does; if it cannot, it fails the deploy. That is what it sees. Nova-compute has no real H/A capability, but you *can* separately configure a "failover" set where the service starts with the exact same "hostname" if the nova-compute process or the process's host fails. But that requires advance planning. | 16:50 |
jrosser | thats interesting | 16:52 |
TheJulia | jrosser: in the nova context, you're quite literally taking down a "hypervisor", so they don't want, nor have been willing, to allow a virt driver to change/fix/heal the DB state, over fear around fields which really don't matter in that context, because we're not managing VMs and Ironic is doing the actual management, keeping the data it needs in the database | 16:52 |
TheJulia | Ironic's database, to be precise. | 16:52 |
masghar | As for the API and conductor processes, there must be one endpoint served by the API for (all) the conductor(s) running? (So to the external clients of the API, the conductor setup doesn't matter)? | 16:59 |
masghar | Or am I mistaken | 16:59 |
TheJulia | no, you can have separate API endpoints... but the cases where you may want to do that will be along the lines of split horizon for access. In other words, you have a public endpoint which has some restrictions enabled so IPA can't talk to it, but then you have ironic and other services talk to the other API endpoint. | 17:00 |
masghar | I see, I see | 17:01 |
masghar | Thanks TheJulia! | 17:01 |
TheJulia | Specifically https://github.com/openstack/ironic/blob/master/ironic/conf/api.py#L77 | 17:03 |
masghar | Quite strange | 17:07 |
TheJulia | how so? | 17:07 |
masghar | The config option sounds reasonable, but if we have a public endpoint and IPA can't talk to it, then how will IPA reach Ironic is what I was wondering | 17:08 |
masghar | Oh I think you meant there is a separate endpoint for the IPA-ironic communication in this case | 17:09 |
TheJulia | masghar: the conductor can have a specific URL to hint to IPA | 17:12 |
masghar | Makes sense | 17:13 |
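For reference, options like the one TheJulia links are declared with oslo.config in ironic/conf/api.py. Below is a minimal sketch of the shape of such a declaration, assuming the [api] group and a public_endpoint-style option; the real option name, default, and help text should be taken from the linked file, not from this example.

```python
from oslo_config import cfg

# Illustrative only: mirrors the general shape of the option linked above.
opts = [
    cfg.StrOpt('public_endpoint',
               help='Public URL to advertise in API responses, e.g. '
                    '"https://ironic.example.com:6385". Name and semantics '
                    'assumed here; see ironic/conf/api.py for the real option.'),
]

CONF = cfg.CONF
CONF.register_opts(opts, group='api')
```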
jrosser | masghar: depending on how you architect your deployment, it's quite likely you have an "internal" API endpoint that has the potential to be set up somewhat differently from the public endpoint | 17:18 |
jrosser | for example, here is a set of haproxy ACLs which enforce that certain paths of the ironic API originate from specific network ranges https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/ironic_all/haproxy_service.yml#L28-L30 | 17:20 |
masghar | jrosser: interesting! | 17:28 |
masghar | (AFK for a bit, will take a closer look later) | 17:28 |
opendevreview | Adam Rozman proposed openstack/ironic-python-agent master: WIP - root device encryption https://review.opendev.org/c/openstack/ironic-python-agent/+/926425 | 19:07 |
opendevreview | cid proposed openstack/ironic master: Update configuration value in iRMC https://review.opendev.org/c/openstack/ironic/+/924349 | 19:09 |
opendevreview | Doug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state https://review.opendev.org/c/openstack/python-ironicclient/+/924895 | 19:31 |
cardoe | Sorry about the above, JayF. | 19:31 |
cardoe | So I'm late to the convo TheJulia and masghar, but multiple conductor groups are absolutely something I planned on exploring for my use case. And maybe I'm wrong, but maybe my use case is a valid case for them. | 20:15 |
cid | Good night ironic o/ | 20:17 |
cardoe | I've got one data center, let's say, and it's got 1 ironic API for the entire DC, so that user-facing traffic can just hit 1 endpoint. But really my DC is divided up into multiple leaf-spine fabrics. Data within one is fast and good, but crossing the boundaries goes through an interconnect that, while fast, is going to be limited / QoSed. | 20:18 |
cardoe | So my rough idea is to have conductors in each of those leaf-spines and the devices that live there would be assigned to that conductor group. | 20:20 |
cardoe | I see the traffic between the node and the conductor as being greater than between the node and the ironic API. | 20:21 |
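To make the conductor-group idea concrete: each node carries a conductor_group field set through the API, and each conductor advertises its own group via [conductor]conductor_group in its ironic.conf. A hedged openstacksdk sketch follows, assuming a clouds.yaml entry named "mycloud" and an illustrative "leaf-a-" node naming scheme; it is an example of the workflow, not a prescribed layout.

```python
import openstack

# assumes clouds.yaml defines a "mycloud" entry; names below are illustrative
conn = openstack.connect(cloud="mycloud")

# Pin every node that physically lives in leaf fabric "leaf-a" to that group.
# Conductors running in that fabric would set [conductor]conductor_group=leaf-a
# so the hash ring only maps these nodes onto them.
for node in conn.baremetal.nodes(details=False):
    if node.name and node.name.startswith("leaf-a-"):  # illustrative naming scheme
        conn.baremetal.update_node(node, conductor_group="leaf-a")
```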
TheJulia | cardoe: totally one of the reasons we built it :) | 23:00 |
TheJulia | And sorry for the delay in responding, a bad migraine hit me today. Waking up at this point. | 23:01 |
TheJulia | cardoe: so you could just have two conductors for each of your leafs, for example, and you can fail over there. But that entirely depends on the number of nodes in each leaf :) | 23:02 |
TheJulia | And your risk profile/concerns | 23:03 |
cardoe | So I’m a glutton for punishment and running OpenStack on top of k8s. So which is the harder option? | 23:14 |
TheJulia | You can just run one if you don't need to balance load. Steve Baker did some changes to allow the conductor to be a bit more graceful as a workload on k8s. | 23:15 |