Thursday, 2024-08-15

iurygregoryand https://xkcd.com/927/ strikes again in Redfish \o/01:01
iurygregory Reason: unable to start inspection: The attribute Links/ManagedBy is missing from the resource /redfish/v1/Systems/101:01
iurygregoryinsert happy dance gif /s01:01
opendevreviewJacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637003:15
opendevreviewJacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637003:21
opendevreviewJacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637005:52
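The fallback named in the patch title above can be illustrated with a minimal sketch over plain Redfish JSON. This is not sushy's code: the function name `manager_paths` and its arguments are hypothetical, and the merged change may choose among the Managers differently. It only assumes the standard Redfish shapes, where `Links/ManagedBy` on a System and `Members` on the Managers collection are lists of `@odata.id` references.

```python
def manager_paths(system, managers_collection):
    """Return manager resource paths for a Redfish System (hypothetical helper).

    system: decoded JSON of e.g. /redfish/v1/Systems/1
    managers_collection: decoded JSON of /redfish/v1/Managers
    """
    links = system.get("Links") or {}
    managed_by = links.get("ManagedBy")
    if managed_by:
        # Normal case: the System tells us which Managers manage it.
        return [ref["@odata.id"] for ref in managed_by]
    # Fallback for hardware that omits Links/ManagedBy: assume the
    # Managers collection is the best available answer.
    return [ref["@odata.id"] for ref in managers_collection.get("Members", [])]


if __name__ == "__main__":
    broken_system = {"@odata.id": "/redfish/v1/Systems/1", "Links": {}}
    managers = {"Members": [{"@odata.id": "/redfish/v1/Managers/1"}]}
    print(manager_paths(broken_system, managers))
```

The sketch prefers the explicit link when it exists, so compliant hardware behaves exactly as before and only the non-compliant case takes the fallback path.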
kubajjGood morning Ironic! o/06:47
opendevreviewBela Szanics proposed openstack/ironic master: Fix conductor startup warning message  https://review.opendev.org/c/openstack/ironic/+/92639809:45
opendevreviewJacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637010:25
opendevreviewJacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637010:43
iurygregorygood morning ironic10:50
opendevreviewBela Szanics proposed openstack/ironic master: Fix conductor startup warning message  https://review.opendev.org/c/openstack/ironic/+/92639811:11
iurygregorydtantsur, would you be ok with me approving https://review.opendev.org/c/openstack/sushy/+/926370 ? (I think the nit can be in a follow-up, so we can move forward later with downstream process) wdyt?11:27
* iurygregory is a bit upset because the issue is on the implementation done in the hardware...11:29
dtantsuriurygregory: yep, totally12:01
opendevreviewMerged openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92637012:23
opendevreviewDmitry Tantsur proposed openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92641012:34
TheJuliagood morning13:26
iurygregorygood morning TheJulia 13:27
opendevreviewMerged openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers  https://review.opendev.org/c/openstack/sushy/+/92641014:09
opendevreviewcid proposed openstack/ironic master: docs-audit-2024: Labeling references  https://review.opendev.org/c/openstack/ironic/+/92569114:16
opendevreviewcid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena  https://review.opendev.org/c/openstack/ironic/+/92641514:17
opendevreviewDoug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state  https://review.opendev.org/c/openstack/python-ironicclient/+/92489514:17
opendevreviewcid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena  https://review.opendev.org/c/openstack/ironic/+/92434914:20
TheJuliacid: I think it would help to revise the commit message on https://review.opendev.org/c/openstack/ironic/+/92434914:23
TheJuliaspecifically the title14:23
opendevreviewcid proposed openstack/ironic master: Update configuration value  https://review.opendev.org/c/openstack/ironic/+/92434914:24
TheJuliacid: add irmc :)14:24
cidTheJulia: To the previous or current title :)14:25
TheJuliacurrent :)14:25
cidAlright14:26
TheJuliaThanks!14:26
opendevreviewcid proposed openstack/ironic master: Update configuration value in iRMC  https://review.opendev.org/c/openstack/ironic/+/92434914:27
TheJuliaThanks again!14:37
JayFjfyi I'm out sick today, if you need something from me urgently send a DM or sms14:37
opendevreviewMerged openstack/ironic master: Fix conductor startup warning message  https://review.opendev.org/c/openstack/ironic/+/92639814:47
masgharGet well soon Jay!15:27
masgharI have a question about the association of a node to a specific conductor in a multi-conductor setup: why are they linked? 16:28
masgharI mean: can't any conductor drive the transition of a node from one state to the next? And once that's done, the next transition can be done by any conductor?16:29
TheJuliamasghar: I was about to ask for you to clarify the question :)16:29
masgharAlso, am I right in understanding that each conductor has one API instance for itself?16:29
masgharAlso, how old are multi-conductor setups? Is it a fairly new design/implementation?16:30
masgharAlso, all nodes share one database?16:30
masgharSorry I am full of questions :)16:31
TheJuliaSo, in theory, yes, but the reality is you may have a global ironic deployment. Or you may have a group of servers with specific operating security requirements with different base configuration needs. Because config/needs/routes can differ, that is the basis for the association.16:31
masgharsorry I meant all conductors16:31
TheJuliaFurther you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient16:31
TheJuliaIronic, really, has always been multi-conductor. We’ve added new capabilities to handle localization and logical resource groupings.16:32
masgharWhen you say a group of servers with different security and config requirements, do you mean the nodes we provision or the nodes ironic runs on?16:34
TheJuliaRealistically, both.16:35
masgharOh I see16:35
masgharIt's a whole mixture of things16:35
jrosser^ i have found this all gigantically confusing when trying to make things highly available rather than at large scale16:35
masghar'Further you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient' (how do I quote reply)16:38
masgharI don't quite follow... wouldn't that be load-balancing for the conductors?16:38
TheJuliaThe plus side is, if you have multiple conductors in a conductor group, they can serve those specific nodes and you can scale those as you need, and leave the rest with no conductor group and scale those separately. It just impacts *how* the hash ring operates16:39
TheJuliaand truthfully, it is just a key injection.16:39
TheJuliato the calculation.16:39
TheJuliamasghar: in a theory where any conductor can service any node, that baremetal node still has to map to a specific conductor if you have asked ironic to download and stage files to the conductor16:40
masgharI see!16:40
TheJuliayou can't have every conductor download that payload to deploy in anticipation of need16:40
TheJuliajrosser: Sorry, how can we help?16:41
masgharSo the servicing of that image-related request has to be done by one conductor IIUC (makes sense)16:41
TheJuliaYes, so that also helps drive the locking model *as well*16:41
masgharokay, makes sense!16:42
jrosserTheJulia: well maybe it's just that i dont understand, but with pretty much the whole of the rest of openstack i can take down 1 of N controllers and life just carries one16:42
jrosser*on16:42
TheJuliathe conductor locks the baremetal "node" for exclusive action. You also don't want multiple conductors trying to do things like talk to the BMC because you'll likely crash or lockout the user on the BMC.16:42
jrosserand so this becomes a bit of an operational headache when we get to doing software upgrades, maintenance or even worse totally re-installing a controller16:42
TheJuliajrosser: which part, you can have multiple conductors in the same group, or no conductor groups at all?16:43
jrosserbut the baremetal nodes are still managed on an individual basis by one of the conductors, regardless of the use of a group?16:44
TheJuliaThink of a conductor group as a knob to control the hash ring16:45
TheJuliain other words, how nodes are mapped to conductors16:45
TheJuliaso if one conductor that owns a node goes down, the hash ring recomputes16:45
TheJuliafuture work gets re-assigned to the new conductor chosen by the hash ring16:45
jrosserah well that is what i am missing understanding :)16:46
TheJuliathat new conductor also takes over responsibility for the node until the hash ring recomputes16:46
masgharAnd this recomputation happens within conductor groups only, or this is regular multi-conductor behaviour?16:46
TheJuliaThe recomputation happens across all conductors, but think of conductor groups as limiting the scope and input into the calculation16:47
TheJuliaso, the results may change, but the hash ring from a group of unaffected nodes in the same group would not change16:47
TheJuliaother conductors outside of that specific group may change their hash ring and may take over for nodes16:47
TheJuliawell, to be more precise, "take over for nodes" means "take over responsibility for the baremetal nodes for the node which has failed or which need to be re-shuffled due to the failure."16:48
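The hash ring behaviour described above can be sketched in a few lines. This is a toy consistent-hash ring, not Ironic's actual implementation: the names `HashRing`, `build_rings` and `owner`, the SHA-256 hash, and the replica count are all assumptions made for illustration. The conductor group is modelled as a key that selects which conductors feed the calculation, with `""` meaning "no group".

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each conductor gets several points."""

    def __init__(self, conductors, replicas=64):
        self._ring = sorted(
            (_hash(f"{host}-{i}"), host)
            for host in conductors
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    def get_conductor(self, node_uuid):
        if not self._ring:
            raise LookupError("no conductors available")
        # First ring point clockwise from the node's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


def build_rings(conductors):
    """conductors: dict of hostname -> conductor group ('' for no group)."""
    by_group = {}
    for host, group in conductors.items():
        by_group.setdefault(group, []).append(host)
    return {group: HashRing(hosts) for group, hosts in by_group.items()}


def owner(rings, node_uuid, node_group=""):
    """Pick the conductor responsible for a node within its group's ring."""
    return rings[node_group].get_conductor(node_uuid)


if __name__ == "__main__":
    conductors = {"c1": "", "c2": "", "c3": "leaf-a", "c4": "leaf-a"}
    before = build_rings(conductors)
    del conductors["c2"]  # simulate one ungrouped conductor going down
    after = build_rings(conductors)
    for node in ("node-1", "node-2", "node-3"):
        print(node, owner(before, node), "->", owner(after, node))
        print(node, "leaf-a:", owner(before, node, "leaf-a"),
              "->", owner(after, node, "leaf-a"))
```

In this sketch, removing `c2` only moves the nodes that `c2` owned. Nodes mapped into the `leaf-a` group keep their conductor because their ring never included `c2`, which is the "limiting the scope and input into the calculation" effect discussed above.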
jrosserthe other thing that looks difficult to h/a is the association of a node to a nova-compute instance?16:48
masghar(That's so useful!) The hash ring basically maps nodes to conductors (at least one conductor)?16:48
TheJuliajrosser: that is a depressing topic16:49
jrosserright - because in the scenario of upgrading a controller you quite likely lose 1/Nth of the nova-compute instances dedicated to ironic16:50
TheJuliajrosser: so, API wise, the nova-compute service uses the API so it doesn't see the failure  of a conductor. If ironic can continue it does, if it cannot, it fails the deploy. That is what it sees. Nova-compute has no real H/A capability, but you *can* separately configure a "failover" set where the service starts with the exact same "hostname" if the nova-compute process or process's host fails. But that requires advance 16:50
TheJuliaplanning.16:50
jrosserthat's interesting16:52
TheJuliajrosser: in the nova context, you're quite literally taking down a "hypervisor", so they don't want nor have been willing to allow a virt driver to change/fix/heal the db state over fear of fields which really don't matter in that context because we're not managing VMs and Ironic is doing the actual management with keeping the data it needs in the database16:52
TheJuliaIronic's database, to be precise.16:52
masgharAs for the API and conductor processes, there must be one endpoint served by the API for (all) the conductor(s) running? (So to the external clients of the API, the conductor setup doesn't matter)?16:59
masgharOr am I mistaken16:59
TheJuliano, you can have separate API endpoints... but the cases where you may want to do that will be along the lines of split horizon for access. In other words, you have a public endpoint which has some restrictions enabled so IPA can't talk to it, but then you have ironic and other services talk to the other API endpoint.17:00
masgharI see, I see17:01
masgharThanks TheJulia!17:01
TheJuliaSpecifically https://github.com/openstack/ironic/blob/master/ironic/conf/api.py#L7717:03
masgharQuite strange17:07
TheJuliahow so?17:07
masgharThe config option sounds reasonable, but if we have a public endpoint and IPA can't talk to it, then how will IPA reach Ironic is what I was wondering17:08
masgharOh I think you meant there is a separate endpoint for the IPA-ironic communication in this case17:09
TheJuliamasghar: the conductor can have a specific URL to hint to IPA17:12
masgharMakes sense17:13
jrossermasghar: depending on how you architect your deployment, it's quite likely you have an "internal" API endpoint that has the potential to be set up somewhat differently to the public endpoint17:18
jrosserfor example here is a set of haproxy ACL which enforces certain paths of the ironic API originate from specific network ranges https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/ironic_all/haproxy_service.yml#L28-L3017:20
masgharjrosser: interesting! 17:28
masghar(AFK for a bit, will take a closer look later)17:28
opendevreviewAdam Rozman proposed openstack/ironic-python-agent master: WIP - root device encryption  https://review.opendev.org/c/openstack/ironic-python-agent/+/92642519:07
opendevreviewcid proposed openstack/ironic master: Update configuration value in iRMC  https://review.opendev.org/c/openstack/ironic/+/92434919:09
opendevreviewDoug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state  https://review.opendev.org/c/openstack/python-ironicclient/+/92489519:31
cardoeSorry about the above JayF.19:31
cardoeSo I'm late to the convo TheJulia and masghar, but the multiple conductor groups is absolutely something I planned on exploring for my use case. And maybe I'm wrong but maybe my use case is a valid case for it.20:15
cidGood night ironic o/20:17
cardoeI've got one data center let's say and it's got 1 ironic API for the entire DC. So that user facing can just hit 1 endpoint. But really my DC is divided up into multiple leaf-spine fabrics. Data within one is fast and good but crossing the boundaries goes through an interconnect that while fast is going to be limited / QoSed.20:18
cardoeSo my rough idea is to have conductors in each of those leaf-spines and the devices that live there would be assigned to that conductor group.20:20
cardoeI see the traffic between the node and the conductor as being greater than the node and the ironic API.20:21
TheJuliacardoe: totally one of the reasons we built it :)23:00
TheJuliaAnd sorry for the delay in responding, a bad migraine hit me today. Waking up at this point.23:01
TheJuliacardoe: so you could just have two conductors for, for example, each of your leafs, and you can fail over there. But that entirely depends on the number of nodes in each leaf :)23:02
TheJuliaAnd your risk profile/concerns23:03
cardoeSo I’m a glutton for punishment and running OpenStack on top of k8s. So which is the harder option?23:14
TheJuliaYou can just run one if you don’t need to balance load. Steve Baker made some changes to allow the conductor to be a bit more graceful on k8s as a workload.23:15
