iurygregory | and https://xkcd.com/927/ strikes again in Redfish \o/ | 01:01 |
iurygregory | Reason: unable to start inspection: The attribute Links/ManagedBy is missing from the resource /redfish/v1/Systems/1 | 01:01 |
iurygregory | insert happy dance gif /s | 01:01 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 03:15 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 03:21 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 05:52 |
kubajj | Good morning Ironic! o/ | 06:47 |
opendevreview | Bela Szanics proposed openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 09:45 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 10:25 |
opendevreview | Jacob Anders proposed openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 10:43 |
iurygregory | good morning ironic | 10:50 |
opendevreview | Bela Szanics proposed openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 11:11 |
iurygregory | dtantsur, would you be ok with me approving https://review.opendev.org/c/openstack/sushy/+/926370 ? (I think the nit can be in a follow-up, so we can move forward later with downstream process) wdyt? | 11:27 |
* iurygregory | is a bit upset because the issue is on the implementation done in the hardware... | 11:29 |
dtantsur | iurygregory: yep, totally | 12:01 |
opendevreview | Merged openstack/sushy master: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926370 | 12:23 |
opendevreview | Dmitry Tantsur proposed openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926410 | 12:34 |
TheJulia | good morning | 13:26 |
iurygregory | good morning TheJulia | 13:27 |
opendevreview | Merged openstack/sushy stable/2024.1: When ManagedBy attribute is missing from System retry with Managers https://review.opendev.org/c/openstack/sushy/+/926410 | 14:09 |
opendevreview | cid proposed openstack/ironic master: docs-audit-2024: Labeling references https://review.opendev.org/c/openstack/ironic/+/925691 | 14:16 |
opendevreview | cid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena https://review.opendev.org/c/openstack/ironic/+/926415 | 14:17 |
opendevreview | Doug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state https://review.opendev.org/c/openstack/python-ironicclient/+/924895 | 14:17 |
opendevreview | cid proposed openstack/ironic master: Address TODO: set to the same value as in [pxe] after Xena https://review.opendev.org/c/openstack/ironic/+/924349 | 14:20 |
TheJulia | cid: I think it would help to revise the commit message on https://review.opendev.org/c/openstack/ironic/+/924349 | 14:23 |
TheJulia | specifically the title | 14:23 |
opendevreview | cid proposed openstack/ironic master: Update configuration value https://review.opendev.org/c/openstack/ironic/+/924349 | 14:24 |
TheJulia | cid: add irmc :) | 14:24 |
cid | TheJulia: To the previous or current title :) | 14:25 |
TheJulia | current :) | 14:25 |
cid | Alright | 14:26 |
TheJulia | Thanks! | 14:26 |
opendevreview | cid proposed openstack/ironic master: Update configuration value in iRMC https://review.opendev.org/c/openstack/ironic/+/924349 | 14:27 |
TheJulia | Thanks again! | 14:37 |
JayF | jfyi I'm out sick today, if you need something from me urgently send a DM or sms | 14:37 |
opendevreview | Merged openstack/ironic master: Fix conductor startup warning message https://review.opendev.org/c/openstack/ironic/+/926398 | 14:47 |
masghar | Get well soon Jay! | 15:27 |
masghar | I have a question about the association of a node to a specific conductor in a multi-conductor setup: why are they linked? | 16:28 |
masghar | I mean: can't any conductor drive the transition of a node from one state to the next? And once that's done, the next transition can be done by any conductor? | 16:29 |
TheJulia | masghar: I was about to ask for you to clarify the question :) | 16:29 |
masghar | Also, am I right in understanding that each conductor has one API instance for itself? | 16:29 |
masghar | Also, how old are multi-conductor setups? Is it a fairly new design/implementation? | 16:30 |
masghar | Also, all nodes share one database? | 16:30 |
masghar | Sorry I am full of questions :) | 16:31 |
TheJulia | So, in theory, yes, but in reality you may have a global ironic deployment. Or you may have a group of servers with specific operating security requirements and different base configuration needs. Because config/needs/routes can differ, that is the basis | 16:31 |
masghar | sorry I meant all conductors | 16:31 |
TheJulia | Further you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient | 16:31 |
TheJulia | Ironic, really, has always been multi-conductor. We’ve added new capabilities to handle localization and logical resource groupings. | 16:32 |
masghar | When you say a group of servers with different security and config requirements, do you mean the nodes we provision or the nodes ironic runs on? | 16:34 |
TheJulia | Realistically, both. | 16:35 |
masghar | Oh I see | 16:35 |
masghar | It's a whole mixture of things | 16:35 |
jrosser | ^ I have found this all gigantically confusing when trying to make things highly available rather than at large scale | 16:35 |
masghar | 'Further you also don’t want all the conductors downloading 1gb to deploy 1gb. That would also be super inefficient' (how do I quote reply) | 16:38 |
masghar | I don't quite follow... wouldn't that be load-balancing for the conductors? | 16:38 |
TheJulia | The plus side is, if you have multiple conductors in a conductor group, they can serve those specific nodes and you can scale them as you need, and leave the rest with no conductor group and scale those separately. It just impacts *how* the hash ring operates | 16:39 |
TheJulia | and truthfully, it is just a key injection. | 16:39 |
TheJulia | to the calculation. | 16:39 |
TheJulia | masghar: even in a theory where any conductor can service any node, a baremetal node has to map to a specific conductor if you have asked ironic to download and stage files on that conductor | 16:40 |
masghar | I see! | 16:40 |
TheJulia | you can't have every conductor download that payload to deploy in anticipation of need | 16:40 |
TheJulia | jrosser: Sorry, how can we help? | 16:41 |
masghar | So the servicing of that image-related request has to be done by one conductor IIUC (makes sense) | 16:41 |
TheJulia | Yes, so that also helps drive the locking model *as well* | 16:41 |
masghar | okay, makes sense! | 16:42 |
jrosser | TheJulia: well maybe it's just that I don't understand, but with pretty much the whole of the rest of openstack I can take down 1 of N controllers and life just carries on | 16:42 |
TheJulia | the conductor locks the baremetal "node" for exclusive action. You also don't want multiple conductors trying to do things like talk to the BMC, because you'll likely crash the BMC or lock out the user on it. | 16:42 |
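The exclusive-lock behaviour described here can be sketched as a minimal, self-contained Python illustration. This is not Ironic's actual TaskManager/locking code; the NodeLockManager and acquire names are hypothetical, and the real implementation coordinates lock state across conductors via the database rather than in-process locks.

```python
import threading
from contextlib import contextmanager


class NodeLockManager:
    """Hypothetical per-node exclusive lock: only one task may act on a
    given baremetal node (e.g. talk to its BMC) at a time."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}  # node_uuid -> threading.Lock

    @contextmanager
    def acquire(self, node_uuid, timeout=30):
        with self._guard:
            lock = self._locks.setdefault(node_uuid, threading.Lock())
        if not lock.acquire(timeout=timeout):
            raise RuntimeError(f"node {node_uuid} is locked by another task")
        try:
            yield  # caller changes node state / talks to the BMC here
        finally:
            lock.release()


# usage sketch with an illustrative node UUID
locks = NodeLockManager()
with locks.acquire("1be26c0b-03f2-4d2e-ae87-c02d7f33c123"):
    pass  # e.g. issue a power action against the node's BMC
```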
jrosser | and so this becomes a bit of an operational headache when we get to doing software upgrades, maintenance, or even worse totally re-installing a controller | 16:42 |
TheJulia | jrosser: which part? You can have multiple conductors in the same group, or no conductor groups at all | 16:43 |
jrosser | but the baremetal nodes are still managed on an individual basis by one of the conductors, regardless of the use of a group? | 16:44 |
TheJulia | Think of a conductor group as a knob to control the hash ring | 16:45 |
TheJulia | in other words, how nodes are mapped to conductors | 16:45 |
TheJulia | so if one conductor that owns a node goes down, the hash ring recomputes | 16:45 |
TheJulia | future work gets re-assigned to the new conductor chosen by the hash ring | 16:45 |
jrosser | ah well that is what i am missing understanding :) | 16:46 |
TheJulia | that new conductor also takes over responsibility for the node until the hash ring recomputes | 16:46 |
masghar | And this recomputation happens within conductor groups only, or this is regular multi-conductor behaviour? | 16:46 |
TheJulia | The recomputation happens across all conductors, but think of conductor groups as limiting the scope and input into the calculation | 16:47 |
TheJulia | so the results may change, but the hash ring mapping for unaffected nodes in the same group would not change | 16:47 |
TheJulia | other conductors outside of that specific group may change their hash ring and may take over for nodes | 16:47 |
TheJulia | well, to be more precise, "take over for nodes" means "take over responsibility for the baremetal nodes of the conductor which has failed, or which need to be re-shuffled due to the failure." | 16:48 |
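The hash-ring behaviour discussed above can be sketched roughly as follows. This is a simplified, self-contained illustration rather than Ironic's real hash ring (which is built on tooz): node UUIDs are consistently hashed onto conductors, and a conductor group simply restricts which conductors are candidates for a node, so rebuilding one group's ring after a failure leaves the other groups untouched. All names below are illustrative.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class SimpleHashRing:
    """Toy consistent-hash ring mapping node UUIDs to conductors."""

    def __init__(self, conductors, replicas=32):
        # each conductor gets several points on the ring for smoother balance
        self._ring = sorted(
            (_hash(f"{c}-{i}"), c) for c in conductors for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def conductor_for(self, node_uuid: str) -> str:
        idx = bisect.bisect(self._keys, _hash(node_uuid)) % len(self._ring)
        return self._ring[idx][1]


# conductor groups limit which conductors are even in the ring for a node
conductors = {"": ["cond-1", "cond-2"], "leaf-a": ["cond-3", "cond-4"]}
rings = {group: SimpleHashRing(members) for group, members in conductors.items()}

node_uuid, node_group = "1be26c0b-03f2-4d2e-ae87-c02d7f33c123", "leaf-a"
print(rings[node_group].conductor_for(node_uuid))
# if cond-3 dies, rebuild only the "leaf-a" ring without it; the "" ring
# (and its node-to-conductor mapping) is unaffected
```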
jrosser | the other thing that looks difficult to h/a is the association of a node to a nova-compute instance? | 16:48 |
masghar | (That's so useful!) The hash ring basically maps nodes to conductors (at least one conductor)? | 16:48 |
TheJulia | jrosser: that is a depressing topic | 16:49 |
jrosser | right - because in the scenario of upgrading a controller you quite likely lose 1/Nth of the nova-compute instances dedicated to ironic | 16:50 |
TheJulia | jrosser: so, API wise, the nova-compute service uses the API so it doesn't see the failure of a conductor. If ironic can continue it does; if it cannot, it fails the deploy. That is what it sees. Nova-compute has no real H/A capability, but you *can* separately configure a "failover" set where the service starts with the exact same "hostname" if the nova-compute process or the process's host fails. But that requires advance planning. | 16:50 |
jrosser | thats interesting | 16:52 |
TheJulia | jrosser: in the nova context, you're quite literally taking down a "hypervisor", so they don't want, nor have been willing, to allow a virt driver to change/fix/heal the DB state, over fear around fields which really don't matter in that context, because we're not managing VMs and Ironic is doing the actual management, keeping the data it needs in the database | 16:52 |
TheJulia | Ironic's database, to be precise. | 16:52 |
masghar | As for the API and conductor processes, there must be one endpoint served by the API for (all) the conductor(s) running? (So to the external clients of the API, the conductor setup doesn't matter)? | 16:59 |
masghar | Or am I mistaken | 16:59 |
TheJulia | no, you can have separate API endpoints... but the cases where you may want to do that will be along the lines of split horizon for access. In other words, you have a public endpoint which has some restrictions enabled so IPA can't talk to it, but then you have ironic and other services talk to the other API endpoint. | 17:00 |
masghar | I see, I see | 17:01 |
masghar | Thanks TheJulia! | 17:01 |
TheJulia | Specifically https://github.com/openstack/ironic/blob/master/ironic/conf/api.py#L77 | 17:03 |
masghar | Quite strange | 17:07 |
TheJulia | how so? | 17:07 |
masghar | The config option sounds reasonable, but if we have a public endpoint and IPA can't talk to it, then how will IPA reach Ironic is what I was wondering | 17:08 |
masghar | Oh I think you meant there is a separate endpoint for the IPA-ironic communication in this case | 17:09 |
TheJulia | masghar: the conductor can have a specific URL to hint to IPA | 17:12 |
masghar | Makes sense | 17:13 |
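For reference, options like the one TheJulia links are declared with oslo.config in ironic/conf/api.py. Below is a minimal sketch of the shape of such a declaration, assuming the [api] group and a public_endpoint-style option; the real option name, default, and help text should be taken from the linked file, not from this example.

```python
from oslo_config import cfg

# Illustrative only: mirrors the general shape of the option linked above.
opts = [
    cfg.StrOpt('public_endpoint',
               help='Public URL to advertise in API responses, e.g. '
                    '"https://ironic.example.com:6385". Name and semantics '
                    'assumed here; see ironic/conf/api.py for the real option.'),
]

CONF = cfg.CONF
CONF.register_opts(opts, group='api')
```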
jrosser | masghar: depending on how you architect your deployment, it's quite likely you have an "internal" API endpoint that has the potential to be set up somewhat differently from the public endpoint | 17:18 |
jrosser | for example, here is a set of haproxy ACLs which enforce that certain paths of the ironic API originate from specific network ranges https://github.com/openstack/openstack-ansible/blob/master/inventory/group_vars/ironic_all/haproxy_service.yml#L28-L30 | 17:20 |
masghar | jrosser: interesting! | 17:28 |
masghar | (AFK for a bit, will take a closer look later) | 17:28 |
opendevreview | Adam Rozman proposed openstack/ironic-python-agent master: WIP - root device encryption https://review.opendev.org/c/openstack/ironic-python-agent/+/926425 | 19:07 |
opendevreview | cid proposed openstack/ironic master: Update configuration value in iRMC https://review.opendev.org/c/openstack/ironic/+/924349 | 19:09 |
opendevreview | Doug Goldstein proposed openstack/python-ironicclient master: support passing disable_ramdisk for clean and service state https://review.opendev.org/c/openstack/python-ironicclient/+/924895 | 19:31 |
cardoe | Sorry about the above, JayF. | 19:31 |
cardoe | So I'm late to the convo TheJulia and masghar, but multiple conductor groups are absolutely something I planned on exploring for my use case. And maybe I'm wrong, but maybe my use case is a valid case for them. | 20:15 |
cid | Good night ironic o/ | 20:17 |
cardoe | I've got one data center, let's say, and it's got 1 ironic API for the entire DC, so that user-facing traffic can just hit 1 endpoint. But really my DC is divided up into multiple leaf-spine fabrics. Data within one is fast and good, but crossing the boundaries goes through an interconnect that, while fast, is going to be limited / QoSed. | 20:18 |
cardoe | So my rough idea is to have conductors in each of those leaf-spines and the devices that live there would be assigned to that conductor group. | 20:20 |
cardoe | I see the traffic between the node and the conductor as being greater than between the node and the ironic API. | 20:21 |
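To make the conductor-group idea concrete: each node carries a conductor_group field set through the API, and each conductor advertises its own group via [conductor]conductor_group in its ironic.conf. A hedged openstacksdk sketch follows, assuming a clouds.yaml entry named "mycloud" and an illustrative "leaf-a-" node naming scheme; it is an example of the workflow, not a prescribed layout.

```python
import openstack

# assumes clouds.yaml defines a "mycloud" entry; names below are illustrative
conn = openstack.connect(cloud="mycloud")

# Pin every node that physically lives in leaf fabric "leaf-a" to that group.
# Conductors running in that fabric would set [conductor]conductor_group=leaf-a
# so the hash ring only maps these nodes onto them.
for node in conn.baremetal.nodes(details=False):
    if node.name and node.name.startswith("leaf-a-"):  # illustrative naming scheme
        conn.baremetal.update_node(node, conductor_group="leaf-a")
```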
TheJulia | cardoe: totally one of the reasons we built it :) | 23:00 |
TheJulia | And sorry for the delay in responding, a bad migraine hit me today. Waking up at this point. | 23:01 |
TheJulia | cardoe: so you could just have two conductors for each of your leafs, for example, and you can fail over there. But that entirely depends on the number of nodes in each leaf :) | 23:02 |
TheJulia | And your risk profile/concerns | 23:03 |
cardoe | So I’m a glutton for punishment and running OpenStack on top of k8s. So which is the harder option? | 23:14 |
TheJulia | You can just run one if you don't need to balance load. Steve Baker did some changes to allow the conductor to be a bit more graceful as a workload on k8s. | 23:15 |