Monday, 2025-07-07

noonedeadpunk	good morning	07:26
jrosser	o/ morning	07:50
jrosser	looks like a missing dependency here https://zuul.opendev.org/t/openstack/build/35dd297230ae4b7280a7780d9be80671	07:58
jrosser	distutils for centos10	07:58
jrosser	i was also thinking that making the "user supplied" choice an explicit backend in the pki role could make things simpler	08:03
jrosser	as we add extra backends it is kind of odd that user supplied is some special case of "standalone"	08:03
jrosser	when actually it would not use the standalone cert generation at all	08:03
jrosser	that would allow the vars that define a cert to become more uniform and reduce "these vars are for backend X" "these vars are for backend Y" that we seem to be going towards	08:04
noonedeadpunk	I think that distutils is provided by setuptools, right?	08:10
noonedeadpunk	I actually can recall covering it in some other molecule job	08:11
jrosser	i think thats right yes	08:11
noonedeadpunk	damiandabrowski: ^	08:16
damiandabrowski	hi! jrosser I'm not sure if I understand, can you prepare some example?	08:18
frickler	distutils is provided by setuptools, but the latter is no longer automatically installed in a venv, you need to make that explicit	08:18
jrosser	damiandabrowski: like here https://github.com/openstack/openstack-ansible-os_glance/blob/master/defaults/main.yml#L380	08:19
noonedeadpunk	ah	08:20
noonedeadpunk	well, how would it be different from standalone then?	08:20
jrosser	frickler: i think there is a good chance that this is actually missing in the system python, as the missing distutils error is happening inside execution of an ansible module on its target	08:20
frickler	but that should already have been needed with noble, cf. https://review.opendev.org/c/openstack/kolla/+/907589	08:21
noonedeadpunk	frickler: yeah, we do it for sure in most cases, but missed some other in molecule it seems	08:21
noonedeadpunk	or just docker images are slightly different in what they have for ubuntu/centos	08:22
damiandabrowski	ahhh I see, yes that may be a good idea.	08:22
damiandabrowski	If I understand you correctly, it will slightly reduce the "these vars are for backend X" "these vars are for backend Y" issue	08:23
damiandabrowski	because we won't need a separate src (for standalone) and cert (for hashi_vault) parameter here, right?	08:23
noonedeadpunk	but thinking about that, offloading this logic to the role might make sense indeed	08:23
damiandabrowski	https://opendev.org/openstack/openstack-ansible-os_glance/src/commit/1f1dc604b1cf0c5b4c2cd8c502299399e6848b9a/defaults/main.yml#L405	08:23
jrosser	damiandabrowski: i was noticing in your patches that the name of the cert really is quite simple `myservice_{{ ansible_facts['hostname'] }}`	08:23
jrosser	and theres no reason it cannot also be the same format for the standalone, except that it is expecting a path in order to also support user-supplied	08:24
noonedeadpunk	we can just go with `name` instead then?	08:24
noonedeadpunk	or well.	08:24
jrosser	well something yes for sure, im not totally sure what is the best option	08:25
noonedeadpunk	yeah, it can be name for standalone/vault and keep src for user-supplied?	08:25
jrosser	"src" is pretty well understood from the copy module so that does make sense for the user-supplied	08:25
noonedeadpunk	as we still need to pass a user-supplied path somehow, unless we restrain this to expected structure	08:25
noonedeadpunk	and tell that in order to use user-supplied certs, you have to place them under expected structure	08:26
damiandabrowski	but maybe we can use exactly the same parameter for all backends - standalone, user-supplied and hashi_vault?	08:26
damiandabrowski	in this case, src would perfectly fit IMO	08:26
damiandabrowski	ofc, for user-supplied backend, user will need to override the default value of src with the expected cert path	08:27
jrosser	unless the use of `name` in that case means you have to put it in the expected directory	08:28
jrosser	and `src` allows you to place it wherever you wish	08:28
noonedeadpunk	yeah	08:28
damiandabrowski	should we work on this before of after merging hashi_vault patches?	08:29
damiandabrowski	or after*	08:29
jrosser	anyway, i was just thinking that if we tidy up a few things (like noonedeadpunk did with the defaults/group vars typo) and also this, there might be quite some improvement for adding new backends cleanly	08:29
jrosser	you did say that you had some trouble with overriding defaults and vars, i think we should address that first	08:30
damiandabrowski	ouh, maybe I wasn't clear enough. I had troubles with this in other roles :D (for ex. ansible-hardening stores its variables in vars/ which is problematic)	08:31
jrosser	well it's not problematic, so long as they're not supposed to be overridden :)	08:31
jrosser	but yeah it needs care to make it correct	08:31
jrosser	ahhah https://opendev.org/openstack/ansible-role-pki/commit/4e960a1083c71babc05a282a929f29d8f2f4df02	08:33
noonedeadpunk	oh, well :)	08:34
noonedeadpunk	time to add it to redhat then	08:34
opendevreview	Jonathan Rosser proposed openstack/ansible-role-pki master: Add python3-setuptools for redhat-10 based distros. https://review.opendev.org/c/openstack/ansible-role-pki/+/954213	08:35
damiandabrowski	fair point, yeah :D I will try to sort out the vars/ vs. defaults/ issue, but we will definitely need to have some backend-specific defaults in ansible-role-pki anyway	08:37
damiandabrowski	and I wonder if it's okay to keep them in defaults/main.yml or move them to something like defaults/hashi_vault.yml and dynamically import this file when needed	08:37
damiandabrowski	example: https://opendev.org/openstack/ansible-role-pki/src/commit/00545ffa46446372b0baf7fdb8a4b99e3eb5926a/defaults/main.yml#L205	08:38
jrosser	i think it is fine for them to be in defaults, so long as it is variables that are intended to be overridden	08:39
jrosser	i would much rather we address this sort of thing https://opendev.org/openstack/ansible-role-pki/src/commit/00545ffa46446372b0baf7fdb8a4b99e3eb5926a/defaults/main.yml#L172-L180	08:39
damiandabrowski	okok, will do that	08:43
damiandabrowski	I aim to apply improvements to my patches later this week	08:43
noonedeadpunk	jrosser: btw on your comment here: https://review.opendev.org/c/openstack/openstack-ansible/+/953570 :)	08:43
damiandabrowski	it would be nice to gather as much feedback as possible by then :D	08:43
noonedeadpunk	the problem is that, although squid is listening on *:3128, it's not actually responding on management_address, as that's configured after the service is started	08:44
jrosser	oh because we change the order of setting up squid and creating the network?	08:45
noonedeadpunk	but also, there's a race condition, that openstack_hosts fail before networks are configured, as they try to reach proxy to install systemd_networks	08:45
noonedeadpunk	yeah	08:45
noonedeadpunk	but the route I've added is /32	08:46
jrosser	right but that allows services to contact the external vip?	08:46
noonedeadpunk	so it does not really give any escape path - just tells how to reach squid and do that not via lxcbr with nat, but via mgmt network	08:46
jrosser	unless i misunderstand......	08:47
noonedeadpunk	hm	08:47
jrosser	the proxy scenario is kind of a two-for-the-price-of-one test	08:47
jrosser	because it proves that everything goes via the proxy, or it will fail	08:48
jrosser	and i think also it prevents misconfigured/broken services from directly using the external vip	08:48
noonedeadpunk	well, they will go through proxy by default anyway then?	08:49
jrosser	no, because it only sets deployment_environment_variables iirc	08:49
noonedeadpunk	as public vip is in no_proxy anyway?	08:49
jrosser	so there is no left-over proxy config left once the ansible has run	08:49
noonedeadpunk	well, then this idea to offload to openstack_host sucks....	08:51
noonedeadpunk	as with proxy ansible/apt wants to go through it right away	08:51
noonedeadpunk	not giving any chance to provision networks after bootstrap_aio is completed	08:51
jrosser	well yes	08:53
jrosser	in a real environment the proxy would be something that just exists before you start and you point to it	08:53
jrosser	not something ever provisioned by openstack-ansible	08:53
jrosser	similar case pretty much for step-ca?	08:53
noonedeadpunk	I think proxy is a bit unique here	08:55
jrosser	i think that theres two things going on here	08:55
noonedeadpunk	as proxy is pre-requirement for `apt` to install packages	08:55
jrosser	the setup of networks in openstack-hosts does have benefits for automating more of the OSA specific things	08:55
noonedeadpunk	while step-ca is needed waaaay later	08:55
jrosser	but the other case is "test fixtures" that we need which are somehow network related, and squid is the most early of these it seems	08:56
noonedeadpunk	well, yes. As also on production you won't host squid on any of openstack hosts anyway	09:02
noonedeadpunk	it should be somehow in different perimeter	09:02
jrosser	well maybe thats what we do	09:10
noonedeadpunk	ok, can we start a bit from the beginning here? Just trying to think if I should abandon this patch based on that or maybe not	09:10
jrosser	we have some other code that deploys test fixtures right at the start	09:10
jrosser	and some other interface or whatever thats nothing to do with the OSA deploy	09:10
jrosser	so it behaves just like it would in production	09:10
noonedeadpunk	so like another bridge?	09:11
noonedeadpunk	and add it to all containers?	09:11
jrosser	and use the .102 IP perhaps so that theres no confusion with the VIP	09:11
jrosser	oh hmm	09:11
jrosser	i think it would be OK if the route you'd added was to something that was not the external VIP	09:12
noonedeadpunk	so the issue is order of execution	09:13
noonedeadpunk	we don't have this IP address up until systemd_networkd is restarted, but we already need it to install systemd_networkd by apt	09:13
noonedeadpunk	and `bootstrap_host_public_address` is ansible_default_ip4_address	09:14
noonedeadpunk	which we already have by default	09:14
noonedeadpunk	and we can't add extra IP to the default interface, as it can be real IP and also restricted by firewall or allowed-address pairs	09:15
noonedeadpunk	so I can't mess up with it	09:15
noonedeadpunk	so the only 2 options I have to not start up squid on the public VIP are either to provision a completely separate network in aio as we do today, or give up on the idea of using openstack_hosts for network provisionment	09:16
noonedeadpunk	(but also squid is listening on *:3128, so it's not even startup, but just ordering loophole)	09:17
noonedeadpunk	I think that with current approach we do test proxy connection in quite a good way tbh. As dropping the route or doing smth wrong with squid would result in failures right away.	09:18
noonedeadpunk	With that, we actually were not testing proxy connection to the public VIP anyway as we have this today: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L322	09:19
jrosser	yes i agree it is pretty robust and catching errors that we would not find in other tests, even for non proxy deployments (like endpoint errors)	09:19
noonedeadpunk	So services would walk to public IP without squid regardless	09:19
noonedeadpunk	what the route changes, is that snat won't be used for reaching public vip	09:21
noonedeadpunk	but we're keeping the interface we communicate over, as previously we were talking through mgmt interface as well	09:22
noonedeadpunk	but yeah, dunno	09:22
noonedeadpunk	It's not that I like the proposal, I just don't see a good option to solve that	09:22
jrosser	also i think i removed lxcbr0/eth0 from the containers entirely in this config	09:25
jrosser	so it's not possible to nat via the host at all	09:25
jrosser	https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L324-L326	09:27
noonedeadpunk	ah, yes, true	09:28
noonedeadpunk	I think that's why I added route :D	09:29
jrosser	i think i'm more confused rather than less now	09:29
jrosser	but eth1 should be the mgmt subnet	09:29
noonedeadpunk	and this is what was guaranteeing public ip is not reachable	09:29
jrosser	oh ok ok public ip is the node ip isnt it	09:29
noonedeadpunk	and I think it was not reachable at all, as containers were not having a default route	09:29
noonedeadpunk	well	09:30
jrosser	so i'm getting confused about .100 / .101 for sure	09:30
noonedeadpunk	what we can do... is to have a different IP for proxy for containers and bare metal	09:30
noonedeadpunk	.100 and .101 are both management network	09:30
jrosser	yes one is the VIP and one is the bind address for services?	09:31
noonedeadpunk	as for containers, we can keep having proxy on management network, but do it on public for bare metal	09:31
noonedeadpunk	yes	09:31
noonedeadpunk	but for public we don't have VIP	09:31
noonedeadpunk	so yeah, I can do is_metal check and use either bootstrap_host_public_address or bootstrap_host_management_address depending on the choice	09:32
noonedeadpunk	for http_proxy	09:32
noonedeadpunk	and then we don't need route	09:33
jrosser	so for the "early" things it would use the public address that squid is already bound to	09:34
noonedeadpunk	yeah	09:34
noonedeadpunk	for late as well, given they're running not in lxc though	09:34
noonedeadpunk	but public vip is local bound, so you can't prohibit reaching it anyway	09:35
noonedeadpunk	or well	09:35
noonedeadpunk	you can...	09:35
noonedeadpunk	but you got it :)	09:35
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Offload network provisionment for AIO to openstack_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/953570	09:38
noonedeadpunk	so smth like that I guess	09:38
noonedeadpunk	oh, not really	09:39
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Offload network provisionment for AIO to openstack_hosts https://review.opendev.org/c/openstack/openstack-ansible/+/953570	09:40
noonedeadpunk	it could be a fair middle-ground....	09:41
noonedeadpunk	or indeed we should be having just a separate network/bridge for squid which we still do during bootstrap-aio	09:42
noonedeadpunk	or abandon the patch :)	09:42
opendevreview	Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239	12:43
jrosser	damiandabrowski: ^ maybe something like this? then for the vault stuff you can also use the `name` key	12:43
jrosser	i think also this is backward compatible, and we could go through the roles and migrate everything to use `name`	12:44
damiandabrowski	i liked the idea of a "user-provided" backend a bit more, because it would allow us to have only one variable instead of name and src, but this also looks good at first glance	13:18
damiandabrowski	I'll have a deeper look tomorrow	13:30
opendevreview	Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239	14:38
opendevreview	Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239	14:45
opendevreview	Jonathan Rosser proposed openstack/ansible-role-pki master: Allow certificates to be installed by specifying them by name https://review.opendev.org/c/openstack/ansible-role-pki/+/954239	15:11
jrosser	damiandabrowski: even if we add a user-provided backend we have to support the vars that are in use today	15:11
jrosser	even if its temporary whilst migrating src -> name for most things	15:12
jrosser	and one thing i'm not sure about is how we "enable" a user provided cert	15:12
jrosser	as right now it's enough to just define `glance_user_ssl_cert` as the path to the file, and it will work	15:13
jrosser	no messing with backend settings, or redefining the whole set of certs for glance just to set one of them to be user supplied	15:13
jrosser	setting that var really implies that some 'user-supplied' functionality is used, however that is implemented	15:14
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_glance master: Use 'name' to specify SSL certificates to the PKI role https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/954269	15:29
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible-os_glance master: Use 'name' to specify SSL certificates to the PKI role https://review.opendev.org/c/openstack/openstack-ansible-os_glance/+/954269	15:51
damiandabrowski	yeah, you may be right about keeping src...	16:22
jamesdenton_	jrosser hello! looking back thru some old IRC logs... curious if you're using ASAP^2 in production with OVN	17:26
jrosser	well	17:27
jrosser	some time ago we (well andrewbonney actually) did a POC with it	17:27
jrosser	and whilst it did basically work it was extremely fragile	17:27
jrosser	i do have some notes here if there is anything in particular	17:30
jamesdenton_	Nothing in particular, no. Curious if the fragility was more on the driver side than say, neutron	17:30
jamesdenton_	Mainly curious to know what throughput you're seeing for GENEVE out of the box	17:31
jamesdenton_	and besides ASAP^2, are you doing anything special for offloading	17:31
jamesdenton_	i'm not crazy about DPDK, even after all this time	17:31
jrosser	from what i can see in the test environment we were getting ~5Gbps out of the box between two VM on different hosts with the standard OVN setup	17:37
*** jpw_alt is now known as jpw		17:37
jrosser	and i think we got that up to 18Gbps with the offloading as good as we could get at the time (~2 years ago)	17:38
jrosser	i think that the fragility was in the composite of all-the-things that had to be just right at the same time	17:39
jrosser	and all these things are pretty niche features, like vf-lag, the ovs offloading itself, and i think at the time offloaded security groups were a really new feature	17:40
jrosser	and you need some pretty custom config of the NIC at boot, at every boot as well	17:41
jamesdenton_	that tracks pretty well. thank you	17:43
jrosser	it would do 35Gbps between VM on the same host, and that was with 4x iperf threads all at ~85% CPU on the server side	17:45
jrosser	so that might have been running out of CPU grunt at that point rather than network	17:46
jamesdenton_	yeah, exactly. We're seeing about 2.2Gbps between hypervisors on some Broadcom 25G NICs	17:46
jamesdenton_	and thats a single iPerf thread. I can scale that out horizontally but no one thread is > than 2Gbps or so	17:47
jamesdenton_	What's interesting is this offload page mentions Broadcom NICs as supporting offloading - though i'm not sure if the implementation is the same. https://docs.openstack.org/neutron/latest/admin/config-ovs-offload.html	17:48
jrosser	looks like we also struggled to set encapsulation + vlan tagging at the same time on what you'd call the "uplink" port from OVS	17:49
jrosser	for this test we ended up having to have the tunnel traffic untagged up to the switch	17:49
jrosser	interesting - no detail for the broadcom nic though	17:50
jamesdenton_	that doc doesn't really call out the vtep itself, huh. an important missing detail :D	17:51
jrosser	damiandabrowski: why dont we also implement ca_chain and fullchain for the standalone backend	18:10
jrosser	seems like we leak implementation details of hashi vault out with the need for dual handling of string or list for `type`	18:10
