*** ultra is now known as Guest3688 | 04:08 | |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Add 'tls-transition' scenario https://review.opendev.org/c/openstack/openstack-ansible/+/885194 | 08:46 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible stable/zed: Add support for 'tls-transition' scenario https://review.opendev.org/c/openstack/openstack-ansible/+/885196 | 08:46 |
kleini | I am currently upgrading to Yoga. setup-infrastructure did not fail in staging but in production: "Host controller2-repo-container-lalala is not in 'Peer in Cluster' state". I will read now through GlusterFS setup guide but maybe you have some faster hints. | 08:48 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Enable TLS on haproxy VIPs and backends by default https://review.opendev.org/c/openstack/openstack-ansible/+/885192 | 08:48 |
kleini | Stumbled over https://bugzilla.redhat.com/show_bug.cgi?id=1051992 restarting glusterd resolved peers in status "Accepted peer request" | 08:58 |
jrosser | i wonder if it is a race somehow | 09:01 |
jrosser | like if we bring the gluster peers up all at once then it might end up in a strange state | 09:02 |
kleini | Now I have controller2 and controller3 as peers listed in controller2-repo-container-something and controller3-repo-container-something. Of course, "disconnected". | 09:05 |
kleini | before 2- and 3-repo-container had just 1-repo-container as single peer | 09:05 |
kleini | no, 2-repo-container has just 3-repo-container as single peer and vice versa. | 09:08 |
kleini | sorry, my first time getting in touch with glusterfs | 09:09 |
kleini | finally solved. I had two issues: 1. glusterd required a restart on all nodes to get the transition from "Accepted peer request" to "Peer in Cluster"; 2. nodes 2 and 3 had each other as peer, while node 1 had 2 and 3 as peers. I had to add node 1 as a peer on node 2, which automatically added node 1 as a peer on node 3. | 09:24
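A rough sketch of that manual recovery as an ad-hoc play (host and container names are placeholders, and the `repo_all` target group is an assumption, not taken verbatim from this deployment):

```yaml
---
# Sketch only: restart glusterd everywhere to clear the "Accepted peer
# request" state, then probe the missing first node from node 2 so all three
# repo containers end up in "Peer in Cluster". Names are placeholders.
- hosts: repo_all
  gather_facts: false
  tasks:
    - name: Restart glusterd to recover stuck peer states
      ansible.builtin.service:
        name: glusterd
        state: restarted

    - name: Probe node 1 from node 2 so it joins the existing peering
      ansible.builtin.command: gluster peer probe controller1-repo-container
      when: inventory_hostname == 'controller2-repo-container'
      changed_when: true
```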
jrosser | interesting | 09:27 |
jrosser | iirc the `infra` CI jobs start 3 repo containers to check this | 09:28 |
noonedeadpunk | yup, it does | 09:28 |
noonedeadpunk | but we don't check things like idempotency there, for example | 09:29
noonedeadpunk | and on top of that, I guess environment wasn't brand new? | 09:29 |
kleini | maybe my second issue is caused by my first one. | 09:30 |
kleini | environment is old, initially deployed with S or T. second and third controller node have been added later. | 09:31 |
kleini | as my initial problem with peers in wrong states still does not seem to be resolved at all, would there be anything that could help to avoid such issues? | 09:41 |
kleini | anything in my logs now, that could help to avoid it? | 09:41 |
noonedeadpunk | To be frank - I'm not a huge expert in gluster. We mount cephfs instead. You can actually use any shared FS instead of gluster, even things like s3fs or nfs, since in the end the mount is configured using systemd-mount, and the gluster installation can be disabled with a variable | 09:45
jrosser | yes, gluster is not an absolute requirement at all, it's just that something is needed to provide a shared filesystem | 09:50 |
jrosser | you can disable it and provide something else here https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/repo_all.yml#L25-L32 | 09:51 |
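By way of illustration, a user_variables.yml sketch of that kind of override, assuming an existing NFS export replaces gluster. The variable names here are assumptions for illustration only; the authoritative names are the ones in the linked repo_all.yml:

```yaml
---
# Assumption-heavy sketch: skip the bundled gluster setup and point the repo
# servers' systemd mount at an external NFS share instead. Verify the real
# variable names against the linked repo_all.yml before copying anything.
openstack_repo_server_enable_glusterfs: false   # hypothetical toggle name
repo_server_systemd_mounts:                     # hypothetical list name
  - what: "nfs.example.com:/srv/osa-repo"
    where: "/var/www/repo"
    type: "nfs4"
    state: "started"
    enabled: true
```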
jrosser | having said that, really it should work though | 09:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump ansible-core to 2.15.1 and collections https://review.opendev.org/c/openstack/openstack-ansible/+/886527 | 09:55 |
noonedeadpunk | yeah, it should for sure | 09:56 |
jrosser | maybe as simple as needing to run that in serial, but i don't really know | 10:00 |
opendevreview | Damian Dąbrowski proposed openstack/openstack-ansible master: Remove haproxy_accept_both_protocols from repo_all https://review.opendev.org/c/openstack/openstack-ansible/+/886586 | 10:00 |
kleini | will think about migrating that to a cephfs, but then the repo containers need an additional connection to the Ceph storage network. | 10:05
jrosser | glusterfs has been reliable for us | 10:06 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Remove Ubuntu 20.04 support https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/886595 | 10:10 |
kleini | I expect it to be reliable, too, at least according to what I've heard. I just now had issues in production with setting it up initially. | 10:11
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Cleanup old OS support https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/886597 | 10:14 |
noonedeadpunk | switching to ansible-core 2.15 won't be trivial... Partially because I did an unsupported thing with loop label lately :( | 10:33
jrosser | this was to prevent secret output (key) being in the ansible log? | 10:57 |
noonedeadpunk | well, more to suppress output and be more clear about what's passed to the module instead of just what we're looping against | 11:04
noonedeadpunk | as loop item != what we pass to the module, so kinda weird | 11:04 |
jrosser | i think using default(omit) on the label is pretty suspect too | 11:09 |
jrosser | even if a mapping were allowed | 11:09 |
noonedeadpunk | I think I was adding default('omit') ? Which would just print out "omit" | 11:32
noonedeadpunk | At least that was intention | 11:32 |
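To make the loop-label point concrete, a small self-contained sketch (names invented): loop_control.label controls what the log shows for each item, and a `default('omit')` there would just render the literal string "omit" for items without a name:

```yaml
---
# Illustrative only: label the loop with the item name so the secret value
# never shows up in output. default('omit') does not omit anything here; it
# simply prints "omit" when item.name is undefined.
- hosts: localhost
  gather_facts: false
  vars:
    service_secrets:
      - { name: keystone, key: "s3cr3t-1" }
      - { name: glance, key: "s3cr3t-2" }
  tasks:
    - name: Do something with each secret
      ansible.builtin.debug:
        msg: "would configure {{ item.name }}"
      loop: "{{ service_secrets }}"
      loop_control:
        label: "{{ item.name | default('omit') }}"
```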
noonedeadpunk | jrosser: have you ever been concerned about live migration speed? | 11:33
noonedeadpunk | As it seems that with enabled TLS for libvirtd it uses only single core for migration | 11:33 |
noonedeadpunk | while with it disabled it utilizes all. Which means that the migration speed is like that of a VM with multiqueue disabled | 11:34
noonedeadpunk | ~1.2gb | 11:34 |
jrosser | i dont recall us having seen an issue with that yet | 11:35 |
jrosser | andrewbonney: ^ ? | 11:35 |
jrosser | that is pretty sad though | 11:35 |
andrewbonney | I haven't seen it, but doesn't mean we don't have it | 11:35 |
andrewbonney | Our previous issues were all around using the wrong interfaces | 11:36 |
noonedeadpunk | Like https://listman.redhat.com/archives/libvirt-users/2018-May/msg00053.html | 11:36 |
jrosser | the second post there is talking about large volumes | 11:38 |
jrosser | andrewbonney: related to wrong interfaces, there are some patches regarding management address / ssh address which we need to go over | 11:38 |
noonedeadpunk | ah, well https://wiki.qemu.org/Features/Migration-Multiple-fds | 11:42 |
noonedeadpunk | what is fun though, is to see how encryption affects network throughput | 11:50 |
jrosser | similar https://bugzilla.redhat.com/show_bug.cgi?id=1968540 | 11:54 |
noonedeadpunk | as without tls enabled for live migrations (using plain tcp), I have like 20gbit/s vs 3gbit/s with enabled encryption | 11:54 |
noonedeadpunk | yup... | 11:56 |
noonedeadpunk | So it's a feature | 11:57 |
noonedeadpunk | Though now I'm much more sceptical about enabling internal tls by default | 11:57
noonedeadpunk | damiandabrowski: you might be interested in the topic as well | 11:57 |
jrosser | feels like that is a legitimate thing to talk to nova about, as it's not obvious that there is a big performance hit there | 12:00 |
noonedeadpunk | yeah, already pinged them as well. At least mentioning that in docs would be good I guess | 12:01 |
damiandabrowski | noonedeadpunk: but nova_qemu_vnc_tls is enabled by default already | 12:02 |
noonedeadpunk | but what is really nasty is that when you disable tls, you also cannot do authentication | 12:03
noonedeadpunk | as it's done through mTLS | 12:03 |
damiandabrowski | i didn't want to mention vnc :D | 12:03 |
noonedeadpunk | damiandabrowski: well, what i meant is that any encryption backed by gnutls will take a serious performance hit | 12:04
jrosser | isnt live migration an extreme case of that though? | 12:05 |
jrosser | normal API traffic will be spending much time doing $stuff in python anyway | 12:06 |
damiandabrowski | i can test that with rally if we have any doubts | 12:06 |
damiandabrowski | but i agree with jrosser | 12:06 |
jrosser | i am also not really familiar with the process model of uwsgi, if the tls is in a separate process/thread from the python parts | 12:08 |
noonedeadpunk | well, this feature for gnutls was merged for 3.7.3 which is exactly what you'd get in 22.04 | 12:11 |
noonedeadpunk | I should test in the sandbox | 12:11 |
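For reference, a hedged sketch of the knobs this discussion revolves around. `live_migration_with_native_tls` and `live_migration_scheme` are standard nova [libvirt] options and `nova_nova_conf_overrides` is the usual OSA override hook, but the override below is only an illustration of the throughput/confidentiality trade-off, not a recommendation (and, as noted above, dropping TLS also drops the mTLS-based authentication):

```yaml
---
# user_variables.yml sketch: fall back to a plain tcp migration stream when
# throughput matters more than encrypting the traffic. With native TLS the
# gnutls-backed stream discussed above is effectively single-threaded.
# Note: without TLS there is also no mTLS authentication of the peers.
nova_nova_conf_overrides:
  libvirt:
    live_migration_with_native_tls: false
    live_migration_scheme: tcp
```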
*** mgoddard- is now known as mgoddard | 12:28 | |
NeilHanlon | yayyy centos is no longer publishing to git.centos.org | 12:57
* NeilHanlon begins crying | 12:57 | |
noonedeadpunk | ┻━┻︵ \(°□°)/ ︵ ┻━┻ | 13:18 |
NeilHanlon | (https://www.redhat.com/en/blog/furthering-evolution-centos-stream) | 13:22 |
noonedeadpunk | So now source code is locked for rhel customers only? | 13:29 |
noonedeadpunk | rly? | 13:29 |
NeilHanlon | pretty much | 13:30 |
NeilHanlon | what a fun wednesday | 13:30 |
noonedeadpunk | guess we should discuss marking CentOS as experimental at this point | 13:31 |
noonedeadpunk | But I'm not really sure I understand what it means for Rocky? Not much I guess? | 13:32
NeilHanlon | for now it means we don't have updates... | 13:33 |
NeilHanlon | actively working on wtf we're going to do, though | 13:33 |
mgariepy | wow. | 13:34 |
noonedeadpunk | oh, wow | 13:35 |
mgariepy | rebase rocky on.. debian ? | 13:37 |
noonedeadpunk | lol | 13:37 |
noonedeadpunk | but that's not fun at all to be frank | 13:39 |
mgariepy | i know. | 13:39 |
noonedeadpunk | obviously that's a move against derivatives | 13:39
mgariepy | ibm is so evil. imo. | 13:41 |
spatel | what could be the issue for cinder volume stuck in detaching state ? | 13:44 |
spatel | I am able to create/attach but detach just stuck | 13:44 |
noonedeadpunk | does it get detached from nova point of view? | 13:45 |
noonedeadpunk | as attachment is stored both in cinder and nova databases | 13:46 |
spatel | checking nova logs | 13:47 |
noonedeadpunk | and depending on what command you use to detach, flows might be different. Or well, they could be different until the latest os-brick OSSA vulnerability got covered | 13:47
spatel | noonedeadpunk this is the error I am getting on nova-compute.log - https://paste.opendev.org/show/bAbgQN0Qpf9do7pp0tMj/ | 13:51 |
noonedeadpunk | ask kolla ヽ(。_°)ノ | 13:52 |
spatel | haha, its my lab | 13:52 |
spatel | my production still running on openstack-ansible but some small environment using kolla.. :( | 13:53 |
noonedeadpunk | but essentially the latest ossa fix made volume detach commands issued directly to cinder invalid | 13:53
noonedeadpunk | and I think you should always use nova api to detach volumes since then | 13:54 |
spatel | Hmm! I am using horizon to detach. You are saying use CLI? | 13:54 |
spatel | everything was working fine until yoga, but as soon as I upgraded to zed I encountered this issue. | 13:57
spatel | I will open a bug and see if it's a real issue or something else | 13:57
noonedeadpunk | Yeah, that's actually backported back to Yoga | 13:58 |
noonedeadpunk | https://security.openstack.org/ossa/OSSA-2023-003.html | 13:58 |
noonedeadpunk | and that's the release note covering your issue I believe https://review.opendev.org/c/openstack/cinder/+/882835/2/releasenotes/notes/redirect-detach-nova-4b7b7902d7d182e0.yaml#20 | 14:00
noonedeadpunk | `cinder now rejects user attachment delete requests for attachments that are being used by nova instances to ensure that no leftover devices are produced on the compute nodes which could be used to access another project's volumes.` | 14:01 |
spatel | You are saying it's required to use a nova service token? | 14:01
spatel | is this what you refer - https://docs.openstack.org/nova/latest/admin/configuration/service-user-token.html | 14:02 |
noonedeadpunk | I'm saying there used to be 2 api calls that allowed detaching a volume - one to cinder and another to nova | 14:03
noonedeadpunk | from now on requests directly to cinder will fail | 14:03
noonedeadpunk | So you have this https://docs.openstack.org/api-ref/block-storage/v3/index.html#detach-volume-from-server | 14:04 |
noonedeadpunk | and you have that https://docs.openstack.org/api-ref/compute/#detach-a-volume-from-an-instance | 14:04 |
noonedeadpunk | and now the first one can be called only by the nova service and not by a user | 14:05
noonedeadpunk | If I'm not mistaken and it's vice versa... | 14:06 |
spatel | Let me understand, You want me to use nova volume-detach command to detach volume? | 14:07 |
spatel | when you say nova api means what? | 14:07 |
spatel | Let me understand whole bug report first | 14:12 |
noonedeadpunk | or `openstack server remove volume` | 14:12 |
spatel | that didn't help :( | 14:14 |
spatel | I believe we need to configure something like this in cinder or nova - send_service_user_token = True | 14:14 |
spatel | because of the security fix cinder now won't allow detaching a volume without a valid token from nova.. | 14:15
spatel | trying to understand where and how I should add those options in config | 14:15
spatel | https://bugs.launchpad.net/cinder/+bug/2004555/comments/75 | 14:16 |
noonedeadpunk | You can check how we did that :) | 14:18 |
noonedeadpunk | so you should define service_user https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L193-L203 | 14:19 |
spatel | oh.. let me splunk OSA code.. | 14:19 |
noonedeadpunk | and also use service token roles https://opendev.org/openstack/openstack-ansible-os_cinder/src/branch/master/templates/cinder.conf.j2#L177-L179 | 14:19 |
spatel | I should be adding them in both NOVA and Cinder, correct? | 14:19
noonedeadpunk | and glance I guess | 14:19 |
spatel | 3 roles? | 14:20 |
noonedeadpunk | Well, role is `service` for all of them | 14:24 |
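A condensed sketch of what those linked template sections amount to, written as user_variables.yml overrides (the two override variables are OSA's standard conf-override hooks; the option names are the stock oslo/keystonemiddleware ones; all credential values are placeholders, and a recent OSA renders these sections for you):

```yaml
---
# Sketch: nova sends a service token alongside the user token when calling
# cinder, and cinder requires the "service" role on that token. Placeholder
# values only; on an up-to-date OSA the os_nova/os_cinder roles already
# template these sections, as in the files linked above.
nova_nova_conf_overrides:
  service_user:
    send_service_user_token: true
    auth_type: password
    auth_url: "https://keystone.example.com:5000/v3"
    username: nova
    password: "REPLACE_ME"
    user_domain_name: Default
    project_name: service
    project_domain_name: Default

cinder_cinder_conf_overrides:
  keystone_authtoken:
    service_token_roles: service
    service_token_roles_required: true
```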
admin1 | noonedeadpunk, from which tag in osa is the nova service token implemented ? | 14:30 |
admin1 | i see .. yoga and xena | 14:31 |
noonedeadpunk | it's not backported to xena :( | 14:42 |
noonedeadpunk | for yoga it's 25.4.0 | 14:43 |
noonedeadpunk | and minor upgrade could be quite breaking as well - I believe I wrote a release note to address that | 14:43 |
spatel | I hit this issue in my upgrade path so definitely worth keeping eyes | 14:44 |
anskiy | noonedeadpunk: that note only mentions major upgrades, if I understand correctly | 15:23 |
mgariepy | anyone here do multiple let'sencrypt domain/ips on a deployment ? | 15:23 |
anskiy | I'm planning an upgrade from 25.2.0 to 25.4.0, should I be concerned about that thing? | 15:24 |
* noonedeadpunk needs to check patch again | 15:24 | |
noonedeadpunk | anskiy: I think you're right, and the minor upgrade will just cover the vulnerability and resiliently enable usage of service tokens | 15:26
noonedeadpunk | or well, relatively resiliently | 15:26
anskiy | okay, thank you :) | 15:27 |
noonedeadpunk | as the problem was arising when you already require service roles, but users were not assigned the role in the first place | 15:27
noonedeadpunk | When upgrading to Yoga, you would get the role assigned, but it was not forced yet. And with this minor upgrade it will be forced | 15:28
spatel | noonedeadpunk it works now after adding service_user snippet :) | 16:16 |
spatel | Thanks for pointing that out | 16:16
admin1 | mgariepy, use case ? .. i usually use a wildcard | 18:22 |
mgariepy | having 2 different ips on the ctrl, one for api and the other for object storage | 18:24
mgariepy | admin1, ^^ | 18:25 |
admin1 | i have used a SAN for those, but not used letsencrypt | 18:33
jrosser | mgariepy: I think you can supply extra args to certbot through the haproxy role vars | 18:52 |
jrosser | with more ‘-d {{ fqdn}}’ as you need them | 18:52 |
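Something along these lines is presumably what that would look like in user_variables.yml; the variable name below is a hypothetical stand-in, so check the haproxy_server role defaults for the real hook before using it:

```yaml
---
# Hypothetical sketch only: pass extra -d options to certbot so one
# certificate covers additional names. The variable name is an assumption,
# not confirmed against the role; the real hook may differ per branch.
haproxy_ssl_letsencrypt_certbot_extra_args: >-
  -d {{ external_lb_vip_address }}
  -d object.example.com
```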
mgariepy | jrosser, yep i think i found the correct stuff | 18:53 |
mgariepy | i want to have 1 cert per ip/domain tho. | 18:53 |
jrosser | yeah I’m not so sure we can do that just now | 18:53 |
mgariepy | will need to do some stuff but it should work. | 18:53 |
mgariepy | i'll see and patch as needed i guess. | 18:53 |
jrosser | we are looking at enabling s3 static sites which needs another ip/dns on the same haproxy | 18:53 |
mgariepy | the keepalived part is kinda simple. | 18:54 |
mgariepy | with keepalived_instances_overrides | 18:54 |
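For the keepalived side, a rough sketch of the kind of override being referred to, adding a second public VIP for the object endpoint (the inner structure is meant to mirror OSA's default keepalived_instances definition, and the addresses/interface are placeholders):

```yaml
---
# Sketch with placeholder addresses: give the external keepalived instance a
# second VIP so the object-storage frontend can bind to its own IP.
keepalived_instances_overrides:
  external:
    vips:
      - "203.0.113.10 dev br-ext"   # existing API VIP (placeholder)
      - "203.0.113.11 dev br-ext"   # additional VIP for the object endpoint
```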
jrosser | unclear if it is possible to have two haproxy front ends with different LE setups | 18:54 |
jrosser | or if it’s ok to share the same very with a SAN | 18:55 |
jrosser | *same cert | 18:55 |
mgariepy | i'll dig a bit, might need some custom haproxy front/back i guess.. | 18:59 |
mgariepy | hmm yeah i guess it would need some adjustment.. | 19:24 |
mgariepy | jrosser, i guess we would need to refactor part of the code. here. | 19:56 |
mgariepy | the haproxy is probably the simplest part since it can be bound to a specific ip for a frontend | 19:58