gokhan | good morning noonedeadpunk, I am trying distribution upgrades, but when reinstalling infra nodes, lxc containers are not created with "openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*". it seems we also need to add lxc_hosts when using a limit | 06:27 |
gokhan | noonedeadpunk, I found it, we also need to add reinstalled_host,reinstalled_host-host_containers. we need to update the distribution upgrade document | 07:48 |
noonedeadpunk | gokhan: oh, yes, sure | 08:32 |
noonedeadpunk | you definitely need it | 08:32 |
noonedeadpunk | however, I somehow thought that reinstalled_host* includes both it and containers? | 08:32 |
noonedeadpunk | gokhan: like, when I do `ansible -m ping os-control01*` I get both host and containers | 08:33 |
noonedeadpunk | so `openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*` should do the trick? | 08:34 |
gokhan | noonedeadpunk, sorry, it was my fault, I didn't use the asterisk * after the node name :( it is working | 08:36 |
noonedeadpunk | yeah, asterisk is important there :) | 08:38 |
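A minimal sketch of the pattern resolved above, with `reinstalled_host` standing in for the actual node name; the trailing asterisk is what pulls the host's LXC containers into the limit:

```bash
# Confirm the wildcard matches both the bare host and its containers
ansible -m ping reinstalled_host*

# With the asterisk, a single limit entry covers host + containers; lxc_hosts is not needed
openstack-ansible setup-hosts.yml --limit localhost,reinstalled_host*
```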
gokhan | noonedeadpunk, in this command "openstack-ansible set-haproxy-backends-state.yml -e hostname=<infrahost> -e backend_state=disabled --limit reinstalled_host", is infrahost the reinstalled host or another infrahost | 08:58 |
gokhan | ? | 08:58 |
noonedeadpunk | good question | 09:04 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI https://review.opendev.org/c/openstack/openstack-ansible/+/908521 | 09:33 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Disable RPC configuration for Neutron with OVN in CI https://review.opendev.org/c/openstack/openstack-ansible/+/908521 | 09:34 |
noonedeadpunk | so, seems that centos9 is broken on nova-compute | 09:41 |
noonedeadpunk | and also we have broken CI overall... | 09:42 |
noonedeadpunk | jobs are not scheduled due to zuul config error | 09:43 |
noonedeadpunk | https://review.opendev.org/c/openstack/openstack-ansible/+/908322 to solve it | 09:43 |
noonedeadpunk | gokhan: sorry got distracted | 09:43 |
noonedeadpunk | gokhan: yes infrahost is reinstalled_host in this context | 09:44 |
noonedeadpunk | would be good to align these 2 in doc | 09:45 |
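To make that alignment concrete, a hedged restatement of the command from the chat with both placeholders pointing at the same node, i.e. the one being reinstalled:

```bash
# <reinstalled_host> is the node being reinstalled, used for both -e hostname and --limit
openstack-ansible set-haproxy-backends-state.yml \
  -e hostname=<reinstalled_host> -e backend_state=disabled --limit <reinstalled_host>
```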
andrewbonney | I can fix that in my patch. Should have it ready to go within a day or two | 09:53 |
noonedeadpunk | NeilHanlon: just in case you might be interested, libvirt 9.10 has a nasty regression https://fedoraproject.org/wiki/Changes/LibvirtModularDaemons | 10:48 |
noonedeadpunk | ugh | 10:48 |
noonedeadpunk | https://issues.redhat.com/browse/RHEL-20609 | 10:48 |
noonedeadpunk | andrewbonney: we're having another OS upgrade next week, so can practice it a bit :) | 11:00 |
halali | folks, it would be good to land/merge this change soon https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708 :) | 11:04 |
gokhan | noonedeadpunk, when running "openstack-ansible setup-infrastructure.yml --limit localhost,repo_all,rabbitmq_all,reinstalled_host*", it throws an error when creating the wheel directory on the build host | 11:07 |
gokhan | https://paste.openstack.org/show/bTsHar3QBasMsJk1UyEP/ | 11:07 |
gokhan | the error is "msg": "chown failed: failed to look up user nginx", the nginx user is not created on the repo container | 11:09 |
noonedeadpunk | gokhan: maybe it failed somewhere before that? | 11:10 |
noonedeadpunk | what if you run repo-install.yml? | 11:10 |
gokhan | noonedeadpunk, I am checking now | 11:11 |
noonedeadpunk | as it feels like the repo installation also potentially failed. | 11:11 |
noonedeadpunk | As the task is delegated to the repo container | 11:11 |
noonedeadpunk | and it expects nginx to be present there | 11:12 |
gokhan | noonedeadpunk, when I run "openstack-ansible repo-install.yml --limit localhost,dev-compute1*", I am getting: failed: dev-infra1-repo-container-c3e5f3be is either already part of another cluster or having volumes configured | 11:14 |
gokhan | I have previously removed this infra node from peers | 11:15 |
noonedeadpunk | ok, frankly speaking I'm not that an expert in gluster... And we don't use it locally either... | 11:17 |
noonedeadpunk | So I can hardly help with this part | 11:18 |
noonedeadpunk | But I know andrewbonney dealt with it | 11:18 |
noonedeadpunk | there's a doc update covering removing the brick in advance: https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst | 11:18 |
noonedeadpunk | L195 | 11:19 |
andrewbonney | Assuming the brick/peer was removed in advance, the issue may be that the repo install needs to run against all hosts (no limit) | 11:20 |
gokhan | noonedeadpunk, sorry it was my fault again :( repo-install.yml is commented out in setup-infrastructure.yml :( | 11:20 |
noonedeadpunk | heh, ok :) | 11:21 |
gokhan | I previously followed https://review.opendev.org/c/openstack/openstack-ansible/+/906832/2/doc/source/admin/upgrades/distribution-upgrades.rst and removed the brick/peers | 11:21 |
gokhan | noonedeadpunk, you are right repo install needs to run against all hosts | 11:22 |
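A hedged sketch of the fix that worked here: since gluster peering is cluster-wide state, the repo playbook is run without a host limit once the old brick/peer has been removed per the distribution-upgrade doc linked above:

```bash
# Run against all repo hosts so the rebuilt container can join the existing gluster cluster
openstack-ansible repo-install.yml
```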
gokhan | thanks noonedeadpunk andrewbonney it is now working | 11:22 |
noonedeadpunk | was not me, but ok | 11:22 |
andrewbonney | :) | 11:22 |
noonedeadpunk | also your original paste does limit repo_all | 11:23 |
noonedeadpunk | so I guess it should be fine | 11:23 |
gokhan | yes my original paste does limit repo_all but when I ran it repo-install.yml was commented out :( | 11:23 |
noonedeadpunk | yeah, ok, gotcha | 11:24 |
gokhan | noonedeadpunk, mariadb is installed on the new node but it doesn't create the /root/.my.cnf file; I manually created this file | 11:34 |
gokhan | also rabbitmq failed with "To install a new major/minor version of RabbitMQ set '-e rabbitmq_upgrade=true'." | 11:36 |
gokhan | do we need to add "-e rabbitmq_upgrade=true"? | 11:36 |
noonedeadpunk | gokhan: yes, so that is kinda known thing.... | 11:37 |
noonedeadpunk | And I'm not sure about it at all | 11:38 |
noonedeadpunk | Known - missing /root/.my.cnf | 11:38 |
noonedeadpunk | Eventually, with any modern MariaDB you are not supposed to have my.cnf | 11:38 |
noonedeadpunk | As you're expected to login as root through socket auth | 11:38 |
noonedeadpunk | which is default | 11:38 |
noonedeadpunk | And old envs that have root auth messed up would struggle with not being able to auth as root without my.cnf | 11:39 |
noonedeadpunk | I guess we might want to add a note about that.... | 11:39 |
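A minimal check, assuming the socket-auth setup described above is in place: run as root on the galera container/host, no /root/.my.cnf required:

```bash
# root authenticates over the unix socket, so no credentials file is needed
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
```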
noonedeadpunk | not sure about rabbit, but potentially yes | 11:40 |
noonedeadpunk | actually, another thing about `rabbitmq_upgrade=true` is that it feels like a bad/wrong approach when quorum queues are enabled | 11:40 |
noonedeadpunk | What I see in my sandbox now, is that rabbitmq behaves like mysql more or less - being in `activating` state until it can get clustered properly | 11:41 |
noonedeadpunk | And our rabbitmq_upgrade currently stops everything except 1 node "by design" | 11:41 |
noonedeadpunk | but it's future problem.... | 11:41 |
noonedeadpunk | or well | 11:41 |
noonedeadpunk | not for you gokhan at least :) | 11:41 |
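For the rabbitmq error quoted earlier, a hedged sketch of the flag the message asks for; note the caveat above that rabbitmq_upgrade stops everything except one node by design:

```bash
# Only one rabbitmq node is kept running during this, so expect some disturbance
openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true
```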
gokhan | noonedeadpunk, yes, when I tried to check the galera status from the deployment host, I realized that .my.cnf is missing on the new host. I need it when checking the status. | 11:41 |
gokhan | yes, in my env mirrored queues are enabled :) | 11:42 |
noonedeadpunk | mirroring of queues != quorum queues | 11:42 |
noonedeadpunk | these are 2 very distinct things and switching is not very trivial, available only since 2023.2 | 11:43 |
noonedeadpunk | and mirrored queues are considered deprecated at this point | 11:44 |
gokhan | are quorum queues enabled by default on bobcat? is there any migration path from mirrored queues to quorum queues? | 11:47 |
noonedeadpunk | no, not default, yes, upgrade is possible | 11:52 |
noonedeadpunk | but it's involving some downtime/disturbance | 11:52 |
noonedeadpunk | eventually, the upgrade path is already there. The problem is that to upgrade to quorum, you actually need to drop the existing vhost and create a new one which will be replicated | 11:53 |
noonedeadpunk | So after removing the vhost for the service (which happens around the beginning) and until the playbook ends - the service might misbehave | 11:54 |
noonedeadpunk | But so far in sandbox experience is waaay better | 11:54 |
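A hedged sketch of how that switch is typically driven from user_variables.yml; the variable name is written from memory and should be verified against the 2023.2 release notes, and re-running the service playbooks afterwards is what triggers the vhost drop/recreate described above:

```bash
# Variable name is an assumption -- double-check against the 2023.2 release notes before use
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
oslomsg_rabbit_quorum_queues: true
EOF
```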
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Add the abillity to configure the logging options https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/902908 | 11:59 |
gokhan | thanks for information noonedeadpunk :) | 12:09 |
NeilHanlon | noonedeadpunk ah.. yeah. i had heard about that in the Integration SIG.. :\ | 13:46 |
noonedeadpunk | if you around... can you check this backport pls?:) https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/907708 | 13:47 |
spatel | mgariepy morning! | 14:35 |
spatel | any luck with CAPI? | 14:35 |
mgariepy | didn't have time to try it. | 15:02 |
mgariepy | it's for my future self ;) haha | 15:03 |
nixbuilder | I know this may not be the proper place for this question... however I need to know if anyone has a procedure for deleting images and volumes using only mysql? There are a few images/volumes that are in error. Somehow the image/volume already was deleted on our SAN but not within the openstack databases. I am attempting to clean this up. | 15:10 |
noonedeadpunk | update volumes set deleted = 1, deleted_at = "2024-02-09 15:11:23" where id = UUID ? | 15:12 |
noonedeadpunk | but eventually for volumes specifically - it should not get to error if backing device is gone | 15:12 |
noonedeadpunk | it should be marked as deleted properly | 15:12 |
noonedeadpunk | So you should be able to issue a delete request through the api | 15:13 |
nixbuilder | noonedeadpunk: from what I can tell cinder makes a call through the SAN driver to delete the volume, that call fails because the volume is not there and then I get an "error deleting" status on the volume. But I will try your suggestion. | 15:16 |
noonedeadpunk | huh | 15:16 |
noonedeadpunk | Ok, that's different with ceph. Or well. It still tries to issue the request to ceph, ceph says - no image, and cinder happily marks it as "deleted" afterwards. | 15:17 |
noonedeadpunk | So potentially a bug in the driver, as I would expect such an exception to be caught | 15:17 |
nixbuilder | noonedeadpunk: Perhaps a bug in the driver... as always thanks for your help! | 15:18 |
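Expanding the one-liner above into a hedged sketch covering both volumes and images; the column names and the default database names (cinder, glance) are written from memory, so back up the databases and check the actual schema before running anything like this:

```bash
# Mark an orphaned volume and image as soft-deleted directly in MySQL (use with care)
mysql cinder -e "UPDATE volumes SET deleted = 1, status = 'deleted', deleted_at = NOW() WHERE id = '<volume-uuid>';"
mysql glance -e "UPDATE images SET deleted = 1, status = 'deleted', deleted_at = NOW() WHERE id = '<image-uuid>';"
```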
drarvese | Greetings! I'm running into an issue during the Keystone playbook where I get a "504 Gateway timeout" when adding the service project -- https://paste.openstack.org/show/bwzv3tuyCp8mLQNaTf5w/. Does anyone have any ideas? This is an AIO deployment, though I'm not using the bootstrap-aio.sh script or scenarios. This is the second time I've run into this. The previous time (also an AIO | 16:41 |
drarvese | deployment) I was able to get around it by deploying everything on baremetal, but that seems like a really heavy-handed solution. | 16:41 |
noonedeadpunk | o/ | 16:51 |
noonedeadpunk | drarvese: I guess, first question should be if you can access a keystone with curl from the VM? | 16:52 |
noonedeadpunk | meaning - through container IP | 16:52 |
noonedeadpunk | probably you can, as that's container timeout.... | 16:52 |
noonedeadpunk | *API | 16:52 |
noonedeadpunk | and then if you can reach MySQL and what you see in logs inside keystone container | 16:53 |
noonedeadpunk | as that sounds like some kind of connectivity issue to me... | 16:54 |
noonedeadpunk | between what parts is a good question... | 16:54 |
noonedeadpunk | so it can be haproxy -> keystone or keystone -> mysql, keystone -> memcached | 16:54 |
drarvese | Yeah, I can curl the keystone endpoint through its container IP. I can reach MySQL through the utility container. Lemme grab the logs from the keystone container | 17:00 |
noonedeadpunk | Huh, ok, interesting | 17:08 |
noonedeadpunk | and with curl it returns api version and some json? | 17:08 |
drarvese | Yeah | 17:09 |
noonedeadpunk | I guess I would install telnet or smth like that to keystone container and would try to reach mariadb and memcached ips from it | 17:10 |
noonedeadpunk | via ips defined in /etc/keystone/keystone.conf | 17:10 |
noonedeadpunk | oh, btw, can you run smth like `openstack endpoint list` from utility container? | 17:11 |
noonedeadpunk | As I assume you should get same 504? | 17:11 |
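A hedged sketch of the connectivity check suggested above; the container name and backend IPs are placeholders to be read out of /etc/keystone/keystone.conf, and nc/telnet may need installing inside the container first:

```bash
# From the controller, run the checks inside the keystone container
lxc-attach -n <keystone-container> -- nc -vz <galera-ip> 3306       # mariadb
lxc-attach -n <keystone-container> -- nc -vz <memcached-ip> 11211   # memcached
```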
drarvese | Logs from the keystone container: https://paste.openstack.org/show/byw3oUGWo0NzkzXIBHtW/ | 17:19 |
drarvese | And, yes, that returns a 504 | 17:19 |
noonedeadpunk | huh | 17:21 |
noonedeadpunk | according to the log - keystone answers eventually | 17:22 |
noonedeadpunk | log looks quite short though.... | 17:24 |
noonedeadpunk | another thing - have you applied same overrides as for aio? | 17:25 |
noonedeadpunk | ie: https://opendev.org/openstack/openstack-ansible/src/branch/master/tests/roles/bootstrap-host/templates/user_variables.aio.yml.j2#L74-L81 | 17:25 |
noonedeadpunk | but frankly speaking I'm not sure what's really wrong, given that keystone can connect to memcached and mariadb | 17:26 |
noonedeadpunk | and system is not under some weird load | 17:26 |
drarvese | No, I haven't applied any overrides like that | 17:27 |
noonedeadpunk | ofc you can try to increase timeouts and see if request will eventually pass.... | 17:29 |
noonedeadpunk | there're couple of variables for that: https://opendev.org/openstack/openstack-ansible-haproxy_server/src/branch/master/defaults/main.yml#L244-L251 | 17:29 |
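A hedged sketch of bumping those timeouts via user_variables.yml; the variable names here are from memory and should be double-checked against the linked haproxy_server defaults before relying on them:

```bash
cat >> /etc/openstack_deploy/user_variables.yml <<'EOF'
haproxy_client_timeout: 120s
haproxy_server_timeout: 120s
EOF
openstack-ansible haproxy-install.yml
```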
noonedeadpunk | BUt in fact I experienced this sort of issues only when keystone was not able to reach memcached due to some firewalling | 17:30 |
drarvese | I'm able to telnet to the MySQL IP (the internal_lb_vip_ip), but not memcached or the IP of the MySQL container | 17:30 |
noonedeadpunk | when the connection was not reset, but dropped | 17:31 |
noonedeadpunk | yeah, so then, when keystone can not reach memcached, it will wait for connection timeout and only then proceed with request | 17:32 |
noonedeadpunk | Which has high probability of timing out on haproxy | 17:32 |
noonedeadpunk | I dunno how the aio is done (and if it is an aio), but the memcached and keystone containers are ideally on the same bridge inside the controller | 17:32 |
noonedeadpunk | so unless it's some multi-node aio - issue is strange | 17:33 |
drarvese | Yeah, they are on the same bridge | 17:34 |
noonedeadpunk | and memcached container does have IP on eth1? | 17:35 |
noonedeadpunk | and running? | 17:35 |
drarvese | Yep | 17:36 |
noonedeadpunk | Then I can only guess the reason might be a disabled net.ipv4.ip_forward or smth like that... | 17:37 |
noonedeadpunk | but that should be set by openstack_hosts role even.... | 17:37 |
noonedeadpunk | drarvese: ok, easy test. comment out memcached in /etc/keystone/keystone.conf, restart service. After that you should be able to issue request from utility container | 17:38 |
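A hedged sketch of that isolation test; the option name follows the usual [cache] section layout and the exact uwsgi unit name varies per release, hence the lookup rather than a hard-coded restart:

```bash
# Inside the keystone container: disable memcached caching, then restart keystone and retry
sed -i 's/^memcache_servers/#memcache_servers/' /etc/keystone/keystone.conf
systemctl list-units | grep -i keystone   # find the keystone uwsgi unit, then restart it
```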
drarvese | That works | 17:40 |
noonedeadpunk | mhm... well... you need to find out why a direct connection within the same bridge does not work... While you can reach the host - you can't reach the other container somehow... | 17:44 |
noonedeadpunk | maybe proxy_arp is needed, but I'd doubt... | 17:44 |
noonedeadpunk | that really feels like some firewall frankly speaking | 17:44 |
noonedeadpunk | drarvese: you will definitely need that working for rabbitmq in the future | 17:47 |
drarvese | Yeah. It does seem like a firewall issue. I'll look closer at that | 17:49 |
noonedeadpunk | from osa prespective - nothing touches firewall | 17:53 |
drarvese | Sigh, it was a firewall issue. The FORWARD iptables chain was configured to deny stuff. | 18:03 |
noonedeadpunk | that would explain it :D | 18:07 |
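For reference, a hedged sketch of how a restrictive FORWARD chain like the one above shows up and a quick, non-persistent workaround; br-mgmt is the OSA default management bridge name and should be adjusted to the local setup:

```bash
iptables -S FORWARD                                  # look for a DROP/REJECT policy or rule
iptables -I FORWARD -i br-mgmt -o br-mgmt -j ACCEPT  # allow traffic bridged between containers
```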
opendevreview | Merged openstack/openstack-ansible master: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908322 | 18:41 |
noonedeadpunk | folks, does anybody know how VLAN in OVN works? :D | 18:54 |
noonedeadpunk | Like - I do see there's a virtual switch, I also see patch-provnet that in nbdb maps to vlan | 18:55 |
noonedeadpunk | as well as all ports in the network | 18:55 |
noonedeadpunk | but the question is more - where does traffic go out from this vlan? | 18:56 |
noonedeadpunk | I mean, if gateway != compute, should the compute have access to the vlan? | 18:56 |
noonedeadpunk | As it feels like there's anyway geneve in between | 18:57 |
noonedeadpunk | jamesdenton: sorry, not sure if you're around, but I guess you might know best :D | 18:57 |
jamesdenton | hi | 19:14 |
jamesdenton | IIRC your gateway nodes will handle non-floatingip traffic always, and compute nodes would handle floatingip traffic when distributed routing is enabled. Otherwise, the gateway nodes handle that too | 19:16 |
jamesdenton | If it's just a provider network (w/o a neutron router) then the computes would need to have access to that vlan | 19:16 |
noonedeadpunk | aha | 19:17 |
noonedeadpunk | and what is non-floating ip traffic then? | 19:17 |
jamesdenton | SNAT | 19:18 |
noonedeadpunk | ok, so routers | 19:18 |
jamesdenton | So, tenant network behind neutron router, likely geneve | 19:18 |
jamesdenton | yes | 19:18 |
noonedeadpunk | and fip in routers if distributed is disabled | 19:18 |
noonedeadpunk | mhm, ok yes | 19:18 |
noonedeadpunk | I somehow started assuming that vlan traffic goes through the gateways as well | 19:18 |
jamesdenton | yep | 19:19 |
noonedeadpunk | but didn't find how to prove that or dismiss | 19:19 |
noonedeadpunk | and I was thinking about the octavia lbaas vlan per se | 19:19 |
jamesdenton | yeah that gateway node is only used when routers are in play, if it's just a VM on a vlan network straight up to the fabric then that's all through the compute | 19:20 |
noonedeadpunk | so it does not have router or anything in ovn | 19:20 |
noonedeadpunk | just being in nbdb confused me I guess :D | 19:20 |
noonedeadpunk | ok, thanks! | 19:20 |
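A hedged sketch of how to inspect the pieces discussed above: the localnet (provnet) port on the logical switch carries the VLAN tag in the northbound DB, and traffic leaves via the per-chassis provider bridge mapping on the compute:

```bash
ovn-nbctl show                                                  # logical switches, routers, ports
ovn-nbctl find logical_switch_port type=localnet                # provnet port and its VLAN tag
ovs-vsctl get open_vswitch . external_ids:ovn-bridge-mappings   # provider bridge mapping on this chassis
```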
jamesdenton | octavia w/ ovn provider does not require the lbaas mgmt network | 19:20 |
noonedeadpunk | yeah, I know that | 19:20 |
jamesdenton | cool | 19:20 |
noonedeadpunk | But it does not have l7 either | 19:20 |
noonedeadpunk | so meh :( | 19:20 |
jamesdenton | it's a little more basic :) | 19:20 |
noonedeadpunk | yeah, I mean, it can replace some use cases, but not all I guess | 19:21 |
noonedeadpunk | btw, I did some cleanup of your octavia ovn patch | 19:21 |
jamesdenton | but cheap! | 19:21 |
jamesdenton | how's the ovn vpnaas stuff coming along? | 19:21 |
noonedeadpunk | and tested it - works nicely | 19:21 |
jamesdenton | oh thank you | 19:21 |
noonedeadpunk | though it's somehow failing CI on quite unrelated failures.... | 19:22 |
noonedeadpunk | jamesdenton: well. it looks very nice | 19:22 |
noonedeadpunk | and about working | 19:22 |
jamesdenton | does it use a namespace? | 19:22 |
noonedeadpunk | I guess I just keep messing up bringing the tunnel up | 19:22 |
noonedeadpunk | it does | 19:22 |
noonedeadpunk | And mixing up which side is left and which is right.... | 19:22 |
noonedeadpunk | So it creates a namespace with ipsec, it does use one more IP from the external network, as it can't share one with the router | 19:23 |
jamesdenton | oh ok, not terrible i guess | 19:23 |
noonedeadpunk | Then it also creates an internal /30 network and wires it up with the router | 19:23 |
jamesdenton | and adds some routes to the router? | 19:24 |
noonedeadpunk | yeah, I believe it does | 19:24 |
noonedeadpunk | I didn't manage to make a pair fully working yet:) | 19:24 |
noonedeadpunk | but all pieces are in place, so it must work | 19:24 |
noonedeadpunk | Ah! And the VPN is running as an extra service, similar to metadata, and is registered in neutron agents | 19:25 |
noonedeadpunk | And uses RPC.... | 19:25 |
jamesdenton | oh nice | 19:25 |
jamesdenton | i'll give the patch a go locally this weekend | 19:25 |
jamesdenton | i could never get a tunnel up in an OVS environment for some reason | 19:25 |
noonedeadpunk | But I really think I'm making some very basic and stupid mistake when bringing 2 VPNs up | 19:25 |
jamesdenton | been a few months since i tried though | 19:25 |
noonedeadpunk | (in the same env) | 19:25 |
noonedeadpunk | I'm also about to look into ovn-bgp-agent really shortly | 19:26 |
noonedeadpunk | but dunno where to take frr from... | 19:26 |
jamesdenton | been keeping eyes on that too | 19:26 |
noonedeadpunk | Yeah, according to internal planning I should have done that 2 weeks ago... | 19:27 |
jamesdenton | don't be so hard on yourself, i'm still working on backlog from 3 years ago | 19:27 |
noonedeadpunk | haha | 19:28 |
noonedeadpunk | yeah, true | 19:28 |
noonedeadpunk | the backlog from 3y ago hasn't gone anywhere | 19:28 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove galera_client from required projects https://review.opendev.org/c/openstack/openstack-ansible/+/908324 | 19:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.2: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908280 | 19:33 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/2023.1: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908681 | 19:34 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:34 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:35 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/zed: Remove distro_ceph template from project defenition https://review.opendev.org/c/openstack/openstack-ansible/+/908682 | 19:35 |
spatel | jamesdenton hey! after long time | 19:42 |
jamesdenton | hey spatel ! | 19:43 |
spatel | how is your EVPN issue? | 19:43 |
jamesdenton | what's new? | 19:43 |
jamesdenton | we got that worked out... i think there were a few issues but mainly a mismatch between switches on the reserved vlan ranges, in addition to the lack of an infra vlan configuration | 19:44 |
jamesdenton | but we're cookin' now | 19:44 |
spatel | oh so it was a misconfig issue right? | 19:44 |
jamesdenton | yeah, at the end of the day it was | 19:45 |
jamesdenton | our setup is ingress replication, no multicast | 19:45 |
jamesdenton | all is well, for now | 19:45 |
spatel | I am busy building a new DC and a new openstack. I am looking for k8s with sriov support | 19:45 |
jamesdenton | the fun stuff | 19:45 |
spatel | Did you ever run k8s with sriov ? | 19:45 |
jamesdenton | i have not | 19:46 |
spatel | developers want to run a voice application on k8s with sriov support | 19:46 |
spatel | Yes OVN-BGP-AGENT is in my list | 19:47 |
spatel | jamesdenton why are you using ingress replication? | 19:48 |
spatel | Multicast is easy and scalable.. | 19:48 |
jamesdenton | this is the way our network guys wanna run it | 19:49 |
spatel | Ingress is easy so I can understand, but using multicast gives you better control over BUM engineering.. | 19:50 |
spatel | if you don't want to send BUM traffic to the ABC rack then you can do that without any issue :) | 19:50 |