Tuesday, 2024-09-10

harun	hi all, there is no communication between containers over the br-mgmt bridge: after keepalived is installed, i cannot reach the containers on one host from other hosts. When i reboot all hosts, i can reach containers on different hosts again. Why do i encounter this issue? I would appreciate your help, thank you. tcpdump stdout: https://paste.openstack.org/show/bI2qiJpMbCsX8AgnS3OX/	07:05
noonedeadpunk	o/	07:32
noonedeadpunk	harun: um, I think you first need to check whether there's any communication over br-mgmt between the hosts at all, as it feels like either some kind of firewalling or just a misconfigured bridge itself	07:34
noonedeadpunk	I'm not sure if keepalived would cause such issues, but ofc would depend on your configuration	07:35
jrosser	you need a unique mgmt address ip on each bridge, quite apart from whatever vip keepalived is managing	07:35
gokhan_	noonedeadpunk, jrosser harun is my teammate. we can ping br-mgmt ips between hosts. All of the ips are unique.	07:39
gokhan_	we cannot get ARP replies from other hosts.	07:40
jrosser	are you being completely specific about hosts/containers here?	07:41
noonedeadpunk	well we kinda had specific issues with ARP replies between computes quite recently, but that was related to NIC firmware and kernel version	07:42
jrosser	from memory keepalived adjusts routes too?	07:43
noonedeadpunk	and what we saw was VMs on some compute nodes were not able to communicate over tunnel (vxlan) networks due to arp being just dropped in one direction	07:43
noonedeadpunk	yes, it does add a network which is defined for VIP	07:43
noonedeadpunk	*add a route	07:43
gokhan_	we are getting this issue in all of our environments, also in a customer environment which we installed. after a reboot the issue is resolved	07:43
noonedeadpunk	so if you define vip with some weird netmask... but then there would be an issue with communication between controllers as well	07:44
noonedeadpunk	could it be that you've somehow dropped ip_forward from sysctl for the runtime?	07:44
jrosser	gokhan_: it is still not completely clear to me what breaks	07:46
jrosser	i.e, if you can still ping containers on the same host, from the host br-mgmt	07:47
jrosser	or if you can still ping br-mgmt <> br-mgmt between hosts, but just the container<>container is broken	07:47
gokhan_	this sysctl conf https://paste.openstack.org/show/bi97dP7ADRPtOB7hAAUk/	07:47
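noonedeadpunk's ip_forward question can be checked mechanically on every host. The Python sketch below (function names are illustrative, not part of openstack-ansible) parses `sysctl`-style `key = value` text, such as `sysctl -a` output or a file under /etc/sysctl.d, and flags two settings: `net.ipv4.ip_forward`, and `net.bridge.bridge-nf-call-iptables`, which, when the br_netfilter module is loaded, makes frames crossing a Linux bridge pass through the iptables FORWARD chain. Treating that second setting as relevant to same-bridge container traffic is an assumption, not something confirmed in this thread.

```python
from typing import Dict, List


def parse_sysctl(text: str) -> Dict[str, str]:
    """Return {key: value} for every 'key = value' line, skipping comments and blanks."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def forwarding_warnings(settings: Dict[str, str]) -> List[str]:
    """Flag forwarding-related sysctls worth comparing between hosts."""
    warnings: List[str] = []
    if settings.get("net.ipv4.ip_forward") != "1":
        warnings.append("net.ipv4.ip_forward is not 1: routed traffic is not forwarded")
    if settings.get("net.bridge.bridge-nf-call-iptables") == "1":
        warnings.append("bridge-nf-call-iptables is 1: bridged traffic is filtered by iptables FORWARD")
    return warnings


if __name__ == "__main__":
    # Illustrative input only, not the configuration pasted in the channel.
    sample = "net.ipv4.ip_forward = 1\nnet.bridge.bridge-nf-call-iptables = 1\n"
    for warning in forwarding_warnings(parse_sysctl(sample)):
        print(warning)
```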
gokhan_	jrosser, also container <> other host's br-mgmt is broken but container <> same host's br-mgmt is working	07:49
jrosser	and what about one host br-mgmt to another host br-mgmt?	07:50
gokhan_	jrosser, sorry we also can not ping containers on the same host	07:51
gokhan_	we can only ping from containers to its host br-mgmt ip	07:52
gokhan_	we restarted the lxc-dnsmasq service but it didn't help.	07:54
jrosser	that only deals with eth0 in the container	07:55
jrosser	did you do other things like check that the routing table looks reasonable?	07:56
gokhan_	jrosser, https://paste.openstack.org/show/blJXkzVFYUpesUza31NQ/	07:58
gokhan_	it seems ok	07:59
jrosser	docker?	07:59
gokhan_	also ceph is installed with cephadm and it is using docker	08:01
jrosser	all i can recommend is starting bottom up with really basic connectivity checks	08:04
jrosser	arp/ping with tcpdump at both ends between two host br-mgmt	08:04
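jrosser's bottom-up approach can be written down as a fixed sequence of commands per host. A minimal sketch, assuming iproute2, iputils `ping` and `tcpdump` are installed; the peer address 192.0.2.11 is a documentation placeholder, and `connectivity_plan` is a hypothetical helper that only builds the command lines rather than running them.

```python
import shlex
from typing import List


def connectivity_plan(bridge: str, peer_ip: str, count: int = 3) -> List[List[str]]:
    """Return argv lists for a basic bridge-to-bridge check.

    Start the tcpdump command on BOTH hosts, then run the ping on one of them.
    """
    return [
        ["ip", "-br", "addr", "show", "dev", bridge],       # is the unique mgmt IP present?
        ["ip", "neigh", "show", "dev", bridge],             # did ARP resolve the peer?
        ["tcpdump", "-ni", bridge, "arp or icmp"],          # capture ARP and ICMP on both ends
        ["ping", "-c", str(count), "-I", bridge, peer_ip],  # send the probe out of the bridge
    ]


if __name__ == "__main__":
    # 192.0.2.11 is a placeholder; substitute the peer host's br-mgmt IP.
    for argv in connectivity_plan("br-mgmt", "192.0.2.11"):
        print(shlex.join(argv))
```

If the ARP requests show up on the receiving host's capture but no reply leaves it, the problem is on that host rather than on the wire.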
jrosser	we do not test cephadm on the same hosts as openstack-ansible so that would be up to you to check there is no bad interaction	08:05
jrosser	it is also possible that docker is installing iptables rules	08:09
gokhan_	jrosser, this is the ping and tcpdump output for a ping between 2 hosts. https://paste.openstack.org/show/bcdc92jPLvzqYrewviQs/	08:11
gokhan_	this is iptables rule list https://paste.openstack.org/show/bR62RCLyZQeKgd5OEFix/	08:12
gokhan_	I am using cephadm and osa on the same host in multiple environments, I didn't get any issues with that.	08:13
gokhan_	the weird thing is that it works after a reboot :(	08:14
gokhan_	but we are trying to find the root cause of this.	08:14
gokhan_	jrosser, can the apparmor service affect container networking?	08:16
jrosser	you would see anything that apparmor blocks in the kernel log	08:21
jrosser	have you checked that br-mgmt has all the members you'd expect	08:22
gokhan_	jrosser, these are the dmesg logs https://paste.openstack.org/show/bDEQa5sxPHJcf7UeHC59/	08:25
gokhan_	there are apparmor profile replace messages	08:26
gokhan_	jrosser, br-mgmt has all members https://paste.openstack.org/show/bd2p0LIGnUCfEpoVk0uh/	08:30
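Bridge membership can also be read straight from sysfs, where every port attached to a Linux bridge appears as an entry under /sys/class/net/<bridge>/brif/. A minimal sketch; the helper names are hypothetical, and `sysfs_root` exists only so the function can be pointed at a test directory.

```python
import os
from typing import Iterable, List, Set


def bridge_members(bridge: str, sysfs_root: str = "/sys/class/net") -> List[str]:
    """List the ports attached to a Linux bridge via <sysfs_root>/<bridge>/brif."""
    brif = os.path.join(sysfs_root, bridge, "brif")
    if not os.path.isdir(brif):
        raise FileNotFoundError(f"{bridge} is not a bridge (no {brif})")
    return sorted(os.listdir(brif))


def missing_members(bridge: str, expected: Iterable[str],
                    sysfs_root: str = "/sys/class/net") -> Set[str]:
    """Return expected port names (uplink, container veths) not attached to the bridge."""
    return set(expected) - set(bridge_members(bridge, sysfs_root))


if __name__ == "__main__":
    try:
        print(bridge_members("br-mgmt"))
    except FileNotFoundError as exc:
        print(exc)
```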
gokhan_	now I am rebooting one of the hosts to try to see the difference	08:36
gokhan_	jrosser, after the reboot, the containers on the rebooted host can now ping each other	08:44
gokhan_	the only difference I see is lxc-monitord service is not working	08:45
jrosser	noonedeadpunk: is this correct? https://github.com/openstack/openstack-ansible/blob/master/scripts/gate-check-commit.sh#L68	08:51
jrosser	should it be 2024.1?	08:51
gokhan_	jrosser, I think I found the issue	08:56
gokhan_	the iptables rules change after a reboot	08:57
gokhan_	this is rebooted host https://paste.openstack.org/show/bO3R9QEdzZlzFeivqcIo/	08:57
noonedeadpunk	jrosser: should be 2024.1, yes	08:58
gokhan_	on the other host the iptables rules are: Chain INPUT (policy ACCEPT)	08:58
gokhan_	target prot opt source destination	08:58
gokhan_	ACCEPT tcp -- anywhere anywhere tcp dpt:domain	08:58
gokhan_	ACCEPT udp -- anywhere anywhere udp dpt:domain	08:58
gokhan_	ACCEPT tcp -- anywhere anywhere tcp dpt:67	08:58
gokhan_	ACCEPT udp -- anywhere anywhere udp dpt:bootps	08:58
gokhan_	Chain FORWARD (policy DROP)	08:58
gokhan_	target prot opt source destination	08:58
gokhan_	ACCEPT all -- anywhere anywhere	08:58
gokhan_	ACCEPT all -- anywhere anywhere	08:58
gokhan_	DOCKER-USER all -- anywhere anywhere	08:58
gokhan_	DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere	08:58
gokhan_	ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED	08:58
gokhan_	DOCKER all -- anywhere anywhere	08:58
gokhan_	ACCEPT all -- anywhere anywhere	08:58
gokhan_	ACCEPT all -- anywhere anywhere	08:58
gokhan_	Chain OUTPUT (policy ACCEPT)	08:58
gokhan_	target prot opt source destination	08:58
gokhan_	Chain DOCKER (1 references)	08:58
gokhan_	target prot opt source destination	08:58
gokhan_	Chain DOCKER-ISOLATION-STAGE-1 (1 references)	08:58
gokhan_	target prot opt source destination	08:58
gokhan_	DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere	08:58
gokhan_	RETURN all -- anywhere anywhere	08:58
gokhan_	Chain DOCKER-ISOLATION-STAGE-2 (1 references)	08:59
gokhan_	target prot opt source destination	08:59
gokhan_	DROP all -- anywhere anywhere	08:59
gokhan_	RETURN all -- anywhere anywhere	08:59
gokhan_	Chain DOCKER-USER (1 references)	08:59
gokhan_	target prot opt source destination	08:59
gokhan_	RETURN all -- anywhere anywhere	08:59
gokhan_	sorry :(	08:59
gokhan_	https://paste.openstack.org/show/bK8axT4XP81rb1l580Lt/	08:59
gokhan_	the change: the FORWARD policy is DROP on hosts that were not rebooted	08:59
gokhan_	how can we apply the iptables rules for lxc containers?	08:59
gokhan_	it seems they are not applied	08:59
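To spot this kind of difference between rebooted and non-rebooted hosts without eyeballing full dumps, a small Python sketch can pull the built-in chain policies out of `iptables -L` output (lines like `Chain FORWARD (policy DROP)`) or `iptables -S` output (lines like `-P FORWARD DROP`). The function name is illustrative; feed it the output captured on each host and compare the results.

```python
import re
from typing import Dict

# "Chain FORWARD (policy DROP)" as printed by `iptables -L`
_LIST_RE = re.compile(r"^Chain (\S+) \(policy (\w+)")
# "-P FORWARD DROP" as printed by `iptables -S`
_SPEC_RE = re.compile(r"^-P (\S+) (\w+)")


def chain_policies(output: str) -> Dict[str, str]:
    """Extract built-in chain policies; user-defined chains have no policy and are skipped."""
    policies: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        match = _LIST_RE.match(line) or _SPEC_RE.match(line)
        if match:
            policies[match.group(1)] = match.group(2)
    return policies


if __name__ == "__main__":
    # Excerpt of the `iptables -L` output pasted in the channel from a non-rebooted host
    sample = "Chain INPUT (policy ACCEPT)\nChain FORWARD (policy DROP)\nChain DOCKER (1 references)\n"
    print(chain_policies(sample))
```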
jrosser	like i say we do not test/support having docker and lxc on the same host	08:59
jrosser	this is very well known to cause trouble for both lxc and lxd	08:59
jrosser	there might be some config you can change on the docker side about this - but i have no idea about that really	09:01
jrosser	openstack-ansible does not do any management of iptables rules at all, so this feels like a docker issue	09:01
gokhan_	thanks jrosser for helping to find the issue. as you said, it seems there are issues when installing docker and lxc on the same host. as a workaround we will fix the iptables rules ourselves.	09:05
jrosser	it may be that restarting some service on the docker side has the same effect as the reboot, whichever one is responsible for inserting the iptables rules	09:06
gokhan_	the weird thing is docker is working as expected. the ceph mon daemons can communicate with each other.	09:08
gokhan_	I will restart ceph.target and see if there is any change on the iptables side.	09:08
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible master: Fix upgrade job on master to upgrade from 2024.1 to master https://review.opendev.org/c/openstack/openstack-ansible/+/928771	09:17
noonedeadpunk	actually - we do iptables rules for LXC	09:20
noonedeadpunk	and they should be re-loaded/applied with restart of lxc-dnsmasq service iirc	09:20
noonedeadpunk	https://opendev.org/openstack/openstack-ansible-lxc_hosts/src/branch/master/templates/lxc-system-manage.j2#L76-L111	09:21
noonedeadpunk	and yes, lxc-dnsmasq would remove/add iptables rules	09:22
noonedeadpunk	https://opendev.org/openstack/openstack-ansible-lxc_hosts/src/branch/master/tasks/lxc_net.yml#L89-L104	09:22
noonedeadpunk	but you also can `/usr/local/bin/lxc-system-manage iptables-recreate`	09:23
jrosser	oh wow i completely missed that!	09:23
jrosser	gokhan_: ^ this is stuff to know about	09:25
noonedeadpunk	eventually we can add some "custom" rules to that template if that's gonna help	09:41
opendevreview	Merged openstack/openstack-ansible stable/2023.2: Remove the get_md5 parameter from ansible stat tasks https://review.opendev.org/c/openstack/openstack-ansible/+/927720	10:08
gokhan_	noonedeadpunk, thanks, we restarted lxc-dnsmasq but the rules were not applied. I am trying now	10:11
gokhan_	noonedeadpunk, nothing changed. The FORWARD chain policy is still DROP, but on the rebooted host it is ACCEPT	10:26
gokhan_	the same iptables rules are recreated	10:26
noonedeadpunk	so the service totally does not change the default policy on chains	10:36
noonedeadpunk	I don't think docker does this either	10:36
noonedeadpunk	ah....	10:37
noonedeadpunk	service ensures forward only for lxc_bridge, not mgmt_bridge	10:37
gokhan_	network connection issue is solved by running "sudo iptables -P FORWARD ACCEPT"	10:39
gokhan_	noonedeadpunk, I didn't find another solution except the one above	10:40
noonedeadpunk	iptables -I FORWARD -i "br-mgmt" -j ACCEPT ?	10:41
gokhan_	restarting docker didn't work either	10:41
gokhan_	noonedeadpunk, I am trying	10:42
gokhan_	it also worked	10:45
gokhan_	noonedeadpunk, jrosser I have tested a docker installation on a vm: docker changes the iptables FORWARD chain policy from ACCEPT to DROP.	10:48
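Both workarounds discussed here (`iptables -P FORWARD ACCEPT` and `iptables -I FORWARD -i "br-mgmt" -j ACCEPT`) change runtime state only and are not persisted across reboots. If the narrower rule is applied by automation, checking for it first keeps the step idempotent: `iptables -C` exits with status 0 when an identical rule already exists. A minimal Python sketch, assuming it runs as root; `rule_commands` and `ensure_accept` are hypothetical helpers. Docker's documentation suggests placing custom rules in the DOCKER-USER chain so they are evaluated before Docker's own rules, so passing `chain="DOCKER-USER"` is an alternative worth testing, but it was not tried in this thread.

```python
import shlex
import subprocess
from typing import List, Tuple


def rule_commands(bridge: str, chain: str = "FORWARD") -> Tuple[List[str], List[str]]:
    """Return (check, insert) argv for an ACCEPT rule on packets entering via `bridge`."""
    spec = [chain, "-i", bridge, "-j", "ACCEPT"]
    return ["iptables", "-C", *spec], ["iptables", "-I", *spec]


def ensure_accept(bridge: str, chain: str = "FORWARD") -> bool:
    """Insert the rule unless an identical one exists; return True if it was inserted.

    Requires root. `iptables -C` exits 0 when the rule is already present.
    """
    check, insert = rule_commands(bridge, chain)
    if subprocess.run(check, capture_output=True).returncode == 0:
        return False
    subprocess.run(insert, check=True)
    return True


if __name__ == "__main__":
    # Dry run: print the commands instead of modifying the firewall.
    for argv in rule_commands("br-mgmt"):
        print(shlex.join(argv))
```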
noonedeadpunk	well....	10:48
noonedeadpunk	this used to be really nice role to manage iptables rules: https://github.com/logan2211/ansible-iptables	10:49
jrosser	we use that ^	10:51
jrosser	but we also have an unmerged PR there for 4 years :(	10:52
noonedeadpunk	oops, quite a crucial one btw	11:19
jrosser	looks like logan- is still here in irc.....	11:20
noonedeadpunk	at worst I hope to see him in a couple of months, so I can potentially bug him about things :p	11:24
noonedeadpunk	Someone reported one thing to me. Apparently, magnum (at least with the heat driver) tries to use the `amphora` octavia_provider, which is the default	11:43
noonedeadpunk	and I've proposed a patch (which we've merged) which removes this provider and leaves only amphorav2	11:43
noonedeadpunk	so I was wondering if we should maybe rollback (or not) and have `amphora` provider along with `amphorav2`	11:55
noonedeadpunk	as `amphora` will call the v2 anyway.	11:55
noonedeadpunk	no idea though whether it's going to stay the same in the future. Or Magnum should adjust the default to point to v2	11:56
jrosser	noonedeadpunk: there is `octavia_provider` label but having that be the old value by default is not good	12:19
noonedeadpunk	yeah, it's "old" default.	12:20
noonedeadpunk	johnsom: any insight whether the `amphora` provider is expected to exist in deployments, or is having just `amphorav2` fine?	12:21
noonedeadpunk	and if `amphora` is going to be kept in octavia for the future as well?	12:21
jrosser	i wonder if we should revert converting the repo server to apache	12:29
noonedeadpunk	I wanna fix mpms this week for sure	12:30
noonedeadpunk	and backport to 2024.1	12:30
noonedeadpunk	as it seems that skyline/keystone is already an issue	12:30
jrosser	are you going to look at fixing up everything being on apache?	12:34
jrosser	if so i will leave it alone	12:34
jrosser	there are some surprise failures to come as we've not been testing the right upgrades too	12:34
noonedeadpunk	I was going to iterate through mpm modules and disable all except one that's being defined	12:39
noonedeadpunk	and introduce global variable to set the mpm	12:40
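The plan described here (keep exactly one MPM enabled, chosen by a single global variable) could look roughly like this on Debian/Ubuntu, where `a2dismod`/`a2enmod` manage Apache modules. A sketch only: `mpm_commands` is hypothetical, the set of currently enabled modules is assumed to come from something like /etc/apache2/mods-enabled, and unwanted MPMs are disabled before the chosen one is enabled because the MPM modules are declared as conflicting with each other. Switching MPM most likely needs a full Apache restart rather than a graceful reload.

```python
from typing import Iterable, List

MPMS = ("mpm_event", "mpm_worker", "mpm_prefork")


def mpm_commands(wanted: str, enabled: Iterable[str]) -> List[List[str]]:
    """Return a2dismod/a2enmod argv lists so that only the `wanted` MPM ends up enabled."""
    if wanted not in MPMS:
        raise ValueError(f"unknown MPM {wanted!r}, expected one of {MPMS}")
    enabled = set(enabled)
    # Disable every other MPM first; a2enmod refuses to enable a conflicting module.
    cmds = [["a2dismod", mpm] for mpm in MPMS if mpm != wanted and mpm in enabled]
    if wanted not in enabled:
        cmds.append(["a2enmod", wanted])
    return cmds


if __name__ == "__main__":
    # e.g. horizon left mpm_worker enabled, but the global choice is event
    print(mpm_commands("mpm_event", ["mpm_worker", "ssl", "wsgi"]))
```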
jrosser	ah i think also the wrong upgrade branch is why we are missing a bunch of logs of /etc for upgrade jobs	12:46
jrosser	the log collection at the end depends on tools which should have been installed from the starting branch, and they are missing (like parallel)	12:47
jrosser	and the same will affect slurp upgrades, as we need the tools for master but set things up initially two branches back	12:48
noonedeadpunk	yeah	12:49
jrosser	i think thats a simple fix	12:49
opendevreview	Jonathan Rosser proposed openstack/openstack-ansible stable/2023.2: Ensure "parallel" package is installed for CI log collection https://review.opendev.org/c/openstack/openstack-ansible/+/928790	12:56
noonedeadpunk	which mpm do we want as the default? event as in keystone or worker as in horizon?	12:56
noonedeadpunk	I frankly can't recall exact difference between these 2 already :(	12:57
jrosser	i have no idea tbh :/	12:57
noonedeadpunk	sounds like event is better	12:59
noonedeadpunk	or well, like it's improved worker	12:59
opendevreview	Merged openstack/openstack-ansible stable/2024.1: Remove extra slash character from horizon haproxy healthcheck url. https://review.opendev.org/c/openstack/openstack-ansible/+/927264	13:20
opendevreview	Merged openstack/openstack-ansible-os_neutron master: Improve OVN cluster setup idempotence report https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/928618	13:37
opendevreview	Merged openstack/openstack-ansible-os_neutron master: Do not kill ipsec on L3 cleanup https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/927992	13:37
opendevreview	Merged openstack/openstack-ansible-plugins master: Add infrastructure playbooks to openstack-ansible-plugins collection https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/924171	13:38
opendevreview	Merged openstack/openstack-ansible-os_ceilometer stable/2024.1: Add support for Magnum notifications https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/927812	13:41
opendevreview	Merged openstack/openstack-ansible-os_neutron master: Remove ns-metadata-proxy cleanuop handler https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/927993	13:53
johnsom	noonedeadpunk amphora will be permanent, amphorav2 may go away at some point.	13:59
johnsom	So, yeah, keep using amphora	14:01
opendevreview	Merged openstack/openstack-ansible-ops master: Update magnum-cluster-api version https://review.opendev.org/c/openstack/openstack-ansible-ops/+/928613	14:01
noonedeadpunk	johnsom: and `octavia` is just removed?	14:04
noonedeadpunk	for some reason I thought that it will remain with v2 :(	14:06
noonedeadpunk	probably completely misunderstood some discussion	14:06
johnsom	Yeah, at some point "octavia" might go away. people didn't like that one as we have multiple providers now, so lobbied to change to "amphora"	14:13
opendevreview	Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Return `amphora` provider back https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/928815	14:13
noonedeadpunk	well, versioned amphora also makes sense to me kinda	14:13
johnsom	noonedeadpunk as of master branch, they are all the same now	14:13
johnsom	Yeah, but the code for v1 is going away	14:13
noonedeadpunk	yeah, that part I know, though thought some were marked for removal anyway in the future	14:14
noonedeadpunk	and that is why having `amphora` felt a bit confusing I guess	14:14
johnsom	For a deployment project, "amphora" will always be the right answer	14:14
noonedeadpunk	as `amphorav2` seems more natural given that the code is quite different	14:14
noonedeadpunk	ok, yeah, I see	14:14
noonedeadpunk	we just switched `default_provider_driver = amphorav2` some time ago....	14:15
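Given johnsom's advice, a deployment can sanity-check that the provider names clients ask for (Magnum's `octavia_provider` label defaults to `amphora`) are actually enabled. The sketch parses the `enabled_provider_drivers` value from the `[api_settings]` section of octavia.conf, which uses a comma-separated `name:'description'` format; the simple comma split assumes no description contains a comma, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_provider_drivers(value: str) -> Dict[str, str]:
    """Parse an enabled_provider_drivers value like "amphora:'desc',amphorav2:'desc'"."""
    drivers: Dict[str, str] = {}
    for item in value.split(","):
        if not item.strip():
            continue
        name, _, desc = item.partition(":")
        drivers[name.strip()] = desc.strip().strip("'\"")
    return drivers


def check_providers(enabled: str, default: str,
                    requested: Iterable[str] = ("amphora",)) -> List[str]:
    """Report a default or client-requested provider that is not enabled."""
    drivers = parse_provider_drivers(enabled)
    problems: List[str] = []
    if default not in drivers:
        problems.append(f"default_provider_driver {default!r} is not enabled")
    for name in requested:
        if name not in drivers:
            problems.append(f"provider {name!r} requested by clients is not enabled")
    return problems


if __name__ == "__main__":
    # Hypothetical config that only enables amphorav2, as after the removal patch
    enabled = "amphorav2:'The Octavia Amphora driver.'"
    for problem in check_providers(enabled, default="amphorav2"):
        print(problem)
```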
noonedeadpunk	#startmeeting openstack_ansible_meeting	15:00
opendevmeet	Meeting started Tue Sep 10 15:00:20 2024 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.	15:00
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	15:00
opendevmeet	The meeting name has been set to 'openstack_ansible_meeting'	15:00
noonedeadpunk	#topic rollcall	15:00
noonedeadpunk	o/	15:00
hamburgler	o/	15:00
NeilHanlon	o/	15:01
jrosser	o/ hello	15:01
noonedeadpunk	#topic office hours	15:03
noonedeadpunk	so, noble test jobs finally merged	15:03
noonedeadpunk	though we've missed moving noble with playbooks	15:04
noonedeadpunk	and the fix failed on gate intermittently and is currently in recheck	15:05
noonedeadpunk	#link https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/928592/3	15:05
noonedeadpunk	There is also a current issue with apache on metal	15:05
noonedeadpunk	as we're using different MPMs across roles, which causes upgrade job failures	15:06
noonedeadpunk	(once upgrade jobs track correct branch)	15:06
noonedeadpunk	so whatever fix is needed should be backported to 2024.1	15:07
jrosser	i found that by trying to understand the job failures in more depth	15:07
noonedeadpunk	and i guess this should be kinda last thing for backport before doing first minor release	15:07
noonedeadpunk	Ah, except octavia thing that I realized just today	15:07
noonedeadpunk	#link https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/928815	15:08
jrosser	do we have broken apache/metal on 2024.1?	15:08
noonedeadpunk	yeah	15:08
jrosser	oh dear, ok	15:08
noonedeadpunk	I think that second run of playbooks will break it	15:08
jrosser	fixing the upgrade job branch could bring more CI trouble, just a release earlier	15:09
noonedeadpunk	yeah, true	15:11
noonedeadpunk	so there's quite some things to work on, but not sure what needs deeper discussion	15:14
jrosser	i found the horizon compress failure is not specifically an OSA issue	15:16
noonedeadpunk	oh	15:16
jrosser	it apparently occurs when installing UCA packages, as part of building debian packages, and also in devstack	15:17
jrosser	there is a bug which is now correctly assigned to the horizon project https://bugs.launchpad.net/horizon/+bug/2045394	15:17
jrosser	i also spent some time looking at why jobs fail to get u-c when that should be from the disk	15:18
jrosser	and unfortunately that happens a lot in upgrade jobs and there are insufficient logs collected	15:19
jrosser	this (+ a backport) should address the log collection https://review.opendev.org/c/openstack/openstack-ansible/+/928790	15:20
jrosser	but that is kind of hard to test	15:20
noonedeadpunk	it looks reasonable enough	15:35
jrosser	for the u-c errors it is clear that the code takes the path for the url being https:// rather than file://	15:38
jrosser	but why it does that is not obvious yet - it could be that we have changed the way that the redirection of the URLs to files works between releases	15:39
jrosser	so what is set up for the initial upgrade branch does not do the right thing for the target branch	15:39
jrosser	i think this is the most likely explanation for those kind of errors	15:40
noonedeadpunk	so if that's for upgrade jobs only - that might be the case	15:47
noonedeadpunk	as there we kind of ignore zuul-provided repos	15:47
noonedeadpunk	just to leave them in "original" state to preserve depends-on	15:48
noonedeadpunk	which could explain why upgrade on N-1 might try to do web fetch of u-c	15:48
jrosser	how do i discover where the opensearch log collection service is?	15:53
jrosser	^ for CI jobs	15:53
jrosser	ML says https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global	15:56
noonedeadpunk	#endmeeting	16:06
opendevmeet	Meeting ended Tue Sep 10 16:06:38 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	16:06
opendevmeet	Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.html	16:06
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.txt	16:06
opendevmeet	Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.log.html	16:06
jrosser	https://mariadb.com/newsroom/press-releases/k1-acquires-a-leading-database-software-company-mariadb-and-appoints-new-ceo/	16:31
noonedeadpunk	wow	16:41
noonedeadpunk	no good example of k1 investments in examples....	16:42
opendevreview	Merged openstack/openstack-ansible-plugins master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/928591	18:25
opendevreview	Merged openstack/openstack-ansible-plugins master: Add Ubuntu 24.04 to supported by playbook versions https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/928592	18:25

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!