Tuesday, 2024-09-10

harunhi all, there is no container communication among these containers via the br-mgmt bridge. I cannot reach the containers of a host from other hosts after keepalived is installed; when I reboot all hosts, I can reach containers from different hosts again. Why do I encounter this issue? I would appreciate your help, thank you. Tcpdump stdout: https://paste.openstack.org/show/bI2qiJpMbCsX8AgnS3OX/07:05
noonedeadpunko/07:32
noonedeadpunkharun: um, I think you actually need to first check if there's communication over br-mgmt between the hosts. As it feels like either some kind of firewalling or just a misconfigured bridge in itself07:34
noonedeadpunkI'm not sure if keepalived would cause such issues, but ofc would depend on your configuration07:35
jrosseryou need a unique mgmt IP address on each bridge, quite apart from whatever VIP keepalived is managing07:35
gokhan_noonedeadpunk, jrosser harun is my teammate. we can ping br-mgmt IPs between hosts. All of the IPs are unique.07:39
gokhan_we can not get arp reply from other hosts. 07:40
jrosserare you being completely specific about hosts/containers here?07:41
noonedeadpunkwell we kinda had specific issues with ARP replies between computes quite recently, but that was related to NIC firmware and kernel version07:42
jrosserfrom memory keepalived adjusts routes too?07:43
noonedeadpunkand what we saw was VMs on some compute nodes were not able to communicate over tunnel (vxlan) networks due to arp being just dropped in one direction07:43
noonedeadpunkyes, it does add a network which is defined for VIP07:43
noonedeadpunk*add a route07:43
gokhan_we are getting this issue in all of our environments, also in a customer environment which we installed. after a reboot the issue is resolved07:43
noonedeadpunkso if you define vip with some weird netmask... but then there would be an issue with communication between controllers as well07:44
noonedeadpunkcould it be that you've somehow dropped ip_forward from sysctl for the runtime?07:44
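For what it's worth, a quick way to check that on the running kernel (a sketch using standard Linux sysctl paths, nothing OSA-specific):

```shell
# Current runtime value: 1 means the kernel forwards packets between
# interfaces (needed for routed container traffic), 0 means it does not.
cat /proc/sys/net/ipv4/ip_forward

# Cross-check against what is persisted on disk, to spot runtime-only
# drift (no match found is fine, hence the || true):
grep -rh 'ip_forward' /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null || true
```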
jrossergokhan_: it is still not completely clear to me what breaks07:46
jrosseri.e, if you can still ping containers on the same host, from the host br-mgmt07:47
jrosseror if you can still ping br-mgmt <> br-mgmt between hosts, but just the container<>container is broken07:47
gokhan_this sysctl conf https://paste.openstack.org/show/bi97dP7ADRPtOB7hAAUk/ 07:47
gokhan_jrosser, also container<>otherhostsbr-mgmt is broken but container<>samehostbr-mgmt is working 07:49
jrosserand what about one host br-mgmt to another host br-mgmt?07:50
gokhan_jrosser, sorry we also can not ping containers on the same host07:51
gokhan_we can only ping from containers to their host's br-mgmt ip07:52
gokhan_we restarted the lxc-dnsmasq service but it did not work.07:54
jrosserthat only deals with eth0 in the container07:55
jrosserdid you do other things like check that the routing table looks reasonable?07:56
gokhan_jrosser, https://paste.openstack.org/show/blJXkzVFYUpesUza31NQ/07:58
gokhan_it seems ok 07:59
jrosserdocker?07:59
gokhan_also  ceph is installed with cephadm and it is using docker 08:01
jrosserall i can recommend is starting bottom up with really basic connectivity checks08:04
jrosserarp/ping with tcpdump at both ends between two host br-mgmt08:04
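Concretely, that bottom-up check could look like this (a sketch; HOST_B_BR_MGMT_IP is a placeholder for the other host's br-mgmt address, and tcpdump needs root):

```shell
# On host A: generate ARP + ICMP traffic toward host B's br-mgmt IP.
ping -c 3 HOST_B_BR_MGMT_IP

# On host B, in parallel: confirm the ARP requests arrive on br-mgmt
# and that replies actually leave (-e prints MAC addresses so you can
# see who answers, -n skips DNS lookups).
tcpdump -eni br-mgmt 'arp or icmp'
```

If requests show up on host B but no replies come back, the drop is on B's ingress/reply path; if they never arrive, look at host A's egress or the fabric in between.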
jrosserwe do not test cephadm on the same hosts as openstack-ansible so that would be up to you to check there is no bad interaction08:05
jrosserit is also possible that docker is installing iptables rules08:09
gokhan_jrosser, this is the ping and tcpdump output for a ping between 2 hosts. https://paste.openstack.org/show/bcdc92jPLvzqYrewviQs/08:11
gokhan_this is iptables rule list https://paste.openstack.org/show/bR62RCLyZQeKgd5OEFix/08:12
gokhan_I am using cephadm and osa on the same host in multiple environments, I didn't get any issues with that.08:13
gokhan_the weird behaviour is that it works after a reboot :(08:14
gokhan_but we are trying to find the root cause of this.08:14
gokhan_jrosser, can the apparmor service affect container networking?08:16
jrosseryou would see anything that apparmor blocks in the kernel log08:21
jrosserhave you checked that br-mgmt has all the members you'd expect08:22
gokhan_jrosser, these are the dmesg logs https://paste.openstack.org/show/bDEQa5sxPHJcf7UeHC59/08:25
gokhan_there are profile replace logs from apparmor08:26
gokhan_jrosser, br-mgmt has all members https://paste.openstack.org/show/bd2p0LIGnUCfEpoVk0uh/08:30
gokhan_now I am rebooting one of the hosts to try to see a difference08:36
gokhan_jrosser, after the reboot, now containers on rebooted host can ping between themselves08:44
gokhan_the only difference I see is that the lxc-monitord service is not running08:45
jrossernoonedeadpunk: is this correct? https://github.com/openstack/openstack-ansible/blob/master/scripts/gate-check-commit.sh#L6808:51
jrossershould it be 2024.1?08:51
gokhan_jrosser, it seems I find the issue 08:56
gokhan_after the reboot iptables rule has changed 08:57
gokhan_this is rebooted host https://paste.openstack.org/show/bO3R9QEdzZlzFeivqcIo/08:57
noonedeadpunkjrosser: should be 2024.1, yes08:58
gokhan_on the other host the iptables rules are: Chain INPUT (policy ACCEPT)08:58
gokhan_target     prot opt source               destination         08:58
gokhan_ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:domain08:58
gokhan_ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain08:58
gokhan_ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:6708:58
gokhan_ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps08:58
gokhan_Chain FORWARD (policy DROP)08:58
gokhan_target     prot opt source               destination         08:58
gokhan_ACCEPT     all  --  anywhere             anywhere            08:58
gokhan_ACCEPT     all  --  anywhere             anywhere            08:58
gokhan_DOCKER-USER  all  --  anywhere             anywhere            08:58
gokhan_DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            08:58
gokhan_ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED08:58
gokhan_DOCKER     all  --  anywhere             anywhere            08:58
gokhan_ACCEPT     all  --  anywhere             anywhere            08:58
gokhan_ACCEPT     all  --  anywhere             anywhere            08:58
gokhan_Chain OUTPUT (policy ACCEPT)08:58
gokhan_target     prot opt source               destination         08:58
gokhan_Chain DOCKER (1 references)08:58
gokhan_target     prot opt source               destination         08:58
gokhan_Chain DOCKER-ISOLATION-STAGE-1 (1 references)08:58
gokhan_target     prot opt source               destination         08:58
gokhan_DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            08:58
gokhan_RETURN     all  --  anywhere             anywhere            08:58
gokhan_Chain DOCKER-ISOLATION-STAGE-2 (1 references)08:59
gokhan_target     prot opt source               destination         08:59
gokhan_DROP       all  --  anywhere             anywhere            08:59
gokhan_RETURN     all  --  anywhere             anywhere            08:59
gokhan_Chain DOCKER-USER (1 references)08:59
gokhan_target     prot opt source               destination         08:59
gokhan_RETURN     all  --  anywhere             anywhere08:59
gokhan_sorry :(08:59
gokhan_https://paste.openstack.org/show/bK8axT4XP81rb1l580Lt/08:59
gokhan_the FORWARD policy changed: it is DROP on the un-rebooted hosts08:59
gokhan_how can we apply the iptables rules for lxc containers08:59
gokhan_it seems they are not applied08:59
jrosserlike i say we do not test/support having docker and lxc on the same host08:59
jrosserthis is very well known to cause trouble for both lxc and lxd08:59
jrosserthere might be some config you can change on the docker side about this - but i have no idea about that really09:01
jrosseropenstack-ansible does not do any management of iptables rules at all, so this feels like a docker issue09:01
gokhan_thanks jrosser for helping to find the issue. as you have said, it seems there are issues when installing docker and lxc on same host. as a workaround we will change iptables rules as expected.    09:05
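For reference, Docker does have a documented daemon option to stop it from managing iptables entirely. A config-fragment sketch (not something openstack-ansible configures), with the caveat that disabling it also disables Docker's own published-port NAT, so Docker networking would then need manually managed rules:

```json
{
  "iptables": false
}
```

That would go in `/etc/docker/daemon.json`, followed by a restart of the docker daemon.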
jrosserit may be that restarting some service on the docker side has the same effect as the reboot, whichever one is responsible for inserting the iptables rules09:06
gokhan_the weird thing is docker is working as expected. ceph mon daemon can communicate between themselves. 09:08
gokhan_I will restart ceph.target and see if there is any change on the iptables side.09:08
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Fix upgrade job on master to upgrade from 2024.1 to master  https://review.opendev.org/c/openstack/openstack-ansible/+/92877109:17
noonedeadpunkactually - we do iptables rules for LXC09:20
noonedeadpunkand they should be re-loaded/applied with restart of lxc-dnsmasq service iirc09:20
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-lxc_hosts/src/branch/master/templates/lxc-system-manage.j2#L76-L11109:21
noonedeadpunkand yes, lxc-dnsmasq would remove/add iptables rules09:22
noonedeadpunkhttps://opendev.org/openstack/openstack-ansible-lxc_hosts/src/branch/master/tasks/lxc_net.yml#L89-L10409:22
noonedeadpunkbut you also can `/usr/local/bin/lxc-system-manage iptables-recreate`09:23
jrosseroh wow i completely missed that!09:23
jrossergokhan_: ^ this is stuff to know about09:25
noonedeadpunkeventually we can add some "custom" rules to that template if that's gonna help09:41
opendevreviewMerged openstack/openstack-ansible stable/2023.2: Remove the get_md5 parameter from ansible stat tasks  https://review.opendev.org/c/openstack/openstack-ansible/+/92772010:08
gokhan_noonedeadpunk, thanks noonedeadpunk, we restarted lxc-dnsmasq but they were not applied. I am trying now10:11
gokhan_noonedeadpunk, it did not change. Chain FORWARD (policy DROP) > the policy is DROP, but on the rebooted host it is Chain FORWARD (policy ACCEPT)10:26
gokhan_the same iptables rules are recreated10:26
noonedeadpunkso the service totally does not change the default policy on chains10:36
noonedeadpunkI don't think docker does this either10:36
noonedeadpunkah....10:37
noonedeadpunkservice ensures forward only for lxc_bridge, not mgmt_bridge10:37
gokhan_network connection issue is solved by running "sudo iptables -P FORWARD ACCEPT"10:39
gokhan_noonedeadpunk, I didn't find another solution other than the above10:40
noonedeadpunk iptables -I FORWARD -i "br-mgmt" -j ACCEPT ?10:41
gokhan_restarting docker also did not work10:41
gokhan_noonedeadpunk, I am trying 10:42
gokhan_it also worked 10:45
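Side by side, the two runtime workarounds discussed above (a sketch; both are lost on reboot unless persisted, e.g. with iptables-persistent, and the `-o` rule is an extra I'm adding for the reply path, not something suggested in the channel):

```shell
# Broad: reset the default FORWARD policy that Docker flipped to DROP.
# This re-opens forwarding for every bridge on the host, not just br-mgmt.
iptables -P FORWARD ACCEPT

# Narrow: keep Docker's DROP policy, but explicitly accept traffic
# traversing br-mgmt in both directions.
iptables -I FORWARD -i br-mgmt -j ACCEPT
iptables -I FORWARD -o br-mgmt -j ACCEPT
```

The narrow form is safer on hosts where Docker's isolation of its own bridges should stay intact.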
gokhan_noonedeadpunk, jrosser I have tested with docker installation on a vm, docker is changing iptables forward chain policy from accept to drop. 10:48
noonedeadpunkwell....10:48
noonedeadpunkthis used to be a really nice role to manage iptables rules: https://github.com/logan2211/ansible-iptables10:49
jrosserwe use that ^10:51
jrosserbut we also have an unmerged PR there for 4 years :(10:52
noonedeadpunkoops, quite a crucial one btw11:19
jrosserlooks like logan- is still here in irc.....11:20
noonedeadpunkat worst I hope to see him in a couple of months, so I can potentially bug him about things :p11:24
noonedeadpunkI've had one thing reported to me here. Apparently, magnum with the heat driver (at least with heat) does try to use the `amphora` octavia_provider, which is the default11:43
noonedeadpunkand I've proposed patch (which we've merged) which removes this provider and leaves only amphorav211:43
noonedeadpunkso I was wondering if we should maybe roll back (or not) and have the `amphora` provider along with `amphorav2`11:55
noonedeadpunkas `amphora` will call the v2 anyway.11:55
noonedeadpunkno idea though if it's going to be the same in the future or not. Or whether Magnum should adjust the default to point to v211:56
jrossernoonedeadpunk: there is `octavia_provider` label but having that be the old value by default is not good12:19
noonedeadpunkyeah, it's "old" default.12:20
noonedeadpunkjohnsom: any insight if `amphora` provider is expected to be existing in deployments, or having jsut `amphorav2` is fine?12:21
noonedeadpunkand if `amphora` is going to be kept in octavia for the future as well?12:21
jrosseri wonder if we should revert converting the repo server to apache12:29
noonedeadpunkI wanna fix mpms this week for sure12:30
noonedeadpunkand backport to 2024.112:30
noonedeadpunkas seems that skyline/keystone is already an issue12:30
jrosserare you going to look at fixing up everything being on apache?12:34
jrosserif so i will leave it alone12:34
jrosserthere are some surprise failures to come as we've not been testing the right upgrades too12:34
noonedeadpunkI was going to iterate through mpm modules and disable all except one that's being defined12:39
noonedeadpunkand introduce global variable to set the mpm12:40
jrosserah i think also the wrong upgrade branch is why we are missing a bunch of logs from /etc for upgrade jobs12:46
jrosserthe log collection at the end depends on tools which should have been installed from the starting branch, and they are missing (like parallel)12:47
jrosserand the same will affect slurp upgrades as we need the tools for master but set things up initially two branches back12:48
noonedeadpunkyeah12:49
jrosseri think thats a simple fix12:49
opendevreviewJonathan Rosser proposed openstack/openstack-ansible stable/2023.2: Ensure "parallel" package is installed for CI log collection  https://review.opendev.org/c/openstack/openstack-ansible/+/92879012:56
noonedeadpunkwhich mpm do we want for the default? event as in keystone, or worker as in horizon?12:56
noonedeadpunkI frankly can't recall the exact difference between these 2 anymore :(12:57
jrosseri have no idea tbh :/12:57
noonedeadpunksounds like event is better12:59
noonedeadpunkor well, it's like an improved worker12:59
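Assuming the Debian/Ubuntu `a2enmod`/`a2dismod` tooling that these distros ship, converging the roles on a single MPM boils down to something like this sketch (needs root and an installed apache2):

```shell
# Only one MPM may be loaded at a time, so disable the others first.
a2dismod mpm_prefork mpm_worker

# mpm_event is threaded like worker, but adds a dedicated listener
# thread so idle keep-alive connections don't tie up worker threads.
a2enmod mpm_event

# Validate before reloading, since a second loaded MPM is a hard error.
apachectl configtest && systemctl reload apache2
```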
opendevreviewMerged openstack/openstack-ansible stable/2024.1: Remove extra slash character from horizon haproxy healthcheck url.  https://review.opendev.org/c/openstack/openstack-ansible/+/92726413:20
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Improve OVN cluster setup idempotence report  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/92861813:37
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Do not kill ipsec on L3 cleanup  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/92799213:37
opendevreviewMerged openstack/openstack-ansible-plugins master: Add infrastructure playbooks to openstack-ansible-plugins collection  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92417113:38
opendevreviewMerged openstack/openstack-ansible-os_ceilometer stable/2024.1: Add support for Magnum notifications  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/92781213:41
opendevreviewMerged openstack/openstack-ansible-os_neutron master: Remove ns-metadata-proxy cleanuop handler  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/92799313:53
johnsomnoonedeadpunk amphora will be permanent, amphorav2 may go away at some point.13:59
johnsomSo, yeah, keep using amphora14:01
opendevreviewMerged openstack/openstack-ansible-ops master: Update magnum-cluster-api version  https://review.opendev.org/c/openstack/openstack-ansible-ops/+/92861314:01
noonedeadpunkjohnsom: and `octavia` is just removed?14:04
noonedeadpunkfor some reason I thought that it will remain with v2 :(14:06
noonedeadpunkprobably completely misunderstood some discussion14:06
johnsomYeah, at some point "octavia" might go away. people didn't like that one as we have multiple providers now, so lobbied to change to "amphora"14:13
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: Return `amphora` provider back  https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/92881514:13
noonedeadpunkwell, versioned amphora also makes sense to me kinda14:13
johnsomnoonedeadpunk as of master branch, they are all the same now14:13
johnsomYeah, but the code for v1 is going away14:13
noonedeadpunkyeah, that part I know, though I thought some were marked for removal anyway in the future14:14
noonedeadpunkand that is why having `amphora` felt a bit confusing I guess14:14
johnsomFor a deployment project, "amphora" will always be the right answer14:14
noonedeadpunkas `amphorav2` feels more natural given that the code is quite different14:14
noonedeadpunkok, yeah, I see14:14
noonedeadpunkwe just switched `default_provider_driver = amphorav2`  some time ago....14:15
noonedeadpunk#startmeeting openstack_ansible_meeting15:00
opendevmeetMeeting started Tue Sep 10 15:00:20 2024 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:00
noonedeadpunk#topic rollcall15:00
noonedeadpunko/15:00
hamburglero/15:00
NeilHanlono/15:01
jrossero/ hello15:01
noonedeadpunk#topic office hours15:03
noonedeadpunkso, noble test jobs finally merged15:03
noonedeadpunkthough we've missed moving noble with playbooks15:04
noonedeadpunkand the fix failed on gate intermittently and currently in recheck15:05
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/928592/315:05
noonedeadpunkThere is also a current issue with apache on metal15:05
noonedeadpunkas we're using different MPMs across roles, which causes upgrade job failures15:06
noonedeadpunk(once upgrade jobs track correct branch)15:06
noonedeadpunkso whatever fix is needed should be backported to 2024.115:07
jrosseri found that by trying to understand the job failures in more depth15:07
noonedeadpunkand i guess this should be kinda last thing for backport before doing first minor release15:07
noonedeadpunkAh, except octavia thing that I realized just today15:07
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/92881515:08
jrosserdo we have broken apache/metal on 2024.1?15:08
noonedeadpunkyeah15:08
jrosseroh dear, ok15:08
noonedeadpunkI think that second run of playbooks will break it15:08
jrosserfixing the upgrade job branch could bring more CI trouble, just a release earlier15:09
noonedeadpunkyeah, true15:11
noonedeadpunkso there's quite some things to work on, but not sure what needs deeper discussion15:14
jrosseri found the horizon compress failure is not specifically an OSA issue15:16
noonedeadpunkoh15:16
jrosserit apparently occurs when installing UCA packages, as part of building debian packages, and also in devstack15:17
jrosserthere is a bug which is now correctly assigned to the horizon project https://bugs.launchpad.net/horizon/+bug/204539415:17
jrosseri also spent some time looking at why jobs fail to get u-c when that should be from the disk15:18
jrosserand unfortunately that happens a lot in upgrade jobs and there are insufficient logs collected15:19
jrosserthis (+ a backport) should address the log collection https://review.opendev.org/c/openstack/openstack-ansible/+/92879015:20
jrosserbut that is kind of hard to test15:20
noonedeadpunkit looks reasonable enough15:35
jrosserfor the u-c errors it is clear that the code takes the path for the url being https:// rather than file://15:38
jrosserbut why it does that is not obvious yet - it could be that we have changed the way that the redirection of the URLs to files works between releases15:39
jrosserso what is set up for the initial upgrade branch does not do the right thing for the target branch15:39
jrosseri think this is the most likley explanation for those kind of errors15:40
noonedeadpunkso if that for upgrade jobs only - that might be the case15:47
noonedeadpunkas there we kind of ignore zuul-provided repos15:47
noonedeadpunkjust to leave them in "original" state to preserve depends-on15:48
noonedeadpunkwhich could explain why the upgrade on N-1 might try to do a web fetch of u-c15:48
jrosserhow do i discover where the opensearch log collection service is?15:53
jrosser^ for CI jobs15:53
jrosserML says https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global15:56
noonedeadpunk#endmeeting16:06
opendevmeetMeeting ended Tue Sep 10 16:06:38 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:06
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.html16:06
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.txt16:06
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-10-15.00.log.html16:06
jrosserhttps://mariadb.com/newsroom/press-releases/k1-acquires-a-leading-database-software-company-mariadb-and-appoints-new-ceo/16:31
noonedeadpunkwow16:41
noonedeadpunkno good examples of k1 investments that I can point to....16:42
opendevreviewMerged openstack/openstack-ansible-plugins master: Verify OS for containers installation  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92859118:25
opendevreviewMerged openstack/openstack-ansible-plugins master: Add Ubuntu 24.04 to supported by playbook versions  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92859218:25

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!