Friday, 2024-04-12

04:54 <opendevreview> yatin proposed openstack/neutron stable/zed: Fix TestOVNMechanismDriver ipv6 tests  https://review.opendev.org/c/openstack/neutron/+/915523
07:05 <zigo> About this bug: https://bugs.launchpad.net/neutron/+bug/2060974
07:05 <zigo> I went into the haproxy.c source code to see how it's writing the pid file, and it's like this:
07:05 <zigo> pidfd = open(global.pidfile, O_CREAT | O_WRONLY | O_TRUNC, 0644);
07:05 <zigo> So it's really attempting to write rw-r--r--, but it ends up rw-r----- in /var/lib/neutron/external/pids. So I wonder: is there an umask set by neutron somehow when starting haproxy?
07:06 <zigo> Where's the code that starts haproxy?
07:14 <zigo> Looks like /var/lib/neutron/external/ and /var/lib/neutron/external/pids are created with the wrong unix rights, missing the o+r...
07:15 <zigo> When I look on another server (older OpenStack), I can see correct o+r rights.
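
The permission arithmetic zigo describes is easy to reproduce: the mode passed to open() is always filtered through the process umask, so a umask of 0026 turns a requested 0644 into 0640 (rw-r-----). A minimal Python sketch, using an arbitrary scratch path:

    import os
    import stat
    import tempfile

    old = os.umask(0o026)                      # os.umask sets and returns the old mask
    path = os.path.join(tempfile.mkdtemp(), "demo.pid")

    # Request 0644, exactly as haproxy does with O_CREAT | O_WRONLY | O_TRUNC.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    os.close(fd)

    mode = stat.S_IMODE(os.stat(path).st_mode)
    print(oct(mode))                           # 0o640 -- the umask stripped the o+r bit

    os.umask(old)                              # restore the previous mask
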
07:20 <slaweq> hi, who can approve patches to the unmaintained branches? Can you check https://review.opendev.org/q/topic:%22clean-tobiko-job%22 ?
07:20 <slaweq> thx in advance
08:00 <frickler> slaweq: https://review.opendev.org/admin/groups/4d728691952c04b8b2ec828eabc96b98dc124d69,members - there is also #openstack-unmaintained, though I'm considering shutting it down again as it doesn't really seem to get used
08:43 <opendevreview> Benjamin Reichel proposed openstack/python-neutronclient master: Fix insert and remove rule from firewall policy  https://review.opendev.org/c/openstack/python-neutronclient/+/913291
09:02 <frickler> lajoskatona: FYI we will be discussing future actions regarding the unmaintained process in the PTG session at 16:00 UTC, in case you want to join or add comments beforehand at https://etherpad.opendev.org/p/apr2024-ptg-os-tc#L154
09:02 <frickler> nicolasbock__: ^^
09:03 <frickler> you might also want to join #openstack-tc and #openstack-unmaintained
09:04 <zigo> slaweq: I've been searching for 2 days and can't find out what changed the unix rights of pid.haproxy, to fix the dhcp-agent that's currently completely broken for me in Caracal. Could you give me a hint on where haproxy is spawned?
09:05 <slaweq> @zigo sure
09:05 <slaweq> haproxy may be spawned by a different agent - in which namespace do you have it running?
09:07 <zigo> slaweq: I'm not sure; my issue is neutron-dhcp-agent not being able to read files like /var/lib/neutron/external/pids/32d7d6eb-cd03-4d4d-88f4-c05955a2e9d2.pid.haproxy, for example.
09:07 <zigo> Then it gets stuck in a loop doing that, and doesn't process anything else.
09:07 <zigo> As a result, there's no DHCP for my VMs ... :/
09:07 <slaweq> ahh, so it's the dhcp agent
09:07 <zigo> Unix rights for these files have changed, somehow.
09:07 <slaweq> ok
09:07 <zigo> In Bobcat, it was world readable, and no problem.
09:08 <slaweq> dhcp agent is calling the metadata driver here: https://github.com/openstack/neutron/blob/019294c71d94b788c14b23dc1da3c21f51bcdb0b/neutron/agent/dhcp/agent.py#L826
09:08 <zigo> Also, /var/lib/neutron/external/pids/ used to be 755 owned by neutron:neutron.
09:08 <zigo> Now it's 640 owned by root:root ...
09:09 <slaweq> and that driver spawns haproxy
09:09 <zigo> Ok, thanks.
09:09 <zigo> So it's still some metadata agent code that's doing the thing, right?
09:09 <slaweq> but I don't think that Neutron is messing with the owner of that file
09:09 <slaweq> or with the rights to it
09:09 <slaweq> it has to be something external to neutron IMO
09:10 <zigo> I checked the haproxy code, and it's really opening the file as 644, so world readable...
09:10 <zigo> Maybe there's some kind of umask at some point.
09:11 <zigo> Anyway, thanks, I'll be able to trace it from there, hopefully.
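
Since zigo suspects a umask "at some point", it's worth noting that the umask is inherited across fork/exec, so whatever mask the agent process carries is exactly what a spawned haproxy will create files with. A short sketch of the inheritance, with nothing neutron-specific assumed:

    import os
    import subprocess
    import sys

    # Emulate an init script that sets a restrictive mask before starting the agent.
    os.umask(0o026)

    # Any child process inherits it; here the child just reports its own umask.
    child_umask = subprocess.run(
        [sys.executable, "-c", "import os; print(oct(os.umask(0)))"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    print(child_umask)  # 0o26 -- inherited from the parent, no explicit passing needed
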
09:11 <lajoskatona> frickler: thanks, I want to join
09:16 <zigo> slaweq: I'll try reverting https://review.opendev.org/c/openstack/neutron/+/894399 and see how it goes... :P
09:16 <zigo> This patch looks suspicious to me.
09:31 <opendevreview> Rodolfo Alonso proposed openstack/neutron master: [OVN] Optimize ``HAChassisGroupRouterEvent`` code  https://review.opendev.org/c/openstack/neutron/+/915558
09:31 <opendevreview> Rodolfo Alonso proposed openstack/neutron master: [OVN] Add release note for OVN router tunnelled networks ext GW  https://review.opendev.org/c/openstack/neutron/+/915559
09:48 <tkajinam> zigo, I'd rather suspect something with haproxy in Debian, as I've not seen this problem in CentOS or Ubuntu
09:55 <zigo> tkajinam: I don't think so, it's the same base bookworm that I'm using.
09:55 <zigo> In Bobcat, I didn't have the trouble.
09:55 <tkajinam> hm
09:55 <tkajinam> https://paste.opendev.org/show/bkgmiErdwZ9Ame8UWM8S/
09:55 <zigo> I can try resetting my CI from scratch with Bobcat to prove it...
09:56 <tkajinam> is that haproxy run under dhcp-agent?
09:56 <tkajinam> not l3-agent, right?
09:56 <tkajinam> (I guess you use isolated metadata)
09:57 <zigo> cat /proc/329914/status | grep -i umask
09:57 <zigo> Umask:  0026
09:57 <zigo> That's for one of the haproxies I can see running...
09:58 <zigo> So "something" has "fixed" the umask ... :/
09:59 * zigo is rerunning his CI for Bobcat to make sure at least Bobcat is sane.
09:59 <zigo> This will take me 3 hours at least ... :/
10:00 <tkajinam> I enabled isolated metadata and created another network/subnet. This time haproxy is launched by dhcp-agent instead of l3-agent, but has a 022 umask and a 644 pid
10:08 <tkajinam> haproxy creates the pid with an explicit 644, but that umask is messing it up: https://github.com/haproxy/haproxy/blob/0797e05d9f0577d9239d4265667ea536a2439db0/src/haproxy.c#L3589
10:40 <zigo> tkajinam: That's exactly what I went to look into.
10:40 <zigo> Anyway, my CI is running with Bobcat; I'll soon know if that's a regression or what.
10:42 <seba> is the separation between neutron-server (api via uwsgi) and neutron-rpc-server something that is widely used? I found it in the docs, but I have some subtle problems with it, as some plugins still run rpc servers in the API part, which then clashes with eventlet and kills requests
10:44 <tkajinam> seba, there is a known issue with ml2-ovn which is not yet fixed even in current master
10:44 <tkajinam> seba, if you use the other plugins like ml2-ovs then it may work
10:44 <seba> namely, the trunk drivers and logapi create an rpc server that instantiates MessageHandlingServer, which runs an oslo.messaging server, resulting in usage of a shared lock between greenthreads and native threads. this locks up some workers, resulting in requests just timing out, resulting in orphaned ports in manila + extra ports in nova
10:45 <zigo> seba: That's what I've done in the Debian package, and what we've been using in production for like YEARS!
10:45 <zigo> Though truth be told, we're not using OVN...
10:46 <seba> yeah, we've been using this separation also for a couple of years, but the orphaned ports are getting problematic for us.
10:46 <tkajinam> there might be issues with a few more implementations in specific plugins, though
10:46 <seba> we're also not using ovn btw
10:48 <seba> is there something to read about the ml2-ovn issue? As ovn is kinda the reference implementation on how to do things, it's always nice to look into it :)
10:48 <seba> for trunking I think the issue is here: https://opendev.org/openstack/neutron/src/branch/master/neutron/services/trunk/drivers/base.py#L87-L88
10:49 <seba> if you're inheriting from the trunking DriverBase, the register() method will create the ServerSideRpcBackend() regardless of whether we're the api part or the rpc-server part of neutron
10:50 <tkajinam> https://bugs.launchpad.net/neutron/+bug/1912359 this is the bug for ovn afaik
10:50 <seba> the only indicator to find out if neutron is run as rpc-server or not is cfg.CONF.rpc_workers > 0 - or is there a better/official way to find that out in code?
10:51 <tkajinam> cfg.CONF.rpc_workers = 0 is specifically for ml2-ovn afaik
10:52 <tkajinam> seba, I'd suggest you check the existing bugs reported for neutron and create one if there are no similar ones.
10:52 <seba> it's part of conf/service.py and gets used in service.py as well as the l3 plugin, so I thought it's a more general config
10:53 <seba> aye, I can certainly create a bug report
10:55 <seba> though I'm also planning on making a downstream fix for this in our own fork of neutron rather soonish, as this is impacting some of our customers
10:55 <seba> thought I'd check in to see if somebody had a better idea than introducing some sort of mechanism to determine if I'm in rpc or not
10:58 <seba> I guess with trunking the problem is that the rpc is started as part of the DriverBase class in register(), and not via neutron.service.start_rpc_workers()
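
For the downstream fix seba mentions, one hedged option, given that rpc_workers > 0 is the only config-level hint available per the discussion above, is to gate the backend creation on it. This is a sketch only; register_driver() and start_rpc_backend() are hypothetical names standing in for whatever the fork actually wraps around DriverBase.register():

    from oslo_config import cfg

    def is_rpc_process():
        """Heuristic from the discussion above: rpc_workers == 0 currently means
        this process is the uwsgi API side and must not start any
        oslo.messaging MessageHandlingServer."""
        return (cfg.CONF.rpc_workers or 0) > 0

    def register_driver(driver):
        # Hypothetical wrapper: only the rpc-server process should create the
        # ServerSideRpcBackend that upstream register() creates unconditionally.
        if is_rpc_process():
            driver.start_rpc_backend()  # hypothetical helper in the fork
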
11:22 <seba> https://bugs.launchpad.net/neutron/+bug/2015275 this looks relevant to my problem
11:30 <lajoskatona> tkajinam: Hi, sorry for disturbing; I sent a question to openstack-discuss regarding Heat and trunk port handling and you were kind enough to answer: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/P4JC2HHCBMFRVKJK275UZYCK3FJ552HI/
11:31 <tkajinam> seba, we have had ptg sessions (development discussions) this week, but unfortunately neutron has already finished all its slots. for further discussion you may want to bring it to the next irc meeting or the ml
11:31 <lajoskatona> tkajinam: do you perhaps have a chance to think about it? In the last mail I sent an etherpad with the yamls we tried so far: https://etherpad.opendev.org/p/Trunk_creation_with_resourceGroup
11:31 <tkajinam> lajoskatona, sorry, it dropped from my memory. I saw your bug report and attempted to try any potential methods but failed. I'll give it another thought
11:32 <tkajinam> I think the root cause of the problem is that we don't have a dedicated resource to manage interface attachments, but idk if we can fix it at this stage
11:33 <lajoskatona> tkajinam: thanks, if it is not possible today that is also a good answer; we then have to think about checking the Heat code, whether it can be fixed there or the already existing hot template feature extended
11:33 <tkajinam> fixing the way dependency is resolved with ResourceGroup may be a potential way, but I've not yet looked into it further
12:51 <opendevreview> Ihtisham ul Haq proposed openstack/neutron master: Optimize deletion of static routes  https://review.opendev.org/c/openstack/neutron/+/914900
13:04 <zigo> slaweq: tkajinam: Ok, so I still have the same access right issue in Bobcat, except that it looks like neutron-dhcp-agent isn't monitoring the dhcp agent, and therefore it's working for me.
13:05 <zigo> It looks like there are so many haproxies started though ... :/
13:06 <zigo> network-1>_ ~ # ps axuf | grep haproxy | grep 4aeef23c-6d20-4b9b-853e-02308da799ea | wc -l
13:06 <zigo> 25
13:06 <zigo> :/
13:19 <zigo> slaweq: tkajinam: I found it: it's my startup script for neutron-dhcp-agent that has "umask 0026" ...
13:19 <zigo> Not sure why it wasn't an issue previously.
13:19 <zigo> Was neutron setting it up previously?!?
13:20 <zigo> I was doing this because of log files that I didn't want to be world readable.
13:20 <slaweq> zigo no, I don't think neutron was ever setting those
13:23 <zigo> Anyways, I've found it! :P
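
If the only goal of the init script's umask 0026 was non-world-readable logs, a narrower alternative is to tighten the log file itself and leave the process umask alone, so spawned haproxies keep their intended 0644 pid files. A sketch using Python's stdlib logging; the path is illustrative:

    import logging
    import os

    LOG_PATH = "/tmp/dhcp-agent.log"  # illustrative; a real agent logs under /var/log/neutron

    handler = logging.FileHandler(LOG_PATH)   # creates the file with the default umask
    os.chmod(LOG_PATH, 0o640)                 # rw-r----- on the log only
    logging.getLogger().addHandler(handler)
    logging.getLogger().warning("log readable by owner and group only")
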
13:44 <slaweq> great
14:12 <samcat116> Hi all, seeing some odd behavior when trying to delete a large number of instances at once on one compute host (over 100 vms with many networks per vm). Nova sets the instances to error as it's getting nova.exception.NeutronAdminCredentialConfigurationInvalid when trying to deallocate the network. On the neutron side I can see warnings for "token authorization failed" and 401 errors for the GET requests on these ports. The token in the config here isn't changing and must be correct, as spinning up all these vms worked, so not sure what this means.
15:02 <greatgatsby_> Hello. Is there a way to determine the tap<some-id> interface created on the compute given an openstack port ID?
15:31 <greatgatsby_> nm - figured it out, the tap<id> is actually the first part of the port id
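
For the record, the mapping greatgatsby_ found follows from the Linux cap of 15 characters on interface names: the tap name is a 3-character prefix plus the first 11 characters of the port UUID. A one-liner to go from port ID to device name, reusing a UUID from earlier in this log:

    def tap_device_name(port_id, prefix="tap"):
        # Linux caps interface names at 15 chars, so only the first 11
        # characters of the port UUID fit after the "tap" prefix.
        return prefix + port_id[:11]

    print(tap_device_name("32d7d6eb-cd03-4d4d-88f4-c05955a2e9d2"))  # tap32d7d6eb-cd
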
17:16 *** atmark_ is now known as atmark
