opendevreview | yatin proposed openstack/neutron stable/zed: Fix TestOVNMechanismDriver ipv6 tests https://review.opendev.org/c/openstack/neutron/+/915523 | 04:54 |
zigo | About this bug: https://bugs.launchpad.net/neutron/+bug/2060974 | 07:05 |
zigo | I went into the haproxy.c source code to see how it's writing the pid file, and it's like this: | 07:05 |
zigo | pidfd = open(global.pidfile, O_CREAT | O_WRONLY | O_TRUNC, 0644); | 07:05 |
zigo | So it's really attempting to write rw-r--r--, but it ends up rw-r----- in /var/lib/neutron/external/pids. So I wonder: is there an umask set by neutron somehow, when starting haproxy? | 07:05 |
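A minimal Python sketch of the interaction zigo is suspecting here (illustrative path only; haproxy itself is C, but the kernel-side rule is the same): the mode passed to open() is filtered through the process umask, so effective mode = requested mode & ~umask.

```python
import os

# Mimic the umask that later turns out to be set on the agent (0o026),
# then create a pid-style file asking for 0644, as haproxy does.
old = os.umask(0o026)
fd = os.open("/tmp/demo.pid", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.close(fd)
os.umask(old)  # restore the previous umask

# 0o644 & ~0o026 == 0o640, i.e. rw-r-----: exactly what zigo observes.
print(oct(os.stat("/tmp/demo.pid").st_mode & 0o777))
```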
zigo | Where's the code that starts haproxy? | 07:06 |
zigo | Looks like /var/lib/neutron/external/ and /var/lib/neutron/external/pids are created with the wrong unix rights, missing the o+r... | 07:14 |
zigo | When I look on another server (older OpenStack), I can see correct o+r rights. | 07:15 |
slaweq | hi, who can approve patches to the unmaintained branches? Can you check https://review.opendev.org/q/topic:%22clean-tobiko-job%22 ? | 07:20 |
slaweq | thx in advance | 07:20 |
frickler | slaweq: https://review.opendev.org/admin/groups/4d728691952c04b8b2ec828eabc96b98dc124d69,members there is also #openstack-unmaintained though I'm considering shutting it down again as it doesn't really seem to get used | 08:00 |
opendevreview | Benjamin Reichel proposed openstack/python-neutronclient master: Fix insert and remove rule from firewall policy https://review.opendev.org/c/openstack/python-neutronclient/+/913291 | 08:43 |
frickler | lajoskatona: FYI we will be discussing future actions regarding the unmaintained process in the PTG session at 16:00 UTC, in case you want to join or add comments before at https://etherpad.opendev.org/p/apr2024-ptg-os-tc#L154 | 09:02 |
frickler | nicolasbock__: ^^ | 09:02 |
frickler | you might also want to join #openstack-tc and #openstack-unmaintained | 09:03 |
zigo | slaweq: I've been searching for 2 days and can't find out what changed the unix rights of pid.haproxy to fix the dhcp-agent that's currently completely broken for me in Caracal. Could you give me a hint on where haproxy is spawned? | 09:04 |
slaweq | @zigo sure | 09:05 |
slaweq | haproxy may be spawned by different agents - in which namespace do you have it running? | 09:05 |
zigo | slaweq: I'm not sure, my issue is neutron-dhcp-agent not being able to read files like /var/lib/neutron/external/pids/32d7d6eb-cd03-4d4d-88f4-c05955a2e9d2.pid.haproxy for example. | 09:07 |
zigo | Then it gets stuck in a loop doing that, and doesn't process anything else. | 09:07 |
zigo | As a result, there's no DHCP for my VMs ... :/ | 09:07 |
slaweq | ahh, so it's dhcp agent | 09:07 |
zigo | Unix rights for these files have changed, somehow. | 09:07 |
slaweq | ok | 09:07 |
zigo | In Bobcat, it was world readable, and no problem. | 09:07 |
slaweq | dhcp agent is calling metadata driver here https://github.com/openstack/neutron/blob/019294c71d94b788c14b23dc1da3c21f51bcdb0b/neutron/agent/dhcp/agent.py#L826 | 09:08 |
zigo | Also, /var/lib/neutron/external/pids/ used to be 755 owned by neutron:neutron. | 09:08 |
zigo | Now it's 640 owned by root:root ... | 09:08 |
slaweq | and that driver spawns haproxy | 09:09 |
zigo | Ok, thanks. | 09:09 |
zigo | So it's still some metadata agent code that's doing the thing, right? | 09:09 |
slaweq | but I don't think that Neutron is messing with the owner of that file | 09:09 |
slaweq | or with the rights to it | 09:09 |
slaweq | it has to be something external to neutron IMO | 09:09 |
zigo | I checked the haproxy code, and it's really opening the file as 644, so world readable... | 09:10 |
zigo | Maybe there's some kind of umask at some point. | 09:10 |
zigo | Anyways, thanks, I'll be able to trace it from there, hopefully. | 09:11 |
lajoskatona | frickler: thanks I want to join | 09:11 |
zigo | slaweq: I'll try reverting https://review.opendev.org/c/openstack/neutron/+/894399 and see how it goes... :P | 09:16 |
zigo | This patch looks suspicious to me. | 09:16 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Optimize ``HAChassisGroupRouterEvent`` code https://review.opendev.org/c/openstack/neutron/+/915558 | 09:31 |
opendevreview | Rodolfo Alonso proposed openstack/neutron master: [OVN] Add release note for OVN router tunnelled networks ext GW https://review.opendev.org/c/openstack/neutron/+/915559 | 09:31 |
tkajinam | zigo, I'd rather suspect something with haproxy in Debian, as I've not seen this problem in CentOS or Ubuntu | 09:48 |
zigo | tkajinam: I don't think so, it's the same base bookworm that I'm using. | 09:55 |
zigo | In Bobcat, I didn't have this problem. | 09:55 |
tkajinam | hm | 09:55 |
tkajinam | https://paste.opendev.org/show/bkgmiErdwZ9Ame8UWM8S/ | 09:55 |
zigo | I can try resetting my CI from scratch with Bobcat to prove it... | 09:55 |
tkajinam | is that haproxy run under dhcp-agent? | 09:56 |
tkajinam | not l3-agent, right? | 09:56 |
tkajinam | (I guess you use isolated metadata) | 09:56 |
zigo | cat /proc/329914/status | grep -i umask | 09:57 |
zigo | Umask: 0026 | 09:57 |
zigo | That's for one of the haproxies I can see running... | 09:57 |
zigo | So "something" has "fixed" the umask ... :/ | 09:58 |
* zigo is rerunning his CI for Bobcat to make sure at least Bobcat is sane. | 09:59 |
zigo | This will take me 3 hours at least ... :/ | 09:59 |
tkajinam | I enabled isolated metadata and created another network/subnet. This time haproxy is launched by dhcp-agent instead of l3-agent, but it has a 022 umask and a 644 pid file | 10:00 |
tkajinam | haproxy creates the pid file with an explicit 644, but that umask is masking it: https://github.com/haproxy/haproxy/blob/0797e05d9f0577d9239d4265667ea536a2439db0/src/haproxy.c#L3589 | 10:08 |
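In other words, a two-line check of the arithmetic (022 is the default umask tkajinam sees; 026 is the one on zigo's agent):

```python
assert 0o644 & ~0o022 == 0o644  # default umask: pid file stays rw-r--r--
assert 0o644 & ~0o026 == 0o640  # agent umask 0026: pid file becomes rw-r-----
```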
zigo | tkajinam: That's exactly what I went to look into. | 10:40 |
zigo | Anyway, my CI is running with bobcat, I'll soon know if that's a regression or what. | 10:40 |
seba | is the separation between neutron-server (API via uwsgi) and neutron-rpc-server something that is widely used? I found it in the docs, but I have some subtle problems with it, as some plugins still run rpc servers in the API part, which then clashes with eventlet and kills requests | 10:42 |
tkajinam | seba, there is a known issue with ml2-ovn which is not yet fixed even in current master | 10:44 |
tkajinam | seba, if you use other plugins like ml2-ovs then it may work | 10:44 |
seba | namely trunk drivers and logapi create an rpc server that instantiates MessageHandlingServer, which runs an oslo_messaging server, resulting in usage of a shared lock between greenthreads and native threads. This locks up some workers, resulting in requests just timing out, resulting in orphaned ports in manila + extra ports in nova | 10:44 |
zigo | seba: That's what I've done in the Debian package, and what we've been using in production for like YEARS! | 10:45 |
zigo | Though truth be told, we're not using OVN... | 10:45 |
seba | yeah, we've been using this separation also for a couple of years, but the orphaned ports are getting problematic for us. | 10:46 |
tkajinam | there might be issues with a few more implementations in specific plugins, though | 10:46 |
seba | we're also not using ovn btw | 10:46 |
seba | is there something to read about the ml2-ovn issue? As OVN is kinda the reference implementation for how to do things, it's always nice to look into it :) | 10:48 |
seba | for trunking I think the issue is here: https://opendev.org/openstack/neutron/src/branch/master/neutron/services/trunk/drivers/base.py#L87-L88 | 10:48 |
seba | if you're inheriting from the trunking DriverBase, the register() method will create the ServerSideRpcBackend() regardless of whether we're the API part or the rpc-server part of neutron | 10:49 |
tkajinam | https://bugs.launchpad.net/neutron/+bug/1912359 this is the bug for ovn afaik | 10:50 |
seba | the only indicator to find out if neutron is run as rpc-server or not is cfg.CONF.rpc_workers > 0 - or is there a better/official way to find that out in code? | 10:50 |
tkajinam | cfg.CONF.rpc_workers = 0 is specifically for ml2-ovn afaik | 10:51 |
tkajinam | seba, I'd suggest you check the existing bugs reported for neutron and create one if there are no similar ones. | 10:52 |
seba | it's part of conf/service.py and gets used in service.py as well as the l3 plugin, so I thought it's a more general config | 10:52 |
seba | aye, I can certainly create a bug report | 10:53 |
seba | though I'm also planning on making a downstream fix for this in our own fork of neutron rather soonish, as this is impacting some of our customers | 10:55 |
seba | thought I'd check in if somebody had a better idea than to introduce some sort of mechanism to determine if I'm in rpc or not | 10:55 |
seba | I guess with trunking the problem is that the rpc is started as part of the DriverBase class in register() and not via neutron.service.start_rpc_workers() | 10:58 |
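A sketch of the kind of downstream guard seba is describing, using the `rpc_workers > 0` heuristic from above. The module path follows the files linked in this discussion, but treat the whole thing as an assumption, not an official neutron API:

```python
from oslo_config import cfg

from neutron.services.trunk.rpc import server


def maybe_start_trunk_rpc_backend():
    # With the neutron-server / neutron-rpc-server split, the API (uwsgi)
    # side is expected to run with rpc_workers = 0, so only start the
    # server-side RPC backend when RPC workers are actually configured.
    # This is a heuristic, not an official "am I the rpc-server?" signal.
    if (cfg.CONF.rpc_workers or 0) > 0:
        return server.ServerSideRpcBackend()
    return None
```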
seba | https://bugs.launchpad.net/neutron/+bug/2015275 this looks relevant to my problem | 11:22 |
lajoskatona | tkajinam: Hi, sorry for disturbing you, I sent a question to openstack-discuss regarding Heat and trunk port handling and you were kind enough to answer: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/P4JC2HHCBMFRVKJK275UZYCK3FJ552HI/ | 11:30 |
tkajinam | seba, we have had PTG sessions (development discussions) this week, but unfortunately neutron has already finished all its slots. For further discussion you may want to bring it up in the next IRC meeting or on the ML | 11:31 |
lajoskatona | tkajinam: do you perhaps have a chance to think about it? In the last mail I sent an etherpad with the YAMLs we tried so far: https://etherpad.opendev.org/p/Trunk_creation_with_resourceGroup | 11:31 |
tkajinam | lajoskatona, sorry, it dropped from my memory. I saw your bug report and attempted a few potential methods but failed. I'll give it another thought | 11:31 |
tkajinam | I think the root cause of the problem is that we don't have a dedicated resource to manage interface attachments, but idk if we can fix it at this stage | 11:32 |
lajoskatona | tkajinam: thanks, if it is not possible today that is also a good answer; we perhaps have to check the Heat code to see if it can be fixed there, or extend an already existing HOT template feature | 11:33 |
tkajinam | fixing the way dependencies are resolved with ResourceGroup may be a potential approach, but I've not yet looked into it further | 11:33 |
seba | tkajinam, okay then, works for me. I'll continue to investigate this in my cloud deployments anyway and see what I come up with | 11:36 |
opendevreview | Ihtisham ul Haq proposed openstack/neutron master: Optimize deletion of static routes https://review.opendev.org/c/openstack/neutron/+/914900 | 12:51 |
zigo | slaweq: tkajinam: Ok, so I still have the same access rights issue in Bobcat, except that it looks like neutron-dhcp-agent isn't monitoring the haproxy processes there, and therefore it's working for me. | 13:04 |
zigo | It looks like there are so many haproxy processes started though ... :/ | 13:05 |
zigo | network-1>_ ~ # ps axuf | grep haproxy | grep 4aeef23c-6d20-4b9b-853e-02308da799ea | wc -l | 13:06 |
zigo | 25 | 13:06 |
zigo | :/ | 13:06 |
zigo | slaweq: tkajinam: I found out, it's my startup script for neutron-dhcp-agent that has "umask 0026" ... | 13:19 |
zigo | Not sure why it wasn't an issue previously. | 13:19 |
zigo | Was neutron setting it up previously?!? | 13:19 |
zigo | I was doing this because of log files that I didn't want to be world readable. | 13:20 |
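What makes this bite, in a minimal sketch: the umask is inherited across fork/exec, so a umask set in the agent's init script also applies to every haproxy the agent spawns.

```python
import os
import subprocess

os.umask(0o026)  # what the init script set for the agent's own log files

# Children inherit the umask: an exec'd haproxy would create its pid
# file as 0o644 & ~0o026 = 0o640. Here we just ask a shell to print it.
subprocess.run(["sh", "-c", "umask"], check=True)  # prints 0026
```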
slaweq | zigo no, I don't think neutron ever set those | 13:20 |
zigo | Anyways, I've found out! :P | 13:23 |
slaweq | great | 13:44 |
samcat116 | Hi all, seeing some odd behavior when trying to delete a large number of instances at once on one compute host (over 100 VMs with many networks per VM). Nova sets the instances to error as it's getting nova.exception.NeutronAdminCredentialConfigurationInvalid when trying to deallocate the network. On the neutron side I can see warnings for token authorization failed and the 401 errors for the GET requests on these ports. The token in | 14:12 |
samcat116 | the config here isn't changing and must be correct as spinning up all these VMs worked, so not sure what this means. | 14:12 |
greatgatsby_ | Hello. Is there a way to determine the tap<some-id> interface created on the compute node, given an OpenStack port ID? | 15:02 |
greatgatsby_ | nm - figured it out, the <id> in tap<id> is actually the first part of the port ID | 15:31 |
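For reference, a minimal sketch of that naming rule: Linux interface names are length-limited, so the tap device keeps "tap" plus the first 11 characters of the port UUID (14 characters total):

```python
def tap_name(port_id: str) -> str:
    # Interface names are capped, so only the first 11 characters of
    # the port UUID fit after the "tap" prefix (14 characters total).
    return ("tap" + port_id)[:14]

# Port UUID from the pid file mentioned earlier in this log:
print(tap_name("32d7d6eb-cd03-4d4d-88f4-c05955a2e9d2"))  # tap32d7d6eb-cd
```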
*** atmark_ is now known as atmark | 17:16 |