Monday, 2024-09-23

09:58 *** bauzas_ is now known as bauzas
13:17 *** bauzas_ is now known as bauzas
15:03 <lajoskatona> Hi, I have a question regarding one of the Neutron jobs; perhaps you have a view that can help us.
15:03 <lajoskatona> Neutron-functional jobs started to fail recently with "OSError: [Errno 24] Too many open files" (e.g.: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_212/928953/12/check/neutron-functional-with-uwsgi-10/212c1ad/testr_results.html )
15:05 <clarkb> lajoskatona: my hunch is that you are hitting a ulimit
15:05 <lajoskatona> we started to increase the ulimit: https://review.opendev.org/c/openstack/neutron/+/928759 but the above error is coming even with a ulimit of 4096, which is strange, so do you perhaps have an idea which other direction we should look in?
15:05 <clarkb> no, it very likely is that you're hitting that 4k ulimit on open files. Is it possible you are leaking file descriptors?
15:07 <lajoskatona> clarkb: the issue started a few weeks ago, and strangely on older branches also
15:07 <clarkb> though looking at that change I'm not sure you're applying the ulimit properly for the devstack user
15:07 <clarkb> so maybe you're still hitting the 2k limit
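(One way to verify which nofile limit the test process actually ends up with is to ask from inside the process itself; a minimal sketch, assuming Python, using the standard-library `resource` module:)

```python
import resource

# The limits this process actually runs under. Values raised in
# limits.conf or a parent shell do not always propagate to services
# started via systemd or su, so check from inside the workload itself.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-files soft limit: {soft}, hard limit: {hard}")
```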
15:08 <lajoskatona> and as these are functional tests, lots of namespaces and other things are opened in parallel
15:08 <clarkb> it wouldn't be strange if you backported the bug (assuming it is a bug leaking fds)
15:08 <lajoskatona> clarkb: yes, that could be
15:09 <lajoskatona> I tried setting ULIMIT_NOFILE: xy in local.conf, but as the functional tests for Neutron use their own scripts and stack.sh is only partially executed, that didn't work
15:09 <clarkb> can you trap for the error and then do an fd table dump?
15:10 <clarkb> you should be able to get paths and ownership and get a clue as to where the issue is stemming from
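(A rough sketch of that trap, assuming Python on Linux; `/proc/self/fd` is Linux-specific, and the helper name `dump_fd_table` is made up for illustration:)

```python
import errno
import os

def dump_fd_table():
    """Return (fd, target) pairs for this process's open descriptors.

    Linux-specific: each entry under /proc/self/fd is a symlink to the
    underlying file, socket, or pipe, which supplies the paths needed
    to see where a leak comes from.
    """
    fd_dir = "/proc/self/fd"
    table = []
    for name in os.listdir(fd_dir):
        try:
            table.append((int(name), os.readlink(os.path.join(fd_dir, name))))
        except OSError:
            pass  # fd was closed between listdir and readlink
    return table

# Trap EMFILE and dump the table before re-raising.
try:
    with open(os.devnull) as f:
        f.read()
except OSError as exc:
    if exc.errno == errno.EMFILE:  # Errno 24: too many open files
        for fd, target in sorted(dump_fd_table()):
            print(fd, target)
    raise

print(f"{len(dump_fd_table())} descriptors currently open")
```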
15:10 <clarkb> a more brute-force approach would be to overload open() and dump a table each time you open things, but that might explode your logging
15:10 <fungi> or you can try to list them all with lsof
15:11 <clarkb> fungi: ya I imagine part of the issue with that is the race between things crashing and cleaning up vs being visible to lsof, but maybe if run often enough we can win that race
15:11 <fungi> yeah, could be
15:12 <lajoskatona> file tracker is on, example: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_212/928953/12/check/neutron-functional-with-uwsgi-10/212c1ad/controller/logs/screen-file_tracker.txt
15:12 <clarkb> lajoskatona: that lacks paths
15:13 <clarkb> but you can pretty clearly see it balloon to 15.5k
15:13 <clarkb> if you had paths and processes you'd hopefully be able to narrow down the source of the issue. Maybe file tracker could do an lsof dump into another file if the count exceeds 15k
15:13 <clarkb> then you'd get that info
15:14 <lajoskatona> clarkb: yes, but we supposed it's not only the test user (you know, trying to look on the bright side: somebody else should be the troublemaker anyway :))
15:15 <clarkb> lajoskatona: right, I'm just saying we already know the fd count is too high based on the error, so that doesn't give us any real new info other than seeing it climb slowly. But it might provide a good hook point to do an lsof dump when you exceed a threshold like 15k
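(That hook could look something like the following sketch, assuming Python on a Linux node with lsof installed; the threshold value and function names here are made up for illustration, not part of the actual file_tracker service:)

```python
import os
import subprocess

FD_THRESHOLD = 15000  # hypothetical trigger, just below the observed 15.5k peak

def fd_count(pid):
    """Count open descriptors for a process by listing /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))

def maybe_dump_lsof(pid, out_path):
    """If the process exceeds the threshold, snapshot lsof output to a file.

    lsof -p reports paths and descriptor types, the detail the plain
    fd count lacks. Returns True when a dump was written.
    """
    if fd_count(pid) > FD_THRESHOLD:
        with open(out_path, "w") as out:
            subprocess.run(["lsof", "-p", str(pid)], stdout=out, check=False)
        return True
    return False

dumped = maybe_dump_lsof(os.getpid(), "/tmp/lsof-dump.txt")
print("dump written:", dumped)
```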
15:16 <lajoskatona> clarkb: thanks, I'll check this file tracker; as this is just the devstack service, perhaps I can add something similar to the functional job to get such extra info
15:20 <clarkb> lajoskatona: you can update devstack and then Depends-On that in your change
15:31 <lajoskatona> clarkb: thanks, good idea, I'll check that direction first

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!