*** gibi_pto is now known as gibi | 05:51 | |
tobias-urdin | hberaud: i'm trying to figure out why nova-compute is leaking anon_inodes allocated for eventpoll after we've upgraded, hopefully i can pick your brain a little since it involves some changes in oslo.messaging | 07:20 |
tobias-urdin | the issue is pretty much the same as described and solved with https://review.opendev.org/c/openstack/oslo.messaging/+/386656 a long time ago, but in that case it also leaks anon_inodes until the NOFILE limit is hit for the process and it just stops working since there are no available fds | 07:21 |
tobias-urdin | this was then reverted by this change, but i don't understand why this line was reverted, is it because threading is already monkey patched by eventlet, so threading would already point to a similar implementation to what that helper class in eventletutils does? | 07:23 |
tobias-urdin | https://github.com/openstack/oslo.messaging/commit/22f240b82fffbd62be8568a7d0d3369134596ace#diff-ba636bdb71407febb1ff546dee098c4bc45952da2bb4e7f86f1126d53d7ec11fR949 | 07:23 |
tobias-urdin | the final change is then the change to using pthreads for heartbeats by default https://github.com/openstack/oslo.messaging/commit/add5ab4ecec090efdb9864bc9385f871f2dd082a | 07:24 |
tobias-urdin | which i guess could also be the cause, so i will first try to disable that and see the impact, however that option is also deprecated so I assume the behavior will become the default in the future and not be configurable? | 07:25 |
tobias-urdin | I will continue to investigate by seeing if I can actually find something that solves the issue. | 07:25 |
jrosser | tobias-urdin: we had that trouble here i think | 07:26 |
tobias-urdin | interesting, just for posterity we upgraded oslo.messaging from 12.5.2 to 12.9.3 | 07:27 |
jrosser | there is a LP bug that my colleague made, just looking for it | 07:27 |
jrosser | https://bugs.launchpad.net/oslo.messaging/+bug/1949964 | 07:28 |
tobias-urdin | thanks! i will check it out | 07:30 |
jrosser | ah yes, there was a very bad FD leak from the amqp library, and once that was fixed there was an underlying eventpoll FD leak related to threading | 07:37 |
jrosser | and it affects more than nova-compute for us, anything not running with uwsgi https://bugs.launchpad.net/openstack-ansible/+bug/1961603 | 07:38 |
tobias-urdin | ack, yeah I assume we will start hitting that with more services not running with mod_wsgi when upgrading, from what i understand apps running under mod_wsgi should use pthreads | 07:42 |
hberaud | tobias-urdin: Concerning the heartbeat in pthread option, we undeprecated it last year, so this option will remain, as will the possibility to switch from greenthread to pthread https://review.opendev.org/c/openstack/oslo.messaging/+/800621 | 07:50 |
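For reference, the option being discussed lives in the `[oslo_messaging_rabbit]` section of the service's configuration file; a minimal fragment (the value shown is the one tobias-urdin later rolls out, not a general recommendation):

```ini
[oslo_messaging_rabbit]
# Run the RabbitMQ heartbeat in a native pthread instead of a green
# thread; set to false to fall back to the green-thread behaviour.
heartbeat_in_pthread = false
```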
tobias-urdin | ack | 07:57 |
damani | sorry for the meeting yesterday | 12:15 |
damani | i was at the doctor and then i forgot about it, but we will do it next week | 12:15 |
sean-k-mooney | hberaud: tobias-urdin yep it was undeprecated at my request because it could cause issues in the nova-compute agent in some cases if i recall correctly | 12:35 |
sean-k-mooney | hberaud: tobias-urdin with that said we were recently chatting in #openstack-nova about some eventlet internals | 12:36 |
sean-k-mooney | we think that we should perhaps remove the use of eventlet's spawn_n | 12:37 |
sean-k-mooney | we think that using spawn_n can lead to leaking greenthreads over time when exceptions are raised or in some other cases | 12:38 |
sean-k-mooney | tobias-urdin: so the inode leak could be related to using spawn_n | 12:38 |
sean-k-mooney | https://github.com/eventlet/eventlet/issues/731#issuecomment-953761883 | 12:39 |
hberaud | Not related to threading, but also triggered by a monkey patched env: DNS and sockets are also impacted by eventlet issues (https://github.com/celery/py-amqp/commit/98f6d364188215c2973693a79e461c7e9b54daef) (recently fixed) | 12:40 |
sean-k-mooney | tobias-urdin: im considering doing https://github.com/eventlet/eventlet/issues/731#issuecomment-968135262 eventually. | 12:40 |
sean-k-mooney | hberaud: actually that might not be needed anymore | 12:41 |
sean-k-mooney | hberaud: https://github.com/openstack/nova/commit/fe1ebe69f358cbed62434da3f1537a94390324bb | 12:42 |
sean-k-mooney | hberaud: i turned greendns back on recently | 12:42 |
hberaud | oh cool | 12:42 |
hberaud | good to know | 12:42 |
sean-k-mooney | so on my ever growing todo list i want to see if globally doing "eventlet.spawn_n = eventlet.spawn" | 12:43 |
sean-k-mooney | and the rest of that comment in nova 1) will it break anything and 2) will it help with the leaking of greenlets | 12:43 |
sean-k-mooney | if it does i would like to then convert all the uses of spawn_n to spawn in nova and eventually in oslo | 12:44 |
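The global alias being proposed can be sketched with a toy stand-in module rather than eventlet itself (all names here are hypothetical; real `eventlet.spawn` returns a GreenThread whose `wait()` surfaces the result or exception, which this toy flattens into a tuple):

```python
import types

# Hypothetical stand-in for the eventlet module: spawn() returns a
# handle recording the outcome, spawn_n() discards everything.
def _spawn(fn, *args):
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", exc)

fake_eventlet = types.SimpleNamespace(
    spawn=_spawn,
    spawn_n=lambda fn, *args: None,  # fire-and-forget: no handle at all
)

# The proposed global patch: every spawn_n call site now gets a real
# handle back, without changing any of the call sites themselves.
fake_eventlet.spawn_n = fake_eventlet.spawn

handle = fake_eventlet.spawn_n(lambda x: x * 2, 21)
print(handle)  # prints: ('ok', 42)
```

Because `spawn` accepts the same `(fn, *args)` signature as `spawn_n`, the alias is API-compatible; callers that ignored the (absent) return value simply discard the new handle, which is the "silently discard the reference" point made below.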
hberaud | great | 12:44 |
hberaud | maybe I can help you on the oslo side? | 12:44 |
sean-k-mooney | https://github.com/eventlet/eventlet/issues/731#issuecomment-968135262 is a bit of a hack but if that works we can put it behind a workaround option and run it in the ci for a while | 12:45 |
sean-k-mooney | sure but i want to confirm it does not break the world first | 12:45 |
hberaud | sure | 12:45 |
sean-k-mooney | when we do this we are going to silently discard the reference to the greenthread initially in the places that are using spawn_n | 12:46 |
sean-k-mooney | but that should not matter | 12:46 |
sean-k-mooney | the api is otherwise the same | 12:46 |
hberaud | I see | 12:46 |
sean-k-mooney | but we think the semantics of using a greenthread vs a freestanding greenlet will help with the resource leaks, based on the comments from the upstream eventlet maintainer | 12:47 |
sean-k-mooney | i'm not sure that we fully comprehended the implications of """The same as spawn(), but it’s not possible to know how the function terminated (i.e. no return value or exceptions).""" when we first started using spawn_n | 12:49 |
sean-k-mooney | if an exception percolates all the way to the top of the call stack for the function we invoke with spawn_n, the greenlet just stays in the background after logging the traceback and i believe nothing handles it today | 12:51 |
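The quoted "no return value or exceptions" semantics can be illustrated with plain OS threads as an analogy (not eventlet itself): a fire-and-forget thread drops the exception after logging it, while a future-style handle lets the caller observe it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def worker():
    raise RuntimeError("boom")

# Mimic the log-and-drop behaviour: silence the default thread
# excepthook so the traceback is simply discarded, as with spawn_n.
threading.excepthook = lambda args: None

# spawn_n-style: fire and forget. The caller can join the thread, but
# there is no way to retrieve the exception afterwards.
t = threading.Thread(target=worker)
t.start()
t.join()

# spawn-style: the handle (here a Future) carries the exception back,
# so the caller can notice and react instead of leaking silently.
with ThreadPoolExecutor(max_workers=1) as pool:
    err = pool.submit(worker).exception()

print(type(err).__name__)  # prints: RuntimeError
```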
sean-k-mooney | mnaser triggered a GMR on a nova-compute that had 100s of greenlets in that state. | 12:53 |
sean-k-mooney | tobias-urdin: nova has not recently started using spawn_n, but the pthread change is relatively recent so maybe the two don't play nicely together | 12:56 |
sean-k-mooney | tobias-urdin: if you do find something on the nova side be sure to file a bug and let us know | 12:57 |
tobias-urdin | i've rolled out the heartbeat_in_pthread=false change, will monitor that it doesn't leak for 1-2 days, but that's probably the issue based on the bug https://bugs.launchpad.net/oslo.messaging/+bug/1949964 | 13:00 |
tobias-urdin | but will do | 13:00 |
jrosser | we get log noise from the oslo.cache etcd3gw backend turning into bug reports in openstack-ansible `Could not load 'oslo_cache.etcd3gw': No module named 'etcd3gw'` - are all the backends loaded regardless, even if we only use memcached? | 13:27 |
opendevreview | Takashi Kajinami proposed openstack/taskflow master: Remove six https://review.opendev.org/c/openstack/taskflow/+/842114 | 13:57 |
opendevreview | Takashi Kajinami proposed openstack/taskflow master: Remove six https://review.opendev.org/c/openstack/taskflow/+/842114 | 14:44 |
sean-k-mooney | jrosser: i think the oslo_cache modules are loaded to register config options | 14:59 |
sean-k-mooney | and i would guess the etcd one is unconditionally doing an import of etcd3gw | 14:59 |
sean-k-mooney | hum, the import is in the init actually | 15:00 |
sean-k-mooney | https://github.com/openstack/oslo.cache/blob/master/oslo_cache/backends/etcd3gw.py#L43= | 15:00 |
sean-k-mooney | ah it was fixed a year ago https://github.com/openstack/oslo.cache/commit/40946a9349407f36a43d5020d991085c11468698 | 15:01 |
sean-k-mooney | jrosser: so yeah now it won't fail since it will only do the import if you try to use it | 15:01 |
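The linked fix follows the standard lazy-import pattern: move the optional dependency's import from module load time to first use, so merely registering the backend never fails. A minimal sketch with hypothetical names (not the actual oslo.cache code):

```python
class EtcdBackendSketch:
    """Toy backend that only needs its client library when used."""

    def __init__(self, url):
        self.url = url
        self._client = None  # no import has happened yet

    def _get_client(self):
        if self._client is None:
            # Deferred import: a missing optional dependency only
            # fails here, on first use, not at module import time.
            import some_missing_etcd_lib  # hypothetical package name
            self._client = some_missing_etcd_lib.Client(self.url)
        return self._client

# Instantiating is safe even though the library is absent ...
backend = EtcdBackendSketch("http://localhost:2379")

# ... and only actually touching the backend raises.
try:
    backend._get_client()
except ImportError as exc:
    print("fails only on use:", type(exc).__name__)
```

This is why memcached-only deployments stop seeing the `Could not load 'oslo_cache.etcd3gw'` noise once the fix lands: the etcd3gw import never runs unless that backend is selected.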
sean-k-mooney | https://bugs.launchpad.net/oslo.cache/+bug/1928318 | 15:01 |
sean-k-mooney | its backported back to xena | 15:02 |
jrosser | sean-k-mooney: ah excellent - we get a steady drip of LP bugs to openstack-ansible about that log message | 15:04 |
sean-k-mooney | i was pretty sure that got fixed since i remember seeing it in devstack often enough but i haven't in a while | 15:09 |
*** andrewbonney_ is now known as andrewbonney | 16:29 | |
*** ricolin_ is now known as ricolin | 16:29 | |
*** dansmith_ is now known as dansmith | 16:55 | |
*** melwitt_ is now known as melwitt | 18:08 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!