bauzas | good morning | 07:27 |
---|---|---|
gibi | o/ | 08:05 |
ykarel | sean-k-mooney[m], gibi can you please check https://bugs.launchpad.net/neutron/+bug/2015065 comment 7/8 | 08:06 |
ykarel | randomly one of the nova-api workers just gets stuck when doing requests to neutron (not sure if the same is seen with any other service yet) | 08:07 |
gibi | ykarel: quickly looked at the bug. Thanks for collecting all that data. When nova-api is stuck calling neutronclient's show_security_group, do you see that the actual API request to neutron was sent but never received by neutron-server? Or is nova-api stuck on sending the message? | 08:42 |
ykarel | gibi, i don't see the request received on neutron side, not sure where to check if it's stuck on sending | 08:49 |
gibi | ykarel: ack | 08:51 |
gibi | ykarel: | 09:03 |
gibi | i feel like we are seeing an interesting interaction between multiple things | 09:03 |
gibi | I'm trying to follow the stack trace from the latest comment from the bug to see where the neutronclient got stuck | 09:04 |
gibi | the first interesting point is | 09:04 |
gibi | /usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py:28 in is_connection_dropped | 09:04 |
gibi | https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/connectionpool.py#L272 | 09:04 |
gibi | so urllib3 tries to check if the existing client connection is still usable or got disconnected | 09:05 |
gibi | https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/connection.py#L28 | 09:05 |
gibi | wait_for_read(sock, timeout=0.0) | 09:05 |
gibi | so it checks if it can read from the socket with 0.0 timeout | 09:06 |
gibi | https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/wait.py#L84-L85 | 09:06 |
gibi | that 0.0 timeout is passed to python's select.select | 09:06 |
gibi | https://docs.python.org/3.10/library/select.html#select.select | 09:07 |
gibi | "The optional timeout argument specifies a time-out as a floating point number in seconds. When the timeout argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks." | 09:07 |
gibi | so that select.select called with 0.0 should never block | 09:07 |
gibi | BUT | 09:07 |
gibi | in our env the eventlet monkey patching is changing python's select.select | 09:08 |
gibi | /usr/local/lib/python3.10/dist-packages/eventlet/green/select.py:80 in select | 09:08 |
gibi | and redirects it to implement the eventlet switching mechanism | 09:08 |
gibi | https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L30-L80C32 | 09:11 |
gibi | looking at that code it seems eventlet sets a timer with the timeout value | 09:12 |
gibi | via hub.schedule_call_global | 09:12 |
gibi | here I'm getting lost in the eventlet code but I assume scheduling a timer with 0.0 timeout in eventlet can be racy | 09:17 |
gibi | based on the comment in https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L62-L69 | 09:17 |
gibi | one could argue that what we see is an eventlet bug as select.select with timeout=0.0 should not ever block but it does block in our case. | 09:21 |
opendevreview | suzhengwei proposed openstack/nova master: rename 'recreate' to 'evacuate' https://review.opendev.org/c/openstack/nova/+/883810 | 09:28 |
ykarel | Thanks gibi for checking, any way the issue can be fixed/worked around on the nova side? | 10:24 |
gibi | I'm trying to open an issue on eventlet and see if the maintainer agrees with my analysis or not. I don't see now any easy workaround. Maybe sean-k-mooney or melwitt can see some | 10:25 |
gibi | ykarel: I will update the launchpad bug | 10:25 |
ykarel | Thanks gibi | 10:26 |
sean-k-mooney | gibi: sorry i missed the start of this what is the issue | 10:28 |
gibi | nova-api uses the neutron client to call the neutron API but gets stuck forever | 10:29 |
gibi | based on the stack trace it is stuck checking if the previous connection is still usable | 10:30 |
gibi | we end up in eventlet monkeypatched select.select on a socket | 10:30 |
gibi | with a timeout 0.0 | 10:30 |
gibi | based on the stdlib doc timeout 0.0 means non blocking but we still block | 10:31 |
gibi | so I assume eventlet does not properly handle timeout 0.0 in its select impl | 10:31 |
sean-k-mooney | i see | 10:31 |
sean-k-mooney | that or it's python version specific | 10:31 |
sean-k-mooney | but ya sounds like an api compatibility bug | 10:31 |
gibi | details are here https://bugs.launchpad.net/neutron/+bug/2015065 | 10:31 |
sean-k-mooney | Changed in version 3.7: The method no longer toggles SOCK_NONBLOCK flag on socket.type. | 10:36 |
sean-k-mooney | https://docs.python.org/3/library/socket.html#socket.socket.settimeout | 10:36 |
sean-k-mooney | ykarel: gibi: it looks like we should not be using 0.0 to make it non-blocking after 3.7 | 10:37 |
sean-k-mooney | we should be using socket.setblocking(False) | 10:38 |
gibi | sean-k-mooney: https://docs.python.org/3.10/library/select.html#select.select for select.select timeout=0.0 still means | 10:38 |
gibi | non blocking | 10:38 |
gibi | so it is not the socket that is set to non blocking mode, it is select called with non blocking timeout | 10:38 |
sean-k-mooney | i feel like this is still a logic bug on our part | 10:40 |
sean-k-mooney | we likely should be setting timeout to a non zero value and setting the socket to non-blocking | 10:40 |
gibi | this is urllib3, not our code | 10:40 |
sean-k-mooney | fun | 10:42 |
sean-k-mooney | gibi: as far as i can tell eventlet support for 3.10 is still not fully complete | 10:44 |
opendevreview | Sahid Orentino Ferdjaoui proposed openstack/nova master: [wip]network: convert usage of neutronclient to openstacksdk https://review.opendev.org/c/openstack/nova/+/882714 | 10:53 |
sean-k-mooney | gibi: so it does look like it's always a blocking operation https://github.com/eventlet/eventlet/blob/master/eventlet/green/select.py#L30-L36 | 10:53 |
gibi | filed https://github.com/eventlet/eventlet/issues/798 | 10:54 |
gibi | sean-k-mooney: as far as I understand the eventlet code they do block but set up a timer to wake up after the given timeout | 10:55 |
sean-k-mooney | if timeout is not None: | 10:55 |
gibi | so I think that can race and the code can miss the wakeup as it has not reached hub.switch() yet | 10:55 |
sean-k-mooney | timers.append(hub.schedule_call_global(timeout, on_timeout)) | 10:55 |
sean-k-mooney | so yes | 10:55 |
sean-k-mooney | but they probably need to instead spawn this in a separate greenthread | 10:56 |
sean-k-mooney | although based on the assert | 10:56 |
sean-k-mooney | https://github.com/eventlet/eventlet/blob/master/eventlet/green/select.py#L40 | 10:56 |
sean-k-mooney | they are expecting you to be spawning this in a thread pool or similar i think | 10:57 |
sean-k-mooney | not that we have control over this really | 10:57 |
gibi | yeah | 10:58 |
sean-k-mooney | i guess we will see what they say | 10:58 |
sean-k-mooney | for what its worth this is happening in tempest right | 10:58 |
gibi | also I don't think I fully understand the logic of the double timer in https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L59-L72 | 10:58 |
gibi | but it feels scary | 10:58 |
sean-k-mooney | i'm surprised we are using eventlet in tempest | 10:58 |
gibi | we need eventlet in nova-api for the scatter-gather, do we? | 10:59 |
gibi | don't we? | 10:59 |
sean-k-mooney | oh sorry i thought this was the tempest client that was blocking | 11:00 |
sean-k-mooney | not in nova | 11:00 |
gibi | it is nova-api using neutron client to call neutron API | 11:00 |
gibi | then nova-api blocks | 11:01 |
sean-k-mooney | i see | 11:01 |
gibi | and therefore the tempest call to nova-api times out too | 11:01 |
sean-k-mooney | yep yep yep | 11:01 |
sean-k-mooney | following now | 11:01 |
gibi | OK | 11:01 |
sean-k-mooney | ya thats not good | 11:01 |
gibi | I will stop here now and move over to golang for another set of challenges :) | 11:01 |
gibi | let's see if the eventlet maintainers have some ideas | 11:01 |
gibi | another way would be to ask urllib3 maintainers to change their side | 11:02 |
sean-k-mooney | well eventlet is monkeypatching urllib3 | 11:02 |
sean-k-mooney | https://github.com/eventlet/eventlet/blob/master/eventlet/green/urllib/request.py | 11:03 |
gibi | (or we can rip out eventlet from nova and move to pure threads in the next couple of years :)) | 11:03 |
sean-k-mooney | well nova-api didn't have any direct dependency on eventlet until we added scatter gather | 11:03 |
sean-k-mooney | so if it was just that we could rip that out | 11:03 |
sean-k-mooney | but this could obviously happen for all the other client invocations | 11:04 |
sean-k-mooney | in the other services so ya... | 11:04 |
gibi | hm, that seems like only monkey patching urllib but not urllib3 | 11:05 |
sean-k-mooney | maybe i didn't look too closely i assumed it was the same | 11:05 |
gibi | it seems for urllib3 the socket and select monkey patching is considered enough | 11:05 |
opendevreview | Merged openstack/os-vif stable/wallaby: Use TCP keepalives for ovsdb connections https://review.opendev.org/c/openstack/os-vif/+/841773 | 11:07 |
opendevreview | Merged openstack/os-vif stable/wallaby: only register tables used by os-vif https://review.opendev.org/c/openstack/os-vif/+/841774 | 11:07 |
sean-k-mooney | oh urllib is the standard library | 11:08 |
sean-k-mooney | https://docs.python.org/3/library/urllib.html?highlight=urllib#module-urllib | 11:09 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Reproducer for dangling volumes https://review.opendev.org/c/openstack/nova/+/881457 | 11:32 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/882284 | 11:32 |
*** | dmellado90 is now known as dmellado | 12:48 |
opendevreview | Dan Smith proposed openstack/nova master: Populate ComputeNode.service_id https://review.opendev.org/c/openstack/nova/+/879904 | 13:37 |
opendevreview | Dan Smith proposed openstack/nova master: Add compute_id columns to instances, migrations https://review.opendev.org/c/openstack/nova/+/879499 | 13:37 |
opendevreview | Dan Smith proposed openstack/nova master: Add dest_compute_id to Migration object https://review.opendev.org/c/openstack/nova/+/879682 | 13:37 |
opendevreview | Dan Smith proposed openstack/nova master: Add compute_id to Instance object https://review.opendev.org/c/openstack/nova/+/879500 | 13:37 |
opendevreview | Dan Smith proposed openstack/nova master: Online migrate missing Instance.compute_id fields https://review.opendev.org/c/openstack/nova/+/879905 | 13:37 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/zed: Fix get_segments_id with subnets without segment_id https://review.opendev.org/c/openstack/nova/+/883723 | 15:39 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/yoga: Fix get_segments_id with subnets without segment_id https://review.opendev.org/c/openstack/nova/+/883724 | 15:40 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/xena: Fix get_segments_id with subnets without segment_id https://review.opendev.org/c/openstack/nova/+/883725 | 15:41 |
opendevreview | Sylvain Bauza proposed openstack/nova stable/wallaby: Fix get_segments_id with subnets without segment_id https://review.opendev.org/c/openstack/nova/+/883726 | 15:42 |
sean-k-mooney | o/ | 15:50 |
sean-k-mooney | melwitt: bauzas dansmith could ye take a look at https://review.opendev.org/c/openstack/nova/+/853269/1 and the follow ups | 15:53 |
melwitt | sure | 15:54 |
bauzas | done | 15:54 |
sean-k-mooney | thanks | 15:55 |
melwitt | sweet | 15:55 |
dansmith | so if powerkvm is gone, why do we still have its CI commenting on all our patches? Looks to me like it can't even devstack anymore | 16:20 |
dansmith | maybe it's just a bot in a cloud somewhere that someone forgot to turn off? | 16:20 |
opendevreview | Merged openstack/nova stable/xena: Reproducer for bug 1983753 https://review.opendev.org/c/openstack/nova/+/853269 | 16:23 |
clarkb | dansmith: there should be a contact email in the account as well as on the wiki for the third party ci. Asking them to stop would be the first step and if they don't then we (a gerrit admin) can disable the account | 16:30 |
dansmith | okay, the "lightbits" CI also seems superfluous, but I think we've asked before and nobody knows what it is? | 16:33 |
sean-k-mooney | i do | 16:33 |
sean-k-mooney | it's a ci that tests their os-brick integration | 16:33 |
sean-k-mooney | its testing changes to https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume/lightos.py | 16:34 |
sean-k-mooney | https://github.com/openstack/nova/commit/b5e2128f3847d444a808a2b0f89e6f1e4ffb77fc | 16:34 |
sean-k-mooney | we asked them to set it up and maintain it when https://www.lightbitslabs.com/ wanted to integrate with nova | 16:35 |
sean-k-mooney | which they did in yoga so it's relatively new | 16:35 |
sean-k-mooney | basically they upstreamed the integration they had internally for several years | 16:36 |
dansmith | hmm, okay | 16:37 |
dansmith | the powerkvm contact either never made it to oftc or isn't around | 16:39 |
dansmith | never registered on oftc at least | 16:40 |
dansmith | I can email to ask | 16:40 |
opendevreview | Merged openstack/nova stable/xena: Update RequestSpec.pci_request for resize https://review.opendev.org/c/openstack/nova/+/853270 | 17:17 |
opendevreview | Merged openstack/nova stable/xena: Add reno for fixing bug 1941005 https://review.opendev.org/c/openstack/nova/+/853271 | 17:17 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (db) https://review.opendev.org/c/openstack/nova/+/831193 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (objects) https://review.opendev.org/c/openstack/nova/+/839401 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (manila abstraction) https://review.opendev.org/c/openstack/nova/+/831194 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (drivers and compute manager part) https://review.opendev.org/c/openstack/nova/+/833090 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Attach Manila shares via virtiofs (api) https://review.opendev.org/c/openstack/nova/+/836830 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Check shares support https://review.opendev.org/c/openstack/nova/+/850499 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add metadata for shares https://review.opendev.org/c/openstack/nova/+/850500 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach notification https://review.opendev.org/c/openstack/nova/+/850501 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach notification https://review.opendev.org/c/openstack/nova/+/851028 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add shares to InstancePayload https://review.opendev.org/c/openstack/nova/+/851029 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to attach/detach shares https://review.opendev.org/c/openstack/nova/+/852085 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add libvirt test to ensure metadata are working. https://review.opendev.org/c/openstack/nova/+/852086 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add virt/libvirt error test cases https://review.opendev.org/c/openstack/nova/+/852087 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to reboot method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/854823 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Support rebooting an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/854824 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_attach_error notification https://review.opendev.org/c/openstack/nova/+/860282 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add instance.share_detach_error notification https://review.opendev.org/c/openstack/nova/+/860283 | 17:24 |
opendevreview | ribaudr proposed openstack/nova master: Add share_info parameter to resume method for each driver (driver part) https://review.opendev.org/c/openstack/nova/+/860284 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Support resuming an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860285 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Add helper methods to rescue/unrescue shares https://review.opendev.org/c/openstack/nova/+/860286 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (driver part) https://review.opendev.org/c/openstack/nova/+/860287 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Support rescuing an instance with shares (compute and API part) https://review.opendev.org/c/openstack/nova/+/860288 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Mounting the shares as part of the initialization process https://review.opendev.org/c/openstack/nova/+/880075 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Deletion of associated share mappings on instance deletion https://review.opendev.org/c/openstack/nova/+/881472 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Docs about Manila shares API usage https://review.opendev.org/c/openstack/nova/+/871642 | 17:25 |
opendevreview | ribaudr proposed openstack/nova master: Allow to mount manila share using Cephfs protocol https://review.opendev.org/c/openstack/nova/+/883862 | 17:25 |
opendevreview | Merged openstack/nova master: Fixes a typo in availability-zone doc https://review.opendev.org/c/openstack/nova/+/883474 | 19:49 |
opendevreview | Ghanshyam proposed openstack/nova master: DNM testing cirros 0.6.1 https://review.opendev.org/c/openstack/nova/+/883875 | 20:15 |
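
Editor's sketch, not part of the channel log: the 09:05-09:07 messages rest on the stdlib rule that select.select with timeout=0.0 is a poll that never blocks. The snippet below shows that behaviour with the unpatched stdlib only; it does not import eventlet and does not reproduce the hang. The helper name readable_now is hypothetical, and it is only roughly the kind of check urllib3's is_connection_dropped performs on an idle pooled connection, not a copy of that code.

```python
import select
import socket


def readable_now(sock):
    """Return True if sock is readable right now, without ever waiting.

    Hypothetical helper for illustration. timeout=0.0 asks the stdlib
    select for a poll: it reports the current state and returns at once.
    """
    ready_to_read, _, _ = select.select([sock], [], [], 0.0)
    return bool(ready_to_read)


if __name__ == "__main__":
    # A connected local socket pair stands in for a pooled client connection.
    a, b = socket.socketpair()
    try:
        print(readable_now(a))  # nothing has been sent yet
        b.sendall(b"ping")
        print(readable_now(a))  # data is now queued on a
    finally:
        a.close()
        b.close()
```

With eventlet monkey patching active, the same select.select call is served by eventlet/green/select.py instead, which is where the stack trace in the bug shows the worker waiting.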