*** gibi_off is now known as gibi | 07:02 | |
opendevreview | Christian Rohmann proposed openstack/nova master: db: Drop redundant indeces on instances and console_auth_tokens tables https://review.opendev.org/c/openstack/nova/+/856757 | 07:11 |
Uggla | Good morning Nova. | 07:46 |
bauzas | good morning | 07:51 |
gibi | o/ | 07:52 |
gibi | Uggla: do you still have a question on my comment on the manila series ? | 07:52 |
Uggla | hi gibi , no that's ok for the moment. :) | 07:53 |
gibi | Uggla: cool. Sorry for not responding last week I was deep in some k8s discussions | 07:53 |
Uggla | gibi, no worries that's fine. | 07:54 |
sahid | o/ I have a specific use-case regarding nova host-evacuate: we would like the host evacuation to run, but with all evacuated instances forced into the shutdown state | 08:43 |
sahid | is there a way to make this happen? | 08:43 |
sean-k-mooney | no | 08:48 |
sean-k-mooney | evacuate at the api level results in the vm being evacuated in the same state it is currently in in the db | 08:48 |
sean-k-mooney | so you would need an api change for that. | 08:48 |
sean-k-mooney | novaclient's shell is deprecated so we are not adding or altering any commands | 08:49 |
sean-k-mooney | and nova host-evacuate is intentionally not supported in osc | 08:49 |
sean-k-mooney | so at this point we should not alter/extend its behavior | 08:49 |
gibi | I'm wondering what happens if you try to first stop the VM then evacuate it | 08:50 |
sahid | gibi if the host is down, nothing is happening | 08:53 |
sahid | the other idea was to extend resetState | 08:53 |
gibi | I feel like this might be a new microversion to the evacuate action, adding a flag to instruct nova to evacuate but not start the VM on the dest | 08:54 |
sahid | it's what I was thinking as well, but for host-evacuate it seems that you don't want us to make any changes | 08:56 |
gibi | host-evacuate is a client side concept. You can replace that with a shell script calling the openstack client | 08:56 |
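gibi's point above — host-evacuate is client-side orchestration that a shell loop over the openstack client can replace — could be sketched like this. The host name and server IDs are made up, and the loop only echoes the commands so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Hedged sketch of a client-side host-evacuate: iterate over the servers
# on a failed host and evacuate each one via the openstack client.
# HOST and SERVERS are illustrative; a real script would populate SERVERS
# with: openstack server list --all-projects --host "$HOST" -f value -c ID
HOST="compute01"
SERVERS="vm-aaa vm-bbb"

for vm in $SERVERS; do
  # echo instead of executing so the sketch is a dry run; drop the echo
  # (and check each command's exit status) to actually evacuate
  echo openstack server evacuate "$vm"
done
```

A real script also has to decide what to do when one evacuation fails mid-loop, which is exactly the ill-defined error handling discussed below.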
sahid | side question, why host-evacuate is not supported in openstack client? | 08:57 |
gibi | what you cannot do today via the nova REST API is evacuate an active VM so that it ends up stopped, hence my microversion thinking | 08:57 |
sean-k-mooney | sahid: because of what gibi said | 08:57 |
sean-k-mooney | yeah, it's a client side implementation and we did not want to support it any more | 08:58 |
sean-k-mooney | the error handling is terrible | 08:58 |
gibi | sahid: because it is considered orchestration | 08:58 |
sean-k-mooney | well that too | 08:58 |
sahid | yes that makes sense, i understand now | 08:58 |
sean-k-mooney | but more because if one of the evacuations fails, it's kind of undefined what the end result of the command is | 08:58 |
sahid | so back to the original use-case, would it make sense to have evacuate with a flag to force the state? | 08:59 |
sean-k-mooney | it won't be one of (all evacuated or all still on the original host), it will be a mix | 08:59 |
sean-k-mooney | sahid: i would say target state | 08:59 |
sean-k-mooney | rather than force | 08:59 |
sean-k-mooney | that has been requested before, at the last in-person ptg i think | 08:59 |
sean-k-mooney | i would not be opposed to an evacuate-to-stopped option | 09:00 |
sean-k-mooney | im not sure that shelved makes sense | 09:00 |
sean-k-mooney | but started/stopped i can see | 09:00 |
gibi | I think a target_state enum (AsBefore, Stopped) | 09:00 |
gibi | makes sense | 09:00 |
gibi | AsBefore = NoChange | 09:00 |
gibi | can we evacuate a shelved instance? | 09:01 |
sean-k-mooney | on reset-state, while i would like to expand what it can do so that you can specify something other than available/error, i'm not sure this is the right way to do it | 09:01 |
sean-k-mooney | gibi: no | 09:01 |
sean-k-mooney | gibi: because its not on a host | 09:01 |
gibi | OK, cool then :D | 09:01 |
gibi | I started worrying :) | 09:01 |
sean-k-mooney | i meant it would not make sense to evacuate to shelve_offloaded | 09:02 |
sean-k-mooney | we could allow shelve_offloading while the host is down instead | 09:02 |
sean-k-mooney | but that's not really evacuate | 09:02 |
sean-k-mooney | evacuate is meant to move the vm from one host to another | 09:02 |
sean-k-mooney | whereas shelve/unshelve is moving from being on a host to not being on one, and vice versa | 09:03 |
gibi | yeah | 09:03 |
sean-k-mooney | it's kind of a pedantic distinction but i don't quite consider them equal | 09:03 |
sean-k-mooney | you could argue it either way | 09:03 |
sean-k-mooney | so i would not be against allowing stop to work in a host-down state, by the way | 09:04 |
sean-k-mooney | you would update the db and treat it kind of like a local delete | 09:04 |
sean-k-mooney | when the compute agent comes back up it would reconcile the vm state | 09:04 |
sean-k-mooney | if you stopped it and then evacuated, that would solve sahid's case | 09:05 |
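A minimal sketch (invented names, not nova code) of the stop-while-host-down idea: the API-side stop only flips the DB record, like a local delete, and the compute agent reconciles the real hypervisor state when it comes back up:

```python
# Hypothetical illustration of "stop while the host is down"; every name
# here (InstanceRecord, stop_while_host_down, reconcile_on_startup) is
# invented for the sketch and does not exist in nova.
from dataclasses import dataclass

@dataclass
class InstanceRecord:
    uuid: str
    vm_state: str             # state recorded in the DB, e.g. "active"/"stopped"
    hypervisor_running: bool  # what the hypervisor actually reports

def stop_while_host_down(db_record: InstanceRecord) -> None:
    # Host unreachable: a "local" stop just updates the DB record.
    db_record.vm_state = "stopped"

def reconcile_on_startup(db_record: InstanceRecord) -> str:
    # On restart the agent compares DB intent with reality and powers
    # the domain off if the DB says the VM should be stopped.
    if db_record.vm_state == "stopped" and db_record.hypervisor_running:
        db_record.hypervisor_running = False
        return "powered off"
    return "no-op"

rec = InstanceRecord("vm-1", "active", hypervisor_running=True)
stop_while_host_down(rec)         # admin stops the VM while the host is down
print(reconcile_on_startup(rec))  # → powered off
```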
opendevreview | Amit Uniyal proposed openstack/nova master: Adds a repoducer for post live migration fail https://review.opendev.org/c/openstack/nova/+/854499 | 09:05 |
opendevreview | Amit Uniyal proposed openstack/nova master: [compute] always set instnace.host in post_livemigration https://review.opendev.org/c/openstack/nova/+/791135 | 09:05 |
sahid | sean-k-mooney: yes it's also a possibility | 09:05 |
sean-k-mooney | the one thing to keep in mind i guess is that even if we allow stop | 09:06 |
sean-k-mooney | it does not change the responsibility of the admin | 09:07 |
sean-k-mooney | that is, you are required as an admin to ensure a host is fenced or all vms are stopped before you evacuate | 09:07 |
sean-k-mooney | if we allow stop in a down-host state the admin still needs to ensure the vm is stopped to prevent data corruption | 09:08 |
sean-k-mooney | but if they can, then that would allow them to evacuate without starting the vm again | 09:08 |
gibi | this is why I would connect the stopping to the evacuation action, that way it is clear that on the source host it is not stopped | 09:08 |
sean-k-mooney | ack, yeah that's cleaner | 09:08 |
sean-k-mooney | and the existing "is it safe to evacuate" check would also apply | 09:09 |
gibi | yes | 09:09 |
sean-k-mooney | e.g. the heartbeat has been missed or you set force_down | 09:09 |
sean-k-mooney | i think johnthetubaguy expressed interest in this in the past | 09:09 |
sean-k-mooney | or at least support for the people that were asking for it in the past | 09:10 |
sean-k-mooney | oh that reminds me i guess we are not merging my default change? | 09:11 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/830829 | 09:11 |
sean-k-mooney | i would either like to merge that before RC1 or after we create the stable branch | 09:12 |
sahid | thank you guys - do we agree to extend evacuate with a target state (AsBefore, Stopped)? Can I report a bug with a detailed description or should I write a spec? | 09:17 |
sean-k-mooney | sahid: all api changes require a spec regardless of how trivial | 09:21 |
sean-k-mooney | this would need a spec and a new microversion | 09:21 |
sahid | ack, I'll make this happen for A. | 09:22 |
sean-k-mooney | it can be a pretty short spec, but there will at least be a conductor rpc change and likely a compute one too, to pass the target state | 09:22 |
sahid | sure no worries | 09:22 |
sean-k-mooney | sahid: the repo is open for spec reviews so whenever you have time feel free to submit one | 09:23 |
sahid | +1 | 09:24 |
bauzas | sahid, gibi, sean-k-mooney: tbh, I think we discussed the host-evacuate support before, and we said this was close to orchestration since a client could be doing it | 09:29 |
bauzas | so the consensus was to say that some client or script could do it | 09:29 |
bauzas | (or Heat, or whatever else) | 09:30 |
sean-k-mooney | bauzas: yep | 09:31 |
sean-k-mooney | bauzas: but what we were discussing regarding a spec was allowing the api to accept a target state to evacuate to | 09:31 |
sean-k-mooney | one of: active, poweredoff, or current | 09:32 |
sean-k-mooney | where current or not specified means what we do today | 09:32 |
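What the discussed (not yet specced, not merged) microversion might accept could be sketched like this; the field name and enum values simply mirror the conversation and are not real nova API:

```python
# Hypothetical request-body validation for an evacuate target_state field.
# The enum values ("active", "poweredoff", "current") follow the discussion
# above; "current" or an omitted field keeps today's behaviour of evacuating
# to whatever state the DB records for the instance.
ALLOWED_TARGET_STATES = {"active", "poweredoff", "current"}

def validate_evacuate_body(body: dict) -> str:
    target = body.get("evacuate", {}).get("target_state", "current")
    if target not in ALLOWED_TARGET_STATES:
        raise ValueError(f"unsupported target_state: {target}")
    return target

print(validate_evacuate_body({"evacuate": {"target_state": "poweredoff"}}))  # → poweredoff
print(validate_evacuate_body({"evacuate": {}}))                              # → current
```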
zigo | gibi: oslo.concurrency 5.0.1 hangs when I try to run its unit tests, which is probably related to your patch (at: https://review.opendev.org/c/openstack/oslo.concurrency/+/855714 ). Any idea what's going on? Do I need the latest Eventlet (I know we're lagging one minor version behind)? | 10:10 |
zigo | Ah no, I'm even with 0.30.2, maybe that's why... | 10:12 |
* zigo tries upgrading | 10:14 | |
gibi | zigo: do you know where it is hanging? which test case? | 10:14 |
zigo | Hard to tell... | 10:15 |
zigo | Last output was: https://paste.opendev.org/show/bsDsX7d0gw1SDAnLcRMA/ | 10:16 |
zigo | Then it hung ... | 10:16 |
sean-k-mooney | what's that from? | 10:16 |
zigo | sean-k-mooney: building oslo.concurrency. | 10:16 |
sean-k-mooney | oh i disconnected and reconnected for a bit | 10:17 |
sean-k-mooney | missed the start of your conversation with gibi i think | 10:17 |
zigo | Problem is: I have autopkgtest issues in Eventlet ... 0.33.x :/ | 10:18 |
sean-k-mooney | https://github.com/openstack/oslo.concurrency/blob/5397838f4117300a509bff474dfcdd60b5993677/oslo_concurrency/tests/unit/test_processutils.py#L184-L201 | 10:18 |
sean-k-mooney | i see | 10:20 |
sean-k-mooney | so that's failing while building in some cases | 10:20 |
sean-k-mooney | i'm not really sure how/why | 10:20 |
gibi | zigo: could you point to how you run the unit tests? | 10:21 |
sean-k-mooney | the test is just constructing an instance of an exception class | 10:21 |
sean-k-mooney | and asserting that when you call str() on it, it contains the message | 10:21 |
gibi | there are unit tests in the repo that can be run with and without eventlet monkey patching but there are tests that can only run in eventlet | 10:22 |
gibi | hence the https://github.com/openstack/oslo.concurrency/blob/01cf2ffdf48c21f886b2aa3f766be5d268248c18/tox.ini#L14-L15 | 10:22 |
sean-k-mooney | maybe this print is the issue https://github.com/openstack/oslo.concurrency/blob/5397838f4117300a509bff474dfcdd60b5993677/oslo_concurrency/tests/unit/test_processutils.py#L203 | 10:22 |
zigo | I simply do this: | 10:22 |
zigo | PYTHON=python3 stestr run --subunit | subunit2pyunit | 10:22 |
gibi | I can imagine that if you run the eventlet-aware tests without eventlet monkey patching, then the eventlet-only tests might misbehave | 10:23 |
zigo | I'll try further and let you know where it leads me. | 10:23 |
sean-k-mooney | if you do things like eventlet.spawn directly without monkeypatching | 10:24 |
sean-k-mooney | you need to manually invoke the event loop to have it run | 10:24 |
sean-k-mooney | we saw that in the nova-api when we added scatter-gather | 10:24 |
gibi | zigo: based on that command line you run without eventlet monkey patching but you still run tests from https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/tests/unit/test_lockutils_eventlet.py | 10:25 |
gibi | zigo: you can try not running those tests to see if that resolves the hang | 10:26 |
sean-k-mooney | you could fix them by adding eventlet.sleep(seconds=0) | 10:26 |
sean-k-mooney | i think that will make it work if not monkeypatched | 10:26 |
sean-k-mooney | but yeah, not running them would be better | 10:27 |
gibi | there is the place where the test monkey patches selectively https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/tests/__init__.py | 10:27 |
sean-k-mooney | https://github.com/openstack/oslo.concurrency/blob/5397838f4117300a509bff474dfcdd60b5993677/oslo_concurrency/tests/unit/test_lockutils_eventlet.py#L49 | 10:28 |
sean-k-mooney | so if we also do that on line 51, before the pool.waitall() | 10:28 |
sean-k-mooney | i think that might actually work in either case | 10:28 |
sean-k-mooney | but it might make sense for us to use the skip function in the class | 10:28 |
sean-k-mooney | to have it skip if not monkey patched | 10:29 |
sean-k-mooney | gibi: is there any reason not to check if we are monkey patched in the setup function and call skipTest | 10:35 |
gibi | I don't know we might need to consult with oslo cores | 10:40 |
gibi | because there were eventlet-specific tests before I added mine, I thought it was handled centrally so they don't run in a non-patched env | 10:40 |
gibi | this might not be the case | 10:41 |
sean-k-mooney | i don't see that logic anywhere generic | 10:41 |
sean-k-mooney | it's certainly possible to add | 10:41 |
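The skip-if-not-monkey-patched idea discussed above could look roughly like this. The `is_monkey_patched` function here is a stub standing in for `eventlet.patcher.is_monkey_patched` so the sketch runs without eventlet installed; the test class and method are invented for illustration:

```python
# Sketch: skip eventlet-only tests from setUp when the environment is
# not monkey patched, instead of letting them hang.
import unittest

def is_monkey_patched(module: str) -> bool:
    # Stand-in for eventlet.patcher.is_monkey_patched(module);
    # hardcoded False here to demonstrate the skip path.
    return False

class EventletOnlyTest(unittest.TestCase):
    def setUp(self):
        super().setUp()
        if not is_monkey_patched("thread"):
            self.skipTest("requires an eventlet monkey-patched environment")

    def test_fair_lock_under_eventlet(self):
        # An eventlet-specific assertion would go here.
        self.assertTrue(True)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(EventletOnlyTest))
print(len(result.skipped))  # → 1
```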
noonedeadpunk | hello folks! I was wondering - does a revert-resize failure ring a bell for anybody? I.e. create server, resize server, revert resize -> VM is "stuck" in REVERT_RESIZE until the message times out, then it goes back to VERIFY_RESIZE but with the original flavor, and then nova-compute shuts the VM down on the hypervisor entirely. The only way to recover is to reset state | 11:20 |
sean-k-mooney | not that i recall, but i can see that potentially happening if we raise an exception in the revert path and can't proceed | 11:21 |
sean-k-mooney | like if the source host was down or something like that, we would not be able to revert | 11:22 |
noonedeadpunk | paste: https://paste.openstack.org/show/bh3kML89sPFYDN9HkDSd/ | 11:22 |
noonedeadpunk | sean-k-mooney: that said, out of 100 tempest runs of tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert, 57 have failed | 11:23 |
noonedeadpunk | the stack trace I've spotted: https://paste.openstack.org/show/bytEWO0CHe8cSVktE3tv/ | 11:24 |
noonedeadpunk | I assumed it could be related to the heartbeat_in_pthread thing, as the region had the default setting on Xena (which is enabled), but disabling it didn't fix it | 11:25 |
sean-k-mooney | do you have any timeouts from ovsdbapp | 11:27 |
sean-k-mooney | in the nova-compute logs | 11:27 |
noonedeadpunk | sean-k-mooney: um, nope | 11:30 |
noonedeadpunk | it's ovs, not ovn fwiw | 11:30 |
sean-k-mooney | ya it would be the same either way | 11:30 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=2085583 | 11:30 |
sean-k-mooney | i was wondering if it was related to that | 11:30 |
sean-k-mooney | those are still pending backport upstream https://review.opendev.org/c/openstack/os-vif/+/841771 | 11:32 |
sean-k-mooney | https://review.opendev.org/c/openstack/os-vif/+/841772/1 | 11:32 |
sean-k-mooney | if you are using the native os-vif backend then the compute agent can hang if the connection to the ovs db drops | 11:33 |
sean-k-mooney | and that can cause timeouts | 11:33 |
sean-k-mooney | the workaround is to just use the vsctl backend | 11:33 |
sean-k-mooney | you will see something that looks like this | 11:34 |
sean-k-mooney | 2022-09-01 16:33:38.583 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 18 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263 | 11:34 |
noonedeadpunk | in neutron-ovs-agent the only string referencing port is `neutron.agent.common.ovs_lib [req-3aa28a82-1d30-4713-989a-e5093e55f7ab - - - - -] Port 200e682f-b9af-487c-aa8c-605e20b99002 not present in bridge br-int` | 11:34 |
sean-k-mooney | 2022-09-01 16:33:38.584 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: entering ACTIVE _transition /usr/lib64/python3.9/site-packages/ovs/reconnect.py:519 | 11:34 |
sean-k-mooney | 2022-09-01 16:33:40.874 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 18 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263 | 11:34 |
sean-k-mooney | 2022-09-01 16:33:43.584 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 18 __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:263 | 11:34 |
sean-k-mooney | 2022-09-01 16:33:48.587 2 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] 4999-ms timeout __log_wakeup /usr/lib64/python3.9/site-packages/ovs/poller.py:248 | 11:34 |
sean-k-mooney | noonedeadpunk: not in neutron, in the nova-compute agent | 11:34 |
noonedeadpunk | ah, I would need to have DEBUG enabled | 11:35 |
noonedeadpunk | let me enable it and reproduce :) | 11:35 |
sean-k-mooney | noonedeadpunk: ah well, that just means the logical port is defined in the northdb | 11:35 |
sean-k-mooney | but not actually added to the br-int bridge yet | 11:35 |
sean-k-mooney | i think | 11:36 |
sean-k-mooney | so not the northdb in your case, but more or less the same | 11:36 |
sean-k-mooney | neutron knows that the port should exist but nova has not added it yet | 11:36 |
sean-k-mooney | noonedeadpunk: if you don't have debug enabled, the symptom to look for is large gaps in the log, 5+ seconds without output | 11:37 |
sean-k-mooney | that however is not helpful if the system is idle or not actively spawning a vm | 11:38 |
sean-k-mooney | since that is what you would expect it to look like unless you asked nova to do something | 11:38 |
noonedeadpunk | yeah, well, the delay between `Updating port 73862d0c-b31b-4b75-b833-8e29d8066b9a with attributes {'binding:host_id': 'cc-compute04-tky1', 'device_owner': 'compute:nova'}` and the stack trace is exactly 5 sec. And that's exactly the timeout | 11:40 |
sean-k-mooney | that is a little sus yes | 11:42 |
sean-k-mooney | the tl;dr of https://bugs.launchpad.net/os-vif/+bug/1929446 is that in the ovs python bindings they basically make a blocking call to select.poll(), which blocks the main thread | 11:44 |
sean-k-mooney | http://patchwork.ozlabs.org/project/openvswitch/patch/20210611142923.474384-1-twilson@redhat.com/ | 11:45 |
noonedeadpunk | ah | 11:47 |
* noonedeadpunk reproducing with debug enabled | 11:47 | |
sean-k-mooney | by the way, we have seen ValueError: Circular reference detected before | 11:54 |
sean-k-mooney | but i don't think we ever figured out where that comes from or how | 11:54 |
sean-k-mooney | any time we have seen it there has always been some other error too | 11:55 |
sean-k-mooney | and fixing the other error has caused both to go away | 11:55 |
noonedeadpunk | oh yes, I do see `DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 31 __log_wakeup /openstack/venvs/nova-24.0.0/lib/python3.8/site-packages/ovs/poller.py:263` when revert_resize is failed | 11:56 |
noonedeadpunk | thanks sean-k-mooney, I will check whether the patch works :) | 11:57 |
sean-k-mooney | it's ok to see it occasionally, as long as you don't see large timeouts | 11:57 |
noonedeadpunk | I see it only when revert is failing | 11:58 |
sean-k-mooney | otherwise either backport that patch or set os-vif to use vsctl | 11:58 |
sean-k-mooney | ack | 11:58 |
sean-k-mooney | [os_vif_ovs]/ovsdb_interface=vsctl | 11:58 |
sean-k-mooney | setting that in your nova.conf will also workaround it | 11:59 |
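For reference, the workaround quoted above would look like this as a nova.conf fragment (section and option names are the os-vif ones from the discussion; check the os-vif source linked below for your version):

```ini
[os_vif_ovs]
# Use the ovs-vsctl CLI instead of the native python OVSDB connection,
# avoiding the hang when the connection to the OVS DB drops.
ovsdb_interface = vsctl
```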
noonedeadpunk | ah, undocumented option?:) | 12:00 |
noonedeadpunk | let me try it out then | 12:01 |
sean-k-mooney | well, it's not undocumented | 12:01 |
sean-k-mooney | its an os_vif option | 12:01 |
sean-k-mooney | not nova | 12:01 |
noonedeadpunk | well, yeah, but keystone_authtoken and oslo are included. And with that, as an operator I would expect to see the rest as well... | 12:02 |
noonedeadpunk | Anyway) | 12:02 |
zigo | Looks like my issue is related to https://github.com/eventlet/eventlet/issues/730 | 12:02 |
zigo | I need to fix eventlet in Debian with py 3.10 first, and then see what's going on with oslo.concurrency ... | 12:03 |
sean-k-mooney | noonedeadpunk: we could render them i guess; i was expecting them to be in https://docs.openstack.org/os-vif/latest/index.html but i'm not seeing them | 12:03 |
sean-k-mooney | noonedeadpunk: we have to explicitly list namespaces if we want them to show up | 12:04 |
sean-k-mooney | noonedeadpunk: this is where it's defined, by the way: https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L71-L82 | 12:05 |
sean-k-mooney | these are the valid options https://github.com/openstack/os-vif/blob/771dfffcd90dcd7c8c95c41744092f5ad4917be3/vif_plug_ovs/ovsdb/api.py#L18-L21 | 12:06 |
noonedeadpunk | nice, thanks! | 12:07 |
sean-k-mooney | noonedeadpunk: if your using ml2/ovs you should also be enabling https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovs.py#L96-L99 | 12:08 |
sean-k-mooney | ignore per_port_bridge | 12:08 |
sean-k-mooney | i should remove that | 12:08 |
noonedeadpunk | til about `isolate_vif` | 12:09 |
noonedeadpunk | and yes, your note does make sense | 12:10 |
noonedeadpunk | sean-k-mooney: `[os_vif_ovs]/ovsdb_interface = vsctl` seems not to solve the issue. | 12:30 |
noonedeadpunk | no ovsdbapp logs though | 12:32 |
sean-k-mooney | ok, then it's likely not because of the agent locking up | 12:33 |
sean-k-mooney | you mentioned you have disabled the pthread heartbeat, yes | 12:33 |
sean-k-mooney | do you use the iptables firewall or openvswitch? | 12:34 |
noonedeadpunk | yeah, I did, for all neutron-ovs-agents, nova-compute/scheduler/conductor | 12:34 |
noonedeadpunk | huh. | 12:34 |
sean-k-mooney | with iptables we add a linux bridge and veth pair; with the ovs firewall we add the tap directly to ovs | 12:34 |
noonedeadpunk | I just realized I likely missed disabling for neutron-server | 12:35 |
noonedeadpunk | as I assumed it's running in uwsgi, but likely it's not in this deployment | 12:35 |
sean-k-mooney | well neutron-server | 12:35 |
sean-k-mooney | ah, i was going to say it's probably wsgi | 12:35 |
sean-k-mooney | ya worth checking | 12:36 |
sean-k-mooney | that has been reverted to off by default | 12:36 |
sean-k-mooney | not sure if its been released yet on stable branches | 12:36 |
noonedeadpunk | nah, I disabled heartbeat_in_pthread for neutron-server as well | 12:36 |
noonedeadpunk | firewall_driver = iptables_hybrid | 12:37 |
noonedeadpunk | yeah, I guess there's an lxb in place | 12:37 |
sean-k-mooney | yep | 12:49 |
sean-k-mooney | ok, one sec | 12:49 |
sean-k-mooney | https://github.com/openstack/nova/commit/0b0f40d1b308b29da537859b72080488560c23d4 | 12:51 |
noonedeadpunk | Tbh I still kind of blame pthread, as the actual error is `oslo_messaging.rpc.server eventlet.timeout.Timeout: 300 seconds`, which is exactly what would happen because of pthreads iirc | 12:51 |
noonedeadpunk | and it's intermittent as well | 12:51 |
noonedeadpunk | but only revert is affected which is weird given it's pthread | 12:52 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1952003 | 12:52 |
sean-k-mooney | so we went back and forth with this due to a few in-flight changes at once | 12:52 |
sean-k-mooney | but you are probably hitting that | 12:52 |
sean-k-mooney | we should not be waiting for the network-vif-plugged event if you have a specific set of patches. | 12:53 |
sean-k-mooney | noonedeadpunk: do you have https://github.com/openstack/nova/commit/66c7f00e1d9d7c0eebe46eb4b24b2b21f7413789 | 12:54 |
gibi | zigo: ack, let me know if I can help somehow | 12:56 |
sean-k-mooney | when we addressed https://bugs.launchpad.net/nova/+bug/1895220 it introduced https://bugs.launchpad.net/nova/+bug/1952003, which originally fixed https://bugs.launchpad.net/nova/+bug/1832028 and https://bugs.launchpad.net/nova/+bug/1833902 | 12:56 |
sean-k-mooney | noonedeadpunk: https://github.com/openstack/nova/commit/0b0f40d1b308b29da537859b72080488560c23d4 is in yoga | 12:57 |
noonedeadpunk | sean-k-mooney: I do have https://github.com/openstack/nova/commit/66c7f00e1d9d7c0eebe46eb4b24b2b21f7413789 | 12:58 |
noonedeadpunk | I guess I don't have I3cb39a9ec2c260f422b3c48122b9db512cdd799b though, as it's Xena | 12:58 |
sean-k-mooney | noonedeadpunk: what about https://review.opendev.org/c/openstack/nova/+/828414 | 12:58 |
sean-k-mooney | we backported it | 12:58 |
sean-k-mooney | but only 5 months ago | 12:59 |
noonedeadpunk | Nah, we did not do this minor upgrade | 12:59 |
noonedeadpunk | Let me check it out | 12:59 |
sean-k-mooney | im not sure if we have done a release since then | 12:59 |
noonedeadpunk | Gerrit says you did :p | 13:00 |
noonedeadpunk | But we run 24.0.1.dev10, and it's included in 24.1.1 | 13:00 |
sean-k-mooney | no, it says we merged it | 13:00 |
sean-k-mooney | where did you see that in gerrit? | 13:00 |
noonedeadpunk | three dots in upper right corner -> included in | 13:01 |
sean-k-mooney | oh wow, didn't know that was a thing | 13:01 |
sean-k-mooney | https://github.com/openstack/releases/commit/ac4be06827ec7a450233244d8c5cae8834b95ffc | 13:02 |
noonedeadpunk | it was there even in gerrit 2 | 13:02 |
sean-k-mooney | but yes we did that on 21st jun | 13:02 |
sean-k-mooney | never used it; i normally just check the release on github | 13:03 |
sean-k-mooney | noonedeadpunk: so yeah, sorry, i think it's https://bugs.launchpad.net/nova/+bug/1952003 | 13:04 |
sean-k-mooney | on the plus side, if it is, then you just need to do the minor update when you have time | 13:05 |
noonedeadpunk | ah, bug report is indeed super familiar | 13:18 |
zigo | gibi: Building Eventlet, I still get this: | 13:29 |
zigo | https://paste.opendev.org/show/bIinQaPTAy81Uac3ZPHS/ | 13:29 |
zigo | After a lot of head-scratching, I can't get how to fix it (note: I already cherry-picked https://github.com/eventlet/eventlet/pull/754/commits/cd2532168e33d892de625f9fc831bf0951f4e937 the collections.abc.Iterable one, and another about ssl_version=ssl.PROTOCOL_TLSv1_2). | 13:29 |
zigo | The send_method object contains what, in fact? | 13:29 |
zigo | I see it's self.fd.send_method or something ... | 13:30 |
*** dasm|off is now known as dasm | 13:33 | |
* zigo tries this patch: https://github.com/eventlet/eventlet/commit/f0a887b94a86f9567e33037646712b89f02ae441 | 13:35 | |
* zigo gives up and skips the broken unit test. | 13:43 | |
gibi | I looked at it but I have no idea either | 13:43 |
noonedeadpunk | sean-k-mooney: seems that patch revert does work, thanks a lot! | 14:14 |
sean-k-mooney | we had 3 or 4 super niche edge cases that we resolved, and unfortunately it took us a while to realize that that was no longer required after we fixed that previous bug | 14:16 |
sean-k-mooney | noonedeadpunk: im glad this is working for you | 14:16 |
zigo | gibi: In oslo.concurrency, I tried reverting "Fix fair internal lock used from eventlet.spawn_n" and it was still stuck, so now I'm trying to revert "Prove that spawn_n with fair lock is broken" ... | 14:34 |
sean-k-mooney | zigo: just an fyi, that fix is needed | 14:35 |
sean-k-mooney | zigo: without it none of the fair locks in nova actually work | 14:35 |
zigo | You mean the "Fix fair internal lock used from eventlet.spawn_n" ? | 14:36 |
sean-k-mooney | yes | 14:36 |
sean-k-mooney | that is required for correctness | 14:36 |
zigo | Right, though it doesn't seem to be the brokenness ... | 14:36 |
sean-k-mooney | ack | 14:36 |
zigo | It passed ... | 14:39 |
zigo | Removing https://review.opendev.org/c/openstack/oslo.concurrency/+/855713 fixed my issue. | 14:40 |
gibi | as I noted earlier, you are probably running those tests ^^ without monkey patching, hence they get stuck | 14:42 |
gibi | you have no better option right now than removing those tests | 14:43 |
gibi | but you still have to keep the fix from "Fix fair internal lock used from eventlet.spawn_n", as sean-k-mooney noted | 14:43 |
zigo | I did. | 14:44 |
zigo | gibi: Is it possible that I'm running into this problem because I'm not doing `TEST_EVENTLET=0 lockutils-wrapper` before stestr run ? | 14:45 |
gibi | zigo: yes, I think so | 14:45 |
zigo | Ok, will try. | 14:45 |
zigo | Thanks. | 14:46 |
zigo | Indeed, that looks like fixing the issue, thanks gibi! :) | 14:57 |
gibi | as a follow-up we need better handling of those tests in oslo.concurrency. The eventlet-specific tests should be skipped if TEST_EVENTLET is not requested | 14:58 |
zigo | I'd prefer it if the whole unit test suite failed when TEST_EVENTLET is not set ... | 15:11 |
zigo | This way, a guy like me would know what to do... :) | 15:11 |
zigo | (just my 2 cents of advice...) | 15:12 |
opendevreview | Dmitriy Rabotyagov proposed openstack/nova master: [doc] Add os_vif configuration options https://review.opendev.org/c/openstack/nova/+/857202 | 15:27 |
gibi | zigo: yeah, it is not helpful if the tests just get stuck | 15:27 |
JayF | So heads up, it looks like there might be some persistent failure in stable/yoga: https://review.opendev.org/c/openstack/nova/+/854257 the openstacksdk functional job has failed almost every time on this change (and it's clearly unrelated) | 22:10 |
JayF | I didn't see it mentioned on the etherpad so I figured I'd mention it here. I'm not too attached to that patch specifically anymore (all the things we know need backporting from Ironic driver have been) -- but it works well as a test case to see the failures. | 22:10 |
*** dasm is now known as dasm|off | 22:59 |