*** dasm|rover is now known as dasm|out | 00:55 | |
opendevreview | Merged openstack/nova master: Protect against a deleted node id file https://review.opendev.org/c/openstack/nova/+/872204 | 02:17 |
---|---|---|
opendevreview | melanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets https://review.opendev.org/c/openstack/nova/+/826754 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption https://review.opendev.org/c/openstack/nova/+/826755 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted https://review.opendev.org/c/openstack/nova/+/826756 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870932 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Support resize with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870933 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to convert_image https://review.opendev.org/c/openstack/nova/+/870934 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property https://review.opendev.org/c/openstack/nova/+/870935 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase https://review.opendev.org/c/openstack/nova/+/870936 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870937 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList https://review.opendev.org/c/openstack/nova/+/870938 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties https://review.opendev.org/c/openstack/nova/+/870939 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS https://review.opendev.org/c/openstack/nova/+/772273 | 07:56 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets https://review.opendev.org/c/openstack/nova/+/826754 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption https://review.opendev.org/c/openstack/nova/+/826755 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: imagebackend: Cache the key manager when disk is encrypted https://review.opendev.org/c/openstack/nova/+/826756 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2 https://review.opendev.org/c/openstack/nova/+/870932 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Support resize with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870933 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to convert_image https://review.opendev.org/c/openstack/nova/+/870934 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property https://review.opendev.org/c/openstack/nova/+/870935 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase https://review.opendev.org/c/openstack/nova/+/870936 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption https://review.opendev.org/c/openstack/nova/+/870937 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Add reset_encryption_fields() and save_all() to BlockDeviceMappingList https://review.opendev.org/c/openstack/nova/+/870938 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: Update driver BDMs with ephemeral encryption image properties https://review.opendev.org/c/openstack/nova/+/870939 | 08:13 |
opendevreview | melanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS https://review.opendev.org/c/openstack/nova/+/772273 | 08:13 |
bauzas | tobias-urdin: excellent catch on the RPC pin alias, many thanks | 09:04 |
bauzas | it would have seriously broken our users if we had released it | 09:04 |
bauzas | yet another strange DB issue https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html | 09:27 |
bauzas | gibi: I'm writing the etherpad | 09:27 |
bauzas | and I'm reporting a new gate failure | 09:28 |
gibi | bauzas: the failure you linked is probably https://bugs.launchpad.net/nova/+bug/1946339 | 09:29 |
gibi | so no need for a new bug report | 09:29 |
bauzas | gibi: seems unrelated | 09:30 |
bauzas | I get a DB issue due to a missing table | 09:30 |
bauzas | oh | 09:31 |
gibi | both fail with | 09:31 |
gibi | sqlite3.OperationalError: no such table: instance_faults | 09:31 |
gibi | after a message timeout | 09:31 |
bauzas | yeah, not the same table but quite the same behaviour | 09:31 |
bauzas | well, the bug report title is too specific | 09:32 |
bauzas | gibi: agreed, seems the same | 09:33 |
bauzas | gibi: https://etherpad.opendev.org/p/nova-ci-failures | 09:47 |
bauzas | damn shit, opensearch urls aren't shareable | 09:49 |
gibi | bauzas: feel free to update the title of https://bugs.launchpad.net/nova/+bug/1946339 | 09:52 |
bauzas | gibi: I'm not good at naming | 09:52 |
bauzas | and I'm not sure of the rootcause | 09:53 |
bauzas | I guess this is due to long-running threads | 09:53 |
bauzas | so, due to the length it takes, then we delete the DB | 09:53 |
gibi | bauzas: changed the title then | 09:54 |
bauzas | and so, when the thread stops, then we try to look up the table which no longer exists | 09:54 |
bauzas | amirite on the root cause ? | 09:54 |
gibi | bauzas: one of the root causes is described in the commit message here https://review.opendev.org/c/openstack/nova/+/814036 | 09:54 |
gibi | what we need to figure out is the new sequence of events that leads to a similar leak | 09:56 |
bauzas | yeah ok, so we agree | 09:56 |
bauzas | we have spawned greenlets | 09:56 |
bauzas | that finish after the db is cleaned up | 09:56 |
bauzas | we should hold the test until those greenlets finish | 09:56 |
gibi | 1) we need to figure out which eventlet leaks 2) then we need to figure out how that leaked eventlet affects the later test case, i.e. where is the global variable that supports the leak 3) then we can figure out how to fix it | 09:58 |
tobias-urdin | bauzas: no worries, funny coincidence that i was digging around in the RPC layer and saw it :) | 09:59 |
gibi | in the fixed case an RPC handling eventlet was leaked; after 60 sec it woke up due to a timeout and used nova.rpc.get_versioned_notifier() to get a fresh notifier for the test case that was actually running | 09:59 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: Dividing global privsep profile https://review.opendev.org/c/openstack/nova/+/871729 | 10:00 |
gibi | the fix there was to ignore a notification if it is coming from a test case that is different from the currently running one | 10:00 |
bauzas | gibi: grabbing a coffee, this is a hard issue to dig into | 10:03 |
gibi | bauzas: ack I will add a logsearch query for it as soon as I have one | 10:04 |
bauzas | gibi: thanks, I was just doing it but I'm not an expert with the tool | 10:04 |
bauzas | I'm very disappointed we can no longer share our ex-logstack urls | 10:04 |
bauzas | logstash* | 10:04 |
gibi | the SQL cursor one https://bugs.launchpad.net/nova/+bug/2002782 is not that frequent, we hit it 9 times in 20 days so I would ignore it for now | 10:05 |
bauzas | not very handy for finding how much we're doomed | 10:05 |
bauzas | gibi: yup, I was about to tell you | 10:05 |
bauzas | gibi: that's why I wanted to add the opensearch one | 10:05 |
gibi | those 1525 hits in 7 days seem way too many for https://bugs.launchpad.net/nova/+bug/1946339 so I suspect some false positives there | 10:09 |
bauzas | probably | 10:10 |
bauzas | gibi: fancy to share your logsearch config files ? | 10:11 |
gibi | https://github.com/gibizer/zuul-log-search-config | 10:12 |
gibi | so we have sort of false positives. For example this https://zuul.opendev.org/t/openstack/build/31e9de9e4a574df6a2f45546927954fe/log/job-output.txt#23436 partially reproduced https://bugs.launchpad.net/nova/+bug/1946339 but in this case the test case did not fail due to the missing DB table, just the stack trace was logged. | 10:14 |
gibi | so we probably have a lot of hits that have the stack trace but no failed tests | 10:15 |
bauzas | gibi: I can amend the opensearch query to ensure the outcome is FAILURE | 10:16 |
gibi | yepp, this is an example of a passed tox run with the missing DB table test case https://zuul.opendev.org/t/openstack/build/f389ba6c4d9f4a04a5b6f09e253d864b/log/job-output.txt#23501 | 10:16 |
bauzas | 191 hits :) | 10:17 |
gibi | s/missing DB table test case/missing DB table stack trace/ | 10:17 |
bauzas | in the last 7 days | 10:17 |
bauzas | gibi: sorry for this dumb question but I wanna rush | 10:18 |
bauzas | gibi: how to use the configs from the other repo into the main one ? | 10:18 |
bauzas | just all the files ? | 10:18 |
bauzas | or just checking out some of them ? | 10:18 |
gibi | bauzas: what I do is: 1) create a venv and install the tool with pip install git+http://github.com/gibizer/zuul-log-search 2) next to the .venv clone the config repo | 10:19 |
bauzas | did 1) | 10:20 |
bauzas | did 2) | 10:20 |
gibi | It uses .logsearch.conf.d/ in the current directory if exists. Otherwise, uses $XDG_CONFIG_HOME/logsearch/ if XDG_CONFIG_HOME is defined. Otherwise, uses ~/.config/logsearch/. | 10:20 |
bauzas | yeah so mv the whole dir ? | 10:20 |
bauzas | that was my question | 10:20 |
bauzas | I see a config subdir in the zuul-log-search | 10:21 |
bauzas | but it seems unused | 10:21 |
gibi | https://paste.opendev.org/show/bTV1aUuajhb3uPYNb8Mp/ | 10:21 |
gibi | sorry, so create the .venv in the cloned config repo. | 10:22 |
bauzas | I see | 10:22 |
bauzas | or ln -sf this config dir | 10:23 |
bauzas | which is what I'll be using | 10:23 |
* bauzas usually creates build venvs in the projects repo | 10:23 | |
gibi | ack | 10:24 |
bauzas | yay, that works | 10:24 |
gibi | I will go and collect other frequent gate failures based on the query logsearch build --project openstack/nova --voting --pipeline gate --result FAILURE --branch master --days 7 | 10:28 |
bauzas | gibi: iiuc, Builds with matching logs 160/162 means that out of 162 job runs with FAILURE, 160 of them matched the query I asked ? | 10:29 |
bauzas | so, 98% of them | 10:29 |
gibi | we have some gate runs which are TIMED_OUT too: logsearch build --project openstack/nova --voting --pipeline gate --result TIMED_OUT --branch master --days 7 (I think this is what dansmith mentioned yesterday) | 10:30 |
gibi | bauzas: yes | 10:30 |
bauzas | gibi: maybe the query I make is too large, as you mentioned | 10:30 |
bauzas | request was 'sqlite3.OperationalError: no such table: instance_faults' | 10:30 |
gibi | bauzas: that will also pick up the cases where you see the stack trace without it killing the test but the job failed for another reason | 10:32 |
gibi | but we need to live with it | 10:32 |
bauzas | yup | 10:32 |
bauzas | I think we now have enough to work with | 10:32 |
bauzas | I'll try to do this digging thing | 10:33 |
gibi | ack | 10:33 |
opendevreview | Maxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted https://review.opendev.org/c/openstack/nova/+/872385 | 10:34 |
bauzas | I tried to ask opensearch to give me the occurrences down to 30 days | 10:34 |
bauzas | I'll try to see whether it started to reappear at some point in time | 10:34 |
bauzas | mmm, interesting | 10:35 |
bauzas | grabbing occurrences for the last 2 months | 10:35 |
bauzas | gibi: https://imgur.com/a/wOeLcUf | 10:36 |
gibi | bauzas: we have limited log storage | 10:37 |
bauzas | it started recently | 10:37 |
bauzas | less than one month of storage ? | 10:37 |
bauzas | or more ? | 10:37 |
gibi | with the old logstash it was about a month | 10:37 |
gibi | I don't know about the new one | 10:37 |
bauzas | gibi: with old logstash, I was sure it was a month | 10:37 |
bauzas | anyway, if so, let's try to find the regression some other way | 10:38 |
bauzas | and I suspect this can't be reproduced locally | 10:38 |
bauzas | or I would need to slow down my laptop | 10:39 |
dvo-plv | Hello, <sean-k-mooney> | 10:50 |
dvo-plv | I would like to continue our conversation, which we had on Friday | 10:50 |
dvo-plv | We talked about packed_ring option | 10:50 |
gibi | bauzas: yeah one thing you can try is to slow things down and increase the frequency of the test case that failed by duplicating it many times | 10:51 |
dvo-plv | I would like to discuss the scheduler. | 10:51 |
dvo-plv | The situation is when the user did not ask for the COMPUTE_NET_VIRTIO_PACKED trait, but we need to handle migration in some way. I found that the scheduler has an ALL_REQUEST_FILTERS array with different filters. My eye fell on the accelerators_filter. I suggest implementing packed_ring filtering in the same way as in this method. This also gives us the ability to avoid the situation where a user wants to start a VM on a node where this feature is not available | 10:51 |
sean-k-mooney | dvo-plv: good thinking but that would be the legacy approach | 10:53 |
sean-k-mooney | dvo-plv: my counter proposal is this. when a vm is spawned on a host, if it supports COMPUTE_NET_VIRTIO_PACKED set a flag in the instance_system_metadata to record that. then instead of adding a post placement filter add a pre placement filter here https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py | 10:54 |
sean-k-mooney | dvo-plv: unless this is the accelerators filter you meant https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L260-L273 | 10:55 |
sean-k-mooney | if so then yes it would be very similar to that | 10:55 |
sean-k-mooney | we would either check for an extra_spec and add the trait in an identical way | 10:55 |
sean-k-mooney | or check the instance_system_metadata for the flag. | 10:56 |
sean-k-mooney | the former would take effect when booting a vm that explicitly requests this, the latter for any vm that was spawned on a host with this capability | 10:56 |
sean-k-mooney | one of the main problems with the instance_system_metadata approach is im not sure the request_spec has that field | 10:58 |
sean-k-mooney | the approach we take really comes down to one choice. does the packed ring format need to be opt-in or automatic | 10:58 |
sean-k-mooney | if its opt in via a flavor/image property then the prefilter is trivial | 10:59 |
sean-k-mooney | the request spec has both the image properties and flavor extra spec so you can just check them and add the required trait as the accelerators filter does | 10:59 |
sean-k-mooney | looking at https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py the instance_system_metadata is not part of the request spec currently | 11:01 |
sean-k-mooney | so to take that approach we would have to modify the request spec which im not sure is the right thing to do here. | 11:01 |
sean-k-mooney | the request spec and instance_system_metadata live in different DBs (api vs cell db) | 11:02 |
bauzas | gibi: I chose the stestr approach of --until-failure | 11:03 |
dvo-plv | I checked the instance_system_metadata table and to me it looks like we would mix different OpenStack layers (instance and host), because there is no data of that type for an instance there; it holds some image, project, and user info. And I did not find how this table links with the request_spec that we create in the scheduler | 11:04 |
sean-k-mooney | dvo-plv: so based on that i would suggest we take the opt in/out approach and use flavor/image properties | 11:04 |
gibi | bauzas: that is independent from increasing the chance of catching it by increasing the number of test cases to run that we know can fail due to the issue. you can do both | 11:05 |
sean-k-mooney | dvo-plv: instance_system_metadata is a generic key value store for storing internal information about the instance, such as the embedded image properties | 11:05 |
sean-k-mooney | and its not accessible to the scheduler generally | 11:05 |
bauzas | gibi: we know that the issue is not on a single test | 11:06 |
bauzas | so while the testrunner runs, I'm looking at every single failure to see the stacktrace and find a pattern | 11:06 |
gibi | bauzas: yes, but we can grab a list of test cases run by a failed test worker. That list contains both the test case that leaked and the test case that failed due to the leak | 11:06 |
opendevreview | Merged openstack/nova master: Fix 6.2 compute RPC version alias https://review.opendev.org/c/openstack/nova/+/872804 | 11:06 |
gibi | bauzas: we know what is the latter | 11:06 |
sean-k-mooney | dvo-plv: what you would actually need to do is have the conductor populate a field on the request spec when you do a live migration. i feel like that approach is more complex than required | 11:07 |
gibi | bauzas: so we can run the same testcase list | 11:07 |
gibi | bauzas: as we know it contains both | 11:07 |
bauzas | gibi: I see your proposal | 11:07 |
gibi | bauzas: then we can increase the chance by adding more test cases that are in the latter category | 11:07 |
bauzas | I have the subunits | 11:07 |
bauzas | so I can generate a list | 11:07 |
bauzas | of failing tests | 11:07 |
bauzas | and duplicate that list | 11:08 |
sean-k-mooney | dvo-plv: if we were to leverage the instance_system_metadata we would likely need to extend the Destination field to have additional trait requests or something like that https://github.com/openstack/nova/blob/master/nova/objects/request_spec.py#L1093 | 11:09 |
gibi | originally (in 2021) this way I was able to reproduce https://bugs.launchpad.net/nova/+bug/1946339 but I tried this again a couple of weeks ago and was not able to reproduce the current occurrence after a couple of hours of --until-failure runs | 11:09 |
sean-k-mooney | dvo-plv: the destination object is constructed here https://github.com/openstack/nova/blob/1c46c4e9e5ba4b84816f5cadad0674f3a773e739/nova/conductor/tasks/live_migrate.py#L64 | 11:10 |
sean-k-mooney | dvo-plv: but as i said this more complex approach is only relevant if we wanted to automatically enable this functionality | 11:11 |
sean-k-mooney | well technically its created here https://github.com/openstack/nova/blob/1c46c4e9e5ba4b84816f5cadad0674f3a773e739/nova/conductor/manager.py#L470 | 11:12 |
sean-k-mooney | this only matters for the live migration case as the feature can be renegotiated on cold migration or other move operations | 11:13 |
bauzas | (functional-py38) [sbauza@sbauza nova]$ stestr load /tmp/zuul-logs.Edb6Rp/testrepository.subunit --subunit | subunit-filter -F | subunit-ls | 11:14 |
bauzas | nova.tests.functional.libvirt.test_vtpm.VTPMServersTest.test_create_server | 11:14 |
bauzas | gibi: I'm able to get the failing test | 11:14 |
bauzas | so given I'm looking at all the fetched logsearch subunits, I could extrapolate a list of usual suspects | 11:14 |
dvo-plv | So if you think that this way is very complex and would make the code less simple and familiar, maybe we should rather use the existing approach (create a new filter like accelerators_filter and check whether the user requested the packed option), which already exists and is easy to scale. I already checked this approach; it also forbids migrating a VM to a host without packed ring support and also starting a VM on a host without packed ring support | 11:15 |
sean-k-mooney | dvo-plv: yep that is the simplest approach. we could in a future release enable it by default and add a migration mechanism too if desired. | 11:16 |
sean-k-mooney | either by turning it on once we raise our min QEMU/Libvirt versions to one that means it will always be available | 11:17 |
sean-k-mooney | or by automatically adding the image property if not provided | 11:17 |
sean-k-mooney | so taking the explicit approach now does not prevent us making it implicit in the future | 11:18 |
sean-k-mooney | making it automatic now front loads a bunch of complexity | 11:18 |
gibi | bauzas: I tried that too. I fetched multiple failed worker test case lists and intersected them; it resulted in an empty list. probably we have multiple test cases that leak | 11:21 |
gibi | bauzas: but you can be lucky | 11:22 |
bauzas | gibi: I have the uuids from logsearch but I don't have the subunit streams | 11:23 |
bauzas | gibi: any way to pull them with logsearch ? maybe using the --file param ? | 11:24 |
gibi | I pull them manually | 11:24 |
gibi | I needed only about 5 to see that there is no one common test case in the list | 11:24 |
bauzas | I can hack download.sh | 11:25 |
gibi | bauzas: I added | 11:27 |
gibi | https://bugs.launchpad.net/tempest/+bug/1999893 | 11:28 |
gibi | to the etherpad | 11:28 |
bauzas | gibi: I'm downloading the subunit file from each of the 159 failing runs | 11:37 |
gibi | that is maybe overkill, as I said 5 examples were enough for me to end up with an empty intersection | 11:37 |
bauzas | we will see | 11:37 |
bauzas | and then I'll ask stestr run to run 10 times each of the failing tests | 11:38 |
sahid_ | o/ quick question, regarding instance.props, we do copy image metadata to the instance right? I don't remember | 11:43 |
gibi | bauzas: added another bug to the etherpad https://bugs.launchpad.net/nova/+bug/2006467 | 12:10 |
bauzas | ack | 12:11 |
bauzas | fwiw, looping over 78 different libvirt funct tests | 12:11 |
bauzas | first pass was saying OK | 12:12 |
dvo-plv | Sorry, Sean, but I do not get your final thought. Do you prefer to use. Honestly, I would like to implement it as I have suggested; all the interfaces that I need already exist and I will not have to extend other methods and classes with one parameter that will not be used often. It will be a delicate way to extend the existing bunch of filters with one more filter. | 12:18 |
gibi | bauzas: added another https://bugs.launchpad.net/glance/+bug/2006473 | 12:50 |
sean-k-mooney | dvo-plv: i was suggesting using the extra specs/image properties for now | 12:57 |
sean-k-mooney | dvo-plv: and when we raise our min libvirt/qemu version eventually we can enable it by default then | 12:57 |
sean-k-mooney | dvo-plv: that would be my preference. its simple to add to nova, easy to document and understand and easy to test | 12:58 |
sean-k-mooney | dvo-plv: we can eventually turn this on by default when we no longer support qemu/libvirt versions that dont have this and there is no longer an upgrade impact | 12:59 |
sean-k-mooney | dvo-plv: we did the same thing with the virtio random number generator in the past | 13:01 |
sean-k-mooney | dvo-plv: initially it was opt in and we enabled it by default after a few releases, after we raised our min libvirt/qemu version | 13:01 |
bauzas | gibi: after one hour, still none of the 79 tests were having an issue | 13:14 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: Moving privsep profiles to nova/__init__.py https://review.opendev.org/c/openstack/nova/+/872010 | 13:18 |
dvo-plv | Yes, I would like to have some general pre-approval from you here, before starting to implement and verify this functionality and present it in the blueprint to be sure that it will work okay, and does not waste your time on the blueprint spec file review process 1) User will have the ability to enable/disable this feature via flavor/image. 2) User will have the ability to set trait COMPUTE_NET_VIRTIO_PACKED to the flavor | 13:18 |
dvo-plv | Sorry, I have interrupt, I will resend my question | 13:19 |
dvo-plv | Yes, I would like to have some general pre-approval from you here, before starting to implement and verify this functionality and present it in the blueprint to be sure that it will work okay, and does not waste your time on the blueprint spec file review process | 13:19 |
dvo-plv | 1) User will have the ability to enable/disable this feature via flavor/image. 2) User will have the ability to set trait COMPUTE_NET_VIRTIO_PACKED to the flavor to choose some specific servers. Compute node will set this trait to the resource provider here static_trait. | 13:19 |
dvo-plv | 3) Scheduler will handle migration and the OpenStack cluster update process by automatically understanding which node has this function, with the ALL_REQUEST_FILTERS array extended with a new filter, similar to how it was implemented for accelerators_filter ( get a packed request from flavor ). | 13:20 |
sean-k-mooney | dvo-plv: yep so requesting the feature via flavor/image should automatically result in the trait request via a prefilter like the accelerators filter | 13:20 |
dvo-plv | 4) Since Qemu from v4.2 cannot be compiled without packed ring support, and Libvirt supports it from v6.3, we can determine whether the current compute node can use this functionality and whether it is available to the user. | 13:20 |
dvo-plv | OR do we just need to implement options 1, 2, and 4, without the scheduler automatically checking that this feature exists on the target compute node? | 13:20 |
sean-k-mooney | so they can ask for COMPUTE_NET_VIRTIO_PACKED explicitly but it should not be required | 13:20 |
sean-k-mooney | 2 you get for free, we already support arbitrary trait requests in the flavor/image | 13:21 |
sean-k-mooney | as part of implementing 1 you should add a scheduler prefilter to request COMPUTE_NET_VIRTIO_PACKED if the extra_spec/image property is set | 13:21 |
sean-k-mooney | so 1 and 3 are what you need to enable this feature properly | 13:22 |
sean-k-mooney | dvo-plv: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L212-L222 | 13:23 |
sean-k-mooney | our current min libvirt is 6.0 and Qemu is 4.2 | 13:23 |
sean-k-mooney | we have not bumped that in a few releases so we will likely go to libvirt 7.0 and qemu 5.2 in the B release | 13:24 |
sean-k-mooney | although we can technically do that in the A release | 13:25 |
dvo-plv | Okay, I see, but that is in the future; for now Libvirt supports packed from 6.3. Should I just update the minimum libvirt version, or create my own define for my trait? | 13:25 |
sean-k-mooney | bauzas: kashyap any reason not to do that in A | 13:25 |
sean-k-mooney | dvo-plv: we have a specific procedure for updating it where we have to announce our new min version in advance for at least 1 cycle | 13:26 |
sean-k-mooney | we declared 7.0 and 5.2 as our next version in Wallaby | 13:27 |
sean-k-mooney | so we could have done that bump some time ago | 13:27 |
bauzas | sean-k-mooney: we can if you want | 13:27 |
sean-k-mooney | although we now have new upgrade requirement to test the previous LTS | 13:27 |
sean-k-mooney | bauzas: i just realised we cant | 13:27 |
sean-k-mooney | we need to support focal for A for upgrade reasons | 13:28 |
sean-k-mooney | bauzas: so we should do this in early B | 13:28 |
sean-k-mooney | we need to not have 20.04 in our grenade job to do this bump | 13:29 |
bauzas | hmmm ok | 13:29 |
sean-k-mooney | and the dedicated focal job to go away | 13:29 |
sean-k-mooney | for B we will be using 22.04 | 13:29 |
sean-k-mooney | dvo-plv: so what that means for you is if your patch is after we have done the bump you will not need to do the version check | 13:30 |
gibi | bauzas: I'm not surprised, it seems both of us are missing some hidden ingredients to reproduce the same thing that happens on the gate | 13:31 |
sean-k-mooney | if its before we do the bump you will have to do the version check when reporting the trait | 13:31 |
bauzas | Ran: 4144 tests in 4758.5975 sec. | 13:31 |
bauzas | - Passed: 4144 | 13:31 |
bauzas | - Skipped: 0 | 13:31 |
bauzas | - Expected Fail: 0 | 13:31 |
bauzas | - Unexpected Success: 0 | 13:31 |
bauzas | - Failed: 0 | 13:31 |
bauzas | Sum of execute time for each test: 32443.8935 sec. | 13:31 |
bauzas | :) | 13:31 |
sean-k-mooney | what kind of potato is that running on | 13:31 |
sean-k-mooney | or were you just running those in a loop | 13:32 |
bauzas | sean-k-mooney: see what we discussed before you arrived | 13:32 |
bauzas | gibi: yah, maybe | 13:33 |
dvo-plv | Okay, if the Libvirt version is still lower than 6.3 when I present the patch in the blueprint, I will create a separate define with the Libvirt version | 13:33 |
bauzas | gibi: I'm now looking at the code and trying to understand what we use | 13:33 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: Moving privsep profiles to nova/__init__.py https://review.opendev.org/c/openstack/nova/+/872010 | 13:33 |
kashyap | sean-k-mooney: Hi, reading back. (Was buried elsewhere in an urgent thing) | 13:33 |
sean-k-mooney | kashyap: its fine it was just on our next libvirt/qemu version | 13:34 |
kashyap | sean-k-mooney: Yeah, bumping the min versions in 'A' is totally fine. | 13:34 |
sean-k-mooney | kashyap: actually it used to be, its not anymore | 13:34 |
sean-k-mooney | kashyap: from a pure nova point of view it would be | 13:34 |
sean-k-mooney | kashyap: but we have PTI/governance requirements | 13:34 |
kashyap | Right, I'm talking from a Nova PoV | 13:34 |
sean-k-mooney | that require us to support 20.04 for A | 13:35 |
sean-k-mooney | right so because of the other requirement we cant bump it in nova until B | 13:35 |
sean-k-mooney | kashyap: https://github.com/openstack/governance/blob/master/reference/runtimes/2023.1.rst#additional-testing-for-smooth-upgrade | 13:35 |
kashyap | What is "support 20.04 for A", I don't get | 13:36 |
* kashyap reads | 13:36 | |
dvo-plv | Thank you for your time and conversation. Have a nice day | 13:36 |
kashyap | Ah, it is Ubuntu 20.04 | 13:36 |
sean-k-mooney | yes basically every time we change a base OS in the testing requirement we need to have one release where we test the old and new version | 13:37 |
sean-k-mooney | kashyap: we changed from 20.04 to 22.04 in this release | 13:37 |
sean-k-mooney | so the same would happen for debian 11->12 or centos 9->10 in the future | 13:37 |
sean-k-mooney | its to ensure you can upgrade openstack without necessarily needing to upgrade the OS, it also mimics how our grenade jobs work | 13:38 |
sean-k-mooney | its related to https://github.com/openstack/governance/blob/master/resolutions/20220210-release-cadence-adjustment.rst the skip level upgrade release and the new lifecycle for integrated release projects | 13:39 |
sean-k-mooney | dvo-plv: o/ | 13:40 |
kashyap | sean-k-mooney: Yeah, the upgradeability makes sense | 13:40 |
sean-k-mooney | kashyap: bauzas any objection to doing the bump in a few weeks after RC 1 is out | 13:40 |
sean-k-mooney | better to try and do that early rather then late | 13:40 |
sean-k-mooney | or at least identify what our next versions should be declared as | 13:40 |
bauzas | when RC1 is out, then the master branch will be the Bobcat release, so ok | 13:41 |
kashyap | sean-k-mooney: Definitely agree on doing it earlier | 13:41 |
opendevreview | Merged openstack/nova master: Move comment about _destroy_evacuated_instances() https://review.opendev.org/c/openstack/nova/+/872348 | 13:42 |
opendevreview | Merged openstack/nova stable/wallaby: [stable-only][cve] Check VMDK create-type against an allowed list https://review.opendev.org/c/openstack/nova/+/871557 | 13:42 |
bauzas | \o/ | 13:42 |
bauzas | gibi: interestingly, if I restrict the logsearch call to "ImportError: This test imports the 'libvirt' module, which it should not in the test environment. Please add appropriate mocking to this test." which is the latest exception, I only get 19/143 failures that match (from the last 20 days) | 13:45 |
opendevreview | Maxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted https://review.opendev.org/c/openstack/nova/+/872385 | 13:45 |
bauzas | by comparing https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html to https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html that's why I think we have this | 13:45 |
gibi | I don't see the difference; both have the import error line | 13:46 |
*** dasm|out is now known as dasm|rover | 13:53 | |
bauzas | gibi: I mean, this is just a canary line for not getting the false positives | 13:57 |
gibi | do you have a false positive where this line is missing? | 13:58 |
opendevreview | David Hill proposed openstack/nova master: Increase user_data from 64k to 128k https://review.opendev.org/c/openstack/nova/+/872931 | 14:00 |
bauzas | gibi: one of the false positives is https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_aea/850501/15/check/nova-tox-functional-py38/aea02af/testr_results.html | 14:04 |
bauzas | gibi: you can find the DB issue in job_output.txt but the tests aren't failing | 14:05 |
bauzas | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_aea/850501/15/check/nova-tox-functional-py38/aea02af/job-output.txt | 14:05 |
bauzas | and you won't see the canary line | 14:06 |
gibi | I see. So the difference between the false positive and a real positive is that the real one hits the libvirt import check and fails the actual check while the false one did not | 14:06 |
bauzas | yup | 14:06 |
gibi | s/actual check/actual test/ | 14:06 |
bauzas | so now I'm trying to find the pattern | 14:06 |
gibi | I see | 14:06 |
bauzas | I stestr-loaded all the subunits | 14:06 |
gibi | good progress | 14:06 |
bauzas | now, I'm grepping this large txtfile I generated | 14:06 |
bauzas | gibi: see, the fact that we get the same exceptions with or without failing makes me think that the canary isn't maybe just a canary but rather the root cause of the failure | 14:13 |
bauzas | or rather some condition to a failure | 14:14 |
bauzas | if we can understand why in some cases we say meh and why not, then we could fix the problem | 14:15 |
bauzas | gibi: https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html is a good candidate to use as it got | 14:21 |
bauzas | *both* false positive and true positives | 14:21 |
bauzas | test_resize_revert_across_azs is a true positive | 14:22 |
bauzas | while other api failures are false ones | 14:22 |
gibi | are the hits in the same worker? it would be best to find a worker with multiple hits, as that would mean that worker had more than one leak | 14:27 |
bauzas | 2023-01-24 18:37:50.710671 | ubuntu-jammy | File "/home/zuul/src/opendev.org/openstack/nova/nova/virt/libvirt/driver.py", line 10619, in _live_migration 2023-01-24 18:37:50.710677 | ubuntu-jammy | self.live_migration_abort(instance) | 14:28 |
bauzas | looks to me all our pain comes from this ^ | 14:28 |
bauzas | and then I'm confused | 14:31 |
bauzas | why the f... are we calling live_migration_abort() in some functional test that just creates an instance ? | 14:31 |
bauzas | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_d00/868237/9/check/nova-tox-functional-py38/d00d1ff/testr_results.html | 14:31 |
gibi | bauzas: the leaked thread calls it :) | 14:32 |
bauzas | hah | 14:32 |
gibi | bauzas: I think the global state the leak acts on to influence the later tests is sys.meta_path from ImportModulePoisonFixture | 14:32 |
gibi | so my current theory: we leak a thread/eventlet that eventually calls live_migration_abort while a later test runs. As the later test sets the global posion on libvirt import the later test gets the failure | 14:33 |
bauzas | yeah, the global libvirt object is set to None by something else | 14:33 |
gibi | if there is no ImportModulePoisonFixture set in the later test then we only see the stack trace | 14:33 |
bauzas | so the thread gets this NoneError and fails, which tramples the whole test | 14:34 |
bauzas | something we merged tampered with the libvirt import | 14:34 |
bauzas | and we need to find it | 14:34 |
gibi | I think we intentionally posion libvirt import | 14:35 |
bauzas | by posion, you mean poison, right? | 14:35 |
bauzas | but yeah got it | 14:36 |
gibi | yeah, sorry | 14:36 |
bauzas | afaicr, our libvirt functional tests do indeed poison the import | 14:36 |
bauzas | the problem is that it seems that some test that calls live_mig_abort() doesn't use the libvirt poisoned instance, hence the issue | 14:37 |
bauzas | but which one and how to find it ? | 14:38 |
bauzas | 2023-02-06 16:15:47,325 ERROR [root] Original exception being dropped: ['Traceback (most recent call last):\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 207, in _send\n reply, failure = reply_q.get(timeout=timeout)\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet | 14:39 |
bauzas | /queue.py", line 322, in get\n return waiter.wait()\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet/queue.py", line 141, in wait\n return get_hub().switch()\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch\n return self.greenlet.switch()\n', '_queue.Empty\n', ' | 14:39 |
bauzas | \nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', ' File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 203, in decorated_function\n return function(self, context, *args, **kwargs)\n', ' File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 9300, in _post_live_migration\n self._update_scheduler_instance_in | 14:39 |
bauzas | fo(ctxt, instance)\n', ' File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 2219, in _update_scheduler_instance_info\n self.query_client.update_instance_info(context, self.host,\n', ' File "/home/zuul/src/opendev.org/openstack/nova/nova/scheduler/client/query.py", line 69, in update_instance_info\n self.scheduler_rpcapi.update_instance_info(context, host_name,\n', ' File "/home/zuul/src/opende | 14:39 |
bauzas | v.org/openstack/nova/nova/scheduler/rpcapi.py", line 174, in update_instance_info\n cctxt.cast(ctxt, \'update_instance_info\', host_name=host_name,\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/fixtures/_fixtures/monkeypatch.py", line 86, in avoid_get\n return captured_method(*args, **kwargs)\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38 | 14:39 |
bauzas | /lib/python3.8/site-packages/oslo_messaging/rpc/client.py", line 190, in call\n result = self.transport._send(\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/transport.py", line 123, in _send\n return self._driver.send(target, ctxt, message,\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_mess | 14:39 |
bauzas | aging/_drivers/impl_fake.py", line 222, in send\n return self._send(target, ctxt, message, wait_for_reply, timeout,\n', ' File "/home/zuul/src/opendev.org/openstack/nova/.tox/functional-py38/lib/python3.8/site-packages/oslo_messaging/_drivers/impl_fake.py", line 213, in _send\n raise oslo_messaging.MessagingTimeout(\n', 'oslo_messaging.exceptions.MessagingTimeout: No reply on topic scheduler\n'] 2023-02-06 16:15:47,361 WAR | 14:39 |
bauzas | NING [nova.virt.libvirt.driver] Error monitoring migration: (sqlite3.OperationalError) no such table: compute_nodes | 14:39 |
gibi | maybe the test has the poison but it is removed after the test finished, but the thread that will do the abort call is leaked to another test that might or might not have (or need) the poison | 14:39 |
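
The theory gibi describes above (a thread leaked by one test trips over the import poison that a later test installs on the process-wide `sys.meta_path`) can be sketched in plain Python. This is only an illustration under stated assumptions: `PoisonFinder`, `leaked_worker`, `run_demo` and the module name `hypothetical_libvirt` are made up for the example and are not Nova's `ImportModulePoisonFixture`; a regular thread stands in for the leaked eventlet.

```python
import importlib
import sys
import threading

# Hypothetical module name standing in for 'libvirt'; it is not installed.
POISONED = "hypothetical_libvirt"


class PoisonFinder:
    """Meta path finder that refuses to import the listed modules."""

    def __init__(self, names):
        self.names = set(names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.names:
            raise ImportError(f"poisoned import of {fullname!r}")
        return None  # defer to the regular finders for everything else


def leaked_worker(go, errors):
    """Work started by 'test A' that only runs after test A has finished."""
    go.wait()
    try:
        importlib.import_module(POISONED)
    except ImportError as exc:
        errors.append(str(exc))


def run_demo():
    go, errors = threading.Event(), []
    # "Test A" starts a thread and returns without joining it (the leak).
    worker = threading.Thread(target=leaked_worker, args=(go, errors))
    worker.start()
    # "Test B" installs the process-wide poison during its setup.
    finder = PoisonFinder([POISONED])
    sys.meta_path.insert(0, finder)
    try:
        go.set()       # the leaked thread wakes up while test B is running
        worker.join()
    finally:
        sys.meta_path.remove(finder)  # test B cleanup
    return errors
```

Calling `run_demo()` returns the error recorded by the leaked thread. Because `sys.meta_path` is shared by every thread in the process, the poison hits the leaked work even though it was started by a different test, which matches the "true positive" versus "stack trace only" split discussed above.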
opendevreview | Elod Illes proposed openstack/nova stable/ussuri: DNM: CI test https://review.opendev.org/c/openstack/nova/+/872184 | 14:49 |
elodilles | bauzas: i'll update the meeting wiki (stable section) if you are OK with it | 15:02 |
bauzas | elodilles: do it | 15:02 |
elodilles | ack | 15:02 |
bauzas | gibi: any way we could introspect in some log what created the thread ? | 15:04 |
* bauzas googles as I speak | 15:04 | |
bauzas | ChatGPT, maybe you know ? | 15:05 |
elodilles | :) | 15:07 |
elodilles | (meanwhile, I'm done with the wiki editing) | 15:08 |
gibi | bauzas: we can try printing https://github.com/openstack/nova/blob/9bc198e05733c03ba1a40f89cd6a77ab54b7e480/nova/tests/fixtures/notifications.py#L154-L160 to get the name of the testcase that started the eventlet | 15:10 |
bauzas | gibi: I can write a patch | 15:13 |
bauzas | given the occurrences, we may have evidence coming up | 15:13 |
bauzas | sooner than later | 15:13 |
gibi | yeah lets try that | 15:14 |
bauzas | gibi: that being said, the thread is maybe not a FakeVersionedNotifier | 15:15 |
*** dasm|rover is now known as dasm|afk | 15:15 | |
gibi | bauzas: FakeVersionedNotifier was on the receiving end in the past not on the sending side. in the current case the poison is on the receiving side, and the live_mig_abort is on the sending side afaik | 15:16 |
bauzas | gibi: are you proposing me to add this directly in File "/home/zuul/src/opendev.org/openstack/nova/nova/compute/manager.py", line 8854, in _do_live_migration self.driver.live_migration(context, instance, dest, ? | 15:16 |
gibi | if we want to print only in true positive cases then add it in /home/zuul/src/opendev.org/openstack/nova/nova/tests/fixtures/nova.py line 1849, | 15:17 |
gibi | if we want to print in false positive cases too then in /home/zuul/src/opendev.org/openstack/nova/nova/virt/libvirt/driver.py", line 10071, in live_migration_abort | 15:17 |
gibi | don't call _get_sender_test_case_id just copy the implementation of it | 15:18 |
bauzas | yup, I see | 15:19 |
bauzas | gibi: but we want to know the parent, right? | 15:21 |
gibi | bauzas: print the first id it gets; that will be the name of the test case that leaked the thread either directly or indirectly | 15:21 |
gibi | hm, directly, hence the walking on the parents | 15:21 |
gibi | so print the first id; that will be from the eventlet that nova's spawn or spawn_n started and that has the test case id embedded | 15:22 |
gibi | we walk the parents as eventlets later can spawn other eventlets which we don't control and therefore we cannot propagate the testcase id there | 15:22 |
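
The parent walk gibi describes can be sketched as follows. This is a minimal illustration, not the code from the linked fixture: `Task` is a hypothetical stand-in for a greenlet (real greenlets expose a `parent` attribute), the `test_case_id` attribute name follows the conversation, and the test id string is invented.

```python
class Task:
    """Stand-in for a greenlet: has a parent and maybe a test_case_id."""

    def __init__(self, parent=None, test_case_id=None):
        self.parent = parent
        if test_case_id is not None:
            self.test_case_id = test_case_id


def find_test_case_id(task):
    """Return the first test_case_id found on task or its ancestors."""
    while task is not None:
        test_case_id = getattr(task, "test_case_id", None)
        if test_case_id is not None:
            return test_case_id
        task = task.parent
    return None


# Hypothetical hierarchy: main -> tagged (spawned by nova) -> untagged child.
main = Task()
tagged = Task(parent=main, test_case_id="nova.tests.functional.HypotheticalTest.test_x")
untagged_child = Task(parent=tagged)
```

Here `find_test_case_id(untagged_child)` walks one level up and returns the id stored on `tagged`, while `find_test_case_id(main)` returns `None`. The loop is bounded by the depth of the parent chain, so it always terminates.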
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 15:32 |
bauzas | gibi: ^ | 15:32 |
bauzas | I said DNM but we could merge it | 15:32 |
bauzas | instead of us rechecking | 15:33 |
opendevreview | Dan Smith proposed openstack/nova master: Add docs for stable-compute-uuid behaviors https://review.opendev.org/c/openstack/nova/+/872977 | 15:46 |
*** dasm|afk is now known as dasm|rover | 15:50 | |
bauzas | #startmeeting nova | 16:01 |
opendevmeet | Meeting started Tue Feb 7 16:01:03 2023 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:01 |
opendevmeet | The meeting name has been set to 'nova' | 16:01 |
bauzas | sorry folks, forgot to remind you of the meeting | 16:01 |
bauzas | who's around ? | 16:01 |
Uggla | o/ | 16:01 |
elodilles | o/ | 16:02 |
bauzas | I guess we can make a soft start | 16:02 |
bauzas | #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting | 16:02 |
bauzas | #topic Bugs (stuck/critical) | 16:03 |
bauzas | #info No Critical bug | 16:03 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New 28 new untriaged bugs (+1 since the last meeting) | 16:03 |
bauzas | #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster | 16:03 |
bauzas | Uggla: fancy getting the bug triage baton for this week | 16:03 |
bauzas | ? | 16:03 |
sean-k-mooney | o/ | 16:04 |
Uggla | I will be out next week so I would rather postpone if possible | 16:04 |
bauzas | ack, so artom would you want to continue having the triage baton for an extra week ? | 16:04 |
Uggla | If not I'll try to do my best till the end of the week. | 16:04 |
artom | Ah, I completely dropped the ball, didn't I? | 16:05 |
artom | Yeah, I can keep it | 16:05 |
bauzas | ++ | 16:05 |
bauzas | artom: no worries | 16:05 |
bauzas | and thanks | 16:05 |
gibi | o/ | 16:05 |
dansmith | o/ | 16:06 |
bauzas | ok moving on | 16:06 |
bauzas | #topic Gate status | 16:06 |
bauzas | #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs | 16:06 |
bauzas | new item | 16:06 |
bauzas | #link https://etherpad.opendev.org/p/nova-ci-failures Etherpad for tracking CI failures | 16:06 |
gibi | there is a fairly long list but please add to it if you see failures | 16:07 |
gibi | that are not on the list | 16:07 |
bauzas | we have some of them that hit us in some place between the chair and the man | 16:07 |
bauzas | I think the hardest one is at the bottom of the document | 16:08 |
dansmith | oof yeah | 16:08 |
bauzas | I had a lovely morning and a half-afternoon spent on that one | 16:08 |
bauzas | in the context of a soonish feature freeze, more hands are more than welcome | 16:08 |
dansmith | I wrote the replace_location test, so I can look into that one.. it's a glance test though. I'm sure it's poking some bug in glance, because until I wrote that we didn't really have any tests for that stuff | 16:09 |
bauzas | because don't expect your patches to be reviewed if most of the cores are having their days spent on fixing CI problems | 16:09 |
dansmith | but maybe it is resolvable | 16:09 |
gibi | dansmith: there is https://bugs.launchpad.net/glance/+bug/1999800 and https://bugs.launchpad.net/glance/+bug/2006473 both location tests | 16:10 |
bauzas | and yeah, I know, debugging a CI failure isn't exactly the best experience you may have of working on an opensource project, but let's be honest and say that's necessary to have a healthy gate | 16:10 |
gibi | bauzas: +1 | 16:10 |
dansmith | okay the former is the same as bauzas' one | 16:10 |
gibi | yeah probably | 16:10 |
dansmith | yeah from the logs, the test is clearly doing something legit and glance is rejecting it but shouldn't | 16:11 |
bauzas | gibi: I created https://bugs.launchpad.net/nova/+bug/2004641 but it seems duplicate of https://bugs.launchpad.net/glance/+bug/1999800 | 16:11 |
dansmith | might be because it fails to talk to the cirros site occasionally, so maybe we can use an openstack infra url instead | 16:11 |
dansmith | bauzas: indeed | 16:11 |
sean-k-mooney | i thought we tried to pull those from provider proxies in ci | 16:11 |
gibi | bauzas: https://bugs.launchpad.net/tempest/+bug/2004641 and https://bugs.launchpad.net/glance/+bug/2006473 are duplicates but https://bugs.launchpad.net/glance/+bug/1999800 is a separate tc | 16:11 |
bauzas | I can close my one as duplicate | 16:11 |
dansmith | bauzas: ++ | 16:12 |
sean-k-mooney | github is more reliable for downloading cirros images, by the way, than the cirros site | 16:12 |
bauzas | I just ideally would like to track that bug in our project | 16:12 |
dansmith | sean-k-mooney: the cirros page just redirects to the github one | 16:13 |
sean-k-mooney | oh they finally implemented that | 16:13 |
dansmith | sean-k-mooney: and we're just using CONF.image.http_image in that test | 16:13 |
sean-k-mooney | oh this is not the image pulled by https://github.com/openstack/devstack/blob/master/stackrc#L670-L708 | 16:14 |
dansmith | the github URL is crazy long with tons of tokens and other values after the redirects it does | 16:14 |
bauzas | gibi: ack will mark your https://bugs.launchpad.net/nova/+bug/2004641 as duplicate of mine, then | 16:14 |
dansmith | sean-k-mooney: this is a tempest test | 16:14 |
sean-k-mooney | right the one with the larger image | 16:14 |
dansmith | no | 16:14 |
dansmith | gibi: it's the same test case, different behavior, but I'm guessing it's the sameish problem | 16:14 |
bauzas | ok, you know what, I'll add mine in the tracking etherpad, and we'll figure out | 16:15 |
bauzas | the three of them are set against Glance either way | 16:15 |
sean-k-mooney | i'm surprised that the tempest test is not using the one we prestage in the vm but ok | 16:15 |
sean-k-mooney | i was expecting CONF.image.http_image to be file:///opt/devstack/data/cirros... | 16:16 |
gibi | dansmith: yeah probably similar root cause | 16:17 |
dansmith | sean-k-mooney: it can't be because that is specifically for testing fetching an image server-side from http | 16:17 |
sean-k-mooney | ah thanks i was missing that context | 16:17 |
sean-k-mooney | oh, that is in https://bugs.launchpad.net/glance/+bug/2006473 ; i was only familiar with https://bugs.launchpad.net/glance/+bug/1999800 | 16:17 |
dansmith | they're the same test | 16:18 |
dansmith | sorry, the same test helper | 16:18 |
bauzas | and probably the rootcause | 16:18 |
sean-k-mooney | ya so likely the same cause | 16:18 |
bauzas | same rootcause | 16:18 |
bauzas | which is a flakey httpservice | 16:19 |
bauzas | either way, seems we have a path forward with the github image repo then ? | 16:19 |
* bauzas trying to read between the lines | 16:19 | |
bauzas | looks like people are gone | 16:21 |
dansmith | bauzas: no, it's already using that via redirect | 16:21 |
bauzas | there is another CI failure I'd like to talk about | 16:21 |
sean-k-mooney | from the name i would not expect either to depend on downloading an image over http but i have not looked at the detail of the test. i was expecting tempest to upload the image from disk. | 16:21 |
dansmith | bauzas: I'll take it and work something out | 16:21 |
bauzas | dansmith: very much appreciated, trust me. | 16:21 |
bauzas | dansmith: fwiw, the hits number seems low compared to other bits | 16:22 |
bauzas | bites* | 16:22 |
bauzas | so, about https://bugs.launchpad.net/nova/+bug/1946339 | 16:22 |
dansmith | yeah, but if we have no other obvious ones to work on, at least I can make some progress on this :) | 16:22 |
bauzas | dansmith: heh | 16:22 |
bauzas | so, after a day of co-investigation with my CSI partner gibi on https://bugs.launchpad.net/nova/+bug/1946339 | 16:23 |
bauzas | we identified this may come from a non-poisoned libvirt | 16:23 |
bauzas | the funny part is that we hit this in a thread, not in the main test | 16:23 |
bauzas | hence why we missed it before | 16:24 |
bauzas | I have a question | 16:24 |
bauzas | do people agree with merging https://review.opendev.org/c/openstack/nova/+/872975 even if it says it's a dnm ? | 16:24 |
bauzas | (tbc, I can make an update and remove the dnm title) | 16:24 |
dansmith | we should remove the dnm for sure | 16:25 |
sean-k-mooney | bauzas: melwitt had a patch to poison importing libvirt that should catch this by the way | 16:25 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 16:25 |
dansmith | bauzas: do you know about the thing you can do to add additional test payload report sections? | 16:25 |
bauzas | dansmith: acked ^ | 16:25 |
dansmith | depending on what you're trying to do, that can be more useful than logging sometimes | 16:25 |
bauzas | dansmith: nope, hence my sending the bottle to the sea, asking for advice | 16:26 |
sean-k-mooney | bauzas: can you put a sleep in that busy loop too | 16:26 |
dansmith | it's not a busy loop is it? | 16:26 |
bauzas | nope | 16:26 |
gibi | it is walking a tree up | 16:26 |
bauzas | we're trying to find an attribute from an eventlet object and if we can't find it, we walk the ascendance | 16:26 |
sean-k-mooney | it will loop until the test_case_id is not None | 16:26 |
gibi | it walks along the eventlet.parent link | 16:27 |
sean-k-mooney | i guess it's probably fine | 16:27 |
gibi | so while it is busy, it is bounded | 16:27 |
sean-k-mooney | oh sorry you're right it is doing that | 16:27 |
sean-k-mooney | ok | 16:27 |
bauzas | dansmith: so, about the payload reporting, you gained my interest | 16:27 |
dansmith | bauzas: https://github.com/openstack/glance/blob/master/glance/tests/functional/__init__.py#L1129-L1130 | 16:27 |
dansmith | bauzas: that adds another section of the test failure reporting, like "here's the stdout I captured" and "here are the log lines I captured" | 16:28 |
bauzas | ffff | 16:28 |
bauzas | dansmith: ++ | 16:28 |
dansmith | helps to separate nova-logging from something specifically to be reported by the test case | 16:28 |
dansmith | especially if debug logging isn't captured, or is being mocked out, etc | 16:28 |
dansmith | in glance I found it useful because their functional workers run outside the main process, but also in some cases where I needed to debug failures | 16:29 |
dansmith | (failures that happen infrequently) | 16:29 |
dansmith | anyway, just FYI, might be helpful | 16:29 |
bauzas | it could be | 16:29 |
gibi | dansmith: ohh that is good to know :) | 16:29 |
sean-k-mooney | oh addDetail | 16:30 |
bauzas | dansmith: the problem is that we get an exception from a test which is actually not caused by this test but rather by a leaked eventlet thread that blows up at that point in time | 16:30 |
sean-k-mooney | i have seen that before but never looked into it; ya, looks useful | 16:30 |
dansmith | gibi: yeah, it's kinda nice :) | 16:31 |
bauzas | ideally I would like to trace the whole parenting stack that triggered the leaky thread | 16:31 |
gibi | bauzas: we will hopefully get the name of the leaky test case and then we can create a local reproduction | 16:32 |
bauzas | gibi: a stack would have been better but yeah | 16:33 |
gibi | you have a stack | 16:33 |
gibi | but it starts when the thread starts | 16:33 |
bauzas | that's the parent stack I want :) | 16:33 |
gibi | yeah | 16:33 |
gibi | that is hard | 16:33 |
bauzas | yup | 16:35 |
bauzas | anyway, reviews appreciated on https://review.opendev.org/c/openstack/nova/+/872975 | 16:35 |
bauzas | moving on ? | 16:35 |
sean-k-mooney | sure | 16:35 |
bauzas | #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status | 16:36 |
gibi | I'm on it | 16:36 |
bauzas | #info Please look at the gate failures and file a bug report with the gate-failure tag. | 16:36 |
bauzas | #info STOP DOING BLIND RECHECKS aka. 'recheck' https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures | 16:37 |
bauzas | done with this topic, phew. | 16:37 |
bauzas | #topic Release Planning | 16:37 |
bauzas | #link https://releases.openstack.org/antelope/schedule.html | 16:37 |
bauzas | #info Antelope-3 is in 1.2 weeks | 16:37 |
bauzas | I said 1.2 because it will be on Thursday next week | 16:37 |
bauzas | #link https://blueprints.launchpad.net/nova/antelope All accepted blueprints for 2023.1 | 16:37 |
bauzas | #link https://etherpad.opendev.org/p/nova-antelope-blueprint-status Blueprint status for 2023.1 | 16:37 |
bauzas | feel free to comment it as much as you want ^ | 16:37 |
bauzas | I was originally planning to do a full reviews set by today, but due to the former topic, I abandoned my promise | 16:38 |
elodilles | a bit related to release: 'Release final os-vif for 2023.1 Antelope' https://review.opendev.org/c/openstack/releases/+/872779 | 16:39 |
bauzas | (yet again saying, don't expect the reviews to magically happen, be present and interact with us) | 16:39 |
bauzas | elodilles: good catch, I forgot to add it to the agenda | 16:39 |
bauzas | Important : | 16:39 |
bauzas | #info Thursday is the non-client libs feature freeze, which means we can only accept feature changes for os-vif, os-traits and os-rc up until Thursday | 16:40 |
bauzas | later changes will be on hold until next release | 16:40 |
elodilles | ++ | 16:40 |
bauzas | I haven't looked at os-vif, os-traits and os-resourceclasses master branches, but I think we have open changes on them | 16:41 |
bauzas | so, if anyone wants some addition to those libraries, I'd recommend them to ping me or anyone else for reviews | 16:42 |
bauzas | last point | 16:43 |
bauzas | FeatureFreeze is on next Thursday | 16:43 |
bauzas | we'll see how the gate goes by that time | 16:43 |
bauzas | but as for the older releases, the most important thing for having your series accepted for Antelope is to get a +W before Thursday EOB | 16:43 |
bauzas | we'll manage the rechecks if needed | 16:44 |
bauzas | don't freak out about the gate stability, but please continue to ensure your patches are ready for reviews | 16:44 |
sean-k-mooney | elodilles: i was planning to propose an os-vif release to include rodolfo's patches | 16:44 |
sean-k-mooney | so i want to confirm the sha before we move forward with that | 16:45 |
sean-k-mooney | ill do that after the meeting | 16:45 |
elodilles | sean-k-mooney: as far as i know he updated the release patch already | 16:45 |
bauzas | sean-k-mooney: thanks, appreciated | 16:45 |
elodilles | sean-k-mooney: but please -1 if something is still missing | 16:45 |
sean-k-mooney | ack just looking now ill +1 if its correct | 16:45 |
elodilles | sean-k-mooney: that is even better :) | 16:46 |
bauzas | I think for os-traits I've seen one patch from Uggla | 16:46 |
bauzas | but I don't think we can reasonably merge the nova related series | 16:47 |
Uggla | yep but it can wait. | 16:47 |
bauzas | next topic, if so | 16:47 |
bauzas | #topic vPTG Planning | 16:47 |
bauzas | a bit early but | 16:47 |
bauzas | #link https://www.eventbrite.com/e/project-teams-gathering-march-2023-tickets-483971570997 Register your free ticket | 16:47 |
bauzas | also | 16:48 |
bauzas | #link https://etherpad.opendev.org/p/nova-bobcat-ptg Draft PTG etherpad | 16:48 |
bauzas | every cycle, we're asked how long we should have sessions | 16:48 |
bauzas | I thought it would be better to somehow have an idea on how many topics we gonna discuss before saying how many slots we need :) | 16:49 |
bauzas | but I know | 16:49 |
bauzas | lots of topics will arrive the week before the PTG :) | 16:49 |
sean-k-mooney | the more virtual ptgs we have the less energy i have for them. that said i would prefer to have more slots over more days than a few short long ones | 16:50 |
bauzas | the thing is, you have the etherpad, feel free to amend it | 16:50 |
sean-k-mooney | s/short// | 16:50 |
bauzas | sean-k-mooney: this cycle, we will also have a "physical PTG" at the middle of bobcat-1 | 16:50 |
bauzas | that could alleviate some discussions | 16:50 |
sean-k-mooney | that should really be for C planning | 16:50 |
bauzas | or B implementation phasing :) | 16:51 |
sean-k-mooney | either/both its close to Spec Freeze | 16:51 |
bauzas | I haven't checked whether the proposed B agenda is merged yet | 16:51 |
sean-k-mooney | so probably too late for directional changes on large specs | 16:51 |
bauzas | sean-k-mooney: don't disagree, I'm just saying it could help some small contributors to get audience when they need | 16:52 |
elodilles | B schedule was merged | 16:52 |
bauzas | cool | 16:52 |
bauzas | so | 16:52 |
sean-k-mooney | https://releases.openstack.org/bobcat/schedule.html | 16:52 |
sean-k-mooney | so yes its merged | 16:52 |
bauzas | https://releases.openstack.org/bobcat/schedule.html says that pPTG will be 3 weeks before specfreeze (if we agree on the PTG at b-2 being spec freeze) | 16:53 |
bauzas | so, that's why I'm saying we could have a shorter but productive vPTG | 16:53 |
bauzas | like 2 hours per day | 16:53 |
bauzas | (and ideally, I'd like to attend some TC discussions this time) | 16:53 |
sean-k-mooney | i would prefer to frontload the planning to the vPTG and have the physical one be more C focused but ok | 16:54 |
bauzas | don't disagree | 16:54 |
sean-k-mooney | because of its time in the cycle it feels more like a forum than a ptg | 16:54 |
bauzas | anyway, we're rushing out of time | 16:54 |
bauzas | #topic Review priorities | 16:54 |
bauzas | #link https://review.opendev.org/q/status:open+(project:openstack/nova+OR+project:openstack/placement+OR+project:openstack/os-traits+OR+project:openstack/os-resource-classes+OR+project:openstack/os-vif+OR+project:openstack/python-novaclient+OR+project:openstack/osc-placement)+(label:Review-Priority%252B1+OR+label:Review-Priority%252B2) | 16:54 |
bauzas | #info As a reminder, cores eager to review changes can +1 to indicate their interest, +2 for committing to the review | 16:54 |
bauzas | #topic Stable Branches | 16:54 |
bauzas | elodilles: you have 5 mins (sorry) | 16:54 |
elodilles | #info stable branches seem to be OK back till wallaby | 16:54 |
bauzas | huzzah | 16:55 |
elodilles | stable/wallaby gate is passing (failing openstacksdk-functional-devstack job was removed from wallaby) | 16:55 |
bauzas | thanks gmann for the hard work on stable/wallaby | 16:55 |
elodilles | yepp | 16:55 |
elodilles | ++ | 16:55 |
elodilles | #info stable/victoria gate is affected by the same failing openstacksdk-functional-devstack job | 16:55 |
elodilles | #info ussuri and train gates are broken ('Could not build wheels for bcrypt, cryptography' errors) | 16:55 |
elodilles | #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci | 16:55 |
elodilles | EOM | 16:55 |
bauzas | elodilles: I'll pay attention to the ussuri branch with the CVE backport | 16:55 |
elodilles | bauzas: ack | 16:56 |
bauzas | ... when I have time :) | 16:56 |
elodilles | bauzas: i have a proposed potential workaround | 16:56 |
elodilles | for ussuri | 16:56 |
bauzas | elodilles: cool, let's figure that out after the meeting, tomorrow per say | 16:56 |
elodilles | bauzas: ++ | 16:56 |
bauzas | fwiw, I'm planning to deliver the cve fix down to ussuri | 16:56 |
bauzas | but not provide any backport to train | 16:57 |
elodilles | why not train? :) | 16:57 |
bauzas | due to the oslo.utils versioning | 16:57 |
bauzas | most of the distros now made the backports | 16:57 |
bauzas | so it's upstream support | 16:57 |
bauzas | and Train is on EM | 16:57 |
elodilles | well, Wallaby is EM | 16:57 |
bauzas | and ussuri too | 16:57 |
elodilles | (and Xena soon, too) | 16:57 |
bauzas | but it was simple to backport the fix down to ussuri | 16:58 |
bauzas | it was cheap, so we proposed it | 16:58 |
bauzas | backporting it to train is a totally different story | 16:58 |
elodilles | ok :) thanks for that! | 16:58 |
bauzas | it requires some oslo.utils backport too (and then a jenga puzzle with dependency management) | 16:58 |
elodilles | :S | 16:59 |
bauzas | so, things are said, crystal clear. | 16:59 |
elodilles | thanks, i see | 16:59 |
bauzas | elodilles: thanks elodilles for the stable report | 16:59 |
elodilles | np | 16:59 |
bauzas | last point for the 20 secs left | 16:59 |
bauzas | #topic Open discussion | 16:59 |
bauzas | nothing on the agenda | 16:59 |
bauzas | so I'll close the meeting | 17:00 |
bauzas | feel free to add your items for next week | 17:00 |
bauzas | thanks all | 17:00 |
bauzas | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue Feb 7 17:00:21 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.html | 17:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.txt | 17:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/nova/2023/nova.2023-02-07-16.01.log.html | 17:00 |
elodilles | thanks o/ | 17:02 |
sean-k-mooney | elodilles: we can chat after the meeting but is the cryptography issue being addressed in train? | 17:02 |
gibi | bauzas: I left feedback in https://review.opendev.org/c/openstack/nova/+/872975 | 17:04 |
elodilles | sean-k-mooney[m]: ensure-rust workaround seems to be working for stable/ussuri (i proposed it to master last time accidentally): https://review.opendev.org/c/openstack/grenade/+/872969/ | 17:04 |
bauzas | gibi: appreciated. I can speak French, English and a bit of German and Spanish, but my eventlet is definitely small | 17:05 |
elodilles | sean-k-mooney[m]: and ykarel in neutron added it to tempest (devstack, actually) as well for train, so hopefully something like that should do the trick in train | 17:05 |
gibi | it is not a big codebase (eventlet + greenlet) but it is not straightforward either. And greenlet has parts of it implemented as a C extension to python :) | 17:05 |
bauzas | gibi: elodilles: thanks for the reviews | 17:06 |
elodilles | np | 17:07 |
bauzas | to clarify, 1/ it is hard to tell which tests are the cause | 17:07 |
sean-k-mooney | elodilles: ack i was asking as it's currently blocking a ceilometer/devstack fix i was working on | 17:07 |
sean-k-mooney | elodilles: changes to the telemetry tempest plugin check for compatibility with train | 17:07 |
bauzas | elodilles: you missed today's conversation but I basically grepped the occurrences of such problem trying to intersect the failing tests and basically there were no suspects | 17:07 |
bauzas | elodilles: because the failing test is not responsible | 17:07 |
bauzas | elodilles: this is just an unfortunate test that runs at the same time as the thread is throwing the exception | 17:08 |
sean-k-mooney | elodilles: i thought you said originally ensure-rust would not work | 17:08 |
bauzas | bite by the bullet | 17:08 |
bauzas | gibi: about your comment, not sure I fully understand | 17:08 |
bauzas | gibi: I can remove the DNM: prefix in the log | 17:09 |
elodilles | sean-k-mooney: yepp, and i was wrong :/ (pushed it to the wrong branch) | 17:09 |
gibi | bauzas: I'm not happy to merge this as it will log things that are misleading. I would rather keep a DNM patch that we recheck until it hits the issue | 17:09 |
sean-k-mooney | elodilles: ah ok :) then cool i can recheck https://review.opendev.org/c/openstack/telemetry-tempest-plugin/+/872350 after its merged | 17:09 |
bauzas | gibi: but for case #1 you mentioned, this seems to me OK to have this log | 17:09 |
bauzas | gibi: I mean | 17:10 |
elodilles | bauzas: ack. i got your intention (i think): to catch the test with the 'DNM' log whenever we see the failing test | 17:10 |
bauzas | I'm a developer, I'm writing a functest and I forget to poison libvirt | 17:10 |
sean-k-mooney | we have a fixture to poison it | 17:10 |
sean-k-mooney | and we dont install libvirt so the import should also fail | 17:11 |
bauzas | then I'd see my gate saying -1 if myself I'm not brave enough to run the functest locally | 17:11 |
sean-k-mooney | in the functional tests the libvirt python package should not be there | 17:11 |
sean-k-mooney | unless it has been baked into the ci image | 17:11 |
clarkb | it shouldn't be | 17:11 |
bauzas | sean-k-mooney: context is, the poison disappears when the thread is executed | 17:11 |
sean-k-mooney | we intentionally do not list libvirt in test-requirements.txt or requirements.txt | 17:12 |
bauzas | so the thread gets a None attribute for the import | 17:12 |
sean-k-mooney | ok so it raises an import error as we expect | 17:12 |
bauzas | to quote gibi "an existing test case that is properly poisoned and mocked libvirt. But an eventlet is leaked out from the test, the test finished and removed the mock. Then the leaked eventlet wakes up while a later test case runs and because the mock was removed when the original test finished the leaked eventlet now imports libvirt and hits the poison set up by the current test." | 17:12 |
sean-k-mooney | that should fail the test but i'm guessing we are using spawn_n | 17:12 |
gibi | bauzas: the code you injected does not help catching such case where the poison was not added | 17:13 |
gibi | bauzas: and our goal here now is to know what test leaked the eventlet | 17:13 |
gibi | to be able to reproduce the leak locally and fix it | 17:14 |
sean-k-mooney | it will just log an error with the original eventlet id to help identify the test that was not poisoned | 17:14 |
bauzas | gibi: yup, that's why I'm trying to find where to patch | 17:14 |
sean-k-mooney | gibi: is it a libvirt import in all cases or just some | 17:14 |
bauzas | sean-k-mooney: no the test that runs when the greenthread wakes up then turns into a failure | 17:15 |
bauzas | https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html | 17:15 |
bauzas | (one of the many occurrences) | 17:15 |
bauzas | or https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html | 17:15 |
bauzas | or https://4dca9d38a541907e85e1-0253beca39d73a6e7192d5b32ed5edc2.ssl.cf2.rackcdn.com/860282/2/check/nova-tox-functional-py310/466e0d7/testr_results.html | 17:15 |
gibi | sean-k-mooney: depending on when the leaked eventlet wakes up it either hits the libvirt poison and fails the test, or just logs the stack traces and lets the test pass if no poison is in place | 17:16 |
bauzas | or https://7ffaea22ff93fca2f0ea-bf433abff5f8b85f7f80257b72ac6f67.ssl.cf5.rackcdn.com/869900/7/gate/nova-tox-functional-py38/3b10d8a/testr_results.html (sorry) | 17:16 |
sean-k-mooney | gibi: ack | 17:16 |
bauzas | gibi: yup, I found some run | 17:16 |
gibi | sean-k-mooney: the poison acts like the global state that lets the leaked eventlet manipulate the running test case | 17:16 |
sean-k-mooney | gibi: and is it spawn_n in all cases? | 17:16 |
sean-k-mooney | gibi: yes but this is not a result of the poison, it's just highlighting an existing issue | 17:17 |
gibi | sean-k-mooney: yes the poison is good | 17:17 |
sean-k-mooney | we did have an existing thing like this related to notification i think in the past right | 17:17 |
gibi | sean-k-mooney: yes | 17:17 |
sean-k-mooney | and we checked the eventlet id | 17:17 |
gibi | sean-k-mooney: that embeds the testcase id to the eventlet | 17:18 |
sean-k-mooney | yep | 17:18 |
gibi | and checks it during the notification code path | 17:18 |
sean-k-mooney | which is what bauzas is logging now | 17:18 |
gibi | and that path is fixed | 17:18 |
gibi | sean-k-mooney: yes, we try to log that now for this poison / live_migration_abort() codepath | 17:18 |
bauzas | sean-k-mooney: yes, I'm trying to see what's firing the greenthread | 17:18 |
sean-k-mooney | so longterm i still wonder if we should make nova use a green pool | 17:18 |
sean-k-mooney | and then in the tests we can make each test use their own greenpool | 17:19 |
sean-k-mooney | and call wait on that in the test cleanup | 17:19 |
sean-k-mooney | i think that would be relatively simple to do | 17:19 |
sean-k-mooney | im just not sure we want to do it 2 weeks before FF | 17:20 |
bauzas | no | 17:20 |
bauzas | please :) | 17:20 |
gibi | sean-k-mooney: we would still need a reproducer for the current failure to see if the pooling fixes it :) | 17:20 |
bauzas | gibi: I missed your top comment | 17:20 |
sean-k-mooney | gibi: yes we would :) | 17:20 |
bauzas | I'll amend .zuul.yaml | 17:20 |
sean-k-mooney | gibi: but it would allow us to poison direct calls to spawn/spawn_n potentially and ensure we can't leak eventlets between cases | 17:21 |
gibi | so let's get a reproducer first by figuring out the leaking tests (we know that there is more than one as simply intersecting testcase lists from failed test workers did not result in a single test case but an empty list) | 17:21 |
gibi | sean-k-mooney: I'm not against fixing this via pooling :) | 17:21 |
opendevreview | Sylvain Bauza proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 17:21 |
bauzas | gibi: sean-k-mooney: I'm not against fixing our concurrency mechanism for func tests, I'm just against doing it *now* :) | 17:22 |
gibi | I will disappear soon. I think we can continue this tomorrow. I will look at the patch and call rechecks from time to time during my evening | 17:23 |
bauzas | gibi: if only I was able to reproduce it locally, I could just call tox with -- --until-failure | 17:25 |
gibi | bauzas: yeah | 17:27 |
gibi | that is the key. If we have it locally I can add as much runtime to it as I want | 17:27 |
bauzas | anyway, have a good evening | 17:27 |
bauzas | and thanks for the help | 17:27 |
bauzas | I think I'll shortly stop too | 17:27 |
gibi | bauzas: thanks for the work, I think we made good progress today. I would not have been able to do that without you. | 17:28 |
opendevreview | Maksim Malchuk proposed openstack/nova stable/xena: Fix to implement 'pack' or 'spread' VM's NUMA cells https://review.opendev.org/c/openstack/nova/+/829804 | 17:31 |
opendevreview | Maksim Malchuk proposed openstack/nova stable/wallaby: Fix to implement 'pack' or 'spread' VM's NUMA cells https://review.opendev.org/c/openstack/nova/+/861832 | 17:37 |
*** umbSubli1 is now known as umbSublime | 18:11 | |
*** efried1 is now known as efried | 19:24 | |
opendevreview | Maxim Monin proposed openstack/nova master: Server Rescue leads to Server ERROR state if base image is deleted https://review.opendev.org/c/openstack/nova/+/872385 | 20:49 |
opendevreview | Balazs Gibizer proposed openstack/nova master: DNM: Add logging for leaking out the non-poisoned libvirt testcase https://review.opendev.org/c/openstack/nova/+/872975 | 21:06 |
*** dasm|rover is now known as dasm|off | 22:40 |
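
The discussion above mentions intersecting the test case lists from the failed test workers to look for a single leaking test, and getting an empty list back. A minimal sketch of that reasoning follows, assuming each failed worker's run can be reduced to a list of test ids. The function name `common_tests` and all test names are hypothetical and are not taken from the nova tree or from the real job results.

```python
def common_tests(worker_runs):
    """Return the test ids that appear in every failed worker's run.

    worker_runs is a list of lists of test ids (hypothetical data here).
    """
    if not worker_runs:
        return set()
    return set.intersection(*(set(run) for run in worker_runs))


# Hypothetical runs: two different tests leak an eventlet, so no single
# test is present in every failed worker.
failed_worker_runs = [
    ["test_a", "test_leaky_1", "test_victim_x"],
    ["test_b", "test_leaky_2", "test_victim_y"],
    ["test_leaky_1", "test_c", "test_victim_z"],
]

suspects = common_tests(failed_worker_runs)
print(sorted(suspects))  # an empty list: no common suspect
```

If only one test were leaking, it would have to appear in every failed worker's list and so would survive the intersection. An empty result is therefore consistent with more than one leaking test, which is the conclusion drawn in the conversation.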