Tuesday, 2023-02-21

opendevreviewAmit Uniyal proposed openstack/tempest master: Adds test for resize server swap to 0  https://review.opendev.org/c/openstack/tempest/+/85888506:01
*** ralonsoh_ooo is now known as ralonsoh07:32
*** jpena|off is now known as jpena08:29
ykarelgmann, kopecmartin https://review.opendev.org/c/openstack/devstack/+/859773 broke at least the tempest slow jobs10:19
kopecmartinoh, tempest.scenario.test_network_basic_ops.TestNetworkBasicOps fails with - cat: can't open '/var/run/udhcpc.eth0.pid': No such file or directory 10:23
kopecmartinhttps://2ba7f10ac23ddac3b9f6-1e843e6e8b4b324e302975788622dfa4.ssl.cf2.rackcdn.com/874232/1/check/tempest-slow-py3/35d32f7/testr_results.html10:23
kopecmartinrenew_lease method fails :/10:25
kopecmartinbut if that failed due to cirros bump, it could have failed with any other custom image a user might use 10:26
ykarelyes if those images don't use udhcpc10:27
ykarelfrom what i see only that test uses that config option and is marked as slow10:29
ykarelso only jobs running slow tests are impacted10:29
kopecmartinykarel: right, i see that udhcpc is the default client .. anyway, does this mean that they changed cirros in 0.6.1 not to include this client? or maybe use a different one, i'm trying to find a change log or smth12:46
kopecmartinoh12:47
kopecmartinhttps://github.com/cirros-dev/cirros/blob/0.6.1/ChangeLog#L2012:48
kopecmartinthey switched to dhcpcd12:48
ykarelkopecmartin, yeap12:48
ykarelhttps://github.com/cirros-dev/cirros/commit/ded54d3524d1dda485b095ed8a0f934695200c6512:48
ykarelhttps://github.com/cirros-dev/cirros/commit/e59406d14c857a949d6eeb400d67c2ed8f54539012:48
kopecmartini'm gonna try to override the default client to dhcpcd in the slow job12:49
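The override being discussed boils down to a tempest.conf setting; the [scenario] section and dhcp_client option name are the ones mentioned later in this log, and whether 'dhcpcd' is an accepted value depends on the tempest change proposed below:

    [scenario]
    dhcp_client = dhcpcd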
*** ralonsoh is now known as ralonsoh_lunch12:51
ykarel+112:52
opendevreviewMilana Levy proposed openstack/tempest master: This change was written so that a new volume could be created by another client other than the primary admin  https://review.opendev.org/c/openstack/tempest/+/87457712:58
*** ralonsoh_lunch is now known as ralonsoh13:31
opendevreviewMartin Kopec proposed openstack/tempest master: Change dhcp client to dhcpcd in slow jobs  https://review.opendev.org/c/openstack/tempest/+/87458613:57
opendevreviewyatin proposed openstack/grenade master: Dump Console log if ping fails  https://review.opendev.org/c/openstack/grenade/+/87441714:12
kopecmartin#startmeeting qa15:01
opendevmeetMeeting started Tue Feb 21 15:01:04 2023 UTC and is due to finish in 60 minutes.  The chair is kopecmartin. Information about MeetBot at http://wiki.debian.org/MeetBot.15:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:01
opendevmeetThe meeting name has been set to 'qa'15:01
mnaserwas there previously an invisible_to_admin user or something like that in the past in devstack?15:01
mnaseroops bad timing15:01
mnaserignore that :)15:01
lpiwowaro/15:01
kopecmartinmnaser: i don't know, let me get back to that at the end of the meeting in the Open Discussion 15:01
*** yadnesh_ is now known as yadnesh|away15:02
kopecmartin#topic Announcement and Action Item (Optional)15:04
kopecmartinOpenStack Elections15:04
kopecmartinthe current status at15:04
kopecmartin#link https://governance.openstack.org/election/15:04
kopecmartin#topic Antelope Priority Items progress15:05
kopecmartin#link https://etherpad.opendev.org/p/qa-antelope-priority15:05
* kopecmartin checks the status there15:05
fricklerno updates on ceph plugin I guess?15:07
kopecmartindoesn't look like it15:08
kopecmartinanyone working on that?15:08
fricklerI thought that at some time you wanted to take a look at the tempest issue. or was it gmann?15:08
kopecmartini think it was me and i lost it in the pile of tabs :/ 15:09
kopecmartini rechecked that to get fresh logs15:09
kopecmartini'm gonna try to get to that 15:09
fricklercool15:10
kopecmartinso the goal is to fix whatever is failing here now, right? 15:10
kopecmartin#link https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/86531515:10
kopecmartinso that we can merge that 15:10
frickleryes15:10
kopecmartinokey15:12
kopecmartinthere's been some progress on FIPS15:12
kopecmartin#link https://review.opendev.org/c/openstack/devstack/+/87160615:12
kopecmartinbut that depends on a patch in zuul-jobs15:13
kopecmartini wonder whether there are patches which depend on the devstack one - because the patch ^^ doesn't change anything, just allows the consumers to enable fips15:13
kopecmartinI'll check with Ade15:13
kopecmartinoh, one thing i forgot to mention in the announcements section 15:16
kopecmartinwe're about to release a new tempest tag15:16
kopecmartin#link https://review.opendev.org/c/openstack/tempest/+/87101815:16
kopecmartinthe patches are in the queue depending on that one ^^15:16
kopecmartinwhich  is currently blocked by the cirros bump, but i'll get to that later15:16
kopecmartin#topic OpenStack Events Updates and Planning15:17
kopecmartin#link https://etherpad.opendev.org/p/qa-bobcat-ptg15:17
kopecmartinif you have any ideas for the topics to discuss over the ptg, then ^^15:17
kopecmartini'll need to reserve some time and think about the topics we might wanna cover during the PTG15:18
kopecmartin#topic Gate Status Checks15:19
kopecmartin#link https://review.opendev.org/q/label:Review-Priority%253D%252B2+status:open+(project:openstack/tempest+OR+project:openstack/patrole+OR+project:openstack/devstack+OR+project:openstack/grenade)15:19
kopecmartin2 reviews one is blocked by the other15:19
kopecmartinthe cirros version bump caused an issue with the dhcp client .. apparently the new cirros uses a different dhcp client by default15:20
opendevreviewJorge San Emeterio proposed openstack/tempest master: Create a tempest test to verify bz#2118968  https://review.opendev.org/c/openstack/tempest/+/87370615:20
kopecmartinmore info here15:20
kopecmartin#link https://review.opendev.org/c/openstack/tempest/+/87458615:20
kopecmartinand in the associated bug report 15:20
kopecmartinanything urgent to review? 15:22
kopecmartin#topic Bare rechecks15:23
kopecmartin#link https://etherpad.opendev.org/p/recheck-weekly-summary15:23
kopecmartinwe're doing quite well here15:23
kopecmartin#topic Periodic jobs Status Checks15:23
kopecmartinstable15:23
kopecmartin#link https://zuul.openstack.org/builds?job_name=tempest-full-yoga&job_name=tempest-full-xena&job_name=tempest-full-wallaby-py3&job_name=tempest-full-victoria-py3&job_name=tempest-full-ussuri-py3&job_name=tempest-full-zed&pipeline=periodic-stable15:23
kopecmartinmaster15:23
kopecmartin#link https://zuul.openstack.org/builds?project=openstack%2Ftempest&project=openstack%2Fdevstack&pipeline=periodic15:23
kopecmartinmaster got hit by the dhcp client issue15:24
kopecmartini'm checking whether those jobs would be fixed by the patch i proposed earlier15:26
frickleryes, I changed the dhcp client in order to better support different IPv6 scenarios15:27
kopecmartinack, it requires a small change in a few jobs because tempest uses the previous dhcp client by default 15:29
kopecmartin#topic Distros check15:30
kopecmartincs-915:31
kopecmartin#link https://zuul.openstack.org/builds?job_name=tempest-full-centos-9-stream&job_name=devstack-platform-centos-9-stream&skip=015:31
kopecmartinfedora15:31
kopecmartin#link https://zuul.openstack.org/builds?job_name=devstack-platform-fedora-latest&skip=015:31
kopecmartindebian15:31
kopecmartin#link https://zuul.openstack.org/builds?job_name=devstack-platform-debian-bullseye&skip=015:31
kopecmartinfocal15:31
kopecmartin#link https://zuul.opendev.org/t/openstack/builds?job_name=devstack-platform-ubuntu-focal&skip=015:31
kopecmartinrocky15:31
kopecmartin#link https://zuul.openstack.org/builds?job_name=devstack-platform-rocky-blue-onyx15:31
kopecmartinopenEuler15:31
kopecmartin#link https://zuul.openstack.org/builds?job_name=devstack-platform-openEuler-22.03-ovn-source&job_name=devstack-platform-openEuler-22.03-ovs&skip=015:31
kopecmartinall good, all passing, note that we merged the fix for rocky only a day or 2 ago15:32
kopecmartin#topic Sub Teams highlights15:33
kopecmartinChanges with Review-Priority == +115:33
kopecmartin#link https://review.opendev.org/q/label:Review-Priority%253D%252B1+status:open+(project:openstack/tempest+OR+project:openstack/patrole+OR+project:openstack/devstack+OR+project:openstack/grenade)15:33
kopecmartinno reviews there15:33
kopecmartin#topic Open Discussion15:33
kopecmartin (gmann) PyPi additional maintainers audit for QA repo 15:34
kopecmartinregarding this15:34
kopecmartinwe have reached out to everyone we could find 15:34
kopecmartini think we can consider this done15:34
kopecmartin.. i made a note here that we are ok with the removal of additional maintainers 15:35
kopecmartin#link https://etherpad.opendev.org/p/openstack-pypi-maintainers-cleanup15:35
kopecmartinanything for the open discussion?15:35
tkajinamo/15:36
tkajinamMay I bring one topic ?15:36
kopecmartinsure15:36
tkajinamhttps://github.com/unbit/uwsgi/commit/5838086dd4490b8a55ff58fc0bf0f108caa4e07915:37
tkajinamI happened to notice uwsgi announced maintenance mode last year. is anybody aware of this ?15:37
tkajinamthis might be concerning for us because we are now extensively using uwsgi in devstack afaik15:38
kopecmartinisn't the maintenance mode enough for us?15:40
tkajinamif they will still maintain it well. but it's not a good sign imho.15:40
kopecmartinyes, that's true15:41
kopecmartinhow can we mitigate that? 15:41
kopecmartinshould we plan replacing that with something else?15:41
kopecmartin(seems we have a topic for the upcoming virtual ptg)15:41
tkajinamI noticed this 30 minutes ago and am just sharing it here, so I don't have clear ideas yet. we probably have to check the reason behind that shift and prepare a replacement plan in case it becomes unmaintained.15:42
kopecmartintkajinam: i'm just thinking out loud .. thanks for sharing, it's very appreciated15:43
kopecmartingmann: ^ did it come up in TC?15:43
kopecmartinlet's gather more info and get back to this 15:43
tkajinamI'll send an email to openstack-discuss. probably that would be a good way to initiate discussion around this.15:44
kopecmartintkajinam: very good idea15:44
kopecmartin+115:44
kopecmartinsearching the ML to check whether it has come up already, and i don't see anything specific15:45
fricklerdoesn't ring a bell for me, either, but certainly worth discussing15:46
kopecmartinyeah, this is interesting, seems like very important info and it didn't come up for a year o.O .. thanks again tkajinam15:47
fricklerregarding mnaser's question, I only know about the project of that name, not a user https://opendev.org/openstack/devstack/src/branch/master/lib/keystone#L343-L34515:47
tkajinamkopecmartin frickler, thanks !15:48
mnaseri'm trying to fix the ospurge gate and it's failing because of that15:48
mnaserhttps://opendev.org/x/ospurge/src/branch/master/tools/func-tests.sh#L3215:48
kopecmartinit doesn't look like it's used anywhere else but there15:49
kopecmartin#link https://codesearch.opendev.org/?q=invisible_to_admin_demo_pass&i=nope&literal=nope&files=&excludeFiles=&repos=15:49
fricklermnaser: iiuc "demo" is the username and invisible_to_admin the project name15:50
fricklerdo you have a link to a failure?15:50
mnaserhttps://zuul.opendev.org/t/openstack/build/ef954eefbef2439da35829b2f99d8ef515:50
mnaseryeah so i wonder if it's bitrot since it's no longer used, since i checked codesearch too15:50
kopecmartinit seds files under DEVSTACK_DIR which aren't there, i wonder whether they were there at some point or they were just generated by someone on the fly15:53
fricklerit seems accrc is no longer being created at all15:54
kopecmartinthe last commit in x/ospurge was done 3 years ago 15:54
mnaseryeah theres a lot of bitrot there15:54
mnaserbut ah ok if accrc is not a thing at all15:55
fricklermight be related to our general move to clouds.yaml, does ospurge support that?15:55
mnaseri think it uses the openstacksdk client in the backend15:55
mnaserso i could update the tests to use --os-cloud15:55
fricklerI think that that would be the best path looking forward15:56
mnaserok i'll try to see what the different options are and how the clouds.yaml file is generated, and clean up that file15:57
fricklerwe could add a cloud definition for the invisible project if needed15:57
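A minimal sketch of what moving the ospurge functional tests from sourced accrc files to clouds.yaml could look like with openstacksdk; the cloud name 'devstack' is an assumption, and the --os-cloud CLI flag points at the same clouds.yaml entry:

    import openstack

    # Connect using a named cloud from clouds.yaml instead of sourcing
    # per-project accrc files; 'devstack' is an assumed cloud name.
    conn = openstack.connect(cloud='devstack')
    for server in conn.compute.servers():
        print(server.name)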
frickleranyway, I think we can also continue this discussion after the meeting15:58
kopecmartinack , last quick note about the bug triage15:59
kopecmartin#topic Bug Triage15:59
kopecmartin#link https://etherpad.openstack.org/p/qa-bug-triage-antelope15:59
kopecmartinnumbers recorded as always15:59
kopecmartinand we're out of time15:59
kopecmartinthank you everyone for joining 15:59
kopecmartinsee you online 15:59
kopecmartin#endmeeting16:00
opendevmeetMeeting ended Tue Feb 21 16:00:04 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/qa/2023/qa.2023-02-21-15.01.html16:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/qa/2023/qa.2023-02-21-15.01.txt16:00
opendevmeetLog:            https://meetings.opendev.org/meetings/qa/2023/qa.2023-02-21-15.01.log.html16:00
fricklerthx kopecmartin 16:00
*** artom_ is now known as artom16:01
lpiwowarthanks o/16:01
*** sean-k-mooney1 is now known as sean-k-mooney16:25
opendevreviewMerged openstack/grenade master: Dump Console log if ping fails  https://review.opendev.org/c/openstack/grenade/+/87441718:00
gmanntkajinam: thanks for bringing it, I will check mail18:43
dansmithgmann: can you tell if this failure is during tearDown() or part of the test itself? https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_137/874664/1/check/tempest-integrated-compute-ubuntu-focal/1376f12/testr_results.html19:32
dansmithbecause it doesn't show me a trace in the actual test, I'm guessing this is just tearDown?19:32
gmannk, checking19:37
gmanndansmith: yes, it is  during tearDown from here https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L12719:39
dansmithgmann: okay, so I think what's going on there is that we've disturbed the guest a lot by doing the snapshot while it's running, and it is stuck to the point where it will not detach the volume19:40
dansmithso we sit there and wait for 8*20s trying to detach it, it never lets go, so we never finish detaching and we fail there19:40
dansmithI'm not sure if that really means the test failed or not, because the snapshots succeeded19:41
dansmithbut we could (a) get the console of the guest to see if it has a kernel panic or something (unlikely I think)19:41
dansmithor (b) we could try a force reboot of the guest during teardown before we try to clean up or something19:42
dansmithwe see this failure pattern a lot19:42
dansmithso I'm trying to think of how we can either be more forgiving here, or debug further what is going on19:42
dansmithI guess I don't really know what happens during a volume snapshot with the guest running, but it seems to clearly destabilize the guest19:43
gmannbut this test waits for the snapshot deletion, and that happens before delete_volume, right? https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L18419:43
dansmithyou're saying it _does_ delete the snaps before the volume right?19:44
gmannyes19:44
gmannso the snapshots should be cleaned up by the time the volumes are deleted19:44
dansmithright, but the thing it's failing on is (AFAICT) a detach operation in nova, which tries 8 times and fails because the guest never releases the block device19:44
dansmithgmann: right but that's what I'm saying I think the test has finished already19:44
dansmithwell,19:45
dansmithI guess maybe I'm mistaken about what happens during a snapshot19:45
dansmithgmann: if the test failed in the middle of the meat of the test, wouldn't we see a failure specific to that, in addition to the failure during teardown?19:45
rosmaitao/19:46
dansmithrosmaita: looking at that test that failed.. when that volume test does a force snapshot with the guest running - what is happening? is it snapshotting the volume underneath, or does it try to detach, snapshot, reattach?19:46
dansmithI had assumed the former19:46
rosmaitai think it depends on the driver, but for lvm, i believe it's the former19:47
dansmithrosmaita: this: https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/test_volumes_snapshots.py#L5919:48
dansmithokay, so what I'm trying to determine is if we are hanging on the detach as part of the snapshot, or just the test cleanup19:48
dansmithand based on that, I'm thinking it's the latter.. we've disturbed the guest by doing the snapshot underneath and it's wedged such that we just fail to cleanup19:48
rosmaitalooks to me like the cleanup19:48
gmanndansmith: yeah, I am ok on 'more forgiving here' in cleanup as we do check snapshot deletion happing fine in the test itself https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L18419:49
dansmithwe hit this sort of "volume fails to detach" thing so *very* often, that I think we need to do something here19:49
gmannsnapshot deletion completes the operation this test is testing19:49
dansmithgmann: right, okay19:50
rosmaitawhat i'm seeing in the c-vol log is the volume is reported as available, and then a series of lvcreate --snapshot commands19:50
gmannthis is the same case in many other tests' cleanup too, where detaching gets stuck in cleanup when the test does many operations19:50
dansmithrosmaita: all that happens underneath without really disturbing iscsi or the guest, I would think, so I'm not sure what the problem is19:50
dansmithgmann: yes, but volume detach is a good portion of those I think19:51
gmannyeah19:52
dansmithgmann: so, hmm.. should we have already run the delete server and wait for termination part of the cleanup?19:54
dansmithhttps://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L21119:55
dansmithor do those run in reverse order so we're trying to delete the volume first?19:55
gmanndansmith: cleanup is in reverse order, but here in this test the server is cleaned up after the test since it is added via addCleanup, while the delete volume cleanup happens at the test class level since it is added via addClassResourceCleanup https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L12719:58
gmannso delete volume happens later19:58
dansmithgmann: the instance is still very clearly running, but you think it should have already been deleted?19:58
gmannis it?19:59
dansmithyes20:00
dansmithwell,20:00
dansmithlet me say the instance is still running when the volume fails to detach20:01
gmannI see volume detach request here 2023-02-21 17:34:25.163 99115 INFO tempest.lib.common.rest_client [req-b137b6c4-ba62-411a-b2c4-bd5637202770 req-b137b6c4-ba62-411a-b2c4-bd5637202770 ] Request (VolumesSnapshotTestJSON:_run_cleanups): 202 DELETE https://10.176.196.163/compute/v2.1/servers/91b2ff57-588c-454c-aad1-3e67749420ee/os-volume_attachments/6ec5c1f8-6f4c-430f-94c2-6e08f0ce78f9 0.186s20:05
gmannthis is from tempest.log20:05
gmannand server deletion request was not done yet20:05
dansmithokay, so,20:06
dansmithI think maybe we're actually stuck trying to delete the server20:06
dansmithand it's stuck because it's waiting for the volume to be detached gracefully20:06
dansmithI'm thinking the server becomes "deleted" immediately from the view of tempest,20:08
dansmithso it moves on to delete the volume,20:08
gmannit wait for server termination20:08
gmannhttps://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L21120:08
dansmithwhich it can't do because the instance is still kinda stuck detaching in its attempt to be deleted20:08
dansmithgmann: right but it just waits for it to go 404, which happens basically immediately I think20:09
dansmithbut n-cpu continues to gracefully detach the volume before it deletes the server20:09
gmannah right20:09
dansmithso, here's the thing20:10
gmannif detach is stuck it should get stuck here https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L19320:10
dansmithmelwitt was working on moving us to force-detach with brick for another unrelated thing20:10
dansmithwhich is actually what we should be doing on delete server20:10
dansmithso maybe we could try applying that and see if some/all of these go away20:10
dansmithforce-detach volumes with brick only on server delete, I mean20:11
dansmithgmann: because on server delete, we try to do a graceful shutdown, but with limited patience before we cut and actually delete20:12
dansmithand that's what brick's force detach _does_20:12
dansmithbut this volume failure to detach can get in the way of that20:12
gmannyeah, i can see detach stuck but because it is cleanup delete server still run         Body: b'{"badRequest": {"code": 400, "message": "Invalid volume: Volume status must be available or error or error_restoring or error_extending or error_managing and must not be migrating, attached, belong to a group, have snapshots, awaiting a transfer, or be disassociated from snapshots after volume transfer."}}' _log_request_full 20:12
gmann/opt/stack/tempest/tempest/lib/common/rest_client.py:46420:12
gmannthis is right before the server delete request20:12
dansmithgmann: exactly, it gets that immediately, even before it has tried to do anything with the volume20:13
dansmithoh wait, no20:13
dansmithit does try to delete the attachment20:13
dansmithdang20:13
dansmithI misread that call, I thought it was trying to delete the server, but it's actually trying to delete the *attachment* is that right?20:14
dansmiththis, is what I didn't have scrolled far enough to the right: 2023-02-21 17:34:25,163 99115 INFO     [tempest.lib.common.rest_client] Request (VolumesSnapshotTestJSON:_run_cleanups): 202 DELETE https://10.176.196.163/compute/v2.1/servers/91b2ff57-588c-454c-aad1-3e67749420ee/os-volume_attachments/6ec5c1f8-6f4c-430f-94c2-6e08f0ce78f9 0.186s20:14
gmannyes this happen before delete server20:15
dansmithokay, my bad20:16
dansmiththe other thing that I thought supported this, is that immediately after we see the final detach attempt fail in the n-cpu log, the instance is deleted20:16
dansmithso I thought it was stuck in that wait process20:16
dansmithgmann: so where is the tempest code that tries to delete the attachment?20:17
gmannfrom here, and it does wait for attachment to be deleted https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L193-L19520:18
gmannit is from attach_volume cleanup20:18
dansmithah, in the attach I see20:18
dansmithI never can wrap my head around how all the positive actions do their own cleanup scheduling20:19
dansmithokay, so I don't think we have any way to do the force detach from the API20:19
dansmithgmann: so maybe a force reboot of the affected instance before we go to do the detach? it's a little messy, but it might shake it loose20:20
gmannhumm that makes the tests more lengthy20:21
gmanncan we go for ignoring detach completion in such non-detach tests that do a lot of other operations on the guest?20:22
rosmaitai don't know what this means, but in c-vol log, that volume is last mentioned when the 3rd snapshot is created at Feb 21 17:34:17.577277, and then not again until Feb 21 17:37:47.596555 when the initiator is deleted ... which seems a  long time after that delete-attachment call gmann posted earlier20:22
dansmithgmann: meaning don't do the wait_for_volume_resource_status==available step?20:23
dansmithrosmaita: right because it's waiting for the guest to let go before it does20:23
gmanndansmith: yes but delete server will get stuck, right?20:23
dansmithrosmaita: 8 attempts at 20s each20:23
dansmithgmann: well, if we make delete server (in nova) properly do a force detach of the volume because it's being deleted, that would actually improve20:24
dansmithgmann: so (1) do not wait for delete attachment to complete (2) go straight to delete server (3) make nova do force detach in delete server (which we need to do anyway)20:24
gmanndansmith: but does the volume get deleted in the in-use state?20:24
dansmithgmann: it will still go back to available i think once the server is deleted20:25
gmanndansmith: i see. I think that is the right way: as the server is going to be deleted anyway, clean up the attachment forcefully and tell cinder the same so they can make the volume available20:26
gmannhopefully the volume will be ok to be reused again? if not, then a force volume delete is also needed?20:27
dansmithgmann: so we would need a flag to attach volume that says "don't schedule a wait_for_volume_resource_status because I'm going to delete this server" ?20:27
dansmithgmann: ah, because this volume is shared among other tests in this class?20:27
gmannno, not in this test. I am thinking about the general user scenario where nova forcefully deletes the attachment but the volume is not reusable because of that20:28
dansmithgmann: oh yeah, it has to be reusable for sure20:28
gmannok, then it is fine20:29
dansmithgmann: force detach still tries to do it gracefully first, it just forces if it doesn't go easily20:29
dansmithmelwitt: right?20:29
gmanndansmith: yeah, in the latter case, what will we do with the volume (i mean, tell cinder)?20:29
dansmithI guess I'm echoing what I heard about brick's detach20:30
dansmithI haven't chased the process that nova goes through on delete, but it *has* to delete the attachment with cinder20:30
gmannbecause in the case where the user wants to reuse the volume after server delete, I think getting stuck in server delete is better than delete-server-with-force-detach-but-make-volume-unusable20:32
dansmithgmann: do you mean unusable because of the state of the volume, or "unclean unmount from the guest" ?20:32
gmann"unclean unmount from the guest" state is fine which can be modified forcefully 20:33
dansmiththe former is definitely required, and I'm sure we're doing that now, or we'd already be locking volumes when you delete a server and it happens gracefully20:33
gmannk20:33
melwittdansmith: os-brick force detach? yes it does a graceful detach first but if it doesn't complete it will force detach it20:34
dansmithgmann: no, delete of an active server is effectively pulling the plug out, just like hard reboot, so if you leave the volume unclean after that, we did what you asked20:34
gmannand this can be a nice test to reuse volume after the delete-server-with-force-delete-attachment 20:34
dansmithmelwitt: yeah, I'm more talking about the nova part.. surely if you delete a server with a volume attached, nova deletes the attachment in cinder20:34
dansmithotherwise even in the everything-worked case, we'd leave the volume unattachable if we didn't delete the attachment record and put it back to "available"20:35
dansmithgmann: for hard reboot from the docs: "The HARD reboot corresponds to the power cycles of the server."20:36
melwittyes it deletes the attachment in cinder as part of an instance delete in nova20:36
melwittit does that after detaching with os-brick and it ignores errors from os-brick and deletes the attachment regardless only for instance delete20:36
dansmithmelwitt: yeah, cool, so if we make tempest *not* do (or wait for) the attachment delete,20:36
dansmiththen just deleting the server will (a) clean up cinder, (b) force-disconnect with brick and not hang and (c) delete the server20:36
melwittyeah if we were to add force=True to our os-brick detach call for server delete, it would do the steps as you describe20:38
dansmithyeah20:39
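A hedged sketch of the force-detach behaviour being described, shown in isolation rather than as Nova's actual detach flow; the iSCSI protocol choice and the connection properties are placeholder assumptions:

    from os_brick.initiator import connector

    # Build an iSCSI connector and disconnect; with force=True os-brick tries
    # the graceful path first and forces the detach if that does not complete.
    conn = connector.InitiatorConnector.factory('ISCSI', root_helper='sudo')
    connection_properties = {  # placeholders; normally returned by cinder
        'target_portal': '203.0.113.10:3260',
        'target_iqn': 'iqn.2010-10.org.openstack:volume-example',
        'target_lun': 1,
    }
    device_info = {'path': '/dev/sdb'}  # placeholder; from connect_volume()
    conn.disconnect_volume(connection_properties, device_info,
                           force=True, ignore_errors=True)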
gmanndansmith: melwitt: will that be done by default internally in the delete server flow if the detach does not happen the normal way, or will it be based on a new 'force-detach' flag in the delete server nova API?20:42
dansmithgmann: always, during server delete.. but brick's force detach *tries* graceful first20:43
gmannok20:43
dansmithgmann: just like we do without a volume now.. we ask the server via acpi, but if it doesn't shut down in time, we nuke it from orbit20:43
gmannok20:44
melwittgmann: yeah, volume detach is kind of confusing bc there are multiple steps: 1) detach vol from guest 2) detach vol from host (currently we do not force this) 3) delete attachment in cinder20:44
gmannk, so tempest tests just need to modify the cleanup not to wait for detach things and rely on delete server to do everything 20:46
melwittwe could use the force feature in os-brick at step 2) to force the detach if it doesn't succeed gracefully20:46
gmanni see20:46
dansmithgmann: yeah, I'll put something up in a sec20:47
gmannthanks 20:47
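A hedged sketch of the opt-out dansmith describes, not the actual change proposed below: an attach_volume helper that can skip scheduling the wait for the volume to go back to 'available', leaving the final detach to the server delete. The wait_for_detach flag name is an assumption, and the method is assumed to live in a tempest volume base test class:

    from tempest.common import waiters

    def attach_volume(self, server, volume, wait_for_detach=True):
        """Attach a volume and schedule cleanup of the attachment.

        Cleanups run in reverse order, so the detach request is issued
        first and, only when wait_for_detach is True, the wait for the
        volume to become 'available' runs after it.
        """
        self.servers_client.attach_volume(server['id'], volumeId=volume['id'])
        waiters.wait_for_volume_resource_status(
            self.volumes_client, volume['id'], 'in-use')
        if wait_for_detach:
            self.addCleanup(waiters.wait_for_volume_resource_status,
                            self.volumes_client, volume['id'], 'available')
        self.addCleanup(self.servers_client.detach_volume,
                        server['id'], volume['id'])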
opendevreviewDan Smith proposed openstack/tempest master: Avoid long wait for volume detach in some tests  https://review.opendev.org/c/openstack/tempest/+/87470020:52
dansmithgmann: is that what you had in mind? ^20:53
dansmithcc melwitt 20:53
gmanndansmith: yes. that way20:53
dansmithgmann: so for test cases other than these which might hit the same thing,20:56
dansmithis it easy to add the "log the guest console" thing?20:56
dansmithbecause there's something causing the guest to not release the volume.. that may not manifest on the console, but .. it might 20:56
dansmiththe referenced bug already tried to address some APIC-related reason for this happening (on live migrate I think) but it might be useful to get the console if we fail to wait for deleting the attachment20:57
dansmither, if we hit the timeout waiting for the attach delete, which has failed, I mean20:57
gmannyou mean during detach_volume call itself?20:58
dansmithgmann: no, I mean for any other volume test that may be doing a detach and then wait (i.e. not passing this flag).. if we timeout waiting for the detach, we should log the console20:59
gmanndansmith: yeah we can put that in waiter method itself which can be helpful for other volume state error also21:02
dansmithgmann: can you lazy internet me a link to what to shove in there? :D21:02
dansmithgmann: as you know, I'm very lazy21:02
gmanndansmith: but we do two type of wait for detach to confirm 1. volume status -https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/volume/base.py#L193  2. wait_for_volume_attachment_remove_from_server as in compute test base class https://github.com/openstack/tempest/blob/3f9ae1349768ee7ad7f163a302dd387847ebce7a/tempest/api/compute/base.py#L612-L61521:03
dansmithgmann: the ack, should be in both places21:04
gmanndansmith: so we can do log console here https://github.com/openstack/tempest/blob/1569290be06e61d63061ae35a997aff0ebad68f1/tempest/common/waiters.py#L33721:04
gmanndansmith: and in 2nd place we already do https://github.com/openstack/tempest/blob/1569290be06e61d63061ae35a997aff0ebad68f1/tempest/common/waiters.py#L40521:05
dansmithgmann: aha, cool, I'll add it to the former then21:06
gmanndansmith: ok, the former one needs to pass the server id also, which it does not have currently21:06
dansmithgmann: ack21:07
gmannbut as this is a generic method used by other volume tests, we can output the console based on whether server_id is None or not21:07
dansmithgmann: ah, we'd need servers_client to do that right?21:17
gmanndansmith: right21:19
dansmithgmann: so are you okay passing both of those (optionally) in there?21:19
gmanndansmith: yeah21:19
dansmithokay21:20
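A hedged sketch of the waiter tweak being discussed: let wait_for_volume_resource_status optionally take a server_id and servers_client so it can dump the guest console on timeout. The parameter names and the simplified loop are assumptions, not the eventual tempest change:

    import time

    from oslo_log import log as logging
    from tempest.lib import exceptions as lib_exc

    LOG = logging.getLogger(__name__)

    def wait_for_volume_resource_status(client, volume_id, status,
                                        server_id=None, servers_client=None):
        """Wait for a volume to reach a status, logging the guest console
        of the associated server (if given) when the wait times out."""
        start = int(time.time())
        while True:
            volume = client.show_volume(volume_id)['volume']
            if volume['status'] == status:
                return
            if int(time.time()) - start >= client.build_timeout:
                if server_id is not None and servers_client is not None:
                    output = servers_client.get_console_output(
                        server_id)['output']
                    LOG.debug('Console output for %s:\n%s', server_id, output)
                raise lib_exc.TimeoutException(
                    'Volume %s failed to reach %s status within %s seconds' %
                    (volume_id, status, client.build_timeout))
            time.sleep(client.build_interval)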
dansmithgmann: also, further investigation reveals that we already do the right thing in nova here21:20
gmannyou mean for force detach thing?21:21
dansmithgmann: if you look at the log, we already force delete the instance with fire first and then deal with the volumes afterwards21:21
dansmithgmann: don't even need force detach for this case21:21
dansmithgmann: right after we fail to wait for the detach in tempest, this happens:21:21
dansmith[instance: 91b2ff57-588c-454c-aad1-3e67749420ee] Instance destroyed successfully.21:21
dansmiththen this:21:21
dansmith[instance: 91b2ff57-588c-454c-aad1-3e67749420ee] calling os-brick to detach iSCSI Volume 21:22
dansmithwhich succeeds 21:22
dansmithso nova doesn't even try to detach the volume before it deletes the instance, it waits until the instance can't possibly be using it anymore and then does the disconnect21:22
dansmithso I think just this tempest patch will likely improve gate things21:23
*** jpena is now known as jpena|off21:25
gmannbut in this test detach is happening before delete server and failing. so you are saying leaving detach things to delete server will clean it up correctly ?21:25
dansmithgmann: yes, I'm saying just this tempest change, and no nova side change is required21:26
gmanndansmith: ok21:26
opendevreviewGhanshyam proposed openstack/tempest master: Fix tempest-full-py3 for stable/ussuri to wallaby  https://review.opendev.org/c/openstack/tempest/+/87470421:31
gmannykarel: ^^ this will fix the stable/wallaby and older job21:33
kopecmartingmann: i'm trying to figure out the fix for this bug - https://bugs.launchpad.net/tempest/+bug/2007973 .. it affects all slow jobs, there are quite a few of them and on different branches .. wouldn't it be easier to make the fix in devstack? something like: if it's the new cirros image, set the dhcp_client in tempest.conf accordingly21:38
kopecmartinwdyt, would it work?21:38
gmannkopecmartin: is this same as what ykarel reported https://review.opendev.org/c/openstack/devstack/+/859773?tab=comments21:39
gmannI am checking the same and it seems we need to revert the cirros bump to 0.6.1 to unblock gate first and then we can debug ?21:40
opendevreviewGhanshyam proposed openstack/devstack master: Revert "Bump cirros version to 0.6.1"  https://review.opendev.org/c/openstack/devstack/+/87462521:41
gmannkopecmartin: ^^21:41
kopecmartingmann: yup, i opened that bug based on ykarel's feedback 21:42
kopecmartingmann: probably easier to revert and figure it out .. although we know what's wrong21:42
kopecmartini just don't know how to set it effectively in the jobs21:42
kopecmartinany job which will use the newer cirros version needs to set scenario.dhcp_client to dhcpcd in tempest.conf21:43
kopecmartinit's impossible to go this way - https://review.opendev.org/c/openstack/tempest/+/874586/1/zuul.d/integrated-gate.yaml - too many job variants 21:44
gmannkopecmartin: then you need to do it via a config option in tempest.conf and set that from devstack, so that it will be set in all jobs on master using the new cirros, and jobs on stable using devstack with the old cirros21:44
kopecmartinso maybe if we added a condition to devstack like - if cirros >=0.6.1 than set the opt 21:44
kopecmartinexactly , good 21:45
gmannbecause devstack master configures the new cirros, so the setting can be added there without a condition, and in the tempest config option we can keep the old dhcp client as the default so we do not need to change devstack21:46
gmannbut to merge those we need to revert devstack change first21:47
kopecmartinomg :D 21:47
kopecmartinit's really easy to get locked out21:48
gmannkopecmartin it's release time so expect everything :)21:49
kopecmartingmann: wait, do we need to revert that? instead of the revert can't we just set the proper dhcp client here https://opendev.org/openstack/devstack/src/branch/master/.zuul.yaml#L578 21:59
gmannkopecmartin: that will break stable branch job. that is why we need config option and set that from devstack22:01
gmanndevstack master set that as it will use cirrors new version and devstack stable branch will not set so default will work22:01
kopecmartini got lost in it, i don't understand how a change in master can break stable jobs when devstack is branched 22:03
gmannkopecmartin: tempest jobs from master are used to run on stable also, right, so any change in the job configuration will impact stable22:04
gmannunless you are adding it as a condition, but that only solves the job, not a tempest run with the new cirros in production22:05
gmannkopecmartin: that is why we need to set that new config from devstack which is branched and will take care of old and new things automatically 22:06
gmannlike any other feature flag22:06
gmannkopecmartin: ohk, you are saying to set it via the devstack job. that will work, but this is not job specific, right? we should set it in lib/tempest so that any local installation also works fine22:29
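A hedged sketch of the tempest side of the approach settled on above: keep the old client as the option default so stable branches need no change, while devstack master (which installs the new cirros) sets it to dhcpcd. The choices list and help text are assumptions; only the option name, the scenario group, the udhcpc default and the dhcpcd value come from this discussion:

    from oslo_config import cfg

    scenario_group = cfg.OptGroup(name='scenario',
                                  title='Scenario Test Options')

    ScenarioGroup = [
        cfg.StrOpt('dhcp_client',
                   default='udhcpc',
                   choices=['udhcpc', 'dhclient', 'dhcpcd', ''],
                   help='DHCP client used by the guest image to renew its '
                        'lease; cirros >= 0.6.1 ships dhcpcd instead of '
                        'udhcpc.'),
    ]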
