opendevreview | melanie witt proposed openstack/nova master: Add func test for nova-manage db archive_deleted_rows --before https://review.opendev.org/c/openstack/nova/+/796744 | 01:46 |
---|---|---|
opendevreview | melanie witt proposed openstack/nova master: Add --task-log option to nova-manage db archive_deleted_rows https://review.opendev.org/c/openstack/nova/+/780395 | 01:57 |
melwitt | lyarwood, elodilles: it's funny passing CI \o/ (but recheck-a-thon) https://review.opendev.org/c/openstack/nova/+/795432 | 02:00 |
melwitt | s/funny/finally/ | 02:00 |
gibi | lyarwood: hi! regarding https://review.opendev.org/c/openstack/nova/+/796523 I'm sure I asked this before but forgot. Where do we have now the evacuation test coverage? | 07:11 |
opendevreview | Merged openstack/nova stable/rocky: libvirt:driver:Disallow AIO=native when 'O_DIRECT' is not available https://review.opendev.org/c/openstack/nova/+/747612 | 07:18 |
opendevreview | Merged openstack/nova stable/wallaby: Neutron fixture: don't clobber profile and vif_details if empty https://review.opendev.org/c/openstack/nova/+/792233 | 07:18 |
opendevreview | Merged openstack/nova stable/wallaby: Test SRIOV port move operations with PCI conflicts https://review.opendev.org/c/openstack/nova/+/790710 | 07:18 |
*** rpittau|afk is now known as rpittau | 07:22 | |
opendevreview | Yongli He proposed openstack/nova master: smartnic support https://review.opendev.org/c/openstack/nova/+/758944 | 07:31 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - reject server move and suspend https://review.opendev.org/c/openstack/nova/+/779913 | 07:31 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - functional tests https://review.opendev.org/c/openstack/nova/+/780147 | 07:31 |
lyarwood | melwitt: awesome :) I'll +1 only as I modified it | 07:43 |
lyarwood | gibi: it's part of the live migration jobs | 07:44 |
lyarwood | gibi: runs in the post playbook | 07:44 |
lyarwood | gibi: https://github.com/openstack/nova/tree/master/roles/run-evacuate-hook is the role we use | 07:45 |
lyarwood | gibi: https://github.com/openstack/nova/blob/master/playbooks/nova-live-migration/post-run.yaml is where it's called | 07:45 |
lyarwood | gibi: the logic being that we didn't want to stand up another multinode env every run to test evacuation | 07:46 |
lyarwood | gibi: doing it in post was easier as we didn't need to copy and paste any of the tempest playbook logic into Nova | 07:46 |
lyarwood | gibi: so for that review evacuation is tested from here https://zuul.opendev.org/t/openstack/build/057093756ca64ef994584e2cae50f537/log/job-output.txt#64392 | 07:50 |
opendevreview | Merged openstack/nova stable/victoria: Reproduce bug 1897528 https://review.opendev.org/c/openstack/nova/+/791767 | 07:50 |
gibi | lyarwood: thanks | 08:03 |
gibi | I hope I will not forget this again :) | 08:03 |
*** akekane__ is now known as abhishekk | 08:04 | |
lyarwood | ^_^ | 08:14 |
opendevreview | Yongli He proposed openstack/nova master: Smartnic support - cyborg drive https://review.opendev.org/c/openstack/nova/+/771362 | 09:01 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - new vnic type https://review.opendev.org/c/openstack/nova/+/771363 | 09:01 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support https://review.opendev.org/c/openstack/nova/+/758944 | 09:01 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - reject server move and suspend https://review.opendev.org/c/openstack/nova/+/779913 | 09:01 |
opendevreview | Yongli He proposed openstack/nova master: smartnic support - functional tests https://review.opendev.org/c/openstack/nova/+/780147 | 09:01 |
opendevreview | Merged openstack/nova stable/victoria: Ignore PCI devices with 32bit domain https://review.opendev.org/c/openstack/nova/+/791768 | 09:02 |
yonglihe | rebase to fix dependency problem, that's weird. | 09:03 |
lyarwood | yonglihe: A pip dependency problem? We've seen loads that make no sense recently. | 09:19 |
* lyarwood really needs to write something up on the ML to see if other projects are also hitting it | 09:19 | |
stephenfin | lyarwood: it's a cache issue, I think | 09:21 |
lyarwood | oh the limestone thing? | 09:21 |
stephenfin | yeah, I think so | 09:21 |
lyarwood | wonderful | 09:22 |
stephenfin | it's failing with e.g. dep a requesting >=1.2 and upper constraints requesting == 3.0, which would pass unless 3.0 wasn't available | 09:22 |
stephenfin | hmm, maybe not actually - the error message I get locally is different | 09:24 |
* stephenfin looks at the failure from yonglihe | 09:24 | |
lyarwood | ack thanks | 09:25 |
lyarwood | that makes sense now if the cache is borked | 09:25 |
stephenfin | yonglihe: the failure on https://review.opendev.org/c/openstack/nova/+/758944/ looks real? | 09:26 |
stephenfin | if vnic_type in network_model.VNIC_TYPES_ACCELERATOR: | 09:26 |
stephenfin | AttributeError: module 'nova.network.model' has no attribute 'VNIC_TYPES_ACCELERATOR' | 09:26 |
stephenfin | (from https://zuul.opendev.org/t/openstack/build/e08dc74546d34d9a8ee67e597ade8fb2) | 09:26 |
stephenfin | elodilles: lyarwood: Care to keep working through this backport series? The victoria patches have landed now and this is another clean backport https://review.opendev.org/q/topic:%2522bug/1897528%2522+branch:stable/ussuri | 09:28 |
lyarwood | ack looking | 09:29 |
lyarwood | elodilles: https://review.opendev.org/c/openstack/nova/+/796626 - can you also take a look at this on master if you get a chance, moving the cherry-pick script out of pep8. | 09:30 |
elodilles | sure, looking at the patches :) | 09:32 |
gibi | lyarwood, stephenfin: yesterday infra turned off limestone due to the pip cache issue | 09:42 |
gibi | so we should not see these nonsensical req conflicts | 09:43 |
gibi | any more today | 09:43 |
yonglihe | stephenfin, that's because that patch lost the dependency on the second patch, fixed. | 09:45 |
lyarwood | wonderful | 09:45 |
lyarwood | gibi: https://bugs.launchpad.net/cinder/+bug/1932287 just caught this if you see any random volume creation failures today | 09:46 |
gibi | lyarwood: thanks, I haven't seen that issue yet | 09:47 |
opendevreview | Merged openstack/nova stable/rocky: Remove allocations before setting vm_status to SHELVED_OFFLOADED https://review.opendev.org/c/openstack/nova/+/771985 | 09:47 |
stephenfin | elodilles: Yeah, as lyarwood said, we need to move the cherry-pick change out of the pep8 job. I hadn't seen that failure | 09:48 |
* stephenfin respins | 09:48 | |
gibi | lyarwood: with the exit code 139 lvs complains about missing devices and that I saw before | 09:50 |
* gibi digging up job results | 09:50 | |
lyarwood | yeah https://review.opendev.org/c/openstack/cinder/+/783660 fixed it elsewhere | 09:50 |
lyarwood | just not in this path | 09:51 |
gibi | lyarwood: cool, then we have a way forward | 09:51 |
opendevreview | Stephen Finucane proposed openstack/nova stable/ussuri: Reproduce bug 1897528 https://review.opendev.org/c/openstack/nova/+/791770 | 09:51 |
opendevreview | Stephen Finucane proposed openstack/nova stable/ussuri: Ignore PCI devices with 32bit domain https://review.opendev.org/c/openstack/nova/+/791771 | 09:51 |
stephenfin | elodilles: lyarwood: fixed the pep8 failure ^ | 09:52 |
gibi | lyarwood: I'm hitting https://bugs.launchpad.net/nova/+bug/1912310 many times now and almost always in the test_volume_backed_live_migration tempest test. Wondering if it's worth disabling that test until the ovsdbapp fix lands | 09:52 |
opendevreview | Stephen Finucane proposed openstack/nova stable/train: Reproduce bug 1897528 https://review.opendev.org/c/openstack/nova/+/792116 | 09:53 |
opendevreview | Stephen Finucane proposed openstack/nova stable/train: Ignore PCI devices with 32bit domain https://review.opendev.org/c/openstack/nova/+/792117 | 09:53 |
stephenfin | and the train ones are updated now too | 09:53 |
lyarwood | gibi: ack lets do it, I'll disable them now | 09:57 |
opendevreview | Lee Yarwood proposed openstack/nova master: zuul: Skip volume backed LM tests until bug #1912310 is resolved https://review.opendev.org/c/openstack/nova/+/796813 | 10:04 |
lyarwood | gibi: ^ hopefully that's enough, if it isn't then we might want to move the LM jobs to non-voting | 10:04 |
opendevreview | Stephen Finucane proposed openstack/nova master: db: Reintroduce validation of shadow table schema https://review.opendev.org/c/openstack/nova/+/796814 | 10:11 |
stephenfin | lyarwood: gibi: one final one, as requested ^ | 10:11 |
gibi | lyarwood: thanks | 10:12 |
elodilles | stephenfin: actually i was surprised that pep8 is failing in ussuri because of py27/six problem as py27 should be supported only up until train :-o | 10:13 |
stephenfin | elodilles: yeah, we simply weren't aggressive enough in dropping the no-longer relevant hacking checks | 10:13 |
elodilles | oh, i see | 10:13 |
stephenfin | I would personally like to backport the patch that dropped the check, but I don't know what you think about that. I can't imagine that would violate stable policy since it's nothing to do with production code | 10:15 |
stephenfin | (commit 9dca0d186f834c38d0d06e226b18ab3ae717c140 fwiw) | 10:15 |
lyarwood | Yup I assumed we would tbh | 10:17 |
lyarwood | no reason to leave it just on >=stable/xena | 10:18 |
elodilles | stephenfin: well, it formally violates, as it is a blueprint o:) ... anyway, I would stick to backporting only bug fixes... but... given that py27 is not supported in ussuri anymore... anyway I'm a bit unsure... o:) | 10:21 |
lyarwood | oh sorry I thought we were talking about the cherry-pick script | 10:24 |
elodilles | lyarwood: actually I've missed that discussion :X Are you planning to move out the cherry-pick-check from pep8? | 10:35 |
elodilles | lyarwood: nevermind, i'm just a bit slow today :D | 10:36 |
lyarwood | elodilles: https://review.opendev.org/c/openstack/nova/+/796626 yeah that's the idea, make it non-voting in check and only voting in the gate | 10:36 |
elodilles | lyarwood: yeah, sorry :X | 10:36 |
sean-k-mooney | lyarwood: instead of skipping the test https://review.opendev.org/c/openstack/nova/+/796813 why not just use the old driver | 10:45 |
sean-k-mooney | the stalling issue only happens if you use the native driver | 10:45 |
sean-k-mooney | the vsctl one won't have that problem | 10:45 |
lyarwood | We could but unless that's done in devstack across all jobs we end up with a mixed set of jobs | 10:50 |
sean-k-mooney | is that a bad thing | 10:51 |
sean-k-mooney | with the current patch you're just reducing coverage | 10:51 |
sean-k-mooney | but the issue can still happen | 10:51 |
sean-k-mooney | i'm fine with making that change in devstack temporarily | 10:52 |
lyarwood | kk if you could post that we can yank or revert this | 10:52 |
sean-k-mooney | i left a -1 on the patch already but would you like me to do the devstack change | 10:53 |
sean-k-mooney | just finishing an email but i can do it then | 10:54 |
lyarwood | sean-k-mooney: ack | 10:56 |
lyarwood | stephenfin: can you yank the +W on https://review.opendev.org/c/openstack/nova/+/796813 | 10:56 |
lyarwood | stephenfin: sean-k-mooney is going to work around this in devstack | 10:56 |
stephenfin | sure, done | 10:56 |
lyarwood | ta | 10:56 |
lyarwood | noice, the FIPS fallout doesn't look that bad | 11:05 |
lyarwood | https://6cbf38d10f57b850b36e-212ab268b5e4bbb4b3348f98a2a831ee.ssl.cf5.rackcdn.com/790519/6/check/nova-fips/914f344/testr_results.html | 11:05 |
lyarwood | paramiko as expected and a server create timeout | 11:06 |
lyarwood | and that smells like the ovs locking up issue | 11:07 |
sean-k-mooney | lyarwood: cool | 11:12 |
sean-k-mooney | by the way i have found an interesting ceph issue that, well, i know how to fix but don't know how to detect | 11:12 |
sean-k-mooney | lyarwood: are you familiar with EC pools in ceph | 11:13 |
sean-k-mooney | i was following https://docs.ceph.com/en/latest/rbd/rbd-openstack/ to configure ceph for openstack in general and https://themeanti.me/technology/2018/08/23/ceph_erasure_openstack.html for the ec pools | 11:13 |
sean-k-mooney | and i missed a step kind of | 11:14 |
sean-k-mooney | since i have a vms pool for nova and a vms_data pool | 11:14 |
sean-k-mooney | when i was doing the cephx user caps configuration i needed to list both the vms pool and vms_data pool | 11:15 |
sean-k-mooney | i only listed vms | 11:15 |
sean-k-mooney | the result of which is that nova booted a vm and it went into the active state | 11:15 |
sean-k-mooney | but it could not read or write its root disk | 11:15 |
sean-k-mooney | the root disk had all the data present but it was inaccessible to qemu | 11:16 |
sean-k-mooney | lyarwood: do you think that qemu might be able to detect that and create a warning/error or could we detect that somehow and create an error | 11:17 |
sean-k-mooney | nova is calling ceph directly to get the available storage | 11:17 |
sean-k-mooney | i'm debating if it would make sense for nova to try and create a volume and read from it on the host or something on startup of the agent | 11:18 |
lyarwood | I'm not sure how QEMU could catch that tbh | 11:22 |
lyarwood | tbh that smells more like a deployment tooling validation? | 11:23 |
lyarwood | I wouldn't ask Nova to check it | 11:23 |
sean-k-mooney | ok its just annoying to debug | 11:23 |
sean-k-mooney | there is no error in qemu or nova or ceph | 11:23 |
sean-k-mooney | the vm just cant find a bootable disk | 11:24 |
sean-k-mooney | lyarwood: the deployment tool i'm using does not technically support this anymore which is why i messed it up | 11:24 |
sean-k-mooney | kolla-ansible now just has external ceph support | 11:24 |
lyarwood | that could still be a validation for external ceph | 11:25 |
sean-k-mooney | so you predeploy ceph with your favorite tool and then pass it a few files like the keyrings and it does the rest | 11:25 |
lyarwood | that the keyring has r/w access | 11:25 |
sean-k-mooney | it could yes | 11:25 |
sean-k-mooney | there are post-run checks which i did not run | 11:25 |
sean-k-mooney | i might add one for this | 11:25 |
sean-k-mooney | i was going to try and update their docs later anyway to document some of the more advanced customisation that i'm doing | 11:26 |
sean-k-mooney | for example running all of the openstack services on the same port but with different subdomains | 11:26 |
sean-k-mooney | lyarwood: stephenfin https://review.opendev.org/c/openstack/devstack/+/796826 | 11:54 |
sean-k-mooney | i think that will do the right thing | 11:55 |
lyarwood | LGTM but I'll wait for CI to run before I vote | 12:09 |
sean-k-mooney | i have not had time to test that so that is proably a good idea :) | 12:12 |
lyarwood | Small nit in the commit message btw, you called out the wrong bug. | 12:14 |
sean-k-mooney | oh | 12:14 |
sean-k-mooney | i can fix it but might wait for the ci to finish | 12:14 |
lyarwood | yeah no issues | 12:14 |
* lyarwood -> lunch brb | 12:15 | |
sean-k-mooney | ah i di | 12:15 |
sean-k-mooney | it should be https://bugs.launchpad.net/nova/+bug/1929446 | 12:15 |
sean-k-mooney | not https://bugs.launchpad.net/ubuntu/+source/grub-installer/+bug/1929466 | 12:15 |
sean-k-mooney | 446 not 466 | 12:15 |
sean-k-mooney | lyarwood: its going to fail | 12:17 |
sean-k-mooney | opt/stack/devstack/lib/os-vif: line 12: return: False: numeric argument required | 12:17 |
sean-k-mooney | i forgot you can't return strings in bash | 12:18 |
sean-k-mooney | you echo them | 12:18 |
opendevreview | Merged openstack/nova master: db: Remove dead code https://review.opendev.org/c/openstack/nova/+/786291 | 12:19 |
opendevreview | Merged openstack/nova master: gate: Remove test_evacuate.sh https://review.opendev.org/c/openstack/nova/+/796523 | 12:19 |
opendevreview | Rodrigo Barbieri proposed openstack/nova stable/ussuri: Error anti-affinity violation on migrations https://review.opendev.org/c/openstack/nova/+/796719 | 12:50 |
opendevreview | Merged openstack/nova stable/stein: Improve error log when snapshot fails https://review.opendev.org/c/openstack/nova/+/782962 | 13:06 |
opendevreview | Merged openstack/nova stable/ussuri: Reproduce bug 1897528 https://review.opendev.org/c/openstack/nova/+/791770 | 13:06 |
opendevreview | Lee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job https://review.opendev.org/c/openstack/nova/+/796684 | 13:13 |
opendevreview | Lee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job https://review.opendev.org/c/openstack/nova/+/796684 | 13:18 |
lyarwood | gah! | 13:19 |
opendevreview | Lee Yarwood proposed openstack/nova master: zuul: Add nova-tox-functional-centos8-py36 job https://review.opendev.org/c/openstack/nova/+/796684 | 13:19 |
* gibi is frustrated that https://review.opendev.org/c/openstack/nova/+/796255 needed a 10th recheck :/ | 13:47 | |
lyarwood | gibi: sean-k-mooney is working on https://review.opendev.org/c/openstack/devstack/+/796826 to hopefully resolve lots of instability | 13:50 |
sean-k-mooney | i wonder why we are hitting this so much more often recently | 13:51 |
lyarwood | maybe we are just noticing it more recently, it's an awkward one. | 13:53 |
sean-k-mooney | ya we also kind of mentally filter out those lines in the log | 13:53 |
sean-k-mooney | at least i do most of the time | 13:54 |
lyarwood | right takes some processing of timestamps to even see the issue but most of the time the ultimate test failure is miles away from that | 13:59 |
lyarwood | sometimes I wish I worked on an easier stack :) | 13:59 |
sean-k-mooney | lyarwood: gibi it's almost finished the check run, by the way the current version seems to be working | 13:59 |
lyarwood | ack yeah I've been watching | 13:59 |
lyarwood | looking good thus far | 13:59 |
noonedeadpunk | o/ | 14:04 |
noonedeadpunk | folks we noticed weird behaviour that you're probably aware about | 14:04 |
opendevreview | Mohammed Naser proposed openstack/nova stable/wallaby: Allow X-OpenStack-Nova-API-Version header in CORS https://review.opendev.org/c/openstack/nova/+/796860 | 14:05 |
opendevreview | Mohammed Naser proposed openstack/nova stable/victoria: Allow X-OpenStack-Nova-API-Version header in CORS https://review.opendev.org/c/openstack/nova/+/796861 | 14:06 |
opendevreview | Mohammed Naser proposed openstack/nova stable/ussuri: Allow X-OpenStack-Nova-API-Version header in CORS https://review.opendev.org/c/openstack/nova/+/796862 | 14:06 |
opendevreview | Mohammed Naser proposed openstack/nova stable/train: Allow X-OpenStack-Nova-API-Version header in CORS https://review.opendev.org/c/openstack/nova/+/796863 | 14:07 |
noonedeadpunk | So, the sequence is roughly the following: 1. HV goes down. 2. VM is sent Shutdown (or any other request). 3. The VM is then in the `powering-off` state, but it needs to be evacuated, so reset-state is issued and the evacuate is processed. Now the VM is running on another HV. 4. When the original HV comes back up it processes the messages that were issued while it was down and powers off the VM that was evacuated and is owned by another HV atm | 14:07 |
noonedeadpunk | I have a feeling that if node is not owning VM it should not have ability to influence it even if it has some commands in queue? | 14:08 |
noonedeadpunk | and maybe you have some guess where in code worth looking for this? | 14:08 |
lyarwood | so the compute manager that gets the cast in this case isn't doing any checks to ensure the instance is still on that host | 14:10 |
lyarwood | I guess it's a valid thing to do for any operations using casts | 14:10 |
noonedeadpunk | yeah, I expect smth like that is happening. But not super familiar with codebase :( | 14:10 |
sean-k-mooney | noonedeadpunk: why are you doing reset-state in your evacuate workflow | 14:11 |
sean-k-mooney | noonedeadpunk: you should not be doing reset state first | 14:11 |
noonedeadpunk | well, otherwise it can't be evacuated with `ERROR (Conflict): Cannot 'evacuate' instance e46404b1-e6e1-4d22-9f8f-12d6f51b55ae while it is in task_state powering-off` | 14:11 |
sean-k-mooney | hum | 14:11 |
sean-k-mooney | i see | 14:12 |
noonedeadpunk | Is there any other proper way to do evacuate? | 14:12 |
gibi | lyarwood, sean-k-mooney thanks. I'm happy to see that this week a lot of us focused on stabilizing the gate. | 14:12 |
noonedeadpunk | I mean technicaly we could wait until node goes up, but it might be days theoretically? | 14:12 |
lyarwood | tbh I think we should allow evacuate if the instance is powering-off | 14:12 |
lyarwood | either way the src compute is dead | 14:12 |
sean-k-mooney | yep i was thinking the same | 14:13 |
noonedeadpunk | but it won't resolve original issue though | 14:13 |
noonedeadpunk | as then evacuated instance would be shot anyway | 14:13 |
lyarwood | well it shouldn't kill the instance on the dest | 14:13 |
noonedeadpunk | (but agree it's super valid to allow evacuate) | 14:13 |
lyarwood | the cast to shutdown the original instance on the original host should fail | 14:13 |
noonedeadpunk | fwiw it's on Victoria | 14:14 |
lyarwood | but that has the potential of moving the instance into an ERROR state | 14:14 |
lyarwood | a simple decorator to check that instance.host points at the current host would work here | 14:14 |
lyarwood | it might not work everywhere we cast | 14:15 |
lyarwood | but in this example it's fine | 14:15 |
lyarwood | noonedeadpunk: did you have a bug for this? | 14:15 |
noonedeadpunk | nope, not yet, but will submit one :) | 14:15 |
lyarwood | awesome thanks | 14:16 |
noonedeadpunk | or maybe even two... | 14:16 |
gibi | there is a recent bug asking for evacuating in soft-delete state https://bugs.launchpad.net/nova/+bug/1932126 | 14:17 |
sean-k-mooney | we spoke about allowing it in other state at the ptg | 14:17 |
sean-k-mooney | like paused/suspended | 14:18 |
noonedeadpunk | well... soft delete is really corner case imo... | 14:18 |
lyarwood | I'm not sure that soft-delete makes sense | 14:18 |
bauzas | soft-deleted in the Nova API or in the database ? | 14:18 |
sean-k-mooney | im not sure soft delete makes much sense | 14:18 |
noonedeadpunk | once you will evacuate it it would be already time to delete instance... | 14:18 |
sean-k-mooney | lyarwood: :) | 14:18 |
bauzas | ah, this | 14:19 |
sean-k-mooney | i guess the use case is to undelete it | 14:19 |
bauzas | (nova api soft delete, that's it) | 14:19 |
bauzas | well, i do understand the concern from an operator pov | 14:19 |
lyarwood | TIL we can do that | 14:19 |
bauzas | if you wanna evacuate, you're in a rush | 14:19 |
sean-k-mooney | i would be ok with making undelete work when the host is down and then allow evacuate | 14:19 |
lyarwood | for some reason I didn't think we had a way back | 14:19 |
sean-k-mooney | but evacuate on a soft-deleted instance for me would have to undelete it | 14:19 |
bauzas | sean-k-mooney: or we could just not rebuild the instance | 14:20 |
bauzas | it's soft deleted in the source host | 14:20 |
bauzas | so the target should just not rebuild the instance | 14:20 |
sean-k-mooney | bauzas: so rebuild when the hosts is down | 14:20 |
bauzas | nah | 14:20 |
bauzas | not rebuild the soft-deleted instance | 14:21 |
sean-k-mooney | but why would we keep it deleted | 14:21 |
bauzas | but the evacuate API should work | 14:21 |
sean-k-mooney | not for soft deleted | 14:21 |
bauzas | because it's already deleted | 14:21 |
bauzas | someone asked the instance to be deleted | 14:21 |
bauzas | then the host got an issue | 14:22 |
sean-k-mooney | yep | 14:22 |
bauzas | so the operator would recreate the instances in a target | 14:22 |
sean-k-mooney | at which point i think we should just treat it as if it has been deleted fully | 14:22 |
bauzas | sean-k-mooney: that's my point | 14:22 |
bauzas | when saying to not rebuild it on the target | 14:22 |
bauzas | but here the API doesn't work | 14:23 |
bauzas | so, we should provide an HTTP 200 for a soft-delete evacuate | 14:23 |
bauzas | but not recreating it | 14:23 |
bauzas | anyway, needs to get my kids from the school | 14:24 |
gibi | I think the use case could be | 14:27 |
gibi | 1) user deletes the VM | 14:27 |
gibi | 2) soft deleting is enabled so the VM is just soft-deleted | 14:27 |
gibi | 3) host goes down | 14:27 |
gibi | 4) user realizes that there was a mistake deleting the VM and calls restore | 14:28 |
gibi | 5) restore fails as the host is down | 14:28 |
gibi | what to do nwo | 14:28 |
gibi | now | 14:28 |
sean-k-mooney | so the user cannot know if soft delete is available or the time to restore | 14:28 |
sean-k-mooney | to me we can make the restore call work when the host is down but we should not change the evacuate behavior IMO | 14:30 |
sean-k-mooney | so have restore just undelete it in the db | 14:30 |
gibi | OK so then the restore + evacuate would work | 14:31 |
sean-k-mooney | yep | 14:31 |
gibi | that is acceptable to me | 14:31 |
gibi | but | 14:31 |
gibi | soft-delete, host down, restore (undelete in db), host up sequence would lead to inconsistency | 14:31 |
sean-k-mooney | well when the host comes up it will need to check the db state | 14:32 |
sean-k-mooney | before completing the soft delete action correct | 14:32 |
sean-k-mooney | e.g. when the compute comes back up it should see the vm was evacuated | 14:33 |
sean-k-mooney | or if it was just restored | 14:33 |
gibi | there was no evacuation in this sequence | 14:33 |
sean-k-mooney | then it would see that it's been restored in the db | 14:33 |
sean-k-mooney | so it would need to handle that | 14:33 |
gibi | today restore sets the power state back to running. the db only restore would set it to shutoff? | 14:34 |
gibi | anyhow I have to run | 14:36 |
gibi | my days are soo random, I don't feel productive | 14:36 |
noonedeadpunk | https://bugs.launchpad.net/nova/+bug/1932326 | 14:37 |
sean-k-mooney | noonedeadpunk: thanks | 14:46 |
sean-k-mooney | noonedeadpunk: just going to triage this quickly, how impactful is it to you production-wise | 14:47 |
sean-k-mooney | i'm leaning towards medium or low since there is no data loss but there is a workload outage | 14:47 |
sean-k-mooney | noonedeadpunk: i.e. you can just fix it by starting the vm again | 14:48 |
noonedeadpunk | I'd say it's closer to medium I guess, because as a public cloud provider it's hard to explain why a customer VM went down a day after a previous outage | 14:48 |
noonedeadpunk | and you can start it when you own vm or monitor it | 14:49 |
noonedeadpunk | but if it's not your VM it's hard to even know that it went down | 14:49 |
sean-k-mooney | yep | 14:50 |
noonedeadpunk | As current workaround we will probably attempt to flush queue for compute that went down... | 14:50 |
noonedeadpunk | but it's so nasty imo | 14:50 |
sean-k-mooney | more than likely it will cause the customer to notice a failure of the vm, restore it and file a ticket | 14:50 |
noonedeadpunk | that's exactly what has happened :) | 14:50 |
noonedeadpunk | that's pretty much a corner case though as well | 14:51 |
sean-k-mooney | noonedeadpunk: so if we allow evac in the powering-off state restoring it to shutdown would make sense right | 14:51 |
sean-k-mooney | rather than active | 14:51 |
noonedeadpunk | yes, totally | 14:52 |
noonedeadpunk | and powering-on to active :) | 14:52 |
sean-k-mooney | we had discussed doing that for vms in suspend and pause so including powering-off in that list i think is consistent | 14:52 |
noonedeadpunk | (but it's harder to imagine happening) | 14:52 |
sean-k-mooney | powering-on to active would also make sense | 14:53 |
sean-k-mooney | well i dont know | 14:53 |
sean-k-mooney | if i was a customer and my vm suddenly stopped working i might do a start to see if that fixes it | 14:53 |
noonedeadpunk | I think it will appear as active | 14:54 |
sean-k-mooney | it will yes | 14:54 |
noonedeadpunk | so you are able only to reboot or shutdown? | 14:54 |
sean-k-mooney | but i might not check and just do a start but ya i normally would do hard-reboot | 14:54 |
sean-k-mooney | it's less likely but if we are addressing this we probably should go through all the states and just make them consistent/intuitive | 14:55 |
noonedeadpunk | I mean that you can't start already active instance - you will get same Conflict exception iirc | 14:55 |
sean-k-mooney | ah | 14:55 |
sean-k-mooney | yes probably since it's in the state you want | 14:55 |
noonedeadpunk | yeah | 14:55 |
noonedeadpunk | so powering-on would be really unfortunate co-incidence that will affect most likely only CI toolings or dunno... | 14:57 |
sean-k-mooney | ya its much less likely | 14:58 |
melwitt | gibi, stephenfin: heya, I've updated the --task-log archive patch to address gibi's comments https://review.opendev.org/c/openstack/nova/+/780395 | 14:58 |
sean-k-mooney | we can reason about it though and come to a logical conclusion for what it should do so we probably should just cover it | 14:58 |
stephenfin | melwitt: trying to backport https://review.opendev.org/c/openstack/nova/+/602432 at the moment (it's hell) but I'll hit that again before EOD, hopefully | 15:15 |
melwitt | stephenfin: ok np, and good luck | 15:16 |
kashyap | stephenfin: sean-k-mooney: NUMA-related: you might find it interesting - libvirt upstream is wiring up "HMAT" - which defines the different latencies and bandwidths b/n NUMA nodes: | 15:24 |
kashyap | [quote] | 15:24 |
kashyap | "Links between NUMA nodes can have different latencies and bandwidths. This info is newly defined in ACPI 6.2 under Heterogeneous Memory Attribute Table (HMAT) table. Linux kernel learned how to report these values under sysfs and thus we can expose them in our capabilities XML. The sysfs interface is documented in kernel's Documentation/admin-guide/mm/numaperf.rst." | 15:24 |
kashyap | [/quote] | 15:24 |
kashyap | This is called "NUMA interconnects", apparently: https://listman.redhat.com/archives/libvir-list/2021-June/msg00268.html | 15:24 |
sean-k-mooney | ill take a look | 15:28 |
sean-k-mooney | kashyap: that could be useful yes | 15:46 |
kashyap | Yep; noted | 15:46 |
*** rpittau is now known as rpittau|afk | 16:09 | |
opendevreview | Stephen Finucane proposed openstack/nova stable/wallaby: libvirt: Delegate OVS plug to os-vif https://review.opendev.org/c/openstack/nova/+/790447 | 16:42 |
opendevreview | Stephen Finucane proposed openstack/nova stable/wallaby: fixup! libvirt: Delegate OVS plug to os-vif https://review.opendev.org/c/openstack/nova/+/796891 | 16:42 |
opendevreview | Stephen Finucane proposed openstack/nova stable/wallaby: libvirt: Delegate OVS plug to os-vif https://review.opendev.org/c/openstack/nova/+/790447 | 16:43 |
*** akekane_ is now known as abhishekk | 16:44 | |
opendevreview | Lee Yarwood proposed openstack/nova stable/ussuri: virt: Add destroy_secrets kwarg to destroy and cleanup https://review.opendev.org/c/openstack/nova/+/796262 | 16:50 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: fixtures: Handle binding of first port https://review.opendev.org/c/openstack/nova/+/796905 | 17:04 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: Neutron fixture: don't clobber profile and vif_details if empty https://review.opendev.org/c/openstack/nova/+/796906 | 17:04 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: functional: Add live migration tests for PCI, SR-IOV servers https://review.opendev.org/c/openstack/nova/+/796907 | 17:04 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: Test SRIOV port move operations with PCI conflicts https://review.opendev.org/c/openstack/nova/+/796908 | 17:04 |
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: Update SRIOV port pci_slot when unshelving https://review.opendev.org/c/openstack/nova/+/796909 | 17:04 |
opendevreview | Lee Yarwood proposed openstack/nova stable/victoria: virt: Add destroy_secrets kwarg to destroy and cleanup https://review.opendev.org/c/openstack/nova/+/796259 | 17:06 |
opendevreview | Lee Yarwood proposed openstack/nova stable/victoria: libvirt: Do not destroy volume secrets during _hard_reboot https://review.opendev.org/c/openstack/nova/+/796260 | 17:06 |
opendevreview | Lee Yarwood proposed openstack/nova stable/victoria: Trival Change: Remove redundant code in instance delete https://review.opendev.org/c/openstack/nova/+/796912 | 17:06 |
lyarwood | melwitt: https://review.opendev.org/c/openstack/nova/+/796626 - Would you mind taking a look at this today if you get a chance, stephenfin is looking to split out the cherry-pick.sh script from the pep8 job. | 17:14 |
melwitt | sure | 17:16 |
lyarwood | thanks | 17:16 |
opendevreview | Lee Yarwood proposed openstack/nova stable/ussuri: virt: Add destroy_secrets kwarg to destroy and cleanup https://review.opendev.org/c/openstack/nova/+/796262 | 17:20 |
opendevreview | Lee Yarwood proposed openstack/nova stable/ussuri: Detach is broken for multi-attached fs-based volumes https://review.opendev.org/c/openstack/nova/+/796263 | 17:20 |
opendevreview | Lee Yarwood proposed openstack/nova stable/ussuri: libvirt: Do not destroy volume secrets during _hard_reboot https://review.opendev.org/c/openstack/nova/+/796264 | 17:20 |
opendevreview | Lee Yarwood proposed openstack/nova stable/ussuri: Trival Change: Remove redundant code in instance delete https://review.opendev.org/c/openstack/nova/+/796929 | 17:20 |
melwitt | stephenfin: does this empty deps = do something? https://review.opendev.org/c/openstack/nova/+/796626/2/tox.ini#87 | 17:22 |
sean-k-mooney | melwitt: i believe it prevents us installing any deps | 17:22 |
melwitt | ok | 17:22 |
sean-k-mooney | we just need bash so this should make it slightly faster | 17:23 |
sean-k-mooney | since we are just looking at the commit message | 17:23 |
melwitt | thanks | 17:24 |
*** mdbooth4 is now known as mdbooth | 17:33 | |
opendevreview | Lee Yarwood proposed openstack/nova stable/train: virt: Add destroy_secrets kwarg to destroy and cleanup https://review.opendev.org/c/openstack/nova/+/796935 | 18:19 |
opendevreview | Lee Yarwood proposed openstack/nova stable/train: Detach is broken for multi-attached fs-based volumes https://review.opendev.org/c/openstack/nova/+/796936 | 18:19 |
opendevreview | Lee Yarwood proposed openstack/nova stable/train: Handle unset 'connection_info' https://review.opendev.org/c/openstack/nova/+/796937 | 18:19 |
opendevreview | Lee Yarwood proposed openstack/nova stable/train: Trival Change: Remove redundant code in instance delete https://review.opendev.org/c/openstack/nova/+/796938 | 18:19 |
opendevreview | Lee Yarwood proposed openstack/nova stable/train: libvirt: Do not destroy volume secrets during _hard_reboot https://review.opendev.org/c/openstack/nova/+/796939 | 18:19 |
opendevreview | Merged openstack/nova master: Move 'check-cherry-picks' test to gate, n-v check https://review.opendev.org/c/openstack/nova/+/796626 | 21:42 |
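
Note on the 14:10-14:15 evacuate discussion: lyarwood suggested that "a simple decorator to check that instance.host points at the current host would work here". The sketch below only illustrates that idea and is not Nova code. Every name in it (`require_instance_on_this_host`, `InstanceNotOnHost`, `FakeInstance`, `FakeComputeManager`) is hypothetical, and it assumes the manager knows its own host as `self.host` and that the instance record exposes `host` and `uuid`.

```python
import functools


class InstanceNotOnHost(Exception):
    """Hypothetical error: a cast arrived for an instance owned by another host."""


def require_instance_on_this_host(func):
    """Refuse to run a cast handler if the instance has moved to another host."""
    @functools.wraps(func)
    def wrapper(self, context, instance, *args, **kwargs):
        # A stale message (e.g. queued while this host was down) may refer
        # to an instance that has since been evacuated elsewhere.
        if instance.host != self.host:
            raise InstanceNotOnHost(
                "instance %s is on %s, not %s"
                % (instance.uuid, instance.host, self.host))
        return func(self, context, instance, *args, **kwargs)
    return wrapper


class FakeInstance:
    """Stand-in for an instance record; only the fields the sketch needs."""
    def __init__(self, uuid, host):
        self.uuid = uuid
        self.host = host


class FakeComputeManager:
    """Stand-in for a compute manager running on a single host."""
    def __init__(self, host):
        self.host = host
        self.stopped = []

    @require_instance_on_this_host
    def stop_instance(self, context, instance):
        self.stopped.append(instance.uuid)
        return "stopped"
```

As lyarwood noted in the log, rejecting the cast this way could leave the instance in an ERROR state, and the check might not fit every place a cast is used.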
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
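
Note on the 11:12-11:25 Ceph discussion: lyarwood suggested that deployment tooling could validate "that the keyring has r/w access" to every pool Nova uses. The sketch below is one hedged way such a check might look. It assumes the OSD caps are written as comma-separated `profile rbd pool=<name>` clauses, the form shown in the Ceph OpenStack guide linked in the log; it deliberately ignores other cap styles such as `allow rwx pool=...`, and the function names are hypothetical.

```python
import re

# Matches a read-write "profile rbd pool=<name>" clause only; a clause such
# as "profile rbd-read-only pool=images" does not match on purpose.
_RW_RBD_CLAUSE = re.compile(r"\s*profile\s+rbd\s+pool=(\S+)\s*")


def pools_with_rw_rbd_profile(osd_caps):
    """Return the set of pools granted the read-write rbd profile."""
    pools = set()
    for clause in osd_caps.split(","):
        match = _RW_RBD_CLAUSE.fullmatch(clause)
        if match:
            pools.add(match.group(1))
    return pools


def missing_pools(osd_caps, required_pools):
    """Return the required pools that the caps string does not cover, sorted."""
    return sorted(set(required_pools) - pools_with_rw_rbd_profile(osd_caps))


if __name__ == "__main__":
    # The situation from the log: only "vms" was listed, "vms_data" was not.
    print(missing_pools("profile rbd pool=vms", ["vms", "vms_data"]))
```

A static check like this only inspects the caps text; it does not prove that QEMU can actually read or write the pool.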