opendevreview | HYSong proposed openstack/nova master: fix local volume extend https://review.opendev.org/c/openstack/nova/+/832180 | 02:05 |
---|---|---|
opendevreview | Xuan Yandong proposed openstack/nova master: Remove redundant symbols https://review.opendev.org/c/openstack/nova/+/832185 | 03:29 |
opendevreview | kiran pawar proposed openstack/nova master: VMware: Split out VMwareAPISession https://review.opendev.org/c/openstack/nova/+/832156 | 09:41 |
opendevreview | kiran pawar proposed openstack/nova master: VMware: StableMoRefProxy for moref recovery https://review.opendev.org/c/openstack/nova/+/832164 | 09:41 |
ignaziocassano_ | Hello, sometimes the volume retype from a netapp nfs storage to another netapp nfs storage does not work. I do not know the reason but I think something is going wrong in nova: | 10:10 |
ignaziocassano_ | File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1593, in _swap_volume\n raise exception.VolumeRebaseFailed(reason=six.text_type(exc))\n', "VolumeRebaseFailed: Volume rebase failed: Requested operation is not valid: pivot of disk 'vda' requires an active copy job\n"]: VolumeAttachmentNotFound: Volume attachment 2cd820e0-85e8-498d-a62a-800260d0cf31 could not be found | 10:11 |
ignaziocassano_ | Any help please ? | 10:11 |
kashyap | ignaziocassano_: No direct answer, but that error (from libvirt) means: the "volume retype" (i.e. volume migration) itself is not active | 10:41 |
kashyap | "active copy" == the copy that is on the NFS and is being mirrored from the NetApp storage | 10:42 |
kashyap | Also what version of OSP is this? And also mention libvirt/QEMU versions | 10:42 |
* kashyap --> needs to be AFK briefly | 10:42 |
ignaziocassano_ | kashyap: I am using queens on centos 7 libvirt 4.5.0 QEMU emulator version 2.12.0 (qemu-kvm-ev-2.12.0-33.1.el7_7.4) | 10:45 |
ignaziocassano_ | Sometimes retyped volumes are corrupted and the file system on instances goes read-only | 10:48 |
kashyap | ignaziocassano_: The versions seem moderately old (~2017/2018); lots of storage bugs have been fixed in this area. And that corruption doesn't sound good. | 11:05 |
kashyap | I don't know if this is even reproducible consistently in your env. | 11:05 |
kashyap | So many variables :-( | 11:05 |
sean-k-mooney | ignaziocassano_: what version of NFS are you using? | 11:15 |
sean-k-mooney | ignaziocassano_: nova recommends v4.0 as a minimum, preferably 4.2 | 11:16 |
ignaziocassano_ | sean-k-mooney: I do not know why, but the controllers mount cinder with version 4.0 while the compute nodes are using NFS vers 3 | 11:17 |
sean-k-mooney | we know that v3 has some issues with locking that might affect data integrity | 11:17 |
sean-k-mooney | I'm really not sure how mixing would affect things | 11:18 |
ignaziocassano_ | ok, so I must specify nfs_mount_options in nova.conf on the compute nodes | 11:18 |
sean-k-mooney | it's likely not advisable, but I suspect this is outside the scope of nova. | 11:18 |
ignaziocassano_ | I will try it | 11:19 |
kashyap | Oh, yeah - the NFS version also plays a role. Indeed we recommend a minimum of NFS > 4.2 | 11:19 |
sean-k-mooney | https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.nfs_mount_options | 11:19 |
sean-k-mooney | likely can help | 11:19 |
sean-k-mooney | but i am not that familar with it | 11:19 |
sean-k-mooney | I know downstream we have some recommended options for that related to SELinux | 11:19 |
sean-k-mooney | beyond that i have never really looked at what we suggest setting there | 11:20 |
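As a concrete example of the option linked above, the snippet below pins the NFS version for the Cinder NFS volume mounts on a compute node so they stop negotiating down to NFSv3. The choice of `vers=4.2` is purely illustrative, echoing the minimum-version advice earlier in the conversation; it is not an official recommendation for any particular deployment.

```ini
# nova.conf on the compute nodes -- illustrative example only
[libvirt]
# Options passed to mount.nfs when nova mounts the Cinder NFS shares.
nfs_mount_options = vers=4.2
```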
ignaziocassano_ | thanks for your help | 11:24 |
dmitriis | sean-k-mooney, gibi: zuul seems to be happy about https://review.opendev.org/c/openstack/nova/+/829974 | 14:49 |
sean-k-mooney | ah thanks for the reminder | 15:01 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Add functional tests to reproduce bug #1960412 https://review.opendev.org/c/openstack/nova/+/830010 | 15:10 |
sean-k-mooney | gibi: can you spot-check my expectations: when we shelve an instance, would you expect us to unbind the ports from the host? | 15:36 |
sean-k-mooney | or well, when we shelve offload | 15:36 |
gibi | sean-k-mooney: I think when we offload we should unbind from the host, otherwise the physical resource (i.e. pci device) is not freed. Or can we free up a PCI device without unbinding the port? | 15:39 |
sean-k-mooney | ack, that is what I would expect too, but we don't | 15:39 |
sean-k-mooney | the port should still be attached to the vm, but it should not be bound to a host or an ml2 driver once it's offloaded | 15:40 |
gibi | so we should keep device_id in the port. Is that enough to keep the port reserved in neutron? | 15:41 |
sean-k-mooney | yes, the device_id is enough to track ownership | 15:41 |
sean-k-mooney | but binding:host_id should be set to '' | 15:42 |
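To make the target port state concrete, here is a hedged sketch from a client's point of view, using openstacksdk rather than nova's internal neutron code, of the port state being described for a shelved-offloaded instance: `device_id` still set (ownership kept), `binding:host_id` cleared. The cloud name and UUIDs are placeholders.

```python
# Sketch only: illustrates the desired neutron port state, not nova's
# internal unbind path. Cloud name and UUIDs below are placeholders.
import openstack

conn = openstack.connect(cloud='devstack')

port_id = 'PORT_UUID'
instance_uuid = 'INSTANCE_UUID'

conn.network.update_port(
    port_id,
    device_id=instance_uuid,    # still attached to / owned by the instance
    device_owner='compute:nova',
    binding_host_id='',         # no longer bound to any compute host
    binding_profile={},         # drop per-host info (e.g. PCI address)
)

port = conn.network.get_port(port_id)
# expect '' for the host and 'unbound' for the vif type once cleared
print(port.binding_host_id, port.binding_vif_type)
```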
sean-k-mooney | I'm trying to create functional tests for vDPA move operations while I figure out if I can get a 2-node deployment to test/develop move ops | 15:42 |
sean-k-mooney | but when I was writing the shelve test I noticed it was not cleared | 15:43 |
sean-k-mooney | so I don't know if this causes any bug, but it's not what I was expecting | 15:43 |
sean-k-mooney | gibi: we are relying on driver.cleanup to tear down the networking on the host | 15:44 |
sean-k-mooney | which it does and that also disconnects the volumes | 15:44 |
gibi | do we free the compute claim? | 15:44 |
sean-k-mooney | but we don't actually unbind the ports | 15:44 |
sean-k-mooney | it's a good question, I would have to look, but I think so | 15:45 |
sean-k-mooney | but I'm not certain now | 15:45 |
gibi | if we free the claim but do not unbind the port, then I think we have no resource problems; it is just ugly/misleading that we keep the binding:host_id in neutron | 15:46 |
sean-k-mooney | ya | 15:46 |
sean-k-mooney | that would happen in the compute manager i guess | 15:47 |
sean-k-mooney | we call self.rt.delete_allocation_for_shelve_offloaded_instance | 15:47 |
sean-k-mooney | I would assume that would free them | 15:47 |
sean-k-mooney | no... that is just clearing the placement allocation | 15:48 |
sean-k-mooney | we do update the resource tracker after that, however | 15:49 |
sean-k-mooney | my func test says we do not free: testtools.matchers._impl.MismatchError: 4 != 3 | 15:58 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/828570 | 15:58 |
sean-k-mooney | gibi: I might try and reproduce this in devstack and see if the same is true in reality | 16:05 |
gibi | maybe the periodic update_available_resources task frees it | 16:06 |
sean-k-mooney | I ran that in the func test with run_periodics | 16:06 |
sean-k-mooney | I thought it would | 16:06 |
sean-k-mooney | but apparently not | 16:06 |
sean-k-mooney | it's not inconceivable this is a func test issue, but it's worth figuring out | 16:08 |
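For the devstack experiment mentioned above, one quick way to check whether a shelved-offloaded instance still holds a compute claim is to ask placement for its allocations directly. This is a hedged sketch using the keystoneauth session that openstacksdk exposes; the cloud name, instance UUID, and microversion are assumptions for illustration.

```python
# Quick check of whether an instance still holds placement allocations.
# Cloud name and UUID are placeholders; microversion 1.28 is an assumption.
import openstack

conn = openstack.connect(cloud='devstack')

instance_uuid = 'INSTANCE_UUID'
resp = conn.session.get(
    '/allocations/%s' % instance_uuid,
    endpoint_filter={'service_type': 'placement'},
    headers={'OpenStack-API-Version': 'placement 1.28'},
)
# An empty "allocations" dict here means the claim really was freed.
print(resp.json()['allocations'])
```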
dansmith | what is the nova-emulation job? | 16:09 |
sean-k-mooney | it tests emulating arm vms | 16:09 |
sean-k-mooney | so an x86 host booting arm vms | 16:10 |
dansmith | is it supposed to be stable? | 16:10 |
dansmith | seems like a ton of rechecking going on these days, and nova jobs have gotten pretty fat | 16:10 |
sean-k-mooney | hmm, I don't know if that is stable | 16:10 |
sean-k-mooney | we just enabled it a few days ago | 16:10 |
dansmith | it's voting, | 16:11 |
sean-k-mooney | I did not think it was failing, but it can certainly be set non-voting or moved to periodic | 16:11 |
dansmith | and I just saw a kernel panic on it | 16:11 |
dansmith | I think it's a guest kernel | 16:11 |
sean-k-mooney | https://zuul.openstack.org/builds?job_name=nova-emulation&skip=0 | 16:12 |
sean-k-mooney | it looks kind of ok | 16:12 |
sean-k-mooney | i think that is the first failure since it was merged | 16:12 |
dansmith | https://zuul.opendev.org/t/openstack/build/cb1314bff0f34bfdbb3a4f1fd5547b72 | 16:12 |
dansmith | okay, well, regardless, nova jobs are looking pretty heavy | 16:12 |
dansmith | I dunno how widely-known it is, but we're losing 30% of our CI capacity at the end of the month | 16:12 |
dansmith | so we'll probably need to be making some cuts | 16:13 |
dansmith | what's the major benefit of testing arm-on-x86? | 16:13 |
sean-k-mooney | it's a proxy for ensuring that the new emulation feature works in general | 16:14 |
sean-k-mooney | it could be a weekly job | 16:14 |
sean-k-mooney | or run only on libvirt changes | 16:14 |
dansmith | the thing that lets us choose the guest emulation mode you mean? | 16:14 |
dansmith | couldn't it be a single test? if we have an arm image available, couldn't we just boot one instance from it and make sure it's alive, instead of a whole other job? | 16:15 |
sean-k-mooney | well, we want to ensure resize etc. works | 16:16 |
sean-k-mooney | we could probably do it as a post action or something more lightweight | 16:16 |
dansmith | sure, so one scenario test that boots, resize, snapshot, etc | 16:16 |
sean-k-mooney | ya, we could do that | 16:16 |
dansmith | just saying, it seems pretty expensive for a minor verification | 16:16 |
sean-k-mooney | well, the idea was to test all features with emulation | 16:17 |
sean-k-mooney | but we can 1) move it to weekly and 2) make it a set of scenario tests | 16:17 |
sean-k-mooney | chateaulav:^ | 16:17 |
dansmith | yeah, ideally we'd run every configuration on every patch, but.. | 16:18 |
chateaulav | sean-k-mooney: would the scenario tests need to be added to the tempest project? | 16:20 |
sean-k-mooney | ya | 16:20 |
sean-k-mooney | well or as a plugin | 16:20 |
sean-k-mooney | but upstream tempest i think would be ok | 16:20 |
chateaulav | yeah, i noticed it took some time for the ci itself to run. so then we want to pursue a new tempest scenario test that can be added into another ci? | 16:22 |
chateaulav | then pause the nova emulation, or run it not as frequently? | 16:23 |
dansmith | if there's a high likelihood of it being broken, then a tempest test to check that on each patch would be good | 16:23 |
dansmith | however, if it's not very likely, then a weekly periodic test would be better and easier | 16:23 |
dansmith | I suspect the latter | 16:24 |
chateaulav | yeah. i think long term, maybe next cycle add in the tempest scenario that we can leverage. I think the weekly periodic would be good for the interim though. | 16:26 |
chateaulav | your thoughts sean-k-mooney | 16:26 |
dansmith | was anything not working when we first tried to do this? | 16:27 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Clean up when queued live migration aborted https://review.opendev.org/c/openstack/nova/+/828570 | 16:27 |
chateaulav | what do you mean in regards to not working? | 16:29 |
dansmith | chateaulav: you added the ability to select the guest emulation mode right? when you added that, were other things broken that made that non-trivial? | 16:30 |
dansmith | or, how invasive was the change? I thought it was mostly just a flag | 16:30 |
chateaulav | yeah, so the main item is the meta property that lets you define the guest architecture | 16:33 |
chateaulav | everything else was mods to the various checks to account for reading that value along with the host arch | 16:33 |
kashyap | sean-k-mooney: I frankly question the value of this "nova-emulation" job, given dansmith's comment on the impact. | 16:34 |
kashyap | Also who are the users for this? | 16:34 |
chateaulav | and then choosing the guest arch if it was defined. so the ci is just to ensure the emulation works. it is highly likely that changes to nova won't affect its functionality, because it follows the logical paths for the physical architecture support | 16:35 |
sean-k-mooney | kashyap: well chateaulav for one :) | 16:35 |
kashyap | Hmm, still | 16:36 |
kashyap | chateaulav: Also, please note: https://www.qemu.org/docs/master/system/security.html#non-virtualization-use-case | 16:36 |
sean-k-mooney | kashyap: they are aware. there are many production use cases for it even with that in mind | 16:37 |
dansmith | yeah, really seems pretty low-impact in terms of a feature, and a whole job on every change is very high cost | 16:37 |
sean-k-mooney | probably not public cloud | 16:37 |
dansmith | I tend to think that even a scenario in every job is more expensive than we need | 16:38 |
dansmith | a weekly periodic is fine if we want, but.. | 16:38 |
kashyap | Yeah, 30% impact on other nodes is just too much | 16:38 |
sean-k-mooney | well its not 30% from this job | 16:39 |
kashyap | sean-k-mooney: "many cases" - I'm assuming they don't give a hoot about security | 16:39 |
sean-k-mooney | we are losing one of the providers, I assume | 16:39 |
sean-k-mooney | kashyap: much of our downstream ci uses qemu, some uses kvm | 16:39 |
sean-k-mooney | so for ci and package building I think it's fine | 16:40 |
kashyap | sean-k-mooney: Well, near as I know, most is exercising nested KVM | 16:40 |
kashyap | Internal CI is fine | 16:40 |
sean-k-mooney | don't forget that rackspace used to run their public cloud on power, providing x86 vms | 16:40 |
dansmith | um, what? that's news to me :) | 16:41 |
dansmith | I think they toyed with that, probably for second-source reasons but.. not to my knowledge for anything real | 16:41 |
dansmith | even still, that doesn't mean it makes sense, or is a good idea with qemu, and arm on x86 :) | 16:42 |
sean-k-mooney | they used to have xen but also ppc hosts | 16:42 |
dansmith | yeah, probably for political reasons :) | 16:43 |
sean-k-mooney | perhaps | 16:43 |
chateaulav | yeah, initial use of this is not meant for real-world systems. it is to bring testing and validation forward a little more so you don't have to run physical, and then work towards greater parity going forward | 16:44 |
kashyap | chateaulav: Okay, as long as you're clear that for any production usage this cross-arch emulation is entirely unfit. | 16:48 |
kashyap | Depending on the (cross-arch emulation) config, you still have _massive_ holes for a truck to comfortably drive through ;-) | 16:49 |
chateaulav | correct, this is entirely meant to bring security testing, validation testing, and simulated environments (which don't exist anywhere) to the common person within openstack | 16:50 |
dansmith | chateaulav: yeah, so it's cool if this is a toy, useful for developers or whatever, but that means the ci impact has to be negligible, IMHO | 16:50 |
chateaulav | dansmith: I was requested to add a ci in, so from the Nova Core Dev community perspective, use it as you see fit. no need to waste ci time if it is exhausting a lot of extra resources. I think it would be useful to have a periodic check to ensure that it remains functional; however, I can see it also being added to an existing ci as a scenario for the long term support of its testing | 16:58 |
dansmith | chateaulav: yeah, I understand, I'm not blaming you | 16:59 |
chateaulav | for sure, just want to make sure you understand our overall intent for this feature as a whole | 17:00 |
bauzas | chateaulav: dansmith: fwiw, I explain in the prelude that this is experimental and not tested in our CI | 17:23 |
bauzas | no false promises | 17:23 |
bauzas | plus in the cycle highlights, hoping the marketing folks don't freak out and write something wrong | 17:24 |
dansmith | bauzas: okay but it is tested in our ci, has already broken for me this morning, and is costing a fair bit in terms of resource | 17:25 |
dansmith | but if you mean to describe it that way (and make the job reflect that) then ++ | 17:25 |
bauzas | I'm just testing the prelude as I write, so I'll upload it | 17:25 |
bauzas | heh, done | 17:25 |
bauzas | uploading it so reviews are welcome | 17:26 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Add the Yoga prelude section https://review.opendev.org/c/openstack/nova/+/832292 | 17:26 |
bauzas | dansmith: gibi: sean-k-mooney: gmann: ^ | 17:26 |
gibi | bauzas: ack, I will look at it tomorrow morning | 17:27 |
gmann | bauzas: thanks. will check in my afternoon | 17:27 |
sean-k-mooney | gibi: my func tests now show that the devices are claimed and freed properly; I had an off-by-one error | 19:28 |
sean-k-mooney | claiming a vdpa device decrements the total count by 2 | 19:30 |
sean-k-mooney | 1 for the vdpa device and 1 for the pf | 19:30 |
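A toy model, deliberately not nova's PCI resource tracker, of the accounting described above: claiming one vDPA device also consumes its parent PF, so the free count drops by two per claim, which is what tripped the earlier off-by-one expectation.

```python
# Toy model only (assumed names, not nova internals): claiming a vDPA
# device also removes its parent PF from the allocatable set.

class ToyPciPool:
    def __init__(self, pfs, vdpa_per_pf):
        # parent PFs plus their child vDPA devices, all initially free
        self.free = {f'pf{i}' for i in range(pfs)}
        self.free |= {f'pf{i}-vdpa{j}'
                      for i in range(pfs) for j in range(vdpa_per_pf)}

    def claim_vdpa(self, pf, child):
        # consuming a child device also makes its parent PF unavailable,
        # so the PF can no longer be handed out as a whole device
        self.free.discard(f'{pf}-vdpa{child}')
        self.free.discard(pf)

pool = ToyPciPool(pfs=2, vdpa_per_pf=1)
before = len(pool.free)          # 4: two PFs + two vDPA devices
pool.claim_vdpa('pf0', 0)
after = len(pool.free)           # 2: the vDPA device and pf0 are both gone
assert before - after == 2
```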
opendevreview | Merged openstack/nova master: Add grenade-skip-level irrelevant-files config https://review.opendev.org/c/openstack/nova/+/831229 | 20:36 |
opendevreview | sean mooney proposed openstack/nova master: [WIP] add fun tests for VDPA operations that should work. https://review.opendev.org/c/openstack/nova/+/832330 | 20:46 |
*** dasm is now known as dasm|off | 23:36 |