Thursday, 2024-09-12

*** __ministry is now known as Guest321501:18
opendevreviewmelanie witt proposed openstack/nova master: DNM try minimalistic imagebackend refactor  https://review.opendev.org/c/openstack/nova/+/92563501:20
opendevreviewmelanie witt proposed openstack/nova master: DNM try minimalistic imagebackend refactor  https://review.opendev.org/c/openstack/nova/+/92563505:00
gibibauzas: supamatt's bug feels like an RC candidate https://review.opendev.org/c/openstack/nova/+/92897006:42
gibibauzas: I've added it to the tracking etherpad and I will review it properly after a coffee06:43
gibisupamatt: thanks for bringing this to our attention06:43
gibisupamatt: I have doubts about the problem the code pointed out by the report and the fix does not effect the setting of the destroy_disks variable that is reponsible to signal deletion of files from the disk at cleanup07:15
gibiit seems nobody is around so I will update the bug report with my doubt/questions07:23
gibidone07:43
bauzasgibi: ack, sorry, I had trouble with my home network07:47
gibibauzas: no worries07:48
bauzasgibi: this is on 2024.107:48
bauzasCaracal so07:48
gibibauzas: it was a backport from master07:49
gibimerged this cycle07:49
bauzastechnically, we could deliver RC1 and then backport that one in a .z release07:49
gibihttps://review.opendev.org/c/openstack/nova/+/90980607:49
bauzasokay, lemme look at the commit SHA107:49
gibithat is the offending patch according to the reporter07:49
gibibauzas: I lean towards that the reporter do see a numa live migration regression on shared storage but I'm not sure that the root cause is the power mgmt flag added. But more like that flag now trigger a driver cleanup call that wasn't triggered before07:50
gibiand that cleanup call might have issues07:50
bauzasagreed07:50
bauzasthat's the cleanup07:51
bauzasgibi: oh, that's the conditional07:53
bauzaswe only cleanup if we don't have shared path07:53
bauzasbut here, the 'or' makes it now optional07:53
bauzaswe just need to change the conditional07:53
gibicalling the cleanup with destroy_disks should be OK07:56
gibidestroy_disk= false07:56
gibiappreantly it isn't OK but changing the logic at when to trigger do_cleanup seems to be the wrong fix to me07:57
gibias destroy_disk=false should prevent the disk deletion07:57
bauzastrue07:57
bauzasbut we check two different fields for shared storage07:58
bauzasthe conditional checks whether both computes use the same path07:58
gibiyes I spotted that but did not dig deeper tehre07:58
bauzaswhile the destroy_disks arg checks whether we use block storage07:59
bauzasI assume the reporter gets something like True for the shared path but False for shared block storage07:59
gibiwhat is a shared block storage?08:01
bauzasfor mdevs and vpmems, there was a reason to call cleanup08:01
gibijust cinder? or something more special?08:01
bauzaswe need to delete the local residues08:01
bauzashonestly, that's the first time I'm hearing this 08:02
bauzasI'm used to the regular FS share between instance pathes08:02
bauzasthat's the regular shared storage I know08:03
gibiyeah I think is_shared_instance_path is the case when the instance dir is on e.g. NFS 08:03
bauzasyup08:03
gibiwe need to call cleanup if power mgmt is enabled so the current fix proposed would regress that08:04
bauzasI need to see what makes migrate_data.is_shared_block_storage08:04
gibiyep08:04
bauzasand it would also regress the mdev live-migration support08:05
bauzaswe need to call cleanup to delete the mdevs we reserved08:05
bauzashence the 'or'08:05
bauzasI need a coffee08:06
bauzasit's early and I'm already fried08:06
bauzastrying to improve my local LTE connection took me too much of time08:06
gibihttps://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/virt/libvirt/driver.py#L10367-L1036908:08
bauzasooooh I see08:09
gibihttps://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/virt/libvirt/driver.py#L1047008:09
bauzasyeah, some this is when you have BFV or you share the same libvirt image backend between both08:10
bauzasthat's not NFS08:10
gibiyeah it is the RBD image backend 08:11
bauzasso08:11
bauzasI think of something quick08:11
bauzaspreviously, we were NOT calling cleanup on shared storage08:11
bauzasright?08:12
bauzasnow we WANT to call it too, but we don't want to destroy the disks08:12
bauzasI see a simple solution08:12
bauzaswhich is to add another clause to destroy_disks08:13
bauzasgibi: thoughts ?08:13
gibithinking...08:14
gibiwe did not call cleanup if is_shared_instance_path was true. And now we need it due to power mgmt.08:14
bauzasyup08:14
bauzasbut we need to call it smartly08:15
gibiyes08:15
gibiwould the change destroy_disks = not (migrate_data.is_shared_block_storage or migrate_data.is_shared_instance_path) work?08:16
bauzasthat's my hope08:16
bauzasbut we need to check the cleanup method08:16
gibihttps://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/virt/libvirt/driver.py#L1707-L1724 this is the relevant path. I think it would work08:20
gibibut we don't have reproduction08:20
bauzasyeah08:21
bauzaswe need a functional test08:21
gibiIn 20 mins I need to step away for more than an hour 08:22
bauzasI can try to work on the reproducer08:22
gibiI will summarize the above discussion in the bug 08:22
bauzasI already did in the report08:22
gibicool08:23
gibiI will add a request to the reporter to try,if they can, our fix proposal08:23
bauzasplease, but I assume he's on West-US timezones08:23
gibidone08:36
*** bauzas_ is now known as bauzas09:30
opendevreviewSylvain Bauza proposed openstack/nova master: Add Dalmatian prelude section  https://review.opendev.org/c/openstack/nova/+/92899510:28
opendevreviewMerged openstack/osc-placement master: Update master for stable/2024.2  https://review.opendev.org/c/openstack/osc-placement/+/92836010:28
yusufgungor_hi everyone, we think hit a strange bug. We got the mtu error as below on the source compute node, when live migrating an instance with the specified scenarios.11:32
yusufgungor_"Target network card MTU 0 does not match source 1450: libvirt.libvirtError: unsupported configuration: Target network card MTU 0 does not match source 1450"11:32
yusufgungor_Also debug logs shows that mtu is null for one of the interfaces for the failed scenarios. _update_vif_xml function prints log as below on debug:11:32
yusufgungor_Updating guest XML with vif config: <interface type="ethernet">11:32
yusufgungor_  <mac address="fa:16:3e:19:a8:ba"/>11:33
yusufgungor_  <model type="virtio"/>11:33
yusufgungor_  <target dev="tapfb1b3816-ed"/>11:33
yusufgungor_</interface>11:33
yusufgungor_Updating guest XML with vif config: <interface type="ethernet">11:33
yusufgungor_  <mac address="fa:16:3e:8e:e7:b3"/>11:33
yusufgungor_  <model type="virtio"/>11:33
yusufgungor_  <mtu size="1500"/>11:33
yusufgungor_  <target dev="tapd001bd30-9c"/>11:33
yusufgungor_</interface>11:33
yusufgungor_Should we create a bug report? This mtu error causes revert the migration and triggers another strange bug on the os-brick side which unmaps the incorrect iscsi device for random instance :/ It is another story.11:33
yusufgungor_Scenarios:11:33
yusufgungor_1. When creating an instance from a vxlan tenant network, then attaching an interface from the provider network, and live migrating, it gives an mtu error! 11:33
yusufgungor_2. When creating an instance from a vxlan tenant network, then attaching an interface from the provider network, rebooting, and live migrating, it does not give an mtu error.11:33
yusufgungor_3. When creating an instance only from the provider network, it does not give an mtu error.11:33
yusufgungor_4. When creating an instance from the provider network, then attaching an interface from the vxlan tenant network, and migrating, it gives an mtu error!11:33
yusufgungor_5. When creating an instance, giving an IP from both the vxlan tenant network and the provider network, and live migrating, it does not give an mtu error.11:33
yusufgungor_6. When creating an instance only from the provider network, it does not give an mtu error. Then, when attaching a leg from the vxlan tenant network and migrating again, it gives an mtu error!11:33
yusufgungor_nova-compute --version : 26.2.1 (osp cluster is zed version)11:36
sean-k-mooneyyusufgungor_: so it shoudl be noted that nova does not supprot multi segment networks with differnt mtu12:04
sean-k-mooneyyusufgungor_: vxlan has 48? byte of encapsulation overhead so we round that up to 50 and reduce its mtu vs vlan/flat in neutorn automatically12:05
sean-k-mooneyyusufgungor_: you should file a bug with as much detail as is relevent12:07
yusufgungor_@sean thanks for reply. so a nova instance with both vxlan tenant network (mtu: 1450) and provider network (mtu: 1500) is not supported? Actually it works except live migration on spesific case. For example after hard reboot the instance we can live migrate too.12:08
yusufgungor_@sean-k-mooney thanks, then i will file a bug report to nova12:08
sean-k-mooneyno you can have instance with a mix of both12:09
sean-k-mooneythat should be fine12:09
sean-k-mooneyjust not a neutron network that has a vlan segment and a vxlan segment with differnt mtus12:10
sean-k-mooneythat actully might work form a nova side as we take the mtu form the port12:10
sean-k-mooneybut it would break l2 connectivty between the segments as mtus should only chagne on l3(subnet) bondaries12:11
yusufgungor_@sean-k-mooney thanks. actually we have tried to set the provider network mtu as 1450 too and tested but hitting the same live migrate error12:12
yusufgungor_@sean-k-mooney i will test it again, and file a bug12:12
sean-k-mooneyoh this might be an old bug then12:12
sean-k-mooneyso we do not suprot changing the mtu of neutron netowrks12:13
sean-k-mooneyand by we i mean nova12:13
sean-k-mooneythe only way to do that properly is if you detach and reattach all the ports or otherwise regenerate the guest xml12:14
yusufgungor_@sean-k-mooney thanks 🙏12:15
ratailor_sean-k-mooney, could you please provide your suggestion on this approach https://review.opendev.org/c/openstack/nova/+/928933 for https://bugs.launchpad.net/nova/+bug/205892812:55
ratailor_sean-k-mooney, the finish_time field is not shown in instance action show output, that's why we need to do this before calling instance action finish method for respective APIs where we need to finish the started action.12:56
ratailor_sean-k-mooney, I will fix failing tests and add new tests wherever required, once this approach looks good.12:57
ratailor_artom, gibi please provide your suggestions as well ^^13:00
opendevreviewsean mooney proposed openstack/nova master: repoduce post liberty pre vicoria instance numa db issue  https://review.opendev.org/c/openstack/nova/+/92902513:01
sean-k-mooneydansmith: gibi bauzas ^13:02
sean-k-mooneyill file a bug for that and update it when i push the fix patch13:04
yusufgungor_@sean-k-mooney i have filed a bug: https://bugs.launchpad.net/nova/+bug/208053113:31
opendevreviewDan Smith proposed openstack/nova stable/2023.1: [stable-only]: Port qcow2v2 support from oslo  https://review.opendev.org/c/openstack/nova/+/92893714:02
opendevreviewRajesh Tailor proposed openstack/nova master: Add support for showing finish_time  https://review.opendev.org/c/openstack/nova/+/92893314:08
dansmithsean-k-mooney: are you respinning the safety patch to invert the safe/unsafe logic or no?16:29
dansmithyour comment made it sound like yes, but...16:30
sean-k-mooneyno16:31
sean-k-mooneyi inverted the logic already16:31
sean-k-mooneyso it should be good as is16:31
dansmithokay16:31
sean-k-mooneyunsafe would have been less work but i was alrey 30 mins into fixing the tests16:31
sean-k-mooneyso i decied to just finish it16:31
dansmithyeah I just wasn't sure from your comment if you were going to respin is all16:32
bauzashum, looks like we don't have any existing functional tests checking shared storage16:32
sean-k-mooneyya i proably should have dropped the comment16:32
sean-k-mooneybauzas: we may under the libvirt section16:32
sean-k-mooneybut in generally its light if it exists16:33
bauzasmy grep on shared_storage or just storage shows me nothing16:33
bauzasI assume I may have to mock this16:34
bauzasiirc, we just detect shared storage if we can see the same instance files on both computes, right?16:34
sean-k-mooneyyep you can mock that function16:36
sean-k-mooneyi would just create a regression test in a spereate file16:36
sean-k-mooneyand use https://github.com/openstack/nova/blob/master/nova/tests/functional/libvirt/base.py#L264 to make live migration work16:36
sean-k-mooneythen stub the function for the shared storage check to return true16:36
sean-k-mooneyhttps://github.com/openstack/nova/blob/16d815990b53b9afa35fc4da38609f87820fa690/nova/tests/functional/regressions/test_bug_1939545.py16:36
bauzaswe have a couple of functional tests for live-migration, that's not a problem16:36
sean-k-mooneyi think has all the ground work16:37
bauzasoh cool, thanks for the digging but I was about to do it :)16:37
bauzasokay, I'm out of steam and I don't have ideas on how to correctly assert we delete the local disks, except checking we call the delete17:20
bauzassee you tomorrow17:20
*** bauzas_ is now known as bauzas19:48
opendevreviewsean mooney proposed openstack/nova master: repoduce post liberty pre vicoria instance numa db issue  https://review.opendev.org/c/openstack/nova/+/92902520:09
opendevreviewsean mooney proposed openstack/nova master: allow upgrade of pre-victoria InstanceNUMACells  https://review.opendev.org/c/openstack/nova/+/92918720:09
opendevreviewsean mooney proposed openstack/nova master: repoduce post liberty pre vicoria instance numa db issue  https://review.opendev.org/c/openstack/nova/+/92902520:14
opendevreviewsean mooney proposed openstack/nova master: allow upgrade of pre-victoria InstanceNUMACells  https://review.opendev.org/c/openstack/nova/+/92918720:14
sean-k-mooneylooks like openstack clinet is broken22:11
sean-k-mooney024-09-12 21:04:32.171696 | controller | + /opt/stack/nova/roles/run-evacuate-hook/files/test_negative_evacuate.sh:evacuate_and_wait_for_error:19 :   openstack --os-compute-api-version 2.67 server evacuate --host np0038485592 evacuate-test22:11
sean-k-mooney2024-09-12 21:04:33.305045 | controller | compute version 2.67 is not in supported versions: 2, 2.122:11
sean-k-mooney2024-09-12 21:04:34.474915 | controller | Proxy.evacuate_server() got an unexpected keyword argument 'password'22:11
sean-k-mooneystephenfin: ^ fyi for the morning22:12

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!