Tuesday, 2024-08-20

opendevreviewTakashi Kajinami proposed openstack/nova master: Report availability of stateless firmware support  https://review.opendev.org/c/openstack/nova/+/90888800:37
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware  https://review.opendev.org/c/openstack/nova/+/90889000:37
opendevreviewTakashi Kajinami proposed openstack/nova master: Add hw_firmware_stateless image property  https://review.opendev.org/c/openstack/nova/+/92659000:37
*** __ministry is now known as Guest91901:23
*** __ministry is now known as Guest93002:45
opendevreviewMerged openstack/nova master: [libvirt]log XML if nova fails to parse it  https://review.opendev.org/c/openstack/nova/+/90636403:37
*** __ministry is now known as Guest93403:59
*** __ministry is now known as Guest94005:22
opendevreviewZhang Hua proposed openstack/nova master: Fix deepcopy usage for BlockDeviceMapping in get_root_info  https://review.opendev.org/c/openstack/nova/+/92037405:23
*** bauzas_ is now known as bauzas06:39
*** __ministry is now known as Guest94806:47
ralonsohsean-k-mooney, hello! I have an investigation in progress (well, very little progress so far)08:39
ralonsohI'm fixing an issue with eventlet, wsgi and tls with ML2/OVN08:39
*** bauzas_ is now known as bauzas08:40
ralonsohin a nutshell: I'm pushing https://review.opendev.org/c/openstack/neutron/+/925376/08:40
ralonsohthis patch works fine with ML2/OVN but not so great with ML2/OVS08:40
ralonsohthe same test is randomly (but very frequently) failing in the OVS jobs: test_established_tcp_session_after_re_attachinging_sg08:41
ralonsohthe issue, so far, is because n-compute is deleting the VM used to test the ping08:41
ralonsohfor example: https://32db135de77b6f5b2e24-00cf68680e4576901ac5e284377f47f2.ssl.cf2.rackcdn.com/924317/7/experimental/neutron-tempest-plugin-openvswitch-distributed-dhcp/43f8b60/controller/logs/index.html08:42
ralonsohsean-k-mooney, if you have time to review these logs, can you explain why nova is trying to delete the VM?09:10
ralonsohAug 19 12:41:14.998800 np0038215967 devstack@n-api.service[58447]: DEBUG nova.compute.api [None req-54123910-92d9-491b-b845-f8e7d37490dd tempest-StatefulNetworkSecGroupTest-1423213631 tempest-StatefulNetworkSecGroupTest-1423213631-project-member] [instance: 4623b509-a848-41a4-9fc2-1cf8227bd4ca] Going to try to terminate instance {{(pid=58447) delete /opt/stack/nova/nova/compute/api.py:2725}}09:10
ralonsohsorry, I can't find the reason09:10
mikalsean-k-mooney: I'd do a change for the usb redirect devices as extra specs, but I'm starting to be a bit worried about any of these changes landing in time. What would you prefer?09:19
mikalsean-k-mooney: naively I'd just cargo cult the sound device extra spec one for usb redirect devices, so I was kind of waiting for someone to tell me I did that one wrong first.09:19
sean-k-mooney[m]mikal:  ill review that shortly and let you know09:40
sean-k-mooney[m]ralonsoh:  ill try and take a look. nova wont delete the vm unless asked09:43
sean-k-mooney[m]even missing a vif plugged event will just cause it to go to error, although we might stop it09:44
mikalsean-k-mooney: yeah, I wont get to it tonight anyways as I'm oncall and that always makes me tired. But yeah, I'd appreciate the feedback, I've never done one of those before.09:49
ralonsohsean-k-mooney[m], I think I found something: the VM is created at 12:40:0810:00
ralonsohbut the OVS agent receives the tap port creation almost one minute later and doesn't have time to send the vif-plugged event10:01
ralonsohand at the same time, n-compute decides to delete the VM10:01
ralonsohhow can I increase this timeout?10:01
leaveletHi! I encountered a cinderclient.exceptions.Unauthorized error when creating an instance. I also received the message: "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible." I've confirmed that my nova service_user is configured correctly, and nova-status upgrade check shows all checks as successful. The error logs can be found at https://paste.aosc.io/paste/hMJM8omukSVYzgvLjHEhqQ. Should I file an issue on Launchpad, or is this likely a configuration problem on my end?10:02
leaveletI'm new to IRC and not very familiar with it, so if I'm in the wrong place, please let me know10:02
ralonsohsean-k-mooney[m], vif_plugging_timeout, which by default is 300 (now I don't understand...)10:03
sean-k-mooneyralonsoh: you really should not need to but i think we have a timeout that defaults to 300 seconds10:03
sean-k-mooneyya10:03
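The timeout named above is a nova.conf option. What follows is a minimal sketch, not a recommendation to change it: it assumes the option sits in the `[DEFAULT]` section of the compute node's nova.conf, and it includes `vif_plugging_is_fatal` on the assumption that it is the companion option deciding whether a missed event fails the boot. The sample file content and the helper name are illustrative only; 300 is the default quoted in the discussion.

```python
import configparser

# Illustrative nova.conf fragment (assumed layout, not taken from the job).
SAMPLE_NOVA_CONF = """
[DEFAULT]
vif_plugging_timeout = 300
vif_plugging_is_fatal = true
"""


def read_vif_plugging_settings(conf_text):
    """Return (timeout_seconds, is_fatal) from nova.conf-style text.

    Falls back to 300 / True when the options are absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    timeout = parser.getint("DEFAULT", "vif_plugging_timeout", fallback=300)
    is_fatal = parser.getboolean("DEFAULT", "vif_plugging_is_fatal", fallback=True)
    return timeout, is_fatal


timeout, is_fatal = read_vif_plugging_settings(SAMPLE_NOVA_CONF)
print(f"vif_plugging_timeout={timeout}s fatal={is_fatal}")
```

With a 300 second window, a vif-plugged event arriving within seconds of the domain being defined would not come close to tripping this timeout.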
ralonsohso I don't understand why the VM is deleted10:03
sean-k-mooneyits already longer than is reasonable and you are using kvm in this job10:04
ralonsohIt is not commanded by the tempest test10:04
sean-k-mooneyso the vm should be quick to boot/start10:04
sean-k-mooneyi did notice that libvirt is a little unhappy on that node and it failed to detach an interface in one case10:05
sean-k-mooneyyou are also using very old guest images (20.04 and cirros 0.5.3), im not sure if that is related or not10:05
ralonsohbtw, the VM id is 4623b509-a848-41a4-9fc2-1cf8227bd4ca and the port 9bc40eec-ae67-4b56-9e75-551609520b2710:05
ralonsohthat works well in the CI, but not with this patch in particular10:06
sean-k-mooneyso the domain was defined at Aug 19 12:40:14.41426310:07
sean-k-mooneyand we get the vif plugged event at Aug 19 12:40:16.54780410:08
sean-k-mooneyso 2 seconds later10:08
sean-k-mooneyralonsoh: is this using the iptables firewall or openvswitch? i assume the latter10:09
ralonsohlet me check (by default it is the latter)10:09
sean-k-mooney"port_filter": true, "ovs_hybrid_plug": false, 10:09
sean-k-mooneyits the ovs firewall10:09
ralonsohyes, ovs fw10:09
sean-k-mooneythat means that the ml2 agent only sees the tap when the vm is started vs seeing the veth pair10:10
sean-k-mooneyslightly earlier10:10
sean-k-mooneyso if we look at the ovs log we should see when it sees the tap10:11
ralonsohat Aug 19 12:41:15.65362310:11
ralonsohjust before it is deleted10:11
sean-k-mooney2024-08-19T12:40:15.533Z|02270|bridge|INFO|bridge br-int: added interface tap9bc40eec-ae on port 25710:12
ralonsohthe ovsdb client detects the tap at 12:41:1510:12
ralonsohAug 19 12:41:15.653623 np0038215967 neutron-openvswitch-agent[61294]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["18eba42b-2254-46a1-aeb8-2027e39235ed","old",null,257,null],["","new","tap9bc40eec-ae",-1,["map",[["attached-mac","fa:16:3e:ec:7b:05"],["iface-id","9bc40eec-ae67-4b56-9e75-551609520b27"],["iface-status","active"],["vm-uuid","4623b509-a848-41a4-9fc2-1cf8227bd4ca"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=61294) _read_stdout /opt/stack/neutron/neutron/agent/common/async_process.py:285}}10:12
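The JSON payload in the agent log line above is easier to read once each row is paired with the `headings` list. The following is a small sketch that does only that; the function name is made up for illustration, and the payload is the one quoted in the log line, joined back into a single string.

```python
import json

# Payload copied from the neutron-openvswitch-agent log line quoted above.
RAW = (
    '{"data":[["18eba42b-2254-46a1-aeb8-2027e39235ed","old",null,257,null],'
    '["","new","tap9bc40eec-ae",-1,["map",[["attached-mac","fa:16:3e:ec:7b:05"],'
    '["iface-id","9bc40eec-ae67-4b56-9e75-551609520b27"],'
    '["iface-status","active"],'
    '["vm-uuid","4623b509-a848-41a4-9fc2-1cf8227bd4ca"]]]]],'
    '"headings":["row","action","name","ofport","external_ids"]}'
)


def rows_from_update(raw):
    """Turn one ovsdb-client monitor JSON update into a list of dicts."""
    doc = json.loads(raw)
    rows = []
    for values in doc["data"]:
        row = dict(zip(doc["headings"], values))
        ext = row.get("external_ids")
        # OVSDB encodes maps as ["map", [[key, value], ...]].
        if isinstance(ext, list) and ext and ext[0] == "map":
            row["external_ids"] = dict(ext[1])
        rows.append(row)
    return rows


for row in rows_from_update(RAW):
    print(row["action"], row["name"], row["ofport"])
```

Read this way, the update pairs an "old" row carrying ofport 257 with a "new" row for tap9bc40eec-ae carrying ofport -1, so it looks like a change to an interface that already existed rather than its first appearance. That reading is an interpretation of the record format, not something stated in the log itself.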
sean-k-mooneyno it sees it just before neutron sends the vif plugged event10:12
sean-k-mooneyi was checking the actual ovs-vswitchd log https://32db135de77b6f5b2e24-00cf68680e4576901ac5e284377f47f2.ssl.cf2.rackcdn.com/924317/7/experimental/neutron-tempest-plugin-openvswitch-distributed-dhcp/43f8b60/controller/logs/openvswitch/ovs-vswitchd_log.txt10:13
ralonsohthen I don't know why this delay between the vswitch detection and the ovsdb-client10:14
ralonsohthat's a lot10:14
ralonsohactually one minute (could be an error in the timestamp?)10:15
sean-k-mooneyralonsoh: so nova received the network vif plugged event correctly after 2 seconds10:16
sean-k-mooneythe instance delete starts at Aug 19 12:41:1510:17
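The gaps between these events can be checked with a few lines of arithmetic. This sketch uses the three journal timestamps quoted in this log (domain defined, vif plugged event, and the DELETE request recorded by n-api); the year is an assumption because the journal format omits it, and `%b` parsing assumes an English or C locale.

```python
from datetime import datetime

# Timestamps as quoted in the log; labels are descriptive only.
EVENTS = {
    "domain defined": "Aug 19 12:40:14.414263",
    "vif plugged event": "Aug 19 12:40:16.547804",
    "DELETE logged by n-api": "Aug 19 12:41:15.092746",
}


def parse(stamp, year=2024):
    """Parse a journal-style timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def elapsed_seconds(start, end):
    """Seconds between two timestamps in the same format."""
    return (parse(end) - parse(start)).total_seconds()


plug_delay = elapsed_seconds(EVENTS["domain defined"], EVENTS["vif plugged event"])
delete_delay = elapsed_seconds(
    EVENTS["vif plugged event"], EVENTS["DELETE logged by n-api"]
)
print(f"define -> vif plugged: {plug_delay:.3f}s")
print(f"vif plugged -> delete: {delete_delay:.3f}s")
```

The first gap is a little over 2 seconds and the second is just under a minute, both far below the 300 second plugging timeout.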
ralonsohyes but why?10:17
ralonsohwhy is the VM being deleted?10:17
ralonsohyes, I see the vif-plugged event at 12:40:1610:18
ralonsohno idea why is sending it10:19
ralonsohwho*10:19
sean-k-mooneyis the provisioning block stuff broken again?10:20
sean-k-mooneyprobably not10:20
sean-k-mooneyim going to check novas api to see what deleted the instance10:20
ralonsohI'm checking the neutron API logs and this event is not sent10:20
sean-k-mooneyso the delete did happen via the api 10:24
sean-k-mooneyAug 19 12:41:15.092746 np0038215967 devstack@n-api.service[58447]: INFO nova.api.openstack.requestlog [None req-54123910-92d9-491b-b845-f8e7d37490dd tempest-StatefulNetworkSecGroupTest-1423213631 tempest-StatefulNetworkSecGroupTest-1423213631-project-member] 149.202.166.120 "DELETE /compute/v2.1/servers/4623b509-a848-41a4-9fc2-1cf8227bd4ca" status: 204 len: 0 microversion: 2.1 time: 0.12746610:24
sean-k-mooneyso the question is, was that tempest?10:24
sean-k-mooneyit probably was but we can check the request id10:24
ralonsohyes10:25
ralonsoh2024-08-19 12:41:15.096 87668 INFO tempest.lib.common.rest_client [req-54123910-92d9-491b-b845-f8e7d37490dd req-54123910-92d9-491b-b845-f8e7d37490dd ] Request (StatefulNetworkSecGroupTest:_run_cleanups): 204 DELETE https://149.202.166.120/compute/v2.1/servers/4623b509-a848-41a4-9fc2-1cf8227bd4ca 0.147s10:25
ralonsohI didn't find it before...10:25
ralonsohsean-k-mooney, thanks a lot!!10:25
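The request ID check above can be sketched as a small script: pull every `req-<uuid>` token out of lines from different services and see which services share one. The two sample lines are the n-api and tempest entries quoted in this log; the function name and the source labels are illustrative.

```python
import re
from collections import defaultdict

REQ_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")

# (source label, log line) pairs copied from the discussion above.
LINES = [
    (
        "n-api",
        "Aug 19 12:41:15.092746 np0038215967 devstack@n-api.service[58447]: "
        "INFO nova.api.openstack.requestlog [None "
        "req-54123910-92d9-491b-b845-f8e7d37490dd "
        "tempest-StatefulNetworkSecGroupTest-1423213631 "
        "tempest-StatefulNetworkSecGroupTest-1423213631-project-member] "
        '149.202.166.120 "DELETE /compute/v2.1/servers/'
        '4623b509-a848-41a4-9fc2-1cf8227bd4ca" status: 204 len: 0 '
        "microversion: 2.1 time: 0.127466",
    ),
    (
        "tempest",
        "2024-08-19 12:41:15.096 87668 INFO tempest.lib.common.rest_client "
        "[req-54123910-92d9-491b-b845-f8e7d37490dd "
        "req-54123910-92d9-491b-b845-f8e7d37490dd ] Request "
        "(StatefulNetworkSecGroupTest:_run_cleanups): 204 DELETE "
        "https://149.202.166.120/compute/v2.1/servers/"
        "4623b509-a848-41a4-9fc2-1cf8227bd4ca 0.147s",
    ),
]


def sources_by_request_id(lines):
    """Map each request ID to the sorted list of sources that mention it."""
    seen = defaultdict(set)
    for source, text in lines:
        for req_id in REQ_ID.findall(text):
            seen[req_id].add(source)
    return {req_id: sorted(sources) for req_id, sources in seen.items()}


matches = sources_by_request_id(LINES)
for req_id, sources in matches.items():
    print(req_id, "->", ", ".join(sources))
```

A request ID that appears in both the API log and the tempest log, as it does here under `_run_cleanups`, points at the test's own cleanup as the caller of the DELETE.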
sean-k-mooneywell thats probably just the test cleanup10:26
sean-k-mooneyit does not tell us why it decided to do that10:26
sean-k-mooneybut im guessing there is a timeout or something10:26
ralonsohyes, now I need to know why the test is failing10:26
ralonsohI see the SSH is working10:26
sean-k-mooneyand youre not expecting it to?10:28
ralonsohno, that'10:28
ralonsohthat's ok, but the test (ping) is working, and should not10:28
ralonsohadd SG rule -> ping; remove SG rule -> no ping10:28
sean-k-mooneyso i think for ml2/ovs we always allow icmp but i might be mistaken10:29
sean-k-mooneyovn, iptables and the ovs security group driver do have some different behavior10:29
sean-k-mooneyalthough maybe im thinking of arp not icmp10:30
sean-k-mooneyis the test stopping the ping when it removes the rule?10:30
sean-k-mooneyit could be related to conntrack10:30
ralonsohsorry, not ping but nc command10:30
sean-k-mooney:) ok, again it could be related to conntrack, i dont think we terminate established connections10:31
*** __ministry is now known as Guest96612:10
sean-k-mooneymikal: https://review.opendev.org/c/openstack/nova/+/926126 is correct, although dansmith, for valid reasons, prefers that we now split this type of change into 2 commits: 1 for the object change and 1 for the driver change. 12:49
*** ministry is now known as __ministry13:11
*** __ministry is now known as Guest97013:11
*** jamesdenton_alt is now known as jamesdenton13:25
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware  https://review.opendev.org/c/openstack/nova/+/90889013:26
tkajinamsean-k-mooney, I've added functional test coverage but I'm struggling to understand why rebuild test fails... I'll dig into it further but it'd be really appreciated if you can take a look to find out what I've overlooked.13:29
tkajinamalso there is a slightly tricky problem with the assertion in live migration. I've summarized it (as well as the problem with rescue) so it'd be nice if you can check it when you have time13:30
sean-k-mooneytkajinam: if i can find time to run it locally ill try and do that13:36
tkajinamsean-k-mooney, thx13:36
sean-k-mooneytkajinam: we have a test helper for rebuild by the way, and for some of the other actions13:36
sean-k-mooneyso you should not need to do a raw post13:36
tkajinamah, ok. let me find these. I found these bare posts in a few other test files but we can probably replace these, too13:37
tkajinam(in a separate follow-up)13:37
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/tests/functional/integrated_helpers.py#L55013:37
sean-k-mooneymost of the actions should be there and if you need a new one you should just add it there13:38
sean-k-mooneythats not your issue but its just good to know they exist to avoid code duplication13:38
sean-k-mooneyOS-DCF:diskConfig is not generally required by the way, and it defaults to auto, which is why we dont actually set it today13:40
sean-k-mooneyoh hehe13:41
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/908890/15/nova/tests/functional/libvirt/test_stateless_firmware.py#24613:41
sean-k-mooneytkajinam: youre not using the updated server object13:42
sean-k-mooneyoh youre passing the id and doing the db lookup13:42
sean-k-mooneyinternally13:42
sean-k-mooneybut youre using self.guest_configs[server_id].os_loader_stateless13:43
sean-k-mooneyso i wonder if that has stale info13:43
tkajinamsean-k-mooney, I'm fixing it now. However the current blocker appears in the rescue API call so that line is not related13:45
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware  https://review.opendev.org/c/openstack/nova/+/90889013:45
sean-k-mooneyok so rescue is your main concern https://review.opendev.org/c/openstack/nova/+/908890/16/nova/tests/functional/libvirt/test_stateless_firmware.py#20813:46
sean-k-mooneyill quickly pull that review and see what happens locally13:46
sean-k-mooneytkajinam: so rescue is failing because the test is trying to open the unrescue_xml_path i.e. its trying to write to disk14:09
sean-k-mooneytkajinam: so there is a missing mock somewhere14:09
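The general shape of the "missing mock" problem can be shown with a generic sketch. This is not nova's code or the fix used in the review: `save_unrescue_xml` is a hypothetical stand-in for driver code that writes XML to disk, and the path is made up. It only illustrates how patching `open` lets a test exercise such code without the instance directory existing.

```python
from unittest import mock


def save_unrescue_xml(path, xml):
    """Hypothetical stand-in for code that persists domain XML to disk."""
    with open(path, "w") as f:
        f.write(xml)


def test_save_without_touching_disk():
    """Run the writer with open() patched so no real file is created."""
    fake_open = mock.mock_open()
    path = "/nonexistent/instance-dir/unrescue.xml"  # illustrative path
    with mock.patch("builtins.open", fake_open):
        save_unrescue_xml(path, "<domain/>")
    # The code under test asked to open the path for writing...
    fake_open.assert_called_once_with(path, "w")
    # ...and wrote the XML to the fake file handle.
    fake_open.return_value.write.assert_called_once_with("<domain/>")


test_save_without_touching_disk()
print("ok")
```

Without the patch, the same call would raise `FileNotFoundError` because the directory does not exist, which is the kind of failure a functional test hits when a disk write is left unmocked.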
sean-k-mooneyhttps://paste.opendev.org/show/bCsw4SkPFosMTi0pL4yd/14:10
tkajinamI was wondering why that instance directory is not created but it might be because fake libvirt driver is used. hmm... there are no rescue functional tests with fake libvirt driver now so I may need to look into additional mocking to make the process pass14:13
sean-k-mooneythere are https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/functional/libvirt/test_rescue_deleted_base.py14:14
sean-k-mooneybut its doing https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/functional/libvirt/test_rescue_deleted_base.py#L48-L5714:15
tkajinamahhh14:15
sean-k-mooneyif i add that14:17
sean-k-mooneythen it fails with https://paste.opendev.org/show/bLajZLL1mJueSgMkPNAQ/14:17
sean-k-mooneybut that might just be an issue with the assert14:18
tkajinamyeah. rescue uses the image without stateless firmware so that assertion is wrong14:18
sean-k-mooneyok well i trust you can figure out how to fix that 14:19
sean-k-mooneyill quickly look at why the live migration test is failing14:19
sean-k-mooneyTypeError: Can't upgrade a READER transaction to a WRITER mid-transaction14:19
sean-k-mooneythats odd14:20
sean-k-mooneyi feel like youre just missing the migration stub or something like that14:21
sean-k-mooneyno you have the correct mixin14:21
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware  https://review.opendev.org/c/openstack/nova/+/90889014:24
tkajinamsean-k-mooney, sorry but where did you find that type error ?14:25
tkajinamthe remaining problem with live migration tests is that the xml content generated by fake libvirt driver does not contain any loader related fields and it's not possible to assert that these fields are passed in destination_xml. https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/fixtures/libvirt.py#L1565-L156814:26
sean-k-mooneyim running it in an IDE but you can get it to print by adding OS_DEBUG=1 to your tox command14:27
tkajinamI can technically extend the implementation to simulate more detailed behavior but I'm inclined to look into whitebox tests instead of importing more libvirt's behaviors into that fixture14:27
tkajinamah, ok14:27
sean-k-mooneyoh weird14:30
sean-k-mooneyok it passes if i run it with stestr14:31
sean-k-mooneyi.e. OS_DEBUG=1 stestr  --test-path=./nova/tests/functional  run  -- LibvirtStatelessFirmwareTest.test_live_migrate_server14:31
opendevreviewTakashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware  https://review.opendev.org/c/openstack/nova/+/90889014:34
sean-k-mooneyso that test only fails when i run it from my ide, which might be related to how im running it since im not using stestr and i kind of hacked it to work14:35
tkajinamyeah14:36
tkajinamhttps://paste.opendev.org/show/bsFDSlO0e6ZaLcsTFX2g/14:36
tkajinamat least that test passes in my env with OS_DEBUG=114:36
sean-k-mooneyif it passes in ci thats the main thing14:36
sean-k-mooneyill test v18 now14:36
sean-k-mooneycool all 8 pass under stestr14:37
sean-k-mooneyi didnt run all the functional tests but all the libvirt ones pass locally14:40
sean-k-mooneyim going to grab a drink but ill take a look at the current version when i get back14:41
tkajinamsean-k-mooney, thanks... I've added a few comments to explain some of the context behind the current test coverage, which hopefully helps that review...14:44
sean-k-mooneysome of those comments are incorrect. in general i think most of the real code changes look correct and the unit test coverage is fine15:31
sean-k-mooneyi think the functional coverage can be improved but its reasonably close15:31
sean-k-mooneyto what we would need to merge this15:31
opendevreviewMasahito Muroi proposed openstack/nova master: Use boot_roles_count and boot_role_<count> key for system_metadata  https://review.opendev.org/c/openstack/nova/+/92516317:12
*** bauzas_ is now known as bauzas17:41
*** bauzas_ is now known as bauzas18:36
opendevreviewmelanie witt proposed openstack/nova master: Fix deepcopy usage for BlockDeviceMapping in get_root_info  https://review.opendev.org/c/openstack/nova/+/92037421:49
