opendevreview | Takashi Kajinami proposed openstack/nova master: Report availability of stateless firmware support https://review.opendev.org/c/openstack/nova/+/908888 | 00:37 |
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware https://review.opendev.org/c/openstack/nova/+/908890 | 00:37 |
opendevreview | Takashi Kajinami proposed openstack/nova master: Add hw_firmware_stateless image property https://review.opendev.org/c/openstack/nova/+/926590 | 00:37 |
*** __ministry is now known as Guest919 | 01:23 | |
*** __ministry is now known as Guest930 | 02:45 | |
opendevreview | Merged openstack/nova master: [libvirt]log XML if nova fails to parse it https://review.opendev.org/c/openstack/nova/+/906364 | 03:37 |
*** __ministry is now known as Guest934 | 03:59 | |
*** __ministry is now known as Guest940 | 05:22 | |
opendevreview | Zhang Hua proposed openstack/nova master: Fix deepcopy usage for BlockDeviceMapping in get_root_info https://review.opendev.org/c/openstack/nova/+/920374 | 05:23 |
*** bauzas_ is now known as bauzas | 06:39 | |
*** __ministry is now known as Guest948 | 06:47 | |
ralonsoh | sean-k-mooney, hello! I have an investigation in progress (well, very little progress so far) | 08:39 |
ralonsoh | I'm fixing an issue with eventlet, wsgi and tls with ML2/OVN | 08:39 |
*** bauzas_ is now known as bauzas | 08:40 | |
ralonsoh | in a nutshell: I'm pushing https://review.opendev.org/c/openstack/neutron/+/925376/ | 08:40 |
ralonsoh | this patch works fine with ML2/OVN but not so great with ML2/OVS | 08:40 |
ralonsoh | the same test is randomly (but very frequently) failing in the OVS jobs: test_established_tcp_session_after_re_attachinging_sg | 08:41 |
ralonsoh | so far, the issue seems to be that n-compute is deleting the VM used to test the ping | 08:41 |
ralonsoh | for example: https://32db135de77b6f5b2e24-00cf68680e4576901ac5e284377f47f2.ssl.cf2.rackcdn.com/924317/7/experimental/neutron-tempest-plugin-openvswitch-distributed-dhcp/43f8b60/controller/logs/index.html | 08:42 |
ralonsoh | sean-k-mooney, if you have time to review these logs, can you explain why nova is trying to delete the VM? | 09:10 |
ralonsoh | Aug 19 12:41:14.998800 np0038215967 devstack@n-api.service[58447]: DEBUG nova.compute.api [None req-54123910-92d9-491b-b845-f8e7d37490dd tempest-StatefulNetworkSecGroupTest-1423213631 tempest-StatefulNetworkSecGroupTest-1423213631-project-member] [instance: 4623b509-a848-41a4-9fc2-1cf8227bd4ca] Going to try to terminate instance {{(pid=58447) delete /opt/stack/nova/nova/compute/api.py:2725}} | 09:10 |
ralonsoh | sorry, I can't find the reason | 09:10 |
mikal | sean-k-mooney: I'd do a change for the usb redirect devices as extra specs, but I'm starting to be a bit worried about any of these changes landing in time. What would you prefer? | 09:19 |
mikal | sean-k-mooney: naively I'd just cargo cult the sound device extra spec one for usb redirect devices, so I was kind of waiting for someone to tell me I did that one wrong first. | 09:19 |
sean-k-mooney[m] | mikal: I'll review that shortly and let you know | 09:40 |
sean-k-mooney[m] | ralonsoh: I'll try and take a look; nova won't delete the VM unless asked | 09:43 |
sean-k-mooney[m] | even a missing vif-plugged event will just cause it to go to ERROR, although we might stop it | 09:44 |
mikal | sean-k-mooney: yeah, I won't get to it tonight anyway as I'm on call and that always makes me tired. But yeah, I'd appreciate the feedback; I've never done one of those before. | 09:49 |
ralonsoh | sean-k-mooney[m], I think I found something: the VM is created at 12:40:08 | 10:00 |
ralonsoh | but the OVS agent receives the tap port creation almost one minute later and doesn't have time to send the vif-plugged event | 10:01 |
ralonsoh | and at the same time, n-compute decides to delete the VM | 10:01 |
ralonsoh | how can I increase this timeout? | 10:01 |
leavelet | Hi! I encountered a cinderclient.exceptions.Unauthorized error when creating an instance. I also received the message: "Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible." I've confirmed that my nova service_user is configured correctly, and nova-status upgrade check shows all checks as successful. The error logs can be found at https://paste.aosc.io/paste/hMJM8omukSVYzgvLjHEhqQ | 10:02 |
leavelet | Should I file an issue on Launchpad, or is this likely a configuration problem on my end? | 10:02 |
leavelet | I'm new to IRC and not very familiar with it, so if I'm in the wrong place, please let me know | 10:02 |
ralonsoh | sean-k-mooney[m], vif_plugging_timeout, which defaults to 300 (now I don't understand...) | 10:03 |
sean-k-mooney | ralonsoh: you really should not need to, but I think we have a timeout that defaults to 300 seconds | 10:03 |
sean-k-mooney | ya | 10:03 |
ralonsoh | so I don't understand why the VM is deleted | 10:03 |
sean-k-mooney | it's already longer than is reasonable, and you are using KVM in this job | 10:04 |
ralonsoh | The deletion is not requested by the tempest test | 10:04 |
sean-k-mooney | so the vm should be quick to boot/start | 10:04 |
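To make the timeout semantics discussed here concrete, a toy model in Python, assuming the behaviour sean-k-mooney describes: nova's `vif_plugging_timeout` (300 seconds by default, per the discussion) bounds how long the compute service waits for the network-vif-plugged event, and a missed event sends the instance to ERROR rather than deleting it. `wait_for_vif_plugged` and the module constants are hypothetical names, not nova code; the `is_fatal` switch mirrors nova's `vif_plugging_is_fatal` option, assumed here to default to true.

```python
import threading

# Hypothetical constants mirroring nova's [DEFAULT] options as described in
# the chat. This is a toy model, not nova's implementation.
VIF_PLUGGING_TIMEOUT = 300.0  # seconds
VIF_PLUGGING_IS_FATAL = True


def wait_for_vif_plugged(vif_plugged: threading.Event,
                         timeout: float = VIF_PLUGGING_TIMEOUT,
                         is_fatal: bool = VIF_PLUGGING_IS_FATAL) -> str:
    """Wait for a (simulated) network-vif-plugged event; return the new state."""
    if vif_plugged.wait(timeout):
        return "ACTIVE"
    # Timed out: the instance goes to ERROR if the timeout is fatal,
    # otherwise boot continues. Neither path deletes the instance.
    return "ERROR" if is_fatal else "ACTIVE"


if __name__ == "__main__":
    event = threading.Event()
    # Simulate neutron delivering the event shortly after the domain starts.
    threading.Timer(0.1, event.set).start()
    print(wait_for_vif_plugged(event, timeout=1.0))
```

The point of the model is the shape of the failure: a late or missing event surfaces as an ERROR state after the timeout, so a DELETE in the logs has to come from somewhere else.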
sean-k-mooney | I did notice that libvirt is a little unhappy on that node and it failed to detach an interface in one case | 10:05 |
sean-k-mooney | you are also using very old guest images (20.04 and CirrOS 0.5.3); I'm not sure if that is related or not | 10:05 |
ralonsoh | btw, the VM id is 4623b509-a848-41a4-9fc2-1cf8227bd4ca and the port 9bc40eec-ae67-4b56-9e75-551609520b27 | 10:05 |
ralonsoh | that is working well in the CI, but not with this patch in particular | 10:06 |
sean-k-mooney | so the domain was defined at Aug 19 12:40:14.414263 | 10:07 |
sean-k-mooney | and we get the vif-plugged event at Aug 19 12:40:16.547804 | 10:08 |
sean-k-mooney | so 2 seconds later | 10:08 |
sean-k-mooney | ralonsoh: is this using the iptables firewall or openvswitch? I assume the latter | 10:09 |
ralonsoh | let me check (by default it's the latter) | 10:09 |
sean-k-mooney | "port_filter": true, "ovs_hybrid_plug": false, | 10:09 |
sean-k-mooney | it's the ovs firewall | 10:09 |
ralonsoh | yes, ovs fw | 10:09 |
sean-k-mooney | that means that the ml2 agent only sees the tap when the VM is started, vs. seeing the veth pair | 10:10 |
sean-k-mooney | slightly earlier | 10:10 |
sean-k-mooney | so if we look at the ovs log we should see when it sees the tap | 10:11 |
ralonsoh | at Aug 19 12:41:15.653623 | 10:11 |
ralonsoh | just before it is deleted | 10:11 |
sean-k-mooney | 2024-08-19T12:40:15.533Z|02270|bridge|INFO|bridge br-int: added interface tap9bc40eec-ae on port 257 | 10:12 |
ralonsoh | the ovsdb client detects the tap at 12:41:15 | 10:12 |
ralonsoh | Aug 19 12:41:15.653623 np0038215967 neutron-openvswitch-agent[61294]: DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["18eba42b-2254-46a1-aeb8-2027e39235ed","old",null,257,null],["","new","tap9bc40eec-ae",-1,["map",[["attached-mac","fa:16:3e:ec:7b:05"],["iface-id","9bc40eec-ae67-4b56-9e75-551609520b27"],["iface-status","active"],["vm-uuid","4623b509-a848-41a4-9fc2-1cf8227bd4ca"]]]]],"headings":["row","action","name","ofport","external_ids"]} {{(pid=61294) _read_stdout /opt/stack/neutron/neutron/agent/common/async_process.py:285}} | 10:12 |
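The ovsdb-client monitor output pasted above is JSON with a `headings` list and one array per row under `data`; OVSDB encodes map columns as `["map", [[key, value], ...]]`. A small Python sketch (the function name is ours, not Neutron's) turns it into dicts so the columns can be read off directly.

```python
import json

# Copied from the neutron-openvswitch-agent log line quoted in the chat.
RAW = (
    '{"data":[["18eba42b-2254-46a1-aeb8-2027e39235ed","old",null,257,null],'
    '["","new","tap9bc40eec-ae",-1,["map",[["attached-mac","fa:16:3e:ec:7b:05"],'
    '["iface-id","9bc40eec-ae67-4b56-9e75-551609520b27"],'
    '["iface-status","active"],'
    '["vm-uuid","4623b509-a848-41a4-9fc2-1cf8227bd4ca"]]]]],'
    '"headings":["row","action","name","ofport","external_ids"]}'
)


def parse_monitor_output(raw: str) -> list:
    """Turn `ovsdb-client monitor --format=json` output into a list of dicts."""
    doc = json.loads(raw)
    rows = []
    for values in doc["data"]:
        row = dict(zip(doc["headings"], values))
        ext = row.get("external_ids")
        # OVSDB encodes a map column as ["map", [[key, value], ...]].
        if isinstance(ext, list) and ext and ext[0] == "map":
            row["external_ids"] = dict(ext[1])
        rows.append(row)
    return rows


for row in parse_monitor_output(RAW):
    print(row["action"], row["name"], row["ofport"], row["external_ids"])
```

Applied to this output, it shows an `old` row with `ofport` 257 followed by a `new` row for `tap9bc40eec-ae` with `ofport` -1, carrying the port and VM UUIDs from the chat in `external_ids`.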
sean-k-mooney | no, it sees it just before neutron sends the vif-plugged event | 10:12 |
sean-k-mooney | i was checking the actual ovs-vswitchd log https://32db135de77b6f5b2e24-00cf68680e4576901ac5e284377f47f2.ssl.cf2.rackcdn.com/924317/7/experimental/neutron-tempest-plugin-openvswitch-distributed-dhcp/43f8b60/controller/logs/openvswitch/ovs-vswitchd_log.txt | 10:13 |
ralonsoh | then I don't know why there is this delay between the vswitch detection and the ovsdb-client | 10:14 |
ralonsoh | that's a lot | 10:14 |
ralonsoh | actually one minute (could be an error in the timestamp?) | 10:15 |
sean-k-mooney | ralonsoh: so nova received the network-vif-plugged event correctly after 2 seconds | 10:16 |
sean-k-mooney | the instance delete starts at Aug 19 12:41:15 | 10:17 |
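The timeline being reconstructed here can be checked with a little arithmetic. The sketch below takes the timestamps quoted in the conversation and prints each event's offset from the libvirt domain definition, assuming all the logs on the node share one clock and the same date (the ovs-vswitchd log is in UTC; the journal timestamps are assumed to be as well).

```python
from datetime import datetime

# Timestamps quoted in the chat (2024-08-19). Assumption: same day, same clock.
EVENTS = [
    ("libvirt domain defined", "12:40:14.414263"),
    ("ovs-vswitchd added tap9bc40eec-ae", "12:40:15.533"),
    ("nova received network-vif-plugged", "12:40:16.547804"),
    ("nova-api DELETE request (tempest cleanup)", "12:41:15.092746"),
    ("ovsdb-client monitor reported tap row", "12:41:15.653623"),
]


def _parse(ts: str) -> datetime:
    # %f accepts 1 to 6 digits, so "15.533" is read as 533000 microseconds.
    return datetime.strptime(ts, "%H:%M:%S.%f")


def offsets(events):
    """Return (name, seconds since the first event) for each event."""
    t0 = _parse(events[0][1])
    return [(name, (_parse(ts) - t0).total_seconds()) for name, ts in events]


for name, secs in offsets(EVENTS):
    print(f"{secs:10.6f}s  {name}")
```

With these inputs, the vif-plugged event arrives about 2.13 s after the domain is defined, while the ovsdb-client row shows up about 60.1 s after ovs-vswitchd logged the tap, and roughly 0.56 s after the DELETE request.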
ralonsoh | yes but why? | 10:17 |
ralonsoh | why is the VM being deleted? | 10:17 |
ralonsoh | yes, I see the vif-plugged event at 12:40:16 | 10:18 |
ralonsoh | no idea why is sending it | 10:19 |
ralonsoh | who* | 10:19 |
sean-k-mooney | is the provisioning block stuff broken again? | 10:20 |
sean-k-mooney | probably not | 10:20 |
sean-k-mooney | I'm going to check nova's API to see what deleted the instance | 10:20 |
ralonsoh | I'm checking the neutron API logs and this event is not sent | 10:20 |
sean-k-mooney | so the delete did happen via the api | 10:24 |
sean-k-mooney | Aug 19 12:41:15.092746 np0038215967 devstack@n-api.service[58447]: INFO nova.api.openstack.requestlog [None req-54123910-92d9-491b-b845-f8e7d37490dd tempest-StatefulNetworkSecGroupTest-1423213631 tempest-StatefulNetworkSecGroupTest-1423213631-project-member] 149.202.166.120 "DELETE /compute/v2.1/servers/4623b509-a848-41a4-9fc2-1cf8227bd4ca" status: 204 len: 0 microversion: 2.1 time: 0.127466 | 10:24 |
sean-k-mooney | so the question is: was that tempest? | 10:24 |
sean-k-mooney | it probably was, but we can check the request ID | 10:24 |
ralonsoh | yes | 10:25 |
ralonsoh | 2024-08-19 12:41:15.096 87668 INFO tempest.lib.common.rest_client [req-54123910-92d9-491b-b845-f8e7d37490dd req-54123910-92d9-491b-b845-f8e7d37490dd ] Request (StatefulNetworkSecGroupTest:_run_cleanups): 204 DELETE https://149.202.166.120/compute/v2.1/servers/4623b509-a848-41a4-9fc2-1cf8227bd4ca 0.147s | 10:25 |
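What settled the question was matching the request ID across the nova-api and tempest logs. A minimal sketch of that correlation in Python; the regex assumes the `req-<uuid>` format seen in these lines, and the two strings are shortened excerpts of the quoted log lines.

```python
import re

REQ_ID_RE = re.compile(
    r"req-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
# tempest logs the calling test class and method as "Request (Class:method)".
CALLER_RE = re.compile(r"Request \((\w+):(\w+)\)")

NOVA_API_LINE = (
    "INFO nova.api.openstack.requestlog "
    "[None req-54123910-92d9-491b-b845-f8e7d37490dd ...] "
    '"DELETE /compute/v2.1/servers/4623b509-a848-41a4-9fc2-1cf8227bd4ca" '
    "status: 204")
TEMPEST_LINE = (
    "INFO tempest.lib.common.rest_client "
    "[req-54123910-92d9-491b-b845-f8e7d37490dd "
    "req-54123910-92d9-491b-b845-f8e7d37490dd ] "
    "Request (StatefulNetworkSecGroupTest:_run_cleanups): 204 DELETE "
    "https://149.202.166.120/compute/v2.1/servers/"
    "4623b509-a848-41a4-9fc2-1cf8227bd4ca 0.147s")


def shared_request_ids(a: str, b: str) -> set:
    """Request IDs that appear in both log lines."""
    return set(REQ_ID_RE.findall(a)) & set(REQ_ID_RE.findall(b))


def tempest_caller(line: str):
    """(test class, method) that issued a tempest REST call, if logged."""
    m = CALLER_RE.search(line)
    return (m.group(1), m.group(2)) if m else None


print(shared_request_ids(NOVA_API_LINE, TEMPEST_LINE))
print(tempest_caller(TEMPEST_LINE))
```

The shared ID ties the DELETE to tempest, and the caller field points at `_run_cleanups`, which is why the delete looks like cleanup after an earlier failure rather than its cause.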
ralonsoh | I didn't find it before... | 10:25 |
ralonsoh | sean-k-mooney, thanks a lot!! | 10:25 |
sean-k-mooney | well, that's probably just the test cleanup | 10:26 |
sean-k-mooney | it does not tell us why it decided to do that | 10:26 |
sean-k-mooney | but I'm guessing there is a timeout or something | 10:26 |
ralonsoh | yes, now I need to know why the test is failing | 10:26 |
ralonsoh | I see the SSH is working | 10:26 |
sean-k-mooney | and you're not expecting it to? | 10:28 |
ralonsoh | no, that' | 10:28 |
ralonsoh | that's OK, but the ping in the test still works, and it should not | 10:28 |
ralonsoh | add SG rule -> ping; remove SG rule -> no ping | 10:28 |
sean-k-mooney | so I think for ml2/ovs we always allow ICMP, but I might be mistaken | 10:29 |
sean-k-mooney | OVN, iptables and the ovs security group driver do have some different behavior | 10:29 |
sean-k-mooney | although maybe I'm thinking of ARP, not ICMP | 10:30 |
sean-k-mooney | is the test stopping the ping when it removes the rule? | 10:30 |
sean-k-mooney | it could be related to conntrack | 10:30 |
ralonsoh | sorry, not ping but nc command | 10:30 |
sean-k-mooney | :) OK, again it could be related to conntrack; I don't think we terminate established connections | 10:31 |
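sean-k-mooney's conntrack hypothesis is easiest to see in a toy model. The Python sketch below is not Neutron's OVS firewall driver; it is a hypothetical stateful filter in which a flow admitted by a rule is recorded as established, and later packets on that flow are accepted from the connection-tracking table without re-checking rules. Whether ml2/ovs flushes such entries when a security group rule is removed is exactly what is in question, so it is a flag here. Port 3000 and the addresses are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatefulFirewall:
    """Hypothetical stateful filter; NOT Neutron's OVS firewall driver."""

    allowed_ports: set = field(default_factory=set)
    # Established flows as (source address, destination port) tuples.
    conntrack: set = field(default_factory=set)
    # Whether removing a rule also drops matching established flows.
    flush_on_rule_removal: bool = False

    def add_rule(self, port: int) -> None:
        self.allowed_ports.add(port)

    def remove_rule(self, port: int) -> None:
        self.allowed_ports.discard(port)
        if self.flush_on_rule_removal:
            self.conntrack = {f for f in self.conntrack if f[1] != port}

    def accept(self, src: str, dport: int) -> bool:
        flow = (src, dport)
        if flow in self.conntrack:
            return True  # established: rules are not re-checked
        if dport in self.allowed_ports:
            self.conntrack.add(flow)  # new flow admitted by a rule
            return True
        return False


if __name__ == "__main__":
    fw = ToyStatefulFirewall()
    fw.add_rule(3000)
    fw.accept("10.0.0.5", 3000)          # nc session established
    fw.remove_rule(3000)
    print(fw.accept("10.0.0.5", 3000))  # True: the old session still passes
    print(fw.accept("10.0.0.6", 3000))  # False: new sessions are blocked
```

With `flush_on_rule_removal=False`, an "add rule, connect, remove rule, expect no traffic" test passes only if it opens a fresh connection after the removal.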
*** __ministry is now known as Guest966 | 12:10 | |
sean-k-mooney | mikal: https://review.opendev.org/c/openstack/nova/+/926126 is correct, although dansmith, for valid reasons, prefers that we now split this type of change into 2 commits: 1 for the object change and 1 for the driver change. | 12:49 |
*** ministry is now known as __ministry | 13:11 | |
*** __ministry is now known as Guest970 | 13:11 | |
*** jamesdenton_alt is now known as jamesdenton | 13:25 | |
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware https://review.opendev.org/c/openstack/nova/+/908890 | 13:26 |
tkajinam | sean-k-mooney, I've added functional test coverage but I'm struggling to understand why the rebuild test fails... I'll dig into it further, but it'd be really appreciated if you could take a look to find out what I've overlooked. | 13:29 |
tkajinam | also there is a slightly tricky problem with an assertion in the live migration test. I've summarized it (as well as the problem with rescue), so it'd be nice if you could check it when you have time | 13:30 |
sean-k-mooney | tkajinam: if I can find time to run it locally I'll try and do that | 13:36 |
tkajinam | sean-k-mooney, thx | 13:36 |
sean-k-mooney | tkajinam: we have test helpers for rebuild, by the way, and some of the other actions | 13:36 |
sean-k-mooney | so you should not need to do a raw POST | 13:36 |
tkajinam | ah, ok. let me find these. I found these bare POSTs in a few other test files but we can probably replace these, too | 13:37 |
tkajinam | (in a separate follow-up) | 13:37 |
sean-k-mooney | https://github.com/openstack/nova/blob/master/nova/tests/functional/integrated_helpers.py#L550 | 13:37 |
sean-k-mooney | most of the actions should be there, and if you need a new one you should just add it there | 13:38 |
sean-k-mooney | that's not your issue, but it's good to know they exist to avoid code duplication | 13:38 |
sean-k-mooney | OS-DCF:diskConfig is not generally required, by the way, and it defaults to AUTO, which is why we don't actually set it today | 13:40 |
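For reference, the raw request that such a helper wraps is a server action: `POST /servers/{server_id}/action` with a `rebuild` body. A sketch, assuming a client object that exposes a `post_server_action(server_id, body)` method; the `rebuild_server` helper and `RecordingClient` below are hypothetical, not nova's `integrated_helpers` code. `OS-DCF:diskConfig` is only added when explicitly requested since, per the discussion, it defaults to AUTO.

```python
from typing import Optional

VALID_DISK_CONFIGS = ("AUTO", "MANUAL")


def rebuild_body(image_uuid: str, disk_config: Optional[str] = None) -> dict:
    """Build a Compute API rebuild action body."""
    rebuild = {"imageRef": image_uuid}
    if disk_config is not None:
        if disk_config not in VALID_DISK_CONFIGS:
            raise ValueError(f"invalid OS-DCF:diskConfig: {disk_config!r}")
        rebuild["OS-DCF:diskConfig"] = disk_config
    return {"rebuild": rebuild}


class RecordingClient:
    """Hypothetical stand-in for a test API client; records action POSTs."""

    def __init__(self):
        self.calls = []

    def post_server_action(self, server_id, body):
        self.calls.append((f"/servers/{server_id}/action", body))


def rebuild_server(client, server_id: str, image_uuid: str) -> None:
    """Hypothetical helper: one call instead of a hand-built raw POST."""
    client.post_server_action(server_id, rebuild_body(image_uuid))
```

A real helper would typically also wait for the server to return to ACTIVE; that polling is omitted here.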
sean-k-mooney | oh hehe | 13:41 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/908890/15/nova/tests/functional/libvirt/test_stateless_firmware.py#246 | 13:41 |
sean-k-mooney | tkajinam: you're not using the updated server object | 13:42 |
sean-k-mooney | oh, you're passing the ID and doing the DB lookup | 13:42 |
sean-k-mooney | internally | 13:42 |
sean-k-mooney | but you're using self.guest_configs[server_id].os_loader_stateless | 13:43 |
sean-k-mooney | so I wonder if that has stale info | 13:43 |
tkajinam | sean-k-mooney, I'm fixing it now. However the current blocker appears in the rescue API call so that line is not related | 13:45 |
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware https://review.opendev.org/c/openstack/nova/+/908890 | 13:45 |
sean-k-mooney | ok, so rescue is your main concern https://review.opendev.org/c/openstack/nova/+/908890/16/nova/tests/functional/libvirt/test_stateless_firmware.py#208 | 13:46 |
sean-k-mooney | I'll quickly pull that review and see what happens locally | 13:46 |
sean-k-mooney | tkajinam: so rescue is failing because the test is trying to open the unrescue_xml_path, i.e. it's trying to write to disk | 14:09 |
sean-k-mooney | tkajinam: so there is a missing mock somewhere | 14:09 |
sean-k-mooney | https://paste.opendev.org/show/bCsw4SkPFosMTi0pL4yd/ | 14:10 |
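The failure mode here, driver code opening a file under an instance directory that the fake driver never created, is the usual case for patching `open` in a test. A minimal sketch with `unittest.mock.mock_open`; `save_unrescue_xml` is a hypothetical stand-in for the driver step that writes the unrescue XML, not nova's code, and the existing rescue functional test may stub things differently.

```python
import os
from unittest import mock


def save_unrescue_xml(instance_dir: str, xml: str) -> str:
    """Hypothetical stand-in: persist the pre-rescue domain XML to disk."""
    path = os.path.join(instance_dir, "unrescue.xml")
    with open(path, "w") as f:
        f.write(xml)
    return path


def test_save_unrescue_xml_without_touching_disk():
    fake_open = mock.mock_open()
    # Patch the builtin so nothing is written, even though the directory
    # does not exist on the test host.
    with mock.patch("builtins.open", fake_open):
        path = save_unrescue_xml("/nonexistent/instance-dir", "<domain/>")
    fake_open.assert_called_once_with(path, "w")
    fake_open.return_value.write.assert_called_once_with("<domain/>")


if __name__ == "__main__":
    test_save_unrescue_xml_without_touching_disk()
    print("ok")
```

Without the patch, the same call would fail with `FileNotFoundError`, which matches the symptom of the test trying to write to disk.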
tkajinam | I was wondering why that instance directory is not created, but it might be because the fake libvirt driver is used. hmm... there are no rescue functional tests with the fake libvirt driver now, so I may need to look into additional mocking to make the process pass | 14:13 |
sean-k-mooney | there are https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/functional/libvirt/test_rescue_deleted_base.py | 14:14 |
sean-k-mooney | but it's doing https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/functional/libvirt/test_rescue_deleted_base.py#L48-L57 | 14:15 |
tkajinam | ahhh | 14:15 |
sean-k-mooney | if i add that | 14:17 |
sean-k-mooney | then it fails with https://paste.opendev.org/show/bLajZLL1mJueSgMkPNAQ/ | 14:17 |
sean-k-mooney | but that might just be an issue with the assert | 14:18 |
tkajinam | yeah. rescue uses the image without stateless firmware so that assertion is wrong | 14:18 |
sean-k-mooney | ok well i trust you can figure out how to fix that | 14:19 |
sean-k-mooney | I'll quickly look at why the live migration test is failing | 14:19 |
sean-k-mooney | TypeError: Can't upgrade a READER transaction to a WRITER mid-transaction | 14:19 |
sean-k-mooney | that's odd | 14:20 |
sean-k-mooney | I feel like you're just missing the migration stub or something like that | 14:21 |
sean-k-mooney | no, you have the correct mixin | 14:21 |
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware https://review.opendev.org/c/openstack/nova/+/908890 | 14:24 |
tkajinam | sean-k-mooney, sorry, but where did you find that TypeError? | 14:25 |
tkajinam | the remaining problem with the live migration tests is that the XML generated by the fake libvirt driver does not contain any loader-related fields, so it's not possible to assert that these fields are passed in destination_xml. https://github.com/openstack/nova/blob/0b091179d575a5e7cff9cd76223a8016cfbbc019/nova/tests/fixtures/libvirt.py#L1565-L1568 | 14:26 |
sean-k-mooney | I'm running it in an IDE, but you can get it to print by adding OS_DEBUG=1 to your tox command | 14:27 |
tkajinam | I could technically extend the implementation to simulate more detailed behavior, but I'm inclined to look into whitebox tests instead of importing more of libvirt's behavior into that fixture | 14:27 |
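On a real host, the check tkajinam wants amounts to inspecting the `<os><loader>` element of the destination domain XML. A sketch with the standard library's ElementTree, assuming libvirt's `stateless='yes'` attribute on `<loader>` is what marks stateless firmware; the sample XML and firmware path are illustrative. As tkajinam notes, the fake libvirt fixture's generated XML has no loader element at all, so this kind of assertion needs either a richer fixture or a whitebox test.

```python
import xml.etree.ElementTree as ET

# Illustrative destination XML; the firmware path is a made-up example.
SAMPLE_DEST_XML = """<domain type='kvm'>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
    <loader readonly='yes' type='pflash' stateless='yes'>/usr/share/OVMF/OVMF_CODE.fd</loader>
  </os>
</domain>"""


def loader_is_stateless(domain_xml: str) -> bool:
    """True if <os><loader> exists and carries stateless='yes'."""
    loader = ET.fromstring(domain_xml).find("./os/loader")
    return loader is not None and loader.get("stateless") == "yes"


print(loader_is_stateless(SAMPLE_DEST_XML))
# XML without any <loader>, like the fake fixture's output, yields False.
print(loader_is_stateless("<domain><os><type>hvm</type></os></domain>"))
```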
tkajinam | ah, ok | 14:27 |
sean-k-mooney | oh weird | 14:30 |
sean-k-mooney | OK, it passes if I run it with stestr | 14:31 |
sean-k-mooney | i.e. OS_DEBUG=1 stestr --test-path=./nova/tests/functional run -- LibvirtStatelessFirmwareTest.test_live_migrate_server | 14:31 |
opendevreview | Takashi Kajinami proposed openstack/nova master: libvirt: Launch instances with stateless firmware https://review.opendev.org/c/openstack/nova/+/908890 | 14:34 |
sean-k-mooney | so that test only fails when I run it from my IDE, which might be related to how I'm running it, since I'm not using stestr and I kind of hacked it to work | 14:35 |
tkajinam | yeah | 14:36 |
tkajinam | https://paste.opendev.org/show/bsFDSlO0e6ZaLcsTFX2g/ | 14:36 |
tkajinam | at least that test passes in my env with OS_DEBUG=1 | 14:36 |
sean-k-mooney | if it passes in CI, that's the main thing | 14:36 |
sean-k-mooney | I'll test v18 now | 14:36 |
sean-k-mooney | cool, all 8 pass under stestr | 14:37 |
sean-k-mooney | I didn't run all the functional tests, but all the libvirt ones pass locally | 14:40 |
sean-k-mooney | I'm going to grab a drink, but I'll take a look at the current version when I get back | 14:41 |
tkajinam | sean-k-mooney, thanks... I've added a few comments to explain some of the context behind the current test coverage, which hopefully helps with the review... | 14:44 |
sean-k-mooney | some of those comments are incorrect. In general, I think most of the real code changes look correct and the unit test coverage is fine | 15:31 |
sean-k-mooney | I think the functional coverage can be improved, but it's reasonably close | 15:31 |
sean-k-mooney | to what we would need to merge this | 15:31 |
opendevreview | Masahito Muroi proposed openstack/nova master: Use boot_roles_count and boot_role_<count> key for system_metadata https://review.opendev.org/c/openstack/nova/+/925163 | 17:12 |
*** bauzas_ is now known as bauzas | 17:41 | |
*** bauzas_ is now known as bauzas | 18:36 | |
opendevreview | melanie witt proposed openstack/nova master: Fix deepcopy usage for BlockDeviceMapping in get_root_info https://review.opendev.org/c/openstack/nova/+/920374 | 21:49 |