opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: Add a regression test for bug 1939545 https://review.opendev.org/c/openstack/nova/+/843948 | 00:05 |
---|---|---|
opendevreview | Artom Lifshitz proposed openstack/nova stable/victoria: compute: Ensure updates to bdms during pre_live_migration are saved https://review.opendev.org/c/openstack/nova/+/843949 | 00:05 |
opendevreview | melanie witt proposed openstack/nova stable/train: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/840777 | 00:19 |
opendevreview | Steve Baker proposed openstack/nova master: Align ironic driver with libvirt secure boot enable https://review.opendev.org/c/openstack/nova/+/844243 | 05:35 |
*** whoami-rajat__ is now known as whoami-rajat | 07:35 | |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Ignore LibvirtConfigObject kwargs https://review.opendev.org/c/openstack/nova/+/830644 | 08:44 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Remove unnecessary TODO https://review.opendev.org/c/openstack/nova/+/830645 | 08:44 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 08:44 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/train: Extend the reproducer for 1953359 and 1952915 https://review.opendev.org/c/openstack/nova/+/839354 | 09:22 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/train: [rt] Apply migration context for incoming migrations https://review.opendev.org/c/openstack/nova/+/839355 | 09:22 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Reject AZ changes during aggregate add / remove host https://review.opendev.org/c/openstack/nova/+/821423 | 09:26 |
opendevreview | Balazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39 https://review.opendev.org/c/openstack/osc-placement/+/828545 | 09:32 |
slaweq | gibi: hi, for a few days we have been seeing the same failure in our fedora based scenario periodic job, see https://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d for example | 09:38 |
slaweq | it seems like some nova and/or glance issue to me | 09:38 |
slaweq | have you maybe already seen something like that? or should I open a new LP for it? | 09:38 |
gibi | slaweq: it does not immediately ring a bell but give me some time to look into it... I will get back to you | 09:39 |
slaweq | @gibi sure, thx a lot | 09:40 |
sean-k-mooney | slaweq: its probably related to running on python 3.10 | 09:47 |
sean-k-mooney | slaweq: we dont have any tempest based testing on 3.10 currently for nova | 09:48 |
sean-k-mooney | if you are only seeing it on fedora | 09:48 |
slaweq | sean-k-mooney: yes, we are seeing it only on fedora currently | 09:50 |
sean-k-mooney | do you need to run that job on fedora for a particular reason by the way? im wondering if ubuntu 22.04 would see the same failure | 09:51 |
slaweq | sean-k-mooney (@sean-k-mooney:matrix.org) we just want to have fedora based periodic job, we run many Ubuntu jobs in check/gate queues already | 09:54 |
slaweq | maybe we could/should move it to c9s now | 09:54 |
slaweq | but for now it's fedora | 09:54 |
sean-k-mooney | slaweq: right i was going to suggest ye move to ubuntu 22.04 or centos 9 stream | 09:57 |
sean-k-mooney | i strongly dislike having fedora based testing in the gate | 09:57 |
sean-k-mooney | im supportive of c9s testing | 09:58 |
sean-k-mooney | im looking at the logs currently and waiting for the filters error logs to render | 09:59 |
sean-k-mooney | actually there are no error level logs in nova | 10:02 |
sean-k-mooney | so there were no exceptions | 10:02 |
sean-k-mooney | i do see DEBUG neutronclient.v2_0.client [-] Error message: {"NeutronError": {"type": "PortNotFound", "message": "Port 1a6c72ae-3e1a-45d6-a70e-1f9964b70756 could not be found.", "detail": ""}} | 10:02 |
sean-k-mooney | but thats probably not related to the failure | 10:02 |
sean-k-mooney | im only looking at the nova compute currently | 10:03 |
gibi | slaweq: I see that the instance being snapshotted is crashed | 10:04 |
gibi | https://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d/log/controller/logs/screen-n-cpu.txt#19235 | 10:04 |
gibi | Instance instance-0000002e disappeared while taking snapshot of it: [Error Code 42] Domain not found: no domain with matching uuid | 10:04 |
gibi | https://zuul.openstack.org/build/4a7f284f32eb436da6b5ef59d46e615d/log/controller/logs/libvirt/libvirt/qemu/instance-0000002e_log.txt | 10:04 |
gibi | 2022-06-01 03:13:33.853+0000: shutting down, reason=crashed | 10:04 |
gibi | but I don't see any reason why it crashed | 10:05 |
sean-k-mooney | its paused but i have not gotten to where it crashed yet | 10:05 |
sean-k-mooney | Instance instance-0000002e disappeared while taking snapshot of it: [Error Code 42] Domain not found: no domain with matching uuid '84e8668d-6263-4a85-98f5-15d1a7e38f0d' (instance-0000002e) | 10:06 |
gibi | it seems that the syslog journal was rotated as it only has later logs in it | 10:07 |
gibi | slaweq: as far as I see this job has been failing since 2022-05-07 with the same issue | 10:16 |
gibi | but older runs have no logs any more | 10:17 |
gibi | so potentially it has been failing a lot longer than 05-07 | 10:18 |
gibi | slaweq: I found the rotated journal | 10:26 |
gibi | qemu segfaulted | 10:26 |
gibi | https://paste.opendev.org/show/bIxQstqasZeJiuyDRHxD/ | 10:26 |
gibi | better paste https://paste.opendev.org/show/byrrh32nwTYvGXtw52ax/ | 10:27 |
sean-k-mooney | nice find | 10:29 |
sean-k-mooney | so likely a fedora bug | 10:29 |
gibi | probably worth taking up with the downstream virt team | 10:29 |
gibi | as my knowledge ends here | 10:29 |
sean-k-mooney | ya perhaps although they dont really care about the tcg backend for qemu | 10:30 |
sean-k-mooney | so if this does not happen with kvm | 10:30 |
sean-k-mooney | they may or may not help | 10:30 |
sean-k-mooney | hum there is nothing useful in the qemu instance log | 10:32 |
sean-k-mooney | it looks like a pretty normal vm | 10:33 |
sean-k-mooney | -machine pc-i440fx-6.1,accel=tcg,usb=off,dump-guest-core=off,memory-backend=pc.ram | 10:33 |
sean-k-mooney | so still pc machine type too | 10:33 |
opendevreview | Merged openstack/nova master: Fix typos in help messages https://review.opendev.org/c/openstack/nova/+/843843 | 10:50 |
opendevreview | Merged openstack/nova master: Fix typos https://review.opendev.org/c/openstack/nova/+/843127 | 10:50 |
opendevreview | Merged openstack/nova stable/train: Define new functional test tox env for placement gate to run https://review.opendev.org/c/openstack/nova/+/840777 | 10:50 |
*** tbachman_ is now known as tbachman | 10:56 | |
*** tbachman_ is now known as tbachman | 11:07 | |
opendevreview | Balazs Gibizer proposed openstack/osc-placement master: Drop py36 and py37 support https://review.opendev.org/c/openstack/osc-placement/+/844281 | 11:28 |
opendevreview | Balazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39 https://review.opendev.org/c/openstack/osc-placement/+/828545 | 11:29 |
slaweq | thx sean-k-mooney (@sean-k-mooney:matrix.org) and gibi for checking tha | 11:36 |
slaweq | *that | 11:36 |
sean-k-mooney | slaweq: assuming this is just the qemu bug then you should be able to swap the nodeset to c9s and that hopefully will resolve the problem unless that bug has already made it into c9s | 11:42 |
gibi | elodilles, melwitt: I see a new blocker on stable/ussuri https://zuul.opendev.org/t/openstack/build/034890df09f94bcba63b97f1586e9a48/log/job-output.txt#43311 | 11:46 |
gibi | added to the tracking etherpad https://etherpad.opendev.org/p/nova-stable-branch-ci L26 | 11:46 |
gibi | gmann: ^^ fyi maybe you have a view on this too | 11:47 |
elodilles | gibi: there's already a patch for that: https://review.opendev.org/c/openstack/openstacksdk/+/843978 | 11:49 |
gibi | thanks, then now that is a blocker for ussuri as the job fails 100% | 11:51 |
gibi | due to master upper-constraint was bumped recently | 11:51 |
elodilles | yepp | 11:52 |
gibi | elodilles: thanks for the info | 11:54 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: func: Increase rpc_response_timeout in TestMultiCellMigrate tests https://review.opendev.org/c/openstack/nova/+/844200 | 12:01 |
gibi | elodilles, melwitt: I saw this issue ^^ in wallaby (newer branches are already fixed) | 12:03 |
opendevreview | Balazs Gibizer proposed openstack/osc-placement master: Add Python3 zed unit tests https://review.opendev.org/c/openstack/osc-placement/+/835369 | 12:09 |
opendevreview | Balazs Gibizer proposed openstack/osc-placement master: Support microversion 1.39 https://review.opendev.org/c/openstack/osc-placement/+/828545 | 12:09 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: [minor]Remove unused argument from _fake_do_delete https://review.opendev.org/c/openstack/nova/+/844285 | 12:15 |
gibi | elodilles, melwitt: another blocker on stable/train: grenade is failing in devstack-plugin-ceph https://zuul.opendev.org/t/openstack/build/a6e02e6d4b5d40f794c188c852214002/log/job-output.txt#5771-5780 | 12:17 |
gibi | it seems it started two days ago https://paste.opendev.org/show/bxCRqAFCVzM28hDqlWix/ | 12:17 |
elodilles | gibi: wallaby patch is +2'd. the train blocker is interesting :S i was focused on ussuri and victoria and haven't noticed we have a blocker in train :S | 12:25 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 12:28 |
gibi | elodilles: seems to be a fresh one | 12:32 |
gibi | elodilles: thanks for the review on the wallaby one | 12:33 |
elodilles | gibi: about the train failure: it is strange, it seems 'source' command is missing (?) from the node :-o | 12:40 |
gibi | it could be that "" is not found | 12:45 |
gibi | by source | 12:45 |
gibi | as the next line is | 12:46 |
gibi | /bin/sh: 6: install_ceph_remote: not found | 12:46 |
gibi | or meh | 12:46 |
elodilles | i guess that should come from the sourced code | 12:46 |
gibi | https://github.com/openstack/devstack-plugin-ceph/commit/67da31fa430b99bbe13dc53bbcf81d0e7688523f | 12:47 |
gibi | this did some change 6 days ago on ussuri and victoria | 12:47 |
gibi | so this might be related | 12:47 |
gibi | other than that I don't see any new thing in the plugin repo | 12:50 |
gibi | elodilles: btw, you were right in https://review.opendev.org/c/openstack/nova/+/839354/2#message-8f0247bc92c2d82ed37c05c88db0769fa1f69fa0 I fixed up the cherry pick | 13:04 |
gibi | you have eagle eyes | 13:04 |
elodilles | gibi: diffing the diffs helps a lot :D | 13:22 |
elodilles | gibi: it was strange that in one we have CPUPinning while in the other we have CPUUnpinning :) | 13:23 |
elodilles | i'll review that again when i get there :) | 13:23 |
elodilles | btw, i think the problem with the train thing is that we don't have bash (thus 'source' command). somehow now it runs /bin/sh instead of /bin/bash. (at least that is what i suspect) | 13:25 |
sean-k-mooney | elodilles: odd thing i noticed on pop os 22.04 but might be the same on ubuntu 22.04 /bin/sh is actually dash... | 13:26 |
sean-k-mooney | you should never depend on /bin/sh for portable scripts | 13:26 |
gibi | elodilles: interesting... | 13:27 |
sean-k-mooney | . <thing> | 13:27 |
sean-k-mooney | is more portable than "source thing" | 13:27 |
sean-k-mooney | but ya i hit the /bin/sh -> /bin/dash issue because pushd was not a command | 13:28 |
sean-k-mooney | in the script that uses /bin/sh | 13:29 |
sean-k-mooney | where is /bin/sh being used | 13:29 |
elodilles | sean-k-mooney: what i see is we explicitly ask for /bin/bash: https://opendev.org/openstack/nova/src/branch/stable/train/gate/live_migration/hooks/ceph.sh#L10 | 13:30 |
sean-k-mooney | is it in the devstack plugin | 13:30 |
sean-k-mooney | ah yep we do | 13:30 |
elodilles | sean-k-mooney: but then we got errors like '/bin/sh: .: install_ceph_remote: not found' | 13:30 |
elodilles | sean-k-mooney: or '/bin/sh: 5: source: not found' | 13:31 |
sean-k-mooney | why are we using ansible with raw there | 13:31 |
sean-k-mooney | and not shell | 13:31 |
elodilles | so it seems the subnode does not have 'bash' | 13:31 |
elodilles | sean-k-mooney: good question :) | 13:32 |
sean-k-mooney | it should. im not sure that executable=/bin/bash is the correct syntax | 13:32 |
sean-k-mooney | that is defining a variable in the "script" | 13:32 |
sean-k-mooney | its not actually part of the argument to the ansible module | 13:33 |
sean-k-mooney | as in that is not telling the raw module to use bash as far as i can see | 13:33 |
sean-k-mooney | its just storing /bin/bash in the executable variable in the shell that is created by raw | 13:34 |
sean-k-mooney | executable is a parameter it can accept | 13:35 |
sean-k-mooney | https://docs.ansible.com/ansible/latest/collections/ansible/builtin/raw_module.html | 13:35 |
sean-k-mooney | but that does not look like the correct syntax to me | 13:35 |
elodilles | sean-k-mooney: the interesting thing is that this worked so far on stable/train, but some days ago it started to fail | 13:36 |
sean-k-mooney | ~/repos/ansible_role_devstack on add-ubuntu-multinode [?] via 🐍 v3.8.13 (.venv) | 13:40 |
sean-k-mooney | [14:40:15]❯ ansible localhost -m raw -a "executable=/bin/bash; echo test" | 13:40 |
sean-k-mooney | [WARNING]: No inventory was parsed, only implicit localhost is available | 13:40 |
sean-k-mooney | localhost | FAILED | rc=127 >> | 13:40 |
sean-k-mooney | /bin/sh: line 1: -c: command not found | 13:40 |
sean-k-mooney | non-zero return code | 13:40 |
sean-k-mooney | so that would imply that this is using /bin/sh | 13:41 |
sean-k-mooney | im trying to figure out how to print the current executable | 13:41 |
sean-k-mooney | maybe $0 | 13:41 |
sean-k-mooney | hum | 13:41 |
sean-k-mooney | [14:41:30]❯ ansible localhost -m raw -a "executable=/bin/bash echo $0" | 13:41 |
sean-k-mooney | [WARNING]: No inventory was parsed, only implicit localhost is available | 13:41 |
sean-k-mooney | localhost | CHANGED | rc=0 >> | 13:41 |
sean-k-mooney | bash | 13:41 |
sean-k-mooney | [14:42:10]➜ ansible localhost -m raw -a "executable=/home/sean/.nix-profile/bin/fish echo $0" | 13:42 |
sean-k-mooney | [WARNING]: No inventory was parsed, only implicit localhost is available | 13:42 |
sean-k-mooney | localhost | CHANGED | rc=0 >> | 13:42 |
sean-k-mooney | bash | 13:42 |
sean-k-mooney | so setting executable seems to have no impact on what is used with that syntax | 13:42 |
gibi | this is the last success run https://zuul.opendev.org/t/openstack/build/9c886d2f459143cdaf587ba0b93c7cd4/log/job-output.txt | 13:43 |
sean-k-mooney | based on https://docs.ansible.com/ansible/latest/user_guide/intro_adhoc.html it looks like that syntax should actually work | 13:44 |
elodilles | gibi: yepp, and the 'source' here works so install_ceph_remote simply runs without any error | 13:45 |
sean-k-mooney | oh im dumb i need to use '' not "" | 13:47 |
sean-k-mooney | $0 is being evaluated before its passed to ansible | 13:47 |
sean-k-mooney | https://pastebin.com/0wfCmm1z (openstack's paste seems to be down) | 13:54 |
sean-k-mooney | but ya even if i quote it correctly that does not seem to set the executable used by raw | 13:55 |
sean-k-mooney | oh no for fish the syntax is different | 13:57 |
elodilles | sean-k-mooney: based on your example, here is mine: https://paste.opendev.org/show/btfrnyRESb4uBo2t9TCj/ | 14:01 |
elodilles | sean-k-mooney: so i think bash has disappeared from the subnode in the past days | 14:01 |
sean-k-mooney | maybe but i wonder how | 14:02 |
elodilles | sean-k-mooney: and now ansible defaults to /bin/sh | 14:02 |
elodilles | sean-k-mooney: where we don't have 'source' command | 14:02 |
sean-k-mooney | well /bin/sh is not a real shell | 14:02 |
sean-k-mooney | its a symlink | 14:02 |
elodilles | sean-k-mooney: that is a good question: how :) | 14:02 |
sean-k-mooney | there is no sh binary | 14:02 |
sean-k-mooney | maybe it got removed from the cloud image by default | 14:03 |
sean-k-mooney | did you say this was train | 14:03 |
sean-k-mooney | or ussuri | 14:03 |
sean-k-mooney | just wondering if its 18.04 or 20.04 | 14:03 |
elodilles | sean-k-mooney: well it could be /bin/dash, still the result is the same: we don't have 'source' command | 14:03 |
sean-k-mooney | correct | 14:04 |
elodilles | sean-k-mooney: train | 14:04 |
elodilles | sean-k-mooney: and it's bionic | 14:04 |
elodilles | sean-k-mooney: so 18.04 | 14:04 |
sean-k-mooney | ack so i think we can pull the nodepool images directly from somewhere | 14:04 |
sean-k-mooney | infra makes them available via the nodepool job | 14:04 |
sean-k-mooney | we could open the qcow and take a look | 14:04 |
sean-k-mooney | and or look and see if anyone tweaked the dib element or nodepool config recently | 14:05 |
elodilles | sean-k-mooney: is there a way to see the recent changes in the image we have in nodepool? :-o | 14:07 |
sean-k-mooney | the images are here nb03.opendev.org | 14:09 |
sean-k-mooney | well the logs at least | 14:09 |
Uggla | sean-k-mooney, are you using starship for your prompt ? | 14:09 |
sean-k-mooney | Uggla: yes | 14:09 |
Uggla | sean-k-mooney, cool ! Do you like it ? | 14:10 |
sean-k-mooney | ya its not bad i dont notice it much | 14:11 |
sean-k-mooney | the command time in the output is nice | 14:11 |
* Uggla plans to use starship, but stay with powerline atm. | 14:11 | |
sean-k-mooney | it can be helpful sometimes | 14:11 |
sean-k-mooney | i do not use vim so powerline is not as useful to me | 14:11 |
Uggla | sean-k-mooney, you are emacs user right ? | 14:12 |
sean-k-mooney | elodilles: the x86 logs are here https://nb02.opendev.org/ubuntu-bionic-0000229995.log | 14:12 |
sean-k-mooney | Uggla: currently. i used to just use nano | 14:12 |
sean-k-mooney | i wanted something a little more advanced so went to spacemacs in emacs mode | 14:12 |
sean-k-mooney | i used to use ides like pycharm years ago but got tired of having to set them up on various dev systems | 14:13 |
Uggla | sean-k-mooney, oh ok. I tried astronvim --> really cool. (if you like vim) | 14:13 |
sean-k-mooney | i hate how modes work in vim | 14:14 |
sean-k-mooney | i dont want my modes to change what keys do | 14:14 |
sean-k-mooney | so insert vs replace vs normal vs visual mode is a deal breaker for me | 14:15 |
sean-k-mooney | i never want to have to care about that and carry that context in my brain when im coding | 14:15 |
sean-k-mooney | Uggla: as you may have noticed there is more than enough context to remember for nova as it is | 14:16 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: [minor]Remove unused argument from _fake_do_delete https://review.opendev.org/c/openstack/nova/+/844285 | 14:17 |
sean-k-mooney | elodilles: dash is definitely installed 2022-06-01 08:18:05.065 | I: Retrieving dash 0.5.8-2.10 | 14:18 |
Uggla | sean-k-mooney, sure. I understand, in my case, I use vim for so long that it is more or less automatic. | 14:18 |
sean-k-mooney | as is bash | 14:18 |
sean-k-mooney | Uggla: i used to automate uninstalling it from every system i used | 14:18 |
elodilles | sean-k-mooney: yepp, i see bash 4 in it. still it is strange :S | 14:19 |
Uggla | sean-k-mooney, ಥ_ಥ | 14:20 |
Uggla | sean-k-mooney, you break my heart. ;) | 14:20 |
elodilles | sean-k-mooney: even the base ubuntu image ( bionic-server-cloudimg-amd64.img ) has /bin/bash in it | 14:29 |
zigo | I'm getting "Live Migration failure: operation failed: migration out job: Cannot write to TLS channel: Input/output error: libvirt.libvirtError: operation failed: migration out job: Cannot write to TLS channel: Input/output error". | 14:46 |
zigo | How can I check what's wrong? | 14:46 |
zigo | Oh, I know ... nova doesn't have a shell. :/ | 14:48 |
zigo | Hum... this was a problem, but it's not the only one, I'm still getting it. | 14:50 |
opendevreview | Ghanshyam proposed openstack/nova stable/ussuri: Make sdk broken job non voting until it is fixed https://review.opendev.org/c/openstack/nova/+/844309 | 14:58 |
gmann | gibi: elodilles melwitt ^^ sdk broken job fix might take time or might not be fixed soon. http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028763.html | 14:59 |
gmann | gibi: elodilles melwitt ^^ I am making it non voting until then to unblock gate like we did in devstack | 14:59 |
elodilles | gmann: ack. though this is passing so this should be a good workaround i guess: https://review.opendev.org/c/openstack/openstacksdk/+/843978 | 15:03 |
elodilles | or do i miss something? :-o | 15:04 |
Uggla | gibi, sean-k-mooney could you review https://review.opendev.org/c/openstack/nova/+/831507 in the next days if you have time ? | 15:04 |
gmann | elodilles: yeah but honestly saying that does not give us much benefit than making it non voting | 15:05 |
zigo | Found the issue ... :) | 15:14 |
melwitt | gibi: thanks for adding those to the etherpad! also +W on the rpc_response_timeout backport | 15:18 |
melwitt | gmann: thanks, I'm looking through to understand the options | 15:36 |
gmann | melwitt: i commented on sdk patch. I like the venv approach but that need some commitment and bandwidth from sdk team | 15:38 |
melwitt | gmann: I see, makes sense. we need opinion from gtema and stephenfin | 15:39 |
gmann | yeah I will ping them on sdk channel in case they did not see | 15:40 |
melwitt | ++ | 15:40 |
melwitt | ordinarily I would say we'd want the venv approach bc sdk is to be backward compat with previous versions BUT as you point out, branch is EM so there is no obligation to keep it working | 15:41 |
gmann | yeah, same as Tempest. and Tempest team bandwidth is the key for us not to support Tempest master for all EM branches. | 15:42 |
gmann | and I think sdk is also in same situation. | 15:43 |
* melwitt nods | 15:46 | |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 15:49 |
melwitt | gmann, elodilles: commented on the sdk job n-v patch https://review.opendev.org/c/openstack/nova/+/844309 | 16:28 |
opendevreview | Ghanshyam proposed openstack/nova stable/ussuri: [stable-only] Make sdk broken job non voting until it is fixed https://review.opendev.org/c/openstack/nova/+/844309 | 16:35 |
gmann | melwitt: thanks, add [stable-only] ^^ | 16:35 |
gmann | added | 16:35 |
opendevreview | Merged openstack/nova-specs master: Repropose volume backed server rebuild spec https://review.opendev.org/c/openstack/nova-specs/+/840155 | 16:36 |
opendevreview | Merged openstack/nova stable/wallaby: func: Increase rpc_response_timeout in TestMultiCellMigrate tests https://review.opendev.org/c/openstack/nova/+/844200 | 16:43 |
gibi | gmann: thanks for the sdk nonvoting patch | 17:19 |
opendevreview | Merged openstack/nova stable/xena: Add service version check workaround for FFU https://review.opendev.org/c/openstack/nova/+/831174 | 17:33 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/wallaby: Add service version check workaround for FFU https://review.opendev.org/c/openstack/nova/+/844202 | 17:37 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Func test for deletion of auto-created port on detach https://review.opendev.org/c/openstack/nova/+/844325 | 18:21 |
opendevreview | Artom Lifshitz proposed openstack/nova master: Don't delete auto-existing port when detaching https://review.opendev.org/c/openstack/nova/+/844326 | 18:21 |
artom_ | dansmith, well there's 2 things. Does the user want the port deleted when the instance is deleted, and does the user want the port deleted when it's detached? | 18:27 |
*** artom_ is now known as artom | 18:27 | |
dansmith | artom: ack, so we let them change the flag on volumes for the same reason right? | 18:28 |
dansmith | yeah in 2.85 | 18:29 |
dansmith | so maybe making that the same in the port is the way, if you want to be able to change the instance delete behavior too | 18:29 |
artom | dansmith, so the instance delete case is orthogonal, the original issue is that the port gets deleted when the user detaches it | 18:30 |
dansmith | on first thought I was thinking that you'd always want to delete it on instance delete, but I dunno why.. volume data is *super* important, but ip allocation can be important too (for broken software :P) | 18:30 |
dansmith | artom: right I know | 18:30 |
melwitt | do we delete a volume when you detach it? I didn't think so | 18:30 |
dansmith | melwitt: you couldn't detach root volumes so it didn't matter | 18:31 |
artom | You still can't IIRC... can you? | 18:31 |
dansmith | and those are the ones that get that flag | 18:31 |
artom | I know there was an attempt https://review.opendev.org/q/topic:bp/detach-boot-volume+ | 18:31 |
dansmith | artom: not that I know of, yeah | 18:31 |
dansmith | right | 18:31 |
melwitt | yeah but what if you booted with another block device to attach? I guess that would mean you created the volume outside of nova | 18:31 |
dansmith | melwitt: right | 18:31 |
artom | dansmith, so for the detach case, you're proposing an API change (like 'delete_port': True in the body, for example) when doing the DELETE os-interface call? | 18:33 |
dansmith | so I think we don't expose preserve_on_delete anywhere currently right? if not, it would probably be best to mirror the same boolean logic as the volume one | 18:33 |
dansmith | which is inverted from preserve, I think | 18:33 |
artom | dansmith, we do not, it's in a JSON blob in instance info network cache | 18:33 |
dansmith | artom: that's what I was proposing, but I think I've changed my mind | 18:33 |
dansmith | artom: for the "I need to delete this instance but I want to keep its IP" case | 18:33 |
melwitt | I think it's best to mirror it too if we can | 18:33 |
dansmith | melwitt: yeah should just be "delete_on_termination=!preserve_on_delete" | 18:34 |
artom | dansmith, but we're not deleting any instance, we're keeping the instance (at least initially), but detaching the port | 18:35 |
dansmith | artom: I realize that, I'm saying make the flag mutable so that it will work for either | 18:35 |
melwitt | which is why I was thinking preserve_on_delete_and_detach | 18:36 |
dansmith | artom: I'm considering the whole picture and not just your single case :) | 18:36 |
artom | And I'm saying they're different cases :) | 18:36 |
dansmith | melwitt: you want that to be the flag name? that's inverted from the volume one and also really oddly worded :) | 18:36 |
dansmith | artom: why? | 18:36 |
melwitt | dansmith: er... I guess it should be delete_on_terminate_and_detach then | 18:37 |
artom | Because you could want a port auto-deleted when you delete an instance (delete_on_termination addresses that) | 18:37 |
dansmith | it's basically a "is this precious" flag which applies in either case, and can be tweaked right before you do either operation | 18:38 |
artom | But for the exact same instance, you could want a port preserved when detaching it | 18:38 |
artom | Well... | 18:38 |
dansmith | artom: so tweak the flag before you do either, but I think having different behavior for those cases is more complicated than necessary and more likely for someone to set one and not the other and then be surprised | 18:38 |
melwitt | I dunno how yall will want to do it but I'm just saying I don't want it to delete on detach if the only word in the flag is "termination" | 18:38 |
dansmith | melwitt: the volume flag doesn't mention the exact operation in the name either.. we have no "terminate" server op, | 18:39 |
dansmith | so terminate can cover any reason, IMHO | 18:40 |
dansmith | cross-cell migrate might not be able to keep the same port in the other cell, so wouldn't we want to honor the same flag and leave that port allocated for the user in the old cell if they've marked it as precious? | 18:40 |
dansmith | or should we delete it because "meh, it's not detach or delete"? :) | 18:41 |
melwitt | eh... "terminate" means "delete server" as far as I've known | 18:41 |
artom | I guess if you think of the verb terminate as acting on the attachment itself... | 18:41 |
dansmith | artom: right :) | 18:41 |
artom | But then it should really be delete_(volume|port)_on_termination | 18:42 |
dansmith | it would be better to not use a verb and use an adjective.. "is this thing precious, should I preserve it always" | 18:42 |
dansmith | preserve_on_deallocate -- does that cover both? | 18:42 |
artom | It's really about lifecycle coupling (bingo!) | 18:42 |
artom | Is this volume/port coupled to this instance? | 18:43 |
dansmith | delete_with_parent | 18:43 |
artom | delete_with_owner? | 18:43 |
dansmith | owner is the user, IMHO, but it's close | 18:43 |
dansmith | the user *owns* the hierarchy of resources including an instance and a port, volume | 18:44 |
dansmith | but whatever, we're far into the weeds now | 18:44 |
artom | *puffs* | 18:44 |
dansmith | sounds like we're coalescing on exposing the to-be-named flag, letting it be mutable, and using that during detach and instance delete? | 18:44 |
artom | Well, we are, because you're more opinionated and I care less, but the Europeans might have a different opinion once a spec is written up :) | 18:45 |
dansmith | delete_with_ouwner? | 18:45 |
melwitt | I'm not commenting on it being coupled, just trying to say if we couple it, it should be really clear in the name of the flag. otherwise we're gonna have people be surprised and unhappy their port got deleted when they detached it and we'll be doing yet another microversion to change the name or add another flag | 18:46 |
dansmith | well, delete_on_termination is plenty ambiguous, when stored on a volume attachment, IMHO | 18:46 |
dansmith | but it says "server delete" in the docs as the explainer, | 18:47 |
dansmith | so I think we do the same and we'll please some and offend others | 18:47 |
melwitt | I don't think it is given that detaching a volume _never_ deletes the volume | 18:47 |
artom | dansmith, honestly, if it wasn't for the long standing precedent, I'd be arguing for the changing the default | 18:47 |
artom | Because if you want to delete the port, just do the damn API call yourself :) | 18:47 |
artom | No less ambiguous than that | 18:48 |
dansmith | artom: eh? you can't if it's bound right? | 18:48 |
artom | Right, so detach it first | 18:48 |
dansmith | you know what the feedback was when we split out neutron? | 18:48 |
dansmith | people hated having to do one operation in two places just to get an instance booted | 18:48 |
artom | Lemme guess, "plz proxy and/or automate more things" | 18:48 |
dansmith | so that sounds a lot like a regressive pattern to me | 18:48 |
artom | Yeah, I see the smooth UX argument | 18:49 |
dansmith | artom: I boot my instance on a private provisioning network first, then I want to attach it to public and delete the temporary nic, but I have to do two things? | 18:49 |
artom | But can't beat the idea in terms of clarity of intent :) | 18:49 |
dansmith | you could also follow your logic for the delete case, | 18:50 |
artom | Yeah, it's admittedly gray there | 18:50 |
dansmith | and say: if you have an ip you really need to keep, detach it with no_delete=True before you delete your instance, | 18:50 |
dansmith | so if you think that's more natural, that's fine | 18:50 |
dansmith | meaning put the intent in the detach call, instead of a property on the attachment, | 18:51 |
dansmith | which solves melwitt's "what does terminate mean" problem | 18:51 |
dansmith | (which I created for her, noted...) | 18:51 |
melwitt | hehe. I do like this idea more, it feels a lot clearer | 18:52 |
dansmith | it was my original thought, but then I was thinking "but what if I want to delete the instance?" .. but if you can detach first, then that solves that | 18:52 |
dansmith | kinda unfortunate to make it different from volumes, even though volumes have a reason to be different (because detaching root) | 18:53 |
artom | So for volumes AFAICT the delete_on_termination flag only applies to instance delete | 18:54 |
artom | The detach flow *always* preserves the volume | 18:54 |
dansmith | artom: but because it has to | 18:54 |
gibi | you can delete a bound port in neutron and it will trigger a detach in nova, a buggy detach :D | 18:54 |
melwitt | I think it's fine to make it same as volumes in concept but it shouldn't use ambiguous words IMHO. it's not ambiguous for volumes but it is for NICs, IMO | 18:55 |
artom | And I guess because nova-created volumes aren't a thing | 18:55 |
dansmith | gibi: because we get a network-deleted and then try to yank it out? | 18:55 |
gibi | yepp | 18:55 |
dansmith | artom: they are, but only root | 18:55 |
gibi | but that codepath is incomplete | 18:55 |
artom | dansmith, ah, right | 18:55 |
dansmith | gibi: yeah, I wish we didn't do that | 18:55 |
dansmith | gibi: like trove creating instances for you, I wish we had a better notion of "this was created for you by $service, delete it from there if you want to" | 18:56 |
gibi | dansmith: totally agree | 18:57 |
artom | I guess what I'm forgetting is that in his particular case, the instance *will* get deleted regardless | 18:57 |
gibi | also I'm wondering if nova can trigger volume creation other than the root volume via image->volume or blank->volume BDMs in the boot request | 18:57 |
dansmith | so does anyone think that just declaring intent during detach, and detach before instance delete if you want to keep it, is not okay? | 18:57 |
gibi | I'm OK to declare intent at detach | 18:58 |
artom | So for them delete_on_termination=False or whatever we end up calling it is fine | 18:58 |
artom | And then they'll be able to just set that and delete the instance, and have the port be preserved | 18:58 |
dansmith | gibi: Depending on the destination_type and guest_format, this will either be a blank persistent volume " | 18:58 |
dansmith | gibi: so source_type=blank,destination_type=volume maybe? | 18:59 |
gibi | yepp that is what I vaguely remember | 18:59 |
dansmith | artom: that's not what I was just describing | 18:59 |
gibi | but it is too late here to actually try | 18:59 |
dansmith | artom: detach(port=$uuid,no_delete=True) <- only change | 19:00 |
dansmith | artom: if you want to keep an ip before deleting an instance, detach it first | 19:00 |
dansmith | gibi: I thought you had to do this with a volume uuid so yeah I'm skeptical, but maybe it works | 19:00 |
artom | dansmith, oh, so you wouldn't actually expose the preserve_on_delete flag | 19:00 |
dansmith | artom: right, hence the "intent on detach" approach | 19:01 |
artom | dansmith, but rather *just* add a delete_port option to the detach call | 19:01 |
melwitt | if everyone else prefers to have delete_on_termination for the port and have it also mean delete on detach, I will support the consensus | 19:01 |
dansmith | artom: right, whatever the flag is | 19:01 |
artom | dansmith, right, I think I prefer that | 19:02 |
artom | The confusion was never about instance deletion | 19:02 |
dansmith | artom: since we already have behavior, I'd say that if you don't provide the flag, we follow the network_info setting to determine whether or not it gets deleted, but if you ask for True or False, we do that | 19:02 |
artom | The current logic of "if Nova created it, Nova deletes it" is fine | 19:02 |
artom | The confusion is upon interface detach | 19:02 |
artom | So it would make sense to add a way to specify intent there | 19:02 |
artom | While keeping the current behaviour as the historical default | 19:03 |
dansmith | artom: again, I was not confusing the two, I was trying to make sure that we had a consistent story for both | 19:03 |
artom | Confusion in the sense of "what happens to the port?: | 19:03 |
artom | " | 19:03 |
dansmith | again, I... okay. :P | 19:05 |
gibi | Uggla: I read half of the https://review.opendev.org/c/openstack/nova/+/831507 and I left feedback. | 19:20 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Ignore LibvirtConfigObject kwargs https://review.opendev.org/c/openstack/nova/+/830644 | 20:19 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Remove unnecessary TODO https://review.opendev.org/c/openstack/nova/+/830645 | 20:19 |
opendevreview | Rico Lin proposed openstack/nova master: add locked_memory extra spec and image property https://review.opendev.org/c/openstack/nova/+/778347 | 20:19 |
opendevreview | Rico Lin proposed openstack/nova master: libvirt: Add vIOMMU device to guest https://review.opendev.org/c/openstack/nova/+/830646 | 20:19 |
ricolin | ^^^ sean-k-mooney: stephenfin gibi just updated the nova viommu and lock memory patch set according to the bp | 20:24 |
opendevreview | Rico Lin proposed openstack/os-traits master: Add traits for vIOMMU https://review.opendev.org/c/openstack/os-traits/+/844336 | 20:34 |
melwitt | gmann: hm, it looks like the patch somehow did not affect the gate queue and it voted https://review.opendev.org/c/openstack/nova/+/844309 | 21:23 |
melwitt | er, I mean that it didn't remove the job from the gate queue and it ran and then voted | 21:25 |
gmann | melwitt: ohk, I think we have this in integrated template also, let me fix | 21:25 |
melwitt | k | 21:25 |
melwitt | gmann: ah yeah I found it (I was curious) https://opendev.org/openstack/tempest/src/branch/master/zuul.d/integrated-gate.yaml#L361 | 21:31 |
gmann | yeah | 21:34 |
gmann | melwitt: https://review.opendev.org/c/openstack/tempest/+/844342 | 21:34 |
gmann | I'll wait for this to merge first | 21:35 |
melwitt | ack | 21:35 |
opendevreview | Ghanshyam proposed openstack/nova stable/ussuri: DNM: testing sdk job not runnign from template on ussuri https://review.opendev.org/c/openstack/nova/+/844343 | 21:37 |
gmann | melwitt: ^^ will test it with the tempest change in case it is running from somewhere else also :) | 21:38 |
melwitt | :) | 21:38 |
gmann | irrelevant-files is causing all these hidden runs, we should have some way to just override irrelevant-files without adding the job to the pipeline | 21:39 |
melwitt | gmann: it looks like https://bugs.launchpad.net/devstack/+bug/1906322 has somehow cropped up on stable/wallaby grenade runs /o\ | 22:25 |
melwitt | example build: https://zuul.opendev.org/t/openstack/build/09f7ff5b69b84e429e3141a622dfa951/log/controller/logs/grenade.sh_log.txt#16798 from https://review.opendev.org/c/openstack/nova/+/844202 | 22:26 |
melwitt | does it mean we should backport https://review.opendev.org/c/openstack/devstack/+/802642 to wallaby? | 22:28 |
melwitt | I'm not sure how this suddenly started failing | 22:29 |
melwitt | log says it's using pip version 21.0.1; | 22:29 |
melwitt | I'll try a backport and see what happens | 22:31 |
melwitt | someone else had proposed it and abandoned it https://review.opendev.org/c/openstack/devstack/+/805008 | 22:33 |
gmann | ohk, let's discuss in qa channel | 22:54 |
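
The "intent on detach" rule that dansmith and artom converge on at 19:00-19:03 can be sketched roughly as below. This is only an illustration of the proposal as discussed, not Nova code: the function name `should_delete_port` and the `delete_port` argument are hypothetical (the log itself leaves the flag name open, "whatever the flag is"), and `preserve_on_delete` stands in for the existing per-port setting in network_info mentioned in the discussion.

```python
from typing import Optional


def should_delete_port(delete_port: Optional[bool],
                       preserve_on_delete: bool) -> bool:
    """Decide whether detaching an interface also deletes its port.

    delete_port: hypothetical flag on the detach request; None means
        the caller did not state an intent.
    preserve_on_delete: assumed stand-in for the existing network_info
        setting (True when the port was not created by Nova).
    """
    # An explicit True or False on the detach call always wins.
    if delete_port is not None:
        return delete_port
    # No intent given: keep the historical default, i.e. delete only
    # ports that are not marked to be preserved.
    return not preserve_on_delete
```

Under this sketch, a user who wants to keep an IP before deleting an instance would detach the port first with an explicit `delete_port=False`; everyone else keeps today's behaviour.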