opendevreview | OpenStack Proposal Bot proposed openstack/nova master: Imported Translations from Zanata https://review.opendev.org/c/openstack/nova/+/891648 | 04:44 |
opendevreview | Merged openstack/nova master: Bump MIN_{LIBVIRT,QEMU} for "Bobcat" https://review.opendev.org/c/openstack/nova/+/887255 | 04:56 |
opendevreview | melanie witt proposed openstack/nova master: Follow up for unified limits documentation https://review.opendev.org/c/openstack/nova/+/893462 | 06:01 |
opendevreview | melanie witt proposed openstack/nova master: Follow up for unified limits: PCPU and documentation https://review.opendev.org/c/openstack/nova/+/893462 | 06:03 |
melwitt | johnthetubaguy, bauzas, sean-k-mooney: followup for unified limits to handle PCPU and add a bit more detail to docs ^ thanks | 06:04 |
opendevreview | melanie witt proposed openstack/nova master: Follow up for unified limits: PCPU and documentation https://review.opendev.org/c/openstack/nova/+/893462 | 06:20 |
opendevreview | Amit Uniyal proposed openstack/nova stable/2023.1: Refactor CinderFixture https://review.opendev.org/c/openstack/nova/+/893464 | 06:33 |
opendevreview | Amit Uniyal proposed openstack/nova stable/2023.1: Add function to get all attachments in Cinder.API module https://review.opendev.org/c/openstack/nova/+/893465 | 06:33 |
opendevreview | Amit Uniyal proposed openstack/nova stable/2023.1: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/893466 | 06:33 |
opendevreview | Amit Uniyal proposed openstack/nova stable/2023.1: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/893467 | 06:35 |
opendevreview | Amit Uniyal proposed openstack/nova stable/zed: Adds regression functional test for 1980720 https://review.opendev.org/c/openstack/nova/+/893468 | 06:47 |
opendevreview | Amit Uniyal proposed openstack/nova stable/zed: Refactor CinderFixture https://review.opendev.org/c/openstack/nova/+/893469 | 06:47 |
opendevreview | Amit Uniyal proposed openstack/nova stable/zed: Add function to get all attachments in Cinder.API module https://review.opendev.org/c/openstack/nova/+/893470 | 06:47 |
opendevreview | Amit Uniyal proposed openstack/nova stable/zed: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/893471 | 06:47 |
opendevreview | Amit Uniyal proposed openstack/nova stable/zed: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/893472 | 06:47 |
*** whoami-rajat_ is now known as whoami-rajat | 07:10 | |
opendevreview | Amit Uniyal proposed openstack/nova stable/yoga: Adds regression functional test for 1980720 https://review.opendev.org/c/openstack/nova/+/893474 | 07:27 |
opendevreview | Amit Uniyal proposed openstack/nova stable/yoga: Refactor CinderFixture https://review.opendev.org/c/openstack/nova/+/893475 | 07:27 |
opendevreview | Amit Uniyal proposed openstack/nova stable/yoga: Add function to get all attachments in Cinder.API module https://review.opendev.org/c/openstack/nova/+/893476 | 07:27 |
opendevreview | Amit Uniyal proposed openstack/nova stable/yoga: Reproducer for dangling bdms https://review.opendev.org/c/openstack/nova/+/893477 | 07:27 |
opendevreview | Amit Uniyal proposed openstack/nova stable/yoga: Delete dangling bdms https://review.opendev.org/c/openstack/nova/+/893478 | 07:27 |
bauzas | morning folks | 07:40 |
* bauzas yawns | 07:40 | |
bauzas | melwitt: on it | 07:52 |
bauzas | thanks for the quick follow-ups | 07:53 |
bauzas | auniyal: your series was tagged as a blueprint | 07:53 |
bauzas | auniyal: accordingly, you can't backport it in stable branches as it would change a behavior | 07:54 |
bauzas | see the backport policy rules that are given here https://docs.openstack.org/project-team-guide/stable-branches.html#appropriate-fixes | 07:57 |
ykarel | Hi is there some known issue with test_reboot_server_hard, recently saw many failures | 08:07 |
ykarel | Fails like AssertionError: time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=24, tm_wday=4, tm_yday=244, tm_isdst=0) not greater than time.struct_time(tm_year=2023, tm_mon=9, tm_mday=1, tm_hour=7, tm_min=26, tm_sec=24, tm_wday=4, tm_yday=244, tm_isdst=0) | 08:07 |
ykarel | k didn't find a bug report, so reported https://bugs.launchpad.net/tempest/+bug/2033752 | 08:14 |
ykarel | it started just a few hours back | 08:14 |
bauzas | ykarel: oh | 08:41 |
bauzas | probably related to some recent merge we did | 08:42 |
bauzas | because now we call Cinder when rebooting, so the performance could be impacted | 08:42 |
frickler | elodilles: https://review.opendev.org/c/openstack/nova/+/871968 is still blocked by your W-1, can you please check? | 08:53 |
frickler | bauzas: calling cinder for a reboot? I'm curious about that, do you happen to have a reference? | 08:54 |
frickler | ah, https://review.opendev.org/c/openstack/nova/+/882284 ? | 08:55 |
bauzas | frickler: see the bug report | 08:55 |
bauzas | yea | 08:55 |
elodilles | frickler: looking, thanks | 08:56 |
bauzas | sean-k-mooney: dansmith: auniyal: looks like https://review.opendev.org/c/openstack/nova/+/882284 created a gate issue | 08:57 |
frickler | a reno referencing "this change" is weird wording btw | 08:57 |
bauzas | frickler: we can modify this | 08:57 |
auniyal | bauzas, ack | 08:57 |
bauzas | in general we say "Now, Nova does X or Y" | 08:58 |
bauzas | or "With this new feature named "blah", now we have" | 08:58 |
bauzas | auniyal: I'm working on a patch that will conditionally check the volumes only if BDMs exist | 09:02 |
auniyal | ack bauzas, anything I can help with? | 09:02 |
bauzas | we may need core reviewers | 09:03 |
frickler | elodilles: thx, now just someone needs to retrigger the W+1 I think | 09:04 |
elodilles | frickler: i did it just before you wrote :) | 09:09 |
bauzas | auniyal: actually, we can't avoid a Cinder API call for getting the volumes | 09:10 |
bauzas | volume attachments* | 09:10 |
auniyal | yeah, I was about to say | 09:11 |
auniyal | saw the comment on bug report | 09:11 |
auniyal | cinder calls are separate from the bdms that nova has | 09:11 |
auniyal | can we increase reboot time check ? | 09:12 |
bauzas | oh wait | 09:14 |
bauzas | this isn't calculating the time it boots | 09:15 |
bauzas | this is just getting the uptime | 09:15 |
SvenKieske | systemd-analyze time gives you boot times :) | 09:19 |
SvenKieske | without any context | 09:20 |
bauzas | that's weird then | 09:23 |
bauzas | https://github.com/openstack/tempest/blob/28ba6ee268bdf5d18a13a0dc7c7fea2c3a3dfcce/tempest/common/utils/linux/remote_client.py#L84-L88 is just giving me when the server was booted | 09:24 |
bauzas | and https://github.com/openstack/tempest/blob/master/tempest/api/compute/servers/test_server_actions.py#L127-L128 is just comparing the timestamps to check that the VM was actually rebooted | 09:24 |
bauzas | this isn't at all a performance check | 09:25 |
SvenKieske | I'm not understanding your last sentence: do you mean this isn't supposed to be a performance check, or are you indicating that this should be a performance check but fails to do so? | 09:30 |
SvenKieske | depending on what stage the virtual machine is in, the time comparison could be flaky, during early boot especially, if time is not correctly synced. | 09:31 |
SvenKieske | I'd also strongly advise not to directly get stuff from proc-fs. why not use a native python function instead at least? | 09:33 |
bauzas | SvenKieske: Tempest gets the uptime, subtracts it from the current time so it gets the local time when the server was booted, then reboots the server and does the same math with uptime | 09:33 |
bauzas | and then there is an assertion check that verifies that the reboot timestamp is greater than the original timestamp (meaning the OS was actually rebooted) | 09:34 |
bauzas | https://github.com/openstack/tempest/blob/28ba6ee268bdf5d18a13a0dc7c7fea2c3a3dfcce/tempest/api/compute/servers/test_server_actions.py#L127-L128 and https://github.com/openstack/tempest/blob/28ba6ee268bdf5d18a13a0dc7c7fea2c3a3dfcce/tempest/api/compute/servers/test_server_actions.py#L109 | 09:34 |
SvenKieske | I hope that test never runs during daylight saving time adjustments *just kidding* | 09:34 |
bauzas | UTC exists for a good reason :) | 09:35 |
bauzas | basically, this failing check means that the server *wasn't* rebooted | 09:35 |
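A rough, self-contained sketch of the check bauzas describes above (not the actual tempest code; `run_cmd` and `reboot_and_wait_active` are assumed callables supplied by the test harness):

```python
import time


def get_boot_time(run_cmd):
    """Derive the guest's boot timestamp (epoch seconds) from /proc/uptime.

    run_cmd is assumed to execute a shell command on the guest (e.g. over
    SSH) and return its stdout as a string.
    """
    uptime_secs = float(run_cmd("cut -f1 -d' ' /proc/uptime"))
    return time.time() - uptime_secs


def check_rebooted(run_cmd, reboot_and_wait_active):
    boot_time = get_boot_time(run_cmd)
    reboot_and_wait_active()          # e.g. hard reboot API call + wait for ACTIVE
    new_boot_time = get_boot_time(run_cmd)
    # If this assertion fails, the derived boot timestamp never moved, i.e.
    # the guest was not (yet) rebooted -- it is not a performance check.
    assert new_boot_time > boot_time, "server does not appear to have rebooted"
```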
SvenKieske | I'm strongly reminded about this now, don't know why: https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b923ca | 09:35 |
SvenKieske | this is really a bad check, to be blunt | 09:36 |
SvenKieske | just use systemd for this stuff already guys, it's there, it works, I know it has its own bugs, but it's better than the stuff that came before it. | 09:36 |
auniyal | in the test, reboot_time should be greater than the (first) boot_time --- but it's failing because they're the same. | 09:37 |
auniyal | https://paste.opendev.org/show/bk8aK3ZbjpeDYs6swcoQ/ | 09:37 |
SvenKieske | e.g. you can easily check the boot counter with systemd | 09:37 |
bauzas | auniyal: which means that the server *wasn't* rebooted | 09:37 |
auniyal | yeah | 09:37 |
bauzas | I'm just checking the logs | 09:37 |
bauzas | we may have had an exception in the reboot | 09:38 |
bauzas | oh wait | 09:39 |
SvenKieske | well that could also be true :) but just checking the wall clock doesn't seem really reliable in the long run, I bet you get false positives someday (when clocks are out of sync in VMs, and they will be, if the ntp setup is not bulletproof) | 09:39 |
bauzas | if we ssh the guest too soon, the reboot may not be done yet | 09:39 |
bauzas | SvenKieske: I don't think it's an instrumentation issue | 09:39 |
SvenKieske | so you need to detect the reboot, before ssh'ing in order to detect the reboot ;) | 09:39 |
auniyal | but would it still show the same time? the time must have updated, right? | 09:40 |
bauzas | auniyal: not if the guest wasn't rebooted *yet* | 09:40 |
bauzas | yay, another waiter ! | 09:40 |
auniyal | yeah, so it took boot_time, rebooted (and waited until the server got ACTIVE), then took new_boot_time, which must be greater | 09:42 |
auniyal | or we should set our own clock counter in test ?? | 09:43 |
auniyal | and verify if we are able to run any cmd at all in system after reboot ? | 09:44 |
SvenKieske | well you're already doing that if you read the system time, you "just" need to verify that this reading was successful and not NULL :) | 09:45 |
auniyal | yeah, that should be enough to say server rebooted | 09:46 |
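A hedged sketch of the kind of waiter being suggested here: poll the derived boot time until it moves forward instead of asserting right after the reboot API call. All names are illustrative, not tempest code.

```python
import time


def wait_for_reboot(get_boot_time, old_boot_time, timeout=300, interval=5):
    """Poll get_boot_time() until it is newer than old_boot_time."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if get_boot_time() > old_boot_time:
                return True
        except Exception:
            # The guest may refuse SSH mid-reboot; treat that as "not up yet".
            pass
        time.sleep(interval)
    return False
```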
bauzas | okay, I think I spotted it | 09:54 |
bauzas | we reverted the reboot due to being unable to reach cinder | 09:55 |
auniyal | bauzas https://zuul.opendev.org/t/openstack/build/0bf10393fa25419c8f3a072ab091f0da/log/controller/logs/screen-n-cpu.txt#12033 this right ? | 10:00 |
bauzas | yes | 10:01 |
bauzas | the keystone service catalog doesn't contain the volumev3 endpoint url, that's what the exception is saying | 10:02 |
bauzas | but heh, https://1e11be38b60141dbb290-777f110ca49a5cd01022e1e8aeff1ed5.ssl.cf1.rackcdn.com/893401/5/check/neutron-ovn-tempest-ovs-release/f379752/controller/logs/local_conf.txt | 10:04 |
bauzas | "disable_service cinder" | 10:04 |
auniyal | okay, it's reverted, but still, shouldn't /proc/uptime get updated ? | 10:05 |
auniyal | still a great test though, it caught it, or at least pointed us to the reason | 10:07 |
bauzas | no, I mean, we have a problem in Nova : we can no longer hard reboot if you don't deploy Cinder | 10:09 |
bauzas | we should handle this case | 10:09 |
auniyal | yes | 10:10 |
bauzas | not sure sean-k-mooney is around but I'd appreciate his thoughts on the best way to alleviate https://github.com/openstack/nova/blob/88c73e2931dcb276982612dbbdc3b849d4977c16/nova/compute/manager.py#L4177 | 10:10 |
auniyal | so if cinder.attachment_get_all fails with "could not create client" or "endpoint not found", skip the cinder check | 10:12 |
auniyal | I'll create a patch for that, | 10:12 |
bauzas | yeah, that sounds like the obvious fix, but I wonder whether there is a better way to flag that this environment doesn't have cinder | 10:12 |
bauzas | I'm just looking at how we manage such a case elsewhere in the code | 10:13 |
bauzas | yeah, found it | 10:15 |
bauzas | auniyal: https://github.com/openstack/nova/blob/88c73e2931dcb276982612dbbdc3b849d4977c16/nova/compute/manager.py#L3173-L3178 | 10:16 |
bauzas | this is gross but meh | 10:16 |
auniyal | ack bauzas | 10:16 |
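The shutdown code bauzas links to is the model being discussed: each cinder-facing step is wrapped so a missing or unreachable cinder never blocks the lifecycle operation. A condensed, illustrative version of that shape (the exception list and the `attachment_delete` signature are assumptions, not the exact nova code):

```python
import logging

from cinderclient import exceptions as cinder_exception
from keystoneauth1 import exceptions as keystone_exception

LOG = logging.getLogger(__name__)


def delete_attachment_best_effort(volume_api, context, attachment_id):
    """Delete a cinder attachment without letting cinder errors propagate."""
    try:
        volume_api.attachment_delete(context, attachment_id)
    except (keystone_exception.EndpointNotFound,
            keystone_exception.ConnectFailure,
            cinder_exception.ClientException) as exc:
        # Cinder is missing, unreachable, or has already dropped the
        # attachment: log and carry on so the shutdown/reboot still succeeds.
        LOG.warning("Ignoring cinder error during cleanup: %s", exc)
```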
sean-k-mooney | bauzas: what's the issue? | 10:18 |
bauzas | auniyal: would you be able to propose a patch soon ? | 10:18 |
auniyal | bauzas yes | 10:19 |
bauzas | sean-k-mooney: tl;dr: some jobs that don't pull cinder fail | 10:19 |
sean-k-mooney | ah | 10:19 |
bauzas | sean-k-mooney: because now we call cinder in reboot | 10:19 |
bauzas | so we need to wrap the right exceptions | 10:19 |
bauzas | https://bugs.launchpad.net/nova/+bug/2033752 | 10:19 |
sean-k-mooney | well we just need a decorator that checks if keystone is installed and skips the function | 10:19 |
bauzas | sean-k-mooney: keystone is installed, cinder isn't | 10:20 |
sean-k-mooney | treat this like how we test for api extensions in neutron | 10:20 |
sean-k-mooney | bauzas: yes | 10:20 |
sean-k-mooney | oh i said keystone | 10:20 |
sean-k-mooney | i meant check in keystone if cinder is installed | 10:20 |
bauzas | yeah but I got your point | 10:20 |
bauzas | yeah, that's why I wanted to discuss this with you and dansmith | 10:21 |
bauzas | we have 100 ways to make it work | 10:21 |
bauzas | ideally I would think that our internal cinder client package would wrap this, but this sounds like a huge amount of work | 10:21 |
sean-k-mooney | so when discussing this in the past we noted that if you can't connect to cinder for any reason we should just boot with the bdm we have | 10:21 |
bauzas | yeah and with shutdown we just have a long list of wrapped exceptions in order to let shutdown succeed anyway | 10:22 |
bauzas | https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3164-L3188 | 10:22 |
bauzas | tbc, I agree with it, reboot *has to* be trustable anyway | 10:23 |
bauzas | so, auniyal is just stacking all the exception handling for now | 10:23 |
sean-k-mooney | so this is what we do for neutron extensions | 10:23 |
sean-k-mooney | https://github.com/openstack/nova/blob/88c73e2931dcb276982612dbbdc3b849d4977c16/nova/network/neutron.py#L1386-L1406 | 10:23 |
sean-k-mooney | do we get an exception when we try to create the client or when we try to make the first call? | 10:23 |
opendevreview | Merged openstack/nova stable/zed: libvirt: At start-up rework compareCPU() usage with a workaround https://review.opendev.org/c/openstack/nova/+/871968 | 10:23 |
bauzas | sean-k-mooney: when we make the call, that's the problem | 10:24 |
sean-k-mooney | no thats fine | 10:24 |
sean-k-mooney | so with the neutron decorators we are caching the results | 10:25 |
sean-k-mooney | so we only do the check once | 10:25 |
bauzas | sean-k-mooney: yeah sounds better, | 10:25 |
sean-k-mooney | so what i would probably do is something like add a decorator that checks the cinder max microversion | 10:26 |
bauzas | but for the sake of fixing this bug quickly, auniyal is just doing like shutdown and stacking all the exceptions | 10:26 |
sean-k-mooney | and we can handle the logic with that. i.e. say this function needs whatever | 10:26 |
sean-k-mooney | i would prefer to revert rather than hack in a fix if we don't do it properly | 10:26 |
bauzas | sean-k-mooney: well, let's see auniyal's patch | 10:27 |
bauzas | I don't disagree but I'd then also like to change https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3164-L3188 | 10:28 |
bauzas | and possibly others | 10:28 |
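An illustrative sketch of the "check once and cache it" decorator idea described above. This lazy variant learns the answer from the first failed call rather than probing the cinder max microversion up front; the decorator name is hypothetical.

```python
import functools

from keystoneauth1 import exceptions as ks_exc

_CINDER_AVAILABLE = None  # module-level cache: learn the answer only once


def skip_if_no_cinder(fn):
    """Turn the wrapped call into a no-op once we learn cinder has no endpoint."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        global _CINDER_AVAILABLE
        if _CINDER_AVAILABLE is False:
            return None
        try:
            result = fn(*args, **kwargs)
            _CINDER_AVAILABLE = True
            return result
        except ks_exc.EndpointNotFound:
            # No volumev3 endpoint in the service catalog: remember that and
            # skip every subsequent call instead of failing the operation.
            _CINDER_AVAILABLE = False
            return None
    return wrapper
```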
sean-k-mooney | bauzas: we should consider disabling cinder in one of our jobs by the way | 10:34 |
bauzas | sean-k-mooney: good point | 10:34 |
sean-k-mooney | we do not need it in nova-ovs-hybrid-plug | 10:34 |
bauzas | sean-k-mooney: I was just looking at the jobs list :) | 10:35 |
bauzas | fancy me creating a patch ? | 10:35 |
sean-k-mooney | its ok i can do it | 10:35 |
sean-k-mooney | im looking at the code to see if i can also formulate a patch for the cinder issue | 10:35 |
bauzas | okay, do it then, you had the copyrights on the idea :) | 10:35 |
sean-k-mooney | do you have a link to a failing job? | 10:36 |
bauzas | yeah | 10:36 |
bauzas | https://1e11be38b60141dbb290-777f110ca49a5cd01022e1e8aeff1ed5.ssl.cf1.rackcdn.com/893401/5/check/neutron-ovn-tempest-ovs-release/f379752/controller/logs/local_conf.txt | 10:36 |
sean-k-mooney | ok so that is from the construction of the client, not the first call | 10:38 |
bauzas | sean-k-mooney: https://zuul.openstack.org/job/neutron-ovn-base removes cinder | 10:38 |
sean-k-mooney | its failing here https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L247 | 10:39 |
bauzas | yeah due to https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L863 | 10:39 |
bauzas | we're not caching the client | 10:40 |
bauzas | we just instantiate a client at every call | 10:40 |
sean-k-mooney | ya that is something we can fix when we replace the client with the sdk | 10:41 |
bauzas | yup | 10:41 |
bauzas | anyway, my daughters are starving, I shall start to cook something or they'll burn the house | 10:41 |
bauzas | fortunately, the school is starting again next week | 10:42 |
opendevreview | Amit Uniyal proposed openstack/nova master: Addis a try-catch for cinder unavailability. https://review.opendev.org/c/openstack/nova/+/893489 | 10:50 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangeling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 11:48 |
sean-k-mooney | bauzas: auniyal ^ is how i would fix it | 11:48 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangeling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 11:50 |
auniyal | yeah, it looks cleaner than https://review.opendev.org/c/openstack/nova/+/893489 | 11:50 |
sean-k-mooney | just fixing some typos | 11:51 |
sean-k-mooney | is there a bug for this | 11:51 |
sean-k-mooney | ah https://bugs.launchpad.net/tempest/+bug/2033752 | 11:51 |
auniyal | yes | 11:52 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangeling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 11:53 |
bauzas | sorry folks, was with kids | 12:07 |
auniyal | sean-k-mooney, I added 2 comments on the patch, it's the same as what I was trying to do locally | 12:08 |
auniyal | *your patch - https://review.opendev.org/c/openstack/nova/+/893502/3 | 12:08 |
stephenfin | Uggla: https://review.opendev.org/c/openstack/openstacksdk/+/893505 | 12:15 |
stephenfin | I'm off for the rest of the afternoon, so hopefully that does the trick for you :) | 12:15 |
Uggla | oh thank you stephenfin, I'll have a look | 12:16 |
Uggla | stephenfin, have a great day | 12:16 |
stephenfin | Also published here https://gist.github.com/stephenfin/5834e4bd3213a2b9a9313ee3a7f56946 | 12:21 |
stephenfin | Uggla: Cheers | 12:21 |
Uggla | stephenfin, damn, I've forgotten to register the options again. :( It needs more testing but that looks good. | 12:24 |
Uggla | stephenfin, thank you so much | 12:24 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 12:51 |
sean-k-mooney | bauzas: ^ i think i have addressed your nits. stephenfin, if you're around can you take a look quickly too? | 12:53 |
* sean-k-mooney goes to make a coffee | 12:53 | |
bauzas | sean-k-mooney: cool, will do | 12:53 |
dvo-plv | sean-k-mooney, Hello | 13:48 |
sean-k-mooney | hi | 13:48 |
dansmith | johnthetubaguy: comment here if you could look quickly: https://review.opendev.org/c/openstack/nova/+/886989 | 14:00 |
dansmith | also bauzas, since you were +2 | 14:00 |
sean-k-mooney | dansmith: can you take a look at this https://review.opendev.org/c/openstack/nova/+/893502 | 14:00 |
sean-k-mooney | merging the dangling bdms series broke reboot for deployments without cinder | 14:01 |
bauzas | dansmith: shouldn't need to update node, right? | 14:01 |
bauzas | dansmith: it's not changing | 14:01 |
bauzas | but, compute_id, mmmm, yeah you're right :( | 14:01 |
dvo-plv | I would like to talk regarding our proposal | 14:02 |
dansmith | bauzas: if compute_id is changing then node is changing, however, we're changing host to point to a different node, right? | 14:02 |
dvo-plv | Currently I see two options: 1) we move back to the separate mech driver; 2) we generate the socket in OpenStack and pass it to the dpdk | 14:02 |
bauzas | dansmith: well, the compute node uuid is the ironic node uuid | 14:03 |
bauzas | so, it shouldn't be changing, right? | 14:03 |
dvo-plv | I believe that option 2 is more suitable for you | 14:03 |
dansmith | okay I see, we're moving the node between services... | 14:03 |
dvo-plv | So, how do you see it in terms of OpenStack? Our dpdk team is currently analyzing this approach with our pmd driver | 14:04 |
bauzas | dansmith: correct, that being said, now I wonder about compute_id | 14:04 |
dansmith | bauzas: compute_id should match the node, so only change one if the other changes | 14:04 |
bauzas | ah, I was about looking on your series | 14:05 |
dvo-plv | I believe that using the common vhost-server-path option is not acceptable in the case of the dpdk port type, so we should create a new one, and it will also be an appropriate parameter | 14:05 |
bauzas | (b/c while I reviewed it, honestly, I didn't think about Ironic) | 14:05 |
sean-k-mooney | dvo-plv: ya so option 2 would be my preference, but add a new dpdkarg or extra_config to hold the path | 14:06 |
sean-k-mooney | dvo-plv: we have kind of a chicken and egg problem | 14:07 |
dansmith | sean-k-mooney: really? we have no jobs without cinder? | 14:07 |
sean-k-mooney | dansmith: apparently | 14:07 |
sean-k-mooney | dansmith: its enabled by default in the devstack base job | 14:07 |
dansmith | sean-k-mooney: that doesn't make sense to me.. if we have no cinder, we should have no bdms with type==volume | 14:08 |
sean-k-mooney | dvo-plv: os-vif does not have a way to return data to nova, nova needs to know the socket path to put it in the libvirt xml so qemu can create it, and neutron does not know what vendor nic is being used | 14:09 |
bauzas | dansmith: ok, indeed compute_id == compute_node.id (found it in the code) so are you +W then ? | 14:09 |
bauzas | dansmith: see https://bugs.launchpad.net/nova/+bug/2033752 | 14:09 |
dansmith | bauzas: it was already +W, I just -Wd to make sure we weren't missing something | 14:09 |
sean-k-mooney | dansmith: hum... that is a very good point | 14:10 |
sean-k-mooney | oh | 14:10 |
sean-k-mooney | https://review.opendev.org/c/openstack/nova/+/893502/4/nova/compute/manager.py#4217 | 14:10 |
sean-k-mooney | dansmith: that line is always called | 14:10 |
dansmith | ah | 14:10 |
bauzas | dansmith: the problem is with https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4209 | 14:10 |
bauzas | every time we call it | 14:11 |
dansmith | so let's put a check to not do anything else if no bdms_to_delete right? | 14:11 |
dansmith | if not bdms_to_delete: return | 14:11 |
bauzas | no, because we could have no BDMs | 14:11 |
bauzas | while there could be dangling volume attachments | 14:11 |
sean-k-mooney | i mean i can add that as well | 14:11 |
bauzas | but, I looked at shutdown | 14:12 |
sean-k-mooney | are you suggesting removing the check for if cinder exists | 14:12 |
dansmith | bauzas: that's to delete attachments cinder thinks we have? | 14:12 |
sean-k-mooney | or just also adding a second check on line 4216 | 14:12 |
dvo-plv | sean-k-mooney, If we extend this field with socket, it will be acceptable, right ? https://github.com/openstack/os-vif/blob/master/vif_plug_ovs/ovsdb/ovsdb_lib.py#L184 | 14:12 |
dansmith | bauzas: isn't that wrong anyway? | 14:12 |
dansmith | bauzas: what if we have a multi-attach volume, with one instance and one standalone attachment for an externally-managed database server or something? | 14:12 |
sean-k-mooney | dvo-plv: i think that would be the correct approach yes | 14:13 |
sean-k-mooney | and if other vendors then want to integrate they just need to support that too | 14:13 |
dvo-plv | okay, great, we will rework this solution. Do I need to update the spec file ? | 14:14 |
bauzas | dansmith: going one way, I'm okay with the verification : if we have nova attachments, then we can verify whether cinder also has them | 14:14 |
bauzas | dansmith: going the other way, we could have cinder attachments while we wouldn't see BDMs | 14:14 |
dansmith | bauzas: right, but that's valid in the real world right? | 14:14 |
sean-k-mooney | dvo-plv: yes but not entirely because of this. last week was the non-client lib freeze so os-vif is frozen for bobcat, and yesterday was the feature freeze for nova, so the spec needs to be re-proposed for next cycle | 14:15 |
dansmith | bauzas: or, I guess we're filtering by only attachments for that instance, so maybe that's okay | 14:15 |
sean-k-mooney | dvo-plv: so it would be nice to note this when you re-propose the spec | 14:15 |
bauzas | dansmith: well, if you see my comments, I was torn | 14:15 |
bauzas | I wonder whether it's possible to have dangling volume attachments for this instance while we no longer have the BDMs | 14:16 |
bauzas | in order to remove the stale cinder attachments | 14:16 |
dansmith | bauzas: we should only do that if they specify our instance in the attachment | 14:16 |
sean-k-mooney | bauzas: if you try to attach a volume to an instance via cinder it will create that case | 14:16 |
sean-k-mooney | fortunately that is now blocked without a service token | 14:17 |
sean-k-mooney | i.e. if an attachment exists for a volume that we do not have a bdm for, you either did a db restore, or messed with the nova db, or you used the cinder api to create it directly, which we never supported | 14:18 |
sean-k-mooney | dansmith: bauzas: i have not been fully following your conversation but im planning to finish in the next hour or so | 14:19 |
sean-k-mooney | with that in mind is there a direction we want to take to unblock things | 14:19 |
sean-k-mooney | revert? proceed with the patch? or is there a change you would like me to make quickly? | 14:19 |
dansmith | stale attachments are the only *legit* way to get there now, but that can certainly happen | 14:19 |
sean-k-mooney | alternatively i'm fine with someone else taking over this | 14:19 |
dvo-plv | okay, I got it | 14:19 |
dvo-plv | thanks | 14:19 |
dansmith | sean-k-mooney: bauzas I'd be okay with revert and fix, but if not, sean-k-mooney's patch needs some change I think | 14:20 |
dansmith | I can make the change if bauzas is around to approve, but melwitt is out today, and on Monday all of the US is off | 14:20 |
dansmith | so a fast revert might be more prudent | 14:20 |
bauzas | dansmith: that's what it's doing now, right? we ask Cinder to tell us if we have volume attachments for this instance | 14:20 |
dansmith | bauzas: yeah, I'm just now reading it.. that wasn't in the version I last reviewed | 14:20 |
bauzas | but, how can we make this call conditional? I don't see | 14:21 |
dansmith | well, one thing we could do is just not do that healing at all if the instance has no volume bdms | 14:21 |
bauzas | that's why wrapping the exception seems the easiest path to me | 14:21 |
bauzas | dansmith: but we wouldn't fix the problem of a stale volume attach | 14:21 |
bauzas | and looking at other pieces of code, we just blindly catch the keystone exception actually | 14:22 |
sean-k-mooney | bauzas: in the event there are no other cinder volumes on the instance | 14:22 |
dansmith | bauzas: yep, we just need to make a call | 14:22 |
bauzas | dansmith: example with shutdown https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3173-L3178 | 14:22 |
dansmith | bauzas: yep, we could just wrap all of that function in that check and move on.. perhaps that's safest | 14:23 |
bauzas | that's what I originally asked auniyal to do, but then sean-k-mooney and I discussed having a better helper | 14:23 |
dansmith | but we need to not log that on every single reboot for people that have no cinder at all, which is what sean-k-mooney's patch will currently do | 14:23 |
sean-k-mooney | ah i can drop the log | 14:23 |
sean-k-mooney | i didn't have it originally | 14:23 |
bauzas | dansmith: and not helping the problem: we don't cache the cinder client | 14:23 |
bauzas | dansmith: we're really instantiating it at every call we make | 14:24 |
sean-k-mooney | dansmith: this is the log you are referring to, right? https://review.opendev.org/c/openstack/nova/+/893502/4/nova/compute/manager.py#4193 | 14:24 |
dansmith | bauzas: are we? | 14:24 |
bauzas | yes | 14:24 |
dansmith | sean-k-mooney: the one with the ugly parens, yeah :) | 14:24 |
bauzas | dansmith: eg. https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L863 | 14:25 |
dansmith | bauzas: ack | 14:25 |
dansmith | bauzas: you need to re-+W the ironic manage one | 14:26 |
opendevreview | Danylo Vodopianov proposed openstack/nova master: Packed virtqueue support was added. https://review.opendev.org/c/openstack/nova/+/876075 | 14:27 |
bauzas | dansmith: ack, will | 14:28 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 14:28 |
dansmith | man I hate wasting vertical space on dangling parens | 14:30 |
sean-k-mooney | i have kind of started to do that because of ifs but ya i prefer not to waste it either | 14:31 |
dansmith | man it is SO ugly | 14:31 |
bauzas | black syntax kills python :) | 14:33 |
bauzas | it's Friday I can duck | 14:33 |
dansmith | bauzas: ++ | 14:40 |
bauzas | well, now I know that we don't have jobs without cinder | 14:40 |
dansmith | I'm really shocked tbh, and makes me think we have some room for trimming if that's the case ;) | 14:41 |
bauzas | yeah that's actually good news for the gate | 14:45 |
sean-k-mooney | before i go fix the test are we ok with this https://termbin.com/d2q2 | 14:47 |
auniyal | sean-k-mooney, should we catch one more, i.e. bare Exception, in manager.py | 14:50 |
auniyal | after EndpointNotFound | 14:50 |
bauzas | sean-k-mooney: looking | 14:50 |
sean-k-mooney | auniyal: no, we should avoid catching Exception in general | 14:51 |
bauzas | yeah | 14:51 |
sean-k-mooney | i think we even have a hacking check to prevent you doing that without a #noqa line | 14:51 |
sean-k-mooney | s/line/comment/ | 14:51 |
bauzas | no AFAIR | 14:52 |
bauzas | but meh, anyway | 14:52 |
auniyal | sean-k-mooney ack | 14:52 |
bauzas | sean-k-mooney: I don't see in your patch where you're moving the attachment_get_all() call, I just see it added at first | 14:52 |
bauzas | but I assume you're moving the call, not copying it | 14:53 |
auniyal | actually I was saying it because its here https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L3184 | 14:53 |
sean-k-mooney | i'm moving it, i'm still editing, but i just wanted to confirm that was the general direction | 14:53 |
bauzas | yeah, that's the direction we agreed | 14:53 |
auniyal | but yes, it should be avoided | 14:53 |
bauzas | make it the first call in the method and bail out early | 14:53 |
bauzas | this way, non-cinder envs would just skip this method | 14:54 |
sean-k-mooney | yep and i'm adding a second early out for | 14:54 |
sean-k-mooney | len(bdms_to_delete) == 0: | 14:54 |
sean-k-mooney | after the first loop | 14:54 |
bauzas | well, we're looping, right ? | 14:54 |
bauzas | ah | 14:55 |
bauzas | then after the loop, my bad, you just told it :) | 14:55 |
sean-k-mooney | https://termbin.com/a7sc | 14:55 |
sean-k-mooney | ok i'm going to start fixing the test and i'll push something up in a bit | 14:56 |
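The termbin paste isn't reproduced here, but the structure discussed above would look roughly like this (illustrative names and signatures, not the merged patch): probe cinder first and return early, then return early again once there is nothing to reconcile.

```python
from keystoneauth1 import exceptions as ks_exc


def _cleanup_dangling_volume_bdms(volume_api, context, instance, bdms):
    # First early-out: no cinder endpoint (or cinder unreachable) means there
    # is nothing to reconcile, and the hard reboot must still succeed.
    try:
        attachments = volume_api.attachment_get_all(context, instance.uuid)
    except (ks_exc.EndpointNotFound, ks_exc.ConnectFailure):
        return

    # Second early-out: every volume BDM still has a matching attachment on
    # the cinder side, so there is nothing dangling to delete.
    attachment_ids = {att['id'] for att in attachments}
    bdms_to_delete = [
        bdm for bdm in bdms
        if bdm.attachment_id and bdm.attachment_id not in attachment_ids]
    if not bdms_to_delete:
        return

    for bdm in bdms_to_delete:
        bdm.destroy()  # drop the dangling BDM record
```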
dansmith | sean-k-mooney: I'd be okay with that other than the dangling paren | 14:57 |
sean-k-mooney | for the except? | 14:57 |
dansmith | and the unnecessary wrapping of the except list | 14:57 |
dansmith | that could be two lines but is four | 14:57 |
sean-k-mooney | well our convention is if you need to wrap you move all args | 14:58 |
dansmith | no it's not | 14:58 |
sean-k-mooney | it depends on whether you want to dedent the args | 14:59 |
dansmith | that's black convention that has been sneaking into nova over years | 14:59 |
sean-k-mooney | either we don't dedent them, or we do and then move all of them | 14:59 |
dansmith | first example I found in the same file: https://review.opendev.org/c/openstack/nova/+/893489/1/nova/compute/manager.py#3881 | 14:59 |
dansmith | there's another right above it | 14:59 |
sean-k-mooney | yep i know we have been converting, but fine, i'll swap back to that | 15:00 |
dansmith | "we" have not been converting, AFAIK.. "some people" have been converting :) | 15:00 |
sean-k-mooney | i don't really mind doing that in this case as it's not heavily indented | 15:00 |
sean-k-mooney | i really hate when we have really indented args across like 10 lines | 15:01 |
sean-k-mooney | when it can be 4 if we dedent and wrap properly | 15:01 |
dansmith | well, regardless, this is not that case | 15:01 |
sean-k-mooney | we are both trying not to have lots of wasted vertical space so i'm ok to do it the other way here | 15:01 |
dansmith | yep | 15:04 |
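For reference, the two wrapping styles being debated, shown on the same except clause (purely illustrative; `call` is a hypothetical callable):

```python
from cinderclient import exceptions as cinder_exception
from keystoneauth1 import exceptions as keystone_exception


def style_dangling_paren(call):
    # Dedent the exception tuple and close the paren on its own line:
    # two extra lines of vertical space for the same content.
    try:
        call()
    except (
        keystone_exception.EndpointNotFound,
        cinder_exception.ClientException,
    ):
        pass


def style_hanging_indent(call):
    # Continuation aligned under the opening paren, closing on the last
    # entry -- the style in the compute/manager.py examples dansmith links.
    try:
        call()
    except (keystone_exception.EndpointNotFound,
            cinder_exception.ClientException):
        pass
```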
sean-k-mooney | ok i think i have things working now | 15:41 |
sean-k-mooney | i can't do the second early out | 15:42 |
opendevreview | sean mooney proposed openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 15:42 |
sean-k-mooney | ok i need to run now so dansmith bauzas can ye take this from here. | 15:47 |
bauzas | ack | 15:47 |
bauzas | dansmith: can you please validate https://review.opendev.org/c/openstack/nova/+/893502 ? | 16:20 |
dansmith | bauzas: +W.. we might want to get a depends-on running against it in parallel to make sure it really solves the problem | 16:23 |
dansmith | was that a devstack base job that hit the original problem? | 16:23 |
bauzas | dansmith: yeah, some neutron ones | 16:24 |
bauzas | https://bugs.launchpad.net/tempest/+bug/2033752 | 16:24 |
dansmith | neutron ones or devstack ones? | 16:24 |
dansmith | ykarel: still around? | 16:25 |
bauzas | sorry https://zuul.openstack.org/job/neutron-ovn-base | 16:25 |
ykarel | dansmith, yes kind of | 16:25 |
bauzas | https://zuul.openstack.org/job/tempest-integrated-networking doesn't skip cinder, but its child above skips it | 16:25 |
dansmith | ykarel: I'll submit a DNM neutron patch to test | 16:26 |
dansmith | ykarel: https://review.opendev.org/c/openstack/neutron/+/893538 | 16:26 |
ykarel | dansmith, ack | 16:26 |
bauzas | I'll bail out now | 16:27 |
dansmith | bauzas: ack | 16:27 |
bauzas | long day for a long week | 16:27 |
dansmith | bauzas: puh-lease man.. back for one week after being gone for what, three? | 16:28 |
dansmith | I'm going to really enjoy that ONE day off I have next week ;) | 16:28 |
bauzas | 3 and a half :) | 16:28 |
dansmith | pssh | 16:28 |
bauzas | yeah, but going back on the ff week, lovin' <3 | 16:28 |
dansmith | bauzas: go on, get out of here.. I don't want to over stress your system :) | 16:29 |
bauzas | dansmith: anyway, enjoy your own timeoff :) | 16:29 |
dansmith | heh thanks | 16:29 |
auniyal | dansmith - https://bf80d1d7066e6c071f87-1fbe10cd6d23547f6092d75bc2b558ab.ssl.cf2.rackcdn.com/893502/6/check/nova-ovs-hybrid-plug/4042471/job-output.txt - "DBNonExistentDatabase" | 16:46 |
auniyal | and https://zuul.opendev.org/t/openstack/build/40424718c3524bb78ca22cb9d5293b6c/log/controller/logs/screen-n-cpu.txt#14946 - this one is oslo_service.periodic_task | 16:47 |
dansmith | auniyal: what about it? | 16:48 |
auniyal | nova-ovs-hybrid-plug job failed with POST fail | 16:48 |
auniyal | because devstack@c-bak.service | 16:49 |
dansmith | oh, sean changed the zuul yaml | 16:49 |
dansmith | I didn't even look at that | 16:49 |
auniyal | couldn't start | 16:49 |
dansmith | yeah we should just do that in a different patch | 16:49 |
dansmith | auniyal: can you update sean's patch to just not do the .zuul part and I'll +2+W it? | 16:50 |
dansmith | move the .zuul part to a second patch on top and we can go from there separate from fixing the gate | 16:50 |
auniyal | ack | 16:50 |
dansmith | auniyal: can you do it quick? if not I will, but it would be better if you do and I +W | 16:50 |
auniyal | yes I am doing | 16:51 |
dansmith | cool | 16:51 |
dansmith | ++ glad you were watching those jobs live, I didn't even pay attention to the .zuul change | 16:51 |
dansmith | something else is re-enabling c-bak, which is why that happened: | 16:54 |
dansmith | 2023-09-01 16:19:08.457805 | compute1 | +++ enable_service c-bak | 16:54 |
opendevreview | Amit Uniyal proposed openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 16:56 |
opendevreview | Amit Uniyal proposed openstack/nova master: Moved zuul job update https://review.opendev.org/c/openstack/nova/+/893540 | 16:56 |
dansmith | gmann: can you tell us why this is not working? https://review.opendev.org/c/openstack/nova/+/893540/1/.zuul.yaml | 16:57 |
dansmith | gmann: c-bak ends up re-enabled but fails to start because the rest of cinder is disabled | 16:57 |
gmann | dansmith: checking | 16:58 |
dansmith | auniyal: ah, you didn't remove the comment from the commit message about modifying the job, can you fix that real quick? | 16:59 |
auniyal | yes | 17:00 |
sean-k-mooney | without the job config tweak we don't actually know that this is fixed | 17:01 |
dansmith | sean-k-mooney: yes, we have a neutron DNM against it | 17:01 |
sean-k-mooney | but i see that was factored out into its own change | 17:01 |
sean-k-mooney | ok, i would have preferred to keep them in the same patch but sure | 17:02 |
gmann | dansmith: where you are seeing it failing? | 17:02 |
sean-k-mooney | as long as we eventually get a job without cinder working its fine | 17:02 |
dansmith | gmann: https://bf80d1d7066e6c071f87-1fbe10cd6d23547f6092d75bc2b558ab.ssl.cf2.rackcdn.com/893502/6/check/nova-ovs-hybrid-plug/4042471/job-output.txt | 17:02 |
dansmith | gmann: search for "nonexistentdatabase" | 17:02 |
dansmith | sean-k-mooney: yeah, definitely need that, but just fixing that job won't even ensure that neutron is unblocked | 17:03 |
dansmith | sean-k-mooney: I've got https://review.opendev.org/c/openstack/neutron/+/893538 for that verification | 17:03 |
sean-k-mooney | cool well i was just checking back to see if there were any issues | 17:03 |
opendevreview | Amit Uniyal proposed openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 17:03 |
opendevreview | Amit Uniyal proposed openstack/nova master: zuul update https://review.opendev.org/c/openstack/nova/+/893540 | 17:03 |
sean-k-mooney | auniyal: why did you rebase those? | 17:04 |
dansmith | sean-k-mooney: to fix the commit message | 17:04 |
dansmith | sean-k-mooney: if we can figure out why it's not working in devstack, I'll fast-approve that job change or get gmann to help me, it just doesn't work as you had it | 17:04 |
dansmith | since it was part of the approved initial patch | 17:05 |
sean-k-mooney | it can wait till monday, so if it's not working i'll look at it then | 17:05 |
dansmith | ack | 17:05 |
gmann | dansmith: sean-k-mooney auniyal this is a multinode job and we need to disable cinder on the subnode also | 17:05 |
dansmith | ahh | 17:05 |
sean-k-mooney | oh right | 17:05 |
gmann | neutron might have an example of those | 17:05 |
gmann | like this one https://github.com/openstack/neutron/blob/ed6023c3473585b4af6980404e7fe36ea152f980/zuul.d/tempest-multinode.yaml#L151 | 17:06 |
sean-k-mooney | its trivial | 17:06 |
dansmith | yeah | 17:06 |
sean-k-mooney | it just needs to be in the group-vars section | 17:06 |
dansmith | I'll fix it | 17:06 |
opendevreview | Dan Smith proposed openstack/nova master: zuul update https://review.opendev.org/c/openstack/nova/+/893540 | 17:07 |
opendevreview | Dan Smith proposed openstack/nova master: Make our nova-ovs-hybrid-plug job omit cinder https://review.opendev.org/c/openstack/nova/+/893540 | 17:08 |
gmann | perfect, it should work now. will keep eyes on it | 17:08 |
dansmith | thanks gmann ;) | 17:08 |
sean-k-mooney | its slightly overkill but ya it will | 17:08 |
sean-k-mooney | the subnode only has cinder volume and not the swift bits | 17:09 |
sean-k-mooney | but it does not hurt to be explicit | 17:09 |
dansmith | it clearly had c-bak | 17:09 |
sean-k-mooney | really? i thought that was only on the controller | 17:09 |
dansmith | that was the failure | 17:09 |
sean-k-mooney | in any case it does not hurt to disable them all explicitly | 17:10 |
dansmith | c-bak was trying to start and didn't find the cinder database because it wasn't setup on the main node | 17:10 |
sean-k-mooney | ack we might have moved that at some point | 17:10 |
gmann | yeah, we do not enable swift on the subnode by default, but you never know which child job might be doing/will do that, so disabling explicitly is better | 17:11 |
sean-k-mooney | ok i'm going to go mostly afk. i will skim the logs later tonight but otherwise i'll chat to ye on tuesday? it's a long weekend in the US right | 17:12 |
dansmith | yup | 17:13 |
sean-k-mooney | cool well enjoy | 17:13 |
dansmith | thanks, o/ | 17:13 |
gmann | sean-k-mooney: o/ enjoy weekend | 17:14 |
dansmith | auniyal: devstack finished on the subnode this time | 17:33 |
auniyal | ack | 17:33 |
auniyal | job "openstacksdk-functional-devstack" is still running | 17:34 |
dansmith | I meant on the top patch to change the job | 17:35 |
dansmith | the one where we had to add the c-bak settings to the subnode | 17:35 |
auniyal | .zuul patch where `c-bak: false` is set | 17:37 |
dansmith | yeah | 17:42 |
auniyal | in here https://zuul.openstack.org/status#893540 - nova-ovs-hybrid-plug: failing and openstacksdk-functional-devstack: in-progress | 17:43 |
auniyal | unit and functional tests passed | 17:44 |
auniyal | okay you meant deployment finished without issue | 17:44 |
auniyal | right ? | 17:44 |
elodilles | hi, nova team! fyi, train-eol patch has merged & the branches are deleted: https://paste.opendev.org/show/bVfeofByKRN0ZrNX779o/ | 17:50 |
auniyal | this is not a regular thing right - https://zuul.opendev.org/t/openstack/build/475dab49cd33437a96ba81437779b875/log/controller/logs/screen-n-cpu.txt#14515 | 17:50 |
dansmith | auniyal: yeah | 17:50 |
gmann | elodilles: thanks | 17:52 |
elodilles | gmann: np | 17:52 |
dansmith | gmann: this is a different failure: https://zuul.opendev.org/t/openstack/build/475dab49cd33437a96ba81437779b875 | 21:05 |
dansmith | the one we were trying to resolve was where it tried to start c-bak on the subnode and failed | 21:05 |
dansmith | oh, but, it's running an evacuate test on a bfv instance which clearly won't work, I see | 21:06 |
dansmith | so we need to remove that post action or whatever | 21:06 |
gmann | dansmith: I mean the failure it had before for evacuate test | 21:06 |
dansmith | okay gotcha | 21:07 |
gmann | yeah, or make the test conditional on cinder availability or so? | 21:07 |
opendevreview | Merged openstack/nova master: only attempt to clean up dangling bdm if cinder is installed https://review.opendev.org/c/openstack/nova/+/893502 | 21:09 |
dansmith | gmann: I don't think there's any reason we should be running that post test on the ovs job, I think it's just inherited and wasting time | 21:12 |
dansmith | I'll leave that to sean on monday | 21:13 |
gmann | dansmith: not sure, but the job definition explicitly says about it "Run move operations, reboot, and evacuation (via the same post-run hook | 21:14 |
gmann | as the nova-live-migration job) tests with the OVS network backend " | 21:14 |
gmann | I think we can avoid running bfv evacuate test if cinder disabled. | 21:14 |
dansmith | yep, that's what I meant and said in my comment | 21:18 |
gmann | yeah | 21:23 |