16:00:41 <bauzas> #startmeeting nova
16:00:41 <opendevmeet> Meeting started Tue Jun  4 16:00:41 2024 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:41 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:41 <opendevmeet> The meeting name has been set to 'nova'
16:00:45 <bauzas> hey folks
16:00:51 <elodilles> o/
16:00:55 <bauzas> #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:03 <bauzas> should be a quick one hopefully
16:01:16 <erlon> \o
16:01:31 <marlinc> \o
16:01:43 <kgube> \o
16:01:45 <auniyal> o/
16:02:11 <Luzi> o/
16:02:26 <marlinc> I forgot to put my topic in the agenda, is that still okay?
16:02:38 <bauzas> shoot it in the agenda, sure
16:02:47 <bauzas> okay, let's start
16:02:54 <bauzas> #topic Bugs (stuck/critical)
16:03:02 <bauzas> #info No Critical bug
16:03:10 <bauzas> #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:19 <bauzas> anything about buuugs ?
16:03:46 <fwiesel> \o
16:03:54 <gibi> o/
16:03:56 <bauzas> looks not
16:03:59 <bauzas> let's move on then
16:04:13 <bauzas> #topic Gate status
16:04:18 <bauzas> #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:04:24 <bauzas> #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:04:32 <bauzas> #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:04:38 <bauzas> (all greens, huzzah)
16:04:44 <bauzas> #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:04:49 <bauzas> #info Please try to provide meaningful comment when you recheck
16:04:59 <bauzas> I haven't seen any important gate failure
16:05:09 <bauzas> and you ?
16:06:51 <bauzas> ok, np
16:06:57 <bauzas> #topic Release Planning
16:07:02 <bauzas> #link https://releases.openstack.org/dalmatian/schedule.html
16:07:08 <bauzas> as a reminder, nova deadlines are ^
16:07:15 <bauzas> #info Dalmatian-2 in 4 weeks
16:07:18 <bauzas> tick-tock
16:07:37 <bauzas> nothing planned as a review day until milestone-2 where we will have a spec review day
16:07:52 <bauzas> anything else about our schedule ?
16:07:57 <erlon> Is there a freeze day for bug fixes?
16:08:46 <erlon> I only see a freeze for specs
16:09:11 <erlon> ah, I see now, seems common to all projects
16:09:14 <bauzas> we can merge bugfixes until RC1
16:09:50 <bauzas> after RC1, we then branch master for Dalmatian
16:10:04 <erlon> ok, so, what kind of changes are accepted after RC1?
16:10:11 <bauzas> then, in between RC1 and GA, we can't merge any bugfix, only regression fixes
16:10:24 <erlon> hmm, ok, makes sense
16:10:33 <bauzas> but we can still merge bugfixes in master
16:10:47 <erlon> Right, that will be available for the next release
16:10:54 <bauzas> since master will then be E (and no longer D) after RC1
16:11:17 <bauzas> so basically, we can merge any bugfixes anytime
16:11:22 <bauzas> for master
16:11:57 <sean-k-mooney> the only soft freeze is really between Feature freeze and RC1
16:12:02 <bauzas> but if you want your fix to be on a specific release, then either before RC1 or after GA by backporting it
16:12:14 <sean-k-mooney> we can merge a bugfix during that period but prefer to limit them to regressions
16:12:18 <erlon> right
16:12:20 <bauzas> sean-k-mooney: not for bugfixes
16:12:45 <bauzas> anyway, I think you have your answer
16:12:50 <bauzas> can we move ?
16:13:03 <bauzas> #topic Review priorities
16:13:06 <sean-k-mooney> bauzas: we have previously asked reviewers not to merge bugfixes for bugs not introduced in the current release in that period. but yes we can move on
16:13:14 <bauzas> #link https://etherpad.opendev.org/p/nova-dalmatian-status
16:13:24 <bauzas> nothing to say but please look at it ^
16:13:31 <bauzas> #topic Stable Branches
16:13:33 <erlon> I have a request here
16:13:46 <erlon> for Review priorities
16:14:16 <erlon> I would like to get some attention on the Shared security groups patches?
16:14:40 <bauzas> #undo
16:14:40 <opendevmeet> Removing item from minutes: #topic Stable Branches
16:14:41 <erlon> s/?/\./g
16:14:56 <marlinc> I might have put my blueprint in the wrong meeting section, I'm not sure if it is this or open discussion
16:15:06 <sean-k-mooney> no, that's at the end
16:15:12 <bauzas> marlinc: we'll discuss this on the open discussion
16:15:17 <sean-k-mooney> which is where we should discuss the shared security group patches :)
16:15:29 <marlinc> Thank you :)
16:15:30 <bauzas> erlon: for your series, this will be reviewed like any other
16:15:41 <erlon> aw, come on, that's a review priority topic :)
16:16:01 <bauzas> https://etherpad.opendev.org/p/nova-dalmatian-status?#L37
16:16:05 <bauzas> yeah
16:16:16 <erlon> Right, I just want to make sure that I don't miss any deadlines on that one
16:16:18 <sean-k-mooney> yes and no, it's not identified as a review priority from my perspective
16:16:18 <bauzas> given your blueprint was accepted
16:16:26 <sean-k-mooney> but I'll try and review it after the meeting
16:16:34 <sean-k-mooney> looks like it's passing ci now
16:16:37 <erlon> And I also have a special request on that
16:16:41 <bauzas> ditto here, I already have a lot of other series to look at
16:16:55 <bauzas> moving on then
16:17:01 <bauzas> #topic Stable Branches
16:17:11 <bauzas> elodilles: heya
16:17:16 <elodilles> o/
16:17:24 <elodilles> #info stable gates should be mostly OK
16:17:40 <elodilles> (i've seen some intermittent failures, otherwise, nothing special)
16:17:48 <elodilles> #info stable release proposed for 2024.1 Caracal: https://review.opendev.org/c/openstack/releases/+/921287
16:18:07 <elodilles> we had Bobcat and Antelope stable releases some weeks ago,
16:18:21 <elodilles> maybe we can release now Caracal as well ^^^
16:18:34 <elodilles> feel free to comment on the release patch
16:18:45 <elodilles> #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:18:55 <elodilles> add any issue if you see one ^^^
16:19:17 <elodilles> and that's all from me
16:20:25 <bauzas> thanks
16:20:35 <bauzas> I'll look at the caracal patch
16:21:04 <elodilles> bauzas: thx in advance
16:21:09 <bauzas> +1d fwiw
16:21:22 <bauzas> #topic vmwareapi 3rd-party CI efforts Highlights
16:21:29 <bauzas> fwiesel: are you here ?
16:21:38 <fwiesel> Hi, yes...
16:21:45 <fwiesel> #info No updates.
16:22:13 <fwiesel> Still little progress from my side currently. Sorry about that.
16:22:22 <fwiesel> Any questions or comments?
16:22:41 <bauzas> not from me
16:22:55 <bauzas> moving on then
16:23:00 <bauzas> #topic Open discussion
16:23:11 <bauzas> erlon: nova handling of virDomainGetJobStats() errors
16:23:27 <erlon> so, I posted a link on the wiki
16:23:41 <erlon> https://gist.githubusercontent.com/sombrafam/8f177cbc4e153c328a242811bc24650e/raw/67190ffce7f036c7c3d3628fda15327cef7a41da/nova-compute.log
16:23:45 <bauzas> (woah, we have 4 topics to discuss today so please keep each discussion to 5-10 mins max)
16:23:54 <erlon> This bug happens every time we try to do a lot of migrations
16:24:29 <bauzas> just file a blueprint bug report
16:24:36 <erlon> For some reason the source host stops responding, and it hits the timeout. I want to know if it would be okay to add a new handler for this kind of exception there:
16:24:36 <erlon> https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L669
16:24:40 <bauzas> and ping us anytime on the chat
16:24:45 <sean-k-mooney> erlon: does that imply you have set the max concurrent live migrations over 1
16:24:51 <erlon> ah ok, sounds good
16:25:14 <erlon> I don't know the exact configuration but very likely
16:25:24 <sean-k-mooney> https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_concurrent_live_migrations
16:25:35 <sean-k-mooney> that should basically never be set to anything except 1
16:26:02 <erlon> So why does that exist? Which case should it apply?
16:26:12 <sean-k-mooney> there are usecases for it to be != 1 but unless you're using 100G networking it's not recommended
16:26:16 <bauzas> it just says it can't connect to the libvirt API
16:26:37 <sean-k-mooney> it exists for usecases where a single core is not enough to saturate the network link
16:26:47 <bauzas> there could be different reasons why libvirt is resetting the connection
16:27:02 <erlon> It says that because it timed out due to a short keepalive
16:27:02 <sean-k-mooney> bauzas: right but since they said this was load related
16:27:22 <sean-k-mooney> not doing concurrent live migrations may help
16:27:40 <bauzas> oh, missed that sentence, indeed good point
16:27:50 <bauzas> don't try to max out the number of calls you make to libvirt
16:28:01 <bauzas> like any other API, it has limits
16:28:23 <erlon> This comes from the libvirt logs: "At the same time in libvirtd logs: `May 28 13:00:39 ps5-ra1-n6 libvirtd[612268]: internal error: connection closed due to keepalive timeout`"
16:28:41 <erlon> So, that's why we know that's a timeout error
16:28:51 <bauzas> erlon: to answer your original question, no, I wouldn't recommend to modify Nova to handle that libvirt exception
16:29:28 <bauzas> particularly when the bug comes from libvirt, not nova itself
16:30:04 <erlon> But see, that function does exactly that. it handles only libvirt exceptions
16:30:31 <sean-k-mooney> there are some cases where retrying at the nova level is correct
16:30:38 <sean-k-mooney> this may or may not be one of them
16:30:39 <bauzas> specific and meaningful libvirt exceptions, that's the point :)
16:30:47 <sean-k-mooney> can you write this up as a bug report if you have not already
16:31:03 <sean-k-mooney> and we can see if this is such a case
16:31:03 <bauzas> retrying on a generic connection failed is never a good idea
16:31:13 <erlon> ok, I will as soon as I get a reproducer
16:31:33 <sean-k-mooney> bauzas: we do that all the time for rest calls
16:31:34 <bauzas> honestly, you have my opinion
16:31:49 <bauzas> sean-k-mooney: to placement
16:31:50 <sean-k-mooney> this is us doing a read call to check the status of the job
16:31:58 <sean-k-mooney> bauzas: and neutron and cinder
16:31:58 <erlon> I was actually thinking of just returning an empty info and letting nova call it again in the next periodic task, since this is just an info report
16:32:48 <sean-k-mooney> perhaps, or catching it and logging a warning
16:32:56 <sean-k-mooney> rather than a traceback
16:33:03 <bauzas> I wouldn't hide
16:33:11 <sean-k-mooney> anyway if you have a reproducer we can discuss the solution in gerrit
16:33:12 <bauzas> if I were catching that exception
16:34:00 <erlon> The issue with the current process is that, although it triggers the section again, the
16:34:00 <erlon> periodic tasks do not complete the migration. Consequently, while the VM is migrated
16:34:00 <erlon> to the destination host, it is not properly cleaned up on the source host.
16:34:03 <bauzas> honestly, except for giving a better log to the operator saying "libvirt is getting weird, see", I don't see the benefit of adding another handler to a periodic
16:34:11 <opendevreview> Merged openstack/nova stable/2023.2: Improve logging at '_numa_cells_support_network_metadata'  https://review.opendev.org/c/openstack/nova/+/900840
16:34:30 <bauzas> anyway, we have three other topics to discuss
16:34:45 <bauzas> erlon: please file a bug report and come to us once it's done
16:34:56 <erlon> Okay, let's move, thanks
16:34:57 <bauzas> and yeah, a reproducer may help
16:35:11 <bauzas> particularly with some mitigation actions
16:35:25 <bauzas> like, I know libvirt gives me this but we can workaround by that
16:35:49 <bauzas> that way, we could log a better warning than just "heh, look, that's what I got from libvirt"
16:36:04 <bauzas> moving on anyway
16:36:14 <bauzas> next item (2 of 4)
16:36:19 <bauzas> marlinc: rotation_rate blueprint: https://blueprints.launchpad.net/nova/+spec/rotation-rate
16:36:32 <bauzas> I guess you're asking for a specless approval ?
16:37:00 <marlinc> Honestly I am not entirely sure what is best, I thought specless might work for this but I'm not sure
16:37:10 <bauzas> that's the point
16:37:26 <bauzas> you're saying you would get the detail from the cinder connection info
16:37:31 <bauzas> do we already have that ?
16:37:53 <sean-k-mooney> if we add a new compute service version for this and a min compute service version check i think that would resolve most if not all the upgrade concerns
16:38:07 <marlinc> Yes so we have an internal Cinder driver for our own storage platform which does give that property in the cinder connection info, we didn't have to change anything in Cinder itself
16:38:12 <bauzas> I would honestly then recommend to write a spec
16:38:26 <sean-k-mooney> marlinc: is there an upstream driver that supports this
16:38:31 <bauzas> there could be some upgrade and compatibility concerns I may see
16:38:36 <sean-k-mooney> marlinc: without that we can't proceed with this in nova
16:38:47 <sean-k-mooney> we do not add support for out of tree drivers in other projects
16:38:49 <marlinc> No there is currently no upstream driver that has this
16:39:12 <bauzas> then we can't just modify nova to blindly check anything that's not upstream
16:39:22 <sean-k-mooney> so implementing this in the lvm or a different in-tree cinder driver would likely be a requirement if we did this for cinder only
16:39:29 <bauzas> spec it is
16:39:49 <bauzas> and I guess you probably need to discuss that with the cinder folks
16:40:24 <marlinc> Okay, I will look into creating a spec and also see if I can get this implemented in an in-tree driver
16:41:13 <bauzas> thanks
16:41:26 <bauzas> ping me anytime if you require assistance on the paperwork process
16:41:27 <marlinc> I'm going to have to see how to implement the min compute service version check and bump
16:41:42 <sean-k-mooney> marlinc: i would probably add it to the lvm driver personally, i think it would be relatively simple to do
16:41:58 <bauzas> before doing anything with version checks, please check with the cinder folks about populating the rotation rate into the connection info details
16:42:14 <bauzas> yeah, lvm seems the easiest approach
16:42:25 <marlinc> (Also actually the functional test for live migration, honestly I have never implemented something so complex as I have no experience at all with the testing framework and how it works in Nova)
16:42:26 <bauzas> but that's not my own garden :)
16:42:48 <marlinc> Especially since there is no existing volume based live migration functional test I could see
16:42:54 <bauzas> marlinc: once you get a go from the cinder folks about exposing the rotation rate, come to me
16:43:03 <sean-k-mooney> marlinc: if cinder accepts the enhancement we can revisit that and/or help you with that
16:43:07 <marlinc> Alright
16:43:20 <bauzas> cool
16:43:23 <bauzas> moving on then
16:43:26 <marlinc> Thank you, I'll update our internal ticket
16:43:44 <bauzas> marlinc: I assume you know how to reach the cinder folks?
16:44:36 <marlinc> Well, I'm going to assume #openstack-cinder and I'll check their contribution documentation
16:44:39 <bauzas> I have no potential design disapproval on using a specific connection info detail for setting the right value in the xml
16:45:00 <sean-k-mooney> marlinc: yep thats the correct channel
16:45:05 <bauzas> but you'll need to write a nova spec for explaining that it'll require a recent enough cinder and new computes
16:45:19 <sean-k-mooney> by the way there may be a usecase here for nova local storage too.
16:45:22 <marlinc> I was maybe a bit afraid though about the special rotation_rate 1 that is libvirt specific
16:45:23 <bauzas> and testing will be interesting to be discussed in the spec process
16:45:40 <marlinc> But that is probably something for the spec
16:45:52 <bauzas> the rotation rate is storage specific
16:46:07 <bauzas> how you tune it for the guest is libvirt specific indeed
16:46:34 <marlinc> That is why, right now we return rotation_rate 1 for SSDs, however that might not be smart from a cinder -> nova integration perspective
16:46:42 <marlinc> It is a magic value
16:46:56 <sean-k-mooney> well really you're trying to say "this is an ssd" or not
16:46:58 <bauzas> we're overtime and we still have two topics to discuss
16:47:05 <sean-k-mooney> and there may be a better way to express that
16:47:15 <bauzas> but I think we may need to discuss that from a cross-project perspective
16:47:19 <sean-k-mooney> ya let's continue this conversation after the meeting
16:47:23 <marlinc> Yes that is right now the primary use case though we also use it to set it to 7200 for HDDs
16:47:32 <marlinc> Alright thank you
16:47:38 <bauzas> next one then (3 of 4)
16:47:45 <bauzas> Luzi: floating ip behavior (assign fip: https://bugs.launchpad.net/neutron/+bug/2060808  ,   remove fip: https://bugs.launchpad.net/nova/+bug/2060812)
16:47:47 <Luzi> these two bugs describe changes in how neutron handles floating ips (compared to the deprecated nova code). The first one allows assigning a floating ip to a vm even if it is attached to another vm ("stealing" it). The second one does not check the VM when removing a floating ip, so the ip is always removed (even if the vm-name was correct, and the ip a mistake)
16:48:17 <bauzas> are you asking for reviews on the nova patch ?
16:48:24 <Luzi> I raised that at the PTG with Neutron, but they wanted Nova people to look over it and say whether you want to change these behaviors or not
16:48:33 <bauzas> oh my bad
16:49:08 <Luzi> (i normally don't have time to attend the Nova meetings)
16:49:31 <Luzi> I would be glad if you could check out these bugs again and give your input on it
16:49:56 <bauzas> Luzi: well, it's hard to comment on those bugs in a limited timeframe
16:50:09 <Luzi> yeah, we can discuss this on launchpad also
16:50:11 <bauzas> could you please come back to the nova channel at another time ?
16:50:15 <Luzi> sure
16:50:31 <bauzas> Luzi: well, what's your timezone, please remind me ?
16:50:39 <Luzi> UTC-1
16:51:09 <bauzas> okay, would it work if you pinged us again tomorrow afternoon UTC ?
16:51:34 <Luzi> yeah i can do that
16:51:43 <bauzas> thanks
16:52:01 <bauzas> last one
16:52:05 <bauzas> kgube: re-proposed extend volume completion spec: https://review.opendev.org/c/openstack/nova-specs/+/917133
16:52:12 <kgube> Hi, this is just a review request
16:52:27 <bauzas> cool, we'll review all the proposed specs indeed
16:52:40 <kgube> The spec has been reproposed from last cycle and not much has changed
16:52:49 <bauzas> I think I need to remember the exact situation we had last cycle
16:53:06 <bauzas> and why this didn't get enough traction
16:53:18 <bauzas> I think we were basically awaiting cinder's feedback, right?
16:53:21 <kgube> The cinder dependencies had to get merged
16:53:31 <bauzas> yah that
16:53:42 <bauzas> and the cinder spec was accepted ?
16:53:47 <kgube> but cinderclient now supports the feature
16:54:05 <bauzas> okay, so where are we exactly on the cinder side ?
16:54:17 <bauzas> everything eventually got merged ?
16:54:46 <kgube> well, there are some cinder changes left, but they depend on the nova change again
16:54:55 <bauzas> yeah
16:54:58 <bauzas> I remember that
16:55:09 <bauzas> but okay, that's on our plate now
16:55:21 <bauzas> I'll then review the spec reproposal
16:55:27 <sean-k-mooney> they depend on https://review.opendev.org/c/openstack/nova/+/873560 right
16:55:28 <kgube> thanks!
16:55:38 <sean-k-mooney> which was waiting for the cinder client release
16:55:42 <bauzas> yeah
16:55:42 <sean-k-mooney> which has now happened
16:55:46 <bauzas> exactly
16:56:00 <sean-k-mooney> ok then if that can be rebased
16:56:05 <bauzas> I don't see anything controversial but I'll just doublecheck
16:56:09 <kgube> yeah, i still need to fix the build for this
16:56:12 <sean-k-mooney> we can review and then cinder can complete the rest
16:56:14 <bauzas> before blindly reapproving
16:56:36 <bauzas> cool
16:56:49 <bauzas> I think the dust has settled then
16:57:03 <bauzas> are we good with wrapping up the meeting then ?
16:58:12 <bauzas> looks so
16:58:16 <bauzas> thanks all
16:58:20 <bauzas> #endmeeting