16:00:41 #startmeeting nova
16:00:41 Meeting started Tue Jun 4 16:00:41 2024 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:41 The meeting name has been set to 'nova'
16:00:45 hey folks
16:00:51 o/
16:00:55 #link https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
16:01:03 should be a quick one hopefully
16:01:16 \o
16:01:31 \o
16:01:43 \o
16:01:45 o/
16:02:11 o/
16:02:26 I forgot to put my topic in the agenda, is that still okay?
16:02:38 shoot it in the agenda, sure
16:02:47 okay, let's start
16:02:54 #topic Bugs (stuck/critical)
16:03:02 #info No Critical bug
16:03:10 #info Add yourself in the team bug roster if you want to help https://etherpad.opendev.org/p/nova-bug-triage-roster
16:03:19 anything about buuugs?
16:03:46 \o
16:03:54 o/
16:03:56 looks like not
16:03:59 let's move on then
16:04:13 #topic Gate status
16:04:18 #link https://bugs.launchpad.net/nova/+bugs?field.tag=gate-failure Nova gate bugs
16:04:24 #link https://etherpad.opendev.org/p/nova-ci-failures-minimal
16:04:32 #link https://zuul.openstack.org/builds?project=openstack%2Fnova&project=openstack%2Fplacement&pipeline=periodic-weekly Nova&Placement periodic jobs status
16:04:38 (all greens, huzzah)
16:04:44 #info Please look at the gate failures and file a bug report with the gate-failure tag.
16:04:49 #info Please try to provide a meaningful comment when you recheck
16:04:59 I haven't seen any important gate failure
16:05:09 and you?
16:06:51 ok, np
16:06:57 #topic Release Planning
16:07:02 #link https://releases.openstack.org/dalmatian/schedule.html
16:07:08 as a reminder, nova deadlines are ^
16:07:15 #info Dalmatian-2 in 4 weeks
16:07:18 tick-tock
16:07:37 nothing planned as a review day until milestone-2, where we will have a spec review day
16:07:52 anything else about our schedule?
16:07:57 Is there a freeze day for bug fixes?
16:08:46 I only see a freeze for specs
16:09:11 ah, I see now, seems common to all projects
16:09:14 we can merge bugfixes until RC1
16:09:50 after RC1, we then branch master for Dalmatian
16:10:04 ok, so, what kind of changes are accepted after RC1?
16:10:11 then, in between RC1 and GA, we can't merge any bugfix, only regression fixes
16:10:24 hmm, ok, makes sense
16:10:33 but we can still merge bugfixes in master
16:10:47 Right, that will be available for the next release
16:10:54 since master will then be E (and no longer D) after RC1
16:11:17 so basically, we can merge any bugfixes anytime
16:11:22 for master
16:11:57 the only soft freeze is really between Feature Freeze and RC1
16:12:02 but if you want your fix to be on a specific release, then either before RC1 or after GA by backporting it
16:12:14 we can merge a bugfix during that period but prefer to limit them to regressions
16:12:18 right
16:12:20 sean-k-mooney: not for bugfixes
16:12:45 anyway, I think you have your answer
16:12:50 can we move on?
16:13:03 #topic Review priorities
16:13:06 bauzas: we have previously asked reviewers not to merge bugfixes for bugs not introduced in the current release in that period, but yes, we can move on
16:13:14 #link https://etherpad.opendev.org/p/nova-dalmatian-status
16:13:24 nothing to say but please look at it ^
16:13:31 #topic Stable Branches
16:13:33 I have a request here
16:13:46 for Review priorities
16:14:16 I would like to get some attention on the Shared security groups patches?
16:14:40 #undo
16:14:40 Removing item from minutes: #topic Stable Branches
16:14:41 s/?/\./g
16:14:56 I might have put my blueprint in the wrong meeting section, I'm not sure if it is this or open discussion
16:15:06 no, that's at the end
16:15:12 marlinc: we'll discuss this in the open discussion
16:15:17 which is where we should discuss the shared security group patches :)
16:15:29 Thank you :)
16:15:30 erlon: for your series, this will be reviewed like any other
16:15:41 ow, come on, that's a review priority topic :)
16:16:01 https://etherpad.opendev.org/p/nova-dalmatian-status?#L37
16:16:05 yeah
16:16:16 Right, I just want to make sure that I don't miss any deadlines on that one
16:16:18 yes and no, it's not identified as a review priority from my perspective
16:16:18 given your blueprint was accepted
16:16:26 but I'll try and review it after the meeting
16:16:34 looks like it's passing CI now
16:16:37 And I also have a special request on that
16:16:41 ditto here, I already have a lot of other series to look at
16:16:55 moving on then
16:17:01 #topic Stable Branches
16:17:11 elodilles: heya
16:17:16 o/
16:17:24 #info stable gates should be mostly OK
16:17:40 (i've seen some intermittent failures, otherwise nothing special)
16:17:48 #info stable release proposed for 2024.1 Caracal: https://review.opendev.org/c/openstack/releases/+/921287
16:18:07 we had Bobcat and Antelope stable releases some weeks ago,
16:18:21 maybe we can release Caracal now as well ^^^
16:18:34 feel free to comment on the release patch
16:18:45 #info stable branch status / gate failures tracking etherpad: https://etherpad.opendev.org/p/nova-stable-branch-ci
16:18:55 add any issue if you see one ^^^
16:19:17 and that's all from me
16:20:25 thanks
16:20:35 I'll look at the Caracal patch
16:21:04 bauzas: thx in advance
16:21:09 +1'd fwiw
16:21:22 #topic vmwareapi 3rd-party CI efforts Highlights
16:21:29 fwiesel: are you here?
16:21:38 Hi, yes...
16:21:45 #info No updates.
16:22:13 Still little progress from my side currently. Sorry about that.
16:22:22 Any questions or comments?
16:22:41 not from me
16:22:55 moving on then
16:23:00 #topic Open discussion
16:23:11 erlon: nova handling of virDomainGetJobStats() errors
16:23:27 so, I posted a link on the wiki
16:23:41 https://gist.githubusercontent.com/sombrafam/8f177cbc4e153c328a242811bc24650e/raw/67190ffce7f036c7c3d3628fda15327cef7a41da/nova-compute.log
16:23:45 (woah, we have 4 topics to discuss today so please keep each discussion to only 5-10 mins max)
16:23:54 This bug is happening every time that we try to do a lot of migrations
16:24:29 just file a bug report
16:24:36 For some reason the source host stops responding, and it gets to the timeout. I want to know if it would be okay to add a new handler for this kind of exception there:
16:24:36 https://github.com/openstack/nova/blob/master/nova/virt/libvirt/guest.py#L669
16:24:40 and ping us anytime on the chat
16:24:45 erlon: does that imply you have set the max concurrent live migrations over 1?
16:24:51 ah ok, sounds good
16:25:14 I don't know the exact configuration but very likely
16:25:24 https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.max_concurrent_live_migrations
16:25:35 that should basically never be set to anything except 1
16:26:02 So why does that exist? Which case should it apply to?
16:26:12 there are use cases for it to be != 1 but unless you're using 100G networking it's not recommended
16:26:16 it just says it can't connect to the libvirt API
16:26:37 it exists for use cases where a single core is not enough to saturate the network link
16:26:47 there could be different reasons why libvirt is resetting the connection
16:27:02 It says that because it timed out due to a short keepalive
16:27:02 bauzas: right, but since they said this was load related
16:27:22 not doing concurrent live migrations may help
16:27:40 oh, missed that sentence, indeed good point
16:27:50 don't try to max out the number of calls you make to libvirt
16:28:01 like any other API, it has limits
16:28:23 This comes from the libvirt logs: "At the same time in libvirtd logs: `May 28 13:00:39 ps5-ra1-n6 libvirtd[612268]: internal error: connection closed due to keepalive timeout`"
16:28:41 So, that's why we know that's a timeout error
16:28:51 erlon: to answer your original question, no, I wouldn't recommend modifying Nova to handle that libvirt exception
16:29:28 particularly when the bug comes from libvirt, not nova itself
16:30:04 But see, that function does exactly that. It handles only libvirt exceptions
16:30:31 there are some cases where retrying at the nova level is correct
16:30:38 this may or may not be one of them
16:30:39 specific and meaningful libvirt exceptions, that's the point :)
16:30:47 can you write this up as a bug report if you have not already
16:31:03 and we can see if this is such a case
16:31:03 retrying on a generic connection failure is never a good idea
16:31:13 ok, I will as soon as I get a reproducer
16:31:33 bauzas: we do that all the time for REST calls
16:31:34 honestly, you have my opinion
16:31:49 sean-k-mooney: to placement
16:31:50 this is us doing a read call to check the status of the job
16:31:58 bauzas: and neutron and cinder
16:31:58 I was actually thinking of just returning an empty info and letting nova call it again in the next periodic task, since this is just an info report
16:32:48 perhaps, or catching it and logging a warning
16:32:56 rather than a traceback
16:33:03 I wouldn't hide it
16:33:11 anyway, if you have a reproducer we can discuss the solution in gerrit
16:33:12 if I were catching that exception
16:34:00 The issue with the current process is that, although it triggers the section again, the periodic tasks do not complete the migration. Consequently, while the VM is migrated to the destination host, it is not properly cleaned up on the source host.
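The approach erlon floats above — catch the connection-related libvirt error while polling job stats and return an empty info object so the next periodic task simply polls again — could be sketched roughly as follows. This is an illustrative sketch only: `FakeLibvirtError`, `JobInfo`, and `FlakyDomain` are invented stand-ins, not Nova's actual classes in `nova/virt/libvirt/guest.py`, and the real libvirt-python bindings are deliberately not imported.

```python
# Illustrative sketch of "return empty job info on connection loss"
# instead of letting the exception propagate as a traceback.

class FakeLibvirtError(Exception):
    """Stand-in for the libvirt error raised on a keepalive timeout."""


JOB_NONE = 0  # stand-in for libvirt's "no job running" job type


class JobInfo:
    """Minimal stand-in for a migration job-stats result."""

    def __init__(self, job_type=JOB_NONE):
        self.type = job_type


def get_job_info(domain):
    """Poll migration job stats, degrading gracefully on connection loss.

    If the (simulated) libvirt connection drops, return an empty JobInfo
    rather than raising, so the caller's periodic task just polls again.
    """
    try:
        stats = domain.jobStats()
    except FakeLibvirtError:
        # Connection reset / keepalive timeout: report "no job" for now.
        return JobInfo(JOB_NONE)
    return JobInfo(stats.get("type", JOB_NONE))


class FlakyDomain:
    """Simulated domain whose first poll hits a connection error."""

    def __init__(self):
        self.calls = 0

    def jobStats(self):
        self.calls += 1
        if self.calls == 1:
            raise FakeLibvirtError(
                "connection closed due to keepalive timeout")
        return {"type": 2}  # e.g. an active migration job


dom = FlakyDomain()
first = get_job_info(dom)   # error swallowed -> empty info, no traceback
second = get_job_info(dom)  # next periodic poll succeeds
```

As bauzas and sean-k-mooney point out in the discussion, whether swallowing the error like this is safe depends on the exception being specific: a blanket catch of generic connection failures can mask the source-host cleanup problem erlon describes.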
16:34:03 honestly, except giving a better log to the operator saying "libvirt is getting weirdo, see", I don't see the benefit of adding another handler to a periodic
16:34:11 Merged openstack/nova stable/2023.2: Improve logging at '_numa_cells_support_network_metadata' https://review.opendev.org/c/openstack/nova/+/900840
16:34:30 anyway, we have three other topics to discuss
16:34:45 erlon: please file a bug report and come to us once it's done
16:34:56 Okay, let's move on, thanks
16:34:57 and yeah, a reproducer may help
16:35:11 particularly with some mitigation actions
16:35:25 like, I know libvirt gives me this but we can work around it by that
16:35:49 that way, we could log a better warning than just "heh, look, that's what I got from libvirt"
16:36:04 moving on anyway
16:36:14 next item (2 of 4)
16:36:19 marlinc: rotation_rate blueprint: https://blueprints.launchpad.net/nova/+spec/rotation-rate
16:36:32 I guess you're asking for a specless approval?
16:37:00 Honestly I am not entirely sure what is best, I thought specless might work for this but I'm not sure
16:37:10 that's the point
16:37:26 you're saying you would get the detail from the cinder connection info
16:37:31 do we already have that?
16:37:53 if we add a new compute service version for this and a min compute service version check, I think that would resolve most if not all of the upgrade concerns
16:38:07 Yes, so we have an internal Cinder driver for our own storage platform which does give that property in the cinder connection info, we didn't have to change anything in Cinder itself
16:38:12 I would honestly then recommend to write a spec
16:38:26 marlinc: is there an upstream driver that supports this?
16:38:31 there could be some upgrade and compatibility concerns I may see
16:38:36 marlinc: without that we can't proceed with this in nova
16:38:47 we do not add support for out-of-tree drivers in other projects
16:38:49 No, there is currently no upstream driver that has this
16:39:12 then we can't just modify nova to blindly check anything that's not upstream
16:39:22 so implementing this in the lvm or a different in-tree cinder driver would likely be a requirement if we did this for cinder only
16:39:29 spec it is
16:39:49 and I guess you probably need to discuss that with the cinder folks
16:40:24 Okay, I will look into creating a spec and also see if I can get this implemented in an in-tree driver
16:41:13 thanks
16:41:26 ping me anytime if you require assistance on the paperwork process
16:41:27 I'm going to have to see how to implement the min compute service version check and bump
16:41:42 marlinc: i would probably add it to the lvm driver personally, i think it would be relatively simple to do
16:41:58 before doing anything with version checks, please check with the cinder folks about populating the rotation rate into the connection info details
16:42:14 yeah, lvm seems the easiest approach
16:42:25 (Also actually the functional test for live migration, honestly I have never implemented something so complex as I have no experience at all with the testing framework and how it works in Nova)
16:42:26 but that's not my own garden :)
16:42:48 Especially since there is no existing volume-based live migration functional test I could see
16:42:54 marlinc: once you get a go from the cinder folks about exposing the rotation rate, come to me
16:43:03 marlinc: if cinder accepts the enhancement we can revisit that and/or help you with that
16:43:07 Alright
16:43:20 cool
16:43:23 moving on then
16:43:26 Thank you, I'll update our internal ticket
16:43:44 marlinc: I assume you know how to reach the cinder folks?
16:44:36 Well, I'm going to assume #openstack-cinder and I'll check their contribution documentation
16:44:39 I have no potential design disapproval on using a specific connection info detail for setting the right value in the xml
16:45:00 marlinc: yep, that's the correct channel
16:45:05 but you'll need to write a nova spec explaining that it'll require a recent enough cinder and new computes
16:45:19 by the way, there may be a use case here for nova local storage too
16:45:22 I was maybe a bit afraid though about the special rotation_rate 1 that is libvirt specific
16:45:23 and testing will be interesting to discuss in the spec process
16:45:40 But that is probably something for the spec
16:45:52 the rotation rate is storage specific
16:46:07 how you tune it for the guest is libvirt specific indeed
16:46:34 That is why, right now, we return rotation_rate 1 for SSDs, however that might not be smart from a cinder -> nova integration perspective
16:46:42 It is a magic value
16:46:56 well, really you're trying to say "this is an SSD" or not
16:46:58 we're overtime and we still have two topics to discuss
16:47:05 and there may be a better way to express that
16:47:15 but I think we may need to discuss that from a cross-project perspective
16:47:19 ya, let's continue this conversation after the meeting
16:47:23 Yes, that is right now the primary use case, though we also use it to set it to 7200 for HDDs
16:47:32 Alright, thank you
16:47:38 next one then (3 of 4)
16:47:45 Luzi: floating ip behavior (assign fip: https://bugs.launchpad.net/neutron/+bug/2060808 , remove fip: https://bugs.launchpad.net/nova/+bug/2060812)
16:47:47 these two bugs describe changes with the neutron handling of floating ips (compared to the deprecated nova code). The first allows allocating a floating ip to a vm even if it is attached to another vm ("stealing" it). The second does not check the VM when removing a floating ip, resulting in always removing the ip (even if the vm-name was correct, and the ip a mistake)
16:48:17 are you asking for reviews on the nova patch?
16:48:24 I raised that at the PTG with Neutron, but they wanted Nova people to look over it and tell whether you want to change these behaviors or not
16:48:33 oh my bad
16:49:08 (i normally don't have time to attend the Nova meetings)
16:49:31 I would be glad if you could check out these bugs again and give your input on them
16:49:56 Luzi: well, it's hard to comment on those bugs in a limited timeframe
16:50:09 yeah, we can discuss this on launchpad also
16:50:11 could you please come back to the nova channel at another time?
16:50:15 sure
16:50:31 Luzi: well, what's your timezone, please remind me?
16:50:39 UTC-1
16:51:09 okay, would that work if you pinged us again tomorrow in the UTC afternoon?
16:51:34 yeah, i can do that
16:51:43 thanks
16:52:01 last one
16:52:05 kgube: re-proposed extend volume completion spec: https://review.opendev.org/c/openstack/nova-specs/+/917133
16:52:12 Hi, this is just a review request
16:52:27 cool, we'll review all the proposed specs indeed
16:52:40 The spec has been re-proposed from last cycle and not much has changed
16:52:49 I think I need to remember the exact situation we had last cycle
16:53:06 and why this didn't get enough traction
16:53:18 I think we were basically awaiting cinder's feedback, right?
16:53:21 The cinder dependencies had to get merged
16:53:31 yah, that
16:53:42 and the cinder spec was accepted?
16:53:47 but cinderclient now supports the feature
16:54:05 okay, so where are we exactly on the cinder side?
16:54:17 everything eventually got merged?
16:54:46 well, there are some cinder changes left, but they depend on the nova change again
16:54:55 yeah
16:54:58 I remember that
16:55:09 but okay, that's on our plate now
16:55:21 I'll then review the spec re-proposal
16:55:27 they depend on https://review.opendev.org/c/openstack/nova/+/873560, right?
16:55:28 thanks!
16:55:38 which was waiting for the cinderclient release
16:55:42 yeah
16:55:42 which has now happened
16:55:46 exactly
16:56:00 ok then, if that can be rebased
16:56:05 I don't see anything controversial but I'll just double-check
16:56:09 yeah, i still need to fix the build for this
16:56:12 we can review and then cinder can complete the rest
16:56:14 before blindly re-approving
16:56:36 cool
16:56:49 I think the dust has settled then
16:57:03 are we good with wrapping up the meeting then?
16:58:12 looks so
16:58:16 thanks all
16:58:20 #endmeeting
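As background for the "min compute service version check and bump" marlinc says he still has to figure out for the rotation_rate blueprint: the general shape of such an upgrade gate is roughly the sketch below. All names here are invented for illustration (the constant, the exception, the function); Nova's real mechanism and version numbers differ.

```python
# Illustrative sketch of a minimum-service-version gate, the upgrade
# pattern sean-k-mooney suggests: a feature is refused until every
# compute service in the deployment reports a new enough version.

MIN_VERSION_FOR_ROTATION_RATE = 67  # hypothetical service version


class UnsupportedFeature(Exception):
    """Raised while older computes are still in the fleet."""


def check_rotation_rate_supported(min_service_version: int) -> None:
    """Gate the feature on the oldest compute service version.

    min_service_version is the minimum version across all compute
    services; during a rolling upgrade it lags behind the newest node.
    """
    if min_service_version < MIN_VERSION_FOR_ROTATION_RATE:
        raise UnsupportedFeature(
            "all compute services must be upgraded before "
            "rotation_rate can be honored")


# Mid-upgrade: an old compute still reports version 60, so the gate trips.
try:
    check_rotation_rate_supported(min_service_version=60)
    old_fleet_ok = True
except UnsupportedFeature:
    old_fleet_ok = False

# Fully upgraded fleet: the check passes silently.
check_rotation_rate_supported(min_service_version=67)
new_fleet_ok = True
```

The point of gating on the *minimum* rather than the local version is that a live migration may land the guest on any compute, so the feature must only activate once the whole fleet understands it.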