16:00:02 #startmeeting nova
16:00:03 Meeting started Tue Jun 1 16:00:02 2021 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:07 The meeting name has been set to 'nova'
16:00:07 o/
16:00:08 \o
16:00:11 o/
16:00:14 hotseating with neutron :)
16:00:33 yup, I guess the chair is hot
16:00:34 or hotswapping
16:00:41 o/
16:00:43 :)
16:00:45 hi
16:00:47 maybe we should open the windows ?
16:00:49 slaweq: o/
16:00:56 bauzas: :D
16:01:04 bauzas: :D
16:01:17 o/
16:01:31 * bauzas misses the physical meetings :cry:
16:01:37 * gibi joins in
16:01:38 :)
16:01:50 o/
16:01:52 you could see me sweating
16:01:56 this will be a bittersweet meeting
16:02:23 lets get rolling
16:02:24 #topic Bugs (stuck/critical)
16:02:28 no critical bugs
16:02:37 9 new untriaged bugs (-0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New
16:02:45 I like this stable number under 10
16:02:46 :D
16:02:59 any specific bug we need to discuss?
16:04:02 good
16:04:03 #topic Gate status
16:04:08 Placement periodic job status #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly
16:04:11 super green
16:04:22 also nova master gate seems to be OK
16:04:26 we merged patches today
16:04:39 \o/
16:04:44 thanks lyarwood I guess
16:05:07 thanks everybody who keep this up :)
16:05:17 any gate issue we need to talk about?
16:06:06 im still investigating the one that might be related to os-vif
16:06:16 but not from me
16:06:44 sean-k-mooney: thanks
16:07:21 if nothing else for the gate then
16:07:22 #topic Release Planning
16:07:27 We had Milestone 1 last Thursday
16:07:34 M2 is 5 weeks from now
16:07:52 at M2 we will hit spec freeze
16:08:08 hurry up with specs :)
16:08:15 anything else about the release?
16:09:16 #topic Stable Branches
16:09:25 copying elodilles' notes
16:09:36 newer stable branch gates need investigation into why they fail
16:09:40 wallaby..ussuri seems to be failing, mainly due to nova-grenade-multinode (?)
16:09:44 train..queens seems to be OK
16:09:48 pike gate fix is on the way, should be OK whenever it lands ( https://review.opendev.org/c/openstack/devstack/+/792268 )
16:09:51 EOM
16:10:30 elodilles: on the nova-grenade-multinode failure, is that the ceph issue you pushed a DNM patch for?
16:10:50 yes, that's it i think
16:11:04 in short we see to new ceph version (pacific) installed on stable
16:11:36 s/to/too/
16:12:18 anything else about stable?
16:12:25 yes, melwitt's comment pointed that out ( https://review.opendev.org/c/openstack/nova/+/785059/2#message-c31738db1240ddaa629a3aaa4e901c5a62206e85 )
16:12:39 nothing else from me :X
16:13:14 #topic Sub/related team Highlights
16:13:18 Libvirt (bauzas)
16:13:30 well, nothing worth mentioning
16:13:46 thanks
16:13:52 #topic Open discussion
16:13:57 I have a couple of topics
16:14:02 (gibi) Follow up on the IRC move
16:14:06 so welcome on OFTC
16:14:11 +1
16:14:15 so far the move seems to be going well
16:14:33 I grepped our docs and the nova related wiki pages and fixed them up
16:14:45 Then again, if someone's trying to reach us over on Freenode, we'll never know, will we?
16:14:49 yeah thanks for that
16:14:55 artom: I will stay on freenode
16:14:56 Unless someone's stayed behind to redirect folks?
16:14:59 * sean-k-mooney was not really aware we mentioned irc in the docs before this
16:15:00 Aha!
16:15:00 we tho still have less people in the OFTC chan vs. the freenode one
16:15:14 yeah I will also stay on Freenode for redirect
16:15:18 gibi: me too, I'll keep the ZNC network open for a while
16:15:22 so if any discussion is starting on freenode I will redirect people
16:15:36 We are going to discuss in TC Thursday meeting on topic change on freenode or so
16:15:42 I believe you're not allowed to mention OFTC by name?
16:15:50 There are apparently bots that hijack channels if you do that?
16:15:51 artom: I will use private messages if needed
16:15:53 we currently have 102 attendees on freenode -nova compared to OFTC one
16:15:53 artom: we can do as we have OFTC ready and working now
16:16:01 freenode : 102, OFTC : 83
16:16:08 artom: that was libera and it was based on the topic i think
16:16:19 so I guess not everyone moved already
16:16:31 we can give this email ref to know details http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022780.html
16:16:39 gmann: good point
16:16:39 artom: indeed, you can't spell the word
16:16:41 bauzas: well some of those are likely bots
16:16:50 artom: or the channels could be hijacked by some ops
16:17:04 sean-k-mooney: haven't really digged into details
16:17:10 but the numbers were appealing
16:17:27 but that's only a 48h change
16:17:30 yep most people are now here
16:17:33 we'll see next weeks
16:17:42 OK, any other feedback or question around the IRC move?
16:18:23 if not then a related topic...
16:18:28 (gibi) Do we want to move our meeting from #openstack-meeting-3 to #openstack-nova ?
16:18:34 this was a question originally from the TC
16:18:52 I think it make sense and also working for many projects like QA, TC afaik
16:18:58 -1 from me. I've popped in to other channel, not aware that they were having a meeting, and "polluted" their meeting
16:19:11 artom: which one?
16:19:21 artom: i raised that in the tc call
16:19:24 gmann, #openstack-qa, actually, I think :)
16:19:27 I see very less interruption in QA or TC since past 1 year
16:19:29 but that is really just a habit thing
16:19:33 Folks were very polite and everything
16:19:41 artom: it might happen very less.
16:19:44 waiting for the topic to load and see if the channel is active
16:19:49 I can handle interruption politely I guess
16:19:55 But I felt guilty for interrupting
16:19:57 and if anyone come in between we can tell them its meeting time
16:20:01 But... why?
16:20:07 What's wrong with a dedicated meeting channel?
16:20:11 artom: i have had the same feeling yes
16:20:33 artom: nothing really, just more infrastructure for scheduling the meetings
16:20:33 I am absolutely +0 to this
16:20:44 e.g. "booking the room"
16:20:50 it is hard to know where the meeting is going on with all openstack-meeting-* channels.
16:20:52 and setting up the logging etc.
16:20:59 but sometimes it's nice to have sideways discussions happening on -nova while we continue ranting here
16:21:06 sean-k-mooney, it's already all set up :)
16:21:11 there will be no difference in logging etc
16:21:32 artom: oh i know
16:21:39 bauzas: +1 on the side discussions
16:21:45 i do like having the side conversation option
16:21:46 so I guess my only concern would be the ability to have dual conversations happening at the same time without polluting the meeting
16:21:47 gmann, is it though? Maybe for someone completely new to the community
16:21:53 that said i try not to do that when i can
16:22:09 artom: its for me too when I need to attend many meetings :)
16:22:12 but I guess #openstack-dev could do the job
16:22:31 gmann, ah, I see your point.
16:22:39 my other concern could be some random folks pinging us straight during the meeting
16:22:44 Well, #openstack-meetings then?
16:22:46 i guess it depends, i usually wait for gibi to ping us :)
16:22:50 but that's not a big deal
16:22:57 (as I usually diverge)
16:22:58 I still want to keep the dual channel meeting/normal IRC option
16:23:07 artom: that will be too many channels
16:23:23 *shrug* w/e :)
16:23:33 #openstack-dev can fit the purpose of side discussions
16:23:37 dansmith: you are pretty quiet on the topic
16:23:37 * artom joins bauzas in the +0 camp
16:23:52 sean-k-mooney: still have a conflict in this slot, will have to read later
16:23:53 dansmith: is in another call i think
16:24:06 I just express open thoughts and I'm okay with workarounds if needed
16:24:13 hence the +0
16:24:20 nothing critical to me to hold
16:24:21 dansmith: ah just asking if you preferred #openstack-nova vs #openstack-meeting-3
16:24:22 OK, lets table this for next week then. So far I don't see too many people wanting to move
16:24:24 dansmith: no worries
16:24:44 I will say let's try and if it does not work we can come back here
16:24:59 +1 on keeping it open for discussion
16:25:31 we will come back to this next week
16:25:40 next
16:25:42 (gibi) Monthly extra meeting slot for the Asia + EU. Doodle #link https://doodle.com/poll/svrnmrtn6nnknzqp . It seems Wednesday 8:00 or Thursday 8:00 is winning.
16:25:51 8:00 UTC I mean
16:25:51 sean-k-mooney: very much -nova
16:27:01 gibi: does that work for you to chair the meeting at that time
16:27:04 dansmith: I just expressed some concern about the ability to have side discussions, they could happen "elsewhere" tho
16:27:06 If no objection then I will schedule that to Thursday 8:00 UTC and I will do that on #openstack-nova (so we can try the feeling)
16:27:12 sean-k-mooney: yes
16:27:17 sean-k-mooney: I can chair
16:27:23 cool
16:27:34 this works for me too
16:27:45 10am isn't exactly early in the morning
16:28:08 * sean-k-mooney wakes up at 10:30 most mornings
16:28:09 there was a lot of participation in the doodle
16:28:24 yes
16:28:29 so I hope for a similar crowd on the meeting too
16:28:32 indeed
16:28:36 from a number of names we dont see often in this meeting
16:28:55 yup, even from the cyborg team
16:29:17 I will schedule the next meeting for this Thursday
16:29:29 so we can have a rule of every first Thursday of a month
16:29:30 (I mentioned this because they expressed their likeliness for an Asian-friendly timeslot)
16:30:10 gibi: this works for me, and this would mean the first meeting being held in two days
16:30:19 yepp
16:30:44 no more topics on the agenda for today. Is there anything else you would like to discuss today
16:30:47 ?
16:31:33 do we need some kind of formal agenda for the thursday meeting ?
16:31:48 or would we stick with a free open hour
16:32:03 ?
16:32:15 I will ask the people about it on Thursday
16:32:23 I can do both
16:32:42 or just summarizing anything from Tuesday
16:32:43 Oh, can I ask a specless blueprint vs spec question?
16:32:49 artom: sure, go ahead
16:33:07 So, we talked about https://review.opendev.org/c/openstack/nova-specs/+/791287 a few meetings ago
16:33:17 WIP: Rabbit exchange name: normalize case
16:33:38 sean-k-mooney came up with a better idea that solves the same problem, except without the messy upgrade impact:
16:33:50 Just refuse to start nova-compute if we detect the hostname has changed
16:34:11 So I want to abandon https://review.opendev.org/c/openstack/nova-specs/+/791287 and replace it with sean-k-mooney's approach
16:34:20 Which I believe doesn't need a spec, maybe not even a blueprint
16:34:45 you mean treat it as a bugfix if its not a blueprint
16:35:12 artom: the hostname reported by libvirt changed compared to the hostname stored in the DB?
16:35:18 gibi, yes
16:35:22 sean-k-mooney, basically
16:36:11 tbc, we don't mention the service name
16:36:13 artom: could there be deployments out there that are working today but will stop working after your change?
16:36:21 but the hypervisor hostname which is reported by libvirt
16:36:30 because you have a tuple
16:36:35 gibi, we could wrap it in a config option, sorta like stephenfin did for NUMA live migration
16:36:39 (host, hypervisor_hostname)
16:36:47 gibi: tl;dr in the virt driver we would look up the compute service record using CONF.host and in the libvirt driver check that 1) the compute nodes associated with the compute service record are length 1 and 2) its hypervisor_hostname is the same as the one we currently have
16:37:05 sean-k-mooney: thanks
16:37:12 eg. with ironic, you have a single nova-compute service (then, a single hostname) but multiple nodes, each of them being an ironic node UUID
16:37:40 sean-k-mooney: what puzzles me is that I thought service RPC names were absolutely unrelated to hypervisor names
16:37:46 bauzas, it would be for drivers that have a 1:1 host:node relationship
16:37:58 bauzas, but it's a good point, we'd have to make it driver-agnostic as much as possible
16:37:59 and CONF.host is the RPC name, hence the service name
16:38:11 artom: that's my concern, I guess
16:38:37 some ops wanna define some RPC name that's not exactly what the driver reports and we said for a while "you'll be fine"
16:38:40 bauzas, valid concern, though I think it'd be pointless to talk about it in a spec, without the code to look at
16:39:23 artom: tbh, I'm even not sure that other drivers but libvirt use the libvirt name as the service name
16:39:48 so we need to be... cautious, I'd say
16:39:51 I thought we use CONF.host as the service name
16:39:54 bauzas: we cant do that without breaking people
16:39:58 gibi: right
16:39:58 gibi: we do
16:40:03 OK
16:40:17 so the service name is hypervisor agnostic
16:40:19 gibi: but we use what the virt driver reports for the hypervisor_hostname field of the ComputeNode record
16:40:32 the node name is hypervisor specific
16:40:33 gibi: this is correct again
16:40:34 and in theory we use conf.host for the name we put in instance.host
16:41:00 and the RPC name is also the service name so that is also hypervisor agnostic
16:41:06 yup
16:41:14 so if we need to fix the RPC name then we could do it hypervisor agnostically
16:41:17 (I guess)
16:41:31 but here, artom proposes to rely on the discrepancy to make it hardstoppable
16:41:46 i think there are two issues: 1) changing conf.host and 2) the hypervisor_hostname changing
16:41:55 what if we detect the discrepancy via comparing the db host with nova-compute conf.host?
16:42:06 ideally we would like both to not change and detect/block both
16:42:28 well, if you change the service name, then it will create a new service record
16:42:30 hm, we cannot look up our old DB record if conf.host is changed :/
16:42:35 gibi: we would have to do that backwards
16:42:44 look up the compute node RP by hypervisor_hostname
16:42:51 the old service will be seen as dead
16:42:51 then check the compute service record
16:43:14 sean-k-mooney: so we detect if conf.host changes but hypervisor_hostname remains the same
16:43:36 gibi: yep we can check if either of the values changes
16:43:41 but not if both of the values change
16:43:50 what problem are we trying to solve ?
16:44:01 the fact that messages go lost ?
16:44:02 sean-k-mooney: OK thanks, now I see it
16:44:03 unless we just write this in a file on disk that we read
16:44:38 bauzas, people renaming their compute hosts and exploding stuff
16:44:55 Either on purpose, or accidentally
16:45:01 bauzas: the fact that it is possible to start the compute service when either conf.host has changed or hypervisor_hostname has changed and get in an inconsistent state
16:45:24 bauzas: one of those being that we can have instances on the same host with different values of instance.host
16:45:32 I guess if both changed then we get a new working compute and the old will be orphaned
16:45:46 gibi: yep
16:45:49 OK
16:45:59 assuming we can solve this, I don't see this as any different to gibi's change to disallow N and N-M (M > 1) in the same deployment
16:46:03 sean-k-mooney: can we just consider to NOT accept to rename the service hostname if instances are existing on it ?
16:46:18 in terms of being a hard break but only for people that are already in a broken state
16:46:26 bauzas: that might also be an option yes
16:46:30 ditto for the NUMA live migration thing, as artom alluded to above
16:46:59 bauzas: i think we have options and it might be good to POC some of them
16:47:13 again, I'm a bit conservative here
16:47:18 I think I'm convinced that this can be done for the libvirt driver. As for how to do it for the other drivers, that remains to be seen
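A minimal sketch of the startup check being discussed, assuming hypothetical helper and attribute names rather than the actual Nova objects API:

    # Hypothetical sketch only -- names and attributes are illustrative,
    # not the real Nova implementation.
    def check_host_identity(conf_host, service_record, reported_hypervisor_hostname):
        """Refuse to start nova-compute if the host identity looks changed.

        conf_host: the current CONF.host value (the service/RPC name).
        service_record: the compute Service record previously stored in the DB
            for conf_host, or None on a fresh host.
        reported_hypervisor_hostname: what the virt driver (libvirt) reports now.
        """
        if service_record is None:
            # No record under this CONF.host: either a genuinely new host, or
            # CONF.host itself changed; the reverse lookup by hypervisor_hostname
            # mentioned above would be needed to tell the two apart.
            return
        nodes = service_record.compute_nodes
        if len(nodes) != 1:
            # Drivers such as ironic map many nodes to one service; the check
            # discussed here only targets the 1:1 (libvirt) case.
            return
        recorded = nodes[0].hypervisor_hostname
        if recorded != reported_hypervisor_hostname:
            raise RuntimeError(
                "hypervisor_hostname changed from %r to %r for service %r; "
                "refusing to start to avoid orphaning the existing compute node"
                % (recorded, reported_hypervisor_hostname, conf_host))

Where in the compute service startup such a check could run, relative to init_host() and the service record update, is the sequencing question the rest of the discussion turns to.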
16:47:50 I get your point but I think we need to be extra cautious, especially with some fancy scenarios involving ironic
16:48:07 gibi: that's a service issue, we need to stay driver agnostic
16:48:15 bauzas: ack, although what i was originally suggesting was driver specific
16:48:23 sean-k-mooney, right, true
16:48:29 sean-k-mooney: hence my point about hypervisor_hostname
16:48:44 yes it's not that i did not consider ironic
16:48:44 It would be entirely within the libvirt driver in init_host() or something, though I suspect we'd have to add new method arguments
16:48:47 I thought you were mentioning it as this is the only field being virt specific
16:48:51 i intentionally was declaring it out of scope
16:48:58 and just fixing the issue for libvirt
16:49:11 bauzas: i dont think this can happen with ironic for what its worth
16:49:13 artom: init_host() is very late in the boot process
16:49:23 bauzas, well pre_init_host() then :P
16:49:29 you already have a service record
16:49:39 artom: pre init_host, you are driver agnostic
16:49:39 bauzas: we need a libvirt connection though
16:49:45 gibi: I know
16:49:48 bauzas: so I'm not sure we can do it earlier
16:49:52 gibi: my point
16:49:58 bauzas: we need to do it after we retrieve the compute service record but before we create a new one
16:50:18 iirc, we create the service record *before* we initialize the libvirt connection
16:50:46 this wouldn't be idempotent
16:50:50 perhaps we should bring this to #openstack-nova
16:51:22 I honestly feel this is tricky enough for drafting it somewhere... unfortunately like in a spec
16:51:27 FWIW we call init_host() before we update the service ref...
16:51:46 especially if we need to be virt-agnostic
16:51:50 And before we create the RPC server
16:51:58 artom: ack, then I was wrong
16:52:15 'ts what I'm saying, we need the code :)
16:52:22 In a spec it's all high up and abstract
16:52:29 I thought we did it in pre_hook or something
16:52:36 no we can talk about this in a spec if we need to
16:52:38 poc, then
16:52:43 specs dont have to be high level
16:52:46 poc, poc, poc
16:52:52 ok
16:53:05 OK, lets put up some patches
16:53:09 discuss it there
16:53:09 🐔 poc poc poc it is then 🐔🐔
16:53:23 and see if this can fly
16:53:30 Chickens can't fly
16:53:33 I'm OK to keep this without a spec so far
16:53:33 if we have a few minutes i have one other similar topic
16:53:47 https://bugzilla.redhat.com/show_bug.cgi?id=1700390
16:53:48 bugzilla.redhat.com bug 1700390 in openstack-nova "KVM-RT guest with 10 vCPUs hangs on reboot" [High,Closed: notabug] - Assigned to nova-maint
16:53:55 sean-k-mooney: sure
16:54:01 we have ^ downstream
16:54:25 basically when using realtime you should always use hw:emulator_thread_policy=something
16:54:46 but we dont disallow it because, while it's a bad idea not to, it can work
16:55:16 im debating between filing a wishlist bug vs a specless blueprint for a small change in our default logic
16:55:26 feels like a bug to me we can fix in nova. even if it works sometimes we can disallow it
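For reference, the workaround mentioned above corresponds to flavor extra specs along these lines; the values below are only an illustration, not a recommendation:

    # Illustrative realtime flavor extra specs (example values only)
    realtime_flavor_extra_specs = {
        "hw:cpu_policy": "dedicated",          # realtime guests need pinned CPUs
        "hw:cpu_realtime": "yes",
        "hw:cpu_realtime_mask": "^0",          # keep vCPU 0 out of the realtime set
        # the workaround being discussed: keep emulator threads off the realtime vCPUs
        "hw:emulator_thread_policy": "share",  # or "isolate"
    }

Roughly, "isolate" claims an extra dedicated host CPU for the emulator threads, while "share" floats them over the CPUs in [compute]cpu_shared_set when that option is configured.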
16:55:39 if i remember correctly we still require at least 1 core to not be realtime
16:55:54 so i was thinking we could limit the emulator thread to that core
16:56:04 sounds like a plan
16:56:19 if we don't have such a limit then I think we should add that as well
16:56:22 well
16:56:36 it does not make much sense not to have an o&m cpu
16:56:36 we used to
16:56:39 i dont think we removed it
16:56:56 would people be ok with this as a bugfix
16:57:09 i was concerned it might be slightly featureish
16:57:22 yes. It removes the possibility of a known bad setup
16:58:01 ok ill file and transcribe the relevant bit from the downstream bug so
16:58:25 the workaround is just use hw:emulator_thread_policy=share|isolate
16:58:52 but it would be nice to not have the buggy config by default
16:58:59 I agree
16:59:05 and others seem to be silent :)
16:59:09 so it is sold
16:59:17 any last words before we hit the top of the hour?
17:00:04 then thanks for joining today
17:00:08 o/
17:00:12 #endmeeting