08:00:16 #startmeeting nova_extra
08:00:17 Meeting started Thu Jun 3 08:00:16 2021 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:20 The meeting name has been set to 'nova_extra'
08:00:58 welcome o/
08:01:16 \o/
08:01:38 o/
08:01:51 ... waiting a minute to let everybody join
08:03:24 So a quick summary. We agree to have this meeting time slot one a month on every first Thursday here in #openstack-nova
08:03:33 s/one/once/
08:04:04 but we haven't talked yet about what would be a good format for it
08:04:19 should we go through the normal meeting agenda
08:04:33 talking about bugs, CI, release, stable,
08:04:44 or just go with an open agenda, like in an office hour
08:05:06 what is the preference of the people present?
08:05:14 I can do both if needed
08:07:07 This is the first extra meeting. My colleague didn't prepare for it enough.
08:07:18 suzhengwei__: no worries
08:07:56 Open agenda will be all right.
08:08:09 If no opinions about the format then I would go with the simple open agenda
08:08:19 suzhengwei__: cool, then we are on the same page
08:08:23 ok
08:08:27 and we can change later if needed
08:08:59 I have one agenda point for today
08:09:47 just want to refresh the process we have in nova about the release
08:09:56 so we just had Milestone 1 last week
08:10:07 at that point of time there is no special deadline in nova
08:10:22 the next Milestone will happen in 5 weeks from now
08:10:33 at Milestone 2 we will do spec freeze
08:11:09 it means that if you have open specs then you have to get them merged before M2 or they will need to be re-proposed to the Y release during the autumn
08:11:21 I'm planning to have a spec review day before M2
08:11:29 similarly to how we had such a day before M1
08:12:12 those features that got approved before M2 could be implemented and merged before M3
08:12:20 at M3 we will have feature freeze
08:12:34 sorry for being late, got some issue with my laptop notification
08:12:42 bauzas: no worries :0
08:12:43 :)
08:13:03 basically these are the important deadlines
08:13:20 the exact dates are #link https://releases.openstack.org/xena/schedule.html
08:13:31 is there any question about these?
08:14:42 Got it.
08:15:12 cool
08:15:47 that was what I prepared for today.
08:15:53 next open office hour would be around the spec freeze
08:15:53 Is there any topic you want to discuss?
08:16:01 so, I have a concern
08:16:02 bauzas: god point
08:16:05 good even
08:16:20 bauzas: tell us
08:16:29 how can we help contributors that are not in the nova meeting for their own specs ?
08:16:41 how can we discuss ?
08:17:08 bauzas: what problem do you see that we need to discuss in a meeting? stuck reviews?
08:17:19 and if we want to have priorities between some reviews, how could we know which ones ?
08:18:05 gibi: my point is that sometimes it's nice to discuss directly in IRC when you have some spec questions
08:18:10 sure
08:18:19 but it does not need to be in the form of a meeting
08:18:28 yup
08:18:50 so, I wonder how we could help the contributors that aren't around in general
08:18:53 still if there are questions now, I'm happy to hear them even during this meeting
08:19:22 bauzas: I'm OK to do discussion primarily in the spec review
08:19:52 and if we hit a wall there then we can try to find a common time between the spec author and the reviewers to resolve
08:19:57 the block
08:20:01 yup
08:20:16 suzhengwei, XinxinShen do you have a view on this?
08:21:22 do you feel it is hard to discuss issues about the specs?
08:23:21 in general, it takes more than one week
08:23:37 for example, when I review a spec, I provide some comment
08:23:50 but then the reply could be on the next day
08:24:00 and then I'd only see it honestly by the end of the week
08:24:23 Is this channel often open to all? I mean if I want to get help, would someone react in time?
08:24:41 suzhengwei: yes, in general, we do it this way
08:24:50 good
08:24:58 suzhengwei: for example, when I'm reviewing some gibi's spec, I'm pinging him
08:25:04 suzhengwei: it depends on the time zone. like I tend to be here between UTC 7:00 - UTC 17:00 on workdays
08:25:08 telling him that I had some questions
08:25:25 so, sometimes we directly discuss on IRC for a spec
08:25:31 cool
08:25:35 but like gibi said, we're both on the same TZ
08:25:45 so it's simple
08:26:09 suzhengwei: I keep my client up during the night so I will see pings next day
08:26:14 me too
08:28:37 cool, We should ensure that the spec author and reviewer can communicate in time.
08:28:46 suzhengwei: if you ping us, one thing to help is not just pinging but stating your problem right away
08:29:50 yes
08:30:30 I am doing some work about instance HA, and interested in the topic "Support vm evacuation while server status is suspended, paused".
08:31:00 I wonder if someone has processed it.
08:31:27 I still have open specs to look at
08:31:49 If not, I would like to do it.
08:31:57 suzhengwei: interesting ideas
08:32:34 suzhengwei: I think we don't support these today, but with some compromise we could
08:33:18 we will lose the in memory state of the paused instance but we could rebuild it still on another compute host
08:33:48 and if the source compute is dead already then the in memory state is lost anyhow
08:34:13 host failure triggers evacuation. And active instances lose the in memory state too.
08:34:32 suzhengwei: yes, so I think it is OK to lose that for the paused instance too
08:34:41 suzhengwei, bauzas: does suspend save some state to the disk?
08:34:49 good question
08:35:07 I honestly don't have the answer straight out of my mind
08:35:22 anyhow if it saves something to disk and that disk is on shared storage then we might even recover that saved state on the destination host
08:35:30 the crucial bit to remember with instance HA is that the host is already done
08:35:32 but as a first step I would lose that too
08:35:33 gone*
08:35:47 bauzas: yepp
08:35:56 so, yeah, ephemeral storage can't be somehow persisted
08:36:17 except if it is on shared storage ^^ ;)
08:36:19 you need to have either shared storage or volumes
08:36:23 yepp
08:36:38 yeah, but in general, you need to assume a crash
08:36:52 so any memory that's not synced is lost
08:36:58 bauzas: yeah, this is why first I would assume that the suspended state is lost as well
08:37:16 to avoid an inconsistent suspended state to be loaded
08:37:18 if suspend stores on disk, we're ok
08:37:32 kashyap: around ?
08:37:40 bauzas: Mornin, yes
08:37:53 How can I be useful?
:)
08:37:57 kashyap: we are in office hour and we have a question about suspended instances
08:38:19 * kashyap reads back
08:38:25 with the libvirt driver and qemu, what happens to the memory state when suspending ?
08:38:32 do we suspend on disk ?
08:38:41 I'd be inclined to say so
08:38:52 bauzas: gibi: Yes: suspend usually indeed means save-the-state-to-a-file-on-disk
08:39:02 Your inclination is correct :)
08:39:33 the problem is that we can't tell whether the instance is on shared storage or not
08:39:43 kashyap: and what do you think, does moving such suspended state between compute hosts make sense?
08:40:25 gibi: keep in mind evacuate is a rebuild
08:40:26 gibi: You mean moving such suspended state between different compute hosts make sense?
08:40:40 kashyap: yeah that is my question
08:41:05 bauzas: in case of suspend we see the vm_state on the dest being suspended so we can look for the state file on the disk. if it is there then we know that it was on shared storage
08:41:27 gibi: sure but then we leak the state of the host
08:41:40 this isn't predictable
08:42:28 OK, I agree this can be a can of worms
08:42:54 suzhengwei: in case of evacuating a suspended VM, is it OK to you to lose the suspended state?
08:44:03 gibi: I need to think a bit more about it. (libvirt has managedSave() API that does the suspend thingie, which already Nova uses. So we have the primitives...)
08:44:06 If host down, active and suspended instances both lose their memory.
08:44:53 gibi: suzhengwei: What is the main use-case here? The ability to start suspended instances on any compute host from a given pool?
08:44:59 suzhengwei: if you don't want to recover the suspended state that is saved to disk, then I think your proposal is pretty simple and straight forward
08:45:08 Instance HA, try best to recover the workload as much as possible.
08:45:17 interestingly, I found some nova admin docs https://docs.openstack.org/nova/latest/admin/node-down.html
08:45:38 kashyap: we looked at it from evacuation perspective. VM is suspended to disk (on shared storage), the host dies, user evacuates VM
08:46:04 gibi: I see; that makes sense
08:46:51 gibi: I honestly feel we can just support recreating a new instance
08:47:07 bauzas: Isn't that what already 'rebuild' is?
08:47:16 Ah, you said that already above :)
08:47:18 kashyap: yup, the question was about the memory state
08:47:51 If host down, the suspended instance can be active again on the origin node. So I think it makes sense to evacuate suspended instances.
08:48:01 can not
08:48:06 I guess here suzhengwei's concern is that we limit evacuate to active instances
08:48:14 right?
08:48:22 that's the problem we're trying to solve ?
08:48:35 I think so
08:48:39 yes
08:49:09 and I'm totally supportive to extend evac to support paused and suspended instances. It is simple if we allow losing the running state
08:49:11 I just remembered we have a --on-shared-storage flag https://docs.openstack.org/nova/latest/admin/evacuate.html#evacuate-a-single-instance
08:49:36 since evacuate is an admin action, op can use it
08:49:39 on purpose
08:50:51 so we already do the check automatically
08:50:52 bauzas: onSharedStorage is deprecated in 2.13
08:51:00 bauzas: today we automatically detect it I guess
08:51:01 gibi: because we detect this ?
08:51:04 yeah
08:51:09 "Starting since version 2.14, Nova automatically detects whether the server is on shared storage or not. Therefore this parameter was removed."
08:51:12 yepp
08:51:20 ok, so I guess we can consider adding suspend
08:51:31 suzhengwei: I suggest to propose a small spec about this.
I'm happy to review it
08:51:44 if the target host is on shared storage, we could just try to boot with the suspended state
08:52:02 for paused, the implication would be that the evacuated instance would become active
08:52:15 for suspend, too
08:52:15 bauzas: active, or stopped
08:52:20 bauzas: we can decide
08:52:27 yup, that's the point
08:52:31 bauzas: but true, it cannot be pasued any more
08:52:36 paused
08:53:05 I don't want to stop the discussion, but we have 8 minutes left. If there any other topic to discuss?
08:53:14 /If/Is/
08:53:24 bauzas: gibi: One last:
08:53:28 kashyap: go
08:53:30 I think stopped is better. No matter pause or suspend, users can not access the instance directly.
08:53:43 suzhengwei: I can accept that
08:54:07 gibi: suzhengwei: On whether it makes sense of moving suspended instances between compute hosts, a thumb-rule can be: "follow the same rules for hardware matching as for a live migration between the hosts"
08:54:19 (I mean, to uncover any "gotchas")
08:54:47 kashyap: ahh you have a point, this state can be hw dependent
08:54:49 FWIW, I also just checked the above w/ a QEMU migration developer; and he agrees.
08:54:49 kashyap: since evacuate is a rebuild, we can't predict this
08:55:25 gibi: sorry, I wasn't explicit but when I said "we're gonna try to unsuspend from disk", I was thinking of hardware capabilities
08:55:28 OK, then I propose not to try to recover the suspended state during evac.
At list not in the first step
08:55:40 bauzas: gibi: Hm, so looks like this needs to be fleshed out in a design document
08:55:41 /list/least/
08:56:06 kashyap: the evacuate workflow is waaaaay different from live-migrate
08:56:16 you can't just check the source host at first ;)
08:56:27 and compare both
08:56:45 the scheduler is just giving you a target and then good luck with it
08:56:57 bauzas: I see; fair enough
08:57:03 so in summary
08:57:04 but yeah, we're 4 mins
08:57:07 left
08:57:07 so in summary
08:57:49 suzhengwei: please propose a spec. I don't see any problem supporting evac for paused and suspended VMs. But they will lose the in memory or suspended state. They will be fresh VMs on the dest host in stopped state
08:58:18 Yeah; makes sense.
08:58:24 I will.
08:58:26 (On spec)
08:58:32 suzhengwei: cool, thanks
08:58:48 any last words before we stop the meeting? ;)
08:59:03 nothing from me.
08:59:07 XinxinShen: ?
08:59:37 nothing for me. thanks.
08:59:56 then thanks for joining. please continue discussion if needed
09:00:00 I just stop the meeting log here
09:00:04 #endmeeting
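
Note appended to the log: the outcome summarized at 08:57:49 can be written down as a small decision table. The sketch below is only an illustration of that summary, not Nova code. The state names, the EvacuationOutcome type and the plan_evacuation helper are hypothetical, and the set of states treated as evacuable today is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical state names that mirror the words used in the meeting;
# they are not taken from Nova's source code.
ACTIVE, STOPPED, PAUSED, SUSPENDED = "active", "stopped", "paused", "suspended"

# Assumption for illustration: states accepted for evacuation today.
EVACUABLE_TODAY = {ACTIVE, STOPPED}
# States the proposed spec would add, per the meeting summary.
EVACUABLE_PROPOSED = {PAUSED, SUSPENDED}


@dataclass(frozen=True)
class EvacuationOutcome:
    allowed: bool
    state_on_dest: Optional[str]  # vm state after the rebuild on the new host
    saved_state_recovered: bool   # in-memory or suspended-to-disk state kept?


def plan_evacuation(vm_state: str) -> EvacuationOutcome:
    """Model the behaviour sketched in the meeting summary."""
    if vm_state in EVACUABLE_PROPOSED:
        # Paused and suspended VMs come back as fresh, stopped VMs.
        return EvacuationOutcome(True, STOPPED, False)
    if vm_state in EVACUABLE_TODAY:
        # Assumption: the rebuilt VM keeps its previous vm state.
        return EvacuationOutcome(True, vm_state, False)
    return EvacuationOutcome(False, None, False)


if __name__ == "__main__":
    for state in (ACTIVE, PAUSED, SUSPENDED, "building"):
        print(state, plan_evacuation(state))
```

The saved state is never recovered in this sketch, matching the agreement not to restore suspended state during evacuation, at least not in the first step, because evacuate is a rebuild and the hardware of the target host cannot be compared with the dead source host.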