08:00:16 <gibi> #startmeeting nova_extra
08:00:17 <opendevmeet> Meeting started Thu Jun  3 08:00:16 2021 UTC and is due to finish in 60 minutes.  The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:00:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:00:20 <opendevmeet> The meeting name has been set to 'nova_extra'
08:00:58 <gibi> welcome o/
08:01:16 <suzhengwei__> \o/
08:01:38 <XinxinShen> o/
08:01:51 <gibi> ... waiting a minute to let everybody join
08:03:24 <gibi> So a quick summary. We agreed to have this meeting time slot once a month, on every first Thursday, here in #openstack-nova
08:04:04 <gibi> but we haven't yet talked about what would be a good format for it
08:04:19 <gibi> should we go through the normal meeting agenda
08:04:33 <gibi> talking about bugs, CI, release, stable,
08:04:44 <gibi> or just go with an open agenda, like in an office hour
08:05:06 <gibi> what is the preference of the people present?
08:05:14 <gibi> I can do both if needed
08:07:07 <suzhengwei__> This is the first extra meeting. My colleague didn't prepare for it enough.
08:07:18 <gibi> suzhengwei__: no worries
08:07:56 <suzhengwei__> Open agenda will be all right.
08:08:09 <gibi> If there are no opinions about the format then I would go with the simple open agenda
08:08:19 <gibi> suzhengwei__: cool, then we are on the same page
08:08:23 <suzhengwei__> ok
08:08:27 <gibi> and we can change later if needed
08:08:59 <gibi> I have one agenda point for today
08:09:47 <gibi> just want to refresh the process we have in nova about the release
08:09:56 <gibi> so we just had Milestone 1 last week
08:10:07 <gibi> at that point in time there is no special deadline in nova
08:10:22 <gibi> the next Milestone will happen in 5 weeks from now
08:10:33 <gibi> at Milestone 2 we will do spec freeze
08:11:09 <gibi> it means that if you have open specs then you have to get them merged before M2 or they will need to be re-proposed for the Y release during the autumn
08:11:21 <gibi> I'm planning to have a spec review day before M2
08:11:29 <gibi> similarly to how we had such a day before M1
08:12:12 <gibi> those features that got approved before M2 could be implemented and merged before M3
08:12:20 <gibi> at M3 we will have feature freeze
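The release process gibi describes above can be summarised in a small sketch. It only encodes what was said in the meeting (no special deadline at M1, spec freeze at M2, feature freeze at M3); the names `Milestone`, `NOVA_DEADLINES` and `spec_outcome` are hypothetical and exist only for illustration, and no dates are included because the schedule link is the authoritative source for those.

```python
from enum import Enum
from typing import Dict, Optional


class Milestone(Enum):
    """Milestones of a release cycle, as named in the meeting."""
    M1 = 1
    M2 = 2
    M3 = 3


# Deadline attached to each milestone in nova, per the discussion above.
NOVA_DEADLINES: Dict[Milestone, Optional[str]] = {
    Milestone.M1: None,              # no special deadline
    Milestone.M2: "spec freeze",     # open specs must be merged by now
    Milestone.M3: "feature freeze",  # approved features must be merged by now
}


def spec_outcome(merged_before_m2: bool) -> str:
    """Return what happens to a spec depending on whether it merged before M2."""
    if merged_before_m2:
        return "implement and merge before M3"
    return "re-propose for the next release"
```

For example, `spec_outcome(False)` gives "re-propose for the next release", which matches the rule that a spec missing the M2 freeze has to wait for the following cycle.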
08:12:34 <bauzas> sorry for being late, got some issue with my laptop notification
08:12:42 <gibi> bauzas: no worries :)
08:13:03 <gibi> basically these are the important deadlines
08:13:20 <gibi> the exact dates are #link https://releases.openstack.org/xena/schedule.html
08:13:31 <gibi> is there any question about these?
08:14:42 <suzhengwei__> Got it.
08:15:12 <gibi> cool
08:15:47 <gibi> that was what I prepared for today.
08:15:53 <bauzas> next open office hour would be around the spec freeze
08:15:53 <gibi> Is there any topic you want to discuss?
08:16:01 <bauzas> so, I have a concern
08:16:02 <gibi> bauzas: good point
08:16:20 <gibi> bauzas: tell us
08:16:29 <bauzas> how can we help contributors that are not in the nova meeting for their own specs ?
08:16:41 <bauzas> how can we discuss ?
08:17:08 <gibi> bauzas: what problem do you see that we need to discuss in a meeting? stuck reviews?
08:17:19 <bauzas> and if we want to have priorities between some reviews, how could we know which ones ?
08:18:05 <bauzas> gibi: my point is that sometimes it's nice to discuss directly in IRC when you have some spec questions
08:18:10 <gibi> sure
08:18:19 <gibi> but it does not need to be in the form of a meeting
08:18:28 <bauzas> yup
08:18:50 <bauzas> so, I wonder how we could help the contributors that aren't around in general
08:18:53 <gibi> still if there are questions now, I'm happy to hear them even during this meeting
08:19:22 <gibi> bauzas: I'm OK to do discussion primarily in the spec review
08:19:52 <gibi> and if we hit a wall there then we can try to find a common time between the spec author and the reviewers to resolve
08:19:57 <gibi> the block
08:20:01 <bauzas> yup
08:20:16 <gibi> suzhengwei, XinxinShen do you have a view on this?
08:21:22 <gibi> do you feel it is hard to discuss issues about the specs?
08:23:21 <bauzas> in general, it takes more than one week
08:23:37 <bauzas> for example, when I review a spec, I provide some comment
08:23:50 <bauzas> but then the reply could be on the next day
08:24:00 <bauzas> and then I'd only see it honestly by the end of the week
08:24:23 <suzhengwei> Is this channel often open to all? I mean, if I want to get help, would someone react in time?
08:24:41 <bauzas> suzhengwei: yes, in general, we do it this way
08:24:50 <suzhengwei> good
08:24:58 <bauzas> suzhengwei: for example, when I'm reviewing some gibi's spec, I'm pinging him
08:25:04 <gibi> suzhengwei: it depends on the time zone. like I tend to be here between UTC 7:00 - UTC 17:00 on workdays
08:25:08 <bauzas> telling him that I had some questions
08:25:25 <bauzas> so, sometimes we directly discuss on IRC for a spec
08:25:31 <suzhengwei> cool
08:25:35 <bauzas> but like gibi said, we're both on the same TZ
08:25:45 <bauzas> so it's simple
08:26:09 <gibi> suzhengwei: I keep my client up during the night so I will see pings next day
08:26:14 <bauzas> me too
08:28:37 <XinxinShen> cool, We should ensure that the spec author and reviewer can communicate in time.
08:28:46 <gibi> suzhengwei: if you ping us, one thing to help is not just pinging but stating your problem right away
08:29:50 <suzhengwei> yes
08:30:30 <suzhengwei> I am doing some work about instance HA, and interested in the topic "Support vm evacuation while server status is suspended, paused".
08:31:00 <suzhengwei> I wonder if some one has processed it.
08:31:27 <bauzas> I still have open specs to look at
08:31:49 <suzhengwei> If not, I would like to do it.
08:31:57 <gibi> suzhengwei: interesting ideas
08:32:34 <gibi> suzhengwei: I think we don't support these today, but with some compromise we could
08:33:18 <gibi> we will lose the in-memory state of the paused instance but we could still rebuild it on another compute host
08:33:48 <gibi> and if the source compute is dead already then the in memory state is lost anyhow
08:34:13 <suzhengwei> host failure triggers evacuation. And active instances lose their in-memory state too.
08:34:32 <gibi> suzhengwei: yes, so I think it is OK to lose that for the paused instance too
08:34:41 <gibi> suzhengwei, bauzas: does suspend save some state to the disk?
08:34:49 <bauzas> good question
08:35:07 <bauzas> I honestly don't have the answer straight out of my mind
08:35:22 <gibi> anyhow if it saves something to disk and that disk is on shared storage then we might even recover that saved state on the destination host
08:35:30 <bauzas> the crucial bit to remember with instance HA is that the host is already gone
08:35:32 <gibi> but as a first step I would lose that too
08:35:47 <gibi> bauzas: yepp
08:35:56 <bauzas> so, yeah, ephemeral storage can't be somehow persisted
08:36:17 <gibi> except if it is on shared storage ^^ ;)
08:36:19 <bauzas> you need to have either shared storage or volumes
08:36:23 <gibi> yepp
08:36:38 <bauzas> yeah, but in general, you need to assume a crash
08:36:52 <bauzas> so any memory that's not synced is lost
08:36:58 <gibi> bauzas: yeah, this is why first I would assume that the suspended state is lost as well
08:37:16 <gibi> to avoid an inconsistent suspended state to be loaded
08:37:18 <bauzas> if suspend stores on disk, we're ok
08:37:32 <bauzas> kashyap: around ?
08:37:40 <kashyap> bauzas: Mornin, yes
08:37:53 <kashyap> How can I be useful? :)
08:37:57 <bauzas> kashyap: we are in office hour and we have a question about suspended instances
08:38:19 * kashyap reads back
08:38:25 <bauzas> with the libvirt driver and qemu, what happens to the memory state when suspending ?
08:38:32 <bauzas> do we suspend on disk ?
08:38:41 <bauzas> I'd be inclined to say so
08:38:52 <kashyap> bauzas: gibi: Yes: suspend usually indeed means save-the-state-to-a-file-on-disk
08:39:02 <kashyap> Your inclination is correct :)
08:39:33 <bauzas> the problem is that we can't tell whether the instance is on shared storage or not
08:39:43 <gibi> kashyap: and what do you think, does moving such suspended state between compute hosts make sense?
08:40:25 <bauzas> gibi: keep in mind evacuate is a rebuild
08:40:26 <kashyap> gibi: You mean, does moving such suspended state between different compute hosts make sense?
08:40:40 <gibi> kashyap: yeah, that is my question
08:41:05 <gibi> bauzas: in case of suspend we see the vm_state on the dest being suspended so we can look for the state file on the disk. if it is there then we know that it was on shared storage
08:41:27 <bauzas> gibi: sure but then we leak the state of the host
08:41:40 <bauzas> this isn't predictable
08:42:28 <gibi> OK, I agree this can be a can of worms
08:42:54 <gibi> suzhengwei: in case of evacuating a suspended VM, is it OK for you to lose the suspended state?
08:44:03 <kashyap> gibi: I need to think a bit more about it.  (libvirt has managedSave() API that does the suspend thingie, which already Nova uses.  So we have the primitives...)
08:44:06 <suzhengwei> If the host is down, active and suspended instances both lose their memory.
08:44:53 <kashyap> gibi: suzhengwei: What is the main use-case here?  The ability to start suspended instances on any compute host from a given pool?
08:44:59 <gibi> suzhengwei: if you don't want to recover the suspended state that is saved to disk, then I think your proposal is pretty simple and straight forward
08:45:08 <suzhengwei> Instance HA: try our best to recover as much of the workload as possible.
08:45:17 <bauzas> interestingly, I found some nova admin docs https://docs.openstack.org/nova/latest/admin/node-down.html
08:45:38 <gibi> kashyap: we looked at it from evacuation perspective. VM is suspended to disk (on shared storage), the host dies, user evacuates VM
08:46:04 <kashyap> gibi: I see; that makes sense
08:46:51 <bauzas> gibi: I honestly feel we can just support recreating a new instance
08:47:07 <kashyap> bauzas: Isn't that what already 'rebuild' is?
08:47:16 <kashyap> Ah, you said that already above :)
08:47:18 <bauzas> kashyap: yup, the question was about the memory state
08:47:51 <suzhengwei> If the host is down, the suspended instance cannot become active again on the origin node. So I think it makes sense to evacuate suspended instances.
08:48:06 <bauzas> I guess here suzhengwei's concern is that we limit evacuate to active instances
08:48:14 <bauzas> right?
08:48:22 <bauzas> that's the problem we're trying to solve ?
08:48:35 <gibi> I think so
08:48:39 <suzhengwei> yes
08:49:09 <gibi> and I'm totally supportive of extending evac to support paused and suspended instances. It is simple if we allow losing the running state
08:49:11 <bauzas> I just remembered we have a --on-shared-storage flag https://docs.openstack.org/nova/latest/admin/evacuate.html#evacuate-a-single-instance
08:49:36 <bauzas> since evacuate is an admin action, op can use it
08:49:39 <bauzas> on purpose
08:50:51 <bauzas> so we already do the check automatically
08:50:52 <gibi> bauzas: onSharedStorage is deprecated in 2.13
08:51:00 <gibi> bauzas: today we automatically detect it I guess
08:51:01 <bauzas> gibi: because we detect this ?
08:51:04 <bauzas> yeah
08:51:09 <gibi> "Starting since version 2.14, Nova automatically detects whether the server is on shared storage or not. Therefore this parameter was removed."
08:51:12 <gibi> yepp
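The microversion behaviour quoted above can be illustrated with a small request-body builder. This is only a sketch: `build_evacuate_body` and `parse_microversion` are hypothetical helper names, not part of any nova client, and treating `onSharedStorage` as mandatory below 2.14 is an assumption made for the example. The only fact taken from the discussion is that from 2.14 on the parameter is gone because nova detects shared storage itself.

```python
from typing import Any, Dict, Optional, Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """Turn a string such as '2.14' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def build_evacuate_body(
    microversion: str,
    host: Optional[str] = None,
    on_shared_storage: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build the body of an evacuate server action (hypothetical helper)."""
    evacuate: Dict[str, Any] = {}
    if host is not None:
        evacuate["host"] = host

    if parse_microversion(microversion) < (2, 14):
        # Assumption: before 2.14 the caller has to state explicitly
        # whether the instance disks live on shared storage.
        if on_shared_storage is None:
            raise ValueError("onSharedStorage is needed before 2.14")
        evacuate["onSharedStorage"] = on_shared_storage
    elif on_shared_storage is not None:
        # From 2.14 on nova detects shared storage automatically,
        # so the parameter no longer exists.
        raise ValueError("onSharedStorage was removed in 2.14")

    return {"evacuate": evacuate}
```

Comparing tuples rather than strings matters here: as strings "2.9" sorts after "2.14", while `(2, 9) < (2, 14)` gives the intended ordering.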
08:51:20 <bauzas> ok, so I guess we can consider adding suspend
08:51:31 <gibi> suzhengwei: I suggest to propose a small spec about this. I'm happy to review it
08:51:44 <bauzas> if the target host is on shared storage, we could just try to boot with the suspended state
08:52:02 <bauzas> for paused, the implication would be that the evacuated instance would become active
08:52:15 <bauzas> for suspend, too
08:52:15 <gibi> bauzas: active, or stopped
08:52:20 <gibi> bauzas: we can decide
08:52:27 <bauzas> yup, that's the point
08:52:31 <gibi> bauzas: but true, it cannot be paused any more
08:53:05 <gibi> I don't want to stop the discussion, but we have 8 minutes left. Is there any other topic to discuss?
08:53:24 <kashyap> bauzas: gibi: One last:
08:53:28 <gibi> kashyap: go
08:53:30 <suzhengwei> I think stopped is better. No matter whether paused or suspended, users cannot access the instance directly.
08:53:43 <gibi> suzhengwei: I can accept that
08:54:07 <kashyap> gibi: suzhengwei: On whether it makes sense to move suspended instances between compute hosts, a rule of thumb can be: "follow the same rules for hardware matching as for a live migration between the hosts"
08:54:19 <kashyap> (I mean, to uncover any "gotchas")
08:54:47 <gibi> kashyap: ahh you have a point, this state can be hw dependent
08:54:49 <kashyap> FWIW, I also just checked the above w/ a QEMU migration developer; and he agrees.
08:54:49 <bauzas> kashyap: since evacuate is a rebuild, we can't predict this
08:55:25 <bauzas> gibi: sorry, I wasn't explicit but when I said "we're gonna try to unsuspend from disk", I was thinking of hardware capabilities
08:55:28 <gibi> OK, then I propose not to try to recover the suspended state during evac. At least not in the first step
08:55:40 <kashyap> bauzas: gibi: Hm, so looks like this needs to be fleshed out in a design document
08:56:06 <bauzas> kashyap: the evacuate workflow is waaaaay different from live-migrate
08:56:16 <bauzas> you can't just check the source host at first ;)
08:56:27 <bauzas> and compare both
08:56:45 <bauzas> the scheduler is just giving you a target and then good luck with it
08:56:57 <kashyap> bauzas: I see; fair enough
08:57:03 <gibi> so in summary
08:57:04 <bauzas> but yeah, we're 4 mins
08:57:07 <bauzas> left
08:57:07 <gibi> so in summary
08:57:49 <gibi> suzhengwei: please propose a spec. I don't see any problem supporting evac for paused and suspended VMs. But they will lose the in memory or suspended state. They will be fresh VMs on the dest host in stopped state
08:58:18 <kashyap> Yeah; makes sense.
08:58:24 <suzhengwei> I will.
08:58:26 <kashyap> (On spec)
08:58:32 <gibi> suzhengwei: cool, thanks
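The outcome agreed in the summary above can be written down as a small table. This is a sketch of the proposal only, not of merged nova behaviour: the state names mirror nova's `vm_state` strings, while `PROPOSED_EVACUATION_RESULT` and both function names are hypothetical. States that evacuate already supports are deliberately left out, since the meeting did not discuss changing them.

```python
from typing import Dict

# Proposal from the meeting: paused and suspended instances may be
# evacuated, and they come up on the destination host as stopped.
PROPOSED_EVACUATION_RESULT: Dict[str, str] = {
    "paused": "stopped",
    "suspended": "stopped",
}


def proposed_state_after_evacuation(vm_state: str) -> str:
    """Return the vm_state the proposal assigns after evacuation."""
    try:
        return PROPOSED_EVACUATION_RESULT[vm_state]
    except KeyError:
        raise ValueError(f"{vm_state!r} is not covered by this proposal")


def saved_state_is_recovered(vm_state: str) -> bool:
    """First step of the proposal: neither the in-memory state of a paused
    instance nor the on-disk state of a suspended one is restored."""
    return False
```

Keeping the saved state out of scope sidesteps the hardware-matching concern raised by kashyap: the scheduler picks the target without comparing it to the dead source host, so a suspended image could be loaded on incompatible hardware.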
08:58:48 <gibi> any last words before we stop the meeting? ;)
08:59:03 <suzhengwei> nothing from me.
08:59:07 <gibi> XinxinShen: ?
08:59:37 <XinxinShen> nothing for me. thanks.
08:59:56 <gibi> then thanks for joining. please continue discussion if needed
09:00:00 <gibi> I just stop the meeting log here
09:00:04 <gibi> #endmeeting