21:00:48 <russellb> #startmeeting nova
21:00:49 <openstack> Meeting started Thu Aug 8 21:00:48 2013 UTC and is due to finish in 60 minutes. The chair is russellb. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:52 <russellb> hello, everyone!
21:00:52 <openstack> The meeting name has been set to 'nova'
21:00:56 <comstud> o/
21:01:08 <russellb> #link https://wiki.openstack.org/wiki/Meetings/Nova
21:01:12 <dansmith> yo
21:01:14 <cyeoh> hi
21:01:15 <n0ano> o/
21:01:15 <mrodden> hi
21:01:16 <mriedem> hi
21:01:20 <alaski> hi
21:01:27 * joel-coffman waves hello
21:01:31 <jog0> o/
21:01:32 <russellb> #topic havana-3 status
21:01:55 <russellb> we are just under 2 weeks from the first havana-3 deadline, when feature patches must be submitted for review to be considered for havana
21:02:05 <russellb> #link https://launchpad.net/nova/+milestone/havana-3
21:02:14 <dansmith> cripes, two weeks?
21:02:22 <russellb> 98 blueprints
21:02:27 <vishy> o/
21:02:28 <russellb> less than 10 implemented
21:02:40 <russellb> so obviously some serious work to do to even get a big chunk of this merged
21:02:59 <russellb> we're going to hit a review bottleneck i'm sure
21:03:11 <russellb> so it's really important that we try to prioritize which features we're reviewing
21:03:39 <russellb> not just on age at this point, but also what we think is most important to get into havana
21:03:42 <russellb> so those are my general comments
21:03:53 <russellb> we can now talk about whichever blueprints people here are interested in discussing the status of
21:04:02 <russellb> i think vishy would like to talk about live-snapshots
21:04:06 <vishy> yes
21:04:10 <vishy> #link https://review.openstack.org/#/c/33698/
21:04:16 <russellb> #link https://blueprints.launchpad.net/nova/+spec/live-snapshot-vms
21:04:25 <vishy> so i'm getting quite a bit of push-back from libvirt folk
21:04:29 <vishy> on two points
21:04:47 <vishy> a) I'm using the libvirt api for snapshots in a way that wasn't intended
21:05:03 <vishy> and b) we shouldn't be loading memory state into a new vm because it might break
21:05:15 <vishy> I wanted to discuss these here
21:05:22 <dansmith> I don't understand b
21:05:30 <lifeless> that's interesting
21:05:31 <vishy> dansmith: see the comment from eric blake
21:05:41 <dansmith> vishy: yes, that comment makes no sense to me
21:05:46 <jog0> a) concerns me
21:05:46 <lifeless> see I had this idea the other day that we could snapshot memory state into a new sort of glance image
21:05:48 <dansmith> let me reread
21:05:54 <vishy> cloning memory state into a "new" vm means there is data in memory that is invalid
21:05:55 <lifeless> and deploy that at an arbitrary later date
21:06:09 <vishy> specifically the mac address and other hardware specific things
21:06:27 <vishy> I think this is an acceptable tradeoff for live booting vms
21:06:31 <lifeless> oh it's exactly what vishy is doing - cool
21:06:37 <vishy> but it will lead to hard to diagnose problems
21:06:38 <russellb> lifeless: :-)
21:07:01 <lifeless> vishy: where are eric's comments ?
21:07:18 <russellb> lifeless: on patch set 11
21:07:20 <russellb> lifeless: https://review.openstack.org/#/c/33698/11
21:07:25 <vishy> not inline
21:07:27 <dansmith> yeah, eric's comments don't make sense to me (still)
21:07:35 <vishy> well one comment is inline and one is global
21:07:53 <vishy> I would like to treat live-cloning as experimental for sure
21:08:12 <vishy> but if we can ignore b) what do we do about a) ?
21:08:32 <vishy> do i have to rewrite it to use the qemu monitor directly and just deal with the libvirt tainted flag?
21:08:54 <vishy> fyi one of my main reasons for pushing this is I want it as an option in the api
21:08:56 <lifeless> how is this so different to live migration ?
21:09:04 <vishy> lifeless: it is a clone
21:09:24 <dansmith> vishy: in a former life, I used live migration in xen to do this
21:09:25 <vishy> … so that other companies can come in and make better versions
21:09:42 <vishy> for example, grid centric has a pretty sexy version of this already
21:09:53 <russellb> vishy: why can't they contribute it? :-)
21:10:05 <vishy> russellb: because it is proprietary code
21:10:31 <russellb> i don't have much to add. I generally defer to the libvirt folks on point a) and generally think we should avoid doing something that freaks them out if we can't convince them it's ok
21:10:43 <lifeless> vishy: so then eric's comments translate as 'live migration can't possibly work, but does because it doesn't change the MAC address'
21:10:49 <jog0> vishy: why is this worth pushing for Havana?
21:10:50 <vishy> so basically i want to get the framework in so others can write a better version
21:10:51 <lifeless> (as the MAC is the only in-instance visible thing).
21:11:03 <lifeless> I have a suggestion that might mitigate his concerns
21:11:09 <vishy> jog0 only because i've been trying to get it in for two months :)
21:11:10 <lifeless> hot-unplug the vnic.
21:11:13 <lifeless> save state.
21:11:16 <lifeless> and on boot
21:11:16 <jog0> vishy: fair enough
21:11:17 <russellb> vishy: to be honest, i'm not terribly interested in a framework for other implementations that aren't coming upstream
21:11:19 <lifeless> boot without a mac
21:11:24 <lifeless> then hot-plug it in.
21:11:42 <vishy> lifeless: that could work, unplug the nic before save
21:11:43 <lifeless> tada, as long as vcpu has the same flags and arch etc, there should be no in-instance twiddling to do at all.
21:11:48 <dansmith> lifeless: I don't think that is his concern, right?
21:11:49 <lifeless> I will put this in a review comment.
21:11:52 <dansmith> he does a replug after creating it
21:12:05 <dansmith> I think eric is concerned about something else, but it doesn't make sense to me
21:12:06 <yjiang5> lifeless: you have to make sure there is no window between hot-remove and save, otherwise applications may fail
21:12:08 <vishy> it doesn't really address point a)
21:12:16 <lifeless> dansmith: his concern is about having to update in-RAM values.
21:12:25 <vishy> yjiang5: ah yes good point, I did think of this before
21:12:40 <dansmith> lifeless: you think that's what he means by "address" ?
21:12:51 <lifeless> dansmith: I think he means memory address.
21:12:55 <lifeless> dansmith: in the guest RAM space
21:13:00 <vishy> yjiang5: however i am hot unplug/replug on clone anyway so it should be roughly the same
21:13:08 <dansmith> lifeless: right, what does that have to do with anything?
21:13:29 <vishy> lifeless: meaning the mac address of the old vm could be in a bunch of places in memory
21:13:38 <vishy> which will break things
21:13:42 <lifeless> dansmith: ^
21:13:45 <dansmith> that's solved by the replug
21:13:48 <vishy> aye dansmith ^
21:13:49 <lifeless> dansmith: right
21:14:02 <lifeless> dansmith: but you have to unplug it first to avoid potential crazy
21:14:09 <dansmith> his words led me to think he's concerned about something else
21:14:17 <jog0> is it worth discussing the details here or just the high level issues vishy listed (a and b)?
21:14:27 <russellb> let me see if i can get him to join us ...
21:14:29 <lifeless> dansmith: if you unplug *after* the snapshot, you're changing the hardware MAC of a virtIO device without warning
21:14:29 <vishy> maybe if it doesn't actually change the mac address in the xml and just does uuid/name then the libvirt folk will be better with it
21:14:38 <lifeless> dansmith: it's easy to imagine that leading to subtle corruption bugs
21:15:11 <dansmith> lifeless: well, I dunno, you can unplug a network device in linux without notice, it's not like a block device
21:15:15 <vishy> that would also potentially allow for multiple nics to be attached as well
21:15:33 <vishy> dansmith: right but that is trusting the os to be smart about these things
21:15:40 <vishy> i suspect windows is not nearly as forgiving
21:15:47 <dansmith> um, who cares? :)
21:15:47 <yjiang5> vishy: in live snapshot, will the VM pause a while to get the last round RAM state?
21:16:06 <n0ano> are we sure that the MAC address is the only data in RAM to worry about, what about things like file system UUIDs
21:16:07 <vishy> it pauses while snapshotting yes
21:16:33 <vishy> there are probably other things but this is all related to the idea of live clones in general
21:16:37 <mrodden> what about hardware passthrough devices and such...
21:16:41 <lifeless> dansmith: right, I agree you can hot unplug
21:16:56 <dansmith> mrodden: out of scope for something like this, I'd say
21:17:03 <lifeless> dansmith: but we're not hot unplugging, we're just rewriting details in the virtio device, *then* hot replugging.
21:17:04 <dansmith> mrodden: hardware passthrough pins a lot of things
21:17:08 <mrodden> right
21:17:08 <vishy> there will need to be a smart guest agent to handle some of the edge cases
21:17:13 <mrodden> it straight up wouldn't work
21:17:31 <vishy> that's why i want to get it in for people to experiment with
21:17:31 <mrodden> but i guess if it's not a concern
21:17:41 <lifeless> dansmith: I'm saying we need to switch the order to address his concern about changing things out under the guest without the normal hw notifications
21:17:43 <dansmith> lifeless: but we all agree we should be hot unplugging before snapshot, right?
21:17:52 <vishy> right
21:18:00 <lifeless> dansmith: I think we do; modulo yjiang5's point about apps
21:18:10 <lifeless> dansmith: which I think is a higher layer problem - the IP will change as well
21:18:17 <dansmith> right, that has to be understood
21:18:19 <dansmith> the thing is,
21:18:23 <dansmith> his comments are so emphatic,
21:18:35 <mrodden> unless qemu does something weird with the memory mapping in the state file, i can't think of why it wouldn't work
21:18:46 <dansmith> I assumed that changing the order was not going to assuage his fears, so I assumed he was worried about something else
21:19:02 <vishy> ok so I will rewrite the hotplugging before and minimize the xml changes, add some notes that i would prefer to do it through a libvirt api but the xml rewrite is the only thing supported currently
21:19:11 <dansmith> vishy: personally,
21:19:14 <vishy> and see what daniel et al. think about that
21:19:22 <dansmith> I'd like to see you do a lot more bounds checking and stuff on the xml payload,
21:19:23 <yjiang5> vishy: I think n0ano is correct that we may have a lot of guest-specific status. if the guest has a connection to a server, then that connection may be cloned as well and confuse the server.
21:19:43 <dansmith> because changing offsets in a new version of that format would cause you to really badly break the image
21:19:48 <vishy> yes clearly the production version of this needs a guest agent
21:19:54 <dansmith> and I think you can do some easy sanity stuff
21:20:03 <vishy> gridcentric has offered to create an opensource guest agent to do this stuff
21:20:07 <vishy> since they already have one
21:20:47 <russellb> should we be waiting on something we consider production ready?
21:21:23 <vishy> russellb: I don't think we are going to get there unless there is an experimental version for people to start trying
21:22:22 <russellb> ok
21:22:49 <russellb> vishy: so sounds like you have a plan for an update to make?
21:23:43 <vishy> yep i will make the update and see how it goes
21:23:46 <vishy> thx
21:23:47 <russellb> alright
21:23:48 <russellb> sure
21:23:54 <russellb> next blueprint for discussion?
21:23:58 <dansmith> objects
21:24:08 <russellb> dansmith: alrighty, have at it
21:24:09 <dansmith> I think compute-api-objects is still in striking distance
21:24:20 <russellb> #link https://blueprints.launchpad.net/nova/+spec/compute-api-objects
21:24:31 <dansmith> it's going to be a lot of work to get it done in two weeks, especially since comstud refuses to step up, but I think it's doable regardless
21:24:35 <dansmith> (kidding about comstud of course)
21:24:52 <comstud> lol
21:25:01 <jog0> dansmith: why does this need to hit the havana deadline?
21:25:19 <comstud> it sucks having 1/2 of it done
21:25:45 <russellb> and that has an effect on the pain of maintaining havana
21:25:49 <jog0> comstud: true, but there is an impact on other blueprints
21:26:06 <comstud> by done i mean merged already
21:26:07 <jog0> russellb: good point, as in backports would be much harder if half done?
21:26:27 <comstud> we already have a lot of transitional code that is not ideal
21:26:30 <russellb> i think so, we'd be left maintaining a fairly inconsistent interface
21:26:41 <jog0> russellb: you convinced me
21:26:43 <russellb> k :)
21:26:56 <yjiang5> comstud: does this really change the interface a lot ?
21:27:10 <russellb> (and setting deadlines for ourselves is good to help ensure we keep up the pace) ;-)
21:27:51 <comstud> yjiang5: I'm not sure what interface you're specifically referring to
21:28:04 <yjiang5> comstud: the interface russellb is talking about :-)
21:28:26 <russellb> compute api?
21:28:32 <comstud> it's generally way different how you work with objects or sqlalchemy models/dicts
21:28:40 <comstud> right
21:28:44 <comstud> some compute api methods expect objects now
21:28:46 <comstud> others cannot take objects
21:29:07 <comstud> it'd be nice to get that all consistent :)
21:29:16 <yjiang5> comstud: got it and agree.
21:29:36 <dansmith> um, sorry
21:30:01 <russellb> dani4571: didn't miss *too* much, just clarifying what's different in compute api with objects
21:30:03 <comstud> dansmith: it's cool, we assigned 10 more things to you
21:30:07 <russellb> anything else you want to cover?
21:30:13 <dansmith> nope
21:30:19 <dansmith> comstud: bring it on!
21:30:19 <russellb> ok, next blueprint?
21:30:43 <comstud> alaski ?
21:30:45 <n0ano> garyk wanted me to bring up instance groups
21:31:05 <russellb> n0ano: ok
21:31:08 <russellb> #link https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
21:31:14 <russellb> n0ano: what's up with that one
21:31:29 <n0ano> he's looking for reviewers
21:31:38 <alaski> comstud: ack. I'll go next
21:31:45 <russellb> n0ano: did it get converted to objects?
21:31:55 <n0ano> the patch was ready and then he had to change it to use objects and now it's ready again
21:31:58 <russellb> looks like it did
21:32:00 <n0ano> yes, I believe he did
21:32:19 <russellb> ok
21:32:23 <russellb> anything else?
21:32:40 <yjiang5> although PCI passthrough is low now, I still want to lobby more cores to review our patch. It's really helpful for virtualization, and some customers want it.
21:33:04 <jog0> shameless plug for: https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler
21:33:13 <russellb> heh, just review lobbying?
21:33:16 <russellb> alaski: you're up
21:33:25 <alaski> cool
21:33:27 <alaski> https://blueprints.launchpad.net/nova/+spec/query-scheduler
21:33:31 <jog0> russellb: yeah and feedback of course
21:34:13 <alaski> so the tl;dr is that rather than casting to scheduler to compute, and then back to scheduler on a failure, we cast to conductor which gets a list of hosts from scheduler and calls to compute
21:34:22 <yjiang5> jog0: I think the question is to me :) yes, just review lobby and feedback of course.
21:34:38 <alaski> except that calling to compute doesn't work so well because building an instance can take some time
21:35:07 <alaski> so now I'm looking at conductor casting to compute and compute casting back to conductor to report success/failure
21:35:18 <russellb> alaski: that would at least match how it works now
21:35:31 <russellb> alaski: seems like anything else is going to be a lot more work (and more risk)
21:35:31 <alaski> right. But leaves the scheduler in a better place
21:35:55 <russellb> yeah
21:35:58 <russellb> seems ok to me
21:36:23 <russellb> and perhaps worth looking at again next cycle to consider having conductor do more active monitoring of the progress
21:36:57 <russellb> but what you said sounds good to me for now
21:37:08 <alaski> agreed. I'm open to ideas on ways to not have compute need to cast/call back to conductor
21:38:04 <russellb> ok, good on that one?
21:38:13 <alaski> yep, thanks
21:38:16 <russellb> cool
21:38:21 <russellb> any other blueprints to discuss?
21:38:40 <itzikb> russellb: Can I ask for a review ?
21:38:49 <russellb> itzikb: sure why not
21:38:52 <joel-coffman> I'll second that request
21:39:05 <itzikb> https://review.openstack.org/#/c/35189/
21:39:08 <russellb> i assume anything marked as "needs code review", someone would like a review on it
21:39:19 <itzikb> :-)
21:39:43 <russellb> comstud: direct mysql? still want it on the list?
21:39:52 <comstud> defer
21:39:55 <russellb> ack
21:39:56 <jog0> comstud: :(
21:39:59 <dansmith> heh
21:40:03 <comstud> haha
21:40:18 <comstud> jog0: "Why does it need to be in Havana?" ;)
21:40:24 <russellb> comstud: ha
21:40:27 <jog0> comstud: performance
21:40:36 <comstud> I know and I tend to agree
21:40:47 <comstud> unless a miracle happens, I don't see it making it
21:40:53 <comstud> because of trying to get these other things done
21:41:09 <russellb> reality is a pain sometimes
21:41:13 <jog0> comstud: yeah ... it sure is
21:41:43 <comstud> I need to put my current code back up on github at least
21:41:51 <jog0> russellb: what about all the BPs that aren't started?
21:41:53 <comstud> or well, update it
21:42:15 <russellb> jog0: all hyper-v yes? they say they're delivering all of them ...
21:42:17 * russellb shrugs
21:42:34 <jog0> shouldn't they at least mark them as started
21:42:43 <russellb> jog0: you'd think so
21:43:06 <russellb> maybe i'll defer them next week
21:43:12 <jog0> ack
21:43:18 <russellb> note that we're basically aiming to merge *everything* in high/medium
21:43:34 <russellb> but beyond that it's no real commitment
21:43:59 <russellb> alright, let's check with subteams real quick, and then open discussion
21:44:02 <russellb> #topic sub-team reports
21:44:08 <russellb> anyone around to provide a subteam report?
21:44:39 <hartsocks> \0
21:44:45 <russellb> hartsocks: hey
21:44:48 <hartsocks> jo
21:44:55 <russellb> so hartsocks does a nice job sending out regular subteam status emails :-)
21:45:05 <hartsocks> So I'm building a list of the most important blueprints to our subteam.
21:45:20 <hartsocks> I'll mail the list with that tonight after I collect the stats.
21:45:23 <russellb> ok
21:45:41 <hartsocks> Yep.
21:45:48 <russellb> anything else?
21:45:57 <hartsocks> I've got a few reviews I'm shilling for...
21:46:05 <hartsocks> I'll save that for the list.
21:46:09 <russellb> ok
21:46:12 <russellb> any other subteams?
21:46:52 <russellb> alrighty then
21:46:55 <russellb> #topic open discussion
21:47:20 <russellb> any final topics?
21:47:22 <jog0> if no one else has anything, got a question about the future of conductor
21:47:31 <russellb> jog0: sure, we can chat about that
21:47:59 <russellb> i don't know that there's a wiki page i can point to ...
21:48:06 <jog0> so the question is, I have been ignoring the conductor planning until now, and my question is why make conductor handle long term tasks etc
21:48:17 <russellb> the related blueprints for havana are unified-migrations (and its child blueprints for cold and live migrations)
21:48:20 <russellb> and query-scheduler
21:48:28 <jog0> I noticed scheduling does a conductor hop
21:48:39 <russellb> yeah, so, there's multiple angles to this
21:48:53 * russellb tries to think of where to start
21:49:06 <jog0> so migrations makes a lot of sense to me, query-scheduling just makes things nicer but
21:49:08 <russellb> with migrations, one issue is that it's not great for security to have compute nodes telling each other what to do
21:49:11 <alaski> jog0: I think it makes the most sense when you consider something like migrate. it needs to communicate with the source and destination
21:49:30 <russellb> another thing that applies to migrations and the instance build, is just status tracking of long running tasks
21:49:32 <jog0> alaski: yes things like migrations make sense to me.
21:49:48 <jog0> russellb: this sounds like a change that needs well documented reasoning
21:50:02 <russellb> we want to be able to eventually have graceful handling of these tasks, even if a service has to restart, for example
21:50:07 <jog0> can you expand on what status tracking for long running tasks actually means
21:50:16 <russellb> and having it anchored in one spot makes it much more achievable to do that
21:50:32 <jog0> russellb: one spot in code doesn't have to be one service though
21:50:52 <russellb> so you think hopping through the scheduler is better?
21:51:10 <russellb> i won't argue that what we've done / are doing is the best possible way, most things aren't :-)
21:51:15 <russellb> but is it an iterative improvement?
21:51:19 <jog0> not any worse.
21:51:24 <russellb> i think it is, but if it isn't, we need to reconsider
21:51:31 <jog0> russellb: without knowing what the goal clearly is it's hard to say
21:51:57 <russellb> so is your concern specifically with the scheduler related build_instance bit you were looking at?
21:52:03 <jog0> I am fairly sure this won't make things worse in any way, my question is more of is this a good use of our time. (I am playing devil's advocate here)
21:52:36 <jog0> russellb: my concern is we are going down a road without a clear idea of what we are trying to do
21:52:59 <alaski> well, we're taking a very short step right now
21:53:00 <jog0> unifying task logic makes sense to me, but there are other ways to do that I think
21:53:10 <alaski> and where we go after that is open for discussion
21:53:21 <jog0> alaski: right, but doesn't it make sense to have the discussion before?
21:53:25 <russellb> making the scheduler interface much simpler
21:53:26 <yjiang5> jog0: agree.
21:53:30 <comstud> we discussed this at the summit
21:53:43 <comstud> unless I just dreamed it
21:53:47 <jog0> note: I am not saying i don't like this, just saying I want to see a detailed answer why I am completely wrong
21:53:47 <russellb> yep, and should probably have another chat about next steps
21:53:47 <alaski> yeah, there was discussion
21:54:07 <russellb> jog0: well ... i think the burden is on you to say what's bad
21:54:13 <russellb> not that you're owed an explanation of why it's good
21:54:15 <jog0> comstud: IMHO at this point in OpenStack's life a discussion at the summit without a doc summing it up isn't enough
21:54:23 <russellb> just a general opinion on things
21:54:36 <comstud> jog0: I'm sure there's an etherpad with notes... and then there's the BP.
21:54:37 <jog0> russellb: because not everyone wants to run conductors
21:54:46 <russellb> jog0: ok, well why not?
21:54:51 <comstud> conductor is not necessary here
21:54:52 <russellb> we've had many ML threads on this too
21:55:01 <comstud> it just runs it local if you don't want it
21:55:15 <russellb> puts all this in nova-api instead basically if you run in that mode
21:55:16 <alaski> yes, local mode works fine. it spawns a greenthread in that case
21:55:17 <comstud> 'local' tends to mean in nova-api for these things.
21:55:42 <jog0> russellb: right, but the next question is will this actually make task logic easier to understand?
21:55:52 <russellb> i think it will/does, yes
21:55:57 <alaski> yes, it does
21:56:02 <russellb> following the process is going to be *much* easier this way
21:56:23 <dansmith> agreed
21:56:41 <jog0> russellb: because ...? (once again devil's advocate here)
21:57:56 <alaski> because it's centralized to conductor. and because it's not mixed in with filtering and weighting concerns in the scheduler
21:58:03 <russellb> yeah..
21:58:24 <russellb> jog0: can you come back with some specific objections? perhaps it'd be easier to address those?
21:58:26 <jog0> and why not something like taskflow? and how do we handle when a conductor goes down
21:58:34 <jog0> russellb: I don't have any objections really
21:58:37 <russellb> ok
21:58:43 <russellb> taskflow is somewhat related here
21:58:45 <jog0> my objection is the lack of a clear document saying why this is good
21:58:49 <russellb> the issue with that is that we didn't have a place to really utilize it
21:58:50 <alaski> this is moving towards being able to look at taskflow
21:58:56 <russellb> what alaski said
21:59:05 <dansmith> we're out of time
21:59:12 <russellb> these were steps 1-10 and step 20 was (maybe) taskflow
21:59:13 <russellb> in general
21:59:15 <russellb> ok, time!
21:59:17 <russellb> thanks everyone!
21:59:28 <jog0> thanks
21:59:29 <russellb> #endmeeting