21:01:42 #startmeeting nova
21:01:43 Meeting started Thu Nov 29 21:01:42 2012 UTC. The chair is russellb. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:47 yo
21:01:47 The meeting name has been set to 'nova'
21:01:47 * Vek yawns
21:01:52 * dansmith snorts
21:02:00 Here's the agenda: http://wiki.openstack.org/Meetings/Nova
21:02:02 o/
21:02:04 everyone here and awake?!
21:02:11 o/
21:02:17 yep
21:02:41 cool, comstud, around?
21:02:59 or how about devananda ?
21:03:12 he and vishy_zz were road tripping, but it sounded like they'd be back on by now
21:03:24 k
21:03:34 * Vek is trying to grab comstud's attention
21:03:37 really? road tripping?
21:03:45 well we'll skip to bugs for now
21:03:47 #topic bugs
21:03:52 #link http://webnumbr.com/untouched-nova-bugs
21:04:11 not terrible, 25 not touched yet
21:04:24 if everyone could triage a couple that'd help get us back down really low, so please do
21:04:28 I did a couple last week I think
21:04:31 nice
21:04:43 i don't have mikal's nifty script so we can give out gold stars for triage
21:04:51 er, I mean I did 20
21:05:04 dansmith gets a gold star!
21:05:16 anyway, any other bugs we need to cover?
21:05:23 really important stuff that has popped up?
21:05:42 * russellb would know had he been doing more triage before 20 minutes ago :-/
21:05:50 russellb: I have a question for a bug that I still have to file
21:05:58 ok
21:06:06 it's related to porting the full nova stack to Windows
21:06:09 alexpilotti: and we can do your topic next
21:06:12 REJECTED
21:06:15 oh, sorry, go on
21:06:19 :D
21:06:23 lol
21:06:34 #topic hyper-v testing, windows support
21:06:38 simply put: in nova-api there's a large use of fork()s
21:06:44 go ahead, and then we can talk hyper-v unit tests next
21:06:58 so windows should add fork() ?
21:07:01 as you probably know fork() is not implemented on Windows
21:07:25 is it just for the multi-process API stuff? or is it always required?
21:07:27 I'm pretty close to having a working implementation of fork() but I still need some low level things sorted out
21:07:32 russellb: yep
21:07:56 alexpilotti, btw, that code is moving to oslo-incubator
21:07:58 the problem is that cpython doesn't really offer alternatives due to the lock
21:08:03 alexpilotti, so might be better fixing it there
21:08:13 interesting
21:08:27 russellb: you want to lead this?
21:08:30 anyway, the two ways out are: rewriting it to avoid locks
21:08:31 my connection is being flaky
21:08:31 https://blueprints.launchpad.net/oslo/+spec/service-infrastructure
21:08:34 vishy: sure
21:08:57 and the other is spawning a process instead of a fork
21:09:03 vishy: we hit bugs already, not much there, now talking windows support / hyper-v
21:09:05 woot, i'm here
21:09:07 what do you guys think?
21:09:28 alexpilotti, which lock is this?
21:09:30 alexpilotti: might be worth an openstack-dev thread so we can dive into details
21:09:44 yeah, agree
21:09:58 http://wiki.python.org/moin/GlobalInterpreterLock ;-)
21:10:38 #action alexpilotti to start a -dev list thread on dealing with usage of fork() in nova for Windows
21:10:48 so the other topic we wanted to cover was the hyper-v unit tests
21:10:48 russellb: ok
21:11:02 there seems to have been confusion around how they work, and what impact they have on development
21:11:27 so i wanted alexpilotti to give an overview of what it's doing, and what the plans are for improving it
21:11:35 sure
21:12:12 first, I still don't get what issue came up with the latest live migration tests fix
21:12:25 https://review.openstack.org/#/c/17090/
21:12:40 it's a 2-line fix, just adding a lambda and that's it
21:12:43 alexpilotti: there was no issue with the patch itself
21:12:55 alexpilotti: sdague just raised the issue there, which is part of the confusion
21:13:11 dansmith: no, that's the patch for the bug
21:13:20 https://bugs.launchpad.net/nova/+bug/1083018
21:13:21 Launchpad bug 1083018 in nova "HyperV compute "resume" tests need to be fixed" [High,Fix committed]
21:13:32 not the sdague comment
21:14:01 alexpilotti: I think the problem is the fix isn't obvious
21:14:03 so I think the stubs just scare people off and we've been assuming that we can't fix your unit tests, heh
21:14:32 anyway, the hyper-v tests have nothing to do with the driver's interface changes
21:14:48 I understand that
21:15:02 we also want to get rid of them
21:15:34 the reason we have them is that this was the only way to test the pre-Folsom code
21:15:44 also worth noting nova/tests/hyperv/README.rst (which I just now discovered)
21:16:46 yes, I added that readme to explain the architecture
21:16:48 so sdague did you have any other questions or issues with it you wanted to discuss?
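[A minimal sketch of the spawn-based alternative alexpilotti describes above — spawning a process instead of fork()ing. This is illustrative only, not from the meeting or from nova; the function names are invented.]

```python
# Sketch: spawning worker processes instead of calling os.fork().
# fork() is POSIX-only; the "spawn" start method launches a fresh
# interpreter and pickles arguments across, so it also works on Windows.
import multiprocessing


def square(n):
    """Trivial stand-in for per-worker service code."""
    return n * n


def run_workers():
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # The __main__ guard is required under spawn: child interpreters
    # re-import this module and must not re-enter run_workers().
    print(run_workers())
```

[The trade-off raised in the meeting applies here too: spawned children share no state with the parent, so anything the worker needs must be passed explicitly.]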
21:17:20 well, I don't feel as strongly as he does I think,
21:17:24 alexpilotti: ok, so you said you are getting rid of the tests, what's coming to replace them?
21:17:24 maybe a simple example in the readme involving adding a parameter to an interface
21:17:33 would help
21:17:52 and when? it's definitely a little weird to be constantly trying to grok reviews that have 50 changed .gz files in them :)
21:17:52 vishy: adding a parameter to the interface doesn't need any changes in the stubs
21:17:53 but I really would like to see a different implementation for those so we don't have 20 changed pickle files for small fixes
21:18:06 sdague: I agree on that
21:18:07 I think it was rmk that hit the actual issue
21:18:26 this happens when we change code that affects all the tests internally
21:18:45 it happened quite often lately due to the type of features that we are adding
21:18:48 so nobody else can fix actual driver code then?
21:18:58 alexpilotti: why did his change break the hyperv tests then?
21:19:14 vishy: I don't think that it did. that's the point
21:19:33 alexpilotti: it did, the unit tests wouldn't pass
21:19:36 vishy: if you look here: https://review.openstack.org/#/c/17090/1/nova/tests/test_hypervapi.py
21:20:12 you'll see that the only code that I changed is adding a couple of lambdas in the tests to handle that extra parameter in the tests
21:20:35 alexpilotti: are you saying that changing the actual driver code would have broken the stub matching?
21:20:36 in test_hypervapi.py and that's it
21:20:48 dansmith: that's what i'm getting ...
21:21:07 alexpilotti: meaning, if rmk had changed it to match the other drivers he was changing, would the stubs fail?
21:21:10 russellb: only when it affects the WMI calls
21:21:30 I don't understand why he would have broken the stubs with that change, is my point
21:21:49 so I wonder if it was a knee-jerk reaction, thinking he couldn't without updating the gz files
21:21:53 alexpilotti: this was the original patch: https://review.openstack.org/#/c/13251/
21:21:55 ok, so stubs only have to change when WMI calls change?
21:22:00 russellb: right
21:22:01 since rmk didn't change anything except the driver's interface, I don't see how that could be possible
21:22:08 alexpilotti: right but that isn't obvious
21:22:11 well that sounds better than what i thought
21:22:23 russellb: that's what *I* have been trying to get at :)
21:22:31 alexpilotti: I'm saying just add a section to the doc showing what needs to be done
21:22:35 so, here's the patch: https://review.openstack.org/#/c/13251/7/nova/virt/hyperv/driver.py
21:22:44 he changed only the line with the parameter
21:23:17 i honestly think this was a case of not everyone understanding what layer the stubs came in
21:23:18 alexpilotti: so you added a lambda to only pass the first param
21:23:20 vishy: there's nothing that needs to be done, as those patches are showing
21:23:27 vishy: correct
21:23:32 so I think the issue here is that the hyperv test layer is very different because it doesn't use the stubs model that other drivers use
21:23:42 and instead uses a different indirection mechanism
21:23:51 vishy: to pass the lambda to a utility method in the tests that expects a callable
21:24:06 alexpilotti, sdague: which means an example of that would be useful
21:24:12 which makes it harder for people to match a change in it, because it's different.
21:24:13 sdague: it also uses those stubs
21:24:27 so, anything in the works to change how this works?
21:24:29 sdague: I understand
21:24:37 russellb: it is
21:24:53 but just so we're clear,
21:25:07 this stub approach is still required at some layer to verify anything that would normally be calling WMI stuff
21:25:15 makes sense
21:25:19 right, that's actually what I'm most interested in, how do we make it better, and not so different
21:25:24 the improvement that could/should be made is so that not all of the testing depends on that being there
21:25:30 #note stubs only have to change if WMI interaction changes (driver internals)
21:25:35 because there is a huge mental cost in it being different in the tree
21:25:36 I discussed this in SD and we came up with the idea to replace the stubs with serialized json instead of pickled files as a first step
21:26:00 #action alexpilotti to add an example to the README of making a driver interface change and updating the tests to reflect it
21:26:18 russellb: I don't think that makes sense
21:26:26 russellb: but, if there's nothing to change, what example can I add? :-)
21:26:30 #undo
21:26:31 Removing item from minutes:
21:26:32 heh
21:26:38 russellb: there's nothing different about making that sort of change in hyperv vs. libvirt
21:26:47 correct
21:26:49 actually,
21:26:51 when you think about this,
21:26:57 they are testing *more* than the other drivers,
21:27:08 because they're actually simulating the _libvirt.so
21:27:19 it's just that they don't have a FakeHyperV to sit between there,
21:27:30 so that most stuff gets tested against FakeHyperV (like FakeLibvirt)
21:27:37 and then just use the stubs to test the interaction with WMI
21:28:03 yeah, ok, i at least understand what's happening in the tests better now
21:28:12 I've modified their driver interface several times,
21:28:22 with no ill effects, or hand-editing WMI pickled gz files :)
21:28:26 :)
21:28:37 this type of test has some great advantages
21:28:48 any other questions/actions/notes before we wrap this topic up? have a number of other things to hit
21:28:59 hmm
21:29:40 i'm not sure that we came to any sort of conclusion necessarily, but hopefully some folks understand it better (I do)
21:29:40 it appears a little different to me
21:29:42 adding the json files will let anybody edit the stubs manually
21:29:55 alexpilotti: that is a good thing, IMHO
21:29:57 usually with the other drivers you go in and add the extra param to the interface
21:30:10 not enough, because I think you still need a FakeHyperV, but better
21:30:39 alexpilotti: yeah, I agree with dansmith, we really need a FakeHyperV layer in the tree
21:30:49 vishy: that's what has been done here as well!
21:30:58 because I don't think it's obvious to anyone that adding lambdas would have been the fix :)
21:31:19 alexpilotti: fair enough, i think it is just the _test method that is confusing
21:31:26 sdague: adding the lambda was just a "stylish" way to solve this with two lines
21:31:35 alexpilotti: it was the wrong way, IMHO
21:31:40 the alternative would have been an "if" in the tests
21:31:46 alexpilotti: and I know that, because I don't understand what the right way is yet
21:31:48 but regardless,
21:32:12 this is really not indicative of the WMI snapshots keeping people from making interface changes
21:32:22 ok it isn't as confusing as i expected
21:32:23 vishy: the "_" was marking it as private
21:32:32 _test_vm_state_change doesn't accept extra parameters
21:32:43 vishy: the last parameter is a callable
21:32:59 that method is there because there are a gazillion tests that look the same:
21:33:09 suspend, resume, start, etc etc
21:33:11 gotcha
21:33:17 they all just pass the instance data
21:33:18 they all have the same signature
21:33:35 they all HAD the same signature :-)
21:33:41 :)
21:33:42 until resume got that extra param
21:33:58 so, CELLS, huh?
21:34:00 onward?
21:34:01 so, since the actual method to test was passed as a callable
21:34:05 comstud: still around?
21:34:07 yes
21:34:12 #topic cells
21:34:20 comstud: what's up
21:34:37 had some other things to take care of the last week... but I'm about done moving some stuff around with the main cells code
21:34:47 I'm hoping tomorrow is the day for updated reviews
21:34:52 cool, i'm going to make myself go heads down in that code once you update it ...
21:35:00 yep, appreciate it
21:35:04 should be less ugly
21:35:11 who else is going to review?
21:35:19 and i've noticed some things that weren't structured correctly with respect to pluggable communication that I meant to have
21:35:23 (rpc vs something else)
21:35:33 so that'll all be fixed and should hopefully be easier to understand
21:35:43 awesome
21:36:15 well i'll be watching for the updates ...
21:36:19 might be late tomorrow :) or Saturday.. depends on how much I participate in this meetup :)
21:36:20 so there was still the open discussion of the cells additions to nova-manage
21:36:30 ah right
21:36:58 my argument is that it seems silly to have to start an unconfigured service to configure it.
21:37:08 i wouldn't want to start something that's unconfigured
21:37:29 s/nova-manage/nova-bootstrap/ ? :)
21:37:35 but honestly I don't really care.
21:37:40 I have the code in an extension as well
21:37:42 (i'm not really suggesting renaming it)
21:37:57 comstud: could things go in a config file? or are they too dynamic?
21:37:58 well, i mean, i do care. i'm just not going to fight hard over it
21:38:03 better things to worry about :)
21:38:10 sdague: it's a lot to configure for a .conf file
21:38:15 each parent cell
21:38:17 each child cell
21:38:25 the Rabbit broker credentials for each
21:38:28 etc
21:38:45 comstud: does it tend to be dynamic? or statically defined?
21:38:47 how is it too much for a config file, but not too much for a bunch of calls to nova-manage?
21:38:48 ConfigParser doesn't really work well for this
21:39:14 comstud: you don't have to start the cells service do you? You can just start nova-api and configure first?
21:39:16 i'm not sure what i'd name the config options
21:39:22 yeah, nova-api
21:39:46 comstud: I have secret plans to make db-sync into an extension
21:39:51 lol
21:40:02 not a secret anymore!
21:40:07 nova-api can currently start without the db migrations
21:40:10 comstud: doh
21:40:26 can/can't ?
21:41:31 did we lose you guys?
21:41:34 comstud: if it's mostly static, what about a json file to config it?
21:41:40 i guess it can.. it's just hosed until the DB is upgraded
21:41:42 * can
21:41:56 sorry guys I am here now
21:42:01 I was tied up in meetings
21:42:19 if you add a new column in the model code
21:42:22 but don't update the DB...
21:42:25 rmk: we voted you as the new hyperv maintainer
21:42:27 nova-api starts spewing failures
21:42:30 until the DB is upgraded
21:42:39 cuz sqlalchemy starts querying those new columns
21:42:57 dansmith: Oh cool!
21:43:05 :D
21:43:15 (which is annoying, that sqlalchemy explicitly asks for the columns by name for *)
21:43:21 I feel bad for people using my newly inherited driver
21:43:24 #note give your opinion on the cells nova-manage additions on the associated -dev list thread
21:43:50 #link http://lists.openstack.org/pipermail/openstack-dev/2012-November/003298.html
21:43:58 sdague: thanks
21:44:09 anything else, or can we move on?
21:44:14 i'm done
21:44:20 thanks!
21:44:22 ty
21:44:27 devananda: around for a quick baremetal chat?
21:44:35 or anyone else that has been working on it
21:44:54 * dansmith resists the urge to say something sarcastic
21:44:59 heh
21:45:02 skip it then
21:45:05 #topic project name mapping
21:45:09 vishy: take it away sir
21:45:45 ok!
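[As an illustration of the JSON-file alternative floated in the cells discussion above — every field name here is invented; nothing like this existed in the tree. The nesting that defeats ConfigParser (a broker per parent/child cell) is natural in JSON and easy to hand-edit.]

```python
import json

# Hypothetical cells layout: per-cell broker credentials are awkward
# in a flat .conf file but trivial as nested JSON.
CELLS_JSON = """
{
  "self": "api-cell",
  "parents": [],
  "children": [
    {"name": "child-1",
     "broker": {"host": "rabbit-1", "user": "guest", "password": "guest"}},
    {"name": "child-2",
     "broker": {"host": "rabbit-2", "user": "guest", "password": "guest"}}
  ]
}
"""

cells = json.loads(CELLS_JSON)
# Map each child cell to its broker host for routing.
child_brokers = {c["name"]: c["broker"]["host"] for c in cells["children"]}
assert child_brokers == {"child-1": "rabbit-1", "child-2": "rabbit-2"}
```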
21:46:07 there are many cases in nova where we know the project_id / tenant_id of an object
21:46:14 but we have no way of telling the name
21:46:27 which forces users / services to go look them up in keystone
21:46:39 since humans generally use names vs long uuid strings
21:46:54 I wanted to see how people felt about keeping a mapping of names to ids in nova
21:46:57 where do we need the name in nova?
21:47:08 well there are a number of places where it would be useful
21:47:27 nova list --all-tenants
21:47:37 nova list --all-tenants --tenant=
21:47:41 for example
21:48:00 where are the names kept, keystone?
21:48:00 I would also like our default dhcp hostnames to include the tenant name
21:48:04 yup
21:48:09 what about doing the lookups in the nova client?
21:48:11 we get the name in the context for every api command
21:48:12 couldn't the cli query the list first?
21:48:18 i was about to ask about novaclient, yeah
21:48:28 dansmith: assuming that the nova admin is also a keystone admin yeah
21:48:34 ah
21:48:45 should a nova admin who is not a keystone admin see the mapping tho?
21:48:48 unless keystone decides to make the mapping public
21:48:49 :)
21:48:49 is it an information leak if they're not and we give them that info?
21:48:57 dansmith: +1
21:49:06 I'm mainly coming from a usability perspective
21:49:13 the name is far more useful imo
21:49:16 definitely nice for the dhcp hostnames thing tho
21:49:31 This is a general problem with how decoupled keystone is from everything.
21:49:40 It's not just tenant names.
21:49:53 it seems a little weird to dup the data though
21:50:09 i don't really mind the idea in general.. but it can be a decent sized table for large OS deployments.
21:50:18 if we add it to nova it would just be more like a cache.
21:50:26 One problem we frequently encounter and have built custom scripts around is reconciling tenant deletion with reclaiming resources
21:50:35 It's probably a different class of issue than this though
21:50:44 vishy: right, but then we need cache management for it
21:50:58 i recognize that it isn't exactly an easy addition
21:51:05 Are we talking about regularly pulling tenant lists from keystone to have an id:name mapping?
21:51:13 please no
21:51:15 that's why I'm bringing it up
21:51:20 not a regular pulling
21:51:23 of everything
21:51:24 no, the cache would be populated via normal nova requests
21:51:33 (that would not scale)
21:51:47 Yeah I wasn't proposing an architecture
21:51:54 you could theoretically have an external service to resync every so often (or if a tenant name changes in keystone)
21:52:18 can we solve the information leak by getting keystone to return only the mapping data the user could see? I guess I don't see how it's an information leak if we can get it from keystone, but not if we can get it from nova.
21:52:34 you'd have to cache permissions in the table along with the tenant name
21:52:35 I've got an environment with over a hundred tenants and I can't tell you what a nightmare it is to deal with the lack of keystone association
21:52:35 vishy: sweet, you created active directory!
21:52:36 i'd think
21:52:39 sdague: that doesn't help the --all-tenants thing
21:53:06 if the solution here is to just make keystone expose an api for getting a mapping
21:53:11 vishy: yes
21:53:22 i guess i can handle that
21:53:24 what about bringing this to the -dev ML so the keystone guys can get involved in the discussion?
21:53:39 It would be nice to make a single bulk request of which IDs need to resolve, and get a single response back with the mapping
21:53:40 horizon must be doing this already
21:53:41 jog0: easier to make the decision without them,... duh
21:53:48 vishy: It does it very inefficiently
21:53:54 rmk: agreed, a bulk operation would be a must
21:54:00 Each mapping is a request and it's terrible
21:54:03 so it's one wire burst
21:54:13 Horizon as it exists today does not scale
21:54:21 ok lets take that to keystone
21:54:32 so it would help horizon as well, which should make it even more valuable
21:54:33 +1 on bulk
21:54:36 There's a lot of requests which need to be made bulkable in keystone, this is one of them
21:54:37 vishy: you going to start a thread on it?
21:54:41 sure
21:54:54 #action vishy to start a thread on this topic on the -dev ML so that keystone devs can get involved, too
21:55:03 #topic grizzly-2 status
21:55:07 i expect I will need to cache for performance, but I can just do that in memory
21:55:11 grizzly-2 is still a ways out, but it's our next milestone
21:55:12 if nova is going to cache the mapping how will it handle deletions?
21:55:14 #link https://launchpad.net/nova/+milestone/grizzly-2
21:55:26 scheduled for January 10th
21:55:38 jog0: just a performance cache, deletions shouldn't matter
21:55:38 vishy: I don't know if you'd cache the response in Nova or just expect keystone to be sane about caching itself so you can just ask it again
21:55:55 vishy: any grizzly-2 planning stuff you want to hit?
21:56:05 russellb: not really
21:56:09 Having every service maintain its own cache of the others' data seems like a recipe for pain.
21:56:11 vishy: if a tenant is renamed or deleted nova won't know right away right?
21:56:20 cool ... so ... keep hacking!
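[A rough sketch of the in-memory, TTL-bounded cache vishy mentions, combined with rmk's single-bulk-request suggestion. Class and callable names are invented; keystone exposed no such bulk mapping API at the time.]

```python
import time


class TenantNameCache:
    """Hypothetical tenant id -> name cache; one bulk request per miss batch."""

    def __init__(self, bulk_lookup, ttl=300.0):
        # bulk_lookup: callable taking a list of ids and returning a
        # dict {tenant_id: tenant_name} in a single request.
        self._lookup = bulk_lookup
        self._ttl = ttl
        self._cache = {}  # tenant_id -> (name, fetched_at)

    def names_for(self, tenant_ids):
        now = time.time()
        stale = [t for t in tenant_ids
                 if t not in self._cache
                 or now - self._cache[t][1] > self._ttl]
        if stale:
            # One wire burst for all unresolved ids, as discussed above.
            for tid, name in self._lookup(stale).items():
                self._cache[tid] = (name, now)
        return {t: self._cache[t][0]
                for t in tenant_ids if t in self._cache}


# Usage against a fake keystone backend:
calls = []

def fake_keystone_bulk(ids):
    calls.append(sorted(ids))
    return {i: "tenant-%s" % i for i in ids}

cache = TenantNameCache(fake_keystone_bulk, ttl=300.0)
assert cache.names_for(["a", "b"]) == {"a": "tenant-a", "b": "tenant-b"}
assert cache.names_for(["a"]) == {"a": "tenant-a"}
assert calls == [["a", "b"]]  # second call served entirely from cache
```

[Because entries simply expire, a rename or deletion in keystone is picked up only after the TTL elapses — the "never right away" behavior jog0 asks about.]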
21:56:23 jog0: By right away, you mean never
21:56:28 #topic Open Discussion
21:56:32 4 minutes left
21:56:35 rmk: yes
21:56:38 Nothing gets reclaimed
21:56:48 jog0: The cache would be temporary just to avoid making requests for every instance launch
21:56:50 So if you have networks associated with deleted tenants, it's up to you to clean them up
21:57:07 jog0: like only one request every 5 minutes per node or something
21:57:37 vishy: I like rmk's idea about keystone doing sane caching itself instead of us dealing with it
21:58:03 sdague: can I ask you to remove the -1 from https://review.openstack.org/#/c/16843/ ? unless there are other reasons for the -1 of course :-)
21:58:30 alexpilotti: I'll do you one better
21:58:44 sdague: technically that's three better
21:59:10 oh, last item for open discussion
21:59:14 sdague: tx ;-)
21:59:31 tempest gate is close to being ready, but even when it's successful there are a lot of nova stack traces
21:59:33 https://bugs.launchpad.net/nova/+bug/1079210/comments/3
21:59:34 Launchpad bug 1079210 in nova "Successful full gate jobs show ERRORs and stacktraces" [Medium,Confirmed]
21:59:52 yeah i was just looking at those
22:00:02 those are mostly real bugs, the 413 explosion that I fixed a couple weeks ago was one of those
22:00:18 there are some that are just noise in the log that we need to silence
22:00:30 we often pass back exceptions over rpc to the caller
22:00:41 and we log every single one on the manager (server) side
22:00:48 so more eyes on those would be great, as the hope was to get the tempest gate to also check for exceptions in the logs, and fail if they were found
22:00:59 but obviously can't do that until those get cleaned up
22:01:02 but for most cases, we really shouldn't, and we should let the client side decide what to do with it
22:01:18 makes sense
22:01:30 anyway, more eyes would help
22:02:00 alright, we're a bit over
22:02:04 thanks everyone!
22:02:06 #endmeeting