21:02:42 #startmeeting nova
21:02:43 Meeting started Thu Feb 14 21:02:42 2013 UTC. The chair is russellb. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:46 The meeting name has been set to 'nova'
21:02:47 #chair vishy
21:02:48 Current chairs: russellb vishy
21:03:03 #link http://wiki.openstack.org/Meetings/Nova
21:03:42 couple of topics, let's hit grizzly-3 really quick though
21:03:45 #topic grizzly-3
21:03:49 #link https://launchpad.net/nova/+milestone/grizzly-3
21:04:05 We removed a number of blueprints this week for stuff that came in too late, didn't have code yet, etc.
21:04:30 reviews on the FC nova changes needed https://review.openstack.org/#/c/19992/
21:04:36 i suppose we need to prioritize our reviews for the next couple weeks on the features that are ready for review
21:05:00 kmartin: yep, it's on the big list of "needs code review" blueprints
21:05:04 so when are we properly in feature freeze?
21:05:15 21st
21:05:26 i thought it was 2 weeks, eep
21:05:28 I hope the FC nova changes get merged as soon as possible, as we are also working on an FC Cinder driver that depends on them
21:05:31 well realistically the 19th
21:05:44 #link http://wiki.openstack.org/GrizzlyReleaseSchedule
21:05:57 even less time than i thought
21:05:58 so what is the process if the review is not completed by the 21st, is the feature pushed out to Havana?
21:06:01 definitely lots of review work to do
21:06:08 yeah
21:06:31 kmartin: we can consider feature freeze exceptions, but honestly we had too many of those last year and we need to be strict about it IMO
21:06:38 s/last year/last release/
21:07:18 #link http://status.openstack.org/release/
21:07:24 that's actually a really nice view
21:07:40 even if it shows all the projects
21:07:43 sdague: start from the top and see how much we can review? heh
21:07:51 yeh, basically :)
21:08:18 anything else on g3 before we jump into these specific topics?
21:08:37 #topic network adapter hotplug bp
21:08:44 well, vishy, any guidance on the bps you want to make sure land?
21:08:44 dansmith: this you?
21:08:49 russellb: yeah
21:08:51 doude: around?
21:09:19 yes
21:09:22 well, basically, it was abandoned by the original author in august,
21:09:34 doude made a single push against it this month, and has some more changes since then
21:09:41 vishy asked me to pick it up and try to make it work
21:09:55 we've made some progress, and I significantly refactored how the detach case works
21:09:56 yes, and I asked yaguang if he can rebase it
21:10:21 doude: the question I guess is who is going to drive this? if you're going to do it, then I'll back off
21:10:34 otherwise I can integrate your changes and keep running with it
21:10:41 just don't want to be duplicating effort
21:11:51 doude: thoughts?
21:12:23 dansmith: Okay. As you want. It depends if you need help and if you have enough time
21:12:57 doude: I'm happy to go do it, and I think it's pretty close..
I've got it updated for the recent libvirt changes (vif models) as well
21:13:07 so I think I can do it, I just don't want to take it away from you if you really want to do it
21:13:11 I followed your progress
21:13:37 if you're okay with it, I'll pull what I can from your latest changes that you pushed yesterday and integrate them
21:13:41 dansmith: Okay, no problem
21:13:46 okay
21:13:49 doude: if dansmith takes it, your review would be appreciated of course, and to help point out anything he may miss
21:13:56 I made some progress on the sample API tests
21:14:05 doude: yes, that's what I'm going to take for sure :)
21:14:06 I can merge them to your branch
21:14:09 russellb: me? miss something?
21:14:27 doude: let me do it, if that's okay, since we changed the extension name slightly and I have other changes yet to be merged
21:14:59 dansmith: as incredibly unlikely as it may be :)
21:15:03 Okay, that's what I made
21:15:03 russellb: heh
21:15:12 okay, so I think we're good then
21:15:18 great
21:15:28 #topic nova-conductor
21:15:38 so this discussion started just a while ago in -nova
21:15:44 heh, I guess this meeting is "messes dansmith made"
21:16:03 dansmith: ha, i think i share any blame on this one
21:16:09 good :)
21:16:10 comstud: around?
21:16:19 yeah, not feeling so hot, but here
21:16:30 * devananda waves from the back of the room
21:16:40 comstud: so, are you feeling like we need to default to local conductor based on the db stuff you're hitting?
21:16:52 it seems to me like you're making gains that outweigh the hits taken by going to conductor in the first place,
21:17:02 so I'm not sure that it's necessary to disable "real" conductor for grizzly
21:17:04 i'm not sure... it'll be fine for small deployments
21:17:10 i mean, conductor will
21:17:13 but obviously we should if it'll be a problem
21:17:29 but for larger installs, there'll be issues
21:17:46 comstud: even with your db changes and plenty of conductor instances?
21:17:59 comstud: small is relative. any idea what order of magnitude of computes-per-conductor it takes to have problems?
21:18:02 well
21:18:24 like, as many conductor instances as api instances perhaps ... just thinking about how it might be deployed
21:18:35 mysqldb implementations for some things are taking a bit longer than expected
21:18:45 and i'm not sure how well-received the code will be...
21:18:58 but
21:19:23 comstud: i like jog0's idea of a separate backend that can fall back to sqlalchemy for unimplemented functions
21:19:24 I cannot get thread pooling to work with sqlalchemy due to bugs in eventlet.
21:19:32 then it can just be experimental
21:19:46 I can get it to work with mysqldb... but I have to patch our logging because it uses a lock that's broken with eventlet tpool
21:20:00 vishy: Yeah, that's what we're working on
21:20:07 assuming we can figure out a way to not double up every connection
21:20:16 we have instance_get, bw_usage_update, and instance_destroy all working
21:20:25 from the limited testing I've done
21:20:31 instance_update is almost done..
21:20:39 comstud: any idea what this does to our test matrix?
21:20:48 i'm going to start throwing up some reviews for the framework
21:20:56 is that just changing the api calls for those in main, or does it have the old version as well?
21:21:14 not sure I understand
21:21:19 so it sounds like this is separate from conductor ... right?
21:21:25 just to be clear.
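[Editor's note: a minimal sketch of the thread-pooling approach comstud describes above, i.e. pushing blocking MySQLdb calls into eventlet's OS-thread pool so the hub isn't blocked. The connection parameters, table, and query are hypothetical; this is not the code from his branch.]

```python
from eventlet import tpool
import MySQLdb


def _blocking_query(sql, args):
    # Plain MySQLdb calls block the whole eventlet hub if made directly from
    # a green thread, so this function is meant to run in a real OS thread.
    # Connection parameters here are placeholders.
    conn = MySQLdb.connect(host='localhost', user='nova',
                           passwd='secret', db='nova')
    try:
        cur = conn.cursor()
        cur.execute(sql, args)
        return cur.fetchall()
    finally:
        conn.close()


# tpool.execute() runs the callable in eventlet's OS-thread pool and suspends
# only the calling green thread until the result is ready. (comstud's caveat:
# anything the pooled code touches, such as nova's logging, must not rely on
# locks that are broken under tpool.)
rows = tpool.execute(_blocking_query,
                     "SELECT uuid FROM instances WHERE deleted = %s", (0,))
```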
21:21:28 it's an alternate IMPL in db/api
21:21:35 nova/db/mysqldb/*
21:21:36 comstud: right, that's my concern
21:21:39 comstud: fyi - let me know if you need help with eventlet fixes.
21:21:44 that's what I'm trying to figure out.. is it really only a problem with conductor, or just only noticeable?
21:21:44 the db being slow, and usage of nova-conductor, are separate, sort of related issues.
21:21:50 the mysqldb code itself falls back to sqlalchemy
21:21:54 because it means that we need to light up another full tempest gate if we have another db path
21:22:05 ewindisch: sure
21:22:30 sdague: Yeah, I know.. it really should be tested w/ tempest, etc
21:22:47 dansmith: it's a problem with sqlalchemy+eventlet, iiuc, which conductor makes much more painful
21:22:56 I'll try to get a review up for the framework by tomorrow
21:22:56 although
21:23:10 comstud: so is this not fixable in db/sqlalchemy if you change the apis to not use the models, but use low-level sqlalchemy instead?
21:23:13 https://github.com/comstud/nova/tree/bp/db-mysqldb-impl
21:23:21 if you ignore all of the ugly commits in that :)
21:23:54 #link https://github.com/comstud/nova/tree/bp/db-mysqldb-impl
21:24:14 comstud: remember you need to base that off oslo-incubator now.
21:24:20 yup i know
21:24:31 there's only a few changes to openstack/common in there
21:24:32 comstud: that stuff totally scares me from an sql injection direction as well
21:24:36 mostly to patch the logging.
21:24:45 sdague: which part?
21:24:54 Oh
21:24:56 the way the query building is happening
21:25:10 using python string formatting to build queries
21:25:34 Yeah, that's how mysqldb works
21:25:46 sdague: the args are not getting placed into the sql directly
21:25:46 it wraps '' around strings though
21:25:55 sdague: should be ok injection-wise unless we missed something
21:26:17 It's probably not clear from initial examination
21:26:42 can you not do some parameterized queries?
21:26:47 so i guess the overarching issue is this isn't going to be done for grizzly
21:26:51 but queries to mysqldb end up being execute("%(fooo)s ", kwargs)
21:27:01 and mysqldb turns the kwarg values into strings, etc for you
21:27:02 so do we need to do anything for grizzly
21:27:05 '... .'
21:27:26 It certainly won't be 'done'
21:27:31 sounds like we should do some load testing and see what the limits of conductor are
21:27:33 I think we can have a few working queries in mysqldb... the most important ones
21:27:41 comstud: is there any chance that we could get some pre-production testing on real conductor
21:27:45 and decide whether to change the default.
21:27:47 vishy: +1
21:27:52 to prove it's too much of a problem to be on by default?
21:27:54 sdague: I think mysql-python does parameterized queries that *look* like python string formatting
21:27:58 dansmith: Not by me
21:28:11 ewindisch: Correct
21:28:19 ++ to load testing and setting a good default for conductor
21:28:25 dansmith: I definitely don't have the resources to load test nova-conductor myself
21:28:30 ewindisch: ok, I'd have to educate myself on that one, new ground, just bringing my old sql safety hat on
21:28:46 sdague: appreciate the caution, I share it :)
21:28:47 comstud: neither do I :(
21:29:01 comstud: do we know conductor is a problem in practice, or just in theory?
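[Editor's note: to illustrate ewindisch's point above, MySQLdb's placeholders look like Python string formatting but are parameterized: the driver escapes and quotes the values itself. The connection parameters and table name below are hypothetical.]

```python
import MySQLdb

conn = MySQLdb.connect(host='localhost', user='nova',
                       passwd='secret', db='nova')
cur = conn.cursor()

# UNSAFE: Python %-formatting splices the value into the SQL text itself,
# which is the classic injection hole sdague is worried about.
#   cur.execute("SELECT * FROM instances WHERE uuid = '%s'" % user_input)

# SAFE: the same-looking placeholder is passed separately; MySQLdb escapes
# and quotes the value, so the hostile string below is matched as data.
cur.execute("SELECT * FROM instances WHERE uuid = %(uuid)s",
            {'uuid': "x'; DROP TABLE instances; --"})
print(cur.fetchall())
```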
21:29:03 dansmith: We have some really ugly scripts that load test stand-alone, though, which kind of mimics the behavior
21:29:44 sdague: so, global nova-cells acts like nova-conductor
21:29:55 in that it takes a shitload of stuff off a queue and does DB queries
21:30:13 And it has a huge problem in practice given enough load.
21:30:22 it all depends on how large your deployment is
21:30:38 so,
21:30:47 what's the ratio of that to compute nodes,
21:30:49 We have a large enough deployment so far that global nova-cells is having a hard time keeping up
21:30:55 compared to what we might guess the conductor one would be?
21:31:05 can you run multiple cells services?
21:31:20 or rather, do you?
21:31:24 russellb: almost.. i think there's a couple of races I've not solved yet
21:31:29 ok
21:31:36 the trick is..
21:31:52 the order of the DB calls
21:31:57 this is not really a problem w/ conductor
21:31:59 they're all rpc.calls
21:32:00 vs casts
21:32:21 so you should be able to set up multiple nova-conductors without problem
21:32:25 the question is... how many do you need?
21:32:38 I suspect nova-conductor might be okay to enable in grizzly
21:32:43 And if you run into load problems, you set up more
21:32:50 and if all fails, go to local mode
21:32:53 load problems == the nova-conductor queue backing up
21:32:54 that was our hope anyway ...
21:33:01 I just fear we'll never get the data if we don't have it on by default
21:33:09 yeah
21:33:10 with some good deployment docs
21:33:16 personally, I'm fine with enabling it.
21:33:23 if it's broken, there's an easy workaround
21:33:24 large installs will tweak a ton anyway
21:33:54 agreed, turn it on. we need to know
21:34:22 Really this is a separate issue: sqlalchemy just sucks balls performance-wise... :)
21:34:28 need to write up some guidance on deploying conductor though
21:34:44 so people know what to watch out for, and that they may need to run multiple instances of it (like other services)
21:35:05 We have a really horrible problem with our DB code, also... We're joining way too much stuff... most of the time the data is not needed.
21:35:08 comstud: but sqla sucking balls amplifies the conductor concern, so i see how they're related
21:35:16 join all the things!
21:35:17 We seem to be join-loading security_groups and security_groups.rules on every instance get!
21:35:35 So far I cannot find where we use 'rules' when you start from an instance model.
21:35:39 so I think that's a join that can be removed
21:36:14 But even when I join all of that in manually formed queries... things are still much faster than using sqlalchemy
21:36:19 may have been no-db-messaging inspired ...
21:36:24 anyway, i digress.
21:36:25 well,
21:36:31 it's no-db-primitives or whatever too
21:36:32 join all the things so it doesn't have to be looked up later
21:36:36 you can't do that without the join
21:36:49 russellb: 'because it can't' is more accurate
21:36:52 You can query for security groups and rules separately though
21:36:59 comstud: *nod*
21:37:01 comstud: yes
21:37:05 when needed.
21:37:21 just identifying likely history for why it's there
21:37:31 every instance delete pulls them in, which is a huge waste.
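[Editor's note: a sketch of the eager-loading pattern comstud criticizes, with minimal stand-in models; nova's real schema and query code live in nova/db/sqlalchemy/ and differ from this. joinedload_all with a dotted path is the era-appropriate SQLAlchemy spelling.]

```python
from sqlalchemy import (Column, ForeignKey, Integer, String, Table,
                        create_engine)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import joinedload_all, relationship, sessionmaker

Base = declarative_base()

# Association table, so that both sides of the relationship are lists.
instance_sg = Table(
    'instance_sg', Base.metadata,
    Column('instance_id', ForeignKey('instances.id')),
    Column('group_id', ForeignKey('security_groups.id')))


class SecurityGroupRule(Base):
    __tablename__ = 'rules'
    id = Column(Integer, primary_key=True)
    group_id = Column(Integer, ForeignKey('security_groups.id'))


class SecurityGroup(Base):
    __tablename__ = 'security_groups'
    id = Column(Integer, primary_key=True)
    rules = relationship(SecurityGroupRule)


class Instance(Base):
    __tablename__ = 'instances'
    id = Column(Integer, primary_key=True)
    uuid = Column(String(36))
    # backref='instances' is what puts the queried instances back onto each
    # group: the circular bookkeeping complained about below.
    security_groups = relationship(SecurityGroup, secondary=instance_sg,
                                   backref='instances')


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Every fetch eagerly JOINs the groups and their rules, even on code paths
# (like delete) that never read them; dropping the joinedload, or loading
# rules lazily at the few call sites that need them, removes the waste.
instance = session.query(Instance).\
    options(joinedload_all('security_groups.rules')).\
    filter_by(uuid='fake-uuid').\
    first()
```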
21:37:39 yeah
21:37:41 * beagles makes note to revisit apparently crazy db api behavior
21:37:43 anyway :)
21:37:51 call it morbid interest
21:37:53 well, maybe we also try to remove some backrefs as part of this going forward
21:37:58 beagles: :-)
21:38:13 yes, there's a number of backrefs that need to go away
21:38:14 post-G I mean
21:38:28 when you joinload security groups
21:38:46 you have instance['security_groups'] and each one has ['instances']
21:38:55 dear god
21:39:01 i'm sure all of this processing sql-a does contributes to its slowness.
21:39:18 ouch
21:39:23 wow
21:39:43 well
21:39:55 it doesn't have *ALL* instances in ['instances']
21:39:57 it's not as bad as it sounds
21:40:07 but whatever instances were queried will be in there
21:40:17 all instances that use that security group?
21:40:19 or?
21:40:24 no
21:40:27 instances that were queried
21:40:29 ah, okay
21:40:34 still, it's busywork
21:40:37 AFAIK
21:40:40 but
21:40:43 instance_get_all() is nasty :)
21:41:08 my mysqldb query is about 5s for 50K entries... 12s for sqlalchemy
21:41:22 it populates a model similar to sql-a, but it has no backrefs.
21:41:28 ugh, why are you ever pulling 50k records from the db?
21:41:36 For load testing
21:41:36 :)
21:41:38 that's all
21:41:44 aah :)
21:41:49 yeh, be careful of the micro benchmarks though :)
21:41:57 sure
21:42:25 so if we are done on conductor, a related db question is how many of the db blueprints we expect / think are safe to still land
21:42:32 as there are still a bunch outstanding
21:42:37 if anyone missed my instance_destroy() test..
21:42:50 it took 20 minutes for 50K instances with sql-a
21:42:51 lol
21:43:04 i got it to like 40s with mysqldb + thread pooling
21:43:05 #topic db
21:43:09 omg…..
21:43:19 boris-42 wanted to talk about his db-unique-keys blueprint
21:43:31 anyway, I could complain forever about this.. let's move on
21:43:32 ;)
21:43:42 ;)
21:44:08 Ok. I am working on db-unique-keys and I haven't had enough time to do all the work around it..
21:44:58 I think the one remaining concern I have on db-unique-keys is the behavior when turning on a UC in a migration. The initial patches would fail in the migration if the UC couldn't be applied, but it seems bad form to leave folks stuck in the middle of a migration chain.
21:45:14 +1000
21:45:18 I finished the main part of this BP (creating real UCs) and finished the work around DBDuplicatedEntry
21:45:31 DBDuplicateEntryError =)
21:45:34 sdague: makes conductor look better though :)
21:45:38 heh
21:45:44 you can have the migration resolve duplicates and remove them
21:45:46 * sdague thinks dansmith just wants off the hook
21:45:48 re the db-session-cleanup bp, i've been completely focused on other things and won't get around to cleaning up the remaining bits of that :(
21:45:56 comstud: yeh, that's my pov
21:46:03 I just wanted to make sure that was shared
21:46:10 i was going to add one for bw_usage_caches
21:46:11 so we can shepherd that through
21:46:23 start_period+uuid+mac needs to be unique
21:46:23 devananda: ok i'll bump it from g then
21:46:37 sdague: for applying a UC during a migration, could there be a query to check that it is, in fact, safe to apply?
21:46:37 as long as we are consistent in believing that the right thing to do is remove the dups, I'm good
21:46:58 if we're currently doing .first() queries on those things
21:47:05 there's no point in keeping dups
21:47:10 well
21:47:15 devananda: I guess, except in the middle of a db-sync you might be trying to apply all 50
21:47:21 afaict, .first() is not ordered anywhere
21:47:24 and exactly how would you test
21:47:34 yeah, well, what I meant was..
21:47:37 if we only grab 1
21:47:39 so which one is returned is up to the db to pick ... however it wants
21:47:41 Hmm, I want to create a generic method that will drop all duplicates except the last
21:47:43 is it ok?
21:47:43 remove the dupes
21:47:44 yeh, I think we just need to put some disclaimer
21:47:54 boris-42: yes, I think that's the right thing to do
21:47:56 boris-42: i think
21:47:59 yes
21:48:11 boris-42: and I know you were already working in that direction
21:48:24 just wanted to confirm with everyone else here that it's the approach we're all good with
21:48:34 Ok. I will do that.
21:48:53 and we should put a big statement in the release notes about the fact that the migrations remove duplicates
21:49:01 so people should ensure to back up before upgrade
21:49:11 Ok. I only hope that I will have enough time.
21:49:18 might be good to also log when the migration deletes duplicates
21:49:23 ie, print the table name and row id
21:49:27 +1 to removing dups
21:49:32 devananda: +1, I like that
21:49:40 yes
21:49:49 boris-42: can you add logging to the duplicate remover?
21:50:14 Yes, it is possible in my implementation
21:51:01 boris-42: ok, cool, I'll help review this one and try to help get it in
21:51:03 I found a not-so-optimal solution (there will be a lot of queries), but I think that is not so important for migration scripts?
21:51:37 boris-42: yeh, that's probably ok. Let's get it in a review and have people take a look
21:51:54 Ok, I hope it will be finished on saturday (or probably tomorrow)
21:52:12 comstud: you able to stay on top of this review as well?
21:52:24 would be good to have a couple dedicated folks to help it move
21:52:33 which one?
21:52:36 dup keys?
21:52:37 sure
21:52:40 yep
21:52:49 just add me to the review
21:53:00 how about db-archiving, you guys think that one can make it?
21:53:01 i'll *try* to keep up... i've been ignoring a lot of shit this week
21:53:10 sounds like it was pretty close from talking to dripton earlier today
21:53:18 russellb: yeh, that's real close
21:53:24 awesome
21:53:27 I think it is ready
21:53:28 just needs a little test enhancement from my perspective
21:53:33 he landed another review
21:53:37 I haven't looked yet
21:53:40 might be good now
21:53:44 sdague: I did a bunch of test tweaks after your review.
21:53:47 It could be better
21:53:51 dripton: great
21:54:05 I'll take another look
21:54:33 mind if I jump in re: baremetal for the last few minutes?
21:54:34 some performance optimization to avoid getting all rows.
21:54:44 Ok.
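[Editor's note: a hedged sketch of the migration behavior agreed above for db-unique-keys, written against era-appropriate sqlalchemy-migrate: log and delete duplicate rows (keeping the last) before applying the unique constraint. It uses comstud's bw_usage_cache example (start_period+uuid+mac) and assumes an auto-increment id column; it is illustrative, not boris-42's actual patch.]

```python
import logging

from migrate import UniqueConstraint
from sqlalchemy import MetaData, Table, func, select

LOG = logging.getLogger(__name__)


def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    t = Table('bw_usage_cache', meta, autoload=True)

    # Find (start_period, uuid, mac) groups with more than one row; assume
    # an auto-increment 'id' column, so the highest id is the newest row.
    dups = select([t.c.start_period, t.c.uuid, t.c.mac,
                   func.max(t.c.id).label('max_id')]).\
        group_by(t.c.start_period, t.c.uuid, t.c.mac).\
        having(func.count(t.c.id) > 1)

    for start_period, uuid, mac, max_id in migrate_engine.execute(dups):
        # Delete everything but the newest row, logging the table and key so
        # operators can audit what was removed (devananda's suggestion).
        result = migrate_engine.execute(
            t.delete().
            where(t.c.start_period == start_period).
            where(t.c.uuid == uuid).
            where(t.c.mac == mac).
            where(t.c.id != max_id))
        LOG.info("bw_usage_cache: removed %d duplicate row(s) for %s/%s",
                 result.rowcount, uuid, mac)

    # Only now is it safe to apply the unique constraint.
    UniqueConstraint('start_period', 'uuid', 'mac', table=t,
                     name='uniq_bw_usage_cache0start_period0uuid0mac').create()
```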
21:54:59 there's a db meeting after this in theory if folks want to cover that some more
21:55:02 #topic baremetal
21:55:04 boris-42: we could handle that as a bug after feature freeze
21:55:17 devananda: go for it
21:55:33 iirc, after the last nova meeting we agreed that the initial baremetal bp was implemented
21:55:49 'tis marked that way
21:55:54 i've been hammering on bugs all week, and a few things that kinda border on bugs
21:56:05 ie, where baremetal can't do things other hypervisors can
21:56:18 there's a bunch of patches up now that I'd _really_ like to land ;)
21:56:45 heh, lots of patches lots of people would like to land.
21:56:54 +1
21:57:01 russellb: but he _really_ wants them :)
21:57:07 hehe :)
21:57:10 oh well that changes everything :-p
21:57:24 devananda: anything you need beyond review bandwidth?
21:57:41 fwiw, baremetal is really fragile right now. it works until you sneeze ... then you have to restack
21:57:46 honestly if the patches are self-contained in baremetal land, we can probably breeze them through
21:57:48 these bugs go a good way towards fixing that
21:58:17 they are. one of them changes how nova identifies baremetal nodes -- the hypervisor_hostname / nodename changes from an int to a uuid
21:58:19 devananda: this is improve-baremetal-deploy?
21:58:22 but no nova code actually changes
21:58:24 dansmith: yes
21:58:40 also there are 3 baremetal db migrations
21:58:48 so i added a new test_baremetal_migrations.py file
21:58:57 again though, basically self-contained
21:59:23 devananda: is this just a patch stream? or hanging off a blueprint?
21:59:47 sdague: a patch stream, with many bug #s in the commit messages
22:00:13 ok, gotcha
22:00:52 well, out of time it seems
22:01:13 seems worth mentioning again that feature freeze is Feb 19
22:01:15 review ALL THE THINGS
22:01:23 #endmeeting