21:01:47 <mikal> #startmeeting nova
21:01:49 <openstack> Meeting started Thu Sep 11 21:01:47 2014 UTC and is due to finish in 60 minutes. The chair is mikal. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:53 <openstack> The meeting name has been set to 'nova'
21:01:58 <melwitt> o/
21:02:01 <mikal> Well hello
21:02:09 <dansmith> quick would be awesome
21:02:09 <mikal> The agenda as always is at https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:02:11 * mriedem1 sneaks in late
21:02:18 <alaski> o/
21:02:21 <mikal> #topic Feature freeze exceptions
21:02:27 <mikal> #link https://etherpad.openstack.org/p/juno-nova-approved-ffes
21:02:33 <mikal> We still have a few inflight
21:02:44 <mikal> The deadline for them to _enter_ the gate being midnight Friday this week
21:02:54 <dansmith> so
21:02:57 <dansmith> we just had a thing
21:03:02 <dansmith> that kicked out at least one
21:03:06 <mikal> Looking at the list, I think server group quotas is the most off the rails of the four left
21:03:23 <mikal> dansmith: as in verification failed?
21:03:42 <mriedem1> mikal: no, gate reset
21:03:42 <dansmith> mikal: as in, 100% of unit tests were failing
21:03:44 <PhilD> Code's there - but it needs a tempest change which is in the gate. Going to just be a race against the gate at this point
21:04:03 <mriedem1> mikal: bad sqla-migrate release broke unit tests, had to be promoted to top
21:04:06 <mriedem1> which reset the gate
21:04:12 <mikal> mriedem1: ahhh ok
21:04:24 <dansmith> well, and SRIOV got kicked because it "failed"
21:04:24 <mikal> I wouldn't panic yet
21:04:27 <dansmith> because of unit tests
21:04:37 <mikal> If things are approved and we're just fighting the gate then we can work through that
21:04:52 <mikal> Let's go through these four real quick
21:04:57 <mikal> vmware refactor
21:05:02 <mikal> There are two minor patches left?
21:05:17 <mriedem1> https://review.openstack.org/#/c/100927
21:05:18 <mikal> One approved and one not
21:05:18 <PhilD> The tempest change needed for sg quotas has been in the gate for 21hrs. Once that merges I'm expecting the nova changes to verify -
21:05:22 <mriedem1> that's the last one it looks like
21:05:39 <mikal> https://review.openstack.org/#/c/119696/ is approved but not merged as well
21:05:50 <mriedem1> the latter sounds like cleanup, not functional
21:05:50 <mikal> But yeah, I think vmware refactor is basically done except for that one
21:05:54 <mriedem1> yup
21:06:04 <tjones> hurrah!
21:06:16 <mriedem1> it's not tied to the bp either
21:06:18 <mriedem1> https://review.openstack.org/#/c/100927 that is
21:06:25 <mikal> mriedem1: oh, as in we might just do that last one during stabilisation anyways?
21:06:37 <mikal> Ahhh, well spotted
21:06:43 <mikal> I had trusted the etherpad list
21:06:45 <mikal> Ok
21:06:53 <mikal> So, vmware refactor is safe I'd say
21:07:02 <mikal> virt-numa-driver-placement then
21:07:35 <mikal> Two there need approval
21:07:36 <dansmith> mikal: fully in the gate, I think
21:07:41 <mikal> https://review.openstack.org/#/c/115381/
21:07:49 <mikal> https://review.openstack.org/#/c/115007/
21:08:00 <dansmith> oh, thought jaypipes had done that one already
21:08:17 <mikal> So, it would be cool if someone could be the second +2 on those todayish
21:08:31 <jaypipes> I will.
21:08:31 <dansmith> jaypipes: was going to, I think he prolly just got distracted
21:08:33 <dansmith> yeah
21:08:36 <jaypipes> I thought I already had...
21:08:36 <mikal> I will take a look at them after the meeting unless someone beats me to it
21:08:48 <mikal> Cool, so virt-numa is safe then too
21:08:54 <jaypipes> mikal: nah, I'm on it.
21:09:02 <mikal> jaypipes: ta
21:09:04 <mikal> server group quotas
21:09:08 <mikal> PhilD: it's your time to shine
21:09:17 <mikal> We're waiting on the tempest change?
21:09:17 <PhilD> First patch is in the gate
21:09:32 <PhilD> second part is +2'd but needs the tempest change
21:10:11 <mikal> https://review.openstack.org/#/c/116079/ needs rescue though?
21:10:11 <PhilD> third part has been reviewed OK but was missing v2.1 work. That's there now - waiting for Chris and Ken'ichi to come online and do a final pass
21:10:28 <mikal> Ahhh, ok
21:10:34 <mikal> So you think you're covered once those guys wake up?
21:10:42 <PhilD> Needs Chris and Ken'ichi to re-review it yes, but the change is just to 2.1
21:10:44 <mikal> Or do you want to ask if anyone else can review as well?
21:10:46 <PhilD> Yep, I think so
21:10:50 <mikal> Cool
21:10:55 <PhilD> More eyes are always welcome
21:10:56 <mikal> So, server groups is mostly safe then
21:11:11 <mikal> SRIOV is the last one
21:11:29 <dansmith> SRIOV was fully in the gate earlier
21:11:32 <mikal> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/pci-passthrough-sriov,n,z is the correct list for this one, right?
21:11:40 <dansmith> after a check run of the base one, it should all pop back in
21:11:47 <mikal> https://review.openstack.org/#/c/120675/ needs re-approval?
21:12:05 <mikal> https://review.openstack.org/#/c/120423/ depends on an outdated dependency?
21:12:07 <dansmith> mikal: that one is not part of the original thing
21:12:14 <mikal> Ahhh, ok
21:12:16 <dansmith> mikal: something baoli and I are adding on top for more testing
21:12:18 <mikal> It's in the gerrit list is all
21:12:23 <dansmith> yep
21:12:26 <mikal> Ok, cool
21:12:44 <mikal> Can you turn the etherpad list into the canonical list of reviews you need to merge for the bp to be marked completed?
21:12:49 <mikal> That will help me know when to mark it done
21:13:04 <dansmith> okay
21:13:06 <sdague> so, how possibly destabilizing do we think any of these ffes could be?
21:13:31 <mikal> Well, we approved them all
21:13:35 <mikal> So in theory they shouldn't be too bad
21:13:37 <sdague> because back of the envelope... these are probably not really landing until middle of next week if "in gate by friday" is the criteria
21:13:50 <PhilD> The quota change is pretty well contained, so its scope for mayhem is limited
21:14:05 <mikal> Our gate load should be pretty light, I guess that depends on how wild other teams are going
21:14:07 <sdague> yeh, vmware and quotas feel pretty off in their corner
21:14:22 <mikal> SRIOV sounds well tested
21:14:35 <mikal> i.e. someone has actually deployed it to test, right Dan?
21:14:36 <PhilD> Numa and SRIOV look like they have a much bigger surface
21:14:36 <sdague> yeh, my bigger concern is the fact that we went from 1 in 10 tests failing to 1 in 2
21:14:40 <sdague> during FFE
21:14:45 <sdague> for all of openstack
21:14:47 <mikal> Herm
21:14:50 <dansmith> mikal: yeah, it *actually* works :)
21:14:50 <mikal> That's not good
21:14:59 <sdague> that's why there is a 30 hr backup
21:15:07 <sdague> which was a 26 hr backup this morning
21:15:08 <baoli> mikal, it's been tested on cisco's vmfex and mlnx's
21:15:11 <mikal> sdague: do we know what bit is less stable now?
21:15:22 <sdague> mikal: no...
21:15:25 <jogo> another way to look at this is: our record for merges to openstack/openstack in a single day is something like 140
21:15:34 <jogo> yesterday we merged 38
21:15:39 <sdague> yeh, right now we should be merging 120 - 150 a day
21:15:47 <sdague> given the in flow
21:15:58 <sdague> so I get everyone wants their feature in
21:16:16 <jogo> mikal: everything is pretty bad right now
21:16:38 <mikal> Looking at the calendar, we cut rc1 on 25 September (ish)
21:16:40 <PhilD> If someone could keep an eye on the tempest change https://review.openstack.org/#/c/120395/ and re-verify the other two Nova changes if/when it lands overnight (my time) that would help me
21:16:43 <sdague> but ... the trade off with this much energy still on features at this point in the game, seems odd
21:16:48 <mikal> Which is 14 days away
21:17:05 <jogo> horizon, neutron, nova, mordred, glance
21:17:16 <mikal> mordred is unreliable?
21:17:41 <mikal> sdague: I agree we need to move past this to really getting stabilization happening
21:17:44 <sdague> he did push sqlalchemy-migrate package which broke all our unit tests
21:17:49 <jogo> mikal: yup https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=9c2d6f21854bc79f03095163b8133a9fec68f9f2
21:17:50 <mikal> Oh, nice
21:18:19 <sdague> well, the point is, we're not going to be able to start what we'd normally consider stabilization until the gate is under control
21:18:28 <sdague> and that's probably 10 - 14 days solid effort
21:18:36 <jogo> what about a hard deadline for things by your Monday mikal
21:18:39 <sdague> anyway, something to consider
21:18:56 <mikal> jogo: as in "if it hasn't merged by then you're out?"
21:19:05 <jogo> mikal: yup
21:19:26 <mikal> I think that's fair... It's two days past the original deadline in the UK timezone.
21:19:43 <mikal> It only works for us if the rest of the ecosystem is also dropping things out of the gate too though, right?
21:20:12 <mikal> We need other projects to also be working on stabilization
21:20:39 <jogo> mikal: yeah that would help
21:20:51 <dansmith> so, without a doubt, that deadline is going to leave some of these features half-merged
21:21:15 <mikal> dansmith: that's hard to tell... It depends how broken the gate is
21:21:19 <jogo> mikal: I can send an email out later today about gate issues
21:21:29 <mikal> jogo: that would be good
21:21:34 <dansmith> Based on how it has been going this week, I think it's probably how it will go
21:21:45 <mikal> Oh, I agree
21:21:49 <PhilD> Do they all leave things in a sane state if they part merge?
21:21:53 <dansmith> so would we (a) revert the pieces, (b) leave them, or (c) push the rest to keep the feature whole?
21:21:56 <mikal> I guess my point is do we think it's going to be like that for the next two weeks?
21:22:05 <mikal> Are we just going to reverify over and over for a couple of weeks?
21:22:11 <sdague> well, all patches are supposed to be safe for the b camp, right?
21:22:14 <dansmith> I'm not arguing against the deadline, by the way :)
21:22:24 <dansmith> sdague: technically, sure
21:22:24 <sdague> that's our design point, we work at every patch level
21:22:25 <mikal> b camp?
21:22:36 <jogo> dansmith: we can evaluate case by case how to deal with things that won't make it
21:22:37 <dansmith> I'm just curious how we're going to handle it
21:22:38 <sdague> i.e. don't revert work that lands, just move on
21:22:43 <dansmith> jogo: okie
21:23:00 <dansmith> these are some of the larger FFEs we've ever had, I think
21:23:01 <mikal> I think there's always an implied "we will do the least worst thing when the time comes"
21:23:12 <mikal> But encouraging people to be watching their patches over the weekend is a good idea
21:23:15 <dansmith> so I just want to make sure we think about what we're going to do
21:23:28 <dansmith> "nothing" is fine with me :)
21:23:34 <sdague> yeh, nova had a ton of large FFE
21:23:53 <sdague> normally FFE for nova is closed out by the tuesday after
21:24:00 <mikal> This is true, although I think we also worked through most of them better than usual as well
21:24:05 <mikal> Gate tech debt is what is hurting us here
21:24:13 <mikal> (Noting that is our fault as well, just a different thing)
21:24:35 <mikal> I want to move onto bugs if I may
21:24:43 <mikal> Because I think that's where the gate discussion goes
21:25:00 <mikal> #topic Bugs
21:25:06 <mikal> So...
21:25:22 <mikal> sdague / mriedem1 / jogo: if you could cherry pick stabilization bugs for people to work on, what would they be?
21:25:33 <mikal> Do you have a wish list?
21:25:39 <jogo> mikal: go to http://status.openstack.org/elastic-recheck/
21:25:41 <mikal> Would putting a list of important bugs in an etherpad or something help?
21:25:42 <mriedem1> mikal: there was some discussion in nova today already, lots
21:25:45 <jogo> and look for the word nova (or neutron)
21:26:01 <tjones> should we mark those critical for tracking purposes?
21:26:12 <sdague> honestly, I've not been working on gate issues, I've been triaging the nova bug queue
21:26:15 <mriedem1> yeah i'd go through e-r and look for things that are targeted to nova and aren't completely ambiguous, like 'think timed out waiting for status x'
21:26:18 <tjones> there is just 1 critical affecting gate https://bugs.launchpad.net/nova/+bug/1367941 but i think there are more...
21:26:20 <mikal> I think it's a bending of our triage rules, but if something is breaking the gate it's a big deal
21:26:21 <uvirtbot> Launchpad bug 1367941 in oslo-incubator "Able to aquire the semaphore used in lockutils.synchronized_with_prefix twice at the same time" [Critical,Confirmed]
21:26:23 <sdague> tjones: yeh, I think marking gate bugs critical is good
21:26:38 <mriedem1> tjones: i thought we had a lockutils sync?
21:26:45 <jogo> tjones: yes that would be great, I don't think we can realistically mark all as critical. so say anything with 5 or more hits in 1 day
21:26:53 <mriedem1> oh nvm
21:26:54 <mriedem1> https://review.openstack.org/#/c/120897/
21:26:55 <sdague> mriedem1: this is after the sync
21:27:20 <tjones> jogo: good idea
21:27:24 <jogo> mriedem1: lockutils was a red herring
21:27:35 <mikal> jogo: the elastic_recheck page is sorted by fail rate, yes?
21:27:42 <sdague> mikal: yes
21:27:45 <jogo> tjones: also some are marked as invalid
21:27:51 <jogo> mikal: fail rate in last 24 hours
21:27:57 <jogo> so just go to the top of the page
21:28:12 <sdague> jogo: did you get any further on - 1367941 ?
21:28:22 <sdague> because I can imagine that might be a culprit in lots of issues
21:28:28 <jogo> sdague: yeah it was a red herring
21:28:33 <jogo> sdague: just misleading logs
21:28:37 <sdague> ok
21:28:42 <jogo> sdague: which is too bad
21:28:47 <sdague> so why is it still confirmed?
21:29:00 <jogo> sdague: because there was a bug, but it was in the logging
21:29:17 <sdague> can you update the log message and make it not critical :P
21:29:30 <jogo> sdague: that is the working theory at least and everything backs that up
21:30:39 <jogo> sdague: re-triaged out of critical
21:30:44 <sdague> jogo: thanks
21:31:15 <jogo> sdague: you looked into bug 1357476 right?
21:31:19 <uvirtbot> Launchpad bug 1357476 in neutron "Timeout waiting for vif plugging callback for instance" [Medium,Confirmed] https://launchpad.net/bugs/1357476
21:31:21 <sdague> on the up side, only 32 bugs in the new state for nova
21:31:30 <sdague> I was debugging with dansmith
21:31:31 <mriedem1> jogo: dansmith has a logging patch up for that
21:31:40 <sdague> yeh, we need the logging patch
21:31:41 <mriedem1> https://review.openstack.org/#/c/120842/
21:31:41 <mikal> I guess I'm hoping for a FFE like workflow for bugs at this point
21:31:44 <sdague> something's not right
21:31:50 <mikal> If we could decide as a group a small number to focus on
21:31:56 <mikal> We might make more progress than we do usually
21:32:04 <mikal> I guess "Critical" is one way of defining that group
21:32:33 <tjones> either that or use a tag
21:32:59 <jogo> let's do critical
21:33:15 <mikal> Well, we should also be emailing that list to -dev a bunch
21:33:22 <mikal> Let's make it hard for people to not know what to look at
21:33:36 <tjones> that will do it!
21:33:37 <mikal> If people were checking the bug tracker, we wouldn't have 1,000 bugs
21:34:17 <tjones> 935 (without incomplete)
21:34:27 <mriedem1> moving on...
21:34:31 <mriedem1> novaclient release?
21:34:31 <mikal> Yeah
21:34:40 <mikal> But if someone could come up with a wishlist email that would be good
21:34:45 <mikal> mriedem1: sure, if you want one
21:34:50 <mikal> mriedem1: what prompts it?
21:35:00 <mriedem1> mikal: 9/18 is the deadline for final client releases before rc1
21:35:12 <mikal> And would it destabilize the gate?
21:35:32 <mriedem1> the gate already uses trunk novaclient
21:35:38 <mikal> Wow, there's a lot of "undecided" in the client fix committed list
21:35:51 <mriedem1> this would be for getting it into global-reqs i think as a min version?
21:35:53 <mriedem1> there is a ML thread
21:35:57 <mikal> Nothing above medium for those things which are triaged
21:36:16 <mriedem1> mikal: http://lists.openstack.org/pipermail/openstack-dev/2014-September/045487.html
21:36:28 * mikal looks
21:36:37 <mikal> Oh that one
21:36:39 <mriedem1> mikal: basically any FFEs that touch the clients should have a release before rc1
21:36:46 <mikal> Ok, so how about I try and triage those bugs so they're less confusing over the weekend
21:36:51 <mikal> And do a release on my Monday?
21:37:02 <mriedem1> sure, well i think you have a week
21:37:25 <mikal> I don't recall seeing any client changes in the FFE review list
21:37:29 <mikal> But I might have missed them
21:37:34 <mikal> I wasn't really looking for them
21:37:35 <mriedem1> they'd be tied to the bp's
21:37:52 <mikal> Ok, I will make a note to do a client release
21:37:58 <mikal> Moving on?
21:38:08 <mriedem1> mikal: https://github.com/openstack/python-novaclient/commit/c59a0c8748ccc5f6a0cf80910c09b9328b4253ac
21:38:16 <mriedem1> that's an example of server bp that is in the client
21:38:20 <mriedem1> but not in a released client version
21:38:22 <mriedem1> for vishy
21:38:38 <mikal> So noted
21:39:02 <PhilD> https://review.openstack.org/#/c/108942/ is the client change for server group quotas
21:39:30 <mikal> Ugh
21:39:37 <mriedem1> PhilD: he needs to tie it to the bp
21:39:38 <mikal> Getting that through the gate before 18 September will be hard
21:39:57 <mikal> Given we need to get three of its friends through first
21:40:11 <sdague> yeh, honestly, just wait on the client bit there and bring it in once we open the tree back up
21:40:22 <sdague> and cut another release of the nova client after the release
21:40:32 <PhilD> Ok - makes sense
21:40:36 <mikal> Yeah, client releases aren't too scary
21:40:37 <jaypipes> mikal: both remaining NUMA patches now reviewed.
21:40:49 <mikal> But I agree that change needs to be tied to the BP
21:40:54 <mikal> Moving on...
21:40:58 <PhilD> It's only the shell change - the client binding itself will work with the new quotas anyway
21:40:59 <mikal> I think we've covered gate already
21:41:14 <mikal> #topic Ironic API proxy
21:41:38 <mikal> The code is out for review for this one right?
21:41:47 <mikal> It seems the consensus on the thread is we should land it?
21:42:47 <mikal> It's not even very big
21:42:54 <mikal> So... Who is going to review it?
21:43:19 <dansmith> I will
21:43:21 <sdague> I will
21:43:22 <dansmith> because I have time for that
21:43:26 <dansmith> (not)
21:43:29 <mikal> I will too, cause I feel left out
21:43:39 <mriedem1> link?
21:43:40 <mikal> Ok, let's just do it then
21:43:42 <sdague> I did the first round, mostly wanted a fix on imports, didn't see if we had a second
21:43:45 <mikal> https://review.openstack.org/#/c/120433/
21:43:56 <mikal> We're at patch set 7
21:44:05 <mikal> This is the last thing blocking ironic graduation as best as I can tell
21:44:32 <sdague> unit test fails are probably migrate's fault
21:44:33 <mriedem1> i'll add to the queue
21:44:35 <sdague> rechecking
21:44:49 <mikal> sdague: It already had a recheck?
21:45:10 <sdague> it just failed out recently I thought
21:45:25 <mikal> It failed at 7:29AM
21:45:30 <mikal> And got rechecked at 7:39AM
21:45:35 <mikal> (My time)
21:45:40 <sdague> in crazy future world
21:45:44 <sdague> ok, my bad :)
21:45:44 <mikal> LOL
21:45:55 <mikal> Ok, I'm done with ironic proxies
21:46:01 <mikal> #topic Open Discussion
21:46:47 <bknudson> I've got a bug fix up for review -- https://review.openstack.org/#/c/119646/ -- for some reason it's -V.
21:46:59 <bknudson> although it's actually passed all the CI
21:47:24 <jogo> bknudson: citrix doesn't like it
21:47:28 <jogo> click toggle CI
21:47:33 <mriedem1> citrix doesn't like much today
21:47:38 <mikal> Sigh
21:47:46 <jogo> although citrix says it passed but -1ed anyway
21:47:51 <bknudson> for some reason the latest citrix CI comment says it passed.
21:48:08 <bknudson> maybe just rebase it and run again?
21:48:17 <mikal> Has anyone pinged Bob about the citrix CI being unreliable?
21:48:20 <mriedem1> just did
21:48:24 <mriedem1> consider him punged
21:48:26 <mikal> Ta
21:48:34 <sdague> the reason citrix CI is unreliable is the memory bloat
21:48:45 <jogo> sdague: ?
21:48:47 <sdague> they are basically always running out of memory on their systems during test
21:48:53 <clarkb> right they have the xenserver overhead on top of everything else
21:49:03 <clarkb> which is non trivial iirc
21:49:05 * dansmith shudders
21:49:09 <jogo> haha
21:49:21 <sdague> BobBall was actually originally the one pushing the reduction of workers in devstack because of *this*
21:49:56 <sdague> so there is a patch, that ianw disagrees with me on, that gives us a global flag to tune this. I suppose I should just override him and land it so we can move forward.
21:50:33 <bknudson> devstack change: https://review.openstack.org/#/c/117517/
21:51:16 <mikal> So, I think we're done here?
21:51:18 <mikal> Nothing else?
21:51:24 <jogo> fix bugs please!
21:51:29 <mikal> I live to obey
21:51:32 <jogo> I marked gate bugs as critical
21:51:36 <mikal> Thanks
21:51:36 <sdague> mikal: definition for release critical bugs?
21:51:37 <mriedem1> why does novaclient have a config generator file?
21:52:19 <mikal> sdague: hmmm, not sure. I haven't given much thought to that yet. Do we have a written definition we've used in the past?
21:53:04 <sdague> don't know, it would just probably be good for guidance on triage
21:53:13 <mikal> Yep
21:53:20 <mikal> This sounds like the sort of thing ttx will have advice on
21:53:24 <mikal> I shall ping him and ask
21:53:43 <mikal> I feel like I don't need to reinvent the wheel here, just work out what we said in the past
21:54:30 <mikal> Sounds like we're done
21:54:35 <mikal> Have a 6 minute break
21:54:38 <mikal> Then go fix bugs
21:54:53 <mikal> #endmeeting