21:01:47 #startmeeting nova
21:01:49 Meeting started Thu Sep 11 21:01:47 2014 UTC and is due to finish in 60 minutes. The chair is mikal. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:53 The meeting name has been set to 'nova'
21:01:58 o/
21:02:01 Well hello
21:02:09 quick would be awesome
21:02:09 The agenda as always is at https://wiki.openstack.org/wiki/Meetings/Nova#Agenda_for_next_meeting
21:02:11 * mriedem1 sneaks in late
21:02:18 o/
21:02:21 #topic Feature freeze exceptions
21:02:27 #link https://etherpad.openstack.org/p/juno-nova-approved-ffes
21:02:33 We still have a few in flight
21:02:44 The deadline for them to _enter_ the gate being midnight Friday this week
21:02:54 so
21:02:57 we just had a thing
21:03:02 that kicked out at least one
21:03:06 Looking at the list, I think server group quotas is the most off the rails of the four left
21:03:23 dansmith: as in verification failed?
21:03:42 mikal: no gate reset
21:03:42 mikal: as in, 100% of unit tests were failing
21:03:44 Code's there - but it needs a tempest change which is in the gate. Going to just be a race against the gate at this point
21:04:03 mikal: bad sqla-migrate release broke unit tests, had to be promoted to top
21:04:06 which reset the gate
21:04:12 mriedem1: ahhh ok
21:04:24 well, and SRIOV got kicked because it "failed"
21:04:24 I wouldn't panic yet
21:04:27 because of unit tests
21:04:37 If things are approved and we're just fighting the gate then we can work through that
21:04:52 Let's go through these four real quick
21:04:57 vmware refactor
21:05:02 There are two minor patches left?
21:05:17 https://review.openstack.org/#/c/100927
21:05:18 One approved and one not
21:05:18 The tempest change needed for sg quotas has been in the gate for 21hrs.
Once that merges I'm expecting the nova changes to verify -
21:05:22 that's the last one it looks like
21:05:39 https://review.openstack.org/#/c/119696/ is approved but not merged as well
21:05:50 the latter sounds like cleanup, not functional
21:05:50 But yeah, I think vmware refactor is basically done except for that one
21:05:54 yup
21:06:04 hurrah!
21:06:16 it's not tied to the bp either
21:06:18 https://review.openstack.org/#/c/100927 that is
21:06:25 mriedem1: oh, as in we might just do that last one during stabilization anyways?
21:06:37 Ahhh, well spotted
21:06:43 I had trusted the etherpad list
21:06:45 Ok
21:06:53 So, vmware refactor is safe I'd say
21:07:02 virt-numa-driver-placement then
21:07:35 Two there need approval
21:07:36 mikal: fully in the gate, I think
21:07:41 https://review.openstack.org/#/c/115381/
21:07:49 https://review.openstack.org/#/c/115007/
21:08:00 oh, thought jaypipes had done that one already
21:08:17 So, it would be cool if someone could be the second +2 on those todayish
21:08:31 I will.
21:08:31 jaypipes: was going to I think he prolly just got distracted
21:08:33 yeah
21:08:36 I thought I already had...
21:08:36 I will take a look at them after the meeting unless someone beats me to it
21:08:48 Cool, so virt-numa is safe then too
21:08:54 mikal: nah, I'm on it.
21:09:02 jaypipes: ta
21:09:04 server group quotas
21:09:08 PhilD: it's your time to shine
21:09:17 We're waiting on the tempest change?
21:09:17 First patch is in the gate
21:09:32 second part is +2'd but needs the tempest change
21:10:11 https://review.openstack.org/#/c/116079/ needs rescue though?
21:10:11 third part has been reviewed OK but was missing v2.1 work. That's there now - waiting for Chris and Ken'ichi to come online and do a final pass
21:10:28 Ahhh, ok
21:10:34 So you think you're covered once those guys wake up?
21:10:42 Needs Chris and Ken'ichi to re-review it yes, but the change is just to 2.1
21:10:44 Or do you want to ask if anyone else can review as well?
21:10:46 Yep, I think so
21:10:50 Cool
21:10:55 More eyes are always welcome
21:10:56 So, server groups is mostly safe then
21:11:11 SRIOV is the last one
21:11:29 SRIOV was fully in the gate earlier
21:11:32 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/pci-passthrough-sriov,n,z is the correct list for this one, right?
21:11:40 after a check run of the base one, it should all pop back in
21:11:47 https://review.openstack.org/#/c/120675/ needs re-approval?
21:12:05 https://review.openstack.org/#/c/120423/ depends on an outdated dependency?
21:12:07 mikal: that one is not part of the original thing
21:12:14 Ahhh, ok
21:12:16 mikal: something baoli and I are adding on top for more testing
21:12:18 It's in the gerrit list is all
21:12:23 yep
21:12:26 Ok, cool
21:12:44 Can you turn the etherpad list into the canonical list of reviews you need to merge for the bp to be marked completed?
21:12:49 That will help me know when to mark it done
21:13:04 okay
21:13:06 so, how possibly destabilizing do we think any of these ffes could be?
21:13:31 Well, we approved them all
21:13:35 So in theory they shouldn't be too bad
21:13:37 because back of the envelope... these are probably not really landing until the middle of next week if "in gate by Friday" is the criteria
21:13:50 The quota change is pretty well contained, so its scope for mayhem is limited
21:14:05 Our gate load should be pretty light, I guess that depends on how wild other teams are going
21:14:07 yeh, vmware and quotas feel pretty off in their corner
21:14:22 SRIOV sounds well tested
21:14:35 i.e. someone has actually deployed it to test, right Dan?
21:14:36 Numa and SRIOV look like they have a much bigger surface
21:14:36 yeh, my bigger concern is the fact that we went from 1 in 10 tests failing to 1 in 2
21:14:40 during FFE
21:14:45 for all of openstack
21:14:47 Herm
21:14:50 mikal: yeah, it *actually* works :)
21:14:50 That's not good
21:14:59 that's why there is a 30 hr backup
21:15:07 which was a 26 hr backup this morning
21:15:08 mikal, it's been tested on cisco's vmfex and mlnx's
21:15:11 sdague: do we know what bit is less stable now?
21:15:22 mikal: no...
21:15:25 another way to look at this is: our record for merges to openstack/openstack in a single day is something like 140
21:15:34 yesterday we merged 38
21:15:39 yeh, right now we should be merging 120 - 150 a day
21:15:47 given the inflow
21:15:58 so I get everyone wants their feature in
21:16:16 mikal: everything is pretty bad right now
21:16:38 Looking at the calendar, we cut rc1 on 25 September (ish)
21:16:40 If someone could keep an eye on the tempest change https://review.openstack.org/#/c/120395/ and re-verify the other two Nova changes if/when it lands overnight (my time) that would help me
21:16:43 but ... the trade off with this much energy still on features at this point in the game, seems odd
21:16:48 Which is 14 days away
21:17:05 horizon, neutron, nova, mordred, glance
21:17:16 mordred is unreliable?
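[Editor's note: the jump "from 1 in 10 tests failing to 1 in 2" discussed above hurts far more than a 5x slowdown, because a Zuul-style gate tests changes speculatively in a queue and a failure anywhere ahead of a change resets its run. A toy model of that compounding, not Zuul's actual algorithm:]

```python
# Toy model of speculative gate throughput. Assumes (simplification):
# each gate run in a window of `queue_depth` changes (yours included)
# fails independently with probability `fail_rate`, and any failure in
# the window forces your change's run to start over.

def expected_test_runs(fail_rate: float, queue_depth: int) -> float:
    """Expected gate runs per merged change."""
    # Probability that your run and every run ahead of you all pass.
    p_clean = (1 - fail_rate) ** queue_depth
    # Attempts until the first clean window are geometric: mean 1/p.
    return 1 / p_clean

# With a 10-deep queue: a 1-in-10 fail rate costs ~2.9 runs per merge,
# while a 1-in-2 fail rate costs ~1024 runs per merge - which is why
# merges collapsed from ~140/day to 38 rather than merely halving.
```

The exponent in `queue_depth` is the key design point: per-run flakiness that looks tolerable in isolation multiplies across every change sharing the gate window.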
21:17:41 sdague: I agree we need to move past this to really getting stabilization happening
21:17:44 he did push the sqlalchemy-migrate package which broke all our unit tests
21:17:49 mikal: yup https://review.openstack.org/gitweb?p=openstack/nova.git;a=commitdiff;h=9c2d6f21854bc79f03095163b8133a9fec68f9f2
21:17:50 Oh, nice
21:18:19 well, the point is, we're not going to be able to start what we'd normally consider stabilization until the gate is under control
21:18:28 and that's probably 10 - 14 days solid effort
21:18:36 what about a hard deadline for things by your Monday mikal
21:18:39 anyway, something to consider
21:18:56 jogo: as in "if it hasn't merged by then you're out?"
21:19:05 mikal: yup
21:19:26 I think that's fair... It's two days past the original deadline in the UK timezone.
21:19:43 It only works for us if the rest of the ecosystem is also dropping things out of the gate too though, right?
21:20:12 We need other projects to also be working on stabilization
21:20:39 mikal: yeah that would help
21:20:51 so, without a doubt, that deadline is going to leave some of these features half-merged
21:21:15 dansmith: that's hard to tell... It depends how broken the gate is
21:21:19 mikal: I can send an email out later today about gate issues
21:21:29 jogo: that would be good
21:21:34 Based on how it has been going this week, I think it's probably how it will go
21:21:45 Oh, I agree
21:21:49 Do they all leave things in a sane state if they part merge?
21:21:53 so would we (a) revert the pieces, (b) leave them, or (c) push the rest to keep the feature whole?
21:21:56 I guess my point is do we think it's going to be like that for the next two weeks?
21:22:05 Are we just going to reverify over and over for a couple of weeks?
21:22:11 well, all patches are supposed to be safe for the b camp, right?
21:22:14 I'm not arguing against the deadline, by the way :)
21:22:24 sdague: technically, sure
21:22:24 that's our design point, we work at every patch level
21:22:25 b camp?
21:22:36 dansmith: we can evaluate case by case how to deal with things that won't make it
21:22:37 I'm just curious how we're going to handle it
21:22:38 i.e. don't revert work that lands, just move on
21:22:43 jogo: okie
21:23:00 these are some of the larger FFEs we've ever had, I think
21:23:01 I think there's always an implied "we will do the least worst thing when the time comes"
21:23:12 But encouraging people to be watching their patches over the weekend is a good idea
21:23:15 so I just want to make sure we think about what we're going to do
21:23:28 "nothing" is fine with me :)
21:23:34 yeh, nova had a ton of large FFEs
21:23:53 normally FFE for nova is closed out by the Tuesday after
21:24:00 This is true, although I think we also worked through most of them better than usual as well
21:24:05 Gate tech debt is what is hurting us here
21:24:13 (Noting that is our fault as well, just a different thing)
21:24:35 I want to move onto bugs if I may
21:24:43 Because I think that's where the gate discussion goes
21:25:00 #topic Bugs
21:25:06 So...
21:25:22 sdague / mriedem1 / jogo: if you could cherry pick stabilization bugs for people to work on, what would they be?
21:25:33 Do you have a wish list?
21:25:39 mikal: go to http://status.openstack.org/elastic-recheck/
21:25:41 Would putting a list of important bugs in an etherpad or something help?
21:25:42 mikal: there was some discussion in nova today already, lots
21:25:45 and look for the word nova (or neutron)
21:26:01 should we mark those critical for tracking purposes?
21:26:12 honestly, I've not been working on gate issues, I've been triaging the nova bug queue
21:26:15 yeah i'd go through e-r and look for things that are targeted to nova and aren't completely ambiguous, like 'think timed out waiting for status x'
21:26:18 there is just 1 critical affecting gate https://bugs.launchpad.net/nova/+bug/1367941 but i think there are more...
21:26:20 I think it's a bending of our triage rules, but if something is breaking the gate it's a big deal
21:26:21 Launchpad bug 1367941 in oslo-incubator "Able to aquire the semaphore used in lockutils.synchronized_with_prefix twice at the same time" [Critical,Confirmed]
21:26:23 tjones: yeh, I think marking gate bugs critical is good
21:26:38 tjones: i thought we had a lockutils sync?
21:26:45 tjones: yes that would be great, I don't think we can realistically mark all as critical. so say anything with 5 or more hits in 1 day
21:26:53 oh nvm
21:26:54 https://review.openstack.org/#/c/120897/
21:26:55 mriedem1: this is after the sync
21:27:20 jogo: good idea
21:27:24 mriedem1: lockutils was a red herring
21:27:35 jogo: the elastic_recheck page is sorted by fail rate, yes?
21:27:42 mikal: yes
21:27:45 tjones: also some are marked as invalid
21:27:51 mikal: fail rate in the last 24 hours
21:27:57 so just go to the top of the page
21:28:12 jogo: did you get any further on 1367941?
21:28:22 because I can imagine that might be a culprit in lots of issues
21:28:28 sdague: yeah it was a red herring
21:28:33 sdague: just misleading logs
21:28:37 ok
21:28:42 sdague: which is too bad
21:28:47 so why is it still confirmed?
21:29:00 sdague: because there was a bug, but it was in the logging
21:29:17 can you update the log message and make it not critical :P
21:29:30 sdague: that is the working theory at least and everything backs that up
21:30:39 sdague: re-triaged out of critical
21:30:44 jogo: thanks
21:31:15 sdague: you looked into bug 1357476 right?
21:31:19 Launchpad bug 1357476 in neutron "Timeout waiting for vif plugging callback for instance" [Medium,Confirmed] https://launchpad.net/bugs/1357476
21:31:21 on the up side, only 32 bugs in the new state for nova
21:31:30 I was debugging with dansmith
21:31:31 jogo: dansmith has a logging patch up for that
21:31:40 yeh, we need the logging patch
21:31:41 https://review.openstack.org/#/c/120842/
21:31:41 I guess I'm hoping for an FFE-like workflow for bugs at this point
21:31:44 something's not right
21:31:50 If we could decide as a group on a small number to focus on
21:31:56 We might make more progress than we do usually
21:32:04 I guess "Critical" is one way of defining that group
21:32:33 either that or use a tag
21:32:59 let's do critical
21:33:15 Well, we should also be emailing that list to -dev a bunch
21:33:22 Let's make it hard for people to not know what to look at
21:33:36 that will do it!
21:33:37 If people were checking the bug tracker, we wouldn't have 1,000 bugs
21:34:17 935 (without incomplete)
21:34:27 moving on...
21:34:31 novaclient release?
21:34:31 Yeah
21:34:40 But if someone could come up with a wishlist email that would be good
21:34:45 mriedem1: sure, if you want one
21:34:50 mriedem1: what prompts it?
21:35:00 mikal: 9/18 is the deadline for final client releases before rc1
21:35:12 And would it destabilize the gate?
21:35:32 the gate already uses trunk novaclient
21:35:38 Wow, there's a lot of "undecided" in the client fix committed list
21:35:51 this would be for getting it into global-reqs i think as a min version?
21:35:53 there is a ML thread
21:35:57 Nothing above medium for those things which are triaged
21:36:16 mikal: http://lists.openstack.org/pipermail/openstack-dev/2014-September/045487.html
21:36:28 * mikal looks
21:36:37 Oh that one
21:36:39 mikal: basically any FFEs that touch the clients should have a release before rc1
21:36:46 Ok, so how about I try and triage those bugs so they're less confusing over the weekend
21:36:51 And do a release on my Monday?
21:37:02 sure, well i think you have a week
21:37:25 I don't recall seeing any client changes in the FFE review list
21:37:29 But I might have missed them
21:37:34 I wasn't really looking for them
21:37:35 they'd be tied to the bps
21:37:52 Ok, I will make a note to do a client release
21:37:58 Moving on?
21:38:08 mikal: https://github.com/openstack/python-novaclient/commit/c59a0c8748ccc5f6a0cf80910c09b9328b4253ac
21:38:16 that's an example of a server bp that is in the client
21:38:20 but not in a released client version
21:38:22 for vishy
21:38:38 So noted
21:39:02 https://review.openstack.org/#/c/108942/ is the client change for server group quotas
21:39:30 Ugh
21:39:37 PhilD: he needs to tie it to the bp
21:39:38 Getting that through the gate before 18 September will be hard
21:39:57 Given we need to get three of its friends through first
21:40:11 yeh, honestly, just wait on the client bit there and bring it in once we open the tree back up
21:40:22 and cut another release of the nova client after the release
21:40:32 Ok - makes sense
21:40:36 Yeah, client releases aren't too scary
21:40:37 mikal: both remaining NUMA patches now reviewed.
21:40:49 But I agree that change needs to be tied to the BP
21:40:54 Moving on...
21:40:58 It's only the shell change - the client binding itself will work with the new quotas anyway
21:40:59 I think we've covered gate already
21:41:14 #topic Ironic API proxy
21:41:38 The code is out for review for this one right?
21:41:47 It seems the consensus on the thread is we should land it?
21:42:47 It's not even very big
21:42:54 So... Who is going to review it?
21:43:19 I will
21:43:21 I will
21:43:22 because I have time for that
21:43:26 (not)
21:43:29 I will too, cause I feel left out
21:43:39 link?
21:43:40 Ok, let's just do it then
21:43:42 I did the first round, mostly wanted a fix on imports, didn't see if we had a second
21:43:45 https://review.openstack.org/#/c/120433/
21:43:56 We're at patch set 7
21:44:05 This is the last thing blocking ironic graduation as best as I can tell
21:44:32 unit test fails are probably migrate's fault
21:44:33 i'll add it to the queue
21:44:35 rechecking
21:44:49 sdague: It already had a recheck?
21:45:10 it just failed out recently I thought
21:45:25 It failed at 7:29AM
21:45:30 And got rechecked at 7:39AM
21:45:35 (My time)
21:45:40 in crazy future world
21:45:44 ok, my bad :)
21:45:44 LOL
21:45:55 Ok, I'm done with ironic proxies
21:46:01 #topic Open Discussion
21:46:47 I've got a bug fix up for review -- https://review.openstack.org/#/c/119646/ -- for some reason it's -V.
21:46:59 although it's actually passed all the CI
21:47:24 bknudson: citrix doesn't like it
21:47:28 click toggle CI
21:47:33 citrix doesn't like much today
21:47:38 Sigh
21:47:46 although citrix says it passed but -1ed anyway
21:47:51 for some reason the latest citrix CI comment says it passed.
21:48:08 maybe just rebase it and run again?
21:48:17 Has anyone pinged Bob about the citrix CI being unreliable?
21:48:20 just did
21:48:24 consider him punged
21:48:26 Ta
21:48:34 the reason citrix CI is unreliable is the memory bloat
21:48:45 sdague: ?
21:48:47 they are basically always running out of memory on their systems during test
21:48:53 right they have the xenserver overhead on top of everything else
21:49:03 which is non trivial iirc
21:49:05 * dansmith shudders
21:49:09 haha
21:49:21 BobBall was actually originally the one pushing the reduction of workers in devstack because of *this*
21:49:56 so there is a patch, that ianw disagrees with me on, that gives us a global flag to tune this. I suppose I should just override him and land it so we can move forward.
21:50:33 devstack change: https://review.openstack.org/#/c/117517/
21:51:16 So, I think we're done here?
21:51:18 Nothing else?
21:51:24 fix bugs please!
21:51:29 I live to obey
21:51:32 I marked gate bugs as critical
21:51:36 Thanks
21:51:36 mikal: definition for release critical bugs?
21:51:37 why does novaclient have a config generator file?
21:52:19 sdague: hmmm, not sure. I haven't given much thought to that yet. Do we have a written definition we've used in the past?
21:53:04 don't know, it would just probably be good for guidance on triage
21:53:13 Yep
21:53:20 This sounds like the sort of thing ttx will have advice on
21:53:24 I shall ping him and ask
21:53:43 I feel like I don't need to reinvent the wheel here, just work out what we said in the past
21:54:30 Sounds like we're done
21:54:35 Have a 6 minute break
21:54:38 Then go fix bugs
21:54:53 #endmeeting