21:01:05 #startmeeting nova
21:01:06 Meeting started Thu Aug 28 21:01:05 2014 UTC and is due to finish in 60 minutes. The chair is mikal. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:09 o/
21:01:09 The meeting name has been set to 'nova'
21:01:10 gawd
21:01:15 So who is still here?
21:01:16 :P
21:01:19 o/
21:01:20 \o
21:01:20 o/
21:01:21 o/
21:01:24 o/
21:01:28 o/
21:01:33 o/
21:01:41 do we really need to take attendance before and after we start each time?
21:01:46 #topic Juno-3 status, September 4th Juno-3, FF and SF
21:02:02 dansmith: it's cool, it just gives people time to get ready
21:02:06 So... j-3 status
21:02:14 We still have four high priority bps needing review
21:02:21 #link https://launchpad.net/nova/+milestone/juno-3
21:02:29 dansmith: no, you can always start it and cancel it if you judge you are not quorate.
21:02:36 I tried a week or so ago emailing openstack-dev and highlighting those
21:02:41 so might as well start and then ask for attendees to chime in
21:02:53 That wasn't super successful
21:02:54 late o/
21:03:06 So the new tactic is that some cores are the lucky winners of personal pings asking for them to look at things
21:03:14 Which people have actually replied to, which is cool
21:03:24 If you're offended that you weren't pinged, then just review something anyways...
21:03:45 I think the BP I'm most worried about that's high is scheduler at this point
21:03:51 It hasn't had much core review compared with the others
21:04:01 So yeah, please keep reviewing BPs that are targeted to j-3
21:04:21 Anything else people think should be mentioned here?
21:04:47 dansmith: your megatuple?
21:04:49 Oh, we're also still looking for a volunteer to wrangle review days if anyone wants to give it a go
21:04:56 mriedem: I have that later on the list
21:05:00 ok
21:05:08 Move on then?
21:05:20 mikal: dansmith: I noticed the compute manager objectify BP is high while RT objectify is medium. Will that make the compute manager objectify not finished?
21:05:45 yjiang5: those are long term tracking BPs
21:05:46 I mean if only compute manager objectify, but no RT objectify
21:05:50 mikal: got it.
21:05:53 I'm not sure the priority really affects those much
21:05:56 righ
21:05:57 t
21:06:06 poorly named perhaps
21:06:13 Although, if any of those are "done" we should mark them implemented
21:06:22 I guess once we hit feature freeze that might be more likely
21:06:42 Anything else?
21:06:43 mikal: dansmith: thanks for clarification.
21:06:48 NP
21:07:04 #topic Bugs
21:07:14 tjones: you around?
21:07:19 (autocomplete says not)
21:07:41 There are 186 bugs marked as ready for review at the moment
21:07:49 So... It would also be nice to get some reviews happening on those
21:08:03 Although I recognize that people are probably full steam on FF stuff at the moment
21:08:17 No criticals in progress at least
21:08:19 mikal: don't forget the fun link http://54.201.139.117/nova-bugs.html
21:08:36 jogo: yeah, that's what I'm looking at, but hadn't linked to
21:08:42 #link http://54.201.139.117/nova-bugs.html
21:09:03 Are there any bugs we need to be extra worried about at the moment?
21:09:11 mikal: bug 1349617 is not yet prioritized - but seems nova is involved as well
21:09:13 Launchpad bug 1349617 in neutron "test_volume_boot_pattern fails in grenade with "SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer"" [High,New] https://launchpad.net/bugs/1349617
21:09:24 it’s one of the top gate offenders.
21:09:32 salv-orlando: so, that sounds familiar
21:09:39 We've had things like that before related to snapshot IIRC?
21:09:50 #link https://launchpad.net/bugs/1349617
21:09:52 Launchpad bug 1349617 in neutron "test_volume_boot_pattern fails in grenade with "SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer"" [High,New]
21:09:57 and resize: bug 1323658
21:09:59 Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state" [Undecided,New] https://launchpad.net/bugs/1323658
21:10:03 also a high gate issue
21:10:06 #link https://launchpad.net/bugs/1323658
21:10:08 Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state" [Undecided,New]
21:10:14 Is anyone looking at those two yet?
21:10:29 jogo: footprints for 1323658 and 1349617 seem to overlap indeed
21:10:44 Or should we quickly change the topic to gate?
21:10:46 salv-orlando: hmm maybe the same bug, want to look into that a bit more?
21:10:48 These seem to fall under both
21:11:08 jogo: already doing that - seems issue with metadata/config drive not providing right user data or no user data at all
21:11:16 Huh
21:11:18 that’s why ssh then fails - wrong key loaded
21:11:22 salv-orlando: doh
21:11:32 I haven't heard of anything like that before
21:11:33 see last comment on bug report for bug 1349617
21:11:47 mikal: so I don't have a lot of bandwidth to dive into fixing some of these nova bugs at the moment
21:11:51 I just need some help to confirm it’s not a red herring, and if not what might be the cause
21:11:59 so a few volunteers besides mriedem and salv-orlando would be great
21:12:08 in addition to*
21:12:27 jogo: agreed
21:12:36 That userdata bug does look interesting
21:12:40 An possibly new
21:12:43 And even
21:12:52 Anyone got some spare cycles to take a look/
21:12:53 ?
21:13:13 #note we might have a bug where userdata is wrong in metadata server and config drive?
21:13:56 Ok, well, instead of stalling I'll try and chase that tomorrow if someone doesn't beat me to it
21:14:06 Moving on to gate in general...
21:14:06 mikal: I'll take a look at it, I might not have enough experience to help though
21:14:15 melwitt: that would be awesome
21:14:23 I think step one might be to dump the userdata somehow?
21:14:31 And then see if gate logs confirm it's wrong in those cases?
21:14:58 melwitt: I am happy to help in any way I can, I just can't drive it
21:15:27 Cool
21:15:36 Move on to gate in general then?
21:15:57 #topic Gate
21:16:11 So, apart from those two bugs above, is there anything else about the gate we should know?
21:16:13 jogo: I can take lead on this issue if nova team is ok with that.
21:16:31 salv-orlando: we take bug fixes from anyone...
21:16:39 salv-orlando: I am definitely OK with that
21:16:42 But getting melwitt to help sounds like a good idea to me
21:17:06 mriedem: jogo: the gate is all good apart from that?
21:17:15 mikal: so as a gate related note we have been hitting job quota for some time
21:17:35 jogo: that just means we queue right?
21:17:43 So life is slower, but not apocalyptic?
21:17:45 bug 1353131
21:17:46 Launchpad bug 1353131 in nova "Failed to commit reservations in gate" [High,Confirmed] https://launchpad.net/bugs/1353131
21:18:02 mikal: much slower
21:18:03 #link https://launchpad.net/bugs/1353131
21:18:04 Launchpad bug 1353131 in nova "Failed to commit reservations in gate" [High,Confirmed]
21:18:07 bug 1357677
21:18:11 Launchpad bug 1357677 in nova "Instances failes to boot from volume" [Undecided,New] https://launchpad.net/bugs/1357677
21:18:17 Dude, if you used #link I'd be out of a job
21:18:17 bug 1357055
21:18:19 Launchpad bug 1357055 in nova "Race to delete shared subnet in Tempest neutron full jobs" [Undecided,New] https://launchpad.net/bugs/1357055
21:18:24 #link https://launchpad.net/bugs/1357677
21:18:25 bug 1273292
21:18:28 Launchpad bug 1357677 in nova "Instances failes to boot from volume" [Undecided,New]
21:18:29 Launchpad bug 1273292 in nova ""Timed out waiting for thing ...
to become in-use" causes tempest-dsvm-* failures (dup-of: 1270608)" [Critical,Confirmed] https://launchpad.net/bugs/1273292
21:18:31 Launchpad bug 1270608 in cinder "n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail" [Critical,Fix released] https://launchpad.net/bugs/1270608
21:18:33 Sigh
21:18:43 bug 1349147
21:18:44 Launchpad bug 1349147 in nova "test_db_api unit tests fail with: UnexpectedMethodCallError: Unexpected method call get_session.__call__(use_slave=False) -> None" [High,Confirmed] https://launchpad.net/bugs/1349147
21:18:53 how many are we going to do here?
21:18:55 bug 1357578
21:18:57 Launchpad bug 1357578 in nova "Unit test: nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm timing out in gate" [Undecided,New] https://launchpad.net/bugs/1357578
21:19:00 we could just link to the e-r status page
21:19:02 :)
21:19:04 mriedem: enough to make mikal mad
21:19:06 mriedem: apparently every bug ever?
21:19:19 * mikal has given up on minuting them
21:19:24 * salv-orlando thinks jogo has a script for that
21:19:28 So...
21:19:33 mikal: those are bugs that have more than 3 fails in 24 hours
21:19:39 What are the, say, three bugs you want looked at?
21:19:42 There's salv-orlando's two
21:19:44 And one other?
21:20:04 mikal: all the nova ones on http://status.openstack.org/elastic-recheck/
21:20:05 Or are you just trying to highlight that people should read the e-r report?
21:20:08 starting at the top
21:20:14 mikal: yes
21:20:27 and that nova is one of the worst offenders of gate bugs
21:20:30 Ok, cool
21:20:33 So noted
21:21:05 Anything else for gating? Or do we move on?
21:21:32 #topic Mid-cycle meetups
21:21:39 I finished writing the nova summary
21:21:42 #link http://www.stillhq.com/openstack/juno/000016.html
21:21:47 I also went to the ops meetup this week
21:21:53 They don't seem too filled with rage
21:22:00 I think it was a useful exercise
21:22:23 #topic Specs for Kilo
21:22:29 Ok, so...
21:22:35 I think we should open specs for Kilo basically now
21:22:47 Because there are people who are only interested in kilo specs at this point
21:23:00 But set the expectation that we won't review anything until later
21:23:02 Thoughts?
21:23:28 I would prefer not to open them
21:23:31 it sends the wrong message
21:23:39 Which is?
21:23:40 better than nothing, when do you think reviews will truly start?
21:23:42 I don't see what the point is
21:23:51 we have way too much to do for Juno as is, we don't need the distraction
21:23:53 dansmith: in opening them?
21:23:59 I don't think it's a distraction for us
21:24:00 mikal: yeah
21:24:02 We just ignore them for now
21:24:10 mikal: i think it makes sense to open them fwiw
21:24:11 It saves us from having to explain over and over why they're not open yet
21:24:15 like you said, if people are going to work on them regardless
21:24:15 Which I've done at least three times now
21:24:22 might as well let them use the tool
21:24:30 Yeah, there are some people who just won't close bugs or help review
21:24:30 it still sends the wrong message
21:24:33 instead of pushing it out elsewhere
21:24:53 if people don't want to close bugs or help reviews they honestly shouldn't file specs
21:24:55 gives visibility into what's happening regardless
21:25:03 some folks also realistically would be working on both
21:25:08 jogo, but limbo (Juno frozen, Kilo not open) is even worse I think
21:25:10 I don't want visibility, it is a distraction
21:25:10 bugs and planning for next cycle i mean
21:25:13 jogo: well, if we're going to require specs at the summit it also gives people more time to prepare
21:25:32 mikal: that is a fair point
21:25:34 jogo: it's only a distraction if you look at them
21:25:40 what is the timeline for summit stuff?
21:25:47 jogo: I find the questions over and over a bigger distraction to be honest
21:25:51 mikal: it still sends a really bad message IMHO
21:26:03 mikal: I am happy to answer it instead
21:26:07 jogo: timeline as in when we announce the schedule etc?
21:26:15 also I think we want to rev the specs template a bit
21:26:32 mikal: when we start accepting summit proposals and when we have to decide
21:26:38 "opening kilo specs" means creating the directory, right? anyone can submit a spec in the kilo directory now, AFAIK
21:26:41 So, my plan is to let the ideas etherpad bake a bit longer
21:26:49 #link https://etherpad.openstack.org/p/kilo-nova-summit-topics
21:27:07 dansmith: that is true I suppose
21:27:18 that is why I don't see the point of doing anything
21:27:19 jogo: I'm not sure, I need to chase ttx for that
21:27:45 so if we are going to open up kilo for submissions
21:27:47 I think creating an empty directory does no harm
21:27:49 we need to rev the spec first
21:27:56 err template
21:27:57 And we can always rev the template later and make people update if we really need to
21:28:11 Lost of people who are rebasing will need to anyways
21:28:13 I think once we're official about it, people will start pinging for review
21:28:14 I still think this sends the completely wrong message
21:28:15 Lots even
21:28:24 we don't want more specs right now
21:28:28 "not official review, of course, but could you just take a quick look please?"
21:28:29 even if we aren't going to look at them
21:28:37 dansmith: to that I would say hell no
21:28:38 dansmith: that's a fair point
21:28:40 personally
21:28:46 So, if not now, when?
21:28:47 right, it's going to make me angry
21:28:54 It's going to have to be before release...
21:29:26 master == kilo at rc1, based on past process anyway
21:29:29 mikal: how about once we have full feature freeze in place?
21:29:45 at least that way we are done looking at Juno blueprint reviews
21:29:51 Ok, so... 3 September ish?
21:29:57 I could live with that
21:30:07 wow that soon
21:30:08 well, right after FF is going to be a hectic time as well of course
21:30:11 likely some exception requests after that
21:30:16 I think jogo meant "well after.."
21:30:19 russellb: right, that
21:30:27 so could be 2 weeks
21:30:36 week to FF, another week to sort the exception rush
21:30:41 17 September?
21:30:42 dansmith: after we are done with all juno blueprints
21:30:44 or we could round up to 1-Oct
21:30:46 FF and all
21:31:18 FFE and all *
21:31:24 Well, it has to be before we flip the branch, right?
21:31:37 Otherwise people will be bottlenecked on working on master when we do
21:31:44 mikal: not necessarily, we have 1000 bugs to fix
21:32:00 whatever, I don't care that much, it just seems like in or around FF time is the wrong time to do it
21:32:14 2 October is the absolute latest I'd be willing to do it
21:32:16 I'm +1 for the 1-Oct idea
21:32:20 purely from a distraction, bandwidth, and annoyance pov
21:32:24 so 1-oct works for me
21:32:44 Does anyone want later than 1 Oct?
21:32:58 RC1 *could* happen before that
21:33:11 it'd be a little awkward to wait until after master is open for kilo i think
21:33:18 russellb: yeah, agreed. 25 Sept right?
21:33:19 but otherwise, waiting until after FF and such makes sense
21:33:23 #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
21:33:27 yeah
21:33:58 So... rc-1? 25 Sept or later, whenever that happens?
21:34:11 yeah that sounds good
21:34:17 that is the earliest I want to do it
21:34:24 rc1
21:34:33 when master is open for Kilo seems like the right cutoff to me
21:34:41 Ok, RC1 it is then
21:34:52 #note K specs will open at Juno RC1
21:34:56 dansmith: thouhts ^
21:35:00 thoughts
21:35:02 #note There might be a template update before that
21:35:09 sure, whatever
21:35:40 Ok, when we do that I also want to do some rearranging in the Juno directory
21:35:46 But we don't have to argue about it now
21:35:51 I mentioned it on the mailing list
21:35:59 Basically signalling what actually merged vs was approved
21:36:09 Probably with this new fangled directory technology
21:36:30 so there was a similar discussion about 6 months ago when we were setting up the specs system
21:36:40 I don't remember what the result was, but I think we talked about this
21:36:48 i remember talking about it :)
21:36:57 i don't remember how we ended where we are though
21:36:58 I think we can do something now based on how we've learned people use this information
21:37:11 It would be confusing to leave unimplemented things as approved
21:37:13 mikal: i think your proposal was good (differentiating what got done vs didn't)
21:37:21 It's totally paperwork
21:37:29 Let's not spend more than 5 minutes arguing about it
21:37:31 And just do something
21:37:36 i think keeping record of the state of a design as approved, even if not implemented, is good too
21:37:37 As in, it's totally not a big deal
21:37:49 like an "unfinished" or whatever you want to call it dir
21:37:52 i think you said that in your mail
21:37:55 Yeah, that's the idea
21:37:56 Cool
21:38:01 I like that someone agrees with me for once
21:38:04 yep +1 from me at least
21:38:04 You get a gold star
21:38:06 heh
21:38:13 #topic Summit session ideas etherpad
21:38:17 So, I mentioned this a second ago
21:38:20 But just in case...
21:38:29 #link https://etherpad.openstack.org/p/kilo-nova-summit-topics
21:38:38 The idea is to brainstorm what is important to discuss at the summit
21:38:44 Much like we did for the last couple of mid-cycles
21:38:52 And then prioritize that list in a bit once we've collected all the things
21:38:56 So, please check it out
21:39:01 And add stuff we've missed
21:39:12 Moving on...
21:39:16 #topic SRIOV patch set big issues
21:39:21 dansmith: you wanna run with this?
21:39:26 yeah, so,
21:39:41 there is code up for the SRIOV thing, which I've been reviewing lately
21:39:58 and there are two things that really concern me about it, but I know it's important for a lot of folks
21:40:33 one is the fact that it adds another thing to the neutron megatuple, which further complicates that code and confuses the issue in the non-neutron code by mangling that set of tuples late
21:40:39 I've got patches up to refactor that:
21:40:39 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:refactor-megatuple,n,z
21:40:50 nice work on that refactor
21:40:58 i've started reviewing it ...
was in meeting hell most of the day though
21:40:59 which hopefully we can agree is an improvement and get those in ahead of the SRIOV patches
21:41:12 however, the second problem concerns me a lot more, and isn't quite as easy to solve
21:41:15 that series is nearly ready
21:41:22 dansmith: many are +A already it seems
21:41:36 mikal: yes, we've been working hard over the last couple days to get that in place in short order,
21:41:42 much thanks to mriedem for quick turnaround reviews
21:42:05 the second problem is that the PCI stuff stashes some details in a dict structure, which it serializes to JSON,
21:42:12 and stores into a system_metadata key
21:42:20 this is bad because those are limited to 255 characters,
21:42:39 so if all of this generated data becomes >255 characters when serialized, the instance boot breaks in a strange way,
21:42:44 and in a way the user can't really do anything about
21:42:56 Which seems not great
21:43:09 the existing implementation is really at fault, and needs to be changed, but adding more things to there just gets us closer to breaking at 255 chars for more common situations
21:43:12 thoughts on just punting this to Kilo. we have no shortage of important things
21:43:12 How likely is it to be too long?
21:43:18 Is it easy to do?
21:43:40 mikal: I think that with a couple of PCI devices you could overrun it, yeah, depending on how things fall
21:44:02 would refactoring to use something like the new instance_extra table be enough?
21:44:10 like, if you wanted a GPU and an inbound/outbound NIC, you'd be over I think
21:44:17 right, so that is one option,
21:44:17 Ugh
21:44:32 is stash this alongside the numa stuff in that new text area,
21:44:41 which should presumably be doable without much trouble,
21:44:56 and hopefully objects can help us migrate the actual data live
21:45:05 like, pull it from both places on read, but just save to the new location, etc
21:45:19 Makes sense to me
21:45:21 a full system_metadata refactor seems unreasonable given time frame, i guess
21:45:28 but maybe that's a sensible middle ground?
21:45:43 so, I bring this up because (1) I think it's not something we should merge in its current form and (2) we need quick agreement on if/how we are going to solve it
21:45:49 A system_metadata schema migration would likely hurt deployers too
21:45:52 we could punt, but lots of people want this
21:45:59 mikal: right, we don't want to do that now I think.. too late
21:46:02 dansmith: lots of people want lots of things
21:46:08 dansmith: agreed
21:46:22 jogo: I know, I'm not saying punting is off the table, I'm saying we need to decide
21:46:26 jogo: well, if instance_extra is trivial for them to do, I think we should at least give them the chance to do that
21:46:27 jogo: but there are people willing to do this work, so seems reasonable to let it get done, right?
21:46:47 jogo: I've already done a lot to make it work on the neutronapi side in the last 48 hours,
21:46:51 Could we charge them for the refactor in bug fixes? (joking)
21:47:03 so I kinda don't want it to be for naught, but if there is no support, then we should punt
21:47:22 Sounds like jogo is the only one who prefers punting here?
21:47:26 dansmith: how about giving them a window
21:47:26 Any other takers?
21:47:34 jogo: that would be fair
21:47:37 jogo: feature freeze?
21:47:37 FF?
21:47:44 that's what FF is for right?
:)
21:47:45 so give them x days to get this into a good state
21:47:53 russellb: agreed
21:47:54 russellb: well yeah
21:47:57 meaning it has to be *up* before FF?
21:48:05 and that we'll give this an FFE to get merged?
21:48:16 dansmith: I think so
21:48:19 okay
21:48:23 We shouldn't punish them for our review bandwidth problems
21:48:24 the other way to look at this is:
21:48:28 instead of this just being more feature,
21:48:37 this actually makes our existing PCI stuff less stupid, IMHO
21:48:45 not much less, but, a little less
21:48:53 Well, "more reliable" might be the marketing way of saying that...
21:48:56 dansmith: that is a fair point
21:49:03 mikal: er, yeah, let's go with that one :)
21:49:08 LOL
21:49:14 Ok, sounds like we have a plan?
21:49:16 "make it suck less"
21:49:27 * mikal preps a #note
21:49:44 on the plus side, megatuple will be gone for a big chunk of shitty stuff
21:49:54 regardless
21:50:02 yeah, death to the megatuple is goodness
21:50:05 makes pci less sucky, and neutronapi less sucky
21:50:16 #note Ask SRIOV people to refactor PCI pass through to use instance_extra instead of system_metadata to avoid hitting the 255 chars limit. Dan Smith to provide guidance. Deadline for refactor is feature freeze.
21:50:20 it's a small dent in neutronapi's overall suckiness, but..
21:50:21 ++ to less badness
21:50:22 ^-- a fair summary?
21:50:28 yep
21:50:33 Done
21:50:42 works for me
21:50:45 Nothing else on this?
21:51:02 #topic Open discussion
21:51:07 You have nine minutes, use them wisely
21:51:45 Really, nothing?
21:51:51 OMG, early mark?
21:51:59 I plan on revving https://review.openstack.org/#/c/116699 and related today
21:52:07 the blueprints for kilo page
21:52:19 "Add plan for kilo blueprints: when is a blueprint needed"
21:52:30 jogo: sounds good
21:52:31 and the next patch about priorities
21:52:50 Nothing else?
21:52:57 mikal: I have a question
21:53:02 dansmith: go for it
21:53:07 mikal: may I have an early mark?
21:53:14 dansmith: yes
21:53:19 heh
21:53:20 Mostly because you didn't make fun of the phrase...
21:53:27 Which I insist is English
21:53:28 * dansmith has no idea what he has just got
21:53:33 Sounds like we're done
21:53:47 o/
21:54:00 Is that you waving goodbye or asking a question?
21:54:09 http://english.stackexchange.com/questions/97013/is-early-mark-only-used-in-australia-and-new-zealand
21:54:39 #endmeeting
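
Editor's note on the SRIOV topic: the approach agreed in the 21:50:16 #note (keep PCI details out of the 255-character system_metadata values, read from both places, write only to the new one) can be illustrated roughly as follows. This is only a sketch under stated assumptions, not nova's code: InstanceRecord, the "pci_requests" key, and the three helper functions are hypothetical names invented for illustration, and the only figure taken from the meeting is the 255-character limit.

```python
import json

# Limit on a system_metadata value, as quoted in the meeting.
SYSTEM_METADATA_VALUE_LIMIT = 255


def fits_in_system_metadata(pci_requests):
    """Return True if the JSON-serialized requests fit in one value."""
    return len(json.dumps(pci_requests)) <= SYSTEM_METADATA_VALUE_LIMIT


class InstanceRecord:
    """Hypothetical stand-in for an instance with two storage locations."""

    def __init__(self, system_metadata=None, extra=None):
        # Legacy location: short string values only.
        self.system_metadata = dict(system_metadata or {})
        # New location: stands in for a free-form text column.
        self.extra = dict(extra or {})


def load_pci_requests(record):
    """Read from the new location first, then fall back to the legacy one."""
    raw = record.extra.get("pci_requests")
    if raw is None:
        raw = record.system_metadata.get("pci_requests")
    return json.loads(raw) if raw is not None else []


def save_pci_requests(record, pci_requests):
    """Always write to the new location, so data migrates as it is saved."""
    record.extra["pci_requests"] = json.dumps(pci_requests)
```

Because reads prefer the new location, a record written by older code stays readable, and the first save after an upgrade moves its data without a schema migration of the legacy store.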