17:00:58 <sdague> #startmeeting qa
17:00:59 <openstack> Meeting started Thu Oct 24 17:00:58 2013 UTC and is due to finish in 60 minutes. The chair is sdague. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:02 <openstack> The meeting name has been set to 'qa'
17:01:07 <sdague> who's here?
17:01:10 <afazekas> O/
17:01:14 <mtreinish> I am here
17:01:14 <mlavalle> sdague: hi
17:01:17 <maurosr> o/
17:01:19 <Anju> hi
17:01:19 <dkranz> Here
17:01:20 <adalbas> here
17:01:32 <sdague> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting Agenda
17:01:44 <sdague> #topic Design Summit Schedule (sdague)
17:01:53 <mkoderer> hi *
17:01:58 <sdague> #link http://icehousedesignsummit.sched.org/
17:02:20 <mkoderer> sdague: I got my final approval
17:02:30 <sdague> so the summit schedule is pushed, I figured we'd take a minute to figure out if there were any last minute sessions that we really need, and that I need to adjust for
17:02:49 <sdague> otherwise, I'm pretty happy with how the schedule played out, and I think there will be a lot of good meat there
17:03:16 <dkranz> sdague: lgtm
17:03:18 <mkoderer> looks good for me
17:03:45 <mtreinish> sdague: nice pun?
17:03:55 <sdague> :)
17:04:16 <sdague> I guess I asked before, but who all is going to be there?
17:04:18 <sdague> o/
17:04:23 <sdague> just to get a sense of things
17:04:25 <mtreinish> I will
17:04:38 <mkoderer> I will be there
17:04:49 <Anju> sdague: I will also
17:04:58 <dkranz> I will
17:05:01 <afazekas> o/
17:05:11 <sdague> cool, the gang will all be there :)
17:05:18 <sdague> ok, next topic
17:05:33 <sdague> #topic Neutron job status (mtreinish)
17:05:41 <mtreinish> oh this is a topic
17:05:57 <mtreinish> ok so you may have noticed a new gating job on tempest neutron-pg-isolated
17:05:59 <sdague> I figured you've got the most recent knowledge on that
17:06:16 <mlavalle> sdague: I also have things to report regarding neutron
17:06:25 <mtreinish> that is the same as the regular neutron job just with tenant isolation enabled (also with a postgres db)
17:06:32 <sdague> mlavalle: cool, jump in
17:06:45 <mlavalle> i'll wait for mtreinish to finish
17:06:52 <mtreinish> yesterday I broke the neutron gate by increase the number of tests that have isolation enabled
17:07:16 <mtreinish> it exposed another real bug in neutron
17:07:29 <jog0> what does isolated mean?
17:07:48 <mtreinish> jog0: it creates a separate tenant and user for each test class
17:07:58 <mtreinish> and with neutron makes an separate network for each tenant
17:08:15 <jog0> mtreinish: thought so, thanks. and that makes neutron fail more?
17:08:19 <mtreinish> yep
17:08:39 <jog0> strange
17:08:55 <mtreinish> so the job was added to fix the asymmetry between the neutron gate and the tempest gate
17:08:55 <sdague> there looks like there is some resource starvation that's happening
17:08:57 <mlavalle> mtreinish; the good thing about this isolation code is that we are really putting neutron through its paces
17:09:15 <mtreinish> so we can catch these issues without me breaking the neutron gate to do it
17:09:35 <dkranz> mtreinish: ++
17:09:58 <sdague> mtreinish: and you tripped another deadlock, right?
17:10:21 <mtreinish> nati_uen_: thought so, but I did a logstash query this morning and it wasn't a 1:1 match up with the tfail
17:10:48 <mtreinish> this is nati_uen_ etherpad with debug notes: https://etherpad.openstack.org/p/debug1243726
17:11:19 <mtreinish> and bug 1243726 was opened for the issue
17:11:21 <uvirtbot> Launchpad bug 1243726 in neutron "tempest failure: No more IP addresses available on network" [Critical,Confirmed] https://launchpad.net/bugs/1243726
17:11:32 <sdague> #link https://etherpad.openstack.org/p/debug1243726 etherpad for debugging tenant isolation
17:11:50 <sdague> mtreinish: you know if nati_uen_ is still working the issue?
17:11:59 <mtreinish> I think so
17:12:08 <mlavalle> mtreinish: yeah, that bug is consistent with what I find in my dev system
17:13:11 <sdague> mlavalle: ok, great. Are there other things you have to report on it?
17:13:18 <sdague> or on other issues here?
17:13:32 <mlavalle> sdague: I've been working on debugging https://bugs.launchpad.net/swift/+bug/1224001
17:13:34 <uvirtbot> Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [Critical,Fix released]
17:13:58 <mlavalle> sdague: the nature of the failure has changed in the log stash since the last fix to neutron
17:14:09 <mlavalle> it now is mostly ping failures
17:14:23 <mtreinish> mlavalle: do we need to change the elastic recheck query?
17:14:33 <mtreinish> or open a new bug?
17:15:18 <mlavalle> i can reproduce in my dev system and will continue debugging. I will use this as an opportunity to develop some of the tcpdump stuff we talked about last week
17:15:27 <sdague> mlavalle: do you have a new recheck query for it?
17:15:39 <sdague> that would be good to change so we can categorize it
17:15:50 <mlavalle> sdague: i will soon
17:16:17 <mlavalle> mtreinish: no for the time being. but i will ping you in irc if i think we should do it
17:16:31 <mtreinish> mlavalle: ok
17:16:41 <mlavalle> that's all i have
17:17:25 <mlavalle> by the way, i'm not going to HKG but want somehow to be part of the neutron conversation :-(
17:17:25 <sdague> ok, great
17:17:51 <sdague> mlavalle: ok... I'm not sure how we do that, but we'll at least try to have a solid etherpad in advance
17:18:14 <sdague> #topic Tempest config file naming conventions and reorg (mtreinish)
17:18:24 <sdague> ok, mtreinish yours again
17:18:32 <mtreinish> I thought this was at the bottom
17:18:32 <mlavalle> sdague: that's great, thanks. I just the team know that i'm committed to this effort
17:18:46 <dkranz> mtreinish: Doesn't matter, go ahead
17:19:05 <mtreinish> so this week I've been going through the config file and changing the grouping around and trying to update the naming to be consistent
17:19:17 <mkoderer> mtreinish: +1
17:19:40 <mtreinish> I want to start adding options for every extension and extra feature we're testing
17:19:47 <mtreinish> instead of just assuming that they are enabled
17:20:08 <mtreinish> but sdague brought up the good point of how we handle that with multiple api versions for the same extension
17:20:20 <mtreinish> like the nova api v3
17:21:03 <mtreinish> so does anyone has any input on what are config strategy should be for this kind of thing?
17:21:29 <mtreinish> I was thinking for extensions with multiple versions we make it a string instead of a bool option to specify which versions are enabled
17:21:46 <mtreinish> obviously this is only a transient issue because eventually the old api version will be deprecated
17:21:52 <dkranz> mtreinish: This is going to be pretty ugly a year or two from now
17:21:53 <sdague> so we also have the issue of configuring this from devstack
17:22:08 <sdague> because devstack really has no idea, as the way nova works is that everything is loaded by default
17:22:32 <dkranz> Coudn't we have some way to "opt-out" of extensions?
17:22:40 <mtreinish> sdague: so I'm fine for defaulting everything true in the sample conf that will work around the devstack issue
17:22:52 <dkranz> Realistically, installations are going to have most enabled, if not aoo.
17:22:54 <sdague> yeh, that just seems like a huge number of options, easy to get wrong
17:22:55 <mtreinish> and I'm working on the config verification script for people who are manually configuring tempest
17:23:11 <mtreinish> which will do the api querying to figure out what is enabled
17:23:12 <sdague> what if we had a list option
17:23:28 <sdague> computev2 = blah,foo,bar
17:23:35 <dkranz> sdague: of excluded extensions?
17:23:38 <sdague> and 'all' is a special value
17:23:39 <mtreinish> that list will get pretty long
17:23:55 <dkranz> mtreinish: Not if it means exclusion
17:23:56 <maurosr> using something similar to what we do in nova policy file wouldn't work?
17:24:02 <sdague> dkranz: so, again, with nova, the minute you specify extensions, you specify them all
17:24:11 <sdague> there isn't an exclude
17:24:28 <sdague> so doing the math becomes interesting
17:24:34 <dkranz> sdague: So you either have all extensions enabled or none?
17:24:46 <sdague> either all, or the list you provide
17:25:09 <sdague> in v3 it's different, because of entry poitns
17:25:19 <dkranz> sdague: I was talking about exclusion only in the tempest config
17:25:34 <mtreinish> maurosr: do you have a link?
17:25:40 <dkranz> So tempest would assume enabled unless mentioned
17:25:51 <dkranz> which would also handle the devstack case
17:25:52 <sdague> dkranz: right, but that would mean you have to figure our that nova added a new extension that you didn't know about
17:26:07 <sdague> because you actually need to compute the diff
17:26:29 <dkranz> but if you didn't know about it you would be running with it, unless the default was disabled
17:26:56 <dkranz> I think I may be too ignorant about this so will be quiet
17:27:06 <sdague> heh, no think of it this way :)
17:27:15 <sdague> avail extensions: a, b, c, d, e
17:27:21 <sdague> nova loads: a, b
17:27:24 <maurosr> mtreinish: https://github.com/openstack/nova/blob/master/etc/nova/policy.json of course just the model, the idea would be enable extensions or not instead of privilege level
17:27:31 <sdague> tempest exclusion for: c, d, e
17:27:40 <sdague> now nova adds ext f
17:27:45 <sdague> and your validation break
17:27:56 <sdague> because f isn't excluded from your tempest config
17:28:01 <sdague> but it's not enabled in nova
17:28:13 <dkranz> sdague: I just did not realize that these extensions in nova were by default opt-in
17:28:17 <sdague> so if we are building a list, it should be in the same order as the services
17:28:25 <sdague> dkranz: well, it's weird
17:28:29 <mtreinish> maurosr: that's basically what I'm proposing except instead of doubling up for the v3 extensions make it a string which specifies the versions
17:28:31 <sdague> it's all in, or explicit it
17:28:55 <dkranz> sdague: I don't think that matches the real usage model at all, which will mostly be "in" not "out"
17:28:57 <sdague> explicit in
17:29:01 <dkranz> sdague: but oh well
17:29:11 <sdague> yeh, the way it is
17:29:35 <sdague> mtreinish: so is there an oslo config type that would let us do this with lists (that could be multi line)?
17:29:40 <sdague> instead of lots of options?
17:29:52 <mtreinish> sdague: there is ListOpt
17:29:59 <sdague> I think it's at least exploring how terrible that patch would be
17:30:16 <sdague> because a ton of boolean options feels weird to me
17:30:33 <sdague> nova v2 is 70 extensions I think
17:30:40 <mtreinish> sdague: we use it for logging right now: https://git.openstack.org/cgit/openstack/tempest/tree/etc/tempest.conf.sample#n13
17:30:45 <mlavalle> sdague: I have to run to another meeting. See you in openstack-qa
17:30:50 <sdague> mlavalle: sure
17:31:13 <mtreinish> sdague: yeah that's fair
17:31:16 <sdague> mtreinish: sure, but that's a much smaller list
17:31:26 <dkranz> mtreinish: Could we have the option just point to a policy file, or wherever the "in" is defined?
17:31:41 <sdague> I guess dkranz's exclude approach would be good as well
17:31:53 <dkranz> Then the conf would not have to be updated all the time.
17:31:55 <sdague> from brevity, though we know it would cause issues
17:32:03 <sdague> dkranz: the policy file is not network accessible
17:32:15 <mtreinish> sdague: that approach just makes my verification script more difficult
17:32:25 <mtreinish> sdague: I think he's saying break out this into a separate file
17:32:26 <dkranz> sdague: I meant "get a copy from the cloud you are running against with tempest"
17:32:27 <sdague> mtreinish: excludes... yeh
17:32:39 <sdague> dkranz: so the problem is, you might not be able to do that
17:32:56 <sdague> if I want to run tempest against hp cloud to figure out if it's really openstack, I can't get their policy file
17:33:09 <dkranz> sdague: Why not, or at least a sanitized subset with just what we care about?
17:33:24 <dkranz> sdague: Surely the implemented extensions is public?
17:33:36 <sdague> dkranz: but not the policy file
17:33:47 <dkranz> sdague: I am going for DRY really
17:33:59 <dkranz> But perhaps it is not possible
17:34:16 <sdague> and the reason we're going down this path, vs. trusting list_extensions, is to be explicit
17:34:33 <dkranz> sdague: I understand
17:34:50 <sdague> mtreinish: ok, so how about explicit and "all"
17:34:57 <sdague> as a list option
17:35:15 <mtreinish> sdague: sure I can do that
17:35:23 <sdague> lets see how bad it is
17:35:41 <mtreinish> well we'll never see how bad it will get because we only run it as all :)
17:35:52 <mtreinish> and not 60 of 70 extensions
17:36:07 <sdague> well, someone else will tell us how bad it is
17:36:34 <sdague> ok, lets move on
17:36:39 <sdague> #topic Scope and place for performance testing such as Rally (dkranz)
17:36:53 <dkranz> So there was a discussion about this on the ml
17:37:14 <dkranz> I just wanted to get a feel of whether we think performance testing should ever be part of tempest
17:37:50 <dkranz> I could go either way
17:38:02 <mtreinish> dkranz: I think that's a good idea or at least the part of it that's actually exercising things
17:38:31 <mkoderer> dkranz: I like the idea
17:38:41 <dkranz> mtreinish: Right, but who will do this work?
17:38:58 <dkranz> mtreinish: If it is not done soon, and people like rally, it will get harder and harder.
17:39:24 <sdague> dkranz: I think the point was letting the rally folks know that we'll like that part in tempest
17:39:42 <sdague> they presumably already were going to do that work, so just that this is the place it should happen
17:40:01 <dkranz> sdague: That works for me, but I was not sure they intended to do that
17:40:06 <sdague> I think the community spoke up pretty strongly about not wanting another load driver out there
17:40:30 <dkranz> sdague: ok, so we will see it percolate a bit
17:40:38 <sdague> dkranz: yes, it remains unclear to me either, but it also was clear they wanted to be part of the gate, and I don't think that will happen if they remain split off doing their own thing
17:40:45 <dkranz> sdague: perhaps there could be some informal discussion at the summit
17:40:50 <sdague> sure
17:41:10 <dkranz> sdague: Agreed about the gate. BUt that is sketchy for the real value
17:41:17 <dkranz> sdague: Even more so than for stress tests
17:41:45 <sdague> dkranz: I think it's a hard problem, but I don't want to completely give up on it yet
17:42:00 <dkranz> sdague: ok let's discuss at summit
17:43:19 <sdague> #topic Status and roll-out plan for failing the gate on log errors (dkranz)
17:43:22 <sdague> next topic
17:43:28 <sdague> all you dkranz
17:43:39 <dkranz> So there was more contention about this on the ml than I expected
17:44:40 <dkranz> I'm not sure how to proceed. I think my case was convincing.
17:45:13 <dkranz> A lot of folks seem to not get that if we allow crap in logs, no one will look at them and that is really bad.
17:45:28 <sdague> dkranz: I don't think it was that contentious
17:45:54 <sdague> honestly, I think the current whitelist approach is fine, and I expect there might be just a few error conditions that we negotiate over at the end
17:45:55 <dkranz> sdague: So if we say we are going to start to fail non-whitelisted errors there will be no objection?
17:46:13 <sdague> dkranz: I think so
17:46:19 <dkranz> sdague: Great, if that is true.
17:46:38 <sdague> jgriffith already went and changed a couple of error conditions in cinder because of the conversation
17:46:46 <dkranz> sdague: ok, cool
17:46:49 <dkranz> sdague: next
17:47:06 <sdague> it looks like there are a couple more that need to be whitelisted out of nova network
17:47:13 <sdague> from the last time I looked at logs
17:47:27 <dkranz> sdague: I am re-watching now
17:47:37 <dkranz> sdague: You can probably imagine how painful this is.
17:47:54 <dkranz> sdague: But I will push through, The end is in sight.
17:48:50 <sdague> cool :)
17:48:56 <dkranz> next topic?
17:49:00 <sdague> yep
17:49:03 <sdague> #topic State of 'smoke' tagging: can we make it useful? (dkranz)
17:49:10 <sdague> it's the dkranz show :)
17:49:14 <dkranz> :)
17:49:26 <sdague> and, honestly, I'm super excited for the whitelist error stuff to hit
17:49:34 <dkranz> So it came up that the current smoke tagging is pretty arbitrary.
17:49:52 <dkranz> We want a set of tests that can run in 5-10 minutes that cover the most ground
17:50:02 <mtreinish> dkranz: right now it's only used for what runs in grenade and the neutron jobs
17:50:18 <mkoderer> we should get rid of all negative test flagged as smoke
17:50:21 <dkranz> mtreinish: Right, but that was not the intent
17:50:27 <sdague> yeh, so honestly, I think we should just dump smoke and have our smoke target be all the non-slow scenario tests
17:50:35 <mkoderer> I don't see any reason for negative tests that are smoke tests
17:50:45 <sdague> mkoderer: +1 agree with that
17:51:01 <dkranz> sdague: That is reasonable if we have the right coverage.
17:51:15 <dkranz> sdague: Certainly the scenario tests *should* have enough coverage.
17:51:31 <mtreinish> sdague: that was the future intent for grenade right, to just run scenario (and increase the scenario coverage)
17:51:39 <mtreinish> and for neutron we want that to be running full anyway
17:51:42 <sdague> dkranz: agreed, actually I'm hoping we can talk about that in - http://icehousedesignsummit.sched.org/event/1a28654a7e05217067ded2bacbfa7484
17:51:46 <dkranz> So that works for me
17:52:20 <sdague> mtreinish: yeh
17:52:26 <dkranz> except that as we go forward we will have more non-slow scenario tests than can run in 5 minutes
17:52:31 <afazekas> IMHO several auth* related test should be smoke even if its a negative test
17:53:07 <sdague> so maybe the negative test discussion at summit, and the scenario test discussion will flesh this out
17:53:14 <dkranz> sdague: ok
17:53:35 <dkranz> Perhaps after neutron is working we can re-use smoke to mean what it should.
17:53:48 <sdague> dkranz: yeh, that would be nice
17:53:49 <dkranz> Then we can run 'smoke' scenario and api
17:53:52 <mkoderer> afazekas: could be that there are some exceptions..
17:54:11 <dkranz> I think that is he right answer for what we want.
17:54:23 <sdague> sure, I guess we could do that, tag some representative scenario tests and a few others we think are important
17:54:30 <dkranz> sdague: Exactly
17:54:53 <sdague> but in reality the API tests feel like they are largely a different class, and each very small, so being in smoke isn't quite right
17:55:01 <sdague> but... a summit discussion
17:55:05 <sdague> also possibly with beer :)
17:55:12 <dkranz> sdague: Definitely
17:55:22 <dkranz> sdague: That's all from me
17:55:23 * mkoderer not sure if the beer tastes good in HK
17:55:33 <sdague> I will have to say the beer during summit sessions in san diego was a great idea
17:55:35 <dkranz> mkoderer: Bring some!
17:55:49 <sdague> mkoderer: I'm sure they have importers :)
17:55:52 <sdague> ok
17:55:59 <mkoderer> ok :)
17:56:00 <sdague> #topic Open Discussion
17:56:05 <sdague> anything else?
17:56:29 <sdague> are people going to be around next week, or are they starting to travel by that point?
17:56:40 <dkranz> sdague: I will be on a plane next Thur so may or may not have connectivity for the meeting
17:56:46 <sdague> mtreinish and I are flying on Fri
17:56:55 <sdague> others?
17:57:06 <mkoderer> I leave on Sunday
17:57:13 <mtreinish> dkranz, sdague: doesn't that depend on your reference point? (and departure time)
17:57:38 <jog0> https://review.openstack.org/#/c/53699/ needs review to unblock stable/havana
17:57:56 <jog0> I had a commit message typo but zuul said it was working
17:58:06 <mtreinish> jog0: why don't you cherry pick the version of that in master?
17:58:12 <sdague> heh
17:58:36 <jog0> mtreinish: that is a seperate patch https://review.openstack.org/#/c/51041/
17:58:45 <jog0> this is a bigger issue though
17:58:51 <sdague> dkranz: ok, can you +2 - https://review.openstack.org/#/c/52413/1 first?
17:59:00 <dkranz> sdague: Yeah, just a sec
17:59:05 <mtreinish> oh crap, that's going to be a merge conflict
17:59:13 <mtreinish> no there was patch bumping six on master
17:59:16 <mtreinish> that's been merged
17:59:32 <mtreinish> jog0: https://git.openstack.org/cgit/openstack/tempest/commit/?id=c0441be3d7f994998779054991214242c5005877
18:00:01 <dkranz> sdague: I did it
18:00:15 <sdague> ok, we need to give up the slot, lets take this to -qa
18:00:20 <sdague> #endmeeting
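
A minimal sketch of the list option agreed on under the config file topic (`computev2 = blah,foo,bar`, with `all` as a special value), plus sdague's a/b/c/d/e/f example of why an exclude list drifts. Assumptions, none of which come from the meeting: the `[extensions]` section name, the `computev3` option, and the helper names `load_enabled`, `is_enabled` and `wrongly_assumed` are all hypothetical. The log mentions oslo.config's ListOpt as the likely mechanism; the sketch uses the standard library `configparser` instead so that it runs on its own.

```python
import configparser

# Hypothetical tempest-style config. Only "computev2 = blah,foo,bar" and the
# special value "all" come from the discussion; the section name and the
# computev3 option are assumptions for illustration.
SAMPLE_CONF = """
[extensions]
computev2 = all
computev3 = blah, foo,
    bar
"""


def load_enabled(conf_text, section="extensions"):
    """Return {api_name: frozenset of extension names} from a config string."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    enabled = {}
    for api, raw in parser.items(section):
        # Accept both commas and continuation lines as separators.
        names = (part.strip() for part in raw.replace("\n", ",").split(","))
        enabled[api] = frozenset(name for name in names if name)
    return enabled


def is_enabled(enabled, api, extension):
    """True if the extension is listed explicitly or the list contains 'all'."""
    listed = enabled.get(api, frozenset())
    return "all" in listed or extension in listed


def wrongly_assumed(known, excluded, loaded):
    """Extensions an exclude-list config would assume are enabled
    although the service did not load them."""
    return (set(known) - set(excluded)) - set(loaded)


if __name__ == "__main__":
    enabled = load_enabled(SAMPLE_CONF)
    print(is_enabled(enabled, "computev2", "anything"))
    print(is_enabled(enabled, "computev3", "bar"))
    print(is_enabled(enabled, "computev3", "baz"))
    # sdague's example: the service loads a and b, the config excludes
    # c, d and e, and then f is added upstream.
    print(wrongly_assumed("abcdef", "cde", "ab"))
```

With an explicit list, a newly added extension stays untested until someone names it. With an exclude list, the same new extension is silently assumed to be enabled, which is the validation break described in the discussion.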