17:00:58 #startmeeting qa
17:00:59 Meeting started Thu Oct 24 17:00:58 2013 UTC and is due to finish in 60 minutes. The chair is sdague. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:00 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:02 The meeting name has been set to 'qa'
17:01:07 who's here?
17:01:10 O/
17:01:14 I am here
17:01:14 sdague: hi
17:01:17 o/
17:01:19 hi
17:01:19 Here
17:01:20 here
17:01:32 #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting Agenda
17:01:44 #topic Design Summit Schedule (sdague)
17:01:53 hi *
17:01:58 #link http://icehousedesignsummit.sched.org/
17:02:20 sdague: I got my final approval
17:02:30 so the summit schedule is pushed, I figured we'd take a minute to figure out if there were any last minute sessions that we really need, and that I need to adjust for
17:02:49 otherwise, I'm pretty happy with how the schedule played out, and I think there will be a lot of good meat there
17:03:16 sdague: lgtm
17:03:18 looks good for me
17:03:45 sdague: nice pun?
17:03:55 :)
17:04:16 I guess I asked before, but who all is going to be there?
17:04:18 o/
17:04:23 just to get a sense of things
17:04:25 I will
17:04:38 I will be there
17:04:49 sdague: I will also
17:04:58 I will
17:05:01 o/
17:05:11 cool, the gang will all be there :)
17:05:18 ok, next topic
17:05:33 #topic Neutron job status (mtreinish)
17:05:41 oh this is a topic
17:05:57 ok so you may have noticed a new gating job on tempest neutron-pg-isolated
17:05:59 I figured you've got the most recent knowledge on that
17:06:16 sdague: I also have things to report regarding neutron
17:06:25 that is the same as the regular neutron job just with tenant isolation enabled (also with a postgres db)
17:06:32 mlavalle: cool, jump in
17:06:45 i'll wait for mtreinish to finish
17:06:52 yesterday I broke the neutron gate by increasing the number of tests that have isolation enabled
17:07:16 it exposed another real bug in neutron
17:07:29 what does isolated mean?
17:07:48 jog0: it creates a separate tenant and user for each test class
17:07:58 and with neutron makes a separate network for each tenant
17:08:15 mtreinish: thought so, thanks. and that makes neutron fail more?
17:08:19 yep
17:08:39 strange
17:08:55 so the job was added to fix the asymmetry between the neutron gate and the tempest gate
17:08:55 it looks like there is some resource starvation that's happening
17:08:57 mtreinish: the good thing about this isolation code is that we are really putting neutron through its paces
17:09:15 so we can catch these issues without me breaking the neutron gate to do it
17:09:35 mtreinish: ++
17:09:58 mtreinish: and you tripped another deadlock, right?
17:10:21 nati_uen_: thought so, but I did a logstash query this morning and it wasn't a 1:1 match up with the tfail
17:10:48 this is nati_uen_'s etherpad with debug notes: https://etherpad.openstack.org/p/debug1243726
17:11:19 and bug 1243726 was opened for the issue
17:11:21 Launchpad bug 1243726 in neutron "tempest failure: No more IP addresses available on network" [Critical,Confirmed] https://launchpad.net/bugs/1243726
17:11:32 #link https://etherpad.openstack.org/p/debug1243726 etherpad for debugging tenant isolation
17:11:50 mtreinish: you know if nati_uen_ is still working the issue?
17:11:59 I think so
17:12:08 mtreinish: yeah, that bug is consistent with what I find in my dev system
17:13:11 mlavalle: ok, great. Are there other things you have to report on it?
17:13:18 or on other issues here?
17:13:32 sdague: I've been working on debugging https://bugs.launchpad.net/swift/+bug/1224001
17:13:34 Launchpad bug 1224001 in neutron "test_network_basic_ops fails waiting for network to become available" [Critical,Fix released]
17:13:58 sdague: the nature of the failure has changed in logstash since the last fix to neutron
17:14:09 it now is mostly ping failures
17:14:23 mlavalle: do we need to change the elastic recheck query?
17:14:33 or open a new bug?
17:15:18 i can reproduce in my dev system and will continue debugging. I will use this as an opportunity to develop some of the tcpdump stuff we talked about last week
17:15:27 mlavalle: do you have a new recheck query for it?
17:15:39 that would be good to change so we can categorize it
17:15:50 sdague: i will soon
17:16:17 mtreinish: not for the time being. but i will ping you in irc if i think we should do it
17:16:31 mlavalle: ok
17:16:41 that's all i have
17:17:25 by the way, i'm not going to HKG but want somehow to be part of the neutron conversation :-(
17:17:25 ok, great
17:17:51 mlavalle: ok...
I'm not sure how we do that, but we'll at least try to have a solid etherpad in advance
17:18:14 #topic Tempest config file naming conventions and reorg (mtreinish)
17:18:24 ok, mtreinish yours again
17:18:32 I thought this was at the bottom
17:18:32 sdague: that's great, thanks. I just want the team to know that i'm committed to this effort
17:18:46 mtreinish: Doesn't matter, go ahead
17:19:05 so this week I've been going through the config file and changing the grouping around and trying to update the naming to be consistent
17:19:17 mtreinish: +1
17:19:40 I want to start adding options for every extension and extra feature we're testing
17:19:47 instead of just assuming that they are enabled
17:20:08 but sdague brought up the good point of how we handle that with multiple api versions for the same extension
17:20:20 like the nova api v3
17:21:03 so does anyone have any input on what our config strategy should be for this kind of thing?
17:21:29 I was thinking for extensions with multiple versions we make it a string instead of a bool option to specify which versions are enabled
17:21:46 obviously this is only a transient issue because eventually the old api version will be deprecated
17:21:52 mtreinish: This is going to be pretty ugly a year or two from now
17:21:53 so we also have the issue of configuring this from devstack
17:22:08 because devstack really has no idea, as the way nova works is that everything is loaded by default
17:22:32 Couldn't we have some way to "opt-out" of extensions?
17:22:40 sdague: so I'm fine with defaulting everything to true in the sample conf, that will work around the devstack issue
17:22:52 Realistically, installations are going to have most enabled, if not all.
17:22:54 yeh, that just seems like a huge number of options, easy to get wrong
17:22:55 and I'm working on the config verification script for people who are manually configuring tempest
17:23:11 which will do the api querying to figure out what is enabled
17:23:12 what if we had a list option
17:23:28 computev2 = blah,foo,bar
17:23:35 sdague: of excluded extensions?
17:23:38 and 'all' is a special value
17:23:39 that list will get pretty long
17:23:55 mtreinish: Not if it means exclusion
17:23:56 using something similar to what we do in nova policy file wouldn't work?
17:24:02 dkranz: so, again, with nova, the minute you specify extensions, you specify them all
17:24:11 there isn't an exclude
17:24:28 so doing the math becomes interesting
17:24:34 sdague: So you either have all extensions enabled or none?
17:24:46 either all, or the list you provide
17:25:09 in v3 it's different, because of entry points
17:25:19 sdague: I was talking about exclusion only in the tempest config
17:25:34 maurosr: do you have a link?
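A minimal sketch of the list-option idea floated above (an explicit list of enabled extensions such as `computev2 = blah,foo,bar`, with 'all' as a special value). The helper names `parse_extension_list` and `is_extension_enabled` are hypothetical and exist only for illustration; this is not Tempest's actual configuration code.

```python
def parse_extension_list(raw):
    """Split a comma-separated option value into a list of extension names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def is_extension_enabled(name, enabled):
    """Return True if `name` is listed, or if the list contains 'all'."""
    return "all" in enabled or name in enabled


# Hypothetical option values, mirroring the "computev2 = blah,foo,bar" example.
computev2 = parse_extension_list("blah, foo, bar")
everything = parse_extension_list("all")

print(is_extension_enabled("foo", computev2))   # listed explicitly
print(is_extension_enabled("baz", computev2))   # not listed
print(is_extension_enabled("baz", everything))  # covered by 'all'
```

With this shape, the devstack case (everything loaded by default) would only need the single value 'all', while a manually configured deployment would list what it actually enables.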
17:25:40 So tempest would assume enabled unless mentioned
17:25:51 which would also handle the devstack case
17:25:52 dkranz: right, but that would mean you have to figure out that nova added a new extension that you didn't know about
17:26:07 because you actually need to compute the diff
17:26:29 but if you didn't know about it you would be running with it, unless the default was disabled
17:26:56 I think I may be too ignorant about this so will be quiet
17:27:06 heh, no think of it this way :)
17:27:15 avail extensions: a, b, c, d, e
17:27:21 nova loads: a, b
17:27:24 mtreinish: https://github.com/openstack/nova/blob/master/etc/nova/policy.json of course just the model, the idea would be enable extensions or not instead of privilege level
17:27:31 tempest exclusion for: c, d, e
17:27:40 now nova adds ext f
17:27:45 and your validation breaks
17:27:56 because f isn't excluded from your tempest config
17:28:01 but it's not enabled in nova
17:28:13 sdague: I just did not realize that these extensions in nova were by default opt-in
17:28:17 so if we are building a list, it should be in the same order as the services
17:28:25 dkranz: well, it's weird
17:28:29 maurosr: that's basically what I'm proposing except instead of doubling up for the v3 extensions make it a string which specifies the versions
17:28:31 it's all in, or explicit it
17:28:55 sdague: I don't think that matches the real usage model at all, which will mostly be "in" not "out"
17:28:57 explicit in
17:29:01 sdague: but oh well
17:29:11 yeh, the way it is
17:29:35 mtreinish: so is there an oslo config type that would let us do this with lists (that could be multi line)?
17:29:40 instead of lots of options?
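The a-through-f example above can be worked through as a small set computation. This is only a sketch of the reasoning in the discussion; the function name `unexpected_extensions` is hypothetical, and the extension names are the placeholder letters used in the meeting.

```python
def unexpected_extensions(available, tempest_excluded, nova_loaded):
    """Extensions an exclusion-based config expects but nova has not loaded."""
    expected = set(available) - set(tempest_excluded)
    return sorted(expected - set(nova_loaded))


nova_loaded = ["a", "b"]            # nova is given an explicit list
tempest_excluded = ["c", "d", "e"]  # tempest config lists only exclusions

# Before: the exclusion list exactly covers what nova does not load.
before = unexpected_extensions(["a", "b", "c", "d", "e"],
                               tempest_excluded, nova_loaded)

# After nova adds extension f: f is not excluded in tempest and not loaded
# in nova, so validation would report a mismatch.
after = unexpected_extensions(["a", "b", "c", "d", "e", "f"],
                              tempest_excluded, nova_loaded)

print(before)
print(after)
```

The mismatch appears without any change to either config file, which is the maintenance cost of an exclusion list when the service itself is configured with an explicit inclusion list.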
17:29:52 sdague: there is ListOpt
17:29:59 I think it's at least worth exploring how terrible that patch would be
17:30:16 because a ton of boolean options feels weird to me
17:30:33 nova v2 is 70 extensions I think
17:30:40 sdague: we use it for logging right now: https://git.openstack.org/cgit/openstack/tempest/tree/etc/tempest.conf.sample#n13
17:30:45 sdague: I have to run to another meeting. See you in openstack-qa
17:30:50 mlavalle: sure
17:31:13 sdague: yeah that's fair
17:31:16 mtreinish: sure, but that's a much smaller list
17:31:26 mtreinish: Could we have the option just point to a policy file, or wherever the "in" is defined?
17:31:41 I guess dkranz's exclude approach would be good as well
17:31:53 Then the conf would not have to be updated all the time.
17:31:55 for brevity, though we know it would cause issues
17:32:03 dkranz: the policy file is not network accessible
17:32:15 sdague: that approach just makes my verification script more difficult
17:32:25 sdague: I think he's saying break out this into a separate file
17:32:26 sdague: I meant "get a copy from the cloud you are running against with tempest"
17:32:27 mtreinish: excludes... yeh
17:32:39 dkranz: so the problem is, you might not be able to do that
17:32:56 if I want to run tempest against hp cloud to figure out if it's really openstack, I can't get their policy file
17:33:09 sdague: Why not, or at least a sanitized subset with just what we care about?
17:33:24 sdague: Surely the list of implemented extensions is public?
17:33:36 dkranz: but not the policy file
17:33:47 sdague: I am going for DRY really
17:33:59 But perhaps it is not possible
17:34:16 and the reason we're going down this path, vs.
trusting list_extensions, is to be explicit
17:34:33 sdague: I understand
17:34:50 mtreinish: ok, so how about explicit and "all"
17:34:57 as a list option
17:35:15 sdague: sure I can do that
17:35:23 let's see how bad it is
17:35:41 well we'll never see how bad it will get because we only run it as all :)
17:35:52 and not 60 of 70 extensions
17:36:07 well, someone else will tell us how bad it is
17:36:34 ok, let's move on
17:36:39 #topic Scope and place for performance testing such as Rally (dkranz)
17:36:53 So there was a discussion about this on the ml
17:37:14 I just wanted to get a feel of whether we think performance testing should ever be part of tempest
17:37:50 I could go either way
17:38:02 dkranz: I think that's a good idea or at least the part of it that's actually exercising things
17:38:31 dkranz: I like the idea
17:38:41 mtreinish: Right, but who will do this work?
17:38:58 mtreinish: If it is not done soon, and people like rally, it will get harder and harder.
17:39:24 dkranz: I think the point was letting the rally folks know that we'd like that part in tempest
17:39:42 they presumably already were going to do that work, so just that this is the place it should happen
17:40:01 sdague: That works for me, but I was not sure they intended to do that
17:40:06 I think the community spoke up pretty strongly about not wanting another load driver out there
17:40:30 sdague: ok, so we will see it percolate a bit
17:40:38 dkranz: yes, it remains unclear to me either, but it also was clear they wanted to be part of the gate, and I don't think that will happen if they remain split off doing their own thing
17:40:45 sdague: perhaps there could be some informal discussion at the summit
17:40:50 sure
17:41:10 sdague: Agreed about the gate.
But that is sketchy for the real value
17:41:17 sdague: Even more so than for stress tests
17:41:45 dkranz: I think it's a hard problem, but I don't want to completely give up on it yet
17:42:00 sdague: ok let's discuss at summit
17:43:19 #topic Status and roll-out plan for failing the gate on log errors (dkranz)
17:43:22 next topic
17:43:28 all you dkranz
17:43:39 So there was more contention about this on the ml than I expected
17:44:40 I'm not sure how to proceed. I think my case was convincing.
17:45:13 A lot of folks seem to not get that if we allow crap in logs, no one will look at them and that is really bad.
17:45:28 dkranz: I don't think it was that contentious
17:45:54 honestly, I think the current whitelist approach is fine, and I expect there might be just a few error conditions that we negotiate over at the end
17:45:55 sdague: So if we say we are going to start to fail non-whitelisted errors there will be no objection?
17:46:13 dkranz: I think so
17:46:19 sdague: Great, if that is true.
17:46:38 jgriffith already went and changed a couple of error conditions in cinder because of the conversation
17:46:46 sdague: ok, cool
17:46:49 sdague: next
17:47:06 it looks like there are a couple more that need to be whitelisted out of nova network
17:47:13 from the last time I looked at logs
17:47:27 sdague: I am re-watching now
17:47:37 sdague: You can probably imagine how painful this is.
17:47:54 sdague: But I will push through. The end is in sight.
17:48:50 cool :)
17:48:56 next topic?
17:49:00 yep
17:49:03 #topic State of 'smoke' tagging: can we make it useful? (dkranz)
17:49:10 it's the dkranz show :)
17:49:14 :)
17:49:26 and, honestly, I'm super excited for the whitelist error stuff to hit
17:49:34 So it came up that the current smoke tagging is pretty arbitrary.
17:49:52 We want a set of tests that can run in 5-10 minutes that cover the most ground
17:50:02 dkranz: right now it's only used for what runs in grenade and the neutron jobs
17:50:18 we should get rid of all negative tests flagged as smoke
17:50:21 mtreinish: Right, but that was not the intent
17:50:27 yeh, so honestly, I think we should just dump smoke and have our smoke target be all the non-slow scenario tests
17:50:35 I don't see any reason for negative tests that are smoke tests
17:50:45 mkoderer: +1 agree with that
17:51:01 sdague: That is reasonable if we have the right coverage.
17:51:15 sdague: Certainly the scenario tests *should* have enough coverage.
17:51:31 sdague: that was the future intent for grenade right, to just run scenario (and increase the scenario coverage)
17:51:39 and for neutron we want that to be running full anyway
17:51:42 dkranz: agreed, actually I'm hoping we can talk about that in - http://icehousedesignsummit.sched.org/event/1a28654a7e05217067ded2bacbfa7484
17:51:46 So that works for me
17:52:20 mtreinish: yeh
17:52:26 except that as we go forward we will have more non-slow scenario tests than can run in 5 minutes
17:52:31 IMHO several auth* related tests should be smoke even if they are negative tests
17:53:07 so maybe the negative test discussion at summit, and the scenario test discussion will flesh this out
17:53:14 sdague: ok
17:53:35 Perhaps after neutron is working we can re-use smoke to mean what it should.
17:53:48 dkranz: yeh, that would be nice
17:53:49 Then we can run 'smoke' scenario and api
17:53:52 afazekas: could be that there are some exceptions..
17:54:11 I think that is the right answer for what we want.
17:54:23 sure, I guess we could do that, tag some representative scenario tests and a few others we think are important
17:54:30 sdague: Exactly
17:54:53 but in reality the API tests feel like they are largely a different class, and each very small, so being in smoke isn't quite right
17:55:01 but...
a summit discussion
17:55:05 also possibly with beer :)
17:55:12 sdague: Definitely
17:55:22 sdague: That's all from me
17:55:23 * mkoderer not sure if the beer tastes good in HK
17:55:33 I will have to say the beer during summit sessions in san diego was a great idea
17:55:35 mkoderer: Bring some!
17:55:49 mkoderer: I'm sure they have importers :)
17:55:52 ok
17:55:59 ok :)
17:56:00 #topic Open Discussion
17:56:05 anything else?
17:56:29 are people going to be around next week, or are they starting to travel by that point?
17:56:40 sdague: I will be on a plane next Thur so may or may not have connectivity for the meeting
17:56:46 mtreinish and I are flying on Fri
17:56:55 others?
17:57:06 I leave on Sunday
17:57:13 dkranz, sdague: doesn't that depend on your reference point? (and departure time)
17:57:38 https://review.openstack.org/#/c/53699/ needs review to unblock stable/havana
17:57:56 I had a commit message typo but zuul said it was working
17:58:06 jog0: why don't you cherry pick the version of that in master?
17:58:12 heh
17:58:36 mtreinish: that is a separate patch https://review.openstack.org/#/c/51041/
17:58:45 this is a bigger issue though
17:58:51 dkranz: ok, can you +2 - https://review.openstack.org/#/c/52413/1 first?
17:59:00 sdague: Yeah, just a sec
17:59:05 oh crap, that's going to be a merge conflict
17:59:13 no, there was a patch bumping six on master
17:59:16 that's been merged
17:59:32 jog0: https://git.openstack.org/cgit/openstack/tempest/commit/?id=c0441be3d7f994998779054991214242c5005877
18:00:01 sdague: I did it
18:00:15 ok, we need to give up the slot, let's take this to -qa
18:00:20 #endmeeting