17:00:03 <mtreinish> #startmeeting qa
17:00:04 <openstack> Meeting started Thu Sep 19 17:00:03 2013 UTC and is due to finish in 60 minutes. The chair is mtreinish. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:07 <openstack> The meeting name has been set to 'qa'
17:00:20 <mtreinish> who's here for the meeting?
17:00:37 <afazekas> hi
17:00:42 <Anju> hii
17:00:42 <mkoderer> hi
17:01:10 <dkranz> hi
17:01:10 <giulivo> hi
17:01:12 <mtreinish> ok here's today's agenda:
17:01:14 <mtreinish> #link https://wiki.openstack.org/wiki/Meetings/QATeamMeeting
17:01:23 <mtreinish> let's get started
17:01:34 <mtreinish> #topic neutron testing status
17:01:39 <ravikumar_hp> hi
17:01:41 <mtreinish> mlavalle: are you around?
17:02:10 <mlavalle> Yes i am
17:02:21 <mtreinish> ok, any update on neutron testing?
17:02:44 <mlavalle> mtreinish: I've been debugging isolated_creds
17:03:13 <mlavalle> mtreinish: as you know, I am having a race issue. With the latest run you sent me…..
17:03:33 <mtreinish> ok
17:03:43 <mlavalle> I think I need to add exception handling when I don't find the port in cleanup_ports
17:03:47 <psedlak> hi
17:04:02 <mlavalle> that's going to be my next patchset
17:04:20 <mtreinish> mlavalle: ok, we can talk about the details of the patch after the meeting, I had some questions
17:04:31 <mtreinish> is there anything else on the neutron front?
17:04:43 <mlavalle> I would also like to request a run of the neutron_full gate job
17:04:44 <afazekas> can we merge the partially working isolation patch now ?
17:05:11 <mtreinish> afazekas: it'll fail on clean up
17:05:20 <mtreinish> mlavalle: I think it's in the experimental queue
17:05:21 <mlavalle> right now it is disabled. I want to see the status of a run, after all the changes that have been merged
17:05:36 <mlavalle> can you point me to it?
17:06:05 <mtreinish> mlavalle: I think you leave a zero comment 'check experimental'
17:06:08 <mtreinish> dkranz: is that right?
17:06:14 <dkranz> mtreinish: Yes.
17:07:00 <mlavalle> mtreinish: ok, with that I will be able to gauge how much more work we need to do to fix it for good
17:07:12 <mlavalle> that's all I have
17:07:13 <mtreinish> ok cool
17:07:21 <mtreinish> let's move on then
17:07:28 <mtreinish> #topic blueprints
17:07:49 <afazekas> The patch gets us closer to working network isolation and it is easier to continue if it is merged, it does not have any negative impact on the current jobs
17:07:49 <mtreinish> are there any blueprints that we need to discuss?
17:08:32 <mtreinish> afazekas: I'm not comfortable merging code that doesn't work
17:09:00 <mtreinish> ok if there aren't any blueprints to bring up then let's move on
17:09:14 <mtreinish> #topic Critical Reviews
17:09:23 <mlavalle> afazekas: I don't feel comfortable either, yet. let me wrestle with it a little longer
17:09:28 <mtreinish> are there any reviews that people need to get eyes on?
17:09:28 <dkranz> mlavalle: You should do the check experimental in a tempest patch. It was not added to all projects.
17:10:17 <Anju> mtreinish: https://review.openstack.org/#/c/43039/
17:10:20 <Anju> this one
17:10:28 <mtreinish> #link https://review.openstack.org/#/c/43039/
17:10:29 <dkranz> mtreinish: https://review.openstack.org/#/c/38995/ is very old and I hope can be approved.
17:10:38 <mtreinish> #link https://review.openstack.org/#/c/38995/
17:11:05 <mtreinish> ok I've got them open on my browser I'll take a look after the meeting
17:11:06 <dkranz> Anju: That link was merged
17:11:20 <mtreinish> and any other cores can pick them off too
17:11:31 <giulivo> mtreinish, one more, it's not critical but useful to figure out what is good and what isn't for the stable branches
17:11:32 <giulivo> https://review.openstack.org/#/c/45808/
17:11:33 <mtreinish> Anju: yeah it is
17:11:44 <mtreinish> Anju: it's also a nova patch
17:11:52 <Anju> mtreinish: https://review.openstack.org/#/c/39621/
17:11:56 <Anju> mtreinish: this one
17:11:58 <mtreinish> #link https://review.openstack.org/#/c/45808/
17:12:16 <giulivo> btw https://review.openstack.org/#/c/45808/ is a cherry pick
17:12:16 <Anju> i want to know how to proceed with v3 tests
17:12:21 <mkoderer> https://review.openstack.org/#/c/39621/ got several -1's
17:12:58 <mtreinish> giulivo: yeah I see that, we can talk about that after the meeting
17:13:12 <mtreinish> ok are there any other reviews?
17:13:22 <adalbas> #link https://review.openstack.org/#/c/39621/
17:13:34 <dkranz> mkoderer: I don't think the -1s are valid at this point.
17:13:48 <Anju> mtreinish, mkoderer , giulivo , dkranz ,afazekas : i know the topic is not for v3 tests. but need a direction...
17:14:11 <Anju> mtreinish, mkoderer , giulivo , dkranz ,afazekas : two times -1 in this patch
17:14:12 <mtreinish> dkranz: yeah it just needs a rebase probably
17:14:12 <malini2> sorry to interrupt -- but do we not have a security meeting today?
17:14:27 <mtreinish> malini2: it's in an hour I think
17:14:39 <malini2> oh
17:15:00 <mtreinish> Anju: there is nothing wrong with the v3 tests but that one just needs a rebase
17:15:12 <mtreinish> that commit is a straight copy and paste of the v2 tests
17:15:31 <dkranz> mtreinish: To be clear, we have adopted the copy/modify approach and are ready to move on any v3 reviews, right?
17:15:52 <Anju> afazekas: is this ok?
17:16:10 <mtreinish> dkranz: I think that's what the plan is. I didn't really agree with it but the consensus was against me so I'm fine with it now :)
17:16:25 <Anju> your comments to more inheritance and less copy-paste ?
17:16:33 <afazekas> Anju: It is the faster way to get v3 test, adding same inheritance is possible later
17:16:39 <dkranz> mtreinish: What did you propose?
17:16:56 <mtreinish> just adding v3 tests as individual patches, don't bother copying and pasting
17:17:05 <mtreinish> and maybe break it up into smaller patches in a longer series
17:17:24 <dkranz> mtreinish: but the end result is still a full copy of tests for v3?
17:17:33 <mtreinish> yeah
17:17:59 <dkranz> mtreinish: I was talking about copy/modify vs attempted inheritance
17:18:23 <mtreinish> yeah the copy/modify is the way they're going to do it
17:18:32 <dkranz> mtreinish: I don't remember a discussion and don't really have an opinion about how we get there
17:18:48 <dkranz> mtreinish: But we can move on now.
17:18:51 <mtreinish> it was on the ml and partially in a review at one point
17:18:59 <mtreinish> yeah ok let's go to the next topic
17:19:11 <mtreinish> #topic How to handle bug fixes in launchpad
17:19:18 <mtreinish> so this one is mine
17:19:42 <mtreinish> during the bug day earlier this week we noticed that all the fixed bugs were in the fix committed state
17:20:07 <mtreinish> it used to immediately go into fix released after the patch was merged
17:20:21 <mtreinish> there seemed to be a bit of disagreement over which approach was better
17:20:31 <afazekas> IMHO after it merged it should go to the Fix released state automatically or with 1-7 day delay
17:20:40 <dkranz> afazekas: I agree
17:20:42 <mtreinish> so I just wanted to bring it up during the meeting to see if there was a strong opinion one way or the other
17:21:07 <mtreinish> afazekas: ok yeah that's what I'm leaning towards again
17:21:31 <mtreinish> I can revert for jeepb to switch it back to the old behavior then
17:21:40 <dkranz> mtreinish: Thanks.
17:21:54 <mtreinish> ok this was a quick topic
17:22:01 <mtreinish> #topic Bogus errors in logs one more time
17:22:04 <mtreinish> dkranz: you're up
17:22:32 <dkranz> So I was trying to see if there was a failure in the neutron logs and saw that it too has lots of bogus ERROR/stacktrace
17:22:35 <giulivo> mtreinish, and eventually move whatever is in fix committed to fix released ?
17:22:48 <dkranz> That show up even on successful runs
17:22:54 <afazekas> the tempest.log should have a thread/pid number
17:23:10 <mtreinish> giulivo: yeah that was the intent of switching to leaving it in fix committed
17:23:13 <dkranz> infra has agreed they could put in a regexp on the logs to fail a build if bogus errors show up
17:23:46 <dkranz> Part of fixing this issue is educating developers about what log.error should be used and not used for.
17:24:10 <dkranz> The second part is giving some priority to fixing the bogus ones
17:24:10 <mtreinish> dkranz: yeah I've seen overuse of log.ERROR before too
17:24:21 <afazekas> I hope it will not lead to not using the error or critical level when it is required
17:24:22 <mtreinish> I think sdague was working on a whitelist at one point
17:24:31 <dkranz> The third part is defining a "whitelist" so that infra can start failing builds that introduce new ones.
17:24:35 <mtreinish> or that might have just been for stacktraces
17:25:00 <dkranz> mtreinish: I haven
17:25:20 <dkranz> I haven't heard any one disagree with these points but I'm not sure how to make it happen.
17:25:26 <dkranz> Particularly the priority part
17:25:55 <dkranz> I don't know why people don't think this has customer/user impact
17:26:16 <dkranz> It was a major headache with the system I ran.
17:26:29 <mtreinish> dkranz: well we can make a blueprint for this, for the priority one it's really 2 parts: identifying the spurious log messages and then opening bugs for them
17:26:45 <mtreinish> and marking them as high priority bugs
17:26:51 <jog0> it would be nice to see some of this before Havana is cut
17:26:54 <dkranz> mtreinish: Yeah. There are already a bunch of bugs I filed a while ago.
17:27:05 <dkranz> mtreinish: I can take another pass at this.
17:27:25 <dkranz> But really some one from each team should grep their logs and go from there
17:27:27 <mtreinish> I think the best way to start going about this is to send out a post to the ML to try to get a wider audience
17:27:47 <afazekas> +1
17:28:09 <dkranz> mtreinish: Sure. But I've done that before :(
17:28:10 <jog0> we can use logstash for this i think
17:28:29 <mtreinish> jog0: yeah logstash will be useful for this too
17:28:32 <dkranz> jog0: How?
17:28:33 <jog0> (for finding what stacktraces happen today)
17:28:41 <giulivo> dkranz, mtreinish how about going the opposite route which is collecting what is logged as error also when tempest succeeds and post it to the -dev list ?
17:28:54 <dkranz> giulivo: That is exactly what I was talking about
17:29:07 <giulivo> oh I thought asking -dev to inspect
17:29:25 <dkranz> giulivo: Yes, to inspect the bogus ERROR in their logs
17:29:51 <giulivo> ok so I was just suggesting -qa/-infra inspects the logs first and post the bogus messages
17:30:22 <jog0> @message:Traceback* AND @fields.filename:"logs/screen-n-api.txt"
17:30:34 <jog0> dkranz: ^ put that in logstash.openstack.org
17:30:53 <mtreinish> dkranz: there should probably be a summit topic on this too.
17:31:06 <dkranz> mtreinish: OK, I'll put one in.
17:31:11 <mtreinish> but we want to start it before havana
17:31:11 <jog0> 'message:Traceback* AND @fields.filename:"logs/screen-n-api.txt" AND @fields.build_status:"SUCCESS"' will tell you for only passing jobs
17:31:30 <dkranz> jog0: OK
17:31:59 <dkranz> jog0: There are also bogus ERROR without stacktrace
17:32:13 <mtreinish> there is also the find_stack_traces.py script in tools
17:32:23 <mtreinish> although I haven't used it at all
17:32:46 <dkranz> mtreinish: I will do something
17:33:03 <afazekas> where should the white list be stored ? is the devstack-gate repo good for whitelist ?
17:33:27 <mtreinish> afazekas: it should probably be separate
17:33:31 <dkranz> afazekas: Sean had some idea for this but didn't tell me details
17:34:06 <mtreinish> dkranz: well then we can all bug him when he gets back.
17:34:16 <mtreinish> dkranz: it probably wouldn't hurt to start a bp on this now
17:34:28 <mtreinish> and we can fill in the details after we have some more discussion about how to do it
17:34:36 <dkranz> mtreinish: OK, I'll do that.
17:34:47 <afazekas> tempest or infra bp ?
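[Editor's note, not part of the meeting log] The whitelist idea raised above (fail a build when a log contains a stacktrace that is not on an agreed list) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function name `find_unexpected_tracebacks`, the whitelist patterns, and the sample log lines are hypothetical, and it does not reflect what infra, sdague's whitelist, or `find_stack_traces.py` actually do.

```python
import re

# Marker that Python writes at the start of every traceback.
TRACEBACK_MARKER = "Traceback (most recent call last)"


def find_unexpected_tracebacks(log_text, whitelist_patterns):
    """Return traceback-start lines that match no whitelist regex.

    whitelist_patterns is a list of regular expressions (hypothetical
    format); a line is considered known and skipped if any of them
    matches somewhere in that line.
    """
    allowed = [re.compile(pattern) for pattern in whitelist_patterns]
    unexpected = []
    for line in log_text.splitlines():
        if TRACEBACK_MARKER not in line:
            continue  # not the start of a traceback
        if any(rx.search(line) for rx in allowed):
            continue  # known, whitelisted traceback source
        unexpected.append(line)
    return unexpected


if __name__ == "__main__":
    # Hypothetical log excerpt; real service logs may be laid out differently.
    sample = "\n".join([
        "2013-09-19 17:00:00 1 TRACE nova.api Traceback (most recent call last):",
        "2013-09-19 17:00:01 1 INFO nova.api request ok",
        "2013-09-19 17:00:02 2 TRACE nova.compute Traceback (most recent call last):",
    ])
    for hit in find_unexpected_tracebacks(sample, [r"nova\.api"]):
        print(hit)
```

A gate job built on this would fail the build whenever the returned list is non-empty, which matches the "no new stacktraces" goal while leaving plain ERROR messages out of scope.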
17:34:49 <dkranz> mtreinish: The problem is that it is really a cross-project blueprint
17:35:05 <dkranz> mtreinish: We as a community don't always do so well with those.
17:35:38 <mtreinish> dkranz: yeah, it really doesn't fit too well in one project. Just stick it somewhere I guess
17:35:46 <mtreinish> we can make individual project bps too
17:35:52 <mtreinish> and use dependency to track them
17:35:59 <mtreinish> that's what I did for the coverage extension
17:36:00 <dkranz> mtreinish: That's a good idea.
17:36:10 <mtreinish> dkranz: https://blueprints.launchpad.net/tempest/+spec/tempest-coverage-reporting
17:36:17 <dkranz> mtreinish: I'll figure out which projects have issues and open them in each
17:36:48 <dkranz> mtreinish: tempest is probably the one project that won't require code changes :)
17:36:50 <psedlak> isn't infra the obvious choice for that (bogus errors), what would be the reason to create it as tempest bp?
17:36:54 <jog0> if we can gate on no new stacktraces before Havana is out that would be really amazing
17:37:15 <dkranz> jog0: Yes
17:37:40 <dkranz> jog0: To be clear, are you distinguishing between incorrect ERRORs that have stacktraces and those that don't?
17:38:10 <dkranz> jog0: If we make "rules" for developers we have to be very precise
17:39:06 <afazekas> if there is an ERROR message without detailed info it is a double -1 :)
17:39:58 <dkranz> IMO, ERROR in log should be for something the operator *should* understand/investigate
17:40:26 <dkranz> And can thus be used as a monitoring alert
17:40:45 <jog0> I agree with dkranz. I think it's better to take a smaller step at first and just worry about stacktraces and not errors
17:40:46 <afazekas> +1
17:40:50 <dkranz> swift is hopeless in this regard
17:41:27 <mtreinish> dkranz: +1 (not about swift I haven't really looked at swift logs much)
17:41:37 <mtreinish> dkranz: ok is there anything else on this topic?
17:41:39 <rockyg> Gotta start somewhere. +1
17:41:39 <dkranz> jog0: I'm ok with that.
17:41:56 <dkranz> Just want to make sure if people agree with my statement about ERROR above
17:42:08 <dkranz> at 13:39
17:42:21 <afazekas> dkranz: I assume syslog can be configured to separate the swift logs
17:42:34 <rockyg> Yes
17:42:39 <mtreinish> dkranz: I do (that's what the +1 was for)
17:42:39 <dkranz> afazekas: Sure
17:42:42 <jog0> dkranz: why not punt the ERROR msg stuff till Icehouse
17:42:53 <dkranz> mtreinish: OK, great. I'll report next week on my progress.
17:42:55 <jog0> (consensus and all)
17:42:59 <mtreinish> jog0: yeah that makes sense
17:43:18 <mtreinish> dkranz: ok you'll get a semi-permanent spot on the agenda then :)
17:43:26 <dkranz> mtreinish: :)
17:43:36 <mtreinish> ok then let's open the floor
17:43:39 <rockyg> Error in log means investigate to any ops guy. So better have enough info for them to stay to dog e
17:43:41 <mtreinish> #topic open discussion
17:44:08 <mtreinish> are there any topics to bring up with what time is left?
17:44:31 <jog0> mtreinish: you can mention the work we have been doing
17:44:48 <mtreinish> oh yeah this is a good forum to discuss that
17:45:01 <afazekas> The pid/thread should be logged in the tempest.log, in order to distinguish the logs from different workers
17:45:20 <mtreinish> so jog0 and I have been working on a bot that watches the gerrit stream for tempest failures
17:45:31 <mtreinish> and then use logstash to find fingerprints for open bugs
17:45:47 <mtreinish> and report back on irc and the gerrit commit with what it found
17:45:48 <rockyg> Cool
17:46:01 <mtreinish> you've probably seen RecheckWatchBot on the -qa channel
17:46:04 <jog0> well a human still has to find the fingerprint but then we use logstash to classify the failures
17:46:23 <mtreinish> yeah that's what I meant (I didn't word it clearly)
17:46:52 <mtreinish> we're going to be moving it over to infra soon and will have everything up in gerrit too
17:47:36 <dkranz> mtreinish: That's cool.
17:47:48 <dkranz> mtreinish: Who do we expect to look at this, and what do we expect them to do?
17:48:06 <mtreinish> dkranz: it's mostly to lower the developer load for using recheck
17:48:12 <jog0> dkranz: you and the patch author
17:48:13 <mtreinish> and avoid duplicate bugs
17:48:28 <afazekas> What is the easiest way to find the logs related to a recheck ? on the recheck page you see the change number, but you need to click a lot to get to the real job logs
17:48:33 <dkranz> jog0: Ah
17:48:52 <dkranz> So if I see this I should go to the review and do a recheck if one hasn't been done already?
17:48:55 <jog0> in this case https://review.openstack.org/#/c/47365/ just the patch author looks at it
17:49:13 <jog0> or you if you want
17:49:28 <jog0> but when an unclassified failure comes up, we need to write a logstash query for it
17:49:46 <jog0> https://github.com/jogo/elasticRecheck/blob/master/queries.json
17:50:20 <jog0> afazekas: there isn't an easy way which is why we wrote this
17:50:27 <dkranz> jog0: I see.
17:51:12 <mtreinish> afazekas: about the logging I'm not opposed to doing that, but it's not exactly straightforward and I think it's a lower priority
17:51:13 <jog0> this should reduce the number of recheck no bugs, making our recheck numbers more accurate
17:51:26 <jog0> allowing us to better prioritize transient gate failures
17:52:39 <afazekas> Does anybody know why this bug was moved to medium ? https://bugs.launchpad.net/tempest/+bug/1205344 can we enable the test case ?
17:52:41 <uvirtbot> Launchpad bug 1205344 in nova "mkfs error in test_stamp_pattern" [Medium,Confirmed]
17:53:09 <mtreinish> jog0: ^^^ you and russellb were talking about that yesterday right?
17:53:11 <jog0> afazekas: yeah russell changed it yesterday
17:53:24 <jog0> you can dig through the eavesdrop for nova for the exact wording
17:53:43 <jog0> how frequent was that bug
17:54:12 <afazekas> it was really frequent
17:54:43 <jog0> russellb: ^
17:55:08 <mtreinish> afazekas: you guys should probably take this offline (we've got ~5 min left)
17:55:29 <mtreinish> are there any other topics to bring up with what time we've got left?
17:55:36 <dkranz> Not from me
17:55:57 <giulivo> jog0, I think it is a nice tool, you said you will enable coop via gerrit not git pull requests right?
17:56:15 <giulivo> looks like managing the queries is the most coop part
17:56:24 <mtreinish> giulivo: yeah that's the plan, we're probably going to be making the move today
17:56:33 <clarkb> at least starting the process of moving
17:56:45 <mtreinish> clarkb: :)
17:56:56 <clarkb> I don't want to sneak it by jeblair, fungi, and mordred and we are all pretty busy this week
17:57:16 <psedlak> afazekas touched on the issue that logs from parallel runs are not very useful ... could we add thread/pid numbers to log format or something like that?
17:58:04 <clarkb> psedlak: the subunit log has that
17:58:22 <clarkb> psedlak: I think we should probably attach the logs to the subunit for each test as they are run
17:58:29 <clarkb> this is what nova et al do
17:58:59 <psedlak> yes, i for example meant the tempest.txt ...
17:59:15 <psedlak> clarkb: what has to be done for that?
18:00:00 <clarkb> psedlak: http://git.openstack.org/cgit/openstack/nova/tree/nova/test.py#n242
18:00:06 <mtreinish> psedlak: another option which might be simpler would be printing the testname with each log message
18:00:14 <mtreinish> but we're out of time
18:00:20 <mtreinish> psedlak: we can pick this up on -qa
18:00:23 <mtreinish> #endmeeting
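[Editor's note, not part of the meeting log] The request to add thread/pid numbers to the log format maps onto the standard `%(process)d` and `%(threadName)s` attributes of Python's `logging` module. The sketch below only illustrates that mechanism. The helper names `build_formatter` and `make_logger` and the logger name `tempest.example` are hypothetical, and how tempest actually configures `tempest.log` is not shown in this log.

```python
import io
import logging


def build_formatter():
    # %(process)d and %(threadName)s are standard LogRecord attributes,
    # so each line records which worker process and thread emitted it.
    return logging.Formatter(
        "%(asctime)s %(process)d %(threadName)s %(levelname)s "
        "[%(name)s] %(message)s"
    )


def make_logger(stream, name="tempest.example"):
    # Hypothetical logger name; writes to the given stream so the
    # result is easy to inspect.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(build_formatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output in one place
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    log = make_logger(buf)
    log.info("server created")
    print(buf.getvalue(), end="")
```

With a format like this, lines from parallel workers that end up interleaved in one file can be separated again by filtering on the pid column. Printing the test name with each message, as suggested at 18:00:06, would need extra per-test context and is not covered by this sketch.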