15:00:23 <anteaya> #startmeeting third-party
15:00:25 <openstack> Meeting started Mon Mar 30 15:00:23 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 <openstack> The meeting name has been set to 'third_party'
15:00:32 <anteaya> hello
15:00:38 <ameade> hey
15:00:48 <anteaya> raise your hand if you are here for the third party meeting
15:00:50 <anteaya> ameade: hello
15:00:59 <anteaya> ameade: how is your work coming along?
15:01:08 <kaisers1> hi
15:01:13 <anteaya> kaisers1: hello
15:01:17 <ameade> anteaya: very well thanks :)
15:01:24 <anteaya> ameade: I'm glad to hear that
15:01:30 <anteaya> ameade: what is your current status?
15:01:37 <patrickeast> hi everyone
15:01:44 <anteaya> patrickeast: hi patrickeast
15:02:47 <ameade> anteaya: have been posting on Cinder for most of our drivers for months now but have been setting up another system for our Fibre Channel stuff which has been interesting
15:03:00 <anteaya> yes, I've been following along
15:03:11 <anteaya> which is why I am giving you priority right now at the meeting
15:03:20 <anteaya> unless you prefer that I don't
15:03:55 <ameade> just hanging out in case I can help other folks :)
15:04:21 <anteaya> oh okay
15:04:26 <anteaya> that is good too
15:04:27 <anteaya> welcome
15:04:33 <anteaya> how is everyone today?
15:05:09 <asselin> hi
15:05:14 <anteaya> hey asselin
15:05:26 <anteaya> does anyone have anything they would like to discuss?
15:06:00 <asselin> ameade, I posted our fc cinder testing notes back a few meetings ago
15:06:06 <asselin> ameade, were you there?
15:06:24 <ameade> asselin: I don't think so
15:07:30 <ameade> asselin: Do you do one job at a time against an FC switch or have you turned zoning off?
15:07:59 <asselin> ameade, we have zoning off
15:08:37 <ameade> gotcha, i'm currently running one at a time with single concurrency, but to scale i'm going to have to turn it off too
15:08:58 <ameade> i do think I found a bug with the synchronization in the brocade code though
15:10:17 <asselin> ameade, please share / submit a bug report. we're planning to test with the zone manager eventually
15:10:37 <ameade> asselin: yeah definitely, i'm going to dig into it once i get this other hardware setup
15:12:27 <anteaya> asselin: thanks for bringing this up
15:12:36 <anteaya> is there more to discuss here?
15:13:35 <anteaya> does anyone have anything else they would like to discuss?
15:13:49 <kaisers1> quick question regarding a failure i have
15:13:54 <kaisers1> pasting, sec
15:13:54 <asselin> ameade, http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:04 <anteaya> there are no quick questions, just questions
15:14:12 <asselin> #link my fibre channel cinder notes http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:19 <kaisers1> http://paste.openstack.org/show/197567/
15:14:32 <kaisers1> a failure in the cinder tests,
15:15:02 <anteaya> kaisers1: and let's have a question to go with the paste
15:15:06 <kaisers1> i'm just looking into this. Nova cannot create a floating ip, do i read that correctly?
15:15:14 <kaisers1> sorry, not the fastest typing
15:15:15 <ameade> asselin: perfect, thank you...I am gonna have to automate the PCI passthrough stuff soon
15:15:36 <asselin> kaisers1, this test is failing for us too. I disabled it and didn't look into it yet. test_minimum_basic_scenario
15:15:51 <kaisers1> asselin: oh, ok
15:16:01 <krtaylor> hi everybody, sorry I am late
15:16:11 <asselin> ameade, FYI there could be a better way to do it directly via nova pci passthrough.
15:16:15 <kaisers1> asselin: what kind of disabling do you use? don't want to clash with thingee's requirements...
15:16:21 <anteaya> kaisers1: I get one hit returned on that error: http://www.gossamer-threads.com/lists/openstack/dev/23837
15:16:45 <kaisers1> anteaya: thanks for the hint!
15:16:47 <asselin> kaisers1, I use regex exclusions.
15:16:59 <kaisers1> asselin: that allowed after friday's email? :)
15:17:05 <anteaya> kaisers1: so it looks like a timing issue
15:17:10 <kaisers1> s/email/emails/
15:17:25 <anteaya> kaisers1: do you have a link to the email you reference?
15:17:27 <asselin> kaisers1, I'm not too concerned with thingee's requirements. There are bugs that need to get fixed.
15:17:37 <kaisers1> ok, thanks for the feedback :)
15:17:41 <anteaya> kaisers1: it helps if you go to lists.openstack.org and bring one back
15:17:50 <kaisers1> ok, sec
15:17:54 <anteaya> thank you
15:17:57 <rhe00> asselin: I am using your FC passthrough scripts and it is very reliable. I tried the nova flavor method as well but could not get it to work reliable with nodepool.
15:18:20 <rhe00> reliably
15:18:24 <kaisers1> one more thing from last week. The weird ResponseNotReady issue was solved
15:18:28 <asselin> thingee wants to include all 'volume' tests. We disable those that don't work and investigate them outside of CI.
15:18:39 <asselin> rhe00, great to know!
15:18:41 <anteaya> asselin: thanks for clarifying
15:18:58 <anteaya> as some would take your above comment to mean that _they_ don't need to worry about requirements
15:18:59 <kaisers1> asselin: ok, will probably go similar
15:19:03 <anteaya> which is not the case
15:19:22 <asselin> kaisers1, it's nice to know other drivers have issues. Likely means it's a real bug in the tempest test or cinder itself.
15:19:48 <kaisers1> asselin: I found a bug in tempest on friday :)
15:19:49 <asselin> kaisers1, and not necessarily the drivers
15:19:56 <asselin> kaisers1, link?
15:20:00 <kaisers1> asselin: that was fixed quickly, sec
15:20:57 <kaisers1> link takes a moment, please continue
15:21:20 <anteaya> I'm waiting on your link for the bug and for the mailing list post
15:21:28 <anteaya> unless someone else can find and post them first
15:21:40 <kaisers1> bug: https://bugs.launchpad.net/tempest/+bug/1437328
15:21:41 <openstack> Launchpad bug 1437328 in tempest "No networks found in fixed_network.py (list index out of range)" [High,Fix released] - Assigned to Matthew Treinish (treinish)
15:21:41 <anteaya> I don't have enough information myself to attempt a search
15:21:55 <anteaya> #link https://bugs.launchpad.net/tempest/+bug/1437328
15:21:57 <asselin> ok, the issue I'm running into is: test_volume_upload is causing the jenkins slave to freeze
15:22:05 <asselin> intermittently
15:22:16 <anteaya> asselin: do you have any artifacts to share?
15:22:52 <asselin> anteaya, all I have is the nova console log error message: "BUG: soft lockup - CPU#1 stuck for 22s"
15:23:09 <anteaya> nice
15:23:14 <ameade> i don't think i've hit these fwiw
15:23:42 <anteaya> asselin: I don't know as we have been seeing that in infra
15:23:53 <anteaya> that error message doesn't ring a bell with me
15:24:01 <anteaya> when did it begin?
15:24:46 <asselin> anteaya, it's been around for a while....we used to exclude that test, but now want to run it (obviously)
15:25:04 <anteaya> that is looking like a linux kernel bug
15:25:14 <anteaya> #link https://bugzilla.redhat.com/show_bug.cgi?id=442920
15:25:18 <openstack> bugzilla.redhat.com bug 442920 in kernel "BUG: soft lockup - CPU#0 stuck for 61s!" [Low,Closed: insufficient_data] - Assigned to kernel-maint
15:25:42 <kaisers1> anteaya: This is the mail link: http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html . This goes along with some prior discussion that tests should not be skipped via regexp. I'm not sure if that was on irc or the mailing list
15:25:42 <asselin> anteaya, yes likely either on the bare metal or nodepool image.....
15:26:00 <anteaya> #link http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html
15:26:09 <anteaya> kaisers1: thank you
15:26:22 <anteaya> asselin: interesting
15:26:32 <asselin> anteaya, I'm trying to upgrade the blade's linux kernel to 3.16 (from 3.13) but then fc passthrough breaks
15:26:42 <anteaya> :(
15:27:10 <anteaya> just for the sake of conversation what happens if you downgrade the kernel
15:27:22 <anteaya> meaning it is just 3.13 that is causing that error
15:27:22 <asselin> anteaya, didn't try
15:27:27 * wznoinsk joins late
15:27:32 <anteaya> it might be a dumb idea
15:28:19 <krtaylor> asselin, unfortunately, that message could be a lot of different problems
15:28:29 <anteaya> asselin: here is a thread on an ubuntu forum
15:28:32 <asselin> anteaya, it's not a dumb idea. It's possible there's a regression in one of the minor release versions
15:28:33 <anteaya> #link http://ubuntuforums.org/showthread.php?t=1757773
15:28:44 <anteaya> asselin: okay well let me know if you decide to try it
15:29:00 <anteaya> asselin: you are using ubuntu, yes?
15:29:08 <asselin> anteaya, yes
15:29:17 <asselin> krtaylor, you've seen these before?
15:29:33 <anteaya> asselin: the thread seems to be suggesting a bios upgrade
15:29:36 <krtaylor> asselin, yes, early on, not in a while
15:29:47 <anteaya> asselin: I'm not sure how blades work, would a bios upgrade apply?
15:30:15 <asselin> krtaylor, which kernel versions do you use for the bare metal & jenkins slaves?
15:30:29 <asselin> anteaya, perhaps I can check to see if there's an update...
15:30:35 <krtaylor> anteaya, yes, we also saw this with hardware related problems with bringup
15:30:49 <anteaya> asselin: it might be a first step rather than downgrading the kernel
15:31:13 <anteaya> has anyone else seen this issue?
15:31:14 <krtaylor> asselin, we are not doing bare metal (yet), we run in 1st level guests just like upstream infra
15:31:50 <asselin> krtaylor, so who runs the hosts?
15:32:14 <krtaylor> asselin, but on our hosts we run a base F20/F21
15:32:51 <anteaya> asselin: let me know when you're done
15:32:59 <asselin> anteaya, i'm done
15:33:33 <anteaya> asselin: okay
15:33:59 <anteaya> kaisers1: so going back to the mailing list thread, are you one of the people who participated in the conversation?
15:35:01 <anteaya> kaisers1: did we lose you?
15:35:14 <kaisers1> anteaya: sorry, colleague asked something
15:35:27 <anteaya> are you one of the folks who participated in the thread?
15:35:44 <kaisers1> anteaya: I did not write but due to the added tests from friday our ci has two failures
15:35:52 <anteaya> okay
15:35:58 <anteaya> which failures?
15:36:07 <kaisers1> one was the exception i posted earlier
15:36:15 <anteaya> okay and the other?
15:36:29 <kaisers1> a permission issue with runtime snapshots
15:36:41 <anteaya> do you have a stack trace handy or no?
15:36:42 <kaisers1> That may relate to our driver
15:36:46 <anteaya> ah
15:36:55 <kaisers1> that's why i'm looking into it myself
15:36:57 <anteaya> well that is good you are running the tests to find that
15:37:00 <anteaya> okay great
15:37:43 <anteaya> so you have about 48 hours to get them fixed, yes?
15:38:18 <kaisers1> that's what i think. Although we're still in trunk, our CI was working ok at the deadline, just the new tests are now creating pressure
15:38:24 <anteaya> yes
15:38:35 <anteaya> and from what I read you have 48 hours
15:38:46 <anteaya> are you in a position to meet that deadline?
15:38:53 <kaisers1> 04.06., bit more than 48hours?
15:38:59 <anteaya> give or take
15:39:05 <kaisers1> ok :)
15:39:09 <anteaya> 5 days before 04/06
15:39:14 <anteaya> which is April 1
15:39:18 <anteaya> read the email
15:39:18 <kaisers1> yep
15:39:32 <kaisers1> yes, i did read that
15:39:32 <anteaya> #info you must have a CI reporting and stable
15:39:34 <anteaya> for five days prior to 4/6.
15:39:39 <anteaya> good
15:39:46 <rhe00> anteaya: I think that is only for drivers that were pulled because they didn't have a working CI at the deadline
15:39:47 <anteaya> so stay in communication
15:39:49 <kaisers1> yes
15:39:59 <anteaya> rhe00: correct
15:40:10 <anteaya> rhe00: that is also how I read that
15:40:10 <kaisers1> that's what i gathered, too. Nevertheless i want the failures to be fixed asap anyways
15:40:18 <anteaya> kaisers1: good attitude
15:40:20 <asselin> kaisers1, I think we need to get this clarified at the cinder meeting. Personally I think you should be able to have select exclusions with e.g. bugs submitted to track them.
15:40:30 <anteaya> asselin: good point
15:40:35 <rhe00> asselin: +1
15:40:40 <anteaya> has someone added an agenda item to the cinder meeting yet?
15:40:42 <kaisers1> that would be fine
15:41:13 <anteaya> someone needs to drive it for the conversation to happen
15:42:01 <kaisers1> yep. My thought was first to write a mail for clarification
15:42:13 <kaisers1> Otherwise there will be an agenda bullet added...
15:42:22 <asselin> I can post to the mailing list & ask in cinder channel. Given the deadline, probably best to do it today.
15:42:29 <anteaya> asselin: I agree
15:42:48 <anteaya> asselin: did you want to take an action item on that?
15:42:53 <asselin> sure.
15:43:25 <asselin> also I remember now some discussions in the cinder channel about it. I'll have to look that up again. I think there were clarifications that didn't make it to the mailing list.
15:43:35 <anteaya> #action asselin to post to the mailing list and ask in cinder channel to clarify select exclusions for cinder tests if they have bugs tracking them
15:43:44 <anteaya> asselin: does that sound reasonable?
15:43:49 <asselin> sure
15:43:52 <anteaya> thank you
15:44:13 <anteaya> any more on cinder tests?
15:44:31 <kaisers1> i'm done regarding that for today
15:44:36 <anteaya> kaisers1: thanks
15:44:44 <anteaya> I appreciate you bringing that up
15:44:59 <kaisers1> anteaya: pleasure
15:45:00 <anteaya> discussing issues before a deadline is much nicer than afterwards
15:45:17 <anteaya> does anyone have anything else they would like to discuss today?
15:45:51 <wznoinsk> hi all
15:45:56 <anteaya> wznoinsk: hello
15:46:06 <anteaya> wznoinsk: did you have anything you would like to discuss?
15:46:13 <wznoinsk> is anyone seeing problems with python-eventlet(greenthread) in their CIs when using testr?
15:46:44 <krtaylor> wznoinsk, we have in the past, what are you seeing?
15:46:49 <kaisers1> wznoinsk: Not currently but saw something like that some time ago
15:46:51 <anteaya> wznoinsk: have you a paste?
15:46:53 <wznoinsk> started only recently and happens in both containers and baremetal
15:47:29 <wznoinsk> http://pastebin.com/Dk2x3yMV
15:48:07 * krtaylor looks
15:48:13 <wznoinsk> yet, I've not yet tested whether it's related to our networking on dpdk but looks pretty generic (eventlet/threading like)
15:48:43 <anteaya> wznoinsk: the error contains a suggestion
15:49:01 <anteaya> wznoinsk: have you tried evaluating if implementing the suggestion is reasonable?
15:49:37 <wznoinsk> I wouldn't be able to change the code (not quickly), if nobody's seen that/similar recently I'll keep digging myself
15:49:46 <krtaylor> wznoinsk, this looks like a race, you are running parallel tests I presume?
15:50:13 <anteaya> eventlet is not well liked right now, mostly because of the python2/3 disparity
15:50:40 <anteaya> wznoinsk: beyond that I haven't personally come across eventlet errors specific to third party cis
15:50:41 <wznoinsk> krtaylor: yes, parallel tests, but I think the error is happening on the software threads (greenthreads) and (I would imagine) that the same would happen in the production env
15:51:35 <wznoinsk> anteaya: I thought I'd share it here as when I boot vms by hand it's ok, only testr gives me that grief here
15:51:47 <anteaya> wznoinsk: of course, yes
15:51:54 <anteaya> wznoinsk: always good to share
15:52:02 <anteaya> boot vms by hand?
15:52:19 <wznoinsk> ='nova boot' or horizon
15:52:34 <anteaya> oh
15:53:16 <wznoinsk> thanks for having a look anyways
15:53:24 <anteaya> what command are you running that this is part of the output?
15:53:44 <wznoinsk> testr run tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops
15:54:01 <anteaya> is it just that test that fails?
15:54:08 <anteaya> or any test run with testr?
15:54:53 <anteaya> have you tried running just one test?
15:54:56 <wznoinsk> apparently it sometimes passes sometimes fails due to the above error (depending when it actually happens), krtaylor is probably right about some race condition
15:55:20 <anteaya> is it the same test that fails every time?
15:55:32 <anteaya> you can run it with until failure to flush out races
15:55:40 <wznoinsk> happens for any tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.*, I'm about to check other scenario tests
15:56:11 <anteaya> #info testr run --parallel --concurrency N --until-failure
15:56:19 <anteaya> wznoinsk: okay
15:56:24 <wznoinsk> I'll give it a go, thanks
15:56:27 <anteaya> would be good to get a comparison
15:56:33 <anteaya> thanks for asking, I hope you find the issue
15:56:48 <anteaya> anything more here?
15:57:41 <anteaya> anyone have anything else?
15:57:46 <anteaya> 2 minutes left
15:58:20 <anteaya> okay let's wrap up
15:58:24 <anteaya> thanks everyone
15:58:35 <kaisers1> thanks & bye
15:58:36 <anteaya> enjoy the rest of your <time-of-day>
15:58:42 <anteaya> see you next week
15:58:46 <anteaya> #endmeeting
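
Note on the "regex exclusions" asselin mentions at 15:16:47: the log does not show the actual patterns or tooling used, so the following is only a minimal sketch of the general idea, assuming a CI wrapper that filters a list of test ids before handing them to the test runner. The names `EXCLUDE_PATTERNS` and `select_tests` are hypothetical, and the test ids in the usage example are illustrative; only `test_minimum_basic_scenario` and `test_volume_upload` are taken from the discussion.

```python
import re

# Hypothetical exclusion list. Per the discussion in the meeting, each entry
# would ideally be paired with a bug report that tracks the failure.
EXCLUDE_PATTERNS = [
    r"test_minimum_basic_scenario",  # failing for more than one CI (15:15:36)
    r"test_volume_upload",           # intermittent slave freeze (15:21:57)
]


def select_tests(test_ids, exclude_patterns=EXCLUDE_PATTERNS):
    """Return the test ids that match none of the exclusion patterns."""
    if not exclude_patterns:
        # An empty alternation would match every id, so handle it explicitly.
        return list(test_ids)
    excluded = re.compile("|".join(f"(?:{p})" for p in exclude_patterns))
    return [t for t in test_ids if not excluded.search(t)]


if __name__ == "__main__":
    # Illustrative ids only; real tempest ids are longer.
    candidates = [
        "volume.test_volumes_get",
        "volume.test_volume_upload",
        "scenario.test_minimum_basic_scenario",
    ]
    for test_id in select_tests(candidates):
        print(test_id)
```

Keeping the patterns in one commented list makes it easy to see which tests are skipped and why, which fits the suggestion at 15:40:20 that exclusions be backed by submitted bugs.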