15:09:29 <anteaya> #startmeeting third-party
15:09:30 <openstack> Meeting started Mon Mar 9 15:09:29 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:09:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:09:33 <openstack> The meeting name has been set to 'third_party'
15:09:37 <anteaya> sorry about that
15:09:43 <anteaya> didn't mean to keep you waiting
15:09:47 <anteaya> how is everyone today?
15:10:14 <patrickeast> good
15:10:18 <anteaya> glad to hear it
15:10:27 <anteaya> how are everyone's systems operating?
15:10:56 <wznoinsk> hi, good here
15:11:05 <anteaya> wznoinsk: glad to hear it
15:11:09 <asselin_> so far so good. we found an issue affecting our systems. looking up bug now
15:11:20 <anteaya> asselin_: I'd be interested to hear
15:11:40 <patrickeast> mine's turned off right now :( got some new hardware but had to move the existing one to a new rack, had to get new ip addresses and network configs for everything
15:11:57 <anteaya> patrickeast: wow
15:12:06 <anteaya> yay for new hardware
15:12:11 <patrickeast> it will be better in a few days tho
15:12:12 <patrickeast> yea
15:12:14 <anteaya> but not such great timing
15:12:21 <anteaya> so you are thinking a few days?
15:12:32 <asselin_> #link https://bugs.launchpad.net/tempest/+bug/1427833
15:12:33 <openstack> Launchpad bug 1427833 in tempest "Intermittent 3rd party ci failures: 400 Bad Request [Errno 101] ENETUNREACH 101" [Undecided,Incomplete]
15:12:36 <patrickeast> tripled the amount of initiators, and now i’ve got 4x 10 Gb for each to the array and FC ready for next release
15:12:43 <patrickeast> i’m hoping later today
15:12:47 <patrickeast> few days worst case
15:13:18 <anteaya> nice
15:13:28 <anteaya> patrickeast: you are pure storage, yeah?
15:13:32 <patrickeast> yep
15:13:38 <anteaya> nice job on the wikipage
15:13:41 <asselin_> if my analysis is correct, others should be hitting the same issue
15:13:41 <anteaya> #link https://wiki.openstack.org/wiki/ThirdPartySystems/Pure_Storage_CI
15:13:41 <luqas> ours is also down for maintenance, hope to be back up soon
15:13:51 <anteaya> patrickeast: well done
15:14:10 <patrickeast> thx
15:14:24 <anteaya> luqas: okay
15:14:36 <anteaya> luqas: can you edit your page like patrickeast did? https://wiki.openstack.org/wiki/ThirdPartySystems
15:14:53 <luqas> anteaya: sure
15:14:55 <patrickeast> asselin: oo when did that bug start showing up?
15:15:00 <anteaya> luqas: thanks
15:15:40 <asselin_> patrickeast, since the beginning
15:16:15 <asselin_> patrickeast, finally traced it down...and then submitted the bug now that we know it's not some issue on our side
15:16:22 <patrickeast> asselin_: ahh ok, i see, the workaround is to run sequentially… i’ve been doing that for a while to increase stability (probably sweeping bugs like this under the rug)
15:16:59 <anteaya> asselin_: first of all, nice work on the bug report
15:17:07 <anteaya> asselin_: thanks for including your script
15:17:08 <asselin_> patrickeast, yes. and LVM/gate won't run into it since they don't make external network connections during that time
15:17:45 <anteaya> asselin_: what do you want to do now?
15:18:45 <asselin_> anteaya, right now we're running sequentially. but I'd like to get input from more people to see if they're running into it, and add to the bug.
15:18:59 <anteaya> great
15:19:19 <asselin_> one big question is why br100 isn't set up at the very beginning like in normal devstack.
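The "run sequentially" workaround mentioned above amounts to forcing tempest down to a single worker so nothing else is making outbound connections while br100 is being reconfigured. A minimal sketch of how a third-party CI might wire that into a devstack-gate run follows; TEMPEST_CONCURRENCY is assumed to be the variable devstack-gate honored at the time, so check your own checkout before relying on it.

    #!/bin/bash
    # Hedged sketch of the sequential workaround for bug 1427833 in a
    # devstack-gate driven third-party job. TEMPEST_CONCURRENCY is an
    # assumption; verify the variable name against your devstack-gate copy.
    set -e

    export DEVSTACK_GATE_TEMPEST=1
    export DEVSTACK_GATE_TEMPEST_FULL=1
    # A single tempest worker avoids the parallel window in which outbound
    # connections intermittently fail with ENETUNREACH.
    export TEMPEST_CONCURRENCY=1

    # Fetch and run devstack-gate the same way -infra does
    git clone https://git.openstack.org/openstack-infra/devstack-gate
    ./devstack-gate/devstack-vm-gate-wrap.sh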
15:19:55 <krtaylor> o/
15:20:14 <asselin_> b/c if we can solve that, then we can go back to running parallel tests...which is what we want long-term.
15:20:15 <anteaya> asselin_: I see that
15:20:30 <anteaya> asselin_: so that is a difference between devstack and tempest?
15:20:37 * krtaylor reads bug
15:21:36 <asselin_> anteaya, actually, it's more devstack running under devstack-gate. I think br100 should be set up before tempest even starts.
15:21:45 <anteaya> asselin_: jeblair is supportive of starting a thread on the -dev list to get more feedback on it
15:22:07 <anteaya> asselin_: so the issue might be how devstack-gate is setting up devstack?
15:22:13 <asselin_> anteaya, in that case perhaps the bug is not in tempest....
15:22:26 <anteaya> when you say normal devstack, do you mean a devstack you stack yourself?
15:22:33 <asselin_> anteaya, correct
15:22:37 <anteaya> as opposed to how infra sets it up
15:22:47 <asselin_> yes
15:22:48 <anteaya> yes then devstack-gate might be the home for the bug
15:22:56 <krtaylor> asselin_, we may be seeing something similar, what is the OS? F21?
15:23:05 <anteaya> we should look at how devstack-gate sets up devstack for those nodes
15:23:11 <asselin_> krtaylor, ubuntu 14.04
15:24:13 <anteaya> so the logs for the test you linked are from a run on your system
15:24:22 <anteaya> #link http://15.126.198.151/59/134759/13/check/3par-fc-driver-master-client-pip-eos10-dsvm/c3e9b0b/console.html.gz
15:24:55 <anteaya> so you are using a form of devstack-gate for your setup here?
15:25:11 <asselin_> anteaya, yes, we use the official devstack-gate
15:25:15 <krtaylor> asselin_, interesting, on F21 we are seeing something change the network configuration after tempest starts and creates 2nd level guests, network drops
15:25:27 <anteaya> asselin_: you run devstack-gate master every time?
15:25:34 <anteaya> or do you have a cached copy?
15:25:40 <asselin_> anteaya, yes, same as -infra
15:25:49 <anteaya> okay
15:25:52 <krtaylor> asselin_, I'll crawl through it and see if I see anything
15:26:27 <anteaya> asselin_: the only way you would be able to find the same information from our nodes is to run your script, is that correct?
15:26:58 <asselin_> anteaya, yes that's a good idea. We can then see if br100 is set up in the middle
15:27:04 <anteaya> okay
15:27:14 <anteaya> talk to fungi and see if he can hold you a node
15:27:41 <anteaya> then you can run your script and see if this is what devstack-gate does everywhere or just what it does on third party systems
15:27:47 <anteaya> or perhaps just your system
15:28:06 <fungi> krtaylor: is there an upstream job doing that?
15:28:47 <anteaya> fungi: would it be reasonable to allow asselin_ to run his script on one of our nodes?
15:29:07 <krtaylor> fungi, it is what we are seeing in our CI tests when trying to use a new F21 image
15:29:10 <asselin_> fungi, simple script to print out arp tables with timestamps
15:29:40 <fungi> anteaya: asselin_: to what end? you can propose a change to an existing project (devstack? devstack-gate?) which would run that fairly easily and report on the results, i expect
15:30:04 <anteaya> if you would prefer that, that is fine
15:30:07 <asselin_> fungi, anteaya yes, this is what I did on our system. will see how to do that in infra
15:30:16 <anteaya> asselin_: great
15:30:28 <anteaya> any more on this topic?
15:30:37 <fungi> we generally try to limit manual intervention on node handling to situations where our systems are misbehaving in ways which prevent us from seeing why in our logs 15:31:01 <asselin_> that's it. just wanted to get awareness & ideas. thank you. 15:31:04 <anteaya> fair enough 15:31:14 <anteaya> asselin_: thanks for bringing it up 15:31:15 <krtaylor> rfolco (on my team) has been looking at route info changing mid test, right before tempest creates 2nd level guests 15:31:30 <krtaylor> rfolco ^^^^, see scrollback 15:31:31 <anteaya> does anyone have anything else they would like to discuss today? 15:31:54 <asselin_> sure, last week I got nodepool working with DiskImageBuilder. It's really great! 15:32:02 <anteaya> yes wonderful 15:32:06 <anteaya> congratulations 15:32:15 <anteaya> did you want to share any further thoughts? 15:32:16 <asselin_> it's a huge savings 15:32:44 <anteaya> do you have diskimagebuilder running on its own server or on a server with another service? 15:32:46 <asselin_> in bandwidth, time, and easier to maintain 15:33:01 <asselin_> anteaya, it's part of nodepool, so I just had to configure it. 15:33:08 <anteaya> awesome 15:33:24 <asselin_> I leverage the same elements used in project-config. 15:33:27 <anteaya> is anyone else running diskimagebuilder? 15:33:31 <anteaya> cool 15:33:42 <asselin_> and add extras that I need to support my http proxies, pip configurations, etc. 15:33:58 <anteaya> how long did it take you to figure out your configuration? 15:33:59 <asselin_> and anything that breaks, I just overwrite with an empty file 15:34:14 <anteaya> how do you mean? 15:34:20 <asselin_> anteaya, I spent about 2 days on it....much less than I expected 15:34:29 <anteaya> nice 15:35:04 <krtaylor> asselin_, very good to know 15:35:22 <asselin_> anteaya, for example, there are some element that e.g need access to infra components, that I don't have. So I use all the files, and copy over an empty file for those parts that I don't need. 15:35:37 <anteaya> asselin_: ah that makes sense 15:35:39 <patrickeast> asselin_: yea thats pretty cool, would you recommend others switch to it as well? 15:35:40 <asselin_> #link https://github.com/rasselin/os-ext-testing/commit/dac919beca8dd8109505fab66a716bf4b4451b9e 15:35:45 <asselin_> patrickeast, definitely 15:36:13 <asselin_> patrickeast, the other reason is scalability 15:36:28 <anteaya> that is very straightforward 15:36:31 <anteaya> nicely done 15:36:34 <patrickeast> nice, i’ll give it a try once i’ve got my system back up 15:36:44 <asselin_> we run each blade as a nodepool provider, then each is actually 2 providers, 1 for regular nodes, the other for fc (to limit max nodes) 15:37:12 <asselin_> so 16 blades * 2 providers = 32 providers = 32 images * 4GB = a ton of data every day 15:37:28 <patrickeast> ew yea 15:37:36 <asselin_> with DIB, everything is cached locally on nodepool. image is built once, then uploaded to each provider 15:37:53 <asselin_> so you guarentee the same exact image on all providers 15:38:05 <anteaya> how often are you building images? 
15:38:09 <asselin_> which I think was the main reason -infra switched over to it 15:38:12 <asselin_> anteaya, daily 15:38:18 <anteaya> nice 15:38:52 <anteaya> luqas: thanks for changing your wikipage 15:38:55 <anteaya> #link https://wiki.openstack.org/wiki/ThirdPartySystems/Midokura_CI 15:38:59 <anteaya> luqas: well done 15:39:17 <luqas> anteaya: you're welcome :) 15:39:19 <anteaya> diskimagebuilder was created as part of tripleo 15:39:20 <asselin_> #link https://review.openstack.org/#/c/162313/ 15:39:21 <anteaya> :) 15:39:33 <anteaya> as a way of deploying openstack 15:39:41 <asselin_> ^^ this is needed for nodepool to work in setups with regions 15:39:48 <anteaya> and it was so useful infra started using it 15:39:58 <anteaya> I think it is still being deveopled as well 15:40:09 <anteaya> yolanda has done quite a bit of work onit 15:40:32 <anteaya> oh and as a note 15:40:59 <anteaya> nodepool won't be having a lot of new patches merged until we can get some sort of testing for it 15:41:17 <anteaya> as we had issues last week and we really don't have a great way to test it 15:41:26 <anteaya> other than in production, which is not great 15:41:42 <anteaya> so if anyone wants to help figure out how to test it 15:41:55 <anteaya> I do belive it is on jeblair's todo list 15:42:31 <anteaya> so thanks for the report on how you are doing with diskimagebuilder asselin_ 15:42:46 <anteaya> does anyone have anything else they want to talk about today? 15:43:54 * krtaylor is happy for daylight savings time so he can finally start attending the office hours without meeting conflict 15:43:57 <anteaya> well I won't keep you 15:44:07 <anteaya> thanks everyone for attending 15:44:12 <anteaya> have a good rest of the day 15:44:16 <anteaya> and see you next week 15:44:20 <anteaya> #endmeeting