15:09:29 #startmeeting third-party
15:09:30 Meeting started Mon Mar 9 15:09:29 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:09:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:09:33 The meeting name has been set to 'third_party'
15:09:37 sorry about that
15:09:43 didn't mean to keep you waiting
15:09:47 how is everyone today?
15:10:14 good
15:10:18 glad to hear it
15:10:27 how are everyone's systems operating?
15:10:56 hi, good here
15:11:05 wznoinsk: glad to hear it
15:11:09 so far so good. we found an issue affecting our systems. looking up bug now
15:11:20 asselin_: I'd be interested to hear
15:11:40 mine's turned off right now :( got some new hardware but had to move the existing one to a new rack, had to get new ip addresses and network configs for everything
15:11:57 patrickeast: wow
15:12:06 yay for new hardware
15:12:11 it will be better in a few days tho
15:12:12 yea
15:12:14 but not such great timing
15:12:21 so you are thinking a few days?
15:12:32 #link https://bugs.launchpad.net/tempest/+bug/1427833
15:12:33 Launchpad bug 1427833 in tempest "Intermittent 3rd party ci failures: 400 Bad Request [Errno 101] ENETUNREACH 101" [Undecided,Incomplete]
15:12:36 tripled the number of initiators, and now i’ve got 4x 10 Gb for each to the array and FC ready for next release
15:12:43 i’m hoping later today
15:12:47 few days worst case
15:13:18 nice
15:13:28 patrickeast: you are pure storage, yeah?
15:13:32 yep
15:13:38 nice job on the wikipage
15:13:41 if my analysis is correct, others should be hitting the same issue
15:13:41 #link https://wiki.openstack.org/wiki/ThirdPartySystems/Pure_Storage_CI
15:13:41 ours is also down for maintenance, hope to be up soon
15:13:51 patrickeast: well done
15:14:10 thx
15:14:24 luqas: okay
15:14:36 luqas: can you edit your page like patrickeast did? https://wiki.openstack.org/wiki/ThirdPartySystems
15:14:53 anteaya: sure
15:14:55 asselin: oo when did that bug start showing up?
15:15:00 luqas: thanks
15:15:40 patrickeast, since the beginning
15:16:15 patrickeast, finally traced it down...and then submitted the bug now that we know it's not some issue on our side
15:16:22 asselin_: ahh ok, i see, the workaround is to run sequentially… i’ve been doing that for a while to increase stability (probably sweeping bugs like this under the rug)
15:16:59 asselin_: first of all, nice work on the bug report
15:17:07 asselin_: thanks for including your script
15:17:08 patrickeast, yes. and LVM/gate won't run into it since they don't do external network connections during that time
15:17:45 asselin_: what do you want to do now?
15:18:45 anteaya, right now we're running sequential. but I'd like to get input from more people to see if they're running into it, and add to the bug.
15:18:59 great
15:19:19 one big question is why br100 is not set up at the very beginning like in normal devstack.
15:19:55 o/
15:20:14 b/c if we can solve that, then we can go back to running parallel tests...which is what we want long-term.
15:20:15 asselin_: I see that
15:20:30 asselin_: so that is a difference between devstack and tempest?
15:20:37 * krtaylor reads bug
15:21:36 anteaya, actually, it's more devstack running under devstack-gate. I think br100 should be set up before tempest even starts.
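(asselin_'s actual diagnostic script is attached to the bug linked above and is not reproduced here. As a rough, hypothetical sketch of the kind of timestamped check under discussion, i.e. watching for the br100 bridge and dumping the ARP table so it is visible when networking actually comes up mid-run, something like the following would do; the bridge name, polling interval, and output format are assumptions, not details taken from the bug.)

```python
# Illustrative sketch only (not the script attached to bug 1427833): poll for
# the br100 bridge and print the ARP table with timestamps during a test run,
# so you can see when devstack-gate actually brings networking up.
import os
import subprocess
import time
from datetime import datetime

BRIDGE = "br100"   # assumed bridge name from the discussion above
INTERVAL = 10      # seconds between samples

while True:  # run until killed alongside the test job
    stamp = datetime.utcnow().isoformat()
    # Every network interface on Linux appears under /sys/class/net/.
    bridge_up = os.path.isdir("/sys/class/net/%s" % BRIDGE)
    print("%s %s present: %s" % (stamp, BRIDGE, bridge_up))
    # 'arp -n' prints the current ARP table without reverse DNS lookups.
    print(subprocess.check_output(["arp", "-n"]).decode())
    time.sleep(INTERVAL)
```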
15:21:45 asselin_: jeblair is supportive of starting a thread on the -dev list to get more feedback on it
15:22:07 asselin_: so the issue might be how devstack-gate is setting up devstack?
15:22:13 anteaya, in that case perhaps the bug is not in tempest....
15:22:26 when you say normal devstack, do you mean a devstack you stack yourself?
15:22:33 anteaya, correct
15:22:37 as opposed to how infra sets it up
15:22:47 yes
15:22:48 yes then devstack-gate might be the home for the bug
15:22:56 asselin_, we may be seeing something similar, what is the OS? F21?
15:23:05 we should look at how devstack-gate sets up devstack for those nodes
15:23:11 krtaylor, ubuntu 14.04
15:24:13 so the logs for the test you linked are from a run on your system
15:24:22 #link http://15.126.198.151/59/134759/13/check/3par-fc-driver-master-client-pip-eos10-dsvm/c3e9b0b/console.html.gz
15:24:55 so you are using a form of devstack-gate for your setup here?
15:25:11 anteaya, yes, we use the official devstack-gate
15:25:15 asselin_, interesting, on F21 we are seeing something change the network configuration after tempest starts and creates 2nd level guests, network drops
15:25:27 asselin_: you run devstack-gate master every time?
15:25:34 or do you have a cached copy?
15:25:40 anteaya, yes, same as -infra
15:25:49 okay
15:25:52 asselin_, I'll crawl through it and see if I see anything
15:26:27 asselin_: the only way you would be able to find the same information from our nodes is to run your script, is that correct?
15:26:58 anteaya, yes that's a good idea. We can then see if br100 is set up in the middle
15:27:04 okay
15:27:14 talk to fungi and see if he can hold you a node
15:27:41 then you can run your script and see if this is what devstack-gate is doing or just what devstack-gate is doing on third party systems
15:27:47 or perhaps just your system
15:28:06 krtaylor: is there an upstream job doing that?
15:28:47 fungi: would it be reasonable to allow asselin_ to run his script on one of our nodes?
15:29:07 fungi, it is what we are seeing in our CI tests when trying to use a new F21 image
15:29:10 fungi, simple script to print out arp tables with timestamps
15:29:40 anteaya: asselin_: to what end? you can propose a change to an existing project (devstack? devstack-gate?) which would run that fairly easily and report on the results, i expect
15:30:04 if you would prefer that, that is fine
15:30:07 fungi, anteaya yes, this is what I did on our system. will see how to do that in infra
15:30:16 asselin_: great
15:30:28 any more on this topic?
15:30:37 we generally try to limit manual intervention on node handling to situations where our systems are misbehaving in ways which prevent us from seeing why in our logs
15:31:01 that's it. just wanted to get awareness & ideas. thank you.
15:31:04 fair enough
15:31:14 asselin_: thanks for bringing it up
15:31:15 rfolco (on my team) has been looking at route info changing mid test, right before tempest creates 2nd level guests
15:31:30 rfolco ^^^^, see scrollback
15:31:31 does anyone have anything else they would like to discuss today?
15:31:54 sure, last week I got nodepool working with DiskImageBuilder. It's really great!
15:32:02 yes wonderful
15:32:06 congratulations
15:32:15 did you want to share any further thoughts?
15:32:16 it's a huge savings
15:32:44 do you have diskimagebuilder running on its own server or on a server with another service?
15:32:46 in bandwidth and time, and it's easier to maintain
15:33:01 anteaya, it's part of nodepool, so I just had to configure it.
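(For context on what that nodepool configuration drives: as far as I understand it, nodepool's diskimage-builder support ends up invoking DIB's disk-image-create command with the configured elements. A minimal sketch of a comparable manual invocation follows; the element list, paths, and output name are illustrative assumptions rather than asselin_'s actual settings.)

```python
# Hedged sketch of a manual disk-image-create run, comparable to what a
# DIB-enabled nodepool build performs; element names and paths are illustrative.
import os
import subprocess

env = dict(os.environ)
# Search a local elements directory first, then a checkout of the upstream
# project-config elements (asselin_ describes reusing those below).
env["ELEMENTS_PATH"] = "/etc/nodepool/elements:/opt/project-config/nodepool/elements"

subprocess.check_call(
    [
        "disk-image-create",
        "-o", "/opt/nodepool/images/devstack-trusty",  # output image basename
        "-t", "qcow2",                                  # image format to upload to providers
        # Elements composed into the image (illustrative list):
        "ubuntu", "vm", "nodepool-base", "cache-devstack",
    ],
    env=env,
)
```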
15:33:08 awesome
15:33:24 I leverage the same elements used in project-config.
15:33:27 is anyone else running diskimagebuilder?
15:33:31 cool
15:33:42 and add extras that I need to support my http proxies, pip configurations, etc.
15:33:58 how long did it take you to figure out your configuration?
15:33:59 and anything that breaks, I just overwrite with an empty file
15:34:14 how do you mean?
15:34:20 anteaya, I spent about 2 days on it....much less than I expected
15:34:29 nice
15:35:04 asselin_, very good to know
15:35:22 anteaya, for example, there are some elements that e.g. need access to infra components, that I don't have. So I use all the files, and copy over an empty file for those parts that I don't need.
15:35:37 asselin_: ah that makes sense
15:35:39 asselin_: yea that's pretty cool, would you recommend others switch to it as well?
15:35:40 #link https://github.com/rasselin/os-ext-testing/commit/dac919beca8dd8109505fab66a716bf4b4451b9e
15:35:45 patrickeast, definitely
15:36:13 patrickeast, the other reason is scalability
15:36:28 that is very straightforward
15:36:31 nicely done
15:36:34 nice, i’ll give it a try once i’ve got my system back up
15:36:44 we run each blade as a nodepool provider, then each is actually 2 providers, 1 for regular nodes, the other for fc (to limit max nodes)
15:37:12 so 16 blades * 2 providers = 32 providers = 32 images * 4GB = a ton of data every day
15:37:28 ew yea
15:37:36 with DIB, everything is cached locally on nodepool. the image is built once, then uploaded to each provider
15:37:53 so you guarantee the same exact image on all providers
15:38:05 how often are you building images?
15:38:09 which I think was the main reason -infra switched over to it
15:38:12 anteaya, daily
15:38:18 nice
15:38:52 luqas: thanks for changing your wikipage
15:38:55 #link https://wiki.openstack.org/wiki/ThirdPartySystems/Midokura_CI
15:38:59 luqas: well done
15:39:17 anteaya: you're welcome :)
15:39:19 diskimagebuilder was created as part of tripleo
15:39:20 #link https://review.openstack.org/#/c/162313/
15:39:21 :)
15:39:33 as a way of deploying openstack
15:39:41 ^^ this is needed for nodepool to work in setups with regions
15:39:48 and it was so useful infra started using it
15:39:58 I think it is still being developed as well
15:40:09 yolanda has done quite a bit of work on it
15:40:32 oh and as a note
15:40:59 nodepool won't be having a lot of new patches merged until we can get some sort of testing for it
15:41:17 as we had issues last week and we really don't have a great way to test it
15:41:26 other than in production, which is not great
15:41:42 so if anyone wants to help figure out how to test it
15:41:55 I do believe it is on jeblair's todo list
15:42:31 so thanks for the report on how you are doing with diskimagebuilder asselin_
15:42:46 does anyone have anything else they want to talk about today?
15:43:54 * krtaylor is happy for daylight savings time so he can finally start attending the office hours without meeting conflict
15:43:57 well I won't keep you
15:44:07 thanks everyone for attending
15:44:12 have a good rest of the day
15:44:16 and see you next week
15:44:20 #endmeeting
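(A postscript on the element-override trick asselin_ describes above, reusing the upstream project-config elements and blanking out the pieces that need infra-only services: a minimal sketch of that approach, with hypothetical paths and a placeholder list of scripts to disable, could look like this.)

```python
# Sketch of the "copy the upstream elements, blank out what you can't use"
# approach described in the meeting above. All paths and the DISABLE list are
# hypothetical placeholders, not taken from asselin_'s actual configuration.
import shutil
from pathlib import Path

SRC = Path("/opt/project-config/nodepool/elements")  # upstream elements checkout
DST = Path("/etc/nodepool/elements")                 # local elements dir nodepool uses

# Element scripts assumed to need infra-internal services; adjust this to
# whatever breaks in your own environment.
DISABLE = [
    "cache-devstack/extra-data.d/55-cache-devstack-repos",
]

if DST.exists():
    shutil.rmtree(DST)
shutil.copytree(SRC, DST)

for rel in DISABLE:
    # Overwrite the script with a harmless no-op; the copied file keeps its
    # executable bit, so DIB still runs it without failing.
    (DST / rel).write_text("#!/bin/bash\nexit 0\n")
```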