15:09:29 <anteaya> #startmeeting third-party
15:09:30 <openstack> Meeting started Mon Mar 9 15:09:29 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:09:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:09:33 <openstack> The meeting name has been set to 'third_party'
15:09:37 <anteaya> sorry about that
15:09:43 <anteaya> didn't mean to keep you waiting
15:09:47 <anteaya> how is everyone today?
15:10:14 <patrickeast> good
15:10:18 <anteaya> glad to hear it
15:10:27 <anteaya> how are everyone's systems operating?
15:10:56 <wznoinsk> hi, good here
15:11:05 <anteaya> wznoinsk: glad to hear it
15:11:09 <asselin_> so far so good. we found an issue affecting our systems. looking up bug now
15:11:20 <anteaya> asselin_: I'd be interested to hear
15:11:40 <patrickeast> mine's turned off right now :( got some new hardware but had to move the existing one to a new rack, had to get new ip addresses and network configs for everything
15:11:57 <anteaya> patrickeast: wow
15:12:06 <anteaya> yay for new hardware
15:12:11 <patrickeast> it will be better in a few days tho
15:12:12 <patrickeast> yea
15:12:14 <anteaya> but not such great timing
15:12:21 <anteaya> so you are thinking a few days?
15:12:32 <asselin_> #link https://bugs.launchpad.net/tempest/+bug/1427833
15:12:33 <openstack> Launchpad bug 1427833 in tempest "Intermittent 3rd party ci failures: 400 Bad Request [Errno 101] ENETUNREACH 101" [Undecided,Incomplete]
15:12:36 <patrickeast> tripled the amount of initiators, and now i’ve got 4x 10 Gb for each to the array and FC ready for next release
15:12:43 <patrickeast> i’m hoping later today
15:12:47 <patrickeast> few days worst case
15:13:18 <anteaya> nice
15:13:28 <anteaya> patrickeast: you are pure storage, yeah?
15:13:32 <patrickeast> yep
15:13:38 <anteaya> nice job on the wikipage
15:13:41 <asselin_> if my analysis is correct, others should be hitting the same issue
15:13:41 <anteaya> #link https://wiki.openstack.org/wiki/ThirdPartySystems/Pure_Storage_CI
15:13:41 <luqas> ours is also down for maintenance, hope to be back up soon
15:13:51 <anteaya> patrickeast: well done
15:14:10 <patrickeast> thx
15:14:24 <anteaya> luqas: okay
15:14:36 <anteaya> luqas: can you edit your page like patrickeast did? https://wiki.openstack.org/wiki/ThirdPartySystems
15:14:53 <luqas> anteaya: sure
15:14:55 <patrickeast> asselin: oo when did that bug start showing up?
15:15:00 <anteaya> luqas: thanks
15:15:40 <asselin_> patrickeast, since the beginning
15:16:15 <asselin_> patrickeast, finally traced it down...and then submitted the bug now that we know it's not some issue on our side
15:16:22 <patrickeast> asselin_: ahh ok, i see, the workaround is to run sequentially… i’ve been doing that for a while to increase stability (probably sweeping bugs like this under the rug)
15:16:59 <anteaya> asselin_: first of all, nice work on the bug report
15:17:07 <anteaya> asselin_: thanks for including your script
15:17:08 <asselin_> patrickeast, yes. and LVM/gate won't run into it since they don't make external network connections during that time
15:17:45 <anteaya> asselin_: what do you want to do now?
15:18:45 <asselin_> anteaya, right now we're running sequentially. but I'd like to get input from more people to see if they're running into it, and add to the bug.
15:18:59 <anteaya> great
15:19:19 <asselin_> one big question is why br100 isn't set up at the very beginning like in normal devstack.
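The "run sequentially" workaround mentioned above amounts to forcing tempest down to a single worker so nothing else is making outbound connections while br100 is being reconfigured. A minimal sketch of how a third-party CI might wire that into a devstack-gate run follows; TEMPEST_CONCURRENCY is assumed to be the variable devstack-gate honored at the time, so check your own checkout before relying on it.

    #!/bin/bash
    # Hedged sketch of the sequential workaround for bug 1427833 in a
    # devstack-gate driven third-party job. TEMPEST_CONCURRENCY is an
    # assumption; verify the variable name against your devstack-gate copy.
    set -e

    export DEVSTACK_GATE_TEMPEST=1
    export DEVSTACK_GATE_TEMPEST_FULL=1
    # A single tempest worker avoids the parallel window in which outbound
    # connections intermittently fail with ENETUNREACH.
    export TEMPEST_CONCURRENCY=1

    # Fetch and run devstack-gate the same way -infra does
    git clone https://git.openstack.org/openstack-infra/devstack-gate
    ./devstack-gate/devstack-vm-gate-wrap.sh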
15:19:55 <krtaylor> o/
15:20:14 <asselin_> b/c if we can solve that, then we can go back to running parallel tests...which is what we want long-term.
15:20:15 <anteaya> asselin_: I see that
15:20:30 <anteaya> asselin_: so that is a difference between devstack and tempest?
15:20:37 * krtaylor reads bug
15:21:36 <asselin_> anteaya, actually, it's more devstack running under devstack-gate. I think br100 should be set up before tempest even starts.
15:21:45 <anteaya> asselin_: jeblair is supportive of starting a thread on the -dev list to get more feedback on it
15:22:07 <anteaya> asselin_: so the issue might be how devstack-gate is setting up devstack?
15:22:13 <asselin_> anteaya, in that case perhaps the bug is not in tempest....
15:22:26 <anteaya> when you say normal devstack, do you mean a devstack you stack yourself?
15:22:33 <asselin_> anteaya, correct
15:22:37 <anteaya> as opposed to how infra sets it up
15:22:47 <asselin_> yes
15:22:48 <anteaya> yes then devstack-gate might be the home for the bug
15:22:56 <krtaylor> asselin_, we may be seeing something similar, what is the OS? F21?
15:23:05 <anteaya> we should look at how devstack-gate sets up devstack for those nodes
15:23:11 <asselin_> krtaylor, ubuntu 14.04
15:24:13 <anteaya> so the logs for the test you linked are from a run on your system
15:24:22 <anteaya> #link http://15.126.198.151/59/134759/13/check/3par-fc-driver-master-client-pip-eos10-dsvm/c3e9b0b/console.html.gz
15:24:55 <anteaya> so you are using a form of devstack-gate for your setup here?
15:25:11 <asselin_> anteaya, yes, we use the official devstack-gate
15:25:15 <krtaylor> asselin_, interesting, on F21 we are seeing something change the network configuration after tempest starts and creates 2nd level guests, network drops
15:25:27 <anteaya> asselin_: you run devstack-gate master every time?
15:25:34 <anteaya> or do you have a cached copy?
15:25:40 <asselin_> anteaya, yes, same as -infra
15:25:49 <anteaya> okay
15:25:52 <krtaylor> asselin_, I'll crawl through it and see if I see anything
15:26:27 <anteaya> asselin_: the only way you would be able to find the same information from our nodes is to run your script, is that correct?
15:26:58 <asselin_> anteaya, yes that's a good idea. We can then see if br100 is set up in the middle
15:27:04 <anteaya> okay
15:27:14 <anteaya> talk to fungi and see if he can hold you a node
15:27:41 <anteaya> then you can run your script and see if this is what devstack-gate does everywhere or just what it does on third party systems
15:27:47 <anteaya> or perhaps just your system
15:28:06 <fungi> krtaylor: is there an upstream job doing that?
15:28:47 <anteaya> fungi: would it be reasonable to allow asselin_ to run his script on one of our nodes?
15:29:07 <krtaylor> fungi, it is what we are seeing in our CI tests when trying to use a new F21 image
15:29:10 <asselin_> fungi, simple script to print out arp tables with timestamps
15:29:40 <fungi> anteaya: asselin_: to what end? you can propose a change to an existing project (devstack? devstack-gate?) which would run that fairly easily and report on the results, i expect
15:30:04 <anteaya> if you would prefer that, that is fine
15:30:07 <asselin_> fungi, anteaya yes, this is what I did on our system. will see how to do that in infra
15:30:16 <anteaya> asselin_: great
15:30:28 <anteaya> any more on this topic?
15:30:37 <fungi> we generally try to limit manual intervention on node handling to situations where our systems are misbehaving in ways which prevent us from seeing why in our logs 15:31:01 <asselin_> that's it. just wanted to get awareness & ideas. thank you. 15:31:04 <anteaya> fair enough 15:31:14 <anteaya> asselin_: thanks for bringing it up 15:31:15 <krtaylor> rfolco (on my team) has been looking at route info changing mid test, right before tempest creates 2nd level guests 15:31:30 <krtaylor> rfolco ^^^^, see scrollback 15:31:31 <anteaya> does anyone have anything else they would like to discuss today? 15:31:54 <asselin_> sure, last week I got nodepool working with DiskImageBuilder. It's really great! 15:32:02 <anteaya> yes wonderful 15:32:06 <anteaya> congratulations 15:32:15 <anteaya> did you want to share any further thoughts? 15:32:16 <asselin_> it's a huge savings 15:32:44 <anteaya> do you have diskimagebuilder running on its own server or on a server with another service? 15:32:46 <asselin_> in bandwidth, time, and easier to maintain 15:33:01 <asselin_> anteaya, it's part of nodepool, so I just had to configure it. 15:33:08 <anteaya> awesome 15:33:24 <asselin_> I leverage the same elements used in project-config. 15:33:27 <anteaya> is anyone else running diskimagebuilder? 15:33:31 <anteaya> cool 15:33:42 <asselin_> and add extras that I need to support my http proxies, pip configurations, etc. 15:33:58 <anteaya> how long did it take you to figure out your configuration? 15:33:59 <asselin_> and anything that breaks, I just overwrite with an empty file 15:34:14 <anteaya> how do you mean? 15:34:20 <asselin_> anteaya, I spent about 2 days on it....much less than I expected 15:34:29 <anteaya> nice 15:35:04 <krtaylor> asselin_, very good to know 15:35:22 <asselin_> anteaya, for example, there are some element that e.g need access to infra components, that I don't have. So I use all the files, and copy over an empty file for those parts that I don't need. 15:35:37 <anteaya> asselin_: ah that makes sense 15:35:39 <patrickeast> asselin_: yea thats pretty cool, would you recommend others switch to it as well? 15:35:40 <asselin_> #link https://github.com/rasselin/os-ext-testing/commit/dac919beca8dd8109505fab66a716bf4b4451b9e 15:35:45 <asselin_> patrickeast, definitely 15:36:13 <asselin_> patrickeast, the other reason is scalability 15:36:28 <anteaya> that is very straightforward 15:36:31 <anteaya> nicely done 15:36:34 <patrickeast> nice, i’ll give it a try once i’ve got my system back up 15:36:44 <asselin_> we run each blade as a nodepool provider, then each is actually 2 providers, 1 for regular nodes, the other for fc (to limit max nodes) 15:37:12 <asselin_> so 16 blades * 2 providers = 32 providers = 32 images * 4GB = a ton of data every day 15:37:28 <patrickeast> ew yea 15:37:36 <asselin_> with DIB, everything is cached locally on nodepool. image is built once, then uploaded to each provider 15:37:53 <asselin_> so you guarentee the same exact image on all providers 15:38:05 <anteaya> how often are you building images? 
15:38:09 <asselin_> which I think was the main reason -infra switched over to it 15:38:12 <asselin_> anteaya, daily 15:38:18 <anteaya> nice 15:38:52 <anteaya> luqas: thanks for changing your wikipage 15:38:55 <anteaya> #link https://wiki.openstack.org/wiki/ThirdPartySystems/Midokura_CI 15:38:59 <anteaya> luqas: well done 15:39:17 <luqas> anteaya: you're welcome :) 15:39:19 <anteaya> diskimagebuilder was created as part of tripleo 15:39:20 <asselin_> #link https://review.openstack.org/#/c/162313/ 15:39:21 <anteaya> :) 15:39:33 <anteaya> as a way of deploying openstack 15:39:41 <asselin_> ^^ this is needed for nodepool to work in setups with regions 15:39:48 <anteaya> and it was so useful infra started using it 15:39:58 <anteaya> I think it is still being deveopled as well 15:40:09 <anteaya> yolanda has done quite a bit of work onit 15:40:32 <anteaya> oh and as a note 15:40:59 <anteaya> nodepool won't be having a lot of new patches merged until we can get some sort of testing for it 15:41:17 <anteaya> as we had issues last week and we really don't have a great way to test it 15:41:26 <anteaya> other than in production, which is not great 15:41:42 <anteaya> so if anyone wants to help figure out how to test it 15:41:55 <anteaya> I do belive it is on jeblair's todo list 15:42:31 <anteaya> so thanks for the report on how you are doing with diskimagebuilder asselin_ 15:42:46 <anteaya> does anyone have anything else they want to talk about today? 15:43:54 * krtaylor is happy for daylight savings time so he can finally start attending the office hours without meeting conflict 15:43:57 <anteaya> well I won't keep you 15:44:07 <anteaya> thanks everyone for attending 15:44:12 <anteaya> have a good rest of the day 15:44:16 <anteaya> and see you next week 15:44:20 <anteaya> #endmeeting