19:00:20 <lifeless> #startmeeting tripleo
19:00:20 <openstack> Meeting started Tue Nov 19 19:00:20 2013 UTC and is due to finish in 60 minutes.  The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:24 <openstack> The meeting name has been set to 'tripleo'
19:00:25 <jistr> hi
19:00:28 <ifarkas> hello
19:00:31 <lifeless> #topic agenda
19:00:41 <lifeless> bugs
19:00:41 <lifeless> reviews
19:00:42 <lifeless> Projects needing releases
19:00:42 <lifeless> CD Cloud status
19:00:42 <lifeless> CI virtualized testing progress
19:00:44 <lifeless> Insert one-off agenda items here
19:00:46 <lifeless> open discussion
19:00:49 <lifeless> #topic bugs
19:00:55 <lifeless> #link https://bugs.launchpad.net/tripleo/
19:00:55 <lifeless> #link https://bugs.launchpad.net/diskimage-builder/
19:00:56 <lifeless> #link https://bugs.launchpad.net/os-refresh-config
19:00:56 <lifeless> #link https://bugs.launchpad.net/os-apply-config
19:00:56 <lifeless> #link https://bugs.launchpad.net/os-collect-config
19:00:58 <lifeless> #link https://bugs.launchpad.net/tuskar
19:01:00 <lifeless> #link https://bugs.launchpad.net/tuskar-ui
19:01:03 <lifeless> #link https://bugs.launchpad.net/python-tuskarclient
19:01:09 <lifeless> one critical fixed - bug 1251166
19:01:17 <jcoufal> o/
19:01:24 <bnemec> \o
19:01:27 <pleia2> o/
19:01:31 <jomara> good afternoon
19:01:31 <slagle> |o
19:01:39 <tzumainn> heya
19:01:40 <jomara> i broke the wave
19:01:40 <dkehn> hi
19:01:43 <jomara> i am so sorry
19:01:44 <jtomasek> hey
19:02:04 <marios> 'evening
19:02:12 <jprovazn> hi
19:03:13 <lifeless> sorry elocal there
19:04:14 <lifeless> ok so all other bugs triaged it looks like
19:04:17 <lifeless> and no other criticals - cool
19:04:22 <lifeless> any other bug business?
19:05:21 <dprince> lifeless: question about general OS fixes that affect tripleO as well.
19:05:27 <lifeless> dprince: shoot
19:05:36 <dprince> lifeless: should we register those against tripleO in LP as well?
19:05:44 <dprince> lifeless: and/or put them in the trello
19:06:13 <dprince> lifeless: or just fix them in each project (Nova) and mention it on tripleO IRC
19:06:31 * dprince doesn't like extra busy work but wants to keep people in the loop
19:06:35 <lifeless> dprince: can you give a for-instance ?
19:07:17 <dprince> lifeless: for instance... if any sort of integration point breaks that we aren't gating on
19:07:17 <rpodolyaka1> sqlalchemy-migrate versioning bug
19:07:24 <lifeless> ok
19:07:27 <dprince> ^^ or that, sure
19:07:40 <rpodolyaka1> that broke nova and other api services for a few hours
19:07:51 <lifeless> so, IMO if we're going to make a code change (e.g. version lock to work around) then it's entirely appropriate to have a bug in TripleO
19:08:06 <lifeless> if it's something we're going to wait and see / just fix in the source location immediately
19:08:18 <lifeless> then I think a trello card maybe, and/or a topic update to surface the info
19:08:38 <lifeless> dprince: how does that sound?
19:09:00 <dprince> lifeless: okay. thanks
19:09:07 <rpodolyaka1> +1
19:09:20 <lifeless> ok
19:09:23 <lifeless> #topic reviews
19:09:31 <lifeless> http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:09:34 <lifeless> http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:09:37 <lifeless> http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt
19:09:52 * SpamapS dons UTC watch
19:10:04 <lifeless> Median wait time: 0 days, 6 hours, 15 minutes
19:10:05 <lifeless> 3rd quartile wait time: 1 days, 4 hours, 7 minutes
19:10:10 <lifeless> so we're a little behind, *but*
19:10:19 <rpodolyaka1> WIP?
19:10:26 <lifeless> https://review.openstack.org/#/c/52045/ is the reason, and I think it's meant to be marked work in progress
19:10:27 <dprince> reviews seem to be totally under control... so much so that I frequently don't get to see things until after they land
19:10:29 <lifeless> but isn't
19:10:34 * matty_dubs actually wears UTC watch
19:10:57 <lifeless> marios: gerrit sadly requires you to click on 'work in progress' after each push
19:11:14 <jcoufal> matty_dubs: but only 12 h mode :P
19:11:49 <marios> lifeless: oh is that what that was
19:12:13 <marios> sorry, fixing
19:12:30 <SpamapS> I have not been doing my usual share of reviews the last week as well. Just getting back into it today.
19:13:55 <lifeless> ok, so yeah good stuff!
19:14:08 <lifeless> #topic  Projects needing releases
19:14:14 <lifeless> we've landed a bunch of code
19:14:24 <lifeless> do we have a volunteer to do releases of them all ?
19:14:34 <rpodolyaka1> pick me :)
19:14:39 <lifeless> rpodolyaka1: tag!
19:15:13 <lifeless> #action rpodolyaka1 to release all the things
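
A rough sketch of the release motion for these projects at the time, assuming the usual push-a-signed-tag-to-Gerrit flow; the project and version number below are illustrative, and the 'gerrit' remote is the one git-review configures:

    # Hedged sketch; repeat per project with its own next version number.
    cd os-apply-config
    git fetch origin && git checkout origin/master
    git tag -s 0.1.1 -m "os-apply-config 0.1.1"   # version is a placeholder
    git push gerrit 0.1.1                         # push the tag to the Gerrit remote
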
19:15:25 <lifeless> #topic  CD Cloud status
19:16:11 <lifeless> it's pretty unwell
19:16:24 <lifeless> I've not led by example and fixed it this week - sorry
19:17:09 <SpamapS> I have a POC that will at least let us track the map of hardware<->instances
19:17:32 <SpamapS> Should push that up for review later today.
19:17:52 <SpamapS> I have a hypothesis that certain hardware is causing timeouts/problems while others allow it to finish properly.
19:18:25 <dprince> SpamapS: do we have a mixed bag of hardware in the CD cloud then?
19:19:04 <lifeless> not really
19:19:08 <lifeless> we have two hardware configs
19:19:11 <lifeless> in theory
19:19:19 <lifeless> but we have some oddness
19:19:35 <lifeless> SpamapS: it's failing 100% of the time atm though
19:19:42 <lifeless> SpamapS: so I think diagnosing directly is needed
19:19:47 <SpamapS> lifeless: network driver reload again?
19:20:06 <lifeless> SpamapS: I don't know :)
19:20:14 <lifeless> SpamapS: but I want a larger set of people responding and fixing
19:20:35 <SpamapS> lifeless: agreed diagnosing directly is needed. My suggestion is that it is 100% failing because the good hardware is all taken. But I am willing to accept evidence of other problems. Right now we have very little data beyond "it fails"
19:20:49 <lifeless> SpamapS: we have 4 machines deployed and 50 in the rack.
19:20:53 <lifeless> SpamapS: your theory is wrong.
19:20:59 <lifeless> SpamapS: :)
19:21:02 <SpamapS> unless we're using the same 4.
19:21:43 <lifeless> SpamapS: so, I don't want to rathole right now.
19:21:51 <lifeless> SpamapS: lets talk process and coordination instead.
19:21:57 <SpamapS> nor do I. But I don't want to leave without a plan of action.
19:22:17 <lifeless> SpamapS: I think a volunteer poking directly and gathering data is a good start.
19:22:33 <lifeless> SpamapS: e.g. stop the service, run 'nova boot' and see if a machine comes up
19:23:18 <lifeless> SpamapS: I do want us to gather more automated data
19:23:28 <lifeless> SpamapS: but we also need to get into a production mindset and keep the thing itself running.
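
A minimal sketch of that kind of direct check, assuming undercloud credentials in ~/stackrc and placeholder flavor/image names (the real names on the CD undercloud will differ):

    # Hedged sketch: with the CD service stopped, try one plain deploy by hand.
    source ~/stackrc                        # undercloud credentials (path is an assumption)
    nova boot --flavor baremetal --image overcloud-compute test-deploy   # placeholder names
    nova list                               # does the instance ever reach ACTIVE?
    nova console-log test-deploy            # if not, see where the boot stalls
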
19:24:01 <lifeless> SpamapS: my concern is that right now we have 9 sysadmins, but the cloud has been down for 3+ days
19:24:30 <SpamapS> the cloud is down for several minutes every hour
19:24:38 <SpamapS> I don't think it matters to anyone yet that it goes down...
19:25:24 <lifeless> SpamapS: if it fails to deploy, that means we have a problem that may affect everyone doing devtest.
19:25:27 <SpamapS> lifeless: I've been uninterested in part because it feels more important to preserve state first so we can have some kind of SLA/monitoring of "cloud is up" before we chase these problems.
19:25:39 <dprince> lifeless: So I think the issue now is we (as a team) are chasing things on multiple fronts. Once CI and CD get closer we'll have more interest in CD cloud errors for sure.
19:25:58 <dprince> lifeless: I only logged into the CD cloud for the first time this week...
19:26:09 <dprince> lifeless: and then only briefly
19:26:14 <lifeless> dprince: mmmm I can see that angle. The way I think of this is that this period is our learning period.
19:26:22 <slagle> what happened to the status announcements in #tripleo?
19:26:28 * dprince needs a crash course perhaps in our CD cloud setup
19:26:33 <slagle> or is that part of what's down?
19:26:59 <lifeless> slagle: that would seem to be down too
19:26:59 <SpamapS> slagle: it may be turned off.
19:27:19 <lifeless> slagle: which means either someone disabled the service (and didn't say so somewhere visible like the channel topic)
19:27:29 <lifeless> or the undercloud host has lost its firmware marbles again
19:27:32 <slagle> :)
19:27:38 <slagle> so that's part of the problem
19:27:44 <lifeless> agreed!
19:28:07 <lifeless> I have linked references to an HPCS ticket about these cards/firmware having issues
19:28:08 <SpamapS> Let's just agree to make it a priority and do analysis on why it has been failing.
19:28:15 <lifeless> ok
19:28:25 <lifeless> SpamapS: can you continue the meeting? ELOCAL
19:28:50 <SpamapS> lifeless: perhaps.. does the bot let you transfer?
19:29:29 <dprince> SpamapS: I don't think so... but we can unofficially follow your lead
19:29:48 <SpamapS> Ok sure. :)
19:29:51 <dprince> SpamapS: not 100% on that though
19:30:06 <SpamapS> documentation doesn't show any way to take it
19:30:38 <SpamapS> anyway, I think we agree that analysis is needed and that we should probably hold it to a higher standard.
19:31:20 <SpamapS> #topic CI virtualized testing progress
19:31:31 <SpamapS> pleia2: is this still your area?
19:31:46 <pleia2> yeah
19:32:05 <pleia2> I owe a patch to create-nodes and would like to talk a little about boot-seed-vm
19:32:48 <pleia2> we could use boot-seed-vm in setting up the test environment, but we'd need to modularize it a bit more; right now actually booting the seed vm is optional, and we could also make building the seed vm optional
19:33:16 <pleia2> so essentially it would just run configure-vm without building the image, or we could just run configure-vm outside of boot-seed-vm
19:33:42 <pleia2> thoughts?
19:33:46 <SpamapS> pleia2: I like splitting things into smaller and smaller tools
19:34:46 <SpamapS> anyone else have strong opinions?
19:35:17 <lifeless> hi, backish
19:35:30 <pleia2> ok, beyond that I'm working on how to tackle network addressing for all these test envs
19:35:33 <lifeless> we need to separate out all the libvirt manipulation
19:35:48 <lifeless> e.g. configure-vm should be run just once as part of setting up the test environment
19:36:08 <lifeless> so I'd like to see boot-seed-vm behaviour change to assume the vm is defined
19:36:19 <pleia2> ok, great
19:36:20 <lifeless> and just build the image, copy to $place, and start it.
19:36:30 * pleia2 nods
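
In other words, roughly (a sketch of the proposed split, not the current script; $SEED_ELEMENTS and $IMG_PATH are placeholders, and the libvirt domain 'seed' is assumed to have been defined earlier by configure-vm):

    # Hedged sketch of the reduced boot-seed-vm behaviour lifeless describes.
    disk-image-create -o seed vm $SEED_ELEMENTS   # build the seed image as today
    sudo cp seed.qcow2 "$IMG_PATH/seed.qcow2"     # copy to $place
    sudo virsh start seed                         # just start the pre-defined domain
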
19:37:20 <pleia2> that's all from me, dprince and derekh are looking into the gearman stuff https://etherpad.openstack.org/p/3rYI32gvfu
19:37:24 * dprince wishes this meeting didn't run concurrently with the infra team meeting
19:38:00 <dprince> pleia2: cool, good progress. The gearman stuff is coming along nicely too
19:39:33 <dprince> On the gearman front I believe we are still hitting issues with the python gearman client. But if you see the etherpad above the PHP client seems to work fine.
19:39:51 <dprince> So... we might need to have a closer look at fixing that.
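
One cheap way to narrow that down is to poke the gearman server directly, independent of any client library; this assumes gearadmin (shipped with gearmand) is installed and the server is listening on the default localhost:4730:

    # Hedged sketch: separate "python client is broken" from "server is unhappy".
    gearadmin --status    # per-function queued/running counts and available workers
    gearadmin --workers   # which workers are connected and what they have registered
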
19:41:27 <dprince> Is it worth having another CI focussed google hangout this week? pleia2/derekh?
19:41:40 <eggmaster> durrh, sorry, just saw tuskar meeting invite email.
19:41:54 <pleia2> dprince: yeah, maybe tomorrow? I'm on the tail end of a flu so today still isn't great
19:42:21 <dprince> pleia2: okay. sounds good
19:42:38 * dprince hopes pleia2 feels better
19:42:47 <pleia2> thanks :)
19:43:38 <pleia2> I think that's it for the CI stuff in this meeting
19:43:44 <lifeless> ok properly back
19:43:48 <SpamapS> ok so let's just move on
19:44:04 <SpamapS> #topic Open Discussion
19:44:06 <lifeless> dprince: I'm up for a CI hangout
19:44:16 <SpamapS> oh and now he's back ;)
19:44:22 <jistr> so i didn't make the meeting last time but wanted to give my +1 to the idea of gate involving techs like puppet or chef, for projects where it's helpful
19:44:32 <dprince> lifeless: cool, sounds like tomorrow is the best day for it
19:44:54 <jistr> if we don't gate on it, the potential bugs won't go away, we'll just hit them more painfully
19:45:11 <jistr> and in the end someone will have to fix it anyway
19:45:23 <jistr> so we might as well fix it *before* stuff gets in
19:45:51 <jistr> and if some dev feels blocked by tech he doesn't understand, he can ask for assistance
19:46:16 <lifeless> jistr: so, I agree, but I think it's a -infra discussion at this point
19:46:33 <jistr> ok :)
19:49:02 <lifeless> #topic Open Discussion
19:49:11 <lifeless> I'll time out the meeting in a minute
19:49:49 <dprince> lifeless: I have a question about my local dev env... which I've been chipping away at again
19:50:07 <dprince> lifeless: sent you derek and liz the email a few days back...
19:50:22 <dprince> lifeless: got back to it today, and still hitting an issue
19:50:46 <dprince> lifeless: which is, I can't ping my overcloud from outside my seed VM
19:50:56 <dprince> lifeless: which sort of breaks the devtest story :(
19:51:00 <lifeless> the overcloud host
19:51:04 <lifeless> or an instance in the overcloud
19:51:13 <dprince> lifeless: overcloud host (all-in-one)
19:51:31 <dprince> lifeless: I can ping the undercloud host fine
19:51:34 <lifeless> login via the console using stack:stack and check the networking is setup
19:51:40 <lifeless> sensibly
19:51:41 <dprince> lifeless: did that
19:51:51 <jomara> \/win 4
19:51:56 <jomara> !
19:51:58 <dprince> lifeless: didn't see anything too odd
19:52:07 <lifeless> dprince: in particular check there is a default route out via 192.0.2.1
19:52:17 <lifeless> dprince: cause that being missing would give the symptoms you're describing
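
For reference, a quick way to check (and, for testing only, add) that route from inside the overcloud host; 192.0.2.1 is the devtest-default gateway:

    # Is there a default route via the seed/undercloud gateway?
    ip route show default                      # expect: "default via 192.0.2.1 dev ..."
    # Temporary workaround for testing only; the heat template should really set it:
    sudo ip route add default via 192.0.2.1
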
19:52:36 <dprince> lifeless: I think it was, but I can check.
19:52:51 <SpamapS> well that could explain the 100% fail rate on the CD cloud...
19:54:22 <rpodolyaka1> haven't tested Dan's all-in-one template yet, but devtest is working right now
19:54:26 <lifeless> dprince: if the route is there, start a ping from outside to your overcloud host, and tcpdump each hop along the way
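
A sketch of that hop-by-hop check; the interface names are assumptions (br-ex on the overcloud host, virbr0 or similar on the VM host) and $OVERCLOUD_IP is a placeholder:

    # From outside, start a continuous ping at the overcloud host:
    ping $OVERCLOUD_IP
    # On each hop, watch whether the echo request arrives and a reply leaves:
    sudo tcpdump -ni br-ex icmp        # on the overcloud host
    sudo tcpdump -ni virbr0 icmp       # on the VM host bridge (name is an assumption)
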
19:54:40 <dprince> lifeless: So if it isn't set should I use the NeutronPublicInterfaceDefaultRoute in the template?
19:55:07 <dprince> lifeless: did that. I saw incoming ICMP packets. But no responses on br-ex from the overcloud
19:55:32 <rpodolyaka1> that's only for overriding the default route that gets set, AFAIK
19:55:42 <dprince> lifeless: on a related note... it would be really cool if our standard image included tcpdump
19:55:54 <lifeless> dprince: it does
19:55:55 <lifeless> dprince: :)
19:56:02 <lifeless> dprince: maybe not on fedora?
19:56:03 <dprince> lifeless: not on Fedora :(
19:56:14 <lifeless> dprince: I'm entirely happy for us to add it :)
19:56:40 <dprince> lifeless: yep. Which element is it in again (DIB presumably)
19:56:44 <lifeless> dprince: not sure if NeutronPublicInterfaceDefaultRoute would be appropriate
19:56:48 <rpodolyaka1> dprince: there is a problem with br-ex when hand executing devtest (https://bugs.launchpad.net/tripleo/+bug/1252304), but it should only affect floating ips
19:56:57 <rpodolyaka1> not pings of overcloud host
19:56:58 <lifeless> dprince: I'd add it to the fedora element TBH
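
A minimal sketch of what that could look like: an install.d script in the fedora element (the filename and location are assumptions; it could equally live in a shared element), using the install-packages helper diskimage-builder provides:

    #!/bin/bash
    # Hypothetical elements/fedora/install.d/60-tcpdump
    set -eux
    # install-packages wraps the image's package manager inside the chroot
    install-packages tcpdump
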
19:57:12 <dprince> rpodolyaka1: thanks, I did see your ticket earlier this week
19:57:25 <dprince> lifeless: thanks
19:58:04 <lifeless> ok, we're out of time :)
19:58:12 <lifeless> thanks everyone!
19:58:14 <lifeless> #endmeeting