19:01:03 <lifeless> #startmeeting tripleo
19:01:05 <openstack> Meeting started Tue Apr 29 19:01:03 2014 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 <openstack> The meeting name has been set to 'tripleo'
19:01:13 <derekh> yo
19:01:15 <SpamapS> some projects took this week off for meetings.
19:01:23 <greghaynes> O/
19:01:32 <marios> SpamapS: yeah, neutron did
19:01:39 <jdob> i think heat did too
19:01:48 <GheRivero> o/
19:01:53 * marios looks at slavedriver lifeless
19:02:07 <marios> :)
19:02:17 <lsmola2> hello
19:02:22 <lifeless> slackers
19:02:48 <lifeless> #topic agenda
19:02:49 <lifeless> bugs
19:02:49 <lifeless> reviews
19:02:49 <lifeless> Projects needing releases
19:02:50 <lifeless> CD Cloud status
19:02:51 <lifeless> CI
19:02:54 <lifeless> Insert one-off agenda items here
19:02:57 <lifeless> open discussion
19:02:59 <lifeless> #topic bugs
19:03:07 <lifeless> #link https://bugs.launchpad.net/tripleo/
19:03:07 <lifeless> #link https://bugs.launchpad.net/diskimage-builder/
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-refresh-config
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-apply-config
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-collect-config
19:03:09 <lifeless> #link https://bugs.launchpad.net/tuskar
19:03:12 <lifeless> #link https://bugs.launchpad.net/python-tuskarclient
19:03:44 <lifeless> rpodolyaka1: https://review.openstack.org/#/c/88597/ needs a rebase
19:03:55 <rpodolyaka1> lifeless: ack
19:04:08 <lifeless> I think we're fully triaged. yay bots
19:04:16 <lifeless> now, what about criticals
19:04:18 <marios> been looking at https://bugs.launchpad.net/tripleo/+bug/1290486 after comparing notes with tchaypo today, seems not so easy to reproduce for f20 envs. updated ticket. will investigate some more tomorrow
19:04:55 <lifeless> slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought the release last week would have closed ?
19:04:58 <tchaypo> Thanks marios
19:05:22 <lifeless> slagle: or was that the project SpamapS ninjad? and if so... SpamapS did you skip part of the process?
19:06:11 * lifeless pauses for SpamapS / slagle :)
19:06:58 <SpamapS> lifeless: sorryrry whasorry I'm having internet issues
19:07:14 <lifeless> 07:04 < lifeless> slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought
19:07:17 <lifeless> the release last week would have closed ?
19:07:20 <lifeless> 07:04 < tchaypo> Thanks marios
19:07:22 <lifeless> 07:05 < lifeless> slagle: or was that the project SpamapS ninjad? and if so... SpamapS did you skip part
19:07:25 <lifeless> of the process?
19:07:58 <lifeless> SpamapS: which project did you release last week ?
19:08:24 <slagle> i don't think i released occ
19:08:46 <lifeless> then SpamapS - I'll leave it with you to resolve whether those bugs are meant to be closed
19:08:50 <lifeless> and keep the meeting moving
19:08:52 <SpamapS> lifeless: ninja release did not close the bugs. I closed them now.
19:09:02 <SpamapS> ahhh.. internet storm passed
19:09:12 <SpamapS> lifeless: os-collect-config is the one I released
19:09:20 <lifeless> https://bugs.launchpad.net/diskimage-builder/ has one critical, I've pushed a fix (I believe).
19:09:40 <lifeless> and https://bugs.launchpad.net/tripleo/ has 8 criticals
19:09:43 <slagle> so my fixes for https://bugs.launchpad.net/tripleo/+bug/1270646 have merged
19:09:49 <slagle> do you want to close that one?
19:10:02 <slagle> i don't know what else to do for it at this point
19:10:07 <lifeless> slagle: thats the workaround by advertising mtu ?
19:10:12 <slagle> yes
19:10:36 <lifeless> slagle: we should close the tripleo task then. Much as I have a huge philosophical issue with the approach, its not our call
19:10:43 <slagle> there was a fix to tie and tht that have merged
19:10:46 <slagle> lifeless: ok
19:10:54 <lifeless> slagle: as in, its an upstream ovs root cause thing
19:11:04 <lifeless> and ranting and railing over here won't help
19:11:44 <lifeless> I believe 1272803 was biting SpamapS yesterday
19:12:14 <lifeless> I think we can close https://bugs.launchpad.net/tripleo/+bug/1280941
19:12:16 <derekh> lifeless: I dropped the ball on 1272803 , picking it back up now
19:12:17 <SpamapS> lifeless: yeah, derekh confirmed that and said he had a plan
19:12:27 <SpamapS> oh there you are :)
19:12:30 <lifeless> derekh: cool
19:12:49 <lifeless> I see slagle has patches open for 1287453
19:13:07 <lifeless> and there is a fun discussion about keystone v3 on the list related to it
19:13:16 <lifeless> I hopefully roped morganfainberg_Z into that
19:13:22 <lifeless> have not had time to check this morning
19:13:28 <slagle> yea, well, i marked my patches WIP
19:13:35 <slagle> since os-cloud-config already has the code as well
19:13:51 <slagle> makes sense to just get that ready and switch to that
19:13:56 <lifeless> yup
19:14:31 <lifeless> 1293782 - I don't believe stevenk is working on that right now, and we have a workaround in place. shall we unassign and drop to high ?
19:14:49 <lifeless> since the defect isn't 'cloud broken', its 'cloud slow'
19:15:33 <lifeless> rpodolyaka1 has a fix for 1304424 that I noted needs a rebase above
19:15:47 <slagle> fine by me on 1293782
19:15:48 <lifeless> derekh: what about 1308407 is it still a thing ?
19:15:48 <slagle> will update
19:16:15 <SpamapS> wait
19:16:19 <SpamapS> os-cloud-config has the heat domain code?
19:16:28 <SpamapS> if so I have some patches that _I_ need to WIP :)
19:16:30 <SpamapS> or even abandon
19:16:48 <rpodolyaka1> lifeless: done
19:16:49 <slagle> now you're making me second guess :)
19:17:01 <derekh> lifeless: yup, 1308407 is still a thing, still waiting on reviews
19:17:05 <lifeless> SpamapS:
19:17:19 <lifeless> SpamapS: have a look in it, and its review queue :)
19:17:33 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1306596 has an abandoned patch
19:17:46 <lifeless> Ng: do you know cial's IRC handle ?
19:17:48 <SpamapS> lifeless: will do!
19:18:01 <lifeless> Ng: I mean Cian
19:18:19 <slagle> SpamapS: yea, it's in http://git.openstack.org/cgit/openstack/os-cloud-config/tree/os_cloud_config/keystone.py
19:19:13 <lifeless> ok no comments re https://bugs.launchpad.net/tripleo/+bug/1280941 so closing it
19:19:23 <Ng> lifeless: hmm, no
19:19:30 <SpamapS> slagle: cool! I missed that.
19:20:04 <lifeless> Ng: can I ask you to ping them about that review, since its a critical bug...
19:20:16 <lifeless> Ng: and while its abandoned no one else can tweak it
19:20:40 <lifeless> rpodolyaka1: probably want to toggle in-progress to triaged for that bug too
19:21:10 <lifeless> ok any other bugs business?
19:21:23 <lifeless> fav bug? left by wayside bug ?
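A note on the workaround slagle refers to above for bug 1270646 (the reason the tripleo task gets closed): advertising a reduced MTU to instances is typically done by handing the neutron DHCP agent an extra dnsmasq config file. A minimal sketch, assuming the stock dnsmasq_config_file hook and an illustrative 1400-byte MTU; the exact path and value here are assumptions, not necessarily what the merged tie/tht changes used:

    # /etc/neutron/dnsmasq-neutron.conf: extra options passed to dnsmasq
    # DHCP option 26 (interface-mtu) tells instances to use a smaller MTU so
    # GRE/VXLAN encapsulation overhead does not exceed the physical MTU.
    dhcp-option-force=26,1400

    # /etc/neutron/dhcp_agent.ini: point the DHCP agent at the file above
    dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf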
19:21:26 <rpodolyaka1> lifeless: done
19:22:23 <Ng> lifeless: k
19:22:44 <lifeless> #topic reviews
19:22:52 <lifeless> Ng: they just need to un-abandon it
19:23:00 <lifeless> http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:23:02 <Ng> lifeless: yup, composing a mail now
19:23:05 <lifeless> http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:23:27 <bnemec> Stats are having issues since the Gerrit upgrade.
19:23:30 <derekh> lifeless: I believe ^^ isn't getting updated since the gerrit upgrade
19:23:35 <lifeless> ugh
19:23:43 <lifeless> still, one day old will be reasonably indicative
19:23:49 <lifeless> ah, nothing.
19:23:53 <lifeless> *real* issues.
19:24:02 <bnemec> Yeah, openreviews is completely dead.
19:24:05 <lifeless> rustlebee: do you need a hand w/that ?
19:24:17 <bnemec> lifeless: He has a new baby. :-)
19:24:23 <SpamapS> http://www.stackalytics.com/report/contribution/tripleo-group/30
19:24:42 <SpamapS> http://www.stackalytics.com/report/reviews/tripleo-group/open
19:24:59 <SpamapS> lifeless: ^^
19:25:09 <marios> weird. the stackalytics one seems less updated for me
19:26:04 <lifeless> the stackalytics stuff is subtly different, no? completely different codebase
19:26:33 <bnemec> Yeah
19:27:24 <marios> lifeless: no idea. just pointing out the difference (i.e. if russell's stats are bad after the gerrit upgrade, stackalytics isn't faring better)
19:27:50 <SpamapS> lifeless: they've tended to be close enough that the variation is not statistically significant in my experience
19:27:53 <lifeless> marios: was responding to SpamapS - and yes, agree with you
19:27:58 <lifeless> anyhow
19:27:59 <SpamapS> lifeless: both show the trends and activity
19:28:06 <lifeless> clearly we're not in good shape
19:28:39 <SpamapS> I've been focusing on making CI more healthy rather than doing reviews
19:28:46 <SpamapS> with the idea that healthy CI enables reviews :)
19:29:10 <lifeless> SpamapS: if you do three reviews a day, thats the commitment benchmark, and then its all gravy from there :)
19:29:18 <SpamapS> Well yeah I'm still hitting that :)
19:29:30 <lifeless> ok so
19:29:40 <lifeless> I see a bunch of passthrough candidates
19:29:53 <lifeless> but indeed we've had a lot of CI issues - different section to the meeting
19:30:00 <lifeless> any proposals around reviews?
19:30:02 <SpamapS> we have 11 cores that are not though
19:30:11 <bnemec> Ooh, more passthroughs for me to -2? :-)
19:30:14 <lifeless> and any omg I need X reviewed requests ?
19:30:28 <lifeless> bnemec: more that they haven't gone from the system yet
19:30:33 <lifeless> bnemec: so we still see them in stats
19:30:35 <marios> lifeless: from quick look seems we still haven't -2 all the things wrt 'config foo'
19:30:49 <bnemec> Oh, I mostly look at the stats since last -1 or -2.
19:31:19 <greghaynes> hrm, I tried to do a quick scan for those a while ago... maybe more have been added since
19:31:24 <lifeless> marios: yeah - or perhaps they are things we should be modelling which is some of them
19:31:34 <lifeless> bnemec: yeah, stackalytics isn't showing that
19:31:43 <lifeless> ok, moving on in a sec
19:32:01 <lifeless> #topic
19:32:02 <lifeless> Projects needing releases
19:32:07 <lifeless> #topic Projects needing releases
19:32:23 <lifeless> Do we have a selfless volunteer?
19:32:27 <SpamapS> IIRC there was some problem with os-apply-config 0.1.15 not reaching pypi
19:32:44 <SpamapS> Not sure what that problem was tho
19:32:48 <SpamapS> 0.1.14 is still the max version
19:32:54 <rpodolyaka1> lifeless: I haven't done releases for a while :)
19:33:05 <lifeless> ok, can the volunteer get that fixed too ?
19:33:08 <lifeless> rpodolyaka1: \o/
19:33:12 <rpodolyaka1> ack
19:33:18 <lifeless> #action rpodolyaka1 volunteered to release the world
19:33:28 <lifeless> #topic
19:33:33 <lifeless> #topic CD Cloud status
19:33:47 <SpamapS> heh
19:34:09 <lifeless> AFAICT the underclouds and CI clouds in both regions are happy ?
19:34:17 <SpamapS> so I'm attacking the list of bad/missing/w'ever machines in the HP region with JIRA tickets for our DC ops.
19:34:19 <lifeless> with the exception of mellanox being mellanox
19:34:27 <lifeless> SpamapS: \o/
19:34:47 <SpamapS> My intention is to resurrect tripleo-cd, which needs a couple more machines available. I think we have 3 working, and we'll need 5 if we ever get to HA :)
19:35:13 <derekh> lifeless: things seem to be plodding along ci wise http://goodsquishy.com/downloads/s_tripleo-jobs.html
19:35:21 <lifeless> #topic CI
19:35:27 <derekh> try not to be dazzled by the pretty colors
19:35:27 <lifeless> derekh: you were saying :)
19:35:47 <SpamapS> Anyway, if we can get tripleo-cd up, we can then use the images from that to update ci-overcloud (as in, R1) to trusty and that should eliminate the mellanox fails.
19:35:52 <derekh> R1 overcloud jobs are running 30 slower than R2
19:36:05 <derekh> but I don't think its the spec of the machines
19:36:15 <lifeless> derekh: linked that to the meetings page
19:36:15 <dprince> derekh: that is 30 minutes right?
19:36:41 <SpamapS> we are down two compute nodes in R1 right now
19:36:46 <derekh> quite a lot of the R1 jobs spend 20 minutes waiting on the testenv , so we have a but there somewhere I need to track down
19:36:52 <lifeless> SpamapS: you know you can just build the images directly right - ci-overcloud is deployed by devtest_overcloud.sh
19:36:57 <derekh> but -> bug
19:37:15 <SpamapS> lifeless: I do. But.. actual.. tested images.. would be amazing. :)
19:37:20 <lifeless> SpamapS: :)
19:37:24 <derekh> dprince: yup
19:37:33 <lifeless> derekh: could it be we have more slaves than testenvs ?
19:37:36 <SpamapS> I'll give up on it in a couple of days and just jam new images onto ci-overcloud if I can't get a working tripleo-cd
19:37:45 <lifeless> ok
19:38:03 <derekh> lifeless: I thought that was a possibility but it seems to be consistently 20 minutes
19:38:10 <lifeless> is R1 HP ?
19:38:22 <derekh> lifeless: yup R1 == HP
19:38:50 <derekh> lifeless: so I tried to rebuild 3 TE hosts today to confirm and they went to error state
19:38:52 <lifeless> yesterday I saw lots of spurious failures on seed node bringup
19:39:24 <lifeless> I'd love to migrate to Ironic in that rack
19:39:35 <lifeless> its so close to being 'there' for that
19:40:29 <derekh> lifeless: anyways in summary, I have 2 issues to look into that are currently causing a time difference between the 2 racks
19:41:22 <lifeless> 20m on the testenv
19:41:25 <lifeless> whats the second one ?
19:41:30 <lifeless> second issue I mean
19:41:47 <derekh> heat stack-create overcloud
19:41:56 <derekh> take 10 minutes longer (at least) on R1
19:41:58 <SpamapS> lifeless: do we have patches available for preserve ephemeral?
19:42:10 <derekh> I'm guessing the ovs bridge needs tweaking
19:42:13 <lifeless> SpamapS: shrews has one up for the rebuild side of it
19:42:19 <derekh> lifeless: ^
19:42:20 <tchaypo> R1 is HP/ubuntu, R2 is redhat?
19:42:28 <derekh> tchaypo: yup
19:42:32 <lifeless> SpamapS: I'm not sure the driver maps the ephemeral size across yet either. need to double check
19:42:42 <lifeless> derekh: *interesting*
19:43:07 <SpamapS> sounds like we need to do some measurement
19:43:32 <SpamapS> derekh: do the RH machines have battery backed write cache?
19:43:46 <derekh> SpamapS: no idea
19:43:53 <lifeless> SpamapS: we kvm in unsafe-cache mode
19:44:04 <SpamapS> lifeless: but we don't build images in vms
19:44:12 <SpamapS> lifeless: and we do eventually commit it all to disk
19:44:12 <lifeless> SpamapS: yeah we do
19:44:23 <SpamapS> oh I thought the images happened in the testenvs
19:44:30 <lifeless> SpamapS: though we don't build VMs in *those* VMs
19:44:32 <SpamapS> ok then _harumph_
19:44:36 <lifeless> SpamapS: we build images in jenkins slaves.
19:44:41 <lifeless> SpamapS: in the ci-overcloud
19:44:41 <SpamapS> right
19:44:43 <derekh> kvm unsafe should be on both regions
19:44:49 <SpamapS> understanding clicking back into place
19:45:06 <SpamapS> yeah so network probs seem more likely than anything else.
19:45:08 <lifeless> for the testenvs - dprince was looking at doing a nova feature to let us control that from the API
19:45:21 <derekh> this isn't building images, the time difference is from nova boot to bm-deploy-helper POST
19:45:35 <lifeless> yeah
19:45:50 <SpamapS> might we also have divergent undercloud configurations?
19:46:08 <derekh> SpamapS: I'd say wildly different
19:46:13 <lifeless> SpamapS: not relevant in this context AFAICT though...
19:46:30 <lifeless> SpamapS: more relevant might be that the host OS for the cloud is f20 vs ubuntu
19:46:44 <derekh> anyway I'll get more info once I carve myself away a host to work with
19:46:58 <SpamapS> nova boot to bm-deploy-helper is all handled on the local network on the testenvs, right?
19:47:02 <lifeless> yup
19:47:20 <lifeless> I think the testenvs in region2 are also f20 based
19:47:21 <SpamapS> we should, if nothing else, give trusty testenvs a try and measure that.
19:47:30 <derekh> lifeless: yes they are
19:47:49 * lifeless is a little sad at the variation there - just having more variables
19:47:52 <SpamapS> Entirely possible something magical happened to local networking in 3.13 :-P
19:47:58 <derekh> SpamapS: I'm all for that, if its still a problem we can dig further
19:48:05 <lifeless> the clouds we have an entirely good reason to want variation - its product!
19:48:22 <lifeless> ok, so
19:48:31 <lifeless> #topic open discussion
19:49:06 <lifeless> dprince: SpamapS: you guys were having an animated discussion in #tripleo
19:49:40 <SpamapS> Yes, let me see if I can summarize
19:49:43 <dprince> lifeless: yes, I'm sending a mail to the list about that.
19:49:51 <SpamapS> * We have infrastructure for our development efforts
19:50:04 <SpamapS> * Currently it is all over the place, monkey-patched and hand-deployed (at least, in R1)
19:50:19 <SpamapS> * We should collaborate on this infrastructure.
19:50:53 <SpamapS> * How we do that, and how it relates to openstack-infra, are all open questions.
19:52:28 <SpamapS> dprince: ^ agree?
19:53:01 <dprince> SpamapS: yes, specifically I was thinking about mirrors myself this weekend and how to go about setting them up in both racks.
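An aside on the 'unsafe-cache mode' mentioned in the CI discussion above: for libvirt-defined test VMs the cache mode is set per disk in the domain XML, and nova's libvirt driver exposes a similar knob in its config. Both fragments below are illustrative sketches only (paths, image format, and section placement are assumptions), not the actual testenv or ci-overcloud configuration:

    <!-- libvirt domain XML: qemu 'unsafe' cache skips flushes for faster I/O,
         at the cost of data loss if the host crashes (fine for throwaway VMs) -->
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='unsafe'/>
      <source file='/var/lib/libvirt/images/baremetal_0.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>

    # nova.conf, [libvirt] section: equivalent setting for nova-managed guests
    disk_cachemodes = file=unsafe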
19:53:43 <SpamapS> Right, so my thinking is, use diskimage-builder elements, os-*-config, and heat.
19:53:44 <dprince> SpamapS: and I was a bit surprised that you guys already have an Ubuntu mirror in the HP rack. We'll need one in the Red Hat rack as well... so how to do that?
19:54:04 <lifeless> dprince: the apt-mirror element
19:54:24 <SpamapS> lifeless: debian-mirror
19:54:25 <dprince> SpamapS: My initial thought was we are providing our CI overcloud as a resource to openstack-infra (i.e. they already run slaves there...)
19:54:28 <lifeless> dprince: + a tiny apache fragment, I *think* I pushed a WIP with it in it
19:54:29 <SpamapS> which is probably.. the wrong name
19:54:33 <lifeless> SpamapS: uh, yeah.
19:54:33 <dprince> SpamapS: So why not mirrors too?
19:55:17 <dprince> But the larger question to me was who is responsible for these things, thinking ahead to the fact that we want to be a gate...
19:55:45 <SpamapS> Well openstack-infra will already be maintaining mirrors for the supported OS's to support devstack.
19:55:53 <lifeless> SpamapS: they aren't
19:55:56 <SpamapS> they aren't?
19:55:59 <lifeless> SpamapS: they aren't
19:56:02 <SpamapS> So does it fall on cloud provider?
19:56:04 <derekh> I'm starting to think there are so many things we *could* mirror we should instead start with a caching/proxy in each rack, wait a few days and then see whats still hurting us
19:56:04 <SpamapS> (that's good IMO)
19:56:17 <dprince> I can't answer this question myself. I have my opinion (which would probably be to use TripleO tools for it BTW). But I'm also trying to be mindful of it.
19:56:22 <lifeless> SpamapS: today yes, since they can't use glance even, anyhow. Thats changing though.
19:56:28 <SpamapS> lifeless: ah
19:56:30 <dprince> mindful of infra rather
19:56:48 <lifeless> FWIW I've had a number of inconclusive discussions about this (nearly all in -infra channel)
19:56:51 <dprince> derekh: exactly.
19:57:01 <dprince> derekh: my take was lets run a public mirror https://review.openstack.org/#/c/90875/
19:57:08 <dprince> derekh: and then squid it up in our Racks
19:57:20 <SpamapS> ok
19:57:30 <SpamapS> So a thread is likely going to be more productive. :)
19:57:32 * dprince has a squid element he will push today
19:57:43 <lifeless> the issues as dprince says are about gating requirements, responsibility, availability etc
19:57:44 <dprince> SpamapS: my thought as well
19:58:03 <SpamapS> dprince: wow, I'm actually shocked there's no squid element already. :)
19:58:39 <dprince> SpamapS: I think we need/want both. Mirrors are good for stability.
19:58:50 <dprince> SpamapS: squid is good at the local caching bits
19:58:50 <SpamapS> yes
20:00:05 <SpamapS> Ok, endmeeting?
20:00:16 <lifeless> #endmeeting
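For reference on the per-rack caching proxy derekh and dprince converge on above (a mirror for stability, squid for local caching): a caching-only squid needs just a handful of directives. A minimal sketch; the port, cache sizes, paths, and network range are illustrative values, not the contents of dprince's squid element:

    # squid.conf: minimal per-rack caching proxy
    http_port 3128
    # roughly 20 GB of on-disk cache
    cache_dir ufs /var/spool/squid 20000 16 256
    # allow large distro packages and images to be cached
    maximum_object_size 1024 MB
    cache_mem 512 MB
    # only hosts in the rack's own network may use the proxy (CIDR assumed)
    acl rack_net src 10.0.0.0/8
    http_access allow rack_net
    http_access deny all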