19:01:03 #startmeeting tripleo
19:01:05 Meeting started Tue Apr 29 19:01:03 2014 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 The meeting name has been set to 'tripleo'
19:01:13 yo
19:01:15 some projects took this week off for meetings.
19:01:23 O/
19:01:32 SpamapS: yeah, neutron did
19:01:39 i think heat did too
19:01:48 o/
19:01:53 * marios looks at slavedriver lifeless
19:02:07 :)
19:02:17 hello
19:02:22 slackers
19:02:48 #topic agenda
19:02:49 bugs
19:02:49 reviews
19:02:49 Projects needing releases
19:02:50 CD Cloud status
19:02:51 CI
19:02:54 Insert one-off agenda items here
19:02:57 open discussion
19:02:59 #topic bugs
19:03:07 #link https://bugs.launchpad.net/tripleo/
19:03:07 #link https://bugs.launchpad.net/diskimage-builder/
19:03:07 #link https://bugs.launchpad.net/os-refresh-config
19:03:07 #link https://bugs.launchpad.net/os-apply-config
19:03:07 #link https://bugs.launchpad.net/os-collect-config
19:03:09 #link https://bugs.launchpad.net/tuskar
19:03:12 #link https://bugs.launchpad.net/python-tuskarclient
19:03:44 rpodolyaka1: https://review.openstack.org/#/c/88597/ needs a rebase
19:03:55 lifeless: ack
19:04:08 I think we're fully triaged. yay bots
19:04:16 now, what about criticals
19:04:18 been looking at https://bugs.launchpad.net/tripleo/+bug/1290486 after comparing notes with tchaypo today, seems not so easy to reproduce for f20 envs. updated ticket. will investigate some more tomorrow
19:04:55 slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought the release last week would have closed ?
19:04:58 Thanks marios
19:05:22 slagle: or was that the project SpamapS ninja'd? and if so... SpamapS did you skip part of the process?
19:06:11 * lifeless pauses for SpamapS / slagle :)
19:06:58 lifeless: sorry, I'm having internet issues
19:07:14 07:04 < lifeless> slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought
19:07:17 the release last week would have closed ?
19:07:20 07:04 < tchaypo> Thanks marios
19:07:22 07:05 < lifeless> slagle: or was that the project SpamapS ninja'd? and if so... SpamapS did you skip part
19:07:25 of the process?
19:07:58 SpamapS: which project did you release last week ?
19:08:24 i don't think i released occ
19:08:46 then SpamapS - I'll leave it with you to resolve whether those bugs are meant to be closed
19:08:50 and keep the meeting moving
19:08:52 lifeless: ninja release did not close the bugs. I closed them now.
19:09:02 ahhh.. internet storm passed
19:09:12 lifeless: os-collect-config is the one I released
19:09:20 https://bugs.launchpad.net/diskimage-builder/ has one critical, I've pushed a fix (I believe).
19:09:40 and https://bugs.launchpad.net/tripleo/ has 8 criticals
19:09:43 so my fixes for https://bugs.launchpad.net/tripleo/+bug/1270646 have merged
19:09:49 do you want to close that one?
19:10:02 i don't know what else to do for it at this point
19:10:07 slagle: that's the workaround by advertising mtu ?
19:10:12 yes
19:10:36 slagle: we should close the tripleo task then. Much as I have a huge philosophical issue with the approach, it's not our call
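For reference, the criticals walk-through above can be reproduced against Launchpad directly. A minimal sketch, assuming launchpadlib is installed and anonymous read access is enough; the consumer name and output format are illustrative, not part of any tripleo tooling:

# Illustrative sketch: list open Critical bugs for the projects triaged above.
from launchpadlib.launchpad import Launchpad

PROJECTS = [
    'tripleo', 'diskimage-builder', 'os-refresh-config', 'os-apply-config',
    'os-collect-config', 'tuskar', 'python-tuskarclient',
]
OPEN_STATUSES = ['New', 'Confirmed', 'Triaged', 'In Progress']


def list_criticals():
    # Anonymous login is enough for reading public bug data.
    lp = Launchpad.login_anonymously('tripleo-triage-sketch', 'production')
    for name in PROJECTS:
        tasks = lp.projects[name].searchTasks(status=OPEN_STATUSES,
                                              importance='Critical')
        for task in tasks:
            # task.title reads like: Bug #NNNN in <project>: "<summary>"
            print('{0} [{1}]'.format(task.title, task.status))


if __name__ == '__main__':
    list_criticals()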
19:10:43 there were fixes to tie and tht that have merged
19:10:46 lifeless: ok
19:10:54 slagle: as in, it's an upstream ovs root cause thing
19:11:04 and ranting and railing over here won't help
19:11:44 I believe 1272803 was biting SpamapS yesterday
19:12:14 I think we can close https://bugs.launchpad.net/tripleo/+bug/1280941
19:12:16 lifeless: I dropped the ball on 1272803, picking it back up now
19:12:17 lifeless: yeah, derekh confirmed that and said he had a plan
19:12:27 oh there you are :)
19:12:30 derekh: cool
19:12:49 I see slagle has patches open for 1287453
19:13:07 and there is a fun discussion about keystone v3 on the list related to it
19:13:16 I hopefully roped morganfainberg_Z into that
19:13:22 have not had time to check this morning
19:13:28 yea, well, i marked my patches WIP
19:13:35 since os-cloud-config already has the code as well
19:13:51 makes sense to just get that ready and switch to that
19:13:56 yup
19:14:31 1293782 - I don't believe stevenk is working on that right now, and we have a workaround in place. shall we unassign and drop to high ?
19:14:49 since the defect isn't 'cloud broken', it's 'cloud slow'
19:15:33 rpodolyaka1 has a fix for 1304424 that I noted needs a rebase above
19:15:47 fine by me on 1293782
19:15:48 derekh: what about 1308407, is it still a thing ?
19:15:48 will update
19:16:15 wait
19:16:19 os-cloud-config has the heat domain code?
19:16:28 if so I have some patches that _I_ need to WIP :)
19:16:30 or even abandon
19:16:48 lifeless: done
19:16:49 now you're making me second guess :)
19:17:01 lifeless: yup, 1308407 is still a thing, still waiting on reviews
19:17:05 SpamapS:
19:17:19 SpamapS: have a look in it, and its review queue :)
19:17:33 https://bugs.launchpad.net/tripleo/+bug/1306596 has an abandoned patch
19:17:46 Ng: do you know cial's IRC handle ?
19:17:48 lifeless: will do!
19:18:01 Ng: I mean Cian
19:18:19 SpamapS: yea, it's in http://git.openstack.org/cgit/openstack/os-cloud-config/tree/os_cloud_config/keystone.py
19:19:13 ok no comments re https://bugs.launchpad.net/tripleo/+bug/1280941 so closing it
19:19:23 lifeless: hmm, no
19:19:30 slagle: cool! I missed that.
19:20:04 Ng: can I ask you to ping them about that review, since it's a critical bug...
19:20:16 Ng: and while it's abandoned no one else can tweak it
19:20:40 rpodolyaka1: probably want to toggle in-progress to triaged for that bug too
19:21:10 ok any other bugs business?
19:21:23 fav bug? left-by-the-wayside bug ?
19:21:26 lifeless: done
19:22:23 lifeless: k
19:22:44 #topic reviews
19:22:52 Ng: they just need to un-abandon it
19:23:00 http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:23:02 lifeless: yup, composing a mail now
19:23:05 http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:23:27 Stats are having issues since the Gerrit upgrade.
19:23:30 lifeless: I believe ^^ isn't getting updated since the gerrit upgrade
19:23:35 ugh
19:23:43 still, one day old will be reasonably indicative
19:23:49 ah, nothing.
19:23:53 *real* issues.
19:24:02 Yeah, openreviews is completely dead.
19:24:05 rustlebee: do you need a hand w/that ?
19:24:17 lifeless: He has a new baby. :-)
19:24:23 http://www.stackalytics.com/report/contribution/tripleo-group/30
19:24:42 http://www.stackalytics.com/report/reviews/tripleo-group/open
19:24:59 lifeless: ^^
19:25:09 weird. the stackalytics one seems less updated for me
19:26:04 the stackalytics stuff is subtly different, no? completely different codebase
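Since both stats sites are lagging after the Gerrit upgrade, open-review numbers can be pulled straight from Gerrit's REST API. A rough sketch using only requests; the project list here is a guess at the relevant TripleO repos, not an authoritative group definition:

# Rough sketch: count open reviews per project via Gerrit's REST API.
import json
import requests

GERRIT = 'https://review.openstack.org'
PROJECTS = [  # assumed list, adjust to taste
    'openstack/tripleo-incubator', 'openstack/tripleo-image-elements',
    'openstack/tripleo-heat-templates', 'openstack/diskimage-builder',
    'openstack/os-apply-config', 'openstack/os-collect-config',
]


def open_changes(project):
    resp = requests.get(GERRIT + '/changes/',
                        params={'q': 'status:open project:' + project, 'n': 500})
    resp.raise_for_status()
    # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip that line.
    return json.loads(resp.text.split('\n', 1)[1])


for project in PROJECTS:
    changes = open_changes(project)
    print('{0}: {1} open'.format(project, len(changes)))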
19:26:33 Yeah
19:27:24 lifeless: no idea. just pointing out the difference (i.e. if russell's stats are bad after the gerrit upgrade, stackalytics isn't faring better)
19:27:50 lifeless: they've tended to be close enough that the variation is not statistically significant in my experience
19:27:53 marios: was responding to SpamapS - and yes, agree with you
19:27:58 anyhow
19:27:59 lifeless: both show the trends and activity
19:28:06 clearly we're not in good shape
19:28:39 I've been focusing on making CI more healthy rather than doing reviews
19:28:46 with the idea that healthy CI enables reviews :)
19:29:10 SpamapS: if you do three reviews a day, that's the commitment benchmark, and then it's all gravy from there :)
19:29:18 Well yeah I'm still hitting that :)
19:29:30 ok so
19:29:40 I see a bunch of passthrough candidates
19:29:53 but indeed we've had a lot of CI issues - different section to the meeting
19:30:00 any proposals around reviews?
19:30:02 we have 11 cores that are not though
19:30:11 Ooh, more passthroughs for me to -2? :-)
19:30:14 and any 'omg I need X reviewed' requests ?
19:30:28 bnemec: more that they haven't gone from the system yet
19:30:33 bnemec: so we still see them in stats
19:30:35 lifeless: from quick look seems we still haven't -2 all the things wrt 'config foo'
19:30:49 Oh, I mostly look at the stats since last -1 or -2.
19:31:19 hrm, I tried to do a quick scan for those a while ago... maybe more have been added since
19:31:24 marios: yeah - or perhaps they are things we should be modelling, which is some of them
19:31:34 bnemec: yeah, stackalytics isn't showing that
19:31:43 ok, moving on in a sec
19:32:01 #topic
19:32:02 Projects needing releases
19:32:07 #topic Projects needing releases
19:32:23 Do we have a selfless volunteer?
19:32:27 IIRC there was some problem with os-apply-config 0.1.15 not reaching pypi
19:32:44 Not sure what that problem was tho
19:32:48 0.1.14 is still the max version
19:32:54 lifeless: I haven't done releases for a while :)
19:33:05 ok, can the volunteer get that fixed too ?
19:33:08 rpodolyaka1: \o/
19:33:12 ack
19:33:18 #action rpodolyaka1 volunteered to release the world
19:33:28 #topic
19:33:33 #topic CD Cloud status
19:33:47 heh
19:34:09 AFAICT the underclouds and CI clouds in both regions are happy ?
19:34:17 so I'm attacking the list of bad/missing/w'ever machines in the HP region with JIRA tickets for our DC ops.
19:34:19 with the exception of mellanox being mellanox
19:34:27 SpamapS: \o/
19:34:47 My intention is to resurrect tripleo-cd, which needs a couple more machines available. I think we have 3 working, and we'll need 5 if we ever get to HA :)
19:35:13 lifeless: things seem to be plodding along ci wise http://goodsquishy.com/downloads/s_tripleo-jobs.html
19:35:21 #topic CI
19:35:27 try not to be dazzled by the pretty colors
19:35:27 derekh: you were saying :)
19:35:47 Anyway, if we can get tripleo-cd up, we can then use the images from that to update ci-overcloud (as in, R1) to trusty and that should eliminate the mellanox fails.
19:35:52 R1 overcloud jobs are running 30 slower than R2
19:36:05 but I don't think it's the spec of the machines
19:36:15 derekh: linked that to the meetings page
19:36:15 derekh: that is 30 minutes right?
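One crude way to break the roughly 30 minute R1 vs R2 gap down per step is to wrap each deploy phase in a timer and run the same script in both regions. A sketch, not part of tripleo-ci; the commands in the usage comment are the CLIs mentioned in the discussion, with their arguments deliberately left out:

# Crude timing harness: report wall-clock time per deploy step.
import shlex
import subprocess
import time


def timed(label, cmd):
    """Run a command and print how long it took."""
    start = time.time()
    subprocess.check_call(shlex.split(cmd))
    print('{0}: {1:.0f}s'.format(label, time.time() - start))


# Hypothetical usage on a CI host (arguments elided on purpose):
# timed('seed vm boot', 'nova boot ... bm-deploy-helper')
# timed('overcloud create', 'heat stack-create ... overcloud')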
19:36:41 we are down two compute nodes in R1 right now
19:36:46 quite a lot of the R1 jobs spend 20 minutes waiting on the testenv, so we have a bug there somewhere I need to track down
19:36:52 SpamapS: you know you can just build the images directly right - ci-overcloud is deployed by devtest_overcloud.sh
19:37:15 lifeless: I do. But.. actual.. tested images.. would be amazing. :)
19:37:20 SpamapS: :)
19:37:24 dprince: yup
19:37:33 derekh: could it be we have more slaves than testenvs ?
19:37:36 I'll give up on it in a couple of days and just jam new images onto ci-overcloud if I can't get a working tripleo-cd
19:37:45 ok
19:38:03 lifeless: I thought that was a possibility but it seems to be consistently 20 minutes
19:38:10 is R1 HP ?
19:38:22 lifeless: yup R1 == HP
19:38:50 lifeless: so I tried to rebuild 3 TE hosts today to confirm and they went to error state
19:38:52 yesterday I saw lots of spurious failures on seed node bringup
19:39:24 I'd love to migrate to Ironic in that rack
19:39:35 it's so close to being 'there' for that
19:40:29 lifeless: anyways in summary, I have 2 issues to look into that are currently causing a time difference between the 2 racks
19:41:22 20m on the testenv
19:41:25 what's the second one ?
19:41:30 second issue I mean
19:41:47 heat stack-create overcloud
19:41:56 takes 10 minutes longer (at least) on R1
19:41:58 lifeless: do we have patches available for preserve ephemeral?
19:42:10 I'm guessing the ovs bridge needs tweaking
19:42:13 SpamapS: shrews has one up for the rebuild side of it
19:42:19 lifeless: ^
19:42:20 R1 is HP/ubuntu, R2 is redhat?
19:42:28 tchaypo: yup
19:42:32 SpamapS: I'm not sure the driver maps the ephemeral size across yet either. need to double check
19:42:42 derekh: *interesting*
19:43:07 sounds like we need to do some measurement
19:43:32 derekh: do the RH machines have battery backed write cache?
19:43:46 SpamapS: no idea
19:43:53 SpamapS: we kvm in unsafe-cache mode
19:44:04 lifeless: but we don't build images in vms
19:44:12 lifeless: and we do eventually commit it all to disk
19:44:12 SpamapS: yeah we do
19:44:23 oh I thought the images happened in the testenvs
19:44:30 SpamapS: though we don't build VMs in *those* VMs
19:44:32 ok then _harumph_
19:44:36 SpamapS: we build images in jenkins slaves.
19:44:41 SpamapS: in the ci-overcloud
19:44:41 right
19:44:43 kvm unsafe should be on both regions
19:44:49 understanding clicking back into place
19:45:06 yeah so network probs seem more likely than anything else.
19:45:08 for the testenvs - dprince was looking at doing a nova feature to let us control that from the API
19:45:21 this isn't building images, the time difference is from nova boot to bm-deploy-helper /POST
19:45:35 yeah
19:45:50 might we also have divergent undercloud configurations?
19:46:08 SpamapS: I'd say wildly different
19:46:13 SpamapS: not relevant in this context AFAICT though...
19:46:30 SpamapS: more relevant might be that the host OS for the cloud is f20 vs ubuntu
19:46:44 anyway I'll get more info once I carve myself away a host to work with
19:46:58 nova boot to bm-deploy-helper is all handled on the local network on the testenvs, right?
19:47:02 yup
19:47:20 I think the testenvs in region2 are also f20 based
19:47:21 we should, if nothing else, give trusty testenvs a try and measure that.
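The "kvm unsafe should be on both regions" assumption is easy to verify from the testenv hosts themselves by reading the cache mode out of each running domain's libvirt XML. A sketch that assumes it runs on a testenv host with the virsh CLI available; it is not part of any tripleo script:

# Sketch: print the disk cache mode(s) of every running libvirt domain.
import subprocess
import xml.etree.ElementTree as ET


def running_domains():
    out = subprocess.check_output(['virsh', 'list', '--name'])
    return [line for line in out.decode().splitlines() if line.strip()]


def disk_cache_modes(domain):
    xml = subprocess.check_output(['virsh', 'dumpxml', domain]).decode()
    root = ET.fromstring(xml)
    for driver in root.findall('./devices/disk/driver'):
        # cache='unsafe' is what the discussion above expects in both regions.
        yield driver.get('cache', 'default')


for domain in running_domains():
    print(domain, sorted(set(disk_cache_modes(domain))))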
19:47:30 lifeless: yes they are
19:47:49 * lifeless is a little sad at the variation there - just having more variables
19:47:52 Entirely possible something magical happened to local networking in 3.13 :-P
19:47:58 SpamapS: I'm all for that, if it's still a problem we can dig further
19:48:05 the clouds we have an entirely good reason to want variation - it's product!
19:48:22 ok, so
19:48:31 #topic open discussion
19:49:06 dprince: SpamapS: you guys were having an animated discussion in #tripleo
19:49:40 Yes, let me see if I can summarize
19:49:43 lifeless: yes, I'm sending a mail to the list about that.
19:49:51 * We have infrastructure for our development efforts
19:50:04 * Currently it is all over the place, monkey-patched and hand-deployed (at least, in R1)
19:50:19 * We should collaborate on this infrastructure.
19:50:53 * How we do that, and how it relates to openstack-infra, are all open questions.
19:52:28 dprince: ^ agree?
19:53:01 SpamapS: yes, specifically I was thinking about mirrors myself this weekend and how to go about setting them up in both racks.
19:53:43 Right, so my thinking is, use diskimage-builder elements, os-*-config, and heat.
19:53:44 SpamapS: and I was a bit surprised that you guys already have an Ubuntu mirror in the HP rack. We'll need one in the Red Hat rack as well... so how to do that?
19:54:04 dprince: the apt-mirror element
19:54:24 lifeless: debian-mirror
19:54:25 SpamapS: My initial thought was we are providing our CI overcloud as a resource to openstack-infra (i.e. they already run slaves there...)
19:54:28 dprince: + a tiny apache fragment, I *think* I pushed a WIP with it in it
19:54:29 which is probably.. the wrong name
19:54:33 SpamapS: uh, yeah.
19:54:33 SpamapS: So why not mirrors too?
19:55:17 But the larger question to me was who is responsible for these things, thinking ahead to the fact that we want to be a gate...
19:55:45 Well openstack-infra will already be maintaining mirrors for the supported OS's to support devstack.
19:55:53 SpamapS: they aren't
19:55:56 they aren't?
19:55:59 SpamapS: they aren't
19:56:02 So does it fall on cloud provider?
19:56:04 I'm starting to think there are so many things we *could* mirror that we should instead start with a caching proxy in each rack, wait a few days, and then see what's still hurting us
19:56:04 (that's good IMO)
19:56:17 I can't answer this question myself. I have my opinion (which would probably be to use TripleO tools for it BTW). But I'm also trying to be mindful of it.
19:56:22 SpamapS: today yes, since they can't use glance even, anyhow. That's changing though.
19:56:28 lifeless: ah
19:56:30 mindful of infra rather
19:56:48 FWIW I've had a number of inconclusive discussions about this (nearly all in -infra channel)
19:56:51 derekh: exactly.
19:57:01 derekh: my take was let's run a public mirror https://review.openstack.org/#/c/90875/
19:57:08 derekh: and then squid it up in our racks
19:57:20 ok
19:57:30 So a thread is likely going to be more productive. :)
19:57:32 * dprince has a squid element he will push today
19:57:43 the issues as dprince says are about gating requirements, responsibility, availability etc
19:57:44 SpamapS: my thought as well
19:58:03 dprince: wow, I'm actually shocked there's no squid element already. :)
19:58:39 SpamapS: I think we need/want both. Mirrors are good for stability.
19:58:50 SpamapS: squid is good at the local caching bits
19:58:50 yes
20:00:05 Ok, endmeeting?
20:00:16 #endmeeting
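A quick way to sanity-check the "caching proxy in each rack" idea from the closing discussion is to fetch a representative package URL directly and then twice through the rack-local squid, and compare timings; the proxy address and URL below are placeholders, not real TripleO infrastructure:

# Sketch: compare a direct fetch with fetches through a rack-local caching proxy.
import time
import requests

URL = 'http://archive.ubuntu.com/ubuntu/dists/trusty/Release'  # placeholder URL
PROXY = {'http': 'http://squid.example.rack:3128'}             # placeholder proxy


def fetch(label, proxies=None):
    start = time.time()
    resp = requests.get(URL, proxies=proxies)
    resp.raise_for_status()
    print('{0}: {1:.2f}s ({2} bytes)'.format(
        label, time.time() - start, len(resp.content)))


fetch('direct')
fetch('via squid, cold', proxies=PROXY)   # first pass populates the cache
fetch('via squid, warm', proxies=PROXY)   # second pass should be a local hit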