19:01:28 #startmeeting tripleo
19:01:29 Meeting started Tue Jun 17 19:01:28 2014 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:32 The meeting name has been set to 'tripleo'
19:01:38 hi
19:01:39 hello
19:01:40 o/
19:01:41 o/
19:01:49 o/
19:01:49 first post
19:01:50 \o
19:01:51 * giulivo just in case
19:01:55 giulivo: it is a head, and an arm raised to wave
19:02:01 #topic agenda
19:02:05 bugs
19:02:06 reviews
19:02:06 Projects needing releases
19:02:06 CD Cloud status
19:02:06 CI
19:02:08 \o/ I see that now!
19:02:08 Tuskar
19:02:10 Insert one-off agenda items here
19:02:13 open discussion
19:02:24 hello
19:02:25 'morning' everyone
19:02:30 hi
19:02:31 o/
19:02:33 o/
19:04:53 #topic bugs
19:05:04 #link https://bugs.launchpad.net/tripleo/
19:05:04 #link https://bugs.launchpad.net/diskimage-builder/
19:05:04 #link https://bugs.launchpad.net/os-refresh-config
19:05:04 #link https://bugs.launchpad.net/os-apply-config
19:05:04 #link https://bugs.launchpad.net/os-collect-config
19:05:06 #link https://bugs.launchpad.net/os-cloud-config
19:05:09 #link https://bugs.launchpad.net/tuskar
19:05:11 #link https://bugs.launchpad.net/python-tuskarclient
19:06:58 wall of red criticals
19:07:15 is michael kerrin here?
19:07:28 slagle: any word on 1287453 ?
19:07:38 tchaypo: 1290486 ?
19:07:46 derekh is on leave
19:07:55 SpamapS: any word on 1315474 ?
19:08:04 SpamapS: and bug 1316309 ?
19:08:06 Launchpad bug 1316309 in tripleo "User requested by parameter RabbitUserName is not managed on the RabbitMQ servers." [Critical,In progress] https://launchpad.net/bugs/1316309
19:08:20 adam_g: what about bug 1316475 ?
19:08:21 Launchpad bug 1316475 in cloud-init "[SRU] CloudSigma DS for causes hangs when serial console present" [High,Fix committed] https://launchpad.net/bugs/1316475
19:08:36 GheRivero: you have bug 1316985 under your wing ?
19:08:42 Launchpad bug 1316985 in tripleo "set -eu may spuriously break dkms module" [Critical,In progress] https://launchpad.net/bugs/1316985
19:09:22 lifeless, 1316475 is fixed elsewhere, marking invalid for tripleo
19:09:42 GheRivero: also bug 1319979
19:09:47 Launchpad bug 1319979 in tripleo "rabbitmq-server fails to restart in check-tripleo-overcloud-precise" [Critical,Triaged] https://launchpad.net/bugs/1319979
19:10:02 and the last one is dprince's fix committed. yayayayayyayayyay
19:10:27 Looks like the deps for https://review.openstack.org/#/c/95151/ have merged, so I'll drop WIP from it. Maybe that will get it moving again.
19:10:28 lifeless: re bug 1315474 .. it looks to be a bit stalled.
19:10:29 Launchpad bug 1315474 in tripleo "rabbit.username is not consumed in most elements" [Critical,In progress] https://launchpad.net/bugs/1315474
19:10:47 ^Kept failing ci IIRC.
19:11:03 lifeless: yes. The dkms is waiting for a couple of reviews
19:11:03 slicknik has bug 1329068 in dib
19:11:04 Launchpad bug 1329068 in trove-integration "The trove guest-image is unable to publish the guest-config via cloud-init" [Critical,In progress] https://launchpad.net/bugs/1329068
19:11:08 lifeless: i'm going to drop 1287453 down to High, since Heat added a workaround
19:11:21 ok, lets get #links here for the reviews folk need
19:11:30 criticals stalled ==bad
19:11:39 there is a patch up for it, so we don't have to rely on the workaround, but it needs an update.
19:11:42 lifeless: and the rabbitmq restart disappeared some time ago :) Will revise and update the bug properly
19:11:45 i can work on that soon'ish
19:11:59 slagle: cool
19:12:22 CI today is hung up on a bunch of overcloud jobs failing with this issue https://bugs.launchpad.net/tripleo/+bug/1292105
19:12:23 Launchpad bug 1292105 in tripleo "CI failed pinging overcloud instance" [Medium,Triaged]
19:12:39 lifeless: should we bump the priority on that ^^
19:12:43 adam_g: so we caused a regression in bug 1329068 from the data source patch
19:12:44 Launchpad bug 1329068 in trove-integration "The trove guest-image is unable to publish the guest-config via cloud-init" [Critical,In progress] https://launchpad.net/bugs/1329068
19:12:52 dprince: do it
19:12:57 dprince: (raise it, I mean)
19:13:14 dprince: what does kibana say about frequency ?
19:13:15 One moment while I find a browser that works
19:14:12 http://goodsquishy.com/downloads/tripleo-jobs.html is a sea of red for the overcloud job.
19:14:24 #link https://review.openstack.org/#/c/99505/7 for the dib regression
19:14:33 we shouldn't release a new dib until that lands
19:14:34 lifeless: I'm seeing like 90% + overcloud jobs fail on that.
19:14:40 Ah. I saw emails marking 1290486 as fix released, but the tripleo version has been moved back to triaged.
19:14:51 I'll update it to fix-released in line with the other projects
19:15:11 lifeless: I ran check experimental on ben's debug test which should tell us if an Ubuntu overcloud works as well (just in case)
19:16:10 lifeless: re bug 1316309 .. needed a rebase.. submitted a new one now
19:16:15 Launchpad bug 1316309 in tripleo "User requested by parameter RabbitUserName is not managed on the RabbitMQ servers."
[Critical,In progress] https://launchpad.net/bugs/1316309
19:16:20 SpamapS: #linky link the review please
19:17:37 #link https://review.openstack.org/92204
19:17:54 dprince: its a few hours old
19:17:56 http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRGVzdGluYXRpb24gSG9zdCBVbnJlYWNoYWJsZVwiIEFORCBidWlsZF9uYW1lOlwiY2hlY2stdHJpcGxlby1vdmVyY2xvdWQtZjIwXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MDMwMzI2MzA3MzR9
19:17:56 it is marked Critical, btw, because it blocks Debian unstable users. Perhaps we should drop it to High?
19:18:20 SpamapS: we haven't had a group discussion about supporting debian releases at all
19:18:52 SpamapS: is there a group standing up to help support that (or do we think its in the program interest to do that regardless?)
19:19:37 dprince: first occurrence in 48 hours was at 7am
19:19:46 dprince: utc I believe. so should be bisectable
19:19:59 lifeless: I'd imagine our employer is interested in it working.
The team using Debian inside HP ran into the bug when they upgraded to RabbitMQ 3.3
19:20:09 lifeless: the previous issue may be masking it somewhat too
19:20:15 dprince: nope
19:20:26 dprince: that was fixed closer to 24 hours back
19:20:41 dprince: there are *no* hits of this before 12 hours back from now, over the last 48 hours
19:20:54 dprince: checking 7 days now
19:21:37 http://goodsquishy.com/downloads/tripleo-jobs.html
19:21:43 lifeless: well derek's report isn't showing a whole lot of overcloud love since the instance_extra_specs thing broke ^^^
19:21:53 dprince: ack
19:21:59 we may have missed other settings
19:22:17 lifeless: I do see one job passed however: https://review.openstack.org/#/c/99250/9
19:22:25 on kibana
19:22:27 search for message:"Destination Host Unreachable" AND build_name:"check-tripleo-overcloud-f20"
19:22:31 over 7d timeframe
19:22:34 may take a couple of attempts
19:22:46 lifeless: maybe not
19:22:52 its very clearly new
19:22:59 12 hours old
19:23:09 nothing at all for the rest of the week
19:23:25 lifeless: I'll double check the nova configs again, thing is my local setup works fine
19:23:32 lifeless: as did Ben's
19:23:54 curious to know if other people are having success w/ devtest (overcloud) today!!
19:24:06 Yeah, I've never been so unhappy to have devtest pass.
19:24:10 first occurrence was on 83883
19:24:22 http://logs.openstack.org/83/83883/18/check-tripleo/check-tripleo-overcloud-f20/e95267e/console.html
19:24:23 lifeless: ok, I dropped both of those to High.
19:24:44 SlickNik: ok so can you talk with SpamapS and/or adam_g about the critical dib bug? there is a review which might fix it in dib
19:24:56 SlickNik: or it might be too-narrow defaults in the new element
19:25:07 SlickNik: we'd like to get it fixed :)
19:25:14 I think we need to move on from bugs though
19:25:25 I think for precise we may need to default to the exact 0.6 defaults
19:25:34 which include NoCloud and ConfigDrive
19:25:35 lifeless: That would be good.
I'll take it up with SpamapS after the meeting.
19:25:36 Thanks!
19:25:38 guessing the logic changed
19:25:50 since 0.7 is a _massive_ refactor
19:26:02 #topic reviews
19:26:07 so before I link anything
19:26:15 I'm sorry, I've been trying to get to the metareview
19:26:16 but failing
19:26:39 30m time slices at the end of days are not suitable for a multi-hour high context problem
19:26:46 after the fix for 1330735 was merged, I'm seeing a different error though and only on f20, on ubuntu overcloud is passing: https://review.openstack.org/#/c/93745/
19:27:34 lifeless, can I also bring back the discussion on a non-critical bug asking for some feedback? bug 1226310
19:27:36 Launchpad bug 1226310 in tripleo "Nova bm operations fail when LIBVIRT_DEFAULT_URI not set" [Medium,Triaged] https://launchpad.net/bugs/1226310
19:28:07 giulivo: I think we'll run out of time; perhaps in the free for all at the end
19:28:09 definitely not urgent but not sure how to approach that given that we are probably all setting that in the user environment at this point to make that work
19:28:23 giulivo: I don't set it
19:28:43 giulivo: tripleo-ci doesn't set it
19:28:58 giulivo: anyhow end of meeting please
19:29:05 lifeless, ok I checked with debian and it would be needed... I'll make sure to see what ubuntu does and update the bug
19:29:12 #link http://www.nemebean.com/reviewstats/tripleo-open.html
19:29:12 #link http://www.nemebean.com/reviewstats/tripleo-30.txt
19:29:13 #link http://www.nemebean.com/reviewstats/tripleo-90.txt
19:29:50 oh
19:29:55 russelb's instance is back
19:30:07 \o/
19:30:30 * Ng failing so hard, even accounting for a week off and helion release crunch :(
19:31:07 We need to do a better job at finding almost-ready patches
19:31:08 https://review.openstack.org/#/c/89860/
19:31:17 gertty may help
19:31:18 Been sitting there with my +2, two other +1's, for a week.
19:31:26 I'm having success adding little dashboard patches to it locally
19:31:41 SpamapS: there's no good way of querying for that sort of thing, is there?
19:31:49 jdob: I bet there is.
19:31:59 ok so
19:32:00 Stats since the last revision without -1 or -2 :
19:32:01 Average wait time: 8 days, 16 hours, 40 minutes
19:32:01 1st quartile wait time: 1 days, 8 hours, 50 minutes
19:32:01 Median wait time: 5 days, 6 hours, 20 minutes
19:32:03 3rd quartile wait time: 13 days, 12 hours, 44 minutes
19:32:06 we're still sitting on ~2 weeks
19:32:22 jdob: the search language for it is not awful: https://review.openstack.org/Documentation/user-search.html
19:32:35 never looked at it, but will take a look, thanks Ng
19:32:35 There's a lot waiting on CI passes that is making our stats look worse than they probably are.
19:32:37 one thing folk can do
19:32:41 There's a link to a tripleo reviews dashboard near the bottom of
19:32:43 https://wiki.openstack.org/wiki/TripleO#Review_team
19:32:44 is look at http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:32:55 and pick reviews from the Longest waiting reviews (based on oldest rev without -1 or -2):
19:32:58 section
19:33:03 since thats the thing that really ages
19:33:08 its also a convenient todo list
19:33:11 Can we make the CI job leave a negative vote when it fails?
19:33:35 bnemec: I think that infra have refactored things for ironic in a way we could use
19:33:47 bnemec: so in principle, yes, if someone can chase that ?
19:33:51 That link is generated by gerrit-dash-creator - https://github.com/stackforge/gerrit-dash-creator/blob/master/dashboards/tripleo.dash
19:34:16 lifeless: I'll see what I can find out.
19:34:19 tchaypo, +1 I'm using it too and I think it provides a nice view
19:34:55 so I'm going to propose that reviewers start their review pass with the 'Longest waiting reviews (based on oldest rev without -1 or -2):
19:34:59 ' section of reviewstats.
19:35:02 what do folk think ?
19:35:18 +1, We could probably even add that to our new review dashboard
19:35:18 Yeah that's a good idea.
19:35:23 that makes sense, it's different from my normal workflow
19:35:30 it looks like it shouldn't be too hard to add the things SpamapS was talking about, and it already has sections for "no review in last N days" and "no negative feedback"
19:35:40 That's what I do anyway.
19:35:47 bnemec: <3
19:35:53 :-)
19:35:58 bnemec: so how come
19:36:04 46 days, 0 hours, 13 minutes https://review.openstack.org/91861 (Use configured RabbitMQ user) is unreviewed since june 11 :)
19:36:04 There's a reason I care about keeping reviewstats working.
19:36:08 * lifeless is mean
19:36:21 I'm pretty sure that hasn't passed CI in that time.
19:36:30 yea, that one is ready to +A when CI is good
19:36:33 I've rechecked it repeatedly.
19:36:38 actually, lifeless failed to click on the one he looked at before
19:36:49 yea, that patch has been +15'd
19:36:55 yet shows up as unreviewed?
19:36:56 * lifeless is mean and stupid
19:36:58 it's just waiting to pass CI
19:37:00 is this the meeting section where we plead for people to review our specs? I didnt see an explicit section for it this time
19:37:13 https://review.openstack.org/#/c/96448/
19:37:34 jdob: I think so, we can add specs to the agenda too I guess
19:37:45 i forgot the resolution, if it was going to be separate or not
19:37:53 agenda wasn't updated
19:38:03 so clearly no one cared enough :)
19:38:09 regardless, I'd really appreciate eyes on https://review.openstack.org/#/c/94720/ to make sure everyone is happy with tuskar's direction so we can get started
19:38:36 jdob: can you #linky that?
19:38:48 more generally; as the specs process matures - expect folk to be justifiably unhappy at design-issues-caught-in-code-review
19:38:55 or maybe #info the whole thing so it gets into the logs?
19:39:14 I think we'll find contributors pushing back if we haven't caught it in spec review - double-edged sword
19:39:19 lifeless: ya, that's my concern, I don't want to start down a path that gets major disagreement after it's been in code for a few weeks
19:39:35 and i'm starting to catch crap from my managers that it still hasnt been started
19:39:43 so please take the time reviewers to review specs; I guess as second priority after super-old-code-reviews
19:39:45 tchaypo: that's just pound link and the link right?
19:39:52 jdob: aye
19:39:57 #link https://review.openstack.org/#/c/94720/
19:40:15 thanks in advance :)
19:40:24 unless you -1 it, then i'd have rathered you didn't bother looking
19:40:30 I'm hoping that using it in the middle of a sentence doesn't match the regex
19:40:30 hah
19:40:54 #topic Projects needing releases
19:40:54 tchaypo: me too, that's why I was careful to type out "pound"... I don't want to be "that guy" :)
19:41:04 do we have a 'volunteer' ?
19:41:07 i'm willing to take a stab at it
19:41:26 #action jdob to wear the release cape
19:41:42 So I want to note that we need a dib-utils release.
19:41:51 Otherwise we can't start using it in diskimage-builder.
19:41:56 bnemec: is it in the releasemgmt wiki page as a thing ?
19:42:05 I _think_ all the permissions should be okay.
19:42:06 bnemec: can you add that to https://wiki.openstack.org/wiki/TripleO/ReleaseManagement ?
19:42:17 Yeah, will do.
19:42:23 * bnemec didn't know about that
19:42:29 jdob: since dib had a regression you might want to see if the regression was released or not; if it wasn't, we should fix it before releasing
19:42:44 kk
19:42:56 #topic CD Cloud status
19:43:08 dprince: can you speak to rh1 while I grab my notes for hp1 ? :)
19:44:11 ... or not :)
19:44:13 lifeless: sure, rh1 is fine
19:44:21 ok so AFAIK therh...
damn race conditions
19:44:37 ok so hp1 is still down
19:44:50 #info help still needed on hp1.tripleo.org
19:44:54 #link https://etherpad.openstack.org/p/tripleo-ci-hp1-rebuild
19:45:33 and #link https://bugs.launchpad.net/tripleo/+bug/1325114 <-title is wrong, affects overcloud too
19:45:34 Launchpad bug 1325114 in tripleo "can't deploy underclouds with VLAN API endpoints" [High,Triaged]
19:45:41 #link https://bugs.launchpad.net/tripleo/+bug/1325114 <-title is wrong, affects overcloud too
19:45:44 and
19:46:00 #link https://review.openstack.org/#/c/100151/
19:46:19 basically, we didn't have test coverage of the (primitive) vlan support, and it bitrotted
19:46:52 I think the simple 'get a vlan up on an IP' bit is nearly fixed, but we need to glue it in more than that, because we now have non-wildcard binds for services
19:47:24 so reviewing dan's networking spec is key, as is reviewing the 100151 public service VIP spec, and helping implement both
19:47:38 We also have hp2.tripleo.org spinning up
19:47:42 its going to have the same issue
19:47:56 its a new region, currently 20 recycled machines
19:48:00 Could we get a summary of this hash-infoed for the logs?
19:48:02 decent specs just not new
19:48:33 #info please help with restoring vlan and public endpoint support so that we can get hp1.tripleo.org back online and > double our CI capacity
19:48:56 which leads to #topic CI
19:49:04 #topic CI
19:49:13 everything is red
19:49:32 lifeless: not everything, just the overcloud
19:49:47 bnemec: hah. http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:49:58 bnemec: so I'd like to keep your copy running, if you're willing :)
19:50:05 bnemec: seems simplest
19:50:30 dprince: yes, but since we need it to pass :)
19:50:34 lifeless: Yeah, no problem
19:51:16 * slagle still doesnt know what "Emails" means in stackalytics
19:51:26 regardless, i'm glad i'm Emails=0.
19:51:42 #link https://bugs.launchpad.net/tripleo/+bug/1292105
19:51:44 Launchpad bug 1292105 in tripleo "CI failed pinging overcloud instance" [Critical,Triaged]
19:51:55 #info that bug needs fixing to resume landing patches
19:52:05 #topic Tuskar
19:52:28 anything tuskar specific to address?
19:53:11 for my part, just the review I linked earlier
19:53:14 ok
19:53:27 #topic open discussion
19:53:37 Can folk please sign up on the mid cycle etherpad if you're going
19:53:51 RH are finding a block of rooms at discount rate and they need room count asap
19:54:14 On support and backwards compat and so on
19:54:20 this came up from a review I did
19:54:22 https://etherpad.openstack.org/p/juno-midcycle-meetup if anyone needs the link
19:54:29 but I realise we really haven't spoken much about it
19:54:43 other than the fact we're reserving the right to break things in most of our projects
19:55:02 I think we may be being too conservative - most of our projects are split out of incubator *to enable us to give guarantees*
19:55:09 discuss
19:55:15 #link https://etherpad.openstack.org/p/juno-midcycle-meetup
19:55:23 #info please sign up if you're going
19:55:40 lifeless: Too conservative in incubator or the split out projects?
19:55:49 I thought this arose from one of the latter.
19:55:53 split out
19:56:19 we have a minimum requirement of cross-project backwards compat on *every* change we land, because gerrit.
19:56:33 We don't have any requirement on upgrades (yet, but we will once thats tested)
19:56:50 personally, i'd like to see the split out projects be able to deploy any supported version of openstack
19:56:55 and we don't have any requirement on derived-work (and that will be hard to test, even if we wanted that as a desired thing)
19:57:01 slagle: yes! me too
19:57:09 slagle: thats always been the intent in my head, at least.
19:57:20 slagle: and its yet another dimension of compatibility
19:57:22 lifeless: right, we don't always code and review that way though
19:57:22 Until we're testing that I'm not sure it's a realistic goal.
19:57:38 bnemec: it could be an aspiration
19:57:45 so concretely, I propose to:
19:57:56 - write up a spec about the various dimensions of backwards compat
19:58:01 - with strawman positions
19:58:08 - get everyone to review it over this next week
19:58:19 - land it ~ 7 days from now.
19:58:23 What do you all think ?
19:58:30 +1 to spec and timeboxing the spec
19:58:46 Sure, seems reasonable.
19:58:49 the spec process is still new enough to us that I dont think its getting the attention it ultimately should
19:58:55 so setting a deadline helps that
19:59:26 #info there will be a spec on backwards compat and tripleo and stuff; we're timeboxing it to ~1 week so be sure to watch out for it
19:59:35 giulivo: btw this is your time for that bug :)
19:59:37 60s
19:59:51 tick tock...
19:59:54 slagle: it's not really how we code and review that matters. It is really how we test that determines if we "support" something.
20:00:38 SpamapS: testing is a backstop to culture, but it doesn't define the vision
20:00:38 I suggest giulivo take that discussion to #tripleo
20:00:46 lifeless, well I really don't know yet why/if on ubuntu a regular user gets qemu:///system by default as libvirt uri
20:00:48 SpamapS: there's always going to be resource limits, for instance
20:00:54 lifeless: if it isn't happening, the culture is built on sand. :(
20:01:02 I'm sure that is not the case on debian though as I checked that personally
20:01:15 SpamapS: what can I say, we're living in Ankh-Morpork?
20:01:21 #endmeeting