14:00:14 #startmeeting tripleo
14:00:16 Meeting started Tue Mar 8 14:00:14 2016 UTC and is due to finish in 60 minutes. The chair is dprince. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 The meeting name has been set to 'tripleo'
14:00:23 yo
14:00:24 o/
14:00:35 o/
14:00:40 o/
14:00:47 o/
14:00:50 hi everyone
14:00:52 \o
14:00:53 o/
14:00:58 o/
14:01:02 \o/
14:01:06 o/
14:01:09 o/
14:01:09 hi
14:02:11 #topic agenda
14:02:11 * bugs
14:02:11 * Projects releases or stable backports
14:02:11 * CI
14:02:13 * Specs
14:02:16 * Changing default provisioning IP range: https://review.openstack.org/#/c/289221
14:02:19 * Clear usage of parameters: in tripleoclient https://review.openstack.org/#/c/256670/
14:02:22 * open discussion
14:02:50 That is our agenda this week. Anything else to add/subtract?
14:03:15 add ci testenv node sizes
14:03:49 derekh: we need to revisit the mitaka branch discussion, I guess we can do that under the releases section
14:03:56 o/
14:04:00 sorry, dprince ^^ ;)
14:04:06 o/
14:04:10 o/
14:04:17 hello \o
14:04:18 d caught me out there
14:04:38 shardy: yes. that was on my mind as well. Let's do it in the stable branches
14:04:49 okay, let's go
14:04:54 #topic bugs
14:05:44 lots of bugs on stable liberty around getting the network isolation passing
14:05:57 are we closing in on getting that working there?
14:06:10 i've merged everything
14:06:19 theory is it should be passing now
14:06:27 slagle: cool
14:06:30 but we shall see, there are some jobs running
14:06:53 slagle: and on trunk we think we have the validations/swap issues fixed?
14:06:57 slagle: https://bugs.launchpad.net/tripleo/+bug/1553243
14:06:57 Launchpad bug 1553243 in tripleo "CI: Resource CREATE failed: Error: resources.ComputeAllNodesValidationDeployment.resources[0]" [Critical,In progress] - Assigned to James Slagle (james-slagle)
14:07:19 dprince: i think so, yes. if anyone sees it, please shout
14:07:32 the patch to use the swap partition instead of file is still out there though
14:07:35 slagle: yep, we can query logstash to verify it is gone I think
14:07:51 slagle: link?
14:07:55 but we can't land that until the swap partition template gets backported to liberty, which is also in progress
14:08:11 dprince: https://review.openstack.org/289085
14:08:25 it's workflowed -1 until the liberty backport lands
14:08:32 which is https://review.openstack.org/#/c/289610/
14:08:55 we ought to find a way to check our tripleo-ci changes on the stable branches :)
14:09:01 hmmm, this is perhaps a reason not to share our CI branches across stable and upstream
14:09:12 yea
14:09:20 either tripleo-ci is backwards compatible or we use branches
14:09:44 I think we should move tripleo.sh into tripleo-ci and have all CI scripts backwards compatible
14:09:56 shardy: +1
14:10:02 yeah I was for avoiding branches and making it backward compat too
14:10:02 we'll need to agree a time to stop landing stuff to tripleo-common tho
14:10:19 shardy: once upon a time TOCI had devtest scripts in it I think
14:10:25 I posted https://review.openstack.org/#/c/272210/ which was in merge conflict in about 10 minutes ;)
14:10:40 I can rebase it if we're in agreement the move should happen
14:10:42 shardy: I'm fine w/ moving it to tripleo-ci
14:11:01 any other bugs issues?
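[Editor's aside: the "backwards compatible rather than branched" idea above, sketched as a minimal shell guard. STABLE_RELEASE and USE_SWAP_PARTITION are illustrative names assumed for this example, not confirmed parts of tripleo.sh.]

```sh
#!/bin/bash
# Sketch only: one unbranched CI script guards release-specific behaviour
# behind a variable instead of maintaining stable branches of the script.
set -eu

if [ "${STABLE_RELEASE:-}" = "liberty" ]; then
    # keep using the swap *file* until the swap-partition template is backported
    USE_SWAP_PARTITION=0
else
    USE_SWAP_PARTITION=1
fi
echo "USE_SWAP_PARTITION=${USE_SWAP_PARTITION}"
```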
14:11:10 let's rebase that, confirm the job started ok and then merge without waiting, identical tripleo.sh script
14:11:14 Ok, I'll rebase it today
14:12:01 #topic Projects releases or stable backports
14:12:05 shardy, derekh maybe make it depend on a change which removes it from tripleo-common too so that any change which *was* editing tripleo.sh in -common will fail
14:12:25 gfidente: Yup, good point, I'll do that too
14:12:31 #link http://releases.openstack.org/mitaka/schedule.html
14:12:32 +1
14:12:40 okay. so this is where we should talk about https://etherpad.openstack.org/p/tripleo-mitaka-rc-blockers
14:12:43 So, I know we had a heated debate about this last week
14:13:01 but that blocker list keeps growing, and we have to decide when to branch
14:13:23 shardy: let's triage the blocker list
14:13:34 shardy: IPv6, upgrades, and one other thing
14:14:05 IPv6 is blocked on a newer version of memcached
14:14:17 and python-memcached
14:14:21 we either need to do time based releases or feature based releases... not both
14:14:35 rdo guys are going to build the package in the cloud sig
14:14:49 I would argue that the etherpad is really about downstream needs, and feature based releases would hose downstream worse
14:14:49 trown: I agree
14:14:54 trown: true, but all projects track landing things before deciding when it's a good time to branch
14:14:59 o/ also on a call, sorry late
14:15:05 that's why there's an RC period, but not a specific date
14:15:22 shardy: ya, but that etherpad is a bit extreme
14:15:29 trown: Yeah, agreed
14:15:37 the stress here is this etherpad came very late in Mitaka
14:16:00 Yeah, we should do milestone releases and have a feature freeze next time around IMO
14:16:06 +1
14:16:29 shardy: it sounds like everyone is on board for that in Newton, it's just a matter of limping past Mitaka right now
14:16:37 * beagles wanders in late
14:16:52 yup, so ipv6 is blocked, any eta on the memcached stuff?
14:17:24 memcached should be resolved in a couple hours
14:17:29 and what's the status re upgrades - I've seen a fair few patches but we don't seem to have a definitive list in the etherpad?
14:17:40 my concern if we choose the date of say next Friday is people are still going to slam stuff in at this point, when we should actually be in a freeze already
14:17:53 Ok, and there's been manual testing of ipv6, so that should be OK to land in the next week?
14:18:01 shardy: apevec moved it into a card for March 10th https://trello.com/c/8xVahzfA/122-ipv6-support-investigate-redis-as-alternate-caching-engine
14:18:16 shardy: what trown said is better ;-)
14:18:32 derekh: I think number80 already built it, just waiting for repo sync
14:18:45 trown: ack
14:18:53 derekh: hehe, I just meant, sounds like that one is in-hand
14:19:02 shardy: now that we have IPv4 net iso testing in our check jobs I don't think IPv6 is blocked.
14:19:29 shardy: it would give me a vote of confidence to see it passing here though first: https://review.openstack.org/#/c/289445/
14:19:38 dprince: I agree, and I'm not proposing a fixed date, but we need to burn down these blockers then decide when to branch
14:19:49 ideally that needs to be in the next 7-10 days
14:20:01 dprince, so the issue with memcached won't make it pass anyway
14:20:23 who can provide the status re the upgrades patches - marios?
14:21:06 shardy: again, I'm not convinced we need to burn down all these patches. We can land some of it I think, but landing it all would be foolish
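[Editor's aside: gfidente's suggestion earlier in this topic (land the tripleo-ci import only together with a tripleo-common change removing tripleo.sh) maps onto Zuul's cross-repo Depends-On footer. A minimal sketch follows; the Change-Id is a placeholder, not a real review.]

```sh
# Sketch: amend the tripleo-ci change so it carries a Depends-On footer
# pointing at the (hypothetical) tripleo-common change that deletes tripleo.sh,
# so neither merges without the other and late edits to the old copy conflict.
git commit --amend -m "Import tripleo.sh from tripleo-common

Depends-On: I0000000000000000000000000000000000000000"
```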
14:21:46 the whole idea of a "blockers" etherpad was to identify those patches deemed critical for the release
14:22:10 shardy: yeah, and it became a free for all I think
14:22:12 given that we don't track anything in launchpad it's hard to track the status any other way?
14:22:29 these are the "blockers"
14:22:32 blocking what, i'm not sure
14:22:44 but these are the things that need to be in liberty
14:22:46 Ok, let me try again - how do we know when we can reasonably branch?
14:22:58 ya I have yet to see an upstream argument for what these are blocking
14:23:17 personally I think it'd be nice to land the ipv6 patches in master before branching
14:23:19 I mean, we could take a bunch of sha's from a passing periodic job, but I'm trying to be sympathetic to those trying to land stuff and coordinate better than that
14:23:21 slagle: these are the things that need to be in liberty today (by someone's assessment)
14:23:34 slagle: do we have a guarantee those needs aren't changing next week? or thereafter?
14:23:57 dprince: i can't answer that. and you know who to ask who can answer it
14:24:13 trown: what impact will it have from an RDO perspective if we branch without fully proven upgrade support landed?
14:24:28 I'm assuming that would be considered a blocker for existing users, no?
14:24:59 blockers should be big interface changes, and non backward compatible stuff
14:25:01 shardy: RDO has never claimed to support upgrades
14:25:02 shardy: unless we have a CI job on upgrades (in TripleO) I wouldn't claim it is working I think
14:25:10 but small features can be backported I think
14:25:37 EmilienM: for Mitaka we're trying to avoid feature backports, because they've been completely abused for liberty
14:25:39 shardy: I would agree with dprince, that until there is a CI job in RDO for upgrades, I would not claim RDO supports upgrades
14:25:55 shardy: right, that's why I specified the "small" :-)
14:25:57 my opinion was that we could branch after ipv6/upgrades was landed. and if needed, we backport fixes to mitaka and liberty
14:26:01 most other projects don't allow feature backports, so it'll be clearer if we can try to do the same
14:26:03 shardy: I'm inclined to try and land the larger patches around IPv6. Especially now that we know we aren't breaking IPv4 with those
14:26:12 slagle: +1
14:26:36 i still don't understand the urgency to branch mitaka though, without rdo mitaka repos
14:26:47 trown: did we get an answer on that?
14:26:52 ipv6 sounds like a big change everywhere, might be good to have it in stable/mitaka
14:26:54 +1 to limiting "blocker" to just ipv6/upgrades
14:27:13 it makes no sense to me to just have mitaka tripleo branches with trunk rdo repos
14:27:20 slagle: would you propose we pin our CI to mitaka when all other projects branch then?
14:27:39 because otherwise we have no way to know if we still work with mitaka, we'll be testing trunk/newton
14:28:15 slagle: not sure I get the question... delorean will get branched as soon as oslo libs branch
14:28:37 when is that I guess?
14:28:45 shardy: pin to mitaka-passed-ci
14:28:56 trown, slagle: could we perhaps take the RDO discussion to #rdo afterwards?
14:29:01 then mitaka delorean will be master of most projects, and will switch to stable/mitaka as it gets branched everywhere
14:29:04 dprince: it's not an rdo discussion
14:29:18 those are the repos tripleo-ci uses
14:29:24 Do we have general consensus on just IPv6 and upgrades being blockers?
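[Editor's aside: a sketch of the "pin to mitaka-passed-ci" idea above, pointing CI at a delorean repo that already passed promotion instead of chasing trunk/newton. The URL layout is an assumption based on the RDO trunk repos of that era and may not match exactly.]

```sh
# Sketch: swap the trunk delorean repo for one that passed the mitaka promotion job.
sudo curl -L -o /etc/yum.repos.d/delorean-mitaka.repo \
    https://trunk.rdoproject.org/centos7-mitaka/current-passed-ci/delorean.repo
```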
14:29:46 We're making this hard on ourselves because we don't want to make the same mistakes of backporting everything; why not land IPv6 and upgrades, then temporarily loosen the policy of no backports, but trust ourselves not to make it a free for all, and have a predefined set of exceptions
14:29:51 shardy: i propose not testing our mitaka branches against newton
14:29:54 imho, after RDO mitaka release, OOO CI should pin to this stable repo
14:30:11 slagle: FWIW I was assuming we'd branch around the time RDO mitaka repos become available, not before
14:30:27 http://releases.openstack.org/mitaka/schedule.html says we should start having RC's next week
14:30:28 slagle: sure, but eventually there will be branches we can switch to for Mitaka in RDO too
14:30:31 shardy: me as well, that's what i was trying to figure out - when that will happen
14:30:48 TripleO moves first, then RDO
14:30:48 I just don't want to block trunk indefinitely, or continue testing our "mitaka" release against newton projects
14:31:37 derekh: that'd be fine, but we've failed to reach consensus re the exceptions, ref this etherpad list
14:32:20 Do we agree on IPv6 and upgrades as the blockers for branching?
14:32:33 +1
14:32:34 +1
14:32:49 other issues can come in via (minor) backports if needed via a short window
14:32:49 +1, along with the etherpad as the list of exceptions :)
14:32:50 dprince: Yes, but there's no detail on what constitutes the patch dependencies for liberty->mitaka upgrades
14:32:51 yup
14:33:21 a lot of things in the etherpad are accepted exceptions anyway, they are bug fixes
14:33:25 shardy: I would ask those driving upgrades to provide that pronto
14:33:46 I'm not sure they know yet.
14:34:04 bnemec: because it isn't finished you mean?
14:34:11 or it's a moving target?
14:34:37 dprince: Well, probably both, but mostly because they won't know what problems they're going to hit until they get through all of them.
14:35:40 Based on the etherpad, it's also not clear to me that the people working on upgrades are even focused on Mitaka yet.
14:36:07 okay, so is upgrades as a blocker just a fictitious feature we are chasing? Sounds like only the upstream upgrades CI job will tell
14:36:38 perhaps all we are really blocking on is IPv6 then; Upgrades by definition would need to be backported and fixed anyways
14:36:43 I should note that I've had almost no involvement in it, so if we can we should talk to someone who actually knows what's going on there.
14:37:04 Alright, well we need to move on I think
14:37:27 Let's continue the blockers discussion on #tripleo afterwards
14:37:41 #topic CI
14:38:10 Last week we rebuilt the testenv's to accommodate network isolation testing
14:38:43 #link https://review.openstack.org/#/c/288163/
14:39:38 We got lots of transient errors on our CI jobs. I reckon if somebody spent long enough root-causing them we could get to a better place and improve things without changing anything in the CI envs, but this alone is a full time job (one I think somebody should be doing regardless of what we do)
14:39:51 there have also been lots of calls to bump the RAM on the nodes we use in CI, I think we're at the point where to add 1G to each node, we'll have to reduce the number of envs available by 25%
14:39:58 I mentioned in an email we could try this as an experiment, which we can if people want to; if it works we may end up needing less envs due to less rechecks
14:40:09 "experiment" suggests we can easily undo it; that mightn't be the case. Once we bump the ram and add things that start using it then there may be no going back
14:40:27 i think we need less envs also due to cpu
14:40:27 HA deployment in CI at the moment is 5+(4*4)=21G Memory, so essentially we're talking about bumping that to 26G
14:40:37 so if we're going to have less we might as well give them more ram
14:41:13 derekh: my concern is if we do this our upstream CI environment no longer represents something any of the developers can actually test on their own hardware
14:41:35 i was just looking at a testenv host where the load was over 60 on the 15 minute average
14:41:46 derekh: it is okay if we have some "large" testenv's I think. But maintaining at least one smaller testenv setup that is intended to guard the developer case is important too
14:41:51 O.O
14:41:56 dprince: I wouldn't say any of the developers, but I take your point and it's one of the reasons I was holding out
14:42:00 dprince: a similar setup should still be possible with VMs on a 32G development box?
14:42:27 shardy: how many TripleO core devs have a 32G box ATM
14:42:27 Developers aren't using 4 GB VMs anymore anyway, are they?
14:42:29 * dprince doesn't
14:42:30 talking about ram, is it really the overcloud nodes we would need to bump up or is it just the undercloud?
14:42:47 dprince: more than you'd think
14:42:49 Hey, I've been asked recently 2 or 3 times by people joining tripleo what kind of dev env is needed, and I started listing options. I think many of us have a different setup; anybody want to describe their setup here so we can point newcomers at it? https://etherpad.openstack.org/p/tripleo-dev-env-census
14:42:54 I thought most devs have at least one 32G RAM box
14:42:58 dprince: nearly everyone in this so far https://etherpad.openstack.org/p/tripleo-dev-env-census
14:43:17 dprince: and not all developers need to deploy HA regularly?
14:43:18 bnemec: 4GB VMs don't really work; i tried a few days ago
14:43:19 that seems like a bare minimum to deploy HA
14:43:25 Public OVB cloud. Just sayin'. ;-)
14:43:30 jdob: Yeah, that's what I mean.
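[Editor's aside: derekh's testenv numbers above, spelled out. The layout (one 5G VM plus four 4G VMs per HA testenv) is inferred from the arithmetic in the log.]

```sh
# Per-testenv RAM today vs. with +1G on every VM, as quoted in the meeting.
current=$((5 + 4 * 4))              # 5G + 4x4G = 21G per HA testenv
bumped=$(((5 + 1) + 4 * (4 + 1)))   # 6G + 4x5G = 26G with 1G more per VM
echo "per-testenv RAM: ${current}G -> ${bumped}G"
```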
14:43:31 bnemec: overcloud nodes fail in puppet and the undercloud shits itself after a few runs
14:43:35 4gb vm's definitely no longer work for the overcloud
14:43:40 in reality I think most folks are using more ram locally than in CI already
14:43:41 even with aodh reverted
14:43:53 The only reason we're getting away with it in CI is that we've tweaked the hell out of the config and the environment only stays up for about two minutes.
14:43:54 side note, we should probably update those defaults
14:43:55 you can just go look at the oom failures on the revert i pushed to remove the swap file :)
14:44:07 I have multiple boxes, but 32G is going to exclude anyone from outside of tripleO from contributing (and testing locally) unless they have hardware
14:45:04 Well, without HA you might still be able to fit undercloud + 2 overcloud nodes in 16 GB.
14:45:07 This is why I think the nonha case makes a lot of sense. Or perhaps a single node "HA" installation (one controller)
14:45:19 dprince: or access to the OVB cloud we're hoping to deploy, or another one
14:45:27 bnemec: you can, I did it for my devconf demo (you may recall, it was slow tho due to some swapping)
14:45:37 single node HA is and has always been meant to work
14:45:57 gfidente: if it isn't tested in our CI I'm not sure it does
14:46:16 so while one can't test failover and similar stuff, the resulting config matches 3 controllers
14:46:22 gfidente: all I'm suggesting is that our CI represent what our minimum developer requirements actually are
14:46:35 we can go above that for some jobs but not all of them
14:47:45 okay, well this was interesting but I'm not feeling like we need to debate anymore here
14:47:45 i think at least 5gb oc vm's has got to be the minimum for both
14:47:53 4gb just doesn't work
14:48:01 (without swap)
14:48:09 so I think this is a topic which came up on the list too
14:48:11 dprince: But our CI doesn't really represent minimum requirements, we've added swap to make it work and tweaked the overcloud to run fewer workers than default etc...
14:48:15 I guess commenting on the email thread is the best way to follow up
14:48:17 we want more coverage and we need more ram
14:48:26 i went overkill and went with 8 on this last build, but I can try 5-6 on the next ones and post how it goes
14:48:28 It's been a discussion at least since Atlanta. :-)
14:49:16 derekh, is it really ram that's the limiting factor or are the nodes out of I/O on the disks too? maybe what we really need is more nodes, not more ram?
14:49:17 derekh: I think we probably need to re-allocate the testenv's, I'm just not in agreement that we should drop nonha I think. Or perhaps arguing that a single node HA is perhaps just as important for some devs
14:50:13 dprince: I'm not talking about dropping any jobs
14:50:27 derekh: it was proposed on the list I think, that is why I mentioned it
14:50:37 dprince: I'm talking about having slightly bigger testenvs but 25% less of them
14:50:54 * derekh has done a U-turn on his opinion here from yesterday
14:50:57 derekh: right but wouldn't that potentially cause us capacity issues?
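[Editor's aside: a sketch of the single-controller "HA" deployment dprince and gfidente mention above, which keeps the pacemaker-managed configuration while staying close to the nonha footprint. The flags and environment file path reflect Mitaka-era tripleoclient as best recalled, so treat them as assumptions.]

```sh
# Sketch: deploy one pacemaker-managed controller and one compute node so the
# resulting config matches the 3-controller layout without the 3-controller RAM.
openstack overcloud deploy --templates \
    --control-scale 1 --compute-scale 1 \
    -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml
```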
14:51:18 dprince: The hope is that we would be running less jobs due to less rechecks
14:51:24 derekh: +1
14:51:31 I think less testenvs means less iowait too
14:51:32 dprince: but it's a valid concern
14:51:40 i think right now when we are running at full capacity, we are killing ourselves
14:52:07 yeah, well right now stable/liberty is taking up way too much capacity :/
14:52:22 but sure I agree capacity has a hand in this
14:52:28 okay, we need to move on.
14:52:36 Skipping specs this week
14:52:41 I have a question re adding IPv6 to a gate job: where to include it? is the ceph job fine as dprince suggested in the review?
14:52:54 #topic Changing default provisioning IP range
14:53:23 jaosorior: is this your topic?
14:53:25 https://review.openstack.org/#/c/289221
14:53:37 dprince: It is.
14:53:44 adarazs: starting with any of them would be fine I think
14:53:51 dprince: okay. :)
14:54:20 jaosorior: anything you want to say on this right now?
14:54:24 So, thing is, the provisioning network was chosen on a network that violates rfc5737, and that breaks other tools (such as FreeIPA) if we try to integrate them using the default
14:54:40 adarazs: maybe start "simple" with the ha job, and add ceph later
14:54:52 I proposed changing it, but bnemec suggests we take that conversation here, as it might have implications regarding upgrades
14:55:01 jaosorior: This has been our default for some time. We can look at changing it, but I sort of think this is a Newton thing
14:55:14 (guys we just got all jobs green with netiso in liberty/ci https://review.openstack.org/#/c/287600/)
14:55:26 gfidente: nice :)
14:55:34 dprince: Well, that's fair enough for me
14:55:37 gfidente++ excellent
14:56:40 okay, moving on
14:56:44 #topic Forcibly clear parameters now passed as parameter_defaults
14:56:51 #link https://review.openstack.org/#/c/256670/
14:56:58 gfidente: anything you want to mention here?
14:57:13 hey so the purpose of that submission is to make sure the client clears parameters:
14:57:20 because it now always passes parameter_defaults:
14:57:34 but on update whatever was passed as parameters: prevails over parameter_defaults:
14:57:56 now the problem with that is that it will also clear our CinderISCSIHelper which in the templates defaults to tgtadm
14:58:04 while on our testenv it should default to lioadm
14:58:11 gfidente: any side effects of this?
14:58:42 gfidente: i thought that patch only cleared parameters that were passed as cli args
14:58:54 that's how it should build the list to pass to clear_parameters
14:58:58 so there are two solutions: 1) we change the default in templates from tgtadm to lioadm or 2) we make distro-specific hiera prevail over templates as in https://review.openstack.org/#/c/283712/
14:59:20 slagle, hey man you wrote that patch too :P
14:59:29 it does but it has some defaults it used to pass regardless
14:59:40 * EmilienM leaves for puppet openstack meeting starting now
14:59:44 amongst which there is lioadm
14:59:52 gfidente: yea, a while ago :). i'll have another look
15:00:02 We should just change that default in the templates. None of our supported platforms right now have tgtadm anymore.
15:00:04 gfidente: okay, thanks for this info. I think we can debate these things on the patch
15:00:12 dprince, so the purpose is to let user-specified parameters passed as parameters: prevail
15:00:19 bnemec: disagree, tgtadm is the cinder default
15:00:32 time's up though
15:00:36 thanks everyone
15:00:45 #endmeeting
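[Editor's closing note on the parameters:/parameter_defaults: topic above: a minimal sketch of why stale parameters: entries matter on stack update, using the CinderISCSIHelper example from the log. The file name and values are illustrative only.]

```sh
# Sketch: for the top-level stack, a value left in "parameters:" keeps winning
# over anything supplied via "parameter_defaults:" on update, which is why the
# tripleoclient change under review clears the old parameters: entries.
cat > stale-env.yaml <<'EOF'
parameters:
  CinderISCSIHelper: lioadm      # old-style entry left over from a previous deploy
parameter_defaults:
  CinderISCSIHelper: tgtadm      # ignored for the top-level stack while the entry above exists
EOF
```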