16:00:10 <b3rnard0> #startmeeting OpenStack Ansible Meeting
16:00:10 <openstack> Meeting started Thu Feb 12 16:00:10 2015 UTC and is due to finish in 60 minutes.  The chair is b3rnard0. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:14 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:00:24 <cloudnull> hello !
16:00:24 <odyssey4me> o/
16:00:27 <b3rnard0> #link https://wiki.openstack.org/wiki/Meetings/openstack-ansible#Agenda_for_next_meeting
16:00:28 <mancdaz> hi
16:00:33 <BjoernT> Hi
16:00:38 <b3rnard0> #topic RollCall
16:00:46 <cloudnull> present
16:00:58 <Sam-I-Am> hi
16:00:59 <b3rnard0> hello
16:01:22 <palendae> Present
16:01:34 <serverascode> here
16:01:36 <rackertom> Hello
16:01:38 <mancdaz> present
16:02:45 <b3rnard0> #topic Review action items from last week
16:02:58 <cloudnull> ok, lets get started.
16:03:07 <cloudnull> first up hughsaunders to start reading inventory-manage.py
16:03:30 <mancdaz> he's not present
16:04:05 <odyssey4me> I think that was a bit of a laugh anyway. He did say that it needn't carry.
16:04:17 <cloudnull> so ill assume not. :)
16:04:19 <cloudnull> b3rnard0 use link shortener at next meeting
16:04:24 <cloudnull> ^ .doit()
16:04:34 <b3rnard0> #link http://tinyurl.com/pzhdxev
16:04:43 <cloudnull> next: "d34dh0r53 : to help with review of https://review.openstack.org/#/c/152753/"
16:04:52 <d34dh0r53> in progress
16:05:26 <cloudnull> on that review, has anyone had time to review it? besides odyssey4me  and d34dh0r53
16:05:35 <d34dh0r53> nothing major so far other than the multiple physical drive failure in the aio that we discovered
16:05:44 <odyssey4me> andymccr has been doing some work with it
16:06:09 <andymccr> i got it working at least, after a few issues which i think are fixed now?
16:06:17 <cloudnull> andymccr pointed out an issue with the dynamic inventory this morning, so ill have that pushed up this morning.
16:06:24 <cloudnull> i have to do a rebase though before i can do it
16:06:36 <andymccr> since its all encompassing i think we need to decide how we're gonna do this
16:07:00 <cloudnull> i think thats fair .
16:07:13 <andymccr> either we need to just get this through (after a bit of testing) or we need to try split it up. but we should probably pick one cos as it stands people are going to be spinning their wheels with no real result
16:07:13 <odyssey4me> related topic, I'm working on doing the deraxification in https://review.openstack.org/155342 - perhaps you should rebase on top of that to make your monster patch less monster?
16:07:38 <cloudnull> i was going to bring that up here too.
16:07:44 <b3rnard0> #info related topic, I'm working on doing the deraxification in https://review.openstack.org/155342 - perhaps you should rebase on top of that to make your monster patch less monster?
16:08:17 <odyssey4me> note - it's not fully tested yet, so don't bother just yet... I'm trying out installs now
16:08:19 <cloudnull> so the de-raxing is just as large as the "monster patch"-tm
16:08:53 <odyssey4me> yep, but it's focused on doing one thing only - so that your patch can focus on doing the other thing
16:08:54 <hughsaunders> not really
16:08:57 <cloudnull> do we really think that piecing out into multiple patches is better ?
16:09:21 <git-harry> yes
16:09:28 <cloudnull> how?
16:09:30 <hughsaunders> odyssey4me's patch is mostly renames and removes, so its easier to see whats going on
16:09:36 <odyssey4me> this patch should focus on de-raxing and putting in a migration plan - your patch can focus on converting to ansible roles
16:09:49 <cloudnull> this is true.
16:10:01 <andymccr> i would even say stuff like moving out to use multiple user_secrets/user_variables etc files should be a separate patch in itself
16:10:11 <git-harry> ^agree
16:10:15 <odyssey4me> +1 andymccr
16:10:18 <cloudnull> but it also leaves master broken until its fixed. which is done in the original review
16:10:22 <mancdaz> it also makes changelogs easier to generate
16:10:36 <hughsaunders> +284, -49915 vs +30089, -77051
16:10:49 <andymccr> i think stealth committing several things isn't ideal. but we need to commit to one way or the other in this instance - or we end up with people doing different things where only 1 will be used and its a waste essentially.
16:10:51 <odyssey4me> when the de-rax patch is no longer wip it'll be a working install
16:10:51 <b3rnard0> #info andymccr: i would even say stuff like moving out to use multiple user_secrets/user_variables etc files should be a separate patch in itself
16:11:04 <cloudnull> in addition the already available patch passes our gating and tempest tests which we dont have in our current releases.
16:11:13 <cloudnull> so its better tested than what we have presently
16:11:26 <andymccr> im concerned about upgrading existing installs
16:11:36 <andymccr> and bug regression
16:11:37 <odyssey4me> yeah, but the patch may have regressions which are very hard to spot right now
16:12:05 <git-harry> We've not been adding tests when things are fixed
16:12:12 <odyssey4me> besides, I think everyone needs more time because the patch entirely changes the way the install works
16:12:51 <cloudnull> git-harry we presently have no tempest tests running
16:12:53 <odyssey4me> it'd be easier to review if it was broken up a bit more - and the patch didn't also include extra features along the way ;)
16:12:55 <Sam-I-Am> i sense docs changes :)
16:13:10 <palendae> Sam-I-Am: There will be whenever this work merges, no matter what patch it comes from
16:13:15 <odyssey4me> Sam-I-Am high level of doc impact for both
16:13:19 <git-harry> cloudnull: I thought we had some
16:13:22 <Sam-I-Am> and upgrades might become more interesting
16:13:27 <cloudnull> git-harry no, we have qe
16:13:54 <cloudnull> additionally tempest in its present state will not run, it can't ssh to a built instance.
16:13:58 <cloudnull> the tests fail no matter what.
16:14:06 <git-harry> testing is not just tempest
16:14:11 <cloudnull> agreed
16:14:49 <mancdaz> regardless of the state of this particular patch, isn't it more about the principle of massive patches versus smaller iterative patches
16:15:09 <cloudnull> so i think that its easier to understand the patches when we split it out. but i also think we're just prolonging the inevitable.
16:15:26 <mancdaz> cloudnull the first part of what you said is important though
16:15:49 <odyssey4me> sure, it may all end up at exactly what you've produced in the end - but I think more people need time to work through it properly
16:15:56 <mancdaz> it's easier for everyone to understand the changes if they are iterative over a longer time
16:16:14 <odyssey4me> if it's broken up into smaller patches then it's easier to consume, test and approve
16:17:09 <cloudnull> we need to be thinking about how we get to "kilo" and not maintaining our raxisms in this community project.
16:17:25 <BjoernT> I don't think anyone can review such large patches. We should adopt the same rule as linux kernel devs, no patch > 300k or so. I believe that was the rule, or something similar.
16:17:57 <d34dh0r53> I agree that small patches are easier to grok, but breaking the mega-patch into an iterative process will require the addition of a lot of constantly changing glue code which IMHO is likely to break and cause problems.
16:18:18 <cloudnull> ^ that
16:18:55 <andymccr> imo we're trying to achieve like 5 things here. e.g. renaming. galaxy roles, split out user_vars/secrets etc,add some new conf options
16:19:01 <andymccr> this doesn't mean we won't still have some massive patches
16:19:15 <andymccr> we will, but i think there are some pretty clear divisions that aren't going to require glue code
16:19:23 <hughsaunders> dynamic inventory changes, package building changes
16:19:54 <BjoernT> Right but we might consider that in the future for new patches
16:20:00 <rackertom> I've only looked at a small portion of the patch because of its size. But, I'd ask if the cost of teasing out the relationships between several patch chunks and any possible breakage in the CI processes is worth the effort of actually breaking it out.
16:20:52 <odyssey4me> if the de-rax can happen first I really do feel like it would make the patch less intimidating
16:21:41 <cloudnull> andymccr i disagree. in order to iteratively remove our present bits to be more community focused we're going to have to create a ton of glue
16:21:46 <odyssey4me> at this stage I'm not informed enough to determine whether the conversion to galaxy roles would require much glue (if any)
16:22:04 <cloudnull> its a rip and replace.
16:22:18 <cloudnull> to keep what we have we're going to have to create a lot of glue.
16:22:37 <Sam-I-Am> as long as the glue smells good
16:22:43 <d34dh0r53> lol
16:22:43 <andymccr> cloudnull: can you give an example of the glue?
16:22:53 <cloudnull> sure.
16:23:23 <andymccr> my thinking is if i said to you we're not doing galaxy roles, just remove the raxisms, its something that could be done without glue surely.
16:23:54 <cloudnull> we presently dont use any "defaults" in any of our roles. instead we overload group vars. in order to make the new and old work together we're going to have to create defaults that are set to the old variables.
16:23:58 <odyssey4me> yeah, the only 'glue' would be the migration process for the inventory and variables
16:24:08 <cloudnull> using vars_files
16:24:26 <cloudnull> because of the way we do vars, that will affect everything
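[editor's note: in Ansible, group_vars take precedence over role defaults, which is what makes the shim cloudnull describes possible. A minimal sketch of such a bridge default is below; the variable names are purely illustrative, not taken from the actual patch.]

```yaml
# roles/example_service/defaults/main.yml
# Hypothetical shim, for illustration only: the new role-level default
# falls back to the old rpc_* style variable if a deployer still sets it.
example_service_bind_port: "{{ rpc_example_bind_port | default(8080) }}"
```

[Because group_vars still win over role defaults, existing deployments would keep their old values while new installs pick up the role default.]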
16:24:31 <andymccr> cloudnull: how does that impact the name changing/removal of rax stuff?
16:24:46 <cloudnull> rpc_* rackspace_*
16:24:50 <cloudnull> are vars that we use throughout
16:25:03 <cloudnull> additionally we created structure that shouldnt exist
16:25:46 <cloudnull> furthermore our current gate checks look at roles for linting which will break on the new roles
16:26:01 <andymccr> but we shouldn't change roles - we should just change the naming on vars etc
16:26:03 <cloudnull> because they use a defaults and meta main.yml
16:26:08 <andymccr> so that rax is gone.
16:26:09 <odyssey4me> sure, but that'll be handled by the de-rax patch
16:26:37 <andymccr> thats kind of my point in that we have 1 task "remove the RAX references" not "remove the RAX references and adjust the roles to do some other structuring at the same time because it seems more efficient"
16:26:41 <cloudnull> so with our current gating and setup we can not have the two live together .
16:27:49 <hughsaunders> cloudnull: why would the new roles not pass the lint check?
16:28:19 <cloudnull> the lint check should only check the plays.
16:28:27 <cloudnull> the plays pull in the roles.
16:28:47 <cloudnull> if you do a syntax check on a role it will fail because the defaults/main.yml is not a playbook
16:29:07 <cloudnull> same with meta/main.yml
16:29:32 <hughsaunders> fair, seems like odyssey4me's gate-script-split patch should help with that
16:29:36 <palendae> So the syntax check requires the roles to be galaxy-compliant?
16:29:50 <odyssey4me> ok - side topic of importance... before we can do anything we need everyone to test and be happy with https://review.openstack.org/152965 (gate script changes)... once that's merged then I can change openstack-infra's jobs... then we only need to change the scripts if we want to verify builds, etc
16:29:56 <palendae> hughsaunders: that one is getting big too
16:30:05 <hughsaunders> also fair
16:30:32 <odyssey4me> palendae it is done, from my standpoint - I did find a bug in it just before the meeting (specifically with regards to lint testing)
16:31:10 <odyssey4me> hughsaunders just needs to verify that it suits the tempest and multi-node needs
16:31:27 <cloudnull> palendae no it doesn't require it. but in our current stack the roles dont follow ansible best practices, which at this point allows the syntax check to pass, because there are no default variables.
16:31:39 <palendae> Ah
16:32:22 <odyssey4me> I think the basic request for now is simply to allow the de-rax and conversion-to-galaxy-roles to be separated
16:32:26 <b3rnard0> #info odyssey4me: ok - side topic of importance... before we can do anything we need everyone to test and be happy with https://review.openstack.org/152965 (gate script changes)... once that's merged then I can change openstack-infra's jobs... then we only need to change the scripts if we want to verify builds, etc
16:32:42 <andymccr> odyssey4me: its more than that. because there are still 4 other things that happen in the mega-patch
16:32:47 <andymccr> theres not much point splitting out 1 thing
16:32:57 <andymccr> and not the others, because its more a question of approach
16:33:11 <andymccr> the mega-patch itself could work fine, and from what ive seen in testing - it does
16:33:56 <cloudnull> ok, so lets break it up. thats the overwhelming consensus.
16:34:30 <b3rnard0> #agreed cloudnull: ok, so lets break it up. thats the overwhelming consensus.
16:34:50 <odyssey4me> I was kind-of thinking that perhaps we break the de-rax out. Get that done. Then we're able to look at the rest of the patch to see what's there. Next week perhaps then we identify what we feel should be added as a subsequent patch (they're new features)
16:35:58 <odyssey4me> it'll also give us all the chance to focus a little more on the patch itself as much of the gating work will be tapering off.
16:36:48 <cloudnull> ok. but we need to get this done before "kilo-3". we'll have failed if we let it go farther than that
16:37:12 <cloudnull> kilo-3 is march.
16:37:22 <b3rnard0> #link https://wiki.openstack.org/wiki/Kilo_Release_Schedule
16:37:24 <odyssey4me> mar 19 - a month from now
16:37:58 <odyssey4me> assuming that we can put time in the next sprint into this, then I would expect that we should be able to meet that deadline quite easily
16:37:58 <cloudnull> yes, seems like a long time but at our current rate of review / change its not that much time.
16:38:12 <hughsaunders> we can all work on it in SAT (if not done by then)
16:38:54 <cloudnull> hughsaunders: we should have it done before.
16:39:34 <cloudnull> we should be working on solving kilo issues not attempting to genericize.
16:39:35 <palendae> cloudnull: Hopefully the reviews will go faster if it's smaller patches. That's what people seem to be saying, anyway
16:39:41 <cloudnull> kk
16:39:43 <cloudnull> moving on
16:39:50 <b3rnard0> #topic Gating
16:40:07 <cloudnull> odyssey4me: patch is looking good .
16:40:16 <cloudnull> hughsaunders where are we on getting tempest to run ?
16:41:04 <cloudnull> using the commit_aio ?
16:41:37 <cloudnull> hughsaunders anything  ?
16:41:38 <hughsaunders> cloudnull: had a successful run today: https://review.openstack.org/#/c/154799/12 but not reliable yet
16:42:24 <cloudnull> what's "not reliable yet"?
16:42:39 <hughsaunders> the build after that failed (patch set 11)
16:44:00 <hughsaunders> latest run failed on installing cinder (unrelated) the one before that failed on restarting keystone-apache (port 5k in use)
16:44:38 <hughsaunders> so currently dealing with annoying failures that aren't actually related to tempest. The last run that got as far as tempest booted an instance, created a volume, attached the volume, but then timed out.
16:45:13 <cloudnull> thats likely related to ssh timeout.
16:45:33 <cloudnull> also i see that we're still hard coding the default networks ?
16:45:37 <hughsaunders> was a 504 gateway timeout
16:46:34 <hughsaunders> I moved the cidrs into group_vars/tempest_all so they can be overriden
16:47:04 <cloudnull> in tempest.conf.j2 i see "default_network = 192.168.74.0/24"
16:47:21 <hughsaunders> that will need to be fixed then
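[editor's note: one way the hard-coded value could be made overridable, following the group_vars/tempest_all approach hughsaunders mentions above; the variable name here is hypothetical, not from the actual review.]

```yaml
# group_vars/tempest_all -- variable name is illustrative only
tempest_default_network: 192.168.74.0/24
```

[tempest.conf.j2 would then read `default_network = {{ tempest_default_network }}`, letting deployers override the cidr per environment.]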
16:48:19 <cloudnull> ok, we'll have to look at these issues. Maybe we can port over the role for tempest out of the massive-patch(tm) which will further reduce the size of that patch.
16:48:36 <hughsaunders> cloudnull: did you change it much?
16:48:58 <b3rnard0> #info cloudnull: in tempest.conf.j2 i see "default_network = 192.168.74.0/24"; hughsaunders: that will need to be fixed then
16:49:03 <cloudnull> not much , but i did map out all of the config options to their appropriate sections.
16:49:28 <cloudnull> and like i mentioned earlier , the massive patch is currently passing gating using tempest.
16:49:38 <cloudnull> so i have it functional in that role.
16:49:53 <cloudnull> do we have anyone from the rax qe here?
16:50:26 <palendae> BjoernT? ^
16:51:01 <cloudnull> what else gating related do we have going on ?
16:51:28 <cloudnull> ok moving on
16:51:28 <BjoernT> I'm not officially QE but I kinda fall into the bugs as early adopter
16:51:36 <hughsaunders> work on converting the commit labs to clean up rather than rekick
16:52:08 <hughsaunders> QE are working on converting their nightly labs to rekick
16:52:14 <cloudnull> ok.
16:52:29 <cloudnull> BjoernT: do you have anything to add to gating?
16:52:41 <hughsaunders> Also there is still work to be done to use the same set of pre os-ansible-deployment plays across all the labs
16:53:05 <palendae> hughsaunders: That's kind of rax-specific no?
16:53:11 <BjoernT> I have not seen the full potential yet since tempest failed in my cloud environment so too early to say from my perspective
16:53:22 <cloudnull> kk
16:53:52 <BjoernT> is tempest supposed to work with 10.1.2 and 9.0.6 ?
16:54:35 <hughsaunders> tempest should run against both
16:54:45 <cloudnull> BjoernT: imo it should. but it's a work in progress.
16:54:59 <BjoernT> ok I will check out the error I got offline with someone
16:55:07 <cloudnull> So we're almost out of time but i wanted to point out that we have a new bp, if we can get some eyes on it that'd be good. "https://blueprints.launchpad.net/openstack-ansible/+spec/improved-network-generation"
16:55:44 <palendae> Yeah; I would like people's thoughts on that blueprint
16:55:45 <cloudnull> do we have anything else that we want to discus?
16:55:54 <b3rnard0> #info BjoernT: is tempest supposed to work with 10.1.2 and 9.0.6 ?;  hughsaunders: tempest should run against both; cloudnull: BjoernT: imo it should. but it's a work in progress.
16:56:05 <b3rnard0> #topic Open discussion
16:56:09 <hughsaunders> that bp looks good, may help reduce the amount of code in jenkins-rpc
16:56:42 <palendae> hughsaunders: Yep, and hopefully other places
16:56:42 <b3rnard0> #info cloudnull: So we're almost out of time but i wanted to point out that we have a new bp, if we can get some eyes on it that'd be good. "https://blueprints.launchpad.net/openstack-ansible/+spec/improved-network-generation"
16:56:55 <odyssey4me> yeah, it does look good
16:57:18 <odyssey4me> the debops roles also look like they may be usable for other infrastructure in the stack we deploy
16:57:36 <cloudnull> so cores, lets get eyes on it and prioritize it accordingly.
16:57:46 * odyssey4me nods
16:57:57 <cloudnull> ok so lets call it
16:58:25 <b3rnard0> #endmeeting