#openstack-ansible log

15:59:42 <evrardjp> #startmeeting openstack_ansible_meeting
15:59:43 <openstack> Meeting started Tue Jan  9 15:59:42 2018 UTC and is due to finish in 60 minutes.  The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:46 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
15:59:48 <evrardjp> #topic rollcall
15:59:55 <evrardjp> waiting for a couple of minutes before starting
16:00:00 <cloudnull> o/
16:00:33 <d34dh0r53> o/
16:00:47 <mgariepy> half-o/
16:01:15 <evrardjp> wow d34dh0r53 is in the place! Glad to see you :)
16:01:26 <evrardjp> #topic new year
16:01:32 <evrardjp> Please allow me to start the first meeting of 2018 by wishing you all a happy new year.
16:01:33 <d34dh0r53> :) good to be here
16:01:34 <evrardjp> I hope you had nice winter holidays.
16:01:47 <evrardjp> :)
16:01:51 <d34dh0r53> Likewise
16:02:04 <evrardjp> #topic this week's focus introduction
16:02:07 <prometheanfire> hi
16:02:11 <evrardjp> A new topic/section on our weekly meeting!
16:02:16 <evrardjp> A little backstory first.
16:02:20 <evrardjp> We faced a few challenges at the beginning of this cycle.
16:02:41 <evrardjp> While these challenges came mostly from external factors, I think we sometimes failed to work together in their resolution.
16:02:53 <evrardjp> After analysis, I've noticed that it mostly came from different people working on different things, and then those different things would land together, making it harder to fix compared to a more coordinated approach.
16:03:07 <evrardjp> After discussing with a few of you, I realized reducing the number of community meetings also reduced our communication abilities, which led to less coordination.
16:03:16 <evrardjp> I think we should therefore have a "week's focus".
16:03:18 <openstackgerrit> Merged openstack/openstack-ansible master: Stable keepalived priorities  https://review.openstack.org/532135
16:03:20 <evrardjp> This "focus" thing will allow us to be in sync with what we do together, and avoid misunderstandings.
16:03:34 <evrardjp> At each community meeting (currently happening on the last Tuesday of the month), I'll propose next month's focuses, and we'll adapt the planning together.
16:03:54 <evrardjp> What's your opinion? Is there anyone opposed to this idea?
16:04:06 <spotz> \o/ I'm late catching up
16:05:23 <spotz> If we're having focuses we also need to allow for emergencies
16:05:54 <evrardjp> yeah I think we need ofc space for modifications of the planning
16:06:17 <evrardjp> but they should be relatively minimal, and only about fixing something we should triage as critical
16:06:36 <evrardjp> (which means blocking people because of broken gates)
16:07:06 <spotz> Well a bug could be critical as well. Maybe a list of what constitutes critical
16:07:19 <evrardjp> that's what I meant
16:08:02 <evrardjp> modifications of the planning should be minimal, the modification should only happen in case of a critical bug appearing.
16:08:49 <evrardjp> also I think this should be a FOCUS not a hindrance
16:09:05 <evrardjp> people should still be able to do their usual work, but remember what the focus is
16:09:20 <evrardjp> to avoid stepping on each other's toes
16:10:23 <odyssey4me> o/ not sure how this would play out - could you give an example of what a focus would be, and what that would mean in real terms?
16:11:08 <openstackgerrit> Merged openstack/openstack-ansible master: Avoid retrieving IP from a missing bridge  https://review.openstack.org/524738
16:11:38 <evrardjp> odyssey4me: I thought of different focuses, depending on where we are in the cycle
16:11:49 <evrardjp> handling deprecations would be one
16:12:43 <evrardjp> so if during that week's focus we are cleaning the old things, we know the next week we could have things linked to the removal appearing.
16:13:13 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/pike: Stable keepalived priorities  https://review.openstack.org/532230
16:13:17 <evrardjp> for this week I thought the focus could be on fixing the upgrade from P-Q. let me explain that:
16:13:28 <odyssey4me> ok, that works for short term things - but what about long term things like implementing the nspawn usage or changing up the inventory or changing up how the software installs are done
16:13:51 <evrardjp> This could still integrate.
16:14:13 <evrardjp> A cycle is 6 months
16:14:40 <evrardjp> if we plan the focus month per month, it would allow us to have a big picture and a more organized planning, together, of what happens in the cycle
16:15:04 <odyssey4me> well, I'd suggest that you give your suggestion a try - but bear in mind that some things are long term and won't fit well with that... what you're suggesting sounds like a good thing for people who are looking to help, but don't know where to start
16:15:05 <evrardjp> We have a lot of challenges for the last part of the Queens cycle. I know we have many new features to introduce,
16:15:15 <evrardjp> but I'd like to first fix the , in order to be able to upgrade from P-Q, whether they are roles or the AIO.
16:15:39 <evrardjp> If we don't merge new things like nspawn this week, it would help on getting there
16:16:02 <odyssey4me> having a sort-of breakdown for the cycle where there is some sort of focus for a period for the general community to chip in is a great idea IMO
16:16:19 <evrardjp> thanks for your support
16:16:37 <odyssey4me> that allows bugs to be arranged into the schedule too, which might be a nice way of smashing more of them
16:16:41 <evrardjp> Like I said it should be a focus, not a hindrance. So people should still be able to do their work
16:17:03 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/ocata: Stable keepalived priorities  https://review.openstack.org/532232
16:17:13 <evrardjp> they should just remember that if there is a bug smash for example, introducing a large feature would be counterproductive
16:17:56 <evrardjp> or, conversely, if there is a large feature that we expect to require a bug smash, we can have the "focus" done that way
16:18:12 <evrardjp> any other opinions?
16:18:46 <evrardjp> if not let's move to try it, and see how it goes, start the bug triage for today.
16:19:20 <evrardjp> #topic this week's bugs.
16:19:47 <evrardjp> we have some nasty bugs this week, and a whole series of bugs too.
16:20:01 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741990
16:20:01 <openstack> Launchpad bug 1741990 in openstack-ansible "os_cinder Associating QoS types to volume types fails" [Undecided,New]
16:20:37 <RandomTech> Is it normal for the RabbitMQ check to fail the first time you run setup-infrastructure but work the second time?
16:20:45 <odyssey4me> we might want to switch to using storyboard - not sure exactly how it works, but my impression is that it was designed with this work method in mind
16:21:26 <evrardjp> odyssey4me: interesting. I will have a look at that.
16:22:04 <odyssey4me> awesome
16:22:08 <spotz> odyssey4me: We've played with storyboard a little during Upstream Institute, it's definitely good for grouping like topics and smaller pieces of an issue. It does take some getting used to
16:22:27 <evrardjp> the whole point of this previous topic was to get attention on the collaborative part. I think it would be helpful to avoid clashes.
16:23:01 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/newton: Stable keepalived priorities  https://review.openstack.org/532233
16:23:02 <odyssey4me> yeah, it may help reduce the amount of unplanned work coming into play
16:23:21 <evrardjp> to get back to the bug triage now, I think that this issue looks valid
16:23:42 <evrardjp> what about marking it as confirmed and high?
16:25:06 <evrardjp> ok I will continue the bug triage road.
16:25:14 <cloudnull> on the road again
16:25:23 <evrardjp> haha
16:25:38 <evrardjp> that's a good song, and a good radio broadcast.
16:25:51 <evrardjp> cloudnull: do you happen to agree on the bug classification?
16:26:12 <cloudnull> for keepalived ?
16:26:16 <cloudnull> 532233
16:26:25 <evrardjp> for https://bugs.launchpad.net/openstack-ansible/+bug/1741990
16:26:26 <openstack> Launchpad bug 1741990 in openstack-ansible "os_cinder Associating QoS types to volume types fails" [Undecided,New]
16:26:39 <cloudnull> hahaha .
16:26:42 <evrardjp> confirmed? high?
16:26:47 * cloudnull looking at the wrong link :)
16:26:50 <odyssey4me> yeah, looks valid
16:26:55 <odyssey4me> agreed
16:27:01 <evrardjp> ok thanks.
16:27:11 <cloudnull> ++
16:27:11 <evrardjp> next one is fun
16:27:13 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741634
16:27:14 <openstack> Launchpad bug 1741634 in openstack-ansible "virtualenv-tools is unreliable for changing path in venvs" [Critical,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:27:17 <cloudnull> i might set that to high
16:27:33 <evrardjp> cloudnull: we agree then :)
16:27:46 <odyssey4me> valid, high/confirmed
16:27:55 <evrardjp> for https://bugs.launchpad.net/openstack-ansible/+bug/1741634 I think we should mark it as critical
16:28:09 <odyssey4me> actually, yeah - critical makes sense
16:28:11 <evrardjp> because it will break gates and expectations
16:28:17 <evrardjp> question
16:28:17 <cloudnull> is it failing for us these days
16:28:28 <cloudnull> or is that just due to the project being unmaintained?
16:28:40 <evrardjp> the project is unmaintained
16:28:55 <evrardjp> and won't accept a simple change to change the shebang
16:29:02 <evrardjp> so we are kinda stuck
16:29:10 <evrardjp> so that was my question
16:29:16 <evrardjp> vendor in, or use another thing
16:29:17 <cloudnull> can we just fork and fix
16:29:25 <cloudnull> and then look for an alternative later?
16:29:31 <evrardjp> I've checked what it would need to do for an alternative
16:29:34 <evrardjp> that's very simple
16:29:41 * hwoarang was dragged to some internal talks and is catching up now
16:29:43 <evrardjp> https://review.openstack.org/#/c/531731/
16:30:15 <evrardjp> was just wondering if you agreed on the approach, because we're gonna have to backport that back down.
16:30:17 <evrardjp> very low.
16:30:33 <evrardjp> I guess we can discuss in the review now
16:30:39 <evrardjp> let's move to next bug
16:30:54 <cloudnull> I'd probably fork and fix the unmaintained project
16:31:03 <cloudnull> then spec out an actual replacement
16:31:11 <evrardjp> ok that's really possible too.
16:31:18 <odyssey4me> cloudnull could, but then how do we publish the updated version to pypi?
16:31:19 <evrardjp> It would be quite easy and less risky
16:31:34 <evrardjp> someone could take the responsibility for it I guess
16:31:52 <cloudnull> odyssey4me: yes. if the maintainer is not wanting to deal with it we could take it over on pypi if they're willing
16:31:59 <evrardjp> odyssey4me: the fork approach is less risky but logistically more complex
16:32:09 <hwoarang> sed is simple enough imho
16:32:36 <odyssey4me> the tool doesn't seem to do much else IIRC
16:32:39 <evrardjp> yeah I think sed _should_ cover us enough.
16:32:57 <evrardjp> but that's the thing, it's only a _should_ , according to what I have seen.
16:33:10 <cloudnull> if it works +1
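A minimal Python sketch of the sed idea discussed above, assuming the only thing that needs fixing when a venv is moved is the interpreter path on each script's shebang line. The function name `rewrite_shebangs` and the example paths are hypothetical; this is not the content of the linked review.

```python
from pathlib import Path


def rewrite_shebangs(bin_dir, old_prefix, new_prefix):
    """Point script shebangs in bin_dir from old_prefix to new_prefix.

    Only files whose first bytes are '#!<old_prefix>/' are rewritten.
    Returns the sorted names of the files that were changed.
    """
    # Keep a trailing slash so '/old/venv' does not match '/old/venv2'.
    old = b"#!" + old_prefix.rstrip("/").encode() + b"/"
    new = b"#!" + new_prefix.rstrip("/").encode() + b"/"
    changed = []
    for path in sorted(Path(bin_dir).iterdir()):
        # Skip directories and symlinks (e.g. bin/python itself).
        if path.is_symlink() or not path.is_file():
            continue
        data = path.read_bytes()
        if not data.startswith(old):
            continue
        # Replace only the leading shebang prefix, leave the rest untouched.
        path.write_bytes(new + data[len(old):])
        changed.append(path.name)
    return changed
```

Calling `rewrite_shebangs("/new/venv/bin", "/old/venv", "/new/venv")` would rewrite a script starting with `#!/old/venv/bin/python` to start with `#!/new/venv/bin/python`. It only touches shebang lines; other files that embed the venv path, such as the activate scripts, are not handled, which is one reason this remains a "should".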
16:33:16 <evrardjp> ok
16:33:24 <evrardjp> let's discuss next bug then
16:33:29 <evrardjp> it blocks this patch :p
16:33:31 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741471
16:33:32 <openstack> Launchpad bug 1741471 in openstack-ansible "keystone upgrade to Queens failure due to new handler" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:33:47 <evrardjp> this one is fun too, and is not our fault.
16:35:24 <evrardjp> Right now I don't know how to fix it, and I will dig deeper (except if someone knows how to fix this)
16:35:57 <odyssey4me> maybe logan- can help with that as he worked on the LB thingy
16:36:01 <logan-> o/
16:36:05 <evrardjp> the alternative would be to install ansible==2.3 just before running the os_previous_* role, and then do the upgrade after an update of ansible.
16:36:22 <logan-> issue there is upgrades right?
16:36:25 <evrardjp> odyssey4me: it's a pure ansible failure
16:36:37 <evrardjp> but yeah :)
16:36:37 <odyssey4me> ah yes, this is actually an issue with the way our role tests upgrades
16:36:39 <evrardjp> logan-: yes
16:36:49 <evrardjp> logan-: odyssey4me have a look at my last comment
16:36:52 <odyssey4me> we use the newer ansible to test the older role code
16:36:57 <evrardjp> yes
16:37:11 <logan-> https://github.com/openstack/openstack-ansible-os_keystone/blob/a48a73089286a370312a35fc7df54a6a3a513fa2/handlers/main.yml#L107-L109
16:37:14 <evrardjp> we could technically change the older role by adding {{ role_dir }} iirc
16:37:15 <logan-> could we just drop that in the previous branch
16:37:19 <odyssey4me> so yes, we should either change how the upgrade test runs in the roles - or work out a workaround
16:37:35 <evrardjp> logan-: that sounds like a bad idea
16:37:40 <odyssey4me> logan- dropping it in the older branch creates turtles
16:37:54 <evrardjp> we could define that handler in the playbook without needing that
16:38:02 <odyssey4me> drop into pike, breaks the pike upgrade test - drop into ocata doesn't support the listen thing in the handler
16:38:03 <evrardjp> but that's just the start of all our failures
16:38:14 <evrardjp> we are NOT testing pike in reality.
16:38:18 <logan-> gotcha odyssey4me
16:38:40 <evrardjp> odyssey4me: in this case that would stop at Pike, but yes, it could technically.
16:38:55 <evrardjp> (ansible 2.3 and 2.2 having the same behavior)
16:39:08 <odyssey4me> best longer term strategy is to make the upgrade test use the right ansible for the initial deploy... but then it also needs to implement the older roles for everything... so the whole upgrade job needs to change quite dramatically
16:39:11 <evrardjp> my concern is that we aren't testing pike to queens
16:39:15 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/pike: Avoid retrieving IP from a missing bridge  https://review.openstack.org/532243
16:39:26 <evrardjp> odyssey4me: yeah that's my concern.
16:39:34 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/ocata: Avoid retrieving IP from a missing bridge  https://review.openstack.org/532244
16:39:37 <evrardjp> that's too big of a change.
16:39:47 <odyssey4me> it might be better to just make the role tests use the integrated build, but with a more limited inventory
16:39:49 <openstackgerrit> Gaudenz Steinlin proposed openstack/openstack-ansible stable/newton: Avoid retrieving IP from a missing bridge  https://review.openstack.org/532245
16:40:01 <odyssey4me> again, that's a significant body of work
16:40:05 <evrardjp> I thought the change of ansible version just before and after sounded better, but I don't think it's a solution, merely a workaround
16:40:16 <evrardjp> odyssey4me: I agree.
16:40:31 <evrardjp> which could be a focus in the future ! :D
16:40:42 <evrardjp> (I am trying to sell this one very hard!)
16:40:52 <evrardjp> anyway
16:41:01 <odyssey4me> currently it deploys the current branch for all things except the role being tested, it then deploys the previous branch role, then deploys the current branch role
16:41:04 <evrardjp> I think we are safe to mark this as confirmed.
16:41:17 <evrardjp> odyssey4me: yes
16:41:28 <evrardjp> and under ansible 2.4 it doesn't do previous for all tasks
16:41:31 <odyssey4me> so it's not as simple as just switching the ansible version, because that may not work with the new roles for all the infra
16:41:41 <evrardjp> it does for the main.yml includes, but not their includes.
16:42:05 <evrardjp> no I meant just before the os_previous_role
16:42:22 <evrardjp> the rest being still the current version
16:42:25 <odyssey4me> oh bother - I've seen a pretty nasty pattern to do the different types of includes based on ansible version, but it's almost unreadable
16:42:40 <evrardjp> no that's not what I meant
16:42:52 <evrardjp> we do have many playbooks
16:43:26 <evrardjp> the last ones are doing the previous branch role execution, then the current branch role execution and the functional testing
16:44:03 <evrardjp> I'd insert an ansible downgrade based on tests/common/previous before the previous branch execution, and an ansible upgrade based on tests/common
16:44:07 <evrardjp> but that's all a mess IMO
16:44:14 <odyssey4me> if we can finalise getting the integrated build to using a static inventory then we can switch the roles to using it instead of something else
16:44:47 <odyssey4me> hmm, that could work - messy as you say, but workable
16:44:57 <odyssey4me> another option could just be to be rid of tox and to use two venvs
16:45:06 <odyssey4me> one for current, one for previous
16:45:11 <evrardjp> that sounds nice
16:45:21 <odyssey4me> instead of using tox to build the venv, do it ourselves
16:45:53 <evrardjp> so we'd have to run x shell tasks, one to do the infra, one to do the old role with previous venv, the rest?
16:46:01 <evrardjp> and one with the rest*
16:46:26 <evrardjp> that sounds cleaner in the meantime, while we reform stuff.
16:46:37 <odyssey4me> well, tox is just running scripts anyway - so we just switch to zuul executing the scripts and make the scripts facilitate everything
16:47:08 <odyssey4me> I did hope to get to doing that this cycle, but time has not been kind.
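The two-venv idea above could be sketched roughly as follows, assuming one venv holds the current branch's ansible and another holds the previous branch's. The phase names, the `current`/`previous` directory names, and the `plan_commands` helper are all hypothetical and only illustrate which interpreter each step would use; nothing here reflects the actual test scripts.

```python
import os

# Hypothetical phases of a role upgrade test, paired with the venv
# (current or previous ansible) that each phase would run under.
PHASE_VENV = [
    ("infra", "current"),
    ("previous_role", "previous"),
    ("current_role", "current"),
    ("functional_tests", "current"),
]


def plan_commands(venv_root, playbooks):
    """Return one ansible-playbook command line per phase, in order.

    venv_root: directory containing the 'current' and 'previous' venvs.
    playbooks: mapping of phase name -> playbook path.
    """
    commands = []
    for phase, venv in PHASE_VENV:
        # Each venv ships its own ansible-playbook entry point in bin/.
        ansible = os.path.join(venv_root, venv, "bin", "ansible-playbook")
        commands.append([ansible, playbooks[phase]])
    return commands
```

Each returned list could then be handed to something like `subprocess.run`, so that only the previous-branch role runs under the older ansible while the rest stays on the current one.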
16:47:26 <evrardjp> that is interesting. But just to make sure we are in sync, we'll not change the stable/pike code to make the code work under ansible 2.3 and ansible 2.4
16:47:35 <evrardjp> because that was also an option
16:47:50 <evrardjp> (not the handlers part, but at least adapting the includes)
16:47:53 <odyssey4me> I'd rather avoid changing the previous branch to cater for the current dev head.
16:48:09 <odyssey4me> The only reason to port stuff back is to make it easier to keep the test implementations more maintainable.
16:48:19 <evrardjp> It makes sense to me.
16:48:36 <odyssey4me> But that would be the test implementations only, not the main body of role code.
16:49:19 <evrardjp> yeah here it would change the body of the code to change the include from include: includefile.yml to include: "{{ roledir }}/includefile.yml"
16:50:02 <evrardjp> I saw there is another config thing that could alter that loading, so i will double check if that's not something we could do
16:51:44 <evrardjp> ok let's mark this as confirmed and critical because it blocks gates, but we'll discuss implementation details later
16:52:00 <odyssey4me> sure
16:52:04 <evrardjp> next
16:52:06 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741462
16:52:07 <openstack> Launchpad bug 1741462 in openstack-ansible "P-Q upgrade fails due to change in inventory" [Undecided,New]
16:52:36 <evrardjp> another one I'd consider critical
16:52:44 <odyssey4me> confirmed, crit
16:52:45 <evrardjp> not possible to upgrade right now
16:53:00 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741247
16:53:00 <openstack> Launchpad bug 1741247 in openstack-ansible "Upgrade from Ocata fails with timeout" [Undecided,New]
16:53:04 <odyssey4me> it's breaking upgrades, and we're nearing the end of the cycle
16:53:48 <odyssey4me> our timeout is already too long - infra doesn't like longer than 3 hrs and we have 4 IIRC
16:54:03 <evrardjp> that's bad news.
16:54:15 <odyssey4me> that said, it's old code - not much we can do
16:54:32 <evrardjp> any idea of what we can do for that then?
16:54:38 <evrardjp> Manual testing and removing the periodics?
16:54:42 <evrardjp> that doesn't sound better.
16:54:46 <odyssey4me> only option might be to use 100% serial values for the initial deploy to try and cut time - or to cut the number of containers down like we have for master
16:55:21 <evrardjp> master has seen lots of performance improvements
16:55:37 <evrardjp> maybe the 100% serial would be a good temp fix
16:55:48 <evrardjp> let's mark this as confirmed and critical too?
16:55:57 <odyssey4me> yup
16:56:19 <evrardjp> next
16:56:22 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741235
16:56:23 <openstack> Launchpad bug 1741235 in openstack-ansible "Shallow cloning doesn't work during repo-build" [Undecided,New]
16:56:38 <evrardjp> I'd say this is a wishlist
16:57:39 <odyssey4me> yes
16:57:45 <evrardjp> we can consider this as opinion too
16:57:57 <odyssey4me> it doesn't matter too much - and is part of the longer term plan for the python deploy bits anyway
16:58:01 <evrardjp> opinion -> "doesn't fit the project plan"
16:58:17 <evrardjp> ok let's mark it as wishlist then
16:58:33 <evrardjp> last for today
16:58:35 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1741225
16:58:36 <openstack> Launchpad bug 1741225 in openstack-ansible "Check for NOPASSWD in sudo configuration can't be disabled" [Undecided,New]
16:58:56 <odyssey4me> reality is that infra is planning to publish all services to pypi some time in the future so we won't *have* to use SHA's - we'll be able to just use pypi with appropriate pins
16:59:35 <evrardjp> odyssey4me: that is good when we think about it.
16:59:47 <evrardjp> for last bug, mhayden are you there?
16:59:51 <odyssey4me> looks like mhayden has confirmed
16:59:57 <odyssey4me> maybe low/med
17:00:04 <evrardjp> ok let's mark this as confirmed and med
17:00:13 <evrardjp> thanks everyone!
17:00:20 <evrardjp> next week's gonna be tough too :(
17:00:29 <evrardjp> we have many big ones remaining in the list
17:00:36 <evrardjp> anyway, thanks for your time all!
17:00:41 <evrardjp> #endmeeting