15:59:42 #startmeeting openstack_ansible_meeting
15:59:43 Meeting started Tue Jan 9 15:59:42 2018 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:46 The meeting name has been set to 'openstack_ansible_meeting'
15:59:48 #topic rollcall
15:59:55 waiting for a couple of minutes before starting
16:00:00 o/
16:00:33 o/
16:00:47 half-o/
16:01:15 wow d34dh0r53 is in the place! Glad to see you :)
16:01:26 #topic new year
16:01:32 Please allow me to start the first meeting of 2018 by wishing you all a happy new year.
16:01:33 :) good to be here
16:01:34 I hope you had nice winter holidays.
16:01:47 :)
16:01:51 Likewise
16:02:04 #topic this week's focus introduction
16:02:07 hi
16:02:11 A new topic/section on our weekly meeting!
16:02:16 A little backstory first.
16:02:20 We faced a few challenges at the beginning of this cycle.
16:02:41 While these challenges came mostly from external factors, I think we sometimes failed to work together on their resolution.
16:02:53 After analysis, I've noticed that it mostly came from different people working on different things, and then those different things would land together, making it harder to fix compared to a more coordinated approach.
16:03:07 After discussing with a few of you, I realized reducing the number of community meetings also reduced our communication abilities, which led to less coordination.
16:03:16 I think we should therefore have a "week's focus".
16:03:18 Merged openstack/openstack-ansible master: Stable keepalived priorities https://review.openstack.org/532135
16:03:20 This "focus" thing will allow us to be in sync with what we do together, and avoid misunderstandings.
16:03:34 At each community meeting (currently happening on the last Tuesday of the month), I'll propose the next month's focuses, and we'll adapt the planning together.
16:03:54 What's your opinion? Is there anyone opposed to this idea?
16:04:06 \o/ I'm late catching up
16:05:23 If we're having focuses we also need to allow for emergencies
16:05:54 yeah I think we of course need space for modifications of the planning
16:06:17 but they should be relatively minimal, and only about fixing something we should triage as critical
16:06:36 (which means blocking people because of broken gates)
16:07:06 Well a bug could be critical as well. Maybe a list of what constitutes critical
16:07:19 that's what I meant
16:08:02 modifications of the planning should be minimal, and should only happen in case of a critical bug appearing.
16:08:49 also I think this should be a FOCUS not a hindrance
16:09:05 people should still be able to do their usual work, but remember what the focus is
16:09:20 to avoid stepping on each other's toes
16:10:23 o/ not sure how this would play out - could you give an example of what a focus would be, and what that would mean in real terms?
16:11:08 Merged openstack/openstack-ansible master: Avoid retrieving IP from a missing bridge https://review.openstack.org/524738
16:11:38 odyssey4me: I thought of different focuses, depending on where we are in the cycle
16:11:49 handling deprecations would be one
16:12:43 so if during that week's focus we are cleaning up old things, we know the next week we could have things linked to the removal appearing.
16:13:13 Gaudenz Steinlin proposed openstack/openstack-ansible stable/pike: Stable keepalived priorities https://review.openstack.org/532230
16:13:17 for this week I thought the focus could be on fixing the upgrade from P-Q. let me explain that:
16:13:28 ok, that works for short term things - but what about long term things like implementing the nspawn usage or changing up the inventory or changing up how the software installs are done
16:13:51 This could still integrate.
16:14:13 A cycle is 6 months
16:14:40 if we plan the focus month by month, it would allow us to have a big picture and a more organized planning, together, of what happens in the cycle
16:15:04 well, I'd suggest that you give your suggestion a try - but bear in mind that some things are long term and won't fit well with that... what you're suggesting sounds like a good thing for people who are looking to help, but don't know where to start
16:15:05 We have a lot of challenges for the last part of the Queens cycle. I know we have many new features to introduce,
16:15:15 but I'd like to first fix the , in order to be able to upgrade from P-Q, whether they are roles or the AIO.
16:15:39 If we don't merge new things like nspawn this week, it would help with getting there
16:16:02 having a sort-of breakdown for the cycle where there is some sort of focus for a period for the general community to chip in is a great idea IMO
16:16:19 thanks for your support
16:16:37 that allows bugs to be arranged into the schedule too, which might be a nice way of smashing more of them
16:16:41 Like I said it should be a focus, not a hindrance. So people should still be able to do their work
16:17:03 Gaudenz Steinlin proposed openstack/openstack-ansible stable/ocata: Stable keepalived priorities https://review.openstack.org/532232
16:17:13 they should just remember that if there is a bug smash for example, introducing a large feature would be counterproductive
16:17:56 or the opposite: if there is a large feature that we expect to require a bug smash, we can have the "focus" done that way
16:18:12 any other opinions?
16:18:46 if not let's move to try it, see how it goes, and start the bug triage for today.
16:19:20 #topic this week's bugs.
16:19:47 we have some nasty bugs this week, and a whole series of bugs too.
16:20:01 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741990
16:20:01 Launchpad bug 1741990 in openstack-ansible "os_cinder Associating QoS types to volume types fails" [Undecided,New]
16:20:37 Is it normal for the RabbitMQ check to fail the first time you run setup-infrastructure but work the second time?
16:20:45 we might want to switch to using storyboard - not sure exactly how it works, but my impression is that it was designed with this work method in mind
16:21:26 odyssey4me: interesting. I will have a look at that.
16:22:04 awesome
16:22:08 odyssey4me: We've played with storyboard a little during Upstream Institute, it's definitely good for grouping like topics and smaller pieces of an issue. It does take some getting used to
16:22:27 the whole point of this previous topic was to get attention on the collaborative part. I think it would be helpful to avoid clashes.
16:23:01 Gaudenz Steinlin proposed openstack/openstack-ansible stable/newton: Stable keepalived priorities https://review.openstack.org/532233
16:23:02 yeah, it may help reduce the amount of unplanned work coming into play
16:23:21 to get back to the bug triage now, I think that this issue looks valid
16:23:42 what about marking it as confirmed and high?
16:25:06 ok I will continue down the bug triage road.
16:25:14 on the road again
16:25:23 haha
16:25:38 that's a good song, and a good radio broadcast.
16:25:51 cloudnull: do you happen to agree on the bug classification?
16:26:12 for keepalived ?
16:26:16 532233
16:26:25 for https://bugs.launchpad.net/openstack-ansible/+bug/1741990
16:26:26 Launchpad bug 1741990 in openstack-ansible "os_cinder Associating QoS types to volume types fails" [Undecided,New]
16:26:39 hahaha .
16:26:42 confirmed? high?
16:26:47 * cloudnull looking at the wrong link :)
16:26:50 yeah, looks valid
16:26:55 agreed
16:27:01 ok thanks.
16:27:11 ++
16:27:11 next one is fun
16:27:13 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741634
16:27:14 Launchpad bug 1741634 in openstack-ansible "virtualenv-tools is unreliable for changing path in venvs" [Critical,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:27:17 i might set that to high
16:27:33 cloudnull: we agree then :)
16:27:46 valid, high/confirmed
16:27:55 for https://bugs.launchpad.net/openstack-ansible/+bug/1741634 I think we should mark it as critical
16:28:09 actually, yeah - critical makes sense
16:28:11 because it will break gates and expectations
16:28:17 question
16:28:17 is it failing for us these days
16:28:28 or is that just due to the project being unmaintained?
16:28:40 the project is unmaintained
16:28:55 and won't accept a simple change to change the shebang
16:29:02 so we are kinda stuck
16:29:10 so that was my question
16:29:16 vendor it in, or use another thing
16:29:17 can we just fork and fix
16:29:25 and then look for an alternative later?
16:29:31 I've checked what it would take to do an alternative
16:29:34 that's very simple
16:29:41 * hwoarang was dragged to some internal talks and is catching up now
16:29:43 https://review.openstack.org/#/c/531731/
16:30:15 was just wondering if you agreed on the approach, because we're gonna have to backport that back down.
16:30:17 very low.
16:30:33 I guess we can discuss in the review now
16:30:39 let's move to the next bug
16:30:54 I'd probably fork and fix the unmaintained project
16:31:03 then spec out an actual replacement
16:31:11 ok that's really possible too.
16:31:18 cloudnull could, but then how do we publish the updated version to pypi?
16:31:19 It would be quite easy and less risky
16:31:34 someone could take the responsibility for it I guess
16:31:52 odyssey4me: yes. if the maintainer is not wanting to deal with it we could take it over on pypi if they're willing
16:31:59 odyssey4me: the fork approach is less risky but logistically more complex
16:32:09 sed is simple enough imho
16:32:36 the tool doesn't seem to do much else IIRC
16:32:39 yeah I think sed _should_ cover us enough.
16:32:57 but that's the thing, it's only a _should_, according to what I have seen.
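(Aside: a minimal sketch of the sed-style approach discussed above, purely illustrative and not the content of review 531731. The variable names venv_old_path and venv_new_path, and the use of Ansible's find/replace modules here, are assumptions for this example.)

    # Illustrative tasks only -- rewrite the build-time venv path embedded in
    # venv scripts, which is roughly what a sed one-liner would do.
    - name: Find scripts in the venv that may embed the build-time path
      find:
        paths: "{{ venv_new_path }}/bin"    # hypothetical variable
        file_type: file
      register: _venv_bin_files

    - name: Rewrite embedded paths to the deployed venv location
      replace:
        path: "{{ item.path }}"
        regexp: "{{ venv_old_path }}"       # hypothetical variable
        replace: "{{ venv_new_path }}"
      with_items: "{{ _venv_bin_files.files }}"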
16:33:10 if it works +1 16:33:16 ok 16:33:24 let's discuss next bug then 16:33:29 it blocks this patch :p 16:33:31 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741471 16:33:32 Launchpad bug 1741471 in openstack-ansible "keystone upgrade to Queens failure due to new handler" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard) 16:33:47 this one is fun too, and is not our fault. 16:35:24 Right now I don't know how to fix it, and I will dig deeper (except if someone knows how to fix this) 16:35:57 maybe logan- can help with that as he worked on the LB thingy 16:36:01 o/ 16:36:05 the alternative would be to install ansible==2.3 just before running the os_previous_* role, and then do the upgrade after an update of ansible. 16:36:22 issue there is upgrades right? 16:36:25 odyssey4me: it's a pure ansible failure 16:36:37 but yeah :) 16:36:37 ah yes, this is actually an issue with the way our role tests upgrades 16:36:39 logan-: yes 16:36:49 logan-: odyssey4me have a look at my last comment 16:36:52 we use the newer ansible to test the older role code 16:36:57 yes 16:37:11 https://github.com/openstack/openstack-ansible-os_keystone/blob/a48a73089286a370312a35fc7df54a6a3a513fa2/handlers/main.yml#L107-L109 16:37:14 we could technically change the older role by adding {{ role_dir }} iirc 16:37:15 could we just drop that in the previous branch 16:37:19 so yes, we should either change how the upgrade test runs in the roles - or work out a workaround 16:37:35 logan-: that sounds a bad idea 16:37:40 logan- dropping it in the older branch creates turtles 16:37:54 we could define that handler in the playbook without needing that 16:38:02 drop into pike, breaks the pike upgrade test - drop into ocata doesn't support the listen thing in the handler 16:38:03 but that's just the start of all our failures 16:38:14 we are NOT testing pike in reality. 16:38:18 gotcha odyssey4me 16:38:40 odyssey4me: in this case that would stop at Pike, but yes, it could technically. 16:38:55 (ansible 2.3 and 2.2 having the same behavior) 16:39:08 best longer term strategy is to make the upgrade test use the right ansible for the initial deploy... but then it also needs to implement the older roles for everything... so the whole upgrade jobs needs to change quite dramatically 16:39:11 my concern is that we aren't testing pike to queens 16:39:15 Gaudenz Steinlin proposed openstack/openstack-ansible stable/pike: Avoid retrieving IP from a missing bridge https://review.openstack.org/532243 16:39:26 odyssey4me: yeah that's my concern. 16:39:34 Gaudenz Steinlin proposed openstack/openstack-ansible stable/ocata: Avoid retrieving IP from a missing bridge https://review.openstack.org/532244 16:39:37 that's too big of a change. 16:39:47 it might be better to just make the role tests use the integrated build, but with a more limited inventory 16:39:49 Gaudenz Steinlin proposed openstack/openstack-ansible stable/newton: Avoid retrieving IP from a missing bridge https://review.openstack.org/532245 16:40:01 again, that's a significant body of work 16:40:05 I thought the change of ansible version just before and after sounded better, but I don't think it's a solution, merely a workaround 16:40:16 odyssey4me: I agree. 16:40:31 which could be a focus in the future ! :D 16:40:42 (I am trying to sell this one very hard!) 
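(Aside, to illustrate the "listen" handler point raised above: the playbook below is a minimal, hypothetical example of a handler subscribed via listen, a keyword only available from Ansible 2.2 onwards, so older Ansible running the same role code will not resolve it. This is not the actual os_keystone handler; the names and the use of debug are made up for illustration.)

    # Hypothetical demo: run with ansible-playbook against localhost.
    - hosts: localhost
      gather_facts: false
      tasks:
        - name: Simulate a config change
          command: /bin/true
          changed_when: true
          notify: Restart keystone services   # notifies a topic, not a handler name
      handlers:
        - name: Restart hypothetical keystone service
          debug:
            msg: "a restart would happen here"
          listen: Restart keystone services   # every handler listening on this topic runs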
16:40:52 anyway
16:41:01 currently it deploys the current branch for all things except the role being tested, it then deploys the previous branch role, then deploys the current branch role
16:41:04 I think we are safe to mark this as confirmed.
16:41:17 odyssey4me: yes
16:41:28 and under ansible 2.4 it doesn't do previous for all tasks
16:41:31 so it's not as simple as just switching the ansible version, because that may not work with the new roles for all the infra
16:41:41 it does for the main.yml includes, but not their includes.
16:42:05 no I meant just before the os_previous_role
16:42:22 the rest being still the current version
16:42:25 oh bother - I've seen a pretty nasty pattern to do the different types of includes based on ansible version, but it's almost unreadable
16:42:40 no that's not what I meant
16:42:52 we do have many playbooks
16:43:26 the last ones are doing the previous branch role execution, then the current branch role execution and the functional testing
16:44:03 I'd insert an ansible downgrade based on tests/common/previous before the previous branch execution, and an ansible upgrade based on tests/common
16:44:07 but that's all a mess IMO
16:44:14 if we can finalise getting the integrated build to use a static inventory then we can switch the roles to using it instead of something else
16:44:47 hmm, that could work - messy as you say, but workable
16:44:57 another option could just be to get rid of tox and to use two venvs
16:45:06 one for current, one for previous
16:45:11 that sounds nice
16:45:21 instead of using tox to build the venv, do it ourselves
16:45:53 so we'd have to run x shell tasks, one to do the infra, one to do the old role with the previous venv, the rest?
16:46:01 and one with the rest*
16:46:26 that sounds cleaner in the meantime, while we reform stuff.
16:46:37 well, tox is just running scripts anyway - so we just switch to zuul executing the scripts and make the scripts facilitate everything
16:47:08 I did hope to get to doing that this cycle, but time has not been kind.
16:47:26 that is interesting. But just to make sure we are in sync, we'll not change the stable/pike code to make the code work under ansible 2.3 and ansible 2.4
16:47:35 because that was also an option
16:47:50 (not the handlers part, but at least adapting the includes)
16:47:53 I'd rather avoid changing the previous branch to cater for the current dev head.
16:48:09 The only reason to port stuff back is to keep the test implementations more maintainable.
16:48:19 It makes sense to me.
16:48:36 But that would be the test implementations only, not the main body of role code.
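(Aside, on the "two venvs instead of tox" idea above: a rough sketch of bootstrapping one venv per Ansible version. The paths and version pins are hypothetical, and in practice this would more likely live in the existing test scripts than in a playbook.)

    # Hypothetical bootstrap: one venv pinned to the previous branch's Ansible,
    # one to the current branch's (requires virtualenv on the host).
    - hosts: localhost
      gather_facts: false
      tasks:
        - name: Build a venv with the previous branch's Ansible
          pip:
            name: "ansible==2.3.2.0"          # placeholder pin
            virtualenv: /opt/test-ansible-previous
        - name: Build a venv with the current branch's Ansible
          pip:
            name: "ansible==2.4.2.0"          # placeholder pin
            virtualenv: /opt/test-ansible-current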
16:49:19 yeah here it would change the body of the code, changing the include from include: includefile.yml to include: "{{ roledir }}/includefile.yml"
16:50:02 I saw there is another config thing that could alter that loading, so I will double-check if that's not something we could do
16:51:44 ok let's mark this as confirmed and critical because it blocks gates, but we'll discuss implementation details later
16:52:00 sure
16:52:04 next
16:52:06 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741462
16:52:07 Launchpad bug 1741462 in openstack-ansible "P-Q upgrade fails due to change in inventory" [Undecided,New]
16:52:36 another one I'd consider critical
16:52:44 confirmed, crit
16:52:45 not possible to upgrade right now
16:53:00 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741247
16:53:00 Launchpad bug 1741247 in openstack-ansible "Upgrade from Ocata fails with timeout" [Undecided,New]
16:53:04 it's breaking upgrades, and we're nearing the end of the cycle
16:53:48 our timeout is already too long - infra doesn't like longer than 3 hrs and we have 4 IIRC
16:54:03 that's bad news.
16:54:15 that said, it's old code - not much we can do
16:54:32 any idea of what we can do for that then?
16:54:38 Manual testing and removing the periodics?
16:54:42 that doesn't sound better.
16:54:46 only option might be to use 100% serial values for the initial deploy to try and cut time - or to cut the number of containers down like we have for master
16:55:21 master has seen lots of performance improvements
16:55:37 maybe the 100% serial would be a good temp fix
16:55:48 let's mark this as confirmed and critical too?
16:55:57 yup
16:56:19 next
16:56:22 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741235
16:56:23 Launchpad bug 1741235 in openstack-ansible "Shallow cloning doesn't work during repo-build" [Undecided,New]
16:56:38 I'd say this is a wishlist
16:57:39 yes
16:57:45 we can consider this as opinion too
16:57:57 it doesn't matter too much - and is part of the longer term plan for the python deploy bits anyway
16:58:01 opinion -> "doesn't fit the project plan"
16:58:17 ok let's mark it as wishlist then
16:58:33 last one for today
16:58:35 #link https://bugs.launchpad.net/openstack-ansible/+bug/1741225
16:58:36 Launchpad bug 1741225 in openstack-ansible "Check for NOPASSWD in sudo configuration can't be disabled" [Undecided,New]
16:58:56 reality is that infra is planning to publish all services to pypi some time in the future so we won't *have* to use SHAs - we'll be able to just use pypi with appropriate pins
16:59:35 odyssey4me: that is good when we think about it.
16:59:47 for the last bug, mhayden are you there?
16:59:51 looks like mhayden has confirmed
16:59:57 maybe low/med
17:00:04 ok let's mark this as confirmed and med
17:00:13 thanks everyone!
17:00:20 next week's gonna be tough too :(
17:00:29 we have many big ones remaining in the list
17:00:36 anyway, thanks for your time all!
17:00:41 #endmeeting