16:00:27 <evrardjp> #startmeeting openstack_ansible_meeting
16:00:28 <openstack> Meeting started Tue Dec 12 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:32 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:23 <evrardjp> #topic bugs with new status
16:01:40 <evrardjp> is there anyone ready for bug triage?
16:01:57 <hwoarang> im here
16:01:58 <odyssey4me> o/ here, although between some things
16:02:10 <evrardjp> let's start
16:02:11 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1737013
16:02:12 <openstack> Launchpad bug 1737013 in openstack-ansible "Using non-default Ceph user for Cinder Backup service" [Undecided,New]
16:02:38 <evrardjp> confirmed medium?
16:02:41 <evrardjp> What about the solution?
16:03:29 <hwoarang> looks sensible to me
16:03:52 <spotz> hey
16:03:56 <evrardjp> ok
16:04:02 <odyssey4me> hmm, we should set a group var to make sure that ceph_client and cinder agree on what's right
16:04:10 <evrardjp> odyssey4me: see my comment
16:04:53 <evrardjp> we may want to skip the group var if we can just pass it over
16:05:01 <odyssey4me> yeah, medium as it's fairly serious
16:05:18 <odyssey4me> I can't confirm without looking at the code itself, but it sounds plausible.
16:05:21 <evrardjp> ok
16:05:30 <evrardjp> I have checked the code, it's real.
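[Editor's note] The group-var idea odyssey4me raises — declaring the Ceph client name once so ceph_client and os_cinder consume the same value — could look roughly like the fragment below. This is a sketch only: the variable names are illustrative assumptions, not the actual fix that was discussed in the bug.

```yaml
# user_variables.yml fragment (sketch; variable names are hypothetical).
# Define the non-default Ceph user for cinder-backup in one place...
cinder_backup_ceph_user: cinder-backup

# ...then have both sides reference that single definition, so the user
# that ceph_client creates is the same one the cinder backup service uses.
cinder_backup_ceph_client: "{{ cinder_backup_ceph_user }}"
```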
16:05:32 <evrardjp> :D
16:05:39 <evrardjp> now it's about how to do things
16:05:58 <evrardjp> we can discuss in the bug
16:06:04 <evrardjp> we have too many bugs to sort today
16:06:07 <evrardjp> next
16:06:17 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736989
16:06:19 <openstack> Launchpad bug 1736989 in openstack-ansible "[Docs] Issue when using --limit to add compute with DVR" [Undecided,New]
16:06:58 <evrardjp> I've seen more than one person having failures with --limit, maybe we should document this better.
16:07:09 <evrardjp> This is a sub-case, probably from our operations guide
16:07:28 <spotz> evrardjp: pop a note in what you want and then assign to me
16:07:31 <evrardjp> For me it's confirmed and low
16:08:10 <odyssey4me> we either need to test limits so that we fix issues that arise from using it, or we should recommend not using it on specific playbooks
16:08:31 <odyssey4me> the ops guide has a lot of dated info, unfortunately
16:08:48 <odyssey4me> agreed for confirmed, low
16:08:54 <evrardjp> spotz: could you have a look where we have --limit in our docs, and update to add localhost, when needed?
16:09:11 <spotz> evrardjp: grep is my friend :)
16:09:18 <evrardjp> yup :)
16:09:28 <evrardjp> ok cool
16:09:36 <evrardjp> next
16:09:37 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736731
16:09:38 <openstack> Launchpad bug 1736731 in openstack-ansible "os_nova might create a duplicate cell1" [Undecided,New]
16:09:45 <spotz> evrardjp: travelling with the dog this weekend but if I dont get to it before I leave I'll get it next week
16:09:57 <odyssey4me> also, any reference to lxc-containers-create.yml and --limit should always include lxc_hosts in the limit
16:10:11 <evrardjp> ^ +1 this
16:10:23 <evrardjp> so
16:10:26 <evrardjp> next bug
16:10:33 <evrardjp> Adri2000: was the owner of this bug
16:10:45 <evrardjp> well. is owner :)
16:10:54 <evrardjp> for me it's a question of what we do
16:11:01 <evrardjp> it's all a question of expectations
16:11:42 <Adri2000> I expect that adding a rabbitmq node works :)
16:11:45 <evrardjp> if something changed, like in this case adding a rabbitmq node, should we EDIT the cell by being smart ourselves, or should we fully rely on nova
16:12:22 <evrardjp> well it technically worked: you now had a new cell with your new config :p
16:12:28 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible master: Remove symlinked git cache functionality https://review.openstack.org/521749
16:12:35 <evrardjp> the problem is that you had two cells with the same name
16:12:40 <evrardjp> am I correct?
16:12:44 <Adri2000> yes
16:12:47 <evrardjp> Or the problem went even further?
16:12:55 <Adri2000> that has some odd consequences
16:13:00 <Miouge> I'm running into problems around the physical_interface_mappings in linuxbridge_agent.ini with my openstack_user_config http://paste.openstack.org/show/628740/ br-public shows up in both compute and neutron hosts :( I was expecting the group_binds to apply it only on neutron nodes?
16:13:08 <odyssey4me> hmm, I wonder if there's a create-or-update command
16:13:09 <Adri2000> like nova compute service list listing every service twice
16:13:17 <Adri2000> openstack compute service list*
16:13:23 <evrardjp> so because nova is creating cells based on rabbitmq and galera nodes, we should check the existence of the cell name only?
16:13:47 <evrardjp> odyssey4me: I am not aware of it
16:14:11 <Adri2000> as long as OSA supports deploying only one cell, I think checking whether "cell1" already exists and then update it should be fine
16:14:42 <Adri2000> it looks like tripleo fixed a similar issue in a similar way (I linked their bug report in our bug report)
16:14:54 <odyssey4me> looks like this is where it's done: https://github.com/openstack/openstack-ansible-os_nova/blob/65c12961b4d764ae541e49ec9b582fec086dadc8/tasks/nova_db_setup.yml#L31-L35
16:15:08 <evrardjp> usage: nova-manage cell [-h] {create,delete,list} ...
16:15:20 <odyssey4me> which means that nova itself has to get fixed to prevent duplication
16:15:31 <odyssey4me> and they'll argue that your db should be in sync
16:15:42 <evrardjp> odyssey4me: yeah that's exactly the thing
16:15:45 <odyssey4me> oh cell1... hang on
16:15:58 <evrardjp> it's a nova bug, well no it's not a nova bug, it's your deploy... etc.
16:16:03 <Adri2000> odyssey4me: next task I think
16:16:18 <Adri2000> command: "{{ nova_bin }}/nova-manage cell_v2 create_cell --name {{ nova_cell1_name }} [...]
16:16:47 <evrardjp> usage: nova-manage cell_v2 [-h]
16:16:48 <evrardjp> {create_cell,delete_cell,discover_hosts,list_cells,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance}
16:16:50 <odyssey4me> oh, perhaps we should check for an existing one, then create if it doesn't exist
16:17:01 <odyssey4me> there we go, update_cell
16:17:15 <evrardjp> yup
16:17:25 <evrardjp> so I think we should list_cells
16:17:29 <odyssey4me> so basically change that command task into a shell task which does a check and does the right thing, then reports back whether it changed or not
16:17:30 <evrardjp> see if there is one with a matching name
16:17:36 <evrardjp> then update if that's the case
16:17:39 <evrardjp> else, create
16:18:09 <evrardjp> ok sounds like a solved problem!
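[Editor's note] The list-then-create/update approach agreed above could be sketched as a shell task replacing the unconditional create_cell command. This is only a sketch under the meeting's assumptions (nova_bin and nova_cell1_name as used in the role), not the patch that eventually merged; a real fix would also pass the transport and database URLs and run update_cell with the existing cell's UUID when those change.

```yaml
# Sketch: only create cell1 when list_cells does not already report a cell
# with a matching name, avoiding the duplicate-cell1 situation from the bug.
- name: Create the cell1 mapping only when it is absent
  shell: >-
    {{ nova_bin }}/nova-manage cell_v2 list_cells
    | grep -q ' {{ nova_cell1_name }} '
    || {{ nova_bin }}/nova-manage cell_v2 create_cell --name {{ nova_cell1_name }}
  register: nova_cell1_result
  # An existing cell whose rabbitmq/galera endpoints changed would instead
  # need: nova-manage cell_v2 update_cell --cell_uuid <uuid> ...
```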
16:18:19 <odyssey4me> yes, it should definitely do something like that - good find that one!
16:18:31 <evrardjp> yeah thanks Adri2000
16:18:38 <odyssey4me> that'll need backporting to wherever cells started - ocata or pike, can't recall
16:18:39 <evrardjp> confirmed high?
16:18:48 <odyssey4me> yeah
16:18:48 <evrardjp> cell v2 is end of O
16:18:51 <evrardjp> iirc
16:19:08 <odyssey4me> it's in O, yes: https://github.com/openstack/openstack-ansible-os_nova/blob/stable/ocata/tasks/nova_db_setup.yml#L37-L43
16:19:24 <evrardjp> Adri2000: can you submit the patch?
16:19:36 <evrardjp> I propose confirmed and high?
16:19:43 <odyssey4me> yes, agreed
16:19:52 <Adri2000> evrardjp: I can try to submit something this week
16:20:05 <evrardjp> else I just do it quick and you test it :p
16:20:22 <evrardjp> ok let's sync later
16:20:27 <evrardjp> next
16:20:29 <Adri2000> yep :)
16:20:37 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736726
16:20:39 <openstack> Launchpad bug 1736726 in openstack-ansible "Pike update causes parallel restart of Galera containers" [Undecided,New]
16:20:50 * Adri2000 again o/
16:20:58 <evrardjp> big deal IMO
16:21:07 <evrardjp> so we have a patch in master to prevent restarts
16:21:19 <evrardjp> we already discussed this on the channel too.
16:21:27 <Adri2000> yes it was discussed in the chan one or two days before I filed the report
16:21:29 <evrardjp> my concern is that:
16:21:37 <odyssey4me> yeah, but I'd rather have a fix figured out
16:22:00 <evrardjp> 1) we don't have a test to catch that. 2) we need to fix it soon.
16:22:08 <odyssey4me> for now we can do a reno though to inform people of the known issue - the trouble is that we need to figure out where the change was introduced so that we can say - any versions from x are affected
16:22:23 <evrardjp> 16.0.3 and 16.0.4 are affected
16:22:39 <evrardjp> so anyone upgrading from earlier than that will suffer.
16:22:44 <odyssey4me> when I was digging into this, I think all the way back to newton might be affected
16:22:57 <evrardjp> why?
16:23:06 <evrardjp> it's coming from a backport, was that backported to N?
16:23:13 <odyssey4me> but I didn't get to a point of confirming, because we don't have enough information about what the change was to the container config that caused it
16:23:30 <odyssey4me> yes, I think a patch was made to master, and ported back
16:23:50 <evrardjp> 16.0.3 is fine I think
16:23:53 <jmccrory> this has happened before https://review.openstack.org/#/c/426337/
16:23:56 <odyssey4me> the patch was fine for master (because in a major upgrade we handle this issue), but not for backports (because we can't handle it for minors)
16:24:32 <evrardjp> odyssey4me: I agree there.
16:24:40 <odyssey4me> yep, good find there jmccrory
16:25:07 <evrardjp> jmccrory: there was no documentation
16:25:10 <odyssey4me> we should probably serialise the container config changes in all the playbooks anyway for exactly this reason
16:25:20 <evrardjp> we just fixed the issue by introducing the serialization
16:25:29 <evrardjp> here it's on lxc hosts/lxc container create
16:25:30 <odyssey4me> if we do, this whole issue of when we can and can't restart becomes moot
16:25:52 <odyssey4me> yes, well - if it's in lxc-containers-create then we have a big problem
16:25:59 <Adri2000> https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/tree/handlers/main.yml#n19 this handler is the issue
16:26:17 <Adri2000> if that handler is called for all the galera containers: you lose
16:26:24 <jmccrory> yeah more impact if it's happening through an lxc host/create play
16:26:27 <evrardjp> odyssey4me: for me, we should BY DEFAULT for a stable branch NOT reboot.
16:26:29 <odyssey4me> Adri2000 that's just doing what it's told - some config changed, and that config change is the real root cause
16:26:56 <Adri2000> odyssey4me: shouldn't we be able to change the lxc containers config without worrying that restart of the containers will break everything?
16:27:08 <odyssey4me> evrardjp we should never allow changes to container config to be ported back, unless we also handle the effects or add a BIG release note
16:27:28 <evrardjp> odyssey4me: I agree. I was on holiday at that time, under the sun :D
16:27:31 <odyssey4me> Adri2000 not really - container config changes do not take effect until a container is stopped, then started again
16:27:42 <evrardjp> I should have eyes everywhere, I know.
16:27:50 <odyssey4me> evrardjp I feel you.
16:28:14 <odyssey4me> do we have any record of what exactly changed in the container config?
16:28:41 <evrardjp> Adri2000: you told me it all happened on the same backport right?
16:28:46 <evrardjp> you had no issues before that
16:28:47 <evrardjp> ?
16:29:06 <Adri2000> yes
16:29:11 <Adri2000> never had a similar issue before
16:29:30 <odyssey4me> if we ran lxc-containers-create in the service playbooks, then we could serialise this sensibly
16:29:34 <odyssey4me> but that doesn't help stable branches
16:29:34 <Adri2000> that happened while doing a regular git pull of stable/pike, bootstrap-ansible && playbooks run
16:29:44 <evrardjp> please also note that there was a change in stable/pike
16:29:50 <evrardjp> https://github.com/openstack/openstack-ansible-lxc_container_create/commit/c41d3b20da6be07d9bf5db7f7e6a1384c7cfb5eb
16:30:03 <evrardjp> on top of the mega backport https://github.com/openstack/openstack-ansible-lxc_container_create/commit/05b84528d100ab73b91e119dd379a8ed0726db7d
16:31:06 <odyssey4me> evrardjp ok, so if we do not reboot the containers by default - how do we handle making sure the containers are properly configured when initially built?
16:31:27 <evrardjp> odyssey4me: that's not possible :p
16:31:30 <odyssey4me> perhaps do something like drop a local fact, then handle the reboot in serial in the service play?
16:31:36 <evrardjp> that's why I think we are in a pickle
16:31:44 <evrardjp> well yes or something like that
16:31:57 <odyssey4me> that's something we can do, and is backportable
16:32:16 <odyssey4me> it'll also survive a failed playbook, which is nice
16:33:01 <evrardjp> odyssey4me: yeah and we are closer to JIT.
16:33:08 <evrardjp> because that's what ppl want
16:33:16 <evrardjp> so good for me.
16:33:34 <evrardjp> I have no cycles for that.
16:35:01 <evrardjp> confirmed high? Maybe it will break some upgrades gates, then I will need to spend cycles on that, but right now I can't say.
16:35:10 <evrardjp> upgrade tests*
16:35:22 <evrardjp> ok for the classification anyone?
16:35:24 <Adri2000> I'd say at least high :)
16:35:44 <Adri2000> anyone with more than 1 galera node is likely to be impacted
16:35:44 <odyssey4me> high, confirmed - assign it to me
16:35:46 <evrardjp> yeah for now i'd say high and move to critical if broken jobs
16:35:51 <odyssey4me> unless someone else wants to take it?
16:36:01 <evrardjp> anyone?
16:36:07 <evrardjp> pretty please?
16:36:09 <evrardjp> :p
16:36:13 <evrardjp> ok that's for odyssey4me then!
16:36:19 <odyssey4me> Adri2000 can you please add the tag version you upgraded from and to, so that I can add a reno
16:36:26 <evrardjp> it should be cloudnull tbh.
16:36:35 <odyssey4me> that'll really help me narrow down where the issue is
16:36:40 <evrardjp> odyssey4me: it's in the bug
16:36:44 <Adri2000> odyssey4me: the bug report contains the commit ids already, because I was not using tags
16:36:56 <Adri2000> was on stable/pike head
16:37:14 <evrardjp> my mom says "You break it, you fix it".
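[Editor's note] The "drop a local fact, restart in serial later" idea could be sketched roughly as below. Everything here is an assumption for illustration (file path, task names, the serial value); it is not the backport that was eventually written.

```yaml
# In lxc_container_create (sketch): the handler records that a restart is
# pending instead of restarting the container immediately.
- name: Flag pending container restart
  delegate_to: "{{ physical_host }}"
  copy:
    content: "restart required\n"
    dest: "/openstack/{{ inventory_hostname }}.restart-pending"

# In the service play (sketch), run under 'serial: 1' so only one galera
# container is ever down at a time, then clear the flag:
- name: Restart container to apply pending config changes
  delegate_to: "{{ physical_host }}"
  command: "lxc-stop --name {{ inventory_hostname }} --reboot --timeout 60"

- name: Clear the restart flag
  delegate_to: "{{ physical_host }}"
  file:
    path: "/openstack/{{ inventory_hostname }}.restart-pending"
    state: absent
```

As noted in the discussion, a file-based flag also survives a failed playbook run, which a handler notification would not.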
16:37:37 <spotz> heheh
16:37:48 <odyssey4me> ah, perfect thanks
16:37:52 <evrardjp> ok next
16:37:55 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736719
16:37:56 <openstack> Launchpad bug 1736719 in openstack-ansible "Appendix A: Example test environment configuration in OpenStack-Ansible" [Undecided,New]
16:38:23 <evrardjp> I guess spotz will have a look at it? :)
16:38:27 <odyssey4me> ah yeah, maybe mhayden and/or hwoarang could pick that up?
16:38:30 <evrardjp> hwoarang: mhayden could you help?
16:38:32 <evrardjp> AHAHAH
16:38:43 <evrardjp> yeah.
16:38:46 <evrardjp> mgariepy: too maybe :p
16:38:52 <evrardjp> confirmed and wishlist ok?
16:39:00 <spotz> Poor appendix A, nobody loves it
16:39:35 <evrardjp> spotz: we'll make it better.
16:39:37 <evrardjp> :)
16:40:00 <mhayden> odyssey4me / evrardjp: what's up?
16:40:03 <evrardjp> I assigned that to you spotz, please ask those suse/redhat guys :)
16:40:12 <evrardjp> we need help on sample network config
16:40:14 <evrardjp> mhayden: ^
16:40:16 <spotz> mhayden go fix appendix a :)
16:40:17 <mhayden> ORLY
16:40:20 <evrardjp> for appendix a
16:40:23 <odyssey4me> mhayden well, evrardjp's beating you in the break-everything-dept
16:40:27 <evrardjp> just copy pasta and we are good
16:40:29 <mhayden> hot dang
16:40:44 <mhayden> evrardjp: i will not let you beat me in gate breakage
16:40:49 <odyssey4me> not just appendix A - ALL OF THEM!
16:40:50 <mhayden> this aggression will not stand
16:40:50 <evrardjp> odyssey4me: you need to bring patches to break stuff :p
16:40:58 <odyssey4me> bwahaha
16:41:10 * mhayden starts writing a new linter
16:41:12 <evrardjp> well, there are ppl that do it magically
16:41:18 <spotz> mhayden: I wonder if we're bad it'll snow again? :)
16:41:35 <mhayden> that make snow sense
16:41:36 <odyssey4me> yes, let's make it snow more - the world is too warm
16:41:37 <evrardjp> mhayden: or new backend for containers.
16:41:56 <evrardjp> odyssey4me: we are losing the debate folks.
16:41:58 <odyssey4me> anyway, that's 10 mins of our lives which bug triage will never get back
16:41:58 <mhayden> let's make a new container frontend
16:42:07 <spotz> hehe
16:42:09 <evrardjp> I think right now, we need a good breakage.
16:42:11 <evrardjp> :p
16:42:28 <mhayden> i'm about to take down my pike env to re-do the networking and add bonding
16:42:33 <mhayden> so i might be able to put some of that in
16:42:40 <evrardjp> I am glad I managed to break it twice in my lifetime, I think I am far from all the others :D
16:42:44 <hwoarang> i will have a look too
16:42:47 <evrardjp> I will get better.
16:42:50 <mhayden> i don't have vxlan offloading, so i need another route for that :/
16:43:06 <evrardjp> well as long as you give a first draft, that would be good I guess
16:43:11 <evrardjp> can we move on?
16:43:23 <evrardjp> talking about mhayden breakages:
16:43:27 <mhayden> hooray
16:43:35 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736702
16:43:36 <openstack> Launchpad bug 1736702 in openstack-ansible "TMOUT declared two times after upgrade in /etc/profile" [Undecided,New]
16:43:38 <mhayden> sounds like evrardjp wants me to move on and work on something else
16:43:40 <mhayden> :)
16:43:49 <mhayden> ugh yes
16:44:02 <mhayden> does ansible have a capacity to remove an entire block?
16:44:04 <evrardjp> thanks bro.
16:44:13 <mhayden> i'll take that one and do some experimentation
16:44:14 <evrardjp> yes, blockinfile state absent?
16:44:37 <mhayden> i'll do some testing/tinkering there
16:44:38 <evrardjp> you just use the same task, copy pasta, replace state present with absent, and change the marker. BOOM
16:44:49 <evrardjp> mhayden: thanks!
16:44:54 * mhayden gets out markers and crayons
16:44:54 <evrardjp> confirmed med
16:45:11 <odyssey4me> mhayden evrardjp there was a patch that cloudnull did which converts all the networking to networkd
16:45:14 <evrardjp> where is the love for paint!
16:45:15 <evrardjp> ?
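[Editor's note] The fix evrardjp describes — the same blockinfile task with state flipped to absent and the old marker kept — could look roughly like this; the marker text is an assumption about what the previous release wrote, not a quote from the role.

```yaml
# Sketch: remove the stale TMOUT block written under the old marker so that
# only the block under the new marker remains in /etc/profile.
- name: Remove TMOUT block left behind by the previous marker
  blockinfile:
    path: /etc/profile
    marker: "# {mark} ANSIBLE MANAGED BLOCK"   # the OLD marker (assumed)
    state: absent
```

With state: absent, blockinfile deletes everything between the matching begin/end marker lines, including the markers themselves, which is exactly the "remove an entire block" capability mhayden asks about.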
16:45:27 <mhayden> MS Paint is deprecated
16:45:36 * mhayden stops ruining evrardjp's meeting ;)
16:45:41 <evrardjp> odyssey4me: that's a good potential breakage too, but that was yesteryear's conversation!
16:45:55 <evrardjp> mhayden: everything is fine
16:46:00 <evrardjp> (fire) (fire)
16:46:17 <evrardjp> we are almost at a third of the bugs
16:46:20 <evrardjp> great
16:46:21 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1735709
16:46:22 <openstack> Launchpad bug 1735709 in openstack-ansible "Grub authentication is not applied on Fedora 26 and Ubuntu 16.04 (at least)" [Undecided,New] - Assigned to Major Hayden (rackerhacker)
16:46:24 <cloudnull> odyssey4me: https://review.openstack.org/#/c/523218/
16:46:32 <evrardjp> OH WAIT, guess what?
16:46:33 <evrardjp> :p
16:46:38 <cloudnull> sadly i have to circle back on that for suse
16:46:53 <evrardjp> odyssey4me: don't feed the troll.
16:46:56 <evrardjp> :p
16:47:25 <evrardjp> mhayden: status there?
16:47:38 <evrardjp> should we care?
16:47:47 <mhayden> nothing yet
16:47:50 <mhayden> but this isn't critical
16:48:02 <mhayden> most people i know are horrified about grub auth on boot
16:48:05 <mhayden> but it would be nice to fix it
16:48:48 <evrardjp> confirmed low?
16:48:51 <evrardjp> or still not confirmed?
16:49:50 <odyssey4me> or just take it out and say, do it if you wanna
16:49:55 <odyssey4me> :p
16:50:28 <evrardjp> leaving it as is then :p
16:50:39 <evrardjp> next
16:50:41 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1731186
16:50:42 <openstack> Launchpad bug 1731186 in openstack-ansible "vpnaas_agent.ini wrong config" [Undecided,New]
16:52:21 <evrardjp> should I mark this as incomplete?
16:52:27 <evrardjp> I am not sure I understood this one
16:52:45 <cloudnull> evrardjp: Adri2000: I missed the message before, what is it i need to do?
16:53:01 * cloudnull can wait for after the meeting
16:53:26 <cloudnull> wasn't vpnaas removed?
16:53:43 <evrardjp> cloudnull: fixing the backport that forces restarts of all containers even on minor updates. odyssey4me is assignee for now, but you should work with him :D
16:53:55 <cloudnull> ok
16:53:59 <evrardjp> not so sure but now that you remind me that
16:54:23 <evrardjp> I think bgpvpn aaS was talked about, etc.
16:54:35 <evrardjp> so maybe it's worth saying incomplete, and maybe won't fix?
16:54:36 <cloudnull> evrardjp: odyssey4me: doesn't https://review.openstack.org/#/c/527256 fix that?
16:55:04 <evrardjp> we can talk about the approach after the meeting
16:55:07 <cloudnull> ok
16:55:17 <evrardjp> ok let's wrap it up then
16:55:34 <evrardjp> I am marking this vpnaas as incomplete
16:55:43 <evrardjp> next
16:55:45 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1729525
16:55:46 <openstack> Launchpad bug 1729525 in openstack-ansible "Can't run bootstrap_ansible with encrypted user_secrets.yml (prompting for password)" [Undecided,New]
16:56:43 <evrardjp> ok I think it's fixed
16:57:12 <evrardjp> we should mark it as invalid on the next bug triage
16:57:14 <evrardjp> so
16:57:29 <evrardjp> new topic!
16:57:34 <evrardjp> #topic next week bug triage
16:57:49 <evrardjp> can someone handle the bug triage for the next week?
16:58:04 <evrardjp> I am not available
16:58:21 <cloudnull> I will be out as well
16:58:22 <evrardjp> Starting from tomorrow I won't be available at all until next year.
16:58:25 <cloudnull> ++
16:58:34 <evrardjp> ok
16:58:41 <evrardjp> anyone else to run the meeting?
16:59:06 <evrardjp> if nobody can run it, we can cancel it, and I post a mail on the ML
16:59:23 <evrardjp> ok for everyone?
16:59:29 <cloudnull> sounds good
16:59:42 <evrardjp> ok thanks everyone then!
16:59:47 <evrardjp> #endmeeting