16:00:27 <evrardjp> #startmeeting openstack_ansible_meeting
16:00:28 <openstack> Meeting started Tue Dec 12 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:32 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:23 <evrardjp> #topic bugs with new status
16:01:40 <evrardjp> is there anyone ready for bug triage?
16:01:57 <hwoarang> im here
16:01:58 <odyssey4me> o/ here, although between some things
16:02:10 <evrardjp> let's start
16:02:11 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1737013
16:02:12 <openstack> Launchpad bug 1737013 in openstack-ansible "Using non-default Ceph user for Cinder Backup service" [Undecided,New]
16:02:38 <evrardjp> confirmed medium?
16:02:41 <evrardjp> What about the solution?
16:03:29 <hwoarang> looks sensible to me
16:03:52 <spotz> hey
16:03:56 <evrardjp> ok
16:04:02 <odyssey4me> hmm, we should set a group var to make sure that ceph_client and cinder agree on what's right
16:04:10 <evrardjp> odyssey4me: see my comment
16:04:53 <evrardjp> we may want to skip the group var if we can just pass it over
16:05:01 <odyssey4me> yeah, medium as it's fairly serious
16:05:18 <odyssey4me> I can't confirm without looking at the code itself, but it sounds plausible.
16:05:21 <evrardjp> ok
16:05:30 <evrardjp> I have checked the code, it's real.
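[Editor's note] The group-var idea odyssey4me raises — declaring the Ceph client name once so ceph_client and os_cinder consume the same value — could look roughly like the fragment below. This is a sketch only: the variable names are illustrative assumptions, not the actual fix that was discussed in the bug.

```yaml
# user_variables.yml fragment (sketch; variable names are hypothetical).
# Define the non-default Ceph user for cinder-backup in one place...
cinder_backup_ceph_user: cinder-backup

# ...then have both sides reference that single definition, so the user
# that ceph_client creates is the same one the cinder backup service uses.
cinder_backup_ceph_client: "{{ cinder_backup_ceph_user }}"
```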
16:05:32 <evrardjp> :D
16:05:39 <evrardjp> now it's about how to do things
16:05:58 <evrardjp> we can discuss in the bug
16:06:04 <evrardjp> we have too many bugs to sort today
16:06:07 <evrardjp> next
16:06:17 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736989
16:06:19 <openstack> Launchpad bug 1736989 in openstack-ansible "[Docs] Issue when using --limit to add compute with DVR" [Undecided,New]
16:06:58 <evrardjp> I've seen more than one person having failures with --limit, maybe we should document this better.
16:07:09 <evrardjp> This is a sub-case, probably from our operations guide
16:07:28 <spotz> evrardjp: pop a note in what you want and then assign to me
16:07:31 <evrardjp> For me it's confirmed and low
16:08:10 <odyssey4me> we either need to test limits so that we fix issues that arise from using it, or we should recommend not using it on specific playbooks
16:08:31 <odyssey4me> the ops guide has a lot of dated info, unfortunately
16:08:48 <odyssey4me> agreed for confirmed, low
16:08:54 <evrardjp> spotz: could you have a look where we have --limit in our docs, and update to add localhost, when needed?
16:09:11 <spotz> evrardjp: grep is my friend :)
16:09:18 <evrardjp> yup :)
16:09:28 <evrardjp> ok cool
16:09:36 <evrardjp> next
16:09:37 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736731
16:09:38 <openstack> Launchpad bug 1736731 in openstack-ansible "os_nova might create a duplicate cell1" [Undecided,New]
16:09:45 <spotz> evrardjp: travelling with the dog this weekend but if I dont get to it before I leave I'll get it next week
16:09:57 <odyssey4me> also, any reference to lxc-containers-create.yml and --limit should always include lxc_hosts in the limit
16:10:11 <evrardjp> ^ +1 this
16:10:23 <evrardjp> so
16:10:26 <evrardjp> next bug
16:10:33 <evrardjp> Adri2000: was the owner of this bug
16:10:45 <evrardjp> well. is owner :)
16:10:54 <evrardjp> for me it's a question of what we do
16:11:01 <evrardjp> it's all a question of expectations
16:11:42 <Adri2000> I expect that adding a rabbitmq node works :)
16:11:45 <evrardjp> if something changed, like in this case adding a rabbitmq node, should we EDIT the cell by being smart ourselves, or should we fully rely on nova
16:12:22 <evrardjp> well it technically worked: you now had a new cell with your new config :p
16:12:28 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible master: Remove symlinked git cache functionality https://review.openstack.org/521749
16:12:35 <evrardjp> the problem is that you had two cells with the same name
16:12:40 <evrardjp> am I correct?
16:12:44 <Adri2000> yes
16:12:47 <evrardjp> Or the problem went even further?
16:12:55 <Adri2000> that has some odd consequences
16:13:00 <Miouge> I'm running into problems around the physical_interface_mappings in linuxbridge_agent.ini with my openstack_user_config http://paste.openstack.org/show/628740/ br-public shows up in both compute and neutron hosts :( I was expecting the group_binds to apply it only on neutron nodes?
16:13:08 <odyssey4me> hmm, I wonder if there's a create-or-update command
16:13:09 <Adri2000> like nova compute service list listing every service twice
16:13:17 <Adri2000> openstack compute service list*
16:13:23 <evrardjp> so because nova is creating cells based on rabbitmq and galera nodes, we should check the existence of the cell name only?
16:13:47 <evrardjp> odyssey4me: I am not aware of it
16:14:11 <Adri2000> as long as OSA supports deploying only one cell, I think checking whether "cell1" already exists and then update it should be fine
16:14:42 <Adri2000> it looks like tripleo fixed a similar issue in a similar way (I linked their bug report in our bug report)
16:14:54 <odyssey4me> looks like this is where it's done: https://github.com/openstack/openstack-ansible-os_nova/blob/65c12961b4d764ae541e49ec9b582fec086dadc8/tasks/nova_db_setup.yml#L31-L35
16:15:08 <evrardjp> usage: nova-manage cell [-h] {create,delete,list} ...
16:15:20 <odyssey4me> which means that nova itself has to get fixed to prevent duplication
16:15:31 <odyssey4me> and they'll argue that your db should be in sync
16:15:42 <evrardjp> odyssey4me: yeah that's exactly the thing
16:15:45 <odyssey4me> oh cell1... hang on
16:15:58 <evrardjp> it's a nova bug, well no it's not a nova bug, it's your deploy... etc.
16:16:03 <Adri2000> odyssey4me: next task I think
16:16:18 <Adri2000> command: "{{ nova_bin }}/nova-manage cell_v2 create_cell --name {{ nova_cell1_name }} [...]
16:16:47 <evrardjp> usage: nova-manage cell_v2 [-h]
16:16:48 <evrardjp> {create_cell,delete_cell,discover_hosts,list_cells,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance}
16:16:50 <odyssey4me> oh, perhaps we should check for an existing one, then create if it doesn't exist
16:17:01 <odyssey4me> there we go, update_cell
16:17:15 <evrardjp> yup
16:17:25 <evrardjp> so I think we should list_cells
16:17:29 <odyssey4me> so basically change that command task into a shell task which does a check and does the right thing, then reports back whether it changed or not
16:17:30 <evrardjp> see if there is one with a matching name
16:17:36 <evrardjp> then update if that's the case
16:17:39 <evrardjp> else, create
16:18:09 <evrardjp> ok sounds like a solved problem!
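[Editor's note] The list-then-create/update approach agreed above could be sketched as a shell task replacing the unconditional create_cell command. This is only a sketch under the meeting's assumptions (nova_bin and nova_cell1_name as used in the role), not the patch that eventually merged; a real fix would also pass the transport and database URLs and run update_cell with the existing cell's UUID when those change.

```yaml
# Sketch: only create cell1 when list_cells does not already report a cell
# with a matching name, avoiding the duplicate-cell1 situation from the bug.
- name: Create the cell1 mapping only when it is absent
  shell: >-
    {{ nova_bin }}/nova-manage cell_v2 list_cells
    | grep -q ' {{ nova_cell1_name }} '
    || {{ nova_bin }}/nova-manage cell_v2 create_cell --name {{ nova_cell1_name }}
  register: nova_cell1_result
  # An existing cell whose rabbitmq/galera endpoints changed would instead
  # need: nova-manage cell_v2 update_cell --cell_uuid <uuid> ...
```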
16:18:19 <odyssey4me> yes, it should definitely do something like that - good find that one!
16:18:31 <evrardjp> yeah thanks Adri2000
16:18:38 <odyssey4me> that'll need backporting to wherever cells started - ocata or pike, can't recall
16:18:39 <evrardjp> confirmed high?
16:18:48 <odyssey4me> yeah
16:18:48 <evrardjp> cell v2 is end of O
16:18:51 <evrardjp> iirc
16:19:08 <odyssey4me> it's in O, yes: https://github.com/openstack/openstack-ansible-os_nova/blob/stable/ocata/tasks/nova_db_setup.yml#L37-L43
16:19:24 <evrardjp> Adri2000: can you submit the patch?
16:19:36 <evrardjp> I propose confirmed and high?
16:19:43 <odyssey4me> yes, agreed
16:19:52 <Adri2000> evrardjp: I can try to submit something this week
16:20:05 <evrardjp> else I just do it quick and you test it :p
16:20:22 <evrardjp> ok let's sync later
16:20:27 <evrardjp> next
16:20:29 <Adri2000> yep :)
16:20:37 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736726
16:20:39 <openstack> Launchpad bug 1736726 in openstack-ansible "Pike update causes parallel restart of Galera containers" [Undecided,New]
16:20:50 * Adri2000 again o/
16:20:58 <evrardjp> big deal IMO
16:21:07 <evrardjp> so we have a patch in master to prevent restarts
16:21:19 <evrardjp> we already discussed this on the channel too.
16:21:27 <Adri2000> yes it was discussed in the chan one or two days before I filed the report
16:21:29 <evrardjp> my concern is that:
16:21:37 <odyssey4me> yeah, but I'd rather have a fix figured out
16:22:00 <evrardjp> 1) we don't have a test to catch that. 2) we need to fix it soon.
16:22:08 <odyssey4me> for now we can do a reno though to inform people of the known issue - the trouble is that we need to figure out where the change was introduced so that we can say - any versions from x are affected
16:22:23 <evrardjp> 16.0.3 and 16.0.4 are affected
16:22:39 <evrardjp> so anyone upgrading from earlier than that will suffer.
16:22:44 <odyssey4me> when I was digging into this, I think all the way back to newton might be affected
16:22:57 <evrardjp> why?
16:23:06 <evrardjp> it's coming from a backport, was that backported to N?
16:23:13 <odyssey4me> but I didn't get to a point of confirming, because we don't have enough information about what the change was to the container config that caused it
16:23:30 <odyssey4me> yes, I think a patch was made to master, and ported back
16:23:50 <evrardjp> 16.0.3 is fine I think
16:23:53 <jmccrory> this has happened before https://review.openstack.org/#/c/426337/
16:23:56 <odyssey4me> the patch was fine for master (because in a major upgrade we handle this issue), but not for backports (because we can't handle it for minors)
16:24:32 <evrardjp> odyssey4me: I agree there.
16:24:40 <odyssey4me> yep, good find there jmccrory
16:25:07 <evrardjp> jmccrory: there was no documentation
16:25:10 <odyssey4me> we should probably serialise the container config changes in all the playbooks anyway for exactly this reason
16:25:20 <evrardjp> we just fixed the issue by introducing the serialization
16:25:29 <evrardjp> here it's on lxc hosts/lxc container create
16:25:30 <odyssey4me> if we do, this whole issue of when we can and can't restart becomes moot
16:25:52 <odyssey4me> yes, well - if it's in lxc-containers-create then we have a big problem
16:25:59 <Adri2000> https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/tree/handlers/main.yml#n19 this handler is the issue
16:26:17 <Adri2000> if that handler is called for all the galera containers: you lose
16:26:24 <jmccrory> yeah more impact if it's happening through an lxc host/create play
16:26:27 <evrardjp> odyssey4me: for me, we should BY DEFAULT for a stable branch NOT reboot.
16:26:29 <odyssey4me> Adri2000 that's just doing what it's told - some config changed, and that config change is the real root cause
16:26:56 <Adri2000> odyssey4me: shouldn't we be able to change the lxc containers config without worrying that restart of the containers will break everything?
16:27:08 <odyssey4me> evrardjp we should never allow changes to container config to be ported back, unless we also handle the effects or add a BIG release note
16:27:28 <evrardjp> odyssey4me: I agree. I was on holiday at that time, under the sun :D
16:27:31 <odyssey4me> Adri2000 not really - container config changes do not take effect until a container is stopped, then started again
16:27:42 <evrardjp> I should have eyes everywhere, I know.
16:27:50 <odyssey4me> evrardjp I feel you.
16:28:14 <odyssey4me> do we have any record of what exactly changed in the container config?
16:28:41 <evrardjp> Adri2000: you told me it all happened on the same backport right?
16:28:46 <evrardjp> you had no issues before that
16:28:47 <evrardjp> ?
16:29:06 <Adri2000> yes
16:29:11 <Adri2000> never had a similar issue before
16:29:30 <odyssey4me> if we ran lxc-containers-create in the service playbooks, then we could serialise this sensibly
16:29:34 <odyssey4me> but that doesn't help stable branches
16:29:34 <Adri2000> that happened while doing a regular git pull of stable/pike, bootstrap-ansible && playbooks run
16:29:44 <evrardjp> please also note that there was a change in stable/pike
16:29:50 <evrardjp> https://github.com/openstack/openstack-ansible-lxc_container_create/commit/c41d3b20da6be07d9bf5db7f7e6a1384c7cfb5eb
16:30:03 <evrardjp> on top of the mega backport https://github.com/openstack/openstack-ansible-lxc_container_create/commit/05b84528d100ab73b91e119dd379a8ed0726db7d
16:31:06 <odyssey4me> evrardjp ok, so if we do not reboot the containers by default - how do we handle making sure the containers are properly configured when initially built?
16:31:27 <evrardjp> odyssey4me: that's not possible :p
16:31:30 <odyssey4me> perhaps do something like drop a local fact, then handle the reboot in serial in the service play?
16:31:36 <evrardjp> that's why I think we are in a pickle
16:31:44 <evrardjp> well yes or something like that
16:31:57 <odyssey4me> that's something we can do, and is backportable
16:32:16 <odyssey4me> it'll also survive a failed playbook, which is nice
16:33:01 <evrardjp> odyssey4me: yeah and we are closer to JIT.
16:33:08 <evrardjp> because that's what ppl want
16:33:16 <evrardjp> so good for me.
16:33:34 <evrardjp> I have no cycles for that.
16:35:01 <evrardjp> confirmed high? Maybe it will break some upgrades gates, then I will need to spend cycles on that, but right now I can't say.
16:35:10 <evrardjp> upgrade tests*
16:35:22 <evrardjp> ok for the classification anyone?
16:35:24 <Adri2000> I'd say at least high :)
16:35:44 <Adri2000> anyone with more than 1 galera node is likely to be impacted
16:35:44 <odyssey4me> high, confirmed - assign it to me
16:35:46 <evrardjp> yeah for now i'd say high and move to critical if broken jobs
16:35:51 <odyssey4me> unless someone else wants to take it?
16:36:01 <evrardjp> anyone?
16:36:07 <evrardjp> pretty please?
16:36:09 <evrardjp> :p
16:36:13 <evrardjp> ok that's for odyssey4me then!
16:36:19 <odyssey4me> Adri2000 can you please add the tag version you upgraded from and to, so that I can add a reno
16:36:26 <evrardjp> it should be cloudnull tbh.
16:36:35 <odyssey4me> that'll really help me narrow down where the issue is
16:36:40 <evrardjp> odyssey4me: it's in the bug
16:36:44 <Adri2000> odyssey4me: the bug report contains the commit ids already, because I was not using tags
16:36:56 <Adri2000> was on stable/pike head
16:37:14 <evrardjp> my mom says "You break it, you fix it".
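[Editor's note] The "drop a local fact, restart in serial later" idea could be sketched roughly as below. Everything here is an assumption for illustration (file path, task names, the serial value); it is not the backport that was eventually written.

```yaml
# In lxc_container_create (sketch): the handler records that a restart is
# pending instead of restarting the container immediately.
- name: Flag pending container restart
  delegate_to: "{{ physical_host }}"
  copy:
    content: "restart required\n"
    dest: "/openstack/{{ inventory_hostname }}.restart-pending"

# In the service play (sketch), run under 'serial: 1' so only one galera
# container is ever down at a time, then clear the flag:
- name: Restart container to apply pending config changes
  delegate_to: "{{ physical_host }}"
  command: "lxc-stop --name {{ inventory_hostname }} --reboot --timeout 60"

- name: Clear the restart flag
  delegate_to: "{{ physical_host }}"
  file:
    path: "/openstack/{{ inventory_hostname }}.restart-pending"
    state: absent
```

As noted in the discussion, a file-based flag also survives a failed playbook run, which a handler notification would not.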
16:37:37 <spotz> heheh
16:37:48 <odyssey4me> ah, perfect thanks
16:37:52 <evrardjp> ok next
16:37:55 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736719
16:37:56 <openstack> Launchpad bug 1736719 in openstack-ansible "Appendix A: Example test environment configuration in OpenStack-Ansible" [Undecided,New]
16:38:23 <evrardjp> I guess spotz will have a look at it? :)
16:38:27 <odyssey4me> ah yeah, maybe mhayden and/or hwoarang could pick that up?
16:38:30 <evrardjp> hwoarang: mhayden could you help?
16:38:32 <evrardjp> AHAHAH
16:38:43 <evrardjp> yeah.
16:38:46 <evrardjp> mgariepy: too maybe :p
16:38:52 <evrardjp> confirmed and wishlist ok?
16:39:00 <spotz> Poor appendix A, nobody loves it
16:39:35 <evrardjp> spotz: we'll make it better.
16:39:37 <evrardjp> :)
16:40:00 <mhayden> odyssey4me / evrardjp: what's up?
16:40:03 <evrardjp> I assigned that to you spotz, please ask those suse/redhat guys :)
16:40:12 <evrardjp> we need help on sample network config
16:40:14 <evrardjp> mhayden: ^
16:40:16 <spotz> mhayden go fix appendix a :)
16:40:17 <mhayden> ORLY
16:40:20 <evrardjp> for appendix a
16:40:23 <odyssey4me> mhayden well, evrardjp's beating you in the break-everything-dept
16:40:27 <evrardjp> just copy pasta and we are good
16:40:29 <mhayden> hot dang
16:40:44 <mhayden> evrardjp: i will not let you beat me in gate breakage
16:40:49 <odyssey4me> not just appendix A - ALL OF THEM!
16:40:50 <mhayden> this aggression will not stand
16:40:50 <evrardjp> odyssey4me: you need to bring patches to break stuff :p
16:40:58 <odyssey4me> bwahaha
16:41:10 * mhayden starts writing a new linter
16:41:12 <evrardjp> well, there are ppl that do it magically
16:41:18 <spotz> mhayden: I wonder if we're bad it'll snow again? :)
16:41:35 <mhayden> that make snow sense
16:41:36 <odyssey4me> yes, let's make it snow more - the world is too warm
16:41:37 <evrardjp> mhayden: or new backend for containers.
16:41:56 <evrardjp> odyssey4me: we are losing the debate folks.
16:41:58 <odyssey4me> anyway, that's 10 mins of our lives which bug triage will never get back
16:41:58 <mhayden> let's make a new container frontend
16:42:07 <spotz> hehe
16:42:09 <evrardjp> I think right now, we need a good breakage.
16:42:11 <evrardjp> :p
16:42:28 <mhayden> i'm about to take down my pike env to re-do the networking and add bonding
16:42:33 <mhayden> so i might be able to put some of that in
16:42:40 <evrardjp> I am glad I managed to break it twice in my lifetime, I think I am far from all the others :D
16:42:44 <hwoarang> i will have a look too
16:42:47 <evrardjp> I will get better.
16:42:50 <mhayden> i don't have vxlan offloading, so i need another route for that :/
16:43:06 <evrardjp> well as long as you give a first draft, that would be good I guess
16:43:11 <evrardjp> can we move on?
16:43:23 <evrardjp> talking about mhayden breakages:
16:43:27 <mhayden> hooray
16:43:35 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1736702
16:43:36 <openstack> Launchpad bug 1736702 in openstack-ansible "TMOUT declared two times after upgrade in /etc/profile" [Undecided,New]
16:43:38 <mhayden> sounds like evrardjp wants me to move on and work on something else
16:43:40 <mhayden> :)
16:43:49 <mhayden> ugh yes
16:44:02 <mhayden> does ansible have a capacity to remove an entire block?
16:44:04 <evrardjp> thanks bro.
16:44:13 <mhayden> i'll take that one and do some experimentation
16:44:14 <evrardjp> yes, blockinfile state absent?
16:44:37 <mhayden> i'll do some testing/tinkering there
16:44:38 <evrardjp> you just use the same task, copy pasta, replace state present with absent, and change the marker. BOOM
16:44:49 <evrardjp> mhayden: thanks!
16:44:54 * mhayden gets out markers and crayons
16:44:54 <evrardjp> confirmed med
16:45:11 <odyssey4me> mhayden evrardjp there was a patch that cloudnull did which converts all the networking to networkd
16:45:14 <evrardjp> where is the love for paint!
16:45:15 <evrardjp> ?
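[Editor's note] The fix evrardjp describes — the same blockinfile task with state flipped to absent and the old marker kept — could look roughly like this; the marker text is an assumption about what the previous release wrote, not a quote from the role.

```yaml
# Sketch: remove the stale TMOUT block written under the old marker so that
# only the block under the new marker remains in /etc/profile.
- name: Remove TMOUT block left behind by the previous marker
  blockinfile:
    path: /etc/profile
    marker: "# {mark} ANSIBLE MANAGED BLOCK"   # the OLD marker (assumed)
    state: absent
```

With state: absent, blockinfile deletes everything between the matching begin/end marker lines, including the markers themselves, which is exactly the "remove an entire block" capability mhayden asks about.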
16:45:27 <mhayden> MS Paint is deprecated
16:45:36 * mhayden stops ruining evrardjp's meeting ;)
16:45:41 <evrardjp> odyssey4me: that's a good potential breakage too, but that was yesteryear's conversation!
16:45:55 <evrardjp> mhayden: everything is fine
16:46:00 <evrardjp> (fire) (fire)
16:46:17 <evrardjp> we are almost at a third of the bugs
16:46:20 <evrardjp> great
16:46:21 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1735709
16:46:22 <openstack> Launchpad bug 1735709 in openstack-ansible "Grub authentication is not applied on Fedora 26 and Ubuntu 16.04 (at least)" [Undecided,New] - Assigned to Major Hayden (rackerhacker)
16:46:24 <cloudnull> odyssey4me: https://review.openstack.org/#/c/523218/
16:46:32 <evrardjp> OH WAIT, guess what?
16:46:33 <evrardjp> :p
16:46:38 <cloudnull> sadly i have to circle back on that for suse
16:46:53 <evrardjp> odyssey4me: don't feed the troll.
16:46:56 <evrardjp> :p
16:47:25 <evrardjp> mhayden: status there?
16:47:38 <evrardjp> should we care?
16:47:47 <mhayden> nothing yet
16:47:50 <mhayden> but this isn't critical
16:48:02 <mhayden> most people i know are horrified about grub auth on boot
16:48:05 <mhayden> but it would be nice to fix it
16:48:48 <evrardjp> confirmed low?
16:48:51 <evrardjp> or still not confirmed?
16:49:50 <odyssey4me> or just take it out and say, do it if you wanna
16:49:55 <odyssey4me> :p
16:50:28 <evrardjp> leaving it as is then :p
16:50:39 <evrardjp> next
16:50:41 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1731186
16:50:42 <openstack> Launchpad bug 1731186 in openstack-ansible "vpnaas_agent.ini wrong config" [Undecided,New]
16:52:21 <evrardjp> should I mark this as incomplete?
16:52:27 <evrardjp> I am not sure I understood this one
16:52:45 <cloudnull> evrardjp: Adri2000: I missed the message before, what is it i need to do?
16:53:01 * cloudnull can wait for after the meeting
16:53:26 <cloudnull> wasn't vpnaas removed?
16:53:43 <evrardjp> cloudnull: fixing the backport that forces restarts of all containers even on minor updates. odyssey4me is assignee for now, but you should work with him :D
16:53:55 <cloudnull> ok
16:53:59 <evrardjp> not so sure but now that you remind me that
16:54:23 <evrardjp> I think bgpvpn aaS was talked about, etc.
16:54:35 <evrardjp> so maybe it's worth saying incomplete, and maybe won't fix?
16:54:36 <cloudnull> evrardjp: odyssey4me: doesn't https://review.openstack.org/#/c/527256 fix that?
16:55:04 <evrardjp> we can talk about the approach after the meeting
16:55:07 <cloudnull> ok
16:55:17 <evrardjp> ok let's wrap it up then
16:55:34 <evrardjp> I am marking this vpnaas as incomplete
16:55:43 <evrardjp> next
16:55:45 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1729525
16:55:46 <openstack> Launchpad bug 1729525 in openstack-ansible "Can't run bootstrap_ansible with encrypted user_secrets.yml (prompting for password)" [Undecided,New]
16:56:43 <evrardjp> ok I think it's fixed
16:57:12 <evrardjp> we should mark it as invalid on the next bug triage
16:57:14 <evrardjp> so
16:57:29 <evrardjp> new topic!
16:57:34 <evrardjp> #topic next week bug triage
16:57:49 <evrardjp> can someone handle the bug triage for the next week?
16:58:04 <evrardjp> I am not available
16:58:21 <cloudnull> I will be out as well
16:58:22 <evrardjp> Starting from tomorrow I won't be available at all until next year.
16:58:25 <cloudnull> ++
16:58:34 <evrardjp> ok
16:58:41 <evrardjp> anyone else to run the meeting?
16:59:06 <evrardjp> if nobody can run it, we can cancel it, and I post a mail on the ML
16:59:23 <evrardjp> ok for everyone?
16:59:29 <cloudnull> sounds good
16:59:42 <evrardjp> ok thanks everyone then!
16:59:47 <evrardjp> #endmeeting