16:01:49 <evrardjp> #startmeeting openstack_ansible_meeting
16:01:51 <openstack> Meeting started Tue Feb  6 16:01:49 2018 UTC and is due to finish in 60 minutes.  The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:54 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:02:13 <evrardjp> #topic rollcall
16:02:15 <prometheanfire> o/
16:02:36 <evrardjp> nobody else?
16:03:11 <mgariepy> i'm half-here.
16:03:22 <odyssey4me> o/
16:03:38 <evrardjp> let's start, 3.5 people :)
16:03:43 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:03:44 <openstack> Launchpad bug 1747629 in openstack-ansible "A worker was found in dead state" [Undecided,New]
16:03:49 <evrardjp> #topic bugs
16:03:55 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:04:08 <openstackgerrit> Kevin Carter (cloudnull) proposed openstack/openstack-ansible master: Ensure neutron agents & rabbitmq do not restart when upgrading  https://review.openstack.org/541320
16:04:11 <odyssey4me> mgariepy https://media.giphy.com/media/xT9IgN8YKRhByRBzMI/giphy.gif ;)
16:04:12 <cloudnull> mgariepy: ^
16:04:20 <cloudnull> evrardjp: i'm here this morning :)
16:04:26 <evrardjp> woot
16:04:36 <cloudnull> 4.5 people :)
16:04:37 <evrardjp> ok so the first one is interesting
16:04:41 <cloudnull> more like 3.75
16:04:43 <cloudnull> :P
16:04:46 <spotz> sorta here, ping if needed
16:05:16 <evrardjp> it doesn't seem to reproduce on my machine, but reliably happens in the gates. It seems we have dead workers after a certain time in the translations jobs
16:05:17 <odyssey4me> hmm, that worker dead state issue is something I've seen when the host runs out of memory
16:05:34 <odyssey4me> ansible basically just croaks
16:05:37 <evrardjp> odyssey4me: indeed, I've seen that too
16:05:51 <evrardjp> so maybe there is something to reduce memory consumption for that job
16:06:46 <mgariepy> cloudnull, reviewed ;)
16:06:58 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_neutron master: add ml2 config for networking bgpvpn  https://review.openstack.org/522598
16:07:24 <evrardjp> is there anyone that wants to confirm this, work on it?
16:07:45 <evrardjp> if not I'll move to the next bug
16:08:00 <evrardjp> ok next
16:08:01 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747628
16:08:02 <openstack> Launchpad bug 1747628 in openstack-ansible "Upgrades to Queens are broken due to new container scaffolding" [Undecided,In progress] - Assigned to Kevin Carter (kevin-carter)
16:08:09 <evrardjp> cloudnull: could you have a look?
16:08:11 <evrardjp> great!
16:08:15 <evrardjp> thanks
16:08:26 <evrardjp> must be something real quick to fix
16:08:32 <evrardjp> let's move on
16:08:32 <cloudnull> evrardjp: already done.
16:08:36 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747350
16:08:38 <prometheanfire> lol
16:08:38 <openstack> Launchpad bug 1747350 in openstack-ansible "openstack-ansible failed on ubuntu 16.04 aio " [Undecided,New]
16:08:54 <cloudnull> https://review.openstack.org/#/c/541315/
16:08:58 <evrardjp> cloudnull: yeah I've seen, thanks.
16:08:58 <cloudnull> evrardjp: ^
16:09:12 <odyssey4me> that failure is not a failure
16:09:19 <odyssey4me> it's a try/rescue block
16:10:10 <evrardjp> it looks invalid
16:10:13 <evrardjp> indeed
16:10:18 <odyssey4me> I've commented
16:10:31 <cloudnull> ++ odyssey4me with https://review.openstack.org/#/q/Ic54a7524c09a170e20830c5f8d2c2a0658159ed0
16:10:37 <odyssey4me> That, and there is still the set of rabbitmq try/rescue blocks that causes confusion.
16:10:45 <andymccr> invalid!
16:10:59 <cloudnull> hopefully it'll be clearer when the playbooks fail or succeed
16:11:16 <cloudnull> that is until ansible 2.5 is released and the try/except ux is better
16:12:09 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747313
16:12:09 <openstack> Launchpad bug 1747313 in openstack-ansible "openstack-ansible failed to build AIO" [Undecided,New]
16:12:10 <odyssey4me> Ja, added that comment to the bug. I'm not a fan of that set of patches - I would far rather have us rework that try/rescue block into something that doesn't require a task failure.
16:12:16 <cloudnull> cool
16:12:25 <odyssey4me> But meh, I don't really have the time to spend on it.
16:12:25 <evrardjp> we've all commented on that bug. nice!
16:12:57 <evrardjp> odyssey4me: that's what we should probably do indeed. Because the rescue is meant as a rescue :)
16:13:05 <evrardjp> last resort stuff, not expected failures.
16:13:08 <evrardjp> anyway
16:13:11 <evrardjp> next one is selinux
16:13:14 <evrardjp> https://bugs.launchpad.net/openstack-ansible/+bug/1747313
16:13:15 <openstack> Launchpad bug 1747313 in openstack-ansible "openstack-ansible failed to build AIO" [Undecided,New]
16:13:28 <odyssey4me> this looks kinda like a dup of the last one, except for the selinux thing
16:13:41 <odyssey4me> I thought that we had selinux things in the ansible bootstrap though
16:13:54 <odyssey4me> in this one the failure is again not actually a failure
16:14:19 <evrardjp> let's check real quick
16:14:19 <odyssey4me> ah, except for the "Check for unlabeled device files" task
16:14:28 <evrardjp> wait
16:14:33 <evrardjp> openstack-ansible-security
16:14:37 <evrardjp> that's old.
16:15:04 <odyssey4me> oh, good catch there
16:15:13 <odyssey4me> it's tagged with the beta, but doesn't look beta related
16:15:23 <evrardjp> no it's not.
16:15:26 <evrardjp> that's not possible.
16:15:37 <evrardjp> will comment on the bug
16:16:20 <evrardjp> marked it as incomplete
16:16:29 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1746142
16:16:30 <openstack> Launchpad bug 1746142 in openstack-ansible "Nova uid/gid sync, default/standards" [Undecided,New]
16:17:23 <odyssey4me> The reason it's not defaulted is exactly for the reason he describes... it will break an upgrade
16:17:23 <evrardjp> here it's a question of: is that a wishlist, or a real bug? (think about people using shared storage)
16:17:53 <cloudnull> wishlist
16:18:04 <odyssey4me> both the cinder and nova roles implement very broad access rights to the storage folders so that the uid/gid should not matter
16:18:11 <odyssey4me> that's as far as I recall, at least
16:18:32 <odyssey4me> so if we implement a default, we have to implement a migration tool too so that upgrades work
16:18:32 <andymccr> yeah wishlist
16:18:42 <odyssey4me> so yeah, that's a feature request - and a hard one to do
16:18:48 <andymccr> imo just dont set the uid/gid why would you need to? unless you are doing a new deploy
16:19:38 <odyssey4me> andymccr on a shared storage system (like NFS), if each compute has a different nova uid, then stuff doesn't work right... but we work around that using very broad access settings which is quite horrible, really
16:20:17 <odyssey4me> we've long talked about having a global uid/gid map - even back in juno I remember chatting about it
16:20:19 <evrardjp> I think we all agree, and we can continue
16:20:32 <evrardjp> if someone wants a consistent uid they can set it beforehand
16:20:37 <odyssey4me> but the upgrade issue is always where it got stuck
16:21:09 <evrardjp> next
16:21:11 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745675
16:21:14 <openstack> Launchpad bug 1745675 in openstack-ansible "aide database file is missing" [Undecided,New]
16:21:37 <evrardjp> mhayden: ?
16:21:43 <evrardjp> could you have a look at it?
16:22:25 <evrardjp> who is okay with the fact I assign that to mhayden ?
16:22:27 <evrardjp> :D
16:22:48 <odyssey4me> well, anyone could pick it up I guess - but mhayden would probably be interested :)
16:23:01 <evrardjp> yeah
16:23:11 <odyssey4me> looks like that process needs a little bit of TLC to make it more reliable
16:23:12 <evrardjp> I leave it as new for now. Just assigned mhayden for the incentive
16:23:24 <evrardjp> let's move on
16:23:25 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745361
16:23:26 <openstack> Launchpad bug 1745361 in openstack-ansible "Failed to create subvolume /var/lib/machines/ when run 'openstack-ansible setup-hosts.yml' multiple times" [Undecided,New]
16:24:22 <evrardjp> gokhan: was definitely unlucky on this one
16:24:30 <openstackgerrit> Kevin Carter (cloudnull) proposed openstack/openstack-ansible master: Ensure neutron agents & rabbitmq do not restart when upgrading  https://review.openstack.org/541320
16:24:41 <andymccr> hmm
16:25:00 <odyssey4me> Defer to cloudnull for that one.
16:25:15 <odyssey4me> There may already be a fix available - I remember seeing something about quotas and such.
16:25:43 <evrardjp> yeah maybe
16:25:49 <evrardjp> cloudnull: ?
16:25:57 <evrardjp> could you handle this one?
16:26:02 <cloudnull> yup, https://review.openstack.org/#/c/527592/
16:26:04 <evrardjp> triage it and take decisions?
16:26:22 <cloudnull> I need to update it, it seems, as reno is unhappy
16:26:44 <evrardjp> great
16:26:59 <evrardjp> yeah I've quickly reviewed to link both the bug and the review.
16:27:06 <evrardjp> let's continue
16:27:08 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745287
16:27:09 <openstack> Launchpad bug 1745287 in openstack-ansible "ceph-mon : collect admin and bootstrap keys fails on CentOS7" [Undecided,New]
16:27:11 <andymccr> cloudnull knocking it out the park!
16:27:19 <odyssey4me> cloudnull need a closes-bug in that commit msg too then
16:27:27 <cloudnull> odyssey4me: ++ will add
16:28:27 <mhayden> evrardjp: sorry -- got caught in something... feel free to assign
16:28:30 <andymccr> hmm that is weird
16:28:32 <odyssey4me> looks like it failed to collect the data it needs from the mons
16:28:43 <odyssey4me> connectivity, or perhaps ssh key issue?
16:28:52 <evrardjp> it looks like a problem with quorum?
16:29:07 <evrardjp> but on top of that unable to find a keyring on /etc/ceph/ceph.client.admin.keyring
16:29:08 <odyssey4me> or yes, perhaps actually a ceph cluster issue
16:29:26 <odyssey4me> sounds to me like the cluster setup wasn't complete
16:29:48 <andymccr> well it's a task from the ceph-mon role in ceph-ansible that is failing
16:29:53 <evrardjp> andymccr: does that look right to you?  "addr": "172.29.236.177:6789/0",
16:30:26 <evrardjp> that doesn't look nice
16:30:27 <andymccr> im sure its the mon service
16:30:30 <andymccr> not sure what the /0 is
16:30:38 <evrardjp> or "addr": "0.0.0.0:0/1",
16:31:29 <jamespage> odyssey4me: pylxd 2.0.6 is now on pypi
16:31:40 <odyssey4me> jamespage brilliant, thank you very much!
16:31:52 <andymccr> the /0 is fine
16:32:12 <andymccr> "extra_probe_peers": [
16:32:12 <andymccr> "172.29.236.32:6789/0",
16:32:12 <andymccr> "172.29.236.33:6789/0"
16:32:16 <andymccr> for e.g. on my test aio right now i have ^
16:32:21 <andymccr> and that succeeded fine
16:32:40 <evrardjp> the configuration doesn't look that bad on the given tar file
16:32:55 <evrardjp> jamespage: \o/
16:33:10 <jamespage> tinwood: ^^ take the praise (not my work)
16:33:22 <evrardjp> tinwood: \o/ too!
16:33:27 <tinwood> thanks
16:33:30 <evrardjp> andymccr: what's your take?
16:33:33 <evrardjp> on the bug
16:33:36 <andymccr> http://tracker.ceph.com/issues/21427 I'd follow that
16:33:53 <andymccr> it is failing on a specific task, ceph-create-keys i think. it would help if we could get more info
16:33:58 <andymccr> like what does a manual run output, etc.
16:34:10 <andymccr> it's hard to say if it's something set up in our deploys or a ceph-ansible bug or a ceph bug overall
16:35:03 <evrardjp> could you comment on the bug to ask for the additional info, please?
16:35:11 <evrardjp> I'll mark it as incomplete
16:35:55 <evrardjp> ok let's move on
16:35:57 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745281
16:35:58 <openstack> Launchpad bug 1745281 in openstack-ansible "galera_server : Create galera users fails on CentOS7" [Undecided,New]
16:36:24 <evrardjp> this proves how centos is broken :/
16:36:41 <evrardjp> or how the deployer's environment is broken
16:36:45 <evrardjp> :D
16:36:55 <andymccr> will do
16:37:10 <andymccr> it also shows ppl kinda like centos :P
16:37:46 <evrardjp> I will tackle this one I guess, except if mgariepy has the time to do it between two things?
16:38:52 <openstackgerrit> Kevin Carter (cloudnull) proposed openstack/openstack-ansible-lxc_hosts master: Clean-up old systemd prep and allow machinctl to grow  https://review.openstack.org/527592
16:38:53 <idlemind> +1 centos, i haven't seen this one yet but i only have a single galera container atm (fail on me i know right)
16:39:52 <evrardjp> good point idlemind , I have to try with a cluster.
16:39:55 <evrardjp> let's move on
16:40:07 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745270
16:40:08 <openstack> Launchpad bug 1745270 in openstack-ansible "unable to connect to epmd (port 4369) on CentOS7" [Undecided,New]
16:40:40 <evrardjp> now I am thinking this whole series is a networking issue :p
16:41:04 <evrardjp> let's move on
16:41:06 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745215
16:41:07 <openstack> Launchpad bug 1745215 in openstack-ansible "Every openstack client is built in the repo build" [Undecided,New]
16:42:16 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible stable/newton: Update os_nova and repo_build role SHA's  https://review.openstack.org/541259
16:43:15 <cloudnull> evrardjp: seems like not a bug, given the mentioned refactor ?
16:43:28 <odyssey4me> yeah, feature request
16:43:43 <odyssey4me> it's a bit painful, but not a bug - it's due to heat and that's life
16:44:00 <odyssey4me> if you don't use heat, remove it from your config and life will be awesome again :)
16:44:23 <openstackgerrit> Major Hayden proposed openstack/openstack-ansible-tests master: Add a status line for SELinux status  https://review.openstack.org/541371
16:45:01 <odyssey4me> I mean, we could propose patches to the heat client repo, which would be nice - if someone has the time and inclination, awesome
16:45:24 <evrardjp> yeah. In the meantime it's wishlist I'd say, and confirmed.
16:45:29 <odyssey4me> yep
16:45:40 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745212
16:45:41 <openstack> Launchpad bug 1745212 in openstack-ansible "default_bind_mount_logs changes on N>O upgrade" [Undecided,New]
16:46:24 <evrardjp> we can mark this fix released I guess?
16:46:28 <odyssey4me> looks like it
16:46:53 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1744458
16:46:54 <openstack> Launchpad bug 1744458 in openstack-ansible "Failed to build cradox on CentOS7 " [Undecided,New]
16:47:01 <odyssey4me> unless it's not actually solved? mgariepy - I see the reviews were marked 'related', not 'closes'
16:47:27 <odyssey4me> I think this was solved recently
16:47:59 <odyssey4me> https://review.openstack.org/#/c/530570/
16:48:01 <evrardjp> Yes I think it looks solved
16:48:32 <evrardjp> fix released
16:48:35 <evrardjp> woot
16:48:38 <mgariepy> hrm , sorry was the time i was half - not here haha :)
16:48:41 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1743032
16:48:42 <openstack> Launchpad bug 1743032 in openstack-ansible "Galera cluster maintenance in OpenStack-Ansible" [Undecided,New] - Assigned to Kevin Carter (kevin-carter)
16:49:22 <evrardjp> cloudnull: do you have time to look at this one, or should we unassign it?
16:49:40 <nurdie_> https://docs.openstack.org/openstack-ansible-ceph_client/latest  <-- is that just a guide on what to include when making our own ceph-install play? So, for instance, this could replace the ceph-install play that gets called from /opt/openstack-ansible/playbooks/setup-infrastructure.yml?
16:50:03 <nurdie_> for integration with an existing ceph cluster
16:50:05 <evrardjp> nurdie_: let's talk after the bug triage, in 10 minutes.
16:50:14 <nurdie_> evrardjp, thanks!
16:50:15 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/newton: Revert "Switch the LXD compute driver test to non-voting"  https://review.openstack.org/541372
16:51:08 <cloudnull> evrardjp:  I can look into that today
16:51:18 <evrardjp> that's great!
16:51:19 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/newton: Revert "Switch the LXD compute driver test to non-voting"  https://review.openstack.org/541372
16:51:25 <evrardjp> next
16:51:27 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1739472
16:51:28 <openstack> Launchpad bug 1739472 in openstack-ansible "mariadb-client excluded in ocata centos7 deploy" [Undecided,New] - Assigned to Markos Chandras (hwoarang)
16:52:11 <evrardjp> hwoarang: said he cannot attend the meeting today, so let's postpone the discussion of this, unless someone has anything to say?
16:52:34 <evrardjp> ok let's move on then
16:52:38 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1737827
16:52:39 <openstack> Launchpad bug 1737827 in openstack-ansible "(ceph-client): setting 'nova_ceph_client' results in deployment where volumes can't be attached to VMs" [Undecided,New]
16:53:07 <odyssey4me> that's an old bug IIRC
16:53:29 <odyssey4me> what I mean is, I've seen that discussed in channel before - I think admin0 made some noise about it
16:53:41 <andymccr> ahh i got feedback on that
16:54:09 <evrardjp> odyssey4me: yeah, I am sure I got that biting me too in the past... but things might have changed now :)
16:54:12 <evrardjp> andymccr: ?
16:54:24 <andymccr> lemme look at the comments
16:54:37 <andymccr> i'll take that one
16:54:40 <evrardjp> ok
16:54:50 <evrardjp> assigning it to you
16:54:58 <evrardjp> thanks
16:55:05 <evrardjp> next
16:55:07 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1729263
16:55:09 <openstack> Launchpad bug 1729263 in openstack-ansible "Swift (master) transient tempest failures under centos " [Undecided,New]
16:55:20 <evrardjp> let's skip that one
16:55:25 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1721554
16:55:26 <openstack> Launchpad bug 1721554 in openstack-ansible "os_ceilometer fails without swift installed" [Medium,New]
16:56:01 <evrardjp> I have the impression there was a patch for this one
16:57:01 <evrardjp> mnaser: are you there?
16:57:06 <mnaser> o/
16:57:24 * mnaser reads that bug
16:58:06 <evrardjp> thanks mnaser !
16:58:49 <mnaser> okay it looks like this is something that should be skipped if not using swift
16:58:50 <evrardjp> If you are working on removing mongo there are maybe other things on that topic that can cause issues, and things to be aware of...
16:58:57 <evrardjp> thanks for having a look!
16:59:46 <evrardjp> if you need any help, don't hesitate mnaser !
17:00:00 <evrardjp> ok we are out of time
17:00:13 <evrardjp> thanks everyone for your help on making the world of openstack-ansible better
17:00:22 <evrardjp> #endmeeting