16:01:49 #startmeeting openstack_ansible_meeting
16:01:51 Meeting started Tue Feb 6 16:01:49 2018 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:54 The meeting name has been set to 'openstack_ansible_meeting'
16:02:13 #topic rollcall
16:02:15 o/
16:02:36 nobody else?
16:03:11 i'm half-here.
16:03:22 o/
16:03:38 let's start, 3.5 people :)
16:03:43 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:03:44 Launchpad bug 1747629 in openstack-ansible "A worker was found in dead state" [Undecided,New]
16:03:49 #topic bugs
16:03:55 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:04:08 Kevin Carter (cloudnull) proposed openstack/openstack-ansible master: Ensure neutron agents & rabbitmq do not restart when upgrading https://review.openstack.org/541320
16:04:11 mgariepy https://media.giphy.com/media/xT9IgN8YKRhByRBzMI/giphy.gif ;)
16:04:12 mgariepy: ^
16:04:20 evrardjp: i'm here this morning :)
16:04:26 woot
16:04:36 4.5 people :)
16:04:37 ok so the first one is interesting
16:04:41 more like 3.75
16:04:43 :P
16:04:46 sorta here, ping if needed
16:05:16 it doesn't seem to reproduce in my machine, but reliably happens in the gates. It seems we have dead workers after a certain time in the translations jobs
16:05:17 hmm, that worker dead state issue is something I've seen when the host runs out of memory
16:05:34 ansible basically just croaks
16:05:37 odyssey4me: indeed, I've seen that too
16:05:51 so maybe there is something to reduce memory consumption for that job
16:06:46 cloudnull, reviewed ;)
16:06:58 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_neutron master: add ml2 config for networking bgpvpn https://review.openstack.org/522598
16:07:24 is there anyone that wants to confirm this, work on it?
16:07:45 if not I'll move to the next bug
16:08:00 ok next
16:08:01 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747628
16:08:02 Launchpad bug 1747628 in openstack-ansible "Upgrades to Queens are broken due to new container scaffolding" [Undecided,In progress] - Assigned to Kevin Carter (kevin-carter)
16:08:09 cloudnull: could you have a look?
16:08:11 great!
16:08:15 thanks
16:08:26 must be something real quick to fix
16:08:32 let's move on
16:08:32 evrardjp: already done.
16:08:36 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747350
16:08:38 lol
16:08:38 Launchpad bug 1747350 in openstack-ansible "openstack-ansible failed on ubuntu 16.04 aio " [Undecided,New]
16:08:54 https://review.openstack.org/#/c/541315/
16:08:58 cloudnull: yeah I've seen, thanks.
16:08:58 evrardjp: ^
16:09:12 that failure is not a failure
16:09:19 it's a try/rescue block
16:10:10 it looks invalid
16:10:13 indeed
16:10:18 I've commented
16:10:31 ++ odyssey4me with https://review.openstack.org/#/q/Ic54a7524c09a170e20830c5f8d2c2a0658159ed0
16:10:37 That, and there is still the set of rabbitmq try/rescue blocks that causes confusion.
16:10:45 invalid!
16:10:59 hopefully it'll be clearer when the playbooks fail or succeed
16:11:16 that is until ansible 2.5 is released and the try/except ux is better
16:12:09 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747313
16:12:09 Launchpad bug 1747313 in openstack-ansible "openstack-ansible failed to build AIO" [Undecided,New]
16:12:10 Ja, added that comment to the bug. I'm not a fan of that set of patches - I would far rather have us rework that try/rescue block into something that doesn't require a task failure.
16:12:16 cool
16:12:25 But meh, I don't really have the time to spend on it.
16:12:25 we've all commented on that bug. nice!
16:12:57 odyssey4me: that's what we should probably do indeed. Because the rescue is meant as a rescue :)
16:13:05 last resort stuff, not expected failures.
16:13:08 anyway
16:13:11 next one is selinux
16:13:14 https://bugs.launchpad.net/openstack-ansible/+bug/1747313
16:13:15 Launchpad bug 1747313 in openstack-ansible "openstack-ansible failed to build AIO" [Undecided,New]
16:13:28 this looks kinda like a dup of the last one, except for the selinux thing
16:13:41 I thought that we had selinux things in the ansible bootstrap though
16:13:54 in this one the failure is again not actually a failure
16:14:19 let's check real quick
16:14:19 ah, except for the "Check for unlabeled device files" task
16:14:28 wait
16:14:33 openstack-ansible-security
16:14:37 that's old.
16:15:04 oh, good catch there
16:15:13 it's tagged with the beta, but doesn't look beta related
16:15:23 no it's not.
16:15:26 that's not possible.
16:15:37 will comment on the bug
16:16:20 marked it as incomplete
16:16:29 #link https://bugs.launchpad.net/openstack-ansible/+bug/1746142
16:16:30 Launchpad bug 1746142 in openstack-ansible "Nova uid/gid sync, default/standards" [Undecided,New]
16:17:23 The reason it's not defaulted is exactly for the reason he describes... it will break an upgrade
16:17:23 here it's a question of: is that a wishlist, or a real bug? (think about people using shared storage)
16:17:53 wishlist
16:18:04 both the cinder and nova roles implement very broad access rights to the storage folders so that the uid/gid should not matter
16:18:11 that's as far as I recall, at least
16:18:32 so if we implement a default, we have to implement a migration tool too so that upgrades work
16:18:32 yeah wishlist
16:18:42 so yeah, that's a feature request - and a hard one to do
16:18:48 imo just don't set the uid/gid why would you need to? unless you are doing a new deploy
16:19:38 andymccr on a shared storage system (like NFS), if each compute has a different nova uid, then stuff doesn't work right... but we work around that using very broad access settings which is quite horrible, really
16:20:17 we've long talked about having a global uid/gid map - even back in juno I remember chatting about it
16:20:19 I think we all agree, and we can continue
16:20:32 if someone wants consistent uid he can do it beforehand
16:20:37 but the upgrade issue is always where it got stuck
16:21:09 next
16:21:11 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745675
16:21:14 Launchpad bug 1745675 in openstack-ansible "aide database file is missing" [Undecided,New]
16:21:37 mhayden: ?
16:21:43 could you have a look at it?
16:22:25 who is okay with the fact I assign that to mhayden ?
16:22:27 :D
16:22:48 well, anyone could pick it up I guess - but mhayden would probably be interested :)
16:23:01 yeah
16:23:11 looks like that process needs a little bit of TLC to make it more reliable
16:23:12 I leave it as new for now. Just assigned mhayden for the incentive
16:23:24 let's move on
16:23:25 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745361
16:23:26 Launchpad bug 1745361 in openstack-ansible "Failed to create subvolume /var/lib/machines/ when run 'openstack-ansible setup-hosts.yml' multiple times" [Undecided,New]
16:24:22 gokhan: was definitely unlucky on this one
16:24:30 Kevin Carter (cloudnull) proposed openstack/openstack-ansible master: Ensure neutron agents & rabbitmq do not restart when upgrading https://review.openstack.org/541320
16:24:41 hmm
16:25:00 Defer to cloudnull for that one.
16:25:15 There may already be a fix available - I remember seeing something about quotas and such.
16:25:43 yeah maybe
16:25:49 cloudnull: ?
16:25:57 could you handle this one?
16:26:02 yup, https://review.openstack.org/#/c/527592/
16:26:04 triage it and take decisions?
16:26:22 I need to update it seems as reno is unhappy
16:26:44 great
16:26:59 yeah I've quickly reviewed to link both the bug and the review.
16:27:06 let's continue
16:27:08 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745287
16:27:09 Launchpad bug 1745287 in openstack-ansible "ceph-mon : collect admin and bootstrap keys fails on CentOS7" [Undecided,New]
16:27:11 cloudnull knocking it out the park!
16:27:19 cloudnull need a closes-bug in that commit msg too then
16:27:27 odyssey4me: ++ will add
16:28:27 evrardjp: sorry -- got caught in something... feel free to assign
16:28:30 hmm that is weird
16:28:32 looks like it failed to collect the data it needs from the mons
16:28:43 connectivity, or perhaps ssh key issue?
16:28:52 it looks like a problem with quorum?
16:29:07 but on top of that unable to find a keyring on /etc/ceph/ceph.client.admin.keyring
16:29:08 or yes, perhaps actually a ceph cluster issue
16:29:26 sounds to me like the cluster setup wasn't complete
16:29:48 well it's a ceph-mon role from ceph-ansible task that is failing
16:29:53 andymccr: does that look right to you? "addr": "172.29.236.177:6789/0",
16:30:26 that doesn't look nice
16:30:27 i'm sure it's the mon service
16:30:30 not sure what the /0 is
16:30:38 or "addr": "0.0.0.0:0/1",
16:31:29 odyssey4me: pylxd 2.0.6 is now on pypi
16:31:40 jamespage brilliant, thank you very much!
16:31:52 the /0 is fine
16:32:12 "extra_probe_peers": [
16:32:12 "172.29.236.32:6789/0",
16:32:12 "172.29.236.33:6789/0"
16:32:16 for e.g. on my test aio right now i have ^
16:32:21 and that succeeded fine
16:32:40 the configuration doesn't look that bad on the given tar file
16:32:55 jamespage: \o/
16:33:10 tinwood: ^^ take the praise (not my work)
16:33:22 tinwood: \o/ too!
16:33:27 thanks
16:33:30 andymccr: what's your take?
16:33:33 on the bug
16:33:36 http://tracker.ceph.com/issues/21427 i'd follow that
16:33:53 it is failing on a specific task - ceph-create-keys i think if we could get more info
16:33:58 like what does a manual run output. etc
16:34:10 it's hard to say if it's something setup in our deploys or a ceph-ansible bug or a ceph bug overall
16:35:03 could you comment on the bug to ask for the additional info, please?
16:35:11 I'll mark it as incomplete
16:35:55 ok let's move on
16:35:57 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745281
16:35:58 Launchpad bug 1745281 in openstack-ansible "galera_server : Create galera users fails on CentOS7" [Undecided,New]
16:36:24 this proves how centos is broken :/
16:36:41 or on how the deployer's environment is broken
16:36:45 :D
16:36:55 will do
16:37:10 it also shows ppl kinda like centos :P
16:37:46 I will tackle this one I guess, except if mgariepy has the time to do it between two things?
16:38:52 Kevin Carter (cloudnull) proposed openstack/openstack-ansible-lxc_hosts master: Clean-up old systemd prep and allow machinctl to grow https://review.openstack.org/527592
16:38:53 +1 centos, i haven't seen this one yet but i only have a single galera container atm (fail on me i know right)
16:39:52 good point idlemind , I have to try with a cluster.
16:39:55 let's move on
16:40:07 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745270
16:40:08 Launchpad bug 1745270 in openstack-ansible "unable to connect to epmd (port 4369) on CentOS7" [Undecided,New]
16:40:40 now I am thinking this whole series is a networking issue :p
16:41:04 let's move on
16:41:06 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745215
16:41:07 Launchpad bug 1745215 in openstack-ansible "Every openstack client is built in the repo build" [Undecided,New]
16:42:16 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible stable/newton: Update os_nova and repo_build role SHA's https://review.openstack.org/541259
16:43:15 evrardjp: seems like not a bug, given the mentioned refactor ?
16:43:28 yeah, feature release
16:43:43 it's a bit painful, but not a bug - it's due to heat and that's life
16:44:00 if you don't use heat, remove it from your config and life will be awesome again :)
16:44:23 Major Hayden proposed openstack/openstack-ansible-tests master: Add a status line for SELinux status https://review.openstack.org/541371
16:45:01 I mean, we could propose patches to the heat client repo, which would be nice - if someone has the time and inclination, awesome
16:45:24 yeah. In the meantime it's wishlist I'd say, and confirmed.
16:45:29 yep
16:45:40 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745212
16:45:41 Launchpad bug 1745212 in openstack-ansible "default_bind_mount_logs changes on N>O upgrade" [Undecided,New]
16:46:24 we can mark this fix released I guess?
16:46:28 looks like it
16:46:53 #link https://bugs.launchpad.net/openstack-ansible/+bug/1744458
16:46:54 Launchpad bug 1744458 in openstack-ansible "Failed to build cradox on CentOS7 " [Undecided,New]
16:47:01 unless it's not actually solved? mgariepy - I see the reviews were marked 'related', not 'closes'
16:47:27 I think this was solved recently
16:47:59 https://review.openstack.org/#/c/530570/
16:48:01 Yes I think it looks solved
16:48:32 fix released
16:48:35 woot
16:48:38 hrm , sorry was the time i was half - not here haha :)
16:48:41 #link https://bugs.launchpad.net/openstack-ansible/+bug/1743032
16:48:42 Launchpad bug 1743032 in openstack-ansible "Galera cluster maintenance in OpenStack-Ansible" [Undecided,New] - Assigned to Kevin Carter (kevin-carter)
16:49:22 cloudnull: do you have time to look at this one, or should we unassign it?
16:49:40 https://docs.openstack.org/openstack-ansible-ceph_client/latest <-- is that just a guide on what to include when making our own ceph-install play? So, for instance, this could replace the ceph-install play that gets called from /opt/openstack-ansible/playbooks/setup-infrastructure.yml?
16:50:03 for integration with an existing ceph cluster
16:50:05 nurdie_: let's talk after the bug triage, in 10 minutes.
16:50:14 evrardjp, thanks!
16:50:15 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/newton: Revert "Switch the LXD compute driver test to non-voting" https://review.openstack.org/541372
16:51:08 evrardjp: I can look into that today
16:51:18 that's great!
16:51:19 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/newton: Revert "Switch the LXD compute driver test to non-voting" https://review.openstack.org/541372
16:51:25 next
16:51:27 #link https://bugs.launchpad.net/openstack-ansible/+bug/1739472
16:51:28 Launchpad bug 1739472 in openstack-ansible "mariadb-client excluded in ocata centos7 deploy" [Undecided,New] - Assigned to Markos Chandras (hwoarang)
16:52:11 hwoarang: said he cannot attend the meeting today, so let's postpone the discussion of this, unless someone has anything to say?
16:52:34 ok let's move on then
16:52:38 #link https://bugs.launchpad.net/openstack-ansible/+bug/1737827
16:52:39 Launchpad bug 1737827 in openstack-ansible "(ceph-client): setting 'nova_ceph_client' results in deployment where volumes can't be attached to VMs" [Undecided,New]
16:53:07 that's an old bug IIRC
16:53:29 what I mean is, I've seen that discussed in channel before - I think admin0 made some noise about it
16:53:41 ahh i got feedback on that
16:54:09 odyssey4me: yeah, I am sure I got that biting me too in the past... but things might have changed now :)
16:54:12 andymccr: ?
16:54:24 lemme look at the comments
16:54:37 i'll take that one
16:54:40 ok
16:54:50 assigning it to you
16:54:58 thanks
16:55:05 next
16:55:07 #link https://bugs.launchpad.net/openstack-ansible/+bug/1729263
16:55:09 Launchpad bug 1729263 in openstack-ansible "Swift (master) transient tempest failures under centos " [Undecided,New]
16:55:20 let's skip that one
16:55:25 #link https://bugs.launchpad.net/openstack-ansible/+bug/1721554
16:55:26 Launchpad bug 1721554 in openstack-ansible "os_ceilometer fails without swift installed" [Medium,New]
16:56:01 I have the impression there was a patch for this one
16:57:01 mnaser: are you there?
16:57:06 o/
16:57:24 * mnaser reads that bug
16:58:06 thanks mnaser !
16:58:49 okay it looks like this is something that should be skipped if not using swift
16:58:50 If you are working on removing mongo there are maybe other things on that topic that can cause issues, and things to be aware of...
16:58:57 thanks for having a look!
16:59:46 if you need any help, don't hesitate mnaser !
17:00:00 ok we are out of time
17:00:13 thanks everyone for your help on making the world of openstack-ansible better
17:00:22 #endmeeting