16:01:23 #startmeeting openstack_ansible_meeting
16:01:24 Meeting started Tue Aug 14 16:01:23 2018 UTC and is due to finish in 60 minutes. The chair is mnaser. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:28 The meeting name has been set to 'openstack_ansible_meeting'
16:01:36 evrardjp: yep, but the question was about agenda location
16:01:40 nevermind)
16:01:41 #topic rollcall
16:01:49 o/
16:02:01 o/
16:02:18 o/
16:02:33 o/
16:02:34 o/
16:02:55 #topic last week highlights
16:03:15 i apologize, i missed the last meeting; could anyone update on which items are stale and which were added?
16:03:31 inside https://wiki.openstack.org/wiki/Meetings/openstack-ansible#Meeting_section_.22Last_week_highlights.22
16:03:39 I haven't changed those, sorry
16:03:50 no worries, they all still seem relevant
16:03:51 let me update them real quick
16:04:04 jrosser has put in a whole lot of work on bionic
16:04:46 still hacking around https://review.openstack.org/#/c/591287/ afaik
16:05:07 o/ it's very close; i have a review enabling a non-voting test on the integrated build, which gets as far as the usual tempest failure
16:05:11 mnaser: updated.
16:05:34 please help with reviews and any work that jrosser needs help with.
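[Editor's note: a minimal sketch of the kind of Zuul entry jrosser describes, adding the bionic job to the integrated build's check pipeline as non-voting while it stabilises. The job name below is hypothetical, not taken from the actual review.]

    # .zuul.yaml (sketch): run the bionic AIO job non-voting in check
    - project:
        check:
          jobs:
            - openstack-ansible-deploy-aio_lxc-ubuntu-bionic:
                voting: false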
16:06:09 next: leap15 doesn't seem to have progressed; it would be good to get some updates to know if we're close or not, because we have these always-failing jobs
16:06:34 so we want to be good citizens of infra and avoid having a forever-failing job
16:07:18 mnaser: there were issues in infra on leap 15, and mariadb started to move this morning
16:07:26 okay cool, so we have progress, that's awesome
16:07:29 so it's slowly progressing
16:07:38 evrardjp: RC1 published. Branching happened, but we need a freeze in OSA. I am tracking this.
16:07:48 We also have the transient issues with cinder. I didn't take a deep look into it yet, but I intend to.
16:08:13 sounds good
16:08:17 I'm on mobile, so I won't find the patch easily
16:08:22 guilhermesp: maybe the fix jrosser is helping with. It helped me at least. But we need to track whether this happens again.
16:08:46 i assume the freeze is https://review.openstack.org/#/c/590503/
16:08:49 but that's a good highlight, thanks guilhermesp!
16:09:03 mnaser: yes
16:09:05 so it looks like it's going through the gates right now
16:09:17 and then
16:09:18 evrardjp: please fill in this etherpad https://etherpad.openstack.org/p/osa-stein-ptg
16:09:20 mnaser: indeed. The extra part will be to check when tempest is released
16:09:39 then we can use tempest from pypi in stable/rocky
16:09:47 evrardjp: solid!
16:10:00 if you're attending (or not) the ptg, please add topic discussions. i would love for us to find a way to get folks who can't make it into a google hangout or something
16:10:01 and last, we need to fix ceph-ansible, which hwoarang has a patch for
16:10:11 mnaser: ++
16:10:23 please add your conversations, questions, or anything there.
16:10:49 i think it would probably be good to send an email out in case anyone isn't around to see that, but i've been trying to push it on irc here and there
16:11:22 if we don't have any other things, we can jump to bug triage
16:12:07 agreed
16:12:13 I will send the email
16:12:17 evrardjp: cool, thank you
16:12:22 #topic bug triage
16:12:26 #link https://bugs.launchpad.net/openstack-ansible/+bug/1786292
16:12:26 Launchpad bug 1786292 in openstack-ansible "Integration of swift api with ceph RGW" [Critical,New]
16:12:30 i have some work on go-faster which we could talk about at the end
16:12:37 I will be afk, read the logs later. Seeya
16:12:52 sigh
16:12:54 guilhermesp: take care
16:13:01 jrosser: please, if you can! :)
16:13:06 that is *not* a critical bug
16:13:11 My reading is a misunderstanding between swift and radosgw / ceph
16:13:18 bgmccollum: i kinda agree.
16:13:25 "swift is still not using Ceph"
16:13:36 it never will
16:13:51 yep
16:13:52 *well, maybe not never*
16:13:56 he might mean swiftcli, but that is probably because it has 2 endpoints of the same type and may be default-picking the first one (swift)
16:14:02 never is a long time
16:14:13 yeah, both ceph radosgw and swift are deployed, which is never right
16:14:27 this is a user story
16:14:42 because there is subtlety here if you want to do S3 and swift at the same time
16:15:06 * jrosser adds to todo list
16:15:21 I could see using both, but very rarely
16:15:29 i mean those things served up by ceph/radosgw
16:15:56 well, right now it's probably the swift client not being sure about which endpoint to use
16:16:15 so i think the issue here is that a) you can't run both swift and ceph radosgw (at least, not supported and tested by OSA)
16:16:50 and i think that's it, considering we don't own the ceph-ansible playbooks and they create all the things
16:17:02 we create the endpoints etc
16:17:18 but yeah, having multiple endpoints of the same type won't work as expected.
16:17:33 andymccr: oh, well in that case maybe we should ensure: absent for the swift endpoints if radosgw is being deployed?
16:17:47 it might flip-flop if someone wants to deploy swift and ceph, but that's them doing something wrong
16:17:59 or maybe have the swift playbook die if radosgw is enabled, and vice versa
16:18:19 mnaser, i think we just change the port if swift exists (to 7980) - maybe we shouldn't set up the endpoints? or we should put it in a different region
16:18:37 that feels like a hack to get it to work though - i feel like if you really want that, your use case is quite specific and you should probably dictate how we do that
16:18:44 andymccr: i agree.
16:19:04 in this case i think there is confusion for sure though
16:19:13 so what do we feel like setting this to
16:19:49 i think the intent needs to be understood. if he thought swift had to be deployed to get the Swift API backed by Ceph (radosgw), then maybe a documentation addition to clear up the confusion...?
16:19:58 ^ yeah - we should ask for more info/feedback
16:20:12 bgmccollum: could you comment for more info?
16:20:18 sure
16:21:23 thank you bgmccollum
16:21:55 bgmccollum: you can put the status to incomplete too
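[Editor's note: a minimal sketch of the "have the swift playbook die if radosgw is enabled" option floated above; the group names swift_all and ceph-rgw are assumptions about the deployer's inventory, and this is not a merged change.]

    # Pre-flight guard (sketch): refuse to create duplicate object-store endpoints
    - name: Guard against deploying swift and ceph radosgw together
      hosts: localhost
      connection: local
      gather_facts: false
      tasks:
        - name: Fail when both swift and ceph radosgw hosts are defined
          assert:
            that:
              - not ((groups['swift_all'] | default([])) and (groups['ceph-rgw'] | default([])))
            msg: >-
              Both swift and ceph radosgw are in this inventory; two
              object-store endpoints of the same type are not supported.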
16:22:02 #link https://bugs.launchpad.net/openstack-ansible/+bug/1785592
16:22:02 Launchpad bug 1785592 in openstack-ansible "dynamic inventory doesn't handle is_metal: false->true change" [Undecided,New] - Assigned to Kevin Carter (kevin-carter)
16:22:02 yes
16:23:00 that does seem like a very plausible bug
16:23:05 i think it's because the inventory is not removed
16:23:23 evrardjp: hello
16:23:50 chandankumar: hello, we are in the bug triage meeting, can this wait?
16:23:55 evrardjp: sure
16:24:01 thanks
16:24:47 i guess it's already assigned to cloudnull
16:24:57 i will put this as confirmed because i know it's actually an issue
16:25:08 confirmed/high
16:26:59 #link https://bugs.launchpad.net/openstack-ansible/+bug/1785386
16:26:59 Launchpad bug 1785386 in openstack-ansible "Integration of Swift and Manila with Openstack-ansible having Ceph backend?" [Wishlist,New]
16:27:25 looks like this was taken care of last time by spotz
16:27:31 status is still new, that's probably why it showed up
16:27:36 so we can do confirmed/wishlist
16:28:00 seems okay?
16:28:03 Yeah, sorry!
16:28:43 next up
16:28:44 #link https://bugs.launchpad.net/openstack-ansible/+bug/1785365
16:28:44 Launchpad bug 1785365 in openstack-ansible "LXC container network slow because of kernel debug mesg" [Undecided,New]
16:29:18 thanks spotz
16:29:25 (sorry for the lag)
16:29:33 "Later i found iptables checksum-fill for port 80 rule was causing all issue."
16:29:37 looks like it's a user issue?
16:29:43 I think I fixed that.
16:29:54 it's both a user issue and a default thing for an AIO
16:30:03 I've pushed something, and it has a release note for users.
16:30:19 maybe I used the wrong Fixes-Bug.
16:30:21 evrardjp: got a link to include as a comment on that bug?
16:30:29 wow, that reminds me of something we found and fixed 2 years ago
16:30:45 my bad, none: https://review.openstack.org/#/c/589463/
16:30:59 mnaser: sorry for that.
16:31:02 yeah, will do
16:31:05 evrardjp: it's all good
16:33:31 updated
16:33:34 perfect, thank you
16:33:37 #link https://bugs.launchpad.net/openstack-ansible/+bug/1784880
16:33:37 Launchpad bug 1784880 in openstack-ansible "Pike to queen upgrade neutron issue" [Undecided,New]
16:34:01 i think this is a duplicate of the earlier one
16:34:09 duplicate of 1785592 i think
16:35:27 checking 1785592 then
16:35:42 not so sure if it's a duplicate
16:35:56 but they are all linked to moving to bare metal
16:36:14 this work was not fully QA-ed
16:36:15 it's an upgrade
16:36:23 pike => queens took agents to baremetal
16:36:23 yeah
16:36:26 so i think the same thing happened
16:36:27 sounds similar to me
16:36:32 they are similar
16:36:37 that's the right term
16:36:46 the context is the same, not sure what the root cause is
16:37:10 but indeed, if the inventory is wrong, it could lead to issues in the upgrade.
16:37:44 i think it's a duplicate imho
16:38:05 it's happening because is_metal changes and our inventory doesn't delete the old containers
16:38:07 let's mark it duplicate
16:38:15 I am fine with the duplicate.
16:38:39 done
16:38:45 #link https://bugs.launchpad.net/openstack-ansible/+bug/1783668
16:38:45 Launchpad bug 1783668 in openstack-ansible "Playbook openstack-service-setup does not run on MNAIO" [Undecided,New]
16:39:13 i dunno anything about that bug at all
16:40:01 idk why that would be different from a regular deploy
16:40:01 likewise
16:40:19 MNAIO is multinode all-in-one.. the one in the -ops repo?
16:40:29 yes
16:40:33 oh
16:40:35 wait
16:40:38 the playbooks must have been able to target the utility container during the deploy, so that's probably a reasonably obvious error somewhere
16:41:07 odyssey4me has been working on the mnaio recently
16:41:34 jrosser: well, I have asked questions
16:41:46 i think that's good
16:41:53 we'll see
16:41:55 that comment should get us somewhere from there
16:42:06 do we wanna mark incomplete in the meantime, or?
16:42:19 yup
16:43:22 sounds good
16:43:30 #link https://bugs.launchpad.net/openstack-ansible/+bug/1783423
16:43:30 Launchpad bug 1783423 in openstack-ansible "Flush all of the cache in memcached issue" [Undecided,New]
16:44:31 why don't we just use the variables from inventory?
16:45:47 https://github.com/openstack/openstack-ansible-memcached_server/blob/master/defaults/main.yml#L48-L49
16:45:48 as in those 2
16:47:03 yeah
16:47:05 we should
16:47:15 but how do we load those?
16:47:20 as this is not in the group
16:47:26 and probably overridden
16:47:45 evrardjp: don't we run this playbook with openstack-ansible?
16:47:50 it's in group vars
16:47:53 so we can now
16:48:08 yeah, replacing with variables should be the right decision
16:48:14 https://github.com/openstack/openstack-ansible/blob/master/scripts/run-upgrade.sh#L194
16:48:31 great, so confirmed/medium and add a comment mentioning we can use that instead of a regex?
16:48:39 yeah
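[Editor's note: a minimal sketch of the fix direction agreed above for bug 1783423: flush memcached using the memcached_server role's own variables rather than scraping the config with a regex. The variable names memcached_listen and memcached_port come from the linked defaults/main.yml; the group name and the use of nc are assumptions.]

    # memcached flush (sketch): drive the flush from inventory variables
    - name: Flush all memcached caches
      hosts: memcached  # group name assumed; adjust to your inventory
      gather_facts: false
      tasks:
        - name: Send flush_all to the instance defined by the role variables
          shell: echo 'flush_all' | nc {{ memcached_listen }} {{ memcached_port }}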
16:48:42 I have to go
16:48:53 sorry to skip the end of the meeting. ttyl everyone!
16:49:22 np
16:49:55 #link https://bugs.launchpad.net/openstack-ansible/+bug/1782388
16:49:55 Launchpad bug 1782388 in openstack-ansible "Installing Multipath But Not Enabling In Nova Causes Volume Attachment Failures" [Undecided,New]
16:50:04 all yours bgmccollum :P
16:50:15 so...
16:50:56 last comment sums up the findings...
16:50:59 Here is what I've discovered...
16:50:59 If multipath-tools is installed (which is now the default), then nova.conf *MUST* have `iscsi_use_multipath = True` set, or attachments won't work. If your compute hosts are also volume hosts (meaning multipath is installed and running), then you *MUST* have `use_multipath_for_image_xfer = True` set in cinder.conf under [lvm], or volume migration won't work.
16:51:36 If the volume and instance are on the same host, then attachments aren't an issue...
16:51:53 bgmccollum: so is this possibly a nova/cinder issue?
16:51:55 but i'm no expert, and would really like to see if someone else experiences these same issues
16:52:20 possibly...
16:52:51 i don't really know much about multipathing and lvm/cinder :(
16:53:32 attachments failing doesn't manifest when the volume and instance are on the same host... which is how the gate is set up...
16:53:55 so, we need someone with a real deployment with volume / compute co-located
16:53:59 using lvm/iscsi
16:54:13 i think we don't use multipath in the gate either.. i think?
16:54:38 it's installed by default
16:54:57 i'm not sure i can be much help
16:55:10 i'd like to keep the bug as 'new'; maybe next meeting we'll have more people
16:55:17 no worries...
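[Editor's note: bgmccollum's findings above, expressed as OSA config overrides for illustration; a sketch only. iscsi_use_multipath lives under nova.conf's [libvirt] section, and the [lvm] section name assumes a backend named "lvm"; verify the option names against your release, since newer nova renames the option to volume_use_multipath.]

    # user_variables.yml (sketch)
    nova_nova_conf_overrides:
      libvirt:
        iscsi_use_multipath: true  # assumption: [libvirt] section, pre-rename name
    cinder_cinder_conf_overrides:
      lvm:
        use_multipath_for_image_xfer: true  # assumption: backend section is [lvm]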
16:55:43 #link https://bugs.launchpad.net/openstack-ansible/+bug/1778586
16:55:43 Launchpad bug 1778586 in openstack-ansible "aio_lxc fails on openSUSE Leap 42.3: package conflict between gettext and gettext-runtime" [Medium,Incomplete] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:55:52 i set that as medium/incomplete as it seems to have already been assigned and worked on
16:56:19 #topic open discussion
16:56:32 we have a few minutes :)
16:56:46 https://etherpad.openstack.org/p/osa-faster
16:57:12 i've been thinking about how to "go faster"; this is my braindump
16:57:56 some could be nonsense, some worthwhile, and i wanted to show this as background to some of my reviews, like the eatmydata stuff
16:58:12 i like these ideas
16:58:28 mitogen has a really interesting way of being able to delegate to lxc containers
16:58:38 so we avoid our whole custom connection plugin magic
16:58:45 i have an AIO+mitogen which gets as far as keystone
16:59:02 it'd be fun to hack on that
16:59:19 which, for pretty much just switching it on, is excellent
16:59:24 dw: ^ nice work!
16:59:36 would anyone like to volunteer for running next week's meeting?
17:00:57 i guess not, i'll run next week's :)
17:01:02 thanks everyone!!
17:01:08 #endmeeting
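[Editor's note: for the mitogen experiment mentioned in open discussion, enabling mitogen is typically a two-line ansible.cfg change per the mitogen docs; the checkout path below is an assumption.]

    # ansible.cfg (sketch)
    [defaults]
    # point at the strategy plugins in a mitogen checkout (path assumed)
    strategy_plugins = /opt/mitogen/ansible_mitogen/plugins/strategy
    strategy = mitogen_linear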