16:00:27 #startmeeting openstack_ansible_meeting
16:00:28 Meeting started Tue Dec 12 16:00:27 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:32 The meeting name has been set to 'openstack_ansible_meeting'
16:01:23 #topic bugs with new status
16:01:40 is there anyone ready for bug triage?
16:01:57 I'm here
16:01:58 o/ here, although between some things
16:02:10 let's start
16:02:11 #link https://bugs.launchpad.net/openstack-ansible/+bug/1737013
16:02:12 Launchpad bug 1737013 in openstack-ansible "Using non-default Ceph user for Cinder Backup service" [Undecided,New]
16:02:38 confirmed medium?
16:02:41 What about the solution?
16:03:29 looks sensible to me
16:03:52 hey
16:03:56 ok
16:04:02 hmm, we should set a group var to make sure that ceph_client and cinder agree on what's right
16:04:10 odyssey4me: see my comment
16:04:53 we may want to skip the group var if we can just pass it over
16:05:01 yeah, medium as it's fairly serious
16:05:18 I can't confirm without looking at the code itself, but it sounds plausible.
16:05:21 ok
16:05:30 I have checked the code, it's real.
16:05:32 :D
16:05:39 now it's about how to do things
16:05:58 we can discuss in the bug
16:06:04 we have too many bugs to sort today
16:06:07 next
16:06:17 #link https://bugs.launchpad.net/openstack-ansible/+bug/1736989
16:06:19 Launchpad bug 1736989 in openstack-ansible "[Docs] Issue when using --limit to add compute with DVR" [Undecided,New]
16:06:58 I've seen more than one person having failures with --limit, maybe we should document this better.
16:07:09 This is a sub-case, probably from our operations guide
16:07:28 evrardjp: pop a note in what you want and then assign to me
16:07:31 For me it's confirmed and low
16:08:10 we either need to test limits so that we fix issues that arise from using it, or we should recommend not using it on specific playbooks
16:08:31 the ops guide has a lot of dated info, unfortunately
16:08:48 agreed for confirmed, low
16:08:54 spotz: could you have a look where we have --limit in our docs, and update to add localhost, when needed?
16:09:11 evrardjp: grep is my friend :)
16:09:18 yup :)
16:09:28 ok cool
16:09:36 next
16:09:37 #link https://bugs.launchpad.net/openstack-ansible/+bug/1736731
16:09:38 Launchpad bug 1736731 in openstack-ansible "os_nova might create a duplicate cell1" [Undecided,New]
16:09:45 evrardjp: travelling with the dog this weekend but if I don't get to it before I leave I'll get it next week
16:09:57 also, any reference to lxc-containers-create.yml and --limit should always include lxc_hosts in the limit
16:10:11 ^ +1 this
16:10:23 so
16:10:26 next bug
16:10:33 Adri2000: was the owner of this bug
16:10:45 well. is owner :)
16:10:54 for me it's a question of what we do
16:11:01 it's all a question of expectations
16:11:42 I expect that adding a rabbitmq node works :)
16:11:45 if something changed, like in this case adding a rabbitmq node, should we EDIT the cell by being smart ourselves, or should we fully rely on nova
16:12:22 well it technically worked: you now had a new cell with your new config :p
16:12:28 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible master: Remove symlinked git cache functionality https://review.openstack.org/521749
16:12:35 the problem is that you had two cells with the same name
16:12:40 am I correct?
16:12:44 yes
16:12:47 Or the problem went even further?
16:12:55 that has some odd consequences
16:13:00 I'm running into problems around the physical_interface_mappings in linuxbridge_agent.ini with my openstack_user_config http://paste.openstack.org/show/628740/ br-public shows up in both compute and neutron hosts :( I was expecting the group_binds to apply it only on neutron nodes?
16:13:08 hmm, I wonder if there's a create-or-update command
16:13:09 like nova compute service list listing every service twice
16:13:17 openstack compute service list*
16:13:23 so because nova is creating cells based on rabbitmq and galera nodes, we should check the existence of the cell name only?
16:13:47 odyssey4me: I am not aware of it
16:14:11 as long as OSA supports deploying only one cell, I think checking whether "cell1" already exists and then updating it should be fine
16:14:42 it looks like tripleo fixed a similar issue in a similar way (I linked their bug report in our bug report)
16:14:54 looks like this is where it's done: https://github.com/openstack/openstack-ansible-os_nova/blob/65c12961b4d764ae541e49ec9b582fec086dadc8/tasks/nova_db_setup.yml#L31-L35
16:15:08 usage: nova-manage cell [-h] {create,delete,list} ...
16:15:20 which means that nova itself has to get fixed to prevent duplication
16:15:31 and they'll argue that your db should be in sync
16:15:42 odyssey4me: yeah that's exactly the thing
16:15:45 oh cell1... hang on
16:15:58 it's a nova bug, well no it's not a nova bug, it's your deploy... etc.
16:16:03 odyssey4me: next task I think
16:16:18 command: "{{ nova_bin }}/nova-manage cell_v2 create_cell --name {{ nova_cell1_name }} [...]
16:16:47 usage: nova-manage cell_v2 [-h]
16:16:48 {create_cell,delete_cell,discover_hosts,list_cells,map_cell0,map_cell_and_hosts,map_instances,simple_cell_setup,update_cell,verify_instance}
16:16:50 oh, perhaps we should check for an existing one, then create if it doesn't exist
16:17:01 there we go, update_cell
16:17:15 yup
16:17:25 so I think we should list_cells
16:17:29 so basically change that command task into a shell task which does a check and does the right thing, then reports back whether it changed or not
16:17:30 see if there is one with a matching name
16:17:36 then update if that's the case
16:17:39 else, create
16:18:09 ok sounds like a solved problem!
16:18:19 yes, it should definitely do something like that - good find that one!
16:18:31 yeah thanks Adri2000
16:18:38 that'll need backporting to wherever cells started - ocata or pike, can't recall
16:18:39 confirmed high?
16:18:48 yeah
16:18:48 cell v2 is end of O
16:18:51 iirc
16:19:08 it's in O, yes: https://github.com/openstack/openstack-ansible-os_nova/blob/stable/ocata/tasks/nova_db_setup.yml#L37-L43
16:19:24 Adri2000: can you submit the patch?
16:19:36 I propose confirmed and high?
16:19:43 yes, agreed
16:19:52 evrardjp: I can try to submit something this week
16:20:05 else I just do it quick and you test it :p
16:20:22 ok let's sync later
16:20:27 next
16:20:29 yep :)
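A minimal sketch of the list_cells / update_cell / create_cell logic agreed on above, written as a single shell task as odyssey4me suggests. It assumes nova-manage falls back to nova.conf for the transport URL and database connection (the arguments elided as [...] in the existing os_nova task), and the list_cells output parsing, task name and register name are illustrative only:

# Sketch only: replaces the unconditional create_cell command task so that a
# cell with the same name is never created twice.
- name: Ensure the cell exists exactly once (create or update)
  shell: |
    existing_uuid="$({{ nova_bin }}/nova-manage cell_v2 list_cells \
      | awk -F '|' -v cell='{{ nova_cell1_name }}' '$2 ~ cell {gsub(/ /, "", $3); print $3; exit}')"
    if [ -z "${existing_uuid}" ]; then
      {{ nova_bin }}/nova-manage cell_v2 create_cell --name '{{ nova_cell1_name }}'
      echo "CELL_CREATED"
    else
      {{ nova_bin }}/nova-manage cell_v2 update_cell --cell_uuid "${existing_uuid}"
      echo "CELL_UPDATED"
    fi
  register: _nova_cell1_result
  changed_when: "'CELL_CREATED' in _nova_cell1_result.stdout"

This is the "report back whether it changed or not" behaviour mentioned in the discussion: the task only reports changed when a new cell was actually created.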
16:20:37 #link https://bugs.launchpad.net/openstack-ansible/+bug/1736726
16:20:39 Launchpad bug 1736726 in openstack-ansible "Pike update causes parallel restart of Galera containers" [Undecided,New]
16:20:50 * Adri2000 again o/
16:20:58 big deal IMO
16:21:07 so we have a patch in master to prevent restarts
16:21:19 we already discussed this on the channel too.
16:21:27 yes it was discussed in the chan one or two days before I filed the report
16:21:29 my concern is that:
16:21:37 yeah, but I'd rather have a fix figured out
16:22:00 1) we don't have a test to catch that.
2) we need to fix it soon.
16:22:08 for now we can do a reno though to inform people of the known issue - the trouble is that we need to figure out where the change was introduced so that we can say - any versions from x are affected
16:22:23 16.0.3 and 16.0.4 are affected
16:22:39 so anyone from earlier than that will suffer.
16:22:44 when I was digging into this, I think all the way back to newton might be affected
16:22:57 why?
16:23:06 it's coming from a backport, was that backported to N?
16:23:13 but I didn't get to a point of confirming, because we don't have enough information about what the change was to the container config that caused it
16:23:30 yes, I think a patch was made to master, and ported back
16:23:50 16.0.3 is fine I think
16:23:53 this has happened before https://review.openstack.org/#/c/426337/
16:23:56 the patch was fine for master (because in a major upgrade we handle this issue), but not for backports (because we can't handle it for minors)
16:24:32 odyssey4me: I agree there.
16:24:40 yep, good find there jmccrory
16:25:07 jmccrory: there was no documentation
16:25:10 we should probably serialise the container config changes in all the playbooks anyway for exactly this reason
16:25:20 we just fixed the issue by introducing the serialization
16:25:29 here it's on lxc hosts/lxc container create
16:25:30 if we do, this whole issue of when we can and can't restart becomes moot
16:25:52 yes, well - if it's in lxc-containers-create then we have a big problem
16:25:59 https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/tree/handlers/main.yml#n19 this handler is the issue
16:26:17 if that handler is called for all the galera containers: you lose
16:26:24 yeah more impact if it's happening through an lxc host/create play
16:26:27 odyssey4me: for me, we should BY DEFAULT for a stable branch NOT reboot.
16:26:29 Adri2000 that's just doing what it's told - some config changed, and that config change is the real root cause
16:26:56 odyssey4me: shouldn't we be able to change the lxc containers config without worrying that a restart of the containers will break everything?
16:27:08 evrardjp we should never allow changes to container config to be ported back, unless we also handle the effects or add a BIG release note
16:27:28 odyssey4me: I agree. I was on holiday at that time, under the sun :D
16:27:31 Adri2000 not really - container config changes do not take effect until a container is stopped, then started again
16:27:42 I should have eyes everywhere, I know.
16:27:50 evrardjp I feel you.
16:28:14 do we have any record of what exactly changed in the container config?
16:28:41 Adri2000: you told me it all happened on the same backport right?
16:28:46 you had no issues before that
16:28:47 ?
16:29:06 yes
16:29:11 never had a similar issue before
16:29:30 if we ran lxc-containers-create in the service playbooks, then we could serialise this sensibly
16:29:34 but that doesn't help stable branches
16:29:34 that happened while doing a regular git pull of stable/pike, bootstrap-ansible && playbooks run
16:29:44 please also note that there was a change in stable/pike
16:29:50 https://github.com/openstack/openstack-ansible-lxc_container_create/commit/c41d3b20da6be07d9bf5db7f7e6a1384c7cfb5eb
16:30:03 on top of the mega backport https://github.com/openstack/openstack-ansible-lxc_container_create/commit/05b84528d100ab73b91e119dd379a8ed0726db7d
16:31:06 evrardjp ok, so if we do not reboot the containers by default - how do we handle making sure the containers are properly configured when initially built?
16:31:27 odyssey4me: that's not possible :p
16:31:30 perhaps do something like drop a local fact, then handle the reboot in serial in the service play?
16:31:36 that's why I think we are in a pickle
16:31:44 well yes or something like that
16:31:57 that's something we can do, and is back portable
16:32:16 it'll also survive a failed playbook, which is nice
16:33:01 odyssey4me: yeah and we are closer to JIT.
16:33:08 because that's what ppl want
16:33:16 so good for me.
16:33:34 I have no cycles for that.
16:35:01 confirmed high? Maybe it will break some upgrades gates, then I will need to spend cycles on that, but right now I can't say.
16:35:10 upgrade tests*
16:35:22 ok for the classification anyone?
16:35:24 I'd say at least high :)
16:35:44 anyone with more than 1 galera node is likely to be impacted
16:35:44 high, confirmed - assign it to me
16:35:46 yeah for now i'd say high and move to critical if broken jobs
16:35:51 unless someone else wants to take it?
16:36:01 anyone?
16:36:07 pretty please?
16:36:09 :p
16:36:13 ok that's for odyssey4me then!
16:36:19 Adri2000 can you please add the tag version you upgraded from and to, so that I can add a reno
16:36:26 it should be cloudnull tbh.
16:36:35 that'll really help me narrow down where the issue is
16:36:40 odyssey4me: it's in the bug
16:36:44 odyssey4me: the bug report contains the commit ids already, because I was not using tags
16:36:56 was on stable/pike head
16:37:14 my mom says "You break it, you fix it".
16:37:37 heheh
16:37:48 ah, perfect thanks
16:37:52 ok next
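A rough sketch of the "drop a flag, then handle the reboot in serial in the service play" idea floated above, using a per-container flag file on the physical host rather than an Ansible local fact to keep the example short. All file paths, task names and register names here are hypothetical; this is not what lxc_container_create currently does (today its handler restarts the container directly):

# Hypothetical replacement for the restart handler in lxc_container_create:
# record that a restart is pending instead of bouncing the container now.
- name: Flag container for a deferred restart
  delegate_to: "{{ physical_host }}"
  copy:
    dest: "/var/lib/lxc/{{ inventory_hostname }}/osa_restart_pending"
    content: "container config changed\n"

# Hypothetical tasks for the service plays, which already run with `serial`,
# so only one batch of containers gets restarted at a time.
- name: Check whether a restart is pending for this container
  delegate_to: "{{ physical_host }}"
  stat:
    path: "/var/lib/lxc/{{ inventory_hostname }}/osa_restart_pending"
  register: _osa_restart_flag

- name: Restart the container to apply the pending config change
  delegate_to: "{{ physical_host }}"
  command: "lxc-stop --name {{ inventory_hostname }} --reboot"
  when: _osa_restart_flag.stat.exists

- name: Clear the restart flag
  delegate_to: "{{ physical_host }}"
  file:
    path: "/var/lib/lxc/{{ inventory_hostname }}/osa_restart_pending"
    state: absent
  when: _osa_restart_flag.stat.exists

As noted in the discussion, splitting the restart out this way also survives a failed playbook run: the flag stays in place until the restart has actually happened.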
16:37:55 #link https://bugs.launchpad.net/openstack-ansible/+bug/1736719
16:37:56 Launchpad bug 1736719 in openstack-ansible "Appendix A: Example test environment configuration in OpenStack-Ansible" [Undecided,New]
16:38:23 I guess spotz will have a look at it? :)
16:38:27 ah yeah, maybe mhayden and/or hwoarang could pick that up?
16:38:30 hwoarang: mhayden could you help?
16:38:32 AHAHAH
16:38:43 yeah.
16:38:46 mgariepy: too maybe :p
16:38:52 confirmed and wishlist ok?
16:39:00 Poor appendix A, nobody loves it
16:39:35 spotz: we'll make it better.
16:39:37 :)
16:40:00 odyssey4me / evrardjp: what's up?
16:40:03 I assigned that to you spotz, please ask those suse/redhat guys :)
16:40:12 we need help on sample network config
16:40:14 mhayden: ^
16:40:16 mhayden go fix appendix a :)
16:40:17 ORLY
16:40:20 for appendix a
16:40:23 mhayden well, evrardjp's beating you in the break-everything-dept
16:40:27 just copy pasta and we are good
16:40:29 hot dang
16:40:44 evrardjp: i will not let you beat me in gate breakage
16:40:49 not just appendix A - ALL OF THEM!
16:40:50 this aggression will not stand
16:40:50 odyssey4me: you need to bring patches to break stuff :p
16:40:58 bwahaha
16:41:10 * mhayden starts writing a new linter
16:41:12 well, there are ppl that do it magically
16:41:18 mhayden: I wonder if we're bad it'll snow again? :)
16:41:35 that make snow sense
16:41:36 yes, let's make it snow more - the world is too warm
16:41:37 mhayden: or new backend for containers.
16:41:56 odyssey4me: we are losing the debate folks.
16:41:58 anyway, that's 10 mins of our lives which bug triage will never get back
16:41:58 let's make a new container frontend
16:42:07 hehe
16:42:09 I think right now, we need a good breakage.
16:42:11 :p
16:42:28 i'm about to take down my pike env to re-do the networking and add bonding
16:42:33 so i might be able to put some of that in
16:42:40 I am glad I managed to break it twice in my lifetime, I think I am far from all the others :D
16:42:44 i will have a look too
16:42:47 I will get better.
16:42:50 i don't have vxlan offloading, so i need another route for that :/
16:43:06 well as long as you give a first draft, that would be good I guess
16:43:11 can we move on?
16:43:23 talking about mhayden breakages:
16:43:27 hooray
16:43:35 #link https://bugs.launchpad.net/openstack-ansible/+bug/1736702
16:43:36 Launchpad bug 1736702 in openstack-ansible "TMOUT declared two times after upgrade in /etc/profile" [Undecided,New]
16:43:38 sounds like evrardjp wants me to move on and work on something else
16:43:40 :)
16:43:49 ugh yes
16:44:02 does ansible have a capacity to remove an entire block?
16:44:04 thanks bro.
16:44:13 i'll take that one and do some experimentation
16:44:14 yes, blockinfile state absent?
16:44:37 i'll do some testing/tinkering there
16:44:38 you just use the same task, copy pasta, replace state present with absent, and change the marker. BOOM
16:44:49 mhayden: thanks!
16:44:54 * mhayden gets out markers and crayons
16:44:54 confirmed med
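A minimal sketch of the blockinfile clean-up described above: reuse the existing task with state switched to absent and the marker set to the old text, so the stale block is removed before the current task re-adds TMOUT under the new marker. The marker string below is a placeholder, not the literal value the earlier role version wrote:

# Sketch: remove the TMOUT block left behind under the previous marker.
- name: Remove the TMOUT block written with the old marker
  blockinfile:
    path: /etc/profile
    marker: "# {mark} ANSIBLE MANAGED BLOCK (old marker text goes here)"
    state: absent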
16:45:11 mhayden evrardjp there was a patch that cloudnull did which converts all the networking to networkd
16:45:14 where is the love for paint!
16:45:15 ?
16:45:27 MS Paint is deprecated
16:45:36 * mhayden stops ruining evrardjp's meeting ;)
16:45:41 odyssey4me: that's a good potential breakage too, but that was yesteryear's conversation!
16:45:55 mhayden: everything is fine
16:46:00 (fire) (fire)
16:46:17 we are almost at a third of the bugs
16:46:20 great
16:46:21 #link https://bugs.launchpad.net/openstack-ansible/+bug/1735709
16:46:22 Launchpad bug 1735709 in openstack-ansible "Grub authentication is not applied on Fedora 26 and Ubuntu 16.04 (at least)" [Undecided,New] - Assigned to Major Hayden (rackerhacker)
16:46:24 odyssey4me: https://review.openstack.org/#/c/523218/
16:46:32 OH WAIT, guess what?
16:46:33 :p
16:46:38 sadly i have to circle back on that for suse
16:46:53 odyssey4me: don't feed the troll.
16:46:56 :p
16:47:25 mhayden: status there?
16:47:38 should we care?
16:47:47 nothing yet
16:47:50 but this isn't critical
16:48:02 most people i know are horrified about grub auth on boot
16:48:05 but it would be nice to fix it
16:48:48 confirmed low?
16:48:51 or still not confirmed?
16:49:50 or just take it out and say, do it if you wanna
16:49:55 :p
16:50:28 leaving it as is then :p
16:50:39 next
16:50:41 #link https://bugs.launchpad.net/openstack-ansible/+bug/1731186
16:50:42 Launchpad bug 1731186 in openstack-ansible "vpnaas_agent.ini wrong config" [Undecided,New]
16:52:21 should I mark this as incomplete?
16:52:27 I am not sure I understood this one
16:52:45 evrardjp: Adri2000: I missed the message before, what is it I need to do?
16:53:01 * cloudnull can wait for after the meeting
16:53:26 wasn't vpnaas removed?
16:53:43 cloudnull: fixing the backport that forces restarts of all containers even on minor updates. odyssey4me is assignee for now, but you should work with him :D
16:53:55 ok
16:53:59 not so sure but now that you remind me of that
16:54:23 I think bgpvpn aaS was talked about, etc.
16:54:35 so maybe it's worth saying incomplete, and maybe won't fix?
16:54:36 evrardjp: odyssey4me: doesn't https://review.openstack.org/#/c/527256 fix that?
16:55:04 we can talk about the approach after the meeting
16:55:07 ok
16:55:17 ok let's wrap it up then
16:55:34 I am marking this vpnaas as incomplete
16:55:43 next
16:55:45 #link https://bugs.launchpad.net/openstack-ansible/+bug/1729525
16:55:46 Launchpad bug 1729525 in openstack-ansible "Can't run bootstrap_ansible with encrypted user_secrets.yml (prompting for password)" [Undecided,New]
16:56:43 ok I think it's fixed
16:57:12 we should mark it as invalid on the next bug triage
16:57:14 so
16:57:29 new topic!
16:57:34 #topic next week bug triage
16:57:49 can someone handle the bug triage for next week?
16:58:04 I am not available
16:58:21 I will be out as well
16:58:22 Starting from tomorrow I won't be available at all until next year.
16:58:25 ++
16:58:34 ok
16:58:41 anyone else to run the meeting?
16:59:06 if nobody can run it, we can cancel it, and I'll post a mail on the ML
16:59:23 ok for everyone?
16:59:29 sounds good
16:59:42 ok thanks everyone then!
16:59:47 #endmeeting