16:03:55 #startmeeting openstack_ansible_meeting
16:03:56 Meeting started Tue Aug 22 16:03:55 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:57 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:00 The meeting name has been set to 'openstack_ansible_meeting'
16:04:09 #topic rollcall
16:04:09 \o/
16:04:40 waiting a few seconds for ppl to join
16:04:54 o/
16:05:37 o/
16:06:52 o/
16:07:01 #topic this week's bugs
16:07:06 #link https://bugs.launchpad.net/openstack-ansible/+bug/1712315 Newton integrated build fails in nodepool when executing galera_server role
16:07:07 Launchpad bug 1712315 in openstack-ansible "Newton integrated build fails in nodepool when executing galera_server role" [Undecided,New]
16:07:54 That sounds serious for gating.
16:08:11 Confirmed critical?
16:08:19 we are non voting on xenial
16:08:22 so it's not critical
16:08:32 we think it's due to overlapping repos from infra/what we add in newton
16:08:43 but we should get it fixed/investigate more
16:08:54 it may become critical when infra removes the trusty gates
16:09:00 perhaps high for now
16:09:13 this is true
16:09:57 sounds like a good classification to me. Will note the remark about criticality
16:10:04 we have a fix idea?
16:10:26 not sure - i haven't looked into it enough
16:10:53 ok
16:11:10 let's continue in the meantime, if someone has time, that would be great to work on it...
16:11:11 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711765
16:11:12 Launchpad bug 1711765 in openstack-ansible "dynamic_inventory.py adds 'physical_host = None' to hostvars" [Undecided,New]
16:11:20 <- submitter if you have any questions.
16:11:34 of 1711765
16:11:38 a bug submitter at the triage meeting, that's great!
16:11:44 :)
16:12:06 thanks for being fast at replying too
16:12:31 the openstack user config seems ok, except for an unclosed quote around "ip", but I guess that's from the editing
16:12:39 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_neutron stable/pike: Simplify the string check for offline db migrations https://review.openstack.org/495349
16:12:54 right. boss-men wanted the IPs cleaned for submission.
16:13:09 tasker: small question, are all the tuples (hostname, ip) the same for all the groups?
16:13:16 fine for me
16:13:17 yes
16:13:23 ok
16:14:59 I was thinking about the title earlier this morning and thinking that it might not accurately summarize what the problem is. it's not so much that "physical_host = None", but that None gets added to the host_list.
16:15:21 Merged openstack/openstack-ansible-os_neutron stable/pike: tasks: neutron_install: Fix virtualenv-tools issue on openSUSE https://review.openstack.org/495360
16:15:28 yes, but weirdly I tried a full on metal thing recently, and I didn't get that
16:15:30 Merged openstack/openstack-ansible-os_neutron master: Update reno for stable/pike https://review.openstack.org/495064
16:15:35 I am concerned that there is something else there
16:15:57 while the sanity check doesn't hurt by itself, I am worried it could help hide other issues.
16:16:25 and I'm more than willing to blame a problem in my inventory, but I thought that the sanity check could help at least prevent further issues. ... ahh, I got you.
16:17:22 Has been a while since I looked at the inventory code, not sure what the follow on effects of that would be
16:17:24 yeah. jmccrory opinion?
16:17:34 palendae: \o/
16:17:45 So basically I'm echoing evrardjp :p
16:18:01 palendae: ah that's what it meant :)
16:18:17 the follow on is when the upstream ansible inventory.Host class tries to stringify the host.name when it's None.
16:18:20 how can we help triage the issue in the inventory then?
16:18:35 tasker: Right, I'm more thinking of our inventory module, which is a bit messy
16:18:42 evrardjp agree about sanity check maybe hiding cause of issue
16:18:48 It's totally possible there's some hidden assumption about 'None' somewhere else
16:19:19 so let's focus _not on the fix_ but rather on triaging the issue then
16:19:47 how can we reproduce this, as the o_u_c seems alright at first sight
16:20:15 I think that might have been added to help with the connection plugin
16:20:32 does it help you to understand that my gateways are virtual machines within the cluster infrastructure?
16:21:24 https://github.com/openstack/openstack-ansible-plugins/blob/6c6753bbede974a4f787f0e574e693ac0414371c/connection/ssh.py#L68-L69
16:21:32 tasker: you added a group, and then you are using your own playbooks targeting those nodes and you get that issue, whilst you don't get the issue for the other plays?
16:21:48 think connection plugin was updated to ignore a None or non existent physical_host, but don't know why the inventory script would have set one to None
16:21:48 it would appear that this will return None on its own unless I'm misunderstanding how it works
16:21:59 odyssey4me: oh I see.
16:22:01 it occurs regardless of playbook -- it's whenever dynamic_inventory.py runs.
16:22:15 @cloudnull thanks to your pointers last week, I was able to correct my configuration and get Ocata deployed on CentOS7~
16:22:36 tasker: I guess these new groups have been configured with an env.d?
16:22:49 that, I don't know.
16:23:20 tasker: currently openstack ansible needs a structure to generate groups and hosts. If we don't provide a structure for new groups, it will not work.
16:23:30 it's not simple YAML parsing like upstream ansible.
16:23:39 we have extra bits
16:25:47 tasker, we need information about those extra groups
16:25:58 in the meantime I will mark it as incomplete
16:26:26 thanks. I'm sorry i cannot provide any more. I did not lay out the cluster. I'll try to get more info for you in the meantime.
16:26:52 thanks
16:26:55 next
16:26:57 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711638
16:26:57 Launchpad bug 1711638 in openstack-ansible "neutron-dnsmasq.log not forwarded to rsyslog" [Undecided,New]
16:27:05 evrardjp : looks like that's it. i just tried adding the last few lines from the o_u_c in ticket, got a null physical host
16:27:44 yeah that's what I think too. Let's see what will be the real output, in case there was indeed an env.d
16:27:53 So we'll probably mark this as invalid.
16:27:58 but let's see
16:28:29 well, we should perhaps make the dynamic inventory handle that case better instead of forcing an env.d file to be created
16:28:32 for the next one, dnsmasq, I think it would be linked to a bug recently fixed during the bug squash, about rsyslog not starting properly
16:28:39 odyssey4me: agreed.
16:28:54 odyssey4me: I think we should take the upstream ansible 2.4 yaml format, and be done with it.
16:29:05 Just make sure there's a migration path
16:29:06 but that's me and my radical thinking :p
16:29:19 I don't disagree that our current structure should be deprecated
16:29:23 Just make sure there's a path
16:29:28 oh yeah, that would be the problem of radical thinking :p
16:29:32 evrardjp yeah, if we can use a better upstream method, and adjust the inventory to cater for both then that'd be a win
16:29:37 we can then deprecate the old method
16:29:58 I expect that in the dynamic inventory we could import something to allow both to work
16:30:01 yeah, I think making the inventory optional and all that jazz.... But I haven't written the spec for the ptg yet.
16:30:12 they can be cumulative.
16:30:20 well
16:30:22 Doesn't have to happen all at once, either
16:30:38 in the future, it's possible for them to be cumulative
16:30:40 I'd advocate a slower approach just because our current inventory isn't very well understood
16:30:42 so we can phase out
16:30:46 And pretty vital
16:30:49 yeah
16:31:06 that's what ansible 2.4 could bring us.
16:31:10 Again, not disagreeing at all; just thinking slower might be better
16:31:17 but let's continue the bug triage :)
16:31:20 Yes
16:31:28 for https://bugs.launchpad.net/openstack-ansible/+bug/1711638
16:31:29 Launchpad bug 1711638 in openstack-ansible "neutron-dnsmasq.log not forwarded to rsyslog" [Undecided,New]
16:31:38 I think it's related to rsyslog client.
16:31:55 so I expect this to be fixed soon, when I'll backport the change.
16:32:06 Merged openstack/openstack-ansible stable/pike: Bootstrap Ansible fails if partial keypair exists https://review.openstack.org/496164
16:32:19 Merged openstack/openstack-ansible stable/newton: Bootstrap Ansible fails if partial keypair exists https://review.openstack.org/496166
16:32:21 I'd say we can confirm in the meantime: if the rsyslog client doesn't ship logs, then it makes sense that dnsmasq logs aren't forwarded.
16:32:26 Merged openstack/openstack-ansible stable/ocata: Bootstrap Ansible fails if partial keypair exists https://review.openstack.org/496165
16:32:36 Merged openstack/openstack-ansible master: Update reno for stable/pike https://review.openstack.org/494992
16:32:37 what do you think?
16:32:42 Adri2000: are you there?
16:34:01 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_tempest master: Update reno for stable/pike https://review.openstack.org/495079
16:34:10 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova master: Update reno for stable/pike https://review.openstack.org/495067
16:34:10 ok let's move on
16:34:12 next
16:34:14 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711577
16:34:14 Launchpad bug 1711577 in openstack-ansible "os_neutron: vpnaas variables missing for RedHat" [Undecided,New]
16:34:15 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_ironic master: Update reno for stable/pike https://review.openstack.org/495055
16:34:18 Ok I'll take it.
16:34:26 I think it's confirmed and low
16:34:39 but I have a patch pending that touches vpnaas already.
16:35:24 evrardjp apologies - was elsewhere, the rsyslog one, I don't think we've told that file to ship anywhere... but it doesn't exist until neutron is running so I'm not sure it'll be easy to make it do the thing... but it may relate
16:35:26 Jean-Philippe Evrard proposed openstack/openstack-ansible-os_neutron master: Fix VPNaaS variable definition for non-ubuntu https://review.openstack.org/492495
16:36:25 odyssey4me: oh, so it means it would become a wishlist item?
16:36:38 I'd say it's still a bug, because expectations aren't set
16:36:41 I think it's mentioned in the rsyslog config
16:37:03 evrardjp I'd suggest that it needs triage first I think - need to verify whether it's already configured
16:37:20 odyssey4me: that's what we are doing right now :)
16:38:06 /etc/rsyslog.d/99-neutron-rsyslog-client.conf:$InputFileName /var/log/neutron/neutron-dnsmasq.log
16:38:27 Sounds confirmed to me
16:38:31 in that case, the two bugs may be related then - with the extra complexity that the file may not exist until a later restart
16:38:50 odyssey4me: that's true
16:38:57 criticality?
16:39:00 if we move all log shipping to use systemd then we'll likely have better results
16:39:05 and: " I think it's related to rsyslog client." < if you mean bug #1699875, I'm not sure because I fixed that bug locally on this deployment
16:39:06 bug 1699875 in openstack-ansible "rsyslog client postrotate script contains invalid command" [High,In progress] https://launchpad.net/bugs/1699875 - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:39:21 my suggestion is low - it's not a big impact and there is a workaround (restart) available
16:40:19 Adri2000: then what do you mean there?
16:40:23 odyssey4me: I agree.
16:40:47 evrardjp: I mean that a restart won't fix the neutron-dnsmasq.log issue
16:40:57 unless I missed something of course :)
16:41:13 I think it's really a different issue
16:41:22 oh ok
16:41:26 Oh yeah permissions.
16:41:35 sounds different then.
16:41:41 maybe permissions indeed
16:41:43 oops got distracted :(
16:42:01 I guess we need more ppl to look at it.
16:42:19 in the meantime it's triaged, and we'll continue our lives.
16:42:28 we have much work this week.
16:42:31 next
16:42:47 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711376
16:42:49 Launchpad bug 1711376 in openstack-ansible "intermittent AIO error: Timeout (7s) waiting for privilege escalation prompt" [Undecided,New]
16:42:59 hwoarang: could you tell us more?
16:43:10 is the fix there a definitive fix?
16:44:15 has anyone noticed that, or got the chance to triage it?
16:44:53 ok let's move on
16:44:55 next
16:44:55 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711349
16:44:56 Launchpad bug 1711349 in openstack-ansible "CentOS/Pike - resolvconf - No file was found when using with_first_found." [Undecided,New]
16:45:17 mhayden: logan-?
16:45:22 this one appears to not be centos specific
16:45:30 ok
16:45:34 and cloudnull has a patch in for it
16:46:32 I think it got a few eyes on it, so it should be good.
16:47:27 I think that one is release critical for pike though
16:48:14 andymccr: are our pike gates busted?
16:48:25 evrardjp: not that i was aware of.
16:48:30 sorry to ask, I didn't get the chance to babysit
16:48:37 oh maybe on centos7
16:48:48 no, this doesn't show up in the gates for some reason - but it does show up when using vagrant or another environment to build
16:48:48 ok if that error is happening i have an idea why
16:49:09 I think we need a little more review/test attention in https://review.openstack.org/#/c/493739/
16:50:21 well we need a proper triaging of this issue.
16:50:48 We need a manual build if the gates don't show it.
16:50:51 ok.
16:51:03 I will try it.
16:51:36 next
16:51:49 it shows up in the periodic upgrade
16:51:59 oh
16:52:01 darn
16:52:04 http://logs.openstack.org/periodic/periodic-openstack-ansible-upgrade-aio-master-ubuntu-xenial/
16:53:15 yeah sounds legit to me then.
16:53:21 Confirmed high?
16:53:45 sounds right
16:54:25 ok next
16:54:27 #link https://bugs.launchpad.net/openstack-ansible/+bug/1711347
16:54:28 Launchpad bug 1711347 in openstack-ansible "Config file override mechanism broken" [Undecided,New]
16:55:34 oh that's a good one
16:55:36 :P
16:55:39 :)
16:55:46 I see what you did there jamesdenton! :p
16:55:47 what kind of person would file a bug like this
16:56:36 HR trap there andymccr.
16:56:36 Hi everyone... i'm trying to launch an aio with ironic following the instructions on https://docs.openstack.org/openstack-ansible/latest/contributor/quickstart-aio.html
16:56:37 I copied ironic.yaml.aio to conf.d, removed .aio, and ran openstack-ansible setup-{hosts,infrastructure,openstack}.yml but there is no ironic related container... is there any additional step to enable it?
16:56:48 yeah i don't quite know how to answer that
16:56:57 haha
16:56:59 well
16:57:06 let's finish this triage shall we? :D
16:57:35 so for us to reproduce the issue, we need to be full python3? I thought we didn't support that
16:58:13 evrardjp: not yet
16:58:24 fair enough ;)
16:58:30 evrardjp: i think that's wish list. we need to do that next cycle tbh
16:58:40 Well, this environment was deployed on a fresh Ubuntu 16.04 image, so i'm not sure how it would've ended up using Py 3.5 rather than 2.7
16:58:42 my point is that I am surprised that the action plugin is by default using python3
16:58:57 /opt/ansible-runtime/lib/python3.5/site-packages/ansible/executor/task_executor.py
16:59:05 :p
16:59:31 so probably need to make the config template dual python compatible too.
16:59:35 ricardoas: We're in the middle of bug triage, someone can help you in a few
16:59:42 for me it makes sense to do it.
17:00:21 so I'd say confirmed and medium at least, because it's in openstack strategy, .... but on top of it, if it prevents deploys, I'd move it high
17:00:26 thanks, spotz! I'll keep trying here... :)
17:00:35 andymccr: opinion?
17:00:45 hmm
17:00:57 so how do we repeat this error?
17:01:18 have python3 when bootstrapping ansible I guess
17:01:28 evrardjp: but we do that by default already?
17:01:37 for bootstrapping ansible i believe we already do py3
17:01:40 at least on ubuntu
17:01:40 and then we could technically use our plugin without the wrapper
17:02:17 yeah but if we install python, all the #!/usr/bin/env python will be using python2
17:02:24 what I mean here is we generally use python2
17:02:39 we could force python3 in our wrapper?
17:02:50 jamesdenton: what did you do to get that issue?
17:03:02 Nothing in particular? Did you bootstrap-ansible.sh?
17:03:14 yes, i bootstrapped aio and ansible
17:03:21 basically followed the developer guide
17:03:46 mmm. We need to stop the bug triage meeting for today, we are already exceeding the time allocated.
17:03:47 the environment went down fine. Just making changes post-deploy using the override mechanism is how i ran into it
17:03:57 we'll need to make a call about whether py3 for the ansible venv should be in pike or not
17:04:13 we may need to revert to py2 for pike, and continue work on it for queens
17:04:17 odyssey4me: it's too late for that I think.
17:04:31 evrardjp it's never too late ;)
17:04:37 fair enough :)
17:04:58 jamesdenton: hmm maybe
17:04:59 well let's leave the bug as is, for ppl to triage it
17:05:02 we should test that then
17:05:14 ok that's an interesting bug at least
17:05:17 I'd prefer to continue with what we have - but if we think there's a fair risk of trouble we could perhaps revert it - it's only tested on ubuntu right now
17:05:27 probably agree with odyssey4me on ripping out py3 from pike
17:05:45 I'd say ok to me.
17:06:05 let's close the bug triage and discuss that in another meeting
17:06:11 yup
17:06:22 thanks everyone for this very busy and very active bug triage!
17:06:28 more next week! :p
17:06:31 #endmeeting
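
Note on bug 1711765: the sanity check debated in the meeting could look roughly like the sketch below. It is only an illustration, not code from dynamic_inventory.py. The function name find_null_physical_hosts and the sample data are hypothetical, and it assumes the usual Ansible dynamic inventory JSON layout with hostvars under "_meta". It reports the affected hosts instead of silently dropping them, in line with the concern raised that a quiet check could hide the real cause.

```python
def find_null_physical_hosts(inventory):
    """Return sorted names of hosts whose physical_host is missing or None.

    Hypothetical helper: assumes hostvars live under inventory["_meta"]["hostvars"].
    """
    hostvars = inventory.get("_meta", {}).get("hostvars", {})
    return sorted(
        name
        for name, host_vars in hostvars.items()
        if host_vars.get("physical_host") is None
    )


# Made-up sample: one healthy host and one with a null physical_host.
sample_inventory = {
    "_meta": {
        "hostvars": {
            "infra1": {"physical_host": "infra1"},
            "gateway1": {"physical_host": None},
        }
    }
}

for host in find_null_physical_hosts(sample_inventory):
    print("host %s has no physical_host" % host)
```

Run against the real inventory output, a list like this would point at the groups that lack an env.d structure, which is the information requested before the bug was marked incomplete.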