16:01:22 <evrardjp> #startmeeting openstack_ansible_meeting
16:01:22 <openstack> Meeting started Tue Mar 21 16:01:22 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:26 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:30 <spotz> woot
16:01:46 <andymccr> well the list is going down since the last few weeks :)
16:03:00 <evrardjp> So there is no action point from last week, so it proves we were either good, or I'm terrible at tracking.
16:03:21 <evrardjp> #action evrardjp don't forget to use #action
16:03:51 <evrardjp> I guess now everybody has joined and is active
16:03:55 <evrardjp> thanks for your presence!
16:04:00 <evrardjp> we start now!
16:04:05 <evrardjp> [insert gif here]
16:04:10 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673938
16:04:10 <openstack> Launchpad bug 1673938 in openstack-ansible "git example latest release" [Undecided,New]
16:04:23 <andymccr> hmm id say invalid - we ideally want people using the releases.
16:04:54 <andymccr> because stable/ocata is movable - whereas 15.0.0 isn't. from a doc perspective at least
16:05:21 <evrardjp> agreed.
16:05:33 <andymccr> i'll put a note in there
16:05:53 <evrardjp> I'm already doing it
16:06:03 <evrardjp> done
16:06:05 <andymccr> ahh ok sweet
16:06:07 <evrardjp> you can complete if you prefer.
16:06:13 <evrardjp> you know my english :p
16:06:22 <evrardjp> next
16:06:24 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673889
16:06:25 <openstack> Launchpad bug 1673889 in openstack-ansible "Nova services do not restart on N->O upgrade" [Undecided,New]
16:06:43 <andymccr> hmm
16:07:15 <andymccr> perhaps that's why our upgrade jobs are failing
16:08:06 <evrardjp> well that would explain a lot
16:08:23 <evrardjp> (jenkins still loading for me)
16:09:07 <evrardjp> system daemon-reload is not properly issues ?
16:09:21 <evrardjp> issued*
16:09:27 <evrardjp> (still loading...)
16:09:57 <dankolbrs> o/
16:10:04 <dankolbrs> let me know if I can provide more info on that
16:10:29 <dankolbrs> The jenkins page is ~135M usually, I usually just wget and less it
16:10:39 <odyssey4me> the containers probably shouldn't restart, but the services should
16:10:39 <evrardjp> this one was 65megs or something
16:10:46 <evrardjp> but you're right
16:10:46 <evrardjp> :p
16:10:54 <andymccr> hmm wondering why a container restart would fix it though - that would imply the services weren't restarted properly
16:10:56 <evrardjp> it's maybe too much for my browser
16:11:06 <andymccr> so you may be right evrardjp
16:11:08 <evrardjp> andymccr: I think we shouldn't force anything for the restart
16:11:18 <evrardjp> we should make it work without restart
16:11:33 <andymccr> evrardjp: agreed - but the only diff there would be services started up with the new venv etc
16:11:35 <evrardjp> if restart, that could work, but we shouldn't force a restart :p
16:11:43 <evrardjp> except if we do it in a proper serial way
16:11:48 <andymccr> that would imply a regular service restart would've wroked (or at least a daemon-reload and then restart)
16:12:14 <andymccr> hmm hmm
16:12:19 <evrardjp> did I miss something?
16:12:36 <andymccr> evrardjp: well it seems it worked - except the services in the containers didnt restart properly, so were still on 14.1.1 code running.
16:12:37 <evrardjp> the paste seem to say it's stuck on 14. where it should have been 15
16:12:46 <evrardjp> so daemon reload should have worked
16:12:49 <sss> odyssey4me: # openstack-ansible setup-hosts.yml Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml " ERROR! Attempted to execute "/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py" as inventory script: Inventory script (/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py) had an execution error: No container or management network specified in user
16:12:57 <andymccr> but if a container restart fixed that - then all that happened was the services started on boot using the correct venv
16:13:22 <evrardjp> oh yes, I think we arrived to the same conclusion, I thought you were infirming things :p
16:13:50 <evrardjp> ok my browser crashed.
16:13:54 <andymccr> thats a bad bug though
16:14:02 <andymccr> id say high
16:14:23 <evrardjp> I'd say critical even. We need to reliably know which version runs!
16:14:35 <evrardjp> it could be disastrous to run 14. code on 15 db right?
16:14:39 <andymccr> yeah hmm
16:14:46 <andymccr> i think the issue is
16:15:03 <andymccr> its restarting the compute hosts correctly
16:15:07 <andymccr> and then there is a version mismatch
16:15:09 <andymccr> but yeah you're right
16:15:09 <andymccr> hmm
16:15:14 <andymccr> ok lets put it critical
16:16:06 <evrardjp> so the log loaded for me
16:16:26 <evrardjp> it seems that it needs further investigation. The return of systemctl daemon-reload is changed
16:16:40 <evrardjp> and then it does the restart of services
16:16:44 <evrardjp> so yes.
16:16:55 <andymccr> hmm thats weird
16:18:04 <evrardjp> yes I see changed on nova-compute or other afterwards too. That is a good bug report
16:18:35 <evrardjp> who has time to look at this?
16:18:47 <evrardjp> that would be great to assign someone right now
16:19:06 <evrardjp> dankolbrs: you will spend time on this too?
16:19:52 <evrardjp> ok let's believe someone will pick that up, and track this
16:19:59 <evrardjp> next
16:20:04 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673226
16:20:04 <openstack> Launchpad bug 1673226 in openstack-ansible "Support cinder v3" [Undecided,New]
16:20:07 <andymccr> oh this is fixed
16:20:14 <dankolbrs> evrardjp, I've looked into it but haven't really figured it out
16:20:19 <evrardjp> andymccr: ?
16:20:22 <evrardjp> which one?
16:20:25 <andymccr> https://review.openstack.org/#/c/446503/
16:20:26 <andymccr> cinder one
16:20:28 <dankolbrs> I was able to replicate on an AIO
16:20:57 <evrardjp> dankolbrs: we moved to another bug due to restricted time reasons, let's talk about it in the chan later
16:21:06 <andymccr> The nova equivalent patch could use some reviews though: https://review.openstack.org/#/c/446508/
16:21:07 <evrardjp> andymccr: good
16:21:56 <evrardjp> #action review https://review.openstack.org/#/c/446508/
16:22:46 <evrardjp> ok next
16:22:49 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1672648
16:22:49 <openstack> Launchpad bug 1672648 in openstack-ansible "Documented test setup fails due to HTTPS usage in proxy" [Undecided,New]
16:23:04 <evrardjp> I'd say confirmed low
16:23:14 <evrardjp> it works right now, except edge cases
16:23:16 <evrardjp> doc change
16:23:24 <andymccr> ahh i see the issue
16:23:24 <andymccr> yeah
16:23:30 <andymccr> agreed
16:23:30 <evrardjp> ok
16:23:58 <evrardjp> next
16:24:00 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1670874
16:24:00 <openstack> Launchpad bug 1670874 in openstack-ansible "Validating domain name in FQDN and openstack_domain variable" [Undecided,New]
16:24:53 <evrardjp> it makes sense to properly use short hostname IMO
16:25:17 <andymccr> would that help though? doesnt it need to be whatever nova uses?
16:25:43 <evrardjp> on top of that we can't because there is no ansible_hostname_short, only inventory_hostname_short
16:25:47 <evrardjp> IIRC
16:26:18 <andymccr> so is the TL;DR that because the nova compute hosts aren't in /etc/hosts it bombs.
16:26:36 <andymccr> on live migration at least
16:27:14 <evrardjp> thet seem to be in /etc/hosts
16:27:44 <evrardjp> isn't that libvirt that's using another mechanism and doesn't care about /etc/hosts?
16:27:58 <andymccr> well i guess if it tries to resolve the name
16:27:58 <evrardjp> that is the first time I hear it
16:28:24 <evrardjp> systemd-resolvd?
16:28:57 <evrardjp> well we could lookup dns if we want to go that far, but I don't think it's a great idea
16:29:16 <andymccr> well tbh
16:29:24 <andymccr> i dont think this is related to /etc/hosts
16:29:34 <evrardjp> OH i got it, I misread
16:29:38 <andymccr> we mostly populate that based on the container names and other hosts so it can access them
16:29:38 <andymccr> so
16:29:51 <andymccr> if nova live migrate doesn't work because it's missing /etc/hosts entries that should go into the nova role imo
16:30:02 <evrardjp> yes, I completely misread
16:30:04 <andymccr> which could hten be additional entries to ensure the other compute hosts are present
16:30:27 <evrardjp> or we ensure a proper name resolving, but that's for the future I guess
16:30:37 <andymccr> haha yeah agreed, but yeah future
16:30:49 <evrardjp> I guess we could wire it better up in the defaults
16:30:58 <andymccr> so im thinking this is a bug - but not so much to do with the openstack_domain bits, and more to do with the nova role not setting up the appropriate entries for nova
16:31:01 <evrardjp> to have always openstack.local
16:31:10 <evrardjp> overridable by the users
16:31:20 <evrardjp> instead of having this local.lan that's not consistent
16:31:28 <evrardjp> confirmed low?
16:31:32 <andymccr> yeah
16:31:42 <andymccr> but yeah the issue is a disconnect between our "openstack_domain" and what nova is using
16:31:51 <evrardjp> yes indeed
16:32:26 <evrardjp> next
16:32:29 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:32:29 <openstack> Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:33:11 <evrardjp> odyssey4me: ?
16:33:28 <odyssey4me> I haven't had a chance to revisit that.
16:33:40 <evrardjp> ok let's leave it as is. someone else?
16:33:54 <evrardjp> except if someone else*
16:34:17 <evrardjp> #action odyssey4me have a look at https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:34:17 <openstack> Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:34:28 <evrardjp> next
16:34:31 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665667
16:34:31 <openstack> Launchpad bug 1665667 in openstack-ansible "Slow failover Recover on primary node restart" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:34:54 <evrardjp> confirmed wishlist.
16:35:19 <evrardjp> We can spend some time on this, but it's gotta be a real effort on haproxy -- I still don't have time assigned to this.
16:35:25 <andymccr> yeah
16:35:34 <andymccr> i think its something cool we shoudl look into
16:35:43 <andymccr> when somebody has time for it ;P
16:35:48 <evrardjp> oh yes, it's definitely needed IMO
16:36:13 <evrardjp> but to triage, I guess it's falls under the wishlist category
16:36:14 <evrardjp> next
16:36:30 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665377
16:36:30 <openstack> Launchpad bug 1665377 in openstack-ansible "openstack-ansible-security setting the pass_warn_age parameter incorrectly" [Undecided,New]
16:36:41 <evrardjp> I have no clue.
16:36:46 <evrardjp> mhayden: ?
16:36:54 <andymccr> he's out atm
16:36:56 <andymccr> i'll take a look
16:37:02 <andymccr> it seems pretty easy to confirm
16:37:05 <andymccr> lets move on
16:37:14 <evrardjp> it looks it's that indeed.
16:37:27 <evrardjp> or PASS_MAX_DAYS/PASS_MIN_DAYS
16:37:49 <evrardjp> confirmed medium, and assigned to mhayden :p
16:38:06 <evrardjp> if someone wants to take it, feel free
16:38:19 <evrardjp> it's classified as low-hanging-fruit for newcomers!
16:38:31 <evrardjp> next
16:38:33 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665084
16:38:34 <openstack> Launchpad bug 1665084 in openstack-ansible "M->N upgrade causes container reboot due to bind mounts" [Undecided,New]
16:39:06 <andymccr> hmm
16:39:34 <pjm6> regarding for my problem of updating quota VMs it seems tis is a bug
16:39:49 <pjm6> https://bugs.launchpad.net/nova/+bug/1668267
16:39:49 <openstack> Launchpad bug 1670627 in OpenStack Compute (nova) ocata "duplicate for #1668267 quota is always in-use after delete the ERROR instances " [Critical,In progress] - Assigned to Matt Riedemann (mriedem)
16:40:28 <openstackgerrit> Andy McCrae proposed openstack/openstack-ansible-security master: Change PASS_WARN_DAYS --> PASS_WARN_AGE https://review.openstack.org/448193
16:40:59 <evrardjp> looks valid.
16:41:04 <andymccr> yeah
16:41:05 <andymccr> agreed
16:41:07 <evrardjp> but what do we do
16:41:25 <evrardjp> that's gonna be painful during an upgrade to N
16:41:39 <andymccr> hmm
16:41:39 <andymccr> maybe
16:41:47 <andymccr> the lineinfile stuff should be smarter
16:42:05 <andymccr> because the issue seems to be a minor change in a bindmount causing a restart of a container
16:42:18 <evrardjp> we could add regexp
16:42:26 <andymccr> but then that will still change - and restart so i guess thats pointless
16:42:41 <evrardjp> to have the item mount_path the bind patch and bind , then skip
16:42:50 <EmilienM> andymccr, odyssey4me and others: it would be great to have your feedback on my email: [deployment][tripleo] Next steps for configuration management in TripleO (and other tools?)
16:42:52 <evrardjp> well that's true
16:43:22 <evrardjp> I think there is no perfect way to do this andymccr
16:43:30 <EmilienM> andymccr, odyssey4me: if you have any feedback, please reply to it
16:43:46 <andymccr> evrardjp: yeah agreed
16:43:48 <evrardjp> let's just confirm and mark it as medium -- it can break upgrades
16:43:58 <andymccr> if only we could be smarter with mounting/container restarts but i guess that is not possible
16:43:58 <evrardjp> we need to "deal with it"
16:44:17 <evrardjp> either by removing this (and risking another change, which seems awful)
16:44:24 <evrardjp> or by being smarter in the restart
16:44:48 <logan-> the lineinfile needs to be fixed anyway because it creates duplicate mount entries for the same bind mount
16:45:00 <jmccrory> https://review.openstack.org/#/c/426928/ galera was serialized to avoid problems with this
16:45:01 <evrardjp> that's true too.
16:45:20 <andymccr> yeah perhaps that is the solution
16:45:20 <jmccrory> or limit them at least
16:45:24 <evrardjp> jmccrory: which branch?
16:45:27 <andymccr> for now at least
16:45:29 <evrardjp> that's good
16:45:29 <jmccrory> it went to newton
16:45:34 <evrardjp> ok
16:45:41 <evrardjp> so we just need to clean the lineinfile
16:45:44 <jmccrory> and was breaking upgrades
16:45:48 <evrardjp> to ensure everything looks clean
16:46:02 <evrardjp> (avoiding duplicates)
16:47:09 <evrardjp> good
16:47:13 <evrardjp> I commented there
16:47:23 <evrardjp> confirmed low-hanging-fruit
16:47:41 <evrardjp> medium?
16:47:49 <evrardjp> depends on our priority of upgrades I guess
16:48:00 <andymccr> id say it's medium at least
16:48:04 <evrardjp> ok
16:48:07 <evrardjp> done as medium
16:48:11 <evrardjp> let's move on
16:48:30 <evrardjp> next
16:48:32 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1662865
16:48:32 <openstack> Launchpad bug 1662865 in openstack-ansible "nova-compute: inconsistent qemu packages installed" [Undecided,New]
16:49:02 <evrardjp> I guess we can leave it as is
16:49:09 <evrardjp> until bug reporter comes back
16:49:13 <andymccr> yeah
16:49:13 <andymccr> agreed
16:49:17 <evrardjp> it will expire if not
16:49:18 <evrardjp> ok
16:49:42 <evrardjp> next one is assigned to mhayden
16:49:45 <evrardjp> let's skip
16:49:48 <andymccr> yeah
16:49:59 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1661114
16:49:59 <openstack> Launchpad bug 1661114 in openstack-ansible "rabbitmq_Server role is failing in mitaka due to rfc hostname changes" [Undecided,New]
16:50:12 <evrardjp> bjoern doesn't seem connected
16:50:13 <evrardjp> next
16:50:26 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1659073
16:50:26 <openstack> Launchpad bug 1659073 in openstack-ansible "MariaDB 10.0.28 -> 10.1 upgrade (galera server wsrep bootstrap fails)" [Undecided,New]
16:50:38 <evrardjp> waiting for bug reporter too
16:51:08 <andymccr> ok so this is simliar to the one jamesdenton raised i think
16:51:39 <evrardjp> which one, the nova one? or the maria?
16:51:44 <andymccr> mariadb bits
16:52:10 <EmilienM> (sorry I just noticed you were in a meeting)
16:52:16 <evrardjp> let's wait for reporter info, if it's fixed we don't have anything to do there
16:52:22 <evrardjp> EmilienM: that's alright, don't worry :p
16:52:25 <jmccrory> saw that being caused during upgrades when all containers restart at the same time
16:52:27 <evrardjp> we are only triaging :p
16:52:31 <EmilienM> if you guys need a logo
16:52:59 <evrardjp> let's continue
16:53:12 <evrardjp> jmccrory: may already have fixed it as described in the bug
16:53:24 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1651809
16:53:24 <openstack> Launchpad bug 1651809 in openstack-ansible "Hostname change in L->M resets hypervisor usage" [Undecided,New]
16:53:51 <evrardjp> maybe this one is linked to vnogin's one
16:54:05 <andymccr> hmm
16:54:06 <evrardjp> indirectly
16:54:06 <andymccr> well
16:54:17 <evrardjp> well I think it's the whole _ - hostname change
16:54:19 <andymccr> we need to get this one fixed asap if it's an issue since mitaka will EOL at some point
16:54:35 <evrardjp> we moved to use ansible_hostname which will be a fqdn
16:54:45 <evrardjp> so there is no way to fix that without being disruptive IMO
16:55:18 <andymccr> hmm
16:55:28 <evrardjp> if you move to use inventory_hostname, you'll basically have issues later
16:55:35 <evrardjp> like bjoern suggested
16:55:40 <evrardjp> so I don't think it's a good idea.
16:56:13 <evrardjp> well I think it deserves a decision, once and for all
16:56:33 <evrardjp> and I thought that decision was taken when everything moved to ansible_hostname
16:56:41 <andymccr> yeah i agree with the move
16:56:41 <palendae> evrardjp: Is that _/- change you're talking about related to https://review.openstack.org/#/c/407655/ ?
16:56:46 <palendae> Or something different?
16:57:01 <evrardjp> not directly
16:57:04 <palendae> Mostly ask cause I haven't had time to circle back on that review and I got stumped
16:57:05 <palendae> Ok
16:57:22 <evrardjp> in L to M upgrade we moved to use - in many places
16:57:31 <evrardjp> (not in the inventory 'though)
16:57:51 <evrardjp> but that's the reason why we started to use ansible_hostname
16:58:04 <palendae> Got it
16:58:12 <evrardjp> so if nova in M is now using ansible_hostname, it would now be a different name in M and L
16:58:21 <evrardjp> you can fix that by forcing M to use L name
16:58:27 <evrardjp> but that sounds like delaying the pain
16:58:59 <evrardjp> if we had an inventory adapted for that in M, that would have been different
16:59:05 <evrardjp> but that's not the case
16:59:19 <andymccr> hmm
16:59:34 <evrardjp> so until we have an inventory that's clean enough and that we can use (and revert ansible_hostname to inventory_hostname), I'd say let's stick with this
16:59:42 <andymccr> yeah
16:59:47 <evrardjp> it's for me a "known issue"
16:59:58 <andymccr> yeah agreed
17:00:07 <evrardjp> ok
17:00:08 <palendae> Is M EOL yet?
17:00:15 <evrardjp> nope
17:00:19 <evrardjp> IIRC
17:00:21 <andymccr> not yet no
17:00:27 <andymccr> it'll happen in the next few months
17:00:35 <evrardjp> but it's stable
17:00:47 <evrardjp> we can't do anything too much disruptive there
17:01:05 <palendae> Right
17:01:09 <evrardjp> I guess we are out of time for today
17:01:18 <evrardjp> thanks everyone
17:01:29 <andymccr> pretty good effort - only 2 left, so all good :)
17:01:35 <evrardjp> hehe
17:01:40 <evrardjp> #endmeeting
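
Note on the last bug triaged (1651809, "Hostname change in L->M resets hypervisor usage"): the log suggests the compute host registers under a different name after the `_` to `-` hostname change, so records kept under the old name no longer match. The Python sketch below only illustrates that suspected mechanism. The host name `compute_01`, the `usage_by_hypervisor` dictionary, and both helper functions are hypothetical and are not taken from nova or openstack-ansible.

```python
def rfc_hostname(inventory_name: str) -> str:
    """Replace underscores with hyphens, as an RFC-style hostname requires."""
    return inventory_name.replace("_", "-")


# Hypothetical usage records, keyed by the name the hypervisor registered under.
usage_by_hypervisor = {"compute_01": {"vcpus_used": 8}}


def lookup_usage(hostname: str) -> dict:
    """Return usage for a hypervisor name; an unknown name looks like a fresh host."""
    return usage_by_hypervisor.get(hostname, {"vcpus_used": 0})


old_name = "compute_01"            # assumed name before the upgrade
new_name = rfc_hostname(old_name)  # assumed name after the upgrade

print(old_name, lookup_usage(old_name))
print(new_name, lookup_usage(new_name))
```

Under these assumptions the renamed host is looked up under a key that was never recorded, which matches evrardjp's remark that forcing M to keep the L name would avoid the mismatch but only delay the pain.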