#openstack-ansible log

16:01:22 <evrardjp> #startmeeting openstack_ansible_meeting
16:01:22 <openstack> Meeting started Tue Mar 21 16:01:22 2017 UTC and is due to finish in 60 minutes.  The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:26 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:01:30 <spotz> woot
16:01:46 <andymccr> well the list has been going down over the last few weeks :)
16:03:00 <evrardjp> So there is no action point from last week, so it proves we were either good, or I'm terrible at tracking.
16:03:21 <evrardjp> #action evrardjp don't forget to use #action
16:03:51 <evrardjp> I guess now everybody has joined and is active
16:03:55 <evrardjp> thanks for your presence!
16:04:00 <evrardjp> we start now!
16:04:05 <evrardjp> [insert gif here]
16:04:10 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673938
16:04:10 <openstack> Launchpad bug 1673938 in openstack-ansible "git example latest release" [Undecided,New]
16:04:23 <andymccr> hmm id say invalid - we ideally want people using the releases.
16:04:54 <andymccr> because stable/ocata is movable - whereas 15.0.0 isn't. from a doc perspective at least
16:05:21 <evrardjp> agreed.
16:05:33 <andymccr> i'll put a note in there
16:05:53 <evrardjp> I'm already doing it
16:06:03 <evrardjp> done
16:06:05 <andymccr> ahh ok sweet
16:06:07 <evrardjp> you can complete if you prefer.
16:06:13 <evrardjp> you know my english :p
16:06:22 <evrardjp> next
16:06:24 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673889
16:06:25 <openstack> Launchpad bug 1673889 in openstack-ansible "Nova services do not restart on N->O upgrade" [Undecided,New]
16:06:43 <andymccr> hmm
16:07:15 <andymccr> perhaps that's why our upgrade jobs are failing
16:08:06 <evrardjp> well that would explain a lot
16:08:23 <evrardjp> (jenkins still loading for me)
16:09:07 <evrardjp> system daemon-reload is not properly issued?
16:09:27 <evrardjp> (still loading...)
16:09:57 <dankolbrs> o/
16:10:04 <dankolbrs> let me know if I can provide more info on that
16:10:29 <dankolbrs> The jenkins page is ~135M usually, I usually just wget and less it
16:10:39 <odyssey4me> the containers probably shouldn't restart, but the services should
16:10:39 <evrardjp> this one was 65megs or something
16:10:46 <evrardjp> but you're right
16:10:46 <evrardjp> :p
16:10:54 <andymccr> hmm wondering why a container restart would fix it though - that would imply the services weren't restarted properly
16:10:56 <evrardjp> it's maybe too much for my browser
16:11:06 <andymccr> so you may be right evrardjp
16:11:08 <evrardjp> andymccr: I think we shouldn't force anything for the restart
16:11:18 <evrardjp> we should make it work without restart
16:11:33 <andymccr> evrardjp: agreed - but the only diff there would be services started up with the new venv etc
16:11:35 <evrardjp> if restart, that could work, but we shouldn't force a restart :p
16:11:43 <evrardjp> except if we do it in a proper serial way
16:11:48 <andymccr> that would imply a regular service restart would've worked (or at least a daemon-reload and then restart)
16:12:14 <andymccr> hmm hmm
16:12:19 <evrardjp> did I miss something?
16:12:36 <andymccr> evrardjp: well it seems it worked - except the services in the containers didnt restart properly, so were still on 14.1.1 code running.
16:12:37 <evrardjp> the paste seems to say it's stuck on 14 where it should have been 15
16:12:46 <evrardjp> so daemon reload should have worked
16:12:49 <sss> odyssey4me: # openstack-ansible setup-hosts.yml Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml " ERROR! Attempted to execute "/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py" as inventory script: Inventory script (/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py) had an execution error: No container or management network specified in user
16:12:57 <andymccr> but if a container restart fixed that - then all that happened was the services started on boot using the correct venv
16:13:22 <evrardjp> oh yes, I think we arrived at the same conclusion, I thought you were contradicting things :p
16:13:50 <evrardjp> ok my browser crashed.
16:13:54 <andymccr> thats a bad bug though
16:14:02 <andymccr> id say high
16:14:23 <evrardjp> I'd say critical even. We need to reliably know which version runs!
16:14:35 <evrardjp> it could be disastrous to run 14. code on 15 db right?
16:14:39 <andymccr> yeah hmm
16:14:46 <andymccr> i think the issue is
16:15:03 <andymccr> its restarting the compute hosts correctly
16:15:07 <andymccr> and then there is a version mismatch
16:15:09 <andymccr> but yeah you're right
16:15:09 <andymccr> hmm
16:15:14 <andymccr> ok lets put it critical
16:16:06 <evrardjp> so the log loaded for me
16:16:26 <evrardjp> it seems that it needs further investigation. The return of systemctl daemon-reload is changed
16:16:40 <evrardjp> and then it does the restart of services
16:16:44 <evrardjp> so yes.
16:16:55 <andymccr> hmm thats weird
16:18:04 <evrardjp> yes I see changed on nova-compute or other afterwards too. That is a good bug report
16:18:35 <evrardjp> who has time to look at this?
16:18:47 <evrardjp> that would be great to assign someone right now
16:19:06 <evrardjp> dankolbrs: you will spend time on this too?
16:19:52 <evrardjp> ok let's believe someone will pick that up, and track this
16:19:59 <evrardjp> next
16:20:04 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1673226
16:20:04 <openstack> Launchpad bug 1673226 in openstack-ansible "Support cinder v3" [Undecided,New]
16:20:07 <andymccr> oh this is fixed
16:20:14 <dankolbrs> evrardjp, I've looked into it but haven't really figured it out
16:20:19 <evrardjp> andymccr: ?
16:20:22 <evrardjp> which one?
16:20:25 <andymccr> https://review.openstack.org/#/c/446503/
16:20:26 <andymccr> cinder one
16:20:28 <dankolbrs> I was able to replicate on an AIO
16:20:57 <evrardjp> dankolbrs: we moved to another bug due to time constraints, let's talk about it in the chan later
16:21:06 <andymccr> The nova equivalent patch could use some reviews though: https://review.openstack.org/#/c/446508/
16:21:07 <evrardjp> andymccr: good
16:21:56 <evrardjp> #action review https://review.openstack.org/#/c/446508/
16:22:46 <evrardjp> ok next
16:22:49 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1672648
16:22:49 <openstack> Launchpad bug 1672648 in openstack-ansible "Documented test setup fails due to HTTPS usage in proxy" [Undecided,New]
16:23:04 <evrardjp> I'd say confirmed low
16:23:14 <evrardjp> it works right now, except edge cases
16:23:16 <evrardjp> doc change
16:23:24 <andymccr> ahh i see the issue
16:23:24 <andymccr> yeah
16:23:30 <andymccr> agreed
16:23:30 <evrardjp> ok
16:23:58 <evrardjp> next
16:24:00 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1670874
16:24:00 <openstack> Launchpad bug 1670874 in openstack-ansible "Validating domain name in FQDN and openstack_domain variable" [Undecided,New]
16:24:53 <evrardjp> it makes sense to properly use short hostname IMO
16:25:17 <andymccr> would that help though? doesnt it need to be whatever nova uses?
16:25:43 <evrardjp> on top of that we can't because there is no ansible_hostname_short, only inventory_hostname_short
16:25:47 <evrardjp> IIRC
16:26:18 <andymccr> so is the TL;DR that because the nova compute hosts aren't in /etc/hosts it bombs.
16:26:36 <andymccr> on live migration at least
16:27:14 <evrardjp> they seem to be in /etc/hosts
16:27:44 <evrardjp> isn't it libvirt that's using another mechanism and doesn't care about /etc/hosts?
16:27:58 <andymccr> well i guess if it tries to resolve the name
16:27:58 <evrardjp> that is the first time I've heard of it
16:28:24 <evrardjp> systemd-resolved?
16:28:57 <evrardjp> well we could lookup dns if we want to go that far, but I don't think it's a great idea
16:29:16 <andymccr> well tbh
16:29:24 <andymccr> i dont think this is related to /etc/hosts
16:29:34 <evrardjp> OH i got it, I misread
16:29:38 <andymccr> we mostly populate that based on the container names and other hosts so it can access them
16:29:38 <andymccr> so
16:29:51 <andymccr> if nova live migrate doesn't work because it's missing /etc/hosts entries that should go into the nova role imo
16:30:02 <evrardjp> yes, I completely misread
16:30:04 <andymccr> which could then be additional entries to ensure the other compute hosts are present
16:30:27 <evrardjp> or we ensure a proper name resolving, but that's for the future I guess
16:30:37 <andymccr> haha yeah agreed, but yeah future
16:30:49 <evrardjp> I guess we could wire it up better in the defaults
16:30:58 <andymccr> so im thinking this is a bug - but not so much to do with the openstack_domain bits, and more to do with the nova role not setting up the appropriate entries for nova
16:31:01 <evrardjp> to always have openstack.local
16:31:10 <evrardjp> overridable by the users
16:31:20 <evrardjp> instead of having this local.lan that's not consistent
16:31:28 <evrardjp> confirmed low?
16:31:32 <andymccr> yeah
16:31:42 <andymccr> but yeah the issue is a disconnect between our "openstack_domain" and what nova is using
16:31:51 <evrardjp> yes indeed
16:32:26 <evrardjp> next
16:32:29 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:32:29 <openstack> Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:33:11 <evrardjp> odyssey4me: ?
16:33:28 <odyssey4me> I haven't had a chance to revisit that.
16:33:40 <evrardjp> ok let's leave it as is. someone else?
16:33:54 <evrardjp> except if someone else*
16:34:17 <evrardjp> #action odyssey4me have a look at  https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:34:17 <openstack> Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:34:28 <evrardjp> next
16:34:31 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665667
16:34:31 <openstack> Launchpad bug 1665667 in openstack-ansible "Slow failover Recover on primary node restart" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:34:54 <evrardjp> confirmed wishlist.
16:35:19 <evrardjp> We can spend some time on this, but it's gotta be a real effort on haproxy -- I still don't have time assigned to this.
16:35:25 <andymccr> yeah
16:35:34 <andymccr> i think its something cool we should look into
16:35:43 <andymccr> when somebody has time for it ;P
16:35:48 <evrardjp> oh yes, it's definitely needed IMO
16:36:13 <evrardjp> but to triage, I guess it falls under the wishlist category
16:36:14 <evrardjp> next
16:36:30 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665377
16:36:30 <openstack> Launchpad bug 1665377 in openstack-ansible "openstack-ansible-security setting the pass_warn_age parameter incorrectly" [Undecided,New]
16:36:41 <evrardjp> I have no clue.
16:36:46 <evrardjp> mhayden: ?
16:36:54 <andymccr> he's out atm
16:36:56 <andymccr> i'll take a look
16:37:02 <andymccr> it seems pretty easy to confirm
16:37:05 <andymccr> lets move on
16:37:14 <evrardjp> it looks like it's that indeed.
16:37:27 <evrardjp> or PASS_MAX_DAYS/PASS_MIN_DAYS
16:37:49 <evrardjp> confirmed medium, and assigned to mhayden :p
16:38:06 <evrardjp> if someone wants to take it, feel free
16:38:19 <evrardjp> it's classified as low-hanging-fruit for newcomers!
16:38:31 <evrardjp> next
16:38:33 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1665084
16:38:34 <openstack> Launchpad bug 1665084 in openstack-ansible "M->N upgrade causes container reboot due to bind mounts" [Undecided,New]
16:39:06 <andymccr> hmm
16:39:34 <pjm6> regarding my problem of updating quota VMs, it seems this is a bug
16:39:49 <pjm6> https://bugs.launchpad.net/nova/+bug/1668267
16:39:49 <openstack> Launchpad bug 1670627 in OpenStack Compute (nova) ocata "duplicate for #1668267 quota is always in-use after delete the ERROR instances " [Critical,In progress] - Assigned to Matt Riedemann (mriedem)
16:40:28 <openstackgerrit> Andy McCrae proposed openstack/openstack-ansible-security master: Change PASS_WARN_DAYS --> PASS_WARN_AGE  https://review.openstack.org/448193
16:40:59 <evrardjp> looks valid.
16:41:04 <andymccr> yeah
16:41:05 <andymccr> agreed
16:41:07 <evrardjp> but what do we do
16:41:25 <evrardjp> that's gonna be painful during an upgrade to N
16:41:39 <andymccr> hmm
16:41:39 <andymccr> maybe
16:41:47 <andymccr> the lineinfile stuff should be smarter
16:42:05 <andymccr> because the issue seems to be a minor change in a bindmount causing a restart of a container
16:42:18 <evrardjp> we could add regexp
16:42:26 <andymccr> but then that will still change - and restart so i guess thats pointless
16:42:41 <evrardjp> to have the item mount_path, the bind path and bind, then skip
16:42:50 <EmilienM> andymccr, odyssey4me and others: it would be great to have your feedback on my email: [deployment][tripleo] Next steps for configuration management in TripleO (and other tools?)
16:42:52 <evrardjp> well that's true
16:43:22 <evrardjp> I think there is no perfect way to do this andymccr
16:43:30 <EmilienM> andymccr, odyssey4me: if you have any feedback, please reply to it
16:43:46 <andymccr> evrardjp: yeah agreed
16:43:48 <evrardjp> let's just confirm and mark it as medium -- it can break upgrades
16:43:58 <andymccr> if only we could be smarter with mounting/container restarts but i guess that is not possible
16:43:58 <evrardjp> we need to "deal with it"
16:44:17 <evrardjp> either by removing this (and risking another change, which seems awful)
16:44:24 <evrardjp> or by being smarter in the restart
16:44:48 <logan-> the lineinfile needs to be fixed anyway because it creates duplicate mount entries for the same bind mount
16:45:00 <jmccrory> https://review.openstack.org/#/c/426928/ galera was serialized to avoid problems with this
16:45:01 <evrardjp> that's true too.
16:45:20 <andymccr> yeah perhaps that is the solution
16:45:20 <jmccrory> or limit them at least
16:45:24 <evrardjp> jmccrory: which branch?
16:45:27 <andymccr> for now at least
16:45:29 <evrardjp> that's good
16:45:29 <jmccrory> it went to newton
16:45:34 <evrardjp> ok
16:45:41 <evrardjp> so we just need to clean the lineinfile
16:45:44 <jmccrory> and was breaking upgrades
16:45:48 <evrardjp> to ensure everything looks clean
16:46:02 <evrardjp> (avoiding duplicates)
16:47:09 <evrardjp> good
16:47:13 <evrardjp> I commented there
16:47:23 <evrardjp> confirmed low-hanging-fruit
16:47:41 <evrardjp> medium?
16:47:49 <evrardjp> depends on our priority of upgrades I guess
16:48:00 <andymccr> id say it's medium at least
16:48:04 <evrardjp> ok
16:48:07 <evrardjp> done as medium
16:48:11 <evrardjp> let's move on
16:48:30 <evrardjp> next
16:48:32 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1662865
16:48:32 <openstack> Launchpad bug 1662865 in openstack-ansible "nova-compute: inconsistent qemu packages installed" [Undecided,New]
16:49:02 <evrardjp> I guess we can leave it as is
16:49:09 <evrardjp> until bug reporter comes back
16:49:13 <andymccr> yeah
16:49:13 <andymccr> agreed
16:49:17 <evrardjp> it will expire if not
16:49:18 <evrardjp> ok
16:49:42 <evrardjp> next one is assigned to mhayden
16:49:45 <evrardjp> let's skip
16:49:48 <andymccr> yeah
16:49:59 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1661114
16:49:59 <openstack> Launchpad bug 1661114 in openstack-ansible "rabbitmq_Server role is failing in mitaka due to rfc hostname changes" [Undecided,New]
16:50:12 <evrardjp> bjoern doesn't seem connected
16:50:13 <evrardjp> next
16:50:26 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1659073
16:50:26 <openstack> Launchpad bug 1659073 in openstack-ansible "MariaDB 10.0.28 -> 10.1 upgrade (galera server wsrep bootstrap fails)" [Undecided,New]
16:50:38 <evrardjp> waiting for bug reporter too
16:51:08 <andymccr> ok so this is similar to the one jamesdenton raised i think
16:51:39 <evrardjp> which one, the nova one? or the maria?
16:51:44 <andymccr> mariadb bits
16:52:10 <EmilienM> (sorry I just noticed you were in a meeting)
16:52:16 <evrardjp> let's wait for reporter info, if it's fixed we don't have anything to do there
16:52:22 <evrardjp> EmilienM: that's alright, don't worry :p
16:52:25 <jmccrory> saw that being caused during upgrades when all containers restart at the same time
16:52:27 <evrardjp> we are only triaging :p
16:52:31 <EmilienM> if you guys need a logo
16:52:59 <evrardjp> let's continue
16:53:12 <evrardjp> jmccrory: may already have fixed it as described in the bug
16:53:24 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1651809
16:53:24 <openstack> Launchpad bug 1651809 in openstack-ansible "Hostname change in L->M resets hypervisor usage" [Undecided,New]
16:53:51 <evrardjp> maybe this one is linked to vnogin's one
16:54:05 <andymccr> hmm
16:54:06 <evrardjp> indirectly
16:54:06 <andymccr> well
16:54:17 <evrardjp> well I think it's the whole _ - hostname change
16:54:19 <andymccr> we need to get this one fixed asap if it's an issue since mitaka will EOL at some point
16:54:35 <evrardjp> we moved to use ansible_hostname which will be a fqdn
16:54:45 <evrardjp> so there is no way to fix that without being disruptive IMO
16:55:18 <andymccr> hmm
16:55:28 <evrardjp> if you move to use inventory_hostname, you'll basically have issues later
16:55:35 <evrardjp> like bjoern suggested
16:55:40 <evrardjp> so I don't think it's a good idea.
16:56:13 <evrardjp> well I think it deserves a decision, once and for all
16:56:33 <evrardjp> and I thought that decision was taken when everything moved to ansible_hostname
16:56:41 <andymccr> yeah i agree with the move
16:56:41 <palendae> evrardjp: Is that _/- change you're talking about related to https://review.openstack.org/#/c/407655/ ?
16:56:46 <palendae> Or something different?
16:57:01 <evrardjp> not directly
16:57:04 <palendae> Mostly ask cause I haven't had time to circle back on that review and I got stumped
16:57:05 <palendae> Ok
16:57:22 <evrardjp> in L to M upgrade we moved to use - in many places
16:57:31 <evrardjp> (not in the inventory 'though)
16:57:51 <evrardjp> but that's the reason why we started to use ansible_hostname
16:58:04 <palendae> Got it
16:58:12 <evrardjp> so if nova in M is now using ansible_hostname, it would now be a different name in M and L
16:58:21 <evrardjp> you can fix that by forcing M to use L name
16:58:27 <evrardjp> but that sounds like delaying the pain
16:58:59 <evrardjp> if we had an inventory adapted for that in M, that would have been different
16:59:05 <evrardjp> but that's not the case
16:59:19 <andymccr> hmm
16:59:34 <evrardjp> so until we have an inventory that's clean enough and that we can use (and revert ansible_hostname to inventory_hostname), I'd say let's stick with this
16:59:42 <andymccr> yeah
16:59:47 <evrardjp> it's for me a "known issue"
16:59:58 <andymccr> yeah agreed
17:00:07 <evrardjp> ok
17:00:08 <palendae> Is M EOL yet?
17:00:15 <evrardjp> nope
17:00:19 <evrardjp> IIRC
17:00:21 <andymccr> not yet no
17:00:27 <andymccr> it'll happen in the next few months
17:00:35 <evrardjp> but it's stable
17:00:47 <evrardjp> we can't do anything too disruptive there
17:01:05 <palendae> Right
17:01:09 <evrardjp> I guess we are out of time for today
17:01:18 <evrardjp> thanks everyone
17:01:29 <andymccr> pretty good effort - only 2 left, so all good :)
17:01:35 <evrardjp> hehe
17:01:40 <evrardjp> #endmeeting