16:04:15 <evrardjp> #startmeeting openstack_ansible_meeting
16:04:16 <openstack> Meeting started Tue Feb 20 16:04:15 2018 UTC and is due to finish in 60 minutes.  The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:04:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:18 <d34dh0r53> o/
16:04:19 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:04:33 <hwoarang> ο/
16:04:57 <evrardjp> good, we are already 3.
16:05:00 <jmccrory> o/
16:05:03 <evrardjp> 4!
16:05:05 <evrardjp> omg!
16:05:32 <andymccr> o/
16:05:33 <hwoarang> lol
16:05:41 <hwoarang> quick before we disappear
16:05:45 <andymccr> haha
16:06:08 <openstackgerrit> Merged openstack/openstack-ansible-os_nova master: Change include: to include_tasks:  https://review.openstack.org/544986
16:06:50 <evrardjp> ok let's move on to the agenda then!
16:07:00 <evrardjp> #topic focus of the week
16:07:06 <evrardjp> this week is!
16:07:11 <evrardjp> drumroll....
16:07:13 <evrardjp> Wrapping up Newton, stabilization of queens branch by fixing bugs
16:07:33 <evrardjp> so basically newton is close to EOL
16:07:55 <evrardjp> I will send a message to the ML soon as a last warning :)
16:08:23 <evrardjp> for the rest, I think it would be nice to fix the upgrades to queens and improve stability of queens in general.
16:08:28 <evrardjp> ok let's move on to bug triage
16:08:37 <evrardjp> #topic bugtriage
16:08:43 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750241
16:08:43 <openstack> Launchpad bug 1750241 in openstack-ansible "creating OS::Neutron::FloatingIP with OS::Neutron::LBaaS::LoadBalancer" [Undecided,New]
16:09:47 <evrardjp> I don't understand this. Is that our thing?
16:09:57 * hwoarang has no clue
16:10:14 <evrardjp> It looks like it's usage
16:11:21 <evrardjp> any idea?
16:11:25 <evrardjp> should I say invalid?
16:11:30 <evrardjp> incomplete?
16:11:30 <andymccr> hmm
16:11:33 <andymccr> thats a heat template
16:11:36 <evrardjp> yeah
16:11:48 <evrardjp> it looks like it depends on something
16:11:59 <evrardjp> which we don't do, and probably shouldn't
16:11:59 <andymccr> im not sure what the issue is at all
16:12:06 <evrardjp> ahah welcome to the club
16:12:28 <andymccr> incomplete?
16:12:46 <evrardjp> yeah incomplete
16:12:50 <evrardjp> I asked a question there
16:13:04 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750236
16:13:04 <openstack> Launchpad bug 1750236 in openstack-ansible "os::aodh::alarm via heat stable/pike ubuntu & centos http 503" [Undecided,New]
16:13:24 <andymccr> ok so since its the same reporter
16:13:29 <evrardjp> omg.
16:13:33 <andymccr> im guessing the issue is that using heat with OSA right now is causing issues?
16:13:52 <andymccr> but tl;dr aodh is broken there id guess :D
16:13:55 <evrardjp> with gnocchi and autoscaling.
16:14:49 <evrardjp> so I guess unless someone has the time to confirm, we should move to another bug
16:14:57 <evrardjp> ok for everyone?
16:15:27 <evrardjp> I assume yes.
16:15:30 <evrardjp> next
16:15:31 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750233
16:15:31 <openstack> Launchpad bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [Undecided,New]
16:15:36 <evrardjp> ok this one is VERY nice.
16:16:04 <odyssey4me> I suspect those heat issues relate to bad config - endpoints for example.
16:16:08 <evrardjp> I want to work on it, but I am still lacking cycles, so I'd like to classify this, so we know what we can do.
16:16:25 <odyssey4me> I don't know heat very well, but it does require access to valid endpoints, and they're showing gateway errors and internal server errors.
16:17:28 <odyssey4me> heh, yeah - our inventory is designed for single-use only
16:17:32 <evrardjp> odyssey4me: can you comment on the bug, to ask for more detailed information maybe? Or should I do it?
16:17:39 <odyssey4me> multiple accessors at the same time *will* break it
16:17:50 <evrardjp> odyssey4me: yeah, that inventory thing shows how many issues we will hit if we do things in parallel.
16:17:51 <spotz> heat does require valid endpoints
16:17:58 <odyssey4me> we should probably implement some sort of lock file or whatever
16:18:12 <odyssey4me> or not allow it to change itself ;)
16:18:16 <evrardjp> or move to a static inventory, or a safe system
16:18:26 <evrardjp> yeah
16:18:27 <evrardjp> so
16:18:42 <evrardjp> I think this is a feature addition, which is kinda wishlist
16:18:53 <evrardjp> although, this is a SEVERE bug in usability
16:19:54 <andymccr> hmm
16:21:08 <andymccr> can i ask what the usecase is there?
16:21:23 <andymccr> i guess its still broken but hmm
16:21:24 <evrardjp> you can ask :D
16:21:32 <mattt> the usecase of multiple simultaneous ansible runs?
16:21:59 <andymccr> mattt: ok so the aim is to run separate things at the same time? e.g. deploy nova and something else in the inventory at the same time?
16:22:05 <andymccr> that kinda makes sense
16:22:34 <andymccr> (this is why we need a distributed inventory!) :D
16:23:08 <mattt> andymccr: :)  yeah not entirely sure, maybe shananigans can chime in if he's free
16:23:16 <evrardjp> nah we don't need distributed inventory, we need it to be multithread safe.
16:23:27 <evrardjp> don't go for too complex :)
16:23:37 <odyssey4me> andymccr it could also be multiple people doing multiple things at the same time, but with the same inventory
16:23:54 <andymccr> we could avoid doing the tar in the script and instead create a backups dir that gets tarred up once per day as part of a cron?
16:23:57 <odyssey4me> for example - jenkins is executing the upgrade, meanwhile I'm doing some maintenance
16:24:14 <openstackgerrit> German Eichberger proposed openstack/openstack-ansible-os_octavia master: Fixes Lint errors and improve tests  https://review.openstack.org/544117
16:24:17 <andymccr> that way we only tar once and each run will create its own backup inventory
16:24:20 <andymccr> so that should avoid corruption
16:24:31 <odyssey4me> if we're worried just about the tarball, we could just timestamp it and maintain a history of x versions
16:24:42 <andymccr> i think we do - but the tarball is a tar of those to save space
16:24:57 <evrardjp> yeah that's what we do indeed.
16:24:58 <andymccr> so we create timestamped backup confs that we then tar up, and we untar, add the new one, and tar up again on each run
16:25:03 <andymccr> at least thats my idea
16:25:12 <odyssey4me> but the issue is actually that the inventory itself is a static file, and that file can't really be safely modified at the same time by multiple executions of the script... but in this case that's happening
16:25:12 <evrardjp> that's correct
16:25:13 <andymccr> so if we just dont tar as part of the inventory script
16:25:15 <evrardjp> well
16:25:22 <evrardjp> we just append I think
16:25:25 <andymccr> hmm
16:25:31 <andymccr> ok so the issue is the inventory script itself
16:25:33 <andymccr> thats a problem then
16:25:36 <evrardjp> I think this should be moved out of the inventory
16:25:45 <evrardjp> yeah that's definitely inventory script failure there
16:25:45 <odyssey4me> wow, simpler would just be to have the 'openstack-ansible' wrapper do the tarballing and take it out of the inventory script
16:26:09 <andymccr> yeah ^ that'd work
16:26:13 <andymccr> but i think it sounds like its still broken
16:26:15 <andymccr> hmm
16:26:57 <evrardjp> well let's just triage this first
16:27:10 <evrardjp> instead of thinking of solutions
16:27:21 <evrardjp> is that a new feature we want to provide, or do we think it's a bug?
16:27:29 <andymccr> you could classify as either
16:27:36 <andymccr> we didnt design it to be run multiple times as jesse said
16:27:42 <evrardjp> IMO it's a new feature we'd provide, but it exposes a big bug we have right under our noses.
16:27:49 <andymccr> yeh
16:28:10 <evrardjp> so I'd like to classify this as something medium or high.
16:28:56 <odyssey4me> I'd say this is high.
16:29:00 <odyssey4me> Confirmed.
16:29:43 <odyssey4me> The bug problem itself can be worked around with a band-aid, but the feature request for multi-user inventory should be registered separately.
16:29:44 <evrardjp> ok I validate this then.
16:29:56 <evrardjp> I agree.
16:29:58 <odyssey4me> That feature request is not actually asked for here though, so don't comment that.
16:30:09 <odyssey4me> We're deriving a feature request based on the bug.
16:30:16 <shananigans> mattt, andymccr: Sorry, was in a meeting.  We were seeing bug 1750233 in a larger environment where multiple admins are running various plays to get information from the servers.
16:30:16 <openstack> bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [High,Confirmed] https://launchpad.net/bugs/1750233
16:30:24 <evrardjp> well it's the first line of the bug.
16:30:44 <evrardjp> ok let's move on then, now that we have triaged the bug
16:30:59 <mattt> shananigans: yeah, in my mind i could imagine a deployment running ansible routinely to obtain information about the deployment, and hitting this race condition
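The lock file idea raised above for bug 1750233 could look roughly like the following. This is only a minimal sketch, not the actual dynamic_inventory.py code: it assumes the inventory is a single JSON file, and the names inventory_lock and update_inventory and the ".lock" suffix are hypothetical. It takes an exclusive advisory lock so that only one run at a time reads and rewrites the file, and it writes through a temporary file so a reader never sees a half-written inventory.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def inventory_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of the block."""
    # Hypothetical lock file; flock blocks until other holders release it.
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_inventory(inventory_path, mutate):
    """Read, modify, and atomically rewrite a JSON inventory under a lock."""
    with inventory_lock(inventory_path + ".lock"):
        inventory = {}
        if os.path.exists(inventory_path):
            with open(inventory_path) as handle:
                inventory = json.load(handle)
        # The caller-supplied function changes the inventory in place.
        mutate(inventory)
        # Write to a temp file in the same directory, then rename it over
        # the original so the replacement is a single atomic step.
        directory = os.path.dirname(os.path.abspath(inventory_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(inventory, handle, indent=2)
        os.replace(tmp_path, inventory_path)
        return inventory
```

Advisory locks only help if every writer goes through the same helper, and fcntl is Unix-only. This addresses the corruption itself; it does not turn the inventory into a real multi-user system, which the discussion treats as a separate feature request.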
16:31:19 <evrardjp> next
16:31:22 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749990
16:31:22 <openstack> Launchpad bug 1749990 in openstack-ansible "Fails to update the a-r-r file properly when stable branch is used" [Undecided,New]
16:32:11 <evrardjp> I think this can be marked as confirmed and low.
16:32:26 <evrardjp> ok for everyone ?
16:32:53 <evrardjp> let's move on, we have many bugs open today.
16:32:56 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749680
16:32:56 <openstack> Launchpad bug 1749680 in openstack-ansible "Ensure apt operations have retries" [Undecided,New]
16:32:56 <hwoarang> ok
16:33:30 <hwoarang> dont know what to say about that. by the same argument, every network operation should have a retry option
16:33:43 <evrardjp> probably yes.
16:33:54 <evrardjp> ansible is notoriously unreliable.
16:34:03 <evrardjp> wow.
16:34:13 <evrardjp> that's not what I meant without proper context.
16:34:41 <evrardjp> I think that it's possible that you have a process issue if a network disruption happens.
16:35:13 <hwoarang> and this bug shouldn't cover just apt
16:35:19 <hwoarang> every network op is subject to this problem
16:35:27 <hwoarang> overall i am not sure i agree this is a real bug
16:35:31 <evrardjp> Retrying is fair on modules that don't implement retries, or for modules whose operations are dependent on something that's prone to connectivity issues.
16:35:40 <andymccr> yeah
16:36:04 <evrardjp> hwoarang: that's fair too
16:36:06 <andymccr> you could argue that if apt module needs retries all the time it should be hardcoded in ansible.
16:36:15 <hwoarang> every module is subject to connectivity issues since it talks to remote hosts
16:36:19 <andymccr> but if we are doing it everywhere but in a few places it seems sensible to have consistency as well
16:36:29 <evrardjp> yup
16:36:45 <evrardjp> hwoarang: I mean without counting the connection plugin in itself
16:37:14 <odyssey4me> it's not a bug, its a wishlist item
16:37:17 <hwoarang> yeah
16:37:22 <evrardjp> I think linting brings consistency
16:37:26 <odyssey4me> it happens to cause failures, so it matters to us
16:37:26 <evrardjp> it's a wishlist indeed.
16:37:33 <odyssey4me> a lint test would help us enforce such a thing, yes
16:38:15 <evrardjp> well, I understand both positions -- invalid for the fact it's not really a bug because we shouldn't do it -- and wishlist for consistency
16:38:30 <odyssey4me> basically, if we did this, deployments would be more reliable in their results
16:38:35 <evrardjp> I'd prefer to classify this as wishlist and we can think about improving ansible in the future.
16:39:09 <evrardjp> just so you know -- ansible-devel also has lots of flaky tests when installing packages.
16:39:21 <andymccr> yeah im all for consistency tbh so i wouldnt be against adding it in as a feature
16:39:43 <evrardjp> so there might be improvements coming upstream in the future too, not that it's currently planned.
16:39:53 <evrardjp> let's mark it as wishlist
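The lint test mentioned above for bug 1749680 might be sketched as follows. It is an illustration only, not an existing ansible-lint rule: it assumes the task files have already been parsed into a list of dicts (for example by a YAML loader), and PACKAGE_MODULES and tasks_missing_retries are hypothetical names. It flags package tasks that do not set both of Ansible's `until` and `retries` task keywords.

```python
# Assumption: the module names whose tasks should carry retries.
PACKAGE_MODULES = {"apt", "yum", "package"}


def tasks_missing_retries(tasks):
    """Return the names of package tasks that lack until/retries."""
    missing = []
    for task in tasks:
        # A task is a package task if one of its keys is a package module.
        uses_package_module = bool(PACKAGE_MODULES & set(task))
        has_retry = "until" in task and "retries" in task
        if uses_package_module and not has_retry:
            missing.append(task.get("name", "<unnamed>"))
    return missing


# Hypothetical parsed tasks, for illustration only.
example_tasks = [
    {"name": "Install packages", "apt": {"name": "haproxy"}},
    {
        "name": "Install packages with retries",
        "apt": {"name": "haproxy"},
        "register": "install_result",
        "until": "install_result is success",
        "retries": 5,
        "delay": 2,
    },
    {"name": "Write config", "template": {"src": "a.j2", "dest": "/etc/a"}},
]

flagged = tasks_missing_retries(example_tasks)
```

Here only the first task would be flagged. The same check could be widened to other network-dependent modules, which is the consistency argument made in the discussion.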
16:40:19 <evrardjp> next
16:40:22 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749255
16:40:22 <openstack> Launchpad bug 1749255 in openstack-ansible "haproxy healthcheck for API placement is broken" [Undecided,New]
16:40:35 <andymccr> i swear this thing breaks once a cycle
16:40:46 <evrardjp> haha yeah it broke recently in queens.
16:41:01 <andymccr> sigh
16:41:03 <odyssey4me> yeah, every cycle and sometimes in the cycle they change an interface
16:41:07 <evrardjp> I fixed it I think, so this should be marked Fix Released.
16:41:10 <andymccr> ok cool
16:41:11 <andymccr> next!
16:41:26 <odyssey4me> the return code, or the path
16:41:28 <evrardjp> I fixed it before the bug was posted I think :p
16:41:40 <evrardjp> rc
16:41:55 <evrardjp> next
16:41:57 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749083
16:41:57 <openstack> Launchpad bug 1749083 in openstack-ansible "Nova, Glance, Cinder, ... downtime during O to P upgrade" [Undecided,New]
16:42:09 <evrardjp> oh yeah. I am on it.
16:42:23 <evrardjp> when not busy on other things.
16:42:32 <evrardjp> Confirmed and high, ok for everyone?
16:42:42 <evrardjp> it's not breaking gates, it's counting that's wrong
16:43:01 <odyssey4me> yeah
16:43:08 <evrardjp> ok moving on
16:43:17 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1748951
16:43:17 <openstack> Launchpad bug 1748951 in openstack-ansible "Use default sysctl_file in openstack_hosts" [Undecided,New]
16:43:42 <evrardjp> wishlist?
16:43:57 <mattt> definitely wishlist
16:43:58 <evrardjp> he is overriding the file, so overriding our defaults. I think he should be using the module :)
16:44:12 <evrardjp> that would make things work in an idempotent way
16:45:00 <evrardjp> ok commented.
16:45:16 <evrardjp> next
16:45:19 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747684
16:45:19 <openstack> Launchpad bug 1747684 in openstack-ansible "Default Values Do Not Allow Image Uploads to Glance from Horizon" [Undecided,New]
16:46:06 <evrardjp> maybe we should ship with different defaults?
16:46:49 <evrardjp> the changes in horizon in Pike state the default is to use external endpoints
16:47:02 <evrardjp> so it would make sense to work on this
16:47:10 <mattt> i'll need 30 mins to process that bug, so no comment
16:47:12 <evrardjp> and provide better defaults.
16:47:13 <mattt> :)
16:47:48 <odyssey4me> evrardjp if we use the public endpoint by default, due to the self-signed cert it will be broken out the box
16:48:00 <odyssey4me> so to change that we have to ensure that we also configure the cert CA
16:48:21 <odyssey4me> ultimately, yes, we should use the public endpoint by default
16:48:22 <jwitko_> Hey All, is it possible to specify the index of the product of an intersected group of hosts? For example,  if i do  "hosts: group1:&group2[0]",  this intersects group1 with the index0 host from group2
16:48:24 <evrardjp> unless the default should be to use HTTP.
16:48:33 <odyssey4me> because the endpoints shown to the user should be public, not internal
16:48:46 <logan-> i think the self signed cert might be breaking it anyway
16:48:53 <odyssey4me> we could do that, but then we'll not catch bugs where the wrong endpoint is being used
16:49:10 <logan-> because iirc horizon fixed the "bug" for direct uploads and it now directs the client to use public glance endpoints regardless of what endpoints horizon uses
16:49:10 <odyssey4me> so yes - the intent is good, but some plumbing needs to be done
16:49:36 <evrardjp> are there no insecure flags that can be set?
16:49:38 <odyssey4me> haha, well that's always fun
16:50:01 <odyssey4me> there probably is - not sure, but that's not doing anyone any favors either
16:50:01 <evrardjp> I think it's the same conversation over and over again
16:50:29 <evrardjp> does someone deserve to run openstack if he hasn't valid certs?
16:50:33 <evrardjp> hahaha
16:51:12 <evrardjp> what do we do?
16:51:16 <evrardjp> and who could work on this?
16:51:22 <logan-> in the case where self signed certs are used, legacy mode will work fine. maybe we should make sure our developer and AIO example configs reflect that
16:51:24 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Change include: to include_tasks:  https://review.openstack.org/546231
16:51:32 <evrardjp> FYI: https://review.openstack.org/#/c/525491/
16:51:40 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Remove systemd conditionals  https://review.openstack.org/546232
16:53:02 <openstackgerrit> Major Hayden proposed openstack/openstack-ansible master: Enable profile_tasks callback for AIO bootstrap  https://review.openstack.org/546233
16:53:31 <evrardjp> so what's the situation, what do we decide?
16:53:45 <evrardjp> That looks like confirmed -- not so sure how we're gonna classify this
16:54:26 <logan-> mediumish? the defaults need improvement but with configuration (easiest way is to set upload_mode legacy) this can be easily worked around
16:55:36 <evrardjp> ok
16:56:22 <evrardjp> next
16:56:34 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:56:34 <openstack> Launchpad bug 1747629 in openstack-ansible "A worker was found in dead state" [Undecided,New]
16:57:28 <evrardjp> I think it's valid to clean up the translations build, in order to make it pass.
16:57:32 <evrardjp> Confirmed and high.
16:57:39 <evrardjp> ok for everyone?
16:58:07 <odyssey4me> yeah, perhaps better to move to using the non-container build for it to save up some resource usage
16:58:35 <odyssey4me> and look through all the services to see if we've ensured that they're all properly constrained
16:58:57 <mattt> what exactly does the translations job do?
16:58:59 <odyssey4me> ie there should not be a default number of workers/threads/processes for each service - but instead a smaller set as per the AIO config
16:59:50 <odyssey4me> mattt it installs every service that has a horizon plugin, and is used by the translations team to validate whether the language conversions are working right
17:00:02 <evrardjp> ok let's do a last one for today
17:00:13 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745281
17:00:13 <openstack> Launchpad bug 1745281 in openstack-ansible "galera_server : Create galera users fails on CentOS7" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
17:00:19 <evrardjp> what
17:00:24 <evrardjp> why is this assigned to me?
17:00:41 <mattt> yeah i've seen that bug
17:00:58 <mattt> that thing is valid
17:01:01 <evrardjp> yeah
17:01:11 <evrardjp> mhayden told me the case was important
17:01:15 <odyssey4me> evrardjp heh, it shows you assigned yourself :p
17:01:25 <evrardjp> odyssey4me: yeah during the galera issue period
17:01:30 <evrardjp> but that's something different
17:01:47 <evrardjp> I'll assign that to mhayden
17:01:53 <evrardjp> anyone against?
17:01:55 <evrardjp> :D
17:02:02 <odyssey4me> yup, something is hinky there - I remember mnaser also wondering what the heck was going on
17:02:08 <evrardjp> yeah
17:02:13 <evrardjp> it was in galera_client at that time
17:02:15 <odyssey4me> if mhayden or mgagne can pick that up it'd be nice
17:02:15 <mattt> evrardjp: no objections, i already asked him to look at it :P
17:02:25 <evrardjp> but now the link is for galera_server
17:02:34 <evrardjp> it looks like the package name case doesn't matter after all
17:02:51 <evrardjp> mattt: hahah great.
17:03:00 <evrardjp> let's wrap up for today.
17:03:27 <evrardjp> thanks everyone!
17:03:31 <mattt> thanks!
17:03:38 <evrardjp> #endmeeting