16:04:15 <evrardjp> #startmeeting openstack_ansible_meeting
16:04:16 <openstack> Meeting started Tue Feb 20 16:04:15 2018 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:04:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:18 <d34dh0r53> o/
16:04:19 <openstack> The meeting name has been set to 'openstack_ansible_meeting'
16:04:33 <hwoarang> o/
16:04:57 <evrardjp> good, we are already 3.
16:05:00 <jmccrory> o/
16:05:03 <evrardjp> 4!
16:05:05 <evrardjp> omg!
16:05:32 <andymccr> o/
16:05:33 <hwoarang> lol
16:05:41 <hwoarang> quick before we disappear
16:05:45 <andymccr> haha
16:06:08 <openstackgerrit> Merged openstack/openstack-ansible-os_nova master: Change include: to include_tasks: https://review.openstack.org/544986
16:06:50 <evrardjp> ok let's move on to the agenda then!
16:07:00 <evrardjp> #topic focus of the week
16:07:06 <evrardjp> this week is!
16:07:11 <evrardjp> drumroll....
16:07:13 <evrardjp> Wrapping up Newton, stabilization of the queens branch by fixing bugs
16:07:33 <evrardjp> so basically newton is close to EOL
16:07:55 <evrardjp> I will send a message to the ML soon as a last warning :)
16:08:23 <evrardjp> for the rest, I think it would be nice to fix the upgrades to queens and improve stability of queens in general.
16:08:28 <evrardjp> ok let's move on to bug triage
16:08:37 <evrardjp> #topic bugtriage
16:08:43 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750241
16:08:43 <openstack> Launchpad bug 1750241 in openstack-ansible "creating OS::Neutron::FloatingIP with OS::Neutron::LBaaS::LoadBalancer" [Undecided,New]
16:09:47 <evrardjp> I don't understand this. Is that our thing?
16:09:57 * hwoarang has no clue
16:10:14 <evrardjp> It looks like it's a usage question
16:11:21 <evrardjp> any idea?
16:11:25 <evrardjp> should I say invalid?
16:11:30 <evrardjp> incomplete?
16:11:30 <andymccr> hmm
16:11:33 <andymccr> thats a heat template
16:11:36 <evrardjp> yeah
16:11:48 <evrardjp> it looks like it depends on something
16:11:59 <evrardjp> which we don't do, and probably shouldn't
16:11:59 <andymccr> im not sure what the issue is at all
16:12:06 <evrardjp> ahah welcome to the club
16:12:28 <andymccr> incomplete?
16:12:46 <evrardjp> yeah incomplete
16:12:50 <evrardjp> I asked a question there
16:13:04 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750236
16:13:04 <openstack> Launchpad bug 1750236 in openstack-ansible "os::aodh::alarm via heat stable/pike ubuntu & centos http 503" [Undecided,New]
16:13:24 <andymccr> ok so since its the same reporter
16:13:29 <evrardjp> omg.
16:13:33 <andymccr> im guessing the issue is that using heat with OSA right now is causing issues?
16:13:52 <andymccr> but tl;dr aodh is broken there id guess :D
16:13:55 <evrardjp> with gnocchi and autoscaling.
16:14:49 <evrardjp> so I guess unless someone has the time to confirm, we should move to another bug
16:14:57 <evrardjp> ok for everyone?
16:15:27 <evrardjp> I assume yes.
16:15:30 <evrardjp> next
16:15:31 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1750233
16:15:31 <openstack> Launchpad bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [Undecided,New]
16:15:36 <evrardjp> ok this one is VERY nice.
16:16:04 <odyssey4me> I suspect those heat issues relate to bad config - endpoints for example.
16:16:08 <evrardjp> I want to work on it, but I am still lacking cycles, so I'd like to classify this, so we know what we can do.
16:16:25 <odyssey4me> I don't know heat very well, but it does require access to valid endpoints, and they're showing gateway errors and internal server errors.
16:17:28 <odyssey4me> heh, yeah - our inventory is designed for single-use only
16:17:32 <evrardjp> odyssey4me: can you comment on the bug, to ask for more detailed information maybe? Or should I do it?
16:17:39 <odyssey4me> multiple accessors at the same time *will* break it
16:17:50 <evrardjp> odyssey4me: yeah, that inventory thing shows how many issues we will hit if we do things in parallel.
16:17:51 <spotz> heat does require valid endpoints
16:17:58 <odyssey4me> we should probably implement some sort of lock file or whatever
16:18:12 <odyssey4me> or not allow it to change itself ;)
16:18:16 <evrardjp> or move to static inventory, or a safe system
16:18:26 <evrardjp> yeah
16:18:27 <evrardjp> so
16:18:42 <evrardjp> I think this is a feature addition, which is kinda wishlist
16:18:53 <evrardjp> although, this is a SEVERE bug in usability
16:19:54 <andymccr> hmm
16:21:08 <andymccr> can i ask what the usecase is there?
16:21:23 <andymccr> i guess its still broken but hmm
16:21:24 <evrardjp> you can ask :D
16:21:32 <mattt> the usecase of multiple simultaneous ansible runs?
16:21:59 <andymccr> mattt: ok so the aim is to run separate things at the same time? e.g. deploy nova and something else in the inventory at the same time?
16:22:05 <andymccr> that kinda makes sense
16:22:34 <andymccr> (this is why we need a distributed inventory!) :D
16:23:08 <mattt> andymccr: :) yeah not entirely sure, maybe shananigans can chime in if he's free
16:23:16 <evrardjp> nah we don't need distributed inventory, we need it to be multithread safe.
16:23:27 <evrardjp> don't go for too complex :)
16:23:37 <odyssey4me> andymccr it could also be multiple people doing multiple things at the same time, but with the same inventory
16:23:54 <andymccr> we could avoid doing the tar in the script and instead create a backups dir that gets tarred up once per day as part of a cron?
16:23:57 <odyssey4me> for example - jenkins is executing the upgrade, meanwhile I'm doing some maintenance
16:24:14 <openstackgerrit> German Eichberger proposed openstack/openstack-ansible-os_octavia master: Fixes Lint errors and improve tests https://review.openstack.org/544117
16:24:17 <andymccr> that way we only tar once and each run will create its own backup inventory
16:24:20 <andymccr> so that should avoid corruption
16:24:31 <odyssey4me> if we're worried just about the tarball, we could just timestamp it and maintain a history of x versions
16:24:42 <andymccr> i think we do - but the tarball is a tar of those to save space
16:24:57 <evrardjp> yeah that's what we do indeed.
16:24:58 <andymccr> so we create timestamped backup confs that we then tar up, and we untar, add the new one, and tar up again on each run
16:25:03 <andymccr> at least thats my idea
16:25:12 <odyssey4me> but the issue is actually that the inventory itself is a static file, and that file can't really be safely modified at the same time by multiple executions of the script... but in this case that's happening
16:25:12 <evrardjp> that's correct
16:25:13 <andymccr> so if we just dont tar as part of the inventory script
16:25:15 <evrardjp> well
16:25:22 <evrardjp> we just append I think
16:25:25 <andymccr> hmm
16:25:31 <andymccr> ok so the issue is the inventory script itself
16:25:33 <andymccr> thats a problem then
16:25:36 <evrardjp> I think this should be moved out of the inventory
16:25:45 <evrardjp> yeah that's definitely an inventory script failure there
16:25:45 <odyssey4me> wow, simpler would just be to have the 'openstack-ansible' wrapper do the tarballing and take it out of the inventory script
16:26:09 <andymccr> yeah ^ that'd work
16:26:13 <andymccr> but i think it sounds like its still broken
16:26:15 <andymccr> hmm
16:26:57 <evrardjp> well let's just triage this first
16:27:10 <evrardjp> instead of thinking of solutions
16:27:21 <evrardjp> is that a new feature we want to provide, or do we think it's a bug?
16:27:29 <andymccr> you could classify as either
16:27:36 <andymccr> we didnt design it to be run multiple times as jesse said
16:27:42 <evrardjp> IMO it's a new feature we provide, but it proves a big bug we have pending on our noses.
16:27:49 <andymccr> yeh
16:28:10 <evrardjp> so I'd like to classify this as something medium or high.
16:28:56 <odyssey4me> I'd say this is high.
16:29:00 <odyssey4me> Confirmed.
16:29:43 <odyssey4me> The bug itself can be worked around with a band-aid, but the feature request for multi-user inventory should be registered separately.
16:29:44 <evrardjp> ok I validate this then.
16:29:56 <evrardjp> I agree.
16:29:58 <odyssey4me> That feature request is not actually asked for here though, so don't comment on that.
16:30:09 <odyssey4me> We're deriving a feature request based on the bug.
16:30:16 <shananigans> mattt, andymccr: Sorry, was in a meeting. We were seeing bug 1750233 in a larger environment where multiple admins are running various plays to get information from the servers.
16:30:16 <openstack> bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [High,Confirmed] https://launchpad.net/bugs/1750233
16:30:24 <evrardjp> well it's the first line of the bug.
16:30:44 <evrardjp> ok let's move on then, now that we have triaged the bug
16:30:59 <mattt> shananigans: yeah, in my mind i could imagine a deployment running ansible routinely to obtain information about the deployment, and hitting this race condition
16:31:19 <evrardjp> next
16:31:22 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749990
16:31:22 <openstack> Launchpad bug 1749990 in openstack-ansible "Fails to update the a-r-r file properly when stable branch is used" [Undecided,New]
16:32:11 <evrardjp> I think this can be marked as confirmed and low.
16:32:26 <evrardjp> ok for everyone?
16:32:53 <evrardjp> let's move on, we have many bugs open today.
16:32:56 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749680
16:32:56 <openstack> Launchpad bug 1749680 in openstack-ansible "Ensure apt operations have retries" [Undecided,New]
16:32:56 <hwoarang> ok
16:33:30 <hwoarang> dont know what to say about that. by the same argument, every network operation should have a retry option
16:33:43 <evrardjp> probably yes.
16:33:54 <evrardjp> ansible is notoriously unreliable.
16:34:03 <evrardjp> wow.
16:34:13 <evrardjp> that's not what I meant without proper context.
16:34:41 <evrardjp> I think that it's possible that you have a process issue if a network disruption happens.
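(Editor's note on bug 1749680: the mechanism being discussed is Ansible's task-level retry keywords. Below is a minimal sketch, assuming a hypothetical package list variable; the task name, variable name, and retry counts are illustrative only and not copied from any OSA role.)

```yaml
# Minimal sketch: retry an apt operation that may hit transient
# network or mirror failures. All names and values are illustrative.
- name: Install distro packages
  apt:
    name: "{{ example_distro_packages }}"  # hypothetical variable
    state: present
    update_cache: yes
  register: install_packages
  until: install_packages is success
  retries: 5
  delay: 2
```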
16:35:13 <hwoarang> and this bug shouldn't cover just apt
16:35:19 <hwoarang> every network op is subject to this problem
16:35:27 <hwoarang> overall i am not sure i agree this is a real bug
16:35:31 <evrardjp> Retrying is fair on modules that don't implement retries, or for modules whose operations are dependent on something that's prone to connectivity issues.
16:35:40 <andymccr> yeah
16:36:04 <evrardjp> hwoarang: that's fair too
16:36:06 <andymccr> you could argue that if the apt module needs retries all the time it should be hardcoded in ansible.
16:36:15 <hwoarang> every module is subject to connectivity issues since it talks to remote hosts
16:36:19 <andymccr> but if we are doing it everywhere but in a few places it seems sensible to have consistency as well
16:36:29 <evrardjp> yup
16:36:45 <evrardjp> hwoarang: I mean without counting the connection plugin in itself
16:37:14 <odyssey4me> it's not a bug, its a wishlist item
16:37:17 <hwoarang> yeah
16:37:22 <evrardjp> I think linting brings consistency
16:37:26 <odyssey4me> it happens to cause failures, so it matters to us
16:37:26 <evrardjp> it's a wishlist indeed.
16:37:33 <odyssey4me> a lint test would help us enforce such a thing, yes
16:38:15 <evrardjp> well, I understand both positions -- invalid because it's not really a bug and we shouldn't do it -- and wishlist for consistency
16:38:30 <odyssey4me> basically, if we did this, deployments would be more reliable in their results
16:38:35 <evrardjp> I'd prefer to classify this as wishlist and we can think about improving ansible in the future.
16:39:09 <evrardjp> just so you know -- ansible-devel also has lots of flaky tests when installing packages.
16:39:21 <andymccr> yeah im all for consistency tbh so i wouldnt be against adding it in as a feature
16:39:43 <evrardjp> so there might be improvements coming upstream in the future too, not that it's currently planned.
16:39:53 <evrardjp> let's mark it as wishlist
16:40:19 <evrardjp> next
16:40:22 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749255
16:40:22 <openstack> Launchpad bug 1749255 in openstack-ansible "haproxy healthcheck for API placement is broken" [Undecided,New]
16:40:35 <andymccr> i swear this thing breaks once a cycle
16:40:46 <evrardjp> haha yeah it broke recently in queens.
16:41:01 <andymccr> sigh
16:41:03 <odyssey4me> yeah, every cycle and sometimes in the cycle they change an interface
16:41:07 <evrardjp> I fixed it I think, so this should be marked fix released.
16:41:10 <andymccr> ok cool
16:41:11 <andymccr> next!
16:41:26 <odyssey4me> the return code, or the path
16:41:28 <evrardjp> I fixed it before the bug was posted I think :p
16:41:40 <evrardjp> rc
16:41:55 <evrardjp> next
16:41:57 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1749083
16:41:57 <openstack> Launchpad bug 1749083 in openstack-ansible "Nova, Glance, Cinder, ... downtime during O to P upgrade" [Undecided,New]
16:42:09 <evrardjp> oh yeah. I am on it.
16:42:23 <evrardjp> when not busy on other things.
16:42:32 <evrardjp> Confirmed and high, ok for everyone?
16:42:42 <evrardjp> it's not breaking gates, it's the counting that's wrong
16:43:01 <odyssey4me> yeah
16:43:08 <evrardjp> ok moving on
16:43:17 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1748951
16:43:17 <openstack> Launchpad bug 1748951 in openstack-ansible "Use default sysctl_file in openstack_hosts" [Undecided,New]
16:43:42 <evrardjp> wishlist?
16:43:57 <mattt> definitely wishlist
16:43:58 <evrardjp> he is overriding the file, so overriding our defaults. I think he should be using the module :)
16:44:12 <evrardjp> that would make things work in an idempotent way
16:45:00 <evrardjp> ok commented.
16:45:16 <evrardjp> next
16:45:19 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747684
16:45:19 <openstack> Launchpad bug 1747684 in openstack-ansible "Default Values Do Not Allow Image Uploads to Glance from Horizon" [Undecided,New]
16:46:06 <evrardjp> maybe we should ship with different defaults?
16:46:49 <evrardjp> the changes in horizon in Pike state the default is to use external endpoints
16:47:02 <evrardjp> so it would make sense to work on this
16:47:10 <mattt> i'll need 30 mins to process that bug, so no comment
16:47:12 <evrardjp> and provide better defaults.
16:47:13 <mattt> :)
16:47:48 <odyssey4me> evrardjp if we use the public endpoint by default, due to the self-signed cert it will be broken out of the box
16:48:00 <odyssey4me> so to change that we have to ensure that we also configure the CA cert
16:48:21 <odyssey4me> ultimately, yes, we should use the public endpoint by default
16:48:22 <jwitko_> Hey All, is it possible to specify the index of the product of an intersected group of hosts? For example, if i do "hosts: group1:&group2[0]", this intersects group1 with the index0 host from group2
16:48:24 <evrardjp> unless the default should be to use HTTP.
16:48:33 <odyssey4me> because the endpoints shown to the user should be public, not internal
16:48:46 <logan-> i think the self signed cert might be breaking it anyway
16:48:53 <odyssey4me> we could do that, but then we'll not catch bugs where the wrong endpoint is being used
16:49:10 <logan-> because iirc horizon fixed the "bug" for direct uploads and it now directs the client to use public glance endpoints regardless of what endpoints horizon uses
16:49:10 <odyssey4me> so yes - the intent is good, but some plumbing needs to be done
16:49:36 <evrardjp> are there no insecure flags that can be set?
16:49:38 <odyssey4me> haha, well that's always fun
16:50:01 <odyssey4me> there probably is - not sure, but that's not doing anyone any favors either
16:50:01 <evrardjp> I think it's the same conversation over and over again
16:50:29 <evrardjp> does someone deserve to run openstack if they don't have valid certs?
16:50:33 <evrardjp> hahaha
16:51:12 <evrardjp> what do we do?
16:51:16 <evrardjp> and who could work on this?
16:51:22 <logan-> in the case where self signed certs are used, legacy mode will work fine. maybe we should make sure our developer and AIO example configs reflect that
16:51:24 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Change include: to include_tasks: https://review.openstack.org/546231
16:51:32 <evrardjp> FYI: https://review.openstack.org/#/c/525491/
16:51:40 <openstackgerrit> Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Remove systemd conditionals https://review.openstack.org/546232
16:53:02 <openstackgerrit> Major Hayden proposed openstack/openstack-ansible master: Enable profile_tasks callback for AIO bootstrap https://review.openstack.org/546233
16:53:31 <evrardjp> so what's the situation, what do we decide?
16:53:45 <evrardjp> That looks like confirmed -- not so sure how we're gonna classify this
16:54:26 <logan-> mediumish? the defaults need improvement but with configuration (easiest way is to set upload_mode legacy) this can be easily worked around
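(Editor's note on the workaround logan- mentions: the underlying Horizon setting is HORIZON_IMAGES_UPLOAD_MODE, which can be set to 'legacy' so uploads are proxied through Horizon rather than sent straight to the public glance endpoint. A minimal sketch of a deployer override follows; the OSA variable name shown is an assumption for illustration and should be verified against the os_horizon role, see https://review.openstack.org/#/c/525491/.)

```yaml
# Hypothetical sketch for /etc/openstack_deploy/user_variables.yml:
# fall back to legacy image uploads so a self-signed public endpoint
# does not break uploads from the dashboard.
# The variable name below is assumed for illustration; check the
# os_horizon role defaults (it ultimately drives HORIZON_IMAGES_UPLOAD_MODE).
horizon_images_upload_mode: legacy
```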
16:55:36 <evrardjp> ok
16:56:22 <evrardjp> next
16:56:34 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:56:34 <openstack> Launchpad bug 1747629 in openstack-ansible "A worker was found in dead state" [Undecided,New]
16:57:28 <evrardjp> I think it's valid to clean up the translations build, in order to make it pass.
16:57:32 <evrardjp> Confirmed and high.
16:57:39 <evrardjp> ok for everyone?
16:58:07 <odyssey4me> yeah, perhaps better to move to using the non-container build for it to save some resource usage
16:58:35 <odyssey4me> and look through all the services to see if we've ensured that they're all properly constrained
16:58:57 <mattt> what exactly does the translations job do?
16:58:59 <odyssey4me> ie there should not be a default number of workers/threads/processes for each service - but instead a smaller set as per the AIO config
16:59:50 <odyssey4me> mattt it installs every service that has a horizon plugin, and is used by the translations team to validate whether the language translations are working right
17:00:02 <evrardjp> ok let's do a last one for today
17:00:13 <evrardjp> #link https://bugs.launchpad.net/openstack-ansible/+bug/1745281
17:00:13 <openstack> Launchpad bug 1745281 in openstack-ansible "galera_server : Create galera users fails on CentOS7" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
17:00:19 <evrardjp> what
17:00:24 <evrardjp> why is this assigned to me?
17:00:41 <mattt> yeah i've seen that bug
17:00:58 <mattt> that thing is valid
17:01:01 <evrardjp> yeah
17:01:11 <evrardjp> mhayden told me the case was important
17:01:15 <odyssey4me> evrardjp heh, it shows you assigned yourself :p
17:01:25 <evrardjp> odyssey4me: yeah during the galera issue period
17:01:30 <evrardjp> but that's something different
17:01:47 <evrardjp> I'll assign that to mhayden
17:01:53 <evrardjp> anyone against?
17:01:55 <evrardjp> :D
17:02:02 <odyssey4me> yup, something is hinky there - I remember mnaser also wondering what the heck was going on
17:02:08 <evrardjp> yeah
17:02:13 <evrardjp> it was in galera_client at that time
17:02:15 <odyssey4me> if mhayden or mgagne can pick that up it'd be nice
17:02:15 <mattt> evrardjp: no objections, i already asked him to look at it :P
17:02:25 <evrardjp> but now the link is for galera_server
17:02:34 <evrardjp> it looks like the package name case doesn't matter after all
17:02:51 <evrardjp> mattt: hahah great.
17:03:00 <evrardjp> let's wrap up for today.
17:03:27 <evrardjp> thanks everyone!
17:03:31 <mattt> thanks!
17:03:38 <evrardjp> #endmeeting