16:04:15 #startmeeting openstack_ansible_meeting
16:04:16 Meeting started Tue Feb 20 16:04:15 2018 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:04:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:04:18 o/
16:04:19 The meeting name has been set to 'openstack_ansible_meeting'
16:04:33 o/
16:04:57 good, we are already 3.
16:05:00 o/
16:05:03 4!
16:05:05 omg!
16:05:32 o/
16:05:33 lol
16:05:41 quick, before we disappear
16:05:45 haha
16:06:08 Merged openstack/openstack-ansible-os_nova master: Change include: to include_tasks: https://review.openstack.org/544986
16:06:50 ok, let's move on to the agenda then!
16:07:00 #topic focus of the week
16:07:06 this week is!
16:07:11 drumroll....
16:07:13 Wrapping up Newton, stabilization of the queens branch by fixing bugs
16:07:33 so basically newton is close to EOL
16:07:55 I will send a message to the ML soon as a last warning :)
16:08:23 for the rest, I think it would be nice to fix the upgrades to queens and improve the stability of queens in general.
16:08:28 ok, let's move on to bug triage
16:08:37 #topic bugtriage
16:08:43 #link https://bugs.launchpad.net/openstack-ansible/+bug/1750241
16:08:43 Launchpad bug 1750241 in openstack-ansible "creating OS::Neutron::FloatingIP with OS::Neutron::LBaaS::LoadBalancer" [Undecided,New]
16:09:47 I don't understand this. Is that our thing?
16:09:57 * hwoarang has no clue
16:10:14 It looks like it's a usage issue
16:11:21 any idea?
16:11:25 should I say invalid?
16:11:30 incomplete?
16:11:30 hmm
16:11:33 that's a heat template
16:11:36 yeah
16:11:48 it looks like it depends on something
16:11:59 which we don't do, and probably shouldn't
16:11:59 I'm not sure what the issue is at all
16:12:06 ahah, welcome to the club
16:12:28 incomplete?
16:12:46 yeah, incomplete
16:12:50 I asked a question there
16:13:04 #link https://bugs.launchpad.net/openstack-ansible/+bug/1750236
16:13:04 Launchpad bug 1750236 in openstack-ansible "os::aodh::alarm via heat stable/pike ubuntu & centos http 503" [Undecided,New]
16:13:24 ok, so since it's the same reporter
16:13:29 omg.
16:13:33 I'm guessing the issue is that using heat with OSA right now is causing issues?
16:13:52 but tl;dr aodh is broken there, I'd guess :D
16:13:55 with gnocchi and autoscaling.
16:14:49 so I guess unless someone has the time to confirm, we should move to another bug
16:14:57 ok for everyone?
16:15:27 I assume yes.
16:15:30 next
16:15:31 #link https://bugs.launchpad.net/openstack-ansible/+bug/1750233
16:15:31 Launchpad bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [Undecided,New]
16:15:36 ok, this one is VERY nice.
16:16:04 I suspect those heat issues relate to bad config - endpoints, for example.
16:16:08 I want to work on it, but I am still lacking cycles, so I'd like to classify this, so we know what we can do.
16:16:25 I don't know heat very well, but it does require access to valid endpoints, and they're showing gateway errors and internal server errors.
16:17:28 heh, yeah - our inventory is designed for single-use only
16:17:32 odyssey4me: can you comment on the bug, to ask for more detailed information maybe? Or should I do it?
16:17:39 multiple accessors at the same time *will* break it
16:17:50 odyssey4me: yeah, that inventory thing shows how many issues we will hit if we do things in parallel.
16:17:51 heat does require valid endpoints
16:17:58 we should probably implement some sort of lock file or whatever
16:18:12 or not allow it to change itself ;)
16:18:16 or move to a static inventory, or a safe system
16:18:26 yeah
16:18:27 so
16:18:42 I think this is a feature addition, which is kinda wishlist
16:18:53 although, this is a SEVERE bug in usability
16:19:54 hmm
16:21:08 can I ask what the use case is there?
16:21:23 I guess it's still broken, but hmm
16:21:24 you can ask :D
16:21:32 the use case of multiple simultaneous ansible runs?
16:21:59 mattt: ok, so the aim is to run separate things at the same time? e.g. deploy nova and something else in the inventory at the same time?
16:22:05 that kinda makes sense
16:22:34 (this is why we need a distributed inventory!) :D
16:23:08 andymccr: :) yeah, not entirely sure, maybe shananigans can chime in if he's free
16:23:16 nah, we don't need a distributed inventory, we need it to be multithread safe.
16:23:27 don't go for too complex :)
16:23:37 andymccr: it could also be multiple people doing multiple things at the same time, but with the same inventory
16:23:54 we could avoid doing the tar in the script and instead create a backups dir that gets tarred up once per day as part of a cron?
16:23:57 for example - jenkins is executing the upgrade, meanwhile I'm doing some maintenance
16:24:14 German Eichberger proposed openstack/openstack-ansible-os_octavia master: Fixes Lint errors and improve tests https://review.openstack.org/544117
16:24:17 that way we only tar once and each run will create its own backup inventory
16:24:20 so that should avoid corruption
16:24:31 if we're worried just about the tarball, we could just timestamp it and maintain a history of x versions
16:24:42 I think we do - but the tarball is a tar of those to save space
16:24:57 yeah, that's what we do indeed.
16:24:58 so we create timestamped backup confs that we then tar up, and we untar, add the new one, and tar up again on each run
16:25:03 at least that's my idea
16:25:12 but the issue is actually that the inventory itself is a static file, and that file can't really be safely modified at the same time by multiple executions of the script... but in this case that's happening
16:25:12 that's correct
16:25:13 so if we just don't tar as part of the inventory script
16:25:15 well
16:25:22 we just append, I think
16:25:25 hmm
16:25:31 ok, so the issue is the inventory script itself
16:25:33 that's a problem then
16:25:36 I think this should be moved out of the inventory
16:25:45 yeah, that's definitely an inventory script failure there
16:25:45 wow, simpler would just be to have the 'openstack-ansible' wrapper do the tarballing and take it out of the inventory script
16:26:09 yeah ^ that'd work
16:26:13 but I think it sounds like it's still broken
16:26:15 hmm
16:26:57 well, let's just triage this first
16:27:10 instead of thinking of solutions
16:27:21 is that a new feature we want to provide, or do we think it's a bug?
16:27:29 you could classify it as either
16:27:36 we didn't design it to be run multiple times, as Jesse said
16:27:42 IMO it's a new feature we'd provide, but it exposes a big bug that's been hanging over us.
16:27:49 yeah
16:28:10 so I'd like to classify this as something medium or high.
16:28:56 I'd say this is high.
16:29:00 Confirmed.
16:29:43 The bug itself can be worked around with a band-aid, but the feature request for multi-user inventory should be registered separately.
16:29:44 ok, I validate this then.
16:29:56 I agree.
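For illustration, the lock-file idea discussed above could look something like the sketch below - a hypothetical snippet, not the actual dynamic_inventory.py code, assuming the inventory lives at /etc/openstack_deploy/openstack_inventory.json:

    # Hypothetical sketch: serialize inventory writes with an exclusive
    # lock so concurrent openstack-ansible runs queue up instead of
    # corrupting the file. Paths and names are illustrative only.
    import fcntl
    import json

    INVENTORY_FILE = '/etc/openstack_deploy/openstack_inventory.json'
    LOCK_FILE = INVENTORY_FILE + '.lock'

    def update_inventory(mutate):
        """Load, mutate, and save the inventory under an exclusive lock."""
        with open(LOCK_FILE, 'w') as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other runs release
            try:
                with open(INVENTORY_FILE) as f:
                    inventory = json.load(f)
                with open(INVENTORY_FILE, 'w') as f:
                    json.dump(mutate(inventory), f, indent=2)
            finally:
                fcntl.flock(lock, fcntl.LOCK_UN)

Moving the tarballing out to the openstack-ansible wrapper, as suggested above, would then cover the backup corruption separately: each run takes a timestamped backup, and only the wrapper (or a daily cron) re-tars them.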
16:29:58 That feature request is not actually asked for here though, so don't comment that.
16:30:09 We're deriving a feature request based on the bug.
16:30:16 mattt, andymccr: Sorry, was in a meeting. We were seeing bug 1750233 in a larger environment where multiple admins are running various plays to get information from the servers.
16:30:16 bug 1750233 in openstack-ansible "corrupted dynamic_inventory.py backup file" [High,Confirmed] https://launchpad.net/bugs/1750233
16:30:24 well, it's the first line of the bug.
16:30:44 ok, let's move on then, now that we have triaged the bug
16:30:59 shananigans: yeah, in my mind I could imagine a deployment running ansible routinely to obtain information about the deployment, and hitting this race condition
16:31:19 next
16:31:22 #link https://bugs.launchpad.net/openstack-ansible/+bug/1749990
16:31:22 Launchpad bug 1749990 in openstack-ansible "Fails to update the a-r-r file properly when stable branch is used" [Undecided,New]
16:32:11 I think this can be marked as confirmed and low.
16:32:26 ok for everyone?
16:32:53 let's move on, we have many bugs open today.
16:32:56 #link https://bugs.launchpad.net/openstack-ansible/+bug/1749680
16:32:56 Launchpad bug 1749680 in openstack-ansible "Ensure apt operations have retries" [Undecided,New]
16:32:56 ok
16:33:30 don't know what to say about that. by the same argument, every network operation should have a retry option
16:33:43 probably yes.
16:33:54 ansible is notoriously unreliable.
16:34:03 wow.
16:34:13 without proper context, that's not what I meant.
16:34:41 I think that it's possible that you have a process issue if a network disruption happens.
16:35:13 and this bug shouldn't cover just apt
16:35:19 every network op is subject to this problem
16:35:27 overall I am not sure I agree this is a real bug
16:35:31 Retrying is fair on modules that don't implement retries, or for modules whose operations are dependent on something that's prone to connectivity issues.
16:35:40 yeah
16:36:04 hwoarang: that's fair too
16:36:06 you could argue that if the apt module needs retries all the time, it should be hardcoded in ansible.
16:36:15 every module is subject to connectivity issues since it talks to remote hosts
16:36:19 but if we are doing it everywhere but in a few places, it seems sensible to have consistency as well
16:36:29 yup
16:36:45 hwoarang: I mean without counting the connection plugin itself
16:37:14 it's not a bug, it's a wishlist item
16:37:17 yeah
16:37:22 I think the linting brings consistency
16:37:26 it happens to cause failures, so it matters to us
16:37:26 it's wishlist indeed.
16:37:33 a lint test would help us enforce such a thing, yes
16:38:15 well, I understand both positions -- invalid because it's not really a bug and we shouldn't do it -- and wishlist for consistency
16:38:30 basically, if we did this, deployments would be more reliable in their results
16:38:35 I'd prefer to classify this as wishlist and we can think about improving ansible in the future.
16:39:09 just so you know -- ansible-devel also has lots of flaky tests when installing packages.
16:39:21 yeah, I'm all for consistency tbh, so I wouldn't be against adding it in as a feature
16:39:43 so there might be improvements coming upstream in the future too, not that it's currently planned.
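The lint test mentioned above could be a small script that walks role task files and flags package-manager tasks with no retry loop (the usual Ansible pattern being register/until/retries/delay on the task). A rough sketch, assuming flat task lists and PyYAML - the module list and rule are illustrative, not an existing OSA lint check:

    # Hypothetical lint check: flag package-manager tasks without retries.
    # Ignores blocks/includes for brevity.
    import sys
    import yaml

    FLAKY_MODULES = {'apt', 'yum', 'zypper', 'package', 'pip'}

    def missing_retries(path):
        with open(path) as f:
            tasks = yaml.safe_load(f) or []
        for task in tasks:
            if isinstance(task, dict) and FLAKY_MODULES & set(task) \
                    and 'retries' not in task:
                yield task.get('name', '<unnamed task>')

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            for name in missing_retries(path):
                print('%s: task "%s" has no retries' % (path, name))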
16:39:53 let's mark it as wishlist
16:40:19 next
16:40:22 #link https://bugs.launchpad.net/openstack-ansible/+bug/1749255
16:40:22 Launchpad bug 1749255 in openstack-ansible "haproxy healthcheck for API placement is broken" [Undecided,New]
16:40:35 I swear this thing breaks once a cycle
16:40:46 haha, yeah, it broke recently in queens.
16:41:01 sigh
16:41:03 yeah, every cycle, and sometimes mid-cycle, they change an interface
16:41:07 I fixed it, I think, so this should be Fix Released.
16:41:10 ok cool
16:41:11 next!
16:41:26 the return code, or the path
16:41:28 I fixed it before the bug was posted, I think :p
16:41:40 rc
16:41:55 next
16:41:57 #link https://bugs.launchpad.net/openstack-ansible/+bug/1749083
16:41:57 Launchpad bug 1749083 in openstack-ansible "Nova, Glance, Cinder, ... downtime during O to P upgrade" [Undecided,New]
16:42:09 oh yeah. I am on it.
16:42:23 when not busy on other things.
16:42:32 Confirmed and high, ok for everyone?
16:42:42 it's not breaking gates, it's the counting that's wrong
16:43:01 yeah
16:43:08 ok, moving on
16:43:17 #link https://bugs.launchpad.net/openstack-ansible/+bug/1748951
16:43:17 Launchpad bug 1748951 in openstack-ansible "Use default sysctl_file in openstack_hosts" [Undecided,New]
16:43:42 wishlist?
16:43:57 definitely wishlist
16:43:58 he is overriding the file, so overriding our defaults. I think he should be using the module :)
16:44:12 that would make things work in an idempotent way
16:45:00 ok, commented.
16:45:16 next
16:45:19 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747684
16:45:19 Launchpad bug 1747684 in openstack-ansible "Default Values Do Not Allow Image Uploads to Glance from Horizon" [Undecided,New]
16:46:06 maybe we should ship with different defaults?
16:46:49 the changes in horizon in Pike state the default is to use external endpoints
16:47:02 so it would make sense to work on this
16:47:10 I'll need 30 mins to process that bug, so no comment
16:47:12 and provide better defaults.
16:47:13 :)
16:47:48 evrardjp: if we use the public endpoint by default, due to the self-signed cert it will be broken out of the box
16:48:00 so to change that, we have to ensure that we also configure the cert CA
16:48:21 ultimately, yes, we should use the public endpoint by default
16:48:22 Hey All, is it possible to specify the index of the product of an intersected group of hosts? For example, if I do "hosts: group1:&group2[0]", this intersects group1 with the index 0 host from group2
16:48:24 unless the default should be to use HTTP.
16:48:33 because the endpoints shown to the user should be public, not internal
16:48:46 I think the self-signed cert might be breaking it anyway
16:48:53 we could do that, but then we'll not catch bugs where the wrong endpoint is being used
16:49:10 because iirc horizon fixed the "bug" for direct uploads and it now directs the client to use the public glance endpoint regardless of what endpoints horizon uses
16:49:10 so yes - the intent is good, but some plumbing needs to be done
16:49:36 is there no insecure flag that can be set?
16:49:38 haha, well, that's always fun
16:50:01 there probably is - not sure, but that's not doing anyone any favors either
16:50:01 I think it's the same conversation over and over again
16:50:29 does someone deserve to run openstack if they don't have valid certs?
16:50:33 hahaha
16:51:12 what do we do?
16:51:16 and who could work on this?
16:51:22 in the case where self-signed certs are used, legacy mode will work fine. maybe we should make sure our developer and AIO example configs reflect that
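For context, the legacy mode being referred to maps to Horizon's image upload setting. A minimal sketch of the workaround in Horizon's local_settings.py, assuming the Pike default is 'direct' as the discussion above suggests:

    # Workaround sketch for bug 1747684: proxy image uploads through the
    # Horizon web server ('legacy') instead of pointing the browser
    # straight at the public Glance endpoint ('direct'); 'off' disables
    # uploads from Horizon entirely.
    HORIZON_IMAGES_UPLOAD_MODE = 'legacy'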
16:51:24 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Change include: to include_tasks: https://review.openstack.org/546231
16:51:32 FYI: https://review.openstack.org/#/c/525491/
16:51:40 Jesse Pretorius (odyssey4me) proposed openstack/openstack-ansible-os_nova stable/queens: Remove systemd conditionals https://review.openstack.org/546232
16:53:02 Major Hayden proposed openstack/openstack-ansible master: Enable profile_tasks callback for AIO bootstrap https://review.openstack.org/546233
16:53:31 so what's the situation, what do we decide?
16:53:45 That looks like confirmed -- not so sure how we're gonna classify this
16:54:26 mediumish? the defaults need improvement, but with configuration (easiest way is to set upload_mode legacy) this can be easily worked around
16:55:36 ok
16:56:22 next
16:56:34 #link https://bugs.launchpad.net/openstack-ansible/+bug/1747629
16:56:34 Launchpad bug 1747629 in openstack-ansible "A worker was found in dead state" [Undecided,New]
16:57:28 I think it's valid to clean up the translations build, in order to make it pass.
16:57:32 Confirmed and high.
16:57:39 ok for everyone?
16:58:07 yeah, perhaps better to move to using the non-container build for it to save some resource usage
16:58:35 and look through all the services to see if we've ensured that they're all properly constrained
16:58:57 what exactly does the translations job do?
16:58:59 i.e. there should not be a default number of workers/threads/processes for each service - but instead a smaller set as per the AIO config
16:59:50 mattt: it installs every service that has a horizon plugin, and is used by the translations team to validate whether the language conversions are working right
17:00:02 ok, let's do a last one for today
17:00:13 #link https://bugs.launchpad.net/openstack-ansible/+bug/1745281
17:00:13 Launchpad bug 1745281 in openstack-ansible "galera_server : Create galera users fails on CentOS7" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
17:00:19 what
17:00:24 why is this assigned to me?
17:00:41 yeah, I've seen that bug
17:00:58 that thing is valid
17:01:01 yeah
17:01:11 mhayden told me the case was important
17:01:15 evrardjp: heh, it shows you assigned yourself :p
17:01:25 odyssey4me: yeah, during the galera issue period
17:01:30 but that's something different
17:01:47 I'll assign that to mhayden
17:01:53 anyone against?
17:01:55 :D
17:02:02 yup, something is hinky there - I remember mnaser also wondering what the heck was going on
17:02:08 yeah
17:02:13 it was in galera_client at that time
17:02:15 if mhayden or mgagne can pick that up, it'd be nice
17:02:15 evrardjp: no objections, I already asked him to look at it :P
17:02:25 but now the link is for galera_server
17:02:34 it looks like the package name case doesn't matter after all
17:02:51 mattt: hahah, great.
17:03:00 let's wrap up for today.
17:03:27 thanks everyone!
17:03:31 thanks!
17:03:38 #endmeeting