16:01:22 #startmeeting openstack_ansible_meeting
16:01:22 Meeting started Tue Mar 21 16:01:22 2017 UTC and is due to finish in 60 minutes. The chair is evrardjp. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:26 The meeting name has been set to 'openstack_ansible_meeting'
16:01:30 woot
16:01:46 well, the list has been going down over the last few weeks :)
16:03:00 So there is no action point from last week, which proves we were either good, or I'm terrible at tracking.
16:03:21 #action evrardjp don't forget to use #action
16:03:51 I guess everybody has joined and is active now
16:03:55 thanks for being here!
16:04:00 let's start!
16:04:05 [insert gif here]
16:04:10 #link https://bugs.launchpad.net/openstack-ansible/+bug/1673938
16:04:10 Launchpad bug 1673938 in openstack-ansible "git example latest release" [Undecided,New]
16:04:23 hmm, I'd say invalid - we ideally want people using the releases.
16:04:54 because stable/ocata is movable, whereas 15.0.0 isn't - from a docs perspective at least
16:05:21 agreed.
16:05:33 I'll put a note in there
16:05:53 I'm already doing it
16:06:03 done
16:06:05 ahh ok sweet
16:06:07 you can add to it if you prefer.
16:06:13 you know my English :p
16:06:22 next
16:06:24 #link https://bugs.launchpad.net/openstack-ansible/+bug/1673889
16:06:25 Launchpad bug 1673889 in openstack-ansible "Nova services do not restart on N->O upgrade" [Undecided,New]
16:06:43 hmm
16:07:15 perhaps that's why our upgrade jobs are failing
16:08:06 well, that would explain a lot
16:08:23 (Jenkins still loading for me)
16:09:07 systemd daemon-reload is not properly issued?
16:09:27 (still loading...)
16:09:57 o/
16:10:04 let me know if I can provide more info on that
16:10:29 The Jenkins page is usually ~135M, I just wget and less it
16:10:39 the containers probably shouldn't restart, but the services should
16:10:39 this one was 65 megs or something
16:10:46 but you're right
16:10:46 :p
16:10:54 hmm, wondering why a container restart would fix it though - that would imply the services weren't restarted properly
16:10:56 maybe it's too much for my browser
16:11:06 so you may be right evrardjp
16:11:08 andymccr: I think we shouldn't force anything for the restart
16:11:18 we should make it work without a restart
16:11:33 evrardjp: agreed - but the only diff there would be services started up with the new venv etc
16:11:35 a restart could work, but we shouldn't force one :p
16:11:43 unless we do it in a proper serial way
16:11:48 that would imply a regular service restart would've worked (or at least a daemon-reload and then a restart)
16:12:14 hmm hmm
16:12:19 did I miss something?
16:12:36 evrardjp: well it seems it worked - except the services in the containers didn't restart properly, so they were still running 14.1.1 code.
16:12:37 the paste seems to say it's stuck on 14.x where it should have been 15.x
16:12:46 so daemon-reload should have worked
16:12:49 odyssey4me: # openstack-ansible setup-hosts.yml Variable files: "-e @/etc/openstack_deploy/user_secrets.yml -e @/etc/openstack_deploy/user_variables.yml " ERROR! Attempted to execute "/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py" as inventory script: Inventory script (/opt/openstack-ansible/playbooks/inventory/dynamic_inventory.py) had an execution error: No container or management network specified in user
16:12:57 but if a container restart fixed that - then all that happened was the services started on boot using the correct venv
16:13:22 oh yes, I think we arrived at the same conclusion, I thought you were contradicting me :p
16:13:50 ok my browser crashed.
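The 14.1.1-vs-15.x check discussed above boils down to resolving each service's running executable back to its venv and comparing versions. A minimal sketch of that version parsing; the /openstack/venvs/<name>-<version> path layout is assumed for illustration, not taken from the log:

```python
import re

def venv_version(exe_path):
    """Extract the venv version from a service's executable path,
    e.g. /openstack/venvs/nova-14.1.1/bin/nova-compute -> '14.1.1'.
    Returns None when the binary does not live in a versioned venv."""
    m = re.search(r"/venvs/[^/]+-(\d+(?:\.\d+)*)/", exe_path)
    return m.group(1) if m else None

# A mismatch like this is what the bug describes: the unit file points at
# the new venv, but the running process still resolves to the old one.
running = venv_version("/openstack/venvs/nova-14.1.1/bin/nova-compute")
expected = "15.0.0"
assert running != expected
```

In practice the exe path would come from something like /proc/&lt;MainPID&gt;/exe for each nova unit; the parsing above is only the comparison half of that check.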
16:13:54 that's a bad bug though
16:14:02 I'd say high
16:14:23 I'd say critical even. We need to reliably know which version is running!
16:14:35 it could be disastrous to run 14.x code on a 15.x db, right?
16:14:39 yeah hmm
16:14:46 I think the issue is
16:15:03 it's restarting the compute hosts correctly
16:15:07 and then there is a version mismatch
16:15:09 but yeah, you're right
16:15:09 hmm
16:15:14 ok let's mark it critical
16:16:06 so the log loaded for me
16:16:26 it seems it needs further investigation. The systemctl daemon-reload task returns changed
16:16:40 and then it restarts the services
16:16:44 so yes.
16:16:55 hmm that's weird
16:18:04 yes, I see changed on nova-compute and others afterwards too. That's a good bug report
16:18:35 who has time to look at this?
16:18:47 it would be great to assign someone right now
16:19:06 dankolbrs: will you spend time on this too?
16:19:52 ok let's trust that someone will pick it up, and track it
16:19:59 next
16:20:04 #link https://bugs.launchpad.net/openstack-ansible/+bug/1673226
16:20:04 Launchpad bug 1673226 in openstack-ansible "Support cinder v3" [Undecided,New]
16:20:07 oh this is fixed
16:20:14 evrardjp, I've looked into it but haven't really figured it out
16:20:19 andymccr: ?
16:20:22 which one?
16:20:25 https://review.openstack.org/#/c/446503/
16:20:26 the cinder one
16:20:28 I was able to replicate it on an AIO
16:20:57 dankolbrs: we moved on to another bug due to time constraints, let's talk about it in the channel later
16:21:06 The nova equivalent patch could use some reviews though: https://review.openstack.org/#/c/446508/
16:21:07 andymccr: good
16:21:56 #action review https://review.openstack.org/#/c/446508/
16:22:46 ok next
16:22:49 #link https://bugs.launchpad.net/openstack-ansible/+bug/1672648
16:22:49 Launchpad bug 1672648 in openstack-ansible "Documented test setup fails due to HTTPS usage in proxy" [Undecided,New]
16:23:04 I'd say confirmed low
16:23:14 it works right now, except in edge cases
16:23:16 doc change
16:23:24 ahh I see the issue
16:23:24 yeah
16:23:30 agreed
16:23:30 ok
16:23:58 next
16:24:00 #link https://bugs.launchpad.net/openstack-ansible/+bug/1670874
16:24:00 Launchpad bug 1670874 in openstack-ansible "Validating domain name in FQDN and openstack_domain variable" [Undecided,New]
16:24:53 it makes sense to properly use the short hostname IMO
16:25:17 would that help though? doesn't it need to be whatever nova uses?
16:25:43 on top of that we can't, because there is no ansible_hostname_short, only inventory_hostname_short
16:25:47 IIRC
16:26:18 so is the TL;DR that it bombs because the nova compute hosts aren't in /etc/hosts?
16:26:36 on live migration at least
16:27:14 they seem to be in /etc/hosts
16:27:44 isn't it libvirt that's using another mechanism and doesn't care about /etc/hosts?
16:27:58 well, I guess if it tries to resolve the name
16:27:58 that's the first time I've heard that
16:28:24 systemd-resolved?
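The fix floated in this discussion (having the nova role lay down /etc/hosts entries for the other compute hosts, so live migration can resolve their names) could be sketched roughly as below; the hostnames, IPs, and the openstack.local domain are illustrative, not from the bug:

```python
def hosts_entries(computes, domain="openstack.local"):
    """Render /etc/hosts lines so compute peers resolve by both FQDN
    and short name. `computes` is a list of (short_name, ip) pairs."""
    return ["{0} {1}.{2} {1}".format(ip, short, domain)
            for short, ip in computes]

# Example: two hypothetical compute hosts on the management network.
for line in hosts_entries([("compute1", "172.29.236.11"),
                           ("compute2", "172.29.236.12")]):
    print(line)
```

In the role this would naturally be a template or blockinfile over the compute group, with the domain overridable by deployers as suggested above.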
16:28:57 well, we could look up DNS if we want to go that far, but I don't think it's a great idea
16:29:16 well tbh
16:29:24 I don't think this is related to /etc/hosts
16:29:34 OH I got it, I misread
16:29:38 we mostly populate that based on the container names and other hosts so it can access them
16:29:38 so
16:29:51 if nova live migration doesn't work because it's missing /etc/hosts entries, that should go into the nova role IMO
16:30:02 yes, I completely misread
16:30:04 which could then be additional entries to ensure the other compute hosts are present
16:30:27 or we ensure proper name resolution, but that's for the future I guess
16:30:37 haha yeah agreed, but yeah, future
16:30:49 I guess we could wire it up better in the defaults
16:30:58 so I'm thinking this is a bug - but not so much to do with the openstack_domain bits, and more to do with the nova role not setting up the appropriate entries for nova
16:31:01 to always have openstack.local
16:31:10 overridable by the users
16:31:20 instead of having this local.lan that's not consistent
16:31:28 confirmed low?
16:31:32 yeah
16:31:42 but yeah, the issue is a disconnect between our "openstack_domain" and what nova is using
16:31:51 yes indeed
16:32:26 next
16:32:29 #link https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:32:29 Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:33:11 odyssey4me: ?
16:33:28 I haven't had a chance to revisit that.
16:33:40 ok, let's leave it as is. someone else?
16:33:54 unless someone else wants it*
16:34:17 #action odyssey4me have a look at https://bugs.launchpad.net/openstack-ansible/+bug/1670632
16:34:17 Launchpad bug 1670632 in openstack-ansible "ceilometer error because gnocchiclient > 3.0 for stable/newton " [Undecided,New] - Assigned to Jesse Pretorius (jesse-pretorius)
16:34:28 next
16:34:31 #link https://bugs.launchpad.net/openstack-ansible/+bug/1665667
16:34:31 Launchpad bug 1665667 in openstack-ansible "Slow failover Recover on primary node restart" [Undecided,New] - Assigned to Jean-Philippe Evrard (jean-philippe-evrard)
16:34:54 confirmed wishlist.
16:35:19 We can spend some time on this, but it's gotta be a real effort on haproxy -- I still don't have time assigned to this.
16:35:25 yeah
16:35:34 I think it's something cool we should look into
16:35:43 when somebody has time for it ;P
16:35:48 oh yes, it's definitely needed IMO
16:36:13 but for triage, I guess it falls under the wishlist category
16:36:14 next
16:36:30 #link https://bugs.launchpad.net/openstack-ansible/+bug/1665377
16:36:30 Launchpad bug 1665377 in openstack-ansible "openstack-ansible-security setting the pass_warn_age parameter incorrectly" [Undecided,New]
16:36:41 I have no clue.
16:36:46 mhayden: ?
16:36:54 he's out atm
16:36:56 I'll take a look
16:37:02 it seems pretty easy to confirm
16:37:05 let's move on
16:37:14 looks like that's indeed the issue.
16:37:27 or PASS_MAX_DAYS/PASS_MIN_DAYS
16:37:49 confirmed medium, and assigned to mhayden :p
16:38:06 if someone wants to take it, feel free
16:38:19 it's classified as low-hanging-fruit for newcomers!
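The pass_warn_age bug amounts to rewriting a key/value line in /etc/login.defs (PASS_WARN_AGE, alongside PASS_MAX_DAYS and PASS_MIN_DAYS). A rough Python model of the replace-or-append edit the role needs to perform; the sample file content is illustrative:

```python
import re

def set_login_defs(content, key, value):
    """Set a login.defs-style key in-place, appending it if absent.
    Mirrors a lineinfile with a regexp anchored on the key name."""
    pattern = re.compile(r"(?m)^{0}\s+\S+$".format(key))
    line = "{0}\t{1}".format(key, value)
    if pattern.search(content):
        return pattern.sub(line, content)
    return content + "\n" + line

# Sample login.defs fragment (values are illustrative defaults).
defs = "PASS_MAX_DAYS\t99999\nPASS_MIN_DAYS\t0\nPASS_WARN_AGE\t7"
print(set_login_defs(defs, "PASS_WARN_AGE", 14))
```

Anchoring on the exact key name is what distinguishes the correct PASS_WARN_AGE edit from writing a key login.defs does not recognize.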
16:38:31 next
16:38:33 #link https://bugs.launchpad.net/openstack-ansible/+bug/1665084
16:38:34 Launchpad bug 1665084 in openstack-ansible "M->N upgrade causes container reboot due to bind mounts" [Undecided,New]
16:39:06 hmm
16:39:34 regarding my problem of updating VM quotas, it seems this is a bug
16:39:49 https://bugs.launchpad.net/nova/+bug/1668267
16:39:49 Launchpad bug 1670627 in OpenStack Compute (nova) ocata "duplicate for #1668267 quota is always in-use after delete the ERROR instances " [Critical,In progress] - Assigned to Matt Riedemann (mriedem)
16:40:28 Andy McCrae proposed openstack/openstack-ansible-security master: Change PASS_WARN_DAYS --> PASS_WARN_AGE https://review.openstack.org/448193
16:40:59 looks valid.
16:41:04 yeah
16:41:05 agreed
16:41:07 but what do we do?
16:41:25 that's gonna be painful during an upgrade to N
16:41:39 hmm
16:41:39 maybe
16:41:47 the lineinfile stuff should be smarter
16:42:05 because the issue seems to be a minor change in a bind mount causing a restart of a container
16:42:18 we could add a regexp
16:42:26 but then it will still change - and restart, so I guess that's pointless
16:42:41 to match the item's mount_path, the bind path and the bind options, then skip
16:42:50 andymccr, odyssey4me and others: it would be great to have your feedback on my email: [deployment][tripleo] Next steps for configuration management in TripleO (and other tools?)
16:42:52 well, that's true
16:43:22 I think there is no perfect way to do this andymccr
16:43:30 andymccr, odyssey4me: if you have any feedback, please reply to it
16:43:46 evrardjp: yeah, agreed
16:43:48 let's just confirm and mark it as medium -- it can break upgrades
16:43:58 if only we could be smarter with mounting/container restarts, but I guess that's not possible
16:43:58 we need to "deal with it"
16:44:17 either by removing this (and risking another change, which seems awful)
16:44:24 or by being smarter about the restart
16:44:48 the lineinfile needs to be fixed anyway, because it creates duplicate mount entries for the same bind mount
16:45:00 https://review.openstack.org/#/c/426928/ galera was serialized to avoid problems with this
16:45:01 that's true too.
16:45:20 yeah, perhaps that is the solution
16:45:20 or limit them at least
16:45:24 jmccrory: which branch?
16:45:27 for now at least
16:45:29 that's good
16:45:29 it went to newton
16:45:34 ok
16:45:41 so we just need to clean up the lineinfile
16:45:44 and it was breaking upgrades
16:45:48 to ensure everything looks clean
16:46:02 (avoiding duplicates)
16:47:09 good
16:47:13 I commented there
16:47:23 confirmed low-hanging-fruit
16:47:41 medium?
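The duplicate-mount problem described here comes from lineinfile appending a changed line instead of replacing the old one. A rough Python model of that behavior, showing why keying the task on a regexp anchored to the stable mount target avoids the duplicate; the lxc.mount.entry lines are illustrative:

```python
import re

def lineinfile(lines, line, regexp=None):
    """Rough model of Ansible's lineinfile: replace the first line
    matching regexp, otherwise append if the exact line is absent."""
    if regexp is not None:
        pat = re.compile(regexp)
        for i, existing in enumerate(lines):
            if pat.search(existing):
                lines[i] = line
                return lines
    if line not in lines:
        lines.append(line)
    return lines

old = "lxc.mount.entry = /openstack/venvs/nova-14.1.1 opt/nova none bind 0 0"
new = "lxc.mount.entry = /openstack/venvs/nova-15.0.0 opt/nova none bind 0 0"

# Without a regexp, the changed bind-mount line is appended as a duplicate:
dup = lineinfile([old], new)
# Keyed on the stable mount target, the old entry is replaced in place:
fixed = lineinfile([old], new, regexp=r"opt/nova none bind")
```

As noted in the meeting, the task still reports changed either way (the venv path did change), so this cleans up the config without avoiding the container restart itself.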
16:47:49 depends on how much we prioritize upgrades I guess
16:48:00 I'd say it's medium at least
16:48:04 ok
16:48:07 marked as medium
16:48:11 let's move on
16:48:30 next
16:48:32 #link https://bugs.launchpad.net/openstack-ansible/+bug/1662865
16:48:32 Launchpad bug 1662865 in openstack-ansible "nova-compute: inconsistent qemu packages installed" [Undecided,New]
16:49:02 I guess we can leave it as is
16:49:09 until the bug reporter comes back
16:49:13 yeah
16:49:13 agreed
16:49:17 it will expire otherwise
16:49:18 ok
16:49:42 the next one is assigned to mhayden
16:49:45 let's skip it
16:49:48 yeah
16:49:59 #link https://bugs.launchpad.net/openstack-ansible/+bug/1661114
16:49:59 Launchpad bug 1661114 in openstack-ansible "rabbitmq_Server role is failing in mitaka due to rfc hostname changes" [Undecided,New]
16:50:12 bjoern doesn't seem to be connected
16:50:13 next
16:50:26 #link https://bugs.launchpad.net/openstack-ansible/+bug/1659073
16:50:26 Launchpad bug 1659073 in openstack-ansible "MariaDB 10.0.28 -> 10.1 upgrade (galera server wsrep bootstrap fails)" [Undecided,New]
16:50:38 waiting for the bug reporter too
16:51:08 ok, so this is similar to the one jamesdenton raised, I think
16:51:39 which one, the nova one? or the mariadb one?
16:51:44 the mariadb bits
16:52:10 (sorry, I just noticed you were in a meeting)
16:52:16 let's wait for info from the reporter; if it's fixed we don't have anything to do there
16:52:22 EmilienM: that's alright, don't worry :p
16:52:25 I saw that being caused during upgrades when all containers restart at the same time
16:52:27 we are only triaging :p
16:52:31 if you guys need a logo
16:52:59 let's continue
16:53:12 jmccrory may already have fixed it, as described in the bug
16:53:24 #link https://bugs.launchpad.net/openstack-ansible/+bug/1651809
16:53:24 Launchpad bug 1651809 in openstack-ansible "Hostname change in L->M resets hypervisor usage" [Undecided,New]
16:53:51 maybe this one is linked to vnogin's
16:54:05 hmm
16:54:06 indirectly
16:54:06 well
16:54:17 well, I think it's the whole _ to - hostname change
16:54:19 we need to get this one fixed asap if it's an issue, since mitaka will EOL at some point
16:54:35 we moved to use ansible_hostname, which will be an FQDN
16:54:45 so there is no way to fix that without being disruptive IMO
16:55:18 hmm
16:55:28 if you move to use inventory_hostname, you'll basically have issues later
16:55:35 like bjoern suggested
16:55:40 so I don't think it's a good idea.
16:56:13 well, I think it deserves a decision, once and for all
16:56:33 and I thought that decision was made when everything moved to ansible_hostname
16:56:41 yeah, I agree with the move
16:56:41 evrardjp: Is the _/- change you're talking about related to https://review.openstack.org/#/c/407655/ ?
16:56:46 Or something different?
16:57:01 not directly
16:57:04 Mostly asking because I haven't had time to circle back on that review and I got stumped
16:57:05 Ok
16:57:22 in the L to M upgrade we moved to using - in many places
16:57:31 (not in the inventory, though)
16:57:51 but that's the reason why we started using ansible_hostname
16:58:04 Got it
16:58:12 so if nova in M is now using ansible_hostname, it would be a different name in M than in L
16:58:21 you can fix that by forcing M to use the L name
16:58:27 but that sounds like delaying the pain
16:58:59 if we had an inventory adapted for that in M, it would have been different
16:59:05 but that's not the case
16:59:19 hmm
16:59:34 so until we have an inventory that's clean enough and that we can use (and revert ansible_hostname to inventory_hostname), I'd say let's stick with this
16:59:42 yeah
16:59:47 for me it's a "known issue"
16:59:58 yeah, agreed
17:00:07 ok
17:00:08 Is M EOL yet?
17:00:15 nope
17:00:19 IIRC
17:00:21 not yet, no
17:00:27 it'll happen in the next few months
17:00:35 but it's stable
17:00:47 we can't do anything too disruptive there
17:01:05 Right
17:01:09 I guess we're out of time for today
17:01:18 thanks everyone
17:01:29 pretty good effort - only 2 left, so all good :)
17:01:35 hehe
17:01:40 #endmeeting