14:01:05 #startmeeting tripleo
14:01:06 Meeting started Tue May 23 14:01:05 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:10 The meeting name has been set to 'tripleo'
14:01:12 #topic agenda
14:01:18 * review past action items
14:01:19 * one off agenda items
14:01:21 * bugs
14:01:24 * Projects releases or stable backports
14:01:25 * CI
14:01:27 * Specs
14:01:29 * open discussion
14:01:31 Anyone can use the #link, #action and #info commands, not just the moderator!
14:01:33 Hi everyone! who is around today?
14:01:36 o/
14:01:38 o/
14:01:39 o/
14:01:40 o/
14:01:41 o/ o/
14:01:42 \o
14:01:43 \o/
14:01:51 o/
14:02:00 sshnaidm: /o\
14:02:01 o/
14:02:08 o/
14:02:10 o/
14:02:13 o/
14:02:50 the owl says hi! (~˘▾˘)~
14:02:56 ccamacho: perfect
14:02:59 #topic review past action items
14:03:01 o/
14:03:06 * rasca to send ML about tripleo-quickstart-utils so we keep open discussion going: done
14:03:49 rasca: feel free to post updates, sounds like we're waiting for your feedback on the replies now
14:04:03 * EmilienM to share tripleo project updates slides to ML: done
14:04:14 EmilienM, sure
14:04:24 #link tripleo project updates - boston https://docs.google.com/presentation/d/1knOesCs3HTqKvIl9iUZciUtE006ff9I3zhxCtbLZz4c
14:04:26 o/
14:04:36 o/
14:04:36 #topic one off agenda items
14:04:43 \o
14:04:56 panda: please go ahead!
14:04:58 EmilienM: thanks, I'll make it quick
14:06:02 so, there has been an explosion in featureset files and the current assignment function i_ll_just_pick_one() is not really working, so until we find a better solution I set up this etherpad https://etherpad.openstack.org/p/quickstart-featuresets for coordination
14:06:08 o/
14:06:42 so we can acquire the lock on an index number without stepping on each other's feet
14:07:11 will i_ll_just_pick_one() query the etherpad content?
14:07:20 the other thing is that currently there are a lot of transitions in oooq, oooq-extras and tripleo-ci, and also we are still working to dissolve some confusion around featureset files
14:07:38 panda: I think etherpad is a good start for now
14:08:06 I also believe the featuresets should be documented in-tree of tripleo-quickstart
14:08:10 So I'd like to ask that people who want to +2 reviews on those projects attend some of the CI meetings, to be sure they are updated on the latest developments
14:08:28 EmilienM: yes, we have to expand documentation around featuresets a lot
14:08:44 I'm done.
14:09:03 panda: actually, every time someone adds a featureset in oooq, it should be *required* to document it in the patch
14:09:19 panda: maybe you can send a reminder to openstack-dev [tripleo] so everyone can read this info
14:09:28 EmilienM: yes, we currently update the matrix, but we have to add other pieces to the documentation
14:09:46 like what variables should go in featureset files
14:09:50 good, maybe make it clear again, and take the opportunity to share the etherpad url
14:09:54 and how to treat them
14:10:13 yes, I'll follow up with an email to openstack-dev
14:10:26 thank you sir
14:11:12 ccamacho: go ahead
14:11:20 hey guys, just a quick update on 2 features I want to have, basically to have a command to backup and restore an undercloud..
14:11:26 https://blueprints.launchpad.net/tripleo/+spec/undercloud-backup-restore
14:11:42 I have created this blueprint to log the progress.. I started it last Thursday
14:11:55 now testing the 3rd iteration locally
14:12:25 Please, feedback is welcomed there :)
14:12:26 thanks
14:12:40 ccamacho: ++ nice addition esp for pre-upgrade backup. some changes esp around network config have proven very problematic and the best recovery is an undercloud restore for the networks
14:12:57 ccamacho: will you write a spec?
14:13:12 is it necessary, I mean, will it be a large change?
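Returning to panda's featureset item above: the etherpad coordination amounts to claiming the lowest index nobody else has taken yet. A minimal sketch of that idea — the function names and the zero-padded `featuresetNNN.yml` naming are assumptions for illustration, not part of tripleo-quickstart itself:

```python
# Hypothetical sketch of the featureset-index coordination described above:
# given the set of indices already claimed (e.g. copied from the etherpad),
# pick the lowest free index for a new featureset file.

def next_free_featureset_index(taken, start=1):
    """Return the lowest index >= start that is not already claimed."""
    index = start
    while index in taken:
        index += 1
    return index

def featureset_filename(index):
    # Assumed naming convention: zero-padded names like featureset004.yml
    return "featureset%03d.yml" % index

claimed = {1, 2, 3, 5, 10}
idx = next_free_featureset_index(claimed)
print(featureset_filename(idx))  # featureset004.yml
```

The etherpad stays the source of truth for the `taken` set; the point is simply that two people checking it before claiming an index cannot collide on the same file name.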
14:13:18 sure, if it's needed
14:13:31 I don't think it will be a large change
14:13:49 I'm always asking myself if the undercloud is considered ephemeral or if there are some cases where we want to back it up
14:13:54 will this have to work also on a baremetal undercloud?
14:13:57 it's more like automating 2 steps we already have documented
14:14:20 I think I've had this discussion with dprince a while ago
14:14:43 for the backup I want to keep the data needed to run the undercloud install again and make it work again..
14:15:15 when you say the data, what else do we have besides the mysql database?
14:15:19 I think it is critical for users to have a "simply execute this command" way to get a backup
14:15:50 ccamacho: +1
14:16:01 if we had a backup-restore mechanism we could just always use that as a means to "update" (reinstall a fresh undercloud)
14:16:09 EmilienM: some of this is from the recent conversations about improving the upgrade but I am sure it has been discussed in the past, esp the issue with undercloud restore for the changed networks
14:16:11 and could even upgrade to a containers undercloud this way
14:16:35 undercloud configuration like networks
14:16:36 yeah
14:16:59 EmilienM: sorry, 'recent' conversations, e.g. thread at http://lists.openstack.org/pipermail/openstack-dev/2017-May/116876.html
14:17:06 as an example (for anyone who missed it)
14:17:07 do we want to store the bits on the filesystem or in something like Swift?
14:17:29 * ccamacho wasn't thinking about containers yet dprince but yeah probably can be a good thing to have
14:17:30 marios: thx for the link
14:17:40 ccamacho: we should discuss offline I guess, but why not just db dump and restore?
14:18:13 at this stage of the cycle, is it safe to target it for queens-1?
14:18:22 marios ack we can check it later, what about keeping your OC images also in the backup..
14:18:24 or do we want this one done by pike?
14:19:20 EmilienM we can have it for Q but is it possible to backport it to P?
14:19:38 backporting a feature is against OpenStack stable policy
14:19:53 ccamacho: https://docs.openstack.org/project-team-guide/stable-branches.html#support-phases
14:20:14 mm EmilienM how much time do we have for landing it on P?
14:20:21 at least the backup
14:20:29 ccamacho: https://releases.openstack.org/pike/schedule.html
14:20:34 I'll let you do some reading :)
14:20:44 EmilienM ack thanks we can check it later then
14:20:53 I don't see much pushback on this feature right now. I would suggest also discussing it on the ML to make sure
14:21:11 ccamacho: yes, you can check the schedule later and let me know what you think.
14:21:31 #action ccamacho to propose undercloud backup/restore blueprint on ML
14:21:43 thanks!
14:21:53 #action panda to remind oooq featureset policy / etherpad on ML
14:22:22 #action rasca follow up tripleo-quickstart-utils discussion on ML
14:22:29 sshnaidm: go ahead please
14:22:52 just to make it visible to everyone: http://lists.openstack.org/pipermail/openstack-dev/2017-May/117263.html
14:23:05 a proposal to do a sprint to reduce tripleo deployment time
14:23:20 sshnaidm: good idea. I already replied
14:23:31 sshnaidm: anything you want to discuss here?
14:23:47 EmilienM, no, that's it I think, the rest is in the ML
14:23:54 ok
14:24:12 marios: go ahead please!
14:24:18 EmilienM: did you miss matbu?
14:24:24 yes
14:24:27 marios: lol yes :)
14:24:28 I'm blind sorry
14:24:41 matbu: go ahead
14:24:46 but no worries :)
14:25:05 so I wrote a BP for the major upgrade workflow: https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow && https://etherpad.openstack.org/p/upgrade-workflow-and-validation
14:25:15 sounds cool
14:25:22 and an etherpad where I summarize all the ongoing discussions and reviews
14:25:23 this one is for pike :D
14:25:25 and BP
14:25:33 well, why not?
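Returning to ccamacho's backup item above: what is discussed there — "db dump and restore" plus keeping "the data needed to run the undercloud install again" — reduces to two commands. A rough sketch of what such a helper might build, where the exact file list and paths are assumptions for illustration, not the blueprint's actual implementation:

```python
# Illustrative sketch only: build the commands a minimal undercloud backup
# could run (a full DB dump plus a tarball of the files needed to re-run
# the undercloud install). File paths are assumptions, not the blueprint.
import subprocess

def build_backup_commands(backup_dir="/var/tmp/undercloud-backup"):
    """Return the two commands of a minimal dump-and-archive backup."""
    sql_dump = backup_dir + "/undercloud-all-databases.sql"
    # Dump every database in one consistent snapshot
    db_cmd = ["mysqldump", "--all-databases", "--single-transaction",
              "--result-file=" + sql_dump]
    # Archive the dump together with the undercloud configuration
    # (assumed location of undercloud.conf)
    tar_cmd = ["tar", "-czf", backup_dir + "/undercloud-backup.tar.gz",
               "/home/stack/undercloud.conf", sql_dump]
    return [db_cmd, tar_cmd]

def run_backup(backup_dir="/var/tmp/undercloud-backup", dry_run=True):
    for cmd in build_backup_commands(backup_dir):
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.check_call(cmd)

run_backup()  # dry run: only prints the two commands
```

Restore would be the mirror image (untar, then feed the SQL dump back to mysql before re-running the install), which matches the "automate 2 steps we already have documented" framing in the discussion.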
14:25:38 the blueprint sounds too general to me
14:25:48 it's in progress anyway
14:25:49 and I'm pretty sure it could be broken down into smaller bits
14:26:05 but it's a first iteration I guess
14:26:32 EmilienM: idk, smaller like one BP per upgrade step?
14:26:36 matbu: could you make blueprint dependencies in Launchpad?
14:27:05 matbu: not really, but a BP per problem we solve
14:27:21 in the etherpad, L21, it seems we have a list of BPs
14:27:27 which sounds good
14:27:46 EmilienM: yep, there is also the one that marios wants to discuss today
14:27:49 I'm wondering if https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow is really needed
14:27:52 EmilienM: o/ am coordinating with florian on https://etherpad.openstack.org/p/tripleo-pre-upgrade-validations for some pre-upgrade validations. just ansible tasks in the existing tripleo-validations. additions/thoughts welcome. I think we can easily get some things here for P. thanks.
14:28:10 EmilienM: the blueprint for that is https://blueprints.launchpad.net/tripleo/+spec/pre-upgrade-validations
14:28:12 you see, marios should have talked first :P
14:28:14 linked from the etherpad
14:28:19 EmilienM: this one is for having the workflow in mistral and the cli implementation
14:28:28 EmilienM: yeah it is different
14:28:36 all of this sounds good to me
14:28:52 cool, I think we can target it to pike
14:29:01 if it sounds reasonable to you
14:29:02 we just need better communication on the blueprints we created during the cycle
14:29:14 summarize the work in progress and do some prioritization and scheduling
14:29:24 EmilienM: also end of my item. https://blueprints.launchpad.net/tripleo/+spec/pre-upgrade-validations maybe target this one to P as well
14:30:20 marios, matbu: I would let you guys talk to each other and summarize the blueprints you want in Pike and the ones for Queens, and maybe share it to the ML so we can discuss there.
Also, from the list we can triage
14:30:37 does it make sense?
14:30:39 EmilienM: but fwiw/info all of these things are discussed in that mail thread I pointed to earlier at http://lists.openstack.org/pipermail/openstack-dev/2017-May/116759.html
14:31:16 EmilienM: i mean, upgrade workflow in client/common, backup/restore (not sure that one is there actually), better validations/checks during the upgrade undercloud/overcloud
14:31:21 marios: yes, I know, just re-use the thread then. I'm just thinking of sharing an overview of all blueprints related to $topic
14:31:24 EmilienM: ack thanks
14:31:30 ack
14:31:52 matbu, marios: it will help us make the correct release triaging
14:32:23 #action matbu + marios to share all upgrade-related blueprints on the ML thread
14:33:04 matbu, marios: thanks! great work here
14:33:22 please keep https://etherpad.openstack.org/p/upgrade-workflow-and-validation updated if you can
14:33:37 yep thx
14:33:43 do we have any other items this week before we go to the regular agenda?
14:33:46 thanks
14:34:00 #topic bugs
14:34:04 #link https://launchpad.net/tripleo/+milestone/pike-2
14:34:22 do we have any outstanding bugs to discuss this week?
14:35:07 EmilienM: the newton ovb job is still blocked right
14:35:13 yes
14:35:17 EmilienM, maybe to define what could be done for the pingtest bug
14:35:20 EmilienM: getting the link, sorry, sec
14:35:24 I'm wondering if someone was looking at it
14:35:29 because I haven't seen much progress
14:35:39 I did a quick investigation yesterday and commented on the bug report
14:36:06 let's talk about it during the CI topic
14:36:26 EmilienM: ack
14:36:30 besides CI issues, is there any outstanding bug in tripleo to discuss?
14:36:47 alright, let's talk about ci
14:36:54 #topic CI
14:37:03 so we currently have 4 alerts
14:37:09 EmilienM, I don't think the pingtest failure is a CI issue, it's most likely a tripleo issue
14:37:17 2 for newton jobs, 1 for master (pingtest) and 1 for containers
14:37:25 sshnaidm: deployment time?
14:37:36 EmilienM, no, the failure of pingtest in HA
14:37:43 https://bugs.launchpad.net/tripleo/+bug/1690373 this one. so it was blocked on the os-refresh-config fixup and the new package build dummy reviews. but those are blocked now on master?
14:37:45 Launchpad bug 1690373 in tripleo "stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq broken" [Critical,Triaged] - Assigned to Marios Andreou (marios-b)
14:37:45 ah
14:38:08 marios: I'm not sure it's something in packages
14:38:19 marios: I looked at it and the job times out
14:38:35 sshnaidm: do we have some HA experts looking at it?
14:38:39 EmilienM: this is what I mean: https://review.rdoproject.org/r/#/q/Ie205c93a3cdcc3c68668327fde6327cd373a8739,n,z
14:38:45 EmilienM: it was part of the fix right?
14:38:46 EmilienM, afaik no
14:38:46 I think
14:39:02 marios: I'm not sure it's really helpful, tbh
14:39:07 I am looking at the newton jobs
14:39:08 EmilienM, who are the HA experts that I can ask to look?
14:39:23 marios: have you looked at the logs? the job *times out*
14:39:33 EmilienM: the failure for pingtest is definitely not a timeout issue. see https://bugs.launchpad.net/tripleo/+bug/1680195/comments/5
14:39:34 Launchpad bug 1680195 in tripleo "Random ovb-ha ping test failures" [Critical,Triaged]
14:39:38 EmilienM: so bandini has already been helping on this bug wrt 'haexperts'
14:39:39 it seems like maybe they are fixed on the latest newton current-passed-ci repo...
at least my local env deployed fine
14:39:48 sshnaidm: you can ask bandini and his team
14:39:59 EmilienM, ok, will do
14:40:10 EmilienM: but I didn't check for a couple of days as I was waiting for https://review.openstack.org/#/c/465934/
14:40:40 #action sshnaidm to ask bandini and his team to look at pingtest HA failures: https://bugs.launchpad.net/tripleo/+bug/1680195
14:40:41 Launchpad bug 1680195 in tripleo "Random ovb-ha ping test failures" [Critical,Triaged]
14:40:44 trown: in how much time?
14:41:15 marios: can you explain why a dummy patch in t-i-e would help?
14:41:24 EmilienM: multinode jobs can't be a time thing, can they?
14:41:37 EmilienM: I think maybe streams are crossed since there are 4 alert bugs
14:41:50 EmilienM: I was specifically looking at the 2 newton ones
14:42:03 EmilienM: sshnaidm sorry I was referring to the newton oooq nonha bug/1690373
14:42:06 trown: well, gate-tripleo-ci-centos-7-nonha-multinode-oooq on newton is timing out
14:42:18 EmilienM: I had it ready from the earlier discussion and hit return too quickly :/ sorry for the confusion sshnaidm
14:43:10 EmilienM: ya... but that was actually a hang right? as in it would not finish given infinite time
14:43:47 probably
14:44:16 anyway, let's move forward, we'll follow up on #tripleo
14:44:18 I just would be very surprised if we slowed things down to the point multinode jobs were timing out
14:44:26 yeah me too
14:44:33 AFAIK it happened during the oooq transition
14:44:43 but it was the ovb transition
14:45:03 EmilienM, yep, it was ovb only
14:45:07 the package diff didn't help much, quite a lot of changes in a few weeks
14:45:36 is there anything ci-related we should talk about now?
14:45:45 EmilienM: https://bugs.launchpad.net/tripleo/+bug/1690373 this one
14:45:46 Launchpad bug 1690373 in tripleo "stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq broken" [Critical,Triaged] - Assigned to Marios Andreou (marios-b)
14:46:08 EmilienM: is the one I was referring to.
it was waiting for https://review.rdoproject.org/r/#/q/Ie205c93a3cdcc3c68668327fde6327cd373a8739,n,z and then https://review.openstack.org/#/c/465934/
14:46:34 marios: and I'm asking *again*: what makes you think this thing will help?
14:46:51 EmilienM: the discussion on the bug
14:46:56 why would gate-tripleo-ci-centos-7-nonha-multinode-oooq time out because of this?
14:47:27 marios: https://review.openstack.org/#/c/465935/ didn't pass CI
14:47:36 EmilienM: there is an issue in o-r-c which will cause the stack update to hang
14:47:40 that's why I abandoned it
14:47:55 do we run stack update on gate-tripleo-ci-centos-7-nonha-multinode-oooq?
14:48:44 EmilienM, I don't think so
14:49:01 marios: ^?
14:49:09 EmilienM: so if you see for example https://bugs.launchpad.net/tripleo/+bug/1690373/comments/7
14:49:10 Launchpad bug 1690373 in tripleo "stable/newton gate-tripleo-ci-centos-7-nonha-multinode-oooq broken" [Critical,Triaged] - Assigned to Marios Andreou (marios-b)
14:49:52 EmilienM: there is a yum update being executed there e.g. http://logs.openstack.org/29/463529/1/check/gate-tripleo-ci-centos-7-nonha-multinode-oooq/362437e/logs/subnode-2/var/log/yum.log.txt.gz
14:49:55 marios: how is it related to "stack-update"? And again, we don't run stack-update in this job I think
14:49:58 shows some things being updated
14:50:20 is it updated by a stack update?
14:50:38 EmilienM: ok perhaps I am not being clear enough then :) I'd be happy to discuss some more offline if you like, EmilienM
14:50:59 we can take it offline for sure
14:51:09 moving on now
14:51:09 EmilienM: I've updated the bug already and bandini was also involved as mentioned
14:51:12 EmilienM: thanks :D
14:51:16 #topic specs
14:51:20 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:51:44 do we have anything to discuss about specs this week?
14:52:26 I guess not
14:52:29 #topic open discussion
14:52:38 if there is any question or feedback, it's the right time
14:52:46 EmilienM, who are the cores that can approve it? https://review.openstack.org/#/c/420878/
14:53:10 sshnaidm: tripleo cores
14:53:28 EmilienM, ok
14:53:33 sshnaidm: you can +2
14:53:44 I'll approve it once we have enough votes
14:53:47 ok
14:53:55 anything else this week?
14:54:49 alright. Have a nice week and have fun
14:54:51 #endmeeting