14:03:01 #startmeeting tripleo
14:03:01 Meeting started Tue Nov 22 14:03:01 2016 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:04 The meeting name has been set to 'tripleo'
14:03:04 EmilienM: any google reminders got messed up as they're not UTC aware (mine was an hour from now)
14:03:21 at least in those locations that recently moved an hour for DST
14:03:44 my agenda says it's in one hour
14:03:44 o/
14:03:45 hi guys!
14:03:46 o/
14:03:46 o/
14:03:46 o/
14:03:49 o/
14:03:49 Hi all!
14:03:51 o/
14:03:51 o/
14:03:52 o/
14:03:53 o/
14:03:53 hello
14:03:53 o/
14:03:54 o/
14:03:54 o/
14:03:54 #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:04:03 o/ again
14:04:12 o/
14:04:14 o/
14:04:17 hey
14:04:18 EmilienM: https://www.timeanddate.com/worldclock/timezone/utc
14:04:20 o/
14:04:21 #topic agenda
14:04:23 * one off agenda items
14:04:25 * bugs
14:04:27 * Projects releases or stable backports
14:04:29 * CI
14:04:31 * Specs
14:04:33 * open discussion
14:04:37 shardy: yeah sorry for that
14:04:43 hi folks!
14:04:46 o/
14:04:49 EmilienM: np, we had the same issue last week :)
14:05:08 if you have off agenda items, please add them to https://etherpad.openstack.org/p/tripleo-meeting-items
14:05:12 hi everyone
14:05:13 or propose them here directly
14:05:28 \o
14:05:30 Hey all
14:05:34 we can start with bugs in the meantime
14:05:42 #topic bugs
14:06:02 I see 109 bugs targeted for ocata-2
14:06:06 #link https://launchpad.net/tripleo/+milestone/ocata-2
14:06:46 Yeah I deferred a lot, but we fixed 119 (!) for Ocata-1
14:07:02 do we have outstanding bugs this week?
I see that last week you had a blocker with upgrade, fixed now
14:07:20 /me working on https://bugs.launchpad.net/tripleo-common/+bug/1643701 for tripleo-common
14:07:20 Launchpad bug 1643701 in tripleo-common "minor update fails: os-collect-config shows: Notice: /Stage[main]/Ceph/Ceph_config[global/fsid]/value: value changed '03d0dd10-b019-11e6-bd00-52540071cf28' to 'fdc8ad96-b016-11e6-88a3-52540071cf28'" [Undecided,In progress] - Assigned to Giulio Fidente (gfidente)
14:07:43 gfidente: the bug title is terrible :-P
14:08:01 sounds like "ceph fsid is not idempotent" or something
14:08:30 yeah I can update that
14:08:36 jaosorior: have we reported the selinux problem yet?
14:08:49 jaosorior just found out that CI is broken atm, httpd doesn't start
14:09:01 and we are investigating why (we think it could be selinux but really not sure at this point)
14:09:12 I thought we had selinux permissive?
14:09:23 yes, I've looked at audit and I don't see any alert
14:09:43 also don't see it here https://etherpad.openstack.org/p/tripleo-ci-status
14:09:51 weshay: we found it 10 min ago
14:09:56 aye
14:10:00 #action EmilienM & jaosorior to report httpd issue on launchpad
14:10:15 do we have anything else we should prioritize?
14:10:42 weshay: is there any bug still open that prevents tripleo CI from being promoted?
14:11:25 EmilienM, I know panda was going to manually kick off a periodic run, others including myself have tests running right now
14:11:55 ok
14:11:59 there are lots of "uncrossed" items in https://etherpad.openstack.org/p/tripleo-ci-status
14:12:03 btw, i thought we had a tag for promotion blockers in launchpad
14:12:09 not sure if some of those may be fixed though
14:12:17 EmilienM: we do
14:12:53 periodic run started 5 minutes ago
14:13:48 trown: I updated 2 of them
14:13:50 that are fixed
14:13:51 EmilienM: hmm... I dont see it actually...
14:13:52 nova and neutron
14:13:57 trown: yes
14:14:06 #action EmilienM to create launchpad tag for promotion blockers
14:14:36 EmilienM: you can cross it off :) I just did via https://bugs.launchpad.net/tripleo/+manage-official-tags
14:14:49 #undo
14:14:50 Removing item from minutes:
14:14:55 EmilienM, I'm not sure, but maybe this is related to the httpd issue? https://bugs.launchpad.net/tripleo/+bug/1640879
14:14:55 Launchpad bug 1640879 in tripleo "CI: apache fails to start on overcloud controller" [High,Triaged]
14:14:55 trown: thx
14:15:30 sshnaidm: I don't know, we'll have to compare logs to see if it's the same problem
14:15:51 "httpd doesn't start" is very vague
14:16:06 sshnaidm: let's figure it out after our meeting
14:16:14 ok
14:16:40 anything else for bugs?
14:17:00 #topic one off agenda items
14:17:14 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:17:23 I see a question about the spec freeze plan
14:17:39 so iirc we froze blueprints on Nov 14th
14:18:06 Yeah, although in the last meetings we discussed some flexibility for specs already posted that haven't got many reviews
14:18:11 which is nearly all of them
14:18:15 we should probably freeze specs by end of ocata-2
14:18:27 main point is nothing *proposed* now should be considered IMO
14:18:30 so we still have 3 weeks
14:18:39 shardy: +1
14:18:42 https://bugs.launchpad.net/tripleo/+milestone/ocata-2
14:18:42 +1
14:18:52 we've got 3 weeks until ocata-2, and 26 blueprints
14:19:03 I'll be amazed if even half of them land
14:19:30 so we probably need to start deferring everything not started to ocata-3, and flagging that they're at risk
14:19:46 yesterday, I went through all of them and updated the status of whether it's started or not
14:19:58 a lot are not started or barely started
14:20:25 so some will be pushed to oc3 and some will need to be reproposed for pike?
14:20:44 fultonj: most probably
14:20:55 fultonj: well it's too soon to say, but the point is many probably will slip into Pike at this rate
14:20:59 think we're back to the neutron-db-manage error now in the undercloud :(
14:21:02 given the short cycle and lack of progress
14:21:04 fultonj: it's also a sign that blueprints are not well broken down
14:21:30 fultonj: the other thing to consider is we're not going to have review bandwidth for a late cycle mountain of features
14:21:40 I see some blueprints as huge features, and ocata is shorter than other cycles
14:21:57 makes sense
14:22:03 people who create blueprints should probably break them down a bit more so we can achieve a blueprint within one cycle
14:22:32 shardy: right, that's why we need to defer now and flag the ones at risk
14:22:32 EmilienM +1
14:22:35 EmilienM: so instead of doing x,y,z in one blueprint start with just x?
14:22:47 shardy: because otherwise, they'll request FFE and we'll have the same thing as last time
14:23:02 EmilienM: I was waiting for some reviews on https://review.openstack.org/#/c/397296/ before creating individual bps for tasks
14:23:06 fultonj: yes and you can configure blueprint dependencies
14:23:12 even so, for people trying to get features to implement, the current blueprints are not quite descriptive, right?
14:23:17 EmilienM: does that make sense or should I just go and create all bps right away?
14:23:45 EmilienM: should we communicate (loudly) that we'll observe the same feature freeze as all other projects this time?
14:23:53 e.g at ocata-3
14:23:57 https://releases.openstack.org/ocata/schedule.html
14:24:14 akrivoka: I think so, and we need to target the blueprints that you think we can achieve in ocata
14:24:28 shardy: yes
14:24:30 shardy: +1 on that.
14:24:47 The other option would be to choose some slightly later date, but I think release quality suffered last time due to all the FFEs
14:24:53 EmilienM: ok, I'll go ahead and create blueprints for individual work items
14:25:02 A good example IMHO was the initial blueprint for composable services, decomposed by service, so lots of people helped to land it on time
14:25:04 so it'd be great to have a focus on bugfixing during the RC window instead this time
14:25:21 ++
14:25:30 EmilienM: I've a similar situation to akrivoka, assume I should do the same and create blueprints asap?
14:25:34 short cycle.. timing, everything... FFEs are going to be harder to deal with. Not to mention our bug list
14:26:11 ya, imo it is not really meant to be a feature-full release, so FFE is too much
14:26:33 beagles: given the amount of work here, we need to avoid any rush to push last features like we did in newton
14:26:42 * beagles would prefer to go into a P with a longer TODO list than have everything really wobbly at the end of O
14:27:00 and rather focus on what we can actually deliver in ocata, make it good and more stable than previous releases
14:27:15 EmilienM: absolutely.
14:27:19 +1
14:27:40 I'll communicate on that
14:27:46 Ok sounds like we're all agreed, we just need to ensure it's clearly communicated
14:28:09 #action EmilienM to communicate on ocata release schedule
14:28:34 any other off items?
14:28:57 #topic Projects releases or stable backports
14:29:04 any thoughts on the containerization blueprint?
14:29:16 slagle, shardy: thanks a ton for doing the releases when I was on PTO :-)
14:29:22 https://review.openstack.org/#/c/223182/
14:29:28 fultonj: it's a multi-cycle blueprint
14:29:56 EmilienM: np!
14:30:26 fultonj: at summit we discussed it getting completed during Ocata and Pike
14:30:39 time will tell how far along the road we get during Ocata
14:30:56 so tonyb sent an email about Liberty EOL: [openstack-dev] [all][stable][ptls] Tagging liberty as EOL
14:31:04 ack
14:31:42 shardy and I discussed it and agreed that we should propose a last Liberty release in all TripleO projects, tag it EOL and then remove branches and CI jobs for liberty, like other projects.
14:31:52 any thoughts ? ^
14:32:02 +1
14:32:21 that also means we may be able to request the stable:follow-policy tag again
14:32:31 follows-policy I mean
14:32:33 shardy: yeah! :)
14:32:55 shardy: https://review.openstack.org/#/c/314485/ - I'll ask you to restore it when it's done
14:33:02 +1
14:33:07 EmilienM: ack, sure will do
14:33:26 it does mean we can no longer test liberty->mitaka upgrades, but that is consistent with other projects' upgrade testing
14:33:45 trown: we're not really testing them in CI now, are we?
14:33:56 I didn't know we tested it
14:33:58 but yeah, we can test the remaining supported branches
14:34:11 shardy: we have periodic jobs that are testing it on centos.ci
14:34:11 when upgrade CI is landed
14:34:40 matbu: Ah, Ok, but not upstream yet
14:34:46 #action EmilienM to proceed to last Liberty release and sync with tonyb about tripleo liberty EOL
14:34:47 shardy: nop
14:34:53 anything else about release management?
14:35:32 #topic CI
14:36:12 slagle has made some progress to have a 3-nodes job, https://review.openstack.org/#/c/397441/
14:36:20 if people want to review it and dependencies, it would be cool
14:36:34 sshnaidm: do you mind giving an update about oooq integration?
14:36:47 I was chatting with matbu and wondering if using multinode (ideally based on the 3 nodes job) would be a good idea for upgrades
14:36:59 as the walltime is much less for the initial deployment
14:37:21 shardy: why 3 nodes and not 2 nodes?
14:37:22 EmilienM, it was broken yesterday and brought back alive today
14:37:38 sshnaidm: what do you expect from this patch? Do you want us to review and merge it at some point?
14:37:39 EmilienM: because the two node job has an all-in-one openstack
14:37:40 EmilienM, I hope today the patch will pass again and we can merge it if no objection
14:37:46 yep, EmilienM i re-worked the upgrade review last week, the whole upgrade job works but hit timeout w/ ovb
14:37:47 sshnaidm: ok
14:37:57 i think we could use multinode jobs for upgrade testing
14:38:00 EmilienM: that would be OK as a starting point tho I guess
14:38:03 EmilienM, sure, for now only you and slagle reviewed it
14:38:07 shardy: ok, so you want to split in 2 overcloud nodes to avoid timeout?
14:38:10 EmilienM: it just makes it harder to separate the compute upgrade logic
14:38:14 just need to be aware of what is actually being tested
14:38:20 for instance, the 2 node job does not use pacemaker
14:38:28 which is a huge part of the upgrade
14:38:31 ya
14:38:33 EmilienM: No, I just want a faster way to do the initial deployment so we actually have time to test the upgrade
14:38:35 shardy: ok, makes sense
14:38:45 which is the blocker for the OVB upgrades patch atm (AFAIK)
14:38:55 the 3 nodes job i did though does use pacemaker, although it is a cluster of 1
14:38:57 slagle: the upgrade review used pacemaker even with 2 nodes
14:39:13 matbu: using multinode jobs?
14:39:15 slagle: it's a good start I guess
14:39:27 slagle: nop i mean ovb + pacemaker with 2 nodes
14:39:30 slagle: nice, I think that would still be a good target for upgrade testing, as we'd at least run all the pacemaker upgrade stuff
14:40:09 the 3 nodes job still runs in under an hour, so we should have time to test an upgrade
14:40:23 although it's 3 or 4 stack-updates?
14:40:27 we might have time :)
14:40:28 yep
14:40:39 slagle: right now it's 3, but my PoC reduces it potentially to 2
14:40:41 + yum update on non controller nodes
14:40:47 or even 1 if we automate the compute upgrade
14:40:50 shardy: hehe +1
14:40:56 shardy: ok great
14:41:11 the 3 nodes job can be made faster as well, b/c right now i bootstrap the oc nodes sequentially
14:41:15 slagle: well for M to N it's ideally 4 or 5
14:41:16 and that could be done in parallel
14:42:35 matbu: init, update, converge - the other steps are the per-role scripts, no?
14:42:55 shardy: ceilometer migration as a first step
14:43:12 shardy: yeah ignoring migrations
14:43:19 well, we can't test M..N with multinode
14:43:25 multinode jobs don't run on mitaka
14:43:26 shardy: and aodh migration (post converge)
14:43:28 matbu: Ok, we'll have to work through the details, I'm trying to reduce the number of stack updates if possible
14:43:40 we can only test Newton..master
14:43:53 yeah, I'd say let's just start with newton to master
14:44:05 slagle: why not on mitaka ?
14:44:14 multinode jobs don't run on mitaka
14:44:22 matbu: because the upgrade for Newton->master is going to be totally different
14:44:28 right, only newton and master now
14:44:55 we could look at extending coverage to stable branches later I guess
14:45:29 * shardy is mostly focussed on ensuring the upgrades rewrite works for ocata
14:45:33 shardy: yes, but i'm missing the details here on why mitaka differs from newton on multinode
14:45:58 matbu: because multinode uses composable roles/services I think
14:46:13 shardy: marios yes i think for us the bug number #1 in upgrade is "how to test it"
14:46:23 shardy: ack ok makes sense thx
14:46:54 yes, and also the deployed-server environment did not merge until newton
14:46:59 matbu: ack, as shardy points out we will need 2 jobs (you also mentioned this before) as the n->o upgrade is going to be completely different
14:47:25 FWIW a control-plane upgrade from Newton->master locally only takes 9 minutes :)
14:47:33 do you prioritize n..o or m..n?
14:47:36 marios: you asked how long yesterday ^^
14:47:41 shardy: woot :)
14:47:55 there's a few missing pieces so that will go up a bit
14:48:27 shardy: wow :)
14:48:38 do we want periodic jobs, M to N?
14:48:54 shardy: note a significant portion of that time is/was bringing the cluster down/up so yeah it will go up
14:49:03 also, in the future we could remove the classic jobs for upgrades and clone the scenarios to test upgrades too (maybe)
14:49:17 but I'm maybe too far in the future :-)
14:49:26 btw, i don't have all the details, but if sshnaidm is making oooq usable in tripleo-ci, we can use RDO upgrade ansible roles
14:49:27 marios: yeah, just a starting point, but not too bad for the nonha case
14:49:46 shardy: for sure !
:)
14:49:54 (for periodics i mean)
14:50:01 matbu, great idea
14:50:04 matbu: yes, I was interested in this too, as we know it works already downstream
14:50:34 ok time is running out
14:50:38 EmilienM: yes, i'm a bit worried about the gating stuff
14:50:39 I don't see many #action items
14:51:33 well just to summarize the discussion ^
14:51:40 matbu: can you take some actions maybe so we track the output of this discussion
14:51:55 EmilienM: yes
14:52:12 i see 2 actions: making periodic upgrade jobs with oooq in tripleo-ci
14:52:21 and making upgrade jobs for N to O
14:52:42 matbu: can you use #action please? :-)
14:52:57 EmilienM: hehe let's try :)
14:53:00 do we have anything else about CI today?
14:53:10 after the meeting, i'll keep digging into why it's broken now
14:53:16 but any help is warmly welcome
14:53:37 #action matbu let's use oooq in tripleo-ci for periodic M to N jobs
14:53:52 #action matbu build upgrade jobs for N to O
14:54:03 #action team to review slagle's work on 3 nodes job https://review.openstack.org/#/c/397441/ (and dependencies)
14:54:33 #topic Specs
14:54:48 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:55:00 we still have a bunch of specs without many reviews
14:55:07 if the team could take some time to review our specs
14:55:12 I'd like to request reviews for https://review.openstack.org/#/c/397296 thanks!
14:55:19 https://review.openstack.org/#/c/392116/ is pretty close thanks
14:55:43 I also would like to get more core reviews on our squad policy, https://review.openstack.org/#/c/385201/
14:56:14 shardy: EmilienM i'll publish an edit there for shardy's comment in a sec, please re-add your +2 https://review.openstack.org/#/c/392116/
14:56:24 marios: no way!
14:56:31 EmilienM: :(
14:56:33 pretty please?
14:56:34 done :P
14:56:41 marios: ack, will do, thanks!
14:56:59 #action team to review specs targeted for ocata
14:57:13 shardy: so do we freeze specs by end of ocata-2?
14:58:01 #topic open discussion
14:58:01 EmilienM: +1
14:58:33 #action EmilienM to communicate on ocata specs freeze (end of o-2)
14:58:54 sorry for the short open discussion, if you have anything to ask or say, please use the #tripleo channel!
14:59:10 heh
14:59:18 thanks all for your time
14:59:21 #endmeeting