14:02:37 #startmeeting tripleo
14:02:37 Meeting started Tue Apr 19 14:02:37 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:40 The meeting name has been set to 'tripleo'
14:02:43 o/
14:02:44 o/
14:02:45 #topic rollcall
14:02:47 o/
14:02:48 o/
14:02:49 hey
14:02:50 thanks for the reminder EmilienM ;)
14:02:54 o/
14:03:07 shardy: calendar as a service !
14:03:08 o/
14:03:12 o/
14:03:15 #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:03:18 hiya
14:03:29 hi
14:03:54 o/
14:03:57 hello
14:04:21 #topic agenda
14:04:21 * one off agenda items
14:04:21 * bugs
14:04:22 * Projects releases or stable backports
14:04:22 * CI
14:04:24 * Specs
14:04:28 * open discussion
14:04:43 Anyone have any additional one-off items other than the one dprince added?
14:04:52 I have a Q about hiera.yaml composition
14:04:56 if we have time
14:05:06 michchap_: kk, can we do that in open discussion?
14:05:12 shardy: sure
14:05:17 Ok, thanks
14:05:27 #topic one off agenda items
14:05:35 * CI status report broke. Potential fix here: https://review.openstack.org/#/c/307532/
14:05:48 dprince: this is due to the firewalling right?
14:05:57 yes ^
14:05:59 shardy: yes
14:06:20 shardy: ccamacho and I are working on a patch to generate a new report
14:06:35 cool
14:06:35 derekh: on that note, do we want to keep the old tripleo-jobs.py as-is?
14:06:41 * bnemec subscribes
14:06:42 Sounds good, would be nice to get cistatus working again
14:06:44 derekh: in case the firewall opens up?
14:06:53 o/
14:06:59 ya I think the jenkins restriction is temporary
14:07:02 derekh: like maybe our patch should be a tripleo-jobs-gerrit.py or something
14:07:08 though probably a few weeks
14:07:45 dprince: we could revert it again, or keep it, either is fine with me
14:07:47 trown: there is a possibility... in the long term that infra could move away from Jenkins I think and just use a zuul worker directly
14:08:00 trown: whereas... I don't think gerrit is going away long term
14:08:14 and even after it is unfirewalled, we will still likely need to patch tripleo-jobs.py to account for CSRF protections
14:08:25 derekh: I may keep them both side by side
14:08:37 dprince: ya, the only problem with gerrit is no ability to check periodic
14:08:50 trown: yep, we lose that
14:08:54 I am pro side-by-side approach
14:09:06 dprince: the gerrit version won't get us the periodic jobs, the best I can come up with is traversing the log server http://paste.openstack.org/show/494640/
14:09:46 trown: best solution I can come up with for periodic jobs ^ it's a WIP
14:09:48 derekh: lets do it man.
14:10:01 are we starting an unofficial "screenscrapers" team here guys?
14:10:19 :)
14:10:35 dprince: can't think of an alternative ;-(
14:10:39 hehe - well it sounds like we've got a workable interim plan then
14:11:02 shall we move on, and hopefully find out more about the duration of the firewalling etc next week?
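The log-server traversal floated above for recovering periodic job results was only a WIP paste at this point. As a rough sketch of the idea (the base URL, job name, directory-index layout, and the "Finished: SUCCESS" console marker are all assumptions here, not the contents of the actual WIP paste):

```python
# Hypothetical sketch of the "traverse the log server" approach to
# recovering periodic job results, since gerrit only reports check/gate
# runs. Every path and marker below is an assumption for illustration.
import re
from urllib.request import urlopen

BASE = 'http://logs.openstack.org/periodic'  # assumed location of periodic logs

def recent_runs(job_name, limit=5):
    """Scrape the directory index for a periodic job, return recent run dirs."""
    index = urlopen('%s/%s/' % (BASE, job_name)).read().decode('utf-8')
    # Apache-style directory listings expose each run as an <a href="..."> link
    runs = re.findall(r'href="([0-9a-f]+)/"', index)
    return runs[-limit:]

def run_passed(job_name, run_id):
    """Treat a run as passed if its console log recorded SUCCESS."""
    url = '%s/%s/%s/console.html' % (BASE, job_name, run_id)
    console = urlopen(url).read().decode('utf-8', 'replace')
    return 'Finished: SUCCESS' in console  # assumed Jenkins end-of-log marker

if __name__ == '__main__':
    job = 'periodic-tripleo-ci-f22-nonha'  # hypothetical job name
    for run in recent_runs(job):
        print(run, 'PASS' if run_passed(job, run) else 'FAIL')
```

Scraping directory indexes like this is fragile, which is why it was framed strictly as an interim measure alongside the gerrit-based tripleo-jobs-gerrit.py idea.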
14:11:09 shardy: yep, we are done I think
14:11:18 #topic bugs
14:11:38 #link https://bugs.launchpad.net/tripleo
14:12:07 So https://bugs.launchpad.net/tripleo/+bug/1571708 is one I was looking at with EmilienM and chem yesterday
14:12:08 Launchpad bug 1571708 in tripleo "undercloud user role assignments deleted by undercloud install" [Undecided,New]
14:12:33 it's an issue where we delete the _member_ role assignments when upgrading from quite old environments
14:12:59 https://review.openstack.org/#/c/307352/ is a proposed fix, which is a little ugly but hopefully workable
14:13:02 feedback welcome
14:13:11 any other bugs to highlight this week?
14:13:19 shardy: I like your approach
14:13:32 shardy: do you confirm that's a problem using all upstream bits?
14:14:02 slagle: It's a problem if you installed your undercloud then kept updating it for sufficiently long
14:14:17 the change was that keystone used to create _member_ as part of the bootstrap of the DB
14:14:17 there were some patches carried downstream that were reworked before they ever landed upstream, related to recreating users at the end of the install
14:14:25 ok
14:14:42 just wanted to make sure the right root cause was identified
14:15:33 slagle: sanity check welcome, I was under the impression the role assignments would have been created by keystone along with the _member_ role, but we can rework if that's not the case
14:16:20 shardy: i'd need to go back and refresh my memory
14:16:31 kk, any feedback welcome
14:16:35 but based on your simple reproducer in the bug, it's probably not related to what i was thinking of
14:17:15 Well the reproducer actually isn't fixed by my patch - if an operator adds a role to any of the pre-created users, puppet will erase the role assignments on every undercloud install
14:17:24 but it demonstrates the same issue
14:18:02 Was https://bugs.launchpad.net/tripleo/+bug/1571384 fixed by the revised package build?
14:18:03 Launchpad bug 1571384 in tripleo "CI periodic jobs: undercloud keystone fails to start" [Undecided,New]
14:18:27 shardy: I think so
14:18:45 shardy: 3-4 packaging fixes later and we are good :)
14:18:52 lol
14:18:58 ya that was a mess
14:18:58 dprince: Ok, sounds like we can close that one off then :)
14:20:54 #topic Projects releases or stable backports
14:21:34 So I think stable CI was broken, which should be fixed when we land https://review.openstack.org/#/c/306524/
14:22:00 The other thing I wanted to raise (which we can discuss in more detail next week) was the release schedule we plan to observe for Newton
14:22:17 until now, TripleO has kind of done its own thing under the "independent" release model
14:22:20 Yup the liberty job is accidentally using the cached instack master image ... sorry .. this patch fixes it (in addition to moving some stuff into a function) https://review.openstack.org/#/c/306524/
14:22:35 but IMO it would be a good thing if we aligned more closely with the coordinated release
14:22:46 given that we're maintaining branches etc now
14:23:00 +1
14:23:07 #link https://github.com/openstack/releases/blob/master/doc/source/newton/schedule.rst
14:23:23 There are basically two possible approaches
14:23:46 we either align with the main Newton release, with some number of intermediate milestones
14:24:06 Or we adopt the new "trailing" release which allows a 2 week window after the main release
14:24:11 I think tripleo is a good candidate for https://review.openstack.org/#/c/306037/
14:24:32 +1 for trailing release
14:24:39 I think trailing makes sense if we have an actual hard date still
14:24:46 EmilienM: is puppet also on that?
14:24:47 fwiw, that's what puppet openstack + fuel + kolla + ansible are going to follow
14:25:03 I mean, deployment systems are explicitly called out as the intended use case.
14:25:04 +1 for trailing release
14:25:06 trown: yes, probably
14:25:06 I think we are pretty tied to what puppet does
14:25:31 o/ sorry am late
14:25:33 Yeah, and also there's a natural lag in that we are dependent on RDO branches existing for a release etc
14:25:33 trown: we'll do our best to release at the official date though
14:25:46 trown: like we did for Mitaka, we released 1 week earlier
14:25:57 hi (missed roll call)
14:26:04 EmilienM: Ok, if puppet is going with that tag, I think we'll have to do likewise
14:26:11 +1
14:26:15 but I like the idea of aiming for sooner-than-two-weeks
14:26:27 right
14:26:39 for sure we (puppet) won't wait 2 weeks
14:27:03 Ok then, it sounds like we have consensus but I'll follow up on the ML and folks can vote or raise concerns there
14:27:04 ya, "no later than 2 weeks" is good
14:27:14 shardy: ++
14:27:27 Anything else re releases to raise?
14:27:42 note that the trailing release model still involves milestones
14:27:51 that are not required afaik
14:28:14 it is in the requirements on https://review.openstack.org/#/c/306037/2/reference/tags/release_cycle-trailing.rst
14:28:25 though not very specifically defined
14:28:33 trown: yeah, I read it too, but iiuc when talking with doug it was not required
14:28:40 k
14:28:46 maybe we can ask dhellmann to confirm (later)
14:28:48 trown: Ya, I think it'll be good for us to consider some intermediate releases
14:29:04 we'll have to decide if we follow the milestone or intermediary model tho
14:29:09 shardy: yes, releasing more often would really help us iterate
14:30:04 Ok I'll try to summarise a plan on the ML and folks can reply there until we reach agreement
14:30:15 #topic CI
14:30:32 One of our RH1 compute nodes went offline today, it was the compute node that hosted our squid and geard, so all jobs failed
14:30:38 it's back running now, we should know soon if everything is ok, can't check jenkins anymore to see progress ;-(
14:30:41 Anyone like to give us a summary of the CI status, obviously been another fairly rough week
14:30:50 We successfully moved the tripleo pin yesterday to a repo from friday, we are now caching the IPA image,
14:30:59 this patch should allow us to use it https://review.openstack.org/#/c/301699/
14:31:05 and this patch will start us caching the overcloud image https://review.openstack.org/#/c/306499/ I'll work on a patch to use it once it's cached
14:31:22 #info we are testing stack-updates to the tripleo-heat-templates patch being tested in the upgrades job now
14:31:24 derekh: did the automated promotion etc all work when the periodic job passed?
14:31:43 slagle: nice :)
14:31:45 I wonder if that's why my undercloud upgrade job started failing yesterday...
14:31:47 slagle: w00t
14:32:03 I believe therve posted a fix to the ResourceChain update issue also, has that been proven?
14:32:18 do we deploy newton already? (puppet CI is failing at that currently, I'm working on it)
14:32:30 shardy: yes, it was an automatic promotion, but it was a fake periodic job (didn't want to wait another 24 hrs https://review.openstack.org/#/c/229789/)
14:32:35 EmilienM: ya, but only with pingtest
14:32:58 trown: it's very cool, I'll investigate all my tempest failures when our CI is back
14:33:13 EmilienM: but I have had success manually testing tempest, with only ceilo notification and keystone v3 failures
14:34:15 I've been trying to make the toci_ scripts usable by more people to reproduce problems, would appreciate some eyes here to help out the cause https://review.openstack.org/#/c/306506/
14:34:40 also
14:34:45 I've been asked when would be a good time for the new RAM to be installed on the tripleo hosts, my suggestion was Friday 29th, this would leave CI offline from friday until the following monday when we bring it back up again.
14:34:50 Most people will be traveling back from the summit so disruption shouldn't be too bad
14:34:55 how does this sound to people?
14:35:01 +1
14:35:05 derekh: +1
14:35:05 ++
14:35:09 This is assuming we have the RAM in time of course, if not we'll have to pick some other time, we could wait until the rack moves to a new DC but that's the beginning of July
14:35:20 good to hear we've got a definite timeline on those upgrades :)
14:35:39 derekh: would be nice to not wait that long if possible
14:35:59 shardy: the PO is sent, so we should have it available once the supplier sends them to us
14:36:12 One CI related thing, I did a bunch of memory profiling last week, resulting in https://bugs.launchpad.net/heat/+bug/1570983 and https://bugs.launchpad.net/heat/+bug/1570974
14:36:14 Launchpad bug 1570983 in heat "raw_template files duplication wastes DB space and memory" [High,In progress] - Assigned to Crag Wolfe (cwolfe)
14:36:15 Launchpad bug 1570974 in heat "Possible reference loops lead to high memory usage when idle" [Undecided,New]
14:36:37 tl;dr - there are some issues which we may be able to fix to improve the heat-engine memory usage somewhat
14:37:08 Anything else CI related?
14:37:30 container job was working for a bit, it is no longer
14:37:53 rhallisey: what was the cause, tripleo code or the images?
14:37:55 shardy: looks like 1570983 is addressed already, and if i understood correctly will make a significant improvement in memory use
14:38:03 even if I fix it though, it still won't turn up for the composable container stuff until I do all the roles
14:38:06 since we run on atomic
14:38:24 marios: Yeah the patch isn't ready yet but it may make a big difference when done
14:38:37 shardy, it was working for a while, but about a week ago it stopped
14:38:54 should we enable it as voting? ^
14:38:59 shardy, I think we should hold off on that CI job until composable container stuff is all set
14:40:34 EmilienM, it's not ready yet
14:40:50 #topic Specs
14:40:57 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:40:57 the compute node is going to undergo a lot of change, so gating on anything there won't really do much
14:41:12 There's the lightweight HA one which could use feedback
14:41:33 should we leave other spec discussions until during/after summit?
14:41:44 or does anyone have anything to raise now?
14:41:53 can we have people look at the validation one again?
14:42:09 shadower: ah, yes sorry I meant to mention that one too
14:42:10 its prerequisite (the mistral api) has landed and I don't see any negative feedback
14:42:33 but both mandre and myself will be at the summit so we can talk about it there, too
14:42:46 no worries shardy :-)
14:42:50 shadower: thanks for the reminder, i want to revisit. will this include things like 'is my pcs cluster ok?'
14:43:27 marios: we're focusing on validating the predeployment stuff because it can shave a ton of deployment time. But we do have some postdeployment validations
14:43:38 not 100% sure they belong there -- with sensu being a thing and all
14:43:46 shadower: ack thx and sorry i see we bump discussion/take offline. just for something i was looking at today
14:44:07 yeah, np
14:44:17 that's a question i'd like to resolve soonish anyway
14:44:42 shadower: a link to the spec would be good in the minutes
14:44:53 https://review.openstack.org/#/c/255792/
14:45:00 sorry
14:45:07 #link https://review.openstack.org/#/c/255792
14:45:24 #topic Open Discussion
14:45:36 So, one thing before we address michchap_'s question
14:45:39 have the topics for summit been announced yet?
14:45:54 #link http://www.openstack.org/summit/austin-2016/summit-schedule/global-search?t=TripleO
14:45:58 #link https://etherpad.openstack.org/p/newton-tripleo-sessions
14:46:09 tzumainn: hehe
14:46:20 shardy, lol, thanks!
14:46:28 I'll create the etherpads per-topic and link them into the main page later today
14:46:49 There are a few conflicts with Heat sessions in the afternoon unfortunately, but otherwise all seems to be in good shape
14:47:11 let me know if you see anything that needs adjustment
14:48:02 I'll also chat to folks in #tripleo and we can nominate people to spread the load of leading sessions around a bit
14:48:24 any questions/comments re summit to add?
14:49:14 Ok, michchap_ - your question?
14:49:20 I will be quick
14:49:30 I was looking at extraconfig things for doing numa support as an optional component and noticed the hiera.yaml on compute nodes has a set of hardcoded files for each neutron plugin. I was thinking the composable services stuff might be a good time to look at making hiera.yaml itself composable, or perhaps merging things into existing yaml files.
14:50:04 I just wanted to raise that as a thing and ask if anyone had already looked at doing it
14:50:22 michchap_: Yeah, the history there is until recently heat didn't have any way to join two (or more) lists
14:50:42 I fixed that in mitaka, so we can probably make that list composable now
14:50:51 although how we handle the ordering could be interesting
14:50:55 shardy: Ah, but we might be in a good spot to do it now.
14:51:11 shardy: ordering is going to be difficult.
14:51:18 michchap_: Yeah, I think it should be do-able as part of the composable services work
14:51:27 dprince: Have you looked into that yet at all?
14:52:12 shardy: not specifically, but it sounds doable
14:52:26 kk, thanks for mentioning it michchap_
14:52:31 yep, good idea
14:52:33 thanks for your time
14:52:42 I'll see if I can throw an initial patch up which illustrates how we might do it
14:52:47 I might try to make a patch but heat isn't my strong suit.
14:52:49 and we can take it from there
14:52:54 ^ happy to review
14:53:20 Anyone have any other topics to raise before we wrap up?
14:54:19 Ok then, thanks all, see you in Austin!
14:54:33 #endmeeting
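A footnote on the hiera.yaml composition discussion that closed the meeting: below is a minimal sketch of what merging per-service hieradata fragments into one ordered hierarchy might look like. The fragment names and the priority-based merge policy are assumptions for illustration (this is not the patch shardy offered to draft), but it makes the ordering concern raised above concrete: whichever datafile lands earlier in the hierarchy wins hiera lookups.

```python
# Hypothetical sketch of composing hiera.yaml from per-service fragments,
# illustrating michchap_'s suggestion. All fragment names and the priority
# scheme are made up; ordering is the hard part the meeting identified.
import yaml

def compose_hierarchy(fragments):
    """Merge per-service hierarchy fragments into one ordered list.

    fragments: list of (priority, [datafile, ...]) tuples. Lower priority
    numbers sort earlier in hiera.yaml, i.e. they win lookups.
    """
    hierarchy = []
    for _, datafiles in sorted(fragments):
        for datafile in datafiles:
            if datafile not in hierarchy:  # de-duplicate, keep first position
                hierarchy.append(datafile)
    return hierarchy

fragments = [
    (10, ['neutron_ovs_data']),    # hypothetical per-plugin fragment
    (20, ['nova_compute_data']),   # hypothetical per-service fragment
    (99, ['common']),              # lowest-priority defaults go last
]

hiera_yaml = {
    ':backends': ['json'],
    ':hierarchy': compose_hierarchy(fragments),
}
print(yaml.dump(hiera_yaml, default_flow_style=False))
```

In the templates themselves, the equivalent would lean on Heat's ability to join multiple lists, which shardy notes landed in Mitaka.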