14:00:25 #startmeeting tripleo
14:00:25 Meeting started Tue Apr 4 14:00:25 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:29 The meeting name has been set to 'tripleo'
14:00:30 #topic agenda
14:00:38 o/
14:00:44 hey folks! (~˘▾˘)~
14:00:45 * review past action items
14:00:46 * one off agenda items
14:00:48 * bugs
14:00:51 * Projects releases or stable backports
14:00:52 * CI
14:00:54 * Specs
14:00:56 * open discussion
14:00:58 Anyone can use the #link, #action and #info commands, not just the moderator!
14:01:00 Hi everyone! who is around today?
14:01:05 o/
14:01:07 o/ (observer from I18n team)
14:01:08 o/
14:01:08 \o
14:01:13 o/
14:01:19 \o
14:01:20 \o
14:01:28 hi
14:01:31 o/
14:02:04 o/
14:02:24 ok let's start
14:02:27 #topic review past action items
14:02:32 * EmilienM to postpone pike-1 Triaged bugs to pike-2 milestone: not done yet, will do this week
14:02:36 #action EmilienM to postpone pike-1 Triaged bugs to pike-2 milestone this week
14:02:55 pike-1 is next week so i'll move the pike-1 bugs this week
14:03:02 at least the ones that are not In Progress
14:03:12 * shardy to run CI patch that removes t-i-e and incubator projects: still WIP
14:03:16 o/
14:03:43 * EmilienM to retire os-cloud-config: almost done. git repo is empty now. Need reviews on RDO packaging and we're done
14:03:56 #link https://review.rdoproject.org/r/#/q/topic:os-cloud-config/retire
14:03:57 I did that, ref https://review.openstack.org/#/c/450809/
14:04:05 it turns out we are still using some elements
14:04:17 oops :)
14:04:22 so I'll iterate on that until we can remove the unused ones, then we can see if/where we can move things to retire the repo
14:04:30 /o
14:04:30 shardy: nice, thanks!
14:04:33 * team to review chem's patches from https://etherpad.openstack.org/p/tripleo-meeting-items about upgrades: still wip?
14:04:50 chem is not here but if upgrade folks need more reviews, let us know
14:04:59 * EmilienM to write an email with all issues we had in CI recently: not done yet
14:05:07 * container squad to investigate downloaded vs apparent image sizes
14:05:16 * container squad to continue discussion with -infra re TripleO and kolla requirements for local/cached registry
14:05:37 I think dprince has a topic for that a bit later
14:05:57 chem: did you get all the reviews you needed on upgrade patches? (still catching up on past action items)
14:06:24 yep, thanks to everyone :)
14:06:36 #topic one off agenda items
14:06:40 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:06:51 dprince: go ahead
14:06:51 chem: EmilienM: we still have some pending things but can bring them up later under bugs
14:07:07 marios: ack
14:07:19 EmilienM: ack.
14:07:35 EmilienM: I would like to propose that we disable the nonha job
14:07:49 EmilienM: in favor of adding back the containers job.
14:08:00 noting that the containers job already did introspection
14:08:31 We can make other aspects of the containers job match the previous nonha job fairly quickly, I think
14:08:42 dprince: considering we need to save time, would moving introspection to a different job be an option?
14:08:58 I guess the HA job is already pretty close to the timeout...
14:09:05 shardy: I'm not sure either the HA or updates jobs have extra time either
14:09:23 shardy: same for ovb-updates
14:09:25 * dprince is starting to think introspection doesn't belong in our normal OVB queue jobs
14:09:26 yea, that was my concern with adding more $stuff to the ha job
14:09:33 yeah, hmm
14:09:37 dprince: +1
14:09:45 periodic?
14:09:46 i'd rather test introspection in a periodic job altogether
14:09:47 like what if introspection became a periodic job
14:09:51 yeah
14:10:01 Ok maybe we do introspection plus container deploy, and leave container upgrades to a multinode job as jistr is working on
14:10:08 if we agree to bring attention to the periodic jobs, then ok
14:10:10 * dtantsur will not repair introspection again if it gets broken again a week after it's out of CI...
14:10:22 dtantsur: good point
14:10:37 does tripleo frequently break introspection?
14:10:40 shardy: yes, good option
14:10:51 or is it the other way around?
14:10:54 slagle: we removed it from CI once before and it immediately got broken, I don't recall why
14:11:09 dtantsur: I'm happy to run introspection jobs in our CI. But I think perhaps only a subset of the patches are related to it. Perhaps not every single t-h-t patch for example
14:11:28 dprince, yes, until we use THT to configure inspector
14:11:46 currently we can only run it on tripleo-common, python-tripleoclient and instack-undercloud
14:12:12 dtantsur: Ok, so we actually don't have to run it on t-h-t changes?
14:12:15 dtantsur: I know, but even then we could provide "less expensive" coverage for the featureset
14:12:21 that, I think, is the main bottleneck for containers
14:12:34 shardy, well, right now - we don't
14:13:27 to be fair, we should have at least one job that exercises the whole flow we recommend to customers
14:13:40 putting aside the fact that I'm getting constant requests to enable cleaning by default ;)
14:14:28 o/
14:15:10 dtantsur: we have to make some hard decisions I think here. I'm happy to fit introspection in wherever we can. But at this point we've got a large number of containers patches coming in.... and no overcloud CI jobs on them at all
14:15:27 * mwhahaha is in favor of having CI actually do what customers do
14:15:40 I realize that. But now we have introspection by default, and we have to cover it.
14:15:50 how long does it take?
14:16:07 If we stop recommending it by default, we can reduce its coverage or stop covering it at all.
14:16:15 why not just enable it for the nonha-updates job
14:16:22 dtantsur: introspection is absolutely important to the overall workflow, but I think we can still guarantee it isn't broken while using far fewer resources
14:16:26 mwhahaha: close to timeout already
14:16:29 or was the thought to also remove the nonha-updates job
14:16:37 what's the timeout on that one?
14:16:42 90 mins?
14:16:52 EmilienM, up to 5 minutes, usually. Not sure about TripleO CI though..
14:17:10 mwhahaha: 180 afaik
14:17:15 3-4 minutes
14:17:18 and with this approach we'll never be able to enable cleaning, which is something that at least storage folks want to make mandatory..
14:17:24 EmilienM: last i saw the nonha-updates was taking 80 mins
14:17:26 and fwiw introspection is an optional feature.
14:17:30 * dprince doesn't always use it
14:17:38 dprince: you're not deploying in production
14:17:41 we may recommend it... but it is sort of optional
14:17:41 customers do
14:17:50 EmilienM++
14:18:01 EmilienM: I do deploy to baremetal, and arguably more "production" than most developers
14:18:04 I also don't always run introspection, but it's nearly always run in production
14:18:28 dprince: what mwhahaha said is right, we need to test real scenarios and introspection is one of them
14:18:29 dprince, I'm not sure if you're the most valuable customer, to be honest :D
14:18:32 would we always use containers for Pike? I think that is the goal
14:18:40 and we're talking about 3-4 minutes here
14:18:43 right now we have 0 overcloud CI on this....
14:18:47 let's start there
14:19:08 and make introspection a periodic job until we get things tuned to accommodate it
14:19:13 * dtantsur does not disagree with having a container job
14:19:16 *this* is the hard decision
14:19:18 can we just try to move introspection to ovb-updates? and revert if we see it time out too much?
14:19:24 dprince, and this is the wrong decision
14:19:36 because then nobody will ever care to move it back
14:19:44 we should just start testing containers with multinode, then this would be a moot point :)
14:19:53 slagle: yes that
14:19:54 EmilienM: I got comments from slagle and bnemec with concerns about it causing timeouts on the HA or updates jobs I think
14:20:03 slagle: we should do that too
14:20:07 dprince: I'm aware of those concerns, but we can try
14:20:15 slagle: but I think then we aren't testing a full story there either
14:20:19 slagle: we need to do both
14:20:25 what about, dunno, figuring out why running puppet a few times takes so much time?
14:20:38 slagle: jistr is already working on that via oooq-extras
14:20:51 well, i'm not really in favor of doing anything that increases the runtime of any CI jobs
14:20:51 so hopefully soon we'll be able to do both
14:20:54 though that's non-container -> container upgrade
14:21:02 which is a bit different from container deploy
14:21:07 all we're doing is kicking the can down the road
14:21:08 I would point out that my initial proposal here is:
14:21:09 IMO we need the multinode approach so we can get the exact same scenarios we use for non-containers now
14:21:12 1) disable nonha
14:21:20 2) enable containers job, with introspection
14:21:26 everyone seems to be ignoring that...
14:21:39 dprince: I think that's reasonable FWIW
14:21:39 dprince, won't it hit the same timeouts?
14:21:41 jistr: before upgrades, why not work on classic deployments?
14:21:49 jistr: I would iterate on upgrades later, imho
14:21:52 dtantsur: we have other ideas that will help there soon enough
14:22:00 dprince: i would be fine with that, if the containers job also covers everything else nonha was doing
14:22:01 EmilienM: we already had classic deployments ;)
14:22:03 dprince: yes, it sounds good to me
14:22:08 jistr: where? the ovb job?
14:22:10 yes
14:22:12 slagle: yes, that is my goal
14:22:27 jistr: we think it would be better to have multinode jobs for that
14:22:34 jistr: to have more coverage, like we do with scenarios
14:22:41 dprince: that sounds good to me then
14:22:42 I'm asking everyone to agree that we go all in on containers CI
14:22:54 because we don't have resources to add *any* more OVB jobs ATM
14:22:56 jistr: scenario001, 002, ... 004 with container deployments. I thought it was clear it was a priority
14:23:05 just to clarify: containers in the undercloud, in the overcloud or both?
14:23:12 EmilienM: we need deployments and upgrades tested via the multinode jobs IMO
14:23:23 dtantsur: containers in the overcloud only, I think, is what we are talking about
14:23:25 shardy: yes, but deployments first. Upgrades right after
14:23:30 k
14:23:34 shardy: and it seems jistr is doing it the other way around. Upgrades first...
14:23:52 so weshay has this featureset010
14:23:54 dprince: anything else from ovb-nonha we need to move besides introspection?
14:23:54 https://review.openstack.org/#/c/446151
14:24:03 dtantsur: the containerized 'undercloud' is experimental. Only I really spend time on this, and I think it would be a Queens feature at this point
14:24:08 i hope that could be used as a base for the multinode container job
14:24:24 I see
14:24:43 EmilienM: I think we need to make containers use SSL too. But I don't think that would be a major blocker
14:24:46 dprince, so essentially you're suggesting we change the non-ha job to use containers, right?
14:24:51 shardy: could we prioritize deployments first and then upgrades for the container-multinode work please?
14:24:51 dtantsur: yes
14:24:58 though i agree with dprince that anything else, regardless of how far along we may look to be, is currently up in the air
14:25:02 EmilienM: yeah, but we need three things: (1) deploy containers (2) deploy containers and upgrade/update (3) deploy baremetal, upgrade to containers
14:25:03 and in the meantime
14:25:04 ack, no objections here
14:25:11 our major Pike feature is getting no coverage
14:25:12 dprince: yes, SSL is a major blocker to me, again it's what our customers use
14:25:34 EmilienM: I will fit SSL in somewhere. SSL can go anywhere I think...
14:25:35 shardy: and you just gave them in the right order
14:26:45 and ovb-nonha has UNDERCLOUD_VALIDATIONS
14:26:48 but we can figure that out later
14:26:57 I just don't want to lose coverage with this removal
14:27:03 Features in the nonha job: http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/README.rst#n102
14:27:06 EmilienM: yeah, but honestly the oooq transition has stalled progress here a little - let's work with weshay and jistr so we can iterate quickly to all the coverage I mentioned
14:27:37 shardy: ok good
14:27:59 bnemec: ok thx
14:28:38 Ok so we need to move SSL coverage to another job I guess
14:28:38 roger
14:28:39 #action dprince moves introspection from non-ha to ovb-containers and removes the nonha job starting from pike (keeping it for stable/ocata and stable/newton)
14:28:45 dprince: ok? ^
14:28:59 dprince: once it's done, let's enable the ovb-containers job again (if it passes)
14:29:00 EmilienM: ack
14:29:13 and shardy and jistr are working on a containers multinode job too
14:29:20 dprince: please let us know about the progress on the mailing list
14:29:25 just to be clear, we want to move all features from nonha to containers, not just introspection
14:29:29 before we remove nonha
14:29:30 and in the future we'll have to refactor one of these to support container upgrades
14:29:56 #action jistr / shardy / weshay / EmilienM to synchronize about the work prioritization on container / multinode CI work
14:30:08 slagle: yes, as many as we can. Anything that is a blocker in the short term could go into the HA job too (SSL for example)
14:30:12 slagle, some of them can go to the HA job
14:30:13 dprince: container deployment vs. container upgrades can't go into a single job for pike
14:30:29 dprince: as we're not interested in container -> container upgrades right now
14:30:40 jistr: right, thanks for this detail
14:30:40 jistr: we'll argue about that in a future meeting I think
14:30:41 it's non-container -> container we need to test
14:30:47 ++ on that
14:30:49 i guess i'm not thrilled with that rate of change given all the birds up in the air, but ok :)
14:30:52 ok :)
14:31:01 jistr: this job could become container upgrades fwiw if we need that
14:31:02 jistr: yeah although we'll need to test container->container updates at some point
14:31:34 yea but that becomes important for production only with the Queens release
14:31:35 these are hard decisions. We may have to give a little in terms of what we'd like to test
14:31:54 let's iterate on what we said today
14:32:03 and keep the discussion going over the next weeks
14:32:04 jistr: well, we need the architecture around updates proven before we release pike
14:32:19 jistr: unlike major upgrades, there may be reasons to do updates very soon after the pike release
14:32:54 shardy: right, for container->container minor updates yea, that's a different story, much higher prio, i'd say
14:33:18 jistr: yeah, sorry s/upgrades/updates
14:33:50 shardy, jistr: not now because we're running out of time but later, can we work together on a document (blueprint or etherpad) with our list of CI jobs related to containers that we target for each cycle?
14:34:11 so the priorities and what people should be working on are clear to everyone
14:34:13 EmilienM: sure I'll start one
14:34:18 shardy: thank you
14:34:24 dprince: can we move on?
14:34:32 any questions or last feedback before we go ahead?
14:35:12 #topic bugs
14:35:18 #link https://launchpad.net/tripleo/+milestone/pike-1
14:35:38 this afternoon, I'll start moving "Triaged" bugs from pike-1 to pike-2
14:35:56 the priority is to work on "In Progress" bugs for pike-1 now, unless there are critical bugs
14:36:21 if I move a bug to pike-2 and you're not happy about it, please let me know. I use a script to do that, so I might miss something critical
14:36:46 do we have any outstanding bugs to discuss this week?
14:36:53 marios: anything about upgrades we need to discuss?
14:37:02 EmilienM: a couple of outstanding upgrades things yeah, sec
14:37:19 EmilienM: https://bugs.launchpad.net/tripleo/+bug/1678101 https://review.openstack.org/#/c/448602/ and also sofer (chem) with https://bugs.launchpad.net/tripleo/+bug/1679486 with https://review.openstack.org/#/c/452828/ . Note that gfidente has a related/alternate review at https://review.openstack.org/#/c/452789/ for both of those things
14:37:19 Launchpad bug 1678101 in tripleo "batch_upgrade_tasks not executed before upgrade_tasks" [High,In progress] - Assigned to Marios Andreou (marios-b)
14:37:20 Launchpad bug 1679486 in tripleo "N->O Upgrade, ochestration is broken." [Critical,In progress] - Assigned to Marios Andreou (marios-b)
14:37:47 EmilienM: so there has already been discussion in irc/on the reviews... i think the last one from gfidente is gaining traction to fix both bugs
14:38:07 marios: long story, we should meet with gfidente
14:38:11 EmilienM: we need this asap as both are key to the ansible upgrades workflow, so our goal is end of week
14:38:11 EmilienM: https://etherpad.openstack.org/p/tripleo-container-ci
14:38:21 shardy: thx
14:38:28 everyone feel free to hack on it, I made a first pass
14:38:31 marios: i -1'd the one from Giulio
14:38:54 we should meet matbu gfidente marios :)
14:38:55 matbu: ack, let's continue in channel, just mentioning them as relevant/important bugs right now for the ansible upgrades
14:39:09 yep
14:39:23 ok, please let us know if you need any help
14:39:29 (including reviews)
14:39:52 yeah likewise, let me know if I can help beyond the discussions we've already had
14:39:56 EmilienM: thanks, i pointed at the key reviews above so if anyone has review cycles, comments are appreciated
14:40:04 thanks shardy
14:40:24 marios: ok thanks
14:40:38 is there any other bug that requires discussion this week?
14:41:15 #topic projects releases or stable backports
14:41:22 so next week is pike-1
14:41:49 * I'll propose the tripleo pike-1 release by Thursday morning
14:42:24 * After pike-1, we should verify that upgrade jobs are working on master (it will test ocata to pike for the first time)
14:42:48 we probably have a bunch of things to clean up in the service templates that are related to newton to ocata upgrades
14:42:59 marios: ^ fyi
14:43:28 * Mitaka is EOL next week
14:43:29 EmilienM: we already removed stuff from the tripleo-heat-templates
14:43:36 EmilienM: do you mean in the CI repo?
14:43:47 marios: I'm not sure what you're talking about
14:43:50 which repo?
14:43:59 marios: No, the upgrade job has been broken and non-voting since we branched pike
14:44:04 I'm talking about upgrade_tasks that were specific to the newton to ocata upgrade
14:44:10 marios: ref my ML discussion around release tagging etc
14:44:18 EmilienM: oh i see, i thought you were referring to removal of old upgrade scripts. didn't realise you were referring to newton to ocata
14:44:19 we'll need to get it green and voting after pike-1
14:44:28 yes that ^
14:44:38 EmilienM: because we cleared up older upgrade scripts already
14:44:39 re: Mitaka EOL - it's official on 2017-04-10
14:44:47 EmilienM: ok thanks
14:44:56 I'll poll the team to ask whether or not we want to keep the branches & CI jobs for Mitaka
14:45:11 marios: yes, I reviewed the patch. cool :)
14:45:45 do we have any questions or feedback about stable branches & release management?
14:46:20 #topic CI
14:46:38 I have some updates:
14:47:10 * pabelanger has been working on getting RDO AFS mirrors, so we can download packages faster
14:47:23 pabelanger++
14:47:39 that's awesome
14:47:41 we switched Puppet OpenStack CI to use them, it works pretty well (it was a pilot)
14:47:52 now we're working to switch quickstart jobs to use the new mirrors: https://review.openstack.org/#/c/451938/
14:48:10 once pabelanger gives us the "go" for tripleo, we'll start using it
14:48:16 nice
14:48:44 we need to be careful in this transition, because the mirrors use rsync directly from the RDO servers, so this is kind of experimental right now
14:48:57 but we expect to improve the runtime of CI jobs
14:49:12 pabelanger: anything you want to add on this topic?
14:49:45 * pabelanger (again) has been working on a Docker registry: https://review.openstack.org/#/c/401003/
14:49:57 and it follows up on our discussion from last week
14:50:19 where TripleO CI could use this registry (spread over AFS, so same as the packaging, we would improve the container jobs' runtime)
14:50:41 mandre, dprince, jistr ^ fyi, just to make sure you can review it and maybe investigate how tripleo CI will use it
14:51:01 ack thanks
14:51:13 adarazs: do we have any blockers on the quickstart transition? Anything that needs discussion?
14:51:25 panda, sshnaidm: any blockers on CI promotions this week?
14:51:26 EmilienM: I don't think so.
14:51:39 EmilienM, nope
14:52:01 #topic specs
14:52:04 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:52:13 I sent an email last week about TripleO specs
14:52:32 let me find the link
14:53:04 #link http://lists.openstack.org/pipermail/openstack-dev/2017-March/114675.html
14:53:52 I think one of the conclusions is: please don't wait for your spec to be merged before starting a PoC of the feature
14:54:06 because i've been told some people were doing it
14:54:32 do we have any discussion about specs this week?
14:54:46 EmilienM: yes, I would like some core reviewers to review my spec
14:54:54 https://blueprints.launchpad.net/tripleo/+spec/send-mail-tool
14:55:14 arxcruz: I encourage any review, core or not core
14:55:15 also there's already a POC in https://review.openstack.org/#/c/423340/
14:55:40 EmilienM: well, actually, you already gave me +2 on the spec at https://review.openstack.org/#/c/420878/ , I need one more core to review and hopefully merge
14:56:03 arxcruz: my point is that when asking for reviews, we need to encourage anyone to review, even people not core yet
14:56:16 I don't want the group to encourage only cores to make reviews
14:56:17 EmilienM: I just saw your comments in https://review.openstack.org/#/c/423340/ and I'll work on that.
14:56:20 EmilienM: gotcha :)
14:56:24 arxcruz: cool
14:56:46 but since pike-1 is next week, if someone can review, I'd appreciate it
14:56:55 arxcruz: have you talked with the infra and QA folks about whether they already have a mechanism to do this?
14:57:06 EmilienM: yes, they don't
14:57:07 arxcruz: I would talk with mtreinish and pabelanger and present the work you have done
14:57:23 EmilienM: the idea is to have something similar in zuul v3, but it's for the future
14:57:34 ok
14:57:46 a mechanism to do what?
14:57:47 it's not on the zuul v3 agenda yet
14:58:07 #topic open discussion
14:58:08 mtreinish: remember we were talking about getting the tempest results and sending an email to key people when tempest tests fail?
14:58:25 I have a last minute request: oooq deep dive: I volunteered for it, but since I'm on PTO for the next two weeks, it can either be done the day after tomorrow (too near?), Thursday 27th of April (too far?), or someone else could volunteer and do it in one of the next two weeks
14:58:28 a few weeks ago, you pointed me to check openstack-health
14:58:37 arxcruz: right, and I said if you put subunit streams in the log dir you can leverage openstack-health's rss feeds
14:58:55 which does exactly what you want, it's just rss, not email
14:59:03 panda: I don't think the end of April or early May is too far...
14:59:07 there are some tools that can send emails from RSS
14:59:19 panda: yeah I think we can wait a little
14:59:25 ok
14:59:28 unless trown wants to do it before :P
14:59:31 I'll propose the 27th then
14:59:33 :0
14:59:35 :)
14:59:36 panda: sounds good
14:59:41 arxcruz: for example: http://health.openstack.org/runs/key/build_queue/gate/recent/rss (which is an rss feed of all failed runs in the gate queue)
14:59:46 I could... but panda will do a better job :)
15:00:36 ok thanks everyone
15:00:38 #endmeeting
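
As an illustration of the "tools that can send emails from RSS" idea raised above, here is a minimal sketch that polls the openstack-health feed of failed gate runs and mails a summary. It is not the tool discussed in the meeting: it assumes the third-party feedparser package, a local SMTP relay on localhost, and made-up sender/recipient addresses.

    # rss_mailer.py - illustrative sketch, not an existing TripleO/QA tool.
    import smtplib
    from email.message import EmailMessage

    import feedparser  # third-party: pip install feedparser

    # RSS feed of all failed runs in the gate queue, per the meeting log.
    FEED_URL = "http://health.openstack.org/runs/key/build_queue/gate/recent/rss"

    def mail_failed_runs(recipient="tripleo-ci@example.org"):
        """Fetch the feed and send one summary email of the current entries."""
        feed = feedparser.parse(FEED_URL)
        if not feed.entries:
            return  # nothing failed, nothing to send
        body = "\n\n".join("%s\n%s" % (entry.title, entry.link)
                           for entry in feed.entries)
        msg = EmailMessage()
        msg["Subject"] = "openstack-health: %d failed gate runs" % len(feed.entries)
        msg["From"] = "ci-notify@example.org"  # hypothetical sender address
        msg["To"] = recipient                  # hypothetical recipient address
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:  # assumes a local SMTP relay
            smtp.send_message(msg)

    if __name__ == "__main__":
        mail_failed_runs()

Run periodically (for example from cron), this would approximate the email notification arxcruz described, using the RSS feed mtreinish pointed to rather than parsing subunit streams directly.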