14:00:09 #startmeeting tripleo
14:00:10 Meeting started Tue Mar 28 14:00:09 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:12 #topic agenda
14:00:13 The meeting name has been set to 'tripleo'
14:00:19 * one off agenda items
14:00:20 * bugs
14:00:22 * Projects releases or stable backports
14:00:24 * CI
14:00:26 * Specs
14:00:28 * Week roadmap
14:00:30 * open discussion
14:00:32 Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:34 Hi everyone! who is around today?
14:00:37 o/
14:00:38 yo/
14:00:40 hi folks o/
14:00:42 hi2u
14:00:44 o/
14:00:45 o/
14:00:45 o/
14:00:46 o/
14:00:50 o/
14:00:50 o/
14:01:14 look at that crowd
14:01:15 \o/ (o) oC /o\
14:01:20 \o
14:01:23 o/
14:01:26 o/
14:01:37 o/
14:01:42 o/
14:01:56 o/
14:02:03 o/
14:02:16 ok, let's start
14:02:18 #topic review past action items
14:02:26 EmilienM to look more closely at the thread about periodic jobs: done, topic is still under discussion now.
14:02:55 sshnaidm: like we said in our recent discussions, I guess we'll make progress over the next days and we need to continue the investigation with pabelanger
14:03:05 sshnaidm: feel free to bring it up during the CI topic later
14:03:12 EmilienM to update https://review.openstack.org/#/c/445617/ to keep dib-utils part of TripleO for now and keep moving DIB out of TripleO: done and positive votes by TC + DIB + TripleO folks.
14:03:22 o/
14:03:22 marios to investigate why https://review.openstack.org/#/c/446506/ is failing: done, patch is merged
14:03:29 team to review https://launchpad.net/tripleo/+milestone/pike-1 and prioritize in-progress work: still in progress. A lot of bugs are still untriaged, help is welcome
14:03:31 EmilienM: ack
14:03:52 on my list for today ^: continue the triage on pike-1 bugs and blueprints
14:03:58 team to postpone Triaged bugs to pike-2 next week: will do it this week.
14:04:07 #action EmilienM to postpone pike-1 Triaged bugs to pike-2 milestone
14:04:12 o/
14:04:39 flaper87 to file a bug in tripleo assigned to opstools for fluentd to read logs from containers: done
14:04:44 and bogdando to follow up on https://review.openstack.org/#/c/442603/ and update if needed: patch still WIP (not passing CI and no review)
14:04:53 o/
14:05:06 anything before we move on?
14:05:21 #topic one off agenda items
14:05:27 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:05:32 shardy: o/
14:05:57 EmilienM: Hey, so my question is around tripleo-image-elements, and tripleo-incubator
14:06:04 there is a lot of stuff there we no longer use
14:06:23 and t-i-e has caused some confusion recently as we're mostly only updating tripleo-puppet-elements
14:06:25 shardy: I added os-cloud-config
14:06:31 o/
14:06:32 e.g. even for non-puppet things like the packages in the image
14:06:49 EmilienM: yeah, it'd be good to figure out how to retire these old repos I think
14:06:57 I remember slagle moved a bunch of things from t-i-e into instack-undercloud
14:07:04 in the case of tripleo-incubator, I think there are still a couple of CI dependencies
14:07:17 we should do it step by step. Probably tripleo-incubator is the safest one to start with
14:07:20 but IMO we should just move things, and retire those repos?
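One hedged way to double-check the "couple of CI dependencies" mentioned above, before proposing the removal patches, would be to grep the CI repo for references to the candidate projects. A rough sketch; the clone URL and paths are assumptions for illustration, not something agreed in the meeting:

    # Rough sketch, not the actual patches discussed: look for leftover
    # references to the projects being retired.
    git clone https://git.openstack.org/openstack-infra/tripleo-ci /tmp/tripleo-ci
    grep -rn --exclude-dir=.git -e tripleo-incubator -e tripleo-image-elements /tmp/tripleo-ci \
      || echo "no remaining references found"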
14:07:24 shardy: i removed all those deps, or at least have patches that do
14:07:43 i don't recall if they all merged yet or not
14:07:58 slagle: nice, ok will look for them
14:08:09 it looks like they did
14:08:27 Ok so we can probably just propose the rm -fr patches and see what CI does ;)
14:08:28 so, we could test a CI patch that deletes the git checkout from /opt/stack and see if we pass
14:08:49 shardy: that would be a good first start, and only t-i-e and incubator for now
14:09:08 Ok, I'll propose the patches and we can iterate on any remaining pieces we actually need
14:09:09 shardy: not o-c-c because the patch that removes the dep is still WIP
14:09:18 EmilienM: ack, yep we can do that later
14:09:43 I think the CI admins are still in -incubator?
14:09:44 #action shardy to run CI patch that removes t-i-e and incubator projects
14:10:06 I guess we may need some changes to move them to tripleo-ci
14:10:33 #action EmilienM to remove os-cloud-config as a dependency (WIP) and follow-up on what shardy is testing for the 2 other projects
14:10:37 Anyway, we can work out the details later unless anyone is opposed
14:10:40 thanks!
14:10:43 shardy: thanks!
14:10:50 let's clean up things :)
14:10:54 bogdando: o/
14:11:08 EmilienM: hi
14:11:48 bogdando: what's up?
14:12:22 I have a few topics to announce and ask for ideas
14:12:32 yeah go for it please, floor is yours
14:12:49 #topic minimal custom undercloud for containers dev/qa WIP
14:13:01 Custom undercloud layouts with dev branches and containers WIP (follows on Flavio's patch)
14:13:01 #link https://review.openstack.org/#/c/450792/
14:13:34 so the idea is to deploy only the component under dev, with as minimal things as possible
14:13:48 your ideas on implementation are welcome
14:14:16 #topic improved getthelogs (CI scripts) for CI logs parsing UX
14:14:19 bogdando: cool, sounds like you're asking for some reviews on quickstart patches
14:14:28 I reviewed that
14:14:43 I would prefer if the added functionality in that patch be moved to a new patch
14:14:48 well, yes, but not only reviews but ideas if this shall be done another way
14:15:02 that patch has been there for some time and we don't want to make it more complicated
14:15:11 and the 2nd item, I ask for reviews only :)
14:15:13 Getthelogs rework https://review.openstack.org/#/c/449552/ , an example log parsing session https://github.com/bogdando/fuel-log-parse/blob/master/README.md#examples-for-tripleo-ci-openstack-infra-logs
14:15:13 #link https://review.openstack.org/#/c/449552/
14:15:37 and you can try to use that for daily CI troubleshooting as well and give me feedback
14:15:48 that's it
14:15:51 go on :)
14:16:10 trown: I did that, it's a follow-up now
14:16:15 bogdando: yeah, it's hard to give feedback on the first thing since you sent the patch 30 min ago
14:16:22 I don't think anyone had time to look at it
14:16:25 bogdando: cool thanks!
14:16:49 chem: go ahead, seems like you need reviews on upgrade stuff
14:17:02 EmilienM: np. The patch is fresh, but I strongly believe the very idea of dev shortcomings is not
14:17:03 hi all
14:17:09 not sure we want to paste all the links here though but it would have been nice to create a Gerrit topic
14:17:31 yes those are pending review and backport needed for N->O upgrade
14:17:45 EmilienM: hum ... willing to learn for next time
14:17:56 chem: could you please create a Gerrit topic for all these patches
14:18:05 so we can track them more easily
14:18:07 EmilienM: ah ... ack
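For the Gerrit topic request above, a minimal illustration with git-review (the topic name below is made up for the example, not the one actually chosen):

    # Re-push an open change with an explicit Gerrit topic so the whole
    # upgrade series can be tracked with one query.
    git review -t upgrades-newton-to-ocata
    # The series is then visible at:
    #   https://review.openstack.org/#/q/topic:upgrades-newton-to-ocata+status:open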
14:18:10 (oops, s/shortcomings/shortcuts/g)
14:18:20 EmilienM: oki will do
14:18:30 chem: thanks.
14:18:34 EmilienM: will put it in the etherpad when done
14:18:41 chem: so for l3agents on https://review.openstack.org/#/c/445494/ was going to bring it up in bugs for https://bugs.launchpad.net/tripleo/+bug/1671504
14:18:41 Launchpad bug 1671504 in tripleo "l3 agent downtime can cause tenant VM outages during upgrade" [High,In progress] - Assigned to Marios Andreou (marios-b)
14:18:43 #action team to review chem's patches from https://etherpad.openstack.org/p/tripleo-meeting-items about upgrades
14:19:15 thanks
14:19:27 panda: you had something to say too? I noticed the last point
14:20:04 no
14:20:19 someone posted: FYI more doc on CI migration re: toci scripts https://review.openstack.org/#/c/450281/
14:20:40 but no name, so I don't know who wants to talk about it
14:20:49 ok moving on
14:20:51 #topic bugs
14:20:56 #link https://launchpad.net/tripleo/+milestone/pike-1
14:21:19 EmilienM: i have two things please https://bugs.launchpad.net/tripleo/+bug/1669714 comments 9/10 in particular... long story short, we were told to remove the openvswitch upgrade workaround. Now we need to add it back with an extra flag and those reviews are in progress, see comment #9. Q: can we use that bug (attempt to minimize confusion) even though it is in fix-released? Or do we have to file
14:21:19 Launchpad bug 1669714 in tripleo "Newton to Ocata - upgrade to ovs 2.5->2.6 with current workaround and lose connectivity" [High,Fix released] - Assigned to Marios Andreou (marios-b)
14:21:21 do we have outstanding bugs to discuss this week?
14:21:24 new one?
14:22:18 marios: re-open it and re-use it
14:22:27 marios: it will avoid confusion. It's the same topic anyway
14:22:28 EmilienM: (the reviews for this are in chem's list fyi). ack will do
14:22:35 EmilienM: exactly for this reason
14:22:53 EmilienM: this one was a request we are trying to get into stable/ocata for the ansible steps upgrade... i spent some time looking last week... https://bugs.launchpad.net/tripleo/+bug/1671504
14:22:53 Launchpad bug 1671504 in tripleo "l3 agent downtime can cause tenant VM outages during upgrade" [High,In progress] - Assigned to Marios Andreou (marios-b)
14:23:09 the review at https://review.openstack.org/#/c/445494/ does what it's meant to, by only killing one neutron-l3-agent at a time, but there is an ongoing/outstanding issue with neutron-openvswitch-agent (see comment #2 on the bug for more info). Q: can we land /#/c/445494/ so we can get it to stable/ocata, even though it won't work until the packaging bug is fixed?
14:23:42 shardy: grateful for input, even if you didn't have time to check it yet/doesn't have to be right now thanks
14:23:56 marios: what is the ETA for the packaging fix?
14:24:15 EmilienM: still tbd, i mean there is no reliable fix yet, at least not one i've been able to test
14:24:27 marios: If it's been proven to solve one part of the problem I'm +1 on landing it, will review
14:25:02 yeah I had a first review and I need more time to review again and vote.
14:25:21 EmilienM: shardy ack thanks, appreciate anyone's review time /me done on bugs
14:25:33 ok thanks marios
14:26:00 so about bugs, quick reminder: I'm going to move all bugs that are not In Progress from pike-1 to pike-2
14:26:07 except those which are critical
14:26:19 any comment ^?
14:26:59 sounds fine
14:27:08 #topic projects releases or stable backports
14:27:18 quick update on tripleo-validations: we now have stable/ocata
14:27:42 could someone from validations investigate this problem? https://review.openstack.org/#/c/450178/
14:27:56 the python jobs fail on stable/ocata
14:28:13 jrist: ^ can you look this week please (or find someone)
14:28:29 florianf is looking into it
14:28:33 EmilienM: in both cases the failure seems to be related to some package metadata file.
14:28:33 neat
14:28:57 florianf: ok. so you're on it?
14:29:06 EmilienM: it looks like a race condition, since different gates fail after rechecks. yep, I'm on it.
14:29:13 ok
14:29:15 florianf: are we just missing a dependency on pyparsing or something?
14:29:26 quick reminder about the Pike schedule
14:29:28 #link https://releases.openstack.org/pike/schedule.html
14:29:37 Pike-1 milestone is on Apr 10 - Apr 14
14:29:49 shardy: Good point, I'll check for that as well.
14:30:16 I'll propose a first tag on tripleo projects on April 13 most probably
14:30:30 florianf: Hmm, maybe not, I see what you mean as the py27 job sometimes works
14:30:52 any question about releases & stable branches?
14:31:21 #topic CI
14:31:26 shardy: there's an open github issue for setuptools that looks a bit like what we're seeing here. apparently other projects have started pinning down setuptools versions...
14:31:44 florianf: ack, thanks for looking into it :)
14:32:00 it would take 1 hour to describe all the issues we had in CI recently but I plan to write a post-mortem email when the saga is over
14:32:27 #action EmilienM to write an email with all issues we had in CI recently
14:32:48 the current status is that things should be much more stable and we should hopefully get a promotion today
14:33:08 we need https://review.openstack.org/#/c/450481/2 and https://review.openstack.org/#/c/450756/1
14:34:00 do we have any update about CI work?
14:34:10 Is CI for stable branches getting more stable as well, or is that separate?
14:34:22 we should have that now
14:34:25 dprince, jistr: did you want to mention the work around optimizing the container jobs?
14:34:27 hopefully
14:34:40 adarazs sent a CI squad weekly status email: http://lists.openstack.org/pipermail/openstack-dev/2017-March/114634.html
14:34:43 FYI.. panda put together a readme re: the changes to the toci scripts https://review.openstack.org/#/c/450281/ if anyone is finding it confusing
14:34:47 that seems like a high priority as we can't test upgrades without a much shorter walltime
14:35:18 if something is not clear, ping me and I'll add more information
14:35:24 right. dprince might know more but in case he's not around --
14:35:26 EmilienM: just mentioning an issue i saw before the meeting for docker-related dependencies in the upgrade job, see https://review.openstack.org/#/c/450607/
14:35:30 we need to speed up containers CI
14:35:34 ya
14:35:35 both the normal one
14:35:36 jpich: stable branches should work, as much as the whole CI does. :)
14:35:40 and the upgrades one
14:35:43 #info all bugs have been moved from launchpad/tripleo-quickstart to launchpad/tripleo with the quickstart tag
14:35:44 (which is WIP)
14:35:54 dprince is working on setup of a local docker registry
14:35:58 trown: I noticed. Thanks for that
14:36:02 either we could build images instead of downloading them from dockerhub
14:36:07 not sure how we'll run an upgrade + containers in 170min
14:36:08 which *might* be a bit faster
14:36:10 Ok, thanks!
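On the "build images instead of downloading them from dockerhub" option mentioned just above, a hedged sketch of what a local build could look like with kolla's image build tool (the image names and flags are illustrative assumptions, not the CI job configuration being discussed):

    # Hypothetical sketch: build a subset of kolla images locally instead of
    # pulling them from dockerhub, then push the results into the registry
    # the deployment already uses.
    kolla-build --base centos --type binary keystone nova neutron
    # followed by tagging/pushing the built images to the local registry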
14:36:12 So I had one question, the local registry is only for OVB, right?
14:36:15 but better seems a local dockerhub mirror
14:36:23 really we need to solve this for multinode, because those jobs are already much faster
14:36:27 right yes, the one that dprince is setting up is in the OVB cloud
14:36:34 shardy, not only
14:36:34 matbu, may have to chat about his tool to help w/ that
14:36:58 shardy, if it has a public ip, it could be used in multinode too
14:37:06 shardy: faster and also because we have all the scenario things that we want
14:37:12 sshnaidm: but then we're still downloading a ton of stuff over the internet
14:37:19 yea...
14:37:20 vs a local cache
14:37:25 shardy, no, it will be a local machine
14:37:26 depends where the bottleneck is i guess
14:37:41 shardy, just updated by cron from docker hub
14:38:05 sshnaidm: right but i think the OVB and the OS-infra clouds aren't collocated
14:38:11 sshnaidm: For multinode, I'm not clear how that works, unless we build all the images every CI run, or just download them as we already do
14:38:17 shardy, jistr oh, right
14:38:24 yeah the infra clouds could be one of many
14:38:44 so for this we'd probably need support of os-infra folks?
14:38:52 but we get a big performance improvement because the time taken to deploy the nodes via ironic etc is removed, and not considered as part of the timeout AFAIK
14:38:55 pabelanger: ^
14:39:03 jistr: yeah, I think we should start that discussion
14:39:17 I suspect it's something which would be really useful to kolla too?
14:39:19 do we want some AFS mirrors?
14:39:53 shardy: yea i think so re usefulness to Kolla
14:39:55 I wonder if infra already has available docker registries
14:40:00 and maybe other projects as well?
14:40:03 EmilienM: where should I be reading?
14:40:04 EmilienM: I think we want an infra-hosted local docker registry with a mirror of certain things on dockerhub
14:40:14 for all clouds used by infra
14:40:28 pabelanger: we're trying to speed up our container CI jobs
14:40:31 the dockerhub pull-through cache doesn't have to be (and probably shouldn't be) tripleo-specific
14:40:45 yes, that is something we'd like to do
14:40:45 pabelanger: the bottleneck (or one of them) is downloading a ton of stuff from dockerhub
14:40:48 pabelanger, do you have available docker registries in infra?
14:40:57 but we need to make the docker registry backend AFS
14:41:08 so all images are the same across all regions
14:41:43 pabelanger: do we have any timeline for that work at all yet?
14:42:04 pabelanger: basically we really need this, and some other optimizations, to enable major upgrade testing
14:42:06 no, except we want to do it
14:42:15 otherwise we'll just run out of walltime before the timeout
14:42:39 how much HDD are the containers taking?
14:43:00 what kolla is doing is just publishing to tarballs.o.o for now, then downloading and building the registry themselves atm
14:43:45 that could be a first step for us
14:44:00 pabelanger: just checked, about 10G
14:44:00 since we have the registry on the undercloud, right?
14:44:03 shardy, building the registry on the undercloud will help, won't it?
14:44:33 sshnaidm: we already have a registry on the undercloud
14:44:33 well, it means pulling 10 GB each time we deploy an undercloud ?
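A minimal sketch of the dockerhub pull-through cache idea discussed above, using the registry:2 image's built-in proxy mode (host, port and daemon configuration are assumptions for illustration, not the infra design under discussion):

    # Run a registry that transparently mirrors Docker Hub; repeated pulls in
    # CI would then hit the local cache instead of going over the internet.
    docker run -d --name dockerhub-mirror -p 5000:5000 \
      -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
      registry:2
    # Point the docker daemon at the mirror, e.g. in /etc/docker/daemon.json:
    #   { "registry-mirrors": ["http://localhost:5000"] }
    # and restart the daemon so the setting takes effect.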
14:44:36 sshnaidm: yes, but we still have to download the images from somewhere, or build them, because we build a new undercloud for every CI job
14:44:44 sshnaidm: yea what EmilienM said
14:45:21 sounds like the tarballs approach used by kolla may be worth looking into as a first step
14:45:27 I hope I'm wrong and we don't need to download 10 GB at each CI run
14:45:37 the only issue is, we have only so much storage on tarballs.o.o
14:45:38 :-x
14:45:51 that isn't your issue, but something openstack-infra needs to fix
14:45:56 I can deal with that
14:46:28 am I correct when i say that we need to pull 10 GB of data when deploying the undercloud and creating a local registry?
14:46:38 jistr: does that number consider the image layering?
14:46:41 note, docker save deduplicates base images from the resulting tarball, we could use that trick to move all of the images
14:46:42 it sounds like a lot
14:47:12 the 10G is my /var/lib/docker/devicemapper on the undercloud
14:47:16 if we have 50 images which overlap like 90%, the resulting saved artifact will be very small
14:47:21 i don't have any containers running there
14:47:24 so it's just the images
14:47:36 EmilienM: I think we need to look at the actual amount downloaded vs the apparent size of the images, but I know from testing this you have to download a lot of stuff
14:47:36 not sure if that's exactly the amount downloaded, probably not
14:47:44 to move around*
14:47:47 and it's really slow as a result (slower than a normal puppet deploy)
14:47:48 ok
14:47:53 can we take some #action here?
14:47:56 and move on
14:49:33 shardy, jistr: some actions would help to track what we said
14:50:11 #action container squad to investigate downloaded vs apparent image sizes
14:50:26 shardy: thanks.
14:50:28 #topic specs
14:50:32 #action container squad to continue discussion with -infra re TripleO and kolla requirements for local/cached registry
14:50:50 sounds like a good plan for the short term
14:51:03 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:51:41 quick reminder, we want Pike specs merged by the Pike-1 milestone otherwise they'll get postponed to the Queens cycle
14:52:06 a lot of them are ready for review, please take some time
14:52:24 is there anyone here who wants to discuss a spec?
14:54:44 ok, let's move on.
14:54:48 #topic open discussion
14:55:06 if you have a question or feedback about TripleO, or want to bring another topic, it's the right time
14:55:49 it sounds like we can close this meeting
14:55:57 thanks everyone
14:56:00 i just want to warn that the delay for the reviews in tripleo-quickstart is really huge
14:56:05 thanks!
14:56:09 thanks all!
14:56:11 matbu: why?
14:56:17 idk what we should do, but it's pretty boring
14:56:41 matbu: it's a good info.
14:56:52 matbu: do you think we spend enough time on reviewing oooq patches?
14:57:19 EmilienM: i think tripleo cores don't spend enough time and btw there are not enough cores
14:57:28 or oooq
14:57:33 matbu: there are not enough cores on what?
14:57:35 It's difficult because a lot of tripleo-cores aren't yet oooq experts
14:57:43 we need to improve that I guess
14:57:43 it would be interesting to compare different tripleo repos to see where the longest wait is
14:57:44 shardy: yep i know
14:57:54 d0ugal: +1
14:58:05 matbu, you can ping people, it works usually
14:58:19 most of the time when i do a commit in tht, even if it's WIP i get feedback pretty quick
14:58:19 we have 30 TripleO cores, I don't think it's fair to say we don't have enough cores
14:58:42 EmilienM: but how many of them feel like oooq cores - I don't :)
14:58:44 for oooq even if i add people, i have no such chance
14:59:13 EmilienM: how many of the 30 are experts in tripleo-quickstart though?
14:59:15 EmilienM: hehe yep for tripleo, i wasn't aware of 30, it looks really big :)
14:59:16 well, it's hard to push people to review oooq
14:59:21 sounds like a topic for the ML
14:59:23 shardy: not enough
14:59:26 matbu: I find adding people to reviews doesn't work - I guess because we all get so much gerrit email spam it is easily missed
14:59:27 maybe we could have a deep-dive
14:59:34 +1
14:59:36 matbu: https://review.openstack.org/#/admin/groups/190,members
14:59:48 I'm going to close the meeting
14:59:50 shardy: +100 :)
14:59:55 but sounds like matbu you could run the topic on the ML please
14:59:57 that could help.
15:00:06 #endmeeting
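Following up on the suggestion to compare different tripleo repos to see where the longest review wait is, a rough sketch against the Gerrit REST API (the repo list and age threshold are illustrative, not an agreed process):

    # Hypothetical sketch: count open changes per repo that haven't been
    # updated for two weeks or more, as a crude proxy for review wait time.
    for repo in tripleo-quickstart tripleo-heat-templates tripleo-common; do
      count=$(curl -s "https://review.openstack.org/changes/?q=project:openstack/${repo}+status:open+age:2w" \
              | sed 1d | python -c 'import json,sys; print(len(json.load(sys.stdin)))')
      echo "${repo}: ${count} open changes untouched for 2+ weeks"
    done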