15:00:48 #startmeeting third-party
15:00:49 Meeting started Wed Feb 4 15:00:48 2015 UTC and is due to finish in 60 minutes. The chair is krtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 The meeting name has been set to 'third_party'
15:00:58 Hello
15:01:02 o/
15:01:04 Hi
15:01:09 who is here for the Third Party CI Working Group meeting?
15:01:21 omrim
15:01:26 Hello, My name is Lenny, I work for Mellanox and will be working on our CI.
15:01:49 hi lennyb
15:01:53 hi omrim
15:02:06 krtaylor: Hello :)
15:02:26 moin moin
15:02:35 ok, well let's get started, we have a full agenda today
15:02:39 hi ja
15:02:51 #topic Third-party CI documentation
15:03:03 howdy krtaylor
15:03:19 so, the documentation sprint went well, but we still have a lot of work to do
15:03:45 omrim, you are going to help us get a FAQ going right?
15:04:19 krtaylor: Yes, I will be glad to get some great references
15:04:46 you can use the etherpad to gather ideas if you wish, or start a new one
15:04:49 #link https://etherpad.openstack.org/p/third-party-ci-documentation
15:05:05 there is the link, for everyone else
15:05:26 will revisit there and see what is missing
15:05:38 ja, I think you have some interesting points in the Meta-comments
15:05:46 those could be turned into patches
15:06:08 rfolco, yes please, that would be great
15:06:13 and would be great too
15:06:54 we have done a pretty good job on third-party.rst
15:07:13 but running-your-own.rst is untouched
15:07:20 it needs the most work
15:07:28 are there any upcoming milestones that make it harder/easier to do patches at a given time? I'm unsure how/if the overall release dates affect documentation patches.
15:07:34 we can divide that up to make progress
15:08:12 ja, good question, but we are still in the middle of kilo
15:08:28 we should be good for several more weeks
15:09:03 and since docs don't hurt anything else, it should be fairly independent of release schedules
15:09:36 so, any volunteers for running-your-own?
15:09:41 krtaylor, that's the point of the question... interaction between doc and release dates. e.g. in w3c there is a "quiet period" at certain points where nothing new gets published.
15:10:05 ... little/no interaction was the hoped-for state
15:10:10 ja, oh, yeah I remember that, no nothing like that here
15:10:38 especially for what we are doing to help external testing
15:11:03 krtaylor, is the same true of the puppet splits or will those "have" to quiesce "during" the release?
15:11:43 lennyb, getting your perspective on the documentation would be really good, and we can help you get started in patch writing
15:12:05 ja, not sure I understand, do you mean for adopting the module split-out?
15:12:07 ... if the CI work is closer to ... well, CI process... that's ideal for me at least.
15:12:32 ok,
15:12:49 krtaylor, since puppet is "code" not "doc" it might have to play by different rules - that was the point of that question
15:13:15 ja, good stuff, but let's focus on docs for the moment, can we discuss that at the end in open discussion?
15:13:26 sure
15:14:13 ok, so how do we get the running-your-own to move forward?
15:14:30 do we need to assign chunks? any volunteers?
15:14:39 I'll be in, obviously
15:15:16 seems to be huge
15:15:27 yes, and needs the most work
15:15:52 So, I'll take the requirements section
15:16:02 everyone, grab a section in the etherpad
15:16:21 do we have consensus on what purposes it is intended to serve? e.g. purely reference, closer to tutorial
15:16:59 ...I wouldn't really know what to put into a patch until I understand where the wg wants it to land.
15:17:50 ja, I think refreshing what is there is the first goal, but including links to other parts of the existing documentation is a really good thing
15:18:06 so, reuse the infra manual as much as possible
15:18:39 I guess for running-your-own it's just an update. You need a balance between detailed/superficial
15:18:51 we, as a work group, own keeping this document up to date, so minimizing revisions by referencing other infra docs is a priority
15:18:53 rfolco, yes
15:20:06 ok, well, I encourage everyone to read that doc and think about a section to re-write, it's a great way to get started in the community
15:20:22 #link http://ci.openstack.org/running-your-own.html
15:20:28 just for completeness
15:21:09 also remember to please set topic to 'third-party-ci-documentation'
15:21:27 on any patches, makes it easier to track as a group
15:21:51 ok, any questions on documentation, else we'll move on
15:21:52 If I see its current form as a reference, and I think a tutorial is what newbies really need, is the net that I should just write it all up as a patch first and then see if reviewers like it?
15:22:19 ja, smaller is better, just start with a section
15:22:28 easier to review and merge
15:22:37 krtaylor: Great doc thanks
15:23:15 omrim, thank you for your FAQ patch!
15:23:32 ok, let's move on, we have a full agenda today
15:23:48 #topic Splitting out puppet modules
15:24:01 I left this on the agenda, but just to summarize
15:24:30 not sure if we have asselin yet, it's early for him
15:24:34 it was a tremendous success
15:24:39 and a lot of fun
15:25:15 +1
15:25:19 thanks mmedvede for a big part, how many patches?
15:25:33 I think I did 14 during the sprint
15:25:40 modules
15:25:46 ~3 patches per module
15:25:57 wow, nice
15:26:18 yeah, I did a search to see and stopped counting after a page
15:26:49 any thoughts about what went well in the sprint, what didn't?
15:27:21 The result is that now puppet modules are in their own projects. And we are encouraged to make those modules more consumable by 3rd parties
15:27:49 krtaylor: the ordering was an issue
15:28:10 I think pleia2 is going to summarize the virtual sprints in general via email, more teams should consider them
15:28:16 mmedvede, agreed
15:28:21 a lot of merge conflicts. We should have figured out a way to avoid this. Other than that, it was efficient
15:28:33 rebase, rebase, rebase
15:29:00 it was amazing, good infra core participation was critical
15:29:07 Infra team helped a lot
15:29:23 ok, let's move on
15:29:43 #topic Spec for in-tree 3rd party ci solution
15:30:14 please review this spec, it is a really good direction and a place where we can get involved
15:30:32 #link https://review.openstack.org/#/c/139745/
15:31:00 I know asselin would appreciate any feedback or ideas
15:32:07 and finally
15:32:15 #topic Highlighting Third-Party CI Service
15:32:33 this was an idea we had last year, that I wanted to start doing again
15:33:02 one of the goals of this work group is to help each other out, and share how we solved the hard problems
15:33:12 so, to kick this off again
15:33:26 I figured that I would volunteer my team
15:33:35 and specifically rfolco
15:33:43 to come and share what we are doing
15:33:58 and how we solved some problems in our environment
15:34:33 so, rfolco it is all yours
15:35:02 sure I'll just summarize a few topics due to the limitation of this format and time
15:35:12 and make breaks for questions
15:35:35 We'll base the discussions on the article https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/building_your_openstack_3rd_party_ci_system1?lang=en.
15:35:48 For this meeting we're going to focus on the problems we solved and also on the improvements we've made so far.
15:36:06 #link https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/building_your_openstack_3rd_party_ci_system1?lang=en
15:36:50 so how did we keep our CI from breaking every time OpenStack Infra changed their code?
15:37:01 Success rate of builds in Jenkins increased from ~60% to ~97% after implementing two separate environments: production and development. Before that the CI system was very sensitive and broke more often.
15:37:02 A typical CI system needs custom code and configuration overrides on top of OpenStack Infra code (system-config and project-config). Pinning these projects to a code level on the production environment increased stability of our CI jobs significantly.
15:37:22 --pause for questions--
15:37:55 rfolco, how much of that breakage do you think would be addressed by the puppet work?
15:38:40 I don't have a good estimate number, but I think about 30% more or less
15:38:45 so, I guess we could also say that we are using the upstream infra ci ported to our environment, our goal was to follow them as closely as possible, but if we followed too close, we were not stable
15:38:57 ...and thinking out loud, does it suggest that *infra* needs a CI process to avoid breaking 3rd parties
15:39:24 30% is pretty substantial
15:39:25 ja, no, not their concern, it is up to us to address
15:39:32 the problem ja, is that we had in the beginning some code overrides, not only configuration
15:40:02 our fault on designing a stable CI that does not have workarounds or hacks in the code
15:40:12 the right way to do it is to override config
15:40:30 so now we pin the code to a stable code level that we know works
15:40:44 +1 on overriding config as the right way
15:40:47 and do experiments / tests on a separate environment (dev)
15:40:55 ja, puppet split work should reduce the amount of custom puppet code we have, this is a big advantage
15:41:04 ++
15:41:48 ok so moving on
15:41:51 hi rfolco
15:41:57 Production services Jenkins, Nodepool and Zuul run on VMs in an x86 cloud. Each of these services has a "clone" for the development environment. This enables better control of code levels in each service. Another advantage is the ability to create snapshots of the services.
15:42:27 feel free to ask any questions wznoinsk, this is more fun being interactive :)
15:42:43 in what situations does config from upstream have to be updated in 3rd party ci? I'm not using Zuul/Puppet at this very moment, would there be something I'd need from upstream infra projects in this case?
15:43:28 Well all depends on your needs
15:43:55 if you wanna report back to community, yes, you need to override zuul yaml config with your config
15:44:09 * krtaylor notes a good documentation topic
15:44:36 this is just an example, the article details better the most common overrides you have to make, and what are their purposes
15:45:01 in our case, we started testing Nova project
15:45:27 so if I'm only using Jenkins + lxc containers in my case via docker (a basic 3rd party CI setup) and I'm always fetching master of each of openstack projects + the change proposed I can leave without getting infra configs for now?
15:45:30 so we went to zuul yaml, and override upstream yaml with our custom yaml just to test Nova
15:46:05 s/leave/live
15:46:58 the key point is: do you need to trigger your test for every patch? Another important thing: if you are stable running upstream code, that's awesome
15:47:27 but in our case we decided to not work with latest upstream code from Infra
15:47:53 It's ok just to use Jenkins
15:48:10 in case you don't need to report back and you define when your tests will run
15:48:30 You can automate your builds this way, nothing wrong...
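
Editor's note: the point above about overriding configuration rather than code can be pictured as layering a small local file on top of the upstream settings. The sketch below is only an illustration under stated assumptions: `apply_overrides`, the `projects` and `reporting` keys, and the project list are hypothetical and are not real Zuul layout keys or the team's actual files.

```python
import copy


def apply_overrides(upstream, overrides):
    """Return a new config with local overrides layered over upstream values.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the upstream value outright. Neither input is modified.
    """
    merged = copy.deepcopy(upstream)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical upstream-style settings (illustrative keys only).
upstream = {
    "projects": ["openstack/nova", "openstack/neutron", "openstack/cinder"],
    "reporting": {"success": True, "failure": True},
}

# Hypothetical local override: test only Nova, leave reporting untouched.
local = {"projects": ["openstack/nova"]}

config = apply_overrides(upstream, local)
```

Keeping the local changes in a small override like this, instead of editing upstream code, is what lets the pinned upstream level be moved forward without re-applying patches by hand.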
15:48:43 with efficiency of lxc container that's the plan to run on every patchset, also 'stable running upstream code' is a negation of itself ;-)
15:48:59 :)
15:49:18 wznoinsk, it also depends on what and how many tests you have to run, and how big your system needs to scale
15:49:52 I'm still able to comment back on the build, potentially listing tempest/other tests and their success separately in a single comment (without using Zuul), I'm trying to understand whether I should go the Zuul-way already ...
15:49:53 zuul listens to gerrit for patches and queues jobs for you
15:50:12 and reports back
15:50:37 so you define in layout.yaml what to listen to, what jobs to trigger and how to report back
15:51:06 wznoinsk, I'll be happy to answer in more detail about this topic...
15:51:32 I know it's beneficial to have Zuul when you have multiple proposed code changes that may depend on each other (two independent jenkins jobs wouldn't catch the reliance) but I don't think using Zuul is a must for 3rd party CI to have, is it?
15:52:41 sure rfolco, I'll catch you after this meeting
15:52:46 wznoinsk, I would really like to hear about your environment, can I schedule you for an upcoming week to tell us about it?
15:53:14 My impression is that the choice of implementation components on your side of the firewall is yours. In that sense, the ssh mechanism Zuul uses and so on is "interface" not "implementation"
15:53:15 krtaylor: yes, with pleasure, thanks
15:53:16 as I said you can run your CI without it, but it's easier with zuul I think
15:53:53 ...of course, as with any community, using what others use makes it easier to find help and give it
15:54:05 ja I'd agree, but without Zuul 3rd party CI tests could be missing some of the code breakages that may happen when two/more code changes can break each other
15:54:21 rfolco, we had several areas where we made modifications, right?
15:54:31 yes,
15:54:43 The customizations required to run a 3rd-party CI include changes in: layout.yaml (Zuul); nodepool.yaml (Nodepool); projects.yaml & devstack-gate.yaml (Jenkins Job Builder).
15:54:43 To override code and configurations two internal Git repositories have been created: (1) puppet-config, which overrides code and configuration; and (2) ibm-devstack-gate, which contains additional customizations for devstack-gate jobs such as regex file (skip list for Tempest), pre_test_hook.sh and Swift upload log script.
15:55:12 Both internal repositories (puppet-config and ibm-devstack-gate) work in different branches: dev (development) and master (production). To turn reporting off on production one needs to checkout puppet-config master branch and comment out the lines "success" and "failure" in layout.yaml. Changing base image or memory requirements for slaves in Development Nodepool, one would change nodepool.yaml in the repository but push changes to dev branch instead. The same idea applies to JJB configuration files. It's also possible to checkout ibm-devstack-gate and modify regex file and include a new test to the skip list for Tempest runs.
15:56:17 some of the customizations and where we spent more time on our CI:
15:56:20 (1) Build the cloud infrastructure for the services and compute nodes (2) Install and configure services using Puppet (3) Build CirrOS for Power with mainline kernel (4) Skip Tempest failures (5) Resolve devstack-gate problems on Power platform by overriding config (6) Build custom MySQL to resolve issues on Power (7) Map and debug Tempest failures (8) Investigate concurrency problems for Tempest (9) Cleanup database for expired keystone entries and deleted instances (our current problem this week) (10) Upload script to Softlayer Swift.
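
Editor's note: the regex skip list mentioned above (a file of patterns naming Tempest tests that should not run) could work roughly as sketched below. This is a minimal sketch, assuming one regular expression per line with `#` comments; the actual file format used in ibm-devstack-gate is not shown in this log, and the helper names, patterns, and test names are hypothetical.

```python
import re


def load_skip_patterns(lines):
    """Compile one regex per non-empty, non-comment line of a skip list."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        patterns.append(re.compile(line))
    return patterns


def select_tests(test_names, skip_patterns):
    """Keep only the tests that match none of the skip patterns."""
    return [
        name
        for name in test_names
        if not any(pattern.search(name) for pattern in skip_patterns)
    ]


# Hypothetical skip list contents (assumed format, illustrative patterns).
skip_file = [
    "# tests known to fail on this platform",
    r"tempest\.api\.volume\..*",
    r".*test_resize_server.*",
]

# Hypothetical test identifiers, for illustration only.
tests = [
    "tempest.api.compute.servers.test_create_server",
    "tempest.api.compute.servers.test_resize_server_confirm",
    "tempest.api.volume.test_volumes_get",
]

selected = select_tests(tests, load_skip_patterns(skip_file))
```

Adding a newly failing test to the skip list then amounts to appending one pattern line on the dev branch, checking the result there, and only later carrying it to master.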
15:57:32 I'm open for anybody to ping me after the meeting and detail better any of these
15:57:50 back to you Kurt since time is over :)
15:57:53 rfolco: thanks for writing up the article. Very informative.
15:58:07 the blog illustrates this pretty well
15:58:27 rfolco: thanks for getting it all together, it's a compact version of what you should know about 3rd party ci, learned a few new things from it as well
15:58:40 that's great to hear
15:58:54 well, we are still learning how to do this, but rfolco, this was a really good kickoff
15:59:01 thanks guys.. wznoinsk if you wanna discuss zuul role better let me know
15:59:24 I hope to get every CI team to come and share how they fixed problems for their environment
15:59:43 we are out of time, thanks everyone
15:59:55 thx
15:59:55 really good meeting, see you next time
16:00:34 #endmeeting