15:02:04 <lennyb> #startmeeting third-party
15:02:05 <openstack> Meeting started Mon Mar 20 15:02:04 2017 UTC and is due to finish in 60 minutes. The chair is lennyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:08 <openstack> The meeting name has been set to 'third_party'
15:02:13 <lennyb> Hello
15:03:32 <mmedvede> o/
15:03:41 <lennyb> mmedvede, hi,
15:03:59 <lennyb> do you run fedora 24 in your CI?
15:04:05 <asselin> o/
15:04:41 <lennyb> hello asselin
15:04:50 <mmedvede> lennyb: no, we only run Ubuntu at the moment.
15:04:53 <pots> o/
15:04:54 <lennyb> I am getting strange error http://paste.openstack.org/show/603440/
15:04:58 <lennyb> hello pots
15:05:15 <lennyb> mmedvede what verion of ubuntu?
15:05:43 <mmedvede> lennyb: our dsvm tests use Xenial
15:06:55 <lennyb> mmedvede, I see.
15:07:37 <lennyb> asselin, btw, your suggestion regarding zuul configuration for using one time node is not working for us, since we use multijob plugin.
15:08:02 <lennyb> post, asselin, mmedvede - anything you would like to discuss today?
15:08:04 <asselin> what is the multijob plugin? Is that jenkins?
15:08:37 <lennyb> asselin, yes jenkins multijob plugin https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin
15:08:55 <lennyb> one job to run a list of other jhobs
15:09:27 <lennyb> it is also supported by JJB
15:10:40 * asselin_ having network issues
15:11:09 <lennyb> I also face something similiar to #link https://bugs.launchpad.net/zuul/+bug/1270029 . with zuul 2.5.1. I am trying to check the proposed workaround
15:11:09 <openstack> Launchpad bug 1270029 in Zuul "zuul doesn't connect to gearman server" [Undecided,New]
15:11:32 <asselin_> lennyb: I see...I don't think zuul and multijob plugin will play nice together.
15:12:18 <mptacekx> lennyb: I think I saw this as well
15:12:31 <lennyb> asselin_, is there a way to make jobs dependable in zuul?
15:13:01 <lennyb> like Build JOB -> Install JOB -> Test Job
15:13:29 <asselin_> lennyb: I don't think so...maybe zuul v3 has that.
15:13:51 <asselin_> lennyb: otherwise zuul does have post-build jobs ability, but not quite like you want
15:13:57 <mmedvede> zuul v3 (maybe even 2.5) should have that
15:13:59 <lennyb> mptacekx, what are you doing to overcome it? mmedvede uses zuul restart one a week.
15:14:38 <lennyb> mmedvede, I will check 2.5, thanks
15:15:10 <asselin_> lennyb: but you should be able to structure the individual parts of the job in JJB, then combine them into one high level job...but you do lose the parallel part that I suppose you're looking for.
15:15:23 <mptacekx> lennyb: I saw it just once in couple of months with zuul 2.5.1, had to restart zuul to wake him up. Not a real cure
15:16:53 <lennyb> asselin_, I use multijob plugin for this.
15:17:41 <lennyb> I can also run some jobs in parallel, there.
15:18:03 <lennyb> Is there anything else you would like to share/ask/discuss today?
15:18:09 <asselin_> lennyb: so remind me what the issue is again? Using multijob plugin with nodepool?
15:18:59 <lennyb> asselin_, I must add quite-period of 1min to a job, since it take s time for a nodepool to remove a slave from the jenkins.
15:19:41 <lennyb> and since all the job takes about 10min, 3 minutes of quite-period is 30% of the run time :)
15:20:13 <lennyb> but it's not that critical. so let's move to another things, if there are
15:20:15 <asselin_> lennyb: ok I see...yeah, might need to use that workaround since the tooling isn't designed to work as you want.
15:20:32 <pots> lennyb: i tried the quiet-period change but it had no effect--is there any way to fix this in zuul?
15:20:34 <asselin_> lennyb: ok. btw in zuulv3 it will switch to ansible and you'll have better support for the use case
15:21:09 <lennyb> pots, did you try asselin_ 's suggestion from the last week>
15:21:32 <pots> i don't recall what it was?
15:22:38 * lennyb looking
15:22:58 <lennyb> #link http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html
15:24:24 <asselin_> this here? http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html#t2017-03-13T15:04:54
15:24:24 <pots> oops, i did not see that.
15:24:50 <lennyb> anything else on this issue?
15:25:42 <lennyb> any other issues?
15:25:58 <pots> i've already got the parameter-function: single_use_node entry
15:26:16 <lennyb> pots, what is not working in your setup?
15:26:48 <asselin_> pots: are you using it in zuul's layout.yaml?
15:27:25 <pots> i'm using openstackci-puppet on top of devstack, with two jobs defined (one for fc, one iscsi). whenever a job completes, it looks like zuul starts the next job using the old nodepool vm, which fails. then nodepool comes around and kills the old vm a minute later.
15:27:44 <pots> so basically every other job fails
15:27:53 <asselin_> pots: can you share your zuul layout.yaml file?
15:28:16 <lennyb> pots, do you have jenkins?
15:28:20 <asselin_> does it look like what I shared before? http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/%23openstack-meeting.2017-03-13.log.html#t2017-03-13T15:05:19
15:28:32 <asselin_> - name: ^dsvm-tempest.*$ parameter-function: single_use_node
15:29:33 <pots> yes, it looks like that.
15:29:41 <pots> sorry, can't seem to cut & paste today
15:29:49 <asselin_> btw I need to leave in 2 mins. pots if you still have issues we can discuss in openstack-infra. Others there might be able to help.
15:30:04 <asselin_> discuss later*
15:30:04 <lennyb> thanks, asselin_
15:30:10 <pots> ok, i will see you there. thanks!
15:30:28 * asselin_ leaves now
15:30:37 <lennyb> pots, do you use jenkins?
15:30:59 <pots> yes, i'm following the recipe for openstackci-puppet. same issue with jenkins 2.x and 1.651 and 1.656
15:32:00 <lennyb> pots, do you see 'quite period' in jenkins job?
15:32:32 <pots> yes
15:32:35 <clarkb> newer jenkins filters out injected vars by default. Zuul coordinates single use VM action with jenkins via a job parameter iirc. You may need to whitelist that var
15:33:21 <lennyb> pots, so quite period should work anyway, maybe increase the delay.
15:33:22 <pots> i got the impression that quiet-period only affected the normal jenkins repo triggers and would not necessarily impact gearman
15:33:49 <pots> but i will try a much larger value and see if it has any effect.
15:34:29 <lennyb> pots, in our case Jenkins waits before starting job, in the meantime nodepool have enough time to delete slave from the jenkins
15:36:42 <pots> i was wondering if there might be some way to modify zuul/gearman behavior. e.g. doesn't gearman choose a specific executor before giving the job to jenkins?
15:37:16 <lennyb> pots, try asking in infra channel
15:38:30 <pots> will do. one last question, i was wondering what the conventional wisdom was re: setting up a new cloud for the CI to run on, something closer to a devstack that can survive a reboot?
15:39:03 <lennyb> pots, we have CI on RDO based cloud
15:39:13 <clarkb> pots: lennyb ^ seem my earlier comment
15:39:49 <pots> clarkb thanks, i resolved that issue earlier.
15:39:50 <lennyb> pots, but we are considering to check Fuel as cloud provider for CI
15:39:59 <mmedvede> pots: we have a full blown cloud that we maintain (not based on devstack)
15:40:16 <clarkb> pots: did you resolve it by whitelisting vars? and if so did you whitelist the var that tells gearman to not reuse a host?
15:40:37 <clarkb> its something like OFFLINE_NODE or something
15:41:03 <mmedvede> clarkb: I believe the way most third-party CI's do is to disable the security feature of jenkins, so no need to whitelist variables. But pots' setup might be different
15:41:13 <pots> clarkb: I thought I whitelisted everything
15:42:31 <pots> clarkb: asselin_ showed me an additional parameter for jenkins that works.
15:44:50 <pots> i don't see any OFFLINE_XXX parameter in the jenkins jobs
15:46:56 <lennyb> pots, asselin_ proposed zuul based param(I dont think it will be reflected in Jenkins). I proposed quite-period that can be seen in jenkins.
15:47:19 <clarkb> params['OFFLINE_NODE_WHEN_COMPLETE'] = '1' is what single_user_node sets in the gearman communication from zuul to jenkins
15:47:47 <pots> so that should appear in the job parameters in the jenkins UI?
15:49:18 <clarkb> it may, jenkins is weird about what params it shows iirc
15:51:11 <pots> it seems like a race condition, i think when i looked at the logs you could see the wrong thing happening in the same second.
15:51:36 <pots> my systems are very slow, so that just makes everything more fun.
15:52:26 <pots> lennyb so would you recommend chucking xenial/devstack for RHEL7/packstack for an all-in-one to run the CI on?
15:52:36 <clarkb> the actual implementation of offline node when complete should toggle the offline bit when job is running then jenkins puts it into effect when the job is completed (so there could be a race there somewher)
15:53:58 <pots> any way to debug the gearman stuff?
15:54:02 <lennyb> pots, I am not familiar with xenial,packstack. we use rdo #link https://www.rdoproject.org . It's RedHat oriented, but not really packstack
15:55:09 <pots> i'm just looking for an All-In-One solution that works out of the box
15:56:29 <lennyb> pots, rdo works out of the box. to be honest, we had to add networks manually, but it was quite straight forward. I had a lot of positive things regarding Fuel, but did not use it yet
15:56:50 <pots> i will look into that, thanks.
15:57:19 <lennyb> any other shares/questions/proposals ? before our time ends?
15:57:24 <pots> i tried openstack-ansible but it did not pass it's own tests
15:57:51 <pots> lennyb: not for me, thanks all
15:58:38 <lennyb> Great, mmedvede, clarkb, pots, mptacekx, asselin_ .. thanks. see you next week
15:58:41 <clarkb> pots: yes one thing you could do to help debug it is while a job is running use the nodepool hold command to hold that node (it won't be deleted) then check that the node is marked offline properly when the job completes
15:59:34 <lennyb> clarkb, can I 'release' node without deleting it after hold?
16:00:33 <clarkb> release it from what perspective? you can delete it in jenkins so jenkins will stop doing things to it
16:00:43 <clarkb> but if you delete it in nodepool it iwll get deleted from the cloud
16:01:47 <lennyb> clarkb, ignore my question. thanks.
16:01:54 <lennyb> #endmeeting
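
The single_use_node parameter function discussed in the log (15:28:32 and 15:47:19) can be sketched roughly as below. This is a minimal illustration, not the project's actual file: the (item, job, params) signature is assumed from the Zuul v2 parameter-function convention, and build_params is a hypothetical helper that only mimics how a layout.yaml job-name regex selects the jobs the function is applied to.

```python
import re


def single_use_node(item, job, params):
    """Mark the job so its node is taken offline when the job completes.

    The parameter name and value are the ones clarkb quotes in the log;
    the signature is an assumption based on Zuul v2 conventions.
    """
    params['OFFLINE_NODE_WHEN_COMPLETE'] = '1'


def build_params(job_name, pattern=r'^dsvm-tempest.*$'):
    """Hypothetical helper: apply single_use_node only to matching jobs.

    The default pattern is the job-name regex asselin_ pasted in the log.
    """
    params = {}
    if re.match(pattern, job_name):
        # item is unused by this sketch, so None stands in for it.
        single_use_node(item=None, job=job_name, params=params)
    return params


if __name__ == '__main__':
    # Illustrative job names only; they are not taken from the log.
    print(build_params('dsvm-tempest-iscsi'))
    print(build_params('gate-noop'))
```

If the parameter never reaches the Jenkins job (for example because newer Jenkins filters injected variables, as clarkb notes), the node is not marked offline and the next job may land on the stale VM, which matches the symptom pots describes.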