15:00:03 <anteaya> #startmeeting third-party
15:00:04 <openstack> Meeting started Mon Mar 7 15:00:03 2016 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:08 <openstack> The meeting name has been set to 'third_party'
15:00:11 <anteaya> hello
15:00:35 <mmedvede> hi anteaya
15:00:43 <anteaya> mmedvede: how are you today?
15:01:29 <mmedvede> anteaya: good
15:01:34 <anteaya> glad to hear it
15:01:59 <anteaya> so I'll post a reminder to folks that openstack meetings use utc time on purpose
15:02:15 <anteaya> so if time changes for you next weekend, refer to utc time for meetings
15:02:18 <asselin> o/
15:02:22 <anteaya> hey asselin
15:02:31 <anteaya> do we have anything to discuss today?
15:03:02 <anteaya> that was a question for either of you
15:03:05 <anteaya> not just asselin
15:03:13 <mmedvede> there is a thread on third-party-announce with a CI reporting merges
15:03:19 <anteaya> yes
15:03:30 <mmedvede> just wanted to note that it could be valid sometimes to report a failure to merge
15:03:32 <anteaya> the ci operator posted a reply today
15:03:41 <anteaya> mmedvede: can you share the usecase?
15:03:57 <anteaya> in this instance it was indicative of a problem on their end
15:04:01 <mmedvede> e.g. if infra jenkins did not fail to merge, but third-party CI did
15:04:14 <anteaya> why is that important for a dev to know
15:04:24 <anteaya> what action should the dev take in that case?
15:04:24 <mmedvede> if third-party CI does not report anything in that case, then it would not be clear what happened
15:04:42 <anteaya> let's look at it from the dev's point of view
15:04:48 <anteaya> what should they do in that case?
15:05:07 <asselin> mmedvede, that's why the suggestion is to e-mail the operator.
15:05:13 <mmedvede> in that case they would know that the merge failed, and could recheck accordingly
15:05:30 <anteaya> why would they recheck if jenkins was able to merge the patch?
15:05:32 <asselin> I don't see any reason it should fail merge differently by 3rd party ci and infra
15:06:15 <mmedvede> asselin: it could be different, because there is often a time difference between when the infra zuul merger runs and when the third-party CI does
15:06:37 <anteaya> but in the case of a difference, what is the dev to do?
15:06:46 <anteaya> if jenkins merges?
15:07:15 <mmedvede> anteaya: the dev would know that the third-party CI failed. There could be a case where they expect a +1 from the third-party CI
15:07:23 <anteaya> infra isn't going to counsel people to rebase based on what a third party ci reports
15:07:58 <asselin> I think the issue is that most 3rd party ci merge failures are not legitimate
15:08:13 <mmedvede> well, if TPCI fails, it would also indicate that infra zuul would most likely fail on the next recheck or during gate
15:08:23 <asselin> they are caused by e.g. network failures
15:08:40 <mmedvede> asselin: +1 on most being not legitimate.
15:08:59 <mmedvede> I am trying to make a case that in rare cases it could be useful.
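A rough sketch of the check mmedvede is arguing for: only treat a local merge failure as meaningful if the upstream CI did not also fail to merge the same change. This is illustrative only; the Gerrit URL, the upstream account name, and the merge-failure message text matched below are assumptions, not anything prescribed by infra.

```python
# Hypothetical pre-check a third-party CI could run before posting a
# merge-failure comment: did the upstream CI already report a merge failure
# on this change? If it did, the dev will already have been told to rebase.
import json
import requests

GERRIT = "https://review.openstack.org"
UPSTREAM_CI = "Jenkins"  # assumed display name of the upstream CI account
MERGE_FAIL_PHRASE = "unable to be automatically merged"  # assumed message text


def upstream_also_failed_merge(change_number):
    """Return True if the upstream CI left a merge-failure comment on the change."""
    resp = requests.get("%s/changes/%s/messages" % (GERRIT, change_number))
    resp.raise_for_status()
    # Gerrit prefixes JSON responses with ")]}'" to guard against XSSI.
    messages = json.loads(resp.text.split("\n", 1)[1])
    return any(
        msg.get("author", {}).get("name") == UPSTREAM_CI
        and MERGE_FAIL_PHRASE in msg.get("message", "")
        for msg in messages
    )
```

If this returns False while the third-party merger failed, the failure is more likely a local problem (network, stale mirror) than something a developer can act on, which is asselin's point about most merge failures not being legitimate.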
15:09:06 <anteaya> if there is something the dev should know, then as asselin says, the operator is emailed and can post a comment on a patch
15:09:16 <anteaya> informing the dev of the situation
15:09:18 <mmedvede> Not saying that it should be encouraged for third-party CI to report merge failures all the time
15:09:29 <anteaya> sure
15:09:42 <anteaya> but that is where human intervention is necessary
15:09:48 <anteaya> that the operator read their email
15:09:55 <anteaya> and provide a comment on a patch
15:10:06 <anteaya> as a human, not a third party ci system
15:10:11 <anteaya> that is most welcome
15:10:20 <anteaya> and highly appropriate
15:10:47 <anteaya> does that make sense?
15:11:04 <anteaya> and thanks asselin for taking the time to reply to that thread
15:11:59 <anteaya> did we want to say more on this topic?
15:12:08 <anteaya> mmedvede: thanks for bringing it up
15:12:21 <mmedvede> nothing more here, thanks for the discussion
15:12:36 <anteaya> thank you
15:12:46 <anteaya> so in this reply http://lists.openstack.org/pipermail/third-party-announce/2016-March/000295.html
15:13:02 <anteaya> the third party operator is missing the importance of receiving the email
15:13:11 <anteaya> would either of you like to reply to that post?
15:13:17 <anteaya> I can if you don't want to
15:13:31 <anteaya> was just thinking it would be nice to hear from other operators
15:13:57 <anteaya> right now their configuration has email going to a dummy account
15:13:58 <asselin> I read it but wasn't sure how to reply more than restating my original e-mail
15:14:10 <anteaya> asselin: okay fair enough, thank you
15:14:23 <anteaya> mmedvede: would you like to reply? or shall I?
15:14:52 <mmedvede> I am not sure what to reply there
15:14:59 <anteaya> okay that is fine
15:15:25 <anteaya> the fact they have email going to "to: third_party_ci at example.com" was what I was going to address
15:15:27 <mmedvede> I can just say that the config he is asking about is the one he should be using, I guess
15:15:31 <mmedvede> (with the email)
15:15:35 <anteaya> as I doubt that is their email address
15:15:58 <anteaya> I'll reply
15:16:03 <anteaya> thanks for taking a look
15:16:10 <mmedvede> didn't they use those emails as examples?
15:16:19 <anteaya> I don't think so
15:16:31 <anteaya> I think that is what they have in their config file
15:16:45 <anteaya> so that is what I want to address
15:17:13 <anteaya> what we just discussed: read the email from your system and reply as a human on patches where it makes sense for the developer
15:18:15 <anteaya> so thanks for having the discussion with me
15:18:19 <anteaya> so I know how to reply
15:18:21 <anteaya> :)
15:18:32 <anteaya> is there anything more to be said on this topic?
15:18:52 <mmedvede> nothing except I now fear our CI would start reporting merge failures too :)
15:19:01 <mmedvede> need to make sure it does not
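The configuration under discussion is the reporter section of a Zuul 2.x layout.yaml. A minimal sketch of the pattern anteaya and asselin describe, posting normal results to Gerrit while sending merge failures only to a mailbox the operator actually reads rather than a dummy address like third_party_ci at example.com, might look roughly like this; the pipeline name, votes, and addresses are hypothetical:

```yaml
# Illustrative Zuul 2.x layout.yaml pipeline; names and addresses are made up.
pipelines:
  - name: check
    manager: IndependentPipelineManager
    trigger:
      gerrit:
        - event: patchset-created
    success:
      gerrit:
        verified: 1
    failure:
      gerrit:
        verified: -1
    # Do not report merge failures back to Gerrit; email the CI operator
    # instead, at an address a human actually reads.
    merge-failure:
      smtp:
        to: ci-operator@your-company.example
        from: zuul@ci.your-company.example
        subject: 'Third-party CI could not merge a change'
```

The mail server settings themselves live in zuul.conf rather than in the layout, and whether the CI can apply Verified votes at all depends on its Gerrit account permissions.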
15:19:03 <anteaya> mmedvede: has it in the past?
15:19:06 <mmedvede> yes
15:19:07 <anteaya> mmedvede: ah thank you
15:19:22 <anteaya> well I did try to find them online
15:19:30 <anteaya> and they weren't around
15:19:34 <anteaya> and a dev complained
15:19:38 <asselin> mmedvede, I thought you were the one that originally suggested that :)
15:19:43 <anteaya> as they had 4 systems doing that
15:20:11 <anteaya> so four systems spamming patches was a little noisy
15:21:00 <mmedvede> asselin: could be me who suggested it, exactly because our CI was doing that
15:21:11 <mmedvede> but that was a while back
15:21:26 <anteaya> good solutions stand the test of time
15:21:55 <anteaya> so thank you for your solution mmedvede :)
15:22:12 <anteaya> do we have other topics we would like to discuss today?
15:22:32 <mmedvede> I do not want to take credit, possibly it was not me. My memory fails me on that
15:22:47 <anteaya> I just know it wasn't me
15:22:51 <mmedvede> hehe
15:22:53 <anteaya> thanks to whomever it was
15:23:05 <anteaya> and I'm fine if it was or was not mmedvede
15:23:15 <mmedvede> I have one topic - wondered if anyone else has the same problem - the zuul memory leak
15:23:26 <anteaya> mmedvede: yes infra has the same problem
15:24:02 <mmedvede> ok, I am thinking of implementing a workaround - periodic restart of the zuul service
15:24:19 <anteaya> mmedvede: hmmmmm, we are trying to collect more data on the memory leak
15:24:31 <anteaya> we aren't fans of restarting for memory leak issues
15:24:57 <anteaya> that is sort of our last stand when trying to find the real solution
15:24:59 <jeblair> i believe jhesketh has been looking into the problem
15:25:37 <mmedvede> restart is a last resort, but guaranteed to work :)
15:25:42 <anteaya> jeblair: what is the current theory of the cause?
15:25:51 <anteaya> or is there a current theory?
15:25:56 <anteaya> mmedvede: true
15:26:28 <mmedvede> last time for me it took zuul about 3 days to consume 95% of 8GB ram, and halt the system
15:26:40 <anteaya> mmedvede: :(
15:26:46 <anteaya> mmedvede: that makes us sad
15:27:15 <jeblair> i don't see an outstanding patch, so the best thing to do may be to continue discussion on the mailing list thread
15:27:19 <anteaya> mmedvede: so for you is the memory leak consuming memory faster than before?
15:27:33 <jeblair> it's possible jhesketh thought it was fixed and is unaware that it is not.
15:27:35 * anteaya looks for the mailing list thread
15:27:41 <mmedvede> anteaya: I did not see a pattern, sometimes it took a week/2 weeks
15:27:47 <mmedvede> last time was fast
15:27:49 <anteaya> jeblair: I don't think it is fixed for infra
15:27:55 <jeblair> anteaya: i agree
15:28:31 <anteaya> #link http://lists.openstack.org/pipermail/openstack-infra/2016-February/003722.html
15:28:37 <jeblair> anteaya: while i think we restarted after jhesketh's most recent fix, i am not positive. that should be verified before we solidify that conclusion
15:28:44 <anteaya> mmedvede: so this is the start of the zuul memory leak thread
15:28:52 <anteaya> mmedvede: please add your experience
15:29:18 <mmedvede> anteaya: ok, thanks for the link
15:29:20 <anteaya> jeblair: I think the rename zuul restart incorporated jhesketh's latest change
15:29:29 <anteaya> but again yes, confirmation would be good here
15:30:08 <mmedvede> I also need to update to latest zuul. But I did not see any patches landed that made me think there was another fix attempt
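For operators who, like mmedvede, decide to live with the leak until a real fix lands, the restart workaround can be automated. Below is a minimal watchdog sketch that restarts the zuul service once its resident memory crosses a threshold; the process name, service name, and 80% threshold are assumptions about a typical third-party CI host, not infra guidance, and anteaya's caveat stands: restarts only mask the underlying bug.

```python
#!/usr/bin/env python
# Hypothetical watchdog: restart zuul-server before the leak exhausts host
# memory. Process/service names and the threshold are assumptions; run it
# periodically from cron as root.
import subprocess

import psutil

MEMORY_THRESHOLD_PCT = 80.0   # restart once zuul uses this much of total RAM
SERVICE_NAME = "zuul-server"  # assumed init/systemd service name


def zuul_memory_percent():
    """Sum resident memory of all zuul-server processes, as % of total RAM."""
    total = 0.0
    for proc in psutil.process_iter():
        try:
            cmdline = " ".join(proc.cmdline())
            if "zuul-server" in cmdline:
                total += proc.memory_percent()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return total


if __name__ == "__main__":
    if zuul_memory_percent() > MEMORY_THRESHOLD_PCT:
        # Restart via the service manager; any queued changes are lost unless
        # they are saved and re-enqueued separately.
        subprocess.check_call(["service", SERVICE_NAME, "restart"])
```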
15:30:09 <anteaya> mmedvede: and thank you for sharing your experience, hopefully with more folks running zuuls we can close in on a solution
15:30:56 <anteaya> #link http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:31:35 <anteaya> that merged Feb. 3rd
15:32:05 <anteaya> last rename was Feb. 12th: https://etherpad.openstack.org/p/repo-renaming-2016-02-12
15:32:14 <mmedvede> yes, that is the last one I remember. Our zuul uses that patch
15:32:31 <anteaya> jeblair: based on those dates I'm going with zuul is running jhesketh's latest memory leak solution attempt
15:32:37 <anteaya> mmedvede: okay thanks
15:33:02 <anteaya> mmedvede: and you still have zuul using up memory within 3 days and ceasing to run
15:33:04 <anteaya> yes?
15:33:36 <mmedvede> anteaya: no, it is not always the same. The last one was about 3 days. I assume it depends on patch volume
15:33:44 <anteaya> right
15:34:05 <anteaya> but I'm confirming that the last time you had an issue your zuul was running jhesketh's patch
15:34:23 <anteaya> is that accurate?
15:35:33 <mmedvede> yes. For the record, the zuul ui is reporting 'Zuul version: 2.1.1.dev123'
15:35:39 <anteaya> mmedvede: thank you
15:36:17 <anteaya> so please include your experience on the mailing list thread
15:36:38 <mmedvede> anteaya: ok
15:36:44 <anteaya> it is possible, as I believe jeblair stated above, that jhesketh believes the memory leak is fixed
15:36:52 <anteaya> yet you have data that this is not the case
15:37:26 <anteaya> if you could do a git log on your zuul version and confirm the presence of http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:37:30 <anteaya> that would be awesome
15:37:39 <anteaya> that way we aren't guessing
15:37:49 <anteaya> thanks for bringing up the topic, mmedvede
15:38:08 <anteaya> do we have any more discussion on this topic?
15:38:38 <mmedvede> nothing from me
15:38:47 <anteaya> do we have any other topic we would like to discuss?
15:38:53 <anteaya> thanks mmedvede
15:39:18 <anteaya> are there any objections to me closing the meeting?
15:39:38 <anteaya> thanks everyone for your kind attendance and participation today
15:39:47 <kcalman> \o
15:39:54 <anteaya> check utc times for meetings next week if your time changes
15:40:02 <anteaya> kcalman: have you an item you wish to discuss?
15:40:18 <kcalman> No, I'm good
15:40:22 <anteaya> okay thank you
15:40:24 <anteaya> welcome
15:40:38 <kcalman> thanks
15:40:42 <anteaya> I look forward to seeing everyone next week
15:40:45 <anteaya> thank you
15:40:48 <anteaya> #endmeeting
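For anyone following up on anteaya's 15:37 request, a quick way to confirm whether jhesketh's fix is in the tree being run is to ask git whether the commit is an ancestor of the current checkout. A small sketch follows; the checkout path is an assumption, adjust it for your deployment.

```python
# Confirm whether the memory-leak fix (commit 90b61db...) is present in a
# local Zuul checkout. The checkout path is hypothetical.
import subprocess

FIX_SHA = "90b61dbde89402971411a63f7596719db63f6155"
ZUUL_CHECKOUT = "/opt/zuul"  # hypothetical path to the Zuul git clone

# `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B,
# so a zero return code here means the fix is in the checked-out tree.
result = subprocess.call(
    ["git", "merge-base", "--is-ancestor", FIX_SHA, "HEAD"],
    cwd=ZUUL_CHECKOUT,
)
print("fix is present" if result == 0 else "fix is missing")
```

Equivalently, a plain `git log --oneline | grep 90b61db` from the checkout answers the same question.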