15:00:03 <anteaya> #startmeeting third-party
15:00:04 <openstack> Meeting started Mon Mar 7 15:00:03 2016 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:08 <openstack> The meeting name has been set to 'third_party'
15:00:11 <anteaya> hello
15:00:35 <mmedvede> hi anteaya
15:00:43 <anteaya> mmedvede: how are you today?
15:01:29 <mmedvede> anteaya: good
15:01:34 <anteaya> glad to hear it
15:01:59 <anteaya> so I'll post a reminder to folks that openstack meetings use utc time on purpose
15:02:15 <anteaya> so if time changes for you next weekend, refer to utc time for meetings
15:02:18 <asselin> o/
15:02:22 <anteaya> hey asselin
15:02:31 <anteaya> do we have anything to discuss today?
15:03:02 <anteaya> that was a question for either of you
15:03:05 <anteaya> not just asselin
15:03:13 <mmedvede> there is a thread on third-party-announce with a CI reporting merges
15:03:19 <anteaya> yes
15:03:30 <mmedvede> just wanted to note that it could be valid sometimes to report a failure to merge
15:03:32 <anteaya> the ci operator posted a reply today
15:03:41 <anteaya> mmedvede: can you share the usecase?
15:03:57 <anteaya> in this instance it was indicative of a problem on their end
15:04:01 <mmedvede> e.g. if infra jenkins did not fail to merge, but third-party CI did
15:04:14 <anteaya> why is that important for a dev to know
15:04:24 <anteaya> what action should the dev take in that case?
15:04:24 <mmedvede> if third-party CI does not report anything in that case, then it would not be clear what happened
15:04:42 <anteaya> let's look at it from the dev's point of view
15:04:48 <anteaya> what should they do in that case?
15:05:07 <asselin> mmedvede, that's why the suggestion is to e-mail the operator.
15:05:13 <mmedvede> in that case they would know that the merge failed, and could recheck accordingly
15:05:30 <anteaya> why would they recheck if jenkins was able to merge the patch?
15:05:32 <asselin> I don't see any reason it should fail merge differently by 3rd party ci and infra
15:06:15 <mmedvede> asselin: it could be different, because there is often a time difference between when the infra zuul merger runs and when the third-party CI does
15:06:37 <anteaya> but in the case of a difference, what is the dev to do?
15:06:46 <anteaya> if jenkins merges?
15:07:15 <mmedvede> anteaya: the dev would know that the third-party CI failed. There could be a case where they expect a +1 from the third-party CI
15:07:23 <anteaya> infra isn't going to counsel people to rebase based on what a third party ci reports
15:07:58 <asselin> I think the issue is that most 3rd party ci merge failures are not legitimate
15:08:13 <mmedvede> well, if TPCI fails, it would also indicate that infra zuul would most likely fail on the next recheck or during gate
15:08:23 <asselin> they are caused by e.g. network failures
15:08:40 <mmedvede> asselin: +1 on most being not legitimate.
15:08:59 <mmedvede> I am trying to make a case that in rare cases it could be useful.
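A rough sketch of the check mmedvede is arguing for: only treat a local merge failure as meaningful if the upstream CI did not also fail to merge the same change. This is illustrative only; the Gerrit URL, the upstream account name, and the merge-failure message text matched below are assumptions, not anything prescribed by infra.

```python
# Hypothetical pre-check a third-party CI could run before posting a
# merge-failure comment: did the upstream CI already report a merge failure
# on this change? If it did, the dev will already have been told to rebase.
import json
import requests

GERRIT = "https://review.openstack.org"
UPSTREAM_CI = "Jenkins"  # assumed display name of the upstream CI account
MERGE_FAIL_PHRASE = "unable to be automatically merged"  # assumed message text


def upstream_also_failed_merge(change_number):
    """Return True if the upstream CI left a merge-failure comment on the change."""
    resp = requests.get("%s/changes/%s/messages" % (GERRIT, change_number))
    resp.raise_for_status()
    # Gerrit prefixes JSON responses with ")]}'" to guard against XSSI.
    messages = json.loads(resp.text.split("\n", 1)[1])
    return any(
        msg.get("author", {}).get("name") == UPSTREAM_CI
        and MERGE_FAIL_PHRASE in msg.get("message", "")
        for msg in messages
    )
```

If this returns False while the third-party merger failed, the failure is more likely a local problem (network, stale mirror) than something a developer can act on, which is asselin's point about most merge failures not being legitimate.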
15:09:06 <anteaya> if there is something the dev should know, then as asselin says, the operator is emailed and can post a comment on a patch
15:09:16 <anteaya> informing the dev of the situation
15:09:18 <mmedvede> Not saying that it should be encouraged for third-party CI to report merge failures all the time
15:09:29 <anteaya> sure
15:09:42 <anteaya> but that is where human intervention is necessary
15:09:48 <anteaya> that the operator read their email
15:09:55 <anteaya> and provide a comment on a patch
15:10:06 <anteaya> as a human, not a third party ci system
15:10:11 <anteaya> that is most welcome
15:10:20 <anteaya> and highly appropriate
15:10:47 <anteaya> does that make sense?
15:11:04 <anteaya> and thanks asselin for taking the time to reply to that thread
15:11:59 <anteaya> did we want to say more on this topic?
15:12:08 <anteaya> mmedvede: thanks for bringing it up
15:12:21 <mmedvede> nothing more here, thanks for the discussion
15:12:36 <anteaya> thank you
15:12:46 <anteaya> so in this reply http://lists.openstack.org/pipermail/third-party-announce/2016-March/000295.html
15:13:02 <anteaya> the third party operator is missing the importance of receiving the email
15:13:11 <anteaya> would either of you like to reply to that post?
15:13:17 <anteaya> I can if you don't want to
15:13:31 <anteaya> was just thinking it would be nice to hear from other operators
15:13:57 <anteaya> right now their configuration has email going to a dummy account
15:13:58 <asselin> I read it but wasn't sure how to reply more than restating my original e-mail
15:14:10 <anteaya> asselin: okay fair enough, thank you
15:14:23 <anteaya> mmedvede: would you like to reply? or shall I?
15:14:52 <mmedvede> I am not sure what to reply there
15:14:59 <anteaya> okay that is fine
15:15:25 <anteaya> the fact they have email going to "to: third_party_ci at example.com" was what I was going to address
15:15:27 <mmedvede> I can just say that the config he is asking about is the one he should be using, I guess
15:15:31 <mmedvede> (with the email)
15:15:35 <anteaya> as I doubt that is their email address
15:15:58 <anteaya> I'll reply
15:16:03 <anteaya> thanks for taking a look
15:16:10 <mmedvede> didn't they use those emails as examples?
15:16:19 <anteaya> I don't think so
15:16:31 <anteaya> I think that is what they have in their config file
15:16:45 <anteaya> so that is what I want to address
15:17:13 <anteaya> what we just discussed: read the email from your system and reply as a human on patches where it makes sense for the developer
15:18:15 <anteaya> so thanks for having the discussion with me
15:18:19 <anteaya> so I know how to reply
15:18:21 <anteaya> :)
15:18:32 <anteaya> is there anything more to be said on this topic?
15:18:52 <mmedvede> nothing except I now fear our CI would start reporting merge failures too :)
15:19:01 <mmedvede> need to make sure it does not
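The configuration under discussion is the reporter section of a Zuul 2.x layout.yaml. A minimal sketch of the pattern anteaya and asselin describe, posting normal results to Gerrit while sending merge failures only to a mailbox the operator actually reads rather than a dummy address like third_party_ci at example.com, might look roughly like this; the pipeline name, votes, and addresses are hypothetical:

```yaml
# Illustrative Zuul 2.x layout.yaml pipeline; names and addresses are made up.
pipelines:
  - name: check
    manager: IndependentPipelineManager
    trigger:
      gerrit:
        - event: patchset-created
    success:
      gerrit:
        verified: 1
    failure:
      gerrit:
        verified: -1
    # Do not report merge failures back to Gerrit; email the CI operator
    # instead, at an address a human actually reads.
    merge-failure:
      smtp:
        to: ci-operator@your-company.example
        from: zuul@ci.your-company.example
        subject: 'Third-party CI could not merge a change'
```

The mail server settings themselves live in zuul.conf rather than in the layout, and whether the CI can apply Verified votes at all depends on its Gerrit account permissions.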
15:19:03 <anteaya> mmedvede: has it in the past?
15:19:06 <mmedvede> yes
15:19:07 <anteaya> mmedvede: ah thank you
15:19:22 <anteaya> well I did try to find them online
15:19:30 <anteaya> and they weren't around
15:19:34 <anteaya> and a dev complained
15:19:38 <asselin> mmedvede, I thought you were the one that originally suggested that :)
15:19:43 <anteaya> as they had 4 systems doing that
15:20:11 <anteaya> so four systems spamming patches was a little noisy
15:21:00 <mmedvede> asselin: could be me who suggested it, exactly because our CI was doing that
15:21:11 <mmedvede> but that was a while back
15:21:26 <anteaya> good solutions stand the test of time
15:21:55 <anteaya> so thank you for your solution mmedvede :)
15:22:12 <anteaya> do we have other topics we would like to discuss today?
15:22:32 <mmedvede> I do not want to take credit, possibly it was not me. My memory fails me on that
15:22:47 <anteaya> I just know it wasn't me
15:22:51 <mmedvede> hehe
15:22:53 <anteaya> thanks to whomever it was
15:23:05 <anteaya> and I'm fine if it was or was not mmedvede
15:23:15 <mmedvede> I have one topic - wondered if anyone else has the same problem - the zuul memory leak
15:23:26 <anteaya> mmedvede: yes infra has the same problem
15:24:02 <mmedvede> ok, I am thinking of implementing a workaround - periodic restart of the zuul service
15:24:19 <anteaya> mmedvede: hmmmmm, we are trying to collect more data on the memory leak
15:24:31 <anteaya> we aren't fans of restarting for memory leak issues
15:24:57 <anteaya> that is sort of our last stand when trying to find the real solution
15:24:59 <jeblair> i believe jhesketh has been looking into the problem
15:25:37 <mmedvede> restart is a last resort, but guaranteed to work :)
15:25:42 <anteaya> jeblair: what is the current theory of the cause?
15:25:51 <anteaya> or is there a current theory?
15:25:56 <anteaya> mmedvede: true
15:26:28 <mmedvede> last time for me it took zuul about 3 days to consume 95% of 8GB ram, and halt the system
15:26:40 <anteaya> mmedvede: :(
15:26:46 <anteaya> mmedvede: that makes us sad
15:27:15 <jeblair> i don't see an outstanding patch, so the best thing to do may be to continue discussion on the mailing list thread
15:27:19 <anteaya> mmedvede: so for you is the memory leak consuming memory faster than before?
15:27:33 <jeblair> it's possible jhesketh thought it was fixed and is unaware that it is not.
15:27:35 * anteaya looks for the mailing list thread
15:27:41 <mmedvede> anteaya: I did not see a pattern, sometimes it took a week/2 weeks
15:27:47 <mmedvede> last time was fast
15:27:49 <anteaya> jeblair: I don't think it is fixed for infra
15:27:55 <jeblair> anteaya: i agree
15:28:31 <anteaya> #link http://lists.openstack.org/pipermail/openstack-infra/2016-February/003722.html
15:28:37 <jeblair> anteaya: while i think we restarted after jhesketh's most recent fix, i am not positive. that should be verified before we solidify that conclusion
15:28:44 <anteaya> mmedvede: so this is the start of the zuul memory leak thread
15:28:52 <anteaya> mmedvede: please add your experience
15:29:18 <mmedvede> anteaya: ok, thanks for the link
15:29:20 <anteaya> jeblair: I think the rename zuul restart incorporated jhesketh's latest change
15:29:29 <anteaya> but again yes, confirmation would be good here
15:30:08 <mmedvede> I also need to update to latest zuul. But I did not see any patches landed that made me think there was another fix attempt
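For operators who, like mmedvede, decide to live with the leak until a real fix lands, the restart workaround can be automated. Below is a minimal watchdog sketch that restarts the zuul service once its resident memory crosses a threshold; the process name, service name, and 80% threshold are assumptions about a typical third-party CI host, not infra guidance, and anteaya's caveat stands: restarts only mask the underlying bug.

```python
#!/usr/bin/env python
# Hypothetical watchdog: restart zuul-server before the leak exhausts host
# memory. Process/service names and the threshold are assumptions; run it
# periodically from cron as root.
import subprocess

import psutil

MEMORY_THRESHOLD_PCT = 80.0   # restart once zuul uses this much of total RAM
SERVICE_NAME = "zuul-server"  # assumed init/systemd service name


def zuul_memory_percent():
    """Sum resident memory of all zuul-server processes, as % of total RAM."""
    total = 0.0
    for proc in psutil.process_iter():
        try:
            cmdline = " ".join(proc.cmdline())
            if "zuul-server" in cmdline:
                total += proc.memory_percent()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
    return total


if __name__ == "__main__":
    if zuul_memory_percent() > MEMORY_THRESHOLD_PCT:
        # Restart via the service manager; any queued changes are lost unless
        # they are saved and re-enqueued separately.
        subprocess.check_call(["service", SERVICE_NAME, "restart"])
```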
15:30:09 <anteaya> mmedvede: and thank you for sharing your experience, hopefully with more folks running zuuls we can close in on a solution
15:30:56 <anteaya> #link http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:31:35 <anteaya> that merged Feb. 3rd
15:32:05 <anteaya> last rename was Feb. 12th: https://etherpad.openstack.org/p/repo-renaming-2016-02-12
15:32:14 <mmedvede> yes, that is the last one I remember. Our zuul uses that patch
15:32:31 <anteaya> jeblair: based on those dates I'm going with zuul is running jhesketh's latest memory leak solution attempt
15:32:37 <anteaya> mmedvede: okay thanks
15:33:02 <anteaya> mmedvede: and you still have zuul using up memory within 3 days and ceasing to run
15:33:04 <anteaya> yes?
15:33:36 <mmedvede> anteaya: no, it is not always the same. The last one was about 3 days. I assume it depends on patch volume
15:33:44 <anteaya> right
15:34:05 <anteaya> but I'm confirming that the last time you had an issue your zuul was running jhesketh's patch
15:34:23 <anteaya> is that accurate?
15:35:33 <mmedvede> yes. For the record, the zuul ui is reporting 'Zuul version: 2.1.1.dev123'
15:35:39 <anteaya> mmedvede: thank you
15:36:17 <anteaya> so please include your experience on the mailing list thread
15:36:38 <mmedvede> anteaya: ok
15:36:44 <anteaya> it is possible, as I believe jeblair stated above, that jhesketh believes the memory leak is fixed
15:36:52 <anteaya> yet you have data that this is not the case
15:37:26 <anteaya> if you could do a git log on your zuul version and confirm the presence of http://git.openstack.org/cgit/openstack-infra/zuul/commit/?id=90b61dbde89402971411a63f7596719db63f6155
15:37:30 <anteaya> that would be awesome
15:37:39 <anteaya> that way we aren't guessing
15:37:49 <anteaya> thanks for bringing up the topic, mmedvede
15:38:08 <anteaya> do we have any more discussion on this topic?
15:38:38 <mmedvede> nothing from me
15:38:47 <anteaya> do we have any other topic we would like to discuss?
15:38:53 <anteaya> thanks mmedvede
15:39:18 <anteaya> are there any objections to me closing the meeting?
15:39:38 <anteaya> thanks everyone for your kind attendance and participation today
15:39:47 <kcalman> \o
15:39:54 <anteaya> check utc times for meetings next week if your time changes
15:40:02 <anteaya> kcalman: have you an item you wish to discuss?
15:40:18 <kcalman> No, I'm good
15:40:22 <anteaya> okay thank you
15:40:24 <anteaya> welcome
15:40:38 <kcalman> thanks
15:40:42 <anteaya> I look forward to seeing everyone next week
15:40:45 <anteaya> thank you
15:40:48 <anteaya> #endmeeting
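For anyone following up on anteaya's 15:37 request, a quick way to confirm whether jhesketh's fix is in the tree being run is to ask git whether the commit is an ancestor of the current checkout. A small sketch follows; the checkout path is an assumption, adjust it for your deployment.

```python
# Confirm whether the memory-leak fix (commit 90b61db...) is present in a
# local Zuul checkout. The checkout path is hypothetical.
import subprocess

FIX_SHA = "90b61dbde89402971411a63f7596719db63f6155"
ZUUL_CHECKOUT = "/opt/zuul"  # hypothetical path to the Zuul git clone

# `git merge-base --is-ancestor A B` exits 0 when A is an ancestor of B,
# so a zero return code here means the fix is in the checked-out tree.
result = subprocess.call(
    ["git", "merge-base", "--is-ancestor", FIX_SHA, "HEAD"],
    cwd=ZUUL_CHECKOUT,
)
print("fix is present" if result == 0 else "fix is missing")
```

Equivalently, a plain `git log --oneline | grep 90b61db` from the checkout answers the same question.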