19:03:32 <jeblair> #startmeeting infra
19:03:33 <openstack> Meeting started Tue May  5 19:03:32 2015 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:38 <openstack> The meeting name has been set to 'infra'
19:03:38 <greghaynes> O/
19:03:44 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:45 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-04-28-19.04.html
19:03:52 <krtaylor> o/
19:03:58 <jeblair> #topic Actions from last meeting
19:04:10 <jeblair> fungi check our cinder quota in rax-dfw
19:04:19 <fungi> uh, yeah, not done
19:04:27 <jeblair> #action fungi check our cinder quota in rax-dfw
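A quick sketch of what that quota check typically involves with the cinder CLI; the tenant ID below is a placeholder, not the actual rax-dfw tenant:

    # compare the configured block-storage quota against what's actually in use
    cinder quota-show <tenant-id>   # limits: volumes, gigabytes, snapshots
    cinder list                     # volumes currently allocated in the region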
19:04:57 <jeblair> #topic Summit planning
19:05:15 <jeblair> thanks to folks who put things on https://etherpad.openstack.org/p/infra-liberty-summit-planning
19:05:48 <jeblair> mordred, SpamapS: do you think we ought to talk about infra-cloud at the summit?
19:05:56 <mordred> yes
19:06:01 * anteaya reminds folks to enter their name in the top right corner of the etherpad
19:06:20 <jeblair> mordred: okay, should probably put something in there real quick like
19:06:21 <mordred> SpamapS: I'm slammed this week - can you make an entry for that?
19:06:24 <greghaynes> Is there new stuff to talk about since we last talked? Or just convey info?
19:06:27 * mordred tries
19:07:19 <jeblair> greghaynes: i would like us to make a go/no-go decision.  in my mind that means determining the scope of work and whether we have enough people lined up and ready for it
19:07:36 <mordred> jeblair: I have put in a placeholder entry
19:07:40 <greghaynes> Awesome
19:07:45 <jeblair> mordred: thx
19:08:14 <jeblair> i expect to translate that into an actual schedule this week
19:08:58 <jeblair> #topic Priority Efforts
19:09:41 <jeblair> this part of the meeting has started to get a little status-reporty, which i think generally we want to avoid, and instead focus on things we need to work through widely together
19:09:52 <jeblair> while still giving these items priority
19:10:36 <jeblair> so i'm thinking we should ask people involved in the priority efforts to update the meeting agenda to flag that they have something to discuss; or else, i can ask at the start of the meeting...
19:10:41 <jeblair> how does that sound?
19:10:53 <fungi> i think that's a great idea
19:11:00 <jhesketh> Sounds good to me
19:11:03 <GheRivero> like it
19:11:03 <anteaya> me too
19:11:03 <jeblair> also, shout now if you have a priority effort thing to discuss :)
19:11:06 <pleia2> wfm
19:11:10 <fungi> makes the meeting less of a scramble to get through those and still have time for other incidental topics
19:11:14 <anteaya> can we talk about gerrit for saturday?
19:11:20 <anteaya> just to ensure we are ready?
19:11:25 <nibalizer> wfm
19:11:30 <jeblair> anteaya: yep
19:11:33 <anteaya> also I can't be here on saturday, sorry
19:11:35 <jhesketh> I have nothing to discuss sorry (ie no progress)
19:12:00 <fungi> yeah, with the gerrit upgrade coming up this weekend, talking about that is probably a great idea ;)
19:12:08 <pleia2> only have progress-report stuff for zanata, we're doing fine and have nothing to discuss more broadly
19:12:31 <clarkb> jeblair: sounds good to me (updating the agenda if specific items need discussion)
19:12:45 <jeblair> oh, also i have a todo item to make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:12:58 <pleia2> jeblair: ++
19:12:59 <jeblair> #action jeblair make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:13:19 <jeblair> #topic Priority Effort: Upgrading Gerrit Saturday May 9
19:13:28 <jeblair> that's a few days away!
19:13:43 <fungi> we probably need to talk about 2.10.2 vs 2.10.3+
19:14:00 <fungi> the main concern for latest 2.10.x was...
19:14:04 <fungi> #link https://groups.google.com/forum/#!msg/repo-discuss/Kv4bWLESbQ4/-oSNbuTQwkUJ
19:14:44 <jeblair> so 2.10.2 has the sshd version we are using now
19:14:47 <jeblair> right?
19:14:54 <fungi> someone running gerrit saw a lockup on 2.10.3 (which is also the one with the ssh lib that's supposed to solve our stream-events problem)
19:15:14 <fungi> but in hindsight their lockup seems likely to have been unrelated
19:15:15 <jeblair> fungi: wait, there's a known fix for our stream-events problem?
19:15:35 <fungi> #link https://issues.apache.org/jira/browse/SSHD-348
19:15:37 <clarkb> fungi: jeblair I think there was a supposed fix via MINA SSHD update to fix one bug
19:15:40 <fungi> supposedly
19:16:19 <jeblair> isn't that the one that was introduced _after_ our version?
19:17:01 <jeblair> my recollection is that was introduced in 2.9.x, reverted in 2.9.y, fixed upstream, then reintroduced later... so all of that happens _after_ our gerrit version
19:17:24 <fungi> perhaps. except that the stack traces i have show stream workers stuck in that same method
19:17:30 <clarkb> jeblair: ya there are three MINA SSHD versions at play. the one we use, the one that broke older 2.9, and the one 2.10.3 is using
19:17:31 <jeblair> which is why we are baffled by seeing the problem on our server (which ran something like 1.5 years with it only showing up once)
19:18:35 <clarkb> jeblair: correct, and I think this thread shows that all three have exhibited the problem
19:19:05 <jeblair> clarkb: yes, but possibly with varying degrees?  and maybe it's more than one problem with a single manifestation?
19:19:42 <jeblair> i'm getting the sense from that thread that people think 2.10.2 is less error-prone in this regard than 2.10.3.  does that seem right?
19:19:44 <clarkb> ya I think we see a common symptom across all of them that may be >= 1 bug with varying degrees of expression based on gerrit version
19:19:50 <clarkb> jeblair: I agree with that
19:20:17 <jeblair> zaro: what do you have staged for us on our 2.10 branch?
19:20:18 <fungi> so, anyway, i guess point being 2.10.2 has the same mina-sshd we're running, 2.10.3 has a much newer mina-sshd which might alleviate the problem we have. though the only thing i saw in that discussion was one person reporting a problem which might have been unrelated to the gerrit version
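For context, the stream-events hang under discussion shows up on the consumer side; a minimal consumer is just an ssh command (with your gerrit username as a placeholder):

    # watch the gerrit event stream; when the mina-sshd bug strikes,
    # this connection stays open but silently stops delivering events
    ssh -p 29418 <username>@review.openstack.org gerrit stream-events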
19:21:14 <SpamapS> jeblair: sorry for the interruption and late response, I got pulled away by meatspace things. Yes I do think we should talk about infra cloud at the summit and I'm working on a patch to infra manual with the first rev of the docs that we can use to seed the discussion.
19:21:43 <jeblair> SpamapS, mordred: great, thanks!
19:22:50 <jeblair> okay, does anyone at this meeting know what version of gerrit we are poised to deploy on saturday?
19:23:38 <fungi> 2.10.2-23-g039a170 is what's running on review-dev
19:23:48 <fungi> if i had to guess, i'd say that
19:24:03 <anteaya> https://review.openstack.org/#/c/155463/3/modules/openstack_project/manifests/review.pp
19:24:06 <anteaya> 2.10.2.22
19:24:08 <jeblair> fungi: so that's 23 commits past .2, one of which may be the sshd upgrade
19:24:43 <jeblair> anteaya: or 22 commits
19:24:58 <anteaya> the patch is not in sync with the -dev server :(
19:25:02 <fungi> http://git.openstack.org/cgit/openstack-infra/gerrit/log/?h=openstack%2F2.10.2
19:25:03 * clarkb is looking at git now
19:25:17 <anteaya> jeblair: ah thanks
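For reference, the version string on review-dev is standard git-describe output, which encodes how far past a tag a build is:

    # 2.10.2-23-g039a170 decodes as:
    #   2.10.2    nearest tag
    #   23        commits on the branch since that tag
    #   g039a170  'g' plus the abbreviated commit sha
    git describe --tags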
19:25:22 <jeblair> i'm starting to get worried about this.  i'm not at all sure we have our act together for saturday.
19:25:53 <jeblair> does anyone want to drive this effort?
19:25:56 <clarkb> it does not include mina sshd change or 2.10.3
19:26:08 <anteaya> I can't since I can't be here on saturday, sorry
19:26:13 <anteaya> otherwise I would
19:26:23 <clarkb> zaro: ^ are you around?
19:27:06 <anteaya> anything other than a funeral and I'd change my plans
19:28:24 <clarkb> I can jump in as soon as I get this gearman plugin fix going
19:28:27 <fungi> i can pick it up and run with it since zaro seems not to be around
19:28:29 <zaro> clarkb: yes
19:28:59 <clarkb> zaro: see questions above, what version do we intend on upgrading Gerrit to on saturday? and does that include the 2.10.3 sshd changes?
19:29:01 <jeblair> zaro: welcome back
19:29:05 <zaro> jeblair: i believe it's tip of stable-2.10
19:29:07 <zaro> branch
19:29:19 <clarkb> ok so that would include the 2.10.3 changes
19:29:39 <zaro> i think there was an update to jgit that we should get
19:29:43 <jeblair> zaro: why is review-dev running .23 and the proposal for review to run .22?
19:30:27 <fungi> 4 days to go, so not much time left to lock this down and retest on review-dev to be certain we're good for the window
19:31:08 <zaro> jeblair: probably an error, i need to update
19:31:42 <jeblair> zaro: you mean you intend to upgrade review.o.o to .23?
19:32:19 <zaro> let me review this. trying to map version number to change
19:33:03 <jeblair> zaro: i need your input on whether you think we should use the older or newer mina sshd, and also whether your proposed gerrit build includes the older or newer sshd
19:35:16 <zaro> I think the version on review-dev is the one we want to go with.  IIRC the SSHD problem that was reported wasn't a real problem, but let me confirm
19:35:42 <fungi> zaro: well, turned out to probably not be an sshd-related issue, but the reporter never updated to say when they retried to upgrade
19:35:52 <fungi> i linked it earlier in the meeting
19:36:01 <zaro> ahh cool.
19:36:34 <zaro> what is your opinion on this change?  https://gerrit-review.googlesource.com/#/c/67653/
19:36:48 <fungi> so it's an unknown. we might upgrade to 2.10.3 and see gerrit freezing up on us, or we might not. we might upgrade to 2.10.3 and see the new mina-sshd solve our stream-events hang, or might not
19:37:06 <mordred> that's awesome
19:37:10 <jeblair> clarkb, fungi: are you comfortable trying the newer mina sshd then?
19:37:43 <clarkb> I think so, it will likely be no worse than the current situation
19:37:52 <jeblair> zaro: so one last thing -- can you confirm that the .23 build has the newer mina sshd?
19:38:05 <fungi> it's early in the cycle, we can do an emergency downgrade if needed
19:38:22 <fungi> new versions usually come with unknown new bugs
19:38:37 <jeblair> fungi: schema changes might force us to stick with 2.10, so likely just a downgrade to 2.10.2 equivalent by the time we notice the problem
19:38:50 <fungi> yeah, that's what i was expecting
19:39:00 <zaro> jeblair: yes.
19:39:09 <fungi> we try 2.10.3 and if we have problems switch to 2.10.2
19:39:25 <jeblair> zaro: yes, it is in that build?
19:39:28 <clarkb> fungi: sounds good
19:39:54 <jeblair> who's around on saturday?
19:39:57 <jeblair> o/
19:40:05 <fungi> i plan to be here for the duration
19:40:31 <zaro> jeblair: ohh crap.  i don't see .23 in tarballs.  let me check that on the server
19:41:03 <fungi> zaro: yeah, there's no 23rd change merged to the branch since 2.10.2
19:41:16 <fungi> tip of that branch is 22
19:41:55 <clarkb> I am
19:41:56 <pleia2> I can be here for the first hour, but I need to leave at 1700
19:41:59 <zaro> alright.  that's a custom build of mine.  probably testing something
19:42:11 <pabelanger> I can be, if help is needed
19:42:43 <anteaya> pabelanger: having someone in channel to answer questions is helpful
19:42:47 <jeblair> zaro: okay, so what do you propose we install on saturday?
19:43:12 <zaro> for sure .22 is it.
19:43:25 <jeblair> zaro: want to downgrade review-dev then?
19:43:31 <zaro> yes, i can do that
19:43:35 <jeblair> k, thx
19:43:45 <jeblair> should we send out a reminder announcement?
19:43:49 <pleia2> yes
19:43:52 <anteaya> I think so
19:43:58 <pleia2> I can do that if you'd like
19:44:08 <zaro> i have puppet turned off on review-dev due to the required change for gerrit libs.
19:45:00 <AJaeger_> will you do project renames also during the downtime or is that better for another separate slot?
19:45:22 <zaro> #link https://review.openstack.org/#/c/172534/
19:45:33 <fungi> separate
19:45:35 <jeblair> AJaeger_: i think we should leave it for another slot
19:45:46 <fungi> i don't think we want to do anything during the saturday window except upgrade gerrit
19:45:49 <zaro> sorry to come late, did we post the etherpad for gerrit upgrade yet?
19:45:49 <jeblair> pleia2: that sounds great; when should we send the announcement?
19:45:59 <jeblair> #agreed downgrade review-dev.o.o to gerrit 2.10.2.22
19:46:00 <jeblair> #agreed upgrade review.o.o to gerrit 2.10.2.22
19:46:00 <jeblair> #info 2.10.2.22 is stable-2.10 branch tip, approximately equivalent to 2.10.3 and contains a newer mina sshd
19:46:21 <anteaya> #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:22 <zaro> well here it is anyways #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:29 <anteaya> zaro: I put it in the agenda
19:46:37 <zaro> anteaya: thanks
19:46:42 <anteaya> welcome
19:46:48 <jeblair> maybe send reminder announcements tomorrow and also friday?
19:46:57 <pleia2> jeblair: wfm
19:47:10 <clarkb> jeblair: +1
19:47:16 <fungi> zaro: you're sure the mina-sshd upgrade from 2.10.3 is in our openstack/2.10.2 branch?
19:47:20 <jeblair> #action pleia2 send reminder announcements about gerrit upgrade wed may 6 and friday may 8
19:49:21 <zaro> fungi: it's a downgrade, but yes the downgrade is there.
19:49:50 <jeblair> if that is true, it has invalidated all of my knowledge on the subject
19:50:22 <fungi> yeah, i was asking about the mina-sshd _upgrade_ to 0.14.0 in gerrit 2.10.3
19:50:40 <zaro> no, sorry i meant it's got the 0.14 version in there.
19:50:54 <clarkb> zaro: do you know which commit pulls it in?
19:51:02 <fungi> which is newer than the 0.9.whatever we're running
19:51:13 <zaro> http://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff
19:51:27 <clarkb> https://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff ?
19:51:30 <clarkb> ya ok
19:52:35 <jeblair> okay, i think we're all set then.  anything else on this topic?
19:52:52 <anteaya> I'm looking forward to close connection
19:53:02 <fungi> nope, let's do it
19:53:04 <zaro> no
19:53:15 <jeblair> great, thanks everyone!
19:53:15 <fungi> it'll be good to have behind us
19:53:32 <jeblair> #topic Open discussion
19:53:55 <clarkb> I think I have a mostly working version of the gearman plugin update change to push up
19:54:03 <anteaya> yay
19:54:04 <clarkb> will get that up for review as soon as meeting is over
19:54:23 <anteaya> do we think that this will solve the problem?
19:54:38 <anteaya> the problem being lots of ready nodes and few in use
19:54:43 <clarkb> it should solve the node leaking problem, unsure how it will affect the other issues
19:54:49 <anteaya> okay thanks
19:55:05 <jeblair> i met with some enovance folks recently, and they want to help out with upstream infra -- they're going to start by pitching into the puppet-openstackci / downstream puppet effort
19:55:20 <fungi> anteaya: i think the bulk of the ready nodes aren't actually ready
19:55:20 <anteaya> wonderful
19:55:23 <asselin__> great!
19:55:24 <clarkb> I think fbo has already been pushing changes for that
19:55:28 <anteaya> fungi: ah
19:55:30 * clarkb has been trying to review when able
19:55:36 <pabelanger> started work on grafyaml (yaml for grafana).  Have some yaml validation working and am working on posting a dashboard right now. I'm sure there'll be some discussion about it, but figure we can talk about it next meeting / summit?
19:55:39 <jeblair> clarkb: yep!
19:56:02 <anteaya> pabelanger: have you put it on the etherpad?
19:56:15 <fungi> anteaya: just after we started the meeting i spotted that there were nodes running jobs which nodepool thought were ready, so we may have been simultaneously struggling with the gearman race in jenkins and a zeromq publisher disconnect between nodepoold and the jenkins masters
19:56:22 <anteaya> even if you don't get a slot put it there as a marker
19:56:34 <pabelanger> anteaya, nothing yet.  Will have to do that shortly
19:56:34 <fungi> both have the effect of causing nodes to seem to stick around in a ready state in nodepool, but for very different reasons
19:56:38 <anteaya> fungi: oh wonderful
19:56:39 <mrmartin> I was a bit late today, and missed the askbot topic. I'm on a good track on askbot-staging, but it seems that the latest app is broken and I need to solve that issue before we can move further.
19:56:45 <fungi> and with very different outcomes when you delete them :/
19:56:51 <anteaya> fungi: oh great
19:57:12 <anteaya> fungi: are you able to filter them based on state?
19:57:18 <jeblair> pabelanger: exciting -- i expect it's probably not an issue that needs a lot of discussion (i think we probably agree it's a good idea), so might be a good sprint or workroom thing at the summit (to just knock out some dashboards or something)
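A hypothetical sketch of the grafyaml workflow pabelanger describes; the grafana-dashboard command name and the YAML schema here are assumptions, since the tool was brand new at this point:

    # validate a yaml dashboard definition before posting it to grafana
    # (command name and schema are assumptions, not confirmed in this meeting)
    cat > nodepool.yaml <<'EOF'
    dashboard:
      title: Nodepool
      rows:
        - title: Node states
          panels:
            - title: Ready nodes
              type: graph
    EOF
    grafana-dashboard validate nodepool.yaml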
19:57:39 <jeblair> mrmartin: good to know, thanks
19:57:40 <fungi> anteaya: i'm able to filter for nodes that have been around long enough that they're not running jobs any more but are still showing ready. also i restarted nodepoold to get around the other issue
19:58:16 <AJaeger_> fungi, is there anything you can tell SergeyLukjanov in case we run into this again during non-US hours?
19:58:21 <anteaya> fungi: seems to help a bit based on the graph
19:58:37 <fungi> AJaeger_: yes, get familiar with this stuff and troubleshoot it
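A rough sketch of the kind of triage fungi describes, assuming the nodepool CLI of this era; the placeholder node ID and the grep-based filtering are assumptions:

    # find nodes nodepool believes are ready, then cross-check against
    # jenkins before deleting -- deletion behaves differently depending
    # on which bug produced the stale node
    nodepool list | grep ready
    nodepool delete <node-id>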
19:58:43 <pabelanger> jeblair, sounds good to me
19:58:51 <zaro> jeblair: i completely forgot, we probably want this in before the upgrade https://review.openstack.org/#/c/176523/
19:59:17 <anteaya> zaro: add that to the etherpad please, it isn't there right now
19:59:32 <jeblair> zaro: is that based on the second version of my patch?  (the first was wrong)
20:00:23 <jeblair> time's up, thanks everyone!
20:00:25 <jeblair> #endmeeting