19:03:32 <jeblair> #startmeeting infra
19:03:33 <openstack> Meeting started Tue May 5 19:03:32 2015 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:38 <openstack> The meeting name has been set to 'infra'
19:03:38 <greghaynes> O/
19:03:44 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:45 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-04-28-19.04.html
19:03:52 <krtaylor> o/
19:03:58 <jeblair> #topic Actions from last meeting
19:04:10 <jeblair> fungi check our cinder quota in rax-dfw
19:04:19 <fungi> uh, yeah, not done
19:04:27 <jeblair> #action fungi check our cinder quota in rax-dfw
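
(For reference on the carried-over action item: a minimal sketch of what that quota check could look like, assuming python-cinderclient and the classic OS_* environment variables. The '2' API version and 'DFW' region name are illustrative assumptions, not from the meeting.)

    import os

    from cinderclient import client

    # Authenticate with the usual environment variables; region is assumed.
    cinder = client.Client(
        '2',
        os.environ['OS_USERNAME'],
        os.environ['OS_PASSWORD'],
        os.environ['OS_TENANT_NAME'],
        os.environ['OS_AUTH_URL'],
        region_name='DFW',
    )

    # Absolute limits report both the quota ceilings (maxTotal*) and the
    # current consumption (total*Used) in a single call.
    for limit in cinder.limits.get().absolute:
        print('%s: %s' % (limit.name, limit.value))
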
19:04:57 <jeblair> #topic Summit planning
19:05:15 <jeblair> thanks to folks who put things on https://etherpad.openstack.org/p/infra-liberty-summit-planning
19:05:48 <jeblair> mordred, SpamapS: do you think we ought to talk about infra-cloud at the summit?
19:05:56 <mordred> yes
19:06:01 * anteaya reminds folks to enter their name in the top right corner of the etherpad
19:06:20 <jeblair> mordred: okay, should probably put something in there real quick like
19:06:21 <mordred> SpamapS: I'm slammed this week - can you make an entry for that?
19:06:24 <greghaynes> Is there new stuff to talk about since we last talked? Or just convey info?
19:06:27 * mordred tries
19:07:19 <jeblair> greghaynes: i would like us to make a go/no-go decision. in my mind that means determining the scope of work and whether we have enough people lined up and ready for it
19:07:36 <mordred> jeblair: I have put in a placeholder entry
19:07:40 <greghaynes> Awesome
19:07:45 <jeblair> mordred: thx
19:08:14 <jeblair> i expect to translate that into an actual schedule this week
19:08:58 <jeblair> #topic Priority Efforts
19:09:41 <jeblair> this part of the meeting has started to get a little status-reporty, which i think generally we want to avoid, and instead focus on things we need to work through widely together
19:09:52 <jeblair> but giving these items priority
19:10:36 <jeblair> so i'm thinking we should ask people involved in the priority efforts to update the meeting agenda to flag that they have something to discuss; or else, i can ask at the start of the meeting...
19:10:41 <jeblair> how does that sound?
19:10:53 <fungi> i think that's a great idea
19:11:00 <jhesketh> Sounds good to me
19:11:03 <GheRivero> like it
19:11:03 <anteaya> me too
19:11:03 <jeblair> also, shout now if you have a priority effort thing to discuss :)
19:11:06 <pleia2> wfm
19:11:10 <fungi> makes the meeting less of a scramble to get through those and still have time for other incidental topics
19:11:14 <anteaya> can we talk about gerrit for saturday?
19:11:20 <anteaya> just to ensure we are ready?
19:11:25 <nibalizer> wfm
19:11:30 <jeblair> anteaya: yep
19:11:33 <anteaya> also I can't be here on saturday, sorry
19:11:35 <jhesketh> I have nothing to discuss sorry (ie no progress)
19:12:00 <fungi> yeah, with the gerrit upgrade coming up this weekend, talking about that is probably a great idea ;)
19:12:08 <pleia2> only have progress-report stuff for zanata, we're doing fine and have nothing to discuss more broadly
19:12:31 <clarkb> jeblair: sounds good to me (updating agendas if specific items need discussion)
19:12:45 <jeblair> oh, also i have a todo item to make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:12:58 <pleia2> jeblair: ++
19:12:59 <jeblair> #action jeblair make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:13:19 <jeblair> #topic Priority Effort: Upgrading Gerrit Saturday May 9
19:13:28 <jeblair> that's a few days away!
19:13:43 <fungi> we probably need to talk about 2.10.2 vs 2.10.3+
19:14:00 <fungi> the main concern for latest 2.10.x was...
19:14:04 <fungi> #link https://groups.google.com/forum/#!msg/repo-discuss/Kv4bWLESbQ4/-oSNbuTQwkUJ
19:14:44 <jeblair> so 2.10.2 has the sshd version we are using now
19:14:47 <jeblair> right?
19:14:54 <fungi> someone running gerrit saw a lockup on 2.10.3 (which is also the one with the ssh lib that's supposed to solve our stream-events problem)
19:15:14 <fungi> but their lockup seems afterward to have been likely unrelated
19:15:15 <jeblair> fungi: wait, there's a known fix for our stream-events problem?
19:15:35 <fungi> #link https://issues.apache.org/jira/browse/SSHD-348
19:15:37 <clarkb> fungi: jeblair I think there was a supposed fix via MINA SSHD update to fix one bug
19:15:40 <fungi> supposedly
19:16:19 <jeblair> isn't that the one that was introduced _after_ our version?
19:17:01 <jeblair> my recollection is that was introduced in 2.9.x, reverted in 2.9.y, fixed upstream, then reintroduced later... so all of that happens _after_ our gerrit version
19:17:24 <fungi> perhaps. except that the stack traces i have show stream workers stuck in that same method
19:17:30 <clarkb> jeblair: ya there are three MINA SSHD versions at play. the one we use, the one that broke older 2.9, and the one 2.10.3 is using
19:17:31 <jeblair> which is why we are baffled by seeing the problem on our server (which ran something like 1.5 years with it only showing up once)
19:18:35 <clarkb> jeblair: correct, and I think this thread shows that all three have exhibited the problem
19:19:05 <jeblair> clarkb: yes, but possibly with varying degrees? and maybe it's more than one problem with a single manifestation?
19:19:42 <jeblair> i'm getting the sense from that thread that people think 2.10.2 is less error-prone in this regard than 2.10.3. does that seem right?
19:19:44 <clarkb> ya I think we see a common symptom across all of them that may be >= 1 bug with varying degrees of expression based on gerrit version
19:19:50 <clarkb> jeblair: I agree with that
19:20:17 <jeblair> zaro: what do you have staged for us on our 2.10 branch?
19:20:18 <fungi> so, anyway, i guess point being 2.10.2 has the same mina-sshd we're running, 2.10.3 has a much newer mina-sshd which might alleviate the problem we have. though the only thing i saw in that discussion was one person reporting a problem which might have been unrelated to the gerrit version
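
(The "problem we have" fungi refers to is the stream-events hang. A minimal watchdog sketch for spotting it, assuming key-based ssh access to Gerrit's ssh API port; the host and timeout values are illustrative. Note a genuinely idle server also produces no events, though on a server as busy as review.o.o a long silence is itself suspicious.)

    import select
    import subprocess
    import sys

    TIMEOUT = 600  # seconds of silence before we assume the stream is wedged

    # `gerrit stream-events` emits one JSON event per line over ssh.
    proc = subprocess.Popen(
        ['ssh', '-p', '29418', 'review.openstack.org',
         'gerrit', 'stream-events'],
        stdout=subprocess.PIPE,
    )

    while True:
        # Wait up to TIMEOUT seconds for the next event line.
        readable, _, _ = select.select([proc.stdout], [], [], TIMEOUT)
        if not readable:
            sys.exit('no events for %d seconds -- stream may be hung' % TIMEOUT)
        line = proc.stdout.readline()
        if not line:
            sys.exit('stream-events connection closed')
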
19:21:14 <SpamapS> jeblair: sorry for the interruption and late response, I got pulled away by meatspace things. Yes I do think we should talk about infra cloud at the summit and I'm working on a patch to infra manual with the first rev of the docs that we can use to seed the discussion.
19:21:43 <jeblair> SpamapS, mordred: great, thanks!
19:22:50 <jeblair> okay, does anyone at this meeting know what version of gerrit we are poised to deploy on saturday?
19:23:38 <fungi> 2.10.2-23-g039a170 is what's running on review-dev
19:23:48 <fungi> if i had to guess, i'd say that
19:24:03 <anteaya> https://review.openstack.org/#/c/155463/3/modules/openstack_project/manifests/review.pp
19:24:06 <anteaya> 10.2.22
19:24:08 <jeblair> fungi: so that's 23 commits past .2, one of which may be the sshd upgrade
19:24:43 <jeblair> anteaya: or 22 commits
19:24:58 <anteaya> the patch is not in sync with the -dev server :(
19:25:02 <fungi> http://git.openstack.org/cgit/openstack-infra/gerrit/log/?h=openstack%2F2.10.2
19:25:03 * clarkb is looking at git now
19:25:17 <anteaya> jeblair: ah thanks
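
(The build strings being compared here follow `git describe` form: TAG-N-gSHA means N commits past TAG, at abbreviated commit SHA, so 2.10.2-23-g039a170 is 23 commits past the 2.10.2 tag. A tiny parser sketch for that arithmetic; the function name is just illustrative:)

    import re

    def parse_describe(version):
        """Split e.g. '2.10.2-23-g039a170' into (tag, commits_past, sha)."""
        m = re.match(r'^(?P<tag>.+)-(?P<n>\d+)-g(?P<sha>[0-9a-f]+)$', version)
        if not m:
            return (version, 0, None)  # an exact tag, no commits on top
        return (m.group('tag'), int(m.group('n')), m.group('sha'))

    print(parse_describe('2.10.2-23-g039a170'))  # ('2.10.2', 23, '039a170')
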
19:25:22 <jeblair> i'm starting to get worried about this. i'm not at all sure we have our act together for saturday.
19:25:53 <jeblair> does anyone want to drive this effort?
19:25:56 <clarkb> it does not include mina sshd change or 2.10.3
19:26:08 <anteaya> I can't since I can't be here on saturday, sorry
19:26:13 <anteaya> otherwise I would
19:26:23 <clarkb> zaro: ^ are you around?
19:27:06 <anteaya> anything other than a funeral and I'd change my plans
19:28:24 <clarkb> I can jump in as soon as I get this gearman plugin fix going
19:28:27 <fungi> i can pick it up and run with it since zaro seems not to be around
19:28:29 <zaro> clarkb: yes
19:28:59 <clarkb> zaro: see questions above, what version do we intend on upgrading Gerrit to on saturday? and does that include the 2.10.3 sshd changes?
19:29:01 <jeblair> zaro: welcome back
19:29:05 <zaro> jeblair: i believe it's tip of stable-2.10
19:29:07 <zaro> branch
19:29:19 <clarkb> ok so that would include the 2.10.3 changes
19:29:39 <zaro> i think there was an update to jgit that we should get
19:29:43 <jeblair> zaro: why is review-dev running .23 and the proposal for review to run .22?
19:30:27 <fungi> 4 days to go, so not much time left to lock this down and retest on review-dev to be certain we're good for the window
19:31:08 <zaro> jeblair: probably an error, i need to update
19:31:42 <jeblair> zaro: you mean you intend to upgrade review.o.o to .23?
19:32:19 <zaro> let me review this. trying to map version number to change
19:33:03 <jeblair> zaro: i need your input on whether you think we should use the older or newer mina sshd, and also whether your proposed gerrit build includes the older or newer sshd
19:35:16 <zaro> I think the version on review-dev is the one we want to go with. IIRC the SSHD problem that was reported wasn't a real problem but let me confirm
19:35:42 <fungi> zaro: well, turned out to probably not be an sshd-related issue, but the reporter never updated to say when they retried to upgrade
19:35:52 <fungi> i linked it earlier in the meeting
19:36:01 <zaro> ahh cool.
19:36:34 <zaro> what is your opinion on this change? https://gerrit-review.googlesource.com/#/c/67653/
19:36:48 <fungi> so it's an unknown. we might upgrade to 2.10.3 and see gerrit freezing up on us, or we might not. we might upgrade to 2.10.3 and see the new mina-sshd solve our stream-events hang, or might not
19:37:06 <mordred> that's awesome
19:37:10 <jeblair> clarkb, fungi: are you comfortable trying the newer mina sshd then?
19:37:43 <clarkb> I think so, it will likely be no worse than the current situation
19:37:52 <jeblair> zaro: so one last thing -- can you confirm that the .23 build has the newer mina sshd?
19:38:05 <fungi> it's early in the cycle, we can do an emergency downgrade if needed
19:38:22 <fungi> new versions usually come with unknown new bugs
19:38:37 <jeblair> fungi: schema changes might force us to stick with 2.10, so likely just a downgrade to 2.10.2 equivalent by the time we notice the problem
19:38:50 <fungi> yeah, that's what i was expecting
19:39:00 <zaro> jeblair: yes.
19:39:09 <fungi> we try 2.10.3 and if we have problems switch to 2.10.2
19:39:25 <jeblair> zaro: yes it is in that build?
19:39:28 <clarkb> fungi: sounds good
19:39:54 <jeblair> who's around on saturday?
19:39:57 <jeblair> o/
19:40:05 <fungi> i plan to be here for the duration
19:40:31 <zaro> jeblair: ohh crap. i don't see .23 in tarballs. let me check that on the server
19:41:03 <fungi> zaro: yeah, there's no 23rd change merged to the branch since 2.10.2
19:41:16 <fungi> tip of that branch is 22
19:41:55 <clarkb> I am
19:41:56 <pleia2> I can be here for the first hour, but I need to leave at 1700
19:41:59 <zaro> alright. that's a custom build of mine. probably testing something
19:42:11 <pabelanger> I can be, if help is needed
19:42:43 <anteaya> pabelanger: having someone in channel to answer questions is helpful
19:42:47 <jeblair> zaro: okay, so what do you propose we install on saturday?
19:43:12 <zaro> for sure .22 is it.
19:43:25 <jeblair> zaro: want to downgrade review-dev then?
19:43:31 <zaro> yes, i can do that
19:43:35 <jeblair> k, thx
19:43:45 <jeblair> should we send out a reminder announcement?
19:43:49 <pleia2> yes
19:43:52 <anteaya> I think so
19:43:58 <pleia2> I can do that if you'd like
19:44:08 <zaro> have puppet turned off on review-dev due to the required change for gerrit libs.
19:45:00 <AJaeger_> will you do project renames also during the downtime or is that better for another separate slot?
19:45:22 <zaro> #link https://review.openstack.org/#/c/172534/
19:45:33 <fungi> separate
19:45:35 <jeblair> AJaeger_: i think we should leave it for another slot
19:45:46 <fungi> i don't think we want to do anything during the saturday window except upgrade gerrit
19:45:49 <zaro> sorry to come late, did we post the etherpad for gerrit upgrade yet?
19:45:49 <jeblair> pleia2: that sounds great; when should we send the announcement?
19:45:59 <jeblair> #agreed downgrade review-dev.o.o to gerrit 2.10.2.22
19:46:00 <jeblair> #agreed upgrade review.o.o to gerrit 2.10.2.22
19:46:00 <jeblair> #info 2.10.2.22 is stable-2.10 branch tip, approximately equivalent to 2.10.3 and contains a newer mina sshd
19:46:21 <anteaya> #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:22 <zaro> well here it is anyways #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:29 <anteaya> zaro: I put it in the agenda
19:46:37 <zaro> anteaya: thanks
19:46:42 <anteaya> welcome
19:46:48 <jeblair> maybe send reminder announcements tomorrow and also friday?
19:46:57 <pleia2> jeblair: wfm
19:47:10 <clarkb> jeblair: +1
19:47:16 <fungi> zaro: you're sure the mina-sshd upgrade from 2.10.3 is in our openstack/2.10.2 branch?
19:47:20 <jeblair> #action pleia2 send reminder announcements about gerrit upgrade wed may 6 and friday may 8
19:49:21 <zaro> fungi: it's a downgrade, but yes the downgrade is there.
19:49:50 <jeblair> if that is true, it has invalidated all of my knowledge on the subject
19:50:22 <fungi> yeah, i was asking about the mina-sshd _upgrade_ to 0.14.0 in gerrit 2.10.3
19:50:40 <zaro> no, sorry i meant it's got the 0.14 version in there.
19:50:54 <clarkb> zaro: do you know which commit pulls it in?
19:51:02 <fungi> which is newer than the 0.9.whatever we're running
19:51:13 <zaro> http://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff
19:51:27 <clarkb> https://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff ?
19:51:30 <clarkb> ya ok
19:52:35 <jeblair> okay, i think we're all set then. anything else on this topic?
19:52:52 <anteaya> I'm looking forward to close connection
19:53:02 <fungi> nope, let's do it
19:53:04 <zaro> no
19:53:15 <jeblair> great, thanks everyone!
19:53:15 <fungi> it'll be good to have behind us
19:53:32 <jeblair> #topic Open discussion
19:53:55 <clarkb> I think I have a mostly working version of the gearman plugin update change to push up
19:54:03 <anteaya> yay
19:54:04 <clarkb> will get that up for review as soon as meeting is over
19:54:23 <anteaya> do we think that this will solve the problem?
19:54:38 <anteaya> the problem being lots of ready nodes and few in use
19:54:43 <clarkb> it should solve the node leaking problem, unsure how it will affect the other issues
19:54:49 <anteaya> okay thanks
19:55:05 <jeblair> i met with some enovance folks recently, and they want to help out with upstream infra -- they're going to start by pitching into the puppet-openstackci / downstream puppet effort
19:55:20 <fungi> anteaya: i think the bulk of the ready nodes aren't actually ready
19:55:20 <anteaya> wonderful
19:55:23 <asselin__> great!
19:55:24 <clarkb> I think fbo has already been pushing changes for that
19:55:28 <anteaya> fungi: ah
19:55:30 * clarkb has been trying to review when able
19:55:36 <pabelanger> started work on grafyaml (yaml for grafana). Have some yaml validating and working on posting a dashboard right now. I'm sure there'll be some discussion about it, but figure we can talk about it next meeting / summit?
19:55:39 <jeblair> clarkb: yep!
19:56:02 <anteaya> pabelanger: have you put it on the etherpad?
19:56:15 <fungi> anteaya: just after we started the meeting i spotted that there were nodes running jobs which nodepool thought were ready, so we may have been simultaneously struggling with the gearman race in jenkins and a zeromq publisher disconnect between nodepoold and the jenkins masters
19:56:22 <anteaya> even if you don't get a slot put it there as a marker
19:56:34 <pabelanger> anteaya, nothing yet. Will have to do that shortly
19:56:34 <fungi> both have the effect of causing nodes to seem to stick around in a ready state in nodepool, but for very different reasons
19:56:38 <anteaya> fungi: oh wonderful
19:56:39 <mrmartin> I was a bit late today, and missed the askbot topic. I'm on a good track on askbot-staging, but it seems that the latest app is broken and we need to solve that issue before we can move further.
19:56:45 <fungi> and with very different outcomes when you delete them :/
19:56:51 <anteaya> fungi: oh great
19:57:12 <anteaya> fungi: are you able to filter them based on state?
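
(One plausible shape for the filter fungi describes next: scrape `nodepool list` for nodes that have sat in the ready state too long. The table layout, column positions, and age units here are assumptions about the era's CLI output, not a stable interface.)

    import subprocess

    MAX_READY_HOURS = 2.0  # illustrative threshold

    out = subprocess.check_output(['nodepool', 'list']).decode()
    for row in out.splitlines():
        cols = [c.strip() for c in row.split('|')]
        if len(cols) < 4:
            continue  # skip the +----+ separator lines
        state, age = cols[-3], cols[-2]  # assumed: State and Age columns
        try:
            if state == 'ready' and float(age) > MAX_READY_HOURS:
                print(row)
        except ValueError:
            pass  # header row; age field is not numeric
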
19:57:18 <jeblair> pabelanger: exciting -- i expect it's probably not an issue that needs a lot of discussion (i think we probably agree it's a good idea), so might be a good sprint or workroom thing at the summit (to just knock out some dashboards or something)
19:57:39 <jeblair> mrmartin: good to know, thanks
19:57:40 <fungi> anteaya: i'm able to filter based on long enough to not be running jobs any more but still showing ready. also i restarted nodepoold to get around the other issue
19:58:16 <AJaeger_> fungi, is there anything you can tell SergeyLukjanov in case we run into this again during non-US hours?
19:58:21 <anteaya> fungi: seems to help a bit based on the graph
19:58:37 <fungi> AJaeger_: yes, get familiar with this stuff and troubleshoot it
19:58:43 <pabelanger> jeblair, sounds good to me
19:58:51 <zaro> jeblair: i completely forgot, we probably want this in before the upgrade https://review.openstack.org/#/c/176523/
19:59:17 <anteaya> zaro: add that to the etherpad please, it isn't there right now
19:59:32 <jeblair> zaro: is that based on the second version of my patch? (the first was wrong)
20:00:23 <jeblair> time's up, thanks everyone!
20:00:25 <jeblair> #endmeeting
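
(On fungi's zeromq theory above: nodepoold subscribes to build events published by each Jenkins master, and a dropped SUB connection fails silently -- the socket simply stops receiving. A minimal standalone listener sketch for eyeballing whether events still flow; the endpoint host and port are illustrative assumptions.)

    import zmq

    context = zmq.Context()
    socket = context.socket(zmq.SUB)
    socket.setsockopt(zmq.SUBSCRIBE, b'')  # subscribe to every topic
    socket.connect('tcp://jenkins01.openstack.org:8888')

    while True:
        # Each message is a topic-prefixed JSON blob describing a build event;
        # prolonged silence on a busy master suggests the publisher side dropped.
        print(socket.recv())
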