19:03:32 #startmeeting infra
19:03:33 Meeting started Tue May 5 19:03:32 2015 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:38 The meeting name has been set to 'infra'
19:03:38 O/
19:03:44 #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:45 #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2015/infra.2015-04-28-19.04.html
19:03:52 o/
19:03:58 #topic Actions from last meeting
19:04:10 fungi check our cinder quota in rax-dfw
19:04:19 uh, yeah, not done
19:04:27 #action fungi check our cinder quota in rax-dfw
19:04:57 #topic Summit planning
19:05:15 thanks to folks who put things on https://etherpad.openstack.org/p/infra-liberty-summit-planning
19:05:48 mordred, SpamapS: do you think we ought to talk about infra-cloud at the summit?
19:05:56 yes
19:06:01 * anteaya reminds folks to enter their name in the top right corner of the etherpad
19:06:20 mordred: okay, should probably put something in there real quick like
19:06:21 SpamapS: I'm slammed this week - can you make an entry for that?
19:06:24 Is there new stuff to talk about since we last talked? Or just convey info?
19:06:27 me tries
19:07:19 greghaynes: i would like us to make a go/no-go decision. in my mind that means determining the scope of work and whether we have enough people lined up and ready for it
19:07:36 jeblair: I have put in a placeholder entry
19:07:40 Awesome
19:07:45 mordred: thx
19:08:14 i expect to translate that into an actual schedule this week
19:08:58 #topic Priority Efforts
19:09:41 this part of the meeting has started to get a little status-reporty, which i think generally we want to avoid, and instead focus on things we need to work through widely together
19:09:52 but giving these items priority
19:10:36 so i'm thinking we should ask people involved in the priority efforts to update the meeting agenda to flag that they have something to discuss; or else, i can ask at the start of the meeting...
19:10:41 how does that sound?
19:10:53 i think that's a great idea
19:11:00 Sounds good to me
19:11:03 like it
19:11:03 me too
19:11:03 also, shout now if you have a priority effort thing to discuss :)
19:11:06 wfm
19:11:10 makes the meeting less of a scramble to get through those and still have time for other incidental topics
19:11:14 can we talk about gerrit for saturday?
19:11:20 just to ensure we are ready?
19:11:25 wfm
19:11:30 anteaya: yep
19:11:33 also I can't be here on saturday, sorry
19:11:35 I have nothing to discuss, sorry (ie no progress)
19:12:00 yeah, with the gerrit upgrade coming up this weekend, talking about that is probably a great idea ;)
19:12:08 only have progress-report stuff for zanata, we're doing fine and have nothing to discuss more broadly
19:12:31 jeblair: sounds good to me (updating agenda if specific items need discussion)
19:12:45 oh, also i have a todo item to make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:12:58 jeblair: ++
19:12:59 #action jeblair make the priority effort gerrit topics more visible to reduce the need to shout for reviews in this meeting
19:13:19 #topic Priority Effort: Upgrading Gerrit Saturday May 9
19:13:28 that's a few days away!
19:13:43 we probably need to talk about 2.10.2 vs 2.10.3+
19:14:00 the main concern for latest 2.10.x was...
19:14:04 #link https://groups.google.com/forum/#!msg/repo-discuss/Kv4bWLESbQ4/-oSNbuTQwkUJ
19:14:44 so 2.10.2 has the sshd version we are using now
19:14:47 right?
19:14:54 someone running gerrit saw a lockup on 2.10.3 (which is also the one with the ssh lib that's supposed to solve our stream-events problem)
19:15:14 but their lockup seems afterward to have been likely unrelated
19:15:15 fungi: wait, there's a known fix for our stream-events problem?
19:15:35 #link https://issues.apache.org/jira/browse/SSHD-348
19:15:37 fungi: jeblair I think there was a supposed fix via MINA SSHD update to fix one bug
19:15:40 supposedly
19:16:19 isn't that the one that was introduced _after_ our version?
19:17:01 my recollection is that was introduced in 2.9.x, reverted in 2.9.y, fixed upstream, then reintroduced later... so all of that happens _after_ our gerrit version
19:17:24 perhaps. except that the stack traces i have show stream workers stuck in that same method
19:17:30 jeblair: ya there are three MINA SSHD versions at play. the one we use, the one that broke older 2.9, and the one 2.10.3 is using
19:17:31 which is why we are baffled by seeing the problem on our server (which ran something like 1.5 years with it only showing up once)
19:18:35 jeblair: correct, and I think this thread shows that all three have exhibited the problem
19:19:05 clarkb: yes, but possibly with varying degrees? and maybe it's more than one problem with a single manifestation?
19:19:42 i'm getting the sense from that thread that people think 2.10.2 is less error-prone in this regard than 2.10.3. does that seem right?
19:19:44 ya I think we see a common symptom across all of them that may be >= 1 bug with varying degrees of expression based on gerrit version
19:19:50 jeblair: I agree with that
19:20:17 zaro: what do you have staged for us on our 2.10 branch?
19:20:18 so, anyway, i guess the point being 2.10.2 has the same mina-sshd we're running, 2.10.3 has a much newer mina-sshd which might alleviate the problem we have. though the only thing i saw in that discussion was one person reporting a problem which might have been unrelated to the gerrit version
19:21:14 jeblair: sorry for the interruption and late response, I got pulled away by meatspace things. Yes I do think we should talk about infra cloud at the summit and I'm working on a patch to infra manual with the first rev of the docs that we can use to seed the discussion.
19:21:43 SpamapS, mordred: great, thanks!
19:22:50 okay, does anyone at this meeting know what version of gerrit we are poised to deploy on saturday?
19:23:38 2.10.2-23-g039a170 is what's running on review-dev
19:23:48 if i had to guess, i'd say that
19:24:03 https://review.openstack.org/#/c/155463/3/modules/openstack_project/manifests/review.pp
19:24:06 10.2.22
19:24:08 fungi: so that's 23 commits past .2, one of which may be the sshd upgrade
19:24:43 anteaya: or 22 commits
19:24:58 the patch is not in sync with the -dev server :(
19:25:02 http://git.openstack.org/cgit/openstack-infra/gerrit/log/?h=openstack%2F2.10.2
19:25:03 * clarkb is looking at git now
19:25:17 jeblair: ah thanks
19:25:22 i'm starting to get worried about this. i'm not at all sure we have our act together for saturday.
19:25:53 does anyone want to drive this effort?
19:25:56 it does not include the mina sshd change or 2.10.3
19:26:08 I can't since I can't be here on saturday, sorry
19:26:13 otherwise I would
19:26:23 zaro: ^ are you around?
19:27:06 anything other than a funeral and I'd change my plans
19:28:24 I can jump in as soon as I get this gearman plugin fix going
19:28:27 i can pick it up and run with it since zaro seems not to be around
19:28:29 clarkb: yes
19:28:59 zaro: see questions above, what version do we intend to upgrade Gerrit to on saturday? and does that include the 2.10.3 sshd changes?
19:29:01 zaro: welcome back
19:29:05 jeblair: i believe it's tip of stable-2.10
19:29:07 branch
19:29:19 ok so that would include the 2.10.3 changes
19:29:39 i think there was an update to jgit that we should get
19:29:43 zaro: why is review-dev running .23 and the proposal for review to run .22?
19:30:27 4 days to go, so not much time left to lock this down and retest on review-dev to be certain we're good for the window
19:31:08 jeblair: probably an error, i need to update
19:31:42 zaro: you mean you intend to upgrade review.o.o to .23?
19:32:19 let me review this. trying to map version number to change
19:33:03 zaro: i need your input on whether you think we should use the older or newer mina sshd, and also whether your proposed gerrit build includes the older or newer sshd
19:35:16 I think the version on review-dev is the one we want to go with. IIRC the SSHD problem that was reported wasn't a real problem, but let me confirm
19:35:42 zaro: well, it turned out to probably not be an sshd-related issue, but the reporter never updated to say when they retried the upgrade
19:35:52 i linked it earlier in the meeting
19:36:01 ahh cool.
19:36:34 what is your opinion on this change? https://gerrit-review.googlesource.com/#/c/67653/
19:36:48 so it's an unknown. we might upgrade to 2.10.3 and see gerrit freezing up on us, or we might not. we might upgrade to 2.10.3 and see the new mina-sshd solve our stream-events hang, or might not
19:37:06 that's awesome
19:37:10 clarkb, fungi: are you comfortable trying the newer mina sshd then?
19:37:43 I think so, it will likely be no worse than the current situation
19:37:52 zaro: so one last thing -- can you confirm that the .23 build has the newer mina sshd?
19:38:05 it's early in the cycle, we can do an emergency downgrade if needed
19:38:22 new versions usually come with unknown new bugs
19:38:37 fungi: schema changes might force us to stick with 2.10, so likely just a downgrade to a 2.10.2 equivalent by the time we notice the problem
19:38:50 yeah, that's what i was expecting
19:39:00 jeblair: yes.
19:39:09 we try 2.10.3 and if we have problems switch to 2.10.2
19:39:25 zaro: yes, it is in that build?
19:39:28 fungi: sounds good
19:39:54 who's around on saturday?
19:39:57 o/
19:40:05 i plan to be here for the duration
19:40:31 jeblair: ohh crap. i don't see .23 in tarballs. let me check that on the server
19:41:03 zaro: yeah, there's no 23rd change merged to the branch since 2.10.2
19:41:16 tip of that branch is 22
19:41:55 I am
19:41:56 I can be here for the first hour, but I need to leave at 1700
19:41:59 alright. that's a custom build of mine. probably testing something
19:42:11 I can be, if help is needed
19:42:43 pabelanger: having someone in channel to answer questions is helpful
19:42:47 zaro: okay, so what do you propose we install on saturday?
19:43:12 for sure .22 is it.
19:43:25 zaro: want to downgrade review-dev then?
19:43:31 yes, i can do that
19:43:35 k, thx
19:43:45 should we send out a reminder announcement?
19:43:49 yes
19:43:52 I think so
19:43:58 I can do that if you'd like
19:44:08 have puppet turned off on review-dev due to the required change for gerrit libs.
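The version-string confusion above comes down to how git describe names builds: "2.10.2-23-g039a170" means 23 commits past the nearest tag, ending at abbreviated commit 039a170, while the published openstack/2.10.2 branch tip carries only 22 merged changes. A minimal sketch of checking both, assuming a local clone of openstack-infra/gerrit and the upstream v-prefixed tag naming (v2.10.2):

import subprocess

def describe(repo, ref):
    """Return the git-describe string (<tag>-<commits since tag>-g<abbrev sha>) for a ref."""
    out = subprocess.check_output(['git', '-C', repo, 'describe', '--tags', ref])
    return out.decode().strip()

def commits_since(repo, tag, ref):
    """Count commits reachable from ref but not from tag."""
    out = subprocess.check_output(
        ['git', '-C', repo, 'rev-list', '--count', '%s..%s' % (tag, ref)])
    return int(out.decode().strip())

if __name__ == '__main__':
    repo = 'gerrit'  # assumed: path to a local clone of openstack-infra/gerrit
    branch = 'origin/openstack/2.10.2'
    print(describe(repo, branch))                  # branch tip, e.g. v2.10.2-22-g<sha>
    print(commits_since(repo, 'v2.10.2', branch))  # 22, per the discussion above

A "-23-" build that never appears in the branch log is therefore a local/custom build, which is exactly what the review-dev discrepancy turned out to be.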
19:45:00 will you do project renames also during the downtime or is that better for another separate slot?
19:45:22 #link https://review.openstack.org/#/c/172534/
19:45:33 separate
19:45:35 AJaeger_: i think we should leave it for another slot
19:45:46 i don't think we want to do anything during the saturday window except upgrade gerrit
19:45:49 sorry to come late, did we post the etherpad for the gerrit upgrade yet?
19:45:49 pleia2: that sounds great; when should we send the announcement?
19:45:59 #agreed downgrade review-dev.o.o to gerrit 2.10.2.22
19:46:00 #agreed upgrade review.o.o to gerrit 2.10.2.22
19:46:00 #info 2.10.2.22 is stable-2.10 branch tip, approximately equivalent to 2.10.3 and contains a newer mina sshd
19:46:21 #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:22 well here it is anyways #link https://etherpad.openstack.org/p/gerrit-2.10-upgrade
19:46:29 zaro: I put it in the agenda
19:46:37 anteaya: thanks
19:46:42 welcome
19:46:48 maybe send reminder announcements tomorrow and also friday?
19:46:57 jeblair: wfm
19:47:10 jeblair: +1
19:47:16 zaro: you're sure the mina-sshd upgrade from 2.10.3 is in our openstack/2.10.2 branch?
19:47:20 #action pleia2 send reminder announcements about gerrit upgrade wed may 6 and friday may 8
19:49:21 fungi: it's a downgrade, but yes the downgrade is there.
19:49:50 if that is true, it has invalidated all of my knowledge on the subject
19:50:22 yeah, i was asking about the mina-sshd _upgrade_ to 0.14.0 in gerrit 2.10.3
19:50:40 no, sorry, i meant it's got the 0.14 version in there.
19:50:54 zaro: do you know which commit pulls it in?
19:51:02 which is newer than the 0.9.whatever we're running
19:51:13 http://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff
19:51:27 https://git.openstack.org/cgit/openstack-infra/gerrit/commit/?h=openstack/2.10.2&id=e43b1b10b13e86f9c957175aca33d9c2ff592fff ?
19:51:30 ya ok
19:52:35 okay, i think we're all set then. anything else on this topic?
19:52:52 I'm looking forward to close connection
19:53:02 nope, let's do it
19:53:04 no
19:53:15 great, thanks everyone!
19:53:15 it'll be good to have behind us
19:53:32 #topic Open discussion
19:53:55 I think I have a mostly working version of the gearman plugin update change to push up
19:54:03 yay
19:54:04 will get that up for review as soon as the meeting is over
19:54:23 do we think that this will solve the problem?
19:54:38 the problem being lots of ready nodes and few in use
19:54:43 it should solve the node leaking problem, unsure how it will affect the other issues
19:54:49 okay thanks
19:55:05 i met with some enovance folks recently, and they want to help out with upstream infra -- they're going to start by pitching in on the puppet-openstackci / downstream puppet effort
19:55:20 anteaya: i think the bulk of the ready nodes aren't actually ready
19:55:20 wonderful
19:55:23 great!
19:55:24 I think fbo has already been pushing changes for that
19:55:28 fungi: ah
19:55:30 * clarkb has been trying to review when able
19:55:36 started work on grafyaml (yaml for grafana). Have some yaml validating and working on posting a dashboard right now. I'm sure there'll be some discussion about it, but figure we can talk about it next meeting / summit?
19:55:39 clarkb: yep!
19:56:02 pabelanger: have you put it on the etherpad?
19:56:15 anteaya: just after we started the meeting i spotted that there were nodes running jobs which nodepool thought were ready, so we may have been simultaneously struggling with the gearman race in jenkins and a zeromq publisher disconnect between nodepoold and the jenkins masters
19:56:22 even if you don't get a slot, put it there as a marker
19:56:34 anteaya, nothing yet. Will have to do that shortly
19:56:34 both have the effect of causing nodes to seem to stick around in a ready state in nodepool, but for very different reasons
19:56:38 fungi: oh wonderful
19:56:39 I was a bit late today, and missed the askbot topic. I'm on a good track on askbot-staging, but it seems that the latest app is broken and I need to solve that issue before we can move further.
19:56:45 and with very different outcomes when you delete them :/
19:56:51 fungi: oh great
19:57:12 fungi: are you able to filter them based on state?
19:57:18 pabelanger: exciting -- i expect it's probably not an issue that needs a lot of discussion (i think we probably agree it's a good idea), so might be a good sprint or workroom thing at the summit (to just knock out some dashboards or something)
19:57:39 mrmartin: good to know, thanks
19:57:40 anteaya: i'm able to filter based on being around long enough to not be running jobs any more but still showing ready. also i restarted nodepoold to get around the other issue
19:58:16 fungi, is there anything you can tell SergeyLukjanov in case we run into this again during non-US hours?
19:58:21 fungi: seems to help a bit based on the graph
19:58:37 AJaeger_: yes, get familiar with this stuff and troubleshoot it
19:58:43 jeblair, sounds good to me
19:58:51 jeblair: i completely forgot, we probably want this in before the upgrade https://review.openstack.org/#/c/176523/
19:59:17 zaro: add that to the etherpad please, it isn't there right now
19:59:32 zaro: is that based on the second version of my patch? (the first was wrong)
20:00:23 time's up, thanks everyone!
20:00:25 #endmeeting
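For reference on the ready-node triage discussed above (nodes reported "ready" for longer than any job could plausibly keep them waiting), a rough filtering sketch against the nodepool CLI could look like the following. This is only a sketch: the exact column names in the nodepool list table output are an assumption here and would need checking against the installed nodepool version.

import subprocess

# Assumed threshold: anything "ready" for more than a few hours is suspect.
STALE_HOURS = 3.0

def stale_ready_nodes():
    """Return rows from `nodepool list` that are 'ready' but older than STALE_HOURS.

    Assumes the prettytable-style output of `nodepool list`, with a 'State'
    column and an 'Age (hours)' column; adjust the header names if the
    running nodepool prints them differently.
    """
    out = subprocess.check_output(['nodepool', 'list']).decode()
    rows = [l for l in out.splitlines() if l.startswith('|')]
    header = [c.strip().lower() for c in rows[0].strip('|').split('|')]
    state_col = header.index('state')
    age_col = header.index('age (hours)')  # assumed column name
    stale = []
    for row in rows[1:]:
        cols = [c.strip() for c in row.strip('|').split('|')]
        if cols[state_col] == 'ready' and float(cols[age_col]) > STALE_HOURS:
            stale.append(cols)
    return stale

if __name__ == '__main__':
    for node in stale_ready_nodes():
        print(node)

As noted in the discussion, nodes can look stuck-ready for two very different reasons (the gearman race in jenkins versus a zeromq publisher disconnect between nodepoold and the jenkins masters), so a listing like this only identifies candidates; deleting them has different consequences depending on the cause.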