19:03:49 <jeblair> #startmeeting infra
19:03:50 <openstack> Meeting started Tue Nov 19 19:03:49 2013 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:54 <openstack> The meeting name has been set to 'infra'
19:03:58 <jeblair> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:04:01 <krtaylor> o/
19:04:40 <jeblair> #link http://eavesdrop.openstack.org/meetings/infra/2013/infra.2013-11-12-19.03.html
19:04:53 <jeblair> #topic Actions from last meeting
19:05:07 <jeblair> #action jeblair move tarballs.o.o and include 50gb space for heat/trove images
19:05:10 <jeblair> clarkb: etherpad?
19:05:32 <clarkb> etherpad-dev is dead, didn't get to etherpad.o.o yet. Plan to kill it post meeting
19:05:55 <clarkb> that work kept getting bumped for more important things over the week :/
19:06:33 <jeblair> clarkb: what are the things you want to double check before killing?
19:06:44 <clarkb> jeblair: that the db backups properly overlap
19:06:47 <jeblair> clarkb: (and more specifically, anything you need help/coordination with)
19:06:55 <clarkb> so that I don't lose any potentially useful data
19:07:03 <clarkb> I shouldn't need help with that
19:07:22 <jeblair> cool
19:07:31 <jeblair> #topic Trove testing (mordred, hub_cap)
19:07:55 <hub_cap> hey. nothing to report here. still getting caught up w/ trove
19:08:21 <hub_cap> going to touch base in the next few days tho regarding resuming progress :)
19:08:30 <jeblair> ok. makes sense, given the schedule of the past couple weeks.
19:08:33 <jeblair> hub_cap: sounds good!
19:08:44 <jeblair> #topic Tripleo testing (lifeless, pleia2)
19:09:02 <pleia2> working through some basics with dprince and derekh, nothing to report on the infra side at the moment though
19:10:31 <mordred> o/
19:10:34 <jeblair> #topic Wsme / pecan testing (sdague, dhellmann)
19:10:56 <jeblair> sdague: might be busy/afk today?
19:10:59 <clarkb> yup
19:11:02 <jeblair> dhellmann: ping
19:11:05 <clarkb> I do have an update from the d-g side though
19:11:07 <fungi> all week i think sdague said
19:11:10 <dhellmann> hi
19:11:20 <clarkb> to get the havana periodic/bitrot jobs in I have started my refactoring of the d-g jobs
19:11:40 <clarkb> I am trying to do one small step at a time for my sanity and for reviewers' sanity
19:11:50 <fungi> there was also a suggestion i think that sqlalchemy-migrate ought to get the same love as pecan and wsme as far as dependency gating goes?
19:12:01 <clarkb> but I think a path towards having better d-g job templates is emerging which should make the wsme jobs easy
19:12:07 <dhellmann> we are trying really hard to get rid of sqlalchemy-migrate
19:12:17 <fungi> that sounds like a better solution anyway ;)
19:12:38 <dhellmann> we have https://review.openstack.org/#/c/54333/ open for running tox tests for pecan to gate against wsme, ceilometer, and ironic
19:12:51 <mordred> well, we are - but I think that adding pecan/wsme-like gating to it would be good, based on the recent breakage
19:13:00 <dhellmann> ok, I wasn't aware of any breakage
19:13:16 <mordred> s-m made a release and it broke things :)
19:13:22 <dhellmann> more testing is better, I was just trying to help avoid extra work
19:14:28 <anteaya> mordred: s-m?
19:14:37 <clarkb> if the JJB job refactor ends up where I would like to have it end up it may not be a whole lot of extra work
19:14:43 <jeblair> anteaya: sqlalchemy-migrate
19:14:49 <anteaya> k
19:15:03 <jeblair> clarkb: does the jjb refactor conflict with https://review.openstack.org/#/c/54333/5 ?
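[Editor's note: the d-g job templating clarkb describes might look roughly like the following Jenkins Job Builder fragment. This is only a hedged sketch — the template, builder, and variable names here are invented for illustration and are not the actual openstack-infra configuration.]

```yaml
# Illustrative JJB job-template; all names and builders are hypothetical.
- job-template:
    name: 'check-tempest-devstack-vm-{branch-designator}'
    node: devstack-precise

    builders:
      - devstack-checkout
      - shell: |
          #!/bin/bash -xe
          export DEVSTACK_GATE_TEMPEST=1
          export ZUUL_BRANCH_OVERRIDE={branch-override}
          ./devstack-vm-gate-wrap.sh

# Instantiating the template for another branch is then a small, mechanical change:
- project:
    name: devstack-gate-jobs
    jobs:
      - 'check-tempest-devstack-vm-{branch-designator}':
          branch-designator: 'stable-havana'
          branch-override: 'stable/havana'
```

[With templates shaped like this, adding jobs for a new stable branch reduces to one new template instantiation plus the corresponding zuul layout.yaml entries, which appears to be the simplification clarkb is after.]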
19:15:23 <clarkb> jeblair: it shouldn't, I am talking about the d-g jobs specifically
19:15:28 <jeblair> clarkb: k
19:15:37 <clarkb> since they have grown very unwieldy and need better templating
19:16:34 <jeblair> clarkb: so your change would make it easier to add full devstack testing for wsme, which is additional to the tox jobs in 54333
19:16:41 <clarkb> yup
19:16:47 <jeblair> cool, makes sense
19:17:10 <jeblair> end-of-topic?
19:17:25 * dhellmann has nothing to add
19:17:30 <jeblair> dhellmann: thanks
19:17:36 * mordred thins refactor is great
19:17:39 <jeblair> clarkb: is etherpad still a separate topic, or already covered?
19:17:44 <clarkb> jeblair: covered
19:17:46 <jeblair> mordred: thin refactors are the best
19:17:53 <jeblair> #topic Retiring https://github.com/openstack-ci
19:18:11 <jeblair> who wanted to talk about this?
19:18:26 <fungi> we talked about it, but wanted to get your input too
19:18:26 <pleia2> someone stumbled upon it recently and it was confusing
19:18:55 <pleia2> want to at least see about deleting publications and gerrit-trigger-plugin
19:19:05 <pleia2> or close the whole thing down
19:19:17 <jeblair> pleia2: i think there is more history in publications that needs to be pushed into infra/pub
19:19:22 <clarkb> or, add a dummy project with a README that says go to openstack-infra instead
19:19:31 <jeblair> clarkb: like https://github.com/openstack-ci/moved-to-openstack-infra
19:19:32 <fungi> clarkb: that's already there
19:19:37 <clarkb> ah
19:19:54 <jeblair> pleia2: the existing historical publications need to be pushed and tagged so they show up
19:19:59 <jeblair> pleia2: then i think publications can go away
19:20:04 <pleia2> ok
19:20:08 <jeblair> that leaves gerrit-trigger-plugin
19:20:25 <fungi> jeblair: i thought we retained the history for pubs and then cleaned it, so the master branch still has those commits, they just need tags
19:20:45 <fungi> or better, separating into branches
19:20:50 <fungi> first
19:21:19 <jeblair> fungi: yeah, i think to be compatible with the new system, we need to make some new commits that move each pub into a top level and then tag those
19:22:08 <jeblair> gerrit-trigger-plugin is a genuine fork; i'm not sure if all our changes were upstreamed
19:22:35 <mordred> I don't believe they were
19:22:41 <jeblair> i think we need to figure out the status of that, and decide whether it's a useful historical artifact
19:23:22 <fungi> so it sounds like we have a couple bugs/action items out of this topic... branchify/tag the old pubs, and decide the fate of g-t-p
19:23:37 <jeblair> yep
19:24:06 <jeblair> i'm not in a position to volunteer for those right now, but will certainly do publications if no one gets to it first.
19:24:18 <pleia2> I could use some practice with tag/branch fun if someone will be available to answer questions as I go
19:24:27 <fungi> pleia2: i can help you on that
19:24:32 <clarkb> pleia2: happy to
19:24:40 <pleia2> ok cool, action me for digging into publications then
19:24:55 <jeblair> #action pleia2 add historical publications tags
19:25:00 <jeblair> pleia2: thanks!
19:25:19 <jeblair> #action jeblair file bug about cleaning up gerrit-git-prep repo
19:25:21 <jeblair> gah
19:25:25 <jeblair> #action jeblair file bug about cleaning up gerrit-trigger-plugin
19:25:35 <jeblair> #topic Savanna testing (SergeyLukjanov)
19:25:43 <jeblair> SergeyLukjanov: ping
19:25:51 <SergeyLukjanov> nothing interesting to report atm
19:25:52 <SergeyLukjanov> o/
19:25:58 <SergeyLukjanov> is it ok to make a job to build images using savanna-image-elements and publish them to tarballs.o.o?
19:26:05 <SergeyLukjanov> just to clarify
19:26:26 <jeblair> SergeyLukjanov: absolutely; i believe trove will need to do something similar
19:26:29 <jeblair> mordred, hub_cap: ^
19:26:31 <mordred> yah
19:26:33 <mordred> that's correct
19:26:38 <mordred> we'd like to generalize that
19:26:54 <SergeyLukjanov> yep, we talked about this need with hub_cap at summit
19:27:22 <SergeyLukjanov> and eventually it'll be great to run integration tests on these images...
19:27:40 <ruhe> SergeyLukjanov, would it be possible to run Savanna integration tests on those images before they get published?
19:28:05 <SergeyLukjanov> ruhe, we can publish master and release images I think
19:28:25 <SergeyLukjanov> and eventually add integration tests to the gate pipeline
19:28:39 <SergeyLukjanov> once we sort out 'hadoop vs. nested virt.'
19:28:46 <fungi> similar to how we do branch-tip, pre-release and release tarballs i expect
19:29:04 <SergeyLukjanov> so, nothing to add from my side, still hope to start creating CRs this week
19:29:44 <SergeyLukjanov> fungi, yep, it should work ok
19:30:06 <jeblair> SergeyLukjanov: ok cool, thanks
19:30:09 <jeblair> #topic Goodbye Folsom (ttx, clarkb, fungi)
19:30:18 <fungi> #link http://lists.openstack.org/pipermail/openstack-stable-maint/2013-November/001723.html
19:30:27 <fungi> i've officially eol'd the integrated projects
19:30:34 <mordred> woot
19:30:40 <pleia2> \o/
19:30:55 <fungi> still need to know what we're supposed to do (if anything) with things like devstack, grenade, oslo-i, manuals, tempest, reqs
19:30:57 <jeblair> yaaay!
19:31:09 <jeblair> fungi: kill them all
19:31:14 <clarkb> with fire!
19:31:24 <fungi> they have stable/folsom branches. tag then delete, same as the rest?
19:31:51 <jeblair> fungi: i believe that is the correct thing to do. note that many of them will have significant trouble landing patches to those branches now. :)
19:32:17 <fungi> and the topic more properly should have been goodbye folsom, hello havana since clarkb worked on getting the new stable/havana jobs in last week
19:32:18 <clarkb> definitely reqs, devstack and tempest
19:32:31 <clarkb> since we can't effectively test them on folsom anymore
19:32:42 <clarkb> oh and grenade, might as well do them all
19:32:51 <fungi> clarkb: i approved your job templating last night and it seems to have worked
19:33:11 <clarkb> yes we have periodic/bitrot jobs for havana now and changes to d-g are tested against havana and grizzly as well as master
19:33:33 <fungi> though dprince's grenade change is still grinding in the gate of doom
19:33:42 <fungi> #link https://review.openstack.org/57066
19:34:32 <fungi> (to move the base for master tests to havana instead of grizzly)
19:34:46 <jeblair> fungi, clarkb, dprince: thanks for this!
19:34:56 <jeblair> #topic Jenkins 1.540 upgrade (zaro, clarkb)
19:35:10 <dprince> fungi: I am actually a bit perplexed by my grenade failure there.
19:35:46 <dprince> fungi: can take that offline w/ sdague/dtroyer perhaps though
19:35:58 <fungi> dprince: i suspect it's just the volume of tempest tests being run against d-g coupled with the nondeterminism in the gate right now
19:36:02 <zaro> the reason we want to upgrade jenkins is that there was a major fix to reduce the number of threads by 75%
19:36:18 <sdague> dprince, there is a grenade patch that needs to land as well for that to work
19:36:25 <clarkb> and with recent jenkins trouble, we figure it is worth a shot to upgrade and run a jenkins that doesn't have so much overhead
19:36:27 <sdague> maurosr was working on it, I don't know its status
19:36:47 <anteaya> sdague: hello
19:36:49 <zaro> clarkb has latest jenkins on jenkins-dev.o.o
19:36:53 <dprince> fungi: thanks, sdague: can you point me to it... or in the right direction?
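[Editor's note: the "tag then delete" EOL procedure fungi mentions can be demonstrated locally. Below is a minimal sketch using a throwaway repository; in production the tag would be signed and pushed to Gerrit before the remote branch is removed, and the names here are illustrative.]

```shell
# Scratch repo standing in for a real project; the EOL tag preserves
# the branch tip so its history stays reachable after branch deletion.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git -c user.name=eol -c user.email=eol@example.com \
    commit -q --allow-empty -m "initial commit"
git branch stable/folsom

# Tag the branch tip, then delete the branch.
git tag folsom-eol stable/folsom
git branch -D stable/folsom

git tag -l folsom-eol          # the tag remains
git branch --list 'stable/*'   # the branch is gone (prints nothing)
```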
19:36:58 <clarkb> I upgraded jenkins-dev yesterday, it seems to be fine, but the jenkins test script in config is very old d-g and old zuul oriented
19:37:03 * maurosr reading
19:37:34 <fungi> and the release notes for new jenkins mention a lot of churn in parts of the api we use, i think
19:37:35 <clarkb> jeblair: fungi: mordred: any opinions on how we should test jenkins-dev? (they upgraded the ssh-slaves and credentials plugins that are bundled with jenkins and changed some of the permissions around node creation/update/delete)
19:37:47 <mordred> oy. that sounds like fun
19:37:56 <jeblair> clarkb: i think the api calls it exercises should still be about the same, right? add/remove nodes, etc...
19:38:14 <clarkb> jeblair: possibly. the actual calls are made in the zuul and d-g source code
19:38:23 <sdague> maurosr, this is the patch that takes the version specific upgrade scripts into that separate directory, instead of just based on the branch we're in in grenade
19:38:25 <jeblair> clarkb: ah, heh.
19:38:27 <clarkb> jeblair: so I think rewriting it to use nodepool is what we may need to do?
19:38:43 <clarkb> or, and this was my crazy idea this morning,
19:38:47 <anteaya> could we introduce a new jenkins master with the upgrade? and then move over from there?
19:38:54 <clarkb> maybe we can point prod nodepool at jenkins-dev?
19:39:07 <clarkb> I don't think prod nodepool at jenkins-dev will work due to ssh key mismatches
19:39:11 <maurosr> sdague: yup, I'm finishing it, I can submit it today (yesterday and friday I had some troubles, and the holiday didn't let me work on it)
19:39:34 <maurosr> but will submit it today for sure
19:39:47 <jeblair> clarkb: so maybe we need a nodepool-dev? ugh.
19:39:50 <maurosr> replacing the other one which cleans everything
19:39:52 <clarkb> anteaya: that is one possibility but we should be able to do an upgrade in place
19:40:03 <clarkb> jeblair: that was another thing I considered, possibly just run it on jenkins-dev
19:40:04 <fungi> clarkb: mordred has had success getting nodepool running on his laptop... maybe something similar could be pointed at jenkins-dev instead of needing a nodepool-dev server?
19:40:25 <jeblair> colocating the nodepool-dev service on jenkins-dev sounds fine
19:40:42 <fungi> oh, or locally installed on jenkins-dev itself. yeah, not a bad idea at all
19:40:59 <anteaya> clarkb k
19:41:25 <jeblair> clarkb: sound like a plan?
19:41:27 <clarkb> sure
19:41:38 <clarkb> zaro: any chance you want to work on the puppet to do that?
19:41:38 <jeblair> cool
19:42:02 <zaro> clarkb: can do.
19:42:09 <jeblair> clarkb, zaro: in fact, i think that may actually be how the old devstack-gate stuff was set up on jenkins-dev.
19:42:31 <clarkb> #action zaro setup dev nodepoold on jenkins-dev
19:42:39 <jeblair> #topic New devstack job requirements (clarkb)
19:43:28 <clarkb> one of the things that came out of adding havana d-g jobs was that we have two large classes of d-g jobs. There are d-g jobs that run against patches and d-g jobs that run periodically against the tip of $branch
19:44:09 <clarkb> I would like to propose that we require any new d-g jobs to supply both forms as two different templates so that when icehouse rolls around adding d-g jobs for it is as simple as updating zuul layout.yaml
19:44:36 <clarkb> with havana I was juggling a lot of missing pieces and I think staying on top of that through a cycle would be better
19:44:59 <jeblair> clarkb: i agree with the proposal in principle; i may want to see your refactor before agreeing to the specifics
19:45:00 <fungi> and i think the current state with regard to that is great now, after your last change went in
19:46:20 <clarkb> jeblair: that's fair, there is a little more work to coalesce the branch-specific check jobs with the rest of the check/gate jobs and stable branch periodic jobs with master periodic jobs
19:46:40 <clarkb> right now we have ~4 distinct classes of d-g job and I think I can roll that into two
19:46:53 <clarkb> so getting it to two is a good first step
19:47:56 <jeblair> cool; if there are any new d-g jobs while clarkb works on this, we should probably run those changes by him to make sure it fits with this work
19:48:09 <fungi> yes, agreed
19:48:26 <jeblair> #topic Increased space yet again on static.o.o (fungi)
19:48:39 <fungi> this was more of a public service announcement
19:48:52 <pleia2> time to add some disk space monitoring?
19:49:02 <jeblair> fungi: was this during the summit, or once again, afterwards?
19:49:18 <fungi> i increased the logs volume during the summit to 4tb, but then caught it again last week just before it filled up and pushed it to 5tb
19:49:28 <fungi> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=717&rra_id=all
19:49:36 <fungi> er
19:49:41 <fungi> #link http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=717&rra_id=all
19:50:06 <fungi> i confirmed the weekly deletion cron job is working as intended to expire 6-month-old content
19:50:21 <fungi> we're just on an ever-increasing treadmill of log collection
19:50:44 <jeblair> fungi: that's exciting.
19:50:54 <fungi> similarly, docs-draft is up to 400gb now
19:51:11 <clarkb> we have a giant firehose filling water balloons
19:51:24 <fungi> anyway, that's all i wanted to mention on the topic
19:51:35 <jeblair> fungi: i think we should make those volumes be at 50%
19:51:44 <fungi> jeblair: i'm happy to do that
19:51:50 <jeblair> fungi: cool, thanks
19:52:07 <fungi> #action fungi grow logs and docs-draft volumes so they're 50% full
19:52:09 <jeblair> i believe we had expected them to stabilize. we might have been wrong about that. :(
19:52:25 <clarkb> so I haven't been able to confirm this yet
19:52:48 <clarkb> but the nova logs exploded in size (~8MB compressed for n-cpu??) due to the iso8601 logging
19:53:02 <clarkb> fungi: any idea if that caused a significant uptick in log size?
19:53:27 <fungi> clarkb: it was hard to tell, but i'd expect compression to water down that increase if it's repetitive lines
19:53:51 <jeblair> hrm, total artifact size by zuul job would be a nice metric; the new log uploading system could report that.
19:54:20 <fungi> also, there was no obvious uptick in utilization on the volume, just a fairly steady linear progression there
19:54:28 <clarkb> gotcha
19:54:29 <jeblair> #topic Open discussion
19:54:31 * zaro has a meeting topic, sorry came in late.
19:54:48 <zaro> new jjb release
19:55:05 <zaro> anybody have issues with that?
19:55:11 <clarkb> oh ya, we had someone submit a bug requesting a new release and I am +1 on doing it
19:55:23 <pabelanger> Also wanted to remind people about https://review.openstack.org/#/c/56107/ maybe get some feedback about the packaging import
19:55:35 <clarkb> I do think we should get the new jjb-ptl group change in so that zaro and maybe others can do the releases
19:55:54 <clarkb> #link https://review.openstack.org/#/c/56823/
19:56:01 <jeblair> clarkb: jjb has a ptl?
19:56:04 <anteaya> I'm continuing to work in -neutron
19:56:15 <clarkb> jeblair: no, but read the change comments for why it was named that way :)
19:56:24 <clarkb> jeblair: mordred wanted consistency
19:56:49 * mordred doesn't feel strongly about it
19:56:53 <jeblair> clarkb: well, the group is named that way to remind us to keep it small. :)
19:57:09 <mordred> but I know we've pushed back on people before making a thing not called -ptl
19:57:13 <clarkb> yup
19:57:16 <clarkb> I am fine with the name
19:57:21 <jeblair> mordred: i'm not opposed to the name
19:57:22 <mordred> that said - perhaps -ptl is the wrong name for that role and -release-manager is a better name?
19:57:27 * mordred doesn't want to bikeshed too much
19:57:28 <jeblair> mordred: i'm not sold on the _idea_
19:57:33 <mordred> ah
19:57:35 <mordred> gotcha
19:57:56 <clarkb> jeblair: so I suggested it because openstack-ci-core has really taken a back seat in JJB reviews lately
19:58:09 <jeblair> clarkb: one of us has been on vacation
19:58:11 <clarkb> I think a subset of the jjb core group is in a better position to cut releases
19:59:50 <fungi> and that's about it for time
19:59:59 <jeblair> clarkb: the people who can make releases should definitely be a subset of the jjb core group. it currently is.
20:00:12 <jeblair> #endmeeting