19:03:17 #startmeeting infra
19:03:17 o/
19:03:18 Meeting started Tue Mar 15 19:03:17 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:22 The meeting name has been set to 'infra'
19:03:24 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:29 #topic Announcements
19:03:32 O/
19:03:38 pleia2 is awesome and started this for us:
19:03:42 #link https://etherpad.openstack.org/p/infra-newton-summit-planning Newton Summit Planning
19:03:47 some people have already begun brainstorming ideas but we should get rolling on this and have some decisions in next week's meeting
19:03:54 it sounds like we'll get at least the space requested (4 workrooms, 3 fishbowls, a day of possibly shared sprint) but there may be more available if we know _very_ soon so i can still request it
19:03:57 o/
19:04:10 #info Please add Infra team summit session ideas to the Etherpad
19:04:18 #topic Actions from last meeting
19:04:20 o/
19:04:27 we have a couple!
19:04:32 jeblair Give the TC a heads up on "Create elections.openstack.org" spec
19:04:36 #link http://eavesdrop.openstack.org/meetings/tc/2016/tc.2016-03-08-20.01.log.html#l-93
19:04:51 i did it!
19:04:51 looks like that happened, they discussed it some and weighed in with positivity
19:05:07 we have a moment to discuss this further down in the specs approval
19:05:13 yolanda Boot a replacement review.openstack.org and communicate the new IP address and maintenance window in an announcement E-mail
19:05:18 #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/088985.html
19:05:18 it's done
19:05:20 that too happened
19:05:23 yes
19:05:41 thank you both! we're back to a starting count of 0 action items!
19:06:08 yolanda: also not sure if you saw, but the extra gerritbot is probably on that server and should be shut down if you get a chance
19:06:14 #topic Specs approval
19:06:16 i already stopped it
19:06:20 #link https://review.openstack.org/287577 PROPOSED: Create elections.openstack.org
19:06:55 #link https://review.openstack.org/292666 Amendment to elections spec to use governance.openstack.org instead
19:07:08 tonyb: jhesketh: these seem ready to go for a council vote now?
19:07:29 fungi: I believe so
19:07:32 jeblair: the tc seemed generally approving of having this grafted into a subtree of the governance site?
19:07:45 fungi: that was my interpretation of the meeting and review feedback
19:07:52 so i'm happy to see both of those go to voting together
19:08:17 #info Voting is open on the "Create elections.openstack.org" spec and "Publish election data to governance.o.o" update for it until 19:00 UTC on Thursday, March 17
19:08:54 #topic Priority Efforts
19:09:17 looks like we don't have any urgent priority effort updates this week so i'll skip this and see how many of the general topics we can get through
19:09:29 #topic Upgrade servers from ubuntu precise to ubuntu trusty (pabelanger)
19:09:30 i'd appreciate more reviews of https://review.openstack.org/#/c/239810/
19:09:42 related to puppet-openstackci priority effort
19:09:43 ohai
19:10:07 So, we still have servers running precise, and I figure I can offer my service to upgrade them to trusty
19:10:11 this is mostly for some bindep stuff
19:10:19 but also prep for 16.04
19:10:31 for the most part there are 3 servers atm
19:10:32 #link https://review.openstack.org/#/q/status:open+topic:trusty-upgrades Review topic for server upgrades to Ubuntu Trusty
19:10:38 any low hanging fruit?
19:10:47 I figure puppetdb.o.o could be the first one
19:10:49 is there something applying pressure for us to do so?
19:10:50 yeah, one caught us by unfortunate surprise this morning
19:11:11 with the other 2, lists.o.o and cacti.o.o, requiring migration discussion
19:11:18 that is, we were unknowingly validating our zuul configuration on trusty while running zuul in production on precise
19:11:34 jeblair: no, it just came up when talking with fungi about bindep stuff for ubuntu precise
19:11:53 so we could remove bare-precise
19:12:16 i know the openstackid-resources api split is also applying pressure to get openstackid-dev.o.o and openstackid.org upgraded from precise to trusty so that they can run a consistent framework with what the openstackid-resources.o.o server will need
19:12:33 do we still need precise nodes for stable branches?
19:12:49 but yeah, right now our options for testing precise are basically either to only do that in rackspace or to make a precise ubuntu-minimal work for dib imagery
19:13:06 precise hasn't been needed for stable testing since icehouse went eol almost a year ago
19:13:22 ok, so we're the last users?
19:13:26 that we are
19:13:30 Yup
19:13:31 that seems reason enough
19:13:46 i am now sufficiently motivated
19:13:56 So, the changes up are minimal puppet changes
19:14:09 with puppetdb we could afford to lose data
19:14:16 but lists and cacti we might need to migrate
19:14:32 pabelanger: agreed (except s/might//)
19:14:36 fungi: suggested moving the data to cinder to help transfer it between servers
19:14:48 not a bad idea
19:15:06 right, i was thinking cinder first, then move to new servers for those
19:15:28 cinder and trove make upgrades like this a lot faster
19:15:42 cacti is less of a problem in that a longish downtime is not critical
19:15:52 do we use trove with cacti?
19:16:11 i don't recall if cacti needs mysql at all
19:16:12 lists only has data in the filesystem; cacti has mysql and filesystem data
19:16:13 need to check
19:16:32 did someone say trove ;)
19:16:37 ahh, so yeah may also want to trove the cacti server if so
19:16:39 there is a mysqld running on cacti
19:17:06 i think that should be doable
19:17:17 but anyway, lengthy migration for cacti is not a serious problem. extended mailman outages on the other hand will be painful for the community
19:17:18 okay, I can look into that and get puppet code up
19:17:49 puppetdb should be easy to test, stand up a new one, validate reports post, then update dns
19:17:56 so at least putting mailman's persistent data on cinder would be awesome for keeping the cut-over shorter
19:18:01 suspend / delete old instance
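A minimal sketch of the cinder-based data move discussed above (the volume name, size, device path, mount point and data directory are illustrative assumptions, not the exact commands used):

    # create a volume for the service's persistent data and attach it to the old server
    openstack volume create --size 100 lists-data
    openstack server add volume lists.openstack.org lists-data
    # on the server: format, mount, and copy the data ahead of the cut-over
    mkfs.ext4 /dev/xvdb                      # device name depends on the cloud
    mount /dev/xvdb /srv/lists-data
    rsync -a /var/lib/mailman/ /srv/lists-data/
    # at cut-over the volume is detached and re-attached to the new trusty server,
    # so only a short final rsync of recent changes happens during the outage window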
19:18:20 anything else on this topic?
19:18:25 phabricator.o.o would be a great test target too
19:18:33 nothing now, I'll follow up offline
19:18:42 * craige would appreciate that :-)
19:18:50 craige: Yup, we should aim for trusty for that too
19:19:04 yeah, we have a lot of low-hanging fruit servers for this. a quick read of the global site manifest's testing platform comments will hopefully reveal which ones, or the puppetboard facts view will
19:19:08 It's required.
19:19:35 thanks pabelanger! i think this is a worthwhile effort and am looking forward to it
19:19:35 and a blocker to going live.
19:19:42 fungi: np
19:19:46 fungi, pabelanger: i don't mind taking some of them, i'd need a root to make the vm spin tho
19:20:21 rcarrillocruz: yep, we'll make sure there's at least one infra-root on hand for these as a shepherd. we can divide them up as needed
19:20:27 #topic Constraints - next steps? (AJaeger)
19:21:05 We have constraints enabled in gate and periodic jobs but not post and release jobs. To enable it everywhere, we need https://review.openstack.org/#/c/271585/ to enhance zuul-cloner. Thanks jesusaur for working on this.
19:21:13 I'm unhappy that the constraints changes did not finish so far and wanted to discuss next steps. Should we push this for Mitaka and have the projects that currently support constraints (at least cinder, glance, some neutron ones) fully support it? My preference is to declare this a Newton target and get zuul-cloner enabled but do further work - including proper announcements - only for Newton.
19:21:35 #link https://review.openstack.org/271585 Let zuul-cloner work in post and release pipelines
19:21:57 So, what are your thoughts on direction here for Mitaka and Newton?
19:23:16 I still find constraints confusing and have no thoughts to offer to this conversation
19:23:51 Basically what needs to be done to finish this AFAIU: 1) Get 271585 to pass the testsuite (too-large logs); 2) Update post/release jobs; 3) Document the setup and announce it so that projects can update their tox.ini files.
19:24:53 it does look like jesusaur is still working on it
19:24:53 We could also make the current setup a bit more robust - projects enabled it and used the same environment (working fine) in gate but then had no tarballs (post queue).
19:25:11 jesusaur: do you need additional assistance debugging the job failures on 271585?
19:25:24 AFAIU jesusaur has it testing locally, it fails in our gate due to too-large logs
19:25:52 that seems like an odd reason for a job to fail
19:26:32 that's what I remember from some #openstack-infra discussion but I might have misread
19:26:37 #link http://logs.openstack.org/85/271585/5/check/gate-zuul-python27/75de99c/console.html.gz#_2016-03-10_20_52_55_179 testrepository.subunit was > 50 MB of uncompressed data!!!
19:26:51 !!!
19:26:52 jeblair: Error: "!!" is not a valid command.
19:26:52 fungi, you're incredible!
19:27:07 yeah, it's a remarkably emphatic error message
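For reference, the failing check amounts to a size guard of roughly this shape (a sketch only; the real guard lives in the job's test wrapper script and the exact threshold handling may differ):

    # fail the run if the raw subunit stream exceeds 50 MB uncompressed
    SIZE_KB=$(du -k testrepository.subunit | cut -f1)
    if [ "$SIZE_KB" -gt 51200 ]; then
        echo "testrepository.subunit was > 50 MB of uncompressed data!!!"
        exit 1
    fi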
19:27:54 Hearing the enthusiasm ;) here, I suggest to declare it a Newton target...
19:28:11 so i guess the questions this raises are, 1. is the subunit output from zuul's unit tests too verbose or possibly full of junk? 2. is 50mb uncompressed a sane cap for subunit data?
19:28:45 they are certainly too verbose except in the case of a test failure :(
19:29:07 if someone gets time, a git blame on that to find why the check for uncompressed subunit data size was introduced would be handy as a data point (e.g., was it for keeping the subunit2sql database sane?)
19:29:12 fungi: the tests pass locally for the change, but they fail in the check pipeline due to either the size of the resulting subunit file or they hit an alarm and time out
19:29:41 fungi: istr it was actually just for uploading to the logserver
19:29:53 o hai
19:30:03 i think it's probably an appropriate limit for 'unit tests'.
19:30:11 but not so much for zuul's functional "unit" tests
19:30:51 granted, many of even our oldest projects' "unit" tests are more "functional" testing (i'm looking at you, oh great and venerable nova, performing database interactions)
19:31:15 yeah, but zuul starts and reconfigures several hundred times during its tests
19:31:30 there's a lot of startup boilerplate in there, along with full gearman debug logs
19:32:05 AJaeger: mitaka integrated rc is already nearly upon us, it does seem like newton will be when we have constraints more widely in use for things besides devstack-based jobs
19:32:58 i'm not sure what to do about the subunit cap. i definitely need the logging information locally, and i need it in the gate when we get those occasional 'only fails in the gate' bugs.
19:33:02 fungi, yes.
19:33:06 i don't need it in the gate on tests that pass
19:33:20 (though i do need it locally on tests that pass)
19:33:23 AJaeger: and i also agree that declaring open season on constraints integration in tox configs opens us up to a potential risk of breaking things like tarball jobs (as already seen on at least one project recently)
19:33:25 compress it, offer it up in the finished build artifacts?
19:33:40 bkero: we already do
19:34:00 so i wonder if we could prune the logs from tests that pass. i have no idea if/how that would be possible with testr
19:34:06 bkero: however the job checks the raw uncompressed size and complains, then fails on that, with the current implementation
19:34:16 fungi, so for Mitaka: Projects can use it but we advise against it - or if they do, ask them to triple check ;)
19:34:49 AJaeger: yes, i think it's still mostly experimental until we can let them use it on more than just check/gate type jobs
19:34:49 Oh, that seems incorrect O_o
19:35:02 bkero: what seems incorrect?
19:35:03 but we can't control it anyway, they just change their tox.ini without us
19:35:21 jeblair: compressing data, but checking uncompressed size
19:35:26 AJaeger: agreed, but we can tell them that when it breaks they get to keep both pieces
19:35:38 fungi ;)
19:35:49 bkero: if you check the compressed size, then it's not a check about how much output was produced, it's a check on how well the compression algorithm performed
19:36:43 granted, if your concern is how much storage you're using for your compressed logs, then checking compressed size does make more sense (which is why i suggested looking into the reason the check was introduced)
19:37:01 Yep, that's true. I don't know the context of what this check limitation is supposed to do. Not constrain resources, or check for discrepancies in the run
19:37:03 occasionally projects would output insane amounts of data into their subunit logs, this was to catch those occurrences. it's working. zuul outputs an insane amount of data. i find it useful.
19:37:18 without better history of why we started checking that, i'm hesitant to suggest blindly altering it
19:37:54 could we reduce the logging level of some of the modules, but keep DEBUG for zuul.* ?
19:38:20 jesusaur: i think the bulk of the data are zuul debug lines
19:38:54 jesusaur: gear.* might be significant
19:38:56 I think gear is also a contender for that top spot
19:39:41 are your primary concerns on this topic answered, AJaeger?
19:39:45 fungi, yes
19:40:03 jesusaur: heh, yeah, gear might exceed zuul
19:40:12 we can probably move zuul test log improvements to a review, ml thread or #openstack-infra
19:40:22 well, what would be useful...
19:40:24 thanks, fungi. we can move on. Unless you want to log that constraints are experimental besides devstack
19:40:28 unless anyone is finding it useful to continue brainstorming that in the meeting
19:40:36 is to have someone who understands testrepository willing to pitch in
19:42:04 #agreed Constraints via zuul-cloner for jobs outside check and gate type pipelines is not currently supported, and general use of constraints in tox should be implemented with care to avoid risk to other non-constraints-using tox jobs until support is fully implemented
19:42:46 anyone disagree with that? i can #undo and rework if needed
19:42:53 I do not disagree
19:43:08 otherwise we still have a couple more topics to fill the remaining 15 minutes
19:43:18 nicely formulated, fungi. +1
19:43:45 #topic Operating System Upgrades (anteaya)
19:43:54 #link https://etherpad.openstack.org/p/infra-operating-system-upgrades
19:43:54 #link https://etherpad.openstack.org/p/infra-operating-system-upgrades precise -> trusty -> xenial
19:43:58 thanks
19:43:59 #undo
19:44:00 Removing item from minutes:
19:44:10 so this came up last meeting and this meeting already
19:44:36 this is mostly a place to track what operating systems we are running and what we want to upgrade
19:44:41 this does seem like an extension of pabelanger's earlier topic
19:44:45 we have identified during pabelanger's item
19:44:47 yes
19:44:54 though including the forward-looking plans for ubuntu 16.04
19:45:00 that precise servers are ready to move to trusty
19:45:02 yes
19:45:08 also dib capabilities
19:45:14 and what the nodes have available
19:45:22 Agreed. I think getting trusty is a good base for xenial
19:45:29 and don't mind helping anteaya where possible
19:45:33 so I don't personally have anything to offer here, other than the etherpad
19:45:38 pabelanger: thanks
19:45:50 mostly trying to track decisions and status
19:46:11 I felt I was less effective than I could have been helping AJaeger with bare-trusty to ubuntu-trusty
19:46:18 having xenial for use in jobs is a definite first step. upgrading our servers from trusty to xenial (or from precise to xenial) should wait for at least a few months according to opinions expressed in earlier meetings
19:46:23 mostly because I didn't understand the large picture or status
19:46:33 fungi: yes
19:46:36 * AJaeger is grateful for everybody's help with that conversion
19:46:53 also clarkb has an item in the summit planning etherpad on operating systems
19:47:00 so it's a topic we are talking about
19:47:10 would mostly like to aggregate if we can
19:47:11 and also, yes, having _one_ image for jobs running on trusty provides us with an easier starting point for the coming trusty->xenial support mogration in our jobs
19:47:25 er, migration
19:47:48 the etherpad notes puppet 384 in xenial...
19:47:48 and if we don't like this etherpad, that is fine, as long as we have something
19:47:58 do we want to switch to distro packages for puppet?
19:48:02 or keep using puppetlabs?
19:48:05 jeblair: I got that from pabelanger's comment from last week's meeting
19:48:35 part of the complexity in the precise->trusty job migration across different branches came from having several precise images (bare, devstack, py3k/pypy) and a couple of trusty images (bare, devstack)
19:48:52 jeblair: anteaya: Ya, I was surprised to see xenial still using puppet 3.x
19:49:03 so there was an element of many<->many mapping
19:49:14 I'll defer to nibalizer and crinkle for the version to use on ubuntu
19:49:39 use latest 3.x from apt.puppetlabs.com until we're ready to migrate to 4
19:49:54 crinkle: can you add that to the etherpad?
19:49:56 i'm pretty sure we're going to want to use puppetlabs packages for at least as long as we're puppeting multiple distros in production, so that we don't have to deal with keeping everything sane across multiple simultaneous puppet versions
19:49:58 i think one of the advantages of using puppetlabs is that we can actually migrate os versions
19:50:02 as your recommendation?
19:50:03 fungi: right, that.
19:50:12 yep
19:50:16 anteaya: sure
19:50:20 crinkle: thanks
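For reference, "latest 3.x from apt.puppetlabs.com" on an Ubuntu node typically looks something like the following (the trusty codename is shown as an example; the actual bootstrap scripting may pin versions differently):

    # enable the puppetlabs apt repository for the node's release
    wget https://apt.puppetlabs.com/puppetlabs-release-trusty.deb
    dpkg -i puppetlabs-release-trusty.deb
    apt-get update
    # this repo carries the 3.x series; puppet 4 ships from a separate collection repo
    apt-get install -y puppet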
19:50:25 regarding puppet 4: We're not testing Infra puppet scripts for puppet 4 - should we start that?
19:50:35 AJaeger: no, we are. Fedora-23
19:50:42 pabelanger: ah...
19:50:46 so we get some coverage
19:50:50 great
19:50:56 AJaeger: there is a spec for that sorta
19:51:14 i think it's more about getting linting working
19:51:36 AJaeger: and part of the reason we missed fedora-22 in the gate
19:51:38 pabelanger: f23 doesn't test the majority of our code
19:51:52 nibalizer: right, but we could start to, with puppet apply, if needed
19:52:37 we should get a nonvoting puppet 4 trusty apply test
19:53:02 so as for anteaya's etherpad, i think that makes a fine place to coordinate an overview of what changes are migrating what servers from precise to trusty. the trusty to xenial migration for servers is still a ways out and we may find new things we want/need to plan for on the road to xenial anyway
19:53:12 puppet openstack team does some good testing around using straight gems, if we wanted to do that too
19:53:36 fungi: great thank you
19:53:39 we still have a few more minutes to get to the final topic on the agenda
19:53:47 if this is mostly addressed
19:53:51 also in that etherpad mordred has made some notes about dib and xenial
19:53:57 yup, let's move on, thank you
19:54:05 #topic StackViz deployment spec (timothyb89)
19:54:17 #link https://review.openstack.org/287373
19:54:39 right, so we're looking for preliminary feedback on the spec, particularly the change proposal and our build method
19:54:43 we've had some alternative methods suggested, so we'd like to know if there can be a consensus on the method in the spec (DIB) vs puppet vs something else
19:55:46 reviewers so far who provided feedback on that spec seem to have been AJaeger and pabelanger
19:56:04 I left comments about puppet, but haven't heard anybody else have an opinion
19:56:40 so the question is between adding puppetry to install stackviz and its dependencies, and pre-seed its data, vs using a dib element?
19:56:56 fungi: yes, exactly
19:57:13 it's true that we do similar things both ways currently
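To make the trade-off concrete, the dib route would be roughly an element with an install-phase script along these lines (the element name, paths and build steps here are hypothetical, not what the spec proposes), e.g. elements/stackviz/install.d/75-stackviz:

    #!/bin/bash
    # sketch: bake stackviz and its dependencies into the image at build time,
    # so jobs only feed it subunit/dstat data when they run
    set -e
    git clone https://git.openstack.org/openstack/stackviz /opt/stackviz
    cd /opt/stackviz
    pip install .
    npm install    # hypothetical asset-build step; the real element may differ

The puppet route would do the same installation and data pre-seeding from our manifests instead.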
19:57:44 one long-term vision is to continue reducing the amount of puppet we apply to our base images, in hopes of maybe eventually being able to not have to preinstall puppet on them at all
19:58:38 our reliance on puppet for our image builds is already pretty minor these days
19:59:03 we have a lot of orchestration being done in dib elements themselves, by nodepool ready scripts, and at job runtime
19:59:36 anyway, i guess people with a vested interest in seeing how this is done, please follow up on the spec
19:59:45 my only real concern is that if we use dib elements, we are lacking gate testing of stackviz
19:59:50 we're basically out of meeting time for this week
20:00:09 thanks everybody! time to make way for the tc
20:00:14 #endmeeting