19:03:17 #startmeeting infra
19:03:17 o/
19:03:18 Meeting started Tue Mar 15 19:03:17 2016 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:22 The meeting name has been set to 'infra'
19:03:24 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:29 #topic Announcements
19:03:32 O/
19:03:38 pleia2 is awesome and started this for us:
19:03:42 #link https://etherpad.openstack.org/p/infra-newton-summit-planning Newton Summit Planning
19:03:47 some people have already begun brainstorming ideas but we should get rolling on this and have some decisions in next week's meeting
19:03:54 it sounds like we'll get at least the space requested (4 workrooms, 3 fishbowls, a day of possibly shared sprint) but there may be more available if we know _very_ soon so i can still request it
19:03:57 o/
19:04:10 #info Please add Infra team summit session ideas to the Etherpad
19:04:18 #topic Actions from last meeting
19:04:20 o/
19:04:27 we have a couple!
19:04:32 jeblair Give the TC a heads up on "Create elections.openstack.org" spec
19:04:36 #link http://eavesdrop.openstack.org/meetings/tc/2016/tc.2016-03-08-20.01.log.html#l-93
19:04:51 i did it!
19:04:51 looks like that happened, they discussed it some and weighed in with positivity
19:05:07 we have a moment to discuss this further down in the specs approval
19:05:13 yolanda Boot a replacement review.openstack.org and communicate the new IP address and maintenance window in an announcement E-mail
19:05:18 #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/088985.html
19:05:18 it's done
19:05:20 that too happened
19:05:23 yes
19:05:41 thank you both! we're back to a starting count of 0 action items!
19:06:08 yolanda: also not sure if you saw, but the extra gerritbot is probably on that server and should be shut down if you get a chance
19:06:14 #topic Specs approval
19:06:16 i already stopped it
19:06:20 #link https://review.openstack.org/287577 PROPOSED: Create elections.openstack.org
19:06:55 #link https://review.openstack.org/292666 Amendment to elections spec to use governance.openstack.org instead
19:07:08 tonyb: jhesketh: these seem ready to go for a council vote now?
19:07:29 fungi: I believe so
19:07:32 jeblair: the tc seemed generally approving of having this grafted into a subtree of the governance site?
19:07:45 fungi: that was my interpretation of the meeting and review feedback
19:07:52 so i'm happy to see both of those go to voting together
19:08:17 #info Voting is open on the "Create elections.openstack.org" spec and "Publish election data to governance.o.o" update for it until 19:00 UTC on Thursday, March 17
19:08:54 #topic Priority Efforts
19:09:17 looks like we don't have any urgent priority effort updates this week so i'll skip this and see how many of the general topics we can get through
19:09:29 #topic Upgrade servers from ubuntu precise to ubuntu trusty (pabelanger)
19:09:30 i'd appreciate more reviews of https://review.openstack.org/#/c/239810/
19:09:42 related to puppet-openstackci priority effort
19:09:43 ohai
19:10:07 So, we still have servers running precise, and I figure I can offer my service to upgrade them to trusty
19:10:11 this is mostly for some bindep stuff
19:10:19 but also prep for 16.04
19:10:31 for the most part there are 3 servers atm
19:10:32 #link https://review.openstack.org/#/q/status:open+topic:trusty-upgrades Review topic for server upgrades to Ubuntu Trusty
19:10:38 any low hanging fruit?
19:10:47 I figure puppetdb.o.o could be the first one
19:10:49 is there something applying pressure for us to do so?
19:10:50 yeah, one caught us by unfortunate surprise this morning
19:11:11 with the other 2, lists.o.o and cacti.o.o, requiring migration discussion
19:11:18 that is, we were unknowingly validating our zuul configuration on trusty while running zuul in production on precise
19:11:34 jeblair: no, it just came up when talking with fungi about bindep stuff for ubuntu precise
19:11:53 so we could remove bare-precise
19:12:16 i know the openstackid-resources api split is also applying pressure to get openstackid-dev.o.o and openstackid.org upgraded from precise to trusty so that they can run a consistent framework with what the openstackid-resources.o.o server will need
19:12:33 do we still need precise nodes for stable branches?
19:12:49 but yeah, right now our options for testing precise are basically either to only do that in rackspace or to make a precise ubuntu-minimal work for dib imagery
19:13:06 precise hasn't been needed for stable testing since icehouse went eol almost a year ago
19:13:22 ok, so we're the last users?
19:13:26 that we are
19:13:30 Yup
19:13:31 that seems reason enough
19:13:46 i am now sufficiently motivated
19:13:56 So, the changes up are minimal puppet changes
19:14:09 with puppetdb we could afford to lose data
19:14:16 but lists and cacti we might need to migrate
19:14:32 pabelanger: agreed (except s/might//)
19:14:36 fungi: suggested moving the data to cinder to help transfer it between servers
19:14:48 not a bad idea
19:15:06 right, i was thinking cinder first, then move to new servers for those
19:15:28 cinder and trove make upgrades like this a lot faster
19:15:42 cacti is less of a problem in that a longish downtime is not critical
19:15:52 do we use trove with cacti?
19:16:11 i don't recall if cacti needs mysql at all
19:16:12 lists only has data in the filesystem; cacti has mysql and filesystem data
19:16:13 need to check
19:16:32 did someone say trove ;)
19:16:37 ahh, so yeah may also want to trove the cacti server if so
19:16:39 there is a mysqld running on cacti
19:17:06 i think that should be doable
19:17:17 but anyway, lengthy migration for cacti is not a serious problem. extended mailman outages on the other hand will be painful for the community
19:17:18 okay, I can look into that and get puppet code up
19:17:49 puppetdb should be easy to test, stand up a new one, validate reports post, then update dns
19:17:56 so at least putting mailman's persistent data on cinder would be awesome for keeping the cut-over shorter
19:18:01 suspend / delete old instance
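A minimal sketch of the cinder-based data move discussed above (the volume name, size, device path, mount point and data directory are illustrative assumptions, not the exact commands used):

    # create a volume for the service's persistent data and attach it to the old server
    openstack volume create --size 100 lists-data
    openstack server add volume lists.openstack.org lists-data
    # on the server: format, mount, and copy the data ahead of the cut-over
    mkfs.ext4 /dev/xvdb                      # device name depends on the cloud
    mount /dev/xvdb /srv/lists-data
    rsync -a /var/lib/mailman/ /srv/lists-data/
    # at cut-over the volume is detached and re-attached to the new trusty server,
    # so only a short final rsync of recent changes happens during the outage window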
19:18:20 anything else on this topic?
19:18:25 phabricator.o.o would be a great test target too
19:18:33 nothing now, I'll follow up offline
19:18:42 * craige would appreciate that :-)
19:18:50 craige: Yup, we should aim for trusty for that too
19:19:04 yeah, we have a lot of low-hanging fruit servers for this. a quick read of the global site manifest's testing platform comments will hopefully reveal which ones, or the puppetboard facts view will
19:19:08 It's required.
19:19:35 thanks pabelanger! i think this is a worthwhile effort and am looking forward to it
19:19:35 and a blocker to going live.
19:19:42 fungi: np
19:19:46 fungi, pabelanger: i don't mind taking some of them, i'd need a root to make the vm spin tho
19:20:21 rcarrillocruz: yep, we'll make sure there's at least one infra-root on hand for these as a shepherd. we can divide them up as needed
19:20:27 #topic Constraints - next steps? (AJaeger)
19:21:05 We have constraints enabled in gate and periodic jobs but not post and release jobs. To enable it everywhere, we need https://review.openstack.org/#/c/271585/ to enhance zuul-cloner. Thanks jesusaur for working on this.
19:21:13 I'm unhappy that the constraints changes did not finish so far and wanted to discuss next steps. Should we push this for Mitaka and have the projects that currently support constraints (at least cinder, glance, some neutron ones) fully support it? My preference is to declare this a Newton target and get zuul-cloner enabled but do further work - including proper announcements - only for Newton.
19:21:35 #link https://review.openstack.org/271585 Let zuul-cloner work in post and release pipelines
19:21:57 So, what are your thoughts on direction here for Mitaka and Newton?
19:23:16 I still find constraints confusing and have no thoughts to offer to this conversation
19:23:51 Basically what needs to be done to finish this AFAIU: 1) Get 271585 to pass the testsuite (too-large logs); 2) Update post/release jobs; 3) Document the setup and announce it so that projects can update their tox.ini files.
19:24:53 it does look like jesusaur is still working on it
19:24:53 We could also make the current setup a bit more robust - projects enabled it and used the same environment (working fine) in gate but then had no tarballs (post queue).
19:25:11 jesusaur: do you need additional assistance debugging the job failures on 271585?
19:25:24 AFAIU jesusaur has it testing locally, it fails in our gate due to too-large logs
19:25:52 that seems like an odd reason for a job to fail
19:26:32 that's what I remember from some #openstack-infra discussion but I might have misread
19:26:37 #link http://logs.openstack.org/85/271585/5/check/gate-zuul-python27/75de99c/console.html.gz#_2016-03-10_20_52_55_179 testrepository.subunit was > 50 MB of uncompressed data!!!
19:26:51 !!!
19:26:52 jeblair: Error: "!!" is not a valid command.
19:26:52 fungi, you're incredible!
19:27:07 yeah, it's a remarkably emphatic error message
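For reference, the failing check amounts to a size guard of roughly this shape (a sketch only; the real guard lives in the job's test wrapper script and the exact threshold handling may differ):

    # fail the run if the raw subunit stream exceeds 50 MB uncompressed
    SIZE_KB=$(du -k testrepository.subunit | cut -f1)
    if [ "$SIZE_KB" -gt 51200 ]; then
        echo "testrepository.subunit was > 50 MB of uncompressed data!!!"
        exit 1
    fi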
19:27:54 Hearing the enthusiasm ;) here, I suggest to declare it a Newton target...
19:28:11 so i guess the questions this raises are, 1. is the subunit output from zuul's unit tests too verbose or possibly full of junk? 2. is 50mb uncompressed a sane cap for subunit data?
19:28:45 they are certainly too verbose except in the case of a test failure :(
19:29:07 if someone gets time, a git blame on that to find why the check for uncompressed subunit data size was introduced would be handy as a data point (e.g., was it for keeping the subunit2sql database sane?)
19:29:12 fungi: the tests pass locally for the change, but they fail in the check pipeline due to either the size of the resulting subunit file or they hit an alarm and time out
19:29:41 fungi: istr it was actually just for uploading to the logserver
19:29:53 o hai
19:30:03 i think it's probably an appropriate limit for 'unit tests'.
19:30:11 but not so much for zuul's functional "unit" tests
19:30:51 granted, many of even our oldest projects' "unit" tests are more "functional" testing (i'm looking at you, oh great and venerable nova, performing database interactions)
19:31:15 yeah, but zuul starts and reconfigures several hundred times during its tests
19:31:30 there's a lot of startup boilerplate in there, along with full gearman debug logs
19:32:05 AJaeger: mitaka integrated rc is already nearly upon us, it does seem like newton will be when we have constraints more widely in use for things besides devstack-based jobs
19:32:58 i'm not sure what to do about the subunit cap. i definitely need the logging information locally, and i need it in the gate when we get those occasional 'only fails in the gate' bugs.
19:33:02 fungi, yes.
19:33:06 i don't need it in the gate on tests that pass
19:33:20 (though i do need it locally on tests that pass)
19:33:23 AJaeger: and i also agree that declaring open season on constraints integration in tox configs opens us up to a potential risk of breaking things like tarball jobs (as already seen on at least one project recently)
19:33:25 compress it, offer it up in the finished build artifacts?
19:33:40 bkero: we already do
19:34:00 so i wonder if we could prune the logs from tests that pass. i have no idea if/how that would be possible with testr
19:34:06 bkero: however the job checks the raw uncompressed size and complains, then fails on that, with the current implementation
19:34:16 fungi, so for Mitaka: Projects can use it but we advise against it - or if they do, ask them to triple check ;)
19:34:49 AJaeger: yes, i think it's still mostly experimental until we can let them use it on more than just check/gate type jobs
19:34:49 Oh, that seems incorrect O_o
19:35:02 bkero: what seems incorrect?
19:35:03 but we can't control it anyway, they just change their tox.ini without us
19:35:21 jeblair: compressing data, but checking uncompressed size
19:35:26 AJaeger: agreed, but we can tell them that when it breaks they get to keep both pieces
19:35:38 fungi ;)
19:35:49 bkero: if you check the compressed size, then it's not a check about how much output was produced, it's a check on how well the compression algorithm performed
19:36:43 granted, if your concern is how much storage you're using for your compressed logs, then checking compressed size does make more sense (which is why i suggested looking into the reason the check was introduced)
19:37:01 Yep, that's true. I don't know the context of what this check limitation is supposed to do. Not constrain resources, or check for discrepancies in the run
19:37:03 occasionally projects would output insane amounts of data into their subunit logs, this was to catch those occurrences. it's working. zuul outputs an insane amount of data. i find it useful.
19:37:18 without better history of why we started checking that, i'm hesitant to suggest blindly altering it
19:37:54 could we reduce the logging level of some of the modules, but keep DEBUG for zuul.* ?
19:38:20 jesusaur: i think the bulk of the data are zuul debug lines
19:38:54 jesusaur: gear.* might be significant
19:38:56 I think gear is also a contender for that top spot
19:39:41 are your primary concerns on this topic answered, AJaeger?
19:39:45 fungi, yes
19:40:03 jesusaur: heh, yeah, gear might exceed zuul
19:40:12 we can probably move zuul test log improvements to a review, ml thread or #openstack-infra
19:40:22 well, what would be useful...
19:40:24 thanks, fungi. we can move on. Unless you want to log that constraints are experimental besides devstack
19:40:28 unless anyone is finding it useful to continue brainstorming that in the meeting
19:40:36 is to have someone who understands testrepository willing to pitch in
19:42:04 #agreed Constraints via zuul-cloner for jobs outside check and gate type pipelines is not currently supported, and general use of constraints in tox should be implemented with care to avoid risk to other non-constraints-using tox jobs until support is fully implemented
19:42:46 anyone disagree with that? i can #undo and rework if needed
19:42:53 I do not disagree
19:43:08 otherwise we still have a couple more topics to fill the remaining 15 minutes
19:43:18 nicely formulated, fungi. +1
19:43:45 #topic Operating System Upgrades (anteaya)
19:43:54 #link https://etherpad.openstack.org/p/infra-operating-system-upgrades
19:43:54 #link https://etherpad.openstack.org/p/infra-operating-system-upgrades precise -> trusty -> xenial
19:43:58 thanks
19:43:59 #undo
19:44:00 Removing item from minutes:
19:44:10 so this came up last meeting and this meeting already
19:44:36 this is mostly a place to track what operating systems we are running and what we want to upgrade
19:44:41 this does seem like an extension of pabelanger's earlier topic
19:44:45 we have identified during pabelanger's item
19:44:47 yes
19:44:54 though including the forward-looking plans for ubuntu 16.04
19:45:00 that precise servers are ready to move to trusty
19:45:02 yes
19:45:08 also dib capabilities
19:45:14 and what the nodes have available
19:45:22 Agreed. I think getting trusty is a good base for xenial
19:45:29 and don't mind helping anteaya where possible
19:45:33 so I don't personally have anything to offer here, other than the etherpad
19:45:38 pabelanger: thanks
19:45:50 mostly trying to track decisions and status
19:46:11 I felt I was less effective than I could have been helping AJaeger with bare-trusty to ubuntu-trusty
19:46:18 having xenial for use in jobs is a definite first step. upgrading our servers from trusty to xenial (or from precise to xenial) should wait for at least a few months according to opinions expressed in earlier meetings
19:46:23 mostly because I didn't understand the large picture or status
19:46:33 fungi: yes
19:46:36 * AJaeger is grateful for everybody's help with that conversion
19:46:53 also clarkb has an item in the summit planning etherpad on operating systems
19:47:00 so it's a topic we are talking about
19:47:10 would mostly like to aggregate if we can
19:47:11 and also, yes, having _one_ image for jobs running on trusty provides us with an easier starting point for the coming trusty->xenial support mogration in our jobs
19:47:25 er, migration
19:47:48 the etherpad notes puppet 384 in xenial...
19:47:48 and if we don't like this etherpad, that is fine, as long as we have something
19:47:58 do we want to switch to distro packages for puppet?
19:48:02 or keep using puppetlabs?
19:48:05 jeblair: I got that from pabelanger's comment from last week's meeting
19:48:35 part of the complexity in the precise->trusty job migration across different branches came from having several precise images (bare, devstack, py3k/pypy) and a couple of trusty images (bare, devstack)
19:48:52 jeblair: anteaya: Ya, I was surprised to see xenial still using puppet 3.x
19:49:03 so there was an element of many<->many mapping
19:49:14 I'll defer to nibalizer and crinkle for the version to use on ubuntu
19:49:39 use latest 3.x from apt.puppetlabs.com until we're ready to migrate to 4
19:49:54 crinkle: can you add that to the etherpad?
19:49:56 i'm pretty sure we're going to want to use puppetlabs packages for at least as long as we're puppeting multiple distros in production, so that we don't have to deal with keeping everything sane across multiple simultaneous puppet versions
19:49:58 i think one of the advantages of using puppetlabs is that we can actually migrate os versions
19:50:02 as your recommendation?
19:50:03 fungi: right, that.
19:50:12 yep
19:50:16 anteaya: sure
19:50:20 crinkle: thanks
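For reference, "latest 3.x from apt.puppetlabs.com" on an Ubuntu node typically looks something like the following (the trusty codename is shown as an example; the actual bootstrap scripting may pin versions differently):

    # enable the puppetlabs apt repository for the node's release
    wget https://apt.puppetlabs.com/puppetlabs-release-trusty.deb
    dpkg -i puppetlabs-release-trusty.deb
    apt-get update
    # this repo carries the 3.x series; puppet 4 ships from a separate collection repo
    apt-get install -y puppet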
19:50:25 regarding puppet 4: We're not testing Infra puppet scripts for puppet 4 - should we start that?
19:50:35 AJaeger: no, we are. Fedora-23
19:50:42 pabelanger: ah...
19:50:46 so we get some coverage
19:50:50 great
19:50:56 AJaeger: there is a spec for that sorta
19:51:14 i think it's more about getting linting working
19:51:36 AJaeger: and part of the reason we missed fedora-22 in the gate
19:51:38 pabelanger: f23 doesn't test the majority of our code
19:51:52 nibalizer: right, but we could start to, with puppet apply, if needed
19:52:37 we should get a nonvoting puppet 4 trusty apply test
19:53:02 so as for anteaya's etherpad, i think that makes a fine place to coordinate an overview of what changes are migrating what servers from precise to trusty. the trusty to xenial migration for servers is still a ways out and we may find new things we want/need to plan for on the road to xenial anyway
19:53:12 puppet openstack team does some good testing around using straight gems, if we wanted to do that too
19:53:36 fungi: great thank you
19:53:39 we still have a few more minutes to get to the final topic on the agenda
19:53:47 if this is mostly addressed
19:53:51 also in that etherpad mordred has made some notes about dib and xenial
19:53:57 yup, let's move on, thank you
19:54:05 #topic StackViz deployment spec (timothyb89)
19:54:17 #link https://review.openstack.org/287373
19:54:39 right, so we're looking for preliminary feedback on the spec, particularly the change proposal and our build method
19:54:43 we've had some alternative methods suggested, so we'd like to know if there can be a consensus on the method in the spec (DIB) vs puppet vs something else
19:55:46 reviewers so far who provided feedback on that spec seem to have been AJaeger and pabelanger
19:56:04 I left comments about puppet, but haven't heard anybody else have an opinion
19:56:40 so the question is between adding puppetry to install stackviz and its dependencies, and pre-seed its data, vs using a dib element?
19:56:56 fungi: yes, exactly
19:57:13 it's true that we do similar things both ways currently
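To make the trade-off concrete, the dib route would be roughly an element with an install-phase script along these lines (the element name, paths and build steps here are hypothetical, not what the spec proposes), e.g. elements/stackviz/install.d/75-stackviz:

    #!/bin/bash
    # sketch: bake stackviz and its dependencies into the image at build time,
    # so jobs only feed it subunit/dstat data when they run
    set -e
    git clone https://git.openstack.org/openstack/stackviz /opt/stackviz
    cd /opt/stackviz
    pip install .
    npm install    # hypothetical asset-build step; the real element may differ

The puppet route would do the same installation and data pre-seeding from our manifests instead.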
19:57:44 one long-term vision is to continue reducing the amount of puppet we apply to our base images, in hopes of maybe eventually being able to not have to preinstall puppet on them at all
19:58:38 our reliance on puppet for our image builds is already pretty minor these days
19:59:03 we have a lot of orchestration being done in dib elements themselves, by nodepool ready scripts, and at job runtime
19:59:36 anyway, i guess people with a vested interest in seeing how this is done, please follow up on the spec
19:59:45 my only real concern is that if we use dib elements, we are lacking gate testing of stackviz
19:59:50 we're basically out of meeting time for this week
20:00:09 thanks everybody! time to make way for the tc
20:00:14 #endmeeting