19:03:45 <mtaylor> #startmeeting
19:03:46 <openstack> Meeting started Tue Jun 5 19:03:45 2012 UTC. The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:00 <jeblair> thank goodness that made it in before the startmeeting.
19:04:10 <clarkb> jeblair: we were racing?
19:04:12 <mtaylor> who wants to talk about barrell racing?
19:04:34 <mtaylor> OR, I guess we can talk about CI stuff
19:04:36 * jeblair wants to know what a barrell is.
19:04:41 * mtaylor can't spell
19:05:02 <mtaylor> #topic zuul
19:05:02 <jeblair> bigger than a barrel, i'd imagine
19:05:15 <mtaylor> jeblair: so - you wanna tell folks all about the new hotness?
19:05:21 <jeblair> yeah, so zuul is in production, basically globally for openstack now
19:05:39 <jeblair> because of the interdependencies of all the projects, we can't phase it in, it's pretty much all or nothing.
19:05:55 <jeblair> I wrote a mailing list post about it, which you should receive in the next 6 hours if you haven't already
19:06:02 * mtaylor hands jeblair a large salmon
19:06:04 <jeblair> and a blog post here
19:06:04 <mtaylor> totally awesome
19:06:08 <jeblair> #link http://amo-probos.org/post/14
19:06:26 <jeblair> After rolling it out, it pretty much immediately started testing keystone changes in parallel
19:06:30 <jeblair> http://paste.openstack.org/show/18354/
19:06:35 <jeblair> that's what that looks like.
19:06:47 <jeblair> not to be outdone, 4 nova changes were tested in parallel shortly after that
19:06:51 <jeblair> http://paste.openstack.org/show/18357/
19:07:39 <mtaylor> things I like a) parallel testing b) dependent testing (yay for not running long-running tests if the quick ones don't pass)
19:07:58 <jeblair> i'm pretty sure the ssh connection is going to die at some point
19:08:14 <clarkb> so in that output the change at the top was tested with all of the changes below it merged in as well?
19:08:14 <jeblair> but that's a matter of waiting until that happens, and figuring out why from the debug messages.
19:08:27 <jeblair> clarkb: yep
19:08:41 <jeblair> clarkb: and only merged if they all passed (they did)
19:09:13 <deva> Cross project dependencies, even?
19:09:26 <jeblair> deva: yes and no...
19:09:37 <jeblair> yes in that the changes across dependent projects are sequenced
19:09:56 <jeblair> no in that you can not specify a change to one project must be tested with a change to another project
19:10:25 <jeblair> deva: it may be possible to do that if we can get the merge job behaving exactly like gerrit's merge check. it's something i plan on looking into.
19:11:05 <deva> Gotcha
19:11:47 <mtaylor> jeblair: should we do pep8 before unittests similar to how we do merge first now?
19:12:44 <jeblair> mtaylor: we could do that; the pep8 tests take a little longer since they're done in a tox venv
19:12:54 <mtaylor> jeblair: good point
19:13:43 <jeblair> mtaylor: also, unit tests can still be meaningful even if pep8 fails
19:13:51 <jeblair> (which isn't true for a failing merge test)
19:13:57 <mtaylor> indeed
19:14:12 <jeblair> so i think we'd at least want to keep the current setup for the check queue
19:14:29 <jeblair> let's look into how long the pep8 tests take before deciding to change the gate queue
19:14:31 <mtaylor> yeah - I can be on board with that
19:15:40 <jeblair> that's probably it for zuul
19:17:00 <mtaylor> cool
19:17:05 <mtaylor> lemme see ...
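The queue behavior jeblair demonstrates above can be sketched in a few lines of Python. This is not Zuul's actual code — run_tests and merge are hypothetical stand-ins, and real Zuul launches the test jobs for every queued change in parallel rather than one at a time — but it shows the dependent-queue idea: each change is tested on top of everything ahead of it, and a failure forces everything behind it to be retested without the failing change.

    # Hypothetical sketch of the dependent queue. run_tests(tree) and
    # merge(change) are illustrative stand-ins; the real system runs
    # these test jobs in parallel instead of sequentially.
    def gate(queue, run_tests, merge):
        ahead = []  # changes speculatively treated as already merged
        for i, change in enumerate(queue):
            if run_tests(ahead + [change]):
                ahead.append(change)
                merge(change)
            else:
                # Everything behind the failure was being tested against
                # a state that will now never exist, so requeue it
                # without the failing change.
                return gate(queue[i + 1:], run_tests, merge)

Called with the queue in approval order, this also captures the "only merged if they all passed" behavior clarkb asked about.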
19:17:16 <mtaylor> #topic gerrit changes
19:17:32 <mtaylor> Shrews, clarkb: how are we doing on our new gerrit features?
19:18:01 <Shrews> Work In Progress is ready, available on review-dev now.
19:18:26 <clarkb> and I think the first attempt at a better dashboard and list of "reviewable" changes is complete
19:18:42 <Shrews> As an enhancement, we'll soon be adding a new per-project permission so more people can use the WIP feature.
19:19:06 <Shrews> right now, only change submitter, branch owner, project owner, and admins can use it
19:19:31 <mtaylor> I think we should land both of your most recent changes, install those on review-dev to double-check ... and then release to review.openstack.org
19:19:40 <mtaylor> unless somebody thinks we should wait for Shrews' acl fix?
19:20:05 <Shrews> mtaylor: i see no reason to wait on it
19:20:35 <clarkb> I have no problems with it
19:20:38 <mtaylor> I think that gerrit 2.4 + dashboard are pretty compelling, and giving change owner ability to WIP is nice
19:20:52 <mtaylor> and might get us a little bit more real-world use of wip
19:21:03 <clarkb> I have a feeling the better priority sorting will take some time
19:21:18 <jeblair> how long do you think the acl will take?
19:21:19 <clarkb> and I haven't really dug into it yet, so don't wait
19:22:06 <jeblair> (because if it's not going to be too long, we may want to wait until we can announce the feature, and announce that -core developers can wip changes)
19:22:26 <mtaylor> that's a good point - Shrews? thoughts?
19:23:00 <Shrews> jeblair: i'm *hoping* this week
19:23:18 <Shrews> so we can hold off a couple of days if you want to see where i stand then
19:25:47 <clarkb> I was going to update puppet to land http://ci.openstack.org/tarballs/test/gerrit-2.4-11-gd4a0c4b.war on review-dev. Should I go ahead or will Shrews' change and my latest one be approved soon?
19:27:50 <mtaylor> I'm good with both changes landing
19:28:29 <clarkb> I can update puppet after they land then
19:28:33 <mtaylor> cool
19:28:54 <mtaylor> alright, let's hold off a couple of days before updating review and see how the acl changes go
19:30:02 <mtaylor> I think that's all the big-ticket topics for the moment ...
19:30:06 <mtaylor> #topic open discussion
19:30:44 <mtaylor> I'm trying to get the global dependency list stuff up and going (after realizing that we can use the update.py machinery in openstack-common to our advantage)
19:30:57 <mtaylor> and I got pure-nosetests changes done for nova and glance
19:31:10 <mtaylor> OH - I did something else I forgot about ... new pypi mirror code
19:31:40 <clarkb> LinuxJedi isn't here, but after cleaning up etherpad-lite's puppet module I think I may want a precise host instead of an oneiric host for that >_>
19:31:41 <mtaylor> pypi.openstack.org is created from all of the packages downloaded by pip-installing all of the requirements from all of the branches of all of our projects
19:31:45 <jeblair> mtaylor: re dependency list, is awesome -- basic idea to have the list in openstack-common, and use update.py to copy it into projects?
19:31:52 <mtaylor> jeblair: yes.
19:31:55 <mtaylor> jeblair: except
19:31:59 <LinuxJedi> clarkb: can't do that yet
19:32:05 <clarkb> LinuxJedi: darn, ok
19:32:09 <mtaylor> jeblair: we won't copy entries from the global list into the projects unless that depend is there first
19:32:11 <LinuxJedi> clarkb: since Rackspace doesn't give us Precise
19:32:31 <jeblair> and nosetests is awesome, except it outputs a lot of logging to console.
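On jeblair's logging complaint: one plausible direction — an assumption about the eventual fix, not necessarily the change vishy agreed to below — is nose's built-in logcapture plugin, which buffers log records and replays them only for failing tests; its --logging-clear-handlers option additionally strips whatever handlers the code under test attached, which is usually what spams the console.

    # Minimal sketch: run nose with log capture cleaning up stray
    # handlers. --logging-clear-handlers is a real nose option; whether
    # it is the fix discussed here is an assumption.
    import nose

    if __name__ == '__main__':
        nose.run(argv=['nosetests', '--logging-clear-handlers'])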
19:32:31 <mtaylor> so each project's list will be a subset of the global list ... but the versions will be tied...
19:32:36 <LinuxJedi> clarkb: unless mtaylor wants it on the SF HP Cloud account or something
19:32:49 <mtaylor> jeblair: yeah, I've gotta fix the nosetest output thing ... vishy said he was cool with our proposed change
19:32:52 <jeblair> LinuxJedi: i think precise images exist now.
19:32:57 <mtaylor> they do
19:33:00 <LinuxJedi> jeblair: ah, awesome
19:33:05 <mtaylor> we can spin up precise slaves via jclouds-plugin even
19:33:07 <LinuxJedi> clarkb: ok, scrap what I said ;)
19:34:03 <clarkb> LinuxJedi: if you can swap oneiric out for precise when you get back that would be awesome
19:34:24 <mtaylor> speaking of that ...
19:34:28 <mtaylor> #topic etherpad
19:34:34 <clarkb> I am still fiddling with it a little on my test box though. Not entirely sure logrotate is working the way I want it to
19:34:38 <mtaylor> should we talk about a transition plan?
19:34:51 <LinuxJedi> clarkb: sure, can I erase the oneiric one in the process or do you temporarily need both?
19:35:02 <clarkb> LinuxJedi: I do not need the oneiric box so erasing is fine
19:35:07 <LinuxJedi> cool
19:35:23 * LinuxJedi goes back to lurking and pretending to be not working on a public holiday ;)
19:35:27 <jeblair> clarkb: lovely puppet work, btw.
19:36:18 <clarkb> #link https://github.com/Pita/etherpad-lite/wiki/How-to-migrate-the-database-from-Etherpad-to-Etherpad-Lite
19:36:32 <clarkb> that link describes the technical process behind migrating
19:37:07 <clarkb> basically run a js script to dump the old DB then cat that back into the etherpad lite DB
19:37:23 <mtaylor> so we should be able to dry run the data migration a few times to make sure it's solid and see how long it takes
19:37:42 <LinuxJedi> clarkb: let me know if you need any more VMs for the dry runs
19:37:51 * LinuxJedi can spin up as many as you need
19:37:54 <clarkb> ok
19:37:57 <mtaylor> at that point, should just be a scheduled downtime and migration, yeah?
19:38:12 <mtaylor> are we close enough on it to be thinking about that? or am I jumping the gun?
19:38:49 <clarkb> probably jumping the gun a little, but yes if things look good after migrating a couple times we should be able to schedule a downtime and DNS cutover or however you want to actually flip the switch
19:39:17 <clarkb> does the CI team admin etherpad.openstack.org?
19:39:22 <mtaylor> ok. I'll just sit back on my haunches for a while
19:39:26 <LinuxJedi> clarkb: yes
19:39:32 <mtaylor> well, sort of
19:39:37 <mtaylor> we have the login to it :)
19:39:37 <LinuxJedi> clarkb: I can help you with a migration plan when ready
19:39:37 <clarkb> so access to the old DB shouldn't be a problem?
19:39:43 <LinuxJedi> clarkb: I have logins for everything
19:39:50 <clarkb> great
19:39:51 <mtaylor> LinuxJedi: has global root on the internet
19:40:01 <LinuxJedi> rm -rf /internet
19:40:35 <mtaylor> crap. now I can't work
19:40:46 <mtaylor> #topic open discussion
19:41:01 <mtaylor> anybody got anything else? questions? comments?
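Returning to the dependency-list design discussed above: the subset rule mtaylor describes is easy to state in code. The names here are illustrative, not the actual update.py interface.

    # Illustrative sketch of the sync rule: a project's requirement
    # list stays a subset of the global list, but any package it does
    # declare gets the globally pinned version.
    def sync_requirements(global_reqs, project_reqs):
        """Both arguments map package name -> version specifier."""
        return dict((name, global_reqs.get(name, spec))
                    for name, spec in project_reqs.items())

    # e.g. sync_requirements({'eventlet': '>=0.9.16', 'lxml': '>=2.3'},
    #                        {'eventlet': '>=0.9'})
    # -> {'eventlet': '>=0.9.16'}   # lxml is not copied in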
19:41:44 * LinuxJedi has had 2 days off this week and lots of non-public admin stuff this week so it will probably be a quietish week from me
19:42:23 <LinuxJedi> but I can fix everyone's problems as usual and I have a few things planned
19:42:25 <LinuxJedi> :)
19:42:28 <mtaylor> hehehe
19:42:41 <mtaylor> well, for the record, I did NOT break anything this weekend
19:42:49 <LinuxJedi> yay \o/
19:43:01 * LinuxJedi buys mtaylor a beer
19:43:20 <clarkb> are we fully recovered from the forkbombs?
19:43:32 <mtaylor> good question. actually...
19:43:40 <mtaylor> #topic multiprocess forkbombs
19:43:49 <mtaylor> we should probably talk about that for a sec just for the record
19:43:58 <jeblair> i think so, unless a test snuck in last night as i was merging the revert patch
19:44:22 <jeblair> #link https://wiki.jenkins-ci.org/display/JENKINS/ProcessTreeKiller
19:44:38 <jeblair> because of that, i believe that jenkins should have killed the processes that got out of control
19:45:07 <jeblair> on the two machines i could (eventually) log into, the processes in question had the correct environment for that to operate
19:45:12 <clarkb> is there any value in setting ulimits on the test VMs?
19:45:13 <jeblair> so i'm not sure why it didn't happen.
19:45:36 <mtaylor> someone was suggesting that the forkbomb was going so fast that perhaps the killer couldn't keep up
19:45:38 <jeblair> it may have been so pathologically bad that jenkins couldn't run that code.
19:46:19 <jeblair> perhaps, but that's a naive implementation of a process killer; it should do a complete pass and eventually kill the parent.
19:46:24 <jeblair> but i don't know how it's implemented in jenkins.
19:46:43 * mtaylor blames java
19:46:48 <jeblair> clarkb: we may want to look into that. or something with cgroups
19:47:32 <clarkb> I think the goal with ulimit/cgroups would be to keep the machine in a useable state for debugging?
19:47:40 <clarkb> and possibly give jenkins a better shot at cleaning things up
19:47:43 <jeblair> and probably look into the processtreekiller code to see what it's actually doing.
19:47:56 <mtaylor> jeblair: any further thoughts on the post-build action of cleaning up lurking processes?
19:48:33 <jeblair> mtaylor: my thoughts on that are disrupted by the processtreekiller -- if it was supposed to run but failed, i think there's probably nothing we can do from within jenkins to do the same thing.
19:49:22 <mtaylor> jeblair: good point
19:50:37 <Shrews> heh, it lists ALL processes and checks the env variables of each. ick
19:50:57 <mtaylor> wow, really? that's special
19:51:43 <jeblair> Shrews: better ideas?
19:52:14 <Shrews> jeblair: store list of pids? not sure without understanding jenkins code
19:52:55 <jeblair> jenkins spawns processes that can spawn processes that can spawn processes whose parents can die making the children be reparented to PID 1.
19:53:05 <jeblair> all of which happened yesterday
19:53:34 <jeblair> so i'm hard pressed to see a better way (other than using cgroups which isn't cross-platform)
19:54:14 <LinuxJedi> jeblair: still loving Jenkins? ;)
19:55:04 <jeblair> LinuxJedi: in my statement above, the processes i'm talking about are the test processes.
19:55:13 <LinuxJedi> ah, ok :)
19:55:22 <Shrews> eh, there could probably be some sort of central reporting system when a new child is spawned.
19:56:21 <mtaylor> well... I think that's about it for real this time
19:56:21 <jeblair> Shrews: I think what you're describing doesn't exist in unix.
19:56:26 <mtaylor> last thoughts?
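For reference, the mechanism Shrews found: Jenkins tags every process it starts with a BUILD_ID environment variable, and at the end of a build ProcessTreeKiller walks the whole process table and kills anything whose environment carries that id. A rough Python rendering of the idea — the real implementation is Java — shows exactly the race mtaylor raises: children forked after the table is listed are never seen.

    # Rough Linux rendering of the ProcessTreeKiller approach.
    import os
    import signal

    def kill_build_processes(build_id):
        needle = 'BUILD_ID=%s' % build_id
        for pid in [p for p in os.listdir('/proc') if p.isdigit()]:
            try:
                with open('/proc/%s/environ' % pid) as f:
                    env = f.read().split('\0')
            except (IOError, OSError):
                continue  # process already exited, or not ours to read
            if needle in env:
                # Single pass: anything forked after the listdir()
                # above is invisible, so a fast fork bomb can outrun it.
                os.kill(int(pid), signal.SIGKILL)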
19:56:28 <clarkb> Shrews: you should write a custom init just for jenkins hosts
19:56:44 <Shrews> jeblair: i'm thinking at the jenkins level.
19:56:48 <jeblair> perhaps we should use systemd.
19:57:20 <jeblair> Shrews: the processes we're talking about aren't spawned by jenkins, they're spawned by the test runner that we told jenkins to run.
19:57:33 <Shrews> jeblair: oh, well that is different indeed
19:58:19 <mtaylor> thanks everybody!
19:58:22 <mtaylor> #endmeeting
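A postscript on clarkb's ulimit suggestion: on the test slaves this could be as simple as capping RLIMIT_NPROC for the process that runs the tests, so a fork bomb hits EAGAIN instead of taking the whole slave down. The limit value and wrapper below are illustrative, not a tested configuration.

    # Illustrative only: fork, cap the child's process limit, then exec
    # the test command. RLIMIT_NPROC is checked against the user's total
    # process count and is inherited across fork/exec, so the runaway
    # tests stall while sshd and Jenkins stay usable for debugging.
    import os
    import resource

    def run_capped(argv, max_procs=512):
        pid = os.fork()
        if pid == 0:
            resource.setrlimit(resource.RLIMIT_NPROC,
                               (max_procs, max_procs))
            os.execvp(argv[0], argv)
        return os.waitpid(pid, 0)[1]  # child's exit status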