19:03:46 #startmeeting infra
19:03:47 Meeting started Tue Apr 30 19:03:46 2013 UTC. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:03:48 ~o~
19:03:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:03:51 The meeting name has been set to 'infra'
19:03:53 isn't that just the next ptl yelling at you
19:04:06 to get off
19:04:15 o/
19:04:43 o/
19:05:00 heyoh!
19:05:25 o/
19:05:26 o/
19:05:40 mordred: are you above 10,000 feet yet? :)
19:05:55 o/
19:06:15 #topic items from last meeting
19:06:24 fungi: slaves?
19:06:28 yup
19:06:42 rhel6 slaves have been replaced by centos6 slaves and destroyed/deleted
19:07:06 the change to switch the node labels (i added temporary compatibility labels ahead of time) is in the process of merging
19:07:14 cool
19:07:36 it also adds some periodic centos6 bitrot jobs so dprince can ferret out the remainder of patches needing backports
19:07:55 and i disabled and eventually deleted the precise unit test slaves too
19:08:09 since those hadn't been used for a month or more
19:08:10 w00t
19:08:29 i believe the stable 2.6 tests are the last thing that holds us back from dropping oneiric?
19:08:32 i think that's the current status on slave versions at the moment
19:08:42 jeblair: yes
19:09:04 fungi: sounds good!
19:09:09 yeah, dprince is working on sorting the last few patches he needs for that
19:09:12 oh
19:09:30 also i cleaned up our puppeting to make it easier to add debian slaves should we want/need to do so later
19:09:53 or sooner...
19:09:54 that segues us into "what do we do about >= 13.04 having only 9 months of support?"
19:09:59 good, i like having a plan b
19:10:04 someone on the tc (hint mordred) needs to strike up the conversations around all that
19:10:20 #topic jenkins slave operating systems
19:10:32 rackspace does have a raring image now fwiw
19:10:32 merp
19:10:35 * clarkb checks hpcloud
19:10:36 oh good
19:10:41 mordred: you had some thoughts about that last week; want to chat about those?
19:11:15 the idea was that since we're not supposed to break rhel/lts
19:11:22 no raring on hpcloud
19:11:36 that we use lts nodes to do testing of stable+1
19:12:03 mordred: my concern with that is while we are not supposed to break lts, how do we know we haven't?
19:12:04 and maintain our focus otherwise on current ubuntu release for dev purposes of master
19:12:15 mordred: by stable+1 do you mean two releases back? e.g. folsom now?
19:12:25 yes. stable+1 == folsom now
19:12:41 good thing our precise unit test slaves needed to be rebuilt on rackspace nova anyway, so deleting them was not wasted work
19:12:51 and I'd say that since stable branches are really the purview of the distros and they've pledged to support things on their lts release
19:12:54 (well, all but 2 of the 16 anyway)
19:13:00 there's a clear ownership for problems
19:13:39 Daviey: you around? does the above sound reasonable to you?
19:13:48 mordred: what do we use to test 2.6 on master?
19:13:57 jeblair: centos
19:14:06 centos
19:14:17 that problem exists even without the 9 month support
19:14:28 or hell - debian apparently has all pythons :)
19:14:55 and i assume lts for stable+1 means the lts which was current at the time stable+1 was developed/released, not whatever the latest lts is (which might not be the same occasionally)
19:15:19 i'm wondering why we should bother testing on non-lts at all? (which i think is pretty similar to what clarkb is saying?)
19:15:21 mordred: i wouldn't say all, but wheezy will have 2.6, 2.7 and 3.2 in main, with 3.3 in backports
19:15:34 jeblair: correct
19:15:35 fungi: with Ubuntu's cloud archive, latest LTS is where you would find stable+1
19:15:39 jeblair: because our devs focus on current release, not on lts
19:15:45 and for good reason
19:16:11 https://wiki.ubuntu.com/ServerTeam/CloudArchive
19:16:16 ttx: not necessarily a new version of python
19:16:44 mordred: sure, but your plan is to bump testing of a project that was tested on latest back to lts at a more or less arbitrary point in time
19:16:50 wheezy is probably not a good target either, unless you want to shuffle it again soon
19:17:19 mordred: so if that's going to work, why won't just testing on the lts to start with work?
19:17:27 jeblair: we could test on both the whole time and then just drop "latest" when it dies
19:17:35 sequencing, I believe
19:17:43 new things go into master, tested against latest ubuntu
19:17:59 we have been really bad at changing platforms for various reasons
19:18:01 they will be backported to cloud archive for lts, but probably not until they've landed on master I'd imagine
19:18:14 new features are not landed against stable branches
19:18:16 we are only just recently on quantal (~6 months after release) and devstack is still all precise
19:18:25 so the needs for preemptive backporting don't exist
19:18:39 if we have to iterate that quickly just to drop support 3 months later that feels like a lot of wasted effort to me
19:19:03 clarkb makes a good point that most of this is theory and not practice
19:19:06 wait what are you guys trying to do?
19:19:21 zul: you guys' new support lifecycle broke ours
19:19:22 well, i consider the time i spent getting tests running on quantal will be applicable toward getting them running on raring anyway
19:19:34 zul: our stable branches need testing for 12 months
19:19:43 mordred: how?
19:19:44 but ubuntu only now exists for 9 months at a pop
19:19:47 mordred: sounds like revenge :)
19:19:52 ttx: likely :)
19:20:11 zul: so we're trying to sort out how to test changes to stable+1 branches
19:20:24 mordred: we wouldn't get revenge ;)
19:20:39 now we could just automagically switch to $newrelease when they come out and break the gate
19:20:41 mordred: why not 12.04 with the cloud archive enabled
19:20:47 because
19:20:49 zul: we can't continue to test grizzly/raring when raring goes out of support, and we need it for 12 months, and you now provide 9 instead of 18
19:21:00 zul: sorry, that's what I was proposing
19:21:05 with the expectation that a week after ubuntu releases we spend a couple days fixing all the things
19:21:16 mordred: that sounds sane to me
19:21:18 but if we try to put things in place working I think we will always be well behind the curve
19:21:29 clarkb: you mean for master?
19:21:32 mordred: yes
19:21:34 * fungi dislikes the kind of scramble "taking a couple days to fix all the things breaking the gate" implies
19:21:45 clarkb: I think our problem has historically been the slow speed our cloud vendors have in providing us images
19:21:48 well all of the stuff is gotten from pip isn't it?
19:21:55 zul: libvirt
19:22:04 and python itself
19:22:10 mordred: clarkb, fungi, and dprince have put a huge amount of effort into upgrading to quantal, and we're not done yet
19:22:14 which, btw, is broken on redhat (python itself)
19:22:15 mordred: libvirt is not a problem, the cloud archive gets the same version that's in the development release
19:22:21 jeblair: agree
19:22:36 mordred: my issue is that if we switch to raring at the end of havana (as we switched to quantal after grizzly) then we have only 3 months of support on that before we drop back to LTS
19:22:52 clarkb: we shouldn't have waited that long
19:22:55 so it's not just the stable branches that are at issue. we *will* have to iterate much faster than we have been able to
19:22:59 jeblair: yeah that
19:23:20 well, problem 1 is that it takes so long for us to be able to _start_ migrating
19:23:26 the worry is that changing the test platform in the middle of a release cycle introduces more churn than desired?
19:23:31 but - I'm willing to not die on this hill
19:23:54 jlk: it is a lot of churn that we give up on shortly after (to me the benefits are fuzzy but the costs are known and expensive)
19:24:02 nod
19:24:09 if everyone else thinks that lts+cloud-archive is sane for master, then fine... I just worry that we're going to hit backport hell like we had 2 years ago
19:24:30 and the flip side is that continuing to test on a dead-end platform isn't providing much benefit in the Real World?
19:24:32 but it's also possible that we've stabilized and I'm being an old curmudgeon
19:25:01 mordred: you shouldn't since you guys are getting the python dependencies from pypi
19:25:29 zul: yeah. and I think libxml and libvirt are reasonably sane from an api perspective at this point
19:25:40 I'm game
19:25:42 mordred: if we could get rackspace and hp to commit to having images available in a decent amount of time (thank you rackspace for raring) I think we could try speeding the cycle up
19:25:43 mordred: hah libvirt sane
19:25:51 clarkb: HAHAHAHAHAHAHAHAHAHA
19:25:51 the desire is for something newer than LTS, but with longer support than Ubuntu now has
19:25:55 mordred: exactly
19:26:04 clarkb: HAHAHAHAHAHAHAHAHAHA
19:26:07 sorry
19:26:11 I repeated myself
19:26:11 if we are beholden to other people it is really hard to promise with such a short window
19:26:35 yes. I believe the story will be a bit different once we have glance api endpoints available, but they are not now
19:26:37 clarkb: is it just Ubuntu images you're in need of, or would fast access to other platforms (like Fedora) help as well?
19:26:47 jlk: right now it's just ubuntu
19:26:48 jlk: fedora would not be helpful
19:26:54 from my perspective, the desire is for something which has the versions of system components we need (python et al) and decent run time for security support into the future
19:27:00 ok. (just trying to understand the problem scope)
19:27:18 jlk: when people talk about cloud interop - this is one of the things that doesn't get talked about enough
19:27:32 image availability, or platform to run on?
19:27:33 jlk: the TC decided in january that we would test on latest ubuntu with an eye for not breaking current RHEL and Ubuntu LTS
19:27:34 without image upload ability, we're stuck waiting until BOTH clouds upload new images
19:27:46 nod
19:28:01 we are going to use centos to test python 2.6 as ubuntu has ditched 2.6. That covers current RHEL
19:28:07 is somebody working on image upload capability on the RAX side? (understanding that it's not a problem for raring at this time)
19:28:24 yes. both clouds want it as a feature
19:28:27 so now we need to accommodate testing on current ubuntu or ask the TC to reconsider the platforms we test on or ????
19:28:46 well - the policy has always been dev on latest ubuntu
19:28:57 however, I do not believe we have EVER actually been able to do that
19:29:01 due to lag time
19:29:20 we've pretty consistently been at least one release behind
19:29:30 https://etherpad.openstack.org/python-support-motion
19:29:31 * jlk cries a little to himself, softly.
19:29:36 #link https://etherpad.openstack.org/python-support-motion
19:29:36 so it might just be time to call a spade a spade and go with a new plan
19:30:00 distro making release + all providers of interest making images available of that release + time for us to test and fix things we need working on that release + time to switch tests over to it
19:30:23 technically the tc agreed on a motion about targeting development. it wasn't specific about exactly what test platforms, but i think the intent is that we should use it to guide what we do
19:30:33 I agree
19:30:58 however, I betcha we could do the lts+cloud archive for testing
19:31:06 as a vehicle to support that motion
19:31:10 +1
19:31:17 and have few enough corner cases that no one would ever notice
19:31:41 mordred: it sounds like that's worth a shot, and if we get into dependency hell, then we know we have a reason to speed up the treadmill
19:31:43 since cloud archive has latest ubuntu backports of the relevant bits
19:31:48 jeblair: ++
19:32:00 so....
19:32:14 we may want to add an apt mirror for the cloud archive
19:32:23 as I do not believe our providers are doing local mirrors of it
19:32:49 or maybe it doesn't matter?
19:32:56 seems like Ubuntu is going to be trying to do that work anyway (keeping OStack releases going on LTS) so making use of that effort makes sense to me.
19:33:11 if we're going to that trouble, it seems sane to just mirror what we need in general (rackspace's ubuntu mirror has gone down from time to time too)
19:33:22 fungi: ++
19:33:38 fungi: I have a reprepro config for it already actually
19:34:00 mordred: you want to drop that in puppet then?
19:34:06 we'd need cloud-local mirrors
19:34:21 which I'm not 100% sure how to solve - but I'll put my brainhole on it
19:34:47 i continue to wonder if cloud-local (one per az or whatever) mirrors don't also make sense for our pypi mirroring
19:35:01 mordred: i think devstack-gate can accommodate that fairly easily
19:35:08 mordred: that'd be our own instances acting as a mirror, because of fear that the provider-provided mirror might go down?
19:35:16 jlk: correct
19:35:17 mordred: (in the image creation step, do provider-specific apt-source config)
19:35:27 jlk: and they do go down occasionally
19:35:46 jeblair: yes. although I'd like to figure it out for unittest slaves too
19:35:58 clarkb: but doesn't that just cause them to hit the next mirror (maybe more slowly)?
19:36:09 forgive me, I come from the yum world
19:36:13 jeblair: besides you know who to bug if something breaks ;)
19:36:17 jlk: the bigger issue is, the more external bits we rely on being reachable for tests, the more their outages multiply each other (multiple points of failure rather than just a single point of failure)
19:36:22 mordred: actually, if you solve it for unit test slaves in puppet, you might not have to do anything special for devstack-gate.
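A minimal sketch of the per-provider apt-source idea being discussed here, assuming hypothetical cloud-local mirror hostnames and a simple provider lookup; in practice this logic would presumably live in puppet during the image-creation step rather than a standalone script. The cloud archive line follows the documented pocket layout, but everything else is illustrative only.

```python
#!/usr/bin/env python
# Sketch: choose an apt mirror based on which cloud provider a slave runs
# in, and add an Ubuntu Cloud Archive pocket for the lts+cloud-archive plan.
# Mirror hostnames and the provider lookup are hypothetical placeholders.

PROVIDER_MIRRORS = {
    # provider name -> base URL of a (hypothetical) cloud-local Ubuntu mirror
    "rackspace": "http://mirror.rackspace.example.org/ubuntu",
    "hpcloud": "http://mirror.hpcloud.example.org/ubuntu",
}

CLOUD_ARCHIVE = "http://ubuntu-cloud.archive.canonical.com/ubuntu"


def sources_list(provider, release="precise", openstack_release="grizzly"):
    """Return sources.list contents for the given provider and release."""
    mirror = PROVIDER_MIRRORS.get(provider, "http://archive.ubuntu.com/ubuntu")
    lines = [
        "deb %s %s main universe" % (mirror, release),
        "deb %s %s-updates main universe" % (mirror, release),
        "deb %s %s-security main universe" % (mirror, release),
        # Cloud Archive pocket providing newer OpenStack deps on the LTS
        "deb %s %s-updates/%s main" % (CLOUD_ARCHIVE, release, openstack_release),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(sources_list("rackspace"))
```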
19:36:25 jlk: no, apt usually complains then dies
19:36:31 clarkb: awesome :/
19:36:48 mordred: (even though unit test slaves don't strictly need it right now because they are one-provider)
19:36:48 fungi: I understand that. What I didn't quite grasp was that apt doesn't have a mirror system to fail through
19:37:00 jeblair: let me poke and see if there is a pure-puppet mechanism I can dream up
19:37:05 jeblair: ++
19:37:31 #action mordred set up per-provider apt mirrors (incl cloud archive) and magic puppet config to use them
19:37:52 jlk: yeah apt expects to try one url to retrieve a package, and then errors out rather than continuing to spend time trying other urls
19:38:14 sad
19:38:36 well, the alternative is to take lots of time to realize your network is broken and it's not a mirror issue
19:38:38 so really, doing our own is just moving the potential problem closer to us
19:38:43 so, er, we're dropping the quantal slaves and going back to precise?
19:39:00 jlk: which tends to work out for us
19:39:10 jeblair: I really want to say no, because quantal has 18 months of support
19:39:16 fungi: *shrug* in the yum world that could be a matter of seconds or so. But you never suffer from a single mirror being out of date or down.
19:39:16 but doing quantal then going back to precise is just weird
19:39:47 when is the next ubuntu lts due?
19:40:01 14.04
19:40:11 one year
19:40:15 * fungi thinks doing quantal and then upgrading to wheezy doesn't sound *that* weird ;)
19:40:40 (we may want to move on to other topics before our hour is up)
19:40:47 concur
19:40:52 +1
19:40:53 I think we have a good handle on the problem with a general solution. we can sort out details later
19:41:00 hrm
19:41:02 details are important?
19:41:15 clarkb: i think you just said that we have decided to test "I" on precise
19:41:21 details are important, but consensus may not be reached during meeting.
19:41:26 definitely but so are things like gerrit 2.6, lists.o.o, logstash, etc :)
19:41:36 clarkb: and we're planning on testing "H" on either quantal or precise?
19:41:47 jeblair: that is how I grok
19:42:13 i think that's kind of an important point to resolve so we don't go off-track...
19:42:38 will 14.04 be available in time for the "ifoo" development timeframe, or not until "jbar"?
19:42:43 if you want, we can punt to the next meeting for time, but i don't want to start work on this project without resolving that.
19:43:03 jeblair: ++ I don't intend on things changing until we have consensus
19:43:38 okay, i'll put this on the agenda for next time then
19:43:53 however, the agenda wasn't updated since last time, so, what else would you like to discuss? :)
19:44:23 whoops. I know gerrit 2.6, lists.o.o, and logstash are things that are on my radar
19:44:28 bunnies
19:44:30 does anyone have anything specifically for me? i need to duck out early (another minute or two)
19:44:35 nope
19:44:46 #topic gerrit2.6
19:44:48 also I'd like to talk about testr maybe as we should really push that hard before people get wary of merging those changes
19:44:59 I believe zaro is going to start looking at 2.6
19:45:08 * fungi ducks out. back in #-infra later if something comes up
19:45:10 just started reading docs.
19:45:19 awesome
19:45:25 zaro: can you find-or-create a bug in openstack-ci about upgrading to 2.6 and assign it to yourself?
19:45:35 jeblair: sure will.
19:45:54 as I understand it the intent with gerrit 2.6 is to no longer run a fork of gerrit
19:46:00 is that correct?
19:46:01 zaro: welcome to the traditional gerrit hazing tasks - everyone has had one when they started ... :)
19:46:05 i would like to not run a fork
19:46:05 clarkb: yes. if possible
19:46:19 ++
19:46:24 mordred: thanks a lot. clarkb warned me during the interview.
19:46:27 awesome
19:46:29 if we have to diverge, i'd like us to try to do it in a way where we expect the divergence to be accepted upstream
19:46:37 ++
19:46:41 ++
19:46:49 I ask because I think this will influence the upgrade process
19:46:59 want to make sure we are on the same page. sounds like we are \o/
19:47:18 yeah, it's not just forward-porting patches, it's gap analysis, and trying to figure out the easiest way to close the gap
19:47:24 clarkb: +1
19:47:33 #topic lists.o.o
19:47:59 clarkb: TTL is already 300
19:48:02 so I just booted and puppetted a replacement server for lists.o.o (old server is oneiric which will EOL in just over a week)
19:48:13 clarkb: so dns is ready to change when you are
19:48:33 ok. I will set temporary DNS records for the new host after this meeting.
19:48:44 we should announce a cutover time
19:48:59 I will start the data migration after the 1st to avoid any mailman monthly emails
19:49:14 jeblair: yes. Is this something that we think needs to happen over a weekend?
19:49:20 (I am leaning that way)
19:49:21 and as i mentioned in -infra a few mins ago, i think we should avoid having exim send over v6 to start
19:49:28 ++
19:49:30 though i think it's okay to add AAAA records
19:49:45 (and have exim receive on v6)
19:49:50 * mordred agrees with every opinion jeblair has on mail
19:50:00 if we want to do a weekend before oneiric EOLs we will have to do it this weekend. We can do it the one after if we are willing to risk a couple days of EOL
19:50:20 this weekend is good for me.
19:50:21 clarkb: yes, i think something in the friday-night to sunday-morning range
19:50:30 clarkb: and this weekend works for me too
19:50:44 same here
19:50:47 how about 9am PST saturday?
19:50:54 great
19:50:57 wfm
19:51:01 +1
19:51:10 ok, I will send a notification this afternoon after lunch
19:51:16 #action clarkb send email announcing lists.o.o move at 9am pst saturday
19:51:37 #topic testr
19:51:38 mordred: hey
19:51:48 clarkb: testr thoughts?
19:51:57 yes. I agree
19:51:58 ya, nova, quantum and some of the clients are done
19:52:08 we should push hard on testr early in the cycle
19:52:15 mordred: do we need to be more coordinated and push testr on everyone else before milestone 1?
19:52:23 but - it's a big task and slightly out of scope for us
19:52:37 I think we should just get markmc and sdague to yell at people
19:53:10 (honestly, there's no way that we have the manpower to do it by ourselves)
19:53:36 so perhaps bugging ttx to start a chat with folks in the meeting about best ways to get them migrated?
19:55:53 I can tell everyone is excited by this topic
19:56:47 woooo
19:56:55 me yelling doesn't help all that much :)
19:56:55 #topic eavesdrop.o.o
19:57:05 i think eavesdrop needs migration too.
19:57:11 yah
19:57:11 that works for me.
19:57:11 but I think we should pay attention to it and be proactive
19:57:11 #action clarkb to ping markmc and sdague about move to testr
19:57:11 I will see what they think and do braindumps as necessary
19:57:11 clarkb: shall we do it at the same time as lists?
19:57:11 Do we want an open discussion?
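Relating back to the lists.o.o cutover planning above ("TTL is already 300"): one quick way to confirm the record's TTL really is low before scheduling the DNS switch. This is a sketch assuming the third-party dnspython package (2.x API, where resolve() replaced the older query()); it is not something from the meeting itself.

```python
# Sketch: check the current TTL on the lists.openstack.org A record before
# a DNS cutover. Assumes dnspython >= 2.0 is installed (pip install dnspython).
import dns.resolver

answer = dns.resolver.resolve("lists.openstack.org", "A")
for record in answer.rrset:
    print(record.address, "TTL:", answer.rrset.ttl)
```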
I can talk about logstash a little bit too
19:57:18 jeblair: might as well
19:57:35 #topic open discussion
19:57:45 mordred: ack, how about I plug you in during the meeting so that you pass the bucket?
19:57:45 maybe someone else will volunteer to do the nagging
19:57:55 ttx: great
19:58:05 FYI I think logstash.o.o's data is now consistent. After much hammering
19:58:21 index size per day has grown to about 5GB
19:58:25 clarkb: so we're past the burn-and-rebuild stage?
19:58:30 jeblair: I think so
19:58:37 neato!
19:58:42 ++
19:59:26 there is a bug in kibana where the timestamps don't show their milliseconds correctly... is fixed in master. I may pull that in. Otherwise I think the next step is getting other logs into logstash
20:00:08 cool. i think that's time for us.
20:00:24 thanks all, and we'll work out the rest of the details about test platforms next week
20:00:26 #endmeeting
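For context on "getting other logs into logstash" above, a minimal sketch of one common approach: sending a JSON event over TCP to a logstash tcp input configured with a json codec. The endpoint, port, and field names here are hypothetical placeholders, not the actual infra configuration.

```python
# Sketch: ship a single log event to a logstash "tcp" input with a json
# codec. Host, port, and field names are made up for illustration.
import json
import socket

event = {
    "@message": "example log line",
    "@fields": {"build_name": "gate-example-job", "build_number": "42"},
}

sock = socket.create_connection(("logstash.example.org", 9999), timeout=10)
try:
    # logstash's json codec expects one newline-terminated JSON document per event
    sock.sendall((json.dumps(event) + "\n").encode("utf-8"))
finally:
    sock.close()
```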