19:02:47 <jeblair> #startmeeting infra
19:02:48 <openstack> Meeting started Tue Jul 30 19:02:47 2013 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:51 <openstack> The meeting name has been set to 'infra'
19:02:59 <jeblair> "and is due to finish in 60 minutes" ! neato :)
19:03:02 <mordred> o/
19:03:08 <fungi> too cool
19:03:25 <jeblair> #topic asterisk server
19:03:38 <jeblair> most of the topics from last meeting were around the asterisk server anyway...
19:03:54 <jeblair> including that i signed up for a DID, so it has a phone number now...
19:03:57 * mordred bows down to the new asterisk server overlord
19:03:57 * fungi missed the last meeting
19:04:11 * fungi forgot to see what got assigned to him in absentia
19:04:15 <jeblair> fungi: i don't think you did; at least based on irc logs
19:04:30 <jeblair> it looks like there was no meeting last week
19:04:33 <clarkb> correct
19:04:43 <jeblair> pabelanger, russellb: should we all dial in?
19:04:53 <jeblair> to see if we can stress test the conf server a bit?
19:04:59 <russellb> jeblair: sure
19:05:02 <pabelanger> should be able to
19:05:12 * fungi has a telegraph^H^H^H^H^Hphone handy
19:05:21 <pabelanger> sip:conference@openstack.org
19:05:39 <russellb> pabelanger: going to use g722?
19:05:41 <clarkb> I will need a real number as android's clients are derpy and I am still without a proper headset
19:05:46 <jeblair> there's that, and the phone number is 512-808-5750
19:05:52 <pabelanger> no, was going to use DID
19:05:57 <russellb> i'll probably just use the DID as well ...
19:06:00 <pabelanger> russellb: no client
19:06:14 <russellb> i have a client, but i like my desk phone more, heh
19:06:17 <fungi> conference id number?
19:06:19 <jeblair> 6000
19:06:29 <pabelanger> in
19:06:40 <pabelanger> asterisk CLI looks good
19:07:29 <russellb> 6 people on
19:07:59 <pabelanger> load is nothing
19:08:58 <anteaya> I can get on with skype and the phone number
19:09:11 <pabelanger> *CLI
19:09:15 <pabelanger> confbridge list
19:09:21 <anteaya> using the sip client, Jitsi still fails for me
19:09:30 <pabelanger> anteaya: likely a codec issue
19:09:41 <anteaya> pabelanger: probably
19:09:58 <russellb> jeblair: *CLI> core show channels ... shows all channels as calls from the SIP provider
19:10:06 <anteaya> no
19:10:57 <anteaya> yes a little choppy for me too
19:11:39 <mordred> cell phone
19:11:53 <anteaya> I am just listening, not transmitting
19:12:08 <anteaya> who was that?
19:13:07 <pabelanger> nice
19:13:23 <russellb> pbx*CLI> channel originate Local/6000@public application playback spam
19:14:39 <jeblair> i have silence now...
19:14:48 <clarkb> jeblair: we are still talking
19:14:52 <jeblair> neat!
19:14:54 <fungi> silence huh?
19:15:10 <fungi> we have more callspam
19:15:18 <pabelanger> CPU is spiking, so we'd need to see why
19:15:27 <russellb> she will say anything.
19:15:29 <russellb> almost anything.
19:15:34 <anteaya> ha ha ha
19:15:47 <pabelanger> tt-monkeys FTW
19:18:06 <anteaya> just with the small group, the sound is about equivalent to one of the board meeting conference calls, as a listener
19:18:12 <anteaya> fungi has good sound
19:18:22 * ttx lurks
19:18:29 * mordred waves at ttx
19:18:37 <mordred> ttx: you want to dial in from france?
19:18:54 * mordred got dropped
19:19:12 <ttx> mordred: i'll pass, unless you REALLY need that tested
19:19:19 <anteaya> skype is charging me money to listen in
19:19:27 * ttx multitasks
19:19:36 <anteaya> so far I have paid 40 cents
19:19:47 <jeblair> ttx: not yet; i think we'd get a local number later
19:21:25 <jeblair> #action clarkb add pbx to cacti
19:21:32 <davidlenwell> what is the conf number ?
19:21:37 <anteaya> that actually was quite good
19:21:39 <jeblair> we all just hung up
19:21:43 <davidlenwell> oh .. too late
19:21:44 <anteaya> we just ended the session
19:21:51 <jd__> what? are we having a party on the phone or something?
19:21:53 <russellb> https://wiki.openstack.org/wiki/Infrastructure/Conferencing
19:22:04 * jd__ hides
19:22:05 <davidlenwell> jd__: we missed it
19:22:07 <fungi> we can try to bum-rush it again next week once we have it in cacti to see what the impact is
19:22:08 <russellb> i documented it!
19:22:20 <pabelanger> you could totally hook the meeting room bot into asterisk too, once the meeting starts, a new conference room is created and the password outputted
19:22:25 <pabelanger> don't think that would be too hard
19:22:44 <jeblair> neat :)
19:22:50 <russellb> pabelanger: it's only software
19:22:54 <ttx> russellb: "in his spare time, the hero of cloud computing sets up asterisk"
19:23:00 <russellb> ttx: heh
19:23:13 <fungi> ttx: like you're one to talk, writing your own bug tracker
19:23:20 <ttx> russellb: not jealous at all.
19:23:21 <pabelanger> russellb: would be a fun integration
19:24:11 <jeblair> so i think next we want to get a handle on resource usage there... fwiw asterisk is currently almost idle now that we've hung up.
19:24:30 <jeblair> so i'm guessing the high cpu usage is related to our call
19:24:36 <clarkb> I think fail2ban is probably a reasonable thing to try as well
19:24:43 <clarkb> it will act as a rate limiter for badness
19:24:55 <anteaya> or spammers just target us when there is something going on
19:25:35 <jeblair> shall we move on?
19:25:47 <russellb> there was no spam while we were on the call fwiw
19:25:55 <russellb> so it was just conference related
19:25:55 <pabelanger> jeblair: yes. Transcoding was the hit
19:26:06 <pabelanger> but I wouldn't expect it to be spiking
19:26:13 <russellb> pabelanger: dangit, should have looked to see what codecs were being used ...
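clarkb's fail2ban suggestion would amount to a jail watching the asterisk logs and banning SIP scanners that fail repeatedly. A rough sketch of what that might look like — the stock asterisk filter is assumed to be available, and the log path, ports, and thresholds here are illustrative guesses, not the deployed config:

```ini
# Hypothetical /etc/fail2ban/jail.local fragment: rate-limit SIP
# registration scanners hitting the PBX. Filter name, log path, and
# numbers are assumptions for illustration only.
[asterisk]
enabled  = true
filter   = asterisk
port     = 5060,5061
protocol = udp
logpath  = /var/log/asterisk/messages
maxretry = 5
bantime  = 3600
```

This would not stop call spam arriving over the DID (that traffic comes from the provider, not from scanners probing SIP directly), but it does act as the "rate limiter for badness" described above.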
19:26:29 <pabelanger> ya, I was trying to see that
19:26:37 <pabelanger> I think I saw some gsm with ulaw
19:26:43 <russellb> they were all from the provider, so i can just call back and check
19:27:08 <russellb> ulaw (which is good)
19:27:15 <russellb> k, can move on
19:27:26 <jeblair> ok
19:27:30 <pabelanger> might just be confbridge
19:27:32 <fungi> too attractive to spend the whole hour on a phone system ;)
19:27:36 <jeblair> #topic multiple jenkins masters
19:28:14 <jeblair> the multi-step process to get to the point where we can have multiple masters is almost complete...
19:28:33 <jeblair> we just need to do something to ensure that the bitrot jobs only run in one place (and also that their logs are correctly processed)
19:29:14 <zaro> do we also need this? https://bugs.launchpad.net/openstack-ci/+bug/1082803
19:29:15 <uvirtbot> Launchpad bug 1082803 in openstack-ci "Manage jenkins global config and plugin list" [Medium,Triaged]
19:29:17 <jeblair> to that end, I think i can have a working timer trigger for zuul today, which means zuul/gearman can dispatch those jobs, and we can stop using the jenkins timer triggers for them
19:29:18 <clarkb> jeblair: reviewing your multiple triggers zuul change is on my list of things to do once we are done with bugs
19:29:37 <jeblair> zaro: that would be nice, but i think i will defer it for now...
19:29:53 <jeblair> because i'd like to have multiple masters within a few days (or weeks at the most)
19:30:04 <clarkb> jeblair: I agree. There is a pressing need for multiple masters, which we can manage by hand until we have the automagic to do it with tools
19:30:24 <fungi> especially since multiple will initially be just two
19:30:30 <fungi> not counting the old one
19:30:38 <jeblair> i believe we're hitting performance problems related to the rate at which we are adding/removing slaves from jenkins for the devstack tests
19:30:54 <jeblair> so being able to scale that up is the motivating factor for this
19:31:08 <clarkb> jeblair> yes, the slave launch logs indicate it is taking up to a minute to add each slave
19:31:10 <fungi> that certainly sounds like the sort of unusual use case the jenkins devs wouldn't have optimized for
19:31:11 <clarkb> which is really really slow
19:31:31 <jeblair> after we have multiple masters, i plan on looking at more efficient and reliable ways of managing slaves.
19:31:51 <zaro> that is awesome. i am talking about the gearman-plugin at this year's jenkins conf. will have something good to show!
19:32:09 <jeblair> zaro: cool!
19:32:12 <jeblair> when is that?
19:32:25 <zaro> oct 23-24
19:32:30 <zaro> want to come down?
19:32:34 <mordred> jeblair: on the efficient and reliable ways of managing slaves - I'd love to chat about that once you're there
19:33:15 <zaro> it's in palo alto.
19:33:38 <jeblair> mordred: my thoughts so far are that d-g should be a daemon instead of a bunch of jenkins jobs, and that gearman-plugin should (optionally) handle offlining slaves when jobs are done (since it's in a good place to do that)
19:33:51 <mordred> ah yes. that. totally with you
19:34:21 <jeblair> mordred: the daemon will be able to manage the nova boots in a better way. we may still have bottlenecks adding and removing slaves...
19:34:48 <jeblair> however, we'll have a better handle on that in that we'll be able to tune how many api calls happen in parallel, etc...
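jeblair's zuul timer trigger would let the bitrot jobs move out of per-master Jenkins cron triggers and into zuul's own layout, so zuul/gearman dispatches them to exactly one place. A hypothetical layout.yaml fragment — the syntax here is approximate, and the pipeline, job, and project names are placeholders, not the real config:

```yaml
# Hypothetical zuul layout.yaml fragment: a periodic pipeline driven
# by a timer trigger instead of Jenkins cron. Names and the cron
# expression are illustrative assumptions.
pipelines:
  - name: periodic
    manager: IndependentPipelineManager
    trigger:
      timer:
        - time: '0 6 * * *'

projects:
  - name: openstack/example-project
    periodic:
      - periodic-example-python27-bitrot
```

Because zuul hands the job to gearman, whichever master picks it up runs it, and the job runs only once regardless of how many masters are attached.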
19:34:54 <mordred> agree
19:35:09 <jeblair> also, we won't be making all these api calls at the same time jenkins itself is managing 16 running jobs (which are making the api calls)
19:35:33 <mordred> _after_ that, clarkb and I were supposing about kexec re-purposing of slaves, so that there would be fewer adds/removes
19:35:38 <clarkb> using a daemon will definitely make a lot of pain points less insane
19:35:40 <mordred> but I agree that tuning what we have first
19:35:49 <mordred> will let us grok the other things sanely
19:35:55 <jeblair> ah yeah, that could be cool. the only thing about that is that i don't think we can count on always doing that
19:36:07 <jeblair> some jobs will completely break the node and it will need to be destroyed and replaced
19:36:17 <jeblair> but 90% of them may be able to be reused, which could be a huge win
19:36:28 <clarkb> jeblair: yup
19:36:41 <clarkb> jeblair: and a daemon should be smart enough to know when it can and can't make use of kexec
19:37:13 <jeblair> that could be the normal case, and then if we fail to kexec after a minute or two, we kill it and let, say, a hypothetical future low-watermark load based system decide that it needs to replenish the pool.
19:37:44 <mordred> jeblair: yup. and a hung kexec probably will not respond to health pings :)
19:37:52 <jeblair> (and hopefully, we can apply a bunch of this to the regular nodes too)
19:38:02 <clarkb> that would be amazing
19:38:06 <fungi> sounds great
19:39:01 <jeblair> cool, sounds like we have general consensus on a very long road ahead :)
19:39:09 <jeblair> #topic requirements and mirrors
19:39:18 <jeblair> mordred: you want to talk about what's going on in this area?
19:39:26 <mordred> ugh
19:39:28 <mordred> not really
19:39:37 <mordred> can I just close my eyes and make it go away?
19:39:50 <jeblair> maybe?
19:39:51 <mordred> so ... there are several issues
19:40:00 <jeblair> (it's worth a shot)
19:40:10 <mordred> one is that pip installs things not in dependency graph order
19:40:33 <mordred> but in a strange combo of first-takes-precedence and sometimes highest-takes-precedence
19:41:09 <mordred> which makes things gigantically problematic for devstack when things change, because sequencing will cause something other than what you expected to be installed
19:41:12 <mordred> SO
19:41:15 <clarkb> I did open a bug against this problem with upstream pip; dstufft thought it may be fixed in 1.5
19:41:18 <mordred> current things on tap to fix this
19:41:23 <clarkb> but that doesn't help us today
19:41:40 <mordred> sdague and I are working on getting the requirements update.py script in good shape
19:41:50 <mordred> with the idea that in setup_develop in devstack
19:42:09 <mordred> the first step will be to run the update.py script from requirements on the repo to be setup_develop'd
19:42:18 <sdague> yes, we're close, except for the fact that update.py doesn't handle oslo tarball urls
19:42:23 <mordred> this will ensure a consistent worldview of which packages should be installed
19:42:25 <sdague> I think that's actually the only missing piece
19:42:28 <clarkb> #link https://github.com/pypa/pip/issues/988 there is apparently some undocumented feature that may be useful in mitigating this
19:42:39 <mordred> sdague: grab most recent trunk - I think that's fixed now
19:42:49 <sdague> mordred: really?
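The requirements-sync step mordred and sdague describe — update.py rewriting a project's requirements to match the global list before devstack's setup_develop installs anything — can be sketched roughly as below. This is a simplified illustration of the idea, not the real update.py; the function name and the crude version-specifier parsing are assumptions:

```python
def sync_requirements(global_reqs, project_reqs):
    """Return project_reqs rewritten so that every package it lists
    uses the version line from global_reqs, giving one consistent
    worldview of which versions should be installed.

    Both arguments are lists of requirement lines like "six>=1.4.1".
    Lines for packages absent from the global list are kept as-is.
    """
    def pkg_name(line):
        # Crude parse: the package name is everything before the
        # first version-comparison character.
        for i, ch in enumerate(line):
            if ch in "<>=!":
                return line[:i].strip()
        return line.strip()

    pins = {pkg_name(line): line for line in global_reqs}
    return [pins.get(pkg_name(line), line) for line in project_reqs]
```

For example, with a global list of `["six>=1.4.1"]`, a project line `"six>=1.3"` would be rewritten to `"six>=1.4.1"`, sidestepping pip's install-order surprises by never presenting conflicting specifiers in the first place.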
19:42:58 <sdague> if so I can kick my devstack job again
19:43:04 <jeblair> mordred: setup_develop sounds like a good idea
19:43:09 <mordred> then - once that's happening, we should be able to gate requirements on devstack running with requirements
19:43:14 <mordred> so we can know if a new req will bork devstack
19:43:28 <jeblair> awesome
19:43:38 <mordred> and then, with the two of those in place, we should be in good shape to auto-propose patches to projects on requirements changes landing
19:43:53 <mordred> incidentally, I also just wrote a patch to update.py to make it sync the setup.py file
19:43:59 <mordred> so those really will be managed boilerplate
19:44:05 <clarkb> mordred: maybe you can get dstufft to give us a tl;dr on the undocumented feature referred to in the pip bug
19:44:19 <clarkb> and see if that actually does help us
19:44:22 <mordred> k.
19:44:30 <mordred> I'll ask him
19:44:31 <sdague> also, I've got unit testing for update.py inbound, as soon as mordred does a pbr fix
19:44:45 <mordred> yup. we have a pbr bug to bug folks about to help this
19:44:50 <sdague> because tox in requirements/ is fun :)
19:44:54 <mordred> then I'm also going to get per-branch mirrors up
19:45:34 <mordred> and finally - I think we should ditch installing from pip and figure out a way to auto-create debs and install from those - because we're spending WAY too much time on this
19:46:05 <sdague> I think that just moves the pain elsewhere
19:46:08 <mordred> but I am not working on that
19:46:09 <jeblair> mordred: won't that put us in the "can't use python-foo version X because it's not packaged yet" boat again?
19:46:26 <clarkb> jeblair: yes, that was my worry when we were discussing this the other day
19:46:43 <mordred> possibly to both of you ... but we are spending a LOT of effort on this
19:46:47 <jeblair> mordred: which i'm okay with, if we think we're in a better place to deal with that now (less churn, more packagers, cloud archive, etc)
19:46:51 <mordred> and turning up every corner case in python
19:47:07 <mordred> I don't think it's the right thing to work on right now
19:47:20 <mordred> but I do think perhaps at the summit, we should discuss what it might look like in earnest
19:47:20 <sdague> I think it's a summit session, honestly
19:47:26 <mordred> jinx
19:47:31 <jeblair> sounds good to me
19:47:34 <clarkb> ++
19:47:43 <mordred> I still agree that we do not want to be in the business of shipping debs or rpms
19:47:46 <jeblair> mordred: (and i agree with you in principle)
19:47:59 <mordred> but like we've talked about for infra packages, operationally helpful packages might be helpful
19:48:24 <jeblair> we're at a very different place than we were 2 years ago
19:48:38 <mordred> very much so
19:48:44 <mordred> https://review.openstack.org/39363 btw
19:48:46 <jeblair> anything else on this topic?
19:49:00 <mordred> for those who feel like looking at a pbr change related to helping the requirements work
19:49:13 <jeblair> #topic Gerrit 2.6 upgrade (zaro)
19:49:31 <jeblair> zaro: how's it going?
19:49:39 <zaro> ohh. it's going..
19:49:53 <zaro> i think i have gerrit with WIP votes working now.
19:50:03 <clarkb> with one small asterisk right?
19:50:26 <jeblair> is that with a patch?
19:50:29 <zaro> nope! just figured out the bit of prolog..
19:50:35 <mordred> neat!
19:50:39 <fungi> prolog!
19:50:40 <clarkb> zaro: but it requires the patch for the change owner right?
19:50:42 <zaro> yes, it's gerrit core.
19:50:44 <mrodden> clarkb: thanks, didn't realize the check_uptodate.sh was just nova
19:50:56 <jeblair> zaro: have you proposed it upstream yet?
19:51:14 <pleia2> o/
19:51:24 <zaro> i just got it all to work. will submit a patch to upstream this week.
19:51:25 <fungi> pleia2 found an internet
19:51:30 <sdague> so couldn't we get WIP equiv by making APROV -1,0,+1 and letting an author -1 APROV his/her own patches?
19:51:34 <jeblair> zaro: neat! :)
19:51:36 <zaro> let's see what they say.
19:51:41 <jeblair> sdague: that's actually the plan we discussed...
19:51:45 <sdague> oh... ok :)
19:51:47 <mordred> if they're amenable to the patch in general
19:52:05 <mordred> do we think we'd be willing to roll out a 2.6 with only that patch
19:52:06 <jeblair> sdague: and i believe the "let an author..." bit is what zaro was patching to support
19:52:11 <zaro> i have been in discussions with mfink, and i did it on his suggestion.
19:52:21 <mordred> oh awesome
19:52:23 <clarkb> jeblair: and/or adding a WIP category, but both changes need the patch zaro wrote to be expressible in the ACLs
19:52:25 <jeblair> sdague: (existing gerrit acls don't support that operation)
19:52:26 <sdague> ok, I had thought that permission was already in
19:52:28 <sdague> ok
19:52:57 <zaro> ok. that's it for now.
19:53:06 <mordred> I think, given a history of carrying a whole string of patches, carrying one patch for a cycle would not be as terrible
19:53:13 <jeblair> zaro: i think mfink has a very strong voice in the gerrit community, so if he likes it, that's great. :)
19:53:30 <zaro> good to hear.
19:53:43 <jeblair> mordred: let's give this "develop upstream" thing a shot, eh? :)
19:53:55 <mordred> jeblair: sure!
19:54:08 * mordred just wasn't sure what the 2.7 schedule was looking like
19:54:12 <jeblair> zaro: i'm very excited, thanks!
19:54:18 <clarkb> me too
19:54:22 <jeblair> #topic cgit server status (pleia2)
19:54:31 <jeblair> pleia2: welcome!
19:54:34 <pleia2> so, the one thing I wanted to talk about is addressing for grabbing git repos
19:54:44 * mordred welcomes our new git.o.o overlords
19:55:08 <pleia2> the plan right now is to do what fedora does (it's easy) and do git://git.o.o/heading/project and http://git.o.o/cgit/heading/project
19:55:13 <pleia2> so they aren't the same :(
19:55:30 <pleia2> git.kernel.org makes it so they are both git.kernel.org/prod/
19:55:30 <mordred> I think jeblair has convinced me that I can get over my issues with the cgit in the url
19:55:35 <pleia2> err /pub
19:55:40 <clarkb> pleia2: I think we can lie and put them all under /cgit
19:55:48 <clarkb> even though cgit doesn't do git protocol
19:55:57 <pleia2> clarkb: yeah, that's really easy
19:56:00 <jeblair> pleia2: i'm okay with that because you have indicated cgit really wants to be set up like that, and fedora and kernel work that way...
19:56:01 <jeblair> er
19:56:14 <mordred> I do not want to add cgit to the git:// url
19:56:20 <fungi> does kernel.org use rewrites or something? (if first node in the path matches an org, rewrite the url)
19:56:29 <jeblair> you want to put the git-protocol repos under 'cgit/'? that doesn't sound good to me
19:56:34 <mordred> if we lie about anything, I want the cgit to go away from the urls in places
19:56:49 <pleia2> fungi: probably, I clicked down into a project to see what urls they provide visually and:
19:56:53 <pleia2> git://git.kernel.org/pub/scm/bluetooth/bluez.git
19:56:53 <pleia2> http://git.kernel.org/pub/scm/bluetooth/bluez.git
19:56:53 <pleia2> https://git.kernel.org/pub/scm/bluetooth/bluez.git
19:56:53 <mordred> but I'm fine with having the web and the clone be different
19:56:55 <pleia2> ^^ ie
19:56:56 <uvirtbot> pleia2: Error: "^" is not a valid command.
19:57:16 <jeblair> https://git.kernel.org/pub/scm/bluetooth/bluez.git is not cgit
19:57:20 <jeblair> it's regular git
19:57:26 <pleia2> ah, interesting
19:57:47 <clarkb> so maybe that is what we do: put the regular git http daemon behind /pub, put git:// behind /pub, and then cgit can have /cgit?
19:57:48 <mordred> right - I think git clone http:// and git clone git:// should have the same urls
19:57:48 <jeblair> so i think it makes sense to serve cgit from /cgit
19:57:54 <clarkb> mordred: I agree
19:58:02 <clarkb> and I agree with jeblair
19:58:03 <fungi> worth noting, you don't have to clone from cgit, you can clone from http(s) published copies of the git trees
19:58:09 <fungi> er, that
19:58:10 <mordred> and I don't think we need pub - I think that can go in root
19:58:13 <jeblair> and let's serve http and git protocols without a prefix
19:58:21 <mordred> git clone git://git.openstack.org/openstack/nova.git
19:58:22 <clarkb> wfm
19:58:30 <jeblair> mordred: +1
19:58:42 <jeblair> #topic Py3k testing open for business (fungi, zul, dprince, jd__, jog0)
19:58:49 <jeblair> fungi: 1 minute!
19:58:51 <fungi> just a quick update
19:58:57 <fungi> we're basically ready
19:59:03 <mordred> w00t
19:59:05 <fungi> couple of reviews which need last-minute attention...
19:59:07 <jeblair> yaay!
19:59:12 <jeblair> oh
19:59:15 <fungi> #link https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:py3k,n,z
19:59:29 <jeblair> oops
19:59:44 <jeblair> (plug for my new gate test which will catch missing jobs!)
19:59:54 <fungi> but that's all the updates i have time for in here. we can pick it up in #-infra
20:00:05 <jd__> what's missing?
20:00:09 <jeblair> so close!
20:00:12 <fungi> jd__: see link
20:00:30 <fungi> jd__: we missed that when reviewing your change earlier
20:00:37 * ttx whistles innocently
20:00:43 <jd__> too bad
20:00:47 <jeblair> thanks everyone!
20:00:47 <markmc> hey
20:00:50 <russellb> hi
20:00:54 <jeblair> #endmeeting
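The URL scheme settled on in the cgit discussion (cgit web UI under /cgit, git:// and http clones straight from the root, no /pub prefix) could be wired together roughly as below. This is a hypothetical sketch only — every path, alias, and flag here is an assumption for illustration, not the deployed configuration:

```apache
# Hypothetical Apache vhost fragment for git.openstack.org.

# cgit web UI lives under /cgit ...
ScriptAlias /cgit /usr/lib/cgi-bin/cgit.cgi
Alias /cgit-data /usr/share/cgit

# ... while smart-HTTP clones come straight from the root, so
# "git clone http://git.openstack.org/openstack/nova.git" works.
SetEnv GIT_PROJECT_ROOT /var/lib/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAliasMatch "^/(.*\.git/(info/refs|git-upload-pack))$" \
    /usr/libexec/git-core/git-http-backend/$1

# git:// clones from the same root would be served separately by
# git-daemon, e.g.: git daemon --base-path=/var/lib/git --export-all
```

With this shape, `git clone git://git.openstack.org/openstack/nova.git` and the http equivalent share the same path, and only the browsable web UI carries the /cgit prefix.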