19:00:46 <jeblair> #startmeeting infra
19:00:46 <openstack> Meeting started Tue Sep 24 19:00:46 2013 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:49 <openstack> The meeting name has been set to 'infra'
19:00:57 <jeblair> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:01:02 <jeblair> packed agenda
19:01:11 <jeblair> #topic  Actions from last meeting
19:01:20 <jeblair> #link http://eavesdrop.openstack.org/meetings/infra/2013/infra.2013-09-10-19.01.html
19:01:36 <jeblair> fungi, clarkb: you moved marconi?
19:01:47 * mordred confirms that marconi was moved
19:01:49 <clarkb> we did
19:01:53 <fungi> yup
19:02:03 <fungi> mordred was there as well
19:02:06 <zaro> o/
19:02:10 <jeblair> cool.  anything we should know about that?
19:02:12 <mordred> (and after that noticed a couple of things that they used that were not in our mirror :) )
19:02:16 <clarkb> I think the only hiccup was that we need to be patient with cgit to pick up the move
19:02:26 <clarkb> it will eventually notice once puppet has the new stuff and runs on the git servers
19:02:42 * mordred had thoughts of a salt-related manage-projects that can do the triggering/sequencing.
19:02:45 * mordred will not work on that soon
19:02:59 <jeblair> mordred: ++, also just having salt run puppet will help a bit.
19:03:03 <mordred> yah
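A minimal sketch of the kind of triggering jeblair and mordred are describing: using the salt CLI to kick an immediate puppet run on the git servers so cgit picks up a moved project sooner than the normal puppet cadence. The "git*.openstack.org" minion glob is an assumption, and this is not the salt-based manage-projects mordred has in mind, just the simplest form of "having salt run puppet".

    import subprocess

    def kick_puppet(target="git*.openstack.org"):
        # cmd.run just executes the given shell command on each matching
        # minion; --test gives a one-shot, verbose puppet run.
        cmd = ["salt", target, "cmd.run", "puppet agent --test"]
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(result.stdout)
        if result.returncode != 0:
            print("one or more minions reported a problem")

    if __name__ == "__main__":
        kick_puppet()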
19:03:28 <jeblair> since this was an action item, i'll jump to it:
19:03:31 <jeblair> #topic Asterisk server (jeblair, pabelanger, russelb)
19:03:35 <jeblair> i did send an email
19:03:39 <russellb> hi
19:03:53 <jeblair> it looks like we're good to tweak some parameters on friday at 1700 utc
19:04:13 <jeblair> hopefully we'll end up with a config we like on our current server...
19:04:40 <jeblair> or if not, we can use one of the other servers or disable silence detection
19:05:03 <jeblair> if some folks could be around then for that testing, that would be great
19:05:09 * clarkb will be around
19:05:11 <pleia2> sure thing
19:05:18 * anteaya should be around then
19:05:28 <jeblair> cool, thanks.  anything else on this?
19:05:49 * zaro will be around
19:05:55 * fungi too
19:06:00 <jeblair> #topic Backups (clarkb)
19:06:16 <jeblair> clarkb: how's that goin?
19:06:20 <clarkb> good
19:06:27 <clarkb> review.o.o is now backed up
19:06:32 <anteaya> yay
19:06:41 <jeblair> clarkb: wonderful!
19:06:52 <clarkb> I am debating whether I should go ahead and do etherpad now or wait for the rebuilt server (which I expect to get to next week) which will use cloud dbs
19:07:15 <clarkb> and I think we need to think a little about how we will do ci-puppetmaster
19:07:20 <jeblair> clarkb: i'm okay with waiting
19:07:36 <fungi> have we started backing up trove databases remotely yet?
19:08:00 <clarkb> fungi: no, that will require a minor update to my backup module. Basically pass creds in instead of using default my.cnf
19:08:13 <fungi> also, apparently there is some way hub_cap was saying to trigger and download a trove backup, though i can't recall whether it was suitable for our needs
19:08:23 <mordred> clarkb and I noodled a very small amount about ci-puppetmaster
19:08:25 <fungi> something about being careful not to upload to the same swift store
19:08:38 <mordred> don't think we were really truly happy about any of the thoughts
19:08:51 <clarkb> fungi: their backups don't give us what jeblair wants out of backups, append only etc
19:08:59 <fungi> ahh, right
19:09:00 <clarkb> fungi: I think we need to take our own backups too
19:09:07 <fungi> yep, makes sense
19:09:15 <mordred> I agree - although I think that their backups might be a nice addition to the set
19:09:16 <jeblair> well, as a mechanism for getting the data out of mysql, they may be sufficient...
19:09:21 <mordred> (you can't have too many backups)
19:09:36 <jeblair> but we should then archive them somewhere ourselves for the other reasons
19:09:40 <clarkb> yeah it isn't a bad thing, just not completely sufficient for our needs
19:09:52 <mordred> BUT - if we start to use them, we'll be tying ourselves to rax, as the mechanism is apparently different between hp and rax
19:09:55 <mordred> sigh
19:10:05 <jeblair> that's a good reason to just ignore it
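For reference, a rough sketch of clarkb's "pass creds in instead of using default my.cnf" idea: dump a remote trove/cloud database ourselves so the result can be archived by our own backups. The hostname, credentials, and output path are placeholders, and whether passing the password via MYSQL_PWD is acceptable for the real backup module is an open question.

    import os
    import subprocess

    def dump_remote_db(host, user, password, outfile):
        cmd = [
            "mysqldump",
            "--host", host,
            "--user", user,
            "--single-transaction",   # consistent InnoDB dump without table locks
            "--all-databases",
        ]
        # keep the password out of the process listing
        env = dict(os.environ, MYSQL_PWD=password)
        with open(outfile, "wb") as out:
            subprocess.check_call(cmd, stdout=out, env=env)

    if __name__ == "__main__":
        dump_remote_db("etherpad-db.example.org", "backup", "secret",
                       "/var/backups/mysql/etherpad.sql")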
19:10:10 <fungi> as for the puppetmaster, i think maybe we ought to make sure that we have clearly-defined places on the filesystem which we exclude from bup, and then we can make periodic encrypted tarballs of those trees and stick them somewhere reachable so as not to make bup's git too worthless
19:10:10 <mordred> because, you know, that's a winning idea
19:10:29 <clarkb> fungi: that sounds like a reasonable approach
19:10:32 <mordred> fungi: the thing we're most concerned with is backing up that one file - everything else is a simple rebuild
19:10:45 <clarkb> mordred: two files
19:10:48 <clarkb> but yes
19:10:54 <mordred> ah. wait  I see what fungi is saying now
19:11:22 <fungi> but yeah, if there's little of interest on the server besides what we want to encrypt, then maybe there's little point to using bup other than for consistency
19:11:22 <mordred> so - possible rathole - but we discussed perhaps having a sharded master key that infra-core could each have a part of
19:11:34 <mordred> perhaps we encrypt that dir with the public part of that
19:11:47 <mordred> and if we ever need to restore, it takes the key re-combination to do so
19:12:07 <mordred> (this started in conversations about how to deal with hong kong infra issues)
19:12:13 <fungi> we can do an x-out-of-y multipart key, but yeah rathole
19:12:21 <mordred> yah. come back to that later. sorry
19:12:24 <clarkb> I don't think we need to make a decision now, but this is the sort of thing we need to sort out to back up that host in a reasonable manner
19:12:27 <jeblair> fun.  :)
19:12:37 <jeblair> #topic puppet-dashboard (pleia2, anteaya)
19:12:59 <jeblair> pleia2, anteaya: what's the latest?
19:13:01 <anteaya> so far I have instructions to get a dashboard and master up
19:13:14 <anteaya> I was using pleia2's hpcloud account and now have my own
19:13:24 <anteaya> so I will bring up those nodes again on my own
19:13:39 <anteaya> then I have to get them talking to each other, so basically the same status as the last time we talked
19:13:49 <pleia2> these are manual instructions, once we have this going we'll dig into what we need to change about our puppet module
19:14:38 <hub_cap> heyo
19:14:48 <pleia2> that's about it
19:14:54 <jeblair> ok thanks.  i'm looking forward to having a usable dashboard.  :)
19:15:04 <anteaya> yes
19:15:09 <mordred> ++
19:15:10 <jeblair> i think the marconi agenda topic is stale, yeah?
19:15:15 <mordred> yah
19:15:28 <jeblair> #topic Trove testing
19:15:33 <jeblair> hub_cap: just in time!
19:15:39 <jeblair> clarkb: has an etherpad link, yeah?
19:15:51 <clarkb> I do
19:15:59 <clarkb> #link https://etherpad.openstack.org/testing-heat-trove-with-dib
19:16:50 <jeblair> that makes sense to me.  are there any high level questions we should address before i dive into specifics about storing/publishing the images?
19:17:06 <mordred> the one thing that we discovered from talking to lifeless after the heat/trove discussion is that caching the upstream images might be a little more complex than we originally thought
19:17:23 <mordred> but I don't fully understand the details, so I expect to poke at that with lifeless
19:17:53 <jeblair> mordred: oh, ok.
19:18:05 <clarkb> mordred: I don't think it is the fully built images that are a problem; it is the point about building ubuntu and fedora images to build a package cache
19:18:20 <mordred> no - there's another thing
19:18:33 <mordred> the thing that dib caches is not the exact thing that's downloaded
19:18:39 <mordred> so we need or they need to grow a thing
19:18:59 <mordred> I believe they're going to grow a feature
19:19:04 <jeblair> #action mordred understand and explain the mysterious caching issue with the heat/trove test plan
19:19:06 <anteaya> yes, the download, then a form of unpacking, and then selecting a thing which is cached
19:19:21 <mordred> anteaya: ++
19:20:03 <jeblair> so i was thinking that later this week i would effect the tarballs move (from old-wiki to static.o.o)
19:20:23 <mordred> woot
19:20:27 <jeblair> considering step 1 in that plan, it might be a good time to talk about where these images would be published...
19:20:48 <jeblair> should we just dump them in a directory on tarballs.o.o, or create a new images.o.o vhost?
19:21:06 <jeblair> (i mean really, tarballs.o.o is more like "published built artifacts.o.o" anyway...
19:21:17 <mordred> yah
19:21:20 <jeblair> it also holds jars and wars, for instance)
19:21:20 <mordred> according to heat and trove, they do not expect to produce tons of these
19:21:34 <pleia2> will our build systems have access to writing to this server where the images are cached?
19:21:49 <mordred> but I do believe tripleo long-term would like to publish a larger and more frequently updated set of images
19:21:56 <clarkb> pleia2: privileged jenkins will
19:21:57 <mordred> so I kinda think subdir on tarballs.o.o for now
19:22:03 <jeblair> so i'd be okay with just putting them in, say, tarballs.o.o/trove/something.img, unless we wanted to make this thing a real public service with a nicer url.
19:22:11 <mordred> and then sort out a swift-backed-glance for later when we expect more real traffic
19:22:18 <mordred> jeblair: ++
19:22:34 <pleia2> clarkb: ok, safe enough then (at least, not less safe than anything we're doing now)
19:22:43 <mordred> I think there is a larger design that could be nice here, but is not necessary to get through step one
19:22:54 <jeblair> ok, so we'll start there... what kind of space requirements should we expect for heat and trove in the medium term?
19:23:19 <mordred> 4 images. they have not indicated that they need historical storage - but rather a "this is the one that works"
19:23:39 <jeblair> hrm, well we never delete anything from tarballs.o.o now, but i suppose we could
19:23:47 <mordred> until we get a broader requirement, I say we stick with the equiv of a master.img
19:23:56 <jeblair> oh ok
19:24:01 <mordred> and then maybe a $tag.img if they ever do those?
19:24:12 <jeblair> so upload and overwrite an existing filename
19:24:28 <clarkb> I think we need at least some rotation for debugging purposes
19:24:40 <clarkb> (similar to why we keep old snapshot(s) for nodepool)
19:24:53 <clarkb> because these artifacts will be used in the gate
19:25:21 <jeblair> none of this is easy with jenkins scp uploading (not atomic rewrites, symlinks, or rotation).  this is another use-case for a smarter artifact receiver.
19:26:09 <jeblair> but, at the moment, that's what we have, so i think the best approximation would be to upload a unique file as well as a master.img file...
19:26:15 <jeblair> and have a cron job delete old unique files
19:26:21 <mordred> I'm fine with that
19:26:34 <mordred> again - they say these images do not change frequently
19:26:44 <mordred> "almost never" was the phrase used
19:26:48 <fungi> the cron job is more or less already there, just needs a pattern and relevant timeframe i think
19:26:53 <jeblair> so what kind of size are we talking about?
19:27:33 <jeblair> a few gb total i'm guessing (a handful of couple-hundred-mb images each?)
19:27:33 <fungi> ahh, so maybe tagging them (releasing the images) makes sense then?
19:27:40 <mordred> hub_cap: ^^ ?
19:27:44 <clarkb> the images I was building last week were >200MB and less than 1GB. This was for tripleo so may not represent anything like what trove and heat need
19:27:56 <mordred> I believe trove and heat do not need large images
19:28:27 <mordred> so, yeah to what jeblair said
19:28:54 <jeblair> okay, let's throw 50gb at it for starters.
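Roughly what the retention piece could look like, per fungi's note that the cleanup cron is mostly already there and just needs a pattern and timeframe: keep master.img indefinitely and drop uniquely-named uploads older than some window. The directory layout and the 30-day window are assumptions.

    import glob
    import os
    import time

    KEEP_DAYS = 30
    IMAGE_DIR = "/srv/static/tarballs/trove/images"   # assumed layout

    def prune_old_images():
        cutoff = time.time() - KEEP_DAYS * 86400
        for path in glob.glob(os.path.join(IMAGE_DIR, "*.img")):
            # never touch the "this is the one that works" copy
            if os.path.basename(path) == "master.img":
                continue
            if os.path.getmtime(path) < cutoff:
                os.remove(path)

    if __name__ == "__main__":
        prune_old_images()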
19:29:03 <hub_cap> sorry dude im totally afk w baby
19:29:07 <hub_cap> im back let me scroll up
19:29:21 <mordred> hub_cap: tl;dr - how big are your images
19:30:02 <hub_cap> im thinking ~60m
19:30:05 <jeblair> #action jeblair move tarballs.o.o and include 50gb space for heat/trove images
19:30:06 <hub_cap> i can find out tho
19:31:04 <jeblair> anything else on trove(/heat) testing?
19:31:33 <jeblair> #topic Tripleo testing
19:31:44 <jeblair> lifeless: ping
19:31:55 <jeblair> so lifeless sent an email about this, and there are a few replies
19:32:14 <jeblair> #link https://etherpad.openstack.org/tripleo-test-cluster
19:32:17 <lifeless> hi
19:32:19 <hub_cap> ohhhhh qcow2 is 400m
19:32:29 <hub_cap> i was totally wrong
19:32:54 <jeblair> in broad strokes, this plan also seems reasonable
19:33:31 <jeblair> mordred: did you want to update folks on the changes you made to the etherpad, or is everyone up to speed on that?
19:33:39 <jeblair> hub_cap: ok, i'll stick with my swag of 50g then
19:34:10 <mordred> basically, since we talked, I remembered that nodepool has the ability to have a custom bootstrap script per base-image type
19:34:31 <mordred> which means that rather than piggybacking on the d-g nodes, we could also choose to just make a whole new node type to deal with the tripleo slaves
19:34:31 * ttx lurks
19:34:39 <hub_cap> jeblair: 50g is reasonable (sry for the late response)
19:34:44 <mordred> I do not know if that's better or worse
19:34:45 <jeblair> hub_cap: np, thx
19:35:06 <mordred> considering that we need to solve upstream image caching on d-g nodes for trove and heat anyway
19:35:08 <jeblair> mordred: well, the tripleo slaves want to run on the tripleo cloud, yeah?
19:35:16 <mordred> they do
19:35:49 <jeblair> mordred: so that's a new provider -- and unless we want to run d-g tests there, then it needs to be a new image too
19:35:58 <jeblair> so i think that strongly suggests that direction :)
19:36:03 <mordred> nod.
19:36:20 <mordred> so, lifeless, that's the change I made to the etherpad since then
19:37:00 <mordred> also, jhesketh popped in channel with something like hipster-zoro or something yesterday, which is a zuul-gearman based non-jenkins job runner
19:37:08 <mordred> which I think might be worthwhile looking at for the pieces of this that need to be built
19:37:11 <mordred> although might be wrong
19:37:31 <mordred> that's all
19:37:39 <fungi> "turbo-hipster"
19:37:49 <mordred> yah
19:37:51 <mordred> https://github.com/rcbau/turbo-hipster
19:38:09 <jeblair> mordred: i love it; i think he's trolling you with a whole project.  :)
19:38:15 <mordred> it might be the wrong design for what lifeless needs out of it
19:38:33 <fungi> it has a beard and skinny jeans. what more does it need?
19:38:43 <mordred> craft beer and a food truck
19:39:06 <anteaya> and an untucked shirt
19:39:11 <clarkb> mordred: beer consumption among the hipster demographic is down, craft liquor is what you need
19:39:19 <lifeless> so I want to only vary from existing CI tooling where needed
19:39:29 <lifeless> better to migrate as part of a bigger plan than be a special snowflake
19:39:38 <mordred> yah. I was more thinking about your broker
19:39:44 <mordred> with the turbo-hipster
19:39:57 <mordred> like I said - may be COMPLETE mismatch of purpose
19:40:03 <lifeless> I think so
19:41:30 <mordred> jeblair: any specific questions you wanted to dive in to there?
19:41:32 <jeblair> well, i think the in-person planning and resulting documentation have made this a very easy topic.  i'm interpreting this as widespread agreement and support.
19:42:44 <jeblair> mordred: i don't think so.  broadly speaking, i think it's sound.  there are lots of fun details to work out, but i think they are all tractable problems (easier to deal with when we get closer)
19:42:57 <clarkb> ++
19:43:05 <mordred> ++
19:43:24 <mordred> jeblair: we even did most of it without beer
19:43:30 <jeblair> mordred: i can tell!
19:43:33 <jeblair> #topic Salt (UtahDave)
19:43:45 <jeblair> fungi, UtahDave: ?
19:43:50 <UtahDave> o/
19:44:05 <fungi> i've got a small to do list on this
19:44:15 <fungi> #link https://etherpad.openstack.org/salt-slavery-and-puppetry
19:44:25 <fungi> mostly just clean-up now
19:44:43 <fungi> we've been ironing out the stability issues seen previously and i think we're down to the last one
19:45:00 <fungi> we were just looking into it before the meeting
19:45:07 <UtahDave> mostly making sure zmq 3.2+ is installed.
19:45:11 <jeblair> fungi: point #1 -- i think we add repos to a whitelist where we trust unattended upgrades...
19:45:22 <jeblair> fungi: are you planning on doing that for salt?
19:45:28 <fungi> jeblair: we have, but i didn't do it right--needs fixing
19:45:32 <jeblair> ah ok
19:45:48 <fungi> jeblair: i think my ruby/puppet list iteration is wrong is all
19:46:15 <anteaya> fungi: I can peek after the meeting if you want
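The actual fix fungi mentions lives in the puppet/erb template, but a quick sanity check of the rendered whitelist on a host might look like the sketch below; it assumes the stock unattended-upgrades config path and Allowed-Origins syntax, and the "salt" origin substring is a guess.

    import re

    CONF = "/etc/apt/apt.conf.d/50unattended-upgrades"

    def allowed_origins(path=CONF):
        """Return the quoted entries inside the Allowed-Origins block."""
        text = open(path).read()
        block = re.search(
            r"Unattended-Upgrade::Allowed-Origins\s*{(.*?)};", text, re.S)
        return re.findall(r'"([^"]+)"', block.group(1)) if block else []

    if __name__ == "__main__":
        origins = allowed_origins()
        print("\n".join(origins))
        if not any("salt" in o.lower() for o in origins):
            print("salt repo does not appear to be whitelisted")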
19:46:18 <fungi> UtahDave: did you have details on why #2 there helped?
19:46:24 <UtahDave> fungi: point #3   the salt-minion should be restarted, not reloaded.
19:46:30 <fungi> i'd like to make sure i include it in the commit message
19:46:58 <fungi> UtahDave: ahh! i heard you backwards earlier. puppet only knows restart unless you do fancy things
19:47:01 <clarkb> does zmq 3.2+ support the 2.x protocol?
19:47:04 <fungi> striking from the list
19:47:28 <UtahDave> fungi: Yes, it has to do with the Salt Mine. It's a fairly new feature that runs frequently. There apparently was a bug that was causing those issues
19:47:52 <clarkb> we may need to do a quick check that newer zmq eg 3.2+ won't break the jenkins event stream plugin
19:48:07 <fungi> UtahDave: if the salt mine bug link is handy, i'll keep an eye on it so i know when it's safe to revert that bit
19:48:25 <fungi> clarkb: good point
19:48:28 <UtahDave> fungi: I don't have it right here. I'll track it down and get it to you.
19:48:35 <fungi> UtahDave: no rush, and thanks
19:48:45 <UtahDave> clarkb: I'm not sure how zmq 3.2 would affect Jenkins.
19:48:53 <fungi> clarkb: which machines specifically are involved in that right now? just jenkins/zuul/logstash?
19:48:54 <clarkb> the jenkins side is jeromq which is java native and won't be affected but the python things talking to jeromq use libzmq
19:48:55 <UtahDave> fungi: np!
19:49:03 <clarkb> fungi: and nodepool
19:49:14 <fungi> clarkb: ahh, right
19:49:24 <fungi> clarkb: but the jenkins slaves themselves are not, right?
19:49:29 <clarkb> fungi: correct
19:50:43 <mordred> we eventually want salt minions on everything for the salt puppeting - so I think that's a good thing to check
19:50:49 <jeblair> ++
19:51:01 <fungi> mordred: yep, item #8 on that to do list
19:51:12 <UtahDave> testing is obviously in order, but my guess is that upgrading to zmq3.2 won't cause any communication issues.  Salt works with both zmq 2.1.x and zmq 3.2-x
19:51:23 <clarkb> UtahDave: that is what I expected
19:51:41 <UtahDave> zmq 3.2.x  fixes a bunch of stability issues
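A small sketch of the check clarkb is suggesting: from a host upgraded to zmq 3.2, confirm which libzmq pyzmq is linked against and try to read one event from the jenkins zmq event stream. The endpoint and port are placeholders.

    import zmq

    def check_event_stream(endpoint="tcp://jenkins.example.org:8888",
                           timeout_ms=10000):
        print("libzmq:", zmq.zmq_version(), "pyzmq:", zmq.pyzmq_version())
        ctx = zmq.Context()
        sub = ctx.socket(zmq.SUB)
        sub.setsockopt(zmq.SUBSCRIBE, b"")      # subscribe to everything
        sub.connect(endpoint)
        if sub.poll(timeout_ms):
            print("got event:", sub.recv_string()[:200])
        else:
            print("no events within %dms" % timeout_ms)
        sub.close()
        ctx.term()

    if __name__ == "__main__":
        check_event_stream()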
19:52:17 <jeblair> fungi, UtahDave: anything else on this topic?
19:52:47 <fungi> no, i think that's about it--continuing to hack away at it
19:52:51 <UtahDave> As fungi ties off this initial project, if anyone else has any uses for Salt, please let me know
19:52:57 <jeblair> UtahDave: thanks for your (continued!) help
19:53:06 <jeblair> #topic Owncloud (anteaya)
19:53:12 <jeblair> anteaya: hi!
19:53:17 <anteaya> I have stood up an owncloud: 15.185.188.187/owncloud/
19:53:39 <anteaya> and gave out credentials to the infra team via pm last week
19:53:40 <anteaya> hi
19:53:57 <anteaya> the reason I did this was mordred asked me to
19:54:04 <anteaya> apparently the board wants to use it
19:54:13 <jeblair> i think mordred (in his board capacity) is driving requirements for this...
19:54:18 <anteaya> yes
19:54:24 <mordred> yah. the board would like a place to put documents
19:54:31 <mordred> so that we can stop having a private mailing list
19:54:38 <jeblair> anteaya, mordred: so do you want to do some acceptance testing with the install anteaya has set up?
19:54:39 <anteaya> will owncloud suit their needs?
19:54:48 <anteaya> let's do that
19:55:02 <mordred> I think the main outstanding questions are:
19:55:02 <mordred> auth integration of some sort
19:55:21 <zaro> anteaya: did you check if it works with windows 7, the webdav portion?
19:55:30 <anteaya> zaro: I have not, no
19:55:36 <jeblair> mordred: i think that would be nice, but with a limited pool of users, integrated auth support could probably be deferred...
19:55:42 <mordred> uhm, I think that's the main one
19:55:47 <mordred> totally. we can totally do by-hand auth
19:55:50 <anteaya> I think I still have the link you sent me though
19:55:58 <mordred> I think at first the main purpose of this is "so board can share stuff"
19:56:06 <jeblair> mordred: want to share some docs with us then?
19:56:09 <anteaya> they can look at files
19:56:11 <mordred> it might be nice to expand that in the future to "so devs and stuff can share stuff"
19:56:13 <jeblair> zaro: good idea to test webdav on osx and windows
19:56:19 <anteaya> but they can't edit other people's files
19:56:27 <anteaya> it is not a group file editing app
19:56:29 <clarkb> isn't group based permissions important as well?
19:56:31 <jeblair> mordred: integrated auth would def be a requirement for that
19:56:36 <mordred> yah
19:56:48 <pleia2> anteaya: oh, I have a win7 install floating around, just let me know what to test
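For the webdav testing zaro raised, a quick protocol-level smoke test from python (separate from the windows 7 / osx client checks) could look like this; the credentials are placeholders and the remote.php/webdav path assumes a standard owncloud install.

    import requests

    BASE = "http://15.185.188.187/owncloud/remote.php/webdav/"
    AUTH = ("testuser", "password")   # placeholder credentials

    def webdav_smoke_test():
        # list the root collection
        r = requests.request("PROPFIND", BASE, auth=AUTH,
                             headers={"Depth": "1"})
        print("PROPFIND:", r.status_code)     # expect 207 Multi-Status

        # upload, then remove, a small test file
        r = requests.put(BASE + "smoke-test.txt", data=b"hello", auth=AUTH)
        print("PUT:", r.status_code)          # expect 201 or 204
        r = requests.delete(BASE + "smoke-test.txt", auth=AUTH)
        print("DELETE:", r.status_code)

    if __name__ == "__main__":
        webdav_smoke_test()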
19:56:56 <mordred> clarkb: not for step one - if this is only used by the board, there is only one group :)
19:57:10 <mordred> anteaya: I do not seem to be able to add things to the shared folder
19:57:11 <anteaya> okay, perhaps just logging in to start?
19:57:33 <pleia2> anteaya: will do
19:57:41 <anteaya> mordred: create a file and then share it
19:57:44 <mordred> there _do_ seem to be groups in it - and you seem to be able to share with groups
19:57:51 <clarkb> mordred: right, but if there is a step two and dropbox doesn't do group permissions... I think we need to test it a bit as if we were using it in the desired end state
19:57:53 <anteaya> you have to click it and then select share
19:58:01 <mordred> I have just shared a file
19:58:11 <mordred> clarkb: agree
19:58:38 <mordred> do people see the file I shared?
19:58:40 <jeblair> mordred, anteaya: ok, let us know if there's specific testing you would like us to do; other than that, i think when mordred decides it's okay we should puppet it, yeah?
19:58:53 <anteaya> jeblair: yes
19:58:54 <mordred> I think we should work on puppeting it
19:58:58 <mordred> it seems to meet the basic use case
19:59:08 <mordred> which is file sharing for a group of 24 people managed by hand
19:59:13 <anteaya> okay, I will work with anyone wanting to test the functionality of owncloud
19:59:17 <jeblair> mordred: i have received your kerrerts.
19:59:20 <anteaya> and also will start to puppet it
19:59:26 <mordred> if there are more advanced things we want out of it, Alan clark has offered to have suse fix things if they don't work
19:59:40 <jeblair> we did not get to this topic: elastic-recheck (clarkb, jog0, mtreinish)
19:59:41 <clarkb> do we want to look at alternatives as well?
19:59:42 <anteaya> awesome
19:59:43 <mordred> anteaya: it should probably be configured to use swift as a backend and stuff
19:59:48 <jeblair> i'll move it to top of agenda for next week
19:59:53 <clarkb> jeblair: thanks
20:00:05 <jeblair> clarkb, jog0, mtreinish: if there are urgent things related to that, we can overflow into -infra channel now
20:00:05 <anteaya> mordred: okay, this one had a mysql backend, I will work on a swift backend
20:00:10 <mordred> clarkb: I don't care enough - but if there are alternatives people know about, then whee!
20:00:15 <mordred> anteaya: thanks!
20:00:20 <jeblair> thanks everyone!
20:00:24 <jeblair> #endmeeting