19:00:46 <jeblair> #startmeeting infra
19:00:46 <openstack> Meeting started Tue Sep 24 19:00:46 2013 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:49 <openstack> The meeting name has been set to 'infra'
19:00:57 <jeblair> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:01:02 <jeblair> packed agenda
19:01:11 <jeblair> #topic Actions from last meeting
19:01:20 <jeblair> #link http://eavesdrop.openstack.org/meetings/infra/2013/infra.2013-09-10-19.01.html
19:01:36 <jeblair> fungi, clarkb: you moved marconi?
19:01:47 * mordred confirms that marconi was moved
19:01:49 <clarkb> we did
19:01:53 <fungi> yup
19:02:03 <fungi> mordred was there as well
19:02:06 <zaro> o/
19:02:10 <jeblair> cool. anything we should know about that?
19:02:12 <mordred> (and after that noticed a couple of things that they used that were not in our mirror :) )
19:02:16 <clarkb> I think the only hiccup was that we need to be patient with cgit to pick up the move
19:02:26 <clarkb> it will eventually notice once puppet has the new stuff and runs on the git servers
19:02:42 * mordred had thoughts of a salt-related manage-projects that can do the triggering/sequencing.
19:02:45 * mordred will not work on that soon
19:02:59 <jeblair> mordred: ++, also just having salt run puppet will help a bit.
19:03:03 <mordred> yah
19:03:28 <jeblair> since this was an action item, i'll jump to it:
19:03:31 <jeblair> #topic Asterisk server (jeblair, pabelanger, russelb)
19:03:35 <jeblair> i did send an email
19:03:39 <russellb> hi
19:03:53 <jeblair> it looks like we're good to tweak some parameters on friday at 1700 utc
19:04:13 <jeblair> hopefully we'll end up with a config we like on our current server...
19:04:40 <jeblair> or if not, we can use one of the other servers or disable silence detection
19:05:03 <jeblair> if some folks could be around then for that testing, that would be great
19:05:09 * clarkb will be around
19:05:11 <pleia2> sure thing
19:05:18 * anteaya should be around then
19:05:28 <jeblair> cool, thanks. anything else on this?
19:05:49 * zaro will be around
19:05:55 * fungi too
19:06:00 <jeblair> #topic Backups (clarkb)
19:06:16 <jeblair> clarkb: how's that goin?
19:06:20 <clarkb> good
19:06:27 <clarkb> review.o.o is now backed up
19:06:32 <anteaya> yay
19:06:41 <jeblair> clarkb: wonderful!
19:06:52 <clarkb> I am debating whether I should go ahead and do etherpad now or wait for the rebuilt server (which I expect to get to next week) which will use cloud dbs
19:07:15 <clarkb> and I think we need to think a little about how we will do ci-puppetmaster
19:07:20 <jeblair> clarkb: i'm okay with waiting
19:07:36 <fungi> have we started backing up trove databases remotely yet?
19:08:00 <clarkb> fungi: no, that will require a minor update to my backup module. Basically pass creds in instead of using default my.cnf
19:08:13 <fungi> also, apparently there is some way hub_cap was saying to trigger and download a trove backup, though i can't recall whether it was suitable for our needs
19:08:23 <mordred> clarkb and I noodled a very small amount about ci-puppetmaster
19:08:25 <fungi> something about being careful not to upload to the same swift store
19:08:38 <mordred> don't think we were really truly happy about any of the thoughts
19:08:51 <clarkb> fungi: their backups don't give us what jeblair wants out of backups, append only etc
19:08:59 <fungi> ahh, right
19:09:00 <clarkb> fungi: I think we need to take our own backups too
19:09:07 <fungi> yep, makes sense
19:09:15 <mordred> I agree - although I think that their backups might be a nice addition to the set
19:09:16 <jeblair> well, as a mechanism for getting the data out of mysql, they may be sufficient...
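[Editor's note: clarkb's point about passing creds in instead of relying on the default my.cnf could look roughly like the helper below in the backup module's dump wrapper. This is a hedged sketch, not the actual module: the function name and all host/user/database values are made up.]

```python
# Sketch of an explicit-credentials mysqldump invocation for a remote
# (trove-hosted) database, instead of relying on a local root my.cnf.
# All names here are hypothetical illustrations.

def mysqldump_argv(host, user, password, database):
    """Build a mysqldump argv using explicit credentials.

    Caveat: --password= on the command line is visible in `ps`; writing a
    --defaults-extra-file with mode 0600 is the safer production variant.
    """
    return [
        "mysqldump",
        "--host", host,
        "--user", user,
        "--password=" + password,
        "--single-transaction",  # consistent InnoDB snapshot without table locks
        database,
    ]

argv = mysqldump_argv("trove-db.example.org", "backup", "s3cret", "reviewdb")
```

The resulting argv would then be run by the backup cron job and the output fed into the append-only store (bup, in this setup).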
19:09:21 <mordred> (you can't have too many backups)
19:09:36 <jeblair> but we should then archive them somewhere ourselves for the other reasons
19:09:40 <clarkb> yeah it isn't a bad thing, just not completely sufficient for our needs
19:09:52 <mordred> BUT - if we start to use them, we'll be tying ourselves to rax, as the mechanism is apparently different between hp and rax
19:09:55 <mordred> sigh
19:10:05 <jeblair> that's a good reason to just ignore it
19:10:10 <fungi> as for the puppetmaster, i think maybe we ought to make sure that we have clearly-defined places on the filesystem which we exclude from bup, and then we can make periodic encrypted tarballs of those trees and stick them somewhere reachable so as not to make bup's git too worthless
19:10:10 <mordred> because, you know, that's a winning idea
19:10:29 <clarkb> fungi: that sounds like a reasonable approach
19:10:32 <mordred> fungi: the thing we're most concerned with is backing up that one file - everything else is a simple rebuild
19:10:45 <clarkb> mordred: two files
19:10:48 <clarkb> but yes
19:10:54 <mordred> ah. wait I see what fungi is saying now
19:11:22 <fungi> but yeah, if there's little of interest on the server besides what we want to encrypt, then maybe there's little point to using bup other than for consistency
19:11:22 <mordred> so - possible rathole - but we discussed perhaps having a sharded master key that infra-core could each have a part of
19:11:34 <mordred> perhaps we encrypt that dir with the public part of that
19:11:47 <mordred> and if we ever need to restore, it takes the key re-combination to do so
19:12:07 <mordred> (this started in conversations about how to deal with hong kong infra issues)
19:12:13 <fungi> we can do an x-out-of-y multipart key, but yeah rathole
19:12:21 <mordred> yah. come back to that later. sorry
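[Editor's note: the "x-out-of-y multipart key" fungi mentions is classically Shamir's Secret Sharing: split a secret into y shares so that any x of them recombine it, while fewer reveal nothing. A minimal sketch over a prime field follows; a real deployment would use a vetted tool (ssss, gfshare, or similar) rather than hand-rolled crypto.]

```python
# Shamir's Secret Sharing sketch: encode the secret as the constant term of a
# random degree-(x-1) polynomial, hand out y evaluations, and recover the
# secret by Lagrange interpolation at 0 from any x of them.
import random

PRIME = 2 ** 127 - 1  # a Mersenne prime, big enough for a 16-byte secret

def split_secret(secret, x, y):
    """Split integer secret into y shares; any x of them recover it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(x - 1)]
    shares = []
    for i in range(1, y + 1):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at i
            acc = (acc * i + c) % PRIME
        shares.append((i, acc))
    return shares

def recombine(shares):
    """Lagrange interpolation at 0 recovers the secret from x shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, PRIME - 2, PRIME) is the modular inverse (Fermat)
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With a 3-of-5 split, any three infra-core members could jointly reconstruct the restore key, which matches the "key re-combination" restore path described above.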
19:12:24 <clarkb> I don't think we need to make a decision now, but this is the sort of thing we need to sort out to back up that host in a reasonable manner
19:12:27 <jeblair> fun. :)
19:12:37 <jeblair> #topic puppet-dashboard (pleia2, anteaya)
19:12:59 <jeblair> pleia2, anteaya: what's the latest?
19:13:01 <anteaya> so far I have instructions to get a dashboard and master up
19:13:14 <anteaya> I was using pleia2's hpcloud account and now have my own
19:13:24 <anteaya> so I will bring up those nodes again on my own
19:13:39 <anteaya> then I have to get them talking to each other, so basically the same status as the last time we talked
19:13:49 <pleia2> these are manual instructions, once we have this going we'll dig into what we need to change about our puppet module
19:14:38 <hub_cap> heyo
19:14:48 <pleia2> that's about it
19:14:54 <jeblair> ok thanks. i'm looking forward to having a usable dashboard. :)
19:15:04 <anteaya> yes
19:15:09 <mordred> ++
19:15:10 <jeblair> i think the marconi agenda topic is stale, yeah?
19:15:15 <mordred> yah
19:15:28 <jeblair> #topic Trove testing
19:15:33 <jeblair> hub_cap: just in time!
19:15:39 <jeblair> clarkb: has an etherpad link, yeah?
19:15:51 <clarkb> I do
19:15:59 <clarkb> #link https://etherpad.openstack.org/testing-heat-trove-with-dib
19:16:50 <jeblair> that makes sense to me. are there any high level questions we should address before i dive into specifics about storing/publishing the images?
19:17:06 <mordred> the one thing that we discovered from talking to lifeless after the heat/trove discussion is that caching the upstream images might be a little more complex than we originally thought
19:17:23 <mordred> but I don't fully understand the details, so I expect to poke at that with lifeless
19:17:53 <jeblair> mordred: oh, ok.
19:18:05 <clarkb> mordred: I don't think it is the fully built images that are a problem, it is the point about building ubuntu and fedora images to build a package cache
19:18:20 <mordred> no - there's another thing
19:18:33 <mordred> the thing that dib caches is not the exact thing that's downloaded
19:18:39 <mordred> so we need or they need to grow a thing
19:18:59 <mordred> I believe they're going to grow a feature
19:19:04 <jeblair> #action mordred understand and explain the mysterious caching issue with the heat/trove test plan
19:19:06 <anteaya> yes the download then a form of unpacking and then selecting a thing which is cached
19:19:21 <mordred> anteaya: ++
19:20:03 <jeblair> so i was thinking that later this week i would effect the tarballs move (from old-wiki to static.o.o)
19:20:23 <mordred> woot
19:20:27 <jeblair> considering step 1 in that plan, it might be a good time to talk about where these images would be published...
19:20:48 <jeblair> should we just dump them in a directory on tarballs.o.o, or create a new images.o.o vhost?
19:21:06 <jeblair> (i mean really, tarballs.o.o is more like "published built artifacts.o.o" anyway...
19:21:17 <mordred> yah
19:21:20 <jeblair> it also holds jars and wars, for instance)
19:21:20 <mordred> according to heat and trove, they do not expect to produce tons of these
19:21:34 <pleia2> will our build systems have access to writing to this server where the images are cached?
19:21:49 <mordred> but I do believe tripleo long-term would like to publish a larger and more frequently updated set of images
19:21:56 <clarkb> pleia2: privileged jenkins will
19:21:57 <mordred> so I kinda think subdir on tarballs.o.o for now
19:22:03 <jeblair> so i'd be okay with just putting them in, say, tarballs.o.o/trove/something.img, unless we wanted to make this thing a real public service with a nicer url.
19:22:11 <mordred> and then sort out a swift-backed-glance for later when we expect more real traffic
19:22:18 <mordred> jeblair: ++
19:22:34 <pleia2> clarkb: ok, safe enough then (at least, not less safe than anything we're doing now)
19:22:43 <mordred> I think there is a larger design that could be nice here, but is not necessary to get through step one
19:22:54 <jeblair> ok, so we'll start there... what kind of space requirements should we expect for heat and trove in the medium term?
19:23:19 <mordred> 4 images. they have not indicated that they need historical storage - but rather a "this is the one that works"
19:23:39 <jeblair> hrm, well we never delete anything from tarballs.o.o now, but i suppose we could
19:23:47 <mordred> until we get a broader requirement, I say we stick with the equiv of a master.img
19:23:56 <jeblair> oh ok
19:24:01 <mordred> and then maybe a $tag.img if they ever do those?
19:24:12 <jeblair> so upload and overwrite an existing filename
19:24:28 <clarkb> I think we need at least some rotation for debugging purposes
19:24:40 <clarkb> (similar to why we keep old snapshot(s) for nodepool)
19:24:53 <clarkb> because these artifacts will be used in the gate
19:25:21 <jeblair> none of this is easy with jenkins scp uploading (no atomic rewrites, symlinks, or rotation). this is another use-case for a smarter artifact receiver.
19:26:09 <jeblair> but, at the moment, that's what we have, so i think the best approximation would be to upload a unique file as well as a master.img file...
19:26:15 <jeblair> and have a cron job delete old unique files
19:26:21 <mordred> I'm fine with that
19:26:34 <mordred> again - they say these images do not change frequently
19:26:44 <mordred> "almost never" was the phrase used
19:26:48 <fungi> the cron job is more or less already there, just needs a pattern and relevant timeframe i think
19:26:53 <jeblair> so what kind of size are we talking about?
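[Editor's note: jeblair's scheme above is "upload a unique file plus master.img, and let a cron job prune old unique files". The pruning half could look like the sketch below; the retention count and file layout are made-up illustration values, not the actual infra cron job.]

```python
# Keep the N most recently modified unique images in a publish directory,
# never touching master.img (the "this is the one that works" file).
import os

def prune_images(directory, keep=5, suffix=".img", protect=("master.img",)):
    """Delete all but the `keep` newest images; return the removed paths."""
    candidates = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix) and name not in protect
    ]
    candidates.sort(key=os.path.getmtime, reverse=True)
    removed = []
    for path in candidates[keep:]:  # everything older than the newest `keep`
        os.remove(path)
        removed.append(path)
    return removed
```

Run from cron this gives the rotation clarkb wants for gate debugging while keeping disk usage inside the 50 GB budget.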
19:27:33 <jeblair> a few gb total i'm guessing (a handful of couple-hundred-mb images each?)
19:27:33 <fungi> ahh, so maybe tagging them (releasing the images) makes sense then?
19:27:40 <mordred> hub_cap: ^^ ?
19:27:44 <clarkb> the images I was building last week were >200MB and <1GB. This was for tripleo so may not represent anything like what trove and heat need
19:27:56 <mordred> I believe trove and heat do not need large images
19:28:27 <mordred> so, yeah to what jeblair said
19:28:54 <jeblair> okay, let's throw 50gb at it for starters.
19:29:03 <hub_cap> sorry dude im totally afk w baby
19:29:07 <hub_cap> im back let me scroll up
19:29:21 <mordred> hub_cap: tl;dr - how big are your images
19:30:02 <hub_cap> im thinking ~60m
19:30:05 <jeblair> #action jeblair move tarballs.o.o and include 50gb space for heat/trove images
19:30:06 <hub_cap> i can find out tho
19:31:04 <jeblair> anything else on trove(/heat) testing?
19:31:33 <jeblair> #topic Tripleo testing
19:31:44 <jeblair> lifeless: ping
19:31:55 <jeblair> so lifeless sent an email about this, and there are a few replies
19:32:14 <jeblair> #link https://etherpad.openstack.org/tripleo-test-cluster
19:32:17 <lifeless> hi
19:32:19 <hub_cap> ohhhhh qcow2 is 400m
19:32:29 <hub_cap> i was totally wrong
19:32:54 <jeblair> in broad strokes, this plan also seems reasonable
19:33:31 <jeblair> mordred: did you want to update folks on the changes you made to the etherpad, or is everyone up to speed on that?
19:33:39 <jeblair> hub_cap: ok, i'll stick with my swag of 50g then
19:34:10 <mordred> basically, since we talked, I remembered that nodepool has the ability to have a custom bootstrap script per base-image type
19:34:31 <mordred> which means that rather than piggybacking on the d-g nodes, we could also choose to just make a whole new node type to deal with the tripleo slaves
19:34:31 * ttx lurks
19:34:39 <hub_cap> jeblair: 50g is reasonable (sry for the late response)
19:34:44 <mordred> I do not know if that's better or worse
19:34:45 <jeblair> hub_cap: np, thx
19:35:06 <mordred> considering that we need to solve upstream image caching on d-g nodes for trove and heat anyway
19:35:08 <jeblair> mordred: well, the tripleo slaves want to run on the tripleo cloud, yeah?
19:35:16 <mordred> they do
19:35:49 <jeblair> mordred: so that's a new provider -- and unless we want to run d-g tests there, then it needs to be a new image too
19:35:58 <jeblair> so i think that strongly suggests that direction :)
19:36:03 <mordred> nod.
19:36:20 <mordred> so, lifeless, that's the change I made to the etherpad since then
19:37:00 <mordred> also, jhesketh popped in channel with something like hipster-zoro or something yesterday, which is a zuul-gearman based non-jenkins job runner
19:37:08 <mordred> which I think might be worthwhile looking at for the pieces of this that need to be built
19:37:11 <mordred> although might be wrong
19:37:31 <mordred> that's all
19:37:39 <fungi> "turbo-hipster"
19:37:49 <mordred> yah
19:37:51 <mordred> https://github.com/rcbau/turbo-hipster
19:38:09 <jeblair> mordred: i love it; i think he's trolling you with a whole project. :)
19:38:15 <mordred> it might be the wrong design for what lifeless needs out of it
19:38:33 <fungi> it has a beard and skinny jeans. what more does it need?
19:38:43 <mordred> craft beer and a food truck
19:39:06 <anteaya> and an untucked shirt
19:39:11 <clarkb> mordred: beer consumption among the hipster demographic is down, craft liquor is what you need
19:39:19 <lifeless> so I want to only vary from existing CI tooling where needed
19:39:29 <lifeless> better to migrate as part of a bigger plan than be a special snowflake
19:39:38 <mordred> yah. I was more thinking about your broker
19:39:44 <mordred> with the turbo-hipster
19:39:57 <mordred> like I said - may be COMPLETE mismatch of purpose
19:40:03 <lifeless> I think so
19:41:30 <mordred> jeblair: any specific questions you wanted to dive in to there?
19:41:32 <jeblair> well, i think the in-person planning and resulting documentation have made this a very easy topic. i'm interpreting this as widespread agreement and support.
19:42:44 <jeblair> mordred: i don't think so. broadly speaking, i think it's sound. there are lots of fun details to work out, but i think they are all tractable problems (easier to deal with when we get closer)
19:42:57 <clarkb> ++
19:43:05 <mordred> ++
19:43:24 <mordred> jeblair: we even did most of it without beer
19:43:30 <jeblair> mordred: i can tell!
19:43:33 <jeblair> #topic Salt (UtahDave)
19:43:45 <jeblair> fungi, UtahDave: ?
19:43:50 <UtahDave> o/
19:44:05 <fungi> i've got a small to do list on this
19:44:15 <fungi> #link https://etherpad.openstack.org/salt-slavery-and-puppetry
19:44:25 <fungi> mostly just clean-up now
19:44:43 <fungi> we've been ironing out the stability issues seen previously and i think we're down to the last one
19:45:00 <fungi> we were just looking into it before the meeting
19:45:07 <UtahDave> mostly making sure zmq 3.2+ is installed.
19:45:11 <jeblair> fungi: point #1 -- i think we add repos to a whitelist where we trust unattended upgrades...
19:45:22 <jeblair> fungi: are you planning on doing that for salt?
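[Editor's note: the whitelist jeblair refers to lives in unattended-upgrades' apt configuration. A hedged sketch of what adding a trusted repo looks like follows; the PPA origin string is an assumption and must match what `apt-cache policy` actually reports for the salt repo, and the exact syntax varies between unattended-upgrades versions.]

```
// /etc/apt/apt.conf.d/50unattended-upgrades (illustrative fragment)
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
    // hypothetical origin for the SaltStack PPA; verify against apt-cache policy
    "LP-PPA-saltstack-salt:${distro_codename}";
};
```

Puppet would then template this list, which is presumably where fungi's "ruby/puppet list iteration" bug lives.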
19:45:28 <fungi> jeblair: we have, but i didn't do it right--needs fixing
19:45:32 <jeblair> ah ok
19:45:48 <fungi> jeblair: i think my ruby/puppet list iteration is wrong is all
19:46:15 <anteaya> fungi: I can peek after the meeting if you want
19:46:18 <fungi> UtahDave: did you have details on why #2 there helped?
19:46:24 <UtahDave> fungi: point #3 the salt-minion should be restarted, not reloaded.
19:46:30 <fungi> i'd like to make sure i include it in the commit message
19:46:58 <fungi> UtahDave: ahh! i heard you backwards earlier. puppet only knows restart unless you do fancy things
19:47:01 <clarkb> does zmq 3.2+ support the 2.X protocol?
19:47:04 <fungi> striking from the list
19:47:28 <UtahDave> fungi: Yes, it has to do with the Salt Mine. It's a fairly new feature that runs frequently. There apparently was a bug that was causing those issues
19:47:52 <clarkb> we may need to do a quick check that newer zmq eg 3.2+ won't break the jenkins event stream plugin
19:48:07 <fungi> UtahDave: if the salt mine bug link is handy, i'll keep an eye on it so i know when it's safe to revert that bit
19:48:25 <fungi> clarkb: good point
19:48:28 <UtahDave> fungi: I don't have it right here. I'll track it down and get it to you.
19:48:35 <fungi> UtahDave: no rush, and thanks
19:48:45 <UtahDave> clarkb: I'm not sure how zmq 3.2 would affect Jenkins.
19:48:53 <fungi> clarkb: which machines specifically are involved in that right now? just jenkins/zuul/logstash?
19:48:54 <clarkb> the jenkins side is jeromq which is java native and won't be affected, but the python things talking to jeromq use libzmq
19:48:55 <UtahDave> fungi: np!
19:49:03 <clarkb> fungi: and nodepool
19:49:14 <fungi> clarkb: ahh, right
19:49:24 <fungi> clarkb: but the jenkins slaves themselves are not, right?
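[Editor's note: the "quick check" clarkb suggests for the python consumers of the jenkins zmq event stream could be a small version probe. On a live node the version string would come from pyzmq's `zmq.zmq_version()`, which reports the linked libzmq version; the pure parsing/comparison helper below is a sketch with an assumed 3.2 floor.]

```python
# Compare a libzmq version string (as returned by zmq.zmq_version()) against
# the minimum version Salt wants, without importing pyzmq here.

def version_tuple(version):
    """Parse '3.2.4' (ignoring a suffix like '-rc1') into (3, 2, 4)."""
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))

def zmq_at_least(version, floor=(3, 2)):
    """True if the libzmq version string meets the required floor."""
    return version_tuple(version)[:len(floor)] >= floor
```

The jenkins side (jeromq, pure Java) ignores libzmq entirely, which is why only the python consumers (zuul, logstash workers, nodepool) need this check.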
19:49:29 <clarkb> fungi: correct
19:50:43 <mordred> we eventually want salt minions on everything for the salt puppeting - so I think that's a good thing to check
19:50:49 <jeblair> ++
19:51:01 <fungi> mordred: yep, item #8 on that to do list
19:51:12 <UtahDave> testing is obviously in order, but my guess is that upgrading to zmq 3.2 won't cause any communication issues. Salt works with both zmq 2.1.x and zmq 3.2.x
19:51:23 <clarkb> UtahDave: that is what I expected
19:51:41 <UtahDave> zmq 3.2.x fixes a bunch of stability issues
19:52:17 <jeblair> fungi, UtahDave: anything else on this topic?
19:52:47 <fungi> no, i think that's about it--continuing to hack away at it
19:52:51 <UtahDave> As fungi ties off this initial project, if anyone else has any uses for Salt, please let me know
19:52:57 <jeblair> UtahDave: thanks for your (continued!) help
19:53:06 <jeblair> #topic Owncloud (anteaya)
19:53:12 <jeblair> anteaya: hi!
19:53:17 <anteaya> I have stood up an owncloud: 15.185.188.187/owncloud/
19:53:39 <anteaya> and gave out credentials to the infra team via pm last week
19:53:40 <anteaya> hi
19:53:57 <anteaya> the reason I did this was mordred asked me to
19:54:04 <anteaya> apparently the board wants to use it
19:54:13 <jeblair> i think mordred (in his board capacity) is driving requirements for this...
19:54:18 <anteaya> yes
19:54:24 <mordred> yah. the board would like a place to put documents
19:54:31 <mordred> so that we can stop having a private mailing list
19:54:38 <jeblair> anteaya, mordred: so do you want to do some acceptance testing with the install anteaya has set up?
19:54:39 <anteaya> will owncloud suit their needs?
19:54:48 <anteaya> let's do that
19:55:02 <mordred> I think the main outstanding questions are:
19:55:02 <mordred> auth integration of some sort
19:55:21 <zaro> anteaya: did you check if it works with windows 7, the webdav portion?
19:55:30 <anteaya> zaro: I have not, no
19:55:36 <jeblair> mordred: i think that would be nice, but with a limited pool of users, integrated auth support could probably be deferred...
19:55:42 <mordred> uhm, I think that's the main one
19:55:47 <mordred> totally. we can totally do by-hand auth
19:55:50 <anteaya> I think I still have the link you sent me though
19:55:58 <mordred> I think at first the main purpose of this is "so board can share stuff"
19:56:06 <jeblair> mordred: want to share some docs with us then?
19:56:09 <anteaya> they can look at files
19:56:11 <mordred> it might be nice to expand that in the future to "so devs and stuff can share stuff"
19:56:13 <jeblair> zaro: good idea to test webdav on osx and windows
19:56:19 <anteaya> but they can't edit other people's files
19:56:27 <anteaya> it is not a group file editing app
19:56:29 <clarkb> aren't group based permissions important as well?
19:56:31 <jeblair> mordred: integrated auth would def be a requirement for that
19:56:36 <mordred> yah
19:56:48 <pleia2> anteaya: oh, I have a win7 install floating around, just let me know what to test
19:56:56 <mordred> clarkb: not for step one - if this is only used by the board, there is only one group :)
19:57:10 <mordred> anteaya: I do not seem to be able to add things to the shared folder
19:57:11 <anteaya> okay, perhaps just logging in to start?
19:57:33 <pleia2> anteaya: will do
19:57:41 <anteaya> mordred: create a file and then share it
19:57:44 <mordred> there _do_ seem to be groups in it - and you seem to be able to share with groups
19:57:51 <clarkb> mordred: right, but if there is a step two and dropbox doesn't do group permissions... I think we need to test it a bit as if we were using it in the desired end state
19:57:53 <anteaya> you have to click it and then select share
19:58:01 <mordred> I have just shared a file
19:58:11 <mordred> clarkb: agree
19:58:38 <mordred> do people see the file I shared?
19:58:40 <jeblair> mordred, anteaya: ok, let us know if there's specific testing you would like us to do; other than that, i think when mordred decides it's okay we should puppet it, yeah?
19:58:53 <anteaya> jeblair: yes
19:58:54 <mordred> I think we should work on puppeting it
19:58:58 <mordred> it seems to meet the basic use case
19:59:08 <mordred> which is file sharing for a group of 24 people managed by hand
19:59:13 <anteaya> okay, I will work with anyone wanting to test the functionality of owncloud
19:59:17 <jeblair> mordred: i have received your kerrerts.
19:59:20 <anteaya> and also will start to puppet it
19:59:26 <mordred> if there are more advanced things we want out of it, Alan Clark has offered to have suse fix things if they don't work
19:59:40 <jeblair> we did not get to this topic: elastic-recheck (clarkb, jog0, mtreinish)
19:59:41 <clarkb> do we want to look at alternatives as well?
19:59:42 <anteaya> awesome
19:59:43 <mordred> anteaya: it should probably be configured to use swift as a backend and stuff
19:59:48 <jeblair> i'll move it to top of agenda for next week
19:59:53 <clarkb> jeblair: thanks
20:00:05 <jeblair> clarkb, jog0, mtreinish: if there are urgent things related to that, we can overflow into -infra channel now
20:00:05 <anteaya> mordred: okay, this one had mysql backend, I will work on a swift backend
20:00:10 <mordred> clarkb: I don't care enough - but if there are alternatives people know about, then whee!
20:00:15 <mordred> anteaya: thanks!
20:00:20 <jeblair> thanks everyone!
20:00:24 <jeblair> #endmeeting