19:02:10 <jeblair> #startmeeting infra
19:02:12 <openstack> Meeting started Tue Dec 9 19:02:10 2014 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:15 <openstack> The meeting name has been set to 'infra'
19:02:24 <ianw> o/ (despite thunderstorms and my isp doing their best to keep me away)
19:02:31 <asselin> I'm here
19:02:33 <cody-somerville> \o
19:02:43 <jeblair> oh yay
19:02:46 <jhesketh> Morning
19:02:50 <krtaylor> o/
19:02:52 <jesusaurus> o/
19:02:57 <fungi> yo
19:03:00 <jeblair> asselin has a time constraint today, so we'll take his topic first
19:03:11 <asselin> jeblair, thanks.
19:03:22 <asselin> I'm proposing an in-tree 3rd party ci solution.
19:03:30 <jeblair> #topic in-tree 3rd party ci solution (asselin)
19:03:54 <asselin> I have a spec written. looking for link...
19:04:02 <jeblair> #link https://review.openstack.org/#/c/139745/
19:04:21 <mordred> o/
19:04:24 <jeblair> cool, i think this sounds like a good idea
19:04:36 <nibalizer> o/
19:04:37 <asselin> thanks. I've been discussing it in the 3rd party meeting and with others, and generally there's lots of support for the idea
19:04:37 <jeblair> and a logical next step after the puppet module breakup
19:04:50 <nibalizer> (i have to leave at quarter-till tho)
19:05:29 <fungi> asselin: flagged that to read soon. i had similar thoughts a while back
19:05:29 <clarkb> jeblair: right, I don't think having a new independent repo helps much if we do that before we have the module split done
19:05:30 <asselin> I was hoping to start looking at the possible solutions and get something proposed by end of K1.
19:05:45 <fungi> hogepodge: you may also be interested in that spec
19:06:20 <asselin> I took an initial look at what it would take to set up a log server.
19:06:26 <jeblair> clarkb: yeah, i'm assuming this depends on finishing the module split
19:06:32 <jeblair> and it's going to uncover a lot of other gotchas too
19:06:43 <jeblair> but it should help us start to nail down our interfaces
19:06:47 <mordred> ++
19:06:53 <asselin> got good feedback, and looking at starting to
19:06:58 <jeblair> since having >1 consumer is really helpful for that sort of thing :)
19:07:03 <clarkb> +1
19:07:03 <asselin> jeblair, right, exactly :)
19:07:11 <SergeyLukjanov> o/
19:07:29 <jeblair> also, we might be able to do better testing for this limited part of our system
19:07:45 <fungi> asselin: the logserver is likely to be the hardest part, since setting up a public-facing webserver often clashes with corporate network admins' firewall policies and needs extra deployment considerations
19:07:55 <fungi> but definitely still worth covering the simple case
19:08:00 <asselin> jeblair, +1 can add that to the spec
19:08:09 <jeblair> anyway, so it sounds like next steps are for us to try to find some time to review the spec, and if we find any contentious/complicated bits, come back here and hash them out?
19:08:22 <asselin> fungi, the assumption is to set up the log server in a public place, and the rest can operate behind the firewall
19:08:29 <mordred> asselin: also, offline from this meeting, I'd like to sync up with you on the rework-launch-node things I've been poking at and haven't written up
19:08:48 <asselin> mordred, ok sure
19:09:02 <asselin> jeblair, yes
19:09:21 <asselin> #link https://review.openstack.org/#/q/topic:thirdpartyci,n,z
19:09:40 <asselin> I created a topic to track the spec and initial attempt at the log server ^
19:10:06 <asselin> so that's it, just wanted to get awareness and support
19:10:11 <jeblair> asselin: cool, thanks very much!
19:10:16 <anteaya> asselin: nice work
19:10:22 <fungi> a worthwhile endeavor
19:10:29 <asselin> thanks
19:10:37 <jeblair> #topic Actions from last meeting
19:10:50 <timrc> o/
19:10:57 <jeblair> oh, forgot my links
19:11:00 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:11:02 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-12-02-19.01.html
19:11:23 <jeblair> anteaya draft messaging to communicate the new third-party account process
19:11:33 <anteaya> that happened
19:11:34 <fungi> she even sent it out
19:11:39 <anteaya> I did
19:11:41 <jeblair> above and beyond!
19:11:46 <anteaya> heh
19:11:57 <clarkb> and we are mostly transitioned off of the old stuff.
19:12:02 <anteaya> yay
19:12:14 <jeblair> so that seems to be going well, aside from apparently we had a way to block gerrit emails being sent on behalf of 3p systems
19:12:16 <fungi> #link http://lists.openstack.org/pipermail/third-party-announce/2014-December/000130.html
19:12:17 <clarkb> the old groups still exist but are owned by administrators and are not visible
19:12:19 <jeblair> which we lost
19:12:28 <anteaya> pleia2: has a system-config patch up to remove the old ml from puppet
19:12:36 <anteaya> we have plans to archive it
19:12:49 <anteaya> the third-party requests ml
19:12:56 <jeblair> what should we do about the email functionality?
19:13:20 <jeblair> drop the "feature", try to get people to manage a non-voting-ci group, exim rules?
19:13:41 <anteaya> I like not managing this group
19:14:03 <anteaya> can we filter on IC
19:14:07 <anteaya> CI
19:14:11 <anteaya> laggy wifi
19:14:28 <fungi> create a "don't send e-mail" group in gerrit and have a lot of people who can drop accounts in there for well-defined reasons?
19:14:38 <anteaya> ohhh, I like that
19:14:41 <fungi> doesn't necessarily have to be ci-specific, but likely would be anyway
19:14:42 <jeblair> anteaya: yes, we could filter outgoing mail with exim based on having "CI" in the name
19:14:49 <anteaya> who can add to that group?
19:14:51 <clarkb> I strongly feel this should be managed in clients. Similar to filtering noise in irc channels
19:15:11 <clarkb> not everyone will agree on how to filter a thing and thankfully for email there are lots of tools available to do this independent of gerrit
19:15:12 <anteaya> clarkb: except it isn't happening
19:15:20 <clarkb> anteaya: why not? anyone can do it
19:15:23 <anteaya> and folks make noise in -infra
19:15:41 <anteaya> based on the number of questions so far it isn't happening
19:15:53 <clarkb> if I had to pick a way to go back to sort of what we had before I would create some Central CI group that has a pretty large management group to add/remove members
19:16:14 <anteaya> I like fungi's no email group
19:16:15 <fungi> i think some of the complaint is devs using muas they can't or don't know how to configure to filter this (which should be a lot easier now that we have naming consistency)
19:16:20 <clarkb> we could reuse the existing Third-Party CI group
19:16:28 <clarkb> which is preseeded with a number of ci users
19:16:38 <jeblair> and anyone who complains gets added to the management group for that group ;)
19:16:43 <anteaya> ha ha ha
19:16:44 <clarkb> jeblair: +1 :)
19:16:46 <anteaya> yes!
19:16:47 <nibalizer> hahaha
19:17:24 <fungi> i'm on board. if we do implement a no-emails group, infra core shouldn't add anyone to it, we should only add coordinators and make them take responsibility
19:17:40 * anteaya touches her nose
19:17:43 <clarkb> so I can make this change. Should I go ahead and remake Third-Party CI a thing and give Third-Party Coordinators ownership and preseed that group with people to manage it?
19:18:04 <fungi> unfortunately, there's no real audit trail on group membership management, so the larger the list of coordinators the less likely you'll be able to figure out when, why and by whom a member was added/removed
19:18:04 <jeblair> everyone attending the third-party ci meetings should be in the management group, i think
19:18:14 <clarkb> fungi: so you wouldn't start with the preexisting list of accounts?
19:18:16 <anteaya> terrific
19:18:17 <jeblair> fungi: isn't there such a thing in the db?
19:18:21 <jeblair> fungi: just not exposed?
19:18:48 <fungi> clarkb: yeah, whether or not the group is pre-seeded doesn't change the future accountability problem potential though
19:18:52 <jeblair> clarkb: i think we should start with the current list. just not add any more :)
19:19:00 <anteaya> then folks can complain at the meetings, not the infra channel
19:19:03 <clarkb> jeblair: ya that is what I was going with
19:19:11 <fungi> jeblair: i don't think there is, unless you count the mysql journal in trove
19:19:32 <jeblair> fungi: hrm, there's an audit table for groups, but we can dig into that later
19:19:44 <fungi> oh, indeed. i'll double-check it
19:20:30 <clarkb> #action clarkb add DENY Email Third-Party CI rule to gerrit ACLs, give Third-Party Coordinators ownership of Third-Party CI, seed Third-Party Coordinators with third party meeting attendees
19:20:38 <clarkb> is that what we have agreed on?
19:20:42 <jeblair> i think so
19:20:51 <anteaya> I like it
19:20:55 <fungi> yep, i'm on board
19:20:55 <jeblair> also, we should make it clear that jenkins and anything not a third-party ci is off-limits :)
19:21:09 <clarkb> jeblair: ya I can put that in the description of the group too
19:21:34 <jeblair> back to actions...
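
[Editor's note: the ACL change clarkb took as an action above maps to Gerrit's "Email Reviewers" permission, which can be denied to a group so that comments posted by its members no longer generate mail while still being recorded normally. A minimal sketch of what that might look like in the All-Projects project.config, assuming the group keeps the "Third-Party CI" name discussed here:]

    [access "refs/*"]
        emailReviewers = deny group Third-Party CI

[Members of the denied group can still comment and vote as before; Gerrit simply stops sending e-mail on their behalf.]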
19:21:35 <jeblair> fungi nibalizer get pip and github modules split out
19:21:46 <fungi> according to nibalizer that was already done last week
19:21:53 <nibalizer> yup
19:21:55 <jeblair> oh neato
19:21:59 <fungi> and so i readded it to the actions list in error
19:22:05 <jeblair> clarkb script new gerrit group creation for self service third party accounts
19:22:13 <jeblair> also done :)
19:22:17 <jeblair> fungi close openstack-ci and add openstack-gate to e-r bugs
19:22:21 <fungi> done
19:22:31 <fungi> we haven't needed the hat of shame at all this week
19:22:39 <anteaya> :D
19:22:45 * mordred is the hat of shame
19:22:48 <jeblair> #topic Priority Efforts (swift logs)
19:23:13 <fungi> haz
19:23:19 <nibalizer> i have the hat of shame, i didn't make the docker thing i was supposed to do
19:23:33 <jhesketh> Okay, so zuul-swift-upload now works as a publisher so we can get logs on failed tests
19:23:40 <mordred> WOOT
19:23:40 <anteaya> yay
19:23:47 <jeblair> those are the best kind to have :)
19:23:52 <jhesketh> The experimental job (as seen here https://review.openstack.org/#/c/133179/) has disk log storage turned off.
19:24:07 <clarkb> for the most recent run
19:24:11 <jhesketh> So everything should be in place now to switch jobs over. Some of the project-config jobs are logging to swift. The next step is to turn off disk logging for infra to be the guinea pigs
19:24:18 <fungi> and no more races with getting the end of the console log? or is that still sometimes an issue?
19:24:29 <jhesketh> What do people think?
19:24:37 <mordred> jhesketh: I support this
19:24:37 <clarkb> fungi: a non-issue with swift
19:25:05 <anteaya> jhesketh: progress progress progress let's make some
19:25:17 <clarkb> yes I think I am ready to dogfood. I did have some ideas that came up looking at the index page for the above job. It would be nice if we had timestamps and file size there but I think we can add that later
19:25:36 <jeblair> #agreed start infra dogfooding logs in swift
19:25:37 <jhesketh> fungi: do you mean that we miss the end due to fetching? We kinda do in that we cut off the wget stuff
19:25:38 <mordred> I'm excited that 4 years in we may be about to use swift
19:25:41 <fungi> looks good. our publisher is still a little noisy with certificate warnings when grabbing the console from jenkins
19:25:54 <fungi> jhesketh: right
19:25:55 <mordred> fungi: is that because of the RFC deprecation thing?
19:26:07 <fungi> mordred: nope, it's because it's self-signed
19:26:09 <mordred> ah
19:26:38 <fungi> also we still need index page text besides the links themselves
19:26:39 <jhesketh> We could try and silence wget
19:26:54 <mordred> or we could just get 8 real certs
19:27:02 <jeblair> mordred: --
19:27:15 <jhesketh> fungi: ?
19:27:35 <jhesketh> Do you mean the returned links?
19:27:44 <fungi> jhesketh: the readme we embed in on-disk apache autoindexes
19:28:06 <jhesketh> Ah, right, yes
19:28:10 <clarkb> fungi: not for our dogfooding though
19:28:18 <clarkb> hrm except for d-g I guess
19:28:22 <clarkb> but d-g is the interesting one
19:28:30 <jhesketh> I guess each job can do their own index somehow
19:28:49 <clarkb> what if we just put that template in a known location and link to it in every index?
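
[Editor's note: a rough sketch of the index improvement clarkb mentions above, where the uploader writes its own index.html recording size and modification time for each file. This is illustrative only, not the existing zuul-swift-upload code, and the function and file names are assumptions.]

    # index_sketch.py - illustrative only
    import cgi
    import os
    import time

    def write_index(directory, filenames):
        """Write a simple index.html listing name, size and mtime."""
        rows = []
        for name in sorted(filenames):
            stat = os.stat(os.path.join(directory, name))
            rows.append(
                '<tr><td><a href="{0}">{0}</a></td>'
                '<td>{1}</td><td>{2}</td></tr>'.format(
                    cgi.escape(name, quote=True),
                    stat.st_size,
                    time.strftime('%Y-%m-%d %H:%M:%S',
                                  time.gmtime(stat.st_mtime))))
        with open(os.path.join(directory, 'index.html'), 'w') as index:
            index.write('<table>\n%s\n</table>\n' % '\n'.join(rows))

[A publisher could call write_index(job_log_dir, os.listdir(job_log_dir)) just before uploading, so the listing travels to swift with the rest of the logs.]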
19:28:51 <fungi> we could add a generate-index macro in jjb maybe
19:28:53 <jhesketh> Either that or we make os-loganalyze smarter
19:29:14 <clarkb> where link is something different than a hyperlink so that it renders nicely
19:29:14 <fungi> i think os-loganalyze seems like a better place to fix that
19:29:46 <fungi> since that will allow us to alter readmes over time rather than having them stuck in whatever the state was at the time the logs were uploaded
19:30:17 <jeblair> the opposite approach has advantages too -- as things evolve, the readmes can co-evolve
19:30:28 <jhesketh> Except the readme might not match the old logs, so storing it with the job may make more sense
19:30:30 <fungi> true
19:30:34 <jeblair> (if, say, devstack-gate wrote its own readme)
19:30:54 <jhesketh> Doing it as a macro adds the greatest flexibility to the jobs
19:31:34 <jeblair> what writes the index right now?
19:31:40 <fungi> yeah, and i guess the inefficiency of having hundreds of thousands of copies isn't terrible since that'll be a small bit of text in the end
19:31:49 <jhesketh> This shouldn't affect the project-config jobs we want to dogfood, so maybe we tackle it when we move the other jobs over
19:32:02 <jeblair> btw, logs.openstack.org/79/133179/1/experimental/experimental-swift-logs-system-config-pep8-centos6/cc75c20 is really slow to load for me
19:32:30 <jhesketh> jeblair: the upload script can generate an index.html which is just a list of files it uploaded
19:33:05 <fungi> on performance, yes it does seem that the index page generation is slow. requesting specific files is very quick by comparison
19:33:43 <clarkb> fungi: the index.html isn't generated and is also a specific file. So I think any file may have that problem
19:33:44 <fungi> huh... suddenly it sped up for me
19:33:46 <jeblair> jhesketh: so we could add times/sizes to that, and have it insert the text of a readme if one exists
19:33:50 <jhesketh> Yes, so indexes suck in that the object in the url is first attempted to be fetched, and failing that it appends index.html and tries again
19:34:00 <jhesketh> So it needs to make a few calls to swift
19:34:12 <jhesketh> jeblair: yep
19:34:41 <jeblair> jhesketh: can we have os-log-analyze append 'index.html' if the uri terminates with a '/'?
19:34:55 <jeblair> (which it should anyway, and then we can go update zuul, etc, to leave the proper / terminated link)
19:35:03 <jhesketh> fungi: the speed will depend on whether there is an established connection with swift available in the pool
19:35:21 <fungi> ahh
19:35:38 <jhesketh> jeblair: that seems reasonable (so we assume object names never end in a trailing slash)
19:36:04 <jeblair> i _think_ for our pseudo-filesystem case we can make that assumption
19:36:20 <jhesketh> Yep, good idea
19:36:22 <fungi> after all, that's basically how it's working with apache and a real filesystem
19:36:38 <clarkb> I like that idea
19:36:59 <jhesketh> Also swift apparently has a plan to not be terrible at threads, which will help our connection pool management and stop so much lag (hopefully)
19:37:37 <jeblair> cool, agreement to dogfood and some next steps... anything else?
19:37:53 <jhesketh> Okay so it sounds like we're ready to dogfood and will make some tweaks to the index loading and then documentation as we go
19:38:10 <jeblair> jhesketh: thanks!
19:38:12 <jeblair> #topic Priority Efforts (Puppet module split)
19:38:13 <ianw> jhesketh: couldn't the output page be cached by osloganalyze?
19:38:57 <jeblair> asselin is probably gone by now...
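
[Editor's note: a minimal illustration of jeblair's suggestion at 19:34:41 above, that os-loganalyze treat a '/'-terminated URL as a pseudo-directory and fetch its index.html in a single swift call instead of trying the bare object first and falling back. This is a sketch, not the actual os-loganalyze code.]

    def swift_object_name(request_path):
        """Map a request path onto the swift object to fetch."""
        if request_path.endswith('/'):
            # pseudo-directory: serve the generated index page directly
            return request_path + 'index.html'
        return request_path

    # e.g. swift_object_name('/79/133179/1/check/job/cc75c20/')
    #      -> '/79/133179/1/check/job/cc75c20/index.html'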
19:39:05 <jeblair> nibalizer: anything related to this we should chat about? anything blocking?
19:39:13 <ianw> fairly related to this, i have two changes for the httpd puppet module that need final reviews
19:39:20 <jhesketh> ianw: hmm, with something like memcache it might not be a bad idea. Let's take that as an offline improvement
19:39:22 <ianw> https://review.openstack.org/136959 (update rakefile)
19:39:35 <ianw> https://review.openstack.org/136962 (adding the module)
19:39:38 <mmedvede> there are a couple of things that can be merged
19:39:41 <jeblair> #link https://review.openstack.org/136959
19:39:41 <mmedvede> #link https://review.openstack.org/#/q/status:open++topic:module-split%29,n,z
19:39:47 <nibalizer> jeblair: i think we're swell
19:40:05 <jeblair> mmedvede: ah thanks
19:40:19 <nibalizer> i think things have been sorta slow lately, but i attribute that to the infra-manual sprint and thanksgiving so im not worried
19:40:50 <jeblair> so we should consider those priority reviews
19:41:23 <mmedvede> There was also some movement towards more automation on splits
19:41:30 <jeblair> ianw: maybe you should change your topics to 'module-split'?
19:42:10 <jeblair> i mean, it's split out, but it still seems related to the effort :)
19:42:54 <anteaya> mmedvede: can you expand on the automation
19:42:59 <anteaya> I have some questions
19:43:15 <jeblair> i think there's a pending change to add a script to system-config
19:43:22 <anteaya> cool
19:43:56 <jeblair> #link https://review.openstack.org/#/c/137991/
19:44:01 <mmedvede> anteaya: we are trying to maintain all the splits automatically, before they are pulled in
19:44:19 <anteaya> great
19:44:20 <mmedvede> asselin and sweston were the ones who worked on it
19:44:24 <nibalizer> I have to step out, thanks everyone!
19:44:45 <anteaya> and asselin says that the script updates sweston's github repo
19:44:53 <anteaya> what is the trigger for the update?
19:44:55 <mmedvede> anteaya: correct
19:45:34 <anteaya> my concern is the extra overhead of having the source under sweston's control and asselin's patch
19:45:48 <anteaya> in case the seed repo needs to be respun
19:45:52 <ianw> jeblair: the important one, https://review.openstack.org/#/c/136962/ , has had two +2's and 4 +1's ... so it's been seen ... but we can't use the module until it's in
19:45:56 <anteaya> which can happen
19:46:08 <mmedvede> anteaya: I see. There could be another step added that actually would validate
19:46:26 <mmedvede> i.e. run a simple diff with their upstream vs system-config
19:46:40 <anteaya> mmedvede: if we can have anyone trigger a respin of the repo, especially a patch owner, that is all I am looking for
19:47:47 <jeblair> #topic Priority Efforts (Nodepool DIB)
19:47:59 <mordred> dib images work in rackspace now
19:48:05 <yolanda> congrats!
19:48:08 <mordred> I have repeatable scripts to make them work and whatnot
19:48:12 <yolanda> mordred, using glance import and swift?
19:48:16 <mordred> yolanda: yes
19:48:31 <mordred> it turns out that in rax we need to use glance v2 and swift, and in HP we need to use glance v1
19:48:51 <mordred> jroll also got rax to put info we need into config-drive
19:48:52 <jeblair> the trusty nodepool server is not working at scale, and we're digging into why.
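
[Editor's note: a hedged sketch of the two upload paths mordred describes (a direct glance v1 upload on HP Cloud, and swift staging plus a glance v2 import task on Rackspace). Authentication and endpoint discovery are elided, and the exact task input format is an assumption based on the glance v2 tasks API; this is not the actual nodepool/shade code.]

    from glanceclient import Client

    def upload_hpcloud(endpoint, token, path):
        """HP Cloud: glance v1 accepts the image bytes directly."""
        glance = Client('1', endpoint, token=token)
        with open(path, 'rb') as image:
            return glance.images.create(name='devstack-trusty',
                                        disk_format='qcow2',
                                        container_format='bare',
                                        data=image)

    def upload_rackspace(endpoint, token, swift_location):
        """Rackspace: the image is staged in swift first (e.g. with
        python-swiftclient), then glance v2 is asked to import it."""
        glance = Client('2', endpoint, token=token)
        return glance.tasks.create(
            type='import',
            input={'import_from': swift_location,  # e.g. 'images/trusty.vhd'
                   'import_from_format': 'vhd',
                   'image_properties': {'name': 'devstack-trusty'}})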
19:49:08 <mordred> so I think before we roll this out, we want to wait for that change to go live so that we can drop nova-agent
19:49:12 <mordred> however, nova-agent does work
19:49:17 <mordred> and we have dib elements for it
19:49:22 <jeblair> mordred: ++
19:49:30 <mordred> my next step is to turn the shell scripts I have into python
19:49:43 <mordred> at which point I'm probably going to want to have a conversation about nodepool+shade
19:50:03 <mordred> because "I want to upload an image" having two completely different paths per cloud is not really nodepool-specific logic
19:50:08 <fungi> also we seem to possibly be overloading our git server farm during snapshot image updates, and switching to dib will reduce that a whole bunch
19:50:09 <clarkb> mordred: and everyone else really. I would really appreciate it if we stopped pushing so many new features to nodepool until we get the current ones working :)
19:50:29 <clarkb> it's great that people are excited about nodepool but there are a few things currently wrong with it and I don't think they are getting much attention
19:50:59 <mordred> in any case, if people want to look at the elements, they're up for review against system-config
19:51:07 <mordred> I'll move them to project-config and repropose soon
19:51:25 <jeblair> mordred: i'm probably going to want to have a conversation about shade's project hosting and maintenance situation :)
19:51:32 <mordred> jeblair: me too :)
19:51:43 <mordred> jeblair: I imagine that will be part of that conversation
19:51:58 <mordred> ALSO - I'd like to suggest that we move nodepool/elements to just elements/ - because I think we're going to wind up with non-nodepool elements too (see sane base images for infra servers)
19:52:02 <mordred> but that's not urgent
19:52:33 <jeblair> *nod*
19:52:34 <clarkb> mordred: they are in different repos though
19:52:46 <mordred> clarkb: the idea is to not propose the elements I've been working on to system-config
19:52:50 <mordred> but to project-config
19:52:57 <mordred> and have one set of elements and have it be there
19:53:18 <fungi> elements for non-nodepool things would live in project-config?
19:53:34 <jeblair> i feel like we're getting really close to a chicken and egg problem
19:53:50 <fungi> what non-nodepool elements would be likely to get used for project-specific stuff?
19:54:39 <jeblair> probably we should start with duplication and see what we end up with and if it makes sense to de-dup?
19:54:58 <jeblair> this might be a bit hard to reason about at the current level of abstraction/understanding :)
19:55:14 <clarkb> ya I think that is something to revisit when dib is a bit more concrete for us
19:55:21 <clarkb> right now it's very much hand-wavy
19:55:29 <fungi> cart before the horse then
19:56:02 <jeblair> #topic Priority Efforts (jobs on trusty)
19:56:10 <fungi> i prodded bug 1348954, bug 1367907 and bug 1382607 again a couple weeks ago, but no response from anyone in ubuntu/canonical on having a less-bug-ridden py3k in trusty
19:56:14 <uvirtbot> Launchpad bug 1348954 in python3.4 "update Python3 for trusty" [Undecided,Confirmed] https://launchpad.net/bugs/1348954
19:56:15 <uvirtbot> Launchpad bug 1367907 in python3.4 "Segfault in gc with cyclic trash" [High,Fix released] https://launchpad.net/bugs/1367907
19:56:16 <uvirtbot> Launchpad bug 1382607 in python3.4 "[SRU] Backport python3.4 logging module backward incompatibility fix." [High,Fix released] https://launchpad.net/bugs/1382607
19:56:28 <fungi> i'm open to suggestions on how to raise the visibility of those
19:56:53 <fungi> zul asked me for the bug numbers again last week, but not sure if he was looking into them now too
19:57:05 <jeblair> switch to an os that supports python3.4?
19:57:22 <fungi> there's an alternative
19:57:42 <fungi> or build our own real python interpreters for tests rather than using distro-provided python ;)
19:57:54 <clarkb> or use the ones they pushed into that other repo
19:58:06 <fungi> (which come with who-knows-what features back/forwardported into them)
19:58:14 <fungi> that other repo?
19:58:17 <zul> i could look at pushing it to a ppa again as well
19:58:39 <clarkb> fungi: py3.4 is fixed in one of the package maintainers' repos iirc
19:58:48 <clarkb> fungi: which you tested to confirm that the new package fixed our problem
19:59:00 <fungi> clarkb: yeah
19:59:06 <jeblair> zul: can you just yell at them to release it already? :)
19:59:11 <fungi> clarkb: at least fixed for that one patch
19:59:16 <zul> jeblair: ive done that
20:00:12 <jeblair> thanks everyone
20:00:17 <pleia2> we skipped over them, but core eyeballs on the priority specs would be much appreciated
20:00:23 <fungi> anyway, that's the current state. patched python 3.4 in trusty is currently the only blocker to moving our py3k jobs to 3.4/trusty
20:00:25 <jeblair> pleia2: ++
20:00:36 <jeblair> and gerrit topics are up next time
20:00:38 <jeblair> #endmeeting