19:02:10 <jeblair> #startmeeting infra
19:02:12 <openstack> Meeting started Tue Dec 9 19:02:10 2014 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:15 <openstack> The meeting name has been set to 'infra'
19:02:24 <ianw> o/ (despite thunderstorms and my isp doing their best to keep me away)
19:02:31 <asselin> I'm here
19:02:33 <cody-somerville> \o
19:02:43 <jeblair> oh yay
19:02:46 <jhesketh> Morning
19:02:50 <krtaylor> o/
19:02:52 <jesusaurus> o/
19:02:57 <fungi> yo
19:03:00 <jeblair> asselin has a time constraint today, so we'll take his topic first
19:03:11 <asselin> jeblair, thanks.
19:03:22 <asselin> I'm proposing an in-tree 3rd party ci solution.
19:03:30 <jeblair> #topic in-tree 3rd party ci solution (asselin)
19:03:54 <asselin> I have a spec written. looking for link...
19:04:02 <jeblair> #link https://review.openstack.org/#/c/139745/
19:04:21 <mordred> o/
19:04:24 <jeblair> cool, i think this sounds like a good idea
19:04:36 <nibalizer> o/
19:04:37 <asselin> thanks. I've been discussing it in the 3rd party meeting and with others, and generally there's lots of support for the idea
19:04:37 <jeblair> and a logical next step after the puppet module breakup
19:04:50 <nibalizer> (i have to leave at quarter-till tho)
19:05:29 <fungi> asselin: flagged that to read soon. i had similar thoughts a while back
19:05:29 <clarkb> jeblair: right, I don't think having a new independent repo helps much if we do that before we have the module split done
19:05:30 <asselin> I was hoping to start looking at the possible solutions and get something proposed by end of K1.
19:05:45 <fungi> hogepodge: you may also be interested in that spec
19:06:20 <asselin> I took an initial look at what it would take to set up a log server.
19:06:26 <jeblair> clarkb: yeah, i'm assuming this depends on finishing the module split
19:06:32 <jeblair> and it's going to uncover a lot of other gotchas too
19:06:43 <jeblair> but it should help us start to nail down our interfaces
19:06:47 <mordred> ++
19:06:53 <asselin> got good feedback, and looking at starting to
19:06:58 <jeblair> since having >1 consumer is really helpful for that sort of thing :)
19:07:03 <clarkb> +1
19:07:03 <asselin> jeblair, right, exactly :)
19:07:11 <SergeyLukjanov> o/
19:07:29 <jeblair> also, we might be able to do better testing for this limited part of our system
19:07:45 <fungi> asselin: the logserver is likely to be the hardest part, since setting up a public-facing webserver often clashes with corporate network admins' firewall policies and needs extra deployment considerations
19:07:55 <fungi> but definitely still worth covering the simple case
19:08:00 <asselin> jeblair, +1 can add that to the spec
19:08:09 <jeblair> anyway, so it sounds like next steps are for us to try to find some time to review the spec, and if we find any contentious/complicated bits, come back here and hash them out?
19:08:22 <asselin> fungi, the assumption is to set up the log server in a public place, and the rest can operate behind the firewall
19:08:29 <mordred> asselin: also, offline from this meeting, I'd like to sync up with you on the rework-launch-node things I've been poking at and haven't written up
19:08:48 <asselin> mordred, ok sure
19:09:02 <asselin> jeblair, yes
19:09:21 <asselin> #link https://review.openstack.org/#/q/topic:thirdpartyci,n,z
19:09:40 <asselin> I created a topic to track the spec and initial attempt at the log server ^
19:10:06 <asselin> so that's it, just wanted to get awareness and support
19:10:11 <jeblair> asselin: cool, thanks very much!
19:10:16 <anteaya> asselin: nice work
19:10:22 <fungi> a worthwhile endeavor
19:10:29 <asselin> thanks
19:10:37 <jeblair> #topic Actions from last meeting
19:10:50 <timrc> o/
19:10:57 <jeblair> oh, forgot my links
19:11:00 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting
19:11:02 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/infra/2014/infra.2014-12-02-19.01.html
19:11:23 <jeblair> anteaya draft messaging to communicate the new third-party account process
19:11:33 <anteaya> that happened
19:11:34 <fungi> she even sent it out
19:11:39 <anteaya> I did
19:11:41 <jeblair> above and beyond!
19:11:46 <anteaya> heh
19:11:57 <clarkb> and we are mostly transitioned off of the old stuff.
19:12:02 <anteaya> yay
19:12:14 <jeblair> so that seems to be going well, aside from apparently we had a way to block gerrit emails being sent on behalf of 3p systems
19:12:16 <fungi> #link http://lists.openstack.org/pipermail/third-party-announce/2014-December/000130.html
19:12:17 <clarkb> the old groups still exist but are owned by administrators and are not visible
19:12:19 <jeblair> which we lost
19:12:28 <anteaya> pleia2: has a system-config patch up to remove the old ml from puppet
19:12:36 <anteaya> we have plans to archive it
19:12:49 <anteaya> the third-party requests ml
19:12:56 <jeblair> what should we do about the email functionality?
19:13:20 <jeblair> drop the "feature", try to get people to manage a non-voting-ci group, exim rules?
19:13:41 <anteaya> I like not managing this group
19:14:03 <anteaya> can we filter on IC
19:14:07 <anteaya> CI
19:14:11 <anteaya> laggy wifi
19:14:28 <fungi> create a "don't send e-mail" group in gerrit and have a lot of people who can drop accounts in there for well-defined reasons?
19:14:38 <anteaya> ohhh, I like that
19:14:41 <fungi> doesn't necessarily have to be ci-specific, but likely would be anyway
19:14:42 <jeblair> anteaya: yes, we could filter outgoing mail with exim based on having "CI" in the name
19:14:49 <anteaya> who can add to that group?
19:14:51 <clarkb> I strongly feel this should be managed in clients. Similar to filtering noise in irc channels
19:15:11 <clarkb> not everyone will agree on how to filter a thing and thankfully for email there are lots of tools available to do this independent of gerrit
19:15:12 <anteaya> clarkb: except it isn't happening
19:15:20 <clarkb> anteaya: why not? anyone can do it
19:15:23 <anteaya> and folks make noise in -infra
19:15:41 <anteaya> based on the number of questions so far it isn't happening
19:15:53 <clarkb> if I had to pick a way to go back to sort of what we had before I would create some Central CI group that has a pretty large management group to add/remove members
19:16:14 <anteaya> I like fungi's no email group
19:16:15 <fungi> i think some of the complaint is devs using muas they can't or don't know how to configure to filter this (which should be a lot easier now that we have naming consistency)
19:16:20 <clarkb> we could reuse the existing Third-Party CI group
19:16:28 <clarkb> which is preseeded with a number of ci users
19:16:38 <jeblair> and anyone who complains gets added to the management group for that group ;)
19:16:43 <anteaya> ha ha ha
19:16:44 <clarkb> jeblair: +1 :)
19:16:46 <anteaya> yes!
19:16:47 <nibalizer> hahaha
19:17:24 <fungi> i'm on board. if we do implement a no-emails group, infra core shouldn't add anyone to it, we should only add coordinators and make them take responsibility
19:17:40 * anteaya touches her nose
19:17:43 <clarkb> so I can make this change. Should I go ahead and remake Third-Party CI a thing and give Third-Party Coordinators ownership and preseed that group with people to manage it?
19:18:04 <fungi> unfortunately, there's no real audit trail on group membership management, so the larger the list of coordinators the less likely you'll be able to figure out when, why and by whom a member was added/removed
19:18:04 <jeblair> everyone attending the third-party ci meetings should be in the management group, i think
19:18:14 <clarkb> fungi: so you wouldn't start with the preexisting list of accounts?
19:18:16 <anteaya> terrific
19:18:17 <jeblair> fungi: isn't there such a thing in the db?
19:18:21 <jeblair> fungi: just not exposed?
19:18:48 <fungi> clarkb: yeah, whether or not the group is pre-seeded doesn't change the future accountability problem potential though
19:18:52 <jeblair> clarkb: i think we should start with the current list. just not add any more :)
19:19:00 <anteaya> then folks can complain at the meetings, not the infra channel
19:19:03 <clarkb> jeblair: ya that is what I was going with
19:19:11 <fungi> jeblair: i don't think there is, unless you count the mysql journal in trove
19:19:32 <jeblair> fungi: hrm, there's an audit table for groups, but we can dig into that later
19:19:44 <fungi> oh, indeed. i'll double-check it
19:20:30 <clarkb> #action clarkb add DENY Email Third-Party CI rule to gerrit ACLs, give Third-Party Coordinators ownership of Third-Party CI, seed Third-Party Coordinators with third party meeting attendees
19:20:38 <clarkb> is that what we have agreed on?
19:20:42 <jeblair> i think so
19:20:51 <anteaya> I like it
19:20:55 <fungi> yep, i'm on board
19:20:55 <jeblair> also, we should make it clear that jenkins and anything not a third-party ci is off-limits :)
19:21:09 <clarkb> jeblair: ya I can put that in the description of the group too
19:21:34 <jeblair> back to actions...
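
[Editor's note: the ACL change clarkb took as an action above maps to Gerrit's "Email Reviewers" permission, which can be denied to a group so that comments posted by its members no longer generate mail while still being recorded normally. A minimal sketch of what that might look like in the All-Projects project.config, assuming the group keeps the "Third-Party CI" name discussed here:]

    [access "refs/*"]
        emailReviewers = deny group Third-Party CI

[Members of the denied group can still comment and vote as before; Gerrit simply stops sending e-mail on their behalf.]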
19:21:35 <jeblair> fungi nibalizer get pip and github modules split out
19:21:46 <fungi> according to nibalizer that was already done last week
19:21:53 <nibalizer> yup
19:21:55 <jeblair> oh neato
19:21:59 <fungi> and so i readded it to the actions list in error
19:22:05 <jeblair> clarkb script new gerrit group creation for self service third party accounts
19:22:13 <jeblair> also done :)
19:22:17 <jeblair> fungi close openstack-ci and add openstack-gate to e-r bugs
19:22:21 <fungi> done
19:22:31 <fungi> we haven't needed the hat of shame at all this week
19:22:39 <anteaya> :D
19:22:45 * mordred is the hat of shame
19:22:48 <jeblair> #topic Priority Efforts (swift logs)
19:23:13 <fungi> haz
19:23:19 <nibalizer> i have the hat of shame, i didn't make the docker thing i was supposed to do
19:23:33 <jhesketh> Okay, so zuul-swift-upload now works as a publisher so we can get logs on failed tests
19:23:40 <mordred> WOOT
19:23:40 <anteaya> yay
19:23:47 <jeblair> those are the best kind to have :)
19:23:52 <jhesketh> The experimental job (as seen here https://review.openstack.org/#/c/133179/) has disk log storage turned off.
19:24:07 <clarkb> for the most recent run
19:24:11 <jhesketh> So everything should be in place now to switch jobs over. Some of the project-config jobs are logging to swift. The next step is to turn off disk logging for infra to be the guinea pigs
19:24:18 <fungi> and no more races with getting the end of the console log? or is that still sometimes an issue?
19:24:29 <jhesketh> What do people think?
19:24:37 <mordred> jhesketh: I support this
19:24:37 <clarkb> fungi: a non-issue with swift
19:25:05 <anteaya> jhesketh: progress progress progress let's make some
19:25:17 <clarkb> yes I think I am ready to dogfood. I did have some ideas that came up looking at the index page for the above job. It would be nice if we had timestamps and file size there but I think we can add that later
19:25:36 <jeblair> #agreed start infra dogfooding logs in swift
19:25:37 <jhesketh> fungi: do you mean that we miss the end due to fetching? We kinda do in that we cut off the wget stuff
19:25:38 <mordred> I'm excited that 4 years in we may be about to use swift
19:25:41 <fungi> looks good. our publisher is still a little noisy with certificate warnings when grabbing the console from jenkins
19:25:54 <fungi> jhesketh: right
19:25:55 <mordred> fungi: is that because of the RFC deprecation thing?
19:26:07 <fungi> mordred: nope, it's because it's self-signed
19:26:09 <mordred> ah
19:26:38 <fungi> also we still need index page text besides the links themselves
19:26:39 <jhesketh> We could try and silence wget
19:26:54 <mordred> or we could just get 8 real certs
19:27:02 <jeblair> mordred: --
19:27:15 <jhesketh> fungi: ?
19:27:35 <jhesketh> Do you mean the returned links?
19:27:44 <fungi> jhesketh: the readme we embed in on-disk apache autoindexes
19:28:06 <jhesketh> Ah, right, yes
19:28:10 <clarkb> fungi: not for our dogfooding though
19:28:18 <clarkb> hrm except for d-g I guess
19:28:22 <clarkb> but d-g is the interesting one
19:28:30 <jhesketh> I guess each job can do their own index somehow
19:28:49 <clarkb> what if we just put that template in a known location and link to it in every index?
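
[Editor's note: a rough sketch of the index improvement clarkb mentions above, where the uploader writes its own index.html recording size and modification time for each file. This is illustrative only, not the existing zuul-swift-upload code, and the function and file names are assumptions.]

    # index_sketch.py - illustrative only
    import cgi
    import os
    import time

    def write_index(directory, filenames):
        """Write a simple index.html listing name, size and mtime."""
        rows = []
        for name in sorted(filenames):
            stat = os.stat(os.path.join(directory, name))
            rows.append(
                '<tr><td><a href="{0}">{0}</a></td>'
                '<td>{1}</td><td>{2}</td></tr>'.format(
                    cgi.escape(name, quote=True),
                    stat.st_size,
                    time.strftime('%Y-%m-%d %H:%M:%S',
                                  time.gmtime(stat.st_mtime))))
        with open(os.path.join(directory, 'index.html'), 'w') as index:
            index.write('<table>\n%s\n</table>\n' % '\n'.join(rows))

[A publisher could call write_index(job_log_dir, os.listdir(job_log_dir)) just before uploading, so the listing travels to swift with the rest of the logs.]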
19:28:51 <fungi> we could add a generate-index macro in jjb maybe
19:28:53 <jhesketh> Either that or we make os-loganalyze smarter
19:29:14 <clarkb> where link is something different than a hyperlink so that it renders nicely
19:29:14 <fungi> i think os-loganalyze seems like a better place to fix that
19:29:46 <fungi> since that will allow us to alter readmes over time rather than having them stuck in whatever the state was at the time the logs were uploaded
19:30:17 <jeblair> the opposite approach has advantages too -- as things evolve, the readmes can co-evolve
19:30:28 <jhesketh> Except the readme might not match the old logs, so storing it with the job may make more sense
19:30:30 <fungi> true
19:30:34 <jeblair> (if, say, devstack-gate wrote its own readme)
19:30:54 <jhesketh> Doing it as a macro adds the greatest flexibility to the jobs
19:31:34 <jeblair> what writes the index right now?
19:31:40 <fungi> yeah, and i guess the inefficiency of having hundreds of thousands of copies isn't terrible since that'll be a small bit of text in the end
19:31:49 <jhesketh> This shouldn't affect the project-config jobs we want to dogfood, so maybe we tackle it when we move the other jobs over
19:32:02 <jeblair> btw, logs.openstack.org/79/133179/1/experimental/experimental-swift-logs-system-config-pep8-centos6/cc75c20 is really slow to load for me
19:32:30 <jhesketh> jeblair: the upload script can generate an index.html which is just a list of files it uploaded
19:33:05 <fungi> on performance, yes it does seem that the index page generation is slow. requesting specific files is very quick by comparison
19:33:43 <clarkb> fungi: the index.html isn't generated and is also a specific file. So I think any file may have that problem
19:33:44 <fungi> huh... suddenly it sped up for me
19:33:46 <jeblair> jhesketh: so we could add times/sizes to that, and have it insert the text of a readme if one exists
19:33:50 <jhesketh> Yes, so indexes suck in that the object in the url is first attempted to be fetched, and failing that it appends index.html and tries again
19:34:00 <jhesketh> So it needs to make a few calls to swift
19:34:12 <jhesketh> jeblair: yep
19:34:41 <jeblair> jhesketh: can we have os-log-analyze append 'index.html' if the uri terminates with a '/'?
19:34:55 <jeblair> (which it should anyway, and then we can go update zuul, etc, to leave the proper / terminated link)
19:35:03 <jhesketh> fungi: the speed will depend on whether there is an established connection with swift available in the pool
19:35:21 <fungi> ahh
19:35:38 <jhesketh> jeblair: that seems reasonable (so we assume object names never end in a trailing slash)
19:36:04 <jeblair> i _think_ for our pseudo-filesystem case we can make that assumption
19:36:20 <jhesketh> Yep, good idea
19:36:22 <fungi> after all, that's basically how it's working with apache and a real filesystem
19:36:38 <clarkb> I like that idea
19:36:59 <jhesketh> Also swift apparently has a plan to not be terrible at threads, which will help our connection pool management and stop so much lag (hopefully)
19:37:37 <jeblair> cool, agreement to dogfood and some next steps... anything else?
19:37:53 <jhesketh> Okay so it sounds like we're ready to dogfood and will make some tweaks to the index loading and then documentation as we go
19:38:10 <jeblair> jhesketh: thanks!
19:38:12 <jeblair> #topic Priority Efforts (Puppet module split)
19:38:13 <ianw> jhesketh: couldn't the output page be cached by osloganalyze?
19:38:57 <jeblair> asselin is probably gone by now...
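
[Editor's note: a minimal illustration of jeblair's suggestion at 19:34:41 above, that os-loganalyze treat a '/'-terminated URL as a pseudo-directory and fetch its index.html in a single swift call instead of trying the bare object first and falling back. This is a sketch, not the actual os-loganalyze code.]

    def swift_object_name(request_path):
        """Map a request path onto the swift object to fetch."""
        if request_path.endswith('/'):
            # pseudo-directory: serve the generated index page directly
            return request_path + 'index.html'
        return request_path

    # e.g. swift_object_name('/79/133179/1/check/job/cc75c20/')
    #      -> '/79/133179/1/check/job/cc75c20/index.html'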
19:39:05 <jeblair> nibalizer: anything related to this we should chat about? anything blocking?
19:39:13 <ianw> fairly related to this, i have two changes for the httpd puppet module that need final reviews
19:39:20 <jhesketh> ianw: hmm, with something like memcache it might not be a bad idea. Let's take that as an offline improvement
19:39:22 <ianw> https://review.openstack.org/136959 (update rakefile)
19:39:35 <ianw> https://review.openstack.org/136962 (adding the module)
19:39:38 <mmedvede> there are a couple of things that can be merged
19:39:41 <jeblair> #link https://review.openstack.org/136959
19:39:41 <mmedvede> #link https://review.openstack.org/#/q/status:open++topic:module-split%29,n,z
19:39:47 <nibalizer> jeblair: i think we're swell
19:40:05 <jeblair> mmedvede: ah thanks
19:40:19 <nibalizer> i think things have been sorta slow lately, but i attribute that to the infra-manual sprint and thanksgiving so im not worried
19:40:50 <jeblair> so we should consider those priority reviews
19:41:23 <mmedvede> There was also some movement towards more automation on splits
19:41:30 <jeblair> ianw: maybe you should change your topics to 'module-split'?
19:42:10 <jeblair> i mean, it's split out, but it still seems related to the effort :)
19:42:54 <anteaya> mmedvede: can you expand on the automation
19:42:59 <anteaya> I have some questions
19:43:15 <jeblair> i think there's a pending change to add a script to system-config
19:43:22 <anteaya> cool
19:43:56 <jeblair> #link https://review.openstack.org/#/c/137991/
19:44:01 <mmedvede> anteaya: we are trying to maintain all the splits automatically, before they are pulled in
19:44:19 <anteaya> great
19:44:20 <mmedvede> asselin and sweston were the ones who worked on it
19:44:24 <nibalizer> I have to step out, thanks everyone!
19:44:45 <anteaya> and asselin says that the script updates sweston's github repo
19:44:53 <anteaya> what is the trigger for the update?
19:44:55 <mmedvede> anteaya: correct
19:45:34 <anteaya> my concern is the extra overhead of having the source under sweston's control and asselin's patch
19:45:48 <anteaya> in case the seed repo needs to be respun
19:45:52 <ianw> jeblair: the important one, https://review.openstack.org/#/c/136962/ , has had two +2's and 4 +1's ... so it's been seen ... but we can't use the module until it's in
19:45:56 <anteaya> which can happen
19:46:08 <mmedvede> anteaya: I see. There could be another step added that actually would validate
19:46:26 <mmedvede> i.e. run a simple diff with their upstream vs system-config
19:46:40 <anteaya> mmedvede: if we can have anyone trigger a respin of the repo, especially a patch owner, that is all I am looking for
19:47:47 <jeblair> #topic Priority Efforts (Nodepool DIB)
19:47:59 <mordred> dib images work in rackspace now
19:48:05 <yolanda> congrats!
19:48:08 <mordred> I have repeatable scripts to make them work and whatnot
19:48:12 <yolanda> mordred, using glance import and swift?
19:48:16 <mordred> yolanda: yes
19:48:31 <mordred> it turns out that in rax we need to use glance v2 and swift, and in HP we need to use glance v1
19:48:51 <mordred> jroll also got rax to put info we need into config-drive
19:48:52 <jeblair> the trusty nodepool server is not working at scale, and we're digging into why.
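
[Editor's note: a hedged sketch of the two upload paths mordred describes (a direct glance v1 upload on HP Cloud, and swift staging plus a glance v2 import task on Rackspace). Authentication and endpoint discovery are elided, and the exact task input format is an assumption based on the glance v2 tasks API; this is not the actual nodepool/shade code.]

    from glanceclient import Client

    def upload_hpcloud(endpoint, token, path):
        """HP Cloud: glance v1 accepts the image bytes directly."""
        glance = Client('1', endpoint, token=token)
        with open(path, 'rb') as image:
            return glance.images.create(name='devstack-trusty',
                                        disk_format='qcow2',
                                        container_format='bare',
                                        data=image)

    def upload_rackspace(endpoint, token, swift_location):
        """Rackspace: the image is staged in swift first (e.g. with
        python-swiftclient), then glance v2 is asked to import it."""
        glance = Client('2', endpoint, token=token)
        return glance.tasks.create(
            type='import',
            input={'import_from': swift_location,  # e.g. 'images/trusty.vhd'
                   'import_from_format': 'vhd',
                   'image_properties': {'name': 'devstack-trusty'}})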
19:49:08 <mordred> so I think before we roll this out, we want to wait for that change to go live so that we can drop nova-agent
19:49:12 <mordred> however, nova-agent does work
19:49:17 <mordred> and we have dib elements for it
19:49:22 <jeblair> mordred: ++
19:49:30 <mordred> my next step is to turn the shell scripts I have into python
19:49:43 <mordred> at which point I'm probably going to want to have a conversation about nodepool+shade
19:50:03 <mordred> because "I want to upload an image" having two completely different paths per cloud is not really nodepool-specific logic
19:50:08 <fungi> also we seem to possibly be overloading our git server farm during snapshot image updates, and switching to dib will reduce that a whole bunch
19:50:09 <clarkb> mordred: and everyone else really. I would really appreciate it if we stopped pushing so many new features to nodepool until we get the current ones working :)
19:50:29 <clarkb> it's great that people are excited about nodepool but there are a few things currently wrong with it and I don't think they are getting much attention
19:50:59 <mordred> in any case, if people want to look at the elements, they're up for review against system-config
19:51:07 <mordred> I'll move them to project-config and repropose soon
19:51:25 <jeblair> mordred: i'm probably going to want to have a conversation about shade's project hosting and maintenance situation :)
19:51:32 <mordred> jeblair: me too :)
19:51:43 <mordred> jeblair: I imagine that will be part of that conversation
19:51:58 <mordred> ALSO - I'd like to suggest that we move nodepool/elements to just elements/ - because I think we're going to wind up with non-nodepool elements too (see sane base images for infra servers)
19:52:02 <mordred> but that's not urgent
19:52:33 <jeblair> *nod*
19:52:34 <clarkb> mordred: they are in different repos though
19:52:46 <mordred> clarkb: the idea is to not propose the elements I've been working on to system-config
19:52:50 <mordred> but to project-config
19:52:57 <mordred> and have one set of elements and have it be there
19:53:18 <fungi> elements for non-nodepool things would live in project-config?
19:53:34 <jeblair> i feel like we're getting really close to a chicken and egg problem
19:53:50 <fungi> what non-nodepool elements would be likely to get used for project-specific stuff?
19:54:39 <jeblair> probably we should start with duplication and see what we end up with and if it makes sense to de-dup?
19:54:58 <jeblair> this might be a bit hard to reason about at the current level of abstraction/understanding :)
19:55:14 <clarkb> ya I think that is something to revisit when dib is a bit more concrete for us
19:55:21 <clarkb> right now it's very much hand-wavy
19:55:29 <fungi> cart before the horse then
19:56:02 <jeblair> #topic Priority Efforts (jobs on trusty)
19:56:10 <fungi> i prodded bug 1348954, bug 1367907 and bug 1382607 again a couple weeks ago, but no response from anyone in ubuntu/canonical on having a less-bug-ridden py3k in trusty
19:56:14 <uvirtbot> Launchpad bug 1348954 in python3.4 "update Python3 for trusty" [Undecided,Confirmed] https://launchpad.net/bugs/1348954
19:56:15 <uvirtbot> Launchpad bug 1367907 in python3.4 "Segfault in gc with cyclic trash" [High,Fix released] https://launchpad.net/bugs/1367907
19:56:16 <uvirtbot> Launchpad bug 1382607 in python3.4 "[SRU] Backport python3.4 logging module backward incompatibility fix." [High,Fix released] https://launchpad.net/bugs/1382607
19:56:28 <fungi> i'm open to suggestions on how to raise the visibility of those
19:56:53 <fungi> zul asked me for the bug numbers again last week, but not sure if he was looking into them now too
19:57:05 <jeblair> switch to an os that supports python3.4?
19:57:22 <fungi> there's an alternative
19:57:42 <fungi> or build our own real python interpreters for tests rather than using distro-provided python ;)
19:57:54 <clarkb> or use the ones they pushed into that other repo
19:58:06 <fungi> (which come with who-knows-what features back/forwardported into them)
19:58:14 <fungi> that other repo?
19:58:17 <zul> i could look at pushing it to a ppa again as well
19:58:39 <clarkb> fungi: py3.4 is fixed in one of the package maintainers' repos iirc
19:58:48 <clarkb> fungi: which you tested to confirm that the new package fixed our problem
19:59:00 <fungi> clarkb: yeah
19:59:06 <jeblair> zul: can you just yell at them to release it already? :)
19:59:11 <fungi> clarkb: at least fixed for that one patch
19:59:16 <zul> jeblair: ive done that
20:00:12 <jeblair> thanks everyone
20:00:17 <pleia2> we skipped over them, but core eyeballs on the priority specs would be much appreciated
20:00:23 <fungi> anyway, that's the current state. patched python 3.4 in trusty is currently the only blocker to moving our py3k jobs to 3.4/trusty
20:00:25 <jeblair> pleia2: ++
20:00:36 <jeblair> and gerrit topics are up next time
20:00:38 <jeblair> #endmeeting