19:03:32 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:37 <fungi> #topic Announcements
19:03:52 <ianw> o/
19:04:16 <olaph> o/
19:04:34 <fungi> hrm, i thought i'd saved a link for this, just a sec
19:05:21 <fungi> okay, here we go. sorry about that
19:05:35 <fungi> #info Mentors needed for GSOC
19:05:40 <fungi> #link http://lists.openstack.org/pipermail/openstack-dev/2016-February/085508.html
19:06:12 <fungi> we've had some great infra interns/mentees in the past, and it's a great way to get in touch with potential new additions to the team, to openstack, and to free software in general
19:06:38 <fungi> i encourage people to give it a try, especially if they've never mentored. it's a rewarding experience
19:06:57 <fungi> anyway, no other announcements lined up. anything important i should mention before we move on to action items?
19:07:39 <fungi> #topic Actions from last meeting
19:07:43 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-01-26-19.02.html
19:07:47 <fungi> cody-somerville to draft and send HPE Cloud shutdown notice+impact to openstack-infra and openstack-dev
19:07:51 <fungi> #link http://lists.openstack.org/pipermail/openstack-dev/2016-January/085141.html
19:07:56 <fungi> thanks cody-somerville for sending that!
19:08:00 <cody-somerville> No problem.
19:08:01 <fungi> thanks everyone who worked on maintaining the former hewlett-packard cloud for our use and abuse!
19:08:02 <anteaya> thanks cody-somerville
19:08:08 <fungi> and thanks to the rest of the infra team for making the removal well-planned, quick and painless!
19:08:14 <anteaya> yes thank you
19:08:15 <cody-somerville> We've gotten some folks who are interested in donating resources. Is someone following up with them?
19:08:39 <anteaya> who did you contact about that?
19:08:44 <fungi> there was one which emerged form the infra ml moderation queue yesterday, and i was planning to reply but haven't had time yet
19:08:53 <fungi> s/form/from/
19:09:01 <anteaya> obviously I have backscroll to read
19:09:05 <clarkb> I am currently working with osic for credentials
19:09:10 <anteaya> yay
19:09:20 <clarkb> harlowja says they will have internal discussion and may have something for us
19:09:29 <anteaya> wonderful
19:09:36 <harlowja> clarkb yup
19:09:46 <harlowja> gonna go poke the SE guy here who said he wanted to chat with me about this
19:09:47 <fungi> #link http://lists.openstack.org/pipermail/openstack-infra/2016-January/003707.html
19:09:56 <fungi> "safebrands"
19:10:19 <fungi> though odd that the offer came from their head of marketing
19:10:39 <anteaya> yay safebrands
19:10:42 <jeblair> we have logos on a marketing page now :)
19:10:48 <anteaya> yay
19:10:51 <fungi> yes we do!
19:10:53 <fungi> anyway, no need to eat up meeting time with this one
19:10:57 <anteaya> so much progress
19:11:05 <fungi> nibalizer release gerritlib 0.5.0
19:11:07 <fungi> #link https://pypi.python.org/pypi/gerritlib
19:11:11 <zaro> o/
19:11:16 <fungi> thanks nibalizer for picking that up after i promised to do it and then dropped it on the floor!
19:11:27 <yolanda> yay gerritlib!
19:11:45 <fungi> i didn't see any fallout from the gerritlib release, so smooth sailing there i guess
19:11:58 <fungi> #topic Specs approval
19:12:03 <fungi> PROPOSED: Unified Mirrors (krotscheck, jeblair)
19:12:06 <fungi> #link https://review.openstack.org/252678
19:12:11 <fungi> looks like this was discussed as i'd hoped last week, though council voting was deferred for an additional week
19:12:20 <fungi> #info Voting is open on the "Unified Mirrors" spec until 19:00 UTC Thursday, February 4.
19:12:39 <krotscheck1> \o/
19:12:50 <fungi> we've got a status update on the agenda for this already, so i'll avoid spending much time on the spec announcement
19:12:51 <jeblair> i have also pushed up the afs modification of that https://review.openstack.org/273673
19:13:14 <fungi> cool. everyone let's vote on that too by thursday if possible
19:13:15 <jeblair> (which is what's actually in production now)
19:13:20 <fungi> #link https://review.openstack.org/273673
19:13:47 <fungi> #topic Adding a new node to nodepool to support libvirt-lxc testing in Nova (thomasem, dimtruck)
19:14:04 <fungi> any chance thomasem or dimtruck are around this week to discuss what they wanted here?
19:14:11 <thomasem> fungi: So, this turned out to be a bug that was filed here: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1536280
19:14:12 <openstack> Launchpad bug 1536280 in linux (Ubuntu) "domain shutdown fails for libvirt/lxc" [Medium,Confirmed]
19:14:43 <fungi> #link https://launchpad.net/bugs/1536280
19:14:49 <thomasem> We're disabling the tests affected right now to just get some testing for LXC working, and I'm iterating tests here: https://review.openstack.org/#/c/274792/1 and investigating the other failures.
19:15:02 <thomasem> #link https://review.openstack.org/#/c/274792/1
19:15:24 <clarkb> I wouldn't make a special new node just for this. If newer kernel fixes it we can use that across the board for example
19:15:32 <clarkb> or use centos/fedora until ubuntu can fix
19:15:43 <anteaya> thomasem: so from the nova mid-cycle last week I got that you want to use a specific kernel for your test
19:15:52 <fungi> yeah, trying on centos 7 was one of my earlier suggestions as well
19:16:05 <fungi> or fedora 23
19:16:22 <fungi> the latter should have a pretty bleeding-edge kernel i think
19:16:26 <thomasem> clarkb: fungi: Good call, I wasn't sure what our options are regarding that, but if we can just specify a different kernel, I would be amenable to that. I haven't tried on other OSes
19:16:49 <thomasem> I'm using 3.18.x in my environments and that works great
19:16:58 <clarkb> thomasem: ubuntu LTSs have the hardware support kernels which we haven't had to use previously but are available to us
19:17:12 <fungi> i'm a little iffy on having our test environment for ubuntu 14.04 lts use a nonstandard kernel, and having jobs rely on that
19:17:20 <clarkb> fungi: it's "standard"
19:17:23 <clarkb> fungi: it just isn't default
19:17:26 <fungi> but if it's a packaged kernel in updates that's fine
19:17:46 <fungi> yep, no disagreement from me on that
19:17:49 <thomasem> So, you would prefer a different OS entirely? I think that's why we were considering a different node at the time. That's not the biggest problem right away, though. The biggest problem is the other intermittent failures that I don't have a root cause for yet.
19:18:00 <cody-somerville> Is there a specific kernel patch that is needed? Maybe we can get them to include it in the LTS kernel.
19:18:03 <fungi> as long as it doesn't also come with new and improved blow up all our other jobs support
19:18:15 <thomasem> Hahaha, yeah
19:18:21 <clarkb> fungi: right that is the risk and why centos7/fedora23 would be preferable to start probably
19:18:43 <thomasem> To avoid Ubuntu node kernel changes affecting everything else?
19:18:44 <ianw> thomasem: centos is 3.10 ... but probably heavily modified ... f23 is 4.3.3
19:18:49 <clarkb> thomasem: yes
19:18:53 <thomasem> Gotcha
19:18:56 <ianw> thomasem: i can help you with setup of either
19:19:01 <clarkb> thomasem: ovs, qemu, etc
19:20:11 <thomasem> ianw: Gotcha. At the moment I'm trying to just get this thing passing consistently for the tests that aren't affected by the kernel problem, if that makes sense. But, once we get that part solved, I would be happy to explore other node types that can open up the breadth of tests we can run on LXC reliably.
19:20:20 <thomasem> clarkb: fungi ^^
19:20:23 <thomasem> does that all seem reasonable?
19:20:38 <fungi> okay, so sounds like there are at least a couple of good paths forward without adding a special "ubuntu with seasoning" node type
19:20:42 <fungi> makes sense
19:20:47 <clarkb> thomasem: yup makes sense
19:21:24 <thomasem> Okay, awesome. ianw, I will hit you up if I run into snags, does that sound good? I really appreciate the aid.
19:21:32 <fungi> thomasem: need anything else debated in the meeting, or does this transition to the infra channel/ml and code review next?
19:21:57 <thomasem> fungi: Nope. I think we have a path forward, and if anything starts going wrong, I'll start screaming again.
19:22:05 <thomasem> :D
19:22:09 <fungi> excellent! we can finally get this off our meeting agenda backlog ;)
19:22:14 <thomasem> woot woot
19:22:18 <fungi> #topic Scheduling a Gerrit project rename batch maintenance (SergeyLukjanov)
19:22:19 <anteaya> I don't think this constitutes screaming
19:22:30 <fungi> i do enough screaming for all of us
19:22:40 <anteaya> I've never seen that happen
19:22:47 <fungi> okay, last week it was decided to kick the ball down the road because lots of people were travelling
19:22:52 <anteaya> anyway sorry to derail
19:23:21 <anteaya> is SergeyLukjanov here?
19:23:49 <anteaya> I'm not excited about this, I'll play along but can't drive
19:23:49 <fungi> there are a couple of pending renames for official repos. SergeyLukjanov was offering to run the maintenance as long as there are at least some of us around in case something goes awry
19:24:07 <clarkb> this weekend is an unofficial holiday in this country
19:24:18 <fungi> foosball tournaments
19:24:21 <anteaya> I can be around this weekend but again not in a hurry
19:24:23 <clarkb> so I am mostly not around
19:24:48 <fungi> i will not be partaking in said tournament mania so can be available...
19:25:13 <fungi> but also these renames don't seem terribly urgent and there aren't many of them
19:25:28 <anteaya> I'm here if you need me
19:25:33 <fungi> also this would be our first renames since the gerrit upgrade, right?
19:25:36 <anteaya> yes
19:25:50 <jeblair> oh, so probably worth a bit more attention than usual
19:25:54 <fungi> so might be best to defer yet another week and have more people available
19:25:57 <anteaya> good point
19:25:58 <fungi> yeah
19:26:06 <yolanda> i can be around on mornings in my timezone, that will be the same as Sergey. But cannot be around in the afternoons
19:26:21 <fungi> yolanda: thanks, good to know
19:26:29 <anteaya> I'm off again the weekend of the 12th
19:26:39 <anteaya> until the end of Feb
19:26:49 <anteaya> so not available for rename things
19:27:08 <fungi> i guess i'll try to circle back around with SergeyLukjanov about a possible window for next week, and we can pick a time when we meet again
19:27:23 <anteaya> yup
19:27:25 <fungi> #topic Mirror update (jeblair, krotscheck)
19:27:32 <jeblair> you're all afs admins now
19:27:39 <anteaya> ha ha ha
19:27:42 <krotscheck1> WOOO
19:27:43 <krotscheck1> Wait
19:27:48 <krotscheck1> sadpanda
19:27:57 <anteaya> all == infra-root, yes?
19:28:02 <fungi> afs administration as code! old meets new!
19:28:21 <jeblair> the pypi mirrors are in production and on afs
19:28:34 <jeblair> we're still keeping the old ones around for just a bit in case something goes terribly wrong
19:28:35 <fungi> anteaya: all == anyone contributing patches and looking at whatever we can expose on dashboards and graphs
19:28:39 <krotscheck1> Wheel mirror work is still in progress.
19:28:44 <anteaya> fungi: ah okay thanks
19:29:06 <jeblair> in general, the theory about it being fast by serving from local cache seems to be holding
19:29:08 <AJaeger> anything special we need to know when reviewing?
19:29:29 <jeblair> it does turn out that trans-atlantic udp is quite slow
19:29:48 <clarkb> jeblair: are we going to set up a replica in europe?
19:29:50 <jeblair> so when the ovh mirrors fetch something from the fileserver, it takes a bit longer
19:30:00 <jeblair> clarkb: if we did, it may improve that ^
19:30:05 <fungi> AJaeger: probably for now, being aware that pip is installing from a cached backend for the pypi mirrors in our jobs is a good place to start
19:30:49 <jeblair> but also, even within the us, when we transfer hundreds of gb between data centers to make the read-only replicas, that turns out to be quite slow too
19:31:00 <fungi> in theory there should be no impact, but be on the lookout for oddities in jobs which could be explained by stale caches, cold caches, cache misses
19:31:16 <Clint> we need faster-than-light packets
19:31:23 <anteaya> thanks, good question AJaeger
19:31:23 <AJaeger> fungi, ok, we learn as we go ;)
19:31:25 <jeblair> our initial sync of the pypi mirror to a new read-only site took much longer than i guessed, and ended up getting aborted
19:31:41 <jeblair> i started another initial sync last night in a safer manner, and expect it to finish wed night
19:32:04 <jeblair> i'm going to manually release it after that a few times until i'm happy that the deltas are reasonably small and fast
19:32:12 <jeblair> then we can switch back to automated releases
19:32:37 <jeblair> and then, some time in the future, maybe we can look into whether there's anything we can tune to make this faster
19:32:53 <jeblair> (cern measures their afs throughput in gbps)
19:32:54 <clarkb> jeblair: the expectation is that only the initial sync is slow right?
19:33:10 <jeblair> clarkb: yeah, it's an incremental system so should speed up considerably
19:33:11 <fungi> doesn't seem terrible as a startup cost, really
19:33:43 <fungi> consider that bootstrapping a pypi mirror from scratch with bandersnatch takes a similarly long amount of time
19:33:49 <jeblair> [end of my report]
19:33:51 <jeblair> indeed :)
19:34:17 <fungi> except in this case we (in theory) incur that cost once now instead of for every new mirror server we create
19:34:18 <cody-somerville> If we start doing replication clones, I assume we'll cluster a couple together to avoid having to repay that setup cost in event of failure of one of the nodes?
19:34:46 <jeblair> right now we have 2 fileservers, in rax dfw and ord
19:35:12 <jeblair> i don't think that needs to change in the immediate future
19:35:29 <jeblair> as it's actually the mirror servers (which are afs clients) that are doing the real local caching
19:35:37 <jeblair> and they are in each region we have nodepool slaves
19:35:43 * krotscheck1 is more or less done with the wheel_mirror patches, excepting some typos and cleanup.
19:36:03 * krotscheck1 is waiting for local tests to pass before uploading a (hopefully final) patchset
19:36:31 <jeblair> yeah, i think those are ready once we get the incantation right :)
19:36:51 <fungi> awesome
19:36:51 <jeblair> i created the first wheel volume in afs and mounted it
19:36:55 * krotscheck1 is fresh out of goats, will be switching to tofu sacrifices.
19:37:02 <jeblair> so all the externalities have been satisfied
19:37:10 <anteaya> krotscheck1: it's squishy
19:37:17 <krotscheck1> Most of the system-config patches should be ready though.
19:37:24 <krotscheck1> It's only the job definitions that are pending.
19:37:35 <krotscheck1> jeblair: Do we already have a wheel slave?
19:37:54 <jeblair> krotscheck1: no
19:37:57 <jeblair> fungi: right?
19:38:04 <clarkb> and we will need one for each platform
19:38:11 <clarkb> or otherwise chroot/container
19:38:16 <jeblair> clarkb: right, though we're starting with only ubuntu for simplicity
19:38:18 <fungi> no wheel slave built yet afaik
19:38:33 <fungi> so yes, that's an upcoming step
19:39:22 <clarkb> jeblair: that makes sense
19:39:41 <fungi> so thrilling progress with afs in production! this also makes for a great segue into our next topic
19:39:54 <fungi> unless there are more afs/wheel mirror questions
19:40:08 <anteaya> none here
19:40:37 <fungi> #topic Swift for docs publishing (annegentle, fungi)
19:40:43 <fungi> #link http://specs.openstack.org/openstack-infra/infra-specs/specs/doc-publishing.html
19:41:11 <fungi> there was some renewed interest from the docs team in recent weeks/months on this spec
19:41:17 <jeblair> i have prepared thoughts on this, sorry for the bomb.
19:41:24 <jeblair> note that the spec also has an afs alternative section:
19:41:24 <jeblair> http://specs.openstack.org/openstack-infra/infra-specs/specs/doc-publishing.html#afs
19:41:24 <jeblair> it says to do this, we'd have to set up an afs cell which is a lot of work
19:41:24 <jeblair> but we have an afs cell now
19:41:24 <jeblair> so it's probably worth revisiting
19:41:25 <jeblair> originally, i imagined that to have it work securely on throwaway nodes, we would need zuul to do some complicated stuff with creating principals, pts entries, etc
19:41:25 <jeblair> that's still more work than i'd like to do in zuulv2, but it's possible, and may be less work than the rube-goldberg approach in the spec
19:41:26 <jeblair> however, if we are willing to be a little less paranoid, and trust doc build jobs with afs creds on long running slaves (like we chose to do with mirror wheel builds), we could get docs into afs with rsync _really quickly_
19:41:32 <fungi> really what it's mostly lacking now is some available hands to work through implementation
19:41:40 <fungi> oh, heh. reading
19:41:57 <fungi> and yes, basically what i was about to say
19:42:09 <fungi> thanks for saving me the typing!
19:42:20 <jeblair> :)
19:42:52 <fungi> so the takeaway there is... if there are people who are really amped about afs, this is a great spec to jump on
19:43:28 <cody-somerville> Would it really need AFS credentials? Or just normal prived ssh key to do the build+publish workflow?
19:43:54 <SpamapS> wow, nobody wanted to just teach sphinx how to upload things to swift?
19:43:56 <fungi> i would expect to see a revision of the current approved spec which takes the afs details into account of course, but aside from that i agree the work is much simplified if we go down that path now
19:44:14 <SpamapS> and proxy docs.openstack.org to swift?
19:44:15 <jeblair> cody-somerville: logs are simpler, so i proposed doing that for that spec.  docs are _much harder_ because of the layout
19:44:20 <jeblair> SpamapS: see above
19:44:26 <fungi> SpamapS: i was equally shocked ;)
19:45:10 <cody-somerville> Also, wondering if there is benefit to dogfooding OpenStack service instead of relying on custom infrastructure here (though the point about layout is definitely fair)
19:45:37 <jeblair> we have different branches writing into basically the same tree (in a predictable/structured manner), so it cannot simply be copied, or even blindly rsynced
19:46:15 * annegentle waves
19:46:32 <jeblair> basically, the only way i know to get what we need is a careful rsync (copy is right out because you can't be smart about deletions)
19:46:36 <fungi> our dogfooding of said storage solution for log publication has run into some snags, mostly around the browsing experience and need for real filesystem-like metadata too. while i agree we should avoid reinventing the wheel, this wheel was invented decades ago and is still nice and round
19:46:36 <annegentle> yeah it's the "blindly rsynced" that's tough here
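A minimal sketch of what a "careful" sync might look like, as opposed to a blind one, written in Python rather than as an rsync invocation. It assumes a hypothetical layout in which one job publishes to the top of the tree while other branches own named top-level directories; the function name `careful_sync` and the `protected` parameter are illustrative and do not come from the spec. Stale files are deleted only outside the protected directories.

```python
import os
import shutil


def careful_sync(src, dest, protected=()):
    """Copy src into dest, then delete stale files in dest.

    Files under any top-level directory named in `protected` are never
    deleted, on the assumption that another branch's job owns them.
    Returns the sorted list of relative paths that were removed.
    """
    wanted = set()
    for dirpath, _dirs, filenames in os.walk(src):
        for name in filenames:
            source_file = os.path.join(dirpath, name)
            rel = os.path.relpath(source_file, src)
            wanted.add(rel)
            target = os.path.join(dest, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source_file, target)

    removed = []
    for dirpath, _dirs, filenames in os.walk(dest):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), dest)
            if rel in wanted:
                continue  # still published by this job
            if rel.split(os.sep)[0] in protected:
                continue  # owned by another branch; leave it alone
            os.remove(os.path.join(dest, rel))
            removed.append(rel)
    return sorted(removed)
```

A plain copy would leave the stale file behind, and an unrestricted delete would also remove the other branch's directory; the protected set is what separates the two cases here.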
19:47:26 <cody-somerville> What if each build is published in unique location/namespace/whatever and then there is an atomic "update to point at latest"?
19:47:31 * annegentle catches up on afs cell stuff...
19:47:44 <SpamapS> Yeah, for an immediate solution, seems like just "better things behind the current solution" is going to have to win out over "making swift better"
19:48:42 <jeblair> cody-somerville: that's vaguely what the spec accomplishes -- it's sort of "build this as a unit, and try to drop the unit in place (with rsync)"
19:48:45 <annegentle> cody-somerville: I'd have to think about that... we only "release" the install guides and config refs to a known /relname/ URL right now, and then the contrib dev docs also have "releases"
19:48:58 <SpamapS> One should be able to build a site entirely hosted in swift. But if there's time pressure, sounds like can't dogfood it.
19:49:12 <annegentle> cody-somerville: then guides like Ops Guide, Security Guide, those are namespaced to cover multiple releases...
19:49:18 <fungi> well, the docs and logs use cases turned out to differ in a couple of key areas. logs: huge volume, need to track and possibly prune by age, need to generate indexes on the fly; docs: (comparatively) small quantity of data, comes with own indexing pregenerated
19:49:20 <annegentle> cody-somerville: so yeah it's vaguely like that :)
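The "unique location per build, then atomically point at latest" idea raised above can be sketched on a POSIX filesystem as follows. This is only an illustration under assumed names (`publish_build`, a `builds/` directory, a `latest` symlink); it is not the approach in the spec, and it ignores the point that some guides span multiple releases.

```python
import os


def publish_build(root, build_id, files):
    """Write one build into its own directory, then repoint 'latest'.

    `files` maps relative paths to text content. The 'latest' symlink is
    replaced with a rename, so readers see either the old build or the
    new one, never a half-written tree.
    """
    build_dir = os.path.join(root, "builds", build_id)
    os.makedirs(build_dir)
    for rel_path, content in files.items():
        dest = os.path.join(build_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "w") as fh:
            fh.write(content)

    # Create the new link under a temporary name, then rename it over
    # the existing 'latest' link in a single step.
    tmp_link = os.path.join(root, "latest.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.join("builds", build_id), tmp_link)
    os.replace(tmp_link, os.path.join(root, "latest"))
    return build_dir
```

Older builds stay on disk under `builds/` until something prunes them, which is the trade-off of this layout.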
19:49:27 <jeblair> SpamapS: that's where we started with this, but the actual requirements are almost completely opposite of what swift provides
19:49:44 <SpamapS> jeblair: how disappointing. :(
19:49:50 <annegentle> SpamapS: ha. Yeah I wondered how much the reality had moved on with a 1.5 year old spec jeblair
19:49:58 <SpamapS> That may explain why I am always puzzled as to why somebody didn't use swift. Maybe it just isn't for what I think it's for.
19:50:04 <annegentle> so that's also a point to discuss, is that spec reflecting reality?
19:50:29 <annegentle> Honestly, it comes up mostly when Google finds an outdated doc that's still on the server because we don't delete.
19:50:38 <jeblair> annegentle: i think the spec could still be implemented as written; my personal feeling is that we'd get it done faster if we pivoted to afs (which is the big thing that has moved on since the spec was written)
19:50:42 <annegentle> the other big win is HTTPS on docs. and developer.
19:50:48 <SpamapS> Like, I also think anything that is "a mirror" should be hostable in a thing like swift. But nobody seems to do that, so there must be something about it that just makes that really hard.
19:50:56 <annegentle> jeblair: ok, that's good to know and exactly why I'm asking :)
19:51:04 <SpamapS> ++ on https for docs
19:51:10 <annegentle> SpamapS: IKNOW :)
19:51:17 <annegentle> kidding on the shout :)
19:51:19 <AJaeger> SpamapS: no one disagrees ;)
19:51:24 <fungi> SpamapS: well, the docs "site" is built in bits and pieces with different processes modifying overlapping parts of the tree at different times
19:51:25 <cody-somerville> I'm just concerned with AFS becoming a hard dependency of the CI system.
19:51:29 <jeblair> and i think either way, we can add https
19:51:36 <annegentle> jeblair: true
19:51:48 <fungi> so persistence and malleability are needed for the current state of docs.o.o
19:52:30 <clarkb> SpamapS: no public access except via cdn and rudimentary indexing ability are the two killers for us
19:52:41 <fungi> also swift doesn't seem to be designed as a filesystem, rather as a filestore, and you still need to maintain external indexing (which is where we're still struggling with the log storage effort)
19:53:31 <fungi> lack of hierarchical indexing, specifically (or any built-in hierarchy at all really)
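To make the "external indexing" point concrete, here is a small pure-Python sketch of what a directory-style listing over a flat namespace of object names involves. It does not call any Swift API; the function name `list_level` and the sample object names are hypothetical, and the "/" delimiter is just a naming convention applied by the caller.

```python
def list_level(object_names, prefix="", delimiter="/"):
    """Return (subdirs, files) found directly under `prefix`.

    The store itself only knows flat names; any hierarchy is inferred
    here by splitting each name on the delimiter.
    """
    subdirs = set()
    files = []
    for name in sorted(object_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _tail = rest.partition(delimiter)
        if sep:
            # Something deeper exists; record only the next path level.
            subdirs.add(prefix + head + delimiter)
        elif rest:
            files.append(name)
    return sorted(subdirs), files


names = [
    "logs/1/console.html",
    "logs/1/logs/syslog.txt",
    "logs/2/console.html",
    "README",
]
top_dirs, top_files = list_level(names)
job_dirs, job_files = list_level(names, prefix="logs/1/")
```

Every browseable index page has to be produced by a scan like this, which is work a real filesystem does for free.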
19:54:10 <jeblair> if folks aren't sick of hearing about afs, i can propose a spec update and describe what that would look like
19:54:24 <fungi> and here i was just about to ask who wanted to take that as a next step
19:54:25 <anteaya> jeblair: I'd like to see your spec update proposal
19:54:30 <jeblair> i'd like to know whether we're comfortable with semi-trusted doc build jobs
19:54:53 <anteaya> I'd like to read about what makes them semi-trusted
19:54:56 <SpamapS> yeah we can discuss swift's flaws another time. +1 on afs from me, but with the caveat that I'm just a curious party, not a working party, in this context.
19:54:57 <jeblair> i can also write it up both ways i guess
19:54:59 <fungi> it's no worse, trust-wise, than the status quo so no objection from me for now
19:55:33 <annegentle> spec update is progress to me, and ensures we're still moving towards https and decent synch
19:55:37 <annegentle> "good enough synch"
19:56:05 <jeblair> ok, i'll do that, and we can accept or reject that approach
19:56:14 <annegentle> jeblair: thanks
19:56:18 <jeblair> np
19:56:21 <fungi> excellent! and also a timely topic in conjunction with afs usage recently going into production for our mirroring
19:56:39 <fungi> #topic Open discussion
19:56:46 <cody-somerville> jeblair: will doc publishing have access to the vos command?
19:57:52 <fungi> pabelanger: any last-minute updates on the upstream development presentations ideas for the summit? did you get anything submitted? cfp deadline is in a few hours
19:58:10 <jeblair> cody-somerville: not directly, too high priv.  possibly as a follow-up job like we're doing for wheels
19:58:21 <AJaeger> FYI: The new Translation is nearly finished, the unified approach works fine. Now amotoki and myself are cleaning up and reenabling all repos again. Then it's figuring out horizon.
19:58:33 <AJaeger> Current set of changes: https://review.openstack.org/#/q/status:open%20%20branch:master%20topic:translation_setup
19:58:49 <AJaeger> Only project-config ones are mandatory - reviews are welcome
19:59:15 <AJaeger> fungi, pabelanger submitted the lightning talks for sure
19:59:15 <dougwig> AJaeger: yay.  :)
19:59:37 <fungi> thanks pabelanger!
20:00:01 <fungi> okay, we're out of time
20:00:08 <fungi> thanks all
