19:02:50 <fungi> #startmeeting infra
19:02:51 <openstack> Meeting started Tue May 23 19:02:50 2017 UTC and is due to finish in 60 minutes.  The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:55 <openstack> The meeting name has been set to 'infra'
19:03:00 <fungi> #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:03:07 <fungi> #topic Announcements
19:03:18 <fungi> #info Many thanks to AJaeger (Andreas Jaeger) for agreeing to take on core reviewer duties for the infra-manual repo!
19:03:30 <fungi> #info Many thanks to SpamapS (Clint Byrum) for agreeing to take on core reviewer duties for the nodepool and zuul repos!
19:03:42 <fungi> as always, feel free to hit me up with announcements you want included in future meetings
19:03:51 <fungi> #topic Actions from last meeting
19:04:03 <fungi> #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-05-02-19.05.html Minutes from last meeting
19:04:14 <fungi> pabelanger Open an Ubuntu SRU for bug 1251495
19:04:15 <openstack> bug 1251495 in mailman (Ubuntu Trusty) "Lists with topics enabled can throw unexpected keyword argument 'Delete' exception." [High,Triaged] https://launchpad.net/bugs/1251495
19:04:23 <fungi> guessing this one's still pending, not seeing one there yet
19:04:45 <pabelanger> fungi: yes, at this point you can take it off the action list; I am trying to work with the ubuntu community to get it done
19:04:56 <pabelanger> but since the layoffs it has been difficult finding people
19:05:17 <pabelanger> was going to reach out to a few openstack ubuntu members and see how to best move forward
19:05:24 <fungi> okay, no sweat. we had a fallback plan for now anyway, i think?
19:05:29 <pabelanger> I just don't want to open up an SRU bug and leave it to bitrot
19:05:44 <pabelanger> ya, we are running a manual patch to lists.o.o today
19:05:55 <pabelanger> fallback, we could do our own PPA if needed
19:06:02 <pabelanger> or move to xenial
19:06:05 <jeblair> and i think it's fixed in xenial
19:06:12 <pabelanger> yes, xenial is okay
19:06:15 <fungi> does that get reverted if unattended-upgrades updates the mailman package on us (for a security fix or the like)?
19:06:29 <pabelanger> fungi: I believe so
19:06:29 <jeblair> yes
19:06:46 <fungi> okay, so this steps up the priority to update the listserv from trusty to xenial i guess
19:06:51 <jeblair> fungi: ++
19:07:28 <jeblair> i'm on board to reprise my role from last time
19:07:51 <clarkb> I guess security fixes bypass the sru process ya?
19:07:54 <fungi> jeblair: want someone else to draft the server upgrade plan this time?
19:07:56 <pabelanger> cool, upgrade to xenial is likely our best option I think
19:08:07 <clarkb> (eg it is possible for a patch like that to come in that doesn't include pabelanger's patch to fix the bug)
19:08:09 <pabelanger> clarkb: yes, it looks that way
19:08:50 <jeblair> fungi: either way; given my etherpad from last time, it's probably not too hard for someone else to do it.  if no one else wants to, i can.
19:09:15 <fungi> i'm resisting the temptation to volunteer myself for yet one more thing... any takers?
19:10:13 <bkero> I can give the upgrade a shot, although will likely have many questions.
19:10:30 <bkero> Unless this would be something better for an infra-core to handle.
19:10:48 <fungi> bkero: want to start by adapting jeblair's previous etherpad contents?
19:11:14 <fungi> #link https://etherpad.openstack.org/p/lists.o.o-trusty-upgrade
19:11:23 <bkero> fungi: Sure
19:11:29 <clarkb> ya, some things like the ext4 upgrade won't be necessary
19:11:44 <fungi> #action bkero draft an upgrade plan for lists.o.o to xenial
19:11:48 <jeblair> do we want to give bkero a snapshot of lists.o.o to work from?
19:12:18 <fungi> i would have no problem snapshotting the current server and adding an account/sshkey for him on that
19:12:47 <clarkb> wfm
19:12:56 <clarkb> I can help get a snapshot done since I did that last time
19:13:09 <fungi> we'll obviously want to follow our earlier disable/enable steps for stuff around the snapshot creation so the snapshot isn't booted spewing duplicate e-mails
19:13:11 <clarkb> jeblair: we want to doctor the base image before doing that right?
19:13:16 <clarkb> to disable services?
19:13:21 <clarkb> ya that
19:13:25 <jeblair> ++
19:13:40 <clarkb> though likely not today as I am going to try and get shade out the door and nodepool upgraded so that citycloud can run multinode jobs
19:13:46 <clarkb> but tomorrow I can likely do this
19:14:06 <bkero> Sounds good. I'll have time to tackle this tomorrow and Thursday evening.
19:14:07 <mordred> yes. we're going to get that out the door if it kills me
19:14:26 <mordred> (we got a bug report this morning that I'm squeezing the fix in for, because I'm a bad person)
19:15:37 <rockyg> Bad mordred!  Bad dog!
19:15:43 * mordred hides
19:15:46 <jeblair> mordred: always trying to get the fix in
19:16:07 <Shrews> mordred: and just approved the last of those
19:16:11 <mordred> Shrews: thank you
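A rough sketch of the snapshot step discussed above, for reference only. This is not the actual runbook: the cloud name, flavor name, and test server name are illustrative assumptions, and it uses the shade API of the time. Per the disable/enable steps fungi mentions, mailman and the MTA would be stopped (or outbound mail blocked on the copy) before anything boots from the snapshot.

    # Rough sketch, not the production procedure. Cloud name, flavor name and
    # test server name are illustrative assumptions.
    import shade

    cloud = shade.openstack_cloud(cloud='openstackci-rax')  # hypothetical cloud name

    # Locate the running listserv and snapshot it to a Glance image.
    server = cloud.get_server('lists.openstack.org')
    image = cloud.create_image_snapshot(
        'lists.o.o-pre-xenial', server, wait=True, timeout=3600)

    # Boot a throwaway copy from the snapshot for the upgrade dry run. Services
    # on the source should already be disabled so the copy cannot resend mail.
    test = cloud.create_server(
        'lists-upgrade-test',
        image=image.id,
        flavor=cloud.get_flavor('8GB Standard Instance'),  # hypothetical flavor
        wait=True)
    print(test.public_v4)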
19:17:07 <fungi> clarkb to add citycloud to nodepool
19:17:09 <fungi> #link http://grafana.openstack.org/dashboard/db/nodepool-city-cloud City Cloud nodepool utilization graphs
19:17:14 <fungi> (and there was much rejoicing)
19:17:18 <pabelanger> Yay
19:17:55 <clarkb> there are/were two hiccups with this
19:18:04 <fungi> one minor but annoying issue still outstanding in one of their regions which looks like a bug/misconfiguration at this point i guess
19:18:06 <clarkb> the first one was a missing flavor in the La1 region, which I sent email about and they corrected
19:19:06 <clarkb> the second is that we sometimes get multiple private IP addrs assigned to instances, which breaks things in two different ways. The first is if nodepool writes the non-working private IP to the private IP list on the instance, then multinode breaks. The shade stuff I mentioned above should address this by using the private IP address associated with the floating IP address
19:19:26 <clarkb> the second way this breaks is if the floating IP address is attached to the non-working private IP, then nodepool fails to ssh in, deletes the node and tries again
19:19:42 <fungi> which is probably accounting for most of the hits on that error node launch attempts graph too, i would assume
19:19:44 <clarkb> I've sent email to citycloud with example instances and info on this in hopes they can track down why this is happening
19:19:53 <clarkb> pabelanger helped track down the second way this breaks so thanks
19:20:01 <clarkb> fungi: ya
19:20:16 <jeblair> 2.1 and 2.2 both seem like either one or two openstack bugs, yes?  i guess we're expecting them to confirm that?
19:20:18 <pabelanger> Ya, nodepool debug script wins again
19:20:39 <clarkb> jeblair: I think its the same underlying bug in openstack or their deployment yes. Hoping they can fix/confirm
19:21:13 <pabelanger> does shade pick the private IP to attach the FIP to?
19:21:18 <pabelanger> or is that openstack
19:21:28 <mordred> pabelanger: yes
19:21:35 <mordred> pabelanger: (it depends)
19:21:45 <clarkb> being a private IP though shade has no way of knowing which one is "correct"
19:21:59 <pabelanger> right
19:22:01 <clarkb> so there isn't much it can do there other than assume the floating IP one will work
19:22:01 <mordred> pabelanger: if there is only one fixed ip on a server, openstack picks. if there is more than one, the user (or shade) has to tell openstack which to use
19:22:13 <mordred> there are _some_ ways to infer the correct one, which shade does
19:22:22 <mordred> but if those don't work there is an occ config option the user can set
19:22:28 <mordred> or be explicit in the create_server call
19:22:37 <clarkb> in this case they are both on the same network though
19:22:59 <pabelanger> I wonder if it is always the 2nd private IP that ends up working
19:23:02 <clarkb> so the only thing differentiating them is the IP address, and that's a toss-up without actually seeing which one got DHCPed on the VM
19:23:20 <clarkb> pabelanger: it could be, though I don't know how they are ordered, if at all
19:23:31 <mordred> clarkb: but so far the one with the same mac as the fip seems to be correct, right?
19:23:56 <clarkb> mordred: for the multinode job case yes, because in order to get that far the fip had to work
19:23:57 <mordred> the fact that the first one has a different mac from the mac on the server is the thing that makes me think something is extra broke
19:24:19 <clarkb> mordred: but we have ssh failures in nodepool that pabelanger has tracked back to the fip being attached to the wrong private ip
19:24:28 <mordred> clarkb: oh - ah - right
19:24:42 <pabelanger> clarkb: ya, it seems when FIP attaches to 2nd private IP, it works
19:24:42 <clarkb> also we don't get 2 private IPs on every server
19:24:44 <mordred> so yeah - nothing we can do about that shade-side
19:24:55 <pabelanger> all the working servers now are on the 2nd private IP
19:24:58 <clarkb> its all very weird, hoping the cloud can clarify
19:25:02 <clarkb> pabelanger: interesting
19:25:25 <fungi> and it looked like they were always adjacent addresses (at least in the examples i saw)
19:25:37 <clarkb> anyways we don't have to spend much more time on this. Cloud is in use; once servers boot and nodepool can ssh in, we should be fine (especially after shade is released)
19:25:44 <mordred> ++
19:25:50 <jeblair> clarkb: thanks, and thanks for the update
19:25:52 <fungi> thanks clarkb, pabelanger!
19:25:54 <jeblair> pabelanger: ^
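For the record, a minimal sketch of the heuristic discussed above: prefer the fixed IP whose MAC matches the floating IP entry in nova's addresses blob, since that is the port the FIP is NATed to. This is an illustration only, not shade's actual implementation, and the example addresses are made up. When the heuristic is not enough, the os-client-config option and explicit create_server argument mordred mentions let the user point shade at the right NAT destination outright (the log does not name the option).

    # Illustration of the MAC-matching heuristic, not shade's real code.
    def fixed_ip_for_fip(addresses):
        """Pick the fixed IP sharing a MAC with a floating IP entry.

        ``addresses`` is the 'addresses' dict from a nova server record.
        """
        for net, entries in addresses.items():
            fip_macs = {
                e.get('OS-EXT-IPS-MAC:mac_addr')
                for e in entries if e.get('OS-EXT-IPS:type') == 'floating'
            }
            for e in entries:
                if (e.get('OS-EXT-IPS:type') == 'fixed'
                        and e.get('OS-EXT-IPS-MAC:mac_addr') in fip_macs):
                    return e['addr']
        return None

    # Made-up example shaped like the citycloud case: two fixed IPs on one
    # network, with the FIP NATed to the second one.
    example = {
        'default': [
            {'addr': '10.0.0.7', 'OS-EXT-IPS:type': 'fixed',
             'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:aa:aa:aa'},
            {'addr': '10.0.0.8', 'OS-EXT-IPS:type': 'fixed',
             'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:bb:bb:bb'},
            {'addr': '198.51.100.23', 'OS-EXT-IPS:type': 'floating',
             'OS-EXT-IPS-MAC:mac_addr': 'fa:16:3e:bb:bb:bb'},
        ]
    }
    print(fixed_ip_for_fip(example))  # -> 10.0.0.8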
19:26:12 <fungi> #topic Specs approval: PROPOSED Add nodepool drivers spec (jeblair)
19:26:16 <fungi> #link https://review.openstack.org/461509 "Nodepool Drivers" spec proposal
19:26:58 <jeblair> i proposed a thing
19:27:05 <jeblair> this is pretty high level
19:27:14 <fungi> looks like it's gone through some review/iteration at this point
19:27:29 <jeblair> basically, some folks showed up and wanted to start working on the static node support in nodepool
19:27:57 <jeblair> this spec lays out an approach for doing that as well as laying the groundwork for future expansion for other non-openstack providers
19:28:21 <jeblair> i think we've circulated it around the folks interested in that area, so i think it's ready for a vote
19:29:19 <fungi> awesome
19:30:07 <fungi> #info The "Nodepool Drivers" spec is open for Infra Council voting until 19:00 UTC Thursday, May 25
19:30:18 <fungi> that cool?
19:30:40 <clarkb> it being that late in the year is not cool
19:30:42 <jeblair> cool.  cool.
19:30:47 <clarkb> where did the last 5 months go
19:31:01 <fungi> clarkb: yeah, i don't know where the first half of the year went
19:31:24 <fungi> thanks jeblair! this will be awesome to have working
19:31:43 <fungi> #topic Priority Efforts
19:32:03 <fungi> nothing called out specifically here, though the spec above is related to the zuulv3 work
19:32:23 <fungi> #topic Old general ML archive import (fungi)
19:32:27 <fungi> #link https://etherpad.openstack.org/p/lists.o.o-openstack-archive-import Mainte
19:32:35 <fungi> #undo
19:32:36 <openstack> Removing item from minutes: #link https://etherpad.openstack.org/p/lists.o.o-openstack-archive-import
19:32:47 <fungi> #link https://etherpad.openstack.org/p/lists.o.o-openstack-archive-import Maintenance plan for old general ML archive import
19:32:59 <fungi> (not sure where that stray newline came from)
19:33:26 <fungi> anyway, repeat from about a month ago when i said i'd punt this maintenance until after the summit
19:34:23 <fungi> we're in the middle of a few dead weeks in the release schedule
19:34:30 <jeblair> lgtm.  i think this is fine to do either before or after the xenial upgrade.  just not during.  :)
19:34:33 <fungi> #link https://releases.openstack.org/pike/schedule.html Pike Release Schedule
19:34:50 <fungi> so, yeah, this seems like a good time to go ahead with it
19:34:59 <fungi> just means some (brief) downtime for the listserv
19:35:11 <jeblair> count me in as standby help.
19:35:23 <fungi> and to have some volunteers on hand to visually inspect the archive before we start allowing new messages into the list
19:35:31 <fungi> thanks jeblair!
19:36:06 <fungi> i'm probably fine doing it late utc on friday (20:00 utc or later)
19:36:24 <fungi> have an appointment on the mainland earlier in the day so won't be around until then
19:36:25 <pabelanger> should be able to help also
19:36:36 <clarkb> ok, I'll be around on friday as well
19:36:53 <jeblair> 2000 fri wfm
19:37:01 <fungi> awesome, i'll send an announcement after the tc meeting
19:37:39 <fungi> #info The mailman services on lists.openstack.org will be offline for about an hour on Friday, May 26 starting at 20:00 UTC
19:38:19 <fungi> refreshing the agenda and seeing no other last-minute additions...
19:38:22 <fungi> #topic Open discussion
19:39:23 <fungi> don't all talk at once now ;)
19:39:27 <pabelanger> nb03.o.o is online (and xenial). However, we are at volume quota for vexxhost, waiting for feedback on how to proceed from vexxhost
19:39:33 <clarkb> osic is running at max servers of zero right now
19:39:53 <clarkb> we are told that we should get access to the cloud once dns and ssl are sorted
19:39:53 <jeblair> folks should expect to see another zuulv3 email update soon
19:40:26 <jeblair> pabelanger: how is volume quota related?
19:40:48 <jeblair> pabelanger: (did you mean image quota?)
19:40:57 <fungi> we have a 1tb cinder volume attached to nb01 and nb02
19:41:00 <pabelanger> we only have a 200GB HDD for the server; we usually mount a 1TB volume for diskimage-builder
19:41:03 <fungi> at /opt
19:41:03 <jeblair> ooooh got it
19:41:40 <clarkb> since we have multiple images, multiple copies of each, and multiple formats, that all adds up
19:41:45 <clarkb> also the scratch space and cache for dib
19:41:48 <fungi> we could _probably_ get by with 0.5tb in there... nb01 is using 261gb and nb02 185gb
19:42:11 <pabelanger> ya, since the feature/zuulv3 branch, storage use has been much lower
19:42:27 <pabelanger> mostly because we are not leaking things :)
19:42:41 <clarkb> and we've split the image storage across multiple nodes
19:42:47 <pabelanger> that too
19:43:13 <clarkb> also we stopped converting images to raw
19:43:20 <clarkb> so lots of good improvements
19:43:59 <fungi> so maybe 0.5tb is plenty, but regardless we have little/no volume quota in that tenant currently?
19:44:28 <pabelanger> ya, we are over quota atm
19:44:54 <fungi> what else do we have in there besides planet01?
19:45:08 <pabelanger> old mirror
19:45:10 <pabelanger> for nodepool
19:45:14 <fungi> (which isn't using cinder afaict)
19:45:22 <clarkb> it may actually be using it
19:45:30 <clarkb> it's possible you could remove the mirror and its volume to reclaim that quota
19:45:32 <fungi> i meant planet01 isn't
19:45:39 <fungi> but yeah, we can delete that mirror server
19:46:04 <fungi> since we stopped trying to use vexxhost for nodepool nodes
19:46:06 <pabelanger> k
19:46:20 <pabelanger> I was using it to debug some apache proxy caching stuff
19:46:25 <fungi> and hopefully that frees you up to push forward on the nb03 build
19:46:42 <fungi> oh, well you could also just unmount and detach the cinder volume in that case
19:46:44 <clarkb> pabelanger: we can always redeploy one without a volume for that sort of testing
19:46:48 <clarkb> or that
19:46:56 <pabelanger> the issue is planet is using a 200GB volume
19:46:59 <fungi> and then just delete the volume not the server
19:47:00 <pabelanger> and our quota is 100GB
19:47:16 <pabelanger> sorry, have to run openstack commands again to confirm
19:47:30 <pabelanger> VolumeSizeExceedsAvailableQuota: Requested volume or snapshot exceeds allowed gigabytes quota. Requested 1024G, quota is 1000G and 200G has been consumed. (HTTP 413) (Request-ID: req-2ea6e088-f9cd-4e71-b635-901ee212f7f8)
19:47:46 <fungi> yeah, that's a 200gb volume on the mirror server
19:49:03 <pabelanger> sorry, I am not sure what our quota is
19:49:15 <pabelanger> /facepalm
19:49:19 <pabelanger> 100GB :)
19:49:26 <pabelanger> so, we could do 500GB volume
19:49:32 <pabelanger> 1000*
19:49:39 <pabelanger> I am going to stop typing now
19:49:43 <fungi> yeah, that ought to be plenty for now
19:50:31 <clarkb> yup 500GB seems fine.
19:51:06 <pabelanger> k, I'll do 500GB
19:51:38 <fungi> and as for the other 200gb, confirmed as suspected:
19:51:40 <fungi> | 17eeb39d-c6de-4e32-8c08-26cf3592a22c | mirror.ca-ymq-1.vexxhost.openstack.org/main02 | in-use    |  200 | Attached to mirror.ca-ymq-1.vexxhost.openstack.org on /dev/vdc  |
19:55:35 <fungi> okay, well if there's nothing else, that concludes this week's installment!
19:55:38 <fungi> thanks everyone
19:55:46 <fungi> #endmeeting