16:00:52 <VW_> #startmeeting Large Deployment Team
16:00:53 <openstack> Meeting started Thu Apr 16 16:00:52 2015 UTC and is due to finish in 60 minutes.  The chair is VW_. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:56 <openstack> The meeting name has been set to 'large_deployment_team'
16:01:11 <VW_> howdy, folks
16:01:16 <klindgren_> hi
16:01:19 <belmoreira> o/
16:02:14 <VW_> Alright, let's get started.
16:02:38 <VW_> #topic Ops Midcycle Meetup debrief
16:02:57 <VW_> so, I've gone over this - https://etherpad.openstack.org/p/PHL-ops-large-deployments - a few times
16:03:02 <belmoreira> VW_: I can't find the agenda. Do you have a link?
16:03:11 <VW_> yep, belmoreira - one sec
16:03:38 <VW_> https://wiki.openstack.org/wiki/Meetings/LDT
16:03:50 <VW_> guess I skipped roll call :\
16:04:29 <belmoreira> VW_: thx
16:04:32 <VW_> yep
16:04:36 <mdorman> o/
16:04:44 <VW_> howdy mdorman
16:05:29 <VW_> so, andyhky is here.  I don't think jlk has joined.  but that gives us at least one of the moderators at the midcycle
16:05:42 <andyhky> o/
16:05:51 <VW_> andyhky: any overarching takeaways from the discussion there?
16:06:49 <andyhky> The discussion around pruning deleted instances brought this bug up: https://bugs.launchpad.net/nova/+bug/1226049 / https://review.openstack.org/#/c/109201/
16:06:50 <openstack> Launchpad bug 1226049 in OpenStack Compute (nova) "instance_system_metadata rows not being deleted" [High,In progress] - Assigned to Alex Xu (xuhj)
16:07:55 <VW_> did the group commit to any action around it, andyhky
16:09:18 <VW_> I know we are actually running a script here that purges deleted instance data older than 90 days
16:09:20 <andyhky> We didn't have specific commitments, it was more of a discussion of issues (e.g., pruning, rabbit leaks) and best practices.
16:09:33 <VW_> kk
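(For context: a minimal sketch of the kind of 90-day purge job VW_ describes. The soft-delete columns ("deleted", "deleted_at") follow nova's schema convention; the DB URL, table list, and retention window here are assumptions, not the actual script.)

    # Purge soft-deleted instance rows older than 90 days.
    # All connection details below are hypothetical.
    from datetime import datetime, timedelta

    from sqlalchemy import create_engine, text

    engine = create_engine("mysql://nova:secret@dbhost/nova")
    cutoff = datetime.utcnow() - timedelta(days=90)

    # Child tables first, so foreign keys don't block the instance rows.
    for table in ("instance_system_metadata", "instance_metadata", "instances"):
        with engine.begin() as conn:
            result = conn.execute(
                text("DELETE FROM %s WHERE deleted != 0 AND deleted_at < :cutoff" % table),
                {"cutoff": cutoff},
            )
            print("%s: purged %d rows" % (table, result.rowcount))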
16:09:58 <VW_> I see there was quite a bit of discussion around adding/testing new hosts/nodes as well
16:10:34 <klindgren> speaking of which - does anyone have a solution to deploying to disabled hosts?
16:10:55 <klindgren> I thought that the --availability-zone az:host trick would work - but it doesn't :-/
16:11:00 <VW_> yeah
16:11:07 <andyhky> We had a discussion about recommending disabled-by-default for new compute nodes.
16:11:23 <andyhky> The room seemed to want the default to change to disabled.
16:11:23 <belmoreira> klindgren: no
16:11:37 <VW_> one of my engineers has a patch that will build to a disabled host if you target it specifically, klindgren
16:11:43 <VW_> because the AZ thing didn't work for us either
16:11:56 <VW_> let me see if I can find it
16:12:54 <belmoreira> andyhky: https://review.openstack.org/#/c/136645/7 is related. It proposes a disable reason by default
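(For context, a python-novaclient sketch of both points above: the az:host targeting klindgren tried, and disabling a service with an explicit reason. Credentials, hostnames, and IDs are all hypothetical, and the ComputeFilter detail is general nova scheduler behavior rather than something stated in the meeting.)

    from novaclient import client

    # Hypothetical credentials/endpoint.
    nova = client.Client("2", "user", "password", "project",
                         "http://keystone.example.com:5000/v2.0")

    # The az:host form. Per the discussion, this alone does NOT reach a
    # disabled host: the scheduler's ComputeFilter still rejects disabled
    # compute services.
    nova.servers.create(
        name="canary",
        image="11111111-2222-3333-4444-555555555555",  # hypothetical image ID
        flavor="m1.small",
        availability_zone="nova:compute-01.example.com",
    )

    # Disabling a node with a logged reason; the review belmoreira links
    # proposes setting a reason by default.
    nova.services.disable_log_reason(
        "compute-01.example.com", "nova-compute", "new node, burn-in pending")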
16:15:49 <VW_> any follow up items in general from the midcycle?
16:15:57 <VW_> or was it all just discussion on these particular topics
16:16:10 <andyhky> I think it's worth coming up with a recommendation on the status of new compute nodes
16:16:22 <VW_> sounds fair
16:16:41 <VW_> do we want that ahead of Vancouver
16:16:47 <VW_> or do we want it on our agenda for there
16:16:53 <andyhky> I'll start a ML thread
16:17:30 <VW_> #action andyhky Start ML thread on status of a new compute node
16:17:37 <VW_> cool - thanks, andyhky
16:17:59 <VW_> I'm still looking for the patch I mentioned above
16:18:08 <VW_> any other items related to mid-cycle we want to dive into here
16:18:26 <andyhky> I think that's enough from the mid-cycle
16:18:35 <VW_> cool
16:18:54 <VW_> #topic Vancouver OPs meetup - LDT session
16:19:12 <VW_> so, we have like 3 hours, I think, at our disposal for working group sessions
16:19:15 <VW_> let me verify
16:19:50 <VW_> ok - not quite - about 2:20
16:20:11 <VW_> but 3 sessions
16:20:24 <VW_> 1:50 - 2:30
16:20:31 <VW_> 2:40 - 3:20
16:20:41 <VW_> and 3:30 - 4:10
16:21:28 <VW_> I really think we need to come out of this one with a new blueprint, comments/recommendations on existing blueprints, or something like that
16:21:48 <VW_> that's my opinion, but I think the following comment on the planning etherpad is valid:
16:22:11 <VW_> "I like this group, too, but what real work would we do here?"
16:22:22 <VW_> any of you have thoughts?
16:25:12 <belmoreira> yes, we need to have a "procedure" to raise and share our "concerns"
16:26:09 <VW_> indeed.  It's hard because I know most of us are busy fighting the fires that come with running big clouds
16:26:14 <belmoreira> because, after the summit, how will developers be notified about our ideas?
16:26:36 <belmoreira> I imagine that only very few will attend the ops sessions
16:26:53 <VW_> of the devs?
16:26:56 <VW_> sorry - just clarifying
16:27:33 <belmoreira> yes, sorry
16:27:44 <VW_> so, proposal then:
16:28:33 <andyhky> belmoreira / VW_ - so with this compute node status change, I'm just going to propose a spec and see where it lands
16:29:02 <VW_> session 1:  Pull together all our thoughts on adding new hosts / managing capacity
16:29:25 <VW_> this would include new specs like andyhky's and comments from the group in the room on any others
16:29:38 <belmoreira> andyhky: great
16:29:41 <VW_> session 2:  Find something from the cells v2 folks we can bring to the group and get them feedback on
16:30:07 <VW_> session 3:  general business.  review the meeting schedule, find out how we can improve our processes, and make our group more influential
16:30:15 <andyhky> If we have a discussion and it results in recommendations, we should have a spec owner and deliver the spec upstream
16:31:12 <VW_> yeah, I'm thinking before we even leave the room, andyhky
16:31:32 <VW_> if possible
16:31:36 <andyhky> The spec owner is identified before leaving the room
16:32:26 <VW_> anyone have any issues with the rough schedule above then
16:32:54 <VW_> with the requirement that we hold ourselves to turning all recommendations into specs or feedback on existing specs with an owner
16:34:08 <klindgren> sounds good to me
16:34:49 <belmoreira> +1
16:35:24 <VW_> cool - I'll start working on the etherpad then
16:35:51 <VW_> #action VW_ start updating etherpad for Vancouver with proposed session schedules
16:36:16 <VW_> #action VW_ reach out to Cells V2 devs to find an issue/spec we can work on and provide feedback/specs for
16:36:38 <VW_> anything else folks want to discuss for Vancouver?
16:37:06 <mdorman> sorry, a bit distracted here with standup meetings.   that all looks good to me for YVR
16:37:20 <VW_> cool
16:37:45 <VW_> in that case...
16:37:52 <VW_> #topic Other Business
16:38:03 <VW_> The floor is open
16:38:11 <VW_> anything else anyone wants to discuss?
16:40:29 <VW_> have any of you looked at this - https://review.openstack.org/#/c/169836
16:40:44 <mdorman> yup
16:41:13 <mdorman> i kinda go back and forth on whether this is a good long-term solution, but i think it’s one of those “it’s better than nothing” things
16:41:23 <VW_> yeah, me too mdorman
16:41:32 <VW_> johnthetubaguy is spot on about not assuming the VMs are down
16:41:57 <mdorman> i haven’t read the comments for the last few days, so haven’t seen that yet.
16:42:05 <VW_> in our case the compute node is virtualized, so it's a separate "node" from the host
16:42:08 <mdorman> good discussion to have, nonetheless
16:42:34 <VW_> I agree
16:42:51 <VW_> that's why I thought I'd make more of this group aware :)
16:42:56 <mdorman> yup yup
16:43:34 <belmoreira> I need to leave...
16:43:40 <VW_> but I go back and forth on it too.  Probably because we've built several automation services to handle down hosts, so we can just have them mark the disabled bit for us
16:43:42 <VW_> ok belmoreira
16:43:45 <VW_> thanks for coming
16:43:52 <VW_> see you in Vancouver
16:43:58 <johnthetubaguy> VW: my main worry is that we need a long term plan really, before adding another band-aid
16:44:47 <VW_> yeah, johnthetubaguy - I'm with you
16:45:46 <mdorman> +1
16:45:47 <johnthetubaguy> VW_: totally agreed that it's an important thing to sort out
16:46:07 <VW_> yeah, johnthetubaguy that's why I wanted more of the LDT folks to at least be aware
16:46:17 <johnthetubaguy> cool, sounds good
16:46:32 <VW_> it sounds like we might make one of our sessions in YVR focused on adding new hosts/capacity management
16:46:38 <VW_> maybe we can work this one in too
16:46:42 <VW_> will be tight
16:47:20 <klindgren> So - we have been running the updated oslo.messaging
16:47:30 <klindgren> with heartbeat stuff under juno code base
16:47:38 <klindgren> rabbit is much much much better
16:47:46 <VW_> 1.9, klindgren?
16:47:57 <VW_> oslo.messaging that is
16:47:57 <klindgren> iirc 1.8.1
16:48:30 <VW_> hmm - good to know.
16:48:43 <klindgren> yea, 1.8.1 - they pulled everything into that and cut a tag.
16:49:00 <klindgren> We had some major maintenance the other day that left some compute nodes' networking down for a long time
16:49:06 <klindgren> they all recovered without issue
16:49:30 <klindgren> no lost/hung rabbit rpc stuff - it all "just worked" within a few minutes of the network coming online
16:49:47 <VW_> yeah, good to know
16:50:13 <VW_> we got hit with the issue the other day when a network device failed over and cut the connection between computes and rabbit
16:50:20 <VW_> so I'm all for getting that fix
16:51:03 <VW_> for those who don't know - here is a related link - https://review.openstack.org/#/c/146047/
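(For context, the heartbeat feature from that review is configured on the rabbit driver. A sketch of a nova.conf excerpt, assuming the option names from the oslo.messaging 1.8.x series; the values here are illustrative defaults, not klindgren's settings:)

    [oslo_messaging_rabbit]
    # Seconds of silence before a connection is considered dead (0 disables).
    heartbeat_timeout_threshold = 60
    # How often the heartbeat is checked within the timeout window.
    heartbeat_rate = 2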
16:51:09 <klindgren> yea - looking forward to the day when I have 99 problems and rabbitmq is not one.
16:51:27 <VW_> indeed
16:51:34 <VW_> message queues are hard, evidently
16:51:37 <VW_> :)
16:52:36 <VW_> ok, well we have a few minutes left
16:52:47 <VW_> anything else?
16:53:09 <mdorman> i’m good
16:53:19 <mdorman> thanks for organizing VW_
16:53:29 <VW_> my pleasure
16:53:41 <VW_> thanks, to all of you, for helping us get a plan for YVR
16:53:52 <VW_> want to make those sessions as productive as possible
16:54:31 <VW_> alright, well, I'll give everyone 5 minutes back
16:54:36 <VW_> see you all in Vancouver!
16:55:05 <VW_> thanks for joining today
16:55:10 <VW_> #endmeeting