20:00:20 <sdake> #startmeeting kolla
20:00:23 <openstack> Meeting started Mon Feb  9 20:00:20 2015 UTC and is due to finish in 60 minutes.  The chair is sdake. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:26 <openstack> The meeting name has been set to 'kolla'
20:00:36 <sdake> #topic rollcall
20:00:39 <daneyon_> here
20:00:44 <britthouser> here
20:00:48 <sdake> steak here \o/
20:00:53 <rhallisey> hello
20:00:57 <sdake> hey ryan
20:01:00 <sdake> hey daneyon
20:01:02 <sdake> hey britt
20:01:02 <daneyon_> hi
20:01:04 <sdake> jpeeler make it ?
20:01:21 <jpeeler> thanks for the ping, yeah
20:01:25 <sdake> cool
20:01:30 <sdake> glad you hang out in this channel :)
20:01:32 <sdake> #topic agenda
20:02:05 <sdake> #link https://wiki.openstack.org/wiki/Meetings/Kolla#Agenda_for_next_meeting
20:02:11 <sdake> anyone have anything to add last minute?
20:02:29 <daneyon_> nada
20:02:38 <rhallisey> nothing
20:02:53 <sdake> #topic review of super-privileged container approach
20:03:11 <sdake> I don't know if everyone has had a chance to read the super privileged container spec
20:03:19 <sdake> but it proposes a new direction for kolla
20:03:42 <sdake> #link https://review.openstack.org/#/c/153798/
20:03:46 <daneyon_> I just performed another review and submitted it just before the meeting. I wanted to get your feedback on details for dealing with mysql data
20:04:10 <sdake> in summary the proposal is to remove kubernetes as a dependency and focus on using docker only with the full docker API availlable
20:04:17 <sdake> rhallisey have you had a chance to review the spec?
20:04:19 <daneyon_> to perform complete separation among the services, should we have a mysql container for each service?
20:04:25 <sdake> I saw daneyon's and jpeeler's reviews
20:04:34 <daneyon_> I agree to that approach.
20:04:37 <rhallisey> sdake, ya I left a few comments
20:04:50 <sdake> daneyon sounds cool, but complicated :)
20:04:57 <sdake> we don't have to specify how we do that part though
20:05:10 <sdake> I just want general agreement on a change in focus
20:05:14 <sdake> because we will have to implement it
20:05:26 <sdake> without this change in focus, I'm not sure what else can be done with the current kolla implementation
20:05:52 <sdake> I see a lot of outstanding comments in the review; I'll submit an update today
20:06:11 <sdake> what I'd like is for the four of us here from the core team to unanimously agree to the specification
20:06:25 <sdake> if you disagree, propose an alternative that is viable :)
20:06:36 <daneyon_> I agree to the change in focus.
20:06:38 <sdake> that means 4 +2 votes
20:06:45 <sdake> on the specification
20:06:54 <sdake> (mine is implicit in the review request:)
20:07:10 <sdake> jpeeler/rhallisey able to review this over the next week and beat the spec into submission then?
20:07:47 <rhallisey> I read it and I think it's a good idea
20:07:51 <jpeeler> sdake: yeah i can do that. but from what i've seen, minor details are all that's left
20:08:04 <rhallisey> +2 on the idea for me
20:08:08 <sdake> cool, so hopefully we can get the spec approved this week - sounds like everyone is on board
20:08:25 <sdake> #topic milestone #3 planning
20:08:27 <daneyon_> I just +2'd
20:08:35 <sdake> well it needs love yet daneyon :)
20:08:41 <sdake> but thanks for the vote of confidence :)
20:08:47 <daneyon_> for sure
20:09:09 <daneyon_> so i would think milestone 3 is going to see big changes :-)
20:09:14 <sdake> Ok, so since we are using this new approach, I think we need to start to define the blueprints that make up milestone #3 - which is defined as launching stuff via SPC
20:10:10 <daneyon_> Would creating an easy to use dev environment be at the top of the list? I don't think kube-heat will be of use to us
20:11:08 <sdake> daneyon_ I think we can easily create something based on old heat-kubernetes that just launches 3 VMs to run the various nodes
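(A rough sketch of how that dev environment might be driven, purely illustrative; the template and parameter names here are invented, not something discussed in the meeting:)

    # Launch a small cluster of VMs for Kolla development with Heat,
    # in the spirit of the old heat-kubernetes templates.
    heat stack-create kolla-dev -f kolla-cluster.yaml \
        -P number_of_nodes=3 -P image=fedora-21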
20:11:14 <sdake> #link https://etherpad.openstack.org/p/kolla-blueprint-brainstorm
20:11:29 <jpeeler> how is the networking going to work?
20:11:36 <sdake> I'd like to brainstorm here in this etherpad, and I'll convert em into blueprints
20:11:43 <sdake> --net=host, in other words using the host network stack
20:12:21 <jpeeler> does that give each container a "real" ip?
20:12:27 <daneyon_> jpeeler: I think we start off with nova-network like we did before
20:12:42 <sdake> ya lets start with nova-network
20:12:53 <sdake> jpeeler it gives all containers the same ip on the system
20:12:56 <daneyon_> jpeeler: then we move to neutron, using either ovs or linuxbridge + ML2 plugin
20:13:39 <britthouser> so the development env would be a dressed down all-in-one system?
20:13:47 <jpeeler> i guess i'm thinking more about container communication
20:13:54 <sdake> britthouser more or less
20:14:17 <sdake> jpeeler the containers essentially use the host's network, so if they communicate its almost as if they were a process running in the host os
20:14:33 <jpeeler> ok just trying to wrap my head around it, thanks
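(A minimal sketch of the host-networking model being described; the image name is illustrative only:)

    # --net=host puts the container in the host's network namespace, so the
    # service binds its ports directly on the host IP (no docker0 bridge/NAT).
    docker run -d --net=host --name keystone kollaglue/centos-rdo-keystone
    # The API is then reachable at the host's address, just as if keystone
    # were a regular process on the host.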
20:14:41 <sdake> need more input on the blueprints - can folks start adding stuff to the etherpad ;-)
20:15:00 <sdake> What about the containers we need that we don't yet have
20:15:08 <sdake> rhallisey can you add those to the etherpad
20:15:38 <daneyon_> testing?
20:15:43 <rhallisey> sdake, sure, though I'm not sure what we don't have yet..
20:15:52 * rhallisey looks
20:16:12 <sdake> rhallisey read the spec - it has the desired container set - compare vs what we have
20:16:40 <rhallisey> gotcha
20:16:42 <sdake> jpeeler super-privileged containers are a really thin chroot essentially :)
20:16:56 <sdake> so we have "develop new containers"
20:17:06 <sdake> add that in and put some subheadings :)
20:17:20 <daneyon_> what are the OS details that we want to run this stuff on... Fedora atomic?
20:17:32 <sdake> atomic doesn't have git or any useful tools for development
20:17:33 <britthouser> Would the deployment scheme be one container per VM? or all containers on 3 VMs? or does it not really matter?
20:17:35 <sdake> I think we want to stick with f21
20:17:55 <sdake> britthouser so we need a tool to deploy an individual container set
20:18:02 <sdake> can you add that to the etherpad?
20:18:06 <jpeeler> sorry, one more question: is it too tripleO unfriendly to use something like fig to bootstrap the environment?
20:18:18 <sdake> then when you're on a VM, you can run the tool
20:18:30 <sdake> jpeeler I'd like to not complicate things early with new tools or systems if possible
20:18:41 <sdake> but long term we can add things like fig or puppet
20:18:42 <jpeeler> it's part of docker now as far as i know
20:18:45 <sdake> I just don't know the right answer yet
20:18:57 <sdake> can you give a brief overview of fig?
20:19:23 <jpeeler> ha, not a competent one. i thought it was like "vagrant up" for containers.
20:19:30 <bdastur> is f21 for the container base image?
20:20:08 <daneyon_> I would like to discuss the use of HA tools such as Pacemaker, Corosync, before we add those to the container list
20:20:12 <sdake> bdastur good question, atm we use f21, but I'd like to go to Centos
20:21:05 <jpeeler> sdake: https://www.orchardup.com/blog/fig if it's not in docker (i haven't checked), then not worth thinking about
20:21:40 <sdake> daneyon_ if you want to have a discussion about HA, we should do in the spec imo
20:21:54 <sdake> because those are specified in the spec directly
20:22:17 <sdake> swift I think doesn't work :)
20:22:54 <daneyon_> sdake: Do you think that's an implementation detail? Some people may want to use corosync, others galera
20:23:19 <sdake> galera is only for mysql iiuc
20:23:24 <notmyname> ?
20:23:25 <sdake> we also need galera to do HA for mysql
20:23:32 <notmyname> what doesn't work in swift?
20:23:44 <sdake> kolla containers for swift are busted notmyname
20:24:02 <sdake> nothing of your fault - our fault :)
20:24:16 <bdastur> shouldn't galera be part of the same container as mysql?
20:24:43 <bdastur> I actually managed to get two containers on two VMs running rabbitmq in a HA cluster
20:24:51 <daneyon_> sdake: I guess in general, I have been involved in HA implementations that do not use the HA tools mentioned in the spec. I think HA may be outside the scope of the initial spec?
20:24:57 <sdake> each logical service goes in a separate container, and shares the host as necessary via bind mounting /run or /var
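(A minimal sketch of that layout, with the image name and mount points as assumptions rather than anything decided here:)

    # One logical service per container, host networking, with selected host
    # directories shared into the container via bind mounts.
    docker run -d --net=host \
        -v /run:/run \
        -v /var/lib/nova:/var/lib/nova \
        --name nova-compute kollaglue/centos-rdo-nova-compute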
20:25:25 <sdake> ok we can drop HA, although I'd like to tackle HA of mysql and rabbit if possible
20:26:48 <sdake> so lets have a more general discussion about ha quickly
20:26:50 <daneyon_> sdake: I want to tackle it too. I think it must get done for anyone to use kolla in prod. we can work on an implementation.
20:26:56 <sdake> do folks want to tackle ha in milestone 3 or later?
20:27:04 <daneyon_> OK, HA?
20:27:57 <sdake> we know we want ha for mysql via galera and rabbitmq
20:28:02 <sdake> maybe we should just start with those
20:28:05 <bdastur> better to validate HA earlier to make sure there are no obstacles we did not anticipate
20:28:09 <sdake> although I like the check script idea
20:28:59 <sdake> ok, well lets do this
20:29:05 <britthouser> galera and rabbit seem to be the most popular (or at least most talked about) ways of doing HA, so that is a good starting point.
20:29:06 <sdake> lets assume we are going to implement everything in the spec
20:29:25 <sdake> and write out the blueprints assuming we are doing that
20:29:35 <sdake> and if any part of the spec changes, we can just "Not" the blueprint and forget about it ;-)
20:29:45 <sdake> (between now and approval of the spec that is)
20:29:52 <daneyon_> an initial HA implementation could be 1. HAProxy for API endpoints, MySQL VIP. 2. Galera for clustering the MySQL DB 3. Standard Rabbit Clustering.... implementation specific would need to get worked out.
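(One possible shape for that initial HA layout; the addresses, image names, and config paths below are invented for illustration:)

    # HAProxy holds the API/DB VIP in its own host-networked container.
    docker run -d --net=host --name haproxy \
        -v /etc/kolla/haproxy:/usr/local/etc/haproxy:ro haproxy
    # Each controller also runs a Galera-clustered MariaDB container; the
    # gcomm:// list joins it to its peers. RabbitMQ clustering would follow
    # the same per-controller pattern.
    docker run -d --net=host --name mariadb \
        -e WSREP_CLUSTER_ADDRESS=gcomm://10.0.0.11,10.0.0.12,10.0.0.13 \
        mariadb-galera-image   # hypothetical image name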
20:30:42 <sdake> ok, what runs the container check script and restarts it if busted?
20:30:48 <sdake> I guess we need a tool for that!
20:31:01 <sdake> then we can possibly remove corosync and pacemaker
20:31:15 <daneyon_> sdake: if we're going to cluster the DB and MQ, we need something for the API endpoints and DB VIP: HAProxy or some other OS SLB
20:31:34 <sdake> wow too many acronyms my brain just imploded ;-)
20:31:43 <sdake> SLB = ?
20:31:54 <britthouser> server load balancer
20:31:58 <daneyon_> sdake: How does the script know if the container is busted?
20:32:11 <sdake> it runs a check script in the container via docker exec
20:32:16 <daneyon_> SLB = Server Load Balancer: HAProxy, F5, etc.
20:32:19 <sdake> if the check script returns 0 - gtg, returns -1, restart
20:32:59 <sdake> the check script can do some form of healthcheck on the software in the container
20:33:15 <britthouser> Probably need some intelligence in that script.  If it dies 3x, then don't restart or something like that.
20:33:25 <sdake> is HAProxy any good?  Or is there a better tool?
20:33:29 <sdake> britthouser that is called escalation
20:33:38 <sdake> at that point, you would want to reset the machine
20:33:43 <sdake> but lets not think about escalation now
20:33:46 <daneyon_> sdake: is the check script looking to see if, for example, a test tenant can talk to the Keystone API and create a user, endpoint, etc?
20:33:47 <britthouser> Ok.
20:33:49 <sdake> let's assume it's dumb and simple :)
20:34:04 <sdake> right, that would be a keystone-api check script
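(A minimal sketch of that dumb-and-simple supervisor loop; the container name "keystone" and the in-container path /check.sh are made-up examples:)

    #!/bin/bash
    # Run the container's own health-check script via docker exec; a non-zero
    # exit means the service is unhealthy, so restart the container.
    while true; do
        if ! docker exec keystone /check.sh; then
            docker restart keystone
        fi
        sleep 30
    done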
20:35:06 <sdake> We need something to check the health of the machine, but I think that is above the line we care about
20:35:12 <sdake> that is what pacemaker + corosync tackle
20:35:25 <sdake> we only care about health of containers
20:35:31 <sdake> in a real system, someone will have to sort out health of the bare metal as well
20:35:42 <daneyon_> sdake: instead of managing our own scripts for health checking, there is a tool (trying to remember the name) that can run as a process within each container to perform deep health checking.
20:36:04 <britthouser> Ok...so corosync+pacemaker aren't implementing the HA between openstack services, just the containers themselves.  I misunderstood that earlier.
20:36:06 <sdake> it health checks openstack specifics?
20:36:11 <daneyon_> Let me dig up the name and I'll send it along.
20:36:25 <sdake> put a link in the etherpad
20:36:29 <daneyon_> yes.
20:36:59 <sdake> britthouser I removed pacemaker+corosync, I think they are not necessary for container management
20:37:10 <daneyon_> i'll dig it up.... in the meantime, lets not set it in stone that we are going to create our own shell scripts from scratch to perform health checking.
20:37:13 <sdake> corosync + pacemaker manage a group of machines, we are only talking a single machine
20:37:17 <daneyon_> will do
20:37:43 * britthouser rereads...
20:38:05 <daneyon_> we can run health checks on each of the cluster tools, HAProxy, Galera, etc..
20:38:21 <britthouser> I'm with you now sdake
20:38:31 <daneyon_> I don't see a need for corosync or pacemaker at this point. I'm open to hearing from others on the need though
20:39:01 <sdake> if we need something to monitor health of machines and restart them, that is where pacemaker and corosync come in
20:39:43 <daneyon_> health check monitoring though monit #link http://mmonit.com/monit/
20:41:06 <sdake> license?
20:41:25 <daneyon_> open source
20:41:34 <sdake> which one :)
20:41:41 <jpeeler> AGPL
20:41:50 <sdake> gaahh
20:42:00 <sdake> who would use that license!
20:42:09 <sdake> that one makes lawyers cringe for some reason
20:43:08 <sdake> perhaps we can just put monit in a container if it does the job
20:43:14 <sdake> run with --pid=host
20:43:35 <daneyon_> for now, I say we need to investigate the best way to perform health checking in each container and across the container cluster. Creating our own scripts from scratch should be the last option... I'm open to other tools. I just mention Monit because I used it back when I was doing customer deployments and it worked well
20:43:40 <sdake> so launch would look like docker exec blah monit x
20:43:58 <sdake> cool i'll play with it today daneyon_
20:44:02 <sdake> looks small enough
20:44:23 <jpeeler> +1 on avoiding using our own scripts
20:44:44 <sdake> ya we want our scripts to be simple - less than 100 lines if possible
20:44:54 <sdake> although that's not always possible inside a container
20:45:20 <daneyon_> Monit needs to be able to restart the pid of the service it's monitoring
20:45:25 <sdake> ok, i'll spec monit after I give it a go
20:45:47 <sdake> daneyon_ it would be able to do that with --pid=host
20:46:03 <daneyon_> sdake: roger that
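(If monit works out, the launch might look roughly like this; the image name and monitrc path are assumptions:)

    # --pid=host lets monit see, signal, and restart service processes that
    # belong to the other containers on this host.
    docker run -d --pid=host --net=host --name monit \
        -v /etc/kolla/monitrc:/etc/monitrc:ro \
        monit-image   # hypothetical image carrying the monit binary
    # or, per the "docker exec blah monit x" idea, start it inside an
    # already-running container:
    docker exec keystone monit -c /etc/monitrc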
20:46:28 <sdake> any other suggested blueprints?
20:46:32 <daneyon_> what about container logging?
20:46:43 <sdake> that needs to go in the spec if you want it :)
20:46:59 <sdake> maybe we can defer that to a later spec
20:47:07 <daneyon_> if we're going to address monitoring the services, should we dev a logging solution?
20:47:09 <sdake> I have no idea how to do the job on that point
20:47:33 <sdake> I like logging to stdout personally :)
20:47:42 <sdake> and have some tool capture them via docker log
20:47:51 <daneyon_> OK, we can address logging in a follow-on milestone
20:48:01 <sdake> or a follow-on spec in this milestone as well
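(A quick illustration of the stdout-logging idea; "keystone" is just an example container name:)

    # Services log to stdout/stderr; Docker captures that per container, and
    # "docker logs -f" can feed it to whatever collector we pick later.
    docker logs -f keystone 2>&1 | logger -t kolla-keystone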
20:48:06 <sdake> so dates
20:48:15 <sdake> I think we have beat the etherpad into pretty good shape
20:48:25 <sdake> I'll not convert to blueprints until we have approved the spec
20:48:33 <sdake> so if you want, feel free to edit as you see fit
20:49:15 <sdake> #link https://wiki.openstack.org/wiki/Kilo_Release_Schedule
20:49:49 <sdake> March 19 is k3, I think it makes a lot of sense to align with the OpenStack project's release schedule
20:50:05 <sdake> and that is about 4 weeks after we wrap up our planning
20:50:25 <sdake> yay/nay? :)
20:50:44 <rhallisey> sounds good
20:50:52 <daneyon_> yah
20:51:13 <sdake> cool sounds good then no nays :)
20:51:16 <sdake> #topic open discussion
20:51:25 <sdake> we have 9 minutes but we had a lot of open discussion already
20:51:32 <sdake> seems like people are fired up for this new approach to me :)
20:51:40 <daneyon_> ya!
20:51:50 <britthouser> Huzzah!
20:51:53 <daneyon_> Another monitoring option #link https://github.com/stackforge/monitoring-for-openstack
20:52:03 <rhallisey> sdake, when we get there, I can write us the selinux policy needed when using the super-privileged containers
20:52:11 <daneyon_> not much dev lately, but definitely better than starting from scratch
20:52:13 <sdake> rhallisey that would totally rock!
20:53:45 <sdake> ok anything else?
20:53:49 <sdake> or shall we end the meeting
20:54:12 <sdake> daneyon_ those monitoring scripts look interesting; they would work well as check scripts
20:54:26 <daneyon_> time for lunch. thx.
20:54:32 <sdake> #endmeeting