#openstack-meeting log

09:01:23 <aspiers> #startmeeting ha
09:01:24 <openstack> Meeting started Mon Dec  7 09:01:23 2015 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:27 <openstack> The meeting name has been set to 'ha'
09:01:40 <aspiers> OK hopefully this is the right channel now ;-)
09:01:53 <ddeja> yes, hello :-)
09:01:59 <masahito> o/
09:02:02 <aspiers> Welcome everyone - again :-)
09:02:12 <kazuIchikawa> o/
09:02:22 <aspiers> Maybe I should have mentioned that this meeting time is Monday morning for me, so sometimes my brain will not work ;-)
09:02:40 <beekhof> howdy
09:02:48 <_gryf> hey
09:03:19 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
09:03:55 <aspiers> _gryf/ddeja: you want to give any updates?
09:04:18 <ddeja> I have a working fix for the bug in mistral, but I'm struggling with the tests
09:04:26 <aspiers> ok
09:04:46 <ddeja> but another guy from Mistral has proposed another (probably better) fix
09:04:57 <_gryf> from my side - nothing in the ha area
09:05:04 <ddeja> so I hope to have it resolved this week
09:05:09 <aspiers> ok cool
09:05:25 <ddeja> I'll be reviewing it and will start hardening the PoC solution
09:05:44 <aspiers> ddeja: sounds good!
09:05:49 <aspiers> masahito: anything from your side?
09:06:09 <masahito> I checked 2 things last week.  i) whether Masakari works with pacemaker-remote without changes, and ii) whether it's easy to replace MySQLdb with sqlalchemy.
09:06:25 <masahito> i) We need some work to use pacemaker-remote: changing the source of host status for the hostmonitor process. Masakari parses the output of the crm_mon command to check which hosts are online and offline.  aspiers suggested that an up-to-date crmsh would be better.
09:06:40 <masahito> but I hit a problem in crmsh when I checked it. A remote node appears in 'crm node list', but when pacemaker-remote goes down the command doesn't say the remote node is offline. I used pacemaker 1.1.10 on ubuntu14.04. If the version is wrong for testing, please let me know.
09:06:52 <masahito> ii) replacing MySQLdb with sqlalchemy isn't difficult but needs some work.
09:07:01 <aspiers> 1.1.100 is old
09:07:05 <aspiers> 1.1.10 I mean
09:07:24 <masahito> above is my update :)
09:07:26 <beekhof> crmsh wont work for RH
09:07:54 <masahito> beekhof: which tools should I use?
09:08:07 <beekhof> well RH would love you to use pcs
09:08:07 <beekhof> but
09:08:14 <beekhof> maybe crm_mon --xml ?
09:08:27 <masahito> ubuntu doesn't have it.
09:08:36 <ddeja> beekhof: why not pcs? I'm using it and it's OK?
09:08:53 <aspiers> ddeja: pcs only exists on RH IIUC
09:08:58 <beekhof> ddeja: because that will make suse just as unhappy
09:09:05 <ddeja> Oh, OK
09:09:13 <aspiers> (and Ubuntu)
09:09:33 <beekhof> masahito: i'd worry about using pacemaker remote on a version old enough not to have --xml
09:09:46 <beekhof> sorry, --as-xml
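A minimal sketch of the `crm_mon --as-xml` approach suggested above, as an alternative to parsing the human-readable crm_mon output. It assumes each node is reported on its own line as a `<node ... />` element carrying `name` and `online` attributes. That layout and the sample host names are assumptions for illustration, not taken from the meeting, and should be checked against the Pacemaker version in use.

```shell
#!/bin/sh
# offline_nodes: read crm_mon XML on stdin and print the names of nodes
# reported with online="false", one per line.
# Assumption: one <node ... /> element per line with name= and online= attributes.
offline_nodes() {
    grep '<node ' | grep 'online="false"' |
        sed -n 's/.* name="\([^"]*\)".*/\1/p'
}

# Illustrative input only (hypothetical host names). On a live cluster the
# input would instead come from: crm_mon --as-xml | offline_nodes
offline_nodes <<'EOF'
<nodes>
    <node name="controller1" id="1" online="true" type="member" />
    <node name="compute1" id="compute1" online="false" type="remote" />
</nodes>
EOF
```

Text matching like this is fragile. A real hostmonitor would probably be better served by a proper XML parser.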
09:09:57 <aspiers> masahito: you should probably use 1.1.13 at least
09:10:09 <beekhof> at least
09:10:17 <masahito> beekhof: ok, I'll try it with latest one.
09:10:21 <beekhof> lots of work went into it this year
09:10:50 <beekhof> lots of scary, "how did this ever work" kind of fixes
09:11:15 <aspiers> crm_node should work cross-distro, but IIRC it doesn't work on remotes unless they have a node attribute set
09:11:16 <_gryf> masahito, what operating system do you use?
09:11:44 <masahito> _gryf: ubuntu
09:11:57 <_gryf> masahito, oh, i see, ubuntu lts
09:12:34 <masahito> _gryf: currently we use ubuntu14.04 because of lts
09:12:34 <aspiers> I should try to summarise some of this for the minutes
09:13:01 <aspiers> #action ddeja to review alternate solution to mistral bug
09:13:13 <aspiers> #action ddeja to continue hardening mistral PoC
09:13:43 <aspiers> #info masahito is now working on integrating pacemaker_remote into masakari
09:13:59 <aspiers> #info masahito is now working on replacing MySQLdb with sqlalchemy
09:14:00 <beekhof> aspiers: it should work
09:14:20 <aspiers> #action masahito will see if a newer pacemaker solves some of his issues
09:14:47 <aspiers> beekhof: it works, but remotes are missing from one part of the CIB until a node attribute is set on them
09:15:01 <aspiers> beekhof: therefore IIRC, crm_node -l only includes remotes with node attributes
09:15:10 <aspiers> but I could be remembering the details totally wrong
09:15:17 <aspiers> since it is still early on Monday
09:15:39 <beekhof> i think i encouraged ken to change that
09:15:49 <aspiers> that would be nice - it did seem a bit inconsistent
09:15:52 <beekhof> should be reliable in 1.1.14
09:16:16 <aspiers> beekhof: any updates from your side?
09:16:29 <beekhof> ha. yeah
09:16:56 <beekhof> so remember all my problems last week?  yeah, someone pointed fencing at the undercloud instead.
09:17:07 <aspiers> LOL
09:17:31 <beekhof> there are still some fixes to be had, and i'd like to co-ordinate with you on the host/nova name mappings
09:17:39 <aspiers> yes please
09:17:45 <aspiers> that was the bit I had the most problems with
09:17:46 <beekhof> see if we can get something that works for both of us
09:18:07 <beekhof> yep, i still have your diff open
09:18:23 <aspiers> I have a whole page of notes from when I was reverse-engineering that
09:18:36 <aspiers> I should share it
09:18:50 <beekhof> i should have documented it :)
09:18:58 <aspiers> that would be nice :)
09:19:01 <aspiers> but it wasn't just you
09:19:21 <aspiers> a lot of the mapping stuff is quite sparse on docs
09:19:46 <aspiers> e.g. pcmk_host_map, how port gets set for fencing agents etc.
09:19:55 <beekhof> yeah
09:19:59 <aspiers> #action beekhof and aspiers to discuss host/nova name mappings
09:20:01 <beekhof> too little time
09:20:04 <aspiers> right
09:20:07 <beekhof> i updated RH on the status here, and someone raised a concern that Congress might be flatlining
09:20:11 <beekhof> anyone want to comment on that?
09:20:31 <aspiers> it doesn't look like that based on the meeting minutes I read last week
09:20:50 <aspiers> but I wanted to discuss Congress as a separate topic anyway
09:21:09 <_gryf> beekhof, you mean, Congress would be discontinued?
09:21:24 <aspiers> so let me first just say I don't have any significant updates from my side since last week was our team meeting
09:21:35 <aspiers> #topic Congress and potential use in HA
09:21:36 <beekhof> someone basically sent this link ( http://stackalytics.com/?module=congress-group&metric=commits&release=liberty ) and said see, vmware isn't interested anymore
09:21:48 <beekhof> presumably vmware was the original project champion?
09:22:05 <masahito> hi, I'm also contributing to Congress.
09:22:10 <aspiers> beekhof: are you mentioning Congress due to what I posted on #openstack-ha last week?
09:22:17 <beekhof> _gryf: that was the suggestion
09:22:25 <beekhof> aspiers: i dont recall who suggested it
09:22:29 <aspiers> it was me
09:22:29 <beekhof> but yeah
09:22:34 <beekhof> ok
09:22:46 <aspiers> they already have a use case for triggering nova evacuation workflows from Congress
09:23:03 <aspiers> #info Congress project has documented a use case for triggering nova evacuation workflows from Congress
09:23:07 <beekhof> use case == reason or use case == code?
09:23:14 <aspiers> reason IIUC
09:23:30 <beekhof> link?
09:23:54 <masahito> I don't know what happened in ha, but maybe I can explain from the Congress side.
09:24:09 <aspiers> #link https://docs.google.com/document/d/1ExDmT06vDZjzOPePYBqojMRfXodvsk0R8nRkX-zrkSw/edit#
09:24:21 <beekhof> to be up front, i don't much care who does the triggering.  but the division of labor seemed sane
09:24:28 <aspiers> the use case is missing from the ToC
09:24:43 <aspiers> but it's just after the "evacuation of tenants for planned outage" section
09:25:17 <aspiers> I also talked to one of our guys who is following Congress
09:25:26 <aspiers> and found out that they are talking about integrating it with Mistral
09:26:05 <aspiers> so it could watch out for the attribute set by fence_compute, and then initiate evacuation via Mistral based on cloud-specific policies
09:26:27 <aspiers> which sounds really nice because each cloud will have different ideas of how to set SLAs for pets
09:26:40 <beekhof> would we bother with fence_compute? other than to tell nova that the node is down?
09:27:10 <beekhof> once we've told nova, then congress should automagically know what to do right?
09:27:38 <aspiers> beekhof: I guess it depends on the details
09:27:50 <beekhof> doesn't everything :)
09:27:56 <ddeja> From the document that aspiers linked, it seems that fence_compute should also tell congress
09:28:01 <aspiers> if congress is too slow to notice for some reason, the compute host could bounce back up without any action taken
09:28:07 <beekhof> also possible
09:28:10 <ddeja> but maybe masahito knows the details?
09:28:36 <masahito> yap
09:28:46 <aspiers> but I really like the idea of allowing flexible policies via Congress, e.g. some might want to do it with availability zones, others per-tenant, etc...
09:28:51 <beekhof> aspiers: so was the question just "what do you think of including it?" ?
09:29:11 <aspiers> beekhof: yeah pretty much
09:29:25 <aspiers> I think there's still quite a bit of work to do, e.g. the mistral driver for congress
09:29:28 <beekhof> masahito: how speedy would congress be for this? seconds? minutes?
09:29:40 <masahito> beekhof: depends on config.
09:30:08 <masahito> beekhof: currently, congress polls the Nova API every 10s by default. but
09:30:17 <aspiers> Congress team were originally planning to add Mistral integration for liberty
09:30:23 <aspiers> but AFAIK it hasn't started yet
09:30:44 <aspiers> #link https://wiki.openstack.org/wiki/PolicyGuidedFulfillmentLibertyPlanning_Remediation is an example quite similar to the use case we are discussing
09:30:48 <beekhof> aspiers: i think phase 1 == confirm if mistral is the right path + implement, phase 2 == decide if triggering via congress is the right path
09:31:01 <masahito> In the Mitaka release, I'll implement a new feature that lets Congress receive notifications from other services.
09:31:03 <aspiers> beekhof: violently agree :)
09:31:16 <beekhof> masahito: ah, excellent
09:31:25 <aspiers> masahito: very nice :-D
09:31:52 <aspiers> #info masahito is planning to enable Congress to receive external notifications in mitaka
09:32:13 <masahito> there is already a BP.
09:32:33 <masahito> #link https://blueprints.launchpad.net/congress/+spec/push-type-datasource-driver
09:32:35 <aspiers> ah cool
09:34:35 <masahito> On the other hand, the use case aspiers mentioned above was suggested by others, so it's not under discussion now.
09:35:13 <aspiers> I will try to attend Congress / Mistral IRC meetings when possible
09:35:30 <aspiers> to stay up to date and represent our interests
09:35:48 <beekhof> aspiers: did you want to move on to our fun email thread?
09:35:52 <aspiers> but I'm pretty busy so hopefully others can too
09:35:59 <ddeja> I'm trying to cover Mistral meetings
09:36:01 <aspiers> beekhof: yes I was about to suggest that :)
09:36:06 <aspiers> ddeja: great!
09:36:14 <beekhof> but i got in first so ,,!,, :)
09:36:17 <aspiers> #topic future direction of OCF RAs
09:36:21 <aspiers> haha :)
09:36:30 <aspiers> so, I started a discussion with beekhof
09:36:35 <aspiers> on IRC
09:36:47 <aspiers> and then it turned into a private mail thread, which we should probably avoid in future :)
09:36:49 <beekhof> but then i needed to cook dinner
09:37:00 <beekhof> and you logged off
09:37:04 <aspiers> right :)
09:37:05 <ddeja> beekhof: LOL
09:37:20 <aspiers> I suggested the idea of converting the RAs to wrap around service(8)
09:37:25 <aspiers> and beekhof doesn't like that at all :)
09:37:35 <aspiers> we're currently working towards a consensus
09:37:35 <beekhof> and everyone agreed it was a terrible idea. the end. good night
09:37:39 <aspiers> haha ;-)
09:37:47 <_gryf> aspiers, serviced, or just any other init system?
09:37:57 <aspiers> _gryf: that's one of the key points
09:38:10 <aspiers> my main point was that each distro potentially has a different way of managing services
09:38:16 <aspiers> most are on systemd by now
09:38:17 <_gryf> s/serviced/systed/
09:38:19 <aspiers> but I guess not all
09:38:24 <aspiers> e.g. Ubuntu LTSS
09:38:27 <beekhof> i'm no systemd fan, but it won, we should move on
09:38:29 <_gryf> darn, systemd :)
09:38:50 <beekhof> well, at least until people start using pacemaker_remoted as pid one
09:39:05 <_gryf> ubuntu 14.04 has only logind, but still uses upstart
09:39:16 <_gryf> ubuntu 16.04 will have systemd
09:39:25 <aspiers> sorry, s/LTSS/LTS/
09:39:38 <beekhof> _gryf: so still a few months out
09:39:55 <aspiers> 14.04 LTS is supported until late 2019
09:40:11 <_gryf> beekhof, right, and we have to count existing setups in
09:40:17 <beekhof> LALALALALAL there is only RHEL
09:40:18 <aspiers> beekhof: so you are suggesting we should drop support for that?
09:40:51 <ddeja> IMHO even after U16.04 we still should assume, that some other OS may come without systemd
09:40:54 <aspiers> beekhof: in theory you and I should care the same amount about Ubuntu LTS ;-)
09:40:55 <beekhof> i probably wouldnt base a forward looking solution around not having systemd if thats what you mean
09:41:10 <aspiers> ddeja: that's right
09:41:45 <beekhof> in any case, we should just use OCF scripts.  problem solved
09:41:47 <_gryf> ddeja, besides slackware and gentoo are there any other significant distributions that do not use systemd?
09:42:12 <aspiers> now might be a good time to mention the https://wiki.openstack.org/wiki/Rpm-packaging project
09:42:30 <beekhof> now comes with useful status monitoring! free with the first 100 orders
09:42:35 <beekhof> (sorry, its late, i get silly)
09:42:55 <_gryf> beekhof, lol
09:43:03 <aspiers> the likelihood of converging on the same systemd service descriptions for OpenStack services for rpm-based distros depends on the success of that project
09:43:17 <aspiers> but it's actually expanding beyond rpm to deb
09:43:36 <aspiers> one of our guys (Dirk Mueller) is the PTL and he was talking to Mirantis etc. in Tokyo
09:43:51 <aspiers> although it's early stages
09:44:14 <aspiers> anyway, my point is, even if everybody uses systemd, can we rely on the services having the same names etc.?
09:44:31 <aspiers> or doing the same things
09:44:36 <beekhof> nope. do we need to?
09:44:41 <aspiers> that's the next question
09:44:53 <aspiers> e.g. if Pacemaker is controlling keystone, should systemctl status openstack-keystone still work reliably?
09:45:00 <aspiers> I would say yes
09:45:15 <aspiers> of course systemctl start/stop are a very bad idea in those scenarios
09:45:17 <beekhof> not necessarily
09:45:28 <beekhof> its arguably better if it doesnt
09:45:45 <ddeja> I'll agree with beekhof
09:45:46 <beekhof> because then people will try start/stop via systemctl too
09:45:59 <ddeja> with pacemaker we got one point to control all services
09:46:00 <beekhof> we know this because it already happens :-(
09:46:01 <aspiers> beekhof: you mean, so deliberately treat HA services as distinct from non-HA ones?
09:46:07 <beekhof> yes
09:46:09 <beekhof> absolutely
09:46:13 <beekhof> or
09:46:33 <beekhof> figure out a way to redirect systemctl X to the cluster
09:46:37 <beekhof> we're close
09:46:47 <beekhof> but of course lacking the time to implement it
09:46:52 <aspiers> ok, but then the problem is: how do we ensure that the RAs work cross-distro?
09:46:56 <beekhof> there are a couple of ways to go
09:47:03 <aspiers> which again comes back to packaging
09:47:38 <beekhof> are the binaries deployed to such different locations?
09:47:45 <beekhof> running as completely different users?
09:48:11 <aspiers> beekhof: I would be surprised if everything was identical across all distros
09:48:33 <aspiers> IIRC, one difference was whether to prefix binaries/users/groups etc. with "openstack-"
09:48:37 <beekhof> we handle apache as an RA and the binary names aren't even consistent
09:48:57 <beekhof> across distros i mean
09:48:58 <aspiers> that's true
09:49:07 <beekhof> httpd vs. apache
09:49:39 <aspiers> HTTPDLIST="/sbin/httpd2 /usr/sbin/httpd2 /usr/sbin/apache2 /sbin/httpd /usr/sbin/httpd /usr/sbin/apache $IBMHTTPD"
09:49:44 <aspiers> wow, that's pretty ugly :-/
09:50:02 <aspiers> DEFAULT_IBMCONFIG=/opt/IBMHTTPServer/conf/httpd.conf
09:50:04 <aspiers> DEFAULT_SUSECONFIG="/etc/apache2/httpd.conf"
09:50:06 <aspiers> DEFAULT_RHELCONFIG="/etc/httpd/conf/httpd.conf"
09:50:15 <aspiers> this is exactly what I would like to avoid
09:50:16 <beekhof> i'm not saying the community shouldnt standardize on something
09:50:38 <aspiers> the differences should be handled by the vendor packages, not by hardcoding them in the upstream RAs
09:50:48 <beekhof> but a) i dont think we need to drive it and b) i dont think its a requirement
09:50:54 <aspiers> e.g. the above breaks if a distro changes locations between major versions
09:51:00 <beekhof> or a reason not to do an OCF RA
09:51:20 <aspiers> do we really want to go down the road of having 'case "$SUSE_VERSION" in' stuff inside each RA?
09:51:38 <beekhof> no
09:52:05 <aspiers> I don't see another way, unless the RA delegates distro-specific decisions to something external
09:52:07 <beekhof> but i dont think the DEFAULT_ stuff is necessary
09:52:07 <beekhof> or
09:52:11 <beekhof> could be done better imho
09:52:27 <aspiers> OK, would you like to write up a proposal?
09:52:34 <aspiers> no huge rush, obviously
09:52:42 <aspiers> but probably doesn't make sense to delve into details here
09:53:07 <beekhof> i think the sticking point, is that everyone wants to have their own defaults
09:53:19 <aspiers> yes, and that's exactly why I proposed delegation
09:53:24 <beekhof> for all i care, they can be the suse values and our installer can override them
09:53:43 <aspiers> that's assuming that everything is parametrized
09:53:43 <beekhof> or vice versa
09:53:58 <beekhof> they usually need to be anyway
09:54:01 <aspiers> so we could make binary/config/pid file locations parametrized
09:54:10 <beekhof> better idea
09:54:17 <beekhof> lets decide what they should be
09:54:25 <aspiers> OCF_RESKEY_binary
09:54:32 <masahito> It sounds good.
09:54:33 <beekhof> and anyone that gets it wrong has to rely on their installer
09:54:46 <beekhof> until they fix their packaging :)
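A minimal sketch of the parametrization being discussed. Pacemaker passes each resource parameter to the agent as an `OCF_RESKEY_<name>` environment variable, so an RA can ship one agreed set of defaults and let an installer override any of them. The default values and the function name `resolve_params` below are hypothetical placeholders, not values agreed in the meeting.

```shell
#!/bin/sh
# Hypothetical upstream defaults; a distro whose packaging differs would
# override them through resource parameters set by its installer.
OCF_RESKEY_binary_default="keystone-all"
OCF_RESKEY_config_default="/etc/keystone/keystone.conf"
OCF_RESKEY_pid_default="/var/run/keystone.pid"

# resolve_params: fill in any parameter the cluster did not supply.
# ":=" assigns the default only when the variable is unset or empty.
resolve_params() {
    : "${OCF_RESKEY_binary:=$OCF_RESKEY_binary_default}"
    : "${OCF_RESKEY_config:=$OCF_RESKEY_config_default}"
    : "${OCF_RESKEY_pid:=$OCF_RESKEY_pid_default}"
}

resolve_params
echo "binary=$OCF_RESKEY_binary config=$OCF_RESKEY_config pid=$OCF_RESKEY_pid"
```

With this shape the RA body contains no per-distro branches: anything that differs from the defaults is expressed as a parameter on the resource.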
09:55:21 <aspiers> It's still duplicating a bunch of stuff which exists in systemd service descriptions
09:56:00 <beekhof> that doesn't help convergence though
09:56:09 <beekhof> write something that pulls those values out?
09:56:15 <beekhof> 3 lines of shell?
09:56:30 <aspiers> not sure I follow
09:57:16 <beekhof> grep $field $unit | awk -F= '{print $2}'
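The one-liner above can be sketched as a small helper that pulls a single setting out of a systemd unit file. This variant swaps awk for sed: it anchors the match at the start of the line and strips only up to the first `=`, so values that themselves contain `=` are kept whole. The function name `unit_field` and the sample unit contents are hypothetical.

```shell
#!/bin/sh
# unit_field FIELD UNIT_FILE
# Print the value of the first "FIELD=value" line in a unit file.
# FIELD is used inside a regex, so it should be a plain key name.
unit_field() {
    field=$1
    unit=$2
    sed -n "s/^${field}=//p" "$unit" | head -n 1
}

# Illustrative unit file (hypothetical contents).
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Service]
User=keystone
ExecStart=/usr/bin/keystone-all --config-file=/etc/keystone/keystone.conf
EOF

unit_field User "$unit"
unit_field ExecStart "$unit"
rm -f "$unit"
```

A helper like this reads only the file it is given, so drop-in overrides or a distro that names the unit differently would still need handling elsewhere.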
09:57:42 <aspiers> that seems a lot more cumbersome than simply wrapping systemctl
09:57:46 <beekhof> that would be the default "default default"
09:57:59 <beekhof> except it has one key advantage
09:58:07 <beekhof> i might actually work
09:58:09 <beekhof> it might actually work
09:58:17 <beekhof> systemd == nightmare
09:58:26 <aspiers> OK, I'm beginning to learn you don't like systemd :)
09:58:31 <aspiers> anyway we're out of time for now
09:58:44 <aspiers> let's continue this on IRC / openstack-dev@
09:58:48 <beekhof> k
09:58:58 <aspiers> any other topics in the remaining seconds?
09:59:11 <_gryf> i guess no
09:59:20 <aspiers> I think we sent everyone else to sleep with our RA talk ;-))
09:59:24 <masahito> next week, I want us to start thinking about milestones for HA.
09:59:38 <aspiers> masahito: good idea, I'll put it on the agenda!
09:59:47 <masahito> aspiers: thanks!
09:59:49 <aspiers> ok thanks all, and see you next week, or sooner on IRC!
10:00:05 <aspiers> #action put roadmap on agenda for next meeting
10:00:21 <ddeja> thanks, bye
10:00:24 <aspiers> #endmeeting