09:01:23 <aspiers> #startmeeting ha
09:01:24 <openstack> Meeting started Mon Dec 7 09:01:23 2015 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:27 <openstack> The meeting name has been set to 'ha'
09:01:40 <aspiers> OK hopefully this is the right channel now ;-)
09:01:53 <ddeja> yes, hello :-)
09:01:59 <masahito> o/
09:02:02 <aspiers> Welcome everyone - again :-)
09:02:12 <kazuIchikawa> o/
09:02:22 <aspiers> Maybe I should have mentioned that this meeting time is Monday morning for me, so sometimes my brain will not work ;-)
09:02:40 <beekhof> howdy
09:02:48 <_gryf> hey
09:03:19 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
09:03:55 <aspiers> _gryf/ddeja: you want to give any updates?
09:04:18 <ddeja> I have working fix for bug in mistral, but struggle with tests
09:04:26 <aspiers> ok
09:04:46 <ddeja> but other guy from Mistral have proposed another (probably better) fix
09:04:57 <_gryf> from my side - nothing in the ha area
09:05:04 <ddeja> so I hope to have it resolved this week
09:05:09 <aspiers> ok cool
09:05:25 <ddeja> I'll be making review and start hardening POC solution
09:05:44 <aspiers> ddeja: sounds good!
09:05:49 <aspiers> masahito: anything from your side?
09:06:09 <masahito> I checked 2 things last week. i) whether Masakari works with pacemaker-remote with no change or not? ii) easy to replace sqlalchemy?
09:06:25 <masahito> i) We need some works for using pacemaker-remote. Changes sources of host's status for hostmonitor process. Masakari parses output of crm_mon command to check which host is online and offline. aspiers suggested up-to-date crmsh is better.
09:06:40 <masahito> but I hit a problem in crmsh when I checked it. A remote node appears in 'crm node list', but when pacemaker-remote goes down the command doesn't say the remote node is offline. I used pacemaker 1.1.10 on ubuntu14.04. If the version is wrong for testing, please let me know.
09:06:52 <masahito> ii) replacing MySQLdb with sqlalchemy isn't difficult but needs some works.
09:07:01 <aspiers> 1.1.100 is old
09:07:05 <aspiers> 1.1.10 I mean
09:07:24 <masahito> above is my update :)
09:07:26 <beekhof> crmsh wont work for RH
09:07:54 <masahito> beekhof: which tools should I use?
09:08:07 <beekhof> well RH would love you to use pcs
09:08:07 <beekhof> but
09:08:14 <beekhof> maybe crm_mon --xml ?
09:08:27 <masahito> ubuntu doesn't have it.
09:08:36 <ddeja> beekhof: why not pcs? I'm using it and it's OK?
09:08:53 <aspiers> ddeja: pcs only exists on RH IIUC
09:08:58 <beekhof> ddeja: because that will make suse just as unhappy
09:09:05 <ddeja> Oh, OK
09:09:13 <aspiers> (and Ubuntu)
09:09:33 <beekhof> masahito: i'd worry about using pacemaker remote on a version old enough not to have --xml
09:09:46 <beekhof> sorry, --as-xml
09:09:57 <aspiers> masahito: you should probably use 1.1.13 at least
09:10:09 <beekhof> at least
09:10:17 <masahito> beekhof: ok, I'll try it with latest one.
09:10:21 <beekhof> lots of work went into it this year
09:10:50 <beekhof> lots of scary, "how did this ever work" kind of fixes
09:11:15 <aspiers> crm_node should work cross-distro, but IIRC it doesn't work on remotes unless they have a node attribute set
09:11:16 <_gryf> masahito, what operating system do you use?
09:11:44 <masahito> _gryf: ubuntu
09:11:57 <_gryf> masahito, oh, i see, ubuntu lts
09:12:34 <masahito> _gryf: currently we use ubuntu14.04 because of lts
09:12:34 <aspiers> I should try to summarise some of this for the minutes
09:13:01 <aspiers> #action ddeja to review alternate solution to mistral bug
09:13:13 <aspiers> #action ddeja to continue hardening mistral PoC
09:13:43 <aspiers> #info masahito is now working on integrating pacemaker_remote into masakari
09:13:59 <aspiers> #info masahito is now working on replacing MySQLdb with sqlalchemy
09:14:00 <beekhof> aspiers: it should work
09:14:20 <aspiers> #action masahito will see if a newer pacemaker solves some of his issues
09:14:47 <aspiers> beekhof: it works, but remotes are missing from one part of the CIB until a node attribute is set on them
09:15:01 <aspiers> beekhof: therefore IIRC, crm_node -l only includes remotes with node attributes
09:15:10 <aspiers> but I could be remembering the details totally wrong
09:15:17 <aspiers> since it is still early on Monday
09:15:39 <beekhof> i think i encouraged ken to change that
09:15:49 <aspiers> that would be nice - it did seem a bit inconsistent
09:15:52 <beekhof> should be reliable in 1.1.14
09:16:16 <aspiers> beekhof: any updates from your side?
09:16:29 <beekhof> ha. yeah
09:16:56 <beekhof> so remember all my problems last week? yeah, someone pointed fencing at the undercloud instead.
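A minimal sketch of the host-status check discussed above (09:06 to 09:09), in Python and assuming that the `crm_mon --as-xml` output lists cluster members and remotes as `<node>` elements under `<nodes>`, each with `name` and `online` attributes. The sample XML is hand-written for illustration, not captured from a real cluster, and `node_status` is a hypothetical helper, not Masakari code; the attribute names should be checked against the Pacemaker version in use.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not real cluster output).
SAMPLE_XML = """<crm_mon version="1.1.13">
  <nodes>
    <node name="controller1" id="1" online="true" type="member"/>
    <node name="compute1" id="compute1" online="true" type="remote"/>
    <node name="compute2" id="compute2" online="false" type="remote"/>
  </nodes>
</crm_mon>
"""


def node_status(xml_text):
    """Map each node name to True (online) or False (offline)."""
    root = ET.fromstring(xml_text)
    # Only look at <nodes>/<node>, not <node> elements in other sections.
    return {
        node.get("name"): node.get("online") == "true"
        for node in root.findall("./nodes/node")
    }


status = node_status(SAMPLE_XML)
offline = sorted(name for name, online in status.items() if not online)
print(offline)
```

In a host monitor the XML would presumably come from running `crm_mon --as-xml` rather than from a literal string. Matching only `./nodes/node` avoids picking up `<node>` elements that may appear in other sections of the output.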
09:17:07 <aspiers> LOL
09:17:31 <beekhof> there are still some fixes to be had, and i'd like to co-ordinate with you on the host/nova name mappings
09:17:39 <aspiers> yes please
09:17:45 <aspiers> that was the bit I had the most problems with
09:17:46 <beekhof> see if we can get something that works for both of us
09:18:07 <beekhof> yep, i still have your diff open
09:18:23 <aspiers> I have a whole page of notes from when I was reverse-engineering that
09:18:36 <aspiers> I should share it
09:18:50 <beekhof> i should have documented it :)
09:18:58 <aspiers> that would be nice :)
09:19:01 <aspiers> but it wasn't just you
09:19:21 <aspiers> a lot of the mapping stuff is quite sparse on docs
09:19:46 <aspiers> e.g. pcmk_host_map, how port gets set for fencing agents etc.
09:19:55 <beekhof> yeah
09:19:59 <aspiers> #action beekhof and aspiers to discuss host/nova name mappings
09:20:01 <beekhof> too little time
09:20:04 <aspiers> right
09:20:07 <beekhof> i updated RH on the status here, and someone raised a concern that Congress might be flatlining
09:20:11 <beekhof> anyone want to comment on that?
09:20:31 <aspiers> it doesn't look like that based on the meeting minutes I read last week
09:20:50 <aspiers> but I wanted to discuss Congress as a separate topic anyway
09:21:09 <_gryf> beekhof, you mean, Congress would be discontinued?
09:21:24 <aspiers> so let me first just say I don't have any significant updates from my side since last week was our team meeting
09:21:35 <aspiers> #topic Congress and potential use in HA
09:21:36 <beekhof> someone basically sent this link ( http://stackalytics.com/?module=congress-group&metric=commits&release=liberty ) and said see, vmware isn't interested anymore
09:21:48 <beekhof> presumably vmware was the original project champion?
09:22:05 <masahito> hi, I'm also contributing Congress.
09:22:10 <aspiers> beekhof: are you mentioning Congress due to what I posted on #openstack-ha last week?
09:22:17 <beekhof> _gryf: that was the suggestion
09:22:25 <beekhof> aspiers: i dont recall who suggested it
09:22:29 <aspiers> it was me
09:22:29 <beekhof> but yeah
09:22:34 <beekhof> ok
09:22:46 <aspiers> they already have a use case for triggering nova evacuation workflows from Congress
09:23:03 <aspiers> #info Congress project has documented a use case for triggering nova evacuation workflows from Congress
09:23:07 <beekhof> use case == reason or use case == code?
09:23:14 <aspiers> reason IIUC
09:23:30 <beekhof> link?
09:23:54 <masahito> I don't know what was happen in ha, but I can explain from Congress side maybe.
09:24:09 <aspiers> #link https://docs.google.com/document/d/1ExDmT06vDZjzOPePYBqojMRfXodvsk0R8nRkX-zrkSw/edit#
09:24:21 <beekhof> to be up front, i don't much care who does the triggering. but the division of labor seemed sane
09:24:28 <aspiers> the use case is missing from the ToC
09:24:43 <aspiers> but it's just after the "evacuation of tenants for planned outage" section
09:25:17 <aspiers> I also talked to one of our guys who is following Congress
09:25:26 <aspiers> and found out that they are talking about integrating it with Mistral
09:26:05 <aspiers> so it could watch out for the attribute set by fence_compute, and then initiate evacuation via Mistral based on cloud-specific policies
09:26:27 <aspiers> which sounds really nice because each cloud will have different ideas of how to set SLAs for pets
09:26:40 <beekhof> would we bother with fence_compute? other than to tell nova that the node is down?
09:27:10 <beekhof> once we've told nova, then congress should automagically know what to do right?
09:27:38 <aspiers> beekhof: I guess it depends on the details
09:27:50 <beekhof> doesn't everything :)
09:27:56 <ddeja> From the document that aspiers linked, it seems that fence_compute should also tell congress
09:28:01 <aspiers> if congress is too slow to notice for some reason, the compute host could bounce back up without any action taken
09:28:07 <beekhof> also possible
09:28:10 <ddeja> but maybe masahito know the details?
09:28:36 <masahito> yap
09:28:46 <aspiers> but I really like the idea of allowing flexible policies via Congress, e.g. some might want to do it with availability zones, others per-tenant, etc...
09:28:51 <beekhof> aspiers: so was the question just "what do you think of including it?" ?
09:29:11 <aspiers> beekhof: yeah pretty much
09:29:25 <aspiers> I think there's still quite a bit of work to do, e.g. the mistral driver for congress
09:29:28 <beekhof> masahito: how speedy congress would be for this? seconds? minutes?
09:29:40 <masahito> beekhof: depends on config.
09:30:08 <masahito> beekhof: currently, congress polls Nova API every 10s in default. but
09:30:17 <aspiers> Congress team were originally planning to add Mistral integration for liberty
09:30:23 <aspiers> but AFAIK it hasn't started yet
09:30:44 <aspiers> #link https://wiki.openstack.org/wiki/PolicyGuidedFulfillmentLibertyPlanning_Remediation is an example quite similar to the use case we are discussing
09:30:48 <beekhof> aspiers: i think phase 1 == confirm if mistral is the right path + implement, phase 2 == decide if triggering via congress is the right path
09:31:01 <masahito> In Mitaka release, I'll implement a new feature that Congress receive notification from other service.
09:31:03 <aspiers> beekhof: violently agree :)
09:31:16 <beekhof> masahito: ah, excellent
09:31:25 <aspiers> masahito: very nice :-D
09:31:52 <aspiers> #info masahito is planning to enable Congress to receive external notifications in mitaka
09:32:13 <masahito> there is already a BP.
09:32:33 <masahito> #link https://blueprints.launchpad.net/congress/+spec/push-type-datasource-driver
09:32:35 <aspiers> ah cool
09:34:35 <masahito> On the other hand, the usecase aspiers mentioned above is suggested by others, so it's not under discussion now.
09:35:13 <aspiers> I will try to attend Congress / Mistral IRC meetings when possible
09:35:30 <aspiers> to stay up to date and represent our interests
09:35:48 <beekhof> aspiers: did you want to move on to our fun email thread?
09:35:52 <aspiers> but I'm pretty busy so hopefully others can too
09:35:59 <ddeja> I'm trying to cover Mistral meetings
09:36:01 <aspiers> beekhof: yes I was about to suggest that :)
09:36:06 <aspiers> ddeja: great!
09:36:14 <beekhof> but i got in first so ,,!,, :)
09:36:17 <aspiers> #topic future direction of OCF RAs
09:36:21 <aspiers> haha :)
09:36:30 <aspiers> so, I started a discussion with beekhof
09:36:35 <aspiers> on IRC
09:36:47 <aspiers> and then it turned into a private mail thread, which we should probably avoid in future :)
09:36:49 <beekhof> but then i needed to cook dinner
09:37:00 <beekhof> and you logged off
09:37:04 <aspiers> right :)
09:37:05 <ddeja> beekhof: LOL
09:37:20 <aspiers> I suggested the idea of converting the RAs to wrap around service(8)
09:37:25 <aspiers> and beekhof doesn't like that at all :)
09:37:35 <aspiers> we're currently working towards a consensus
09:37:35 <beekhof> and everyone agreed it was a terrible idea. the end. good night
09:37:39 <aspiers> haha ;-)
09:37:47 <_gryf> aspiers, serviced, or just any other init system?
09:37:57 <aspiers> _gryf: that's one of the key points
09:38:10 <aspiers> my main point was that each distro potentially has a different way of managing services
09:38:16 <aspiers> most are on systemd by now
09:38:17 <_gryf> s/serviced/systed/
09:38:19 <aspiers> but I guess not all
09:38:24 <aspiers> e.g. Ubuntu LTSS
09:38:27 <beekhof> i'm no systemd fan, but it won, we should move on
09:38:29 <_gryf> darn, systemd :)
09:38:50 <beekhof> well, at least until people start using pacemaker_remoted as pid one
09:39:05 <_gryf> ubuntu 14.04 have only logind, but still uses upstart
09:39:16 <_gryf> ubuntu 16.04 will have systemd
09:39:25 <aspiers> sorry, s/LTSS/LTS/
09:39:38 <beekhof> _gryf: so still a few months out
09:39:55 <aspiers> 14.04 LTS is supported until late 2019
09:40:11 <_gryf> beekhof, right, and we have to count existing setups in
09:40:17 <beekhof> LALALALALAL there is only RHEL
09:40:18 <aspiers> beekhof: so you are suggesting we should drop support for that?
09:40:51 <ddeja> IMHO even after U16.04 we still should assume, that some other OS may come without systemd
09:40:54 <aspiers> beekhof: in theory you and I should care the same amount about Ubuntu LTS ;-)
09:40:55 <beekhof> i probably wouldnt base a forward looking solution around not having systemd if thats what you mean
09:41:10 <aspiers> ddeja: that's right
09:41:45 <beekhof> in any case, we should just use OCF scripts. problem solved
09:41:47 <_gryf> ddeja, besides slackware and gentoo are there any other significant distribution that does not use systemd?
09:42:12 <aspiers> now might be a good time to mention the https://wiki.openstack.org/wiki/Rpm-packaging project
09:42:30 <beekhof> now comes with useful status monitoring! free with the first 100 orders
09:42:35 <beekhof> (sorry, its late, i get silly)
09:42:55 <_gryf> beekhof, lol
09:43:03 <aspiers> the likelihood of converging on the same systemd service descriptions for OpenStack services for rpm-based distros depends on the success of that project
09:43:17 <aspiers> but it's actually expanding beyond rpm to deb
09:43:36 <aspiers> one of our guys (Dirk Mueller) is the PTL and he was talking to Mirantis etc. in Tokyo
09:43:51 <aspiers> although it's early stages
09:44:14 <aspiers> anyway, my point is, even if everybody uses systemd, can we rely on the services having the same names etc.?
09:44:31 <aspiers> or doing the same things
09:44:36 <beekhof> nope. do we need to?
09:44:41 <aspiers> that's the next question
09:44:53 <aspiers> e.g. if Pacemaker is controlling keystone, should systemctl status openstack-keystone still work reliably?
09:45:00 <aspiers> I would say yes
09:45:15 <aspiers> of course systemctl start/stop are a very bad idea in those scenarios
09:45:17 <beekhof> not necessarily
09:45:28 <beekhof> its arguably better if it doesnt
09:45:45 <ddeja> I'll agree with beekhof
09:45:46 <beekhof> because then people will try start/stop via systemctl too
09:45:59 <ddeja> with pacemaker we got one point to control all services
09:46:00 <beekhof> we know this because it already happens :-(
09:46:01 <aspiers> beekhof: you mean, so deliberately treat HA services as distinct from non-HA ones?
09:46:07 <beekhof> yes
09:46:09 <beekhof> absolutely
09:46:13 <beekhof> or
09:46:33 <beekhof> figure out a way to redirect systemctl X to the cluster
09:46:37 <beekhof> we're close
09:46:47 <beekhof> but of course lacking the time to implement it
09:46:52 <aspiers> ok, but then the problem is: how do we ensure that the RAs work cross-distro?
09:46:56 <beekhof> there are a couple of ways to go
09:47:03 <aspiers> which again comes back to packaging
09:47:38 <beekhof> are the binaries deployed to such different locations?
09:47:45 <beekhof> running as completely different users?
09:48:11 <aspiers> beekhof: I would be surprised if everything was identical across all distros
09:48:33 <aspiers> IIRC, one difference was whether to prefix binaries/users/groups etc. with "openstack-"
09:48:37 <beekhof> we handle apache as an RA and the binary names aren't even consistent
09:48:57 <beekhof> across distros i mean
09:48:58 <aspiers> that's true
09:49:07 <beekhof> httpd vs. apache
09:49:39 <aspiers> HTTPDLIST="/sbin/httpd2 /usr/sbin/httpd2 /usr/sbin/apache2 /sbin/httpd /usr/sbin/httpd /usr/sbin/apache $IBMHTTPD"
09:49:44 <aspiers> wow, that's pretty ugly :-/
09:50:02 <aspiers> DEFAULT_IBMCONFIG=/opt/IBMHTTPServer/conf/httpd.conf
09:50:04 <aspiers> DEFAULT_SUSECONFIG="/etc/apache2/httpd.conf"
09:50:06 <aspiers> DEFAULT_RHELCONFIG="/etc/httpd/conf/httpd.conf"
09:50:15 <aspiers> this is exactly what I would like to avoid
09:50:16 <beekhof> i'm not saying the community shouldnt standardize on something
09:50:38 <aspiers> the differences should be handled by the vendor packages, not by hardcoding them in the upstream RAs
09:50:48 <beekhof> but a) i dont think we need to drive it and b) i dont think its a requirement
09:50:54 <aspiers> e.g. the above breaks if a distro changes locations between major versions
09:51:00 <beekhof> or a reason not to do an OCF RA
09:51:20 <aspiers> do we really want to go down the road of having 'case "$SUSE_VERSION" in' stuff inside each RA?
09:51:38 <beekhof> no
09:52:05 <aspiers> I don't see another way, unless the RA delegates distro-specific decisions to something external
09:52:07 <beekhof> but i dont think the DEFAULT_ stuff is necessary
09:52:07 <beekhof> or
09:52:11 <beekhof> could be done better imho
09:52:27 <aspiers> OK, would you like to write up a proposal?
09:52:34 <aspiers> no huge rush, obviously
09:52:42 <aspiers> but probably doesn't make sense to delve into details here
09:53:07 <beekhof> i think the sticking point, is that everyone wants to have their own defaults
09:53:19 <aspiers> yes, and that's exactly why I proposed delegation
09:53:24 <beekhof> for all i care, they can be the suse values and our installer can override them
09:53:43 <aspiers> that's assuming that everything is parametrized
09:53:43 <beekhof> or vice versa
09:53:58 <beekhof> they usually need to be anyway
09:54:01 <aspiers> so we could make binary/config/pid file locations parametrized
09:54:10 <beekhof> better idea
09:54:17 <beekhof> lets decide what they should be
09:54:25 <aspiers> OCF_RESKEY_binary
09:54:32 <masahito> It sounds good.
09:54:33 <beekhof> and anyone that gets it wrong has to rely on their installer
09:54:46 <beekhof> until they fix their packaging :)
09:55:21 <aspiers> It's still duplicating a bunch of stuff which exists in systemd service descriptions
09:56:00 <beekhof> that doesn't help convergence though
09:56:09 <beekhof> write something that pulls those values out?
09:56:15 <beekhof> 3 lines of shell?
09:56:30 <aspiers> not sure I follow
09:57:16 <beekhof> grep $field $unit | awk -F= '{print $2}'
09:57:42 <aspiers> that seems a lot more cumbersome than simply wrapping systemctl
09:57:46 <beekhof> that would be the default "default default"
09:57:59 <beekhof> except it has one key advantage
09:58:07 <beekhof> i might actually work
09:58:09 <beekhof> it might actually work
09:58:17 <beekhof> systemd == nightmare
09:58:26 <aspiers> OK, I'm beginning to learn you don't like systemd :)
09:58:31 <aspiers> anyway we're out of time for now
09:58:44 <aspiers> let's continue this on IRC / openstack-dev@
09:58:48 <beekhof> k
09:58:58 <aspiers> any other topics in the remaining seconds?
09:59:11 <_gryf> i guess no
09:59:20 <aspiers> I think we sent everyone else to sleep with our RA talk ;-))
09:59:24 <masahito> next week, I want us to start thinking milestone of the HA.
09:59:38 <aspiers> masahito: good idea, I'll put on the agenda!
09:59:47 <masahito> aspiers: thanks!
09:59:49 <aspiers> ok thanks all, and see you next week, or sooner on IRC!
10:00:05 <aspiers> #action put roadmap on agenda for next meeting
10:00:21 <ddeja> thanks, bye
10:00:24 <aspiers> #endmeeting
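A rough sketch of the idea floated between 09:54 and 09:57 in the OCF RA discussion: use an explicit `OCF_RESKEY_*` parameter when one is set, and otherwise fall back to the value found in the distro's systemd unit file. It is written in Python for illustration only (real OCF RAs are typically shell scripts); `unit_value`, `resolve`, and the sample unit text are hypothetical. Unlike `grep $field $unit | awk -F= '{print $2}'`, it matches the key exactly, so `ExecStart` does not also match `ExecStartPre`, and it keeps values that themselves contain `=`.

```python
import os

# Hypothetical unit file contents, for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=OpenStack Identity Service (hypothetical sample)

[Service]
User=keystone
ExecStartPre=/usr/bin/true
ExecStart=/usr/bin/keystone-all --config-file=/etc/keystone/keystone.conf
PIDFile=/run/keystone/keystone.pid
"""


def unit_value(unit_text, field):
    """Return the last value assigned to `field`, or None if absent.

    Simplification: treats every setting as single-valued (last one wins).
    """
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue  # skip blanks, comments and section headers
        key, sep, val = line.partition("=")  # split on the first '=' only
        if sep and key.strip() == field:
            value = val.strip()
    return value


def resolve(param, unit_text, field, environ=None):
    """Prefer OCF_RESKEY_<param>; otherwise use the unit file's `field`."""
    environ = os.environ if environ is None else environ
    override = environ.get("OCF_RESKEY_" + param)
    return override if override else unit_value(unit_text, field)


print(resolve("pid", SAMPLE_UNIT, "PIDFile", environ={}))
```

Note that `ExecStart` holds a full command line rather than a bare binary path, so a real agent would still need to decide how to derive something like `OCF_RESKEY_binary` from it.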