09:03:11 <aspiers> #startmeeting ha
09:03:12 <openstack> Meeting started Mon Jan 18 09:03:11 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:03:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:15 <openstack> The meeting name has been set to 'ha'
09:03:21 <aspiers> welcome everyone
09:03:40 <_gryf> hello again :)
09:03:50 <aspiers> I guess it might be a bit short today but that's no problem :)
09:04:03 <masahito> hello
09:04:13 <aspiers> let's start with some status updates as per normal
09:04:21 <aspiers> #topic Current status (progress, issues, roadblocks, further plans)
09:04:37 * aspiers picks someone at random
09:04:46 <aspiers> masahito: anything you'd like to report?
09:05:19 <masahito> I'm working on my work items for Masakari, so nothing to specials.
09:05:30 <aspiers> ok
09:06:23 <aspiers> hey kazuIchikawa, welcome to our small group :) are you also working on masakari?
09:07:03 <kazuIchikawa> Yes, I'm working on masakari with masahito.
09:07:08 <aspiers> cool
09:07:41 <kazuIchikawa> He is my colleague, actually. working in same office.
09:07:51 <aspiers> nice
09:08:11 <aspiers> I don't have too much to report
09:08:15 <aspiers> briefly:
09:08:32 <aspiers> I merged one or two patches into openstack-resource-agents
09:08:37 <_gryf> also, nothing from my side
09:08:41 <_gryf> but ddeja was able to bring his setup alive last Friday
09:08:43 <aspiers> there are still a few outstanding
09:08:46 <_gryf> and currently he is back on track and working on tuning the mistral workflows for evacuation
09:08:57 <aspiers> _gryf: sounds cool!
09:09:09 <bogdando> hi
09:09:09 <aspiers> _gryf: I guess the github will continue to be updated?
09:09:15 <aspiers> oh hey bogdando :)
09:09:18 <_gryf> aspiers, I hope :)
09:09:40 <aspiers> I've continued to have a long discussion with beekhof about the future of OCF RAs
09:09:57 <aspiers> and their relationship to systemd
09:10:27 <aspiers> I am still thinking it might be a good idea if the RAs are changed to wrap around system unit services
09:10:42 <aspiers> instead of reimplementing the logic / config for starting/stopping the daemons
09:10:54 <aspiers> beekhof strongly disagrees ;-)
09:11:15 <aspiers> for reasons which I don't fully understand yet
09:11:18 <bogdando> aspiers, do you mean only stateless resources by OCF or stateful as well?
09:11:25 <bogdando> like clusters
09:11:31 <aspiers> bogdando: possibly both
09:11:47 <aspiers> hey ddeja :)
09:11:51 <ddeja> hi all, sorry for beeing late :)
09:11:55 <aspiers> no problem
09:11:56 <bogdando> the point is for the clusters, start/stop differs
09:11:57 <ddeja> I left my car for repair and didin't expect public transport to take so long
09:12:03 <aspiers> hehe
09:12:09 <_gryf> heh
09:12:12 <bogdando> and systemd start/stop logic cannot fit the needs perhaps
09:12:22 <bogdando> as well as any init-like one
09:12:23 <aspiers> well, let's finish the status round first, and then maybe we can come back to this topic?
09:12:36 <aspiers> want to make sure everyone gets a chance to give their status
09:13:04 <aspiers> the other thing I have to report is that I'm still working on our automation of compute node HA setup via Chef
09:13:16 <aspiers> but it's quite close to completion now
09:13:32 <aspiers> bogdando: you want to give a status update on anything?
09:13:59 <bogdando> no so far, only few updates to the ha guide related things
09:14:29 <aspiers> yeah, ha-guide project seems very active :)
09:14:38 <aspiers> ddeja: anything from your side?
09:14:46 <ddeja> aspiers: yes, thanks
09:14:52 <bogdando> galera patch was reworked, thanks to Kenneth, https://review.openstack.org/#/c/263075/
09:15:07 <ddeja> I have finished working whit my setup for auto-evac testing
09:15:32 <aspiers> cool
09:15:45 <ddeja> And I'm hardening mistral workflow, but on Friday I hit something, that might be a bug
09:15:59 <ddeja> But not sure yet, must do a double check
09:16:30 <ddeja> as soon as I have my workflow fully working, I'll let you know and I'll describe it in etherpad/github
09:16:31 <aspiers> ddeja: do you have any updates you could push to https://github.com/gryf/mistral-evacuate ?
09:16:37 <aspiers> ok great!
09:17:11 <aspiers> alright, I think that's everyone, or does anyone else want to report anything?
09:17:24 <ddeja> aspiers: yes, but I must confirm that there is bug in mistral itself, not in my workflow before I push it, hopefully I'll do it tommorow :)
09:17:36 <aspiers> ddeja: cool :)
09:17:54 <aspiers> also, if you want to discuss a particular topic today (or in the future), please say so now
09:18:14 <aspiers> I would quite like to return to this topic of OCF RAs and systemd to get other people's opinions
09:18:24 <aspiers> but happy to discuss anything else
09:19:05 <ddeja> I'm OK with OCF RAs stuff
09:19:11 <aspiers> ok
09:19:16 <aspiers> #topic OCF RAs and systemd
09:19:26 <aspiers> firstly, is anyone *not* using systemd?
09:20:15 <aspiers> so my original idea, which I think I have mentioned in previous meetings, or on #openstack-ha, is to make small changes to the existing OCF RAs so that they wrap around systemd
09:20:18 <masahito> for HA one?
09:20:27 <aspiers> or at least, that they wrap around service(8)
09:20:37 <aspiers> masahito: I mean, in general
09:20:44 <aspiers> masahito: vs. sysvinit, upstart etc.
09:21:10 <aspiers> there are a few problems with the existing OCF RAs
09:21:13 <masahito> got it. we're using upstart.
09:21:16 <bogdando> let's create a spec first. There is high dev & ops impact
09:21:29 <aspiers> bogdando: sure. this is very much in the proposal phase right now
09:21:33 <aspiers> a spec is a good idea
09:21:35 <bogdando> I'd like to see this change backwards compatible
09:21:47 <aspiers> bogdando: I think it would be, but we'd have to check
09:21:55 <aspiers> the problems I want to solve are:
09:22:01 <bogdando> with some deprectaion perhaps unless there is ubuntu 16.04 at lest :)
09:22:06 <aspiers> 1. duplication of daemon management logic
09:22:16 <aspiers> 2. distro-specific code in openstack-resource-agents repo
09:22:30 <aspiers> 3. inconsistency of daemon management between HA clouds and non-HA clouds
09:22:56 <aspiers> 4. delegation of maintenance of daemon management to distro packages
09:23:09 <aspiers> (2. and 4. are kind of the same thing)
09:23:22 <aspiers> so currently the OCF RAs start up daemons in their own way
09:23:30 <aspiers> they don't care how the daemons are packaged by the distro
09:23:48 <aspiers> if the distro package changes, the RA might also need a change
09:23:54 <aspiers> or at least the Pacemaker primitive
09:24:05 <bogdando> another point is probably a single point of entry for the resources control
09:24:24 <bogdando> would systemd allow to effectively handle this?
09:24:39 <aspiers> bogdando: could you expand a bit on what you mean by that please?
09:24:49 <bogdando> I mean that init-based control plane shall be not used, or desabled then pacemaker takes care
09:24:58 <bogdando> disabled*
09:25:12 <aspiers> well that's already true
09:25:27 <aspiers> if Pacemaker manages the service, then the admin must not also try to start/stop it
09:25:38 <aspiers> I don't think my proposal changes that
09:25:46 <bogdando> yes, but I thought systemd may integrate things a bit better
09:25:54 <bogdando> must not -> cannot
09:26:11 <bogdando> or transparently "will not", in fact
09:26:28 <aspiers> I don't think systemd lets us prevent admins interfering with Pacemaker-controlled services
09:26:32 <bogdando> ;(
09:26:46 <aspiers> which is a shame
09:26:53 <aspiers> maybe there is some way to hack that, I'm not sure
09:26:54 <masahito> my concern is how pacemaker works when a process launched by systemd goes down.
09:27:14 <aspiers> masahito: in that case, Pacemaker must be in charge of restarting it
09:27:17 <bogdando> yes, good point to address in the spec, failure modes
09:27:27 <bogdando> and avoiding split brain to control planes
09:27:38 <aspiers> masahito: in my proposal, Pacemaker launches the process *via* systemd
09:28:12 <aspiers> so e.g. the 'start' action of the OCF RA will call something like "service openstack-keystone start"
09:28:17 <masahito> aspiers: got it.
09:28:19 <bogdando> so status in the systemd will be correct, but it shall not try to do respawn race
09:28:26 <aspiers> bogdando: correct
09:28:58 <aspiers> yes, this also solves the problem that 5. currently "systemctl status" is not guaranteed accurate if the service is started by Pacemaker
09:29:13 <aspiers> since that relies on the OCF RA using the same pid file and/or daemon name as systemd
09:29:14 <bogdando> looks like very good idea, but complex to address (considering keeping it backwards compatible unless there is LTS ubuntu with systemd)
09:29:26 <aspiers> bogdando: I think it would be extremely simple to do
09:29:27 <bogdando> so we need a spec and PoC perhaps
09:29:36 <aspiers> bogdando: it's just a few lines of code changed in each RA
09:29:47 <aspiers> beekhof highlighted one issue with systemctl though
09:29:50 <bogdando> yes perhaps
09:29:54 <aspiers> systemctl start/stop are asynchronous
09:30:08 <aspiers> they do not block until the service is started/stopped
09:30:14 <bogdando> oh, that may be a big problem for stop logic in pacemaker
09:30:22 <aspiers> so after you call systemctl, you have to poll until the service is really started or stopped
09:30:25 <bogdando> and unexpected STONITHes :)
09:30:43 <aspiers> but I think that can be solved simply by a loop which calls the 'monitor' action
09:30:51 <bogdando> then the action stop times out
09:31:04 <aspiers> where 'monitor' should do application-level monitoring (e.g. HTTP layer tests)
09:31:08 <bogdando> but in fact is running async
09:31:18 <bogdando> another failure mode to catch up
09:31:23 <aspiers> bogdando: yes, the polling loop would need a timeout, but this is easy to do
09:31:34 <aspiers> also the systemctl would need a timeout, which again is easy
09:31:53 <aspiers> I have discussed this proposal a LOT with beekhof
09:32:01 <aspiers> he doesn't like it ;-)
09:32:05 <aspiers> but I don't understand why yet
09:32:14 <aspiers> he says systemd is too unreliable to use
09:32:38 <bogdando> :)
09:32:38 <aspiers> but I don't know what technical issues he is referring to
09:32:46 <aspiers> so I guess that conversation will continue
09:33:10 <bogdando> by the way
09:33:13 <ddeja> maybe whe sould ask beekhof to elaborate more on this?
09:33:20 <aspiers> ddeja: I already did
09:33:26 <ddeja> aspiers: cool
09:33:27 <aspiers> ddeja: waiting for an answer
09:33:34 <ddeja> aspiers: OK
09:33:35 <bogdando> for stop, I believe we should use a unified proc_stop which relies on several iterations of SIGTERM
09:33:50 <bogdando> folowing by the SIGKILL if nothing helps to stop gracefully
09:33:54 <aspiers> but in any case, I agree that a spec and some PoC (e.g. a WIP pull request to openstack-resource-agents) is a good way to go
09:33:59 <_gryf> aspiers, is that thread on the ml and somehow i've missed it?
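The polling approach aspiers describes above (call systemctl with a timeout, then loop on the 'monitor' action until the service is really up or a deadline passes) might look roughly like the following. This is only a minimal Python sketch under stated assumptions, not code from openstack-resource-agents: the names `wait_for_state` and `start_unit` are hypothetical, `monitor` stands in for whatever application-level check the RA would run, and the clock and sleep functions are injectable so the loop can be exercised without systemd.

```python
import subprocess
import time


def wait_for_state(monitor, want_running, timeout, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll monitor() until it reports the wanted state or the timeout expires.

    monitor is a hypothetical callable returning True when the service
    passes its application-level check (e.g. an HTTP request succeeds).
    """
    deadline = clock() + timeout
    while True:
        if monitor() == want_running:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start_unit(unit, monitor, timeout=60.0):
    """Hypothetical RA 'start' action: ask systemd to start, then poll."""
    try:
        # The systemctl call itself gets a timeout, as suggested in the log.
        subprocess.run(["systemctl", "start", unit], check=True, timeout=timeout)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return False
    # Do not trust the return of systemctl alone; confirm via monitor().
    return wait_for_state(monitor, True, timeout)
```

A 'stop' action would follow the same shape with `want_running=False`; in both cases a `False` result would map to an OCF error so that Pacemaker can decide how to recover.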
09:34:07 <bogdando> that would be a more classic unix way, rely on the signals, IMO
09:34:20 <bogdando> I have an example to show
09:34:42 <aspiers> _gryf: no, unfortunately it had to be private since there were some minor political topics involved :-/
09:34:58 <_gryf> aspiers, oh, got it ;)
09:35:09 <aspiers> _gryf: but a spec / gerrit review would help keep the technical discussion open
09:35:13 <aspiers> I would much prefer that
09:35:31 <bogdando> https://github.com/rabbitmq/rabbitmq-server/blob/6fd4eb5bcb39be7f5ac26dcc78e3a4b4df4c6fbb/scripts/rabbitmq-server-ha.ocf#L303-L463
09:35:55 <aspiers> #action aspiers to write a spec proposing OCF RAs wrap systemd, and possibly a gerrit review showing a PoC
09:35:58 <_gryf> aspiers, yeah. and for me both (gerrit/ml) works fine
09:35:59 <bogdando> we could contribute this to the ocf-shell-funcs of the resource-agents
09:36:19 <bogdando> an use this instead of the init control plane for stopping things
09:36:22 <bogdando> and
09:36:24 <aspiers> #action beekhof to give technical details of systemd issues which prevent reliable building of OCF RAs on top of it
09:36:49 <bogdando> thoughts?
09:36:50 <aspiers> bogdando: yes, maybe there is an opportunity for reuse of library code there
09:37:03 <aspiers> bogdando: however the idea is to leave the majority of the logic to systemd
09:37:23 <bogdando> I believe posix signals will make happy even the ones with the longest beards among us :)
09:37:53 <aspiers> so effective the OCF RAs don't add much around systemd except 1. polling and error handling to cover systemd timing / failure issues and 2. application-layer monitoring
09:38:06 <bogdando> at least please put this as the alternative to the stop case
09:38:14 <bogdando> an alternative*
09:38:36 <aspiers> bogdando: well, determining which PID to kill is a detail I would prefer to be handled by systemd
09:38:43 <bogdando> cuz you know, in pacemaker the stop is the most sensitive thing
09:38:48 <aspiers> since that depends on how the daemon is coded and packaged by the distro
09:38:58 <aspiers> yes I agree, stop is really critical
09:39:21 <aspiers> but I believe that if the distro packages can't reliably stop the service via systemd, then that is a distro bug
09:40:01 <aspiers> there is no 100% reliable cross-distro way to determine the pid of the daemon
09:40:07 <bogdando> procfs?
09:40:09 <bogdando> not 100%?
09:40:25 <aspiers> no, because you have to make assumptions about the daemon process name
09:40:29 <aspiers> or the location of the pid file
09:40:45 <aspiers> and then this has to be a parameter to the Pacemaker primitive
09:40:49 <bogdando> the latter one is under OCF RA control, as a parameter perhaps
09:40:57 <aspiers> which is a duplication of config data already defined by the distro packages
09:41:03 <aspiers> and that's what I want to avoid
09:41:24 <bogdando> well, we could rely on the systemd to get the pid
09:41:32 <aspiers> yes, that's exactly my proposal :)
09:41:33 <bogdando> but still use proc_stop for stopping
09:41:43 <aspiers> bogdando: what's wrong with systemctl stop?
09:41:48 <bogdando> async
09:41:55 <aspiers> bogdando: solved by polling
09:42:02 <bogdando> not guarantees results
09:42:16 <aspiers> then isn't that a distro bug?
09:42:25 <aspiers> what could go wrong?
09:42:31 <bogdando> we want stop like STONITH, to be sure the one was nuked down
09:42:43 <bogdando> in the specified op stop timeout
09:42:52 <aspiers> why can't systemd do this?
09:43:06 <bogdando> some folks say it is not reliable enough ;)
09:43:27 <bogdando> while kill -TERM , kill -KILL was proven to be so for decades
09:43:34 <ddeja> but it's one of the pacemaker features - it'll kill host if it's unable to stop one of it's services
09:43:39 <aspiers> but that's what systemd does
09:43:45 <aspiers> bogdando: see the systemd.kill(5) man page
09:43:55 <bogdando> okay, thanks will look
09:43:57 <ddeja> so what's the big deal if systemd will not stop service?
09:44:05 <aspiers> bogdando: systemd can send SIGKILL
09:44:07 <aspiers> if needed
09:44:30 <aspiers> ddeja: we want to avoid STONITH if we possibly can, since it's much more disruptive
09:45:18 <ddeja> aspiers: ok, sure thing
09:45:19 <aspiers> but I can't imagine a scenario where systemd would fail to kill the process via SIGTERM/HUP/KILL, unless there was some kind kernel bug
09:45:28 <aspiers> and in that case, we want STONITH anyway :)
09:45:36 <bogdando> sigkill should work indeed
09:45:41 <aspiers> there may well be other issues I am not aware of
09:45:47 <aspiers> beekhof certainly seems to think so
09:45:53 <aspiers> but I haven't heard the details yet
09:45:55 <bogdando> but that if pid was lost?
09:46:05 <aspiers> bogdando: what do you mean by lost?
09:46:06 <bogdando> proc_stop assumes one may use the name based mathcing
09:46:29 <aspiers> bogdando: name-based matching is not cross-distro and also unreliable
09:46:30 <bogdando> imagine a byzantine case then the app crashed and removed its pid too eraly
09:46:35 <bogdando> earlyu
09:46:44 <bogdando> well, *early*
09:46:58 <aspiers> bogdando: in that case, systemd can detect the service is already stopped
09:47:06 <aspiers> so there is no problem
09:47:30 <bogdando> I can only say that pidfiles not always contain one's expecting
09:48:00 <bogdando> especially for high load systems
09:48:08 <aspiers> bogdando: perhaps you can give some concrete examples on #openstack-ha later, or via email?
09:48:14 <bogdando> okay
09:48:17 <aspiers> cool, thanks
09:48:51 <aspiers> btw, there was a report on #openstack-ha in the last few days about a bug where detecting pid via name-based matching failed
09:49:30 <aspiers> so this is not just a theoretical issue
09:49:44 <aspiers> ok, so I will write a spec
09:49:59 <aspiers> apologies in advance: I will probably not have time to do it in the next few weeks :-(
09:50:08 <aspiers> we have a major release coming up, and I am also going away for 2 weeks
09:50:16 <aspiers> but I don't think this proposal is urgent anyway
09:50:49 <aspiers> anyone have any other topics to discuss? bogdando, anything about the ha-guide maybe? e.g. new meeting time?
09:51:48 <aspiers> #action bogdando to give some concrete examples where pid files (and perhaps hence systemd) cannot be relied on
09:52:40 <aspiers> #info success of systemctl stop is considered very important, to avoid STONITH unless we really need it
09:53:00 <aspiers> #info systemd seems to be quite powerful in capabilities for killing misbehaving processes
09:55:02 <aspiers> ok I guess not
09:55:11 <aspiers> in that case let's end the meeting
09:55:28 <aspiers> thanks all, and see you next week, or on #openstack-ha before then!
09:55:32 <masahito> ok, bye
09:55:35 <aspiers> bye for now :)
09:55:44 <kazuIchikawa> bye
09:55:45 <ddeja> bye
09:56:09 <aspiers> #endmeeting
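
The stop behaviour bogdando argues for in the log (several rounds of SIGTERM, then SIGKILL if the process still has not gone away) can be illustrated with a short sketch. This is a hedged Python illustration of the general escalation pattern only; it is not the proc_stop function from the linked rabbitmq-server-ha.ocf script, and the name `stop_with_escalation`, its parameters, and its return strings are all hypothetical. The signal sender and the liveness check are passed in as callables, so how the PID is found (pid file, process name, or systemd) stays outside the sketch.

```python
import signal
import time


def _wait_gone(is_running, wait, interval, sleep):
    """Poll until is_running() is False or roughly `wait` seconds elapse."""
    waited = 0.0
    while True:
        if not is_running():
            return True
        if waited >= wait:
            return False
        sleep(interval)
        waited += interval


def stop_with_escalation(send_signal, is_running, term_attempts=3,
                         wait_per_attempt=5.0, poll_interval=0.5,
                         sleep=time.sleep):
    """Try SIGTERM a few times, then fall back to SIGKILL.

    Returns 'not-running', 'term', 'kill' or 'failed' (hypothetical labels).
    """
    if poll_interval <= 0:
        raise ValueError("poll_interval must be positive")
    if not is_running():
        return "not-running"
    for _ in range(term_attempts):
        send_signal(signal.SIGTERM)          # ask for a graceful shutdown
        if _wait_gone(is_running, wait_per_attempt, poll_interval, sleep):
            return "term"
    send_signal(signal.SIGKILL)              # last resort before STONITH
    if _wait_gone(is_running, wait_per_attempt, poll_interval, sleep):
        return "kill"
    return "failed"
```

For a real process one could pass `send_signal=lambda sig: os.kill(pid, sig)` and an `is_running` built on `os.kill(pid, 0)`. A 'failed' result is the case where, per the discussion above, Pacemaker would escalate to fencing.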