09:02:46 #startmeeting ha
09:02:47 Meeting started Mon Sep 5 09:02:46 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:02:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:02:49 morning all
09:02:51 The meeting name has been set to 'ha'
09:03:00 oh hi beekhof :)
09:03:05 cool
09:03:12 * beekhof was on vacation last week
09:03:24 hope you had a good time and your inbox is not too bloated ;-)
09:03:43 mine tends to get ~1000/week (excluding mailing lists)
09:04:04 well, including the mailing lists I try to stay on top of
09:04:08 so not openstack-dev ;-)
09:04:21 hahaha
09:04:21 same here...lol
09:04:30 #topic HA guide
09:04:31 yes, i went somewhere warm to escape winter
09:04:43 does Australia even get winter? ;-)
09:05:42 indeed
09:05:42 #info ddeja submitted a review to HA guide adding instance HA info https://review.openstack.org/#/c/359955/
09:05:52 samP: I just added you as a reviewer
09:06:15 aspiers: thank you. I'll take a look
09:06:28 however ddeja is now supposed to be on vacation
09:06:58 so I'll submit a new patchset taking my feedback into account
09:07:35 #action aspiers to update the HA guide review
09:07:52 beekhof: you wanna mention anything else about the HA guide?
09:08:12 can i defer? i got pulled into a meeting real quick
09:08:19 5 minutes or so
09:08:36 sure
09:10:30 samP: shall we wait a few minutes for him to get back? I guess there's not too much to discuss today
09:10:48 other topics: specs and Barcelona
09:10:57 and anything else you want to discuss
09:11:02 sure, sorry I don't have many topics to discuss
09:12:05 that's fine
09:14:41 OK that's 5 mins or so ;-)
09:14:45 #topic specs
09:15:05 so the VM monitoring spec is almost finished
09:15:16 just 1 or 2 FIXMEs I think
09:15:52 samP: how much detail do you think the spec should give about the event data?
09:16:31 samP: also I think it should mention the need to work with https so that the communication is secure
09:17:08 i'm pretty much back
09:17:27 cool
09:18:02 #link https://review.openstack.org/#/c/352217 is the VM monitoring spec
09:18:11 at least, we need events about unexpected stop, crash and IO error
09:18:37 samP: yes
09:18:42 samP: and what about event filtering?
09:18:51 that should be configurable in the monitoring service, right?
09:18:59 aspiers: yes
09:19:02 better to filter at source than destination
09:19:02 in client side
09:19:21 aspiers: agree
09:19:27 maybe it's enough for now to make the spec request filtering just by event type?
09:19:38 later we could improve if required
09:19:52 but the spec doesn't need to be the final perfect solution
09:20:22 aspiers: event type is enough, as u said we can add more details later
09:20:37 ok cool
09:21:27 I think masakari-instancemonitor already implements 95% of what we need, which is good news :)
09:22:20 aspiers: yes. it also has client side event monitoring. though hard coded
09:22:25 yep
09:22:33 and we need https
09:23:14 how should it get the server's certificate?
09:23:36 a) just trust there is no middle-man attack the first time, and cache the cert
09:23:46 b) require the sysadmin to provide the cert
09:24:05 s/sysadmin/Chef or Puppet or Ansible or .../
09:24:39 I think b) is better
09:24:43 b) is feasible
09:24:50 since b) still allows the possibility of a)
09:25:13 ok cool
09:25:25 from my ops view: also b) sounds better
09:25:33 oh hi haukebruno :)
09:25:37 aspiers: yes. when we add a new compute node, things are automated and b) is not a big issue
09:25:37 morning all \o/
09:25:40 that's good to hear
09:25:49 ok, so you wanted to talk ha guide?
09:25:49 haukebruno: morning..
09:26:01 its mine, all mine i tell you!
09:26:03 beekhof: we're currently talking about the VM monitoring spec
09:26:06 hah
09:26:23 beekhof: anything from you on that?
we can switch back to HA guide if you have more to add
09:26:34 i've not done much
09:26:51 i'd like to get around a table at summit and come up with a plan
09:26:57 +1
09:27:09 +1
09:27:14 something we can then tell people what they can do to help
09:27:22 yep
09:27:24 there was a couple of folks
09:27:43 #topic HA guide (part 2)
09:27:57 we kind of hashed most of it out over email
09:28:14 but we should document it and circulate it a bit more
09:28:25 #info beekhof has ideas for future of HA guide
09:28:32 #info we should discuss in Barcelona
09:28:32 and of course it ties into the other conversation about the new RH arch
09:28:36 right
09:28:52 which i expect to get grilled about in spain
09:29:02 we can call it a spanish inquisition!
09:29:11 or an Australian BBQ ;-)
09:29:15 no-one expects those
09:29:22 lol. are you guys all heading to barcelona?
09:29:22 hahah, or that
09:29:27 yep
09:29:33 my travel's not confirmed yet
09:29:52 since I didn't get a talk approved this time
09:29:56 doh
09:30:09 beekhof: do you know if RH saw a big reduction in approved talks?
09:30:19 it was a massive drop for SUSE
09:30:19 neither of my 2 got accepted
09:30:27 i dont know about others
09:30:29 very strange
09:30:36 maybe florian doesnt like you anymore?
09:30:40 hah
09:30:45 it's not up to him
09:30:47 if i can get some time, i might try and get started on the docs plans
09:31:27 #topic Barcelona
09:31:38 have you had time to absorb the RH plans? anything you still want to ask or critique?
09:31:47 I think I already mentioned that it was not possible to get any official HA track this time :-/
09:31:53 yeah
09:32:02 which kinda blows
09:32:12 but it sounds like when they split the event in two it will be a lot easier
09:32:21 unclear
09:32:31 Thierry suggested that will be the case, IIRC
09:32:42 i expect that is the intention
09:33:02 #info still no official HA track, hopefully the future event split will fix this though
09:33:18 #topic RH's new generation HA architecture
09:33:34 aka. why beekhof is wrong
09:33:39 can we get the fish bowl?
09:33:48 beekhof: we don't need a dedicated topic to discuss that ;-)
09:34:02 does anyone not know what RH is planning regarding the HA arch?
09:34:11 would be hard to keep up if not :)
09:34:18 beekhof: I think your blog posts are pretty clear
09:34:27 beekhof: although I guess there is more they probably don't cover
09:34:28 maybe not everyone read it
09:34:43 http://blog.clusterlabs.org/blog/2016/next-openstack-ha-arch
09:35:12 in any case, if anyone has questions or concerns... now is your chance
09:35:22 http://blog.clusterlabs.org/blog/2016/composable-openstack-ha
09:35:40 one question is: how are you going to implement the service-level monitoring?
09:36:06 and I just had a crazy idea for you to shoot down in flames
09:36:10 i expect there will be two layers
09:36:40 simple systemd based + more advanced nagios style external monitoring
09:36:46 since I plan to maintain the OCF RAs which have a "monitor" action which should do service-level monitoring (and in the cases where it doesn't, I can fix it)
09:36:55 or sensu or whatever the flavor of the month is
09:37:04 yes, by "service-level" I meant the non-systemd layer
09:37:27 ok
09:37:27 IOW, how will your $nagios_or_similar know how to monitor each service?
09:37:33 yeah, that will all be external
09:37:34 I assume it will need some kind of plugin per service
09:37:40 so here's my crazy idea ...
09:37:49 reuse "monitor" action of the OCF RAs :-)
09:37:56 there will be something that gets called, yes
09:38:03 thats one possibility
09:38:30 of course we might all be in containers by then, so we'd be doing something like http://kubernetes.io/docs/user-guide/production-pods/#liveness-and-readiness-probes-aka-health-checks
09:38:34 if the OCF RA needs to do some extra non-RA stuff to work with your monitoring layer, I'd be more than happy to accommodate it
09:39:15 i dont think there is any great desire to write this stuff, so if it exists in some consumable form i bet it would get reused
09:39:16 yeah I've already been looking at that and stackanetes
09:40:03 a simple service readiness probe could be too naive in some cases
09:40:08 i've not come across that one
09:40:17 aspiers: agreed
09:40:59 so let's agree that in principle, we're aligned on the idea of sharing/reusing code which does service-level monitoring
09:41:08 thats one of the problems the kubernetes proponents will need to find a solution for if they want it adopted
09:41:16 yes
09:41:36 the only real wrinkle, is if your agents expect parameters
09:41:39 if OCF RAs seem to be a suitable home for that code, as the maintainer I'm 100% happy to look after it
09:41:52 thought...
09:41:54 well that should be easy to deal with
09:42:03 simply pass the right environment variables
09:42:11 and i get that this is contrary to everything i've said for 14 years
09:42:22 and if k8s doesn't support that, a wrapper script can set them easily
09:42:29 what if the monitor logic lived in a separate file/script
09:42:44 I'm OK with that too
09:42:58 well the point is that we wouldn't have any... anything would be in the sysconfig file
09:43:11 since systemd doesnt have parameters
09:43:25 sep file would make them easier to consume == less resistance
09:43:30 sure
09:43:36 * beekhof has to run again...
kids bedtime
09:43:38 I'm totally fine with that
09:43:50 i'll try and make it a bit more often
09:43:57 separate files could still potentially live in the o-r-a repo
09:43:57 it == this meeting
09:44:04 yes
09:44:11 beekhof: cool, was great to have you here this time
09:44:52 samP: sorry yes, we should try to book a decent room in advance
09:45:03 samP, haukebruno: you want to discuss anything else?
09:45:27 not from my side :(
09:45:37 ok np
09:45:39 just looking forward to the barcelona summit to see some of you folks
09:45:48 yes I hope I can make it! :-/
09:46:31 aspiers: not from my side
09:46:43 ok then, let's close for today
09:46:51 thanks all and bye for now!
09:47:06 thank you all...
09:47:41 have a nice day all
09:47:55 you too :)
09:47:57 #endmeeting
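
Note on the event filtering discussed at 09:18-09:20: the rough agreement was to filter at the source (client side) and, for now, by event type only. A minimal sketch of that idea follows. The event type names, the InstanceEvent class and the filter_events helper are hypothetical illustrations, not taken from the spec or from masakari-instancemonitor.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical event type names; the spec under review may name them differently.
DEFAULT_FORWARDED_TYPES = frozenset({"UNEXPECTED_STOP", "CRASH", "IO_ERROR"})


@dataclass(frozen=True)
class InstanceEvent:
    """Illustrative event record: which instance, and what happened to it."""
    instance_id: str
    event_type: str


def filter_events(
    events: Iterable[InstanceEvent],
    allowed_types: Iterable[str] = DEFAULT_FORWARDED_TYPES,
) -> Iterator[InstanceEvent]:
    """Yield only events whose type is in the configured allow-list.

    Running this on the monitoring side means uninteresting events are
    dropped before they are ever sent to the receiving service.
    """
    allowed = frozenset(allowed_types)
    for event in events:
        if event.event_type in allowed:
            yield event
```

Keeping the allow-list as a plain set of type names would make it easy to load from a config file, which matches the point that filtering should be configurable rather than hard coded.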
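
Note on the 09:41-09:42 exchange about agents that expect parameters: OCF resource agents take the action as their first argument and read parameters from OCF_RESKEY_* environment variables, so an external monitoring layer could in principle call the "monitor" action through a small wrapper. The sketch below assumes that approach. The run_ocf_monitor helper, its defaults and the example agent path and parameter are hypothetical, and some agents may expect further variables (for example a resource instance name) that are not set here.

```python
import os
import subprocess

OCF_SUCCESS = 0      # OCF exit code: resource is running
OCF_NOT_RUNNING = 7  # OCF exit code: resource is cleanly stopped


def run_ocf_monitor(agent_path, params, ocf_root="/usr/lib/ocf", timeout=30):
    """Run an OCF resource agent's "monitor" action and return its exit code.

    params maps parameter names to values; each is exported to the agent
    as OCF_RESKEY_<name>, which is how OCF agents receive parameters.
    """
    env = dict(os.environ)
    env["OCF_ROOT"] = ocf_root
    for name, value in params.items():
        env["OCF_RESKEY_" + name] = str(value)
    # The action name is passed as the first command-line argument.
    result = subprocess.run([agent_path, "monitor"], env=env, timeout=timeout)
    return result.returncode
```

A caller such as a Nagios-style check or a container health probe could then map the return value onto its own status codes, e.g. treating run_ocf_monitor("/path/to/agent", {"config": "/etc/example.conf"}) == OCF_SUCCESS as healthy (the path and parameter here are placeholders).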