17:08:21 <aspiers> #startmeeting self-healing
17:08:22 <openstack> Meeting started Wed Jun 5 17:08:21 2019 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:08:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:08:25 <openstack> The meeting name has been set to 'self_healing'
17:09:13 <aspiers> So this morning witek mentioned some ongoing discussions around billing, and the idea that instrumenting service code in order to provide metrics might work better than black-box monitoring for that
17:09:27 <aspiers> which ties in with https://storyboard.openstack.org/#!/story/2005632
17:09:45 <aspiers> #topic exporting metrics from services
17:10:03 <aspiers> BTW we seem to have a duplicate story in storyboard for this I think?
17:10:15 <aspiers> https://storyboard.openstack.org/#!/story/2005640
17:10:37 <aspiers> seem to remember some weirdness with StoryBoard when we were submitting stories recently
17:11:10 <ekcs> oh weird. yea I may have created a duplicate because of the weirdness.
17:11:22 <ekcs> I guess we should delete one?
17:11:33 <aspiers> yeah, https://storyboard.openstack.org/#!/story/2005632 has one fewer task
17:11:50 <ekcs> ok I’ll delete that one.
17:11:54 <aspiers> thanks
17:12:37 <aspiers> not much more to say on that right now except link to this morning's minutes
17:12:45 <aspiers> #link http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-06-05-09.05.html this morning's minutes
17:13:06 <aspiers> #topic heat + octavia + aodh
17:13:12 <ekcs> great. yea I read up on the morning meeting. sounds like there isn’t great support just yet, but great thing that witek is working on it.
17:13:22 <aspiers> so this popped up on the mailing list:
17:13:37 <aspiers> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006582.html demo of app auto-healing via heat+octavia+aodh
17:13:52 <aspiers> Didn't get a response though
17:14:22 <aspiers> We can either keep chasing or try to document at least a skeleton for it ourselves
17:14:25 <aspiers> #action aspiers to create a story for documenting that use case
17:15:57 <ekcs> got it. yea first step maybe simply to link to that video in a skeletal doc. I can take a stab at that.
17:16:02 <aspiers> I'll finish that after the meeting
17:16:10 <aspiers> I mean, finish creating the story
17:16:18 <aspiers> That would be awesome if you could kick it off
17:16:24 <ekcs> yup.
17:16:26 <aspiers> We can totally merge a skeleton and flesh it out later
17:16:39 <aspiers> Main thing is promoting the discoverability / awareness
17:16:48 <aspiers> If people are aware and they need more details, they'll probably ask for them
17:17:06 <aspiers> #topic automated testing
17:17:08 <ekcs> sounds good
17:17:12 <aspiers> This old chestnut :)
17:17:34 <aspiers> So we *may* have an intern doing a masters thesis on this topic
17:17:44 <aspiers> in which case we could expect to see some progress
17:17:47 <aspiers> but nothing guaranteed yet
17:17:55 <aspiers> fingers crossed!
17:18:27 <ekcs> oh very nice! I also see that ricolin started some basic tempest setup.
17:18:50 <aspiers> Yup. IIRC it's still marked WIP so not sure if he needs any help with that
17:19:16 <ricolin> aspiers, ekcs, yes, it's working already but I'm more working on how to make the test scenario test more stable
17:19:26 <aspiers> ricolin: cool!
17:19:43 <aspiers> #link https://storyboard.openstack.org/#!/story/2005830 New story for documenting Heat+Octavia+Aodh
17:19:58 <aspiers> ricolin: Let us know if you need any help
17:20:08 <ekcs> awesomeness
17:20:16 <aspiers> I think that was all I had for now
17:20:25 <aspiers> #topic AOB
17:20:26 <ricolin> the self-healing scenario is very unstable in https://review.opendev.org/656070 try to figure out why
17:20:34 <aspiers> ah OK
17:20:38 <aspiers> anything else?
17:20:52 * aspiers takes a look at that review
17:21:45 <ekcs> ricolin: are these similar to tests already being run on heat repos?
17:21:45 <aspiers> heat_tempest_plugin.common.exceptions.TimeoutException: Request timed out
17:21:59 <aspiers> Details: Stack SelfHealingTest-243821469/c9e222f4-e0f0-4cbf-ba58-dea30d2d6a08 failed to reach UPDATE_COMPLETE status within the required time (1200 s).
17:22:15 <aspiers> #topic heat self-healing tests
17:22:34 <ekcs> knowing what’s new existing heat tests may help us diagnose.
17:22:41 <aspiers> true
17:23:24 <ricolin> the time out is when the healing process didn't start in any reason
17:23:41 <aspiers> OK
17:24:03 <aspiers> that's beyond my familiarity right now
17:24:31 <ricolin> Heat should play better role during entire process and help to make sure all component works well
17:24:49 <ricolin> and reduce the unstable cases
17:25:02 <aspiers> do you know why it didn't start?
17:25:46 <ricolin> I think I got some idea
17:26:04 <ricolin> but since next week is part of my wedding ceremony, I won't be that available before 6/15
17:26:15 <aspiers> Ah! No problem, enjoy! :-D
17:26:30 <ekcs> oh wow congrats!
17:27:02 <ricolin> and the rest part happen in 11/17 so it's going to be a very long years for me!lol
17:27:08 <ricolin> ekcs, aspiers thx!
17:27:14 <aspiers> haha
17:27:35 <aspiers> alright
17:27:44 <aspiers> anything else anyone want to discuss?
17:27:57 <ricolin> aspiers, in short, I think that test case fail because Heat didn't make sure the Mistral workflow is up and running stable before we assume next step
17:28:08 <aspiers> ahah, I see
17:28:24 <ricolin> I will look into that and hope I can bring some good knews
17:28:31 <ricolin> knews/news
17:28:31 <aspiers> perfect
17:28:44 <ekcs> great!
17:28:51 <ricolin> Once that test is stable, the rest gate job setting will be easy
17:29:10 <ricolin> since all required patch is already there
17:29:25 <aspiers> nice
17:29:46 <aspiers> I guess we need a short doc explaining it too
17:31:22 <ekcs> not a discussion topic per se, but I’ve been wavering in my personal priority between identifying and supporting new use cases vs documenting existing use cases. I think I settled on documenting existing as higher priority at this stage of the sig.
17:31:45 <aspiers> personally I think either is fine
17:31:55 <aspiers> Whatever you are more excited about ;)
17:32:06 <ekcs> = )
17:32:34 <aspiers> Any small contributions are a lot better than nothing :)
17:32:54 <ekcs> yup
17:33:18 <aspiers> We're all busy with other stuff, so IMO there's no problem at all with being selective and time-boxing SIG work
17:34:00 <aspiers> Alright, sounds like we're done for today?
17:34:17 <ekcs> yup
17:34:28 <aspiers> cool
17:34:34 <aspiers> thanks, and catch you soon!
17:35:00 <ekcs> yup later guys! have a great week!
17:35:28 <aspiers> o/
17:35:30 <aspiers> #endmeeting