14:03:07 <tosky> #startmeeting sahara
14:03:08 <openstack> Meeting started Thu Oct 18 14:03:07 2018 UTC and is due to finish in 60 minutes. The chair is tosky. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:11 <openstack> The meeting name has been set to 'sahara'
14:03:18 <jeremyfreudberg> o/
14:03:50 <tosky> Telles is on vacation, but still, good to check the status
14:03:59 <jeremyfreudberg> indeed
14:04:00 <tosky> #topic News/Updates
14:04:40 <tosky> I've been working on S3 testing (still, I know, but now there is more working code)
14:05:44 <jeremyfreudberg> i have not had any quality time to work on features recently (unlike last cycle, the features are a bit more complicated). very slowly making progress on health repair (which I have a small discussion point about)
14:05:49 <tosky> apart from that, I've been following the status of the gates and discussed with -infra and -releases about a few issues and changes
14:06:10 <jeremyfreudberg> thanks tosky for holding down the fort, both on the openstack-wide stuff and also on those big patches like s3 testing, doc refactor, etc
14:06:45 <tosky> in the doc refactor one, at some point I had too many windows open with code floating
14:06:48 <tosky> but yeah :)
14:07:50 <tosky> last time I spoke with Telles, the unit tests for the split repositories were passing and he was going to start the real testing
14:08:14 <tosky> I'm writing an email about the planned impact on deployment tools and users
14:08:46 <jeremyfreudberg> indeed, i've seen interesting stuff on telles's github
14:09:00 <jeremyfreudberg> and yes, the email is a good idea
14:09:03 <tosky> when we are done with the split, apart from the bugs that can come from it, we can go full steam with API v2 and Python 3
14:09:15 <tosky> which are the other big things
14:09:46 <jeremyfreudberg> yup
14:11:00 <tosky> I guess we are done with the news - any specific point to discuss? Until I'm out from the S3 pit, I don't have a lot more to add
14:12:11 <tosky> oh, health repair
14:12:18 <tosky> let's go for it, jeremyfreudberg
14:12:24 <jeremyfreudberg> yes, health repair
14:12:26 <tosky> #topic Health repair
14:13:34 <jeremyfreudberg> so, i won't discuss every aspect of health repair, but there's one aspect that i've been grappling with recently
14:13:47 <jeremyfreudberg> as you know (or not), the idea was to base health repair off of the existing health checks mechanism
14:14:02 <tosky> I'm re-reading the minutes from the PTG
14:14:18 <jeremyfreudberg> and in looking at that code, i'm surprised at how much database stuff is involved
14:15:28 <jeremyfreudberg> and from what i can tell, the point of putting health checks in the DB is to make things stateful-- don't start a check when one is already in progress, etc. plus with the checks being (configurably) periodic the db kinda acts as a log
14:15:47 <jeremyfreudberg> anyway, my question for today is
14:16:05 <jeremyfreudberg> is all that DB stuff really necessary for health repair?
14:16:46 <jeremyfreudberg> my sense is, kinda, but not totally
14:17:14 <tosky> don't you see the same need for a synchronization point, so that the same operation doesn't start again?
14:17:24 <tosky> or anyway, do you think it could be implemented differently?
14:17:53 <jeremyfreudberg> short answer- yes to both
14:17:54 <jeremyfreudberg> long answer-
14:20:32 <jeremyfreudberg> because health repair is thought to only be executed by user request (NOT periodic), the synchronization point is easier to pin down. AND, I planned to make the repair modes as idempotent and non-disruptive as possible, so theoretically i don't care if the repair call gets sent twice in quick succession
14:20:53 <jeremyfreudberg> but then again, some kind of locking mechanism seems intuitively necessary
14:21:44 <jeremyfreudberg> not to mention, if there is no lock, then the user could send way-too-many repair requests launching way-too-many subprocesses
14:21:52 <tosky> yep, that's the risk
14:22:02 <tosky> "why it's not working, repair, REPAAAAIR"
14:22:03 <tosky> yeah
14:22:18 <jeremyfreudberg> so, a lock of some kind is necessary, i'm just not convinced that tossing around db state is the right way to go about it
14:22:23 <jeremyfreudberg> not sure how else to do it, though
14:22:45 <tosky> maybe minimizing the use of the DB may be enough
14:23:55 <jeremyfreudberg> i'll see what i can trim out
14:24:02 <jeremyfreudberg> i haven't done much of a deep dive yet
14:24:03 <tosky> or we can have a hard requirement on tooz (which is optional right now, used only for one functionality)
14:24:18 <jeremyfreudberg> yes, there is tooz
14:24:24 <tosky> ... if we can make it work with python3, the better
14:24:51 <jeremyfreudberg> another subtopic about health repair:
14:25:18 <jeremyfreudberg> i wrote this on the story last night
14:25:43 <jeremyfreudberg> that there won't be a direct correspondence between all the existing health checks, and the new health repair modes
14:25:54 <jeremyfreudberg> at least in the basic case
14:26:55 <tosky> uh, what is the story? I didn't get the notification
14:27:07 <tosky> but I should be subscribed to all sahara* notifications
14:27:20 <jeremyfreudberg> the description update doesn't seem to trigger the email
14:27:25 <jeremyfreudberg> https://storyboard.openstack.org/#!/story/2003842
14:27:28 <tosky> oh, ok
14:29:24 <jeremyfreudberg> let me try to remember what i mean by my point, actually
14:29:26 <tosky> what would the main difference be then?
14:29:30 <tosky> yeah, better
14:31:08 <jeremyfreudberg> actually, i disagree now, with what i just said (not sure what i was thinking last night)
14:31:18 <jeremyfreudberg> all of the health checks can have an inverse which is its repair mode
14:31:51 <jeremyfreudberg> with the exception of this check https://github.com/openstack/sahara/blob/master/sahara/service/health/health_check_base.py#L133
14:36:34 <jeremyfreudberg> oh, i think my point from last night was, health repair can eclipse health check
14:36:50 <jeremyfreudberg> as in, we can write MORE health repair modes, beyond what limited checks we have
14:37:28 <jeremyfreudberg> and my other point-- there is a minimal amount of work that needs to be done before the plugin split (the plugin-specific health repair modes need to be able to import the right stuff from core)
14:37:29 <tosky> and then we will have more checks? If we have more repair modes, it means that we can check that something is really broken
14:37:54 <tosky> uh, I didn't check if Telles also considered that
14:38:21 <jeremyfreudberg> yes-- hopefully new repair modes will encourage new checks
14:39:06 <jeremyfreudberg> regarding the split, i think it should only be one new import to cover
14:40:07 <jeremyfreudberg> there would be a new module similar to sahara/service/health/health_check_base.py which has exceptions and the base class, for the plugin-specific repair modes to consume
14:41:50 <jeremyfreudberg> actually, telles did something which i don't understand
14:42:40 <jeremyfreudberg> looking at what he has on github (which may not be accurate), he simply moved health_check_base.py from sahara/service/health to sahara/plugins
14:42:58 <jeremyfreudberg> but he didn't change the imports within that file
14:43:01 <tosky> uh
14:44:01 <jeremyfreudberg> he did fix the import on the, for example, sahara-plugin-ambari side though
14:44:06 <jeremyfreudberg> anyway, i have to signoff in a minute
14:44:51 <tosky> oki, I guess we discussed enough points
14:45:02 <jeremyfreudberg> yes
14:45:08 <jeremyfreudberg> i'll be sure to look further into health repair
14:45:10 <tosky> Telles, when you read this, remember to recheck the rechecks
14:45:17 <tosky> thanks!
14:45:32 <tosky> so if there is nothing else to discuss, we can close it here
14:45:37 <jeremyfreudberg> yup, thanks, let's close
14:45:48 <tosky> see you next week
14:46:00 <jeremyfreudberg> bye
14:46:14 <tosky> (or even before, on the usual channel)
14:46:21 <tosky> #endmeeting
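
Note on the locking question from the Health repair topic: the idea of rejecting a second repair request while one is still running, without keeping state in the DB, could look roughly like the sketch below. This is only an illustration under stated assumptions, not sahara code: `RepairGuard`, `RepairAlreadyRunning`, and the `repair` callable are hypothetical names, and the lock is a plain in-process `threading.Lock`, so it only protects a single process. A multi-process deployment would need a distributed lock instead (tooz was mentioned in the meeting as one option).

```python
import threading


class RepairAlreadyRunning(Exception):
    """Raised when a repair for the same cluster is still in progress."""


class RepairGuard:
    """Hypothetical in-process guard: at most one repair per cluster at a time."""

    def __init__(self):
        self._locks = {}
        # Protects the dict of per-cluster locks itself.
        self._registry_lock = threading.Lock()

    def _lock_for(self, cluster_id):
        with self._registry_lock:
            return self._locks.setdefault(cluster_id, threading.Lock())

    def run(self, cluster_id, repair):
        """Run repair(cluster_id) unless one is already running for that cluster."""
        lock = self._lock_for(cluster_id)
        # Non-blocking acquire: a repeated request is rejected, not queued.
        if not lock.acquire(blocking=False):
            raise RepairAlreadyRunning(cluster_id)
        try:
            return repair(cluster_id)
        finally:
            lock.release()
```

Rejecting rather than queueing the duplicate request matches the concern raised above about a user sending many repair requests in a row; since the repair modes are meant to be idempotent, dropping the extra call should lose nothing.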