#openstack-freezer log

16:05:19 <vannif> #startmeeting 2015-10-29
16:05:19 <freezerBot`> Meeting started Thu Oct 29 16:05:19 2015 UTC and is due to finish in 60 minutes.  The chair is vannif. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:05:20 <freezerBot`> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:05:20 <freezerBot`> The meeting name has been set to '2015_10_29'
16:05:30 <vannif> hello everyone
16:05:58 <marzif_> hello :)
16:06:33 <vannif> shall we begin ?
16:06:37 <marzif_> yep
16:06:57 <vannif> marzif, please start
16:06:59 <marzif_> I've been working on https://review.openstack.org/#/c/238940/ and https://review.openstack.org/#/c/238933/
16:07:35 <marzif_> mainly on improving pbr support and requirements while still having the agent work on windows too (also with m3m0's code)
16:08:05 <marzif_> so now we are installing the pre-requirements before the setup() call in setup.py
16:08:16 <marzif_> as advised in the pbr doc
16:08:38 <marzif_> well, more than making it work on windows, that code allows the agent to be installed on windows
16:08:51 <marzif_> what really makes the agent work is m3m0's code
16:09:04 <marzif_> also I've been doing a few code reviews
16:09:30 <marzif_> I hope we can finish the dsvm integration job by tomorrow and restart working on the block-based incremental backups
16:09:56 <vannif> so, we're introducing the freezer-agent in place of the freezerc script
16:10:10 <marzif_> one nice thing: we also submitted for the big tent here: https://review.openstack.org/#/c/239668/
16:10:14 <vannif> that is simply a kind of alias, right ?
16:10:20 <marzif_> we are using both now
16:10:22 <marzif_> they are aliases
16:10:23 <marzif_> yes
16:11:10 <vannif> but in the future the plan is to have the freezer-agent to be incompatible from the point of view of the command line options
16:11:28 <marzif_> I got disconnected sorry...
16:11:40 <marzif_> the plan would be to remove the freezerc
16:11:54 <marzif_> and use only the freezer-agent with config files
16:11:58 <marzif_> we need to write a blueprint for that
16:12:05 <marzif_> or add a spec
16:12:35 <marzif_> I think we can have a point of discussion on the wiki
16:12:36 <marzif_> https://wiki.openstack.org/wiki/Freezer
16:12:50 <marzif_> all are encouraged to improve/modify that
16:13:20 <marzif_> and also we have a manifesto available here https://etherpad.openstack.org/p/freezer-manifesto please improve it as you see fit
16:14:29 <vannif> good point
16:14:41 <marzif_> that's pretty much all the tasks I've been working on in the last week
16:15:00 <vannif> we definitely need to close the dsvm integration jobs, too
16:15:12 <marzif_> yep
16:16:26 <vannif> good. thank you
16:16:31 <marzif_> : )
16:16:41 <vannif> reldan ?
16:16:44 <marzif_> ah... one last thing...
16:17:04 <marzif_> Saturday there's a new release planned of freezer
16:17:08 <marzif_> on pypi
16:17:18 <marzif_> and branch on git
16:17:42 <marzif_> so we should try to send our changes in by EOD tomorrow
16:18:08 <m3m0> is the code stable enough for pypi?
16:18:11 <vannif> yes. that's why we are all involved in code reviews :)
16:18:35 <reldan> Ok, my turn )
16:18:44 <reldan> Chunk size for swift and fix for nova backups are merged now. I’m working on cinder backups and going to send a pull request today or tomorrow.
16:18:50 <marzif_> well, the only way to know... is by adding the integration test job
16:18:51 <marzif_> :)
16:20:05 <reldan> Problems: 1) We need to understand how to implement nova backups with a bootable disk 2) We need to have a discussion about the new version of the config (for parallel backups) in the client
16:20:23 <vannif> We have some manual integration tests. They don't cover *all* the features completely, but they do cover the most important ones. automating the tests is the current effort on the testing side
16:20:33 <vannif> sorry reldan, please go on
16:20:42 <reldan> It’s ok )
16:21:33 <marzif_> reldan, feel free to create a subsection in the wiki to propose that if you want
16:22:01 <vannif> yes reldan. As marzif suggested, we can also start and share the discussion on the wiki, with an example config file for instance
16:22:32 <reldan> Ok. I can create two blueprints and send them for your review.
16:22:56 <vannif> when sections and options make sense, we can write some tests and then continue with the implementation ;)
16:23:24 <reldan> And I know that we would like to migrate our config to oslo.config
16:23:49 <reldan> So our changes will be on top of the oslo.config integration?
16:24:39 <reldan> Ok, I can describe everything in my blueprints
16:24:51 <vannif> yes. the idea is that. and also split the freezer-scheduler from the freezerclient, which should use cliff and act only as an interface to the freezer-api
16:25:30 <reldan> In this case we should understand “depends on” relations between these improvements
16:25:58 <marzif_> +1
16:26:01 <reldan> But you are right, let me write everything in blueprint. I can also create issues or question list there
16:26:39 <reldan> I also should write integration tests on nova and cinder backup
16:26:59 <vannif> maybe we can work on the freezer-agent to fully support any new feature, and leave the freezerc with support for a reduced set of features
16:27:32 <vannif> I think that's the point of having both freezerc and freezer-agent around for some time
16:27:58 <reldan> So action plan for me: 1) Cinder backup fixes 2) Blueprints for nova bootable disk backup problem and parallel backup config 3) Integration tests for nova and cinder backup
16:28:25 <reldan> Let’s discuss. I actually have no strong opinion on how to make the configuration changes
16:28:48 <reldan> that seems to be all from my side
16:29:11 <vannif> thanks reldan. great job, btw
16:29:19 <reldan> Thank you vannif !
16:29:51 <vannif> on my side.
16:30:35 <vannif> nothing particularly relevant. code reviews, a (kind of) fix for the initialization of the elasticsearch index to support replicas
16:30:47 <vannif> https://review.openstack.org/#/c/239880/
16:32:08 <marzif_> vannif, <subliminal/*devstack integration gate job*/subliminal>
16:32:10 <marzif_> :)
16:32:21 <vannif> new code to support editing of actions in the api is not complete. It adds knowledge of jobs to the api, which has been treating jobs as opaque documents until now
16:32:23 <vannif> ahahhah
16:32:25 <vannif> yes
16:32:30 <vannif> I have to focus more on the dsvm integration tests.
16:32:52 <vannif> sorry guys .. I have to switch to integ tests ... now
16:32:53 <vannif> ahah
16:32:56 <vannif> I'm joking
16:32:58 <marzif_> lol
16:33:09 <marzif_> not a bad joke :)
16:33:18 <vannif> well. that's all for me.
16:33:31 <vannif> m3m0 ?
16:33:45 <m3m0> sup
16:33:58 <m3m0> I've been working on the ui mostly
16:34:48 <m3m0> on internal improvements to reduce the amount of code and have better resiliency
16:35:29 <m3m0> I'm implementing react js in some of the modal windows to improve usability but this is an experiment (because I'm learning react)
16:36:06 <m3m0> on the other side of business the freezer-agent on windows is working
16:36:14 <m3m0> with all the new changes
16:36:26 <m3m0> and I'm currently implementing the scheduler as well
16:36:35 <m3m0> that's it for me
16:36:42 <vannif> are the fs snapshots being managed correctly ?
16:36:56 <vannif> in case of errors, in particular
16:36:58 <marzif_> good question....
16:38:58 <m3m0> I've been trying to recreate the error that I had where the snapshot wasn't removed when the agent failed
16:39:46 <m3m0> but in the latest experiments that I did the snapshot was removed correctly
16:40:10 <m3m0> and I had to change the default value for snapshot in the arguments to be False
16:40:50 <reldan> but you are right m3m0, we can always imagine a forced closing of freezer in the middle of a backup
16:40:59 <reldan> And it can, for example, leave mysql blocked
16:41:33 <m3m0> the try: finally: should be enough for these cases
16:41:39 <reldan> we have done a flush and stopped transactions, and then for some reason our process is killed
16:42:04 <reldan> Yes, but not in case of process killing
16:43:05 <m3m0> good point
16:43:13 <vannif> the unlock of the mysql is in a "finally" statement
16:43:15 <vannif> yes
16:43:28 <reldan> I know that the probability isn’t very high
16:43:49 <vannif> even the finally statement needs to be carefully analyzed. pitfalls can show up unexpectedly
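A minimal sketch of the try/finally point discussed above, in Python. FakeDatabase, locked and backup are hypothetical names used only for illustration; this is not freezer code. It shows the unlock running when the backup raises, plus an optional SIGTERM handler that turns a polite termination into SystemExit so that the finally block still runs. Nothing here helps against SIGKILL or the kernel OOM killer, which is the case reldan raises.

```python
import signal
import sys
from contextlib import contextmanager


class FakeDatabase:
    """Hypothetical stand-in for a real connection; it only tracks lock state."""

    def __init__(self):
        self.locked = False

    def lock(self):
        self.locked = True

    def unlock(self):
        self.locked = False


@contextmanager
def locked(db):
    """Hold the lock for the duration of the with-block, releasing it on any exit."""
    db.lock()
    try:
        yield db
    finally:
        # Runs on normal return, on exceptions and on SystemExit,
        # but never if the process is killed outright (SIGKILL, OOM killer).
        db.unlock()


def exit_on_sigterm(signum, frame):
    """Convert SIGTERM into SystemExit so pending finally blocks get to run."""
    sys.exit(128 + signum)


def install_sigterm_handler():
    # Must be called from the main thread; SIGKILL cannot be handled this way.
    signal.signal(signal.SIGTERM, exit_on_sigterm)


def backup(db, fail=False):
    """Pretend backup: optionally fails while the lock is held."""
    with locked(db):
        if fail:
            raise RuntimeError("backup failed mid-run")
        return "done"
```

Calling backup(db, fail=True) raises RuntimeError and still leaves db.locked set to False; install_sigterm_handler() extends the same guarantee to a plain kill, but not to kill -9.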
16:43:55 <reldan> And it can be solved only by having some sort of watchdog
16:44:21 <reldan> That can detect a failure of freezerc
16:44:22 <m3m0> does anyone know any sql server "expert"
16:44:43 <reldan> you?
16:44:54 <marzif_> yes, one guy called m3m0
16:45:05 <m3m0> hahaha I'm just a very handsome guy
16:45:13 <reldan> :)
16:45:15 <vannif> well ... process killing ... we can make the process "difficult" to be killed. but in the end, if the user intentionally disrupts the process ...
16:45:37 <reldan> It can take too much memory and be killed by os
16:45:57 <reldan> let’s say we have wrong chunk_size parameter
16:46:04 <m3m0> or the process can be unresponsive
16:46:29 <m3m0> that happens a lot on windows at least :P
16:47:12 <vannif> so, the idea could be to have a watchdog process to handle the killing and unlocking the db ?
16:47:40 <reldan> Yes, but we should have different watchdogs for windows and linux ))
16:47:46 <m3m0> I don't like the idea
16:47:55 <vannif> I think we can delay that to a later stage, only if the problem arises
16:48:51 <vannif> I'm used to watchdogs in electronics, but at the moment, I'd say not to mess with convoluted solutions
16:49:12 <m3m0> we are adding more and more complexity in each iteration
16:49:22 <vannif> yes
16:49:30 <reldan> Ok ) but we still can have garbage after the run
16:49:34 <vannif> and then ... who watches the watchdog ? :)
16:49:36 <reldan> lvm snapshot
16:49:45 <reldan> we can place it in initd
16:49:50 <m3m0> a watchcat
16:49:53 <reldan> ))
16:49:56 <reldan> Ok, sorry
16:50:34 <vannif> leaving an lvm snapshot around is not a tremendous problem.
16:50:47 <vannif> forgetting to unlock the db is much worse :)
16:51:05 <reldan> You know
16:51:10 <reldan> we can have something like log
16:51:16 <reldan> Or distributed log
16:51:32 <reldan> So we can see - ok we have server5 and it blocked the database
16:51:43 <reldan> and hasn’t responded for 10 minutes already
16:52:05 <reldan> so probably it should trigger an alert
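A rough sketch of the check reldan describes, in Python, assuming each host that holds the db lock records a last-seen timestamp somewhere shared. The function name, the dict layout and the 10 minute default are illustrative assumptions; no such log exists in freezer according to this discussion.

```python
from datetime import datetime, timedelta


def stale_lock_holders(last_seen_by_host, now, max_silence=timedelta(minutes=10)):
    """Return the hosts holding the lock that have been silent longer than max_silence.

    last_seen_by_host maps a host name to the datetime of its latest heartbeat
    (hypothetical layout, standing in for the shared log).
    """
    return sorted(
        host
        for host, last_seen in last_seen_by_host.items()
        if now - last_seen > max_silence
    )


# Example: server5 locked the database and went quiet 11 minutes ago.
now = datetime(2015, 10, 29, 16, 52)
records = {
    "server5": now - timedelta(minutes=11),
    "server2": now - timedelta(minutes=1),
}
hosts_to_alert = stale_lock_holders(records, now)
```

Whatever reads hosts_to_alert would still need its own way to raise the alert, which is part of the extra complexity m3m0 objects to.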
16:52:10 <m3m0> can we do the blocking in a commit rollback scenario?
16:52:38 <marzif_> in linux, with pymysql the db is unlocked automatically when the process exits, so if the python process crashes the db is unlocked
16:52:49 <marzif_> but we need to test that more I think
16:53:00 <marzif_> that should work the same also for windows as we are using the same module
16:55:03 <vannif> yes. let's keep it simple for now, improving testing
16:55:48 <vannif> let's say the idea of "heartbeats" and watchdogs is on-hold.
16:55:52 <vannif> is that all m3m0 ?
16:56:47 <m3m0> yep, that's all from my side
16:56:51 <vannif> thanks
16:57:25 <vannif> federico, are you there ? do you want to say something on your side ?
16:57:32 <marzif_> federico3, ^^
16:58:01 <vannif> like lupinIII, sorry :)
17:03:59 <vannif> well. I think that's all
17:04:09 <vannif> thank you everyone
17:04:22 <vannif> #endmeeting