07:01:00 <yoctozepto> #startmeeting masakari
07:01:00 <openstack> Meeting started Tue Sep 15 07:01:00 2020 UTC and is due to finish in 60 minutes.  The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:01:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
07:01:03 <openstack> The meeting name has been set to 'masakari'
07:01:06 <yoctozepto> #topic Roll-call
07:01:18 <yoctozepto> o/
07:02:01 <jopdorp> here we go
07:02:02 <ykado> o/
07:02:06 <suzhengwei> hi
07:02:07 <noonedeadpunk> o/
07:02:17 <suzhengwei> o/
07:02:55 <jopdorp> o/
07:03:39 <yoctozepto> #topic Agenda
07:03:43 <yoctozepto> * Roll-call
07:03:43 <yoctozepto> * Agenda
07:03:43 <yoctozepto> * Announcements
07:03:43 <yoctozepto> ** gates have been fixed and are running Focal now
07:03:43 <yoctozepto> ** python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:03:43 <yoctozepto> ** we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:03:44 <yoctozepto> * Review action items from the last meeting
07:03:44 <yoctozepto> * CI status
07:03:45 <yoctozepto> * Critical Bugs and Patches
07:03:45 <yoctozepto> * Victoria release planning
07:03:46 <yoctozepto> * Open discussion
07:03:54 <yoctozepto> #topic Announcements
07:04:03 <yoctozepto> #info gates have been fixed and are running Focal now
07:04:30 <yoctozepto> a part of the Victoria community goals was to migrate the testing to Focal which runs Py38
07:04:38 <yoctozepto> it has happened now
07:05:12 <yoctozepto> I had to self-merge a bunch of patches (due to deadlines) to make it happen but we are there
07:05:42 <jopdorp> good
07:05:45 <yoctozepto> all merged patches are visible in gerrit so you can post-review them still and raise any issues; there should really be none though due to the kind of those patches
07:06:09 <jopdorp> I remember a patch that removed py37 tests
07:06:10 <yoctozepto> many thanks to gmann for helping with the migration and driving the goal openstack-wise
07:06:20 <yoctozepto> jopdorp: yes
07:06:30 <yoctozepto> do note py37 is *not* a target platform for openstack now
07:06:41 <jopdorp> cool
07:07:14 <yoctozepto> https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:07:23 <jopdorp> thanks
07:07:41 <yoctozepto> the spoken-of migration has happened so py37 is no longer relevant
07:07:57 <yoctozepto> it's unlikely it breaks now that we test py36 and py38 but who knows :-)
07:08:05 <jopdorp> agreed
07:08:10 <yoctozepto> #info https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:08:31 <yoctozepto> (in case you are wondering why I'm doing the # stuff - it's to get these entries into the summary)
07:08:50 <jopdorp> I was indeed wondering
07:09:02 <yoctozepto> #info python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:09:23 <yoctozepto> python-masakariclient has a stable/victoria branch now
07:09:31 <yoctozepto> this week all client libraries got their releases
07:09:54 <yoctozepto> and there should be no feature changes whatsoever in them now that they are officially "stable"
07:10:13 <yoctozepto> #info we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:10:31 <noonedeadpunk> that makes me sad kind of
07:10:42 <yoctozepto> noonedeadpunk: me too, unfortunately
07:10:49 <suzhengwei> me, too
07:11:03 <jopdorp> this one didn't make it https://review.opendev.org/#/c/740777/
07:11:04 <yoctozepto> don't worry, we'll make wallaby the best masakari release ever :-)
07:11:10 <jopdorp> haha
07:11:13 <jopdorp> yes
07:11:21 <yoctozepto> #info https://review.opendev.org/740777
07:11:50 <yoctozepto> as a matter of fact, due to low gate activity and low complexity of that patch, I suggest we put it in the exceptions and just merge it
07:11:57 <yoctozepto> then we have at least one feature in victoria
07:12:24 <yoctozepto> suzhengwei: what do you think?
07:12:29 <yoctozepto> jopdorp has already +2
07:12:37 <suzhengwei> agree
07:12:44 <yoctozepto> I find it uncomfortable to +2 my own proposals
07:12:48 <yoctozepto> unless they fix the gates
07:13:15 <yoctozepto> suzhengwei: thanks, please review and hopefully leave a +2 today/tomorrow :-)
07:13:26 <jopdorp> nice
07:13:40 <yoctozepto> #info Review action items from the last meeting
07:13:48 <yoctozepto> aaand there were none!
07:13:56 <jopdorp> lol
07:13:58 <yoctozepto> oopsie
07:14:01 <yoctozepto> #undo
07:14:02 <openstack> Removing item from minutes: #info Review action items from the last meeting
07:14:10 <yoctozepto> #topic Review action items from the last meeting
07:14:21 <yoctozepto> #info there were none
07:14:32 <yoctozepto> #topic CI status
07:14:50 <yoctozepto> the master CI is green, I did not check others
07:15:03 <yoctozepto> master+Victoria considering the client has already branched
07:15:38 <yoctozepto> we could use some better organization in this regard, I'll try to spin up some CI glory page like we have for Kolla
07:15:59 <yoctozepto> #action yoctozepto to bring some visibility to Masakari CI status
07:16:09 <jopdorp> I don't really know where to look for the ci stuff
07:16:33 <jopdorp> only the results in gerrit
07:16:42 <jopdorp> but I don't know where they get configured
07:16:59 <jopdorp> or where the results for the main branches are visible
07:17:14 <yoctozepto> no problem, I know most of you are new to this so that's why I just assigned this task to myself
07:17:25 <yoctozepto> in general one can look at https://zuul.opendev.org/t/openstack/builds
07:17:34 <yoctozepto> with filters obviously
07:17:38 <yoctozepto> or https://zuul.opendev.org/t/openstack/status
07:17:41 <yoctozepto> for currently running
07:17:58 <jopdorp> is that also where they are configured?
07:18:03 <yoctozepto> I don't remember if masakari runs relevant periodics at the moment
07:18:26 <yoctozepto> jopdorp: all CI stuff is configured in the repo nowadays
07:18:34 <yoctozepto> the CI/CD system that openstack uses is Zuul
07:18:52 <yoctozepto> in this setup we are considered users of zuul so these docs hold: https://zuul-ci.org/docs/zuul/reference/user.html
07:19:06 <yoctozepto> it's usually either zuul.yaml or zuul.d with some more yamls inside
07:19:20 <yoctozepto> it can be a hidden file/dir so .zuul.yaml .zuul.d respectively
07:19:45 <yoctozepto> zuul is driven using yaml and ansible (which still uses yaml)
07:20:07 <yoctozepto> for all the other details please just review the files in repo and the docs :-)
07:20:12 <jopdorp> thanks, I'll dive into that
07:20:50 <yoctozepto> #topic Critical Bugs and Patches
07:21:04 <yoctozepto> #info none so far
07:21:10 <ykado> hi, I wanted to raise about this review. https://review.opendev.org/#/c/720623/
07:21:15 <yoctozepto> but it could be that they have not been triaged
07:21:37 <ykado> sorry, probably not the good timing yet?
07:22:09 <yoctozepto> ykado: well, it's a fix to some bug but not necessarily critical I guess? let's postpone for the open discussion
07:22:19 <ykado> ok
07:22:38 <yoctozepto> if you know of breaking/fugly bugs then please report/triage them
07:22:51 <jopdorp> we encountered something that I'm not entirely sure is a masakari bug
07:23:12 <yoctozepto> please speak up
07:23:20 <jopdorp> but we weren't able yet to get failovers of instances with LUKS encrypted volumes to work
07:23:36 <jopdorp> they get a keymanager error
07:23:42 <yoctozepto> hmm, that does not sound like something masakari could go wrong about
07:23:45 <jopdorp> barbican right related
07:23:52 <jopdorp> rights
07:23:58 <yoctozepto> masakari essentially runs evacuations against instances
07:24:09 <yoctozepto> try plain evacuation and it might be failing
07:24:20 <jopdorp> yeah
07:24:27 <yoctozepto> I *think* I saw someone reporting this issue against cinder+barbican
07:24:27 <jopdorp> I think it's more configuration related
07:24:37 <yoctozepto> could
07:24:51 <yoctozepto> well then, let's not wander offtopic too much :-)
07:24:53 <jopdorp> probably the place would be @openstack-kolla
07:24:55 <jopdorp> #
07:25:08 <yoctozepto> jopdorp: yeah
07:25:11 <yoctozepto> #topic Victoria release planning
07:25:45 <yoctozepto> we already know it's frozen (freezing? :-) ) and we can only really squeeze that one patch of mine I mentioned
07:25:51 <yoctozepto> (plus obviously any bug fixes)
07:26:05 <yoctozepto> (noonedeadpunk triggered)
07:26:19 <yoctozepto> next week is RC1
07:26:29 <yoctozepto> so all the other repos will branch stable/victoria as well
07:26:45 <yoctozepto> RC1 is R-3
07:26:57 <yoctozepto> so then it's a matter of 3 weeks to polish eventual issues
07:27:31 <yoctozepto> #topic Open discussion
07:27:42 <yoctozepto> ykado: now it's the time
07:27:47 <yoctozepto> what about that commit
07:27:52 <ykado> yoctozepto: thanks
07:28:02 <yoctozepto> https://review.opendev.org/720623
07:28:15 <ykado> I was wondering how this can progress.
07:28:53 <yoctozepto> the Radosław guy is me so my opinion on how that should progress is in that comment there
07:29:29 <yoctozepto> the "better design" part is surely about wallaby now
07:29:43 <yoctozepto> but the rest holds
07:30:25 <yoctozepto> suzhengwei: could you comment on that?
07:30:49 <yoctozepto> the part that needs dealing is "I guess we could still mix the two ideas and check on init while considering the timeout"
07:31:04 <suzhengwei> the current design is the simplest
07:31:21 <yoctozepto> so it's actually a mix of your (suzhengwei) and tpatil's ideas
07:32:12 <suzhengwei> no, I didn't get tpatil's thought.
07:32:38 <ykado> I see. it is true, that there is no real reproducible way, if I understand it correctly.
07:32:38 <ykado> I only could reproduce this by forcefully powering off all the compute nodes or by stopping the masakari-engine services
07:32:38 <ykado> however, without this patch there is no way to recover the compute-nodes that got resolved, unless you update the database manually
07:33:26 <suzhengwei> yes, it is a big use problem for product.
07:33:49 <yoctozepto> hmm, maybe what we need is an easier but manual way to achieve that
07:34:13 <yoctozepto> I'm worried tpatil is right that this could be too aggressive and result in more masakari surprises
07:34:46 <yoctozepto> i.e. simple but backstabbing :-)
07:35:56 <suzhengwei> It gives an expiry time for the user to configure, and that looks reasonable.
07:36:38 <ykado> I agree. sorry, I'm still new to Masakari. but what are the potential issues related to "running" statuses if this gets removed (although the default timeout value is quite long, as suzhengwei mentioned)?
07:36:55 <suzhengwei> If it can't recover one failed host in a short time, the HA is useless.
07:38:25 <yoctozepto> suzhengwei: true that
07:38:27 <suzhengwei> I think to turn it into failure is OK
07:38:58 <yoctozepto> one thing is some notifications are host-level so for a large host this could take a while
07:40:13 <suzhengwei> so I leave the expiry time configurable.
07:40:58 <yoctozepto> aye, it's set to 24 hours
07:41:01 <yoctozepto> by default
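The expiry idea under discussion (a notification left in the running status is judged by its generated_time against a configurable expiry time, 24 hours by default, and then turned into a failure) can be sketched as follows. This is only a minimal illustration and not Masakari's actual code: the function name `expire_stale_notifications`, the dict layout, and the status strings are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed default, mirroring the "24 hours by default" mentioned in the discussion.
DEFAULT_EXPIRY = timedelta(hours=24)


def expire_stale_notifications(notifications, now, expiry=DEFAULT_EXPIRY):
    """Mark 'running' notifications older than `expiry` as 'failed'.

    `notifications` is a list of dicts with hypothetical keys
    'status' and 'generated_time' (a timezone-aware datetime).
    Returns the list of notifications that were changed.
    """
    expired = []
    for notification in notifications:
        is_running = notification["status"] == "running"
        is_too_old = now - notification["generated_time"] > expiry
        if is_running and is_too_old:
            notification["status"] = "failed"
            expired.append(notification)
    return expired


if __name__ == "__main__":
    now = datetime(2020, 9, 15, 8, 0, tzinfo=timezone.utc)
    items = [
        {"id": 1, "status": "running", "generated_time": now - timedelta(hours=25)},
        {"id": 2, "status": "running", "generated_time": now - timedelta(hours=1)},
    ]
    changed = expire_stale_notifications(items, now)
    print([n["id"] for n in changed])
```

Only notifications that are both still running and older than the expiry are touched, so a long host-level recovery is left alone as long as it stays inside the configured window.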
07:42:26 <yoctozepto> ok, there is one edge case that this deals with because of RUNNING and generated_time
07:42:46 <yoctozepto> it could be that the engine picks up a notification to run and self-sabotages itself
07:43:06 <yoctozepto> imagine a situation where the engine was down too long
07:43:36 <yoctozepto> or maybe not
07:43:44 <yoctozepto> because rpc call will surely expire by this point
07:44:33 <suzhengwei> I think it is a controller node issue.
07:44:47 <suzhengwei> controller HA issue.
07:45:30 <yoctozepto> yeah, masakari does not do a great job of self-HA
07:45:35 <suzhengwei> other services also suffer
07:46:32 <yoctozepto> yeah, but it's no consolation considering masakari is THE HA project :D
07:48:07 <suzhengwei> controller HA and instance/compute HA are different issues. They have their own solutions.
07:49:33 <yoctozepto> true that but still sad
07:49:42 <suzhengwei> we can't avoid all controller HA problem influence.
07:50:47 <yoctozepto> well, we could mitigate more though but it needs some redesign to happen
07:51:03 <yoctozepto> your approach seems to be dealing with the reported issue
07:51:23 <suzhengwei> Doing in a better way is better than doing nothing.
07:53:19 <suzhengwei> https://review.opendev.org/#/c/732477/
07:54:11 <suzhengwei> this is a long term spec, it gives a solution-degrade retry.
07:56:10 <yoctozepto> looks promising
07:57:03 <yoctozepto> ok, I'll re-review suzhengwei's patch; I just need to delve into the masakari code more to be more confident about it
07:57:11 <yoctozepto> anyone else up to the review task?
07:58:42 <yoctozepto> oh my, I have completely forgotten - the virtual PTG is coming - do we want a session for masakari? do you have any time preferences? please let me know via mail - I'll spin up a thread on openstack-discuss
07:58:48 <yoctozepto> I hope you are all subscribed
07:59:14 <yoctozepto> #action yoctozepto to spin up a Masakari Wallaby vPTG thread on openstack-discuss mailing list
07:59:45 <yoctozepto> please suzhengwei remember to review the only-feature-patch-that-we-can-get-it
07:59:47 <yoctozepto> thank you
07:59:52 <yoctozepto> and thank you all for attending
07:59:58 <ykado> thank you!
08:00:03 <yoctozepto> #endmeeting