07:01:00 <yoctozepto> #startmeeting masakari
07:01:00 <openstack> Meeting started Tue Sep 15 07:01:00 2020 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:01:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
07:01:03 <openstack> The meeting name has been set to 'masakari'
07:01:06 <yoctozepto> #topic Roll-call
07:01:18 <yoctozepto> o/
07:02:01 <jopdorp> here we go
07:02:02 <ykado> o/
07:02:06 <suzhengwei> hi
07:02:07 <noonedeadpunk> o/
07:02:17 <suzhengwei> o/
07:02:55 <jopdorp> o/
07:03:39 <yoctozepto> #topic Agenda
07:03:43 <yoctozepto> * Roll-call
07:03:43 <yoctozepto> * Agenda
07:03:43 <yoctozepto> * Announcements
07:03:43 <yoctozepto> ** gates have been fixed and are running Focal now
07:03:43 <yoctozepto> ** python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:03:43 <yoctozepto> ** we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:03:44 <yoctozepto> * Review action items from the last meeting
07:03:44 <yoctozepto> * CI status
07:03:45 <yoctozepto> * Critical Bugs and Patches
07:03:45 <yoctozepto> * Victoria release planning
07:03:46 <yoctozepto> * Open discussion
07:03:54 <yoctozepto> #topic Announcements
07:04:03 <yoctozepto> #info gates have been fixed and are running Focal now
07:04:30 <yoctozepto> a part of the Victoria community goals was to migrate the testing to Focal which runs Py38
07:04:38 <yoctozepto> it has happened now
07:05:12 <yoctozepto> I had to self-merge a bunch of patches (due to deadlines) to make it happen but we are there
07:05:42 <jopdorp> good
07:05:45 <yoctozepto> all merged patches are visible in gerrit so you can post-review them still and raise any issues; there should really be none though due to the kind of those patches
07:06:09 <jopdorp> I remember a patch that removed py37 tests
07:06:10 <yoctozepto> many thanks to gmann for helping with the migration and driving the goal openstack-wise
07:06:20 <yoctozepto> jopdorp: yes
07:06:30 <yoctozepto> do note py37 is *not* a target platform for openstack now
07:06:41 <jopdorp> cool
07:07:14 <yoctozepto> https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:07:23 <jopdorp> thanks
07:07:41 <yoctozepto> the spoken-of migration has happened so py37 is no longer relevant
07:07:57 <yoctozepto> it's unlikely it breaks now that we test py36 and py38 but who knows :-)
07:08:05 <jopdorp> agreed
07:08:10 <yoctozepto> #info https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:08:31 <yoctozepto> (in case you are wondering why I'm doing the # stuff - it's to get these entries into the summary)
07:08:50 <jopdorp> I was indeed wondering
07:09:02 <yoctozepto> #info python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:09:23 <yoctozepto> python-masakariclient has a stable/victoria branch now
07:09:31 <yoctozepto> this week all client libraries got their releases
07:09:54 <yoctozepto> and there should be no feature changes whatsoever in them now that they are officially "stable"
07:10:13 <yoctozepto> #info we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:10:31 <noonedeadpunk> that makes me sad kind of
07:10:42 <yoctozepto> noonedeadpunk: me too, unfortunately
07:10:49 <suzhengwei> me, too
07:11:03 <jopdorp> this one didn't make it https://review.opendev.org/#/c/740777/
07:11:04 <yoctozepto> don't worry, we'll make wallaby the best masakari release ever :-)
07:11:10 <jopdorp> haha
07:11:13 <jopdorp> yes
07:11:21 <yoctozepto> #info https://review.opendev.org/740777
07:11:50 <yoctozepto> as a matter of fact, due to low gate activity and low complexity of that patch, I suggest we put it in the exceptions and just merge it
07:11:57 <yoctozepto> then we have at least one feature in victoria
07:12:24 <yoctozepto> suzhengwei: what do you think?
07:12:29 <yoctozepto> jopdorp has already +2
07:12:37 <suzhengwei> agree
07:12:44 <yoctozepto> I find it uncomfortable to +2 my own proposals
07:12:48 <yoctozepto> unless they fix the gates
07:13:15 <yoctozepto> suzhengwei: thanks, please review and hopefully leave a +2 today/tomorrow :-)
07:13:26 <jopdorp> nice
07:13:40 <yoctozepto> #info Review action items from the last meeting
07:13:48 <yoctozepto> aaand there were none!
07:13:56 <jopdorp> lol
07:13:58 <yoctozepto> oopsie
07:14:01 <yoctozepto> #undo
07:14:02 <openstack> Removing item from minutes: #info Review action items from the last meeting
07:14:10 <yoctozepto> #topic Review action items from the last meeting
07:14:21 <yoctozepto> #info there were none
07:14:32 <yoctozepto> #topic CI status
07:14:50 <yoctozepto> the master CI is green, I did not check others
07:15:03 <yoctozepto> master+Victoria considering the client has already branched
07:15:38 <yoctozepto> we could use some better organization in this regard, I'll try to spin up some CI glory page like we have for Kolla
07:15:59 <yoctozepto> #action yoctozepto to bring some visibility to Masakari CI status
07:16:09 <jopdorp> I don't really know where to look for the ci stuff
07:16:33 <jopdorp> only the results in gerrit
07:16:42 <jopdorp> but I don't know where they get configured
07:16:59 <jopdorp> or where the results for the main branches are visible
07:17:14 <yoctozepto> no problem, I know most of you are new to this so that's why I just assigned this task to myself
07:17:25 <yoctozepto> in general one can look at https://zuul.opendev.org/t/openstack/builds
07:17:34 <yoctozepto> with filters obviously
07:17:38 <yoctozepto> or https://zuul.opendev.org/t/openstack/status
07:17:41 <yoctozepto> for currently running
07:17:58 <jopdorp> is that also where they are configured?
07:18:03 <yoctozepto> I don't remember if masakari runs relevant periodics at the moment
07:18:26 <yoctozepto> jopdorp: all CI stuff is configured in the repo nowadays
07:18:34 <yoctozepto> the CI/CD system that openstack uses is Zuul
07:18:52 <yoctozepto> in this setup we are considered users of zuul so these docs hold: https://zuul-ci.org/docs/zuul/reference/user.html
07:19:06 <yoctozepto> it's usually either zuul.yaml or zuul.d with some more yamls inside
07:19:20 <yoctozepto> it can be a hidden file/dir so .zuul.yaml .zuul.d respectively
07:19:45 <yoctozepto> zuul is driven using yaml and ansible (which still uses yaml)
07:20:07 <yoctozepto> for all the other details please just review the files in repo and the docs :-)
07:20:12 <jopdorp> thanks, I'll dive into that
07:20:50 <yoctozepto> #topic Critical Bugs and Patches
07:21:04 <yoctozepto> #info none so far
07:21:10 <ykado> hi, I wanted to raise about this review. https://review.opendev.org/#/c/720623/
07:21:15 <yoctozepto> but it could be that they have not been triaged
07:21:37 <ykado> sorry, probably not the good timing yet?
07:22:09 <yoctozepto> ykado: well, it's a fix to some bug but not necessarily critical I guess? let's postpone for the open discussion
07:22:19 <ykado> ok
07:22:38 <yoctozepto> if you know of breaking/fugly bugs then please report/triage them
07:22:51 <jopdorp> we encountered something that I'm not entirely sure is a masakari bug
07:23:12 <yoctozepto> please speak up
07:23:20 <jopdorp> but we weren't able yet to get failovers of instances with LUKS encrypted volumes to work
07:23:36 <jopdorp> they get a keymanager error
07:23:42 <yoctozepto> hmm, that does not sound like something masakari could go wrong about
07:23:45 <jopdorp> barbican right related
07:23:52 <jopdorp> rights
07:23:58 <yoctozepto> masakari essentially runs evacuations against instances
07:24:09 <yoctozepto> try plain evacuation and it might be failing
07:24:20 <jopdorp> yeah
07:24:27 <yoctozepto> I *think* I saw someone reporting this issue against cinder+barbican
07:24:27 <jopdorp> I think it's more configuration related
07:24:37 <yoctozepto> could
07:24:51 <yoctozepto> well then, let's not wander offtopic too much :-)
07:24:53 <jopdorp> probably the place would be @openstack-kolla
07:24:55 <jopdorp> #
07:25:08 <yoctozepto> jopdorp: yeah
07:25:11 <yoctozepto> #topic Victoria release planning
07:25:45 <yoctozepto> we already know it's frozen (freezing? :-) ) and we can only really squeeze that one patch of mine I mentioned
07:25:51 <yoctozepto> (plus obviously any bug fixes)
07:26:05 <yoctozepto> (noonedeadpunk triggered)
07:26:19 <yoctozepto> next week is RC1
07:26:29 <yoctozepto> so all the other repos will branch stable/victoria as well
07:26:45 <yoctozepto> RC1 is R-3
07:26:57 <yoctozepto> so then it's a matter of 3 weeks to polish eventual issues
07:27:31 <yoctozepto> #topic Open discussion
07:27:42 <yoctozepto> ykado: now it's the time
07:27:47 <yoctozepto> what about that commit
07:27:52 <ykado> yoctozepto: thanks
07:28:02 <yoctozepto> https://review.opendev.org/720623
07:28:15 <ykado> I was wondering how this can progress.
07:28:53 <yoctozepto> the Radosław guy is me so my opinion on how that should progress is in that comment there
07:29:29 <yoctozepto> the "better design" part is surely about wallaby now
07:29:43 <yoctozepto> but the rest holds
07:30:25 <yoctozepto> suzhengwei: could you comment on that?
07:30:49 <yoctozepto> the part that needs dealing is "I guess we could still mix the two ideas and check on init while considering the timeout"
07:31:04 <suzhengwei> the current design is the simplest
07:31:21 <yoctozepto> so it's actually a mix of your (suzhengwei) and tpatil's ideas
07:32:12 <suzhengwei> no, I didn't get tpatil's thought.
07:32:38 <ykado> I see. it is true that there is no real reproducible way, if I understand it correctly.
07:32:38 <ykado> I only could reproduce this by forcefully powering off all the compute nodes or by stopping the masakari-engine services
07:32:38 <ykado> however, without this patch there is no way to recover the compute-nodes that got resolved, unless you update the database manually
07:33:26 <suzhengwei> yes, it is a big use problem for product.
07:33:49 <yoctozepto> hmm, maybe what we need is an easier but manual way to achieve that
07:34:13 <yoctozepto> I'm worried tpatil is right that this could be too aggressive and result in more masakari surprises
07:34:46 <yoctozepto> i.e. simple but backstabbing :-)
07:35:56 <suzhengwei> It gives the user an expiry time to configure, and that looks reasonable.
07:36:38 <ykado> I agree. sorry, I'm still new to Masakari. but what are the potential issues related to "running" statuses if this gets removed (although the default timeout value is quite long, as suzhengwei mentioned)?
07:36:55 <suzhengwei> If it can't recover a failed host in a short time, the HA is useless.
07:38:25 <yoctozepto> suzhengwei: true that
07:38:27 <suzhengwei> I think to turn it into failure is OK
07:38:58 <yoctozepto> one thing is some notifications are host-level so for a large host this could take a while
07:40:13 <suzhengwei> so I leave the expiry time configurable.
07:40:58 <yoctozepto> aye, it's set to 24 hours
07:41:01 <yoctozepto> by default
07:42:26 <yoctozepto> ok, there is one edge case that this deals with because of RUNNING and generated_time
07:42:46 <yoctozepto> it could be that the engine picks up a notification to run and self-sabotages itself
07:43:06 <yoctozepto> imagine a situation where the engine was down too long
07:43:36 <yoctozepto> or maybe not
07:43:44 <yoctozepto> because rpc call will surely expire by this point
07:44:33 <suzhengwei> I think it is a controller node issue.
07:44:47 <suzhengwei> controller HA issue.
07:45:30 <yoctozepto> yeah, masakari does not do a great job of self-HA
07:45:35 <suzhengwei> other services also suffer
07:46:32 <yoctozepto> yeah, but it's no consolation considering masakari is THE HA project :D
07:48:07 <suzhengwei> controller HA and instance/compute HA are different issues. They have their own solutions.
07:49:33 <yoctozepto> true that but still sad
07:49:42 <suzhengwei> we can't avoid the influence of all controller HA problems.
07:50:47 <yoctozepto> well, we could mitigate more though but it needs some redesign to happen
07:51:03 <yoctozepto> your approach seems to be dealing with the reported issue
07:51:23 <suzhengwei> Doing it in a better way is better than doing nothing.
07:53:19 <suzhengwei> https://review.opendev.org/#/c/732477/
07:54:11 <suzhengwei> this is a long term spec, it gives a solution-degrade retry.
07:56:10 <yoctozepto> looks promising
07:57:03 <yoctozepto> ok, I'll re-review suzhengwei's patch; I just need to delve into the masakari code more to be more confident about it
07:57:11 <yoctozepto> anyone else up to the review task?
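A minimal Python sketch of the expiry idea discussed for https://review.opendev.org/720623 follows. Notifications left in the "running" state for longer than a configurable expiry (24 hours by default, per the discussion) are switched to "failed", so the affected hosts can be recovered without a manual database update. The `Notification` class, the `expire_stuck_notifications` function and the lowercase status strings are hypothetical stand-ins, not Masakari's actual code. Measuring the age from `generated_time` is an assumption based on the mention of RUNNING and generated_time above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Hypothetical default mirroring the 24-hour value mentioned in the meeting.
DEFAULT_EXPIRY = timedelta(hours=24)


@dataclass
class Notification:
    """Hypothetical stand-in for a Masakari notification record."""
    notification_uuid: str
    status: str              # e.g. "new", "running", "finished", "failed"
    generated_time: datetime


def expire_stuck_notifications(
    notifications: Iterable[Notification],
    now: datetime,
    expiry: timedelta = DEFAULT_EXPIRY,
) -> List[Notification]:
    """Mark notifications stuck in "running" longer than `expiry` as "failed".

    Returns the notifications whose status was changed.
    """
    expired = []
    for n in notifications:
        # Only "running" notifications are considered; finished or new ones
        # are left alone (assumption for this sketch).
        if n.status == "running" and now - n.generated_time > expiry:
            n.status = "failed"
            expired.append(n)
    return expired
```

A periodic task could call this with the current UTC time, for example `expire_stuck_notifications(records, datetime.now(timezone.utc))`. Keeping the expiry as a parameter reflects suzhengwei's point that host-level recoveries on large hosts may legitimately run for a long time.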
07:58:42 <yoctozepto> oh my, I have completely forgotten - the virtual PTG is coming - do we want a session for masakari? do you have any time preferences? please let me know via mail - I'll spin up a thread on openstack-discuss
07:58:48 <yoctozepto> I hope you are all subscribed
07:59:14 <yoctozepto> #action yoctozepto to spin up a Masakari Wallaby vPTG thread on openstack-discuss mailing list
07:59:45 <yoctozepto> please suzhengwei remember to review the only-feature-patch-that-we-can-get-in
07:59:47 <yoctozepto> thank you
07:59:52 <yoctozepto> and thank you all for attending
07:59:58 <ykado> thank you!
08:00:03 <yoctozepto> #endmeeting
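For reference, the CI configuration lookup described under the CI status topic (zuul.yaml or zuul.d, otherwise the hidden .zuul.yaml or .zuul.d) can be sketched in Python. `find_zuul_config` is a hypothetical helper and is not part of Zuul. It assumes that the first matching name wins and that only top-level `*.yaml` files in a config directory count. The Zuul user documentation linked in the meeting is the authority on the exact rules.

```python
from pathlib import Path
from typing import List

# Candidate names in the order described in the meeting: unhidden first,
# then the hidden variants (assumption: the first match wins).
CANDIDATES = ("zuul.yaml", "zuul.d", ".zuul.yaml", ".zuul.d")


def find_zuul_config(repo: Path) -> List[Path]:
    """Return the Zuul config file(s) a repo appears to use (hypothetical helper)."""
    for name in CANDIDATES:
        path = repo / name
        if path.is_file():
            # A single config file, e.g. zuul.yaml or .zuul.yaml.
            return [path]
        if path.is_dir():
            # A config directory: take its top-level *.yaml files sorted by
            # name (simplification: subdirectories are not searched).
            return sorted(p for p in path.glob("*.yaml") if p.is_file())
    return []
```

Running it against a local checkout lists the files to read when looking for a project's job definitions. It returns an empty list when none of the candidate names exist.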