15:00:54 #startmeeting manila
15:00:54 Meeting started Thu Jan 14 15:00:54 2021 UTC and is due to finish in 60 minutes. The chair is gouthamr. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:58 The meeting name has been set to 'manila'
15:01:01 o/
15:01:03 o/
15:01:11 courtesy ping: ganso dviroel lseki tbarron andrebeltrami felipe_rodrigues esantos
15:01:13 Hi
15:01:13 hello
15:01:47 hello o/
15:02:12 here's our agenda for today: https://wiki.openstack.org/wiki/Manila/Meetings#Next_meeting
15:02:17 hi
15:02:33 thank you for tuning in, let's begin
15:02:38 #topic Announcements
15:03:19 We've an upcoming bug target deadline next week: Milestone-2
15:03:27 o/
15:03:29 #link https://releases.openstack.org/wallaby/schedule.html
15:03:49 our new driver deadline is in 3 weeks
15:04:11 and the feature proposal freeze follows that
15:04:13 o/
15:05:52 o/
15:05:53 so as we slowly inch into the tail end of this release cycle, a lot of code will be submitted, and as ever, we'll need to keep the momentum on reviews :)
15:06:31 no other pressing announcements atm, anyone else got any?
15:07:36 cool, let's move on..
15:07:49 #topic Manila deployment via Kolla-Ansible (mgoddard, eliaswimmer)
15:08:03 Hi Zorillas
15:08:09 hi mgoddard, eliaswimmer
15:08:15 o/
15:08:23 hi
15:08:44 This agenda item came about after discussing manila in yesterday's kolla meeting
15:09:19 We have a bug; it seems we are doing multinode manila incorrectly
15:09:21 https://bugs.launchpad.net/kolla-ansible/+bug/1905542
15:09:23 Launchpad bug 1905542 in kolla-ansible "Manila ceph configuration won't work in HA mode" [Undecided,New]
15:09:52 We naively deploy manila-share active/active and hope for the best
15:10:29 tbarron joined in the conversation and suggested we discuss the state of manila in Kolla here
15:10:34 Are you doing this for cinder-volume as well?
15:10:34 so here we are
15:10:46 well, that's another story
15:11:00 mgoddard: And thanks for joining us!
15:11:28 we were previously using cinder backend_host, then it somehow got dropped, and now we have a bug which we're working on
15:11:38 I ask because manila developed as a fork of cinder and we inherited some issues w.r.t. an active-active service talking to the storage back end
15:11:49 They are making some progress.
15:11:56 right
15:12:00 We keep talking about it but
15:12:23 it certainly would be nice to have
15:12:36 as I mentioned to you, many of us work with a deployment architecture (tripleo) that does a pretty good job running the service active-passive, so
15:12:45 I will do everything in my power not to have to deploy pacemaker
15:12:53 we've been able to get away with procrastination.
15:13:21 mgoddard: I am actually quite sympathetic to that POV and am not recommending pacemaker for kolla-ansible.
15:13:50 yeah, testing/supporting active-active HA for manila-share has been on the backlog for a while
15:13:51 We already have the machinery in place so using it for manila isn't a big deal in that context.
15:13:53 do we have any alternatives?
15:14:06 (for existing releases)
15:14:54 mgoddard: I'm not trying to draft *you*, as I know you are doing many things already, but we really need some folks working in manila whose
15:15:03 day jobs don't depend on tripleo
15:15:10 to balance things out
15:15:37 not that i'm aware of, we've only ever tested/documented using pacemaker:
15:15:39 #link https://docs.openstack.org/ha-guide/storage-ha-file-systems.html#add-shared-file-systems-api-resource-to-pacemaker
15:15:46 mgoddard: the particular bug you raise is one thing, I think the independent keys idea may work for that
15:16:19 But the more general problem that active-active is not really safe for the manila-share service yet is another matter.
15:16:20 * gouthamr scratch that link, it should be manila-share we're talking about
15:18:16 mgoddard: manila-share core code needs some fixes, then back end by back end there need to be fixes.
15:18:36 mgoddard: which back ends is the kolla community most interested in?
15:19:14 for me it is ceph and generic
15:19:22 we support generic, hnas, cephfsnative, cephfsnfs & glusterfsnfs
15:19:35 although it's fairly customisable
15:19:51 so other drivers too potentially
15:21:19 so from our perspective I think there are two issues
15:21:41 1. what do we need to do today to safely use existing manila-share
15:22:07 2. if and when active/active manila-share becomes available, how do we use it
15:22:53 I'd love to find you some contributors, although I expect many projects are in a similar position
15:23:49 ^ +1 thanks for that - that'd be the prime concern - we could talk in theory about the pitfalls there may be in attempting active-active HA
15:23:57 with the workaround you have
15:24:13 for the native cephfs backend
15:24:39 that would be helpful to understand the severity of the issue
15:25:02 off the top of my head: driver startup, and periodic driver interactions
15:25:25 note that depending on the back end there is an issue of HA for the back end as well
15:25:35 no issue for native cephfs
15:26:13 for the generic driver we have the issues covered in this incomplete review https://review.opendev.org/c/openstack/manila-specs/+/504987
15:26:46 queens :)
15:26:51 for cephfsnfs, tripleo uses pacemaker :(
15:27:09 mgoddard: yeah, we've been talking about this stuff for some time
15:27:25 during startup, there's a reconciliation of exports on the backend, so this might run twice, but if the driver startup is staggered, this isn't a biggie; and there aren't many periodic driver interactions for the native cephfs driver
15:27:32 just the scheduler stats update
15:27:50 it's very difficult to retrofit if this stuff isn't designed in from the beginning
15:28:07 mgoddard: +1
15:29:13 gouthamr: yeah, fixing for native cephfs and adding some missing tooz locks in manila-share core would be a good start
15:29:40 And then maybe documenting caveats for other back ends until someone can fix them.
15:30:37 eliaswimmer: is your interest in the generic driver for production use?
15:30:39 yeah, that's a good point - one of the other pieces that relies on coordination is share access control - we currently protect critical sections there with file locks
15:31:10 if the file locks aren't coordinated, this could potentially break
15:31:16 tbarron: only until I have a good setup for using cephfs natively
15:31:44 gouthamr: the code changes to switch them to tooz wouldn't be hard, but we need to deploy tooz with a DLM and test, test, test
15:32:04 tbarron: +1
15:32:47 gouthamr: mgoddard but the consequences of races there are probably not too bad either. Access to a share isn't granted or withdrawn properly and needs to be re-done?
15:33:31 Not like data loss. The more general bogey with active-active cinder-volume or manila-share.
15:35:16 that's all helpful stuff, thanks
15:35:21 mgoddard: so the k-a gate runs multinode jobs with a DLM in play? If manila does make changes for a-a, they could be tested there?
15:35:22 i think so - if you added a lot of access rules at once on a particular share, we could claim some of your access rules have been applied, but in reality cephfs could know nothing about them
15:36:02 gouthamr: so an undetected access failure, or rather one detected in use by end users.
15:36:40 yeah, the annoying workaround would be to apply access rules one at a time
15:36:50 currently we do not deploy a DLM in CI jobs, although I expect we will soon
15:37:07 it should be a one-liner to enable
15:37:26 mgoddard: So, respecting your time, perhaps we should leave the conversation where it is atm. We don't have a quick
15:37:57 solution for you, but the good thing is we in manila are more aware of the need and potential for testing
15:37:58 yes, we should let you continue with your other topics
15:38:15 outside of just the manila gate or tripleo.
15:38:36 +1, I think we're all a bit more synced up now
15:38:42 mgoddard: Thanks much for your time, and you know where to find us!
15:38:56 yes, thank you for bringing this up mgoddard eliaswimmer
15:39:18 well, thank you for inviting us. Enjoy the rest of the meeting o/
15:39:31 let's move on to regular programming
15:39:32 eliaswimmer: hang around please :D
15:39:39 oh wait
15:40:01 gouthamr: he has a bug, he can be there
15:40:06 tbarron: anything else about $topic
15:40:08 ah i see
15:40:15 #topic Reviews needing feedback
15:40:42 s/he, I shouldn't be assuming :D
15:40:44 alright, time for our favorite review focus etherpad
15:41:01 #link https://etherpad.opendev.org/p/manila-wallaby-review-focus (Wallaby cycle review focus etherpad)
15:42:01 if you have anything that needs reviewer attention, please feel free to add your change to this etherpad, and we'll track it through the end of this release cycle
15:43:02 it's brand new, but we could already have patches that you need a nudge on
15:43:52 anything that needs to be discussed right now?
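For reference, the tooz-based locking that tbarron and gouthamr discuss in the Kolla-Ansible topic above could look roughly like the minimal sketch below. This is an illustration only, not manila's actual code: the etcd3 backend URL, member id, and lock name are placeholders, and it assumes a DLM such as etcd is deployed alongside the manila-share instances.

```python
# Minimal sketch of replacing per-node file locks with a tooz distributed
# lock, assuming an etcd3 DLM. All names/URLs below are placeholders.
from tooz import coordination

coordinator = coordination.get_coordinator(
    'etcd3+http://controller:2379', b'manila-share-host-1')
coordinator.start(start_heart=True)

# One lock per share keeps access-rule updates for the same share from
# racing across active-active manila-share instances.
lock = coordinator.get_lock(b'manila-access-rules-<share-id>')
with lock:
    # critical section: apply/deny access rules on the back end
    pass

coordinator.stop()
```

Unlike the file locks mentioned at 15:30:39, which only serialize workers on the same host, a coordination lock like this is held in the DLM and therefore works across hosts; that is why deploying and testing tooz with a DLM is the prerequisite tbarron calls out.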
15:45:13 cool; we'll talk about this again next week
15:45:25 we can move on to:
15:45:28 #topic Bugs (vhari)
15:45:39 o/ vhari - the floor's yours
15:46:27 gouthamr, I am on another call, can you please drive the bugs?
15:46:33 ah sure thing
15:46:44 #link https://bugs.launchpad.net/manila/+bug/1911695
15:46:45 Launchpad bug 1911695 in OpenStack Shared File Systems Service (Manila) "generic backend resize share failure when no access rule is set" [Undecided,New]
15:46:45 gouthamr, all new bugs are on the etherpad
15:47:15 I think this is quite easy to fix
15:47:21 some quotes are missing
15:47:32 before exec via bash
15:48:05 ah i see
15:48:10 If you want I can patch it
15:48:20 that would be very welcome, eliaswimmer
15:48:30 thanks for filing this bug
15:48:31 eliaswimmer++
15:49:24 should we target m2 or m3 for this one?
15:49:46 m is milestone?
15:49:47 m2 is 3 weeks away
15:49:51 yes, sorry
15:49:55 eliaswimmer++
15:50:04 I would like to have it in train :)
15:50:12 m2 is next week, tbarron
15:50:19 gouthamr: ah yes
15:50:23 d'oh
15:50:32 i know that too
15:50:35 you just told us
15:50:45 eliaswimmer: yes, we'll need to fix it in the trunk branch, and we can backport it one branch at a time
15:51:34 we can set this one to m-3 and follow up on LP - do let us know if you have any questions
15:52:00 for reference, the wallaby release schedule is here: https://releases.openstack.org/wallaby/schedule.html
15:52:08 gouthamr: ok
15:52:14 and milestone-3 is Mar 08 - Mar 12
15:53:07 launchpad's very slow atm
15:53:22 ty for picking this up eliaswimmer, let's move on to the next bug
15:53:33 #link https://bugs.launchpad.net/manila/+bug/1911071
15:53:36 Launchpad bug 1911071 in OpenStack Shared File Systems Service (Manila) "lower constraints job broken on stable branches older than ussuri" [Medium,In progress] - Assigned to Goutham Pacha Ravi (gouthamr)
15:54:05 alright, this one's been targeted
15:54:41 we've been discussing lower-constraints jobs quite a bit on the mailing list
15:55:18 so far, the consensus seems to be to keep them since they've been fixed - in manila, we've now fixed them on train and newer branches
15:55:34 we still have a problem with stein, rocky and queens
15:55:48 and these branches are in extended maintenance
15:56:01 i attempted a patch for stein
15:56:08 #link https://review.opendev.org/c/openstack/manila/+/770207 ([stable/stein] Update requirements and constraints)
15:56:40 still failing - will need to check what's going on there
15:57:09 but i'd rather not do this on rocky and queens - i don't see the value of bumping requirements files on these really old branches
15:57:29 specifically to call out lower requirements
15:58:08 so i'm proposing we drop the lower-constraints testing on these
15:58:18 #link https://review.opendev.org/c/openstack/manila/+/770704 ([stable/rocky] Adjust CI jobs)
15:58:33 took the opportunity to do some more CI cleanup ^
15:58:54 let's discuss the merits/pitfalls of this directly on the change
16:00:10 thanks for working on this, gouthamr :)
16:00:10 let's wrap up bug triage here..
16:00:11 you're welcome carloss
16:00:13 no time for open discussion today :)
16:00:25 if you have something, please come on over to #openstack-manila
16:00:31 thank you all for attending
16:00:34 stay safe!
16:00:38 #endmeeting
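For context on bug 1911695 discussed above: the fix eliaswimmer describes is about quoting values before a command string is executed via bash. The sketch below is only a hypothetical illustration of that class of fix; the command and helper name are made up for the example and are not the generic driver's actual code.

```python
# Hypothetical illustration of quoting a value before it is run via
# `bash -c` (or over ssh), the class of fix discussed for bug 1911695.
import shlex


def build_resize_command(device_path):
    # shlex.quote keeps the path a single shell argument even if it
    # contains spaces or other shell metacharacters.
    return 'sudo resize2fs %s' % shlex.quote(device_path)


print(build_resize_command('/dev/disk/by-id/virtio-share 01'))
# -> sudo resize2fs '/dev/disk/by-id/virtio-share 01'
```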