14:00:00 #startmeeting cinder
14:00:01 Meeting started Wed Mar 17 14:00:00 2021 UTC and is due to finish in 60 minutes. The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:05 The meeting name has been set to 'cinder'
14:00:09 #topic roll call
14:00:19 howdy
14:00:30 Hello
14:00:41 hi
14:00:44 hi
14:01:38 hi! o/
14:01:54 hello everyone
14:01:54 #link https://etherpad.openstack.org/p/cinder-wallaby-meetings
14:02:02 #topic updates
14:02:10 hi
14:02:36 hi
14:02:55 as enriquetaso proposed last week, we will be having a half-hour bug squad meeting each week right after this cinder meeting
14:02:59 info is here:
14:03:07 #link http://eavesdrop.openstack.org/#Cinder_Bug_Squad_Meeting
14:03:12 :)
14:03:21 there's a calendar invite at that link so you can add it to your calendar
14:03:31 meeting will be back in #openstack-cinder
14:03:40 next item
14:03:54 the virtual PTG is coming up, 19-23 April
14:04:03 it's free to attend, but you need to register
14:04:13 #link https://etherpad.opendev.org/p/xena-ptg-cinder-planning
14:04:18 info ^^
14:04:25 also, start thinking about topics
14:04:39 we agreed to meet Tuesday-Friday 1300-1600 UTC
14:04:54 i'd like to make one of those "drivers day" to deal with driver issues
14:05:15 so if you are a driver maintainer, please indicate on the etherpad what is a good day for that
14:05:44 ok, so the last festival of XS reviews was so productive that people want to do another one
14:05:51 i think friday would be good
14:06:17 because RC-1 is thursday, and if there are revisions, people can get them done hopefully before RC-1
14:06:29 because after that, only release-critical bugs will be allowed into stable/wallaby
14:06:47 so, question #1 ... does friday work for people?
14:06:59 y
14:07:01 yes
14:07:05 yes
14:07:13 that's this friday 19 March to be specific
14:07:17 yes
14:07:51 ok, sounds like we have enough people to make it worthwhile, and hopefully more will join
14:08:01 ok question #2 - the time
14:08:08 last time was 1400-1600 UTC
14:08:28 my local time has shifted, so i would propose 1300-1500 UTC
14:08:31 but i can do either one
14:08:57 any preference?
14:09:00 i can do either as well
14:09:03 either one
14:10:05 I will work it out as best I can either way.
14:10:08 well, maybe stick with 1400-1600 UTC for consistency?
14:10:39 rosmaita: sounds good
14:10:41 because in the fall i will not want to do 1300-1500!
14:11:01 good point :)
14:11:13 ok, thanks for the feedback
14:11:40 #action rosmaita - note to ML about festival of XS reviews friday 19 March 1400-1600 UTC
14:12:17 ok, that's all i have for announcements
14:12:40 #topic review of FFEs
14:12:51 ok, some feature freeze exceptions were granted
14:13:05 oh yeah, i do remember one more announcement
14:13:11 the docs job is fixed
14:13:24 https://zuul.openstack.org/builds?job_name=openstack-tox-docs&project=openstack%2Fcinder
14:13:37 thanks to geguileo and tosky for getting a good solution worked out
14:13:37 yay
14:13:43 \o/
14:14:19 the good news is that the fix actually applies to other jobs too
14:14:37 so hopefully we will see fewer of these bizarre failures in the future
14:14:43 ok, back to FFEs
14:14:53 #link https://etherpad.opendev.org/p/cinder-wallaby-features
14:15:16 first one is NetApp Flex Group support
14:15:40 that one had +2s but needed to resolve some merge conflicts
14:15:50 and then another feature broke all the nfs gates
14:16:01 that bugfix is now in the gate
14:16:22 so the CIs should all be good later today
14:17:01 ok, I think Felipe already solved the merge conflicts, we are waiting for the nfs fix to be merged
14:17:18 yeah, i have re-reviewed the change
14:17:32 we would like your opinion on whether it is too late to add a few ONTAP version check lines in this patch
14:17:34 not sure if smcginnis will have time to look, he was the other +2 on the original patch
14:17:47 Will do.
14:17:52 thanks!
14:18:24 sfernand: well, how about making that a bugfix?
14:18:34 as an alternative to including the version checking here, maybe we can create a separate patch to handle that checking stuff as a fix
14:18:39 yep
14:18:46 that works as well
14:18:58 ok, cool
14:19:16 if you get it posted & reviewed before thursday, it should make it into wallaby
14:19:17 ok thanks
14:19:50 next one is Pure Storage QoS support
14:20:08 that is in the gate now, so i think it's under control
14:20:23 DS8000 revert to snapshot
14:20:54 need some advice about this one, they are having trouble getting cinder-tempest-plugin running
14:21:05 (you will see, this is a common theme today)
14:21:19 that's where the snapshot revert CI tests are
14:21:46 they have verified manually ... question is, is that sufficient (assuming they pinky-promise to get it working)
14:22:16 actually, forget it
14:22:31 looks like that has already merged
14:22:55 ok, so apparently the answer to that question above is "yes"
14:23:16 next one: Nimble driver
14:23:21 :-)
14:23:36 i am holding this one up because of CI, but i may be incorrect here
14:23:41 they are adding group support
14:23:51 CI is passing with the main tempest suite
14:23:59 but not running cinder-tempest-plugin
14:24:19 ajitha pointed out to me that the cinder-tempest-plugin tests the old consistency-group stuff
14:24:42 so maybe just running the main tempest tests is fine here?
14:24:51 https://opendev.org/openstack/tempest/src/branch/master/tempest/api/volume/admin
14:25:00 they are the test_group* tests
14:25:44 do those test CGs or just regular groups?
14:26:06 looks like the tempest ones are regular groups
14:26:54 consistency group tests are only in cinder-tempest-plugin as far as I know; of course they may need to be updated
14:28:17 i will admit that i am confused about the group stuff, so i wanted to ask for feedback about this situation
14:30:03 so do we have tests anywhere for consistency groups via regular volume groups? (the newer implementation of CGs)
14:30:44 eharney: i do not know
14:31:41 i guess we need to figure out what the testing requirements are to know that CGs work
14:31:58 i guess my proposal is that we say main tempest is OK for the nimble patch, and we make group testing a topic at the ptg
14:32:35 and i assume that if we add better/more thorough tests, the nimble team will be interested in fixing any bugs that come up?
14:33:15 rosmaita: ++ That sounds like a good plan.
14:33:18 works for me, their implementation looks plausible enough
14:33:24 ok, cool
14:33:37 next: Zadara driver features
14:33:43 this looks fine to me
14:33:52 i think i have a +2 on the patch
14:34:28 i checked their CI, it's set to run the manage_* tests
14:34:42 so once it responds with success, should be good to go
14:35:01 can i have a cinder core "adopt" this patch?
14:36:24 just need to know that someone other than me is keeping an eye on it before Friday
14:36:50 has to be merged before 2100 UTC
14:36:56 i can review it, but my first question will be: why does zadara_use_iser default to True
14:37:44 probably leave a -1 and that question on the patch
14:37:49 rosmaita: I am planning to take another look through what is in the etherpad you have.
14:37:54 yah
14:38:01 ok, cool
14:38:34 btw, no one should feel pressure to approve anything ... i just want to make sure we are looking and giving enough time for people to revise & resubmit
14:38:57 ok, last one is JovianDSS
14:39:41 code looks ok, but they aren't running the cinder-tempest-plugin
14:39:53 i think that's the only place that revert-to-snapshot is tested
14:41:11 hopefully they will get it running
14:41:40 ok, that's all ... thanks to everyone who has been reviewing these
14:41:48 the docs job breakage slowed things down a bit
14:41:55 but we are moving now!
14:42:19 #topic Parallel Backup
14:42:24 I have created a PR for parallel cinder backup. Tested on devstack from 1 GB to 250 GB, and on another setup a 1 TB backup showed an improvement of around 20-22%. However, this is applicable to a single backup.
14:42:24 Want to know your opinion: if https://review.opendev.org/c/openstack/cinder/+/779233 is not the correct way, then what would be another approach?
14:43:39 this might be a good PTG topic
14:43:43 i'm not sure that testing a single backup is an interesting test for how people actually use cinder backup
14:44:09 testing whether this work helps at all when running a handful of backups seems like a better target
14:44:11 does it increase the memory usage?
14:44:53 it would also help if the commit message better described what exactly is being parallelized
14:45:00 if kept at a lower number of threads, e.g. 4, no significant increase
14:45:26 is this about moving checksum calculations onto multiple cpu cores?
14:46:22 no, the upload.
14:47:43 so it sends multiple chunks concurrently over the network based on the number of cpu cores available?
14:47:51 yes
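To make the approach under discussion concrete, the following is a minimal sketch (not the code from review 779233) of chunk-level upload parallelism with a bounded thread pool; upload_chunk() is a hypothetical stand-in for a chunked backup driver's per-chunk upload call, and the chunks here are just local byte strings.

    # Illustration only: upload backup chunks concurrently with a bounded
    # thread pool, collecting per-chunk checksums for the backup metadata.
    import concurrent.futures
    import hashlib


    def upload_chunk(index, data):
        """Pretend to upload one chunk; return (index, checksum)."""
        # A real driver would send `data` over the network here.
        return index, hashlib.md5(data).hexdigest()


    def backup_chunks_parallel(chunks, max_workers=4):
        """Upload chunks concurrently; keep results keyed by chunk index so
        the backup metadata can still be written in the original order."""
        results = {}
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(upload_chunk, i, c) for i, c in enumerate(chunks)]
            for fut in concurrent.futures.as_completed(futures):
                index, checksum = fut.result()  # re-raises any upload error
                results[index] = checksum
        return [results[i] for i in sorted(results)]


    if __name__ == '__main__':
        fake_chunks = [bytes([i]) * 1024 for i in range(8)]
        print(backup_chunks_parallel(fake_chunks, max_workers=4))

The max_workers bound relates to the memory question raised above: with only a few worker threads, only a few chunks are in flight at once.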
14:48:06 i don't really think this makes sense, but like rosmaita said, maybe a better PTG topic than here
14:48:34 eharney: ++
14:48:46 kinpaa12389: can you attend the PTG?
14:48:51 it may make one backup faster but then as soon as you're running 8 backups at once, having all of this parallelized across threads doesn't actually gain any benefit, so i don't know why we want to do this
14:49:07 a single backup isn't the case we really want to optimize for
14:49:08 Yeah, looking at geguileo's comment and what is being done I think doing this carefully is going to be required.
14:49:09 i think that was geguileo's comment on the patch also
14:49:52 so there needs to be some explanation to justify the complexity of adding this
14:50:01 At a minimum, some logic across the multiple optimization options would be required.
14:50:18 +1
14:50:37 it just seems like it's solving the wrong thing to me
14:50:45 Which, obviously, won't be easy.
14:51:04 sounds like the consensus is that this requires more discussion
14:51:19 so the general opinion is -- backup parallelization for a single backup is not recommended .. when do we have the PTG?
14:51:30 19-23 april
14:51:51 ok
14:51:51 etherpad is here: https://etherpad.opendev.org/p/xena-ptg-cinder-planning
14:52:20 you might want to look at the cinder spec template, it will give you an idea of what kind of questions to think about/answer
14:52:55 but the main concern seems to be that since backups can already happen in parallel, this kind of micro-parallelization may not help
14:53:14 anything else?
14:53:28 nothing on parallel backup then.
14:53:41 there is another backup bug, which needs to allow backup after token expiry
14:53:53 looks like the next topic continues the backup theme
14:53:58 https://bugs.launchpad.net/cinder/+bug/1298135
14:54:00 Launchpad bug 1298135 in Cinder "Cinder should handle token expiration for long ops" [Medium,Confirmed]
14:54:25 i don't think this is a valid request, because if you configure things correctly, this is already handled, as far as i can tell
14:54:57 you hardcode a short expiration in keystone.conf and run a backup .. it fails
14:55:13 as soon as the token expires .. the solution was to add a service_token along with the user_token
14:55:58 i tried to change cinder to send the service_token to swiftclient and then test it. But it does not work .. since swiftclient is quite thin and swift itself validates the token for each backup chunk.
14:56:43 my understanding is that the keystone middleware handles this
14:57:05 right, i haven't tried this recently, but presumably swiftclient is doing validation in a way where this would work
14:57:20 or, swift, more likely
14:57:53 swift validates the token for each backup and it fails to validate the token once expiry is hit.
14:58:12 Two minute warning.
14:58:14 swift validates tokens by checking them against keystone etc
14:58:40 kinpaa12389: have you tried the troubleshooting steps in https://docs.openstack.org/cinder/latest/configuration/block-storage/service-token.html ?
14:58:41 if there is a request for a change in Cinder related to this bug, it's not clear what it would be
14:58:48 yes, i did try
14:59:04 first, the service_token for cinder <-> swift does not exist today
14:59:19 you need to add it in backup/drivers/swift.py
15:00:20 well, we are out of time ... continue discussion in the bug meeting, i guess
15:00:24 then the swiftclient httpConnection needs to be modified to accept this auth_token (user_token + service_token).
15:00:38 #endmeeting
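For reference on the token-expiry discussion at the end of the meeting: the service-token doc linked above describes configuring a [service_user] section in cinder.conf so that a long-running operation can continue after the user token expires, provided a valid service token accompanies it. A rough sketch of that configuration follows; the values are placeholders and the exact option set should be checked against the linked doc. Note that, as stated in the meeting, this does not by itself address the cinder <-> swift gap in backup/drivers/swift.py raised at 14:59:04.

    [service_user]
    # placeholder values; see the service-token documentation for details
    send_service_user_token = true
    auth_type = password
    auth_url = http://203.0.113.10/identity
    username = cinder
    user_domain_name = Default
    password = SERVICE_USER_PASSWORD
    project_name = service
    project_domain_name = Default

The open question from the meeting is whether the swift backup driver passes this service token through to swiftclient at all, which is what bug 1298135 is effectively asking for.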