Wednesday, 2021-07-14

*** dviroel|out is now known as dviroel11:24
*** hemna0 is now known as hemna12:53
rosmaita#startmeeting cinder14:00
opendevmeetMeeting started Wed Jul 14 14:00:14 2021 UTC and is due to finish in 60 minutes.  The chair is rosmaita. Information about MeetBot at http://wiki.debian.org/MeetBot.14:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:00
opendevmeetThe meeting name has been set to 'cinder'14:00
rosmaita#topic roll call14:00
eharneyhi14:00
rosmaitaam i netsplit?14:00
walshh_hi14:00
rosmaitaoh good14:00
whoami-rajatHi14:01
enriquetasoHi14:01
fabiooliveiraHi14:01
rosmaitawas worried for a minute there14:01
simondodsleyHi14:01
Mr_FreezeexHi14:01
TusharTgitehi14:01
rosmaita#link https://etherpad.opendev.org/p/cinder-xena-meetings14:01
rosmaitahi everyone, i am going to get started because i have like 500 announcements and there's a bunch of topics on the agenda14:02
rosmaita#topic announcements14:02
rosmaitaok, first item14:02
rosmaitaTusharTgite put a request for help near the end of our video meeting 2 weeks ago, and i am afraid it got lost14:02
rosmaitaand i forgot to mention it last week14:02
rosmaita#link https://wiki.openstack.org/wiki/Reset_State_Robustification14:03
rosmaitahe would like some feedback, so please take a look14:03
rosmaitanext item is a reminder14:03
rosmaitaCinder Festival of XS Reviews this friday!14:03
rosmaitalocation as usual is https://meetpad.opendev.org/cinder-festival-of-reviews14:04
rosmaitanext item, the next OpenStack release will be named "Yoga"14:04
rosmaitaand so the Yoga virtual PTG will be held 18-22 October 202114:04
rosmaitaand now we have a planning etherpad14:04
rosmaita#link https://etherpad.opendev.org/p/yoga-ptg-cinder-planning14:04
rosmaitaand, teams must sign up for time slots within the next week14:05
rosmaitai sent a proposal to the ML yesterday which you may have seen14:05
rosmaita#link http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023624.html14:05
rosmaitabasically, the proposal is for us to do what we have done for the past virtual PTGs14:06
rosmaitathe downside is that it is rough on APAC time zones14:06
rosmaitabut i really don't know what to do about that14:06
rosmaitai am open to suggestions14:06
rosmaitaas i said in the email, maybe we have an APAC-friendly post-PTG session, if there is interest14:07
rosmaitathe problem is that interest is usually shown by attending cinder weekly meetings14:07
rosmaitaand if the weekly meeting time sucks for you, you won't be here to express an opinion14:07
whoami-rajati think it's fine for India, not sure about China, Japan and others...14:08
rosmaitayeah, for shanghai time it's sub-optimal, but probably do-able14:08
rosmaitabut for Japan, kind of not very good14:08
rosmaitathe source of the problem, as i see it, is that we are not all physically in the same TZ for the week14:09
rosmaitait is really hard to accommodate Seattle to Tokyo14:09
rosmaitaanyway, my point is that i don't want to be exclusionary about time zones, but i also don't have any ideas14:10
rosmaitaand we definitely need to accommodate the core team14:10
rosmaitaso, please make sure you mark your calendar for the PTG14:10
rosmaitaand also register14:10
rosmaitanext item14:10
rosmaitadropping lower-constraints job, file (discussed last week)14:11
rosmaita#link https://review.opendev.org/q/topic:%22drop-l-c%22+(status:open%20OR%20status:merged)14:11
whoami-rajatone way is to add APAC topics to the beginning and they can follow the rest of PTG in recordings if it gets too late?14:11
rosmaitawhoami-rajat: yeah, i am happy to do that, i'll add a note on the etherpad14:11
rosmaita#action rosmaita ^^14:11
rosmaitaok, please review ^^ so we can get lower-constraints out of our collective lives and easily make requirements changes14:12
rosmaitanext item14:12
rosmaitathis is week R-12 - xena milestone-214:12
rosmaitatwo major things here14:13
rosmaita1 - new driver merge deadline14:13
rosmaitaone new driver: Add Cinder NFS driver for Dell EMC PowerStore14:13
rosmaitahttps://review.opendev.org/c/openstack/cinder/+/79760814:13
rosmaitai need help reviewing this one, especially from driver maintainers who already have NFS-based drivers14:14
rosmaitaso please do PowerStore a solid and take a look and leave comments14:14
rosmaitai will be honest, the driver as it stands looks a bit raw14:14
eharneyi've got this one on my list to follow up on14:15
rosmaitain that i'm not sure that it takes into account all the implications of multiattach14:15
rosmaitabut i will also be honest and say i don't know jack about NFS, really14:15
rosmaitaso thanks eharney, definitely want you to look closely14:15
rosmaitabut also, other people ... anyone from netapp here today?14:15
rosmaitaguess not14:16
rosmaitaok the other big thing is that i want to get v2 of the block storage api removed this week and cut releases of everything affected14:17
rosmaitaand since these are my patches, i cannot review14:17
rosmaitawell, i can, but it doesn't do any good14:17
rosmaitai sent an appeal about cinder to the ML earlier this week14:18
rosmaita#link http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023606.html14:18
rosmaitaand then this morning, i realized that we had discussed this earlier and people signed up to review patches14:18
rosmaita#link https://wiki.openstack.org/wiki/CinderXenaMidCycleSummary#Block_Storage_API_v2_Removal_Update14:18
rosmaitaand here are the patches14:18
rosmaita#link https://review.opendev.org/q/topic:%22drop-v2%22+(status:open%20OR%20status:merged)14:18
enriquetasoi'm not on that list but i'll review them asap 14:19
rosmaitaenriquetaso: please start with the cinderclient patches14:20
enriquetasosure14:20
rosmaitai am having problems getting the second cinderclient patch passing CI after rebasing it on the other patch14:20
rosmaitathe "other patch" being shell removal14:20
rosmaitashell removal: https://review.opendev.org/c/openstack/python-cinderclient/+/79183414:21
rosmaitaclasses removal: https://review.opendev.org/c/openstack/python-cinderclient/+/79295914:21
rosmaitai have no idea what is going on ... i am consistently getting tempest-full-py3 failures in a tempest scenario test that tries to log in to horizon14:22
rosmaitai am pretty sure my patch is not affecting that14:22
rosmaitabut my patch seems to be cursed14:22
rosmaitathe other thing is that i am getting consistent fixture timeouts in the functional-py36 test ... 14:23
whoami-rajati also got the python-cinderclient-functional-py36 failure today on my patch...14:23
rosmaitayes, and then you passed on the next go-round14:23
eharneyrosmaita: er... i think horizon imports cinderclient.v214:23
rosmaitawhereas mine failed again14:23
rosmaitaeharney: i thought not, but i better look more closely14:23
eharneyit does, i just looked14:24
rosmaitashoot14:24
rosmaitaok, i guess that's why we have CI tests14:24
rosmaitaeharney: shoot me a link and i will put up an emergency horizon patch or something14:25
eharneyhttps://opendev.org/openstack/horizon/src/branch/master/openstack_dashboard/api/cinder.py#L3114:25
eharneyi'll add a review comment14:25
rosmaitaty14:25
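(For context on the fix implied above: rather than importing the removed cinderclient.v2 module directly, consumers such as horizon can go through cinderclient's version-aware factory. A minimal sketch under assumptions — the function name, the '3.0' version string, and the keystoneauth session argument are illustrative, not the actual horizon patch.)

```python
# Illustrative only -- not the actual horizon change.  The idea is to stop
# importing cinderclient.v2 and build the client via the versioned factory.
from cinderclient import client as cinder_client


def make_cinderclient(session):
    # 'session' is assumed to be a keystoneauth1 Session; '3.0' requests the
    # v3 Block Storage API instead of the removed v2 endpoint.
    return cinder_client.Client('3.0', session=session)
```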
rosmaitaok, if anyone has an idea about the fixtures problem (only seen in py36) i will be happy to hear it14:26
rosmaitafinal announcement:14:26
rosmaitaxena midcycle-2 coming up at R-914:26
rosmaitaWednesday 4 August 2021, 1400-1600 UTC14:26
rosmaitaso make sure your calendar is blocked off for fun & excitement14:26
rosmaitaand, add topics to the etherpad:14:27
rosmaita#link https://etherpad.opendev.org/p/cinder-xena-mid-cycles14:27
rosmaitathat's all for announcements, other than that i will be away for a week starting next wednesday14:27
rosmaitawhoami-rajat has volunteered to run the meeting14:28
rosmaitaok, on to real topics14:28
rosmaita#topic rbd: snapshot mirroring for ceph backup driver 14:28
rosmaitaMr_Freezeex: that's you14:28
Mr_FreezeexHi, so this is a followup discussion on the addition of snapshot replication discussed in last weekly meetings14:28
rosmaita#link https://review.opendev.org/c/openstack/cinder/+/78495614:28
Mr_FreezeexThe snapshot mirroring was denied and I am here to present why I think it's nice to have that in cinder-backup14:29
Mr_FreezeexSo the first thing is that you can't enable snapshot replication on the whole pool like with journaling; you have to do that per image. So this either needs to be done directly in cinder-backup or with external automation.14:29
Mr_FreezeexNow if we compare the two modes: it's true that the writes are synchronously written to the journal, but they are not synchronously replayed to the other cluster.14:30
Mr_FreezeexWith journaling the replication is consistent per write, and with snapshots it's consistent per snapshot. But for backups, I don't see a real advantage to per-write consistency.14:31
Mr_FreezeexSo IMO both alternatives are equally safe for backups, and the snapshot option well exceeds journaling replication in terms of client performance (so backup speed in our case) and replication speed.14:31
Mr_FreezeexIf you have any questions/concerns, feel free!14:32
geguileoMr_Freezeex: I haven't tried the feature, and I'm looking for the doc where it says that it cannot do whole pool configuration for snapshot replication...14:33
geguileodo you have a link, location where the docs says it?14:33
geguileoor is it missing from the docs?14:34
Mr_FreezeexWell it's not in the docs, but the rbd command to enable replication on the whole pool assumes journaling14:34
Mr_Freezeexthere is no option to say either snapshot or journal14:34
Mr_FreezeexAnd the default in ceph is journaling so...14:34
geguileoMr_Freezeex: that mirroring mode may be the default, but I believe you can change it14:35
geguileoright?14:35
Mr_FreezeexHmm I don't think so but I am maybe wrong14:36
Mr_Freezeexat least it doesn't appear to be documented anywhere...14:36
geguileomy bad14:37
geguileopool (default): When configured in pool mode, all images in the pool with the journaling feature enabled are mirrored.14:37
Mr_FreezeexAh indeed14:37
geguileooooooh, wait, but you can configure the mirror-snapshot to work on a per-pool level14:38
geguileowhich should solve the problem, I would think14:38
geguileojbernard: you around to chime in on this? ^14:38
Mr_FreezeexOh ok, where is that documented?14:38
geguileoMirror-snapshots can also be automatically created on a periodic basis if mirror-snapshot schedules are defined. The mirror-snapshot can be scheduled globally, per-pool, or per-image levels.14:39
geguileohttps://docs.ceph.com/en/latest/rbd/rbd-mirroring/#create-image-mirror-snapshots14:39
geguileonot 100% sure that is all that's needed14:39
Mr_FreezeexAh yes ok but this only creates snapshots on images that have the snapshot replication mode enabled14:40
geguileoignore it: When using snapshot-based mirroring, mirror-snapshots will need to be created whenever it is desired to mirror the changed contents of the RBD image14:40
geguileoyup14:40
geguileook, so in that light, it makes sense to support it in the backups14:40
geguileoI find it odd that there is no way to do that by default14:41
Mr_FreezeexSpeaking of mirror snapshots on the whole pool: I assumed this would be the go-to for backups in my patch, but I will probably add a quick addition to do a mirror snapshot after every data transfer so you would not even have to enable that!14:42
Mr_FreezeexYeah but I think they want to deprecate enabling replication on the whole pool, or at least it is not really recommended14:42
rosmaitai'm trying to remember what the objection to using this for backups was last week, it was something to do with an RPO gap14:43
geguileoMr_Freezeex: what do you mean do a mirror snapshot after every data transfer?14:43
Mr_Freezeexgeguileo: When the backup is finished do a mirror snapshot14:44
geguileorosmaita: my main complaint was that this should be done like the journal-based one, at the Ceph end, and cinder shouldn't need to know14:44
Mr_FreezeexBasically even an empty snapshot has a (small) cost on the rbd-mirror daemon so that's an easy optimisation14:44
geguileoMr_Freezeex: that is what's necessary, right? otherwise it won't be mirrored14:45
Mr_Freezeexgeguileo: Yes, you have to create mirror snapshots. One option is to do that with the mgr and let ceph do it, but creating the mirror snapshot from cinder-backup would be nice, because if we don't do that, even backups that haven't changed for years would have those "useless" snapshots being created.14:47
geguileoMr_Freezeex: so if I understand correctly, we could not do it in Cinder and let storage admin set the scheduler in the backend, right?14:48
geguileoand if I understand correctly even the number of snapshots per image can be configured in the cluster, so it can be changed to 114:49
Mr_Freezeexgeguileo: yes, but that's an easy thing to do in cinder (like ~10 lines). You just have to call the function to create a mirror snapshot and let ceph do its thing in the backend14:49
Mr_FreezeexYes but this is managed internally by ceph14:49
geguileoyeah, but that was the complaint, if it can be done in the backend, we don't want to add that on our end14:49
*** dviroel is now known as dviroel|lunch14:50
rosmaitaok, so the discussion last week was that it made sense to configure journaling vs. snapshot mirroring in cinder ... for backups, do we want to have no option, and just do snapshot mirroring if it's available?14:50
geguileoif we create the snapshots, then we are responsible for the snapshots14:50
geguileorosmaita: my complain about backups is that afaik an admin can do it all on the Ceph side14:50
geguileorosmaita: like they are currently doing for journaling14:50
geguileos/journaling/journal-mirroring14:51
rosmaitaok, this has until M-3 to merge, so we should continue this discussion later14:51
geguileoso if that can be done as well for snapshot-based mirroring14:51
geguileowhy add code to cinder just for that?14:51
geguileook14:51
rosmaitai am all for not adding unnecessary code to cinder!14:51
geguileoMr_Freezeex: we can continue the discussion on the cinder channel after the meeting14:52
Mr_Freezeexyes sure!14:52
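(For readers not deep in the RBD API, what Mr_Freezeex is proposing boils down to two per-image calls after a backup finishes. A rough sketch, assuming the python-rbd bindings from Ceph Octopus or later expose these snapshot-mirroring calls; the conffile path, pool name, and image name are placeholders.)

```python
# Rough sketch, not the proposed cinder-backup patch.  Assumes python-rbd from
# Ceph Octopus (15.x) or later, which is believed to expose these calls.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')   # placeholder path
cluster.connect()
try:
    ioctx = cluster.open_ioctx('backups')                # placeholder pool
    try:
        with rbd.Image(ioctx, 'volume-1234.backup.5678') as image:  # placeholder image
            # Snapshot-based mirroring must be enabled per image; pool-mode
            # mirroring implies journaling, which is the limitation discussed above.
            image.mirror_image_enable(rbd.RBD_MIRROR_IMAGE_MODE_SNAPSHOT)
            # Take a mirror-snapshot so the data just written by the backup is
            # queued for replication, instead of relying on a schedule that
            # would also keep snapshotting backups that never change.
            image.mirror_image_create_snapshot()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```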
rosmaitaok, geguileo you are up next14:52
rosmaita#topic meaning of "available" status for a volume? 14:52
geguileorosmaita: thanks14:52
geguileoso I'm working on fixing a race condition on the new attachment API, specifically on the delete operation14:53
geguileoand while looking at the code around it, I saw some inconsistencies14:53
geguileosome with how we worked before14:53
geguileosome on their own14:53
geguileofor example, if a remove export fails when deleting the attachment, then we don't fail the operation14:53
geguileothough we set the attach_status of the attachment to error_detaching14:54
geguileosince we don't fail, the volume will go into available and detached (if there are no additional attachments on that volume)14:54
geguileoso we end up with a detach operation that according to the old attachment API should have failed14:54
geguileobut now succeeds14:54
geguileothat's not a big deal, since on delete we now ensure we call the remove export, just something that we have changed, and sounds a bit weird14:55
geguileothe bad part is that now we have a volume that has an attachment, yet is in available and detached state14:55
whoami-rajatgeguileo, I'm not clear with state "available and detached", i assume you mean status=available and attach_status=detached but you said attach_status is set as error_detaching14:56
geguileowhoami-rajat: attach_status was for the attachment itself14:56
geguileowhoami-rajat: whereas the available and detached is like you described, which is for the volume14:56
whoami-rajatok got it14:57
geguileowhich is kind of weird, as one would assume that an available volume has no attachment records14:57
geguileoso there are 2 ways to look at the available status:14:57
geguileo1- available => has no attachments, on any state (this was the old way)14:58
geguileo2- available => there is no-one using the volume (this is the new way)14:58
geguileoso that is my first concern (of many around this code)14:58
geguileoif we want to keep current behavior we need to document it14:59
rosmaita(btw ... horizon now meets in their own channel, so we can continue this discussion right here)14:59
geguileogreat!14:59
geguileothough I feel like I put everyone to sleep14:59
rosmaitawe are trying to digest what you are saying15:00
geguileook, thanks15:00
whoami-rajatgeguileo, since you mentioned we handle the delete export part during volume delete, doesn't it make sense to delete the attachment record rather than setting the status=error_detaching?15:00
geguileowhoami-rajat: yes, it would make sense to me15:00
whoami-rajati know it doesn't sound good but keeping an attachment record which is of no use doesn't sound good either15:00
geguileooh, no, it sounds a lot better than what we have now!!15:01
whoami-rajatgreat15:01
geguileoI'm ok with doing that15:01
geguileowhat do others think?15:01
whoami-rajatregarding your question, i prefer 1) since there might be complications with a stale attachment record during multiattach or even with normal volumes (that i can't think of right now)15:02
rosmaitageguileo: so when you say "new way", you mean this is the implication of the current code, not that you are proposing that it is good15:03
geguileowhoami-rajat: the only time we set error_detaching on the attachments is that one, so if we delete it, we don't have to worry about that state anymore on the attachments themselves15:04
geguileorosmaita: correct, this is the new attachment API current behavior15:04
geguileonot something I want to do15:04
rosmaitaok, got it15:06
whoami-rajatgeguileo, great, that solves all the problem then15:06
rosmaitai agree with whoami-rajat that 1 is preferable15:06
geguileook, then when we have an error_attaching case we have to change the status of the volume and the attachment_status15:07
geguileocurrently there are at least 2 cases where a volume's status will go into available detached even when having an attachment with error_attaching attachment_status15:07
geguileoand the "fun" part is that error_attaching will only happen if we have mapped and exported the volume15:08
whoami-rajatlot of cleanup needed!15:09
geguileoif the export succeeds and initialize fails => available detached15:09
hemnaso in that case unexport needs to be called?15:10
geguileoif the export and initialize succeed and attach_volume fails => available  detached and the attachment.attach_status=error_attaching15:10
whoami-rajatIMO those attachment records only make sense if we could reuse them or reset their state to get the volume attached somehow with the same attachment ID (or maybe for tracking purposes)15:10
geguileowhoami-rajat: attachment records also help openstack keep the information where it should be15:11
geguileoso we can terminate connections even when nova doesn't exist15:11
geguileohemna: in the new API we don't consider having a volume left as exported "a problem"15:12
geguileowe call it on delete before the actual driver.delete call15:12
geguileoin my opinion the error_attaching must be taken into consideration15:13
whoami-rajatgeguileo, i was referring to error attachment records15:13
hemnawell, except that the backend may have a volume exported when it shouldn't be at that point15:13
geguileobecause the volume will be exported and mapped15:13
geguileowhoami-rajat: oh, those15:13
geguileowhoami-rajat: with the change we agreed we would only have attachment_error record15:13
geguileoI mean the only error kind15:14
geguileoand that is necessary to state that whatever we have there should not be used15:14
geguileoimo15:14
geguileoand we shouldn't delete an attachment record on our own on an "attach" (which is an update) request15:14
whoami-rajatack, makes sense15:15
geguileoso I would like to reach an agreement on:15:15
geguileo- If a volume has been exported & mapped, but the call to attach_volume fails, what should the volume's status and attach_status be?15:16
whoami-rajatgeguileo, should we consider cases of multiattach and normal volumes separately?15:17
geguileo- If a volume is exported but mapping fails (it could be actually mapped), should we change the attachment.attach_status to attaching_error?15:17
geguileowhoami-rajat: yes, I'm talking only about single attachment15:18
geguileobecause for multi-attach there is a priority on the attachment status15:18
geguileoI mean on the attachments' attachment_status15:18
hemnafwiw if initialize_connection fails remove_export is already called15:19
geguileoyup15:21
geguileothanks15:21
geguileowhoami-rajat: for multi attach: attachments priority is: attached > attaching > detaching > reserved15:22
geguileoso the volume's status and attach_status are set according to the highest-priority state that exists among those15:22
geguileoif there is one reserved and one attached => volume goes into in-use attached15:23
rosmaitaok, that makes sense15:24
geguileobtw, if the initialize fails, the remove export could also fail because the mapping may have succeeded in the backend15:24
geguileorosmaita: that is the current behavior, but ignores the error_attaching case15:24
geguileoand that is a volume that is exported & mapped!!15:24
whoami-rajatgeguileo, ok, and these states have higher priority than error attachment records, eg: reserved > error_attaching15:24
geguileoso an exported & mapped volume stays as available detached15:24
geguileowhoami-rajat: error_* are literally ignored15:25
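(To make the priority rule concrete, here is a toy illustration — not Cinder's actual code — of how a volume's attach_status falls out of its attachment records, and why error_* records drop out of the picture entirely.)

```python
# Toy model of the behaviour described above: the "highest" attachment state
# wins, and error_* attachment states are ignored -- which is how an exported
# & mapped volume with an error_attaching attachment can still read as detached.
ATTACH_PRIORITY = ('attached', 'attaching', 'detaching', 'reserved')


def derive_attach_status(attachment_statuses):
    for state in ATTACH_PRIORITY:
        if state in attachment_statuses:
            return state
    # Only error_* (or no) attachments left: treated as if nothing is attached.
    return 'detached'


# One 'reserved' plus one 'attached' attachment -> the volume reads as attached.
assert derive_attach_status(['reserved', 'attached']) == 'attached'
# An 'error_attaching' attachment alone is ignored -> the volume reads as detached.
assert derive_attach_status(['error_attaching']) == 'detached'
```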
whoami-rajatgeguileo, i don't see any cleanup being done if driver's attach volume call fails15:25
geguileowhoami-rajat: there is none15:25
whoami-rajatgeguileo, oh, that's good i guess15:25
whoami-rajathmm15:25
geguileobut even if there were, it could fail15:25
whoami-rajat:/15:26
geguileoso the question is, how do we treat the attaching_error on the attachment?15:26
geguileotowards the volume's status and attachment_status15:27
geguileowell, I seem to have saturated everyone with so much stuff15:28
geguileoso I'll take the win of agreement on the removal of the error_detaching status15:28
geguileoand leave the error_attaching questions for another year15:28
geguileoor the mid-cycle, or PTG, or whatever15:29
whoami-rajatgeguileo, maybe we can discuss it in upcoming virtual PTG, good to add a topic there15:29
whoami-rajatvirtual mid-cycle i mean15:29
geguileowill add it15:29
rosmaitageguileo: can you give a quick summary of the error_detaching agreement?15:29
geguileosure15:29
rosmaitaand yes, good topic for virtual midcycle15:29
geguileothe error_detaching state only happens when the remove_export fails15:30
geguileoand in that case the REST API response is that the attachment delete has successfully been completed15:30
geguileoit's not a problem that the volume hasn't been unexported because we'll call it on the volume deletion15:31
geguileoso we are going to be consistent with the REST API response, and ignore remove_export errors altogether15:31
geguileoso we try, if we fail, we postpone until deletion or until another attachment delete on this volume15:31
geguileoby ignoring the error we delete the attachment regardless of whether the remove_export succeeded or not15:32
geguileoinstead of leaving it in error_detaching state15:32
geguileowhich wasn't helping anyone and was weird15:32
geguileobecause cinder was saying that it was deleted, yet it was still there15:32
geguileorosmaita: that would be my crazy long summary   lol15:33
rosmaitathat is helpful, thanks15:33
geguileonp15:33
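(In code terms, the agreement summarized above amounts to roughly the following. This is only a sketch — the real attachment delete flow in cinder has more moving parts, and the names here are simplified.)

```python
# Sketch of the agreed behaviour, not the actual cinder.volume.manager code.
def attachment_delete(context, attachment, volume, driver):
    try:
        driver.remove_export(context, volume)
    except Exception:
        # Non-fatal: the export is removed again on volume delete (or on a
        # later attachment delete), which keeps us consistent with the REST
        # API response that already reported success.
        pass
    # Delete the record instead of parking it in error_detaching, so an
    # "available" volume no longer carries a leftover attachment.
    attachment.destroy()
```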
rosmaitaok, thanks to everyone who stuck around for this week's XL meeting15:33
rosmaitadon't forget the festival of XS reviews on friday15:33
enriquetasoyes, really helpful 15:33
rosmaitaalso15:33
rosmaitaplease look at stephenfin's comments in the agenda about sqlalchemy-migrate -> alembic migration15:33
rosmaitaif we don't get it done this cycle, the cinder project will be stuck maintaining sqlalchemy-migrate15:33
rosmaita(and we don't want that!)15:33
rosmaitaso please start prioritizing reviews of the alembic migration patches as soon as v2 is removed from cinder15:33
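(For anyone who has not reviewed an Alembic-based change before, the migrations being proposed follow the standard Alembic script shape. A generic sketch with placeholder revision ids and column names — not one of the actual cinder patches.)

```python
"""add example column

Revision ID: aaaabbbbcccc   (placeholder)
Revises: ddddeeeeffff       (placeholder)
"""
from alembic import op
import sqlalchemy as sa

revision = 'aaaabbbbcccc'
down_revision = 'ddddeeeeffff'


def upgrade():
    # Placeholder column on a real table name; the actual patches port the
    # existing sqlalchemy-migrate scripts rather than adding new columns.
    op.add_column('volumes', sa.Column('example', sa.String(255), nullable=True))
```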
rosmaitaand rajat, you can discuss stable releases next week15:34
enriquetasohey, can I ask for eyes on (reviews) https://review.opendev.org/c/openstack/cinder/+/67913815:34
enriquetaso:P15:34
geguileowhoami-rajat: sorry, for taking most of the time15:34
rosmaitathanks everyone!15:34
stephenfinYes please. Let me know if anything is unclear. I've tried to keep things as concise as possible15:34
geguileothanks everyone!15:34
enriquetasothanks!15:34
whoami-rajatgeguileo, np, it was a good and important discussion15:34
rosmaita#endmeeting15:34
opendevmeetMeeting ended Wed Jul 14 15:34:35 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:34
opendevmeetMinutes:        https://meetings.opendev.org/meetings/cinder/2021/cinder.2021-07-14-14.00.html15:34
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/cinder/2021/cinder.2021-07-14-14.00.txt15:34
opendevmeetLog:            https://meetings.opendev.org/meetings/cinder/2021/cinder.2021-07-14-14.00.log.html15:34
geguileostephenfin: thanks!15:34
whoami-rajatthanks everyone!15:34
*** dviroel|lunch is now known as dviroel15:39
*** dviroel is now known as dviroel|out22:01
