opendevreview | XuQi proposed openstack/cinder master: Fujitsu Driver: Change the function of attach/detach https://review.opendev.org/c/openstack/cinder/+/860997 | 03:40 |
*** amoralej is now known as amoralej|off | 06:33 |
*** amoralej|off is now known as amoralej | 06:37 |
opendevreview | Masayuki Igawa proposed openstack/cinder stable/ussuri: Add warning message about slow volume backend https://review.opendev.org/c/openstack/cinder/+/861057 | 07:34 |
opendevreview | Raghavendra Tilay proposed openstack/cinder master: HPE 3PAR: test - please ignore https://review.opendev.org/c/openstack/cinder/+/861001 | 10:28 |
*** amoralej is now known as amoralej|lunch | 12:14 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Deleting a volume in 'downloading' state https://review.opendev.org/c/openstack/cinder/+/826607 | 12:28 |
*** amoralej|lunch is now known as amoralej | 13:08 |
hemna | so, fwiw, I filed this one yesterday https://bugs.launchpad.net/cinder/+bug/1992493 as a placeholder for some major problems I face daily with customers using our deployment | 14:22 |
hemna | customers end up unable to do basic operations on volumes because cinder can't handle the case where a pool is full but many other pools in the same backend still have capacity available. | 14:24 |
hemna | this is turning into a major problem in our deployments. can't do backups, clones, snapshots, or extends | 14:24 |
hemna | as a solution we have patched cinder to attempt a migration when an extend fails, and that has worked | 14:25 |
hemna | snapshots are a bit more difficult as cinder doesn't allow migrations for volumes that have existing snaps | 14:25 |
hemna | and the first thing a backup does is snap the (attached) volume | 14:26 |
hemna | which fails | 14:26 |
hemna | so most of the cinder deployment is completely inoperative for customers in this state | 14:26 |
jbernard | hemna: are the other, non-full, pools considered to be equal to the full one in that particular deployment? | 14:38 |
hemna | all pools are available for provisioning in the same backend | 14:38 |
hemna | there is plenty of space in the backend, but certain pools are full and can't take any more. the volumes being backed up, cloned, or snapshotted are typically on the full pools, as they have been around the longest | 14:39 |
hemna | those operations shouldn't fail | 14:39 |
jbernard | i see, migrating seems like the solution there, could we not also migrate the snaps along with the volume? | 14:40 |
hemna | so what ends up happening is the customer complains, files a ticket, then I have to manually migrate the volume in question so they can do the backup/snap/clone/etc | 14:40 |
hemna | that process isn't 'cloud-like', nor scalable | 14:40 |
jbernard | what happens to the volume's snaps? | 14:40 |
hemna | in our case, snaps are full clones | 14:40 |
hemna | so, those could also get migrated | 14:41 |
hemna | but cinder just quits and says no to everything | 14:41 |
jbernard | would an operator complain about an automigrate function? it seems like a really nice thing to have imo | 14:41 |
hemna | for example, in one backend we have 52 pools, 22 of which are full. | 14:42 |
hemna | I think migrate should be built into these operations. | 14:42 |
hemna | if it fails to find a host for the operation, try to find a host to migrate it to on the same backend | 14:43 |
jbernard | in considering an automatic migration, how would you select the target pool? random? | 14:43 |
hemna | just send it back through the scheduler and tell it to find a pool in the same backend | 14:43 |
jbernard | ahh | 14:43 |
hemna | that's what we did for our patch for extend | 14:43 |
jbernard | with available capacity | 14:43 |
hemna | extend fails because the current pool is full or can't take the new size. so we ask the scheduler to migrate it with the new size, it picks one and migrates it, then we extend after the migration completes | 14:44 |
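A minimal sketch of the fallback hemna describes here (extend fails on a full pool, re-schedule within the same backend, migrate, then retry the extend). The names `PoolFullError`, `scheduler.find_pool`, and the `migrate` callable are illustrative stand-ins, not real Cinder interfaces and not the actual sapcc patch linked later in the log:

```python
class PoolFullError(Exception):
    """Raised when the current pool cannot absorb the requested capacity."""


def extend_with_migration_fallback(volume, new_size, driver, scheduler, migrate):
    """Extend `volume`; if its pool is full, migrate within the backend and retry.

    `driver`, `scheduler`, and `migrate` are hypothetical stand-ins for the
    volume driver, the scheduler API, and a blocking migration call.
    """
    try:
        driver.extend_volume(volume, new_size)
        return
    except PoolFullError:
        pass  # current pool can't take the new size; fall through to migration

    # Cinder host strings look like 'host@backend#pool'; keeping only
    # 'host@backend' pins the scheduler request to sibling pools of the
    # same backend, as hemna suggests.
    backend = volume['host'].split('#')[0]
    dest_pool = scheduler.find_pool(backend=backend, min_free_gb=new_size)
    if dest_pool is None:
        raise PoolFullError('no sibling pool can hold %d GiB' % new_size)

    migrate(volume, dest_pool)               # blocking move to the chosen pool
    driver.extend_volume(volume, new_size)   # retry once the volume has landed
```

Scheduling with the post-extend size up front avoids picking a pool that could hold the volume now but not after the extend completes.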
hemna | cinder should be doing that for all the operations where it can't find a host on a particular backend | 14:44 |
jbernard | that would definitely be more consistent | 14:45 |
hemna | otherwise most of the real world operations are broken for users | 14:45 |
hemna | I have 20+ deployments of cinder around the world. doing manual migrations isn't a 'solution' | 14:46 |
jbernard | yikes | 14:46 |
jbernard | ptg is in a few days, this might be a good time to air this idea across the group | 14:47 |
hemna | yah, I'll bring it up, which is also why I filed the bug. the bug was unfortunately labeled as medium. | 14:47 |
hemna | I think it's a high priority. cinder isn't really cloud-like software if human intervention is needed for basic operations. | 14:48 |
jbernard | enriquetaso: ^ this might be worth considering for https://bugs.launchpad.net/cinder/+bug/1992493 | 14:48 |
hemna | I think this was just a big oversight when we enabled pools way back when. | 14:48 |
hemna | heh, the bug was actually labeled 'wishlist' ! | 14:49 |
hemna | smh | 14:49 |
* enriquetaso reading | 14:53 |
hemna | fwiw, this is what we did to fix extend https://github.com/sapcc/cinder/pull/134 | 14:54 |
*** dviroel_ is now known as dviroel | 14:55 | |
enriquetaso | I mentioned 1992493 at yesterday's bug meeting. I'll re-target it to High priority. I think the best idea is to discuss it at the PTG; please add a topic, hemna. | 15:02 |
hemna | yes, we need to discuss it. thank you | 15:02 |
enriquetaso | Thanks jbernard hemna | 15:03 |
hemna | posted it in the etherpad | 15:28 |
*** tkajinam is now known as Guest2971 | 15:43 | |
*** amoralej is now known as amoralej|off | 16:16 | |
enriquetaso | hemna++ | 16:42 |
*** dviroel is now known as dviroel|biab | 19:55 | |
*** dviroel|biab is now known as dviroel | 20:51 | |
*** dviroel is now known as dviroel|afk | 21:33 |