opendevreview | XuQi proposed openstack/cinder master: Fujitsu Driver: Change the function of attach/detach https://review.opendev.org/c/openstack/cinder/+/860997 | 03:40 |
*** amoralej is now known as amoralej|off | 06:33 |
*** amoralej|off is now known as amoralej | 06:37 |
opendevreview | Masayuki Igawa proposed openstack/cinder stable/ussuri: Add warning message about slow volume backend https://review.opendev.org/c/openstack/cinder/+/861057 | 07:34 |
opendevreview | Raghavendra Tilay proposed openstack/cinder master: HPE 3PAR: test - please ignore https://review.opendev.org/c/openstack/cinder/+/861001 | 10:28 |
*** amoralej is now known as amoralej|lunch | 12:14 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Deleting a volume in 'downloading' state https://review.opendev.org/c/openstack/cinder/+/826607 | 12:28 |
*** amoralej|lunch is now known as amoralej | 13:08 |
hemna | so, fwiw, I filed this one yesterday https://bugs.launchpad.net/cinder/+bug/1992493 as a placeholder for some major problems I face daily with customers using our deployment | 14:22 |
hemna | customers end up unable to do basic operations on volumes because cinder can't handle the case where a pool is full but many other pools in the same backend still have capacity available. | 14:24 |
hemna | this is turning into a major problem in our deployments. can't do backups, clones, snapshots, or extends | 14:24 |
hemna | as a solution we have patched cinder to attempt a migration when an extend fails, and that has worked | 14:25 |
hemna | snapshots are a bit more difficult as cinder doesn't allow migrations for volumes that have existing snaps | 14:25 |
hemna | and the first thing a backup does is snap the (attached) volume | 14:26 |
hemna | which fails | 14:26 |
hemna | so most of the cinder deployment is completely inoperative for customers in this state | 14:26 |
jbernard | hemna: are the other, non-full, pools considered to be equal to the full one in that particular deployment? | 14:38 |
hemna | all pools are available for provisioning in the same backend | 14:38 |
hemna | there is plenty of space in the backend, but certain pools are full and can't take any more. the volumes being backed up, cloned, or snapshotted are typically on the full pools, as they have been around the longest | 14:39 |
hemna | those operations shouldn't fail | 14:39 |
jbernard | i see, migrating seems like the solution there, could we not also migrate the snaps along with the volume? | 14:40 |
hemna | so what ends up happening is the customer complains, files a ticket, then I have to manually migrate the volume in question so they can do the backup/snap/clone/etc | 14:40 |
hemna | that process isn't 'cloud-like', nor scalable | 14:40 |
jbernard | what happens to the volume's snaps? | 14:40 |
hemna | in our case, snaps are full clones | 14:40 |
hemna | so, those could also get migrated | 14:41 |
hemna | but cinder just quits and says no to everything | 14:41 |
jbernard | would an operator complain about an automigrate function? it seems like a really nice thing to have imo | 14:41 |
hemna | for example, in one backend we have 52 pools, 22 of which are full. | 14:42 |
hemna | I think migrate should be built into these operations. | 14:42 |
hemna | if it fails to find a host for the operation, try to find a host to migrate it to on the same backend | 14:43 |
jbernard | in considering an automatic migration, how would you select the target pool? random? | 14:43 |
hemna | just send it back through the scheduler and tell it to find a pool in the same backend | 14:43 |
jbernard | ahh | 14:43 |
hemna | that's what we did for our patch for extend | 14:43 |
jbernard | with available capacity | 14:43 |
hemna | extend fails because the current pool is full or can't take the new size. so we ask the scheduler to migrate it with the new size, it picks one and migrates it, then we extend after the migration completes | 14:44 |
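A minimal sketch of the fallback hemna describes here (extend fails on a full pool, re-schedule within the same backend, migrate, then retry the extend). The names `PoolFullError`, `scheduler.find_pool`, and the `migrate` callable are illustrative stand-ins, not real Cinder interfaces and not the actual sapcc patch linked later in the log:

```python
class PoolFullError(Exception):
    """Raised when the current pool cannot absorb the requested capacity."""


def extend_with_migration_fallback(volume, new_size, driver, scheduler, migrate):
    """Extend `volume`; if its pool is full, migrate within the backend and retry.

    `driver`, `scheduler`, and `migrate` are hypothetical stand-ins for the
    volume driver, the scheduler API, and a blocking migration call.
    """
    try:
        driver.extend_volume(volume, new_size)
        return
    except PoolFullError:
        pass  # current pool can't take the new size; fall through to migration

    # Cinder host strings look like 'host@backend#pool'; keeping only
    # 'host@backend' pins the scheduler request to sibling pools of the
    # same backend, as hemna suggests.
    backend = volume['host'].split('#')[0]
    dest_pool = scheduler.find_pool(backend=backend, min_free_gb=new_size)
    if dest_pool is None:
        raise PoolFullError('no sibling pool can hold %d GiB' % new_size)

    migrate(volume, dest_pool)               # blocking move to the chosen pool
    driver.extend_volume(volume, new_size)   # retry once the volume has landed
```

Scheduling with the post-extend size up front avoids picking a pool that could hold the volume now but not after the extend completes.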
hemna | cinder should be doing that for all the operations where it can't find a host on a particular backend | 14:44 |
jbernard | that would definitely be more consistent | 14:45 |
hemna | otherwise most of the real world operations are broken for users | 14:45 |
hemna | I have 20+ deployments of cinder around the world. doing manual migrations isn't a 'solution' | 14:46 |
jbernard | yikes | 14:46 |
jbernard | ptg is in a few days, this might be a good time to air this idea across the group | 14:47 |
hemna | yah, I'll bring it up, which is also why I filed the bug. the bug was unfortunately labeled as medium. | 14:47 |
hemna | I think it's a high priority. cinder isn't really cloud-like software if human intervention is needed for basic operations. | 14:48 |
jbernard | enriquetaso: ^ this might be worth considering for https://bugs.launchpad.net/cinder/+bug/1992493 | 14:48 |
hemna | I think this was just a big oversight when we enabled pools way back when. | 14:48 |
hemna | heh, the bug was actually labeled 'wishlist' ! | 14:49 |
hemna | smh | 14:49 |
* enriquetaso reading | 14:53 |
hemna | fwiw, this is what we did to fix extend https://github.com/sapcc/cinder/pull/134 | 14:54 |
*** dviroel_ is now known as dviroel | 14:55 | |
enriquetaso | I mentioned 1992493 at yesterday's bug meeting. I'll re-target it to High priority. I think the best idea is to discuss it at the PTG; please add a topic, hemna. | 15:02 |
hemna | yes, we need to discuss it. thank you | 15:02 |
enriquetaso | Thanks jbernard hemna | 15:03 |
hemna | posted it in the etherpad | 15:28 |
*** tkajinam is now known as Guest2971 | 15:43 | |
*** amoralej is now known as amoralej|off | 16:16 | |
enriquetaso | hemna++ | 16:42 |
*** dviroel is now known as dviroel|biab | 19:55 | |
*** dviroel|biab is now known as dviroel | 20:51 | |
*** dviroel is now known as dviroel|afk | 21:33 |