opendevreview | Arun KV proposed openstack/cinder master: Reintroduce DataCore driver https://review.opendev.org/c/openstack/cinder/+/836996 | 05:06 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for snapshot os-reset_status https://review.opendev.org/c/openstack/cinder/+/804035 | 05:10 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group os-reset_status https://review.opendev.org/c/openstack/cinder/+/804735 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group-snapshot os-reset_status https://review.opendev.org/c/openstack/cinder/+/804757 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for backup os-reset_status https://review.opendev.org/c/openstack/cinder/+/778193 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Include volume type constraints in internal API https://review.opendev.org/c/openstack/cinder/+/846146 | 10:22 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 11:28 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 11:30 |
*** dviroel_ is now known as dviroel | 12:05 |
tosky | whoami-rajat, geguileo: any idea on how to unbreak https://review.opendev.org/c/openstack/cinder/+/845799 (failure caused by https://review.opendev.org/c/openstack/os-brick/+/834604 )? | 12:45 |
geguileo | tosky: that's the cinderlib failure, right? | 12:46 |
geguileo | it is | 12:46 |
tosky | maybe you discussed this already on Friday but I'm catching up with emails | 12:47 |
geguileo | tosky: whoami-rajat told me about the failure and I started investigating | 12:48 |
geguileo | in my eyes the failure is caused by privsep not using the venv's libraries and using the host's ones | 12:48 |
geguileo | so fixing that would be the right fix | 12:48 |
geguileo | to unblock the gate I think we have 2 options | 12:49 |
geguileo | 1- Change the .zuul jobs (the 2 that fail: LVM & Ceph) in that stable branch to use os-brick from source | 12:49 |
geguileo | 2- change the cinderlib tox.ini in that branch to use the released os-brick version instead of using the one from master | 12:50 |
geguileo | those are the 2 simple fixes that I think can unblock the gate | 12:50 |
geguileo | I'm currently looking into the proper solution | 12:50 |
tosky | I think 2 is the one I would personally choose first | 13:02 |
rosmaita | geguileo: fwiw, i agree with tosky about option #2 | 13:12 |
rosmaita | i think we need to have a cinder-project-deliverables CI review at the next midcycle | 13:13 |
rosmaita | i'm worried about not testing os-brick properly in the gate, but at the same time i want to avoid the situation glance had, where a glance CI fix could not merge because glance_store CI was broken, and the glance_store CI fix could not merge because glance CI was broken | 13:14 |
rosmaita | luckily, we were able to make a devstack change that fixed both | 13:15 |
rosmaita | but it's not a good situation to be in | 13:15 |
whoami-rajat | agree with the majority here, we should modify cinderlib instead of the jobs consuming it, else we might start facing similar issues in other jobs and end up patching jobs again and again | 13:16 |
whoami-rajat | rosmaita, I face that issue every cycle, is it because of the functional job we run on glance_store gate? | 13:17 |
whoami-rajat | glance functional job on glance_store gate | 13:18 |
rosmaita | whoami-rajat: i am not sure, i haven't looked into it carefully | 13:18 |
whoami-rajat | ack | 13:18 |
whoami-rajat | geguileo, IIUC, we still have an issue with how cinderlib is using privsep in the gate, right? since the patch in os-brick stable/wallaby broke cinderlib tests, if os-brick gets released with that patch, cinderlib tests are going to break again right? | 13:21 |
geguileo | whoami-rajat: cinderlib tests work fine on its gate | 13:21 |
geguileo | they only break in the Cinder gate | 13:21 |
whoami-rajat | geguileo, i mean on cinder gate | 13:21 |
whoami-rajat | yes | 13:22 |
geguileo | oh, I forgot an additional way to fix the issue: make an os-brick release | 13:22 |
geguileo | afaik the issue right now is caused by the host having an older version of os-brick than the one in the virtual env | 13:22 |
geguileo | cinderlib sees the os-brick version in the virtual env and calls privsep to execute the code | 13:23 |
geguileo | and privsep is using the older os-brick version (from pip) | 13:23 |
geguileo | this can be: 1- Because os-brick is not setting up privsep correctly 2- Because cinderlib is somehow calling the privsep daemon that Cinder started | 13:24 |
geguileo | I don't think the second option is possible... Because privsep should be generating a random new directory and a new socket inside of it each time | 13:25 |
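(A minimal diagnostic sketch of the failure mode geguileo describes: compare the os-brick version visible to the venv with the one the privsep daemon actually imports. The `os_brick.privileged.default` entrypoint wiring is assumed from how os-brick exposes privsep; treat it as illustrative, not a verified reproduction of the gate job.)

```python
# Diagnostic sketch: compare the os-brick the venv imports with the one the
# privsep daemon imports. If privsep is spawned with the host's sys.path, the
# two versions will disagree, which is the suspected failure mode here.
from importlib import metadata

from os_brick import privileged as brick_privileged  # import path assumed


@brick_privileged.default.entrypoint
def _privsep_brick_version():
    # Runs inside the privsep daemon, so it reports whatever os-brick that
    # process can actually import (host install vs. virtualenv).
    return metadata.version('os-brick')


def check_brick_versions():
    local = metadata.version('os-brick')   # version the venv/cinderlib sees
    remote = _privsep_brick_version()      # version the privsep daemon sees
    print(f'venv os-brick: {local}, privsep os-brick: {remote}')
    return local == remote
```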
whoami-rajat | wasn't able to understand all the details but looks like we've an issue somewhere with our usage of privsep (probably os-brick as you said) | 13:27 |
geguileo | whoami-rajat: no, not os-brick, this is 99.99% sure on the cinderlib side (code, config, job config, etc) | 13:36 |
geguileo | I am testing things locally and it works fine, and privsep called with rootwrap has the right sys.path to search for libraries (at least in my system)... | 13:36 |
whoami-rajat | ok, you said "1- Because os-brick is not doing correctly the privsep" so got confused | 13:36 |
geguileo | oh, yeah, that was a possibility 10 minutes ago lol | 13:37 |
geguileo | but according to my local tests doesn't look like that's the case... | 13:38 |
whoami-rajat | :D ack | 13:38 |
whoami-rajat | thanks for looking into it, till then we can have the workaround/"right way to use libs in the gate, i.e. released ones" to at least unblock the wallaby gate | 13:39 |
whoami-rajat | geguileo, hope you will be pushing the patch for it? | 13:39 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 14:07 |
*** dviroel is now known as dviroel|lunch | 15:36 |
*** dviroel|lunch is now known as dviroel | 16:43 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 17:56 |
hemna | geguileo, so it looks like there are really only 3 places where the workers entries are created in all of cinder. the scheduler for volume create, the volume rpcapi for delete_volume and create_snapshot. | 18:41 |
hemna | that's it | 18:41 |
hemna | I'm not sure I see the purpose of the workers table at this point as it's really not used | 18:42 |
geguileo | iirc they are created only in the places related to the operations we can actually clean | 18:42 |
geguileo | which are very few | 18:42 |
geguileo | I thought there were some more... | 18:42 |
hemna | yah, so I guess the workers table purpose is for somehow cleaning up during the next start? | 18:42 |
geguileo | next start and on a clustered service so an operator can trigger the cleaning on other nodes | 18:44 |
geguileo | without race conditions | 18:44 |
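(As context for the clustered cleanup geguileo mentions: cinder exposes a work-cleanup API, as I recall added at microversion 3.24, that an operator can call to trigger cleanup on other nodes. A minimal sketch, assuming a /workers/cleanup path and a cluster_name body field; auth and error handling are simplified.)

```python
# Sketch of triggering cinder's work cleanup on a clustered service from an
# operator script. Path and body fields are my recollection of the 3.24 API;
# verify against the deployed API before relying on it.
import requests


def trigger_cleanup(volume_endpoint, token, cluster_name):
    resp = requests.post(
        f'{volume_endpoint}/workers/cleanup',
        headers={
            'X-Auth-Token': token,
            'OpenStack-API-Version': 'volume 3.24',
        },
        json={'cluster_name': cluster_name},
    )
    resp.raise_for_status()
    # The response reports which services accepted the cleanup request.
    return resp.json()
```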
hemna | hrmm ok so it's hard to see how I could use this table then | 18:49 |
hemna | I am looking for 2 things out of the proposal. 1) cinder top - to show in realtime(ish) what cinder is working on right now. 2) a history of state changes for a volume for a particular request id. | 18:50 |
hemna | cinder top would simply loop forever showing a list of active actions being taken on what volumes and each action's progress. | 18:51 |
hemna | and the history is a log of what happened at various steps in the process for actions on a volume. | 18:52 |
hemna | cinder top could be used to help me decide if I can safely bounce the cinder service. | 18:52 |
hemna | and the history would help find out wtf went wrong on actions on a volume....or at least where it bailed. | 18:52 |
hemna | there are so many damn volume inconsistencies it's a mountain of problems for me at this point. | 18:53 |
hemna | this is one of my many deployments: https://paste.openstack.org/show/bsV8DWpSeh9eq0oDDOVJ/ | 18:55 |
geguileo | the top could be done with the workers table (adding other operations there), which could allow us to do proper cleanup for other states as well | 18:56 |
geguileo | and it would help make the query a lot faster (since it does hard deletes and not soft ones) | 18:56 |
geguileo | hemna: ouch, ouch, ouch, 92 94 errors | 18:56 |
hemna | yes | 18:57 |
hemna | well the workers table is a new row for every change in the process of a volume | 18:58 |
hemna | it would be hard to show a live table | 18:59 |
geguileo | no, no, the same row is updated | 18:59 |
geguileo | that's why it doesn't help with your second objective | 18:59 |
geguileo | it helps with top, because it only shows what's ongoing | 18:59 |
hemna | hrmm, maybe the data I'm looking at is bogus then | 18:59 |
geguileo | but once a resource reaches a stable state it gets removed, so no history | 19:00 |
geguileo | it was done on purpose to make sure it was performant | 19:00 |
geguileo | I think that using the same table for history and top would not have a good performance | 19:01 |
geguileo | because it would store a lot of records | 19:01 |
geguileo | but then again, something non performant is 1000% better than nothing | 19:02 |
geguileo | so we could always use the history table to do top, and in the future move things to the workers table | 19:02 |
geguileo | by things I mean only the top part | 19:02 |
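(A rough sketch of what a workers-table-backed "cinder top" loop might look like: the DSN, table name, and columns (resource_type, resource_id, status, updated_at) are guesses at the current schema, not a verified model.)

```python
# Sketch of a "cinder top" poller over the workers table: list whatever
# operations are in flight right now, newest activity first. DSN and column
# names are assumptions.
import time

import sqlalchemy as sa

engine = sa.create_engine('mysql+pymysql://cinder:secret@localhost/cinder')  # assumed DSN

workers = sa.table(
    'workers',
    sa.column('resource_type'),
    sa.column('resource_id'),
    sa.column('status'),
    sa.column('updated_at'),
)

while True:
    with engine.connect() as conn:
        rows = conn.execute(
            sa.select(workers).order_by(sa.desc(workers.c.updated_at))
        ).fetchall()
    print('\x1b[2J\x1b[H', end='')  # clear the terminal between refreshes
    for row in rows:
        print(f'{row.resource_type:10} {row.resource_id:36} '
              f'{row.status:20} {row.updated_at}')
    time.sleep(2)
```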
hemna | https://paste.openstack.org/show/boLvsDvkDRXnQYLgKpB2/ | 19:02 |
hemna | yah I agree, those 2 features can't be in the same table | 19:03 |
geguileo | but we can do them in the same table "for now" | 19:03 |
hemna | but both the top and the history should have request id in it too | 19:03 |
geguileo | history definitely, I'm thinking about top... | 19:04 |
hemna | the top table would be pretty small at any given point in time | 19:04 |
geguileo | if it's a different table yes | 19:04 |
hemna | the history could be cleaned out by an external script via the soft delete mechanism | 19:04 |
geguileo | or we can add an endpoint to clean things older than a date or something | 19:05 |
geguileo | that way we don't have to do weird things with the soft delete | 19:05 |
hemna | yah that's fine too | 19:05 |
geguileo | having the request id is ok in the top table, though I admit I don't quite see its usefulness (disadvantages of not maintaining a cloud) | 19:06 |
hemna | not a lot of activity going on at the moment, but I wrote this as a volume states tool https://asciinema.org/a/znuSbH2hfxB5TPv71xTL1zXkC | 19:08 |
hemna | so the request_id is vital for me, because right now we push all the logs into kibana | 19:09 |
hemna | and when there are problems I have to search kibana for the specific error, then I filter out by request_id after I find the request_id that failed. | 19:09 |
hemna | being able to go into the DB and look by request_id would make it easier for filtering for me in the history table, instead of by volume id | 19:10 |
hemna | we do lots of stuff to the same volume, but a particular request_id is related to a specific action being taken on that volume that may have failed. | 19:10 |
hemna | re: I don't care about an extend of a particular volume from 3 days ago, but I do care about an attach of that volume from 3 days ago. | 19:11 |
hemna | 2 different actions with 2 request_ids | 19:11 |
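(A purely hypothetical sketch of the history table hemna is asking for: one row per state transition, indexed by volume_id and request_id, plus the clean-older-than-a-date purge mentioned earlier. Every table and column name here is invented for illustration; nothing like this exists in cinder today.)

```python
# Hypothetical schema and queries for a volume state-change history table.
# All names are made up for illustration.
import sqlalchemy as sa

metadata = sa.MetaData()

volume_history = sa.Table(
    'volume_history', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('volume_id', sa.String(36), index=True),
    sa.Column('request_id', sa.String(64), index=True),
    sa.Column('action', sa.String(32)),        # e.g. attach, detach, extend, migrate
    sa.Column('from_status', sa.String(32)),
    sa.Column('to_status', sa.String(32)),
    sa.Column('created_at', sa.DateTime),
)


def history_for_request(conn, request_id):
    """All state changes recorded for one request, oldest first."""
    query = (sa.select(volume_history)
             .where(volume_history.c.request_id == request_id)
             .order_by(volume_history.c.created_at))
    return conn.execute(query).fetchall()


def purge_history_before(conn, cutoff):
    """Hard-delete history older than a date (the cleanup-endpoint idea)."""
    conn.execute(sa.delete(volume_history)
                 .where(volume_history.c.created_at < cutoff))
```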
hemna | so what happens if I start stuffing other actions in the workers table? | 19:16 |
hemna | like attach, detach, extend, migration | 19:16 |
hemna | heh, every entry in the workers table is deleted=0; | 19:17 |
hemna | entries from 2018! :P | 19:18 |
*** dviroel is now known as dviroel|afk | 20:05 | |
opendevreview | Francesco Pantano proposed openstack/devstack-plugin-ceph master: Deploy with cephadm https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 | 21:01 |
*** dviroel|afk is now known as dviroel|out | 21:05 |