opendevreview | Arun KV proposed openstack/cinder master: Reintroduce DataCore driver https://review.opendev.org/c/openstack/cinder/+/836996 | 05:06 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for snapshot os-reset_status https://review.opendev.org/c/openstack/cinder/+/804035 | 05:10 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group os-reset_status https://review.opendev.org/c/openstack/cinder/+/804735 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for group-snapshot os-reset_status https://review.opendev.org/c/openstack/cinder/+/804757 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Reset state robustification for backup os-reset_status https://review.opendev.org/c/openstack/cinder/+/778193 | 05:11 |
opendevreview | Tushar Trambak Gite proposed openstack/cinder master: Include volume type constraints in internal API https://review.opendev.org/c/openstack/cinder/+/846146 | 10:22 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 11:28 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 11:30 |
*** dviroel_ is now known as dviroel | 12:05 |
tosky | whoami-rajat, geguileo: any idea on how to unbreak https://review.opendev.org/c/openstack/cinder/+/845799 (failure caused by https://review.opendev.org/c/openstack/os-brick/+/834604 )? | 12:45 |
geguileo | tosky: that's the cinderlib failure, right? | 12:46 |
geguileo | it is | 12:46 |
tosky | maybe you discussed this already on Friday but I'm catching up with emails | 12:47 |
geguileo | tosky: whoami-rajat told me about the failure and I started investigating | 12:48 |
geguileo | in my eyes the failure is caused by privsep not using the venv's libraries and using the host's ones | 12:48 |
geguileo | so fixing that would be the right fix | 12:48 |
geguileo | to unblock the gate I think we have 2 options | 12:49 |
geguileo | 1- Change the .zuul jobs (the 2 that fail: LVM & Ceph) in that stable branch to use os-brick from source | 12:49 |
geguileo | 2- change the cinderlib tox.ini in that branch to use the released os-brick version instead of using the one from master | 12:50 |
geguileo | those are the 2 simple fixes that I think can unblock the gate | 12:50 |
geguileo | I'm currently looking into the proper solution | 12:50 |
tosky | I think 2 is the one I would personally choose first | 13:02 |
rosmaita | geguileo: fwiw, i agree with tosky about option #2 | 13:12 |
rosmaita | i think we need to have a cinder-project-deliverables CI review at the next midcycle | 13:13 |
rosmaita | i'm worried about not testing os-brick properly in the gate, but at the same time i want to avoid the situation glance had, where a glance CI fix could not merge because glance_store CI was broken, and the glance_store CI fix could not merge because glance CI was broken | 13:14 |
rosmaita | luckily, we were able to make a devstack change that fixed both | 13:15 |
rosmaita | but it's not a good situation to be in | 13:15 |
whoami-rajat | agree with the majority here, we should modify cinderlib instead of the jobs consuming it, else we might start facing similar issues in other jobs and end up patching jobs again and again | 13:16 |
whoami-rajat | rosmaita, I face that issue every cycle, is it because of the functional job we run on glance_store gate? | 13:17 |
whoami-rajat | glance functional job on glance_store gate | 13:18 |
rosmaita | whoami-rajat: i am not sure, i haven't looked into it carefully | 13:18 |
whoami-rajat | ack | 13:18 |
whoami-rajat | geguileo, IIUC, we still have an issue with how cinderlib is using privsep in the gate, right? since the patch in os-brick stable/wallaby broke cinderlib tests, if os-brick gets released with that patch, cinderlib tests are going to break again right? | 13:21 |
geguileo | whoami-rajat: cinderlib tests work fine on its gate | 13:21 |
geguileo | they only break in the Cinder gate | 13:21 |
whoami-rajat | geguileo, i mean on cinder gate | 13:21 |
whoami-rajat | yes | 13:22 |
geguileo | oh, I forgot an additional way to fix the issue: make an os-brick release | 13:22 |
geguileo | afaik the issue right now is caused by the host having an older version of os-brick than the one in the virtual env | 13:22 |
geguileo | cinderlib sees the os-brick version in the virtual env and calls privsep to execute the code | 13:23 |
geguileo | and privsep is using the older os-brick version (from pip) | 13:23 |
geguileo | this can be: 1- Because os-brick is not setting up privsep correctly 2- Because cinderlib is somehow calling the privsep daemon that Cinder started | 13:24 |
geguileo | I don't think the second option is possible... Because privsep should be generating a random new directory and a new socket inside of it each time | 13:25 |
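(A minimal diagnostic sketch of the failure mode geguileo describes: compare the os-brick version visible to the venv with the one the privsep daemon actually imports. The `os_brick.privileged.default` entrypoint wiring is assumed from how os-brick exposes privsep; treat it as illustrative, not a verified reproduction of the gate job.)

```python
# Diagnostic sketch: compare the os-brick the venv imports with the one the
# privsep daemon imports. If privsep is spawned with the host's sys.path, the
# two versions will disagree, which is the suspected failure mode here.
from importlib import metadata

from os_brick import privileged as brick_privileged  # import path assumed


@brick_privileged.default.entrypoint
def _privsep_brick_version():
    # Runs inside the privsep daemon, so it reports whatever os-brick that
    # process can actually import (host install vs. virtualenv).
    return metadata.version('os-brick')


def check_brick_versions():
    local = metadata.version('os-brick')   # version the venv/cinderlib sees
    remote = _privsep_brick_version()      # version the privsep daemon sees
    print(f'venv os-brick: {local}, privsep os-brick: {remote}')
    return local == remote
```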
whoami-rajat | wasn't able to understand all the details but looks like we've an issue somewhere with our usage of privsep (probably os-brick as you said) | 13:27 |
geguileo | whoami-rajat: no, not os-brick, this is 99.99% sure on the cinderlib side (code, config, job config, etc) | 13:36 |
geguileo | I am testing things locally and it works fine, and privsep called with rootwrap has the right sys.path to search for libraries (at least in my system)... | 13:36 |
whoami-rajat | ok, you said "1- Because os-brick is not doing correctly the privsep" so got confused | 13:36 |
geguileo | oh, yeah, that was a possibility 10 minutes ago lol | 13:37 |
geguileo | but according to my local tests doesn't look like that's the case... | 13:38 |
whoami-rajat | :D ack | 13:38 |
whoami-rajat | thanks for looking into it, till then we can have the workaround/"right way to use libs in the gate, i.e. released ones" to at least unblock the wallaby gate | 13:39 |
whoami-rajat | geguileo, hope you will be pushing the patch for it? | 13:39 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 14:07 |
*** dviroel is now known as dviroel|lunch | 15:36 |
*** dviroel|lunch is now known as dviroel | 16:43 |
opendevreview | Alexander Malashenko proposed openstack/cinder master: Cinder displays the volume size provided by the driver, when creating the volume with enabled cache. https://review.opendev.org/c/openstack/cinder/+/836973 | 17:56 |
hemna | geguileo, so it looks like there are really only 3 places where the workers entries are created in all of cinder. the scheduler for volume create, the volume rpcapi for delete_volume and create_snapshot. | 18:41 |
hemna | that's it | 18:41 |
hemna | I'm not sure I see the purpose of the workers table at this point as it's really not used | 18:42 |
geguileo | iirc they are created only in the places related to the operations we can actually clean | 18:42 |
geguileo | which are very few | 18:42 |
geguileo | I thought there were some more... | 18:42 |
hemna | yah, so I guess the workers table purpose is for somehow cleaning up during the next start? | 18:42 |
geguileo | next start and on a clustered service so an operator can trigger the cleaning on other nodes | 18:44 |
geguileo | without race conditions | 18:44 |
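(As context for the clustered cleanup geguileo mentions: cinder exposes a work-cleanup API, as I recall added at microversion 3.24, that an operator can call to trigger cleanup on other nodes. A minimal sketch, assuming a /workers/cleanup path and a cluster_name body field; auth and error handling are simplified.)

```python
# Sketch of triggering cinder's work cleanup on a clustered service from an
# operator script. Path and body fields are my recollection of the 3.24 API;
# verify against the deployed API before relying on it.
import requests


def trigger_cleanup(volume_endpoint, token, cluster_name):
    resp = requests.post(
        f'{volume_endpoint}/workers/cleanup',
        headers={
            'X-Auth-Token': token,
            'OpenStack-API-Version': 'volume 3.24',
        },
        json={'cluster_name': cluster_name},
    )
    resp.raise_for_status()
    # The response reports which services accepted the cleanup request.
    return resp.json()
```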
hemna | hrmm ok so it's hard to see how I could use this table then | 18:49 |
hemna | I am looking for 2 things out of the proposal. 1) cinder top - to show in realtime(ish) what cinder is working on right now. 2) a history of state changes for a volume for a particular request id. | 18:50 |
hemna | cinder top would simply loop forever showing a list of active actions being taken on what volumes and each action's progress. | 18:51 |
hemna | and the history is a log of what happened at various steps in the process for actions on a volume. | 18:52 |
hemna | cinder top could be used to help me decide if I can safely bounce the cinder service. | 18:52 |
hemna | and the history would help find out wtf went wrong on actions on a volume....or at least where it bailed. | 18:52 |
hemna | there are so many damn volume inconsistencies it's a mountain of problems for me at this point. | 18:53 |
hemna | this is one of my many deployments: https://paste.openstack.org/show/bsV8DWpSeh9eq0oDDOVJ/ | 18:55 |
geguileo | the top could be done with the workers table (adding other operations there), which could allow us to do proper cleanup for other states as well | 18:56 |
geguileo | and it would help make the query a lot faster (since it does hard deletes and not soft ones) | 18:56 |
geguileo | hemna: ouch, ouch, ouch, 92 94 errors | 18:56 |
hemna | yes | 18:57 |
hemna | well the workers table is a new row for every change in the process of a volume | 18:58 |
hemna | it would be hard to show a live table | 18:59 |
geguileo | no, no, the same row is updated | 18:59 |
geguileo | that's why it doesn't help with your second objective | 18:59 |
geguileo | it helps with top, because it only shows what's ongoing | 18:59 |
hemna | hrmm, maybe the data I'm looking at is bogus then | 18:59 |
geguileo | but once a resource reaches a stable state it gets removed, so no history | 19:00 |
geguileo | it was done on purpose to make sure it was performant | 19:00 |
geguileo | I think that using the same table for history and top would not have a good performance | 19:01 |
geguileo | because it would store a lot of records | 19:01 |
geguileo | but then again, something non performant is 1000% better than nothing | 19:02 |
geguileo | so we could always use the history table to do top, and in the future move things to the workers table | 19:02 |
geguileo | by things I mean only the top part | 19:02 |
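(A rough sketch of what a workers-table-backed "cinder top" loop might look like: the DSN, table name, and columns (resource_type, resource_id, status, updated_at) are guesses at the current schema, not a verified model.)

```python
# Sketch of a "cinder top" poller over the workers table: list whatever
# operations are in flight right now, newest activity first. DSN and column
# names are assumptions.
import time

import sqlalchemy as sa

engine = sa.create_engine('mysql+pymysql://cinder:secret@localhost/cinder')  # assumed DSN

workers = sa.table(
    'workers',
    sa.column('resource_type'),
    sa.column('resource_id'),
    sa.column('status'),
    sa.column('updated_at'),
)

while True:
    with engine.connect() as conn:
        rows = conn.execute(
            sa.select(workers).order_by(sa.desc(workers.c.updated_at))
        ).fetchall()
    print('\x1b[2J\x1b[H', end='')  # clear the terminal between refreshes
    for row in rows:
        print(f'{row.resource_type:10} {row.resource_id:36} '
              f'{row.status:20} {row.updated_at}')
    time.sleep(2)
```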
hemna | https://paste.openstack.org/show/boLvsDvkDRXnQYLgKpB2/ | 19:02 |
hemna | yah I agree, those 2 features can't be in the same table | 19:03 |
geguileo | but we can do them in the same table "for now" | 19:03 |
hemna | but both the top and the history should have request id in it too | 19:03 |
geguileo | history definitely, I'm thinking about top... | 19:04 |
hemna | the top table would be pretty small at any given point in time | 19:04 |
geguileo | if it's a different table yes | 19:04 |
hemna | the history could be cleaned out by an external script via the soft delete mechanism | 19:04 |
geguileo | or we can add an endpoint to clean things older than a date or something | 19:05 |
geguileo | that way we don't have to do weird things with the soft delete | 19:05 |
hemna | yah that's fine too | 19:05 |
geguileo | having the request id is ok in the top table, though I admit I don't quite see its usefulness (disadvantages of not maintaining a cloud) | 19:06 |
hemna | not a lot of activity going on at the moment, but I wrote this as a volume states tool https://asciinema.org/a/znuSbH2hfxB5TPv71xTL1zXkC | 19:08 |
hemna | so the request_id is vital for me, because right now we push all the logs into kibana | 19:09 |
hemna | and when there are problems I have to search kibana for the specific error, then I filter out by request_id after I find the request_id that failed. | 19:09 |
hemna | being able to go into the DB and look by request_id would make it easier for filtering for me in the history table, instead of by volume id | 19:10 |
hemna | we do lots of stuff to the same volume, but a particular request_id is related to a specific action being taken on that volume that may have failed. | 19:10 |
hemna | re: I don't care about an extend of a particular volume from 3 days ago, but I do care about an attach of that volume from 3 days ago. | 19:11 |
hemna | 2 different actions with 2 request_ids | 19:11 |
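(A purely hypothetical sketch of the history table hemna is asking for: one row per state transition, indexed by volume_id and request_id, plus the clean-older-than-a-date purge mentioned earlier. Every table and column name here is invented for illustration; nothing like this exists in cinder today.)

```python
# Hypothetical schema and queries for a volume state-change history table.
# All names are made up for illustration.
import sqlalchemy as sa

metadata = sa.MetaData()

volume_history = sa.Table(
    'volume_history', metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('volume_id', sa.String(36), index=True),
    sa.Column('request_id', sa.String(64), index=True),
    sa.Column('action', sa.String(32)),        # e.g. attach, detach, extend, migrate
    sa.Column('from_status', sa.String(32)),
    sa.Column('to_status', sa.String(32)),
    sa.Column('created_at', sa.DateTime),
)


def history_for_request(conn, request_id):
    """All state changes recorded for one request, oldest first."""
    query = (sa.select(volume_history)
             .where(volume_history.c.request_id == request_id)
             .order_by(volume_history.c.created_at))
    return conn.execute(query).fetchall()


def purge_history_before(conn, cutoff):
    """Hard-delete history older than a date (the cleanup-endpoint idea)."""
    conn.execute(sa.delete(volume_history)
                 .where(volume_history.c.created_at < cutoff))
```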
hemna | so what happens if I start stuffing other actions in the workers table? | 19:16 |
hemna | like attach, detach, extend, migration | 19:16 |
hemna | heh, every entry in the workers table is deleted=0; | 19:17 |
hemna | entries from 2018! :P | 19:18 |
*** dviroel is now known as dviroel|afk | 20:05 | |
opendevreview | Francesco Pantano proposed openstack/devstack-plugin-ceph master: Deploy with cephadm https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 | 21:01 |
*** dviroel|afk is now known as dviroel|out | 21:05 |