openstackgerrit | Merged openstack/cinder stable/pike: NetApp: Return all iSCSI targets-portals https://review.opendev.org/662757 | 00:01 |
---|---|---|
openstackgerrit | Merged openstack/cinder master: Refactor use of encryption/image volume utils https://review.opendev.org/669945 | 00:01 |
openstackgerrit | Merged openstack/cinderlib master: Run functional tests with memory persistence https://review.opendev.org/648218 | 00:01 |
openstackgerrit | Merged openstack/cinder stable/stein: NetApp ONTAP: Fix JSON serialization error on EMS logs https://review.opendev.org/678105 | 00:01 |
openstackgerrit | Merged openstack/cinder stable/rocky: Fix ceph: only close rbd image after snapshot iteration is finished https://review.opendev.org/676211 | 00:01 |
openstackgerrit | Merged openstack/cinder stable/stein: Fix LVM IPv6 target portals https://review.opendev.org/678601 | 00:01 |
openstackgerrit | Merged openstack/cinder master: [api-ref]Fix values of service-status in list-hosts https://review.opendev.org/677859 | 00:01 |
openstackgerrit | Merged openstack/cinder stable/pike: Remove Sheepdog tests from zuul config https://review.opendev.org/669334 | 00:02 |
openstackgerrit | Merged openstack/cinder master: [api-ref]Fix response example file of update_type https://review.opendev.org/677664 | 00:02 |
*** trident has joined #openstack-cinder | 00:03 | |
openstackgerrit | Merged openstack/cinder master: Add context to cloning snapshots in remotefs driver https://review.opendev.org/570885 | 00:08 |
openstackgerrit | Alan Bishop proposed openstack/cinder stable/pike: Fix NFS volume retype with migrate https://review.opendev.org/680237 | 00:10 |
openstackgerrit | Merged openstack/cinder master: Synology: Fix driver to be compatible with python3 https://review.opendev.org/679705 | 00:14 |
openstackgerrit | Merged openstack/cinderlib master: Add pdf build support https://review.opendev.org/676997 | 00:14 |
*** markvoelker has joined #openstack-cinder | 01:00 | |
*** markvoelker has quit IRC | 01:05 | |
*** senrique_ has quit IRC | 01:14 | |
openstackgerrit | zhufl proposed openstack/cinder master: Fix potential NameError of rc_id https://review.opendev.org/679944 | 01:43 |
*** markvoelker has joined #openstack-cinder | 02:01 | |
*** markvoelker has quit IRC | 02:05 | |
*** markvoelker has joined #openstack-cinder | 03:31 | |
*** markvoelker has quit IRC | 03:41 | |
*** mvkr has joined #openstack-cinder | 03:51 | |
*** gkadam has joined #openstack-cinder | 03:54 | |
*** gkadam has quit IRC | 03:54 | |
*** whfnst has joined #openstack-cinder | 04:07 | |
*** markvoelker has joined #openstack-cinder | 04:30 | |
*** markvoelker has quit IRC | 04:35 | |
*** Luzi has joined #openstack-cinder | 04:55 | |
*** hoonetorg has quit IRC | 04:56 | |
*** udesale has joined #openstack-cinder | 05:02 | |
*** hoonetorg has joined #openstack-cinder | 05:13 | |
*** markvoelker has joined #openstack-cinder | 05:30 | |
*** markvoelker has quit IRC | 05:35 | |
*** psachin has joined #openstack-cinder | 05:36 | |
*** snecker has joined #openstack-cinder | 06:04 | |
*** snecker has quit IRC | 06:29 | |
*** psachin has quit IRC | 06:31 | |
openstackgerrit | Bhaa Shakur proposed openstack/cinder master: Zadara VPSA: Move to API access key authentication https://review.opendev.org/670715 | 06:36 |
openstackgerrit | Bhaa Shakur proposed openstack/cinder master: Zadara VPSA: Add Multi-Attach to driver capabilities. https://review.opendev.org/679565 | 06:36 |
*** mmethot_ has joined #openstack-cinder | 06:43 | |
*** mmethot has quit IRC | 06:45 | |
openstackgerrit | Abhishek Kekane proposed openstack/cinder master: Support multiple stores of Glance https://review.opendev.org/661676 | 06:46 |
*** openstackgerrit has quit IRC | 06:51 | |
*** udesale has quit IRC | 06:56 | |
*** raghavendrat has joined #openstack-cinder | 07:01 | |
*** snecker has joined #openstack-cinder | 07:03 | |
*** markvoelker has joined #openstack-cinder | 07:06 | |
*** markvoelker has quit IRC | 07:10 | |
*** sapd1_x has quit IRC | 07:10 | |
*** tesseract has joined #openstack-cinder | 07:15 | |
*** sahid has joined #openstack-cinder | 07:18 | |
*** trident has quit IRC | 07:21 | |
*** tosky has joined #openstack-cinder | 07:23 | |
*** trident has joined #openstack-cinder | 07:29 | |
*** rcernin has quit IRC | 07:33 | |
*** trident has quit IRC | 07:34 | |
*** trident has joined #openstack-cinder | 07:43 | |
*** openstackgerrit has joined #openstack-cinder | 07:43 | |
openstackgerrit | Simon O'Donovan proposed openstack/cinder master: PowerMax Driver - Revert to Snapshot Fix https://review.opendev.org/679970 | 07:43 |
*** snecker has quit IRC | 07:46 | |
*** tkajinam has quit IRC | 08:05 | |
openstackgerrit | pengyuesheng proposed openstack/cinder master: Blacklist eventlet 0.25.0 https://review.opendev.org/680318 | 08:18 |
*** davidsha has joined #openstack-cinder | 08:26 | |
openstackgerrit | OpenStack Release Bot proposed openstack/os-brick stable/train: Update .gitreview for stable/train https://review.opendev.org/680325 | 08:32 |
openstackgerrit | OpenStack Release Bot proposed openstack/os-brick stable/train: Update TOX/UPPER_CONSTRAINTS_FILE for stable/train https://review.opendev.org/680326 | 08:32 |
openstackgerrit | OpenStack Release Bot proposed openstack/os-brick master: Update master for stable/train https://review.opendev.org/680327 | 08:32 |
*** whfnst has quit IRC | 08:41 | |
*** e0ne has joined #openstack-cinder | 08:41 | |
openstackgerrit | pengyuesheng proposed openstack/os-brick master: Blacklist eventlet 0.25.0 https://review.opendev.org/680336 | 08:42 |
*** markvoelker has joined #openstack-cinder | 08:45 | |
*** whfnst has joined #openstack-cinder | 08:46 | |
*** markvoelker has quit IRC | 08:50 | |
*** trident has quit IRC | 09:01 | |
*** trident has joined #openstack-cinder | 09:09 | |
*** ociuhandu has joined #openstack-cinder | 09:23 | |
openstackgerrit | Pawel Kaminski proposed openstack/cinder master: target/spdknvmf: Add max_queue_depth configuration parameter https://review.opendev.org/672064 | 09:35 |
*** udesale has joined #openstack-cinder | 10:20 | |
*** udesale has quit IRC | 10:28 | |
*** udesale has joined #openstack-cinder | 10:29 | |
openstackgerrit | Naoki Saito proposed openstack/cinder master: NEC Driver: Support multi-attach https://review.opendev.org/675279 | 10:41 |
*** sapd1_x has joined #openstack-cinder | 10:46 | |
*** markvoelker has joined #openstack-cinder | 10:46 | |
*** markvoelker has quit IRC | 10:52 | |
*** carloss has joined #openstack-cinder | 11:02 | |
*** sapd1_x has quit IRC | 11:07 | |
*** ociuhandu has quit IRC | 11:23 | |
hemna | doink | 11:26 |
*** lpetrut has joined #openstack-cinder | 11:27 | |
openstackgerrit | Rajat Dhasmana proposed openstack/cinder master: Untyped to Default Volume Type https://review.opendev.org/639180 | 11:30 |
openstackgerrit | Rajat Dhasmana proposed openstack/cinder master: Untyped to Default Volume Type https://review.opendev.org/639180 | 11:30 |
*** rosmaita has left #openstack-cinder | 11:35 | |
*** ociuhandu has joined #openstack-cinder | 11:39 | |
*** ociuhandu has quit IRC | 11:40 | |
*** ociuhandu has joined #openstack-cinder | 11:41 | |
*** ociuhandu has quit IRC | 11:41 | |
*** ociuhandu has joined #openstack-cinder | 11:42 | |
*** ociuhandu has quit IRC | 11:43 | |
*** ociuhandu has joined #openstack-cinder | 11:44 | |
*** ociuhandu has quit IRC | 11:50 | |
*** dr_gogeta86 has joined #openstack-cinder | 11:53 | |
dr_gogeta86 | hi guys | 11:53 |
dr_gogeta86 | who can help me to increase timeout between volume allocation and wwn scan on cinder volume | 11:53 |
dr_gogeta86 | cinder wants that particular volume | 11:54 |
dr_gogeta86 | but doesn't find ... and heat stack delete its | 11:54 |
dr_gogeta86 | *it | 11:54 |
hemna | ? | 11:57 |
*** markvoelker has joined #openstack-cinder | 12:02 | |
dr_gogeta86 | got this latency problem | 12:03 |
dr_gogeta86 | I'm using hitachi cinder driver on os queens | 12:03 |
dr_gogeta86 | it creates volumes like a charm | 12:03 |
dr_gogeta86 | but when a boot from volume is needed | 12:03 |
dr_gogeta86 | there is a bit of delay to see the volume inside cinder-volume machine | 12:04 |
dr_gogeta86 | copy fail | 12:04 |
dr_gogeta86 | and deletes the volume due to the failure | 12:04 |
*** virendra-sharma has joined #openstack-cinder | 12:05 | |
hemna | have to look at the logs | 12:05 |
hemna | to see why it's failing | 12:05 |
*** dviroel has joined #openstack-cinder | 12:05 | |
dr_gogeta86 | time to gather is | 12:07 |
dr_gogeta86 | some in particular to check? | 12:07 |
raghavendrat | hi hemna: I had a query regarding your review comment on https://review.opendev.org/#/c/677945/ | 12:09 |
dr_gogeta86 | http://paste.openstack.org/show/771290/ | 12:09 |
dr_gogeta86 | hemna, ^ | 12:11 |
*** virendra-sharma has quit IRC | 12:13 | |
*** ociuhandu has joined #openstack-cinder | 12:27 | |
*** n-saito has quit IRC | 12:28 | |
hemna | raghavendrat: sup | 12:28 |
raghavendrat | I had written _initialize_connection_common function because similar code was used for primary array & secondary array | 12:31 |
*** ociuhandu has quit IRC | 12:32 | |
raghavendrat | If i understand correctly, I have to remove this function & revert to original code flow. Also write a separate function _initialize_connection_replication for secondary array | 12:34 |
*** jmlowe has quit IRC | 12:36 | |
* hemna looks again | 12:39 | |
hemna | ok I see that now | 12:41 |
*** eharney has joined #openstack-cinder | 12:41 | |
hemna | the logging is still way too noisy for info | 12:41 |
hemna | that stuff should be debug | 12:41 |
raghavendrat | i will change logging to debug | 12:41 |
hemna | the cl name too | 12:42 |
raghavendrat | yes. i will change name | 12:42 |
raghavendrat | my query was regarding _initialize_connection_common function. | 12:42 |
raghavendrat | this function was written because similar code was used for primary array & secondary array | 12:43 |
raghavendrat | If i understand correctly, I have to remove this function & revert to original code flow. Also write a separate function _initialize_connection_replication for secondary array | 12:43 |
raghavendrat | right ? | 12:43 |
hemna | nah, it's fine as it is I suppose | 12:44 |
hemna | it's just overly complicated | 12:44 |
hemna | but therein lies the problem with cinder supporting replication itself | 12:45 |
hemna | at this point, let's focus on the CI | 12:45 |
hemna | and get that running | 12:45 |
hemna | because we can't even approve this patch without it. | 12:46 |
raghavendrat | ok. my team-mates are working on CI front. will keep everyone posted | 12:47 |
raghavendrat | as regards, my patch; i will do two things: [1] change log to debug [2] rename "cl" to "remote_client" | 12:47 |
raghavendrat | and submit another patchset | 12:47 |
hemna | ok good | 12:48 |
raghavendrat | thanks | 12:49 |
openstackgerrit | Rajat Dhasmana proposed openstack/python-cinderclient master: Optional filters parameters should be passed only once https://review.opendev.org/678523 | 12:57 |
*** jmlowe has joined #openstack-cinder | 12:59 | |
dr_gogeta86 | hemna, did you see the logs? | 13:03 |
*** mriedem has joined #openstack-cinder | 13:07 | |
*** pcaruana has quit IRC | 13:10 | |
raghavendrat | I am leaving for the day. | 13:13 |
*** ociuhandu has joined #openstack-cinder | 13:18 | |
openstackgerrit | Simon O'Donovan proposed openstack/cinder master: PowerMax Driver - Metro Volume Metadata change https://review.opendev.org/680406 | 13:19 |
smcginnis | dr_gogeta86: That paste doesn't have a lot of details. But the error there is on running multipath. I guess make sure you have multipath installed and configured would be my first suggestion. | 13:21 |
dr_gogeta86 | smcginnis, of course it is | 13:21 |
dr_gogeta86 | but i got so much delay from unmask | 13:21 |
*** spatel has joined #openstack-cinder | 13:22 | |
*** senrique_ has joined #openstack-cinder | 13:27 | |
*** rosmaita has joined #openstack-cinder | 13:32 | |
*** lseki has joined #openstack-cinder | 13:39 | |
*** pcaruana has joined #openstack-cinder | 13:39 | |
*** spatel has quit IRC | 13:41 | |
*** Luzi has quit IRC | 13:46 | |
*** senrique_ has quit IRC | 13:49 | |
*** enriquetaso has joined #openstack-cinder | 13:57 | |
openstackgerrit | Eric Harney proposed openstack/cinder master: Rename volume/utils.py to volume/volume_utils.py https://review.opendev.org/677492 | 14:00 |
*** raghavendrat has quit IRC | 14:03 | |
*** ociuhandu has quit IRC | 14:04 | |
*** lpetrut has quit IRC | 14:05 | |
*** senrique_ has joined #openstack-cinder | 14:12 | |
*** pcaruana has quit IRC | 14:12 | |
openstackgerrit | Merged openstack/python-cinderclient master: Add custom CA support for get_server_version https://review.opendev.org/675891 | 14:14 |
*** enriquetaso has quit IRC | 14:14 | |
*** ociuhandu has joined #openstack-cinder | 14:20 | |
*** senrique_ has quit IRC | 14:20 | |
*** ociuhandu has quit IRC | 14:25 | |
*** enriquetaso has joined #openstack-cinder | 14:27 | |
*** enriquetaso has quit IRC | 14:28 | |
*** enriquetaso has joined #openstack-cinder | 14:28 | |
*** lpetrut has joined #openstack-cinder | 14:29 | |
openstackgerrit | Simon O'Donovan proposed openstack/cinder master: PowerMax Driver - Metro Volume Metadata change https://review.opendev.org/680406 | 14:32 |
*** enriquetaso has quit IRC | 14:42 | |
*** tpsilva has quit IRC | 14:43 | |
*** udesale has quit IRC | 14:46 | |
*** lpetrut has quit IRC | 14:47 | |
*** udesale has joined #openstack-cinder | 14:47 | |
hemna | dr_gogeta86: have to find out why multipath -l call is failing | 14:57 |
hemna | that is quite unusual | 14:57 |
hemna | my guess is there's more to the log than that | 14:58 |
*** e0ne has quit IRC | 15:05 | |
*** sfernand has joined #openstack-cinder | 15:05 | |
openstackgerrit | Brian Rosmaita proposed openstack/cinderlib master: Don't run functional gates on doc-only changes https://review.opendev.org/680070 | 15:06 |
Roamer` | could anyone do a quick review on two pretty much trivial StorPool driver changes? https://review.opendev.org/679785 (advertise thin provisioning and a couple of other capabilities) and https://review.opendev.org/679676 (mark it as supported again) - thanks in advance! | 15:14 |
*** tosky has quit IRC | 15:15 | |
smcginnis | Roamer`: I don't see CINDER_BRANCH being set in the local.conf. Where are you telling devstack to load the current patch to test? | 15:17 |
*** enriquetaso has joined #openstack-cinder | 15:26 | |
dr_gogeta86 | hemna, my suspicion is | 15:27 |
dr_gogeta86 | the storage manager returns the WWNN | 15:27 |
dr_gogeta86 | but that lun is visible after many minutes | 15:27 |
*** ociuhandu has joined #openstack-cinder | 15:31 | |
*** tpsilva has joined #openstack-cinder | 15:32 | |
*** ociuhandu has quit IRC | 15:36 | |
*** thgcorrea has joined #openstack-cinder | 15:37 | |
hemna | if that volume doesn't show up for minutes on the host, then that seems like a problem between the host and the storage backend | 15:43 |
hemna | adding a minutes-long timeout looking for a device will cause cinder api calls and rabbitmq messages to bail | 15:44 |
hemna | so that's not going to help you | 15:44 |
hemna | gotta debug why that volume isn't showing up quickly | 15:44 |
*** jmlowe has quit IRC | 15:47 | |
hemna | so it doesn't look like we have API rate limiting? | 15:50 |
hemna | https://bugs.launchpad.net/cinder/+bug/1662637 | 15:50 |
openstack | Launchpad bug 1662637 in Cinder "Rate limit settings not enforced" [Undecided,New] | 15:50 |
*** jmlowe has joined #openstack-cinder | 16:00 | |
*** altlogbot_0 has quit IRC | 16:01 | |
Roamer` | smcginnis, you mean for the CI? um, is this not what zuul-merger does? from what I can see, when devstack starts, the Ansible jobs have already copied the Cinder repository to /opt/stack/cinder after merging the patch to be tested | 16:01 |
*** altlogbot_1 has joined #openstack-cinder | 16:02 | |
Roamer` | smcginnis, right now I have an SSH session to the worker node and cd /opt/stack/cinder and git log shows me a commit from zuul that says "Merge commit 'refs/changes/64/672064/6' of ssh://review.opendev.org:29418/openstack/cinder into HEAD" | 16:02 |
Roamer` | and that's exactly what https://spfactory.storpool.com/zuul/t/local/status says is being tested right now | 16:02 |
*** irclogbot_2 has quit IRC | 16:02 | |
*** irclogbot_3 has joined #openstack-cinder | 16:03 | |
Roamer` | I mean, fine, it's not called zuul-merger any more, it's part of what zuul-executor does :) | 16:05 |
*** irclogbot_3 has quit IRC | 16:07 | |
*** irclogbot_3 has joined #openstack-cinder | 16:07 | |
*** gnufied has joined #openstack-cinder | 16:20 | |
*** e0ne has joined #openstack-cinder | 16:23 | |
*** e0ne has quit IRC | 16:31 | |
*** sapd1_x has joined #openstack-cinder | 16:34 | |
*** zigo has quit IRC | 16:43 | |
*** spsurya has quit IRC | 16:48 | |
*** sapd1_x has quit IRC | 16:48 | |
*** rosmaita has left #openstack-cinder | 17:01 | |
*** ociuhandu has joined #openstack-cinder | 17:01 | |
*** markvoelker has quit IRC | 17:04 | |
*** tesseract has quit IRC | 17:06 | |
*** zigo has joined #openstack-cinder | 17:09 | |
openstackgerrit | Eric Harney proposed openstack/cinder stable/stein: Don't allow retype to encrypted+multiattach type https://review.opendev.org/680473 | 17:09 |
openstackgerrit | Eric Harney proposed openstack/cinder master: Continue renaming volume_utils (core) https://review.opendev.org/680474 | 17:10 |
openstackgerrit | Eric Harney proposed openstack/cinder master: Continue renaming of volume_utils (drivers) https://review.opendev.org/680475 | 17:10 |
*** markvoelker has joined #openstack-cinder | 17:13 | |
*** udesale has quit IRC | 17:18 | |
*** ociuhandu_ has joined #openstack-cinder | 17:32 | |
*** ociuhandu has quit IRC | 17:36 | |
*** ociuhandu_ has quit IRC | 17:36 | |
hemna | Roamer`: so did you get the storpool lib py3 compatible ? | 17:39 |
*** davidsha has quit IRC | 17:47 | |
openstackgerrit | Sofia Enriquez proposed openstack/cinder master: Allow removing NFS snapshots in error status https://review.opendev.org/679138 | 17:58 |
*** sfernand has quit IRC | 18:05 | |
*** sahid has quit IRC | 18:05 | |
*** thgcorrea has quit IRC | 18:07 | |
smcginnis | Roamer`: If it is being set, that is good. I'm not aware of how zuul-executor does that, as usually I've just used devstack and I thought it would just check out master if not told to do otherwise. | 18:22 |
smcginnis | Roamer`: I didn't see a flag in local.conf allowing using unsupported drivers either, so that led me to believe it's not actually running the current code if the driver is marked unsupported. It should fail otherwise. | 18:23 |
smcginnis | I wonder if devstack or service initialization logs the current commit anywhere. | 18:23 |
smcginnis | Ah, looks like it is applying the storpool patches separately. | 18:25 |
*** mvkr has quit IRC | 18:26 | |
smcginnis | But this is a little concerning - https://spfactory.storpool.com/logs/15/670715/12/check/cinder-storpool-tempest/57b655b/job-output.txt.gz#_2019-09-05_16_47_30_964994 | 18:26 |
smcginnis | So it's testing master with the outstanding storpool patches applied, regardless of the patch that actually is supposed to be tested? | 18:27 |
smcginnis | Yep, checking out master - https://spfactory.storpool.com/logs/15/670715/12/check/cinder-storpool-tempest/57b655b/job-output.txt.gz#_2019-09-05_16_57_08_748699 | 18:27 |
*** mriedem has quit IRC | 18:31 | |
*** mriedem has joined #openstack-cinder | 18:33 | |
hemna | smcginnis: I saw the cinder.conf that he had that enabled unsupported driver | 18:37 |
smcginnis | Oh, you're right. | 18:38 |
hemna | https://spfactory.storpool.com/logs/85/679785/2/check/cinder-storpool-tempest/105fbc7/controller/logs/etc/cinder/cinder_conf.txt.gz | 18:38 |
hemna | smcginnis: did you ever look into this anymore? https://bugs.launchpad.net/cinder/+bug/1662637 | 18:39 |
openstack | Launchpad bug 1662637 in Cinder "Rate limit settings not enforced" [Undecided,New] | 18:39 |
smcginnis | hemna: No, I never got back to that. I think Michal's suggestion is probably the best. | 18:40 |
hemna | nova still has their limiter code in place afaik | 18:40 |
openstackgerrit | Sean McGinnis proposed openstack/cinder master: Perform CI test https://review.opendev.org/523143 | 18:40 |
hemna | https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/limits.py | 18:40 |
hemna | I'm trying to debug it a little bit to see if I can even get it loaded | 18:40 |
hemna | not sure the api-paste.ini settings are working | 18:41 |
smcginnis | That was part of the confusing bit. We have horrible terminology. That nova limiting I believe is related to quotas. The bug is for API rate limiting. | 18:41 |
hemna | ok good, I'm not the only one completely confused by it | 18:41 |
smcginnis | I think when I filed that, it was as I was trying to understand what all that was because it was very confusing at first glance. | 18:42 |
hemna | and we have zero documentation on it | 18:42 |
hemna | so what has caused me to look into this is an issue when issuing lots of API requests in a short period of time | 18:42 |
hemna | c-vol crashes when you call lots of deletes | 18:43 |
hemna | and | 18:43 |
hemna | there are problems with calling create quickly too | 18:43 |
hemna | I have a bash script that creates 60 volumes, quickly. | 18:44 |
hemna | (ceph backend) | 18:44 |
smcginnis | Wow, that's not good. Are we crashing in a specific place, or does it cause general instability? | 18:44 |
hemna | the scheduler shits out saying that it can't find any hosts | 18:44 |
hemna | as it thinks the ceph backend is full | 18:44 |
hemna | but, ceph sends a stats update and it's got lots of space | 18:44 |
smcginnis | So only with Ceph? Doesn't happen with LVM? | 18:45 |
hemna | I haven't tried with lvm thin yet | 18:45 |
hemna | I suppose I can try with lvm | 18:45 |
eharney | iirc that space accounting issue is not ceph-specific and has to do with how we keep track of usage in the scheduler | 18:45 |
smcginnis | It might be an interesting experiment to see if it's something specific to the Ceph driver or to the volume manager. | 18:45 |
hemna | have to create a large thin vg | 18:45 |
eharney | but it's been a little bit since i looked at it | 18:45 |
hemna | one thing I noticed is that during deletes | 18:45 |
hemna | someone is calling get_volume_stats after every delete | 18:46 |
smcginnis | eharney: So maybe not handling concurrent operations very well? | 18:46 |
hemna | and that is what eventually pukes | 18:46 |
hemna | but on create, the scheduler relies on the period task to update stats | 18:46 |
smcginnis | That's weird, I would have thought up to date stats would be more important for create than delete. | 18:47 |
hemna | it's pretty damn expensive to call get_volume_stats | 18:47 |
smcginnis | Unless it's doing that to make sure you can create right away after deleting. | 18:47 |
smcginnis | Yeah. | 18:47 |
hemna | depending on the usage and what's been created, etc. | 18:47 |
hemna | all of this begs the question of rally jobs | 18:48 |
hemna | I guess we have 1 rally job? | 18:48 |
hemna | cinder-rally-task | 18:48 |
hemna | https://github.com/openstack/cinder/blob/master/rally-jobs/cinder.yaml#L54 | 18:49 |
eharney | smcginnis: need to find my notes/bug on it | 18:49 |
hemna | heh yah, that's not really much of a test | 18:49 |
eharney | i think abishop ran into this happening on an LVM backend in some tripleo CI | 18:51 |
hemna | so in this case there are 2 separate problems | 18:51 |
hemna | 1 is the scheduler and stats during create | 18:51 |
hemna | and the other is lots of deletes w/ ceph | 18:51 |
hemna | let me setup lvm in my same vagrant vm and see what happens | 18:51 |
*** e0ne has joined #openstack-cinder | 18:54 | |
abishop | yeah, see my comment in the commit message to the patch I submitted for tripleo, https://review.opendev.org/678894 | 18:54 |
smcginnis | We've had to raise that in devstack periodically too. https://review.opendev.org/#/c/533312/ | 18:56 |
eharney | i think the trick to this is that consume_from_volume() in the scheduler subtracts away free space in real time upon creation, but nothing does that when you delete a volume | 18:57 |
hemna | yah that looks familiar | 18:57 |
eharney | so you have to wait until get_volume_stats() happens again | 18:57 |
smcginnis | So maybe we just need an unconsume_from_volume() call in there. | 18:57 |
abishop | eharney: precisely | 18:57 |
hemna | and that get_volume_stats() goes through rabbitmq a few times before it gets to the capacity filter | 18:57 |
hemna | the scheduler has the updated data, but the capacity filter doesn't yet. | 18:58 |
hemna | no host found. | 18:58 |
abishop | yup, that's what I observed | 18:58 |
smcginnis | hemna: Maybe a quick test to add back the space with putting a consume_from_volume(-negative_space) call in the delete path? | 18:58 |
hemna | so I'm doing lots of creates | 18:58 |
hemna | then after that's all done | 18:58 |
hemna | lots of deletes | 18:58 |
hemna | not a mix | 18:58 |
hemna | the scheduler runs out of space, even though the backend doesn't | 18:59 |
abishop | it's the mix that should reveal the issue | 18:59 |
hemna | you'll hit it with lots of creates quickly | 18:59 |
eharney | smcginnis: i'm not sure the delete path even goes through that same area of code -- i think that's the right concept but i'm not sure what the implementation looks like | 18:59 |
abishop | sure, if you actually create enough to consume the space | 18:59 |
hemna | the scheduler will think it's full, but the backend doesn't | 18:59 |
abishop | it's the rapid create/delete where space is not accumulated that I found to be the issue | 18:59 |
hemna | and that causes the create failures | 19:00 |
smcginnis | eharney: Oh, is that another one that we don't have go through the scheduler yet? | 19:00 |
eharney | smcginnis: i think so | 19:00 |
abishop | scheduler deducts for the creates, but slow feedback to realize the deletes reclaimed the space | 19:00 |
smcginnis | So not so quick of a test. | 19:00 |
* hemna restacks | 19:00 | |
hemna | it's slow feedback even during the creates | 19:01 |
hemna | as the scheduler deducts the absolute, but then the backend eventually says...no I still have unused space | 19:01 |
hemna | so if you put a sleep in between create calls, they all work | 19:02 |
hemna | I had a pastebin of this last week | 19:02 |
smcginnis | If it's not going through the scheduler, then that update stats call after the delete is pretty useless too. | 19:04 |
smcginnis | Something has to let the scheduler know that things have changed. | 19:04 |
hemna | after stack is back up I'll run a test and pastebin the logs | 19:04 |
smcginnis | Short of forcing a stats update immediately before every create call. | 19:04 |
hemna | this makes me believe we need to throttle requests | 19:05 |
hemna | we aren't a web server serving up e-commerce pages | 19:05 |
eharney | so delete in the volume manager calls publish_service_capabilities() which gathers stats... which appears to send it to schedulers via update_service_capabilities, and even says so with a comment | 19:06 |
eharney | did we used to send an update for each delete from the volume manager from the scheduler and now that isn't going through for some reason? | 19:07 |
eharney | because it sure reads like it's already trying to solve this problem | 19:07 |
*** e0ne has quit IRC | 19:07 | |
hemna | well, during deletes it does | 19:07 |
hemna | do 100 creates | 19:07 |
hemna | it'll puke in there | 19:07 |
hemna | even when the backend has the space for it | 19:08 |
hemna | well, ceph in this case | 19:08 |
eharney | that sounds like the other problem | 19:08 |
hemna | and I think the get_volume_stats is where we crash during the 100 parallel deletes | 19:08 |
*** trident has quit IRC | 19:09 | |
eharney | hmm | 19:10 |
eharney | what kind of error happens there? | 19:11 |
*** e0ne has joined #openstack-cinder | 19:12 | |
hemna | https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/rbd.py#L483 | 19:13 |
hemna | I think that's what is nuking it | 19:13 |
hemna | if you have 1000 volumes in there and you try and delete 100 | 19:13 |
*** e0ne has quit IRC | 19:14 | |
hemna | get_volume_stats() will eventually end up with calls to the rbdproxy to fetch the size of all 1000 volumes individually | 19:14 |
hemna | on every delete call | 19:14 |
hemna | 100* | 19:14 |
hemna | :( | 19:14 |
eharney | hmm | 19:15 |
hemna | we have an internal bug tracking this | 19:15 |
hemna | I wish they would just file this upstream | 19:15 |
hemna | but it has customer details in it :( | 19:15 |
*** e0ne has joined #openstack-cinder | 19:16 | |
hemna | maybe put a lock around get_volume_stats() ? | 19:17 |
hemna | that would serialize it | 19:17 |
hemna | boatloads and boatloads of ImageNotFound errors in c-vol | 19:18 |
*** e0ne_ has joined #openstack-cinder | 19:18 | |
hemna | as the volumes disappear between get_volume_stats calls | 19:18 |
eharney | hemna: which release is this? | 19:18 |
* hemna checks | 19:18 | |
smcginnis | Would still be good to file an upstream bug that just has the failure information without any customer details. | 19:19 |
eharney | this sounds like the old stats behavior before we rewrote it a while ago to be much faster | 19:19 |
hemna | Pike | 19:19 |
smcginnis | Still happens with master? | 19:20 |
*** trident has joined #openstack-cinder | 19:20 | |
eharney | hemna: setting the rbd_exclusive_cinder_pool option may help a lot: https://review.opendev.org/#/c/607192/2 | 19:21 |
hemna | waiting for devstack to come back up to try it | 19:21 |
hemna | yah we have done that, that was the main suggestion to the customer | 19:21 |
hemna | not sure if they updated to include that | 19:21 |
*** e0ne has quit IRC | 19:21 | |
hemna | ok looks like they did set that | 19:23 |
hemna | and it helped, but still getting tons of ImageNotFound exceptions | 19:23 |
hemna | https://github.com/openstack/cinder/commit/5e4d7e5e986f7a7076632f1cef2c8195fdcc0824#diff-439c80f9f706c9c6c8fe90266cde5c40 | 19:24 |
hemna | hrmm | 19:24 |
hemna | that might help | 19:24 |
hemna | the ImageNotFound exceptions are coming from the parallel deletes eventually having a race | 19:28 |
hemna | between the time it fetches the list of images | 19:29 |
hemna | and when it calls driver.rbd.Image() in the rbdvolumeproxy | 19:29 |
*** jmlowe has quit IRC | 19:32 | |
hemna | maybe we can not log the ImageNotFound exception during _get_usage_info() time | 19:33 |
hemna | since that seems to ignore the ImageNotFound exception anyway | 19:33 |
hemna | https://github.com/openstack/cinder/blob/stable/pike/cinder/volume/drivers/rbd.py#L144 | 19:33 |
hemna | we log the exception there | 19:33 |
hemna | but in the case of _get_usage_info() we just keep going, and in this case we flood the logs with ImageNotFound exceptions | 19:34 |
eharney | if you're seeing that then they aren't using the exclusive pool option | 19:35 |
eharney | because it skips that method | 19:35 |
hemna | oh I see that | 19:35 |
hemna | on line 489 | 19:35 |
hemna | hrmm | 19:35 |
*** rosmaita has joined #openstack-cinder | 19:42 | |
*** e0ne_ has quit IRC | 19:49 | |
*** jmlowe has joined #openstack-cinder | 19:50 | |
*** e0ne has joined #openstack-cinder | 19:54 | |
hemna | deletes aren | 19:57 |
hemna | deletes aren't going through the scheduler | 19:57 |
*** enriquetaso has quit IRC | 19:58 | |
*** e0ne_ has joined #openstack-cinder | 19:59 | |
*** e0ne has quit IRC | 20:00 | |
*** eharney has quit IRC | 20:01 | |
hemna | looks like the create issue happens with LVM too | 20:03 |
hemna | yup | 20:06 |
hemna | so the problem is related to the periodic task happening every 60 seconds during non delete calls | 20:07 |
hemna | if you have lots of create calls between update_volume_stats(), the scheduler can think you have no allocation space left on the backend | 20:07 |
hemna | and it will fail create requests | 20:08 |
hemna | until the next get_volume_stats() happens | 20:08 |
hemna | so for scale testing, you can have lots of failures for creates, that really are false failures | 20:08 |
hemna | makes me wonder why we are doing get_volume_stats after every delete, but not after every create | 20:12 |
hemna | as well as periodic | 20:12 |
*** whfnst has quit IRC | 20:15 | |
*** e0ne_ has quit IRC | 20:19 | |
Roamer` | smcginnis, you are right that the devstack git_clone function is called with "master" as argument, but it does not really check out the master branch: since RECLONE is not set, it will not forcefully replace an already-checked-out repository | 20:28 |
Roamer` | smcginnis, but I saw what you did with the "be evil" "break the CI" patchset, we'll see how it goes :) | 20:28 |
Roamer` | hemna, yes, we fixed the storpool and storpool.spopenstack libraries on PyPI; the Cinder StorPool driver itself did not need any changes | 20:29 |
smcginnis | Roamer`: OK, great. That sounds right then. Thanks for checking on it. | 20:29 |
*** lpetrut has joined #openstack-cinder | 20:34 | |
*** lpetrut has quit IRC | 20:38 | |
*** ociuhandu has joined #openstack-cinder | 20:39 | |
*** ociuhandu has quit IRC | 20:45 | |
*** ociuhandu has joined #openstack-cinder | 20:46 | |
hemna | Roamer`: you might want to look at adding the storpool lib to setup.cfg https://github.com/openstack/cinder/blob/master/setup.cfg#L100 | 20:48 |
hemna | Roamer`: https://github.com/openstack/cinder/commit/4c9ae85ac8e31394c73338920642ef1b4ffa1127#diff-380c6a8ebbbce17d55d50ef17d3cf906 | 20:49 |
*** ociuhandu has quit IRC | 20:50 | |
Roamer` | hemna, ah, thanks, just found your commit that mentioned that storpool was skipped | 20:50 |
Roamer` | hemna, I'll do that, thanks a lot! | 20:50 |
hemna | so I added publish_service_capabilities() at the bottom end of create_volume() for success | 20:58 |
hemna | and the create problem went away | 20:58 |
hemna | Roamer`: no problem. Thanks for getting your CI back up and supporting the driver! | 20:58 |
*** markvoelker has quit IRC | 21:05 | |
*** markvoelker has joined #openstack-cinder | 21:11 | |
-openstackstatus- NOTICE: Gerrit is being restarted to pick up configuration changes. Should be quick. Sorry for the interruption. | 21:13 | |
*** markvoelker has quit IRC | 21:15 | |
Roamer` | hemna, BTW is there a reason why purestorage is mentioned twice in setup.cfg's extras section - once for "pure" (where I believe it belongs) and once again for "all"? | 21:32 |
openstackgerrit | Merged openstack/cinder master: Google backup: correct string encoding between py 2 and 3 https://review.opendev.org/676403 | 21:55 |
openstackgerrit | Peter Penchev proposed openstack/cinder master: StorPool: update the driver requirements. https://review.opendev.org/680530 | 22:11 |
*** carloss has quit IRC | 22:28 | |
*** kaisers has quit IRC | 22:37 | |
*** kaisers has joined #openstack-cinder | 22:38 | |
*** mriedem has quit IRC | 22:50 | |
*** dviroel has quit IRC | 23:02 | |
*** tkajinam has joined #openstack-cinder | 23:02 | |
*** rcernin has joined #openstack-cinder | 23:23 | |
*** threestrands has joined #openstack-cinder | 23:30 | |
*** n-saito has joined #openstack-cinder | 23:33 | |
*** rcernin is now known as rcernin|brb | 23:52 |
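
The scheduler accounting issue discussed between 18:57 and 19:01 (space is deducted in real time on create, but only given back when a fresh stats report arrives) can be illustrated with a toy model. This is a sketch under stated assumptions, not Cinder code: `ToyBackend`, `ToySchedulerHostState`, `update_from_stats` and `demo` are hypothetical names invented here, and only the method names `consume_from_volume` and `get_volume_stats` are borrowed from the conversation.

```python
"""Toy model of the capacity accounting discussed in the log (not Cinder code)."""


class ToyBackend:
    """Hypothetical backend that always knows its real free space."""

    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.used_gb = 0

    def get_volume_stats(self):
        return {"free_capacity_gb": self.total_gb - self.used_gb}


class ToySchedulerHostState:
    """Hypothetical scheduler-side view, refreshed only by stats reports."""

    def __init__(self, backend):
        self.backend = backend
        self.free_capacity_gb = backend.get_volume_stats()["free_capacity_gb"]

    def consume_from_volume(self, size_gb):
        # Deducted immediately when a create is scheduled.
        self.free_capacity_gb -= size_gb

    def update_from_stats(self):
        # Models a stats report finally reaching the scheduler.
        stats = self.backend.get_volume_stats()
        self.free_capacity_gb = stats["free_capacity_gb"]

    def create(self, size_gb):
        if self.free_capacity_gb < size_gb:
            return False  # "no host found", even if the backend has room
        self.consume_from_volume(size_gb)
        self.backend.used_gb += size_gb
        return True

    def delete(self, size_gb):
        # The backend reclaims the space; the scheduler view is not touched.
        self.backend.used_gb -= size_gb


def demo():
    backend = ToyBackend(total_gb=10)
    host = ToySchedulerHostState(backend)
    filled = [host.create(5), host.create(5)]
    host.delete(5)
    host.delete(5)
    stale_result = host.create(5)   # scheduler still believes it is full
    host.update_from_stats()
    fresh_result = host.create(5)   # succeeds once the stats are refreshed
    return filled, stale_result, fresh_result
```

In this model the third create fails although the backend is empty, and succeeds after `update_from_stats()`. That mirrors the rapid create/delete case abishop describes; hemna's create-only case with thin backends would need the model to track allocated versus actually used space, which is left out here.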
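
The race hemna describes at 19:28 to 19:34 (a list of images is fetched, then some of them are deleted before each one is opened) is commonly handled by treating "not found" as an expected outcome and logging it quietly. The following is a minimal sketch of that idea, assuming an in-memory stand-in for the pool: `FakePool`, `total_usage`, `demo` and the local `ImageNotFound` class are all hypothetical and are not the rbd library or the Cinder RBD driver.

```python
import logging

LOG = logging.getLogger(__name__)


class ImageNotFound(Exception):
    """Hypothetical stand-in for the backend's 'image is gone' error."""


class FakePool:
    """In-memory stand-in for a pool of images; not the rbd library."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)

    def list_names(self):
        return list(self._sizes)

    def delete(self, name):
        self._sizes.pop(name, None)

    def size_of(self, name):
        try:
            return self._sizes[name]
        except KeyError:
            raise ImageNotFound(name) from None


def total_usage(pool, names):
    """Sum sizes for a previously fetched name list, skipping vanished images."""
    total = 0
    for name in names:
        try:
            total += pool.size_of(name)
        except ImageNotFound:
            # Expected during parallel deletes: log at debug level and continue.
            LOG.debug("Image %s disappeared while gathering usage", name)
    return total


def demo():
    pool = FakePool({"vol-a": 1, "vol-b": 2, "vol-c": 4})
    names = pool.list_names()  # snapshot of names taken first
    pool.delete("vol-b")       # a concurrent delete wins the race
    return total_usage(pool, names)
```

Logging at debug level rather than recording a full exception keeps the usage scan from flooding the logs during bulk deletes, which is the behaviour hemna suggests for the `_get_usage_info()` path. Whether that is acceptable for other callers of the same proxy is a separate question the log does not settle.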