opendevreview | OpenStack Proposal Bot proposed openstack/cinder master: Imported Translations from Zanata https://review.opendev.org/c/openstack/cinder/+/889998 | 04:09 |
---|---|---|
opendevreview | Katarina Strenkova proposed openstack/cinder-tempest-plugin master: Replace deprecated terms https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/889848 | 07:44 |
opendevreview | Raghavendra Tilay proposed openstack/cinder master: HPE 3par: Fix issue seen during retype/migrate https://review.opendev.org/c/openstack/cinder/+/887559 | 08:31 |
raghavendrat | hi whoami-rajat: are you around ? | 12:15 |
raghavendrat | whoami-rajat: regarding https://review.opendev.org/c/openstack/cinder/+/878684 | 12:55 |
raghavendrat | It has two +2. Whenever you get time, could you please have a look. Thanks. | 12:55 |
dansmith | rosmaita: my feeling is that "lvchange -ay" should be a pretty fast operation, no? IIRC, it's just creating a dm device, dm entries, and making it active | 15:27 |
dansmith | I | 15:27 |
dansmith | am looking at a failure where a bunch of stuff grinds to a halt on a CI worker for a minute, and the thing that seems to correlate is running 'lvchange -ay' at the same time, which takes 60s | 15:28 |
rosmaita | dansmith: yes, i would expect that to be pretty fast, but apparently not | 15:34 |
dansmith | because it's 60s exactly, it smells of a deadlock/timeout thing, but it also doesn't fail | 15:35 |
dansmith | rosmaita: any pointers to who might be able to take a look at that? | 15:48 |
rosmaita | dansmith: can you give me a link? | 15:52 |
dansmith | rosmaita: in and about here: https://zuul.opendev.org/t/openstack/build/ed1d53dce8204b6e82c6dcedf335fb66/log/controller/logs/screen-c-vol.txt#6520 | 15:54 |
rosmaita | ty | 15:55 |
dansmith | no activity for a minute or so before that, then some "this took too long" things around running that command, updating the service state etc | 15:55 |
dansmith | get_volume_stats took 134s, etc | 15:55 |
dansmith | so I'm wondering if that is because c-vol is hung up with long-running lvm commands, | 15:57 |
dansmith | or if the long-running db and lvm commands are a symptom of something somewhere else | 15:57 |
dansmith | that loopingcall thing that is complaining about report_state taking longer than the interval kinda makes me wonder | 15:59 |
dansmith | I see it running pretty often and complaining about an overage that almost looks like a good portion of the interval | 15:59 |
dansmith | how often is that supposed to run? certainly not multiple times per minute right? | 16:00 |
rosmaita | i think 6x a minute by default | 16:02 |
dansmith | really? what does it do? I assume update a service record in the database like nova? | 16:03 |
rosmaita | yeah, i think it's basically a heartbeat kind of thing | 16:04 |
dansmith | but to the database? | 16:04 |
rosmaita | yes, pretty sure | 16:04 |
dansmith | so we could probably slow that down in devstack right? | 16:05 |
dansmith | at times of high load, that's probably burning one of your threadpool workers all the time | 16:05 |
dansmith | and increasing load on the database | 16:05 |
dansmith | I guess nova's is pretty high as well, but I know that's one of the things people have to slow down at any kind of scale, sort of first thing | 16:06 |
rosmaita | i think it would only be an issue for devstack if there are tempest tests around the service status APIs | 16:06 |
dansmith | well, we can control the "what do we consider to be down" interval, which would go up as well | 16:06 |
dansmith | (we being nova) | 16:06 |
dansmith | if we set the report interval too slow and don't adjust the other, then tests will fail because we won't schedule to "down" services, | 16:07 |
dansmith | but if you bump both of them it'll balance out | 16:07 |
rosmaita | I think the option is [DEFAULT]/report_interval in cinder.conf | 16:07 |
dansmith | yeah, same for nova | 16:07 |
dansmith | and service_down_time? | 16:07 |
rosmaita | yeah, we have 60 as the default for service_down_time | 16:08 |
dansmith | yeah, so I'm thinking report_interval=60 and service_down_time=120 | 16:09 |
dansmith | for both nova and cinder | 16:09 |
dansmith | it will be a small gain, granted, but if we're seeing it constantly not meeting the deadline, then there's really no point in keeping it so low | 16:09 |
rosmaita | i was thinking 120 and 720 but let's see what your change does | 16:10 |
dansmith | your numbers are bigger, let's do that :P | 16:10 |
rosmaita | i will be interested to see what happens | 16:11 |
dansmith | rosmaita: yeah, this is just a minor tweak, but we'll see.. still interested in any diagnosis of the lvm stuff | 16:16 |
dansmith | rosmaita: https://review.opendev.org/c/openstack/devstack/+/890439 | 16:19 |
dansmith | rosmaita: so back to that example, | 16:22 |
dansmith | cinder runs "lvs" which I assume is like updating its internal state or something, | 16:23 |
dansmith | and then nothing else happens for a minute, | 16:23 |
dansmith | which also correlates with the lvchange -ay delay | 16:23 |
dansmith | I'm wondering if cinder should be using an external lock to only run one lvm command at a time maybe? | 16:24 |
rosmaita | i have no idea, this is going to require some research | 17:13 |
dansmith | rosmaita: related, but.. I wonder if it would be better to use the ceph driver more than lvm as our baseline? | 17:13 |
dansmith | I would expect lvm to be much simpler and better performing, but I kinda think we see more failures using it for random reasons than ceph | 17:14 |
rosmaita | i don't know about that, guess we'll have to pull some data | 17:16 |
eharney | the lvs/vgs commands running for over a minute is bizarre... it looks like devstack already sets up lvm device filtering so i'm not sure what could cause that | 20:02 |
opendevreview | Eric Harney proposed openstack/cinder master: LVM: Use --readonly where possible for lvs/vgs https://review.opendev.org/c/openstack/cinder/+/890460 | 20:24 |
opendevreview | Eric Harney proposed openstack/os-brick master: LVM: Use --readonly where possible for lvs/vgs https://review.opendev.org/c/openstack/os-brick/+/890288 | 20:25 |
opendevreview | Eric Harney proposed openstack/cinder master: DNM: LVM: get debug output from lvs https://review.opendev.org/c/openstack/cinder/+/890461 | 20:29 |
eharney | some ideas ^ | 20:30 |
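
The report_interval / service_down_time trade-off discussed above (16:02 to 16:10) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not code from cinder, nova, or devstack: the `HeartbeatSettings` class and its method names are hypothetical, and the "default" report_interval of 10 seconds is inferred from the "6x a minute" remark rather than confirmed in the log. The 60/120 and 120/720 pairs are the two proposals from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatSettings:
    """Hypothetical helper; names are illustrative, not part of any OpenStack API."""

    report_interval: int    # seconds between heartbeat writes to the database
    service_down_time: int  # seconds without a heartbeat before a service counts as down

    def reports_per_minute(self) -> float:
        # How often the heartbeat hits the database each minute.
        return 60 / self.report_interval

    def intervals_before_down(self) -> int:
        # Whole report intervals that fit inside service_down_time.
        return self.service_down_time // self.report_interval

    def is_consistent(self) -> bool:
        # If the heartbeat is slower than the down threshold, a healthy
        # service would be treated as down between two reports.
        return self.service_down_time > self.report_interval


# Assumed default (10s inferred from "6x a minute") plus the two proposals.
CANDIDATES = {
    "assumed default": HeartbeatSettings(report_interval=10, service_down_time=60),
    "dansmith": HeartbeatSettings(report_interval=60, service_down_time=120),
    "rosmaita": HeartbeatSettings(report_interval=120, service_down_time=720),
}

if __name__ == "__main__":
    for name, s in CANDIDATES.items():
        print(
            f"{name}: {s.reports_per_minute():g} reports/min, "
            f"{s.intervals_before_down()} intervals before down, "
            f"consistent={s.is_consistent()}"
        )
```

Raising only report_interval without also raising service_down_time is the failure mode mentioned at 16:07: `HeartbeatSettings(120, 60).is_consistent()` is false, which corresponds to services being treated as down and skipped by the scheduler.
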