opendevreview | Rajat Dhasmana proposed openstack/glance-specs master: Update new location APIs spec https://review.opendev.org/c/openstack/glance-specs/+/883491 | 11:51 |
---|---|---|
whoami-rajat | dansmith, abhishekk ^ updated the spec based on our discussion yesterday | 11:53 |
dansmith | looks like something has regressed with glance and ceph lately: https://c779c3991544ea2e80ce-95a2136c445cc9f6f88a9d6c838f9ef5.ssl.cf5.rackcdn.com/882987/1/gate/devstack-plugin-ceph-tempest-py3/fcb0027/testr_results.html | 14:04 |
dansmith | abhishekk: you just missed: | 14:08 |
dansmith | looks like something has regressed with glance and ceph lately: https://c779c3991544ea2e80ce-95a2136c445cc9f6f88a9d6c838f9ef5.ssl.cf5.rackcdn.com/882987/1/gate/devstack-plugin-ceph-tempest-py3/fcb0027/testr_results.html | 14:08 |
abhishekk | ohh | 14:08 |
dansmith | seems to always fail, but I'm not sure why | 14:09 |
dansmith | and not sure what would have changed.. nothing in glance looks relevant, but I'm wondering if it's glance-store or something? | 14:09 |
dansmith | hmm, like maybe "RBD: Wrap RBD calls in native threads" | 14:10 |
abhishekk | may be | 14:10 |
dansmith | ceph job failed on that patch :( | 14:10 |
abhishekk | really, was it non-voting? | 14:10 |
dansmith | logs are gone so i can't see if that's why it failed | 14:11 |
dansmith | yes, non-voting | 14:11 |
abhishekk | :/ | 14:11 |
dansmith | I'll propose a revert and we'll see | 14:11 |
dansmith | unfortunate because we just released glance-store | 14:11 |
abhishekk | yeah | 14:11 |
dansmith | whoami-rajat: ^ | 14:11 |
opendevreview | Dan Smith proposed openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads" https://review.opendev.org/c/openstack/glance_store/+/883532 | 14:12 |
abhishekk | we can re-release with minor version change I guess | 14:12 |
opendevreview | Dan Smith proposed openstack/glance master: DNM: Test glance-store revert https://review.opendev.org/c/openstack/glance/+/883515 | 14:13 |
dansmith | the first one of these I saw yesterday, I noticed a log message about a failed greenlet switch | 14:14 |
dansmith | so this would make sense as the culprit | 14:14 |
dansmith | can we make that job voting on glance-store? : | 14:14 |
whoami-rajat | dansmith, abhishekk do we have anything pointing to that patch? we've made that change in cinder long time ago and didn't face any issues | 14:15 |
whoami-rajat | the failure i see in the job is | 14:15 |
whoami-rajat | May 17 17:04:48.044167 np0034056070 devstack@g-api.service[109746]: ERROR glance.api.v2.image_data glance.common.exception.InvalidImageStatusTransition: Image status transition from saving to saving is not allowed | 14:15 |
dansmith | whoami-rajat: yeah, let me get you a reference | 14:15 |
whoami-rajat | ack | 14:16 |
dansmith | whoami-rajat: right as we start the import of the image for one of the failed tests: https://zuul.opendev.org/t/openstack/build/06aba9dafd594beab9bf31d6bfec53c6/log/controller/logs/screen-g-api.txt#13017 | 14:17 |
dansmith | whoami-rajat: glance is running under uwsgi here, which if cinder is not would be the difference I expect | 14:17 |
dansmith | part of the problem of doing *everything* in the api worker :/ | 14:18 |
whoami-rajat | dansmith, cinder API runs under uwsgi as well | 14:18 |
dansmith | whoami-rajat: ack, but are you doing this code in your api worker? nova certainly doesn't | 14:18 |
dansmith | anyway, we can wait and see if this changes it or not. I've seen this failure on several projects starting in the last couple of days and I suspect it's due to the glance-store release for the cve | 14:20 |
dansmith | and our stable jobs aren't failing, which is a good sign it's new I think | 14:21 |
whoami-rajat | dansmith, nope, it's in the c-vol service | 14:27 |
dansmith | whoami-rajat: https://imgur.com/a/GqRUMP7 | 14:28 |
dansmith | pretty striking sudden failboat ;) | 14:28 |
dansmith | (that's not all necessarily this of course, just that job failing) | 14:31 |
whoami-rajat | sorry I'm just trying to figure out what could go wrong, i think the native thread execution is blocking operation execution in green threads somehow but not sure | 14:39 |
abhishekk | whoami-rajat, in the location api spec have you removed do_secure_hash as request param and made it config value (default to True) also http_retries conf for retrying the hash calculation before marking it fail and leave it unset? | 14:41 |
whoami-rajat | abhishekk, the first part yes, wasn't sure about the retries part, i can mention http_retries where I've written about retries | 14:46 |
abhishekk | yeah, we need a config value for that, I guess default should be 3? | 14:46 |
whoami-rajat | sounds good, hopefully hash calculation 3 times won't be a huge performance issue | 14:48 |
abhishekk | Not sure it will ever come to retries unless environment is very unstable | 14:49 |
dansmith | (sorry meeting) | 14:49 |
abhishekk | np | 14:49 |
whoami-rajat | ok, i can update that, in the meantime if you would like to add a comment on spec, that would be also good | 14:53 |
abhishekk | ack | 14:58 |
abhishekk | done | 15:01 |
dansmith | whoami-rajat: not sure, but before it was threadpooled and now it's not, so maybe there's some other resource constraint going on | 15:04 |
dansmith | whoami-rajat: we have a generic threadpool now that does the right thing under eventlet and uwsgi depending on how we're running, so perhaps that's the thing to use | 15:05 |
dansmith | oh, no we can't because this isn't in glance itself | 15:05 |
whoami-rajat | we're calling low level C code so it's better to wrap it around native threads since we've seen green threads being blocked while calling librbd code | 15:09 |
dansmith | yeah, calling to c code will always block greenthreads | 15:11 |
dansmith | but when we're in uwsgi mode the import job is *already* running in a native thread | 15:12 |
dansmith | whoami-rajat: actually, isn't that change just removing the threading entirely? | 15:13 |
dansmith | unless that call to RBD() creates a thread to run the thing and returns to the caller immediately, this is just making everything a synchronous call and would block everything in that process until it returns | 15:15 |
whoami-rajat | can we have a case where the thread doesn't get created? the sync part i think yes, but i guess that will happen with green threads as well, the gthread calling rbd code executes it in native thread and doesn't do context switch | 15:26 |
whoami-rajat | but i could be wrong | 15:26 |
dansmith | whoami-rajat: the tpool bit you removed is the thing that runs that in a helper thread, but tracked by eventlet as if it was a greenthread, AFAIK | 15:42 |
dansmith | meaning before you got a schedule event when you make that tpool call, which after it creates the native thread, will schedule more greenthreads and unblock this caller when the native thread finishes | 15:43 |
dansmith | in uwsgi mode as this is, we should already be running in a native thread for the import, which means the sync call shouldn't block all of glance-api | 15:44 |
dansmith | so I'm not 100% sure why it's stuck here, but it's clearly starving out some other greenthread-enabled code, hence that warning | 15:45 |
whoami-rajat | ok makes sense since tpool is being imported from eventlet so it should have mechanism to make the native thread non-blocking but I'm not super familiar with what eventlet does on low level | 16:06 |
dansmith | yep | 16:06 |
dansmith | clean zuul run now with that revert: https://review.opendev.org/c/openstack/glance/+/883515?tab=change-view-tab-header-zuul-results-summary | 16:20 |
dansmith | abhishekk: rosmaita ^ | 16:21 |
abhishekk | dansmith, ack, should we report a bug I guess we need to mention that in release note | 16:22 |
dansmith | okay | 16:22 |
opendevreview | Dan Smith proposed openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads" https://review.opendev.org/c/openstack/glance_store/+/883532 | 16:32 |
dansmith | abhishekk: how's that^ ? | 16:32 |
abhishekk | looking | 16:33 |
abhishekk | Looks good, thank you | 16:34 |
dansmith | cool, thanks | 16:34 |
abhishekk | croelandt, is on leave so we need rosmaita here | 16:35 |
dansmith | it looks like this is not 100% fail, which makes sense that it's a thread-related resource starvation thing being not fully deterministic | 16:35 |
dansmith | but it happens plenty often | 16:35 |
abhishekk | will you propose release patch as well? | 16:37 |
abhishekk | iwill be on leave tomorrow | 16:37 |
dansmith | okay | 16:37 |
abhishekk | thanks | 16:38 |
opendevreview | Dan Smith proposed openstack/glance_store master: Make ceph job voting https://review.opendev.org/c/openstack/glance_store/+/883551 | 16:52 |
rosmaita | dansmith: just got back from lunch, read the scrollback, approved your revert patch | 17:12 |
dansmith | thanks | 17:12 |
rosmaita | do we need to put up a release patch? | 17:12 |
dansmith | yeah | 17:12 |
dansmith | I put more description of the actual problem in the bug fwiw and for posterity | 17:13 |
rosmaita | and propose blacklisting the bad glance_store version, i guess in upper-constraints? | 17:13 |
dansmith | I mean, it would only affect ceph users so I dunno how nuclear we need to go, but probably won't hurt | 17:13 |
rosmaita | i'll put up a release patch in a minute | 17:13 |
dansmith | okay thanks | 17:13 |
rosmaita | i'll also do the u-c patch, we can let the requirements team decide | 17:14 |
dansmith | ack | 17:14 |
rosmaita | guess this is why they like us to release libraries early at M-1, to catch stuff like this | 17:15 |
dansmith | yeah, but we also should be running the ceph job on glance_store :) | 17:15 |
dansmith | it's not 100% so I guess we might'nt've caught it | 17:16 |
opendevreview | Merged openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads" https://review.opendev.org/c/openstack/glance_store/+/883532 | 20:17 |
rosmaita | dansmith: can you verify the hash on https://review.opendev.org/c/openstack/releases/+/883563 ? | 20:22 |
dansmith | rosmaita: yeah was just waiting for the merge | 20:23 |
rosmaita | also, take a look at this one: https://review.opendev.org/c/openstack/requirements/+/883568 | 20:24 |
rosmaita | (i tried not to use the term 'blacklist') | 20:24 |
rosmaita | don't think of Wayne's World or Stairway to Heaven when reading the commit message | 20:25 |
dansmith | lol, I had to look it up to get the actual denied reference.. I guess that means I need to refresh my wayne's world cache :) | 20:29 |
rosmaita | shoot, looks like the tooling prefers a minor version bump | 21:22 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!