Thursday, 2023-05-18

opendevreviewRajat Dhasmana proposed openstack/glance-specs master: Update new location APIs spec  https://review.opendev.org/c/openstack/glance-specs/+/88349111:51
whoami-rajatdansmith, abhishekk ^ updated the spec based on our discussion yesterday11:53
dansmithlooks like something has regressed with glance and ceph lately: https://c779c3991544ea2e80ce-95a2136c445cc9f6f88a9d6c838f9ef5.ssl.cf5.rackcdn.com/882987/1/gate/devstack-plugin-ceph-tempest-py3/fcb0027/testr_results.html14:04
dansmithabhishekk: you just missed:14:08
dansmithlooks like something has regressed with glance and ceph lately: https://c779c3991544ea2e80ce-95a2136c445cc9f6f88a9d6c838f9ef5.ssl.cf5.rackcdn.com/882987/1/gate/devstack-plugin-ceph-tempest-py3/fcb0027/testr_results.html14:08
abhishekkohh14:08
dansmithseems to always fail, but I'm not sure why14:09
dansmithand not sure what would have changed.. nothing in glance looks relevant, but I'm wondering if it's glance-store or something?14:09
dansmithhmm, like maybe "RBD: Wrap RBD calls in native threads"14:10
abhishekkmay be14:10
dansmithceph  job failed on that patch :(14:10
abhishekkreally, was it non-voting?14:10
dansmithlogs are gone so i can't see if that's why it failed14:11
dansmithyes, non-voting14:11
abhishekk:/14:11
dansmithI'll propose a revert and we'll see14:11
dansmithunfortunate because we just released glance-store14:11
abhishekkyeah14:11
dansmithwhoami-rajat: ^14:11
opendevreviewDan Smith proposed openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads"  https://review.opendev.org/c/openstack/glance_store/+/88353214:12
abhishekkwe can re-release with minor version change I guess14:12
opendevreviewDan Smith proposed openstack/glance master: DNM: Test glance-store revert  https://review.opendev.org/c/openstack/glance/+/88351514:13
dansmiththe first one of these I saw yesterday, I noticed a log message about a failed greenlet switch14:14
dansmithso this would make sense as the culprit14:14
dansmithcan we make that job voting on glance-store? :14:14
whoami-rajatdansmith, abhishekk do we have anything pointing to that patch? we've made that change in cinder long time ago and didn't face any issues14:15
whoami-rajatthe failure i see in the job is14:15
whoami-rajatMay 17 17:04:48.044167 np0034056070 devstack@g-api.service[109746]: ERROR glance.api.v2.image_data glance.common.exception.InvalidImageStatusTransition: Image status transition from saving to saving is not allowed14:15
dansmithwhoami-rajat: yeah, let me get you a reference14:15
whoami-rajatack14:16
dansmithwhoami-rajat: right as we start the import of the image for one of the failed tests: https://zuul.opendev.org/t/openstack/build/06aba9dafd594beab9bf31d6bfec53c6/log/controller/logs/screen-g-api.txt#1301714:17
dansmithwhoami-rajat: glance is running under uwsgi here, which if cinder is not would be the difference I expect14:17
dansmithpart of the problem of doing *everything* in the api worker :/14:18
whoami-rajatdansmith, cinder API runs under uwsgi as well14:18
dansmithwhoami-rajat: ack, but are you doing this code in your api worker? nova certainly doesn't14:18
dansmithanyway, we can wait and see if this changes it or not. I've seen this failure on several projects starting in the last couple of days and I suspect it's due to the glance-store release for the cve14:20
dansmithand our stable jobs aren't failing, which is a good sign it's new I think14:21
whoami-rajatdansmith, nope, it's in the c-vol service14:27
dansmithwhoami-rajat: https://imgur.com/a/GqRUMP714:28
dansmithpretty striking sudden failboat ;)14:28
dansmith(that's not all necessarily this of course, just that job failing)14:31
whoami-rajatsorry I'm just trying to figure out what could go wrong, i think the native thread execution is blocking operation execution in green threads somehow but not sure14:39
abhishekkwhoami-rajat, in the location api spec have you removed do_secure_hash as request param and made it config value (default to True) also http_retries conf for retrying the hash calculation before marking it fail and leave it unset?14:41
whoami-rajatabhishekk, the first part yes, wasn't sure about the retries part, i can mention http_retries where I've written about retries14:46
abhishekkyeah, we need a config value for that, I guess default should be 3?14:46
whoami-rajatsounds good, hopefully hash calculation 3 times won't be a huge performance issue14:48
abhishekkNot sure it will ever come to retries unless environment is very unstable14:49
dansmith(sorry meeting)14:49
abhishekknp14:49
whoami-rajatok, i can update that, in the meantime if you would like to add a comment on spec, that would be also good14:53
abhishekkack14:58
abhishekkdone15:01
dansmithwhoami-rajat: not sure, but before it was threadpooled and now it's not, so maybe there's some other resource constraint going on15:04
dansmithwhoami-rajat: we have a generic threadpool now that does the right thing under eventlet and uwsgi depending on how we're running, so perhaps that's the thing to use15:05
dansmithoh, no we can't because this isn't in glance itself15:05
whoami-rajatwe're calling low level C code so it's better to wrap it around native threads since we've seen green threads being blocked while calling librbd code15:09
dansmithyeah, calling to c code will always block greenthreads15:11
dansmithbut when we're in uwsgi mode the import job is *already* running in a native thread15:12
dansmithwhoami-rajat: actually, isn't that change just removing the threading entirely?15:13
dansmithunless that call to RBD() creates a thread to run the thing and returns to the caller immediately, this is just making everything a synchronous call and would block everything in that process until it returns15:15
whoami-rajatcan we have a case where the thread doesn't get created? the sync part i think yes, but i guess that will happen with green threads as well, the gthread calling rbd code executes it in native thread and doesn't do context switch15:26
whoami-rajatbut i could be wrong15:26
dansmithwhoami-rajat: the tpool bit you removed is the thing that runs that in a helper thread, but tracked by eventlet as if it was a greenthread, AFAIK15:42
dansmithmeaning before you got a schedule event when you make that tpool call, which after it creates the native thread, will schedule more greenthreads and unblock this caller when the native thread finishes15:43
dansmithin uwsgi mode as this is, we should already be running in a native thread for the import, which means the sync call shouldn't block all of glance-api15:44
dansmithso I'm not 100% sure why it's stuck here, but it's clearly starving out some other greenthread-enabled code, hence that warning15:45
whoami-rajatok makes sense since tpool is being imported from eventlet so it should have mechanism to make the native thread non-blocking but I'm not super familiar with what eventlet does on low level16:06
dansmithyep16:06
dansmithclean zuul run now with that revert: https://review.opendev.org/c/openstack/glance/+/883515?tab=change-view-tab-header-zuul-results-summary16:20
dansmithabhishekk: rosmaita ^16:21
abhishekkdansmith, ack, should we report a bug I guess we need to mention that in release note16:22
dansmithokay16:22
opendevreviewDan Smith proposed openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads"  https://review.opendev.org/c/openstack/glance_store/+/88353216:32
dansmithabhishekk: how's that^ ?16:32
abhishekklooking16:33
abhishekkLooks good, thank you16:34
dansmithcool, thanks16:34
abhishekkcroelandt, is on leave so we need rosmaita here16:35
dansmithit looks like this is not 100% fail, which makes sense that it's a thread-related resource starvation thing being not fully deterministic 16:35
dansmithbut it happens plenty often16:35
abhishekkwill you propose release patch as well?16:37
abhishekkiwill be on leave tomorrow16:37
dansmithokay16:37
abhishekkthanks 16:38
opendevreviewDan Smith proposed openstack/glance_store master: Make ceph job voting  https://review.opendev.org/c/openstack/glance_store/+/88355116:52
rosmaitadansmith: just got back from lunch, read the scrollback, approved your revert patch17:12
dansmiththanks17:12
rosmaitado we need to put up a release patch?17:12
dansmithyeah17:12
dansmithI put more description of the actual problem in the bug fwiw and for posterity17:13
rosmaitaand propose blacklisting the bad glance_store version, i guess in upper-constraints?17:13
dansmithI mean, it would only affect ceph users so I dunno how nuclear we need to go, but probably won't hurt17:13
rosmaitai'll put up a release patch in a minute17:13
dansmithokay thanks17:13
rosmaitai'll also do the u-c patch, we can let the requirements team decide17:14
dansmithack17:14
rosmaitaguess this is why they like us to release libraries early at M-1, to catch stuff like this17:15
dansmithyeah, but we also should be running the ceph job on glance_store :)17:15
dansmithit's not 100% so I guess we might'nt've caught it17:16
opendevreviewMerged openstack/glance_store master: Revert "RBD: Wrap RBD calls in native threads"  https://review.opendev.org/c/openstack/glance_store/+/88353220:17
rosmaitadansmith: can you verify the hash on https://review.opendev.org/c/openstack/releases/+/883563 ?20:22
dansmithrosmaita: yeah was just waiting for the merge20:23
rosmaitaalso, take a look at this one: https://review.opendev.org/c/openstack/requirements/+/88356820:24
rosmaita(i tried not to use the term 'blacklist')20:24
rosmaitadon't think of Wayne's World or Stairway to Heaven when reading the commit message20:25
dansmithlol, I had to look it up to get the actual denied reference.. I guess that means I need to refresh my wayne's world cache :)20:29
rosmaitashoot, looks like the tooling prefers a minor version bump21:22

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!