| opendevreview | Pranali Deore proposed openstack/glance_store master: DNM: Test whether few failing jobs passes or not https://review.opendev.org/c/openstack/glance_store/+/879940 | 07:42 |
|---|---|---|
| opendevreview | Pranali Deore proposed openstack/glance master: Change DB migration constant to 2023_2 https://review.opendev.org/c/openstack/glance/+/879947 | 09:58 |
| opendevreview | Merged openstack/glance-specs master: Add a script to prepare the next cycle https://review.opendev.org/c/openstack/glance-specs/+/878121 | 10:35 |
| dansmith | I get all the "ResourceWarning" messages when I run functional locally | 13:36 |
| dansmith | and I do see some leaked processes, but they fluctuate and apparently eventually go away | 13:39 |
| dansmith | and lots of failed process launch status, "no such process" etc | 13:40 |
| dansmith | abhishekk: something clearly happened between 3/14 and 3/27: https://zuul.opendev.org/t/openstack/builds?job_name=glance-tox-functional-py39-rbac-defaults | 14:02 |
| dansmith | almost nothing other than merging that startup directory check, but that passed the same tests, and reverting it locally doesn't fix anything for me | 14:02 |
| abhishekk | dansmith, ack, don't think that startup directory check has anything to do with timeouts | 14:03 |
| dansmith | well, I thought maybe it was preventing the functional api workers from starting, because that seems to be the failure (api servers aren't running) | 14:04 |
| dansmith | and because the time lined up, but yeah, seems unrelated | 14:04 |
| abhishekk | may be need to use skip to isolate/find out the failing test | 14:05 |
| dansmith | it's a ton of them, and maybe all of them? | 14:06 |
| dansmith | I assume you see this in the logs: AssertionError: False is not true : Unexpected server launch status for: api, | 14:06 |
| abhishekk | is there any requirement side change in between related to eventlet or something ? | 14:06 |
| dansmith | I've been looking but I don't think so, unless there's something unconstrained | 14:07 |
| abhishekk | So as per your comment those are failing locally as well, right? | 14:07 |
| dansmith | yeah, are they not for you? | 14:08 |
| abhishekk | I haven't run locally anything recently, just doing it now | 14:08 |
| abhishekk | yep it hanged locally for me as well | 14:15 |
| abhishekk | So I think all tests which are failing are using legacy api server from tests (not the new one which you have written to add new tests for import/quota/policy changes) | 14:17 |
| dansmith | yeah, where it starts a complete api worker process and talks to it over http | 14:17 |
| abhishekk | right | 14:18 |
| dansmith | and it seems that thing is crashing or never coming up or something, which has always been super hard to debug | 14:18 |
| dansmith | I see lots of zombie processes under the test workers, which are those api processes AFAICT | 14:18 |
| abhishekk | +1 | 14:18 |
| abhishekk | Can we refactor existing api class similar to recent one? | 14:20 |
| dansmith | tests you mean? | 14:21 |
| abhishekk | yeah | 14:22 |
| dansmith | it would be a massive effort | 14:23 |
| dansmith | also, | 14:23 |
| dansmith | the thing you're talking about is synchronous, so the nature of the tests which expect async behaviors would need to change | 14:23 |
| abhishekk | right, I think shortest way is to debug a single test to find out what is going wrong | 14:24 |
| dansmith | yeah | 14:25 |
| abhishekk | pdeore, you around? | 14:25 |
| dansmith | it's also weird that it's working in the devstack jobs, which I think means whatever is broken is something only related to this weird functional worker thing | 14:25 |
| abhishekk | this made me more nervous :D | 14:26 |
| abhishekk | is there a shortcut command to kill zombie processes? | 14:28 |
| dansmith | they must be waited on | 14:30 |
| abhishekk | yeah | 14:31 |
| abhishekk | _warn("subprocess %s is still running" % self.pid, | 14:32 |
| abhishekk | might be related, lots of occurrences in logs | 14:34 |
| dansmith | I think that's a symptom | 14:34 |
| abhishekk | likely | 14:35 |
| dansmith | so I think this is what's crashing it: ERROR: 'NoneType' object has no attribute 'group' | 14:36 |
| dansmith | but there's no trace so I have no idea where that is coming from | 14:36 |
| abhishekk | I think we should try skipping test_reload once? | 14:36 |
| dansmith | why? I can run any of the tests in isolation and they fail | 14:36 |
| dansmith | I've been randomly using this one: test_invalid_cors_get_request | 14:37 |
| abhishekk | ohh, I thought that is the one which is reloading configs | 14:37 |
| abhishekk | ack | 14:37 |
| abhishekk | greenlet-1.1.3 is what was installed on the passing job whereas now it is greenlet==2.0.2 | 14:49 |
| abhishekk | https://d2e021dde0f27c24b843-9e47a969cbb910cc10dbd93fca848265.ssl.cf5.rackcdn.com/850417/9/check/cross-glance-tox-functional/84bf072/job-output.txt | 14:49 |
| abhishekk | this is the last passing cross-glance-tox-functional | 14:50 |
| abhishekk | https://review.opendev.org/c/openstack/requirements/+/872065 | 14:53 |
| abhishekk | this patch is submitted on 26/03 | 14:53 |
| dansmith | ah, I checked greenlet, but it was released in january, so I figured unrelated | 14:55 |
| dansmith | however, | 14:55 |
| dansmith | are we getting that in local runs? | 14:55 |
| dansmith | ah, u-c changed | 14:55 |
| dansmith | and we install that from master when we run tox, regardless | 14:55 |
| dansmith | that's unfortunate | 14:55 |
| abhishekk | yeah | 14:56 |
| dansmith | that's why walking back in the git history doesn't change it I guess | 14:56 |
| dansmith | does that fix it for you? I'm trying | 14:56 |
| abhishekk | nah, I just figured out the patch | 14:56 |
| abhishekk | existing tests are still running for me :D | 14:57 |
| dansmith | I'm not actually sure if eventlet uses greenlet | 14:57 |
| dansmith | yeah, same behavior with 1.1.3 for me | 14:58 |
| abhishekk | :/ | 14:58 |
| abhishekk | how did you override it in the local run? | 15:00 |
| abhishekk | changed uc inside .tox ? | 15:00 |
| dansmith | $ .tox/functional/bin/pip install -U greenlet==1.1.3 | 15:00 |
| dansmith | I'm rebuilding my tox env with u-c from march 14th | 15:01 |
| dansmith | so that should get any others to see if that's related | 15:01 |
| abhishekk | ack | 15:01 |
| dansmith | that installed the older constraints, but didn't fix the problem | 15:04 |
| abhishekk | eventlet 0.33.3 vs 0.33.1 ? | 15:05 |
| dansmith | oh hang on, | 15:05 |
| dansmith | I might have broken something else in my testing, just a sec | 15:05 |
| abhishekk | ack | 15:05 |
| dansmith | oh snap | 15:06 |
| dansmith | - Passed: 1 | 15:06 |
| abhishekk | greenlet or eventlet? | 15:06 |
| dansmith | u-c from march 14 | 15:07 |
| dansmith | but this same issue might have confused my just-greenlet testing earlier | 15:07 |
| abhishekk | ack | 15:08 |
| dansmith | I was halfway through trying to print out something on startup and got distracted with the greenlet thing, but had left a typo | 15:08 |
| dansmith | manifests the same.. a typo preventing the service from starting :) | 15:08 |
| abhishekk | :D | 15:09 |
| abhishekk | what changed in u-c between the 14th and now? | 15:12 |
| dansmith | okay greenlet alone does not fix it | 15:12 |
| dansmith | I'm looking | 15:12 |
| abhishekk | i found greenlet and eventlet with diff versions | 15:13 |
| dansmith | https://termbin.com/cixg2 | 15:13 |
| abhishekk | ack | 15:14 |
| dansmith | rolling back eventlet failed with dns api error, rolling back dnspython too | 15:15 |
| dansmith | that diff is 65245016de7cf2d1e585eeb1378aac6aa6d75de0..master in requirements, btw | 15:15 |
| dansmith | nope | 15:16 |
| dansmith | mmm, paste | 15:17 |
| abhishekk | pastedeploy? | 15:17 |
| dansmith | yeah, that's the one :) | 15:17 |
| dansmith | works with PasteDeploy===2.1.1 | 15:18 |
| abhishekk | bummer | 15:18 |
| abhishekk | so we need to blacklist 3.0.1 for glance ? | 15:19 |
| dansmith | I dunno why it's not failing in devstack though | 15:19 |
| dansmith | but no, I think you need to fix the problem .. can't stay on 2.x forever right? | 15:19 |
| abhishekk | yeah | 15:20 |
| abhishekk | till we fix (which is going to take long) shouldn't we rollback to 2.1.1? | 15:20 |
| dansmith | I think u-c is supposed to be across all the projects, right? | 15:21 |
| dansmith | not sure it's an option to block it just for glance and rolling it back in u-c is problematic I think assuming some other project wanted it bumped | 15:21 |
| abhishekk | yeah, but there should/might be a way to override it? | 15:21 |
| dansmith | I dunno what the rules are here | 15:21 |
| abhishekk | me too | 15:22 |
| dansmith | overriding it just means that glance can't be installed alongside nova, for example | 15:22 |
| dansmith | gmann: ^ | 15:22 |
| dansmith | I think gmann has been getting in late recently, so might be a while before he's around | 15:22 |
| abhishekk | ack | 15:22 |
| dansmith | probably should quickly work to determine what the actual problem is though.. might be something simple | 15:22 |
| abhishekk | also we can't skip 106 tests as well :D | 15:22 |
| abhishekk | need to go through reno of PasteDeploy | 15:23 |
| dansmith | I really need to get back to what I was supposed to be doing this morning, but I assume you can take it from here? or maybe pdeore can try to suss out the change? | 15:23 |
| abhishekk | I think pdeore can take it from here | 15:23 |
| dansmith | knowing what the problem is should be like 90% of the work I bet | 15:23 |
| abhishekk | ++ | 15:24 |
| dansmith | it's probably something simple like a missing or now-required arg or something | 15:24 |
| abhishekk | likely | 15:24 |
| abhishekk | thanks for spending time on it | 15:25 |
| * | dansmith nods | 15:25 |
| dansmith | also, maybe some unit tests for the deploy stuff will make it easier to debug what is going on | 15:26 |
| dansmith | and also maybe let's not add any more functional tests based on these api workers :D | 15:26 |
| abhishekk | ++ | 15:27 |
| abhishekk | there are two releases 3.0 and 3.0.1 2022-10-16 and 2022-10-17 | 15:28 |
| abhishekk | for pastedeploy ^^ | 15:28 |
| abhishekk | https://docs.pylonsproject.org/projects/pastedeploy/en/latest/news.html | 15:28 |
| dansmith | yeah, not much in the way of news for those | 15:28 |
| dansmith | seems like the major version bump is just because of dropping py2 support | 15:29 |
| abhishekk | likely | 15:29 |
| abhishekk | app = deploy.loadapp("config:%s" % conf_file, name=app_name) | 15:35 |
| abhishekk | this is the only function i think we are calling | 15:36 |
| dansmith | yup | 15:39 |
| dansmith | and it looks the same in nova | 15:39 |
| dansmith | but of course, it's loading modules provided by glance (and nova) and might be choking on those | 15:39 |
| dansmith | I don't know how it goes from the paste config to the python objects.. so someone probably needs to figure that out ... | 15:41 |
| abhishekk | ack | 15:43 |
| dansmith | oh yeah and those workers run with generated paste configs | 15:49 |
| dansmith | different from the one in etc/ | 15:49 |
| dansmith | so could be something there too I guess | 15:49 |
| dansmith | this is what it's loading via paste I think: glance.api:root_app_factory | 15:49 |
| dansmith | which of course is different in the generated paste config for those workers | 15:51 |
| dansmith | er, well, maybe the same, but in a different stack | 15:51 |
| abhishekk | right | 15:53 |
| gmann | dansmith: abhishekk_ hi | 17:36 |
| gmann | I do not think we can/should have different u-c (blacklist specific version) for glance only | 17:36 |
| abhishekk_ | gmann, hi | 17:36 |
| abhishekk_ | ack | 17:37 |
| dansmith | yeah, agree | 17:37 |
| dansmith | it's unfortunate that it got bumped without being realized, but alas, here we are | 17:37 |
| abhishekk_ | dansmith, you have latest devstack deployed? | 17:38 |
| gmann | does requirements not have a glance functional test job? | 17:38 |
| dansmith | abhishekk_: no | 17:38 |
| abhishekk_ | dansmith, ack | 17:39 |
| dansmith | gmann: no | 17:39 |
| dansmith | it's a bummer to have to run all the projects' functionals really, especially lately with things breaking so much | 17:39 |
| dansmith | also, gmann, glance's functionals (these at least) are a nightmare to debug, so asking non-glance people to investigate failures is also unfortunate | 17:40 |
| gmann | ohk | 17:40 |
| abhishekk_ | somehow I came to the conclusion that VersionNegotiationFilter is causing trouble, but no further luck in the last couple of hours | 17:40 |
| dansmith | abhishekk_: so you think it's not crashing on start, but refusing to do anything useful? | 17:41 |
| dansmith | because the test waits for the timeout for the "ping" which should fail immediately if it's up but just not working | 17:41 |
| abhishekk_ | yeah | 17:41 |
| abhishekk_ | if I remove that filter from here | 17:42 |
| abhishekk_ | https://github.com/openstack/glance/blob/master/glance/tests/functional/__init__.py#L499 | 17:42 |
| abhishekk_ | test passes with latest paste deploy | 17:42 |
| dansmith | huh, maybe just loading that filter fails? | 17:43 |
| abhishekk_ | likely, tried putting logs there or in wsgi.Middleware but nothing actually logs | 17:44 |
| dansmith | right | 17:44 |
| dansmith | this got me some logs: | 17:44 |
| dansmith | https://termbin.com/ycd6 | 17:44 |
| abhishekk_ | may be next step is to deploy latest devstack and execute some version api calls | 17:44 |
| abhishekk_ | looking | 17:45 |
| dansmith | or make sure it configures it the same way | 17:45 |
| dansmith | maybe devstack doesn't have that filter? or it's not in that order? | 17:45 |
| abhishekk_ | i think it has | 17:46 |
| dansmith | well, something has to be different :) | 17:47 |
| abhishekk_ | agree | 17:47 |
| dansmith | definitely a different order | 17:47 |
| abhishekk_ | I think now I will hand it over to pdeore | 17:47 |
| * | dansmith nods | 17:48 |
| abhishekk_ | curiosity is what changed in deploy 3.0 that causes this middleware to fail :D | 17:48 |
| dansmith | yeah | 17:49 |
| abhishekk_ | tempest is also not failing, which means not much to worry about? | 17:56 |
| dansmith | well, that's what I'm saying.. it must be something specific to the config in the functional workers, since the devstack jobs are fine | 17:56 |
| abhishekk_ | yep | 17:57 |
| * | abhishekk_ signing out now, have a good day | 17:57 |
| dansmith | o/ | 18:01 |
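
The regression in the log was found by comparing upper-constraints (u-c) from March 14 against master and rolling packages back one at a time. A minimal sketch of that comparison step follows, assuming the constraints are plain `name===version` lines. The function names `parse_constraints` and `diff_constraints` are hypothetical, `example-lib` is a made-up package, and the sample data contains only the greenlet and PasteDeploy pins mentioned in the conversation, not the real files.

```python
def parse_constraints(text):
    """Map lowercased package name -> pinned version from `name===version` lines.

    Assumption: one pin per package. If a package is listed several times
    with different environment markers, the last entry wins here.
    """
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "===" not in line:
            continue
        name, _, rest = line.partition("===")
        version = rest.split(";", 1)[0].strip()  # drop any environment marker
        pins[name.strip().lower()] = version
    return pins


def diff_constraints(old_text, new_text):
    """Return {package: (old_version, new_version)} for every pin that differs."""
    old = parse_constraints(old_text)
    new = parse_constraints(new_text)
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


# Illustrative sample data only; example-lib is a made-up package.
OLD = """\
greenlet===1.1.3
PasteDeploy===2.1.1
example-lib===1.0.0
"""

NEW = """\
greenlet===2.0.2
PasteDeploy===3.0.1
example-lib===1.0.0
"""

if __name__ == "__main__":
    for name, (before, after) in diff_constraints(OLD, NEW).items():
        print(f"{name}: {before} -> {after}")
```

Each package in the resulting list is a candidate to pin back individually in the tox environment, as was done with `pip install -U greenlet==1.1.3` in the log, until the failing test passes.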