tobias-urdin | gthiemonge: trying to wrap my head around this, how could this: Batch updating members: old='[]', new='[None]' | 09:12 |
---|---|---|
tobias-urdin | ever happen here: https://github.com/openstack/octavia/blob/stable/yoga/octavia/controller/queue/v2/endpoints.py#L128 | 09:13 |
tobias-urdin | and could that indicate an issue I see with things being stuck in PENDING_UPDATE when a lot of batch member updates are done | 09:13 |
gthiemonge | tobias-urdin: looking... | 09:16 |
gthiemonge | tobias-urdin: I think I know this one | 09:17 |
gthiemonge | tobias-urdin: https://bugs.launchpad.net/octavia/+bug/2036156 | 09:17 |
gthiemonge | hm maybe it's time to cut a new bugfix release for yoga | 09:19 |
tobias-urdin | yeah i was looking at that one, I think that is already released in yoga 10.0.1 | 09:25 |
tobias-urdin | but I wasn't sure it's the same issue, based on the error messages in the bug report | 09:26 |
tobias-urdin | thanks, I don't think we have that yet, so I will try with it | 09:28 |
tobias-urdin | the stuck PENDING_UPDATE seems to be unrelated: an invalid TLS cert container caused the listener to get stuck in PENDING_UPDATE because the certificate was either invalid or not found | 09:44 |
tobias-urdin | https://paste.opendev.org/show/bxsnaKgk096PIiQagFDp/ – i always wondered why there is no failed or update_failed provisioning_status on resources | 09:48 |
opendevreview | Lukas Piwowarski proposed openstack/octavia-tempest-plugin master: Add backup member tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/897564 | 09:58 |
gthiemonge | tobias-urdin: yeah, we should have a try/except block to set the state to ERROR if one of those lines triggers an exception https://opendev.org/openstack/octavia/src/branch/stable/yoga/octavia/controller/worker/v2/controller_worker.py#L748-L757 | 10:43 |
gthiemonge | that said, the certs should have been checked in the API | 10:45 |
opendevreview | Tobias Urdin proposed openstack/octavia master: Catch exceptions in listener update https://review.opendev.org/c/openstack/octavia/+/903753 | 13:47 |
tobias-urdin | gthiemonge: yeah, thinking about it, isn't the above ^ valid? It does the same in put() that is done in post(); otherwise, when the certificate is invalid, we get stuck in PENDING_UPDATE. I guess that's what happened here: the code path makes it possible to get stuck in PENDING_UPDATE when updating the listener, but not on creation | 13:48 |
tobias-urdin | (stuck in PENDING_UPDATE since we don't rollback?) | 13:48 |
tobias-urdin | more context: since somewhere we made it possible to introduce an invalid certificate through the API, the delete failure is expected, because I updated the LB+listener+pool+l7policy from PENDING_UPDATE -> ACTIVE so that I could delete them | 13:50 |
tobias-urdin | but I could never delete it until I unset the certificate with `openstack loadbalancer listener unset --default-tls-container-ref`; then it went from PENDING_UPDATE to ACTIVE, since that's where the error was, and I could delete everything | 13:51 |
gthiemonge | tobias-urdin: hmm, AFAIK the `with session.begin()` block should implicitly handle the rollback, so as far as I can tell your patch doesn't change the behavior | 13:53 |
gthiemonge | "call_provider" sends the request to the worker; if the exception is triggered in the worker, the API doesn't catch it | 13:54 |
gthiemonge | "sends the request (via the RPC)" | 13:54 |
tobias-urdin | feels like somewhere the API allowed the broken certificate to be introduced without rolling provisioning_status back to ACTIVE or setting it to ERROR | 13:54 |
gthiemonge | tobias-urdin: do you know what could be wrong with your certificate? | 13:56 |
tobias-urdin | I'm not sure. I don't think I have access to it, or whether it still exists; I only have the logs from when I tried to delete it https://paste.opendev.org/show/bxsnaKgk096PIiQagFDp/ | 14:01 |
tobias-urdin | let me see if I can find the PUT call for the listener, to check whether your suspicion is correct | 14:01 |
tobias-urdin | gthiemonge: i don't have a request ID for the listener post() call but after the "Creating listener" line in the log I don't have any errors | 14:20 |
tobias-urdin | about 50 minutes later there is this listener put() call https://paste.opendev.org/show/bS6QHyAXZj5eoe2iudDh/ | 14:20 |
tobias-urdin | if the cert is invalid it shouldn't be sent to the amphora provider, but it looks like that happens (last line). Wouldn't it make more sense to set PENDING_UPDATE back to ACTIVE and reject the change, or set the status to ERROR instead? | 14:22 |
tobias-urdin | what happened was that the LB, listener, pool and l7policy got stuck in PENDING_UPDATE, and it was solved by unsetting the TLS container ref on the listener; everything went back to ACTIVE so it could be deleted | 14:23 |
tobias-urdin | I get that the secret could be updated without Octavia knowing about it, leaving the cert invalid, but when things are stuck in PENDING_UPDATE people feel more like "the service is broken", because their input data was accepted and there is no mention of an error anywhere | 14:24 |
gthiemonge | yes I agree, it needs to be fixed; most of the code in controller_worker.py should be in a try/except block with an update of the prov_status in case of errors | 14:32 |
opendevreview | Tobias Urdin proposed openstack/octavia master: Ensure tls containers is validated https://review.opendev.org/c/openstack/octavia/+/903759 | 14:49 |
tobias-urdin | gthiemonge: ^ maybe this is it: if the listener was created in the POST /v2/lbaas/loadbalancers call, default_tls_container_ref would not have been verified, since that path goes through _graph_create() in the listener code | 15:02 |
tobias-urdin | i'm asking user to see if listener was created through the POST /v2/lbaas/loadbalancers or with POST /v2/lbaas/listeners (and if default_tls_container_ref was included in POST or if it was PUT /v2/lbaas/listeners/<id> afterwards) | 15:02 |
opendevreview | Michael Johnson proposed openstack/octavia-tempest-plugin master: DNM: Testing with enable scope set to True https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/902096 | 23:43 |
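
The try/except suggestion made at 10:43 and again at 14:32 could look roughly like the following. This is a minimal, hypothetical sketch and not Octavia's controller_worker.py. `InMemoryRepo` and `update_listener` are invented for illustration. The failure policy is also an assumption: the listener goes to ERROR and the load balancer is unlocked back to ACTIVE. Only the status strings mirror Octavia's provisioning_status values.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Status strings as used for Octavia's provisioning_status.
ACTIVE = "ACTIVE"
PENDING_UPDATE = "PENDING_UPDATE"
ERROR = "ERROR"


@dataclass
class InMemoryRepo:
    """Hypothetical stand-in for the load balancer/listener repositories."""
    statuses: Dict[str, str] = field(default_factory=dict)

    def set_status(self, obj_id: str, status: str) -> None:
        self.statuses[obj_id] = status


def update_listener(repo: InMemoryRepo, lb_id: str, listener_id: str,
                    apply_changes: Callable[[], None]) -> None:
    """Hypothetical worker step: apply the update, then mark objects ACTIVE.

    If apply_changes raises (e.g. the TLS container cannot be loaded), the
    listener is marked ERROR and the load balancer is unlocked (ACTIVE)
    instead of both staying in PENDING_UPDATE. The exception is re-raised
    so the caller still logs it.
    """
    try:
        apply_changes()
    except Exception:
        repo.set_status(listener_id, ERROR)
        repo.set_status(lb_id, ACTIVE)  # assumed policy: unlock the LB
        raise
    repo.set_status(listener_id, ACTIVE)
    repo.set_status(lb_id, ACTIVE)


if __name__ == "__main__":
    def load_missing_cert() -> None:
        raise ValueError("TLS container not found")

    repo = InMemoryRepo({"lb1": PENDING_UPDATE, "l1": PENDING_UPDATE})
    try:
        update_listener(repo, "lb1", "l1", load_missing_cert)
    except ValueError:
        pass
    print(repo.statuses)  # {'lb1': 'ACTIVE', 'l1': 'ERROR'}
```

Re-raising keeps the failure visible in the logs. The status change gives the user an explicit ERROR to act on, rather than an indefinite PENDING_UPDATE.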
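
The point made at 13:53 and 13:54 is that the API's `with session.begin()` rollback cannot help once the failure happens in the worker. The standard library can illustrate this. The sketch swaps in `sqlite3` for SQLAlchemy; its connection context manager commits on success and rolls back on an exception. A `queue.Queue` stands in for the asynchronous RPC cast. `api_put_listener` and `worker_update_listener` are hypothetical names. By the time the worker fails, the API transaction has already committed PENDING_UPDATE, so nothing rolls it back.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listener (id TEXT PRIMARY KEY, provisioning_status TEXT)")
with conn:
    conn.execute("INSERT INTO listener VALUES ('l1', 'ACTIVE')")

rpc_queue: "queue.Queue[tuple]" = queue.Queue()  # stand-in for the RPC cast


def api_put_listener(listener_id: str, tls_ref: str) -> None:
    # Transaction scope: committed on normal exit, rolled back on exception.
    with conn:
        conn.execute(
            "UPDATE listener SET provisioning_status = 'PENDING_UPDATE' WHERE id = ?",
            (listener_id,),
        )
        rpc_queue.put((listener_id, tls_ref))  # fire-and-forget, returns at once
    # The commit has happened here, before any worker code runs.


def worker_update_listener(listener_id: str, tls_ref: str) -> None:
    # Hypothetical worker step failing on the bad certificate.
    raise RuntimeError(f"cannot load TLS container {tls_ref}")


api_put_listener("l1", "bad-ref")
listener_id, tls_ref = rpc_queue.get()
try:
    worker_update_listener(listener_id, tls_ref)
except RuntimeError:
    pass  # no handler touches the DB, and the API's commit is not undone

status = conn.execute(
    "SELECT provisioning_status FROM listener WHERE id = 'l1'"
).fetchone()[0]
print(status)  # PENDING_UPDATE
```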
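
If the suspicion raised at 15:02 is right, the fix is to run the same TLS reference check on every path that can set default_tls_container_ref. Below is a minimal sketch of that idea in which every name is hypothetical: `validate_tls_ref`, the three handler functions, and a plain dict standing in for the certificate manager (Barbican) lookup. It does not reproduce Octavia's actual API controllers.

```python
from typing import Dict, Optional


class InvalidTLSRef(ValueError):
    """Hypothetical error, meant to be returned to the user as a 400."""


def validate_tls_ref(ref: Optional[str], cert_store: Dict[str, str]) -> None:
    """Reject a TLS container ref that cannot be resolved (hypothetical check)."""
    if ref is None:
        return  # nothing to validate, e.g. a plain HTTP listener
    if ref not in cert_store:
        raise InvalidTLSRef(f"TLS container {ref!r} not found")


def post_listener(body: dict, cert_store: Dict[str, str]) -> dict:
    # POST /v2/lbaas/listeners
    validate_tls_ref(body.get("default_tls_container_ref"), cert_store)
    return body


def put_listener(listener: dict, updates: dict, cert_store: Dict[str, str]) -> dict:
    # PUT /v2/lbaas/listeners/<id>: validate before anything moves to PENDING_UPDATE
    validate_tls_ref(updates.get("default_tls_container_ref"), cert_store)
    return {**listener, **updates}


def post_loadbalancer_graph(body: dict, cert_store: Dict[str, str]) -> dict:
    # POST /v2/lbaas/loadbalancers with nested listeners (the "graph" path):
    # every nested listener goes through the same check.
    for listener in body.get("listeners", []):
        validate_tls_ref(listener.get("default_tls_container_ref"), cert_store)
    return body


if __name__ == "__main__":
    store = {"good-ref": "PEM..."}
    try:
        post_loadbalancer_graph(
            {"listeners": [{"default_tls_container_ref": "bad-ref"}]}, store
        )
    except InvalidTLSRef as exc:
        print(exc)  # TLS container 'bad-ref' not found
```

Keeping the check in one helper means a new entry point, such as the graph create path, cannot silently skip it. Worker-side error handling is still needed, because, as noted at 14:24, the secret can change after the API has validated it.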