gthiemonge | skraynev: please open a new story for this issue; if you can provide the Octavia worker and API logs, that would be really helpful | 07:58 |
---|---|---|
skraynev | gthiemonge: hi. I am working on it right now ;) I will notify you when I create it and share a link | 07:59 |
gthiemonge | skraynev: I see that you're still using the amphorav1 driver in Yoga; we would recommend switching to amphorav2 (or amphora, which is an alias). v1 is still supported (but will be removed in Bobcat) and it receives minimal maintenance | 08:00 |
gthiemonge | hmm, when I see "for Task 'octavia-member-to-error-on-revert-flow-created' transitioned into state 'REVERTING' from state 'SUCCESS'" | 08:04 |
gthiemonge | it makes me think that there was an earlier error | 08:04 |
skraynev | gthiemonge: https://storyboard.openstack.org/#!/story/2010646 | 09:36 |
skraynev | the main points here: Yoga, amphora v1, and an unclear retry of batch_member_update after the failed task was successfully reverted... | 09:37 |
skraynev | gthiemonge: one more interesting thing: from time to time (not in all cases) this issue leaves the LB stuck in PENDING_UPDATE state. | 09:41 |
gthiemonge | skraynev: thanks, interesting... I'm looking at it | 09:42 |
gthiemonge | the code that creates the flow of tasks for deleting members is different in v2 | 09:46 |
gthiemonge | in v1, I don't understand why we have those 2 calls: https://opendev.org/openstack/octavia/src/branch/master/octavia/controller/worker/v1/flows/member_flows.py#L158-L166 | 09:47 |
gthiemonge | DeleteModelObject + DeleteMemberInDB | 09:47 |
gthiemonge | we don't have it in v2: https://opendev.org/openstack/octavia/src/branch/master/octavia/controller/worker/v2/flows/member_flows.py#L154-L158 | 09:47 |
gthiemonge | skraynev: I have one question: do you know when the API received those 2 calls? | 09:57 |
skraynev | yes. I accidentally hid the request while formatting the message | 09:58 |
skraynev | please refresh the story: | 09:58 |
skraynev | it happened at 2023-03-13 04:12:40 on server1 and at 2023-03-13 04:12:39 on server2 | 09:58 |
skraynev | I did not expect that ``` requires the text to start on a new line. I left the log on the same line and it was not displayed in the story. sorry for that | 09:59 |
gthiemonge | ah ok | 10:04 |
gthiemonge | skraynev: I may have reproduced the issue locally | 10:10 |
gthiemonge | skraynev: when updating the members, the Octavia API should lock access to the LB, but in this API call that doesn't work properly | 10:10 |
skraynev | gthiemonge: wow! it's great news that it's not just my local issue :) I'm really happy that you got the same result | 10:11 |
skraynev | I am not happy that the bug exists, but a repro is 50% of the solution. | 10:12 |
skraynev | regarding the LB stuck in PENDING_UPDATE: do I understand correctly that it could happen in some corner cases, e.g. when the worker fails with a traceback after it has already done some work? | 10:12 |
gthiemonge | skraynev: in most of the cases, the worker should recover from those issues and mark the LB in ERROR | 10:35 |
skraynev | gthiemonge: hm. is that also true for amphora v1 without jobboard? | 10:35 |
gthiemonge | skraynev: but here, the workers processed 2 actions on the same LB at the same time, which triggered a bug in the error handling (in octavia-member-to-error-on-revert-flow-created) | 10:36 |
gthiemonge | skraynev: well, if the worker is killed (without jobboard), yeah, you may have some resources left in PENDING states | 10:36 |
skraynev | gthiemonge: got it! thx | 10:37 |
gthiemonge | skraynev: I updated the story with my findings, but ATM I have no idea what causes it | 13:02 |
gthiemonge | tweining: johnsom: if you want to take a look: https://storyboard.openstack.org/#!/story/2010646 | 13:02 |
gthiemonge | the code looks fine BTW | 13:03 |
tweining | gthiemonge: did you reproduce it with two api instances in devstack? | 13:03 |
gthiemonge | no, only one instance, but the requests are processed concurrently | 13:05 |
tweining | I will try to reproduce it as well later | 13:07 |
gthiemonge | I see the COMMIT for the first request, so the LB should be in PENDING_UPDATE in the DB, but the 2nd request queries the DB and the LB seems to be ACTIVE | 13:09 |
skraynev | gthiemonge: thank you. it looks interesting. Do I understand correctly that the code should set PENDING_UPDATE on the members and that this will block the second API call? | 13:12 |
skraynev | in the PUT method for batch update I see only _test_lb_and_listener_and_pool_statuses, which blocks only the pool and listener | 13:13 |
skraynev | I actually thought that it should be enough... | 13:14 |
opendevreview | Omer Schwartz proposed openstack/octavia master: Fix pool creation with single LB create call https://review.opendev.org/c/openstack/octavia/+/864204 | 13:14 |
skraynev | gthiemonge: could the issue be related to the long block of code under the session context manager? I mean that we call _test_lb_and_listener_and_pool_statuses at the start of the context manager, then do a lot of work on the members (we could have 100 members to update, for example), and only after all these actions do we commit the session. | 13:23 |
gthiemonge | skraynev: the call at 04:12:39,780.780 is fine, it was processed by the worker on server2 (between 04:12:39,955.955 and 04:12:41,085.085); that means the call at 04:12:40,197.197 should have been denied: the LB should have been locked (with the PENDING_UPDATE provisioning_status) | 13:23 |
gthiemonge | I don't think so, I'm reproducing it with small batch requests | 13:24 |
skraynev | gthiemonge: yeap. I mean that the call should be denied based on the LB and pool statuses, so the member statuses do not matter for the immutability validation, right? | 13:24 |
skraynev | gthiemonge: hm.. you're right. if it reproduces with a small set, my theory does not hold. | 13:25 |
gthiemonge | skraynev: no, I think the statuses of the members are not evaluated | 13:25 |
skraynev | gthiemonge: read your comment. is the issue in sqlalchemy? or maybe in the session? | 15:54 |
gthiemonge | skraynev: don't know yet | 16:12 |
opendevreview | Gregory Thiemonge proposed openstack/octavia master: Fix ORM caching for with_for_update calls https://review.opendev.org/c/openstack/octavia/+/877414 | 17:14 |
gthiemonge | thanks johnsom ;-) | 17:15 |
opendevreview | Michael Johnson proposed openstack/octavia-tempest-plugin master: Update Octavia tempest tests for no scoped tokens https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/876904 | 23:03 |
opendevreview | Michael Johnson proposed openstack/octavia master: Fix devstack policy overrides https://review.opendev.org/c/openstack/octavia/+/877433 | 23:06 |
opendevreview | Michael Johnson proposed openstack/octavia-tempest-plugin master: Update Octavia tempest tests for no scoped tokens https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/876904 | 23:07 |
opendevreview | Michael Johnson proposed openstack/octavia master: Fix devstack policy overrides https://review.opendev.org/c/openstack/octavia/+/877433 | 23:21 |
opendevreview | Michael Johnson proposed openstack/octavia-tempest-plugin master: Update Octavia tempest tests for no scoped tokens https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/876904 | 23:22 |
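gthiemonge's point at 10:10 is that a member update should lock the LB by moving it to PENDING_UPDATE, so that a concurrent request on the same LB is rejected. The sketch below expresses that lock as an atomic compare-and-set in SQLite, assuming a hypothetical one-table schema. It is not Octavia's code: the 17:14 review title mentions `with_for_update`, i.e. a locking `SELECT ... FOR UPDATE`, so the real path reads the status under a row lock instead.

```python
import sqlite3

# Hypothetical, minimal schema; the real Octavia tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE load_balancer (id TEXT PRIMARY KEY, provisioning_status TEXT)")
conn.execute("INSERT INTO load_balancer VALUES ('lb1', 'ACTIVE')")
conn.commit()


def try_lock_lb(conn, lb_id):
    """Atomically move an LB from ACTIVE to PENDING_UPDATE.

    Returns True if this caller took the lock, False if the LB was already
    locked (or in any other non-ACTIVE state).
    """
    cur = conn.execute(
        "UPDATE load_balancer SET provisioning_status = 'PENDING_UPDATE' "
        "WHERE id = ? AND provisioning_status = 'ACTIVE'",
        (lb_id,),
    )
    conn.commit()
    # rowcount is 1 only if the WHERE clause still matched, i.e. the check
    # and the write happened as one statement with nobody in between.
    return cur.rowcount == 1


first = try_lock_lb(conn, "lb1")   # first batch member update
second = try_lock_lb(conn, "lb1")  # concurrent second update on the same LB
print(first, second)  # True False
```

Folding the check into the UPDATE's WHERE clause avoids a separate read that could go stale, which is exactly the failure mode discussed in this log.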
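At 13:09 gthiemonge sees the first request COMMIT PENDING_UPDATE while the second request still reads the LB as ACTIVE, and the 17:14 review is titled "Fix ORM caching for with_for_update calls". The toy model below shows how an ORM identity map can produce that symptom: if the session already holds the LB object, re-querying it returns the cached object with its old attributes unless it is explicitly repopulated. `CachingSession` and its `populate_existing` flag are hypothetical; SQLAlchemy's real counterpart is `Query.populate_existing()`, but this log does not say how review 877414 fixes it.

```python
class CachingSession:
    """Toy ORM session with an identity map (hypothetical, not SQLAlchemy)."""

    def __init__(self, rows):
        self.rows = rows          # stands in for the database table
        self.identity_map = {}    # id -> object already loaded in this session

    def get(self, lb_id, populate_existing=False):
        row = self.rows[lb_id]    # the query itself always sees the current row
        obj = self.identity_map.get(lb_id)
        if obj is None:
            obj = dict(row)       # first load: build the object from the row
            self.identity_map[lb_id] = obj
        elif populate_existing:
            obj.update(row)       # explicit refresh from the fresh row
        # otherwise the cached object is returned with its old attributes
        return obj


db = {"lb1": {"provisioning_status": "ACTIVE"}}
session = CachingSession(db)

# Request 2 loads the LB early (hypothetically, while looking up the pool).
session.get("lb1")

# Meanwhile request 1 commits its lock in another transaction.
db["lb1"]["provisioning_status"] = "PENDING_UPDATE"

# Request 2's status check re-reads the LB in the same session.
stale = session.get("lb1")["provisioning_status"]
fresh = session.get("lb1", populate_existing=True)["provisioning_status"]
print(stale, fresh)  # ACTIVE PENDING_UPDATE
```

Under this reading, the locking query itself can be correct (as gthiemonge notes, "the code looks fine") while the object it hands back is stale.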
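gthiemonge's 13:23 argument is a timeline check: the worker on server2 handled the first call between 04:12:39,955 and 04:12:41,085, so the second call at 04:12:40,197 arrived while the LB should still have been locked. The arithmetic below assumes, as that reasoning implies, that the API commits PENDING_UPDATE before the worker picks the request up and that the lock lasts until the worker finishes. Only the timestamps come from the log; the date and the duplicated `.NNN` suffix are dropped.

```python
from datetime import datetime

FMT = "%H:%M:%S,%f"  # %f accepts "780" and reads it as 0.780 s


def ts(s):
    return datetime.strptime(s, FMT)


first_call = ts("04:12:39,780")    # first batch member update (API)
worker_start = ts("04:12:39,955")  # worker on server2 starts processing it
worker_end = ts("04:12:41,085")    # worker finishes
second_call = ts("04:12:40,197")   # second batch member update (API)

# If the LB is locked for the whole time the worker is busy, any call that
# lands in that window should have been rejected.
in_lock_window = worker_start <= second_call <= worker_end
delta = (second_call - first_call).total_seconds()
print(in_lock_window, delta)  # True 0.417
```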
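skraynev (13:13, 13:24) and gthiemonge (13:25) agree that the batch member PUT only guards on parent objects through _test_lb_and_listener_and_pool_statuses and that member statuses are not evaluated. The hypothetical guard below makes that concrete: the parent statuses are the only inputs, so a stale parent status read lets the second concurrent update through. `check_parents_mutable` and `LockedError` are illustrative names, not Octavia's implementation.

```python
PENDING = {"PENDING_CREATE", "PENDING_UPDATE", "PENDING_DELETE"}


class LockedError(Exception):
    """Hypothetical error raised when a parent object is locked."""


def check_parents_mutable(lb_status, listener_status, pool_status):
    """Reject the request if any parent object is in a PENDING state.

    Member statuses are deliberately not inputs, mirroring the log's
    observation that they are not evaluated.
    """
    for name, status in (("load balancer", lb_status),
                         ("listener", listener_status),
                         ("pool", pool_status)):
        if status in PENDING:
            raise LockedError(f"{name} is {status}")


# Passes even if every member is PENDING_*: members are never looked at.
check_parents_mutable("ACTIVE", "ACTIVE", "ACTIVE")

try:
    check_parents_mutable("PENDING_UPDATE", "ACTIVE", "ACTIVE")
except LockedError as exc:
    print(exc)  # load balancer is PENDING_UPDATE
```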