*** dtruong has quit IRC | 00:07 | |
*** dtruong has joined #openstack-lbaas | 00:07 | |
*** bzhao__ has joined #openstack-lbaas | 00:19 | |
johnsom | Ok, done with my action items from this morning. Signing off for the evening. | 00:28 |
*** sapd1_x has joined #openstack-lbaas | 01:30 | |
*** goldyfruit has quit IRC | 03:01 | |
*** psachin has joined #openstack-lbaas | 03:02 | |
*** KeithMnemonic1 has quit IRC | 03:06 | |
*** KeithMnemonic has joined #openstack-lbaas | 03:12 | |
*** ramishra has joined #openstack-lbaas | 03:17 | |
*** ricolin has quit IRC | 04:14 | |
*** ricolin has joined #openstack-lbaas | 04:21 | |
openstackgerrit | Hidekazu Nakamura proposed openstack/octavia master: Add install guide for Ubuntu https://review.opendev.org/672842 | 05:00 |
*** ricolin has quit IRC | 05:02 | |
*** ricolin has joined #openstack-lbaas | 05:03 | |
*** takamatsu has joined #openstack-lbaas | 07:08 | |
*** ricolin has quit IRC | 07:15 | |
*** rcernin has quit IRC | 07:15 | |
*** trident has quit IRC | 07:25 | |
*** trident has joined #openstack-lbaas | 07:31 | |
*** sapd1_ has joined #openstack-lbaas | 07:33 | |
*** sapd1 has quit IRC | 07:37 | |
*** rpittau|afk is now known as rpittau | 08:14 | |
*** tkajinam has quit IRC | 08:19 | |
*** ivve has joined #openstack-lbaas | 08:26 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia stable/stein: worker: Re-add FailoverPreparationForAmphora https://review.opendev.org/678180 | 08:33 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia stable/rocky: worker: Re-add FailoverPreparationForAmphora https://review.opendev.org/678181 | 08:33 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia stable/queens: worker: Re-add FailoverPreparationForAmphora https://review.opendev.org/678182 | 08:34 |
*** sapd1_x has quit IRC | 08:47 | |
*** salmankhan has joined #openstack-lbaas | 08:50 | |
*** gcheresh has joined #openstack-lbaas | 09:03 | |
*** ajay33 has joined #openstack-lbaas | 09:07 | |
rm_work | i'm going to be on during the later side of the day this time, flipping my schedule around | 09:42 |
rm_work | sleep time now ;) | 09:42 |
*** gcheresh has quit IRC | 09:50 | |
*** psachin has quit IRC | 10:04 | |
*** psachin has joined #openstack-lbaas | 10:06 | |
*** maciejjozefczyk has joined #openstack-lbaas | 10:06 | |
*** maciejjozefczyk has quit IRC | 10:07 | |
*** roukoswarf has quit IRC | 10:24 | |
*** rouk has joined #openstack-lbaas | 10:24 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Convert pool flows to use dicts https://review.opendev.org/665381 | 10:29 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Transition amphora flows to dicts https://review.opendev.org/668898 | 10:29 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts https://review.opendev.org/671725 | 10:29 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Jobboard based controller https://review.opendev.org/647406 | 10:45 |
*** tesseract has joined #openstack-lbaas | 11:12 | |
*** goldyfruit has joined #openstack-lbaas | 11:14 | |
*** salmankhan has quit IRC | 11:30 | |
*** salmankhan has joined #openstack-lbaas | 11:31 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts https://review.opendev.org/671725 | 11:58 |
*** spatel has joined #openstack-lbaas | 12:43 | |
*** spatel has quit IRC | 12:48 | |
*** roukoswarf has joined #openstack-lbaas | 12:59 | |
*** rouk has quit IRC | 13:00 | |
*** spatel has joined #openstack-lbaas | 13:14 | |
*** spatel has quit IRC | 13:17 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Transition amphora flows to dicts https://review.opendev.org/668898 | 13:23 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts https://review.opendev.org/671725 | 13:23 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Jobboard based controller https://review.opendev.org/647406 | 13:23 |
*** ivve has quit IRC | 13:24 | |
*** psachin has quit IRC | 13:47 | |
*** ajay33 has quit IRC | 13:52 | |
*** ramishra has quit IRC | 14:01 | |
*** salmankhan has quit IRC | 14:14 | |
*** salmankhan has joined #openstack-lbaas | 14:26 | |
*** salmankhan has quit IRC | 15:10 | |
*** salmankhan has joined #openstack-lbaas | 15:13 | |
*** ccamposr has quit IRC | 15:17 | |
*** rpittau is now known as rpittau|afk | 15:32 | |
*** Vorrtex has joined #openstack-lbaas | 15:51 | |
*** salmankhan has quit IRC | 15:55 | |
*** salmankhan has joined #openstack-lbaas | 16:04 | |
*** tesseract has quit IRC | 16:11 | |
*** Vorrtex has quit IRC | 16:29 | |
*** Vorrtex has joined #openstack-lbaas | 16:39 | |
*** gcheresh has joined #openstack-lbaas | 18:07 | |
gregwork | is there any way of clearing a load balancer in PENDING_UPDATE from a failed stack delete ? | 19:02 |
gregwork | i mean i think the stack delete failed because the delete on the loadbalancer got stuck in pending_update | 19:02 |
johnsom | So LBs don't get "stuck" unless you hard kill the controller that owned that resource. By owned, I mean it was actively working on the load balancer. | 19:03 |
johnsom | You may have very high retry timeouts that means it could be 25+ minutes before it gives up, but they don't get stuck. | 19:04 |
johnsom | Now, if you hard killed the controller (kill -9, pulled the power, etc.) you can leave orphaned resources in a PENDING_* state. | 19:04 |
johnsom | We are actually working on solving that as well (it's the jobboard effort). | 19:05 |
gregwork | so that didnt happen | 19:05 |
gregwork | these resources were stood up by openshift-ansible, and got stuck tearing that environment down via openstack stack delete openshift-cluster | 19:05
gregwork | no servers have been abruptly turned off | 19:06 |
gregwork | it just got stuck for some reason | 19:06 |
johnsom | Maybe it did a kill -9 and not a graceful shutdown (kill -15)? | 19:06 |
*** bzhao__ has quit IRC | 19:06 | |
johnsom | All of our code paths lead back to an unlocked state, either ERROR or ACTIVE. | 19:07 |
gregwork | im not sure where any killing occurs, it was stood up by the stack create, and a standard stack delete occurred so i imagine its just running whatever OS::Octavia:Whatever calls | 19:07 |
johnsom | Anyway, here is how to check what is going on and potentially fix it | 19:07 |
johnsom | What is OS::Octavia? | 19:07 |
gregwork | i was generalizing on the heat template that setup octavia | 19:07 |
johnsom | Did someone create an ansible module? | 19:07 |
gregwork | all of this is being orchestrated by heat | 19:08 |
gregwork | openshift-ansible creates a heat stack that sets up the environment | 19:08
gregwork | then does a openstack stack create on that template | 19:08 |
johnsom | Ah, hmm. Maybe we should try to find that heat code and see if they are kill -9 the processes. | 19:08 |
*** ash2307 has joined #openstack-lbaas | 19:08 | |
gregwork | the stack create completes without issue, its just when we do the teardown of that stack | 19:09 |
gregwork | that it breaks | 19:09 |
*** salmankhan has quit IRC | 19:09 | |
johnsom | Anyway, so first thing is to check the logs for all of the controller processes and make sure it's not retrying actions on the resource you want to unlock. Typically they log "warning" messages when they are still retrying an action (for example if nova is failing). | 19:09 |
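A minimal sketch of that log check, assuming a systemd/RPM-style install with controller logs under /var/log/octavia/ (paths and service names vary by deployment; TripleO runs the controllers in containers, so adjust accordingly):

    # Look for retry warnings from the worker, health manager and housekeeping
    # processes; repeated WARNING lines mentioning the LB ID mean a controller
    # still owns the resource and is actively retrying something.
    LB_ID=99ea5fd5-eb1a-41e0-828a-7488acf61577   # the stuck load balancer from this conversation
    grep -iE "warning|retry" /var/log/octavia/*.log | grep "$LB_ID"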
johnsom | Yeah, I don't think we have seen that heat code, so I can't really speak to it. | 19:10 |
johnsom | If all of the instances of the Octavia controllers are idle and not actively working on the resource, then you can go into the octavia database in mysql and update the load balancer record to provisioning_status = 'ERROR'. Here is example SQL: | 19:11 |
johnsom | update load_balancer set provisioning_status = 'ERROR' where id = '<UUID of LB>'; | 19:12 |
johnsom | Then use the openstack command to delete the resource "openstack loadbalancer delete --cascade <lb uuid or name>" | 19:12 |
gregwork | alright | 19:13 |
johnsom | However, it is very important that you make sure no controller has active ownership of the resource. If it does, and this process is followed, you will potentially orphan resources in other services, get into failover/retry loops, and/or end up with duplicate ghost resources. | 19:13 |
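Put together, the manual unlock described above looks roughly like this (a sketch only; run it strictly after confirming no controller still owns the LB, and adapt the mysql invocation to however your deployment grants database access):

    LB_ID="<UUID of LB>"

    # 1. Force the stuck record out of PENDING_* so the API will accept a delete.
    mysql octavia -e "update load_balancer set provisioning_status = 'ERROR' where id = '${LB_ID}';"

    # 2. Delete the load balancer and all of its child objects (listeners,
    #    pools, members, amphorae) in one call.
    openstack loadbalancer delete --cascade "${LB_ID}"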
gregwork | johnsom: zaneb in #heat says heat doesnt touch the octavia process | 19:14 |
gregwork | in regards to if it -9 or -15's it | 19:14 |
johnsom | Ok, then I would carefully check the logs as one of the controllers probably still has ownership and is retrying something. | 19:15 |
johnsom | The controller logs would also reflect the unsafe shutdown as we log safe shutdowns, but if it was kill -9 there would be no shutdown messages, just a new startup. | 19:17 |
gregwork | 2019-08-23 15:01:36.602 26 INFO octavia.api.v2.controllers.load_balancer [req-16342311-7390-497f-9dd6-8f5afd493523 - 1d11e677a9e94ebca0dcea2b9ae9a7fe - default default] Invalid state PENDING_UPDATE of loadbalancer resource 99ea5fd5-eb1a-41e0-828a-7488acf61577 | 19:23 |
gregwork | i see this guy on one of my controllers | 19:23
gregwork | before that | 19:25 |
gregwork | https://pastebin.com/SDmDmZYv | 19:25 |
gregwork | with traceback | 19:25 |
johnsom | Yeah, so that message says that some thing attempted to make a change to a load balancer while it was still in the PENDING_UPDATE state. The API will then return a 409 HTTP status code and the client should try again later. (4xx status codes are all retry codes for REST) | 19:25 |
johnsom | You are running queens? Which version? | 19:28 |
gregwork | https://pastebin.com/vqwzxTqH | 19:28 |
gregwork | those appear to be the only other tracebacks | 19:29 |
gregwork | ip address already allocated in subnet | 19:30 |
gregwork | im not sure how to tell if its in use | 19:30 |
gregwork | is there something i can do on the controller | 19:31 |
gregwork | we're kind of stuck until we can bring this stack down | 19:31
johnsom | The IP address in use should have just returned an error to the user saying the address was already in use on the subnet. That is a passive error. | 19:32 |
johnsom | That is a create call as well, not a delete. | 19:32 |
johnsom | I'm looking at the code for your first traceback. | 19:33 |
gregwork | gotcha | 19:33 |
johnsom | gregwork Can you run a "openstack loadbalancer status show <lbid>" on that load balancer and paste it? | 19:37 |
gregwork | sure | 19:40 |
gregwork | stats ? | 19:40 |
gregwork | i dont have status | 19:40 |
johnsom | This command: https://docs.openstack.org/python-octaviaclient/latest/cli/index.html#loadbalancer-status-show | 19:41 |
johnsom | Oh, the queens client was missing that. | 19:42 |
gregwork | yeah :/ | 19:42 |
johnsom | Would you mind installing a newer version in a venv and running that command? Queens supports it, just the client didn't have it for queens. | 19:43
gregwork | the site im in has limited internet access | 19:46 |
gregwork | is there a neutron equivalent | 19:46 |
gregwork | or non osc method | 19:46 |
johnsom | No, that is it for CLI. I could give you curl commands if you are really adventurous | 19:47 |
johnsom | curl -v -X GET -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $current_token" $test_API_ENDPOINT/v2.0/lbaas/loadbalancers/$test_lb_id/status | 19:48 |
johnsom | export test_API_ENDPOINT=$(openstack endpoint list --service load-balancer --interface public -f value -c URL) | 19:48 |
johnsom | export current_token=`curl -i -s -H "Content-Type: application/json" -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"id":"default"},"password":"password"}}},"scope":{"project":{"name":"admin","domain":{"id":"default"}}}}}' http://$test_API_IP/identity/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}' | tr -cd "[:print:]"` | 19:49 |
johnsom | Your traceback has my interest. It implies that the load balancer had a listener on it that is missing from the database. Which should not be able to happen as there are database constraints that block it from being possible. The status tree dump would give me the LB data model. | 19:52 |
gregwork | having some difficulty with those curls .. getting a 404 talking to keystone | 20:03 |
*** ivve has joined #openstack-lbaas | 20:03 | |
johnsom | Maybe your cloud uses the old endpoint scheme? | 20:04 |
johnsom | openstack endpoint list | grep keystone should show the proper URL path | 20:05 |
johnsom | It might have :5000 on it if it's the old form | 20:05 |
gregwork | identity | True | public | https://overcloud.idm.symrad.com:13000 | 20:06 |
gregwork | only one accessible | 20:06 |
johnsom | Ha, ok so to get a token you would use: | 20:07 |
johnsom | export current_token=`curl -k -i -s -H "Content-Type: application/json" -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"name":"admin","domain":{"id":"default"},"password":"password"}}},"scope":{"project":{"name":"admin","domain":{"id":"default"}}}}}' https://overcloud.idm.symrad.com:13000/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}' | tr -cd "[:print:]"` | 20:07
johnsom | you might also be able to get one with "openstack token issue" if your client has that | 20:10 |
johnsom | I keep forgetting about that command | 20:11 |
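If the client does have it, `openstack token issue` avoids the hand-rolled keystone curl entirely; something like this should work (a sketch, reusing the endpoint lookup given earlier and assuming the overcloud endpoint's certificate may not be trusted, hence -k):

    # Grab a token and the load-balancer endpoint from the service catalog,
    # then query the status tree directly.
    export current_token=$(openstack token issue -f value -c id)
    export test_API_ENDPOINT=$(openstack endpoint list --service load-balancer --interface public -f value -c URL)
    curl -k -s -H "X-Auth-Token: $current_token" \
        "$test_API_ENDPOINT/v2.0/lbaas/loadbalancers/<lb uuid>/status"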
gregwork | {"statuses": {"loadbalancer": {"listeners": [{"pools": [{"members": [{"name": "", "provisioning_status": "ACTIVE", "address": "172.20.1.7", "protocol_port": 8443, "id": "a97f1b82-cc06-432d-8ffd-cdf04d5f77db", "operating_status": "NO_MONITOR"}, {"name": "", "provisioning_status": "ACTIVE", "address": "172.20.1.8", "protocol_port": 8443, "id": "50b00457-e267-4fb1-bef7-e7d4807955ac", "operating_status": | 20:12 |
gregwork | "NO_MONITOR"}], "provisioning_status": "ACTIVE", "id": "dfdf6769-01a9-490b-af40-d4eff5f22c47", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb-pool"}], "provisioning_status": "ACTIVE", "id": "bb12eacf-39c6-4e03-b8b1-5c05f3155181", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb-listener"}], "provisioning_status": "PENDING_UPDATE", | 20:12 |
gregwork | "id": "99ea5fd5-eb1a-41e0-828a-7488acf61577", "operating_status": "ONLINE", "name": "openshift-ansible-openshift.idm.symrad.com-api-lb"}}} | 20:12 |
gregwork | do we think it is safe to mysql delete ? | 20:13 |
gregwork | this is the only LB in the stack left | 20:13 |
gregwork | we are deleting everything, all the instances are gone | 20:13 |
gregwork | the network will be torn down | 20:13 |
gregwork | etc | 20:13 |
johnsom | Yeah, if you have checked the logs and it's not scrolling warnings, it's probably safe | 20:13 |
johnsom | Thank you for taking the time on this, I want to understand that traceback | 20:14 |
johnsom | I do see how it didn't rollback correctly. That was fixed in Rocky as part of some other work. I should probably create a special backport patch for queens for that issue. The test and lock is outside the larger transaction in queens and isn't set to rollback like it should. | 20:15 |
johnsom | That is really strange. So it's a simple load balancer with just one listener. | 20:17 |
johnsom | If you still have the DB, it would be interesting to see if there is a listener here with id "bb12eacf-39c6-4e03-b8b1-5c05f3155181", I think it is, but.... | 20:17 |
gregwork | octavia db name is octavia or octavia_api | 20:19 |
johnsom | octavia | 20:19 |
johnsom | mysql octavia | 20:19 |
johnsom | should open it | 20:19 |
gregwork | trying to figure out how tripleo locked down that | 20:19 |
gregwork | got it | 20:20 |
gregwork | ok we're in the db, any last requests before we blow up the bad lb | 20:21
gregwork | redhat is asking about the bug in rocky you mentioned.. could you provide the link for them | 20:21 |
gregwork | they are on my webex | 20:21 |
gregwork | so just pasting it here would work | 20:21 |
johnsom | select * from listener where id = "bb12eacf-39c6-4e03-b8b1-5c05f3155181"; | 20:22 |
johnsom | Bug in Rocky? | 20:22 |
gregwork | from your earlier comment | 20:22 |
gregwork | about rollback not working? | 20:22 |
gregwork | btw the results of that query | 20:23 |
gregwork | https://pastebin.com/E3kE9zyC | 20:23 |
johnsom | The roll back I mentioned is a bug in queens, not Rocky | 20:23 |
gregwork | yeah do you have a reference to the bug i can pass them | 20:23 |
gregwork | a launchpad or something | 20:23 |
johnsom | I don't have one. I don't think there is an open bug/story for that. | 20:23 |
johnsom | You can give them this link: https://github.com/openstack/octavia/blob/stable/queens/octavia/api/v2/controllers/member.py#L273 | 20:24 |
* johnsom notes, it's just going to come back to me anyway.... | 20:24 | |
gregwork | ok gonna run that sql update | 20:27 |
johnsom | So, that is strange. So the root cause doesn't make any sense. Your traceback shows a DB query was made for that listener and it got back no results, which it should not be able to do due to the DB relations. Plus the ID it was querying should have come from the DB. I wonder if there is a sqlalchemy bug here. | 20:27 |
*** gcheresh has quit IRC | 20:35 | |
gregwork | hmmn, the load balancer cleaned up but i think there are some ports left over | 20:36 |
johnsom | Well, going to ERROR and then deleting the LB should have cleaned up any ports that load balancer had. They might be from a different issue, or source. | 20:37 |
johnsom | The controller would have logged if neutron refused to delete a port when we asked it to. | 20:38 |
cgoncalves | gregwork: hi. FYI, both johnsom and I are Red Hatters :) | 20:44 |
*** Vorrtex has quit IRC | 20:47 | |
gregwork | i figured | 20:48 |
gregwork | johnsom: possibly. i know there is a possibility of badness because we are going under the hood tho | 20:49
gregwork | we are getting 409 conflict errors trying to kill these ports | 20:49 |
gregwork | they are not attached to any host id which is odd | 20:49 |
gregwork | and we just cleaned up the amphora | 20:49 |
gregwork | there are also no instances in this tenant on the networks these ports are attached to anymore | 20:50 |
gregwork | so they are kind of phantoms | 20:50 |
johnsom | What are the port names? | 20:50 |
johnsom | The only thing I can think of that would cause neutron to 409 the port delete is if there is a floating IP attached to the port. | 20:52 |
johnsom | Though I'm not a neutron port expert, so there may be other causes | 20:52 |
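A quick way to check for that (a sketch; exact field and column names can differ a little between client versions):

    PORT_ID="<leftover port uuid>"

    # Show who owns the port and whether anything is still bound to it.
    openstack port show "$PORT_ID" -c name -c device_owner -c device_id -c status

    # A floating IP still attached to the port is the usual reason neutron
    # returns 409 on a port delete.
    openstack floating ip list | grep "$PORT_ID"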
gregwork | actually i think they are ports for openshift nodes that didnt get cleaned up | 20:53 |
johnsom | Ah, ok. | 20:54 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: WIP: Generate PDF documentation https://review.opendev.org/667249 | 21:44 |
*** takamatsu has quit IRC | 21:55 | |
*** rcernin has joined #openstack-lbaas | 22:11 | |
*** KeithMnemonic has quit IRC | 22:13 | |
*** ivve has quit IRC | 22:17 | |
colin- | this is O/T but wondered if anybody has tried out https://github.com/cilium/cilium to test benefits of bpf/xdp in software lbs? | 22:35 |
johnsom | I think there was a group that did a XDP proof-of-concept and presented at a summit about it. | 22:43 |
johnsom | https://www.youtube.com/watch?v=1oAsRzrwAAw | 22:44 |
colin- | ah nice that's relevant thanks | 22:47 |
johnsom | Yeah, I went to that talk | 22:48 |
johnsom | We are doing something similar to that with the TCP flows. We use the kernel splicing, so once it's established it is pretty much an in-and-out at the kernel level | 22:52
johnsom | Just we don't have to write the BPF ourselves | 22:53 |
johnsom | http://www.linuxvirtualserver.org/software/tcpsp/index.html | 22:54 |
johnsom | Though that is old info, it's in the main kernel now | 22:55 |
*** rcernin has quit IRC | 23:11 | |
*** rcernin has joined #openstack-lbaas | 23:12 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: WIP: Generate PDF documentation https://review.opendev.org/667249 | 23:18 |