*** yamamoto has joined #openstack-lbaas | 01:45 | |
*** yamamoto has quit IRC | 02:04 | |
*** yamamoto has joined #openstack-lbaas | 02:04 | |
*** yamamoto has quit IRC | 02:35 | |
*** yamamoto has joined #openstack-lbaas | 02:38 | |
*** ricolin has joined #openstack-lbaas | 02:51 | |
*** abaindur has joined #openstack-lbaas | 03:29 | |
*** psachin has joined #openstack-lbaas | 03:30 | |
*** ricolin_ has joined #openstack-lbaas | 03:35 | |
*** ricolin has quit IRC | 03:38 | |
*** abaindur has quit IRC | 03:54 | |
*** ramishra has joined #openstack-lbaas | 04:04 | |
*** ramishra has quit IRC | 04:45 | |
*** ramishra has joined #openstack-lbaas | 04:45 | |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Adds provider flavor capabilities API tests https://review.opendev.org/631113 | 04:58 |
---|---|---|
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Add a flavor to the load balancer CRUD scenarios https://review.opendev.org/631353 | 04:58 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Add amphora update service client and API test https://review.opendev.org/633295 | 04:59 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Add amphora failover API test https://review.opendev.org/633614 | 04:59 |
*** vishalmanchanda has joined #openstack-lbaas | 05:13 | |
*** abaindur has joined #openstack-lbaas | 05:18 | |
*** abaindur has quit IRC | 05:35 | |
*** yamamoto has quit IRC | 05:47 | |
*** yamamoto has joined #openstack-lbaas | 05:48 | |
*** abaindur has joined #openstack-lbaas | 05:54 | |
*** abaindur has quit IRC | 06:05 | |
*** ramishra has quit IRC | 06:06 | |
*** ramishra has joined #openstack-lbaas | 06:06 | |
*** abaindur has joined #openstack-lbaas | 06:12 | |
*** abaindur has quit IRC | 06:31 | |
*** ricolin__ has joined #openstack-lbaas | 06:41 | |
*** ricolin__ is now known as ricolin | 06:41 | |
*** ltomasbo has left #openstack-lbaas | 06:45 | |
*** ricolin_ has quit IRC | 06:45 | |
*** ccamposr has joined #openstack-lbaas | 06:46 | |
*** maciejjozefczyk has joined #openstack-lbaas | 07:02 | |
*** maciejjozefczyk has quit IRC | 07:03 | |
*** rcernin has quit IRC | 07:04 | |
*** pcaruana has joined #openstack-lbaas | 07:08 | |
*** tesseract has joined #openstack-lbaas | 07:17 | |
*** maciejjozefczyk has joined #openstack-lbaas | 07:33 | |
*** rpittau|afk is now known as rpittau | 07:51 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Convert pool flows to use dicts https://review.opendev.org/665381 | 07:59 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Transition amphora flows to dicts https://review.opendev.org/668898 | 07:59 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts https://review.opendev.org/671725 | 07:59 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: [WIP] Jobboard based controller https://review.opendev.org/647406 | 07:59 |
*** tkajinam has quit IRC | 08:11 | |
*** yamamoto has quit IRC | 08:13 | |
openstackgerrit | Maciej Józefczyk proposed openstack/octavia master: Validate supported LB algorithm in Amphora provider drivers https://review.opendev.org/672477 | 08:19 |
*** tesseract-RH has joined #openstack-lbaas | 08:22 | |
*** tesseract has quit IRC | 08:22 | |
openstackgerrit | Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Specify used algorithm for tests https://review.opendev.org/672264 | 08:23 |
openstackgerrit | Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Add an option to reuse connections https://review.opendev.org/672976 | 08:23 |
*** tesseract-RH has quit IRC | 08:24 | |
*** tesseract has joined #openstack-lbaas | 08:24 | |
*** yamamoto has joined #openstack-lbaas | 08:54 | |
*** yamamoto has quit IRC | 09:09 | |
*** ajay33 has joined #openstack-lbaas | 09:21 | |
*** lemko has joined #openstack-lbaas | 09:42 | |
lemko | One of my load balancers and its only amphora failed because of some issue with the database. When I run openstack loadbalancer failover <loadbalancer_id>, it creates another amphora, but it immediately goes into error state. As a result, there are now four amphorae for this load balancer, all in error state. Any idea what I can do? | 09:43 |
*** yamamoto has joined #openstack-lbaas | 09:44 | |
*** dasp has quit IRC | 09:49 | |
*** dasp has joined #openstack-lbaas | 09:49 | |
*** ramishra has quit IRC | 11:24 | |
*** ramishra has joined #openstack-lbaas | 11:26 | |
*** devfaz has quit IRC | 11:26 | |
*** devfaz has joined #openstack-lbaas | 11:30 | |
*** mkuf_ is now known as mkuf | 11:35 | |
*** yamamoto has quit IRC | 11:53 | |
jrosser | johnsom: when we try to locally build an amphora from the stable/stein branch code, something run inside diskimage-create.sh does "Cloning from amphora-agent cache and applying ref master" so we get a master version amphora on a Stein cloud, which doesn't work..... is there somewhere to specify the amphora branch to build, on top of checking out the stein branch of the octavia code? | 12:07 |
maciejjozefczyk | hey! looks like we have some gate trouble after octavia-lib release, https://logs.opendev.org/77/672477/4/check/openstack-tox-py27/5af555f/job-output.txt.gz | 12:09 |
jrosser | johnsom: looks like things break because the API version is now changed on master here https://github.com/openstack/octavia/commit/37799137a3f1f5ff6aa0f8809a141d4ea04cca75 | 12:09 |
*** yamamoto has joined #openstack-lbaas | 12:13 | |
*** yamamoto has quit IRC | 12:24 | |
*** ramishra has quit IRC | 12:46 | |
*** yamamoto has joined #openstack-lbaas | 12:48 | |
johnsom | lemko You will need to look in the worker log to see why the controller is unable to build a replacement. The amphora in error state will get cleaned up by the housekeeping manager eventually. They should not have actual nova VMs behind them. | 13:02 |
*** goldyfruit has joined #openstack-lbaas | 13:06 | |
johnsom | jrosser https://github.com/openstack/octavia/blob/master/devstack/plugin.sh#L72 | 13:06 |
johnsom | You can set two environment variables to override the version of the agent it pulls in. I thought we had that in the README file, but I don't see it. | 13:07 |
jrosser | johnsom: ah great will take a look at those | 13:07 |
johnsom | jrosser So, as of this critical patch, you can run older amphora image versions with the current controllers, but you can't run a newer amphora image on an older controller. This is due to the API version change. | 13:09 |
jrosser | yes, thats what we see today | 13:09 |
jrosser | the amphora has been built with 1.0 api and then things are all a bit broken | 13:09 |
johnsom | Yeah, you would need an updated controller set | 13:10 |
jrosser | so it should be possible to pin the repo back to stable/stein for the amphora build then? | 13:10 |
johnsom | Yes | 13:10 |
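The two environment variables are not named at this point in the conversation. Assuming they are the diskimage-builder source-repository overrides `DIB_REPOREF_amphora_agent` and `DIB_REPOLOCATION_amphora_agent` (a later patch in this log mentions the `DIB_REPO*` variables), a minimal sketch of preparing the environment for a Stein amphora build could look like this. The helper name `amphora_build_env` is hypothetical.

```python
import os


def amphora_build_env(ref="stable/stein", location=None, base_env=None):
    """Return an environment dict that pins the amphora-agent git ref.

    Assumption: diskimage-create.sh honours DIB_REPOREF_amphora_agent and
    DIB_REPOLOCATION_amphora_agent, as diskimage-builder does for source
    repositories. `location` is optional and left unset by default.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DIB_REPOREF_amphora_agent"] = ref
    if location is not None:
        env["DIB_REPOLOCATION_amphora_agent"] = location
    return env


if __name__ == "__main__":
    env = amphora_build_env()
    print(env["DIB_REPOREF_amphora_agent"])
    # To run the build with this environment, something like:
    #   subprocess.run(["./diskimage-create.sh"], env=env, check=True)
```

Passing the environment explicitly keeps the pin local to the build command instead of leaking into the rest of the shell session.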
*** goldyfruit has quit IRC | 13:11 | |
johnsom | maciejjozefczyk Yes, there has been a patch up for that issue: https://review.opendev.org/#/c/673687/ | 13:11 |
maciejjozefczyk | johnsom, thanks | 13:12 |
jrosser | johnsom: so relatedly, from an openstack-ansible perspective i guess this is now only good for master? https://github.com/openstack/openstack-ansible-os_octavia/blob/master/defaults/main.yml#L245 | 13:13 |
maciejjozefczyk | johnsom, did the same in our unittests, i'm working on functionals to enable those foo listeners . the same way You proposed here: https://review.opendev.org/#/c/665029/21/octavia/tests/functional/api/drivers/driver_agent/test_driver_agent.py | 13:14 |
maciejjozefczyk | johnsom, but for now I stuck on duplicate config option: http://paste.openstack.org/show/755513/ | 13:15 |
johnsom | jrosser Yes, those are master only images | 13:15 |
lemko | thanks johnsom, so the log is here: http://paste.openstack.org/show/755519/ about PortNotFound: port not found (port id: 42a01db7-0b95-49fd-afd1-cbe96c713b4a) and the operation is aborted after that. Any idea how to fix it? | 13:25 |
johnsom | The actual error may have occurred prior to that log snippet. | 13:27 |
johnsom | Usually the root issue is above that tree in the log | 13:28 |
lemko | Here it is johnsom : http://paste.openstack.org/show/755520/ | 13:33 |
lemko | Instance could not be found is the issue? | 13:34 |
johnsom | No, that actually looks ok. It says it's assuming it's already been deleted and moving on. | 13:36 |
johnsom | It looks like this task "octavia.controller.worker.tasks.network_tasks.GetAmphoraeNetworkConfigs" is the one failing | 13:36 |
johnsom | So, it is failing when trying to look up the VIP port. | 13:37 |
lemko | So if the port is not found, what can be done? | 13:38 |
johnsom | Well, this is very timely to report as I'm about to start work on fixing the failover flow. This is a good use case that needs to be fixed. | 13:39 |
johnsom | We can probably recover it if it is worth the effort vs. just deleting and re-building the LB. Let me know which you prefer | 13:41 |
johnsom | I have opened this story to track your situation | 13:42 |
johnsom | https://storyboard.openstack.org/#!/story/2006333 | 13:42 |
lemko | If I can help the community, I would be happy ;) | 13:44 |
lemko | It is worth the effort for me because it might happen again in the future and I would like to be able to fix it | 13:44 |
johnsom | Lol. Yeah, fixing the failover flow is the next work item on my list. It has some deficiencies when resources disappear under the load balancer. | 13:44 |
johnsom | Ok, give me a minute to get setup and I will work with you through the process | 13:45 |
johnsom | Ok, I need to restack a cloud for this. Give me ~10 minutes to get setup. | 13:47 |
lemko | ok, thanks a lot! | 13:47 |
*** pcaruana has quit IRC | 13:47 | |
*** ajay33 has quit IRC | 13:50 | |
*** yamamoto has quit IRC | 13:58 | |
*** pcaruana has joined #openstack-lbaas | 14:00 | |
johnsom | lemko Ok, ready to get started. Is the load balancer active/standby? | 14:06 |
*** ramishra has joined #openstack-lbaas | 14:07 | |
lemko | SINGLE | 14:07 |
lemko | Standalone I mean | 14:07 |
johnsom | Ah, ok. | 14:07 |
johnsom | First up, let's have a look at what is in the DB for this amp. | 14:07 |
johnsom | connect to the octavia database. (mysql octavia) | 14:08 |
johnsom | select * from amphora; | 14:08 |
*** yamamoto has joined #openstack-lbaas | 14:08 | |
johnsom | select * from amphora where load_balancer_id = '<id>'; | 14:08 |
lemko | ok. | 14:08 |
johnsom | Actually, looking for just the one amp is probably easier. | 14:08 |
lemko | here it is : https://pastebin.com/0pdscaLL | 14:11 |
lemko | you can see 7 amphora in error state | 14:11 |
johnsom | Ok, reading | 14:12 |
johnsom | Ok, good. Can you do an "openstack port show 5891db34-622b-40c5-88ad-a08919ad0da0"? | 14:13 |
johnsom | Let's see if the other port is present or not. | 14:14 |
lemko | the port is present | 14:14 |
lemko | https://pastebin.com/jve0WiGn | 14:14 |
johnsom | Ok, cool. So half the battle is done already. | 14:15 |
johnsom | Now "openstack port list | grep fa:16:3e:d0:8d:" | 14:16 |
lemko | | 5891db34-622b-40c5-88ad-a08919ad0da0 | octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 | fa:16:3e:d0:8d:8c | ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e' | DOWN | | 14:17 |
johnsom | Ok, so the situation is that the base port is present; however, neutron has lost its "allowed_address_pairs" port for some reason. It's still configured on the base port, but it doesn't actually exist in neutron. | 14:18 |
lemko | allowed_address_pairs should be the ip of the amphora responsible for it? | 14:19 |
johnsom | It should be the load balancer VIP address. When you add an allowed_address_pair to a port, neutron creates a "fake" port for it. | 14:20 |
lemko | should I delete it and add it again? | 14:21 |
lemko | or simply delete it? | 14:22 |
johnsom | Yeah, just a second, I will provide some commands to try | 14:22 |
johnsom | openstack port unset --allowed-address ip-address=192.168.81.6 5891db34-622b-40c5-88ad-a08919ad0da0 | 14:23 |
johnsom | openstack port set --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0 | 14:23 |
johnsom | Let's try that and see if neutron will let us do it | 14:23 |
lemko | Port does not contain allowed-address-pair {'ip_address': '192.168.81.6'} | 14:24 |
lemko | :) | 14:24 |
johnsom | Did both commands fail? | 14:25 |
lemko | yes. | 14:25 |
lemko | BadRequestException: 400: Client Error for url: http://neutron-server.openstack.svc.cluster.local:9696/v2.0/ports/5891db34-622b-40c5-88ad-a08919ad0da0, Request contains duplicate address pair: mac_address fa:16:3e:d0:8d:8c ip_address 192.168.81.6. | 14:25 |
johnsom | Sigh, ok, maybe we need to be more specific for neutron. | 14:26 |
lemko | for the second one | 14:26 |
lemko | ok. | 14:26 |
johnsom | let's try "openstack port unset --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0" | 14:26 |
lemko | good. yes | 14:27 |
lemko | should I do the second command and also specify the mac? | 14:27 |
johnsom | Ha, ok, let's run the second command again | 14:27 |
lemko | didn't read well sorry | 14:27 |
lemko | ok just ran it | 14:27 |
johnsom | No worries, I think we are doing fine. | 14:28 |
johnsom | Ok, so that passed this time? | 14:28 |
lemko | yes | 14:28 |
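The exchange above shows that neutron only accepted the unset once the pair was given with both the IP and the MAC address. A small sketch of building the two commands with the same fully qualified pair follows; the helper `aap_reset_commands` is hypothetical, and it only constructs the argument lists rather than running them.

```python
def aap_reset_commands(port_id, vip_ip, mac):
    """Build the unset/set command pair for one allowed-address pair.

    Both commands use the full ip-address + mac-address form, since the
    ip-only unset was rejected in the conversation above.
    """
    pair = f"ip-address={vip_ip},mac-address={mac}"
    return [
        ["openstack", "port", "unset", "--allowed-address", pair, port_id],
        ["openstack", "port", "set", "--allowed-address", pair, port_id],
    ]


if __name__ == "__main__":
    cmds = aap_reset_commands(
        "5891db34-622b-40c5-88ad-a08919ad0da0",
        "192.168.81.6",
        "fa:16:3e:d0:8d:8c",
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # Each list could be passed to subprocess.run(cmd, check=True).
```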
johnsom | Excellent! Now let's do this again: "openstack port list | grep fa:16:3e:d0:8d" | 14:28 |
lemko | | 5891db34-622b-40c5-88ad-a08919ad0da0 | octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 | fa:16:3e:d0:8d:8c | ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e' | DOWN | | 14:29 |
johnsom | Just one result? I was hoping for two | 14:29 |
lemko | just one yes. | 14:29 |
johnsom | how about if we do "openstack port list | grep 192.168.81.6" | 14:30 |
lemko | no result | 14:31 |
*** yamamoto has quit IRC | 14:31 | |
johnsom | Blah. Ok, so we are going to have to get more creative as neutron isn't rebuilding the AAP port for us. | 14:32 |
*** yamamoto has joined #openstack-lbaas | 14:32 | |
johnsom | One second while I build a command for you | 14:33 |
lemko | Sad. what's AAP? amphora port? | 14:33 |
johnsom | Allowed_address_pairs (AAP) | 14:35 |
johnsom | It's a neutron term for the "fake" port they create when you add a secondary IP address on a neutron port. | 14:35 |
johnsom | Ok, let's give this a go: openstack port create --fixed-ip subnet=23e3aa95-62c9-4fdb-bb9e-8453afc6756e,ip-address=192.168.81.6 --disable --project 2a095cd2d94c4d888dd8e8edf8b851b3 temp-v | 14:36 |
johnsom | ip | 14:36 |
johnsom | Well, the name was supposed to be temp-vip, it line wrapped for some reason. | 14:36 |
*** yamamoto has quit IRC | 14:37 | |
lemko | Ok, I created the port (I also added the --network field) | 14:38 |
johnsom | Ah, yes, that might be needed. | 14:39 |
johnsom | What is the ID of the port? | 14:39 |
lemko | 27bb002f-c515-403e-8093-52e0adec284d | 14:39 |
johnsom | Ok, back in the octavia database, let's do this to trick Octavia to work around the missing AAP port: | 14:40 |
johnsom | update amphora set ha_port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6'; | 14:40 |
johnsom | once that is done, let's do another load balancer failover command | 14:40 |
lemko | ok, it's going on | 14:41 |
johnsom | When it completes we will do a few more commands to see if we need to do any cleanup or not. | 14:42 |
lemko | ok it already failed | 14:43 |
johnsom | Bummer, can you paste worker log? | 14:43 |
lemko | http://paste.openstack.org/show/755525/ | 14:46 |
lemko | + a tiny bit here : http://paste.openstack.org/show/755526/ | 14:48 |
johnsom | Ah, I made a mistake and forgot to update the port ID in one more table. | 14:50 |
johnsom | In the DB we need to do this command as well: | 14:50 |
johnsom | update vip set port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6'; | 14:50 |
johnsom | Then we will try failover again | 14:51 |
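The first failover attempt failed because only one of the two tables had been pointed at the replacement port. As a sketch of applying both updates together, the following uses an in-memory SQLite database as a stand-in for the Octavia MySQL database. The table layouts are heavily simplified assumptions that keep only the columns named in the chat, and `repoint_vip_port` is a hypothetical helper.

```python
import sqlite3


def repoint_vip_port(conn, lb_id, new_port_id):
    """Update amphora.ha_port_id and vip.port_id in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE amphora SET ha_port_id = ? WHERE load_balancer_id = ?",
            (new_port_id, lb_id),
        )
        conn.execute(
            "UPDATE vip SET port_id = ? WHERE load_balancer_id = ?",
            (new_port_id, lb_id),
        )


def demo():
    # Simplified stand-in schema, not the real Octavia tables.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE amphora (id TEXT, load_balancer_id TEXT, ha_port_id TEXT)"
    )
    conn.execute("CREATE TABLE vip (load_balancer_id TEXT, port_id TEXT)")
    conn.execute("INSERT INTO amphora VALUES ('amp-1', 'lb-1', 'old-port')")
    conn.execute("INSERT INTO vip VALUES ('lb-1', 'old-port')")
    repoint_vip_port(conn, "lb-1", "new-port")
    amp = conn.execute("SELECT ha_port_id FROM amphora").fetchone()[0]
    vip = conn.execute("SELECT port_id FROM vip").fetchone()[0]
    return amp, vip


if __name__ == "__main__":
    print(demo())
```

Grouping the two statements means a failure in the second leaves the first unapplied, so the tables cannot end up disagreeing about the port.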
lemko | ok | 14:51 |
lemko | it's now pending update | 14:52 |
lemko | still stuck in pending update | 14:55 |
johnsom | Give it some time | 14:55 |
lemko | I see from octavia-worker : | 14:55 |
lemko | 2019-08-05 14:54:59,164.164 17 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectTimeout: HTTPSConnectionPool(host='10.22.0.6', port=9443): Max retries exceeded with url: /0.5/info (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f53e747a590>, 'Connection to 10.22.0.6 timed out. (connect timeout=10.0)')) | 14:55 |
johnsom | Yeah, that is normal. That is just saying that nova has not yet finished booting the VM | 14:55 |
johnsom | It will retry for a while, waiting for nova to finish the boot. | 14:55 |
johnsom | The nova status goes to Active as soon as the process launches, but that is not when the VM is actually booted, so we have to retry/wait to find when it actually comes up | 14:56 |
lemko | 10.22.0.6 is an amphora in `openstack loadbalancer list | grep 983c9b7c-6874-40a0-b626-d7694ddd93e6` which is in error state, and I don't see any VM with this IP in the tenant where the amphorae are | 14:57 |
johnsom | Typically that is about 30 seconds, but depending on your setup this could take up to 18 minutes. | 14:57 |
lemko | I am a bit confused, what octavia will try to do? create a vm and give it the IP 10.22.0.6? | 14:58 |
lemko | and why this IP since it was already in database when we started the procedure? | 14:58 |
johnsom | Well, assuming that log entry is from the load balancer we are working on (it may not be), it would have asked nova to boot a VM and nova gave it back 10.22.0.6 as the IP address nova assigned for the lb-mgmt-net | 14:59 |
johnsom | Nova can re-use IPs, which is fine, we account for that | 14:59 |
johnsom | So, when you look at our load balancer with openstack loadbalancer show it's marked in provisioning_status ERROR? | 15:01 |
johnsom | Not PENDING_UPDATE? | 15:01 |
lemko | PENDING_UPDATE. | 15:01 |
johnsom | Ok good, that is what we want to see | 15:01 |
johnsom | What do we have here? select * from amphora where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6'; | 15:03 |
lemko | BTW there's an ALLOCATED amphora with STANDALONE status for our LB but it has different IP | 15:03 |
lemko | I'll execute the select | 15:03 |
lemko | http://paste.openstack.org/show/755529/ | 15:04 |
johnsom | Ok, that is good. That is what I would expect. Now, why it's still in pending update.... hmmm | 15:05 |
johnsom | Can you look in both the controller worker log and the health manager log to see if it's trying to contact "10.22.0.9"? | 15:06 |
lemko | No mention of it. | 15:08 |
lemko | only from house-keeping 30 mins afgo | 15:08 |
lemko | ago* | 15:08 |
lemko | 2019-08-05 14:43:27,993.993 1 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.22.0.9', port=9443): Max retries exceeded with url: /0.5/info (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3df53f91d0>: Failed to establish a new connection: [Errno 111] Connection refused',)) | 15:08 |
lemko | when the amphora was booting and getting ready, it was full of logs like this but it's the last one I sent you | 15:09 |
lemko | and octavia worker is still trying to contact 10.22.0.6 | 15:10 |
johnsom | Hmm. Have you tuned your startup timeouts or are you running with the defaults? | 15:11 |
johnsom | Do you have spares pool enabled? | 15:13 |
lemko | Yes | 15:13 |
lemko | spare-pool of 2 | 15:13 |
lemko | here is the conf with timeout http://paste.openstack.org/show/755530/ | 15:13 |
lemko | full conf : http://paste.openstack.org/show/N3pHJTLKbVx5NIKVwTTw/ | 15:15 |
johnsom | Yeah, ok, we are going to have to wait a while for the LB to go back to ERROR | 15:15 |
johnsom | You might consider the following settings: | 15:16 |
johnsom | [haproxy_amphora] connection_max_retries = 120 | 15:16 |
johnsom | [haproxy_amphora] build_active_retries = 120 | 15:16 |
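To get a feel for how long the controller can hold a load balancer in PENDING with a given `connection_max_retries`, here is a rough worst-case estimate. It assumes each attempt costs at most the connect timeout (10 seconds in the log line above) plus a sleep between attempts; the 5 second interval is an assumption, so check the retry interval in your own configuration. The function name is hypothetical.

```python
def retry_budget_seconds(max_retries, retry_interval_s, connect_timeout_s):
    """Worst-case seconds spent retrying the amphora connection.

    Assumes each attempt waits for the full connect timeout and then
    sleeps for the retry interval before the next attempt.
    """
    return max_retries * (retry_interval_s + connect_timeout_s)


if __name__ == "__main__":
    # Connection attempts that time out (VM not reachable at all).
    slow = retry_budget_seconds(120, retry_interval_s=5, connect_timeout_s=10)
    # Connection attempts that are refused immediately (agent not up yet).
    fast = retry_budget_seconds(120, retry_interval_s=5, connect_timeout_s=0)
    print(slow / 60, fast / 60)
```

Under these assumptions, 120 retries bound the wait at somewhere between 10 and 30 minutes, depending on whether attempts are refused or time out.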
lemko | now we're waiting for connection_max_retries to go away | 15:16 |
lemko | ? | 15:16 |
johnsom | So, spares pool brings an interesting twist to this. I wonder if there isn't a bug in the spares pool allocation on failover. | 15:17 |
johnsom | Yeah, we need the controller to release ownership of the load balancer. Basically we want it to come out of the PENDING_* state and either go ACTIVE or ERROR | 15:17 |
lemko | Is there a chance it can go to ACTIVE? | 15:18 |
johnsom | I'm going to look in the failover code path when spares pool is enabled. | 15:18 |
lemko | what would forcing it to ERROR do? | 15:18 |
johnsom | Bad things | 15:18 |
*** yamamoto has joined #openstack-lbaas | 15:19 | |
johnsom | When an object is in PENDING_* a controller has ownership of the resource. If you force it out of PENDING you are going to leave resources in use in the tenant and most likely make this situation of repairing much worse. | 15:20 |
lemko | OK. I did this a few time and I know understand why it ended badly with me having to purge completely octavia and reinstall it | 15:20 |
lemko | now* | 15:20 |
johnsom | Plus, at some point the controller will give up waiting on nova and will start making changes to whatever exists at that time. | 15:20 |
johnsom | So it could circle back and start deleting things even if we resolved the problem | 15:21 |
johnsom | Yeah, PENDING is an important state. | 15:21 |
johnsom | It's best if you tune those timeouts so Octavia doesn't retry so long on nova/neutron. | 15:21 |
johnsom | That way it will release the resource much faster. | 15:22 |
lemko | If I change it now, will it take effect now? | 15:22 |
johnsom | No | 15:22 |
lemko | Ok. | 15:22 |
johnsom | The long defaults are there for the zuul test instances that run in nested VMs without hardware acceleration. They can take up to 18 minutes to boot a VM. | 15:23 |
johnsom | It's an unfortunate set of defaults. | 15:23 |
johnsom | So, what I think we should do is: | 15:24 |
johnsom | 1. Wait for the controller to release the load balancer. Likely to ERROR | 15:24 |
johnsom | 2. Check the amphora table status. Likely all DELETED and ERROR, which would be fine. | 15:25 |
*** yamamoto has quit IRC | 15:25 | |
johnsom | 3. Purge the old amphora records from the amphora table (I think there might be a bug with spares pool). "DELETE FROM amphora WHERE status IN ('ERROR', 'DELETED');" | 15:26 |
*** goldyfruit has joined #openstack-lbaas | 15:26 | |
johnsom | 4. Trigger another failover. | 15:26 |
johnsom | optionally, update those timeouts and restart the controllers. Please wait until the LB is out of PENDING though. | 15:27 |
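The purge in step 3 has to match two status values, and the way the condition is written matters: SQL's `OR` does not distribute over `=`, so `status = 'ERROR' OR 'DELETED'` treats the bare string as a boolean, which is false in both MySQL and SQLite. The sketch below uses an in-memory SQLite table as a stand-in (a simplified assumption, not the real Octavia schema) to count what each form would match before anything is deleted.

```python
import sqlite3


def count_matches(where_clause):
    """Count rows a WHERE clause would match in a tiny stand-in table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE amphora (id TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO amphora VALUES (?, ?)",
        [("a1", "ERROR"), ("a2", "DELETED"), ("a3", "ALLOCATED")],
    )
    query = f"SELECT COUNT(*) FROM amphora WHERE {where_clause}"
    return conn.execute(query).fetchone()[0]


if __name__ == "__main__":
    # The bare string 'DELETED' evaluates to false, so only ERROR matches.
    print(count_matches("status = 'ERROR' OR 'DELETED'"))
    # IN matches both ERROR and DELETED and leaves ALLOCATED alone.
    print(count_matches("status IN ('ERROR', 'DELETED')"))
```

Running the condition as a `SELECT COUNT(*)` first is a cheap dry run before issuing the `DELETE`.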
openstackgerrit | Adam Harwell proposed openstack/octavia master: Fix a unit test for new octavia-lib https://review.opendev.org/673687 | 15:27 |
lemko | and what about the port we created previously? | 15:28 |
lemko | the temp-vip port | 15:29 |
johnsom | After we get the LB going again. We should grep the port list for the VIP IP. if we see the other AAP port with it we will delete the temp-vip | 15:29 |
rm_work | ok lets | 15:35 |
rm_work | *let's see if that works now | 15:35 |
johnsom | Hmm, I don't see anything in the failover flow with spares that could cause it to get the wrong amp from the DB. | 15:39 |
lemko | Thanks a lot for your help. I'll come back to you when the retry ends ;) | 15:44 |
johnsom | Ok, cool | 15:45 |
rm_work | yep cool, docs passed | 15:49 |
rm_work | so cgoncalves we could prolly merge https://review.opendev.org/#/c/673687/ now :D it'll fix our lower-constraints job which will I believe otherwise continue to break | 15:49 |
rm_work | (technically waiting for tests to finish, but everything passed before besides docs | 15:49 |
rm_work | ) | 15:49 |
rm_work | johnsom: https://review.opendev.org/#/c/673172/ | 15:58 |
rm_work | https://review.opendev.org/#/c/674087/ is waiting on that one to merge ^^ | 15:58 |
rm_work | there's a LOT of patches up with one +2 from me that could use reviews from other cores ;) | 16:01 |
*** tesseract has quit IRC | 16:04 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal https://review.opendev.org/673173 | 16:09 |
rm_work | specifically this thingy https://review.opendev.org/#/c/645495/ could probably use a review, it's been waiting a while and seems trivial-ish to me | 16:12 |
*** henriqueof has quit IRC | 16:12 | |
rm_work | and https://review.opendev.org/#/c/661309/ will start another one | 16:20 |
rm_work | ugh need to merge https://review.opendev.org/#/c/673687/ before anything in Octavia can merge I think | 16:30 |
johnsom | Yes | 16:31 |
rm_work | can we just +A that as a gatefix | 16:32 |
rm_work | I guess I'll wait at least until the checks finish | 16:32 |
rm_work | ok cool, cgoncalves got it :) | 16:32 |
johnsom | It seems like the octavia-lib patches should run some octavia tests.... At least unit and functional | 16:33 |
rm_work | maybe yeah <_< | 16:37 |
*** ramishra has quit IRC | 16:44 | |
*** ricolin has quit IRC | 16:57 | |
*** goldyfruit has quit IRC | 17:18 | |
*** goldyfruit_ has joined #openstack-lbaas | 17:18 | |
*** Vorrtex has joined #openstack-lbaas | 17:21 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs https://review.opendev.org/674659 | 17:21 |
johnsom | Let's see what that does. If it works we can add it to octavia-lib | 17:21 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Install missing packages in nodepool instance https://review.opendev.org/674259 | 17:21 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal https://review.opendev.org/673173 | 17:21 |
rm_work | tests take sooooo excruciatingly looooooong | 17:26 |
*** psachin has quit IRC | 17:28 | |
*** rpittau is now known as rpittau|afk | 17:42 | |
johnsom | Is it the functionals? | 17:51 |
johnsom | I am seeing this odd behavior that if an /etc/octavia/octavia.conf exists, and debug is True, the functional tests slow to a crawl | 17:52 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Increase connection_max_retries to 480 secs on CentOS jobs https://review.opendev.org/673172 | 17:56 |
rm_work | hmmmmm | 18:14 |
rm_work | i should check | 18:14 |
rm_work | nope don't have that | 18:14 |
rm_work | it's specifically these new data api ones | 18:15 |
rm_work | err, let me see, how do i get it to tell me the times again? | 18:15 |
rm_work | anywho, gonna rebase patches down the line | 18:17 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Lookup interfaces by MAC directly https://review.opendev.org/673337 | 18:17 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Fix L7 repository create methods https://review.opendev.org/673154 | 18:17 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Fix provider driver utils https://review.opendev.org/673155 | 18:17 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent https://review.opendev.org/665029 | 18:17 |
*** maciejjozefczyk has quit IRC | 18:52 | |
rm_work | yeah i'm thinking i can't actually get a functional suite run to complete | 18:57 |
rm_work | they hang on my machine | 18:57 |
johnsom | Really? Total hang or super slow reports? | 18:58 |
*** gcheresh_ has joined #openstack-lbaas | 19:15 | |
*** gcheresh_ has quit IRC | 19:21 | |
rm_work | total hang when i run it from the CLI with tox | 19:21 |
rm_work | they hang for about 3 minutes each when i run them individually | 19:22 |
rm_work | in pycharm | 19:22 |
rm_work | trying to figure out WHERE it's hanging | 19:22 |
rm_work | ah i am 90% sure it's this: | 19:23 |
rm_work | self.status_listener_proc.join(60) | 19:23 |
rm_work | self.stats_listener_proc.join(60) | 19:23 |
rm_work | self.get_listener_proc.join(60) | 19:23 |
rm_work | that's the 3 minutes :D | 19:23 |
rm_work | they all time out one at a time | 19:23 |
rm_work | which means the real issue is that the exit_event that's set isn't actually working | 19:23 |
rm_work | they all just run `server.handle_request()` which seems to be blocking | 19:24 |
rm_work | should have a timeout set? what is the default for CONF.driver_agent.get_request_timeout | 19:25 |
rm_work | 5 seconds? ... it's not respecting that for sure | 19:26 |
rm_work | johnsom: ^^ | 19:27 |
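The pattern under discussion is a listener loop that calls `server.handle_request()` until an exit event is set, with the test joining each listener for up to 60 seconds. The sketch below is not the Octavia driver-agent code and does not diagnose the hang. It only illustrates, with the standard `socketserver` module and a thread instead of a separate process, that the loop can notice the exit event only if `handle_request()` returns periodically, which requires a timeout on the server. `EchoHandler`, `serve_until`, and `demo` are hypothetical names.

```python
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Echo back whatever the client sent.
        self.request.sendall(self.request.recv(1024))


def serve_until(exit_event, server, poll_timeout=0.2):
    """Handle requests until exit_event is set.

    Without server.timeout, handle_request() blocks until a client
    connects, so the exit event would never be re-checked.
    """
    server.timeout = poll_timeout
    while not exit_event.is_set():
        server.handle_request()
    server.server_close()


def demo():
    exit_event = threading.Event()
    server = socketserver.TCPServer(("127.0.0.1", 0), EchoHandler)
    worker = threading.Thread(target=serve_until, args=(exit_event, server))
    worker.start()
    exit_event.set()
    worker.join(5)  # returns quickly because the loop polls the event
    return worker.is_alive()


if __name__ == "__main__":
    print(demo())
```

If three such listeners each ignored the exit event, three sequential `join(60)` calls would add up to the three minutes observed above.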
johnsom | Hmm, works great for me, those take about 9 seconds each and done | 19:30 |
johnsom | I thought I left those without a timeout.... | 19:35 |
johnsom | I only force killed the third party processes | 19:35 |
johnsom | Yeah where do you see the join(60)??? | 19:37 |
rm_work | top of test_driver_agent.py | 19:37 |
rm_work | like like 50 or so | 19:37 |
rm_work | *line | 19:37 |
johnsom | Ah, the test | 19:37 |
johnsom | https://www.irccloud.com/pastebin/QAnZ5M2S/ | 19:40 |
johnsom | johnsom@python23:/tmp/octavia$ tox -e functional -- octavia.tests.functional.api.drivers.driver_agent | 19:41 |
johnsom | Each test is right about 9 seconds | 19:42 |
rm_work | yeah these are never dying | 19:42 |
rm_work | i wonder if the socket implementation is different / f'd on OSX | 19:43 |
rm_work | ahh hmm | 19:44 |
rm_work | in py27 they finish basically instantly | 19:44 |
rm_work | it's in py37 that they hang | 19:44 |
johnsom | Ah, let me try | 19:44 |
rm_work | (again, i usually test in py37) | 19:45 |
rm_work | or py36 | 19:45 |
rm_work | T_T | 19:45 |
johnsom | I get the same on py36 (this 18.04 VM doesn't have 3.7 on it) | 19:46 |
johnsom | https://www.irccloud.com/pastebin/uHNqGYKm/ | 19:46 |
rm_work | grrrr something in cinder broke and killed our grenade run and so we have to recheck that patch and have it run through twice again | 19:49 |
rm_work | there goes another 5 hours | 19:49 |
johnsom | I thought I had cinder disabled. Is it just that it keeps cloning it? | 19:50 |
johnsom | I shut it down for this very reason.... | 19:50 |
rm_work | it looked like it was installing/setting it up during the devstack run | 19:50 |
rm_work | again, in Grenade | 19:50 |
rm_work | so maybe not disabled there? | 19:50 |
johnsom | Yeah, ok. | 19:50 |
rm_work | setting up a functional-py36 env so i can test | 19:51 |
rm_work | to see if it's a 3.7 only issue | 19:51 |
rm_work | it could be <_< | 19:51 |
johnsom | well, installing "python3.7" on 18.04 ends up with strange results. The tests "pass" but it throws one of the "ascii" codec errors | 19:55 |
johnsom | No runtime info, nothing | 19:55 |
johnsom | Yeah, the built in 3.7 support seems to just not function correctly.... | 19:58 |
openstackgerrit | Anqi Li proposed openstack/octavia master: Implements notifications for octavia https://review.opendev.org/674432 | 19:59 |
rm_work | yes, 3.6 runs fine | 20:03 |
rm_work | <_< | 20:03 |
rm_work | only 3.7 hangs | 20:04 |
rm_work | so something is borked in 3.7 and we will have to figure it out at some point | 20:04 |
johnsom | Don't our gates run that? Seems like I need to figure that out sooner-ish | 20:10 |
johnsom | Ah, we only have 3.6 in there | 20:11 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent https://review.opendev.org/665029 | 20:17 |
rm_work | well, that's a working version anyway | 20:17 |
rm_work | but yeah uhh... something is causing the process to hang in 3.7 and it COULD be the socket lib, not sure | 20:17 |
*** henriqueof has joined #openstack-lbaas | 20:20 | |
openstackgerrit | Anqi Li proposed openstack/octavia master: Implements notifications for octavia https://review.opendev.org/674432 | 20:28 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Move to using octavia-lib constants https://review.opendev.org/673712 | 20:32 |
rm_work | eugh rebases | 20:36 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Add long-running provider agent support https://review.opendev.org/674140 | 20:37 |
rm_work | ok that chain is up to date | 20:37 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs https://review.opendev.org/674659 | 20:41 |
*** lemko has quit IRC | 20:54 | |
*** vishalmanchanda has quit IRC | 21:03 | |
*** yamamoto has joined #openstack-lbaas | 21:23 | |
*** yamamoto has quit IRC | 21:28 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Add the DIB_REPO* variables to the README.rst https://review.opendev.org/674701 | 21:30 |
johnsom | Could have swore we had that in the README, but I guess not. | 21:30 |
*** rcernin has joined #openstack-lbaas | 22:06 | |
*** spatel has joined #openstack-lbaas | 22:12 | |
rm_work | AUGH IT FAILED AGAIN | 22:32 |
rm_work | WTB gatefix merging please zuul | 22:32 |
*** tkajinam has joined #openstack-lbaas | 22:56 | |
*** Vorrtex has quit IRC | 22:58 | |
*** henriqueof has quit IRC | 23:00 | |
* johnsom starts chanting towards zuul | 23:01 | |
*** spatel has quit IRC | 23:33 | |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Support skipping APP_COOKIE and HTTP_COOKIE https://review.opendev.org/645495 | 23:34 |
rm_work | cgoncalves: so ... centos is getting ... slower? or what | 23:38 |
rm_work | i thought you showed graphs of the boot time getting way faster | 23:39 |
rm_work | now we're just getting TIMEOUTs on the centos gate job :( | 23:40 |
johnsom | Worst part is the centos gates have been that way for months | 23:50 |