Monday, 2019-08-05

*** yamamoto has joined #openstack-lbaas		01:45
*** yamamoto has quit IRC		02:04
*** yamamoto has joined #openstack-lbaas		02:04
*** yamamoto has quit IRC		02:35
*** yamamoto has joined #openstack-lbaas		02:38
*** ricolin has joined #openstack-lbaas		02:51
*** abaindur has joined #openstack-lbaas		03:29
*** psachin has joined #openstack-lbaas		03:30
*** ricolin_ has joined #openstack-lbaas		03:35
*** ricolin has quit IRC		03:38
*** abaindur has quit IRC		03:54
*** ramishra has joined #openstack-lbaas		04:04
*** ramishra has quit IRC		04:45
*** ramishra has joined #openstack-lbaas		04:45
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Adds provider flavor capabilities API tests https://review.opendev.org/631113	04:58
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Add a flavor to the load balancer CRUD scenarios https://review.opendev.org/631353	04:58
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Add amphora update service client and API test https://review.opendev.org/633295	04:59
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Add amphora failover API test https://review.opendev.org/633614	04:59
*** vishalmanchanda has joined #openstack-lbaas		05:13
*** abaindur has joined #openstack-lbaas		05:18
*** abaindur has quit IRC		05:35
*** yamamoto has quit IRC		05:47
*** yamamoto has joined #openstack-lbaas		05:48
*** abaindur has joined #openstack-lbaas		05:54
*** abaindur has quit IRC		06:05
*** ramishra has quit IRC		06:06
*** ramishra has joined #openstack-lbaas		06:06
*** abaindur has joined #openstack-lbaas		06:12
*** abaindur has quit IRC		06:31
*** ricolin__ has joined #openstack-lbaas		06:41
*** ricolin__ is now known as ricolin		06:41
*** ltomasbo has left #openstack-lbaas		06:45
*** ricolin_ has quit IRC		06:45
*** ccamposr has joined #openstack-lbaas		06:46
*** maciejjozefczyk has joined #openstack-lbaas		07:02
*** maciejjozefczyk has quit IRC		07:03
*** rcernin has quit IRC		07:04
*** pcaruana has joined #openstack-lbaas		07:08
*** tesseract has joined #openstack-lbaas		07:17
*** maciejjozefczyk has joined #openstack-lbaas		07:33
*** rpittau\|afk is now known as rpittau		07:51
openstackgerrit	Ann Taraday proposed openstack/octavia master: Convert pool flows to use dicts https://review.opendev.org/665381	07:59
openstackgerrit	Ann Taraday proposed openstack/octavia master: Transition amphora flows to dicts https://review.opendev.org/668898	07:59
openstackgerrit	Ann Taraday proposed openstack/octavia master: [WIP] Lb flows to dicts https://review.opendev.org/671725	07:59
openstackgerrit	Ann Taraday proposed openstack/octavia master: [WIP] Jobboard based controller https://review.opendev.org/647406	07:59
*** tkajinam has quit IRC		08:11
*** yamamoto has quit IRC		08:13
openstackgerrit	Maciej Józefczyk proposed openstack/octavia master: Validate supported LB algorithm in Amphora provider drivers https://review.opendev.org/672477	08:19
*** tesseract-RH has joined #openstack-lbaas		08:22
*** tesseract has quit IRC		08:22
openstackgerrit	Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Specify used algorithm for tests https://review.opendev.org/672264	08:23
openstackgerrit	Maciej Józefczyk proposed openstack/octavia-tempest-plugin master: Add an option to reuse connections https://review.opendev.org/672976	08:23
*** tesseract-RH has quit IRC		08:24
*** tesseract has joined #openstack-lbaas		08:24
*** yamamoto has joined #openstack-lbaas		08:54
*** yamamoto has quit IRC		09:09
*** ajay33 has joined #openstack-lbaas		09:21
*** lemko has joined #openstack-lbaas		09:42
lemko	One of my load balancers and its only amphora failed because of a database issue. When I run openstack loadbalancer failover <loadbalancer_id>, it creates another amphora, but that one immediately goes into error state. As a result, there are now four amphorae for this load balancer, all in error state. Any idea what I can do?	09:43
*** yamamoto has joined #openstack-lbaas		09:44
*** dasp has quit IRC		09:49
*** dasp has joined #openstack-lbaas		09:49
*** ramishra has quit IRC		11:24
*** ramishra has joined #openstack-lbaas		11:26
*** devfaz has quit IRC		11:26
*** devfaz has joined #openstack-lbaas		11:30
*** mkuf_ is now known as mkuf		11:35
*** yamamoto has quit IRC		11:53
jrosser	johnsom: when we try to locally build an amphora from the stable/stein branch code, something run inside diskimage-create.sh does "Cloning from amphora-agent cache and applying ref master" so we get a master version amphora on a Stein cloud, which doesn't work..... is there somewhere to specify the amphora branch to build, on top of checking out the stein branch of the octavia code?	12:07
maciejjozefczyk	hey! looks like we have some gate trouble after octavia-lib release, https://logs.opendev.org/77/672477/4/check/openstack-tox-py27/5af555f/job-output.txt.gz	12:09
jrosser	johnsom: looks like things break because the API version is now changed on master here https://github.com/openstack/octavia/commit/37799137a3f1f5ff6aa0f8809a141d4ea04cca75	12:09
*** yamamoto has joined #openstack-lbaas		12:13
*** yamamoto has quit IRC		12:24
*** ramishra has quit IRC		12:46
*** yamamoto has joined #openstack-lbaas		12:48
johnsom	lemko You will need to look in the worker log to see why the controller is unable to build a replacement. The amphora in error state will get cleaned up by the housekeeping manager eventually. They should not have actual nova VMs behind them.	13:02
*** goldyfruit has joined #openstack-lbaas		13:06
johnsom	jrosser https://github.com/openstack/octavia/blob/master/devstack/plugin.sh#L72	13:06
johnsom	You can set two environment variables to override the version of the agent it pulls in. I thought we had that in the README file, but I don't see it.	13:07
jrosser	johnsom: ah great will take a look at those	13:07
johnsom	jrosser So, as of this critical patch, you can run older amphora image versions with the current controllers, but you can't run a newer amphora image on an older controller. This is due to the api version change.	13:09
jrosser	yes, that's what we see today	13:09
jrosser	the amphora has been built with 1.0 api and then things are all a bit broken	13:09
johnsom	Yeah, you would need an updated controller set	13:10
jrosser	so it should be possible to pin the repo back to stable/stein for the amphora build then?	13:10
johnsom	Yes	13:10
*** goldyfruit has quit IRC		13:11
johnsom	maciejjozefczyk Yes, there has been a patch up for that issue: https://review.opendev.org/#/c/673687/	13:11
maciejjozefczyk	johnsom, thanks	13:12
jrosser	johnsom: so relatedly, from an openstack-ansible perspective i guess this is now only good for master? https://github.com/openstack/openstack-ansible-os_octavia/blob/master/defaults/main.yml#L245	13:13
maciejjozefczyk	johnsom, did the same in our unit tests. I'm working on functionals to enable those foo listeners, the same way you proposed here: https://review.opendev.org/#/c/665029/21/octavia/tests/functional/api/drivers/driver_agent/test_driver_agent.py	13:14
maciejjozefczyk	johnsom, but for now I'm stuck on a duplicate config option: http://paste.openstack.org/show/755513/	13:15
johnsom	jrosser Yes, those are master only images	13:15
lemko	thanks johnsom, so the log is here: http://paste.openstack.org/show/755519/ about PortNotFound: port not found (port id: 42a01db7-0b95-49fd-afd1-cbe96c713b4a) and the operation is aborted after that. Any idea how to fix it?	13:25
johnsom	The actual error may have occurred prior to that log snippet.	13:27
johnsom	Usually the root issue is above that tree in the log	13:28
lemko	Here it is johnsom : http://paste.openstack.org/show/755520/	13:33
lemko	Instance could not be found is the issue?	13:34
johnsom	No, that actually looks ok. It says it's assuming it's already been deleted and moving on.	13:36
johnsom	It looks like this task "octavia.controller.worker.tasks.network_tasks.GetAmphoraeNetworkConfigs" is the one failing	13:36
johnsom	So, it is failing when trying to look up the VIP port.	13:37
lemko	So if the port is not found, what can be done?	13:38
johnsom	Well, this is very timely to report as I'm about to start work on fixing the failover flow. This is a good use case that needs to be fixed.	13:39
johnsom	We can probably recover it if it is worth the effort vs. just deleting and re-building the LB. Let me know which you prefer	13:41
johnsom	I have opened this story to track your situation	13:42
johnsom	https://storyboard.openstack.org/#!/story/2006333	13:42
lemko	If I can help the community, I would be happy ;)	13:44
lemko	It is worth the effort for me because it might happen again in the future and I would like to be able to fix it	13:44
johnsom	Lol. Yeah, fixing the failover flow is the next work item on my list. It has some deficiencies when resources disappear under the load balancer.	13:44
johnsom	Ok, give me a minute to get set up and I will work with you through the process	13:45
johnsom	Ok, I need to restack a cloud for this. Give me ~10 minutes to get set up.	13:47
lemko	ok, thanks a lot!	13:47
*** pcaruana has quit IRC		13:47
*** ajay33 has quit IRC		13:50
*** yamamoto has quit IRC		13:58
*** pcaruana has joined #openstack-lbaas		14:00
johnsom	lemko Ok, ready to get started. Is the load balancer active/standby?	14:06
*** ramishra has joined #openstack-lbaas		14:07
lemko	SINGLE	14:07
lemko	Standalone I mean	14:07
johnsom	Ah, ok.	14:07
johnsom	First up, let's have a look at what is in the DB for this amp.	14:07
johnsom	connect to the octavia database. (mysql octavia)	14:08
johnsom	select * from amphora;	14:08
*** yamamoto has joined #openstack-lbaas		14:08
johnsom	select * from amphora where load_balancer_id = '<id>';	14:08
lemko	ok.	14:08
johnsom	Actually, looking for just the one amp is probably easier.	14:08
lemko	here it is : https://pastebin.com/0pdscaLL	14:11
lemko	you can see 7 amphora in error state	14:11
johnsom	Ok, reading	14:12
johnsom	Ok, good. Can you do an "openstack port show 5891db34-622b-40c5-88ad-a08919ad0da0"?	14:13
johnsom	Let's see if the other port is present or not.	14:14
lemko	the port is present	14:14
lemko	https://pastebin.com/jve0WiGn	14:14
johnsom	Ok, cool. So half the battle is done already.	14:15
johnsom	Now "openstack port list \| grep fa:16:3e:d0:8d:"	14:16
lemko	\| 5891db34-622b-40c5-88ad-a08919ad0da0 \| octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 \| fa:16:3e:d0:8d:8c \| ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e' \| DOWN \|	14:17
johnsom	Ok, so the situation is that the base port is present; however, neutron has lost its "allowed_address_pairs" port for some reason. It's still configured on the base port, but it doesn't actually exist in neutron.	14:18
lemko	allowed_address_pairs should be the ip of the amphora responsible for it?	14:19
johnsom	It should be the load balancer VIP address. When you add an allowed_address_pair to a port, neutron creates a "fake" port for it.	14:20
lemko	should I delete it and add it again?	14:21
lemko	or simply delete it?	14:22
johnsom	Yeah, just a second, I will provide some commands to try	14:22
johnsom	openstack port unset --allowed-address ip-address=192.168.81.6 5891db34-622b-40c5-88ad-a08919ad0da0	14:23
johnsom	openstack port set --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0	14:23
johnsom	Let's try that and see if neutron will let us do it	14:23
lemko	Port does not contain allowed-address-pair {'ip_address': '192.168.81.6'}	14:24
lemko	:)	14:24
johnsom	Did both commands fail?	14:25
lemko	yes.	14:25
lemko	BadRequestException: 400: Client Error for url: http://neutron-server.openstack.svc.cluster.local:9696/v2.0/ports/5891db34-622b-40c5-88ad-a08919ad0da0, Request contains duplicate address pair: mac_address fa:16:3e:d0:8d:8c ip_address 192.168.81.6.	14:25
johnsom	Sigh, ok, maybe we need to be more specific for neutron.	14:26
lemko	for the second one	14:26
lemko	ok.	14:26
johnsom	let's try "openstack port unset --allowed-address ip-address=192.168.81.6,mac-address=fa:16:3e:d0:8d:8c 5891db34-622b-40c5-88ad-a08919ad0da0"	14:26
lemko	good. yes	14:27
lemko	should I do the second command and also specify the mac?	14:27
johnsom	Ha, ok, let's run the second command again	14:27
lemko	didn't read well sorry	14:27
lemko	ok just ran it	14:27
johnsom	No worries, I think we are doing fine.	14:28
johnsom	Ok, so that passed this time?	14:28
lemko	yes	14:28
johnsom	Excellent! Now let's do this again: "openstack port list \| grep fa:16:3e:d0:8d"	14:28
lemko	\| 5891db34-622b-40c5-88ad-a08919ad0da0 \| octavia-lb-vrrp-153a6d53-a9cc-45f9-b189-5caf7f323609 \| fa:16:3e:d0:8d:8c \| ip_address='192.168.81.8', subnet_id='23e3aa95-62c9-4fdb-bb9e-8453afc6756e' \| DOWN \|	14:29
johnsom	Just one result? I was hoping for two	14:29
lemko	just one yes.	14:29
johnsom	how about if we do "openstack port list \| grep 192.168.81.6"	14:30
lemko	no result	14:31
*** yamamoto has quit IRC		14:31
johnsom	Blah. Ok, so we are going to have to get more creative as neutron isn't rebuilding the AAP port for us.	14:32
*** yamamoto has joined #openstack-lbaas		14:32
johnsom	One second while I build a command for you	14:33
lemko	Sad. what's AAP? amphora port?	14:33
johnsom	Allowed_address_pairs (AAP)	14:35
johnsom	It's a neutron term for the "fake" port they create when you add a secondary IP address on a neutron port.	14:35
johnsom	Ok, let's give this a go: openstack port create --fixed-ip subnet=23e3aa95-62c9-4fdb-bb9e-8453afc6756e,ip-address=192.168.81.6 --disable --project 2a095cd2d94c4d888dd8e8edf8b851b3 temp-v	14:36
johnsom	ip	14:36
johnsom	Well, the name was supposed to be temp-vip, it line wrapped for some reason.	14:36
*** yamamoto has quit IRC		14:37
lemko	Ok, I created the port (I also added the --network field)	14:38
johnsom	Ah, yes, that might be needed.	14:39
johnsom	What is the ID of the port?	14:39
lemko	27bb002f-c515-403e-8093-52e0adec284d	14:39
johnsom	Ok, back in the octavia database, let's do this to trick Octavia to work around the missing AAP port:	14:40
johnsom	update amphora set ha_port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';	14:40
johnsom	once that is done, let's do another load balancer failover command	14:40
lemko	ok, it's going on	14:41
johnsom	When it completes we will do a few more commands to see if we need to do any cleanup or not.	14:42
lemko	ok it already failed	14:43
johnsom	Bummer, can you paste worker log?	14:43
lemko	http://paste.openstack.org/show/755525/	14:46
lemko	+ a tiny bit here : http://paste.openstack.org/show/755526/	14:48
johnsom	Ah, I made a mistake and forgot to update the port ID in one more table.	14:50
johnsom	In the DB we need to do this command as well:	14:50
johnsom	update vip set port_id = '27bb002f-c515-403e-8093-52e0adec284d' where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';	14:50
johnsom	Then we will try failover again	14:51
lemko	ok	14:51
lemko	it's now pending update	14:52
lemko	still stuck in pending update	14:55
johnsom	Give it some time	14:55
lemko	I see from octavia-worker :	14:55
lemko	2019-08-05 14:54:59,164.164 17 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectTimeout: HTTPSConnectionPool(host='10.22.0.6', port=9443): Max retries exceeded with url: /0.5/info (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f53e747a590>, 'Connection to 10.22.0.6 timed out. (connect timeout=10.0)'))	14:55
johnsom	Yeah, that is normal. That is just saying that nova has not yet finished booting the VM	14:55
johnsom	It will retry for a while, waiting for nova to finish the boot.	14:55
johnsom	The nova status goes to Active as soon as the process launches, but that is not when the VM is actually booted, so we have to retry/wait to find when it actually comes up	14:56
lemko	10.22.0.6 is an amphora in `openstack loadbalancer list \| grep 983c9b7c-6874-40a0-b626-d7694ddd93e6` which is in error state, and I don't see any VM with this IP in the tenant where the amphorae are	14:57
johnsom	Typically that is about 30 seconds, but depending on your setup this could take up to 18 minutes.	14:57
lemko	I am a bit confused, what will octavia try to do? Create a VM and give it the IP 10.22.0.6?	14:58
lemko	and why this IP since it was already in database when we started the procedure?	14:58
johnsom	Well, assuming that log entry is from the load balancer we are working on (it may not be), it would have asked nova to boot a VM and nova gave it back 10.22.0.6 as the IP address nova assigned for the lb-mgmt-net	14:59
johnsom	Nova can re-use IPs, which is fine, we account for that	14:59
johnsom	So, when you look at our load balancer with openstack loadbalancer show it's marked in provisioning_status ERROR?	15:01
johnsom	Not PENDING_UPDATE?	15:01
lemko	PENDING_UPDATE.	15:01
johnsom	Ok good, that is what we want to see	15:01
johnsom	What do we have here? select * from amphora where load_balancer_id = '983c9b7c-6874-40a0-b626-d7694ddd93e6';	15:03
lemko	BTW there's an ALLOCATED amphora with STANDALONE status for our LB but it has a different IP	15:03
lemko	I'll execute the select	15:03
lemko	http://paste.openstack.org/show/755529/	15:04
johnsom	Ok, that is good. That is what I would expect. Now, why it's still in pending update.... hmmm	15:05
johnsom	Can you look in both the controller worker log and the health manager log to see if it's trying to contact "10.22.0.9"?	15:06
lemko	No mention of it.	15:08
lemko	only from house-keeping 30 mins afgo	15:08
lemko	ago*	15:08
lemko	2019-08-05 14:43:27,993.993 1 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.22.0.9', port=9443): Max retries exceeded with url: /0.5/info (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f3df53f91d0>: Failed to establish a new connection: [Errno 111] Connection refused',))	15:08
lemko	when the amphora was booting and getting ready, it was full of logs like this but it's the last one I sent you	15:09
lemko	and octavia worker is still trying to contact 10.22.0.6	15:10
johnsom	Hmm. Have you tuned your startup timeouts or are you running with the defaults?	15:11
johnsom	Do you have spares pool enabled?	15:13
lemko	Yes	15:13
lemko	spare-pool of 2	15:13
lemko	here is the conf with timeout http://paste.openstack.org/show/755530/	15:13
lemko	full conf : http://paste.openstack.org/show/N3pHJTLKbVx5NIKVwTTw/	15:15
johnsom	Yeah, ok, we are going to have to wait a while for the LB to go back to ERROR	15:15
johnsom	You might consider the following settings:	15:16
johnsom	[haproxy_amphora] connection_max_retries = 120	15:16
johnsom	[haproxy_amphora] build_active_retries = 120	15:16
lemko	now we're waiting for connection_max_retries to go away	15:16
lemko	?	15:16
johnsom	So, spares pool brings an interesting twist to this. I wonder if there isn't a bug in the spares pool allocation on failover.	15:17
johnsom	Yeah, we need the controller to release ownership of the load balancer. Basically we want it to come out of the PENDING_* state and either go ACTIVE or ERROR	15:17
lemko	Is there a chance it can go to ACTIVE?	15:18
johnsom	I'm going to look in the failover code path when spares pool is enabled.	15:18
lemko	what would forcing it to ERROR do?	15:18
johnsom	Bad things	15:18
*** yamamoto has joined #openstack-lbaas		15:19
johnsom	When an object is in PENDING_* a controller has ownership of the resource. If you force it out of PENDING you are going to leave resources in use in the tenant and most likely make this situation of repairing much worse.	15:20
lemko	OK. I did this a few time and I know understand why it ended badly with me having to purge completely octavia and reinstall it	15:20
lemko	now*	15:20
johnsom	Plus, at some point the controller will give up waiting on nova and will start making changes to whatever exists at that time.	15:20
johnsom	So it could circle back and start deleting things even if we resolved the problem	15:21
johnsom	Yeah, PENDING is an important state.	15:21
johnsom	It's best if you tune those timeouts so Octavia doesn't retry so long on nova/neutron.	15:21
johnsom	That way it will release the resource much faster.	15:22
lemko	If I change it now, will it take effect now?	15:22
johnsom	No	15:22
lemko	Ok.	15:22
johnsom	The long defaults are there for the zuul test instances that run in nested VMs without hardware acceleration. They can take up to 18 minutes to boot a VM.	15:23
johnsom	It's an unfortunate set of defaults.	15:23
johnsom	So, what I think we should do is:	15:24
johnsom	1. Wait for the controller to release the load balancer. Likely to ERROR	15:24
johnsom	2. Check the amphora table status. Likely all DELETED and ERROR, which would be fine.	15:25
*** yamamoto has quit IRC		15:25
johnsom	3. Purge the old amphora records from the amphora table (I think there might be a bug with spares pool). "DELETE from amphora where status in ('ERROR', 'DELETED');"	15:26
*** goldyfruit has joined #openstack-lbaas		15:26
johnsom	4. Trigger another failover.	15:26
johnsom	optionally, update those timeouts and restart the controllers. Please wait until the LB is out of PENDING though.	15:27
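A note on the status filter in step 3: a bare `or 'DELETED'` is not a second comparison against the status column, so it does not select rows in DELETED status. The sketch below illustrates this with an in-memory SQLite table as a stand-in for the Octavia database (an assumption made for the demo; the real database is MySQL, whose handling of the bare string can differ in detail, for example by warning or rejecting the statement depending on SQL mode). The two-column table and the row IDs are hypothetical.

```python
import sqlite3

# SQLite stands in for the real database (assumption for illustration only).
# The table is reduced to the two columns the demo needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amphora (id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO amphora VALUES (?, ?)",
    [("a1", "ERROR"), ("a2", "DELETED"), ("a3", "ALLOCATED"), ("a4", "READY")],
)


def matching_ids(where_clause):
    """Return the ids selected by a WHERE clause, sorted for stable output."""
    query = "SELECT id FROM amphora WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# The bare string after OR is evaluated on its own as a boolean,
# not compared against the status column.
bare_or = matching_ids("status = 'ERROR' OR 'DELETED'")

# IN compares the column against every listed value.
in_list = matching_ids("status IN ('ERROR', 'DELETED')")

print("bare OR :", bare_or)
print("IN list :", in_list)
```

Running a SELECT with the same WHERE clause before the DELETE is a cheap way to confirm which rows will be purged.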
openstackgerrit	Adam Harwell proposed openstack/octavia master: Fix a unit test for new octavia-lib https://review.opendev.org/673687	15:27
lemko	and what about the port we created previously?	15:28
lemko	the temp-vip port	15:29
johnsom	After we get the LB going again, we should grep the port list for the VIP IP. If we see the other AAP port with it we will delete the temp-vip	15:29
rm_work	ok lets	15:35
rm_work	*let's see if that works now	15:35
johnsom	Hmm, I don't see anything in the failover flow with spares that could cause it to get the wrong amp for the DB.	15:39
lemko	Thanks a lot for your help. I'll come back to you when the retry ends ;)	15:44
johnsom	Ok, cool	15:45
rm_work	yep cool, docs passed	15:49
rm_work	so cgoncalves we could prolly merge https://review.opendev.org/#/c/673687/ now :D it'll fix our lower-constraints job which will I believe otherwise continue to break	15:49
rm_work	(technically waiting for tests to finish, but everything passed before besides docs	15:49
rm_work	)	15:49
rm_work	johnsom: https://review.opendev.org/#/c/673172/	15:58
rm_work	https://review.opendev.org/#/c/674087/ is waiting on that one to merge ^^	15:58
rm_work	there's a LOT of patches up with one +2 from me that could use reviews from other cores ;)	16:01
*** tesseract has quit IRC		16:04
openstackgerrit	Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal https://review.opendev.org/673173	16:09
rm_work	specifically this thingy https://review.opendev.org/#/c/645495/ could probably use a review, it's been waiting a while and seems trivial-ish to me	16:12
*** henriqueof has quit IRC		16:12
rm_work	and https://review.opendev.org/#/c/661309/ will start another one	16:20
rm_work	ugh need to merge https://review.opendev.org/#/c/673687/ before anything in Octavia can merge I think	16:30
johnsom	Yes	16:31
rm_work	can we just +A that as a gatefix	16:32
rm_work	I guess I'll wait at least until the checks finish	16:32
rm_work	ok cool, cgoncalves got it :)	16:32
johnsom	It seems like the octavia-lib patches should run some octavia tests.... At least unit and functional	16:33
rm_work	maybe yeah <_<	16:37
*** ramishra has quit IRC		16:44
*** ricolin has quit IRC		16:57
*** goldyfruit has quit IRC		17:18
*** goldyfruit_ has joined #openstack-lbaas		17:18
*** Vorrtex has joined #openstack-lbaas		17:21
openstackgerrit	Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs https://review.opendev.org/674659	17:21
johnsom	Let's see what that does. If it works we can add it to octavia-lib	17:21
openstackgerrit	Carlos Goncalves proposed openstack/octavia master: Install missing packages in nodepool instance https://review.opendev.org/674259	17:21
openstackgerrit	Carlos Goncalves proposed openstack/octavia master: WIP: Switch Fedora-based amphora to fedora-minimal https://review.opendev.org/673173	17:21
rm_work	tests take sooooo excruciatingly looooooong	17:26
*** psachin has quit IRC		17:28
*** rpittau is now known as rpittau\|afk		17:42
johnsom	Is it the functionals?	17:51
johnsom	I am seeing this odd behavior that if an /etc/octavia/octavia.conf exists, and debug is True, the functional tests slow to a crawl	17:52
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Increase connection_max_retries to 480 secs on CentOS jobs https://review.opendev.org/673172	17:56
rm_work	hmmmmm	18:14
rm_work	i should check	18:14
rm_work	nope don't have that	18:14
rm_work	it's specifically these new data api ones	18:15
rm_work	err, let me see, how do i get it to tell me the times again?	18:15
rm_work	anywho, gonna rebase patches down the line	18:17
openstackgerrit	Adam Harwell proposed openstack/octavia master: Lookup interfaces by MAC directly https://review.opendev.org/673337	18:17
openstackgerrit	Adam Harwell proposed openstack/octavia master: Fix L7 repository create methods https://review.opendev.org/673154	18:17
openstackgerrit	Adam Harwell proposed openstack/octavia master: Fix provider driver utils https://review.opendev.org/673155	18:17
openstackgerrit	Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent https://review.opendev.org/665029	18:17
*** maciejjozefczyk has quit IRC		18:52
rm_work	yeah i'm thinking i can't actually get a functional suite run to complete	18:57
rm_work	they hang on my machine	18:57
johnsom	Really? Total hang or super slow reports?	18:58
*** gcheresh_ has joined #openstack-lbaas		19:15
*** gcheresh_ has quit IRC		19:21
rm_work	total hang when i run it from the CLI with tox	19:21
rm_work	they hang for about 3 minutes each when i run them individually	19:22
rm_work	in pycharm	19:22
rm_work	trying to figure out WHERE it's hanging	19:22
rm_work	ah i am 90% sure it's this:	19:23
rm_work	self.status_listener_proc.join(60)	19:23
rm_work	self.stats_listener_proc.join(60)	19:23
rm_work	self.get_listener_proc.join(60)	19:23
rm_work	that's the 3 minutes :D	19:23
rm_work	they all time out one at a time	19:23
rm_work	which means the real issue is that the exit_event that's set isn't actually working	19:23
rm_work	they all just run `server.handle_request()` which seems to be blocking	19:24
rm_work	should have a timeout set? what is the default for CONF.driver_agent.get_request_timeout	19:25
rm_work	5 seconds? ... it's not respecting that for sure	19:26
rm_work	johnsom: ^^	19:27
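The three minutes described above follow from joining three listeners one after another with a 60 second timeout each, while none of them ever gets back to checking the exit event. A minimal sketch of that effect, assuming threads instead of processes and a 0.2 second timeout instead of 60 so it runs quickly (the names blocking_listener, shutdown and never_set are hypothetical, not taken from the Octavia tests):

```python
import threading
import time


def blocking_listener(exit_event, never_set):
    # Stand-in for a loop stuck inside a blocking call: it waits on something
    # that never arrives, so it never returns to check exit_event.
    never_set.wait()


def shutdown(workers, exit_event, join_timeout):
    """Signal exit, then join each worker in turn; return the elapsed seconds."""
    exit_event.set()
    start = time.monotonic()
    for worker in workers:
        # Each join waits the full timeout because the worker never exits.
        worker.join(join_timeout)
    return time.monotonic() - start


exit_event = threading.Event()
never_set = threading.Event()
workers = [
    threading.Thread(
        target=blocking_listener, args=(exit_event, never_set), daemon=True
    )
    for _ in range(3)
]
for worker in workers:
    worker.start()

elapsed = shutdown(workers, exit_event, join_timeout=0.2)
still_running = [worker.is_alive() for worker in workers]
print("waited about %.1f s, still running: %s" % (elapsed, still_running))
```

The total wait scales with the number of listeners times the join timeout, which is why the timeouts add up instead of overlapping.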
johnsom	Hmm, works great for me, those take about 9 seconds each and done	19:30
johnsom	I thought I left those without a timeout....	19:35
johnsom	I only force killed the third party processes	19:35
johnsom	Yeah where do you see the join(60)???	19:37
rm_work	top of test_driver_agent.py	19:37
rm_work	like like 50 or so	19:37
rm_work	*line	19:37
johnsom	Ah, the test	19:37
johnsom	https://www.irccloud.com/pastebin/QAnZ5M2S/	19:40
johnsom	johnsom@python23:/tmp/octavia$ tox -e functional -- octavia.tests.functional.api.drivers.driver_agent	19:41
johnsom	Each test is right about 9 seconds	19:42
rm_work	yeah these are never dying	19:42
rm_work	i wonder if the socket implementation is different / f'd on OSX	19:43
rm_work	ahh hmm	19:44
rm_work	in py27 they finish basically instantly	19:44
rm_work	it's in py37 that they hang	19:44
johnsom	Ah, let me try	19:44
rm_work	(again, i usually test in py37)	19:45
rm_work	or py36	19:45
rm_work	T_T	19:45
johnsom	I get the same on py36 (this 18.04 VM doesn't have 3.7 on it)	19:46
johnsom	https://www.irccloud.com/pastebin/uHNqGYKm/	19:46
rm_work	grrrr something in cinder broke and killed our grenade run and so we have to recheck that patch and have it run through twice again	19:49
rm_work	there goes another 5 hours	19:49
johnsom	I thought I had cinder disabled. Is it just that it keeps cloning it?	19:50
johnsom	I shut it down for this very reason....	19:50
rm_work	it looked like it was installing/setting it up during the devstack run	19:50
rm_work	again, in Grenade	19:50
rm_work	so maybe not disabled there?	19:50
johnsom	Yeah, ok.	19:50
rm_work	setting up a functional-py36 env so i can test	19:51
rm_work	to see if it's a 3.7 only issue	19:51
rm_work	it could be <_<	19:51
johnsom	well, installing "python3.7" on 18.04 ends up with strange results. The tests "pass" but it throws one of the "ascii" codec errors	19:55
johnsom	No runtime info, nothing	19:55
johnsom	Yeah, the built in 3.7 support seems to just not function correctly....	19:58
openstackgerrit	Anqi Li proposed openstack/octavia master: Implements notifications for octavia https://review.opendev.org/674432	19:59
rm_work	yes, 3.6 runs fine	20:03
rm_work	<_<	20:03
rm_work	only 3.7 hangs	20:04
rm_work	so something is borked in 3.7 and we will have to figure it out at some point	20:04
johnsom	Don't our gates run that? Seems like I need to figure that out sooner-ish	20:10
johnsom	Ah, we only have 3.6 in there	20:11
openstackgerrit	Adam Harwell proposed openstack/octavia master: Add get method support to the driver-agent https://review.opendev.org/665029	20:17
rm_work	well, that's a working version anyway	20:17
rm_work	but yeah uhh... something is causing the process to hang in 3.7 and it COULD be the socket lib, not sure	20:17
*** henriqueof has joined #openstack-lbaas		20:20
openstackgerrit	Anqi Li proposed openstack/octavia master: Implements notifications for octavia https://review.opendev.org/674432	20:28
openstackgerrit	Adam Harwell proposed openstack/octavia master: Move to using octavia-lib constants https://review.opendev.org/673712	20:32
rm_work	eugh rebases	20:36
openstackgerrit	Adam Harwell proposed openstack/octavia master: Add long-running provider agent support https://review.opendev.org/674140	20:37
rm_work	ok that chain is up to date	20:37
openstackgerrit	Michael Johnson proposed openstack/octavia master: Add Octavia tox "tips" jobs https://review.opendev.org/674659	20:41
*** lemko has quit IRC		20:54
*** vishalmanchanda has quit IRC		21:03
*** yamamoto has joined #openstack-lbaas		21:23
*** yamamoto has quit IRC		21:28
openstackgerrit	Michael Johnson proposed openstack/octavia master: Add the DIB_REPO* variables to the README.rst https://review.opendev.org/674701	21:30
johnsom	Could have sworn we had that in the README, but I guess not.	21:30
*** rcernin has joined #openstack-lbaas		22:06
*** spatel has joined #openstack-lbaas		22:12
rm_work	AUGH IT FAILED AGAIN	22:32
rm_work	WTB gatefix merging please zuul	22:32
*** tkajinam has joined #openstack-lbaas		22:56
*** Vorrtex has quit IRC		22:58
*** henriqueof has quit IRC		23:00
* johnsom starts chanting towards zuul		23:01
*** spatel has quit IRC		23:33
openstackgerrit	Merged openstack/octavia-tempest-plugin master: Support skipping APP_COOKIE and HTTP_COOKIE https://review.opendev.org/645495	23:34
rm_work	cgoncalves: so ... centos is getting ... slower? or what	23:38
rm_work	i thought you showed graphs of the boot time getting way faster	23:39
rm_work	now we're just getting TIMEOUTs on the centos gate job :(	23:40
johnsom	Worst part is the centos gates have been that way for months	23:50
