bbbbzhao_ | cgoncalves: Maybe you aren't running the health-monitor to refresh the operating_status? ;-) | 00:07 |
bbbbzhao_ | cgoncalves: Sorry, I need to run to the office, it's late.. ;-). I will reply to you when I arrive. | 00:07 |
bbbbzhao_ | johnsom: Does that mean I need to post a new revision for patch 2? | 00:08 |
johnsom | I haven't posted any comments that require a revision to patch 2. We may add a patch to the end of the chain, but I am not committed to another patch 2 yet | 00:10 |
bbbbzhao_ | johnsom: Oh, yeah. Sorry. I took the -1 as flagging an issue I should look into. Thanks | 00:23 |
johnsom | bbbbzhao_ Yeah, no problem, I just wanted to call attention to my question there, as reordering flows is higher risk | 00:26 |
*** longkb has joined #openstack-lbaas | 00:49 | |
*** abaindur has quit IRC | 01:31 | |
*** hongbin has joined #openstack-lbaas | 01:44 | |
bzhao__ | cgoncalves: Hi, for your question, did you test on centos? get_udp_listeners just searches for the UDP-specific named config files; is_udp_listener_running checks whether the keepalived that holds the specific config file is running by searching /proc for its pid.. Maybe keepalived is not running. | 01:59 |
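(Editor's note: a minimal sketch of the check bzhao__ describes, assuming one keepalived PID file per UDP listener; the file layout and function signature here are illustrative, not Octavia's exact code.)

```python
import os

def is_udp_listener_running(listener_id, base_dir='/var/lib/octavia'):
    # Hypothetical layout: a keepalived PID file kept per UDP listener.
    pid_file = os.path.join(base_dir, listener_id, 'keepalived.pid')
    if not os.path.exists(pid_file):
        return False
    with open(pid_file) as f:
        pid = f.read().strip()
    # /proc/<pid> exists only while that process is alive; this is the
    # "searching in /proc with its pid" check described above.
    return os.path.exists(os.path.join('/proc', pid))
```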
*** ramishra has joined #openstack-lbaas | 02:00 | |
*** yamamoto has joined #openstack-lbaas | 02:01 | |
bzhao__ | johnsom: I will provide full logs for the flow-reorder concern in part 2. Today I will begin the highest-priority UDP bug in storyboard, and try my best to finish the rest of them this weekend. Thank you again for all the help, and thanks to our octavia team. ;-) | 02:04 |
bzhao__ | cgoncalves: So your amp is in a bad status; the healthmonitor will remove the amp and rebuild it. | 02:05 |
bzhao__ | cgoncalves: As it cannot get the expected listeners to update into the db. | 02:05 |
bzhao__ | cgoncalves: By "it" in the above sentence I mean the health monitor process | 02:06 |
*** ramishra has quit IRC | 02:08 | |
*** yamamoto has quit IRC | 02:12 | |
*** yamamoto has joined #openstack-lbaas | 02:18 | |
*** yamamoto has quit IRC | 02:23 | |
bzhao__ | cgoncalves: sorry, s/healthmonitor/health manager/ | 02:35 |
johnsom | Thank you. I'm volunteering at the county fair tonight so I can't work on it tonight, but will work again tomorrow | 02:39 |
*** yamamoto has joined #openstack-lbaas | 03:03 | |
bzhao__ | johnsom: Thanks. ;-). Racing against the clock. | 03:08 |
bzhao__ | johnsom: I have prepared the log; I think I'll show it here. Hope it doesn't flood other folks' screens. | 03:09 |
bbbbzhao_ | https://www.irccloud.com/pastebin/tgt6zgFO/This%20is%20operation%20steps. | 03:10 |
*** yamamoto has quit IRC | 03:11 | |
bbbbzhao_ | http://paste.openstack.org/show/727198/ This is a piece of the health manager logs. | 03:20 |
bbbbzhao_ | I collected this without reordering the flows. The agent side will raise a 500 for start listener https://www.irccloud.com/pastebin/GuMt9ECo/This%20is%20the%20error%20in%20log | 03:26 |
bbbbzhao_ | johnsom: I'll mark it here and ping you so you don't miss it. Thanks. ;-). Have a good rest. | 03:26 |
*** yamamoto has joined #openstack-lbaas | 03:37 | |
*** yamamoto has quit IRC | 03:41 | |
*** yamamoto has joined #openstack-lbaas | 03:43 | |
*** hongbin has quit IRC | 03:52 | |
*** yamamoto has quit IRC | 04:02 | |
*** ramishra has joined #openstack-lbaas | 04:03 | |
*** yamamoto has joined #openstack-lbaas | 04:03 | |
*** yamamoto has quit IRC | 04:14 | |
*** yamamoto has joined #openstack-lbaas | 04:28 | |
*** yamamoto has quit IRC | 04:38 | |
*** yamamoto has joined #openstack-lbaas | 04:43 | |
*** yamamoto has quit IRC | 04:47 | |
*** yamamoto has joined #openstack-lbaas | 04:56 | |
*** yamamoto has quit IRC | 04:58 | |
*** yamamoto has joined #openstack-lbaas | 06:03 | |
*** yamamoto_ has joined #openstack-lbaas | 06:07 | |
*** yamamoto has quit IRC | 06:09 | |
*** yamamoto_ has quit IRC | 06:10 | |
*** rcernin has quit IRC | 06:54 | |
*** annp has quit IRC | 07:02 | |
*** longkb has quit IRC | 07:03 | |
*** longkb has joined #openstack-lbaas | 07:17 | |
*** longkb has quit IRC | 07:31 | |
*** longkb has joined #openstack-lbaas | 07:35 | |
*** ktibi has joined #openstack-lbaas | 07:56 | |
cgoncalves | bzhao__, yes, centos. right, health manager is failing over amp because of expected listeners not matching | 08:19 |
cgoncalves | I will continue looking at it today | 08:19 |
*** salmankhan has joined #openstack-lbaas | 08:48 | |
openstackgerrit | ZhaoBo proposed openstack/octavia master: Followup patch for UDP support https://review.openstack.org/587690 | 08:49 |
bzhao__ | cgoncalves: Thanks. My env just got flushed. Can the process be found in "/proc/PID" on centos? If yes, we need to go inside and check /var/log/messages for the reason why it cannot be set up. If not, it's another difference between the OSes. | 08:51 |
cgoncalves | bzhao__, I'm restacking with latest patches. yeah, I wanted to check that yesterday but was 2 AM for me :) | 08:55 |
cgoncalves | I'll check and keep you posted | 08:55 |
bzhao__ | cgoncalves: You work so hard. ;-) . Take a good rest. | 08:58 |
cgoncalves | thanks :) | 08:58 |
*** obre is now known as obre_ | 09:58 | |
*** obre_ is now known as obre | 09:58 | |
*** obre has quit IRC | 10:04 | |
openstackgerrit | ZhaoBo proposed openstack/octavia master: [UDP] Fix failed member always in DRAIN status https://review.openstack.org/588511 | 10:53 |
*** amuller has joined #openstack-lbaas | 11:53 | |
*** longkb has quit IRC | 12:38 | |
*** ktibi has quit IRC | 13:04 | |
*** ramishra has quit IRC | 13:22 | |
*** ktibi has joined #openstack-lbaas | 14:06 | |
cgoncalves | I no longer get unexpected # of listeners. latest PS should have fixed it | 14:12 |
cgoncalves | although pool operating_status stays OFFLINE | 14:13 |
cgoncalves | member provisioning status is ACTIVE | 14:13 |
cgoncalves | ah... netcat isn't installed on centos amp :/ | 14:16 |
*** rpittau has quit IRC | 14:18 | |
*** hongbin has joined #openstack-lbaas | 14:33 | |
*** erjacobs has joined #openstack-lbaas | 14:41 | |
cgoncalves | interesting. amphora-haproxy netns isn't created after vm reboot, and the amp isn't failed over since eth0 (lb-mgmt) is up and reports health msgs | 14:41 |
cgoncalves | https://storyboard.openstack.org/#!/story/2003306 | 14:54 |
openstackgerrit | German Eichberger proposed openstack/octavia master: Allows failover if port is not deallocated by nova https://review.openstack.org/585864 | 15:47 |
*** erjacobs has quit IRC | 15:57 | |
-openstackstatus- NOTICE: The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result. | 16:02 | |
johnsom | cgoncalves Hmm, so I think there is a systemd service for setting up the netns, I wonder if that is failing | 16:21 |
xgerman_ | would that run for UDP? | 16:27 |
johnsom | Well, it doesn't *depend* on any other systemd service, but the question is where in the code it is getting written out. | 16:32 |
johnsom | Yeah, it is missing on the pure UDP path | 16:34 |
*** openstackgerrit has quit IRC | 16:49 | |
cgoncalves | johnsom, can you reproduce it on ubuntu? | 16:52 |
johnsom | I haven't tried, but I can clearly see in the UDP path where this is missing. | 16:52 |
johnsom | cgoncalves https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/api_server/listener.py#L190 | 16:53 |
johnsom | That section is not in the pure UDP path | 16:53 |
johnsom | And should be | 16:53 |
johnsom | Though a refactor would be nice too, but.... | 16:53 |
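(Editor's note: roughly what the linked section does on the HAProxy path, and what the pure UDP path skipped: writing and enabling a systemd unit so the amphora-haproxy netns is recreated at boot. The unit body, name, and paths below are assumptions for illustration, not Octavia's exact code.)

```python
import os
import subprocess

# Hypothetical unit; the real one is rendered from a template by the
# amphora-agent. It only needs to run before keepalived/haproxy so the
# namespace exists again after a reboot.
NETNS_UNIT = """\
[Unit]
Description=Create the amphora-haproxy network namespace
Before=keepalived.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/ip netns add amphora-haproxy

[Install]
WantedBy=multi-user.target
"""

def install_netns_service(unit_path='/usr/lib/systemd/system/amphora-netns.service'):
    if os.path.exists(unit_path):
        return
    with open(unit_path, 'w') as f:
        f.write(NETNS_UNIT)
    # If this step is skipped (the UDP-only gap discussed above), a VM
    # reboot silently loses the netns, matching the symptom cgoncalves saw.
    subprocess.check_call(['systemctl', 'enable', 'amphora-netns.service'])
```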
*** dmellado has quit IRC | 17:22 | |
johnsom | cgoncalves Can you confirm the LB with the amp you rebooted only had a UDP listener? Is there something more you want me to test here on xenial? | 17:27 |
cgoncalves | johnsom, only one UDP listener | 17:28 |
cgoncalves | johnsom, netcat being installed is a new requirement, correct? | 17:28 |
johnsom | cgoncalves Yeah, ok, so it's that missing code. I will fix in my patch today. | 17:28 |
-openstackstatus- NOTICE: Project renames and review.openstack.org downtime are complete without any major issue. | 17:28 | |
johnsom | I am stacking last night's code, going to investigate that flow re-order, then try to finish my review of #2 | 17:29 |
cgoncalves | thanks \o/ | 17:29 |
*** openstackgerrit has joined #openstack-lbaas | 17:39 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 17:39 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 17:44 |
cgoncalves | johnsom, we'll need to revert your Ib8677d2b85e352b19abf5fd0b79c1b8653819301 | 17:45 |
cgoncalves | "Job octavia-v2-dsvm-scenario-ubuntu.bionic in openstack/octavia-tempest-plugin is not permitted to shadow job octavia-v2-dsvm-scenario-ubuntu.bionic in openstack/octavia" | 17:45 |
johnsom | Say what? | 17:45 |
cgoncalves | or I can fix that in https://review.openstack.org/#/c/587442/ | 17:46 |
*** salmankhan has quit IRC | 17:46 | |
johnsom | cgoncalves Where are you seeing that? Ib8677d2b85e352b19abf5fd0b79c1b8653819301 is correct, however, there should be no definition for that in octavia/octavia | 17:48 |
KeithMnemonic | johnsom can you point me to the location of the code that populates octavia.conf or amphora-agent.conf in the amphora images? i.e. when changing a value on the octavia-worker node, how does it get sent to the amphora when it is freshly booted? | 17:49 |
johnsom | amphora-agent.conf is created at amp boot time and loaded via config drive. It does not get updated after boot (If I remember right). We have plans to enable that, but not implemented yet | 17:50 |
johnsom | KeithMnemonic Hi Keith BTW. | 17:51 |
johnsom | The template is here: https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/templates/amphora_agent_conf.template | 17:51 |
johnsom | It gets rendered here: https://github.com/openstack/octavia/blob/master/octavia/controller/worker/tasks/compute_tasks.py#L77 | 17:52 |
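(Editor's note: conceptually the render step is plain Jinja2 fed with per-amphora values, and the result rides into the VM on the config drive; a hedged sketch with invented field names.)

```python
import jinja2

def render_agent_config(template_path, **amp_vars):
    # Fill the amphora_agent_conf.template linked above with this
    # amphora's values (IDs, controller endpoints, etc.).
    with open(template_path) as f:
        return jinja2.Template(f.read()).render(**amp_vars)

rendered = render_agent_config(
    'amphora_agent_conf.template',
    amphora_id='6bf09249-2ba9-4ce3-9572-83e61dcf5e21',  # illustrative
    controller_list=['192.0.2.10:5555'])                # illustrative
# The worker hands the rendered text to nova as a config-drive file,
# which is why it is applied only at boot time and later controller-side
# edits never reach an already-running amphora.
```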
johnsom | cgoncalves Ah, this needs to go away... https://github.com/openstack/octavia/blob/master/zuul.d/jobs.yaml#L54 | 17:53 |
johnsom | Sigh, too many things going on at once.... | 17:53 |
johnsom | cgoncalves yes, nuke it here would be great: https://review.openstack.org/#/c/587442/ | 17:53 |
cgoncalves | right. that's what I'm gonna do once I get to fix octavia-v2-dsvm-scenario-centos.7 | 17:54 |
johnsom | Thanks | 17:55 |
KeithMnemonic | johnsom Hello back, sorry to be so abrupt ;-) thanks, config drive is what I was looking for | 17:57 |
KeithMnemonic | is that the same for octavia.conf? | 17:57 |
johnsom | octavia.conf is only on the controllers | 17:57 |
johnsom | And is configured/installed by the operator or packager | 17:58 |
KeithMnemonic | ok so maybe this guy read your note wrong http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2016-05-26.log.html | 17:58 |
KeithMnemonic | johnsom: Just a second, I will send you the settings I think you need to increase | 17:58 |
KeithMnemonic | johnsom: kevo These two in octavia.conf | 17:58 |
KeithMnemonic | johnsom: # rest_request_conn_timeout = 10 | 17:58 |
KeithMnemonic | johnsom: # rest_request_read_timeout = 60 | 17:58 |
KeithMnemonic | kevo: Johnson, I'll try that out and I'll let you know. Thanks | 17:58 |
KeithMnemonic | kevo: thanks johnsom your suggestion worked. | 17:58 |
johnsom | # ls /etc/octavia/ | 17:58 |
johnsom | amphora-agent.conf certs | 17:58 |
johnsom | root@amphora-6bf09249-2ba9-4ce3-9572-83e61dcf5e21:/usr/lib/systemd/system# | 17:58 |
KeithMnemonic | i see an octavia.conf but it is all commented out | 17:59 |
KeithMnemonic | do you recall that conversation | 17:59 |
johnsom | Right, those settings are only valid in the octavia.conf (they are controller settings). When they are commented out, they are using the default values, which is the number in the comment. | 18:00 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:00 |
KeithMnemonic | ok the guy who pinged me thought they should end up on the amphorae | 18:00 |
johnsom | So, this: # rest_request_conn_timeout = 10 | 18:00 |
johnsom | Means rest_request_conn_timeout is using the coded default, which is 10. | 18:01 |
johnsom | No, that is the timeout for the controller talking to the amphora-agent. It doesn't get set or used in the amp | 18:01 |
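(Editor's note: the mechanics behind "commented out means default": the defaults live in code via oslo.config, and the sample octavia.conf merely documents them. A minimal sketch using the two option names quoted above; the [haproxy_amphora] group name is an assumption.)

```python
from oslo_config import cfg

# With "# rest_request_conn_timeout = 10" left commented in octavia.conf,
# there is no override, so these coded defaults are what actually runs.
opts = [
    cfg.IntOpt('rest_request_conn_timeout', default=10,
               help='Connect timeout for controller -> amphora-agent calls'),
    cfg.IntOpt('rest_request_read_timeout', default=60,
               help='Read timeout for controller -> amphora-agent calls'),
]
cfg.CONF.register_opts(opts, group='haproxy_amphora')
print(cfg.CONF.haproxy_amphora.rest_request_conn_timeout)  # -> 10
```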
KeithMnemonic | yes he misunderstood | 18:01 |
johnsom | Two years ago??? ha | 18:02 |
KeithMnemonic | his issue is the vip is not plugging fast enough and he saw that old thread and thought it was the same issue | 18:02 |
johnsom | Lucky if I remember what people asked me last week | 18:02 |
johnsom | Hmm, so nova isn't booting the VM fast enough? | 18:02 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:02 |
johnsom | KeithMnemonic That setting is connection_max_retries and connection_retry_interval | 18:03 |
johnsom | But the default for that is like 25 minutes. If he can't boot an amp in that, he might as well go home..... | 18:04 |
johnsom | Could be that his lb-mgmt-net isn't working | 18:04 |
johnsom | Normal time for that action is less than 30 seconds | 18:04 |
johnsom | We have it at 25 minutes for virtualbox users and super slow gate test hosts | 18:05 |
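(Editor's note: the arithmetic behind that "25 minutes", using the defaults johnsom quotes later in the log: 300 retries, 5 seconds apart.)

```python
connection_max_retries = 300    # default number of attempts
connection_retry_interval = 5   # default seconds between attempts
total = connection_max_retries * connection_retry_interval
print(total, 'seconds =', total / 60, 'minutes')  # 1500 seconds = 25.0 minutes
```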
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Gate on octavia-dsvm-base based jobs and housekeeping https://review.openstack.org/587442 | 18:05 |
* cgoncalves needs more coffee... | 18:05 | |
johnsom | Production deploys usually drop that down, but like I said, it's normal to just boot in less than 30 seconds | 18:05 |
johnsom | KeithMnemonic I would check that the lb-mgmt-net is even working. Most likely that is the problem | 18:06 |
KeithMnemonic | yeah for sure. i can check the plumbing | 18:06 |
KeithMnemonic | the log showed Error code 400: | 18:07 |
KeithMnemonic | JSON Response: | 18:07 |
KeithMnemonic | { | 18:07 |
KeithMnemonic | 'message': 'Invalid VIP', | 18:07 |
KeithMnemonic | } | 18:07 |
KeithMnemonic | I found a match for this error: | 18:07 |
KeithMnemonic | and that led him to that old thread listed above | 18:07 |
KeithMnemonic | 2018-07-03 11:44:44.977 1711 INFO werkzeug [-] 10.207.206.13 - - [03/Jul/2018 11:44:44] "POST /0.5/plug/vip/10.207.221.55 HTTP/1.1" 404 – | 18:09 |
colin- | when you guys create an LB and the provisioning state goes to ERROR, what's the first place you look for additional info? i'm not super familiar with the API at the moment and my instinct is to just check the worker's process output but i'm wondering if there's better info than that available | 18:09 |
johnsom | KeithMnemonic 400 means user error. Like the VIP address doesn't match the subnet/network specified, or something like that. | 18:16 |
cgoncalves | colin-, that is how I do it, too :/ | 18:17 |
johnsom | colin- Yes, provisioning_status ERROR means the controller ran into a problem and all of the retries/workarounds have failed and the controller needs to stop and ask for operator intervention. Like if nova goes down in the middle of booting an amphora, or neutron fails to create a port and the retries time out. The first stop is going to be the controller logs | 18:17 |
cgoncalves | today I added "Debugging - End-user friendly ERROR messages" as proposed topic for the PTG | 18:18 |
johnsom | In that state the end user has the option of escalating to the operator or deleting the object in ERROR and trying again | 18:18 |
*** openstackgerrit has quit IRC | 18:19 | |
johnsom | cgoncalves bring the popcorn. This is a fun topic. Many operators want to hide the true reason for the failures from the users... SLA contracts and such.... | 18:19 |
colin- | understood, thanks | 18:19 |
johnsom | They don't like octavia objects to say "Nova compute has been down for 8 hours, unable to create load balancer" Which is exactly what it would say for a recent issue I saw... lol | 18:20 |
cgoncalves | johnsom, I know. i was thinking something generic yet a bit more useful, like pointing fingers at the compute or network service | 18:20 |
johnsom | cgoncalves It is a good topic to discuss though, so please feel free to add it to the etherpad | 18:32 |
johnsom | https://etherpad.openstack.org/p/octavia-stein-ptg | 18:33 |
cgoncalves | I already did :) | 18:33 |
johnsom | I see that. Awesome | 18:33 |
*** openstackgerrit has joined #openstack-lbaas | 18:34 | |
openstackgerrit | German Eichberger proposed openstack/octavia master: Allows failover if port is not deallocated by nova https://review.openstack.org/585864 | 18:34 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 18:36 |
*** ktibi has quit IRC | 19:04 | |
*** abaindur has joined #openstack-lbaas | 19:16 | |
*** salmankhan has joined #openstack-lbaas | 19:19 | |
*** salmankhan has quit IRC | 19:24 | |
*** abaindur has quit IRC | 19:28 | |
*** amuller has quit IRC | 19:52 | |
*** dmellado has joined #openstack-lbaas | 19:56 | |
rm_work | jiteka: ask questions here :P | 20:09 |
jiteka | I'm too shy | 20:15 |
jiteka | :D | 20:15 |
jiteka | actually I already bothered johnsom a couple of times this week | 20:15 |
jiteka | it was your turn | 20:15 |
xgerman_ | lol | 20:16 |
rm_work | lol | 20:17 |
*** harlowja has joined #openstack-lbaas | 20:21 | |
johnsom | jiteka No need to be shy, we are all friendly here | 20:27 |
jiteka | I know johnsom was joking :) | 20:29 |
johnsom | Funny how we all have the same advice... grin | 20:37 |
johnsom | Nice, our upgrade tags have shown up: https://pypi.org/project/octavia/ | 20:47 |
cgoncalves | I don't get why so often our centos job fails to build the amp image with "Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again" | 20:48 |
cgoncalves | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_39_672377 | 20:48 |
cgoncalves | this is a successful run: http://logs.openstack.org/55/587255/1/check/octavia-v1-dsvm-scenario-kvm-centos.7/f145430/logs/devstacklog.txt.gz#_2018-07-31_00_16_20_969 | 20:50 |
johnsom | looking | 20:50 |
cgoncalves | even though there was that metadata 404 | 20:50 |
johnsom | Well, even on that one "updateinfo.xml.bz2: [Errno 14] HTTP Error 404 - Not Found" doesn't seem good | 20:51 |
johnsom | So, I could be lazy and say the epel mirrors are trash, but I'm not, so give me a few minutes to look at some things | 20:52 |
cgoncalves | johnsom, I asked on #openstack-infra but no luck. perhaps you could work your magic :D | 20:52 |
johnsom | lol | 20:53 |
johnsom | Maybe they don't get into the RedHat parties either | 20:53 |
cgoncalves | lol | 20:53 |
johnsom | Hmm, looks like that is occurring in the base elements from DIB too and not one of ours? I can't imagine you got an include of iscsi by me | 20:55 |
cgoncalves | diskimage-builder/diskimage_builder/elements/base/install.d/00-baseline-environment | 20:56 |
cgoncalves | install-packages -m base iscsi_package | 20:56 |
xgerman_ | with the next PTG around the corner -- we had better be on that bus ;-) | 21:01 |
johnsom | cgoncalves My initial guess is this element step is not working: 01-set-centos-mirror | 21:02 |
johnsom | But still looking | 21:02 |
cgoncalves | johnsom, I doubt that because that's for centos repos. epel is managed separately | 21:03 |
johnsom | That it is running out to the interwebs and not using the OpenStack infra mirrors. But it could be that epel isn't mirrored as well | 21:03 |
johnsom | Well, that *might* be the issue.... See what I am saying | 21:03 |
johnsom | cgoncalves Yep, ok, got it | 21:04 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_38_556548 | 21:04 |
johnsom | This is the execution log of the DIB phase. The # prefixes are the ordering | 21:05 |
johnsom | Oh, wait, nevermind, looking in the wrong place. | 21:05 |
johnsom | 05-rpm-epel-release This one might not be working.... | 21:05 |
johnsom | lol | 21:05 |
johnsom | https://github.com/openstack/diskimage-builder/tree/master/diskimage_builder/elements/epel | 21:06 |
cgoncalves | from what I've seen, it is working. the epel-release package gets installed | 21:06 |
cgoncalves | mirror is not overwritten, which is expected | 21:07 |
cgoncalves | it's really intermittent. see http://logs.openstack.org/06/586906/2/check/octavia-v1-dsvm-scenario-kvm-centos.7/b60cf4b/logs/devstacklog.txt.gz#_2018-08-01_12_12_02_645. no errors or warnings whatsoever | 21:08 |
johnsom | Yeah, it looks like the "cache data" for fastestmirror is getting trashed? | 21:11 |
johnsom | Hmm, so your failed job ran at OVH, the success ran at inap. Is there a correlation among the failed jobs? | 21:13 |
johnsom | You can look in zuul-info/zuul-info.controller.txt for the provider | 21:13 |
johnsom | It could be one provider has a problem with the mirror | 21:14 |
cgoncalves | I'd say very likely | 21:14 |
johnsom | This still makes me wonder: | 21:15 |
johnsom | http://logs.openstack.org/06/586906/2/check/octavia-v1-dsvm-scenario-kvm-centos.7/b60cf4b/logs/devstacklog.txt.gz#_2018-08-01_12_12_02_898 | 21:15 |
johnsom | So the others seem to be mirrors at the provider, this one (success) went out to the internets | 21:15 |
johnsom | osuosl.org (which happens to be here in town) | 21:15 |
johnsom | Go beavs | 21:15 |
johnsom | mnaser Quick question, are you aware of any recent issues with the epel mirror at OVH? | 21:17 |
mnaser | johnsom: not that i know of | 21:17 |
johnsom | Ok, no smoking gun here yet, but thought I would give a quick ping to check | 21:17 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_39_672377 for those playing along | 21:18 |
cgoncalves | http://logs.openstack.org/56/584856/3/check/octavia-v1-dsvm-scenario-kvm-centos.7/1766905/zuul-info/inventory.yaml okay with rax | 21:21 |
johnsom | Yeah, inap and rax seem to pass. | 21:21 |
johnsom | cgoncalves Is this centos 7? | 21:21 |
cgoncalves | johnsom, yes. controller and amp | 21:22 |
cgoncalves | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/b7ce6d4/zuul-info/inventory.yaml this is inap and failed | 21:22 |
johnsom | Ok, so likely not a local mirror issue | 21:22 |
johnsom | cgoncalves these are running on different nodepool images. Your failed run is on a centos image while the pass is on ubuntu | 21:24 |
cgoncalves | hmm ok, looking now at nodeset level... | 21:25 |
cgoncalves | why can't you just pull your strings with infra people. would be much easier xD | 21:25 |
johnsom | Must save silver bullets for things specific and clear..... | 21:26 |
johnsom | So that path I was going down is looking at the ca-certificates files as that plays a role here. That is when I noticed the base image is different | 21:27 |
johnsom | Yeah, all four samples I have line up, so it looks like you must build centos images on ubuntu hosts... lol | 21:28 |
xgerman_ | yep, that’s how it was designed | 21:28 |
xgerman_ | cgoncalves: did you think you could build centos on centos? | 21:29 |
johnsom | I would look at clock skew on the centos hosts (i.e. is it getting a good time so the ssl can negotiate?), the packages like yum ca-certificates, etc. | 21:29 |
johnsom | xgerman_ I love the wording of this MicroFocus proxy vote letter (from hp/hpe stock you might have had): "To approve the disposal by the Company of the SUSE business segment ..." | 21:32 |
johnsom | cgoncalves I need to get back to UDP stuff. Noodle on that a bit. If you are still stuck ping your colleague ianw in #openstack-dib | 21:36 |
cgoncalves | johnsom, ok. thank you for your time! | 21:36 |
johnsom | If you are still stuck next week ping me again on it | 21:37 |
xgerman_ | Yeah, only kept the HP printer Corp. - the rest seemed too risky :-) | 21:38 |
*** pcaruana has quit IRC | 21:38 | |
johnsom | Yeah, I must have like one share floating around somewhere | 21:38 |
jiteka | Could someone confirm which parameter I need to change to increase the timeout on amphora build | 21:39 |
johnsom | Wow, really? 25 minutes isn't enough? I think that is the default | 21:39 |
jiteka | I don't have enough time to troubleshoot why the controller can't reach my VM | 21:39 |
jiteka | octavia-worker[18322]: WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.79.80.30', port=9443): Max retries exceeded with url: /0.5/plug/vip/10.63.69.0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc56b4edbd0>: Failed to establish a new connection: [Errno 111] Connection refused',)) | 21:39 |
jiteka | the VM go away in something like 1 or 2 min | 21:40 |
johnsom | jiteka connection_max_retries (default 300) and connection_retry_interval (default 5) are the two timeouts there while we wait for nova to boot the instance. | 21:41 |
johnsom | Though this also could mean your lb-mgmt-net is not working. Booting up a cirros there and checking that it is reachable/got an IP can help there | 21:41 |
johnsom | Hmmm refused, could be a firewall or some kind of customized image that is broken. | 21:42 |
jiteka | what's the difference between : | 21:43 |
jiteka | - ConnectTimeoutError | 21:43 |
jiteka | - NewConnectionError | 21:43 |
johnsom | Normally we handle the security groups, so that should not be an issue. | 21:43 |
jiteka | I noticed that it throws a few timeouts before throwing the error | 21:43 |
johnsom | Hmm, neither of those are from our code. They are things we are catching. Let me search. Are they both in that same warning message? | 21:48 |
johnsom | jiteka They are both URLLIB3 exceptions: http://urllib3.readthedocs.io/en/latest/reference/index.html#module-urllib3.exceptions | 21:49 |
johnsom | NewConnectionError appears to mean something is actively rejecting the connection, where ConnectTimeoutError is no response at all | 21:50 |
johnsom | You would see ComputeWaitTimeoutException if octavia actually gives up trying to connect | 21:51 |
KeithMnemonic | johnsom thanks again have a great weekend (need to make up for my earlier abruptness) | 21:53 |
johnsom | Or a pure "TimeOutException" with an error log entry "Connection retries (currently set to %(max_retries)s) exhausted. The amphora is unavailable." | 21:53 |
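(Editor's note: a toy reproduction of that distinction with plain urllib3, retries disabled so the raw exceptions surface. In urllib3, NewConnectionError subclasses ConnectTimeoutError, so it must be caught first; the host/port are jiteka's from the log and purely illustrative.)

```python
import urllib3
from urllib3.exceptions import ConnectTimeoutError, NewConnectionError

http = urllib3.PoolManager(cert_reqs='CERT_NONE')  # amp agent certs are self-signed
try:
    http.request('GET', 'https://10.79.80.30:9443/',
                 timeout=urllib3.Timeout(connect=5), retries=False)
except NewConnectionError:
    # Something actively refused the connection (agent not listening yet,
    # firewall RST, ...): the "[Errno 111] Connection refused" case above.
    print('connection refused')
except ConnectTimeoutError:
    # Nothing answered at all within the connect timeout.
    print('connect timed out')
```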
johnsom | KeithMnemonic o/ | 21:53 |
*** KeithMnemonic has quit IRC | 21:53 | |
rm_work | hmmm i feel like i JUST saw someone else posting about that centos build error somewhere else... | 21:57 |
johnsom | rm_work BTW, I hacked a stack with your bbq client patch (fed=False) and it doesn't appear to solve the problem. | 21:59 |
rm_work | hmmmmm | 21:59 |
johnsom | At least when using Octavia | 22:00 |
johnsom | I didn't try the CLI test | 22:00 |
rm_work | it still tries to hit the href passed in? | 22:00 |
johnsom | It hits the public URL in keystone. I don't know if it's getting that from keystone, the hardcoded one in the config file, or the href | 22:00 |
rm_work | hmmmm | 22:01 |
rm_work | yeah can you just put a print statement in to show what URL it's passing in | 22:01 |
johnsom | Sadly I don't have direct access to that stack, so my debugging is limited there. | 22:01 |
rm_work | in the secrets class (right after I do the if-statement to generate the new URL) | 22:01 |
rm_work | ah | 22:01 |
rm_work | not devstack? | 22:01 |
johnsom | No, it's an actual cloud that someone else controls | 22:02 |
johnsom | Thus why the internal and public URLs are different and we found this issue. | 22:02 |
rm_work | yeah but | 22:03 |
johnsom | Maybe next week I can setup that CLI test on devstack again. | 22:03 |
rm_work | hmmm | 22:03 |
rm_work | do you know it was done correctly then? | 22:03 |
johnsom | I'm just arms deep in a amphora-agent refactor for UDP | 22:03 |
rm_work | like, the patch was actually installed in the right place | 22:03 |
johnsom | Yeah, I watched and instructed as they installed. 85% confident | 22:03 |
rm_work | hmmmmmmmm | 22:03 |
rm_work | I can add in some debugging if you can have them try it again | 22:04 |
rm_work | then could at least verify it is running that code | 22:04 |
rm_work | there are some changes i wanted to make anyway | 22:04 |
johnsom | Yeah, maybe next week. I can also just setup a devstack and change the internal URL to broken and run my test steps in the story | 22:04 |
rm_work | yes | 22:04 |
johnsom | I just don't have the VMs for that right now | 22:04 |
rm_work | i would like to see that :P | 22:04 |
rm_work | ah yeah | 22:04 |
rm_work | cloud + bit.do/devstack? :P | 22:05 |
rm_work | i always just used RAX VMs | 22:05 |
johnsom | I don't, for the reasons you are aware of.... | 22:05 |
johnsom | lol | 22:05 |
rm_work | i mean | 22:05 |
rm_work | it's better than NO VMs | 22:05 |
rm_work | usually | 22:05 |
rm_work | but, let me see if i can spin a stack | 22:06 |
rm_work | i haven't tried in a while | 22:06 |
rm_work | do you stack on Bionic now? | 22:07 |
rm_work | i should prolly just make a clean thing | 22:07 |
johnsom | No, I haven't switched yet | 22:07 |
rm_work | hmmm | 22:08 |
johnsom | I was nervous about the major networking changes, but it seems the compatibility stuff works well enough that our amps run. | 22:08 |
rm_work | but it should be safe, right? | 22:08 |
rm_work | hmm k | 22:08 |
openstackgerrit | Adam Harwell proposed openstack/octavia master: Add usage admin resource https://review.openstack.org/557548 | 22:11 |
*** hongbin has quit IRC | 22:19 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 22:35 |
cgoncalves | got it! https://review.openstack.org/#/c/588676/ | 23:13 |
johnsom | Nice, so it was a ca-certificates issue. Probably just not installed soon enough. | 23:19 |
cgoncalves | not installed at all | 23:22 |
cgoncalves | epel is installed on the host but is http:// | 23:22 |
johnsom | Yeah, it is eventually, it was in rpm list | 23:22 |
cgoncalves | all repos I've come across are http:// in fact | 23:22 |
johnsom | http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/controller/logs/rpm-qa.txt.gz | 23:24 |
johnsom | I checked that, but it must be coming in too late | 23:24 |
cgoncalves | you, sir, always amuse me with such useful logs | 23:24 |
cgoncalves | so, the mystery of octavia-tempest-plugin is still unsolved. the perms seem to be right, though | 23:25 |
* johnsom spends way too much time looking at logs for people..... | 23:25 | |
cgoncalves | guilty xD | 23:25 |
johnsom | Hmmm, have a link for the plugin issue? | 23:26 |
johnsom | Where you are dumping the perms? | 23:26 |
cgoncalves | post-run. 2 secs | 23:26 |
johnsom | My little patch for the netns service is getting bigger. I found that some of the unit tests are actually functional tests and need to be moved.... | 23:26 |
cgoncalves | johnsom, http://logs.openstack.org/14/587414/5/check/octavia-v2-dsvm-scenario-centos.7/18de25c/job-output.txt.gz#_2018-08-03_20_28_52_632600 | 23:31 |
cgoncalves | run with ca-certificate fixed: http://logs.openstack.org/14/587414/6/check/octavia-v2-dsvm-scenario-centos.7/cf0e780/job-output.txt.gz#_2018-08-03_23_24_59_017149 | 23:34 |
johnsom | So right at the top of that, it looks like only zuul can access that path. Where is that original failure? | 23:34 |
johnsom | Ah, ok | 23:34 |
rm_work | woo my devstack worked | 23:46 |
rm_work | ok time to test this patch thing | 23:47 |
rm_work | so you recommended ... setting the config for "internal" in octavia.conf for barbican | 23:47 |
rm_work | and then setting internal to something invalid | 23:47 |
johnsom | Yeah | 23:47 |
rm_work | and seeing if it still succeeds? | 23:47 |
rm_work | ... prolly i'll just do tons of debug logging | 23:47 |
johnsom | certificates section endpoint_type | 23:47 |
johnsom | Yeah | 23:48 |
johnsom | Or just use my CLI test in the story | 23:48 |
rm_work | oh, right | 23:48 |
rm_work | well | 23:48 |
rm_work | CLI doesn't have a flag... | 23:48 |
*** harlowja has quit IRC | 23:48 | |
rm_work | i'd have to default it to False | 23:48 |
rm_work | which i can do, so :) | 23:49 |
johnsom | cgoncalves Try overriding the path to the tempest plugin to be /opt/stack/octavia-tempest-plugin | 23:50 |
johnsom | I set it to /home/zuul/.... here: | 23:50 |
johnsom | vars: | 23:51 |
johnsom | devstack_localrc: | 23:51 |
johnsom | TEMPEST_PLUGINS: "'{{ ansible_user_dir }}/src/git.openstack.org/openstack/octavia-tempest-plugin'" | 23:51 |
rm_work | oh ummm | 23:51 |
rm_work | johnsom: any chance they're still using the old-style Containers? | 23:51 |
rm_work | instead of one PKCS12 secret? | 23:51 |
johnsom | No, it is pkcs12 | 23:51 |
rm_work | because... i didn't do it for Containers yet in that patch... | 23:51 |
rm_work | hmm ok | 23:51 |
rm_work | really thought i had it there for a sec :P | 23:51 |
johnsom | cgoncalves Yeah, my money is on the /home/zuul directory permissions being different on that centos nodepool instance. /opt/stack/octavia-tempest-plugin should fix you right up. Probably should just do that in the parent job for all of them | 23:53 |
cgoncalves | 2018-08-03 23:25:28.290794 | controller | /home/zuul: | 23:56 |
cgoncalves | 2018-08-03 23:25:28.290953 | controller | total 52 | 23:56 |
cgoncalves | 2018-08-03 23:25:28.291080 | controller | drwx------. 7 zuul zuul 4096 Aug 3 22:43 . | 23:56 |
johnsom | Yeah, I think the user at that point has switched to "stack" via devstack | 23:57 |
cgoncalves | isn't the ansible user zuul? if so it should have no probs in reading /home/zuul | 23:57 |
rm_work | hmmmm johnsom this seems to be working in my client, one sec | 23:57 |
cgoncalves | ah, right | 23:57 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: WIP: Gate on CentOS 7 and check on Ubuntu Bionic https://review.openstack.org/587414 | 23:59 |
cgoncalves | thanks johnsom | 23:59 |
cgoncalves | off I go | 23:59 |
johnsom | o/ | 23:59 |