*** gthiemonge has quit IRC | 01:30 | |
*** gthiemon1e has joined #openstack-lbaas | 01:31 | |
*** yamamoto has quit IRC | 01:48 | |
*** openstackgerrit has quit IRC | 02:04 | |
*** armax has joined #openstack-lbaas | 02:10 | |
*** yamamoto has joined #openstack-lbaas | 02:57 | |
*** psachin has joined #openstack-lbaas | 03:35 | |
*** ramishra has joined #openstack-lbaas | 03:46 | |
*** gcheresh has joined #openstack-lbaas | 06:41 | |
*** tkajinam has quit IRC | 07:02 | |
*** tkajinam has joined #openstack-lbaas | 07:04 | |
*** gthiemon1e is now known as gthiemonge | 07:16 | |
*** luksky has joined #openstack-lbaas | 07:29 | |
*** tkajinam_ has joined #openstack-lbaas | 07:52 | |
*** tkajinam has quit IRC | 07:55 | |
*** tesseract has joined #openstack-lbaas | 08:13 | |
*** tkajinam_ has quit IRC | 08:18 | |
*** rpittau|afk is now known as rpittau | 08:18 | |
*** pcaruana has joined #openstack-lbaas | 08:25 | |
*** openstackgerrit has joined #openstack-lbaas | 08:26 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 08:26 |
*** AlexStaf has joined #openstack-lbaas | 08:27 | |
*** ccamposr has joined #openstack-lbaas | 08:30 | |
*** vesper11 has quit IRC | 09:01 | |
*** vesper11 has joined #openstack-lbaas | 09:05 | |
*** yamamoto has quit IRC | 09:13 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 09:18 |
*** etp has quit IRC | 10:58 | |
*** etp has joined #openstack-lbaas | 11:00 | |
*** rpittau is now known as rpittau|bbl | 11:21 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 11:24 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Testing https://review.opendev.org/697213 | 11:30 |
*** luksky has quit IRC | 11:55 | |
*** yamamoto has joined #openstack-lbaas | 12:04 | |
*** yamamoto has quit IRC | 12:09 | |
*** yamamoto_ has joined #openstack-lbaas | 12:09 | |
TMM | Does anyone happen to know if there's any version of the octavia dashboard that supports the new allowed-cidr options from Train? | 12:22 |
TMM | I upgraded horizon to train but it appears that there's no such option in horizon at least | 12:22 |
cgoncalves | TMM, allowed-cidr option has not been added to the dashboard yet | 12:23 |
TMM | OK, thanks for the confirmation, I'm not losing my mind :) | 12:23 |
*** rpittau|bbl is now known as rpittau | 12:55 | |
*** ramishra has quit IRC | 13:20 | |
*** ramishra has joined #openstack-lbaas | 13:21 | |
*** ramishra has quit IRC | 13:21 | |
*** ramishra has joined #openstack-lbaas | 13:21 | |
*** psachin has quit IRC | 13:29 | |
*** yamamoto_ has quit IRC | 13:56 | |
*** yamamoto has joined #openstack-lbaas | 13:58 | |
TMM | Is there a way to tell octavia to retry some operations on an lbaas? I made an error updating octavia and now all my load balancers are either in PENDING or ERROR state :P (forgot to set the rabbit topic name) | 14:01 |
TMM | They are all still working fine, I still need to update their amphoras | 14:01 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Support haproxy development snapshot version parsing https://review.opendev.org/701823 | 14:07 |
*** haleyb has joined #openstack-lbaas | 14:14 | |
*** luksky has joined #openstack-lbaas | 14:16 | |
johnsom | TMM that scenario might be tricky. For those in ERROR, you can use the failover API. For those in PENDING_*, it is likely the workers never got that message, so you need to set them to ERROR in the DB, then fail those over. | 14:19 |
TMM | johnsom: the openstack loadbalancer failover, or the amphora failover api? | 14:20 |
johnsom | Load balancer failover | 14:20 |
TMM | ok, thanks | 14:21 |
TMM | I appreciate it! :D | 14:21 |
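[Editor's note] The failover johnsom recommends here is exposed by the Octavia v2 API as `PUT /v2.0/lbaas/loadbalancers/{loadbalancer_id}/failover` (the same thing `openstack loadbalancer failover` calls). A minimal sketch of triggering it directly; the endpoint value and token handling are simplified assumptions, not production code:

```python
import urllib.request

# Assumed endpoint; in a real deployment, take this from the Keystone service catalog.
OCTAVIA_ENDPOINT = "http://controller:9876"


def failover_url(endpoint, lb_id):
    """Build the Octavia v2 load balancer failover URL."""
    return "%s/v2.0/lbaas/loadbalancers/%s/failover" % (endpoint.rstrip("/"), lb_id)


def failover_load_balancer(token, lb_id):
    """Trigger a failover with an empty-body PUT; Octavia answers 202 Accepted."""
    req = urllib.request.Request(failover_url(OCTAVIA_ENDPOINT, lb_id),
                                 method="PUT",
                                 headers={"X-Auth-Token": token})
    return urllib.request.urlopen(req)


print(failover_url(OCTAVIA_ENDPOINT, "lb-0000"))
```

As discussed below, this only helps load balancers already in ERROR; ones stuck in PENDING_* first need their status fixed in the database.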
TMM | hmm, after the failover the amphoras went to 'standalone' | 14:22 |
johnsom | It should do that temporarily during the failover process | 14:22 |
TMM | Ahh, ok | 14:22 |
johnsom | It builds them as standalone, then will update them to their proper role. | 14:23 |
TMM | clever :) I don't think it used to do that | 14:23 |
johnsom | A load balancer failover sequences the amphora replacements so that it minimizes downtime, etc. | 14:23 |
johnsom | There are also additional improvements to failover coming. I am working on that right now. | 14:24 |
TMM | hmm, I now have some amphora with a non-matching ssl cert? (Caused by SSLError(CertificateError("hostname u'fe56fc4a-71cb-416a-a4c5-bf8892d81879' doesn't match '6cf80b52-c839-4d05-a777-e72a1530e126'") I wonder how I managed to do this | 14:25 |
TMM | @johnsom Thank you for your work! I generally really like octavia! | 14:25 |
johnsom | Excellent, glad to hear it. | 14:25 |
johnsom | That is very odd, but maybe the rabbit issue impacted nova or neutron too? | 14:26 |
TMM | Hmm, maybe, but I only touched octavia | 14:27 |
TMM | and it was just a new setting in oslo configs that I didn't set | 14:27 |
johnsom | This is a funny one: https://storyboard.openstack.org/#!/story/2007218 | 14:34 |
johnsom | My guess is it is the standard openstackclient behavior, but I will take a look | 14:35 |
cgoncalves | https://docs.python.org/3/library/argparse.html#choices | 14:36 |
gthiemonge | is 65535 a reserved port? it's not in the choice list | 14:37 |
gthiemonge | choices=range(1, 65535) doesn't look good | 14:39 |
johnsom | Yep. I bet we can do better.... Interestingly enough, we don't validate the listener port #, just pass it through to the API to validate. | 14:41 |
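[Editor's note] The bug being discussed: `choices=range(1, 65535)` both excludes port 65535 (Python's `range` stops before its end value) and makes argparse enumerate every choice in its usage/error output. A sketch of the kind of fix described later in the log, using a custom type function with a concise API-style error message; the function name is illustrative, not the actual patch:

```python
import argparse


def port_number(value):
    """Validate a TCP/UDP port without argparse enumerating every choice."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(
            "Value: '%s'. Value must be between 1 and 65535." % value)
    return port


parser = argparse.ArgumentParser()
# Buggy variant: excludes 65535 and dumps ~65k choices into the usage message:
#   parser.add_argument('--protocol-port', type=int, choices=range(1, 65535))
parser.add_argument('--protocol-port', type=port_number)

args = parser.parse_args(['--protocol-port', '65535'])
print(args.protocol_port)
```

With a `type` callable, an out-of-range value produces a single short error line instead of the full choice list.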
TMM | Hmm, I appear to have at least 1 loadbalancer where the amphora is in MASTER role, but there is no slave amphora for it at all | 14:47 |
johnsom | Yeah, that can happen under strange circumstances. This is one of the things I am currently fixing. | 14:47 |
TMM | anything I can do about that now? :) | 14:48 |
johnsom | So, the short answer is, there is not an easy way to fix this. If it's not a critical LB, delete it and recreate it. If it's critical, there is likely a number of steps required to fix it enough that a failover will complete. | 14:49 |
TMM | well, I ran a 'loadbalancer failover' on a bunch of my loadbalancers and most of them are now in this state it seems | 14:50 |
TMM | would an amphora failover make octavia notice that there's some amphoras missing? | 14:51 |
johnsom | Are they back to Active or Error state, or still Pending? | 14:51 |
TMM | Everything is in 'active' state | 14:52 |
TMM | there's just amphoras missing, but nothing seems to be too concerned about this | 14:52 |
*** TrevorV has joined #openstack-lbaas | 14:53 | |
TMM | yeah, so right now all my loadbalancers are in provisioning_status ACTIVE, and ONLINE | 14:55 |
TMM | All my amphoras are ALLOCATED, but there's just a bunch of backups just kind of not there | 14:55 |
TMM | (this is octavia from Train btw) | 14:56 |
johnsom | Yeah, it's a bug where if the database records for the amphora somehow got removed, it doesn't notice there is one missing. This is the patch I am working on now. I have it working in my lab, but I still have work to do before I can publish it. The original authors assumed that scenario would never happen. | 14:57 |
TMM | I don't think I deleted any octavia db records | 14:57 |
johnsom | Yes, but the failover might have. | 14:58 |
TMM | ah, ok | 14:58 |
TMM | Do I have to manually undelete the amphora db record? | 14:58 |
johnsom | If you can't just delete/rebuild, you will need to, yes. | 14:59 |
TMM | I can't really delete them no | 14:59 |
johnsom | I think rm_work has a procedure for recreating those records, but I'm not sure if he is online at the moment. | 15:01 |
johnsom | If not, I can probably walk through it, but it might be a bit of a process... | 15:03 |
TMM | it's still in the database as DELETED | 15:03 |
TMM | so I just set it back to active, I'll try to do a failover now | 15:04 |
johnsom | Ah, that is good. Or try ERROR | 15:04 |
TMM | alright, I put the missing ones in error | 15:07 |
TMM | I'll do another failover, see if it'll fix them | 15:07 |
*** gcheresh has quit IRC | 15:09 | |
TMM | ok, yeah, so setting the deleted db records to 'ERROR' and running the lb failover twice seems to have fixed it | 15:19 |
TMM | the first time the lb itself went into ERROR mode, as octavia desperately tried to contact the non-existent amphora | 15:20 |
TMM | the second time it recovered the state | 15:20 |
johnsom | Oh good. | 15:21 |
TMM | Not sure if that was expected? :) But it worked for me at least | 15:21 |
johnsom | Yes, with the amp in error it should handle it better. So, keep an eye out for future bug fix releases that will include a much improved failover capability. | 15:23 |
TMM | Awesome, thank you for your help. I really appreciate it. | 15:24 |
haleyb | johnsom: you want me to fix the port validation bug? | 16:09 |
johnsom | haleyb Almost done | 16:09 |
haleyb | johnsom: i'm already done with it :) | 16:11 |
haleyb | it's a race right? | 16:12 |
johnsom | Ha, well, I took assignment of the bug. But we can both post and see who did a better patch.... | 16:12 |
* johnsom throws the gauntlet | 16:12 | |
* cgoncalves is open to bribes | 16:15 | |
openstackgerrit | Brian Haley proposed openstack/python-octaviaclient master: Do not print large usage message for port or weight https://review.opendev.org/704348 | 16:19 |
haleyb | untested though | 16:19 |
*** ccamposr has quit IRC | 16:34 | |
haleyb | johnsom: sigh, that doesn't exactly work ^^^ | 16:38 |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 16:47 |
johnsom | haleyb ^^^^ This works.... (I still need to finish the cleanup/tests) | 16:47 |
* haleyb shakes fist | 16:48 | |
haleyb | johnsom: it would have helped if my devstack had octavia running, part of the problem was a 500 error | 16:49 |
johnsom | Invalid input for field/attribute 'protocol-port'. Value: '65536'. Value must be between 1 and 65535. | 16:51 |
johnsom | I made the error similar to the API error message | 16:51 |
*** AlexStaf has quit IRC | 16:51 | |
haleyb | johnsom: the only thing you forgot is tests :-p | 16:52 |
johnsom | Yep, still working on those | 16:53 |
*** mithilarun has joined #openstack-lbaas | 16:56 | |
*** yamamoto has quit IRC | 17:06 | |
*** mithilarun has quit IRC | 17:06 | |
*** gregwork has joined #openstack-lbaas | 17:15 | |
rm_work | TMM / johnsom: yep, resurrecting old amp records into ERROR state is the easiest way (though I just do an Amphora failover on that specific ID, not a LB failover) -- the hard way is copying the INSERT from the MASTER amp, and just changing all of the amp/compute/port ID fields to junk uuids so it will just see nothing there | 17:16 |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 17:16 |
rm_work | which is only necessary if you literally have no other records to work with | 17:17 |
rm_work | (like I did at the time I dealt with most of that) | 17:17 |
TMM | still waiting on the last lb to recover | 17:17 |
TMM | it's taking so friggin long to timeout on the non-existent amps | 17:17 |
rm_work | yeah it's safer/easier to do individual amp failovers | 17:17 |
rm_work | then you don't run into that | 17:17 |
TMM | ah | 17:17 |
TMM | well, now I know | 17:18 |
rm_work | and if you want to be extra sure, do the LB failover once the Amp failover succeeds and you have two active amps | 17:18 |
rm_work | less downtime that way too | 17:18 |
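[Editor's note] The recovery path described above (flip the DELETED amphora row to ERROR, then fail over that amphora) amounts to a single UPDATE on Octavia's `amphora` table. A toy sketch using an in-memory sqlite table to illustrate the state change; the real table lives in Octavia's MySQL database and has many more columns, so treat the schema here as illustrative and back up the database before touching it:

```python
import sqlite3

# Minimal stand-in for Octavia's amphora table (the real schema has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amphora (id TEXT PRIMARY KEY, load_balancer_id TEXT, "
             "status TEXT, role TEXT)")
conn.execute("INSERT INTO amphora VALUES ('amp-1', 'lb-1', 'DELETED', 'BACKUP')")

# Resurrect the deleted record into ERROR so failover will replace it.
conn.execute("UPDATE amphora SET status = 'ERROR' "
             "WHERE id = 'amp-1' AND status = 'DELETED'")

status = conn.execute("SELECT status FROM amphora "
                      "WHERE id = 'amp-1'").fetchone()[0]
print(status)  # ERROR
```

Per rm_work's advice, follow this with an amphora failover on that specific ID rather than a load balancer failover, so the worker doesn't spend its retry budget on the amphora that no longer exists.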
TMM | Yeah, this one lb has been down for like 20 minutes now | 17:18 |
TMM | Probably should've just recreated it | 17:19 |
TMM | oh well | 17:19 |
rm_work | :( | 17:19 |
rm_work | looking forward to johnsom's failover rework | 17:19 |
johnsom | If it makes you feel better, the new code won't do that | 17:19 |
TMM | computers are awful | 17:19 |
TMM | :P | 17:19 |
rm_work | ^^ yes | 17:19 |
rm_work | they do what we tell them to, it's horrible :D | 17:20 |
TMM | why is it even TRYING to contact the amp that's in ERROR mode | 17:20 |
TMM | just shoot it | 17:20 |
TMM | shoooot iiiitttt | 17:20 |
TMM | it's been 11 minutes now :P | 17:21 |
johnsom | Well, in defense of the original authors, we get differing views. Some want retries waiting for other services (nova for example) forever, others want fail fast. | 17:21 |
TMM | I just think that perhaps 11 minutes to wait on a node that's already in error state with a 'no route to host' error is maybe excessive | 17:22 |
johnsom | Yeah, the default is 25 minutes I think. That was because people were using virtualbox and some of the zuul test nodes don't have hardware virtualization. For example, one hosting provider can take up to 18 minutes to boot a VM using nova. | 17:23 |
johnsom | It's a poor default I think. production should be much lower. | 17:24 |
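[Editor's note] The waits discussed here are governed by the amphora connection retry settings in `octavia.conf`. A hedged example of the knobs involved; the values shown are illustrative choices for a production deployment, not the shipped defaults, so check your release's documented defaults before copying:

```ini
[haproxy_amphora]
# How many times, and how often (seconds), the worker retries connecting
# to an amphora. Shipped defaults are tuned for slow CI nodes; production
# deployments with fast nova boots can use far smaller retry budgets.
connection_max_retries = 30
connection_retry_interval = 5
```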
TMM | I just killed octavia-worker and set everything to error except the lb, doing amphora failovers now | 17:24 |
TMM | I Don't have another 50 minutes to wait | 17:24 |
johnsom | Yeah, be super careful killing the octavia processes. Currently that can halt other actions going on in the cloud and lead to PENDING_* states and broken LBs | 17:25 |
rm_work | yeah i was tempted to suggest that | 17:25 |
rm_work | but | 17:25 |
rm_work | yeah it's a little risky | 17:25 |
johnsom | They may even blow up in the future, not necessarily right away. | 17:25 |
TMM | Nothing really was happening at the time | 17:25 |
johnsom | There are patches in flight for that issue too | 17:26 |
TMM | at least the debug log of worker didn't seem to suggest it was doing anything except waiting on that one amp | 17:26 |
*** luksky has quit IRC | 17:27 | |
johnsom | haleyb Up for review... grin | 17:27 |
*** tesseract has quit IRC | 17:38 | |
*** yamamoto has joined #openstack-lbaas | 17:43 | |
*** yamamoto has quit IRC | 17:52 | |
*** mithilarun has joined #openstack-lbaas | 18:09 | |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 18:11 |
*** yamamoto has joined #openstack-lbaas | 18:14 | |
*** rpittau is now known as rpittau|afk | 18:18 | |
TMM | I learned that resurrecting two amp records that are both set to 'MASTER' is not a recipe for success | 18:24 |
*** gcheresh has joined #openstack-lbaas | 18:37 | |
*** AlexStaf has joined #openstack-lbaas | 18:40 | |
rm_work | ah no, you need to set one to BACKUP | 18:40 |
rm_work | and also fix the vrrp_priority field? | 18:41 |
rm_work | uhh... that might be the only other thing | 18:41 |
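[Editor's note] rm_work's fix for the two-MASTERs state: demote one row to BACKUP and adjust its `vrrp_priority`. A toy sqlite sketch of that role fix; the priority values (100 for MASTER, 90 for BACKUP) are assumptions about common Octavia defaults, so compare against a healthy ACTIVE/STANDBY load balancer in your own database before applying anything like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amphora (id TEXT PRIMARY KEY, load_balancer_id TEXT, "
             "role TEXT, vrrp_priority INTEGER)")
# The broken state: both resurrected amphora rows claim the MASTER role.
conn.executemany("INSERT INTO amphora VALUES (?, 'lb-1', 'MASTER', 100)",
                 [("amp-1",), ("amp-2",)])

# Demote one to BACKUP and lower its VRRP priority (values assumed here;
# verify them against a known-good load balancer first).
conn.execute("UPDATE amphora SET role = 'BACKUP', vrrp_priority = 90 "
             "WHERE id = 'amp-2'")

roles = sorted(r for (r,) in conn.execute("SELECT role FROM amphora"))
print(roles)  # ['BACKUP', 'MASTER']
```

With exactly one MASTER and one BACKUP row, a subsequent failover can rebuild the pair normally, which matches the "appears to work now" outcome below.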
*** gcheresh has quit IRC | 18:44 | |
*** gcheresh has joined #openstack-lbaas | 18:46 | |
*** AlexStaf has quit IRC | 18:46 | |
*** yamamoto has quit IRC | 18:59 | |
*** yamamoto has joined #openstack-lbaas | 19:03 | |
*** yamamoto has quit IRC | 19:03 | |
*** yamamoto has joined #openstack-lbaas | 19:03 | |
*** yamamoto has quit IRC | 19:08 | |
*** AlexStaf has joined #openstack-lbaas | 19:08 | |
TMM | well, it appears to work now | 19:15 |
*** gregwork has quit IRC | 19:25 | |
*** KeithMnemonic has joined #openstack-lbaas | 19:26 | |
*** AlexStaf has quit IRC | 19:28 | |
openstackgerrit | Brian Haley proposed openstack/octavia-tempest-plugin master: Change to use memory_tracker variable https://review.opendev.org/704202 | 19:29 |
rm_work | you should definitely fix it so one is MASTER and one is BACKUP or failover will not work great right now | 19:42 |
*** luksky has joined #openstack-lbaas | 19:47 | |
*** AlexStaf has joined #openstack-lbaas | 20:17 | |
*** openstackstatus has joined #openstack-lbaas | 20:28 | |
*** ChanServ sets mode: +v openstackstatus | 20:28 | |
*** gcheresh has quit IRC | 20:29 | |
*** mithilarun has quit IRC | 21:36 | |
*** TrevorV has quit IRC | 21:36 | |
*** mithilarun has joined #openstack-lbaas | 21:37 | |
*** mithilarun has quit IRC | 21:52 | |
*** rcernin has joined #openstack-lbaas | 22:10 | |
*** mithilarun has joined #openstack-lbaas | 22:20 | |
*** mithilarun has quit IRC | 22:24 | |
*** mithilarun has joined #openstack-lbaas | 22:36 | |
*** tkajinam has joined #openstack-lbaas | 22:55 | |
*** mithilarun has quit IRC | 23:31 | |
*** mithilarun has joined #openstack-lbaas | 23:32 | |
*** mithilarun has quit IRC | 23:36 | |
*** yamamoto has joined #openstack-lbaas | 23:45 | |
*** mithilarun has joined #openstack-lbaas | 23:49 | |
*** yamamoto has quit IRC | 23:49 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!