*** gthiemonge has quit IRC | 01:30 | |
*** gthiemon1e has joined #openstack-lbaas | 01:31 | |
*** yamamoto has quit IRC | 01:48 | |
*** openstackgerrit has quit IRC | 02:04 | |
*** armax has joined #openstack-lbaas | 02:10 | |
*** yamamoto has joined #openstack-lbaas | 02:57 | |
*** psachin has joined #openstack-lbaas | 03:35 | |
*** ramishra has joined #openstack-lbaas | 03:46 | |
*** gcheresh has joined #openstack-lbaas | 06:41 | |
*** tkajinam has quit IRC | 07:02 | |
*** tkajinam has joined #openstack-lbaas | 07:04 | |
*** gthiemon1e is now known as gthiemonge | 07:16 | |
*** luksky has joined #openstack-lbaas | 07:29 | |
*** tkajinam_ has joined #openstack-lbaas | 07:52 | |
*** tkajinam has quit IRC | 07:55 | |
*** tesseract has joined #openstack-lbaas | 08:13 | |
*** tkajinam_ has quit IRC | 08:18 | |
*** rpittau|afk is now known as rpittau | 08:18 | |
*** pcaruana has joined #openstack-lbaas | 08:25 | |
*** openstackgerrit has joined #openstack-lbaas | 08:26 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 08:26 |
*** AlexStaf has joined #openstack-lbaas | 08:27 | |
*** ccamposr has joined #openstack-lbaas | 08:30 | |
*** vesper11 has quit IRC | 09:01 | |
*** vesper11 has joined #openstack-lbaas | 09:05 | |
*** yamamoto has quit IRC | 09:13 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 09:18 |
*** etp has quit IRC | 10:58 | |
*** etp has joined #openstack-lbaas | 11:00 | |
*** rpittau is now known as rpittau|bbl | 11:21 | |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Jobboard based controller https://review.opendev.org/647406 | 11:24 |
openstackgerrit | Ann Taraday proposed openstack/octavia master: Testing https://review.opendev.org/697213 | 11:30 |
*** luksky has quit IRC | 11:55 | |
*** yamamoto has joined #openstack-lbaas | 12:04 | |
*** yamamoto has quit IRC | 12:09 | |
*** yamamoto_ has joined #openstack-lbaas | 12:09 | |
TMM | Does anyone happen to know if there's any version of the octavia dashboard that supports the new allowed-cidr options from Train? | 12:22 |
TMM | I upgraded horizon to train but it appears that there's no such option in horizon at least | 12:22 |
cgoncalves | TMM, allowed-cidr option has not been added to the dashboard yet | 12:23 |
TMM | OK, thanks for the confirmation, I'm not losing my mind :) | 12:23 |
*** rpittau|bbl is now known as rpittau | 12:55 | |
*** ramishra has quit IRC | 13:20 | |
*** ramishra has joined #openstack-lbaas | 13:21 | |
*** ramishra has quit IRC | 13:21 | |
*** ramishra has joined #openstack-lbaas | 13:21 | |
*** psachin has quit IRC | 13:29 | |
*** yamamoto_ has quit IRC | 13:56 | |
*** yamamoto has joined #openstack-lbaas | 13:58 | |
TMM | Is there a way to tell octavia to retry some operations on an lbaas? I made an error updating octavia and now all my load balancers are either in PENDING or ERROR state :P (forgot to set the rabbit topic name) | 14:01 |
TMM | They are all still working fine, I still need to update their amphoras | 14:01 |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Support haproxy development snapshot version parsing https://review.opendev.org/701823 | 14:07 |
*** haleyb has joined #openstack-lbaas | 14:14 | |
*** luksky has joined #openstack-lbaas | 14:16 | |
johnsom | TMM that scenario might be tricky. For those in ERROR, you can use the failover API. For those in PENDING_*, it is likely the workers never got that message, so you need to set them to ERROR in the DB, then fail those over. | 14:19 |
TMM | johnsom: the openstack loadbalancer failover, or the amphora failover api? | 14:20 |
johnsom | Load balancer failover | 14:20 |
TMM | ok, thanks | 14:21 |
TMM | I appreciate it! :D | 14:21 |
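[Editor's note] The failover johnsom recommends here is exposed by the Octavia v2 API as `PUT /v2.0/lbaas/loadbalancers/{loadbalancer_id}/failover` (the same thing `openstack loadbalancer failover` calls). A minimal sketch of triggering it directly; the endpoint value and token handling are simplified assumptions, not production code:

```python
import urllib.request

# Assumed endpoint; in a real deployment, take this from the Keystone service catalog.
OCTAVIA_ENDPOINT = "http://controller:9876"


def failover_url(endpoint, lb_id):
    """Build the Octavia v2 load balancer failover URL."""
    return "%s/v2.0/lbaas/loadbalancers/%s/failover" % (endpoint.rstrip("/"), lb_id)


def failover_load_balancer(token, lb_id):
    """Trigger a failover with an empty-body PUT; Octavia answers 202 Accepted."""
    req = urllib.request.Request(failover_url(OCTAVIA_ENDPOINT, lb_id),
                                 method="PUT",
                                 headers={"X-Auth-Token": token})
    return urllib.request.urlopen(req)


print(failover_url(OCTAVIA_ENDPOINT, "lb-0000"))
```

As discussed below, this only helps load balancers already in ERROR; ones stuck in PENDING_* first need their status fixed in the database.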
TMM | hmm, after the failover the amphoras went to 'standalone' | 14:22 |
johnsom | It should do that temporarily during the failover process | 14:22 |
TMM | Ahh, ok | 14:22 |
johnsom | It builds them as standalone, then will update them to their proper role. | 14:23 |
TMM | clever :) I don't think it used to do that | 14:23 |
johnsom | A load balancer failover sequences the amphora replacements so that it minimizes downtime, etc. | 14:23 |
johnsom | There are also additional improvements to failover coming. I am working on that right now. | 14:24 |
TMM | hmm, I now have some amphora with a non-matching ssl cert? (Caused by SSLError(CertificateError("hostname u'fe56fc4a-71cb-416a-a4c5-bf8892d81879' doesn't match '6cf80b52-c839-4d05-a777-e72a1530e126'") I wonder how I managed to do this | 14:25 |
TMM | @johnsom Thank you for your work! I generally really like octavia! | 14:25 |
johnsom | Excellent, glad to hear it. | 14:25 |
johnsom | That is very odd, but maybe the rabbit issue impacted nova or neutron too? | 14:26 |
TMM | Hmm, maybe, but I only touched octavia | 14:27 |
TMM | and it was just a new setting in oslo configs that I didn't set | 14:27 |
johnsom | This is a funny one: https://storyboard.openstack.org/#!/story/2007218 | 14:34 |
johnsom | My guess is it is the standard openstackclient behavior, but I will take a look | 14:35 |
cgoncalves | https://docs.python.org/3/library/argparse.html#choices | 14:36 |
gthiemonge | is 65535 a reserved port? it's not in the choice list | 14:37 |
gthiemonge | choices=range(1, 65535) doesn't look good | 14:39 |
johnsom | Yep. I bet we can do better.... Interestingly enough, we don't validate the listener port #, just pass it through to the API to validate. | 14:41 |
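[Editor's note] The bug being discussed: `choices=range(1, 65535)` both excludes port 65535 (Python's `range` stops before its end value) and makes argparse enumerate every choice in its usage/error output. A sketch of the kind of fix described later in the log, using a custom type function with a concise API-style error message; the function name is illustrative, not the actual patch:

```python
import argparse


def port_number(value):
    """Validate a TCP/UDP port without argparse enumerating every choice."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise argparse.ArgumentTypeError(
            "Value: '%s'. Value must be between 1 and 65535." % value)
    return port


parser = argparse.ArgumentParser()
# Buggy variant: excludes 65535 and dumps ~65k choices into the usage message:
#   parser.add_argument('--protocol-port', type=int, choices=range(1, 65535))
parser.add_argument('--protocol-port', type=port_number)

args = parser.parse_args(['--protocol-port', '65535'])
print(args.protocol_port)
```

With a `type` callable, an out-of-range value produces a single short error line instead of the full choice list.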
TMM | Hmm, I appear to have at least 1 loadbalancer where the amphora is in MASTER role, but there is no slave amphora for it at all | 14:47 |
johnsom | Yeah, that can happen under strange circumstances. This is one of the things I am currently fixing. | 14:47 |
TMM | anything I can do about that now? :) | 14:48 |
johnsom | So, the short answer is, there is not an easy way to fix this. If it's not a critical LB, delete it and recreate it. If it's critical, there is likely a number of steps required to fix it enough that a failover will complete. | 14:49 |
TMM | well, I ran a 'loadbalancer failover' on a bunch of my loadbalancers and most of them are now in this state it seems | 14:50 |
TMM | would an amphora failover make octavia notice that there's some amphoras missing? | 14:51 |
johnsom | Are they back to Active or Error state, or still Pending? | 14:51 |
TMM | Everything is in 'active' state | 14:52 |
TMM | there's just amphoras missing, but nothing seems to be too concerned about this | 14:52 |
*** TrevorV has joined #openstack-lbaas | 14:53 | |
TMM | yeah, so right now all my loadbalancers are in provisioning_status ACTIVE, and ONLINE | 14:55 |
TMM | All my amphoras are ALLOCATED, but there's just a bunch of backups just kind of not there | 14:55 |
TMM | (this is octavia from Train btw) | 14:56 |
johnsom | Yeah, it's a bug where if the database records for the amphora somehow got removed, it doesn't notice there is one missing. This is the patch I am working on now. I have it working in my lab, but I still have work to do before I can publish it. The original authors assumed that scenario would never happen. | 14:57 |
TMM | I don't think I deleted any octavia db records | 14:57 |
johnsom | Yes, but the failover might have. | 14:58 |
TMM | ah, ok | 14:58 |
TMM | Do I have to manually undelete the amphora db record? | 14:58 |
johnsom | If you can't just delete/rebuild, you will need to, yes. | 14:59 |
TMM | I can't really delete them no | 14:59 |
johnsom | I think rm_work has a procedure for recreating those records, but I'm not sure if he is online at the moment. | 15:01 |
johnsom | If not, I can probably walk through it, but it might be a bit of a process... | 15:03 |
TMM | it's still in the database as DELETED | 15:03 |
TMM | so I just set it back to active, I'll try to do a failover now | 15:04 |
johnsom | Ah, that is good. Or try ERROR | 15:04 |
TMM | alright, I put the missing ones in error | 15:07 |
TMM | I'll do another failover, see if it'll fix them | 15:07 |
*** gcheresh has quit IRC | 15:09 | |
TMM | ok, yeah, so setting the deleted db records to 'ERROR' and running the lb failover twice seems to have fixed it | 15:19 |
TMM | the first time the lb itself went into ERROR mode, as octavia desperately tried to contact the non-existent amphora | 15:20 |
TMM | the second time it recovered the state | 15:20 |
johnsom | Oh good. | 15:21 |
TMM | Not sure if that was expected? :) But it worked for me at least | 15:21 |
johnsom | Yes, with the amp in error it should handle it better. So, keep an eye out for future bug fix releases that will include a much improved failover capability. | 15:23 |
TMM | Awesome, thank you for your help. I really appreciate it. | 15:24 |
haleyb | johnsom: you want me to fix the port validation bug? | 16:09 |
johnsom | haleyb Almost done | 16:09 |
haleyb | johnsom: i'm already done with it :) | 16:11 |
haleyb | it's a race right? | 16:12 |
johnsom | Ha, well, I took assignment of the bug. But we can both post and see who did a better patch.... | 16:12 |
* johnsom throws the gauntlet | 16:12 | |
* cgoncalves is open to bribes | 16:15 | |
openstackgerrit | Brian Haley proposed openstack/python-octaviaclient master: Do not print large usage message for port or weight https://review.opendev.org/704348 | 16:19 |
haleyb | untested though | 16:19 |
*** ccamposr has quit IRC | 16:34 | |
haleyb | johnsom: sigh, that doesn't exactly work ^^^ | 16:38 |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 16:47 |
johnsom | haleyb ^^^^ This works.... (I still need to finish the cleanup/tests) | 16:47 |
* haleyb shakes fist | 16:48 | |
haleyb | johnsom: it would have helped if my devstack had octavia running, part of the problem was a 500 error | 16:49 |
johnsom | Invalid input for field/attribute 'protocol-port'. Value: '65536'. Value must be between 1 and 65535. | 16:51 |
johnsom | I made the error similar to the API error message | 16:51 |
*** AlexStaf has quit IRC | 16:51 | |
haleyb | johnsom: the only thing you forgot is tests :-p | 16:52 |
johnsom | Yep, still working on those | 16:53 |
*** mithilarun has joined #openstack-lbaas | 16:56 | |
*** yamamoto has quit IRC | 17:06 | |
*** mithilarun has quit IRC | 17:06 | |
*** gregwork has joined #openstack-lbaas | 17:15 | |
rm_work | TMM / johnsom: yep, resurrecting old amp records into ERROR state is the easiest way (though I just do an Amphora failover on that specific ID, not a LB failover) -- the hard way is copying the INSERT from the MASTER amp, and just changing all of the amp/compute/port ID fields to junk uuids so it will just see nothing there | 17:16 |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 17:16 |
rm_work | which is only necessary if you literally have no other records to work with | 17:17 |
rm_work | (like I did at the time I dealt with most of that) | 17:17 |
TMM | still waiting on the last lb to recover | 17:17 |
TMM | it's taking so friggin long to timeout on the non-existent amps | 17:17 |
rm_work | yeah it's safer/easier to do individual amp failovers | 17:17 |
rm_work | then you don't run into that | 17:17 |
TMM | ah | 17:17 |
TMM | well, now I know | 17:18 |
rm_work | and if you want to be extra sure, do the LB failover once the Amp failover succeeds and you have two active amps | 17:18 |
rm_work | less downtime that way too | 17:18 |
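[Editor's note] The recovery path described above (flip the DELETED amphora row to ERROR, then fail over that amphora) amounts to a single UPDATE on Octavia's `amphora` table. A toy sketch using an in-memory sqlite table to illustrate the state change; the real table lives in Octavia's MySQL database and has many more columns, so treat the schema here as illustrative and back up the database before touching it:

```python
import sqlite3

# Minimal stand-in for Octavia's amphora table (the real schema has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amphora (id TEXT PRIMARY KEY, load_balancer_id TEXT, "
             "status TEXT, role TEXT)")
conn.execute("INSERT INTO amphora VALUES ('amp-1', 'lb-1', 'DELETED', 'BACKUP')")

# Resurrect the deleted record into ERROR so failover will replace it.
conn.execute("UPDATE amphora SET status = 'ERROR' "
             "WHERE id = 'amp-1' AND status = 'DELETED'")

status = conn.execute("SELECT status FROM amphora "
                      "WHERE id = 'amp-1'").fetchone()[0]
print(status)  # ERROR
```

Per rm_work's advice, follow this with an amphora failover on that specific ID rather than a load balancer failover, so the worker doesn't spend its retry budget on the amphora that no longer exists.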
TMM | Yeah, this one lb has been down for like 20 minutes now | 17:18 |
TMM | Probably should've just recreated it | 17:19 |
TMM | oh well | 17:19 |
rm_work | :( | 17:19 |
rm_work | looking forward to johnsom's failover rework | 17:19 |
johnsom | If it makes you feel better, the new code won't do that | 17:19 |
TMM | computers are awful | 17:19 |
TMM | :P | 17:19 |
rm_work | ^^ yes | 17:19 |
rm_work | they do what we tell them to, it's horrible :D | 17:20 |
TMM | why is it even TRYING to contact the amp that's in ERROR mode | 17:20 |
TMM | just shoot it | 17:20 |
TMM | shoooot iiiitttt | 17:20 |
TMM | it's been 11 minutes now :P | 17:21 |
johnsom | Well, in defense of the original authors, we get differing views. Some want retries waiting for other services (nova for example) forever, others want fail fast. | 17:21 |
TMM | I just think that perhaps 11 minutes to wait on a node that's already in error state with a 'no route to host' error is maybe excessive | 17:22 |
johnsom | Yeah, the default is 25 minutes I think. That was because people were using virtualbox and some of the zuul test nodes don't have hardware virtualization. For example, one hosting provider can take up to 18 minutes to boot a VM using nova. | 17:23 |
johnsom | It's a poor default I think. production should be much lower. | 17:24 |
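[Editor's note] The waits discussed here are governed by the amphora connection retry settings in `octavia.conf`. A hedged example of the knobs involved; the values shown are illustrative choices for a production deployment, not the shipped defaults, so check your release's documented defaults before copying:

```ini
[haproxy_amphora]
# How many times, and how often (seconds), the worker retries connecting
# to an amphora. Shipped defaults are tuned for slow CI nodes; production
# deployments with fast nova boots can use far smaller retry budgets.
connection_max_retries = 30
connection_retry_interval = 5
```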
TMM | I just killed octavia-worker and set everything to error except the lb, doing amphora failovers now | 17:24 |
TMM | I Don't have another 50 minutes to wait | 17:24 |
johnsom | Yeah, be super careful killing the octavia processes. Currently that can halt other actions going on in the cloud and lead to PENDING_* states and broken LBs | 17:25 |
rm_work | yeah i was tempted to suggest that | 17:25 |
rm_work | but | 17:25 |
rm_work | yeah it's a little risky | 17:25 |
johnsom | They may even blow up in the future, not necessarily right away. | 17:25 |
TMM | Nothing really was happening at the time | 17:25 |
johnsom | There are patches in flight for that issue too | 17:26 |
TMM | at least the debug log of worker didn't seem to suggest it was doing anything except waiting on that one amp | 17:26 |
*** luksky has quit IRC | 17:27 | |
johnsom | haleyb Up for review... grin | 17:27 |
*** tesseract has quit IRC | 17:38 | |
*** yamamoto has joined #openstack-lbaas | 17:43 | |
*** yamamoto has quit IRC | 17:52 | |
*** mithilarun has joined #openstack-lbaas | 18:09 | |
openstackgerrit | Michael Johnson proposed openstack/python-octaviaclient master: Fix long CLI error messages https://review.opendev.org/704355 | 18:11 |
*** yamamoto has joined #openstack-lbaas | 18:14 | |
*** rpittau is now known as rpittau|afk | 18:18 | |
TMM | I learned that resurrecting two amp records that are both set to 'MASTER' is not a recipe for success | 18:24 |
*** gcheresh has joined #openstack-lbaas | 18:37 | |
*** AlexStaf has joined #openstack-lbaas | 18:40 | |
rm_work | ah no, you need to set one to BACKUP | 18:40 |
rm_work | and also fix the vrrp_priority field? | 18:41 |
rm_work | uhh... that might be the only other thing | 18:41 |
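[Editor's note] rm_work's fix for the two-MASTERs state: demote one row to BACKUP and adjust its `vrrp_priority`. A toy sqlite sketch of that role fix; the priority values (100 for MASTER, 90 for BACKUP) are assumptions about common Octavia defaults, so compare against a healthy ACTIVE/STANDBY load balancer in your own database before applying anything like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE amphora (id TEXT PRIMARY KEY, load_balancer_id TEXT, "
             "role TEXT, vrrp_priority INTEGER)")
# The broken state: both resurrected amphora rows claim the MASTER role.
conn.executemany("INSERT INTO amphora VALUES (?, 'lb-1', 'MASTER', 100)",
                 [("amp-1",), ("amp-2",)])

# Demote one to BACKUP and lower its VRRP priority (values assumed here;
# verify them against a known-good load balancer first).
conn.execute("UPDATE amphora SET role = 'BACKUP', vrrp_priority = 90 "
             "WHERE id = 'amp-2'")

roles = sorted(r for (r,) in conn.execute("SELECT role FROM amphora"))
print(roles)  # ['BACKUP', 'MASTER']
```

With exactly one MASTER and one BACKUP row, a subsequent failover can rebuild the pair normally, which matches the "appears to work now" outcome below.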
*** gcheresh has quit IRC | 18:44 | |
*** gcheresh has joined #openstack-lbaas | 18:46 | |
*** AlexStaf has quit IRC | 18:46 | |
*** yamamoto has quit IRC | 18:59 | |
*** yamamoto has joined #openstack-lbaas | 19:03 | |
*** yamamoto has quit IRC | 19:03 | |
*** yamamoto has joined #openstack-lbaas | 19:03 | |
*** yamamoto has quit IRC | 19:08 | |
*** AlexStaf has joined #openstack-lbaas | 19:08 | |
TMM | well, it appears to work now | 19:15 |
*** gregwork has quit IRC | 19:25 | |
*** KeithMnemonic has joined #openstack-lbaas | 19:26 | |
*** AlexStaf has quit IRC | 19:28 | |
openstackgerrit | Brian Haley proposed openstack/octavia-tempest-plugin master: Change to use memory_tracker variable https://review.opendev.org/704202 | 19:29 |
rm_work | you should definitely fix it so one is MASTER and one is BACKUP or failover will not work great right now | 19:42 |
*** luksky has joined #openstack-lbaas | 19:47 | |
*** AlexStaf has joined #openstack-lbaas | 20:17 | |
*** openstackstatus has joined #openstack-lbaas | 20:28 | |
*** ChanServ sets mode: +v openstackstatus | 20:28 | |
*** gcheresh has quit IRC | 20:29 | |
*** mithilarun has quit IRC | 21:36 | |
*** TrevorV has quit IRC | 21:36 | |
*** mithilarun has joined #openstack-lbaas | 21:37 | |
*** mithilarun has quit IRC | 21:52 | |
*** rcernin has joined #openstack-lbaas | 22:10 | |
*** mithilarun has joined #openstack-lbaas | 22:20 | |
*** mithilarun has quit IRC | 22:24 | |
*** mithilarun has joined #openstack-lbaas | 22:36 | |
*** tkajinam has joined #openstack-lbaas | 22:55 | |
*** mithilarun has quit IRC | 23:31 | |
*** mithilarun has joined #openstack-lbaas | 23:32 | |
*** mithilarun has quit IRC | 23:36 | |
*** yamamoto has joined #openstack-lbaas | 23:45 | |
*** mithilarun has joined #openstack-lbaas | 23:49 | |
*** yamamoto has quit IRC | 23:49 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!