*** hongbin has quit IRC | 00:29 | |
*** hongbin has joined #openstack-lbaas | 00:40 | |
*** wuchunyang has joined #openstack-lbaas | 00:59 | |
*** wuchunyang has quit IRC | 01:05 | |
*** yamamoto has quit IRC | 02:28 | |
*** yamamoto has joined #openstack-lbaas | 02:36 | |
*** rcernin_ has joined #openstack-lbaas | 02:58 | |
*** rcernin has quit IRC | 02:59 | |
*** rcernin_ has quit IRC | 03:16 | |
*** ramishra has joined #openstack-lbaas | 03:30 | |
*** rcernin_ has joined #openstack-lbaas | 03:32 | |
*** psachin has joined #openstack-lbaas | 03:39 | |
*** rcernin_ has quit IRC | 03:45 | |
*** rcernin has joined #openstack-lbaas | 03:45 | |
*** wuchunyang has joined #openstack-lbaas | 04:02 | |
*** wuchunyang has quit IRC | 04:06 | |
*** vishalmanchanda has joined #openstack-lbaas | 04:29 | |
*** hongbin has quit IRC | 04:39 | |
*** rcernin has quit IRC | 05:32 | |
*** gcheresh has joined #openstack-lbaas | 05:34 | |
*** rcernin has joined #openstack-lbaas | 05:40 | |
*** rpittau|afk is now known as rpittau | 06:21 | |
openstackgerrit | Merged openstack/octavia master: fix(elements): fix nf_conntrack sysctl param names https://review.opendev.org/706674 | 07:02 |
*** maciejjozefczyk has joined #openstack-lbaas | 07:09 | |
*** stingrayza has joined #openstack-lbaas | 07:23 | |
*** also_stingrayza has quit IRC | 07:25 | |
*** rcernin_ has joined #openstack-lbaas | 07:47 | |
*** rcernin has quit IRC | 07:47 | |
*** rcernin_ has quit IRC | 07:54 | |
*** born2bake has joined #openstack-lbaas | 08:13 | |
*** ccamposr__ has joined #openstack-lbaas | 08:14 | |
*** ccamposr has quit IRC | 08:17 | |
*** ataraday_ has joined #openstack-lbaas | 08:32 | |
*** salmankhan has joined #openstack-lbaas | 08:33 | |
*** salmankhan has quit IRC | 08:36 | |
*** dayou_ has joined #openstack-lbaas | 08:36 | |
*** dayou has quit IRC | 08:39 | |
openstackgerrit | Merged openstack/octavia master: Cap jsonschema 3.2.0 as the minimal version https://review.opendev.org/730961 | 09:05 |
*** tkajinam has quit IRC | 09:21 | |
dulek | Hi! Can I ask you to take a look at why kuryr-kubernetes-tempest-(train|stein) are failing on https://review.opendev.org/#/c/734364? | 09:41 |
dulek | "/opt/stack/devstack/inc/python: line 456: cd: /opt/stack/diskimage-builder: No such file or directory" - this is pretty specific, we're probably missing something in local.conf? Does it ring a bell? | 09:41 |
*** ivve has joined #openstack-lbaas | 09:46 | |
ivve | oi folks, i've got a question about recreation of vrrp ports. i've got the usual scenario of losing network connectivity: octavia loses connection to the lbs and tries to fail them over, which of course fails due to the same network issues, and then i'm left with tons of lbs in ERROR. the vrrp ports are missing, so i recreate them, but now i get this when failing them over: | 09:48 |
ivve | Amphora c198a06d-a4c4-4b4c-a35f-70b06ea5fb76 failover exception: subnet not found (subnet id: None).: SubnetNotFound: subnet not found (subnet id: None). | 09:48 |
ivve | using the command: neutron port-create --tenant-id <LB project/tenant ID> --name octavia-lb-vrrp-<amp ID> --security-group lb-<lb ID> --allowed-address-pair ip_address=<VIP IP address> <network ID for VIP> | 09:49 |
ivve | the info (i assume) should come from the ports as neither the loadbalancer nor the amphora keeps that in the db | 09:52 |
ivve | the fixed_ips field on the port does contain the correct {"subnet_id": xxx} | 09:53 |
ivve | so im a bit confused on where it is looking atm | 09:53 |
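[Editor's note: for reference, a roughly equivalent invocation with the unified openstack client, pinning the VIP subnet explicitly via --fixed-ip. All IDs, the IP, and the port name are placeholders; this is an illustrative sketch of the port recreation discussed above, not a verified fix for the SubnetNotFound failover error.]

    # Recreate the amphora VRRP port, explicitly naming the VIP subnet
    openstack port create \
        --project <LB project/tenant ID> \
        --network <network ID for VIP> \
        --fixed-ip subnet=<VIP subnet ID> \
        --security-group lb-<lb ID> \
        --allowed-address ip-address=<VIP IP address> \
        octavia-lb-vrrp-<amp ID>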
*** rpittau is now known as rpittau|bbl | 10:15 | |
ivve | similar issue described here: http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 | 10:21 |
ivve | old but still relevant and breaks in the same way | 10:21 |
*** wuchunyang has joined #openstack-lbaas | 10:38 | |
*** wuchunyang has quit IRC | 10:49 | |
*** wuchunyang has joined #openstack-lbaas | 10:49 | |
dulek | cgoncalves: Thanks for help! | 10:55 |
*** TMM has quit IRC | 10:56 | |
*** TMM has joined #openstack-lbaas | 10:56 | |
ivve | btw, is there any way to disable octavia's automatic failover/recreation of objects/resources? | 11:07 |
ivve | every time octavia loses network connectivity i'd like warnings and logs rather than a full environment failover | 11:07 |
ivve | as it rarely fails any other way, i.e. a host dies and INSTANTLY needs a new amphora | 11:08 |
ivve | i'd rather just be notified that an active/passive amphora died and needs action. this would greatly reduce the aftermath when octavia tries to "solve" a backend network issue, which it probably will never be able to do; it's kinda too much to ask for imo | 11:09 |
ivve | i guess i could set an extreme heartbeat_timeout? | 11:20 |
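[Editor's note: a minimal octavia.conf sketch of the knob mentioned above. heartbeat_timeout is a real [health_manager] option (seconds of missed heartbeats before an amphora is considered failed); the value shown is only an example, and raising it delays automatic failovers rather than disabling them.]

    [health_manager]
    # Seconds to wait for an amphora heartbeat before the health manager
    # declares the amphora failed and triggers an automatic failover.
    heartbeat_timeout = 600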
*** wuchunyang has quit IRC | 12:11 | |
*** servagem has joined #openstack-lbaas | 12:16 | |
*** yamamoto has quit IRC | 12:22 | |
*** rpittau|bbl is now known as rpittau | 12:22 | |
*** yamamoto has joined #openstack-lbaas | 12:35 | |
*** riuzen has joined #openstack-lbaas | 12:59 | |
*** riuzen has quit IRC | 13:20 | |
*** TrevorV has joined #openstack-lbaas | 13:30 | |
*** psachin has quit IRC | 13:44 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: DNM check UDP pool fix https://review.opendev.org/737283 | 14:51 |
*** TMM has quit IRC | 15:05 | |
*** TMM has joined #openstack-lbaas | 15:05 | |
*** armax has joined #openstack-lbaas | 15:09 | |
*** rpittau is now known as rpittau|afk | 16:01 | |
*** aannuusshhkkaa has joined #openstack-lbaas | 16:04 | |
*** ccamposr has joined #openstack-lbaas | 16:08 | |
aannuusshhkkaa | Hello! shtepanie, rm_work and I have been working on updating the amphora stats driver interface. It is still a WIP. Here is the link to the change we have put up: https://review.opendev.org/#/c/737111/1 . Any and all reviews/comments are welcome! | 16:11 |
*** ccamposr__ has quit IRC | 16:11 | |
*** shtepanie has joined #openstack-lbaas | 16:27 | |
rm_work | johnsom: ^^ hopefully that's moving the right direction | 16:37 |
johnsom | Yeah, was going to take a look this morning after I dig out from weekend/Monday e-mails | 16:38 |
rm_work | plan is to do the status driver interface the same way, it's just much more complicated | 16:38 |
rm_work | well, a bit more complicated | 16:38 |
*** gcheresh has quit IRC | 16:56 | |
cgoncalves | octavia-v2-dsvm-scenario SUCCESS in 38m 58s | 17:09 |
johnsom | So it didn't run? | 17:10 |
cgoncalves | it did -- https://0c1967d9212ec47f9513-eccda9a716b7d91f091af6c9420bdc89.ssl.cf5.rackcdn.com/731416/1/check/octavia-v2-dsvm-scenario/33ff8cd/testr_results.html | 17:10 |
johnsom | Devstack install in 18 minutes, what voodoo has mnaser invoked? (ubuntu-bionic-vexxhost-ca-ymq-1-0016825754) | 17:13 |
mnaser | johnsom: may or may not be super fast new amd epyc gen 2 machines with raid-0'd local storage | 17:14 |
mnaser | :) | 17:14 |
johnsom | mnaser Sold! | 17:14 |
mnaser | that's awesome feedback to hear, haha | 17:14 |
mnaser | johnsom: not announced yet tho ;) | 17:15 |
johnsom | mnaser Just to give you an idea, that 38 minute job run on your gear takes 1:50 on a different cloud.... | 17:16 |
mnaser | aha. I love the "so it didn't run" comment | 17:19 |
johnsom | Yeah, that is usually what gets that kind of result. Tempest just skips all of the tests, etc. | 17:19 |
mnaser | johnsom: i think we have nested virt enabled too on those with a _much_ newer kernel too | 17:20 |
cgoncalves | for sure with nested virt | 17:20 |
johnsom | mnaser Yeah, it's clearly a combination. Nested virt usually takes it down to about an hour. | 17:20 |
johnsom | mnaser Congratulations. You get the Octavia team "smokin' fast cloud" award for 2020. | 17:23 |
mnaser | wewt | 17:23 |
mnaser | \o/ | 17:23 |
rm_work | dayum | 17:25 |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Fix availability zone API tests https://review.opendev.org/737191 | 17:29 |
johnsom | rm_work have a minute to chat about the stats patch? | 17:48 |
rm_work | i think we prolly do -- aannuusshhkkaa / shtepanie | 17:48 |
johnsom | Ok, just wanted to bounce some thoughts around before I commented | 17:48 |
johnsom | So, I like the idea of moving the packet parsing up to the amphora driver. This makes sense to me. | 17:49 |
johnsom | We could be a bit more bold and nuke this whole mixin thing, as I'm not sure it brings us any value. | 17:50 |
rm_work | yeaaahhh i'm not sure why it's a mixin? | 17:50 |
rm_work | i mean... i think we did kinda nuke part of it? | 17:50 |
johnsom | Also, we might consider moving the octavia.amphora.stats_update_drivers stevedore lookup to a singleton as I don't think it will really get live-swapped. Though open to thoughts on that. | 17:50 |
johnsom | Yeah, exactly, I think we should just remove the whole mixin stuff on the amp side. It's just extra code we don't really use/need. | 17:51 |
rm_work | is there any on the amp? | 17:52 |
johnsom | https://review.opendev.org/#/c/737111/1/octavia/amphorae/drivers/driver_base.py | 17:52 |
johnsom | That bit seems.... | 17:52 |
johnsom | Yeah, I didn't mean in the amp, but under the amp driver. | 17:55 |
johnsom | The current code hops back and forth which is lame. | 17:56 |
rm_work | hmm | 17:57 |
rm_work | yeah | 17:57 |
rm_work | err tho on the stevedore part | 17:57 |
rm_work | it's gonna be a loop over "handlers" rather than "handler" i think? | 17:58 |
rm_work | or does stevedore have a native way to handle that | 17:58 |
johnsom | I think there is a native "call this on all" option. Let me refresh my memory | 17:58 |
johnsom | Maybe https://docs.openstack.org/stevedore/latest/user/patterns_loading.html#hooks-single-name-many-entry-points ? | 18:00 |
johnsom | Or maybe https://docs.openstack.org/stevedore/latest/reference/index.html#namedextensionmanager | 18:01 |
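[Editor's note: a rough Python sketch of the two ideas above -- loading the stats drivers once as a module-level singleton and fanning each update out to all of them with stevedore. The namespace matches the one mentioned earlier in the discussion, but the config option name and the update_stats method name are assumptions for illustration only.]

    # Hypothetical sketch: load all configured stats drivers once and
    # dispatch every batch of listener stats to each of them.
    from oslo_config import cfg
    from stevedore import named

    CONF = cfg.CONF

    _STATS_DRIVERS = None  # lazily-initialized singleton


    def _get_stats_drivers():
        global _STATS_DRIVERS
        if _STATS_DRIVERS is None:
            _STATS_DRIVERS = named.NamedExtensionManager(
                namespace='octavia.amphora.stats_update_drivers',
                names=CONF.health_manager.statistics_drivers,  # assumed option name
                invoke_on_load=True)
        return _STATS_DRIVERS


    def update_stats(listener_stats):
        # map_method() invokes the named method on every loaded extension.
        _get_stats_drivers().map_method('update_stats', listener_stats)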
rm_work | hmm | 18:04 |
aannuusshhkkaa | https://www.irccloud.com/pastebin/qT9IamKu/ | 18:11 |
johnsom | I think that is ok. The listener IDs are globally unique. | 18:12 |
aannuusshhkkaa | right, and we dont want loadbalancer_id at all? Wouldn't it help in appropriate "roll-ups"? | 18:15 |
johnsom | Well, there is a direct relationship between the load balancer (parent) and the listener (child). | 18:16 |
aannuusshhkkaa | aah okay | 18:17 |
johnsom | So, the new SQL for the deltas change should be able to update both at the same time. | 18:17 |
rm_work | it would make sense to include the LB_ID somehow in OTHER drivers | 18:17 |
aannuusshhkkaa | so we will be able to uniquely identify the loadbalancer from the listener_id by querying the DB again | 18:18 |
rm_work | in the DB driver, it's stored in such a way that retrieval WILL have the LB_ID | 18:18 |
rm_work | but we'll maybe want to look it up before sending to influx | 18:18 |
johnsom | Right, I expect the "external" drivers will want to know that relationship. | 18:18 |
johnsom | The question is does it need to be in the message from the amps? probably not | 18:19 |
rm_work | oh if the amp has it... MAYBE | 18:21 |
rm_work | it saves us a DB query | 18:21 |
rm_work | which ... for health... | 18:21 |
aannuusshhkkaa | yeah.. that is what i was thinking.. one less hit to the DB | 18:21 |
johnsom | Yeah, I keep thinking of these as separate messages, but they aren't. We have to have the LB ID for health. | 18:22 |
rm_work | err | 18:23 |
rm_work | i don't think so? but | 18:23 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L159 | 18:23 |
johnsom | Well, we could reverse lookup, but it's like the very first query | 18:24 |
rm_work | it's still the part that is running in our critical path | 18:24 |
rm_work | ah yeah but we need the whole LB not just the ID | 18:24 |
johnsom | It's my brain malfunction that keeps thinking they are separate. | 18:24 |
rm_work | for the stats we'd just need the ID | 18:24 |
johnsom | Yeah | 18:24 |
rm_work | so do we make ANOTHER query to the DB for the LB_ID for the stats message for non-db drivers? | 18:25 |
rm_work | or... include it in the message | 18:25 |
johnsom | Well, it's already there in the message, so I say we just keep it | 18:25 |
rm_work | errr | 18:27 |
rm_work | i don't think it is? | 18:27 |
johnsom | Oh, ID is amphora id.... | 18:29 |
rm_work | right | 18:29 |
rm_work | we removed it from ... like... the sample output format the mixin (that was never used) defined | 18:29 |
rm_work | but passing LB_ID to another driver would require a lookup | 18:30 |
rm_work | in the health (speed sensitive) section | 18:30 |
aannuusshhkkaa | looks like we are fetching the LB in VRRPDriverMixin in the very next function.. could we use the same one? | 18:30 |
johnsom | Health should be in its own thread/process though, right? Stats gets split off, so a lookup probably isn't too bad. Plus, I expect we are going to need/want the project ID when we send it out to other external targets | 18:32 |
rm_work | sorry i just mean "health in general" | 18:33 |
rm_work | the "health manager" is all kinda critical path... in that either backing up is not good | 18:38 |
rm_work | health-type-message is worse obviously | 18:39 |
rm_work | but stats backing up isn't great either | 18:39 |
*** mloza has joined #openstack-lbaas | 18:39 | |
johnsom | Yeah | 18:42 |
*** rouk has joined #openstack-lbaas | 18:47 | |
rouk | for moving to train, aka the multi-CA change, there's nothing in kolla for the actual upgrade. it does the cert and config placement fine, but won't update the amphoras' client CA while octavia is down, etc. will it work as intended if we push the new certs and restart the agents right after the octavia services come back up? | 19:30 |
johnsom | Octavia production deployments have always been multi-CA. (just saying, but I know some of the deployment tools copied the old devstack setup that used a single CA) | 19:33 |
rouk | yeah, kolla-ansible was single. | 19:34 |
rouk | which is fun. | 19:34 |
rouk | so i'm just checking whether doing the change, then pushing the amphora client CA file and restarting the agent 1-20 seconds after octavia is reconfigured, would do any damage | 19:35 |
johnsom | So, if it is rotating all of the certificates, you will have to failover the amphorae (which will happen automatically, but maybe at a larger volume than you would like). If it is only rotating some of them, you can set the cert expiration dates in the DB and the housekeeping process will rotate them automatically. I just don't know what kolla has done for the transition. Either way, I would consider stopping the health manager while you transition, and heavily test the upgrade on a throwaway deployment. | 19:36 |
johnsom | Yeah, worst case, Octavia will think they are compromised amphora and just rebuild them via failovers. | 19:37 |
johnsom | You will see messages in the health manager if it thinks the certs are bad. | 19:37 |
rouk | https://etherpad.opendev.org/p/octavia-single-ca-to-multi-ca im just reading this, which implies the only thing the amphora needs to do is have the new CA copied to it? | 19:37 |
johnsom | As for the agent, yeah, a simple restart will pick up new certs. | 19:37 |
rouk | and if i do that fast enough, i shouldnt have major rebuilds, right? | 19:38 |
johnsom | Or just stop your health managers for the period of time you are working. | 19:38 |
rouk | yeah, but thats the only thing i need to do? theres nothing im missing? just have the client CA there before health manager comes back? | 19:39 |
johnsom | Ah, yeah, that etherpad. I wrote that up over a year ago. Let me refresh my memory. | 19:39 |
rouk | trying to avoid a rebuild on 134 amphoras, which will be... quite the processing. | 19:40 |
johnsom | That was also targeted at the tripleo deployer, just FYI | 19:40 |
*** gthiemonge has quit IRC | 19:40 | |
rouk | yeah, kolla-ansible does exactly nothing, just places certs and configs in the control plane, so im using this as an example of what pieces are missing. | 19:40 |
rouk | which, looks like i just need to copy the CA and im good, and i wanted to confirm that i wasnt crazy. | 19:41 |
*** gthiemonge has joined #openstack-lbaas | 19:41 | |
johnsom | Yeah, I think it's both: copy the CA over and, if you are changing out the "server" CA, set the expiration dates in the DB (line 90 of the etherpad) | 19:43 |
johnsom | Let the housekeeping update the "server" certs in the amps, then re-enable HM | 19:44 |
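[Editor's note: an illustrative SQL sketch of "set the expiration dates in the DB" so housekeeping rotates the amphora agent certificates. The octavia amphora table does carry a cert_expiration column, but verify table/column names and the status value against your release and back up the database first; this is a sketch, not a vetted upgrade procedure.]

    -- Mark every in-service amphora's agent certificate as expired so the
    -- housekeeping cert-rotation task replaces it on its next pass.
    UPDATE amphora
       SET cert_expiration = NOW()
     WHERE status = 'ALLOCATED';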
rouk | we are not swapping out the server ca this update | 19:46 |
rouk | oh, nevermind, other way around, adding server ca, so i guess we have to do that expiration. | 19:47 |
johnsom | Yeah, there is context of which side is client and which is server in the guide: https://docs.openstack.org/octavia/latest/admin/guides/certificates.html | 19:48 |
johnsom | It's a two-way authentication, so can be a bit confusing | 19:48 |
rouk | how long do you think for housekeeping to respond to 140 amphoras needing certs issued? | 19:50 |
rouk | keep healthmanager down for an hour? | 19:50 |
johnsom | Oh, I doubt you will need more than half an hour. It will log each rotation | 19:50 |
rouk | alright | 19:51 |
*** gcheresh has joined #openstack-lbaas | 19:51 | |
*** ataraday_ has quit IRC | 19:56 | |
*** vishalmanchanda has quit IRC | 20:16 | |
*** gcheresh has quit IRC | 21:10 | |
*** spatel has joined #openstack-lbaas | 21:13 | |
*** spatel has quit IRC | 21:36 | |
*** maciejjozefczyk has quit IRC | 21:36 | |
*** spatel has joined #openstack-lbaas | 21:42 | |
*** spatel has quit IRC | 21:46 | |
*** spatel has joined #openstack-lbaas | 21:52 | |
*** spatel has quit IRC | 22:10 | |
*** gthiemonge has quit IRC | 22:10 | |
*** gthiemonge has joined #openstack-lbaas | 22:11 | |
*** TrevorV has quit IRC | 22:16 | |
*** spatel has joined #openstack-lbaas | 22:28 | |
*** spatel has quit IRC | 22:31 | |
*** rcernin_ has joined #openstack-lbaas | 22:33 | |
*** born2bake has quit IRC | 22:42 | |
*** rcernin_ has quit IRC | 22:47 | |
*** tkajinam has joined #openstack-lbaas | 22:51 | |
*** rcernin_ has joined #openstack-lbaas | 23:02 | |
*** rcernin_ has quit IRC | 23:16 | |
*** rcernin has joined #openstack-lbaas | 23:18 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!