Monday, 2020-06-22

*** hongbin has quit IRC00:29
*** hongbin has joined #openstack-lbaas00:40
*** wuchunyang has joined #openstack-lbaas00:59
*** wuchunyang has quit IRC01:05
*** yamamoto has quit IRC02:28
*** yamamoto has joined #openstack-lbaas02:36
*** rcernin_ has joined #openstack-lbaas02:58
*** rcernin has quit IRC02:59
*** rcernin_ has quit IRC03:16
*** ramishra has joined #openstack-lbaas03:30
*** rcernin_ has joined #openstack-lbaas03:32
*** psachin has joined #openstack-lbaas03:39
*** rcernin_ has quit IRC03:45
*** rcernin has joined #openstack-lbaas03:45
*** wuchunyang has joined #openstack-lbaas04:02
*** wuchunyang has quit IRC04:06
*** vishalmanchanda has joined #openstack-lbaas04:29
*** hongbin has quit IRC04:39
*** rcernin has quit IRC05:32
*** gcheresh has joined #openstack-lbaas05:34
*** rcernin has joined #openstack-lbaas05:40
*** rpittau|afk is now known as rpittau06:21
openstackgerritMerged openstack/octavia master: fix(elements): fix nf_conntrack sysctl param names  https://review.opendev.org/70667407:02
*** maciejjozefczyk has joined #openstack-lbaas07:09
*** stingrayza has joined #openstack-lbaas07:23
*** also_stingrayza has quit IRC07:25
*** rcernin_ has joined #openstack-lbaas07:47
*** rcernin has quit IRC07:47
*** rcernin_ has quit IRC07:54
*** born2bake has joined #openstack-lbaas08:13
*** ccamposr__ has joined #openstack-lbaas08:14
*** ccamposr has quit IRC08:17
*** ataraday_ has joined #openstack-lbaas08:32
*** salmankhan has joined #openstack-lbaas08:33
*** salmankhan has quit IRC08:36
*** dayou_ has joined #openstack-lbaas08:36
*** dayou has quit IRC08:39
openstackgerritMerged openstack/octavia master: Cap jsonschema 3.2.0 as the minimal version  https://review.opendev.org/73096109:05
*** tkajinam has quit IRC09:21
dulekHi! Can I ask you to take a look at why kuryr-kubernetes-tempest-(train|stein) are failing on https://review.opendev.org/#/c/734364?09:41
dulek"/opt/stack/devstack/inc/python: line 456: cd: /opt/stack/diskimage-builder: No such file or directory" - this is pretty specific, we're probably missing something in local.conf? Does it ring a bell?09:41
*** ivve has joined #openstack-lbaas09:46
ivveoi folks, i've got a question about recreation of vrrp ports. i've got the usual scenario of losing network connectivity, octavia loses connection to lbs, which it tries to failover which of course fails due to network issues. then im left with tons of lbs in error. vrrp ports missing, recreate them but i get this when failing them over now:09:48
ivveAmphora c198a06d-a4c4-4b4c-a35f-70b06ea5fb76 failover exception: subnet not found (subnet id: None).: SubnetNotFound: subnet not found (subnet id: None).09:48
ivveusing the command: neutron port-create --tenant-id <LB project/tenant ID> --name octavia-lb-vrrp-<amp ID> --security-group lb-<lb ID> --allowed-address-pair ip_address=<VIP IP address> <network ID for VIP>09:49
ivvethe info (i assume) should come from the ports as neither the loadbalancer nor the amphora keeps that in the db09:52
ivvethe fixed_ips field on the port does contain the correct {"subnet_id": xxx}09:53
ivveso im a bit confused on where it is looking atm09:53
*** rpittau is now known as rpittau|bbl10:15
ivvesimilar issue described here: http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:4510:21
ivveold but still relevant and breaks in the same way10:21
*** wuchunyang has joined #openstack-lbaas10:38
*** wuchunyang has quit IRC10:49
*** wuchunyang has joined #openstack-lbaas10:49
dulekcgoncalves: Thanks for help!10:55
*** TMM has quit IRC10:56
*** TMM has joined #openstack-lbaas10:56
ivvebtw, is there any form of way to disable octavias behaviour on automatic failover/recreation of objects/resources11:07
ivveas every time octavia loses network connectivity i'd like warnings and logs rather than full environment failover11:07
ivveas it rarely fails in any other way, i.e. a host dies and INSTANTLY needs a new amphora11:08
ivvei'd rather just be notified that active/passive amphora died and needs action. this would greatly solve the aftermath when octavia tries to "solve" a backend network issue. which it probably will never be able to do, its kinda too much to ask for imo11:09
ivvei guess i could set an extreme heartbeat_timeout?11:20
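The heartbeat_timeout idea above can be illustrated with a small sketch. It assumes the health manager treats an amphora as failed once its last heartbeat is older than heartbeat_timeout seconds; the function name and the sample values are hypothetical and are not taken from the Octavia code base.

```python
from datetime import datetime, timedelta


def is_stale(last_heartbeat: datetime, now: datetime, heartbeat_timeout: int) -> bool:
    """Return True when the last heartbeat is older than the timeout (in seconds)."""
    return now - last_heartbeat > timedelta(seconds=heartbeat_timeout)


# Hypothetical scenario: the control plane has not heard from an amphora
# for five minutes because of a backend network outage.
now = datetime(2020, 6, 22, 11, 20, 0)
last_seen = now - timedelta(minutes=5)

short_timeout_result = is_stale(last_seen, now, 60)      # a one-minute timeout
long_timeout_result = is_stale(last_seen, now, 86400)    # an "extreme" one-day timeout

print(short_timeout_result)  # True: a failover would be triggered
print(long_timeout_result)   # False: the outage is ridden out
```

Under this assumption a very large timeout suppresses automatic failover during a network outage, but it delays detection of a genuinely dead amphora by the same amount.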
*** wuchunyang has quit IRC12:11
*** servagem has joined #openstack-lbaas12:16
*** yamamoto has quit IRC12:22
*** rpittau|bbl is now known as rpittau12:22
*** yamamoto has joined #openstack-lbaas12:35
*** riuzen has joined #openstack-lbaas12:59
*** riuzen has quit IRC13:20
*** TrevorV has joined #openstack-lbaas13:30
*** psachin has quit IRC13:44
openstackgerritGregory Thiemonge proposed openstack/octavia-tempest-plugin master: DNM check UDP pool fix  https://review.opendev.org/73728314:51
*** TMM has quit IRC15:05
*** TMM has joined #openstack-lbaas15:05
*** armax has joined #openstack-lbaas15:09
*** rpittau is now known as rpittau|afk16:01
*** aannuusshhkkaa has joined #openstack-lbaas16:04
*** ccamposr has joined #openstack-lbaas16:08
aannuusshhkkaaHello! shtepanie, rm_work and I have been working on updating the amphora stats driver interface. It is still a WIP. Here is the link to the change we have put up: https://review.opendev.org/#/c/737111/1 . Any and all reviews/comments are welcome!16:11
*** ccamposr__ has quit IRC16:11
*** shtepanie has joined #openstack-lbaas16:27
rm_workjohnsom: ^^ hopefully that's moving the right direction16:37
johnsomYeah, was going to take a look this morning after I dig out from weekend/Monday e-mails16:38
rm_workplan is to do the status driver interface the same way, it's just much more complicated16:38
rm_workwell, a bit more complicated16:38
*** gcheresh has quit IRC16:56
cgoncalvesoctavia-v2-dsvm-scenarioSUCCESS in 38m 58s17:09
johnsomSo it didn't run?17:10
cgoncalvesit did -- https://0c1967d9212ec47f9513-eccda9a716b7d91f091af6c9420bdc89.ssl.cf5.rackcdn.com/731416/1/check/octavia-v2-dsvm-scenario/33ff8cd/testr_results.html17:10
johnsomDevstack install in 18 minutes, what voodoo has mnaser invoked? (ubuntu-bionic-vexxhost-ca-ymq-1-0016825754)17:13
mnaserjohnsom: may or may not be super fast new amd epyc gen 2 machines with raid-0'd local storage17:14
mnaser:)17:14
johnsommnaser Sold!17:14
mnaserthat's awesome feedback to hear, haha17:14
mnaserjohnsom: not announced yet tho ;)17:15
johnsommnaser Just to give you an idea, that 38 minute job run on your gear takes 1:50 on a different cloud....17:16
mnaseraha.  I love the “so it didn’t run” comment17:19
johnsomYeah, that is usually what gets that kind of result. Tempest just skips all of the tests, etc.17:19
mnaserjohnsom: i think we have nested virt enabled too on those with a _much_ newer kernel too17:20
cgoncalvesfor sure with nested virt17:20
johnsommnaser Yeah, it's clearly a combination. Nested virt usually takes it down to about an hour.17:20
johnsommnaser Congratulations. You get the Octavia team "smokin' fast cloud" award for 2020.17:23
mnaserwewt17:23
mnaser\o/17:23
rm_workdayum17:25
openstackgerritMichael Johnson proposed openstack/octavia-tempest-plugin master: Fix availability zone API tests  https://review.opendev.org/73719117:29
johnsomrm_work have a minute to chat about the stats patch?17:48
rm_worki think we prolly do -- aannuusshhkkaa / shtepanie17:48
johnsomOk, just wanted to bounce some thoughts around before I commented17:48
johnsomSo, I like the idea of moving the packet parsing up to the amphora driver. This makes sense to me.17:49
johnsomWe could be a bit more bold and nuke this whole mixin thing, as I'm not sure it brings us any value.17:50
rm_workyeaaahhh i'm not sure why it's a mixin?17:50
rm_worki mean... i think we did kinda nuke part of it?17:50
johnsomAlso, we might consider moving the octavia.amphora.stats_update_drivers stevedore lookup to a singleton as I don't think it will really get live-swapped. Though open to thoughts on that.17:50
johnsomYeah, exactly, I think we should just remove the whole mixin stuff on the amp side. It's just extra code we don't really use/need.17:51
rm_workis there any on the amp?17:52
johnsomhttps://review.opendev.org/#/c/737111/1/octavia/amphorae/drivers/driver_base.py17:52
johnsomThat bit seems....17:52
johnsomYeah, I didn't mean in the amp, but under the amp driver.17:55
johnsomThe current code hops back and forth which is lame.17:56
rm_workhmm17:57
rm_workyeah17:57
rm_workerr tho on the stevedore part17:57
rm_workit's gonna be a loop over "handlers" rather than "handler" i think?17:58
rm_workor does stevedore have a native way to handle that17:58
johnsomI think there is a native "call this on all" option. Let me refresh my memory17:58
johnsomMaybe https://docs.openstack.org/stevedore/latest/user/patterns_loading.html#hooks-single-name-many-entry-points ?18:00
johnsomOr maybe https://docs.openstack.org/stevedore/latest/reference/index.html#namedextensionmanager18:01
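The "loop over handlers" idea discussed above can be sketched without stevedore. This is a stdlib-only illustration: the class names and the update_stats method are hypothetical and do not reflect the interface in the patch under review. Stevedore's extension managers offer a comparable map_method call for invoking one method on every loaded extension, which is why the helper below borrows that name.

```python
class RecordingStatsDriver:
    """Hypothetical stats driver that simply remembers what it was given."""

    def __init__(self, name):
        self.name = name
        self.received = []

    def update_stats(self, listener_stats):
        self.received.append(listener_stats)
        return len(listener_stats)


class DriverFanOut:
    """Call the same method on every configured driver, in order."""

    def __init__(self, drivers):
        self._drivers = list(drivers)

    def map_method(self, method_name, *args, **kwargs):
        # Look the method up by name on each driver and collect the results.
        return [getattr(driver, method_name)(*args, **kwargs)
                for driver in self._drivers]


db_driver = RecordingStatsDriver("db")
influx_driver = RecordingStatsDriver("influx")
fan_out = DriverFanOut([db_driver, influx_driver])

stats = [{"id": "listener-a", "bytes_in": 100}]
results = fan_out.map_method("update_stats", stats)
```

Building the fan-out object once and reusing it matches the singleton suggestion made earlier in the conversation, since the set of drivers is not expected to change while the process runs.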
rm_workhmm18:04
aannuusshhkkaahttps://www.irccloud.com/pastebin/qT9IamKu/18:11
johnsomI think that is ok. The listener IDs are globally unique.18:12
aannuusshhkkaaright, and we dont want loadbalancer_id at all? Wouldn't it help in appropriate "roll-ups"?18:15
johnsomWell, there is a direct relationship between the load balancer (parent) and the listener (child).18:16
aannuusshhkkaaaah okay18:17
johnsomSo, the new SQL for the deltas change should be able to update both at the same time.18:17
rm_workit would make sense to include the LB_ID somehow in OTHER drivers18:17
aannuusshhkkaaso we will be able to uniquely identify the loadbalancer from the listener_id by querying the DB again18:18
rm_workin the DB driver, it's stored in such a way that retrieval WILL have the LB_ID18:18
rm_workbut we'll maybe want to look it up before sending to influx18:18
johnsomRight, I expect the "external" drivers will want to know that relationship.18:18
johnsomThe question is does it need to be in the message from the amps? probably not18:19
rm_workoh if the amp has it... MAYBE18:21
rm_workit saves us a DB query18:21
rm_workwhich ... for health...18:21
aannuusshhkkaayeah.. that is what i was thinking.. one less hit to the DB18:21
johnsomYeah, I keep thinking of these as separate messages, but they aren't.  We have to have the LB ID for health.18:22
rm_workerr18:23
rm_worki don't think so? but18:23
johnsomhttps://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L15918:23
johnsomWell, we could reverse lookup, but it's like the very first query18:24
rm_workit's still the part that is running in our critical path18:24
rm_workah yeah but we need the whole LB not just the ID18:24
johnsomIt's my brain malfunction that keeps thinking they are separate.18:24
rm_workfor the stats we'd just need the ID18:24
johnsomYeah18:24
rm_workso do we make ANOTHER query to the DB for the LB_ID for the stats message for non-db drivers?18:25
rm_workor... include it in the message18:25
johnsomWell, it's already there in the message, so I say we just keep it18:25
rm_workerrr18:27
rm_worki don't think it is?18:27
johnsomOh, ID is amphora id....18:29
rm_workright18:29
rm_workwe removed it from ... like... the sample output format the mixin (that was never used) defined18:29
rm_workbut passing LB_ID to another driver would require a lookup18:30
rm_workin the health (speed sensitive) section18:30
aannuusshhkkaalooks like we are fetching the LB in VRRPDriverMixin in the very next function.. could we use the same one?18:30
johnsomHealth should be in its own thread/process though right? Stats gets split off, so a lookup probably isn't too bad. Plus, we are going to need/want the project ID I expect when we send it out to other external targets18:32
rm_worksorry i just mean "health in general"18:33
rm_workthe "health manager" is all kinda critical path... in that either backing up is not good18:38
rm_workhealth-type-message is worse obviously18:39
rm_workbut stats backing up isn't great either18:39
*** mloza has joined #openstack-lbaas18:39
johnsomYeah18:42
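The roll-up discussed above, where per-listener deltas are attributed to the parent load balancer, can be sketched as follows. The mapping from listener ID to load balancer ID stands in for either a DB lookup or an ID carried in the amphora message; all names and counter values are hypothetical.

```python
from collections import defaultdict

# Hypothetical parent lookup: listener ID -> load balancer ID.
listener_to_lb = {
    "listener-a": "lb-1",
    "listener-b": "lb-1",
    "listener-c": "lb-2",
}


def roll_up(listener_deltas, parent_of):
    """Sum per-listener counter deltas into per-load-balancer totals."""
    totals = defaultdict(lambda: defaultdict(int))
    for listener_id, counters in listener_deltas.items():
        lb_id = parent_of[listener_id]
        for counter_name, value in counters.items():
            totals[lb_id][counter_name] += value
    # Convert nested defaultdicts to plain dicts for easier consumption.
    return {lb_id: dict(counters) for lb_id, counters in totals.items()}


deltas = {
    "listener-a": {"bytes_in": 100, "bytes_out": 250},
    "listener-b": {"bytes_in": 40, "bytes_out": 10},
    "listener-c": {"bytes_in": 7, "bytes_out": 3},
}

lb_totals = roll_up(deltas, listener_to_lb)
print(lb_totals["lb-1"])  # {'bytes_in': 140, 'bytes_out': 260}
```

If the load balancer ID arrives with the stats message, the parent lookup becomes a plain dictionary access instead of an extra query in the health manager path, which is the trade-off weighed in the conversation.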
*** rouk has joined #openstack-lbaas18:47
roukfor moving to train, aka the multi ca change, theres nothing in kolla for the actual upgrade, it does the cert and config placement fine, but wont update amphoras client CA while octavia is down, etc. will it work as intended if we push the new certs and restart the agents right after the octavia services come back up?19:30
johnsomOctavia production deployments have always been multi-CA. (just saying, but I know some of the deployment tools copied the old devstack setup that used a single CA)19:33
roukyeah, kolla-ansible was single.19:34
roukwhich is fun.19:34
roukso im just looking if me doing the change then pushing the amphora client CA file and restarting the agent 1-20 seconds after octavia is reconfigured would do any damage19:35
johnsomSo, if it is rotating all of the certificates, you will have to failover the amphora (which will happen automatically, but maybe at a larger volume than you would like). If it is only rotating some of them, you can set the cert expiration dates in the DB and the housekeeping process will rotate them automatically. I just don't know what kolla has done for the transition. Either way, I would consider stopping the health manager while you transition and heavily test the upgrade on a throw away deployment.19:36
johnsomYeah, worst case, Octavia will think they are compromised amphora and just rebuild them via failovers.19:37
johnsomYou will see messages in the health manager if it thinks the certs are bad.19:37
roukhttps://etherpad.opendev.org/p/octavia-single-ca-to-multi-ca im just reading this, which implies the only thing the amphora needs to do is have the new CA copied to it?19:37
johnsomAs for the agent, yeah, a simple restart will pick up new certs.19:37
roukand if i do that fast enough, i shouldnt have major rebuilds, right?19:38
johnsomOr just stop your health managers for the period of time you are working.19:38
roukyeah, but thats the only thing i need to do? theres nothing im missing? just have the client CA there before health manager comes back?19:39
johnsomAh, yeah, that etherpad. I wrote that up over a year ago. Let me refresh my memory.19:39
rouktrying to avoid a rebuild on 134 amphoras, which will be... quite the processing.19:40
johnsomThat was also targeted at the tripleo deployer, just FYI19:40
*** gthiemonge has quit IRC19:40
roukyeah, kolla-ansible does exactly nothing, just places certs and configs in the control plane, so im using this as an example of what pieces are missing.19:40
roukwhich, looks like i just need to copy the CA and im good, and i wanted to confirm that i wasnt crazy.19:41
*** gthiemonge has joined #openstack-lbaas19:41
johnsomYeah, I think it's both copy the CA over and, if you are changing out the "server" CA, you will need to set the expiration dates in the DB, Line 9019:43
johnsomLet the housekeeping update the "server" certs in the amps, then re-enable HM19:44
roukwe are not swapping out the server ca this update19:46
roukoh, nevermind, other way around, adding server ca, so i guess we have to do that expiration.19:47
johnsomYeah, there is context of which side is client and which is server in the guide: https://docs.openstack.org/octavia/latest/admin/guides/certificates.html19:48
johnsomIt's a two-way authentication, so can be a bit confusing19:48
roukhow long do you think for housekeeping to respond to 140 amphoras needing certs issued?19:50
roukkeep healthmanager down for an hour?19:50
johnsomOh, I doubt you will need more than half an hour. It will log each rotation19:50
roukalright19:51
*** gcheresh has joined #openstack-lbaas19:51
*** ataraday_ has quit IRC19:56
*** vishalmanchanda has quit IRC20:16
*** gcheresh has quit IRC21:10
*** spatel has joined #openstack-lbaas21:13
*** spatel has quit IRC21:36
*** maciejjozefczyk has quit IRC21:36
*** spatel has joined #openstack-lbaas21:42
*** spatel has quit IRC21:46
*** spatel has joined #openstack-lbaas21:52
*** spatel has quit IRC22:10
*** gthiemonge has quit IRC22:10
*** gthiemonge has joined #openstack-lbaas22:11
*** TrevorV has quit IRC22:16
*** spatel has joined #openstack-lbaas22:28
*** spatel has quit IRC22:31
*** rcernin_ has joined #openstack-lbaas22:33
*** born2bake has quit IRC22:42
*** rcernin_ has quit IRC22:47
*** tkajinam has joined #openstack-lbaas22:51
*** rcernin_ has joined #openstack-lbaas23:02
*** rcernin_ has quit IRC23:16
*** rcernin has joined #openstack-lbaas23:18
