*** hongbin has quit IRC | 00:29 | |
*** hongbin has joined #openstack-lbaas | 00:40 | |
*** wuchunyang has joined #openstack-lbaas | 00:59 | |
*** wuchunyang has quit IRC | 01:05 | |
*** yamamoto has quit IRC | 02:28 | |
*** yamamoto has joined #openstack-lbaas | 02:36 | |
*** rcernin_ has joined #openstack-lbaas | 02:58 | |
*** rcernin has quit IRC | 02:59 | |
*** rcernin_ has quit IRC | 03:16 | |
*** ramishra has joined #openstack-lbaas | 03:30 | |
*** rcernin_ has joined #openstack-lbaas | 03:32 | |
*** psachin has joined #openstack-lbaas | 03:39 | |
*** rcernin_ has quit IRC | 03:45 | |
*** rcernin has joined #openstack-lbaas | 03:45 | |
*** wuchunyang has joined #openstack-lbaas | 04:02 | |
*** wuchunyang has quit IRC | 04:06 | |
*** vishalmanchanda has joined #openstack-lbaas | 04:29 | |
*** hongbin has quit IRC | 04:39 | |
*** rcernin has quit IRC | 05:32 | |
*** gcheresh has joined #openstack-lbaas | 05:34 | |
*** rcernin has joined #openstack-lbaas | 05:40 | |
*** rpittau|afk is now known as rpittau | 06:21 | |
openstackgerrit | Merged openstack/octavia master: fix(elements): fix nf_conntrack sysctl param names https://review.opendev.org/706674 | 07:02 |
*** maciejjozefczyk has joined #openstack-lbaas | 07:09 | |
*** stingrayza has joined #openstack-lbaas | 07:23 | |
*** also_stingrayza has quit IRC | 07:25 | |
*** rcernin_ has joined #openstack-lbaas | 07:47 | |
*** rcernin has quit IRC | 07:47 | |
*** rcernin_ has quit IRC | 07:54 | |
*** born2bake has joined #openstack-lbaas | 08:13 | |
*** ccamposr__ has joined #openstack-lbaas | 08:14 | |
*** ccamposr has quit IRC | 08:17 | |
*** ataraday_ has joined #openstack-lbaas | 08:32 | |
*** salmankhan has joined #openstack-lbaas | 08:33 | |
*** salmankhan has quit IRC | 08:36 | |
*** dayou_ has joined #openstack-lbaas | 08:36 | |
*** dayou has quit IRC | 08:39 | |
openstackgerrit | Merged openstack/octavia master: Cap jsonschema 3.2.0 as the minimal version https://review.opendev.org/730961 | 09:05 |
*** tkajinam has quit IRC | 09:21 | |
dulek | Hi! Can I ask you to take a look at why kuryr-kubernetes-tempest-(train|stein) are failing on https://review.opendev.org/#/c/734364? | 09:41 |
dulek | "/opt/stack/devstack/inc/python: line 456: cd: /opt/stack/diskimage-builder: No such file or directory" - this is pretty specific, we're probably missing something in local.conf? Does it ring a bell? | 09:41 |
*** ivve has joined #openstack-lbaas | 09:46 | |
ivve | oi folks, i've got a question about recreation of vrrp ports. i've got the usual scenario of losing network connectivity: octavia loses connection to the lbs and tries to fail them over, which of course fails due to the same network issues, and then i'm left with tons of lbs in ERROR. the vrrp ports are missing, so i recreate them, but now i get this when failing them over: | 09:48 |
ivve | Amphora c198a06d-a4c4-4b4c-a35f-70b06ea5fb76 failover exception: subnet not found (subnet id: None).: SubnetNotFound: subnet not found (subnet id: None). | 09:48 |
ivve | using the command: neutron port-create --tenant-id <LB project/tenant ID> --name octavia-lb-vrrp-<amp ID> --security-group lb-<lb ID> --allowed-address-pair ip_address=<VIP IP address> <network ID for VIP> | 09:49 |
ivve | the info (i assume) should come from the ports as neither the loadbalancer nor the amphora keeps that in the db | 09:52 |
ivve | the fixed_ips field on the port does contain the correct {"subnet_id": xxx} | 09:53 |
ivve | so im a bit confused on where it is looking atm | 09:53 |
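[Editor's note: for reference, a roughly equivalent invocation with the unified openstack client, pinning the VIP subnet explicitly via --fixed-ip. All IDs, the IP, and the port name are placeholders; this is an illustrative sketch of the port recreation discussed above, not a verified fix for the SubnetNotFound failover error.]

    # Recreate the amphora VRRP port, explicitly naming the VIP subnet
    openstack port create \
        --project <LB project/tenant ID> \
        --network <network ID for VIP> \
        --fixed-ip subnet=<VIP subnet ID> \
        --security-group lb-<lb ID> \
        --allowed-address ip-address=<VIP IP address> \
        octavia-lb-vrrp-<amp ID>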
*** rpittau is now known as rpittau|bbl | 10:15 | |
ivve | similar issue described here: http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 | 10:21 |
ivve | old but still relevant and breaks in the same way | 10:21 |
*** wuchunyang has joined #openstack-lbaas | 10:38 | |
*** wuchunyang has quit IRC | 10:49 | |
*** wuchunyang has joined #openstack-lbaas | 10:49 | |
dulek | cgoncalves: Thanks for help! | 10:55 |
*** TMM has quit IRC | 10:56 | |
*** TMM has joined #openstack-lbaas | 10:56 | |
ivve | btw, is there any way to disable octavia's automatic failover/recreation of objects/resources? | 11:07 |
ivve | every time octavia loses network connectivity i'd like warnings and logs rather than a full environment failover | 11:07 |
ivve | as it rarely fails any other way, i.e. a host dies and INSTANTLY needs a new amphora | 11:08 |
ivve | i'd rather just be notified that an active/passive amphora died and needs action. this would greatly reduce the aftermath when octavia tries to "solve" a backend network issue, which it probably will never be able to do; it's kinda too much to ask for imo | 11:09 |
ivve | i guess i could set an extreme heartbeat_timeout? | 11:20 |
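[Editor's note: a minimal octavia.conf sketch of the knob mentioned above. heartbeat_timeout is a real [health_manager] option (seconds of missed heartbeats before an amphora is considered failed); the value shown is only an example, and raising it delays automatic failovers rather than disabling them.]

    [health_manager]
    # Seconds to wait for an amphora heartbeat before the health manager
    # declares the amphora failed and triggers an automatic failover.
    heartbeat_timeout = 600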
*** wuchunyang has quit IRC | 12:11 | |
*** servagem has joined #openstack-lbaas | 12:16 | |
*** yamamoto has quit IRC | 12:22 | |
*** rpittau|bbl is now known as rpittau | 12:22 | |
*** yamamoto has joined #openstack-lbaas | 12:35 | |
*** riuzen has joined #openstack-lbaas | 12:59 | |
*** riuzen has quit IRC | 13:20 | |
*** TrevorV has joined #openstack-lbaas | 13:30 | |
*** psachin has quit IRC | 13:44 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: DNM check UDP pool fix https://review.opendev.org/737283 | 14:51 |
*** TMM has quit IRC | 15:05 | |
*** TMM has joined #openstack-lbaas | 15:05 | |
*** armax has joined #openstack-lbaas | 15:09 | |
*** rpittau is now known as rpittau|afk | 16:01 | |
*** aannuusshhkkaa has joined #openstack-lbaas | 16:04 | |
*** ccamposr has joined #openstack-lbaas | 16:08 | |
aannuusshhkkaa | Hello! shtepanie, rm_work and I have been working on updating the amphora stats driver interface. It is still a WIP. Here is the link to the change we have put up: https://review.opendev.org/#/c/737111/1 . Any and all reviews/comments are welcome! | 16:11 |
*** ccamposr__ has quit IRC | 16:11 | |
*** shtepanie has joined #openstack-lbaas | 16:27 | |
rm_work | johnsom: ^^ hopefully that's moving the right direction | 16:37 |
johnsom | Yeah, was going to take a look this morning after I dig out from weekend/Monday e-mails | 16:38 |
rm_work | plan is to do the status driver interface the same way, it's just much more complicated | 16:38 |
rm_work | well, a bit more complicated | 16:38 |
*** gcheresh has quit IRC | 16:56 | |
cgoncalves | octavia-v2-dsvm-scenario SUCCESS in 38m 58s | 17:09 |
johnsom | So it didn't run? | 17:10 |
cgoncalves | it did -- https://0c1967d9212ec47f9513-eccda9a716b7d91f091af6c9420bdc89.ssl.cf5.rackcdn.com/731416/1/check/octavia-v2-dsvm-scenario/33ff8cd/testr_results.html | 17:10 |
johnsom | Devstack install in 18 minutes, what voodoo has mnaser invoked? (ubuntu-bionic-vexxhost-ca-ymq-1-0016825754) | 17:13 |
mnaser | johnsom: may or may not be super fast new amd epyc gen 2 machines with raid-0'd local storage | 17:14 |
mnaser | :) | 17:14 |
johnsom | mnaser Sold! | 17:14 |
mnaser | that's awesome feedback to hear, haha | 17:14 |
mnaser | johnsom: not announced yet tho ;) | 17:15 |
johnsom | mnaser Just to give you an idea, that 38 minute job run on your gear takes 1:50 on a different cloud.... | 17:16 |
mnaser | aha. I love the "so it didn't run" comment | 17:19 |
johnsom | Yeah, that is usually what gets that kind of result. Tempest just skips all of the tests, etc. | 17:19 |
mnaser | johnsom: i think we have nested virt enabled too on those with a _much_ newer kernel too | 17:20 |
cgoncalves | for sure with nested virt | 17:20 |
johnsom | mnaser Yeah, it's clearly a combination. Nested virt usually takes it down to about an hour. | 17:20 |
johnsom | mnaser Congratulations. You get the Octavia team "smokin' fast cloud" award for 2020. | 17:23 |
mnaser | wewt | 17:23 |
mnaser | \o/ | 17:23 |
rm_work | dayum | 17:25 |
openstackgerrit | Michael Johnson proposed openstack/octavia-tempest-plugin master: Fix availability zone API tests https://review.opendev.org/737191 | 17:29 |
johnsom | rm_work have a minute to chat about the stats patch? | 17:48 |
rm_work | i think we prolly do -- aannuusshhkkaa / shtepanie | 17:48 |
johnsom | Ok, just wanted to bounce some thoughts around before I commented | 17:48 |
johnsom | So, I like the idea of moving the packet parsing up to the amphora driver. This makes sense to me. | 17:49 |
johnsom | We could be a bit more bold and nuke this whole mixin thing, as I'm not sure it brings us any value. | 17:50 |
rm_work | yeaaahhh i'm not sure why it's a mixin? | 17:50 |
rm_work | i mean... i think we did kinda nuke part of it? | 17:50 |
johnsom | Also, we might consider moving the octavia.amphora.stats_update_drivers stevedore lookup to a singleton as I don't think it will really get live-swapped. Though open to thoughts on that. | 17:50 |
johnsom | Yeah, exactly, I think we should just remove the whole mixin stuff on the amp side. It's just extra code we don't really use/need. | 17:51 |
rm_work | is there any on the amp? | 17:52 |
johnsom | https://review.opendev.org/#/c/737111/1/octavia/amphorae/drivers/driver_base.py | 17:52 |
johnsom | That bit seems.... | 17:52 |
johnsom | Yeah, I didn't mean in the amp, but under the amp driver. | 17:55 |
johnsom | The current code hops back and forth which is lame. | 17:56 |
rm_work | hmm | 17:57 |
rm_work | yeah | 17:57 |
rm_work | err tho on the stevedore part | 17:57 |
rm_work | it's gonna be a loop over "handlers" rather than "handler" i think? | 17:58 |
rm_work | or does stevedore have a native way to handle that | 17:58 |
johnsom | I think there is a native "call this on all" option. Let me refresh my memory | 17:58 |
johnsom | Maybe https://docs.openstack.org/stevedore/latest/user/patterns_loading.html#hooks-single-name-many-entry-points ? | 18:00 |
johnsom | Or maybe https://docs.openstack.org/stevedore/latest/reference/index.html#namedextensionmanager | 18:01 |
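[Editor's note: a rough Python sketch of the two ideas above -- loading the stats drivers once as a module-level singleton and fanning each update out to all of them with stevedore. The namespace matches the one mentioned earlier in the discussion, but the config option name and the update_stats method name are assumptions for illustration only.]

    # Hypothetical sketch: load all configured stats drivers once and
    # dispatch every batch of listener stats to each of them.
    from oslo_config import cfg
    from stevedore import named

    CONF = cfg.CONF

    _STATS_DRIVERS = None  # lazily-initialized singleton


    def _get_stats_drivers():
        global _STATS_DRIVERS
        if _STATS_DRIVERS is None:
            _STATS_DRIVERS = named.NamedExtensionManager(
                namespace='octavia.amphora.stats_update_drivers',
                names=CONF.health_manager.statistics_drivers,  # assumed option name
                invoke_on_load=True)
        return _STATS_DRIVERS


    def update_stats(listener_stats):
        # map_method() invokes the named method on every loaded extension.
        _get_stats_drivers().map_method('update_stats', listener_stats)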
rm_work | hmm | 18:04 |
aannuusshhkkaa | https://www.irccloud.com/pastebin/qT9IamKu/ | 18:11 |
johnsom | I think that is ok. The listener IDs are globally unique. | 18:12 |
aannuusshhkkaa | right, and we dont want loadbalancer_id at all? Wouldn't it help in appropriate "roll-ups"? | 18:15 |
johnsom | Well, there is a direct relationship between the load balancer (parent) and the listener (child). | 18:16 |
aannuusshhkkaa | aah okay | 18:17 |
johnsom | So, the new SQL for the deltas change should be able to update both at the same time. | 18:17 |
rm_work | it would make sense to include the LB_ID somehow in OTHER drivers | 18:17 |
aannuusshhkkaa | so we will be able to uniquely identify the loadbalancer from the listener_id by querying the DB again | 18:18 |
rm_work | in the DB driver, it's stored in such a way that retrieval WILL have the LB_ID | 18:18 |
rm_work | but we'll maybe want to look it up before sending to influx | 18:18 |
johnsom | Right, I expect the "external" drivers will want to know that relationship. | 18:18 |
johnsom | The question is does it need to be in the message from the amps? probably not | 18:19 |
rm_work | oh if the amp has it... MAYBE | 18:21 |
rm_work | it saves us a DB query | 18:21 |
rm_work | which ... for health... | 18:21 |
aannuusshhkkaa | yeah.. that is what i was thinking.. one less hit to the DB | 18:21 |
johnsom | Yeah, I keep thinking of these as separate messages, but they aren't. We have to have the LB ID for health. | 18:22 |
rm_work | err | 18:23 |
rm_work | i don't think so? but | 18:23 |
johnsom | https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L159 | 18:23 |
johnsom | Well, we could reverse lookup, but it's like the very first query | 18:24 |
rm_work | it's still the part that is running in our critical path | 18:24 |
rm_work | ah yeah but we need the whole LB not just the ID | 18:24 |
johnsom | It's my brain malfunction that keeps thinking they are separate. | 18:24 |
rm_work | for the stats we'd just need the ID | 18:24 |
johnsom | Yeah | 18:24 |
rm_work | so do we make ANOTHER query to the DB for the LB_ID for the stats message for non-db drivers? | 18:25 |
rm_work | or... include it in the message | 18:25 |
johnsom | Well, it's already there in the message, so I say we just keep it | 18:25 |
rm_work | errr | 18:27 |
rm_work | i don't think it is? | 18:27 |
johnsom | Oh, ID is amphora id.... | 18:29 |
rm_work | right | 18:29 |
rm_work | we removed it from ... like... the sample output format the mixin (that was never used) defined | 18:29 |
rm_work | but passing LB_ID to another driver would require a lookup | 18:30 |
rm_work | in the health (speed sensitive) section | 18:30 |
aannuusshhkkaa | looks like we are fetching the LB in VRRPDriverMixin in the very next function.. could we use the same one? | 18:30 |
johnsom | Health should be in its own thread/process though, right? Stats gets split off, so a lookup probably isn't too bad. Plus, I expect we are going to need/want the project ID when we send it out to other external targets | 18:32 |
rm_work | sorry i just mean "health in general" | 18:33 |
rm_work | the "health manager" is all kinda critical path... in that either backing up is not good | 18:38 |
rm_work | health-type-message is worse obviously | 18:39 |
rm_work | but stats backing up isn't great either | 18:39 |
*** mloza has joined #openstack-lbaas | 18:39 | |
johnsom | Yeah | 18:42 |
*** rouk has joined #openstack-lbaas | 18:47 | |
rouk | for moving to train, aka the multi-CA change, there's nothing in kolla for the actual upgrade. it does the cert and config placement fine, but won't update the amphoras' client CA while octavia is down, etc. will it work as intended if we push the new certs and restart the agents right after the octavia services come back up? | 19:30 |
johnsom | Octavia production deployments have always been multi-CA. (just saying, but I know some of the deployment tools copied the old devstack setup that used a single CA) | 19:33 |
rouk | yeah, kolla-ansible was single. | 19:34 |
rouk | which is fun. | 19:34 |
rouk | so i'm just checking whether doing the change, then pushing the amphora client CA file and restarting the agent 1-20 seconds after octavia is reconfigured, would do any damage | 19:35 |
johnsom | So, if it is rotating all of the certificates, you will have to failover the amphorae (which will happen automatically, but maybe at a larger volume than you would like). If it is only rotating some of them, you can set the cert expiration dates in the DB and the housekeeping process will rotate them automatically. I just don't know what kolla has done for the transition. Either way, I would consider stopping the health manager while you transition, and heavily test the upgrade on a throwaway deployment. | 19:36 |
johnsom | Yeah, worst case, Octavia will think they are compromised amphora and just rebuild them via failovers. | 19:37 |
johnsom | You will see messages in the health manager if it thinks the certs are bad. | 19:37 |
rouk | https://etherpad.opendev.org/p/octavia-single-ca-to-multi-ca im just reading this, which implies the only thing the amphora needs to do is have the new CA copied to it? | 19:37 |
johnsom | As for the agent, yeah, a simple restart will pick up new certs. | 19:37 |
rouk | and if i do that fast enough, i shouldnt have major rebuilds, right? | 19:38 |
johnsom | Or just stop your health managers for the period of time you are working. | 19:38 |
rouk | yeah, but thats the only thing i need to do? theres nothing im missing? just have the client CA there before health manager comes back? | 19:39 |
johnsom | Ah, yeah, that etherpad. I wrote that up over a year ago. Let me refresh my memory. | 19:39 |
rouk | trying to avoid a rebuild on 134 amphoras, which will be... quite the processing. | 19:40 |
johnsom | That was also targeted at the tripleo deployer, just FYI | 19:40 |
*** gthiemonge has quit IRC | 19:40 | |
rouk | yeah, kolla-ansible does exactly nothing, just places certs and configs in the control plane, so im using this as an example of what pieces are missing. | 19:40 |
rouk | which, looks like i just need to copy the CA and im good, and i wanted to confirm that i wasnt crazy. | 19:41 |
*** gthiemonge has joined #openstack-lbaas | 19:41 | |
johnsom | Yeah, I think it's both: copy the CA over and, if you are changing out the "server" CA, set the expiration dates in the DB (line 90 of the etherpad) | 19:43 |
johnsom | Let the housekeeping update the "server" certs in the amps, then re-enable HM | 19:44 |
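[Editor's note: an illustrative SQL sketch of "set the expiration dates in the DB" so housekeeping rotates the amphora agent certificates. The octavia amphora table does carry a cert_expiration column, but verify table/column names and the status value against your release and back up the database first; this is a sketch, not a vetted upgrade procedure.]

    -- Mark every in-service amphora's agent certificate as expired so the
    -- housekeeping cert-rotation task replaces it on its next pass.
    UPDATE amphora
       SET cert_expiration = NOW()
     WHERE status = 'ALLOCATED';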
rouk | we are not swapping out the server ca this update | 19:46 |
rouk | oh, nevermind, other way around, adding server ca, so i guess we have to do that expiration. | 19:47 |
johnsom | Yeah, there is context of which side is client and which is server in the guide: https://docs.openstack.org/octavia/latest/admin/guides/certificates.html | 19:48 |
johnsom | It's a two-way authentication, so can be a bit confusing | 19:48 |
rouk | how long do you think for housekeeping to respond to 140 amphoras needing certs issued? | 19:50 |
rouk | keep healthmanager down for an hour? | 19:50 |
johnsom | Oh, I doubt you will need more than half an hour. It will log each rotation | 19:50 |
rouk | alright | 19:51 |
*** gcheresh has joined #openstack-lbaas | 19:51 | |
*** ataraday_ has quit IRC | 19:56 | |
*** vishalmanchanda has quit IRC | 20:16 | |
*** gcheresh has quit IRC | 21:10 | |
*** spatel has joined #openstack-lbaas | 21:13 | |
*** spatel has quit IRC | 21:36 | |
*** maciejjozefczyk has quit IRC | 21:36 | |
*** spatel has joined #openstack-lbaas | 21:42 | |
*** spatel has quit IRC | 21:46 | |
*** spatel has joined #openstack-lbaas | 21:52 | |
*** spatel has quit IRC | 22:10 | |
*** gthiemonge has quit IRC | 22:10 | |
*** gthiemonge has joined #openstack-lbaas | 22:11 | |
*** TrevorV has quit IRC | 22:16 | |
*** spatel has joined #openstack-lbaas | 22:28 | |
*** spatel has quit IRC | 22:31 | |
*** rcernin_ has joined #openstack-lbaas | 22:33 | |
*** born2bake has quit IRC | 22:42 | |
*** rcernin_ has quit IRC | 22:47 | |
*** tkajinam has joined #openstack-lbaas | 22:51 | |
*** rcernin_ has joined #openstack-lbaas | 23:02 | |
*** rcernin_ has quit IRC | 23:16 | |
*** rcernin has joined #openstack-lbaas | 23:18 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!