Friday, 2022-04-22

opendevreview	Dan Smith proposed openstack/devstack master: WIP: Test static perfdata comparisons https://review.opendev.org/c/openstack/devstack/+/838947	00:08
*** pojadhav|out is now known as pojadhav		04:59
*** bhagyashris is now known as bhagyashris|ruck		05:33
*** pojadhav is now known as pojadhav|lunch		06:27
*** jpena|off is now known as jpena		06:59
*** pojadhav|lunch is now known as pojadhav		07:11
opendevreview	Balazs Gibizer proposed openstack/devstack master: Use proper sed separator for paths https://review.opendev.org/c/openstack/devstack/+/839034	10:05
opendevreview	Merged openstack/devstack master: modify the sample value of LOGDAYS https://review.opendev.org/c/openstack/devstack/+/838829	11:09
sean-k-mooney	clarkb: i have been playing with the virtual env patch actually this morning but in a very different setup	11:42
sean-k-mooney	clarkb: it was working pretty well on centos9 for me before	11:43
sean-k-mooney	this morning i have been trying to use it to install on arm (m1 macbook air) using debian testing/bookworm	11:43
sean-k-mooney	clarkb: ... privsep	12:09
sean-k-mooney	privsep is not using the same python as everything else	12:10
sean-k-mooney	oh i bet this is the console script edge case where the python version is not updated after the initial install	12:14
sean-k-mooney	i did originally try to stack with 10 a few weeks ago	12:14
sean-k-mooney	i bet that is the issue	12:14
*** pojadhav is now known as pojadhav|afk		13:23
opendevreview	Dan Smith proposed openstack/devstack master: Update our API call counting method https://review.opendev.org/c/openstack/devstack/+/839067	14:00
dansmith	clarkb: gmann: I'm starting to realize that some of these "stable" metrics aren't actually that performance-independent	14:05
dansmith	like api calls.. on a slower worker, tempest will poll more times for resource completion, inflating that number	14:05
dansmith	that's actually not that hard to filter out	14:06
dansmith	but, the db queries it generates is	14:06
dansmith	service memory footprint seems to also be somewhat unreliable, like by 30% in some cases, which seems odd to me	14:07
*** akekane_ is now known as abhishekk		15:15
clarkb	dansmith: oh interesting	15:18
sean-k-mooney	dansmith: is that 30% in the same job but from different runs	15:23
sean-k-mooney	or stack/unstack on your local system	15:23
sean-k-mooney	that does seem higher than i would expect too	15:23
dansmith	sean-k-mooney: same job, same patch, two subsequent runs	15:28
dansmith	(in ci, not local)	15:28
sean-k-mooney	dansmith: did it hit the same provider? i would not think that should affect memory like that but if it was slower etc maybe caches are expanding or something over time	15:29
dansmith	sean-k-mooney: nope, different providers, which is likely why it took 10 minutes longer on one, and had more polling	15:32
sean-k-mooney	ya so we might be buffering the logs or something that would result in larger memory usage	15:33
dansmith	30% seems like a lot for that difference, but yeah, something	15:34
dansmith	I was thinking more like the extra polling ended up spawning more horizontal workers or something	15:34
sean-k-mooney	ya it does	15:34
dansmith	more tempest tests waiting and polling all at once or something	15:34
sean-k-mooney	perhaps or more queued requests in apache	15:34
sean-k-mooney	the way we run say nova-api behind apache in uwsgi means that we get no scaling from eventlet	15:35
sean-k-mooney	all requests are queued in apache and then each uwsgi python process handles one request at a time	15:36
sean-k-mooney	so more polling means more things in the queue and more memory usage	15:36
dansmith	well, the things in the queue are small though, and not charged against the python process until they're dispatched right?	15:38
sean-k-mooney	hum ya i guess that would be correct it would be against apache	15:38
sean-k-mooney	not the python process	15:38
sean-k-mooney	and ya its just the http get which in most cases is tiny	15:39
dansmith	right	15:39
dansmith	especially in the polling case	15:39
sean-k-mooney	although each request will translate to a db query	15:39
sean-k-mooney	and probably memcache lookup	15:40
sean-k-mooney	so its probably on the service -> db side that its increasing	15:40
dansmith	memcache when?	15:42
dansmith	not for things like instance show I wouldn't think	15:43
sean-k-mooney	i was thinking the keystone tokens dont we cache that in memcache	15:43
dansmith	and memcached might go up, but not sure why memcache calls from the python services would	15:43
sean-k-mooney	i guess that would not affect the python usage	15:44
sean-k-mooney	its an interesting result in any case	15:44
sean-k-mooney	i assume os-profiler or the other tools we have would not help narrow down why its increasing	15:45
dansmith	yeah I dunno, but yes definitely interesting	15:45
dansmith	I think focusing on the should-be-repeatable metrics first is probably most useful	15:46
dansmith	db queries being impacted by polling is troubling,	15:46
dansmith	so I probably need to separate out SELECT vs. (everything else) or something	15:46
dansmith	but even still, the impetus for this was additional select queries due to rbac, so it would only help for large spikes	15:46
sean-k-mooney	is that select load confined to keystone or across all services	15:48
sean-k-mooney	i would not have expected this to affect other services	15:48
dansmith	of course it's all services, because it	15:49
dansmith	is polling for instance -> active or something :)	15:49
dansmith	which makes it.. a mess :)	15:49
sean-k-mooney	right but that would increase the http request to keystone from nova	15:49
sean-k-mooney	sorry from nova to keystone	15:49
sean-k-mooney	to validate the tokens	15:49
dansmith	so keystone goes up as well because of all the token validation, but so does straight up nova db queries, pulling the instance each time	15:50
sean-k-mooney	ya but that last part is partly unavoidable	15:50
sean-k-mooney	although i think status is in the non detail endpoint	15:50
dansmith	it is, and that's the point	15:50
sean-k-mooney	so if its just active they should not use the detail one	15:50
sean-k-mooney	i know ceilometer fixed that a few years ago	15:50
dansmith	if we're trying to alert on "why is nova doing a bunch more db queries on this patch" it's hard to separate that from "this is a slow worker and tempest did a bunch more polling on just this run"	15:51
sean-k-mooney	ya	15:51
sean-k-mooney	maybe you could synthesize a metric like queries/jobtime or something that was less affected	15:52
dansmith	yeah, so I can tell what api calls are tempest and which are inter-service, so maybe figuring out what the proportion is and then apply that to the db queries would help normalize it	15:54
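A rough Python sketch of the two normalization ideas floated above: queries per unit of job time, and scaling db queries by the non-tempest share of API calls. All function names, agent labels, and numbers are hypothetical, and this is only one possible reading of the suggestion, not what the devstack patch does.

```python
def tempest_share(api_calls_by_agent, tempest_prefix="tempest"):
    """Fraction of API calls attributed to tempest (0.0 if there are none).

    api_calls_by_agent maps a caller label (assumed here to be something
    like a user-agent string) to a call count. Both the labels and the
    "tempest" prefix are illustrative assumptions.
    """
    total = sum(api_calls_by_agent.values())
    if total == 0:
        return 0.0
    tempest = sum(
        count
        for agent, count in api_calls_by_agent.items()
        if agent.lower().startswith(tempest_prefix)
    )
    return tempest / total


def normalized_db_queries(db_queries, api_calls_by_agent):
    """Scale the raw query count by the share of calls NOT made by tempest.

    This assumes db load is roughly proportional to API calls, which is a
    simplification: polling GETs and inter-service calls cost different
    numbers of queries.
    """
    return db_queries * (1.0 - tempest_share(api_calls_by_agent))


def queries_per_second(db_queries, job_seconds):
    """The queries/jobtime metric: raw queries divided by job duration."""
    if job_seconds <= 0:
        raise ValueError("job_seconds must be positive")
    return db_queries / job_seconds
```

With made-up counts of 75 tempest calls and 25 inter-service calls, `tempest_share` is 0.75, so 1000 raw queries would be reported as 250 after normalization. A slower run with more polling raises both the raw count and the tempest share, which is what makes the scaled number less sensitive to worker speed.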
*** jpena is now known as jpena|off		15:55
sean-k-mooney	dansmith: clarkb https://twitter.com/sean_k_mooney/status/1517537626923929601?s=20&t=arNXrLIXTd_74nKhsAZauA	16:17
dansmith	nice	16:17
sean-k-mooney	i have not got vms booting fully yet. im missing some config for uefi to work properly	16:18
sean-k-mooney	but its close	16:18
sean-k-mooney	i am also using the global_venv	16:18
clarkb	nice! when you say natively is it still in a linux vm but running arm not emulated x86?	16:19
sean-k-mooney	nope	16:20
clarkb	(I wouldn't expect devstack to run on osx directly, really neat if so)	16:20
sean-k-mooney	linux running natively on m1	16:20
sean-k-mooney	no vm	16:20
sean-k-mooney	then devstack installed on that	16:20
clarkb	oh woww you are extremely brave :)	16:20
clarkb	that is cool though	16:20
sean-k-mooney	so debian testing	16:20
dansmith	ah, gdi, we're using a different log format for tls-proxy.log	16:21
sean-k-mooney	https://github.com/AsahiLinux/docs/wiki/SW%3AAlternative-Distros im using the debian community installer	16:21
dansmith	clarkb: gmann frickler: do you know why we're not using the "combined" format for the tls proxy log? we're specifically choosing a log format there, but it's lacking things like user-agent	16:23
clarkb	I don't	16:23
clarkb	are we adding port info? that may be why (its a common reason for opendev services to override common format at least)	16:24
gmann	me too, not sure about it.	16:24
clarkb	by default the combined format doesn't show you port info which is necessary to trace connections through a proxy	16:24
dansmith	it's very barebones	16:25
dansmith	CustomLog /var/log/apache2/tls-proxy_access.log "%{%Y-%m-%d}t %{%T}t.%{msec_frac}t [%l] %a \"%r\" %>s %b"	16:25
dansmith	so I'mma switch that to combined like our regular access.log if that's okay	16:26
clarkb	ya if port info isn't there then combined shouldn't be a regression. wfm	16:26
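For context, Apache's standard "combined" format is `%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`, so switching to it adds the user-agent field that the custom format above lacks. A minimal Python sketch of counting requests per user-agent from such a log follows. The sample lines and agent strings are made up, and this is not the counting code from the devstack patch.

```python
import re
from collections import Counter

# Matches one line in Apache's standard "combined" log format.
# Lines with escaped quotes inside the request or agent fields will not
# match this simple pattern and are skipped.
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_by_agent(lines):
    """Return a Counter of requests keyed by user-agent string."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


# Made-up sample data; the last line uses the old custom format and is
# expected to be skipped because it does not match the combined layout.
SAMPLE = [
    '203.0.113.5 - - [22/Apr/2022:16:25:00 +0000] '
    '"GET /compute/v2.1/servers/abc HTTP/1.1" 200 512 "-" "tempest-example/1.0"',
    '203.0.113.5 - - [22/Apr/2022:16:25:02 +0000] '
    '"GET /compute/v2.1/servers/abc HTTP/1.1" 200 512 "-" "tempest-example/1.0"',
    '203.0.113.9 - - [22/Apr/2022:16:25:03 +0000] '
    '"GET /identity/v3/auth/tokens HTTP/1.1" 200 2048 "-" "service-example/2.0"',
    '2022-04-22 16:25:04.123 [-] 203.0.113.5 "GET / HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    for agent, count in count_by_agent(SAMPLE).most_common():
        print(f"{count:5d}  {agent}")
```

As noted in the discussion, the combined format does not record port information, so this kind of per-agent counting cannot by itself trace a connection through the proxy.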
opendevreview	Dan Smith proposed openstack/devstack master: Update our API call counting method https://review.opendev.org/c/openstack/devstack/+/839067	16:31
sean-k-mooney	dansmith: clarkb got a vm booted :)	17:00
opendevreview	Dan Smith proposed openstack/devstack master: Update our API call counting method https://review.opendev.org/c/openstack/devstack/+/839067	17:02
sean-k-mooney	ok that might be premature i think it crashed	17:08
sean-k-mooney	which kind of make sense	17:08
dansmith	hah	17:08
sean-k-mooney	the current kernel im using does not yet support 4k pages	17:08
sean-k-mooney	its using 16k pages by default because of iommu issues	17:09
sean-k-mooney	oh yay ... sudo reboot does not work	17:10
sean-k-mooney	power button does. i think ill leave it off for a while its had a long day	17:10
sean-k-mooney	o/ talk to ye on monday	17:11
dansmith	shocking that apple even includes a power button anymore	17:13
opendevreview	Dan Smith proposed openstack/devstack master: Update our API call counting method https://review.opendev.org/c/openstack/devstack/+/839067	18:40
opendevreview	Dan Smith proposed openstack/devstack master: WIP: Test static perfdata comparisons https://review.opendev.org/c/openstack/devstack/+/838947	18:40
