Thursday, 2022-06-16

*** soniya29 is now known as soniya29|ruck  04:37
*** soniya29 is now known as soniya29|ruck  05:29
*** soniya is now known as soniya|ruck  06:45
*** elodilles is now known as elodilles_pto  07:04
*** jpena|off is now known as jpena  07:42
<opendevreview> Martin Kopec proposed openstack/patrole master: Drop py3.6 and py3.7 from Patrole  https://review.opendev.org/c/openstack/patrole/+/840457  08:48
<opendevreview> Martin Kopec proposed openstack/coverage2sql master: Update python testing as per zed cycle teting runtime  https://review.opendev.org/c/openstack/coverage2sql/+/841002  09:00
<opendevreview> Martin Kopec proposed openstack/os-performance-tools master: DNM CI jobs health check  https://review.opendev.org/c/openstack/os-performance-tools/+/846136  09:02
<opendevreview> Martin Kopec proposed openstack/devstack-tools master: DNM CI jobs health check  https://review.opendev.org/c/openstack/devstack-tools/+/846138  09:06
<opendevreview> Merged openstack/grenade master: docs: fix typo in README  https://review.opendev.org/c/openstack/grenade/+/839247  09:08
<opendevreview> Martin Kopec proposed openstack/tempest-stress master: DNM CI jobs health check  https://review.opendev.org/c/openstack/tempest-stress/+/846139  09:08
<opendevreview> Merged openstack/bashate master: Do not run pre-commit verbose by default  https://review.opendev.org/c/openstack/bashate/+/836142  09:13
<opendevreview> Martin Kopec proposed openstack/devstack-tools master: Drop py3.6 and py3.7 from devstack-tools  https://review.opendev.org/c/openstack/devstack-tools/+/846138  10:09
*** soniya is now known as soniya29|ruck  11:00
*** soniya29 is now known as soniya29|ruck  11:47
*** soniya is now known as soniya29|ruck  12:05
*** soniya is now known as soniya29|ruck  12:39
<opendevreview> Merged openstack/tempest master: Make test_server_actions.resource_setup() wait for SSHABLE  https://review.opendev.org/c/openstack/tempest/+/843155  13:31
*** soniya is now known as soniya|ruck  13:58
*** soniya29 is now known as soniya29|ruck  14:38
<opendevreview> Merged openstack/coverage2sql master: Update python testing as per zed cycle teting runtime  https://review.opendev.org/c/openstack/coverage2sql/+/841002  14:45
*** soniya29|ruck is now known as soniya29|out  15:18
<dansmith> clarkb: so to query individual values from the perf json in the web ui, do I need to select some other index?  15:57
<dansmith> seems like logstash-logs is all I can see  15:58
<clarkb> I'm not sure, dpawlik may know?  15:59
<dansmith> I figured, but he's not around  15:59
<dansmith> we've got maybe a regression to go hunt, but I'm not sure how  15:59
<clarkb> tristanC may know too  15:59
<opendevreview> Dan Smith proposed openstack/devstack master: Add perftop.py tool for examining performance  https://review.opendev.org/c/openstack/devstack/+/846198  16:19
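[Editor's note: a minimal sketch of the kind of perf.json querying discussed above. The schema here (a top-level "services" list with "service" and "MemoryCurrent" keys) is an assumption for illustration only, not the actual devstack performance.json layout or the perftop.py implementation.]

```python
import json


def top_memory_services(perf, n=5):
    """Return the n services with the largest memory usage, descending.

    Assumes a hypothetical devstack-style dump: a top-level "services"
    list of {"service": <name>, "MemoryCurrent": <bytes>} entries.
    Adjust the keys to whatever the real schema turns out to be.
    """
    services = perf.get("services", [])
    ranked = sorted(services, key=lambda s: s.get("MemoryCurrent", 0),
                    reverse=True)
    return [(s["service"], s["MemoryCurrent"]) for s in ranked[:n]]


# Example with a fabricated dump shaped like the assumption above;
# in practice you would json.load() a downloaded performance.json.
sample = {"services": [
    {"service": "devstack@n-cpu.service", "MemoryCurrent": 1_500_000_000},
    {"service": "devstack@c-bak.service", "MemoryCurrent": 600_000_000},
    {"service": "devstack@c-api.service", "MemoryCurrent": 300_000_000},
]}
print(top_memory_services(sample, n=2))
```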
<dansmith> sean-k-mooney: geguileo was looking at why c-bak was getting so big, which led us (him) to notice that n-cpu did as well in certain circumstances  16:20
<dansmith> moving the convo here since it's not cinder-specific anymore  16:20
<dansmith> sean-k-mooney: so my memory stat is rss not virt (like I thought it was), so your swap thing may be relevant,  16:21
<dansmith> although as you noted, they're all the same worker  16:21
<dansmith> but perhaps a pressure-induced behavior difference  16:21
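[Editor's note: the rss-vs-virt distinction mentioned above can be read straight out of `/proc` on Linux: `VmRSS` is the resident set (pages actually in RAM, which drops when pages are swapped out), while `VmSize` is the full virtual mapping. A small Linux-only sketch:]

```python
def vm_stats(status_text):
    """Parse VmRSS and VmSize (both in kB) from a /proc/<pid>/status dump."""
    stats = {}
    for line in status_text.splitlines():
        if line.startswith(("VmRSS:", "VmSize:")):
            key, value = line.split(":", 1)
            stats[key] = int(value.strip().split()[0])  # numeric kB field
    return stats


# On a live Linux host: vm_stats(open("/proc/self/status").read())
# Fabricated sample for illustration; VmSize (virt) is typically
# much larger than VmRSS (resident).
sample = "Name:\tnova-compute\nVmSize:\t 2048000 kB\nVmRSS:\t  512000 kB\n"
print(vm_stats(sample))
```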
<dansmith> still seems like a big difference to me, which makes me wonder if we're doing something dumb like reading a big file into memory  16:22
<geguileo> I think it's glibc not releasing memory back to the system  16:22
<geguileo> due to the memory pressure  16:23
<geguileo> because the compute node is the one that goes >1GB while the controller node is <500MB  16:23
<dansmith> you mean when pressure is low, it avoids releasing as much?  16:23
<geguileo> yes  16:24
<dansmith> okay, so you're thinking that pressure is lower in the multinode case? I suppose that might be true if we're spinning up the same number of instances but across two nodes  16:25
<geguileo> controller nodes are using >5GB in total, whereas the computes only use 2GB  16:25
<dansmith> but in the multinode case, we're running one controller+compute in the same way as the single node job, only the "other compute" is emptier  16:26
<geguileo> yes, and since it's emptier there is less pressure for glibc to release to the system, right?  16:26
<dansmith> right, but the perf.json is only from the controller+compute node  16:27
<geguileo> maybe I'm missing something...  16:27
<geguileo> aren't those 2 different nodes?  16:27
<geguileo> one is a controller and the other is a compute?  16:28
<sean-k-mooney> i think we have swap enabled by default  16:28
<dansmith> there are two nodes total.. one with all the control services *and* a compute, per normal, and then one other node with just compute  16:28
<sean-k-mooney> so i was wondering if it could be related to memcache  16:28
<sean-k-mooney> and the fact the subnode won't use it and will use a dict cache  16:28
<geguileo> dansmith: correct, and in that one on the controller node n-cpu uses 500MB (out of the 5GB that performance.json has) whereas in the compute n-cpu consumes 1.xGB out of 2GB  16:29
<geguileo> so if the nodes have the same amount of RAM and the instances are evenly distributed  16:30
<dansmith> geguileo: not based on my info.. the only perf.json I'm looking at is from the combined node, which is where it has inflated to 1.5G  16:30
<geguileo> then there is less pressure on the compute node  16:30
<dansmith> right, but that's not where I'm seeing n-cpu be larger :)  16:30
<geguileo> dansmith: I'm looking only at multinode ones...  16:30
<geguileo> compute 2.4GB https://f6314bbe689272b182bf-704d2e5cde896695f5c12544f01f1d12.ssl.cf1.rackcdn.com/845806/2/check/tempest-slow-py3/363a876/compute1/logs/performance.json  16:31
<geguileo> controller 580MB https://f6314bbe689272b182bf-704d2e5cde896695f5c12544f01f1d12.ssl.cf1.rackcdn.com/845806/2/check/tempest-slow-py3/363a876/controller/logs/performance.json  16:31
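[Editor's note: given two dumps like the controller/compute pair linked above, a per-service diff makes the asymmetry easy to see. This sketch again assumes a hypothetical "services"/"MemoryCurrent" layout for performance.json, which may not match the real schema.]

```python
def memory_by_service(perf):
    # Hypothetical layout: top-level "services" list with
    # "service"/"MemoryCurrent" keys; adjust to the real schema.
    return {s["service"]: s["MemoryCurrent"] for s in perf.get("services", [])}


def diff_memory(controller, compute):
    """Per-service memory delta (compute minus controller), largest first."""
    a, b = memory_by_service(controller), memory_by_service(compute)
    deltas = {name: b.get(name, 0) - a.get(name, 0) for name in set(a) | set(b)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Fabricated numbers mirroring the rough figures quoted in the log
controller = {"services": [{"service": "n-cpu", "MemoryCurrent": 580_000_000}]}
compute = {"services": [{"service": "n-cpu", "MemoryCurrent": 2_400_000_000}]}
print(diff_memory(controller, compute))
```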
<dansmith> ack, yep, I had no stored data from the compute-only one, but looking at a recent job I see the same  16:31
<dansmith> the thing I was trying to say is:  16:32
<geguileo> if it's a pressure thing it could be the reason why suddenly, when c-bak consumes less memory, n-cpu seems to consume more  16:32
<dansmith> n-cpu on the controller+compute node is *larger* on the multinode job than a controller+compute on a single-node job, by like 3x  16:32
<dansmith> BUT, presumably that's because we're stressing the controller node less on the multinode job because half the instances are spun up on there  16:33
<dansmith> I see 4.2G on one compute-only job I just examined  16:33
<dansmith> that's still way too big, even if we can hand that back to the system on demand I think.. like I dunno what's making us be that big  16:34
<geguileo> for c-bak it was the fragmentation caused by glibc's per-thread malloc arenas  16:35
<geguileo> during my c-bak investigation I checked leaked objects, python memory management system stats, and even forced glibc malloc to free the memory in the service  16:37
<geguileo> and in the end for c-bak it was sufficient to set those env variables  16:37
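[Editor's note: the log doesn't name the env variables, but per-thread arena behavior is normally tuned with glibc tunables like MALLOC_ARENA_MAX set before the service starts. "Forcing glibc to free the memory" from inside a running Python process maps to glibc's malloc_trim(3), callable via ctypes. A hedged, glibc-only sketch; on platforms without malloc_trim (e.g. musl, macOS) it just reports unavailability:]

```python
import ctypes
import ctypes.util


def trim_heap():
    """Ask glibc to hand free heap pages back to the kernel.

    Calls malloc_trim(0), which returns 1 if memory was released and
    0 otherwise. Returns None when no usable glibc is found, so the
    sketch degrades gracefully on non-glibc systems.
    """
    path = ctypes.util.find_library("c")
    if path is None:
        return None
    libc = ctypes.CDLL(path)
    if not hasattr(libc, "malloc_trim"):  # symbol is glibc-specific
        return None
    return libc.malloc_trim(0)


print(trim_heap())  # 0 or 1 on glibc, None elsewhere
```

Note that MALLOC_ARENA_MAX-style variables only take effect for the process's own startup, so for a systemd-managed service they belong in the unit's Environment=, not in a shell after the fact.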
<dansmith> okay, but were those chunks of memory that glibc would have given up if it was under pressure or not?  16:37
<geguileo> maybe for nova we need to either force glibc to free the memory or we need to configure the memory pressure  16:37
<geguileo> dansmith: let me see if I tried that specific combo  16:38
<geguileo> I didn't try that combo  :-(  16:38
<geguileo> I only tried it with a single malloc arena  16:39
<dansmith> just wondering if the same thing applies to n-cpu or if it's something else,  16:39
<dansmith> because based on these numbers, I would have expected n-cpu to always be more than c-bak in any of the dumps, which I never saw  16:39
<geguileo> I've seen CI failures caused by OOM c-bak kills, but I don't think there have been any for n-cpu, right?  16:41
<dansmith> never that I've seen, yeah  16:41
*** jpena is now known as jpena|off  16:44
<geguileo> dansmith: is the nova-grenade-multinode the right job to look at for the compute & controller memory usage?  16:50
<dansmith> I'd look at one of the more standard ones, like nova-multi-cell  16:50
<dansmith> the grenade one runs both more and less because it's 1.5 tempest runs, 1.5 devstack runs, etc  16:51
<geguileo> ok, so multi-cell has compute & controller nodes  16:51
<geguileo> I'll do a couple of tests with that one  16:52

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!