*** threestrands has joined #zuul | 00:43 | |
*** haint_ has quit IRC | 01:04 | |
*** xinliang has quit IRC | 01:56 | |
*** threestrands has quit IRC | 02:01 | |
*** xinliang has joined #zuul | 02:09 | |
*** xinliang has quit IRC | 02:09 | |
*** xinliang has joined #zuul | 02:09 | |
*** threestrands has joined #zuul | 02:16 | |
*** threestrands has quit IRC | 02:16 | |
*** threestrands has joined #zuul | 02:16 | |
*** sshnaidm_ has joined #zuul | 02:45 | |
*** sshnaidm has quit IRC | 02:47 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Use yarn and webpack to manage zuul-web javascript https://review.openstack.org/538099 | 03:01 |
---|---|---|
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add babel transpiling enabling use of ES6 features https://review.openstack.org/538125 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add StandardJS linting and analysis https://review.openstack.org/538126 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Fix source_url handling for jobs view https://review.openstack.org/538127 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Fix StandardJS warnings and turn them to errors https://review.openstack.org/538128 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Add bundle analysis to the lint target https://review.openstack.org/538129 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Inject url endpoint information https://review.openstack.org/538130 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Make bundle of build web content https://review.openstack.org/538131 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: Remove use strict https://review.openstack.org/538132 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/jobs/{job_name} route https://review.openstack.org/535545 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering https://review.openstack.org/537869 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: /{tenant}/projects.json routes https://review.openstack.org/537870 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add project pipeline rendering https://review.openstack.org/537871 | 03:01 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Refactor run_handler to be generic https://review.openstack.org/535554 | 03:18 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Refactor NodeLauncher to be generic https://review.openstack.org/535555 | 03:18 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenContainer driver https://review.openstack.org/535556 | 03:18 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement a Kubernetes driver https://review.openstack.org/535557 | 03:18 |
*** sshnaidm_ has quit IRC | 03:24 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add /node-list to the webapp https://review.openstack.org/535562 | 03:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Add /label-list to the webapp https://review.openstack.org/535563 | 03:35 |
*** threestrands has quit IRC | 04:07 | |
*** threestrands has joined #zuul | 04:19 | |
*** threestrands has quit IRC | 04:19 | |
*** threestrands has joined #zuul | 04:19 | |
*** threestrands has quit IRC | 04:36 | |
*** threestrands has joined #zuul | 04:37 | |
*** threestrands has quit IRC | 05:07 | |
*** sshnaidm has joined #zuul | 05:25 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add buildset-artifacts-location https://review.openstack.org/530679 | 05:31 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add recursive-download https://review.openstack.org/540716 | 05:40 |
*** threestrands has joined #zuul | 05:42 | |
*** threestrands has quit IRC | 05:47 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add linters job and role https://review.openstack.org/530682 | 05:47 |
*** threestrands has joined #zuul | 05:47 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-lint job https://review.openstack.org/532083 | 05:50 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-review job https://review.openstack.org/535223 | 05:52 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-upload-to-galaxy job https://review.openstack.org/532084 | 05:57 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul-jobs master: Add ansible-spec job https://review.openstack.org/532085 | 05:59 |
*** threestrands_ has joined #zuul | 06:04 | |
*** threestrands_ has quit IRC | 06:04 | |
*** threestrands_ has joined #zuul | 06:04 | |
*** threestrands has quit IRC | 06:06 | |
openstackgerrit | liusheng proposed openstack-infra/zuul master: Fix AttributeError when handle periodic job with github driver https://review.openstack.org/536645 | 07:16 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 07:18 |
*** xinliang has quit IRC | 07:18 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: config: add statsd-server config parameter https://review.openstack.org/535560 | 07:21 |
*** AJaeger has quit IRC | 07:43 | |
*** AJaeger has joined #zuul | 07:45 | |
*** haint has joined #zuul | 07:54 | |
*** hashar has joined #zuul | 08:12 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Refactor run_handler to be generic https://review.openstack.org/535554 | 08:12 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 08:30 |
*** jpena|off is now known as jpena | 08:45 | |
*** threestrands_ has quit IRC | 08:54 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Status branch protection checking for github https://review.openstack.org/535680 | 08:55 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 08:58 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: WIP: Status branch protection checking for github https://review.openstack.org/535680 | 09:03 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix github connection for standalone debugging https://review.openstack.org/540772 | 09:06 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Enhance github debugging script for apps https://review.openstack.org/540774 | 09:11 |
hughsaunders | odyssey4me: I think this is the lifecycle I was looking for, at least the basics https://github.com/openstack-infra/zuul/blob/master/tests/nodepool/test_nodepool_integration.py#L53-L82 | 09:51 |
odyssey4me | hughsaunders aha, yay for tests! | 10:03 |
hughsaunders | yeah :) | 10:04 |
*** sshnaidm is now known as sshnaidm|afk | 10:14 | |
tobiash | corvus, mordred: just verified that the zuul-web changes we rushed in on friday work | 10:25 |
tobiash | and it works in my test env :) | 10:26 |
*** sshnaidm|afk is now known as sshnaidm|lnch | 10:58 | |
*** jimi|ansible has quit IRC | 11:20 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Enhance github debugging script for apps https://review.openstack.org/540774 | 11:22 |
*** jimi|ansible has joined #zuul | 11:26 | |
*** jimi|ansible has joined #zuul | 11:26 | |
*** jpena is now known as jpena|lunch | 12:25 | |
*** sshnaidm|lnch is now known as sshnaidm|afk | 12:29 | |
*** sshnaidm|afk is now known as sshnaidm | 13:02 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: config: add statsd-server config parameter https://review.openstack.org/535560 | 13:21 |
*** jpena|lunch is now known as jpena | 13:27 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: mqtt: add basic reporter https://review.openstack.org/535543 | 13:30 |
*** sshnaidm is now known as sshnaidm|rover | 14:13 | |
*** dkranz has quit IRC | 14:24 | |
mordred | tobiash: woot! | 14:40 |
*** myoung|bbl is now known as myoung | 15:00 | |
*** dkranz has joined #zuul | 15:01 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/zuul-jobs master: add debug info to mirror-workspace-git-repos https://review.openstack.org/540880 | 15:07 |
*** dkranz has quit IRC | 15:15 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Increase test timeout https://review.openstack.org/540889 | 15:31 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/zuul-jobs master: Remove support for extensions as lists https://review.openstack.org/540890 | 15:32 |
*** JasonCL has quit IRC | 15:43 | |
*** JasonCL has joined #zuul | 15:47 | |
*** JasonCL has quit IRC | 15:52 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: add debug info to mirror-workspace-git-repos https://review.openstack.org/540880 | 16:06 |
*** dkranz has joined #zuul | 16:08 | |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Add unit test for multiple launchers https://review.openstack.org/540916 | 16:16 |
pabelanger | corvus: tobiash: Shrews: clarkb: ^ a base test to test multiple launchers for nodepool. Also showing current behavior of min-ready across launchers. | 16:17 |
pabelanger | going to build on top of that and see how to have min-ready=1 be 1 node over multiple launchers | 16:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool master: Add unit test for multiple launchers https://review.openstack.org/540916 | 16:18 |
rcarrillocruz | wait | 16:20 |
rcarrillocruz | is zuul-web webpack and all merged already | 16:21 |
rcarrillocruz | ? | 16:21 |
rcarrillocruz | tobiash , mordred ^ | 16:21 |
mordred | rcarrillocruz: no - I'll be working on finishing that up today | 16:21 |
rcarrillocruz | ack | 16:21 |
rcarrillocruz | i stopped containerizing the zuul-web stuff since you were on that | 16:21 |
rcarrillocruz | thx | 16:22 |
* rcarrillocruz needs to get back to zuul-ci-container soon | 16:22 | |
pabelanger | Shrews: I'm going to get the terminology wrong, hopefully you can correct me. When we create a NodePool:min-ready request, is there any way we can lock that, so other providers cannot also create one? As a way to have each launcher first check if that is locked before submitting their request? | 16:31 |
clarkb | pabelanger: Shrews we could probably do it in a cheap and mostly effective way of just scanning the node list in zk for ready nodes of label foo | 16:32 |
clarkb | count them all and if we have min ready ready don't launch any more | 16:32 |
clarkb | it will be racy but should mostly work out since goal is keeping enough fresh instances around to reduce reaction time | 16:32 |
Shrews | clarkb: that's how it works now | 16:32 |
clarkb | ah ok for some reason I thought it only checked the current provider and not all providers | 16:33 |
Shrews | At a chiropractor appt now but can expand more when I get back to a computer | 16:33 |
pabelanger | sure | 16:34 |
pabelanger | clarkb: yah, you can see the race in 540916 logs | 16:34 |
*** jimi|ansible has quit IRC | 16:52 | |
Shrews | pabelanger: yeah, so, coordination between launchers has been a "next version" goal for a while. Happy to have you work on that though. | 17:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540934 | 17:06 |
Shrews | pabelanger: that's going to involve locking "something" to enable that coordination | 17:06 |
tobiash | Shrews, pabelanger: maybe locking a node describing the pool itself? | 17:13 |
pabelanger | Shrews: tobiash: sorry, lunch. Yah, I'd be interested in seeing how we'd do it. And happy to learn more and see if I could write the patch | 17:23 |
pabelanger | easier to do with testing now in place :) | 17:23 |
Shrews | pabelanger: i'm interested to see how we'd do it too :) | 17:23 |
pabelanger | ha | 17:24 |
Shrews | it hasn't been discussed or thought about (at least by me), so a solution is up in the air still | 17:24 |
pabelanger | ack | 17:24 |
Shrews | tobiash: responded to your question on https://review.openstack.org/535899. i'd like to see us get that merged soon-ish | 17:26 |
tobiash | Shrews: +2 from me | 17:30 |
Shrews | tobiash: http://logs.openstack.org/99/535899/14/check/nodepool-functional-py35/35b5a19/job-output.txt.gz#_2018-02-05_15_02_36_075534 shows that it's using the source from the review | 17:31 |
tobiash | ah, ok | 17:31 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Test Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540945 | 17:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540934 | 17:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Revert "Test Use item.checkout from zuul.projects when mirroring" https://review.openstack.org/540950 | 17:45 |
*** hashar is now known as hasharAway | 17:47 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: DNM: Test new mirror-workspace-git-repos role https://review.openstack.org/540952 | 17:49 |
*** sshnaidm|rover is now known as sshnaidm|bbl | 17:52 | |
*** jimi|ansible has joined #zuul | 18:15 | |
*** jimi|ansible has joined #zuul | 18:15 | |
*** jpena is now known as jpena|off | 18:19 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Test Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540945 | 18:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow a few more starting builds https://review.openstack.org/540965 | 18:38 |
*** myoung is now known as myoung|dr | 18:38 | |
corvus | tobiash, mordred: ^ can you look at that? | 18:38 |
tobiash | looking | 18:43 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul master: Update nodepool-integration for Zuul v3 https://review.openstack.org/540967 | 18:44 |
tobiash | corvus: +2 left +3 to you | 18:44 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 18:45 |
corvus | clarkb: can you +3 https://review.openstack.org/540889 please? | 18:46 |
*** JasonCL has joined #zuul | 18:47 | |
Shrews | Would anyone have any issues with turning the nodepool devstack plugin files (devstack/plugin.sh and devstack/settings) into jinja templates? I'm trying to devise a way to specify the default image via an ansible var rather than environment variables | 18:58 |
Shrews | (not sure i actually want to do that just yet, but considering it) | 19:01 |
clarkb | Shrews: considering that is an interface defined by devstack I think we should avoid that. You should be able to run devstack + nodepool without ansible or zuul | 19:01 |
openstackgerrit | Merged openstack-infra/zuul master: Increase test timeout https://review.openstack.org/540889 | 19:02 |
clarkb | Shrews: devstack uses env vars so I think that's the way we go about it? you might be able to clean that up in the job by using the native devstack localrc stuff though | 19:03 |
*** harlowja has joined #zuul | 19:03 | |
AJaeger | so, nodepool / zuul integration tests fail now with Zuul v3 native job the same way as with old job. | 19:07 |
AJaeger | team, could you review https://review.openstack.org/540967 and https://review.openstack.org/540595 , please? | 19:08 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow a few more starting builds https://review.openstack.org/540965 | 19:12 |
*** JasonCL has quit IRC | 19:15 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Fix error in test-mirror-workspace-git-repos https://review.openstack.org/540977 | 19:17 |
Shrews | AJaeger: why do we need the "zuul | zuul_legacy_vars" environment vars? | 19:17 |
AJaeger | Shrews: Just wondering that myself - let's remove it and test again... | 19:18 |
Shrews | AJaeger: k. i left another comment in the review | 19:18 |
AJaeger | Shrews: answered your comment | 19:18 |
Shrews | ack | 19:18 |
Shrews | AJaeger: oh right. that script is in zuul | 19:19 |
AJaeger | yes | 19:19 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 19:19 |
AJaeger | Shrews: ^ | 19:19 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540934 | 19:21 |
openstackgerrit | Merged openstack-infra/zuul master: Update nodepool-integration for Zuul v3 https://review.openstack.org/540967 | 19:24 |
Shrews | AJaeger: the example here for another project's src_dir might be more future proof: https://docs.openstack.org/infra/zuul/user/jobs.html#var-zuul.projects.checkout | 19:25 |
AJaeger | Shrews: I can update... | 19:26 |
AJaeger | editor is still open - thanks, didn't know this syntax | 19:26 |
AJaeger | Shrews, is this the right value: '{{ zuul.projects['git.openstack.org/openstack-infra/zuul'].src_dir }}' ? | 19:27 |
Shrews | AJaeger: neither did I :) figured there had to be some other way | 19:27 |
Shrews | AJaeger: i *think* so | 19:27 |
AJaeger | let's try ;) | 19:27 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 19:29 |
openstackgerrit | Andreas Jaeger proposed openstack-infra/nodepool master: Convert nodepool-zuul-functional job https://review.openstack.org/540595 | 19:30 |
AJaeger | Shrews: so, fails at same place as before - so, changes are fine ^ | 19:38 |
*** jimi|ansible has quit IRC | 19:39 | |
*** JasonCL has joined #zuul | 19:42 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/zuul master: Fix nodepool integration tests https://review.openstack.org/540983 | 19:44 |
corvus | clarkb: can i get a re-review on https://review.openstack.org/540965 ? i fixed the test to accommodate the change | 19:44 |
Shrews | AJaeger: https://review.openstack.org/540983 hopefully fixes that | 19:45 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Fix error in test-mirror-workspace-git-repos https://review.openstack.org/540977 | 19:48 |
*** JasonCL has quit IRC | 19:50 | |
*** jimi|ansible has joined #zuul | 19:55 | |
*** jimi|ansible has joined #zuul | 19:55 | |
AJaeger | great, Shrews ! | 20:02 |
corvus | i'm looking at ze02 at 16:20. cacti says we're using 7.12G of ram (out of 8G == 89% used). the ram governor in the executor reports 59% available (or 41% used). those numbers don't seem to be anywhere near each other. | 20:12 |
corvus | http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=64155&rra_id=0&view_type=tree&graph_start=1517847537&graph_end=1517849173&graph_height=120&graph_width=500&title_font_size=12 | 20:12 |
corvus | http://grafana.openstack.org/dashboard/db/zuul-status?from=1517847487131&to=1517848972297 | 20:12 |
corvus | we may need to look into what we're using for the ram calculation a bit deeper | 20:13 |
corvus | (at 16:40 the log streamer was oom-killed) | 20:13 |
clarkb | corvus: the buffers and cache probably don't count in the governor? | 20:13 |
*** JasonCL has joined #zuul | 20:14 | |
corvus | clarkb: the commit message said it 'does not take into account buffers or cache which could be reclaimed'. iiuc, that means it should be even more conservative than the cacti data, but it's the opposite. | 20:16 |
corvus | could it be including swap in the total? | 20:17 |
clarkb | 41% used implies ^ is a good guess assuming swap size is equal to ram size | 20:17 |
corvus | they are equal | 20:17 |
corvus | if i redo that math based on those assumptions, i get 45% used | 20:19 |
corvus | that's pretty close to the 41% used reported. | 20:19 |
*** JasonCL has quit IRC | 20:19 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: Revert "Test Use item.checkout from zuul.projects when mirroring" https://review.openstack.org/540950 | 20:59 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Use item.checkout from zuul.projects when mirroring https://review.openstack.org/540934 | 20:59 |
*** JasonCL has joined #zuul | 21:04 | |
fungi | dmesg on ze02 now shows that it oom-killed a git process as recently as 18:35 | 21:08 |
*** JasonCL has quit IRC | 21:11 | |
*** JasonCL has joined #zuul | 21:32 | |
*** JasonCL has quit IRC | 21:37 | |
*** myoung|dr is now known as myoung | 21:51 | |
dmsimard | zuul meeting in ~6 minutes ? | 21:53 |
jhesketh | Morning | 21:55 |
*** hasharAway has quit IRC | 21:56 | |
corvus | i think i need to discard all of my thinking about the ram governor and ze02 earlier. it looks like cacti isn't reliably getting data from ze02, so i'm not sure i can trust my reading of the graph earlier. | 21:57 |
corvus | the data from the executor, via psutil and graphite, should actually match up. | 21:57 |
corvus | i'll try to confirm that on one of the other hosts. | 21:57 |
corvus | meanwhile ze02 is interesting in that it seems to be the only one with the reporting issues, as well as the only one that has oom'd | 21:58 |
clarkb | dmsimard: yes now 2 minutes | 21:58 |
corvus | zuul meeting time in #openstack-meeting-alt | 22:00 |
Shrews | I'll plan to restart the launchers in the morning when I can be around to watch them all day. | 22:18 |
Shrews | but i don't expect problems | 22:18 |
corvus | Shrews: thanks! | 22:18 |
pabelanger | Shrews: tobiash: https://review.openstack.org/536930/ would be a nice to have, still failing tests. To fix quota issue we see in citycloud. | 22:18 |
pabelanger | corvus: too^ | 22:19 |
Shrews | pabelanger: I'd prefer that not go in when I'm going to be away for a week | 22:20 |
pabelanger | Shrews: ack, we can likely -2 it then | 22:20 |
Shrews | pabelanger: i mean, we can totally merge it if you can get it to pass tests, but i won't be around to help diagnose/fix if things go awry :) | 22:22 |
pabelanger | Shrews: yah, a good reason not to do it then | 22:22 |
Shrews | https://review.openstack.org/540983 timing out on the nodepool-zuul-integration test is perplexing | 22:24 |
Shrews | the log is not helpful either: http://logs.openstack.org/83/540983/1/experimental/nodepool-zuul-functional/ddc8517/job-output.txt.gz#_2018-02-05_20_19_20_427952 | 22:25 |
clarkb | might be good to capture the testr logs there | 22:30 |
corvus | okay, i've determined that the percent available/used that the executor is reporting is basically what you'd get by looking at the 'available' column in free. which is generally the thing we want. it's also what the cacti graph is trying to do, but we are missing an important bit of data from cacti.... | 22:35 |
corvus | cacti calculates used with total-free-cache-buffers, which is almost how free calculates available -- free+cache+buffers+reclaimable_slab | 22:36 |
corvus | so there is a discrepancy between cacti and psutil, and it's due to reclaimable slab memory | 22:36 |
corvus | which includes the kernel's inode cache. that's 2G on ze03 right now. | 22:36 |
clarkb | that is a lot of inodes | 22:37 |
corvus | at any rate, that leads me to accept the 'percent free' value with some more confidence, and i think we can mostly set aside cacti for now and just look at graphite | 22:38 |
fungi | wow | 22:38 |
clarkb | not entirely surprising considering we had to bump up inode count on those filesystems | 22:38 |
corvus | but if we look at ze02 at 16:40, it reports 61.5% available memory | 22:38 |
corvus | i don't know what the relative proportions of the constituents are -- it's probably safe to assume 'free' is small -- it usually is. but at the very least, there was still 3.2G of reclaimable ram -- why would it oom? | 22:39 |
clarkb | oomkiller operates off of cgroup levels now iirc. Possibly that is set too aggressively by default? | 22:40 |
clarkb | basically it can be invoked far below the 100% memory used case | 22:41 |
clarkb | I would be surprised if ubuntu set it to a low value but could be one explanation | 22:41 |
*** JasonCL has joined #zuul | 22:47 | |
corvus | clarkb: this is interesting ... know how to debug further? | 22:49 |
corvus | i wonder if there's any relevant info in the oom-killer kernel messages about it...? | 22:50 |
clarkb | corvus: ya let me look at my local system real quick | 22:50 |
clarkb | corvus: look in /sys/fs/cgroup/memory and there should be attributes like memory.limit_in_bytes | 22:52 |
corvus | that is very high on ze02 | 22:52 |
clarkb | oh! since bwrap is running these processes they likely have their own cgroups so you may need to look in /sys/fs/cgroup/memory/$groupnameforcontainer | 22:52 |
corvus | 8191 petabytes | 22:53 |
clarkb | I think that is the hierarchy? | 22:53 |
clarkb | /sys/fs/cgroup/memory/system.slice/zuul-executor.service looks to be the path for the executor | 22:53 |
clarkb | 6701445120 is the number there which is smaller | 22:54 |
clarkb | er that's usage not limit | 22:54 |
clarkb | ya limit is huge there too | 22:54 |
corvus | i think that's what systemd set up for it | 22:54 |
clarkb | so ya I don't think that is the issue but does rule out one thing that can influence oomkiller invocation | 22:55 |
clarkb | memory.oom_control is something that might be useful to us though | 22:55 |
corvus | yeah, though i'd like to end up with 'stop killing things' rather than 'kill something else' | 22:55 |
*** JasonCL has quit IRC | 23:08 | |
corvus | i was curious about the flags in: | 23:15 |
corvus | Feb 5 16:40:25 ze02 kernel: [141779.382764] git invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0 | 23:16 |
corvus | the gfp_mask is: ___GFP_NOTRACK; ___GFP_DIRECT_RECLAIM; ___GFP_IO; ___GFP_FS | 23:16 |
corvus | which means: avoid tracking with kmemcheck; caller may enter direct reclaim; can start physical io, can call down to low-level fs. | 23:16 |
corvus | none of those jump out at me as suggesting that the kernel couldn't use the memory it had available | 23:17 |
corvus | which included slab_reclaimable:1995768kB | 23:17 |
*** JasonCL has joined #zuul | 23:18 | |
clarkb | corvus: internet says https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1655842 may be related | 23:19 |
openstack | Launchpad bug 1655842 in linux-aws (Ubuntu Xenial) ""Out of memory" errors after upgrade to 4.4.0-59" [Undecided,Confirmed] | 23:19 |
*** JasonCL has quit IRC | 23:19 | |
clarkb | but it is also marked fix released | 23:19 |
*** JasonCL has joined #zuul | 23:19 | |
*** threestrands has joined #zuul | 23:19 | |
*** JasonCL has quit IRC | 23:22 | |
*** JasonCL has joined #zuul | 23:23 | |
*** JasonCL has quit IRC | 23:28 | |
corvus | clarkb: hrm. this is starting to look like a kernel issue in our deployment, and maybe not zuul related... i'll switch to -infra | 23:43 |
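
For readers following the cacti vs. psutil discussion above (22:35 to 22:39), here is a minimal sketch of the two calculations as corvus describes them. The formulas are taken from the log and are a simplification of what `free` really does; the function names and all sample numbers are hypothetical and chosen only to make the arithmetic visible.

```python
# Sketch of the two memory calculations compared in the log.
# All sample numbers are made-up MiB values for illustration only.


def cacti_style_used(mem):
    """'Used' as the log says cacti computes it: total - free - cache - buffers."""
    return mem["total"] - mem["free"] - mem["cached"] - mem["buffers"]


def free_style_available(mem):
    """'Available' as described in the log: free + cache + buffers + reclaimable slab."""
    return mem["free"] + mem["cached"] + mem["buffers"] + mem["slab_reclaimable"]


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Hypothetical 8 GiB host with a large inode cache sitting in reclaimable slab.
sample = {
    "total": 8192,
    "free": 300,
    "cached": 1000,
    "buffers": 200,
    "slab_reclaimable": 2000,
}

used_by_cacti = cacti_style_used(sample)
available = free_style_available(sample)
used_by_available = sample["total"] - available

print(f"cacti-style used: {percent(used_by_cacti, sample['total']):.1f}%")
print(f"used implied by 'available': {percent(used_by_available, sample['total']):.1f}%")
print(f"gap in MiB: {used_by_cacti - used_by_available}")
```

Under these assumptions the gap between the two "used" figures is exactly the reclaimable slab, which is the discrepancy the log attributes to the kernel's inode cache.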
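
The cgroup check clarkb suggests at 22:52 can be sketched as below. It assumes the cgroup v1 memory controller layout mentioned in the log (`memory.usage_in_bytes` and `memory.limit_in_bytes` inside a directory such as `/sys/fs/cgroup/memory/system.slice/zuul-executor.service`). The helper names are hypothetical, and the demo writes fake files to a temporary directory so it runs anywhere. The usage figure is the one quoted in the log; the limit is an assumed typical "unlimited" default, not a value read from ze02.

```python
import os
import tempfile


def read_cgroup_memory(cgroup_dir):
    """Return (usage_bytes, limit_bytes) from a cgroup v1 memory directory."""
    values = {}
    for name in ("memory.usage_in_bytes", "memory.limit_in_bytes"):
        with open(os.path.join(cgroup_dir, name)) as f:
            values[name] = int(f.read().strip())
    return values["memory.usage_in_bytes"], values["memory.limit_in_bytes"]


def is_effectively_unlimited(limit_bytes, physical_ram_bytes):
    """A limit larger than physical RAM cannot be what triggers the OOM killer."""
    return limit_bytes > physical_ram_bytes


# Demo against fake files; on a real host pass the service's cgroup directory.
with tempfile.TemporaryDirectory() as fake_cgroup:
    with open(os.path.join(fake_cgroup, "memory.usage_in_bytes"), "w") as f:
        f.write("6701445120\n")  # usage figure quoted in the log
    with open(os.path.join(fake_cgroup, "memory.limit_in_bytes"), "w") as f:
        f.write("9223372036854771712\n")  # assumed default "no limit" value
    usage, limit = read_cgroup_memory(fake_cgroup)

print(f"usage: {usage / 2**30:.2f} GiB")
print(f"limit: {limit // 2**50} PiB (rounded down)")
print("unlimited in practice:", is_effectively_unlimited(limit, 8 * 2**30))
```

With the assumed default limit, the rounded-down figure is consistent with the "8191 petabytes" reading in the log, which is why the cgroup limit was ruled out as the cause.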