Saturday, 2021-01-16

<openstackgerrit> Jeremy Stanley proposed openstack/project-config master: Revert "Un-pause Gentoo image builds"
<fungi> prometheanfire: ^ i've tried to include as much diagnostic info as i can in that commit message [15:04]
<openstackgerrit> Jeremy Stanley proposed zuul/zuul-jobs master: Temporarily stop running Gentoo base role tests
<openstackgerrit> Jeremy Stanley proposed zuul/zuul-jobs master: Revert "Temporarily stop running Gentoo base role tests"
<openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Correct path in mk-archives-index cronjob on lists
<prometheanfire> fungi: kk [17:37]
<prometheanfire> fungi: it seemed like it was still unpacking if there was a lock on distfiles [17:41]
<fungi> prometheanfire: yeah, i wonder if whatever unpacking was going on died and the parent process didn't notice or something... but dmesg didn't indicate an oom or anything of the sort [18:33]
<prometheanfire> fungi: is this a single instance of an issue or a repeated issue? [18:42]
<fungi> prometheanfire: it happens consistently. image build starts, it gets to installing six, then sticks like that for 4+ hours and finally nodepool gives up [18:52]
<fungi> image builds aren't completing [18:52]
<fungi> prometheanfire: could we maybe add some additional debugging output to the emerge? [19:49]
<prometheanfire> heh, there is a --debug option [20:01]
<prometheanfire> it's kinda verbose :P [20:01]
<prometheanfire> fungi: would this be a good test?
<prometheanfire> iirc that's what I was running before when I initially developed the stuff [20:06]
<prometheanfire> then I can test locally [20:06]
<fungi> prometheanfire: probably? i honestly haven't tried reproducing an image build locally for a while [20:09]
<prometheanfire> ok, I'll assume it's right [20:09]
<fungi> prometheanfire: i'm also wondering if it could be related to the kernel version on our builder... but also nb01 has now ceased to be reachable so i'm going to see what's happened to it [20:18]
<fungi> ahh, my bad, i was trying to reach old servers which we never cleaned up in dns [20:34]
<fungi> i'll clean that up [20:35]
<prometheanfire> I am having an issue (not the same one though) [20:35]
<prometheanfire> I'm thinking that six isn't installed in the base image for me [20:38]
<prometheanfire> it happens before I can install anything [20:38]
<prometheanfire> project-config/nodepool/elements/openstack-repos/extra-data.d/50-create-repo-list does it [20:38]
<fungi> #status log deleted old aaaa records for nonexistent and servers [20:39]
<openstackstatus> fungi: finished logging [20:39]
<fungi> looks like somebody cleaned up the ipv4 address records but not ipv6 [20:40]
<prometheanfire> gonna try wrapping that import in a try/except [20:42]
<prometheanfire> yep, that worked [20:43]
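
For readers following the try/except workaround mentioned above, here is a minimal sketch of the general pattern. It is not the actual contents of 50-create-repo-list: the specific import (`six.moves.urllib.parse`) and the helper name `repo_host` are assumptions chosen only for illustration. The idea is simply that a missing `six` no longer aborts the script.

```python
# Hypothetical sketch: tolerate a base image where six is not installed.
# The imported name and the helper below are illustrative assumptions,
# not taken from the real 50-create-repo-list script.
try:
    # Preferred path when six is available.
    from six.moves.urllib.parse import urlparse
except ImportError:
    # Fallback: the Python 3 standard library offers the same function.
    from urllib.parse import urlparse


def repo_host(url):
    """Return the host portion of a repository URL."""
    return urlparse(url).netloc


if __name__ == "__main__":
    print(repo_host("https://opendev.org/openstack/project-config"))
```

Catching `ImportError` specifically, rather than using a bare `except`, means that unrelated failures still surface.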
<fungi> prometheanfire: so as far as replicating the issue, if it comes down to it, we're running this in an ubuntu 18.04 lts vm with docker-compose using this compose file:
<fungi> with defaults, so "image:" [20:47]
<prometheanfire> it'll be a minute, for my testing [20:47]
<fungi> i'm not sure if any of those details will be involved in the problem [20:47]
<prometheanfire> 2021-01-16 20:47:09.165 | Caching gerrit from in /opt/dib_cache/source-repositories/gerrit_0a56dd139195635d3ada2296d9ddf8ce967dea28 [20:47]
<fungi> zuul seems to have caught up on its backlog, so maybe i'll restart the scheduler for gerrit wip support after dinner [20:52]
<fungi> looks like the static site volumes are releasing on a normal cadence again [21:48]
<fungi> there are still outstanding transactions for some of the mirrors though, so we'll need to consider if we want to try to abort them now that we can make rpc calls in a timely fashion again [21:48]
<fungi> we should be able to approve the revert for serving static sites from the writeable path at least, if any other config-core wants to review:
<fungi> zuul utilization had a bit of a spike around 20:30z so i'll give it a bit longer before i restart the scheduler so i won't need to reenqueue quite so many builds (we have around 150 nodes in use at the moment according to the zuul dashboard in grafana) [21:53]
<mnaser> fungi: I think you’re looking for infra-root perhaps :) — I can’t review that :p [21:53]
<fungi> mnaser: d'oh, you're right, that's a system-config repo. sorrt! [21:53]
<fungi> er, sorry! [21:54]
<fungi> but thanks for looking i guess :/ [21:54]
<mordred> fungi: lgtm [22:15]
<fungi> we're down around 45 nodes in use now... getting ready to restart the scheduler shortly if it drops a bit more [22:51]
<fungi> er, 65 i mean [22:51]
<openstackgerrit> Merged opendev/system-config master: Revert "Temporarily serve static sites from AFS R+W vols"
* fungi sighs [23:25]
<fungi> another (smaller) spike around 23:00z, so just over 100 nodes in use at the moment [23:25]

