ianw | clarkb: i think it's fine to use the shell container, but we just want to "run --rm" it ... because we don't want old shell containers around? | 00:04 |
clarkb | ianw: the problem is that we aren't doing that today because docker-compose up -d doesn't rm the shell container | 00:05 |
clarkb | my concern is that I don't want us relying on a random cronjob to correct the docker-compose commands run elsewhere. | 00:06 |
clarkb | basically the container exists today whether we want it to or not. I think removing it should be handled separately | 00:06 |
ianw | i'm not 100% sure we're talking about the same thing ... in "docker images ls" we don't want to see old shell containers that were started to run this job, right? | 00:08 |
clarkb | ianw: it's `docker ps -a` and I agree that is probably a good thing. But if you run that command right now you'll see this container exists and isn't cleaned up | 00:10 |
clarkb | docker image ls is for the images, which this change doesn't affect (it's going to use whichever image is currently present, which is why I split it into a separate change) | 00:10 |
opendevreview | Clark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup https://review.opendev.org/c/opendev/system-config/+/880672 | 00:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run the replication task cleanup daily https://review.opendev.org/c/opendev/system-config/+/880688 | 00:11 |
clarkb | it is a side effect of having shell defined in docker-compose.yaml and using `docker-compose up -d`. That runs the shell container, which by default executes `true` in the image, then exits but does not remove the container | 00:12 |
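[A minimal sketch of the kind of compose service being discussed; the service layout is an assumption for illustration, not the actual opendev config, though the image name matches the one seen later in the log:]

```yaml
# Hypothetical docker-compose.yaml excerpt: a "shell" service whose
# command is `true`, so `docker-compose up -d` starts the container,
# it exits immediately, and the stopped container then lingers in
# `docker ps -a` until something removes it.
services:
  shell:
    image: opendevorg/gerrit:3.7
    entrypoint: /bin/true
```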
ianw | ok, but the daily cron job is going to create another container every day? | 00:13 |
clarkb | no, it will be the same container I think | 00:13 |
clarkb | (due to docker-compose magic) | 00:13 |
ianw | ok, maybe that's where we're crossing wires. in other commands like that we exec, to use the current container | 00:14 |
ianw | i feel like the run is going to create a new container, and without --rm leave it there | 00:14 |
ianw | there is an easy way to test :) | 00:14 |
clarkb | It will create a new container but only one that is logically managed by docker-compose called shell | 00:14 |
clarkb | if you used docker run instead of docker-compose run then you would create a new container every day | 00:15 |
ianw | # /usr/local/bin/docker-compose -f /etc/gerrit-compose/docker-compose.yaml run -T shell ls | 00:15 |
ianw | 1475ef7225d8 opendevorg/gerrit:3.7 "/usr/bin/dumb-init …" 6 seconds ago Exited (0) 5 seconds ago gerrit-compose_shell_run_f9f293a19816 | 00:15 |
ianw | # /usr/local/bin/docker-compose -f /etc/gerrit-compose/docker-compose.yaml run --rm -T shell ls | 00:16 |
clarkb | and if you docker ps -a there should only be three containers not four | 00:16 |
ianw | there are 4, that was from ps -a | 00:16 |
fungi | how does docker-compose run differ from docker-compose exec? | 00:16 |
ianw | https://paste.opendev.org/show/beCqux2wuQgO7RxVQsea/ | 00:16 |
clarkb | fungi: run means start a new container instance and exec means run something in an already running container | 00:16 |
fungi | ah, okay | 00:17 |
ianw | whereas with "--rm", "docker ps -a" doesn't show another container added | 00:17 |
clarkb | ianw: what ends up cleaning our gerrit init/reindex containers then? | 00:17 |
ianw | i think we exec all of them? | 00:17 |
clarkb | maybe that is where I'm getting confused. We only have the one shell container and we use it for multiple commands and don't end up with extras | 00:17 |
clarkb | no we don't because there isn't a running gerrit to exec an offline reindex in | 00:18 |
clarkb | it has to be run | 00:18 |
clarkb | https://etherpad.opendev.org/p/gerrit-upgrade-3.7 line 136 for example | 00:18 |
clarkb | maybe docker-compose down clears those extras too | 00:19 |
clarkb | and so over time we trend towards having them cleaned up | 00:19 |
ianw | perhaps down clears? | 00:19 |
ianw | oh, yeah, what you said :) | 00:19 |
ianw | up or down, or something like that | 00:19 |
clarkb | I find that extremely confusing behavior if so | 00:19 |
ianw | it was named "gerrit-compose_shell_run_f9f293a19816" so maybe docker-compose does some matching | 00:19 |
clarkb | ya, it's something like: gerrit-compose == the dirname of the docker-compose.yaml file, shell is the container name, then $instance after that | 00:21 |
clarkb | we have _1s by default from up -d because we only want a single copy of each container | 00:22 |
clarkb | I guess the run invocation gets something else that lets it track and clean it up | 00:22 |
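[The naming pattern being puzzled out here can be sketched as follows; the run suffix is the one observed in the log above, and the exact way compose generates it is an assumption:]

```shell
#!/bin/sh
# docker-compose v1 style container names, as observed in `docker ps -a`:
#   up -d  -> <project>_<service>_<index>
#   run    -> <project>_<service>_run_<random-suffix>
# where <project> defaults to the dirname of the docker-compose.yaml file.
project="gerrit-compose"
service="shell"

echo "${project}_${service}_1"                 # created by `up -d`
echo "${project}_${service}_run_f9f293a19816"  # created by `run` (suffix varies)
```

This suffix-based name is presumably how compose tracks its `run` containers so that `down` can find and remove them.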
opendevreview | Clark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup https://review.opendev.org/c/opendev/system-config/+/880672 | 00:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run the replication task cleanup daily https://review.opendev.org/c/opendev/system-config/+/880688 | 00:22 |
clarkb | I expected everything to happen within the one container, because we only ever had the one container when I ran docker ps -a and we definitely run other commands on the host. But ya, we must down often enough to clean them up | 00:22 |
ianw | do we use a "dummy" node in any tests? | 00:33 |
ianw | the LE test jobs roll out things to the adns primary server, but we don't have any secondary ns nodes in the test | 00:34 |
ianw | we don't actually need them, but i'd like them in the inventory | 00:34 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 00:53 |
Clark[m] | I don't know if we have any nodes that are ignored by the test job if that is what you mean? | 00:55 |
ianw | yeah, i guess we could use add_host: to add something fake, but if there's any "hosts: all" that doesn't work. | 00:57 |
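[The idea being floated, sketched as a hypothetical play; the hostname and group are made up, and as noted above, any later play using `hosts: all` would pick the fake host up too:]

```yaml
# Hypothetical: register a dummy secondary NS in the in-memory inventory
# so later plays can template records for it without a real node existing.
- hosts: localhost
  tasks:
    - name: Add a dummy nameserver to the inventory
      add_host:
        name: ns99.opendev.org    # made-up hostname
        groups: dns-secondaries   # made-up group
```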
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 01:41 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt test : update to jammy https://review.opendev.org/c/opendev/system-config/+/880698 | 01:41 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 02:15 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 04:06 |
auniyal | etherpad opendev is down | 04:20 |
auniyal | it's up again, thanks | 04:21 |
opendevreview | Ian Wienand proposed opendev/system-config master: letsencrypt test : update to jammy https://review.opendev.org/c/opendev/system-config/+/880698 | 04:28 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 04:28 |
opendevreview | Ian Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers https://review.opendev.org/c/opendev/system-config/+/880579 | 04:28 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns : add Jammy refresh servers https://review.opendev.org/c/opendev/system-config/+/880706 | 06:28 |
*** amoralej|off is now known as amoralej | 06:56 | |
opendevreview | Ian Wienand proposed opendev/zone-opendev.org master: Add Jammy refresh NS records https://review.opendev.org/c/opendev/zone-opendev.org/+/880577 | 06:58 |
opendevreview | Ian Wienand proposed opendev/zone-opendev.org master: Remove old NS nodes https://review.opendev.org/c/opendev/zone-opendev.org/+/880709 | 06:58 |
opendevreview | Ian Wienand proposed opendev/system-config master: Remove old DNS servers https://review.opendev.org/c/opendev/system-config/+/880710 | 07:04 |
ianw | clarkb / fungi: https://etherpad.opendev.org/p/2023-opendev-dns is what i think the final plan for dns server swizzle comes down to. the mess of changes ^ should work step-by-step in there... | 07:12 |
frickler | ianw: does https://review.opendev.org/c/opendev/system-config/+/880580 need new records to be created, too? or why is the job failing? | 07:27 |
*** jpena|off is now known as jpena | 07:32 | |
*** cloudnull6 is now known as cloudnull | 13:37 | |
fungi | looks like the zuul restarts eventually finished, though a new commit landed after ze05 pulled, so ze01-ze05 are on a slightly older version (by one commit) than the rest of the servers | 14:20 |
opendevreview | Clark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup https://review.opendev.org/c/opendev/system-config/+/880672 | 15:22 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run the replication task cleanup daily https://review.opendev.org/c/opendev/system-config/+/880688 | 15:22 |
frickler | infra-root: seems arm64 builds are failing, timing suggests it is related to the updated nodepool https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1 | 15:27 |
clarkb | exec_sudo: losetup: /opt/dib_tmp/dib_image.x1eCaKaY/image0.raw: failed to set up loop device: No such file or directory | 15:29 |
clarkb | the data volume isn't full but there are 8 losetup devices in use (which is high) | 15:30 |
clarkb | might be worth stopping the services and rebooting to clear those out and take it from there? I don't recall if there are limits to losetup device numbers smaller than say 256. losetup -f says loop9 is next at least | 15:31 |
frickler | oh, so likely some other build failure first that wasn't cleaned up properly and now this. that matches that some failures seem to have started earlier in grafana than others | 15:32 |
clarkb | yes or at least that is one possibility | 15:32 |
frickler | so restarting sounds reasonable, maybe also check for uncleaned build dirs if possible | 15:33 |
clarkb | frickler: should I do that or did you want to do it? | 15:34 |
frickler | clarkb: please go ahead, I'm in evening mode already | 15:35 |
clarkb | ok | 15:37 |
clarkb | I've stopped services and have begun cleanup of /opt/dib_tmp. Will reboot when that is done | 15:39 |
frickler | openeuler has a different failure, seems there is still an issue in the mirror setup for it | 15:47 |
frickler | error: Status code: 404 for http://mirror.dfw.rax.opendev.org/openeuler/openEuler-20.03-LTS-SP2//openEuler-20.03-LTS-SP2/OS/x86_64/repodata/repomd.xml | 15:47 |
frickler | note the duplication in the middle of the path | 15:47 |
frickler | oh, wait, that's 20.03. we only mirror 22.03 currently | 15:48 |
frickler | and that's from an old build log, too. maybe we should clean those up after like a couple of weeks to avoid such confusion? | 15:50 |
frickler | the error for 22.03 is: /tmp/in_target.d/pre-install.d/00-setup-mirror: line 12: TARGET_ROOT: unbound variable | 15:50 |
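[The failure mode here is bash's `set -u` (nounset) aborting on an unset variable; a minimal reproduction with the usual guard. The default path is a made-up example, not what dib actually uses:]

```shell
#!/bin/bash
set -eu
# Under nounset, expanding an unset TARGET_ROOT aborts with
# "TARGET_ROOT: unbound variable" -- the error from the build log above.
# The ${var:-default} expansion is the standard guard:
TARGET_ROOT="${TARGET_ROOT:-/tmp/dib-target}"   # hypothetical default
echo "$TARGET_ROOT"
```

In dib's case the real fix is presumably to ensure the variable is exported before the script runs, rather than to default it.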
frickler | which I think is related to the latest fix attempt in dib | 15:51 |
clarkb | ++ to removing old images | 15:52 |
frickler | clarkb: old images should get removed by nodepool when we remove them from the config in the right order, this is just about the build logs that linger | 15:54 |
frickler | https://review.opendev.org/c/openstack/diskimage-builder/+/878807/1/diskimage_builder/elements/openeuler-minimal/pre-install.d/00-setup-mirror is exactly the file above | 15:54 |
clarkb | frickler: oh I see | 15:54 |
clarkb | re the TARGET_ROOT error previously that file wasn't running at all because it wasn't named in a way the runparts implementation would find and execute. I suspect this has exposed a latent bug now that it is actually running | 15:55 |
frickler | yes, maybe the author will notice and debug further, seems they have some interest in openeuler | 15:56 |
*** amoralej is now known as amoralej|off | 16:02 | |
fungi | huawei has a vested interest in openstack upstream testing on that platform, but i think they're not always paying close enough attention to help ensure it happens. i've prodded account managers at the foundation to bring the recent openeuler testing concerns to the attention of contacts there | 16:05 |
fungi | hopefully they'll figure out that they need to keep an eye on this stuff | 16:06 |
*** jpena is now known as jpena|off | 16:17 | |
clarkb | ianw: I've reviewed most of the dns update changes and I think overall the process looks good. Testing did catch a bug and I've left a couple of questions though | 16:32 |
opendevreview | Martin Kopec proposed opendev/system-config master: refstack: fix doc paths https://review.opendev.org/c/opendev/system-config/+/880767 | 16:33 |
clarkb | ianw: I also left a note in the etherpad about needing to update the other zones (zuul-ci.org, zuulci.org, and gating.dev) as well | 16:34 |
opendevreview | Merged opendev/system-config master: letsencrypt test : update to jammy https://review.opendev.org/c/opendev/system-config/+/880698 | 16:37 |
fungi | when did zuul start reporting the offending change number in a parent merge failure? that's SO useful | 16:43 |
clarkb | I want to say that happened end of last year or early this year | 16:44 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Dummy mailman hostname to house the list of lists https://review.opendev.org/c/opendev/zone-opendev.org/+/867981 | 16:46 |
clarkb | nb04 finally finished cleaning up things in dib_tmp and I am rebooting it now | 16:59 |
clarkb | losetup -f looks better | 17:02 |
clarkb | lets see how the next build(s) do | 17:02 |
opendevreview | Martin Kopec proposed opendev/system-config master: refstack: fix doc paths https://review.opendev.org/c/opendev/system-config/+/880767 | 17:11 |
clarkb | fungi: do ns servers need valid reverse DNS? | 17:39 |
clarkb | (something we'll need to ask vexxhost to update if so) | 17:39 |
fungi | not strictly required afaik, just a really really good idea | 17:42 |
frickler | +1, was about to write something similar | 17:42 |
clarkb | ok I'll add a note in the etherpad | 17:42 |
fungi | but then, i tend to think accurate reverse dns records for everything is a good idea | 17:43 |
ianw | one other thought is that i can't remember how the grafana page is generated, but there is a chance it's not 100% reflecting all the active volumes. it might be worth checking to ensure something's not growing that we're not seeing there | 20:01 |
clarkb | ianw: the 92.2% is total though right? you're saying a volume may have growth we aren't graphing | 20:02 |
* clarkb finds lunch. | 20:03 | |
ianw | the overall is right, but yeah, might miss a volume | 20:03 |
clarkb | got it | 20:03 |
clarkb | oh also arm64 image builds were all broken. I decided to start by cleaning up the builder since it was complaining around losetup commands | 20:03 |
clarkb | will need to check on build logs to see if that persists after clearing dib_tmp and reboots | 20:04 |
frickler | there was one good build at least after the reboot https://nb04.opendev.org/ubuntu-bionic-arm64-0000147820.log | 20:48 |
frickler | maybe we also want to pause openeuler builds now that they are hard broken | 20:49 |
ianw | i thought we merged a mirror fix for that? i'll have to pull it up. but thanks for getting it back on track | 22:44 |
ianw | clarkb: good point on the '.' in https://review.opendev.org/c/opendev/system-config/+/880580 and will fix | 22:45 |
ianw | AFAIK, the name listed in the SOA record doesn't really do anything | 22:45 |
clarkb | ianw: re the '.' it was the test results that made me realize those were missing since it was complaining about ns99.opendev.org.acme.opendev.org :) | 22:46 |
ianw | yeah that for sure :) | 22:46 |
ianw | i'm not sure what it means to have the hidden master listed in the SOA record. it doesn't respond to anyone but the two nameservers | 22:47 |
clarkb | oh good point | 22:48 |
clarkb | ya so that should be fine to toggle back and forth | 22:48 |
clarkb | unless it needs to match what the registrar says? | 22:48 |
ianw | from what i read, and i'm willing to be corrected, it's only used by some fairly obscure dynamic-dns implementation | 22:49 |
clarkb | ianw: because the dynamic updates want to go to the primary authoritative server? Thinking out loud here: we give the registrar our NS server values so that `dig NS opendev.org` can resolve, due to the way the dns db tree works. So ya, the actual adns server which doesn't respond to requests shouldn't matter here | 22:51 |
clarkb | (and really the registry is recording those values with the .org domain not doing anything with it itself) | 22:51 |
ianw | i don't know the flow for the dynamic-dns use of the SOA record. but yeah, for us it's ask .org. for NS records for opendev.org. (which has glue records for ns1/ns2) and then it checks the opendev.org. NS records to make sure the server is authoritative | 22:53 |
fungi | the name in the soa record is used by some secondary servers to know which server to axfr the zone from, in situations where a resolver may be a secondary for multiple primaries | 22:55 |
clarkb | fungi: so as long as we change that record after adns2 is up and running we're fine | 22:56 |
fungi | it boils down to the behavior of the nsd and how explicit the configs are | 22:56 |
fungi | normally, you want the primary to notify secondaries when there are serial changes anyway | 22:56 |
fungi | so it rarely comes into play | 22:57 |
ianw | right -- that is our situation | 22:57 |
fungi | the hostname in the soa can become useful if notifications don't happen (or are missed) and the zone ttl expires | 22:58 |
fungi | but otherwise tends to be irrelevant | 22:58 |
ianw | since the secondaries have just one variable telling them their primary, we can deploy all the zones to two primaries (adns1 and adns02) with ansible, and none of the 4 secondaries (ns1,ns2,ns03,ns04) will get confused | 22:58 |
fungi | agreed | 22:59 |
fungi | it also used to be a little more relevant since you might have the same zones served from multiple primaries, you could use that field to track which primary the original record came from, but nobody really does that these days | 23:00 |
fungi | to a great extent it's residual baggage from an era of belt-and-suspenders design | 23:01 |
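[For reference, the field being discussed is the MNAME (first field) of the SOA record; a hypothetical zone excerpt with the server names from this conversation, and the serial/timer values invented for illustration:]

```
; Hypothetical SOA: the first field (MNAME) names the primary server.
; As discussed above, it mostly matters for secondaries deciding where
; to AXFR from, and as the target for RFC 2136 dynamic updates.
opendev.org.  IN  SOA  adns1.opendev.org. hostmaster.opendev.org. (
                       2023041900  ; serial
                       3600        ; refresh
                       900         ; retry
                       604800      ; expire
                       300 )       ; negative-caching TTL
opendev.org.  IN  NS   ns1.opendev.org.
opendev.org.  IN  NS   ns2.opendev.org.
```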
opendevreview | Clark Boylan proposed opendev/system-config master: Run the replication task cleanup daily https://review.opendev.org/c/opendev/system-config/+/880688 | 23:06 |
clarkb | ianw: ^ good catch | 23:06 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 23:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers https://review.opendev.org/c/opendev/system-config/+/880579 | 23:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns : add Jammy refresh servers https://review.opendev.org/c/opendev/system-config/+/880706 | 23:27 |
opendevreview | Ian Wienand proposed opendev/system-config master: Remove old DNS servers https://review.opendev.org/c/opendev/system-config/+/880710 | 23:27 |
ianw | i guess the gitea api gets a lot less exposure because the UI isn't one of these SPAs that drive everything via API calls | 23:40 |
ianw | when I first looked, i pulled up the network monitor and reloaded /explore/organizations and was like -- wait, where's the calls? | 23:41 |
ianw | imagine my shock to "view source" and see ... actual source | 23:41 |
ianw | not just "<script jsblob.js>" | 23:41 |
fungi | oh those were the days | 23:43 |
clarkb | ya I want to say it was org renaming or something that we hacked around before by manipulating forms because there was no api for it | 23:46 |
clarkb | but then that got fixed | 23:46 |
clarkb | ianw: why does key[1] change to key[2] ? | 23:47 |
clarkb | we aren't changing any of the datastructures as far as I can tell? | 23:47 |
clarkb | in https://review.opendev.org/c/opendev/system-config/+/880580 specifically | 23:47 |
ianw | arrrgggghhhhhh! | 23:52 |
ianw | it's freaking emacs dns mode. it thinks it's the serial number i think | 23:52 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns: abstract names https://review.opendev.org/c/opendev/system-config/+/880580 | 23:54 |
opendevreview | Ian Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers https://review.opendev.org/c/opendev/system-config/+/880579 | 23:54 |
opendevreview | Ian Wienand proposed opendev/system-config master: dns : add Jammy refresh servers https://review.opendev.org/c/opendev/system-config/+/880706 | 23:54 |
opendevreview | Ian Wienand proposed opendev/system-config master: Remove old DNS servers https://review.opendev.org/c/opendev/system-config/+/880710 | 23:54 |
ianw | now saved with vi :) | 23:55 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!