clarkb | fungi: should we approve https://review.opendev.org/c/opendev/system-config/+/951873 ? | 14:53 |
corvus | +2 | 14:55 |
fungi | clarkb: yes | 14:56 |
clarkb | I've approved it | 15:05 |
fungi | thanks! | 15:06 |
Ramereth[m] | Yes, everything still seems stable and good like it was before? | 15:07 |
clarkb | mnasiadka: frickler ^ | 15:07 |
clarkb | Ramereth[m]: the only reports I've heard are positive so that is my assumption | 15:07 |
mnasiadka | Yes, I commented the other day that all seems to be back to normal :) | 15:10 |
frickler | I didn't check today, but all arm builds passed on https://review.opendev.org/c/opendev/zuul-providers/+/951471 yesterday, too | 15:23 |
opendevreview | Merged opendev/system-config master: Block access to Gitea's archive feature https://review.opendev.org/c/opendev/system-config/+/951873 | 17:01 |
clarkb | that is behind the hourly jobs | 17:04 |
clarkb | the gitea job failed doing an apt cache update on gitea10. Since we run each gitea in sequence that is as far as we got, and I think only gitea09 may have a new app.ini config | 17:13 |
clarkb | I'm going to try a manual apt-get update on gitea10 now | 17:13 |
clarkb | Temporary failure resolving 'us.archive.ubuntu.com' | 17:14 |
clarkb | ;; communications error to 127.0.0.1#53: connection refused | 17:15 |
fungi | lovely | 17:16 |
fungi | dns config problem? | 17:16 |
clarkb | error: failed to read /var/lib/unbound/root.key | 17:16 |
clarkb | from journalctl -u unbound | 17:16 |
clarkb | fungi: I suspect this is fallout from full disks since 10, 11, 13, and 14 all seem to have the same issue | 17:17 |
clarkb | 09 and 12 appear to be ok and they were also the ones that didn't fill disks | 17:18 |
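The diagnosis that unfolds over this exchange, condensed into a rough sketch (the zero-byte root.key turns up a few lines below):

```sh
cat /etc/resolv.conf                   # expected to point at the local resolver, 127.0.0.1
systemctl status unbound               # is the local resolver actually running?
journalctl -u unbound | tail -n 20     # shows the "failed to read /var/lib/unbound/root.key" error
ls -l /var/lib/unbound/root.key        # a zero-byte file here means the trust anchor was lost
```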
fungi | ah, i wonder if ansible tried to overwrite the file while the disk was full | 17:18 |
clarkb | fungi: maybe we can modify /etc/resolv.conf to point at 1.1.1.1/8.8.8.8 temporarily, then reinstall the package with unbound's trust anchor data, then restart unbound? | 17:18 |
clarkb | then restore /etc/resolv.conf? | 17:18 |
clarkb | then reenqueue the job | 17:18 |
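A sketch of that proposed workaround; it turned out not to be needed (a simpler fix is found below), and the reinstall step is an assumption since, as fungi notes shortly, root.key is generated rather than shipped by a package:

```sh
cp /etc/resolv.conf /etc/resolv.conf.orig
printf 'nameserver 1.1.1.1\nnameserver 8.8.8.8\n' > /etc/resolv.conf
apt-get update
apt-get install --reinstall unbound-anchor   # package named later in the discussion; may not regenerate root.key on its own
systemctl restart unbound
mv /etc/resolv.conf.orig /etc/resolv.conf
```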
clarkb | any chance you're able to dig into that? | 17:19 |
fungi | sure, on it | 17:19 |
clarkb | I want to say that data comes from that package that we now have to explicitly install. But ya maybe package updates went sideways or something? | 17:20 |
fungi | unfortunately the package is not still present in /var/cache/apt/archives/ or i'd just reinstall from there | 17:20 |
clarkb | -rw-r--r-- 1 unbound unbound 0 Jun 4 14:10 root.key | 17:20 |
clarkb | 0 bytes is definitely not something I would expect unbound to be able to handle | 17:20 |
fungi | i think the file is generated, because it's not in the list of files shipped in any package | 17:21 |
clarkb | ah | 17:21 |
clarkb | so the script ran and wrote out bytes that couldn't be preserved | 17:22 |
clarkb | possible it worked until we rebooted too if things were in cache somehow? | 17:22 |
fungi | the unbound-anchor package includes a /usr/sbin/unbound-anchor tool which seems to do that | 17:23 |
fungi | yeah, i expect the running unbound daemon had it in process memory until the reboot | 17:24 |
fungi | looks like /etc/init.d/unbound is supposed to run it? | 17:24 |
fungi | specifically on start it runs `/usr/lib/unbound/package-helper root_trust_anchor_update 2>&1 | logger -p daemon.info -t unbound-anchor` | 17:25 |
clarkb | I wonder if systemd even runs that script anymore? | 17:26 |
clarkb | /usr/lib/systemd/system/unbound.service also exists, and it too has a pre-start helper to run that | 17:26 |
clarkb | maybe we can just stop/start unbound? | 17:27 |
fungi | looks like systemd is using /usr/lib/systemd/system/unbound.service | 17:27 |
clarkb | perhaps it's a chicken-and-egg problem? | 17:28 |
fungi | which runs `/usr/lib/unbound/package-helper root_trust_anchor_update` in ExecStartPre | 17:28 |
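To see which unit file systemd is actually using and to exercise that pre-start hook by hand, something like this works (a sketch; the helper path is the one quoted above, run as root):

```sh
systemctl cat unbound.service | grep -E 'ExecStartPre|# /'   # unit file in use plus its pre-start hook
/usr/lib/unbound/package-helper root_trust_anchor_update     # run the anchor update manually
```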
clarkb | do we need working dns for that script to run? | 17:28 |
fungi | maybe just removing the empty /var/lib/unbound/root.key would work? | 17:28 |
clarkb | oh ya that could be. Maybe it sees the file is present and doesn't check for non-zero size | 17:29 |
fungi | alternatively i can shuffle a copy over from another machine like gitea09 | 17:29 |
clarkb | fungi: I think we can try (re)moving the empty file then do a systemctl stop unbound && systemctl start unbound and see if it is happier | 17:29 |
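That is the fix that ends up working; as a minimal sketch:

```sh
mv /var/lib/unbound/root.key /var/lib/unbound/root.key.broken   # zero-byte leftover from the full disk
systemctl stop unbound && systemctl start unbound               # ExecStartPre regenerates the trust anchor
apt-get update                                                   # confirm name resolution works again
```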
fungi | yeah, /var/lib/unbound/root.key has content after that | 17:30 |
fungi | and apt update works again | 17:30 |
clarkb | and there is an unbound process cool | 17:30 |
fungi | i'll repeat on the other affected servers | 17:30 |
clarkb | thanks | 17:30 |
clarkb | also gitea09 did not restart as expected so we need to follow up and do that too | 17:31 |
clarkb | but first fix dns, reenqueue the change to deploy so that the other 5 get updated app.ini files | 17:31 |
fungi | okay, on gitea10, 11, 13 and 14 i removed the empty root.key file, restarted unbound and tested that apt update is working | 17:33 |
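Roughly what repeating that across the affected backends looks like (a sketch; host list taken from the discussion above):

```sh
for h in gitea10 gitea11 gitea13 gitea14; do
  ssh root@$h.opendev.org 'rm /var/lib/unbound/root.key && systemctl restart unbound && apt-get update'
done
```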
fungi | shall i do the deploy reenqueue as well or were you working on that already? | 17:33 |
clarkb | go for it | 17:33 |
fungi | well, that was unexpected... my workstation just spontaneously powered itself off and back on | 17:34 |
fungi | i'll be a few then i'll get that going | 17:35 |
clarkb | ok | 17:35 |
fungi | i should be in the deploy pipeline again now | 17:38 |
fungi | er, it should be | 17:38 |
clarkb | I see it | 17:38 |
fungi | as do i | 17:39 |
fungi | going to take this opportunity to do a bit of power recabling on the workstation while that re-runs | 17:39 |
clarkb | fungi: the safe gitea restart process is to disable the host in haproxy, then docker-compose down on the host, then docker-compose up -d mariadb memcached zuul-web, then wait for the web to load on that particular node, then docker-compose up -d to start the ssh daemon, finally reenable the host in haproxy | 17:39 |
clarkb | The haproxy stuff is probably not super necessary either, as it should notice that things are down, but that risks breaking a non-zero number of connections, whereas stopping things in haproxy first should let stuff drain a bit more | 17:40 |
clarkb | and the container start order is chosen so that gerrit doesn't try to replicate before the gitea service can be aware of the updates (the ssh container would accept the updates, but the gitea service wouldn't be aware of them even though they'd be in the git repo) | 17:41 |
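A sketch of that restart sequence on a single backend (using gitea-web rather than zuul-web, per the correction further down; the compose directory path is an assumption):

```sh
# disable the backend in haproxy first so connections drain (see the socat sketch below)
cd /etc/gitea-docker                               # assumed location of the compose file
docker-compose down
docker-compose up -d mariadb memcached gitea-web
until curl -sf https://gitea09.opendev.org:3081/ >/dev/null; do sleep 5; done   # wait for the web UI
docker-compose up -d                               # now start the remaining gitea-ssh container
# finally re-enable the backend in haproxy
```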
fungi | we have that encoded in a playbook too yes? | 17:42 |
clarkb | ya | 17:42 |
fungi | would it make more sense to run that manually from bridge? | 17:42 |
clarkb | well the deploy job is already running the same playbook. The problem is those tasks only trigger if the container images change currently | 17:43 |
clarkb | and since there are no new container images they won't run | 17:43 |
clarkb | I guess we could make a new one off playbook that doesn't have that restriction similar to how we have the zuul restart playbooks | 17:44 |
clarkb | then run that | 17:44 |
clarkb | that isn't a terrible idea | 17:44 |
fungi | yeah, i guess trying to configure it to also run for config changes might be overkill | 17:46 |
fungi | this doesn't come up often | 17:46 |
clarkb | https://zuul.opendev.org/t/openstack/build/6e81a7d43ad841c1afebb6ae95774f33 success this time around | 17:48 |
clarkb | and I see the new config option in the app.ini file on gitea10 | 17:48 |
clarkb | I'm happy to just sort of work through these really quickly unless you think we should try and sort out a better automated system or tooling first | 17:49 |
fungi | nah, manual process is fine, it was more a matter of me remembering the mystic socat incantations (or more likely grepping them out of my shell history on the lb) | 17:50 |
clarkb | it should be documented in our docs too | 17:50 |
clarkb | but ya I often just look at my history :) | 17:50 |
fungi | ah, right, good old-fashioned runbooks | 17:52 |
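For reference, the socat incantations in question are haproxy runtime-API commands along these lines (socket path and backend/server names are assumptions, not copied from the real config):

```sh
echo 'disable server balance_git_https/gitea09.opendev.org' | socat stdio /var/haproxy/run/stats
echo 'disable server balance_git_http/gitea09.opendev.org'  | socat stdio /var/haproxy/run/stats
# ...restart gitea on the backend, then re-enable it:
echo 'enable server balance_git_https/gitea09.opendev.org'  | socat stdio /var/haproxy/run/stats
echo 'enable server balance_git_http/gitea09.opendev.org'   | socat stdio /var/haproxy/run/stats
```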
clarkb | just to avoid any confusion and stepping on toes. Should I start on that now with gitea09 or did you want to do it? | 17:54 |
fungi | i was going to do it, just making sure i have the right keys in place so i don't need my workstation for this | 18:01 |
clarkb | ok I'll hang around and can be an extra set of eyes and hands if necessary | 18:03 |
fungi | i have a root screen session going on gitea-lb02 | 18:04 |
fungi | i'll start by disabling gitea09 in the http and https pools | 18:04 |
fungi | both are showing in maint now | 18:07 |
fungi | proceeding with the docker-compose down now | 18:08 |
fungi | on gitea09 | 18:09 |
fungi | clarkb: when you mentioned the specific subset of containers to up, you listed zuul-web. was that supposed to be something else? | 18:09 |
clarkb | fungi: yes sorry it should be gitea-web | 18:10 |
clarkb | we want to start the three containers that are not gitea-ssh | 18:10 |
fungi | ah, yep i see it | 18:10 |
clarkb | then wait for the web service to respond then start the last container gitea-ssh | 18:10 |
fungi | okay, those three are up and gitea-ssh is still down | 18:10 |
clarkb | and https://gitea09.opendev.org:3081/opendev/system-config/ loads so now you can just do a default up -d to start the last container | 18:11 |
clarkb | (also if you hit the code dropdown on that page there are no more artifact links as expected) | 18:11 |
fungi | awesome! | 18:12 |
clarkb | we just need the gitea web service sufficiently up that it can process gerrit replication and having the web service respond seems to be sufficient indicator | 18:12 |
fungi | okay, did a full docker-compose up -d now and will add gitea09 back to the pools and start the same process on 10 through 14 in serial | 18:13 |
clarkb | yup sounds great | 18:14 |
clarkb | slittle: ^ fyi I think it's come up before that starlingx has asked about non-working gitea archive links. We're formally disabling them today due to all the problems we've had with them historically | 18:21 |
clarkb | something we probably would've done sooner if we had realized it was an option. I think when we did our big initial debugging around this it wasn't an option, and we didn't realize they later made it a possibility | 18:21 |
fungi | okay, all done | 18:27 |
clarkb | I hopped off the screen on lb02. I think you can close it whenever you're comfortable | 18:28 |
clarkb | and ya I was following along and checking the code dropdowns as each server finished. I think it looks good from here | 18:28 |
fungi | cool | 18:29 |
fungi | and done | 18:30 |
clarkb | thanks! | 18:30 |
fungi | thank you for the help! | 18:31 |
johnsom | Hi packaging gurus, I have a question you might know the answer to. | 20:25 |
johnsom | we have: https://github.com/openstack/octavia/blob/master/setup.cfg#L27 | 20:25 |
johnsom | data_files we'd like to include in the octavia package. I'm trying to fill out a pyproject.toml for octavia. Since we are still using pbr, which creates the manifest, I assume I don't need to include anything special in the pyproject.toml for those. Is that a correct assumption? | 20:26 |
Clark[m] | Yes PBR should continue to honor the setup.cfg | 20:28 |
johnsom | Excellent, thank you | 20:28 |
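In that case the pyproject.toml really only needs a build-system table, while data_files stay in setup.cfg where pbr keeps honoring them. One common shape for pbr projects, as a sketch (version pins are illustrative, and pbr.build assumes a pbr release new enough to ship its PEP 517 backend):

```sh
cat > pyproject.toml <<'EOF'
[build-system]
requires = ["pbr>=6.0.0", "setuptools>=64.0.0"]
build-backend = "pbr.build"
EOF
```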
fungi | though you could move them to the equivalent pyproject keys and let setuptools handle them directly from there as well | 20:29 |
fungi | i don't recall the exact names off the top of my head, but they should be in the pyproject spec | 20:30 |
johnsom | I looked at using "package_data", but I was worried it might conflict with the manifest file settings. (basically this whole data_files-to-pyproject.toml migration is a bit foggy for me) | 20:30 |
johnsom | Here is the proposed patch if you are curious what I am up to: https://review.opendev.org/c/openstack/octavia/+/951994/1/pyproject.toml | 20:33 |
fungi | what i usually do to test is just run pyproject-build in the old and new versions of the repo, then list the file contents of the resulting sdist/wheel files and compare them. also sometimes extract the metadata files and diff them too | 20:34 |
fungi | did that extensively for the setup.cfg->pyproject.toml conversions in several opendev tools | 20:35 |
johnsom | Yeah, that is a good idea. I used validate-pyproject for my syntax check | 20:35 |
johnsom | I need to do a bit more research, but I might propose validate-pyproject in global requirements so we have some validation on the file. It caught a few of my mistakes. | 20:38 |
fungi | sounds great to me. i also usually run twine check as part of my dist file validation for my personal projects | 20:39 |
fungi | since twine is what we're using to upload them to pypi eventually, it's good to know early if it's going to balk | 20:40 |
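Putting the validation steps from this exchange together, roughly (filenames are illustrative):

```sh
pyproject-build                         # or: python -m build; run in both the old and new checkouts
tar -tzf dist/octavia-*.tar.gz | sort   # list the sdist contents and diff against the old tree's listing
unzip -l dist/octavia-*.whl             # same comparison for the wheel
validate-pyproject pyproject.toml       # syntax/schema check of the new file
twine check dist/*                      # catch metadata problems before an eventual pypi upload
```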
johnsom | Yep, it seems good. The files in the tarball and twine check passed | 20:55 |