clarkb | on backup02 I think we can live with this situation considering the lists case is handled properly due to a lack of a db backup | 00:00 |
---|---|---|
clarkb | When we upgrade to mailman3 we'll add a db I think as part of that but we'll also switch to a new server | 00:00 |
fungi | yep | 00:00 |
clarkb | and that new server will have a new borg backup location and we won't have problems as a result | 00:00 |
ianw | i can just remove those archives, as they are old, from before i fixed it | 00:00 |
clarkb | basically that means lists is fine as is I think | 00:00 |
clarkb | and review01 is unfortunate but not much we can do now and we have backup01 archives | 00:00 |
ianw | (the archives NOT named "lists-filesystem-*" in the lists backup, to be clear) | 00:01 |
clarkb | ianw: right or we can keep them and let them age out/persist as it's ok in that particular circumstance (maybe make note of it somewhere) | 00:01 |
ianw | yes, actually, i can just rename it "lists-filesystem" with the same date | 00:01 |
ianw | i'll do that, and same for review, to avoid confusion over this | 00:02 |
ianw | and i will audit backup01 for the same issue | 00:02 |
clarkb | ok I'm completely off of that server fwiw so there won't be any interference from me | 00:02 |
clarkb | (and the only non RO thing I did was run the prune script which logs to a file for us automatically) | 00:02 |
ianw | and i'll post a change for storyboard db backup. i'm kind of assuming that's a trove? | 00:02 |
clarkb | I think it may be a local db fungi ^ ? | 00:02 |
ianw | anyway, i'll look and figure it out | 00:03 |
fungi | yes, it's on the server | 00:03 |
fungi | we did that to help performance | 00:03 |
clarkb | ianw: thanks. This was a good exercise for me to better understand this stuff. Thank you for walking me through this stuff | 00:03 |
ianw | clarkb: thank you! another set of eyes always finds interesting things | 00:03 |
clarkb | I was hoping to restart nodepool-builder on nb03 today as well. I'll check on that after dinner I guess | 00:03 |
ianw | did that change pass? | 00:04 |
clarkb | ianw: its in the gate right now | 00:04 |
clarkb | 816389 specifically | 00:04 |
ianw | ok, no worries. i can watch and pull the new container when it promotes and check it out | 00:04 |
clarkb | I don't mind checking really quickly after dinner. Then can hand off or defer to tomorrow if this finds a new problem :) | 00:05 |
ianw | if i get the f35 change in, i will want to bump a new nb container with a later dib release in it to build that too | 00:05 |
clarkb | alright food is here. I'll check in in a bit on the nodepool thing | 00:05 |
clarkb | thanks again | 00:05 |
ianw | ok, step 1, backups01 doesn't have this issue : https://paste.opendev.org/show/810352/ | 00:13 |
ianw | *** /opt/backups/borg-wiki-update-test/backup | 00:14 |
ianw | wiki-upgrade-test-filesystem | 00:14 |
ianw | wiki-upgrade-test-filesystem-2021-02-16 | 00:14 |
ianw | is weird though | 00:14 |
ianw | that is coming from "wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint Tue, 2021-02-16 02:56:11 [c444a0765e5791f3f68f08624d1efd80bf8a3ebc96bb225f08e4013befa2b460]" | 00:16 |
ianw | not sure where ".checkpoint" comes from ... | 00:16 |
ianw | https://borgbackup.readthedocs.io/en/stable/faq.html#if-a-backup-stops-mid-way-does-the-already-backed-up-data-stay-there | 00:17 |
clarkb | ok I've pulled the new arm64 builder image and started services there | 00:31 |
clarkb | it seems to be running and not dying in a loop | 00:31 |
clarkb | it is having some trouble deleting images in osuosl because they are in use outside of the glance store so we may have leaked images there | 00:32 |
clarkb | but things seem to be running so I can look at that tomorrow. | 00:32 |
ianw | thanks | 00:34 |
opendevreview | Ian Wienand proposed opendev/system-config master: borg-backup: skip .checkpoint archives https://review.opendev.org/c/opendev/system-config/+/816422 | 01:45 |
corvus | ianw: i'm going to restart zuul for some more interactive debugging | 01:47 |
ianw | okdokie | 01:47 |
corvus | restarting now | 01:52 |
corvus | re-enqueuing | 02:03 |
ianw | i've done like "/opt/borg/bin/borg rename /opt/backups/borg-lists/backup::lists-2021-01-14T05:51:03 lists-filesystem-2021-01-14T05:51:03" on backup01; all borg-lists archives are now correctly named | 02:07 |
fungi | awesome, thanks for straightening that out | 02:09 |
corvus | i'm starting a second scheduler, so status pages will produce sporadic errors for a bit | 02:10 |
corvus | re-enqueue complete | 02:12 |
ianw | ok, same done for review01 on backup02 (sorry, prior message was about backup02 as well) | 02:13 |
ianw | clarkb: re storyboard dbs; i mounted the backups and var/backups/mysql_backups/ is populated and looks good. i don't think it's worth updating the puppet to put in a streaming backup job | 02:37 |
ianw | i've heard rumors of it being containerised, etc. so i guess we can just update it when we pull it all under ansible control | 02:38 |
corvus | okay i have managed to break zuul in a controlled manner | 02:42 |
corvus | i'm continuing to debug for a bit before i restart | 02:43 |
corvus | i think i grok it. i'll revert/restart now | 02:48 |
corvus | deleting zk state | 02:50 |
corvus | re-enqueuing | 03:14 |
corvus | done | 03:18 |
corvus | ianw: okay, i think everything's back to normal now re zuul; i'm heading out | 03:19 |
*** redrobot2 is now known as redrobot | 05:58 | |
*** gibi is now known as gibi_pto_back_thu | 06:11 | |
*** jpena|off is now known as jpena | 10:09 | |
opendevreview | Dr. Jens Harbott proposed opendev/infra-manual master: Update description of our testing environment https://review.opendev.org/c/opendev/infra-manual/+/816451 | 11:02 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: cloud-init: Support growpart and pvresize for specified devices https://review.opendev.org/c/openstack/diskimage-builder/+/816458 | 11:15 |
*** dviroel|out is now known as dviroel|rover | 11:21 | |
opendevreview | Alfredo Moralejo proposed openstack/project-config master: Add support for CentOS Stream 9 in nodepool elements https://review.opendev.org/c/openstack/project-config/+/811442 | 11:49 |
opendevreview | Alfredo Moralejo proposed openstack/project-config master: Add centos-9-stream nodepool image https://review.opendev.org/c/openstack/project-config/+/816465 | 11:49 |
opendevreview | Merged opendev/infra-manual master: Update description of our testing environment https://review.opendev.org/c/opendev/infra-manual/+/816451 | 11:53 |
opendevreview | Merged openstack/diskimage-builder master: fedora-container: update to Fedora 35 https://review.opendev.org/c/openstack/diskimage-builder/+/815574 | 11:58 |
clarkb | ianw: oh got it, it's doing the old school backups so we have the db in there. | 15:19 |
opendevreview | Jeremy Stanley proposed opendev/infra-specs master: Add a specification for Mailman 3 https://review.opendev.org/c/opendev/infra-specs/+/810990 | 15:21 |
clarkb | fungi: do you want to respond to emilienM or should I do it re gophercloud using "some" opendev resources (we're pretty well aligned against doing that at this point, but an openstack project could set up third party testing with the sdk I suppose) | 16:26 |
fungi | yeah, i was just pulling up the references to link in my reply | 16:27 |
clarkb | thanks I figured you were already on top of it since you tend to read that mailing list more quickly than I do hence my question :) | 16:27 |
clarkb | let me know if I can help | 16:27 |
fungi | replied | 16:41 |
clarkb | fungi: for the mailman3 spec (catching up on the latest ps) the talk about bounce handling and all that makes me wonder if we need to explicitly test the dmarc/dkim stuff? | 16:42 |
clarkb | I think kata lists just do the munging and opendev lists do the don't rewrite/add headers option. We probably want to ensure both options are still viable? | 16:42 |
clarkb | I don't think we need to explicitly add that to the spec unless you feel it is important too, but wanted to call it out while I was thinking of it | 16:43 |
*** marios is now known as marios|out | 16:43 | |
fungi | i figured we'd evaluate all of that at the time, yeah | 16:48 |
corvus | fyi, my thinking on zuul now is that it would be best to try to wait until we finish the branch cache refresh work before we restart on master again. we have hopefully found and fixed the biggest bug (re the change cache) in master, but we know we can't run with two schedulers for long without the branch cache work. so getting that work done before restarting again puts us in a place where we can potentially run with 2 schedulers for a longer | 17:16 |
corvus | period if everything works. and in the interim, that gives the opendev service a breather since we had a lot of restarts recently. | 17:16 |
clarkb | makes sense | 17:17 |
fungi | yeah, works for me, thanks | 17:21 |
frickler | infra-root: ethercalc seems down (503), can someone doublecheck and possibly restart? | 17:36 |
clarkb | looks like the service itself isn't running | 17:38 |
clarkb | I'll restart it | 17:38 |
clarkb | I don't see any logs in journalctl or in /var/log/ethercalc so not sure why | 17:38 |
clarkb | frickler: it is up now | 17:40 |
fungi | we've seen it crash due to a csv export bug in the past | 17:40 |
clarkb | fungi: any idea where the logs go? | 17:40 |
fungi | also our puppet-ethercalc module explicitly ensures "started" which is why it magically recovers on its own if left alone | 17:41 |
fungi | apache maybe? | 17:41 |
clarkb | well it runs as a separate node process | 17:41 |
clarkb | all apache does is proxy | 17:41 |
fungi | oh, syslog i think | 17:41 |
clarkb | its also possible that it just doesn't log anything and dies | 17:41 |
fungi | digging | 17:42 |
fungi | yeah, it's in syslog | 17:42 |
fungi | /var/log/syslog.1 around 14:08:43 utc yesterday | 17:43 |
fungi | Error: Can't set headers after they are sent. | 17:43 |
clarkb | heh so a bug in the tool. I guess restarting is what we do then | 17:45 |
fungi | yeah, same error we usually see, reported a few years ago but still unfixed: https://github.com/audreyt/ethercalc/issues/626 | 17:46 |
fungi | we might be able to disable excel exporting | 17:46 |
*** jpena is now known as jpena|off | 17:48 | |
frickler | cacti02 says: The SSL certificate for openstackid.org "(CN: subject= /CN=openstackid.org)" will expire on Dec 1 19:21:18 2021 GMT | 18:23 |
fungi | yeah, clarkb gave the sysadmins for that service a heads up about it earlier today | 18:25 |
fungi | we don't host it any longer, but we do host a few systems which would be inaccessible if it were to have its cert expire, so might be good to continue monitoring it | 18:26 |
fungi | or maybe it's just translate.openstack.org we host which uses it at this point | 18:26 |
clarkb | ya I think keeping that check in place for another cycle or two is a good idea | 18:28 |
clarkb | then if they are refreshing the cert properly we might turn it off | 18:28 |
fungi | our znode count is drastically reduced (by more than half) since sunday. did zuul's znode usage get more efficient? | 19:03 |
dmsimard | FYI, upstream EOL of ansible 2.9 and ansible-base 2.10 have been announced: https://groups.google.com/g/ansible-announce/c/kegIH5_okmg/ | 19:04 |
fungi | https://grafana.opendev.org/d/5Imot6EMk/zuul-status?viewPanel=37&orgId=1&from=now-14d&to=now | 19:04 |
fungi | dmsimard: thanks for the info! | 19:04 |
fungi | "The planned end of life date for upstream Ansible 2.9 is May 23, 2022 which coincides with the scheduled release of ansible-core 2.13. End of life for ansible-base 2.10 will also coincide with the scheduled release of ansible-core 2.13." | 19:05 |
*** dviroel|rover is now known as dviroel|rover|afk | 19:45 | |
ianw | clarkb/fungi: https://review.opendev.org/c/opendev/system-config/+/816422 was a fairly easy one from yesterday after i noticed weird archives in the list. be good if you could double check my awking | 20:17 |
ianw | it looks like magnum have merged the change to stop using our mirror on all branches that currently aren't in a failing state. | 20:18 |
ianw | i'll take opinions on our mirror removal with https://review.opendev.org/c/opendev/system-config/+/816416 ... we could either force-merge the other changes to the broken branches, or just leave those changes as-is | 20:19 |
ianw | if anyone wants to fix the gates, they can stack their changes on top, but i feel like we've given good notification the image isn't there any more | 20:19 |
fungi | yeah, i think the awk script there looks okay, maybe in the future we can replace it with inefficient and verbose code which is a bit less inscrutable | 20:23 |
ianw | probably a "borg list-archives" command would be useful, that just gives the unique archive names in the backup | 20:25 |
ianw | ... having been prompted and actually reading borg list ... | 20:26 |
ianw | possibly --format '{name}{NL}' does that? | 20:27 |
clarkb | dmsimard: I believe we're on ansible 4? something like that | 20:28 |
*** dviroel|rover|afk is now known as dviroel|rover | 21:29 | |
*** dviroel|rover is now known as dviroel|rover|out | 22:18 | |
clarkb | ianw: I've approved the .checkpoint fixup after testing the awk locally. I think that is correct but maybe we should run a noop prune after it lands? | 22:21 |
clarkb | by testing the awk I mean echo FOO | awk where FOO has a .checkpoint or it doesn't | 22:21 |
clarkb | I didn't check it with actual borg listings | 22:21 |
ianw | clarkb: thanks, good idea; backup01 has one | 22:26 |
opendevreview | Merged opendev/system-config master: borg-backup: skip .checkpoint archives https://review.opendev.org/c/opendev/system-config/+/816422 | 22:45 |
ianw | i'm just running that noop prune test now | 23:29 |
opendevreview | Merged opendev/system-config master: Add Fedora 35 mirror https://review.opendev.org/c/opendev/system-config/+/816404 | 23:42 |
ianw | i'll run that ^ i've noticed that heat is using fedora images (https://review.opendev.org/c/openstack/heat/+/816592) | 23:50 |
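
The `.checkpoint` discussion above (skipping interrupted-backup archives when working out which archive names exist in a repository) can be illustrated with a small sketch. This is not the awk script from change 816422, which is not shown in this log; it is a hypothetical Python equivalent. It assumes archive names follow the `<prefix>-YYYY-MM-DDTHH:MM:SS` pattern seen above, optionally suffixed with `.checkpoint`, and that the names arrive one per line (for example from `borg list` output reduced to names only). The function name `unique_archive_prefixes` is made up for illustration.

```python
import re

# Assumption: archive names end in "-YYYY-MM-DDTHH:MM:SS", as in the log above.
TIMESTAMP_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$")


def unique_archive_prefixes(archive_names):
    """Return unique archive prefixes in first-seen order, skipping checkpoints."""
    prefixes = []
    for name in archive_names:
        name = name.strip()
        # Skip blank lines and archives left behind by an interrupted backup.
        if not name or name.endswith(".checkpoint"):
            continue
        # Drop the timestamp so archives group under their common prefix.
        prefix = TIMESTAMP_SUFFIX.sub("", name)
        if prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


if __name__ == "__main__":
    sample = [
        "lists-filesystem-2021-01-14T05:51:03",
        "wiki-upgrade-test-filesystem-2021-02-16T02:56:09.checkpoint",
        "wiki-upgrade-test-filesystem-2021-02-17T02:56:09",
    ]
    print(unique_archive_prefixes(sample))
```

Filtering before stripping the timestamp matters here: a checkpoint name does not end in a bare timestamp, so without the filter it would survive as a bogus extra prefix, which matches the odd `wiki-upgrade-test-filesystem-2021-02-16` entry noted earlier in the log.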