*** dviroel|afk is now known as dviroel | 00:41 | |
*** dviroel is now known as dviroel|out | 00:50 | |
*** rlandy is now known as rlandy|out | 01:58 | |
frickler | ianw: well cirros uses the ubuntu kernel, so that would likely be a match. I just don't understand why the same thing doesn't seem to be happening with the cloud image that neutron uses in their tests. but I can easily set up a test with the added cmdline parameter for cirros | 04:41 |
*** ysandeep|out is now known as ysandeep | 04:42 | |
*** ysandeep is now known as ysandeep|brb | 05:02 | |
ianw | frickler: hrm, does that use nested virt? probably doesn't happen on binary translation i guess | 05:17 |
frickler | ianw: yes, nested virt. https://opendev.org/openstack/neutron-tempest-plugin/src/branch/master/zuul.d/base-nested-switch.yaml | 05:27 |
frickler | if I switch back to qemu, the issue indeed doesn't happen | 05:28 |
frickler | but booting a full ubuntu image under qemu takes ages, it would be fine for the cirros based tests though | 05:29 |
frickler | https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/854910 is the root of my test stack. note that even jobs that aren't affected are failing (due to cirros not supporting network setup via configdrive plus some multicast issue) | 05:31 |
*** ysandeep|brb is now known as ysandeep | 05:35 | |
ianw | frickler: if you have a sec to just double-check https://review.opendev.org/c/opendev/system-config/+/857241 that is for the translate db backup failure you noted, thanks | 06:09 |
*** tkajinam is now known as Guest162 | 06:33 | |
frickler | infra-root: seems there is a tmux session on mirror-update since May 22 locking debian? | 07:09 |
frickler | kolla is seeing failures due to what looks like an outdated debian mirror, not 100% sure if related, but looks strange to me | 07:10 |
*** jpena|off is now known as jpena | 07:37 | |
*** diablo_rojo_phone is now known as Guest166 | 07:50 | |
*** mgoddard- is now known as mgoddard | 08:11 | |
*** ysandeep is now known as ysandeep|lunch | 09:06 | |
zigo | Hi. Is it expected that https://docs.openstack.org/releasenotes/ceilometer/unreleased.html is empty? | 09:13 |
zigo | (rc1 just got released...) | 09:13 |
frickler | looking closer, the tmux session doesn't cause issues other than confusing me, the flock has been exited | 10:14 |
frickler | however, debian seems to have had a large update on friday that may have timed out the reprepro run, there is a stale lockfile from that date | 10:15 |
frickler | I'm tempted to just remove it and let things rerun, but maybe taking the lock manually might be needed, waiting for others to jump in | 10:15 |
zigo | frickler: Maybe that's related to the point release of this week-end, where both buster and bullseye got a new point release? | 10:24 |
frickler | that would likely explain the rush of updates, yes | 10:25 |
zigo | FYI, this is the last Buster point release, as it is now handled by the LTS team rather than all DDs. | 10:27 |
zigo | This time, OpenStack (rocky) will be part of the supported list of software. | 10:27 |
zigo | So we get 5 years of support for OpenStack in Debian from now on, which I'm very happy about. | 10:28 |
*** rlandy|out is now known as rlandy | 10:29 | |
*** ysandeep|lunch is now known as ysandeep | 10:43 | |
fungi | zigo: about the release notes, i think the way reno handles that is it sorts them under the release candidate once it exists, so they moved to https://docs.openstack.org/releasenotes/ceilometer/zed.html when rc1 was tagged (because that's the branch point for stable/zed and all master branch development now is targeting 2023.1/antelope) | 11:26 |
*** dviroel|out is now known as dviroel | 11:36 | |
frickler | when I checked earlier, the Zed renos weren't there yet, either. IIRC it needs a patch in ceilometer to be merged after branching. not sure whether in master or zed or both | 11:37 |
zigo | Oh... | 11:54 |
zigo | I guessed it was updated right after I wrote here ! :) | 11:54 |
fungi | yeah, there's likely a period between branching and merging the initial autogenerated change to the stable branch where those release notes are in limbo | 11:58 |
*** frenzyfriday is now known as frenzyfriday|lunch | 12:32 | |
*** dasm|off is now known as dasm | 13:21 | |
*** frenzyfriday|lunch is now known as frenzyfriday | 13:26 | |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: Fix update_target definition for service-types-authority https://review.opendev.org/c/openstack/project-config/+/857435 | 13:38 |
frickler | gtema: dtantsur: ^^ seems that's an easy fix | 13:39 |
frickler | ianw: seems that was two typos in one patch. the other was found in 2020, this one made it longer ;) https://opendev.org/openstack/project-config/commit/e1d227bf5b70f531ca608202d47d5797536c67ad | 13:43 |
dtantsur | thx! | 13:45 |
frickler | fungi: clarkb: do you have time for a quick look at 857435? | 13:51 |
fungi | frickler: done | 14:21 |
frickler | ty | 14:24 |
opendevreview | Merged openstack/project-config master: Fix update_target definition for service-types-authority https://review.opendev.org/c/openstack/project-config/+/857435 | 14:28 |
frickler | dtantsur: would you consider merging the venus patch to verify ^^? or should I rather rerun the failed post job? | 14:33 |
*** ykarel is now known as ykarel|afk | 14:54 | |
*** ykarel|afk is now known as ykarel | 15:10 | |
dtantsur | frickler: so, merge https://review.opendev.org/c/openstack/service-types-authority/+/857080 ? | 15:27 |
frickler | dtantsur: if you will, yes. I created the followup | 15:28 |
dtantsur | done | 15:30 |
*** marios is now known as marios|out | 15:49 | |
*** dviroel is now known as dviroel|lunch | 15:51 | |
clarkb | frickler: did the mirror-update situation for debian get resolved? | 15:52 |
clarkb | frickler: I'm looking at the server and there are no flock processes currently. The files exist on disk but that doesn't mean they are locked iirc. You need an active process to hold open the lock I think | 15:55 |
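clarkb's point that a lock file left on disk is not the same as a held lock can be checked directly. The following is a minimal sketch, assuming a Linux host and a lock taken with flock(2) semantics; the helper name `is_flock_held` is hypothetical, and behavior on AFS paths may differ from a local filesystem.

```python
import fcntl
import os


def is_flock_held(path):
    """Return True if some other open file description holds an flock on path.

    Hypothetical helper: the file merely existing says nothing about the lock,
    so we try to take the lock ourselves without blocking.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else held it; release it again.
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)
```

A file that exists but returns False here is simply a leftover, which matches the observation that no flock processes were running.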
*** ysandeep is now known as ysandeep|out | 15:57 | |
frickler | clarkb: no, there still is this file: -rw------- 1 10004 root 0 Sep 10 14:10 /afs/.openstack.org/mirror/debian/db/lockfile | 16:02 |
frickler | and /var/log/reprepro/debian.log says it is not doing anything because of that | 16:03 |
frickler | on mirror-update | 16:04 |
clarkb | oh a reprepro lock file not our crontab flock locks. | 16:13 |
clarkb | The man page indicates that this can happen if reprepro is interrupted inappropriately (and indicates that sort of thing can also lead to db corruption :/) | 16:13 |
clarkb | afs: Warning: We are having trouble keeping the AFS stat cache trimmed down under the configured limit (current -stat setting: 15000, current vcache usage: 91072). afs: If AFS access seems slow, consider raising the -stat setting for afsd. | 16:14 |
clarkb | noticed that when checking dmesg for OOMs | 16:14 |
clarkb | probably unrelated, but maybe something we should update as well | 16:14 |
clarkb | my hunch here is that we'll have to manually remove that lockfile as no reprepro process for debian is running | 16:15 |
clarkb | and then see if the db needs fixing (clearing and starting over?) | 16:15 |
clarkb | fungi: ^ you may know? | 16:15 |
fungi | i think we can just delete the lockfile and let it try to sync again. if it complains about a corrupt db then we may have to force it to rebuild | 16:18 |
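Before deleting a lockfile by hand, the check described above (the lockfile exists but no reprepro process is running) could be scripted. This is a rough sketch, assuming a Linux `/proc` layout; `processes_matching` and `lockfile_is_stale` are hypothetical names, and matching on the basename of argv[0] is an assumption that would miss a process started through a wrapper.

```python
import os


def processes_matching(name, proc_root="/proc"):
    """Return PIDs whose argv[0] basename equals name (Linux /proc only)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv = f.read().split(b"\0")
        except OSError:
            continue  # the process exited while we were scanning
        if argv and os.path.basename(argv[0]).decode(errors="replace") == name:
            pids.append(int(entry))
    return sorted(pids)


def lockfile_is_stale(lockfile, name="reprepro", proc_root="/proc"):
    """A lockfile with no matching running process is treated as stale."""
    return os.path.exists(lockfile) and not processes_matching(name, proc_root)
```

This only reports whether removal looks safe; it does not say anything about whether the database behind the lockfile is intact.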
clarkb | and that will require aklog because it is in the afs fs | 16:22 |
* clarkb does this | 16:22 | |
clarkb | done | 16:24 |
clarkb | it will next run in about an hour and 45 minutes | 16:25 |
clarkb | The forest fire smoke has largely cleared out of here today. I'm going to get on the bike momentarily. Should be back around when that starts and for the meeting | 16:25 |
opendevreview | Felipe Reyes proposed openstack/project-config master: Add Keystone OpenID Connect charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/857492 | 16:28 |
*** jpena is now known as jpena|off | 16:35 | |
opendevreview | Felipe Reyes proposed openstack/project-config master: Add Keystone OpenID Connect charm to OpenStack charms https://review.opendev.org/c/openstack/project-config/+/857492 | 16:35 |
*** dviroel|lunch is now known as dviroel | 16:57 | |
clarkb | looks like reprepro is running for debian now | 18:15 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update python builder and base image https://review.opendev.org/c/opendev/system-config/+/856537 | 18:40 |
clarkb | the base images updated about an hour ago and appear to include the debian updates so we don't need to manually install libc updates | 18:41 |
fungi | db_close(contents.cache.db, compressedfilelists): BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery | 18:52 |
fungi | i guess that's for after the meeting, i can take a closer look then | 18:52 |
clarkb | fungi: that was from reprepro? | 18:57 |
fungi | yeah | 19:01 |
ianw | interesting that https://grafana.opendev.org/d/9871b26303/afs?orgId=1 is showing releases as of 3 days ago | 19:03 |
ianw | oh right; -rw------- 1 10004 root 0 Sep 10 14:10 /afs/.openstack.org/mirror/debian/db/lockfile would match that | 19:04 |
fungi | yep, precisely | 19:05 |
ianw | Secure Connection Failed | 19:23 |
ianw | An error occurred during a connection to lists.opendev.org. Peer reports it experienced an internal error. | 19:23 |
ianw | Error code: SSL_ERROR_INTERNAL_ERROR_ALERT | 19:23 |
ianw | ^ this is a weird thing i see in my work vm. but not the browser outside that | 19:23 |
clarkb | ianw: this is to https not smtp + ssl? | 19:31 |
clarkb | just want to make sure we're looking at the right server with ssl | 19:31 |
ianw | yeah, sorry https. it must be a local thing, i can't replicate | 19:32 |
fungi | different operating systems? | 19:34 |
clarkb | ianw: I think it is fair to ask that grafana follow typical norms for tagging images on docker hub | 20:03 |
clarkb | ianw: it is highly unexpected that beta releases end up in :latest | 20:03 |
clarkb | similarly it was highly unexpected for jitsi meet to stop tagging :latest entirely and left us behind :/ | 20:03 |
fungi | though they do have a stable tag we've switched to | 20:05 |
ianw | https://github.com/grafana/grafana/discussions/47177 is from when this came up before | 20:07 |
clarkb | the current latest-ubuntu points to the 9.1.5-ubuntu tag so that seems to be an actual release | 20:08 |
clarkb | but latest doesn't seem to point at 9.1.5 | 20:08 |
clarkb | oh wait it does I opened bad tabs? | 20:08 |
clarkb | wow for some reason docker hub shows me armv7 by default for latest and amd64 for 9.1.5 | 20:09 |
fungi | on the other hand, zuul's latest tag follows tip of master | 20:09 |
*** dviroel is now known as dviroel|brb | 20:10 | |
clarkb | fungi: zuul is also expected to be functional on every commit and doesn't do beta releases | 20:10 |
clarkb | though I guess it does sometimes have required upgrade steps | 20:10 |
fungi | and makes releases, which some people do use | 20:11 |
clarkb | considering :latest is currently 9.1.5 I'm ok with trying it again. This was the first time we ran into problems? ianw maybe add a comment to the docker-compose yaml file suggesting a rollback to the current latest release version tag should something go wrong, just as a hint that latest here includes betas? | 20:12 |
clarkb | and now I must find lunch | 20:12 |
ianw | grafana has had 5 releases in two weeks (9.0.8 -> 9.1.5) ... we're not watching closely enough to keep up with that and manual bumps | 20:13 |
ianw | a proposal bot update might be something, but i don't think it's worth the effort | 20:13 |
ianw | i think we're better to assume things will work, and if they don't pin and investigate. anyway, that's what i said in the review, so now in a loop :) i'll go with the crowd decision | 20:14 |
fungi | makes sense | 20:17 |
clarkb | thinking about making the multi node known hosts role quicker, does anyone know if simply appending our keys to the known hosts file may result in errors? for example what happens if we add a different rsa host key for an existing entry? I'd like to avoid rewriting too much of the wheel here since ansible already has a known_hosts module. If I can do a simple append and call it a day | 20:55 |
clarkb | that might be worthwhile | 20:55 |
clarkb | I guess another more involved option is to fork the known_hosts module and update it to take a list of entries so that it can add them all at once rather than one at a time | 20:55 |
fungi | you can have duplicates, afaik | 21:03 |
fungi | as long as they aren't different keys of the same type for the same host | 21:04 |
clarkb | ya so maybe having a small module that simply appends all keys at once to the file is a good improvement | 21:05 |
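The "append everything at once" idea could look roughly like the sketch below. It encodes the conservative rule of thumb from the conversation (exact duplicates are harmless, a different key of the same type for the same host is flagged) rather than verified OpenSSH behavior. The function names `parse_entry` and `plan_append` are hypothetical, and marker lines such as `@cert-authority` are simply skipped.

```python
def parse_entry(line):
    """Split a known_hosts line into (hosts, key_type, key), or None."""
    line = line.strip()
    if not line or line.startswith(("#", "@")):
        return None  # blank, comment, or marker line: not handled here
    fields = line.split()
    if len(fields) < 3:
        return None
    return fields[0].split(","), fields[1], fields[2]


def plan_append(existing_lines, new_lines):
    """Return (lines_to_append, conflicting_lines) for a one-shot append."""
    known = {}
    for line in existing_lines:
        entry = parse_entry(line)
        if entry:
            hosts, key_type, key = entry
            for host in hosts:
                known.setdefault((host, key_type), set()).add(key)

    to_append, conflicts = [], []
    for line in new_lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        hosts, key_type, key = entry
        seen = [known.get((host, key_type), set()) for host in hosts]
        if any(keys and key not in keys for keys in seen):
            # Same host and key type, but a different key: flag it.
            conflicts.append(line.strip())
        elif all(key in keys for keys in seen):
            continue  # exact duplicate, nothing to add
        else:
            to_append.append(line.strip())
            for host in hosts:
                known.setdefault((host, key_type), set()).add(key)
    return to_append, conflicts
```

The caller would then write `to_append` to the file in a single operation and decide separately what to do with `conflicts`, instead of invoking the known_hosts module once per key.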
ianw | in case you were wondering i must have been testing lists.opendev.org and had a hosts override, and that ip is now live but some other site :) | 21:06 |
ianw | re the SSL error previously | 21:06 |
fungi | ianw: oh, you were probably helping with the mm3 testing | 21:11 |
fungi | and yeah, those held nodes are now gone | 21:11 |
ianw | i've got a root screen on mirror-update and will poke at this debian reprepro | 21:14 |
ianw | i get the feeling "Warning: We are having trouble keeping the AFS stat cache trimmed down under the configured limit" seems more likely to mean "i can't talk properly and too much is buffering" more than "i'm too busy" | 21:18 |
ianw | none of the afs servers seem to have any errors | 21:21 |
fungi | ahh, thanks. it looks like there's a debian update in progress, but once the repos which aren't in trouble are done, it's probably best to just remove /afs/.openstack.org/mirror/debian/db | 21:26 |
fungi | "recovering" the berkeleydb it uses is unlikely to be all that helpful, and would be better to tell reprepro to generate a new db (however long that takes) | 21:27 |
*** rlandy is now known as rlandy|bbl | 21:27 | |
ianw | fungi: yeah, i have the lock right now so that's probably what is being seen | 21:28 |
fungi | oh, that's you | 21:28 |
ianw | just double checking, i'm not seeing anything in the openafs logs indicating an error | 21:29 |
fungi | yeah, now i see the flock is parented to a screen session. i forgot to give ps the f flag | 21:29 |
ianw | ubuntu.log:Starting ForwardMulti from 536870950 to 536870950 on afs02.dfw.openstack.org (as of Tue Sep 13 18:45:06 2022). | 21:30 |
ianw | [Tue Sep 13 18:45:46 2022] afs: Warning: We are having trouble keeping the AFS stat cache trimmed down under the configured limit (current -stat setting: 15000, current vcache usage: 639641). | 21:30 |
ianw | they must line up | 21:30 |
fungi | indeed, they do | 21:30 |
fungi | i wonder if something was going on in rackspace | 21:31 |
ianw | which isn't debian ... so there's no smoking gun there | 21:31 |
ianw | we've been seeing that since ... [Fri May 13 17:48:02 2022] | 21:32 |
fungi | uptime on all the servers involved is quite lengthy, so nothing seems to have spontaneously rebooted at least | 21:32 |
ianw | i think it might be worth a reboot of mirror-update; its uptime is such that a new kernel update won't hurt | 21:35 |
ianw | i'm going to quickly do that now, because it's not doing anything briefly | 21:36 |
fungi | yeah, good call | 21:38 |
fungi | clarkb: which job for 855292 were you holding before? | 21:41 |
fungi | oh, looks like it would have been system-config-run-lists3. also does zuul-client still need to filter via --ref=refs/changes/92/855292/7 or has that been improved? looks like it's been a while since i set an autohold, judging from my command history | 21:43 |
ianw | it's contents.cache.db that seems corrupt | 21:43 |
fungi | yeah, that's the one it was complaining about in the logs | 21:44 |
fungi | though i'm unsure what reprepro will do if you remove just one db file | 21:44 |
ianw | https://docs.opendev.org/opendev/system-config/latest/reprepro.html does not discuss recreating that one :/ | 21:44 |
ianw | "new format for contents.cache.db. Only needs half of the disk space and runtime | 21:46 |
ianw | to generate Contents files, but you need to run translatefilelists to translate | 21:46 |
ianw | the cached items (or delete your contents.cache.db and let reprepro reread | 21:46 |
ianw | all your .deb files)" | 21:46 |
ianw | that suggests that deleting the contents.cache.db might just have reprepro rebuild it | 21:46 |
ianw | (from a changelog note) | 21:46 |
fungi | worth a try. will take a while of course since it's reading them all back over afs | 21:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM force mm3 failure to hold the node https://review.opendev.org/c/opendev/system-config/+/855292 | 21:47 |
ianw | i guess i'll move it out of the way and re-run it | 21:47 |
fungi | and then probably find something else to do for a very looooong while | 21:48 |
ianw | "reprepro update" has not complained (so far) and there is a new contents.cache.db | 21:52 |
ianw | hrm, it also seemed to exit and not do anything | 21:53 |
fungi | maybe it doesn't read the file contents, just size/date et cetera? | 21:55 |
fungi | in which case it might rebuild quickly | 21:55 |
ianw | it might be that it's in sync | 21:55 |
fungi | maybe | 21:55 |
fungi | it was pulling the files down, but failing to update the cache db | 21:56 |
ianw | i'm just letting the full script run now | 21:57 |
clarkb | fungi: system-config-run-lists3 sorry just got back from school run | 22:01 |
clarkb | looks like you found it /me double checks a listing | 22:02 |
clarkb | fungi: if you look at the listing you can see how to do it for arbitrary patchsets using a regex but ya that looks right | 22:02 |
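The ref fungi quoted follows Gerrit's layout of `refs/changes/<last two digits of the change>/<change>/<patchset>`. A small sketch of building it, plus a pattern covering every patchset of a change, is below. The helper names are hypothetical, and whether zuul-client accepts exactly this regex form is taken from clarkb's comment rather than verified here.

```python
import re


def change_ref(change, patchset):
    """Build the Gerrit ref for one patchset of a change."""
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def change_ref_pattern(change):
    """Build a regex string matching any patchset of a change (assumed form)."""
    return rf"refs/changes/{change % 100:02d}/{change}/.*"


ref = change_ref(855292, 7)
pattern = change_ref_pattern(855292)
# The single-patchset ref is one of the refs the pattern covers.
matches = re.fullmatch(pattern, ref) is not None
```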
ianw | the contents.cache.db is growing now -- https://review.opendev.org/c/zuul/zuul/+/857517 seems to be writing it | 22:03 |
ianw | bah, paste error | 22:04 |
ianw | reprepro --confdir /etc/reprepro/debian export | 22:04 |
ianw | unfortunately i've run that under timeout :/ | 22:05 |
clarkb | looking at the afs -stat thing /etc/openafs/afs.conf has an OPTIONS value currently set to OPTIONS=AUTOMATIC which means that openafs can automatically determine what -stat should be (among a bunch of other options) | 22:08 |
clarkb | It isn't clear to me if we override and set only -stat if all the other options will end up with different chosen defaults | 22:08 |
clarkb | looking at the init script I think changing that value to something other than AUTOMATIC won't cause different values for the unset flags | 22:10 |
clarkb | afs.conf.client is sourced before OPTIONS is set in afs.conf so we can't set that value in the config file we already manage :/ | 22:13 |
clarkb | if we want to change that value I think we need to add afs.conf to the ansible role and change the value. I'll make a change for that and we can discuss further on that | 22:16 |
opendevreview | Ian Wienand proposed opendev/system-config master: mirror-update: make jobs interactive by default https://review.opendev.org/c/opendev/system-config/+/857519 | 22:24 |
ianw | ^ i feel like we've discussed that before | 22:27 |
*** dasm is now known as dasm|off | 22:28 | |
clarkb | ianw: ya https://review.opendev.org/c/opendev/system-config/+/840214 | 22:28 |
opendevreview | Clark Boylan proposed opendev/system-config master: Up openafs client -stat value https://review.opendev.org/c/opendev/system-config/+/857520 | 22:29 |
ianw | oh weird, that's actually a sourced shell script | 22:32 |
clarkb | ugh either the job finishes in ~10 minutes or it times out https://review.opendev.org/c/opendev/system-config/+/856537 | 22:32 |
clarkb | I wish I understood the emulation better | 22:32 |
ianw | clarkb: this does run on other distros (for the wheel build) and that is maybe deb specific? have to look closer | 22:33 |
ianw | urgh. that definitely feels like "i'm building something" one time, and not the other | 22:34 |
clarkb | ya, but the last time I looked at it I couldn't reproduce it | 22:36 |
clarkb | however, I only tested on amd64 to see what it did. Maybe the arm images are different? | 22:36 |
clarkb | but also it seems to change over time | 22:36 |
clarkb | the previous run everything timed out | 22:37 |
*** dasm|off is now known as Guest305 | 23:03 | |
clarkb | this time jobs for https://review.opendev.org/c/opendev/system-config/+/856537 succeeded | 23:24 |
clarkb | fungi: ^ if you've got time to review that | 23:24 |
*** rlandy|bbl is now known as rlandy | 23:27 | |
opendevreview | Merged opendev/system-config master: Update python builder and base image https://review.opendev.org/c/opendev/system-config/+/856537 | 23:59 |