opendevreview | Cedric Jeanneret proposed zuul/zuul-jobs master: Toggle synchronize to "quiet" mode https://review.opendev.org/c/zuul/zuul-jobs/+/917118 | 08:36 |
---|---|---|
yoctozepto | morning folks | 10:33 |
yoctozepto | I need a little guidance if it is possible to set some extra requirements on node's hardware in opendev's Zuul | 10:33 |
yoctozepto | specifically, NebulOuS uses MongoDB in some places, and it requires AVX nowadays: | 10:34 |
yoctozepto | WARNING: MongoDB 5.0+ requires a CPU with AVX support, and your current system does not appear to have that! | 10:34 |
yoctozepto | so we have jobs randomly failing | 10:34 |
fungi | yoctozepto: it's possible that using one of the less general node labels will get you that, at the expense of possibly having jobs wait or end in node_failure on the occasion that some of our providers are offline/unavailable | 13:13 |
fungi | for example, maybe the providers with nested-virt acceleration also all have avx/avx2 capable processors | 13:29 |
yoctozepto | I see, so no clean way, thanks for confirming, fungi! I am now evaluating the alternative of pinning to mongodb 4.4 (it seems unlikely they really require 5.0+) | 13:33 |
fungi | yeah, it's also a sign that you can't run latest mongodb in at least some public clouds. i have no idea if that's a consideration for your project | 13:38 |
fungi | from an end user perspective i mean | 13:38 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: Zuul Cache role with s3 implementation. https://review.opendev.org/c/zuul/zuul-jobs/+/764808 | 14:04 |
yoctozepto | agreed fungi | 14:07 |
Clark[m] | Also mongodb 5.0 isn't open source iirc. | 14:22 |
Clark[m] | Ya all releases after October 2018 ish and 5.0 is from 2021 | 14:22 |
fungi | how did i miss that bcachefs made it into linux 6.7? | 14:37 |
fungi | still considered experimental for now, but encouraging nonetheless | 14:38 |
fungi | looks like ems did the scheduled maintenance on our opendev.org matrix server 11:10:02-11:23:17 utc | 14:42 |
fungi | i didn't see any problems, but keep an eye out i guess | 14:42 |
Clark[m] | I'm having a slow start today. Desktop was unresponsive and after a reboot it isn't fscking clean. I cleared out an orphaned inode and now we'll see if there are any clues to the original problem | 15:18 |
fungi | yikes. hope your drive isn't on its way out | 15:18 |
Clark[m] | smartctl says it is fine. Not sure how trustworthy that is. It is a new drive and statistically drives fail early or very late in life aiui | 15:27 |
Clark[m] | I suspect though that this is an old Linux display port bug where it fails to rewake an idle device. And then rebooting was what made ext4 sad. I should probably switch over to HDMI | 15:28 |
fungi | ah, yeah i've found dp very fiddly, but thought that was due to using a display hub to connect three monitors | 15:29 |
Clark[m] | Though there is no kernel log for the prior boot. Almost like things ended up read only at some point and I didn't notice (because I put a lot of workload stuff in /home?) | 15:30 |
Clark[m] | Arg | 15:30 |
Clark[m] | Oh wait no I misunderstood the flags to journalctl I think | 15:31 |
Clark[m] | Found it. Kernel panic. NULL pointer dereference for address 000...0008 | 15:32 |
Clark[m] | In amdgpu related call stack stuff | 15:32 |
Clark[m] | This caused xorg to crash which is why I had no more display | 15:33 |
fungi | yeah, sounds unpleasant for sure | 15:37 |
clarkb | ok reading the stack trace I think this is in displayport related code. So ya I may be able to workaround this with hdmi if it happens again | 15:46 |
clarkb | and I'm reasonably confident my disk is ok now | 15:47 |
clarkb | I'm going to punt on further debugging unless it becomes a bigger issue | 15:47 |
clarkb | also in the future I may be able to change vtys and debug without rebooting | 15:49 |
clarkb | I should've attempted that at the start, but it's early in the morning :) | 15:50 |
fungi | depending on the panic, i can also usually leverage magic sysrq nerve pinches | 15:59 |
fungi | i typically try to do: sync, emergency unmount, reboot | 15:59 |
fungi | that often avoids leaving the fs dirty | 16:00 |
clarkb | drm_dp_add_payload_part2 is where the stacktrace jumps to the page fault routine. I think dp may be displayport so ya | 16:00 |
opendevreview | Clark Boylan proposed opendev/glean master: Update zuul config to drop xenial jobs https://review.opendev.org/c/opendev/glean/+/916952 | 16:06 |
clarkb | frickler: ^ now with python311 | 16:06 |
opendevreview | Clark Boylan proposed openstack/project-config master: Remove last bit of system-config-puppet-apply-jobs usage https://review.opendev.org/c/openstack/project-config/+/917198 | 16:21 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove old infra team puppet testing https://review.opendev.org/c/opendev/system-config/+/912311 | 16:22 |
clarkb | infra-root if you have time for reviews a subset of https://review.opendev.org/q/topic:%22drop-ubuntu-xenial%22+status:open should be mergeable for some Xenial cleanup. However this is really just scratching the surface so not sure if we want to wait a bit more until we can remove larger portions of config | 16:23 |
fungi | do you have a feel yet for whether ripping out d-g at the same time makes sense, or should happen separately? | 16:36 |
clarkb | In my head I think I'd like to do project retirements separately just so that I don't have too much stuff to page into memory. But I think if someone else wanted to get that done it would simplify some cleanup | 17:00 |
clarkb | Gerrit 3.9 removes stars info from the api ChangeInfo responses. I'm like 99.99% certain this is a non issue for Zuul, but it may be a problem for gertty so calling it out here | 17:05 |
* clarkb | is currently looking at the gerrit 3.9 release note list again | 17:05 |
clarkb | fungi: if you look at the etherpad the last breaking change has to do with how ssh keys are validated. If you still have your scripting around for looking at ssh key stats maybe you can check if we have any users with bad keys? | 17:08 |
clarkb | I don't think it is a big deal but would be good to know ahead of time if that is straightforward. | 17:08 |
clarkb | the suggested edit feature is a little clunky but it is really cool that we can do that in 3.9 | 17:11 |
clarkb | The other thing that might be good for people to think about is we can optionally enable diff3 diffing for merge changes which may show better context | 17:12 |
clarkb | timburke: I believe you're someone that deals with merge changes in Gerrit semi often, do you have any opinion on whether or not the extra context diff3 provides would be helpful to you? | 17:13 |
clarkb | fungi: I remembered to check the mailman uwsgi-error.log files for 'listen queue' errors and there haven't been any since we updated the webserver stuff | 17:20 |
clarkb | however we also weren't getting those issues every day so still to be determined if that was sufficient for improving it | 17:20 |
fungi | clarkb: is gerrit deprecating the starring functionality, or just requiring different api methods to get the detail now? | 17:25 |
fungi | what's the link to the pad? | 17:26 |
clarkb | fungi: they are just requiring you to supply extra flags to changeinfo requests to populate the data | 17:30 |
clarkb | https://etherpad.opendev.org/p/gerrit-upgrade-3.9 | 17:30 |
fungi | oh, got it. that'll be easy to patch gertty for in that case | 17:31 |
fungi | clarkb: i guarantee there are ssh keys in our gerrit which will fail that. many had mistyped or completely bogus algorithm fields | 17:33 |
fungi | is that going to cause a problem for the server, or just for the accounts in question? | 17:33 |
clarkb | my read is the only thing it should do is result in extra error logs for existing entries | 17:34 |
clarkb | they will continue to function as will the server. New keys won't be able to be added if they fail the criteria though | 17:34 |
clarkb | eventually we may see the keys stop working for those users though | 17:34 |
fungi | odds are few, if any, of those accounts are still in use | 17:36 |
clarkb | then we're probably fine | 17:37 |
opendevreview | Merged openstack/diskimage-builder master: Add tox-py311 job https://review.opendev.org/c/openstack/diskimage-builder/+/917058 | 17:47 |
timburke | clarkb, re: diff3 -- i don't know exactly what that would end up looking like. for the most part, though, we (swift) would only see a merge change if someone's updating a feature branch; that person would pretty much always be a core themselves and just merge it once they see tests pass | 19:05 |
timburke | so idk that it really matters much (for me) | 19:05 |
timburke | if it'd be useful to have an example patch to see what resolved merge conflicts look like today, though, see https://review.opendev.org/c/openstack/swift/+/735381 | 19:07 |
timburke | i just noticed -- the "Size" column in the patchlist is a little funny for merges: it shows XL for that patch, "added 3174, removed 745 lines", but when you go into it the "Delta" summary is a much more reasonable +1 / -28 :P | 19:11 |
clarkb | timburke: I think it shows a third file state which is the "base" state | 19:39 |
clarkb | timburke: "You can pass --conflict either diff3 or merge (which is the default). If you pass it diff3, Git will use a slightly different version of conflict markers, not only giving you the “ours” and “theirs” versions, but also the “base” version inline to give you more context." | 19:40 |
clarkb | so it's just extra information/context | 19:40 |
clarkb | https://blog.nilbus.com/take-the-pain-out-of-git-conflict-resolution-use-diff3/ here's a writeup of it | 19:41 |
clarkb | I think this boils down to deciding if we feel the extra information is helpful or too much noise and should be omitted | 19:41 |
clarkb | at this point I've managed to go through the entire list of things I put on the etherpad. There is a rather large list of other changes gerrit has made that I should probably skim too for any concerns. But good news is I don't think there are any major problems with the list we've already got | 20:05 |
clarkb | and the proposed edits feature is really nifty. | 20:07 |
clarkb | upgrade doc now has a rough plan for the upgrade itself. I'll run through that on monday with a held node so that I can fill in details like log info and things to watch out for. I thought having those for the last upgrade was really helpful for ensuring the upgrade was going as anticipated | 21:02 |
clarkb | I should test it now but I don't want to discover a problem with the upgrade friday afternoon just to stew on it all weekend :) | 21:04 |
fungi | no, no. it's wind-down time | 21:26 |
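
Note on the AVX discussion earlier in the log: a job could check up front whether the node it landed on advertises AVX, and then skip or fail early instead of failing at random inside MongoDB. The following is only a minimal sketch, assuming a Linux node where `/proc/cpuinfo` lists CPU features on a `flags` line. The function names are hypothetical and this is not part of Zuul or of any opendev job.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; some other architectures
        # use "Features" instead, so accept both.
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """Return True if the exact flag 'avx' is advertised."""
    return "avx" in cpu_flags(cpuinfo_text)


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file (Linux only; the default path is an assumption)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

A pre-run step could call `has_avx(read_cpuinfo())` and decide what to do with the result. Because the check only reads what the guest kernel reports, a hypervisor that masks AVX from the guest will also show up as "no AVX", which is the case that matters for the job.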
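
Note on the ssh key discussion in the log: one kind of "mistyped or completely bogus algorithm field" can be spotted offline, because an OpenSSH-format public key line repeats the algorithm name inside its base64 blob as a length-prefixed string. The sketch below compares the two. It is an illustration under that assumption only, it is not the validation Gerrit 3.9 performs, and the function names are hypothetical. It also does not handle `authorized_keys`-style option prefixes.

```python
import base64
import binascii
import struct


def embedded_algorithm(blob: bytes):
    """Return the algorithm name stored at the start of a key blob, or None."""
    if len(blob) < 4:
        return None
    # The blob starts with a 4-byte big-endian length, then the name itself.
    (length,) = struct.unpack(">I", blob[:4])
    name = blob[4:4 + length]
    if length == 0 or len(name) != length:
        return None
    try:
        return name.decode("ascii")
    except UnicodeDecodeError:
        return None


def algorithm_field_matches(key_line: str) -> bool:
    """Check that the declared algorithm equals the one inside the blob."""
    parts = key_line.split()
    if len(parts) < 2:
        return False
    declared, encoded = parts[0], parts[1]
    try:
        blob = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 at all, so the key line is malformed.
        return False
    return embedded_algorithm(blob) == declared
```

Running `algorithm_field_matches` over an export of stored key lines would give a rough count of entries with a mismatched or undecodable algorithm field. A key that passes this check can still fail stricter validation.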