| opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/957995 | 02:20 |
|---|---|---|
| *** clarkb is now known as Guest24593 | 07:31 | |
| *** dmellado62 is now known as dmellado | 13:32 | |
| fungi | clarkb: yes, i'm keeping them in the emergency list until they're all done | 14:07 |
| zigo | Hi there! | 14:38 |
| zigo | I'd like to backport https://review.opendev.org/c/openstack/watcher/+/958207 to unmaintained/zed, though there's no such branch anymore. What can I do, as there's only the EOL tag? | 14:38 |
| zigo | (that's for Watcher's OSSN-0094) | 14:38 |
| fungi | zigo: you would carry a local backport | 14:42 |
| zigo | You mean in Debian? Yeah, I have that already done. Though I would have preferred to share it. | 14:42 |
| fungi | it's really more of an openstack question, not an opendev question, but basically if you need unmaintained branches to stay open for longer then the people serving as caretakers for those branches would probably appreciate the help | 14:43 |
| fungi | if there are no volunteers to keep up minimal testing for them at least, then they get closed down (tagged and deleted) so that people will stop relying on them | 14:43 |
| zigo | Well, then IMO we should stop destroying branches, and keep them open just in case there's a security problem and people start to care again. | 14:45 |
| zigo | It's very much ok to keep them as unmaintained/X | 14:45 |
| zigo | I just heard Red Hat people are even backporting to Train. Why not share these patches then? | 14:46 |
| fungi | zigo: that's an excellent question to ask them | 14:46 |
| fungi | but i suspect it's because their work would violate upstream policies since they aren't backporting to newer branches when doing so | 14:47 |
| fungi | which leaves users on those versions without a clear upgrade path to newer versions | 14:47 |
| zigo | *I* do the work, and would be happy to share it for watcher unmaintained/zed. | 14:48 |
| zigo | I guess I should open a new thread on the list about this, since this isn't the first time it has happened. | 14:48 |
| fungi | i will say that from an opendev hosting perspective, we don't want projects leaving an ever-growing pile of branches around because every branch is additional configuration in the ci/cd system and it makes pruning old unused configuration impossible | 14:50 |
| zigo | I think it's very much ok to delete all the CI stuff around it, and just let downstream share patches without the CI. | 14:51 |
| fungi | workflows, processes and policies are built around many years of an assumption that unused branches will be deleted | 14:51 |
| zigo | Can I quote this ? :) | 14:51 |
| fungi | feel free! i know we've had this discussion ad nauseam, and every time openstack tries to appease you on this by coming up with new ways to leave branches open for longer you still complain | 14:52 |
| fungi | the latest attempt is the unmaintained branch policy, which has put a lot of additional strain on project maintainers and our systems | 14:53 |
| zigo | I do each time there's a new security fix that needs backporting, and we have no space to share work. :) | 14:53 |
| fungi | well, basically the technical committee came up with a way for interested downstream stakeholders to volunteer to take care of those branches, but when nobody volunteers to do that they get closed | 14:56 |
| fungi | https://governance.openstack.org/tc/resolutions/20230724-unmaintained-branches.html which was further amended by https://governance.openstack.org/tc/resolutions/20231114-amend-unmaintained-status.html | 14:57 |
| fungi | if that's not adequately solving the problem, then #openstack-tc would be a good place to discuss it (or on the openstack-discuss mailing list, but i'd recommend adding at least [tc] in the subject line) | 14:57 |
| Guest24593 | re the system strain: it wouldn't be so bad if people were actively caring for the branches, as presumably the ci jobs would get pruned or updated as necessary to keep things mostly working. The problem is when we create the branch under the assumption it will be cared for, then it is ignored, which orphans the system configurations, leaving others to clean them up when say opendev wants | 15:01 |
| Guest24593 | to drop a test platform. Or if zuul changes some syntax | 15:01 |
| Guest24593 | arg I'm a guest again | 15:01 |
| frickler | yeah, we could consider more actively cleaning up broken zuul configs, like deleting them completely after a while | 15:03 |
| Guest24593 | the thing I was trying to argue for was not to open the branches in the first place. Wait until someone volunteers. But I'm not sure what effect that may have on say tempest testing and branch defaults | 15:04 |
| *** Guest24593 is now known as clarkb | 15:05 | |
| clarkb | ok sorry about that I am me again | 15:05 |
| clarkb | looks like rax flex iad3 did use a couple of instances overnight. Not a lot of load there but non zero | 15:08 |
| clarkb | fwiw I'm not sure if we can easily tell zuul to ignore branches either | 15:08 |
| fungi | i suppose they could merge a change to replace all their pipeline configs with noop jobs, but that's still configuration on every branch. and even without any configuration on a branch at all zuul still needs to evaluate the branch contents to determine there is no configuration | 15:11 |
| clarkb | explicitly using a noop config is probably a nice way to represent it for humans if zuul won't complain | 15:13 |
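For context, swapping a branch's project pipelines over to Zuul's built-in `noop` job is a small change. Below is a minimal sketch of what that could look like, assuming a `.zuul.yaml` at the repo root and an `unmaintained/zed` branch purely for illustration; real projects may keep their config in `zuul.d/`, and the exact layout of frickler's tacker change may differ:

```bash
# sketch only: replace whatever CI config the branch carries with noop jobs
git checkout unmaintained/zed                 # hypothetical branch name
git rm -r --ignore-unmatch zuul.d/            # drop any in-repo zuul.d config
cat > .zuul.yaml <<'EOF'
- project:
    check:
      jobs:
        - noop
    gate:
      jobs:
        - noop
EOF
git add .zuul.yaml
git commit -m "Replace broken CI with noop on this unmaintained branch"
```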
| clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/957950 is a relatively straightforward change to drop testing of bionic servers with system-config | 15:18 |
| clarkb | I checked our fact cache and as far as I can tell we don't have any bionic servers any longer | 15:18 |
| clarkb | (then once zuul drops ansible 9 we can drop bionic from zuul-launcher completely and clean up our mirrors etc) | 15:18 |
| mattcrees[m] | Hi all. In the Blazar project, we still have a stable/pike branch available. I understand this is because the branch was made before Blazar was managed by opendev. Does anyone know how we'd go about removing this branch? | 15:19 |
| clarkb | mattcrees[m]: in general the openstack release team has permissions to manage branches within openstack projects. If they want you to clean it up then you'd need extra gerrit acl permissions on the project or have a gerrit admin do it for you | 15:21 |
| clarkb | I would check with them first. I believe they already have tools that script branch cleanups which does the eol tag first then drops the branch | 15:21 |
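Purely as an illustration of that sequence (the release team's actual tooling and tag naming may differ, and deleting a branch requires the corresponding Gerrit ACL permission), the manual shape of the cleanup is roughly:

```bash
# tag the branch tip as end-of-life, then remove the branch
git fetch origin stable/pike
git tag -s pike-eol origin/stable/pike -m "blazar stable/pike is end of life"
git push origin pike-eol
git push origin --delete stable/pike
```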
| mattcrees[m] | I see, thanks clarkb. I'll reach out to them | 15:22 |
| frickler | mattcrees[m]: clarkb: likely the branch was created before release-management was in place, so we'd need to delete it manually anyway. I'll add that to my todo list | 15:22 |
| fungi | yeah, i think this one may fall into a grey area where they've avoided managing existing branches that pre-date a project's inclusion in openstack, but it would be good to confirm with them first | 15:22 |
| fungi | ah sounds like we just did ;) | 15:22 |
| mattcrees[m] | Nice, thanks frickler | 15:23 |
| frickler | confirmed, deliverables/pike/blazar.yaml doesn't exist | 15:23 |
| frickler | for the zuul config issue, I created https://review.opendev.org/c/openstack/tacker/+/958219 as an example, seems to work fine. waiting for feedback from elodilles but maybe that can be a simple workaround for the pile of zuul config errors we still have | 15:24 |
| clarkb | frickler: that looks promising | 15:34 |
| elodilles | frickler: well, tacker has zuul config errors (and broken gates?) even on stable branches. so probably the tacker team should start with those, as i guess we don't want to set noop for the whole project... | 15:35 |
| clarkb | not for the whole project, but unmaintained branches seem like a good idea if they are broken since they are, well, unmaintained | 15:36 |
| elodilles | clarkb: but as i said, stable branches are broken too | 15:36 |
| clarkb | yes and those should be fixed | 15:37 |
| elodilles | i agree | 15:38 |
| clarkb | the extra old branches present extra problems because they tend to be even less cared for and also rely on old ci constructs that need to go away. Replacing them with noop jobs nicely addresses both problems | 15:38 |
| elodilles | anyway, i'm not against dropping the complete CI on unmaintained in this case, but i feel it's a bit drastic when there are broken stable branches, too | 15:40 |
| frickler | the thing is we do not notice the broken stable branches when the huge number of issues for unmaintained branches overwhelms that list. plus, it is an explicit requirement to keep unmaintained branches open, even though we are slacking at enforcing that requirement | 15:46 |
| fungi | yeah, stable branches with broken testing need to be fixed, unmaintained branches with broken testing are supposed to just get deleted | 15:49 |
| fungi | but it's an acceptable compromise to remove the testing on the unmaintained branches instead of deleting them immediately | 15:50 |
| elodilles | frickler: well, i could name a couple of places where we could add the noop, and after that the unmaintained branches will no longer be the majority of the zuul config errors (like monasca-* and openstack-ansible-tests) | 15:50 |
| elodilles | fungi: yepp, now that frickler has proposed the monasca 2023.1-eol patches we will be one step closer to that. | 15:51 |
| fungi | hopefully monasca ceases to be a problem soon (either because the person offering to adopt it fixes the jobs, or because the tc decides to go ahead with retiring it) | 15:51 |
| elodilles | yepp | 15:53 |
| frickler | elodilles: yes, but monasca will hopefully be retired, so I simply chose the next best other example I found | 16:00 |
| frickler | also someone seems to be actively working on fixing tacker at least for master https://review.opendev.org/c/openstack/tacker/+/956458 | 16:00 |
| frickler | which I haven't seen happen for many of the unmaintained branches (though maybe I didn't look closely enough?) | 16:01 |
| clarkb | ok I'm popping out now for the eyeball inspection. I'll be back in a bit | 16:31 |
| fungi | hope you come back with as many as you left with! | 16:31 |
| fungi | (or at least as many) | 16:32 |
| fungi | looks like backups failed today on kdc03 | 17:05 |
| fungi | aha, i think we install borg into a venv and the python version has changed, so we'll need ansible to recreate that venv. sound right? | 17:14 |
| fungi | the log has a traceback for importlib.metadata.PackageNotFoundError: No package metadata was found for borgbackup | 17:14 |
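A rough sketch of what ansible effectively has to redo here: rebuild the venv against the new interpreter and reinstall borgbackup into it. The `/opt/borg` path is an assumption for illustration, not necessarily where the role actually puts it:

```bash
# hypothetical manual equivalent of letting the ansible role recreate the venv
rm -rf /opt/borg
python3 -m venv /opt/borg
/opt/borg/bin/pip install --upgrade pip borgbackup
/opt/borg/bin/borg --version   # should no longer hit PackageNotFoundError
```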
| fungi | following https://docs.opendev.org/opendev/system-config/latest/afs.html#no-outage-server-maintenance for afs01.ord.openstack.org it looks like there are no rw volumes on it, so not moving any before upgrading | 17:29 |
| fungi | having to rm -rf /var/lib/docker/aufs on these too | 17:32 |
| clarkb | fungi: yes that sounds right | 17:40 |
| clarkb | re having ansible recreate the venv | 17:40 |
| clarkb | and then you'll have to do that again with the jump to noble. | 17:41 |
| fungi | i have a feeling with the additional work required for the afs01.dfw server, it will make the most sense to upgrade it from focal to jammy and then from jammy to noble, and start working my way back through the noble upgrades on the others | 17:42 |
| fungi | that way we don't have to move rw volumes off and back onto it more than once | 17:42 |
| clarkb | makes sense | 17:42 |
| clarkb | one thing I wondered about is if the dkms stuff is reinstalling the packages on these upgrades in order to rebuild against the new kernels | 17:43 |
| fungi | so basically having it be the last focal->jammy upgrade and then be the first jammy->noble upgrade | 17:43 |
| clarkb | it must as I'm pretty sure this is how we upgraded them in the past | 17:43 |
| fungi | and yes it is, i'm watching it right now | 17:43 |
| clarkb | cool | 17:43 |
| fungi | part of what takes so long with the upgrades | 17:43 |
| fungi | as for holding writes to the rw volume on afs01.dfw, i wonder whether we need to put zuul executors on hold too somehow | 17:47 |
| fungi | er, rw volumes | 17:47 |
| fungi | the mirror-update server can just be shut down temporarily, but it's not the only system we have doing writes into afs | 17:48 |
| clarkb | isn't that why we change the rw volume to the other server? | 17:48 |
| clarkb | or do we have to hold writes to do that? | 17:48 |
| fungi | oh! it's an either/or in the doc i guess | 17:49 |
| fungi | make sure i'm not misreading that | 17:49 |
| fungi | so if i move rw volumes from afs01.dfw to afs02.dfw then that in theory happens transparently and i don't have to block anything from writing | 17:50 |
| clarkb | that was my understanding though I haven't reread the docs | 17:51 |
| clarkb | but yes I thought the idea was to always keep the rw volumes up so that we didn't have to stop writers. Do the work only on the ro side and then let it resync | 17:51 |
| fungi | and https://grafana.opendev.org/d/9871b26303/afs seems to indicate that they both have the same amount of available space and sizes | 17:52 |
| fungi | afs01.ord.openstack.org is on jammy now, working on afs02.dfw.openstack.org next and saving afs01.dfw.openstack.org for last | 17:53 |
| fungi | afs seems to be at least functional on afs01.ord | 17:54 |
| fungi | still reporting all its same ro volume sites | 17:54 |
| fungi | all the volumes on afs02.dfw are confirmed to be ro too so no need to move any yet | 17:55 |
| clarkb | "Basically what we need to do is make sure that either no one needs the RW volumes hosted by a fileserver before taking it down or move the RW volume to another fileserver." | 18:04 |
| clarkb | yes I read that as two options are available to us and ensuring all of the RW volumes are on one fileserver and taking down the other avoids needing to stop all the writers | 18:04 |
| clarkb | the other options requires stopping all writers | 18:05 |
| fungi | cool, so i think we're fine here. the main unknown is how long the rw volume moves will take, but hopefully not long since there are synced ro equivalents of all of them | 18:06 |
| clarkb | I wonder if disabling cron jobs on mirror-update will make that go more smoothly/quickly | 18:11 |
| clarkb | it may still be worth doing even if not strictly necessary, if it speeds the process up | 18:11 |
| fungi | could just... stop crond too | 18:17 |
| fungi | or whatever systemd replaced it with | 18:17 |
| fungi | afs02.dfw.openstack.org is up on jammy and afs seems to work there still. now on to afs01.dfw.openstack.org, going to start moving its rw volumes to afs02.dfw.openstack.org | 18:21 |
| fungi | i'll move a small one initially and double-check it's still functional | 18:22 |
| clarkb | ++ | 18:22 |
| fungi | afs01.dfw.openstack.org had 57 rw volumes and 55 ro volumes | 18:25 |
| clarkb | that sounds right if all the rw volumes are there since I think one or two don't have ro volumes | 18:25 |
| clarkb | iirc it's ok for those volumes to go down. But it's worth double checking | 18:25 |
| fungi | docs-old and mirror.logs | 18:25 |
| clarkb | hrm is mirror.logs the volume that hosts: https://mirror.dfw.rax.opendev.org/logs/ ? | 18:26 |
| fungi | yeah | 18:26 |
| clarkb | if so then we probably do end up having writers to that volume and we need to do something about that | 18:26 |
| clarkb | afsmon and afs-release run super often. Then the others are the mirror cron jobs | 18:26 |
| fungi | oh, also the "service" volume only exists rw on afs01.dfw and there are no ro sites | 18:27 |
| clarkb | should we maybe add an ro volume for mirror.logs on afs02.dfw and then it can become the rw site? | 18:27 |
| clarkb | I think docs-old is unlikely to be an issue. And I'm not sure about server | 18:27 |
| clarkb | *service | 18:27 |
| fungi | and then there's a test.corvus volume which is ro on afs02.dfw but has no rw site at all | 18:27 |
| clarkb | I suspect that is partial cleanup that orphaned the RO volume | 18:28 |
| fungi | looks like there are a few volumes which don't have an ro replica on afs02.dfw | 18:28 |
| clarkb | fungi: some may be on afs01.ord | 18:28 |
| fungi | well, i mean there are some where the only ro volume is also on afs01.dfw, no other servers have a replica | 18:29 |
| fungi | afs02.dfw.openstack.org has 47 ro volumes | 18:29 |
| clarkb | got it. I was just calling out that afs02 may be the RO site but also afs01.ord could be | 18:29 |
| fungi | and afs01.ord.openstack.org has 17 | 18:30 |
| corvus | i don't need test.corvus | 18:30 |
| fungi | i figured, looked like it was just a missed deletion of a replica | 18:30 |
| fungi | but yeah, i'll need to audit the volumes on afs01.dfw to see which ones only have local replicas and no remote ones | 18:31 |
| fungi | this'll take a few minutes | 18:31 |
| fungi | okay, so these are the volumes with no remote replica, residing only on afs01.dfw: docs-old, mirror.logs, service, user | 18:38 |
| fungi | these have remote replicas on afs01.ord but not afs02.dfw: docs, docs.dev, mirror, project, project.airship, root.afs, root.cell, user.corvus | 18:38 |
| fungi | all other volumes with rw on afs01.dfw have a ro replica on afs02.dfw | 18:39 |
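For reference, that kind of audit can be done with the standard OpenAFS volume tools; a rough sketch (server names may need to be given as IPs depending on local host resolution):

```bash
# list every VLDB entry with a site on afs01.dfw, then eyeball the site lists
vos listvldb -server afs01.dfw.openstack.org -localauth
# or check a single volume's RW/RO sites directly
vos examine mirror.logs -localauth
```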
| clarkb | I think it's ok to have RO in ord and not dfw; we just have to make that site the new RW site temporarily | 18:39 |
| clarkb | which means the main thing to consider is whether docs-old, mirror.logs, service, and user need secondary sites. I think mirror.logs having a secondary site is a good idea so that we don't have to turn off all the logging and afs monitoring | 18:40 |
| clarkb | I suspect that not having access to user.corvus (because user is down) is something that won't be a big deal. Any idea what "service" is for? | 18:41 |
| fungi | i can probably still move its rw volume, it has no ro replica not even locally | 18:41 |
| clarkb | oh ya maybe that is the case. I guess in my head you had to have an RO first that gets promoted to RW but that is probably an inaccurate mental model | 18:42 |
| fungi | i think it just ends up having to move all the data | 18:42 |
| clarkb | got it, that makes sense. Whereas if you have an up to date RO copy it's a matter of flipping some attributes | 18:43 |
| fungi | vos move -id mirror.logs -toserver afs02.dfw.openstack.org -topartition vicepa -fromserver afs01.dfw.openstack.org -frompartition vicepa -localauth | 18:43 |
| fungi | 55M /afs/.openstack.org/mirror/logs | 18:44 |
| fungi | probably wouldn't take too long | 18:44 |
| clarkb | ya and they are in the same region too | 18:45 |
| fungi | 5 directories and 707 files | 18:45 |
| clarkb | I'm guessing service and user are even smaller | 18:46 |
| clarkb | docs-old is maybe old enough to not worry about? | 18:46 |
| fungi | right, i don't think we hook any readers up to it | 18:46 |
| fungi | and we definitely don't write to it | 18:46 |
| fungi | we kept it just in case | 18:47 |
| fungi | maybe it's time to tar it up and stick the file somewhere for posterity | 18:47 |
| clarkb | but also if size isnt massive could just move it too | 18:47 |
| fungi | also just realized that on afs01.dfw i have to use 104.130.138.161 instead of afs01.dfw.openstack.org in commands | 18:49 |
| fungi | moving mirror.logs rw volume to afs02.dfw.openstack.org took 16 seconds | 18:49 |
| clarkb | any idea why we need the ip? | 18:49 |
| fungi | has to do with the host lookup | 18:50 |
| fungi | vos: server 'afs01.dfw.openstack.org' not found in host table | 18:50 |
| clarkb | ah it gets back 127.0.0.1 | 18:50 |
| fungi | yeah | 18:50 |
| clarkb | I smell lunch so will be afk for a bit | 18:51 |
| fungi | /afs/.openstack.org/mirror/logs/ still has contents so i think we're good with that | 18:51 |
| clarkb | will you do the same for service and user? seems like a good idea given I'm still not quite sure what is under service | 18:52 |
| fungi | i've moved service and user to afs02.dfw as well just now, yes. they took only a few seconds each | 18:53 |
| clarkb | ack | 18:53 |
| fungi | i think it's just an anchor for the various service.foo volumes | 18:53 |
| fungi | i'm not going to bother moving docs-old | 18:53 |
| fungi | i'll work through the 8 that have ro replicas on afs01.ord next, moving the corresponding rw volumes there | 18:54 |
| Clark[m] | Oh if it is the anchor for those volumes we probably want ro copies on a second server? | 18:54 |
| fungi | though i also wonder why root.afs and root.cell have replicas on afs01.ord but not afs02.dfw | 18:55 |
| fungi | i assume those are special in some way | 18:55 |
| Clark[m] | Ord was the original alternative. But the small window size made it unworkable for large mirrors | 18:56 |
| Clark[m] | So we added the second dfw server after and prefer it for new stuff | 18:56 |
| fungi | right, basically i was wondering if those were forgotten | 18:56 |
| fungi | some volumes have remote replicas on both | 18:56 |
| Clark[m] | Or just unnecessary to move them since they are small | 18:57 |
| fungi | i'm switching docs rw to afs01.ord now, seeing how long it takes | 18:57 |
| fungi | answer so far: it's definitely not instantaneous (still in progress after ~10 minutes), i've got it running in a root screen session on afs01.dfw in case anyone needs that | 19:06 |
| fungi | still going after half an hour | 19:30 |
| fungi | and still going... i'll grab a bite to eat and check back in, hopefully that'll give it enough time. bbiab | 19:48 |
| clarkb | I've been doing laptop surgery. New wifi card seems to work a lot more reliably than the old one. Now to reinstall everything to clear out the mess of debugging steps I had previously applied | 20:37 |
| *** mtreinish_ is now known as mtreinish | 20:37 | |
| *** keekz_ is now known as keekz | 20:39 | |
| fungi | and the docs volume move is still in progress | 20:46 |
| *** ShadowJonathan_ is now known as ShadowJonathan | 20:55 | |
| *** keekz_ is now known as keekz | 20:55 | |
| *** dviroel_ is now known as dviroel | 20:55 | |
| *** clayg_ is now known as clayg | 20:55 | |
| *** naskio_ is now known as naskio | 20:55 | |
| *** dan_with__ is now known as dan_with_ | 20:55 | |
| *** thystips_ is now known as thystips | 20:55 | |
| *** rpittau_ is now known as rpittau | 20:55 | |
| *** tonyb_ is now known as tonyb | 20:55 | |
| *** gmaan_ is now known as gmaan | 20:55 | |
| *** clarkb is now known as Guest24679 | 20:58 | |
| Guest24679 | is anyone else having connectivity issues to oftc or is it more likely on my client side? | 21:39 |
| Guest24679 | the other irc networks I'm connected to don't seem to be bothered. But maybe it's an ipv4 vs ipv6 problem | 21:39 |
| tonyb | my connection dropped about 40 mins ago but has been stable since then. | 21:40 |
| fungi | i think everyone fell off the wagon | 21:41 |
| fungi | seems the wheels are back on now | 21:41 |
| Guest24679 | ack this is just the third time in 2 days so I've started to wonder if it was me or the servers | 21:41 |
| fungi | it's the first time it's gotten me, fwiw | 21:42 |
| fungi | or at least that i've noticed | 21:42 |
| fungi | https://meetings.opendev.org/irclogs/%23opendev/latest.log.html also hasn't been updating that i can see | 21:42 |
| Guest24679 | taking longer this time because I can't just identify, I have to figure out how to ghost myself | 21:43 |
| *** Guest24679 is now known as clarkb | 21:44 | |
| fungi | ah, though the raw version at https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2025-08-21.log has content at least | 21:44 |
| clarkb | looks like reclaim and regain are aliases of one another | 21:45 |
| clarkb | fungi: I assume afs is still working on that docs volume? | 21:47 |
| corvus | i'm going to restart the launchers, schedulers, and web in order to pick up some zuul provider configuration syntax changes | 21:48 |
| clarkb | ack | 21:49 |
| fungi | clarkb: yeah, it almost feels like having a ro replica at that location doesn't speed up moving the rw there at all and it's just transferring all the data anyway | 21:49 |
| clarkb | fungi: I wonder if that is because we're publishing docs updates via zuul jobs regularly | 21:50 |
| fungi | no idea, if it doesn't finish before i knock off in a bit, i'll pick it back up in the morning. at least i ran it under time so i'll know how long it took to finish | 21:51 |
| clarkb | oh good idea. I wouldn't have done that :) | 21:52 |
| corvus | it'd be cool if a shell would let you retroactively "time" something | 21:53 |
| corvus | like "time %1" | 21:53 |
| fungi | yeah, seems like it wouldn't take much for a shell to track that internally | 21:53 |
| corvus | apparently fish has a $CMD_DURATION variable set after each interactive command | 21:54 |
| fungi | oh cool! | 21:54 |
| tonyb | history -w and look at the timestamps | 21:54 |
| fungi | that's a tempting selling point for it | 21:55 |
| fungi | tonyb: that gets you start time but not end time, i think? | 21:55 |
| fungi | so if you catch the process right after it terminates that's mostly usable | 21:55 |
| fungi | if you leave and come back later and want to know when the process ended, i don't think history gets you that | 21:55 |
| tonyb | true. it'd be an approximation | 21:56 |
| corvus | i've been trying out fish on one of my machines... i'm growing addicted to its command completion | 21:56 |
| clarkb | tonyb: corvus anyone want to weigh in on https://review.opendev.org/c/opendev/system-config/+/957950 which gets system-config out of needing ansible 9 overrides? | 21:57 |
| tonyb | you could add something to a prompt_command to record timestamps | 21:57 |
| tonyb | but that's getting kinda clunky. fish's solution is much neater | 21:58 |
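A rough bash equivalent of that prompt_command idea, for the record; variable names here are made up and this assumes a bash new enough to support the DEBUG trap approach:

```bash
# record when each interactive command starts (DEBUG fires before commands),
# then compute the duration just before the next prompt is drawn, roughly
# matching fish's $CMD_DURATION behavior
trap '_cmd_start=${_cmd_start:-$SECONDS}' DEBUG
PROMPT_COMMAND='CMD_DURATION=$(( SECONDS - ${_cmd_start:-$SECONDS} )); unset _cmd_start'
# afterwards: echo "last command took ${CMD_DURATION}s"
```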
| fungi | i just noticed mirror.debian last updated 12 days ago. looking at the reprepro logs, it's complaining about a missing pubkey for signatures by B8E5F13176D2A7A75220028078DBA3BC47EF2265, so probably we need to add another key to our config but i don't think i have time for that tonight | 22:01 |
| corvus | clarkb: lgtm | 22:01 |
| fungi | see the end of /var/log/reprepro/debian.log on mirror-update.o.o for details | 22:02 |
| clarkb | fungi: I'm guessing the new key came in via trixie and they started signing the old stuff with it too? I can take a look | 22:02 |
| fungi | likely, that's what i'm assuming anyway | 22:02 |
| fungi | timing about lines up | 22:02 |
| tonyb | clarkb: lgtm. I guess approve at will | 22:03 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update bindep-fallback path https://review.opendev.org/c/opendev/zuul-providers/+/958247 | 22:04 |
| clarkb | thanks I've approved it | 22:04 |
| corvus | the change to remove the nodepool elements from openstack/project-config merged, but that exposed a place where zuul-providers was referencing that repo. https://review.opendev.org/958247 is needed to fix our image builds, and we need to fix our image builds in order to get images for rax-flex iad3 (i think we have probably exceeded our 3 day timeout for keeping the objects around) | 22:05 |
| corvus | since clarkb +2d that i went ahead and approved it. that's got several hours of work in the gate ahead of it still if anyone else has thoughts.... | 22:07 |
| corvus | actually it's failing in gate because | 22:07 |
| corvus | 2025-08-21 22:07:11,093 ERROR zuul.Launcher: zuul.exceptions.LaunchStatusException: Server in error state | 22:07 |
| corvus | we're getting that from osuosl | 22:07 |
| clarkb | corvus: if we can get the error message we can pass that along to Ramereth[m] | 22:08 |
| corvus | clarkb: zuul logs it if it gets one | 22:08 |
| corvus | and i don't see one | 22:08 |
| corvus | worth double checking on bridge though in case there's a bug in that | 22:09 |
| clarkb | corvus: via server show you mean? | 22:09 |
| corvus | (or maybe there are errors that only show up in detailed server gets, not server listings) | 22:09 |
| corvus | yeah | 22:09 |
| corvus | i'm on that | 22:09 |
| corvus | this command right? /usr/launcher-venv/bin/openstack --os-cloud opendevzuul-osuosl --os-region RegionOne server list | 22:10 |
| corvus | that's empty, i think zuul is deleting the error servers very quickly | 22:11 |
| clarkb | ya that command looks right | 22:11 |
| corvus | i think it may be time to consider this change: https://review.opendev.org/955797 | 22:11 |
| clarkb | we may need to boot something out of band and check the error or see if Ramereth[m] sees anything on the cloud side | 22:12 |
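If it comes to booting something out of band, one possible check looks like the following; the image, flavor, and network names are placeholders, and the `fault` field is only populated once a server lands in ERROR:

```bash
# boot a throwaway server in the same cloud/region zuul is using, then inspect
# the fault detail if it errors out
openstack --os-cloud opendevzuul-osuosl --os-region RegionOne \
    server create --image <image> --flavor <flavor> --network <network> error-debug-test
openstack --os-cloud opendevzuul-osuosl --os-region RegionOne \
    server show error-debug-test -f value -c fault
```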
| clarkb | corvus: for 955797 what causes zuul to ignore the failed image builds? They won't have the archive info listed because the failed jobs don't get to that point? | 22:12 |
| corvus | yep | 22:13 |
| corvus | they either produced an artifact or didn't, that's all we really care about | 22:13 |
| clarkb | +2 from me | 22:14 |
| corvus | another thing we could consider is doing another set of pipelines for the arm images. would let us group them together in buildsets. that's an alternative to 797, but we can also do both. | 22:16 |
| corvus | 797 gives us the swiss-cheese model of image builds: whatever builds, we use. making another set of pipelines lets us do all-or-none for x86, and all-or-none for arm. both are different from our current global-all-or-none. | 22:17 |
| opendevreview | Clark Boylan proposed opendev/system-config master: Add Trixie keys to reprepro config https://review.opendev.org/c/opendev/system-config/+/958248 | 22:17 |
| corvus | (and, if we do both things, then we get swiss-cheese buildsets for each) | 22:17 |
| clarkb | corvus: considering each image build is largely its own thing (even each release within a distro is pretty independent) I think taking what we can get is probably best | 22:17 |
| clarkb | rather than splitting it up by arch or distro or whatever | 22:18 |
| corvus | wfm | 22:18 |
| clarkb | fungi: something like 958248 maybe | 22:18 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Remove build_diskimage_image_name variable https://review.opendev.org/c/opendev/zuul-providers/+/956373 | 22:19 |
| clarkb | fungi: the key hash string you provided above doesn't seem to match the key hashes I found but I'm guessing thats just because its hashing something different | 22:20 |
| opendevreview | Merged opendev/zuul-providers master: Use label-defaults https://review.opendev.org/c/opendev/zuul-providers/+/956946 | 22:20 |
| corvus | i have not seen this error building images before: https://zuul.opendev.org/t/opendev/build/6cdc17c3b50c41d2a9daa5149813c1c7 | 22:21 |
| fungi | clarkb: looks like it's a related subkey: https://lists.debian.org/debian-devel-announce/2025/04/msg00001.html | 22:21 |
| corvus | other builds got past that point... so it seems like maybe a fluke, but i don't know the origin | 22:22 |
| clarkb | corvus: let me try cloning that repo locally from each backend I guess | 22:22 |
| corvus | clarkb: oh good idea | 22:22 |
| clarkb | could be gitea backend specific | 22:22 |
| fungi | clarkb: 958248 is adding 225629DF75B188BD which is the corresponding master key for that subkey, so that looks right to me | 22:23 |
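One way to double check that subkey-to-master-key mapping locally, assuming a recent enough debian-archive-keyring package is installed (key ID taken from the message above):

```bash
# show the master key plus its subkey fingerprints; the subkey from the
# reprepro error should show up under this master key
gpg --no-default-keyring \
    --keyring /usr/share/keyrings/debian-archive-keyring.gpg \
    --with-subkey-fingerprint --list-keys 225629DF75B188BD
```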
| clarkb | first thing I notice is that repo appears to be somewhat large.... | 22:23 |
| clarkb | like bigger than nova | 22:23 |
| corvus | oO | 22:24 |
| clarkb | https://opendev.org/cfn/computing-offload/src/branch/master/LingYaoSNIC/BCLinux they are just shoving rpms in there | 22:24 |
| clarkb | 456MB | 22:25 |
| clarkb | so maybe not bigger than nova but in the same range | 22:25 |
| corvus | give it time | 22:26 |
| clarkb | fungi: do you know who to talk to about this? I know you interacted with them a couple times then I tried to respond to them on the list. But this is really not what gerrit or git is good at... | 22:27 |
| clarkb | anyway gitea09 clones cleanly | 22:27 |
| fungi | i do not, sorry | 22:27 |
| fungi | horace may have some contacts | 22:27 |
| fungi | if memory serves, he follows some of their development effort | 22:28 |
| clarkb | thanks I'll followup there once I have a bit more info | 22:28 |
| fungi | he might also be able to get their attention more easily through wechat/weixin | 22:31 |
| opendevreview | Merged opendev/system-config master: Drop Bionic testing for OpenDev services, playbooks and roles https://review.opendev.org/c/opendev/system-config/+/957950 | 22:35 |
| clarkb | corvus: all 6 gitea backends clone that repo for me successfully right now | 22:35 |
| clarkb | corvus: the only other thought I've got is maybe a zuul merger is copying data over that is unhappy and that is propagating into the image builds? | 22:36 |
| clarkb | except I think we're just copying the old cache over not using the mergers? | 22:36 |
| clarkb | could be the repo is large enough to have hit a bit flip? | 22:36 |
| clarkb | I've asked horace if we can see about helping them use our tools more effectively (code review to prevent bad patches in the first place, artifact build and storage, git/gerrit/gitea, etc) | 22:47 |
| clarkb | corvus: tonyb: re arm64 there is a bookworm arm64 node booted right now that is active and not in an error state and I was able to ssh into it | 23:37 |
| clarkb | I also manually booted an ubuntu noble node and that worked (using our image) | 23:38 |
| clarkb | so maybe whatever the issue is has been resolved or is image specific? | 23:38 |
| clarkb | I'm going to delete my test node now | 23:39 |
| corvus | maybe | 23:39 |
| corvus | i can recheck something in a sec | 23:39 |
| corvus | but i just got distracted by the fact that all the x86 image builds failed somehow | 23:39 |
| corvus | oh | 23:40 |
| opendevreview | James E. Blair proposed opendev/zuul-providers master: Update bindep-fallback path https://review.opendev.org/c/opendev/zuul-providers/+/958247 | 23:40 |
| corvus | clarkb: ^ i got the path wrong | 23:41 |
| clarkb | +2 | 23:41 |
| corvus | that will produce a number of arm requests | 23:41 |
| clarkb | the last server listing I did shows 4 servers. 2 noble, 1 jammy, 1 bookworm all active on the cloud side | 23:42 |
| corvus | well, they're 2 minutes into the building state, that's promising | 23:43 |
| corvus | and some in use now | 23:43 |
| corvus | so yes, i guess the anomaly was temporally bounded :) | 23:44 |