Wednesday, 2023-04-26

fungilooks like i get nsec/rrsig records back from it too00:21
ianwi'm not seeing anything completely obvious like "fix centos" in
ianwthough visiting is a bit of a blast to the past00:52
opendevreviewIan Wienand proposed opendev/system-config master: [dnm] trying to get some logs for openafs on centos builds
ianwit is a build error "/var/lib/dkms/openafs/1.8.9-0.pre1.el9/build/src/libafs/MODLOAD-5.14.0-302.el9.x86_64-SP/osi_vnodeops.c:2272:20: error: implicit declaration of function ‘add_to_page_cache’; did you mean ‘add_to_page_cache_lru’"01:10
ianwi guess
genekuoianw: It seems like this patch is waiting for a workflow+1, is there anything missing that I should do?02:55
ianwgenekuo: sorry, nope that's fine.  i had held off just because i thought it might be something we'd use to test pushing our images to, clarkb is working on it.  but I think it's fine to go ahead as i know he's got local testing setup etc.03:13
genekuogot it, thanks03:14
opendevreviewIan Wienand proposed opendev/system-config master: nodepool: switch images to
ianwsigh, i guess the openafs rpm build jobs build the rpms, but don't test the install :/03:45
ianwso passes, but still fails03:46
opendevreviewMerged opendev/system-config master: Build houndd Directly
ianwi may have got the naming wrong
ianwhrm, no, the 1.8.9 rpms aren't there04:33
ianwbut copied them04:33
ianw... and they are in the afs R/W volume04:33
ianwthis implies either vos release hasn't happened, or static is showing old content04:34
ianw2023-04-26 04:35:02,529 release DEBUG    Running: ssh -T -i /root/.ssh/id_vos_release -- vos release project.tarballs04:35
ianw2023-04-26 04:35:03,225 release DEBUG    04:35
ianw2023-04-26 04:35:03,225 release ERROR    Release of project.tarballs failed04:35
ianwi wonder how related this is to the afsdb failure this morning ...04:36
ianwvos listvldb -locked shows only tarballs04:39
ianwafs-release.log.3.gz:2023-04-23 23:55:03,259 release ERROR    Release of project.tarballs failed04:40
ianwappears to be the first error04:40
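A minimal Python sketch of the search done by hand here: scan the rotated afs-release logs, oldest first, for the earliest "Release of <volume> failed" line. It assumes the log line format quoted above and logrotate-style numbering, where a higher suffix such as `.log.3.gz` means an older file. All function names are hypothetical and are not part of the real release script.

```python
import gzip
import re
from typing import Iterable, Optional

# Matches lines like:
# 2023-04-23 23:55:03,259 release ERROR    Release of project.tarballs failed
FAILED_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) release ERROR\s+"
    r"Release of (?P<volume>\S+) failed"
)


def first_failure(lines: Iterable[str], volume: str) -> Optional[str]:
    """Return the timestamp of the first failed release of `volume`, if any."""
    for line in lines:
        m = FAILED_RE.match(line)
        if m and m.group("volume") == volume:
            return m.group("ts")
    return None


def rotation_index(path: str) -> int:
    """afs-release.log -> 0, afs-release.log.3.gz -> 3 (higher means older)."""
    m = re.search(r"\.log\.(\d+)(?:\.gz)?$", path)
    return int(m.group(1)) if m else 0


def read_log(path: str) -> Iterable[str]:
    """Yield lines from a plain or gzip-compressed rotated log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        yield from fh


def earliest_failure(paths: Iterable[str], volume: str) -> Optional[str]:
    """Scan rotated logs from oldest to newest; return the first failure found."""
    for path in sorted(paths, key=rotation_index, reverse=True):
        ts = first_failure(read_log(path), volume)
        if ts:
            return ts
    return None
```

Called with the list of rotated log paths and `"project.tarballs"`, this would point at the same 2023-04-23 23:55 entry, assuming the logs are formatted as above.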
ianwi've unlocked it, and now it needs to do a full release :/04:44
ianwit's doing it in a screen on mirror-update, but it might take a while04:45
ianw#status log mirror-update02 in emergency as it runs a full release of project.tarballs after the volume became locked during a prior operation04:45
opendevstatusianw: finished logging04:45
fricklerneed to fix afs in order to be able to fix afs, nice ;)07:16
fricklerseems to be still running strong, no idea how to measure progress07:20
fricklergrafana shows afs01.ord has dropped from 262GB to 42GB two hours ago and is now slightly increasing again. if the target is the previous level, we haven't even done 10% yet07:25
ianwit's basically running at 10mbit08:16
ianwi think that's more or less the limit of the way it queues packets.  so yeah, it probably tracks for a long time08:18
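A rough back-of-the-envelope check of how long the full release might take, assuming it sustains the ~10 Mbit/s observed here and has to restore roughly the 220 GB difference between the previous 262 GB and the current 42 GB on afs01.ord (decimal units, protocol overhead ignored):

```python
def transfer_hours(gigabytes: float, megabits_per_second: float) -> float:
    """Hours needed to move `gigabytes` (decimal GB) at a sustained rate."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


remaining_gb = 262 - 42  # previous usage minus current usage, per grafana
print(f"{transfer_hours(remaining_gb, 10):.1f} hours")  # 48.9 hours
```

Under those assumptions that is about two days; since overhead is ignored, treat it as a lower bound.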
frickleryes, certainly not the storage solution made for high performance. did we ever consider other solutions? like maybe zfs with snapshots and replication?08:57
opendevreviewChing Kuo proposed opendev/system-config master: Update Hound to Use Python 3.11 Base Images
opendevreviewMaksim Malchuk proposed openstack/diskimage-builder master: Extend the checksum files generation procedure
clarkbfrickler: I don't think zfs was ever considered due to it not having a good linux story when this was built and it still has licensing concerns (though none any worse than openafs aiui)15:15
clarkbbut also I'm not sure zfs send is really a good substitute. We would need to be able to have 3TB of disk in every location15:16
frickleryeah, likely not enough pain with afs yet to really dig deeper into this15:23
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Update ensure-quay-repo to run opportunistically
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Use consistent registry_type var name across roles
opendevreviewClark Boylan proposed opendev/system-config master: WIP Base jobs for image publishing
clarkbianw: fungi: are we to the point where we should ask the foundation to update NS and glue records for and
fungiwas the plan to do that before changing the ns records and soa in our zone files?16:28
fungibecause if not, we haven't done that yet16:29
clarkblooking at the etherpad the plan was to update the zones first on our end16:33
clarkbDoesn't look like that step has been updated on the etherpad for as far as it got yesterday16:34
clarkbso ya I think we need to land those changes first then talk to csc/foundation16:34
fungiotherwise the registrars will likely balk at making the change since it's not set in our zones16:43
fungii can take a look in a few and approve/rebase/propose those as needed16:44
clarkbfungi: k. I'm happy to wait a few hours if we want to sync up with ianw too first. I was mostly trying to figure out where things were and to ensure I wasn't holding anything up between opendev and the registrar16:45
fungisure, i can hold off approving and just check for presence and state of those changes16:52
opendevreviewClark Boylan proposed opendev/system-config master: Switch to nodepool images on
opendevreviewClark Boylan proposed opendev/system-config master: Switch zuul container images to
opendevreviewClark Boylan proposed opendev/system-config master: Switch the zuul-registry image location to
opendevreviewClark Boylan proposed opendev/system-config master: Cleanup unused nodepool-base-legacy role
opendevreviewClark Boylan proposed openstack/project-config master: Switch jeepyb over to Gerrit 3.7 image builds
opendevreviewClark Boylan proposed opendev/system-config master: Remove Gerrit 3.6 image builds and test jobs
opendevreviewGhanshyam proposed openstack/project-config master: Correct the patrole repo gerrit acl to openstack/retired.config
gmannfungi: frickler ^^ I updated patrole repo acl wrongly. fixing that so that i can close the open reviews in that repo18:02
opendevreviewClark Boylan proposed opendev/system-config master: Add Gerrit 3.8 image builds and test jobs
clarkbI'm going to pause here on the gerrit stuff. Want to make sure 3.8 builds at all and can be deployed from scratch before we start worrying about the upgrade job18:06
clarkboh and I need the project-config update to jeepyb to land before the system-config changes are even testable18:08
clarkbI'll plan to clean up etherpad01 after lunch today so in about an hour and a half or so18:22
JayFAre things cached on
JayF landed, post jobs are posted there all successful, but the link is not here, and if I manually construct the URL, the workstreams spec is not there18:30
clarkbJayF: the content is likely hosted in afs and gets written to the RW volume. Then periodically the RW volume gets promoted to the RO volumes and the RO volumes are what we serve18:31
JayFack; so just need to wait for someone to release the vos (? it's been a while since I admin'd AFS :D)18:31
clarkbI believe this happens roughly every 5 minutes, but there was something with the tarballs volume in scrollback needing to be resynced from scratch and that taking time, which may have backed things up18:31
clarkbJayF: ya, it's automated vos releases but with locking to avoid stepping on each other. I think the resync of tarballs has the others backed up behind it18:32
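The "locking to avoid stepping on each other" behaviour can be pictured with a hypothetical sketch: each scheduled release tries a shared lock without blocking and skips its cycle if a long job, such as the tarballs resync, still holds it. This is an in-process illustration under that assumption only; the real release tooling runs as separate processes and may lock quite differently (for example with a lock file).

```python
import threading
from typing import Callable

# Hypothetical: one lock shared by every periodic release job.
release_lock = threading.Lock()


def release_if_free(volume: str, do_release: Callable[[str], None]) -> bool:
    """Run do_release(volume) unless another release holds the lock.

    Returns True if the release ran, False if this cycle was skipped.
    """
    if not release_lock.acquire(blocking=False):
        return False  # something else (e.g. a long resync) is still running
    try:
        do_release(volume)
        return True
    finally:
        release_lock.release()
```

With this design every volume queues behind the one long-running job, which matches the symptom of the releases site waiting on the tarballs resync; a per-volume lock would avoid that but would also allow a site to be published before the tarballs it links to.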
JayFsure, it's not a rush to get this published was just confused when I was gonna go reference it :D 18:32
fricklerreleases are blocked by the resync for maybe another day or so18:32
clarkbwe could do a manual vos release out of band maybe since it is for a different volume18:33
JayFIf you wanna do it just to do it; feel free. It causes me no pain, frustration, or delay to wait a day or two :)18:33
fungifwiw, waiting on the tarballs volume resync before updating the releases site is best anyway, since otherwise the releases site is going to mention and link to new releases on the tarballs site that aren't there yet18:42
opendevreviewMerged openstack/project-config master: Correct the patrole repo gerrit acl to openstack/retired.config
fungiclarkb: ianw: looks like the current state is that is already listing ns03/ns04 as additional ns records, while 880577 for and 880909 for zuul-? have 2x+2 and are ready to merge19:28
clarkbfungi: can you review that will unstick the followup changes for updating gerrit image stuff19:49
opendevreviewMerged openstack/project-config master: Switch jeepyb over to Gerrit 3.7 image builds
* clarkb rechecks things20:00
clarkbI'm looking at etherpad01 cleanup and have noticed that openstack server show is buggy and does not list attached volumes20:07
clarkbinfra-root ^ a warning as that could be potentially dangerous20:07
clarkbinfra-root last call to say don't delete etherpad01 and its volume. I'll get to that in about 10 minutes20:07
clarkbI hear no objections. Proceeding now20:17
clarkb#status log Deleted (648795e3-a523-4998-8256-8e40c6e6f222) and its volume (020a2963-1d11-4665-bfdf-1fefb74c8a9f) to complete the etherpad server replacement and cleanup20:21
opendevstatusclarkb: finished logging20:21
opendevreviewClark Boylan proposed opendev/system-config master: Add Gerrit 3.8 image builds and test jobs
clarkbLooking at the 3.8 release notes I think the upgrade process itself is going to be simpler than 3.7's but I haven't digested what all the changes we need to accommodate pre upgrade are yet20:39
clarkbcool I think the gerrit 3.8 war is building now21:00
ianwtarballs release still running21:09
ianwsince things went ok with i'll approve the other NS addition changes now.  then the registrars can be updated at leisure21:10
clarkbianw: re tarballs what precipitated that? a stale lock?21:11
opendevreviewMerged opendev/ master: Add Jammy refresh NS records
ianwclarkb: yeah the volume was locked.  the first afs-release that failed was "afs-release.log.3.gz:2023-04-23 23:55:03,259 release ERROR    Release of project.tarballs failed"21:13
ianwso something must have happened to the release before that?21:13
opendevreviewMerged opendev/ master: Add Jammy refresh NS records
ianwfungi/clarkb: and have the 03/04 NS records now, so they are safe to be switched at the registrar when ready21:24
clarkbianw: thanks fungi mentioned he could coordinate as I have a school function in a little bit21:26
ianwno rush21:26
opendevreviewClark Boylan proposed opendev/system-config master: Add Gerrit 3.7 -> 3.8 upgrade job
clarkbianw: to be clear we need to delete NS and NS records and add NS and NS records?21:33
clarkbthen swap out with at the beginning of the records for that domain?21:34
clarkbI want to make sure that we request everything we need clearly for them21:34
fungiyeah, i think that's what we expect, but will give foundation folks a heads up as soon as ianw confirms21:41
clarkbour gerrit theme plugin js will need to be updated:
clarkb but we can upgrade from 3.7 to 3.8 apparently22:01
clarkbfungi: ^ I think you can slap a DNM change on top of that and hold the 3.8 job and check the gitea links22:02
ianwclarkb: yep, we need to replace ns1/ns2 with ns03/ns04, and update glue records for both.22:03
clarkbfungi: ^ theres the confirmation22:04
ianwi don't know if we want to bring it up, probably doesn't need glue records, it would be one less thing to manage next time22:04
fungithanks ianw, i'll let them know now (no idea how long the change will take to complete)22:04
clarkband i guess make note that ns1/ns2 don't have the 0 prefix in the digits but ns03 and ns04 do22:05
ianwthere's no rush, we just don't want to turn off the old servers until done :)22:05
fungiclarkb: yep22:05
fungiclarkb: i had included that in the message i was hovering over the send button for22:05
fungiianw: you mean cease to serve otherwise we should still maintain its domain registration, and so specify nameservers for it22:06
clarkbfungi: cease serving the NS records for out of the .org domain22:07
clarkband let them get served out of instead I think22:07
fungiusually the registrar handles figuring that part out22:08
clarkbfwiw I can never remember exactly when those records are necessary in the parent domain22:08
clarkbchickens and eggs are confusing22:08
fungiat least i've never used a registrar where you independently altered your whois and requested glue record injection into the tld zone22:09
fungiusually you just say "these are my new nameservers" and then they update the whois and also inject glue records if they're needed22:09
ianwoh that may be the case.  porkbun we've been talking about does let you set glue records, if you want22:10
ianwso yeah, they may be set automatically22:10
ianwdig +noall +authority +additional +norecurse NS zuulci.org22:10
fungitypically, glue records are only injected into the tld zone if they're within the same domain or a subdomain22:10
fungibut some registrars may just do it for all domains regardless22:11
ianwright, they make sense for, but has them too22:11
clarkbinfra-root I think the gerrit image cleanup and addition of 3.8 stuff is good to go now: and children. Fixing up the issues in 3.8 should happen separately22:11
clarkbisn't there still a chicken and egg when you query NS .org has to point you at, otherwise you won't know what to talk to?22:12
ianwthanks! will look22:12
clarkbanyway we don't need to get into all those details22:12
ianwyeah, but it can look up ns03.opendev.org22:13
fungiclarkb: the ns records need to be in the tld zone, but not any glue records (a/aaaa)22:13
ianwwhereas when says "ask" there's a loop22:13
fungiall of the registrars i've ever worked with don't give you a separate choice to inject glue records, presumably because dns and domain registration are already confusing enough and they just want customers to give them (recurring) revenue without incurring additional support questions from people who don't grok the distinction22:16
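The glue-record rule described above can be sketched as a simple in-bailiwick check: the parent zone only needs A/AAAA glue when a delegated nameserver's hostname sits inside the zone it serves, because otherwise a resolver can look up the nameserver's address through the normal path. For example, a zone delegated to ns03.opendev.org needs glue if that zone is opendev.org, but not if it is zuulci.org. This is a simplified model; some registrars may add glue for all domains regardless.

```python
def _norm(name: str) -> str:
    """Lowercase and drop any trailing root dot."""
    return name.lower().rstrip(".")


def needs_glue(zone: str, nameserver: str) -> bool:
    """True if `nameserver` lives inside `zone`, so the parent must supply
    its A/AAAA (glue) records to avoid a resolution loop."""
    zone, nameserver = _norm(zone), _norm(nameserver)
    return nameserver == zone or nameserver.endswith("." + zone)
```

The leading `"."` in the suffix test keeps a name like `ns03.notopendev.org` from being treated as inside `opendev.org`.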
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Use consistent registry type var name across roles
clarkbI don't actually know if that will make the linter happy. We will find out22:34
ianwhrm, i hope that some function has just changed name and it's not a whole rewrite of the zuul plugin23:28
ianwRemove registerStyleModule() plugin API -- Use plugin.styleApi().insertCSSRule() instead.23:33

