Monday, 2023-10-30

Clark[m]fungi: etherpad released 1.9.4 over the weekend. Not sure if we want to do 1.9.3 first or just respin and do 1.9.4 instead. Either way we should proceed with etherpad upgrades now that the ptg is over14:06
fungii can test with the new one later today, but need to disappear right now to run a bunch of errands (early voting, grocery pickup, etc)14:07
fungii should be back no later than 16:00 utc14:07
Clark[m]See you then. I've got school run and breakfast to contend with myself14:08
slittlePlease add me as first core for starlingx-app-gen-tool-core15:01
clarkbslittle: I can do that in a few. Just need to settle in and load ssh keys first15:10
clarkbslittle: looks like someone else beat me to it15:15
fungislittle: clarkb: i did that on friday and pinged you in the starlingx-build matrix thread jreed started15:22
fungier, pinged slittle in there that is15:22
clarkbfungi: the updated mm3 change lgtm fwiw15:22
clarkbworking on local updates now then will test the gerrit downgrade upgrade and use that as an opportunity to upgrade by hand to be more comfortable with the process on the test node. If that goes well I'll send the november 17 upgrade announcement email15:24
fungicool, i'm putting together the revised etherpad upgrade change now15:26
fungiand will set another autohold15:26
clarkbfungi: we should be able to start cleaning up old lists.o.o soon too ya?15:27
clarkbsee my note about extracting the kernel before we snapshot the node (to give us the best chance of booting it later)15:27
fungiclarkb: yes, also is related if you haven't seen it yet15:27
clarkbI had reviewed that one. Looks like it got rebased. I'll rereview15:27
fungiaha. yep i guess that was a nontrivial rebase15:28
fungier, wasn't a rebase. i had to fix that change15:29
clarkbyup I see it now. We need to keep the host vars around for the test node passwords. Slight nit: now that we aren't putting exim configs in that file we can move it back into templates and treat it the same way as the other files15:30
fungioh, good point!15:30
fungii'll adjust it in a few15:30
clarkbit got moved into files/ out of templates/ because exim configs have a lot of {{ }} type characters that jinja didn't like and I didn't want to have to escape the whole thing15:30
opendevreviewJeremy Stanley proposed opendev/system-config master: Upgrade Etherpad to 1.9.4
opendevreviewJeremy Stanley proposed opendev/system-config master: DNM force etherpad failure to hold node
fungiclarkb: should it stay host-specific when moving into templates? i.e. playbooks/zuul/templates/host_vars/ vs playbooks/zuul/templates/group_vars/mailman3.yaml.j215:54
fungiseeing as how inventory/service/host_vars/ moved to inventory/service/group_vars/mailman3.yaml i guess we want the latter for lists99 as well?15:56
clarkbfungi: in this case the secrets are not in system-config for prod. I'm not sure if they went into a hostvars file or a group vars file in the secrets location15:56
clarkbbut probably best to align with that?15:56
fungioh, good point. on bridge we have a host_vars/ and no group_vars/mailman3.yaml15:57
fungibut i can also move that. is there any reason not to?15:58
fungiif we build a replacement mm3 server in the future (however unlikely), i doubt we'll want to generate all new credentials15:58
fungior if we do, it'll be deliberate (e.g. in response to a compromise)15:59
clarkbya I think moving them all to secret group contents is fine16:01
clarkbok I've tested the gerrit downgrade then reupgraded that node by hand to 3.816:36
clarkbhttps:// this is the node. I've updated my etherpad with notes based on this. I did find that the plugin manager exception seems to have maybe changed on startup. I collected some info and sent that to upstream's discord server just now to see if that is debuggable further16:37
clarkbhowever, it wasn't working before (and honestly we should consider dropping that plugin entirely) so I don't think this is an issue16:37
opendevreviewClark Boylan proposed opendev/system-config master: Upgrade Gerrit to Gerrit 3.8
clarkbinfra-root is the other next step in the gerrit upgrade process. We'll want to land that and then restart gerrit at some point. Testing notes are in the comments of that change16:44
clarkbI think we should be safe to land that whenever and then we just need to restart gerrit quickly within the next few days16:45
clarkbinfra-root ^ if you can ack that you're comfortable with the upgrade continuing on the 17th despite the plugin manager thing please do so then I'll send the announcement16:47
fungiclarkb: i'm okay with moving forward on the existing upgrade plan, yes (saw the question in the matrix channel, hopefully someone will pipe up)16:56
fungii suppose the only risk is if the plugin-manager plugin has become mandatory, but that doesn't appear to be the case16:59
fungi#status log Moved private host vars for to a mailman3 group vars file on bridge0117:04
opendevstatusfungi: finished logging17:04
fungiclarkb: do you happen to know why there's an untracked group_vars/adns-primary.yaml on bridge? seems like it may be related to (last modified date is the same day that change merged, fwiw)17:06
fungimaybe ianw just missed committing it?17:07
clarkbthat would be my guess17:07
fungihappy to commit it, just want to make sure it's needed17:07
clarkbyup I think we should. based on that change it seems we do use that group now17:08
clarkbI'm going to send that email announcement now17:08
clarkbfor gerrit 3.817:08
fungilooks like there was a commit on the same day which deleted the group_vars/adns.yaml file on bridge, so i suspect it was a missed git add (or maybe used mv instead of git mv?)17:09
fungii'm satisfied it's supposed to be tracked, so committing it now17:09
opendevreviewJeremy Stanley proposed opendev/system-config master: Merge production and test node mailman configs
clarkbI'm going to remove the LE failures item from our meeting agenda. I think this job is stable again (perhaps ansible 8 made it happier)17:16
fungiseems likely17:17
fungiclarkb: looks like we may need to adjust our etherpad logging for 1.9.4?17:19
clarkbfungi: we weren't doing anything special for etherpad. Did the tests show we don't get logs anymore?17:19
funginew node-log4j is choking on our /opt/etherpad-lite/src/node_modules/log4js/lib/configuration.js17:19
clarkbhrm I didn't think we edited that at all17:20
fungier, node-log4js i mean17:20
clarkbfungi: oh its in our settings.json looks like17:20
clarkbso maybe we need to diff that against the example settings.json in 1.9.4 and reconcile any differences?17:20
fungiquite possible17:21
fungii'll take a look17:21
fungiseems like we have both docker/etherpad/settings.json.docker and playbooks/roles/etherpad/templates/settings.json.j2 to check over17:21
fungiboth appear to be identical?17:21
clarkb fwiw I think what we are doing is no longer a thing in etherpad17:21
clarkbfungi: ya they sync them according to the git logs17:21
fungier, not identical never mind. my shell fu is weak today17:22
clarkbso ya I suspect we need to remove that section and add any log settings like the log level entry then cross check that we still output logs where we expect them and adjust if not17:22
fungilooks like there were some changes to the dockerfile too. i'll get both updated to match, similar to what you did for the 1.9.1 upgrade17:24
clarkbreminder to update the meeting agenda if you have items to add. I did a first pass update already. I'm going to pop out for a bike ride in a bit but will get that sent out when I return19:01
opendevreviewTristan Cacqueray proposed zuul/zuul-jobs master: Introduce LogJuicer roles
opendevreviewJeremy Stanley proposed opendev/system-config master: Merge production and test node mailman configs
opendevreviewTristan Cacqueray proposed zuul/zuul-jobs master: Introduce LogJuicer roles
opendevreviewMerged opendev/system-config master: Convert commentlinks to new no html system
clarkbthe commentlinks config update appears to have applied successfully. I'll put restarting gerrit on my todo list for tomorrow21:47
clarkbfungi: the three mailman changes you've got look good to me. I think we can approve those whenever you are ready (doesn't have to be now I think your evening is started already)21:49
fungiyeah, we can approve the cleanup ones tomorrow, i'm on the fence about scheduling the upgrade change though it's probably going to be no more than a blip for delivery and webui22:03
clarkbya and even then probably the only way people notice is if the webui is gone long enough someone tries to use it22:03
clarkbfungi: maybe schedule it but say for thursday or friday? enough notice people can be aware but not so far in the future we're waiting long22:04
clarkbnext on my todo list is sending that meeting agenda. Anything else to add? Last call22:05
fungii've got nothing22:05
clarkbfungi: for etherpad you were going to update the config and dockerfile right? Just making sure I wasn't supposed to do that22:09
tonybNothing for the agenda.22:09
fungicorrect, i just haven't done it yet. trying to confirm i know which files are copied from where22:09
fungii'm trying to figure out the difference between docker/etherpad/settings.json.docker and playbooks/roles/etherpad/templates/settings.json.j222:12
fungiand whether one or both need to be updated from upstream22:12
fungii guess docker/etherpad/settings.json.docker is what we bake into the images, but then we deploy playbooks/roles/etherpad/templates/settings.json.j2 to the server and map it over top the one in the container?22:12
fungioh, wait, upstream settings.json.template is what playbooks/roles/etherpad/templates/settings.json.j2 comes from i guess?22:14
clarkbfungi: correct we appear to bind mount over what is in the container22:14
clarkbthen ya I think our settings.json template originates from the upstream template that predates docker22:14
clarkbfungi: our settings.json is from the before times way way back when docker barely even existed :)22:16
fungimaybe i should just look at what they changed in upstream's between 1.9.2 and 1.9.4 instead of trying to refresh what we have to 1:1 match the upstream minus our custom edits22:17
fungistill, a nice future improvement would be to try to clean up as much divergence between them as we can22:17
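The approach fungi describes (diffing upstream's settings template between two releases rather than refreshing the whole local file) can be sketched with Python's standard `difflib`. This is a minimal illustration only: the two template strings, their keys, and the file labels below are hypothetical stand-ins, not the real contents of Etherpad's template in any release.

```python
import difflib

# Hypothetical stand-ins for upstream's settings template at two releases.
# The keys and values are illustrative only and not copied from Etherpad.
old_template = """\
{
  "title": "Etherpad",
  "loglevel": "INFO",
  "logconfig": { "appenders": [ { "type": "console" } ] }
}
"""

new_template = """\
{
  "title": "Etherpad",
  "loglevel": "INFO"
}
"""


def template_diff(old: str, new: str) -> list[str]:
    """Return unified-diff lines describing what changed between templates."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="settings template (older release)",
            tofile="settings template (newer release)",
            lineterm="",
        )
    )


diff_lines = template_diff(old_template, new_template)
print("\n".join(diff_lines))
```

Lines prefixed with `-` are the ones upstream dropped, so they are the candidates to remove from a locally customized copy; lines prefixed with `+` are candidates to add.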
tonybI've started looking at the last few "easy" bionic nodes (before looking at storyboard/wiki/translate/cacti with discussion about how we deal with them).  I can't see any testinfra for those machines.  Should I add some?22:36
clarkbtonyb: the main reason those don't have testinfra is they either 1) aren't managed at all (wiki) and stuck in time or 2) are the last remaining puppet nodes22:38
tonybSorry the testinfra question was for the mirror nodes.22:38
clarkboh I misread22:38
clarkbtonyb: there is testinfra tests for mirror test_mirror.py22:39
clarkband for those22:39
clarkbI think adding testinfra tests for any ansible managed hosts is a good thing though if we find them22:39
tonyb  Is there some aliasing between the and  (for example) ?22:41
clarkbtonyb: iirc its just regular ansible group membership. So and are in the mirror group which matches mirror[0-9]*.opendev.org22:42
clarkbtonyb: that causes the test node ( to be deployed as a mirror and the production node to be deployed as a mirror. Its just a matter of what the host name is in each context but they end up running the same playbooks due to the group membership22:42
clarkbin our CI systems we've tried to make the names obviously different than production after making them look very production like. Part of the reason for this is to avoid confusion when testing things like say the gerrit upgrade and accidentally stopping services on production. I don't think all services have been converted to the obviously different names yet but that has been the22:43
clarkbmore recent approach we've used22:43
clarkbin many cases we do instead of Then when you see 99 in the hostname after ssh'ing in you can be confident it isn't production22:44
tonybOkay.  I think that's what's confusing me.  We for sure used insecure-registry99 when I/We did that one and then towards the end we switched to insecure-registry02 (in testinfra)22:46
tonybI thought everything needed to match "just so" so the testing would run.22:46
clarkbin this case we could name the hosts and instead of the mirror01 and mirror02 entries22:46
clarkbthe group file is in inventory/service/groups.yaml and then in the playbooks we match on those groups22:47
clarkbthats the best way to map what runs on certain hosts I think22:47
tonybOkay.  I think I get it.22:47
tonybclarkb: did you finish zp01?22:48
clarkbtonyb: yes looks like I did22:49
clarkbtonyb: oh also the matching rules in the groups files are shell globs not regexes22:50
tonybOkay.  I shall strikethrough the zp01 lines in the etherpad22:50
tonybre regexes vs globs, got it22:50
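To make the glob-versus-regex distinction concrete, here is a small sketch using Python's standard `fnmatch` and `re` modules. The pattern is the one quoted above for the mirror group; the hostnames are hypothetical examples, and this only illustrates the matching semantics, not how the inventory code itself evaluates groups.

```python
import re
from fnmatch import fnmatchcase

# Pattern quoted in the discussion for the mirror group.
PATTERN = "mirror[0-9]*.opendev.org"

# Hypothetical hostnames chosen to show where the two readings disagree.
HOSTS = [
    "mirror01.example-region.opendev.org",
    "mirror99.opendev.org",
    "mirror.opendev.org",
    "mirror-update01.opendev.org",
]


def glob_match(host: str) -> bool:
    """Shell-glob reading: [0-9] is exactly one digit, * is any run of characters."""
    return fnmatchcase(host, PATTERN)


def regex_match(host: str) -> bool:
    """Regex reading: [0-9]* is zero or more digits, each . is any one character."""
    return re.fullmatch(PATTERN, host) is not None


for host in HOSTS:
    print(f"{host}: glob={glob_match(host)} regex={regex_match(host)}")
```

Read as a glob, the pattern requires at least one digit right after `mirror` and then accepts anything before `.opendev.org`, so a name with extra labels in the middle still matches. Read as a regex, the same string would reject that name and would accept `mirror.opendev.org`, which has no digit at all.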
*** dhill is now known as Guest519823:01
ianwfungi: thanks, yes I must have forgotten to git add :/ ... previously that key info was in adns.yaml which the hidden primary and public nameservers both had access to, but only the hidden bind server needed it, hence moving to adns-primary.yaml23:10
tonybSo looking over the playbooks in system-config and mulling over clarkb's comments above. We have a couple of jobs system-config-run-mirror-x86 creates 2 debug/throwaway nodes (bionic) and (focal). By way of the parent job system-config-run-mirror-base will run the playbooks/letsencrypt.yaml & playbooks/service-mirror.yaml playbooks to deploy the host.  There are 23:57
tonybsimilar jobs for arm64 mirrors. The testinfra code runs against those hostnames mirror{01,02}
tonybSo in terms of verifying that mirroring will work on jammy, we don't really need a test in each region, we just want an (jammy) node created as part of the job and include that in testinfra.  Going forward do we need to keep the testing on bionic, focal and jammy?23:57
tonybTo be complete I feel like I should also update the system-config-run-mirror-update job to deploy to jammy, but It does occur to me that this will possibly create a bunch of unwanted AFS replication traffic.23:57
tonybI also think that I should do most of the above for a jammy-arm64 node.23:57
