Clark[m] | fungi: etherpad released 1.9.4 over the weekend. Not sure if we want to do 1.9.3 first or just respin and do 1.9.4 instead. Either way we should proceed with etherpad upgrades now that the ptg is over | 14:06 |
fungi | i can test with the new one later today, but need to disappear right now to run a bunch of errands (early voting, grocery pickup, etc) | 14:07 |
fungi | i should be back no later than 16:00 utc | 14:07 |
Clark[m] | See you then. I've got school run and breakfast to contend with myself | 14:08 |
slittle | Please add me as first core for starlingx-app-gen-tool-core | 15:01 |
clarkb | slittle: I can do that in a few. Just need to settle in and load ssh keys first | 15:10 |
clarkb | slittle: looks like someone else beat me to it | 15:15 |
fungi | slittle: clarkb: i did that on friday and pinged you in the starlingx-build matrix thread jreed started | 15:22 |
fungi | er, pinged slittle in there that is | 15:22 |
clarkb | fungi: the updated mm3 change lgtm fwiw | 15:22 |
clarkb | working on local updates now, then I'll test the gerrit downgrade/upgrade and use that as an opportunity to upgrade by hand to get more comfortable with the process on the test node. If that goes well I'll send the November 17 upgrade announcement email | 15:24 |
fungi | cool, i'm putting together the revised etherpad upgrade change now | 15:26 |
fungi | and will set another autohold | 15:26 |
clarkb | fungi: we should be able to start cleaning up old lists.o.o soon too ya? | 15:27 |
clarkb | see my note about extracting the kernel before we snapshot the node (to give us the best chance of booting it later) | 15:27 |
fungi | clarkb: yes, also https://review.opendev.org/899304 is related if you haven't seen it yet | 15:27 |
clarkb | I had reviewed that one. Looks like it got rebased. I'll rereview | 15:27 |
fungi | aha. yep i guess that was a nontrivial rebase | 15:28 |
fungi | er, wasn't a rebase. i had to fix that change | 15:29 |
clarkb | yup I see it now. We need to keep the host vars around for the test node passwords. Slight nit: now that we aren't putting exim configs in that file we can move it back into templates and treat it the same way as the other files | 15:30 |
fungi | oh, good point! | 15:30 |
fungi | i'll adjust it in a few | 15:30 |
clarkb | it got moved into files/ out of templates/ because exim configs have a lot of {{ }} type characters that jinja didn't like and I didn't want to have to escape the whole thing | 15:30 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade Etherpad to 1.9.4 https://review.opendev.org/c/opendev/system-config/+/896454 | 15:38 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM force etherpad failure to hold node https://review.opendev.org/c/opendev/system-config/+/840972 | 15:39 |
fungi | clarkb: should it stay host-specific when moving into templates? i.e. playbooks/zuul/templates/host_vars/lists99.opendev.org.yaml.j2 vs playbooks/zuul/templates/group_vars/mailman3.yaml.j2 | 15:54 |
fungi | seeing as how inventory/service/host_vars/lists01.opendev.org.yaml moved to inventory/service/group_vars/mailman3.yaml i guess we want the latter for lists99 as well? | 15:56 |
clarkb | fungi: in this case the secrets are not in system-config for prod. I'm not sure if they went into a hostvars file or a group vars file in the secrets location | 15:56 |
clarkb | but probably best to align with that? | 15:56 |
fungi | oh, good point. on bridge we have a host_vars/lists01.opendev.org.yaml and no group_vars/mailman3.yaml | 15:57 |
fungi | but i can also move that. is there any reason not to? | 15:58 |
fungi | if we build a replacement mm3 server in the future (however unlikely), i doubt we'll want to generate all new credentials | 15:58 |
fungi | or if we do, it'll be deliberate (e.g. in response to a compromise) | 15:59 |
clarkb | ya I think moving them all to secret group contents is fine | 16:01 |
clarkb | ok I've tested the gerrit downgrade then reupgraded that node by hand to 3.8 | 16:36 |
clarkb | https://217.182.143.183/ this is the node. I've updated my etherpad with notes based on this. I did find that the plugin manager exception on startup seems to have changed. I collected some info and sent that to upstream's discord server just now to see if that is debuggable further | 16:37 |
clarkb | however, it wasn't working before (and honestly we should consider dropping that plugin entirely) so I don't think this is an issue | 16:37 |
opendevreview | Clark Boylan proposed opendev/system-config master: Upgrade Gerrit to Gerrit 3.8 https://review.opendev.org/c/opendev/system-config/+/899609 | 16:41 |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/898989/ is the other next step in the gerrit upgrade process. We'll want to land that and then restart gerrit at some point. Testing notes are in the comments of that change | 16:44 |
clarkb | I think we should be safe to land that whenever and then we just need to restart gerrit quickly within the next few days | 16:45 |
clarkb | infra-root ^ if you can ack that you're comfortable with the upgrade continuing on the 17th despite the plugin manager thing please do so then I'll send the announcement | 16:47 |
fungi | clarkb: i'm okay with moving forward on the existing upgrade plan, yes (saw the question in the matrix channel, hopefully someone will pipe up) | 16:56 |
fungi | i suppose the only risk is if the plugin-manager plugin has become mandatory, but that doesn't appear to be the case | 16:59 |
fungi | #status log Moved private host vars for lists01.opendev.org to a mailman3 group vars file on bridge01 | 17:04 |
opendevstatus | fungi: finished logging | 17:04 |
fungi | clarkb: do you happen to know why there's an untracked group_vars/adns-primary.yaml on bridge? seems like it may be related to https://review.opendev.org/876936 (last modified date is the same day that change merged, fwiw) | 17:06 |
fungi | maybe ianw just missed committing it? | 17:07 |
clarkb | that would be my guess | 17:07 |
fungi | happy to commit it, just want to make sure it's needed | 17:07 |
clarkb | yup I think we should. based on that change it seems we do use that group now | 17:08 |
clarkb | I'm going to send that email announcement now | 17:08 |
clarkb | for gerrit 3.8 | 17:08 |
fungi | looks like there was a commit on the same day which deleted the group_vars/adns.yaml file on bridge, so i suspect it was a missed git add (or maybe used mv instead of git mv?) | 17:09 |
fungi | i'm satisfied it's supposed to be tracked, so committing it now | 17:09 |
clarkb | ++ | 17:09 |
fungi | fixed | 17:10 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Merge production and test node mailman configs https://review.opendev.org/c/opendev/system-config/+/899304 | 17:14 |
clarkb | I'm going to remove the LE failures item from our meeting agenda. I think this job is stable again (perhaps ansible 8 made it happier) | 17:16 |
fungi | seems likely | 17:17 |
fungi | clarkb: looks like we may need to adjust our etherpad logging for 1.9.4? | 17:19 |
clarkb | fungi: we weren't doing anything special for etherpad. Did the tests show we don't get logs anymore? | 17:19 |
fungi | new node-log4j is choking on our /opt/etherpad-lite/src/node_modules/log4js/lib/configuration.js | 17:19 |
clarkb | hrm I didn't think we edited that at all | 17:20 |
fungi | er, node-log4js i mean | 17:20 |
clarkb | fungi: oh it's in our settings.json, looks like | 15:20 |
clarkb | so maybe we need to diff that against the example settings.json in 1.9.4 and reconcile any differences? | 17:20 |
fungi | quite possible | 17:21 |
fungi | i'll take a look | 17:21 |
fungi | seems like we have both docker/etherpad/settings.json.docker and playbooks/roles/etherpad/templates/settings.json.j2 to check over | 17:21 |
fungi | both appear to be identical? | 17:21 |
clarkb | https://github.com/ether/etherpad-lite/commit/aec619cc0bd043e8a921b599d1deecd2ef09b898 fwiw I think what we are doing is no longer a thing in etherpad | 17:21 |
clarkb | fungi: ya they sync them according to the git logs | 17:21 |
fungi | er, not identical never mind. my shell fu is weak today | 17:22 |
clarkb | so ya I suspect we need to remove that section and add any log settings like the log level entry then cross check that we still output logs where we expect them and adjust if not | 17:22 |
fungi | looks like there were some changes to the dockerfile too. i'll get both updated to match, similar to what you did for the 1.9.1 upgrade | 17:24 |
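As a quick cross-check for the reconciliation discussed above, something like the following could flag whether the settings files still carry the old log4js section. This is only a sketch: it assumes the section in question is the legacy "logconfig" block and that the remaining knob is "loglevel" (verify both against upstream's settings.json.template), and it uses a naive string search on purpose because the files are JSON-with-comments plus Jinja2, not strict JSON.

```python
# Hedged sketch: naively flag a leftover legacy log4js "logconfig" block in the
# two settings files discussed above (key names are assumptions; confirm them
# against upstream's settings.json.template). Run from a system-config checkout.
from pathlib import Path

for path in ("docker/etherpad/settings.json.docker",
             "playbooks/roles/etherpad/templates/settings.json.j2"):
    text = Path(path).read_text()
    has_logconfig = '"logconfig"' in text
    has_loglevel = '"loglevel"' in text
    print(f"{path}: logconfig={'present' if has_logconfig else 'absent'}, "
          f"loglevel={'present' if has_loglevel else 'absent'}")
```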
clarkb | reminder to update the meeting agenda if you have items to add. I did a first pass update already. I'm going to pop out for a bike ride in a bit but will get that sent out when I return | 19:01 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Introduce LogJuicer roles https://review.opendev.org/c/zuul/zuul-jobs/+/899212 | 19:17 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Merge production and test node mailman configs https://review.opendev.org/c/opendev/system-config/+/899304 | 19:26 |
opendevreview | Tristan Cacqueray proposed zuul/zuul-jobs master: Introduce LogJuicer roles https://review.opendev.org/c/zuul/zuul-jobs/+/899212 | 19:37 |
opendevreview | Merged opendev/system-config master: Convert commentlinks to new no html system https://review.opendev.org/c/opendev/system-config/+/898989 | 20:18 |
clarkb | the commentlinks config update appears to have applied successfully. I'll put restarting gerrit on my todo list for tomorrow | 21:47 |
clarkb | fungi: the three mailman changes you've got look good to me. I think we can approve those whenever you are ready (doesn't have to be now; I think your evening has started already) | 21:49 |
fungi | yeah, we can approve the cleanup ones tomorrow, i'm on the fence about scheduling the upgrade change though it's probably going to be no more than a blip for delivery and webui | 22:03 |
clarkb | ya and even then probably the only way people notice is if the webui is gone long enough that someone tries to use it | 22:03 |
clarkb | fungi: maybe schedule it but say for thursday or friday? enough notice people can be aware but not so far in the future we're waiting long | 22:04 |
fungi | yeah | 22:04 |
clarkb | next on my todo list is sending that meeting agenda. Anything else to add? Last call | 22:05 |
fungi | i've got nothing | 22:05 |
clarkb | fungi: for etherpad you were going to update the config and dockerfile right? Just making sure I wasn't supposed to do that | 22:09 |
tonyb | Nothing for the agenda. | 22:09 |
fungi | correct, i just haven't done it yet. trying to confirm i know which files are copied from where | 22:09 |
clarkb | thanks! | 22:10 |
fungi | i'm trying to figure out the difference between docker/etherpad/settings.json.docker and playbooks/roles/etherpad/templates/settings.json.j2 | 22:12 |
fungi | and whether one or both need to be updated from upstream | 22:12 |
fungi | i guess docker/etherpad/settings.json.docker is what we bake into the images, but then we deploy playbooks/roles/etherpad/templates/settings.json.j2 to the server and map it over top the one in the container? | 22:12 |
fungi | oh, wait, upstream settings.json.template is what playbooks/roles/etherpad/templates/settings.json.j2 comes from i guess? | 22:14 |
clarkb | fungi: correct we appear to bind mount over what is in the container | 22:14 |
clarkb | then ya I think our settings.json template originates from the upstream template that predates docker | 22:14 |
clarkb | fungi: our settings.json is from the before times way way back when docker barely even existed :) | 22:16 |
fungi | maybe i should just look at what they changed in upstream's between 1.9.2 and 1.9.4 instead of trying to refresh what we have to 1:1 match the upstream minus our custom edits | 22:17 |
clarkb | ++ | 22:17 |
fungi | still, a nice future improvement would be to try to clean up as much divergence between them as we can | 22:17 |
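For the divergence clean-up fungi mentions, a rough way to eyeball how far the two copies have drifted is a plain unified diff. This is only a sketch, assuming it is run from the root of a system-config checkout; a human still has to interpret the Jinja2-only differences.

```python
# Rough sketch: print a unified diff of the baked-in docker settings file and
# the deployed Jinja2 template, to see how far the two copies have diverged.
# Run from the root of a system-config checkout.
import difflib
from pathlib import Path

baked = Path("docker/etherpad/settings.json.docker").read_text().splitlines()
deployed = Path("playbooks/roles/etherpad/templates/settings.json.j2").read_text().splitlines()

for line in difflib.unified_diff(
        baked,
        deployed,
        fromfile="docker/etherpad/settings.json.docker",
        tofile="playbooks/roles/etherpad/templates/settings.json.j2",
        lineterm=""):
    print(line)
```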
tonyb | I've started looking at the last few "easy" bionic nodes (before looking at storyboard/wiki/translate/cacti, which will need discussion about how we deal with them). I can't see any testinfra for those machines. Should I add some? | 22:36 |
clarkb | tonyb: the main reason those don't have testinfra is that they either 1) aren't managed at all (wiki) and are stuck in time, or 2) are the last remaining puppet nodes | 22:38 |
tonyb | Sorry the testinfra question was for the mirror nodes. | 22:38 |
clarkb | oh I misread | 22:38 |
clarkb | tonyb: there are testinfra tests for the mirrors in test_mirror.py | 22:39 |
clarkb | and test_meetpad.py for those | 22:39 |
clarkb | I think adding testinfra tests for any ansible-managed hosts is a good thing though, if we find any that lack them | 22:39 |
tonyb | https://opendev.org/opendev/system-config/src/branch/master/testinfra/test_mirror.py#L19-L20 Is there some aliasing between the mirror01.openafs.provider.opendev.org and (for example) mirror01.bhs1.ovh.opendev.org ? | 22:41 |
clarkb | tonyb: iirc it's just regular ansible group membership. So mirror01.openafs.provider.opendev.org and mirror01.bhs1.ovh.opendev.org are both in the mirror group, which matches mirror[0-9]*.opendev.org | 22:42 |
clarkb | tonyb: that causes the test node (mirror01.openafs.provider.opendev.org) to be deployed as a mirror and the production node to be deployed as a mirror. It's just a matter of what the hostname is in each context, but they end up running the same playbooks due to the group membership | 22:42 |
clarkb | in our CI systems we've tried to make the names obviously different from production, after previously making them look very production-like. Part of the reason for this is to avoid confusion when testing things like, say, the gerrit upgrade and accidentally stopping services on production. I don't think all services have been converted to the obviously different names yet, but that has been the more recent approach we've used | 22:43 |
clarkb | in many cases we do foo99.opendev.org instead of foo01.opendev.org. Then when you see 99 in the hostname after ssh'ing in you can be confident it isn't production | 22:44 |
tonyb | Okay. I think that's what's confusing me. We for sure used insecure-registry99 when I/we did that one, and then towards the end we switched to insecure-registry02 (in testinfra) | 22:46 |
tonyb | I thought everything needed to match "just so" so the testing would run. | 22:46 |
clarkb | in this case we could name the hosts mirror99.bhs1.ovh.opendev.org and mirror99.dfw.rax.opendev.org instead of the mirror01 and mirror02 entries | 22:46 |
clarkb | the group file is in inventory/service/groups.yaml and then in the playbooks we match on those groups | 22:47 |
clarkb | that's the best way to map what runs on certain hosts I think | 22:47 |
tonyb | Okay. I think I get it. | 22:47 |
tonyb | clarkb: did you finish zp01? | 22:48 |
clarkb | tonyb: yes looks like I did | 22:49 |
clarkb | tonyb: oh also the matching rules in the groups files are shell globs not regexes | 22:50 |
tonyb | Okay. I shall strikethrough the zp01 lines in the etherpad | 22:50 |
tonyb | re regexes vs globs, got it | 22:50 |
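To illustrate the glob semantics clarkb describes (purely an illustration of shell-glob matching, not the actual inventory plugin code), Python's fnmatch behaves the same way for the pattern quoted above:

```python
# Illustration only: shell-glob matching like the groups file uses. The real
# matching is done by the Ansible inventory configuration, not this code.
from fnmatch import fnmatch

pattern = "mirror[0-9]*.opendev.org"
for host in (
    "mirror01.openafs.provider.opendev.org",  # CI/test node
    "mirror01.bhs1.ovh.opendev.org",          # production node
    "mirror-update99.opendev.org",            # no digit right after "mirror", so no match
):
    print(f"{host}: {fnmatch(host, pattern)}")
```

Both the test and production mirror hostnames land in the same group, which is why they end up running the same playbooks.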
*** dhill is now known as Guest5198 | 23:01 |
ianw | fungi: thanks, yes I must have forgotten to git add :/ ... previously that key info was in adns.yaml which the hidden primary and public nameservers both had access to, but only the hidden bind server needed it, hence moving to adns-primary.yaml | 23:10 |
tonyb | So, looking over the playbooks in system-config and mulling over clarkb's comments above: we have a couple of jobs. system-config-run-mirror-x86 creates 2 debug/throwaway nodes, mirror01.openafs.provider.opendev.org (bionic) and mirror02.openafs.provider.opendev.org (focal), and by way of the parent job system-config-run-mirror-base it runs the playbooks/letsencrypt.yaml & playbooks/service-mirror.yaml playbooks to deploy the hosts. There are similar jobs for arm64 mirrors. The testinfra code runs against those hostnames, mirror{01,02}.openafs.provider.opendev.org. | 23:57 |
tonyb | So in terms of verifying that mirroring will work on jammy, we don't really need to test in each region; we just want a mirror03.openafs.provider.opendev.org (jammy) node created as part of the job and included in testinfra. Going forward, do we need to keep testing on bionic, focal and jammy? | 23:57 |
tonyb | To be complete I feel like I should also update the system-config-run-mirror-update job to deploy mirror-update99.opendev.org on jammy, but it does occur to me that this could possibly create a bunch of unwanted AFS replication traffic. | 23:57 |
tonyb | I also think that I should do most of the above for a jammy-arm64 node. | 23:57 |
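For reference, a hypothetical sketch of what including the proposed jammy node in testinfra could look like, modeled on the module-level testinfra_hosts convention used by system-config's testinfra/test_mirror.py. The mirror03 hostname does not exist yet, and the assertion shown is purely illustrative rather than what the real tests check.

```python
# Hypothetical sketch modeled on testinfra/test_mirror.py: a module-level
# testinfra_hosts list selects which job nodes the tests run against.
# mirror03 is the proposed jammy node discussed above, not an existing host.
testinfra_hosts = [
    "mirror01.openafs.provider.opendev.org",  # existing bionic test node
    "mirror02.openafs.provider.opendev.org",  # existing focal test node
    "mirror03.openafs.provider.opendev.org",  # proposed jammy test node (hypothetical)
]


def test_mirror_listening(host):
    # Illustrative check only; the real tests exercise the actual mirror
    # vhosts and proxy endpoints.
    assert host.socket("tcp://443").is_listening
```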