ianw | i guess we don't have --upgrade in our pip call, so bridge hasn't pulled in ansible <8 on the deployment | 00:06 |
---|---|---|
ianw | i wonder if it's a better idea to write out a requirements file, and run pip with --upgrade if the requirements file changes | 00:10 |
ianw | otherwise i feel like it's constantly hitting pypi | 00:10 |
BlaisePabon[m] | <ianw> "otherwise i feel like it's..." <- Have you considered changing your piprc to point to a local cache? | 00:40 |
BlaisePabon[m] | Pulp project has a nice pypi (and other) server. | 00:40 |
ianw | BlaisePabon[m]: that seems to just move the problem from how to update the production install to how to update the cache :) | 00:53 |
ianw | either way i think maybe not a requirements file, but a stamp file from a template should be idempotent. if it changes, then add the --upgrade flag | 00:53 |
ianw | i shouldn't have started looking, because now ansible has moved to collections, the way we pick the versions doesn't make a lot of sense now | 02:39 |
opendevreview | Ian Wienand proposed opendev/system-config master: bootstrap-bridge: Add cautionary note on installation of Ansible from git https://review.opendev.org/c/opendev/system-config/+/866541 | 03:45 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] overhaul install ansible requirements for 2022 https://review.opendev.org/c/opendev/system-config/+/866542 | 03:45 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] overhaul install ansible requirements for 2022 https://review.opendev.org/c/opendev/system-config/+/866542 | 03:52 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] overhaul install ansible requirements for 2022 https://review.opendev.org/c/opendev/system-config/+/866542 | 03:58 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] overhaul install ansible requirements for 2022 https://review.opendev.org/c/opendev/system-config/+/866542 | 04:10 |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] overhaul install ansible requirements for 2022 https://review.opendev.org/c/opendev/system-config/+/866542 | 04:16 |
*** yadnesh|away is now known as yadnesh | 04:30 | |
*** marios is now known as marios|ruck | 06:09 | |
ianw | i've received no email from gerrit today :/ | 06:22 |
ianw | looking now | 06:22 |
ianw | sigh ... SMTP error from remote mail server after RCPT TO:<my@email>: 550 zen.mimecast.org https://www.spamhaus.org/sbl/query/SBLCSS | 06:23 |
ianw | Delisting successful | 06:35 |
ianw | Nice work and congratulations! | 06:35 |
ianw | from the logs, @redhat.com, @hp.com seem the most affected, with a handful of other addresses showing the same thing | 06:38 |
ianw | #status log delisted review.opendev.org from Spamhaus blocklist, several corporate domains were rejecting Gerrit mail | 06:39 |
opendevstatus | ianw: finished logging | 06:39 |
*** ysandeep__ is now known as ysandeep|afk | 07:49 | |
*** yadnesh is now known as yadnesh|afk | 08:01 | |
opendevreview | Ian Wienand proposed opendev/system-config master: install-ansible: overhaul install ansible requirements https://review.opendev.org/c/opendev/system-config/+/866542 | 08:20 |
ianw | clarkb / fungi : ^ that one will actually get ansible 7 deployed to bridge. it's a bit long, but a lot of code-removal and i think a good simplification for the current era | 08:21 |
*** jpena|off is now known as jpena | 08:42 | |
*** ysandeep|afk is now known as ysandeep | 09:10 | |
*** benj_79 is now known as benj_7 | 09:15 | |
*** yadnesh|afk is now known as yadnesh | 09:24 | |
*** ysandeep is now known as ysandeep|brb | 10:49 | |
*** ysandeep|brb is now known as ysandeep | 10:59 | |
*** dviroel|afk is now known as dviroel | 11:05 | |
*** rlandy|out is now known as rlandy|rover | 11:10 | |
*** pojadhav is now known as pojadhav|brb | 11:31 | |
*** yadnesh is now known as yadnesh|afk | 11:45 | |
*** pojadhav|brb is now known as pojadhav | 12:23 | |
*** yadnesh|afk is now known as yadnesh | 12:26 | |
*** frenzy_friday is now known as frenzy_friday|food | 12:40 | |
*** arxcruz is now known as arx|2023 | 13:08 | |
*** dasm|off is now known as dasm | 13:38 | |
*** yadnesh is now known as yadnesh|away | 14:21 | |
*** pojadhav is now known as pojadhav|afk | 14:38 | |
*** ysandeep is now known as ysandeep|dinner | 14:52 | |
fungi | shortly i'll lower the ttls on the address records for lists.opendev.org and lists.zuul-ci.org in order to facilitate faster updates during the 20:00 utc maintenance | 15:02 |
Clark[m] | fungi: did the new server get zuul-ci lists pre created or just OpenDev.org? | 15:08 |
Clark[m] | https://opendev.org/opendev/system-config/src/branch/master/inventory/service/host_vars/lists01.opendev.org.yaml#L214 I think means only OpenDev.org stuff is prepped | 15:09 |
Clark[m] | (also looking like a slow start to the day for me here but will try to be helpful) | 15:10 |
Clark[m] | I suppose you could probably do the migration then uncomment those lines too? Just wanted to call that out as there is time to modify if necessary | 15:16 |
corvus | i have cleared the moderation queues on the zuul lists | 15:33 |
Tengu | hello there! I guess it's therefore not the right time to request reviews and, if possible, merge on my changes here? https://review.opendev.org/c/opendev/system-config/+/866175 + https://review.opendev.org/c/openstack/project-config/+/866475 (beware that last one, it has already bitten us last week) | 15:35 |
Tengu | (seeing the "maintenance" mentioned earlier.) | 15:36 |
fungi | Clark[m]: oh, good call. i'll push up the patch to uncomment those in a moment | 15:38 |
fungi | Tengu: oh, it's a fairly light maintenance and not starting for another 4+ hours anyway, but details are at https://lists.opendev.org/pipermail/service-announce/2022-December/000049.html | 15:39 |
*** pojadhav|afk is now known as pojadhav | 15:40 | |
Tengu | "light" :9 | 15:41 |
Tengu | still - if those patches could be nudged then... :) | 15:42 |
*** dviroel is now known as dviroel|lunch | 16:00 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Create lists.zuul-ci.org on the Mailman v3 server https://review.opendev.org/c/opendev/system-config/+/866599 | 16:07 |
fungi | corvus: Clark[m]: ^ | 16:07 |
*** ysandeep|dinner is now known as ysandeep | 16:13 | |
*** ysandeep is now known as ysandeep|out | 16:15 | |
*** frenzy_friday|food is now known as frenzy_friday | 16:24 | |
*** marios|ruck is now known as marios|out | 16:35 | |
clarkb | fungi: +2, you may want to go ahead and approve that if trying to get it in before the maintenance window as that modifies the inventory so will take a while to apply iirc | 16:37 |
fungi | yep, i'm working on the dns changes now | 16:38 |
fungi | i'll approve it | 16:38 |
*** dviroel|lunch is now known as dviroel | 16:51 | |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Temporarily lower the address TTLs for lists https://review.opendev.org/c/opendev/zone-opendev.org/+/866604 | 17:06 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Temporarily CNAME lists to review for deferral https://review.opendev.org/c/opendev/zone-opendev.org/+/866605 | 17:06 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Switch lists to resolve to the new Mailman server https://review.opendev.org/c/opendev/zone-opendev.org/+/866606 | 17:06 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Restore the default TTL to lists https://review.opendev.org/c/opendev/zone-opendev.org/+/866607 | 17:06 |
fungi | infra-root: that's ^ the maintenance dns change stack for the opendev.org zone. the first should be approved pretty much asap, i'll wip the others and make a similar series for zuul-ci.org | 17:07 |
opendevreview | Jeremy Stanley proposed opendev/zone-zuul-ci.org master: Temporarily lower the address TTLs for lists https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866608 | 17:11 |
corvus | fungi: where's the etherpad plan? | 17:13 |
fungi | https://etherpad.opendev.org/p/mm3migration | 17:14 |
fungi | if you're concerned about the temporary cname to review02, i have tested that attempting deliveries to it queues up in my mta's deferrals | 17:15 |
fungi | but i'm open to other similarly simple solutions | 17:15 |
corvus | i am -- (i didn't remember cname in the original plan; was assuming just a new A record; i'm mentally going through the possibilities now) | 17:16 |
fungi | i can make it an address record for one of our servers instead if that sits better with you | 17:16 |
corvus | i think it's fine, just working through it :) | 17:17 |
fungi | oh, i see what you mean, we used a/aaaa records in other zones too rather than cname to the server | 17:17 |
fungi | and yes, i agree cname for mail delivery has traditionally been discouraged | 17:17 |
corvus | i'm thinking about both things really -- the temporary cname, and the permanent cname after the move. | 17:18 |
opendevreview | Merged opendev/system-config master: Create lists.zuul-ci.org on the Mailman v3 server https://review.opendev.org/c/opendev/system-config/+/866599 | 17:18 |
corvus | fungi: note that some MTAs will literally rewrite the addresses if there is a cname (taking the "canonical name" part literally) | 17:18 |
fungi | i'm perfectly happy to use a/aaaa in those changes, no sweat | 17:18 |
corvus | so that may be a reason to avoid that in the end-state. | 17:18 |
clarkb | I've approved the TTL lowering change | 17:19 |
clarkb | for lists.opendev.org specifically | 17:19 |
fungi | yeah the ttl changes are fine regardless of the remaining implementation details. 866608 is the equivalent one for zuul-ci.org btw | 17:19 |
clarkb | +2 on that one. Will let corvus approve when happy | 17:20 |
corvus | i don't think a temporary cname should be a big problem (aside from potentially having some weird messages if we happen to get some messages from mtas that perform that rewriting during that time). maybe an A or MX would avoid that? but it's pretty unlikely to be a problem. | 17:20 |
fungi | well, if we're going to end with a/aaaa then i may as well just stick with those throughout the series for consistency and ease of reasoning | 17:21 |
corvus | wfm and lets us shut off the "is this CNAME okay?" mental subroutine :) | 17:21 |
fungi | exactly | 17:22 |
opendevreview | Merged opendev/zone-opendev.org master: Temporarily lower the address TTLs for lists https://review.opendev.org/c/opendev/zone-opendev.org/+/866604 | 17:24 |
opendevreview | Merged opendev/zone-zuul-ci.org master: Temporarily lower the address TTLs for lists https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866608 | 17:27 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Temporarily point lists to review for deferral https://review.opendev.org/c/opendev/zone-opendev.org/+/866605 | 17:51 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Switch lists to resolve to the new Mailman server https://review.opendev.org/c/opendev/zone-opendev.org/+/866606 | 17:51 |
opendevreview | Jeremy Stanley proposed opendev/zone-opendev.org master: Restore the default TTL to lists https://review.opendev.org/c/opendev/zone-opendev.org/+/866607 | 17:51 |
opendevreview | Jeremy Stanley proposed opendev/zone-zuul-ci.org master: Temporarily point lists to review.o.o for deferral https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866613 | 17:51 |
opendevreview | Jeremy Stanley proposed opendev/zone-zuul-ci.org master: Switch lists to resolve to the new Mailman server https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866614 | 17:51 |
opendevreview | Jeremy Stanley proposed opendev/zone-zuul-ci.org master: Restore the default TTL to lists https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866615 | 17:51 |
fungi | infra-root: those ^ are the maintenance dns stacks for both domains, revised to use exclusively a/aaaa rrs. i'll wip all 6 for now until closer to time to merge | 17:52 |
*** dviroel is now known as dviroel|afk | 17:52 | |
*** jpena is now known as jpena|off | 17:58 | |
fungi | i've checked and cleared the moderation queues for the mailing lists i watch over, but will check them again closer to maintenance time just to be sure | 18:14 |
fungi | including all three service-* lists | 18:14 |
clarkb | fungi: thank you! | 18:14 |
fungi | infra-root: i've added links for the dns changes to the migration plan at the top of https://etherpad.opendev.org/p/mm3migration so we have them for easy reference later, and also added a draft status notice for step 6. i'll get working on steps 4 and 5 shortly, but not crossing off step 4 until i see the deploy jobs confirmed and double-check dns resolution myself | 18:37 |
fungi | er, not crossing off step 3 i mean | 18:37 |
clarkb | the notice text lgtm | 18:38 |
fungi | step 5 won't really take all that long for this migration, but saves us a few minutes during the outage | 18:38 |
fungi | for lists.openstack.org it's critical, since the initial rsync will take hours | 18:38 |
fungi | but this also gives us a good opportunity to refine the process for the benefit of the remaining migrations next month-ish | 18:39 |
clarkb | ++ | 18:40 |
fungi | i've created root screen sessions on lists.openstack.org and lists01.opendev.org for use in the preliminary steps and also during the maintenance | 18:44 |
fungi | infra-root: any recommendations for speedier dns updates during maintenance? should we temporarily stop the opendev-prod-hourly pipeline so it doesn't block deploy? | 19:15 |
fungi | i'm noticing those dns update changes took ~1.5 hours to deploy after they merged, even though the jobs themselves only ran for 5 minutes | 19:15 |
Clark[m] | Were they behind the inventory update for extra lists which takes forever? Otherwise I would expect 0-~40 minutes to run. | 19:17 |
Clark[m] | (depending on where the hourly jobs are) | 19:17 |
fungi | oh maybe. faster might still be nice though since it would shorten the maintenance window | 19:17 |
Clark[m] | One trick is to land the changes before the top of the hour so that they get the semaphore prior to the hourly jobs. We could also maybe land a change to temporarily disable the hourly jobs | 19:18 |
fungi | at the very least, let's try to avoid approving any system-config changes not related to today's maintenance until after we're done | 19:18 |
fungi | but yeah, i'm mainly concerned with merging the dns changes that start the maintenance outage, less so with the ones which end it | 19:19 |
Clark[m] | I think either try to time landing them for just before 20:00 UTC or land a change nowish that disables the hourly jobs | 19:20 |
Clark[m] | The disabling will be more reliable | 19:20 |
fungi | at least my resolver sees a 300-second ttl being served with the a/aaaa rrs for lists.opendev.org and lists.zuul-ci.org now so i'm crossing step #3 off as done | 19:21 |
fungi | by 19:42z we can expect that all reasonable clients are respecting the new ttl | 19:22 |
fungi | infra-root: we're 15 minutes out from the official start of maintenance, so i'm approving the two changes to temporarily switch the lists.opendev.org and lists.zuul-ci.org names to resolve to our gerrit server | 19:45 |
clarkb | sounds good | 19:46 |
fungi | the initial rsync of the sites to the new server is done, so we're ready to proceed at the top of the hour | 19:47 |
opendevreview | Merged opendev/zone-zuul-ci.org master: Temporarily point lists to review.o.o for deferral https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866613 | 19:48 |
fungi | i've also sketched out a few of the more important commands for the upcoming steps in the maintenance plan at https://etherpad.opendev.org/p/mm3migration | 19:48 |
fungi | assuming our dns changes deploy as hoped | 19:49 |
fungi | looks like the remaining change is taking time to get a node assignment | 19:50 |
fungi | there it goes | 19:52 |
corvus | i'm around, along with my sandwich | 19:52 |
fungi | i'll be making pad prik king once this is done | 19:52 |
opendevreview | Merged opendev/zone-opendev.org master: Temporarily point lists to review for deferral https://review.opendev.org/c/opendev/zone-opendev.org/+/866605 | 19:53 |
fungi | but in the meantime i'm jealous of that sandwich | 19:53 |
clarkb | yall are making me hungry | 19:53 |
fungi | zuul-ci.org dns change already deployed, opendev.org change seems to be waiting behind event processing | 19:54 |
clarkb | it will enqueue ahead of the hourly jobs though | 19:55 |
fungi | and there it is | 19:55 |
clarkb | so should be fine for quick processing | 19:55 |
fungi | yeah, if it finishes at the predicted time, we can start importing at 5 after the hour | 19:55 |
fungi | (accounting for the ttl) | 19:56 |
clarkb | do we also shut down the daemons on lists.openstack.org? I guess that doesn't help because exim will accept the email either way, so we rely on dns | 19:56 |
fungi | yes | 19:56 |
fungi | exactly why we need to wait for dns | 19:56 |
fungi | we don't want inbound messages to end up in the exim queue on the old server | 19:57 |
fungi | i'll send the status notice at the top of the hour to make sure it has time to circulate | 19:57 |
ianw | o/ ... all seems to be going well! :) | 19:58 |
fungi | #status notice The lists.opendev.org and lists.zuul-ci.org sites will be offline briefly for migration to a new server | 20:00 |
opendevstatus | fungi: sending notice | 20:00 |
-opendevstatus- NOTICE: The lists.opendev.org and lists.zuul-ci.org sites will be offline briefly for migration to a new server | 20:00 | |
fungi | 866605 deployed at roughly 19:59 so we can proceed at 20:04 with next steps | 20:01 |
fungi | ianw: yes, we're at step #9 on https://etherpad.opendev.org/p/mm3migration | 20:02 |
fungi | but we've really not reached any of the hairy parts yet | 20:03 |
opendevstatus | fungi: finished sending notice | 20:03 |
fungi | okay, we should be safe to stop the services for those two sites on the old server now | 20:05 |
ianw | good to know the "noticeboard push pin" works @ https://fosstodon.org/@opendevinfra/109462853268874327 :) | 20:05 |
fungi | very nice! | 20:05 |
fungi | okay, systemd reports mailman-opendev and mailman-zuul are definitely stopped | 20:07 |
fungi | final rsync to the new server is done | 20:08 |
fungi | okay, site copies are moved into the import directory on the new server and recursively chowned to the uid/gid expected by the containers | 20:10 |
fungi | now comes the exciting part: the import process | 20:10 |
fungi | very minor catch. i had my trailing slashes wrong in rsync. need to move the resulting directories a little. i'll correct in the pad | 20:11 |
fungi | and done. trying the import again | 20:13 |
fungi | it's running now | 20:13 |
fungi | side note, need to decide how to permanently stop those services on the old server after the migration. i guess a change to remove their sites will suffice | 20:17 |
clarkb | fungi: oh good point. And ya I think you can just remove the content from ansible and then disable the services on the host | 20:18 |
clarkb | ansible should leave it alone at that point | 20:18 |
fungi | i tacked it onto the end of the maintenance plan so we don't forget | 20:19 |
fungi | opendev's done, no reported errors that i can see. took roughly 6 minutes. starting zuul... | 20:20 |
fungi | technically i can do #13 while this is running, so i'll work on that now | 20:22 |
fungi | test message sent to the incident list and i've received my copy. other subscribers can double-check | 20:25 |
fungi | also the zuul import finished in roughly 3 minutes | 20:25 |
clarkb | fungi: I got the test message | 20:25 |
fungi | i'll check that the sites look right with overridden dns resolution | 20:25 |
clarkb | fungi: the manual injection is so that you're sure exim and mm3 on the new server processed it? | 20:26 |
fungi | correct | 20:26 |
fungi | just want to make sure it's accepting list mail and distributing it before we point the world at it | 20:26 |
corvus | msg lgtm | 20:27 |
fungi | and the fact that subscribers received it means the subscriber list import worked | 20:27 |
fungi | with dns resolution overridden locally, browsing https://lists.opendev.org/ and https://lists.zuul-ci.org/ seems to work and i can see archive contents, like https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/HLQ2NDG5BKNM5L7VWAQTWSROW2L6HJQM/ (the maintenance announcement) | 20:30 |
fungi | https://lists.zuul-ci.org/archives/list/zuul-discuss@lists.zuul-ci.org/thread/S7AMZ4NXZCVT36NHRJRITPPNCL2L4L23/ too | 20:30 |
fungi | i think this checks out, so we should be able to update dns | 20:30 |
clarkb | fungi: what about the old archive redirects? (easy to fix if they don't work) | 20:30 |
fungi | yeah, we can worry about that after | 20:31 |
clarkb | I think you likely can proceed regardless of old archive links | 20:31 |
clarkb | ++ | 20:31 |
fungi | approved 866606 and 866614 | 20:31 |
fungi | there's another step in the plan for more thorough testing after dns updates, i just wanted a cursory check to decide whether we needed to roll back | 20:32 |
fungi | simpler to do before people are able to start sending mail to the new server | 20:33 |
opendevreview | Merged opendev/zone-opendev.org master: Switch lists to resolve to the new Mailman server https://review.opendev.org/c/opendev/zone-opendev.org/+/866606 | 20:33 |
clarkb | ++ | 20:33 |
opendevreview | Merged opendev/zone-zuul-ci.org master: Switch lists to resolve to the new Mailman server https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/866614 | 20:34 |
fungi | 5 minutes after those deploy, we can move on to step 16 | 20:34 |
fungi | actually, there was an opendev-prod-hourly buildset underway (done now) but it didn't block deploy for these that i could tell | 20:36 |
fungi | or maybe it was just fortuitous timing combined with some browser lag | 20:36 |
clarkb | fungi: it finished | 20:36 |
clarkb | I think this was perfect timing :) | 20:36 |
fungi | indeed, perhaps so | 20:36 |
fungi | i love it when a plan comes together (too soon to cue the a-team theme music though) | 20:37 |
clarkb | checking email it looks like lists01 backups were sorted out too | 20:37 |
fungi | yes, ianw fixed that before i even found time to investigate, so thanks! | 20:37 |
clarkb | related to ^ borg 2.0 is going to release soon. It is not backward compatible. But I checked and we seem to pin to a 1.x release already so we should be fine | 20:37 |
fungi | apparently we had the wrong container name in the cronjobs | 20:38 |
clarkb | aha | 20:38 |
clarkb | thank you ianw for figuring that out | 20:38 |
fungi | while we're waiting, i'll optimistically start writing our conclusion e-mail as a reply to my original service-announce message, but hold off sending it until testing is done | 20:40 |
fungi | and that's readied to send once we're clear | 20:45 |
fungi | the second dns patch just finished, so at 20:50 utc we should be safe to test | 20:45 |
clarkb | I'm getting the new web ui without any overrides for lists.opendev.org. Looks happy after a quick exploration | 20:46 |
corvus | i have to wait a whole 10 seconds | 20:47 |
corvus | an interesting side effect is that some browsers will cache the initial redirect that gerrit performs | 20:48 |
clarkb | hrm maybe we want to use an IP that doesn't have a web server in the future then? A zuul merger? | 20:49 |
corvus | so even after dns updated, i needed to type in something other than "lists.opendev.org" in order to actually see the new site | 20:49 |
corvus | yeah maybe so | 20:49 |
fungi | agreed | 20:50 |
fungi | about not using gerrit for the parking | 20:50 |
fungi | i was able to go to lists.opendev.org fine in my browser | 20:51 |
fungi | but hopefully if someone went to the url and cached a redirect in that short time, a cache clear will fix it | 20:51 |
fungi | we can pick something better for the next time | 20:51 |
fungi | anyway, are things looking generally okay for everyone else? | 20:52 |
clarkb | fungi: I added another less urgent followup task to the todo list (double checking db backups now that we have content for lists) | 20:53 |
corvus | fungi: should i "sign up" ? (or "sign in?") | 20:53 |
clarkb | fungi: things look ok to me | 20:53 |
fungi | corvus: sign up | 20:53 |
clarkb | corvus: I believe you sign up with the existing email addr that old lists knew you as and it will associate your accounts via that once you confirm the account | 20:53 |
fungi | for some reason it seemed like an earlier v3 release precreated accounts, but the later ones we tested don't. they will, however, automatically associate your subscriber/moderator/owner roles to your account once you confirm it, as long as the address you use matches | 20:54 |
fungi | and it's server wide (not just site-wide), so once you have an account it's the same across all the sites on that server | 20:55 |
fungi | i mentioned it in the maintenance announcement, but i'm keeping it reiterated in the follow-up i'll send once we're cool with this | 20:55 |
clarkb | note I haven't signed up yet. I should do that I guess | 20:57 |
fungi | i signed up on lists.zuul-ci.org and then logged into lists.opendev.org with the same credentials just to test, and it worked | 20:59 |
fungi | this will become less confusing when we integrate the sso we're working on | 20:59 |
clarkb | seems to work for me too | 21:00 |
fungi | but it seems to work as designed | 21:00 |
corvus | this all lgtm | 21:00 |
clarkb | it shows me the lists I own/moderate | 21:00 |
fungi | obviously we're going to spend a while fiddling with new options in the list configs, but sounds like we're probably good to call it migrated? | 21:00 |
clarkb | yes, I think you can send that email now | 21:00 |
corvus | i did a password reset for lists.zuul-ci.org, and the page header said lists.opendev.org. that's a pretty minor thing i think we can probably ignore. | 21:01 |
corvus | ++ sending email | 21:01 |
fungi | sent | 21:02 |
fungi | and i seem to have received my copy as a subscriber | 21:03 |
clarkb | I've received it and it even filtered as expected so the migration didn't break my rules (based on list id iirc) | 21:03 |
fungi | i'm going to set this aside for a bit and cook dinner, then look at writing a change to permanently disable those two sites on the old server so we don't accidentally wind up having them restart and send stale digests or anything | 21:04 |
clarkb | sounds good. Thank you for all the help building out the config management for this and testing it and building a migration plan and now migrating two sites. | 21:04 |
fungi | i figure the dns ttl cleanup can wait for a day until we're sure we don't need to make any urgent dns updates to these | 21:04 |
corvus | fungi: it looks great, thanks for all the work! | 21:04 |
clarkb | fungi: ++ | 21:04 |
fungi | this is still sort of just the beginning, but glad it's gone smoothly! | 21:05 |
clarkb | I'm going to eat lunch now. One of the things I need to do later today is send a meeting agenda which will help exercise this further | 21:05 |
ianw | i've got the test email too | 21:05 |
corvus | i will be sure to "like" your posts :) | 21:06 |
corvus | oh interesting, i don't see the 'completed' post in the archives yet | 21:07 |
corvus | https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/latest | 21:07 |
corvus | are they cron, or instantaneous? | 21:08 |
ianw | i haven't received my signup mail; but also after yesterday and spamhaus etc, this could very well be a @redhat.com problem | 21:10 |
Clark[m] | corvus: They are cron jobs. We had trouble with this in testing iirc | 21:11 |
Clark[m] | There is a test step that forces a cron run to populate the (empty in testing) archives | 21:11 |
corvus | what's the interval? | 21:11 |
Clark[m] | That I don't remember; it was short enough that sometimes the testing worked and sometimes it didn't | 21:12 |
Clark[m] | They are adjustable too iirc | 21:12 |
corvus | ianw: i got signup emails instantaneously (literally zero delay) | 21:12 |
ianw | interesting, in the exim log on lists01, there's @redhat.com mail waiting | 21:12 |
ianw | SMTP error from remote mail server after RCPT TO:<address@redhat.com>: 451 Internal resource temporarily unavailable - | 21:13 |
ianw | https://community.mimecast.com/docs/DOC-1369#451 [-x9V7H_7N8iFec1UDFFY6Q.us182] | 21:13 |
corvus | ianw: probably greylisting the new ip | 21:13 |
ianw | "451 Unable to process connection at this time The Mimecast server is under maximum load." | 21:13 |
ianw | yeah, that seems likely | 21:13 |
corvus | ianw: (also, next line under that mentions greylisting specifically) | 21:13 |
ianw | heh, yeah | 21:14 |
corvus | actually "451 Internal resource temporarily unavailable" from the exim log is the greylisting form of that msg | 21:14 |
corvus | so i think that's greylisting confirmed | 21:14 |
corvus | (they seem to use 451 for a lot of stuff and then disambiguate based on the accompanying message) | 21:15 |
ianw | yeah -- i guess not much we can do about that, possibly try to figure out the corporate it system and get it whitelisted | 21:15 |
ianw | i think it's likely retries will get mail through faster than that | 21:16 |
corvus | that will probably take longer than the typical 15 minutes spent greylisting | 21:16 |
corvus | presumably that should lead to an auto-whitelist so that messages don't get greylisted anymore... it would be good to track performance for a bit and make sure that happens | 21:17 |
corvus | this will "prime" remote systems and then hopefully the rest of the lists don't have to go through this in the later transition | 21:17 |
ianw | yep -- and i did get fungi's test mail in a timely fashion | 21:18 |
corvus | we might want to "chat" a bit on the incident or discuss lists over the next days | 21:18 |
corvus | i think we should exchange holiday recipes | 21:18 |
Clark[m] | fwiw I agree that I don't see the email in the archive. I think there are ~10 minute jobs, hourly jobs and daily jobs. Maybe this is hourly by default which is probably too infrequent? Definitely worth looking at more closely to make sure we understand it | 21:20 |
ianw | "451 Too many mentions of eggnog" | 21:20 |
ianw | it's different versions, but visually on https://lists.opendev.org/archives/ there's no left (or right) margin for me. compared to say https://lists.fedorahosted.org/archives/ | 21:25 |
ianw | everything loads correctly (i thought it may have been a missing css file) so it may be intended | 21:25 |
ianw | i think it's intentional ... https://gitlab.com/mailman/hyperkitty/-/merge_requests/398 | 21:31 |
ianw | the top < navigation button is right on the bit where my pixel 6 pro screen starts to curve around, making it hard to click. i think all-lists wants some margins -- but it's not our fault. i can probably file something | 21:32 |
Clark[m] | Ya we intentionally didn't try a custom theme | 21:33 |
clarkb | looks like one thing to check re archiving is that it is enabled for the list in the list settings (I would've expected the migration to do this but we should double check it) | 21:45 |
clarkb | in theory hyperkitty is functional though otherwise the migration wouldn't have been able to populate the archives | 21:47 |
clarkb | `docker exec mailman-web ./manage.py runjobs hourly` is what the test job runs to force things to run early and ensure the (empty) archives are present | 21:48 |
clarkb | confusingly I think hyperkitty stuff operates in the -core container, but -web controls it using an api key | 21:48 |
clarkb | the hourly runs should run in ten minutes so that will be a good clue too I guess | 21:51 |
clarkb | archived-at is empty in the email we got | 21:53 |
clarkb | fungi: the sign in page shows other login options too | 21:54 |
clarkb | fungi: I thought we had disabled that | 21:54 |
clarkb | oh wait that may be my fault it opened the upstream list in tab complete... | 21:54 |
clarkb | yup my fault we're good for logins | 21:55 |
clarkb | ok archiving should be enabled for service-discuss according to the configuration page for that list | 21:56 |
clarkb | oh but you sent it to service-announce | 21:56 |
fungi | right | 21:56 |
fungi | because... announcement | 21:56 |
clarkb | ya sorry, but it too has the flag set in settings | 21:56 |
clarkb | that's good, it implies the migration process handled that for us | 21:57 |
* fungi is finished making, eating, and cleaning up from dinner now | 21:57 | |
clarkb | hourly crons should run momentarily and we can recheck and then take it from there I guess | 21:59 |
corvus | f5 f5 f5 f5 | 22:00 |
fungi | i'm just about done with monday evening chores that have to get done before it's dark out, so can help look in a moment | 22:00 |
*** dviroel|afk is now known as dviroel | 22:01 | |
clarkb | log says it ran | 22:01 |
clarkb | but I don't see it so that isn't it I guess | 22:01 |
corvus | the mbox file in /var/lib/mailman/web-data/mm2archives/lists.opendev.org/private/service-announce.mbox does not have the new msg | 22:03 |
corvus | (but that says "mm2" no idea if that's still relevant for hyperkitty?) | 22:04 |
clarkb | /var/lib/mailman/core/var/logs/mailman.log has hyperkitty errors | 22:05 |
clarkb | https://paste.opendev.org/show/borF78wGCKTOxv6wGy2N/ | 22:06 |
corvus | so the message failed when being injected into the archives | 22:07 |
clarkb | and then it emits an html page that is fairly large with "This page either doesn't exist, or it moved somewhere else." | 22:07 |
clarkb | corvus: yes I think so. Something about the url it is trying to hit to perform that action? | 22:07 |
corvus | should it be using an http://127.0.0.1 url for that action? | 22:08 |
clarkb | corvus: yes I believe so | 22:08 |
fungi | i wonder if we're missing some plumbing to the api on the default vhost | 22:09 |
clarkb | it listens on a special port | 22:09 |
clarkb | I didn't think the external facing web server was involved | 22:09 |
fungi | yeah, 8000/tcp looks like | 22:09 |
fungi | Page not found: This page either doesn't exist, or it moved somewhere else. | 22:10 |
fungi | that seems to be coming from postorius | 22:10 |
corvus | so it's not relying on hostnames for site routing? | 22:11 |
clarkb | ya but is that because the other end sent it a 404, or because it sent a 404? I'm having a hard time processing what that log is trying to tell us | 22:11 |
corvus | my fuzzy read of that is something like "it thought it was posting to a hyperkitty url and got a 404 back from postorius" | 22:12 |
clarkb | corvus: oh hrm | 22:12 |
clarkb | oh you know what | 22:13 |
clarkb | fungi: the urls changed right? | 22:13 |
clarkb | it was /hyperkitty but now is /archives | 22:13 |
clarkb | I suspect ^ is what broke it | 22:13 |
clarkb | we might want to flip that back and test? | 22:14 |
clarkb | alternatively we edit our docker-compose var to use a different url for hyperkitty | 22:14 |
clarkb | ? | 22:14 |
clarkb | HYPERKITTY_URL=http://127.0.0.1:8000/hyperkitty <- maybe that needed to be aligned with your routes update that we did to address the django update | 22:14 |
clarkb | so ya I think I would try that first, switch it to /archives or whatever the updated url was | 22:15 |
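The suspected failure boils down to the path in HYPERKITTY_URL no longer matching the prefix HyperKitty is routed under in urls.py. A minimal Python check of that alignment, assuming the old `/hyperkitty` and new `/archives` prefixes discussed here; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def archiver_url_matches_route(hyperkitty_url: str, mounted_prefix: str) -> bool:
    """Return True if HYPERKITTY_URL points under the prefix HyperKitty is routed at.

    mounted_prefix is the path prefix from the urls.py route, e.g. "archives/"
    after the route change, or "hyperkitty/" before it.
    """
    url_path = urlsplit(hyperkitty_url).path.strip("/")
    prefix = mounted_prefix.strip("/")
    # Match the prefix exactly or as a leading path segment, not as a substring.
    return url_path == prefix or url_path.startswith(prefix + "/")


if __name__ == "__main__":
    # Old internal URL against the new route: this mismatch is what would produce 404s.
    print(archiver_url_matches_route("http://127.0.0.1:8000/hyperkitty", "archives/"))
```

Comparing whole path segments avoids a false match such as `/archivesold` against `archives/`.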
corvus | clarkb: where is that setting? | 22:15 |
corvus | found it | 22:16 |
clarkb | corvus: /etc/mailman-compose/docker-compose.yaml | 22:16 |
clarkb | is the on-host file | 22:16 |
corvus | (docker-compose) | 22:16 |
clarkb | https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mailman3/templates/docker-compose.yaml.j2 ansible side for that file and https://opendev.org/opendev/system-config/src/branch/master/docker/mailman/web/mailman-web/urls.py#L29 is where things got updated | 22:17 |
clarkb | previously that was 'hyperkitty/' I think | 22:17 |
clarkb | so ya thats my first hunch | 22:17 |
clarkb | https://github.com/maxking/docker-mailman/blob/main/web/mailman-web/urls.py#L29 is what it was set to previously | 22:18 |
fungi | d'oh, yep upstream mailman changed that years ago but the docker images never did. I aligned the templates with upstream but it's possible this was missed | 22:18 |
clarkb | by the way the /var/lib/mailman/web-data/logs/mailmanweb.log has logs showing the urls are 404'ing there in the webserver | 22:19 |
fungi | it's possible we need to adjust something else in the images | 22:19 |
clarkb | fungi: in this case we may need to only update docker-compose to match | 22:19 |
fungi | maybe a git grep for the old url path in the docker repo will turn up something? | 22:19 |
fungi | oh, even better | 22:20 |
clarkb | because email comes in via exim, exim hands it to mailman-core, mailman-core hits the web container's :8000 hyperkitty api, and then it archives; that's the flow I think | 22:20 |
fungi | ooh | 22:20 |
clarkb | and that value is configurable via the image already | 22:20 |
corvus | i agree with clarkb | 22:20 |
clarkb | so maybe manually update the docker-compose file and down then up things | 22:20 |
clarkb | (someone else should do that as I'm not in a great spot to do so myself) | 22:20 |
corvus | fungi: can you, or should i? | 22:21 |
fungi | corvus: if you're already right there, please feel free | 22:22 |
corvus | can do | 22:22 |
fungi | i'm still digesting the specifics | 22:22 |
fungi | another e-mail to service incident should help us confirm it's fixed | 22:22 |
corvus | done, i'll send an email | 22:23 |
clarkb | does incident archive? | 22:24 |
corvus | i'm re-logging in with the addr i'm subscribed to that list to check first before i email | 22:24 |
clarkb | ++ | 22:24 |
corvus | okay, i see the (private) archives for that list; last msg is from october, as expected (ie, not fungi's msg from today) | 22:25 |
fungi | sounds right | 22:25 |
corvus | email sent | 22:26 |
corvus | msg appears in web archive | 22:26 |
fungi | perfect! | 22:26 |
clarkb | it has an archived-at header too | 22:26 |
clarkb | excellent | 22:26 |
fungi | corvus: are you also in a position to push a change to gerrit reflecting that adjustment? | 22:26 |
corvus | on it | 22:26 |
fungi | even better. i'll start working on the change to remove the config for the lists on the old server so the initscripts won't accidentally restart things in the future | 22:27 |
opendevreview | James E. Blair proposed opendev/system-config master: Update internal hyperkitty URL https://review.opendev.org/c/opendev/system-config/+/866629 | 22:27 |
corvus | fungi: clarkb ^ | 22:27 |
clarkb | +2 | 22:28 |
fungi | thanks again! | 22:30 |
clarkb | fwiw looking in the mailmanweb log is what made me think of that fix. But only because I remembered updating the urls for web | 22:36 |
clarkb | once I understood the processing flow it made a lot of sense. Speaking of which, writing up docs of some sort is probably a good next step too? | 22:36 |
clarkb | We can document some commands like force running cron jobs and also describe the high level flow that I wrote down above to aid debugging | 22:36 |
clarkb | the upstream docs are pretty terse when it comes to this stuff, but their mailing list is pretty responsive | 22:37 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Remove opendev and zuul sites from old mm2 server https://review.opendev.org/c/opendev/system-config/+/866630 | 22:43 |
fungi | i think that's ^ the necessary config cleanup | 22:44 |
*** dasm is now known as dasm|off | 22:51 | |
corvus | fungi: +2s on that but i did not +w | 22:55 |
fungi | thanks! | 22:55 |
clarkb | completely unrelated: anyone know why the order of pipelines on the zuul status page changed? | 23:00 |
clarkb | gate is no longer in the center | 23:00 |
corvus | clarkb: https://review.opendev.org/c/openstack/project-config/+/859977 | 23:04 |
clarkb | aha | 23:05 |
corvus | (they're listed in definition order) | 23:05 |
corvus | clarkb: fungi gtema also i'm not 100% sure that trigger config is what you want. you may want 'approval' instead of 'require-approval'. https://zuul-ci.org/docs/zuul/latest/drivers/gerrit.html#attr-pipeline.trigger.%3Cgerrit%20source%3E.approval | 23:08 |
clarkb | hrm ya I suspect so | 23:09 |
corvus | fungi: 866630 requires an update to testinfra (we have tests asserting the service-discuss list) | 23:10 |
ianw | sigh, was going to file something about the zero margins on smaller width layouts, gitlab seemed to kick me out when i logged in on my work vm and now i can't seem to get back in at all :/ | 23:13 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Remove opendev and zuul sites from old mm2 server https://review.opendev.org/c/opendev/system-config/+/866630 | 23:13 |
fungi | corvus: thanks! just noticed that myself and fix is there ^ | 23:13 |
clarkb | ianw: on the ansible installation change is there a reason we can't just say version: '<8' state: latest? | 23:20 |
clarkb | I think that would have the desired effect? | 23:20 |
clarkb | I'm wondering if we over thought this problem and overlooked a simple fix | 23:22 |
clarkb | personally, I'd like to avoid using a requirements file like that if we can. There is a lot of indirection going on to make that happen. We set ansible vars which template out a file which then gets installed by virtualenv. | 23:23 |
ianw | the other thing i realised is that we also need more complex openstacksdk dependencies expressed in the ansible venv too, pinning cinderclient | 23:25 |
clarkb | ianw: looks like the old code already had a state argument we could set. Is that worth trying? | 23:26 |
ianw | i see where you're coming from, but what was there wasn't very simple either. i kind of like that the requirements file gives us idempotence, because this runs all the time | 23:26 |
opendevreview | Merged opendev/system-config master: Update internal hyperkitty URL https://review.opendev.org/c/opendev/system-config/+/866629 | 23:27 |
clarkb | the requirements file isn't going to be more idempotent though | 23:27 |
clarkb | since we do a pip install --upgrade it should be the same as state: latest? they'll both update minor ansible releases until we update the cap | 23:28 |
ianw | it should be, as templating will only return changed if the file updates, so we'll only run pip when we actually change a requirement | 23:28 |
ianw | just have to think about the "latest" thing | 23:28 |
clarkb | oh I see, you're relying on the file changed result parameter | 23:28 |
ianw | the problem is when that is combined with paths installation i think | 23:28 |
clarkb | so that won't actually update minor ansible versions either | 23:29 |
clarkb | we'd have to make a noop edit to the file to trigger those | 23:29 |
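The trade-off can be sketched in Python. Gating pip on a change to the rendered file (modelled here with a content hash in a hypothetical stamp file, standing in for the template task's changed result) is idempotent, but for that very reason it never picks up a new 7.x release until the file itself is edited.

```python
import hashlib
import tempfile
from pathlib import Path


def needs_pip_run(rendered_requirements: str, stamp: Path) -> bool:
    """Mimic a template task's `changed` result: True only when the content differs.

    The stamp holds a SHA-256 of the last rendered requirements and is
    rewritten whenever a change is detected.
    """
    digest = hashlib.sha256(rendered_requirements.encode("utf-8")).hexdigest()
    if stamp.exists() and stamp.read_text() == digest:
        return False  # unchanged file: pip is skipped, even if PyPI has a newer 7.x
    stamp.write_text(digest)
    return True


if __name__ == "__main__":
    stamp = Path(tempfile.mkdtemp()) / "requirements.sha256"
    print(needs_pip_run("ansible<8\n", stamp))  # True: first render
    print(needs_pip_run("ansible<8\n", stamp))  # False: same spec, no pip run
    print(needs_pip_run("ansible<9\n", stamp))  # True: raising the cap triggers pip
```

Picking up minor releases under this scheme requires exactly the no-op edit described above, since nothing else changes the hash.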
ianw | hrm, true... | 23:31 |
ianw | the old idea that we could set our production ansible version to a github tag doesn't work in the world of collections | 23:32 |
clarkb | because the github stuff is like 10 repos now? | 23:32 |
ianw | a long time ago i feel like we've hit brown-bag ansible issues and had to switch to someones github fork for a little while | 23:32 |
ianw | but that won't work in production because it won't have the ansible collections installed | 23:33 |
clarkb | but we really only use github today for testing future ansible right? | 23:33 |
ianw | not even really; we install zuul's checkout (but yeah, via github) | 23:34 |
clarkb | ianw: looking at the change more closely and thinking about some of the goals above I half wonder if a lockfile/constraints is really what you are hoping to express. That gives you idempotency and when you want to update you edit and you get a new version | 23:37 |
clarkb | ianw: maybe we can express that without extra files? except I think both constraints and the lockfile only work as file inputs ? | 23:37 |
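A rough Python sketch of the constraints idea, with placeholder package versions and hypothetical helper names: exact pins live in a constraints file, and bumping a version is an explicit edit that changes that file.

```python
def render_constraints(pins: dict) -> str:
    """Render exact pins (name -> version) as pip constraints, sorted for stable diffs."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))


def pip_install_argv(constraints_path: str, *requirements: str) -> list:
    """Build a pip command whose requirements are limited by a constraints file.

    pip's -c/--constraint option takes a file argument, so the pins have to be
    written out before pip runs.
    """
    return ["pip", "install", "-c", constraints_path, *requirements]
```

Because the rendered text only changes when a pin changes, it pairs naturally with a changed-file check: every real version bump produces a diff, and nothing moves otherwise.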
*** rlandy|rover is now known as rlandy|out | 23:37 | |
clarkb | thinking out loud here. I don't actually know if that is helpful | 23:37 |
ianw | hrm, yeah i take your points | 23:38 |
ianw | I feel like much more removal than addition in the diffstat "95 insertions(+), 198 deletions(-)" means we're onto something cleaning this up a bit | 23:39 |
clarkb | ya I think the simplification is good. It just seems really odd that we need to write a file to pip install something when ansible has pip and virtualenv constructs. However, we break out to shell to do lots of git things in ansible because the git construct isn't very good, so maybe that is the case here too | 23:40 |
clarkb | infra-root I've just done some quick edits to the meeting agenda. Please add any other edits in the next half hour or so then I'll get that sent out | 23:40 |
clarkb | ianw: looks like install_ansible_ara_callback_plugins.stdout has been used in ansible.cfg.j2 but we didn't set the flag previously? | 23:50 |
clarkb | oh wait I see it now nevermind | 23:50 |
ianw | yeah, that was a bit confusing. i updated that comment to hopefully make it clearer | 23:51 |
ianw | i agree the idempotence is broken with the <8 version specifier | 23:52 |
ianw | i guess ultimately there's no way around that | 23:52 |
ianw | you either check if there's something more recent, or hard-code the version | 23:52 |
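The two options reduce to this Python sketch (made-up version lists, hypothetical function names, and only plain X.Y.Z strings handled): a `<8` cap has to consult the index on every run because its answer changes as releases appear, whereas a hard-coded version never does.

```python
from typing import Iterable, Optional


def _release_key(version: str) -> tuple:
    # Assumes plain dotted numeric releases such as "7.1.0" (no rc/dev/post tags).
    return tuple(int(part) for part in version.split("."))


def newest_below(available: Iterable[str], cap_major: int) -> Optional[str]:
    """What a '<cap_major' specifier with state: latest would resolve to right now."""
    candidates = [v for v in available if _release_key(v)[0] < cap_major]
    return max(candidates, key=_release_key, default=None)
```

For example, with releases `["6.7.0", "7.0.0", "7.1.0", "8.0.0"]` the cap resolves to `"7.1.0"`; once `"7.2.0"` is published the same spec resolves to `"7.2.0"`, while a hard-coded `7.1.0` stays put. Comparing integer tuples also orders `7.10.0` after `7.9.0`, which plain string comparison would not.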
clarkb | ya | 23:52 |
corvus | fungi: a punch-list item for mm: as a list owner, i just got some bounce messages (which i suspect may have been in response to something recently dislodged from a queue since the list in question didn't have a recent email). the "From:" header on the cover letter from mailman (ie, the message mailman sent to me as the list owner which then had the actual bounce message as an attachment) is from "changeme@example.com". so i suspect there may | 23:53 |
corvus | be a missing setting in there somewhere. | 23:53 |
clarkb | ianw: I've gone ahead and +2'd it as this does reduce the amount of code and adds more comments which are helpful, and I think it will mostly work, maybe just not exactly as originally envisioned (the edit requirements thing) | 23:54 |
clarkb | I can live with that while we figure out something better if we decide to do that | 23:54 |
corvus | fungi: i forwarded that message to you personally, as an attachment, so you should be able to see all of that. if anyone else wants to look into that, let me know and i can forward it to you as well. | 23:55 |
corvus | (i did a quick check of the settings/bounce processing page in postorius for the list and did not see anything relevant) | 23:56 |
clarkb | corvus: fungi https://github.com/maxking/docker-mailman/blob/d928d36b97fab6fac2a6295ef5822549a68ed0c8/README.md#site-owner | 23:56 |
clarkb | thats possibly something that just got overlooked as a file to edit | 23:57 |
clarkb | yup I think that file is just missing | 23:58 |
corvus | there's a /var/lib/mailman/core/var/etc/mailman.cfg that is essentially empty except for a comment about it being auto-generated | 23:58 |
corvus | so probably we copy that file out into system-config and add that setting to it? | 23:58 |
corvus | (or leave it alone and create mailman-extra.cfg? i dunno, all new to me) | 23:59 |
clarkb | corvus: I think the docker image treats mailman-extra.cfg specially and incorporates it into that file | 23:59 |
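Assuming, per the README section linked above, that the docker image merges an extra config file (mailman-extra.cfg in this discussion) into the generated mailman.cfg at startup, the missing piece is a `[mailman]` section that sets `site_owner`. A Python sketch that renders such a fragment; the helper name and the address used below are placeholders, not OpenDev's real values.

```python
import configparser


def render_mailman_extra(site_owner: str) -> str:
    """Render a minimal extra config fragment that sets Mailman's site owner."""
    # Reject the image default seen in the bounce cover letter, and obvious non-addresses.
    if "@" not in site_owner or site_owner == "changeme@example.com":
        raise ValueError("set site_owner to a real administrative address")
    text = f"[mailman]\nsite_owner: {site_owner}\n"
    # Sanity check: the fragment must parse as INI (raises on malformed input).
    configparser.ConfigParser().read_string(text)
    return text
```

For example, `render_mailman_extra("listadmin@example.org")` returns a two-line `[mailman]` section; where the file lands on the host would need to match the image's volume layout.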
clarkb | its part of the startup routine to do that bit | 23:59 |
ianw | clarkb: thanks, agree we can probably do even better | 23:59 |
corvus | oh interesting/weird | 23:59 |