opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Update IRC channel docs https://review.opendev.org/c/opendev/system-config/+/936806 | 07:11 |
---|---|---|
frickler | infra-root: ildikov: ^^ that's what I came up with, please have a look and feel free to amend if needed | 07:12 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: prepare-workspace-git: Make it possible to sync a subset of projects https://review.opendev.org/c/zuul/zuul-jobs/+/936828 | 12:46 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: prepare-workspace-git: Make it possible to sync a subset of projects https://review.opendev.org/c/zuul/zuul-jobs/+/936828 | 12:53 |
opendevreview | Karolina Kula proposed zuul/zuul-jobs master: DNM Switch to KVM https://review.opendev.org/c/zuul/zuul-jobs/+/936023 | 12:54 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: DNM Testing on KVM https://review.opendev.org/c/openstack/diskimage-builder/+/936024 | 12:55 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Update IRC channel docs https://review.opendev.org/c/opendev/system-config/+/936806 | 13:07 |
ildikov | frickler: Thank you! I added one question to see if I interpreted the new text as intended, but otherwise the change looks good. The '-1' is for the typo. | 13:16 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: prepare-workspace-git: Make it possible to sync a subset of projects https://review.opendev.org/c/zuul/zuul-jobs/+/936828 | 13:52 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Update IRC channel docs https://review.opendev.org/c/opendev/system-config/+/936806 | 13:54 |
fungi | mailing list server maintenance begins in an hour, at 1500z: https://etherpad.opendev.org/p/lists-openinfra-org-migration | 13:58 |
fungi | all advance steps are completed, so an hour from now we'll pick up with step #4 to kick it off | 13:59 |
frickler | fungi: small question on the config change, but not critical I think. also I'll be kind of around in case something goes badly wrong | 14:09 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Move OpenInfra mailing lists to new domain https://review.opendev.org/c/opendev/system-config/+/936303 | 14:34 |
fungi | thanks frickler! | 14:35 |
fungi | maintenance starts in 5 minutes, i've got a root screen session open on lists01 | 14:54 |
opendevreview | Karolina Kula proposed zuul/zuul-jobs master: DNM Switch to KVM https://review.opendev.org/c/zuul/zuul-jobs/+/936023 | 14:56 |
fungi | and the session is logging to ~root/screenlog.0 for posterity | 14:57 |
fungi | #status log Mailing lists services are offline for the next two hours for maintenance, but messages will be deferred and delivered once work has concluded: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/NQ3CPI2OX2TKE3OWNATUBA477T4SYTUZ/ | 15:00 |
opendevstatus | fungi: finished logging | 15:00 |
fungi | stopping exim4 | 15:00 |
fungi | stopping mailman containers | 15:00 |
fungi | backing up the database now, this will take a few minutes to complete | 15:01 |
fungi | that's done, now i'm applying the 15 (!) sql update queries we need for this | 15:05 |
fungi | the bounceevent table queries didn't match any rows, i'm skeptical but it's possible that's fine. i'll check the table contents when i'm done | 15:09 |
fungi | there are only 28 rows in that table at the moment, and none of them are for lists.openinfra.dev subscribers, so it's fine | 15:14 |
fungi | my latest edit to 936303 seems like it's problematic, taking a look real quick though it's not critical that it merge right now | 15:15 |
fungi | yeah, trivial typo for one of the errors... | 15:17 |
fungi | other one is an oopsie on my part, assert assumed a path edit where there was none | 15:18 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Move OpenInfra mailing lists to new domain https://review.opendev.org/c/opendev/system-config/+/936303 | 15:20 |
fungi | that should hopefully take care of it | 15:20 |
fungi | back to the maintenance tasks, skipping step 9 for the moment | 15:21 |
fungi | and made a note that we shouldn't go past step 17 until that gets done | 15:22 |
fungi | for step 10, i'll make sure the patch to the apache vhost config reflects the last edit to 936303 | 15:23 |
frickler | sorry for having been late with my review, we could also merge PS3 and add the testing in a followup? | 15:25 |
fungi | i've manually applied the latest vhost config from 936303 now | 15:31 |
fungi | apache is restarted, and i'm restarting the mailman containers now | 15:33 |
fungi | the various components take a couple of minutes to start, then we can check redirects and content manually (but have to accept the incorrect cert temporarily) | 15:34 |
fungi | the redirect for the example url is working correctly, but i'm still getting a 503 service unavailable for the moment | 15:36 |
fungi | okay, it's responding to me now | 15:38 |
fungi | https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/QQ6CUBG337L3UP7FXKLWTCHBJISTQBOC/ gets correctly redirected to https://lists.openinfra.org/archives/list/foundation@lists.openinfra.org/thread/QQ6CUBG337L3UP7FXKLWTCHBJISTQBOC/ and has content | 15:38 |
fungi | so that's good | 15:39 |
fungi | starting exim back up | 15:40 |
fungi | i'm going to reply to my foundation ml message about the maintenance now and make sure it comes through and ends up in the archive | 15:41 |
clarkb | fungi: school run is complete anything I can do to be helpful with the mm3 work? | 15:48 |
fungi | i think we're still on track, i just sent a message to the foundation ml, though it hasn't arrived yet | 15:48 |
clarkb | infra-root I would be super appreciative if we can land https://review.opendev.org/c/opendev/system-config/+/936305/ sometime today with a plan for restarting Gerrit later in my day as system load falls. That will put us in a good spot to do the upgrade of gerrit at the end of the week as I can retest upgrading latest to latest bugfix releases | 15:49 |
fungi | okay, i'm going to start tracing my message through mta logs | 15:52 |
fungi | 2024-12-02 15:45:48 1tI8cI-0007Zc-Mz => foundation@lists.openinfra.dev R=dnslookup T=remote_smtp H=lists.openinfra.dev [2001:4800:7813:516:be76:4eff:fe04:5423] X=TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=no DN="C=UK,O=Exim Developers,CN=lists01.opendev.org" C="250 OK id=1tI8cO-005Vn8-LL" | 15:53 |
fungi | that's from my mta, so far so good | 15:53 |
fungi | 2024-12-02 15:45:48 1tI8cO-005Vn8-LL ** foundation@lists.openinfra.dev R=mailman_router T=mailman_transport H=localhost [127.0.0.1]: SMTP error from remote mail server after RCPT TO:<foundation@lists.openinfra.dev>: 550 Requested action not taken: mailbox unavailable | 15:55 |
fungi | hah! my local edits forgot about the forwarding aliases | 15:55 |
fungi | lemme add those quickly | 15:55 |
clarkb | ah ok you sent to the old address and that failed due to missing aliases makes sense | 15:55 |
fungi | right, i wanted to test the forwarding aliases, exactly what i forgot to hand-apply | 15:56 |
fungi | i'll make a mental note to adjust system-config-run-lists3 to also log the /etc/aliases file in future | 15:58 |
fungi | i've temporarily stopped exim again while i'm working to fix this bit | 15:58 |
fungi | okay, /etc/aliases.domain has been edited to match what 936303 should generate, and i'm starting exim again | 16:04 |
fungi | also 936303 is passing tests, so i've un-wipped it | 16:08 |
fungi | going to try re-sending my message to the foundation ml now | 16:09 |
fungi | foundation@lists.openinfra.dev host lists.openinfra.dev [2001:4800:7813:516:be76:4eff:fe04:5423] SMTP error from remote mail server after RCPT TO:<foundation@lists.openinfra.dev>: 550 Unrouteable address | 16:16 |
fungi | both lists.openinfra.dev and lists.openinfra.org are in the local_domains list in exim4.conf, and foundation@lists.openinfra.dev is aliased to foundation@lists.openinfra.org in the aliases.domain file | 16:17 |
clarkb | maybe it's unroutable because it asked mailman to take delivery and mailman refused? | 16:18 |
clarkb | though I guess we have exim logs to check first right? | 16:18 |
fungi | i'm digging into the logs on lists01 now to see if there's more detail | 16:18 |
fungi | yep | 16:18 |
corvus | exim -d -bt foundations@lists.openinfra.dev | 16:18 |
fungi | 2024-12-02 16:13:53 H=azathoth.yuggoth.org [2001:4802:7802:102:be76:4eff:fe20:6e0c] X=TLS1.2:ECDHE_SECP256R1__RSA_SHA256__AES_256_GCM:256 CV=no F=<fungi@yuggoth.org> rejected RCPT <foundation@lists.openinfra.dev>: Unrouteable address | 16:18 |
corvus | fungi: i'm just catching up, but how is it supposed to route? | 16:19 |
fungi | the entry in /etc/aliases.domain should reroute it to foundation@lists.openinfra.org and hand it off to the mailman_router | 16:20 |
fungi | file check: /var/lib/mailman/core/var/lists/${local_part}.${domain} | 16:20 |
fungi | expanded file: /var/lib/mailman/core/var/lists/foundation.lists.openinfra.org | 16:20 |
fungi | stat() yielded -1 | 16:20 |
fungi | yeah, so we do need to move some files | 16:20 |
fungi | i'm going to stop exim and mailman containers again for a moment while i adjust those | 16:21 |
fungi | did a `find /var/lib/mailman/ -name "*openinfra.dev*"` and other than old pipermail archives (which i need to reevaluate whether they still go to the right place now that we have blanket redirects), it's just the per-list directories under /var/lib/mailman/core/var/lists/ that need renaming | 16:23 |
corvus | ++ | 16:24 |
fungi | okay, those have been moved, services are back up again, will try to send a message once more | 16:35 |
clarkb | I got it | 16:39 |
fungi | as did i. it made it into the archive too: https://lists.openinfra.org/archives/list/foundation@lists.openinfra.org/thread/QQ6CUBG337L3UP7FXKLWTCHBJISTQBOC/ | 16:39 |
fungi | the new domain shows up as desired in relevant headers for list id and urls | 16:41 |
fungi | i think we're all set to approve 936303 at this point and get the ssl cert in place | 16:41 |
clarkb | rereviewing now | 16:44 |
clarkb | fungi: whats the deal with the updated rewrite rules? I guess the old ones just didn't work? | 16:45 |
fungi | yeah, my reading of the N option was apparently wrong, something was causing it to not get applied | 16:46 |
fungi | and also i needed to make the match string explicit there, %{SERVER_NAME} didn't work though maybe %{HTTP_HOST} would have | 16:46 |
fungi | and i made the http vhost rewrite more thorough since otherwise it would still have resulted in a second redirect to clients | 16:47 |
JayF | fungi: I think the bounce processing stuff is busted? I just got an email to -owners that *my own address* was removed from the list | 16:48 |
clarkb | fungi: +2 from me | 16:48 |
JayF | if I, as one of the active admins of the list, got my stuff disabled; wouldn't that mean we might have disabled a bunch of other valid members, too? | 16:49 |
fungi | JayF: yeah, i wanted to look at that next today. i ran into the same thing over the weekend, it seems my bounce score got incremented because my mailserver rejected a message for openstack-discuss-owner about a message held for moderation which included spam content | 16:49 |
JayF | Honestly I kinda would suggest, if possible, we revert *all* the bounce removals since the change. It's already hard enough to communicate with users when we don't have bugs that kick 'em off the list. | 16:50 |
JayF | if literally list admins can't stay subscribed, who can? | 16:50 |
fungi | but also there are subscribers getting disabled because of dmarc signature mismatches for posts from people at cisco and fujitsu, both of these issues suggest that the separate verp probes aren't being applied correctly. hoping to look into it as soon as i'm done with this maintenance | 16:51 |
JayF | Yeah, this reinforces to me that we should, if possible (and I know this is easy to say as someone who wouldn't have to be doing the work) revert the removals of recent | 16:51 |
fungi | and yes, i can reenable all the disabled subscriptions too | 16:51 |
JayF | thank you :) | 16:51 |
clarkb | so can JayF | 16:51 |
clarkb | its in the members list I believe and entries can be toggled back over | 16:52 |
JayF | I can do it manually, one at a time | 16:52 |
JayF | is there a bulk way to do it? | 16:52 |
fungi | i may be able to script something more conveniently, it's hundreds of subscribers at this point | 16:52 |
clarkb | but also I think this points to maybe being less aggressive in bounce processing rather than disabling it? Trying to deliver hundreds of emails a year to invalid addresses is problematic for other reasons | 16:53 |
fungi | the bulk of whom probably are defunct addresses, but without manually parsing the ndr associated with each one's disablement it's hard to know | 16:53 |
JayF | My suggestion to revert for now doesn't say we shouldn't re-enable after making it less aggressive, just that we clearly have false positives that we should cleanup first imo | 16:55 |
clarkb | sure | 16:55 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Fix old openinfra.dev Pipermail archive rewrites https://review.opendev.org/c/opendev/system-config/+/936860 | 17:04 |
fungi | minor followup fix ^ | 17:05 |
fungi | i should probably add a regression test for that too | 17:09 |
fungi | mmm, though those only work when the files exist, which they won't on our test nodes because they're the result of manual import work, so not really easy to test without adding mock archive data | 17:13 |
fungi | as soon as 936303 merges, and assuming we're not already in an hourly prod run, i'll take lists01 back out of the disable list so we will hopefully get the letsencrypt update on it as immediately as possible | 17:22 |
clarkb | fungi: if you get a chance can you review 936305 I'd like to land that today and restart gerrit on 3.9.8 so that I can rerun through testing using the latest images this week | 17:43 |
fungi | you bet | 17:45 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add mirror-container-images role and job https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 17:47 |
opendevreview | Merged opendev/system-config master: Move OpenInfra mailing lists to new domain https://review.opendev.org/c/opendev/system-config/+/936303 | 17:50 |
fungi | removed lists01 from the disable list | 17:58 |
fungi | infra-prod-letsencrypt is running now | 18:01 |
fungi | https://lists.openinfra.org/ has a working cert now | 18:13 |
clarkb | and hopefully we didn't recreate the old name and old lists :) | 18:13 |
clarkb | I doubt we did | 18:13 |
fungi | i need to dig through the deploy log on bridge before i'm sure | 18:14 |
fungi | but was waiting for infra-prod-service-lists3 to finish (just completed) | 18:14 |
fungi | the task to write /etc/aliases.domain changed, presumably because the entries i hand-added didn't end up in the same order as ansible would have sorted them | 18:33 |
fungi | similarly for the docker-compose file which subsequently triggered a pull and restart | 18:33 |
fungi | the vhost config got updated too, temporarily reverting my manual application of https://review.opendev.org/936860 | 18:35 |
clarkb | that all sounds like expected actions though? | 18:35 |
fungi | so far, yes, i'm still reading... | 18:36 |
fungi | which triggered an apache reload | 18:36 |
fungi | so other than some checks that reported changed: true, that was it. looks right | 18:37 |
fungi | and that's it for the maintenance | 18:38 |
fungi | approved 936305 for the gerrit point releases | 18:39 |
clarkb | thanks | 18:41 |
clarkb | will plan to do the restart sometime in my afternoon as things calm down | 18:41 |
clarkb | and then tomorrow after meetings my intention is to rerun through the upgrade process with the new images, update notes as necessary, and finish up any remaining todos. I think I need a change to update the image we deploy with ansible for example | 18:42 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Enable extra VERP probes in Mailman https://review.opendev.org/c/opendev/system-config/+/936873 | 19:11 |
fungi | infra-root: JayF: ^ that's what we wanted all along, i think. for some reason i thought that was on by default | 19:13 |
fungi | also i've closed out the root screen session on lists01, but the transcript from it is saved as ~root/screen.lists01.maintenance.2024-12-02.log | 19:14 |
clarkb | +2 from me | 19:15 |
fungi | https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/4YFOLEIPOXPZD4Q6H73LT4RIAQPRNDIE/ looks like what we need for reenabling subscriptions on openstack-discuss | 19:19 |
fungi | i'll start on that in parallel | 19:19 |
fungi | #status log Reenabled 325 subscriptions which had been recently disabled for excessive bounces on the openstack-discuss mailing list | 19:27 |
fungi | JayF: ^ | 19:27 |
opendevreview | Merged opendev/system-config master: Fix old openinfra.dev Pipermail archive rewrites https://review.opendev.org/c/opendev/system-config/+/936860 | 19:30 |
opendevreview | Merged opendev/system-config master: Update Gerrit images to 3.9.8 and 3.10.3 https://review.opendev.org/c/opendev/system-config/+/936305 | 19:30 |
fungi | once 936873 lands, any new subscription disablements should indicate legitimately defunct addresses | 19:30 |
JayF | Awesome, thanks. | 19:43 |
fungi | huh, the jjb maintainers are continuing to develop it and tag new releases (according to the deprecation warning we got that jenkins-job-builder-6.4.2.tar.gz is a non-normalized sdist filename) | 19:57 |
clarkb | fungi: its still quite popular I think | 20:14 |
fungi | that's pretty awesome | 20:15 |
clarkb | image promotion succeeded for 936305 so I think we can update to the latest 3.9.8 image when we are ready | 20:15 |
clarkb | fungi: on the mailman side of things are we sufficiently caught up that a gerrit restart at ~21:30 should be fine? | 20:17 |
clarkb | thinking if we get gerrit done by then maybe we can also update lodgeit | 20:17 |
fungi | clarkb: absolutely, though getting 936873 merged soonish would be good if any other infra-root has a moment to look it over | 20:17 |
fungi | and yeah, getting lodgeit knocked out too would be amazing | 20:18 |
clarkb | I feel like the verp setting is straightforward enough we could just go ahead and approve that | 20:18 |
clarkb | any objections? | 20:18 |
fungi | corvus sometimes has opinions on mailing list management. i think more generally though we were all operating under the assumption that this was the default behavior in mm3 (possibly my fault for misinterpreting documentation and/or mailman-users discussions) | 20:20 |
fungi | thanks frickler! | 20:23 |
corvus | sgtm | 20:24 |
clarkb | I've just noticed that our screenshots of gerrit 3.9.8 show we're missing some font glyphs for the arrows showing you what change you are on in the change relation chain | 20:27 |
clarkb | I don't think that is a major issue though and half suspect it isn't new with 3.9.8 either | 20:27 |
clarkb | we render one of the utf8 blocks with the code inside within the test env and simply adding more fonts would likely fix it | 20:28 |
clarkb | I'm finding my held node now to confirm I don't see the same on my local browser with 3.9.8 | 20:28 |
clarkb | confirmed not an issue with my local browser so must be a problem with the test env | 20:29 |
clarkb | opendevorg/gerrit 3.9 9ba502024800 <- this is the docker image we're currently running for gerrit 3.9.7.x | 21:07 |
fungi | noted | 21:08 |
clarkb | Getting ready to do a quick update to the new 3.9.8 image. Does this look good #status notice We are updating Gerrit to the latest 3.9 version in preparation for Friday's Gerrit upgrade to 3.10. You may notice a short outage of Gerrit. | 21:08 |
clarkb | Then also last time we did an upgrade it was asked that we announce it a lot more to remind people a day or so in advance so I'll probably #status notice and send another email on Thursday if it looks like we're going to proceed | 21:09 |
fungi | sure, wfm, though i usually lead with the fact that the service is going offline momentarily (followed by reasons) | 21:09 |
clarkb | how about this #status notice Gerrit will have a short outage while we update to the latest 3.9 release in preparation for our 3.10 upgrade on Friday | 21:10 |
clarkb | that's a bit shorter and more to the point | 21:10 |
clarkb | I've started a screen on review02. I'll send that revised notice at 21:30 UTC then proceed with a pull, verification of the new image, down, mv of waiting queue, then up -d | 21:12 |
fungi | loks great, yep | 21:15 |
fungi | looks | 21:15 |
fungi | attached to the screen session | 21:16 |
clarkb | I went ahead and pulled the image as I can do that first and docker inspect shows that image matches https://hub.docker.com/layers/opendevorg/gerrit/3.9/images/sha256-cea988456b9c158ba9920b745e3484c2ba76e1ea7125468e5ebcb7204b146aa7 | 21:26 |
clarkb | #status notice Gerrit will have a short outage while we update to the latest 3.9 release in preparation for our 3.10 upgrade on Friday | 21:30 |
opendevstatus | clarkb: sending notice | 21:30 |
-opendevstatus- | NOTICE: Gerrit will have a short outage while we update to the latest 3.9 release in preparation for our 3.10 upgrade on Friday | 21:30 |
clarkb | I wonder if we can make the bot go faster | 21:33 |
* clarkb | waits patiently | 21:33 |
opendevstatus | clarkb: finished sending notice | 21:34 |
clarkb | ok proceeding now | 21:34 |
clarkb | INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.9.8-1-gcf4c706eb8-dirty ready | 21:35 |
clarkb | web loads for me but I'm still waiting for diffs to show up again | 21:36 |
clarkb | fungi: there is a traceback trying to discover the openid endpoint. In an incognito tab I do get it to redirect all the way to ubuntu one just fine but haven't logged in yet (I'll work on that after diffs show up) any chance you can login with your extra user too to check that works as expected? | 21:38 |
clarkb | I half suspect the problem is related to the link involved (it was using gitiles?) | 21:38 |
fungi | looking | 21:38 |
clarkb | ok I do get diffs now | 21:38 |
clarkb | we don't even publish gitiles links anymore too so I wonder if that is a bot | 21:39 |
fungi | i was logged out of the webui already, so logged in with my normal account | 21:39 |
fungi | it redirects back to https://review.opendev.org// without logging me in | 21:39 |
clarkb | ya I see a few more of those errors in the log now arg | 21:40 |
clarkb | ok so now what :/ | 21:40 |
fungi | trying with my secondary test accound | 21:40 |
fungi | account | 21:40 |
clarkb | it says cannot discover openid then lists the openid path | 21:41 |
fungi | mmm, i get https://review.opendev.org/SignInFailure,SIGN_IN,Contact+site+administrator but it's possible i previously disabled that account for a test | 21:41 |
clarkb | which I'm wondering if those don't line up with external account ids for some reason | 21:41 |
fungi | were there changes to the openid functionality? | 21:41 |
clarkb | https://www.gerritcodereview.com/3.9.html#398 not that I was aware of | 21:42 |
fungi | maybe we were testing too soon? my normal account is logged in now and getting the normal dashboard | 21:42 |
clarkb | I wonder if it is related to the account caches | 21:42 |
fungi | ooh, mayhaps | 21:42 |
clarkb | I restarted my tail of the log to see if we still get those | 21:43 |
clarkb | and now I'll try to login in a different or incognito browser | 21:43 |
clarkb | fungi: it redirected me to // but I was logged in | 21:46 |
clarkb | and since I restarted my tail only what I think is the bogus gitiles link has exploded, not other people (nor myself) | 21:46 |
fungi | yeah, having a bit of deja-vu about the extra trailing / | 21:46 |
clarkb | I'm going to log out again and then login on a change page and see what happens | 21:46 |
clarkb | starting from a change page it wasn't happy then I logged in again and then it worked | 21:49 |
clarkb | and the openid it says it cannot discover in the error_log message seems to match what I have listed under my identities | 21:49 |
fungi | that's definitely weird... maybe something is cached browser-side initially and not matching up? | 21:51 |
clarkb | this was in an incognito tab so shouldn't be using the cache I don't think | 21:51 |
clarkb | and yes I feel like we've run into something similar in the past | 21:51 |
clarkb | I want to say it's something like gerrit caches openid stuff on startup and if the remote isn't happy then you get problems? | 21:52 |
clarkb | I don't remember the specifics or feel confident in that, only to say I think we have hit something similar in the past and I feel like the solution wasn't really a fix, just something to live with | 21:52 |
clarkb | tracing the back and forth between openid server and gerrit the openid server sends us back to gerrit openid login url then that 302's to location: https://review.opendev.org// | 21:57 |
clarkb | I feel like maybe something changed to where gerrit is redirecting to / explicitly and maybe we're not proxypassing that properly? I don't see anything obviously wrong in the login handshake but it does send us explicitly to // | 22:00 |
clarkb | I would've expected it to send us to whatever page we were on before | 22:00 |
clarkb | ya the return_to argument is https://review.opendev.org/OpenIDstuff | 22:01 |
clarkb | so its gerrit making the decision to redirect to // from there and in the process landing us in a weird page for the user | 22:02 |
fungi | we could try dropping the trailing / from the destinations in our ProxyPass and ProxyPassReverse directives, but mostly stabbing in the dark now | 22:03 |
fungi | at least it's not hard to test if it's only apache config adjustments | 22:05 |
clarkb | I think the fail to discover thing is maybe just noise | 22:06 |
clarkb | I just did a new login and got that traceback to trip but it let me in. Didn't change the behavior re // but I was able to log in | 22:06 |
clarkb | oh wait maybe I didn't log in | 22:06 |
clarkb | the failure to discover is a timeout error to the openid server | 22:06 |
opendevreview | Merged opendev/system-config master: Enable extra VERP probes in Mailman https://review.opendev.org/c/opendev/system-config/+/936873 | 22:07 |
clarkb | the annoying thing is this seems to work the vast majority of the time (except for the // problem) which is making it hard to understand what is noise and what isn't | 22:09 |
clarkb | I think that what may be happening with the discovery errors is when you log in after the dance gerrit fetches your openid info from the backend to populate things like name and so on | 22:10 |
clarkb | name email? and that can fail for existing users and its fine but new users might fail hard? | 22:10 |
clarkb | but it also doesn't happen 100% of the time as I can login multiple times and it only fails occasionally | 22:10 |
fungi | yeah, this is weird, and i want to say it wasn't the case with the previous version... but maybe that was a matter of timing and there's something separate going on with ubuntu one openid? | 22:13 |
clarkb | ya fwiw canonical web url does have a / | 22:14 |
fungi | mmm | 22:14 |
clarkb | which makes me wonder if our proxy pass is appending one to the reverse path | 22:14 |
clarkb | and maybe older gerrit was chomping the trailing / but doesn't now or something | 22:15 |
fungi | i'll buy that | 22:17 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/442201 | 22:18 |
clarkb | this is almost certainly the cause | 22:18 |
clarkb | I don't know why yet but it has to do with login redirects and is new in 3.9.8 | 22:19 |
clarkb | so ya I think we ignore the discovery problem that we see tracebacks for for the moment under the assumption it is the ubuntu side having an occasional sad and now focus on understanding ^ to address login redirect behavior | 22:19 |
clarkb | and this will affect 3.10 too so I guess good to rip the bandaid off | 22:20 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/421238 is the original change. I was hoping it would have more info | 22:20 |
clarkb | I half wonder if we went from going from /login/stuff to //login/stuff and then that initial / prefix is being carried all the way through the login process | 22:21 |
clarkb | I'll try and trace that now | 22:21 |
clarkb | its the other way around its trying to login to /login// | 22:23 |
fungi | oh, huh... | 22:25 |
clarkb | actually it does /login/%2F | 22:25 |
clarkb | and it makes that same request even when logging in from a change page | 22:25 |
fungi | wow, so encodes the / from the login url i guess? which then doesn't get deduplicated | 22:26 |
clarkb | ya though looking at the change itself it seems like the /login/%2f behavior should've been there before. Its only the new prefix stuff that has changed | 22:27 |
clarkb | and manually logging in via navigation to /login/ produces the same // result post login | 22:28 |
clarkb | so maybe this isn't related? it just seems too coincidental to not be somehow | 22:28 |
clarkb | I'm going to ask upstream | 22:29 |
clarkb | fungi: can you check replication is happy for 936873? that was the last thing on my todo list after I got derailed by the login fun | 22:30 |
fungi | oh, sure | 22:31 |
clarkb | I suspect we don't need to rollback if this is the only issue | 22:32 |
fungi | https://opendev.org/opendev/system-config agrees the merge commit for that is current | 22:32 |
clarkb | as it is annoying but workable | 22:32 |
clarkb | ok if I manually login via /login it works. /login/ and /login/%2f produce the broken https://review.opendev.org// result | 22:35 |
clarkb | in that change getBaseUrl must return '/' otherwise we wouldn't be sent to /login/%2f | 22:38 |
clarkb | it would still redirect to /login/ but maybe assign chomps trailing /'s or something | 22:39 |
clarkb | fungi: I'm going to dive into the code and diffs between 3.9.7 and 3.9.8 to try and understand this more. My read on the situation is that this is annoying but not a fatal issue and we don't need to revert | 22:43 |
clarkb | if your read is different feel free to chime in and we can work on a revert instead. However, I half expect that 3.10 is affected too based on the cherry picking of the suspected change and that would mean the upgrade would need to be sorted out if we revert | 22:44 |
clarkb | compare https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.9.8/polygerrit-ui/app/elements/core/gr-router/gr-router.ts#450 to https://gerrit.googlesource.com/gerrit/+/refs/tags/v3.9.7/polygerrit-ui/app/elements/core/gr-router/gr-router.ts#457 there is actually more difference | 22:47 |
clarkb | https://gerrit-review.googlesource.com/c/gerrit/+/442162 maybe | 22:49 |
fungi | yeah, pressing forward makes more sense than rolling back. if we end up with a bunch of complaints we can revisit | 22:56 |
corvus | clarkb: oh are you talking about on current review.o.o? for some reason i thought you were looking at a preview job | 23:10 |
clarkb | corvus: yes this is production after the 3.9.7 -> 3.9.8 update we just did | 23:11 |
clarkb | my goal was to get that done today so that I can test the 3.9.8 -> 3.10.3 upgrades before the upgrade Friday (hopefully as early as tomorrow to start that testing) | 23:12 |
corvus | got it. | 23:12 |
clarkb | but I'm like 99% certain this is fallout from a fairly big refactoring upstream did around paths and changing urls and redirects | 23:12 |
clarkb | so would affect us on 3.10.3 too (and maybe 3.10.2 depending if/when they backported this stuff to 3.10) | 23:12 |
clarkb | I've found at least one bug in their code but its not directly related to this I don't think | 23:15 |
clarkb | looking at https://review.rdoproject.org/r/q/status:open+-is:wip the sign in button /login/ path is actually capturing the query or change page info there | 23:24 |
clarkb | so that when you login you go back to that spot | 23:24 |
fungi | they're on 3.7.8, so fairly far back | 23:26 |
clarkb | ya but that was good for some new info | 23:26 |
clarkb | if I hard refresh (maybe just a regular refresh will do I haven't tested yet) a change page or the path that a single / redirects you to then the sign in url path acts like the rdoproject gerrit's and appends the full path behind the login/ | 23:27 |
clarkb | the problem I think ultimately is that the sign in path is not being updated to match the page you are on | 23:28 |
clarkb | which previously it was so you're always ending up at the dashboard page or whatever | 23:28 |
clarkb | its possible the // path problem has been there forever but / isn't really a valid path in gerrit it always redirects you to something else | 23:28 |
clarkb | ya I think the problem is that gerrit isn't updating the url to follow you as you either get redirected or navigate around | 23:32 |
clarkb | but ya I believe this to be the issue | 23:35 |
clarkb | and I think I can live with that for now knowing what is going on | 23:35 |
clarkb | I've posted some of this debugging process upstream in discord | 23:35 |
clarkb | I'll see if they want me to file an issue etc | 23:36 |
fungi | i guess the discord-matrix bridge for that channel died some time ago | 23:37 |
clarkb | yes. I've brought it up but current gerrit membership doesn't know how that was set up or how to debug it | 23:37 |
clarkb | gerrit's upstream gerrit doesn't seem to have this problem with the sign in button | 23:37 |
clarkb | oh wait no it does | 23:38 |
clarkb | go to https://gerrit-review.googlesource.com and then hover the sign in button you'll see its a login// too | 23:38 |
clarkb | navigate to a change and same thing | 23:38 |
clarkb | if I login there it redirects me to my personal dashboard even if I started on a change page | 23:39 |
opendevreview | Vladimir Kozhukalov proposed zuul/zuul-jobs master: Respect image registry in container roles https://review.opendev.org/c/zuul/zuul-jobs/+/936909 | 23:43 |
clarkb | to summarize the problem is with Gerrit's construction of the url in the 'sign in' buttons. They are lacking sufficient context to send you back to where you want to go. Since / was never really a valid url that was never expected to work but is what we end up falling back on in these cases | 23:47 |
clarkb | I think this is ok. We can just ask people to live with it for a bit particularly since Gerrit 3.11 upstream also seems to exhibit the same behavior so its either downgrade and wait a while or upgrade and deal with it | 23:47 |
clarkb | fungi: any objection to me closing the root screen on review now? | 23:47 |
clarkb | we recorded the image id for 3.9.7 above if we do want to fallback (or we can rebuild onto 3.9.7 new images) | 23:48 |
clarkb | but other than the login weirdness I think this is looking ok ? and the login weirdness is understood well enough to deal with it | 23:48 |
fungi | clarkb: no objection | 23:52 |
clarkb | done | 23:57 |
clarkb | upstream has asked if I can test with a certain change reverted (442162) I'll work on that tomorrow as I have to sort out a meeting agenda now and send that out before dinner | 23:58 |
clarkb | my plan is to push that revert to stable-3.9 upstream then depends-on locally and hold the 3.9 system-config-run job | 23:58 |
clarkb | should be straightforward once I figure out how to push to upstream again | 23:58 |
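
Editor's note on the 15:52 to 15:55 log tracing: fungi followed one message across two Exim log lines, a `=>` line (normal delivery) from the sending MTA and a `**` line (delivery failed) on lists01. The sketch below is one possible way to pull the same fields out of such lines. It is illustrative only and not part of any OpenDev tooling; the function name `parse_delivery_line` and the returned dictionary layout are assumptions made for this example, and it handles only the two line shapes quoted in the log.

```python
import re

# Matches the two delivery-line shapes quoted in the log:
#   <date> <time> <message-id> => <address> R=... T=...   (delivered)
#   <date> <time> <message-id> ** <address> R=... T=...   (failed)
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>\S+) (?P<flag>=>|\*\*) (?P<address>\S+)(?P<rest>.*)$"
)
# R= names the router and T= names the transport on these lines.
FIELD_RE = re.compile(r"\b([RT])=(\S+)")

STATUS_BY_FLAG = {"=>": "delivered", "**": "failed"}


def parse_delivery_line(line):
    """Return a dict describing a delivery line, or None for other lines.

    Hypothetical helper for illustration; real Exim logs contain many
    more line types (arrivals, rejections, deferrals) that are ignored.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for key, value in FIELD_RE.findall(match["rest"]):
        fields.setdefault(key, value)  # keep the first R= / T= seen
    return {
        "timestamp": f"{match['date']} {match['time']}",
        "msgid": match["msgid"],
        "status": STATUS_BY_FLAG[match["flag"]],
        "address": match["address"],
        "router": fields.get("R"),
        "transport": fields.get("T"),
    }
```

Fed the 15:55 line, this would report a failed delivery through `mailman_router` and `mailman_transport`, which is the point where the missing forwarding aliases showed up. Rejection lines such as the 16:18 `rejected RCPT` entry have no message id or delivery flag, so the sketch returns `None` for them.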
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!
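
Editor's note on the 16:20 to 16:35 fix: the Exim debug output showed the router checking for `/var/lib/mailman/core/var/lists/${local_part}.${domain}`, which failed for `foundation.lists.openinfra.org` because the per-list directories still carried the old domain. The sketch below shows one way such a rename could be planned and applied. It is not the procedure fungi actually ran (the log does not show his commands); the function names are hypothetical, and the services are assumed to be stopped first, as they were during the maintenance.

```python
from __future__ import annotations

from pathlib import Path


def router_check_path(lists_dir: Path, local_part: str, domain: str) -> Path:
    """Build the path from the debug output: <lists_dir>/<local_part>.<domain>."""
    return lists_dir / f"{local_part}.{domain}"


def plan_renames(lists_dir: Path, old_domain: str, new_domain: str) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs for directories ending in old_domain.

    Hypothetical helper: nothing is touched on disk at this stage, so the
    plan can be reviewed before anything is moved.
    """
    suffix = f".{old_domain}"
    plan = []
    for entry in sorted(lists_dir.iterdir()):
        if entry.is_dir() and entry.name.endswith(suffix):
            local_part = entry.name[: -len(suffix)]
            plan.append((entry, router_check_path(lists_dir, local_part, new_domain)))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Rename each directory, refusing to overwrite an existing destination."""
    for src, dst in plan:
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        src.rename(dst)
```

A call such as `plan_renames(Path("/var/lib/mailman/core/var/lists"), "lists.openinfra.dev", "lists.openinfra.org")` would list the candidate moves. Separating the plan from the apply step is a design choice for this sketch, so that directories for other domains on the same server can be confirmed as untouched before anything is renamed.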