frickler | JayF: fwiw I've been thinking of setting up a link shortener fed by gerrit reviews; that would be pretty simplistic. for now though, I'm just running one privately for the things I regularly need | 05:44 |
---|---|---|
frickler | seems gmail is still at least delaying mail because we have more recipients than they like. I really wonder if proactively blocking mail to them wouldn't be a better solution | 07:24 |
frickler | if not, we'll likely have to do further work and maybe deploy ARC https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/handlers/docs/arc_sign.html | 07:25 |
*** Guest8552 is now known as layer8 | 08:37 | |
*** layer8 is now known as Guest9391 | 08:37 | |
*** Guest9391 is now known as layer9 | 08:39 | |
opendevreview | Merged opendev/zone-opendev.org master: Remove old mirror nodes from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/902716 | 12:31 |
opendevreview | Bartosz Bezak proposed openstack/diskimage-builder master: Add NetworkManager-config-server to rocky-container https://review.opendev.org/c/openstack/diskimage-builder/+/892893 | 12:57 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Switch install-docker playbook to include_tasks https://review.opendev.org/c/opendev/system-config/+/902775 | 13:58 |
fungi | noticed that in this deploy job failure: https://zuul.opendev.org/t/openstack/build/7c9446f8b03345158de4346e946f8f31 | 13:59 |
fungi | not sure if it was the reason for the failure or just a red herring, but worth cleaning up either way | 13:59 |
fungi | yeah, it wasn't the cause. found this in the log on bridge: | 14:00 |
fungi | Recreating jaeger-docker_jaeger_1 ... done | 14:00 |
fungi | fatal: [tracing01.opendev.org]: FAILED! | 14:00 |
fungi | Timeout when waiting for 127.0.0.1:16686 | 14:00 |
fungi | so the container didn't come up, or didn't come up fast enough | 14:01 |
fungi | the periodic build also failed exactly the same way | 14:03 |
fungi | the last successful periodic run was on friday | 14:04 |
fungi | the container log is complaining about subchannel connectivity problems | 14:18 |
fungi | 2023-12-06T12:40:47.191630938Z grpc@v1.59.0/server.go:964 [core][Server #5] grpc: Server.Serve failed to create ServerTransport: connection error: desc = ServerHandshake("127.0.0.1:45852") failed: tls: first record does not look like a TLS handshake | 14:19 |
fungi | i'm going to try manually downing and upping the container just to see if i get anything else out of it | 14:20 |
tonyb | Kk | 14:20 |
fungi | log is still full of connection errors after the restart | 14:23 |
fungi | looks like there were updates for the jaegertracing/all-in-one container image 44 hours ago, 4 days ago, and 7 days ago | 14:24 |
tonyb | can we get the SHA for the last good deploy? | 14:25 |
fungi | i don't know if the failures are related to the images, but the image from 7 days ago was definitely before the periodic job started failing | 14:26 |
fungi | the one from 4 days ago falls inside the window between the last successful run on friday and the first failing run on monday | 14:28 |
fungi | the one from 44 hours ago was after the first failure | 14:29 |
fungi | docker image inspect of the 7-day-old image says it's jaegertracing/all-in-one@sha256:963fed00648f7e797fa15a71c6e693b7ddace2ba7968207bb14f657914dac65b | 14:30 |
fungi | "Created": "2023-11-29T06:06:18.566997546Z" | 14:30 |
fungi | can i replace "latest" with "963fed00648f7e797fa15a71c6e693b7ddace2ba7968207bb14f657914dac65b" in the compose file to test? does that syntax work? | 14:32 |
fungi | not found: manifest unknown: manifest unknown | 14:33 |
fungi | guess not | 14:33 |
fungi | i switched from latest to 1.51 after looking at https://hub.docker.com/r/jaegertracing/all-in-one/tags and am seeing similar connection failures | 14:35 |
tonyb | Try FROM jaegertracing/all-in-one@sha256:963fed00648f7e797fa15a71c6e693b7ddace2ba7968207bb14f657914dac65b | 14:35 |
tonyb | Ah okay so that probably isn't it | 14:36 |
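For reference, the `FROM ...` form suggested above is Dockerfile syntax; a compose file's `image:` value (and `docker pull`) wants the full `repository@sha256:<digest>` reference rather than the bare hash substituted for the tag. A hedged example using the digest quoted above:

```shell
# pull (or reference in the compose file's image: value) the exact 7-day-old image
docker pull jaegertracing/all-in-one@sha256:963fed00648f7e797fa15a71c6e693b7ddace2ba7968207bb14f657914dac65b
```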
fungi | mmm, though these connection failures are info and warn level only | 14:37 |
fungi | 2023-12-06T14:36:11.773226176Z grpc@v1.59.0/clientconn.go:1521 [core][Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "localhost:4317", ServerName: "localhost:4317", }. Err: connection error: desc = transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused | 14:37 |
frickler | infra-root: forgot to mention yesterday, we still have three old held nodes that don't show up in autoholds, can someone look into cleaning those up? | 14:44 |
frickler | there's also node 0035950265 that seems to be stuck in "deleted" state somehow | 14:48 |
fungi | sometimes `openstack server show ...` will include an error message in those situations | 14:49 |
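A sketch of the kind of query meant here; `<uuid>` is a placeholder:

```shell
openstack server show <uuid>
# when nova knows why an instance is wedged, the output includes a "fault" field
```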
frickler | I think that's more for "deleting" rather than "deleted"? anyway: No Server found for 01b73bd3-ad22-46a6-a88e-6e33fc2f4b61 | 15:00 |
fungi | "stuck in deleted state" according to openstack server list? or according to nodepool list? | 15:01 |
frickler | according to the "nodes" tab in zuul, is that the same as nodepool list? | 15:02 |
fungi | i think so, i haven't relied much on the webui for that | 15:02 |
fungi | so is that where you're also seeing the held nodes you're talking about? | 15:03 |
frickler | yes | 15:05 |
fungi | nodepool list reports 11 nodes in a hold state, which corresponds to what i see at https://zuul.opendev.org/t/openstack/nodes | 15:06 |
frickler | just checked, "nodepool list" doesn't have 0035950265, but 31 other very old nodes in "deleted" state | 15:06 |
frickler | so a) some mismatch in state between zuul and nodepool and b) some cleanup in nodepool's zk being broken I guess | 15:07 |
fungi | zuul/nodepool switched from numeric node ids to uuids semi-recently | 15:08 |
fungi | maybe that's the event horizon where it lost track of the old deleted nodes | 15:10 |
frickler | I'm still seeing numeric node ids both in zuul and nodepool, though. I think the switch was only for image IDs? | 15:10 |
fungi | oh, maybe | 15:11 |
frickler | build ids to be more specific | 15:11 |
frickler | image builds, that is, not zuul builds | 15:12 |
fungi | looks like all the ones `nodepool list` reports in a "deleted" state are missing pretty much all data besides the node number, state, time in state, lock status, username, port | 15:13 |
fungi | no provider even | 15:13 |
fungi | so yes, probably will require some manual cleanup with zkcli | 15:13 |
fungi | looks like they're all around 8-12 months old | 15:14 |
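If that manual cleanup is attempted, it would presumably look something like the following; the ZooKeeper path, server address, and use of zkCli are assumptions, not taken from the log:

```shell
# assumption: nodepool keeps node records under /nodepool/nodes/<node-id>
zkCli.sh -server <zk-server>:2181 deleteall /nodepool/nodes/<node-id>
```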
JayF | frickler: running one privately is an option, I should do that | 15:59 |
tonyb | I did a little poking at the ensure-pip role for enabling python3.12. Under the hood both pyenv and stow use the python-build tool from pyenv. It's just a question of when. pyenv would build a python3.12 on every job run, stow would build the python3.12 binary when we build the nodepool image (by using the python-stow-versions DIB element) | 16:00 |
tonyb | I guess for now option 1, pyenv with job builds, is the quickest POC | 16:01 |
fungi | looks like the issue with the jaeger role may be that the 60-second timeout is too short | 16:16 |
fungi | it does eventually listen and start responding on port 16686 but takes a while | 16:17 |
fungi | i'll see if i can get a baseline timeframe | 16:17 |
clarkb | to figure out the extra held nodes nodepool can list them with the detail data. I'll take a look shortly | 16:20 |
fungi | clarkb: the "extra" held nodes already show the comment info in the nodes list (both from the nodepool cli and zuul's webui) | 16:20 |
fungi | there just isn't a corresponding autohold zuul is tracking any longer | 16:21 |
clarkb | oh then we just identify if they need to be held any longer and if not delete them | 16:21 |
clarkb | based on the comment | 16:22 |
fungi | empirical testing suggests 60 seconds is way too short of a timeout for jaeger to start listening. my initial test took 80 seconds. i'll run a few more restarts to see how consistent that is | 16:22 |
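A rough sketch of this kind of measurement, assuming the compose file lives in /etc/jaeger-docker (the path is not confirmed in the log):

```shell
cd /etc/jaeger-docker && docker-compose restart
start=$(date +%s)
# poll the query UI port until it answers, then report the elapsed time
until curl -sf http://127.0.0.1:16686/ >/dev/null; do sleep 1; done
echo "jaeger answered after $(( $(date +%s) - start ))s"
```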
tonyb | fungi: so we've just gotten *really* lucky with the 60s timeout to date? | 16:24 |
fungi | or something has recently caused it to get slower | 16:24 |
clarkb | frickler: fungi: the quoted text in the commit messages for the spf record updates implied that spf or dkim were sufficient. Maybe they actually want spf and dkim? In which case arc does seem like a next step | 16:24 |
tonyb | Fair | 16:24 |
fungi | i can try rolling back the version again to see if it speeds up | 16:24 |
fungi | clarkb: the new deferral responses from gmail don't indicate it has anything to do with message authentication, nor do they imply that additional message authentication would help | 16:25 |
clarkb | fwiw jaeger must sit in front of a db of some sort. I'm not sure how safe rolling backwards will be | 16:25 |
clarkb | fungi: oh they chagned the message? | 16:25 |
fungi | clarkb: it says there are too many recipients for the same message-id | 16:25 |
tonyb | I know I had to add SPF and DKIM to my bakeyournoodle.com domain to get google to not send messages directly into spam. | 16:25 |
clarkb | weird that we would do what they ask and then they complain about something else | 16:25 |
tonyb | but lists are ... different | 16:26 |
fungi | 80 seconds again on my second restart test | 16:26 |
tonyb | I guess the MTA just bailed at the first error so there could be more coming | 16:26 |
tonyb | ... Other communities must have hit this | 16:27 |
clarkb | tonyb: likely, but these are also new changes on the google side | 16:28 |
clarkb | so it may be we're all hitting them in the last couple of days and scratching our heads | 16:28 |
fungi | well, gmail wasn't exactly barfing before. it rate-limited deliveries from the server, saying authenticating by adding either spf or dkim would help (and will become mandatory in a couple months). now it's rate limiting again, but because of the number of people receiving the same post (so basically the number of gmail subscribers to lists) | 16:28 |
clarkb | and I'm sure they'll be intentionally vague/obtuse in the name of not giving spammers a leg up | 16:29 |
fungi | this time the implication is that the deferral is per message-id though, so it seems like some gmail subscribers receive the message, but it takes multiple returns back from the deferral queue before everyone does | 16:29 |
clarkb | the incredibly frustrating thing here is that if you use email you'll be aware of the fact that gmail is the source of a significant portion of spam | 16:29 |
fungi | and yeah, i expect this is just gmail ratcheting up their rules to try to block spam messages. i half wonder if it's also restricted to posts from people using gmail or gmail-hosted addresses. i've seen insinuation elsewhere that gmail is cracking down on messages that say they're "from" a gmail account but are originating from servers outside gmail's network. this may be the time we turn on | 16:33 |
fungi | selective dmarc mitigation on openstack-discuss for people posting from gmail.com and any other domains which seem to be hosted at gmail | 16:33 |
fungi | supposedly they added that feature in the latest release specifically because of gmail delivery problems | 16:34 |
fungi | basically, mailman tries to guess how to mitigate messages with existing dkim signatures by evaluating the published dmarc policies for the post author's domain, but gmail doesn't obey its own published dmarc policies so mailman guesses wrong | 16:36 |
tonyb | https://support.google.com/mail/answer/81126#requirements-5k&zippy=%2Crequirements-for-sending-or-more-messages-per-day | 16:36 |
tonyb | It seems to be a reasonably long list of do's | 16:36 |
tonyb | SPF, DKIM and ARC | 16:36 |
fungi | we don't send 5k messages a day, but maybe they mean multiplied by the number of recipients | 16:37 |
tonyb | Yeah I suspect 1 list-email to 100 [gmail] accounts counts as 100 in the context of that document | 16:39 |
opendevreview | Brian Rosmaita proposed openstack/project-config master: Implement ironic-unmaintained-core group https://review.opendev.org/c/openstack/project-config/+/902796 | 16:39 |
fungi | all my jaeger restart tests are coming in at 80 or 81 seconds | 16:41 |
clarkb | maybe set the timeout to 160 seconds then? | 16:42 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Increase jaeger startup timeout https://review.opendev.org/c/opendev/system-config/+/902797 | 16:44 |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/902490 could use reviews if we still want to restart gerrit and try to use the new key again | 16:53 |
clarkb | I've just come to a realization that that file is not using any templating anymore. Do you want me to make it a normal file copy and stop jinjaing it? | 16:53 |
clarkb | that file == .ssh/config | 16:53 |
fungi | that would probably be better, yes | 16:55 |
clarkb | ok I'll make that update momentarily. Still haven't sorted out ssh keys this morning | 16:56 |
opendevreview | Clark Boylan proposed opendev/system-config master: Reapply "Switch Gerrit replication to a larger RSA key" https://review.opendev.org/c/opendev/system-config/+/902490 | 17:02 |
clarkb | fungi: ^ that removes the jinja templating | 17:02 |
fungi | lgtm, thanks! | 17:04 |
tonyb | and me. | 17:05 |
clarkb | we still good to do a restart later today? If so I'll go ahead and approve it now | 17:06 |
tonyb | Yup. I'll be AFK from 1730-1900[UTC] but I'm also optional | 17:07 |
clarkb | frickler: two of the nodes are related to holds I did for testing of the Gerrit bookworm and java 17 update. Those two can be deleted (I'll do this). The third is a kolla octavia debugging hold. I believe all three were leaked because we did a zuul update that included removing the zuul database (but not nodepool's) | 17:08 |
clarkb | frickler: I'll let you delete 0035154490 when you are done with it | 17:09 |
tonyb | We could also merge https://review.opendev.org/q/topic:gerrit-3.7-cleanup+is:open as it doesn't touch prod right? | 17:09 |
fungi | i expect to step out to a late lunch/early dinner 19:30-20:30 utc | 17:09 |
fungi | otherwise i'm available | 17:09 |
frickler | clarkb: how do I delete it? I only know how to delete autoholds | 17:10 |
clarkb | cool the best time for me is probably after 21:00 anyway (since before that I've got lunch and all the stuff from yesterday to catch up on) | 17:10 |
clarkb | frickler: on a nodepool launcher node (nl01-nl04) you can run nodepool commands using this incantation `sudo docker exec nodepool-docker_nodepool-launcher_1 nodepool $subcommand`. In this case I ran the `list --detail` subcommand and piped it to `grep hold` to find the nodes nodepool sees as held | 17:12 |
clarkb | frickler: then you can run the `delete $nodeid` subcommand to delete nodes directly | 17:12 |
clarkb | frickler: the number I pasted above is the node id (0035154490) | 17:12 |
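Putting the pieces described above together (commands as given in the log, run on one of the launchers):

```shell
# list held nodes with their comments
sudo docker exec nodepool-docker_nodepool-launcher_1 nodepool list --detail | grep hold
# delete a specific node directly by its id
sudo docker exec nodepool-docker_nodepool-launcher_1 nodepool delete 0035154490
```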
clarkb | it shouldn't generally be necessary but I believe when we cleared out the zuul database entirely (because there was a manual upgrade problem? logs would tell us exactly why) that only cleared out the zuul side of the zk database and nodepool kept its portion of the held records | 17:14 |
clarkb | generally we don't want to delete the nodepool side of the db because it keeps track of state outside of our systems and we want that to be in sync as much as possible | 17:15 |
opendevreview | Merged opendev/system-config master: Increase jaeger startup timeout https://review.opendev.org/c/opendev/system-config/+/902797 | 17:22 |
frickler | clarkb: ok, thx, that seems to have worked fine, node 0035154490 is gone. doing a delete on one of the deleted nodes has put it back into "deleting" | 17:29 |
clarkb | ya the delete command puts nodes in a deleting state in the db then normal nodepool runtime loops process that deletion | 17:29 |
clarkb | when nodes are already deleting and stuck in that state an explicit deletion in nodepool is unlikely to change anything due to this. We can try to delete things manually using the openstack client directly and see if we get any errors back that we can parse and take action on | 17:30 |
clarkb | looking at the ARC stuff that is basically fancy DKIM for mailing lists? We wouldn't need to configure separate DKIM records? | 17:33 |
clarkb | or would we need separate DKIM for the administrative emails that come directly from the server? | 17:34 |
fungi | we'd also have to stop keeping the original from addresses | 17:34 |
tonyb | The doc I linked to indicates that we need SPF and DKIM for all "bulk" mail senders, and ARC for list hosts specifically | 17:35 |
clarkb | hrm I'm confused as to what ARC does then. If we're rewriting the email to say it's from us then we would just do normal DKIM? | 17:35 |
fungi | oh, i have no idea about arc, by "fancy dkim" thought you still meant based on the from address | 17:35 |
tonyb | We're saying it's from us on behalf of "them", and we're good with that | 17:36 |
tonyb | IIUC | 17:36 |
clarkb | ya I think my confusion is that ARC is just DKIM | 17:36 |
fungi | so far what little i've known about arc is that it's yet another attempt by massmail providers to make e-mail impossible for anyone who isn't them | 17:36 |
tonyb | LOL | 17:36 |
clarkb | but we've got another term in play to encapsulate the "remove the source DKIM stuff and replace it with our own" | 17:36 |
clarkb | the dns records used by arc are dkim records | 17:37 |
clarkb | so it is just dkim with maybe extra steps | 17:37 |
tonyb | heading out. I'll be on my phone if needed | 17:37 |
fungi | but anyway, if people want to get list mail in a timely manner, they can subscribe from a proper mail provider. i've maybe got bandwidth to look at what would be involved in adding our own dkim signing sometime next year | 17:38 |
clarkb | ya I mean I gave up on gmail for open source mailing lists in ~2015? I don't remember exactly when I jumped ship due to the problems they were creating back then | 17:42 |
clarkb | Definitely not new problems. What I think has changed is in the intervening period more people (often due to employer choices) have ended up on gmail for work like this and gone the opposite direction | 17:43 |
fungi | yes, well i also don't subscribe to mailing lists with my employer-supplied e-mail address for similar reasons | 17:44 |
clarkb | to follow up on the ControlPersist change I can see ssh processes passing in the new values so it applied properly and doesn't seem to have broken anything. On the process cleanup side of things the main ssh process with the -oControlPersist=180s flag does seem to go away when ansible goes away. but there are ssh processes associated with .ansible/cp/$socketid sockets that appear to | 19:13 |
clarkb | hang around for the timeout | 19:13 |
clarkb | so there is "leakage" but I think three minutes is still short enough to not create too much headache | 19:13 |
clarkb | those socket paths are the -o ControlPath values so ssh must fork a child or something to manage the socket because the timeout is meant to last beyond the lifetime of the parent if you start another control persist process with the same path? | 19:14 |
clarkb | frickler: ^ fyi since there was question about this | 19:15 |
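One hedged way to inspect or reap such a lingering control master by hand; the socket name and host are placeholders:

```shell
ssh -O check -o ControlPath=~/.ansible/cp/<socketid> <host>   # is the master still alive?
ssh -O exit  -o ControlPath=~/.ansible/cp/<socketid> <host>   # tell it to exit now
```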
opendevreview | Merged opendev/system-config master: Reapply "Switch Gerrit replication to a larger RSA key" https://review.opendev.org/c/opendev/system-config/+/902490 | 19:19 |
fungi | oh, as for the jaeger startup timeout change, the deploy job worked when it merged | 19:23 |
frickler | clarkb: ok, thx for checking | 19:26 |
clarkb | the gerrit ssh config change appears to have applied as expected | 19:26 |
clarkb | I need lunch soon and will look at service restarts afterwards | 19:27 |
tonyb | clarkb: Yeah. there is a master pid for each ControlSocket. ps `pidof ssh` | grep -E mux should show you? | 19:31 |
fungi | okay, disappearing for food, back in about an hour | 19:34 |
tonyb | Enjoy | 19:35 |
clarkb | The process for restarting gerrit and using the new key should be something like: docker-compose down; mv id_rsa and id_rsa.pub to new .bak suffixed files; mv the replication waiting queue aside; docker-compose up -d; trigger replication | 20:24 |
clarkb | this is basically the same process as on Friday but we expect different results due to the updated ssh config file with the correct path to the new private key | 20:25 |
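A hedged sketch of that sequence; the host-side paths below are assumptions, only the order of operations comes from the log:

```shell
cd /etc/gerrit-compose                                   # assumed compose directory on review
docker-compose down
mv /home/gerrit2/.ssh/id_rsa{,.bak}                      # assumed host path of the old key
mv /home/gerrit2/.ssh/id_rsa.pub{,.bak}
mv /home/gerrit2/review_site/data/replication/ref-updates{,.bak}   # assumed waiting-queue path
docker-compose up -d
# then trigger replication, e.g. (assumed form of the plugin command):
#   ssh -p 29418 <admin>@review.opendev.org replication start --all
```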
clarkb | unfortunately it seems that we only get the "trying key foo" log messages when replication fails and we also get "no more keys" | 20:26 |
clarkb | so there isn't a positive confirmation of the key being used when it succeeds in the replication log. One alternative if we don't want to mv id_rsa aside is to check the sshd logs on the gitea servers as those should log the hashed pubkey value | 20:27 |
clarkb | if we don't move id_rsa aside because we're concerned about a repeat of friday that may be good enough for confirmation | 20:27 |
tonyb | I get that MINA ssh isn't openssh but is it worth doing something like: docker exec -it --user gerrit gerrit_compose_???_1 ssh -vv gitea10.opendev.org before the replication step | 20:29 |
tonyb | that'd verify that the ssh config file is correct and the key is present at each end | 20:30 |
clarkb | ya I think we can do that before we even restart | 20:30 |
clarkb | I'll do that now | 20:30 |
tonyb | is .ssh/config coming from a mounted volume? | 20:31 |
clarkb | yes | 20:31 |
tonyb | Nice | 20:31 |
clarkb | unexpectedly openssh wants me to confirm the ssh host key for gitea09 so I've ^C'd and am trying to sort out why | 20:32 |
clarkb | oh I know the port is wrong | 20:32 |
clarkb | I did `sudo docker exec -it gerrit-compose_gerrit_1 bash` then `ssh -p222 git@gitea09.opendev.org` and that returned "Hi there, gerrit! You've successfully authenticated with the key named Gerrit replication key B, but Gitea does not provide shell access." | 20:33 |
tonyb | if you do include the -vv it'll tell you the Host sections parsed and the key being presented | 20:34 |
clarkb | tonyb: ah ok I can do that again. Though gitea seems to have confirmed it used the correct key | 20:35 |
tonyb | Oh okay | 20:35 |
tonyb | .... Ah I see it there. nevermind | 20:36 |
clarkb | "/var/gerrit/.ssh/config line 1: Applying options for gitea*.opendev.org" and "identity file /var/gerrit/.ssh/replication_id_rsa_B type 0" are in the debug output | 20:36 |
tonyb | That's sounding good. | 20:37 |
clarkb | openssh at least appears to parse this the way we expect | 20:37 |
tonyb | \o/ | 20:37 |
clarkb | in that case I'm inclined to move id_rsa aside just so there is no doubt when we restart since we should be pretty confident that the new key can be used and will be used | 20:38 |
tonyb | Sure. | 20:39 |
fungi | okay, i've returned from food. need to change back into something more comfortable and will be available for gerrit work | 20:56 |
clarkb | infra-root: looking at the gitea09 backup failures: today's backup did not fail, and the failures that did happen appear to have occurred due to mysql being updated at the same time as we try to do mysqldumps | 20:56 |
clarkb | automated software updates conflicting with automated backups. The good news is we back up to two separate locations and only the one location seems to have conflicted with our mysql updates | 20:57 |
clarkb | I think this has to do with overlap in periodic job runtimes and updates upstream of us | 20:57 |
fungi | that sounds plausible | 20:57 |
clarkb | if it persists we should look at maybe removing the 02:00 to 08:00 window of time from valid hours for automated backups or something like that since that is around when periodic jobs run | 20:58 |
clarkb | fungi: see above for additional gerrit replication validation. I think we can start planning a time to restart the service | 20:58 |
clarkb | maybe 21:30 ish? | 20:59 |
fungi | yeah, already saw the ssh tests, i concur | 21:00 |
fungi | 30 minutes from now sounds great | 21:00 |
clarkb | I've started a root screen | 21:21 |
clarkb | also how does this look #status notice We are restarting Gerrit again for replication configuration updates after we failed to make them last week. Gerrit may be unavailable for short periods of time in the near future. | 21:24 |
fungi | a bit wordy. i'm not opposed, but if it were me i'd just repeat the one i sent last week for brevity | 21:25 |
fungi | attached to the screen session | 21:26 |
* clarkb looks that one up | 21:26 | |
clarkb | here it is: #status notice The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key | 21:27 |
clarkb | I'm good with that | 21:27 |
clarkb | I'm tempted to let the zuul gate clean up since a number of changes are saying they are less than a minute away | 21:28 |
clarkb | but then send that notice and proceed | 21:28 |
fungi | yeah, i don't feel like we need to apologize for the previous attempt. it's not like anybody else volunteered to take care of it | 21:29 |
fungi | takes as many tries as it takes | 21:29 |
fungi | not everything gets done right the first time around | 21:30 |
clarkb | the nova job is finishing up | 21:31 |
clarkb | so ya I can wait a couple minutes | 21:31 |
fungi | i'm in no hurry | 21:32 |
clarkb | I think enough things have happened in zuul we can proceed | 21:35 |
clarkb | I'll send the notice now | 21:35 |
clarkb | #status notice The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key | 21:35 |
opendevstatus | clarkb: sending notice | 21:35 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key | 21:35 | |
opendevstatus | clarkb: finished sending notice | 21:38 |
clarkb | I will proceed now | 21:38 |
clarkb | gerrit has been restarted. Now we need someone to push something :) | 21:40 |
fungi | i have something to push, just a sec | 21:40 |
fungi | git review took a minute | 21:41 |
fungi | https://opendev.org/openstack/election/commit/21cd3533ddfa587d16bbceed498c48029ab91fa9 | 21:41 |
fungi | that says patchset 3 which has a timestamp from now: https://review.opendev.org/c/openstack/election/+/876738 | 21:42 |
clarkb | you pushed as a wip so it didn't show up in my listing of open changes :) | 21:42 |
clarkb | the replication log appears to show replication for openstack/election completing successfully though | 21:43 |
clarkb | now to check the actual gitea content | 21:43 |
clarkb | fungi: I confirm that the latest patchset which was pushed after the restart shows up in gitea | 21:43 |
clarkb | I think we are good | 21:43 |
fungi | yep, lgtm! | 21:44 |
fungi | second time's a charm | 21:44 |
clarkb | I'm going to shutdown the screen now | 21:44 |
clarkb | I'll get a change up to remove the old key and rebase the gitea 1.21 upgrade on that | 21:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: Set both replication gitea ssh keys to the same value https://review.opendev.org/c/opendev/system-config/+/902842 | 21:47 |
JayF | Internal server error while editing a wiki page, https://home.jvf.cc/~jay/wiki-openstack-org-error-20231206.png | 21:50 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.21.1 https://review.opendev.org/c/opendev/system-config/+/897679 | 21:50 |
JayF | "Lock wait timeout exceeded" | 21:50 |
JayF | Perhaps just a DB needs a restart? | 21:51 |
JayF | I'll note that the page edit did take. | 21:51 |
clarkb | or there is lock contention for that lock for some reason | 21:52 |
JayF | No idea, but wanted to make sure it got reported since I was able to screenshot the error. I know wiki is barely supported if at all | 21:52 |
clarkb | I was able to edit https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting without an error | 21:53 |
clarkb | so it isn't representative of a general persistent problem (or at least not affecting 100% of edits) | 21:53 |
fungi | the database is remote (rackspace "trove" instance), so network timeouts for database writes aren't unheard of | 21:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM intentional gitea failure to hold a node https://review.opendev.org/c/opendev/system-config/+/848181 | 21:55 |
clarkb | I've refreshed the autoholds for ^ and I'll clean up the gerrit replication autoholds tomorrow | 21:57 |
clarkb | I'll approve the gerrit 3.9 image stuff first thing tomorrow | 22:42 |
clarkb | Will be a good conversation item for the gerrit community meeting if nothing else comes up | 22:42 |
tonyb | sounds good | 22:43 |
clarkb | now that EU and NA are both off of DST the meeting should be at 8am pacific time | 22:43 |
tonyb | that isn't terrible, but could be a pain with any school run | 22:44 |
clarkb | it also conflicts with some writing show-off thing at school, but it sounds like the kids will come home with their writing and can show it off at home at the end of the day so not a big deal | 22:45 |
tonyb | that's frustrating. but good that you have a backup | 23:16 |