opendevreview | Alberto Gonzalez proposed openstack/project-config master: Add new repository powertrain-build https://review.opendev.org/c/openstack/project-config/+/916038 | 06:49 |
---|---|---|
opendevreview | Alberto Gonzalez proposed openstack/project-config master: Add new repository powertrain-build https://review.opendev.org/c/openstack/project-config/+/916038 | 06:57 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Move vexxhost/ansible-role-frrouting to openstack namespace https://review.opendev.org/c/openstack/project-config/+/910018 | 08:22 |
opendevreview | Dmitriy Rabotyagov proposed openstack/project-config master: Move vexxhost/ansible-role-frrouting to openstack namespace https://review.opendev.org/c/openstack/project-config/+/910018 | 08:22 |
opendevreview | Merged opendev/system-config master: gitea: move robots.txt to public directory https://review.opendev.org/c/opendev/system-config/+/916420 | 12:19 |
frickler | ^^ verified new image deployed and running on all gitea instances, all serving the moved robots.txt file with updated timestamp | 12:49 |
fungi | frickler: thanks! and no more error about serving that from a legacy path, i take it? | 12:58 |
frickler | fungi: yes, that message is gone now, too, thx for the reminder to check it :) | 13:10 |
fungi | perfect! | 13:16 |
clarkb | thank you! | 15:04 |
clarkb | infra-root: if you have time over the next few hours please review the project rename plan etherpad and related gerrit changes: https://etherpad.opendev.org/p/opendev-project-renames-20240422 (change links are in the etherpad) | 15:05 |
clarkb | I need to copy the renames.yaml file to bridge but otherwise I think everything should be prepped unless you find concerns or issues | 15:05 |
clarkb | we will begin renaming in just under 5 hours. cc noonedeadpunk | 15:09 |
clarkb | ok I've created the little staging dir on bridge and copied the rename yaml file into it | 15:32 |
clarkb | that was the last todo I had in terms of prep other than review | 15:33 |
frickler | is it a known issue that the gitea links on the gerrit repo tags page are wrong? like https://opendev.org/openstack/sushy-oem-idrac/src/tag/refs/tags/5.0.0 instead of https://opendev.org/openstack/sushy-oem-idrac/src/tag/5.0.0 ? | 16:03 |
frickler | (on https://review.opendev.org/admin/repos/openstack/sushy-oem-idrac,tags) | 16:03 |
clarkb | no that is news to me. `tag = ${project}/src/tag/${tag}` is what we have in gerrit's config that sets that | 16:04 |
clarkb | we probably need a different ${tag} variable that doesn't include the refs/tags/ prefix | 16:05 |
clarkb | https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#gitweb indicates there isn't currently a different option there | 16:06 |
* noonedeadpunk around | 16:18 | |
clarkb | noonedeadpunk: mostly just a heads up that we plan to do the rename later today and to double check things. I had to rebase your change to address merge conflicts | 16:20 |
noonedeadpunk | yeah, thanks a lot for that | 16:20 |
fungi | rename will be happening in a little over 3.5 hours | 16:21 |
clarkb | I just checked ssh (not from bridge but from my local network so maybe not the best test) and all hosts are up. I however cannot ssh into storyboard-dev01 with my new key because it is in the emergency file | 16:25 |
clarkb | the playbook does not honor the emergency list which means we'll rename things there anyway. fungi you're on the emergency file entry comment not sure if that is a problem for you | 16:25 |
fungi | i think it's fine, we've had it in there for 4 years already since the attachments work stalled on patches that were behind the upgrade to python 3 | 16:27 |
clarkb | ack | 16:28 |
fungi | if the playbook didn't respect the emergency file the last dozen times we ran it, then it's probably still fine | 16:28 |
fungi | also it doesn't appear to have used or be using storyboard anyway, so i don't expect it's going to end up doing any update queries there | 16:29 |
fungi | the ansible-role-frrouting project i mean | 16:29 |
clarkb | even better | 16:30 |
fungi | maintenance plan lgtm. i added the urls to test and updated the patchset number in the submit command | 17:12 |
fungi | i'm also +2 on both the changes | 17:12 |
clarkb | thanks! | 17:12 |
clarkb | I added a #status notice message to step three in the etherpad | 18:13 |
clarkb | feel free to edit with more or less info | 18:13 |
clarkb | I'm going to try and grab early lunch in a little bit. I'll stick servers in the emergency file before I do that (so that step will get done a little early) | 18:22 |
fungi | sounds good | 18:29 |
clarkb | fungi: so I can't ssh to storyboard-dev01 from bridge due to ssh keys not matching | 18:36 |
clarkb | fungi: I suspect that fixing that will be quicker than landing a change to remove storyboard-dev01 from the playbook | 18:36 |
clarkb | as an alternative we could use a --limit on the ansible-playbook command | 18:36 |
clarkb | fungi: any chance you can look at storyboard-dev01 and see if we can/should fix ssh from bridge to it and if not then we can fall back to the limit I think | 18:38 |
fungi | clarkb: yeah, i'm trying | 18:45 |
clarkb | ok you must've just fixed it because I'm seeing authorized_keys file match between storyboard and storyboard-dev | 18:45 |
clarkb | interestingly it still fails | 18:45 |
fungi | the "from" restriction was for the old bridges, yeah | 18:46 |
fungi | i updated that but it doesn't seem to have helped | 18:46 |
clarkb | I think sshd_config is preventing it too | 18:47 |
clarkb | I'm somewhat inclined to say maybe we use the --limit since ansible hasn't hopped onto this host in ever | 18:47 |
fungi | /var/log/auth.log records it as refused, yeah | 18:47 |
clarkb | and then we remove the dev server from the rename playbook | 18:47 |
clarkb | or we use a local checkout of system-config and update the playbook to remove the one play for storyboard-dev instead of using --limit | 18:48 |
clarkb | because who knows what else may fail if we try to connect to this server with ansible | 18:48 |
fungi | the reason we have been renaming there is that manage-projects still tries to update it too | 18:48 |
fungi | so we'd need to remove it there first | 18:49 |
clarkb | I'm having a hard time understanding how this ever worked with the rename playbook though | 18:49 |
clarkb | also I don't see where manage-projects runs on storyboard-dev | 18:49 |
clarkb | the manage-projects.yaml playbook runs against gitea and gerrit | 18:50 |
fungi | okay, got it fixes for now | 18:51 |
fungi | er, fixed | 18:51 |
fungi | if a project specified use-storyboard, manage-projects added it to storyboard and storyboard-dev when it didn't exist. maybe we've already pared that down? | 18:52 |
fungi | and yeah, we had a similar ip address restriction in /etc/ssh/sshd_config that also needed updating | 18:52 |
fungi | followed by an ssh service restart | 18:53 |
clarkb | fungi: looks like puppet runs jeepyb things but since storyboard-dev is in emergency and has been for some time puppet won't run there | 18:53 |
clarkb | so ya it just never touches storyboard-dev I don't think | 18:53 |
fungi | yeah, so probably fine to remove | 18:54 |
clarkb | right but what do we want to do for this run | 18:54 |
clarkb | because I don't think we can land a change to remove it from the playbook quickly enough | 18:54 |
fungi | it should be fine anyway now that bridge can ssh again | 18:54 |
clarkb | I think our options are let it run as is and see what breaks if anything, use --limit to exclude that server, or use a copy of the playbook in a checkout of the repo with the playbook edited to remove that play | 18:55 |
clarkb | sounds like you think we should go ahead with option 1 I guess I'm good with that and we can change our approach if that fails | 18:55 |
clarkb | fungi: oh actually I think it will fail because that repo is unlikely to be in storyboard-dev at all | 18:55 |
clarkb | so the sql update won't have any rows to update? or maybe that isn't a failure with mysql | 18:56 |
clarkb | ya actually that should be a non error right? it will just update zero rows? | 18:56 |
fungi | it's not a failure with sql update queries. will just match 0 rows | 18:56 |
fungi | and therefore no-op | 18:56 |
clarkb | ok then ya I think we can proceed with the playbook as is and address it if there is a problem | 18:56 |
clarkb | in that case steps 2 and 2.5 are both done, but feel free to double check | 18:57 |
clarkb | and I'm going to eat a sandwich | 18:57 |
clarkb | I put a clone of system-config in our tmpdir for the rename just in case we end up needing a place to edit that playbook | 19:08 |
clarkb | I've started a screen and asked it to log into our working dir | 19:31 |
clarkb | the timestamp on the rename_repos.yaml playbook on bridge is quite a bit newer than I expected. However the content looks correct to me | 19:34 |
fungi | timestamp is when you did the git clone, i think? | 19:36 |
clarkb | fungi: I mean on the file in /home/zuul/.../etc | 19:38 |
clarkb | not the new clone I made | 19:38 |
clarkb | but ya maybe that aligns with zuul jobs updating the repo | 19:38 |
clarkb | With 10 minutes to go before our announced window I guess now is a good time to double check that we're still happy to proceed? and if so any updates to make to the proposed #status notice message? | 19:49 |
fungi | everything lgtm including the status notice | 19:52 |
clarkb | great we'll get started in a few minutes then | 19:53 |
fungi | though if you wanted you could also append https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/KP6NCOKJEYRGFD5FS26CZPVLEKFSY2ZO/ for reference | 19:53 |
fungi | i always like doing that because it serves as a subtle reminder that you can get advance warning of this stuff if you subscribe to the service-announce ml | 19:54 |
clarkb | problem is those links are so long now :) | 19:56 |
clarkb | but still within the irc message limit so ya I'll do that. | 19:56 |
clarkb | ok it is 2000 UTC. I'm going to send the notice now | 20:00 |
clarkb | #status notice Gerrit will be offline for a short time while we rename a project repo. https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/KP6NCOKJEYRGFD5FS26CZPVLEKFSY2ZO/ for more details | 20:00 |
opendevstatus | clarkb: sending notice | 20:00 |
fungi | thanks! | 20:00 |
-opendevstatus- NOTICE: Gerrit will be offline for a short time while we rename a project repo. https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/KP6NCOKJEYRGFD5FS26CZPVLEKFSY2ZO/ for more details | 20:00 | |
clarkb | fungi: I put the command in screen commented out then ls'd the file paths just to make sure all is well. I guess I go ahead and run the command as soon as the bot is done sending the notice? | 20:02 |
opendevstatus | clarkb: finished sending notice | 20:03 |
clarkb | fungi: you good with me running the playbook now? | 20:03 |
clarkb | oops looks like you had it in copy mode. sorry if I dropped it out of that | 20:04 |
fungi | this is the screen session on bridge? | 20:04 |
clarkb | yes | 20:04 |
fungi | not seeing the commented command | 20:04 |
clarkb | it's a few lines up | 20:04 |
clarkb | I did two ls's and a cat after it | 20:04 |
fungi | aha the ansible-playbook command | 20:05 |
fungi | yep, lgtm | 20:05 |
clarkb | yes, that is what will actually do the rename | 20:05 |
clarkb | ok I'm proceeding to invoke that command now | 20:05 |
fungi | tonyb: ^ you had mentioned being around for this, so fyi | 20:06 |
fungi | lookin' good so far | 20:07 |
clarkb | ok playbook is done with no errors | 20:10 |
fungi | can confirm, yes | 20:12 |
clarkb | everything I've checked so far lgtm | 20:12 |
fungi | i see a working redirect in gitea | 20:12 |
clarkb | those three links you have for example as well as general gerrit web ui | 20:12 |
fungi | gerrit project link has content, yes | 20:12 |
clarkb | I'm going to look at gerrit queues next then we can decide if we want to force merge the project-config change with or without hosts in the emergency file | 20:13 |
fungi | old project link in gerrit is a 404 too | 20:13 |
fungi | and changes have been moved according to the query url | 20:13 |
fungi | query for the old url turns up an empty list | 20:13 |
clarkb | show queue doesn't show any replication tasks. I think one of the reasons we were particularly concerned about replication and letting jobs run and then noop is that gerrit would replicate all the things on startup then anything that merged would be behind that list | 20:14 |
clarkb | since gerrit doesn't do that anymore and I've just confirmed that there is nothing in the replication queue I'm reasonably happy to see what happens if we force merge with hosts outside of the emergency file | 20:15 |
clarkb | maybe we wait for the index queue to complete first though? | 20:15 |
fungi | yeah, i think we can | 20:16 |
clarkb | we're down to 1939 indexing tasks and had 2173 when I first checked | 20:16 |
fungi | the only real risk is if we have in-flight project changes that could merge in the interim and rerun manage-projects with the wrong state, i think? | 20:17 |
clarkb | fungi: yes | 20:17 |
fungi | oh, though it should result in skipping the servers in emergency too | 20:17 |
clarkb | right the risk begins when we remove the servers from the emergency file if we want to land the change and have it run normally to noop | 20:17 |
clarkb | I have not removed any servers from the emergency file yet so we're good for now | 20:18 |
fungi | and the waiting change should also be a no-op since we made the updates out of band to reflect its intended eventual state | 20:18 |
clarkb | exactly | 20:18 |
clarkb | I think what we can do is wait for gerrit to settle the index queue (just so there isn't a bunch of stuff fighting for cpu time), then remove hosts from the emergency file and force merge the project-config change. Then check that it replicates and applies normally | 20:18 |
clarkb | now 1741 indexing tasks. It seems to be moving fairly quickly | 20:19 |
fungi | it's way faster than it used to be | 20:19 |
clarkb | ya they chunk it up by estimated work now rather than by naive project order | 20:20 |
clarkb | and they can split larger projects into multiple tasks so each task has a ceiling of effort (roughly) | 20:20 |
clarkb | results in much better scheduling of effort | 20:20 |
clarkb | fungi: are you good with my proposed plan or do you think we should force merge with things still in emergency then check replication and then either reenqueue the manage-project jobs or just wait for the daily run later today? | 20:21 |
clarkb | but ya thinking back on it the main concern was gerrit replication could lag because gerrit would replicate everything on startup. It doesn't do that anymore so I don't think we need to go through that crazy dance | 20:22 |
fungi | i agree we should be able to simplify this | 20:27 |
fungi | i suppose if there are multiple rename changes then we need to keep the emergency file set for all but the last one yeah? | 20:28 |
clarkb | cool in that case we wait for a few more minutes for reindexing to end and then plan the force merge | 20:28 |
clarkb | fungi: ++ | 20:28 |
clarkb | down to 853 now | 20:28 |
clarkb | down to 104 | 20:33 |
clarkb | fungi: do you want to escalate privs to force merge or should I? | 20:33 |
clarkb | you reviewed the change and I did a rebase of it. Might be good to have you do the merging? | 20:33 |
clarkb | and I'll let you know when the emergency file is cleaned up and that can happen | 20:33 |
fungi | can do | 20:34 |
fungi | technically the pad is missing the gerrit set-members --add command | 20:35 |
clarkb | it's on line 43 | 20:35 |
fungi | oh, i missed it because of the optional step. never mind! | 20:36 |
clarkb | it also used the wrong hostname for the vote setting command. I've just fixed that | 20:36 |
clarkb | reindexing is done. I'll remove hosts from the emergency file now | 20:38 |
clarkb | fungi: that is done. I did a cat of the file in the screen to confirm it too | 20:39 |
clarkb | I think you can merge the change when you are ready | 20:39 |
opendevreview | Merged openstack/project-config master: Move vexxhost/ansible-role-frrouting to openstack namespace https://review.opendev.org/c/openstack/project-config/+/910018 | 20:40 |
fungi | clarkb: tonyb: ^ | 20:40 |
clarkb | thanks replication happened almost instantly and the change shows up at https://opendev.org/openstack/project-config/commits/branch/master | 20:41 |
clarkb | the manage-projects job is running now and should noop, once we've confirmed that we can land the recording change and I think we're done | 20:41 |
clarkb | the playbook is done and it ran quickly enough that I suspect it did noop | 20:44 |
clarkb | checking gitea and gerrit links again | 20:44 |
fungi | yeah, looks good to me still | 20:46 |
clarkb | ya the gitea redirect is still there. In gerrit I get an error trying to open the vexxhost project name and if I search by the vexxhost project name I get no changes and that name doesn't show in the autocomplete list | 20:46 |
fungi | so it didn't recreate the old project, which is the biggest concern | 20:46 |
clarkb | correct | 20:46 |
fungi | or would be the biggest concern if it had, i mean | 20:46 |
clarkb | merging https://review.opendev.org/c/opendev/project-config/+/916323 is the last thing in the todo list | 20:47 |
clarkb | which can happen normally | 20:47 |
clarkb | I also made a note of what we did at the end of the etherpad | 20:47 |
clarkb | I have self approved 916323 | 20:48 |
fungi | oh, i also approved it | 20:48 |
opendevreview | Merged opendev/project-config master: Add record for planned rename on April 22, 2024 https://review.opendev.org/c/opendev/project-config/+/916323 | 20:48 |
clarkb | fungi: should I go ahead and exit the screen as well? | 20:48 |
fungi | sure | 20:48 |
clarkb | the log is in our working dir so we've got that captured if necessary | 20:48 |
clarkb | I'm going to try and take a break for a bit as I ended up eating a sandwich in about 5 minutes and then getting back to my office for lunch. But I think this can be considered done and seems to have gone well. When I get back I'll put our weekly meeting agenda together. Now is a good time to add items if you have them | 20:49 |
fungi | yeah, gonna warm up some leftovers for dinner | 20:54 |
clarkb | looks like fungi added an agenda item. I added a couple and pruned some others. I'll get this sent out at about 23:00 UTC. | 22:14 |
fungi | cool, thanks! | 22:18 |
clarkb | fungi: if you have time I did end up pushing two changes for mailman web hosting to add a robots.txt and also add the UA filtering | 22:19 |
clarkb | I think we can probably land those changes alongside the zuul changes that do similar if/when you are happy with them | 22:19 |
clarkb | there is also the glean change for python3.12 support https://review.opendev.org/c/opendev/glean/+/915907 | 22:19 |
fungi | i thought i reviewed them but will double-check | 22:20 |
opendevreview | Merged opendev/glean master: Use importlib when pkg_resources isn't available https://review.opendev.org/c/opendev/glean/+/915907 | 23:20 |
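
Note on the 18:55-18:56 exchange above: the point that an SQL UPDATE matching zero rows is a no-op rather than an error can be checked with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for the MySQL database discussed in the log, and the `projects` table, its `name` column, the `rename_project` helper, and the `openstack/example` row are all hypothetical, not the real StoryBoard schema or the actual rename playbook query.

```python
import sqlite3


def rename_project(conn, old_name, new_name):
    """Run a rename-style UPDATE and return how many rows it changed."""
    cur = conn.execute(
        "UPDATE projects SET name = ? WHERE name = ?",
        (new_name, old_name),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical in-memory table standing in for a project list.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects (name) VALUES (?)", ("openstack/example",))
conn.commit()

# The renamed repo was never added to this table, so nothing matches.
# No exception is raised; the statement simply changes zero rows.
missing = rename_project(
    conn,
    "vexxhost/ansible-role-frrouting",
    "openstack/ansible-role-frrouting",
)

# For contrast, a rename of a row that does exist changes one row.
present = rename_project(conn, "openstack/example", "openstack/example-renamed")

print(missing, present)
```

A caller that wants to treat "nothing renamed" as a problem has to inspect the returned row count itself, since the database will not raise on its own.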