fungi | when lack of features is the best possible feature, there's vi | 00:16 |
---|---|---|
clarkb | I'm trying to remember what does it, but vi definitely doesn't invoke all my vimrc stuff. Maybe that's a built-in behavior | 00:18 |
ianw | while i'm thinking of it from the meeting, it looks like we don't account for project.meetings in the afs graph | 00:21 |
ianw | this is another todo that would be a good intro to several services. we should have something that generates the grafana graph from an afs volume listing | 00:22 |
ianw | we've also missed mirror.deb-quincy and mirror.logs | 00:25 |
fungi | yeah, i fear it's a manually curated list in a file buried somewhere in system-config at the moment | 00:33 |
ianw | it's https://opendev.org/openstack/project-config/src/branch/master/grafana/afs.yaml | 00:34 |
ianw | the tool that puts the stats into graphite is https://opendev.org/opendev/afsmon | 00:35 |
opendevreview | Ian Wienand proposed opendev/system-config master: launch: fix RAX rdns command-line tool https://review.opendev.org/c/opendev/system-config/+/880785 | 00:52 |
opendevreview | Ian Wienand proposed opendev/system-config master: launch : add debug flag https://review.opendev.org/c/opendev/system-config/+/880786 | 00:52 |
opendevreview | Merged opendev/zone-opendev.org master: Add DNS servers for Ubuntu Jammy refresh https://review.opendev.org/c/opendev/zone-opendev.org/+/880576 | 02:16 |
opendevreview | Merged opendev/system-config master: opendev.org: remove Twitter link https://review.opendev.org/c/opendev/system-config/+/880570 | 03:35 |
opendevreview | Merged opendev/system-config master: opendev.org : update mailman links https://review.opendev.org/c/opendev/system-config/+/880571 | 03:39 |
opendevreview | Merged openstack/project-config master: Add TC repos in gerritbot https://review.opendev.org/c/openstack/project-config/+/880235 | 04:46 |
opendevreview | Merged openstack/project-config master: Add Dell Storage App to StarlingX https://review.opendev.org/c/openstack/project-config/+/879744 | 04:49 |
opendevreview | Ian Wienand proposed openstack/project-config master: Indent Gerrit ACL options https://review.opendev.org/c/openstack/project-config/+/879906 | 05:05 |
opendevreview | Ian Wienand proposed openstack/project-config master: gerrit/acls : add NO_CODE_CHANGE https://review.opendev.org/c/openstack/project-config/+/880115 | 05:05 |
opendevreview | Ian Wienand proposed openstack/project-config master: acl : remove NO_CODE_CHANGE from Allow-Post-Review https://review.opendev.org/c/openstack/project-config/+/880792 | 05:05 |
*** amoralej|off is now known as amoralej | 07:20 | |
opendevreview | waleed mousa proposed openstack/diskimage-builder master: Create a wildcard InfiniBand connection profile for IB interfaces https://review.opendev.org/c/openstack/diskimage-builder/+/880567 | 07:51 |
*** amoralej is now known as amoralej|lunch | 11:05 | |
*** amoralej|lunch is now known as amoralej | 12:22 | |
*** Guest9750 is now known as atmark | 14:32 | |
clarkb | I'm double checking etherpad things, and the db dumps appear to drop the store table if it exists, which should wipe out my test data. This is a good thing. I was worried for a second that I might have to clean things up manually after testing, but this doesn't appear to be the case | 16:08 |
clarkb | There are some local package updates that I may as well apply (and reboot for) before this server goes into production. Working on that now | 16:08 |
clarkb | ok that's all done, and the server is up and running again and looks good when I test with an overridden /etc/hosts | 16:20 |
clarkb | The only other real prep work I can think of is getting reviews on https://review.opendev.org/c/opendev/zone-opendev.org/+/880168, which is the DNS update we will merge when we shut down services for database dumping | 16:24 |
clarkb | apparently that change is merge conflicted too /me fixes it | 16:51 |
clarkb | ah the new ns servers change landed | 16:51 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Update etherpad.o.o to point at etherpad02 https://review.opendev.org/c/opendev/zone-opendev.org/+/880168 | 16:55 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Cleanup etherpad DNS records https://review.opendev.org/c/opendev/zone-opendev.org/+/880169 | 16:55 |
clarkb | fungi: ^ sorry about that if you have another chance to take a look | 16:55 |
fungi | aha | 16:55 |
*** amoralej is now known as amoralej|off | 16:57 | |
clarkb | I've been staring at java this morning and I think the replication stuff might be two separate issues. The first is the error log being spammed because renaming task files from waiting/ to running/ fails for tasks that don't exist. I suspect this is possibly a race between different threads trying to handle tasks at the same time? Then separately we have the leaking | 18:08 |
clarkb | waiting/ tasks that never get processed | 18:08 |
clarkb | The reason I think the leaks are separate is that the task uuids for leaked files don't appear to show up in our error logs | 18:09 |
clarkb | they just sit there forever for some reason rather than getting handled | 18:09 |
clarkb | aha yes, two different ChainedScheduler.StreamScheduler<>()s are created because ReplicationQueue() calls synchronizePendingEvents() twice: once at start() and again at run() | 18:14 |
clarkb | I think for the errors moving files we may have a race between those two threads | 18:15 |
fungi | oh that's fun | 18:19 |
fungi | speaking of fun, i'm heading out to an early wedding anniversary dinner with christine, but am planning to return well before the start of the etherpad maintenance window | 18:19 |
fungi | bbiaw | 18:20 |
clarkb | now that I've said that, the streams seem to list the contents of the waiting/ directory once and process them entirely, then the thread should die | 18:33 |
clarkb | so that may result in oddness at startup but I wouldn't expect to continue seeing the errors (which we do but at a lower rate than at startup) | 18:34 |
clarkb | this plugin is quite complicated too | 18:34 |
slittle1 | Please set me up as first core member of starlingx-app-dell-storage-core. As usual I'll add the others. | 18:48 |
clarkb | slittle1: I'm popping out for lunch but can take a look after | 19:04 |
frickler | the warning at the end of https://review.opendev.org/c/openstack/requirements/+/879743 might be interesting: Job openstack-tox-pep8: unable to map line for file comments: stderr: 'fatal: file update.py has only 1 lines' | 19:19 |
frickler | the reason for that, as I noticed while fixing that pep8 issue, is that the file is indeed a symlink to another file in the same repo | 19:19 |
frickler | maybe someone wants to look into adding some special handling for that case | 19:20 |
frickler | (https://review.opendev.org/c/openstack/requirements/+/880884 is the fix for the underlying issue, which was triggered, or rather brought to light, by a new hacking release) | 19:21 |
slittle1 | Please set me up as first core member of starlingx-app-dell-storage-core. As usual I'll add the others. | 20:08 |
clarkb | slittle1: done | 20:26 |
clarkb | I'm finding that the replication plugin logging is really confusing | 21:11 |
clarkb | NOT_ATTEMPTED in the logs means "I'm going to get to it in just a second most likely" | 21:11 |
clarkb | but heavily implies "I'm not doing this at all" | 21:11 |
fungi | i'm back, just catching up | 21:24 |
ianw | clarkb: sorry, catching up, but we have two distinct messages in the error log spew? | 21:26 |
clarkb | ianw: two distinct behaviors I think. One is the leaking and the other is the error spew | 21:30 |
clarkb | basically I don't know what causes those errors and it doesn't seem to be directly related to the leaked files | 21:31 |
clarkb | I am 95% certain I understand the leaking of All-Projects and All-Users task files now, but not the user edit refs (I think I have a sense of what happens, but I can't trace it in the code yet) | 21:31 |
clarkb | In the case of All-Projects and All-Users we hit https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/java/com/googlesource/gerrit/plugins/replication/Destination.java#419 which means we never get to line 440 which means we never schedule the activity that will ultimately notifyFinished which cleans up the files on disk | 21:34 |
clarkb | In the case of user edits I suspect what is happening is that the filter here: https://gerrit.googlesource.com/plugins/replication/+/refs/heads/master/src/main/java/com/googlesource/gerrit/plugins/replication/Destination.java#401 is modifying the refs list that gets scheduled, which causes us to have a new hash on disk which does get cleaned up. But because the hash is different it does not | 21:35 |
clarkb | cleanup the original file on disk that triggered things in the first place | 21:35 |
clarkb | Where I'm lost on this second thing is that I don't see where we create the new content on disk for that to work | 21:36 |
clarkb | I need to put this down now though in order to focus on etherpad stuff | 21:43 |
clarkb | I'm going to get ready for that now. I've been trying to take notes in the source directly so I can git diff my way to further understanding | 21:44 |
clarkb | I've put etherpad01 and etherpad02 in the emergency file per https://paste.opendev.org/show/brRuhPssVLSi4UnF5hcN/. Everything else needs to wait until 22:00 | 21:47 |
clarkb | re gerrit I'm probably going to need to draw out the actual flow of events. The problem is I don't understand the scheduler and event system well enough to do that yet. But I'm sure I'll get there | 21:54 |
fungi | 4 minutes to maintenance. should we #status notice it? | 21:55 |
clarkb | sure | 21:56 |
fungi | status notice The Etherpad service on etherpad.opendev.org will be offline for the next 90 minutes for a server replacement and operating system upgrade | 21:56 |
fungi | something like that? | 21:56 |
clarkb | sounds great to me | 21:56 |
fungi | #status notice The Etherpad service on etherpad.opendev.org will be offline for the next 90 minutes for a server replacement and operating system upgrade | 21:57 |
opendevstatus | fungi: sending notice | 21:57 |
-opendevstatus- NOTICE: The Etherpad service on etherpad.opendev.org will be offline for the next 90 minutes for a server replacement and operating system upgrade | 21:57 | |
clarkb | I've stopped services on 02 since those don't actually matter yet | 21:58 |
opendevstatus | fungi: finished sending notice | 21:59 |
fungi | cool | 22:00 |
clarkb | ok it is also time now. I will stop services on etherpad01 now | 22:00 |
clarkb | fungi: want to approve https://review.opendev.org/c/opendev/zone-opendev.org/+/880168 ? | 22:01 |
fungi | on it | 22:01 |
fungi | and done | 22:02 |
clarkb | and maybe check that etherpad looks down to you in your browser? | 22:02 |
clarkb | it should be down on both servers now | 22:02 |
clarkb | I will start the db dump if that looks good to you | 22:02 |
ianw | down for me | 22:02 |
clarkb | ok proceeding with the dump. This will take about half an hour | 22:03 |
opendevreview | Merged opendev/zone-opendev.org master: Update etherpad.o.o to point at etherpad02 https://review.opendev.org/c/opendev/zone-opendev.org/+/880168 | 22:03 |
clarkb | the dump is being written to my homedir on etherpad01 and should end up just under 4GB | 22:04 |
clarkb | you can track rough progress this way | 22:04 |
clarkb | for anyone following along on the paste we are on line 10 | 22:06 |
clarkb | we are at 2GB and about halfway through 30 minutes so I think we are right on time | 22:16 |
fungi | awesome | 22:16 |
fungi | we're coming up on completion of this phase in the next few minutes, presumably | 22:27 |
clarkb | yes up to 3.6GB should be just a few more minutes | 22:27 |
clarkb | DNS is showing updated for me now too | 22:27 |
fungi | same, but it was updated shortly after deployment for me | 22:28 |
fungi | ttl was only 5m anyway | 22:28 |
clarkb | I only just now remembered to check :) | 22:28 |
clarkb | should've put that on my list | 22:29 |
ianw | luckily we cnamed etherpad.openstack.org -> opendev.org, so that's good too | 22:30 |
clarkb | its down. Chowning and copying now | 22:30 |
clarkb | *its done | 22:30 |
fungi | yat! | 22:31 |
fungi | yay too | 22:31 |
fungi | copying looks done | 22:34 |
fungi | checksum comparison of the db dump on both servers lgtm | 22:35 |
clarkb | yup my sha256sums match up | 22:35 |
clarkb | I'm actually going to skip the step on line 19. That was more for when I was testing, and in theory it matters less now since we are moving off of that server, so limited disk is ok | 22:35 |
clarkb | that means next step is to restore on etherpad02. I'm doing this next | 22:36 |
fungi | yep | 22:36 |
clarkb | This is another half-hour-or-so step, and its progress is harder to track, so we just get to be patient | 22:39 |
clarkb | I think /var/etherpad/db will be at like 27 or 29 GB usage when the restore is done. Currently at 4ish | 22:41 |
JayF | So, I just went to do some looking at the status of virtualpdu. It looks like the redirect has been removed (https://github.com/openstack/virtualpdu) but there is no content sync'd into the repo | 22:56 |
JayF | still not a hugely urgent issue, but just followed up on it today to find this state :) | 22:57 |
clarkb | JayF: does the repo run the github synchronization job? | 22:57 |
clarkb | and if so have you merged any changes? | 22:57 |
ianw | JayF: you'll have to merge something with the sync job, I'd say | 22:57 |
JayF | I don't know (would assume yes), and no. | 22:57 |
JayF | So basically merge a noop change to virtualpdu and we should be good? | 22:57 |
clarkb | check that the job is applied to it in openstack/project-config first | 22:58 |
clarkb | but yes | 22:58 |
ianw | yeah, you can see the jobs @ https://zuul.openstack.org/project/opendev.org/openstack/virtualpdu | 23:00 |
ianw | post jobs - openstack-upload-github-mirror | 23:00 |
ianw | i guess there is some chance given this was retired and moved around it may not work, but we can debug that if it happens | 23:01 |
JayF | What about for bot configs? Not seeing updates for virtualpdu in #openstack-ironic | 23:01 |
JayF | I assume that's in git somewhere, too | 23:01 |
fungi | gerritbot config is in openstack/project-config yes | 23:02 |
JayF | ack, will check that now | 23:02 |
clarkb | up to 23 GB shouldn't be long now | 23:03 |
opendevreview | Jay Faulkner proposed openstack/project-config master: Enable IRC notifications for virtualpdu for ironic https://review.opendev.org/c/openstack/project-config/+/880895 | 23:04 |
ianw | clarkb: if you're happy with https://review.opendev.org/c/openstack/project-config/+/880792 i can merge that NO_CHANGE stack soon | 23:06 |
ianw | that removes it from the post-review label as discussed. i couldn't see any others i thought needed it explicitly removed | 23:06 |
clarkb | ianw: did we want to do the indent change first? I was thinking the other way around because landing the indent change would require manually running manage-projects since every file is affected | 23:08 |
clarkb | db restore is done | 23:09 |
clarkb | are we ready to start up services on etherpad02? | 23:09 |
ianw | i could go either way -- my thought was to get the config options in sync, despite the indenting moving about | 23:09 |
clarkb | the restore command exited 0. I can't really think of a good way to check things without starting the service, so I think I'm starting it now | 23:10 |
ianw | there seemed to be some desire to make the normalise process a bit more explicit before requiring indenting | 23:10 |
fungi | i'm ready | 23:10 |
ianw | i've reloaded some etherpads i've got open and LGTM! | 23:11 |
clarkb | ya the zuul user survey feedback pad has the new stuff fungi added to it since I did the test restore on etherpad02 implying that the restore dropped and pulled in new content (the disk usage listings by df also implied this) | 23:12 |
clarkb | https://etherpad.opendev.org/p/jammy-server-test-pad seems to work with basic stuff | 23:13 |
fungi | yeah, i've reconnected to pads i was using earlier today and they seem fine | 23:14 |
clarkb | infra-root are we happy enough with it to remove etherpad02 from the emergency file? If so I think that is the last step for today. Then tomorrow we can land the change to remove etherpad01 from dns and I need to push a change to remove it from system-config as well | 23:15 |
clarkb | assuming everything remains happy | 23:15 |
fungi | i'm happy with it, and still around for a while if there are problems | 23:15 |
clarkb | ok etherpad02 is no longer in the emergency file but 01 is | 23:16 |
clarkb | I'll work on the change to remove etherpad01 from system-config now so that is ready to merge whenever we're happy | 23:16 |
clarkb | never mind, I was good and already took care of it https://review.opendev.org/c/opendev/system-config/+/880087 | 23:17 |
clarkb | reviews very much welcome and ya if things look good tomorrow then maybe we can land ^ and https://review.opendev.org/c/opendev/zone-opendev.org/+/880169 | 23:18 |
fungi | it usually takes me spotting my earlier change in the gerrit conflicts list to realize i've just redone my own work | 23:18 |
clarkb | hrm is meetpad not showing the shared document? I wonder if this could be java's bad dns caching | 23:21 |
fungi | may need jitsi service restarts? | 23:22 |
clarkb | ya could be | 23:23 |
clarkb | I can try that | 23:23 |
fungi | i'll admit i didn't think to test it | 23:23 |
clarkb | it isn't something we can easily test pre-move iirc because it does the proxying in the host itself | 23:24 |
clarkb | so we kinda have to take a leap of faith that if everything else works it should too which is what we did :) | 23:25 |
opendevreview | Merged openstack/project-config master: Enable IRC notifications for virtualpdu for ironic https://review.opendev.org/c/openstack/project-config/+/880895 | 23:25 |
clarkb | it seems to work now after restarting the jitsi meet services. I suspect the issue was that the old IP address was cached and is no longer accessible | 23:26 |
fungi | sounds about right | 23:28 |
* clarkb makes a note to check that backups are backing up what we want from etherpad02 tomorrow as well | 23:28 | |
clarkb | the cronjobs are installed, mostly a matter of mounting the backups and checking content looks good | 23:28 |
clarkb | why is opendevreview dropping? | 23:29 |
clarkb | gerritbot container restarted according to docker ps -a | 23:31 |
clarkb | it should be fine now I think unless something is very broken and I'm out of steam to deal with very broken for that bot right now :) | 23:31 |
fungi | it restarted for the config update in 880895 deploying just now | 23:32 |
clarkb | aha | 23:32 |
clarkb | #status log Moved the etherpad service from etherpad01.opendev.org to etherpad02.opendev.org | 23:32 |
fungi | nothing to see here, move along | 23:32 |
opendevstatus | clarkb: finished logging | 23:32 |
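
ianw's 00:22 suggestion, generating the AFS Grafana graph from a volume listing instead of the hand-curated grafana/afs.yaml, starts with knowing which volumes the dashboard is missing (project.meetings, mirror.deb-quincy and mirror.logs in this case). Below is a minimal Python sketch of that comparison. It assumes you already have the volume names (for example parsed from OpenAFS `vos listvol` output) and the set of names the dashboard graphs. The function name and the `mirror.ubuntu` example volume are hypothetical, and nothing here reflects how afsmon or afs.yaml actually structure their data.

```python
def missing_volumes(volume_listing, graphed):
    """Return AFS volumes that exist but are not on the dashboard, sorted.

    Read-only (.readonly) and backup (.backup) clones are folded into their
    read-write parent so each volume is reported once.
    """
    base_names = set()
    for name in volume_listing:
        for suffix in (".readonly", ".backup"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        base_names.add(name)
    return sorted(base_names - set(graphed))


if __name__ == "__main__":
    # Volume names from the discussion; listing and graphed set are illustrative.
    listing = [
        "mirror.ubuntu", "mirror.ubuntu.readonly",
        "mirror.deb-quincy", "mirror.deb-quincy.readonly",
        "mirror.logs", "project.meetings",
    ]
    print(missing_volumes(listing, graphed={"mirror.ubuntu"}))
    # prints ['mirror.deb-quincy', 'mirror.logs', 'project.meetings']
```

A full generator would then emit one graph target per name; that step depends on the afs.yaml schema and is left out here.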
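
clarkb's 16:08 check, that the db dump drops the `store` table before recreating it so the test data gets wiped, can be scripted. This is a minimal sketch assuming the dump was produced by mysqldump (the log doesn't name the tool). mysqldump writes `DROP TABLE IF EXISTS` ahead of each `CREATE TABLE` when `--add-drop-table` is in effect, and that option is part of the default `--opt` group. The function name is hypothetical.

```python
import re

# Matches e.g. "DROP TABLE IF EXISTS `store`;" at the start of a line.
DROP_RE = re.compile(r"^DROP TABLE IF EXISTS `?(\w+)`?;")


def dropped_tables(dump_path):
    """Return the table names a SQL dump drops, in file order."""
    tables = []
    with open(dump_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:  # stream line by line; the real dump is ~4GB
            match = DROP_RE.match(line)
            if match:
                tables.append(match.group(1))
    return tables
```

Usage is a one-liner such as `assert "store" in dropped_tables(path_to_dump)`, where `path_to_dump` is wherever the dump was written.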
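
clarkb's 18:14 hypothesis is that `synchronizePendingEvents()` runs twice, so two schedulers each walk `waiting/` and race to rename the same task files into `running/`. The Python simulation below only illustrates that failure mode under stated assumptions (each scheduler lists the directory once, then renames each entry). It is not the replication plugin's Java code, the function names are hypothetical, and it forces the worst-case interleaving deterministically rather than using real threads.

```python
import os
import tempfile


def snapshot(waiting_dir):
    """List the task files a scheduler sees when it scans waiting/ once."""
    return sorted(os.listdir(waiting_dir))


def drain(waiting_dir, running_dir, tasks, errors):
    """Move each snapshotted task into running/, recording failed renames."""
    for task in tasks:
        try:
            os.rename(os.path.join(waiting_dir, task),
                      os.path.join(running_dir, task))
        except FileNotFoundError:
            # The other scheduler already moved this file out of waiting/.
            errors.append(task)


def simulate(task_names):
    """Two schedulers scan before either renames; return (running, errors)."""
    with tempfile.TemporaryDirectory() as root:
        waiting = os.path.join(root, "waiting")
        running = os.path.join(root, "running")
        os.makedirs(waiting)
        os.makedirs(running)
        for name in task_names:
            open(os.path.join(waiting, name), "w").close()
        first, second = snapshot(waiting), snapshot(waiting)
        errors = []
        drain(waiting, running, first, errors)
        drain(waiting, running, second, errors)
        return sorted(os.listdir(running)), errors


if __name__ == "__main__":
    print(simulate(["task-a", "task-b"]))
    # prints (['task-a', 'task-b'], ['task-a', 'task-b'])
```

Every task lands once in `running/` but also shows up once in `errors`. Because each stream lists the directory only once, this model concentrates the errors at startup, matching clarkb's 18:33 observation. It does not explain the errors that continue at a lower rate afterwards.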
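
frickler's 19:19 report traces the "has only 1 lines" warning to update.py being a symlink. Git stores a symlink as a blob whose only content is the target path, so a comment on line 2 or later cannot be mapped onto it. Below is a minimal sketch of the special handling suggested there: resolve in-repo symlinks to their target before mapping comment lines. The function names and the `tools/real_update.py` layout are hypothetical, and where this would hook into Zuul's file-comment handling is an assumption. Zuul may need to handle it differently.

```python
import os
import tempfile


def resolve_comment_target(repo_root, rel_path):
    """Return the repo-relative file a line comment should be attached to.

    A symlink pointing inside the checkout is replaced by its target; anything
    else (regular files, links leaving the repo) is returned unchanged.
    """
    full = os.path.join(repo_root, rel_path)
    if not os.path.islink(full):
        return rel_path
    root = os.path.realpath(repo_root)
    target = os.path.realpath(full)
    if os.path.commonpath([root, target]) != root:
        return rel_path  # link escapes the checkout; leave it alone
    return os.path.relpath(target, root)


def demo():
    """Build a tiny tree where update.py links to tools/real_update.py."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "tools"))
        open(os.path.join(root, "tools", "real_update.py"), "w").close()
        os.symlink(os.path.join("tools", "real_update.py"),
                   os.path.join(root, "update.py"))
        return resolve_comment_target(root, "update.py")


if __name__ == "__main__":
    print(demo())  # tools/real_update.py on POSIX systems
```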
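
clarkb's analysis at 21:34 and 21:35 describes two ways a task file can stay in waiting/ forever. In the first, All-Projects and All-Users hit an early return at Destination.java#419 before anything is scheduled, so notifyFinished never runs. In the second, for user edits, the filter at line 401 changes the ref list, so cleanup removes a file keyed on a different hash. The Python model below illustrates both paths of that hypothesis. Everything in it is an assumption: that task files are keyed by a hash of project plus refs, the skip list standing in for the check at line 419, and the `refs/users/` filter standing in for line 401. Names are hypothetical, and this is not the plugin's code.

```python
import hashlib
import json

SKIPPED_PROJECTS = {"All-Projects", "All-Users"}  # stand-in for the line-419 check


def task_key(project, refs):
    """Hypothetical task-file name: a hash over the project and sorted refs."""
    payload = json.dumps({"project": project, "refs": sorted(refs)})
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def replicate(project, refs, waiting):
    """Persist a task in `waiting`, then schedule it and clean up if possible."""
    original = task_key(project, refs)
    waiting.add(original)                   # task file written to waiting/
    if project in SKIPPED_PROJECTS:
        return                              # early return: nothing scheduled,
                                            # so no cleanup ever happens
    pushed = [r for r in refs if not r.startswith("refs/users/")]  # stand-in filter
    scheduled = task_key(project, pushed)
    waiting.add(scheduled)                  # filtered task persisted under its hash
    waiting.discard(scheduled)              # notifyFinished-style cleanup of that hash
    # `original` is only removed when filtering left the ref list unchanged.


if __name__ == "__main__":
    waiting = set()
    replicate("openstack/nova", ["refs/heads/master"], waiting)
    replicate("All-Users", ["refs/meta/config"], waiting)
    replicate("openstack/nova",
              ["refs/heads/master", "refs/users/01/1001/edit-1234/1"], waiting)
    print(len(waiting))  # 2 leaked task files
```

The ordinary push cleans up after itself, while the skipped project and the edit-ref push each leave one entry behind, which is the pattern of leaked files described in the log.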
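
clarkb tracked the roughly 30-minute dump by watching the file grow toward its expected size of just under 4GB, noting at 22:16 that it was at 2GB about halfway through. Below is a minimal sketch of that back-of-envelope estimate, assuming the dump grows at a constant rate. The function names are hypothetical. `os.path.getsize` simply reads the file's current size.

```python
import os


def eta_minutes(current_size, expected_size, elapsed_minutes):
    """Linear estimate of minutes remaining for a file growing at a steady rate."""
    if current_size <= 0:
        raise ValueError("no data written yet; cannot extrapolate")
    remaining = max(expected_size - current_size, 0)
    return remaining * elapsed_minutes / current_size


def dump_eta(path, expected_size, elapsed_minutes):
    """Same estimate, reading the current size of the dump file on disk."""
    return eta_minutes(os.path.getsize(path), expected_size, elapsed_minutes)


if __name__ == "__main__":
    GB = 1024 ** 3
    print(eta_minutes(2 * GB, 4 * GB, 15))  # 15.0
```

The write rate varies with table contents, so treat the result as a rough guide rather than a deadline.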
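
At 22:35 fungi and clarkb compared SHA-256 checksums of the ~4GB dump on both servers before restoring. Running `sha256sum` on each host is all that is needed. For completeness, below is an equivalent Python sketch that streams the file in chunks so it never holds the whole dump in memory. The function name and chunk size are arbitrary choices.

```python
import hashlib
import sys


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same "digest  filename" layout that sha256sum(1) prints.
    for path in sys.argv[1:]:
        print(f"{sha256sum(path)}  {path}")
```

Run it against the dump on each server and compare the two digests. Any difference means the copy must be redone before restoring.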
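
At 23:26 meetpad showed the shared document again only after its services were restarted, and clarkb suspects that etherpad's old IP address had been cached. The log doesn't establish which component did the caching. The Python model below contrasts a client that resolves a name once at startup with one that honours the record's TTL, which fungi noted at 22:28 was 5 minutes. Class names and the 192.0.2.x documentation addresses are illustrative.

```python
class ResolveOnce:
    """Looks the name up once, at startup, and reuses that answer forever."""

    def __init__(self, resolver, name):
        self.addr = resolver(name)

    def lookup(self, now):
        return self.addr


class TTLCache:
    """Re-resolves the name whenever the cached answer is older than ttl seconds."""

    def __init__(self, resolver, name, ttl):
        self.resolver, self.name, self.ttl = resolver, name, ttl
        self.addr, self.expires = None, None

    def lookup(self, now):
        if self.expires is None or now >= self.expires:
            self.addr = self.resolver(self.name)
            self.expires = now + self.ttl
        return self.addr


if __name__ == "__main__":
    dns = {"etherpad.opendev.org": "192.0.2.10"}   # old server (illustrative)
    once = ResolveOnce(dns.get, "etherpad.opendev.org")
    ttl = TTLCache(dns.get, "etherpad.opendev.org", ttl=300)
    ttl.lookup(now=0)
    dns["etherpad.opendev.org"] = "192.0.2.20"     # DNS change lands
    print(once.lookup(now=301), ttl.lookup(now=301))
    # prints 192.0.2.10 192.0.2.20
    # "Restarting" the first client is just constructing it again:
    print(ResolveOnce(dns.get, "etherpad.opendev.org").lookup(now=301))
    # prints 192.0.2.20
```

A resolve-once client keeps pointing at the old server until it is restarted, no matter how short the record's TTL is, which would match the behaviour seen here.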
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!