Wednesday, 2023-04-19

fungiwhen lack of features is the best possible feature, there's vi00:16
clarkbI'm trying to remember what does it but vi definitely doesn't invoke all my vimrc stuff. Maybe that's a built-in behavior00:18
ianwwhile i'm thinking of it from the meeting, it looks like we don't account for project.meetings in the afs graph00:21
ianwthis is another todo that would be a good intro to several services.  we should have something that makes the grafana graph from an afs volume listing00:22
ianwwe've also missed mirror.deb-quincy and mirror.logs00:25
fungiyeah, i fear it's a manually curated list in a file buried somewhere in system-config at the moment00:33
ianwit's 00:34
ianwthe tool that puts the stats into graphite is
opendevreviewIan Wienand proposed opendev/system-config master: launch: fix RAX rdns command-line tool
opendevreviewIan Wienand proposed opendev/system-config master: launch : add debug flag
opendevreviewMerged opendev/ master: Add DNS servers for Ubuntu Jammy refresh
opendevreviewMerged opendev/system-config master: remove Twitter link
opendevreviewMerged opendev/system-config master: : update mailman links
opendevreviewMerged openstack/project-config master: Add TC repos in gerritbot
opendevreviewMerged openstack/project-config master: Add Dell Storage App to StarlingX
opendevreviewIan Wienand proposed openstack/project-config master: Indent Gerrit ACL options
opendevreviewIan Wienand proposed openstack/project-config master: gerrit/acls : add NO_CODE_CHANGE
opendevreviewIan Wienand proposed openstack/project-config master: acl : remove NO_CODE_CHANGE from Allow-Post-Review
*** amoralej|off is now known as amoralej07:20
opendevreviewwaleed mousa proposed openstack/diskimage-builder master: Create a wildcard InfiniBand connection profile for IB interfaces
*** amoralej is now known as amoralej|lunch11:05
*** amoralej|lunch is now known as amoralej12:22
*** Guest9750 is now known as atmark14:32
clarkbI'm double checking etherpad things and the db dumps appear to drop table store if it exists, which should wipe out my test data. This is a good thing. I was worried for a second that I may have had to manually clean things up after testing but this doesn't appear to be the case16:08
clarkbThere are some local package updates that I may as well apply and reboot for as well before this server goes into production. Working on that now16:08
clarkbok that's all done and server is up and running again and looks good when I test with overridden /etc/hosts16:20
clarkbThe only other real prep work I can think of is getting reviews on which is the DNS update which we will merge when we shut down services for database dumping16:24
clarkbapparently that change is merge conflicted too /me fixes it16:51
clarkbah the new ns servers change landed16:51
opendevreviewClark Boylan proposed opendev/ master: Update etherpad.o.o to point at etherpad02
opendevreviewClark Boylan proposed opendev/ master: Cleanup etherpad DNS records
clarkbfungi: ^ sorry about that if you have another chance to take a look16:55
*** amoralej is now known as amoralej|off16:57
clarkbI've been staring at java this morning and I think the replication stuff might be two separate issues. The first is the spamming of the error log for tasks that don't exist causing errors when renaming the task file from waiting/ to running/. I suspect this is possibly a race between different threads trying to handle tasks at the same time? Then separately we have the leaking18:08
clarkbwaiting/ tasks that never get processed18:08
clarkbThe reason I think the leaks are separate is that the task uuids for leaked files don't appear to show up in our error logs18:09
clarkbthey just sit there forever for some reason rather than getting handled18:09
clarkbaha yes two different ChainedScheduler.StreamScheduler<>()s are created because ReplicationQueue() calls synchronizePendingEvents() twice. Once at start() and another at run()18:14
clarkbI think for the errors moving files we may have a race between those two threads18:15
fungioh that's fun18:19
fungispeaking of fun, i'm heading out to an early wedding anniversary dinner with christine, but am planning to return well before the start of the etherpad maintenance window18:19
clarkbnow that I've said that, the streams seem to list the contents of the waiting/ directory once and process them entirely, then the thread should die18:33
clarkbso that may result in oddness at startup but I wouldn't expect to continue seeing the errors (which we do but at a lower rate than at startup)18:34
clarkbthis plugin is quite complicated too18:34
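
Editorial sketch of the kind of race suspected above, where two workers both list waiting/ and then each try to move the same task file into running/. This is not the replication plugin's code (which is Java). The function names, the directory layout, and the task names are assumptions made for illustration. Whichever worker renames second finds the file already gone, which is the sort of condition that could produce an error per task.

```python
import os
import tempfile
import threading


def claim_task(waiting_dir, running_dir, name):
    """Try to move one task file from waiting/ to running/.

    Returns True if this caller won the rename, False if the file was
    already gone because another worker claimed it first.
    """
    try:
        os.rename(os.path.join(waiting_dir, name),
                  os.path.join(running_dir, name))
        return True
    except FileNotFoundError:
        return False


def run_two_schedulers(task_names):
    """Simulate two workers that both snapshot waiting/ before processing it."""
    with tempfile.TemporaryDirectory() as root:
        waiting = os.path.join(root, "waiting")
        running = os.path.join(root, "running")
        os.mkdir(waiting)
        os.mkdir(running)
        for name in task_names:
            # Hypothetical empty task files; real ones would carry a payload.
            open(os.path.join(waiting, name), "w").close()

        barrier = threading.Barrier(2)
        results = {}

        def worker(worker_id):
            snapshot = sorted(os.listdir(waiting))
            barrier.wait()  # both snapshots are taken before any rename
            won = [n for n in snapshot if claim_task(waiting, running, n)]
            lost = len(snapshot) - len(won)
            results[worker_id] = (won, lost)

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

In this sketch every task is claimed exactly once across the two workers, and every task also produces exactly one failed rename. Treating the missing file as "already claimed" rather than as an error is one possible way to quiet such a log, offered here only as an assumption about what a fix might look like.
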
slittle1Please set me up as first core member of starlingx-app-dell-storage-core.  As usual I'll add the others.18:48
clarkbslittle1: I'm popping out for lunch but can take a look after19:04
fricklerthe warning at the end of might be interesting:  Job openstack-tox-pep8: unable to map line for file comments:  stderr: 'fatal: file has only 1 lines'19:19
fricklerthe reason for that, as I noticed while fixing that pep8 issue, is that the file is indeed a symlink to another file in the same repo19:19
fricklermaybe someone wants to look into adding some special handling for that case19:20
frickler( is the fix for the underlying issue, triggered or rather brought to the light by a new hacking release)19:21
slittle1Please set me up as first core member of starlingx-app-dell-storage-core.  As usual I'll add the others.20:08
clarkbslittle1: done20:26
clarkbI'm finding that the replication plugin logging is really confusing21:11
clarkbNOT_ATTEMPTED in the logs means "I'm going to get to it in just a second most likely"21:11
clarkbbut heavily implies "I'm not doing this at all"21:11
fungii'm back, just catching up21:24
ianwclarkb: sorry, catching up, but we have two distinct messages in the error log spew?21:26
clarkbianw: two distinct behaviors I think. One is the leaking and the other is the error spew21:30
clarkbbasically I don't know what causes those errors and it doesn't seem to be directly related to the leaked files21:31
clarkbI am 95% certain I understand the leaking of All-Projects and All-Users task files now. But not the user edit refs (though I think I have a sense for what happens I can't trace it in the code yet)21:31
clarkbIn the case of All-Projects and All-Users we hit which means we never get to line 440 which means we never schedule the activity that will ultimately notifyFinished which cleans up the files on disk21:34
clarkbIn the case of user edits I suspect what is happening is the filter here: is modifying the refs list that gets scheduled which causes us to have a new hash on disk which does get cleaned up. But because the hash is different it does not21:35
clarkbclean up the original file on disk that triggered things in the first place21:35
clarkbWhere I'm lost for this second thing is I don't see where we are creating the new content on disk for that to work21:36
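
Editorial sketch of the suspicion described above: if a persisted task is keyed by a hash of its contents, and the refs list is filtered before the task is finished, the cleanup looks up a different key and the original entry stays behind. Everything here is hypothetical: the key scheme, the function names, the in-memory dict standing in for the on-disk directory, and the example refs. It does not reproduce the plugin's actual storage format, and it does not answer the open question of where content under the new hash would be written.

```python
import hashlib
import json


def task_key(project, refs):
    """Derive a key from the task contents (assumed scheme, for illustration)."""
    payload = json.dumps({"project": project, "refs": sorted(refs)})
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def persist(store, project, refs):
    """Record a waiting task under the hash of its contents."""
    key = task_key(project, refs)
    store[key] = {"project": project, "refs": sorted(refs)}
    return key


def finish(store, project, refs):
    """Remove the entry matching these refs; return True if one existed."""
    return store.pop(task_key(project, refs), None) is not None


def drop_filtered(refs, excluded_prefix):
    """Return the refs that survive a prefix-based filter."""
    return [r for r in refs if not r.startswith(excluded_prefix)]


store = {}
# Hypothetical refs; the second one stands in for a user edit ref.
original = ["refs/heads/master", "refs/users/01/1000001/edit-1"]
persist(store, "All-Users", original)

filtered = drop_filtered(original, "refs/users/")
removed = finish(store, "All-Users", filtered)  # looks up a different hash
leaked = len(store)                             # original entry is still there
```

Finishing with the unfiltered list would remove the entry, so under these assumptions the leak comes purely from the key changing between persist and cleanup.
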
clarkbI need to put this down now though in order to focus on etherpad stuff21:43
clarkbI'm going to get ready for that now. I've been trying to take notes in the source directly then I can git diff my way to further understanding21:44
clarkbI've put etherpad01 and etherpad02 in the emergency file per everything else needs to wait until 22:0021:47
clarkbre gerrit I'm probably going to need to draw out the actual flow of events. The problem is I don't understand the scheduler and event system well enough to do that yet. But I'm sure I'll get there21:54
fungi4 minutes to maintenance. should we #status notice it?21:55
fungistatus notice The Etherpad service on will be offline for the next 90 minutes for a server replacement and operating system upgrade21:56
fungisomething like that?21:56
clarkbsounds great to me21:56
fungi#status notice The Etherpad service on will be offline for the next 90 minutes for a server replacement and operating system upgrade21:57
opendevstatusfungi: sending notice21:57
-opendevstatus- NOTICE: The Etherpad service on will be offline for the next 90 minutes for a server replacement and operating system upgrade21:57
clarkbI've stopped services on 02 since those don't actually matter yet21:58
opendevstatusfungi: finished sending notice21:59
clarkbok it is also time now. I will stop services on etherpad01 now22:00
clarkbfungi: want to approve ?22:01
fungion it22:01
fungiand done22:02
clarkband maybe check that etherpad looks down to you in your browser?22:02
clarkbit should be down on both servers now22:02
clarkbI will start the db dump if that looks good to you22:02
ianwdown for me22:02
clarkbok proceeding with the dump. This will take about half an hour22:03
opendevreviewMerged opendev/ master: Update etherpad.o.o to point at etherpad02
clarkbthe file is writing to my homedir on etherpad01 and should end up being just under 4GB large22:04
clarkbyou can track rough progress this way22:04
clarkbfor anyone following along on the paste we are on line 1022:06
clarkbwe are at 2GB and about halfway through 30 minutes so I think we are right on time22:16
fungiwe're coming up on completion of this phase in the next few minutes, presumably22:27
clarkbyes up to 3.6GB should be just a few more minutes22:27
clarkbDNS is showing updated for me now too22:27
fungisame, but it was updated shortly after deployment for me22:28
fungittl was only 5m anyway22:28
clarkbI only just now remembered to check :)22:28
clarkbshould've put that on my list22:29
ianwluckily we cnamed ->, so that's good too22:30
clarkbits down. Chowning and copying now22:30
clarkb*its done22:30
fungiyay too22:31
fungicopying looks done22:34
fungichecksum comparison of the db dump on both servers lgtm22:35
clarkbyup my sha256sums match up22:35
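
Editorial sketch of the checksum comparison mentioned above, for readers who want to do the same check from Python rather than with the sha256sum command. The function name and the chunk size are assumptions. The file is read in chunks so that a multi-gigabyte dump does not have to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running this on each server against its copy of the dump and comparing the two hex strings is equivalent to comparing sha256sum output by eye.
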
clarkbI'm actually going to skip the step on line 19. This was more for when I was testing, but in theory this matters less now since we are moving off of that server so disk being limited is ok22:35
clarkbthat means next step is to restore on etherpad02. I'm doing this next22:36
clarkbThis is another half hour or so step and harder to track progress for. Instead we just get to be patient22:39
clarkbI think /var/etherpad/db will be at like 27 or 29 GB usage when the restore is done. Currently at 4ish22:41
JayFSo, I just went to do some looking at the status of virtualpdu. It looks like the redirect has been removed ( but there is no content sync'd into the repo22:56
JayFstill not a hugely urgent issue, but just followed up on it today to find this state :)22:57
clarkbJayF: does the repo run the github synchronization job?22:57
clarkband if so have you merged any changes?22:57
ianwJayF: you'll have to merge something with the sync job, I'd say22:57
JayFI don't know (would assume yes), and no.22:57
JayFSo basically merge a noop change to virtualpdu and we should be good?22:57
clarkbcheck that the job is applied to it in openstack/project-config first22:58
clarkbbut yes22:58
ianwyeah, you can see the jobs @
ianwpost jobs - openstack-upload-github-mirror23:00
ianwi guess there is some chance given this was retired and moved around it may not work, but we can debug that if it happens23:01
JayFWhat about for bot configs? Not seeing updates for virtualpdu in #openstack-ironic23:01
JayFI assume that's in git somewhere, too23:01
fungigerritbot config is in openstack/project-config yes23:02
JayFack, will check that now23:02
clarkbup to 23 GB shouldn't be long now23:03
opendevreviewJay Faulkner proposed openstack/project-config master: Enable IRC notifications for virtualpdu for ironic
ianwclarkb: if you're happy with i can merge that NO_CHANGE stack soon23:06
ianwthat removes it from the post-review label as discussed.  i couldn't see any others i thought needed it explicitly removed23:06
clarkbianw: did we want to do the indent change first? I was thinking the other way around because landing the indent change would require manually running manage-projects since every file is affected23:08
clarkbdb restore is done23:09
clarkbare we ready to start up services on etherpad02?23:09
ianwi could go either way -- my thought was to get the config options in sync, despite the indenting moving about23:09
clarkbthe restore command exited 0. I can't really think of a good way to check things without starting the service. So I think I'm starting it now23:10
ianwthere seemed to be some desire to make the normalise process a bit more explicit before requiring indenting23:10
fungii'm ready23:10
ianwi've reloaded some etherpads i've got open and LGTM!23:11
clarkbya the zuul user survey feedback pad has the new stuff fungi added to it since I did the test restore on etherpad02 implying that the restore dropped and pulled in new content (the disk usage listings by df also implied this)23:12
clarkb seems to work with basic stuff23:13
fungiyeah, i've reconnected to pads i was using earlier today and they seem fine23:14
clarkbinfra-root are we happy enough with it to remove etherpad02 from the emergency file? If so I think that is the last step for today. Then tomorrow we can land the change to remove etherpad01 from dns and I need to push a change to remove it from system-config as well23:15
clarkbassuming everything remains happy23:15
fungii'm happy with it, and still around for a while if there are problems23:15
clarkbok etherpad02 is no longer in the emergency file but 01 is23:16
clarkbI'll work on the change to remove etherpad01 from system-config now so that is ready to merge whenever we're happy23:16
clarkbnevermind I was good and already took care of it
clarkbreviews very much welcome and ya if things look good tomorrow then maybe we can land ^ and
fungiit usually takes me spotting my earlier change in the gerrit conflicts list to realize i've just redone my own work23:18
clarkbhrm is meetpad not showing the shared document? I wonder if this could be java's bad dns caching23:21
fungimay need jitsi service restarts?23:22
clarkbya could be23:23
clarkbI can try that23:23
fungii'll admit i didn't think to test it23:23
clarkbit isn't something we can easily test pre move iirc because it does the proxying in the host itself23:24
clarkbso we kinda have to take a leap of faith that if everything else works it should too which is what we did :)23:25
opendevreviewMerged openstack/project-config master: Enable IRC notifications for virtualpdu for ironic
clarkbit seems to work now after restarting jitsi meet services. I suspect the issue was caching the old ip addr and that being inaccessible23:26
fungisounds about right23:28
* clarkb makes a note to check that backups are backing up what we want from etherpad02 tomorrow as well23:28
clarkbthe cronjobs are installed, mostly a matter of mounting the backups and checking content looks good23:28
clarkbwhy is opendevreview dropping?23:29
clarkbgerritbot container restarted according to docker ps -a23:31
clarkbit should be fine now I think unless something is very broken and I'm out of steam to deal with very broken for that bot right now :)23:31
fungiit restarted for the config update in 880895 deploying just now23:32
clarkb#status log Moved the etherpad service from to etherpad02.opendev.org23:32
funginothing to see here, move along23:32
opendevstatusclarkb: finished logging23:32
