Wednesday, 2022-09-14

clarkbwow that managed to merge first pass after the check +1.00:02
clarkbAll of the images appear to have promoted too00:02
ianwweirdly i can not update a change on gerrit's gerrit00:41
ianwsuggests to me that i should be able to, but ... "cannot add patch set to"00:42
opendevreviewIan Wienand proposed opendev/system-config master: reprepro doc: mention contents.cache.db
opendevreviewMerged opendev/system-config master: translate: fix dump with MySQL 5.7
opendevreviewIan Wienand proposed opendev/system-config master: Add an .allowedSigners file
ianwinterested to see what we all think about ^ :)03:34
ianw-rw-r--r-- 1 10004 root 210M Sep 14 03:43 contents.cache.db03:43
ianw-rw-r--r-- 1 10004 root 724M Sep 13 18:26 contents.cache.db.old03:43
ianwso it's ... ~30% done?  i started it at about 8am, it's ~2pm now ... so i make it about 16-17 hours to go, at this rate03:44
ianw7pm-ish UTC03:45
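(Editor's note: the estimate above is a linear extrapolation from the rebuilt file's size. The sketch below reproduces that arithmetic. It assumes the rebuilt cache ends up the same size as the old 724M file and that the rebuild rate stays constant; neither is confirmed in the log. The function name and the rounded 6-hour elapsed figure are illustrative only.)

```python
def remaining_hours(done_mb, total_mb, elapsed_hours):
    """Linearly extrapolate time remaining from progress so far.

    Assumes a constant rate and a known final size (both assumptions).
    """
    rate_mb_per_hour = done_mb / elapsed_hours
    return (total_mb - done_mb) / rate_mb_per_hour


# Figures from the log: 210M rebuilt out of an assumed 724M, after ~6 hours.
fraction_done = 210 / 724
eta = remaining_hours(210, 724, 6)
print(f"{fraction_done:.0%} done, roughly {eta:.1f} hours to go")
```

A strictly linear estimate comes out somewhat below the 16-17 hours quoted, so the figure in the log likely includes some margin.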
fungiokay, so it did end up needing to slowly rebuild the cache?11:31
opendevreviewMerged openstack/project-config master: Add Keystone OpenID Connect charm to OpenStack charms
frickleramorin: the flavors that you mentioned for testing the nested kvm issue are a bit low on cpu and ram, would using [bc]2-15 or even -30 work, too? otherwise I can set up a smaller devstack based test12:38
corvusclarkb: looks like the zuul reboot script stopped after an error on zm0513:51
Clark[m]corvus: it is probably something related to docker-compose ps output not discriminating running containers vs exited like docker ps. I'll take a look after the school run this morning13:57
corvusyeah, it said no container found, so it may have been a previously stopped container or something (?)13:57
Clark[m]Ya zm05 has stopped containers as fallout from the previous crash. I thought I had addressed that but maybe not completely.13:58
Clark[m]And we should be able to restart with a limit to mergers and schedulers13:58
Clark[m]To avoid iterating through executors again13:59
amorinfrickler ack, c2-15 is ending on the same hardware as c2-714:55
amorinso it's good14:55
amorinsame for -3014:55
opendevreviewClark Boylan proposed opendev/system-config master: Fix error checking with zuul graceful stops
clarkbcorvus: ^15:18
clarkbI can restart the playbook limiting it to mergers and schedulers if we are happy with that and it lands15:18
fungiclarkb: when you were cleaning up the old mm3 holds, it looks like you left one from two weeks ago15:24
fungithough in good news, the hold i set yesterday did trigger, so we should have a fresh example to work from now with the latest state of the implementation15:24
clarkbfungi: the one from two weeks ago is the one you did the last round of imports on. I didn't want to clean it up from under you15:25
clarkbfungi: if you are done with it feel free to remove it though15:25
fungiahh, okay, yeah i thought you'd asked if i was done with it. happy to delete it now. thanks!15:25
fungiand done15:26
fungi104.239.143.143 is the latest held listserv15:27
opendevreviewAlfredo Moralejo proposed zuul/zuul-jobs master: Use AFS mirrors for extras-common in CS9
fungirsyncs from the production listservs to the new held node are underway15:36
clarkbI think is related to our ansible slowness17:22
clarkbit affects ansible 5 but not 6. They appear to have decided to not fix ansible 5 for some reason17:22
clarkbtesting locally this definitely has an impact (though harder to measure how big of one without doing far more involved testing)17:23
clarkbI went ahead and approved the zuul reboot playbook update17:24
corvusclarkb: we still have ansible 2 available; you could throw up a change to switch to ansible 2 for a job and compare runtimes17:29
clarkbcorvus: that's a good point. I'll give that a go17:30
corvusclarkb: 857725 lgtm17:31
opendevreviewClark Boylan proposed opendev/system-config master: DNM Check if older ansible 2.9 pipelining is faster than ansible 5
clarkbUsing the system-config-run-zuul job since that has a large nodeset which seems to amplify these issues17:35
opendevreviewJames E. Blair proposed opendev/system-config master: Add Jaeger tracing server
clarkbcorvus: ^ the currently running system-config-run-zuul for the reboot playbook change took about 2 minutes to copy ssh host keys to all the hosts. That same process took about a minute on the job running for the ansible 2.9 DNM change on rax-ord18:01
clarkbnot a direct comparison but at least we haven't disproven it helps yet18:01
corvusclarkb: i love that the change to add ansible 6 is waiting on the release of zuul 6.4.0 which is waiting on the opendev restart which is waiting on the zuul test job which is slow because we don't have ansible 618:04
fungizuul 6.4.0, in which franz kafka meets rube goldberg for a few beers18:06
clarkbchecking the previous 3 successful runs: 2 minutes, 3 minutes, and 6 minutes for the same block of task work18:06
clarkbnone of them ran on rax-ord but the 2 minute one did run on rax-dfw18:06
clarkbdefinitely seems like this may help18:07
clarkbI can recheck the dnm change a couple times to generate more data too18:07
clarkbIn a way this was a good thing. We did find a few places where the code was just inefficient even if we speed it up a bit with pipelining18:09
opendevreviewMerged opendev/system-config master: Fix error checking with zuul graceful stops
clarkbonce the repo on bridge updates I'll start the playbook again with a --limit 'zuul-merger:zuul-scheduler' added18:20
clarkbthe playbook is running limited to mergers and schedulers18:27
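(Editor's note: in an Ansible `--limit` pattern, a colon between group names selects the union of those groups, which is why executors are skipped here. The sketch below only illustrates that union behaviour. The inventory contents are hypothetical, apart from zm05 and ze02 which appear in the log, and real Ansible patterns also support intersections, exclusions and wildcards that this does not handle.)

```python
def resolve_limit(pattern, inventory):
    """Return hosts in the union of the colon-separated groups in pattern."""
    selected = []
    for group in pattern.split(":"):
        for host in inventory.get(group, []):
            if host not in selected:  # keep order, drop duplicates
                selected.append(host)
    return selected


# Hypothetical inventory; group names follow the limit used in the log.
inventory = {
    "zuul-executor": ["ze01", "ze02"],
    "zuul-merger": ["zm01", "zm05"],
    "zuul-scheduler": ["zuul01", "zuul02"],
}

hosts = resolve_limit("zuul-merger:zuul-scheduler", inventory)
print(hosts)
```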
clarkbcorvus: schedulers are restarting now. I notice failed without logs. Makes me suspicious given timing but I haven't started tracking down why that happened yet18:41
clarkbI can probably take a look after lunch if no one beats me to it18:41
clarkbcorvus: reboot is done18:47
clarkbthe error was a decryption error19:48
clarkbother jobs don't seem to have that issue so likely due to the job config19:49
clarkbit ran on ze02 if you want to find the logs for it. I think we're ok19:49
fungiargh, bit of a hangup on the latest mm3 import test. the held node is in rackspace which only gets a 37gb rootfs. i'll need to move /var/lib/mailman to the ephemeral disk before i can get all the data copied over20:17
fungiwish i'd noticed that before i copied some 30gb of data over to it and filled the rootfs20:23
fungihopefully the local file move won't take too long, and then i can resume the rsync rather than having to re-copy everything over the net20:23
*** dviroel is now known as dviroel|afk20:26
fungigoing to pop out for some food while the resumed prod mailman copies finish up20:44
opendevreviewJames E. Blair proposed opendev/system-config master: Add Jaeger tracing server
fungisee, going out to the biergarten is the answer to slow file transfers. i'm certain that's what got them to finish22:24
ianw-rw-r--r-- 1 10004 root 555M Sep 14 20:41 contents.cache.db22:25
ianwi predicted 7pm utc, so close enough :)  22:26
fungii don't think anyone else's jellybean count was closer, so you win22:26
ianwso the debian volume has been released -- i guess afaik we'll call it fixed?  why exactly the file went corrupt is still unknown?22:27
ianwi'm just moving the old file to my homedir just in case, but i've dropped the locks22:29
ianw#status performed reprepro db recovery on debian mirror; has been synced and volume released 22:29
opendevstatusianw: unknown command22:29
ianw#status log performed reprepro db recovery on debian mirror; has been synced and volume released 22:29
opendevstatusianw: finished logging22:29
ianwi added a quick on it via too22:30
ianwa quick note on it22:30
ianwalso we are still seeing the "afs: Warning: We are having trouble keeping the AFS stat cache trimmed down under the configured limit" popping up every now and then22:31
fungiianw: best guess is that whatever event leaked the reprepro lockfile on the 13th terminated the process and left the bdb file for its content cache dirty, but that's really as far as we got22:33
clarkbianw: was my attempt at dealing with the afs warning but it appears I got something horribly wrong22:42
opendevreviewClark Boylan proposed opendev/system-config master: Up openafs client -stat value
clarkbmaybe that is better22:43
*** dviroel is now known as dviroel|afk22:52
opendevreviewMerged zuul/zuul-jobs master: Update gpg key file for extras-common in CS9
ianwyeah i couldn't find much googling for that error23:19
