corvus | #status log restarted all of zuul on 7e1e5a0176620d877717e87035223e4f3195d267 | 00:02 |
---|---|---|
opendevstatus | corvus: finished logging | 00:02 |
corvus | okay, the last NO_JOBS we should see in the buildset page for the openstack tenant is openstack/charm-octavia 800619,2 | 00:05 |
*** dpawlik3 is now known as dpawlik | 00:39 | |
*** dpawlik2 is now known as dpawlik | 07:43 | |
ianw | o/ | 21:00 |
ianw | it's moving day | 21:00 |
ianw | for reference the checklist is @ https://etherpad.opendev.org/p/gerrit-upgrade-2021 | 21:04 |
ianw | i'll log notes in here too | 21:05 |
ianw | as a first thought, i'm going to reboot review02 just to make sure it's on the latest everything | 21:05 |
ianw | ok, it's back, ipv6 & ipv4 pinging | 21:10 |
ianw | i've updated the review01 maintenance message in preparation for switching that on | 21:16 |
clarkb | ianw: sounds good. I'm rolling in nowish. Have some things I need to get done as this picks up steam | 21:23 |
ianw | i'm a bit earlier than the email, but since the last email was sent we've decided to start spreading the delta variant and close schools, which is annoying on many levels | 21:26 |
ianw | #status alert Gerrit Downtime -- over the next few hours the gerrit service will be offline as we move it to a new home. Thank you for your patience and we will send an alert when things are restored in 4-6 hours | 21:30 |
opendevstatus | ianw: sending alert | 21:30 |
-opendevstatus- NOTICE: Gerrit Downtime -- over the next few hours the gerrit service will be offline as we move it to a new home. Thank you for your patience and we will send an alert when things are restored in 4-6 hours | 21:30 | |
ianw | ok, maintenance page up | 21:44 |
ianw | i'm running a screen on review02, it's also logging to /root/screenlog.0 | 21:45 |
clarkb | ok I'll try to join in a bit. Finishing up a few things | 21:46 |
ianw | no worries; i know i'm early on this, sorry | 21:48 |
ianw | so far all good, and syncing git repos now | 21:49 |
clarkb | For the two user account fixups I wanted to mention that you need to fetch the latest version of that ref before updating it and pushing it back to avoid undoing any changes since we tested it last week. I think you know this but wasn't sure if that was explicitly stated | 21:53 |
clarkb | I see that ansible is thoroughly stopped on bridge as we want :) | 21:54 |
ianw | yep i was going to clone it fresh to avoid any old data laying around | 21:54 |
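The fetch-before-edit advice above can be sketched as a short sequence of git commands. This is a hypothetical helper, not what was actually run: the account ID, author string and remote name are placeholders, and it assumes Gerrit's NoteDb convention of storing each account under refs/users/<last two digits>/<account id> in All-Users.

```python
"""Hypothetical helper for an All-Users account fixup (fetch, edit, commit, push).

Assumptions: the remote is called "origin", the account ID and author are
placeholders, and the edited files are already tracked so `commit -a` sees them.
"""
from typing import List


def user_ref(account_id: int) -> str:
    """Return the NoteDb ref for an account, sharded by its last two digits."""
    return f"refs/users/{account_id % 100:02d}/{account_id}"


def fixup_commands(account_id: int, author: str, message: str) -> List[List[str]]:
    """Build the git argv lists for one fetch-edit-commit-push cycle.

    Fetching the ref immediately before editing avoids pushing back a stale
    copy that would undo changes made since it was last tested.
    """
    ref = user_ref(account_id)
    return [
        ["git", "fetch", "origin", ref],
        ["git", "checkout", "FETCH_HEAD"],
        # ... edit the account files here, then commit with explicit authorship:
        ["git", "commit", "-a", "--author", author, "--message", message],
        ["git", "push", "origin", f"HEAD:{ref}"],
    ]


if __name__ == "__main__":
    for cmd in fixup_commands(1234567, "Example Admin <admin@example.org>", "Fix account"):
        print(" ".join(cmd))
```

Because the commit is made on top of the freshly fetched ref, the push is a fast-forward rather than a rewrite of whatever changed in the meantime.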
clarkb | I'm on the screen now. Will go grab a glass of water and be back shortly | 21:55 |
ianw | i just realised i'm running the rsync in the terminal tab i opened to check the logging of the screen session, doh. anyway, i'll be sure to close that when it finishes | 21:57 |
clarkb | When you've got the user fixes written and committed I'll happily check the git log -2 -p on that :) | 21:58 |
clarkb | ianw: it occurred to me that you'll need to populate the known hosts for gerrit2 on review02 to include gitea01-08 | 21:59 |
clarkb | and I want to say we need the rsa keys for that too? | 21:59 |
clarkb | the rsa thing might be an old gerrit problem and the other host keys may be fine now though | 22:00 |
*** kopecmartin is now known as kopecmartin|pto | 22:02 | |
ianw | i was thinking whatever ssh gerrit uses probably just accepts the keys, although that makes sense | 22:02 |
ianw | ok, syncing done | 22:02 |
clarkb | pretty sure it doesn't, and this is a common "oh crap" problem that forces us to restart the server since it loads that data only on startup | 22:02 |
clarkb | not the end of the world if we run into it, but I think we can avoid it entirely and add those host keys to known hosts early | 22:03 |
clarkb | (I put a note on the etherpad about it as that can be done while reindexing happens) | 22:03 |
clarkb | ianw: I think you need to pass --author with the author info? | 22:08 |
ianw | yep :) | 22:11 |
clarkb | ianw: those two commits lgtm | 22:11 |
ianw | excellent, i think we can push then | 22:11 |
clarkb | you need to stop being gerrit2 | 22:13 |
clarkb | it was correct to be gerrit2 for the git edits but now not for the reindex | 22:13 |
ianw | probably should have ansibleised the known hosts thing, we could definitely do that | 22:15 |
ianw | i can just run a loop to get them | 22:15 |
clarkb | ianw: one small edit to your loop in the etherpad | 22:20 |
clarkb | I think you need the echo -n otherwise they will end up on different lines | 22:21 |
ianw | oh it was supposed to be a comment | 22:21 |
clarkb | ah | 22:22 |
ianw | i guess that kind of defeats the point of hashing the string in the file, but i think it's better in this case :) | 22:22 |
ianw | /home/gerrit2/tmp/ianw/known_hosts is a sample | 22:23 |
clarkb | looking | 22:24 |
clarkb | ah grab all of them. that wfm | 22:24 |
clarkb | all the key types I mean | 22:25 |
clarkb | ianw: note there is content in known hosts already that you probably don't want to overwrite | 22:25 |
clarkb | so append don't mv the file over :) | 22:25 |
ianw | yep should >> | 22:25 |
ianw | https://www.bunniestudios.com/blog/?p=6140 was a very interesting read on a Curve25519 hardware accelerator prototype that popped in my feed | 22:26 |
ianw | apropos key types | 22:26 |
clarkb | ianw: another thing I notice (because nothing like pressure to quadruple check things) is the ~gerrit2/.launchpadlib dir is empty on 02 | 22:34 |
clarkb | I wonder if we failed to convert that properly in the puppet -> ansible rewrite | 22:34 |
clarkb | that might be another thing to manually copy over then ansible later? | 22:34 |
ianw | hrm | 22:37 |
clarkb | reindexing is done | 22:42 |
clarkb | ianw: should be 0600 | 22:45 |
clarkb | according to the half-there ansible | 22:45 |
clarkb | lgtm | 22:45 |
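The two constraints discussed above (append to the existing known_hosts rather than replacing it, and keep it at 0600) can be combined in one small helper. This is a sketch with a hypothetical function name, not the loop that was actually used. One caveat, to the best of my knowledge: `ssh-keyscan -H` salts each hashed entry randomly, so re-scanning the same host produces different lines and the duplicate check only catches exact repeats.

```python
import os
from pathlib import Path
from typing import Iterable


def append_known_hosts(path: Path, new_lines: Iterable[str]) -> int:
    """Append host key lines to a known_hosts file without overwriting it.

    Existing entries are kept, exact duplicates are skipped, and the file
    ends up with mode 0600. Returns how many lines were appended.
    """
    text = path.read_text() if path.exists() else ""
    existing = {line.strip() for line in text.splitlines() if line.strip()}
    to_add = []
    for line in new_lines:
        line = line.strip()
        # Skip blanks, comments and anything already present.
        if line and not line.startswith("#") and line not in existing:
            to_add.append(line)
            existing.add(line)
    # Open in append mode: the Python equivalent of `>>`, not `>`.
    with path.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # don't glue the first new entry onto the last old one
        for line in to_add:
            fh.write(line + "\n")
    os.chmod(path, 0o600)
    return len(to_add)
```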
clarkb | ianw: oh wait problem with that | 22:46 |
clarkb | its port 222 that we care about I think | 22:46 |
clarkb | double checking the port on gitea01 now | 22:46 |
clarkb | yup 222 | 22:47 |
clarkb | looks like ssh-keyscan -p 222 -H $HOST will do it? | 22:48 |
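Building the `ssh-keyscan -p 222 -H` invocation for gitea01-08 can be sketched as below. The `.opendev.org` hostnames are an assumption. With a non-default port, unhashed entries are written as `[host]:222 <keytype> <key>`, and `-H` hashes that bracketed form. Leaving out `-t` should, on recent OpenSSH releases, request every key type, which matches the "grab all of them" approach.

```python
from typing import List

# Assumption: the giteas are reachable as gitea01.opendev.org ... gitea08.opendev.org.
GITEA_HOSTS = [f"gitea{n:02d}.opendev.org" for n in range(1, 9)]


def keyscan_command(host: str, port: int = 222, hashed: bool = True) -> List[str]:
    """Return an ssh-keyscan argv that scans `host` on `port` (no -t, so no type filter)."""
    cmd = ["ssh-keyscan", "-p", str(port)]
    if hashed:
        cmd.append("-H")  # hash hostnames in the output
    cmd.append(host)
    return cmd
```

Each argv can then be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and its stdout appended (not written over) to gerrit2's known_hosts.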
ianw | hrm, good point, let me try that | 22:48 |
ianw | ok, i think we're ready to start containers | 22:53 |
clarkb | seems like it is up. Are you testing login with overridden hosts file? | 22:55 |
ianw | about to, yep | 22:55 |
ianw | ok, i'm in, my changes all look correct | 22:57 |
clarkb | Excellent | 22:58 |
ianw | ok, i agree on just replicating a project to make sure it's working | 22:59 |
clarkb | based on the log I think that went well | 23:03 |
clarkb | replication log I mean | 23:03 |
clarkb | if we had problems you get errors and retries | 23:03 |
ianw | ++ yay. alright, time to merge the dns change and replicate that then | 23:04 |
clarkb | need a space before --message | 23:06 |
clarkb | I'll check all 8 giteas to be sure they got that now | 23:06 |
clarkb | seems to be there on all 8 | 23:08 |
ianw | ok, switching to bridge to run the dns update | 23:09 |
clarkb | ok | 23:09 |
ianw | nameserver playbook running | 23:10 |
clarkb | I see the updated CNAME now | 23:12 |
ianw | and done. google admin dns toolbox tells me review.opendev.org is updated | 23:13 |
clarkb | I have successfully logged in via the web ui now too | 23:14 |
ianw | yay | 23:14 |
ianw | bringing zuul back | 23:15 |
clarkb | as expected if I look at changes it shows I haven't reviewed any files. But if I open and view those files their state changes to reviewed | 23:15 |
clarkb | ianw: myname.admin needs to be updated | 23:17 |
ianw | just waiting for zuul to load openstack tenant | 23:19 |
clarkb | I left a comment on https://review.opendev.org/c/zuul/nodepool/+/801189 to kick the tires and it went well | 23:20 |
clarkb | ianw: hrm if I click the dropdown for the lone file in that change I get a 500 error for Endpoint: /changes/*~*/revisions/*/files/*/reviewed | 23:22 |
clarkb | the file loads and I can dismiss that though | 23:22 |
clarkb | looking at error log it seems that it is trying to insert an entry to the db saying I've reviewed the file | 23:23 |
ianw | Caused by: java.sql.SQLException: Duplicate entry '801189-1-4146-nodepool/nodeutils.py' for key 'PRIMARY | 23:23 |
clarkb | then hits a duplicate entry error | 23:23 |
clarkb | I'm guessing gerrit indiscriminately pushes those updates and mariadb doesn't like that? | 23:24 |
ianw | yeah, if i mark it reviewed, then click the file to review it, i get the same thing | 23:26 |
ianw | :/ | 23:26 |
clarkb | https://mariadb.com/kb/en/insert-on-duplicate-key-update/ I think that is how mariadb wants you to handle it | 23:26 |
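MariaDB's `INSERT ... ON DUPLICATE KEY UPDATE` (or `INSERT IGNORE`) makes recording a reviewed file idempotent. The sketch below swaps in SQLite's `INSERT OR IGNORE` so it runs with only the standard library. The table is modelled loosely on the key in the duplicate-entry error above (change, patch set, account, file); the table and column names are assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table keyed on (change, patch set, account, file)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE account_patch_reviews (
               change_id INTEGER NOT NULL,
               patch_set_id INTEGER NOT NULL,
               account_id INTEGER NOT NULL,
               file_name TEXT NOT NULL,
               PRIMARY KEY (change_id, patch_set_id, account_id, file_name))"""
    )
    return conn


def mark_reviewed(conn: sqlite3.Connection, change_id: int, patch_set_id: int,
                  account_id: int, file_name: str) -> bool:
    """Record a reviewed file; returns True only if a new row was written."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO account_patch_reviews VALUES (?, ?, ?, ?)",
        (change_id, patch_set_id, account_id, file_name),
    )
    # An ignored duplicate modifies no rows, so rowcount is 0 instead of raising.
    return cur.rowcount == 1
```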
clarkb | ianw: really thinking out loud here we could pivot and force it to use the h2 database instead | 23:27 |
clarkb | that makes me sad because the h2 tooling is so clunky | 23:28 |
ianw | it is an option | 23:28 |
ianw | i'm just reading the insert bits now | 23:30 |
clarkb | ok | 23:30 |
ianw | if (ormException instanceof DuplicateKeyException) { | 23:30 |
ianw | return false; | 23:30 |
ianw | it might be that mariadb returns a different value | 23:31 |
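The Java snippet above treats a duplicate-key error as "already reviewed" and returns false. A rough Python analogue, using sqlite3 as a stand-in database and a hypothetical `DuplicateKeyError` class, shows why the conversion step matters: if the driver's error is not recognised as a duplicate, it propagates instead, which is what surfaces as the 500.

```python
import sqlite3


class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the store's own duplicate-key exception."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_patch_reviews (change_id INTEGER NOT NULL, "
        "patch_set_id INTEGER NOT NULL, account_id INTEGER NOT NULL, "
        "file_name TEXT NOT NULL, "
        "PRIMARY KEY (change_id, patch_set_id, account_id, file_name))"
    )
    return conn


def convert_error(err: sqlite3.Error) -> Exception:
    """Translate a driver error into the store's exception types."""
    if isinstance(err, sqlite3.IntegrityError) and "UNIQUE constraint failed" in str(err):
        return DuplicateKeyError(str(err))
    return err


def mark_reviewed(conn: sqlite3.Connection, row: tuple) -> bool:
    """Plain INSERT; a recognised duplicate means 'already reviewed'."""
    try:
        conn.execute("INSERT INTO account_patch_reviews VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.Error as err:
        if isinstance(convert_error(err), DuplicateKeyError):
            return False  # already reviewed: not an error
        raise  # anything unrecognised bubbles up to the caller
```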
clarkb | oh interesting | 23:31 |
clarkb | Caused by: java.sql.SQLException: Duplicate entry | 23:31 |
clarkb | that seems to be the exception that is raised that is causing the fuss | 23:31 |
clarkb | which becomes a SQLIntegrityConstraintViolationException | 23:32 |
clarkb | on a separate note it seems that zuul is up | 23:33 |
clarkb | zuul isn't doing anything though | 23:33 |
ianw | https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/schema/MariaDBAccountPatchReviewStore.java#42 | 23:33 |
clarkb | so I guess if we can determine the error code we might be able to patch this? | 23:35 |
ianw | it seems the error codes should be the same for mariadb (i mean, not surprising) | 23:36 |
clarkb | https://mariadb.com/kb/en/mariadb-error-codes/ seems to be a full list | 23:37 |
clarkb | I wonder if we are getting a 1061 from the database | 23:38 |
clarkb | however 1062's string in the mariadb list matches what I see in our error log for the exception | 23:38 |
clarkb | Duplicate entry '%s' for key %d <- that should be a 1062 | 23:39 |
clarkb | maybe getSQLStateInt(err) is returning something else? | 23:39 |
ianw | https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/schema/JdbcAccountPatchReviewStore.java#239 | 23:40 |
ianw | that's where i believe it's saying "if duplicate key error, just ignore this" ... presumably because it's already been reviewed | 23:40 |
clarkb | yup | 23:41 |
clarkb | https://gerrit.googlesource.com/gerrit/+/e906789307262fb827f8dbb0b3e0b9948abd665e/gerrit-server/src/main/java/com/google/gerrit/server/schema/JdbcAccountPatchReviewStore.java#312 it does default to -1 or 0 if it can't parse the error out | 23:41 |
clarkb | I think either we're not parsing the error out properly or the instanceof check is failing | 23:41 |
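On the error-code question: in the MySQL/MariaDB error list, 1062 is ER_DUP_ENTRY ("Duplicate entry '%s' for key %d"), while 1061 is ER_DUP_KEYNAME, a duplicate index name. A duplicate row and other integrity violations such as 1048 (a NULL in a NOT NULL column) both report the generic SQLState 23000. The sketch below is not Gerrit's code; it only illustrates parsing an SQLState with a -1 fallback versus classifying by vendor code, taking 1022, 1062 and 1169 as the duplicate-key codes.

```python
from typing import Optional

# Vendor error codes treated as "duplicate key" in this sketch:
# 1022 ER_DUP_KEY, 1062 ER_DUP_ENTRY, 1169 ER_DUP_UNIQUE.
# 1061 (ER_DUP_KEYNAME) concerns index names, not row data.
DUPLICATE_KEY_CODES = {1022, 1062, 1169}


def parse_sql_state(sql_state: Optional[str]) -> int:
    """Parse an SQLState string as an int, falling back to -1 if it isn't numeric."""
    try:
        return int(sql_state)
    except (TypeError, ValueError):
        return -1


def is_duplicate_key(vendor_code: int) -> bool:
    """Classify by vendor error code rather than by the generic SQLState class."""
    return vendor_code in DUPLICATE_KEY_CODES


if __name__ == "__main__":
    # 1062 and 1048 share SQLState 23000, so the state alone cannot tell them apart.
    for code, state in [(1062, "23000"), (1048, "23000"), (0, None)]:
        print(code, parse_sql_state(state), is_duplicate_key(code))
```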
ianw | https://gerrit.googlesource.com/gerrit/+/refs/heads/master/java/com/google/gerrit/server/schema/JdbcAccountPatchReviewStore.java#399 | 23:43 |
ianw | i think that comes from the connector layer | 23:43 |
ianw | i wonder if switching to the mysql connector might help | 23:44 |
clarkb | wouldn't surprise me if they handle that better for whatever reason | 23:45 |
clarkb | I feel like this is fixable in gerrit, but probably not today. We can use my test instance held for the openid deletion testing to dig in more later if we want. But for now we might consider h2 or trying mysql? | 23:47 |
ianw | i agree. i think first thing to try is mysql connector, and then h2 | 23:47 |
clarkb | sounds good to me | 23:47 |
clarkb | ianw: that seems to work | 23:50 |
ianw | alright, i have manually switched it to the mysql connector and it doesn't seem to be doing it | 23:50 |
clarkb | ya I think that is good enough for now. We can follow up with this upstream next | 23:51 |
clarkb | but use mysql connector in the interim | 23:51 |
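Gerrit reads the reviewed-files database location from `accountPatchReviewDb.url` in gerrit.config, and the JDBC scheme selects the driver, so switching to the mysql connector amounts to changing `jdbc:mariadb://` to `jdbc:mysql://`, with the MySQL Connector/J jar available to Gerrit (typically in its lib/ directory). A hypothetical helper for that rewrite, using a made-up example URL:

```python
def use_mysql_connector(url: str) -> str:
    """Rewrite a jdbc:mariadb:// URL to the jdbc:mysql:// scheme.

    Host, port, database and query parameters are preserved; any URL that
    does not use the mariadb scheme is returned unchanged.
    """
    prefix = "jdbc:mariadb://"
    if url.startswith(prefix):
        return "jdbc:mysql://" + url[len(prefix):]
    return url


if __name__ == "__main__":
    # Hypothetical URL, not the one configured on review02.
    print(use_mysql_connector("jdbc:mariadb://localhost:3306/reviewed?useSSL=false"))
```

Keeping the change in the deployed configuration (rather than a hand edit on the server) is what stops a later config update from silently reverting it.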
ianw | yep. i will update the change to move gerrit out of the staging group to include this, so config update doesn't overwrite it | 23:51 |
clarkb | ++ | 23:51 |
clarkb | I'll probably take a break for dinner once I've reviewed that updated change. Then try to be back around when it will be landing | 23:54 |
opendevreview | Ian Wienand proposed opendev/system-config master: review02: move out of staging group https://review.opendev.org/c/opendev/system-config/+/797563 | 23:54 |
clarkb | I think you need to rebase the backups change on that too | 23:55 |
clarkb | the change in ^ seems to match what I saw you update on the server not too long ago | 23:56 |
clarkb | ianw: that change isn't being tested by zuul? | 23:56 |
ianw | yep backups shouldn't conflict | 23:56 |
ianw | yes, i was just noticing that :) | 23:57 |
clarkb | 2021-07-18 23:56:23,144 ERROR zuul.Pipeline.openstack.release-approval: [e: 6a90559ab864448dae6831a5fc805777] Invalid config for change <Change 0x7f4f2460c3a0 opendev/system-config 797563,3> | 23:57 |
ianw | indeed | 23:58 |
ianw | zuul.model.TemplateNotFoundError: Project template system-required not found | 23:58 |
clarkb | zuul.model.TemplateNotFoundError: Project template system-required not found | 23:58 |
clarkb | yup just found that | 23:58 |
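To see which changes hit the same invalid-config error after the restart, the scheduler log can be filtered for lines shaped like the one quoted above. The regex is fitted to that single line and is an assumption about the format; other Zuul versions may log differently.

```python
import re
from typing import Dict, Optional

# Fitted to: "<date> <time> ERROR zuul.Pipeline.<tenant>.<pipeline>: [e: <id>]
# Invalid config for change <Change 0x... <project> <change>,<patchset>>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) ERROR zuul\.Pipeline\.(?P<tenant>[^.\s]+)\.(?P<pipeline>[^:\s]+): "
    r"\[e: (?P<event>[0-9a-f]+)\] Invalid config for change "
    r"<Change 0x[0-9a-f]+ (?P<project>\S+) (?P<change>\d+),(?P<patchset>\d+)>"
)


def parse_invalid_config(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of an 'Invalid config for change' line, or None if it doesn't match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None
```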
clarkb | this feels like broken zuul restart | 23:59 |
clarkb | like it didn't load the configs successfully | 23:59 |
clarkb | zuul pulls from gerrit | 23:59 |
clarkb | ianw: did you do a full zuul restart of the executors and mergers too? I wonder if they are/were still trying to talk to the old server? | 23:59 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!