Sunday, 2021-07-18

corvus#status log restarted all of zuul on 7e1e5a0176620d877717e87035223e4f3195d26700:02
opendevstatuscorvus: finished logging00:02
corvusokay, the last NO_JOBS we should see in the buildset page for the openstack tenant is openstack/charm-octavia 800619,200:05
*** dpawlik3 is now known as dpawlik00:39
*** dpawlik2 is now known as dpawlik07:43
ianwit's moving day21:00
ianwfor reference the checklist is @
ianwi'll log notes in here too21:05
ianwas a first thought, i'm going to reboot review02 just to make sure it's on the latest everything21:05
ianwok, it's back, ipv6 & ipv4 pinging21:10
ianwi've updated the review01 maintenance message in preparation for switching that on21:16
clarkbianw: sounds good. I'm rolling in nowish. Have some things I need to get done as this picks up steam21:23
ianwi'm a bit earlier than the email, but since the last email was sent we've decided to start spreading the delta variant and close schools, which is annoying on many levels21:26
ianw#status alert Gerrit Downtime -- over the next few hours the gerrit service will be offline as we move it to a new home.  Thank you for your patience and we will send an alert when things are restored in 4-6 hours21:30
opendevstatusianw: sending alert21:30
-opendevstatus- NOTICE: Gerrit Downtime -- over the next few hours the gerrit service will be offline as we move it to a new home. Thank you for your patience and we will send an alert when things are restored in 4-6 hours21:30
ianwok, maintenance page up21:44
ianwi'm running a screen on review02, it's also logging to /root/screenlog.021:45
clarkbok I'll try to join in a bit. Finishing up a few things21:46
ianwno worries; i know i'm early on this sorry21:48
ianwso far all good, and syncing git repos now21:49
clarkbFor the two user account fixups I wanted to mention that you need to fetch the latest version of that ref before updating it and pushing it back to avoid undoing any changes since we tested it last week. I think you know this but wasn't sure if that was explicitly stated21:53
clarkbI see that ansible is thoroughly stopped on bridge as we want :)21:54
ianwyep i was going to clone it fresh to avoid any old data laying around21:54
clarkbI'm on the screen now. Will go grab a glass of water and be back shortly21:55
ianwi just realised i'm running the rsync in the terminal tab i opened to check the logging of the screen session, doh.  anyway, i'll be sure to close that when it finishes21:57
clarkbWhen you've got the user fixes written and committed I'll happily check the git log -2 -p on that :)21:58
clarkbianw: it occurred to me that you'll need to populate the known hosts for gerrit2 on review02 to include gitea01-0821:59
clarkband I want to say we need the rsa keys for that too?21:59
clarkbthe rsa thing might be an old gerrit problem and the other host keys may be fine now though22:00
*** kopecmartin is now known as kopecmartin|pto22:02
ianwi was thinking whatever ssh gerrit uses probably just accepts the keys, although that makes sense22:02
ianwok, syncing done22:02
clarkbpretty sure it doesn't and this is a common "oh crap" problem that forces us to restart the server since it loads that data only on startup22:02
clarkbnot the end of the world if we run into it, but I think we can avoid it entirely and add those host keys to known hosts early22:03
clarkb(I put a note on the etherpad about it as that can be done while reindexing happens)22:03
clarkbianw: I think you need to pass --author the author info?22:08
ianwyep :)22:11
clarkbianw: those two commits lgtm22:11
ianwexcellent, i think we can push then22:11
clarkbyou need to stop being gerrit222:13
clarkbit was correct to be gerrit2 for the git edits but now not for the reindex22:13
ianwprobably should have ansibleised the known hosts thing, we could definitely do that22:15
ianwi can just run a loop to get them22:15
clarkbianw: one small edit to your loop in the etherpad22:20
clarkbI think you need the echo -n otherwise they will end up on different lines22:21
ianwoh it was supposed to be a comment22:21
ianwi guess that kind of defies the point of hashing the string in the file, but i think it's better in this case :)22:22
ianw /home/gerrit2/tmp/ianw/known_hosts is a sample22:23
clarkbah grab all of them. that wfm22:24
clarkball the key types I mean22:25
clarkbianw: note there is content in known hosts already that you probably don't want to overwrite22:25
clarkbso append don't mv the file over :)22:25
ianwyep should >>22:25
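The append-rather-than-overwrite point above can be sketched in Python. This is only an illustration of the idea, not what was run during the move; the function name `append_known_hosts` is hypothetical.

```python
from pathlib import Path


def append_known_hosts(path, new_lines):
    """Append host-key lines to a known_hosts file, leaving existing entries alone."""
    target = Path(path)
    existing = target.read_text() if target.exists() else ""
    # Open in append mode so nothing already in the file is replaced.
    with target.open("a") as handle:
        # Make sure the first new entry starts on its own line.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        for line in new_lines:
            handle.write(line.rstrip("\n") + "\n")
```

Opening the file in append mode plays the same role as `>>` in a shell redirect: the entries that were already present survive.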
ianw was a very interesting read on a Curve25519 hardware accelerator prototype that popped in my feed22:26
ianwapropos key types22:26
clarkbianw: another thing I notice (because nothing like pressure to quadruple check things) is the ~gerrit2/.launchpadlib dir is empty on 0222:34
clarkbI wonder if we failed to convert that properly in the puppet -> ansible rewrite22:34
clarkbthat might be another thing to manually copy over then ansible later?22:34
clarkbreindexing is done22:42
clarkbianw: should be 060022:45
clarkbaccording to the half there ansible22:45
clarkbianw: oh wait problem with that22:46
clarkbits port 222 that we care about I think22:46
clarkbdouble checking the port on gitea01 now22:46
clarkbyup 22222:47
clarkblooks like ssh-keyscan -p 222 -H $HOST will do it?22:48
ianwhrm, good point, let me try that22:48
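A rough Python sketch of the scan discussed above, assuming the eight hosts are named gitea01 through gitea08 and listen for ssh on port 222. The domain `example.org` is a placeholder and the function names are hypothetical; only the `ssh-keyscan -p 222 -H $HOST` invocation comes from the log.

```python
import subprocess


def keyscan_command(host, port=222):
    """Build the ssh-keyscan argv for one host; -H hashes the hostname in the output."""
    return ["ssh-keyscan", "-p", str(port), "-H", host]


def gitea_hosts(count=8, domain="example.org"):
    """Return gitea01 .. gitea08 style names; the domain is a placeholder assumption."""
    return [f"gitea{n:02d}.{domain}" for n in range(1, count + 1)]


def scan_hosts(hosts, port=222):
    """Run ssh-keyscan against each host and collect the key lines it prints."""
    lines = []
    for host in hosts:
        result = subprocess.run(
            keyscan_command(host, port),
            capture_output=True,
            text=True,
            check=False,  # an unreachable host just contributes no lines
        )
        lines.extend(result.stdout.splitlines())
    return lines
```

The collected lines would then be appended to the existing known_hosts file rather than moved over it, as noted earlier in the log.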
ianwok, i think we're ready to start containers22:53
clarkbseems like it is up. Are you testing login with overridden hosts file?22:55
ianwabout to, yep22:55
ianwok, i'm in, my changes all look correct22:57
ianwok, i agree on just replicating a project to make sure it's working22:59
clarkbbased on the log I think that went well23:03
clarkbreplication log I mean23:03
clarkbif we had problems you get errors and retries23:03
ianw++ yay.  alright, time to merge the dns change and replicate that then23:04
clarkbneed a space before --message23:06
clarkbI'll check all 8 giteas to be sure they got that now23:06
clarkbseems to be there on all 823:08
ianwok, switching to bridge to run the dns update23:09
ianwnameserver playbook running23:10
clarkbI see the updated CNAME now23:12
ianwand done.  google admin dns toolbox tells me is updated23:13
clarkbI have successfully logged in via the web ui now too23:14
ianwbringing zuul back23:15
clarkbas expected if I look at changes it shows I haven't reviewed any files. But if I open and view those files their state changes to reviewed23:15
clarkbianw: myname.admin needs to be updated23:17
ianwjust waiting for zuul to load openstack tenant 23:19
clarkbI left a comment on to kick the tires and it went well23:20
clarkbianw: hrm if I click the dropdown for the lone file in that change I get a 500 error for Endpoint: /changes/*~*/revisions/*/files/*/reviewed23:22
clarkbthe file loads and I can dismiss that though23:22
clarkblooking at error log it seems that it is trying to insert an entry to the db saying I've reviewed the file23:23
ianwCaused by: java.sql.SQLException: Duplicate entry '801189-1-4146-nodepool/' for key 'PRIMARY23:23
clarkbthen hits a duplicate entry error23:23
clarkbI'm guessing gerrit indiscriminately pushes those updates and mariadb doesn't like that?23:24
ianwyeah, if i mark it reviewed, then click the file to review it, i get the same thing23:26
clarkb I think that is how mariadb wants you to handle it23:26
clarkbianw: really thinking out loud here we could pivot and force it to use the h2 database instead23:27
clarkbthat makes me sad because the h2 tooling is so clunky23:28
ianwit is an option23:28
ianwi'm just reading the insert bits now23:30
ianw      if (ormException instanceof DuplicateKeyException) {23:30
ianw        return false;23:30
ianwit might be that mariadb returns a different value23:31
clarkboh interesting23:31
clarkbCaused by: java.sql.SQLException: Duplicate entry23:31
clarkbthat seems to be the exception that is raised that is causing the fuss23:31
clarkbwhich becomes a SQLIntegrityConstraintViolationException23:32
clarkbon a separate note it seems that zuul is up23:33
clarkbzuul isn't doing anything though23:33
clarkbso I guess if we can determine the error code we might be able to patch this?23:35
ianwit seems the error codes should be the same for mariadb (i mean, not surprising)23:36
clarkb seems to be a full list23:37
clarkbI wonder if we are getting a 1061 from the database23:38
clarkbhowever 1062's string in the mariadb list matches what I see in our error log for the exception23:38
clarkbDuplicate entry '%s' for key %d <- that should be a 106223:39
clarkbmaybe getSQLStateInt(err) is returning something else?23:39
ianwthat's where i believe it's saying "if duplicate key error, just ignore this" ... presumably because it's already been reviewed23:40
clarkb it does default to -1 or 0 if it can't parse the error out23:41
clarkbI think either we're not parsing the error out properly or the instanceof check is failing23:41
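To make the theory above concrete, here is a small Python sketch of "treat a duplicate-entry error as already reviewed". It is not Gerrit's code: `DatabaseError`, `is_duplicate_entry`, and `mark_reviewed` are hypothetical names. The error numbers are the MySQL/MariaDB ones (1062 for a duplicate entry, 1061 for a duplicate key name), and the SQLSTATE fallback assumes the driver still reports class 23 when the numeric code cannot be parsed.

```python
ER_DUP_KEYNAME = 1061  # "Duplicate key name '%s'": an index-name clash, not a row clash
ER_DUP_ENTRY = 1062    # "Duplicate entry '%s' for key %d": the row already exists


class DatabaseError(Exception):
    """Hypothetical stand-in for whatever the connector layer raises."""

    def __init__(self, message, error_code=None, sql_state=None):
        super().__init__(message)
        self.error_code = error_code
        self.sql_state = sql_state


def is_duplicate_entry(error_code=None, sql_state=None):
    """Decide whether an error means the row was already present."""
    if error_code == ER_DUP_ENTRY:
        return True
    # If the code could not be parsed (None, 0 or -1), fall back to SQLSTATE
    # class 23 (integrity constraint violation). This is coarser than 1062.
    if error_code in (None, 0, -1) and sql_state:
        return sql_state.startswith("23")
    return False


def mark_reviewed(insert_row):
    """Return True if a new row was written, False if it was already there."""
    try:
        insert_row()
    except DatabaseError as err:
        if is_duplicate_entry(err.error_code, err.sql_state):
            return False
        raise
    return True
```

If the connector hands back an exception type or code that this kind of check does not recognise, the duplicate propagates as an error instead of being ignored, which would match the 500 seen in the web UI.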
ianwi think that comes from the connector layer23:43
ianwi wonder if switching to the mysql connector might help23:44
clarkbwouldn't surprise me if they handle that better for whatever reason23:45
clarkbI feel like this is fixable in gerrit, but probably not today. We can use my test instance held for the openid deletion testing to dig in more later if we want. But for now we might consider h2 or trying mysql?23:47
ianwi agree.  i think first thing to try is mysql connector, and then h223:47
clarkbsounds good to me23:47
clarkbianw: that seems to work23:50
ianwalright, i have manually switched it to the mysql connector and it doesn't seem to be doing it23:50
clarkbya I think that is good enough for now. We can follow up with this upstream next23:51
clarkbbut use mysql connector in the interim23:51
ianwyep.  i will update the change to move gerrit out of the staging group to include this, so config update doesn't overwrite it23:51
clarkbI'll probably take a break for dinner once I've reviewed that updated change. Then try to be back around when it will be landing23:54
opendevreviewIan Wienand proposed opendev/system-config master: review02: move out of staging group
clarkbI think you need to rebase the backups change on that too23:55
clarkbthe change in ^ seems to match what I saw you update on the server not too long ago23:56
clarkbianw: that change isn't being tested by zuul?23:56
ianwyep backups shouldn't conflict23:56
ianwyes, i was just noticing that :)23:57
clarkb2021-07-18 23:56:23,144 ERROR zuul.Pipeline.openstack.release-approval: [e: 6a90559ab864448dae6831a5fc805777] Invalid config for change <Change 0x7f4f2460c3a0 opendev/system-config 797563,3>23:57
ianwzuul.model.TemplateNotFoundError: Project template system-required not found23:58
clarkbzuul.model.TemplateNotFoundError: Project template system-required not found23:58
clarkbyup just found that23:58
clarkbthis feels like broken zuul restart23:59
clarkblike it didn't load the configs successfully23:59
clarkbzuul pulls from gerrit23:59
clarkbianw: did you do a full zuul restart of the executors and mergers too? I wonder if they are/were still trying to talk to the old server?23:59
