Friday, 2021-07-30

ianwit is running a lot of git fetches as expected, hound does the polling internally, we only re-index if projects.yaml changes00:00
clarkbcorvus: I left a note00:01
clarkbbut +2'd00:01
corvusclarkb: i replied to your comment -- i don't think it's an issue, but in replying to yours i think i found a -100:07
corvusclarkb: can you double check me on that?00:08
opendevreviewJames E. Blair proposed opendev/system-config master: Serve static website
opendevreviewJames E. Blair proposed opendev/system-config master: Serve static website
ianw./vcs-c7bbaa409001dd481855aa03ce4d290bd71f9d88/.git/index.lock is zuul and is why it's not updating i guess00:23
ianwi've removed the lock files.  i get the feeling we may not shut down hound nicely00:24
ianwit runs under supervisord so you'd think it would pass on signals00:25
fungii think there's also been one provider outage causing an unclean reboot in the not so distant past00:29
ianwJul 30 00:27:57 codesearch01 docker-hound[650]: 2021/07/30 00:27:57 Rebuilding zuul/zuul for 9f2e3e23e91f8d397f36bd03fcaccc4ca7ff4edc00:29
ianwfungi: yeah, that seems more likely.  we could add a container start step to remove lockfiles00:30
opendevreviewIan Wienand proposed opendev/system-config master: hound: clear git lockfiles before starting daemon
opendevreviewIan Wienand proposed opendev/system-config master: hound: clear git lockfiles before starting daemon
opendevreviewIan Wienand proposed opendev/system-config master: hound: clear git lockfiles before starting daemon
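
A minimal sketch of the "remove lockfiles before starting the daemon" idea discussed above. It is not the content of the proposed system-config change; the function name and the directory layout (one `.git` directory per indexed repository under a data directory) are assumptions for illustration. It should only run while nothing else is using the repositories, since a live git process may legitimately hold `index.lock`.

```python
import os


def remove_stale_git_locks(data_dir):
    """Delete every .git/index.lock below data_dir; return the removed paths.

    Assumption: no git process is running, so any lock found is stale.
    """
    removed = []
    for root, _dirs, files in os.walk(data_dir):
        # Only touch index.lock files that sit directly inside a .git directory.
        if os.path.basename(root) == ".git" and "index.lock" in files:
            lock_path = os.path.join(root, "index.lock")
            os.remove(lock_path)
            removed.append(lock_path)
    return sorted(removed)
```

Calling `remove_stale_git_locks("/path/to/hound/data")` (a hypothetical path) before launching the indexer would clear locks left behind by an unclean shutdown.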
opendevreviewTristan Cacqueray proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
opendevreviewIan Wienand proposed opendev/system-config master: system-config-roles: test centos-8-stream
opendevreviewMerged opendev/system-config master: Add Debian Bullseye testing
opendevreviewIan Wienand proposed zuul/zuul-jobs master: configure-mirrors: don't install wheel mirror on CentOS 8 Stream
opendevreviewTristan Cacqueray proposed opendev/system-config master: Add script to import gerritbot configuration
opendevreviewIan Wienand proposed opendev/system-config master: Remove Fedora 32 mirror
opendevreviewMerged openstack/project-config master: Stop launch fedora-32 nodes nodepool
ianwi think our upstream centos-8-stream mirror is out of sync04:10
*** ykarel|away is now known as ykarel04:33
*** marios is now known as marios|ruck05:05
ianwmaybe not, i might have just chosen literally the worst time to try and build a centos-8-stream kernel module05:11
*** amoralej|off is now known as amoralej06:39
*** rpittau|afk is now known as rpittau07:21
rpittaugood morning! Please when anyone has a chance can you have a look at ? Thank you!07:36
*** ykarel is now known as ykarel|lunch08:25
*** artom_ is now known as artom09:03
*** ykarel|lunch is now known as ykarel09:52
fungithe renewed ssl cert still hasn't been picked up by apache, so i'll plan to restart apache as part of today's rename maintenance13:09
clarkbfungi: morning14:03
clarkbfungi: I need to water the garden really quick and find some breakfast but I've managed an early start :)14:03
fungiawesome, i'm working on the disable list now14:04
fungiand will then send the slightly less than one hour warning14:04
*** rpittau is now known as rpittau|afk14:07
fungi#status log Temporarily disabled automated deployment for,, and gitea* in preparation for 15:00 UTC project rename maintenance.14:07
opendevstatusfungi: finished logging14:08
fungi#status log There will be a brief outage of the Gerrit service on starting at 15:00 UTC today as part of a routine project rename maintenance:
opendevstatusfungi: finished logging14:10
fungier, that was supposed to be a notice, trying again14:10
fungi#status notice There will be a brief outage of the Gerrit service on starting at 15:00 UTC today as part of a routine project rename maintenance:
opendevstatusfungi: sending notice14:10
-opendevstatus- NOTICE: There will be a brief outage of the Gerrit service on starting at 15:00 UTC today as part of a routine project rename maintenance:
opendevreviewMonty Taylor proposed opendev/system-config master: Run matrix-gerritbot on eavesdrop
mordredcorvus, tristanC, clarkb : ^^ I left a -1 on the previous PS, but then just figured I could also just fix it14:17
mordredin the process, I also added a mode: 0600 to the template - because we're putting the token into that file14:17
corvusmordred: thanks for catching that, fix lgtm14:17
clarkbthe script to get the access token is a nice way to address that14:33
clarkbfungi: have you started a screen on bridge yet? If not maybe you want to do that? Your terminal size is likely to be smaller than mine :)14:34
fungiclarkb: done14:37
fungithanks for the suggestion14:37
clarkbfungi: I also put the renames file on bridge. Path is in the etherpad if you want to double check that14:39
fungiyep, i was just fixing up the command line for it too14:39
clarkbdouble check the path and the file contents I mean14:39
clarkbI took it from
fungiinteresting that the rename_repos.yaml playbook copy on bridge was modified 7 minutes ago... did you do that?14:41
clarkbfungi: and maybe we should go ahead and accept the ssh host key as gerrit2 on review02?14:41
fungii'll do the host key now, sure14:41
fungiECDSA key fingerprint is SHA256:/aPoKpg+804wdezs21L9djZ4bOsLudpGF7m7779XVuQ.14:43
fungiWarning: Permanently added '[localhost]:29418' (ECDSA) to the list of known hosts.14:43
fungithat was as the gerrit2 account14:44
clarkbon review02? should be set then14:44
clarkbfungi: I left a thought on step 7 of the etherpad. I'm not sure what the best approach is there but I think either way it will be fine.14:44
fungiwhat's the way to have ansible list the inventory entries matching a group? i wanted to test that the gitea wildcard actually got matched correctly14:44
fungiand yeah i think #7 is safe as-is, but we probably want to double-check things after the jobs run14:46
clarkb`ansible-inventory disabled` ?14:46
clarkbmordred: ^14:46
fungilooks like `ansible-inventory --list disabled` but that throws a serialization error14:47
fungiERROR! Unexpected Exception, this is probably a bug: Object of type 'bytes' is not JSON serializable14:48
clarkbis it possible it doesn't like the * for some reason?14:49
clarkbit does error after it says it is parsing the emergency file14:49
fungiif i tell it to do yaml instead (-y) it works14:50
clarkbha that did a thing14:50
fungi--list doesn't take a parameter though, it's just dumping the entire inventory14:51
clarkbmaybe `ansible disabled --list-hosts` I'm trying to read more on that14:52
fungiyeah, that did the trick, thanks14:52
mordredyeah - ansible --list-hosts disabled should do it14:53
fungiit is indeed correctly including all the gitea servers14:53
clarkbfungi: but not storyboard14:53
mordredsorry - I didn't catch the "ansible-inventory" part14:53
fungigood catch, that has a new hostname14:53
fungi<- should be the entry in emergency.yaml not storyboard.opendev.org14:53
fungithat's better14:54
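
The check above (does the `gitea*` wildcard match every gitea server, and does the storyboard entry match anything at all) can be illustrated with shell-style pattern matching. This is only a sketch of the idea, not how Ansible resolves groups internally; the function name and all hostnames are hypothetical.

```python
from fnmatch import fnmatchcase


def hosts_matching(patterns, inventory):
    """Return the inventory hosts that match at least one shell-style pattern."""
    return [
        host
        for host in inventory
        if any(fnmatchcase(host, pattern) for pattern in patterns)
    ]


# Hypothetical inventory and disable list, for illustration only.
inventory = [
    "gitea01.example.org",
    "gitea02.example.org",
    "review.example.org",
    "storyboard.example.com",
]
disabled = ["gitea*", "review.example.org", "storyboard.example.org"]

print(hosts_matching(disabled, inventory))
```

Here the storyboard entry silently matches nothing because the pattern names a different domain than the inventory host, which is the kind of mismatch caught in the conversation.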
opendevreviewMonty Taylor proposed opendev/system-config master: Replace callback_whitelist with callback_enabled
mordredansible --list-hosts disabled is throwing some warnings - that ^^ will fix one of them14:56
clarkbI feel like those warnings are even less meaningful now with the version situation14:57
mordredthe other is a warning that we'll need python 3.8 for the next version of ansible14:57
mordredit is installable on bridge14:57
clarkbwe install ansible 4.0 or whatever and then it says ansible-core 2.1214:57
mordredso maybe the easy fix for that is just to apt install python3.8 and then update the pip install of ansible to use python3.8?14:58
clarkbmordred: I think we can do that all via config management. It's all tested too14:58
clarkbbasically if you update config management to do python3.8 install and then install ansible under that it will update the bin paths14:58
clarkband our testing will do a decent job of ensuring the new python doesn't break us14:58
fungishould i go ahead and push the status notice that we're starting?14:59
clarkb++ I'm ready if you are14:59
fungi#status notice There will be a brief outage of the Gerrit service on in the next few minutes as part of a routine project rename maintenance:
opendevstatusfungi: sending notice14:59
-opendevstatus- NOTICE: There will be a brief outage of the Gerrit service on in the next few minutes as part of a routine project rename maintenance:
fungii'll go ahead and restart apache in just a moment to pick up the renewed https cert14:59
clarkblet me know when that is done and I can double check on my client15:00
fungiit's restarted15:00
clarkbhrm it still shows the august expiry. I wonder if we didn't renew the cert properly?15:01
clarkbI swear the timestamp on the cert was for july 2815:01
fungii guess we can look into that afterward15:01
clarkbya this is why we alert a month in advance15:01
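
A small sketch of the "alert a month in advance" rule mentioned above. The 30-day window and the example expiry date are assumptions; the log only says the certificate expires in August.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a month" is treated as 30 days.
ALERT_WINDOW = timedelta(days=30)


def needs_renewal_alert(not_after, now, window=ALERT_WINDOW):
    """Return True when the certificate expires within the alert window."""
    return not_after - now <= window


# Hypothetical expiry date; the real date is not given in the log.
not_after = datetime(2021, 8, 26, tzinfo=timezone.utc)
now = datetime(2021, 7, 30, 15, 1, tzinfo=timezone.utc)
print(needs_renewal_alert(not_after, now))
```

With these example values the certificate is inside the window, so the alert would already be firing while the renewal problem is being debugged.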
fungii'm prepped to start the playbook if everyone's ready15:01
fungi(root screen session on bridge.o.o)15:02
clarkbI guess we're as ready as we will be15:02
clarkbstoryboard failed15:03
clarkblooks like storyboard-dev is unreachable?15:04
fungifatal: []: UNREACHABLE!15:04
clarkbI think we should make a copy of the playbook then disable everything that has already run15:05
clarkbthen let it run forward against storyboard and zuul and so on15:05
clarkbnow what I'm not clear about is whether or not we need that to be in a full system-config checkout15:05
clarkbdue to playbook and role lookup paths. I think I've always done it that way15:05
clarkb(this is why I have a checkout in my homedir)15:05
fungiokay, that version look right?15:08
fungideleted all the hosts up through storyboard-dev15:08
clarkbya it starts with storyboard15:08
clarkbI think that is good15:08
mordredyeah - for manual stuff you really need to disable ansible with the flag file then operate out of the zuul dir15:09
mordredbecause of paths and stuff15:09
fungiupdated command there look right?15:09
clarkbfungi: yes15:09
mordredit's probably fine for renames - but having a tmp-system-config checkout dir is dangerous15:10
fungiyeah, this is one playbook we run manually, so presumably okay15:11
clarkbmordred: the problem is the disable file doesn't work for other reasons15:11
clarkbmordred: specifically the jobs queue up behind it with old state15:11
mordredoh awesome15:11
clarkbreally what we need is a disable flag where no jobs queue they just noop or fail15:11
fungii mean, it works as designed, we just didn't design it to do what we need for this15:12
clarkbthen we can reenqueue later15:12
clarkbfungi: right15:12
clarkbthe web ui loads for me so this extra sleep is probably too conservative :)15:12
fungipart of it though is because we're changing state behind gerrit/gitea/storyboard/zuul's backs15:12
clarkbI wasn't sure how soon after the 29418 port starts to listen it is actually able to process things15:12
fungiokay, that looks better15:12
clarkbyup that looks like it all went as expected15:13
fungino errors on the remainder, so it was just a problem with storyboard-dev being down15:13
clarkb lgtm15:13
fungiwhich is unimportant15:13
clarkb reports no change15:13
clarkb*no changes which I think is expected while stuff reindexes?15:14
fungi is redirecting properly too15:14
clarkbIf the no changes report above persists after reindexing is complete that would imply we need to reindex project changes with that command I linked the other day15:14
clarkbfungi: are you going to force merge next?15:15
fungi[2021-07-30T15:12:40.775+0000] [Reindex changes v60-v60] INFO : Starting online reindex of changes from schema version 60 to 6015:16
fungiokay, so yeah i think we can move on15:16
fungii'll add myself to project bootstrappers for step 515:16
opendevreviewMerged openstack/project-config master: Rename x/tap-as-a-service to openstack/tap-as-a-service
fungithat's in ^15:18
fungishould i do the same with 802809 or just let it gate normally?15:18
clarkblets let that one gate normally15:18
fungiapproved it now15:19
fungideescalating my account privs now too15:19
clarkbI'm checking replication of 790093 now15:19
opendevreviewMerged opendev/project-config master: Add tap-as-a-service rename records
fungithat was quick15:20
clarkblooks replicated15:20
clarkbfungi: ya its noop jobs iirc15:20
fungionce 802809 is confirmed replicated i guess we can do step 715:21
clarkbI don't think replication for 802809 matters. The deploy jobs for 790093 have queued up behind the hourly job to run the cloud launcher playbooks which means if we remove the hosts from the emergency file they should all update normally (hourly deploy only touches zuul and we aren't worried about it getting the wrong config I don't think)15:22
clarkball that is a long way of saying I think we can remove the hosts from the emergency file safely and then watch that jobs do the noop thing we expect them to15:22
fungiyeah, i buy that logic. okay removing those entries from the disable list now15:22
fungiand done15:22
clarkbI guess it isn't completely a noop as the acls for the project will change15:22
clarkbbut mostly a noop :)15:23
fungii'll start taking a look at the storyboard-dev situation, and can then look into the review https cert problem15:23
fungibut barring any problems with deploy jobs or completion of gerrit's change reindexing we can consider the maintenance concluded15:24
clarkbthe gerrit queue is dropping15:24
clarkbopenstack/tap-as-a-service reindexing is still queued though15:24
funginova says is in an active state. i'll check its console15:25
clarkbthinking out loud here I think what that other project changes reindex command would do for us is optimize this process. We are reindexing everything because that is what older gerrit had the option of doing but now we have the option of reindexing for specific projects15:28
clarkbthis would need testing, but maybe next time we rename we try with a set of project change reindexes instead and then if that fails we do the big reindex15:29
fungigood idea15:29
fungistoryboard-dev oob console shows the usual signs of cpu lockup (task xxxx:yyyy blocked for more than 120 seconds)15:30
fungiand i can't get a fresh login prompt on it15:30
clarkbunder 1k tasks now15:31
fungigoing to hard reboot15:31
clarkbwhen I first looked it was at 166215:31
fungi#status log Hard rebooted via Nova API due to what appears to be a CPU lockup.15:32
opendevstatusfungi: finished logging15:32
fungii can ssh into it now15:33
fungilooks like it died around 2021-01-10 19:40z based on the gap in its syslog15:34
fungiso it's been hung for a little over 6 months15:34
clarkbbeen a while :)15:34
fungilooking at review's https cert now15:34
fungiapache is loading /etc/letsencrypt-certs/ which was last modified 2021-05-28 03:57z15:35
fungiwe have an updated csr in that directory from today, but no new cert15:36
fungiso i think renewal is breaking15:36
clarkbbah its the conf and csr files that had new timestamps15:36
clarkbmy mistake on reading ls output15:36
fungi/var/log/ says " error:DNS problem: NXDOMAIN looking up TXT for - check that a DNS record exists for this domain"15:37
fungiand i concur15:37
fungiHost not found: 3(NXDOMAIN)15:37
clarkbI wonder if that got missed in the dns updates for the server move15:37
fungii'll see what's missing in our zonefile15:37
clarkblike maybe it got removed as part of the review01 cleanup15:37
fungiwe were probably relying on a cname for it at some point15:37
clarkb has changes now15:38
clarkbwith 96 tasks remaining15:38
clarkbreindexing lgtm15:38
clarkbthough it isn't done yet. It will likely chew on the bigger repos like nova and neutron for a bit15:38
clarkbfungi: ya looking at the git log I think we may have mistakenly removed the record for and left's record15:39
clarkbwe should be good to go if we just add that record back again15:40
clarkboh wow reindexing is done15:40
clarkbsuch zoom zoom15:40
opendevreviewJeremy Stanley proposed opendev/ master: Restore ACME challenge CNAME for review
clarkb+2 you can probably go ahead and approve that though15:43
clarkblooks like we have to wait for all of the hourly jobs to finish before the project-config deploy jobs will run15:44
clarkbProbably at least another 15 minutes?15:44
fungisounds about right15:44
clarkbI have detached from the screen as I think the work there is complete15:45
fungiyeah, closing it down now15:48
fungi[2021-07-30T15:39:30.606+0000] [Reindex changes v60-v60] INFO : Reindex changes to version 60 complete15:48
clarkbI think the offline reindex was about 28 minutes15:49
clarkbonline seems to be about 38?15:49
clarkbwhen did it start?15:49
fungi0h26m50s before that15:49
clarkbat 15:12 - 15:39 so that is in the same time range as offline15:49
fungiso yeah, completed in under half an hour15:50
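
The online reindex duration quoted above can be reproduced from the two Gerrit log timestamps that appear in this log (start at 15:12:40.775, completion at 15:39:30.606). The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the Gerrit log lines quoted in this channel.
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

start = datetime.strptime("2021-07-30T15:12:40.775+0000", FMT)
end = datetime.strptime("2021-07-30T15:39:30.606+0000", FMT)

elapsed = end - start
# Round to whole seconds for comparison with the figure quoted in chat.
rounded = timedelta(seconds=round(elapsed.total_seconds()))

print(elapsed)  # 0:26:49.831000
print(rounded)  # 0:26:50
```

That is roughly 26 minutes 50 seconds, in line with the "0h26m50s" figure and in the same range as the roughly 28 minutes recalled for the offline reindex.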
clarkbfungi: I guess the last thing to check is the openstack/tap-as-a-service acl after manage projects runs?15:50
clarkbthey are rehoming under the openstack meta project and adding a review priority category15:50
fungiin theory the acl shouldn't change15:50
clarkbfungi: it is changing, they add a review category15:51
fungiunless we don't push the updated acl into gerrit as part of the rename maintenance15:51
clarkbwe don't15:51
fungiyeah, then it should change ;)15:51
clarkbI think this is the first time where an acl change is part of a rename15:52
clarkbor at least it wasn't common before. But now  that openstack has the meta project it will be more common15:52
clarkbwe might request people not do things like add review categories in those changes though15:52
clarkbto keep it as simple as possible15:52
opendevreviewMerged opendev/ master: Restore ACME challenge CNAME for review
clarkbthinking out loud for a reason to split those project-config changes up in the future with a followon to do the zuulconfig update: We don't run any ci when zuul hits the config error15:58
clarkbI don't think that will be a problem in this case. I've gone and double checked the projects.yaml and the acl file and I think they are all in good shape, but not having that CI is annoying15:58
fungiso we get no real validation of the rest15:59
clarkbI think that is a good enough reason for me to want to split those changes up in the future15:59
clarkbmanage-projects job is running now15:59
clarkbansible reports it was successful. Looking on the gerrit side I see its rights inherit from points to the meta config project now16:02
clarkbI'm not in the group to confirm I can set a review priority but gerrit won't do half acl updates16:02
clarkbI think we can consider that aspect done. Last thing is the zuul config reloading?16:03
fungiyeah, acl looks like it updated cleanly16:03
fungiso what needs to be done to the zuul scheduler?16:03
fungido we need to tell it to drop the old keys from zk?16:03
fungioh, you mean the deploy job running16:04
clarkbfungi: yes the infra-prod-run-zuul job should update zuul's config and then hup it or whatever the equivalent is16:04
clarkbyup. We don't need to tell it to drop the old keys from zk as I don't think those changes have landed16:04
fungiyep, i concur16:04
clarkbthen the eavesdrop job will run to update gerritbot16:05
clarkband that should be everything16:05
clarkbon the process side it would be cool if ansible had a playbook connectivity mode where you give it a playbook and it does nothing but ensure it can connect to the hosts16:05
clarkbthen we could use that as a step 0 check16:06
clarkbmordred: ^ do you know if anything like that exists?16:06
clarkbzuul is done. Just eavesdrop is left16:09
clarkbbot has left as expected16:11
clarkbI guess it won't rejoin here until someone pushes to a repo we care about16:11
*** marios|ruck is now known as marios|out16:30
mordredclarkb: no, I don't believe it does. you can do ansible 'hoststring' -m setup17:04
mordredbut I agree, "please just do connectivity checks to the hosts that you would connect to in this playbook" would be really helpful17:05
mordreddmsimard: ^^ 17:06
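
A rough sketch of the "step 0" connectivity check wished for above, done outside Ansible since no such playbook mode is known to exist per this discussion. It only tests that a TCP port (for example SSH on 22) accepts connections, which is weaker than what `ansible 'hoststring' -m setup` verifies. The function name and the target list are hypothetical.

```python
import socket


def unreachable_hosts(targets, timeout=5.0):
    """Return the (host, port) pairs that do not accept a TCP connection."""
    failed = []
    for host, port in targets:
        try:
            # Open and immediately close a connection; success means reachable.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            failed.append((host, port))
    return failed
```

Running this against the hosts a playbook is about to touch, for example `unreachable_hosts([("storyboard-dev.example.org", 22)])` with a hypothetical hostname, would have flagged the hung server before the rename playbook started.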
*** melwitt is now known as jgwentworth17:55
ashrodrifungi: hi! how can i go about merging two gerrit accounts? id like to associate my work email with my original username, but its already in use.18:13
Clark[m]ashrodri: currently this isn't easy due to some gerrit user database conflicts. I have been correcting them as I am able to determine which are used and which aren't. Once those are correct it becomes much easier for Gerrit admins to correct the database.18:17
Clark[m]Unfortunately Gerrit won't let us modify a single account without correcting all of them and there are currently more than 10018:18
Clark[m]The alternative is to take a downtime18:18
Clark[m]All that to say if you are patient with us it should be possible in a bit but for now it is easiest if you keep using one of the two18:22
fungiashrodri: yep, sorry, exactly what Clark[m] said... we should be able to do it "soon" but that ability is blocked by ongoing duplicate account cleanup in the wake of our gerrit 2.13->3.2 upgrade from november18:49
fungiwe've been taking it slowly so that we avoid accidentally breaking accounts people are actually using, at least as far as we can18:50
*** odyssey4me is now known as Guest284919:12
ashrodriI understand! Thanks!19:17
