ianw | [Fri Oct 9 23:08:49 2020] Buffer I/O error on dev dm-11, logical block 526344, lost async page write | 00:14 |
---|---|---|
ianw | i think this fs is unhappy generally; trying to find out where this space is, is not working | 00:14 |
ianw | i can't fsck /dev/mapper/main-main | 00:21 |
ianw | says it's in use, but i've killed everything and unmounted /opt | 00:21 |
ianw | i'll reboot, stop the containers, and try and fsck this partition | 00:23 |
clarkb | do you need a vgchange -a n ? | 00:30 |
clarkb | that should disable the vg | 00:30 |
ianw | maybe i could have done something like that, a reboot has allowed me to fsck it | 00:32 |
ianw | once it's finished, hopefully i can df it to find what's going on | 00:32 |
ianw | otherwise i guess i can just format it and start again, but i'd prefer not to | 00:33 |
clarkb | I'm guessing we've leaked partial image builds. At least that is what it has been in the past | 00:34 |
*** DSpider has quit IRC | 00:36 | |
fungi | odds are there were unlinked inodes with open file handles in process, so you won't see them reflected as "used" in any particular part of the tree but they'll still count toward the used blocks for the fs itself | 00:41 |
ianw | yeah, fsck gave back some space, but i'm still not sure if /opt/dib_tmp is just being really slow, or actually causing issues | 00:42 |
ianw | i'm clearing /opt/dib_tmp, very slowly | 01:04 |
ianw | ok, i guess it's back | 02:02 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg : match install-borg role to run deploy job https://review.opendev.org/757366 | 02:06 |
*** zbr has quit IRC | 02:29 | |
*** zbr has joined #opendev | 02:44 | |
ianw | #status log cleared full storage on nb01 and rebooted | 02:46 |
openstackstatus | ianw: finished logging | 02:46 |
ianw | #status log cleared openafs cache on mirror01.bhs1.ovh.opendev.org and rebooted | 02:46 |
openstackstatus | ianw: finished logging | 02:46 |
ianw | it seems to be serving again | 02:46 |
ianw | and listening to En Français by Pomplamoose for good measure | 02:49 |
clarkb | oh was it a bad cache? you rm'd /var/cache/afs contents? | 02:59 |
ianw | clarkb: yeah, i've reported issues with a similar backtrace to the afs list before | 03:10 |
ianw | it seems when it shuts down hard, it's very likely to corrupt itself | 03:11 |
clarkb | also is openafs-client enabled in systemd? I had disabled it in order to try manually starting it after normal boot had finished | 03:12 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: DNM Forcing a gitea job failure to test gerrit replication https://review.opendev.org/757165 | 03:15 |
clarkb | that never ran the job I was trying to hold on friday | 03:15 |
clarkb | hopefully that gets a held node I can use tomorrow morning | 03:15 |
openstackgerrit | Merged opendev/system-config master: borg : match install-borg role to run deploy job https://review.opendev.org/757366 | 03:21 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: install-borg: bump to latest version https://review.opendev.org/757382 | 03:33 |
*** zbr has quit IRC | 03:41 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: install-borg: bump to latest version https://review.opendev.org/757382 | 03:46 |
ianw | clarkb: ahh, i wondered why that was disabled. i've enabled it | 03:51 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: install-borg: bump to latest version https://review.opendev.org/757382 | 04:07 |
*** ykarel has joined #opendev | 04:38 | |
openstackgerrit | Merged opendev/system-config master: install-borg: bump to latest version https://review.opendev.org/757382 | 05:21 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: borg backup : add ethercalc02 https://review.opendev.org/757419 | 05:32 |
ianw | clarkb: ^ as discussed for first borg backup host | 05:33 |
*** marios has joined #opendev | 05:36 | |
*** ysandeep|away is now known as ysandeep | 05:38 | |
*** dmsimard has quit IRC | 05:55 | |
*** slaweq has joined #opendev | 05:55 | |
*** dmsimard has joined #opendev | 05:58 | |
*** marios has quit IRC | 06:02 | |
*** slaweq has quit IRC | 06:08 | |
*** marios has joined #opendev | 06:10 | |
*** andrewbonney has joined #opendev | 07:00 | |
*** tosky has joined #opendev | 07:03 | |
*** hashar has joined #opendev | 07:05 | |
*** slaweq has joined #opendev | 07:13 | |
*** rpittau|afk is now known as rpittau | 07:35 | |
*** slaweq_ has joined #opendev | 07:47 | |
*** slaweq has quit IRC | 07:48 | |
*** DSpider has joined #opendev | 07:49 | |
*** priteau has joined #opendev | 07:51 | |
*** sshnaidm|off is now known as sshnaidm | 07:58 | |
*** DSpider has quit IRC | 08:00 | |
*** DSpider has joined #opendev | 08:07 | |
*** ysandeep is now known as ysandeep|lunch | 08:11 | |
*** zbr has joined #opendev | 08:19 | |
*** paramite has joined #opendev | 08:19 | |
dirk | clarkb: it is a bit awkward that this lsb installation issue slipped through. I submitted a fix today | 08:28 |
*** elod is now known as elod_pto | 08:41 | |
*** ysandeep|lunch is now known as ysandeep | 09:10 | |
*** openstackgerrit has quit IRC | 09:16 | |
frickler | #status log restarted gerritbot on eavesdrop once again | 09:17 |
openstackstatus | frickler: finished logging | 09:17 |
frickler | infra-root: ^^ this seems to be becoming an almost daily issue, anything we can do about this? | 09:19 |
AJaeger | frickler: was that really needed? I saw it reporting earlier... | 09:21 |
AJaeger | No problems on #zuul as far as I can see | 09:21 |
frickler | AJaeger: I missed a report for a patch I submitted for devstack and there were two complaints in #-infra, so at least something was wrong. | 09:25 |
frickler | I didn't spot any issue in the docker log, but that also seems to go back for a couple of hours, so likely isn't very helpful when issues happened some time ago | 09:26 |
AJaeger | strange... | 09:29 |
jrosser | it's certainly now doing something, when previously it wasn't, for the things i'm interested in | 09:30 |
AJaeger | great! | 09:31 |
danpawlik | Hi. Is everything ok with AFS mirror? | 09:39 |
danpawlik | mostly related to mirror.{fedora,centos,epel} | 09:39 |
*** ralonsoh has joined #opendev | 09:39 | |
frickler | danpawlik: likely not, looks like they are three days old | 09:54 |
frickler | fungi: ^^ also while mirror.ubuntu is recent, mirror.ubuntu-ports and mirror.debian still seem to be 7 days old, did you unlock those latter ones, too? | 09:56 |
*** iurygregory has quit IRC | 09:57 | |
*** iurygregory has joined #opendev | 09:58 | |
danpawlik | frickler: exactly ;) | 10:03 |
*** ysandeep is now known as ysandeep|afk | 10:32 | |
*** lpetrut has joined #opendev | 10:36 | |
*** ysandeep|afk is now known as ysandeep | 11:26 | |
*** priteau has quit IRC | 11:59 | |
fungi | frickler: i did not do any other volumes, just ubuntu. i'll start those now | 12:04 |
fungi | i've removed stale vos release locks for mirror.ubuntu-ports and mirror.debian | 12:07 |
fungi | but i need to go run errands for the next few hours and won't be on hand to check their updates | 12:07 |
fungi | there were stale locks for mirror.fedora, mirror.centos and mirror.epel which i've also removed now | 12:09 |
danpawlik | cool fungi++ | 12:19 |
fungi | i figure centos is probably the most urgent one to complete first, so i've held the flock for it in a root screen session on mirror-update.opendev.org and am performing a manual vos release with -localauth for it in a root screen session on afs01.dfw.openstack.org | 12:25 |
fungi | hoping to get it a headstart before the fileserver is starved for bandwidth | 12:25 |
fungi | stepping out now, should hopefully be back by 16:00 utc | 12:38 |
*** larsks has joined #opendev | 12:40 | |
*** ykarel has quit IRC | 13:26 | |
*** iurygregory has quit IRC | 13:27 | |
*** ykarel has joined #opendev | 13:27 | |
*** iurygregory has joined #opendev | 13:28 | |
*** slaweq_ is now known as slaweq | 14:09 | |
clarkb | re gerritbot according to the logs it thinks it is still connected | 14:43 |
clarkb | which makes me think this isn't another python3 conversion issue but instead some sort of problem interfacing with the freenode network | 14:44 |
clarkb | fungi: can we reenable bhs1 in nodepool now too? (just remove nl04 from the emergency file?) | 14:46 |
*** sgw has left #opendev | 14:47 | |
*** mlavalle has joined #opendev | 15:04 | |
fungi | clarkb: https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/bionic/ yeah i think so | 15:08 |
fungi | as for gerritbot, my suspicion has been that something times out the connection to the irc server and the irc client module doesn't notice, so it keeps sending messages to a dead socket | 15:09 |
fungi | we see evidence of socket timeout reported by the server in channels | 15:10 |
*** ysandeep is now known as ysandeep|away | 15:11 | |
clarkb | I'm in meetings right now, but then can remove nl04 from emergency if that isn't done already | 15:12 |
clarkb | then I'm going to work on getting my caught gitea99 into shape for replication | 15:12 |
clarkb | I've already updated the ssh key for that but realize I need to tweak iptables rules as well as formatting and remounting xvde onto /var/gitea to have enough disk space | 15:13 |
*** hashar has quit IRC | 15:31 | |
*** qchris has quit IRC | 15:32 | |
*** ykarel is now known as ykarel|away | 15:37 | |
clarkb | I've removed nl04.openstack.org from the emergency file which should restore its config on the next hourly pass | 15:47 |
*** rpittau is now known as rpittau|afk | 16:01 | |
*** ralonsoh has quit IRC | 16:05 | |
*** nuclearg1 has joined #opendev | 16:05 | |
*** ykarel|away has quit IRC | 16:07 | |
*** marios has quit IRC | 16:13 | |
*** priteau has joined #opendev | 16:20 | |
*** tosky has quit IRC | 16:21 | |
clarkb | paladox: are you around? if so do you know if the gerrit serverId config setting identifies a logical gerrit install or a specific server? seems like instanceId is for the specific server? | 16:38 |
clarkb | I've just discovered that notedb relies on this apparently but the migration doc doesn't mention it :( anyway want to figure out what we should set ours to | 16:39 |
paladox | https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#gerrit.instanceId | 16:39 |
paladox | you can really set it to any, you can even set up a new install and just use that value | 16:40 |
paladox | https://github.com/wikimedia/puppet/blob/production/modules/gerrit/templates/gerrit.config.erb#L137 | 16:40 |
clarkb | ya on my test server it is a uuid | 16:41 |
clarkb | which was auto set, but I don't want ansible to delete it then cause us problems with notedb after | 16:41 |
clarkb | looks like wikimedia is using the uuid value in config mgmt | 16:42 |
clarkb | I guess we can do that too then | 16:42 |
clarkb | thanks! | 16:42 |
clarkb | in that case I think we may just set it after the migration is done to whatever value is chosen | 16:44 |
*** hamalq has joined #opendev | 16:49 | |
*** openstackgerrit has joined #opendev | 17:04 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Disable change.move and enableSignedPush in gerrit https://review.opendev.org/757153 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Stop blocking /p/ in the gerrit apache vhost https://review.opendev.org/757155 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update gerrit container image to 3.2 https://review.opendev.org/757176 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Switch to zuul's default gerrit auth type https://review.opendev.org/757156 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Clean up old Gerrit html theming and commentlinks https://review.opendev.org/757161 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove reviewdb config from Gerrit https://review.opendev.org/757162 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Post 2.16 upgrade config updates https://review.opendev.org/757625 | 17:04 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Switch to zuul's default gerrit auth type https://review.opendev.org/757156 | 17:07 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Update gerrit container image to 3.2 https://review.opendev.org/757176 | 17:07 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Clean up old Gerrit html theming and commentlinks https://review.opendev.org/757161 | 17:07 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove reviewdb config from Gerrit https://review.opendev.org/757162 | 17:07 |
clarkb | sorry noticed a slight ordering bug in the previous stack | 17:07 |
clarkb | one thing that isn't clear to me is how all those changes will work out with our zuul cd stuff | 17:08 |
clarkb | I think we'll be ok because they mostly reflect the end state just different portions of each end state. As long as we don't restart gerrit between the beginning and end of the application of those we should be fine | 17:09 |
clarkb | we can also squash them all together on the day of and land a single change, but I think for now this makes review simpler | 17:09 |
clarkb | actually zuul will probably be off | 17:09 |
clarkb | so we can land the stack then manually update zuul's auth config for gerrit, start zuul, then have it run with the end of the stack | 17:10 |
*** andrewbonney has quit IRC | 17:29 | |
*** qchris has joined #opendev | 17:42 | |
*** nuclearg1 has quit IRC | 17:50 | |
fungi | infra-root: things are not looking great for the centos volume... "Failed to end transaction on rw volume: Possible communication failure" | 17:53 |
clarkb | fungi: that is from a vos release? | 17:54 |
*** lpetrut has quit IRC | 17:59 | |
fungi | yeah, it apparently got stuck when afs02.dfw hung late last week and had to be ungracefully rebooted | 18:01 |
fungi | looks like there's a transaction which may need to be ended, will likely need the full replica on 02 replaced i'm guessing | 18:02 |
AJaeger | config-core, please review https://review.opendev.org/757102 and https://review.opendev.org/753773 | 18:08 |
clarkb | I've just started a review-test replication to my held gitea node. iptables is blocking ports 3081 and 3000 currently so we can review what is replicated before exposing it | 18:14 |
clarkb | it does look like the changes meta refs are being replicated which will increase the disk usage of our replica by quite a bit (I think it roughly doubles the size of the git repos) | 18:14 |
clarkb | if anyone else has time to look at replication config options to determine if excluding the refs/changes/XY/ABCXY/meta ref is possible that may be useful to avoid very expensive replication | 18:15 |
clarkb | (I've looked and can't figure out a way to do that) | 18:15 |
clarkb | AJaeger: I'll take a look after lunch | 18:16 |
openstackgerrit | Merged zuul/zuul-jobs master: Use ansible_distribution* facts instead of ansible_lsb https://review.opendev.org/742310 | 18:24 |
clarkb | looking at replication we do already replicate refs/notes (I think these make sense as it summarizes completed reviews) and refs/users/ (this is where your drafts in the web editor go) | 18:45 |
clarkb | that means the major addition is refs/changes/XY/ABCXY/meta | 18:46 |
fungi | yes, i make a lot of use of the refs/notes content, i'm probably not the only one | 18:46 |
fungi | i'm hoping we can come up with a way to configure gitea to display gerrit's notes (it can display notes but hard-codes the notes base last i checked) | 18:47 |
clarkb | fungi: yup definitely useful but I wonder if the refs/change/..../meta content supersedes it | 18:48 |
clarkb | the big difference is the notes are a summary but the meta ref is a full history aiui | 18:48 |
fungi | oh, it may. i mainly use its notes to find the change url, who approved and what date and time it merged | 18:49 |
clarkb | 2196 tasks to go .... doing this full replica is not fast | 18:49 |
clarkb | but at least it seems to be working and the only major change from what we have today is the addition of the notedb meta refs | 18:49 |
clarkb | (that I can see so far) | 18:49 |
*** qchris_ has joined #opendev | 18:51 | |
clarkb | also need to wait and see what disk use looks like to determine if we have to rebuild the gitea servers :/ | 18:51 |
*** qchris has quit IRC | 18:51 | |
clarkb | thinking out loud here you can tell the replication plugin to not replicate hidden projects | 18:53 |
clarkb | I half wonder if we should consider setting the hidden flag on certain subsets of repos (the deb package repos come to mind) | 18:53 |
openstackgerrit | Merged openstack/project-config master: Update neutron stable grafana dashboards https://review.opendev.org/757102 | 18:56 |
fungi | well, it's not like they're changing, so after initial replication there won't really be any additional replication churn for them | 18:59 |
clarkb | correct, but it's more than doubling the size of our git repos I think | 18:59 |
clarkb | basically what was once a 15GB database in a single mysql instance is now in all the git repos and copied 9 times | 19:00 |
clarkb | current gitea01 repo size post packing is 12GB, the repo growth on review-test after the migration and pre packing was ~15GB | 19:00 |
clarkb | and now that I think of it I'm comparing packed vs unpacked sizes so it isn't quite doubling it | 19:01 |
clarkb | I think the packed review-test size growth was ~5GB | 19:01 |
clarkb | we're adding about 50% disk overhead | 19:01 |
*** priteau has quit IRC | 19:04 | |
openstackgerrit | Merged openstack/project-config master: Add ansible-role-refstack-client under x namespace https://review.opendev.org/753773 | 19:04 |
*** Eighth_Doctor has quit IRC | 19:16 | |
*** mordred has quit IRC | 19:16 | |
*** Eighth_Doctor has joined #opendev | 19:25 | |
*** mordred has joined #opendev | 19:47 | |
clarkb | down to 1979 tasks now | 20:01 |
clarkb | once I'm happy with the status of replication the next thing I want to look at is manage-projects and the delete project plugin | 20:01 |
clarkb | basically I'll check that I can create a new project and then delete it | 20:01 |
clarkb | in quickly checking jeepyb for basic validation of ^ I've noticed that while manage-projects doesn't use the db a few other jeepyb gerrit integrations do (update spec, update bug, and welcome message) | 20:02 |
clarkb | any opinions on whether or not we should try and fix those, disable them proactively, or just let them eventually fail? | 20:04 |
fungi | all things which could become zuul jobs in opendev/base-jobs or openstack/project-config | 20:04 |
fungi | they need creds to either authenticate to lp or gerrit | 20:04 |
fungi | but i think they can be sufficiently generalized | 20:04 |
clarkb | ya also I think it's harmless to leave them in place for now. When we migrate to notedb they'll just look at stale content in the db then we'll drop reviewdb and they'll error setting up a connection to reviewdb | 20:04 |
clarkb | but we can also remove them from hooks since we know they will stop working | 20:05 |
clarkb | I expect that manage-projects will work because it is tested in gerritlib and jeepyb against a more up to date gerrit | 20:06 |
clarkb | but I want to double check it on review-test too | 20:06 |
clarkb | gives me an excuse to test the delete-project plugin as well :) | 20:06 |
clarkb | but trying not to get too far ahead of myself as I'm about to enter the ansiblefest then summit then ptg period of time where I'll have less time for this | 20:09 |
openstackgerrit | sebastian marcet proposed opendev/system-config master: OpenstackID v3.0.15 https://review.opendev.org/757645 | 20:21 |
*** tosky has joined #opendev | 20:26 | |
ianw | clarkb/fungi: you ok with ethercalc being the test-case for borg backups with https://review.opendev.org/#/c/757419/ ? | 20:58 |
clarkb | +2 yes | 20:59 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Remove reviewdb config from Gerrit https://review.opendev.org/757162 | 21:10 |
clarkb | hopefully that change will pass CI testing now | 21:10 |
*** slaweq has quit IRC | 21:13 | |
openstackgerrit | Merged opendev/system-config master: OpenstackID v3.0.15 https://review.opendev.org/757645 | 21:18 |
fungi | ianw: sounds great, thanks! | 21:33 |
openstackgerrit | Merged opendev/system-config master: Add gerrit static files that were lost in ansiblification https://review.opendev.org/746335 | 21:57 |
openstackgerrit | Merged opendev/system-config master: Stop replicating to local git mirror on gerrit https://review.opendev.org/757152 | 21:57 |
*** slaweq has joined #opendev | 22:08 | |
clarkb | when we do project renames in gerrit we rely on online reindexing right? | 22:20 |
clarkb | I've realized that we should test a project rename too which should be mv the git repo, delete caches, and reindex but wondering if that needs to be offline | 22:21 |
clarkb | (I'll test this between project create and delete) | 22:21 |
clarkb | also I bet we'll orphan accountPatchReviewDb data that way but it's probably fine | 22:22 |
*** slaweq has quit IRC | 22:22 | |
openstackgerrit | Merged opendev/system-config master: borg backup : add ethercalc02 https://review.opendev.org/757419 | 22:22 |
*** qchris_ has quit IRC | 22:41 | |
fungi | clarkb: online, yes | 22:53 |
*** qchris_ has joined #opendev | 22:55 | |
clarkb | down to 1400 tasks on the replication, this will likely run overnight | 22:56 |
ianw | i've rebooted nb03 that was off ... :/ | 23:07 |
*** tosky has quit IRC | 23:15 | |
*** mlavalle has quit IRC | 23:17 | |
*** hamalq has quit IRC | 23:52 | |
*** DSpider has quit IRC | 23:58 | |
*** larsks has quit IRC | 23:59 |
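
Note on fungi's 00:41 remark about unlinked inodes with open file handles: the sketch below is an illustration only, not something run during the discussion. It assumes a POSIX filesystem (Linux, local disk); the function name `unlinked_but_open` and the file name `leaked.img` are hypothetical. It shows that a file removed from the directory tree keeps its data for as long as a process holds it open, which is why such space counts as used for the filesystem without appearing under any path.

```python
import os
import tempfile


def unlinked_but_open(size: int = 1024 * 1024) -> dict:
    """Write a file, unlink it while it is still open, and report what remains.

    Hypothetical helper for illustration; assumes POSIX unlink semantics.
    """
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "leaked.img")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.write(fd, b"\0" * size)
        # Remove the only directory entry; the inode survives while fd is open.
        os.unlink(path)
        info = os.fstat(fd)
        return {
            # The path is gone, so tools that walk the tree cannot see the file.
            "visible_in_tree": os.path.exists(path),
            # Number of directory entries still pointing at the inode.
            "link_count": info.st_nlink,
            # The data is still there and still occupies space.
            "size_bytes": info.st_size,
        }
    finally:
        # Closing the last handle is what lets the filesystem free the blocks.
        os.close(fd)
        os.rmdir(directory)


result = unlinked_but_open()
print(result)
```

On a live Linux host, files in this state are typically listed by `lsof +L1`, and the space is released once the holding process exits or closes the handle.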