Saturday, 2020-12-19

Alex_Gaynorapt updates appear to be broken on Ubuntu ARM64 machines00:30
Alex_GaynorCan't connect to the apt mirror00:30
ianwkevinz: ^ this is the problem we've seen periodically where hosts seem to shut themselves down00:44
ianwboth nb03 and the mirror are in SHUTOFF00:44
ianwi've actually had a netconsole on the mirror all this time, trying to catch this00:44
ianw[3449556.856931] afs: volume location server in cell is back up (code 0)00:45
ianwstupid relative timestamps; but i didn't catch any oops or other messages out of the mirror00:45
ianwthis suggests to me the cloud shut it down without warning the host00:45
ianwAlex_Gaynor: thanks for helping as we get this more stable :)00:46
*** slaweq has joined #opendev08:14
*** DSpider has joined #opendev08:17
Alex_GaynorWe're now seeing 403s from the apt mirror in the ARM64 (Linaro) cloud.16:28
fungiugh, checking it17:30
fungi seems to work17:34
Alex_GaynorClicking re-run let's see if it works now17:35
fungiahh, the afs-backed mirrors seem to return 40317:35
fungiafs-backed paths on our mirrors in other providers are working, so at least it doesn't seem to be a central afs problem17:36
fungithe openafs lkm is still loaded and afsd is still running17:37
funginothing new in dmesg output since ianw booted the server up at ~00:45 utc17:38
fungi[Sat Dec 19 00:47:15 2020] Unable to handle kernel paging request at virtual address 7f9a50d0d050918717:40
fungi[Sat Dec 19 00:47:15 2020] Internal error: Oops: 96000004 [#1] SMP17:40
fungithat seems to have happened during the afs mount bringup at boot, so maybe it broke while booting (maybe it's got a corrupt local cache from the unclean shutdown earlier)17:40
fungiand afsd is unkillable17:44
fungii'm going to try a soft reboot17:45
fungi#status log rebooting in order to attempt to free an unkillable afsd process17:46
openstackstatusfungi: finished logging17:46
fungiits parent is init, and even kill -9 didn't work, stayed in Ss state, not even zombie17:46
fungicurious to see if it will even shut down17:47
fungiif not, i'll hard reboot it via nova api17:47
fungiit did eventually reboot, and now after afsd taking its sweet time starting up, i can finally get a directory listing again17:55
fungi returns content now instead of a 403 forbidden17:56
fungiAlex_Gaynor: sorry for the delay, i think it should be back in working order now17:56
Alex_Gaynorwill retry momentarily18:00
*** fressi has joined #opendev22:58
ianwfungi: yeah, several times i've just had to rm -rf the cache dir to get things sane again23:23
ianwit maybe wouldn't be insane to have a boot job that did that before afs23:29
fungidownside is every boot starts with a cold cache, even if it was a clean/controlled reboot23:30
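
[Editor's note] One possible middle ground between the two positions above is to wipe the AFS cache at boot only when the previous shutdown was unclean, so that controlled reboots keep a warm cache. The following is a minimal sketch of that idea, not something proposed in the channel. The function names, the marker file, and both paths are hypothetical and would need adapting to the real host; it assumes one hook that runs before afsd starts and another that runs late in a controlled shutdown.

```python
import shutil
from pathlib import Path


def prepare_cache(cache_dir: Path, marker: Path) -> bool:
    """Hypothetical pre-afsd boot hook. Returns True if the cache was wiped."""
    wiped = False
    if marker.exists() and cache_dir.is_dir():
        # The marker survived, so the last session never reached
        # mark_clean_shutdown(): treat the cache as suspect and empty it.
        for entry in cache_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        wiped = True
    # Record that a session is now in progress.
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return wiped


def mark_clean_shutdown(marker: Path) -> None:
    """Hypothetical shutdown hook, run after afsd has stopped cleanly."""
    marker.unlink(missing_ok=True)  # missing_ok needs Python 3.8+
```

The marker should live outside the cache directory and on storage that survives a reboot. A host that is shut off by the cloud without warning never runs the shutdown hook, so the next boot starts with an empty cache, while a soft reboot does not.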
