*** tosky has quit IRC | 00:20 | |
Alex_Gaynor | apt updates appear to be broken on Ubuntu ARM64 machines | 00:30 |
---|---|---|
Alex_Gaynor | Can't connect to the apt mirror | 00:30 |
ianw | kevinz: ^ this is the problem we've seen periodically where hosts seem to shut themselves down | 00:44 |
ianw | both nb03 and the mirror are in SHUTOFF | 00:44 |
ianw | i've actually had a netconsole on the mirror all this time, trying to catch this | 00:44 |
ianw | [3449556.856931] afs: volume location server 23.253.200.228 in cell openstack.org is back up (code 0) | 00:45 |
ianw | stupid relative timestamps; but i didn't catch any oops or other messages out of the mirror | 00:45 |
ianw | this suggests to me the cloud shut it down without warning the host | 00:45 |
ianw | Alex_Gaynor: thanks for helping as we get this more stable :) | 00:46 |
*** larainema has quit IRC | 02:26 | |
*** auristor has quit IRC | 02:27 | |
*** larainema has joined #opendev | 02:29 | |
*** hamalq_ has quit IRC | 02:32 | |
*** iurygregory has quit IRC | 03:32 | |
*** mlavalle has quit IRC | 03:45 | |
*** mlavalle has joined #opendev | 03:48 | |
*** mlavalle has quit IRC | 04:11 | |
*** brinzhang has joined #opendev | 04:12 | |
*** mlavalle has joined #opendev | 04:13 | |
*** mlavalle has quit IRC | 04:16 | |
*** mlavalle has joined #opendev | 04:17 | |
*** ysandeep|away is now known as ysandeep | 04:17 | |
*** mlavalle has quit IRC | 04:23 | |
*** mlavalle has joined #opendev | 04:24 | |
*** mlavalle has quit IRC | 04:26 | |
*** mlavalle has joined #opendev | 04:27 | |
*** auristor has joined #opendev | 04:41 | |
*** mlavalle has quit IRC | 04:42 | |
*** mlavalle has joined #opendev | 05:07 | |
*** mgagne has quit IRC | 05:31 | |
*** mlavalle has quit IRC | 05:53 | |
*** mlavalle has joined #opendev | 05:57 | |
*** brinzhang has quit IRC | 07:02 | |
*** slaweq has joined #opendev | 08:14 | |
*** DSpider has joined #opendev | 08:17 | |
*** slaweq has quit IRC | 08:45 | |
*** tosky has joined #opendev | 13:14 | |
*** ykarel has joined #opendev | 14:27 | |
*** ykarel has quit IRC | 15:06 | |
Alex_Gaynor | We're now seeing 403s from the apt mirror in the ARM64 (Linaro) cloud. | 16:28 |
*** klonn has joined #opendev | 17:10 | |
*** slaweq has joined #opendev | 17:11 | |
*** klonn has quit IRC | 17:17 | |
*** klonn has joined #opendev | 17:17 | |
*** klonn has quit IRC | 17:24 | |
fungi | ugh, checking it | 17:30 |
fungi | http://mirror.regionone.linaro-us.opendev.org/pypi/simple/cryptography/ seems to work | 17:34 |
Alex_Gaynor | Clicking re-run, let's see if it works now | 17:35 |
fungi | ahh, the afs-backed mirrors seem to return 403 | 17:35 |
fungi | afs-backed paths on our mirrors in other providers are working, so at least it doesn't seem to be a central afs problem | 17:36 |
fungi | the openafs lkm is still loaded and afsd is still running | 17:37 |
*** ysandeep is now known as ysandeep|away | 17:38 | |
fungi | nothing new in dmesg output since ianw booted the server up at ~00:45 utc | 17:38 |
fungi | [Sat Dec 19 00:47:15 2020] Unable to handle kernel paging request at virtual address 7f9a50d0d0509187 | 17:40 |
fungi | [Sat Dec 19 00:47:15 2020] Internal error: Oops: 96000004 [#1] SMP | 17:40 |
fungi | that seems to have happened during the afs mount bringup at boot, so maybe it broke while booting (maybe it's got a corrupt local cache from the unclean shutdown earlier) | 17:40 |
fungi | and afsd is unkillable | 17:44 |
fungi | i'm going to try a soft reboot | 17:45 |
fungi | #status log rebooting mirror02.regionone.linaro-us.opendev.org in order to attempt to free an unkillable afsd process | 17:46 |
openstackstatus | fungi: finished logging | 17:46 |
fungi | its parent is init, and even kill -9 didn't work, stayed in Ss state, not even zombie | 17:46 |
fungi | curious to see if it will even shut down | 17:47 |
fungi | if not, i'll hard reboot it via nova api | 17:47 |
fungi | it did eventually reboot, and now after afsd taking its sweet time starting up, i can finally get a directory listing again | 17:55 |
fungi | http://mirror.regionone.linaro-us.opendev.org/debian/ returns content now instead of a 403 forbidden | 17:56 |
fungi | Alex_Gaynor: sorry for the delay, i think it should be back in working order now | 17:56 |
Alex_Gaynor | will retry momentarily | 18:00 |
*** slaweq has quit IRC | 18:10 | |
*** klonn has joined #opendev | 18:31 | |
*** klonn has quit IRC | 19:15 | |
*** prometheanfire has quit IRC | 19:36 | |
*** prometheanfire has joined #opendev | 19:36 | |
*** klonn has joined #opendev | 22:19 | |
*** klonn has quit IRC | 22:41 | |
*** slittle1 has quit IRC | 22:43 | |
*** slittle1 has joined #opendev | 22:44 | |
*** fressi has joined #opendev | 22:58 | |
ianw | fungi: yeah, several times i've just had to rm -rf the cache dir to get things sane again | 23:23 |
*** DSpider has quit IRC | 23:25 | |
ianw | it maybe wouldn't be insane to have a boot job that did that before afs | 23:29 |
fungi | downside is every boot starts with a cold cache, even if it was a clean/controlled reboot | 23:30 |
*** tosky has quit IRC | 23:44 | |
*** fressi has quit IRC | 23:57 |
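
The closing exchange between ianw and fungi weighs wiping the AFS cache before afs starts against the cost of a cold cache on every boot. One possible middle ground, sketched below, is to wipe only when the previous shutdown was not clean. This is a rough illustration and not anything the log says is deployed: the function names, the marker-file mechanism, and both paths are assumptions (`/var/cache/openafs` is a common cache location on Debian/Ubuntu, but the real one depends on the host's `cacheinfo`).

```python
import shutil
from pathlib import Path

# Both paths are assumptions for illustration; adjust to the host's setup.
CACHE_DIR = Path("/var/cache/openafs")
CLEAN_MARKER = Path("/var/lib/afs-cache-guard/clean-shutdown")


def mark_clean_shutdown(clean_marker: Path = CLEAN_MARKER) -> None:
    """Run late in a controlled shutdown, after afs has stopped."""
    clean_marker.parent.mkdir(parents=True, exist_ok=True)
    clean_marker.touch()


def prepare_cache(cache_dir: Path = CACHE_DIR,
                  clean_marker: Path = CLEAN_MARKER) -> bool:
    """Run at boot, before afs starts. Returns True if the cache was wiped."""
    was_clean = clean_marker.exists()
    # Consume the marker so that a crash during this uptime counts as unclean.
    clean_marker.unlink(missing_ok=True)
    if was_clean:
        return False  # keep the warm cache after a clean reboot
    if cache_dir.is_dir():
        # Remove the contents but keep the directory itself in place.
        for entry in cache_dir.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return True
```

With this shape, a host that the cloud shuts off without warning never writes the marker, so the next boot starts with an empty cache, while a controlled reboot keeps its warm cache. Wiring the two functions into the init system, ordered before and after the afs client, is left out here.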