ianw | yeah, rsa key seems to do the same thing | 00:00 |
---|---|---|
clarkb | just talking out loud here: other things that can cause ssh to fail include the shell not being set properly (wrong path, not being installed, etc). Permissions on the authorized_keys file. What else? But you ruled those issues out by using the key to login from elsewhere | 00:02 |
clarkb | if it was source side permissions on the key itself it wouldn't offer it at all and you said it was being offered | 00:03 |
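The server-side checks clarkb is listing can be sketched as a short script; this is a minimal illustration assuming a GNU/Linux host (`stat -c`), run against a throwaway directory so it is safe anywhere -- the key content is a placeholder:

```shell
# Sketch of the server-side ssh failure modes clarkb lists:
# a valid login shell, and strict permissions on ~/.ssh.
tmphome=$(mktemp -d)
mkdir -p "$tmphome/.ssh"
printf 'ssh-ed25519 AAAA... user@host\n' > "$tmphome/.ssh/authorized_keys"
chmod 700 "$tmphome/.ssh"
chmod 600 "$tmphome/.ssh/authorized_keys"

# sshd (with the default StrictModes yes) silently ignores an
# authorized_keys file that is group- or world-writable; these
# should read 700 and 600.
ssh_perms=$(stat -c '%a' "$tmphome/.ssh")
key_perms=$(stat -c '%a' "$tmphome/.ssh/authorized_keys")
echo "ssh dir: $ssh_perms, authorized_keys: $key_perms"

# The login shell recorded in /etc/passwd must exist and be
# executable, or authentication succeeds and the session drops.
shell=$(getent passwd "$(id -un)" | cut -d: -f7)
if [ -x "$shell" ]; then echo "shell ok: $shell"; fi

rm -rf "$tmphome"
```

As clarkb notes, wrong permissions on the *source* key would stop the client offering it at all, which is why ianw seeing the key offered rules that side out.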
ianw | yeah, the user was generated by ansible, same as all the other users | 00:06 |
fungi | corvus: yeah, it does appear that gerrit starts timing out worker threads and throwing write errors in that situation, at least judging from the exceptions raised in the log about a subset of the pushes from that series. how bad it gets may also depend on background load on the system too | 00:09 |
fungi | ianw: agreed, it's possible the openssh client on trusty may have trouble with an openssh server on focal. i saw something similar trying to talk to very old gerrit (mina-sshd) from a focal openssh client too. i expect it comes down to deprecated ciphers/hashes in focal | 00:11 |
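One hedged way to test fungi's deprecated-algorithm theory is to compare what each side's openssh build supports; only the client-side queries are shown, the commands are left commented as a recipe, and the host name is a placeholder:

```shell
# ssh -Q kex        # key exchange algorithms this client offers
# ssh -Q cipher     # symmetric ciphers
# ssh -Q key        # host/public key algorithm names
# A verbose connect shows exactly which proposal fails to match:
# ssh -vv user@focal-server 2>&1 | grep -i 'no matching'
```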
ianw | i've built an OpenSSH_7.6p1 in /tmp and it still doesn't work | 00:12 |
ianw | but it is linked against its old openssl | 00:12 |
fungi | right, i think it's likely more to do with what openssl supports or doesn't | 00:12 |
fungi | since openssh is relying on it for cryptographic primitives | 00:13 |
fungi | i suppose we could snapshot the server and then try an in-place upgrade to xenial | 00:14 |
corvus | happy [utc] lunar new year! | 00:29 |
fungi | and to you! | 00:36 |
*** tosky has quit IRC | 00:36 | |
corvus | clarkb: npr talks krz: https://www.npr.org/2021/02/11/966499158/reading-the-game-kentucky-route-zero | 00:39 |
clarkb | corvus: it's the sort of game that non gamers can get into too if you are interested | 00:46 |
clarkb | doesn't require you to react quickly or figure out a controller to perform coordinated tasks | 00:46 |
*** mlavalle has quit IRC | 01:01 | |
*** DSpider has quit IRC | 01:08 | |
openstackgerrit | Goutham Pacha Ravi proposed opendev/yaml2ical master: Add third week meetings schedule https://review.opendev.org/c/opendev/yaml2ical/+/775304 | 01:24 |
ianw | so it turned out to be me misreading user names | 01:33 |
ianw | sigh | 01:33 |
ianw | fungi: do we need the various /homes on wiki server backed up? | 01:33 |
fungi | doubtful but i'll take a quick look | 01:36 |
*** hemanth_n has joined #opendev | 01:52 | |
fungi | ianw: other than those usernames being a trip down memory lane, i don't see anything important to hold onto (it's all old downloads of wiki source, old copies of configs, et cetera) | 02:18 |
ianw | thanks, i'll probably just prune them. i need to put the db in too | 02:30 |
ianw | sorry, a little distracted, we just went back into a 5 day lockdown due to the UK strain getting out :( | 02:31 |
fungi | oof | 02:33 |
*** ysandeep|out is now known as ysandeep|rover | 02:42 | |
fungi | from what i gather it's already running rampant here, we're more concerned about the south african strain at this point | 02:48 |
ianw | :( | 02:54 |
*** dviroel has quit IRC | 03:09 | |
ysandeep|rover | #opendev We're still seeing some limestone mirror related RETRY_LIMIT failures https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_99a/775067/1/check/openstack-tox-py37/99aa3cd/job-output.txt | 03:35 |
fungi | those are almost certainly the continued random missing ipv4 route problem; we abort builds pretty much first thing if they hit that, but if the build had already retried twice before (maybe it got extremely lucky and hit that problem three times in a row? or maybe it's a job which already tends to crash nodes at random some of the time?) and then ran into that on its third try, it would report RETRY_LIMIT | 03:42 |
fungi | oh, looking at that log, the failure was something different | 03:43 |
fungi | that's acting like the mirror there is having trouble getting some things in afs | 03:44 |
ysandeep|rover | hmm, Failed to fetch https://mirror.regionone.limestone.opendev.org/ubuntu/dists/bionic/universe/binary-amd64/Packages 403 Forbidden [IP: 2607:ff68:100:54:f816:3eff:feb5:4635 443] | 03:44 |
fungi | ls: cannot open directory '/afs/openstack.org/': Connection timed out | 03:45 |
fungi | [Fri Feb 12 02:02:13 2021] afs: Lost contact with volume location server 23.253.200.228 in cell openstack.org (code -1) | 03:45 |
fungi | [Fri Feb 12 02:03:08 2021] afs: Lost contact with volume location server 104.130.136.20 in cell openstack.org (code -1) | 03:45 |
fungi | ipv4 network connectivity problems there? | 03:45 |
fungi | it can ping them | 03:46 |
fungi | trying to restart openafs-client on it now but it's just hanging | 03:48 |
fungi | i'll try rebooting the mirror | 03:48 |
ysandeep|rover | ack | 03:48 |
fungi | the reboot may take a minute to give up on the afs client | 03:49 |
fungi | there it goes | 03:49 |
fungi | it's booted back up | 03:51 |
fungi | [Fri Feb 12 03:54:21 2021] afs: network error for 104.130.136.20:7003: origin 0 type 3 code 10 (Destination Host Prohibited) | 03:55 |
fungi | now it's saying that for both of them too | 03:55 |
fungi | and still not connecting | 03:55 |
fungi | um, our other mirrors are saying the same thing | 03:56 |
ysandeep|rover | fungi.. fyi i just noticed another failure with different mirror | 03:56 |
ysandeep|rover | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9b6/775310/3/check/tripleo-validations-centos-8-molecule-ceph/9b6b259/job-output.txt | 03:56 |
ysandeep|rover | ~~~ | 03:56 |
ysandeep|rover | error: Status code: 403 for https://mirror.kna1.airship-citycloud.opendev.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 188.212.109.26) (https://mirror.kna1.airship-citycloud.opendev.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml). | 03:56 |
ysandeep|rover | ~~~ | 03:56 |
fungi | at least ovh gra1 | 03:56 |
fungi | where i just tried | 03:56 |
fungi | ysandeep|rover: yes, i think something has just happened to our fileservers | 03:56 |
fungi | i can ssh into both of them, they've been up for weeks since last reboots | 03:57 |
fungi | oh, that's the afs db servers everything's complaining about, not the fileservers | 03:59 |
fungi | they're both reachable by ssh and up for over a week | 03:59 |
fungi | looks like everything lost contact with them at 02:05 utc | 04:00 |
fungi | nothing interesting in dmesg since their last reboots | 04:01 |
fungi | i can get to them via ipv4 as well | 04:01 |
fungi | ianw: any guesses as to what to check? syslog is basically quiet as well | 04:06 |
fungi | if you filter out the constant snmp noise anyway | 04:06 |
fungi | ahh, we've got separate service logs under /var/log/openafs/ | 04:06 |
fungi | but they're also no help, basically nothing in them since the most recent reboots when those services started | 04:08 |
fungi | ianw: ansible updated our iptables rules just before everything lost contact | 04:09 |
fungi | i think the client errors about "Destination Host Prohibited" are literal | 04:10 |
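fungi's reading is right: "Destination Host Prohibited" is how the client reports an ICMP host-prohibited error, which is what an iptables REJECT rule sends back. An illustrative rule (not the actual opendev configuration) that would produce exactly that client-side symptom for the vlserver port:

```shell
# Illustrative only -- rejecting UDP 7003 with icmp-host-prohibited
# makes AFS clients log "Destination Host Prohibited" verbatim:
# iptables -A INPUT -p udp --dport 7003 -j REJECT --reject-with icmp-host-prohibited
```

This also explains the earlier confusion: ping (ICMP echo) still worked because only the AFS UDP ports were caught by the rule.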
fungi | 22:46 <openstackgerrit> Merged opendev/system-config master: Refactor AFS groups https://review.opendev.org/c/opendev/system-config/+/775057 | 04:11 |
fungi | i bet it was deploying that | 04:11 |
fungi | now to figure out what we were previously allowing on the db servers which we suddenly blocked | 04:12 |
fungi | iptables_extra_public_udp_ports: [7000,7001,7002,7003,7004,7005,7006,7007] | 04:15 |
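For reference, those eight UDP ports are OpenAFS's registered Rx services. A hedged sketch of the restored group var as a YAML list (the variable name is taken from the line above; the per-port comments are the standard IANA afs3-* assignments, not something stated in the log):

```yaml
iptables_extra_public_udp_ports:
  - 7000  # afs3-fileserver
  - 7001  # afs3-callback (cache manager)
  - 7002  # afs3-prserver (protection database)
  - 7003  # afs3-vlserver (volume location database)
  - 7004  # afs3-kaserver (legacy auth)
  - 7005  # afs3-volser (volume management)
  - 7006  # afs3-errors
  - 7007  # afs3-bos (basic overseer)
```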
fungi | i'm going to temporarily put these servers into the emergency disable list for ansible | 04:16 |
fungi | okay, i think everything's back up | 04:20 |
fungi | #status log Added afsdb01 and afsdb02 servers to emergency disable list and added back missing public UDP ports in firewall rules while we work out what was missing from 775057 | 04:21 |
openstackstatus | fungi: finished logging | 04:21 |
fungi | ysandeep|rover: i *think* everything should be back to normal now | 04:22 |
fungi | ianw: do we need to rename inventory/service/group_vars/afs.yaml and inventory/service/group_vars/afsdb.yaml to match the new group names? | 04:25 |
ysandeep|rover | fungi thank you, ++ | 04:26 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Update AFS group vars filenames https://review.opendev.org/c/opendev/system-config/+/775311 | 04:29 |
fungi | ianw: ysandeep|rover: ^ i think that's the longer term fix | 04:29 |
fungi | unless i'm misunderstanding how these pieces fit together | 04:29 |
ysandeep|rover | fungi, thanks! i am rechecking failed patches, i will report here if i still find some issues. | 04:31 |
fungi | appreciated! and thanks for letting us know you were seeing a problem | 04:32 |
ysandeep|rover | thanks for fixing issues so quickly :) | 04:33 |
*** ykarel_ has joined #opendev | 04:51 | |
*** ykarel_ is now known as ykarel | 05:59 | |
*** marios has joined #opendev | 06:20 | |
*** rchurch has quit IRC | 06:24 | |
*** eolivare has joined #opendev | 06:55 | |
*** slaweq has joined #opendev | 07:11 | |
ianw | arrrgghhh terribly sorry to step away and leave that | 07:11 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Update AFS group vars filenames https://review.opendev.org/c/opendev/system-config/+/775311 | 07:27 |
ianw | oh, the afs servers survived because they're still in emergency | 07:33 |
*** sboyron_ has joined #opendev | 07:45 | |
*** ykarel_ has joined #opendev | 07:47 | |
*** ralonsoh has joined #opendev | 07:50 | |
*** ykarel has quit IRC | 07:50 | |
*** hashar has joined #opendev | 07:54 | |
*** ysandeep|rover is now known as ysandeep|lunch | 07:56 | |
*** rpittau|afk is now known as rpittau | 08:02 | |
*** andrewbonney has joined #opendev | 08:13 | |
*** tosky has joined #opendev | 08:24 | |
fungi | yep | 08:33 |
fungi | i realized that when i went to add these | 08:33 |
fungi | we can peel them back carefully and make sure things still work | 08:33 |
fungi | and no apologies needed, these things happen | 08:33 |
*** jpena|off is now known as jpena | 08:56 | |
*** ysandeep|lunch is now known as ysandeep|rover | 08:59 | |
*** DSpider has joined #opendev | 09:44 | |
*** redrobot9 has joined #opendev | 10:26 | |
*** redrobot has quit IRC | 10:27 | |
*** redrobot9 is now known as redrobot | 10:27 | |
*** ykarel_ is now known as ykarel | 10:37 | |
*** ysandeep|rover is now known as ysandeep|afk | 11:18 | |
*** dviroel has joined #opendev | 11:22 | |
*** sshnaidm|afk has quit IRC | 11:41 | |
*** dtantsur|afk is now known as dtantsur | 11:41 | |
*** ysandeep|afk is now known as ysandeep|rover | 11:42 | |
*** sshnaidm|afk has joined #opendev | 11:49 | |
*** sshnaidm|afk is now known as sshnaidm|off | 11:50 | |
*** hashar has quit IRC | 12:03 | |
*** fressi has joined #opendev | 12:06 | |
*** jpena is now known as jpena|lunch | 12:31 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/775373 | 12:33 |
*** hashar has joined #opendev | 12:42 | |
*** hemanth_n has quit IRC | 12:45 | |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/775373 | 13:22 |
openstackgerrit | Sorin Sbârnea proposed zuul/zuul-jobs master: update-json-file: avoid failure when destination does not exists https://review.opendev.org/c/zuul/zuul-jobs/+/775373 | 13:26 |
*** jpena|lunch is now known as jpena | 13:30 | |
*** ysandeep|rover is now known as ysandeep|mtg | 13:32 | |
*** d34dh0r53 has quit IRC | 13:46 | |
*** d34dh0r53 has joined #opendev | 13:54 | |
*** mlavalle has joined #opendev | 14:00 | |
*** ysandeep|mtg is now known as ysandeep | 14:05 | |
*** ysandeep is now known as ysandeep|away | 14:07 | |
*** fressi has quit IRC | 14:22 | |
*** d34dh0r53 has quit IRC | 14:45 | |
*** d34dh0r53 has joined #opendev | 14:45 | |
*** rpittau is now known as rpittau|afk | 15:03 | |
*** ykarel has quit IRC | 15:42 | |
*** ykarel has joined #opendev | 15:42 | |
*** roman_g has joined #opendev | 15:45 | |
*** lbragstad_ has joined #opendev | 15:46 | |
frickler | slaweq: I have a held node for you on https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_881/773670/3/check/neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid/88171f4/ can you remind me of your ssh key once more? (we really should make a list of those somewhere) | 15:47 |
roman_g | Hello team. Is there any maintenance on the CityCloud mirror VM? We are getting either "unable to connect" when trying to apt-get install packages from mirror.kna1.airship-citycloud.opendev.org, or HTTP 403, or something like that. | 15:47 |
slaweq | frickler: http://paste.openstack.org/show/802603/ | 15:47 |
slaweq | thx a lot | 15:47 |
frickler | slaweq: root@172.99.69.133 , let us know how it goes | 15:49 |
*** lbragstad has quit IRC | 15:50 | |
frickler | roman_g: there was an issue about 12h ago, are you looking at old logs or is that still happening for you now? | 15:50 |
roman_g | frickler haha, I see old issues. Need to investigate latest ones. | 15:52 |
roman_g | Thank you. | 15:52 |
slaweq | frickler: sure, thx a lot | 15:55 |
roman_g | haha -> aha, sorry | 15:56 |
*** sboyron_ has quit IRC | 15:58 | |
*** LowKeys has joined #opendev | 16:03 | |
LowKeys | Hi morning | 16:04 |
LowKeys | i've a question: how do i fix this error during git clone: error: RPC failed; curl 56 GnuTLS recv error (-9): Error decoding the received TLS packet. | 16:04 |
*** mlavalle has quit IRC | 16:08 | |
*** marios is now known as marios|call | 16:08 | |
*** mlavalle has joined #opendev | 16:09 | |
clarkb | LowKeys: what are you cloning against? | 16:14 |
clarkb | (the url would be helpful) | 16:15 |
LowKeys | clarkb: i do git clone https://opendev.org/openstack/openstack-ansible | 16:15 |
clarkb | ok as a quick sanity check I've cloned that repo from all three gitea backends directly | 16:18 |
clarkb | that seems happy at least | 16:18 |
clarkb | cacti also shows things look happy so no clues there | 16:20 |
LowKeys | ok, problem solved. i think it was a connection issue; i changed the public ip and that fixed it | 16:23 |
clarkb | ya, looks like curl error 56 typically indicates a network issue | 16:25 |
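clarkb's diagnosis checks out: curl error 56 is CURLE_RECV_ERROR, a failure receiving data at the transport layer, so the GnuTLS decode message is a symptom rather than the cause -- consistent with a new public IP fixing it. A hedged sketch of how to confirm, left commented as a recipe (the log path is illustrative):

```shell
# Trace the transfer to see where the stream dies:
# GIT_CURL_VERBOSE=1 git clone https://opendev.org/openstack/openstack-ansible 2> clone-trace.log
# A shallow clone narrows it further: if this succeeds, the full
# clone is likely hitting a reset on large transfers:
# git clone --depth 1 https://opendev.org/openstack/openstack-ansible
```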
*** lbragstad_ is now known as lbragstad | 16:25 | |
slaweq | frickler: I have to leave now for some time, but I will continue my debugging in a few hours if that isn't a big problem; later I will tell you on this channel when the vm can be deleted | 16:29 |
slaweq | I hope it's fine for you | 16:30 |
*** marios|call is now known as marios | 16:42 | |
frickler | slaweq: sure, take your time, can also be a couple of days, no problem | 16:50 |
*** ykarel is now known as ykarel|away | 16:54 | |
fungi | roman_g: yeah, there was a ~2hr period around 02:00-04:00 where we accidentally merged some incorrect configuration management and it removed firewall rules from part of our distributed storage backend for the package mirrors, sorry about that. job volume was low enough at that time i opted to just mention something in the status log and not spam all the irc channels | 16:58 |
*** ykarel|away has quit IRC | 17:06 | |
*** marios is now known as marios|out | 17:07 | |
*** jpena is now known as jpena|brb | 17:13 | |
LowKeys | clarkb: yes, thank you btw | 17:16 |
*** marios|out has quit IRC | 17:18 | |
*** hashar has quit IRC | 17:18 | |
*** d34dh0r53 has quit IRC | 17:21 | |
*** eolivare has quit IRC | 17:30 | |
*** gmann is now known as gmann_afk | 17:40 | |
roman_g | fungi Thank you. | 17:42 |
*** andrewbonney has quit IRC | 18:02 | |
*** jpena|brb is now known as jpena | 18:03 | |
*** LowKeys has quit IRC | 18:04 | |
*** hamalq has joined #opendev | 18:36 | |
*** dtantsur is now known as dtantsur|afk | 18:39 | |
*** roman_g has quit IRC | 18:47 | |
*** gmann_afk is now known as gmann | 18:52 | |
*** jpena is now known as jpena|off | 18:58 | |
fungi | still watching http://travaux.ovh.net/?do=details&id=48997 to determine when it's safe to merge https://review.opendev.org/775209 but they haven't marked the incident as resolved yet (last update was yesterday... not sure if those timestamps are utc or cst) | 19:42 |
*** hashar has joined #opendev | 20:19 | |
*** auristor has quit IRC | 20:21 | |
*** auristor has joined #opendev | 20:22 | |
*** ralonsoh has quit IRC | 20:37 | |
clarkb | fungi: it seems like things are working if we want to just go for it but ya the indication that stuff was still being fixed made me decide not to push it | 20:43 |
*** hashar has quit IRC | 20:46 | |
slaweq | frickler: clarkb: thx a lot for that host, I think I found the problem there. You can delete node 172.99.69.133 now | 20:50 |
slaweq | and also have a great weekend :) | 20:50 |
*** slaweq has quit IRC | 21:00 | |
fungi | looks like that hold was removed | 21:18 |
*** roman_g has joined #opendev | 21:21 | |
*** roman_g has quit IRC | 21:21 | |
*** mlavalle has quit IRC | 21:22 | |
*** mlavalle has joined #opendev | 21:35 | |
*** klonn has joined #opendev | 22:14 | |
*** hamalq has quit IRC | 23:51 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!