Sunday, 2021-01-31

fungirevisiting the problem with the citycloud-kna1 mirror instance, i'm a little fuzzy on how to fiddle with floating ips... it's not clear to me how to double-check that the port it's associated with is correctly connected to the server instance01:32
fungiopenstack floating ip show gives me the port id and network id01:33
fungiopenstack server show does not tell me what network and port ids are used by the server instance, only the addresses (which do match, at least)01:34
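A minimal sketch of the cross-check fungi describes, comparing the port a floating IP is bound to against the ports attached to the server. It assumes the inputs were saved from `openstack floating ip show <fip> -f json` and `openstack port list --server <server> -f json`. The key names `port_id` and `ID` are assumptions about that JSON output, and the sample IDs in the usage note are hypothetical.

```python
def fip_matches_server(fip, server_ports):
    """Return True if the floating IP's port is one of the server's ports.

    fip: dict parsed from `openstack floating ip show -f json`
         (assumed to carry a "port_id" key).
    server_ports: list of dicts parsed from
         `openstack port list --server <server> -f json`
         (assumed to carry an "ID" key per port).
    """
    server_port_ids = {port["ID"] for port in server_ports}
    # A detached floating IP reports no port, so it never matches.
    return fip.get("port_id") in server_port_ids
```

For example, `fip_matches_server({"port_id": "port-a"}, [{"ID": "port-a"}])` returns `True`, while a floating IP bound to a port the server does not own returns `False`.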
fungiwe don't have cacti correctly polling it as far as i can tell either, so no idea when it went offline. i can at least see that it was logged as unreachable when the daily base playbook run happened 2021-01-29 06:09:31 utc, but it was reachable for the previous 2021-01-28 06:04:59 run, so i think it went offline sometime in between those01:40
fungialso the login credentials we have for our tenant accounts don't seem to be viable for logging into citycloud's web dashboard, so i don't think we'll be able to open a support ticket that way01:53
*** DSpider has quit IRC13:05
yoctozeptoinfra-root: are there known zuul issues now? I see lots and lots of jobs ending up in post_failure with a finger url17:18
fungiyoctozepto: not known to me, i'll take a look. often that means a problem with one of our swift providers17:21
fungiWARNING:keystoneauth.identity.generic.base:Failed to discover available identity versions when contacting Attempting to parse version from URL.17:24
funginot a lot of luck parsing their status info there17:27
fungii'll see if i can work out when this started17:27
fungi16:13:33 utc today (roughly an hour ago) is the first occurrence we saw for that particular error17:31
fungimight be too recent for them to have recorded the underlying problem yet17:31
fungiinfra-root: i'm going to bypass zuul to directly merge 773258 since there's a good chance it will fail out on the problem it's working around17:37
yoctozeptofungi: thanks for your quick reaction17:37
yoctozeptouh-oh, I have now also learnt that the logs target is independent of where the job ran17:39
yoctozeptogood to know :D17:39
fungiyoctozepto: yes, we don't have usable swift containers in all the clouds where we run the builds17:39
yoctozeptoah, that makes sense17:40
yoctozeptoI will record this fact (or at least try to)17:40
fungialso the zuul executors are what do the log uploading, and they're not local to the clouds where builds happen, so colocating log uploads with the build location doesn't buy us any stability/bandwidth savings anyway17:40
yoctozeptohow can I easily check which cloud ran my job?17:41
yoctozeptoah, ack17:41
yoctozeptothanks for the insight; always more to learn17:41
fungieasiest way, if logs got uploaded (hah!) is to look in the zuul_info/inventory.yaml17:41
yoctozeptothanks, yeah, and if they did not? :D17:41
fungithough also the nodes themselves embed the cloud where they're booted in their hostnames17:42
fungiif no logs were uploaded, and you're watching the console stream, you can possibly see it mentioned in the log output (a bunch of jobs echo the hostname as part of their initial diagnostic checks)17:42
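A rough sketch of pulling the cloud out of `zuul_info/inventory.yaml`, for the case where logs did get uploaded. It assumes the file has already been parsed into a dict (for instance with PyYAML's `yaml.safe_load`) and that each host under `all.hosts` carries a `nodepool` mapping with `cloud` and `region` keys. That layout is an assumption, not something confirmed in this log, and the function name is hypothetical.

```python
def providers_from_inventory(inventory):
    """Map each host name to a (cloud, region) tuple.

    inventory: dict parsed from zuul_info/inventory.yaml, assumed to
    look like {"all": {"hosts": {name: {"nodepool": {...}}}}}.
    Missing keys yield None rather than raising.
    """
    hosts = inventory.get("all", {}).get("hosts", {})
    result = {}
    for name, hostvars in hosts.items():
        nodepool = hostvars.get("nodepool", {})
        result[name] = (nodepool.get("cloud"), nodepool.get("region"))
    return result
```

Hosts without a `nodepool` entry map to `(None, None)`, so a partially written inventory still yields a usable answer.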
yoctozeptowe have a pending task to debug why some of our runs end up in disk_full even though it does not make sense as we use very few gb in kolla17:42
yoctozeptotrue that, though I am usually not doing that :D17:43
fungizuul doesn't permanently record the node provider with the mysql reporter used by the dashboard, so it's not part of the build metadata available there17:43
yoctozeptoas in, I don't wait for jobs to fail17:43
yoctozeptoah, shucks17:43
fungithose are things you could work with the zuul maintainers on in #zuul if they'd be useful data for you17:44
fungi#status log Temporarily suspended Zuul build log uploads to OVH due to Keystone auth errors; POST_FAILURE results recorded between 16:30 and 17:40 UTC can be safely rechecked17:46
openstackstatusfungi: finished logging17:46
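A small sketch of the rule in the status log entry above: a build is safe to recheck if it ended in POST_FAILURE between 16:30 and 17:40 UTC. The date is taken from this log's header (2021-01-31). The function name and the idea of passing in a build's result and end time are assumptions for illustration.

```python
from datetime import datetime, timezone

# Window from the status log entry, on the date of this log.
WINDOW_START = datetime(2021, 1, 31, 16, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2021, 1, 31, 17, 40, tzinfo=timezone.utc)


def safe_to_recheck(result, end_time):
    """Return True for POST_FAILURE builds that ended inside the window.

    result: the build result string, e.g. "POST_FAILURE".
    end_time: a timezone-aware datetime for when the build finished.
    """
    return result == "POST_FAILURE" and WINDOW_START <= end_time <= WINDOW_END
```

Builds that failed for other reasons, or outside the window, fall through to `False` and still need a look at their logs.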
fungiianw: seeing as how you're around, any chance you'd be interested in poking at the fip for the unreachable citycloud mirror so we might be able to reenable that region? otherwise i expect our only hope is to e-mail our contact there22:51
ianwfungi: fip being floating ip?  i can see what i can find out22:53
fungiianw: yeah, see scrollback. basically we lost contact with the instance sometime on friday or early saturday utc (we're not monitoring it with cacti so the daily ansible runs are the most granularity we have other than maybe querying logstash for timeouts reaching the mirrored urls)23:03
ianwok, poking now to see if i can find anything23:03
fungii tried rebooting it, soft and then hard, no dice, still basically network timeouts23:03
ianwif we have quota i can try building another23:04
fungithe console initially showed kernel messages about tasks timing out, making me wonder if there might have been a pause from a live migration23:04
fungiso half suspecting that the fip might not have moved or refreshed its neutron port when that happened23:04
fungibut i'm not confident enough in my reading of the available fip operations to know if i'll make matters worse trying to detach and reattach it23:05
ianwahh, well yes i can try detaching/reattaching23:05
ianwheh me either but i guess it can hardly get much worse than being offline :)23:05
fungithat's entirely true23:05
fungijust didn't want to make a bigger mess if we were going to wind up e-mailing our contact there to have them troubleshoot23:06
fungibasically i wasn't all that sure what i was doing, and didn't want to go fumbling in the dark with sharp instruments if nobody else was about ;)23:06
fungii disabled the region by setting max-servers for it to 0 in nodepool, so the special flavors there aren't booting but otherwise whatever happens there shouldn't be too disruptive at least23:07
fungialso as a heads up, i took the ovh regions out of our pool for zuul build log swift uploads due to some keystone errors, haven't checked to see if that problem may have fixed itself (see status log or scrollback for deets on that)23:08
fungii have reverts proposed for both the citycloud nodepool change and ovh swift logs change in case either is safe to roll back23:09
ianwwell i deleted the floating ip, created a new one and attached it and ... still doesn't seem to work :/23:10
fungireverts are 773240 for citycloud and 773259 for ovh, in case anyone needs them23:10
fungithat's an unfortunate turn of events, but at least rules out it being a half-connected fip i suppose23:11
ianwi don't seem to have permission to view quota; and i can't figure out how to log into the UI control panel.  i'll try launching another node and see a) if it can even launch and b) if it connects23:12
ianwit doesn't look like the fresh node is any happier either23:19
ianwnope; i'll put together an email because i don't think we can do much else23:28
fungiugh, maybe they borked our network somehow23:32
ianwi won't try re-creating all that23:38
