*** prometheanfire has joined #opendev | 00:02 | |
*** Green_Bird has quit IRC | 00:15 | |
*** tosky has quit IRC | 00:30 | |
fungi | revisiting the problem with the citycloud-kna1 mirror instance, i'm a little fuzzy on how to fiddle with floating ips... it's not clear to me how to double-check that the port it's associated with is correctly connected to the server instance | 01:32 |
fungi | openstack floating ip show gives me the port id and network id | 01:33 |
fungi | openstack server show does not tell me what network and port ids are used by the server instance, only the addresses (which do match, at least) | 01:34 |
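(A minimal sketch of cross-checking that association with the openstack CLI; the server name below is hypothetical:)

    # List the neutron ports bound to the server instance
    openstack port list --server mirror01.kna1.citycloud.opendev.org

    # Show the port and network the floating ip points at; the port_id
    # here should match one of the port ids from the listing above
    openstack floating ip show <FLOATING_IP> -c port_id -c floating_network_id -c fixed_ip_address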
fungi | we don't have cacti correctly polling it as far as i can tell either, so no idea when it went offline. i can at least see that it was logged as unreachable when the daily base playbook run happened 2021-01-29 06:09:31 utc, but it was reachable for the previous 2021-01-28 06:04:59 run, so i think it went offline sometime in between those | 01:40 |
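(A rough sketch of the kind of check described above, assuming the daily base playbook output is kept in per-run logs on the bastion host; the log path and hostname are assumptions:)

    # Find which daily runs recorded the mirror as unreachable
    grep -l 'mirror01.kna1.citycloud.*UNREACHABLE' /var/log/ansible/base.yaml.log.*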
*** aprice has quit IRC | 01:52 | |
fungi | also the login credentials we have for our tenant accounts don't seem to be viable for logging into citycloud's web dashboard, so i don't think we'll be able to open a support ticket that way | 01:53 |
*** aprice has joined #opendev | 01:53 | |
*** klonn has joined #opendev | 02:02 | |
*** klonn has quit IRC | 02:04 | |
*** DSpider has quit IRC | 02:53 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/773224 | 06:05 |
*** DSpider has joined #opendev | 08:05 | |
*** slaweq has joined #opendev | 10:42 | |
*** slaweq has quit IRC | 10:44 | |
*** DSpider has joined #opendev | 10:46 | |
*** dviroel has joined #opendev | 10:52 | |
*** tosky has joined #opendev | 11:22 | |
*** slaweq has joined #opendev | 12:17 | |
*** icey has quit IRC | 13:00 | |
*** icey has joined #opendev | 13:00 | |
*** klonn has joined #opendev | 13:02 | |
*** DSpider has quit IRC | 13:05 | |
*** dviroel has quit IRC | 13:26 | |
*** klonn has quit IRC | 14:16 | |
*** zbr9 has joined #opendev | 14:35 | |
*** zbr has quit IRC | 14:37 | |
*** zbr9 is now known as zbr | 14:37 | |
openstackgerrit | Merged openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/773224 | 14:43 |
*** klonn has joined #opendev | 16:09 | |
*** slaweq has quit IRC | 16:25 | |
yoctozepto | infra-root: are there known zuul issues now? I see lots and lots of jobs ending up in post_failure with a finger url | 17:18 |
fungi | yoctozepto: not known to me, i'll take a look. often that means a problem with one of our swift providers | 17:21 |
fungi | WARNING:keystoneauth.identity.generic.base:Failed to discover available identity versions when contacting https://auth.cloud.ovh.net/. Attempting to parse version from URL. | 17:24 |
fungi | bingo | 17:24 |
fungi | http://travaux.ovh.net/ | 17:25 |
fungi | not a lot of luck parsing their status info there | 17:27 |
fungi | i'll see if i can work out when this started | 17:27 |
fungi | 16:13:33 utc today (roughly an hour ago) is the first occurrence we saw for that particular error | 17:31 |
fungi | might be too recent for them to have recorded the underlying problem yet | 17:31 |
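(A sketch of the sort of search used to pin down when the failures started, assuming the executors write keystoneauth warnings to a debug log; the path is an assumption:)

    # Print the earliest timestamped occurrence of the warning
    grep -h 'Failed to discover available identity versions' \
        /var/log/zuul/executor-debug.log | head -n 1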
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Temporarily disable build log uploads to OVH https://review.opendev.org/c/opendev/base-jobs/+/773258 | 17:37 |
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Revert "Temporarily disable build log uploads to OVH" https://review.opendev.org/c/opendev/base-jobs/+/773259 | 17:37 |
fungi | infra-root: i'm going to bypass zuul to directly merge 773258 since there's a good chance it will fail out on the problem it's working around | 17:37 |
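("bypassing zuul" here means submitting the change directly in gerrit with admin privileges; a hedged sketch using gerrit's ssh CLI, where the account name and patchset number are assumptions:)

    # Apply the required labels and submit change 773258 patchset 1,
    # skipping CI entirely (label options depend on the site's config)
    ssh -p 29418 admin@review.opendev.org gerrit review 773258,1 \
        --code-review +2 --verified +2 --submit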
yoctozepto | fungi: thanks for your quick reaction | 17:37 |
yoctozepto | uh-oh, I have now also learnt that the logs target is independent of where the job ran | 17:39 |
openstackgerrit | Merged opendev/base-jobs master: Temporarily disable build log uploads to OVH https://review.opendev.org/c/opendev/base-jobs/+/773258 | 17:39 |
yoctozepto | good to know :D | 17:39 |
fungi | yoctozepto: yes, we don't have usable swift containers in all the clouds where we run the builds | 17:39 |
yoctozepto | ah, that makes sense | 17:40 |
yoctozepto | I will record this fact (or at least try to) | 17:40 |
fungi | also the zuul executors are what do the log uploading, and they're not local to the clouds where builds happen, so colocating log uploads with the build location doesn't buy us any stability/bandwidth savings anyway | 17:40 |
yoctozepto | how can I easily check which cloud ran my job? | 17:41 |
yoctozepto | ah, ack | 17:41 |
yoctozepto | thanks for the insight; always more to learn | 17:41 |
fungi | easiest way, if logs got uploaded (hah!), is to look in the zuul_info/inventory.yaml | 17:41 |
yoctozepto | thanks, yeah, and if they did not? :D | 17:41 |
fungi | though also the nodes themselves embed the cloud where they're booted in their hostnames | 17:42 |
fungi | if no logs were uploaded, and you're watching the console stream, you can possibly see it mentioned in the log output (a bunch of jobs echo the hostname as part of their initial diagnostic checks) | 17:42 |
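(A sketch of both checks; the build log URL is hypothetical:)

    # From uploaded logs: the inventory records the nodepool provider/region
    curl -s https://<log-site>/<build-uuid>/zuul_info/inventory.yaml \
        | grep -i -E 'cloud|region|provider'

    # From a live node: the provider is embedded in the hostname,
    # e.g. ubuntu-focal-ovh-bhs1-0022713549
    hostname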
*** klonn has quit IRC | 17:42 | |
yoctozepto | we have a pending task to debug why some of our runs end up in disk_full even though it does not make sense as we use very few gb in kolla | 17:42 |
yoctozepto | true that, though I am usually not doing that :D | 17:43 |
fungi | zuul doesn't permanently record the node provider with the mysql reporter used by the dashboard, so it's not part of the build metadata available there | 17:43 |
yoctozepto | as in, I don't wait for jobs to fail | 17:43 |
yoctozepto | ah, shucks | 17:43 |
fungi | those are things you could work with the zuul maintainers on in #zuul if they'd be useful data for you | 17:44 |
fungi | #status log Temporarily suspended Zuul build log uploads to OVH due to Keystone auth errors; POST_FAILURE results recorded between 16:30 and 17:40 UTC can be safely rechecked | 17:46 |
openstackstatus | fungi: finished logging | 17:46 |
*** Dmitrii-Sh has quit IRC | 17:58 | |
*** klonn has joined #opendev | 18:20 | |
*** klonn has quit IRC | 18:51 | |
*** klonn has joined #opendev | 19:32 | |
*** stevebaker has quit IRC | 20:22 | |
*** stevebaker has joined #opendev | 20:42 | |
*** klonn has quit IRC | 21:28 | |
*** iurygregory has joined #opendev | 22:24 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Expand gerrit testing to multiple changes https://review.opendev.org/c/opendev/system-config/+/772823 | 22:49 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Test x/ project clones in Gerrit https://review.opendev.org/c/opendev/system-config/+/773153 | 22:49 |
fungi | ianw: seeing as how you're around, any chance you'd be interested in poking at the fip for the unreachable citycloud mirror so we might be able to reenable that region? otherwise i expect our only hope is to e-mail our contact there | 22:51 |
ianw | fungi: fip being floating ip? i can see what i can find out | 22:53 |
*** tkajinam has joined #opendev | 22:56 | |
fungi | ianw: yeah, see scrollback; basically we lost contact with the instance sometime on friday or early saturday utc (we're not monitoring it with cacti, so the daily ansible runs are the most granularity we have, other than maybe querying logstash for timeouts reaching the mirrored urls) | 23:03 |
ianw | ok, poking now to see if i can find anything | 23:03 |
fungi | i tried rebooting it, soft and then hard, no dice, still basically network timeouts | 23:03 |
ianw | if we have quota i can try building another | 23:04 |
fungi | the console initially showed kernel messages about tasks timing out, making me wonder if there might have been a pause from a live migration | 23:04 |
fungi | so half suspecting that the fip might not have moved or refreshed its neutron port when that happened | 23:04 |
fungi | but i'm not confident enough in my reading of the available fip operations to know if i'll make matters worse trying to detach and reattach it | 23:05 |
ianw | ahh, well yes i can try detaching/reattaching | 23:05 |
ianw | heh me either but i guess it can hardly get much worse than being offline :) | 23:05 |
fungi | that's entirely true | 23:05 |
fungi | just didn't want to make a bigger mess if we were going to wind up e-mailing our contact there to have them troubleshoot | 23:06 |
fungi | basically i wasn't all that sure what i was doing, and didn't want to go fumbling in the dark with sharp instruments if nobody else was about ;) | 23:06 |
fungi | i disabled the region by setting max-servers for it to 0 in nodepool, so the special flavors there aren't booting, but otherwise whatever happens there shouldn't be too disruptive at least | 23:07 |
fungi | also as a heads up, i took the ovh regions out of our pool for zuul build log swift uploads due to some keystone errors, haven't checked to see if that problem may have fixed itself (see status log or scrollback for deets on that) | 23:08 |
fungi | i have reverts proposed for both the citycloud nodepool change and ovh swift logs change in case either is safe to roll back | 23:09 |
ianw | well i deleted the floating ip, created a new one and attached it and ... still doesn't seem to work :/ | 23:10 |
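(The recreate steps described amount to something like the following; the network and server names are assumptions:)

    # Release the broken floating ip, allocate a fresh one from the
    # external network, and attach it to the mirror instance
    openstack floating ip delete <OLD_FLOATING_IP>
    openstack floating ip create <EXTERNAL_NETWORK>
    openstack server add floating ip mirror01.kna1.citycloud.opendev.org <NEW_FLOATING_IP>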
fungi | reverts are 773240 for citycloud and 773259 for ovh, in case anyone needs them | 23:10 |
fungi | that's an unfortunate turn of events, but at least rules out it being a half-connected fip i suppose | 23:11 |
ianw | i don't seem to have permission to view quota, and i can't figure out a login for the UI control panel. i'll try launching another node and see if it a) can even launch and b) connects | 23:12 |
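(A sketch of the test boot; the image, flavor, network, and key names are all assumptions:)

    # Boot a throwaway instance on the same network to see whether a
    # fresh node gets working connectivity, then try to reach it
    openstack server create --image <IMAGE> --flavor <FLAVOR> \
        --network <NETWORK> --key-name <KEY> connectivity-test
    ping -c 3 <its_floating_or_fixed_ip>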
ianw | it doesn't look like the fresh node is any happier either | 23:19 |
ianw | nope; i'll put together an email because i don't think we can do much else | 23:28 |
fungi | ugh, maybe they borked our network somehow | 23:32 |
ianw | i won't try re-creating all that | 23:38 |
*** tosky has quit IRC | 23:39 | |
*** Dmitrii-Sh has joined #opendev | 23:43 | |
*** Dmitrii-Sh has quit IRC | 23:44 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Expand gerrit testing to multiple changes https://review.opendev.org/c/opendev/system-config/+/772823 | 23:48 |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Test x/ project clones in Gerrit https://review.opendev.org/c/opendev/system-config/+/773153 | 23:48 |
*** Dmitrii-Sh has joined #opendev | 23:48 |