timburke | any chance i could get a node hold on whatever node is running https://zuul.opendev.org/t/openstack/stream/48bab61cbb10481f920d119435abed2c?logfile=console.log ? seems like the job's hung. i've seen a fair number of these with our probe tests and not been able to figure out *why* it happens... | 00:37 |
---|---|---|
timburke | that'd be part of this buildset if that helps at all: https://zuul.opendev.org/t/openstack/buildset/65cd0135c6124919845075fbacb3324e | 00:38 |
timburke | ah, too late -- looks like it timed out already. i noticed it too late in the run :-( | 00:41 |
fungi | timburke: maybe bump out the job timeout and then it'll be easier to catch before the job ends (or we can just set a hold for a single change and you can recheck it until you manage to trigger the problem) | 00:47 |
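A hold like the one fungi describes is normally requested with zuul-client's autohold subcommand; a minimal sketch, where the tenant/project/job/change values below are placeholders rather than the actual job discussed above:

```shell
# Hedged sketch: ask Zuul to keep the nodes of a failing build around for inspection.
# All values are illustrative placeholders, not the real project, job, or change.
zuul-client --zuul-url https://zuul.opendev.org autohold \
  --tenant openstack \
  --project opendev.org/example/project \
  --job example-probe-job \
  --change 123456 \
  --reason "debugging hung probe tests" \
  --count 1
```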
Clark[m] | fungi: the LE changes failed on system-config-run-base. At this point they won't land before the daily jobs so we can just sort it out tomorrow | 01:05 |
corvus | infra-root: fyi https://review.opendev.org/q/934288 is likely a corrupted change that i just uploaded. same content as https://review.opendev.org/q/934289 which worked. | 01:09 |
corvus | i don't see any user-visible errors from that. i don't know if there will be any lasting effects. but just noting it since i observed the unusual behavior when it hung on upload. | 01:10 |
corvus | oh wow 288 works now | 01:10 |
corvus | so i guess it was some async background process that was taking a long time. there was a period where 289 was accessible and 288 was not. | 01:11 |
fungi | Clark[m]: yeah, i was just looking at it, seems there was a bunch of ssh disconnects in the testinfra phase | 01:12 |
cardoe | Gerrit auth not working? | 05:53 |
tonyb | cardoe: let me check. | 05:54 |
cardoe | Getting “Provider is not supported, or was incorrectly entered.” | 05:55 |
tonyb | cardoe: yes it does seem something changed, and it's broken. | 05:55 |
tonyb | Actually it looks like login.ubuntu.com is down | 06:01 |
cardoe | That would do it. | 06:02 |
tonyb | https://www.isitdownrightnow.com/login.ubuntu.com.html | 06:02 |
tonyb | #status log OpenID services are currently down; this means new web logins to OpenDev services (Gerrit and Wiki) are currently unavailable. Existing sessions are fine | 06:09 |
opendevstatus | tonyb: finished logging | 06:09 |
ykarel | tonyb, if you can check/merge https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/934034 | 06:14 |
ykarel | need to get it merged to test/confirm the fix | 06:14 |
tonyb | ykarel: Let me look at it. | 06:18 |
frickler | Topic for #launchpad: There is a known outage affecting our services, including Launchpad. | 06:22 |
frickler | sadly I lack the history for that channel since I somehow had dropped out of it | 06:23 |
tonyb | frickler: I didn't even know that existed. | 06:29 |
tonyb | frickler: Nothing logged in: https://irclogs.ubuntu.com/2024/11/07/%23launchpad.html | 06:32 |
frickler | looks like the logger might be sad, too | 06:52 |
frickler | actually issues should be resolved now according to that channel, but I'm not going to test by logging out myself | 06:58 |
tonyb | frickler: Is it #launchpad on libera? | 07:31 |
tonyb | #status log Authentication services are functional again | 07:32 |
tonyb | cardoe: ^^^ | 07:32 |
opendevstatus | tonyb: finished logging | 07:32 |
frickler | tonyb: yes, ubuntu/canonical is all on liberachat | 07:35 |
tonyb | frickler: Cool. Thanks. | 07:36 |
tonyb | frickler: I didn't see it listed on https://wiki.ubuntu.com/IRC/ChannelList | 07:37 |
frickler | tonyb: yeah, likely because it isn't really ubuntu-specific | 07:41 |
frickler | there's also #canonical-sysadmin, which IIRC is for non-launchpad services like ubuntu.one. but the overlap is about as large as between our channels here ;) | 07:45 |
tonyb | frickler: Thanks | 07:45 |
slaweq | hi frickler, fungi and ianw, can you take a look at https://review.opendev.org/c/zuul/zuul-jobs/+/934243 when you have a few minutes, thx in advance :) | 08:32 |
ttx | slaweq: left a +1 on it -- it's not a repo we have +2 on | 09:59 |
ttx | ah, wrong channel | 09:59 |
slaweq | ttx tonyb thx :) | 10:05 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 10:20 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 10:49 |
opendevreview | Karolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10 https://review.opendev.org/c/openstack/diskimage-builder/+/934045 | 12:09 |
opendevreview | yatin proposed openstack/project-config master: Use 2024.2 constraints translation jobs https://review.opendev.org/c/openstack/project-config/+/934325 | 12:57 |
fungi | looks like the launchpad channel didn't have any discussion about the outage, but they indicated at 06:38 utc that it should be resolved | 13:09 |
fungi | though during the outage the explanation in #canonical-sysadmin was "we have PS5 network issues" | 13:10 |
fungi | infra-prod-letsencrypt succeeded in periodic since i freed up space on nb04 and reset the acme.sh git repo state in /opt | 14:23 |
fungi | i've stopped the builder container there and am doing a second cleanup pass (which will hopefully complete quickly), and will then reboot it to free up loop devs | 14:24 |
fungi | and i've reapproved the acme.sh git checkout fix series, which should merge soon since they got back to verified +1 before i went to sleep | 14:25 |
fungi | i'll keep a close eye on them | 14:26 |
fungi | nb04 second dib tempfile cleanup pass is done, got us down to 58% utilization on /opt (334G free). rebooting now | 14:45 |
fungi | and it's finally back up and responding now | 14:51 |
fungi | and i've upped the container on it again | 14:52 |
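A rough sketch of the sort of checks behind that nb04 cleanup, with the dib scratch path and device handling as assumptions rather than the exact commands run:

```shell
# Illustrative only: inspect disk usage and leaked loop devices on a nodepool builder.
df -h /opt                                               # how full is the build filesystem?
sudo du -sh /opt/dib_tmp/* 2>/dev/null | sort -h | tail  # large leftover build dirs (path is an assumption)
losetup -a                                               # loop devices still attached by interrupted image builds
# individual devices can be detached with `sudo losetup -d /dev/loopN`;
# a reboot, as done here, is the blunt but reliable way to release them all
```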
opendevreview | Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Add support for building Fedora 40 https://review.opendev.org/c/openstack/diskimage-builder/+/922109 | 15:19 |
opendevreview | Dmitriy Rabotyagov proposed openstack/diskimage-builder master: Remove verbosity from DNF/YUM command https://review.opendev.org/c/openstack/diskimage-builder/+/934332 | 15:31 |
fungi | huh, traceroute failure from an ovh node to gitea: https://zuul.opendev.org/t/openstack/build/720df125010046809062ee896584ca54/console#0/3/5/ubuntu-noble | 15:39 |
fungi | i wonder if that happens a lot | 15:39 |
clarkb | fungi: specifically dns resolution failed | 15:43 |
fungi | oh, i missed that's what it was in the traceback | 15:44 |
opendevreview | Merged openstack/project-config master: [neutron-dashboard] drop enforce-scope-old-defaults https://review.opendev.org/c/openstack/project-config/+/934052 | 15:44 |
fungi | so dns resolution failure looking up opendev.org from an ovh node | 15:44 |
fungi | "opendev.org: Temporary failure in name resolution" | 15:44 |
clarkb | we've got those checks in there because network issues have caused problems in the past. But a lot of this stuff predates even ovh I think | 15:44 |
fungi | yeah, i overlooked that crucial line | 15:45 |
fungi | well, it was particularly critical in rackspace back when we were relying on their resolvers and they would blacklist "abusive ip addresses" from accessing the resolvers but then recycle those addresses to other tenants without cleaning up the blocklist entries they'd added | 15:46 |
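A quick, generic triage for this kind of failure on a held node, to separate resolver trouble from routing trouble (assuming the local unbound forwarder our test images normally run on 127.0.0.1):

```shell
getent hosts opendev.org || echo "NSS lookup failed"   # resolve the way applications do
dig +short opendev.org @127.0.0.1                      # query the local forwarder directly
traceroute -n 1.1.1.1                                  # trace to a literal IP, taking DNS out of the picture
```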
opendevreview | Merged opendev/system-config master: Only update acme.sh if necessary https://review.opendev.org/c/opendev/system-config/+/934256 | 16:02 |
opendevreview | Merged opendev/system-config master: Run letsencrypt twice in system-config-run-letsencrypt https://review.opendev.org/c/opendev/system-config/+/934258 | 16:02 |
fungi | looks like the deploy worked but ended in a post_failure? | 16:32 |
fungi | the log does show it correctly skipping the "Install acme.sh client" task on servers due to the condition matching | 16:55 |
fungi | found a semi-explanation in the debug logs on ze09: the zuul_swift_upload task failed, but with no detail since no_log is set for it | 17:01 |
fungi | i'm open to reenqueuing it, but since the log on bridge indicates the run phase worked correctly i'm not overly concerned | 17:02 |
fungi | rackspace is experiencing intermittent keystone (i think?) issues, which could explain it: https://rackspace.service-now.com/system_status?id=service_status&service=af7982f0db6cf200e93ff2e9af96198d | 17:14 |
Clark[m] | Ya the important logs are on bridge anyway | 17:39 |
corvus | it is making a lot of little red tiles on the status page though | 17:40 |
fungi | i think rackspace has resolved the issue, since i'm able to login and do things through the api again (i was not able to a few minutes ago) | 17:42 |
frickler | there might be some issue with log uploads, had two post_failures with no logs in a row. https://zuul.opendev.org/t/openstack/build/0fa916ea0eac424eb61bbeea8b2da768 and https://zuul.opendev.org/t/openstack/build/73bca34cf70144528265b1612a5b0ed2 , but won't get to check more closely today | 18:05 |
frickler | oh, that might correlate with those rax issues | 18:06 |
fungi | yes, that was my working hypothesis | 18:19 |
opendevreview | Ghanshyam proposed openstack/project-config master: Add separate acl group for watcher-tempest-plugin https://review.opendev.org/c/openstack/project-config/+/934357 | 18:24 |
clarkb | ok that took much much longer than I had hoped it would | 19:17 |
opendevreview | Merged openstack/project-config master: Use 2024.2 constraints translation jobs https://review.opendev.org/c/openstack/project-config/+/934325 | 19:25 |
opendevreview | Merged openstack/diskimage-builder master: Update Nodepool image location in docs https://review.opendev.org/c/openstack/diskimage-builder/+/933923 | 19:51 |
JayF | When using the search in gerrit of `reviewer:self`, does that mean "you reviewed any patchset" or "you reviewed latest patchset"; I think it means the former but I'm looking for a search for the latter | 19:52 |
JayF | trying to build a dashboard essentially of "things open in $repo that I don't have an up to date review on" | 19:53 |
fungi | JayF: i think it means you're listed as a "reviewer" for the change, regardless of what votes you've left | 19:53 |
clarkb | Changes that have been, or need to be, reviewed by 'USER'. The special case of reviewer:self will find changes where the caller has been added as a reviewer. | 19:55 |
clarkb | from https://review.opendev.org/Documentation/user-search.html so fungi is correct | 19:55 |
clarkb | JayF: you want reviewedby | 19:56 |
JayF | ++ thank you, that was what I was getting around to on that doc | 19:57 |
JayF | you know, I've been using gerrit for a decade+, and this is the first time I saw that it had a documentation drop down | 19:57 |
JayF | I thought "I should find where those docs are for next time" then felt like an idiot lol | 19:57 |
JayF | looks like https://review.opendev.org/q/repo:openstack/diskimage-builder+status:open+-reviewedBy:self+-owner:self+is:mergeable is a fairly reasonable implementation of what I was looking for; thanks! | 19:59 |
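Queries like that can also be strung together into an ad-hoc Gerrit dashboard URL; the scheme below (title, foreach, then one parameter per section) is from memory, and the repo and section names are only illustrative:

```shell
# Hypothetical custom-dashboard URL built from the same query; open it in a browser.
xdg-open "https://review.opendev.org/dashboard/?title=DIB+reviews&foreach=repo:openstack/diskimage-builder+status:open&Needs+my+review=-reviewedby:self+-owner:self+is:mergeable"
```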
* JayF | trying to get more structured about his reviewing processes so stuff is less likely to fall through the cracks | 19:59 |
clarkb | JayF: `status:open age:14d delta:<80 label:Verified=1,zuul NOT label:Code-Review=-2 NOT label:Code-Review=-1 NOT label:Workflow=-1` this is the one I've been using a bit more recently | 20:45 |
clarkb | the idea is show me everything that has been ignored for 2 weeks but is otherwise on the right track and isn't hundreds of lines | 20:46 |
clarkb | basically can I flush out easy things | 20:46 |
JayF | that is fairly close to what I used before | 20:46 |
JayF | and this is intentionally to make sure the /not easy stuff/ gets looked at :) | 20:46 |
JayF | filtering for passing CI is especially rough in Ironic | 20:46 |
clarkb | in my case I'm usually able to track a handful of the bigger things more manually and any more is too much, but I'm happy to get quick things out as I'm able | 20:48 |
clarkb | definitely no perfect solution though. It's easy to miss things | 20:48 |
JayF | That's mostly what I did with Ironic, but I'm trying to get regularly reviewing on other repos | 20:48 |
JayF | mainly DIB/oslo stuff for now | 20:48 |
clarkb | infra-root thoughts on https://review.opendev.org/c/opendev/system-config/+/934075? this is one of the outcomes of gerrit 3.10 upgrade prep | 20:55 |
clarkb | I guess my question is any concern with deploying that pre upgrade and also would you prefer to stick with the existing setup of using a cronjob to delete old files instead? | 20:58 |
tonyb | nope, looks good to me. I'll have another look when I'm caffeinated | 21:02 |
clarkb | cool thanks. The more stuff I can clear off the upgrade todo list, the better prepared I'll feel | 21:04 |
tonyb | All good | 21:04 |
fungi | lgtm | 21:04 |
fungi | we should have a post-upgrade step to act on that todo comment, if there isn't one in the upgrade plan already (i don't recall) | 21:05 |
clarkb | I don't recall either. I'll go add one if it is missing | 21:06 |
clarkb | done | 21:07 |
fungi | thanks! | 21:07 |
clarkb | oh another note about that document. I struck through one item as done because I'm quite confident we're good. On everything else I've posted my notes on what I've found, but haven't struck them through as I'm still hoping to get a double check on that stuff | 21:08 |
clarkb | https://etherpad.opendev.org/p/gerrit-upgrade-3.10 feel free to read through the notes and do your own investigations or just read through what I've written then maybe put a note on them as you do | 21:08 |
opendevreview | Merged openstack/diskimage-builder master: docs: add two contextual warnings to the replace-partition element https://review.opendev.org/c/openstack/diskimage-builder/+/928819 | 23:34 |
*** tosky_ is now known as tosky | 23:40 |