Thursday, 2024-11-07

timburkeany chance i could get a node hold on whatever node is running https://zuul.opendev.org/t/openstack/stream/48bab61cbb10481f920d119435abed2c?logfile=console.log ? seems like the job's hung. i've seen a fair number of these with our probe tests and not been able to figure out *why* it happens...00:37
timburkethat'd be part of this buildset if that helps at all: https://zuul.opendev.org/t/openstack/buildset/65cd0135c6124919845075fbacb3324e00:38
timburkeah, too late -- looks like it timed out already. i noticed it too late in the run :-(00:41
fungitimburke: maybe bump out the job timeout and then it'll be easier to catch before the job ends (or we can just set a hold for a single change and you can recheck it until you manage to trigger the problem)00:47
Clark[m]fungi: the LE changes failed on system-config-run-base. At this point they won't land before the daily jobs so we can just sort it out tomorrow 01:05
corvusinfra-root: fyi https://review.opendev.org/q/934288 is likely a corrupted change that i just uploaded.  same content as https://review.opendev.org/q/934289 which worked.01:09
corvusi don't see any user-visible errors from that.  i don't know if there will be any lasting effects.  but just noting it since i observed the unusual behavior when it hung on upload.01:10
corvusoh wow 288 works now01:10
corvusso i guess it was some async background process that was taking a long time.  there was a period where 289 was accessible and 288 was not.01:11
fungiClark[m]: yeah, i was just looking at it, seems there was a bunch of ssh disconnects in the testinfra phase01:12
cardoeGerrit auth not working?05:53
tonybcardoe: let me check.05:54
cardoeGetting “Provider is not supported, or was incorrectly entered.”05:55
tonybcardoe: yes it does seem something changed, and it's broken.05:55
tonybActually it looks like login.ubuntu.com is down06:01
cardoeThat would do it.06:02
tonybhttps://www.isitdownrightnow.com/login.ubuntu.com.html06:02
tonyb#status log OpenID Services are currently down, this means new web logins to OpenDev services (Gerrit and Wiki) are currently unavailable.  Existing sessions are fine06:09
opendevstatustonyb: finished logging06:09
ykareltonyb, if you can check/merge https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/93403406:14
ykarelneed to get it merged to test/confirm the fix06:14
tonybykarel: Let me look at it.06:18
fricklerTopic for #launchpad: There is a known outage affecting our services, including Launchpad.06:22
fricklersadly I lack the history for that channel since I somehow had dropped out from it06:23
tonybfrickler: I didn't even know that existed.06:29
tonybfrickler: Nothing logged in: https://irclogs.ubuntu.com/2024/11/07/%23launchpad.html06:32
fricklerlooks like the logger might be sad, too06:52
frickleractually issues should be resolved now according to that channel, but I'm not going to test by logging out myself06:58
tonybfrickler: Is it #launchpad on libera?07:31
tonyb#status log Authentication services are functional again07:32
tonybcardoe: ^^^07:32
opendevstatustonyb: finished logging07:32
fricklertonyb: yes, ubuntu/canonical is all on liberachat07:35
tonybfrickler: Cool.  Thanks.07:36
tonybfrickler: I didn't see it listed on https://wiki.ubuntu.com/IRC/ChannelList07:37
fricklertonyb: yeah, likely because it isn't really ubuntu-specific07:41
fricklerthere's also #canonical-sysadmin which IIRC is for non-launchpad services like ubuntu.one. but also the overlap is similarly large as in our channels here ;)07:45
tonybfrickler: Thanks07:45
slaweqhi frickler, fungi and ianw can you take a look at https://review.opendev.org/c/zuul/zuul-jobs/+/934243 when you have a few minutes, thx in advance :)08:32
ttxslaweq: left a +1 on it -- it's not a repo we have +2 on09:59
ttxah, wrong channel09:59
slaweqttx tonyb thx :)10:05
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404510:20
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404510:49
opendevreviewKarolina Kula proposed openstack/diskimage-builder master: WIP: Add support for CentOS Stream 10  https://review.opendev.org/c/openstack/diskimage-builder/+/93404512:09
opendevreviewyatin proposed openstack/project-config master: Use 2024.2 constraints translation jobs  https://review.opendev.org/c/openstack/project-config/+/93432512:57
fungilooks like the launchpad channel didn't have any discussion about the outage, but they indicated at 06:38 utc that it should be resolved13:09
fungithough during the outage the explanation in #canonical-sysadmin was "we have PS5 network issues"13:10
fungiinfra-prod-letsencrypt succeeded in periodic since i freed up space on nb04 and reset the acme.sh git repo state in /opt14:23
fungii've stopped the builder container there and am doing a second cleanup pass (which will hopefully complete quickly), and will then reboot it to free up loop devs14:24
fungiand i've reapproved the acme.sh git checkout fix series, which should merge soon since they got back to verified +1 before i went to sleep14:25
fungii'll keep a close eye on them14:26
funginb04 second dib tempfile cleanup pass is done, got us down to 58% utilization on /opt (334G free). rebooting now14:45
fungiand it's finally back up and responding now14:51
fungiand i've upped the container on it again14:52
opendevreviewDmitriy Rabotyagov proposed openstack/diskimage-builder master: Add support for building Fedora 40  https://review.opendev.org/c/openstack/diskimage-builder/+/92210915:19
opendevreviewDmitriy Rabotyagov proposed openstack/diskimage-builder master: Remove verbosity from DNF/YUM command  https://review.opendev.org/c/openstack/diskimage-builder/+/93433215:31
fungihuh, traceroute failure from an ovh node to gitea: https://zuul.opendev.org/t/openstack/build/720df125010046809062ee896584ca54/console#0/3/5/ubuntu-noble15:39
fungii wonder if that happens a lot15:39
clarkbfungi: specifically dns resolution failed15:43
fungioh, i missed that's what it was in the traceback15:44
opendevreviewMerged openstack/project-config master: [neutron-dashboard] drop enforce-scope-old-defaults  https://review.opendev.org/c/openstack/project-config/+/93405215:44
fungiso dns resolution failure looking up opendev.org from an ovh node15:44
fungi"opendev.org: Temporary failure in name resolution"15:44
clarkbwe've got those checks in there because network issues have been an issue in the past. But a lot of this stuff predates even ovh I think15:44
fungiyeah, i overlooked that crucial line15:45
fungiwell, it was particularly critical in rackspace back when we were relying on their resolvers and they would blacklist "abusive ip addresses" from accessing the resolvers but then recycle those addresses to other tenants without cleaning up the blocklist entries they'd added15:46
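A minimal sketch of the kind of pre-flight name-resolution check discussed above, assuming Python on the test node. This is not the actual check the jobs run; the function name `check_name_resolution` and its `resolver` parameter are hypothetical and exist only for illustration. It separates the "Temporary failure in name resolution" case (EAI_AGAIN) from other lookup errors, which is the line that was easy to overlook in the traceback.

```python
import socket


def check_name_resolution(hostname, port=443, resolver=socket.getaddrinfo):
    """Return (ok, detail) for a single lookup of hostname.

    resolver is injectable (hypothetical parameter) so the check can be
    exercised without touching the network.
    """
    try:
        results = resolver(hostname, port)
    except socket.gaierror as exc:
        # EAI_AGAIN corresponds to "Temporary failure in name resolution".
        eai_again = getattr(socket, "EAI_AGAIN", None)
        kind = "temporary" if exc.errno == eai_again else "other"
        return False, f"{hostname}: {kind} resolution failure: {exc}"
    # Each getaddrinfo entry ends with a sockaddr tuple; its first item is the IP.
    addresses = sorted({info[4][0] for info in results})
    return True, f"{hostname}: resolved to {', '.join(addresses)}"
```

Calling `check_name_resolution("opendev.org")` on a node returns a flag plus a one-line summary, so a job could log the summary prominently before moving on to traceroute-style checks.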
opendevreviewMerged opendev/system-config master: Only update acme.sh if necessary  https://review.opendev.org/c/opendev/system-config/+/93425616:02
opendevreviewMerged opendev/system-config master: Run letsencrypt twice in system-config-run-letsencrypt  https://review.opendev.org/c/opendev/system-config/+/93425816:02
fungilooks like the deploy worked but ended in a post_failure?16:32
fungithe log does show it correctly skipping the "Install acme.sh client" task on servers due to the condition matching16:55
fungifound a semi-explanation in the debug logs on ze09, zuul_swift_upload task failed but with no detail since no_log is set for it17:01
fungii'm open to reenqueuing it, but since the log on bridge indicates the run phase worked correctly i'm not overly concerned17:02
fungirackspace is experiencing intermittent keystone (i think?) issues, which could explain it: https://rackspace.service-now.com/system_status?id=service_status&service=af7982f0db6cf200e93ff2e9af96198d17:14
Clark[m]Ya the important logs are on bridge anyway17:39
corvusit is making a lot of little red tiles on the status page though17:40
fungii think rackspace has resolved the issue, since i'm able to login and do things through the api again (i was not able to a few minutes ago)17:42
fricklerthere might be some issue with log uploads, had two post_failures with no logs in a row. https://zuul.opendev.org/t/openstack/build/0fa916ea0eac424eb61bbeea8b2da768 and https://zuul.opendev.org/t/openstack/build/73bca34cf70144528265b1612a5b0ed2 , but won't get to check more closely today18:05
frickleroh, that might correlate with those rax issues18:06
fungiyes, that was my working hypothesis18:19
opendevreviewGhanshyam proposed openstack/project-config master: Add separate acl group for watcher-tempest-plugin  https://review.opendev.org/c/openstack/project-config/+/93435718:24
clarkbok that took much much longer than I had hoped it would19:17
opendevreviewMerged openstack/project-config master: Use 2024.2 constraints translation jobs  https://review.opendev.org/c/openstack/project-config/+/93432519:25
opendevreviewMerged openstack/diskimage-builder master: Update Nodepool image location in docs  https://review.opendev.org/c/openstack/diskimage-builder/+/93392319:51
JayFWhen using the search in gerrit of `reviewer:self`, does that mean "you reviewed any patchset" or "you reviewed latest patchset"; I think it means the former but I'm looking for a search for the latter19:52
JayFtrying to build a dashboard essentially of "things open in $repo that I don't have an up to date review on"19:53
fungiJayF: i think it means you're listed as a "reviewer" for the change, regardless of what votes you've left19:53
clarkbChanges that have been, or need to be, reviewed by 'USER'. The special case of reviewer:self will find changes where the caller has been added as a reviewer.19:55
clarkbfrom https://review.opendev.org/Documentation/user-search.html so fungi is correct19:55
clarkbJayF: you want reviewedby19:56
JayF++ thank you, that was what I was getting around to on that doc19:57
JayFyou know, I've been using gerrit for a decade+, and this is the first time I saw that it had a documentation drop down19:57
JayFI thought "I should find where those docs are for next time" then felt like an idiot lol19:57
JayFlooks like https://review.opendev.org/q/repo:openstack/diskimage-builder+status:open+-reviewedBy:self+-owner:self+is:mergeable is a fairly reasonable implementation of what I was looking for; thanks!19:59
* JayF trying to get more structured about his reviewing processes so stuff is less likely to fall through the cracks19:59
clarkbJayF: `status:open age:14d delta:<80 label:Verified=1,zuul NOT label:Code-Review=-2 NOT label:Code-Review=-1 NOT label:Workflow=-1` this is the one I've been using a bit more recently20:45
clarkbthe idea is show me everything that has been ignored for 2 weeks but is otherwise on the right track and isn't hundreds of lines20:46
clarkbbasically can I flush out easy things20:46
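A small sketch of how the two searches above could be turned into shareable dashboard links, assuming Python. The helper name `gerrit_query_url` is hypothetical; the only facts taken from the log are the review.opendev.org host, the `/q/` path, and the search operators themselves. Terms are joined with spaces and encoded so that spaces become `+`, matching the link pasted earlier.

```python
from urllib.parse import quote_plus

GERRIT_BASE = "https://review.opendev.org"  # host as it appears in the log


def gerrit_query_url(terms, base=GERRIT_BASE):
    """Build a /q/ search URL from a list of Gerrit search terms."""
    query = " ".join(terms)
    # Keep ':' and '/' readable; spaces become '+', other characters are escaped.
    return f"{base}/q/{quote_plus(query, safe=':/')}"


# "Open changes in a repo that I have not reviewed and do not own."
not_yet_reviewed = gerrit_query_url([
    "repo:openstack/diskimage-builder",
    "status:open",
    "-reviewedBy:self",
    "-owner:self",
    "is:mergeable",
])

# "Small, passing changes that have been ignored for two weeks."
stale_but_easy = gerrit_query_url([
    "status:open",
    "age:14d",
    "delta:<80",
    "label:Verified=1,zuul",
    "NOT label:Code-Review=-2",
    "NOT label:Code-Review=-1",
    "NOT label:Workflow=-1",
])
```

Keeping the terms in a list makes it easy to swap the repo or the age threshold without hand-editing an encoded URL.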
JayFthat is fairly close to what I used before20:46
JayFand this is intentionally to make sure the /not easy stuff/ gets looked at :)20:46
JayFfiltering for passing CI is especially rough in Ironic20:46
clarkbin my case I'm usually able to track a handful of the bigger things more manually and any more is too much but I'm happy to get quick things out as I'm able20:48
clarkbdefinitely no perfect solution though. It's easy to miss things20:48
JayFThat's mostly what I did with Ironic, but I'm trying to get regularly reviewing on other repos20:48
JayFmainly DIB/oslo stuff for now20:48
clarkbinfra-root thoughts on https://review.opendev.org/c/opendev/system-config/+/934075? this is one of the outcomes of gerrit 3.10 upgrade prep20:55
clarkbI guess my question is any concern with deploying that pre upgrade and also would you prefer to stick with the existing setup of using a cronjob to delete old files instead?20:58
tonybnope looks good to me I'll have another look when I'm caffeinated21:02
clarkbcool thanks. As much stuff as I can clear off the upgrade todo list the better I'll feel prepared21:04
tonybAll good21:04
fungilgtm21:04
fungiwe should have a post-upgrade step to act on that todo comment, if there isn't one in the upgrade plan already (i don't recall)21:05
clarkbI don't recall either. I'll go add one if it is missing21:06
clarkbdone21:07
fungithanks!21:07
clarkboh another note about that document. I struck through one item as done because I'm well confident we're good. On everything else I've posted my notes on what I've found but haven't struck them through as I'm still hoping I can get double checked on that stuff21:08
clarkbhttps://etherpad.opendev.org/p/gerrit-upgrade-3.10 feel free to read through the notes and do your own investigations or just read through what I've written then maybe put a note on them as you do21:08
opendevreviewMerged openstack/diskimage-builder master: docs: add two contextual warnings to the replace-partition element  https://review.opendev.org/c/openstack/diskimage-builder/+/92881923:34
*** tosky_ is now known as tosky23:40
