Thursday, 2025-02-06

opendevreviewGhanshyam proposed openstack/project-config master: Add separate ACL for openstackdocstheme  https://review.opendev.org/c/openstack/project-config/+/94084503:13
*** ralonsoh_ is now known as ralonsoh07:40
*** ykarel_ is now known as ykarel13:27
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/94088515:22
clarkbinfra-root if we still want to try and restart gerrit today to pick up the new image build by the new jeepyb change it would be nice to get https://review.opendev.org/c/opendev/system-config/+/940351 in as well to source mariadb from the mirror location15:47
fungiyeah, i'm available any time15:47
corvuslgtm/sgtm15:48
fungiapproved15:48
clarkbalso gitea immediately made a new release after the one yesterday so I'll update that change15:48
clarkbor maybe not my desktop isn't seeing the usb device with offline keys (laptop sees it)15:57
clarkbweird a different port worked. I tested the same port with a different device and it too worked16:01
opendevreviewClark Boylan proposed opendev/system-config master: Update to gitea 1.23.3  https://review.opendev.org/c/opendev/system-config/+/94082316:04
frickler#status notice nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/16:06
opendevstatusfrickler: sending notice16:06
-opendevstatus- NOTICE: nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/16:07
opendevstatusfrickler: finished sending notice16:09
corvusit appears that we tried to launch one node on vexxhost and it finished launching but failed the nodescan16:34
corvus(re niz)16:34
clarkbcorvus: did it fail because the host was not reachable?16:37
clarkbI think there is a certain amount of background noise for failures like that though usually in clouds that boot more slowly16:37
corvusyeah16:37
corvus2025-02-06 15:25:21,754 DEBUG zuul.Launcher: Nodescan request failed with 0 keys, 96 initial connection attempts, 0 key connection failures, 0 key negotiation failures in 300 seconds16:37
clarkbit is possible that is due to needing to copy the disk image for the first boot though which may delay the first boot on each hypervisor until things get cached16:37
corvusi think it's 162.253.55.62 which is still not accepting ssh connections16:37
corvusthe node is still there because there is a bug in zuul-launcher and the state machine is stuck due to the failed nodescan16:38
corvusso that's lucky :)16:38
corvus| f961eecc-0e67-4855-ac9a-e7e771feb9c9 | np7b5f6a91646c4 | ACTIVE | public=162.253.55.62, 2604:e100:1:0:f816:3eff:fee1:680e  | ubuntu-jammy-c148b7b27ddb45b8a48230e6b28167e3 | v3-standard-4 |16:38
corvusyeah that's the node16:39
clarkbI've got to pop out now to get to that doctor appointment, but checking the console log for the node might be helpful to confirm glean tried to configure the network16:39
corvushow do we console on vexxhost?16:39
corvusapi/cli/web?16:40
clarkbcorvus: I think openstack client should work16:40
corvusack thx i'll try that16:40
clarkbits something like server console log show uuid or console log show uuid16:40
corvusconsole log show f961eecc-0e67-4855-ac9a-e7e771feb9c9 just exits 0 with no output :/16:42
corvusrunning the same command with another nodepool vm does return output16:43
corvusperhaps this is a fault with the image16:43
fungiwas it uploaded raw? i think we normally do raw because they're keeping bfv images in ceph there?16:45
corvusyep, and image show says  disk_format      | raw 16:45
corvusit is much smaller than one of the current nodepool images though16:46
corvusnodepool: 29045489664 / zuul: 931562124216:46
corvus(heh, recording the image size in the artifact data at https://zuul.opendev.org/t/zuul/image/ubuntu-jammy would be helpful)16:47
fungiwas it maybe stored compressed?16:48
fungiand that's the compressed size rather than the uploaded size?16:48
corvusthe job is supposed to compress it and the launcher is supposed to decompress it... let's see16:49
corvushttps://zuul.opendev.org/t/opendev/build/e297d3d8de8e433f8ed579e6f99174ba is the build16:50
corvushere's the compression: /opt/dib_tmp/dib-images/ubuntu-jammy.raw : 32.03%   (  27.1 GiB =>   8.68 GiB, /opt/dib_tmp/dib-images/ubuntu-jammy.raw.zst) 16:51
corvusthat's the right order of magnitude for an error like the launcher didn't uncompress it16:51
corvusyep, we're missing the "decompress" log entries16:54
corvusi see the bug16:56
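A quick arithmetic check of the sizes quoted above, as a minimal sketch: the two byte counts come straight from the log, while the variable and function names are made up for illustration. It shows that the zuul-uploaded image is about a third the size of the nodepool one, which is in line with the zstd ratio reported by the build and consistent with the image having been uploaded still compressed.

```python
GIB = 1024 ** 3

# Byte counts quoted in the log; the names are illustrative only.
nodepool_bytes = 29045489664
zuul_bytes = 9315621242


def size_ratio(uploaded_bytes: int, reference_bytes: int) -> float:
    """Return the uploaded image size as a fraction of the reference size."""
    return uploaded_bytes / reference_bytes


ratio = size_ratio(zuul_bytes, nodepool_bytes)
print(f"nodepool image: {nodepool_bytes / GIB:.2f} GiB")
print(f"zuul image:     {zuul_bytes / GIB:.2f} GiB")
print(f"size ratio:     {ratio:.2%}")
```

The nodepool image and the freshly built raw image are different builds, so the ratio is only expected to land near the 32.03% reported by the compression step, not to match it exactly.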
opendevreviewMerged opendev/system-config master: Switch Gerrit to opendevmirror hosted mariadb image  https://review.opendev.org/c/opendev/system-config/+/94035117:20
fungithat ^ deployed successfully17:38
corvusfunny story: the zuul change to fix the error with empty nodeset requests has merged.  the promote job for that change is sitting in the queue waiting for an empty nodeset request to be fulfilled; it never will be.  therefore the image with the fix in it will never be published and deployed to fix the bug that's causing the image not to be published.17:50
fungihah17:51
corvusit's been a while since we've had one of these ouroboros bugs17:51
corvusthe nice thing is -- this is actually fixable with nothing but normal changes; i'm trying to think of the least disruptive17:52
opendevreviewJames E. Blair proposed openstack/project-config master: Temporarily remove zuul-providers from zuul tenant  https://review.opendev.org/c/openstack/project-config/+/94090117:55
opendevreviewJames E. Blair proposed openstack/project-config master: Revert "Temporarily remove zuul-providers from zuul tenant"  https://review.opendev.org/c/openstack/project-config/+/94090217:55
corvusi think that's the fastest ^17:55
clarkbok back18:28
clarkbcorvus: +2 from me18:29
fungime^218:31
clarkbfungi: any thoughts on the best time to proceed with a gerrit restart? the largest cache file is "only" 8.5gb so I think we can skip clearing those out this time18:31
clarkbwhich means the process should be announce, pull images, down containers, move replication queue aside or delete it, start containers, check functionality18:32
fungii'm ready whenever everyone else is18:32
fungithat process seems fine18:32
clarkbI can be ready in probably 5 ish minutes. Just need to settle back into the office and find something to drink18:33
fungialso did we ever get any resolution on the replication queue state problems across restarts?18:33
fungii'm guessing not, since we keep clearing it out still18:33
clarkbfungi: no, I pushed a change that I was told is wrong but no one has been able to help point out the correct path forward18:33
fungiugh. got it18:33
clarkbunfortunately I think it is low priority for upstream because most people using the replication plugin do so for ha setups which means they replicate absolutely everything18:34
clarkbthe bug has to do with how we're replicating less than absolutely everything and how some replication events get generated for things we don't want to replicate and then ignored18:34
clarkbsince they are ignored they never get out of the initial waiting queue18:34
clarkbI think people are also scared of making drastic changes to the plugin due to the reliance of ha on it18:34
clarkbone idea I had semi recently was to propose a new plugin that copies over the existing plugin and then can be more cavalier with changes to mitigate those concerns18:35
clarkbbut I haven't asked anyone upstream if that makes sense to them18:35
fungiah, yeah, that discussion, now i remember18:35
fungione plugin for ha replicating everything, another plugin for selective replication18:35
clarkbya18:37
clarkboh and the reason we notice on startup is on startup the plugin goes through the waiting queue and tries to catch back up again but since those events got mistakenly recorded there isn't an ability to process them and each one generates a massive traceback18:38
clarkbthis is why moving the waiting queue aside addresses the log flood. There aren't events anymore to trigger the tracebacks18:38
opendevreviewMerged openstack/project-config master: Temporarily remove zuul-providers from zuul tenant  https://review.opendev.org/c/openstack/project-config/+/94090118:38
clarkbbindep's promote jobs for the tox.ini removal are in the same boat with stuck jobs;18:39
clarkbI think we can avoid landing other bindep changes until the zuul upgrade happens and then manually dequeue those stuck ones18:39
clarkband just deal with bindep after as the lazy option there18:40
fungistatus notice The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository18:43
fungiis that an accurate summary?18:43
clarkbyes. I do note that gerrit stable-3.10 hasn't updated since the last image build so gerrit itself isn't updating18:44
clarkbI started a root screen on review02 fwiw18:44
fungijoined18:45
clarkbI'll go ahead and do a docker-compose pull now18:46
clarkbthat way if it fails we can surgery /etc/hosts again before we're too deep into the announcement18:46
clarkb464ba0f25c04 is the current gerrit image18:47
clarkbits interesting that mariadb 10.11 got no updates for a while and now has been getting regular updates every other day18:48
clarkbholidays I guess18:49
fungimind if i scroll up so i can check the new gerrit image details?18:49
clarkbgo for it18:49
fungiokay, so 22 hours old18:50
fungilgtm18:50
clarkbshall we proceed? Next step would be the announcement18:50
fungi#status notice The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository18:50
opendevstatusfungi: sending notice18:50
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository18:50
opendevstatusfungi: finished sending notice18:53
fungiguess we can proceed18:53
clarkbok I'll down the service next18:53
clarkbmv things aside then start again18:54
corvusi'm around18:54
clarkblogs say it is up18:55
clarkbweb ui responds but no diffs yet18:55
clarkbshow-queue shows it working through cache pruning on startup18:56
fungiyeah, seems to be working fine to me18:56
clarkbI have diffs again18:57
clarkband the queue looks empty now18:57
clarkbanyone have a change they need to update then we can use that to check replication?18:58
clarkb(it also checks ability to push a change) if those two things look good I think we're done18:58
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Simplify provider configuration  https://review.opendev.org/c/opendev/zuul-providers/+/94092618:59
corvusyep18:59
corvusit's a new change but i think that meets reqs :)18:59
clarkbyup I'm checking a direct fetch from gitea now18:59
clarkbseems to have worked19:00
clarkbgit clone https://opendev.org/opendev/zuul-providers ; cd zuul-providers; git fetch origin refs/changes/26/940926/1; git show FETCH_HEAD19:00
clarkbthe resulting sha there matches the one gerrit reports19:00
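The ref fetched in the command above follows Gerrit's change ref layout: `refs/changes/`, then the last two digits of the change number (zero-padded), then the full change number, then the patchset number. A minimal sketch of building that ref name follows; the helper name `change_ref` is hypothetical and not part of any Gerrit tooling.

```python
def change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref name for a given change number and patchset.

    The shard directory is the change number modulo 100, padded to two digits.
    """
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


# Patchset 1 of change 940926, as used in the fetch above.
print(change_ref(940926, 1))
```

The resulting string can be passed to `git fetch origin <ref>` against either Gerrit or a replica such as gitea, which is what makes it a convenient replication check.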
clarkbthanks!19:01
corvusthank you :)19:01
clarkbI'll let this run for a bit longer before closing out the screen but I don't see anything concerning in logs (we still have a few clients that can't negotiate key exchange...) or in my limited interaction with the service so far19:03
corvusi've been pushing a bunch of changes with no ill effect, and i have no red flags in gertty19:07
clarkbcorvus: any reason to not approve 940855 now?19:08
clarkbI don't think that interacts with the zuul promote problem now that we've excised this repo from zuul's zuul configs19:09
clarkbso should be safe?19:09
corvusclarkb: can you check that change number?19:20
corvusnot sure that's what you meant19:20
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add ipa extension to known mime types  https://review.opendev.org/c/zuul/zuul-jobs/+/83404519:20
clarkboh sorry 94088519:20
clarkbthe change to add nodesets19:20
corvusyep that and child should be fine19:21
clarkbcool approved both19:21
fungithe python packaging community now has a formal governance proposal: https://peps.python.org/pep-0772/19:22
opendevreviewMerged opendev/zuul-providers master: Add nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/94088519:28
opendevreviewMerged opendev/zuul-providers master: Simplify provider configuration  https://review.opendev.org/c/opendev/zuul-providers/+/94092619:30
clarkbgitea 1.23.3 screenshots lgtm https://f7e253438fd22bd1a6b5-fade3af1ce0fe256b4b67e8c2d1b465c.ssl.cf5.rackcdn.com/940823/2/check/system-config-run-gitea/2c3785c/bridge99.opendev.org/screenshots/19:51
clarkbI'm tempted to merge that change but also the weather is clearing out finally so I may try to pop outside in an hour or two to take advantage before the next round of cold and wet arrives19:51
opendevreviewClark Boylan proposed opendev/system-config master: Start using python3.12  https://review.opendev.org/c/opendev/system-config/+/94092819:58
clarkbalso ^ is something I've been meaning to get the ball rolling on for some time19:58
clarkbhttps://etherpad.opendev.org/p/opendev-server-replacement-sprint I've started to capture a list of what I think is more easy/ straightforward to upgrade next week to get things onto the new platform as canaries20:41
clarkbI think if we work through the easy and medium difficulty items on that list then when we get to the more difficult stuff we should've shaken out all the big issues20:42
clarkbone thing I'll note is that corvus has suggested we upgrade all of zuul together so updating the schedulers may also imply doing the mergers and executors. There is also some question about whether we can defer on nodepool, but I think nodepool will be a good canary and i'm not sure how quickly we can get all of opendev switched to zuul launcher so think we should just proceed with20:43
clarkbupdating it too20:43
clarkbI'm going to close the screen on review02 now20:58
corvuswhat's the support life for nodepool vms?20:59
corvusapril20:59
clarkbya21:00
corvusthat's certainly in my mental grey area.  i hope by then but i agree we can't count on it.21:01
clarkband I don't want zuul to feel pressured or rushed either21:02
clarkbI think it's been good that we've been working through it step by step and checking things are functional along the way21:02
clarkbsetting a hard deadline on that might be counter productive21:02
corvus++21:03
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add android mime-type  https://review.opendev.org/c/zuul/zuul-jobs/+/83404622:39
opendevreviewMerged zuul/zuul-jobs master: Add ensure-uv role  https://review.opendev.org/c/zuul/zuul-jobs/+/94027123:43
