opendevreview | Ghanshyam proposed openstack/project-config master: Add separate ACL for openstackdocstheme https://review.opendev.org/c/openstack/project-config/+/940845 | 03:13 |
*** ralonsoh_ is now known as ralonsoh | 07:40 | |
*** ykarel_ is now known as ykarel | 13:27 | |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Add nodesets https://review.opendev.org/c/opendev/zuul-providers/+/940885 | 15:22 |
clarkb | infra-root if we still want to try and restart gerrit today to pick up the new image built by the new jeepyb change it would be nice to get https://review.opendev.org/c/opendev/system-config/+/940351 in as well to source mariadb from the mirror location | 15:47 |
fungi | yeah, i'm available any time | 15:47 |
corvus | lgtm/sgtm | 15:48 |
fungi | approved | 15:48 |
clarkb | also gitea immediately made a new release after the one yesterday so I'll update that change | 15:48 |
clarkb | or maybe not, my desktop isn't seeing the usb device with offline keys (laptop sees it) | 15:57 |
clarkb | weird, a different port worked. I tested the same port with a different device and it too worked | 16:01 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to gitea 1.23.3 https://review.opendev.org/c/opendev/system-config/+/940823 | 16:04 |
frickler | #status notice nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/ | 16:06 |
opendevstatus | frickler: sending notice | 16:06 |
-opendevstatus- NOTICE: nominations for the OpenStack PTL and TC positions are now open, for details see https://governance.openstack.org/election/ | 16:07 | |
opendevstatus | frickler: finished sending notice | 16:09 |
corvus | it appears that we tried to launch one node on vexxhost and it finished launching but failed the nodescan | 16:34 |
corvus | (re niz) | 16:34 |
clarkb | corvus: did it fail because the host was not reachable? | 16:37 |
clarkb | I think there is a certain amount of background noise for failures like that though usually in clouds that boot more slowly | 16:37 |
corvus | yeah | 16:37 |
corvus | 2025-02-06 15:25:21,754 DEBUG zuul.Launcher: Nodescan request failed with 0 keys, 96 initial connection attempts, 0 key connection failures, 0 key negotiation failures in 300 seconds | 16:37 |
clarkb | it is possible that is due to needing to copy the disk image for the first boot though which may delay the first boot on each hypervisor until things get cached | 16:37 |
corvus | i think it's 162.253.55.62 which is still not accepting ssh connections | 16:37 |
corvus | the node is still there because there is a bug in zuul-launcher and the state machine is stuck due to the failed nodescan | 16:38 |
corvus | so that's lucky :) | 16:38 |
corvus | | f961eecc-0e67-4855-ac9a-e7e771feb9c9 | np7b5f6a91646c4 | ACTIVE | public=162.253.55.62, 2604:e100:1:0:f816:3eff:fee1:680e | ubuntu-jammy-c148b7b27ddb45b8a48230e6b28167e3 | v3-standard-4 | | 16:38 |
corvus | yeah that's the node | 16:39 |
clarkb | I've got to pop out now to get to that doctor appointment, but checking the console log for the node might be helpful to confirm glean tried to configure the network | 16:39 |
corvus | how do we console on vexxhost? | 16:39 |
corvus | api/cli/web? | 16:40 |
clarkb | corvus: I think openstack client should work | 16:40 |
corvus | ack thx i'll try that | 16:40 |
clarkb | its something like server console log show uuid or console log show uuid | 16:40 |
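(A minimal sketch of the command being described, assuming standard python-openstackclient syntax and that OS_CLOUD/clouds.yaml already points at the vexxhost account; the uuid is a placeholder, not a real server:)

    # dump the serial console log for a server
    openstack console log show <server-uuid>
    # or limit output to the most recent lines
    openstack console log show --lines 50 <server-uuid>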
corvus | console log show f961eecc-0e67-4855-ac9a-e7e771feb9c9 just exits 0 with no output :/ | 16:42 |
corvus | running the same command with another nodepool vm does return output | 16:43 |
corvus | perhaps this is a fault with the image | 16:43 |
fungi | was it uploaded raw? i think we normally do raw because they're keeping bfv images in ceph there? | 16:45 |
corvus | yep, and image show says disk_format | raw | 16:45 |
corvus | it is much smaller than one of the current nodepool images though | 16:46 |
corvus | nodepool: 29045489664 / zuul: 9315621242 | 16:46 |
corvus | (heh, recording the image size in the artifact data at https://zuul.opendev.org/t/zuul/image/ubuntu-jammy would be helpful) | 16:47 |
fungi | was it maybe stored compressed? | 16:48 |
fungi | and that's the compressed size rather than the uploaded size? | 16:48 |
corvus | the job is supposed to compress it and the launcher is supposed to decompress it... let's see | 16:49 |
corvus | https://zuul.opendev.org/t/opendev/build/e297d3d8de8e433f8ed579e6f99174ba is the build | 16:50 |
corvus | here's the compression: /opt/dib_tmp/dib-images/ubuntu-jammy.raw : 32.03% ( 27.1 GiB => 8.68 GiB, /opt/dib_tmp/dib-images/ubuntu-jammy.raw.zst) | 16:51 |
corvus | that's the right order of magnitude for an error like the launcher didn't uncompress it | 16:51 |
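(For context: 9315621242 / 29045489664 ≈ 32%, which matches the 32.03% compression ratio in the build log above, consistent with the .zst having been uploaded as-is. A hedged sketch of the compress/decompress pair being discussed, with illustrative flags rather than the exact ones the job and launcher use:)

    # job side: compress the raw image before upload
    zstd -T0 /opt/dib_tmp/dib-images/ubuntu-jammy.raw
    # launcher side: decompress before handing the image to the cloud;
    # skipping this step uploads the ~8.7 GiB .zst where a ~27 GiB raw image is expected
    zstd -d ubuntu-jammy.raw.zst -o ubuntu-jammy.raw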
corvus | yep, we're missing the "decompress" log entries | 16:54 |
corvus | i see the bug | 16:56 |
opendevreview | Merged opendev/system-config master: Switch Gerrit to opendevmirror hosted mariadb image https://review.opendev.org/c/opendev/system-config/+/940351 | 17:20 |
fungi | that ^ deployed successfully | 17:38 |
corvus | funny story: the zuul change to fix the error with empty nodeset requests has merged. the promote job for that change is sitting in the queue waiting for an empty nodeset request to be fulfilled; it never will be. therefore the image with the fix in it will never be published and deployed to fix the bug that's causing the image not to be published. | 17:50 |
fungi | hah | 17:51 |
corvus | it's been a while since we've had one of these ouroboros bugs | 17:51 |
corvus | the nice thing is -- this is actually fixable with nothing but normal changes; i'm trying to think of the least disruptive | 17:52 |
opendevreview | James E. Blair proposed openstack/project-config master: Temporarily remove zuul-providers from zuul tenant https://review.opendev.org/c/openstack/project-config/+/940901 | 17:55 |
opendevreview | James E. Blair proposed openstack/project-config master: Revert "Temporarily remove zuul-providers from zuul tenant" https://review.opendev.org/c/openstack/project-config/+/940902 | 17:55 |
corvus | i think that's the fastest ^ | 17:55 |
clarkb | ok back | 18:28 |
clarkb | corvus: +2 from me | 18:29 |
fungi | me^2 | 18:31 |
clarkb | fungi: any thoughts on the best time to proceed with a gerrit restart? the largest cache file is "only" 8.5gb so I think we can skip clearing those out this time | 18:31 |
clarkb | which means the process should be announce, pull images, down containers, move replication queue aside or delete it, start containers, check functionality | 18:32 |
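(A rough shell sketch of that sequence; the compose directory, service name, and replication waiting-queue path are assumptions for illustration, not verified paths on review02:)

    cd /etc/gerrit-compose            # assumed compose directory
    docker-compose pull               # fetch the new gerrit and mariadb images
    docker-compose down               # stop the containers
    # move the replication plugin's persisted waiting queue aside (path is a guess)
    mv /home/gerrit2/review_site/data/replication/ref-updates/waiting{,.bak-$(date +%F)}
    docker-compose up -d              # start the containers again
    docker-compose logs -f gerrit     # watch startup, then check web ui, diffs, replication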
fungi | i'm ready whenever everyone else is | 18:32 |
fungi | that process seems fine | 18:32 |
clarkb | I can be ready in probably 5 ish minutes. Just need to settle back into the office and find something to drink | 18:33 |
fungi | also did we ever get any resolution on the replication queue state problems across restarts? | 18:33 |
fungi | i'm guessing not, since we keep clearing it out still | 18:33 |
clarkb | fungi: no, I pushed a change that I was told is wrong but no one has been able to help point out the correct path forward | 18:33 |
fungi | ugh. got it | 18:33 |
clarkb | unfortunately I think it is low priority for upstream because most people using the replication plugin do so for ha setups which means they replicate absolutely everything | 18:34 |
clarkb | the bug has to do with how we're replicating less than absolutely everything and how some replication events get generated for things we don't want to replicate and then ignored | 18:34 |
clarkb | since they are ignored they never get out of the initial waiting queue | 18:34 |
clarkb | I think people are also scared of making drastic changes to the plugin due to the reliance of ha on it | 18:34 |
clarkb | one idea I had semi recently was to propose a new plugin that copies over the existing plugin and can then be more cavalier with changes to mitigate those concerns | 18:35 |
clarkb | but I haven't asked anyone upstream if that makes sense to them | 18:35 |
fungi | ah, yeah, that discussion, now i remember | 18:35 |
fungi | one plugin for ha replicating everything, another plugin for selective replication | 18:35 |
clarkb | ya | 18:37 |
clarkb | oh and the reason we notice on startup is that on startup the plugin goes through the waiting queue and tries to catch back up again, but since those events got mistakenly recorded there isn't an ability to process them and each one generates a massive traceback | 18:38 |
clarkb | this is why moving the waiting queue aside addresses the log flood. There aren't events anymore to trigger the tracebacks | 18:38 |
opendevreview | Merged openstack/project-config master: Temporarily remove zuul-providers from zuul tenant https://review.opendev.org/c/openstack/project-config/+/940901 | 18:38 |
clarkb | bindep's promote jobs for the tox.ini removal are in the same boat with stuck jobs; | 18:39 |
clarkb | I think we can avoid landing other bindep changes until the zuul upgrade happens and then manually dequeue those stuck ones | 18:39 |
clarkb | and just deal with bindep after as the lazy option there | 18:40 |
fungi | status notice The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository | 18:43 |
fungi | is that an accurate summary? | 18:43 |
clarkb | yes. I do note that gerrit stable-3.10 hasn't updated since the last image build so gerrit itself isn't updating | 18:44 |
clarkb | I started a root screen on review02 fwiw | 18:44 |
fungi | joined | 18:45 |
clarkb | I'll go ahead and do a docker-compose pull now | 18:46 |
clarkb | that way if it fails we can do surgery on /etc/hosts again before we're too deep into the announcement | 18:46 |
clarkb | 464ba0f25c04 is the current gerrit image | 18:47 |
clarkb | it's interesting that mariadb 10.11 got no updates for a while and now has been getting regular updates every other day | 18:48 |
clarkb | holidays I guess | 18:49 |
fungi | mind if i scroll up so i can check the new gerrit image details? | 18:49 |
clarkb | go for it | 18:49 |
fungi | okay, so 22 hours old | 18:50 |
fungi | lgtm | 18:50 |
clarkb | shall we proceed? Next step would be the announcement | 18:50 |
fungi | #status notice The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository | 18:50 |
opendevstatus | fungi: sending notice | 18:50 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we upgrade for a new jeepyb feature and switch our database container image source repository | 18:50 | |
opendevstatus | fungi: finished sending notice | 18:53 |
fungi | guess we can proceed | 18:53 |
clarkb | ok I'll down the service next | 18:53 |
clarkb | mv things aside then start again | 18:54 |
corvus | i'm around | 18:54 |
clarkb | logs say it is up | 18:55 |
clarkb | web ui responds but no diffs yet | 18:55 |
clarkb | show-queue shows it working through cache pruning on startup | 18:56 |
fungi | yeah, seems to be working fine to me | 18:56 |
clarkb | I have diffs again | 18:57 |
clarkb | and the queue looks empty now | 18:57 |
clarkb | anyone have a change they need to update? then we can use that to check replication | 18:58 |
clarkb | (it also checks ability to push a change) if those two things look good I think we're done | 18:58 |
opendevreview | James E. Blair proposed opendev/zuul-providers master: Simplify provider configuration https://review.opendev.org/c/opendev/zuul-providers/+/940926 | 18:59 |
corvus | yep | 18:59 |
corvus | it's a new change but i think that meets reqs :) | 18:59 |
clarkb | yup I'm checking a direct fetch from gitea now | 18:59 |
clarkb | seems to have worked | 19:00 |
clarkb | git clone https://opendev.org/opendev/zuul-providers ; cd zuul-providers; git fetch origin refs/changes/26/940926/1; git show FETCH_HEAD | 19:00 |
clarkb | the resulting sha there matches the one gerrit reports | 19:00 |
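(For a direct comparison rather than eyeballing the gerrit web ui, one option is to fetch the same ref from both remotes and compare SHAs; this assumes anonymous HTTP git access on both hosts:)

    git clone https://opendev.org/opendev/zuul-providers && cd zuul-providers
    git fetch origin refs/changes/26/940926/1 && git rev-parse FETCH_HEAD
    # fetch the same ref straight from gerrit and confirm the SHAs match
    git fetch https://review.opendev.org/opendev/zuul-providers refs/changes/26/940926/1 && git rev-parse FETCH_HEAD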
clarkb | thanks! | 19:01 |
corvus | thank you :) | 19:01 |
clarkb | I'll let this run for a bit longer before closing out the screen but I don't see anything concerning in logs (we still have a few clients that can't negotiate key exchange...) or in my limited interaction with the service so far | 19:03 |
corvus | i've been pushing a bunch of changes with no ill effect, and i have no read flags in gertty | 19:07 |
clarkb | corvus: any reason to not approve 940855 now? | 19:08 |
clarkb | I don't think that interacts with the zuul promote problem now that we've excised this repo from zuul's zuul configs | 19:09 |
clarkb | so should be safe? | 19:09 |
corvus | clarkb: can you check that change number? | 19:20 |
corvus | not sure that's what you meant | 19:20 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add ipa extension to known mime types https://review.opendev.org/c/zuul/zuul-jobs/+/834045 | 19:20 |
clarkb | oh sorry 940885 | 19:20 |
clarkb | the change to add nodesets | 19:20 |
corvus | yep that and child should be fine | 19:21 |
clarkb | cool approved both | 19:21 |
fungi | the python packaging community now has a formal governance proposal: https://peps.python.org/pep-0772/ | 19:22 |
opendevreview | Merged opendev/zuul-providers master: Add nodesets https://review.opendev.org/c/opendev/zuul-providers/+/940885 | 19:28 |
opendevreview | Merged opendev/zuul-providers master: Simplify provider configuration https://review.opendev.org/c/opendev/zuul-providers/+/940926 | 19:30 |
clarkb | gitea 1.23.3 screenshots lgtm https://f7e253438fd22bd1a6b5-fade3af1ce0fe256b4b67e8c2d1b465c.ssl.cf5.rackcdn.com/940823/2/check/system-config-run-gitea/2c3785c/bridge99.opendev.org/screenshots/ | 19:51 |
clarkb | I'm tempted to merge that change but also the weather is clearing out finally so I may try to pop outside in an hour or two to take advantage before the next round of cold and wet arrives | 19:51 |
opendevreview | Clark Boylan proposed opendev/system-config master: Start using python3.12 https://review.opendev.org/c/opendev/system-config/+/940928 | 19:58 |
clarkb | also ^ is something I've been meaning to get the ball rolling on for some time | 19:58 |
clarkb | https://etherpad.opendev.org/p/opendev-server-replacement-sprint I've started to capture a list of what I think is more easy/ straightforward to upgrade next week to get things onto the new platform as canaries | 20:41 |
clarkb | I think if we work through the easy and medium difficulty items on that list then when we get to the more difficult stuff we should've shaken out all the big issues | 20:42 |
clarkb | one thing I'll note is that corvus has suggested we upgrade all of zuul together so updating the schedulers may also imply doing the mergers and executors. There is also some question about whether we can defer on nodepool, but I think nodepool will be a good canary and i'm not sure how quickly we can get all of opendev switched to zuul launcher so I think we should just proceed with | 20:43 |
clarkb | updating it too | 20:43 |
clarkb | I'm going to close the screen on review02 now | 20:58 |
corvus | what's the support life for nodepool vms? | 20:59 |
corvus | april | 20:59 |
clarkb | ya | 21:00 |
corvus | that's certainly in my mental grey area. i hope by then but i agree we can't count on it. | 21:01 |
clarkb | and I don't want zuul to feel pressured or rushed either | 21:02 |
clarkb | I think it's been good that we've been working through it step by step and checking things are functional along the way | 21:02 |
clarkb | setting a hard deadline on that might be counter productive | 21:02 |
corvus | ++ | 21:03 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add android mime-type https://review.opendev.org/c/zuul/zuul-jobs/+/834046 | 22:39 |
opendevreview | Merged zuul/zuul-jobs master: Add ensure-uv role https://review.opendev.org/c/zuul/zuul-jobs/+/940271 | 23:43 |