Tuesday, 2025-11-11

Clark[m]bit of a slow start for me today. I'm working on system updates then our meeting agenda then I'm into a big block of meetings.16:12
Clark[m]corvus: for the zk upgrade the release notes indicate no special upgrade steps from 3.7 or 3.8 to 3.9. I do think the "normal" upgrade process is to upgrade each non leader node to the new version first then upgrade the leader last. I don't think our current ansible playbook is able to do things in that order (though may do so by chance). Do you think we should do that upgrade manually?16:13
corvusClark: that's what you did last time.  i think that's the safe thing to do.  i would also be willing to try just letting it run unmanaged and see what happens (over a weekend).  recovering shouldn't be a big deal if something goes wrong.  either way wfm (and i'm happy to help / run it).16:23
clarkbcorvus: ack16:26
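The ordering discussed above (upgrade every non-leader node first, then the leader last) can be sketched roughly as follows. This is an illustration only: the hostnames are hypothetical, and it assumes each node's role has already been read from the `Mode:` line that ZooKeeper's `srvr` four-letter command prints. It is not what the existing Ansible playbook does.

```python
def parse_mode(srvr_output: str) -> str:
    """Return the value of the 'Mode:' line from `srvr` output, e.g. 'leader'."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Mode line found")


def upgrade_order(modes: dict[str, str]) -> list[str]:
    """Order hosts so every non-leader is upgraded before the leader."""
    non_leaders = sorted(h for h, m in modes.items() if m != "leader")
    leaders = sorted(h for h, m in modes.items() if m == "leader")
    return non_leaders + leaders


# Hypothetical hostnames and roles, for illustration only.
modes = {"zk01": "follower", "zk02": "leader", "zk03": "follower"}
print(upgrade_order(modes))
```

Leadership can move while nodes restart, so a manual run would probably want to re-check the roles before each step rather than trust a list computed once.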
clarkbin other news it seems my usb drive or usb controller or usb port is unhappy, making loading secrets more painful this morning than it should be16:27
clarkbI'll have to test the device in another computer to see if I can narrow down the problem16:27
clarkbok I've hacked up the agenda on the wiki. I think we managed to complete a lot of stuff (dealt with gerrit restarts, etherpad upgrade, trixie mirroring, etc) so I've cleared a bunch of stuff off the agenda and made some updates to other items16:33
clarkbis there anything new that should be added? If not I'll send this out in about 10-15 minutes probably16:33
corvusclarkb: want to put zk on there?16:34
clarkbcorvus: ++16:44
clarkbalso I think I have a bad usb port, but another port on the same controller seems to work16:45
clarkbunsure if this is possibly related to the recent kernel update (wouldn't surprise me)16:45
clarkband now my usb dac doesn't want to show up. This is "fun"16:46
fungirecent kernel update?16:46
fungimaybe something changed with the driver for your hub chip16:47
clarkbya I'm wondering. It is a minor patch update not a new proper version but weirder things have been known to happen16:47
clarkbI may just try a reboot here and see if that reinitializes things more happily16:50
fungihave you tried turning it off and on again?16:52
clarkbok agenda is sent16:57
clarkbI'm going to see if a reboot helps anything16:57
clarkbmy av setup is also not working which will make my meeting later today more annoying. Here I was thinking "this is smart usb just always works" and now usb has failed me16:57
clarkbya that port works now after a reboot. But my usb dac is still not coming up so I'm beginning to suspect something with the kernel update is impacting usb behaviors. Fun17:02
fungihave the ability to boot your prior kver?17:05
clarkbI do, I'll test that if I can't get av stuff working17:09
clarkbfurther debugging: if I plug my external hub back into that port (that was working when the device was direct attached) it goes back to not working. So maybe something to do with this external hub17:14
fungion the topic of "why i hate the typical github pull request workflow" i give you https://github.com/orcwg/orcwg/pull/213/commits as a prime example17:17
fungipush a huge commit... oh not suitable? push a revert, then a tiny change17:18
funginow when that merges you get a pile of unnecessary noise in the commit history for what should have been a comparatively small diff17:18
fungiunless you squash them, rewriting the author's commits entirely17:19
clarkbthe built in audio controller continues to not function and I can't get my usb dac going. I suspect that I have a bank of working ports on controller A and a bank of less working ports on controller B but some devices seem to work with controller B17:22
clarkband dac doesn't work with any ports.17:23
clarkbbut I suspect that dac issues may be unrelated (or maybe whatever affected my controller hit the connected dac too)17:24
fungipossible some usb-connected device is throwing noise onto the bus? is dmesg reporting any usb resets?17:29
fungior could something be drawing too much current?17:30
clarkbyes on the bad port only17:30
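One way to check which port the resets cluster on is to count the reset messages per bus-port path. The sketch below assumes the kernel logs resets in the common `usb 1-4: reset high-speed USB device number 5 using xhci_hcd` form. The sample lines are illustrative, not output from the machine being discussed.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   usb 1-4: reset high-speed USB device number 5 using xhci_hcd
RESET_RE = re.compile(r"usb (\S+): reset .*USB device")


def count_usb_resets(dmesg_lines):
    """Count USB reset messages per bus-port path (e.g. '1-4')."""
    counts = Counter()
    for line in dmesg_lines:
        match = RESET_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample, not real output.
sample = [
    "[  101.2] usb 1-4: reset high-speed USB device number 5 using xhci_hcd",
    "[  103.9] usb 1-4: reset high-speed USB device number 5 using xhci_hcd",
    "[  110.0] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd",
]
print(count_usb_resets(sample).most_common())
```

The lines captured from `dmesg` can be passed in directly. A single path dominating the counts would be consistent with one bad port rather than a controller-wide problem.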
clarkbI got it working with some combo of other ports, then disconnected to tidy up cables and now it doesn't work again. So something is definitely breaking in a weird way. I may just put it down for today and figure out using my phone as an av device17:30
clarkbya ok I think if I just avoid that one bad port things eventually steady state into a happier place17:37
clarkbnot ideal but I can make this work for now17:37
clarkbcorvus: were you able to track down that arm64 node in osuosl that was not getting cleaned up after being put in a used state?17:54
corvusclarkb: yes!  https://review.opendev.org/966500 should fix the immediate issue with that node17:57
corvusthen every change after it in that stack is making sure it doesn't happen again (and ultimately prompted the zk upgrade conversation)17:57
clarkbawesome that is now on my review list17:57
corvusit probably makes sense to merge that today and do a launcher restart with just that17:58
clarkback I should be able to review after all my meetings are done ~noon today17:58
corvus++.  should be a short review.  :)17:58
mnasercorvus: i was thinking of slowly adding patches to my zuul-web stack to bump up things a few components at a time (aka redux... then react.. then patternfly.. etc) -- would that duplicate something that's on your list already since you mentioned it yesterday?18:01
mnaserSorry. Thought this was the Zuul channel.18:03
corvusmnaser: yes, i've done enough work in that direction already that i know that eventually we hit a deadlock that's going to require untangling a lot, including removing CRA.  i think it's going to be easier to just do it all at once, and yes, i currently have that penciled in for early december on my calendar.18:03
opendevreviewGoutham Pacha Ravi proposed openstack/project-config master: Set noop job for the governance-sigs repository  https://review.opendev.org/c/openstack/project-config/+/96675518:43
tonybclarkb: anytime from 1900utc works for me for the Gerrit update.    that's middle of the day(ish) for the US though so maybe a little later?19:40
clarkbtonyb: ya what about 2100 or 2200 UTC? that's 1pm or 2pm pacific so not quite end of day (which is my preference) while still minimizing impact on others19:41
clarkbtonyb: I think we can expect that gerrit shutdown will timeout trying to process the h2 databases which is a 5 minute timeout. Then startup will be "cold" with caches needing to be rebuilt19:41
tonybworks for me19:41
clarkbso it may take around 15 minutes or so from start to finish19:41
tonybokay.  Any of that time window works for me.   so pick based on your preference (lunch/bike ride/weather etc)19:44
tonybI'll be around 19:44
clarkbtonyb: I think we can get started at 2100 which should have the actual restart happening by about 213019:44
* clarkb makes notes now to not forget19:44
tonybsounds good19:46
opendevreviewMerged opendev/zuul-providers master: Use mirror for trixie image build job  https://review.opendev.org/c/opendev/zuul-providers/+/96661519:51
clarkbfungi: I ran a tail on the apache log for lists and don't see anything that stands out as particularly bad. Just the expected crawling behaviors20:05
clarkbI need to eat lunch now, but could be the issue is unrelated to normal traffic in that case?20:05
opendevreviewMerged opendev/system-config master: Fix test_yamlgroup  https://review.opendev.org/c/opendev/system-config/+/96663920:07
fungiyeah, without knowing what the database is spending so much of its time on, it's hard to say20:09
clarkbafter lunch I looked at apache logs again and I think we can identify a couple of potentially problematic sets of requests21:36
clarkbnot positive they are the cause of the load issues, but they follow patterns we've seen in the past that have caused problems21:37
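Spotting sets of requests like this usually comes down to grouping log lines by who is asking and what they are asking for. A minimal sketch, assuming the server writes Apache's combined log format (the actual format on the lists server is not stated here); the addresses, paths, and user agents are made up:

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def top_requesters(lines, limit=5):
    """Return the most frequent (user agent, first path segment) pairs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        prefix = "/" + path.lstrip("/").split("/", 1)[0]
        counts[(match.group("agent"), prefix)] += 1
    return counts.most_common(limit)


# Made-up sample lines using documentation-range addresses.
sample = [
    '192.0.2.10 - - [11/Nov/2025:21:30:00 +0000] "GET /archives/list/a/ HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.11 - - [11/Nov/2025:21:30:01 +0000] "GET /archives/search?q=x HTTP/1.1" 200 900 "-" "ExampleBot/1.0"',
    '192.0.2.12 - - [11/Nov/2025:21:30:02 +0000] "GET / HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
]
print(top_requesters(sample))
```

Grouping by user agent and path prefix is only one choice of key. Grouping by client address or by full path may fit better, depending on what the crawlers are doing.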
clarkbI'm going to take advantage of the current weather situation and go outside for a bit. Back later and can catch up on lists or anything else that may come up then21:57
corvusclarkb: the cleanup change landed, so i'll restart the launchers and verify the node gets deleted22:01
corvusthere are now 0 "used" nodes22:06
corvusRamereth[m]: that instance you asked about on friday is deleted now (and the zuul bug is fixed).  thanks for letting us know.22:07
corvusi have also cleaned up the leaked image upload records (that was a bug from a couple of weeks ago that should be fixed now)22:09
