Monday, 2025-02-03

[06:12] *** Guest7825 is now known as diablo_rojo_phone
[15:00] *** ralonsoh_ is now known as ralonsoh
[15:28] *** tao is now known as Guest7921
[15:29] <Guest7921> Hi everyone! We’re researching cross-project flakiness in OpenStack—share your insights in our 10-minute anonymous survey: https://forms.gle/dUMWRL8MNALQE6kG8 Thank you! 🚀
[15:56] <clarkb> if there are no objections in the next 15-20 minutes I'll approve https://review.opendev.org/c/opendev/system-config/+/940536 to update the haproxy image location
[16:01] <fungi> lgtm, thanks!
[17:03] <fungi> clarkb: want me to go ahead and approve 940536?
[17:03] <fungi> oh, never mind, you already did
[17:03] <fungi> i missed the workflow vote on there
[17:04] <fungi> and that was about 50 minutes ago, so should be merging soon
[17:04] <fungi> zuul says another 23 minutes
[17:12] <clarkb> yup I'm waiting patiently while I get other stuff done
[17:24] <clarkb> infra-root I'm going to put this on the meeting agenda, but I'm thinking I'd like to do a server replacement sprint probably next week. Basically do our best to redeploy as many things on jammy or noble as possible to get off of focal. I think if we focus on that we should be able to get a good number of servers done in a week
[17:25] <clarkb> a good part of what makes the process take a long time is waiting for reviews for things like dns updates and confirmation that services on the new server are working. My goal is that if we set aside time specifically to work through that then we can quickly get through those updates
[17:26] <clarkb> also I realized that I think I can put the existing grafana server in the emergency file, land the upgrade change, then launch a new grafana server that deploys from scratch and switch dns over to it if we're happy with the result
[17:26] <clarkb> that might be the safest way to do an upgrade if we're worried about landing the upgrade on the old server
[17:26] <clarkb> open to ideas on ^ if we have a preference for going the safe route or not
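To make the emergency-file approach concrete, here is a minimal sketch, assuming the emergency file behaves like a set of hostnames that deploy runs skip. The real mechanism is OpenDev's Ansible inventory; the `hosts_to_deploy` helper and the hostnames are hypothetical. It shows why landing the upgrade leaves grafana01 on the old image while a newly added grafana02 gets the fresh deployment.

```python
"""Sketch: hosts listed in an emergency file are skipped by deploy runs.

Assumption (hypothetical): the emergency file is modeled as a plain set of
hostnames; the real OpenDev tooling expresses this in its Ansible inventory.
"""
from typing import List, Set


def hosts_to_deploy(inventory: List[str], emergency: Set[str]) -> List[str]:
    """Return the hosts a deploy run would touch, preserving inventory order."""
    return [host for host in inventory if host not in emergency]


# Step 1: freeze the old server by listing it in the emergency file.
inventory = ["grafana01.opendev.org"]
emergency = {"grafana01.opendev.org"}

# Step 2: landing the upgrade change deploys nowhere, so grafana01 keeps the old image.
print(hosts_to_deploy(inventory, emergency))  # []

# Step 3: adding the replacement to the inventory deploys only the fresh server.
inventory.append("grafana02.opendev.org")
print(hosts_to_deploy(inventory, emergency))  # ['grafana02.opendev.org']
```

Keeping the old host frozen rather than upgrading it in place means there is never a config migration to worry about, and DNS can be switched back if the new server misbehaves.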
[17:27] <fungi> that sounds great
[17:28] <fungi> we've done that for a few other replacements in the past, especially during the puppet-to-ansible work
[17:31] <clarkb> in that case I'll work on a change to update launch node to check cpu count, then an update to test grafana on top of noble. When we are happy to land those I can put the existing host in the emergency file and then launch a new server
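The log doesn't show what the CPU count check in 940648 actually does, so this is only a hypothetical sketch of such a check. It compares the CPU count a booted server reports (assumed here to be the stdout of `nproc`) with the vCPU count its flavor promises, and fails the launch on a mismatch. The function name and its inputs are illustrative.

```python
def check_cpu_count(reported: str, expected_vcpus: int) -> int:
    """Validate a new server's CPU count against the flavor it was booted with.

    Hypothetical: `reported` is assumed to be the raw stdout of `nproc` run on
    the freshly booted host, and `expected_vcpus` comes from the flavor.
    Raises ValueError on a mismatch so a launch script can abort before the
    server is put into service.
    """
    actual = int(reported.strip())
    if actual != expected_vcpus:
        raise ValueError(
            f"server reports {actual} CPUs but its flavor promises {expected_vcpus}"
        )
    return actual


print(check_cpu_count("8\n", 8))  # 8
try:
    check_cpu_count("2\n", 8)
except ValueError as exc:
    print(exc)  # server reports 2 CPUs but its flavor promises 8
```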
[17:32] <opendevreview> Merged opendev/system-config master: Switch our haproxy image to quay opendevmirror location https://review.opendev.org/c/opendev/system-config/+/940536
[17:33] <corvus> hi, i'd like to get some feedback from opendev users, probably especially from the openstack project, on this change: https://review.opendev.org/938677
[17:33] <corvus> it's a status page ui change that causes the individual gate queue items to be collapsed by default
[17:34] <corvus> here is the site preview for the change: https://0eff135c994e4125b903-72f20af1f3723272b921a9ee1bd5f518.ssl.cf2.rackcdn.com/938677/5/check/zuul-build-dashboard-opendev/007b437/npm/html/
[17:34] <clarkb> JayF: TheJulia: dansmith ^ are probably good feedback sources
[17:35] <dansmith> there was already a change that causes the jobs to be collapsed by default, late last year
[17:35] <dansmith> I wanted to complain about that already
[17:35] <corvus> not jobs
[17:35] <dansmith> I'm definitely not in favor of collapsing further for sure
[17:36] <clarkb> if you use the expand all button then it doesn't look like you're affected by this change
[17:36] <dansmith> yeah, is that new? that definitely helps.. I didn't recall seeing that after the previous change
[17:36] <corvus> correct; that's kind of the thesis i'm wondering about: might it be the case that people who would normally be adversely affected by this are already using expand all?
[17:37] <clarkb> expand all came out of the feedback from the earlier change you talked about
[17:37] <clarkb> so it's newer than that change but not new
[17:37] <dansmith> on zuul.o.o I have no expand all toggle
[17:38] <dansmith> oh, show all jobs I guess
[17:38] <corvus> this shows the difference in the proposed change (just to be clear and make sure we're talking about the same thing) https://imgur.com/a/XsCWxam
[17:39] <dansmith> to me there's very little value in the fully collapsed view, so I'm not sure why that's an improvement.. I understand why collapsing the jobs (the previous change) might be preferred by some (but not me)
[17:39] <dansmith> but as long as the expand-all-the-things is there then, meh
[17:39] <dansmith> corvus: I'm complaining about any/all of the collapsing, given this is the first feedback ask, so I'm "collapsing" multiple changes in my complaints :)
[17:40] <corvus> dansmith: https://imgur.com/a/hJAZ5XB the page should look like that, and if you select both of those toggles, you should get approximately "the old status page"
[17:40] <clarkb> zuul load balancer also updated and seems to still work for me
[17:40] <corvus> dansmith: if it doesn't look like that, then shift-reload :)
[17:41] <dansmith> yeah, all the toggles makes me happy again
[17:42] <dansmith> back button navigation seems very broken, but I'm guessing that's because it's a preview site
[17:42] <corvus> dansmith: ack. and the expand-all (and some other changes) were made to address that; hopefully that helps. just fyi (since you wondered who the collapsed view helps) -- there are some much larger installations of zuul where it's difficult to get a view of overall trends without "zooming out" more, so the new status page is an attempt to help with those use cases; ideally without breaking existing ones. hopefully we're getting close. :)
[17:43] <corvus> yes, "back" navigation can be weird on the preview sites in some cases; that change shouldn't affect it, so if it's not broken on the real site, it shouldn't be broken by that change.
[17:45] <dansmith> corvus: okay I guess I would expect that's a reason to allow collapsing by non-default, but fair enough.. as long as I can expand (and it's sticky for me) then I'm happy
[17:46] <corvus> ack, thanks
[17:49] <JayF> corvus: I run with "expand all" checked all the time. I don't like the collapsing at all, but I don't experience any of the collapsed versions, as literally step 1 for any use is "expand all"
[17:49] <JayF> and afaict there is no change for the expanded version
[17:50] <JayF> I agree with dansmith that I'd have preferred no change to default collapse at all; but given that change is already there, this single additional change will not impact my workflow
[17:50] <clarkb> It's interesting to me that so many people appear to run with expand all set but no one seems aware of kolla and tacker and I'm sure others using multiples of our quota pushing just a small number of changes
[17:50] <clarkb> maybe people are aware and just indifferent
[17:50] <JayF> clarkb: expand all -> ^f [123456] -> look at jobs
[17:51] <JayF> or replace a patch number with ironic
[17:51] <fungi> aha, so using expand all but with a filtered view
[17:51] <JayF> I use browser search frequently and do not like collapsed patterns which make me unable to use browser search
[17:51] <JayF> fungi: not filtered-by-webapp; literally in browser search
[17:51] <fungi> oh, got it
[17:51] <JayF> I want to use the thing that has my keyboard shortcuts set up; not a webapp search box whose ui is different based on website :)
[17:52] <dansmith> yes, and there's a special place in hell for web app designers that hijack the browser's find-in-page
[17:52] <fungi> agreed. the worst for me is gitlab
[17:52] <clarkb> we got gerrit to stop doing it and now github does it
[17:52] <dansmith> it's right on the shore of a lava lake
[17:52] <fungi> gitlab overloads both / and ^f, so i have to click on a browser menu to do find-in-page
[17:53] <dansmith> yep, I feel sorry for those people.. gonna be toasty in retirement
[17:53] <fungi> what i really wish is that browsers themselves would give users the ability to block that
[17:53] <JayF> there are extensions which do so
[17:53] <fungi> next best thing i guess
[17:53] <JayF> but usually if a website hijacks it, it uses dynamic loading in a way that makes browser search useless
[17:53] <dansmith> yeah, similar to Don
[17:53] <dansmith> DontFuckWithPaste, which is another important one
[17:54] <JayF> the only thing worse than infinite scroll is when they eat the thing I just scrolled past while showing the new thing
[17:54] <corvus> i largely agree with everything said about searching, but i do want to point out that the search filters in zuul's status page automatically migrate to the url, so if you do set some up, they are easy to bookmark
[17:55] <corvus> eg https://zuul.openstack.org/status?project=openstack%2Fnova&project=openstack%2Fkeystone
[17:55] <fungi> that is super useful too
[17:56] <corvus> (just in case people hadn't noticed that)
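corvus's point about filters living in the URL can be illustrated with Python's standard `urllib.parse`: each project filter becomes its own repeated `project=` parameter, with `/` percent-encoded as `%2F`, which reproduces the example URL above. Only the `project` parameter name is taken from that URL; the helper names are hypothetical, and nothing here assumes anything about other filters the status page may accept.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

STATUS_URL = "https://zuul.openstack.org/status"


def status_url(projects: List[str]) -> str:
    """Build a bookmarkable status page URL with one project= param per project."""
    # urlencode quotes with quote_plus and safe="", so "/" becomes "%2F".
    query = urlencode([("project", project) for project in projects])
    return f"{STATUS_URL}?{query}"


def projects_from_url(url: str) -> List[str]:
    """Recover the project filters from a bookmarked status page URL."""
    return parse_qs(urlsplit(url).query).get("project", [])


url = status_url(["openstack/nova", "openstack/keystone"])
print(url)
# https://zuul.openstack.org/status?project=openstack%2Fnova&project=openstack%2Fkeystone
print(projects_from_url(url))
# ['openstack/nova', 'openstack/keystone']
```

Because the query string round-trips, a bookmark restores exactly the same filtered view.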
[17:56] <JayF> clarkb: I'll also note: kolla is a pretty unique skillset to manage -- I often even find myself pointing Ironic questions, when they include kolla/kolla-ansible, to the kolla team instead. I'm not sure the installers have the same benefit of a shared knowledge base to start from
[17:56] <opendevreview> Clark Boylan proposed opendev/system-config master: Add cpu count check to launch node https://review.opendev.org/c/opendev/system-config/+/940648
[17:57] <JayF> clarkb: because re-reading your comment, I realized that even if I was aware of a weirdness with kolla jobs, it'd be really outta scope of what I usually work on in openstack and I'd likely spend my time in places I can make a bigger impact
[17:57] <clarkb> JayF: right but you can see they are running 64 jobs per patchset
[17:57] <JayF> OK. So what would by action be based on that knowledge? That's what I'm getting at. That knowledge with no context is not actionable
[17:57] <clarkb> it's not about how to debug kolla but how to see that kolla is using all the quota in zuul
[17:57] <JayF> s/by/my/
[17:58] <clarkb> JayF: well ideally people stop asking me why zuul is slow (granted you aren't one of the people who have asked recently)
[17:58] <clarkb> the information is generally right there: because a different project used all of our quota all at once
[17:59] <clarkb> infra-root 940648 is totally untested. Probably the easiest way to test that is to just land it and run it?
[17:59] <fungi> we run that script manually anyway, and should be its only users, so that seems like a fine enough choice
[18:01] <corvus> ++
[18:06] <clarkb> when not expanding the zuul dashboard it's easy to miss those details but so far 2/2 people we've asked for feedback run expanded so I'm just surprised that others are missing the info. But it is entirely possible they don't expand
[18:10] <JayF> clarkb: with ci I take after Ron Popeil: I set it and forget it :D
[18:10] <JayF> I care more about the inconsistent performance node to node (also likely caused by noisy neighbors where the noisy neighbor is me) than I do about queue time
[18:10] <JayF> unless I'm trying to smash a CVE fix through the gate :D
[18:11] <opendevreview> Clark Boylan proposed opendev/system-config master: Update grafana to 10.4.14 https://review.opendev.org/c/opendev/system-config/+/940073
[18:12] <clarkb> infra-root ^ ok I went ahead and updated grafana to test on noble. If that change looks good still I can put grafana01 in the emergency file, then approve that change, then deploy a new grafana02 on noble
[18:19] <opendevreview> Merged opendev/system-config master: Add cpu count check to launch node https://review.opendev.org/c/opendev/system-config/+/940648
[18:20] <clarkb> I'll test ^ by launching grafana02 after a short break for some food
[18:23] <fungi> oh, even better
[18:23] <fungi> good that there's a candidate so we don't forget that's merged by the time we get around to running the script again
[18:31] <opendevreview> James E. Blair proposed opendev/zuul-jobs master: DNM: test niz node https://review.opendev.org/c/opendev/zuul-jobs/+/932455
[18:44] <clarkb> I think I need to wait for the hourly jobs to redeploy the launch tool to the launcher venv
[18:44] <clarkb> infra-prod-service-bridge should do that
[18:45] <clarkb> ya looking at venv contents it hasn't updated yet and looking at ansible it should when hourlies run
[18:47] <clarkb> that should allow me to check the grafana on noble ci results before launching a new node too
[19:10] <clarkb> there is a bug in the launch node update. I've fixed it directly on bridge and will push to system-config once grafana02 boot confirms it is generally working
[19:12] <clarkb> I forgot to boot with config drive so I'm waiting for it to timeout and I'll try again :/
[19:20] <clarkb> oh hrm the timeout is 600 seconds * 4 accounts so that's ~40 minutes? Maybe I should ^C and manually clean up the server
[19:20] <clarkb> oh no, it's 600 seconds total; I think it just timed out
[19:23] <opendevreview> Clark Boylan proposed opendev/system-config master: Fix launch node string quoting https://review.opendev.org/c/opendev/system-config/+/940652
[19:24] <clarkb> that's the fix; launch node appears to have gotten past that check so I think we're good
[19:27] <clarkb> grafana01 did not have an external volume mounted. It also doesn't use backups. Once this is done I'll push up a change to add it to dns then a change to update system-config and we should be set to see that it deploys properly (I already put 01 in the emergency file)
[19:33] <opendevreview> Clark Boylan proposed opendev/zone-opendev.org master: Add grafana02 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/940653
[19:47] <opendevreview> Clark Boylan proposed opendev/system-config master: Deploy grafana02 https://review.opendev.org/c/opendev/system-config/+/940654
[19:48] <clarkb> infra-root I believe 940651 fixes launch node based on my testing. Then 940073, 940653 and 940654 should be safe to land. grafana01 is already in the emergency file and bridge group vars were already set up (so 940654 reorg of host vars to group vars matches what we already do in prod)
[19:48] <clarkb> I guess we may want dns to update first before approving the system-config change just to be sure that LE will deploy properly
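Certificate issuance (LE, i.e. Let's Encrypt) for grafana02 depends on its DNS being live, which is why the zone change should land first. A minimal pre-approval check might look like the sketch below. It uses a plain address lookup as a stand-in for whatever record issuance actually depends on, and the resolver is injectable so it can run offline. `dns_ready`, `fake_resolver`, and the address are hypothetical.

```python
import socket
from typing import Callable, Dict


def dns_ready(hostname: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if `hostname` currently resolves, False on a lookup failure.

    `resolve` defaults to socket.gethostbyname (an IPv4 address lookup); it is
    injectable so the check can be exercised without network access.
    """
    try:
        resolve(hostname)
    except socket.gaierror:
        return False
    return True


def fake_resolver(records: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical offline resolver standing in for real DNS."""
    def resolve(name: str) -> str:
        if name not in records:
            raise socket.gaierror(f"{name}: name not known")
        return records[name]
    return resolve


# Before the zone change merges, the new name does not resolve yet.
before = fake_resolver({})
# After it merges, the record exists (address from the documentation range).
after = fake_resolver({"grafana02.opendev.org": "203.0.113.10"})
print(dns_ready("grafana02.opendev.org", before))  # False
print(dns_ready("grafana02.opendev.org", after))   # True
```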
[19:49] <clarkb> as always please double check things
[20:13] <clarkb> we actually do have a test case for testing launch node venv stuff that fails in 940654
[20:13] <clarkb> so fixing launch node is step 0 here
[21:03] <opendevreview> Merged opendev/zone-opendev.org master: Add grafana02 to DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/940653
[21:11] <opendevreview> Clark Boylan proposed opendev/system-config master: Test launch installation on launch edits https://review.opendev.org/c/opendev/system-config/+/940656
[21:11] <clarkb> this is an attempt at improving the testing to actually run the test case we have when we edit launch/
[21:11] <clarkb> I've reintroduced the bug in this patchset to ensure we catch the problem and will pull the bug back out again if it does
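The change's actual Zuul configuration isn't shown here. As a rough sketch of the idea behind "run the test case when we edit launch/", here is file-matcher logic in the spirit of Zuul's `files` job attribute, which takes regular expressions compared against the files a change touches. It does not claim to mirror Zuul's exact matching rules, and the paths are hypothetical.

```python
import re
from typing import Iterable, List


def job_should_run(changed_files: Iterable[str], file_patterns: List[str]) -> bool:
    """Return True if any changed file matches any of the job's file regexes."""
    compiled = [re.compile(pattern) for pattern in file_patterns]
    return any(rx.search(path) for path in changed_files for rx in compiled)


# Hypothetical matcher: run the launch installation test whenever launch/ changes.
launch_patterns = [r"^launch/"]

print(job_should_run(["launch/launch-node.py"], launch_patterns))           # True
print(job_should_run(["playbooks/service-grafana.yaml"], launch_patterns))  # False
```

Reintroducing the bug, as clarkb did, is a cheap way to confirm that the matcher really triggers the test and that the test really fails.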
[21:12] <fungi> cool!
[21:15] <clarkb> also a reminder to suggest/add/edit meeting agenda items nowish. I intend on adding a reminder that our election is starting tomorrow and bringing up the idea of the noble node replacement sprint/hackathon/focus for next week
[21:17] <opendevreview> Merged opendev/system-config master: Fix launch node string quoting https://review.opendev.org/c/opendev/system-config/+/940652
[21:17] <clarkb> I have rechecked https://review.opendev.org/c/opendev/system-config/+/940654 now that ^ is in
[21:33] <fungi> SyntaxError: unexpected character after line continuation character
[21:33] <fungi> https://zuul.opendev.org/t/openstack/build/bf4b04f8acca4c43b0ddd7a8c660fdb5
[21:33] <fungi> worked!
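The log doesn't include the diff behind 940652, so this is only a generic illustration of how a quoting mistake can produce the error fungi pasted. A backslash-escaped quote written outside any string literal makes the tokenizer treat the backslash as a line continuation and then reject the character that follows it. The `compiles` helper and the snippet are hypothetical.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is valid Python syntax, printing the error if not."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError as exc:
        print(f"SyntaxError: {exc.msg}")
        return False
    return True


# Hypothetical snippet: the quotes were escaped as if they sat inside another
# string, but they don't, so the backslash is read as a line continuation and
# the '"' right after it is rejected.
broken = 'name = \\"grafana02\\"\n'
fixed = 'name = "grafana02"\n'

print(compiles(broken))  # False, after printing the SyntaxError message
print(compiles(fixed))   # True
```

A test that merely imports or byte-compiles the edited script is enough to catch this class of bug, which is what running the launch installation test on launch/ edits buys.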
[21:35] <opendevreview> Clark Boylan proposed opendev/system-config master: Test launch installation on launch edits https://review.opendev.org/c/opendev/system-config/+/940656
[21:35] <clarkb> that should make it mergeable
[21:41] <clarkb> corvus: what updates should I apply to the niz topic in our meeting tomorrow? Is it that zuul is going to start testing jobs on the niz managed nodes?
[21:49] <clarkb> I've done a first pass set of edits on the agenda but didn't touch niz yet
[21:53] <corvus> clarkb: i think that and the new repo
[21:53] <opendevreview> Merged opendev/system-config master: Test launch installation on launch edits https://review.opendev.org/c/opendev/system-config/+/940656
[21:53] <fungi> clarkb: did you want the inventory addition merged first, or the upgrade change?
[21:54] <clarkb> fungi: the inventory change is stacked on top of the upgrade change. I think we land both together though
[21:55] <fungi> aha, yeah
[21:55] <clarkb> but ya I'd like to avoid there ever being an old version on the new server that we need to worry about migrating/upgrading configs for
[21:55] <clarkb> start fresh and move on as our CI jobs seem to indicate this works
[21:56] <fungi> you might actually want a bit of a pause between them. i've seen funk when a deploy job for change #1 starts running with the inventory addition from change #2
[21:56] <clarkb> ya that works too
[21:56] <fungi> when they both merge at nearly the same time that is
[21:56] <clarkb> grafana01 is in a holding pattern so should be safe
[21:56] <clarkb> corvus: thanks I've made those updates
[22:19] <opendevreview> Merged opendev/system-config master: Update grafana to 10.4.14 https://review.opendev.org/c/opendev/system-config/+/940073
[22:23] <clarkb> that "deployed" and I have confirmed grafana01 is still running the old image as expected
[22:24] <fungi> awesome, i'll approve the inventory addition now
[22:25] <clarkb> thanks
[23:02] <opendevreview> Merged opendev/system-config master: Deploy grafana02 https://review.opendev.org/c/opendev/system-config/+/940654
[23:20] <clarkb> we're still waiting for jobs to start running for  but I will keep an eye on it
[23:20] <clarkb> in the meantime any other edits for the meeting agenda? Otherwise I'll get that out soonish
