Thursday, 2025-02-13

clarkbI got it working via flags=(attach_disconnected) in the rsyslogd profile but I think I may have had to restart processes (both for rsyslogd and haproxy) for it to take effect. sarnold was very helpful in the ubuntu security channel00:14
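(Editorial note: the fix clarkb describes amounts to adding the attach_disconnected flag to the rsyslogd apparmor profile, reloading it, and restarting the affected processes. A minimal sketch follows; the profile path and service names are assumptions, not taken from the log.)

    # /etc/apparmor.d/usr.sbin.rsyslogd -- hypothetical excerpt
    /usr/sbin/rsyslogd flags=(attach_disconnected) {
        # ... existing rules unchanged ...
    }

    # reload the updated profile and restart the processes involved
    sudo apparmor_parser -r /etc/apparmor.d/usr.sbin.rsyslogd
    sudo systemctl restart rsyslog
    # haproxy appears to be containerized on this host (per the podman discussion below),
    # so restart it via whatever manages that container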
clarkbnot in a great spot to push a patch up for that at the moment as that meeting is happening in ~40 minutes and I need to help get kids out the door to an activity. I'll try and follow up in the morning00:21
fungidan_with: cloudnull: just saw another "blip" reaching sjc3. lost contact with a vm there at precisely 00:08:44 utc and regained access at about 00:20 utc. during the disconnect i was getting this when trying to reach the api with openstackclient: "Request to https://keystone.api.sjc3.rackspacecloud.com/v3/auth/tokens timed out"00:22
fungiso again neither server instances nor the api were reachable00:23
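(A minimal way to reproduce the symptom fungi describes, assuming a clouds.yaml entry for the flex region exists, is something like:)

    # cloud name is a placeholder; during the blip this times out against
    # https://keystone.api.sjc3.rackspacecloud.com/v3/auth/tokens
    openstack --os-cloud raxflex-sjc3 --debug token issue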
clarkbI filed https://bugs.launchpad.net/apparmor/+bug/2098148 at sarnold's request00:44
clarkbI'll leave zuul-lb02 in a working manually configured state for now and look to addressing that with a new change tomorrow morning00:48
*** dmellado075539372 is now known as dmellado0755393706:50
opendevreviewVladimir Kozhukalov proposed zuul/zuul-jobs master: [remove-registry-tag] Allow using in a loop  https://review.opendev.org/c/zuul/zuul-jobs/+/94151608:07
opendevreviewClark Boylan proposed opendev/system-config master: Install apparmor when installing podman  https://review.opendev.org/c/opendev/system-config/+/94147116:31
opendevreviewClark Boylan proposed opendev/system-config master: Fix haproxy access to rsyslogd on Noble  https://review.opendev.org/c/opendev/system-config/+/94157616:31
clarkbI split that out into two changes. The first installs apparmor with podman on noble and disables the new test on zuul-lb since it is already a noble node running podman and is broken with apparmor. Then the second change applies the fix that was suggested to me yesterday (and manually applied to zuul-lb02) as well as re-adding the test that would fail16:32
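(On a Noble node the first change boils down to making sure the apparmor package is present alongside podman, roughly the hand-run equivalent of:)

    # hypothetical equivalent of what the role change does on a Noble node
    sudo apt-get install -y podman apparmor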
clarkb941471 is still bouncing off rate limits but 941576 looks like it will pass18:01
clarkbhttps://c3380332c392aeb2c300-d82475c388238604ca15d044b3307e59.ssl.cf2.rackcdn.com/941576/1/check/system-config-run-zuul/e4307bb/bridge99.opendev.org/ara-report/results/767.html I think this and then the subsequent apparmor_parser -r command task as well as successful testinfra tests are a good indication this is happy now18:05
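(A hedged way to spot-check the same thing by hand on a node, separate from what the job actually runs, would be something like:)

    # confirm the updated rsyslogd profile is loaded
    sudo aa-status | grep rsyslogd
    # and that apparmor is no longer denying the haproxy -> rsyslogd socket access
    sudo journalctl -k --grep 'apparmor="DENIED"' --since "1 hour ago"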
clarkbreviews welcome as well as thoughts on how we want to land 941471 (we could temporarily trim the job list, or force merge, or just recheck into oblivion)18:06
fungii need to go run a couple of errands briefly before weather gets here, but will take a look in an hour-ish18:39
TheJuliaQuestion regarding cloud providers, is rax all the newer rax infra?18:39
fungiTheJulia: no, rax-dfw, rax-iad and rax-ord are all the old rackspace classic; raxflex-sjc3 is the new rackspace flex19:30
fungiand we just got word this week there's a new flex region so we'll hopefully add raxflex-dfw3 in short order19:30
fungibut right now we have about 10x as much quota in classic as flex, so most jobs that run in rackspace are on the older hardware still19:31
clarkbI think it would be great to expand our quota in sjc3 if rax is up to it but before we do that we should redeploy everything to get the new network mtus19:32
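(The mtu concern can be sanity-checked from a rebuilt instance with something like the following; the interface name and target are placeholders, and 1472 assumes a standard 1500-byte MTU minus 28 bytes of IPv4/ICMP headers.)

    ip link show ens3                       # confirm the MTU the instance picked up from the network
    ping -M do -s 1472 -c 3 <remote-host>   # largest payload that fits a 1500 MTU without fragmenting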
clarkbinfra-root https://review.opendev.org/c/opendev/system-config/+/941304/ is a change that should be an easy review that isn't directly impacted by docker rate limits19:34
clarkband now that I've got a handle on haproxy and noble and podman and apparmor https://review.opendev.org/c/opendev/system-config/+/941130 would be a good one to continue on with to deploy the new codesearch if we're comfortable with where things ended up. I suppose we could wait for the apparmor installation in testing to ensure we don't miss any bugs there19:35
clarkbthinking about 941471 more I think the risk it presents is very low as it only affects testing and noble nodes (of which we don't have many and they have the new packages preinstalled in prod)19:42
clarkbI'm willing to give that maybe one more recheck but if it consistently fails I think we should consider force merging given the low risk but also relatively high benefit to the noble rollout19:43
fungiwe have seen all the jobs pass for 941471, just not all at the same time19:51
fungiand the failures have been docker rate limit issues, not failures in the changes, and getting all 30+ to avoid those rate limits in one go is not realistic, and this change is in service of trying to incrementally address the problem causing the failures in the first place... given all that, i'm fine bypassing the gate to merge it19:53
clarkbyup, though I did just push a new patchset today; the only difference is it removed a test case that a different change pulled in19:53
clarkbthat train of thought is basically where I ended up. it's safe, it should help eventually make the rate limits better, so maybe we just go for it19:54
clarkbthen the followup change to fix haproxy on noble can go through normal review and ci19:54
clarkbas that one is a bit more meaningful19:54
fungii concur. if another infra-root is on hand to +2 941471 i'm willing to do the commands to bypass gating19:56
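(The "commands to bypass gating" would, on a Gerrit-based setup like this one, look roughly like the sketch below; the host, label, and required admin permissions are assumptions rather than a record of what was actually run.)

    # vote and submit directly over the Gerrit ssh API, skipping the gate pipeline
    ssh -p 29418 review.opendev.org gerrit review --label Verified=+2 --submit 941471,<patchset>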
opendevreviewMerged opendev/system-config master: Mirror docker tool images  https://review.opendev.org/c/opendev/system-config/+/94130419:56
Clark[m]fungi: thanks. I'm in search of lunch now and hesitate to +2 my own change for that purpose20:00
fungisure, i don't think it's dire though if we have to wait a bit for someone else to confirm consensus20:01
Clark[m]++20:03
fungidan_with: cloudnull: yet another "blip" reaching sjc3. lost contact at precisely 20:37:27 utc and regained access at 20:48:51 utc. during the disconnect, my traceroutes to api.sjc3.rackspacecloud.com were dying at my service provider's border, but once it was restored i started seeing responses from datapipe and your network... could it be routes dropping out of bgp?20:51
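(For the kind of path tracing fungi describes, something along these lines works; the flags shown are Linux mtr/traceroute and are illustrative only.)

    mtr --report --report-cycles 10 api.sjc3.rackspacecloud.com    # per-hop loss over ~10 probes
    sudo traceroute -T -p 443 api.sjc3.rackspacecloud.com          # TCP traceroute to the https port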
clarkbI guess corvus and/or tonyb would be the folks available to weigh in on force merging 941471 as a second +2 at this time of day (probably a bit early for tonyb still)20:53
corvusagree20:56
clarkbthanks! fungi you mentioned volunteering to do the honors, I think we're ready whenever you are20:59
fungiyep, on it now21:01
opendevreviewMerged opendev/system-config master: Install apparmor when installing podman  https://review.opendev.org/c/opendev/system-config/+/94147121:06
fungiopendev-promote-docs failed for ^ but i think it's because there was no gate build artifact for it to pull21:09
fungii also removed and readded my approval on the child change since it seemed like the way i merged it didn't signal to zuul that it could act21:10
fungibut it's enqueued normally now21:10
clarkbthe child landing should hopefully address the docs problem too21:14
clarkbI don't think there are important doc updates anyway21:14
fungiyep21:34
JayFclarkb: it's here https://usercontent.irccloud-cdn.com/file/A3q4CZFE/irccloudcapture4588956697937453146.jpg21:39
clarkbJayF: it seems to have stopped here and moved to your neighborhood21:41
clarkbwe ended up with about 1-2" so not much but enough to be fun/dangerous. Supposedly more tonight and tomorrow morning21:41
clarkbthe gitea-lb job completed and seems to have nooped as expected21:44
clarkboh though this first change would only affect noble nodes since the only prod change was adding apparmor to the package list21:46
clarkblooking at grafana02 I think we've nooped there as well21:46
clarkbya the install docker compose and friends task reports no change on grafana0221:48
opendevreviewClark Boylan proposed opendev/zone-opendev.org master: Reapply "Switch zuul.o.o to zuul-lb02"  https://review.opendev.org/c/opendev/zone-opendev.org/+/94160522:15
clarkbthe change to apply the rsyslogd fixups should merge soon22:15
clarkbif that deploys happily then I think we're ready to try 941605 again22:16
opendevreviewMerged opendev/system-config master: Fix haproxy access to rsyslogd on Noble  https://review.opendev.org/c/opendev/system-config/+/94157622:41
*** dmellado075539373 is now known as dmellado0755393722:48
clarkbthat deployment appears to have been a noop for zuul-lb01 and zuul-lb02. In lb02's case that is because I manually made those changes yesterday and lb01 properly skipped as the platform is too old22:50
clarkbhttps://zuul-lb02.opendev.org seems to be working for me. Any objection to approving 941605?22:51
fungii approved it just now22:52
clarkbthanks!22:52
fungii should at least be around long enough to test once it deploys and dns changes propagate22:53
clarkbI'm around. I'll poke out at some point for snow round two but I'm not in a huge hurry22:55
opendevreviewMerged opendev/zone-opendev.org master: Reapply "Switch zuul.o.o to zuul-lb02"  https://review.opendev.org/c/opendev/zone-opendev.org/+/94160522:59
clarkbzuul.opendev.org resolves to zuul-lb02 for me now23:03
clarkbperforming a hard refresh against my open browser tab for zuul status gets me content23:03
clarkbI'm starting to see connections trickle into the /var/log/haproxy.log file on zuul-lb02 too23:03
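(The DNS and service checks clarkb and fungi are doing by hand can be reproduced with, for example:)

    dig +short zuul.opendev.org A
    dig +short zuul.opendev.org AAAA
    # both should return zuul-lb02's addresses once the zone change has propagated
    curl -sI https://zuul.opendev.org/ | head -n 1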
fungiyeah, resolving to zuul-lb02 for me here too, and i'm browsing it fine23:05
fungii didn't have an open tab23:05
clarkbtomorrow I can unwind zuul-lb01 and work on its cleanup and also start with codesearch02 again https://review.opendev.org/c/opendev/system-config/+/941130 is the change to get the initial deployment rolling (dns for it is already in place so should be safe to land that whenever)23:06
opendevreviewMerged opendev/system-config master: Deploy codesearch02  https://review.opendev.org/c/opendev/system-config/+/94113023:55
fungiit's already deploying, good timing!23:58
