Thursday, 2022-02-10

opendevreview	Ian Wienand proposed zuul/zuul-jobs master: [DNM] make sure centos-8 nodes fail with 828440 https://review.opendev.org/c/zuul/zuul-jobs/+/828630	00:01
opendevreview	Ian Wienand proposed zuul/zuul-jobs master: [DNM] make sure centos-8 nodes fail with 828440 https://review.opendev.org/c/zuul/zuul-jobs/+/828630	00:02
corvus	2 executors stopped. ah ah ah.	00:16
fungi	narration by the count never gets old	00:18
corvus	one left	00:44
corvus	hrm, i can't tell what it's waiting on	00:52
corvus	i don't see any build related subprocesses. i do see a bunch of stale looking 'git cat-file' jobs	00:52
corvus	i may send it a sigusr2	00:53
corvus	ah, it's a paused build	00:54
corvus	tripleo-ci-centos-8-content-provider head of gate	00:56
clarkb	ya that has confused me before but Zuul does the correct thing	01:07
corvus	it's resumed; apparently the ooo quickstart collect logs is not fast	01:10
corvus	done; on to batch 2 now	01:14
corvus	the first batch looks like it's running jobs okay. i'm going to afk now	01:15
ianw	hrm, https://review.opendev.org/c/zuul/zuul-jobs/+/828630 reported NODE_FAILURE when i switched the node types to centos-8 anyway. it feels like that should have run on centos-8-stream nodes. i wonder what i'm missing...	01:27
clarkb	ianw: label: centos-8 is not stream	01:34
clarkb	and that change seems to set label to centos-8	01:34
clarkb	maybe we can just land that invalid config and it will report NODE_FAILURE? I thought zuul would validate more than that but seems not to	01:35
ianw	clarkb: yeah, but i thought that centos-8 label now actually selected centos-8-stream nodes	01:36
opendevreview	Steve Baker proposed openstack/diskimage-builder master: Replace kpartx with qemu-nbd in extract-image https://review.opendev.org/c/openstack/diskimage-builder/+/828617	02:25
opendevreview	Ian Wienand proposed zuul/zuul-jobs master: [DNM] make sure centos-8 nodes fail with 828440 https://review.opendev.org/c/zuul/zuul-jobs/+/828630	02:28
opendevreview	Ian Wienand proposed openstack/diskimage-builder master: Futher bootloader cleanups https://review.opendev.org/c/openstack/diskimage-builder/+/790878	03:09
*** ysandeep\|out is now known as ysandeep		03:15
Clark[m]	ianw: not the label. The centos-8 nodeset	03:23
ianw	Clark[m]: yeah, i had it wrong, it was not using the nodeset defined in base-jobs	03:24
opendevreview	Ian Wienand proposed zuul/zuul-jobs master: [DNM] make sure centos-8 nodes fail with 828440 https://review.opendev.org/c/zuul/zuul-jobs/+/828630	03:54
opendevreview	Ian Wienand proposed openstack/diskimage-builder master: Futher bootloader cleanups https://review.opendev.org/c/openstack/diskimage-builder/+/790878	04:09
*** ysandeep is now known as ysandeep\|afk		05:11
*** ysandeep\|afk is now known as ysandeep		05:57
*** amoralej\|off is now known as amoralej		07:21
sshnaidm	clarkb, ianw, corvus if you have merge rights, please merge: https://review.opendev.org/c/openstack/project-config/+/828371	08:17
*** ysandeep is now known as ysandeep\|lunch		08:31
*** jpena\|off is now known as jpena		08:38
*** ysandeep\|lunch is now known as ysandeep		10:06
*** bhagyashris__ is now known as bhagyashris		11:26
*** dviroel\|out is now known as dviroel\|ruck		11:30
*** rlandy\|out is now known as rlandy\|ruck		11:38
*** ykarel is now known as ykarel\|away		12:23
frickler	kevinz_: Certificate for us.linaro.cloud is at 21 days, could you have a look please?	12:42
kevinz_	frickler: Sure, I will re-gen it.	12:43
*** amoralej is now known as amoralej\|lunch		13:09
*** dviroel is now known as dviroel\|ruck		13:13
*** artom__ is now known as artom		13:14
*** ysandeep is now known as ysandeep\|afk		13:52
*** pojadhav is now known as pojadhav\|afk		13:53
*** amoralej\|lunch is now known as amoralej		13:58
*** ysandeep\|afk is now known as ysandeep		14:55
dtantsur	hi folks! I remember I was talking to some of you about having a real partition image for cirros. Has there been any movement around it?	15:08
dtantsur	I'm asking because we're about to start building our own centos images with DIB in the CI, but I'd rather not to	15:08
fungi	dtantsur: frickler has been working on a cirros fork at https://opendev.org/cirros/cirros so maybe he has some ideas	15:15
dtantsur	oh, I also wanted to ask about the reason of creating a fork. is upstream development stagnating?	15:16
fungi	he can speak better to the reasons, but my understanding is that he wanted to set up some zuul jobs, possibly do integration testing with devstack, and discuss with smoser about relocating development to here and/or adopting the project	15:17
dtantsur	k understood	15:17
fungi	there have been ml threads in the past about cirros going stale upstream for long periods and the possibility of the openstack community picking up maintenance of it, but i don't know if those prior discussions had any bearing on the present situation	15:18
frickler	so currently this isn't a fork, but an attempt to get a working CI again for the original project	15:19
frickler	regarding the "real partition image", I did some testing, and the main issue seems to be getting grub installed into the image, which requires changing library options and in the end makes the result twice as large as the original	15:21
frickler	so I don't think that this is feasible as a default solution, but only optionally as a different flavor of cirros possibly	15:21
frickler	I also don't have too much time for this myself, so expect some results in a couple of months, not within days or weeks	15:24
dtantsur	frickler: is it something I could pick up or is there too much context to transfer?	15:24
frickler	dtantsur: well help is always welcome, if you want to look into setting up the build to generate what you need, best join us in #cirros (currently still on libera), so smoser and myself can work together answering your questions	15:28
frickler	you could also look at https://review.opendev.org/c/cirros/cirros/+/827916 and help me find out how to collect and store build results in a useful way without exploding our log storage	15:30
*** ysandeep is now known as ysandeep\|out		16:03
*** priteau_ is now known as priteau		16:11
dtantsur	frickler: you mean, store the actual generated images? I wish I knew, we could use that in Ironic...	16:24
*** tkajinam is now known as Guest210		16:30
corvus	the zuul executor restarts from yesterday are done; i'm going to restart zuul01 now	16:31
corvus	2022-02-10 16:33:05,483 INFO zuul.ComponentRegistry: System minimum data model version 1; this component 3	16:34
corvus	2022-02-10 16:33:05,484 INFO zuul.ComponentRegistry: The data model version of this component is newer than the rest of the system; this component will operate in compatability mode until the system is upgraded	16:34
corvus	that's as expected. as soon as i shut down the zuul02 components, that should bump up.	16:34
clarkb	and then 02 will start on the new version when it sees everyone else is at the new rev too	16:34
corvus	yep	16:34
corvus	i was just thinking, a zuul CD job to upgrade zuul would be a little tricky... gracefully restarting an executor can take longer than the max job runtime due to paused jobs...	16:38
clarkb	corvus: crazy talk but I wonder if zuul could fork itself on the new code and just keep running with the old state	16:39
clarkb	basically replace itself in place and not need a synchronization at all	16:40
corvus	clarkb: kinda awesome.. but tricky with our container deployment model... :/	16:41
corvus	zuul01 is up, restarting 02 now	16:44
clarkb	In theory it would work pretty well to do that if you got the mechanics down since we're already storing the bulk of the state in zk	16:44
clarkb	the danger would be if needing to migrate internal datastructures but they could be forced to refetch from zk maybe	16:45
corvus	2022-02-10 16:45:19,445 INFO zuul.ComponentRegistry: System minimum data model version 3; this component 3	16:45
corvus	2022-02-10 16:45:19,445 INFO zuul.ComponentRegistry: The rest of the system has been upgraded to the data model version of this component	16:45
clarkb	nice	16:46
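
The two pairs of ComponentRegistry log lines above suggest a simple rule: a component runs at the lowest data model version present anywhere in the system, and leaves compatibility mode once every component has caught up. A minimal sketch of that rule, assuming hypothetical function names and a plain dict of component versions (this is not Zuul's actual implementation):

```python
# Hypothetical sketch of the "system minimum data model version" rule
# implied by the log lines; names and structure are assumptions.

def system_minimum(versions):
    """Return the lowest data model version among all running components."""
    return min(versions.values())


def in_compatibility_mode(component, versions):
    """A component is in compatibility mode while it is newer than the system minimum."""
    return versions[component] > system_minimum(versions)


# zuul01 restarted on the new code while zuul02 is still on the old code.
during_upgrade = {"zuul01": 3, "zuul02": 1}
# After zuul02 is restarted as well.
after_upgrade = {"zuul01": 3, "zuul02": 3}

print(system_minimum(during_upgrade), in_compatibility_mode("zuul01", during_upgrade))
print(system_minimum(after_upgrade), in_compatibility_mode("zuul01", after_upgrade))
```
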
clarkb	corvus: if you get a chance this morning can you look at https://review.opendev.org/c/opendev/system-config/+/828203/ since you pointed out the slurp module which I used there	16:47
*** priteau is now known as priteau_		16:48
*** priteau_ is now known as priteau		16:48
corvus	clarkb: lgtm	16:49
clarkb	thanks I went ahead and approved it (and responded to your question)	16:49
corvus	the big zuul changes necessitating the model upgrade are related to semaphores and changes in gate superseding check; so please keep an eye out for any unexpected behavior there	16:50
clarkb	I'm going to followup on that gerrit bug I filed about the cloning weirdness once my brain has fully booted.	16:52
clarkb	I suspect that we can go ahead and close the bug out	16:52
fungi	clarkb: migrating file descriptors and socket handles gets tricky when you're replacing processes live, but it's doable	16:52
fungi	closing and reopening everything is probably simpler	16:53
fungi	forks do mostly inherit them though	16:53
gibi	is it just me or the zuul web ui is down? https://zuul.opendev.org/	16:53
fungi	gibi: it's being restarted	16:53
gibi	ahh, OK	16:53
corvus	and it's up	16:54
fungi	gibi: zuul itself is able to do hitless rolling restarts now, but we only have one zuul-web service at the moment so it goes offline for a bit	16:54
corvus	#status log rolling restarted all of zuul on ad1351c225c8516a0281d5b7da173a75a60bf10d	16:54
clarkb	there are some TODOs to get a load balancer in front of multiple webs	16:54
opendevstatus	corvus: finished logging	16:54
gibi	fungi: nice improvement	16:54
corvus	what was the decision about LB -- make a new one or reuse the gitea one?	16:56
clarkb	corvus: I think my slight preference is to make a new one since small nodes seem to work well for haproxy and this way we can continue to operate zuul and gitea independently	16:57
fungi	is the gitea one in the same region as the zuul servers anyway?	17:01
clarkb	fungi: it is not	17:01
fungi	better to have the lb as topologically close to the servers as possible if it's doing socket forwarding	17:01
fungi	from a performance and stability standpoint	17:01
corvus	can haproxy handle websockets?	17:04
fungi	the short answer is "yes" because haproxy has a variety of different load balancing solutions	17:06
fungi	and client persistence algorithms	17:07
fungi	the longer answer depends on how exactly you want websockets "handled"	17:07
corvus	oh and we use tcp right?	17:07
fungi	for gitea we do, yes	17:07
clarkb	corvus: we have historically used tcp	17:07
clarkb	I think the reason for that is it simplified tls	17:08
corvus	so that should work fine for this, modulo maybe needing to set max connections or something	17:08
clarkb	basically instead of needing certs for every point in between you just do it in the service and pass straight through	17:08
*** rlandy\|ruck is now known as rlandy\|ruck\|mtg		17:09
fungi	yeah, layer-4 proxying with client ip persistence can work for just about anything modulo cgn clients	17:09
fungi	if you want to do things like layer-7 forwarding with ssl/tls termination on the load balancer, or more granular client persistence to specific backends based on session ids or injecting cookies, that's where it starts to depend a lot on the application itself	17:10
fungi	is it important that multiple requests from the same client are persisted to the same backend in this case? like for authenticated sessions?	17:11
corvus	nope :)	17:12
corvus	there is no server-side session state	17:12
fungi	in that case we can probably ignore client persistence entirely	17:13
fungi	which should get us a much more even load distribution	17:13
fungi	it's a bigger problem for gitea, where a git operation can involve multiple requests over different connections, and there's no guarantee that the state of the repositories between backends is completely consistent (repacks, replication races, et cetera)	17:15
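
The point about client persistence and CGN clients can be illustrated with a toy model: when the backend is chosen from the client IP, many clients behind one shared address all land on the same backend, while plain round-robin spreads connections evenly. A minimal sketch, assuming hypothetical backend names and a SHA-256 based hash (haproxy's own hashing differs):

```python
import hashlib
from collections import Counter
from itertools import cycle

# Hypothetical backend names, for illustration only.
BACKENDS = ["zuul-web-a", "zuul-web-b"]


def pick_by_source(client_ip, backends):
    """Client IP persistence: the same address always maps to the same backend."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def distribute_by_source(client_ips, backends):
    """Count connections per backend when persisting on client IP."""
    return Counter(pick_by_source(ip, backends) for ip in client_ips)


def distribute_round_robin(client_ips, backends):
    """Count connections per backend when ignoring persistence entirely."""
    rotation = cycle(backends)
    return Counter(next(rotation) for _ in client_ips)


# Six connections that all appear to come from one carrier-grade NAT address.
cgn_clients = ["203.0.113.7"] * 6
print(distribute_by_source(cgn_clients, BACKENDS))
print(distribute_round_robin(cgn_clients, BACKENDS))
```

Since zuul-web keeps no server-side session state, the round-robin behaviour is the one that matters here.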
*** dviroel\|ruck is now known as dviroel\|ruck\|afk		17:16
clarkb	ok I updated https://bugs.chromium.org/p/gerrit/issues/detail?id=15649 with what we learned	17:22
corvus	we.. have a 5 node job limit?	17:23
clarkb	ya I seem to recall someone went a bit overboard and we had to set that. But maybe I'm misremembering	17:24
corvus	i would suggest that we increase that for the opendev tenant, but that wouldn't help us.	17:24
corvus	since the opendev tenant isn't where we run the opendev service jobs	17:24
clarkb	corvus: what we can do is use groups rather than hosts and have some things colocated. I've thought about doing that in the past but it seemed non urgent	17:25
fungi	i have no problem with raising it if we have jobs that need that many, it was simply a useful starting point	17:25
clarkb	and ya I think we could bump it	17:25
clarkb	I guess the way we do zuul services doesn't really allow for colocating though (since everything is bind mounted from the same path regardless of service)	17:29
opendevreview	James E. Blair proposed opendev/system-config master: Add Zuul load balancer https://review.opendev.org/c/opendev/system-config/+/828773	17:31
corvus	presumably zuul will refuse to run that until we figure out how to run 6 nodes	17:32
*** jpena is now known as jpena\|off		17:32
sshnaidm	clarkb, hi, can you please merge the perms patch in your time https://review.opendev.org/c/openstack/project-config/+/828371	17:34
clarkb	fungi: sshnaidm: we clarified that deleting branches is lossy right?	17:35
clarkb	sshnaidm: if you delete a branch and don't have another permanent ref pointing to that commit our regular garbage collection will delete data	17:35
sshnaidm	clarkb, yeah, I'm aware of that	17:36
sshnaidm	it was created by mistake, so I just don't want it to be there to confuse users with right branches..	17:37
clarkb	fungi: I think your ethercalc copy may be trying to back up and failing based on emails we are getting. Can you double check that and maybe comment out the backup crons on your copy ?	17:40
fungi	clarkb: i've done one better and deleted the server	17:42
fungi	just cleaning up the snapshot i built it on now	17:42
clarkb	thanks	17:42
clarkb	corvus: I think we can either bump the limit or combine the load balancer and zk or merger or similar.	17:48
opendevreview	Merged openstack/project-config master: Give perm to release team to delete branches https://review.opendev.org/c/openstack/project-config/+/828371	17:52
opendevreview	Merged opendev/system-config master: Test pushes into gitea over ssh https://review.opendev.org/c/opendev/system-config/+/828203	17:52
clarkb	I was hoping ^ that stack would cause semaphores to be exercised but the second change is test only so we don't run the prod deploy after it	17:55
clarkb	oh well	17:55
corvus	clarkb, fungi: oh apparently max is 10 nodes	17:55
fungi	oh neat	17:58
clarkb	I'm trying to write a change to fix ls-members --recursive now	18:10
*** amoralej is now known as amoralej\|off		18:21
clarkb	https://gerrit-review.googlesource.com/c/gerrit/+/330179 that may do it	18:36
fungi	so it was working for the rest api, but they pulled the rug out from under the cli?	18:37
clarkb	fungi: yup I think at some point they combined the rest api and ssh commands but then in cases like this missed that the recursive flag was private and needed to be explicitly handled	18:38
clarkb	Next I need to work on a depends on to check this output	18:38
clarkb	it is easier for me to do that than figure out their test framework :/	18:38
fungi	their release notes handling is kinda neat, i didn't realize they embed that in commit message footers	18:39
clarkb	fungi: that is brand new as of last night	18:39
fungi	i wonder if they've considered how to go about correcting/updating a release note after the commit merges ;)	18:39
* corvus uploaded an image: (36KiB) < https://matrix.org/_matrix/media/r0/download/acmegating.com/vwNHycEJaXWOcgjpDcZbNpRM/image.png >		18:43
corvus	my guess is they write a new note.	18:43
fungi	hah	18:44
fungi	a moose once bit my sister	18:44
fungi	today i learned about mysql's string replace function. wish i had known about it during the login.launchpad.net to login.ubuntu.com migration	18:53
opendevreview	Clark Boylan proposed opendev/system-config master: DNM testing upstream fix for gerrit ls-members https://review.opendev.org/c/opendev/system-config/+/828786	18:55
clarkb	and ^ should hopefully test this for us	18:55
*** rlandy\|ruck\|mtg is now known as rlandy\|ruck		18:57
clarkb	on the surface it is a silly little bug but wow did it create a lot of confusion for us debugging the performance issues	18:58
clarkb	the other really neat thing about that depends on setup is if we push the fix to stable-3.4 and not also to stable-3.5 then we'll get a test that checks for failure and success in the same buildset :)	19:00
clarkb	granted on different releases of gerrit but for this situation that should be fine	19:01
*** dviroel\|ruck\|afk is now known as dviroel\|ruck		19:19
opendevreview	Ade Lee proposed zuul/zuul-jobs master: WIP/DNM - Test new version of mariadb https://review.opendev.org/c/zuul/zuul-jobs/+/827366	19:22
opendevreview	Clark Boylan proposed opendev/system-config master: DNM testing upstream fix for gerrit ls-members https://review.opendev.org/c/opendev/system-config/+/828786	19:46
clarkb	apparently admin is already a member of groups it creates? I guess implicitly as owner maybe? To be extra sure that we're getting recursive listings I went ahead and updated that to make another user that is distinct	19:47
fungi	makes sense, as gerrit sets its groups to be self-owned by default	20:52
corvus	clarkb, fungi: https://review.opendev.org/828773 is, um, possibly a hole-in-one?	20:55
clarkb	corvus: nice I'll review it shortly. Just sitting back down after some lunch	20:55
opendevreview	Ian Wienand proposed openstack/project-config master: Remove Fedora 34 https://review.opendev.org/c/openstack/project-config/+/816933	20:56
fungi	corvus: and on a par 4 hole at least	20:57
clarkb	woot my gerrit test seems to work. I'll update it now to ensure that we aren't always recursive etc but I think the change I wrote is working	20:58
clarkb	it is really cool that we can do this sort of thing with zuul	20:58
opendevreview	Merged openstack/diskimage-builder master: Futher bootloader cleanups https://review.opendev.org/c/openstack/diskimage-builder/+/790878	21:02
opendevreview	Clark Boylan proposed opendev/system-config master: DNM testing upstream fix for gerrit ls-members https://review.opendev.org/c/opendev/system-config/+/828786	21:03
ianw	corvus: couple of minor comments, pleasing how easily the roles and testing and just generally everything make it look easy	21:08
clarkb	I left a few too	21:10
ianw	fungi / clarkb: the centos-8 failure patch I believe is fully tested now -> https://review.opendev.org/c/opendev/base-jobs/+/828437	21:13
ianw	if we're ok we can go with that, and i can send another follow-up email about the current status	21:14
fungi	ianw: sounds good to me, thanks. i also brought it up in the tc meeting today so at least the openstack leadership is aware of the current situation and impending behavior change	21:15
clarkb	approved I guess keep an eye out for unexpected fallout, but thank you for testing it so that we avoid all that I hope :)	21:17
ianw	i think we're covered for migration now. if you're explicitly using centos-8 then you'll get NODE_FAILURE. if you're using the base-jobs definition then you'll get RETRY_FAILURE with a clear failure message not to use it	21:21
fungi	it's too bad there's no clear signal to explicitly trigger a failure result directly from pre-run, but at least it'll triple-fail quickly	21:22
clarkb	looks like OSA has managed to clean things up for the most part. https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/827483 should be the last one and I rechecked it	21:23
clarkb	jrosser: fyi I think that didn't auto enter the gate with its parent because they must not share a gate queue	21:24
clarkb	ya'll may want to ensure all the osa repos share a queue so that they are cogated and your integration testing makes use of speculative states properly	21:24
opendevreview	Ade Lee proposed zuul/zuul-jobs master: WIP/DNM - Test new version of mariadb https://review.opendev.org/c/zuul/zuul-jobs/+/827366	21:25
corvus	ianw: clarkb how sure are you that we don't need the letsencrypt job to run before the lb job?	21:26
opendevreview	Merged opendev/base-jobs master: base: fail centos-8 if pointing to centos-8-stream image type https://review.opendev.org/c/opendev/base-jobs/+/828437	21:26
clarkb	corvus: like 80%. I think the risk is that we'll end up having a proxy up that connects to backends that don't have valid https certs yet. It might be better to have users get no tcp connection at all	21:27
clarkb	its also not a major problem to have that dep there if we want to be safe	21:27
fungi	we could also mitigate that by changing the health check to be more than just a tcp socket prove	21:27
fungi	probe	21:27
opendevreview	James E. Blair proposed opendev/system-config master: Add Zuul load balancer https://review.opendev.org/c/opendev/system-config/+/828773	21:27
opendevreview	James E. Blair proposed opendev/system-config master: Clean up some gitea-lb zuul config https://review.opendev.org/c/opendev/system-config/+/828793	21:27
clarkb	fungi: I think if you are doing tcp lb you cannot do the richer health checks	21:28
clarkb	as haproxy doesn't load the necessary bits into its state tables	21:28
corvus	okay but if you want to add it back we're totally adding that to my handicap. strictly speaking, the -focal/-bionic comment is the only error i made :)	21:28
fungi	mulligan's fair there	21:29
corvus	i believe i made sure that zuul-web doesn't answer on http until it's fully initialized, so should be compatible with a tcp healthcheck	21:30
corvus	(as we've seen from the rolling-restart outages)	21:30
fungi	corvus: we put it behind apache which does answer though?	21:31
fungi	also looks like you're introducing a config error on 828793	21:31
corvus	hrm, that could be a problem then.	21:31
corvus	do we just gloss over that discrepancy with gitea?	21:32
corvus	"if apache is up, gitea is probably up"	21:33
clarkb	corvus: ya I think so	21:33
fungi	it's a good question. and yes i think it's probably something we should try to solve	21:33
clarkb	when manually doing gitea work I always try to tell the load balancer about it first	21:33
clarkb	fungi: ++	21:33
fungi	we put the haproxy config in first. later we added apache in between haproxy and gitea but didn't consider what that might do to our health checks	21:34
fungi	i think that was a regression we simply haven't noticed	21:34
clarkb	one solution is to have the health check check the direct port	21:34
clarkb	rather than the apache ssl terminator	21:35
clarkb	I think that is possible as it has the bits to do the tcp check in place. Its just a matter of telling it to use the other port?	21:35
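
The concern above is that a TCP health check against the Apache port only proves that Apache is listening, not the service behind it. A minimal sketch of a plain TCP probe that can be pointed at either port; the host name and port numbers in the usage comment are hypothetical and not taken from the actual configuration:

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened, else False.

    This mirrors what a layer-4 health check can see: only whether
    something accepts connections on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage: probing the backend's direct port instead of the
# Apache TLS terminator in front of it.
#   tcp_probe("backend.example.org", 3000)   # service port (assumed)
#   tcp_probe("backend.example.org", 443)    # Apache port (assumed)
```

Probing the direct port only tells the balancer that the backend process is accepting connections; it still cannot tell whether the application is answering correctly.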
clarkb	looking at merger queue graphs and wow can you see the periodic jobs loading in	21:39
clarkb	tomorrow's periodic run will be interesting since we should have the full complement of mergers for it	21:42
clarkb	today's set took about the same amount of time as yesterday's but with 60% of the mergers	21:42
*** dviroel\|ruck is now known as dviroel\|out		21:49
opendevreview	Merged openstack/project-config master: Remove Fedora 34 https://review.opendev.org/c/openstack/project-config/+/816933	21:51
clarkb	infra-root any opinions on the best way to start shutting down subunit2sql workers and openstack health api? The health api hasn't worked in months and I've discussed with the qa team and we're basically going to turn it off. I was thinking I should shutdown apache (running the wsgi service) and the gearman workers for subunit2sql but it looks like puppet will restart apache. Should	21:54
clarkb	I put the hosts in the emergency file or go ahead and start removing the puppet for them then shutdown the services. Then delete stuff in a bit?	21:54
ianw	probably makes sense to emergency them and shutdown	22:01
clarkb	ianw: oh ya maybe that is the easiest thing	22:03
clarkb	https://zuul.opendev.org/t/openstack/build/82be22c4e5b646438f0038851f3042f1/log/job-output.txt#30230-30264 ok I think that shows my upstream fix is working. I left a comment on the upstream change pointing to that	22:11
clarkb	ianw: when you do that do you `shutdown -h now` or do it via the nova api?	22:12
fungi	yeah, that makes sense to me. i do `sudo poweroff`	22:25
clarkb	cool I'll get started on that shortly. Will be the two subunit2sql workers and the health.openstack.org server	22:25
clarkb	All the data is in the trove db though so I'm not too concerned about deleting these other than not being sure we'll be able to rebuild them if somehow necessary	22:26
clarkb	But I figure if we go slow it gives people a chance to scream :)	22:26
fungi	we can make images of them before deleting, i've done that with most of the other services i've shut down	22:30
clarkb	ok health01, subunit-worker01, and subunit-worker02.openstack.org are all in the emergency file now	22:35
clarkb	next up server shutdowns	22:35
clarkb	and done. They can be booted back up again if necessary but I don't expect much trouble since no one noticed the services were not working for a long time anyway	22:37
clarkb	gmann: ^ fyi I shutdown the servers as a first step in cleaning things up. If nothing comes up in the next ~week I'll snapshot the servers and delete them	22:38
clarkb	status.openstack.org is another related server but it hosts e-r things so I want to make sure that the whole ELK thing is settled before I clean it up	22:39
clarkb	but once that is done I think status can go away too	22:39
gmann	clarkb: ack.	22:43
ianw	i wonder if we should move that to static and make status.openstack.org/zuul redirect	22:46
ianw	i did have that in bookmarks for years, just thanks to inertia	22:47
clarkb	ianw: I think the zuul redirect would be just about the only thing that is still valid there when we are done	22:47
clarkb	reviewday doesn't look like it has been working (and I'm not sure anyone has used it recently), health is broken and going away. e-r + ELK is moving.	22:47
opendevreview	Merged opendev/system-config master: Add Zuul load balancer https://review.opendev.org/c/opendev/system-config/+/828773	23:20
ianw	ianw@bridge:/var/log/ansible$ ls -l .2020- | wc -l	23:37
ianw	3175	23:37
ianw	does anyone have a problem if i remove these?	23:38
ianw	it seems like in 2020 we had a period where we kept all the logs for a while	23:38
ianw	this inspired by trying to figure out why this infra-prod-service-nodepool run failed https://zuul.opendev.org/t/openstack/build/63b692ad36d94bf1b2f574bbee98e8cb/console	23:39
clarkb	ianw: I think I started cleaning those up at one point and then frickler determined something else was hogging all the disk	23:39
clarkb	But I'm not opposed to removing the 2020 log files	23:39
opendevreview	Jeremy Stanley proposed opendev/system-config master: Clean up some gitea-lb zuul config https://review.opendev.org/c/opendev/system-config/+/828793	23:39
ianw	hrm, i guess actually if i expand it, we're just keeping everything	23:41
ianw	i thought we only kept the last few runs, but must be wrong	23:41
ianw	it would be good to have a more direct way to sync zuul build -> file on disk	23:42
ianw	ok, for my own reference, "Rename playbook log on bridge" is i guess that	23:43
ianw	and then it seems to run "find /var/log/ansible -name 'service-nodepool.yaml.log.*' -type f -mtime 30 -delete"	23:43
clarkb	ya it should be cleaning up. I think what happens is if the filename changes we orphan some things	23:44
opendevreview	Ian Wienand proposed opendev/system-config master: bridge production: fix mtime matching https://review.opendev.org/c/opendev/system-config/+/828808	23:47
ianw	clarkb: ^ that will probably help	23:48
*** prometheanfire is now known as Guest2		23:49
*** osmanlicilegi is now known as Guest0		23:49
clarkb	ah we can leak then if we don't run often enough to get the exact match	23:51
ianw	crazy idea; keep a list of "log reader" gpg keys and encrypt each log file from the bridge runs with multiple --recipient keys. then have a build artifact like our download-all-logs which is a command you can paste to just cat out the logs from a production run	23:51
fungi	not a bad idea	23:51
ianw	presumably infra-root, but if someone were interested in a particular service they could add themselves	23:52
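
The "log reader" idea above relies on gpg accepting several `--recipient` options so that any one of the listed keys can decrypt the file. A minimal sketch that only builds the command line, assuming hypothetical key IDs and a hypothetical log path; it does not run gpg and is not an existing system-config role:

```python
import shlex


def gpg_encrypt_command(log_path, recipients):
    """Build a gpg argv that encrypts log_path to every key in recipients."""
    if not recipients:
        raise ValueError("at least one recipient key is required")
    command = ["gpg", "--encrypt"]
    for key_id in recipients:
        # One --recipient option per "log reader" key.
        command += ["--recipient", key_id]
    command += ["--output", f"{log_path}.gpg", log_path]
    return command


# Hypothetical key IDs and path, for illustration only.
readers = ["reader-one@example.org", "reader-two@example.org"]
argv = gpg_encrypt_command("/var/log/ansible/service-nodepool.yaml.log", readers)
print(shlex.join(argv))
```

Adding an interested service owner would then amount to appending their key to the reader list before the next production run.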
fungi	clarkb: -mtime +30 would solve that	23:53
clarkb	fungi: ya that is ianw's fix	23:54
fungi	oh, missed it	23:54
fungi	thanks, reviewing	23:54
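
The leak discussed above follows from how `find -mtime` compares ages: the file's age is truncated to whole 24-hour periods, `-mtime 30` matches only an age of exactly 30 periods, and `-mtime +30` matches anything greater than 30. A minimal sketch of that comparison, with hypothetical function names, modelling GNU find's documented behaviour rather than calling find itself:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def age_in_days(age_seconds):
    """Age in whole 24-hour periods, ignoring any fractional part."""
    return int(age_seconds // SECONDS_PER_DAY)


def matches_mtime_exact(age_seconds, n):
    """Model of `-mtime n`: matches only during the one-day window at age n."""
    return age_in_days(age_seconds) == n


def matches_mtime_plus(age_seconds, n):
    """Model of `-mtime +n`: matches every file older than that window."""
    return age_in_days(age_seconds) > n


# A log that was missed while it was 30 days old and is now 45 days old.
old_log = 45 * SECONDS_PER_DAY
print(matches_mtime_exact(old_log, 30))  # never cleaned up by `-mtime 30`
print(matches_mtime_plus(old_log, 30))   # picked up by `-mtime +30`
```

If the cleanup does not run during the one day a file is exactly 30 days old, the exact form never matches it again, which is how old logs can pile up.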
