Tuesday, 2025-11-18

clarkbthat's interesting: I sent the agenda approximately 5 minutes ago according to my mail client and it still hasn't shown up in the archive or been returned back to me00:26
clarkband as I type this message it arrives. Nevermind00:26
tonybWe can discuss it in the meeting tomorrow if you'd like.  We've had "remove unneeded users from gitea" on the todo list for a while.   Looking at `gitea admin user list` there's only 1 (on all servers) (Also: https://opendev.org/explore/users).  Any objection to running `gitea admin user delete --id 25` on all the servers ?01:13
*** elodilles_pto is now known as elodilles08:51
*** dmellado4 is now known as dmellado12:44
priteauHello. Today I am seeing frequent issues accessing tarballs.openstack.org / releases.openstack.org. Is there an issue with the server?13:06
opendevreviewPiotr Parczewski proposed zuul/zuul-jobs master: Drop Python 2 support  https://review.opendev.org/c/zuul/zuul-jobs/+/96697713:26
fungipriteau: what sort of issues?15:12
fungipriteau: also tarballs.openstack.org is just a redirect to tarballs.opendev.org/openstack15:14
fungiopenstack.org dns is hosted in cloudflare (opendev.org is not), so if you were seeing dns resolution issues i heard from some colleagues there was a cloudflare outage earlier today (resolved now)15:15
fungithat would be my first guess, without any additional detail on the nature of the problem you observed15:16
priteauAh, that could be it. I didn't know there was any dependency on cloudflare for this15:59
fungiyeah, opendev itself wasn't impacted since we don't rely on cloudflare for anything, but openstack.org dns resolution could have been broken for a while earlier today16:03
slittle1_can i get eyes on https://review.opendev.org/c/openstack/project-config/+/965422 when you have a chance.  Thanks16:04
fungidone, i'll add you as the initial member of starlingx-app-kubernetes-module-manager-core once the deploy jobs complete16:10
fungisorry i missed that had been proposed a few weeks ago16:10
slittle1_no probs.  THanks again16:12
opendevreviewMerged openstack/project-config master: Add Kubernetes Module Manager app  https://review.opendev.org/c/openstack/project-config/+/96542216:19
*** jonher_ is now known as jonher16:20
opendevreviewTakashi Kajinami proposed opendev/base-jobs master: Add standard Debian Trixie nodesets  https://review.opendev.org/c/opendev/base-jobs/+/96758416:41
opendevreviewTakashi Kajinami proposed opendev/base-jobs master: Add standard Debian Trixie nodesets  https://review.opendev.org/c/opendev/base-jobs/+/96758416:45
clarkbcorvus: ^ is that what we have in mind for nodeset management going forward with the new launcher non generic labels?16:46
clarkbwe're discussing in #openstack-infra and I think this is the general direction we're headed in but wanted to make sure you had a chance to check as you've been thinking about this far more than anyone else I suspect16:47
*** dmellado0 is now known as dmellado16:48
tkajinammy own preference (as I stated in #openstack-infra) is to have a generic nodeset definition in a single place to avoid duplicating similar defs across repos or adding multiple cross-repo dependencies... though I don't have a strong opinion so will follow the guidance 16:49
tkajinamI'm leaving in a few minutes but will check the discussion tomorrow (I have to go out during the day so maybe in the evening).16:50
clarkbtkajinam: enjoy!16:51
tkajinamclarkb, thx !16:51
clarkbfwiw I think I'm good with the current iteration of 967584. Basically keep the trend of being specific but provide easy to use nodeset definitions within that16:55
clarkbI did find one issue in tkajinam's change and left a comment. I'm happy to update the change if there are no other concerns with it17:36
opendevreviewJames E. Blair proposed opendev/base-jobs master: Remove nodesets file  https://review.opendev.org/c/opendev/base-jobs/+/96759717:54
corvusclarkb: tkajinam i think we want to update this file instead: https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/nodesets.yaml17:54
clarkboh right we're consolidating that in the zuul-providers repo17:55
clarkbI'll get a change up for that shortly17:55
fungioh! somehow i missed that we had deprecated the nodesets in base-jobs17:55
fungimaybe we should add a comment in the file17:56
clarkb++17:56
corvusone other thing: i can't recall whether we decided we should keep a non-memory-specific (or default) nodeset around.... i know we talked about it; did we record the decision?17:56
corvusfungi: we should delete it https://review.opendev.org/96759717:57
fungioh! even better17:57
opendevreviewClark Boylan proposed opendev/zuul-providers master: Add Debian Trixie x86-64 nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/96759917:59
clarkbcorvus: I feel like we discussed it and wanted to get away from the generic non memory specific labels for sure17:59
clarkbcorvus: I don't recall if we wanted to extend that to nodesets. I'm somewhat inclined to be explicit now as this is a great opportunity to transition and it removes some level of confusion for people18:00
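[For reference, a memory-explicit nodeset definition of the kind being discussed might look like the sketch below. The structure matches Zuul's real nodeset syntax; the `debian-trixie-8GB` label/nodeset names are illustrative guesses based on the naming pattern in the conversation, not the contents of the actual change.]

```yaml
# Hypothetical sketch of an explicit, memory-specific nodeset as
# might live in opendev/zuul-providers zuul.d/nodesets.yaml.
# Names here are assumptions for illustration only.
- nodeset:
    name: debian-trixie-8GB
    nodes:
      - name: debian-trixie
        label: debian-trixie-8GB
```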
fungithe main thing we lose, without some more work, is an indication of preferred/best-served alternative labels18:01
fungiwhich of those will have the greatest diversity of providers to boot in?18:01
clarkbcurrently it's the 8GB because we can boot them in every provider (I think OVH is the main location where we can't boot the others)18:01
fungiif i pick the "wrong" one my jobs may have to wait for the handful of providers serving that label to have free quota18:01
clarkbbut yes that isn't communicated anywhere and having a generic label would allow us to continue to provide that without thinking too hard18:02
clarkbI'm happy to add a generic nodeset to 967597 if we prefer that18:03
clarkbcorvus: we expect https://review.opendev.org/c/opendev/base-jobs/+/967597 to be safe to land now right?18:03
clarkbwe aren't even loading nodesets from that repo aiui so I can just go ahead and approve it18:03
corvusclarkb: yep18:03
corvusthe tenant config file in openstack/project-config is the place to double check me if you want18:04
corvus(but also, if we missed something and it blows up, we should just roll forward :)18:04
clarkbya I was just looking. We exclude nodesets in opendev but then only include job and secret in the other tenants18:04
clarkbso I think this is all good18:04
corvusi honestly don't know the best answer to the generic nodeset question -- i'm a little hesitant to not have it because if doing so encourages lots of ppl to set -4GB nodes then that's not good for our ovh utilization.  but maybe we should not have it and just encourage people to still use -8GB unless there's a good reason not to.18:06
clarkbya I think the alternative to implicit unsaid communication is explicit communication18:06
corvusalso, i don't have time at the moment to coordinate with the ovh folks on adjusting things so we can use different node sizes; if someone wants to try to restart that project, that would be very useful.18:07
fungithat might be a good thing to get dmsimard[m] to help with18:07
clarkbyes and there was discussion about refreshing the hardware at the same time possibly18:07
corvusyeah, if we want to try to end up with more explicit/deliberate choices in the future, then avoiding the generic nodeset and over-communicating that people should use -8gb for now may be best.18:08
clarkbthough I don't think we need to couple the efforts18:08
clarkbcorvus: why don't we start with that and if it creates problems we can switch to adding the generic nodeset later18:08
clarkbI feel like removing the generic nodeset is harder so going in this direction is probably best?18:09
corvusyep, not coupled, but related: once we have all providers able to run 4gb nodes, we can stop telling people to use 8gb.18:09
corvus(and encourage them to use 4gb)18:09
clarkbI think my change is good as is if we don't start with the generic nodeset18:10
corvusclarkb: ++18:10
opendevreviewMerged opendev/base-jobs master: Remove nodesets file  https://review.opendev.org/c/opendev/base-jobs/+/96759718:17
mnasiadkaAny chance for landing trixie arm64? (https://review.opendev.org/c/opendev/zuul-providers/+/966200)19:07
fungiapproved19:09
mnasiadkaThanks fungi19:09
fungisorry i missed it earlier19:09
opendevreviewClark Boylan proposed opendev/system-config master: Add matrix gerritbot to the new opendev matrix room  https://review.opendev.org/c/opendev/system-config/+/96760819:40
clarkbinfra-root ^ that is the promised gerritbot change. I modeled it after what we do for zuul but updated the repo names19:40
clarkbhttps://review.opendev.org/q/hashtag:%22opendev-matrix%22+status:open will pull up that change and the statusbot change so is probably preferable as a link to monitor19:41
clarkbI'm going to figure out lunch then probably go out for a bike ride. But when I get back I'll try the test case that checks for non-overlapping channel names between matrix and irc for logging19:45
clarkbI did skim the limnoria channellogger config and it appears to be an all or nothing choice19:45
clarkbthere are ways to 'tune' the logging on disk, but I think if we change that it changes it for every channel and makes the log files one big flat setup which is not a great option19:47
clarkbso simply having a cutover flag day and switching over is probably best19:47
tonybas long as we communicate the flag day and have consistent expectations around how we handle activity here after the flag day that sounds good to me19:49
tonybcould be I'm worried about nothing 19:49
dmsimard[m]fungi: hi, I read up a bit and might be lacking context, you want different flavors ?19:54
fungidmsimard[m]: basically the account zuul uses is tied to a dedicated nova host aggregate with scheduling controlled through a custom flavor, as i understand it19:55
fungiamorin had mentioned there was an opportunity for a hardware refresh as well, i think19:56
fungiclarkb might remember the conversation better than i do19:56
fungibut ultimately we'd like to have a few different flavors with 4/8/16 gb ram options now that zuul can more efficiently make use of them19:57
dmsimard[m]yeah I think he mentioned it would be relevant to do a refresh of the aggregate, we can and should still do that but if new flavor(s) is all you need for now we can make that happen faster19:57
dmsimard[m]let me see what the config looks like19:58
fungii don't think we're in any hurry19:58
dmsimard[m]there's two projects, openstackci and openstackjenkins, I'm guessing we're talking about openstackjenkins19:59
fungicorrect20:00
dmsimard[m]after all these years, jenkins manages to stick around somehow 😂20:00
fungiindeed, we don't name our new accounts that way; it's a testament to how long ovh has been dedicated to helping us test openstack20:01
fungii want to say they were our second cloud donor after rackspace20:01
fungiso it's been a _long_ time20:02
dmsimard[m]ok I see how the flavor and aggregate is set up. Do you have a list of the flavors documented somewhere ? You mentioned RAM but what about vcpus and disk ?20:03
Clark[m]We were told that the setup is old and uses some extension or plugin or something that needs to be dealt with first or in conjunction 20:04
Clark[m]I think the modern setup uses normal nova functionality 20:05
Clark[m]aiui step 1 was making that transition. Then we could start "tuning" flavors from there20:06
fungiour other cloud providers don't really give us the flexibility of custom flavors, but approximate guess would be core count equivalent to ram gigabytes, and at least 80gb for the rootfs20:06
Clark[m]as far as size goes today we expect an 80GB disk, 8GB of memory, and ~8vcpu. There is flexibility when going up and down in memory size to also go up and down in vcpu count but I think 80GB disk is probably a good ballpark due to git repos and all that20:07
dmsimard[m]ok, I could be mistaken but from what I am looking at here it would be in the realm of possibility to get you new flavors soon enough while we figure out the hardware refresh, I'll double check with Arnaud tomorrow20:07
fungiso 4gbram+4vcpu+80gbroot, 8gbram+8vcpu+80gbroot, 16gbram+16vcpu+80gbroot would be my best guess20:07
clarkbhttps://etherpad.opendev.org/p/ovh-flavors here is where we started sketching things out20:08
dmsimard[m]on your end you would map these flavor names/ids somewhere in zuul or something ?20:08
fungicorrect20:08
clarkbthat has some background on the step 1 thing20:09
fungipeople writing jobs would basically choose between 4gb, 8gb or 16gb options for their nodes20:09
clarkbreading that document actually seems to cover much of this20:09
fungiand we then try to normalize those across our different providers20:09
dmsimard[m]yeah thanks for the pad clarkb :)20:10
clarkbI had to switch back to the computer to find that in my browser history20:10
clarkbI may switch back to matrix client again while I eat lunch20:10
dmsimard[m]let me get back to you tomorrow on that20:10
fungiand like i said, there's no rush on this. we just wanted to make sure it doesn't fall through the cracks, and also we like having an opportunity to talk to you again! ;)20:10
dmsimard[m]haha likewise20:11
fungiwas awesome to see you in france too20:11
dmsimard[m]I still have your business card with the pgp key on it :P20:12
fungiit's still valid! i don't think it gets you any discounts on food sadly20:12
fungibut that's the same pgp key signing openstack security advisories and attesting to the keys that sign openstack release tags and tarballs20:13
fungi4k rsa should be post-quantum safe enough, from what i understand, so i don't expect to rotate it for a while still20:14
opendevreviewMerged opendev/zuul-providers master: Add Debian Trixie x86-64 nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/96759920:31
fungiopenstack.exceptions.HttpException: HttpException: 499: Client Error for url: https://swift.api.sjc3.rackspacecloud.com/v1/AUTH_ac0fed44dbe4539d83485bcefc4e2d4b/images-7b7d44d25aa9/d2de98d192f240e8a3ed59002d3d4629-debian-trixie-arm64.qcow2/000015, Client DisconnectThe client was disconnected during request.21:36
fungihopefully temporary? guess i'll recheck 96620021:36
funginow on recheck opendev-build-diskimage-debian-trixie-arm64 hit NODE_FAILURE21:42
dmsimard[m]speaking of france, I am still working through my backlog from summit but one of the questions I've had is whether we could make a logs.openstack.org-like server like the one we had once upon a time21:49
dmsimard[m]kolla-ansible and openstack-ansible would like to send their ara databases somewhere so they could look at reporting of their "nested" ansible (in addition to zuul's perspective)21:50
dmsimard[m]I have a "demo" ara server that I happen to use for CI but it's not meant to receive a lot of traffic :p21:51
dmsimard[m]the idea would be to have something in the post pipeline upload the ara sqlite database to a server somewhere so they can be dynamically rendered, like logs.openstack.org was21:52
corvusfungi: osuosl seems to be returning a lot of http errors now, causing the node failure21:52
tonybdmsimard[m]: We did discuss some related topics.  I think it was mostly focused on a persistent DB to send the ara reports to, less a central logging server21:52
tonybdmsimard[m]: For nested ansible we have some support for that see 'ARA Report' in https://zuul.opendev.org/t/openstack/build/8c59d2cdf14f4f2786051255139bc56d/artifacts as an example21:53
dmsimard[m]tonyb: yes, I advised against an "open" server to send results to in real time, it would be needlessly demanding from a latency/performance perspective21:54
fungidmsimard[m]: do you mean central logging for infrastructure services, or for analyzing job output?21:54
tonybdmsimard[m]: There is an opensearch managed by RH $somewhere21:54
dmsimard[m]I am not in the loop for central logging, this was about job reports specifically21:55
fungiyeah, dpawlik has been maintaining a replacement for the old logstash/elasticsearch systems we used to run21:55
fungihttps://governance.openstack.org/sigs/tact-sig.html#opensearch21:56
tonybfungi: Thanks, I couldn't find a link or guess the name21:57
dmsimard[m]tonyb: the thing about html reports is that they are very inefficient, it's a lot of small files to object storage, they take time to generate and upload21:57
fungithere's a shared login because at some point years ago kibana stopped supporting anonymous access21:57
fungithis runs on amazon-supplied opensearch and aws services, ftr21:58
tonybdmsimard[m]: That's fair and one of the aspects that was discussed.  One of the benefits we get (because it's swift backed) is automatic expiration.21:59
fungithey give us a pile of free credits yearly to operate that specific service21:59
fungithe opensearch+aws resources i mean21:59
tonybdmsimard[m]: I think the TL;DR: is we need a solid plan and ideally a list of costs and benefits, but there isn't any strong objection to doing $something better22:00
dmsimard[m]tonyb: I remember there used to be a big find command that'd automatically delete old files on logs.openstack.org, it's kinda like automatic expiration22:01
tonybdmsimard[m]: LOL, true.22:01
dmsimard[m]what I am suggesting is a similar approach with just the (relatively) small sqlite databases: https://ara.readthedocs.io/en/latest/distributed-sqlite-backend.html22:01
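[The "pull the (relatively) small sqlite database, then render it" flow could be sketched roughly as below. This is a hypothetical helper for illustration only, not part of ara's actual codebase; `fetch_and_open` is an invented name.]

```python
import os
import sqlite3
import tempfile
import urllib.request


def fetch_and_open(url):
    """Download a reported ara sqlite database and open it read-only.

    Hypothetical helper sketching the "fetch the db from log storage,
    then render it" idea discussed above; not ara's actual API.
    """
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    urllib.request.urlretrieve(url, path)
    # Open read-only so a corrupt or hostile file cannot be modified.
    return sqlite3.connect("file:%s?mode=ro" % path, uri=True)
```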
* jrosser remembers hacking an attempt at this22:02
dmsimard[m]noonedeadpunk, mnasiadka ^22:03
* tonyb reads22:03
jrosserwhich pulled the db, rather than had it pushed iirc22:03
corvusclarkb: has been pretty involved in these conversations, might be good to arrange a discussion when he's available22:04
fungii'll note that the "little" find command started to run into the same problems we see with htcacheclean on some of our apache mod_cache mirrors... the expiration/cleanup takes so long to iterate over the massive number of files that data accumulates faster than we can expire and remove it22:04
dmsimard[m]fungi: oh yeah, it was definitely not without its share of issues, I remember running out of inodes :p22:05
tonybI think having an etherpad or similar to flesh out the idea is the next step.  I recall some discussion around the distributed sqlite approach22:05
fungiuploading to swift and declaring object expirations that the backend can act on asynchronously has absolved us of that task when it comes to job logs22:05
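[For context on the expiration mechanism fungi describes: Swift's `X-Delete-After` header is set at upload time (value in seconds) and the backend deletes the object asynchronously later. The helper function below is a hypothetical sketch, not anything from system-config.]

```python
def swift_expiry_headers(days):
    """Return upload headers asking Swift to expire an object.

    X-Delete-After is Swift's real object-expiry header (seconds
    until deletion); the Swift backend then removes the object
    asynchronously, so no client-side find/delete sweep is needed.
    Hypothetical convenience helper for illustration.
    """
    return {"X-Delete-After": str(days * 24 * 3600)}
```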
corvusbut if the ara server can load the sqlite db over an http connection, then using an architectural approach like zuul-proxy, where an artifact in zuul links to a special url on an ara server that instructs the middleware to fetch the sqlite db from the existing object storage used for logs uploads could be a low-maintenance option.22:06
corvussorry, s/zuul-proxy/zuul-preview/22:07
dmsimard[m]if we really want to keep stuff in swift, I wonder if something like s3fs would work, but I guess the challenge is that there are a lot of different swifts22:09
corvus(well, actually both, zuul-storage-proxy servers log urls by url, and zuul-preview serves them by header)22:09
tonybOr something I did at a former employer (which is a little gross but worked) was to store the temporary data in a directory rooted in something like `TZ=0000 date --date "+2 weeks" +"%Y%m%d"`, then the cleanup was fairly quick22:10
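[tonyb's expiry-date-directory trick, sketched in Python under assumed path layout: data is written under a directory named for its UTC expiry date, so cleanup only inspects a handful of top-level directory names instead of walking millions of files.]

```python
import datetime
import pathlib
import shutil


def expiry_path(root, weeks=2):
    """Return (and create) a directory named for the UTC expiry date."""
    expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(weeks=weeks)
    path = pathlib.Path(root) / expires.strftime("%Y%m%d")
    path.mkdir(parents=True, exist_ok=True)
    return path


def cleanup_expired(root):
    """Remove whole date-named directories at or before today (UTC).

    Because the expiry date is encoded in the directory name, the
    cleanup pass is cheap: compare names, then rmtree whole trees.
    """
    today = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
    for entry in pathlib.Path(root).iterdir():
        if entry.is_dir() and entry.name.isdigit() and entry.name <= today:
            shutil.rmtree(entry)
```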
dmsimard[m]the ara server doesn't know how to load sqlite databases over http :(22:10
dmsimard[m]not yet, anyway, but nothing a curl or wget can't fix22:11
jrosserdmsimard[m]: you remember i did that?22:11
dmsimard[m]or rsync22:11
dmsimard[m]jrosser: I think so22:11
jrosserhttps://github.com/jrosser/ara/commit/f9af69eaef4ea1228f4fc641e36b9d8df5adbaaa22:11
jrosseri know it was not liked, but it is what it is :)22:11
dmsimard[m]oh it's even against the new django codebase22:12
dmsimard[m]I can try it22:13
jrosserit's from really some time ago but i did run it for a while22:13
dmsimard[m]2022 does feel like forever ago22:14
fungiin the beforetime, in the longlongago22:16
corvusthat implementation looks a little bit fragile.  something that might make it more future-proof would be to report the sqlite db url as an artifact, and then, if you wanted to have ara fetch it, use zuul's api to look up the artifact url and fetch that.  it would avoid encoding some of that business logic in the wsgi code.22:16
corvus(that would get rid of settings.DISTRIBUTED_SQLITE_ZUUL_DB_PATH and api_resp.json()[0].get('log_url'))22:18
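[Looking an artifact up from the build's reported artifact list, as corvus suggests, might look like the sketch below. Zuul's build API does return an `artifacts` list of name/url entries; the artifact name "ara-sqlite" used in the test is an invented example.]

```python
def find_artifact_url(build, name):
    """Return the url of a named artifact from a Zuul build API response.

    `build` is the parsed JSON for a single build as returned by
    Zuul's REST API; hypothetical helper for illustration.
    """
    for artifact in build.get("artifacts", []):
        if artifact.get("name") == name:
            return artifact["url"]
    return None
```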
corvusjrosser: why did you stop running it?22:25
jrossercorvus: because i didn't want to start relying (and having others also rely) on something that got such a lukewarm response back in 202222:27
dmsimard[m]corvus: I think I see what you mean, we can try that too22:27
corvusjrosser: are you concerned the ara devs might stop supporting that?  (or did you mean lukewarm response from open infra community)22:28
jrosserwell a bit of all of it really22:28
jrosserbut the topic does keep coming round again22:28
jrosseri have enough imposter syndrome to just drop it and step away rather than push for it, sadly22:30
dmsimard[m]I feel like my argument at the time might have been that something generic would have been nice (less zuul specific, more sqlite over http), someone looked at getting it through a react js app similar to how zuul loads the json but alas that never panned out and I have no javascript skills :(22:30
corvuswearing my opendev hat: i think if someone wanted to come along and add the code to the system-config repo to run an ara server in that configuration in a container on an opendev vm (with appropriate testinfra, etc) just like our other servers, i think that would be a fine outcome.22:31
corvus(i don't speak for all of us, but that's my individual feeling)22:31
dmsimard[m]I don't philosophically speaking have objections to carrying zuul bits in ara, but it would be in a specific backend, not hijacking the distributed sqlite one :p22:32
jrosseroh thats totally reasonable - i dont see my code really as much more than a proof of concept22:32
corvusyeah, as a software engineer, i do think a generic http one might be better, but also, there's more security questions to address if you do that, compared to one that's restricted to just a zuul installation (but, perhaps, an allow-list of url roots might address that)22:33
corvusif someone wanted to update the zuul api implementation in ara to use artifacts, i'm happy to answer questions on that too.22:33
jrosser^ i think i was concerned also about making something that just could be coerced into arbitrary downloads22:35
corvusjrosser: ++22:35
dmsimard[m]:D22:36
corvusif anyone's willing to sign up for some work on this, then putting an agenda item on the opendev team meeting might be a good idea (or finding another time or medium (like mailing list) to discuss it).22:37
tonybcorvus: mnasiadka was going to add it to the meeting agenda.22:37
dmsimard[m]I am rusty, but someone mentioned we should make a pad earlier, it can be a good start22:38
corvus++22:38
tonyb++22:38
dmsimard[m]I am running out of time for today but I will write some things down in here: https://etherpad.opendev.org/p/ara-for-databases22:40
tonyband for the record I agree with corvus.   if someone wants to do the bulk of the work and the ARA side gets done I'd be happy to help22:40
dmsimard[m]the meeting is this one? https://meetings.opendev.org/#OpenDev_Meeting22:47
corvusyep22:47
tonybdmsimard[m]: Correct22:47
dmsimard[m]ok, I will put it in my calendar22:48
Clark[m]My main concern when this last came up was compatibility between ansible versions. You can run an arbitrary version in ci jobs, zuul pins to two specific versions, and ara also limits compatibility... the matrix there seems like a lot of trouble to maintain in a generic manner. What happens if opendev and OpenStack Ansible and Kolla are all running three very distinct ansible versions and ara can only handle one or two of them?22:52
Clark[m]Maybe we are ok with that I don't know22:52
dmsimard[m]that is a good point to consider, I am out of time for now I can elaborate on that later22:58
tonybI started some notes on that pad.  Feel free to delete/edit/update as needed to accurately capture what was said: https://etherpad.opendev.org/p/ara-for-databases22:59
opendevreviewClark Boylan proposed opendev/system-config master: Add checks to avoid irc and matrix log collisions  https://review.opendev.org/c/opendev/system-config/+/96761923:36
clarkbinfra-root ^ that implements fungi's suggestion in a simple way. I think this should prevent the biggest unexpected footguns. I will note that the matrix-eavesdrop bot does make the log path for each room configurable so we could log to two different locations. But I think that doing a cutover is likely to avoid the most confusion over time23:37
clarkba year from now we won't have to remember where the logs were at $point in time23:37
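[The collision check being discussed amounts to a set intersection over the two channel lists; a minimal sketch (channel names below are made up, and this is not the actual system-config test):]

```python
def overlapping_channels(irc_channels, matrix_rooms):
    """Return names logged by both the irc and matrix bots.

    Any overlap would mean both bots write logs for the same
    channel name, which is the footgun the check guards against.
    """
    return sorted(set(irc_channels) & set(matrix_rooms))
```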
clarkbtonyb: I added a bit more context to the etherpad based on the recent conversation we had with mnasiadka 23:47

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!