Tuesday, 2025-11-18

clarkbthat's interesting: I sent the agenda approximately 5 minutes ago according to my mail client and it still hasn't shown up in the archive or been returned back to me00:26
clarkband as I type this message it arrives. Nevermind00:26
tonybWe can discuss it in the meeting tomorrow if you'd like.  We've had "remove unneeded users from gitea" on the todo list for a while.   Looking at `gitea admin user list` there's only 1 (on all servers) (Also: https://opendev.org/explore/users).  Any objection to running `gitea admin user delete --id 25` on all the servers ?01:13
*** elodilles_pto is now known as elodilles08:51
*** dmellado4 is now known as dmellado12:44
priteauHello. Today I am seeing frequent issues accessing tarballs.openstack.org / releases.openstack.org. Is there an issue with the server?13:06
opendevreviewPiotr Parczewski proposed zuul/zuul-jobs master: Drop Python 2 support  https://review.opendev.org/c/zuul/zuul-jobs/+/96697713:26
fungipriteau: what sort of issues?15:12
fungipriteau: also tarballs.openstack.org is just a redirect to tarballs.opendev.org/openstack15:14
fungiopenstack.org dns is hosted in cloudflare (opendev.org is not), so if you were seeing dns resolution issues i heard from some colleagues there was a cloudflare outage earlier today (resolved now)15:15
fungithat would be my first guess, without any additional detail on the nature of the problem you observed15:16
priteauAh, that could be it. I didn't know there was any dependency on cloudflare for this15:59
fungiyeah, opendev itself wasn't impacted since we don't rely on cloudflare for anything, but openstack.org dns resolution could have been broken for a while earlier today16:03
slittle1_can i get eyes on https://review.opendev.org/c/openstack/project-config/+/965422 when you have a chance.  Thanks16:04
fungidone, i'll add you as the initial member of starlingx-app-kubernetes-module-manager-core once the deploy jobs complete16:10
fungisorry i missed that had been proposed a few weeks ago16:10
slittle1_no probs.  THanks again16:12
opendevreviewMerged openstack/project-config master: Add Kubernetes Module Manager app  https://review.opendev.org/c/openstack/project-config/+/96542216:19
*** jonher_ is now known as jonher16:20
opendevreviewTakashi Kajinami proposed opendev/base-jobs master: Add standard Debian Trixie nodesets  https://review.opendev.org/c/opendev/base-jobs/+/96758416:41
opendevreviewTakashi Kajinami proposed opendev/base-jobs master: Add standard Debian Trixie nodesets  https://review.opendev.org/c/opendev/base-jobs/+/96758416:45
clarkbcorvus: ^ is that what we have in mind for nodeset management going forward with the new launcher non generic labels?16:46
clarkbwe're discussing in #openstack-infra and I think this is the general direction we're headed in but wanted to make sure you had a chance to check as you've been thinking about this far more than anyone else I suspect16:47
*** dmellado0 is now known as dmellado16:48
tkajinammy own preference (as I stated in #openstack-infra) is to have a generic nodeset definition in a single place to avoid duplicating similar defs across repos or adding multiple cross-repo dependencies... though I don't have a strong opinion so will follow the guidance 16:49
tkajinamI'm leaving in a few minutes but will check the discussion tomorrow (I have to go out during the day so maybe in the evening).16:50
clarkbtkajinam: enjoy!16:51
tkajinamclarkb, thx !16:51
clarkbfwiw I think I'm good with the current iteration of 967584. Basically keep the trend of being specific but provide easy to use nodeset definitions within that16:55
clarkbI did find one issue in tkajinam's change and left a comment. I'm happy to update the change if there are no other concerns with it17:36
opendevreviewJames E. Blair proposed opendev/base-jobs master: Remove nodesets file  https://review.opendev.org/c/opendev/base-jobs/+/96759717:54
corvusclarkb: tkajinam i think we want to update this file instead: https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/nodesets.yaml17:54
clarkboh right we're consolidating that in the zuul-providers repo17:55
clarkbI'll get a change up for that shortly17:55
fungioh! somehow i missed that we had deprecated the nodesets in base-jobs17:55
fungimaybe we should add a comment in the file17:56
clarkb++17:56
corvusone other thing: i can't recall whether we decided we should keep a non-memory-specific (or default) nodeset around.... i know we talked about it; did we record the decision?17:56
corvusfungi: we should delete it https://review.opendev.org/96759717:57
fungioh! even better17:57
opendevreviewClark Boylan proposed opendev/zuul-providers master: Add Debian Trixie x86-64 nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/96759917:59
clarkbcorvus: I feel like we discussed it and wanted to get away from the generic non memory specific labels for sure17:59
clarkbcorvus: I don't recall if we wanted to extend that to nodesets. I'm somewhat inclined to be explicit now as this is a great opportunity to transition and it removes some level of confusion for people18:00
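[For reference, a memory-explicit nodeset definition of the kind being discussed might look like the sketch below. The structure matches Zuul's real nodeset syntax; the `debian-trixie-8GB` label/nodeset names are illustrative guesses based on the naming pattern in the conversation, not the contents of the actual change.]

```yaml
# Hypothetical sketch of an explicit, memory-specific nodeset as
# might live in opendev/zuul-providers zuul.d/nodesets.yaml.
# Names here are assumptions for illustration only.
- nodeset:
    name: debian-trixie-8GB
    nodes:
      - name: debian-trixie
        label: debian-trixie-8GB
```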
fungithe main thing we lose, without some more work, is an indication of preferred/best-served alternative labels18:01
fungiwhich of those will have the greatest diversity of providers to boot in?18:01
clarkbcurrently it's the 8GB because we can boot them in every provider (I think OVH is the main location where we can't boot the others)18:01
fungiif i pick the "wrong" one my jobs may have to wait for the handful of providers serving that label to have free quota18:01
clarkbbut yes that isn't communicated anywhere and having a generic label would allow us to continue to provide that without thinking too hard18:02
clarkbI'm happy to add a generic nodeset to 967597 if we prefer that18:03
clarkbcorvus: we expect https://review.opendev.org/c/opendev/base-jobs/+/967597 to be safe to land now right?18:03
clarkbwe aren't even loading nodesets from that repo aiui so I can just go ahead and approve it18:03
corvusclarkb: yep18:03
corvusthe tenant config file in openstack/project-config is the place to double check me if you want18:04
corvus(but also, if we missed something and it blows up, we should just roll forward :)18:04
clarkbya I was just looking. We exclude nodesets in opendev but then only include job and secret in the other tenants18:04
clarkbso I think this is all good18:04
corvusi honestly don't know the best answer to the generic nodeset question -- i'm a little hesitant to not have it because if doing so encourages lots of ppl to set -4GB nodes then that's not good for our ovh utilization.  but maybe we should not have it and just encourage people to still use -8GB unless there's a good reason not to.18:06
clarkbya I think the alternative to implicit unsaid communication is explicit communication18:06
corvusalso, i don't have time at the moment to coordinate with the ovh folks on adjusting things so we can use different node sizes; if someone wants to try to restart that project, that would be very useful.18:07
fungithat might be a good thing to get dmsimard[m] to help with18:07
clarkbyes and there was discussion about refreshing the hardware at the same time possibly18:07
corvusyeah, if we want to try to end up with more explicit/deliberate choices in the future, then avoiding the generic nodeset and over-communicating that people should use -8gb for now may be best.18:08
clarkbthough I don't think we need to couple the efforts18:08
clarkbcorvus: why don't we start with that and if it creates problems we can switch to adding the generic nodeset later18:08
clarkbI feel like removing the generic nodeset is harder so going in this direction is probably best?18:09
corvusyep, not coupled, but related: once we have all providers able to run 4gb nodes, we can stop telling people to use 8gb.18:09
corvus(and encourage them to use 4gb)18:09
clarkbI think my change is good as is if we don't start with the generic nodeset18:10
corvusclarkb: ++18:10
opendevreviewMerged opendev/base-jobs master: Remove nodesets file  https://review.opendev.org/c/opendev/base-jobs/+/96759718:17
mnasiadkaAny chance for landing trixie arm64? (https://review.opendev.org/c/opendev/zuul-providers/+/966200)19:07
fungiapproved19:09
mnasiadkaThanks fungi19:09
fungisorry i missed it earlier19:09
opendevreviewClark Boylan proposed opendev/system-config master: Add matrix gerritbot to the new opendev matrix room  https://review.opendev.org/c/opendev/system-config/+/96760819:40
clarkbinfra-root ^ that is the promised gerritbot change. I modeled it after what we do for zuul but updated the repo names19:40
clarkbhttps://review.opendev.org/q/hashtag:%22opendev-matrix%22+status:open will pull up that change and the statusbot change so is probably preferable as a link to monitor19:41
clarkbI'm going to figure out lunch then probably go out for a bike ride. But when I get back I'll try the test case that checks for non-overlapping channel names between matrix and irc for logging19:45
clarkbI did skim the limnoria channellogger config and it appears to be an all or nothing choice19:45
clarkbthere are ways to 'tune' the logging on disk, but I think if we change that it changes it for every channel and makes the log files one big flat setup which is not a great option19:47
clarkbso simply having a cutover flag day and switching over is probably best19:47
tonybas long as we communicate the flag day and have consistent expectations around how we handle activity here after the flag day that sounds good to me19:49
tonybcould be I'm worried about nothing 19:49
dmsimard[m]fungi: hi, I read up a bit and might be lacking context, you want different flavors ?19:54
fungidmsimard[m]: basically the account zuul uses is tied to a dedicated nova host aggregate with scheduling controlled through a custom flavor, as i understand it19:55
fungiamorin had mentioned there was an opportunity for a hardware refresh as well, i think19:56
fungiclarkb might remember the conversation better than i do19:56
fungibut ultimately we'd like to have a few different flavors with 4/8/16 gb ram options now that zuul can more efficiently make use of them19:57
dmsimard[m]yeah I think he mentioned it would be relevant to do a refresh of the aggregate, we can and should still do that but if new flavor(s) is all you need for now we can make that happen faster19:57
dmsimard[m]let me see what the config looks like19:58
fungii don't think we're in any hurry19:58
dmsimard[m]there's two projects, openstackci and openstackjenkins, I'm guessing we're talking about openstackjenkins19:59
fungicorrect20:00
dmsimard[m]after all these years, jenkins manages to stick around somehow 😂20:00
fungiindeed, we don't name our new accounts that way; it's a testament to how long ovh has been dedicated to helping us test openstack20:01
fungii want to say they were our second cloud donor after rackspace20:01
fungiso it's been a _long_ time20:02
dmsimard[m]ok I see how the flavor and aggregate is set up. Do you have a list of the flavors documented somewhere ? You mentioned RAM but what about vcpus and disk ?20:03
Clark[m]We were told that the setup is old and uses some extension or plugin or something that needs to be dealt with first or in conjunction 20:04
Clark[m]I think the modern setup uses normal nova functionality 20:05
Clark[m]aiui step 1 was making that transition. Then we could start "tuning" flavors from there20:06
fungiour other cloud providers don't really give us the flexibility of custom flavors, but approximate guess would be core count equivalent to ram gigabytes, and at least 80gb for the rootfs20:06
Clark[m]as far as size goes today we expect an 80GB disk, 8GB of memory, and ~8vcpu. There is flexibility when going up and down in memory size to also go up and down in vcpu count but I think 80GB disk is probably a good ballpark due to git repos and all that20:07
dmsimard[m]ok, I could be mistaken but from what I am looking at here it would be in the realm of possibility to get you new flavors soon enough while we figure out the hardware refresh, I'll double check with Arnaud tomorrow20:07
fungiso 4gbram+4vcpu+80gbroot, 8gbram+8vcpu+80gbroot, 16gbram+16vcpu+80gbroot would be my best guess20:07
clarkbhttps://etherpad.opendev.org/p/ovh-flavors here is where we started sketching things out20:08
dmsimard[m]on your end you would map these flavor names/ids somewhere in zuul or something ?20:08
fungicorrect20:08
clarkbthat has some background on the step 1 thing20:09
fungipeople writing jobs would basically choose between 4gb, 8gb or 16gb options for their nodes20:09
clarkbreading that document actually seems to cover much of this20:09
fungiand we then try to normalize those across our different providers20:09
dmsimard[m]yeah thanks for the pad clarkb :)20:10
clarkbI had to switch back to the computer to find that in my browser history20:10
clarkbI may switch back to matrix client again while I eat lunch20:10
dmsimard[m]let me get back to you tomorrow on that20:10
fungiand like i said, there's no rush on this. we just wanted to make sure it doesn't fall through the cracks, and also we like having an opportunity to talk to you again! ;)20:10
dmsimard[m]haha likewise20:11
fungiwas awesome to see you in france too20:11
dmsimard[m]I still have your business card with the pgp key on it :P20:12
fungiit's still valid! i don't think it gets you any discounts on food sadly20:12
fungibut that's the same pgp key signing openstack security advisories and attesting to the keys that sign openstack release tags and tarballs20:13
fungi4k rsa should be post-quantum safe enough, from what i understand, so i don't expect to rotate it for a while still20:14
opendevreviewMerged opendev/zuul-providers master: Add Debian Trixie x86-64 nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/96759920:31
fungiopenstack.exceptions.HttpException: HttpException: 499: Client Error for url: https://swift.api.sjc3.rackspacecloud.com/v1/AUTH_ac0fed44dbe4539d83485bcefc4e2d4b/images-7b7d44d25aa9/d2de98d192f240e8a3ed59002d3d4629-debian-trixie-arm64.qcow2/000015, Client DisconnectThe client was disconnected during request.21:36
fungihopefully temporary? guess i'll recheck 96620021:36
funginow on recheck opendev-build-diskimage-debian-trixie-arm64 hit NODE_FAILURE21:42
dmsimard[m]speaking of france, I am still working through my backlog from summit but one of the questions I've had is whether we could make a logs.openstack.org-like server like the one we had once upon a time21:49
dmsimard[m]kolla-ansible and openstack-ansible would like to send their ara databases somewhere so they could look at reporting of their "nested" ansible (in addition to zuul's perspective)21:50
dmsimard[m]I have a "demo" ara server that I happen to use for CI but it's not meant to receive a lot of traffic :p21:51
dmsimard[m]the idea would be to have something in the post pipeline upload the ara sqlite database to a server somewhere so they can be dynamically rendered, like logs.openstack.org was21:52
corvusfungi: osuosl seems to be returning a lot of http errors now, causing the node failure21:52
tonybdmsimard[m]: We did discuss some related topics.  I think it was mostly focused on a persistent DB to send the ara reports to, less a central logging server21:52
tonybdmsimard[m]: For nested ansible we have some support for that see 'ARA Report' in https://zuul.opendev.org/t/openstack/build/8c59d2cdf14f4f2786051255139bc56d/artifacts as an example21:53
dmsimard[m]tonyb: yes, I advised against an "open" server to send results to in real time, it would be needlessly demanding from a latency/performance perspective21:54
fungidmsimard[m]: do you mean central logging for infrastructure services, or for analyzing job output?21:54
tonybdmsimard[m]: There is an opensearch managed by RH $somewhere21:54
dmsimard[m]I am not in the loop for central logging, this was about job reports specifically21:55
fungiyeah, dpawlik has been maintaining a replacement for the old logstash/elasticsearch systems we used to run21:55
fungihttps://governance.openstack.org/sigs/tact-sig.html#opensearch21:56
tonybfungi: Thanks, I couldn't find a link or guess the name21:57
dmsimard[m]tonyb: the thing about html reports is that they are very inefficient, it's a lot of small files to object storage, they take time to generate and upload21:57
fungithere's a shared login because at some point years ago kibana stopped supporting anonymous access21:57
fungithis runs on amazon-supplied opensearch and aws services, ftr21:58
tonybdmsimard[m]: That's fair and one of the aspects that was discussed.  One of the benefits we get (because it's swift backed) is automatic expiration.21:59
fungithey give us a pile of free credits yearly to operate that specific service21:59
fungithe opensearch+aws resources i mean21:59
tonybdmsimard[m]: I think the TL;DR: is we need a solid plan and ideally a list of costs and benefits, but there isn't any strong objection to doing $something better22:00
dmsimard[m]tonyb: I remember there used to be a big find command that'd automatically delete old files on logs.openstack.org, it's kinda like automatic expiration22:01
tonybdmsimard[m]: LOL, true.22:01
dmsimard[m]what I am suggesting is a similar approach with just the (relatively) small sqlite databases: https://ara.readthedocs.io/en/latest/distributed-sqlite-backend.html22:01
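[The "pull the (relatively) small sqlite database, then render it" flow could be sketched roughly as below. This is a hypothetical helper for illustration only, not part of ara's actual codebase; `fetch_and_open` is an invented name.]

```python
import os
import sqlite3
import tempfile
import urllib.request


def fetch_and_open(url):
    """Download a reported ara sqlite database and open it read-only.

    Hypothetical helper sketching the "fetch the db from log storage,
    then render it" idea discussed above; not ara's actual API.
    """
    fd, path = tempfile.mkstemp(suffix=".sqlite3")
    os.close(fd)
    urllib.request.urlretrieve(url, path)
    # Open read-only so a corrupt or hostile file cannot be modified.
    return sqlite3.connect("file:%s?mode=ro" % path, uri=True)
```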
* jrosser remembers hacking an attempt at this22:02
dmsimard[m]noonedeadpunk, mnasiadka ^22:03
* tonyb reads22:03
jrosserwhich pulled the db, rather than had it pushed iirc22:03
corvusclarkb: has been pretty involved in these conversations, might be good to arrange a discussion when he's available22:04
fungii'll note that the "little" find command started to run into the same problems we see with htcacheclean on some of our apache mod_cache mirrors... the expiration/cleanup takes so long to iterate over the massive number of files that data accumulates faster than we can expire and remove it22:04
dmsimard[m]fungi: oh yeah, it was definitely not without its share of issues, I remember running out of inodes :p22:05
tonybI think having an etherpad or similar to flesh out the idea is the next step.  I recall some discussion around the distributed sqlite approach22:05
fungiuploading to swift and declaring object expirations that the backend can act on asynchronously has absolved us of that task when it comes to job logs22:05
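[For context on the expiration mechanism fungi describes: Swift's `X-Delete-After` header is set at upload time (value in seconds) and the backend deletes the object asynchronously later. The helper function below is a hypothetical sketch, not anything from system-config.]

```python
def swift_expiry_headers(days):
    """Return upload headers asking Swift to expire an object.

    X-Delete-After is Swift's real object-expiry header (seconds
    until deletion); the Swift backend then removes the object
    asynchronously, so no client-side find/delete sweep is needed.
    Hypothetical convenience helper for illustration.
    """
    return {"X-Delete-After": str(days * 24 * 3600)}
```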
corvusbut if the ara server can load the sqlite db over an http connection, then using an architectural approach like zuul-proxy, where an artifact in zuul links to a special url on an ara server that instructs the middleware to fetch the sqlite db from the existing object storage used for logs uploads could be a low-maintenance option.22:06
corvussorry, s/zuul-proxy/zuul-preview/22:07
dmsimard[m]if we really want to keep stuff in swift, I wonder if something like s3fs would work, but I guess the challenge is that there are a lot of different swifts22:09
corvus(well, actually both, zuul-storage-proxy servers log urls by url, and zuul-preview serves them by header)22:09
tonybOr something I did at a former employer (which is a little gross but worked) was to store the temporary data in a directory rooted in something like `TZ=0000 date --date "+2 weeks" +"%Y%m%d"`, then the cleanup was fairly quick22:10
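[tonyb's expiry-date-directory trick, sketched in Python under assumed path layout: data is written under a directory named for its UTC expiry date, so cleanup only inspects a handful of top-level directory names instead of walking millions of files.]

```python
import datetime
import pathlib
import shutil


def expiry_path(root, weeks=2):
    """Return (and create) a directory named for the UTC expiry date."""
    expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(weeks=weeks)
    path = pathlib.Path(root) / expires.strftime("%Y%m%d")
    path.mkdir(parents=True, exist_ok=True)
    return path


def cleanup_expired(root):
    """Remove whole date-named directories at or before today (UTC).

    Because the expiry date is encoded in the directory name, the
    cleanup pass is cheap: compare names, then rmtree whole trees.
    """
    today = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d")
    for entry in pathlib.Path(root).iterdir():
        if entry.is_dir() and entry.name.isdigit() and entry.name <= today:
            shutil.rmtree(entry)
```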
dmsimard[m]the ara server doesn't know how to load sqlite databases over http :(22:10
dmsimard[m]not yet, anyway, but nothing a curl or wget can't fix22:11
jrosserdmsimard[m]: you remember i did that?22:11
dmsimard[m]or rsync22:11
dmsimard[m]jrosser: I think so22:11
jrosserhttps://github.com/jrosser/ara/commit/f9af69eaef4ea1228f4fc641e36b9d8df5adbaaa22:11
jrosseri know it was not liked, but it is what it is :)22:11
dmsimard[m]oh it's even against the new django codebase22:12
dmsimard[m]I can try it22:13
jrosserit's from really some time ago but i did run it for a while22:13
dmsimard[m]2022 does feel like forever ago22:14
fungiin the beforetime, in the longlongago22:16
corvusthat implementation looks a little bit fragile.  something that might make it more future-proof would be to report the sqlite db url as an artifact, and then, if you wanted to have ara fetch it, use zuul's api to look up the artifact url and fetch that.  it would avoid encoding some of that business logic in the wsgi code.22:16
corvus(that would get rid of settings.DISTRIBUTED_SQLITE_ZUUL_DB_PATH and api_resp.json()[0].get('log_url'))22:18
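[Looking an artifact up from the build's reported artifact list, as corvus suggests, might look like the sketch below. Zuul's build API does return an `artifacts` list of name/url entries; the artifact name "ara-sqlite" used in the test is an invented example.]

```python
def find_artifact_url(build, name):
    """Return the url of a named artifact from a Zuul build API response.

    `build` is the parsed JSON for a single build as returned by
    Zuul's REST API; hypothetical helper for illustration.
    """
    for artifact in build.get("artifacts", []):
        if artifact.get("name") == name:
            return artifact["url"]
    return None
```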
corvusjrosser: why did you stop running it?22:25
jrossercorvus: because i didn't want to start relying (and having others also rely) on something that got such a lukewarm response back in 202222:27
dmsimard[m]corvus: I think I see what you mean, we can try that too22:27
corvusjrosser: are you concerned the ara devs might stop supporting that?  (or did you mean lukewarm response from open infra community)22:28
jrosserwell a bit of all of it really22:28
jrosserbut the topic does keep coming round again22:28
jrosseri have enough imposter syndrome to just drop it and step away rather than push for it, sadly22:30
dmsimard[m]I feel like my argument at the time might have been that something generic would have been nice (less zuul specific, more sqlite over http), someone looked at getting it through a react js app similar to how zuul loads the json but alas that never panned out and I have no javascript skills :(22:30
corvuswearing my opendev hat: i think if someone wanted to come along and add the code to the system-config repo to run an ara server in that configuration in a container on an opendev vm (with appropriate testinfra, etc) just like our other servers, i think that would be a fine outcome.22:31
corvus(i don't speak for all of us, but that's my individual feeling)22:31
dmsimard[m]I don't philosophically speaking have objections to carrying zuul bits in ara, but it would be in a specific backend, not hijacking the distributed sqlite one :p22:32
jrosseroh thats totally reasonable - i dont see my code really as much more than a proof of concept22:32
corvusyeah, as a software engineer, i do think a generic http one might be better, but also, there's more security questions to address if you do that, compared to one that's restricted to just a zuul installation (but, perhaps, an allow-list of url roots might address that)22:33
corvusif someone wanted to update the zuul api implementation in ara to use artifacts, i'm happy to answer questions on that too.22:33
jrosser^ i think i was concerned also about making something that just could be coerced into arbitrary downloads22:35
corvusjrosser: ++22:35
dmsimard[m]:D22:36
corvusif anyone's willing to sign up for some work on this, then putting an agenda item on the opendev team meeting might be a good idea (or finding another time or medium (like mailing list) to discuss it).22:37
tonybcorvus: mnasiadka was going to add it to the meeting agenda.22:37
dmsimard[m]I am rusty, but someone mentioned we should make a pad earlier, it can be a good start22:38
corvus++22:38
tonyb++22:38
dmsimard[m]I am running out of time for today but I will write some things down in here: https://etherpad.opendev.org/p/ara-for-databases22:40
tonyband for the record I agree with corvus.   if someone wants to do the bulk of the work and the ARA side gets done I'd be happy to help22:40
dmsimard[m]the meeting is this one? https://meetings.opendev.org/#OpenDev_Meeting22:47
corvusyep22:47
tonybdmsimard[m]: Correct22:47
dmsimard[m]ok, I will put it in my calendar22:48
Clark[m]My main concern when this last came up was compatibility between ansible versions. You can run an arbitrary version in ci jobs, zuul pins to two specific versions, and ara also limits compatibility... the matrix there seems like a lot of trouble to maintain in a generic manner. What happens if opendev and OpenStack Ansible and Kolla are all running three very distinct ansible versions and ara can only handle one or two of them?22:52
Clark[m]Maybe we are ok with that I don't know22:52
dmsimard[m]that is a good point to consider, I am out of time for now I can elaborate on that later22:58
tonybI started some notes on that pad.  Feel free to delete/edit/update as needed to accurately capture what was said: https://etherpad.opendev.org/p/ara-for-databases22:59
opendevreviewClark Boylan proposed opendev/system-config master: Add checks to avoid irc and matrix log collisions  https://review.opendev.org/c/opendev/system-config/+/96761923:36
clarkbinfra-root ^ that implements fungi's suggestion in a simple way. I think this should prevent the biggest unexpected footguns. I will note that the matrix-eavesdrop bot does make the log path for each room configurable so we could log to two different locations. But I think that doing a cutover is likely to avoid the most confusion over time23:37
clarkba year from now we won't have to remember where the logs were at $point in time23:37
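[The collision check being discussed amounts to a set intersection over the two channel lists; a minimal sketch (channel names below are made up, and this is not the actual system-config test):]

```python
def overlapping_channels(irc_channels, matrix_rooms):
    """Return names logged by both the irc and matrix bots.

    Any overlap would mean both bots write logs for the same
    channel name, which is the footgun the check guards against.
    """
    return sorted(set(irc_channels) & set(matrix_rooms))
```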
clarkbtonyb: I added a bit more context to the etherpad based on the recent conversation we had with mnasiadka 23:47

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!