clarkb | #startmeeting infra | 19:00 |
---|---|---|
opendevmeet | Meeting started Tue May 14 19:00:28 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Q3ZNOUTLIF3HWCUX7CQI7ITFBYC3KCCF/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I'm planning to take Friday off and probably Monday as well and do a long weekend | 19:01 |
frickler | monday is a public holiday here | 19:01 |
fungi | i am back from my travels and trying to catch up again | 19:02 |
clarkb | oh nice. The monday after is a holiday here but I'm likely going to work that monday and instead do the long weekend this weekend | 19:02 |
clarkb | #topic Upgrading Old Servers | 19:03 |
clarkb | not sure if tonyb is around but I think we can dive right in | 19:03 |
clarkb | the meetpad cluster has been upgraded to jammy \o/ | 19:03 |
tonyb | Yup I'm here | 19:04 |
clarkb | the old servers are still around but in the emergency file and shutdown | 19:04 |
tonyb | so next is wiki and/or cacti | 19:04 |
clarkb | I think we can probably delete them in the near future as I've just managed to use meetpad successfully without issue | 19:04 |
tonyb | Great. I'll remove them today/tomorrow | 19:04 |
tonyb | WRT cacti there is an approved spec to implement prometheus | 19:05 |
clarkb | tonyb: sounds good. In addition to what is next I'm thinking we should try and collapse our various todo lists into a single one for server upgrades. Do you want to do that or should I take that on? | 19:05 |
tonyb | I can do it. | 19:05 |
clarkb | tonyb: yes there is. The main bit of that which needs effort is likely the agent for reporting from servers to prometheus. I think running prometheus itself should be straightforward with a volume backing the metric data and containers for the service (that's still non-zero effort but is fairly straightforward) | 19:06 |
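A minimal sketch of the kind of deployment described above, assuming docker compose: Prometheus in a container with a named volume backing its metric data, plus node_exporter as the per-host reporting agent. Image tags, ports, and file paths here are illustrative assumptions, not OpenDev's actual configuration.

```yaml
# docker-compose.yaml sketch (assumed layout, not the deployed config)
services:
  prometheus:
    image: docker.io/prom/prometheus:latest
    volumes:
      - prometheus-data:/prometheus                      # volume backing the metric data
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    restart: always
  node-exporter:
    image: docker.io/prom/node-exporter:latest
    network_mode: host                                   # exposes host metrics on :9100
    pid: host
    restart: always
volumes:
  prometheus-data:
```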
tonyb | So how long would cacti be around once we have prometheus? | 19:06 |
clarkb | tonyb: that is a good question. I suspect we can keep it around for some time in a locked-down state for historical access, but we don't need to worry about that being public, which does reduce a number of concerns | 19:07 |
tonyb | I looked at cacti and it seems like it should be doable on jammy or noble which I think would avoid the need for extended OS support from Canonical | 19:07 |
clarkb | I think currently cacti will give us a year of graphing by default so that's probably a good goal to aim for if possible | 19:07 |
clarkb | I'm not sure if older data is accessible at all | 19:07 |
tonyb | Okay. I'll work on that. | 19:08 |
corvus | yep 1 year no older data | 19:08 |
tonyb | got it. I have the start of a plan | 19:09 |
clarkb | thanks, feel free to reach out with any other questions or concerns | 19:09 |
tonyb | I was also thinking .... we have a number of "simple" services that just got put onto jammy. Is it a good, bad, or indifferent idea to pull them onto noble? | 19:10 |
clarkb | tonyb: I don't think it is a bad idea, but I also think they are less urgent | 19:10 |
tonyb | I worry that doing so will mean more pressure when getting away from noble, but it also simplifies the infra | 19:10 |
tonyb | Okay. Noted | 19:10 |
clarkb | historically we've been happy to skip the intermediate LTS and just go to the next one | 19:10 |
clarkb | though I think we've got a ton of stuff on focal, so there may be some imbalance in the distribution | 19:11 |
tonyb | if there are ones that are easy or make sense I'll see about including them | 19:11 |
clarkb | we also don't have noble test nodes yet so testing noble system-config nodes isn't yet doable | 19:11 |
clarkb | which is related to the next couple of topics | 19:11 |
clarkb | lets jump into those | 19:12 |
clarkb | #topic Adding Noble Test Nodes | 19:12 |
clarkb | I think we're still at the step of needing to mirror noble. I may have time to poke at that later this week depending on how some of this gerrit upgrade prep goes | 19:12 |
clarkb | Once we've managed to write all that data into afs we can then add noble nodes to nodepool. I don't think we need a dib release for that because only glean and our own elements changed for noble | 19:13 |
tonyb | Sounds good. | 19:13 |
clarkb | but if my understanding of that is wrong please let me know and we can make a new dib release too | 19:13 |
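As a rough sketch of what "add noble nodes to nodepool" involves once the mirror exists (all names and values below are assumed placeholders, not the real system-config change):

```yaml
# nodepool.yaml excerpt (hypothetical): define the new image and a label for it.
diskimages:
  - name: ubuntu-noble
    release: noble
    elements:
      - ubuntu-minimal
      - vm
labels:
  - name: ubuntu-noble
    min-ready: 1
```

Each provider would additionally need the diskimage uploaded and the label mapped into its pools.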
clarkb | #topic AFS Volume Cleanup | 19:14 |
clarkb | after our meeting last week all the mergeable topic:drop-ubuntu-xenial changes merged | 19:14 |
clarkb | thank you for the reviews. At this point we are no longer building wheels for xenial and a number of unused jobs have been cleared out. That said there is still much to do | 19:15 |
clarkb | next on my list for this item is retiring devstack-gate | 19:15 |
clarkb | it is/was a big xenial consumer and openstack doesn't use it anymore so I'm going to retire it and pull it out of zuul entirely | 19:15 |
clarkb | then when that is done the next step is pruning the rest of the xenial usage for python jobs and nodejs and so on. I suspect this is where momentum will slow down and/or we'll just pull the plug on some stuff | 19:16 |
fungi | hopefully the blast radius for that is vanishingly small, probably non-openstack projects which haven't been touched in years | 19:16 |
fungi | for devstack-gate i mean | 19:16 |
clarkb | ya a lot of x/* stuff has xenial jobs too | 19:16 |
clarkb | I expect many of those will end up removed from zuul (without full retirement) | 19:16 |
fungi | also means we should be able to drop a lot of the "legacy" jobs converted from zuul v2 | 19:17 |
clarkb | similar to what we did with fedora cleanup and windmill | 19:17 |
clarkb | oh good point | 19:17 |
fungi | tripleo was one of the last stragglers depending on legacy jobs, i think | 19:17 |
tonyb | fun. never a dull moment ;P | 19:17 |
clarkb | so ya keep an eye out for changes related to this and I'm sure I'll bug people when I've got a good set that need eyeballs | 19:19 |
clarkb | slow progress is being made otherwise | 19:19 |
clarkb | #topic Gerrit 3.9 Upgrade Planning | 19:19 |
clarkb | Last week we landed a change to rebuild our gerrit images so that we would actually have an up to date 3.9 image after the last attempt failed (my fault as the jobs I needed to run were not triggered) | 19:20 |
clarkb | we restarted Gerrit to pick up the corresponding 3.8 image update just to be sure everything there was happy and also upgraded mariadb to 10.11 at the same time | 19:20 |
clarkb | I have since held some test nodes to go through the upgrade and downgrade process in order to capture notes for our planning etherpad | 19:21 |
fungi | i can't remember, did i see a change go by last week so that similar changes will trigger image rebuilds in the future? | 19:21 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.9 Upgrade prep and process notes | 19:21 |
tonyb | fungi: I think you are thinking wishfully | 19:21 |
clarkb | fungi: no change that I am aware of. I'm on the fence over whether or not we want that | 19:21 |
clarkb | it's ambiguous to me whether the 3.9 image should be promoted when we aren't explicitly updating the 3.9 image and are only modifying the job. Maybe it should | 19:22 |
fungi | tonyb: i'm a repo-half-full kinda guy | 19:22 |
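For context, the mechanism fungi is asking about would roughly be a wider Zuul file matcher on the image build job, so that edits to the job definition itself also trigger (and thus promote) a rebuild. The job name and paths below are placeholders, not the actual system-config definitions.

```yaml
# Hypothetical zuul job excerpt; name and file patterns are illustrative only.
- job:
    name: system-config-build-image-gerrit-3.9
    files:
      - docker/gerrit/.*
      - zuul.d/docker-images/.*   # include the job definitions themselves
```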
clarkb | other items of note here: testing seems to confirm the topic change limit should be a non-issue for the upgrade and for our known areas of concern (since we never have that many open changes) | 19:23 |
clarkb | the limit is also forgiving enough to allow you to restore abandoned changes that push it over the limit | 19:23 |
clarkb | creating new changes with that topic or applying the topic to existing changes that would push it over the limit does error though | 19:23 |
fungi | and only rejects attempts to push new ones or set the topic | 19:23 |
fungi | yeah, that. perfect | 19:23 |
clarkb | yup that appears to be the case through testing | 19:23 |
clarkb | I haven't seen any feedback on my change to add a release war build target to drop web fonts | 19:24 |
clarkb | would probably be good for other infra-root to form an opinion on whether or not we want to patch the build stuff locally for our prod builds or just keep using docs with web fonts | 19:24 |
fungi | what was the concern there? were we leaking visitor details by linking to remotely-hosted webfont files? | 19:25 |
clarkb | fungi: we are but only if you open docs I think | 19:25 |
fungi | oh, so not for the general interface | 19:26 |
fungi | just the documentation part | 19:26 |
clarkb | and gerrit added the ability to disable that very recently when building docs. However they didn't expose that option when building release wars | 19:26 |
clarkb | just when building the documentation directly | 19:26 |
fungi | neat | 19:26 |
clarkb | ya just did a quick check. Gerrit proper serves the fonts it needs | 19:26 |
clarkb | the docs fetch them from google | 19:27 |
fungi | probably added the disable feature to aid in faster local gerrit doc development | 19:27 |
fungi | and didn't consider that people might not want remote webfont links in production | 19:28 |
clarkb | so anyway I pushed a change upstream to have a release war build target that depends on the documentation build without web fonts | 19:28 |
clarkb | we can apply that patch locally to our gerrit builds if we want | 19:28 |
clarkb | or we can stick with the status quo until the change merges upstream | 19:28 |
clarkb | feedback on approach taken there very much welcome | 19:29 |
clarkb | #link https://gerrit-review.googlesource.com/c/gerrit/+/424404 Upstream release war build without web fonts change | 19:30 |
clarkb | #topic openstack.org DNS record cleanup | 19:30 |
clarkb | #link https://paste.opendev.org/show/bVHJchKcKBnlgTNRHLVK/ Proposed DNS cleanup | 19:30 |
fungi | just a quick request for folks to look through that list and let me know if there are any entries which i shouldn't delete | 19:30 |
clarkb | fungi put together this list of openstack.org records that can be cleaned up. I skimmed the list and had a couple of records to call out, but otherwise seems fine | 19:30 |
clarkb | api.openstack.org redirects to developer.openstack.org so something is backing that name and record | 19:31 |
clarkb | and then I don't think we managed whatever sat behind the rss.cdn record so unsure if that is in use or expected to function | 19:31 |
fungi | oh, yep, i'm not sure why api.openstack.org was in the list | 19:31 |
fungi | i'll say it was a test to see if anyone read closely ;) | 19:32 |
fungi | we did manage the rss.cdn.o.o service, back around 2013 or so | 19:32 |
fungi | it was used to provide an rss feed of changes for review | 19:32 |
clarkb | oh I remember that now. | 19:33 |
fungi | i set up the swift container for that, and anita did the frontend work i think | 19:33 |
fungi | openstackwatch, i think it was called? | 19:33 |
clarkb | that sounds right | 19:33 |
fungi | cronjob that turned gerrit queries into rss and uploaded the blob there | 19:34 |
clarkb | those were the only two I had questions about. Everything else looked safe to me | 19:35 |
clarkb | and now api.o.o is the only one I would remove from the list | 19:35 |
tonyb | Nothing stands out to me, but I can do some naive checking to confirm | 19:36 |
fungi | oddly, i can't tell what's supplying the api.openstack.org redirect | 19:36 |
fungi | something we don't manage inside an ip block allocated to Liquid Web, L.L.C | 19:37 |
fungi | i guess we should host that redirect on static.o.o instead | 19:38 |
fungi | i'll push up a change for that and make the dns adjustment once it deploys | 19:39 |
clarkb | sounds like a plan | 19:40 |
clarkb | #topic Open Discussion | 19:40 |
clarkb | Anything else? | 19:40 |
tonyb | Vexxhost and nodepool | 19:40 |
clarkb | I meant to mention this in the main channel earlier today but didn't: this is the classic reason why we have redundancy | 19:41 |
tonyb | frickler reported API issues and they seem to be ongoing | 19:41 |
clarkb | I don't think this is an emergency | 19:41 |
clarkb | however, it would be good to understand in case its a systemic problem that would affect say gitea servers or review | 19:41 |
frickler | seeing the node failures I'm also wondering whether disabling that region would help | 19:41 |
tonyb | No not an emergency just don't want it to linger and wondering when, or indeed, if we should act/escalate | 19:42 |
clarkb | yes i think the next step may be to set max-servers to 0 in those region(s) then try manually booting instances | 19:43 |
clarkb | and debug from there | 19:43 |
clarkb | then hopefully that gives us information that can be used to escalate effectively | 19:44 |
tonyb | Okay. That sounds reasonable. | 19:45 |
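The config change being discussed is a one-line tweak per affected pool in the nodepool configuration; a sketch with assumed provider and pool names (not the real ones):

```yaml
# nodepool.yaml excerpt (hypothetical names): max-servers: 0 stops nodepool
# from launching nodes in the region while leaving it defined for manual debugging.
providers:
  - name: vexxhost-ca-ymq-1
    region-name: ca-ymq-1
    pools:
      - name: main
        max-servers: 0
```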
clarkb | sounds like that may be everything. Thank you everyone | 19:48 |
clarkb | We'll be back here same time and location next week | 19:48 |
corvus | thanks! | 19:48 |
clarkb | and feel free to bring things up in irc or on the mailing list in the interim | 19:48 |
clarkb | #endmeeting | 19:48 |
opendevmeet | Meeting ended Tue May 14 19:48:53 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:48 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.html | 19:48 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.txt | 19:48 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-05-14-19.00.log.html | 19:48 |