Tuesday, 2021-10-05

clarkbAnyone else here for the team meeting?18:59
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Oct  5 19:01:02 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-October/000287.html Our Agenda19:01
clarkb#topic Announcements19:01
ianwo/19:01
clarkbThe OpenStack release is happening tomorrow afternoon UTC time19:01
fungiahoy19:01
fungiit'll probably start tomorrow morning utc19:01
clarkbWe should avoid changes to tools that produce code today and tomorrow until that is done19:01
fungibut should hopefully be complete by 14:00 utc or thereabouts19:02
clarkbfungi: good point, It starts earlier but aims to be done by ~1500UTC?19:02
fungiyeah, 15z is press release time19:02
clarkbToday is a good day to avoid touching gerrit, gitea, zuul, etc :)19:02
fungibut they generally shoot to have all the artifacts and docs published and rechecked at least an hour prior19:02
clarkbI plan to try and get up a bit early tomorrow to help out if anything comes up. But ya I expect it will be done by the time i have to take kids to school which will be nice as I can do that without concern then :)19:03
fungiand it's a multi-hour process so usually begins around 10:00z or so19:03
clarkbAnyway just be aware of that and lets avoid restarting zuul for example19:03
clarkb#topic Actions from last meeting19:04
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-09-28-19.01.txt minutes from last meeting19:04
clarkbI don't see any recorded actions. Lets move on19:04
clarkb#topic Specs19:04
clarkb#link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement19:04
clarkbI didn't find time to update this spec to propose using the prebuilt binaries19:05
clarkb#action clarkb update prometheus spec to use prebuilt binaries19:05
fungithe current spec doesn't rule them out though19:05
clarkbit does suggest we use the docker images19:05
clarkbwhich are different19:05
clarkbwell specifically for node exporter the idea was docker images for that if we used it19:05
clarkbbut ianw makes a good point that we can just grab the prebuilt binaries for node exporter and host them ourselves and stick them on all our systems19:06
clarkbthat will give us consistent node exporter metrics without concern for version variance and avoids oddness with docker images19:06
clarkbAnyway I'll try to update that. Seems like there are a lot of things happening this week (yay pre release pressure build up all able to let go now)19:07
clarkb#link https://review.opendev.org/810990 Mailman 3 spec19:07
clarkbI'll give us all an action to give this a review as well. This is important not just for the service update but it will help inform us on the best path for handling the odd existing server19:07
clarkb#action infra-root Review mailman 3 spec19:07
fungishould be in a reviewable state, i don't have any outstanding todos for it, but feedback would be appreciated19:07
clarkb#topic Topics19:08
clarkb#topic PTGBot Deployment19:08
clarkbThis is a late entry to the meeting agenda that I added to my local copy19:08
clarkbLooks like we've got a stack of changes to deploy ptgbot on the new eavesdrop setup, but we're struggling with LE staging errors19:09
clarkbLE doesn't indicate any issues at https://letsencrypt.status.io/19:09
clarkbWe also had an idea that we might be able to split up the handlers/main.yaml that handles all the service restarts post cert update. That would then allow us to run more minimal sets of jobs when doing LE updates, but ansible doesn't work that way unfortunately19:09
clarkbacme.sh is already retrying things that the protocol is expected to be less reliable with19:10
clarkbfor that reason I hesitate to add retries in our ansible calls of acme.sh but that is a possibility too if we want to try and force this a bit more19:10
ianwi wonder if we should just short-cut acme.sh19:10
clarkbfungi: ianw: ^ anything else to add on this subject? Mostly wanted to call it out because the LE stuff could have more widespread implications19:10
clarkbianw: when testing you mean?19:11
fungiianw: i wouldn't be entirely opposed19:11
ianwat the moment, what it does is asks the staging to setup the certs, so we get the TXT responses19:11
ianwbut we never actually put them into dns and finish verifying, we self-generate with openssl19:11
fungior i considered deploying a pebble container on the test nodes and pointing acme.sh to a localhost:xxxx url19:11
ianwyeah, that has been on my todo list for a long time :)19:11
clarkbIn the past when we've seen the staging api have trouble it usually goes away within a day. Not great to rely on that nor is there any guarantee or indication that will be the case when it happens again19:12
fungithe staging environment api docs outright say it's not recommended for use in ci jobs19:12
ianwin testing mode, we could just avoid calling acme.sh and somehow output a fake/random TXT record, to keep testing the surrounding bits19:12
clarkbianw: that might be a good compromise19:13
clarkbianw: the driver.sh could echo that out pretty easily19:13
ianwi can look at this today; it would be nice to keep the path on one job, but maybe we should have a specific acme.sh test job19:13
clarkbsomething like cat /dev/urandom | tr [:stuff:] [:otherstuff:] | tail -2019:14
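The pipeline above is shorthand; as written, `tail` would wait for an end of input that /dev/urandom never reaches. A minimal Python sketch of the same idea, assuming the test-mode path only needs a value shaped like an ACME DNS-01 TXT record (43 base64url characters, the unpadded encoding of a 32-byte digest). The name `fake_txt_record` is hypothetical and is not part of driver.sh or acme.sh:

```python
import base64
import secrets


def fake_txt_record() -> str:
    """Return a random string shaped like a DNS-01 TXT challenge value.

    Hypothetical helper for test mode only: the value is random bytes,
    not a real key-authorization digest, so it can never validate.
    """
    raw = secrets.token_bytes(32)  # same size as a SHA-256 digest
    # base64url without padding, matching the format of real TXT values
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    print(fake_txt_record())
```

Reading a fixed number of random bytes keeps the call bounded, and the output has the same length and alphabet as a real challenge value, so the surrounding handling can still be exercised.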
ianwi have had https://github.com/acmesh-official/acme.sh/pull/2606 open for 2 years to better detect failures; clearly it hasn't attracted attention19:14
fungiyes, i was thinking the same, maybe the system-config-run-letsencrypt job should use the staging env properly, and then we fake out all the others?19:14
clarkb++19:14
ianwi can look at this today19:15
clarkbthanks19:15
fungii looked through the acme.sh code and it does seem to retry aggressively with delays all over the place, so i'm surprised we're still seeing 500 responses bubble up19:15
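For illustration only, the kind of outer retry being weighed in this topic might look like the sketch below. It assumes a hypothetical callable that returns an HTTP status code; it does not reflect how acme.sh or the Ansible roles are actually written, and the name `retry_on_server_error` is made up for this example:

```python
import time
from typing import Callable


def retry_on_server_error(
    call: Callable[[], int],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Invoke call() until it returns a status below 500 or attempts run out.

    The wait grows linearly (delay, 2*delay, ...) between attempts.
    Returns the last status seen, which may still be a 5xx.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status = 0
    for attempt in range(1, attempts + 1):
        status = call()
        if status < 500:
            return status
        if attempt < attempts:
            sleep(delay * attempt)  # no sleep after the final attempt
    return status
```

Layering a wrapper like this on top of a tool that already retries internally multiplies the total wait, which is one reason to be cautious about it.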
clarkb#topic Improving OpenDev's CD throughput19:15
clarkblets keep moving as we have a number of other topics to talk about today and limited time :)19:16
clarkbianw has written a stack of changes to move this along and improve our CD throughput19:16
clarkb#link https://review.opendev.org/c/opendev/system-config/+/807672 List dependencies for all jobs19:16
clarkbthis isn't currently mergeable because Zuul doesn't detect this change as having changes that need jobs to run19:17
clarkbianw: I was thinking that we should maybe just put a simple edit in a file somewhere to trick it19:17
clarkbianw: like our README or a dockerfile or something19:17
ianwclarkb: I do think https://review.opendev.org/c/zuul/zuul/+/755988 might fix this type of situation19:18
fungishould we have some job which always runs?19:18
ianwbut yes, i can do something like that.  the syntax check is probably the important bit of that19:19
fungiahh, 755988 is a neat idea!19:19
clarkboh interesting I'll have to review that zuul change19:19
fungisimilar to how it handles config changes19:19
fungigreat approach19:19
clarkb#link https://review.opendev.org/c/opendev/base-jobs/+/807807 Update opendev/base-jobs to support having jobs in system-config that don't clone repos19:19
clarkb#link https://review.opendev.org/c/opendev/system-config/+/807808 stop cloning system-config in every system-config job19:20
clarkbianw: at the end of this stack we'll still be running everything serially, but in theory we'll be ready to update semaphores and run stuff in parallel?19:20
ianwyes, that's the intention19:20
clarkbgreat, they are on my list of things to review I've just got to find time between everything else :)19:21
clarkbhopefully this afternoon for those though19:21
ianwnp; they *should* all be no-ops for live jobs19:21
clarkbthank you for working on that19:21
ianwbut, as about 7 hours of yesterday highlights, sometimes something you think is a no-op can have interesting side-effects :)19:21
clarkb#topic Gerrit Account Cleanup19:22
clarkbI'm going to keep moving along to be sure we can get through everything. Happy to swing back to any topic at the end of our hour if we have time19:22
clarkbI don't have anything new to say on this item. This issue gets deprioritized pretty easily unfortunately19:23
clarkbI may drop it from the meeting until I expect to be able to send the emails19:23
clarkb#topic Debian Buster to Bullseye Updates19:23
clarkbWe have updated python base images for our docker containers. We should try to move as many images as possible from buster to bullseye as buster will eventually stop getting updates19:24
clarkb#link https://review.opendev.org/c/opendev/system-config/+/809269 Gitea bullseye update19:24
clarkb#link https://review.opendev.org/c/opendev/system-config/+/809286 Gerrit bullseye update19:24
clarkbI've got those two changes pushed up for gerrit and gitea because I've been making changes to their docker images recently. But basically all the containers we run need similar treatment aiui19:25
clarkbI'm bringing this up first because in a few minutes we'll also discuss gitea and gerrit service upgrades. I think we should decide on the order we want to tackle these updates in. Do we do the service or the OS first?19:25
fungi"soon" is relative, the debian-lts team expecy to sipport buster until june 202419:26
fungier, expect to support19:26
ianwi would say OS then service19:26
clarkbfungi: oh isn't it like a year after release?19:26
clarkbmaybe it is a year after the n-1 release19:26
fungiofficial security support ends in july 202219:26
corvus'soon' in debian time :)19:26
clarkbfungi: aha I am not completely crazy then19:26
fungiand then lts takes over19:26
ianwit seems either are fairly easy to roll back19:27
fungithe lts peeps are separate from the debian security team19:27
clarkbianw: ++ exactly my thinking and ya happy to do OS first as a result19:27
fungisort of like openstack's stable maintenance team and extended maintenance19:27
clarkbfungi: ok I'm not sure if our python base images and the other base images enable the lts stuff or not. We don't make those19:27
clarkbprobably best to get off of buster by july 2022 then we don't have to worry about it19:27
fungiright, that'll be the bigger concern. what is the support lifetime of the python-base image19:27
ianw(i need to get back to the nodepool/dib image upgrades too)19:28
fungiwhich may or may not be tied to debian's support timeline19:28
clarkbfungi: those images are based on debian so there is some relationship there. I doubt they go past the lts period. But wouldn't be surprised if they end in july 202219:28
clarkbit is also possible they stop building updates sooner than that. And as ianw mentions the updats seem straightforward with easy reverts so we should go ahead and work through them19:29
fungithe debian docker images also aren't official, at least from the debian release team's perspective19:29
fungiso it's more about when the debian docker image maintainers want to stop supporting buster19:29
clarkbfungi: right and they are unlikely to make new buster packages once debian stops doing so19:29
clarkbthat caps the useful life of those images to likely july 202219:30
clarkb(unless they do lts)19:30
clarkbConsidering there is a vote for doing OS updates first I guess I should plan to land those two changes above tomorrow after openstack release is complete19:30
fungimight be able to infer something by looking at whether/when they stopped doing stretch images19:30
clarkbfungi: they may also just directly say it somewhere19:31
clarkbAnyway I think we can pretty quickly work through these updates and then not worry about it19:32
clarkband as a side effect we'll get newer git and other fancy new software19:32
clarkb(but git in particular should give us improvements on gitea and possibly even gerrit doing things like repacking)19:32
clarkb#topic Gitea 1.15.3 Upgrade19:33
clarkbOnce the gitea OS update is done. This is the next thing I would like to do to gitea19:33
clarkbLatest test instance: https://198.72.124.104:3081/opendev/system-config19:33
clarkbThat test instance lgtm and the logo hosting situation has been addressed with gerrit and paste et al19:33
clarkb#link https://review.opendev.org/c/opendev/system-config/+/80323119:34
clarkbAre there any other concerns with doing this upgrade tomorrow/thursday timeframe?19:34
fungiafter the openstack release has wrapped up, i should be around to help with it19:34
ianwno issues, similarly i can help19:35
clarkbgreat and thanks19:35
fungialso this reminds me, i want to work on getting our infrastructure donors on the main opendev.org page, now that we have apache on the gitea servers we could just serve normal page content instead of having to stuff it into gitea's main page template, would that be a better place to start?19:35
clarkbfungi: there might be issues doing that and needing to host gitea at say https://opendev.org/gitea/19:36
clarkbsince all of our existing links out there don't have that root19:36
fungiwe'd need apache to directly serve the donor logos anyway probably19:36
clarkbfungi: you can have gitea serve them just like the opendev logos19:36
clarkbthey have a static content directory with what I hope are stable paths now that they moved them19:36
fungiseems like if we configure apache to only serve that page for get requests to the / url and when there are no query parameters, that wouldn't interfere with gitea19:37
clarkbI guess we could maybe set it up where only the gitea landing page was hosted at /gitea/ and then all other paths would keep working? That is definitely my concern with doing something like that19:37
clarkbfungi: I think you still need a gitea landing page because gitea serves a home link19:38
clarkbbasically you either need to hack up redirects such that that continues to work or you're hacking templates either way19:38
clarkbI don't have any objections to simply updating the home page template as a result19:38
fungii mean, as new content for what's served at the home url19:39
fungisimply shadowing that one url19:39
clarkbright, I think I prefer not relying on apache for that19:39
clarkbsince it doesn't really gain us anything and potentially complicates gitea in say k8s if we ever do that19:39
fungigot it. i was hoping we could have a way to serve an opendev.org main page without the constraints of what the gitea homepage template can support, but we can talk about it another time i guess19:40
clarkbI'm not sure I'm aware of what those constraints are?19:40
clarkbI may be missing something important19:40
fungihas to use gitea's templating, right?19:40
clarkb"yes" you can just put what you want in there and ignore the templating stuff at the header and footer19:40
fungiso we can't easily preview and publish that page separately from the gitea container19:41
ianwfungi: ++ it has lightly troubled me for a while that that page is a wall of text that seems to start talking about gerrit workflows very very early.  so having something more flexible is a good goal19:41
clarkbfungi: that is true, you have to run gitea to render the header and footer and see the entire page19:41
fungiand i guess it can't have a different header/footer from the rest of gitea19:42
clarkbI think it can, since it explicitly includes those bits19:42
clarkbBut you'd have to use the existing templating system to make changes19:42
fungioh, so we could at least leave them out of the template if we wanted19:42
clarkbyes19:42
corvusthe header seems like a good header for that site regardless19:43
clarkb{{template "base/head" .}} and {{template "base/footer" .}} are line 1 and line before EOF19:43
corvushome/explore/get started19:43
clarkbcorvus: ++19:43
fungiyeah, i don't object to the current header and footer, just would prefer not to be unable to extend them easily19:44
clarkbfungi: you can extend them as well19:44
clarkb(we do the header already)19:44
fungiokay, anyway i didn't mean to derail the meeting19:44
corvusif we get to a point where the header for opendev.org isn't appropriate for a gitea service then we should probably move gitea to a subdomain19:45
fungijust noodling on how to have an actual project website for opendev as a whole rather than one specific to the code browser19:45
clarkbcorvus: ya that was my immediate reaction to what this would imply. I'm ok doing that too, but it seems like we haven't reached the point where that is necessary yet19:45
fungiwe could also have a different page for the opendev main page, but having it at https://opendev.org/ seems convenient19:46
clarkbLets continue as we have a couple more things to go over really quickly19:46
fungiyep19:46
clarkbThese next two are going to be related19:46
clarkb#topic Upgrading Gerrit to 3.319:46
clarkbWe are running gerrit 3.2 today. Gerrit 3.3 and 3.4 exist. 3.5 is in development but has not been released yet19:47
clarkbThe upgrade from 3.2 to 3.3 is pretty straightforward with most of the changes being UX stuff not really server backend19:47
clarkbStraight forward enough that we are testing that upgrade in CI now :)19:47
clarkbThe upgrade to 3.4 is quite a bit more involved and the release notes are extensive19:47
clarkbFor this reason I'm thinking we can do a near term upgrade to 3.3. Then plan for 3.4 maybe around quiet holidaying time? (or whenever is convenient, mostly just thinking that will take more time)19:48
fungiwhat are the main challenges you see for 3.4?19:48
ianwi'd be happy to do 3.3 upgrade on like my monday, which is usually very quiet19:49
clarkbfungi: mostly just double checking that things like plugins and zuul etc are all working with it19:49
clarkbnote you can also revert 3.3 to 3.2 and 3.4 to 3.319:49
clarkbso doing this incrementally keeps the reverts as small changes that we can do19:49
ianwyeah no schema changes for either i believe19:49
clarkb(I think you could revert 3.4 to 3.2 as well just more pieces to update)19:49
fungii can be around to help with the 3.3 upgrade on ianw's monday morning19:49
fungi(my sunday evening)19:49
clarkbthere is a schema change between 3.2 and 3.3 you have to manually edit All-Users or All-Projects to do the revert19:50
clarkbThe next topic item is scheduling the project renames next week. I was thinking it might be good to do the renames on 3.2 since we have tested and done that before19:50
clarkbhowever, we test the renames in CI on 3.2 and 3.3 currently so it should just work if you're talking about this monday and not a general monday19:51
clarkbIn my head I was considering late next week renames, then late week after that (week of ptg) for the 3.3 upgrade19:51
fungii don't mind doing the renames on 3.3 and working through any unlikely gotchas we encounter19:52
fungibut happy to go either way19:52
clarkbianw: ^ when you said monday did you mean this monday or just generally your monday is good?19:52
clarkbwe can also revert 3.3 to 3.2 if necessary so I'm comfortable doing it this au monday if we prefer19:53
ianwi meant any monday, but the 11th does work19:53
fungialso this coming monday (11th) is a national holiday for some in the usa and most of canada19:53
ianwi feel like we've got about as much testing as is practical 19:53
clarkbok in that case I think the two options we are talking about are 1) upgrade the 11th then rename the 15th or 2) rename the 15th then upgrade the 25th19:53
clarkbsounds like everyone really likes 1) ?19:54
clarkbdo we think we need more time to announce that?19:54
fungithe sooner we get it out of the way, the better19:54
ianw++ to 119:54
clarkbin that case any objections to doing renames on the 15th at say 1500UTC ish fungi ?19:54
fungii think we can announce it on relatively short notice since we anticipate only brief outages19:54
clarkbyup thinking we can announce both the upgrade and the renames today if that is the schedule we like19:55
fungisgtm19:55
clarkbok I'll work on drafting that announcement after lunch today and make sure we get it sent out19:55
clarkbI think the actual act of upgrading gerrit is captured in the CI job. We'll basically land a change to update the gerrit image to 3.3. Then manually stop gerrit once docker-compose is updated, pull the image, run the init command then start gerrit19:56
clarkbpretty straightforward19:56
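The sequence described above can be written down as an ordered checklist. This is only a sketch: the command strings are assumptions and not taken from the CI job or the production playbooks, and `<site>` is a placeholder for the Gerrit site path, which the discussion does not give. How the init command is actually invoked inside the container is likewise not stated here.

```python
from typing import List, Tuple

# (description, illustrative command) pairs in the order discussed.
# The commands are assumptions for illustration, not the real playbook.
UPGRADE_STEPS: List[Tuple[str, str]] = [
    ("stop gerrit", "docker-compose down"),
    ("pull the 3.3 image", "docker-compose pull"),
    ("run the init command", "java -jar gerrit.war init -d <site>"),
    ("start gerrit", "docker-compose up -d"),
]


def render_plan(steps: List[Tuple[str, str]]) -> List[str]:
    """Format the steps as numbered lines for a dry-run printout."""
    return [
        f"{number}. {description}: {command}"
        for number, (description, command) in enumerate(steps, start=1)
    ]


if __name__ == "__main__":
    for line in render_plan(UPGRADE_STEPS):
        print(line)
```

Keeping the plan as data makes the ordering explicit: init has to run against the new image while Gerrit is stopped, and Gerrit is started only afterwards.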
clarkbAnd we are almost out of time.19:56
clarkb#topic Open Discussion19:57
clarkbIs there anything else to call out in our last ~3 minutes?19:57
fungii plan to not be around on friday this week19:57
ianw#link https://review.opendev.org/c/zuul/zuul-jobs/+/81227219:57
ianwif i could get some eyes on that, it reworks the rust install which was noticed by pyca/cryptography19:57
clarkbcan do19:58
clarkbfungi: enjoy the time off. I ended up not being around as much as I expected yesterday but it was fun to walk on the beach and stop at the salt water taffy shop19:59
fungiall our salt water taffy is imported. no idea why. like our salt water isn't good enough?20:00
fungiit's a shame20:00
clarkbthis was made onsite :)20:00
clarkbAnd we are at time. Thank you everyone20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Oct  5 20:00:31 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.log.html20:00
fungithanks clarkb!20:01
