clarkb | Anyone else here for the team meeting? | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Oct 5 19:01:02 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-October/000287.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
ianw | o/ | 19:01 |
clarkb | The OpenStack release is happening tomorrow afternoon UTC time | 19:01 |
fungi | ahoy | 19:01 |
fungi | it'll probably start tomorrow morning utc | 19:01 |
clarkb | We should avoid changes to tools that produce code today and tomorrow until that is done | 19:01 |
fungi | but should hopefully be complete by 14:00 utc or thereabouts | 19:02 |
clarkb | fungi: good point, it starts earlier but aims to be done by ~1500 UTC? | 19:02 |
fungi | yeah, 15z is press release time | 19:02 |
clarkb | Today is a good day to avoid touching gerrit, gitea, zuul, etc :) | 19:02 |
fungi | but they generally shoot to have all the artifacts and docs published and rechecked at least an hour prior | 19:02 |
clarkb | I plan to try and get up a bit early tomorrow to help out if anything comes up. But ya I expect it will be done by the time I have to take the kids to school, which will be nice as I can do that without concern then :) | 19:03 |
fungi | and it's a multi-hour process so usually begins around 10:00z or so | 19:03 |
clarkb | Anyway just be aware of that and lets avoid restarting zuul for example | 19:03 |
clarkb | #topic Actions from last meeting | 19:04 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-09-28-19.01.txt minutes from last meeting | 19:04 |
clarkb | I don't see any recorded actions. Lets move on | 19:04 |
clarkb | #topic Specs | 19:04 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement | 19:04 |
clarkb | I didn't find time to update this spec to propose using the prebuilt binaries | 19:05 |
clarkb | #action clarkb update prometheus spec to use prebuilt binaries | 19:05 |
fungi | the current spec doesn't rule them out though | 19:05 |
clarkb | it does suggest we use the docker images | 19:05 |
clarkb | which are different | 19:05 |
clarkb | well, specifically for node exporter, the idea was docker images if we used it | 19:05 |
clarkb | but ianw makes a good point that we can just grab the prebuilt binaries for node exporter and host them ourselves and stick them on all our systems | 19:06 |
clarkb | that will give us consistent node exporter metrics without concern for version variance and avoids oddness with docker images | 19:06 |
clarkb | Anyway I'll try to update that. Seems like there are a lot of things happening this week (yay, pre-release pressure built up and now all able to let go) | 19:07 |
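(A minimal sketch of the prebuilt-binary approach discussed above, assuming we mirror a release tarball ourselves; the version, URL, and install paths are illustrative, not what the spec will pin:)

```shell
# Fetch a pinned node_exporter release (version and URL are assumptions;
# a real deployment would pull from our own mirror via ansible).
VERSION=1.2.2
curl -fLO "https://github.com/prometheus/node_exporter/releases/download/v${VERSION}/node_exporter-${VERSION}.linux-amd64.tar.gz"
tar -xzf "node_exporter-${VERSION}.linux-amd64.tar.gz"
sudo install -m 0755 "node_exporter-${VERSION}.linux-amd64/node_exporter" /usr/local/bin/node_exporter
# node_exporter is a single static binary; it serves metrics on :9100 by default
node_exporter &
curl -s http://localhost:9100/metrics | head
```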
clarkb | #link https://review.opendev.org/810990 Mailman 3 spec | 19:07 |
clarkb | I'll give us all an action to give this a review as well. This is important not just for the service update; it will also help inform us on the best path for handling the odd existing server | 19:07 |
clarkb | #action infra-root Review mailman 3 spec | 19:07 |
fungi | should be in a reviewable state, i don't have any outstanding todos for it, but feedback would be appreciated | 19:07 |
clarkb | #topic Topics | 19:08 |
clarkb | #topic PTGBot Deployment | 19:08 |
clarkb | This is a late entry to the meeting agenda that I added to my local copy | 19:08 |
clarkb | Looks like we've got a stack of changes to deploy ptgbot on the new eavesdrop setup, but we're struggling with LE staging errors | 19:09 |
clarkb | LE doesn't indicate any issues at https://letsencrypt.status.io/ | 19:09 |
clarkb | We also had an idea that we might be able to split up the handlers/main.yaml that handles all the service restarts after cert updates. That would let us run more minimal sets of jobs when doing LE updates, but unfortunately ansible doesn't work that way | 19:09 |
clarkb | acme.sh already retries the steps where the protocol is expected to be less reliable | 19:10 |
clarkb | for that reason I hesitate to add retries around our ansible calls to acme.sh, but that is a possibility too if we want to try and force this a bit more | 19:10 |
ianw | i wonder if we should just short-cut acme.sh | 19:10 |
clarkb | fungi: ianw: ^ anything else to add on this subject? Mostly wanted to call it out because the LE stuff could have more widespread implications | 19:10 |
clarkb | ianw: when testing you mean? | 19:11 |
fungi | ianw: i wouldn't be entirely opposed | 19:11 |
ianw | at the moment, what it does is ask the staging environment to set up the certs, so we get the TXT responses | 19:11 |
ianw | but we never actually put them into dns and finish verifying, we self-generate with openssl | 19:11 |
fungi | or i considered deploying a pebble container on the test nodes and pointing acme.sh to a localhost:xxxx url | 19:11 |
ianw | yeah, that has been on my todo list for a long time :) | 19:11 |
clarkb | In the past when we've seen the staging api have trouble it usually goes away within a day. Not great to rely on that, nor is there any guarantee or indication that will be the case when it happens again | 19:12 |
fungi | the staging environment api docs outright say it's not recommended for use in ci jobs | 19:12 |
ianw | in testing mode, we could just avoid calling acme.sh and somehow output a fake/random TXT record, to keep testing the surrounding bits | 19:12 |
clarkb | ianw: that might be a good compromise | 19:13 |
clarkb | ianw: the driver.sh could echo that out pretty easily | 19:13 |
ianw | i can look at this today; it would be nice to keep the path on one job, but maybe we should have a specific acme.sh test job | 19:13 |
clarkb | something like cat /dev/urandom | tr -dc '[:alnum:]' | head -c 20 | 19:14 |
ianw | i have had https://github.com/acmesh-official/acme.sh/pull/2606 open for 2 years to better detect failures; clearly it hasn't attracted attention | 19:14 |
fungi | yes, i was thinking the same, maybe the system-config-run-letsencrypt job should use the staging env properly, and then we fake out all the others? | 19:14 |
clarkb | ++ | 19:14 |
ianw | i can look at this today | 19:15 |
clarkb | thanks | 19:15 |
fungi | i looked through the acme.sh code and it does seem to retry aggressively with delays all over the place, so i'm surprised we're still seeing 500 responses bubble up | 19:15 |
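(A minimal sketch of the fake-TXT-record idea discussed above: in test jobs, the driver.sh could emit a value shaped like a real ACME dns-01 response instead of calling acme.sh. The LETSENCRYPT_SELF_SIGN flag name and the record target are hypothetical:)

```shell
# Hypothetical test-mode branch for driver.sh: skip acme.sh entirely and
# fabricate a dns-01-shaped token (43 chars of base64url, like a real
# SHA-256 challenge response) so the surrounding DNS/handler plumbing
# still gets exercised.
if [ "${LETSENCRYPT_SELF_SIGN:-0}" = "1" ]; then
    fake_txt="$(head -c 32 /dev/urandom | base64 | tr '+/' '-_' | tr -d '=')"
    echo "_acme-challenge.example.opendev.org : ${fake_txt}"
    exit 0
fi
```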
clarkb | #topic Improving OpenDev's CD throughput | 19:15 |
clarkb | lets keep moving as we have a number of other topics to talk about today and limited time :) | 19:16 |
clarkb | ianw has written a stack of changes to move this along and improve our CD throughput | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807672 List dependencies for all jobs | 19:16 |
clarkb | this isn't currently mergeable because Zuul doesn't detect this change as touching anything that requires jobs to run | 19:17 |
clarkb | ianw: I was thinking that we should maybe just put a simple edit in a file somewhere to trick it | 19:17 |
clarkb | ianw: like our README or a dockerfile or something | 19:17 |
ianw | clarkb: I do think https://review.opendev.org/c/zuul/zuul/+/755988 might fix this type of situation | 19:18 |
fungi | should we have some job which always runs? | 19:18 |
ianw | but yes, i can do something like that. the syntax check is probably the important bit of that | 19:19 |
fungi | ahh, 755988 is a neat idea! | 19:19 |
clarkb | oh interesting I'll have to review that zuul change | 19:19 |
fungi | similar to how it handles config changes | 19:19 |
fungi | great approach | 19:19 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/807807 Update opendev/base-jobs to support having jobs in system-config that don't clone repos | 19:19 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/807808 stop cloning system-config in every system-config job | 19:20 |
clarkb | ianw: at the end of this stack we'll still be running everything serially, but in theory we'll be ready to update semaphores and run stuff in parallel? | 19:20 |
ianw | yes, that's the intention | 19:20 |
clarkb | great, they are on my list of things to review I've just got to find time between everything else :) | 19:21 |
clarkb | hopefully this afternoon for those though | 19:21 |
ianw | np; they *should* all be no-ops for live jobs | 19:21 |
clarkb | thank you for working on that | 19:21 |
ianw | but, as about 7 hours of yesterday highlights, sometimes something you think is a no-op can have interesting side-effects :) | 19:21 |
clarkb | #topic Gerrit Account Cleanup | 19:22 |
clarkb | I'm going to keep moving along to be sure we can get through everything. Happy to swing back to any topic at the end of our hour if we have time | 19:22 |
clarkb | I don't have anything new to say on this item. This issue gets deprioritized pretty easily unfortunately | 19:23 |
clarkb | I may drop it from the meeting until I expect to be able to send the emails | 19:23 |
clarkb | #topic Debian Buster to Bullseye Updates | 19:23 |
clarkb | We have updated python base images for our docker containers. We should try to move as many images as possible from buster to bullseye as buster will eventually stop getting updates | 19:24 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/809269 Gitea bullseye update | 19:24 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/809286 Gerrit bullseye update | 19:24 |
clarkb | I've got those two changes pushed up for gerrit and gitea because I've been making changes to their docker images recently. But basically all the containers we run need similar treatment aiui | 19:25 |
clarkb | I'm bringing this up first because in a few minutes we'll also discuss gitea and gerrit service upgrades. I think we should decide on the order we want to tackle these updates in. Do we do the service or the OS first? | 19:25 |
fungi | "soon" is relative, the debian-lts team expecy to sipport buster until june 2024 | 19:26 |
fungi | er, expect to support | 19:26 |
ianw | i would say OS then service | 19:26 |
clarkb | fungi: oh isn't it like a year after release? | 19:26 |
clarkb | maybe it is a year after the n-1 release | 19:26 |
fungi | official security support ends in july 2022 | 19:26 |
corvus | 'soon' in debian time :) | 19:26 |
clarkb | fungi: aha I am not completely crazy then | 19:26 |
fungi | and then lts takes over | 19:26 |
ianw | it seems either are fairly easy to roll back | 19:27 |
fungi | the lts peeps are separate from the debian security team | 19:27 |
clarkb | ianw: ++ exactly my thinking and ya happy to do OS first as a result | 19:27 |
fungi | sort of like openstack's stable maintenance team and extended maintenance | 19:27 |
clarkb | fungi: ok I'm not sure if our python base images and the other base images enable the lts stuff or not. We don't make those | 19:27 |
clarkb | probably best to get off of buster by july 2022 then we don't have to worry about it | 19:27 |
fungi | right, that'll be the bigger concern. what is the support lifetime of the python-base image | 19:27 |
ianw | (i need to get back to the nodepool/dib image upgrades too) | 19:28 |
fungi | which may or may not be tied to debian's support timeline | 19:28 |
clarkb | fungi: those images are based on debian so there is some relationship there. I doubt they go past the lts period. But wouldn't be surprised if they end in july 2022 | 19:28 |
clarkb | it is also possible they stop building updates sooner than that. And as ianw mentions the updates seem straightforward with easy reverts, so we should go ahead and work through them | 19:29 |
fungi | the debian docker images also aren't official, at least from the debian release team's perspective | 19:29 |
fungi | so it's more about when the debian docker image maintainers want to stop supporting buster | 19:29 |
clarkb | fungi: right and they are unlikely to make new buster packages once debian stops doing so | 19:29 |
clarkb | that caps the useful life of those images to likely july 2022 | 19:30 |
clarkb | (unless they do lts) | 19:30 |
clarkb | Considering there is a vote for doing OS updates first, I guess I should plan to land those two changes above tomorrow after the openstack release is complete | 19:30 |
fungi | might be able to infer something by looking at whether/when they stopped doing stretch images | 19:30 |
clarkb | fungi: they may also just directly say it somewhere | 19:31 |
clarkb | Anyway I think we can pretty quickly work through these updates and then not worry about it | 19:32 |
clarkb | and as a side effect we'll get newer git and other fancy new software | 19:32 |
clarkb | (but git in particular should give us improvements on gitea and possibly even gerrit doing things like repacking) | 19:32 |
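(For illustration, the buster-to-bullseye moves under discussion are essentially base-image bumps of this shape; the image names and tags below are assumptions, not the literal contents of 809269/809286:)

```diff
-FROM docker.io/opendevorg/python-builder:3.8-buster as builder
+FROM docker.io/opendevorg/python-builder:3.8-bullseye as builder
-FROM docker.io/opendevorg/python-base:3.8-buster
+FROM docker.io/opendevorg/python-base:3.8-bullseye
```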
clarkb | #topic Gitea 1.15.3 Upgrade | 19:33 |
clarkb | Once the gitea OS update is done, this is the next thing I would like to do to gitea | 19:33 |
clarkb | Latest test instance: https://198.72.124.104:3081/opendev/system-config | 19:33 |
clarkb | That test instance lgtm and the logo hosting situation has been addressed with gerrit and paste et al | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/803231 | 19:34 |
clarkb | Are there any other concerns with doing this upgrade tomorrow/thursday timeframe? | 19:34 |
fungi | after the openstack release has wrapped up, i should be around to help with it | 19:34 |
ianw | no issues, similarly i can help | 19:35 |
clarkb | great and thanks | 19:35 |
fungi | also this reminds me, i want to work on getting our infrastructure donors on the main opendev.org page. now that we have apache on the gitea servers we could just serve normal page content instead of having to stuff it into gitea's main page template. would that be a better place to start? | 19:35 |
clarkb | fungi: there might be issues doing that and needing to host gitea at say https://opendev.org/gitea/ | 19:36 |
clarkb | since all of our existing links out there don't have that root | 19:36 |
fungi | we'd need apache to directly serve the donor logos anyway probably | 19:36 |
clarkb | fungi: you can have gitea serve them just like the opendev logos | 19:36 |
clarkb | they have a static content directory with what I hope are stable paths now that they moved them | 19:36 |
fungi | seems like if we configure apache to only serve that page for get requests to the / url and when there are no query parameters, that wouldn't interfere with gitea | 19:37 |
clarkb | I guess we could maybe set it up where only the gitea landing page was hosted at /gitea/ and then all other paths would keep working? That is definitely my concern with doing something like that | 19:37 |
clarkb | fungi: I think you still need a gitea landing page because gitea serves a home link | 19:38 |
clarkb | basically you either need to hack up redirects such that that continues to work or you're hacking templates either way | 19:38 |
clarkb | I don't have any objections to simply updating the home page template as a result | 19:38 |
fungi | i mean, as new content for what's served at the home url | 19:39 |
fungi | simply shadowing that one url | 19:39 |
clarkb | right, I think I prefer not relying on apache for that | 19:39 |
clarkb | since it doesn't really gain us anything and potentially complicates gitea in say k8s if we ever do that | 19:39 |
fungi | got it. i was hoping we could have a way to serve an opendev.org main page without the constraints of what the gitea homepage template can support, but we can talk about it another time i guess | 19:40 |
clarkb | I'm not sure I'm aware of what those constraints are? | 19:40 |
clarkb | I may be missing something important | 19:40 |
fungi | has to use gitea's templating, right? | 19:40 |
clarkb | "yes" you can just put what you want in there and ignore the templating stuff at the header and footer | 19:40 |
fungi | so we can't easily preview and publish that page separately from the gitea container | 19:41 |
ianw | fungi: ++ it has lightly troubled me for a while that that page is a wall of text that seems to start talking about gerrit workflows very very early. so having something more flexible is a good goal | 19:41 |
clarkb | fungi: that is true, you have to run gitea to render the header and footer and see the entire page | 19:41 |
fungi | and i guess it can't have a different header/footer from the rest of gitea | 19:42 |
clarkb | I think it can, since it explicitly includes those bits | 19:42 |
clarkb | But you'd have to use the existing templating system to make changes | 19:42 |
fungi | oh, so we could at least leave them out of the template if we wanted | 19:42 |
clarkb | yes | 19:42 |
corvus | the header seems like a good header for that site regardless | 19:43 |
clarkb | {{template "base/head" .}} and {{template "base/footer" .}} are line 1 and the last line before EOF | 19:43 |
corvus | home/explore/get started | 19:43 |
clarkb | corvus: ++ | 19:43 |
fungi | yeah, i don't object to the current header and footer, just would prefer not to be unable to extend them easily | 19:44 |
clarkb | fungi: you can extend them as well | 19:44 |
clarkb | (we do the header already) | 19:44 |
fungi | okay, anyway i didn't mean to derail the meeting | 19:44 |
corvus | if we get to a point where the header for opendev.org isn't appropriate for a gitea service then we should probably move gitea to a subdomain | 19:45 |
fungi | just noodling on how to have an actual project website for opendev as a whole rather than one specific to the code browser | 19:45 |
clarkb | corvus: ya that was my immediate reaction to what this would imply. I'm ok doing that too, but it seems like we haven't reached the point where that is necessary yet | 19:45 |
fungi | we could also have a different page for the opendev main page, but having it at https://opendev.org/ seems convenient | 19:46 |
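(One way fungi's shadowing idea could look as an Apache excerpt: serve a static page for exactly GET / with no query string, and proxy everything else through to gitea. The homepage path and the backend address are assumptions:)

```apache
RewriteEngine On
# Shadow exactly "/" (GET, no query string) with static page content
RewriteCond %{REQUEST_METHOD} =GET
RewriteCond %{QUERY_STRING} =""
RewriteRule ^/$ /homepage/index.html [L]
# Everything else goes to gitea; exclude the rewritten homepage path so
# the internal re-run of these rules doesn't proxy it too
RewriteCond %{REQUEST_URI} !^/homepage/
RewriteRule ^/(.*)$ http://127.0.0.1:3000/$1 [P,L]
ProxyPassReverse / http://127.0.0.1:3000/
```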
clarkb | Lets continue as we have a couple more things to go over really quickly | 19:46 |
fungi | yep | 19:46 |
clarkb | These next two are going to be related | 19:46 |
clarkb | #topic Upgrading Gerrit to 3.3 | 19:46 |
clarkb | We are running gerrit 3.2 today. Gerrit 3.3 and 3.4 exist. 3.5 is in development but has not been released yet | 19:47 |
clarkb | The upgrade from 3.2 to 3.3 is pretty straightforward with most of the changes being UX stuff not really server backend | 19:47 |
clarkb | Straight forward enough that we are testing that upgrade in CI now :) | 19:47 |
clarkb | The upgrade to 3.4 is quite a bit more involved and the release notes are extensive | 19:47 |
clarkb | For this reason I'm thinking we can do a near term upgrade to 3.3. Then plan for 3.4 maybe around quiet holidaying time? (or whenever is convenient, mostly just thinking that will take more time) | 19:48 |
fungi | what are the main challenges you see for 3.4? | 19:48 |
ianw | i'd be happy to do 3.3 upgrade on like my monday, which is usually very quiet | 19:49 |
clarkb | fungi: mostly just double checking that things like plugins and zuul etc are all working with it | 19:49 |
clarkb | note you can also revert 3.3 to 3.2 and 3.4 to 3.3 | 19:49 |
clarkb | so doing this incrementally keeps the reverts as small changes that we can do | 19:49 |
ianw | yeah no schema changes for either i believe | 19:49 |
clarkb | (I think you could revert 3.4 to 3.2 as well just more pieces to update) | 19:49 |
fungi | i can be around to help with the 3.3 upgrade on ianw's monday morning | 19:49 |
fungi | (my sunday evening) | 19:49 |
clarkb | there is a schema change between 3.2 and 3.3; you have to manually edit All-Users or All-Projects to do the revert | 19:50 |
clarkb | The next topic item is scheduling the project renames next week. I was thinking it might be good to do the renames on 3.2 since we have tested and done that before | 19:50 |
clarkb | however, we test the renames in CI on 3.2 and 3.3 currently so it should just work if you're talking about this monday and not a general monday | 19:51 |
clarkb | In my head I was considering late next week renames, then late week after that (week of ptg) for the 3.3 upgrade | 19:51 |
fungi | i don't mind doing the renames on 3.3 and working through any unlikely gotchas we encounter | 19:52 |
fungi | but happy to go either way | 19:52 |
clarkb | ianw: ^ when you said monday did you mean this monday or just generally your monday is good? | 19:52 |
clarkb | we can also revert 3.3 to 3.2 if necessary so I'm comfortable doing it this AU monday if we prefer | 19:53 |
ianw | i meant any monday, but the 11th does work | 19:53 |
fungi | also this coming monday (11th) is a national holiday for some in the usa and most of canada | 19:53 |
ianw | i feel like we've got about as much testing as is practical | 19:53 |
clarkb | ok in that case I think the two options we are talking about are 1) upgrade the 11th then rename the 15th or 2) rename the 15th then upgrade the 25th | 19:53 |
clarkb | sounds like everyone really likes 1) ? | 19:54 |
clarkb | do we think we need more time to announce that? | 19:54 |
fungi | the sooner we get it out of the way, the better | 19:54 |
ianw | ++ to 1 | 19:54 |
clarkb | in that case any objections to doing renames on the 15th at say 1500UTC ish fungi ? | 19:54 |
fungi | i think we can announce it on relatively short notice since we anticipate only brief outages | 19:54 |
clarkb | yup thinking we can announce both the upgrade and the renames today if that is the schedule we like | 19:55 |
fungi | sgtm | 19:55 |
clarkb | ok I'll work on drafting that announcement after lunch today and make sure we get it sent out | 19:55 |
clarkb | I think the actual act of upgrading gerrit is captured in the CI job. We'll basically land a change to update the gerrit image to 3.3. Then manually stop gerrit once docker-compose is updated, pull the image, run the init command then start gerrit | 19:56 |
clarkb | pretty straightforward | 19:56 |
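(Roughly what those manual steps could look like on the server once the image-bump change has merged; the compose directory, service name, and exact init invocation are assumptions for illustration:)

```shell
cd /etc/gerrit-compose              # hypothetical compose directory
docker-compose pull                 # fetch the 3.3 image the merged change points at
docker-compose down                 # stop gerrit while still on the 3.2 site data
# run gerrit's offline init against the existing site before starting 3.3
docker-compose run --rm gerrit java -jar /var/gerrit/bin/gerrit.war init -d /var/gerrit --batch
docker-compose up -d                # bring gerrit back up on 3.3
```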
clarkb | And we are almost out of time. | 19:56 |
clarkb | #topic Open Discussion | 19:57 |
clarkb | Is there anything else to call out in our last ~3 minutes? | 19:57 |
fungi | i plan to not be around on friday this week | 19:57 |
ianw | #link https://review.opendev.org/c/zuul/zuul-jobs/+/812272 | 19:57 |
ianw | if i could get some eyes on that; it reworks the rust install, which was something pyca/cryptography noticed | 19:57 |
clarkb | can do | 19:58 |
clarkb | fungi: enjoy the time off. I ended up not being around as much as I expected yesterday but it was fun to walk on the beach and stop at the salt water taffy shop | 19:59 |
fungi | all our salt water taffy is imported. no idea why. like our salt water isn't good enough? | 20:00 |
fungi | it's a shame | 20:00 |
clarkb | this was made onsite :) | 20:00 |
clarkb | And we are at time. Thank you everyone | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Oct 5 20:00:31 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-10-05-19.01.log.html | 20:00 |
fungi | thanks clarkb! | 20:01 |