Tuesday, 2022-10-11

*** kopecmartin|sick is now known as kopecmartin08:08
clarkbmeeting time19:00
ianwo/19:00
fungiohai19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Oct 11 19:01:04 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
frickler\o19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-October/000364.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbThe PTG is happening next week. Be aware of that as we make changes (best to avoid updating meetpad and etherpad for example)19:01
clarkbAlso encourage people that have problems to reach out to us directly. In the past we've gotten reports of trouble via a game of telephone and that has been difficult19:02
clarkbAlso, this morning I spilled a glass of water on my keyboard so I've been in disarray all day. Thankfully it was the desktop keyboard and not the laptop and I've got a spare Model M that has been plugged in19:03
clarkbbut still it's really weird to type on a new keyboard after having that one for a decade19:03
fungithe zuul default ansible version changed to 6 across all of our tenants as of late last week, and we'll be dropping ansible 5 support "real soon" so fixing related problems is time-sensitive19:03
fungiprobably warrants another announcement if we can nail down a timeline for the ansible 5 removal change19:04
clarkbI did at least warn of the ansible 5 removal in the email announcing the switch to 6 by default19:04
fungiyep19:05
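For reference, a job that is not yet ready for Ansible 6 could be pinned with Zuul's `ansible-version` job attribute while it is being fixed. This is only a sketch: the job name is hypothetical, and the pin stops being valid once Ansible 5 support is removed, so it is a stopgap rather than a fix.

```yaml
# Hypothetical job definition; only the ansible-version attribute is the point here.
- job:
    name: example-legacy-job      # assumed name, not a real OpenDev job
    parent: base
    # Temporary pin while playbooks are updated for Ansible 6.
    # This must be removed before Zuul drops Ansible 5 support.
    ansible-version: 5
```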
clarkb#topic Topics19:06
clarkb#topic Bastion Host Updates19:06
clarkbThe changes to stop writing console log files on bridge landed yesterday. Looks like there was a small issue getting the flag name correct. ianw do we have an idea yet if that is working as expected?19:07
fungiand we still need a similar one for static.o.o right? or is that already up?19:08
ianwi just checked and i think so.  there hasn't been a log file written since Oct 11 06:09 (UTC) which is about when all the periodic jobs cleared out19:08
ianwstatic has been done the same way, so it looks good too19:09
clarkbawesome. That's one less thing to worry about now :)19:09
ianwheh yes, thank you for reviews.  i think it was good to reach a bit more of a generic solution19:10
ianwthe "tunnel console things via a socket and the ssh connection" changes are another option that is still on my todo list, and seems like a great thing to look into as well19:10
ianwone day ... :)19:11
clarkbya though I think we don't want to expose that on bridge or static due to how the protocol works19:11
clarkbsince it could be used to read other files?19:11
clarkbI think what we've done not exposing it is correct for us19:11
fungiis intended for reading additional files19:11
fungidesigned with that in mind anyway19:11
clarkbThe other stack of changes in flight here has to do with ansible in a venv19:12
clarkb#link https://review.opendev.org/q/topic:bridge-ansible-venv19:12
corvuser... the current protocol shouldn't be able to read other files?19:12
clarkbcorvus: ya I don't think it does, but that was always part of the intention iirc. And I don't want to have to think about undoing any opening of things if that changes19:13
fungii know it was at least a theoretical future use case19:13
corvusclarkb: yes, it was originally designed for that, but we haven't implemented it because we haven't figured out how to do it safely19:13
clarkbgotcha19:13
corvusso if that's the concern -- just future-proofing cool.  but if there was a thought that we had a current vulnerability... then i would like to explore that more.  :)19:14
clarkboh no I don't think we have a current vulnerability. We're set up to avoid it should the zuul behavior change to what was (I thought anyway) the intended behavior19:14
corvus(and there's really 2 protocols here -- there's the websocket/finger protocol of user -> zuul, and the internal protocol of executor -> node; the former is the one we designed to allow reading other files in the future, and the second is what we just changed)19:15
corvus(though support for the former probably would need changes like the latter)19:16
corvusokay, so i think we're all on the same page that currently there is not the ability to read arbitrary files, but that we like the status quo of explicitly disabling log streaming on bridge because, among other things, that future-proofs us against eventually adding that feature.  ya?19:16
clarkb++19:17
corvuscool, thx and sorry for the diversion.  just wanted to make sure we didn't open something we didn't intend to.  :)19:17
clarkbianw: for ansible in a venv did you manage to sort out using the first member of a singleton group as the hosts specification?19:18
ianw(specifically i was talking about https://review.opendev.org/c/zuul/zuul/+/542469 but let's not go into that further now :)19:18
ianwclarkb: thanks ... one step back i just approved the change reviewed by yourself and fungi to move the production ansible into a venv on the current bridge.  so i'll watch that in today.  that's the "venv" bit of it really19:19
fungiand that preps us for being able to use newer ansible, right?19:20
clarkbya and in theory that should just switch over due to symlinking the venv install over to ansible19:20
clarkb(that was my thought during review anyway)19:20
ianwyep, *in theory* it's a noop :)19:20
clarkbfungi: sort of, we need to upgrade the python installation too (which is where the replacement node comes in and why the other group work is related)19:20
ianwthe bits on top now are about upgrading to jammy, and abstracting the way we address the bastion host so we can switch the host more easily -- in this case to probably bridge01.opendev.org19:20
ianwanyway, i did establish that as a playbook matcher "groupname[0]" does seem to work to address the first member of a group19:21
corvuslike `- hosts: bridgegroup[0]` means this is a play that runs on the first host in the bridge group?19:22
corvus(er, in the group named "bridgegroup"; i was trying to be clear and may have failed :)19:22
fungiand group member ordering is guaranteed deterministic (uses the order in which the members are added i guess) right?19:22
clarkbya the idea being we can control what the bridge is in a single place (the bridgegroup group) but then only ever have a single entry in that group19:22
ianwyep -- https://review.opendev.org/c/opendev/system-config/+/85847619:22
clarkbfungi: the idea is that it would be a singleton group19:22
clarkbbut to enforce that we would take the first entry everywhere19:23
fungii see19:23
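A minimal sketch of the pattern under discussion, assuming an inventory group literally named `bridgegroup` as in corvus's example (the real group name in system-config may differ) and a single member taken from ianw's mention of bridge01.opendev.org. The `groupname[0]` subscript is standard Ansible host-pattern syntax for the first host in a group.

```yaml
# inventory.yaml (illustrative): a singleton group holding the bastion host
all:
  children:
    bridgegroup:                  # assumed group name
      hosts:
        bridge01.opendev.org:
---
# playbook.yaml (illustrative): runs only on the first member of the group,
# so the play still targets exactly one host even if a second member is added.
- hosts: bridgegroup[0]
  tasks:
    - name: Show which host was selected
      ansible.builtin.debug:
        msg: "Running on {{ inventory_hostname }}"
```

Swapping the bastion host then means changing the group membership in one place rather than editing every play that addresses it.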
corvuswhy not just let it run on the whole group of 1?19:23
clarkbthe reason I was concerned with that is it makes the ansible really confusing when you need to address a specific node19:23
clarkblike when grabbing the CA files19:23
clarkbthe ansible you express becomes "create a different CA on every member of the bridge group, but only distribute the CA files for the first group member"19:24
clarkbif others prefer that I'm ok with that too, but I found it a bit confusing to read when I reviewed it19:24
ianwcorvus: one problem i haven't dealt with yet is playbooks/bootstrap-bridge.yaml.  that runs both under zuul, where the inventory is setup via the job, and in infra-prod, where the inventory is setup by opendev/base-jobs19:25
corvusi'm not sure whether or not i would have the same confusion, but i certainly see your point, and the solution seems good.  now that i know the reasoning, i can be on board with that.19:25
ianwso basically both have to agree on the name/group.  this is a bit annoying for clarkb's note of trying to use a different group name for the initial setup bastion host, and the production version19:26
ianwsorry, that wasn't intended for corvus: ... :) 19:26
clarkboh hrm if using distinct groups for the top level ansible and nested ansible in CI is problematic I think we can just not do that19:26
corvusoh whew cause that's a hard question and i was struggling with that.  glad i'm off the hook.  :)19:27
clarkbit was an idea I had when trying to sort out why the job needed to redefine the group19:27
ianwyeah, it is mostly explained in the comment at https://review.opendev.org/c/opendev/system-config/+/858476/9/zuul.d/system-config-run.yaml19:27
ianwanyway -- i will keep at it and see what we can come up with; i don't think we need a solution now19:28
corvusintuitively, having the group name be the same makes sense to me... so if that's a workable/livable option i would be in favor of that.19:28
ianwi think that's where i'm coming back to as well ...19:29
corvusand maybe keep a version of that comment explaining that we're using that as a group for the zuul playbook19:30
fungisounds good to me19:30
clarkbwfm19:30
ianwyes i will definitely do my usual probably-too-verbose commenting on all this :)19:30
ianwanyway, I think it's quite likely by this time next meeting we'll have a fully updated bridge, and an easier path when we want to rotate it out next time as well19:32
clarkbsounds good. Thank you for working through all the little details of this19:32
clarkb#topic Upgrading Bionic Servers19:32
clarkbThe expected fix for removing the ubuntu user has landed. Now just need to try booting a jammy control plane server again. I'm hoping to give that a go sometime this week.19:33
clarkbSounds like ianw may also give it a go19:33
clarkbBut other than that I didn't have any new updates here19:33
fungiwe'll want it before we boot the new listserv at the very least19:33
clarkbyup I was thinking I'd find something easy to replace as a guinea pig like a mirror maybe19:35
clarkbbut probably not until the end of this week19:35
clarkbLet's keep moving as the last topic on the agenda is one that deserves discussion before we run out of time19:36
clarkb#topic Mailman 319:36
clarkbfungi has edited the extra long strings on the production mailman2 site and has begun the process of copying data for reattempting the mm3 migration on a newly held test node with our forked images19:36
funginew held node for this is 149.202.168.204, built from your container image fork19:37
fungiwill hopefully kick off a new scripted import on it within the next hour or so19:37
fungidepending on how much longer the rsync runs19:37
clarkbcorvus: we noticed that a child change of https://review.opendev.org/c/opendev/system-config/+/860157 doesn't find the images that change builds. And were wondering if we got the bits wrong for telling zuul about the image19:37
clarkbcorvus: maybe if you get some time you can take a look at how the new image build jobs and system-config-run-mailman3 job are hooked up with the buildset registry and provides/requires and dependencies19:37
clarkbwe've worked around it by forcing the node hold change to rebuild the images itself19:38
clarkbfungi: anything else you need from the rest of us? I expect it is largely just a wait for test results though19:38
fungiwe've knocked out about all the remaining todo items, so we're probably ready to talk scheduling for lists.opendev.org and lists.zuul-ci.org production migrations19:39
corvusclarkb: let's continue that in #opendev19:39
fungii did want to check a few more urls for possible easy/convenient redirects (things like list description pages which people tend to link in various places)19:39
fungistuff not covered by keeping a copy of the pipermail archives hosted from the new server19:40
clarkbcorvus: yup don't need to solve that here19:40
clarkbfungi: good idea, the existing redirects are probably not much help though as they redirect to content on disk but you probably want url redirects to mm3 urls for those19:40
fungiright. i think the list description pages are probably the only thing we really care about redirects for19:41
fungithe list indexes for the sites are just served from the root url of each vhost anyway19:42
fungiand i'm not too worried about redirecting old admin and moderator interface urls19:42
clarkbmakes sense19:42
clarkbanything else on mm3 before we continue?19:42
fungiwe should probably also confirm whether we want local logins for users or whether there's a desire to hold this for keycloak integration in order to avoid local credentials in mailman19:43
fungii'm assuming we'd rather get the mm3 migration done and then look at keycloak integration after the fact, but just want to be sure everyone's on the same page there19:44
clarkbyou can subscribe to lists without creating a user (I did this with upstream mm3)19:44
fungicorrect19:44
clarkbwe might even encourage users to do that if they never want to use the web ui for responding to things19:44
clarkbbut ya I wasn't too worried about a future switch over19:44
ianwjust off the top of my head, it feels like if we allow local logins and then move to a more generic keycloak, we then have the problem of having to merge the local users too?19:44
fungilist admins/moderators will need accounts though, and if someone wants to adjust their subscription preferences they'll need a login19:44
clarkbianw: yes we'd likely need to do that. The good thing is we should have email on both sides to align them at least19:45
fungiianw: we'll have that either way. subscribers technically all have accounts, they just don't necessarily have login info for them unless they go through the password reset19:45
ianwahh ok19:46
frickleris the login per list or per site or per installation? for mm2 it was per list iiuc19:46
clarkbfrickler: it's per installation19:46
fungifrickler: right, for mm3 it's system-wide19:47
fungiso not just all lists on a given site, but all mailman sites on that server19:47
fungiconvenient for folks who interact with a lot of lists, especially across multiple domains on the same host19:47
fricklerso if this is needed to set e.g. digest mode, I think we cannot delay it into the future19:49
fungianyway, i didn't have anything else. we can mull that over, i expect we'll start doing migration scheduling after the ptg19:49
fungifrickler: correct19:49
fungibasically the options are 1. wait to migrate lists to mm3 until we have keycloak in production the way we want, or 2. migrate to mm3 and then integrate keycloak later and make sure accounts can be linked/merged as needed19:50
clarkbright, I think some users will still need to create accounts, but a good chunk of them shouldn't need to which helps simplify things if we want to try and keep them simple like that19:50
clarkbI'm fine with 219:50
fricklerack19:50
fungiwell, to reiterate, the accounts are precreated, whether the users have login info for them or not19:50
clarkbfungi: for all users?19:51
clarkbI guess the migration doesn't stick to not creating an account if it doesn't need to19:51
fungiif they're referenced in a config (admin, mod, existing subscription) then the import process creates their accounts. if they subscribe later an account is created the first time they do so19:51
clarkbanyway I think it's fine to migrate them later since in this case we should have the info needed to make associations19:51
clarkbalso the mailing list is the sort of thing that can probably safely not have single sign on forever19:52
clarkbwe are running out of time and I do want to get to the last item on the agenda19:53
clarkbwe can return to this in #opendev if necessary19:53
fungiplease do19:53
clarkb#topic Updating OpenDev's base job nodeset to Jammy19:53
clarkbIt has been pointed out that OpenDev's base job nodeset is still Focal. Jammy has been out for about half a year now and has a .1 release. It should be stable enough for our jobs19:53
clarkbBut that opens questions about how we want to communicate and schedule the switch19:54
frickleryes, I came across that while looking to upgrade devstack jobs19:54
clarkbI was thinking that we should avoid changing it before the PTG since that will just add a distraction during PTG week. But maybe we can do it the week after ish? Basically do a 2 week notice to service-announce and then swap?19:54
fungiopenstack is actively switching from focal to jammy for testing now that their zed release is done19:54
fricklerI think we'd want to run some tests with base-test before discussing details of scheduling?19:54
clarkbfrickler: in the past we've done that (when the infra team managed this all for openstack) and the problem with that is it sets the expectation that we are responsible for making it work for every job19:55
clarkbIt was the xenial switch or maybe trusty switch that made me never want to do that again.19:55
clarkbI think people should test what they are interested in and be explicit where they know they need to be (say for specific versions of python).19:56
fricklerstill we'd need to change base-test in order to allow for that?19:56
frickler#link https://review.opendev.org/c/opendev/base-jobs/+/860686 would be the change for that19:56
clarkbfrickler: no, any job can select the jammy nodeset19:56
fungianything inheriting from our default nodeset which breaks when we change it has the option of overriding the nodeset it uses to the earlier value anyway19:56
fungijust as it can be adjusted to use the new value before our planned transition date19:57
fricklerhmm, true that19:57
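As a sketch of what clarkb and fungi describe, a project can set the nodeset on its own jobs in either direction. The job names below are hypothetical, and the nodeset names `ubuntu-jammy` and `ubuntu-focal` are assumed to match the nodesets defined in opendev/base-jobs.

```yaml
# Opt in to Jammy before the default changes, to find breakage early.
- job:
    name: example-unit-tests          # assumed job name
    parent: base
    nodeset: ubuntu-jammy             # assumed nodeset name

# Or stay on Focal after the default changes, if a job is not ready yet.
- job:
    name: example-functional-tests    # assumed job name
    parent: base
    nodeset: ubuntu-focal             # assumed nodeset name
```

Because the override lives in each project's own job definitions, no change to base-test is needed for projects to try Jammy ahead of the switch.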
clarkbI think updating base-test is a good idea to keep it in sync with base. But I don't think that is the method for testing this. base-test is for testing the roles in base19:57
clarkbwe know they work on jammy because projects like zuul already use jammy19:57
clarkbso we don't need to test that base functionality19:58
corvusi agree i don't think this needs a base-test cycle since we know that the change won't break all jobs (because we can and have made the change explicitly elsewhere, and zuul performs syntax validation on the change)19:58
fungiin my mind, the main questions are when do we plan to switch it/how much advance notice do we want to provide users19:58
clarkbfungi: ++19:58
clarkbI think we should wait for after the PTG at the very least19:58
fungiwait for after the ptg to announce it, or for actually changing it?19:59
clarkbactually changing it. Ideally we should announce whatever we decide on real soon now19:59
frickler2 week notice should be fine then. announce now19:59
fungisounds good to me19:59
ianw++19:59
clarkbcool. I can work on a draft for service-announce after lunch today20:00
clarkb(I'm happy to send that as I think most others get moderated)20:00
fungihowever, we should be mindful of the zuul dropping ansible 5 situation as well, and whether we want those to coincide, or be announced together, or not compete20:00
clarkbdropping ansible 5 has already been announced but without a hard date. I think it was a week or so from today that zuul had planned to drop ansible 520:01
frickleragree, having a couple of days between them will help in distinguishing failure causes20:01
clarkbwe will need to manually restart zuul to pick up that change quicker than our weekly restarts. But that is easy to do20:02
clarkb(also I don't think anything is using ansible 5 so should be an easy switch)20:02
clarkbI'll work on a draft email for all that in a bit20:02
fungithanks!20:02
clarkband we are at time20:02
ianwfor mine i think it probably gets confusing to combine them as a single change, as they're not really related as such, so agree with doing separately20:02
clarkbthanks everyone20:02
corvusthanks clarkb 20:02
clarkbfeel free to continue discussion over in #opendev20:03
clarkb#endmeeting20:03
opendevmeetMeeting ended Tue Oct 11 20:03:05 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:03
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.html20:03
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.txt20:03
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.log.html20:03
fungithanks clarkb!20:03
clarkbI'm also trying to decide how hard I should try and rescue that keyboard20:03
clarkbI have to rip rubber feet off the bottom to get at the screws then somehow workaround some clips....20:03
clarkbbut one thing at a time20:03
