Tuesday, 2021-06-15

03:19 *** diablo_rojo is now known as Guest2181
13:13 *** diablo_rojo__ is now known as diablo_rojo
18:58 <clarkb> Anyone else here for the meeting? We will get started in a few minutes
19:00 <ianw> o/
19:00 <fungi> ohai
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Jun 15 19:01:07 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000254.html Our Agenda
19:01 <clarkb> #topic Announcements
19:01 <clarkb> I will not be around next week. We will either need a volunteer meeting chair or we can skip
19:01 <clarkb> I'll leave that up to those who will be around to decide :)
19:02 <clarkb> #topic Actions from last meeting
19:02 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-08-19.01.txt minutes from last meeting
19:02 <clarkb> #action clarkb Followup with OpenStack on ELK retirement
19:03 <clarkb> I have not done this yet
19:03 <clarkb> #action someone write spec to replace Cacti with Prometheus
19:03 <clarkb> I have not seen a spec for this either. I assume it hasn't been done
19:03 <clarkb> ianw: did centos ppc packages get cleaned up?
19:03 <ianw> not yet sorry
19:04 <clarkb> #action ianw Push change to cleanup ppc packages in our CentOS mirrors
19:04 <clarkb> no worries i think we had a number of distractions last week
19:04 <clarkb> Let's jump in and talk about them :)
19:04 <clarkb> #topic Topics
19:04 <clarkb> #topic Eavesdrop and Limnoria
19:05 <clarkb> I wanted to call out that we had to fix a bug in limnoria to handle joins to many channels properly
19:05 <clarkb> This morning fungi discovered that limnoria doesn't seem to aggressively flush files to disk, but there is a config option we can toggle to have it do that
19:05 <ianw> hrm i'm pretty sure i turned that on
19:05 <fungi> we don't know for certain this will fix the observed behavior
19:05 <clarkb> And gmann was asking about the ptg.openstack.org etherpad lists which were/are hosted on eavesdrop01.openstack.org
19:06 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/796513/ Limnoria flush channel logs
19:06 <fungi> yeah, those are in ptgbot's sqlite database
19:06 <clarkb> On the whole we seem to be tackling these issues as they pop up so I'm not super worried, but wanted to call them out in case people want to look into any of them
19:06 <fungi> the channel log flushing i'm not so sure is what we think it is. i watched some channels updating rather readily, while others it decided to not flush the log files for a day or so
19:07 <clarkb> on a related note at ~00:30UTC today freenode killed itself and split into a new freenode network with no user or channel migration
19:07 <clarkb> if there was any question about us making the right choice to move I think that is settled now.
19:08 <ianw> https://review.opendev.org/c/opendev/system-config/+/795978 is what i was thinking of
19:08 <ianw> re flushing
19:09 <clarkb> ah a different flush setting
19:09 <ianw> ohhh, that's flushing the config file
19:09 <clarkb> hopefully fungi's change sorts this problem out
19:09 <clarkb> yup
19:09 <ianw> yeah, ++
19:09 <fungi> i'm not convinced, but we'll see
19:09 <mordred> today's freenode-splosion is one of the most fascinating things to have happened in a while
19:10 <clarkb> fungi: I doubt it will hurt anything at least so seems safe to try
19:10 <fungi> agreed
19:10 <clarkb> also upstream has been super responsive which means if we can find and fix bugs pushing back upstream is worthwhile
19:10 <clarkb> alright, anything else on the topic of IRC and IRC bots?
19:10 <fungi> the pattern of what it was writing to disk and what it had seemingly decided to just no longer flush at all was not consistent
19:11 <ianw> templates/supybot.conf.erb:supybot.plugins.ChannelLogger.flushImmediately: False is in the old config
19:11 <fungi> but maybe there was more going on behind the scenes with fs write caching
19:11 <clarkb> fungi: it is running on a relatively new kernel inside a container with bind mounts too
19:12 <fungi> yeah
19:12 <ianw> there's also supybot.debug.flushVeryOften: False
19:12 <fungi> so lots of things can have changed under the covers
19:12 <fungi> supybot.debug.flushVeryOften seems to be about flushing its debug logs
19:12 <ianw> "automatically flush all flushers"
19:13 <fungi> which i figured was independent from channel logging
19:13 <fungi> but who knows how many toilets it flushes
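
[The fix under review in 796513 boils down to a one-line toggle in the bot's registry file; a minimal sketch, assuming the key name quoted from the old puppet template above carries over unchanged to the new deployment:

    supybot.plugins.ChannelLogger.flushImmediately: True

If the setting behaves as its name suggests, ChannelLogger writes each line through to the channel logfile as it is logged rather than leaving it sitting in file buffers until some later flush.]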
19:13 <clarkb> we don't need to debug it in the meeting :) just want to make sure it is called out as a problem with a potential fix pushed up
19:13 <ianw> sorry the ptg thing, is there something to migrate off the old server?  i'm not super familiar with that bit
19:14 <clarkb> I am not super familiar with it either. For some reason I thought the foundation was hosting that site
19:14 <clarkb> but dns said that was wrong when I asked dns about it
19:14 <fungi> ptgbot's ptg.openstack.org website was served from the old eavesdrop server, and the bot maintained some state in an sqlite file (mainly for things people set/updated via irc messages)
19:14 <ianw> hrm so it's some sort of cgi?
19:14 <clarkb> fungi: was the site cgi/wsgi then as part of the bot install?
19:15 <fungi> gmann was looking for the list of team etherpads from the last ptg, which the ptg.o.o site would still have been serving from data in ptgbot's sqlite db
19:15 <fungi> yeah, the puppet-ptgbot module handles all of that
19:16 <fungi> in theory it should get rewritten as ansible(+docker) including the website configuration
19:16 <clarkb> in that case I guess we can grab the sqlite db file then query against it until the site is up again if people need info from it?
19:16 <fungi> yeah, that was my thinking
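
[A minimal sketch of that offline approach, assuming the db file has been copied off the old server; the filename is hypothetical and the schema isn't documented in this discussion, so list the tables before querying:

    # inspect the schema first, since it isn't documented here
    sqlite3 ptgbot.db ".tables"
    # then pull whatever table holds the etherpad list, e.g. (table name hypothetical):
    sqlite3 ptgbot.db "SELECT * FROM etherpads;"
]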
19:16 <clarkb> fungi: yup and diablo_rojo volunteered to look at it starting next week
19:16 <diablo_rojo_phone> Yep!
19:16 <clarkb> *to look at the ansible (+docker) bits
19:16 <fungi> but also archive.org might be indexing that site, in which case there could be a list we can point people to there in the meantime
19:17 <diablo_rojo_phone> Almost down to that section of my to-do list. Probably by tomorrow.
19:17 <fungi> we can't confirm whether archive.org has an old copy until their power maintenance is over though
19:18 <ianw> it sounds like i should probably leave it alone then, but happy to help migrate things etc.
19:18 <ianw> it looks like possibly it's a javascript thing.  i'm not seeing cgi/wsgi
19:18 <fungi> yeah, mainly let's just not delete the old server yet
19:18 <clarkb> ya I think worst case we'll look at the instance disk that is shutdown and/or backups and pull the data off
19:18 <clarkb> but waiting a few days is probably fine too
19:18 <ianw> yeah it's only shut down, essentially to avoid accidentally restarting the daemons twice
19:20 <fungi> cool, thanks for confirming
19:20 <clarkb> sounds like that may be it for this topic. Let's move on.
19:20 <clarkb> #topic Gerrit Account Cleanup
19:20 <clarkb> Has anyone had a chance to look at this info yet? I think I need to go through it again myself just to page context back in. But it would be nice to disable more accounts when we have had people take a look at the lists so that we can let them sit for a few weeks before permanently cleaning them up
19:20 <fungi> i've lost track of whether there was something which needed reviewing on this, sorry
19:21 <clarkb> ya there is a file on review in my homedir. Let me dig it up
19:21 <fungi> i'll try to look through it after dinner, apologies
19:22 <clarkb> ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 I think
19:22 <clarkb> but I need to repage things in myself too
19:22 <clarkb> anyway if you can take a look that would be helpful
19:22 <clarkb> #topic Server Upgrades
19:23 <clarkb> I have not made progress on the listserv upgrade testing as I have been distracted by things like server reboots and irc and all the things
19:23 <clarkb> it is still on my list but at this point I likely won't make progress on this until after next week
19:23 <clarkb> ianw: I think you have been making progress with review02. Anything new to report?
19:24 <ianw> i have just approved the container db bits that you've been working on and will monitor closely
19:24 <fungi> mnaser mentioned that the server got rebooted due to a host outage, so double-check things are still sane there i guess
19:24 <clarkb> fungi: ya that also took care of the reboot I was going to do on it :)
19:24 <clarkb> ianw: sounds good, thanks for pushing that along
19:24 <ianw> after confirming that doesn't do anything to production, i will apply it to review02 and get the server mounting its storage and ready
19:25 <ianw> i think we'll be very close to deciding when to sync data and move dns at that point
19:25 <clarkb> exciting
19:25 <ianw> i also had something to up the heap
19:25 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/784003
19:25 <clarkb> And then after that we can resurrect the gerrit 3.3 and 3.4 changes (there are some threads about 3.4 and ssh host key problems, but 3.3 looks like it should be good for us at this point)
19:27 <clarkb> any other server upgrade notes to make?
19:28 <clarkb> #topic Draft Matrix Spec
19:28 <clarkb> corvus: did you want to introduce this topic?
19:28 <corvus> oh hi
19:28 <corvus> incoming prepared text dump:
19:28 <corvus> first i want to share a little update:
19:28 <corvus> i spent some time today talking with folks from the ansible, fedora, and gnome communities, all of which have serious plans to adopt matrix (they either have a homeserver or have plans to).
19:28 <mordred> I was there too
19:29 <corvus> #link gnome matrix spaces https://discourse.gnome.org/t/experimenting-with-matrix-spaces/6571
19:29 <corvus> #link gnome matrix sovereignty https://blog.ergaster.org/post/20210610-sovereignty-federated-system-gnome/
19:29 <corvus> #link fedora irc/matrix https://communityblog.fedoraproject.org/irc-announcement/
19:29 <corvus> #link fedora matrix plan https://discussion.fedoraproject.org/t/matrix-server-channel-setup/29844/7
19:29 <corvus> #link ansible matrix plan https://hackmd.io/FnpIUIrrRuec-gT3lrv-rQ?view#Current-plan-as-of-2021-06-14
19:29 <corvus> so we've got some really good company here, and people to collaborate with as we figure stuff out.
19:29 <corvus> just today i've learned way too much to share here in full, but in short: there are even more options for managing transitions from irc to matrix (including ways to take full admin control of the existing matrix.org portal rooms, rename those to :opendev.org rooms, and either maintain or retire the bridge at any point).  all of that to say that we'll make some choices that are appropriate for zuul, but there are other choices that may be more appropriate for other opendev projects which are equally valid.
19:29 <corvus> no matter what we do, the next step is for opendev to host a homeserver, so, on to the topic at hand, i uploaded a spec: https://review.opendev.org/796156
19:29 <corvus> and there are 2 big questions from my pov:
19:29 <corvus> 1) does opendev want a homeserver?
19:29 <corvus> 2) if so, does opendev want to run one itself or pay EMS to run it?
19:29 <corvus> and so, what order do we want to answer these questions, and how do we want to decide the second one?
19:29 <corvus> (fwiw, i advocate for OIF paying EMS to host the homeserver)
19:30 <corvus> [eof]
19:30 <mordred> in the realm of "learned too much"...
19:30 <mordred> I recommend very strongly reading the matrix sovereignty post above
19:30 <clarkb> for 2) I'm pretty strongly in the "have someone else run it if at all possible" camp
19:30 <fungi> any feel for what the recurring opex is on paying ems to host a homeserver?
19:31 <mordred> it made me pretty well convinced that there are way more sharp edges around having a homeserver that also has user accounts
19:31 <mordred> than value
19:31 <fungi> just wondering what additional research we need to do there before appealing to have funds applied
19:31 <corvus> fungi: could be as little as $10/mo.  i think it's wise for someone from the oif to talk to them and determine if that's appropriate.
19:31 <mordred> and so I think a homeserver that just hosts rooms and not users is what we'd be wanting
19:32 <clarkb> mordred: does that article cover the advantages of running a homeserver in that case without user accounts? eg why not just use matrix.org in that case?
19:32 <mordred> the $10/mo actually technically could do that - but we might be weasel-word reading the intent of that price tier - so we should likely talk to them
19:32 <mordred> clarkb: it does ... but I can summarize real quick
19:32 <mordred> if we have an opendev.org homeserver then we have control over things that brand themselves as being opendev channels
19:33 <clarkb> got it, it's about channel management then. Makes sense
19:33 <mordred> so someone can be sure that #zuul:opendev.org is the zuul channel hosted by opendev - whereas #zuul:matrix.org might or might not have any relationship to us
19:33 <mordred> yeah
19:33 <mordred> we would also have a few more integration possibilities
19:33 <mordred> allowing us to think about things like logging bots slightly differently - or not, we could still use bots
19:33 <clarkb> ya and integration with other chat systems
19:33 <fungi> what is the process for moving herds of users from one matrix channel to a replacement channel? like could we use a matrix.org channel and later "forward" that to an opendev.org channel?
19:34 <corvus> (or slack bridges....)
19:34 <corvus> fungi: one of the options is to actually just rename the channel :)
19:34 <fungi> so "renames" are a thing then
19:35 <clarkb> and I guess the background on this topic is that Zuul has decided they would like to use matrix for primary synchronous comms rather than irc
19:35 <corvus> i just learned (moments ago) that's actually a possibility for the oftc portal rooms!
19:35 <fungi> and #zuul:matrix.org could be "renamed" to #zuul:opendev.org?
19:35 <corvus> well, there's no #zuul:matrix.org to my knowledge; i have no intention of creating any rooms :matrix.org
19:35 <clarkb> Looking at Element, pricing is done per user. The $10 option is for 5 users. I suspect we'd end up with ~500 users at any one time?
19:36 <mordred> clarkb: nope.
19:36 <mordred> clarkb: we'd just have rooms
19:36 <mordred> users would not log in to our homeserver
19:36 <clarkb> oh I see
19:36 <corvus> fungi: but if there were, we could rename that room.  more to the point, we can rename the `#_oftc_#zuul:matrix.org` portal room, with some help from the matrix.org admins.
19:36 <clarkb> there would be ~500 users interacting with the channels on that homeserver but none of them would be opendev.org users
19:36 <mordred> (email winds up being an excellent analogy fwiw)
19:36 <mordred> yah
19:36 <fungi> yeah, if we need to have foundation staff talking to matrix about pricing, we probably should be clear on what wording is relevant
19:37 <corvus> we may want a handful of admin/bot accounts, that's it.  5 accounts is the right number to be thinking of.
19:37 <clarkb> corvus: got it
19:37 <mordred> exactly - that's where I'd want the EMS folks to be cool with our intended use
19:37 <mordred> but also - it's other homeservers that would be federating with it
19:38 <fungi> so my user might be fungi:yuggoth.org if i decide to run my own homeserver
19:38 <mordred> yah
19:38 <fungi> which wouldn't count against the opendev.org user count
19:38 <clarkb> fungi: yup. I'm currently Clark:matrix.org or something
19:38 <corvus> otoh, it's like "hey we're going to increase your load on matrix.org by X hundred/thousand users".  they may be ":(" or they may hear "we're going to expose X hundred/thousand more people to matrix technology" and be ":)".  i dunno.
19:38 <fungi> right, i have fungicide:matrix.org because fungi was taken by someone else cool enough to like the name
19:39 <clarkb> if I understand correctly what we would want to ask about is whether or not the $10/month (or maybe even the $75/month) options fit our use case of running a homeserver where the vast majority of users are authenticating with their own homeservers or matrix.org
19:40 <clarkb> The hosted homeserver would primarily be used to set channel ownership and manage those channels
19:40 <corvus> ++
19:40 <mordred> yup
19:40 <clarkb> fungi and I should probably go read the spec and bring that up with OIF then
19:41 <clarkb> and then based on what we learn we can bring that feedback to the spec
19:41 <corvus> that sounds like a great next step -- you can do that, and then we can revise the spec to only include one hosting option
19:41 <fungi> clarkb: that would be great, happy to help in that discussion, and we can certainly involve anyone else who wants to be in on those conversations too
19:42 <mordred> I think corvus and I would be happy to chat with our OIF friends if that would be helpful
19:42 <mordred> you could tell sparky that he's welcome to come to Pal's and talk with me about it there
19:42 <corvus> yes, i am fully prepared to be a resource as needed :)
19:42 <clarkb> sounds good. I'll try to get started on that when I get back on Thursday
19:43 <clarkb> Anything else to bring up on the subject of Matrix? Or should we see where we end up after talking to OIF?
19:43 <fungi> is Pal's a bar?
19:43 <corvus> i think that's good for me
19:44 <clarkb> #topic arm64 cloud status
19:44 <clarkb> This wasn't on the agenda but it should've been so I'm adding it :{
19:44 <clarkb> er :P
19:44 <fungi> chef's choice
19:45 <clarkb> When I rebooted servers the osuosl mirror node did not come back with working openafs. Googling found that ianw had run into this in the past but I couldn't find out how we got past it previously. For this reason we ended up disabling osuosl in nodepool
19:45 <mordred> fungi: yes
19:45 <fungi> more specifically, it's throwing a kernel oops in cache setup
19:45 <clarkb> since then we've discovered that linaro has a bunch of leaked nodes limiting our total capacity there. That cloud is functioning just not at full capacity. I have emailed kevinz with those details
19:45 <ianw> sorry i must have missed this
19:46 <clarkb> I expect kevinz will be able to clean up the nodes i listed as leaked and we'll be back to happy again in linaro. But I'm not sure what the next steps for us in osuosl are
19:46 <fungi> ianw: it's partly my fault for being so scattered i forgot to mention it
19:46 <ianw> the usual case i've found is that the /var/cache/openafs is corrupt, and removing it helps
19:46 <clarkb> ianw: no worries. I think I remember from your initial query to the openafs list that this is focal specific. I suppose one option is to downgrade to bionic on the mirror
19:46 <fungi> we've tried a few things there, clearing the old cache files, reformatting and even recreating the cache volume in case it was a block level problem, manually loading the lkm before starting afsd...
19:46 <clarkb> ianw: we've cleared out the cache multiple times without it helping unfortunately. fungi even completely replaced the cinder volume that backed it
19:47 <fungi> yeah, still the same oops every time we try to start up afsd
19:47 <ianw> do you have a link to the oops, i can't even remember sorry
19:48 <clarkb> openafs upstream mentioned that 1.8.7 should include the expected fix
19:48 <clarkb> let me see if I can find it in scrollback
19:48 <fungi> i can scrape it from dmesg on the server, sure
19:48 <ianw> anyway, we can debug this today
19:48 <clarkb> ianw: https://www.mail-archive.com/openafs-info@openafs.org/msg41186.html should match the dmesg if I got the right thing
19:48 <clarkb> ianw: that would be great. Thanks!
19:49 <fungi> #link http://paste.openstack.org/show/806651 openafs kernel oops
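
[For the record, the cache reset fungi describes above amounts to something like the following; a rough sketch, assuming stock Ubuntu openafs-client packaging and the default /var/cache/openafs location ianw mentioned:

    systemctl stop openafs-client    # stop afsd before touching the cache
    rm -rf /var/cache/openafs/*      # discard a possibly-corrupt cache
    modprobe openafs                 # load the kernel module by hand first
    systemctl start openafs-client   # on this host afsd still oopses during cache setup
]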
19:49 <clarkb> #topic Gerrit Project Renames
19:49 <clarkb> fungi: do we have a change to update the playbook for this yet?
19:49 <fungi> i have not, no
19:49 <clarkb> ok, let's skip it for now then
19:49 <fungi> meant to do it late last week
19:49 <fungi> sorry!
19:49 <clarkb> no worries. It has been a fun few weeks
19:49 <clarkb> #topic Open Discussion
19:50 <clarkb> Is there anything else to talk about?
19:52 <clarkb> Sounds like that may be it. Thank you everyone!
19:52 <clarkb> #endmeeting
19:52 <opendevmeet> Meeting ended Tue Jun 15 19:52:57 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:52 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-15-19.01.html
19:52 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-15-19.01.txt
19:52 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-06-15-19.01.log.html
19:54 <fungi> thanks clarkb!
