19:01:29 <clarkb> #startmeeting infra
19:01:30 <openstack> Meeting started Tue Oct 6 19:01:29 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:33 <openstack> The meeting name has been set to 'infra'
19:01:42 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-October/000102.html Our Agenda
19:01:49 <clarkb> #topic Announcements
19:02:10 <clarkb> PTG and Summit happen this month. Register now if you haven't already and plan to attend (it is free to register)
19:02:29 <clarkb> The OpenStack release happens next week which we should keep in mind for service changes
19:02:47 <clarkb> and finally Rax hosted db outages around 03:00-05:00 UTC Friday including those for review and grafana
19:03:52 <clarkb> Thats thursday evening around here and Friday morning for ianw I think. I'll try to be around so that ianw isn't the only one at a keyboard if gerrit or grafana get sad
19:04:59 <ianw> heh, yeah that's a good time for .au for things to go wrong :)
19:05:17 <clarkb> #topic Actions from last meeting
19:05:24 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-09-29-19.01.txt minutes from last meeting
19:05:28 <clarkb> We recorded no actions
19:05:33 <clarkb> #topic Priority Efforts
19:05:41 <clarkb> #topic Update Config Management
19:05:50 <clarkb> Any configuration management updates to call out?
19:06:36 <fungi> graphite got swapped out and the old server deleted thanks to ianw's tireless effort
19:07:01 <ianw> well i haven't deleted the old server yet, but it has been a week so i'll probably do it soon (on my todo list :)
19:07:25 <fungi> oh, right, deleted it from the inventory
19:08:02 <ianw> hrm, it looks like that got a -2 for a job that passed in check but failed gate ... https://review.opendev.org/#/c/755150/ ... will look into that
19:10:05 <clarkb> #topic OpenDev
19:10:52 <clarkb> Lets start with gitea. We discovered that setting descriptions errors in a very small portion of requests. I've made the description update non fatal in our repo management if it fails, we'll get to it on another pass (it is fatal if project creation breaks)
19:11:11 <clarkb> I also updated the gitea config to log tracebacks on those errors so that if it happens again we can hopefully debug it further than it broke
19:11:26 <corvus> so it intermittently fails?
19:11:33 <clarkb> I have confirmed the new config setting is functional and we get tracebacks when starting gitea and the mysql server isn't listening on tcp yet
19:11:45 <clarkb> corvus: yes the error message we get indicates the cache returns an unexpected nil value
19:11:50 <corvus> like, it's not that some project has a pile of poo emoji in the description and that one always fails?
19:11:51 <corvus> k
19:11:54 <clarkb> correct
19:12:26 * ianw has a new idea for the dib project description :)
19:12:34 <fungi> though i can think of a few of my projects which probably deserve a pile of poo as a description
19:12:43 <fungi> ianw beat me to it
19:14:15 <clarkb> On the gerrit upgrade side of things fungi and I have been doing pair programming ops style and got review-test into shape mimicking a gerrit 2.13 as of october 1. We then upgraded it to 2.16 pre notedb migration using the process described at https://etherpad.opendev.org/p/gerrit-2.16-upgrade
19:14:28 <clarkb> That server is now up and running and you can test it at https://review-test.opendev.org
19:14:49 * diablo_rojo sneaks in late
19:15:04 <fungi> we also timed the relevant steps so we can gauge how much downtime we'll likely incur for them
19:15:07 <clarkb> The next thing I'm working on is the notedb migration and 3.x upgrades. I've found that we need 3.1 and 3.2 images so am working on that next. While doing that I expect we'll leave 2.16 up like that so people can interact with it
19:15:38 <clarkb> the reason for the notedb migration and 3.x stuff happening next is we want data to determine if we should do a 2.13 -> 3.x upgrade or just 2.13 -> 2.16 then later 2.16 -> 3.x
19:15:59 <clarkb> As expected things like our hacky js CI results table do not work in either the old ui or the polygerrit ui on gerrit 2.16
19:16:05 <fungi> also if folks have any observations/concerns about the 2.16 deployment there, please add them to the notes section at the bottom of the etherpad
19:16:28 <clarkb> more surprisingly the zuul comment commentlink config does not work under polygerrit and from what I've been able to debug cannot be made to work without changing how zuul comments
19:16:44 <corvus> huh :(
19:16:57 <clarkb> the issue being that gerrit finds the url that zuul posts and treats that separately to other strings which you can regex match on
19:17:03 <fungi> either things we'll need to address before the production upgrade, or things we might want to look into fixing after upgrading
19:17:10 <clarkb> https://gerrit.googlesource.com/gerrit/+/refs/heads/stable-2.16/polygerrit-ui/app/elements/shared/gr-linked-text/link-text-parser.js#268 is the code that does that in gerrit
19:17:30 <clarkb> we could fix that by having zuul not leave comments with urls then do comment link rewrites to urls ourselves
19:17:37 <fungi> yeah, basically no commentlink matching if the pattern covers an existing url in the content
19:17:42 <clarkb> I also checked that code through 3.2 and I believe its just as broken there as 2.16
19:18:08 <clarkb> in positive news the upgrade itself went pretty smoothly
19:18:10 <corvus> well, zuul needs to leave usable comments without commentlinks
19:18:17 <corvus> so i think they have to have urls
19:18:22 <clarkb> corvus: ya and it does, so I'm ok with just ignoring that
19:18:33 <clarkb> they aren't as pretty but they are usable imo
19:18:34 <fungi> i also don't think the commentlinks really buy us a lot for the zuul comments anyway
19:18:55 <clarkb> another neat thing is gerrit annotates comments that leave -1's and -2's so the zuul comments jump out in the comment list
19:19:02 <corvus> so the issue is that we can't modify the link to replace the anchor text with the job name?
19:19:03 <clarkb> making it easy to find them and figure out what is broken
19:19:17 <clarkb> corvus: we can't have a regex that matches a url
19:19:34 <clarkb> corvus: and the end result is what you describe
19:19:54 <corvus> well, i'm unclear about whether the problem is that we can't match a url or we can't modify the anchor text
19:20:01 <clarkb> I can match either side of the url but not the url itself
19:20:24 <corvus> like, if the comment were "jobname {url}" would we be able to match the interior of the {} and change the anchor text of the resulting url?
19:20:34 <fungi> and it's matches on the raw comment string minus the url, not the baked comment html
19:21:16 <clarkb> corvus: thats a good question, we can test that by leaving some comments manually and trying to match them with commentlinks
19:21:46 <clarkb> corvus: I expect that if we manually leave such a comment and it doesn't get rendered to an href then we could use commentlinks for that but if the inner bit of {} gets rendered to href we can't
19:21:53 <fungi> we may however wind up with an href object string inside an escaped version of our attempt at adding an href object, but worth a try
19:22:27 <corvus> i think it would be in scope to update zuul's comment syntax to make it parseable with commentlinks in newer gerrit, i think the only restriction is we need to make it sensible without commentlinks
19:22:36 <fungi> <a href="<a href="https:...
19:23:02 <fungi> is kinda what i'm expecting that to result in
19:23:23 <corvus> i can try to help with this
19:23:35 <clarkb> corvus: thanks, feel free to edit the config and reload on review-test
19:23:41 <corvus> k
19:23:45 <clarkb> I think you can do that without restarting gerrit on 2.16 but I haven't confirmed that yet
19:23:59 <corvus> fancy
19:24:10 <fungi> but also restarting the container shouldn't be much of a concern
19:24:22 <clarkb> corvus: and note polygerrit commentlinks are parsed on the raw string not the rendered html
19:24:32 <clarkb> that was the first thing I had to sort out
19:24:58 <clarkb> on the image building side of things I'm running into issues with jgit being a submodule in gerrit now. But will have a patchset after the meeting to see if I've figured that out
19:25:20 <corvus> i thought we used zuul required-projects to set that up?
19:25:43 <clarkb> corvus: we do except for like ~3 repos because they don't have branches/tags/refs to checkout and they just use submodule pointing at specific refs
19:25:54 <clarkb> corvus: jgit was only converted to a submodule in 3.1 which is why its different
19:26:11 <clarkb> but also they don't use refs that we can easily checkout and just have it on specific commits :/
19:26:28 <clarkb> looks like mordred stubbed support for this out and I just need to tie it into the new jobs I'm creating
19:27:02 <corvus> might be worth a look at the zuul jobs in upstream gerrit too, maybe we did something there we can backport
19:28:25 <clarkb> As mentioned earlier the upgrade process went really smoothly and would probably only take a day to get to 2.16 pre notedb
19:28:35 <clarkb> considering all the moving parts thats pretty good
19:29:06 <fungi> aggregate time to do the gc passes and reindexing is only a few hours
19:29:18 <fungi> which is way better than i anticipated
19:29:45 <clarkb> this is why I'm now looking at 3.x as it isn't entirely far fetched to think we could make that leap in one go
19:29:46 <fungi> and the db migrations
19:29:53 <corvus> clarkb: do we have this locally? https://gerrit.googlesource.com/zuul/jobs/+/refs/heads/master/roles/prepare-gerrit-repos/tasks/repo.yaml#53
19:30:22 <clarkb> corvus: ya mordreds stubbed out thing is similar
19:30:33 <clarkb> corvus: except it hardcodes the things to submodule init on
19:30:48 <corvus> k; we might be able to just move to the upstream one?
19:31:30 <clarkb> ya that may be a good refactor
19:32:24 <clarkb> the last thing I wanted to note here is that I think we should be careful about trying to fix everything pre upgrade otherwise we may never upgrade :) if we find issues and identify which ones are important to fix that would be helpful so that we can prioritize them as much as possible
19:32:54 <clarkb> its storyboard for example hasn't been updated in a long long time
19:33:15 <corvus> ++ i bet we can live with weird commentlinks :)
19:34:13 <clarkb> but ya please do give that server a good test. I've dumped the db and snapshotted the cinder volume on 2.13 and 2.16 already so we can roll back pretty easily
19:34:26 <clarkb> Anything else to bring up on the topic of opendev before we move on?
19:34:42 <fungi> its storyboard is a good example of something we could probably just replace with a zuul job too
19:34:48 <clarkb> fungi: ++
19:34:50 <fungi> er, its-storyboard
19:35:28 <fungi> also i don't think we've tested whether the hook scripts will continue to work
19:35:38 <corvus> clarkb: maybe send out an email to service-discuss to ask folks to poke at review-dev?
19:35:45 <fungi> (for things like launchpad bug updates and welcome new contributor message)
19:35:47 <clarkb> corvus: oh ya I can do that too
19:35:50 <ianw> just a big thanks to clarkb and fungi for working on it!!!!
19:36:17 <corvus> ianw: ++ thanks clarkb and fungi and also luca :)
19:36:27 <clarkb> and mordred for laying the groundwork
19:36:36 <fungi> yes, luca's guidance has been invaluable
19:36:58 <fungi> also if you say mordred three times he might appear
19:37:02 * mordred hands out candy and goats to everyone in celebration
19:37:16 <fungi> oh, two times! ;)
19:37:44 <mordred> fungi: that was actually three ;)
19:37:44 <clarkb> #topic General topics
19:37:56 <corvus> mordred: getting ready for halloween nola style? :)
19:38:01 <clarkb> #topic PTG Planning
19:38:02 <fungi> mordred: yeah, you're right (four now!)
19:38:17 <clarkb> corvus: for some reason I just assume they hand out cocktails instead of candy
19:38:25 <mordred> corvus: does buying pig feet count?
19:38:48 <clarkb> #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 PTG Planning and details for OpenDev here
19:38:49 <corvus> mordred: pretty sure that's just 'tuesday' not halloween
19:39:20 <fungi> in my old neighborhood the russian family across the street from us set up a table and handed out vodka shots to the parents dragging their costumed toddlers around. a public service
19:39:21 <clarkb> I've scribbled notes on that etherpad. As mentioned before please indicate if you'd like to be a part of specific discussions and we'll do our best to accommodate with timezones
19:39:41 <corvus> those look like pretty good times
19:39:44 <clarkb> fungi: and that was in nola?
19:39:54 <fungi> nope, raleigh
19:40:21 <clarkb> oh also fungi and ttx and diablo_rojo have indicated they have a million ptg things and if you are in that boat too and have conflicts with opendev let us know
19:40:33 <clarkb> I think we can do minor tweaks to the schedule to accommodate
19:40:44 <fungi> i expect to just have to float in and out of sessions. where conflicts arise
19:41:09 <clarkb> #topic Rehoming tarballs
19:41:20 <fungi> and rely on folks to ping me in irc if they need me in a particular discussion and i'm not in the right meetpad
19:41:25 <clarkb> ianw: I kept this on the agenda in case there was anything more to say about this, but I think its been taken care of?
19:43:38 <clarkb> sounds like maybe no
19:43:50 <ianw> yeah, sorry
19:43:56 <clarkb> tldr is the tarballs were moved to their proper homes and apache redirects were added for people that had old urls
19:44:13 <ianw> i have some follow-up to do on the zuul side with some questions about what needs to be published etc. i said i'd send to the list
19:44:20 <ianw> on the todo :)
19:44:24 <clarkb> roger
19:44:35 <clarkb> #topic Splitting puppet else into specific infra-prod jobs
19:45:02 <clarkb> I don't think anything has happened on this topic yet. I've been thinking about dropping it from the agenda and writing a help wanted doc for things like this
19:45:20 <clarkb> we'll see how I do with all the other stuff happening this month first
19:45:32 <clarkb> (ptg summit ansiblefest openstack release gerrit upgrade testing so many things)
19:45:41 <clarkb> #topic Bup and Borg Backups
19:46:06 <corvus> a help wanted doc would be useful (looking at the ptg list, looks like auth might be another item there?)
19:46:07 <clarkb> ianw: I don't think the borg change has landed yet but you are building a new backup server. Were you planning to make that a "normal" bup server and convert to borg later or?
19:46:12 <clarkb> corvus: ++
19:46:58 <ianw> so i have started up a new server in vexxhost
19:47:30 <ianw> i have merged the borg change with the idea to apply it to this server
19:47:46 <clarkb> nice
19:48:03 <fungi> looking at the help-wanted section of our specs index (which already includes the auth spec), the irc bot consolidation might be good to drum up interest for as well
19:48:12 <ianw> it's been not top priority but i'm making progress :)
19:48:51 <clarkb> ianw: one thing I've noticed recently with borg locally is that pip installing borg on arm64 is a bit of a pain for all the reasons we've been having python build time issues in other places
19:49:02 <clarkb> not an issue for us today as we don't need to backup any arm64 hosts but something to keep in mind
19:49:20 <clarkb> and the python ecosystem is slowly getting better about that in part due to your work so yay
19:50:07 <clarkb> #topic Open Discussion
19:50:14 <clarkb> Any other items to bring up today?
19:52:41 <clarkb> sounds like that may be it. Thank you everyone
19:52:58 <clarkb> Feel free to bring up discussions on the mailing list or in #opendev
19:52:59 <fungi> thanks clarkb!
19:53:03 <clarkb> #endmeeting
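
Editor's note on the commentlink discussion under the OpenDev topic: the behaviour clarkb and fungi describe (URLs in the raw comment are picked out first, and commentlink patterns only see the text on either side) can be modelled roughly as below. This is a simplified sketch and not Gerrit's actual parser; URL_RE, split_segments, commentlink_matches, the example comment text and the example hostname are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical, simplified model of the behaviour described in the meeting.
# It is NOT Gerrit's real link-text-parser; it only illustrates the idea that
# URLs are carved out of the raw comment before commentlink patterns run.
URL_RE = re.compile(r"https?://\S+")


def split_segments(text):
    """Split raw comment text into (is_url, chunk) pairs, in order."""
    segments = []
    pos = 0
    for m in URL_RE.finditer(text):
        if m.start() > pos:
            segments.append((False, text[pos:m.start()]))
        segments.append((True, m.group(0)))
        pos = m.end()
    if pos < len(text):
        segments.append((False, text[pos:]))
    return segments


def commentlink_matches(pattern, text):
    """Return matches of pattern, searching only the non-URL chunks."""
    regex = re.compile(pattern)
    hits = []
    for is_url, chunk in split_segments(text):
        if not is_url:
            hits.extend(m.group(0) for m in regex.finditer(chunk))
    return hits


# Made-up comment in the general shape "jobname url : result".
comment = "- example-job https://zuul.example.org/build/abc123 : SUCCESS in 5m"

# A pattern that has to span the URL finds nothing in this model...
job_and_url = commentlink_matches(r"\S+ https?://\S+", comment)
# ...while a pattern confined to one side of the URL still matches.
result_only = commentlink_matches(r": (SUCCESS|FAILURE)", comment)
```

In this model a pattern can match "either side of the url but not the url itself", which is the limitation raised at 19:20:01. Whether a "jobname {url}" syntax would get around it is the open question corvus offered to test on review-test.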