Tuesday, 2020-10-13

*** hashar has joined #opendev-meeting06:52
*** hashar has quit IRC09:17
*** SotK has quit IRC16:27
*** SotK has joined #opendev-meeting16:29
*** hamalq has joined #opendev-meeting16:44
*** hashar has joined #opendev-meeting18:57
clarkbanyone else here for the team meeting?19:00
corvuso/19:00
ianwo/19:01
*** diablo_rojo has joined #opendev-meeting19:01
clarkb#startmeeting infra19:01
openstackMeeting started Tue Oct 13 19:01:14 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
diablo_rojoo/19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-October/000105.html Our Agenda19:01
fungiohai19:01
clarkbI'm actually going to flip the order of this agenda around so that we can talk about gerrit last so that we can just talk about it until we are done or run out of time19:01
clarkb#topic Announcements19:02
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:02
clarkbThe OpenStack release happens tomorrow19:02
clarkbwe should be slushy on things that impact that (I think we've been managing that so far so not super concerned)19:02
clarkbThen next week we have the summit and the week after that the PTG19:02
clarkbhope to see you all virtually there :)19:03
clarkb#topic Actions from last meeting19:03
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-06-19.01.txt minutes from last meeting19:03
clarkbNo actions recordred19:03
clarkb(and I can't type)19:03
clarkb#topic General topics19:03
*** openstack changes topic to "General topics (Meeting topic: infra)"19:04
clarkb#topic PTG Planning19:04
*** openstack changes topic to "PTG Planning (Meeting topic: infra)"19:04
clarkb#link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning happens here19:04
clarkbif you haven't looked over this etherpad yet it would be a good idea to do a quick check to ensure we aren't missing any important items19:04
clarkbOther than that ensure you've registered19:05
clarkb#link https://www.openstack.org/ptg/19:05
clarkbAnd we'll see you on meetpad in a couple weeks19:05
clarkb#topic Bup and Borg Backups19:05
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:05
clarkbethercalc is are borg test unit19:05
* fungi tries to parse19:06
clarkbianw: that landed yesterday, anything unexpected or exciting to mention on that?19:06
clarkbs/are/our/19:06
ianwwell, yes, about ansible on bridge of course :)19:06
clarkboh ya the jinja thing. That would be good to recount here19:06
fungidid manually upgrading jinja2 work?19:06
fungii sort of passed out around that time19:07
ianwin a yak shaving adventure, i realised that bridge has jinja2 2.10, because ansible doesn't specify any lower bound19:07
ianwand of course, i managed to find a place where it is incompatible with ~2.11, which is what gets installed in the gate testing19:07
ianwi haven't tried with the manual update of it yet, will today19:08
ianwin the mean time, i wrote up https://review.opendev.org/757670 to install ansible in a venv on bridge, so we can --update it19:08
clarkbassuming that works we should have our first borg'd server ya?19:09
ianwi'll clean that up today, but it seems to work19:09
ianwclarkb: hopefully :)  anyway, progress is being made19:09
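The version mismatch described above (jinja2 2.10 on bridge versus roughly 2.11 in gate testing) is the kind of drift a simple lower-bound check can surface. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the `2.11` minimum and the helper names are illustrative assumptions, not something taken from system-config:

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.11.2' into (2, 11, 2); stop at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_lower_bound(installed, minimum):
    """True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "2.11" is an assumed bound for illustration only.
    found = installed_version("jinja2")
    if found is None:
        print("jinja2 is not installed")
    elif meets_lower_bound(found, "2.11"):
        print(f"jinja2 {found} is new enough")
    else:
        print(f"jinja2 {found} is older than 2.11")
```

Run inside the same interpreter (or venv) that ansible uses, this would report whether that environment matches what the gate installs.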
clarkb#topic Splitting puppet else into specific infra-prod jobs19:10
*** openstack changes topic to "Splitting puppet else into specific infra-prod jobs (Meeting topic: infra)"19:10
clarkbI don't think anyone has started this yet but thought I'd quickly double check19:10
fungii have not, no19:11
clarkb#topic Priority Efforts19:11
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:11
clarkb#topic Update Configuration Management19:11
*** openstack changes topic to "Update Configuration Management (Meeting topic: infra)"19:11
clarkbI think I saw ianw working on the reprepro ansiblification. There was also an update to add in some files that were missing in gerrit ansible19:12
clarkbanything to add re ^ or any other config management updates?19:12
ianwyeah i'm starting on that, so we can get rid of more puppet there19:12
fungioh, was there an update to that change? i'm happy to take a look19:12
ianwfungi: still very much a wip.  i'm taking a less template-centric approach19:13
clarkbthis is an area we should be careful with the openstack release happening tomorrow but things like reprepro should be low impact if they break (due to how we vos release)19:13
fungiprobably wise. that was far too many templates19:13
ianwyeah, there were about 3 different forms of templates, which made it more confusing than just looking at the files19:14
ianw(it didn't start that way, of course :)19:14
fungishall i just abandon my topic:ansible-reprepro changes then? i guess you're working in a new change19:15
fungii'm entirely in favor of something with fewer templates19:15
fungithat's what was so daunting about trying to convert the puppet to begin with19:16
fungii got as far as template conversion and stalled out19:16
ianwfungi: you can leave it for now, i used some of it as reference :)19:17
fungiby all means, happy it helped19:17
clarkb#topic OpenDev19:19
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:19
clarkbThat takes us to the topic I was hoping to make room for (and we did yay)19:19
clarkbspecifically upgrading our gerrit server19:19
clarkbfungi and I have worked through a gerrit 2.13 to 3.2 upgrade on review-test using a snapshot of production from october 119:19
clarkbThat upgrade is looking to be about 2 days long (with gerrit offline for it)19:20
clarkbThe first step is to upgrade from 2.13 to 2.16 as we need 2.16 to do the notedb conversion19:20
fungiwhich wouldn't be too terrible over a weekend19:20
fungitwo days over a weekend i mean19:20
clarkbonce we've upgraded to 2.16 I think we should checkpoint there so we don't have to fall back all the way to 2.13 if something goes wrong19:21
clarkbthen we run the notedb migration which will take about 8 hours19:21
clarkbthen the next day we can do the 3.0 through 3.2 upgrades19:21
corvusby about 2 days, what do you mean?  like 8am one day to 5pm the next?  with idle time for when processes finish and no one is watching?19:21
corvusor 48 hours straight?19:21
clarkb8am to 5pm the next19:22
corvuscool19:22
fungiwith likely some idle time interspersed19:22
clarkbroughly that process would look like: shut everything down and put up notices, backup reviewdb and git repos, do upgrade to 2.16, check it is happy, backup reviewdb and git repos (this is ~5pm day one), do notedb migration, at 8am next day do 3.0 to 3.2 upgrades which should finish around midday. Spend rest of day turning things back on and merging changes to catch up with our new state19:23
fungilike the notedb conversion, but also to a lesser extent offline reindexing, database schema migrations, git gc passes...19:23
clarkband ya lots of idle time waiting for things to finish19:24
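The step order and the reasoning behind the 2.16 checkpoint can be sketched as a small data structure. This is only an illustration of the plan as recounted above; the step names and the `rollback_target` helper are hypothetical, and no timings are modelled:

```python
# Ordered steps as recounted in the meeting. "checkpoint" marks a backup of
# reviewdb and the git repos and names the Gerrit version it captures.
STEPS = [
    {"name": "shut down services and post notices", "checkpoint": None},
    {"name": "back up reviewdb and git repos (pre-upgrade)", "checkpoint": "2.13"},
    {"name": "upgrade to 2.16", "checkpoint": None},
    {"name": "verify 2.16", "checkpoint": None},
    {"name": "back up reviewdb and git repos (post-2.16)", "checkpoint": "2.16"},
    {"name": "notedb migration", "checkpoint": None},
    {"name": "upgrade 3.0 through 3.2", "checkpoint": None},
    {"name": "re-enable services and merge changes", "checkpoint": None},
]


def rollback_target(failed_step):
    """Return the version of the latest checkpoint taken before failed_step."""
    index = next(i for i, step in enumerate(STEPS) if step["name"] == failed_step)
    target = None
    for step in STEPS[:index]:
        if step["checkpoint"] is not None:
            target = step["checkpoint"]
    return target


if __name__ == "__main__":
    for step in STEPS:
        print(f"{step['name']}: fall back to {rollback_target(step['name'])}")
```

With the second backup in place, a failure during the notedb migration or the 3.x upgrades falls back to 2.16 rather than 2.13.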
clarkbhttps://etherpad.opendev.org/p/gerrit-2.16-upgrade has timing data19:24
clarkbFrom the testing side of things basic functionality seems to work19:24
fungiwhich is about as accurately measured as we can manage. same server flavor, volume type, snapshot of production data, et cetera19:24
fungiobviously though, clouds, no way to be sure about the timing for any of it19:25
clarkbI can login, do git review -s, git review a change, review a change, search for changes, and otherwise interact with the web ui19:25
clarkbfungi tested that ICLA signing works19:25
fungiyup19:25
clarkbI'm currently testing replication to a gitea99 from a held system-config-run-gitea job19:25
clarkbthat has been running for 25 hours now and is still not done replicating19:25
clarkbbut it is working19:26
fungifor followup tasks (or could even be done beforehand), there are likely some zuul jobs to be written to replace some of our jeepyb gerrit hooks19:26
clarkbmy takeaway from that is we should be prepared to possibly stop replicating refs/changes again but we can make that decision if it becomes a problem?19:26
clarkbyes, the next thing I want to test is project creation with manage-projects, then renaming a project, and finally use the delete-project plugin to test deleting a project19:26
corvuswhy would we need to re-replicate refs/changes?19:27
clarkbcorvus: we'll be replicating all of the notedb content which is in refs/changes/XY/ABCXY/meta now19:27
corvushrm.  but in your test, gitea99 doesn't have the bulk of the refs/changes content while prod gitea does19:27
clarkbcorvus: correct19:28
fungithis isn't for timing the replication, but measuring the resultant system19:28
clarkbit will be ~15GB of data to replicate I think based on df output before and after the notedb migration19:28
fungialso the replication to gitea can happen outside the window, it's merely additive19:29
fungibut we want to know if the added data is going to cause our gitea servers to topple over19:29
corvusright, though it could cause replication to lag which could affect users19:29
fungithis is true, yes19:29
clarkbya I think we should go in with the intention of replicating refs/changes and keep in mind we can disable it if we notice problems (so far the only problem is the speed at which a fresh server can be replicated to)19:30
fungiwe can re-test by importing a gitea production db backup and re-replicating the difference19:30
fungiif we want to have an idea of how long it's going to actually take to catch up19:30
fungi(for purposes of messaging to our users about replication lag)19:31
clarkbon the jeepyb side of things the lp bug and spec integration as well as the welcome message hook all talk to reviewdb which will be stale after the notedb migration (and eventually we'll drop that db and it will break completely)19:31
corvuswe can also disable replicating refs/changes/XY/ABCXY/meta right?19:31
clarkbcorvus: I can't figure out how to do it19:31
corvusis it a regex?  negative lookahead?19:31
clarkbno it's just globs I think19:31
clarkbit uses git's normal ref syntax19:31
corvusoh. nm then.19:31
fungimight be worth asking luca if we can't figure it out19:31
fungithough yeah, sounds... like a glob19:32
corvusi agree that we can probably assume it will be okay and disable if not and regroup19:32
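The glob versus regex question above can be illustrated in plain Python. This sketch does not model the replication plugin's own matching; it only shows why a trailing-wildcard glob cannot carve out the `meta` refs while a regex with a negative lookahead can. The patchset ref `refs/changes/62/757162/1` and the helper names are hypothetical examples:

```python
import fnmatch
import re

# Example refs; the patchset number is an illustrative assumption.
REFS = [
    "refs/changes/62/757162/1",
    "refs/changes/62/757162/meta",
    "refs/heads/master",
]

# Regex that accepts change refs whose last component is anything but "meta".
NON_META = re.compile(r"^refs/changes/\d+/\d+/(?!meta$)[^/]+$")


def glob_select(refs, pattern):
    """Select refs with a shell-style glob; '*' also swallows 'meta'."""
    return [ref for ref in refs if fnmatch.fnmatchcase(ref, pattern)]


def regex_select(refs):
    """Select change refs while excluding the notedb meta ref."""
    return [ref for ref in refs if NON_META.match(ref)]


if __name__ == "__main__":
    print(glob_select(REFS, "refs/changes/*"))
    print(regex_select(REFS))
```

The glob returns both change refs, including `meta`; the regex returns only the patchset ref.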
clarkbfor jeepyb I'm wondering if people think we should proactively remove those gerrit hooks19:32
clarkbor if anyone is interested in looking at them to see if they need to use the db or if they can just hit the rest api maybe19:33
clarkb(I think the rest api would be the best way to interact with notedb)19:33
fungithose hooks seem like something which could be replaced with zuul jobs in advance with little or no modification needed between 2.13 and 3.219:34
ianwcan you point out what those hooks are, for those of us who might not know? :)19:35
clarkbone sec19:35
ianwgetting something zuulified might be somewhere i can practically help :)19:35
clarkbhttps://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_bug.py19:36
clarkbhttps://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_blueprint.py19:36
clarkbhttps://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/welcome_message.py19:36
clarkbthen in system config we have simple shim bash scripts that execute those jeepyb tools via gerrit hooks19:37
fungiianw: yeah, that's why i mentioned, they can probably be worked on in parallel and then that's one less thing we have to worry about afterward19:37
clarkbrelated to this is storyboard integration which is currently done via the its-storyboard plugin. That plugin hasnt had much development in years19:38
clarkbits possible that it just works since storyboard and gerrit plugins are quite stable but I havent set it up to test it19:38
fungialso zuul and the storyboard api are both far more extensible than the its framework19:38
ianwlooks like the db parts there are for setting the uploader of a change to the owner of the bug in launchpad19:39
fungiso there's a lot of opportunity for improvement as a zuul job anyway in my opinion19:39
clarkbianw maybe you can look at the jeepyb side and give a recommendation and based on that we decide if we need to test storyboard integration?19:40
ianwok, i can take a look and see if i can find the apis to replicate what's there19:41
ianwi can make an etherpad19:42
clarkbthank you19:42
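As a starting point for moving the hooks off reviewdb, reading a change's owner from the Gerrit REST API might look roughly like the sketch below. It relies on two documented behaviours: JSON responses start with a `)]}'` guard line, and `/changes/{id}/detail` returns a change with an `owner` account. The sample body, account values, and helper names are hypothetical, and no network call is made; whether the owner is the right field for the Launchpad bug assignment is an assumption to verify against the jeepyb code.

```python
import json

# Gerrit prefixes JSON responses with this guard line to prevent XSSI.
XSSI_PREFIX = ")]}'"


def change_detail_path(change_number):
    """Build the REST path for a change's detail view."""
    return f"/changes/{change_number}/detail"


def parse_gerrit_json(body):
    """Strip the guard prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def change_owner(body):
    """Extract the owner's account fields from a change detail response."""
    owner = parse_gerrit_json(body).get("owner", {})
    return {
        "account_id": owner.get("_account_id"),
        "name": owner.get("name"),
        "email": owner.get("email"),
    }


# Hypothetical response body, shaped like a change detail reply.
SAMPLE = XSSI_PREFIX + "\n" + json.dumps({
    "_number": 757162,
    "owner": {
        "_account_id": 1000001,
        "name": "Example Owner",
        "email": "owner@example.org",
    },
})

if __name__ == "__main__":
    print(change_detail_path(757162))
    print(change_owner(SAMPLE))
```

A hook or zuul job would fetch the path with an HTTP client and pass the response text to `change_owner`.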
clarkbother things to note, draft changes will be converted to WIP changes19:43
clarkbthe UI will change as gerrit 3.2 is polygerrit only19:43
clarkbthe zuul commentlink stuff will be removed and we can figure out how to make them fancy again post upgrade19:43
fungialso the custom summary table js overlay yeah?19:44
clarkbyes that doesnt work either so in my change stack it is removed19:44
corvusi'm still out of ideas for that other than a polygerrit plugin19:44
clarkbhttps://review.opendev.org/#/c/757162/ is the end of my WIP stack if you want to take a look19:45
clarkbmechanically I'm not quite sure yet how we land those post upgrade19:46
clarkbassuming zuul is already running we won't want them to actually execute in sequence19:47
clarkbwe could squash them all together or perhaps remove the infra-prod job for gerrit19:47
clarkbor land them pre zuul being started (one change in the sequence is a zuul config update to get it authing properly to gerrit)19:47
clarkbI think we can do a proper review of the upgrade process from start to finish once I've written up a better change doc19:47
clarkband check on that there19:48
clarkbAre there other gerrit features or functionality that people think will be critical to test pre upgrade?19:48
ianwgertty?19:49
clarkb++ would existing gertty users like to point it at review-test or should I get a local install running again?19:49
clarkbalso possible that corvus already uses gertty with upstream gerrit and its fine?19:50
fungiplenty of folks are also using gertty with newer gerrits, but can't hurt19:50
clarkbok add that to my local list19:50
corvusi have used it with upstream gerrit and it works19:50
clarkbexcellent19:50
corvusit doesn't fully support all the new features, but it functions19:50
corvus(eg tag support is half-implemented)19:50
corvusi mean hashtag19:50
clarkbgiven all that do we think we should start working to schedule a downtime under the assumption we'll work through the remainder of our test items in the interim?19:51
corvusgerrit devs are good api stewards :)19:51
clarkbI think we should start a downtime window ~PDT friday morning and end it ~PDT sunday evening19:51
clarkbwhere PDT may be PST due to time change19:52
clarkbthat gives us a large buffer over our less busy period of time19:52
clarkbwith the goal of being done saturday19:52
fungiit will be pst by then, yes19:53
fungitime change is coming up for the usa next week i think?19:53
clarkbfungi: ya just before the PTG19:53
clarkbwith the summit and PTG coming up and then the US election likely to be distracting I think the earliest we could do it is 13th of november (friday the 13th!)19:54
clarkbbut maybe 20th is better as it gives us a buffer and is closer to a large us holiday (so likely to be quiet?)19:54
fungilucky 1319:54
clarkbianw: corvus: any opinions?19:54
corvuschecking19:55
fungii'm open to all of the above19:55
fungithere's yet more pressure to update as we've just today learned that fedora 33's default openssl policy causes ssh to no longer accept our gerrit host key without additional overrides19:55
clarkbI think my personal vote would be November 20, 21, 2219:55
clarkband keep working on testing things if something major comes up we can push it back19:56
clarkbbut so far testing has been mostly happy (as long as we accept things won't be perfect just functional)19:56
ianwafaik i can be around then, not that i've really done anything on the details of the upgrade19:56
clarkbya the only reason I've said PST timezone reference is I've been doing a lot of the work so figure I should be around to drive things19:57
clarkbbut I'd love as many eyeballs as possible :)19:57
fungiadditional hands are helpful for sorting out unanticipated issues after the upgrade too19:57
corvuseither of those sounds good.  i suspect due to covid, people may be generous in taking time off before or after thanksgiving this year.19:57
corvus(eg, banked use-it-or-lose-it vacation days in usa)19:57
clarkbya19:58
fungior they may be like me and have an excuse to be anti-social and not worry about family obligations ;)19:58
clarkbwhy don't we pencil in the 20th if nothing jeopardizing that comes up in the next week we can announce it with a full month of warning?19:58
corvusthat sounds good19:58
fungisounds great19:58
corvusfungi: oh yeah, i'm not suggesting people would use vacation to meet with other people socially.  just use it period.19:59
clarkbwoo and we haven't quite run out of time yet either :)19:59
clarkbIs there anything else related to the gerrit upgrade that we want to talk about before I end the meeting?19:59
clarkbalso thank you for all the help and willingness to do a weekend outage19:59
corvusclarkb: thanks to you for driving it.  and fungi too.19:59
clarkband we're at time. Thanks again20:00
clarkb#endmeeting20:00
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:00
openstackMeeting ended Tue Oct 13 20:00:15 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.html20:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.txt20:00
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.log.html20:00
fungithanks clarkb!20:02
*** hashar has quit IRC20:30
*** diablo_rojo has quit IRC22:34
