*** hashar has joined #opendev-meeting | 06:52 | |
*** hashar has quit IRC | 09:17 | |
*** SotK has quit IRC | 16:27 | |
*** SotK has joined #opendev-meeting | 16:29 | |
*** hamalq has joined #opendev-meeting | 16:44 | |
*** hashar has joined #opendev-meeting | 18:57 | |
clarkb | anyone else here for the team meeting? | 19:00 |
---|---|---|
corvus | o/ | 19:00 |
ianw | o/ | 19:01 |
*** diablo_rojo has joined #opendev-meeting | 19:01 | |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Oct 13 19:01:14 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
diablo_rojo | o/ | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2020-October/000105.html Our Agenda | 19:01 |
fungi | ohai | 19:01 |
clarkb | I'm actually going to flip the order of this agenda around so that we can talk about gerrit last so that we can just talk about it until we are done or run out of time | 19:01 |
clarkb | #topic Announcements | 19:02 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:02 | |
clarkb | The OpenStack release happens tomorrow | 19:02 |
clarkb | we should be slushy on things that impact that (I think we've been managing that so far so not super concerned) | 19:02 |
clarkb | Then next week we have the summit and the week after that the PTG | 19:02 |
clarkb | hope to see you all virtually there :) | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:03 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-06-19.01.txt minutes from last meeting | 19:03 |
clarkb | No actions recordred | 19:03 |
clarkb | (and I can't type) | 19:03 |
clarkb | #topic General topics | 19:03 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:04 | |
clarkb | #topic PTG Planning | 19:04 |
*** openstack changes topic to "PTG Planning (Meeting topic: infra)" | 19:04 | |
clarkb | #link https://etherpad.opendev.org/opendev-ptg-planning-oct-2020 October PTG planning happens here | 19:04 |
clarkb | if you haven't looked over this etherpad yet it would be a good idea to do a quick check to ensure we aren't missing any important items | 19:04 |
clarkb | Other than that ensure you've registered | 19:05 |
clarkb | #link https://www.openstack.org/ptg/ | 19:05 |
clarkb | And we'll see you on meetpad in a couple weeks | 19:05 |
clarkb | #topic Bup and Borg Backups | 19:05 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:05 | |
clarkb | ethercalc is are borg test unit | 19:05 |
* fungi tries to parse | 19:06 | |
clarkb | ianw: that landed yesterday, anything unexpected or exciting to mention on that? | 19:06 |
clarkb | s/are/our/ | 19:06 |
ianw | well, yes, about ansible on bridge of course :) | 19:06 |
clarkb | oh ya the jinja thing. That would be good to recount here | 19:06 |
fungi | did manually upgrading jinja2 work? | 19:06 |
fungi | i sort of passed out around that time | 19:07 |
ianw | in a yak shaving adventure, i realised that bridge has jinja2 2.10, because ansible doesn't specify any lower bound | 19:07 |
ianw | and of course, i managed to find a place where it is incompatible with ~2.11, which is what gets installed in the gate testing | 19:07 |
ianw | i haven't tried with the manual update of it yet, will today | 19:08 |
ianw | in the mean time, i wrote up https://review.opendev.org/757670 to install ansible in a venv on bridge, so we can --update it | 19:08 |
clarkb | assuming that works we should have our first borg'd server ya? | 19:09 |
ianw | i'll clean that up today, but it seems to work | 19:09 |
ianw | clarkb: hopefully :) anyway, progress is being made | 19:09 |
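For readers following the jinja2 discussion above: the problem is an installed version (2.10 on bridge) that is older than what the gate tests against (~2.11), with nothing enforcing a lower bound. A minimal sketch of a numeric version check follows. The helper names are hypothetical and the 2.11 minimum is an assumption taken from the conversation, not a documented ansible requirement.

```python
def version_tuple(version):
    """Turn a dotted version such as '2.10' into (2, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


# Versions mentioned in the meeting: bridge had 2.10, the gate installs ~2.11.
print(meets_minimum("2.10", "2.11"))
print(meets_minimum("2.11.2", "2.11"))
```

Comparing integer tuples rather than raw strings avoids the usual trap where "2.10" sorts before "2.9" as text.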
clarkb | #topic Splitting puppet else into specific infra-prod jobs | 19:10 |
*** openstack changes topic to "Splitting puppet else into specific infra-prod jobs (Meeting topic: infra)" | 19:10 | |
clarkb | I don't think anyone has started this yet but thought I'd quickly double check | 19:10 |
fungi | i have not, no | 19:11 |
clarkb | #topic Priority Efforts | 19:11 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:11 | |
clarkb | #topic Update Configuration Management | 19:11 |
*** openstack changes topic to "Update Configuration Management (Meeting topic: infra)" | 19:11 | |
clarkb | I think I saw ianw working on the reprepro ansiblification. There was also an update to add in some files that were missing in gerrit ansible | 19:12 |
clarkb | anything to add re ^ or any other config management updates? | 19:12 |
ianw | yeah i'm starting on that, so we can get rid of more puppet there | 19:12 |
fungi | oh, was there an update to that change? i'm happy to take a look | 19:12 |
ianw | fungi: still very much a wip. i'm taking a less template-centric approach | 19:13 |
clarkb | this is an area we should be careful with the openstack release happening tomorrow but things like reprepro should be low impact if they break (due to how we vos release) | 19:13 |
fungi | probably wise. that was far too many templates | 19:13 |
ianw | yeah, there were about 3 different forms of templates, which made it more confusing than just looking at the files | 19:14 |
ianw | (it didn't start that way, of course :) | 19:14 |
fungi | shall i just abandon my topic:ansible-reprepro changes then? i guess you're working in a new change | 19:15 |
fungi | i'm entirely in favor of something with fewer templates | 19:15 |
fungi | that's what was so daunting about trying to convert the puppet to begin with | 19:16 |
fungi | i got as far as template conversion and stalled out | 19:16 |
ianw | fungi: you can leave it for now, i used some of it as reference :) | 19:17 |
fungi | by all means, happy it helped | 19:17 |
clarkb | #topic OpenDev | 19:19 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:19 | |
clarkb | That takes us to the topic I was hoping to make room for (and we did yay) | 19:19 |
clarkb | specifically upgrading our gerrit server | 19:19 |
clarkb | fungi and I have worked through a gerrit 2.13 to 3.2 upgrade on review-test using a snapshot of production from october 1 | 19:19 |
clarkb | That upgrade is looking to be about 2 days long (with gerrit offline for it) | 19:20 |
clarkb | The first step is to upgrade from 2.13 to 2.16 as we need 2.16 to do the notedb conversion | 19:20 |
fungi | which wouldn't be too terrible over a weekend | 19:20 |
fungi | two days over a weekend i mean | 19:20 |
clarkb | once we've upgraded to 2.16 I think we should checkpoint there so we don't have to fall back all the way to 2.13 if something goes wrong | 19:21 |
clarkb | then we run the notedb migration which will take about 8 hours | 19:21 |
clarkb | then the next day we can do the 3.0 through 3.2 upgrades | 19:21 |
corvus | by about 2 days, what do you mean? like 8am one day to 5pm the next? with idle time for when processes finish and no one is watching? | 19:21 |
corvus | or 48 hours straight? | 19:21 |
clarkb | 8am to 5pm the next | 19:22 |
corvus | cool | 19:22 |
fungi | with likely some idle time interspersed | 19:22 |
clarkb | roughly that process would look like: shut everything down and put up notices, backup reviewdb and git repos, do upgrade to 2.16, check it is happy, backup reviewdb and git repos (this is ~5pm day one), do notedb migration, at 8am next day do 3.0 to 3.2 upgrades which should finish around midday. Spend rest of day turning things back on and merging changes to catch up with our new state | 19:23 |
fungi | like the notedb conversion, but also to a lesser extent offline reindexing, database schema migrations, git gc passes... | 19:23 |
clarkb | and ya lots of idle time waiting for things to finish | 19:24 |
clarkb | https://etherpad.opendev.org/p/gerrit-2.16-upgrade has timing data | 19:24 |
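The schedule described above can be sketched as simple clock arithmetic. This uses only the figures quoted in the discussion (8am start, notedb migration starting around 5pm on day one and taking about 8 hours, 3.0 to 3.2 upgrades starting at 8am on day two, finishing by 5pm). The function name and the calendar date are hypothetical placeholders; actual timings are on the etherpad.

```python
from datetime import datetime, timedelta


def upgrade_timeline(day_one_start):
    """Compute rough checkpoints for the two-day outage from its 8am start."""
    notedb_start = day_one_start.replace(hour=17)        # ~5pm day one, after the 2.16 backup
    notedb_done = notedb_start + timedelta(hours=8)      # migration takes about 8 hours
    day_two_start = day_one_start + timedelta(days=1)    # 8am next day: 3.0 to 3.2 upgrades
    window_end = day_two_start.replace(hour=17)          # 5pm day two
    return {
        "notedb_done": notedb_done,
        "idle_before_day_two": day_two_start - notedb_done,
        "total_window": window_end - day_one_start,
    }


# Hypothetical date; only the clock times come from the meeting.
plan = upgrade_timeline(datetime(2020, 1, 3, 8, 0))
for name, value in plan.items():
    print(name, value)
```

Under these assumptions the migration would finish overnight with several unattended hours before the second day's work begins, which matches the "lots of idle time" remark.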
clarkb | From the testing side of things basic functionality seems to work | 19:24 |
fungi | which is about as accurately measured as we can manage. same server flavor, volume type, snapshot of production data, et cetera | 19:24 |
fungi | obviously though, clouds, no way to be sure about the timing for any of it | 19:25 |
clarkb | I can login, do git review -s, git review a change, review a change, search for changes, and otherwise interact with the web ui | 19:25 |
clarkb | fungi tested that ICLA signing works | 19:25 |
fungi | yup | 19:25 |
clarkb | I'm currently testing replication to a gitea99 from a held system-config-run-gitea job | 19:25 |
clarkb | that has been running for 25 hours now and is still not done replicating | 19:25 |
clarkb | but it is working | 19:26 |
fungi | for followup tasks (or could even be done beforehand), there are likely some zuul jobs to be written to replace some of our jeepyb gerrit hooks | 19:26 |
clarkb | my takeaway from that is we should be prepared to possibly stop replicating refs/changes again but we can make that decision if it becomes a problem? | 19:26 |
clarkb | yes, the next thing I want to test is project creation with manage-projects, then renaming a project, and finally use the delete-project plugin to test deleting a project | 19:26 |
corvus | why would we need to re-replicate refs/changes? | 19:27 |
clarkb | corvus: we'll be replicating all of the notedb content which is in refs/changes/XY/ABCXY/meta now | 19:27 |
corvus | hrm. but in your test, gitea99 doesn't have the bulk of the refs/changes content while prod gitea does | 19:27 |
clarkb | corvus: correct | 19:28 |
fungi | this isn't for timing the replication, but measuring the resultant system | 19:28 |
clarkb | it will be ~15GB of data to replicate I think based on df output before and after the notedb migration | 19:28 |
fungi | also the replication to gitea can happen outside the window, it's merely additive | 19:29 |
fungi | but we want to know if the added data is going to cause our gitea servers to topple over | 19:29 |
corvus | right, though it could cause replication to lag which could affect users | 19:29 |
fungi | this is true, yes | 19:29 |
clarkb | ya I think we should go in with the intention of replicating refs/changes and keep in mind we can disable it if we notice problems (so far the only problem is the speed at which a fresh server can be replicated to) | 19:30 |
fungi | we can re-test by importing a gitea production db backup and re-replicating the difference | 19:30 |
fungi | if we want to have an idea of how long it's going to actually take to catch up | 19:30 |
fungi | (for purposes of messaging to our users about replication lag) | 19:31 |
clarkb | on the jeepyb side of things the lp bug and spec integration as well as the welcome message hook all talk to reviewdb which will be stale after the notedb migration (and eventually we'll drop that db and it will break completely) | 19:31 |
corvus | we can also disable replicating refs/changes/XY/ABCXY/meta right? | 19:31 |
clarkb | corvus: I can't figure out how to do it | 19:31 |
corvus | is it a regex? negative lookahead? | 19:31 |
clarkb | no it's just globs I think | 19:31 |
clarkb | it uses git's normal ref syntax | 19:31 |
corvus | oh. nm then. | 19:31 |
fungi | might be worth asking luca if we can't figure it out | 19:31 |
fungi | though yeah, sounds... like a glob | 19:32 |
corvus | i agree that we can probably assume it will be okay and disable if not and regroup | 19:32 |
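To illustrate the glob versus regex question raised above: a single positive glob over refs/changes/ selects both patch set refs and the NoteDb meta refs, whereas a regex with a negative lookahead could separate them. The sketch below uses Python's `fnmatch` purely as a stand-in for glob matching; it is not how git or the Gerrit replication plugin evaluates refspecs, and the change numbers are made up.

```python
import fnmatch
import re

# Hypothetical refs: one patch set ref, one NoteDb meta ref, one branch.
REFS = [
    "refs/changes/45/12345/1",
    "refs/changes/45/12345/meta",
    "refs/heads/master",
]


def select_glob(pattern, refs):
    """Select refs with a shell-style glob (approximation of glob matching)."""
    return [ref for ref in refs if fnmatch.fnmatchcase(ref, pattern)]


def select_regex(pattern, refs):
    """Select refs whose full name matches a regular expression."""
    compiled = re.compile(pattern)
    return [ref for ref in refs if compiled.fullmatch(ref)]


# The glob cannot tell patch set refs apart from meta refs.
print(select_glob("refs/changes/*", REFS))

# A negative lookahead excludes anything ending in /meta.
NO_META = r"refs/changes/(?!.*/meta$).*"
print(select_regex(NO_META, REFS))
```

Since the replication configuration appeared to accept only globs, the lookahead approach is shown only to make the distinction concrete, not as an available option.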
clarkb | for jeepyb I'm wondering if people think we should proactively remove those gerrit hooks | 19:32 |
clarkb | or if anyone is interested in looking at them to see if they need to use the db or if they can just hit the rest api maybe | 19:33 |
clarkb | (I think the rest api would be the best way to interact with notedb) | 19:33 |
fungi | those hooks seem like something which could be replaced with zuul jobs in advance with little or no modification needed between 2.13 and 3.2 | 19:34 |
ianw | can you point out what those hooks are, for those of us who might not know? :) | 19:35 |
clarkb | one sec | 19:35 |
ianw | getting something zuulified might be somewhere i can practically help :) | 19:35 |
clarkb | https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_bug.py | 19:36 |
clarkb | https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_blueprint.py | 19:36 |
clarkb | https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/welcome_message.py | 19:36 |
clarkb | then in system-config we have simple shim bash scripts that execute those jeepyb tools via gerrit hooks | 19:37 |
fungi | ianw: yeah, that's why i mentioned, they can probably be worked on in parallel and then that's one less thing we have to worry about afterward | 19:37 |
clarkb | related to this is storyboard integration which is currently done via the its-storyboard plugin. That plugin hasn't had much development in years | 19:38 |
clarkb | it's possible that it just works since storyboard and gerrit plugins are quite stable but I haven't set it up to test it | 19:38 |
fungi | also zuul and the storyboard api are both far more extensible than the its framework | 19:38 |
ianw | looks like the db parts there are for setting the uploader of a change to the owner of the bug in launchpad | 19:39 |
fungi | so there's a lot of opportunity for improvement as a zuul job anyway in my opinion | 19:39 |
clarkb | ianw maybe you can look at the jeepyb side and give a recommendation and based on that we decide if we need to test storyboard integration? | 19:40 |
ianw | ok, i can take a look and see if i can find the apis to replicate what's there | 19:41 |
ianw | i can make an etherpad | 19:42 |
clarkb | thank you | 19:42 |
clarkb | other things to note, draft changes will be converted to WIP changes | 19:43 |
clarkb | the UI will change as gerrit 3.2 is polygerrit only | 19:43 |
clarkb | the zuul commentlink stuff will be removed and we can figure out how to make them fancy again post upgrade | 19:43 |
fungi | also the custom summary table js overlay yeah? | 19:44 |
clarkb | yes that doesn't work either so in my change stack it is removed | 19:44 |
corvus | i'm still out of ideas for that other than a polygerrit plugin | 19:44 |
clarkb | https://review.opendev.org/#/c/757162/ is the end of my WIP stack if you want to take a look | 19:45 |
clarkb | mechanically I'm not quite sure yet how we land those post upgrade | 19:46 |
clarkb | assuming zuul is already running we won't want them to actually execute in sequence | 19:47 |
clarkb | we could squash them all together or perhaps remove the infra-prod job for gerrit | 19:47 |
clarkb | or land them pre zuul being started (one change in the sequence is a zuul config update to get it authing properly to gerrit) | 19:47 |
clarkb | I think we can do a proper review of the upgrade process from start to finish once I've written up a better change doc | 19:47 |
clarkb | and check on that there | 19:48 |
clarkb | Are there other gerrit features or functionality that people think will be critical to test pre upgrade? | 19:48 |
ianw | gertty? | 19:49 |
clarkb | ++ would existing gertty users like to point it at review-test or should I get a local install running again? | 19:49 |
clarkb | also possible that corvus already uses gertty with upstream gerrit and its fine? | 19:50 |
fungi | plenty of folks are also using gertty with newer gerrits, but can't hurt | 19:50 |
clarkb | ok add that to my local list | 19:50 |
corvus | i have used it with upstream gerrit and it works | 19:50 |
clarkb | excellent | 19:50 |
corvus | it doesn't fully support all the new features, but it functions | 19:50 |
corvus | (eg tag support is half-implemented) | 19:50 |
corvus | i mean hashtag | 19:50 |
clarkb | given all that do we think we should start working to schedule a downtime under the assumption we'll work through the remainder of our test items in the interim? | 19:51 |
corvus | gerrit devs are good api stewards :) | 19:51 |
clarkb | I think we should start a downtime window ~PDT friday morning and end it ~PDT sunday evening | 19:51 |
clarkb | where PDT may be PST due to time change | 19:52 |
clarkb | that gives us a large buffer over our less busy period of time | 19:52 |
clarkb | with the goal of being done saturday | 19:52 |
fungi | it will be pst by then, yes | 19:53 |
fungi | time change is coming up for the usa next week i think? | 19:53 |
clarkb | fungi: ya just before the PTG | 19:53 |
clarkb | with the summit and PTG coming up and then the US election likely to be distracting I think the earliest we could do it is 13th of november (friday the 13th!) | 19:54 |
clarkb | but maybe 20th is better as it gives us a buffer and is closer to a large us holiday (so likely to be quiet?) | 19:54 |
fungi | lucky 13 | 19:54 |
clarkb | ianw: corvus: any opinions? | 19:54 |
corvus | checking | 19:55 |
fungi | i'm open to all of the above | 19:55 |
fungi | there's yet more pressure to update as we've just today learned that fedora 33's default openssl policy causes ssh to no longer accept our gerrit host key without additional overrides | 19:55 |
clarkb | I think my personal vote would be November 20, 21, 22 | 19:55 |
clarkb | and keep working on testing things if something major comes up we can push it back | 19:56 |
clarkb | but so far testing has been mostly happy (as long as we accept things won't be perfect just functional) | 19:56 |
ianw | afaik i can be around then, not that i've really done anything on the details of the upgrade | 19:56 |
clarkb | ya the only reason I've said PST timezone reference is I've been doing a lot of the work so figure I should be around to drive things | 19:57 |
clarkb | but I'd love as many eyeballs as possible :) | 19:57 |
fungi | additional hands are helpful for sorting out unanticipated issues after the upgrade too | 19:57 |
corvus | either of those sounds good. i suspect due to covid, people may be generous in taking time off before or after thanksgiving this year. | 19:57 |
corvus | (eg, banked use-it-or-lose-it vacation days in usa) | 19:57 |
clarkb | ya | 19:58 |
fungi | or they may be like me and have an excuse to be anti-social and not worry about family obligations ;) | 19:58 |
clarkb | why don't we pencil in the 20th if nothing jeopardizing that comes up in the next week we can announce it with a full month of warning? | 19:58 |
corvus | that sounds good | 19:58 |
fungi | sounds great | 19:58 |
corvus | fungi: oh yeah, i'm not suggesting people would use vacation to meet with other people socially. just use it period. | 19:59 |
clarkb | woo and we haven't quite run out of time yet either :) | 19:59 |
clarkb | Is there anything else related to the gerrit upgrade that we want to talk about before I end the meeting? | 19:59 |
clarkb | also thank you for all the help and willingness to do a weekend outage | 19:59 |
corvus | clarkb: thanks to you for driving it. and fungi too. | 19:59 |
clarkb | and we're at time. Thanks again | 20:00 |
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
openstack | Meeting ended Tue Oct 13 20:00:15 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-10-13-19.01.log.html | 20:00 |
fungi | thanks clarkb! | 20:02 |
*** hashar has quit IRC | 20:30 | |
*** diablo_rojo has quit IRC | 22:34 | |