19:01:02 <clarkb> #startmeeting infra
19:01:03 <openstack> Meeting started Tue Aug 11 19:01:02 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <openstack> The meeting name has been set to 'infra'
19:01:11 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000070.html Our Agenda
19:01:38 <clarkb> #topic Announcements
19:01:56 <clarkb> I'll be out tomorrow in order to get outside and do some socially distant fishing
19:02:06 <ianw> o/
19:02:09 <zbr> o/
19:02:25 <clarkb> may as well take advantage of the early morning schedule for opendev event and get out on the water early too :)
19:02:33 <clarkb> any other announcements?
19:03:03 <frickler> o/
19:03:08 <clarkb> #topic Actions from last meeting
19:03:18 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-08-04-19.01.txt minutes from last meeting
19:03:34 <clarkb> we didn't record any actions, but the week prior had an action for ianw to look into python wheel caching for more than openstack
19:03:38 <clarkb> ianw: ^ anything new to add to that?
19:04:09 <ianw> i haven't pushed any changes, but yes i started playing with the manylinux docker container builders
19:05:00 <ianw> https://github.com/pyca/cryptography/issues/5292#issuecomment-671759306
19:05:38 <ianw> that will be useful for us, but also be something that can be useful for others too. although how it gets consumed is an open question
19:05:55 <clarkb> ianw: I think a good chunk of the wheels we build assume things about system packages like libvirt? but even that one may be able to be manylinux'd if it can handle many libvirt apis
19:06:16 <clarkb> that said I suspect a number are actually reducible to many linux
19:06:34 <ianw> yeah, i think it's an "and" thing rather than an "or"
19:07:36 <ianw> anyway, i got a bit sidetracked into other things as well, so didn't make great progress, but still looking at it
19:07:41 <clarkb> thanks
19:07:43 <clarkb> #topic Specs approval
19:07:51 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:08:09 <clarkb> I've reviewed the latest patchset for this change and it looks good to me
19:08:30 <clarkb> but I don't think many others have reviewed it yet. Is this something we think we might put up for approval next week and hopefully do any remaining iteration this week?
19:09:48 <fungi> i wouldn't mind corvus and mordred taking a look through it since they're technically co-authors (i copied and pasted text from some of their e-mails from years past)
19:10:02 <corvus> i'll do that this week
19:10:06 <fungi> thanks!
19:10:23 <clarkb> thank you. I guess we'll see where we end up and possibly will have that up for approval next week
19:10:34 <clarkb> #topic Priority Efforts
19:10:35 <fungi> would just hate to have a spec they sort of co-wrote merge without at least getting skimmed ;)
19:10:45 <clarkb> fungi: ++
19:10:47 <clarkb> #topic Update Config Management
19:11:01 <clarkb> I've been poking at some Gerrit related docker things recently
19:11:17 <clarkb> Gerritbot containerization in progress at https://review.opendev.org/#/c/745573/1 and parents
19:11:24 <clarkb> #link https://review.opendev.org/#/c/745240/ in particular needs second review.
19:11:47 <clarkb> if I can get a second review that is happy with ^ I'll try to land and coordinate those changes with review.o.o and eavesdrop today
19:12:12 <clarkb> the ansible group vars should all be set. It's just a matter of stopping gerritbot on review.o.o and ensuring the process that starts in docker on eavesdrop is happy
19:12:50 <clarkb> I've also been trying to better understand the gerrit upgrade process which has led to a change with image fixes for gerrit
19:12:52 <clarkb> #link https://review.opendev.org/745595 fixes for gerrit plugins on newer gerrit images
19:13:08 <clarkb> that change as is is a bit omnibus like. I'm happy to split it up a bit if reviewers would prefer
19:13:42 <clarkb> mostly what it does is checkout valid versions of plugins so they build properly across the gerrit versions. It also addresses a javamelody is special problem with plugin building
19:14:36 <clarkb> Any other config management items to bring up?
19:15:46 <fungi> that reminds me i need to finish the mirror-update ansibilification for reprepro mirrors
19:16:29 <clarkb> #topic OpenDev
19:17:04 <clarkb> #link https://review.opendev.org/741277 Gerritlib change to support creating projects with non master HEAD
19:17:19 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change. This change updates jeepyb to toggle that flag
19:17:58 <clarkb> For review-test does anyone understand the state it is in?
19:18:15 <clarkb> ianw: I think you had to disable the upstream tracking cron because it had filled the disk with logs?
19:18:44 <ianw> yes, but the point was more that ansible wasn't completing on it so i had to do it manually
19:19:08 <clarkb> do we need to add it to the emergency file while we figure out what it needs?
19:19:55 <ianw> umm, i forget now why it wasn't simple to fix
19:21:11 <clarkb> ok, it would probably be a good idea to see if we can keep it from interacting with other production things
19:21:50 <clarkb> from a general upgrade perspective I've been trying to bootstrap myself on the process there so that one can be written down for testing on review-test
19:22:03 <ianw> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-08-04.log.html#t2020-08-04T21:52:30
19:22:04 <clarkb> that is how I found the docker image issues I'm trying to fix in that change above
19:22:09 <zbr> clarkb: i can help with gerritbot, probably gerritlib too, just add me as reviewer
19:22:16 <ianw> AnsibleUndefinedVariable: 'gerrit_vhost_name' is undefined
19:22:34 <ianw> that's what review-test is failing on, if that rings any bells for anyone
19:22:58 <fungi> maybe it's as simple as just adding a missed hostvar
19:23:12 <clarkb> corvus: one thing in particular that I've learned/discovered is that an upgrade to 2.15 should be really straightforward. Do you know why it was rejected or at least not the preferred route when you and mordred talked to gerrit?
19:23:41 <fungi> though we do set gerrit_vhost_name: review-test.opendev.org in inventory/service/host_vars/review-test.openstack.org.yaml
19:23:47 <clarkb> (I know it doesn't get us to fully notedb, but it is forward progress and likely could be done with minimal downtime, though I've yet to test 2.13 -> 2.15 without 2.14 in the middle)
19:24:28 <ianw> fungi: yeah i think that's about where i got to :)
19:24:52 <corvus> clarkb: one thing i remember is *do not use notedb in 2.15*
19:25:19 <clarkb> corvus: ya the default is notedb is only used for accounts on 2.15 iirc
19:25:20 <corvus> clarkb: so since we can't actually make progress on the conversion until 2.16, would an upgrade to 2.15 get us anything?
19:25:28 <ianw> fungi: yeah, that's right ... it started getting into how we deploy the host vars which got all split up and changed and *that's* why it wasn't trivial
19:25:55 <clarkb> corvus: it gets us momentum? and simplifies the testing process for us to get there as we don't have to do the full notedb profiling (disk use and performance)
19:26:04 <corvus> clarkb: i'm not sure if even that is safe? i dunno.
19:26:48 <clarkb> maybe that is something we should ask upstream. Mostly I'm thinking that we could get that done relatively quickly while still working towards the 2.16 notedb transition too
19:26:51 <corvus> clarkb: i just remember them saying that notedb in 2.15 is wrong, and doing a conversion in 2.15 would prevent a correct upgrade later.
19:27:18 <corvus> clarkb: like, i don't get what the purpose is? what is easy in 2.15 that's hard in 2.16?
19:27:33 <clarkb> corvus: 2.15 doesn't notedb for changes
19:27:37 <corvus> from my pov, 2.16 is like 2.15 but without bugs
19:27:44 <clarkb> corvus: thats not how I read their docs
19:27:53 <clarkb> 2.15 is no changes notedb because its buggy
19:28:07 <clarkb> and you keep using the system that worked previously (which is why it is easier, we know the performance of that)
19:28:20 <corvus> you can do that with 2.16 too, right?
19:28:45 <clarkb> corvus: their docs actively recommend against that (but don't provide a reason), but yes we could go to 2.16 with no notedb then switch to notedb after potentially
19:29:15 <fungi> i guess it's a question of whether they're saying not to run 2.16 without notedb (so prevents us from decoupling the upgrade and db migration steps) or whether it's safe to run with everything still in mysql on 2.16 and then perform the notedb migration
19:29:18 <clarkb> but ya maybe the way to frame it is more around change in notedb rather than specific versions. I think we can relatively easily upgrade to a gerrit version without converting to changes notedb
19:29:44 <clarkb> I think it is the notedb conversion itself that we want to carefully test and profile to ensure we don't run out of disk, the performance remains reasonable, and to deal with any replication change requirements
19:29:45 <corvus> clarkb: say you're right: let's say it's easy to upgrade to 2.15 without making any changes. why would we do it? it doesn't help the upgrade to anything later.
19:29:57 <corvus> it's also not supported
19:30:05 <corvus> (even 2.16 isn't supported at this point)
19:30:27 <corvus> to me, it just feels like wasting time that could be spent working on an upgrade to 2.16
19:30:30 <clarkb> I think there is value in making some progress
19:30:39 <clarkb> it gets us polygerrit
19:30:42 <corvus> i don't think it's progress
19:30:46 <corvus> it's a bad/wrong polygerrit
19:30:54 <corvus> 2.16 is the polygerrit that we actually want to introduce users to
19:31:01 <zbr> i am inclined to believe doing both would be less work, less risks.
19:31:02 <clarkb> what is bad or wrong about it?
19:31:08 <fungi> another question. if we upgrade to 2.16 and migrate to notedb all in one shot, but subsequently discover issues with it, would we want to try to roll back to not-notedb or roll forward to a polygerrit-only 3.x and hope the issues resolved themselves?
19:31:31 <corvus> i think 2.15 polygerrit is different than 2.16; 2.16 is like what's in 3.x
19:32:39 <clarkb> I'm just looking at the problem in front of us and a big part of it (from my perspective) is its size. It feels huge and breaking it down if we can would help us make progress I think
19:32:52 <clarkb> it isn't perfect, and we should continue to plan to upgrade further
19:32:56 <corvus> clarkb: there is a significant chance (i say this based on past upgrades) that each version upgrade will cause us to burn cycles on version-specific issues. i'd hate to spend time on a 2.15 specific issue rather than 2.16.
19:33:19 <corvus> clarkb: in general, i agree with small incremental steps. i just don't see what the incremental step between here and 2.16 is.
19:33:28 <clarkb> corvus: thats fair, maybe we should ask about the risk with not doing the notedb migration immediately on 2.16? then we can continue to split the problems up?
19:34:25 <clarkb> from an operational perspective it would simplify the images we need to manage
19:34:30 <clarkb> we could drop 2 or 3 of them
19:34:30 <corvus> clarkb: luca has offered to help us upgrade, and i think that would be exactly the sort of thing he would welcome as part of that offer
19:34:51 <corvus> (that question)
19:35:05 <clarkb> corvus: thats great to hear, why don't I do a bit more local investigating (there are a few scenarios I wanted to test like skipping versions) then try and compile a couple of questions like that for luca
19:35:12 <corvus> sorry if that wasn't clear; i'm suggesting that asking luca about running 2.16 without notedb is a good q
19:35:16 <clarkb> yup
19:35:30 <corvus> also, the q about 2.15
19:35:48 <fungi> also maybe he can clarify whether running the 2.15 its-storyboard plugin with 3.x is a bad idea
19:36:22 <fungi> since they don't seem to have branched it past 2.15
19:36:30 <corvus> (my guess is he's more likely to agree that upgrading to 2.15 is a good idea)
19:36:59 <corvus> (but he doesn't have my lived experience with what it takes to perform an opendev gerrit upgrade)
19:37:12 <corvus> (he upgrades continually :)
19:37:30 <clarkb> I should be able to have tested our updated images by the end of the week and have a general sense for what is potentially a good idea and what isn't based on simple local testing. Then work up an email for luca
19:38:17 <clarkb> having images that don't spit out tracebacks on startup was step 0 there :)
19:38:27 <clarkb> anything else on opendev?
19:39:11 <clarkb> #topic General Topics
19:39:19 <clarkb> #topic Bup and Borg
19:39:31 <clarkb> ianw I think this is largely a borg topic now. The borg change has the reviews it needs
19:40:01 <ianw> yep, i just need to start a server and get something in to test, still on my short-term TODO sorry
19:40:15 <clarkb> no worries. I think we're all pretty swamped. But wanted to make sure you saw that
19:40:26 <clarkb> #topic Github 3rd Party CI
19:40:38 <clarkb> how is this going?
19:41:00 <ianw> so OK I think, no complaints
19:41:21 <clarkb> are they consuming it as a CI system yet or are we still in the water temp testing stage?
19:41:28 <fungi> praise would be better, but i'll take no complaints ;)
19:41:28 <ianw> next thing for pyca/cryptography we should enable it for master commits, as well as pull requests
19:41:46 <ianw> the project has a .zuul.d directory committed, so that's good :)
19:42:29 <ianw> so probably the next thing is to see if we can fit into wheel generation somewhere, as described previously
19:43:38 <ianw> i got some private communication that libxml was also in need of similar arm64 resources
19:44:44 <clarkb> lxml or libxml?
19:45:03 <clarkb> but ya it wouldn't surprise me if there is a similar need for many of those more costly python packages we've seen end up in the openstack wheel cache
19:45:52 <clarkb> anything else on this topic?
19:47:06 <clarkb> #topic Open Discussion
19:47:13 <ianw> lxml sorry
19:47:25 <fungi> yeah, lxml links libxml when building
19:47:27 <clarkb> we have a bit of time for any other items that are shareable
19:47:46 <fungi> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016424.html dates for ptg have been firmed up (it's the week after the summit)
19:49:07 <clarkb> oh I had missed that
19:49:24 <clarkb> the 6 hours we had last time seemed to work well so I'll probably schedule a similar block.
19:50:04 <clarkb> also we'll likely want to double check meetpad is still happy a few weeks prior to that (iirc we auto update the images so we may pull in new things we need to accommodate)
19:51:14 <corvus> my patch isn't merged upstream, so the web server container is still pinned
19:51:45 <clarkb> we should also consider scaling it up again
19:52:03 <clarkb> though we probably only need ~2 extra servers this time based on load last time (really the bottleneck most people saw seemed to be in the browser)
19:54:36 <clarkb> sounds like that may be it. Thank you everyone. We'll see you here next week.
19:54:40 <clarkb> #endmeeting